tier_match.Rd
Constructs a tier_match by running merge_plus with different parameters sequentially on the same data. Allows for sequential removal of observations after each tier.
tier_match(
data1,
data2,
by = NULL,
by.x = NULL,
by.y = NULL,
suffixes = c("_1", "_2"),
check_merge = TRUE,
unique_key_1,
unique_key_2,
tiers = list(),
takeout = "both",
match_type = "exact",
clean = FALSE,
clean_settings = build_clean_settings(),
score_settings = NULL,
filter = NULL,
filter.args = list(),
evaluate = match_evaluate,
evaluate.args = list(),
allow.cartesian = TRUE,
fuzzy_settings = build_fuzzy_settings(),
multivar_settings = build_multivar_settings(),
verbose = FALSE
)
data1: data.frame. First to-merge dataset.
data2: data.frame. Second to-merge dataset.
by: character string. Variables to merge on (common across data1 and data2). See merge.
by.x: character string. Variable to merge on in data1. See merge.
by.y: character string. Variable to merge on in data2. See merge.
suffixes: See merge.
check_merge: logical. Checks that your unique keys are indeed unique, and prevents the merge from running if it would result in a data.frame larger than 5 million rows.
unique_key_1: character vector. Primary key of data1 that uniquely identifies each row (can be multiple fields).
unique_key_2: character vector. Primary key of data2 that uniquely identifies each row (can be multiple fields).
tiers: list. A list of lists, where each inner list holds the parameters for creating that tier. All arguments to tier_match listed after this argument can be supplied either directly to tier_match or indirectly via tiers.
takeout: character vector, either 'data1', 'data2', 'both', or 'neither'. Removes matched observations from the selected dataset after each tier.
match_type: string. If 'exact', match is exact; if 'fuzzy', match is fuzzy.
clean: logical. Whether or not to clean strings prior to the match.
clean_settings: list. Settings for string cleaning. See clean_strings and build_clean_settings.
score_settings: list. Settings for post-hoc matchscoring. See build_score_settings.
filter: function or numeric. Filters a merged data1-data2 dataset. If a function, it should take in a data.frame (data1 and data2 merged by name1 and name2) and return a trimmed version of the data.frame (fewer rows). Think of this function as applying conditions to matches other than a match by name. The first argument of filter should be the data.frame. If numeric, drops all observations with a matchscore lower than or equal to filter.
filter.args: list. Arguments passed to filter, if it is a function.
evaluate: function. Evaluates merge_plus output. See match_evaluate.
evaluate.args: list. Arguments passed to the function specified by evaluate.
allow.cartesian: logical. Whether or not to allow many-to-many matches. See data.table::merge().
fuzzy_settings: list. Additional arguments for amatch, used if match_type = 'fuzzy'. Suggested defaults provided (see amatch, method = 'jw').
multivar_settings: list. Settings passed to the multivar match if match_type == 'multivar'. See multivar-match.
verbose: logical. Whether or not to print tier names and the time taken to match each tier as the matching happens.
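The filter argument accepts a function whose first argument is the merged data.frame and which returns a subset of its rows. The following is a minimal sketch in base R of what such a function might look like. The data, the column names (name_1, revenue_1, and so on), the function name filter_revenue_gap, and the max_gap argument are all hypothetical and chosen only for illustration.

```r
# Hypothetical merged data: columns suffixed _1 and _2 stand in for
# fields that came from data1 and data2 respectively.
merged <- data.frame(
  name_1    = c("acme", "globex", "initech"),
  name_2    = c("acme", "globex", "initech"),
  revenue_1 = c(100, 250, 40),
  revenue_2 = c(105, 400, 41),
  stringsAsFactors = FALSE
)

# A filter function: the first argument is the merged data.frame and the
# return value is the same data.frame with fewer rows.
# max_gap is a hypothetical extra argument of the kind that would be
# supplied through filter.args.
filter_revenue_gap <- function(df, max_gap = 10) {
  keep <- abs(df$revenue_1 - df$revenue_2) <= max_gap
  df[keep, , drop = FALSE]
}

filtered <- filter_revenue_gap(merged, max_gap = 10)
```

Under these assumptions, such a function would be supplied as filter = filter_revenue_gap together with filter.args = list(max_gap = 10).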
A list containing the matches, data1 and data2 with the matched observations removed, and the match evaluation.
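To illustrate what "matches" and "data minus matches" mean across tiers, here is a conceptual sketch in base R only. It is not the implementation used by tier_match or merge_plus; it simply mimics the behaviour of takeout = 'both' with plain merge calls. The data, the column names, and the tier names are hypothetical.

```r
# Hypothetical example data.
d1 <- data.frame(id_1 = 1:3,
                 name = c("acme", "globex", "initech"),
                 state = c("NY", "CA", "TX"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(id_2 = 101:103,
                 name = c("acme", "globex inc", "initech"),
                 state = c("NY", "CA", "WA"),
                 stringsAsFactors = FALSE)

# Each tier is described here only by the columns it merges on,
# ordered from strictest to loosest.
tier_keys <- list(name_and_state = c("name", "state"),
                  name_only      = "name")

left1 <- d1          # observations of d1 not yet matched
left2 <- d2          # observations of d2 not yet matched
matches <- list()

for (tier_name in names(tier_keys)) {
  m <- merge(left1, left2, by = tier_keys[[tier_name]],
             suffixes = c("_1", "_2"))
  if (nrow(m) > 0) {
    m$tier <- tier_name
    matches[[tier_name]] <- m[, c("id_1", "id_2", "tier")]
  }
  # Remove matched observations from both datasets before the next tier.
  left1 <- left1[!left1$id_1 %in% m$id_1, , drop = FALSE]
  left2 <- left2[!left2$id_2 %in% m$id_2, , drop = FALSE]
}

all_matches <- do.call(rbind, matches)
```

Here all_matches plays the role of the matches, while left1 and left2 play the role of data1 and data2 minus matches.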
See the tier match vignette to get a clear understanding of the tier_match syntax.
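As a rough sketch of the syntax, assuming the arguments documented above, a tiers list and a call might look like the following. The tier names, the merge column name, the key columns id_1 and id_2, and the wrapper run_tiers are hypothetical. The call is wrapped in a function so that the block can be run without fedmatch installed; it has not been verified against a particular package version.

```r
# Each tier is a list of tier_match arguments. Only arguments listed after
# tiers in the usage (here match_type and clean) are varied per tier.
tiers <- list(
  exact_raw     = list(match_type = "exact", clean = FALSE),
  exact_cleaned = list(match_type = "exact", clean = TRUE)
)

# Hypothetical wrapper: data1 and data2 are assumed to share a column
# called name and to have unique key columns id_1 and id_2.
run_tiers <- function(data1, data2, tiers) {
  fedmatch::tier_match(
    data1, data2,
    by = "name",
    unique_key_1 = "id_1",
    unique_key_2 = "id_2",
    tiers = tiers,
    takeout = "both",   # drop matched rows from both datasets after each tier
    verbose = TRUE
  )
}
```

With fedmatch installed, run_tiers(data1, data2, tiers) would run the exact tier on the raw strings first and then retry the remaining observations with string cleaning enabled.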
merge_plus, clean_strings