merge_plus is a wrapper for a standard merge, a fuzzy string match, and a "multivar" match based on several columns of the data. Its parameters allow fine-tuning of the match. It is primarily used as the workhorse for the tier_match function.

merge_plus(
  data1,
  data2,
  by = NULL,
  by.x = NULL,
  by.y = NULL,
  suffixes = c("_1", "_2"),
  check_merge = TRUE,
  unique_key_1,
  unique_key_2,
  match_type = "exact",
  fuzzy_settings = build_fuzzy_settings(),
  score_settings = NULL,
  filter = NULL,
  filter.args = list(),
  evaluate = match_evaluate,
  evaluate.args = list(),
  allow.cartesian = FALSE,
  multivar_settings = build_multivar_settings()
)

Arguments

data1

data.frame. First dataset to merge (ordering matters; see the Fuzzy Matching vignette).

data2

data.frame. Second dataset to merge.

by

character string. Variables to merge on (common across data1 and data2). See merge.

by.x

length-1 character vector. Variable to merge on in data1. See merge

by.y

length-1 character vector. Variable to merge on in data2. See merge

suffixes

character vector of length 2. Suffixes to add to like-named variables after the merge. See merge.

check_merge

logical. If TRUE, checks that unique_key_1 and unique_key_2 do in fact uniquely identify the rows of data1 and data2.
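
The kind of uniqueness check described here can be sketched in base R. This is only an illustration of the idea, not the package's internal code: the helper is_unique_key and the firms data are hypothetical.

```r
# Hypothetical helper (not part of fedmatch): TRUE when the key columns
# jointly identify every row of `data` exactly once.
is_unique_key <- function(data, key) {
  stopifnot(all(key %in% names(data)))
  # anyDuplicated() returns 0 when no combination of key values repeats
  anyDuplicated(data[, key, drop = FALSE]) == 0
}

# Made-up example data
firms <- data.frame(
  id   = c(1, 2, 2),
  year = c(2019, 2019, 2020),
  name = c("Acme", "Globex", "Globex")
)

is_unique_key(firms, "id")             # id 2 appears twice
is_unique_key(firms, c("id", "year"))  # the pair (id, year) never repeats
```

A key that spans several fields, as unique_key_1 and unique_key_2 may, is unique when the combination of values never repeats, even if the individual columns do.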

unique_key_1

character vector. Primary key of data1 that uniquely identifies each row (can be multiple fields)

unique_key_2

character vector. Primary key of data2 that uniquely identifies each row (can be multiple fields)

match_type

string. If 'exact', the match is exact; if 'fuzzy', the match is fuzzy; if 'multivar', the match is multivar-based. See multivar_match.

fuzzy_settings

additional arguments for amatch, to be used if match_type = 'fuzzy'. Suggested defaults provided. See build_fuzzy_settings.

score_settings

list. Score settings for post-hoc matchscores. See build_score_settings.

filter

function or numeric. Filters a merged data1-data2 dataset. If a function, it should take a data.frame (data1 and data2 merged by name1 and name2) as its first argument and return a trimmed version of that data.frame (fewer rows). Think of this function as applying conditions to the matches beyond the match by name. If numeric, all observations with a matchscore lower than or equal to filter are dropped.
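
Both kinds of filter can be sketched in base R. The merged data, its column names (including matchscore as the name of the score column), and the function same_state_filter are all hypothetical and chosen for illustration; the sketch shows the behavior described above without calling the package.

```r
# Hypothetical merged data: columns suffixed _1 and _2 come from data1 and data2.
merged <- data.frame(
  name_1     = c("Acme Corp", "Globex", "Initech"),
  name_2     = c("Acme Corporation", "Globex", "Initech LLC"),
  state_1    = c("NY", "CA", "TX"),
  state_2    = c("NY", "NV", "TX"),
  matchscore = c(0.90, 1.00, 0.75)
)

# Function filter: the first argument is the merged data.frame. Any extra
# arguments (here col_1 and col_2) would be supplied through filter.args.
same_state_filter <- function(df, col_1 = "state_1", col_2 = "state_2") {
  df[df[[col_1]] == df[[col_2]], , drop = FALSE]
}

kept_by_function <- same_state_filter(merged)

# Numeric filter: drop rows whose matchscore is lower than or equal to the threshold.
threshold <- 0.75
kept_by_score <- merged[merged$matchscore > threshold, , drop = FALSE]
```

Here the function filter keeps only the rows whose states agree, while the numeric filter removes the row scoring exactly 0.75 because the comparison is "lower than or equal to".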

filter.args

list. Arguments passed to filter, if a function

evaluate

function. Function to evaluate merge_plus output.

evaluate.args

list. Arguments passed to evaluate.

allow.cartesian

logical. Whether to allow many-to-many matches. See data.table::merge().

multivar_settings

list. Settings passed to the multivar match if match_type == 'multivar'. See multivar_match and build_multivar_settings.

Value

list with matches, filtered matches (if applicable), data1 and data2 minus matches, and match evaluation
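
A minimal sketch of an exact match follows. It assumes the fedmatch and data.table packages are installed, and that the example datasets corp_data1 and corp_data2 ship with fedmatch and carry the name columns Company and Name. The key columns id_1 and id_2 are created here purely for illustration.

```r
result <- NULL

# Run only when both packages are available (an assumption of this sketch).
if (requireNamespace("fedmatch", quietly = TRUE) &&
    requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)

  # Copy the example data so the originals are not modified by reference
  corp_1 <- copy(fedmatch::corp_data1)
  corp_2 <- copy(fedmatch::corp_data2)

  # Illustrative unique keys: one row number per observation
  corp_1[, id_1 := seq_len(.N)]
  corp_2[, id_2 := seq_len(.N)]

  result <- fedmatch::merge_plus(
    data1 = corp_1,
    data2 = corp_2,
    by.x = "Company",   # name column in data1 (assumed)
    by.y = "Name",      # name column in data2 (assumed)
    suffixes = c("_1", "_2"),
    unique_key_1 = "id_1",
    unique_key_2 = "id_2",
    match_type = "exact"
  )

  # Inspect which components the returned list contains
  print(names(result))
}
```

Switching match_type to 'fuzzy' or 'multivar' would additionally bring fuzzy_settings or multivar_settings into play, as described in the arguments above.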

See also

match_evaluate