match_evaluate takes in matches and outputs summary statistics for those matches, including the number of matches in each tier and the percent matched from each dataset.

match_evaluate(
  matches,
  data1,
  data2,
  unique_key_1,
  unique_key_2,
  suffixes = c("_1", "_2"),
  tier = "tier",
  tier_order = NULL,
  quality_vars = NULL
)

Arguments

matches

data.frame. Merged dataset.

data1

data.frame. First to-merge dataset.

data2

data.frame. Second to-merge dataset.

unique_key_1

character vector. Primary key of data1 that uniquely identifies each row (can be multiple fields).

unique_key_2

character vector. Primary key of data2 that uniquely identifies each row (can be multiple fields).

suffixes

character vector. Mnemonics associated with data1 and data2.

tier

character vector. Default = "tier". The variable that defines a tier.

tier_order

character vector. Default = NULL. Variable that defines the order of tiers, if needed.

quality_vars

character vector. Variables used to assess the quality of each tier. The mean of each variable is calculated by tier.

Value

data.table. Table describing each tier according to the tier variable and the quality_vars variables.

Details

The most straightforward way to use match_evaluate is to pass it to the evaluate argument of tier_match or merge_plus. merge_plus then returns a data.table with the evaluation information alongside the matches themselves.

match_evaluate returns the number of matches in each tier, the number of unique matches in each tier, and the percent matched for each dataset. If no tiers are supplied, the entire dataset is treated as one "tier." The quality_vars argument calculates the average of any columns in the dataset, by tier. The most straightforward case is a matchscore, which can be computed in merge_plus with the scoring argument; passing it as a quality variable then shows the average matchscore by tier.
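
To make these quantities concrete, the following base R sketch computes them by hand on toy data. It is an illustration of the statistics described above, not the package's implementation. The data, the key names id_1 and id_2, and the output column names are all hypothetical, and the percent matched is expressed here as a fraction of rows, which may differ from how match_evaluate scales it.

```r
# Toy inputs (hypothetical data, for illustration only).
data1 <- data.frame(id_1 = 1:5)
data2 <- data.frame(id_2 = 101:104)
matches <- data.frame(
  id_1 = c(1, 2, 2, 3),
  id_2 = c(101, 102, 103, 104),
  tier = c("exact", "exact", "fuzzy", "fuzzy"),
  matchscore = c(1.0, 1.0, 0.8, 0.6)
)

# Split the matches by tier and summarise each piece.
by_tier <- split(matches, matches$tier)
summary_by_tier <- do.call(rbind, lapply(names(by_tier), function(t) {
  m <- by_tier[[t]]
  n_unique_1 <- length(unique(m$id_1))  # distinct data1 rows matched in this tier
  n_unique_2 <- length(unique(m$id_2))  # distinct data2 rows matched in this tier
  data.frame(
    tier = t,
    matches = nrow(m),                        # number of matches in the tier
    unique_1 = n_unique_1,
    unique_2 = n_unique_2,
    pct_matched_1 = n_unique_1 / nrow(data1), # share of data1 matched
    pct_matched_2 = n_unique_2 / nrow(data2), # share of data2 matched
    mean_matchscore = mean(m$matchscore)      # quality variable averaged by tier
  )
}))

print(summary_by_tier)
```

Here matchscore plays the role of a quality variable: its per-tier mean is the kind of figure that quality_vars is described as producing.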

See also

merge_plus