Evaluate a matched dataset

match_evaluate takes a set of matches and returns summary statistics for them, including the number of matches in each tier and the percent of each dataset that was matched.

match_evaluate(
  matches,
  data1,
  data2,
  unique_key_1,
  unique_key_2,
  suffixes = c("_1", "_1"),
  tier = "tier",
  tier_order = NULL,
  quality_vars = NULL
)

Arguments

matches: data.frame. Merged dataset.
data1: data.frame. First dataset to be merged.
data2: data.frame. Second dataset to be merged.
unique_key_1: character vector. Primary key of data1 that uniquely identifies each row (can be multiple fields).
unique_key_2: character vector. Primary key of data2 that uniquely identifies each row (can be multiple fields).
suffixes: character vector. Mnemonics associated with data1 and data2.
tier: character vector. Default = "tier". The variable that defines a tier.
tier_order: character vector. Default = NULL. Defines the order of tiers, if needed.
quality_vars: character vector. Variables used to measure the quality of each tier; the mean of each variable is calculated within each tier.
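Because unique_key_1 and unique_key_2 may span several fields, it can help to confirm that the chosen columns really do identify each row before evaluating matches. The R sketch below relies only on base R's anyDuplicated(); the helper is_unique_key() and the example columns name and state are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): check that a candidate key
# uniquely identifies each row before using it as unique_key_1 / unique_key_2.
is_unique_key <- function(df, key_cols) {
  stopifnot(all(key_cols %in% names(df)))
  # anyDuplicated() returns 0 when no combination of key values repeats
  anyDuplicated(df[key_cols]) == 0
}

firms <- data.frame(
  name  = c("Acme", "Acme", "Beta"),
  state = c("NY", "CA", "NY")
)
is_unique_key(firms, "name")               # FALSE: "Acme" appears twice
is_unique_key(firms, c("name", "state"))   # TRUE: each (name, state) pair is unique
```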

Value

data.table. Table describing each tier, defined by the tier variable, together with the averages of the quality_vars variables.

Details

The most straightforward way to use match_evaluate is to pass it to the evaluate argument of tier_match or merge_plus. The calling function then returns a data.table with the evaluation information alongside the matches themselves.

match_evaluate returns the number of matches in each tier, the number of unique matches in each tier, and the percent matched for each dataset. If no tiers are supplied, the entire dataset is treated as a single "tier." The quality_vars argument calculates the average of any columns in the dataset, by tier. The most straightforward case is a match score, which can also be computed in merge_plus with the scoring argument; averaging it shows the mean match score in each tier.
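To illustrate the statistics listed above, here is a minimal base-R sketch of a per-tier summary. It is not match_evaluate's implementation: the helper sketch_evaluate(), its output column names, and its definitions (a "unique match" is a distinct pair of keys; "percent matched" is the share of a dataset's rows whose key appears in the tier) are assumptions. It also assumes the key columns keep their original names in matches.

```r
# Hypothetical helper: a base-R sketch of a per-tier match summary.
# NOT the package's implementation; output columns and definitions are assumptions.
sketch_evaluate <- function(matches, data1, data2, unique_key_1, unique_key_2,
                            tier = "tier", quality_vars = NULL) {
  # With no tier column, treat the whole dataset as a single tier
  tiers <- if (tier %in% names(matches)) {
    as.character(matches[[tier]])
  } else {
    rep("all", nrow(matches))
  }
  # Collapse a (possibly multi-column) key into one string per row;
  # assumes key columns keep their original names in `matches`
  key <- function(df, cols) {
    do.call(paste, c(unname(as.list(df[cols])), sep = "\u001f"))
  }
  k1 <- key(matches, unique_key_1)
  k2 <- key(matches, unique_key_2)
  rows <- lapply(split(seq_len(nrow(matches)), tiers), function(idx) {
    out <- data.frame(
      matches        = length(idx),                                # rows in tier
      matches_unique = nrow(unique(data.frame(k1[idx], k2[idx]))), # distinct key pairs
      pct_matched_1  = 100 * length(unique(k1[idx])) / nrow(data1),
      pct_matched_2  = 100 * length(unique(k2[idx])) / nrow(data2)
    )
    # Mean of each quality variable within the tier
    for (v in quality_vars) {
      out[[paste0("mean_", v)]] <- mean(matches[[v]][idx], na.rm = TRUE)
    }
    out
  })
  data.frame(tier = names(rows), do.call(rbind, rows),
             row.names = NULL, stringsAsFactors = FALSE)
}

# Toy example: one duplicated pair in the "exact" tier
data1 <- data.frame(id1 = 1:4)
data2 <- data.frame(id2 = c("a", "b", "c"))
matches <- data.frame(
  id1   = c(1, 2, 2, 3),
  id2   = c("a", "b", "b", "c"),
  tier  = c("exact", "exact", "exact", "fuzzy"),
  score = c(1, 1, 1, 0.8)
)
res <- sketch_evaluate(matches, data1, data2, "id1", "id2", quality_vars = "score")
print(res)
```

Pasting multi-column keys into single strings keeps the sketch in base R; the package itself returns a data.table.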
