match_evaluate.Rd
match_evaluate
match_evaluate takes in matches and outputs summary statistics for those matches, including
the number of matches in each tier and the percent matched from each dataset.
match_evaluate(
matches,
data1,
data2,
unique_key_1,
unique_key_2,
suffixes = c("_1", "_2"),
tier = "tier",
tier_order = NULL,
quality_vars = NULL
)
matches: data.frame. Merged dataset.
data1: data.frame. First to-merge dataset.
data2: data.frame. Second to-merge dataset.
unique_key_1: character vector. Primary key of data1 that uniquely identifies each row (can be multiple fields).
unique_key_2: character vector. Primary key of data2 that uniquely identifies each row (can be multiple fields).
suffixes: character vector. Mnemonics associated with data1 and data2.
tier: character vector. Default = "tier". The variable that defines a tier.
tier_order: character vector. Default = NULL. Variable that defines the order of tiers, if needed.
quality_vars: character vector. Variables used to calculate the quality of each tier. The mean of each variable is calculated by tier.
data.table. Table describing each tier according to the tier variable and the quality_vars variables.
The most straightforward way to use match_evaluate is to pass it to the evaluate argument of tier_match or merge_plus. This will have merge_plus return a data.table with the evaluation information, alongside the matches themselves.
match_evaluate returns the number of matches in each tier, the number of
unique matches in each tier, and the percent matched for each dataset. If no tiers are supplied,
the entire dataset will be used as one "tier."
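To make these statistics concrete, the sketch below computes them by hand in base R on toy data. It is an illustration only, not the package's implementation: the column names (id_1, id_2), the helper summarise_tier, and the output column names are all hypothetical, and "unique matches" is taken here to mean distinct keys from each dataset, which may differ from the package's own definition.

```r
# Toy inputs; every name below is hypothetical.
data1 <- data.frame(id_1 = 1:5)
data2 <- data.frame(id_2 = 101:104)

# A matched table: one row per matched pair, labelled with its tier.
matches <- data.frame(
  id_1 = c(1, 2, 2, 3),
  id_2 = c(101, 102, 103, 104),
  tier = c("exact", "exact", "fuzzy", "fuzzy")
)

# Summarise the rows belonging to a single tier.
summarise_tier <- function(m) {
  n_unique_1 <- length(unique(m$id_1))
  n_unique_2 <- length(unique(m$id_2))
  data.frame(
    tier          = m$tier[1],
    matches       = nrow(m),                         # matched pairs in the tier
    unique_1      = n_unique_1,                      # distinct data1 keys matched
    unique_2      = n_unique_2,                      # distinct data2 keys matched
    pct_matched_1 = 100 * n_unique_1 / nrow(data1),  # share of data1 rows matched
    pct_matched_2 = 100 * n_unique_2 / nrow(data2)   # share of data2 rows matched
  )
}

# Split by tier, summarise each piece, and stack the results.
evaluation <- do.call(rbind, lapply(split(matches, matches$tier), summarise_tier))
rownames(evaluation) <- NULL
print(evaluation)
```

With no tier column, the same helper could be applied once to the whole table, which corresponds to treating the entire dataset as one tier.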
The argument quality_vars allows for the calculation of averages of any columns in the dataset, by tier.
The most straightforward case would be a matchscore, which can itself be computed
in merge_plus with the scoring argument. This lets you see the average matchscore by tier.
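The per-tier average described above can be sketched in base R as follows. This is a hand-rolled illustration under assumed names (a matchscore column and a tier column on a toy table), not the code match_evaluate runs internally.

```r
# Toy matched table with a hypothetical matchscore column.
matches <- data.frame(
  tier       = c("exact", "exact", "fuzzy", "fuzzy"),
  matchscore = c(1.0, 1.0, 0.8, 0.6)
)

# Mean of the quality variable within each tier.
quality <- aggregate(matchscore ~ tier, data = matches, FUN = mean)
print(quality)
```

A low average in one tier compared with the others is a hint that the tier's matching rules may be too loose.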
merge_plus