clean_strings takes a string vector and cleans it according to user-given options.

clean_strings(
  string,
  sp_char_words = fedmatch::sp_char_words,
  common_words = NULL,
  remove_char = NULL,
  remove_words = FALSE,
  stem = FALSE
)

Arguments

string

character or character vector of strings

sp_char_words

data.frame. Data.frame where the first column is special characters and the second column is full words. The default is fedmatch::sp_char_words.

common_words

data.frame. Data.frame where first column is abbreviations and second column is full words.

remove_char

character vector. string of specific characters (for example, "letters") to be removed

remove_words

logical. If TRUE, removes all abbreviations and replacement words in common_words

stem

logical. If TRUE, words are stemmed

Value

cleaned strings

Details

This function takes a variety of options, each of which changes its behavior. With the default settings, clean_strings will do the following: make the string lowercase; replace the special characters &, $, %, @ with their names ("and", "dollar", "percent", "at"); convert tabs to spaces; and remove extra spaces. This default cleaning puts the strings in a standard format to allow for easier matching.
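The default steps can be approximated in base R as a rough sketch. The helper name default_clean_sketch and its character-to-word mapping are assumptions made for illustration; this is not the fedmatch implementation, and the real function may treat spacing and punctuation differently.

```r
# Hypothetical helper: approximates the default cleaning steps described above.
# It is NOT the fedmatch implementation.
default_clean_sketch <- function(string) {
  # Assumed mapping of special characters to their names
  sp_chars <- c("&" = "and", "$" = "dollar", "%" = "percent", "@" = "at")

  out <- tolower(string)                       # make the string lowercase
  for (ch in names(sp_chars)) {
    # fixed = TRUE matches the character literally rather than as a regex
    out <- gsub(ch, paste0(" ", sp_chars[[ch]], " "), out, fixed = TRUE)
  }
  out <- gsub("\t", " ", out, fixed = TRUE)    # convert tabs to spaces
  out <- gsub(" +", " ", out)                  # collapse runs of spaces
  trimws(out)                                  # drop leading/trailing spaces
}

default_clean_sketch(c("AT&T  Corp", "100%\tJuice"))
# [1] "at and t corp"     "100 percent juice"
```

Padding each replacement word with spaces and collapsing whitespace afterwards keeps words from running together when a special character sits between two letters.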

The other options allow for the removal or replacement of other words or characters.
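As an illustration of these options, the calls below pass a small abbreviation table through common_words, first to replace abbreviations and then to remove them. The table abbrevs and the input names are made up for this example, and the code only runs when the fedmatch package is installed; outputs are not shown because they depend on the installed version.

```r
# Runs only if fedmatch is available
if (requireNamespace("fedmatch", quietly = TRUE)) {
  # Hypothetical abbreviation table:
  # first column is abbreviations, second column is full words
  abbrevs <- data.frame(
    abbr = c("corp", "inc"),
    full = c("corporation", "incorporated"),
    stringsAsFactors = FALSE
  )

  names_raw <- c("Acme Corp", "Widgets & Gadgets Inc")

  # Replace abbreviations with their full words
  replaced <- fedmatch::clean_strings(names_raw, common_words = abbrevs)

  # Remove the abbreviations and replacement words instead
  removed <- fedmatch::clean_strings(
    names_raw,
    common_words = abbrevs,
    remove_words = TRUE
  )

  print(replaced)
  print(removed)
}
```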