clean_strings.Rd
clean_strings
Takes a character vector and cleans it according to user-specified options.
clean_strings(
string,
sp_char_words = fedmatch::sp_char_words,
common_words = NULL,
remove_char = NULL,
remove_words = FALSE,
stem = FALSE
)
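A minimal usage sketch, assuming the fedmatch package is installed (the call is skipped otherwise). The input strings are made up, and no output is shown because the result depends on the package's bundled sp_char_words table.

```r
# Runs only if fedmatch is installed; the input strings are illustrative
if (requireNamespace("fedmatch", quietly = TRUE)) {
  firms <- c("AT&T Inc.", "Crate\t&  Barrel", "50% Off @ Store")
  # Default options only: no common_words, no character removal, no stemming
  cleaned <- fedmatch::clean_strings(firms)
  print(cleaned)
}
```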
string: character or character vector of strings to clean.
sp_char_words: data.frame. Data.frame where the first column is special characters and the second column is the full words that replace them. The default is fedmatch::sp_char_words.
common_words: data.frame. Data.frame where the first column is abbreviations and the second column is full words.
remove_char: character vector. Specific characters (for example, "letters") to be removed.
remove_words: logical. If TRUE, removes all abbreviations and replacement words listed in common_words.
stem: logical. If TRUE, words are stemmed.
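As a rough illustration of remove_words and remove_char, the base-R sketch below drops every word that appears in either column of a common_words-style data.frame and then deletes the listed characters. It is not fedmatch's implementation: the helper drop_words_and_chars, the example table, whole-word matching on single spaces, and the order of the two steps are all assumptions made for illustration.

```r
# Hypothetical helper, NOT fedmatch code: imitates remove_words = TRUE and remove_char
drop_words_and_chars <- function(string, common_words = NULL, remove_char = NULL) {
  if (!is.null(common_words)) {
    # Entries from both columns (abbreviations and full words) are removed
    drop <- unique(c(as.character(common_words[[1]]), as.character(common_words[[2]])))
    string <- vapply(strsplit(string, " ", fixed = TRUE), function(words) {
      paste(words[!words %in% drop], collapse = " ")
    }, character(1))
  }
  for (ch in remove_char) {
    # fixed = TRUE treats each entry as a literal character, not a regex
    string <- gsub(ch, "", string, fixed = TRUE)
  }
  string
}

# Illustrative table: abbreviations in column 1, full words in column 2
common_words <- data.frame(abbr = c("co", "inc"), long = c("company", "incorporated"))
drop_words_and_chars(c("acme company inc", "widget-works co"), common_words, remove_char = "-")
# returns c("acme", "widgetworks")
```

Dropping words before characters matters here: with the opposite order, a token such as "co." would only match the table after its period had been removed.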
Value: the cleaned strings.
This function takes a variety of options, each of which changes its behavior.
With the default settings, clean_strings
will do the following:
make the string lowercase; replace the special characters &, $, %, and @ with their
names ("and", "dollar", "percent", "at"); convert tabs to spaces; and remove extra spaces.
This default cleaning puts the strings in a standard format to allow for easier matching.
The remaining options replace or remove other words or characters (common_words, remove_words, remove_char) or stem words (stem).
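The default steps listed above can be approximated in base R as follows. This is a hypothetical sketch, not fedmatch's code: the function name clean_default_sketch is invented, the replacement table is hard-coded rather than read from fedmatch::sp_char_words, and padding each replacement with spaces and trimming leading and trailing spaces are assumptions of the sketch.

```r
# Hypothetical sketch of the default cleaning; NOT fedmatch's implementation
clean_default_sketch <- function(string) {
  string <- tolower(string)
  # Spell out each special character, padded with spaces (an assumption)
  # so the name does not fuse with neighbouring words
  replacements <- c("&" = " and ", "$" = " dollar ", "%" = " percent ", "@" = " at ")
  for (ch in names(replacements)) {
    string <- gsub(ch, replacements[[ch]], string, fixed = TRUE)
  }
  string <- gsub("\t", " ", string, fixed = TRUE)  # tabs become spaces
  string <- gsub(" +", " ", string)                # collapse runs of spaces
  trimws(string)                                   # trim ends (assumed behavior)
}

clean_default_sketch(c("AT&T\tInc", "50% Off @ Store"))
# returns c("at and t inc", "50 percent off at store")
```

Using fixed = TRUE keeps "$" and "%" from being read as regular-expression metacharacters.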