src.dackar.text_processing.Preprocessing

Created in October 2022

@author: dgarrett622, wangc, mandd

Attributes

`textacyNormalize`
`textacyRemove`
`textacyReplace`
`numerizer`
`preprocessorDefaultList`
`preprocessorDefaultOptions`

Classes

Preprocessing

NLP Preprocessing class

Module Contents

src.dackar.text_processing.Preprocessing.textacyNormalize = ['bullet_points', 'hyphenated_words', 'quotation_marks', 'repeating_chars', 'unicode', 'whitespace']

src.dackar.text_processing.Preprocessing.textacyRemove = ['accents', 'brackets', 'html_tags', 'punctuation']

src.dackar.text_processing.Preprocessing.textacyReplace = ['currency_symbols', 'emails', 'emojis', 'hashtags', 'numbers', 'phone_numbers', 'urls', 'user_handles']

src.dackar.text_processing.Preprocessing.numerizer = ['numerize']
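The four lists above group preprocessor names by where they come from: the three textacy.preprocessing submodules, plus `numerize` from the separate numerizer package. A minimal sketch of how a name could be routed to its group, assuming only the list contents shown here; the helper `preprocessorGroup` is hypothetical and not part of DACKAR:

```python
# Group lists copied from the module attributes above.
textacyNormalize = ['bullet_points', 'hyphenated_words', 'quotation_marks',
                    'repeating_chars', 'unicode', 'whitespace']
textacyRemove = ['accents', 'brackets', 'html_tags', 'punctuation']
textacyReplace = ['currency_symbols', 'emails', 'emojis', 'hashtags',
                  'numbers', 'phone_numbers', 'urls', 'user_handles']
numerizer = ['numerize']


def preprocessorGroup(name: str) -> str:
    """Hypothetical helper: return the source module for a preprocessor name."""
    groups = {
        'textacy.preprocessing.normalize': textacyNormalize,
        'textacy.preprocessing.remove': textacyRemove,
        'textacy.preprocessing.replace': textacyReplace,
        'numerizer': numerizer,
    }
    for module, names in groups.items():
        if name in names:
            return module
    raise ValueError(f"Unknown preprocessor: {name!r}")


print(preprocessorGroup('urls'))      # textacy.preprocessing.replace
print(preprocessorGroup('numerize'))  # numerizer
```

Because the four lists are disjoint, each name maps to exactly one group.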

src.dackar.text_processing.Preprocessing.preprocessorDefaultList = ['bullet_points', 'hyphenated_words', 'quotation_marks', 'repeating_chars', 'whitespace',...

src.dackar.text_processing.Preprocessing.preprocessorDefaultOptions

class src.dackar.text_processing.Preprocessing.Preprocessing(preprocessorList=preprocessorDefaultList, preprocessorOptions=preprocessorDefaultOptions)

Bases: object

NLP Preprocessing class

functionList = []

preprocessorNames = ['bullet_points', 'hyphenated_words', 'quotation_marks', 'repeating_chars', 'unicode',...

pipeline

createTextacyNormalizeFunction(name, options)

Creates a function from textacy.preprocessing.normalize such that the only argument is a string and adds it to the functionList

Parameters:

name – str, name of the preprocessor
options – dict, dictionary of preprocessor options

Returns:

None
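The point of the wrapping is that every entry in functionList can later be called with a single string. A minimal sketch of that idea using `functools.partial`: `repeating_chars` below is a local stand-in written to mirror the keyword-only signature of textacy's `normalize.repeating_chars(text, *, chars, maxn=1)`, and `createNormalizeFunction` is a hypothetical simplification, not the method's actual source:

```python
import functools
import re
from typing import Callable, Dict, List


# Local stand-in mirroring textacy's normalize.repeating_chars(text, *, chars, maxn=1).
def repeating_chars(text: str, *, chars: str, maxn: int = 1) -> str:
    """Collapse runs of `chars` longer than `maxn` repeats down to `maxn` repeats."""
    pattern = "({}){{{},}}".format(re.escape(chars), maxn + 1)
    return re.sub(pattern, chars * maxn, text)


functionList: List[Callable[[str], str]] = []


def createNormalizeFunction(func: Callable[..., str], options: Dict) -> None:
    """Hypothetical sketch: bind the options so only `text` remains, then store it."""
    functionList.append(functools.partial(func, **options))


createNormalizeFunction(repeating_chars, {"chars": "!", "maxn": 1})
print(functionList[0]("Alarm!!!"))  # Alarm!
```

Binding the options once at construction time keeps the per-call interface uniform, so later code never needs to know which options each preprocessor takes.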

createTextacyRemoveFunction(name, options)

Creates a function from textacy.preprocessing.remove such that the only argument is a string and adds it to the functionList

Parameters:

name – str, name of the preprocessor
options – dict, dictionary of preprocessor options

Returns:

None
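Going by their descriptions, the normalize, remove, and replace factories differ mainly in which textacy submodule supplies the function. A minimal sketch of name-based lookup with `getattr`, assuming the method resolves `name` as an attribute of that submodule; a `types.SimpleNamespace` stands in for `textacy.preprocessing.remove`, and `_punctuation` is a simplified stand-in, not textacy's implementation:

```python
import string
import types
from typing import Callable, Dict, List


def _punctuation(text: str) -> str:
    """Simplified stand-in: replace each ASCII punctuation mark with a space."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return text.translate(table)


# Stand-in namespace playing the role of textacy.preprocessing.remove.
remove = types.SimpleNamespace(punctuation=_punctuation)

functionList: List[Callable[[str], str]] = []


def createRemoveFunction(name: str, options: Dict) -> None:
    """Hypothetical sketch: look up the function by name and wrap it to take one string."""
    func = getattr(remove, name)  # raises AttributeError for an unknown name

    def wrapped(text: str) -> str:
        return func(text, **options)

    functionList.append(wrapped)


createRemoveFunction("punctuation", {})
print(repr(functionList[0]("pump, valve; seal.")))  # 'pump  valve  seal '
```

A closure is used here instead of `functools.partial`; both produce a one-argument callable, and the closure makes the deferred option binding explicit.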

createTextacyReplaceFunction(name, options)

Creates a function from textacy.preprocessing.replace such that the only argument is a string and adds it to the functionList

Parameters:

name – str, name of the preprocessor
options – dict, dictionary of preprocessor options

Returns:

None

__call__(text)

Applies the configured preprocessors to the input text

Parameters:

text – str, string of text to preprocess

Returns:

processed – str, string of processed text
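A minimal sketch of what a call could look like, assuming the stored one-argument functions are applied in list order with each output feeding the next step; `applyPipeline` and the two lambda steps are hypothetical placeholders, not DACKAR preprocessors:

```python
import functools
from typing import Callable, List


def applyPipeline(functionList: List[Callable[[str], str]], text: str) -> str:
    """Apply each one-argument function in order, feeding each output to the next."""
    return functools.reduce(lambda processed, func: func(processed), functionList, text)


# Placeholder steps: collapse runs of whitespace, then lowercase.
steps = [lambda s: " ".join(s.split()), str.lower]
print(applyPipeline(steps, "  Pump   FAILED  "))  # pump failed
```

Under this assumption the order of preprocessorList matters. For example, stripping punctuation before replacing emails would break addresses apart before they could be matched.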