src.dackar.text_processing.Preprocessing

Created in October 2022

@author: dgarrett622, wangc, mandd

Attributes

textacyNormalize

textacyRemove

textacyReplace

numerizer

preprocessorDefaultList

preprocessorDefaultOptions

Classes

Preprocessing

NLP Preprocessing class

Module Contents

src.dackar.text_processing.Preprocessing.textacyNormalize = ['bullet_points', 'hyphenated_words', 'quotation_marks', 'repeating_chars', 'unicode', 'whitespace']
src.dackar.text_processing.Preprocessing.textacyRemove = ['accents', 'brackets', 'html_tags', 'punctuation']
src.dackar.text_processing.Preprocessing.textacyReplace = ['currency_symbols', 'emails', 'emojis', 'hashtags', 'numbers', 'phone_numbers', 'urls', 'user_handles']
src.dackar.text_processing.Preprocessing.numerizer = ['numerize']
src.dackar.text_processing.Preprocessing.preprocessorDefaultList = ['bullet_points', 'hyphenated_words', 'quotation_marks', 'repeating_chars', 'whitespace',...
src.dackar.text_processing.Preprocessing.preprocessorDefaultOptions
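The four name lists appear to group preprocessors by the module that implements them: textacy.preprocessing.normalize, textacy.preprocessing.remove, textacy.preprocessing.replace, and the numerizer package. A minimal sketch of looking up which group a name falls into, assuming the lists are used this way; preprocessor_category is a hypothetical helper and not part of DACKAR.

```python
# Category lists copied from the module attributes documented above
textacyNormalize = ['bullet_points', 'hyphenated_words', 'quotation_marks',
                    'repeating_chars', 'unicode', 'whitespace']
textacyRemove = ['accents', 'brackets', 'html_tags', 'punctuation']
textacyReplace = ['currency_symbols', 'emails', 'emojis', 'hashtags',
                  'numbers', 'phone_numbers', 'urls', 'user_handles']
numerizer = ['numerize']


def preprocessor_category(name: str) -> str:
    """Hypothetical helper: report which group a preprocessor name belongs to."""
    groups = {
        'normalize': textacyNormalize,
        'remove': textacyRemove,
        'replace': textacyReplace,
        'numerize': numerizer,
    }
    for category, names in groups.items():
        if name in names:
            return category
    # Names outside all four lists are not recognized preprocessors
    raise ValueError(f"Unknown preprocessor: {name!r}")
```

For example, preprocessor_category('urls') returns 'replace'.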
class src.dackar.text_processing.Preprocessing.Preprocessing(preprocessorList=preprocessorDefaultList, preprocessorOptions=preprocessorDefaultOptions)

Bases: object

NLP Preprocessing class

functionList = []
preprocessorNames = ['bullet_points', 'hyphenated_words', 'quotation_marks', 'repeating_chars', 'unicode',...
pipeline
createTextacyNormalizeFunction(name, options)

Creates a function from textacy.preprocessing.normalize such that the only argument is a string and adds it to the functionList

Parameters:
  • name – str, name of the preprocessor

  • options – dict, dictionary of preprocessor options

Returns:

None
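One common way to turn a keyword-configurable normalizer into a function whose only argument is a string is to bind the options with functools.partial. The sketch below assumes that approach and assumes options is a dict keyed by preprocessor name; neither is confirmed by this page. repeating_chars here is a stdlib stand-in modeled on textacy's normalizer of the same name, and create_normalize_function is a hypothetical analogue of createTextacyNormalizeFunction.

```python
import functools
import re
from typing import Callable, Dict, List


# Stand-in modeled on textacy.preprocessing.normalize.repeating_chars,
# so the sketch runs without textacy installed
def repeating_chars(text: str, *, chars: str, maxn: int = 1) -> str:
    """Collapse runs of `chars` longer than `maxn` down to `maxn` copies."""
    pattern = r"({}){{{},}}".format(re.escape(chars), maxn + 1)
    return re.sub(pattern, chars * maxn, text)


functionList: List[Callable[[str], str]] = []


def create_normalize_function(name: str, options: Dict[str, dict]) -> None:
    """Hypothetical analogue of createTextacyNormalizeFunction."""
    normalizers = {'repeating_chars': repeating_chars}
    # Assumption: options are keyed by preprocessor name
    kwargs = options.get(name, {})
    # Bind the keyword options so the stored callable takes only the text
    functionList.append(functools.partial(normalizers[name], **kwargs))
```

After create_normalize_function('repeating_chars', {'repeating_chars': {'chars': '.', 'maxn': 3}}), calling functionList[0]('Wait......') gives 'Wait...'.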

createTextacyRemoveFunction(name, options)

Creates a function from textacy.preprocessing.remove such that the only argument is a string and adds it to the functionList

Parameters:
  • name – str, name of the preprocessor

  • options – dict, dictionary of preprocessor options

Returns:

None
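Because the remove preprocessors share their names with functions in textacy.preprocessing.remove, a name can plausibly be resolved to a function with getattr on that module. The sketch assumes that lookup strategy; a types.SimpleNamespace stands in for the textacy module, and _brackets is a simplified, non-nesting stand-in for its brackets function. create_remove_function is hypothetical.

```python
import functools
import re
import types
from typing import Callable, List


def _brackets(text: str, *, only=None) -> str:
    """Simplified stand-in: delete (...), [...] and {...} spans, brackets included."""
    patterns = {
        'round': r"\([^()]*\)",
        'square': r"\[[^\[\]]*\]",
        'curly': r"\{[^{}]*\}",
    }
    # `only` may be one kind, a collection of kinds, or None for all kinds
    kinds = [only] if isinstance(only, str) else (only or list(patterns))
    for kind in kinds:
        text = re.sub(patterns[kind], "", text)
    return text


# Stand-in namespace playing the role of textacy.preprocessing.remove
remove = types.SimpleNamespace(brackets=_brackets)

functionList: List[Callable[[str], str]] = []


def create_remove_function(name: str, options: dict) -> None:
    """Hypothetical analogue of createTextacyRemoveFunction."""
    func = getattr(remove, name)  # resolve the preprocessor by its name
    functionList.append(functools.partial(func, **options.get(name, {})))
```

Removal leaves the surrounding spaces in place, so a whitespace normalizer is usually run afterwards.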

createTextacyReplaceFunction(name, options)

Creates a function from textacy.preprocessing.replace such that the only argument is a string and adds it to the functionList

Parameters:
  • name – str, name of the preprocessor

  • options – dict, dictionary of preprocessor options

Returns:

None
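The replace preprocessors swap matched entities such as emails or URLs for a placeholder token, and the options dict is the natural place to override that token. A minimal sketch, assuming options are keyed by preprocessor name and forwarded as keyword arguments; emails is a stdlib stand-in with a deliberately simple regex, and create_replace_function is hypothetical.

```python
import functools
import re
from typing import Callable, List

# Deliberately simple email pattern for the stand-in below
EMAIL_RE = re.compile(r"\S+@\S+\.\w+")


def emails(text: str, repl: str = "_EMAIL_") -> str:
    """Stand-in modeled on textacy.preprocessing.replace.emails."""
    return EMAIL_RE.sub(repl, text)


functionList: List[Callable[[str], str]] = []


def create_replace_function(name: str, options: dict) -> None:
    """Hypothetical analogue of createTextacyReplaceFunction."""
    replacers = {'emails': emails}
    # A user-supplied 'repl' overrides the stand-in's default placeholder
    functionList.append(functools.partial(replacers[name], **options.get(name, {})))
```

With no options the stored function writes '_EMAIL_'; passing {'emails': {'repl': '<EMAIL>'}} changes the token.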

__call__(text)

Applies the configured preprocessors to the text

Parameters:

text – str, string of text to preprocess

Returns:

str, string of processed text

Return type:

str
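Since every entry in functionList takes a single string and returns a string, calling the object can plausibly be read as feeding the text through each function in turn. A sketch of that composition, assuming sequential application in list order; apply_pipeline and collapse_whitespace are hypothetical names, not DACKAR API.

```python
import functools
import re
from typing import Callable, Iterable


def apply_pipeline(text: str, functions: Iterable[Callable[[str], str]]) -> str:
    """Feed the output of each str -> str function into the next, in order."""
    return functools.reduce(lambda acc, func: func(acc), functions, text)


# Hypothetical single-argument preprocessor standing in for a functionList entry
def collapse_whitespace(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip()


steps = [str.lower, collapse_whitespace]
```

Order matters: apply_pipeline("  Pump   FAILED\n on startup ", steps) yields "pump failed on startup", and reordering steps whose effects interact can change the result.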