src.dackar.text_processing.Preprocessing
Created in October 2022
@author: dgarrett622, wangc, mandd
Attributes
Classes
Preprocessing | NLP Preprocessing class
Module Contents
- src.dackar.text_processing.Preprocessing.textacyNormalize = ['bullet_points', 'hyphenated_words', 'quotation_marks', 'repeating_chars', 'unicode', 'whitespace']
- src.dackar.text_processing.Preprocessing.textacyRemove = ['accents', 'brackets', 'html_tags', 'punctuation']
- src.dackar.text_processing.Preprocessing.textacyReplace = ['currency_symbols', 'emails', 'emojis', 'hashtags', 'numbers', 'phone_numbers', 'urls', 'user_handles']
- src.dackar.text_processing.Preprocessing.preprocessorDefaultList = ['bullet_points', 'hyphenated_words', 'quotation_marks', 'repeating_chars', 'whitespace',...
- class src.dackar.text_processing.Preprocessing.Preprocessing(preprocessorList=preprocessorDefaultList, preprocessorOptions=preprocessorDefaultOptions)
Bases: object
NLP Preprocessing class
- preprocessorNames = ['bullet_points', 'hyphenated_words', 'quotation_marks', 'repeating_chars', 'unicode',...
- createTextacyNormalizeFunction(name, options)
Creates a function from textacy.preprocessing.normalize whose only argument is a string, and adds it to the functionList
- Parameters:
name – str, name of the preprocessor
options – dict, dictionary of preprocessor options
- Returns:
None
- createTextacyRemoveFunction(name, options)
Creates a function from textacy.preprocessing.remove whose only argument is a string, and adds it to the functionList
- Parameters:
name – str, name of the preprocessor
options – dict, dictionary of preprocessor options
- Returns:
None
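The pattern described by both create* methods (bind a preprocessor's options so that only the text argument remains, then append the result to functionList) can be sketched roughly as follows. This is a minimal illustration and not the module's actual implementation: PipelineSketch, createFunction, STAND_INS, and the two stand-in preprocessors are hypothetical names defined here so that the example runs without textacy installed, and the use of functools.partial is an assumption about one reasonable way to do the binding.

```python
import re
from functools import partial


def repeating_chars(text, *, chars, maxn=1):
    """Hypothetical stand-in for a normalize-style preprocessor with options."""
    # Collapse runs of `chars` longer than `maxn` down to `maxn` repetitions.
    pattern = "(?:{}){{{},}}".format(re.escape(chars), maxn + 1)
    return re.sub(pattern, lambda match: chars * maxn, text)


def whitespace(text):
    """Hypothetical stand-in for an option-free preprocessor."""
    # Collapse any run of whitespace to a single space and trim the ends.
    return " ".join(text.split())


# Hypothetical name-to-function lookup; the real class presumably resolves
# names against the textacy.preprocessing submodules instead (assumption).
STAND_INS = {"repeating_chars": repeating_chars, "whitespace": whitespace}


class PipelineSketch:
    """Illustrative only; not the dackar Preprocessing class."""

    def __init__(self):
        self.functionList = []

    def createFunction(self, name, options):
        # Bind the keyword options now so the stored callable takes only a string.
        func = STAND_INS[name]
        self.functionList.append(partial(func, **options))
        return None

    def __call__(self, text):
        # Apply the stored single-argument functions in insertion order.
        for func in self.functionList:
            text = func(text)
        return text


pipeline = PipelineSketch()
pipeline.createFunction("repeating_chars", {"chars": ".", "maxn": 1})
pipeline.createFunction("whitespace", {})
cleaned = pipeline("wait....   what")
```

Binding options up front keeps every entry in functionList uniform, so the pipeline can apply the entries in order without knowing which preprocessor each one wraps or what options it was given.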