src.dackar.utils.nlp.nlp_utils¶
Created on March, 2022
@author: wangc, mandd
Attributes¶
Functions¶
displayNER(doc[, includePunct])
    Generate data frame for visualization of spaCy doc with custom attributes.
resetPipeline(nlp, pipes)
    Remove all custom pipes and add new pipes.
printDepTree(doc[, skipPunct])
    Utility function to pretty print the dependency tree.
plotDAG(edges[, colors])
    Plot the directed acyclic graph (DAG) defined by the given edges.
extractLemma(var, nlp)
    Lemmatize the variable list.
generatePattern(form, label, id[, attr])
    Generate entity pattern.
generatePatternList(entList, label, id, nlp[, attr])
    Generate a list of entity patterns.
extendEnt(matcher, doc, i, matches)
    Extend the doc's entity.
Module Contents¶
- src.dackar.utils.nlp.nlp_utils.displayNER(doc, includePunct=False)[source]¶
Generate data frame for visualization of spaCy doc with custom attributes.
- Parameters:
doc – spacy.tokens.doc.Doc, the processed document using nlp pipelines
includePunct – bool, True if punctuation is included
- Returns:
pandas.DataFrame, data frame contains attributes of tokens
- Return type:
df
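A minimal usage sketch, assuming the module is importable under its documented path and that a small English spaCy model (here en_core_web_sm) is installed; the example sentence is illustrative:

    import spacy
    from src.dackar.utils.nlp.nlp_utils import displayNER

    nlp = spacy.load("en_core_web_sm")          # any preloaded pipeline works
    doc = nlp("The pump bearing showed excessive vibration during startup.")
    df = displayNER(doc, includePunct=False)    # one row of token attributes per token
    print(df.head())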
- src.dackar.utils.nlp.nlp_utils.resetPipeline(nlp, pipes)[source]¶
Remove all custom pipes and add new pipes
- Parameters:
nlp – spacy.Language object, contains all components and data needed to process text
pipes – list, list of pipes that will be added to nlp pipeline
- Returns:
spacy.Language object, contains updated components and data needed to process text
- Return type:
nlp
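A sketch of resetting the pipeline, assuming pipes is a list of registered component (factory) names; the component name used below is a spaCy built-in chosen purely for illustration:

    import spacy
    from src.dackar.utils.nlp.nlp_utils import resetPipeline

    nlp = spacy.load("en_core_web_sm")
    # Drop any previously added custom pipes and add the listed components;
    # in practice the list would name DACKAR's own registered components.
    nlp = resetPipeline(nlp, ["sentencizer"])
    print(nlp.pipe_names)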
- src.dackar.utils.nlp.nlp_utils.printDepTree(doc, skipPunct=True)[source]¶
Utility function to pretty print the dependency tree.
- Parameters:
doc – spacy.tokens.doc.Doc, the processed document using nlp pipelines
skipPunct – bool, True to skip punctuation
- Returns:
None
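A short sketch, assuming a standard spaCy pipeline provides the parse; the sentence is illustrative:

    import spacy
    from src.dackar.utils.nlp.nlp_utils import printDepTree

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("The operator replaced the faulty valve.")
    printDepTree(doc, skipPunct=True)   # prints the dependency tree to stdout, returns None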
- src.dackar.utils.nlp.nlp_utils.plotDAG(edges, colors='k')[source]¶
Plot the directed acyclic graph (DAG) defined by the given edges.
- Parameters:
edges – list of tuples, [(subj, conj), (..,..)] or [(subj, conj, {“color”:”blue”}), (..,..)]
colors – str or list, list of colors
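A sketch under the assumption that the edge tuples follow the documented [(subj, conj), ...] form; the node names are placeholders:

    from src.dackar.utils.nlp.nlp_utils import plotDAG

    # Two simple directed edges; a matplotlib figure is expected to be drawn.
    edges = [("pump", "impeller"), ("impeller", "wear")]
    plotDAG(edges, colors="k")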
- src.dackar.utils.nlp.nlp_utils.extractLemma(var, nlp)[source]¶
Lemmatize the variable list
- Parameters:
var – str, the variable string to be lemmatized
nlp – object, preloaded nlp model
- Returns:
list, list of lemmatized variables
- Return type:
lemVar
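A minimal sketch, assuming var is a plain string and nlp is a preloaded spaCy pipeline; the phrase is illustrative:

    import spacy
    from src.dackar.utils.nlp.nlp_utils import extractLemma

    nlp = spacy.load("en_core_web_sm")
    lemmas = extractLemma("pumps were leaking", nlp)
    print(lemmas)   # list of lemmatized forms of the input tokens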
- src.dackar.utils.nlp.nlp_utils.generatePattern(form, label, id, attr='LOWER')[source]¶
Generate entity pattern
- Parameters:
form – str or list, the given str or list of lemmas that will be used to generate pattern
label – str, the label name for the pattern
id – str, the id name for the pattern
attr – str, attribute used for the pattern, either “LOWER” or “LEMMA”
- Returns:
dict, the pattern that will be used by the entity matcher
- Return type:
pattern
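A sketch of building a single pattern; the form, label, and id values are arbitrary placeholders:

    from src.dackar.utils.nlp.nlp_utils import generatePattern

    # Match the surface form case-insensitively via the LOWER attribute.
    pattern = generatePattern("centrifugal pump", label="component", id="pump_1", attr="LOWER")
    print(pattern)   # dict consumable by an entity matcher / entity ruler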
- src.dackar.utils.nlp.nlp_utils.generatePatternList(entList, label, id, nlp, attr='LOWER')[source]¶
Generate a list of entity patterns
- Parameters:
entList – list, list of entities
label – str, the label name for the pattern
id – str, the id name for the pattern
nlp – spacy.Language object, contains all components and data needed to process text
attr – str, attribute used for the pattern, either “LOWER” or “LEMMA”
- Returns:
ptnList, list, list of patterns that will be used by the entity matcher
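A sketch of building patterns for several entities at once; the entity strings, label, and id are placeholders, and using the LEMMA attribute assumes the preloaded pipeline can lemmatize:

    import spacy
    from src.dackar.utils.nlp.nlp_utils import generatePatternList

    nlp = spacy.load("en_core_web_sm")
    ents = ["centrifugal pump", "heat exchanger", "valve"]
    patterns = generatePatternList(ents, label="component", id="ssc", nlp=nlp, attr="LEMMA")
    print(len(patterns))   # one or more patterns per listed entity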
- src.dackar.utils.nlp.nlp_utils.extendEnt(matcher, doc, i, matches)[source]¶
Extend the doc’s entity
- Parameters:
matcher – spacy.Matcher, the spacy matcher instance
doc – spacy.tokens.doc.Doc, the processed document using nlp pipelines
i – int, index of the current match (matches[i])
matches – List[Tuple[int, int, int]], a list of (match_id, start, end) tuples describing the matches; a match tuple describes the span doc[start:end]
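Given the (matcher, doc, i, matches) signature, this reads as a spaCy on_match callback; a sketch of registering it that way, where the rule name, token pattern, and sentence are illustrative assumptions:

    import spacy
    from spacy.matcher import Matcher
    from src.dackar.utils.nlp.nlp_utils import extendEnt

    nlp = spacy.load("en_core_web_sm")
    matcher = Matcher(nlp.vocab)
    # extendEnt is invoked for each match and extends doc.ents with the matched span.
    matcher.add("PUMP", [[{"LOWER": "pump"}]], on_match=extendEnt)
    doc = nlp("The pump tripped twice last week.")
    matcher(doc)
    print([(ent.text, ent.label_) for ent in doc.ents])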