src.dackar.pipelines.CustomPipelineComponents
Created on March, 2022
@author: wangc, mandd
Attributes
customLabel
Functions
normEntities(doc) | Normalize named entities by removing the leading article and trailing particle
initCoref(doc) | Initialize the coreference; assign text and label to the custom extensions ref_n and ref_t
aliasResolver(doc) | Look up aliases and store the result in alias
propagateEntType(doc) | Propagate the entity type stored in ref_t
anaphorCoref(doc) | Anaphora resolution using coreferee
anaphorEntCoref(doc) | Anaphora resolution using coreferee for entities
expandEntities(doc) | Expand the current entities; recursive function that extends an entity with all preceding NOUN tokens
mergeEntitiesWithSameID(doc) | Merge entities that share the same ID
mergePhrase(doc) | Expand the current entities
| Use pysbd as a sentencizer component for spaCy
Module Contents
- src.dackar.pipelines.CustomPipelineComponents.customLabel = ['STRUCTURE', 'COMPONENT', 'SYSTEM']
- src.dackar.pipelines.CustomPipelineComponents.normEntities(doc)
Normalize named entities by removing the leading article and trailing particle.
- Parameters:
doc – spacy.tokens.doc.Doc, the document processed by the nlp pipelines
- Returns:
spacy.tokens.doc.Doc, the document after named entities have been normalized
- Return type:
doc
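The normalization described for normEntities can be illustrated without spaCy. The following is a minimal sketch, not DACKAR's implementation: the helper name trim_entity, the (text, pos) token tuples, and the mapping of "article" to the DET tag and "particle" to the PART tag are assumptions made for illustration (the mergePhrase entry in this module mentions DET and PART).

```python
from typing import List, Tuple

# (text, coarse part-of-speech tag); hypothetical stand-in for a spaCy token
Token = Tuple[str, str]


def trim_entity(tokens: List[Token]) -> List[Token]:
    """Drop one leading DET token and one trailing PART token, if present.

    At least one token is always kept, so a single-token entity is
    returned unchanged.
    """
    start, end = 0, len(tokens)
    if end - start > 1 and tokens[start][1] == "DET":
        start += 1
    if end - start > 1 and tokens[end - 1][1] == "PART":
        end -= 1
    return tokens[start:end]


entity = [("the", "DET"), ("pump", "NOUN"), ("'s", "PART")]
print([text for text, _ in trim_entity(entity)])
```

In a real pipeline the same trimming would presumably be applied to each entity span of the Doc; how DACKAR handles multiple leading determiners or other edge cases is not stated in this page.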
- src.dackar.pipelines.CustomPipelineComponents.initCoref(doc)
Initialize the coreference by assigning text and label to the custom extensions ref_n and ref_t.
- Parameters:
doc – spacy.tokens.doc.Doc, the document processed by the nlp pipelines
- Returns:
spacy.tokens.doc.Doc, the document after coreference initialization
- Return type:
doc
- src.dackar.pipelines.CustomPipelineComponents.aliasResolver(doc)
Look up aliases and store the result in alias.
- Parameters:
doc – spacy.tokens.doc.Doc, the document processed by the nlp pipelines
- Returns:
spacy.tokens.doc.Doc, the document after the alias lookup
- Return type:
doc
- src.dackar.pipelines.CustomPipelineComponents.propagateEntType(doc)
Propagate the entity type stored in ref_t.
- Parameters:
doc – spacy.tokens.doc.Doc, the document processed by the nlp pipelines
- Returns:
spacy.tokens.doc.Doc, the document after entity type extension
- Return type:
doc
- src.dackar.pipelines.CustomPipelineComponents.anaphorCoref(doc)
Anaphora resolution using coreferee. This pipeline needs to be added after NER. The assumption is that the entities are recognized first; the initCoref pipeline is then called to assign the initial custom attributes ref_n and ref_t, followed by the aliasResolver pipeline to resolve all aliases used in the text. After these preprocessing steps, the anaphorCoref pipeline can be used to resolve the coreference.
- Parameters:
doc – spacy.tokens.doc.Doc, the document processed by the nlp pipelines
- Returns:
spacy.tokens.doc.Doc, the document after anaphora resolution using coreferee
- Return type:
doc
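The ordering described for anaphorCoref (NER, then initCoref, then aliasResolver, then anaphorCoref) can be sketched as a chain of callables. This is only an illustration of the documented order under stated assumptions: the dict standing in for a Doc, the placeholder function bodies, the run helper, and the precondition checks are all hypothetical and are not part of DACKAR or spaCy.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]  # hypothetical stand-in for spacy.tokens.doc.Doc
Step = Callable[[Doc], Doc]


def ner(doc: Doc) -> Doc:
    # Placeholder for named entity recognition.
    doc["ents"] = ["pump"]
    return doc


def init_coref(doc: Doc) -> Doc:
    # Placeholder for initCoref: needs entities, sets ref_n and ref_t.
    if "ents" not in doc:
        raise ValueError("initCoref must run after NER")
    doc["ref_n"], doc["ref_t"] = "pump", "COMPONENT"
    return doc


def alias_resolver(doc: Doc) -> Doc:
    # Placeholder for aliasResolver: runs after initCoref, sets alias.
    if "ref_n" not in doc:
        raise ValueError("aliasResolver must run after initCoref")
    doc["alias"] = None
    return doc


def anaphor_coref(doc: Doc) -> Doc:
    # Placeholder for anaphorCoref: runs after the alias lookup.
    if "alias" not in doc:
        raise ValueError("anaphorCoref must run after aliasResolver")
    doc["coref_resolved"] = True
    return doc


def run(steps: List[Step]) -> Doc:
    """Apply each step in order to a fresh document."""
    doc: Doc = {}
    for step in steps:
        doc = step(doc)
    return doc


result = run([ner, init_coref, alias_resolver, anaphor_coref])
print(sorted(result))
```

Calling run([ner, anaphor_coref]) raises ValueError in this sketch, which mirrors the requirement that the preprocessing pipelines come first. With spaCy itself, the same order would be set when the components are added to the nlp object.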
- src.dackar.pipelines.CustomPipelineComponents.anaphorEntCoref(doc)
Anaphora resolution using coreferee for entities. This pipeline needs to be added after NER. The assumption is that the entities are recognized first; the initCoref pipeline is then called to assign the initial custom attributes ref_n and ref_t, followed by the aliasResolver pipeline to resolve all aliases used in the text. After these preprocessing steps, the anaphorEntCoref pipeline can be used to resolve the coreference.
- Parameters:
doc – spacy.tokens.doc.Doc, the document processed by the nlp pipelines
- Returns:
spacy.tokens.doc.Doc, the document after anaphora resolution using coreferee
- Return type:
doc
- src.dackar.pipelines.CustomPipelineComponents.expandEntities(doc)
Expand the current entities; a recursive function that extends an entity with all preceding NOUN tokens.
- Parameters:
doc – spacy.tokens.doc.Doc, the document processed by the nlp pipelines
- Returns:
spacy.tokens.doc.Doc, the document after expansion of current entities
- Return type:
doc
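The recursive expansion described for expandEntities can be sketched on a plain list of part-of-speech tags. This is a minimal illustration, not DACKAR's code: the function name expand_left, the (start, end) span representation, and the restriction to the exact tag "NOUN" are assumptions.

```python
from typing import List, Tuple


def expand_left(pos_tags: List[str], start: int, end: int) -> Tuple[int, int]:
    """Recursively move `start` left while the preceding token is a NOUN.

    The span is half-open: it covers token indices start..end-1.
    """
    if start > 0 and pos_tags[start - 1] == "NOUN":
        return expand_left(pos_tags, start - 1, end)
    return start, end


# Tokens: "the", "feed", "water", "pump", "failed"
pos_tags = ["DET", "NOUN", "NOUN", "NOUN", "VERB"]

# The entity initially covers only "pump" (index 3).
print(expand_left(pos_tags, 3, 4))
```

Here the span grows from "pump" to "feed water pump" and stops at the determiner. How the actual component treats overlapping entities or other tags is not described in this page.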
- src.dackar.pipelines.CustomPipelineComponents.mergeEntitiesWithSameID(doc)
Merge entities that share the same ID.
- Parameters:
doc – spacy.tokens.doc.Doc, the document processed by the nlp pipelines
- Returns:
spacy.tokens.doc.Doc, the document after entities with the same ID are merged
- Return type:
doc
- src.dackar.pipelines.CustomPipelineComponents.mergePhrase(doc)
Expand the current entities. This method keeps DET or PART tokens; use the normEntities pipeline after this pipeline to remove them.
- Parameters:
doc – spacy.tokens.doc.Doc, the document processed by the nlp pipelines
- Returns:
spacy.tokens.doc.Doc, the document after phrases are merged
- Return type:
doc