src.dackar.pipelines.CustomPipelineComponents
Created on March, 2022
@author: wangc, mandd
Attributes
- customLabel
Functions
- normEntities(doc): Normalizing Named Entities, remove the leading article and trailing particle
- initCoref(doc): Initialize the coreference, assign text and label to custom extension ref_n and ref_t
- aliasResolver(doc): Lookup aliases and store result in alias
- propagateEntType(doc): propagate entity type stored in ref_t
- anaphorCoref(doc): Anaphora resolution using coreferee
- anaphorEntCoref(doc): Anaphora resolution using coreferee for Entities
- expandEntities(doc): Expand the current entities, recursive function to extend entity with all previous NOUN
- mergeEntitiesWithSameID(doc): Merge the same ID entities
- mergePhrase(doc): Expand the current entities
- Use pysbd as a sentencizer component for spacy
Module Contents
- src.dackar.pipelines.CustomPipelineComponents.customLabel = ['STRUCTURE', 'COMPONENT', 'SYSTEM']
 
- src.dackar.pipelines.CustomPipelineComponents.normEntities(doc)
 Normalize named entities by removing the leading article and the trailing particle.
- Parameters:
 doc – spacy.tokens.doc.Doc, the processed document using nlp pipelines
- Returns:
 spacy.tokens.doc.Doc, the document after the normalizing named entities
- Return type:
 doc
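The normalization described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the helper name strip_article_and_particle is hypothetical, and it assumes that "leading article" maps to a leading DET token and "trailing particle" to a trailing PART token. The document is built by hand so that no trained model is needed.

```python
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab


def strip_article_and_particle(doc):
    """Hypothetical stand-in for normEntities (assumed behavior)."""
    new_ents = []
    for ent in doc.ents:
        start, end = ent.start, ent.end
        # Drop a leading determiner such as "the", keeping at least one token.
        if end - start > 1 and doc[start].pos_ == "DET":
            start += 1
        # Drop a trailing particle such as the possessive "'s".
        if end - start > 1 and doc[end - 1].pos_ == "PART":
            end -= 1
        new_ents.append(Span(doc, start, end, label=ent.label_))
    doc.ents = new_ents
    return doc


# Hand-annotated document: one COMPONENT entity covering "The pump 's".
words = ["The", "pump", "'s", "seal", "failed"]
pos = ["DET", "NOUN", "PART", "NOUN", "VERB"]
ents = ["B-COMPONENT", "I-COMPONENT", "I-COMPONENT", "O", "O"]
doc = Doc(Vocab(), words=words, pos=pos, ents=ents)

doc = strip_article_and_particle(doc)
normalized = [(ent.text, ent.label_) for ent in doc.ents]
```

After the call, the entity span covers only the token "pump" and keeps its COMPONENT label.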
- src.dackar.pipelines.CustomPipelineComponents.initCoref(doc)
 Initialize the coreference by assigning the text and label to the custom extensions ref_n and ref_t.
- Parameters:
 doc – spacy.tokens.doc.Doc, the processed document using nlp pipelines
- Returns:
 spacy.tokens.doc.Doc, the document after initializing the coreference
- Return type:
 doc
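A minimal sketch of this kind of initialization is shown below. The extension names ref_n and ref_t come from the description above; everything else is an assumption made for illustration, in particular that the extensions are registered on Token and that the values are written to the first token of each entity. The helper name init_coref_sketch is hypothetical.

```python
from spacy.tokens import Doc, Token
from spacy.vocab import Vocab

# Register the custom extensions once (assumed to be Token extensions).
for name in ("ref_n", "ref_t"):
    if not Token.has_extension(name):
        Token.set_extension(name, default="")


def init_coref_sketch(doc):
    """Hypothetical stand-in for initCoref (assumed behavior)."""
    for ent in doc.ents:
        # Store the entity text and label on the entity's first token.
        ent[0]._.ref_n = ent.text
        ent[0]._.ref_t = ent.label_
    return doc


words = ["The", "pump", "failed"]
ents = ["O", "B-COMPONENT", "O"]
doc = init_coref_sketch(Doc(Vocab(), words=words, ents=ents))

pump_ref = (doc[1]._.ref_n, doc[1]._.ref_t)
```

Tokens outside any entity keep the empty-string default, which lets later components tell resolved tokens from unresolved ones.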
- src.dackar.pipelines.CustomPipelineComponents.aliasResolver(doc)
 Look up aliases and store the result in alias.
- Parameters:
 doc – spacy.tokens.doc.Doc, the processed document using nlp pipelines
- Returns:
 spacy.tokens.doc.Doc, the document after the alias lookup
- Return type:
 doc
- src.dackar.pipelines.CustomPipelineComponents.propagateEntType(doc)
 Propagate the entity type stored in ref_t.
- Parameters:
 doc – spacy.tokens.doc.Doc, the processed document using nlp pipelines
- Returns:
 spacy.tokens.doc.Doc, the document after entity type extension
- Return type:
 doc
- src.dackar.pipelines.CustomPipelineComponents.anaphorCoref(doc)
 Anaphora resolution using coreferee. This pipeline needs to be added after NER. The assumption is that the entities are recognized first; the pipeline initCoref is then called to assign the initial custom attributes ref_n and ref_t, and the pipeline aliasResolver is called to resolve all aliases used in the text. After these preprocessing steps, the anaphorCoref pipeline can be used to resolve the coreference.
- Parameters:
 doc – spacy.tokens.doc.Doc, the processed document using nlp pipelines
- Returns:
 spacy.tokens.doc.Doc, the document after the anaphora resolution using coreferee
- Return type:
 doc
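The required ordering can be sketched with placeholder components. The component names below (prefixed with sketch_) are hypothetical stand-ins that only record when they run; they are not the names under which this module registers its components, and no coreferee model is loaded.

```python
import spacy
from spacy.language import Language

# Order described above: NER first, then initCoref, aliasResolver, anaphorCoref.
ORDER = [
    "sketch_ner",
    "sketch_initCoref",
    "sketch_aliasResolver",
    "sketch_anaphorCoref",
]


def make_stub(name):
    """Register a placeholder component that records its own execution."""

    @Language.component(name)
    def stub(doc):
        doc.user_data.setdefault("trace", []).append(name)
        return doc

    return stub


for name in ORDER:
    make_stub(name)

nlp = spacy.blank("en")
for name in ORDER:
    # add_pipe appends to the end, so components run in insertion order.
    nlp.add_pipe(name)

doc = nlp("The pump failed. It was replaced.")
trace = doc.user_data["trace"]
```

In a real pipeline the same effect can be achieved with the before and after arguments of add_pipe when components are added out of order.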
- src.dackar.pipelines.CustomPipelineComponents.anaphorEntCoref(doc)
 Anaphora resolution using coreferee for entities. This pipeline needs to be added after NER. The assumption is that the entities are recognized first; the pipeline initCoref is then called to assign the initial custom attributes ref_n and ref_t, and the pipeline aliasResolver is called to resolve all aliases used in the text. After these preprocessing steps, the anaphorEntCoref pipeline can be used to resolve the coreference.
- Parameters:
 doc – spacy.tokens.doc.Doc, the processed document using nlp pipelines
- Returns:
 spacy.tokens.doc.Doc, the document after the anaphora resolution using coreferee
- Return type:
 doc
- src.dackar.pipelines.CustomPipelineComponents.expandEntities(doc)
 Expand the current entities: a recursive function that extends each entity with all preceding NOUN tokens.
- Parameters:
 doc – spacy.tokens.doc.Doc, the processed document using nlp pipelines
- Returns:
 spacy.tokens.doc.Doc, the document after expansion of current entities
- Return type:
 doc
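The recursive expansion can be sketched as follows. This is an illustration under stated assumptions rather than the module's code: the helper names are hypothetical, "previous NOUN" is taken to mean tokens tagged NOUN that directly precede the entity, and tokens already inside another entity are left alone.

```python
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab


def expand_left(doc, start):
    """Recursively move start left while the preceding token is a free NOUN."""
    if start > 0:
        prev = doc[start - 1]
        if prev.pos_ == "NOUN" and prev.ent_iob_ in ("O", ""):
            return expand_left(doc, start - 1)
    return start


def expand_entities_sketch(doc):
    """Hypothetical stand-in for expandEntities (assumed behavior)."""
    new_ents = []
    for ent in doc.ents:
        start = expand_left(doc, ent.start)
        new_ents.append(Span(doc, start, ent.end, label=ent.label_))
    doc.ents = new_ents
    return doc


words = ["The", "cooling", "water", "pump", "failed"]
pos = ["DET", "NOUN", "NOUN", "NOUN", "VERB"]
ents = ["O", "O", "O", "B-COMPONENT", "O"]
doc = expand_entities_sketch(Doc(Vocab(), words=words, pos=pos, ents=ents))

expanded = [(ent.text, ent.label_) for ent in doc.ents]
```

The expansion stops at "The" because it is a DET, so the entity grows from "pump" to "cooling water pump" and keeps its original label.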
- src.dackar.pipelines.CustomPipelineComponents.mergeEntitiesWithSameID(doc)
 Merge entities that share the same ID.
- Parameters:
 doc – spacy.tokens.doc.Doc, the processed document using nlp pipelines
- Returns:
 spacy.tokens.doc.Doc, the document after merging entities with the same ID
- Return type:
 doc
- src.dackar.pipelines.CustomPipelineComponents.mergePhrase(doc)
 Expand the current entities. This method keeps DET or PART tokens; use the pipeline normEntities after this pipeline to remove them.
- Parameters:
 doc – spacy.tokens.doc.Doc, the processed document using nlp pipelines
- Returns:
 spacy.tokens.doc.Doc, the document after merging phrases
- Return type:
 doc