src.dackar.pipelines.CustomPipelineComponents

Created in March 2022

@author: wangc, mandd

Attributes

logger

_

_

_

customLabel

aliasLookup

df

Functions

getEntID()

normEntities(doc)

Normalize named entities: remove the leading article and trailing particle

initCoref(doc)

Initialize coreference: assign text and label to the custom extensions ref_n and ref_t

aliasResolver(doc)

Look up aliases and store the result in the custom extension alias

propagateEntType(doc)

Propagate the entity type stored in ref_t

anaphorCoref(doc)

Anaphora resolution using coreferee

anaphorEntCoref(doc)

Anaphora resolution for entities using coreferee

expandEntities(doc)

Expand the current entities: a recursive function that extends each entity with all preceding NOUN tokens

mergeEntitiesWithSameID(doc)

Merge entities that share the same ID

mergePhrase(doc)

Expand the current entities

pysbdSentenceBoundaries(doc)

Use pysbd as a sentencizer component for spaCy

Module Contents

src.dackar.pipelines.CustomPipelineComponents.logger[source]
src.dackar.pipelines.CustomPipelineComponents._[source]
src.dackar.pipelines.CustomPipelineComponents._[source]
src.dackar.pipelines.CustomPipelineComponents._[source]
src.dackar.pipelines.CustomPipelineComponents.customLabel = ['STRUCTURE', 'COMPONENT', 'SYSTEM'][source]
src.dackar.pipelines.CustomPipelineComponents.aliasLookup[source]
src.dackar.pipelines.CustomPipelineComponents.df[source]
src.dackar.pipelines.CustomPipelineComponents.getEntID()[source]
src.dackar.pipelines.CustomPipelineComponents.normEntities(doc)[source]

Normalize named entities: remove the leading article and trailing particle

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the spaCy nlp pipeline

Returns:

the document after normalization of named entities

Return type:

spacy.tokens.doc.Doc
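
For orientation, a minimal sketch of the normalization idea follows. It is not the DACKAR implementation; it only illustrates trimming a leading article (DET) and a trailing particle (PART) from each entity span, and the example sentence is arbitrary.

    import spacy
    from spacy.tokens import Span

    nlp = spacy.load("en_core_web_sm")

    def trim_entities(doc):
        # Keep each entity span, minus a leading article (DET) and a trailing particle (PART).
        trimmed = []
        for ent in doc.ents:
            start, end = ent.start, ent.end
            if doc[start].pos_ == "DET":                     # e.g. leading "the"
                start += 1
            if end > start and doc[end - 1].pos_ == "PART":  # e.g. trailing "'s"
                end -= 1
            if end > start:
                trimmed.append(Span(doc, start, end, label=ent.label))
        doc.ents = trimmed
        return doc

    doc = trim_entities(nlp("The Mississippi River's delta was surveyed."))
    print([(ent.text, ent.label_) for ent in doc.ents])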

src.dackar.pipelines.CustomPipelineComponents.initCoref(doc)[source]

Initialize coreference: assign text and label to the custom extensions ref_n and ref_t

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the spaCy nlp pipeline

Returns:

the document after coreference initialization

Return type:

spacy.tokens.doc.Doc
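
A hedged sketch of the initialization step is shown below. The extension names ref_n and ref_t come from the docstring above; registering them on tokens (rather than spans) and the exact values copied are assumptions made for illustration.

    import spacy
    from spacy.tokens import Token

    # Custom extensions named in the docstring; defaults are illustrative.
    Token.set_extension("ref_n", default="", force=True)
    Token.set_extension("ref_t", default="", force=True)

    def init_coref(doc):
        for ent in doc.ents:
            for token in ent:
                token._.ref_n = ent.text      # surface form the token refers to
                token._.ref_t = ent.label_    # entity type label
        return doc

    nlp = spacy.load("en_core_web_sm")
    doc = init_coref(nlp("The centrifugal pump was inspected by John Smith."))
    print([(t.text, t._.ref_n, t._.ref_t) for t in doc if t._.ref_n])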

src.dackar.pipelines.CustomPipelineComponents.aliasResolver(doc)[source]

Look up aliases and store the result in the custom extension alias

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the spaCy nlp pipeline

Returns:

the document after alias lookup

Return type:

spacy.tokens.doc.Doc
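
The sketch below shows one plausible reading of this step, assuming aliasLookup behaves like a mapping from surface forms to canonical names and that the result lands in a span extension named alias. The alias_lookup table here is a hypothetical stand-in, not the module's actual data.

    import spacy
    from spacy.tokens import Span

    Span.set_extension("alias", default=None, force=True)

    # Hypothetical stand-in for the module-level aliasLookup table.
    alias_lookup = {"CWP": "circulating water pump"}

    def resolve_aliases(doc):
        for ent in doc.ents:
            ent._.alias = alias_lookup.get(ent.text)  # None when no alias is known
        return doc

    nlp = spacy.load("en_core_web_sm")
    doc = resolve_aliases(nlp("The CWP tripped twice last week."))
    print([(ent.text, ent._.alias) for ent in doc.ents])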

src.dackar.pipelines.CustomPipelineComponents.propagateEntType(doc)[source]

Propagate the entity type stored in ref_t

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the spaCy nlp pipeline

Returns:

the document after entity type propagation

Return type:

spacy.tokens.doc.Doc

src.dackar.pipelines.CustomPipelineComponents.anaphorCoref(doc)[source]

Anaphora resolution using coreferee. This pipeline needs to be added after NER. The assumption is that entities are recognized first, the initCoref pipeline is then called to assign the initial custom attributes ref_n and ref_t, and the aliasResolver pipeline is called to resolve all aliases used in the text. After these pre-processing steps, the anaphorCoref pipeline can be used to resolve the coreferences.

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the spaCy nlp pipeline

Returns:

the document after anaphora resolution using coreferee

Return type:

spacy.tokens.doc.Doc
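
The ordering described above can be expressed as a pipeline-assembly sketch. The component names passed to add_pipe are assumed to match the function names in this module (verify against the actual registration); coreferee is added as its own component, as that library documents.

    import spacy
    import coreferee  # registers the "coreferee" factory with spaCy

    nlp = spacy.load("en_core_web_lg")
    nlp.add_pipe("coreferee")        # coreference chains (doc._.coref_chains)
    nlp.add_pipe("initCoref")        # after NER: seed ref_n / ref_t
    nlp.add_pipe("aliasResolver")    # resolve aliases used in the text
    nlp.add_pipe("anaphorCoref")     # finally, resolve anaphora

    doc = nlp("The pump was repaired. It failed again a month later.")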

src.dackar.pipelines.CustomPipelineComponents.anaphorEntCoref(doc)[source]

Anaphora resolution for entities using coreferee. This pipeline needs to be added after NER. The assumption is that entities are recognized first, the initCoref pipeline is then called to assign the initial custom attributes ref_n and ref_t, and the aliasResolver pipeline is called to resolve all aliases used in the text. After these pre-processing steps, the anaphorEntCoref pipeline can be used to resolve the coreferences.

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the spaCy nlp pipeline

Returns:

the document after anaphora resolution using coreferee

Return type:

spacy.tokens.doc.Doc

src.dackar.pipelines.CustomPipelineComponents.expandEntities(doc)[source]

Expand the current entities: a recursive function that extends each entity with all preceding NOUN tokens

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the spaCy nlp pipeline

Returns:

the document after expansion of the current entities

Return type:

spacy.tokens.doc.Doc
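
A simplified, non-recursive sketch of the expansion idea follows: each entity grows to the left over preceding NOUN tokens, so that "feedwater pump" is captured rather than just "pump". It is an illustration, not the DACKAR routine.

    import spacy
    from spacy.tokens import Span
    from spacy.util import filter_spans

    nlp = spacy.load("en_core_web_sm")

    def expand_entities(doc):
        expanded = []
        for ent in doc.ents:
            start = ent.start
            while start > 0 and doc[start - 1].pos_ == "NOUN":
                start -= 1                  # absorb the preceding noun
            expanded.append(Span(doc, start, ent.end, label=ent.label))
        # filter_spans keeps the longest non-overlapping spans if expansions collide.
        doc.ents = filter_spans(expanded)
        return doc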

src.dackar.pipelines.CustomPipelineComponents.mergeEntitiesWithSameID(doc)[source]

Merge entities that share the same ID

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the spaCy nlp pipeline

Returns:

the document after merging entities with the same ID

Return type:

spacy.tokens.doc.Doc

src.dackar.pipelines.CustomPipelineComponents.mergePhrase(doc)[source]

Expand the current entities. This method keeps DET or PART tokens; apply the normEntities pipeline after this one to remove them.

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the spaCy nlp pipeline

Returns:

the document after phrase merging

Return type:

spacy.tokens.doc.Doc
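
The sketch below assumes the phrase merging behaves like spaCy's standard noun-chunk merging; the actual component may differ. It shows why determiners survive the merge and why normEntities is applied afterwards to strip them.

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def merge_phrases(doc):
        with doc.retokenize() as retokenizer:
            for chunk in doc.noun_chunks:
                retokenizer.merge(chunk)   # keeps a leading DET such as "The"
        return doc

    doc = merge_phrases(nlp("The auxiliary feedwater pump showed excessive vibration."))
    print([t.text for t in doc])  # "The auxiliary feedwater pump" is now one token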

src.dackar.pipelines.CustomPipelineComponents.pysbdSentenceBoundaries(doc)[source]

Use pysbd as a sentencizer component for spaCy

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the spaCy nlp pipeline

Returns:

the document after sentence boundary detection

Return type:

spacy.tokens.doc.Doc
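
A minimal sketch of a pysbd-based sentencizer is shown below. The component name pysbd_sentencizer and the mapping of character spans onto token sentence starts are illustrative choices, not necessarily how this module registers or implements the component.

    import pysbd
    import spacy
    from spacy.language import Language

    @Language.component("pysbd_sentencizer")
    def pysbd_sentence_boundaries(doc):
        seg = pysbd.Segmenter(language="en", clean=False, char_span=True)
        # Map pysbd's character spans back onto token-level sentence starts.
        spans = [doc.char_span(s.start, s.end, alignment_mode="contract")
                 for s in seg.segment(doc.text)]
        start_ids = {span[0].idx for span in spans if span is not None and len(span) > 0}
        for token in doc:
            token.is_sent_start = token.idx in start_ids
        return doc

    nlp = spacy.blank("en")
    nlp.add_pipe("pysbd_sentencizer")
    doc = nlp("Pump 1A failed on Jan. 5. It was repaired by Feb. 2.")
    print([sent.text for sent in doc.sents])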