src.dackar.pipelines.CustomPipelineComponents

Created in March 2022

@author: wangc, mandd

Attributes

logger

_

_

_

customLabel

aliasLookup

df

Functions

getEntID()

normEntities(doc)

Normalize named entities: remove the leading article and trailing particle

initCoref(doc)

Initialize coreference: assign text and label to the custom extensions ref_n and ref_t

aliasResolver(doc)

Look up aliases and store the result in the custom extension alias

propagateEntType(doc)

Propagate the entity type stored in ref_t

anaphorCoref(doc)

Anaphora resolution using coreferee

anaphorEntCoref(doc)

Anaphora resolution for entities using coreferee

expandEntities(doc)

Expand the current entities: a recursive function that extends each entity with all preceding NOUN tokens

mergeEntitiesWithSameID(doc)

Merge entities that share the same ID

mergePhrase(doc)

Expand the current entities

pysbdSentenceBoundaries(doc)

Use pysbd as a sentencizer component for spaCy

Module Contents

src.dackar.pipelines.CustomPipelineComponents.logger[source]
src.dackar.pipelines.CustomPipelineComponents._[source]
src.dackar.pipelines.CustomPipelineComponents._[source]
src.dackar.pipelines.CustomPipelineComponents._[source]
src.dackar.pipelines.CustomPipelineComponents.customLabel = ['STRUCTURE', 'COMPONENT', 'SYSTEM'][source]
src.dackar.pipelines.CustomPipelineComponents.aliasLookup[source]
src.dackar.pipelines.CustomPipelineComponents.df[source]
src.dackar.pipelines.CustomPipelineComponents.getEntID()[source]
src.dackar.pipelines.CustomPipelineComponents.normEntities(doc)[source]

Normalize named entities: remove the leading article and trailing particle

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the spaCy nlp pipeline

Returns:

the document after normalization of named entities

Return type:

spacy.tokens.doc.Doc
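
For orientation, a minimal sketch of the normalization idea follows. It is not the DACKAR implementation; it only illustrates trimming a leading article (DET) and a trailing particle (PART) from each entity span, and the example sentence is arbitrary.

    import spacy
    from spacy.tokens import Span

    nlp = spacy.load("en_core_web_sm")

    def trim_entities(doc):
        # Keep each entity span, minus a leading article (DET) and a trailing particle (PART).
        trimmed = []
        for ent in doc.ents:
            start, end = ent.start, ent.end
            if doc[start].pos_ == "DET":                     # e.g. leading "the"
                start += 1
            if end > start and doc[end - 1].pos_ == "PART":  # e.g. trailing "'s"
                end -= 1
            if end > start:
                trimmed.append(Span(doc, start, end, label=ent.label))
        doc.ents = trimmed
        return doc

    doc = trim_entities(nlp("The Mississippi River's delta was surveyed."))
    print([(ent.text, ent.label_) for ent in doc.ents])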

src.dackar.pipelines.CustomPipelineComponents.initCoref(doc)[source]

Initialize coreference: assign text and label to the custom extensions ref_n and ref_t

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the spaCy nlp pipeline

Returns:

the document after coreference initialization

Return type:

spacy.tokens.doc.Doc
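
A hedged sketch of the initialization step is shown below. The extension names ref_n and ref_t come from the docstring above; registering them on tokens (rather than spans) and the exact values copied are assumptions made for illustration.

    import spacy
    from spacy.tokens import Token

    # Custom extensions named in the docstring; defaults are illustrative.
    Token.set_extension("ref_n", default="", force=True)
    Token.set_extension("ref_t", default="", force=True)

    def init_coref(doc):
        for ent in doc.ents:
            for token in ent:
                token._.ref_n = ent.text      # surface form the token refers to
                token._.ref_t = ent.label_    # entity type label
        return doc

    nlp = spacy.load("en_core_web_sm")
    doc = init_coref(nlp("The centrifugal pump was inspected by John Smith."))
    print([(t.text, t._.ref_n, t._.ref_t) for t in doc if t._.ref_n])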

src.dackar.pipelines.CustomPipelineComponents.aliasResolver(doc)[source]

Look up aliases and store the result in the custom extension alias

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the spaCy nlp pipeline

Returns:

the document after alias lookup

Return type:

spacy.tokens.doc.Doc
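
The sketch below shows one plausible reading of this step, assuming aliasLookup behaves like a mapping from surface forms to canonical names and that the result lands in a span extension named alias. The alias_lookup table here is a hypothetical stand-in, not the module's actual data.

    import spacy
    from spacy.tokens import Span

    Span.set_extension("alias", default=None, force=True)

    # Hypothetical stand-in for the module-level aliasLookup table.
    alias_lookup = {"CWP": "circulating water pump"}

    def resolve_aliases(doc):
        for ent in doc.ents:
            ent._.alias = alias_lookup.get(ent.text)  # None when no alias is known
        return doc

    nlp = spacy.load("en_core_web_sm")
    doc = resolve_aliases(nlp("The CWP tripped twice last week."))
    print([(ent.text, ent._.alias) for ent in doc.ents])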

src.dackar.pipelines.CustomPipelineComponents.propagateEntType(doc)[source]

Propagate the entity type stored in ref_t

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the spaCy nlp pipeline

Returns:

the document after entity type propagation

Return type:

spacy.tokens.doc.Doc

src.dackar.pipelines.CustomPipelineComponents.anaphorCoref(doc)[source]

Anaphora resolution using coreferee. This pipeline needs to be added after NER. The assumption is that entities are recognized first, the initCoref pipeline is then called to assign the initial custom attributes ref_n and ref_t, and the aliasResolver pipeline is called to resolve all aliases used in the text. After these pre-processing steps, the anaphorCoref pipeline can be used to resolve the coreferences.

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the spaCy nlp pipeline

Returns:

the document after anaphora resolution using coreferee

Return type:

spacy.tokens.doc.Doc
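
The ordering described above can be expressed as a pipeline-assembly sketch. The component names passed to add_pipe are assumed to match the function names in this module (verify against the actual registration); coreferee is added as its own component, as that library documents.

    import spacy
    import coreferee  # registers the "coreferee" factory with spaCy

    nlp = spacy.load("en_core_web_lg")
    nlp.add_pipe("coreferee")        # coreference chains (doc._.coref_chains)
    nlp.add_pipe("initCoref")        # after NER: seed ref_n / ref_t
    nlp.add_pipe("aliasResolver")    # resolve aliases used in the text
    nlp.add_pipe("anaphorCoref")     # finally, resolve anaphora

    doc = nlp("The pump was repaired. It failed again a month later.")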

src.dackar.pipelines.CustomPipelineComponents.anaphorEntCoref(doc)[source]

Anaphora resolution for entities using coreferee. This pipeline needs to be added after NER. The assumption is that entities are recognized first, the initCoref pipeline is then called to assign the initial custom attributes ref_n and ref_t, and the aliasResolver pipeline is called to resolve all aliases used in the text. After these pre-processing steps, the anaphorEntCoref pipeline can be used to resolve the coreferences.

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the spaCy nlp pipeline

Returns:

the document after anaphora resolution using coreferee

Return type:

spacy.tokens.doc.Doc

src.dackar.pipelines.CustomPipelineComponents.expandEntities(doc)[source]

Expand the current entities: a recursive function that extends each entity with all preceding NOUN tokens

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the spaCy nlp pipeline

Returns:

the document after expansion of the current entities

Return type:

spacy.tokens.doc.Doc
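
A simplified, non-recursive sketch of the expansion idea follows: each entity grows to the left over preceding NOUN tokens, so that "feedwater pump" is captured rather than just "pump". It is an illustration, not the DACKAR routine.

    import spacy
    from spacy.tokens import Span
    from spacy.util import filter_spans

    nlp = spacy.load("en_core_web_sm")

    def expand_entities(doc):
        expanded = []
        for ent in doc.ents:
            start = ent.start
            while start > 0 and doc[start - 1].pos_ == "NOUN":
                start -= 1                  # absorb the preceding noun
            expanded.append(Span(doc, start, ent.end, label=ent.label))
        # filter_spans keeps the longest non-overlapping spans if expansions collide.
        doc.ents = filter_spans(expanded)
        return doc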

src.dackar.pipelines.CustomPipelineComponents.mergeEntitiesWithSameID(doc)[source]

Merge entities that share the same ID

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the spaCy nlp pipeline

Returns:

the document after merging entities with the same ID

Return type:

spacy.tokens.doc.Doc

src.dackar.pipelines.CustomPipelineComponents.mergePhrase(doc)[source]

Expand the current entities. This method keeps DET or PART tokens; apply the normEntities pipeline after this one to remove them.

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the spaCy nlp pipeline

Returns:

the document after phrase merging

Return type:

spacy.tokens.doc.Doc
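
The sketch below assumes the phrase merging behaves like spaCy's standard noun-chunk merging; the actual component may differ. It shows why determiners survive the merge and why normEntities is applied afterwards to strip them.

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def merge_phrases(doc):
        with doc.retokenize() as retokenizer:
            for chunk in doc.noun_chunks:
                retokenizer.merge(chunk)   # keeps a leading DET such as "The"
        return doc

    doc = merge_phrases(nlp("The auxiliary feedwater pump showed excessive vibration."))
    print([t.text for t in doc])  # "The auxiliary feedwater pump" is now one token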

src.dackar.pipelines.CustomPipelineComponents.pysbdSentenceBoundaries(doc)[source]

Use pysbd as a sentencizer component for spaCy

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the spaCy nlp pipeline

Returns:

the document after sentence boundary detection

Return type:

spacy.tokens.doc.Doc
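
A minimal sketch of a pysbd-based sentencizer is shown below. The component name pysbd_sentencizer and the mapping of character spans onto token sentence starts are illustrative choices, not necessarily how this module registers or implements the component.

    import pysbd
    import spacy
    from spacy.language import Language

    @Language.component("pysbd_sentencizer")
    def pysbd_sentence_boundaries(doc):
        seg = pysbd.Segmenter(language="en", clean=False, char_span=True)
        # Map pysbd's character spans back onto token-level sentence starts.
        spans = [doc.char_span(s.start, s.end, alignment_mode="contract")
                 for s in seg.segment(doc.text)]
        start_ids = {span[0].idx for span in spans if span is not None and len(span) > 0}
        for token in doc:
            token.is_sent_start = token.idx in start_ids
        return doc

    nlp = spacy.blank("en")
    nlp.add_pipe("pysbd_sentencizer")
    doc = nlp("Pump 1A failed on Jan. 5. It was repaired by Feb. 2.")
    print([sent.text for sent in doc.sents])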