src.dackar.pipelines.CustomPipelineComponents
=============================================

.. py:module:: src.dackar.pipelines.CustomPipelineComponents

.. autoapi-nested-parse::

   Created on March, 2022

   @author: wangc, mandd


Attributes
----------

.. autoapisummary::

   src.dackar.pipelines.CustomPipelineComponents.logger
   src.dackar.pipelines.CustomPipelineComponents._
   src.dackar.pipelines.CustomPipelineComponents._
   src.dackar.pipelines.CustomPipelineComponents._
   src.dackar.pipelines.CustomPipelineComponents.customLabel
   src.dackar.pipelines.CustomPipelineComponents.aliasLookup
   src.dackar.pipelines.CustomPipelineComponents.df


Functions
---------

.. autoapisummary::

   src.dackar.pipelines.CustomPipelineComponents.getEntID
   src.dackar.pipelines.CustomPipelineComponents.normEntities
   src.dackar.pipelines.CustomPipelineComponents.initCoref
   src.dackar.pipelines.CustomPipelineComponents.aliasResolver
   src.dackar.pipelines.CustomPipelineComponents.propagateEntType
   src.dackar.pipelines.CustomPipelineComponents.anaphorCoref
   src.dackar.pipelines.CustomPipelineComponents.anaphorEntCoref
   src.dackar.pipelines.CustomPipelineComponents.expandEntities
   src.dackar.pipelines.CustomPipelineComponents.mergeEntitiesWithSameID
   src.dackar.pipelines.CustomPipelineComponents.mergePhrase
   src.dackar.pipelines.CustomPipelineComponents.pysbdSentenceBoundaries


Module Contents
---------------

.. py:data:: logger

.. py:data:: _

.. py:data:: _

.. py:data:: _

.. py:data:: customLabel
   :value: ['STRUCTURE', 'COMPONENT', 'SYSTEM']

.. py:data:: aliasLookup

.. py:data:: df

.. py:function:: getEntID()

.. py:function:: normEntities(doc)

   Normalize named entities by removing the leading article and the trailing particle.

   :param doc: spacy.tokens.doc.Doc, the document processed by the nlp pipeline
   :returns: spacy.tokens.doc.Doc, the document after normalizing the named entities
   :rtype: doc

.. py:function:: initCoref(doc)

   Initialize the coreference by assigning the entity text and label to the custom extensions ``ref_n`` and ``ref_t``, respectively.

   :param doc: spacy.tokens.doc.Doc, the document processed by the nlp pipeline
   :returns: spacy.tokens.doc.Doc, the document after initializing the coreference
   :rtype: doc

.. py:function:: aliasResolver(doc)

   Look up aliases and store the result in ``alias``.

   :param doc: spacy.tokens.doc.Doc, the document processed by the nlp pipeline
   :returns: spacy.tokens.doc.Doc, the document after the alias lookup
   :rtype: doc

.. py:function:: propagateEntType(doc)

   Propagate the entity type stored in ``ref_t``.

   :param doc: spacy.tokens.doc.Doc, the document processed by the nlp pipeline
   :returns: spacy.tokens.doc.Doc, the document after propagating the entity type
   :rtype: doc

.. py:function:: anaphorCoref(doc)

   Anaphora resolution using coreferee.

   This pipeline component must be added after NER. It assumes the following processing order:

   1. Entities are recognized by NER.
   2. The ``initCoref`` component assigns the initial custom attributes ``ref_n`` and ``ref_t``.
   3. The ``aliasResolver`` component resolves all aliases used in the text.
   4. The ``anaphorCoref`` component resolves the coreference.

   :param doc: spacy.tokens.doc.Doc, the document processed by the nlp pipeline
   :returns: spacy.tokens.doc.Doc, the document after anaphora resolution using coreferee
   :rtype: doc

.. py:function:: anaphorEntCoref(doc)

   Anaphora resolution for entities using coreferee.

   This pipeline component must be added after NER. It assumes the following processing order:

   1. Entities are recognized by NER.
   2. The ``initCoref`` component assigns the initial custom attributes ``ref_n`` and ``ref_t``.
   3. The ``aliasResolver`` component resolves all aliases used in the text.
   4. The ``anaphorEntCoref`` component resolves the coreference.
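
   As a hedged illustration of this ordering, the sketch below wires three
   hypothetical stand-in components (``initCorefSketch``,
   ``aliasResolverSketch`` and ``anaphorEntCorefSketch``; none of these names
   come from DACKAR) into a blank English spaCy v3 pipeline. An
   ``entity_ruler`` takes the place of a trained NER model, storing
   ``ref_n``/``ref_t`` as token extensions is an assumption, and the last stub
   does not call coreferee, which the documented component relies on together
   with a trained pipeline.

   .. code-block:: python

      import spacy
      from spacy.language import Language
      from spacy.tokens import Token

      # Hypothetical token-level extensions named after ref_n / ref_t;
      # force=True lets the snippet be re-run in the same session.
      Token.set_extension("ref_n", default=None, force=True)
      Token.set_extension("ref_t", default=None, force=True)


      def _trace(doc, name):
          # Record the order in which the stub components run.
          doc.user_data.setdefault("trace", []).append(name)


      @Language.component("initCorefSketch")
      def init_coref_sketch(doc):
          # Seed every entity token with its entity text and label.
          for ent in doc.ents:
              for token in ent:
                  token._.ref_n = ent.text
                  token._.ref_t = ent.label_
          _trace(doc, "initCorefSketch")
          return doc


      @Language.component("aliasResolverSketch")
      def alias_resolver_sketch(doc):
          # Placeholder: a real resolver would look up aliases here.
          _trace(doc, "aliasResolverSketch")
          return doc


      @Language.component("anaphorEntCorefSketch")
      def anaphor_ent_coref_sketch(doc):
          # Placeholder: the documented component uses coreferee at this step.
          _trace(doc, "anaphorEntCorefSketch")
          return doc


      nlp = spacy.blank("en")
      # An EntityRuler stands in for a trained NER component.
      ruler = nlp.add_pipe("entity_ruler")
      ruler.add_patterns([{"label": "COMPONENT", "pattern": "pump"}])

      # NER first, then reference seeding, alias resolution, anaphora resolution.
      nlp.add_pipe("initCorefSketch", after="entity_ruler")
      nlp.add_pipe("aliasResolverSketch", after="initCorefSketch")
      nlp.add_pipe("anaphorEntCorefSketch", after="aliasResolverSketch")

      doc = nlp("The pump failed because it was not maintained.")
      print(nlp.pipe_names)
      print(doc.user_data["trace"])
      print([(t.text, t._.ref_n, t._.ref_t) for t in doc if t._.ref_n])

   Passing ``after=`` to ``nlp.add_pipe`` makes the required order explicit,
   so each stub sees the annotations written by the component before it.
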

   :param doc: spacy.tokens.doc.Doc, the document processed by the nlp pipeline
   :returns: spacy.tokens.doc.Doc, the document after anaphora resolution using coreferee
   :rtype: doc

.. py:function:: expandEntities(doc)

   Expand the current entities. This recursive function extends each entity with all preceding ``NOUN`` tokens.

   :param doc: spacy.tokens.doc.Doc, the document processed by the nlp pipeline
   :returns: spacy.tokens.doc.Doc, the document after expanding the current entities
   :rtype: doc

.. py:function:: mergeEntitiesWithSameID(doc)

   Merge entities that share the same ID.

   :param doc: spacy.tokens.doc.Doc, the document processed by the nlp pipeline
   :returns: spacy.tokens.doc.Doc, the document after merging entities with the same ID
   :rtype: doc

.. py:function:: mergePhrase(doc)

   Expand the current entities.

   This method keeps ``DET`` and ``PART`` tokens; add the ``normEntities`` component after this one to remove them.

   :param doc: spacy.tokens.doc.Doc, the document processed by the nlp pipeline
   :returns: spacy.tokens.doc.Doc, the document after merging phrases
   :rtype: doc

.. py:function:: pysbdSentenceBoundaries(doc)

   Use pysbd as a sentencizer component for spaCy.

   :param doc: spacy.tokens.doc.Doc, the document processed by the nlp pipeline
   :returns: spacy.tokens.doc.Doc, the document with sentence boundaries set
   :rtype: doc
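
   A minimal sketch of such a component, assuming ``pysbd`` and spaCy v3 are
   installed. It follows the spaCy-component pattern shown in pysbd's README
   rather than DACKAR's actual source, and the component name
   ``pysbdSentencizerSketch`` is hypothetical. pysbd returns character
   offsets, which are mapped back to spaCy tokens with ``Doc.char_span``.

   .. code-block:: python

      import pysbd
      import spacy
      from spacy.language import Language

      # char_span=True (which requires clean=False) makes segment() return
      # TextSpan objects carrying .sent, .start and .end character offsets.
      SEGMENTER = pysbd.Segmenter(language="en", clean=False, char_span=True)


      @Language.component("pysbdSentencizerSketch")
      def pysbd_sentencizer_sketch(doc):
          starts = set()
          for sent in SEGMENTER.segment(doc.text):
              # "contract" keeps only tokens fully inside the character span.
              span = doc.char_span(sent.start, sent.end, alignment_mode="contract")
              if span is not None and len(span) > 0:
                  starts.add(span[0].i)
          for token in doc:
              # A token starts a sentence only where pysbd placed a boundary.
              token.is_sent_start = token.i in starts
          return doc


      nlp = spacy.blank("en")
      nlp.add_pipe("pysbdSentencizerSketch")
      doc = nlp("The pump failed. The valve was replaced.")
      print([s.text for s in doc.sents])

   The segmenter is built once at import time so that it is not recreated
   for every document passed through the pipeline.
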