src.dackar.pipelines.CustomPipelineComponents
=============================================

.. py:module:: src.dackar.pipelines.CustomPipelineComponents

.. autoapi-nested-parse::

   Created in March 2022

   @author: wangc, mandd


Attributes
----------

.. autoapisummary::

   src.dackar.pipelines.CustomPipelineComponents.logger
   src.dackar.pipelines.CustomPipelineComponents._
   src.dackar.pipelines.CustomPipelineComponents._
   src.dackar.pipelines.CustomPipelineComponents._
   src.dackar.pipelines.CustomPipelineComponents.customLabel
   src.dackar.pipelines.CustomPipelineComponents.aliasLookup
   src.dackar.pipelines.CustomPipelineComponents.df


Functions
---------

.. autoapisummary::

   src.dackar.pipelines.CustomPipelineComponents.getEntID
   src.dackar.pipelines.CustomPipelineComponents.normEntities
   src.dackar.pipelines.CustomPipelineComponents.initCoref
   src.dackar.pipelines.CustomPipelineComponents.aliasResolver
   src.dackar.pipelines.CustomPipelineComponents.propagateEntType
   src.dackar.pipelines.CustomPipelineComponents.anaphorCoref
   src.dackar.pipelines.CustomPipelineComponents.anaphorEntCoref
   src.dackar.pipelines.CustomPipelineComponents.expandEntities
   src.dackar.pipelines.CustomPipelineComponents.mergeEntitiesWithSameID
   src.dackar.pipelines.CustomPipelineComponents.mergePhrase
   src.dackar.pipelines.CustomPipelineComponents.pysbdSentenceBoundaries


Module Contents
---------------

.. py:data:: logger

.. py:data:: _

.. py:data:: _

.. py:data:: _

.. py:data:: customLabel
   :value: ['STRUCTURE', 'COMPONENT', 'SYSTEM']


.. py:data:: aliasLookup

.. py:data:: df

.. py:function:: getEntID()

   
.. py:function:: normEntities(doc)

   Normalize named entities by removing a leading article (``DET``) and a trailing particle (``PART``).

   :param doc: spacy.tokens.doc.Doc, the document processed by the preceding nlp pipeline components

   :returns: spacy.tokens.doc.Doc, the document with normalized named entities
   :rtype: spacy.tokens.doc.Doc
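
   The normalization rule can be illustrated with a minimal sketch. This is not
   the module's implementation. It assumes an entity is modelled as a plain list
   of ``(text, pos)`` pairs instead of a spaCy ``Span``, and the helper
   ``norm_entity`` is hypothetical.

   .. code-block:: python

      from typing import List, Tuple

      Token = Tuple[str, str]  # (text, coarse POS tag); stands in for a spaCy Token


      def norm_entity(ent: List[Token]) -> List[Token]:
          """Drop a leading article (DET) and a trailing particle (PART)."""
          if ent and ent[0][1] == "DET":
              ent = ent[1:]
          if ent and ent[-1][1] == "PART":
              ent = ent[:-1]
          return ent


      print(norm_entity([("the", "DET"), ("pump", "NOUN"), ("'s", "PART")]))
      # [('pump', 'NOUN')]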


.. py:function:: initCoref(doc)

   Initialize coreference by assigning each entity's text and label to the custom extensions ``ref_n`` and ``ref_t``.

   :param doc: spacy.tokens.doc.Doc, the document processed by the preceding nlp pipeline components

   :returns: spacy.tokens.doc.Doc, the document with initialized coreference attributes
   :rtype: spacy.tokens.doc.Doc
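
   A minimal sketch of the initialization step, assuming the values are stored on
   the first token of each entity. Tokens are modelled by the hypothetical
   ``Tok`` dataclass, whose ``ref_n`` and ``ref_t`` fields stand in for the
   custom extensions, and entities are ``(start, end, label)`` triples with an
   exclusive end. The helper ``init_coref`` is also hypothetical.

   .. code-block:: python

      from dataclasses import dataclass
      from typing import List, Tuple


      @dataclass
      class Tok:
          text: str
          ref_n: str = ""  # stands in for the custom extension ref_n
          ref_t: str = ""  # stands in for the custom extension ref_t


      def init_coref(tokens: List[Tok], ents: List[Tuple[int, int, str]]) -> None:
          """Copy each entity's text and label onto the entity's first token."""
          for start, end, label in ents:
              tokens[start].ref_n = " ".join(t.text for t in tokens[start:end])
              tokens[start].ref_t = label


      tokens = [Tok(w) for w in "the feed pump tripped".split()]
      init_coref(tokens, [(1, 3, "COMPONENT")])
      print(tokens[1].ref_n, tokens[1].ref_t)  # feed pump COMPONENT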


.. py:function:: aliasResolver(doc)

   Look up aliases and store the result in ``alias``.

   :param doc: spacy.tokens.doc.Doc, the document processed by the preceding nlp pipeline components

   :returns: spacy.tokens.doc.Doc, the document after alias lookup
   :rtype: spacy.tokens.doc.Doc
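
   At its core, the lookup is a dictionary query. In this sketch the alias table
   ``ALIASES`` and the helper ``resolve_aliases`` are hypothetical. In the module,
   the table is presumably the ``aliasLookup`` attribute, whose format is not
   documented here.

   .. code-block:: python

      from typing import Dict, List, Optional

      # Hypothetical alias table: alias text -> canonical name.
      ALIASES: Dict[str, str] = {
          "RCP": "reactor coolant pump",
          "AFW": "auxiliary feedwater",
      }


      def resolve_aliases(ent_texts: List[str], lookup: Dict[str, str]) -> List[Optional[str]]:
          """Return the canonical name of each entity, or None when it is not an alias."""
          return [lookup.get(text) for text in ent_texts]


      print(resolve_aliases(["RCP", "valve"], ALIASES))  # ['reactor coolant pump', None]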


.. py:function:: propagateEntType(doc)

   Propagate the entity type stored in ``ref_t``.

   :param doc: spacy.tokens.doc.Doc, the document processed by the preceding nlp pipeline components

   :returns: spacy.tokens.doc.Doc, the document after entity type propagation
   :rtype: spacy.tokens.doc.Doc
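
   A minimal sketch of the propagation rule, assuming entities are
   ``(start, end, label)`` triples and ``ref_t`` is a per-token list in which an
   empty string means that no type has been assigned. The helper name is
   hypothetical.

   .. code-block:: python

      from typing import List, Tuple

      Ent = Tuple[int, int, str]  # (start token, end token (exclusive), label)


      def propagate_ent_type(ents: List[Ent], ref_t: List[str]) -> List[Ent]:
          """Relabel each entity with the type stored on its first token, if any."""
          # An empty ref_t entry is falsy, so the original label is kept.
          return [(start, end, ref_t[start] or label) for start, end, label in ents]


      print(propagate_ent_type([(0, 1, "ORG"), (2, 4, "COMPONENT")], ["SYSTEM", "", "", ""]))
      # [(0, 1, 'SYSTEM'), (2, 4, 'COMPONENT')]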


.. py:function:: anaphorCoref(doc)

   Resolve anaphora using coreferee.

   This pipeline component must be added after NER. It assumes that the entities
   have already been recognized, that the ``initCoref`` component has assigned the
   initial custom attributes ``ref_n`` and ``ref_t``, and that the ``aliasResolver``
   component has resolved all aliases used in the text. After these preprocessing
   steps, ``anaphorCoref`` resolves the coreferences.

   :param doc: spacy.tokens.doc.Doc, the document processed by the preceding nlp pipeline components

   :returns: spacy.tokens.doc.Doc, the document after anaphora resolution with coreferee
   :rtype: spacy.tokens.doc.Doc
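
   The ordering requirement above can be checked against a pipeline's component
   names (``nlp.pipe_names`` in spaCy). This is a sketch. The helper
   ``check_order`` is hypothetical, and the component names are assumed to match
   the function names, which may differ from the names under which the
   components are actually registered.

   .. code-block:: python

      from typing import Sequence

      # Order stated in the docstring; names are assumed to match the function names.
      REQUIRED_ORDER = ("ner", "initCoref", "aliasResolver", "anaphorCoref")


      def check_order(pipe_names: Sequence[str], required: Sequence[str] = REQUIRED_ORDER) -> bool:
          """Return True if every required component is present, in the given order."""
          names = list(pipe_names)
          try:
              positions = [names.index(name) for name in required]
          except ValueError:  # a required component is missing
              return False
          return positions == sorted(positions)


      print(check_order(["tok2vec", "ner", "initCoref", "aliasResolver", "anaphorCoref"]))  # True

   With a loaded spaCy pipeline, this would be called as ``check_order(nlp.pipe_names)``.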


.. py:function:: anaphorEntCoref(doc)

   Resolve anaphora for entities using coreferee.

   This pipeline component must be added after NER. It assumes that the entities
   have already been recognized, that the ``initCoref`` component has assigned the
   initial custom attributes ``ref_n`` and ``ref_t``, and that the ``aliasResolver``
   component has resolved all aliases used in the text. After these preprocessing
   steps, ``anaphorEntCoref`` resolves the coreferences.

   :param doc: spacy.tokens.doc.Doc, the document processed by the preceding nlp pipeline components

   :returns: spacy.tokens.doc.Doc, the document after anaphora resolution with coreferee
   :rtype: spacy.tokens.doc.Doc
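
   Conceptually, entity coreference resolution gives every mention in a
   coreference chain the name of a named entity in that chain. The sketch below
   uses a simplified model, not the coreferee API. Chains are lists of token
   indices, ``ref_n`` maps the first token of each named entity to its name, and
   the helper ``resolve_chains`` is hypothetical.

   .. code-block:: python

      from typing import Dict, List


      def resolve_chains(chains: List[List[int]], ref_n: Dict[int, str]) -> Dict[int, str]:
          """Give every mention in a chain the name of the chain's first named mention."""
          resolved = dict(ref_n)  # copy so the input mapping is not modified
          for chain in chains:
              antecedent = next((ref_n[i] for i in chain if i in ref_n), None)
              if antecedent is None:
                  continue  # no named entity in this chain; leave it unresolved
              for i in chain:
                  resolved.setdefault(i, antecedent)
          return resolved


      print(resolve_chains([[1, 5, 9], [3, 7]], {1: "feed pump"}))
      # {1: 'feed pump', 5: 'feed pump', 9: 'feed pump'}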


.. py:function:: expandEntities(doc)

   Expand the current entities. This recursive function extends each entity with all preceding ``NOUN`` tokens.

   :param doc: spacy.tokens.doc.Doc, the document processed by the preceding nlp pipeline components

   :returns: spacy.tokens.doc.Doc, the document with the current entities expanded
   :rtype: spacy.tokens.doc.Doc
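
   The recursive extension can be sketched on a list of coarse POS tags. The
   helper ``expand_start`` is hypothetical, and the sketch ignores details the
   real component may handle, such as overlap with a preceding entity.

   .. code-block:: python

      from typing import List


      def expand_start(pos: List[str], start: int) -> int:
          """Move an entity's start left, one token per call, while the previous token is a NOUN."""
          if start > 0 and pos[start - 1] == "NOUN":
              return expand_start(pos, start - 1)
          return start


      # "the coolant pump motor failed": the entity "motor" starts at token 3
      pos = ["DET", "NOUN", "NOUN", "NOUN", "VERB"]
      print(expand_start(pos, 3))  # 1, i.e. the entity becomes "coolant pump motor"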


.. py:function:: mergeEntitiesWithSameID(doc)

   Merge entities that share the same ID.

   :param doc: spacy.tokens.doc.Doc, the document processed by the preceding nlp pipeline components

   :returns: spacy.tokens.doc.Doc, the document after merging entities with the same ID
   :rtype: spacy.tokens.doc.Doc
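
   One reading of this merge, sketched below, is that touching entities carrying
   the same ID are combined into a single span. That reading, the
   ``(start, end, id)`` representation, and the helper ``merge_same_id`` are
   assumptions; the docstring does not say whether only adjacent entities are
   merged.

   .. code-block:: python

      from typing import List, Tuple

      Ent = Tuple[int, int, str]  # (start token, end token (exclusive), entity ID)


      def merge_same_id(ents: List[Ent]) -> List[Ent]:
          """Merge consecutive, touching entities that share the same ID."""
          merged: List[Ent] = []
          for start, end, ent_id in sorted(ents):
              if merged and merged[-1][2] == ent_id and merged[-1][1] == start:
                  prev_start = merged[-1][0]
                  merged[-1] = (prev_start, end, ent_id)  # extend the previous span
              else:
                  merged.append((start, end, ent_id))
          return merged


      print(merge_same_id([(0, 2, "P1"), (2, 3, "P1"), (4, 5, "P1"), (5, 6, "V2")]))
      # [(0, 3, 'P1'), (4, 5, 'P1'), (5, 6, 'V2')]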


.. py:function:: mergePhrase(doc)

   Expand the current entities.

   This component keeps ``DET`` and ``PART`` tokens; add the ``normEntities``
   pipeline component after it to remove them.

   :param doc: spacy.tokens.doc.Doc, the document processed by the preceding nlp pipeline components

   :returns: spacy.tokens.doc.Doc, the document after phrase merging
   :rtype: spacy.tokens.doc.Doc
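
   The interaction with ``normEntities`` can be illustrated as follows. This
   sketch assumes that a "phrase" is an enclosing noun phrase given as a
   ``(start, end)`` token span; that assumption and the helper
   ``expand_to_phrases`` are hypothetical. Because the enclosing phrase often
   begins with an article, the expanded span can start with a ``DET`` token,
   which ``normEntities`` then removes.

   .. code-block:: python

      from typing import List, Tuple

      Span = Tuple[int, int]  # (start token, end token), end exclusive


      def expand_to_phrases(ents: List[Span], phrases: List[Span]) -> List[Span]:
          """Replace each entity by the first phrase that contains it; keep it unchanged otherwise."""
          return [
              next(((ps, pe) for ps, pe in phrases if ps <= s and e <= pe), (s, e))
              for s, e in ents
          ]


      words = ["the", "feed", "pump", "tripped"]
      # entity "pump" (2, 3) lies inside the phrase "the feed pump" (0, 3)
      start, end = expand_to_phrases([(2, 3)], [(0, 3)])[0]
      print(words[start:end])  # ['the', 'feed', 'pump']: the article is kept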


.. py:function:: pysbdSentenceBoundaries(doc)

   Use pysbd as a sentencizer component for spaCy.

   :param doc: spacy.tokens.doc.Doc, the document processed by the preceding nlp pipeline components

   :returns: spacy.tokens.doc.Doc, the document with sentence boundaries set by pysbd
   :rtype: spacy.tokens.doc.Doc
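
   A sentencizer built on pysbd must map pysbd's character-level sentence spans
   onto tokens. Assuming the spans come from
   ``pysbd.Segmenter(language="en", clean=False, char_span=True)``, whose results
   expose ``start`` and ``end`` offsets, the mapping can be sketched in plain
   Python. ``sentence_starts`` is a hypothetical helper. In a real component the
   token offsets would be ``token.idx`` and the result would be written to
   ``token.is_sent_start``.

   .. code-block:: python

      from typing import List, Tuple


      def sentence_starts(token_offsets: List[int], sent_spans: List[Tuple[int, int]]) -> List[bool]:
          """Flag a token as a sentence start when a sentence begins at its character offset."""
          starts = {start for start, _end in sent_spans}
          return [offset in starts for offset in token_offsets]


      # "Pump A tripped. Valve B opened." tokenized into 8 tokens
      offsets = [0, 5, 7, 14, 16, 22, 24, 30]
      spans = [(0, 16), (16, 31)]  # character spans as a segmenter might return them
      print(sentence_starts(offsets, spans))
      # [True, False, False, False, True, False, False, False]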