src.dackar.workflows.WorkflowBase
=================================

.. py:module:: src.dackar.workflows.WorkflowBase

.. autoapi-nested-parse::

   Created on April, 2024

   @author: wangc, mandd


Attributes
----------

.. autoapisummary::

   src.dackar.workflows.WorkflowBase.logger
   src.dackar.workflows.WorkflowBase._corefAvail
   src.dackar.workflows.WorkflowBase.ver


Classes
-------

.. autoapisummary::

   src.dackar.workflows.WorkflowBase.WorkflowBase


Module Contents
---------------

.. py:data:: logger

.. py:data:: _corefAvail
   :value: False


.. py:data:: ver

.. py:class:: WorkflowBase(nlp, entID='SSC', causalKeywordID='causal', *args, **kwargs)

   Bases: :py:obj:`object`


   Base Class for Workflow Analysis


   .. py:attribute:: type
      :value: 'WorkflowBase'


   .. py:attribute:: name
      :value: 'WorkflowBase'


   .. py:attribute:: nlp


   .. py:attribute:: _causalFile


   .. py:attribute:: _causalPOS


   .. py:attribute:: _causalKeywords


   .. py:attribute:: _statusFile


   .. py:attribute:: _statusKeywords


   .. py:attribute:: _updateStatusKeywords
      :value: False


   .. py:attribute:: _updateCausalKeywords
      :value: False


   .. py:attribute:: _conjectureFile


   .. py:attribute:: _conjectureKeywords


   .. py:attribute:: _doc
      :value: None


   .. py:attribute:: entityRuler
      :value: None


   .. py:attribute:: _entityRuler
      :value: False


   .. py:attribute:: _entityRulerMatches
      :value: []


   .. py:attribute:: _matchedSents
      :value: []


   .. py:attribute:: _matchedSentsForVis
      :value: []


   .. py:attribute:: _visualizeMatchedSents
      :value: True


   .. py:attribute:: _coref
      :value: False


   .. py:attribute:: _entityLabels


   .. py:attribute:: _entID
      :value: 'SSC'


   .. py:attribute:: _causalKeywordID
      :value: 'causal'


   .. py:attribute:: _causalNames
      :value: ['cause', 'cause health status', 'causal keyword', 'effect', 'effect health status', 'sentence',...


   .. py:attribute:: _extractedCausals
      :value: []


   .. py:attribute:: _causalSentsNoEnts
      :value: []


   .. py:attribute:: _rawCausalList
      :value: []


   .. py:attribute:: _causalSentsOneEnt
      :value: []


   .. py:attribute:: _entHS
      :value: None


   .. py:attribute:: _entStatus
      :value: None


   .. py:attribute:: _screen
      :value: False


   .. py:attribute:: dataframeRelations
      :value: None


   .. py:attribute:: dataframeEntities
      :value: None


   .. py:attribute:: _textProcess


   .. py:method:: reset()

      Reset rule-based matcher


   .. py:method:: textProcess()

      Function to clean text

      :param None:

      :returns: procObj, DACKAR.Preprocessing object


   .. py:method:: getKeywords(filename, columnNames=None)

      Get the keywords from the given file

      :param filename: str, the name of the file from which the keywords are read

      :returns: dict, dictionary that contains the keywords
      :rtype: kw


   .. py:method:: extractLemma(varList)

      Lemmatize the variable list

      :param varList: list, list of variables

      :returns: list, list of lemmatized variables
      :rtype: lemmaList


   .. py:method:: addKeywords(keywords, ktype)

      Method to update self._causalKeywords or self._statusKeywords

      :param keywords: dict, keywords that will be added to self._causalKeywords or self._statusKeywords
      :param ktype: string, either 'status' or 'causal'
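
      The update described above can be pictured as a dictionary merge. The following is a minimal sketch, assuming that keyword dictionaries map a category name to a list of strings. The helper ``mergeKeywords`` and the sample categories are hypothetical and are not part of DACKAR.

      .. code-block:: python

         def mergeKeywords(existing, new):
             """Return a copy of ``existing`` extended with ``new``, skipping duplicates."""
             # Copy the lists so the caller's dictionary is left untouched.
             merged = {key: list(values) for key, values in existing.items()}
             for key, values in new.items():
                 bucket = merged.setdefault(key, [])
                 for value in values:
                     if value not in bucket:
                         bucket.append(value)
             return merged

         # Hypothetical sample data (assumption: category -> list of keywords).
         statusKeywords = {'VERB': ['fail', 'leak']}
         updated = mergeKeywords(statusKeywords, {'VERB': ['leak', 'crack'], 'ADJ': ['degraded']})
         print(updated)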


   .. py:method:: addEntityPattern(name, patternList)

      Add entity pattern, to extend doc.ents, similar function to self.extendEnt

      :param name: str, the name for the entity pattern.
      :param patternList: list, the pattern list, for example:
         ``{"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]}``
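
      Patterns of the form shown in the example can be generated from plain phrases. The sketch below is one possible way to do so, assuming whitespace tokenization is adequate for the phrases involved. The helper ``phrasesToPatterns`` is hypothetical and is not part of DACKAR or spaCy.

      .. code-block:: python

         def phrasesToPatterns(label, phrases):
             """Build one token pattern per phrase, matching on lower-cased words."""
             patterns = []
             for phrase in phrases:
                 # One {"LOWER": ...} dict per whitespace-separated word.
                 tokenPattern = [{"LOWER": word.lower()} for word in phrase.split()]
                 patterns.append({"label": label, "pattern": tokenPattern})
             return patterns

         patternList = phrasesToPatterns("GPE", ["San Francisco"])
         print(patternList)

      The resulting list has the same shape as the documented example and could, under that assumption, be supplied as ``patternList``.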


   .. py:method:: __call__(text, extract=True, screen=False)

      Find all token sequences matching the supplied pattern

      :param text: string, the text that needs to be processed

      :returns: None


   .. py:method:: extractInformation()
      :abstractmethod:


      Extract information

      :param None:

      :returns: None


   .. py:method:: visualize()

      Visualize the processed document

      :param None:

      :returns: None


   .. py:method:: isPassive(token)

      Check the passiveness of the token

      :param token: spacy.tokens.Token, the token of the doc

      :returns: True, if the token is passive
      :rtype: isPassive


   .. py:method:: isConjecture(token)

      Check the conjecture of the token

      :param token: spacy.tokens.Token, the token of the doc, the token should be the root of the Doc

      :returns: True, if the token/sentence indicates conjecture
      :rtype: isConjecture


   .. py:method:: isNegation(token)

      Check negation status of given token

      :param token: spacy.tokens.Token, token from spacy.tokens.doc.Doc

      :returns: tuple, the negation status and the token text
      :rtype: (neg, text)


   .. py:method:: findVerb(doc)

      Find the first verb in the doc

      :param doc: spacy.tokens.doc.Doc, the processed document using nlp pipelines

      :returns: spacy.tokens.Token, the token that has VERB pos
      :rtype: token


   .. py:method:: getCustomEnts(ents, labels)

      Get the custom entities

      :param ents: list, all entities from the processed doc
      :param labels: list, list of labels to be used to get the custom entities out of "ents"

      :returns: list, the customEnts associates with the "labels"
      :rtype: customEnts
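
      The filtering step described above can be sketched as follows. This is an illustration only: ``Ent`` is a stand-in for a spaCy ``Span`` that models just the ``label_`` attribute, and ``filterEntsByLabel`` is a hypothetical name, not the DACKAR implementation.

      .. code-block:: python

         from collections import namedtuple

         # Stand-in for a spaCy Span; only the attributes used below are modelled.
         Ent = namedtuple("Ent", ["text", "label_"])

         def filterEntsByLabel(ents, labels):
             """Keep the entities whose label is in ``labels``, preserving order."""
             wanted = set(labels)
             return [ent for ent in ents if ent.label_ in wanted]

         ents = [Ent("pump", "SSC"), Ent("Idaho", "GPE"), Ent("valve", "SSC")]
         customEnts = filterEntsByLabel(ents, ["SSC"])
         print([ent.text for ent in customEnts])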


   .. py:method:: getPhrase(ent, start, end, include=False)

      Get the phrase for ent with all left children

      :param ent: Span, the ent to amend with all left children
      :param start: int, the start index of ent
      :param end: int, the end index of ent
      :param include: bool, include ent in the returned expression if True

      :returns: Span or Token, the identified status
      :rtype: status


   .. py:method:: getAmod(ent, start, end, include=False)

      Get amod tokens for ent

      :param ent: Span, the ent to amend with all left children
      :param start: int, the start index of ent
      :param end: int, the end index of ent
      :param include: bool, include ent in the returned expression if True

      :returns: Span or Token, the identified status
      :rtype: status


   .. py:method:: getAmodOnly(ent)

      Get amod tokens texts for ent

      :param ent: Span, the ent to amend with all left children

      :returns: list, the list of amods for ent
      :rtype: amod


   .. py:method:: getCompoundOnly(headEnt, ent)

      Get the compounds for headEnt except ent

      :param headEnt: Span, the head entity to ent

      :returns: list, the list of compounds for head ent
      :rtype: compDes


   .. py:method:: getNbor(token)

      Method to get the nbor from token, return None if the nbor does not exist

      :param token: Token, the provided Token to request nbor

      :returns: Token, the requested nbor
      :rtype: nbor
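
      A minimal sketch of the "return None instead of failing" behaviour, assuming the neighbour lookup raises ``IndexError`` at a document boundary. ``FakeToken`` is a stand-in for a spaCy token and ``safeNbor`` is a hypothetical helper; neither is part of DACKAR.

      .. code-block:: python

         class FakeToken:
             """Stand-in for a spaCy Token; only ``text`` and ``nbor`` are modelled."""

             def __init__(self, words, i):
                 self.words = words
                 self.i = i
                 self.text = words[i]

             def nbor(self, offset=1):
                 j = self.i + offset
                 if j < 0 or j >= len(self.words):
                     raise IndexError("no neighbour at this offset")
                 return FakeToken(self.words, j)

         def safeNbor(token):
             """Return the next token, or None when the token is the last one."""
             try:
                 return token.nbor()
             except IndexError:
                 return None

         words = ["pump", "failed"]
         first = safeNbor(FakeToken(words, 0))
         last = safeNbor(FakeToken(words, 1))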


   .. py:method:: validSent(sent)

      Check whether the sentence has a valid structure, i.e. contains a subject or an object

      :param sent: Span, sentence from user provided text

      :returns: bool, False if the sentence has neither a subject nor an object.
      :rtype: valid
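
      The check can be sketched as a scan over dependency labels. The label sets below are an assumption chosen for illustration and may differ from the ones DACKAR uses; ``Tok`` is a stand-in for a spaCy token and ``hasSubjOrObj`` is a hypothetical name.

      .. code-block:: python

         from collections import namedtuple

         # Stand-in for a spaCy Token; only ``dep_`` is needed for this check.
         Tok = namedtuple("Tok", ["text", "dep_"])

         # Assumed dependency labels for subjects and objects.
         SUBJ_DEPS = {"nsubj", "nsubjpass"}
         OBJ_DEPS = {"dobj", "pobj", "iobj"}
         VALID_DEPS = SUBJ_DEPS | OBJ_DEPS

         def hasSubjOrObj(sent):
             """Return True if any token in the sentence is a subject or an object."""
             return any(tok.dep_ in VALID_DEPS for tok in sent)

         full = [Tok("pump", "nsubj"), Tok("failed", "ROOT")]
         fragment = [Tok("badly", "advmod"), Tok("degraded", "ROOT")]
         print(hasSubjOrObj(full), hasSubjOrObj(fragment))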


   .. py:method:: findLeftSubj(pred, passive)

      Find the closest subject in the predicate's left subtree or
      the left subtree of the predicate's parent (recursive).
      Has a filter on organizations.

      :param pred: spacy.tokens.Token, the predicate token
      :param passive: bool, True if passive

      :returns: spacy.tokens.Token, the token that represents the subject
      :rtype: subj


   .. py:method:: findRightObj(pred, deps=['dobj', 'pobj', 'iobj', 'obj', 'obl', 'oprd'], exclPrepos=[])

      Find the closest object in the predicate's right subtree.
      Skip prepositional objects if the preposition is in the exclusion list.
      Has a filter on organizations.

      :param pred: spacy.tokens.Token, the predicate token
      :param exclPrepos: list, list of the excluded prepositions


   .. py:method:: findRightKeyword(pred, exclPrepos=[])

      Find the closest keyword in the predicate's right subtree.
      Skip prepositional objects if the preposition is in the exclusion list.
      Has a filter on organizations.

      :param pred: spacy.tokens.Token, the predicate token
      :param exclPrepos: list, list of the excluded prepositions


   .. py:method:: findHealthStatus(root, deps)

      Return the first child of root (root included) that matches
      the dependency list, using breadth-first search.
      Search stops after the first dependency match if firstDepOnly
      (used for subject search - do not "jump" over subjects)

      :param root: spacy.tokens.Token, the root token
      :param deps: list, the dependency list

      :returns: token, the token that represents the health status
      :rtype: child


   .. py:method:: isValidCausalEnts(ent)

      Check whether the entity belongs to the valid causal entities

      :param ent: list, list of entities

      :returns: bool, True if the entity is a valid causal entity
      :rtype: valid


   .. py:method:: getIndex(ent, entList)

      Get index for ent in entList

      :param ent: Span, ent that is used to get index
      :param entList: list, list of entities

      :returns: int, the index for ent
      :rtype: idx


   .. py:method:: getConjuncts(entList)

      Get a list of conjuncts from entity list

      :param entList: list, list of entities

      :returns: list, list of conjuncts
      :rtype: conjunctList


   .. py:method:: collectSents(doc)

      Collect data of matched sentences that can be used for visualization

      :param doc: spacy.tokens.doc.Doc, the processed document using nlp pipelines


   .. py:method:: extract(sents, predSynonyms=[], exclPrepos=[])

      General extraction method

      :param sents: list, the list of sentences
      :param predSynonyms: list, the list of predicate synonyms
      :param exclPrepos: list, the list of excluded prepositions

      :returns: generator, the extracted causal relation
      :rtype: (subject tuple, predicate, object tuple)


   .. py:method:: bfs(root, deps)

      Return the first child of root (root included) that matches
      entType and the dependency list, using breadth-first search.
      Search stops after the first dependency match if firstDepOnly
      (used for subject search - do not "jump" over subjects)

      :param root: spacy.tokens.Token, the root token
      :param deps: list, list of dependency

      :returns: spacy.tokens.Token, the matched token
      :rtype: child
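
      The traversal described above can be sketched as a standard breadth-first search over a dependency tree. ``Node`` is a stand-in for a spaCy token that models only ``dep_`` and ``children``, and ``bfsFirstMatch`` is a hypothetical name. The sketch matches on the dependency label alone and does not model entType or firstDepOnly.

      .. code-block:: python

         from collections import deque

         class Node:
             """Stand-in for a spaCy Token with a dependency label and children."""

             def __init__(self, text, dep_, children=()):
                 self.text = text
                 self.dep_ = dep_
                 self.children = list(children)

         def bfsFirstMatch(root, deps):
             """Return the first node (root included) whose ``dep_`` is in ``deps``."""
             queue = deque([root])
             while queue:
                 node = queue.popleft()
                 if node.dep_ in deps:
                     return node
                 # Visit nodes level by level, left to right.
                 queue.extend(node.children)
             return None

         # "The pump failed due to wear" as a hand-built toy tree.
         subj = Node("pump", "nsubj", [Node("the", "det")])
         prep = Node("due", "prep", [Node("wear", "pobj")])
         root = Node("failed", "ROOT", [subj, prep])

         found = bfsFirstMatch(root, ["nsubj"])
         missing = bfsFirstMatch(root, ["dobj"])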


   .. py:method:: findSubj(pred, passive)

      Find the closest subject in the predicate's left subtree or
      the left subtree of the predicate's parent (recursive).
      Has a filter on organizations.

      :param pred: spacy.tokens.Token, the predicate token
      :param passive: bool, True if the predicate token is passive

      :returns: spacy.tokens.Token, the token that represents the subject
      :rtype: subj


   .. py:method:: findObj(pred, deps=['dobj', 'pobj', 'iobj', 'obj', 'obl'], exclPrepos=[])

      Find the closest object in the predicate's right subtree.
      Skip prepositional objects if the preposition is in the exclusion list.
      Has a filter on organizations.

      :param pred: spacy.tokens.Token, the predicate token
      :param exclPrepos: list, the list of prepositions that will be excluded

      :returns: spacy.tokens.Token, the token that represents the object
      :rtype: obj


   .. py:method:: isValidKeyword(var, keywords)

      Check whether the token is valid among the keywords

      :param var: token
      :param keywords: list/dict

      :returns: True if var is valid among the keywords
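
      Because ``keywords`` may be a list or a dict, a check of this kind has to handle both shapes. The sketch below assumes that validity means "the token's lemma appears in the keyword collection" and that a dict maps category names to lists of keywords. Both points are assumptions; ``Tok`` and ``lemmaInKeywords`` are hypothetical names, not the DACKAR implementation.

      .. code-block:: python

         from collections import namedtuple

         # Stand-in for a spaCy Token; only ``lemma_`` is used.
         Tok = namedtuple("Tok", ["text", "lemma_"])

         def lemmaInKeywords(tok, keywords):
             """Return True if the token's lemma is listed in ``keywords``."""
             if isinstance(keywords, dict):
                 # Flatten all category lists into one list of candidates.
                 candidates = [kw for values in keywords.values() for kw in values]
             else:
                 candidates = list(keywords)
             return tok.lemma_.lower() in candidates

         tok = Tok("failed", "fail")
         inList = lemmaInKeywords(tok, ["fail", "leak"])
         inDict = lemmaInKeywords(tok, {"VERB": ["leak"], "NOUN": ["failure"]})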


   .. py:method:: getStatusForSubj(ent, include=False)

      Get the status for nsubj/nsubjpass ent

      :param ent: Span, the nsubj/nsubjpass ent that will be used to search status
      :param include: bool, include ent in the returned expression if True

      :returns: Span or Token, the identified status
      :rtype: status


   .. py:method:: getStatusForObj(ent, include=False)

      Get the status for pobj/dobj ent

      :param ent: Span, the pobj/dobj ent that will be used to search status
      :param include: bool, include ent in the returned expression if True

      :returns: Span or Token, the identified status
      :rtype: status


   .. py:method:: getStatusForPobj(ent, include=False)

      Get the status for ent root pos ``pobj``

      :param ent: Span, the span of entity
      :param include: bool, ent will be included in returned status if True

      :returns: Span or Token, the identified health status