src.dackar.workflows.WorkflowBase
=================================

.. py:module:: src.dackar.workflows.WorkflowBase

.. autoapi-nested-parse::

   Created on April, 2024

   @author: wangc, mandd


Attributes
----------

.. autoapisummary::

   src.dackar.workflows.WorkflowBase.logger
   src.dackar.workflows.WorkflowBase._corefAvail
   src.dackar.workflows.WorkflowBase.ver


Classes
-------

.. autoapisummary::

   src.dackar.workflows.WorkflowBase.WorkflowBase


Module Contents
---------------

.. py:data:: logger

.. py:data:: _corefAvail
   :value: False


.. py:data:: ver

.. py:class:: WorkflowBase(nlp, entID='SSC', causalKeywordID='causal', *args, **kwargs)

   Bases: :py:obj:`object`


   Base Class for Workflow Analysis


   .. py:attribute:: type
      :value: 'WorkflowBase'


   .. py:attribute:: name
      :value: 'WorkflowBase'


   .. py:attribute:: nlp


   .. py:attribute:: _causalFile


   .. py:attribute:: _causalPOS


   .. py:attribute:: _causalKeywords


   .. py:attribute:: _statusFile


   .. py:attribute:: _statusKeywords


   .. py:attribute:: _updateStatusKeywords
      :value: False


   .. py:attribute:: _updateCausalKeywords
      :value: False


   .. py:attribute:: _conjectureFile


   .. py:attribute:: _conjectureKeywords


   .. py:attribute:: _doc
      :value: None


   .. py:attribute:: entityRuler
      :value: None


   .. py:attribute:: _entityRuler
      :value: False


   .. py:attribute:: _entityRulerMatches
      :value: []


   .. py:attribute:: _matchedSents
      :value: []


   .. py:attribute:: _matchedSentsForVis
      :value: []


   .. py:attribute:: _visualizeMatchedSents
      :value: True


   .. py:attribute:: _coref
      :value: False


   .. py:attribute:: _entityLabels


   .. py:attribute:: _entID
      :value: 'SSC'


   .. py:attribute:: _causalKeywordID
      :value: 'causal'


   .. py:attribute:: _causalNames
      :value: ['cause', 'cause health status', 'causal keyword', 'effect', 'effect health status', 'sentence',...


   .. py:attribute:: _extractedCausals
      :value: []


   .. py:attribute:: _causalSentsNoEnts
      :value: []


   .. py:attribute:: _rawCausalList
      :value: []


   .. py:attribute:: _causalSentsOneEnt
      :value: []


   .. py:attribute:: _entHS
      :value: None


   .. py:attribute:: _entStatus
      :value: None


   .. py:attribute:: _screen
      :value: False


   .. py:attribute:: dataframeRelations
      :value: None


   .. py:attribute:: dataframeEntities
      :value: None


   .. py:attribute:: _textProcess


   .. py:method:: reset()

      Reset rule-based matcher


   .. py:method:: textProcess()

      Function to clean text

      :param None:

      :returns: procObj, DACKAR.Preprocessing object


   .. py:method:: getKeywords(filename, columnNames=None)

      Get the keywords from the given file

      :param filename: str, the name of the file to read the keywords from

      :returns: dict, dictionary containing the keywords
      :rtype: kw


   .. py:method:: extractLemma(varList)

      Lemmatize the variable list

      :param varList: list, list of variables

      :returns: list, list of lemmatized variables
      :rtype: lemmaList


   .. py:method:: addKeywords(keywords, ktype)

      Method to update self._causalKeywords or self._statusKeywords

      :param keywords: dict, keywords that will be added to self._causalKeywords or self._statusKeywords
      :param ktype: string, either 'status' or 'causal'


   .. py:method:: addEntityPattern(name, patternList)

      Add an entity pattern to extend doc.ents; similar in function to self.extendEnt

      :param name: str, the name for the entity pattern.
      :param patternList: list, the pattern list, for example:
                          ``{"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]}``


   .. py:method:: __call__(text, extract=True, screen=False)

      Find all token sequences matching the supplied pattern

      :param text: string, the text that needs to be processed

      :returns: None


   .. py:method:: extractInformation()
      :abstractmethod:


      Extract information

      :param None:

      :returns: None


   .. py:method:: visualize()

      Visualize the processed document

      :param None:

      :returns: None


   .. py:method:: isPassive(token)

      Check the passiveness of the token

      :param token: spacy.tokens.Token, the token of the doc

      :returns: True, if the token is passive
      :rtype: isPassive


   .. py:method:: isConjecture(token)

      Check the conjecture of the token

      :param token: spacy.tokens.Token, the token of the doc, the token should be the root of the Doc

      :returns: True, if the token/sentence indicates conjecture
      :rtype: isConjecture


   .. py:method:: isNegation(token)

      Check the negation status of the given token

      :param token: spacy.tokens.Token, token from spacy.tokens.doc.Doc

      :returns: tuple, the negation status and the token text
      :rtype: (neg, text)


   .. py:method:: findVerb(doc)

      Find the first verb in the doc

      :param doc: spacy.tokens.doc.Doc, the processed document using nlp pipelines

      :returns: spacy.tokens.Token, the token that has VERB pos
      :rtype: token


   .. py:method:: getCustomEnts(ents, labels)

      Get the custom entities

      :param ents: list, all entities from the processed doc
      :param labels: list, list of labels to be used to get the custom entities out of "ents"

      :returns: list, the customEnts associated with the "labels"
      :rtype: customEnts


   .. py:method:: getPhrase(ent, start, end, include=False)

      Get the phrase for ent with all left children

      :param ent: Span, the ent to amend with all left children
      :param start: int, the start index of ent
      :param end: int, the end index of ent
      :param include: bool, include ent in the returned expression if True

      :returns: Span or Token, the identified status
      :rtype: status


   .. py:method:: getAmod(ent, start, end, include=False)

      Get amod tokens for ent

      :param ent: Span, the ent to amend with all left children
      :param start: int, the start index of ent
      :param end: int, the end index of ent
      :param include: bool, include ent in the returned expression if True

      :returns: Span or Token, the identified status
      :rtype: status


   .. py:method:: getAmodOnly(ent)

      Get the texts of the amod tokens for ent

      :param ent: Span, the ent to amend with all left children

      :returns: list, the list of amods for ent
      :rtype: amod


   .. py:method:: getCompoundOnly(headEnt, ent)

      Get the compounds for headEnt except ent

      :param headEnt: Span, the head entity to ent

      :returns: list, the list of compounds for head ent
      :rtype: compDes


   .. py:method:: getNbor(token)

      Get the neighbor (nbor) of the token; return None if the neighbor does not exist

      :param token: Token, the provided Token to request nbor

      :returns: Token, the requested nbor
      :rtype: nbor


   .. py:method:: validSent(sent)

      Check if the sentence has a valid structure, i.e. it contains either a subject or an object

      :param sent: Span, sentence from user provided text

      :returns: bool, False if the sentence has neither a subject nor an object.
      :rtype: valid


   .. py:method:: findLeftSubj(pred, passive)

      Find the closest subject in the predicate's left subtree or the predicate's parent's left subtree (recursive).
      Has a filter on organizations.

      :param pred: spacy.tokens.Token, the predicate token
      :param passive: bool, True if passive

      :returns: spacy.tokens.Token, the token that represents the subject
      :rtype: subj


   .. py:method:: findRightObj(pred, deps=['dobj', 'pobj', 'iobj', 'obj', 'obl', 'oprd'], exclPrepos=[])

      Find the closest object in the predicate's right subtree.
      Skip prepositional objects if the preposition is in the exclude list.
      Has a filter on organizations.

      :param pred: spacy.tokens.Token, the predicate token
      :param exclPrepos: list, list of the excluded prepositions


   .. py:method:: findRightKeyword(pred, exclPrepos=[])

      Find a keyword on the right side of the predicate.
      Skip prepositional objects if the preposition is in the exclude list.
      Has a filter on organizations.

      :param pred: spacy.tokens.Token, the predicate token
      :param exclPrepos: list, list of the excluded prepositions


   .. py:method:: findHealthStatus(root, deps)

      Return the first child of root (included) that matches the dependency list by breadth-first search.
      Search stops after first dependency match if firstDepOnly (used for subject search - do not "jump" over subjects)

      :param root: spacy.tokens.Token, the root token
      :param deps: list, the dependency list

      :returns: token, the token that represents the health status
      :rtype: child


   .. py:method:: isValidCausalEnts(ent)

      Check whether the entity belongs to the valid causal entities

      :param ent: list, list of entities

      :returns: bool, valid causal ent if True
      :rtype: valid


   .. py:method:: getIndex(ent, entList)

      Get index for ent in entList

      :param ent: Span, ent that is used to get index
      :param entList: list, list of entities

      :returns: int, the index for ent
      :rtype: idx


   .. py:method:: getConjuncts(entList)

      Get a list of conjuncts from the entity list

      :param entList: list, list of entities

      :returns: list, list of conjuncts
      :rtype: conjunctList


   .. py:method:: collectSents(doc)

      Collect data of matched sentences that can be used for visualization

      :param doc: spacy.tokens.doc.Doc, the processed document using nlp pipelines


   .. py:method:: extract(sents, predSynonyms=[], exclPrepos=[])

      General extraction method

      :param sents: list, the list of sentences
      :param predSynonyms: list, the list of predicate synonyms
      :param exclPrepos: list, the list of excluded prepositions

      :returns: generator, the extracted causal relation
      :rtype: (subject tuple, predicate, object tuple)


   .. py:method:: bfs(root, deps)

      Return the first child of root (included) that matches entType and the dependency list by breadth-first search.
      Search stops after first dependency match if firstDepOnly (used for subject search - do not "jump" over subjects)

      :param root: spacy.tokens.Token, the root token
      :param deps: list, list of dependency

      :returns: spacy.tokens.Token, the matched token
      :rtype: child


   .. py:method:: findSubj(pred, passive)

      Find the closest subject in the predicate's left subtree or the predicate's parent's left subtree (recursive).
      Has a filter on organizations.

      :param pred: spacy.tokens.Token, the predicate token
      :param passive: bool, True if the predicate token is passive

      :returns: spacy.tokens.Token, the token that represents the subject
      :rtype: subj


   .. py:method:: findObj(pred, deps=['dobj', 'pobj', 'iobj', 'obj', 'obl'], exclPrepos=[])

      Find the closest object in the predicate's right subtree.
      Skip prepositional objects if the preposition is in the exclude list.
      Has a filter on organizations.

      :param pred: spacy.tokens.Token, the predicate token
      :param exclPrepos: list, the list of prepositions that will be excluded

      :returns: spacy.tokens.Token, the token that represents the object
      :rtype: obj


   .. py:method:: isValidKeyword(var, keywords)

      :param var: token
      :param keywords: list/dict

      :returns: True if var is valid among the keywords


   .. py:method:: getStatusForSubj(ent, include=False)

      Get the status for nsubj/nsubjpass ent

      :param ent: Span, the nsubj/nsubjpass ent that will be used to search status
      :param include: bool, include ent in the returned expression if True

      :returns: Span or Token, the identified status
      :rtype: status


   .. py:method:: getStatusForObj(ent, include=False)

      Get the status for pobj/dobj ent

      :param ent: Span, the pobj/dobj ent that will be used to search status
      :param include: bool, include ent in the returned expression if True

      :returns: Span or Token, the identified status
      :rtype: status


   .. py:method:: getStatusForPobj(ent, include=False)

      Get the status for an ent with root pos ``pobj``

      :param ent: Span, the span of entity
      :param include: bool, ent will be included in returned status if True

      :returns: Span or Token, the identified health status
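
The breadth-first search that ``bfs`` and ``findHealthStatus`` describe ("return first child of root (included) that matches dependency list") can be illustrated with a minimal, self-contained sketch. It is not the DACKAR implementation: ``ToyToken`` and ``bfs_first_match`` are hypothetical names introduced here, ``ToyToken`` only mimics the ``text``, ``dep_`` and ``children`` attributes of ``spacy.tokens.Token``, the tree is hand-built rather than produced by a parser, and the ``firstDepOnly`` and ``entType`` behaviour mentioned in the docstrings is not modelled.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyToken:
    """Hypothetical stand-in for spacy.tokens.Token (text, dep_, children only)."""
    text: str
    dep_: str
    children: List["ToyToken"] = field(default_factory=list)


def bfs_first_match(root: ToyToken, deps: List[str]) -> Optional[ToyToken]:
    """Return the first token (root included) whose dep_ is in deps.

    Tokens are visited level by level, so a match closer to the root
    is returned before any deeper match.
    """
    queue = deque([root])
    while queue:
        token = queue.popleft()
        if token.dep_ in deps:
            return token
        # Enqueue children so they are visited after the current level.
        queue.extend(token.children)
    return None


# Hand-built, simplified dependency tree (an assumption, not real parser output).
bearing = ToyToken("bearing", "pobj", [ToyToken("worn", "amod")])
failed = ToyToken("failed", "ROOT", [
    ToyToken("pump", "nsubj", [ToyToken("The", "det")]),
    ToyToken("because", "prep", [bearing]),
])

subject = bfs_first_match(failed, ["nsubj", "nsubjpass"])
modifier = bfs_first_match(failed, ["amod"])
missing = bfs_first_match(failed, ["dobj"])
```

In this sketch ``subject`` is the ``pump`` token, ``modifier`` is the ``worn`` token found two levels below the root, and ``missing`` is ``None`` because no token carries the ``dobj`` dependency.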