src.dackar.workflows.WorkflowBase

Created on April, 2024

@author: wangc, mandd

Attributes

logger

_corefAvail

ver

Classes

WorkflowBase

Base Class for Workflow Analysis

Module Contents

src.dackar.workflows.WorkflowBase.logger
src.dackar.workflows.WorkflowBase._corefAvail = False
src.dackar.workflows.WorkflowBase.ver
class src.dackar.workflows.WorkflowBase.WorkflowBase(nlp, entID='SSC', causalKeywordID='causal', *args, **kwargs)

Bases: object

Base Class for Workflow Analysis

type
name
nlp
_causalFile
_causalPOS
_causalKeywords
_statusFile
_statusKeywords
_updateStatusKeywords = False
_updateCausalKeywords = False
_conjectureFile
_conjectureKeywords
_doc = None
entityRuler = None
_entityRuler = False
_entityRulerMatches = []
_matchedSents = []
_matchedSentsForVis = []
_visualizeMatchedSents = True
_coref
_entityLabels
_entID
_causalKeywordID
_causalNames = ['cause', 'cause health status', 'causal keyword', 'effect', 'effect health status', 'sentence',...
_extractedCausals = []
_causalSentsNoEnts = []
_rawCausalList = []
_causalSentsOneEnt = []
_entHS = None
_entStatus = None
_screen = False
dataframeRelations = None
dataframeEntities = None
_textProcess
reset()

Reset rule-based matcher

textProcess()

Function to clean text

Parameters:

None

Returns:

procObj, DACKAR.Preprocessing object

getKeywords(filename, columnNames=None)

Get the keywords from given file

Parameters:

filename – str, the file name to read the keywords

Returns:

dict, dictionary containing the keywords

Return type:

kw
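
The keyword file format is not described on this page. As a rough sketch only, assuming a CSV file whose header row names the keyword groups, a reader along these lines would produce such a dictionary. The helper read_keywords, its column_names argument, and the sample data are hypothetical and are not part of DACKAR.

```python
import csv
import io


def read_keywords(stream, column_names=None):
    """Collect the non-empty cells of each selected CSV column into a list."""
    reader = csv.DictReader(stream)
    # Assumption: when no columns are requested, use every column in the header.
    columns = list(column_names) if column_names else list(reader.fieldnames or [])
    keywords = {col: [] for col in columns}
    for row in reader:
        for col in columns:
            cell = (row.get(col) or "").strip()
            if cell:
                keywords[col].append(cell)
    return keywords


# Made-up sample data standing in for a keyword file.
sample = io.StringIO("VERB,NOUN\ncause,failure\nlead to,\n")
kw = read_keywords(sample)
```

Empty cells are skipped so that columns of unequal length do not leave blank keywords in the result.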

extractLemma(varList)

Lemmatize the variable list

Parameters:

varList – list, list of variables

Returns:

list, list of lemmatized variables

Return type:

lemmaList

addKeywords(keywords, ktype)

Method to update self._causalKeywords or self._statusKeywords

Parameters:
  • keywords – dict, keywords that will be added to self._causalKeywords or self._statusKeywords

  • ktype – string, either ‘status’ or ‘causal’
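
A minimal sketch of the kind of update described here, assuming the keywords are stored as a dictionary of lists per keyword type. The function add_keywords, the store layout, and the group name "VERB" are illustrative assumptions rather than the class's actual internals.

```python
def add_keywords(store, keywords, ktype):
    """Merge `keywords` into store[ktype], skipping entries already present."""
    if ktype not in ("status", "causal"):
        raise ValueError(f"ktype must be 'status' or 'causal', got {ktype!r}")
    target = store.setdefault(ktype, {})
    for group, words in keywords.items():
        existing = target.setdefault(group, [])
        for word in words:
            if word not in existing:
                existing.append(word)
    return store


# Hypothetical keyword groups; the real group names are not given on this page.
store = {"causal": {"VERB": ["cause"]}, "status": {}}
add_keywords(store, {"VERB": ["cause", "trigger"]}, "causal")
```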

addEntityPattern(name, patternList)

Add entity pattern, to extend doc.ents, similar function to self.extendEnt

Parameters:
  • name – str, the name for the entity pattern.

  • patternList – list, the pattern list, for example: {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]}
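
The example pattern follows the token-pattern dictionary format used by spaCy's entity ruler. The sketch below builds such a list from plain phrases. The helper make_patterns is hypothetical, and splitting on whitespace is a simplification that will not match spaCy's tokenization for phrases containing punctuation.

```python
def make_patterns(label, phrases):
    """Turn each phrase into one token-level pattern keyed on LOWER."""
    patterns = []
    for phrase in phrases:
        tokens = [{"LOWER": word.lower()} for word in phrase.split()]
        if tokens:
            patterns.append({"label": label, "pattern": tokens})
    return patterns


# The resulting list has the same shape as the patternList example above.
pattern_list = make_patterns("GPE", ["San Francisco", "Idaho Falls"])
```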

__call__(text, extract=True, screen=False)

Find all token sequences matching the supplied pattern

Parameters:

text – string, the text that needs to be processed

Returns:

None

abstract extractInformation()

Extract information

Parameters:

None

Returns:

None

visualize()

Visualize the processed document

Parameters:

None

Returns:

None

isPassive(token)

Check the passiveness of the token

Parameters:

token – spacy.tokens.Token, the token of the doc

Returns:

True, if the token is passive

Return type:

isPassive
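
How the check is implemented is not stated here. One common heuristic with spaCy's English dependency labels is to look for a passive subject (nsubjpass) or passive auxiliary (auxpass) among the token's children. The sketch below uses a hypothetical stand-in class Tok in place of spacy.tokens.Token so that it runs without a language model.

```python
from dataclasses import dataclass, field


@dataclass
class Tok:
    """Hypothetical stand-in exposing a few spaCy-like Token attributes."""
    text: str
    dep_: str = ""
    children: list = field(default_factory=list)


def is_passive(token):
    """True if any direct child carries a passive dependency label."""
    return any(child.dep_ in ("nsubjpass", "auxpass") for child in token.children)


# "The pump was damaged": 'damaged' heads a passive subject and auxiliary.
damaged = Tok("damaged", "ROOT", [Tok("pump", "nsubjpass"), Tok("was", "auxpass")])
# "The pump failed": active voice.
failed = Tok("failed", "ROOT", [Tok("pump", "nsubj")])
```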

isConjecture(token)

Check the conjecture of the token

Parameters:

token – spacy.tokens.Token, the token of the doc, the token should be the root of the Doc

Returns:

True, if the token/sentence indicates conjecture

Return type:

isConjecture

isNegation(token)

Check negation status of given token

Parameters:

token – spacy.tokens.Token, token from spacy.tokens.doc.Doc

Returns:

tuple, the negation status and the token text

Return type:

(neg, text)
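
As an illustration of the (neg, text) return shape, the sketch below flags a token as negated when one of its direct children has the dependency label neg. This is an assumed heuristic, not necessarily the rule used by the class. Which text the real method returns is also not specified, so the sketch returns the text of the negation word. Tok is a hypothetical stand-in for spacy.tokens.Token.

```python
from dataclasses import dataclass, field


@dataclass
class Tok:
    """Hypothetical stand-in exposing a few spaCy-like Token attributes."""
    text: str
    dep_: str = ""
    children: list = field(default_factory=list)


def is_negation(token):
    """Return (neg, text); neg is True when a direct child has dep 'neg'."""
    for child in token.children:
        if child.dep_ == "neg":
            return True, child.text
    return False, ""


# "The valve did not open": 'open' has the negation child 'not'.
opened = Tok("open", "ROOT", [Tok("valve", "nsubj"), Tok("did", "aux"), Tok("not", "neg")])
neg, text = is_negation(opened)
```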

findVerb(doc)

Find the first verb in the doc

Parameters:

doc – spacy.tokens.doc.Doc, the processed document using nlp pipelines

Returns:

spacy.tokens.Token, the token that has VERB pos

Return type:

token
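
A minimal sketch of a first-verb search, assuming the document can be iterated token by token and that each token exposes a coarse part-of-speech tag in pos_, as spaCy tokens do. Tok is a hypothetical stand-in, and returning None when no verb exists is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    """Hypothetical stand-in for a token with text and coarse POS tag."""
    text: str
    pos_: str


def find_verb(tokens):
    """Return the first token tagged VERB, or None if there is none."""
    for tok in tokens:
        if tok.pos_ == "VERB":
            return tok
    return None


doc = [Tok("pump", "NOUN"), Tok("failed", "VERB"), Tok("leaking", "VERB")]
first = find_verb(doc)
```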

getCustomEnts(ents, labels)

Get the custom entities

Parameters:
  • ents – list, all entities from the processed doc

  • labels – list, list of labels to be used to get the custom entities out of “ents”

Returns:

list, the customEnts associates with the “labels”

Return type:

customEnts
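
The selection described above amounts to filtering entities by label. A sketch under the assumption that each entity exposes its label as label_ (as spaCy spans do). Ent is a hypothetical stand-in for an entity span, and the labels other than SSC are made up.

```python
from dataclasses import dataclass


@dataclass
class Ent:
    """Hypothetical stand-in for an entity span."""
    text: str
    label_: str


def get_custom_ents(ents, labels):
    """Keep only the entities whose label is in `labels`, preserving order."""
    wanted = set(labels)
    return [ent for ent in ents if ent.label_ in wanted]


ents = [Ent("pump", "SSC"), Ent("Idaho", "GPE"), Ent("valve", "SSC")]
custom = get_custom_ents(ents, ["SSC"])
```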

getPhrase(ent, start, end, include=False)

Get the phrase for ent with all left children

Parameters:
  • ent – Span, the ent to amend with all left children

  • start – int, the start index of ent

  • end – int, the end index of ent

  • include – bool, include ent in the returned expression if True

Returns:

Span or Token, the identified status

Return type:

status

getAmod(ent, start, end, include=False)

Get amod tokens for ent

Parameters:
  • ent – Span, the ent to amend with all left children

  • start – int, the start index of ent

  • end – int, the end index of ent

  • include – bool, include ent in the returned expression if True

Returns:

Span or Token, the identified status

Return type:

status

getAmodOnly(ent)

Get amod tokens texts for ent

Parameters:

ent – Span, the ent to amend with all left children

Returns:

list, the list of amods for ent

Return type:

amod

getCompoundOnly(headEnt, ent)

Get the compounds for headEnt except ent

Parameters:

headEnt – Span, the head entity to ent

Returns:

list, the list of compounds for head ent

Return type:

compDes

getNbor(token)

Method to get the nbor of token; returns None if the nbor does not exist

Parameters:

token – Token, the provided Token to request nbor

Returns:

Token, the requested nbor

Return type:

nbor

validSent(sent)

Check whether the sentence has a valid structure, i.e. contains a subject or an object

Parameters:

sent – Span, sentence from user provided text

Returns:

bool, False if the sentence has neither a subject nor an object.

Return type:

valid
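
One way to express this check is to scan the sentence for any token whose dependency label marks a subject or an object. In the sketch below the object labels are borrowed from the default deps list of findRightObj. The subject labels, the stand-in class Tok, and the function valid_sent are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    """Hypothetical stand-in for a token with text and dependency label."""
    text: str
    dep_: str


# Assumed subject labels; object labels mirror the findRightObj defaults.
SUBJECT_DEPS = {"nsubj", "nsubjpass"}
OBJECT_DEPS = {"dobj", "pobj", "iobj", "obj", "obl", "oprd"}


def valid_sent(tokens):
    """True if at least one token is a subject or an object."""
    return any(tok.dep_ in SUBJECT_DEPS | OBJECT_DEPS for tok in tokens)


with_subject = [Tok("Pump", "nsubj"), Tok("failed", "ROOT")]
fragment = [Tok("Severely", "advmod"), Tok("degraded", "ROOT")]
```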

findLeftSubj(pred, passive)

Find the closest subject in the predicate’s left subtree or, recursively, in the left subtree of the predicate’s parent. Has a filter on organizations.

Parameters:
  • pred – spacy.tokens.Token, the predicate token

  • passive – bool, True if passive

Returns:

spacy.tokens.Token, the token that represents the subject

Return type:

subj

findRightObj(pred, deps=['dobj', 'pobj', 'iobj', 'obj', 'obl', 'oprd'], exclPrepos=[])

Find the closest object in the predicate’s right subtree. Skip prepositional objects if the preposition is in the exclude list. Has a filter on organizations.

Parameters:
  • pred – spacy.tokens.Token, the predicate token

  • exclPrepos – list, list of the excluded prepositions

findRightKeyword(pred, exclPrepos=[])

Find the keyword to the right of the predicate. Skip prepositional objects if the preposition is in the exclude list. Has a filter on organizations.

Parameters:
  • pred – spacy.tokens.Token, the predicate token

  • exclPrepos – list, list of the excluded prepositions

findHealthStatus(root, deps)

Return the first child of root (root included) that matches the dependency list, using breadth-first search. The search stops after the first dependency match if firstDepOnly is set (used for subject search, so that subjects are not “jumped” over).

Parameters:
  • root – spacy.tokens.Token, the root token

  • deps – list, the dependency list

Returns:

token, the token that represents the health status

Return type:

child

isValidCausalEnts(ent)

Check whether the entity belongs to the valid causal entities

Parameters:

ent – list, list of entities

Returns:

bool, valid causal ent if True

Return type:

valid

getIndex(ent, entList)

Get index for ent in entList

Parameters:
  • ent – Span, ent that is used to get index

  • entList – list, list of entities

Returns:

int, the index for ent

Return type:

idx

getConjuncts(entList)

Get a list of conjuncts from entity list

Parameters:

entList – list, list of entities

Returns:

list, list of conjuncts

Return type:

conjunctList

collectSents(doc)

Collect data of matched sentences that can be used for visualization

Parameters:

doc – spacy.tokens.doc.Doc, the processed document using nlp pipelines

extract(sents, predSynonyms=[], exclPrepos=[])

General extraction method

Parameters:
  • sents – list, the list of sentences

  • predSynonyms – list, the list of predicate synonyms

  • exclPrepos – list, the list of excluded prepositions

Returns:

generator, the extracted causal relation

Return type:

(subject tuple, predicate, object tuple)

bfs(root, deps)

Return the first child of root (root included) that matches entType and the dependency list, using breadth-first search. The search stops after the first dependency match if firstDepOnly is set (used for subject search, so that subjects are not “jumped” over).

Parameters:
  • root – spacy.tokens.Token, the root token

  • deps – list, list of dependency

Returns:

spacy.tokens.Token, the matched token

Return type:

child
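
The breadth-first search over the dependency tree can be sketched as follows. This simplified version matches on the dependency label only and does not model the entType or firstDepOnly behaviour mentioned in the description. Tok is a hypothetical stand-in for spacy.tokens.Token, and the function is an illustration rather than the method's actual code.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Tok:
    """Hypothetical stand-in exposing a few spaCy-like Token attributes."""
    text: str
    dep_: str = ""
    children: list = field(default_factory=list)


def bfs(root, deps):
    """Return the first node (root included) whose dep_ is in deps, level by level."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.dep_ in deps:
            return node
        # Children are visited only after every node on the current level.
        queue.extend(node.children)
    return None


# "The leak caused pump failure"
leak = Tok("leak", "nsubj", [Tok("The", "det")])
failure = Tok("failure", "dobj", [Tok("pump", "compound")])
caused = Tok("caused", "ROOT", [leak, failure])
```

Because the queue is first-in, first-out, a match closer to the root is always returned before a deeper one.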

findSubj(pred, passive)

Find the closest subject in the predicate’s left subtree or, recursively, in the left subtree of the predicate’s parent. Has a filter on organizations.

Parameters:
  • pred – spacy.tokens.Token, the predicate token

  • passive – bool, True if the predicate token is passive

Returns:

spacy.tokens.Token, the token that represents subject

Return type:

subj

findObj(pred, deps=['dobj', 'pobj', 'iobj', 'obj', 'obl'], exclPrepos=[])

Find the closest object in the predicate’s right subtree. Skip prepositional objects if the preposition is in the exclude list. Has a filter on organizations.

Parameters:
  • pred – spacy.tokens.Token, the predicate token

  • exclPrepos – list, the list of prepositions that will be excluded

Returns:

spacy.tokens.Token, the token that represents the object

Return type:

obj

isValidKeyword(var, keywords)

Parameters:
  • var – token

  • keywords – list/dict

Returns:

True if var is valid among the keywords
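
The matching rule is not documented here. A plausible sketch, assuming a keyword counts as valid when the token's lemma or text appears (case-insensitively) in a list or in any of the value lists of a dictionary. Tok, is_valid_keyword, and the sample keywords are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    """Hypothetical stand-in for a token with text and lemma."""
    text: str
    lemma_: str


def is_valid_keyword(var, keywords):
    """True if the token's lemma or text is among the keywords (list or dict of lists)."""
    if isinstance(keywords, dict):
        candidates = [word for words in keywords.values() for word in words]
    else:
        candidates = list(keywords)
    lowered = {word.lower() for word in candidates}
    return var.lemma_.lower() in lowered or var.text.lower() in lowered


caused = Tok("Caused", "cause")
```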

getStatusForSubj(ent, include=False)

Get the status for nsubj/nsubjpass ent

Parameters:
  • ent – Span, the nsubj/nsubjpass ent that will be used to search status

  • include – bool, include ent in the returned expression if True

Returns:

Span or Token, the identified status

Return type:

status

getStatusForObj(ent, include=False)

Get the status for pobj/dobj ent

Parameters:
  • ent – Span, the pobj/dobj ent that will be used to search status

  • include – bool, include ent in the returned expression if True

Returns:

Span or Token, the identified status

Return type:

status

getStatusForPobj(ent, include=False)

Get the status for an ent whose root is a pobj

Parameters:
  • ent – Span, the span of entity

  • include – bool, ent will be included in returned status if True

Returns:

Span or Token, the identified health status