src.dackar.workflows.WorkflowBase

Created in April 2024

@author: wangc, mandd

Attributes

`logger`
`_corefAvail`
`ver`

Classes

WorkflowBase

Base Class for Workflow Analysis

Module Contents

src.dackar.workflows.WorkflowBase.logger

src.dackar.workflows.WorkflowBase._corefAvail = False

src.dackar.workflows.WorkflowBase.ver

class src.dackar.workflows.WorkflowBase.WorkflowBase(nlp, entID='SSC', causalKeywordID='causal', *args, **kwargs)

Bases: object

Base Class for Workflow Analysis

type = 'WorkflowBase'

name = 'WorkflowBase'

nlp

_causalFile

_causalPOS

_causalKeywords

_statusFile

_statusKeywords

_updateStatusKeywords = False

_updateCausalKeywords = False

_conjectureFile

_conjectureKeywords

_doc = None

entityRuler = None

_entityRuler = False

_entityRulerMatches = []

_matchedSents = []

_matchedSentsForVis = []

_visualizeMatchedSents = True

_coref = False

_entityLabels

_entID = 'SSC'

_causalKeywordID = 'causal'

_causalNames = ['cause', 'cause health status', 'causal keyword', 'effect', 'effect health status', 'sentence',...

_extractedCausals = []

_causalSentsNoEnts = []

_rawCausalList = []

_causalSentsOneEnt = []

_entHS = None

_entStatus = None

_screen = False

dataframeRelations = None

dataframeEntities = None

_textProcess

reset()

Reset the rule-based matcher

textProcess()

Function to clean text

Parameters:: None
Returns:: procObj, DACKAR.Preprocessing object

getKeywords(filename, columnNames=None)

Get the keywords from the given file

Parameters:: filename – str, the name of the file from which the keywords are read
Returns:: kw, dictionary containing the keywords
Return type:: dict
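The docstring does not say how the keyword file is laid out. As a minimal sketch only, assuming a CSV file whose header row names the keyword categories and whose cells hold the keywords, a reader that returns a dictionary of keyword lists could look like this. `read_keywords` and its `column_names` filter are hypothetical and are not DACKAR's implementation.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional, TextIO


def read_keywords(stream: TextIO,
                  column_names: Optional[Iterable[str]] = None) -> Dict[str, List[str]]:
    """Hypothetical reader: each CSV column holds the keywords of one category."""
    reader = csv.DictReader(stream)
    # Use the requested columns, or every header column when none are given.
    if column_names is not None:
        wanted = list(column_names)
    else:
        wanted = list(reader.fieldnames or [])
    kw: Dict[str, List[str]] = {name: [] for name in wanted}
    for row in reader:
        for name in wanted:
            value = (row.get(name) or "").strip()
            if value:  # columns may have different lengths; skip empty cells
                kw[name].append(value)
    return kw


# An in-memory file keeps the example self-contained.
sample = io.StringIO("causal,status\ncause,failed\nlead to,\n")
print(read_keywords(sample))  # {'causal': ['cause', 'lead to'], 'status': ['failed']}
```

Passing `open(filename, newline="")` instead of the in-memory buffer reads a real file in the same way.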

extractLemma(varList)

Lemmatize the variable list

Parameters:: varList – list, list of variables
Returns:: lemmaList, list of lemmatized variables
Return type:: list

addKeywords(keywords, ktype)

Method to update self._causalKeywords or self._statusKeywords

Parameters:

keywords – dict, keywords that will be added to self._causalKeywords or self._statusKeywords
ktype – string, either ‘status’ or ‘causal’
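A minimal sketch of the update logic, assuming both keyword stores are dictionaries that map a category to a list of words (the category keys below are made up) and that the `_updateCausalKeywords`/`_updateStatusKeywords` flags from the attribute list record which store changed. `KeywordStore` is a hypothetical stand-in, not the class's actual code.

```python
from typing import Dict, List


class KeywordStore:
    """Hypothetical stand-in holding the two keyword stores and their update flags."""

    def __init__(self) -> None:
        # Category keys are illustrative only.
        self._causalKeywords: Dict[str, List[str]] = {"VERB": ["cause"]}
        self._statusKeywords: Dict[str, List[str]] = {"VERB": ["fail"]}
        self._updateCausalKeywords = False
        self._updateStatusKeywords = False

    def addKeywords(self, keywords: Dict[str, List[str]], ktype: str) -> None:
        if ktype == "causal":
            target = self._causalKeywords
            self._updateCausalKeywords = True
        elif ktype == "status":
            target = self._statusKeywords
            self._updateStatusKeywords = True
        else:
            raise ValueError(f"ktype must be 'status' or 'causal', got {ktype!r}")
        for category, words in keywords.items():
            existing = target.setdefault(category, [])
            for word in words:
                if word not in existing:  # avoid duplicate keywords
                    existing.append(word)


store = KeywordStore()
store.addKeywords({"VERB": ["cause", "induce"], "NOUN": ["failure"]}, "causal")
print(store._causalKeywords)  # {'VERB': ['cause', 'induce'], 'NOUN': ['failure']}
```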

addEntityPattern(name, patternList)

Add an entity pattern to extend doc.ents (similar to self.extendEnt)

Parameters:

name – str, the name for the entity pattern.
patternList – list, the pattern list, where each entry looks like:
{"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]}
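Each entry pairs a `label` with a list of per-token attribute constraints, the token-pattern format that spaCy's `EntityRuler` also accepts (`ruler = nlp.add_pipe("entity_ruler")`, then `ruler.add_patterns(patterns)`). To show how such a pattern matches without requiring spaCy, the sketch below supports only the `LOWER` and `ORTH` attributes over a plain word list. `match_patterns` is hypothetical and much simpler than spaCy's matcher.

```python
from typing import Any, Dict, List, Tuple


def _token_matches(word: str, spec: Dict[str, str]) -> bool:
    """Check one word against one token spec; only LOWER and ORTH are supported here."""
    for attr, value in spec.items():
        if attr == "LOWER":
            if word.lower() != value:
                return False
        elif attr == "ORTH":
            if word != value:
                return False
        else:
            raise ValueError(f"attribute {attr!r} is not supported in this sketch")
    return True


def match_patterns(words: List[str],
                   patterns: List[Dict[str, Any]]) -> List[Tuple[str, int, int]]:
    """Return (label, start, end) for every match; end is exclusive, like a spaCy Span."""
    matches = []
    for entry in patterns:
        specs = entry["pattern"]
        n = len(specs)
        # Slide a window of n words over the text and test each spec against its word.
        for start in range(len(words) - n + 1):
            window = words[start:start + n]
            if all(_token_matches(w, s) for w, s in zip(window, specs)):
                matches.append((entry["label"], start, start + n))
    return matches


patternList = [{"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]}]
print(match_patterns("Pump failed in San Francisco".split(), patternList))  # [('GPE', 3, 5)]
```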

__call__(text, extract=True, screen=False)

Find all token sequences matching the supplied pattern

Parameters:: text – string, the text that needs to be processed
Returns:: None

abstract extractInformation()

Extract information; must be implemented by subclasses

Parameters:: None
Returns:: None
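The rendered `abstract` marker indicates that concrete workflows override this method. A minimal sketch of that pattern with Python's `abc` module follows; `WorkflowSketch` and `FailureFinder` are hypothetical stand-ins, and the sketch's `__call__` omits the `extract` and `screen` arguments of the real one.

```python
from abc import ABC, abstractmethod
from typing import List


class WorkflowSketch(ABC):
    """Hypothetical stand-in: the base class drives processing, subclasses extract."""

    def __init__(self) -> None:
        self.text = ""
        self.results: List[str] = []

    def __call__(self, text: str) -> None:
        self.text = text
        self.extractInformation()  # dispatched to the concrete subclass

    @abstractmethod
    def extractInformation(self) -> None:
        """Concrete workflows must override this."""


class FailureFinder(WorkflowSketch):
    def extractInformation(self) -> None:
        # Toy extraction: collect every word that reads "failed", ignoring case.
        self.results = [w for w in self.text.split() if w.lower() == "failed"]


finder = FailureFinder()
finder("Pump failed and valve FAILED")
print(finder.results)  # ['failed', 'FAILED']
```

Instantiating `WorkflowSketch()` directly raises `TypeError`, which is how `abc` enforces the override.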

visualize()

Visualize the processed document

Parameters:: None
Returns:: None

isPassive(token)

Check whether the token is passive

Parameters:: token – spacy.tokens.Token, the token of the doc
Returns:: isPassive, True if the token is passive
Return type:: bool
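One common heuristic for passive voice under spaCy's English dependency labels is to look for a passive dependent (`auxpass`, `nsubjpass`, `csubjpass`) among the predicate's children. The sketch below assumes that heuristic, which the docstring does not confirm, and uses a hypothetical `Tok` class in place of `spacy.tokens.Token`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tok:
    """Hypothetical stand-in for spacy.tokens.Token (only text, dep_ and children)."""
    text: str
    dep_: str = ""
    children: List["Tok"] = field(default_factory=list)


# Passive dependency labels used by spaCy's English models.
PASSIVE_DEPS = {"auxpass", "nsubjpass", "csubjpass"}


def is_passive(token: Tok) -> bool:
    """Assumed heuristic: passive if any direct dependent has a passive label."""
    return any(child.dep_ in PASSIVE_DEPS for child in token.children)


# "The pump was damaged": "pump" is nsubjpass and "was" is auxpass of "damaged".
damaged = Tok("damaged", "ROOT", [Tok("pump", "nsubjpass"), Tok("was", "auxpass")])
print(is_passive(damaged))  # True
```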

isConjecture(token)

Check whether the token indicates conjecture

Parameters:: token – spacy.tokens.Token, the token of the doc; it should be the root of the Doc
Returns:: isConjecture, True if the token/sentence indicates conjecture
Return type:: bool

isNegation(token)

Check the negation status of the given token

Parameters:: token – spacy.tokens.Token, token from spacy.tokens.doc.Doc
Returns:: (neg, text), the negation status and the token text
Return type:: tuple
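Assuming negation is signalled by a child carrying the `neg` dependency label (the label spaCy's English models give to words such as "not"), the `(neg, text)` return value could be produced as below. `Tok` and `is_negation` are hypothetical, and this sketch only inspects the token's own children.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Tok:
    """Hypothetical stand-in for spacy.tokens.Token (only text, dep_ and children)."""
    text: str
    dep_: str = ""
    children: List["Tok"] = field(default_factory=list)


def is_negation(token: Tok) -> Tuple[bool, str]:
    """Assumed rule: the token is negated if one of its children has the 'neg' label."""
    neg = any(child.dep_ == "neg" for child in token.children)
    return neg, token.text


# "The valve did not open": "not" is a neg child of "open".
opened = Tok("open", "ROOT", [Tok("valve", "nsubj"), Tok("did", "aux"), Tok("not", "neg")])
print(is_negation(opened))  # (True, 'open')
```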

findVerb(doc)

Find the first verb in the doc

Parameters:: doc – spacy.tokens.doc.Doc, the document processed by the nlp pipeline
Returns:: token, the first token whose POS is VERB
Return type:: spacy.tokens.Token
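A minimal sketch of a first-verb scan, using a hypothetical `Tok` stand-in with a `pos_` attribute (as on a spaCy token). Returning `None` when no verb exists is an assumption, since the docstring does not cover that case.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Tok:
    """Hypothetical stand-in for spacy.tokens.Token (only text and pos_)."""
    text: str
    pos_: str


def find_verb(doc: Iterable[Tok]) -> Optional[Tok]:
    """Return the first token tagged VERB, or None if there is none (assumed)."""
    return next((token for token in doc if token.pos_ == "VERB"), None)


doc = [Tok("Operator", "NOUN"), Tok("replaced", "VERB"), Tok("and", "CCONJ"),
       Tok("tested", "VERB"), Tok("pump", "NOUN")]
print(find_verb(doc).text)  # replaced
```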

getCustomEnts(ents, labels)

Get the custom entities

Parameters:

ents – list, all entities from the processed doc
labels – list, list of labels used to select the custom entities from “ents”

Returns:

customEnts, the entities in “ents” associated with “labels”

Return type:

list
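Selecting entities by label amounts to filtering on each entity's `label_` attribute, which is where a spaCy `Span` stores its label string. A minimal sketch with a hypothetical `Ent` stand-in, using the class's default `entID` label `SSC`:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Ent:
    """Hypothetical stand-in for a spaCy entity Span (only text and label_)."""
    text: str
    label_: str


def get_custom_ents(ents: Iterable[Ent], labels: Iterable[str]) -> List[Ent]:
    """Keep the entities whose label is one of the requested labels, in document order."""
    wanted = set(labels)
    return [ent for ent in ents if ent.label_ in wanted]


ents = [Ent("pump 1A", "SSC"), Ent("Monday", "DATE"), Ent("valve", "SSC")]
print([e.text for e in get_custom_ents(ents, ["SSC"])])  # ['pump 1A', 'valve']
```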

getPhrase(ent, start, end, include=False)

Get the phrase for ent, including all of its left children

Parameters:

ent – Span, the ent to amend with all left children
start – int, the start index of ent
end – int, the end index of ent
include – bool, include ent in the returned expression if True

Returns:

status, the identified status

Return type:

Span or Token

getAmod(ent, start, end, include=False)

Get the adjectival modifier (amod) tokens for ent

Parameters:

ent – Span, the ent to amend with all left children
start – int, the start index of ent
end – int, the end index of ent
include – bool, include ent in the returned expression if True

Returns:

status, the identified status

Return type:

Span or Token

getAmodOnly(ent)

Get the texts of the amod tokens for ent

Parameters:: ent – Span, the ent to amend with all left children
Returns:: amod, the list of amod texts for ent
Return type:: list

getCompoundOnly(headEnt, ent)

Get the compounds for headEnt, excluding ent

Parameters:

headEnt – Span, the head entity of ent
ent – Span, the entity excluded from the returned compounds

Returns:

compDes, the list of compounds for headEnt

Return type:

list

getNbor(token)

Get the neighbor (nbor) of token; return None if the neighbor does not exist

Parameters:: token – Token, the token whose nbor is requested
Returns:: nbor, the requested nbor
Return type:: Token
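spaCy's `Token.nbor(i=1)` raises `IndexError` when the neighbor would fall outside the document, so a wrapper that returns `None` can simply catch that exception. A sketch with a hypothetical `Tok` class that mimics only this behavior:

```python
from typing import List, Optional


class Tok:
    """Hypothetical stand-in for spacy.tokens.Token: a word plus its position in the doc."""

    def __init__(self, words: List[str], i: int) -> None:
        self.words = words
        self.i = i
        self.text = words[i]

    def nbor(self, i: int = 1) -> "Tok":
        # Mirrors spaCy's behavior of raising IndexError outside the document.
        j = self.i + i
        if not 0 <= j < len(self.words):
            raise IndexError(f"token index {j} is out of range")
        return Tok(self.words, j)


def get_nbor(token: Tok) -> Optional[Tok]:
    """Return the next token, or None at the end of the document."""
    try:
        return token.nbor()
    except IndexError:
        return None


words = ["pump", "tripped"]
print(get_nbor(Tok(words, 0)).text, get_nbor(Tok(words, 1)))  # tripped None
```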

validSent(sent)

Check whether the sentence has a valid structure, i.e., whether it contains a subject or an object

Parameters:: sent – Span, sentence from the user-provided text
Returns:: valid, False if the sentence has neither a subject nor an object
Return type:: bool
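A minimal sketch of the check, assuming the dependency label sets below: the object labels mirror the `deps` defaults of `findObj`, while the subject labels are an assumption. `Tok` is a hypothetical stand-in for a spaCy token.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Tok:
    """Hypothetical stand-in for spacy.tokens.Token (only text and dep_)."""
    text: str
    dep_: str


# Object labels mirror the findObj defaults; the subject labels are an assumption.
SUBJ_DEPS = {"nsubj", "nsubjpass", "csubj", "csubjpass"}
OBJ_DEPS = {"dobj", "pobj", "iobj", "obj", "obl"}


def valid_sent(sent: Iterable[Tok]) -> bool:
    """A sentence is valid if at least one token is a subject or an object."""
    return any(tok.dep_ in SUBJ_DEPS or tok.dep_ in OBJ_DEPS for tok in sent)


print(valid_sent([Tok("Pump", "nsubj"), Tok("failed", "ROOT")]))  # True
print(valid_sent([Tok("Inspection", "ROOT"), Tok("complete", "amod")]))  # False
```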

findLeftSubj(pred, passive)

Find the closest subject in the predicate’s left subtree or, recursively, in the left subtree of the predicate’s parent. Has a filter on organizations.

Parameters:

pred – spacy.tokens.Token, the predicate token
passive – bool, True if the predicate is passive

Returns:

subj, the token that represents the subject

Return type:

spacy.tokens.Token

findRightObj(pred, deps=['dobj', 'pobj', 'iobj', 'obj', 'obl', 'oprd'], exclPrepos=[])

Find the closest object in the predicate’s right subtree. Prepositional objects are skipped if the preposition is in the exclude list. Has a filter on organizations.

Parameters:

pred – spacy.tokens.Token, the predicate token
deps – list, the dependency labels that qualify as objects
exclPrepos – list, list of the excluded prepositions

findRightKeyword(pred, exclPrepos=[])

Find the closest keyword in the predicate’s right subtree. Prepositional objects are skipped if the preposition is in the exclude list. Has a filter on organizations.

Parameters:

pred – spacy.tokens.Token, the predicate token
exclPrepos – list, list of the excluded prepositions

findHealthStatus(root, deps)

Return the first child of root (root included) that matches the dependency list, using breadth-first search. Search stops after the first dependency match if firstDepOnly (used for subject search: do not “jump” over subjects)

Parameters:

root – spacy.tokens.Token, the root token
deps – list, the dependency list

Returns:

child, the token that represents the health status

Return type:

spacy.tokens.Token

isValidCausalEnts(ent)

Check whether the entity belongs to the valid causal entities

Parameters:

ent – list, list of entities

Returns:

valid, True if ent is a valid causal entity

Return type:

bool

getIndex(ent, entList)

Get the index of ent in entList

Parameters:

ent – Span, the ent whose index is requested
entList – list, list of entities

Returns:

idx, the index of ent

Return type:

int

getConjuncts(entList)

Get a list of conjuncts from the entity list

Parameters:: entList – list, list of entities
Returns:: conjunctList, list of conjuncts
Return type:: list

collectSents(doc)

Collect data of the matched sentences for visualization

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the nlp pipeline

extract(sents, predSynonyms=[], exclPrepos=[])

General extraction method

Parameters:

sents – list, the list of sentences
predSynonyms – list, the list of predicate synonyms
exclPrepos – list, the list of excluded prepositions

Returns:

generator, yields each extracted causal relation as (subject tuple, predicate, object tuple)

Return type:

generator
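The docstring does not say what the subject and object tuples contain. As a rough sketch only, assuming each tuple holds a head token plus the tokens conjoined to it through `conj` arcs, and that the predicate is any token whose lemma is in `predSynonyms`, the generator could look like this. `Tok`, `with_conjuncts`, `find_obj` and `extract` here are hypothetical and much simpler than the class's `findSubj`/`findObj` logic.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Sequence, Tuple


@dataclass
class Tok:
    """Hypothetical stand-in for spacy.tokens.Token."""
    text: str
    dep_: str = ""
    lemma_: str = ""
    children: List["Tok"] = field(default_factory=list)


Relation = Tuple[Tuple[Tok, ...], Tok, Tuple[Tok, ...]]


def with_conjuncts(token: Tok) -> Tuple[Tok, ...]:
    """Return token followed by every token linked to it through 'conj' arcs."""
    found = [token]
    for child in token.children:
        if child.dep_ == "conj":
            found.extend(with_conjuncts(child))
    return tuple(found)


def find_obj(pred: Tok, excl_prepos: Sequence[str]) -> Optional[Tok]:
    """Prefer a direct object; otherwise take a pobj under a non-excluded preposition."""
    for child in pred.children:
        if child.dep_ == "dobj":
            return child
    for prep in pred.children:
        if prep.dep_ == "prep" and prep.text.lower() not in excl_prepos:
            for child in prep.children:
                if child.dep_ == "pobj":
                    return child
    return None


def extract(sents: Sequence[Sequence[Tok]], pred_synonyms: Sequence[str] = (),
            excl_prepos: Sequence[str] = ()) -> Iterator[Relation]:
    """Yield (subject tuple, predicate, object tuple) for each matching predicate."""
    for sent in sents:
        for pred in sent:
            if pred.lemma_ not in pred_synonyms:
                continue
            subj = next((c for c in pred.children if c.dep_ in ("nsubj", "nsubjpass")), None)
            obj = find_obj(pred, excl_prepos)
            if subj is not None and obj is not None:
                yield with_conjuncts(subj), pred, with_conjuncts(obj)


# "Pump and valve caused leakage"
valve = Tok("valve", "conj", "valve")
pump = Tok("Pump", "nsubj", "pump", [valve])
leakage = Tok("leakage", "dobj", "leakage")
caused = Tok("caused", "ROOT", "cause", [pump, leakage])
for subjs, pred, objs in extract([[pump, valve, caused, leakage]], ["cause"]):
    print([t.text for t in subjs], pred.text, [t.text for t in objs])
    # ['Pump', 'valve'] caused ['leakage']
```

Returning a generator lets callers stop early or stream relations from long documents without building the full list.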

bfs(root, deps)

Return the first child of root (root included) that matches entType and the dependency list, using breadth-first search. Search stops after the first dependency match if firstDepOnly (used for subject search: do not “jump” over subjects)

Parameters:

root – spacy.tokens.Token, the root token
deps – list, list of dependency labels

Returns:

child, the matched token

Return type:

spacy.tokens.Token
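The breadth-first search described above can be sketched with `collections.deque`: the root is tested first, then its children level by level. `Tok` is a hypothetical stand-in, and the `entType` filter and `firstDepOnly` behavior mentioned in the docstring are omitted because neither appears in the signature.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Tok:
    """Hypothetical stand-in for spacy.tokens.Token (only text, dep_ and children)."""
    text: str
    dep_: str = ""
    children: List["Tok"] = field(default_factory=list)


def bfs(root: Tok, deps: Iterable[str]) -> Optional[Tok]:
    """Return the first token, in breadth-first order from root, whose dep_ is in deps."""
    wanted = set(deps)
    queue = deque([root])  # root itself is checked first
    while queue:
        token = queue.popleft()
        if token.dep_ in wanted:
            return token
        # Children go to the back of the queue, so shallower tokens are visited first.
        queue.extend(token.children)
    return None


# "The pump was replaced after seal failure"
root = Tok("replaced", "ROOT", [
    Tok("pump", "nsubjpass", [Tok("The", "det")]),
    Tok("was", "auxpass"),
    Tok("after", "prep", [Tok("failure", "pobj", [Tok("seal", "compound")])]),
])
print(bfs(root, ["pobj"]).text)  # failure
print(bfs(root, ["det", "compound"]).text)  # The (depth 2, found before "seal" at depth 3)
```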

findSubj(pred, passive)

Find the closest subject in the predicate’s left subtree or, recursively, in the left subtree of the predicate’s parent. Has a filter on organizations.

Parameters:

pred – spacy.tokens.Token, the predicate token
passive – bool, True if the predicate token is passive

Returns:

subj, the token that represents the subject

Return type:

spacy.tokens.Token

findObj(pred, deps=['dobj', 'pobj', 'iobj', 'obj', 'obl'], exclPrepos=[])

Find the closest object in the predicate’s right subtree. Prepositional objects are skipped if the preposition is in the exclude list. Has a filter on organizations.

Parameters:

pred – spacy.tokens.Token, the predicate token
deps – list, the dependency labels that qualify as objects
exclPrepos – list, the list of prepositions that will be excluded

Returns:

obj, the token that represents the object

Return type:

spacy.tokens.Token

isValidKeyword(var, keywords)

Check whether var is a valid keyword

Parameters:

var – token
keywords – list/dict

Returns: True if var is a valid keyword among keywords
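Because `keywords` may be a list or a dict, one plausible reading is that dictionary values hold the keyword lists. The sketch below assumes that layout and compares the token's lemma (`lemma_`, as on a spaCy token); `Tok` and `is_valid_keyword` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Union


@dataclass
class Tok:
    """Hypothetical stand-in for spacy.tokens.Token (only text and lemma_)."""
    text: str
    lemma_: str


def is_valid_keyword(var: Tok, keywords: Union[Iterable[str], Dict[str, List[str]]]) -> bool:
    """True if the token's lemma is a keyword (dict values are assumed to be lists)."""
    if isinstance(keywords, dict):
        candidates = {word for words in keywords.values() for word in words}
    else:
        candidates = set(keywords)
    return var.lemma_ in candidates


print(is_valid_keyword(Tok("caused", "cause"), {"VERB": ["cause", "induce"]}))  # True
print(is_valid_keyword(Tok("pumps", "pump"), ["cause", "lead"]))  # False
```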

getStatusForSubj(ent, include=False)

Get the status for nsubj/nsubjpass ent

Parameters:

ent – Span, the nsubj/nsubjpass ent that will be used to search for the status
include – bool, include ent in the returned expression if True

Returns:

status, the identified status

Return type:

Span or Token

getStatusForObj(ent, include=False)

Get the status for pobj/dobj ent

Parameters:

ent – Span, the pobj/dobj ent that will be used to search for the status
include – bool, include ent in the returned expression if True

Returns:

status, the identified status

Return type:

Span or Token

getStatusForPobj(ent, include=False)

Get the status for an ent whose root has the pobj dependency

Parameters:

ent – Span, the span of entity
include – bool, ent will be included in returned status if True

Returns:

Span or Token, the identified health status