src.dackar.workflows.WorkflowBase¶
Created in April 2024
@author: wangc, mandd
Attributes¶
Classes¶
Base Class for Workflow Analysis
Module Contents¶
- class src.dackar.workflows.WorkflowBase.WorkflowBase(nlp, entID='SSC', causalKeywordID='causal', *args, **kwargs)[source]¶
Bases:
object
Base Class for Workflow Analysis
- _causalNames = ['cause', 'cause health status', 'causal keyword', 'effect', 'effect health status', 'sentence',...[source]¶
- textProcess()[source]¶
Function to clean text
- Parameters:
None
- Returns:
procObj, DACKAR.Preprocessing object
- getKeywords(filename, columnNames=None)[source]¶
Get the keywords from given file
- Parameters:
filename – str, the file name to read the keywords
- Returns:
dict, dictionary contains the keywords
- Return type:
kw
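The keyword file format is not specified here; a minimal sketch, assuming a CSV whose columns are keyword categories (an assumption — the actual DACKAR file layout may differ), illustrates the dictionary-of-lists shape that getKeywords is documented to return. The function name and the file-object parameter are illustrative, not the real signature:

```python
import csv
import io

def get_keywords(fileobj, column_names=None):
    # Sketch only: read a CSV whose columns are keyword categories and
    # return {column: [keywords]}.  The real method takes a filename and
    # an optional columnNames list.
    reader = csv.DictReader(fileobj)
    names = column_names or reader.fieldnames
    kw = {name: [] for name in names}
    for row in reader:
        for name in names:
            value = row.get(name, "").strip()
            if value:                      # skip blank cells
                kw[name].append(value.lower())
    return kw

sample = io.StringIO("causal,status\ncaused,degraded\nled to,failed\n")
print(get_keywords(sample))
# → {'causal': ['caused', 'led to'], 'status': ['degraded', 'failed']}
```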
- extractLemma(varList)[source]¶
Lemmatize the variable list
- Parameters:
varList – list, list of variables
- Returns:
list, list of lemmatized variables
- Return type:
lemmaList
- addKeywords(keywords, ktype)[source]¶
Method to update self._causalKeywords or self._statusKeywords
- Parameters:
keywords – dict, keywords that will be added to self._causalKeywords or self._statusKeywords
ktype – string, either ‘status’ or ‘causal’
- addEntityPattern(name, patternList)[source]¶
Add entity pattern, to extend doc.ents, similar function to self.extendEnt
- Parameters:
name – str, the name for the entity pattern.
patternList – list, the pattern list, for example:
{"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]}
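The pattern format follows spaCy's EntityRuler convention: each entry is a dict with a "label" (the entity label to assign) and a "pattern" (a list of token-attribute matchers, or a plain string for an exact phrase). A sketch of what a patternList might contain; the "SSC" entry is a hypothetical addition for illustration:

```python
# EntityRuler-style patterns: "label" is the entity label to assign,
# "pattern" is either a token-matcher list or an exact phrase string.
pattern_list = [
    {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]},
    # hypothetical single-phrase pattern for a plant component
    {"label": "SSC", "pattern": "pump"},
]
# workflow.addEntityPattern("my_ruler", pattern_list)  # illustrative usage
print(pattern_list[0]["label"])   # → GPE
```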
- __call__(text, extract=True, screen=False)[source]¶
Find all token sequences matching the supplied pattern
- Parameters:
text – string, the text that need to be processed
- Returns:
None
- isPassive(token)[source]¶
Check whether the token is part of a passive construction
- Parameters:
token – spacy.tokens.Token, the token of the doc
- Returns:
True, if the token is passive
- Return type:
isPassive
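The passive check is not spelled out here; a minimal sketch, using a stand-in token class instead of spaCy and assuming passivity is signaled by the nsubjpass/auxpass/csubjpass dependency labels (the actual DACKAR heuristic may inspect other labels):

```python
from dataclasses import dataclass, field

@dataclass
class FakeToken:
    # Minimal stand-in for spacy.tokens.Token: only the attributes
    # the passive check needs.
    text: str
    dep_: str
    children: list = field(default_factory=list)

def is_passive(token):
    # A verb is treated as passive when it, or one of its children,
    # carries a passive dependency label.
    passive_deps = {"nsubjpass", "auxpass", "csubjpass"}
    if token.dep_ in passive_deps:
        return True
    return any(child.dep_ in passive_deps for child in token.children)

aux = FakeToken("was", "auxpass")
verb = FakeToken("damaged", "ROOT", children=[aux])
print(is_passive(verb))   # → True (the auxpass child marks the clause passive)
```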
- isConjecture(token)[source]¶
Check whether the token indicates a conjecture
- Parameters:
token – spacy.tokens.Token, the token of the doc, the token should be the root of the Doc
- Returns:
True, if the token/sentence indicates conjecture
- Return type:
isConjecture
- isNegation(token)[source]¶
Check negation status of given token
- Parameters:
token – spacy.tokens.Token, token from spacy.tokens.doc.Doc
- Returns:
tuple, the negation status and the token text
- Return type:
(neg, text)
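The (neg, text) tuple can be sketched with the same stand-in token idea: scan the token's children for a "neg" dependency (an assumption about the heuristic; the real method may also look at ancestor or sibling tokens):

```python
from dataclasses import dataclass, field

@dataclass
class FakeToken:
    # Minimal stand-in for spacy.tokens.Token with just the attributes
    # the negation check needs.
    text: str
    dep_: str
    children: list = field(default_factory=list)

def is_negation(token):
    # Return (neg, text): whether a 'neg' child exists, and its text.
    for child in token.children:
        if child.dep_ == "neg":
            return True, child.text
    return False, ""

neg = FakeToken("not", "neg")
verb = FakeToken("operating", "ROOT", children=[neg])
print(is_negation(verb))   # → (True, 'not')
```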
- findVerb(doc)[source]¶
Find the first verb in the doc
- Parameters:
doc – spacy.tokens.doc.Doc, the processed document using nlp pipelines
- Returns:
spacy.tokens.Token, the token that has VERB pos
- Return type:
token
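A minimal sketch of a first-verb scan, iterating the doc in order and matching on the coarse part-of-speech tag (stand-in tokens instead of a real spaCy Doc):

```python
from dataclasses import dataclass

@dataclass
class FakeToken:
    # Stand-in for spacy.tokens.Token: only text and pos_ are needed.
    text: str
    pos_: str

def find_verb(doc):
    # Return the first token tagged VERB, or None when there is none.
    for token in doc:
        if token.pos_ == "VERB":
            return token
    return None

doc = [FakeToken("The", "DET"), FakeToken("pump", "NOUN"),
       FakeToken("failed", "VERB")]
print(find_verb(doc).text)   # → failed
```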
- getCustomEnts(ents, labels)[source]¶
Get the custom entities
- Parameters:
ents – list, all entities from the processed doc
labels – list, list of labels to be used to get the custom entities out of “ents”
- Returns:
list, the customEnts associates with the “labels”
- Return type:
customEnts
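Filtering entities by label is a one-line comprehension; a sketch with stand-in spans (the real method receives spaCy Span objects from doc.ents):

```python
from dataclasses import dataclass

@dataclass
class FakeSpan:
    # Stand-in for spacy.tokens.Span: only text and label_ are needed.
    text: str
    label_: str

def get_custom_ents(ents, labels):
    # Keep only the entities whose label is in the requested label list.
    return [ent for ent in ents if ent.label_ in labels]

ents = [FakeSpan("pump", "SSC"), FakeSpan("Monday", "DATE")]
print([e.text for e in get_custom_ents(ents, ["SSC"])])   # → ['pump']
```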
- getPhrase(ent, start, end, include=False)[source]¶
Get the phrase for ent with all left children
- Parameters:
ent – Span, the ent to amend with all left children
start – int, the start index of ent
end – int, the end index of ent
include – bool, include ent in the returned expression if True
- Returns:
Span or Token, the identified status
- Return type:
status
- getAmod(ent, start, end, include=False)[source]¶
Get amod tokens for ent
- Parameters:
ent – Span, the ent to amend with all left children
start – int, the start index of ent
end – int, the end index of ent
include – bool, include ent in the returned expression if True
- Returns:
Span or Token, the identified status
- Return type:
status
- getAmodOnly(ent)[source]¶
Get amod tokens texts for ent
- Parameters:
ent – Span, the ent to amend with all left children
- Returns:
list, the list of amods for ent
- Return type:
amod
- getCompoundOnly(headEnt, ent)[source]¶
Get the compounds for headEnt except ent
- Parameters:
headEnt – Span, the head entity to ent
ent – Span, the entity to exclude from the compounds
- Returns:
list, the list of compounds for head ent
- Return type:
compDes
- getNbor(token)[source]¶
Method to get the neighbor (nbor) of a token; returns None if the neighbor does not exist
- Parameters:
token – Token, the provided Token to request nbor
- Returns:
Token, the requested nbor
- Return type:
nbor
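spaCy's Token.nbor() raises IndexError at the document boundary, so a None-returning wrapper is the natural sketch; the stand-in token below reproduces that boundary behavior:

```python
class FakeToken:
    # Stand-in for spacy.tokens.Token: nbor() raises IndexError at the
    # document boundary, just like the real spaCy API.
    def __init__(self, doc, i):
        self.doc, self.i = doc, i
    def nbor(self, offset=1):
        j = self.i + offset
        if not 0 <= j < len(self.doc):
            raise IndexError("nbor out of range")
        return self.doc[j]

def get_nbor(token):
    # Safe neighbor lookup: return None instead of raising when the
    # token sits at the edge of the doc.
    try:
        return token.nbor()
    except IndexError:
        return None

doc = []
doc.extend(FakeToken(doc, i) for i in range(2))
print(get_nbor(doc[1]))   # → None (last token has no right neighbor)
```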
- validSent(sent)[source]¶
Check if the sentence has valid structure, either contains subject or object
- Parameters:
sent – Span, sentence from user provided text
- Returns:
bool, False if the sentence has neither a subject nor an object.
- Return type:
valid
- findLeftSubj(pred, passive)[source]¶
Find closest subject in predicates left subtree or predicates parent’s left subtree (recursive). Has a filter on organizations.
- Parameters:
pred – spacy.tokens.Token, the predicate token
passive – bool, True if passive
- Returns:
spacy.tokens.Token, the token that represent subject
- Return type:
subj
- findRightObj(pred, deps=['dobj', 'pobj', 'iobj', 'obj', 'obl', 'oprd'], exclPrepos=[])[source]¶
Find closest object in predicates right subtree. Skip prepositional objects if the preposition is in exclude list. Has a filter on organizations.
- Parameters:
pred – spacy.tokens.Token, the predicate token
deps – list, list of dependency labels used to identify the object
exclPrepos – list, list of the excluded prepositions
- findRightKeyword(pred, exclPrepos=[])[source]¶
Find the closest keyword in the predicate's right subtree. Skip prepositional objects if the preposition is in the exclude list. Has a filter on organizations.
- Parameters:
pred – spacy.tokens.Token, the predicate token
exclPrepos – list, list of the excluded prepositions
- findHealthStatus(root, deps)[source]¶
Return first child of root (included) that matches dependency list by breadth first search. Search stops after first dependency match if firstDepOnly (used for subject search - do not “jump” over subjects)
- Parameters:
root – spacy.tokens.Token, the root token
deps – list, the dependency list
- Returns:
token, the token represents the health status
- Return type:
child
- isValidCausalEnts(ent)[source]¶
Check whether the entity belongs to the valid causal entities
- Parameters:
ent – list, list of entities
- Returns:
bool, True if the entities are valid causal entities
- Return type:
valid
- getIndex(ent, entList)[source]¶
Get index for ent in entList
- Parameters:
ent – Span, ent that is used to get index
entList – list, list of entities
- Returns:
int, the index for ent
- Return type:
idx
- getConjuncts(entList)[source]¶
Get a list of conjuncts from entity list
- Parameters:
entList – list, list of entities
- Returns:
list, list of conjuncts
- Return type:
conjunctList
- collectSents(doc)[source]¶
Collect data of matched sentences that can be used for visualization
- Parameters:
doc – spacy.tokens.doc.Doc, the processed document using nlp pipelines
- extract(sents, predSynonyms=[], exclPrepos=[])[source]¶
General extraction method
- Parameters:
sents – list, the list of sentences
predSynonyms – list, the list of predicate synonyms
exclPrepos – list, the list of excluded prepositions
- Returns:
generator, the extracted causal relation
- Return type:
(subject tuple, predicate, object tuple)
- bfs(root, deps)[source]¶
Return first child of root (included) that matches entType and dependency list by breadth first search. Search stops after first dependency match if firstDepOnly (used for subject search - do not “jump” over subjects)
- Parameters:
root – spacy.tokens.Token, the root token
deps – list, list of dependency
- Returns:
spacy.tokens.Token, the matched token
- Return type:
child
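The breadth-first search described above can be sketched with stand-in tokens: visit the root first, then its children level by level, and return the first token whose dependency label is in the requested list (the entType filtering mentioned in the docstring is omitted here for brevity):

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class FakeToken:
    # Stand-in for spacy.tokens.Token with a dependency label and children.
    text: str
    dep_: str
    children: list = field(default_factory=list)

def bfs(root, deps):
    # Breadth-first search: return the first token (root included)
    # whose dependency label is in deps, or None if no match.
    queue = deque([root])
    while queue:
        token = queue.popleft()
        if token.dep_ in deps:
            return token
        queue.extend(token.children)
    return None

subj = FakeToken("pump", "nsubj")
obj = FakeToken("seal", "dobj")
root = FakeToken("damaged", "ROOT", children=[subj, obj])
print(bfs(root, ["dobj"]).text)   # → seal
```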
- findSubj(pred, passive)[source]¶
Find closest subject in predicates left subtree or predicates parent’s left subtree (recursive). Has a filter on organizations.
- Parameters:
pred – spacy.tokens.Token, the predicate token
passive – bool, True if the predicate token is passive
- Returns:
spacy.tokens.Token, the token that represents subject
- Return type:
subj
- findObj(pred, deps=['dobj', 'pobj', 'iobj', 'obj', 'obl'], exclPrepos=[])[source]¶
Find closest object in predicates right subtree. Skip prepositional objects if the preposition is in exclude list. Has a filter on organizations.
- Parameters:
pred – spacy.tokens.Token, the predicate token
exclPrepos – list, the list of prepositions that will be excluded
- Returns:
spacy.tokens.Token, the token that represents the object
- Return type:
obj
- isValidKeyword(var, keywords)[source]¶
Check whether the variable is a valid keyword
- Parameters:
var – token
keywords – list/dict
- Returns:
True if the var is valid among the keywords
- getStatusForSubj(ent, include=False)[source]¶
Get the status for nsubj/nsubjpass ent
- Parameters:
ent – Span, the nsubj/nsubjpass ent that will be used to search status
include – bool, include ent in the returned expression if True
- Returns:
Span or Token, the identified status
- Return type:
status