src.dackar.workflows.RuleBasedMatcher

Created on March, 2022

@author: wangc, mandd

Attributes

logger

Classes

RuleBasedMatcher

Rule Based Matcher Class

Module Contents

src.dackar.workflows.RuleBasedMatcher.logger[source]
class src.dackar.workflows.RuleBasedMatcher.RuleBasedMatcher(nlp, entID='SSC', causalKeywordID='causal', *args, **kwargs)[source]

Bases: src.dackar.workflows.WorkflowBase.WorkflowBase

Rule Based Matcher Class

reset()[source]

Reset rule-based matcher

extractInformation()[source]

Extract information

Parameters:

None

Returns:

None

extractHealthStatus(matchedSents, predSynonyms=[], exclPrepos=[])[source]

Extract health status and relation

Parameters:
  • matchedSents – list, the matched sentences

  • predSynonyms – list, predicate synonyms

  • exclPrepos – list, the prepositions to exclude

findHealthStatus(root, deps)[source]

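Return the first child of root (root included) whose dependency label matches the dependency list, using breadth-first search. If firstDepOnly is set, the search stops after the first dependency match (used for subject search, so that subjects are not "jumped" over); note that firstDepOnly does not appear in the signature shown here.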

Parameters:
  • root – spacy.tokens.Token, the root token

  • deps – list, the dependency list

Returns:

token, the token that represents the health status

Return type:

child
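
The breadth-first search described for findHealthStatus can be illustrated with a minimal sketch. It does not reproduce the library's code: `Node` is a hypothetical stand-in for spacy.tokens.Token (only `text`, `dep_` and `children` are modelled), `find_first_by_dep` is an illustrative name, and the tree is built by hand rather than produced by a parser.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for spacy.tokens.Token."""
    text: str
    dep_: str
    children: List["Node"] = field(default_factory=list)


def find_first_by_dep(root: Node, deps: List[str]) -> Optional[Node]:
    """Breadth-first search from root (root included) for the first node
    whose dependency label is in deps; returns None if nothing matches."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.dep_ in deps:
            return node
        # Children are visited only after all nodes at the current depth.
        queue.extend(node.children)
    return None


# Hand-built tree; the labels are chosen for illustration only.
tree = Node("was", "ROOT", [
    Node("pump", "nsubj", [Node("The", "det")]),
    Node("degraded", "acomp"),
])
match = find_first_by_dep(tree, ["acomp", "attr"])
```

Because the queue is first-in, first-out, a match closer to the root is always returned before a deeper one.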

isValidCausalEnts(ent)[source]

Check whether the entity belongs to the valid causal entities

Parameters:

ent – list, list of entities

Returns:

valid – bool, True if the entity is a valid causal entity

getSSCEnt(entList, index, direction='left')[source]

Get the closest group of SSC entities

Parameters:
  • entList – list, list of entities

  • index – int, the start location of entity

  • direction – str, ‘left’ or ‘right’, the search direction

Returns:

the closest group of SSC entities

Return type:

ent
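
One possible reading of "closest group" is sketched below, under loudly stated assumptions: entities are modelled as hypothetical `(text, label)` tuples instead of spaCy spans, and a "group" is taken to be the first contiguous run of SSC-labelled entities found when walking from `index` in the given direction. The library may define the group differently.

```python
from typing import List, Tuple

Ent = Tuple[str, str]  # (text, label); hypothetical stand-in for a spaCy Span


def closest_group(ents: List[Ent], index: int, direction: str = "left",
                  label: str = "SSC") -> List[Ent]:
    """Return the nearest contiguous run of `label` entities (assumed meaning)."""
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    step = -1 if direction == "left" else 1
    i = index + step
    # Skip entities that do not carry the target label.
    while 0 <= i < len(ents) and ents[i][1] != label:
        i += step
    group = []
    # Collect the contiguous run of target-label entities.
    while 0 <= i < len(ents) and ents[i][1] == label:
        group.append(ents[i])
        i += step
    # Return the group in document order regardless of search direction.
    return group[::-1] if direction == "left" else group


ents = [("pump", "SSC"), ("motor", "SSC"), ("caused", "causal"),
        ("leak", "other"), ("valve", "SSC")]
left = closest_group(ents, 2, "left")
right = closest_group(ents, 2, "right")
```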

extractRelDep(matchedSents)[source]

Extract causal relations from the matched sentences

Parameters:

matchedSents – list, the list of matched sentences

Returns:

generator, the extracted causal relations

Return type:

(subject tuple, predicate, object tuple)
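
The documented return shape, a generator of (subject tuple, predicate, object tuple) items, can be consumed as in the sketch below. `iter_relations` is a hypothetical generator that only mimics that shape; it is not the library method, and the sample row is made up for illustration.

```python
from typing import Iterator, List, Tuple

Triple = Tuple[Tuple[str, ...], str, Tuple[str, ...]]


def iter_relations(rows: List[Triple]) -> Iterator[Triple]:
    """Hypothetical generator yielding (subject tuple, predicate, object tuple)."""
    for subj, pred, obj in rows:
        yield subj, pred, obj


# Illustrative data only.
rows = [(("pump",), "caused", ("leak", "vibration"))]
relations = [
    f"{' '.join(s)} -[{p}]-> {', '.join(o)}"
    for s, p, o in iter_relations(rows)
]
```

Since the result is a generator, it is exhausted after one pass; wrap it in `list(...)` if the relations are needed more than once.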

identifyCauseEffectForNsuj(cRoot, cEntsIndex, causalEnts, orderedEnts, validRightSSCEnts, reverse=False)[source]

Identify the cause effect pairs for nsubj

Parameters:
  • cRoot – Token, the root of causal entity

  • cEntsIndex – int, the index for the causal entity

  • causalEnts – list, the list of causal entities

  • orderedEnts – list, the entities ordered by their locations in the Doc

  • validRightSSCEnts – list, the valid list of entities on the right of given causal entity

  • reverse – bool, reverse the cause effect relation if True

Returns:

tuple, the cause-effect pairs as (causeList, effectList, skipCEnts)

identifyCauseEffectForAttr(cRoot, validLeftSSCEnts, validRightSSCEnts, reverse=False)[source]

Identify the cause effect pairs for attr

Parameters:
  • cRoot – Token, the root of causal entity

  • validLeftSSCEnts – list, the valid list of entities on the left of given causal entity

  • validRightSSCEnts – list, the valid list of entities on the right of given causal entity

  • reverse – bool, reverse the cause effect relation if True

Returns:

tuple, the cause-effect pairs as (causeList, effectList)
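
The effect of the reverse flag can be sketched as a simple role swap. This is an assumption-laden illustration: `orient_pairs` is a hypothetical helper, entities are plain strings, and the default orientation (left-hand entities are causes) is assumed rather than taken from the library.

```python
from typing import List, Tuple


def orient_pairs(left: List[str], right: List[str],
                 reverse: bool = False) -> Tuple[List[str], List[str]]:
    """Return (causeList, effectList).

    Assumption for this sketch: entities left of the causal keyword are the
    causes unless `reverse` is True, in which case the roles are swapped.
    """
    cause, effect = list(left), list(right)
    if reverse:
        cause, effect = effect, cause
    return cause, effect


# "pump ... results in ... leak": the left side is the cause.
forward = orient_pairs(["pump"], ["leak"])
# "leak ... is the result of ... pump": the roles are swapped.
backward = orient_pairs(["leak"], ["pump"], reverse=True)
```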

identifyCauseEffectForClauseModifier(cRoot, rootCause, validLeftSSCEnts, validRightSSCEnts, reverse=False)[source]

Identify the cause effect pairs for clause modifier

Parameters:
  • cRoot – Token, the root of causal entity

  • rootCause – tuple, list of causes

  • validLeftSSCEnts – list, the valid list of entities on the left of given causal entity

  • validRightSSCEnts – list, the valid list of entities on the right of given causal entity

  • reverse – bool, reverse the cause effect relation if True

Returns:

cause effect pairs, tuple, (causeList, effectList)

splitEntsFollowingNounCausal(cRoot, validRightSSCEnts)[source]

Split the entities into cause and effect

Parameters:
  • cRoot – Token, the root of causal entity

  • validRightSSCEnts – list, the valid list of entities on the right of given causal entity

Returns:

cause effect pairs, tuple, (cause, effect)

getRightSSCEnts(cEnt, orderedEnts)[source]

Get the SSC entities to the right of the causal entity

Parameters:
  • cEnt – Span, causal entity

  • orderedEnts – list, the entities ordered by their locations in the Doc

Returns:

list, list of SSC entities

Return type:

selEnts

getLeftSSCEnts(cEnt, orderedEnts)[source]

Get the SSC entities to the left of the causal entity

Parameters:
  • cEnt – Span, causal entity

  • orderedEnts – list, the entities ordered by their locations in the Doc

Returns:

list, list of SSC entities

Return type:

selEnts
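
Selecting entities to the left or right of a causal entity can be sketched with token offsets. The `Ent` tuple below is a hypothetical stand-in for a spaCy Span (spaCy spans do expose `start` and `end` token offsets, with `end` exclusive), and the filtering by an "SSC" label is an assumption about how the selection works.

```python
from typing import List, NamedTuple


class Ent(NamedTuple):
    """Hypothetical stand-in for a spaCy Span."""
    text: str
    label: str
    start: int
    end: int  # exclusive token offset


def left_of(c_ent: Ent, ordered: List[Ent], label: str = "SSC") -> List[Ent]:
    """Entities with the given label that end before the causal entity starts."""
    return [e for e in ordered if e.label == label and e.end <= c_ent.start]


def right_of(c_ent: Ent, ordered: List[Ent], label: str = "SSC") -> List[Ent]:
    """Entities with the given label that start after the causal entity ends."""
    return [e for e in ordered if e.label == label and e.start >= c_ent.end]


# Entities ordered by their location in the document (illustrative offsets).
ordered = [
    Ent("pump", "SSC", 1, 2),
    Ent("caused", "causal", 3, 4),
    Ent("valve", "SSC", 6, 7),
    Ent("pipe", "SSC", 9, 10),
]
causal = ordered[1]
left = [e.text for e in left_of(causal, ordered)]
right = [e.text for e in right_of(causal, ordered)]
```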

selectValidEnts(ents, cEnt)[source]

Select the valid entities that are within the subtree of the causal entity

Parameters:
  • ents – list, the list of entities

  • cEnt – Span, causal entity

Returns:

list, list of valid entities

Return type:

validEnts
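
A subtree membership check of this kind can be sketched without spaCy by encoding the dependency tree as a list of head indices. In spaCy itself, `Token.subtree` and `Span.root` provide the corresponding information. The helper names, the `(text, root_index)` entity tuples and the hand-built tree below are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def subtree(heads: List[int], root: int) -> Set[int]:
    """Token indices in the subtree rooted at `root` (root included).

    heads[i] is the index of token i's head; the sentence root points to itself.
    """
    children: Dict[int, List[int]] = {}
    for i, h in enumerate(heads):
        if i != h:
            children.setdefault(h, []).append(i)
    seen: Set[int] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, []))
    return seen


def select_in_subtree(ents: List[Tuple[str, int]], heads: List[int],
                      causal_root: int) -> List[Tuple[str, int]]:
    """Keep entities whose root token index lies inside the causal subtree."""
    inside = subtree(heads, causal_root)
    return [e for e in ents if e[1] in inside]


# Tokens: 0 pump, 1 failed, 2 because, 3 valve, 4 leaked (hand-built tree).
heads = [1, 1, 4, 4, 1]
ents = [("pump", 0), ("valve", 3)]
selected = select_in_subtree(ents, heads, causal_root=4)
```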

collectExtactedCausals(cause, effect, causalKeyword, sent, conjecture=None)[source]

Collect the extracted causal relations

Parameters:
  • cause – list, list of causes

  • effect – list, list of effects

  • causalKeyword – str, causal keyword

  • sent – spacy.tokens.Span, sentence with identified causal relations

Returns:

None

collectCauseEffectSents(doc)[source]

Collect data of matched sentences that contain cause-effect keywords

Parameters:

doc – spacy.tokens.doc.Doc, the document processed by the nlp pipeline
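
The idea of keeping only sentences that contain cause-effect keywords can be sketched on plain strings. This is a simplification under stated assumptions: the library works on a spaCy Doc and presumably relies on labelled entities, whereas `sents_with_keywords` is a hypothetical helper that uses a regular expression, and the keyword list is made up.

```python
import re
from typing import List


def sents_with_keywords(sentences: List[str], keywords: List[str]) -> List[str]:
    """Return the sentences containing at least one keyword (case-insensitive)."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [s for s in sentences if pattern.search(s)]


sentences = [
    "The pump failed due to a worn seal.",
    "The valve was inspected.",
]
matched = sents_with_keywords(sentences, ["due to", "caused"])
```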

getHealthStatusForPobj(ent, include=False)[source]

Get the health status for an entity whose root is a pobj

Parameters:
  • ent – Span, the span of entity

  • include – bool, ent will be included in returned status if True

Returns:

Span or Token, the identified health status

getHealthStatusForSubj(ent, entHS, sent, causalStatus, predSynonyms, include=False)[source]

Get the status for nsubj/nsubjpass ent

Parameters:
  • ent – Span, the nsubj/nsubjpass ent that will be used to search status

  • entHS – Span, the entHS that the status will be associated with

  • sent – Span, the sent that includes the ent, entHS and status

  • causalStatus – bool, the causal status for the ent

  • predSynonyms – list, predicate synonyms

  • include – bool, include ent in the returned expression if True

Returns:

Span or Token, the identified status

Return type:

healthStatus

getHealthStatusForObj(ent, entHS, sent, causalStatus, predSynonyms, include=False)[source]

Get the status for pobj/dobj ent

Parameters:
  • ent – Span, the pobj/dobj ent that will be used to search status

  • entHS – Span, the entHS that the status will be associated with

  • sent – Span, the sent that includes the ent, entHS and status

  • causalStatus – bool, the causal status for the ent

  • predSynonyms – list, predicate synonyms

  • include – bool, include ent in the returned expression if True

Returns:

Span or Token, the identified status

Return type:

healthStatus