Demo for Rule-Based Natural Language Processing

1. Set up the path so that the NLP modules can be found

[1]:

import os
import sys

cwd = os.getcwd()
frameworkDir = os.path.abspath(os.path.join(cwd, os.pardir, 'src'))
sys.path.append(frameworkDir)
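
The cell above appends the directory unconditionally, so rerunning it adds duplicate entries, and a wrong location only shows up later as an import error. A minimal sketch of a more defensive variant follows; the helper name `addToPath` is hypothetical and not part of DACKAR.

```python
import os
import sys

def addToPath(directory):
    """Append directory to sys.path once; return True if it was added.

    Hypothetical helper for illustration, not part of DACKAR.
    """
    directory = os.path.abspath(directory)
    if not os.path.isdir(directory):
        # A missing directory would otherwise surface later as an ImportError
        return False
    if directory in sys.path:
        return False
    sys.path.append(directory)
    return True

# Same location as in the cell above: the 'src' folder beside the current directory
frameworkDir = os.path.join(os.getcwd(), os.pardir, 'src')
added = addToPath(frameworkDir)
```

If `added` is False, either the path was already registered or the `src` folder does not exist relative to the current working directory.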

2. Load the spaCy model

[2]:

import spacy
nlp = spacy.load("en_core_web_lg", exclude=[])
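
`spacy.load` raises an error when the model package is not installed. spaCy models installed through pip are ordinary Python packages, so their presence can be checked beforehand with the standard library. The sketch below assumes that installation route; the helper name `modelInstalled` is hypothetical.

```python
import importlib.util

def modelInstalled(modelName):
    """Return True if a package with this name can be found for import."""
    return importlib.util.find_spec(modelName) is not None

if not modelInstalled("en_core_web_lg"):
    print("Model missing; install it with: python -m spacy download en_core_web_lg")
```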

3. Load other modules

[3]:

import pandas as pd

4. Import NLP modules

[4]:

from dackar.workflows.RuleBasedMatcher import RuleBasedMatcher
from dackar import config
from dackar.utils.nlp.nlp_utils import generatePatternList

5. Set up logging

[5]:

import logging
logging.basicConfig(
    format='%(asctime)s %(name)-20s %(levelname)-8s %(message)s',
    datefmt='%d-%b-%y %H:%M:%S',
    level=logging.DEBUG,
)
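
With the root level at DEBUG, every DACKAR logger reports in full detail. Loggers are hierarchical, so the output of one package can be reduced without touching the rest. The sketch below assumes that DACKAR does not set explicit levels on its own child loggers; the logger names are taken from the log output in this notebook.

```python
import logging

# Only show warnings and errors from DACKAR.
# Child loggers such as dackar.workflows.RuleBasedMatcher inherit this level
# unless they set one themselves.
logging.getLogger("dackar").setLevel(logging.WARNING)
```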

6. Read and process entities

[6]:

# Read the entity names from the CSV file and remove duplicates
entityFile = config.nlpConfig['files']['entity_file']
ents = set(pd.read_csv(entityFile).values.ravel().tolist())
label = "pump_component"
entId = "SSC"
# Build lemma-based patterns for the entity ruler
patternsOPM = generatePatternList(ents, label=label, id=entId, nlp=nlp, attr="LEMMA")
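
spaCy's entity ruler accepts patterns as dictionaries with a `label`, a token-level `pattern`, and an optional `id`. The sketch below builds entries in that format by hand to show the shape of the data. Whether `generatePatternList` returns exactly this structure is an assumption, and the sketch matches on lowercase text (`LOWER`) instead of lemmas, so it is a simplification and not a replacement. The helper name `buildPatterns` is hypothetical.

```python
def buildPatterns(terms, label, entId):
    """Build entity-ruler style token patterns from plain strings (illustrative only)."""
    patterns = []
    for term in sorted(terms):
        tokens = term.lower().split()
        if not tokens:
            continue  # skip empty strings
        patterns.append({
            "label": label,
            "pattern": [{"LOWER": tok} for tok in tokens],
            "id": entId,
        })
    return patterns

examplePatterns = buildPatterns({"pump", "pump shaft"}, label="pump_component", entId="SSC")
for p in examplePatterns:
    print(p)
```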

7. Read and process causal keywords

[7]:

causalLabel = "causal_keywords"
causalID = "causal"
patternsCausal = []
causalFilename = config.nlpConfig['files']['cause_effect_keywords_file']
ds = pd.read_csv(causalFilename, skipinitialspace=True)
# Each column holds one group of keywords; columns may have different lengths
for col in ds.columns:
    # Drop the NaN padding and duplicates (avoid shadowing the built-in vars())
    keywords = set(ds[col].dropna())
    patternsCausal.extend(generatePatternList(keywords, label=causalLabel, id=causalID, nlp=nlp, attr="LEMMA"))
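
The loop above treats each CSV column as an independent keyword group and drops the empty cells that arise when columns have unequal lengths. The same collection step can be sketched with the standard library on an in-memory CSV. The column names and keywords below are made-up sample data, not the content of the DACKAR keywords file.

```python
import csv
import io

# Made-up sample data: columns of unequal length leave empty cells
sampleCsv = io.StringIO(
    "cause-verbs, effect-verbs\n"
    "cause, result from\n"
    "lead to,\n"
)

reader = csv.DictReader(sampleCsv, skipinitialspace=True)
keywordsByColumn = {name: set() for name in reader.fieldnames}
for row in reader:
    for name, value in row.items():
        if value and value.strip():  # skip empty cells
            keywordsByColumn[name].add(value.strip())
```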

8. Create the rule-based matcher with the entity patterns and the causal keyword patterns

[8]:

name = 'ssc_entity_ruler'
matcher = RuleBasedMatcher(nlp, entID=entId, causalKeywordID=causalID)
matcher.addEntityPattern(name, patternsOPM)

causalName = 'causal_keywords_entity_ruler'
matcher.addEntityPattern(causalName, patternsCausal)

30-May-25 15:56:31 dackar.workflows.WorkflowBase INFO     Create instance of RuleBasedMatcher
30-May-25 15:56:33 dackar.utils.nlp.nlp_utils INFO     Model: core_web_lg, Language: en
30-May-25 15:56:33 dackar.utils.nlp.nlp_utils INFO     Available pipelines:pysbdSentenceBoundaries, tok2vec, tagger, parser, attribute_ruler, lemmatizer, mergePhrase, normEntities, initCoref, aliasResolver, anaphorCoref, anaphorEntCoref

9. Read the input text file (alternatively, provide a raw string)

[9]:

textFile = config.nlpConfig['files']['text_file']
with open(textFile, 'r') as ft:
    doc = ft.read()
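
Since the matcher works on a plain string, the text does not have to come from a file. A minimal sketch of a loader that accepts either form follows; the helper name `loadText` is hypothetical, and the example sentence is one of those that appear in the output of this notebook.

```python
import os

def loadText(source):
    """Return the file contents if source is an existing file path, else source itself."""
    if os.path.isfile(source):
        with open(source, 'r') as ft:
            return ft.read()
    return source

# A raw string is passed through unchanged
rawDoc = loadText("Rupture of pump bearings caused pump shaft degradation.")
```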

10. Process the raw text with the matcher

[10]:

matcher(doc)

30-May-25 15:56:33 dackar.workflows.RuleBasedMatcher INFO     Start to extract health status
30-May-25 15:56:33 dackar.workflows.RuleBasedMatcher WARNING  No status identified for "pump" in "Slight Vibrations is noticed - likely from pump shaft deflection.
"
30-May-25 15:56:33 dackar.workflows.RuleBasedMatcher WARNING  Entity "pump" dep_ is "xcomp" is not among valid list "[nsubj, nsubjpass, pobj, dobj, compound]"
30-May-25 15:56:33 dackar.workflows.RuleBasedMatcher WARNING  Entity "pump" dep_ is "xcomp" is not among valid list "[nsubj, nsubjpass, pobj, dobj, compound]"
30-May-25 15:56:33 dackar.workflows.RuleBasedMatcher WARNING  Entity "pump" dep_ is "advcl" is not among valid list "[nsubj, nsubjpass, pobj, dobj, compound]"
30-May-25 15:56:33 dackar.workflows.RuleBasedMatcher INFO     End of health status extraction!
30-May-25 15:56:33 dackar.workflows.RuleBasedMatcher INFO     Start to extract causal relation using OPM model information
30-May-25 15:56:33 dackar.workflows.RuleBasedMatcher INFO     End of causal relation extraction!
30-May-25 15:56:33 dackar.workflows.RuleBasedMatcher INFO     Start to use general extraction method to extract causal relation
30-May-25 15:56:33 dackar.workflows.RuleBasedMatcher INFO     End of causal relation extraction using general extraction method!

(bearings, caused, shaft degradation) (bearings, caused, shaft degradation) (inspection, revealed, degradation) (inspection, revealed, degradation) (they, caused, failure) (Low flow conditions, causing, cavitation) (Pump, keep, the check valves) (shaft, made, noise) (Pump, made, noises)

11. Access the extracted information from the matcher

[11]:

matcher._extractedCausals

[11]:

[[pump bearings,
  None,
  caused,
  shaft degradation,
  None,
  Rupture of pump bearings caused pump shaft degradation.,
  False],
 [pump bearings,
  None,
  caused,
  shaft degradation,
  None,
  Rupture of pump bearings caused pump shaft degradation and consequent flow reduction.,
  False],
 [power supply,
  None,
  due to,
  Pump,
  None,
  Pump test failed due to power supply failure.,
  False],
 [Pump,
  None,
  revealed,
  impeller,
  None,
  Pump inspection revealed excessive impeller degradation.,
  False],
 [Pump,
  None,
  revealed,
  impeller,
  None,
  Pump inspection revealed excessive impeller degradation likely due to cavitation.,
  True],
 [pump shaft,
  None,
  caused,
  pump,
  None,
  Several cracks on pump shaft were observed; they could have caused pump failure within few days.,
  True],
 [pump shaft,
  None,
  causing,
  motor,
  None,
  The pump shaft vibration appears to be causing the motor to vibrate as well.,
  False]]
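
Each row in this output has seven positions. Judging from the values shown, they appear to be the cause entity, an unfilled slot, the causal keyword, the effect entity, a second unfilled slot, the source sentence, and a flag that is True for hedged statements ("likely due to", "could have caused"). The column names in the sketch below are assumptions inferred from that output, not names confirmed by DACKAR, and the rows are written as plain strings, whereas the matcher may return spaCy objects.

```python
# Assumed meaning of the seven positions, inferred from the output above
columns = ["cause", "causeStatus", "keyword", "effect", "effectStatus", "sentence", "conjecture"]

# Two rows copied from the output above, written as plain strings
rows = [
    ["pump bearings", None, "caused", "shaft degradation", None,
     "Rupture of pump bearings caused pump shaft degradation.", False],
    ["pump shaft", None, "caused", "pump", None,
     "Several cracks on pump shaft were observed; they could have caused pump failure within few days.", True],
]

# Turn each positional row into a record keyed by column name
records = [dict(zip(columns, row)) for row in rows]

# Keep only the relations flagged as conjecture
conjectured = [r for r in records if r["conjecture"]]
for r in conjectured:
    print(f'{r["cause"]} --{r["keyword"]}--> {r["effect"]} (conjecture)')
```

With pandas already imported in this notebook, `pd.DataFrame(rows, columns=columns)` would give the same data in tabular form.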