Work Order Processing Demo

Set up the path and load DACKAR modules

[1]:
%reload_ext autoreload
%autoreload 2

# Import libraries
import os
import sys
import logging
import warnings
import spacy
from spacy import displacy
warnings.filterwarnings("ignore")

cwd = os.getcwd()
frameworkDir = os.path.abspath(os.path.join(cwd, os.pardir, 'src'))
sys.path.append(frameworkDir)

from dackar.workflows.WorkOrderProcessing import WorkOrderProcessing
from dackar.utils.nlp.nlp_utils import generatePatternList

logging.basicConfig(format='%(asctime)s %(name)-20s %(levelname)-8s %(message)s', datefmt='%d-%b-%y %H:%M:%S', level=logging.INFO)
logging.getLogger().setLevel(logging.ERROR)

Generate entity patterns and process text using the WorkOrderProcessing class

The following information will be identified:

  • Entities

  • Aliases associated with entities

  • Status associated with entities

[2]:
# Specify the entity label and ID
entLabel = "cws_component"        # user-defined entity label
entId = "OPM"                     # ID attached to every matched entity
# Load the language model
nlp = spacy.load("en_core_web_lg", exclude=[])
matcher = WorkOrderProcessing(nlp, entID=entId)

# Build patterns for the component IDs (matched on the LEMMA token attribute)
entIDList = ['1-91120-P1', '1-91120-PM1', '91120']
patternsEnts = generatePatternList(entIDList, label=entLabel, id=entId, nlp=nlp, attr="LEMMA")
matcher.addEntityPattern('cws_entity_ruler', patternsEnts)

text = "1-91120-P1, CLEAN PUMP AND MOTOR. 1-91120-PM1 REQUIRES OIL. 91120, CLEAN TRASH SCREEN"

# Run the pipeline and visualize the recognized entities
doc = nlp(text)
displacy.render(doc, style='ent', jupyter=True)
28-May-25 09:57:58 dackar.workflows.WorkflowBase INFO     Create instance of WorkOrderProcessing
28-May-25 09:58:00 dackar.utils.nlp.nlp_utils INFO     Model: core_web_lg, Language: en
28-May-25 09:58:00 dackar.utils.nlp.nlp_utils INFO     Available pipelines:pysbdSentenceBoundaries, tok2vec, tagger, parser, attribute_ruler, lemmatizer, mergePhrase, normEntities, initCoref, aliasResolver, anaphorCoref, anaphorEntCoref
1-91120-P1 cws_component , CLEAN PUMP AND MOTOR. 1-91120-PM1 cws_component REQUIRES OIL. 91120 cws_component , CLEAN TRASH SCREEN

Processing the work order cumulatively, one sentence at a time

[3]:
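# Reset the matcher before processing a new work order
matcher.reset()
# Naive sentence split on periods
sents = text.split('.')
for sent in sents:
    # Each call processes one sentence
    matcher(sent)
# Display the entity status table (note: private attribute)
matcher._entStatus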
28-May-25 09:58:00 dackar.workflows.WorkOrderProcessing INFO     Start to extract health status
28-May-25 09:58:00 dackar.workflows.WorkOrderProcessing INFO     End of health status extraction!
28-May-25 09:58:00 dackar.workflows.WorkOrderProcessing INFO     Start to extract causal relation using OPM model information
28-May-25 09:58:00 dackar.workflows.WorkOrderProcessing INFO     End of causal relation extraction!
28-May-25 09:58:00 dackar.workflows.WorkOrderProcessing INFO     Start to extract health status
28-May-25 09:58:00 dackar.workflows.WorkOrderProcessing INFO     End of health status extraction!
28-May-25 09:58:00 dackar.workflows.WorkOrderProcessing INFO     Start to extract causal relation using OPM model information
28-May-25 09:58:00 dackar.workflows.WorkOrderProcessing INFO     End of causal relation extraction!
28-May-25 09:58:00 dackar.workflows.WorkOrderProcessing INFO     Start to extract health status
28-May-25 09:58:00 dackar.workflows.WorkOrderProcessing INFO     End of health status extraction!
28-May-25 09:58:00 dackar.workflows.WorkOrderProcessing INFO     Start to extract causal relation using OPM model information
28-May-25 09:58:00 dackar.workflows.WorkOrderProcessing INFO     End of causal relation extraction!
[3]:
   entity       alias              entity_text        status                conjecture  negation  negation_text
0  1-91120-P1   unit 1 pump        unit 1 pump        CLEAN PUMP AND MOTOR  False       False
1  1-91120-PM1  unit 1 pump motor  unit 1 pump motor  OIL                   False       False
2  91120        pump               pump               CLEAN TRASH SCREEN    False       False
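
Splitting on '.' in the loop above leaves a leading space on every sentence after the first, and it would yield an empty final string if the text ended with a period. The following is a minimal sketch of a tidier split using only the standard library. `split_sentences` is a hypothetical helper, not part of DACKAR, and it assumes periods occur only as sentence terminators, so it would mis-split decimals or abbreviations.

```python
def split_sentences(text: str) -> list[str]:
    """Split on '.' and drop surrounding whitespace and empty fragments."""
    return [part.strip() for part in text.split(".") if part.strip()]


text = "1-91120-P1, CLEAN PUMP AND MOTOR. 1-91120-PM1 REQUIRES OIL. 91120, CLEAN TRASH SCREEN"
sents = split_sentences(text)
for sent in sents:
    # repr() makes any stray whitespace visible
    print(repr(sent))
```

The resulting `sents` list could then be fed to `matcher(sent)` exactly as in the cell above.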

Accessing attributes of entities

[4]:
for ent in doc.ents:
    print(ent.text, ent._.alias, ent.ent_id_, ent.label_)
1-91120-P1 unit 1 pump OPM cws_component
1-91120-PM1 unit 1 pump motor OPM cws_component
91120 pump OPM cws_component
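
When the entity attributes are needed downstream rather than just printed, they could be gathered into plain dictionaries. The following is a minimal sketch: `ents_to_records` is a hypothetical helper, not a DACKAR function, and it assumes only that each entity exposes `text`, `ent_id_`, `label_`, and the `_.alias` extension used in the loop above. The stand-in objects exist so the sketch runs without a language model; in the notebook one would pass `doc.ents` instead.

```python
from types import SimpleNamespace


def ents_to_records(ents):
    """Collect the text, alias, ID, and label of each entity into a dict."""
    return [
        {
            "text": ent.text,
            "alias": ent._.alias,
            "id": ent.ent_id_,
            "label": ent.label_,
        }
        for ent in ents
    ]


def make_stand_in(text, alias):
    """Build a stand-in object with the same attributes as a spaCy entity span."""
    return SimpleNamespace(
        text=text,
        ent_id_="OPM",
        label_="cws_component",
        _=SimpleNamespace(alias=alias),
    )


# Stand-ins mirroring the three entities printed above
stand_ins = [
    make_stand_in("1-91120-P1", "unit 1 pump"),
    make_stand_in("1-91120-PM1", "unit 1 pump motor"),
    make_stand_in("91120", "pump"),
]

records = ents_to_records(stand_ins)
for record in records:
    print(record)
```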