Alias Pipeline Demo
The alias pipeline (i.e., aliasResolver) is one of the internally developed natural language processing (NLP) pipelines. It annotates identified named entities with their aliases, which can be accessed through ent._.alias.
Note: by default, the alias file located at ./DACKAR/data/alias.csv is used. Users can also provide their own alias file through the alias_file keyword in the config file located at ./DACKAR/src/dackar/config/nlp_config_default.toml.
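As a rough illustration of what an alias lookup table could look like once loaded, the sketch below parses a two-column CSV (identifier, alias) with the standard library. The column layout, the lower-casing of keys, and the helper name load_alias_table are assumptions made for this example and are not taken from DACKAR; check ./DACKAR/data/alias.csv for the actual format.

```python
import csv
import io

# Hypothetical alias file content: one "identifier,alias" pair per row.
# The real layout of alias.csv may differ.
SAMPLE_ALIAS_CSV = """1-91120-P1,unit 1 pump
1-91120-PM1,unit 1 pump motor
91120,pump
"""

def load_alias_table(stream):
    """Read identifier/alias rows into a dict keyed by lower-cased identifier."""
    table = {}
    for row in csv.reader(stream):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        identifier, alias = row[0].strip(), row[1].strip()
        table[identifier.lower()] = alias
    return table

alias_table = load_alias_table(io.StringIO(SAMPLE_ALIAS_CSV))
print(alias_table["1-91120-pm1"])
```

To read from disk instead, an open file handle can be passed in place of the io.StringIO object.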
Set up
[ ]:
# Setup loading path, and load aliasResolver pipeline
import os, sys
cwd = os.getcwd()
frameworkDir = os.path.abspath(os.path.join(cwd, os.pardir, 'src'))
sys.path.append(frameworkDir)
# Load aliasResolver pipeline
from dackar.pipelines.CustomPipelineComponents import aliasResolver
from dackar.utils.nlp.nlp_utils import resetPipeline
# Load the GeneralEntity pipeline; aliasResolver only annotates entities that have already been identified
from dackar.pipelines.GeneralEntity import GeneralEntity
# Load pattern generation
from dackar.utils.nlp.nlp_utils import generatePatternList
# Load trained language model/pipeline from spacy, the language model/pipeline includes tok2vec, tagger, parser, attribute_ruler, lemmatizer, ner etc.
import spacy
from spacy import displacy
nlp = spacy.load("en_core_web_lg")
Reset pipeline and add aliasResolver
[ ]:
# The aliasResolver pipeline should always come after "entity_ruler"
pipelines = ['aliasResolver']
resetPipeline(nlp, pipelines)
print(nlp.pipeline)
[('tok2vec', <spacy.pipeline.tok2vec.Tok2Vec object at 0x33ef4ed50>), ('tagger', <spacy.pipeline.tagger.Tagger object at 0x33ef4f830>), ('parser', <spacy.pipeline.dep_parser.DependencyParser object at 0x33ef322d0>), ('attribute_ruler', <spacy.pipeline.attributeruler.AttributeRuler object at 0x33f0c0850>), ('lemmatizer', <spacy.lang.en.lemmatizer.EnglishLemmatizer object at 0x33f0c18d0>), ('ner', <spacy.pipeline.ner.EntityRecognizer object at 0x33ef32650>), ('aliasResolver', <function aliasResolver at 0x32be9d440>)]
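The ordering requirement noted in the cell above can be checked on a plain list of component names. The helper below is a minimal sketch; comes_after is a hypothetical name used only for this example. With a live spaCy object, nlp.pipe_names provides the list of names.

```python
def comes_after(pipe_names, component, predecessor):
    """Return True if `component` is present and placed after `predecessor`.

    If `predecessor` is not in the pipeline at all, there is nothing to
    violate, so the check passes as long as `component` is present.
    """
    if component not in pipe_names:
        return False
    if predecessor not in pipe_names:
        return True
    return pipe_names.index(component) > pipe_names.index(predecessor)

# Component names copied from the printed pipeline in this demo
pipe_names = ['tok2vec', 'tagger', 'parser', 'attribute_ruler',
              'lemmatizer', 'ner', 'aliasResolver']
print(comes_after(pipe_names, 'aliasResolver', 'entity_ruler'))
```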
Example
[3]:
# Example
text="1-91120-P1, CLEAN PUMP AND MOTOR. 1-91120-PM1 REQUIRES OIL. 91120, CLEAN TRASH SCREEN"
[4]:
# For this demo, the GeneralEntity class is used instead of spaCy's default NER pipeline
pipeline = [pipe for (pipe,_) in nlp.pipeline]
if "entity_ruler" in pipeline:
    nlp.remove_pipe("entity_ruler")
if "ner" in pipeline:
    nlp.remove_pipe("ner")
# Specify entity labels and IDs
entLabel = "cws_component" # user defined entity label
entId = "OPM"
entIDList = ['1-91120-P1', '1-91120-PM1', '91120']
# Generate pattern list
patternsEnts = generatePatternList(entIDList, label=entLabel, id=entId, nlp=nlp, attr="LEMMA")
# Apply General Entity class to identify corresponding entities
generalEntity = GeneralEntity(nlp, patternsEnts)
# NLP processing and Entities visualization
doc = nlp(text)
displacy.render(doc, style='ent', jupyter=True)
1-91120-P1 [cws_component], CLEAN PUMP AND MOTOR. 1-91120-PM1 [cws_component] REQUIRES OIL. 91120 [cws_component], CLEAN TRASH SCREEN
[5]:
# Check 'alias' annotation
for ent in doc.ents:
    print('Entity:', ent.text, '| alias:', ent._.alias)
Entity: 1-91120-P1 | alias: unit 1 pump
Entity: 1-91120-PM1 | alias: unit 1 pump motor
Entity: 91120 | alias: pump
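One possible downstream use of these annotations is to replace each identifier in the raw text with its alias. The sketch below does this with a plain regular expression standing in for the entity matcher, so it runs without spaCy or DACKAR. It is an illustration only, not how aliasResolver works internally. The alias mapping is copied from the output above.

```python
import re

# Identifier -> alias pairs, copied from the printed annotations above
aliases = {
    "1-91120-P1": "unit 1 pump",
    "1-91120-PM1": "unit 1 pump motor",
    "91120": "pump",
}
text = "1-91120-P1, CLEAN PUMP AND MOTOR. 1-91120-PM1 REQUIRES OIL. 91120, CLEAN TRASH SCREEN"

# Try longer identifiers first so that "91120" does not match inside "1-91120-P1"
ordered_ids = sorted(aliases, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(identifier) for identifier in ordered_ids))

# Substitute every matched identifier with its alias
normalized = pattern.sub(lambda match: aliases[match.group(0)], text)
print(normalized)
```

With a processed doc, the same substitution could instead be driven by each entity's character offsets (ent.start_char, ent.end_char) and its ent._.alias value.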