Alias Pipeline Demo
The alias pipeline (i.e., aliasResolver) is one of the internally developed natural language processing (NLP) pipelines. It annotates identified named entities with their aliases, which can be accessed through:
ent._.alias
Note: by default, the alias file located at ./DACKAR/data/alias.csv is used. Users can also provide their own alias file through the config file located at ./DACKAR/src/dackar/config/nlp_config_default.toml, using the keyword alias_file.
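The ent._.alias accessor follows spaCy's custom extension attribute convention. The following is a minimal sketch of that mechanism only, not DACKAR's aliasResolver implementation: the component name demo_alias_resolver, the DEMO_ALIASES table, and the entity patterns are all hypothetical, and a blank English pipeline is used so that no trained model is needed.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Span

# Register the custom attribute so that ent._.alias exists on every Span.
# force=True avoids an error if the attribute was already registered.
Span.set_extension("alias", default=None, force=True)

# Hypothetical alias table (NOT the contents of DACKAR's alias.csv).
DEMO_ALIASES = {"p1": "pump", "pm1": "pump motor"}


@Language.component("demo_alias_resolver")
def demo_alias_resolver(doc):
    """Attach an alias to each entity whose lowercased text is in the table."""
    for ent in doc.ents:
        ent._.alias = DEMO_ALIASES.get(ent.text.lower())
    return doc


# Blank pipeline with an entity ruler that supplies the entities to annotate.
nlp_demo = spacy.blank("en")
ruler = nlp_demo.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "component", "pattern": "P1"},
    {"label": "component", "pattern": "PM1"},
])
# The resolver must run after the entities have been set.
nlp_demo.add_pipe("demo_alias_resolver", after="entity_ruler")

doc_demo = nlp_demo("P1 and PM1 require oil.")
aliases = {ent.text: ent._.alias for ent in doc_demo.ents}
print(aliases)
```

Because the resolver only reads doc.ents, it has nothing to annotate unless an entity-producing component has already run, which is why the ordering matters.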
Set up
[ ]:
# Set up the loading path
import os, sys
cwd = os.getcwd()
frameworkDir = os.path.abspath(os.path.join(cwd, os.pardir, 'src'))
sys.path.append(frameworkDir)
# Load aliasResolver pipeline
from dackar.pipelines.CustomPipelineComponents import aliasResolver
from dackar.utils.nlp.nlp_utils import resetPipeline
# Load the GeneralEntity pipeline (aliasResolver only annotates entities that have already been identified)
from dackar.pipelines.GeneralEntity import GeneralEntity
# Load pattern generation
from dackar.utils.nlp.nlp_utils import generatePatternList
# Load trained language model/pipeline from spacy, the language model/pipeline includes tok2vec, tagger, parser, attribute_ruler, lemmatizer, ner etc.
import spacy
from spacy import displacy
nlp = spacy.load("en_core_web_lg")
Reset pipeline and add aliasResolver
[ ]:
# The aliasResolver pipeline should always come after "entity_ruler"
pipelines = ['aliasResolver']
resetPipeline(nlp, pipelines)
print(nlp.pipeline)
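The printed pipeline can also be checked programmatically. The sketch below assumes only spaCy's own nlp.pipe_names attribute; the helper comes_after is hypothetical, and the built-in sentencizer stands in for aliasResolver so that the example runs without DACKAR installed.

```python
import spacy


def comes_after(nlp, later, earlier):
    """Return True if both components exist and `later` runs after `earlier`."""
    names = nlp.pipe_names
    if later not in names or earlier not in names:
        return False
    return names.index(later) > names.index(earlier)


# Stand-in pipeline: "sentencizer" plays the role of aliasResolver here.
nlp_check = spacy.blank("en")
nlp_check.add_pipe("entity_ruler")
nlp_check.add_pipe("sentencizer")

print(nlp_check.pipe_names)
print(comes_after(nlp_check, "sentencizer", "entity_ruler"))
```

In the notebook itself, the analogous call would pass the name under which aliasResolver is registered together with "entity_ruler".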
Example
[ ]:
# Example
text="1-91120-P1, CLEAN PUMP AND MOTOR. 1-91120-PM1 REQUIRES OIL. 91120, CLEAN TRASH SCREEN"
[ ]:
# For this demo, the GeneralEntity class is used instead of spaCy's default NER pipeline
pipeline = [pipe for (pipe, _) in nlp.pipeline]
if "entity_ruler" in pipeline:
    nlp.remove_pipe("entity_ruler")
if "ner" in pipeline:
    nlp.remove_pipe("ner")
# Specify Entities Labels and IDs
entLabel = "cws_component" # user defined entity label
entId = "OPM"
entIDList = ['1-91120-P1', '1-91120-PM1', '91120']
# Generate pattern list
patternsEnts = generatePatternList(entIDList, label=entLabel, id=entId, nlp=nlp, attr="LEMMA")
# Apply General Entity class to identify corresponding entities
generalEntity = GeneralEntity(nlp, patternsEnts)
# NLP processing and Entities visualization
doc = nlp(text)
displacy.render(doc, style='ent', jupyter=True)
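The output format of generatePatternList is not shown in this demo. As a rough illustration of the same idea using spaCy's own entity_ruler, the sketch below builds one token pattern per ID. The helper build_patterns is hypothetical, and it matches on LOWER rather than LEMMA because a blank pipeline has no lemmatizer.

```python
import spacy


def build_patterns(nlp, ids, label, ent_id):
    """Build one entity_ruler token pattern per ID string (hypothetical helper)."""
    patterns = []
    for text in ids:
        # Tokenize the ID with the same tokenizer that will process the text.
        tokens = nlp.make_doc(text)
        patterns.append({
            "label": label,
            "id": ent_id,
            "pattern": [{"LOWER": tok.lower_} for tok in tokens],
        })
    return patterns


nlp_pat = spacy.blank("en")
ruler = nlp_pat.add_pipe("entity_ruler")
ruler.add_patterns(
    build_patterns(nlp_pat, ["1-91120-P1", "1-91120-PM1", "91120"],
                   label="cws_component", ent_id="OPM")
)

doc_pat = nlp_pat("1-91120-P1, CLEAN PUMP AND MOTOR. "
                  "1-91120-PM1 REQUIRES OIL. 91120, CLEAN TRASH SCREEN")
found = [ent.text for ent in doc_pat.ents]
for ent in doc_pat.ents:
    print(ent.text, "|", ent.label_, "|", ent.ent_id_)
```

Tokenizing each ID with nlp.make_doc keeps the patterns consistent with however the tokenizer splits hyphenated IDs in the running text.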
[ ]:
# Check the 'alias' annotation
for ent in doc.ents:
    print('Entity:', ent.text, '| alias:', ent._.alias)