Abbreviation Handler Demo
Abbreviation classes replace abbreviations with their full expansions. DACKAR provides two internally developed classes:
Abbreviation class: directly substitutes abbreviations with their full expansions. Users can provide their own abbreviation dictionary.
AbbrExpander class: uses a more sophisticated method, spell checking combined with a word similarity search, to identify abbreviations and substitute them with their full expansions.
AbbrExpander class
[ ]:
import os
import sys
import pandas as pd
# Add the DACKAR source directory (../src relative to the working directory) to the import path
cwd = os.getcwd()
frameworkDir = os.path.abspath(os.path.join(cwd, os.pardir, 'src'))
sys.path.append(frameworkDir)
# Load AbbrExpander from DACKAR
from dackar.text_processing.AbbrExpander import AbbrExpander
[ ]:
# Example inputs: `test` holds abbreviated maintenance notes, `text` holds full sentences
test = """Perf ann sens calib of cyl.
High conc of hydrogen obs.
High conc of hydrogen obs every wk.
Prfr chann calib of chan.
esf pump room and fuel bldg test.
cal press xmtr sit elev.
perform thermography survey of pzr htr terminations.
plant mods comp iso mode prep.
drain & rmv pipe."""
test = test.lower()
text = """A leak was noticed from the pump.
RCP pump 1A pressure gauge was found not operating.
RCP pump 1A pressure gauge was found inoperative.
RCP pump 1A pressure gauge was not functional.
Rupture of pump bearings caused shaft degradation.
Rupture of pump bearings caused shaft degradation and consequent flow reduction.
Pump power supply has been found burnout.
Pump test failed due to power supply failure.
Pump inspection revealed excessive impeller degradation.
Pump inspection revealed excessive impeller degradation likely due to cavitation.
Oil puddle was found in proximity of RCP pump 1A.
Anomalous vibrations were observed for RCP pump 1A.
Several cracks on pump shaft were observed; they could have caused pump failure within few days.
"""
text = text.lower()
[ ]:
# Load the pre-generated abbreviation list and preview its first rows
filename = os.path.join(os.getcwd(), os.pardir, 'data', 'abbreviations.xlsx')
abbrList = pd.read_excel(filename)
abbrList.head()
[ ]:
# Use AbbrExpander to replace abbreviations
AbbrExp = AbbrExpander(filename)
cleanedTest = AbbrExp.abbrProcess(test, splitToList=True)
print('Test:\n', cleanedTest)
cleanedText = AbbrExp.abbrProcess(text)
print('Text:\n', cleanedText)
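The word similarity search mentioned in the introduction can be illustrated with a minimal sketch. This is not DACKAR's implementation: it uses the standard-library `difflib` as a stand-in, and the vocabulary, the helper names `suggest_expansions` and `expand_text`, and the cutoff value are all assumptions chosen for illustration.

```python
import difflib

# Hypothetical vocabulary of full words (not taken from DACKAR's data files).
VOCABULARY = ["calibration", "channel", "concentration",
              "hydrogen", "pressure", "transmitter"]

def suggest_expansions(token, vocabulary=VOCABULARY, n=3, cutoff=0.6):
    """Return up to n vocabulary words ranked by string similarity to token."""
    return difflib.get_close_matches(token, vocabulary, n=n, cutoff=cutoff)

def expand_text(text, known_words, vocabulary=VOCABULARY):
    """Replace each token that is not a known word by its closest match, if any."""
    expanded = []
    for token in text.split():
        if token in known_words or token in vocabulary:
            expanded.append(token)  # already a valid word, keep as-is
            continue
        matches = suggest_expansions(token, vocabulary, n=1)
        expanded.append(matches[0] if matches else token)
    return " ".join(expanded)

print(expand_text("chann calib of press", known_words={"of"}))
# -> channel calibration of pressure
```

Pure string similarity has limits: a short form such as `conc` scores below the cutoff against `concentration` in this sketch and would be left unchanged, which is one reason a curated abbreviation list is still useful alongside similarity search.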
Abbreviation class
[ ]:
# Load Abbreviation from DACKAR
from dackar.text_processing.Abbreviation import Abbreviation
abbreviation = Abbreviation()
abbrDict = abbreviation.getAbbreviation()
print(abbrDict)
[ ]:
# Expand abbreviations in the test text using the current dictionary
cleanedTest = abbreviation.abbreviationSub(test)
print(cleanedTest)
[ ]:
# Use a user-provided abbreviation dictionary
abbrDict = {'perf':'perform', 'ann':'annual', 'sens':'sensor', 'calib':'calibration'}
abbreviation.updateAbbreviation(abbrDict, reset=True)
print(abbreviation.getAbbreviation())
cleanedTest = abbreviation.abbreviationSub(test)
print(cleanedTest)
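As a rough illustration of direct dictionary substitution, the sketch below replaces whole-word abbreviations with a regular expression. It is an assumption about how such a lookup could work, not the code behind `abbreviationSub`; the helper name `substitute_abbreviations` is hypothetical.

```python
import re

def substitute_abbreviations(text, abbr_dict):
    """Replace whole-word abbreviations in text using abbr_dict (hypothetical helper)."""
    if not abbr_dict:
        return text
    # Try longer keys first so a long abbreviation is not shadowed by a shorter one.
    keys = sorted(abbr_dict, key=len, reverse=True)
    # \b anchors keep 'ann' from matching inside words such as 'channel'.
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: abbr_dict[m.group(1)], text)

abbr_dict = {"perf": "perform", "ann": "annual",
             "sens": "sensor", "calib": "calibration"}
print(substitute_abbreviations("perf ann sens calib of cyl.", abbr_dict))
# -> perform annual sensor calibration of cyl.
```

Tokens without a dictionary entry, such as `cyl`, pass through unchanged, so the quality of the result depends directly on the coverage of the dictionary supplied.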