Installation¶
Operating Environments¶
DACKAR can run on Microsoft Windows, Apple macOS, and Linux platforms.
Clone DACKAR¶
The HTTPS cloning procedure uses the following clone command:
git clone https://github.com/idaholab/DACKAR.git
The SSH cloning procedure requires the user to create an SSH key (see: https://help.github.com/articles/connecting-to-github-with-ssh/). Once the SSH key has been created, DACKAR can be cloned with the following command:
git clone git@github.com:idaholab/DACKAR.git
Install the Required Libraries¶
conda create -n dackar_libs python=3.11
conda activate dackar_libs
pip install spacy==3.5 stumpy textacy matplotlib nltk coreferee beautifulsoup4 networkx pysbd tomli numerizer autocorrect pywsd openpyxl quantulum3[classifier] numpy==1.26 scikit-learn pyspellchecker contextualSpellCheck pandas
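After the installation completes, a quick way to confirm that the core libraries were installed correctly is to import a few of them from the activated environment (this snippet only checks imports; it does not exercise DACKAR itself):

```python
# Sanity check: import a few of the core libraries installed above.
import spacy
import nltk
import networkx
import numpy
import pandas

print("spaCy", spacy.__version__, "| NumPy", numpy.__version__)
```

If any import fails, re-run the pip command above inside the activated dackar_libs environment.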
Install Additional Libraries¶
The neo4j library is a Python module used to communicate with the Neo4j database management system, and jupyterlab is used to run the notebook examples in the ./examples/ folder.
pip install neo4j jupyterlab
Download Language Model from spaCy¶
python -m spacy download en_core_web_lg
python -m coreferee install en
Required NLTK Data for Similarity Analysis¶
python -m nltk.downloader all
Retrain Quantulum3 Classifier (Optional)¶
quantulum3-training -s
Alternative Approach When Encountering an SSLError¶
Download the en_core_web_lg-3.5.0.whl wheel file, then run:
python -m pip install ./en_core_web_lg-3.5.0.whl
Download the coreferee English model archive, then run:
python -m pip install ./coreferee_model_en.zip
Run the script DACKAR/nltkDownloader.py to download the NLTK data:
python nltkDownloader.py
Alternatively, see installing_nltk_data for instructions on installing the NLTK data manually. For this project, users can also try the following steps:
cd ~
mkdir nltk_data
cd nltk_data
mkdir corpora
mkdir taggers
mkdir tokenizers
Download the wordnet, averaged_perceptron_tagger, and punkt data packages, then copy them into place:
cp -r wordnet ~/nltk_data/corpora/
cp -r averaged_perceptron_tagger ~/nltk_data/taggers/
cp -r punkt ~/nltk_data/tokenizers/