src.dackar.similarity.SentenceSimilarity

Attributes

log_format

logger

Classes

SentenceSimilarity

Module Contents

src.dackar.similarity.SentenceSimilarity.log_format = '%(asctime)s %(message)s'
src.dackar.similarity.SentenceSimilarity.logger
class src.dackar.similarity.SentenceSimilarity.SentenceSimilarity(disambiguationMethod='simple_lesk', similarityMethod='semantic_similarity_synsets', wordOrderContribution=0.0)
validDisambiguation = ['simple_lesk', 'original_lesk', 'cosine_lesk', 'adapted_lesk', 'max_similarity']
wordnetSimMethod = ['path_similarity', 'wup_similarity', 'lch_similarity', 'res_similarity', 'jcn_similarity',...
validSimilarity = ['path_similarity', 'wup_similarity', 'lch_similarity', 'res_similarity', 'jcn_similarity',...
wordOrder = 0.0
disambiguationMethod = ''
similarityMethod = ''
brownIc
setParameters(paramDict)

Set the parameters of the similarity calculation from the dictionary paramDict.

constructSimilarityVectorPawarMagoMethod(arr1, arr2)

Construct the similarity vector between the synsets of two sentences, as used by the Pawar-Mago method.

Parameters:
  • arr1 – set of wordnet.Synset for one sentence

  • arr2 – set of wordnet.Synset for the other sentence

Returns:
  • vector – list, the similarity vector

  • count – int, the number of words with high similarity (>= 0.804)
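The parameter and return descriptions above can be illustrated with a minimal sketch. It assumes that each entry of the vector is the best similarity found between one item of arr1 and all items of arr2, which may differ from the library's actual implementation. The names build_similarity_vector and toy_similarity are hypothetical, and a lookup table stands in for a WordNet similarity measure so that the example runs without NLTK.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Threshold quoted in the Returns description above.
HIGH_SIMILARITY = 0.804


def build_similarity_vector(
    items1: Iterable[str],
    items2: Iterable[str],
    similarity: Callable[[str, str], Optional[float]],
) -> Tuple[List[float], int]:
    """Return (vector, count) for two collections of items.

    Assumption: each vector entry is the highest similarity between one
    item of items1 and any item of items2. A similarity of None (which
    some WordNet measures return) is treated as 0.0.
    """
    others = list(items2)
    vector: List[float] = []
    count = 0
    for a in items1:
        best = max((similarity(a, b) or 0.0 for b in others), default=0.0)
        vector.append(best)
        if best >= HIGH_SIMILARITY:
            count += 1
    return vector, count


def toy_similarity(a: str, b: str) -> Optional[float]:
    """Hypothetical stand-in for a synset similarity measure."""
    if a == b:
        return 1.0
    table = {("dog", "cat"): 0.85, ("dog", "car"): 0.1}
    return table.get((a, b), table.get((b, a)))


vector, count = build_similarity_vector(["dog", "car"], ["cat", "car"], toy_similarity)
print(vector, count)
```

With real synsets, the similarity argument could be any pairwise measure that returns a float or None.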

sentenceSimilarity(sentence1, sentence2, method='pm_disambiguation', infoContentNorm=False)

Compute the similarity between two sentences using the approach selected by method.

sentenceSimilarityPawarMagoMethod(sentence1, sentence2)

Compute sentence similarity using the method proposed in https://arxiv.org/pdf/1802.05667.pdf

Parameters:
  • sentence1 – str, first sentence used to compute sentence similarity

  • sentence2 – str, second sentence used to compute sentence similarity

Returns:

similarity, the computed similarity of the two sentences, in the range [0, 1]

Return type:

float

sentenceSimialrityBestSense(sentence1, sentence2, infoContentNorm=False)

Method adapted from https://github.com/anishvarsha/Sentence-Similaritity-using-corpus-statistics. Compute sentence similarity using both semantic and word order similarity. The semantic similarity is based on the maximum word similarity between each word of one sentence and the other sentence.

Parameters:
  • sentence1 – str, first sentence used to compute sentence similarity

  • sentence2 – str, second sentence used to compute sentence similarity

  • infoContentNorm – bool, True if corpus statistics are used to weight the similarity vectors

Returns:

similarity, the computed similarity of the two sentences, in the range [0, 1]

Return type:

float
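As a rough illustration of how a semantic score and a word order score can be combined, the sketch below uses the weighted sum and word order measure commonly found in corpus-statistics approaches. It is an assumption that the weight plays the role of the wordOrderContribution constructor argument. The function names word_order_similarity and combine_similarity are hypothetical and are not part of this module.

```python
import math
from typing import Sequence


def word_order_similarity(r1: Sequence[float], r2: Sequence[float]) -> float:
    """Word order similarity of two equal-length word order vectors.

    Computed as 1 - ||r1 - r2|| / ||r1 + r2||, so identical orderings
    give 1.0 and larger reorderings give smaller values.
    """
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    if total == 0.0:
        return 1.0  # both vectors are all zeros: nothing to compare
    return 1.0 - diff / total


def combine_similarity(semantic: float, order: float,
                       word_order_contribution: float = 0.0) -> float:
    """Weighted sum of semantic and word order similarity.

    Assumption: word_order_contribution lies in [0, 1]; the default of
    0.0 means the result is the semantic similarity alone.
    """
    w = word_order_contribution
    return (1.0 - w) * semantic + w * order


same_order = word_order_similarity([1, 2, 3], [1, 2, 3])
swapped = word_order_similarity([1, 2, 3], [1, 3, 2])
print(same_order, swapped, combine_similarity(0.8, swapped, 0.2))
```

With a contribution of 0.0, matching the constructor default shown above, the word order term has no effect on the result.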