src.dackar.similarity.synsetUtils

Functions

synsetListSimilarity(synsetList1, synsetList2[, delta])

Compute similarity for synsetList pair

wordOrderSimilaritySynsetList(synsetList1, synsetList2)

Compute word order similarity for synsetList pair

constructSynsetOrderVector(synsets, jointSynsets, index)

Construct synset order vector for word order similarity calculation

identifyBestSimilarSynsetFromSynsets(syn, synsets)

Identify best similar synset from synsets

semanticSimilaritySynsets(synsetA, synsetB[, ...])

Compute the similarity between two synsets using semantic analysis

pathLength(synsetA, synsetB[, alpha, disambiguation])

Path length calculation using nonlinear transfer function between two Wordnet Synsets.

scalingDepthEffect(synsetA, synsetB[, beta, ...])

Words at upper layers of hierarchical semantic nets have more general concepts and less semantic similarity

semanticSimilaritySynsetList(synsetList1, synsetList2)

Compute the similarity between two synsetList using semantic analysis

constructSemanticVector(syns, jointSyns)

Construct semantic vector

synsetsSimilarity(synsetA, synsetB[, method, ...])

Compute synsets similarity

constructSemanticVectorUsingDisambiguatedSynsets(...)

Construct semantic vector while disambiguation has been already performed

semanticSimilarityUsingDisambiguatedSynsets(synsetsA, ...)

Compute semantic similarity for given synsets while disambiguation has been already performed for given synsets

Module Contents

src.dackar.similarity.synsetUtils.synsetListSimilarity(synsetList1, synsetList2, delta=0.85)[source]

Compute similarity for synsetList pair

Parameters:
  • synsetList1 – list, list of synset

  • synsetList2 – list, list of synset

  • delta – float, between 0 and 1, factor for semantic similarity contribution

Returns:

float, the similarity score

Return type:

similarity
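
The role of delta can be illustrated with a minimal sketch. It assumes the overall score is a weighted sum of the semantic similarity and the word order similarity, which this page does not state explicitly; `combine_similarity` is a hypothetical helper, not part of the module.

```python
def combine_similarity(sem_similarity, order_similarity, delta=0.85):
    """Blend two scores; delta weights the semantic contribution.

    Assumption: the remaining weight (1 - delta) goes to word order.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return delta * sem_similarity + (1.0 - delta) * order_similarity
```

With the default delta of 0.85, the semantic score dominates and word order acts as a small correction.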

src.dackar.similarity.synsetUtils.wordOrderSimilaritySynsetList(synsetList1, synsetList2)[source]

Compute word order similarity for synsetList pair

Parameters:
  • synsetList1 – list, list of synset

  • synsetList2 – list, list of synset

Returns:

float, word order similarity score
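
As an illustration only, a word order score can be computed from two order vectors with a normalized-difference formula. Whether this module uses exactly this formula is an assumption; `word_order_similarity` and its arguments are hypothetical names.

```python
import math


def word_order_similarity(r1, r2):
    """Return 1 - ||r1 - r2|| / ||r1 + r2|| for two order vectors."""
    if len(r1) != len(r2):
        raise ValueError("order vectors must have the same length")
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    if total == 0.0:
        # Assumption: two all-zero vectors carry no ordering information.
        return 0.0
    return 1.0 - diff / total
```

Identical order vectors give a score of 1.0, and the score drops as positions diverge.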

src.dackar.similarity.synsetUtils.constructSynsetOrderVector(synsets, jointSynsets, index)[source]

Construct synset order vector for word order similarity calculation

Parameters:
  • synsets – list of synsets

  • jointSynsets – list of joint synsets

  • index – int, index for synsets

Returns:

np.array, synset order vector

Return type:

vector
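
One plausible way to build such a vector is sketched below, using plain strings as stand-ins for synsets. The threshold, the 1-based positions, and the `similarity` callback are assumptions made for illustration; the sketch does not model the `index` argument of the documented function and returns a list rather than an np.array.

```python
def order_vector(items, joint_items, similarity, threshold=0.4):
    """For each joint item, record the position of its best match in items."""
    positions = {}
    for pos, item in enumerate(items, start=1):
        positions.setdefault(item, pos)  # keep the first occurrence

    vector = []
    for joint in joint_items:
        if joint in positions:
            vector.append(positions[joint])
            continue
        # Otherwise fall back to the most similar item, if similar enough.
        best_pos, best_score = 0, 0.0
        for pos, item in enumerate(items, start=1):
            score = similarity(joint, item)
            if score > best_score:
                best_pos, best_score = pos, score
        vector.append(best_pos if best_score > threshold else 0)
    return vector


def toy_similarity(a, b):
    """Hypothetical similarity used only for this example."""
    return 0.9 if {a, b} == {"dog", "puppy"} else 0.1


print(order_vector(["dog", "runs"], ["dog", "runs", "puppy", "sleeps"], toy_similarity))
```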

src.dackar.similarity.synsetUtils.identifyBestSimilarSynsetFromSynsets(syn, synsets)[source]

Identify best similar synset from synsets

Parameters:
  • syn – wn.synset, synset

  • synsets – list of synsets

Returns:

wn.synset, the synset in synsets most similar to syn, along with similarity, the best similarity score

Return type:

bestSyn
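
The search described above amounts to an arg-max over the candidates. A minimal sketch, assuming a pairwise `similarity` callback (hypothetical) that may return None when no score is defined, as some NLTK measures do:

```python
def best_match(target, candidates, similarity):
    """Return (best_candidate, best_score); (None, 0.0) if nothing scores."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = similarity(target, candidate)
        if score is not None and score > best_score:
            best, best_score = candidate, score
    return best, best_score
```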

src.dackar.similarity.synsetUtils.semanticSimilaritySynsets(synsetA, synsetB, disambiguation=False)[source]

Compute the similarity between two synsets using semantic analysis (i.e., using both path length and depth information in wordnet)

Parameters:
  • synsetA – wordnet.synset, the first synset

  • synsetB – wordnet.synset, the second synset

  • disambiguation – bool, True if disambiguation has been performed for the given synsets

Returns:

float, [0, 1], the similarity score

Return type:

similarity

src.dackar.similarity.synsetUtils.pathLength(synsetA, synsetB, alpha=0.2, disambiguation=False)[source]

Path length calculation using a nonlinear transfer function between two Wordnet Synsets. The two Synsets should be the best Synset Pair (i.e., disambiguation should already have been performed)

Parameters:
  • synsetA – wordnet.synset, synset for first word

  • synsetB – wordnet.synset, synset for second word

  • alpha – float, a constant in the monotonically decreasing function exp(-alpha*wordnetpathLength), used to scale the shortest path length. For wordnet, the optimal value is 0.2

  • disambiguation – bool, True if disambiguation has been performed for the given synsets

Returns:

float, [0, 1], the shortest distance between two synsets, mapped through an exponentially decreasing function.

Return type:

shortDistance
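
The transfer function exp(-alpha*wordnetpathLength) can be sketched in isolation. The helper name `path_length_score` is hypothetical, and how the module obtains the shortest path (for example, through NLTK's `Synset.shortest_path_distance`) is an assumption not confirmed by this page.

```python
import math


def path_length_score(shortest_path, alpha=0.2):
    """Map a non-negative path length to (0, 1]; longer paths score lower."""
    if shortest_path < 0:
        raise ValueError("path length must be non-negative")
    return math.exp(-alpha * shortest_path)
```

A path length of 0 (identical synsets) maps to 1.0, and the score decays toward 0 as the path grows.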

src.dackar.similarity.synsetUtils.scalingDepthEffect(synsetA, synsetB, beta=0.45, disambiguation=False)[source]

Words at upper layers of hierarchical semantic nets have more general concepts and less semantic similarity between words than words at lower layers. This method is used to scale the similarity behavior with respect to depth h (i.e., [exp(beta*h)-exp(-beta*h)]/[exp(beta*h)+exp(-beta*h)]). The two Synsets should be the best Synset Pair (i.e., disambiguation should already have been performed)

Parameters:
  • synsetA – wordnet.synset, synset for first word

  • synsetB – wordnet.synset, synset for second word

  • beta – float, parameter used to scale the shortest depth; for wordnet, the optimal value is 0.45

  • disambiguation – bool, True if disambiguation has been performed for the given synsets

Returns:

float, [0, 1], similarity score based on depth effect in wordnet

Return type:

treeDist
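
The depth scaling can be sketched as follows, assuming both exponents use the same depth h, in which case the expression equals tanh(beta*h). Taking h to be the depth of the common ancestor of the two synsets is also an assumption; `depth_score` is a hypothetical name.

```python
import math


def depth_score(depth, beta=0.45):
    """Scale similarity by depth: 0 at the root, approaching 1 deeper down."""
    grow = math.exp(beta * depth)
    decay = math.exp(-beta * depth)
    return (grow - decay) / (grow + decay)
```

Deeper (more specific) concepts therefore contribute more to the similarity than shallow, general ones.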

src.dackar.similarity.synsetUtils.semanticSimilaritySynsetList(synsetList1, synsetList2)[source]

Compute the similarity between two synsetList using semantic analysis (i.e., compute the similarity using both path length and depth information in wordnet)

Parameters:
  • synsetList1 – list, the list of synset

  • synsetList2 – list, the list of synset

Returns:

float, [0, 1], the similarity score

Return type:

semSimilarity
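
A common way to turn two semantic vectors into a single score is cosine similarity. The sketch below assumes that approach, which this page does not confirm; `cosine_similarity` is a hypothetical helper operating on plain lists.

```python
import math


def cosine_similarity(v1, v2):
    """Cosine of the angle between two equal-length vectors."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        # Assumption: an all-zero vector shares nothing with the other list.
        return 0.0
    return dot / (norm1 * norm2)
```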

src.dackar.similarity.synsetUtils.constructSemanticVector(syns, jointSyns)[source]

Construct semantic vector

Parameters:
  • syns – list of synsets

  • jointSyns – list of joint synsets

Returns:

numpy.array, the semantic vector

Return type:

vector
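
A minimal sketch of a semantic vector, using strings as stand-ins for synsets: each entry is 1.0 when the joint item appears in the input, otherwise the best similarity found, or 0.0 when that falls below a cutoff. The threshold value and the `similarity` callback are assumptions for illustration, and the sketch returns a list rather than a numpy.array.

```python
def semantic_vector(items, joint_items, similarity, threshold=0.2):
    """Build one similarity entry per joint item."""
    present = set(items)
    vector = []
    for joint in joint_items:
        if joint in present:
            vector.append(1.0)
            continue
        # Treat an undefined (None) score as 0.0.
        best = max((similarity(joint, item) or 0.0 for item in items), default=0.0)
        vector.append(best if best > threshold else 0.0)
    return vector


def toy_similarity(a, b):
    """Hypothetical similarity used only for this example."""
    return 0.9 if {a, b} == {"dog", "puppy"} else 0.1


print(semantic_vector(["dog", "runs"], ["dog", "runs", "puppy", "sleeps"], toy_similarity))
```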

src.dackar.similarity.synsetUtils.synsetsSimilarity(synsetA, synsetB, method='semantic_similarity_synsets', disambiguation=True)[source]

Compute synsets similarity

Parameters:
  • synsetA – wordnet.synset, the first synset

  • synsetB – wordnet.synset, the second synset

  • method – str, the method used to compute synset similarity, one of ['semantic_similarity_synsets', 'path', 'wup', 'lch', 'res', 'jcn', 'lin']

  • disambiguation – bool, True if disambiguation has been already performed

Returns:

float, [0, 1], the similarity score

Return type:

similarity
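
The short method names match the similarity measures on NLTK's `Synset` objects. The dispatch below is a sketch under that assumption, not the module's actual code; `nltk_similarity` and the `ic` argument (an information-content dictionary that NLTK requires for 'res', 'jcn', and 'lin') are illustrative names.

```python
NLTK_METHODS = {
    "path": "path_similarity",
    "wup": "wup_similarity",
    "lch": "lch_similarity",
    "res": "res_similarity",
    "jcn": "jcn_similarity",
    "lin": "lin_similarity",
}
IC_METHODS = {"res", "jcn", "lin"}  # these measures need information content


def nltk_similarity(synset_a, synset_b, method="path", ic=None):
    """Call the NLTK Synset measure that corresponds to a short method name."""
    if method not in NLTK_METHODS:
        raise ValueError(f"unknown method: {method!r}")
    measure = getattr(synset_a, NLTK_METHODS[method])
    if method in IC_METHODS:
        if ic is None:
            raise ValueError(f"method {method!r} requires an information-content dict")
        return measure(synset_b, ic)
    return measure(synset_b)
```

In NLTK, the 'lch', 'res', and 'jcn' scores are not confined to [0, 1]; whether this module rescales them is not stated here.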

src.dackar.similarity.synsetUtils.constructSemanticVectorUsingDisambiguatedSynsets(wordSynsets, jointWordSynsets, simMethod='semantic_similarity_synsets')[source]

Construct semantic vector while disambiguation has been already performed

Parameters:
  • wordSynsets – set/list, set of words synsets

  • jointWordSynsets – set, set of joint words synsets

  • simMethod – str, method for similarity analysis in the construction of semantic vectors, one of ['semantic_similarity_synsets', 'path', 'wup', 'lch', 'res', 'jcn', 'lin']

Returns:

numpy.array, semantic vector with disambiguation

Return type:

vector

src.dackar.similarity.synsetUtils.semanticSimilarityUsingDisambiguatedSynsets(synsetsA, synsetsB, simMethod='semantic_similarity_synsets')[source]

Compute semantic similarity for given synsets while disambiguation has been already performed for given synsets

Parameters:
  • synsetsA – set/list, list of synsets

  • synsetsB – set/list, list of synsets

  • simMethod – str, method for similarity analysis in the construction of semantic vectors, one of ['semantic_similarity_synsets', 'path', 'wup', 'lch', 'res', 'jcn', 'lin']

Returns:

float, [0, 1], the similarity score

Return type:

semSimilarity