Match profits rateюHow to calculate the similarity between two words/strings.
The string similarity formula originated to satisfy listed here demands:
- A real expression of lexical similarity – strings with tiny variations must named becoming similar. Specifically, a significant sub-string convergence should point to a top amount of similarity involving the chain.
- A robustness to changes of keyword purchase- two strings that incorporate the exact same phrase, however in another type of purchase, should-be seen as getting comparable. In contrast, if a person sequence is simply a random anagram regarding the characters within the various other, it should (usually) become recognized as dissimilar.
- Code independence – the formula should function not just in English, and in many different dialects.
Solution
The similarity are calculated in three steps:
- Partition each sequence into a summary of tokens.
- Processing the similarity between tokens simply by using a sequence edit-distance formula (expansion function: semantic similarity dimension by using the WordNet library). (περισσότερα…)