From 260e74ddd2b5485083b9bd8b30e83f17a3d39c0f Mon Sep 17 00:00:00 2001
From: Harry Stuart <42882697+harrystuart@users.noreply.github.com>
Date: Wed, 7 Dec 2022 21:02:02 +1100
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 32a9d88..158b93c 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
 # NormNumSuM
 
-NormNumSuM (Normalised Number of Substring Movements) is a string comparison algorithm designed to be token order invariant. The algorithm works by iteratively finding the longest substring between the two strings, removing the previously found substring at each timestep and finally comparing this value to the length of the shortest string.
+NormNumSuM (Normalised Number of Substring Movements) is a string comparison algorithm designed to be token-order invariant. The algorithm works by iteratively finding the longest substring between the two strings, removing the previously found substring at each timestep and finally comparing this value to the length of the shortest string.
 
 Most existing string comparison algorithms, such as Levenshtein Distance, assert that the two strings should be "similar from left to right". These common algorithms are unsuitable for situations where one is looking to compare string similarity at the token level, where the order of tokens is less important. Rather than naively comparing tokens (where misspellings can be detrimental if token comparison is binary), the proposed substring approach allows for a more continous measure of similarity between strings at the token level. This is also a very lightweight approach relative to mechanisms employing semantic analysis.
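
Reviewer note: the README paragraph touched by this patch describes the algorithm only in prose, so a rough Python sketch of one possible reading may help when checking the wording. This is not the repository's implementation. The function names `extract_common_substrings` and `movement_ratio` are hypothetical, and using `difflib.SequenceMatcher` to find the longest common substring is a convenience chosen here. The README does not say how "this value" is compared to the length of the shortest string or how unmatched characters are treated, so the plain ratio below is an assumption.

```python
from difflib import SequenceMatcher


def extract_common_substrings(a: str, b: str) -> list[str]:
    """Repeatedly pull the longest common substring out of both strings.

    Hypothetical helper: one literal reading of "iteratively finding the
    longest substring between the two strings, removing the previously
    found substring at each timestep".
    """
    found = []
    while a and b:
        matcher = SequenceMatcher(None, a, b, autojunk=False)
        m = matcher.find_longest_match(0, len(a), 0, len(b))
        if m.size == 0:
            break  # nothing left in common
        found.append(a[m.a:m.a + m.size])
        # Remove the matched span from both strings before the next pass.
        a = a[:m.a] + a[m.a + m.size:]
        b = b[:m.b] + b[m.b + m.size:]
    return found


def movement_ratio(a: str, b: str) -> float:
    """Number of extracted substrings divided by the shorter string's length.

    ASSUMPTION: the README does not define the normalisation precisely;
    this ratio is only an illustration, not the project's formula.
    """
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return len(extract_common_substrings(a, b)) / shorter


print(extract_common_substrings("john smith", "smith john"))  # ['smith', 'john', ' ']
print(movement_ratio("john smith", "smith john"))              # 0.3
```

In this sketch, swapping the two tokens yields only three extracted pieces for a ten-character string, which illustrates the token-order invariance the README claims. Note that deleting a match joins its neighbours, so later passes can match across the join; whether the actual implementation does the same is not stated in the README.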