numerical similarities
Objectives Development of suitable similarity functions for numerical identifiers
(taking into account the semantics of numerical differences) and of
appropriate similarity thresholds.
Description For the comparison of strings (e.g. surnames) there are well-established
similarity functions such as edit distance or q-grams. These are not
suitable for numerical characteristics, because treating numbers as
strings ignores their numeric values (Agrawal and Srikant 2002).
Moreover, numerical similarity depends on the semantics of the difference:
a difference of 50 may be considerable in one situation
(e.g. age of persons) but negligible in another
(e.g. yearly earnings in Euro). Since research on numerical similarities
has been sparse so far, further work on this topic is urgently needed.
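The two points above can be illustrated with a minimal Python sketch. It is not taken from Agrawal and Srikant (2002); the function names, the linear form of the numerical similarity, and the scale values (60 for age, 100000 for yearly earnings) are assumptions chosen purely for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def numeric_similarity(x: float, y: float, scale: float) -> float:
    """Hypothetical linear similarity in [0, 1].

    'scale' is an assumed, attribute-specific parameter: the absolute
    difference at which two values count as completely dissimilar.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return max(0.0, 1.0 - abs(x - y) / scale)


# Treated as strings, 100 and 900 look closer than 99 and 100.
d_near = edit_distance("99", "100")   # numeric difference 1
d_far = edit_distance("100", "900")   # numeric difference 800

# The same difference of 50 under two assumed scales.
sim_age = numeric_similarity(30, 80, scale=60)                    # age in years
sim_earnings = numeric_similarity(30000, 30050, scale=100000)     # Euro per year
```

With these assumed scales, the difference of 50 yields a low similarity for age and a similarity close to 1 for yearly earnings, while the edit distance ranks the numerically distant pair as the more similar one. Choosing the scale (and a threshold on the resulting similarity) for each attribute is exactly the open question described above.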
Literature
R. Agrawal, R. Srikant 2002. Searching with numbers; in: Proceedings of the 11th International Conference on the World Wide Web, Honolulu 2002. New York: ACM, pp. 420–431.
