Merge ToolBox (MTB)

MTB is a record linkage and de-duplication program written in Java. Its features include probabilistic and distance-based linkage techniques. MTB is free for academic use only. A manual for MTB is also available.
Current version: 0.75 (October 2017) [Updated for Stata 15 and older]

Please cite the following article if your research uses MTB:
Schnell, R., Bachteler, T. & Bender, S. (2004): A Toolbox for record linkage; in: Austrian Journal of Statistics 33 (1-2) 125-133.


Software from previous projects:


We developed a protocol for determining string similarities in a privacy-preserving manner (Safelink). The main idea is to store q-grams from identifying strings in Bloom filters using cryptographic hash functions (HMACs). The Center developed prototype software in Java; that version is no longer offered and has been replaced by a new program written in Python. The program and a how-to can be downloaded via the link below. Note that Safelink is free for academic use only. Section 4 of the Safelink manual describes how to use Bloom-encoded identifiers with the Merge ToolBox.
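
The idea of storing q-grams in a Bloom filter can be illustrated with a minimal Python sketch. It is not the Safelink implementation: the filter length `m`, the number of hash functions `k`, the q-gram size `q`, the use of HMAC-SHA256 with a counter prefix, and the function names `qgrams`, `bloom_encode` and `dice` are all assumptions chosen for illustration. Two parties who share the same secret key encode their identifiers and compare only the resulting bit positions, here with the Dice coefficient.

```python
import hashlib
import hmac


def qgrams(s, q=2):
    """Split a string, padded with blanks, into its set of overlapping q-grams."""
    padded = " " * (q - 1) + s.lower() + " " * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def bloom_encode(s, key, m=1000, k=10, q=2):
    """Return the set of bit positions (0..m-1) that are set for string s.

    Assumption: each q-gram is hashed k times with HMAC-SHA256, using the
    counter i as a prefix to simulate k different keyed hash functions.
    """
    bits = set()
    for gram in qgrams(s, q):
        for i in range(k):
            message = f"{i}:{gram}".encode("utf-8")
            digest = hmac.new(key, message, hashlib.sha256).digest()
            bits.add(int.from_bytes(digest, "big") % m)
    return bits


def dice(a, b):
    """Dice coefficient of two Bloom filters given as sets of bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    secret = b"shared secret key"  # hypothetical key known to both data owners
    smith = bloom_encode("Smith", secret)
    smyth = bloom_encode("Smyth", secret)
    jones = bloom_encode("Jones", secret)
    print("Smith vs Smyth:", round(dice(smith, smyth), 3))
    print("Smith vs Jones:", round(dice(smith, jones), 3))
```

Names that share many q-grams set many of the same bits and therefore receive a high Dice score, while unrelated names overlap only through chance collisions. Without the key, the bit positions cannot simply be recomputed from a list of candidate names.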

Test Data Generator (TDGen)
TDGen is a pre-built workflow for KNIME that lets the user insert errors into test data in order to test record linkage procedures. There is a publication including an installation guide for TDGen.

TDGen /wp-content/uploads/2017/05//TDGen


A macro for linking self-generated identification codes with MTB
The MTB macro codes.xml allows automatic linkage of self-generated identification codes using the method described in
[1] Schnell, R., Bachteler, T. & Reiher, J. (2010): Improving the Use of Self-Generated Identification Codes; in: Evaluation Review 34 (5) 391-418.
See readme.txt or the Appendix in [1] for further explanation.

panelcodes /wp-content/uploads/2017/05//panelcodes