R Package
A toolbox for Privacy-preserving Record Linkage (PPRL)
A Toolbox for deterministic, probabilistic and privacy-preserving record linkage techniques using R.
Combines the functionality of the Merge ToolBox (MTB, see below) with current privacy-preserving techniques.
Software
Merge ToolBox (MTB)
MTB is a record linkage and de-duplication program written in Java. Its features include probabilistic and distance-based linkage techniques. MTB is free for academic use only. A manual for MTB is also available.
Current version: 0.751 (July 2024) [Minor bug fixes]
Please cite the following article if your research uses the MTB:
Schnell, R., Bachteler, T. & Bender, S. (2004): A Toolbox for record linkage; in: Austrian Journal of Statistics 33 (1-2) 125-133.
Software from previous projects:
Safelink
We developed a protocol (Safelink) for determining string similarities in a privacy-preserving manner. The main idea is to store the q-grams of identifying strings in Bloom filters using keyed cryptographic hash functions (HMACs). The Center originally developed prototype software in Java; that version is no longer offered and has been replaced by a new program written in Python. The program and a how-to can be downloaded via the link below. Note that Safelink is free for academic use only. Section 4 of the Safelink manual describes how to use Bloom-encoded identifiers with the Merge ToolBox.
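The idea of storing q-grams in Bloom filters via HMACs can be illustrated with a minimal Python sketch. This is not Safelink's actual code: the function names, the filter length of 1000 bits, the 10 hash positions per q-gram, the double-hashing scheme, the choice of HMAC-SHA256 and HMAC-SHA512, and the Dice coefficient as the similarity measure are all assumptions made for illustration.

```python
import hashlib
import hmac


def qgrams(text, q=2):
    """Split a string into its set of overlapping q-grams, padded with blanks."""
    padded = " " * (q - 1) + text.lower() + " " * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def bloom_encode(text, key, length=1000, k=10, q=2):
    """Return the set of bit positions that the q-grams of `text` switch on.

    Assumption: each q-gram is mapped to k positions by double hashing
    with two keyed hash functions, (h1 + i * h2) mod length.
    """
    bits = set()
    for gram in qgrams(text, q):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(key, data, hashlib.sha256).digest(), "big")
        h2 = int.from_bytes(hmac.new(key, data, hashlib.sha512).digest(), "big")
        for i in range(k):
            bits.add((h1 + i * h2) % length)
    return bits


def dice(a, b):
    """Dice coefficient of two sets of bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical secret key shared by the two data owners.
KEY = b"shared-secret"
smith = bloom_encode("Smith", KEY)
smyth = bloom_encode("Smyth", KEY)
jones = bloom_encode("Jones", KEY)
print(dice(smith, smyth), dice(smith, jones))
```

Because similar strings share most of their q-grams, their filters share most of their bit positions, so the similarity of the encodings approximates the similarity of the original strings without the strings themselves being exchanged.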
Test Data Generator (TDGen)
TDGen is a pre-built workflow for KNIME that lets the user insert errors into test data in order to test record linkage procedures. A publication that includes an installation guide for TDGen is available.
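What inserting errors into test data means can be sketched in a few lines of Python. This is not how TDGen works internally (TDGen is a KNIME workflow); the function names, the four error types, and the single `error_rate` parameter are hypothetical and chosen only to illustrate the general idea.

```python
import random


def insert_error(value, rng):
    """Apply one random single-character edit to a string."""
    if not value:
        return value
    op = rng.choice(["delete", "insert", "substitute", "transpose"])
    pos = rng.randrange(len(value))
    letter = rng.choice("abcdefghijklmnopqrstuvwxyz")
    if op == "delete":
        return value[:pos] + value[pos + 1:]
    if op == "insert":
        return value[:pos] + letter + value[pos:]
    if op == "substitute":
        return value[:pos] + letter + value[pos + 1:]
    # transpose: swap two neighbouring characters (needs at least two)
    if len(value) < 2:
        return value
    pos = min(pos, len(value) - 2)
    return value[:pos] + value[pos + 1] + value[pos] + value[pos + 2:]


def corrupt(records, error_rate, seed=0):
    """Return a copy of `records` in which a share of values carries an error."""
    rng = random.Random(seed)  # fixed seed keeps the test data reproducible
    return [insert_error(r, rng) if rng.random() < error_rate else r
            for r in records]


names = ["schnell", "bachteler", "bender", "reiher"]
print(corrupt(names, error_rate=0.5, seed=1))
```

Linking the corrupted copy back to the clean original gives a test case in which the true match status of every record pair is known.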
A macro for linking self-generated identification codes with MTB
The MTB macro codes.xml allows automatic linkage of self-generated identification codes using the method described in:
[1] Schnell, R., Bachteler, T. & Reiher, J. (2010): Improving the Use of Self-Generated Identification Codes; in: Evaluation Review 34 (5) 391-418.
For further explanation, see readme.txt in panelcodes.zip or the Appendix of [1].