
dc.contributor.author: Brown, A.
dc.contributor.author: Borgs, C.
dc.contributor.author: Randall, Sean
dc.contributor.author: Schnell, R.
dc.date.accessioned: 2017-06-23T02:58:35Z
dc.date.available: 2017-06-23T02:58:35Z
dc.date.created: 2017-06-19T03:39:44Z
dc.date.issued: 2017
dc.identifier.citation: Brown, A. and Borgs, C. and Randall, S. and Schnell, R. 2017. Evaluating privacy-preserving record linkage using cryptographic long-term keys and multibit trees on large medical datasets. BMC Medical Informatics and Decision Making. 17 (1): Article ID 83.
dc.identifier.uri: http://hdl.handle.net/20.500.11937/53086
dc.identifier.doi: 10.1186/s12911-017-0478-5
dc.description.abstract:

Background: Integrating medical data using databases from different sources by record linkage is a powerful technique increasingly used in medical research. Under many jurisdictions, unique personal identifiers needed for linking the records are unavailable. Since sensitive attributes, such as names, have to be used instead, privacy regulations usually demand encrypting these identifiers. The corresponding set of techniques for privacy-preserving record linkage (PPRL) has received widespread attention. One recent method is based on Bloom filters. Due to superior resilience against cryptographic attacks, composite Bloom filters (cryptographic long-term keys, CLKs) are considered best practice for privacy in PPRL. Real-world performance of these techniques using large-scale data is unknown up to now.

Methods: Using a large subset of Australian hospital admission data, we tested the performance of an innovative PPRL technique (CLKs using multibit trees) against a gold-standard derived from clear-text probabilistic record linkage. Linkage time and linkage quality (recall, precision and F-measure) were evaluated.

Results: Clear text probabilistic linkage resulted in marginally higher precision and recall than CLKs. PPRL required more computing time but 5 million records could still be de-duplicated within one day. However, the PPRL approach required fine tuning of parameters.

Conclusions: We argue that increased privacy of PPRL comes with the price of small losses in precision and recall and a large increase in computational burden and setup time. These costs seem to be acceptable in most applied settings, but they have to be considered in the decision to apply PPRL. Further research on the optimal automatic choice of parameters is needed.
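
As an illustration of the ideas named in the abstract, the following minimal Python sketch encodes several identifier fields into one composite Bloom filter (a CLK), compares two encodings by set overlap, and computes recall, precision and F-measure against a gold standard. It is not the authors' implementation: the function names, the filter length of 1000 bits, the 10 hash functions, the HMAC-based double hashing, the bigram padding, and the use of the Tanimoto coefficient as the similarity measure are all assumptions made for this example, and the multibit tree search structure is not shown.

```python
import hashlib
import hmac


def bigrams(value: str) -> set[str]:
    """Split a padded, lower-cased string into character bigrams."""
    padded = f" {value.strip().lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def clk_encode(fields, secret: bytes, length: int = 1000, k: int = 10) -> set[int]:
    """Hash the bigrams of all fields into ONE Bloom filter.

    The filter is represented as the set of bit positions that are set.
    length, k and the double-hashing scheme are illustrative assumptions.
    """
    bits: set[int] = set()
    for field in fields:
        for gram in bigrams(field):
            msg = gram.encode("utf-8")
            h1 = int.from_bytes(hmac.new(secret, msg, hashlib.sha1).digest(), "big")
            h2 = int.from_bytes(hmac.new(secret, msg, hashlib.sha256).digest(), "big")
            for i in range(k):
                bits.add((h1 + i * h2) % length)
    return bits


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient of two sets of bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def linkage_quality(predicted: set, gold: set) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) for predicted vs. gold-standard pairs."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    key = b"hypothetical shared secret"  # assumption: agreed between data custodians
    rec_a = clk_encode(["Smith", "Anna", "1980"], key)
    rec_b = clk_encode(["Smyth", "Anna", "1980"], key)
    print("similarity:", round(tanimoto(rec_a, rec_b), 3))
    print(linkage_quality({(1, 2), (3, 4)}, {(1, 2), (5, 6)}))
```

In a linkage run under these assumptions, record pairs whose similarity exceeds a chosen threshold would be classified as links, and that threshold is one of the parameters that would need tuning.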

dc.publisher: Biomed Central Ltd
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.title: Evaluating privacy-preserving record linkage using cryptographic long-term keys and multibit trees on large medical datasets.
dc.type: Journal Article
dcterms.source.volume: 17
dcterms.source.number: 1
dcterms.source.issn: 1472-6947
dcterms.source.title: BMC Medical Informatics and Decision Making
curtin.department: Centre for Population Health Research
curtin.accessStatus: Open access



Except where otherwise noted, this item's license is described as http://creativecommons.org/licenses/by/4.0/