Privacy-preserving record linkage on large real world datasets
Access Status
Authors
Date
2014Type
Metadata
Show full item recordCitation
Source Title
ISSN
School
Collection
Abstract
Record linkage typically involves the use of dedicated linkage units who are supplied with personally identifying information to determine individuals from within and across datasets. The personally identifying information supplied to linkage units is separated from clinical information prior to release by data custodians. While this substantially reduces the risk of disclosure of sensitive information, some residual risks still exist and remain a concern for some custodians. In this paper we trial a method of record linkage which reduces privacy risk still further on large real world administrative data. The method uses encrypted personal identifying information (bloom filters) in a probability-based linkage framework. The privacy preserving linkage method was tested on ten years of New South Wales (NSW) and Western Australian (WA) hospital admissions data, comprising in total over 26 million records. No difference in linkage quality was found when the results were compared to traditional probabilistic methods using full unencrypted personal identifiers. This presents as a possible means of reducing privacy risks related to record linkage in population level research studies. It is hoped that through adaptations of this method or similar privacy preserving methods, risks related to information disclosure can be reduced so that the benefits of linked research taking place can be fully realised.
Related items
Showing items related by title, author, creator and subject.
-
Brown, Adrian; Ferrante, Anna; Randall, Sean; Boyd, James; Semmens, James (2017)© 2017 Brown, Ferrante, Randall, Boyd and Semmens. In an era where the volume of structured and unstructured digital data has exploded, there has been an enormous growth in the creation of data about individuals that can ...
-
Boyd, James; Randall, Sean; Ferrante, Anna (2015)Record linkage is the process of bringing together data relating to the same individual within and between different datasets. These integrated datasets provide diverse and rich resources for researchers without the cost ...
-
Brown, A.; Randall, Sean; Ferrante, A.; Semmens, J.; Boyd, J. (2017)Background: Probabilistic record linkage is a process used to bring together person-based records from within the same dataset (de-duplication) or from disparate datasets using pairwise comparisons and matching probabilities. ...