dc.contributor.author: Vidanage, Anushka
dc.contributor.author: Ranbaduge, Thilina
dc.contributor.author: Christen, Peter
dc.contributor.author: Randall, Sean
dc.date.accessioned: 2020-11-26T05:01:18Z
dc.date.available: 2020-11-26T05:01:18Z
dc.date.issued: 2020
dc.identifier.citation: Vidanage, A. and Ranbaduge, T. and Christen, P. and Randall, S. 2020. A Privacy Attack on Multiple Dynamic Match-key based Privacy-Preserving Record Linkage. International Journal of Population Data Science. 5 (1).
dc.identifier.uri: http://hdl.handle.net/20.500.11937/81786
dc.identifier.doi: 10.23889/ijpds.v5i1.1345
dc.description.abstract

Over the last decade, the demand for linking records about people across databases has increased in various domains. Privacy challenges associated with linking sensitive information have led to the development of privacy-preserving record linkage techniques. The multiple dynamic match-key encoding approach recently proposed by Randall et al. (IJPDS, 2019) is one such technique, aimed at providing sufficient privacy for linkage applications while obtaining high linkage quality. However, when this encoding is applied to large databases, it can reveal frequency information that allows encoded values to be re-identified.

Objectives

We propose a frequency-based attack to evaluate the privacy guarantees of multiple dynamic match-key encoding. We then present two improvements to this match-key encoding approach to prevent such a privacy attack.

Methods

The proposed attack analyses the frequency distributions of individual match-keys in order to identify the attributes used for each match-key. We assume the adversary has access to a plain-text database with characteristics similar to those of the encoded database. We employ a set of statistical correlation tests to compare the frequency distributions of match-key values between the encoded and plain-text databases. Once the attribute combinations used for the match-keys are discovered, we re-identify encoded sensitive values using a frequency alignment method. Next, we propose two modifications to the match-key encoding: one to alter the original frequency distributions and another to make the frequency distributions uniform. Both help to prevent frequency-based attacks.
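The frequency alignment step can be illustrated with a minimal sketch. This is a simplified illustration of the general idea, not the authors' implementation: the keyed SHA-256 `encode` function, the secret string, and all function names are assumptions introduced here, and the sketch omits the correlation tests used to discover which attributes make up each match-key.

```python
import hashlib
from collections import Counter


def encode(value, secret="hypothetical-secret"):
    """Stand-in for a match-key encoding (assumption: a keyed SHA-256 hash)."""
    return hashlib.sha256((secret + value).encode("utf-8")).hexdigest()


def rank_by_frequency(values):
    """Return (value, count) pairs sorted from most to least frequent."""
    counts = Counter(values)
    # Ties are broken by value only to keep the ordering deterministic;
    # tied ranks carry no real information for the adversary.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def frequency_align(encoded_values, plaintext_values):
    """Guess a plain-text value for each encoded value by matching ranks.

    The i-th most frequent encoded value is paired with the i-th most
    frequent plain-text value in the adversary's reference database.
    """
    encoded_ranked = rank_by_frequency(encoded_values)
    plain_ranked = rank_by_frequency(plaintext_values)
    return {
        enc: plain
        for (enc, _), (plain, _) in zip(encoded_ranked, plain_ranked)
    }


if __name__ == "__main__":
    # Toy data: the adversary's reference database has the same frequency
    # ordering as the encoded database, but different absolute counts.
    victim = ["smith"] * 50 + ["jones"] * 30 + ["lee"] * 10
    reference = ["smith"] * 5 + ["jones"] * 3 + ["lee"] * 1
    guesses = frequency_align([encode(v) for v in victim], reference)
    for enc, plain in guesses.items():
        print(enc[:12], "->", plain)
```

The sketch only succeeds when the two databases share the same frequency ordering, which is why modifications that alter or flatten the frequency distributions make this kind of alignment harder.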

Results

We evaluate our privacy attack using two large real-world databases. The results show that in certain situations the attack can successfully re-identify a set of sensitive values encoded using the multiple dynamic match-key encoding approach. On the databases used in our experiments, the attack re-identifies plain-text values with both precision and recall of up to 98%. Furthermore, we show that our proposed improvements make this attack harder to perform, with only a small reduction in linkage quality.

Conclusions

Our proposed privacy attack demonstrates weaknesses of multiple dynamic match-key encoding that should be taken into consideration when linking databases that contain sensitive personal information. Our proposed modifications ensure that the multiple dynamic match-key encoding approach can be used securely while retaining high linkage quality.

dc.publisher: Swansea University
dc.relation.sponsoredby: http://purl.org/au-research/grants/arc/DP160101934
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.title: A Privacy Attack on Multiple Dynamic Match-key based Privacy-Preserving Record Linkage
dc.type: Journal Article
dcterms.source.volume: 5
dcterms.source.number: 1
dcterms.source.issn: 2399-4908
dcterms.source.title: International Journal of Population Data Science
dc.date.updated: 2020-11-26T05:01:17Z
curtin.department: School of Public Health
curtin.accessStatus: Open access
curtin.faculty: Faculty of Health Sciences
curtin.contributor.orcid: Randall, Sean [0000-0002-2756-5090]


Except where otherwise noted, this item's license is described as http://creativecommons.org/licenses/by/4.0/