
dc.contributor.author: Boyd, James
dc.contributor.author: Guiver, T.
dc.contributor.author: Randall, Sean
dc.contributor.author: Ferrante, Anna
dc.contributor.author: Semmens, James
dc.contributor.author: Anderson, P.
dc.contributor.author: Dickinson, T.
dc.date.accessioned: 2017-01-30T12:55:55Z
dc.date.available: 2017-01-30T12:55:55Z
dc.date.created: 2016-06-07T19:30:15Z
dc.date.issued: 2016
dc.identifier.citation: Boyd, J. and Guiver, T. and Randall, S. and Ferrante, A. and Semmens, J. and Anderson, P. and Dickinson, T. 2016. A simple sampling method for estimating the accuracy of large scale record linkage projects. Methods of Information in Medicine. 55 (3): pp. 276-283.
dc.identifier.uri: http://hdl.handle.net/20.500.11937/26908
dc.identifier.doi: 10.3414/ME15-01-0152
dc.description.abstract

Background: Record linkage techniques allow different data collections to be brought together to provide a wider picture of the health status of individuals. Ensuring high linkage quality is important to guarantee the quality and integrity of research. Current methods for measuring linkage quality typically focus on precision (assessed through the proportion of accepted links that are incorrect), given the difficulty of measuring the proportion of false negatives.

Objectives: The aim of this work is to introduce and evaluate a sampling-based method to estimate both precision and recall following record linkage.

Methods: In the sampling-based method, record pairs from each threshold (including those below the identified cut-off for acceptance) are sampled and clerically reviewed. These results are then applied to the entire set of record pairs, providing estimates of false positives and false negatives. This method was evaluated on a synthetically generated dataset, where the true match status (which records belonged to the same person) was known.

Results: The sampled estimates of linkage quality were relatively close to the actual linkage quality metrics calculated for the whole synthetic dataset. The precision and recall measures for seven reviewers were very consistent, with little variation in the clerical assessment results (overall agreement using the Fleiss' kappa statistic was 0.601).

Conclusions: This method offers a possible means of accurately estimating matching quality and refining linkages in population-level linkage studies. The sampling approach is especially important for large linkage projects, where the number of record pairs produced may be very large, often running into the millions.
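The abstract's Methods section describes sampling record pairs from every threshold, clerically reviewing them, and scaling the review results up to the full set of pairs to estimate false positives and false negatives. A minimal Python sketch of that extrapolation follows. It assumes each threshold is represented as a band holding its total pair count, the number of pairs sampled, and the number of sampled pairs judged to be true matches, and that bands at or above the acceptance cut-off are the accepted links. The names `Band` and `estimate_quality`, and all counts in the usage lines, are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Band:
    """One score threshold (band) of record pairs; a hypothetical structure."""
    lower_score: float     # lowest match score that falls in this band
    total_pairs: int       # all record pairs in the band
    sampled_pairs: int     # pairs drawn at random for clerical review
    reviewed_matches: int  # sampled pairs the reviewer judged to be true matches


def estimate_quality(bands, cutoff):
    """Scale clerical-review results up to every pair in each band.

    Bands whose lower_score is at or above `cutoff` are treated as accepted
    links; all other bands are treated as rejected pairs.
    Returns (precision, recall, est_tp, est_fp, est_fn).
    """
    tp = fp = fn = 0.0
    for band in bands:
        if band.sampled_pairs <= 0:
            raise ValueError("each band needs at least one reviewed pair")
        # Proportion of sampled pairs that are true matches in this band.
        match_rate = band.reviewed_matches / band.sampled_pairs
        est_matches = match_rate * band.total_pairs
        if band.lower_score >= cutoff:
            tp += est_matches                     # accepted true matches
            fp += band.total_pairs - est_matches  # accepted non-matches
        else:
            fn += est_matches                     # true matches left unlinked
    precision = tp / (tp + fp) if tp + fp > 0 else float("nan")
    recall = tp / (tp + fn) if tp + fn > 0 else float("nan")
    return precision, recall, tp, fp, fn


# Illustrative counts only (not data from the paper).
bands = [
    Band(lower_score=20, total_pairs=1000, sampled_pairs=50, reviewed_matches=50),
    Band(lower_score=15, total_pairs=400, sampled_pairs=40, reviewed_matches=36),
    Band(lower_score=10, total_pairs=600, sampled_pairs=60, reviewed_matches=12),
    Band(lower_score=5, total_pairs=2000, sampled_pairs=100, reviewed_matches=1),
]
precision, recall, tp, fp, fn = estimate_quality(bands, cutoff=15)
print(f"precision={precision:.4f} recall={recall:.4f}")
# prints: precision=0.9714 recall=0.9067
```

With these made-up counts, about 1,360 of the 1,400 accepted pairs are estimated to be true matches, and about 140 true matches are estimated to sit below the cut-off. This sketch only counts false negatives among record pairs that were actually generated; true matches never brought together as a candidate pair are outside its scope.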

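The abstract's Results section reports reviewer agreement as a Fleiss' kappa of 0.601. A minimal sketch of the standard Fleiss' kappa formula follows: the observed agreement on each reviewed record pair is averaged and compared with the agreement expected by chance from the overall proportion of ratings in each category. The function name `fleiss_kappa` and the vote counts (four hypothetical record pairs, each judged match or non-match by seven reviewers) are illustrative assumptions, not the study's review data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects x categories table of rating counts.

    counts[i][j] is the number of raters who assigned subject i (here, a
    record pair) to category j (e.g. match / non-match). Every subject must
    be rated by the same number of raters, at least two.
    """
    n_subjects = len(counts)
    if n_subjects == 0:
        raise ValueError("need at least one subject")
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of raters")
    n_categories = len(counts[0])

    # Observed agreement for each subject: share of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall proportion of ratings per category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating is the same category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical example: 4 record pairs, each judged by 7 reviewers as
# [match, non-match]. These are not the study's review results.
votes = [[7, 0], [0, 7], [6, 1], [1, 6]]
print(round(fleiss_kappa(votes), 3))  # prints 0.714
```

A kappa of 1 means complete agreement and 0 means agreement no better than chance, so a value such as 0.601 sits between the two.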
dc.publisher: Schattauer Publishers
dc.title: A simple sampling method for estimating the accuracy of large scale record linkage projects
dc.type: Journal Article
dcterms.source.volume: 55
dcterms.source.number: 3
dcterms.source.startPage: 276
dcterms.source.endPage: 283
dcterms.source.issn: 0026-1270
dcterms.source.title: Methods of Information in Medicine
curtin.note

This article is not an exact copy of the original published article in Methods of Information in Medicine. The definitive publisher-authenticated version of Boyd, J. and Guiver, T. and Randall, S. and Ferrante, A. and Semmens, J. and Anderson, P. and Dickinson, T. 2016. A simple sampling method for estimating the accuracy of large scale record linkage projects. Methods of Information in Medicine. 55 (3): pp. 276-283, is available online at: http://doi.org/10.3414/ME15-01-0152

curtin.department: Centre for Population Health Research
curtin.accessStatus: Open access

