
    espace - Curtin’s institutional repository


    A simple sampling method for estimating the accuracy of large scale record linkage projects

    240880_240880.pdf (267.4Kb)
    Access Status
    Open access
    Authors
    Boyd, James
    Guiver, T.
    Randall, Sean
    Ferrante, Anna
    Semmens, James
    Anderson, P.
    Dickinson, T.
    Date
    2016
    Type
    Journal Article
    
    Citation
    Boyd, J. and Guiver, T. and Randall, S. and Ferrante, A. and Semmens, J. and Anderson, P. and Dickinson, T. 2016. A simple sampling method for estimating the accuracy of large scale record linkage projects. Methods of Information in Medicine. 55 (3): pp. 276-283.
    Source Title
    Methods of Information in Medicine
    DOI
    10.3414/ME15-01-0152
    ISSN
    0026-1270
    School
    Centre for Population Health Research
    Remarks

    This article is not an exact copy of the original published article in Methods of Information in Medicine. The definitive publisher-authenticated version (Boyd, J. and Guiver, T. and Randall, S. and Ferrante, A. and Semmens, J. and Anderson, P. and Dickinson, T. 2016. A simple sampling method for estimating the accuracy of large scale record linkage projects. Methods of Information in Medicine. 55 (3): pp. 276-283) is available online at: http://doi.org/10.3414/ME15-01-0152

    URI
    http://hdl.handle.net/20.500.11937/26908
    Collection
    • Curtin Research Publications
    Abstract

    Background: Record linkage techniques allow different data collections to be brought together to provide a wider picture of the health status of individuals. Ensuring high linkage quality is important to guarantee the quality and integrity of research. Current methods for measuring linkage quality typically focus on precision (the proportion of links that are correct), given the difficulty of measuring the proportion of false negatives.

    Objectives: The aim of this work is to introduce and evaluate a sampling-based method to estimate both precision and recall following record linkage.

    Methods: In the sampling-based method, record-pairs from each threshold (including those below the identified cut-off for acceptance) are sampled and clerically reviewed. These results are then applied to the entire set of record-pairs, providing estimates of false positives and false negatives. This method was evaluated on a synthetically generated dataset, where the true match status (which records belonged to the same person) was known.

    Results: The sampled estimates of linkage quality were relatively close to actual linkage quality metrics calculated for the whole synthetic dataset. The precision and recall measures for seven reviewers were very consistent, with little variation in the clerical assessment results (overall agreement using the Fleiss' kappa statistic was 0.601).

    Conclusions: This method presents as a possible means of accurately estimating matching quality and refining linkages in population-level linkage studies. The sampling approach is especially important for large linkage projects, where the number of record-pairs produced may be very large, often running into millions.
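The sampling idea described in the abstract can be illustrated with a minimal sketch. It assumes each threshold band is summarised by the number of record-pairs it contains, the number sampled for clerical review, and the number of sampled pairs judged to be true matches. The names `Stratum` and `estimate_quality` are hypothetical, and the simple proportional scaling used here is an assumption for illustration, not necessarily the estimator used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """One threshold band of record-pairs (hypothetical structure)."""
    total_pairs: int      # all record-pairs falling in this band
    sampled: int          # pairs drawn for clerical review
    sampled_matches: int  # reviewed pairs judged to be true matches
    accepted: bool        # True if the band is at or above the cut-off


def estimate_quality(strata):
    """Scale each band's sampled match rate to the whole band,
    then derive estimated precision and recall."""
    tp = fp = fn = 0.0
    for s in strata:
        match_rate = s.sampled_matches / s.sampled
        est_matches = match_rate * s.total_pairs
        if s.accepted:
            tp += est_matches                  # accepted and truly matching
            fp += s.total_pairs - est_matches  # accepted but not matching
        else:
            fn += est_matches                  # rejected but truly matching
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# Illustrative numbers only; they are not taken from the study.
strata = [
    Stratum(total_pairs=10000, sampled=100, sampled_matches=100, accepted=True),
    Stratum(total_pairs=2000, sampled=100, sampled_matches=90, accepted=True),
    Stratum(total_pairs=8000, sampled=100, sampled_matches=5, accepted=False),
]
precision, recall = estimate_quality(strata)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Sampling below the cut-off is what makes a recall estimate possible: without reviewing rejected pairs there is no basis for estimating false negatives.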

    Related items

    Showing items related by title, author, creator and subject.

    • Estimating parameters for probabilistic linkage of privacy-preserved datasets.
      Brown, A.; Randall, Sean; Ferrante, A.; Semmens, J.; Boyd, J. (2017)
      Background: Probabilistic record linkage is a process used to bring together person-based records from within the same dataset (de-duplication) or from disparate datasets using pairwise comparisons and matching probabilities. ...
    • Sociodemographic differences in linkage error: An examination of four large-scale datasets
      Randall, Sean; Brown, Adrian; Boyd, James; Schnell, R.; Borgs, C.; Ferrante, Anna (2018)
      © 2018 The Author(s). Background: Record linkage is an important tool for epidemiologists and health planners. Record linkage studies will generally contain some level of residual record linkage error, where individual ...
    • The effect of data cleaning on record linkage quality
      Randall, Sean; Ferrante, Anna; Boyd, James; Semmens, James (2013)
      Background: Within the field of record linkage, numerous data cleaning and standardisation techniques are employed to ensure the highest quality of links. While these facilities are common in record linkage software ...
