    espace - Curtin’s institutional repository

    The effect of data cleaning on record linkage quality

    191935_93901_The_effect_of_data_cleaning_74544.pdf (342.5Kb)
    Access Status
    Open access
    Authors
    Randall, Sean
    Ferrante, Anna
    Boyd, James
    Semmens, James
    Date
    2013
    Type
    Journal Article
    
    Citation
    Randall, Sean M. and Ferrante, Anna M. and Boyd, James H. and Semmens, James B. 2013. The effect of data cleaning on record linkage quality. BMC Medical Informatics and Decision Making. 13 (64): pp. 1-10.
    Source Title
    BMC Medical Informatics and Decision Making
    Additional URLs
    http://www.biomedcentral.com/1472-6947/13/64
    ISSN
    1472-6947
    Remarks

    This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

    URI
    http://hdl.handle.net/20.500.11937/17174
    Collection
    • Curtin Research Publications
    Abstract

    Background: Within the field of record linkage, numerous data cleaning and standardisation techniques are employed to ensure the highest quality of links. While these facilities are common in record linkage software packages and are regularly deployed across record linkage units, little work has been published demonstrating the impact of data cleaning on linkage quality.

    Methods: A range of cleaning techniques was applied to both a synthetically generated dataset and a large administrative dataset previously linked to a high standard. The effect of these changes on linkage quality was investigated using pairwise F-measure to determine quality.

    Results: Data cleaning made little difference to the overall linkage quality, with heavy cleaning leading to a decrease in quality. Further examination showed that decreases in linkage quality were due to cleaning techniques typically reducing the variability – although correct records were now more likely to match, incorrect records were also more likely to match, and these incorrect matches outweighed the correct matches, reducing quality overall.

    Conclusions: Data cleaning techniques have minimal effect on linkage quality. Care should be taken during the data cleaning process.
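
The pairwise F-measure named in the Methods can be illustrated with a minimal sketch. It assumes the usual definition (harmonic mean of pairwise precision and recall over record pairs); the function names and the toy clusters below are hypothetical and are not taken from the article or its datasets. The toy data is chosen only to mirror the effect described in the Results: a more aggressive linkage finds every true pair but also adds false pairs, and the overall score drops.

```python
from itertools import combinations


def pairs_from_clusters(clusters):
    """Expand groups of record ids into the set of unordered record pairs they imply."""
    pairs = set()
    for cluster in clusters:
        # sorted() gives each pair a canonical (low, high) order
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def pairwise_f_measure(true_pairs, predicted_pairs):
    """Return (precision, recall, F-measure) computed over record pairs."""
    true_positives = len(true_pairs & predicted_pairs)
    precision = true_positives / len(predicted_pairs) if predicted_pairs else 0.0
    recall = true_positives / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy data (made up for illustration): records 1-3 are one person, 4-5 another.
truth = pairs_from_clusters([{1, 2, 3}, {4, 5}])

# Hypothetical linkage on raw data: misses record 3 but makes no false links.
raw_links = pairs_from_clusters([{1, 2}, {3}, {4, 5}])

# Hypothetical linkage after heavy cleaning: everything collapses into one group.
heavy_links = pairs_from_clusters([{1, 2, 3, 4, 5}])

raw_scores = pairwise_f_measure(truth, raw_links)
heavy_scores = pairwise_f_measure(truth, heavy_links)

print("raw:   precision=%.3f recall=%.3f F=%.3f" % raw_scores)
print("heavy: precision=%.3f recall=%.3f F=%.3f" % heavy_scores)
```

In this toy case recall rises under the heavier linkage while precision falls by more, so the F-measure is lower, which is the pattern the abstract attributes to reduced variability after cleaning.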

    Related items

    Showing items related by title, author, creator and subject.

    • Estimating parameters for probabilistic linkage of privacy-preserved datasets.
      Brown, A.; Randall, Sean; Ferrante, A.; Semmens, J.; Boyd, J. (2017)
      Background: Probabilistic record linkage is a process used to bring together person-based records from within the same dataset (de-duplication) or from disparate datasets using pairwise comparisons and matching probabilities. ...
    • A simple sampling method for estimating the accuracy of large scale record linkage projects
      Boyd, James; Guiver, T.; Randall, Sean; Ferrante, Anna; Semmens, James; Anderson, P.; Dickinson, T. (2016)
      Background: Record linkage techniques allow different data collections to be brought together to provide a wider picture of the health status of individuals. Ensuring high linkage quality is important to guarantee the ...
    • A Privacy Attack on Multiple Dynamic Match-key based Privacy-Preserving Record Linkage
      Vidanage, Anushka; Ranbaduge, Thilina; Christen, Peter; Randall, Sean (2020)
      Over the last decade, the demand for linking records about people across databases has increased in various domains. Privacy challenges associated with linking sensitive information led to the development of privacy-preserving ...