Technical challenges of providing record linkage services for research
Background: Record linkage techniques are widely used to enable health researchers to obtain event-based longitudinal information for entire populations. The task of record linkage is increasingly being undertaken by specialised linkage units (SLUs). In addition to the complexity of undertaking probabilistic record linkage, these units face further technical challenges in providing record linkage ‘as a service’ for research. The extent of this functionality, and approaches to solving these issues, have received little attention in the record linkage literature. Few, if any, of the record linkage packages or systems currently used by SLUs include the full range of functions required.
Methods: This paper identifies and discusses some of the functions that are required or undertaken by SLUs in the provision of record linkage services. These include managing routine, on-going linkage; storing and handling changing data; handling different linkage scenarios; and accommodating ever-increasing datasets. Automated linkage processes are one way of ensuring consistency of results and scalability of service.
Results: Alternative solutions to some of these challenges are presented. By maintaining a full history of links and storing pairwise information, many of the challenges around handling ‘open’ records and providing automated managed extractions are solved. A number of these solutions were implemented as part of the development of the National Linkage System (NLS) by the Centre for Data Linkage (part of the Population Health Research Network) in Australia.
Conclusions: The demand for, and complexity of, linkage services are growing. This presents a challenge to SLUs as they seek to service the varying needs of dozens of research projects annually. Linkage units need to be both flexible and scalable to meet this demand. It is hoped the solutions presented here can help mitigate these difficulties.
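To make the idea of keeping a full link history and storing pairwise information more concrete, the following is a minimal sketch in Python. It is not the NLS implementation: the `LinkEvent` schema, the `groups_as_of` function and the record identifiers are all hypothetical, chosen only to show how groups of records could be rebuilt for any point in the history from an append-only list of pairwise decisions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkEvent:
    """One append-only entry in a link history (hypothetical schema)."""
    seq: int       # order in which the decision was recorded
    rec_a: str     # identifier of the first record in the pair
    rec_b: str     # identifier of the second record in the pair
    linked: bool   # True = pair judged a match, False = earlier match withdrawn


def groups_as_of(records, history, seq):
    """Rebuild groups of records from pairwise decisions recorded up to `seq`."""
    # Replay the history in order; the latest decision for each pair wins.
    state = {}
    for ev in sorted(history, key=lambda e: e.seq):
        if ev.seq > seq:
            break
        state[tuple(sorted((ev.rec_a, ev.rec_b)))] = ev.linked

    # Union-find over every known record identifier.
    parent = {r: r for r in records}
    for a, b in state:
        parent.setdefault(a, a)
        parent.setdefault(b, b)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the two sides of every pair that is currently linked.
    for (a, b), linked in state.items():
        if linked:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for r in parent:
        groups[find(r)].add(r)
    return sorted(sorted(g) for g in groups.values())


# Illustrative data only: three decisions, the last one withdrawing the first.
history = [
    LinkEvent(1, "A1", "B7", True),
    LinkEvent(2, "B7", "C3", True),
    LinkEvent(3, "A1", "B7", False),
]
records = ["A1", "B7", "C3", "D9"]

before = groups_as_of(records, history, 2)
after = groups_as_of(records, history, 3)
```

Because no decision is ever overwritten, an extraction produced at an earlier point can be reproduced by replaying the history up to that point, and a withdrawn link only requires appending a new entry rather than editing stored groups.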
This article is published under the Open Access publishing model and distributed under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/2.0/. Please refer to the licence to obtain terms for any further reuse or distribution of this work.
Showing items related by title, author, creator and subject.
Boyd, James; Ferrante, Anna; O'Keefe, C.; Bass, A.; Randall, Sean; Semmens, James (2012) Background: The Centre for Data Linkage (CDL) has been established to enable national and cross-jurisdictional health-related research in Australia. It has been funded through the Population Health Research Network (PHRN), ...
Boyd, James; Randall, Sean; Ferrante, Anna (2015) Record linkage is the process of bringing together data relating to the same individual within and between different datasets. These integrated datasets provide diverse and rich resources for researchers without the cost ...
Boyd, James; Randall, Sean; Ferrante, Anna; Bauer, Jacqui; McInneny, K.; Brown, Adrian; Spilsbury, Katrina; Gillies, Margo; Semmens, James (2015) Background: The technical challenges associated with national data linkage, and the extent of cross-border population movements, are explored as part of a pioneering research project. The project involved linking state-based ...