Creating a Research-Ready Data Asset version of primary care data for Wales and investigating the impact of COVID-19 on utilisation of primary care services
Hoda Abbasizanjani, Stuart Bedston, Ashley Akbari
Abstract
We developed an efficient Research-Ready Data Asset (RRDA) for the Welsh Longitudinal General Practice (WLGP) data within the Secure Anonymised Information Linkage Databank to standardise curation, enhance reproducibility, and facilitate research on primary care trends.
Introduction
Primary care is the foundation of healthcare delivery, providing continuous, comprehensive, and accessible services for a broad range of health concerns. General practices (GPs) serve as the first point of contact for most patients, managing acute and chronic conditions, coordinating specialist referrals, and delivering preventive care [1].
Methods
Approval for the use of anonymised data in this study, provisioned within the SAIL Databank, was granted by an independent Information Governance Review Panel (IGRP) under project 0911. The IGRP has a membership comprised of senior representatives from the British Medical Association
Results
The WLGP contains more than 4.6 billion clinical event records for approximately 5.1 million individuals, spanning 1990 to 2024. Of these records, 98.3% are linkable to individuals with valid GP registration records in WDSD.
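As an illustration of how a linkage rate of this kind can be computed, the following minimal sketch counts the share of event records whose linkage field matches a set of registered individuals. It is not the authors' implementation (their scripts are in the repository cited under Data Availability). The `Event` class, the `linkable_share` function, the field names, and the toy data are all hypothetical, and the sketch assumes that "linkable" simply means the record's ALF appears among individuals with a registration record.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Event:
    """Hypothetical clinical event record (field names are assumptions)."""
    alf: Optional[int]  # anonymised linkage field; None when missing
    event_date: date
    code: str           # clinical code attached to the event


def linkable_share(events: Iterable[Event], registered_alfs: Set[int]) -> float:
    """Return the fraction of events whose ALF is in registered_alfs.

    Assumption: an event counts as linkable when its ALF is present and
    belongs to an individual with a registration record.
    """
    total = 0
    linked = 0
    for event in events:
        total += 1
        if event.alf is not None and event.alf in registered_alfs:
            linked += 1
    # Avoid division by zero when there are no events at all.
    return linked / total if total else 0.0


# Toy data, invented purely for illustration.
events = [
    Event(alf=101, event_date=date(2019, 3, 1), code="A1"),
    Event(alf=102, event_date=date(2020, 4, 15), code="B2"),
    Event(alf=None, event_date=date(2020, 5, 2), code="C3"),  # missing ALF
    Event(alf=101, event_date=date(2021, 1, 9), code="D4"),
]
registered = {101, 102}

share = linkable_share(events, registered)
print(f"Linkable records: {share:.1%}")
```

A stricter definition could also require the event date to fall within a registration period; that refinement is left out here because the excerpt does not specify it.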
Discussion
This study describes the development of the WLGP RRDA, a curated and structured version of primary care data for Wales designed to improve analytic readiness, consistency, and scalability of research using WLGP data within the SAIL Databank. The RRDA enables more efficient use of routinely collected primary care data by applying methodical cleaning.
Acknowledgments
We would like to thank Lucy Robinson and Mattew Curds for their valuable input and support during the development of this work as part of the ADR Wales Major Societal Challenges research team.
Citation: Abbasizanjani H, Bedston S, Akbari A (2025) Creating a Research-Ready Data Asset version of primary care data for Wales and investigating the impact of COVID-19 on utilisation of primary care services. PLoS One 20(12): e0338652. https://doi.org/10.1371/journal.pone.0338652
Editor: Maria Christine Magnus, Norwegian Institute of Public Health: Folkehelseinstituttet, NORWAY
Received: September 23, 2025; Accepted: November 25, 2025; Published: December 10, 2025
Copyright: © 2025 Abbasizanjani et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data used in this study are available in the SAIL Databank at Swansea University, Swansea, UK. All proposals to use SAIL data are subject to review by an independent Information Governance Review Panel (IGRP). Before any data can be accessed, approval must be given by the IGRP. The IGRP carefully considers each project to ensure the proper and appropriate use of SAIL data. When approved, access is gained through a privacy-protecting trusted research environment (TRE) and remote access system referred to as the SAIL Gateway. SAIL has established an application process to be followed by anyone who would like to access data via SAIL, described at https://saildatabank.com/data/apply-to-work-with-the-data/. This study has been approved by the IGRP as project 0911. The scripts used for data cleaning, curation, and analysis are openly available at https://github.com/SwanseaUniversityDataScience/WLGP_RRDA/.
Funding: This work was supported by the ADR Wales programme of work. ADR Wales, part of the ADR UK investment, unites research expertise from Swansea University Medical School and WISERD (Wales Institute of Social and Economic Research and Data) at Cardiff University with analysts from Welsh Government. ADR UK is funded by the Economic and Social Research Council (ESRC), part of UK Research and Innovation. This research was supported by ESRC funding, including Administrative Data Research Wales (ES/W012227/1).
Competing interests: The authors have declared that no competing interests exist.
Abbreviations: ALF, Anonymised Linkage Field; DHCW, Digital Health and Care Wales; EHR, Electronic health records; GAMM, Generalised Additive Mixed Model; GP, General practice; IGRP, Information Governance Review Panel; ONS, Office for National Statistics; RRDA, Research-Ready Data Asset; SAIL Databank, Secure Anonymised Information Linkage Databank; SNOMED-CT, Systematized Nomenclature of Medicine Clinical Terms; TRE, Trusted Research Environment; WDSD, Welsh Demographic Service Dataset; WIMD, Welsh Index of Multiple Deprivation; WLGP, Welsh Longitudinal General Practice data