How to Protect Patient Privacy When Launching a New Healthcare Data Project

Philip Russmeyer, Founder & CEO, FITFILE

Patient health data is an invaluable resource for healthcare organizations: it can be used to improve treatment pathways, manage demand and capacity, raise safety standards and much more. However, confusion and concern about the safe and compliant use of patient data are preventing many of these benefits from being realized.


Health data expert Philip Russmeyer explains how to navigate the hurdles of patient data management, and how healthcare organizations can store, access and analyze patient data in the most secure way. This includes how to release valuable data from silos, the crucial differences between data anonymization and pseudonymization, and when, why and how each privacy-protecting process should be deployed.

If asked to rank the most valuable assets of a healthcare organization, most people would rank the staff, medical equipment, medications, or even the software highest on their list. 

However, if we think of ‘value’ as ‘potential to save lives and improve multiple organizational outcomes at scale’, patient health data stands out far ahead of all other assets. In Europe in 2023, every hospital and healthcare system stands to become a more efficient, more evidence-directed and more effective provider of patient care by better accessing, linking and deploying its data resources in a secure and compliant manner. Despite this, many leaders are reluctant to improve their use of data because the mountain appears too difficult and too dangerous to climb.

This reluctance is becoming a major barrier to progress. Yes, large-scale data projects are a big undertaking and can carry an element of risk, but there have never been more options and technologies available to protect privacy and deliver success. Today, as providers struggle with capacity and staff overutilisation, and in the future, as budgets are squeezed ever tighter, no health system can afford to ignore the benefits of data-driven decision making.

Data saves lives 
As data access technologies, cryptographic algorithms and interoperable infrastructures grow in capability and sophistication, organizations' ability to unlock the power of the data they hold about their patients, and with it improved outcomes, is expanding rapidly.

In March 2020, as European countries began their first pandemic lockdowns, the Randomised Evaluation of COVID-19 Therapy (RECOVERY) trial was launched in the UK. Its goal was to identify existing medications that could be effective treatments for COVID-19, and its researchers were given secure access to data from nearly 50,000 patients across 192 clinical venues. They used this wealth of real-time information to recruit trial participants, allocate treatments, monitor outcomes and analyze results. After just 100 days, the researchers were able to provide evidence that neither hydroxychloroquine nor lopinavir/ritonavir was an effective treatment. The trial also showed that a different medication, dexamethasone, could reduce deaths by up to one third in hospitalized patients. This drug went on to save the lives of over one million COVID-19 patients around the world.

Data was accessed during this period under emergency “COPI notices” (Control of Patient Information notices), which temporarily suspended individuals’ rights to privacy; now that the pandemic has ended, this practice has quite correctly been withdrawn. However, similar insights can now be generated at the same scale and speed with leading-edge data access and privacy-preservation technology, without the risk of compromising individuals’ privacy. How? By anonymising at source, still linking across sources when required, and leaving all record-level data in its original setting whenever possible.

In 2022, at St George’s Hospital in London, an Inflammatory Bowel Disease (IBD) project launched in collaboration with FITFILE used this technology to analyze the organization’s existing siloed IBD patient data and generate significant system efficiencies. The initiative showed that switching the biologics of 50% of patients could improve overall outcomes by 10%. It also showed that savings of around £2.5 million could be achieved through funding reallocation for just one subset of patients in one hospital department. Scaled up to all patients across all departments, savings of this kind could significantly improve the hospital trust’s financial standing and free up resources for optimized patient care.

These are two excellent use cases for the huge volume of patient data that every healthcare organization, from hospitals to primary and specialist care clinics to entire health systems, collects and stores every day.

Both the RECOVERY trial and the St George’s IBD project illustrate how patient data can be leveraged in practice to high impact. Other examples range from managing system capacity to identifying process efficiencies and designing new treatment pathways.

Gold-standard systems 
Understandably, and encouragingly, national and independent healthcare organizations across Europe are now increasingly pursuing a vast array of projects to unlock and unite patient data. 

If healthcare organizations wish to create and maintain systems that unlock this locked-up data, they must build those systems on streamlined, highly secure and complete data access. This requires broad collaboration between stakeholders (patients, researchers, clinicians and administrators) and explicit data sharing agreements that are ultimately in patients’ best interests.

Once a realistic roadmap for achieving complete data access is in place, the next step is to establish safe and secure routes through which data can be accessed and united.

However, this huge opportunity comes with huge risk and huge responsibility. Unless leaders acknowledge the sources of this risk and understand how to mitigate it, the full potential of patient data projects will remain unrealized.  

Protecting patient privacy: the need-to-knows 
Every institution that collects and stores private data is legally obliged to protect the privacy of the individuals to whom that data relates. In Europe, data protection regulation mandates that patient data can be used by healthcare and research organizations beyond direct patient care only if the individuals concerned can never be re-identified. The only practical exceptions are where the individual concerned has explicitly granted consent and/or the use case has been specifically approved by the authorities.

Recent examples include DARWIN EU’s initial batch of studies using real-world data in areas such as rare blood cancers and asthma. These studies used national databases drawing on a variety of sources, including hospitals, primary care, health insurance, registries and biobanks, to investigate aspects such as prevalence, safety and prescription patterns.

These studies show that patient data does not need to remain locked up until explicit consent is secured. However, they also highlight the need for sophisticated methods of privacy preservation when data is accessed and used.

In practice, this means that all identifying features must be removed from the data so that the information can never be linked back to an individual. This is where the challenge lies. As noted, for instance, in the DARWIN Protocol C1-001, “patient visiting more than one provider are not cross identified for data protection reasons and therefore recorded as separate in the system”. This is just one example of how severely insight generation is limited when complete, accurate and consistent patient health profiles can’t be assembled at the record level from data points distributed across sources such as electronic health records, pathology results and administrative databases. Until quite recently, it has only been possible to bring this data together either in aggregated form, which is far less useful, or in identifiable form, which, as explained, violates privacy regulations.
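To make the double-counting problem concrete, here is a minimal Python sketch using made-up records. The field names (`local_id`, `patient_key`) and both helper functions are hypothetical; `patient_key` simply stands in for whatever privacy-preserving linkage key might be available and is not taken from the DARWIN protocol.

```python
# Hypothetical visit records from two providers. Each provider issues its own
# local_id; patient_key stands in for a cross-provider linkage key.
provider_a = [{"local_id": "A-001", "patient_key": "p1"},
              {"local_id": "A-002", "patient_key": "p2"}]
provider_b = [{"local_id": "B-017", "patient_key": "p2"},
              {"local_id": "B-018", "patient_key": "p3"}]


def count_without_linkage(*sources):
    # Without cross-identification, every provider-specific ID looks like a
    # different person, so patient p2 is counted once per provider.
    return len({r["local_id"] for src in sources for r in src})


def count_with_linkage(*sources):
    # A shared key lets the same person be recognized across providers.
    return len({r["patient_key"] for src in sources for r in src})


print(count_without_linkage(provider_a, provider_b))  # prints 4
print(count_with_linkage(provider_a, provider_b))     # prints 3
```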

Even if ‘tokenization’ is used, replacing the patient’s identity with a token, the process is reversible, so the data can be re-identified and privacy is not fully protected. The explicit consent or specific permission route must therefore be pursued for every use of tokenized data.
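The reversibility problem can be seen in a minimal sketch of a token vault. This is a generic illustration rather than any particular product: the `TokenVault` class and its methods are hypothetical, and real tokenization services differ in how they generate and store tokens. The point is only that a table mapping tokens back to identifiers exists, so whoever holds it can re-identify the data.

```python
import secrets


class TokenVault:
    """Hypothetical tokenization service: swaps an identifier for a random
    token and keeps a lookup table so the swap can be undone."""

    def __init__(self):
        self._forward = {}  # identifier -> token
        self._reverse = {}  # token -> identifier

    def tokenize(self, identifier: str) -> str:
        # Reuse the existing token so one patient always maps to one token.
        if identifier not in self._forward:
            token = secrets.token_hex(8)
            self._forward[identifier] = token
            self._reverse[token] = identifier
        return self._forward[identifier]

    def detokenize(self, token: str) -> str:
        # This lookup is what makes tokenization reversible: anyone with
        # access to the vault can recover the original identifier.
        return self._reverse[token]


vault = TokenVault()
token = vault.tokenize("patient-id-12345")  # made-up identifier
print(vault.detokenize(token))  # prints patient-id-12345
```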

Is anonymization the answer?
Importantly, recent advances in proven cryptography have opened the door to huge possibilities for safely uniting and using patient data. 

Using new processes, all identifiable information can be stripped from the data, making it safe to unify and offering the strongest option for preserving privacy. Once this total, irreversible anonymization has been carried out, ideally while the data remains in its source location, the data can still be linked longitudinally and contemporaneously without requiring explicit patient consent.
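One widely used building block for linking records without exposing identifiers is a keyed one-way hash (HMAC) computed inside each source system. The sketch below illustrates that general technique under stated assumptions; it is not FITFILE’s process, which the article does not describe. The function names, the shared secret key and the normalization rule are all hypothetical. How resistant such tokens are to re-identification depends heavily on how the key is generated, held and destroyed, so treat this as showing only the linkage mechanics.

```python
import hashlib
import hmac


def link_token(identifier: str, secret_key: bytes) -> str:
    # Normalize so formatting differences ("123 456 7890" vs "1234567890")
    # yield the same token at every site.
    normalized = "".join(identifier.split()).upper()
    # HMAC-SHA256 is one-way; without the secret key, an attacker cannot
    # simply hash candidate identifiers to find a match.
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_and_tokenize(record: dict, secret_key: bytes) -> dict:
    # Hypothetically run inside each source system: drop the direct
    # identifier and keep only the linkage token alongside clinical fields.
    out = {k: v for k, v in record.items() if k != "patient_id"}
    out["link_token"] = link_token(record["patient_id"], secret_key)
    return out


KEY = b"hypothetical-key-never-shared-with-data-users"
ehr = strip_and_tokenize({"patient_id": "123 456 7890", "diagnosis": "IBD"}, KEY)
lab = strip_and_tokenize({"patient_id": "1234567890", "crp_mg_l": 12}, KEY)
print(ehr["link_token"] == lab["link_token"])  # prints True
```

Because both sites derive the same token from the same normalized identifier, the records can be joined across sources even though neither output contains the identifier itself.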

This means that both retrospective and prospective patient data, collected from numerous touchpoints, can be used without any compliance risk.

This approach empowers healthcare providers and clinical teams to recognize distinct patterns within the local patient population and cater to individual healthcare needs. It enables the prompt and precise identification of patient cohorts, optimizes resource allocation for care provision and ultimately contributes to better health outcomes.

Achieving anonymization
To fully and irreversibly anonymize their data, it’s necessary for healthcare organizations to deploy proven technologies that mitigate risk and help build trust and buy-in for data projects.  

The FITFILE platform is an example of a technology built for this purpose. The platform uniquely anonymizes the data within the data controller’s own environment, and then connects, unites and integrates that data at the record level. It can also calculate statistics on populations of interest without moving any data, applying differential privacy to deliver truly federated insights about anonymized individuals whose data spans numerous silos (as opposed to studies such as the early DARWIN EU projects, which drew only on single-source data for non-overlapping populations).
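Differential privacy is a property of how statistics are released rather than of the data itself. Below is a minimal sketch of the textbook Laplace mechanism for a count query; it is not a description of how FITFILE implements differential privacy. The `dp_count` function, the sample records and the `epsilon` value are illustrative assumptions. A count changes by at most 1 when one patient is added or removed, so Laplace noise with scale 1/epsilon is enough for this query.

```python
import random


def dp_count(records, predicate, epsilon, rng):
    """Release a count with Laplace noise (sensitivity 1, scale 1/epsilon)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    # The difference of two independent Exponential(rate=epsilon) draws
    # is Laplace-distributed with mean 0 and scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Hypothetical anonymized cohort held at one site.
records = [{"on_biologic": True}, {"on_biologic": False}, {"on_biologic": True}]
rng = random.Random(42)
noisy = dp_count(records, lambda r: r["on_biologic"], epsilon=1.0, rng=rng)
print(f"noisy count of patients on a biologic: {noisy:.2f}")
```

Python’s `random` module is not cryptographically secure, and naive floating-point noise has known weaknesses, so a real deployment would rely on a vetted differential privacy library rather than this sketch.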

Federated computations at source mean that there is minimal movement of data, and fully and irreversibly anonymizing the data ahead of unification ensures the best possible privacy preservation.
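The idea of computing at source can be sketched as each site reducing its records to a small summary, with only those summaries sent to a coordinator. In this minimal example, the `LocalSummary` type, the site data and the choice of a pooled mean are all hypothetical. Real federated platforms support far richer computations and typically add safeguards such as minimum cell sizes or differential privacy, since even aggregates can leak information about small groups.

```python
from dataclasses import dataclass


@dataclass
class LocalSummary:
    n: int        # number of matching patients at this site
    total: float  # sum of the measurement for those patients


def summarize_at_site(values):
    # Runs inside the site's environment; only these two numbers leave it.
    return LocalSummary(n=len(values), total=float(sum(values)))


def pooled_mean(summaries):
    # The coordinator never sees record-level values, only per-site aggregates.
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no patients at any site")
    return sum(s.total for s in summaries) / n


site_a = summarize_at_site([4.0, 6.0])       # hypothetical lengths of stay (days)
site_b = summarize_at_site([3.0, 5.0, 7.0])
print(pooled_mean([site_a, site_b]))         # prints 5.0
```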

With access to the results of the analysis, clinicians can be equipped with the information they need to recommend the best course of treatment for more finely stratified patient groups, while hospital leaders can forecast and manage aspects such as patient flow through their wards more effectively.

Checks and balances 
The degree of risk that comes with any patient data project can - and must - be effectively managed. To do this, project leaders must ensure that patient data is fully and irreversibly anonymized wherever appropriate. Safeguards must be established to ensure that only approved persons can access the data, and that any threat from hacks or security breaches is minimized. 
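A basic version of the ‘approved persons only’ safeguard is a deny-by-default approval check that logs every access attempt. The sketch below uses a hypothetical in-memory approval register, user names and audit list; in practice these controls would sit in the platform’s identity, access management and logging infrastructure rather than in application code.

```python
from datetime import datetime, timezone

# Hypothetical approval register: which users may query which datasets.
APPROVALS = {
    "ibd_outcomes": {"analyst.jones", "dr.patel"},
}
AUDIT_LOG = []


def authorize(user: str, dataset: str) -> bool:
    # Deny by default: unknown datasets and unlisted users get no access.
    allowed = user in APPROVALS.get(dataset, set())
    # Record every attempt, granted or denied, for later review.
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "dataset": dataset,
        "granted": allowed,
    })
    return allowed


print(authorize("dr.patel", "ibd_outcomes"))      # prints True
print(authorize("unknown.user", "ibd_outcomes"))  # prints False
```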

As discussed above, this can be achieved with privacy-by-design safety mechanisms integrated into the platforms and tools in use. However, it is the ultimate responsibility of the project leaders to monitor adherence to national data protection regulations.

Enabling an evidence-informed approach
Frequently negative coverage in the national mainstream media has not helped: there is a high level of public apprehension about the risks of sharing patient and public health data. If this apprehension translates into a lack of investment or a lack of senior-level enthusiasm for data projects within the healthcare sector, everyone loses. It is the responsibility of leaders to push ahead with such projects while advocating for the most stringent processes, systems and protocols to be implemented and followed. This includes ensuring that the privacy treatment applied is the best possible for any given purpose. In many cases, that may mean replacing traditional tokenization or pseudonymisation with irreversible anonymization at source, so that patient privacy can be fully guaranteed while still producing a useful resource for the planners, managers, clinicians and even researchers who have valid reason to access it.


Philip Russmeyer

Philip Russmeyer is the founder & CEO of FITFILE, a company on a mission to close the gap between data silos in the healthcare industry in order to drive better insights, deliver better care and ultimately improve patient outcomes.
