Secure processing environments (SPE) for medical data
© TMF e.V.
On 4 November 2024, representatives of the genomDE and Medical Informatics Initiative (MII) projects, as well as experts from CSC Finland, Genomics England and the Swiss Institute of Bioinformatics, exchanged ideas at a joint workshop in Berlin. The Federal Ministry of Health, the Federal Ministry of Education and Research, the Federal Institute for Drugs and Medical Devices (BfArM), and the Robert Koch Institute (RKI) also took part in the event, which was organised by TMF e.V. The discussions revolved around the requirements for and experiences with secure processing environments (SPE) from the perspective of research and genomic medical care.
Sebastian C. Semler, TMF managing director and head of the coordination body for genomDE and the MII, began by explaining the extent to which an SPE is mandatory in certain jurisdictions: In the European Health Data Space (EHDS), secondary data use is currently only possible via an SPE. The EU Commission is expected to issue an implementing act in 2025 that will regulate further aspects. The German Health Data Use Act (GDNG) stipulates that criteria are to be developed for how data linkage can be carried out by an SPE in the future.
What is an SPE?
There is currently no standardised definition of an SPE. SPEs can be described as secure and regulated data processing infrastructures that enable providers and researchers to access sensitive health data while adequately maintaining the security of the data. Various definitions were mentioned in the presentations.
SPEs in the Model Project for Diagnostics and Therapy Selection by Means of Genome Sequencing for Rare and Oncological Diseases (MV GenomSeq)
Professor Thomas Berlage, Fraunhofer FIT, reported on the results of an initial workshop investigating the requirements for SPEs from use cases in medical care based on genomic information and research. He emphasised that, particularly in the Model Project Genome Sequencing, knowledge-generating care can only be achieved in secure processing environments. The developments of the German Human Genome-Phenome Archive (GHGA) can already be used here for the processing of genomic data.
For data services as a central utilisation concept in the Model Project Genome Sequencing, many open questions still need to be clarified. The aim is federated processing by all clinical and genomic data nodes. Data linkage with data from the clinical cancer registries and the research data centers, including the inclusion of health insurance data, must also be clarified. A separate computing instance with statistical tools and broker access to data sources could be considered here, for example.
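The idea of federated processing can be illustrated with a minimal sketch. It is not the design discussed at the workshop; the names `LocalSummary`, `summarise_locally` and `combine` are hypothetical. It assumes that each data node shares only a record count and a sum, so that a coordinating instance can compute an overall mean without ever seeing row-level data.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LocalSummary:
    """Aggregate that a node is willing to share (hypothetical structure)."""
    n: int        # number of records at the node
    total: float  # sum of the measured value at the node


def summarise_locally(values: Iterable[float]) -> LocalSummary:
    """Runs inside a data node; row-level values never leave this function."""
    vals = list(values)
    return LocalSummary(n=len(vals), total=float(sum(vals)))


def combine(summaries: Iterable[LocalSummary]) -> float:
    """Runs at the coordinating instance; sees only per-node aggregates."""
    summaries = list(summaries)
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no records reported by any node")
    return sum(s.total for s in summaries) / n


# Example: two nodes report their summaries, the coordinator combines them.
node_a = summarise_locally([1.0, 2.0, 3.0])
node_b = summarise_locally([5.0])
overall_mean = combine([node_a, node_b])
```

A real federation would additionally need authentication, agreed data models and checks that the shared aggregates are not themselves disclosive.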
Requirements from a clinical perspective
Dr Philipp Breitfeld, UKE Hamburg, presented the model of a secure local research data environment (Trusted Research Environment, TRE). For routine clinical data, a secure working environment and secure access to complex data are needed for more precise research while complying with data privacy and legal requirements. In addition, the efficient provision of computing resources and a collaboration platform is important.
The results of a survey conducted by the MII junior research group BENEFIT showed that clinicians demand low-threshold data access and want to be supported in their research workflow. They want to be able to collaborate across disciplines, involve clinical users in research and access specialised software. The technical requirements for a TRE include scalability (provision of a flexible infrastructure within the TRE in the face of growing data volumes and higher user numbers), interoperability, and data security and protection (e.g. security protocols, access controls).
The UKE in Hamburg has developed a solution with the research platform ‘Datenhotel’. This makes it possible to conduct individual research with pseudonymised data from local standard care on the basis of state law. Among other things, the ‘Datenhotel’ serves to generate scientific research questions and/or to simplify their data protection-compliant review. A so-called transfer point exports the data requested by the user, pseudonymised by the trust centre, to a room assigned to the user for the research question. In the ‘Datenhotel’, the clinical data from the local hospital information system (HIS) can only be accessed temporarily and cannot be downloaded. Access is only possible via specially protected computers.
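How pseudonymisation before export might look can be sketched as follows. This is an illustration only, not the method used at the UKE: it assumes a keyed hash (HMAC-SHA256) whose secret key stays with the trust centre, and the function names, field names and list of direct identifiers are hypothetical.

```python
import hashlib
import hmac

# Hypothetical list of fields treated as direct identifiers.
DIRECT_IDENTIFIERS = ("patient_id", "name", "birth_date")


def pseudonymise(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym; without the key it cannot be recomputed."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def export_record(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and attach the pseudonym instead."""
    exported = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    exported["pseudonym"] = pseudonymise(record["patient_id"], secret_key)
    return exported


# Example with made-up data and a made-up key.
key = b"trust-centre-secret"
source = {"patient_id": "123", "name": "Jane Doe", "diagnosis": "C50"}
exported = export_record(source, key)
```

Because the same key always yields the same pseudonym, records of one patient remain linkable within the research room while the identifier itself is not exported.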
With regard to the further development of TREs, Breitfeld emphasised that the processing of big data, machine learning and the export of AI models and algorithms to the protected data processing environments still have limitations. These functions should be integrated in the future. Collaboration with external organisations is desirable.
Differentiated consideration of the need for protection of genomic and image data
Professor Michael Krawczak, UKSH, explained that the need for protection of genomic data is high at first glance, since a risk of stigmatisation and discrimination can arise not only for the person concerned but also for their relatives in the event of re-identification. However, he pointed out that a high level of scientific expertise is necessary to interpret genomic data and that, in general, the protection of different genomic data categories should be considered separately. For example, the whole genome sequence requires a higher level of protection than SNPs (single nucleotide polymorphisms).
Professor Tobias Penzkofer, Charité – Universitätsmedizin Berlin, added that image data also require a high level of protection. They contain a large amount of identifying information, for example anatomical, pathological or demographic information, and often further metadata. Image data and their potentially identifying characteristics could be subdivided into different protection categories. Technical solutions such as defacing and metadata replacement, as well as organisational solutions (processing by qualified staff), could protect these data. He stressed that these problems must be confronted and effective measures taken.
SPE of the Research Data Centre at the BfArM
Dr Christian Brachem, BfArM, presented how the Health Data Lab (HDL) intends to provide billing data of people with statutory health insurance (outpatient and inpatient) and data from the electronic patient record (ePA) in an SPE. The HDL assessed the protection requirements of the DaTraV data as ‘high’ according to the criteria of the Federal Office for Information Security (BSI). The BfArM continues to work closely with the BSI and the Federal Commissioner for Data Protection and Freedom of Information (BfDI). The HDL operates its own data centre with its own infrastructure, which separates zones with high and low protection requirements. Researchers submit a data use application via the application portal and, if approved, are given access to the secure processing environment (virtualised browser). Test data is stored there, while the real data is stored elsewhere. Only the statistical results leave the zone containing the real data and are made available to the researcher. This requires a manual assessment in each case. There is currently no legal basis for linking with genome data. The main criteria to be considered for an SPE are strengthening data protection and IT security, increasing transparency and visibility, and ensuring a research-friendly workflow.
Experiences from Finland, England and Switzerland
Examples from Finland, England and Switzerland showed how researchers in other European countries can access national medical data collections in secure data processing environments.
Dr Augusto Rendon from Genomics England explained that five security aspects apply to SPEs in England: safe people, safe projects, safe setting, safe data and safe outputs. As a partner of the National Health Service (NHS), Genomics England implements the majority of its services in private clouds, including Amazon. HL7 FHIR and NHS standards are used. He presented the National Genomic Research Library, which is a partnership between NHS England and Genomics England.
Researchers have access to a secure data processing environment via a virtual desktop. This ensures data security and access control to services and data. All analyses are carried out in the SPE. Aggregated results can leave the secure processing environment, but raw data cannot. The research platform is cloud-based. It is subject to a fee and only partially financed by tax revenue. He estimates the costs for the entire infrastructure at approximately 10 million pounds per year.
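The rule that aggregated results may leave the environment while raw data may not can be supported by an automated pre-check. The following is a minimal sketch of one common statistical disclosure control technique, small-cell suppression; it is not described in the presentations, and the function name, field names and the threshold of 5 are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def releasable_counts(
    rows: Iterable[dict], group_key: str, min_cell_size: int = 5
) -> Dict[str, Optional[int]]:
    """Count rows per group and suppress (None) any group below the threshold."""
    counts = Counter(row[group_key] for row in rows)
    return {
        group: (count if count >= min_cell_size else None)
        for group, count in counts.items()
    }


# Example with made-up rows: group "A" is large enough, group "B" is not.
rows = [{"variant": "A"}] * 6 + [{"variant": "B"}] * 2
released = releasable_counts(rows, "variant")
```

Such a check can only flag obviously disclosive outputs; it would complement rather than replace a review of each result before release.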
Heikki Lehväslaiho explained the approach of CSC – IT Center for Science from Finland, a non-profit company owned by the Finnish state and universities. CSC offers cloud computing services in self-developed cloud systems. There is a virtual private cloud service (‘ePouta’), which is only available via an organisation's internal network, and there are ‘sensitive data services’, which are available on demand over the internet, support the entire research cycle and enable collaboration. Lehväslaiho reported on the many years of experience with SPEs in Finland and warned against adopting a centralised approach. In his view, federated approaches are unavoidable, including at the European level. In order to make federated systems a success, identity management is crucial.
Dr Julia Maurer from the Swiss Institute of Bioinformatics (SIB) explained that Switzerland has adopted a decentralised approach in which data is only transferred to secure nodes for research projects. The BioMedIT Network's secure research environment consists of three physical nodes that provide a secure cloud environment and IT support for research. Researchers receive only the analysis results, not the data. She also presented the Swiss Federated Genomics Network (SFGN), which is the Swiss node for the European Genome-Phenome Archive (EGA) and is creating a genomic data set. She emphasised that, above all, a good governance framework is needed for an SPE, along with the willingness of all stakeholders to involve the public and to strengthen communication in order to gain public acceptance.
Conclusion
The workshop on SPEs, organised by the TMF, brought together the expertise of the two initiatives, genomDE and MII, as well as other national developments at federal authorities and universities. The comparison with other European countries made clear that many questions regarding SPEs remain to be answered at both the national and European level. These include requirements for the federation of SPEs and the question of operating costs and expenses. It is also necessary to discuss which functionalities of an SPE should be prioritised. The participants agreed that modular systems that are scalable and interoperable are to be preferred. As the coordination body for genomDE and the MII, the TMF will continue to support the development of requirements for an SPE from a scientific perspective.