
Tsipouras et al. Rare Dis Orphan Drugs J 2023;2:17  https://dx.doi.org/10.20517/rdodj.2023.15  Page 3 of 6

               Table 1. The data aggregation challenge. Comparison of risks and benefits between existing and federated databases

               Security and compliance
                 Databases: Movement and copying of sensitive information increases the risk of data breach
                 Federated databases: In a TRE and federation environment, data are not moved or copied, reducing
                 security risk

               Data size and interoperability
                 Databases: Lack of standardized formats and pipelines limits interoperability, and negatively impacts
                 scalability, cost, and efficiency
                 Federated databases: Fully standardized data, securely accessible by cloud-based platforms through
                 federation, can be combined with global cohorts and disparate datasets

               Collaboration
                 Databases: Data cannot leave jurisdictional borders. Data sharing agreements are frequently difficult
                 to negotiate and implement, hindering collaboration
                 Federated databases: Federated approaches will eliminate a major barrier across individual datasets,
                 vastly improving the statistical power of research

               TRE: trusted research environment.


               Federated data analysis platforms, which facilitate secure data access from multiple sources without the
               need for data movement (during which data could be vulnerable to interception), have emerged as a
               promising part of a solution for safely sharing anonymized genomic data. Here, genomic data remain secure
               in the TRE, and TREs can then be linked virtually using a set of Application Programming Interfaces (APIs).

               Traditional data access methods involve researchers downloading data to an institutional computing cluster.
               With federated analysis, the analysis is brought to where the distributed data reside, thereby eliminating
               the risky movement of data and removing many existing barriers to accessibility [13]. Such technology means
               that data can be made securely accessible while data controllers (e.g., biobanks and healthcare providers)
               retain jurisdictional autonomy over their data, a key concern in international data sharing.
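
               The idea of bringing the analysis to the data can be illustrated with a minimal sketch. It is not
               the API of any particular federated platform: the Site class, the local_summary function, and the
               "age" field are all hypothetical names chosen for illustration. Each site runs the analysis locally
               and returns only aggregate values, which are then combined into a pooled mean.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical record type: one row of site-local, sensitive data.
Record = Dict[str, float]


@dataclass
class Site:
    """A data controller (e.g., a biobank) whose records stay on site."""
    name: str
    _records: List[Record]

    def run(self, analysis: Callable[[List[Record]], Dict[str, float]]) -> Dict[str, float]:
        # The analysis executes where the data reside; only its
        # aggregate output is handed back to the caller.
        return analysis(self._records)


def local_summary(records: List[Record]) -> Dict[str, float]:
    """Site-level aggregate: count and sum of a hypothetical 'age' field."""
    ages = [r["age"] for r in records]
    return {"n": float(len(ages)), "total": sum(ages)}


def federated_mean(sites: List[Site]) -> float:
    """Combine per-site aggregates into a pooled mean without pooling rows."""
    summaries = [site.run(local_summary) for site in sites]
    n = sum(s["n"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    return total / n


# Toy data, invented for this example only.
sites = [
    Site("biobank_a", [{"age": 30.0}, {"age": 50.0}]),
    Site("hospital_b", [{"age": 40.0}, {"age": 60.0}, {"age": 70.0}]),
]
pooled_mean = federated_mean(sites)
print(pooled_mean)
```

               In this sketch the coordinating code never sees individual records, only a count and a sum per
               site. A real deployment would additionally need authentication, auditing, and safeguards against
               small-cell disclosure, none of which are modeled here.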

               International initiatives such as the Global Alliance for Genomics and Health (GA4GH) [14] promote the
               international sharing of genomic and health-related data, in part by setting interoperability
               standards and providing open-source APIs.


               Common Data Models (CDMs) are crucial to ensuring that data are interoperable. Several have grown in
               popularity in the life sciences sector recently, including the OMOP (Observational Medical Outcomes
               Partnership) CDM from OHDSI (Observational Health Data Sciences and Informatics), specifically for
               clinical-genomic data. Examples of health organizations utilizing OMOP as their CDM include the UK
               Biobank and All of Us from the US National Institutes of Health (NIH) [15,16].


               Additionally, extraction, transformation, and loading (ETL) pipelines that automate the processing and
               conversion of raw data into analysis-ready data further simplify this process for researchers.
               Normalizing all data to internationally recognized standards allows researchers to perform joint analyses
               across distributed datasets, which is key to ensuring diversity and representation of as many populations as
               possible in studies.
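
               As a rough illustration of such an ETL step, the sketch below converts records from two sources
               with different conventions into one shared layout. The target fields (person_id, sex, height_cm),
               the site formats, and the function names are assumptions made for this example; they are not the
               actual OMOP CDM tables or columns.

```python
from typing import Callable, Dict, List

# Hypothetical shared vocabulary for the 'sex' field; NOT an OMOP concept set.
SEX_CODES = {"m": "male", "male": "male", "f": "female", "female": "female"}

Row = Dict[str, str]
Harmonized = Dict[str, object]


def transform_site_a(row: Row) -> Harmonized:
    """Hypothetical site A: height in centimetres, sex coded as 'M'/'F'."""
    return {
        "person_id": f"A-{row['id']}",
        "sex": SEX_CODES[row["sex"].strip().lower()],
        "height_cm": float(row["height_cm"]),
    }


def transform_site_b(row: Row) -> Harmonized:
    """Hypothetical site B: height in inches, sex written out in full."""
    return {
        "person_id": f"B-{row['patient']}",
        "sex": SEX_CODES[row["gender"].strip().lower()],
        "height_cm": round(float(row["height_in"]) * 2.54, 1),  # inches to cm
    }


def etl(raw_rows: List[Row], transform: Callable[[Row], Harmonized]) -> List[Harmonized]:
    """Extract raw rows, transform each one, and load them into a list."""
    return [transform(row) for row in raw_rows]


# Toy input rows, invented for this example only.
site_a = etl([{"id": "1", "sex": "M", "height_cm": "180"}], transform_site_a)
site_b = etl([{"patient": "7", "gender": "Female", "height_in": "65"}], transform_site_b)
harmonized = site_a + site_b
for record in harmonized:
    print(record)
```

               Once both sources share the same field names, units, and codes, the same analysis code can be
               applied to either dataset unchanged, which is the property that joint analyses across distributed
               datasets depend on.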


               These standardized and interoperable datasets could be combined seamlessly for analysis via federation,
               enabling researchers to analyze these data collaboratively in conjunction with other complementary datasets.
               Standardization of data formats and analytical approaches within and even between health systems can
               bring substantial benefits in terms of comparability of data and contribute to continually improving
               processes.

               Illustrative examples with potential multiplier effects could include: