1.
Mult Scler ; 28(10): 1630-1640, 2022 09.
Article in English | MEDLINE | ID: mdl-35301890

ABSTRACT

BACKGROUND: Pregnancies have an impact on the disease course of multiple sclerosis (MS), but their relationship with MS risk remains unclear. OBJECTIVE: To determine the relationships of pregnancies and gynecological diagnoses with MS risk. METHODS: In this retrospective case-control study, we assessed differences in the recording rates of gynecological International Classification of Diseases, 10th Revision (ICD-10) codes between women with MS (n = 5720), Crohn's disease (n = 6280), or psoriasis (n = 40,555) and women without these autoimmune diseases (n = 26,729) in the 5 years before diagnosis. RESULTS: Twenty-eight ICD-10 codes were recorded less frequently for women with MS than for women without autoimmune disease, 18 of which are pregnancy-related. After adjustment for pregnancies, all codes unrelated to pregnancies remained negatively associated with MS. In a sensitivity analysis excluding women with evidence of possible demyelinating events before diagnosis, all associations were more pronounced. Most associations were confirmed in comparison to women with psoriasis, but not in comparison to women with Crohn's disease. CONCLUSION: Our findings provide evidence for a possible protective effect of pregnancies on MS risk, likely independent of or in addition to the previously suggested reverse causality. The negative associations of gynecological disorders with disease risk need further investigation and might be shared by different autoimmune diseases.


Subject(s)
Autoimmune Diseases , Crohn Disease , Multiple Sclerosis , Psoriasis , Case-Control Studies , Crohn Disease/epidemiology , Female , Humans , Multiple Sclerosis/epidemiology , Multiple Sclerosis/etiology , Pregnancy , Psoriasis/complications , Psoriasis/epidemiology , Retrospective Studies
3.
J Med Internet Res ; 23(6): e27348, 2021 06 07.
Article in English | MEDLINE | ID: mdl-33999836

ABSTRACT

BACKGROUND: Overcoming the COVID-19 crisis requires new ideas and strategies for the online communication of personal medical information and for patient empowerment. Rapid testing of large numbers of subjects is essential for monitoring and delaying the spread of SARS-CoV-2 in order to mitigate the pandemic's consequences. People who do not know that they are infected may not stay in quarantine and thus risk infecting others. Unfortunately, the massive number of COVID-19 tests performed is challenging for both laboratories and the units that conduct throat swabs and communicate the results. OBJECTIVE: The goal of this study was to reduce the communication burden for health care professionals. We developed a secure and easy-to-use tracking system that reports COVID-19 test results online as soon as they become available and is simple for tested subjects to understand. Instead of personal calls, the system updates the status and the results of the tests automatically. This aims to reduce the delay in informing testees about their results and, consequently, to slow down the spread of the virus. METHODS: The application in this study draws on an existing tracking tool. With this open-source, browser-based online tracking system, we aim to minimize the time required to inform the tested person and the testing units (eg, hospitals or the public health care system). The system can be integrated into the clinical workflow with very modest effort and avoids excessive load on telephone hotlines. RESULTS: Test statuses and results are published on a secured webpage, enabling regular status checks by patients; status checks do not require a smartphone, which matters because smartphone usage diminishes with age. Stress tests and statistics demonstrate the performance of our software. CTest is currently running at two university hospitals in Germany (University Hospital Ulm and University Hospital Tübingen), with thousands of tests being performed each week. Results show a mean of 10 (SD 2.8) views per testee. CONCLUSIONS: CTest runs independently of existing infrastructures and aims at straightforward integration and the safe transmission of information. The system is easy for testees to use. QR (Quick Response) code links allow quick access to the test results. The mean number of views per entry indicates a reduced time investment for both health care professionals and testees. The system is generic and can be extended and adapted to other communication tasks.
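
As an illustration of how such a result link can be made safe to publish, the status URL can embed an unguessable random token; the same URL is what a QR code would encode. This is a generic sketch, not CTest's actual code, and the class name and URL below are hypothetical:

```java
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical sketch: issue an unguessable token per test and build
// the status URL that a testee can open directly or scan as a QR code.
public class ResultLink {
    private static final SecureRandom RNG = new SecureRandom();

    // 128 bits of randomness, URL-safe Base64 without padding
    static String newToken() {
        byte[] bytes = new byte[16];
        RNG.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // The QR code on the testee's information sheet would encode this URL.
        System.out.println("https://ctest.example.org/status/" + newToken());
    }
}
```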


Subject(s)
COVID-19/diagnosis , COVID-19/psychology , Communication , Medical Informatics/organization & administration , Medical Informatics/standards , Pandemics , Patient Participation , SARS-CoV-2/isolation & purification , COVID-19/epidemiology , COVID-19/virology , Germany , Humans , Time Factors
4.
Eur J Epidemiol ; 36(2): 233-241, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33492549

ABSTRACT

Infectious complications are the major cause of morbidity and mortality after solid organ and stem cell transplantation. To better understand host and environmental factors associated with an increased risk of infection, as well as the effect of infections on the function and survival of transplanted organs, we established the DZIF Transplant Cohort, a multicentre prospective cohort study within the organizational structure of the German Center for Infection Research. Heart-, kidney-, lung-, liver-, pancreas- and hematopoietic stem cell-transplanted patients are enrolled into the study at the time of transplantation. Follow-up visits are scheduled at 3, 6, 9, and 12 months after transplantation, and annually thereafter; additional visits are conducted in case of infectious complications. Comprehensive standard operating procedures, web-based data collection and monitoring tools, as well as a state-of-the-art biobanking concept for blood, purified PBMCs, urine, and faeces samples ensure high-quality data and biosample collection. By collecting detailed information on immunosuppressive medication, infectious complications, type of infectious agent and therapy, and by providing corresponding biosamples, the cohort will establish the foundation for a broad spectrum of studies in the field of infectious diseases and transplant medicine. By January 2020, baseline data and biosamples of about 1400 patients had been collected. We plan to recruit 3500 patients by 2023 and to continue follow-up visits and the documentation of infectious events at least until 2025. Information about the DZIF Transplant Cohort is available at https://www.dzif.de/en/working-group/transplant-cohort .


Subject(s)
Biological Specimen Banks , Immunosuppression Therapy , Organ Transplantation , Postoperative Complications , Research Design , Adolescent , Adult , Aged , Aged, 80 and over , Bacterial Infections , Child , Child, Preschool , Cohort Studies , Female , Humans , Male , Middle Aged , Young Adult
5.
JMIR Med Inform ; 8(7): e15918, 2020 Jul 21.
Article in English | MEDLINE | ID: mdl-32706673

ABSTRACT

BACKGROUND: Modern data-driven medical research provides new insights into the development and course of diseases and enables novel methods of clinical decision support. Clinical and translational data warehouses, such as Informatics for Integrating Biology and the Bedside (i2b2) and tranSMART, are important infrastructure components that provide users with unified access to the large heterogeneous data sets needed to realize this vision and support use cases such as cohort selection, hypothesis generation, and ad hoc data analysis. OBJECTIVE: Often, different warehousing platforms are needed to support different use cases and different types of data. Moreover, to achieve an optimal data representation within the target systems, specific domain knowledge is needed when designing data-loading processes. Consequently, informaticians need to work closely with clinicians and researchers in short iterations. This is a challenging task, as installing and maintaining warehousing platforms can be complex and time-consuming. Furthermore, data loading typically requires significant effort in terms of data preprocessing, cleansing, and restructuring. The platform described in this study aims to address these challenges. METHODS: We formulated system requirements to achieve agility in terms of platform management and data loading. The derived system architecture includes a cloud infrastructure with unified management interfaces for multiple warehouse platforms and a data-loading pipeline with a declarative configuration paradigm and a meta-loading approach. The latter compiles data and configuration files into the forms required by existing loading tools, thereby automating a wide range of data restructuring and cleansing tasks. We demonstrated the fulfillment of the requirements and the originality of our approach through an experimental evaluation and a comparison with previous work. RESULTS: The platform supports both i2b2 and tranSMART with built-in security. Our experiments showed that the loading pipeline accepts input data that cannot be loaded with existing tools without preprocessing. Moreover, it lowered efforts significantly, reducing the size of the required configuration files by factors of up to 22 for tranSMART and 1135 for i2b2. The time required to perform the compilation process was roughly equivalent to the time required for actual data loading. A comparison with other tools showed that our solution was the only one fulfilling all requirements. CONCLUSIONS: Our platform significantly reduces the effort required to manage clinical and translational warehouses and to load data in various formats and structures, such as the complex entity-attribute-value structures often found in laboratory data. Moreover, it facilitates the iterative refinement of data representations in the target platforms, as the required configuration files are very compact. The quantitative measurements presented are consistent with our experience of significantly reduced effort for building warehousing platforms in close cooperation with medical researchers. Both the cloud-based hosting infrastructure and the data-loading pipeline are available to the community as open source software with comprehensive documentation.
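
To make the kind of restructuring such a pipeline automates concrete, here is a minimal, self-contained sketch of pivoting entity-attribute-value (EAV) rows, as often found in laboratory data, into one wide record per patient. It illustrates the general technique only; names and structure are hypothetical and not taken from the authors' pipeline:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: pivot EAV-style rows (entity, attribute, value)
// into one wide record per entity, as a loading pipeline might do
// before writing to a warehouse schema.
public class EavPivot {
    public static void main(String[] args) {
        String[][] rows = {
            {"patient1", "sodium", "140"},
            {"patient1", "potassium", "4.1"},
            {"patient2", "sodium", "138"},
        };
        // entity -> (attribute -> value)
        Map<String, Map<String, String>> wide = new LinkedHashMap<>();
        for (String[] r : rows) {
            wide.computeIfAbsent(r[0], k -> new LinkedHashMap<>()).put(r[1], r[2]);
        }
        wide.forEach((patient, values) -> System.out.println(patient + " " + values));
    }
}
```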

6.
Stud Health Technol Inform ; 270: 68-72, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570348

ABSTRACT

Modern biomedical research is increasingly data-driven. To create the required big datasets, health data needs to be shared or reused, which often leads to privacy challenges. Data anonymization is an important protection method where data is transformed such that privacy guarantees can be provided according to formal models. For applications in practice, anonymization methods need to be integrated into scalable and reliable tools. In this work, we tackle the problem of achieving reliability. Privacy models often involve mathematical definitions using real numbers, which are typically approximated using floating-point numbers when implemented as software. We study the effect of this approximation on the privacy guarantees provided and present a reliable computing framework based on fractional and interval arithmetic for improving the reliability of implementations. Extensive evaluations demonstrate that reliable data anonymization is practical and that it can be achieved with minor impacts on execution times and data utility.
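
To illustrate the underlying problem and the fractional remedy: with floating-point numbers, rounding can silently change the outcome of a threshold comparison, whereas comparing fractions exactly over integers cannot. A minimal sketch under the assumption that the risk is a fraction 1/k and the threshold a fraction p/q; this stands in for, and is much simpler than, the paper's framework:

```java
import java.math.BigInteger;

// Sketch: compare the risk 1/k against a threshold p/q exactly,
// via cross-multiplication over integers, instead of with doubles.
public class ExactThresholdCheck {
    // 1/k <= p/q  <=>  q <= p*k   (all values positive)
    static boolean riskWithinThreshold(long k, long p, long q) {
        return BigInteger.valueOf(q)
                .compareTo(BigInteger.valueOf(p).multiply(BigInteger.valueOf(k))) <= 0;
    }

    public static void main(String[] args) {
        // The classic floating-point pitfall: 0.1 + 0.2 != 0.3 with doubles.
        System.out.println(0.1 + 0.2 == 0.3);               // false
        System.out.println(riskWithinThreshold(10, 3, 10)); // 1/10 <= 3/10: true
    }
}
```

Cross-multiplication avoids division entirely, so no rounding step exists at which a guarantee could be lost.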


Subject(s)
Biomedical Research , Data Anonymization , Confidentiality , Privacy , Reproducibility of Results , Software
7.
BMC Med Inform Decis Mak ; 20(1): 29, 2020 02 11.
Article in English | MEDLINE | ID: mdl-32046701

ABSTRACT

BACKGROUND: Modern data-driven medical research promises to provide new insights into the development and course of disease and to enable novel methods of clinical decision support. To realize this, machine learning models can be trained to make predictions from clinical, paraclinical and biomolecular data. In this process, privacy protection and regulatory requirements need careful consideration, as the resulting models may leak sensitive personal information. To counter this threat, a wide range of methods for integrating machine learning with formal methods of privacy protection have been proposed. However, there is a significant lack of practical tools to create and evaluate such privacy-preserving models. In this software article, we report on our ongoing efforts to bridge this gap. RESULTS: We have extended the well-known ARX anonymization tool for biomedical data with machine learning techniques to support the creation of privacy-preserving prediction models. Our methods are particularly well suited for applications in biomedicine, as they preserve the truthfulness of data (e.g. no noise is added) and they are intuitive and relatively easy to explain to non-experts. Moreover, our implementation is highly versatile, as it supports binomial and multinomial target variables, different types of prediction models and a wide range of privacy protection techniques. All methods have been integrated into a sound framework that supports the creation, evaluation and refinement of models through intuitive graphical user interfaces. To demonstrate the broad applicability of our solution, we present three case studies in which we created and evaluated different types of privacy-preserving prediction models for breast cancer diagnosis, diagnosis of acute inflammation of the urinary system and prediction of the contraceptive method used by women. In this process, we also used a wide range of different privacy models (k-anonymity, differential privacy and a game-theoretic approach) as well as different data transformation techniques. CONCLUSIONS: With the tool presented in this article, accurate prediction models can be created that preserve the privacy of individuals represented in the training set in a variety of threat scenarios. Our implementation is available as open source software.
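
For orientation, the core of the ARX API that such extensions build on looks roughly as follows: a minimal k-anonymity sketch adapted from the publicly documented ARX developer examples. Method names (e.g. addPrivacyModel, setSuppressionLimit) vary between ARX versions, so treat this as an approximation rather than a definitive reference:

```java
import org.deidentifier.arx.*;
import org.deidentifier.arx.AttributeType.Hierarchy;
import org.deidentifier.arx.AttributeType.Hierarchy.DefaultHierarchy;
import org.deidentifier.arx.Data.DefaultData;
import org.deidentifier.arx.criteria.KAnonymity;

// Sketch of the ARX API: 2-anonymize a toy dataset.
public class ArxSketch {
    public static void main(String[] args) throws Exception {
        DefaultData data = Data.create();
        data.add("age", "zipcode");          // header row
        data.add("34", "81667");
        data.add("36", "81675");
        data.add("45", "81675");

        DefaultHierarchy age = Hierarchy.create();   // generalization hierarchy
        age.add("34", "<50", "*");
        age.add("36", "<50", "*");
        age.add("45", "<50", "*");
        data.getDefinition().setAttributeType("age", age);
        data.getDefinition().setAttributeType("zipcode", AttributeType.IDENTIFYING_ATTRIBUTE);

        ARXConfiguration config = ARXConfiguration.create();
        config.addPrivacyModel(new KAnonymity(2));
        config.setSuppressionLimit(0d);

        ARXResult result = new ARXAnonymizer().anonymize(data, config);
        result.getOutput(false).iterator().forEachRemaining(
                row -> System.out.println(String.join(",", row)));
    }
}
```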


Subject(s)
Confidentiality , Data Anonymization , Decision Support Systems, Clinical , Models, Statistical , Software , Biomedical Research , Humans , Machine Learning , ROC Curve , Reproducibility of Results
8.
Dis Esophagus ; 32(8)2019 Aug 01.
Article in English | MEDLINE | ID: mdl-31329831

ABSTRACT

Risk stratification in patients with Barrett's esophagus (BE) to prevent the development of esophageal adenocarcinoma (EAC) remains an unsolved task. The incidence of both BE and EAC is increasing, and the individual risk of patients is still unknown. BarrettNET is an ongoing multicenter prospective cohort study initiated to identify and validate molecular and clinical biomarkers that allow a more personalized surveillance strategy for patients with BE. For BarrettNET, participants are recruited in 20 study centers throughout Germany and followed for progression to dysplasia (low-grade or high-grade dysplasia) or EAC for >10 years. The study instruments comprise self-administered epidemiological questionnaires (covering demographics, lifestyle factors, and health) as well as biological specimens, i.e., blood-based samples, esophageal tissue biopsies, and feces and saliva samples. Sample collection is repeated at follow-up visits according to each participant's individual surveillance plan. Standardized collection and processing of the specimens ensure high sample quality. Inclusion, epidemiological data, and pathological disease status are documented in a mobile-accessible database. Currently, the BarrettNET registry includes 560 participants (23.1% women and 76.9% men, aged 22-92 years) with a median follow-up of 951 days. Both the design and the size of BarrettNET offer the advantage of answering research questions regarding potential causes of disease progression from BE to EAC. Here, the integrated methods and materials of BarrettNET are presented and reviewed to introduce this valuable German registry.


Subject(s)
Adenocarcinoma/diagnosis , Barrett Esophagus/complications , Early Detection of Cancer/methods , Esophageal Neoplasms/diagnosis , Population Surveillance/methods , Risk Assessment/methods , Adenocarcinoma/etiology , Adult , Aged , Aged, 80 and over , Biomarkers/analysis , Clinical Decision Rules , Disease Progression , Esophageal Neoplasms/etiology , Female , Germany , Humans , Male , Middle Aged , Prospective Studies , Registries , Risk Factors , Young Adult
9.
Int J Med Inform ; 126: 72-81, 2019 06.
Article in English | MEDLINE | ID: mdl-31029266

ABSTRACT

BACKGROUND: Modern data-driven approaches to medical research require patient-level information at comprehensive depth and breadth. To create the required big datasets, information from disparate sources can be integrated into clinical and translational warehouses. This is typically implemented with Extract, Transform, Load (ETL) processes, which access, harmonize and upload data into the analytics platform. OBJECTIVE: Privacy protection needs careful consideration when data is pooled or re-used for secondary purposes, and data anonymization is an important protection mechanism. However, common ETL environments do not support anonymization, and common anonymization tools cannot easily be integrated into ETL workflows. The objective of the work described in this article was to bridge this gap. METHODS: Our main design goals were (1) to base the anonymization process on expert-level risk assessment methodologies, (2) to use transformation methods which preserve both the truthfulness of data and its schematic properties (e.g. data types), (3) to implement a method which is easy to understand and intuitive to configure, and (4) to provide high scalability. RESULTS: We designed a novel and efficient anonymization process and implemented a plugin for the Pentaho Data Integration (PDI) platform which enables integrating data anonymization and re-identification risk analyses directly into ETL workflows. By combining different instances into a single ETL process, data can be protected from multiple threats. The plugin supports very large datasets by leveraging the streaming-based processing model of the underlying platform. We present the results of an extensive experimental evaluation and discuss successful applications. CONCLUSIONS: Our work shows that expert-level anonymization methodologies can be integrated into ETL workflows. Our implementation is available under a non-restrictive open source license and overcomes several limitations of other data anonymization tools.
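
Conceptually, a streaming step processes one row at a time, so arbitrarily large datasets never have to be held in memory. The sketch below illustrates that processing model with plain Java streams; it is explicitly not the Pentaho Data Integration plugin API, and all names are hypothetical:

```java
import java.util.function.UnaryOperator;
import java.util.stream.Stream;

// Conceptual sketch (NOT the PDI API): a streaming transform that
// generalizes a quasi-identifier field row by row.
public class StreamingStep {
    // Generalize a full birth date to the year only.
    static final UnaryOperator<String[]> GENERALIZE_DOB = row -> {
        row[1] = row[1].substring(0, 4) + "-*-*";   // "1984-05-17" -> "1984-*-*"
        return row;
    };

    public static void main(String[] args) {
        Stream.of(new String[]{"p1", "1984-05-17"}, new String[]{"p2", "1990-11-02"})
              .map(GENERALIZE_DOB)
              .forEach(r -> System.out.println(r[0] + "," + r[1]));
    }
}
```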


Subject(s)
Biomedical Research , Privacy , Algorithms , Datasets as Topic , Humans
10.
Methods Inf Med ; 57(S 01): e57-e65, 2018 07.
Article in English | MEDLINE | ID: mdl-30016812

ABSTRACT

INTRODUCTION: This article is part of the Focus Theme of Methods of Information in Medicine on the German Medical Informatics Initiative. Future medicine will be predictive, preventive, personalized, participatory and digital. Data and knowledge at comprehensive depth and breadth need to be available for research and at the point of care as a basis for targeted diagnosis and therapy. Data integration and data sharing will be essential to achieve these goals. For this purpose, the consortium Data Integration for Future Medicine (DIFUTURE) will establish Data Integration Centers (DICs) at university medical centers. OBJECTIVES: The infrastructure envisioned by DIFUTURE will provide researchers with cross-site access to data and support physicians by innovative views on integrated data as well as by decision support components for personalized treatments. The aim of our use cases is to show that this accelerates innovation, improves health care processes and results in tangible benefits for our patients. To realize our vision, numerous challenges have to be addressed. The objective of this article is to describe our concepts and solutions on the technical and the organizational level with a specific focus on data integration and sharing. GOVERNANCE AND POLICIES: Data sharing implies significant security and privacy challenges. Therefore, state-of-the-art data protection, modern IT security concepts and patient trust play a central role in our approach. We have established governance structures and policies safeguarding data use and sharing by technical and organizational measures providing the highest level of data protection. One of our central policies is that adequate methods of data sharing for each use case and project will be selected based on rigorous risk and threat analyses. Interdisciplinary groups have been established in order to manage change. ARCHITECTURAL FRAMEWORK AND METHODOLOGY: The DIFUTURE Data Integration Centers will implement a three-step approach to integrating, harmonizing and sharing structured, unstructured and omics data as well as images from clinical and research environments. First, data is imported and technically harmonized using common data and interface standards (including various IHE profiles, DICOM and HL7 FHIR). Second, data is preprocessed, transformed, harmonized and enriched within a staging and working environment. Third, data is imported into common analytics platforms and data models (including i2b2 and tranSMART) and made accessible in a form compliant with the interoperability requirements defined on the national level. Secure data access and sharing will be implemented with innovative combinations of privacy-enhancing technologies (safe data, safe settings, safe outputs) and methods of distributed computing. USE CASES: From the perspective of health care and medical research, our approach is disease-oriented and use-case driven, i.e. following the needs of physicians and researchers and aiming at measurable benefits for our patients. We will work on early diagnosis, tailored therapies and therapy decision tools with focuses on neurology, oncology and further disease entities. Our early use cases will serve as blueprints for the following ones, verifying that the infrastructure developed by DIFUTURE is able to support a variety of application scenarios. DISCUSSION: Our own previous work, the use of internationally successful open source systems and a state-of-the-art software architecture are cornerstones of our approach. In the conceptual phase of the initiative, we have already prototypically implemented and tested the most important components of our architecture.
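
As a small illustration of the first step (technical harmonization via HL7 FHIR), parsing a FHIR R4 resource with the open source HAPI FHIR library looks roughly like this. This is a generic sketch, not DIFUTURE code; verify class and method names against the HAPI FHIR version in use:

```java
import ca.uhn.fhir.context.FhirContext;
import ca.uhn.fhir.parser.IParser;
import org.hl7.fhir.r4.model.Patient;

// Sketch: parse a FHIR R4 Patient resource from JSON using HAPI FHIR.
public class FhirImportSketch {
    public static void main(String[] args) {
        String json = "{\"resourceType\":\"Patient\",\"birthDate\":\"1970-01-01\"}";
        FhirContext ctx = FhirContext.forR4();
        IParser parser = ctx.newJsonParser();
        Patient patient = parser.parseResource(Patient.class, json);
        System.out.println(patient.getBirthDateElement().getValueAsString());
    }
}
```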


Subject(s)
Biomedical Research , Information Dissemination , Clinical Governance , Computer Security , Cooperative Behavior , Humans
11.
Endocr Relat Cancer ; 25(5): 547-560, 2018 05.
Article in English | MEDLINE | ID: mdl-29563190

ABSTRACT

Tropomyosin receptor kinase (Trk) inhibitors are being investigated as a novel targeted therapy in various cancers. We investigated the in vitro effects of the pan-Trk inhibitor GNF-5837 in human neuroendocrine tumor (NET) cells. The human neuroendocrine pancreatic BON1, bronchopulmonary NCI-H727 and ileal GOT1 cell lines were treated with GNF-5837 alone and in combination with everolimus. Cell viability decreased in a time- and dose-dependent manner in GOT1 cells in response to GNF-5837 treatment, whereas treatment of BON1 and NCI-H727 cells showed no effect on cellular viability. Trk receptor expression determined GNF-5837 sensitivity. GNF-5837 caused downregulation of PI3K-Akt-mTOR signaling, Ras-Raf-MEK-ERK signaling and the cell cycle, and increased apoptotic cell death. Combined treatment with GNF-5837 and everolimus inhibited cell viability significantly more than either substance alone, owing to cooperative downregulation of the PI3K-Akt-mTOR and Ras-Raf-MEK-ERK pathways as well as enhanced downregulation of cell cycle components. Immunohistochemical staining for Trk receptors was performed using a tissue microarray containing 107 tumor samples of gastroenteropancreatic NETs. Staining with TrkA receptor and pan-Trk receptor antibodies revealed positivity in pancreatic NETs in 24.2% (8/33) and 33.3% (11/33) of cases, respectively. We demonstrated that the pan-Trk inhibitor GNF-5837 has promising anti-tumoral properties in human NET cell lines expressing the TrkA receptor. Immunohistochemical or molecular screening for Trk expression, particularly in pancreatic NETs, might serve as a predictive marker for molecular targeted therapy with Trk inhibitors.


Subject(s)
Neuroendocrine Tumors/drug therapy , Protein Kinase Inhibitors/therapeutic use , Receptor, trkA/antagonists & inhibitors , Humans , Neuroendocrine Tumors/pathology , Protein Kinase Inhibitors/pharmacology
12.
IEEE J Biomed Health Inform ; 22(2): 611-622, 2018 03.
Article in English | MEDLINE | ID: mdl-28358693

ABSTRACT

The sharing of sensitive personal health data is an important aspect of biomedical research. Methods of data de-identification are often used in this process to trade off data granularity against privacy risks. However, traditional approaches, such as HIPAA Safe Harbor or k-anonymization, often fail to provide data with sufficient quality. Alternatively, data can be de-identified only to a degree which still allows us to use it as required, e.g., to carry out specific analyses. Controlled environments, which restrict the ways recipients can interact with the data, can then be used to cope with residual risks. The contributions of this article are twofold. First, we present a method for implementing controlled data sharing environments and analyze its privacy properties. Second, we present a de-identification method which is specifically suited for sanitizing health data that is to be shared in such environments. Traditional de-identification methods control the uniqueness of records in a dataset. The basic idea of our approach is to reduce the probability that a record in a dataset has characteristics which are unique within the underlying population. As the characteristics of the population are typically not known, we have implemented a pragmatic solution in which properties of the population are modeled with statistical methods. We have further developed an accompanying process for evaluating and validating the degree of protection provided. The results of an extensive experimental evaluation show that our approach enables the safe sharing of high-quality data and that it is highly scalable.
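
The sample-side part of this idea is easy to make concrete: group records by their quasi-identifier values and count equivalence classes of size 1. The paper's actual contribution, estimating whether such records are also unique in the (unknown) population, requires the statistical models mentioned above; the sketch below covers only the sample computation, with hypothetical attribute values:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: count sample-unique records, i.e. equivalence classes of
// size 1 over the quasi-identifiers (here: age group + ZIP prefix).
public class SampleUniqueness {
    public static void main(String[] args) {
        String[][] records = {
            {"30-39", "816"}, {"30-39", "816"}, {"40-49", "805"}
        };
        Map<String, Integer> classes = new HashMap<>();
        for (String[] r : records) {
            classes.merge(String.join("|", r), 1, Integer::sum);
        }
        long uniques = classes.values().stream().filter(n -> n == 1).count();
        System.out.println("Sample uniques: " + uniques + " of " + records.length);
    }
}
```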


Subject(s)
Confidentiality , Databases, Factual , Information Dissemination/methods , Medical Records , Algorithms , Biomedical Research , Humans
13.
BMC Med Inform Decis Mak ; 17(1): 30, 2017 03 23.
Article in English | MEDLINE | ID: mdl-28330491

ABSTRACT

BACKGROUND: Translational researchers need robust IT solutions to access a range of data types, varying from public data sets to pseudonymised patient information with restricted access, provided on a case-by-case basis. The reason for this complexity is that managing access policies to sensitive human data must consider issues of data confidentiality, identifiability, extent of consent, and data usage agreements. All of these ethical, social and legal aspects must be incorporated into a differential management of restricted access to sensitive data. METHODS: In this paper we present a pilot system that uses several common open source software components in a novel combination to coordinate access to heterogeneous biomedical data repositories containing open data (open access) as well as sensitive data (restricted access) in the domain of biobanking and biosample research. Our approach is based on a digital identity federation and software to manage resource access entitlements. RESULTS: Open source software components were assembled and configured in such a way that they allow for different modes of restricted access according to the protection needs of the data. We have tested the resulting pilot infrastructure and assessed its performance, feasibility and reproducibility. CONCLUSIONS: Common open source software components are sufficient to allow for the creation of a secure system for differential access to sensitive data. The implementation of this system is exemplary for researchers facing similar requirements for restricted access data. Here we report experience and lessons learnt from our pilot implementation, which may be useful for similar use cases. Furthermore, we discuss possible extensions for more complex scenarios.


Subject(s)
Biological Specimen Banks/standards , Biomedical Research/standards , Computer Security/standards , Datasets as Topic , Translational Research, Biomedical/standards , Humans , Pilot Projects
14.
Stud Health Technol Inform ; 245: 704-708, 2017.
Article in English | MEDLINE | ID: mdl-29295189

ABSTRACT

When individual-level health data are shared in biomedical research, the privacy of patients must be protected. This is typically achieved by data de-identification methods, which transform data in such a way that formal privacy requirements are met. In the process, it is important to minimize the loss of information to maintain data quality. Although several models have been proposed for measuring this aspect, it remains unclear which model is best suited for which application. We have therefore performed an extensive experimental comparison. We first implemented several common quality models into the ARX de-identification tool for biomedical data. We then used each model to de-identify a patient discharge dataset covering almost 4 million cases, and analyzed the outputs to measure the impact of different quality models on real-world applications. Our results show that different models are best suited for specific applications, but that one model (Non-Uniform Entropy) is particularly well suited for general-purpose use.
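
For reference, one common formulation of Non-Uniform Entropy from the statistical disclosure control literature is sketched below; the notation is assumed here, and implementations (including ARX's) may differ in details such as normalization:

```latex
% One common formulation of Non-Uniform Entropy over a dataset with
% n records and m quasi-identifying attributes: f(v) denotes the
% frequency of the original value v in its attribute, f(g(v)) the
% frequency of its generalized counterpart g(v).
\mathrm{NUE} \;=\; -\sum_{j=1}^{m} \sum_{i=1}^{n}
    \log_2 \frac{f\bigl(v_{ij}\bigr)}{f\bigl(g(v_{ij})\bigr)}
```

Intuitively, each cell contributes the information needed to recover the original value given its generalized value; coarser generalization makes the fraction smaller and the loss larger.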


Subject(s)
Biomedical Research , Data Anonymization , Confidentiality , Data Accuracy , Humans , Privacy
15.
Stud Health Technol Inform ; 228: 312-6, 2016.
Article in English | MEDLINE | ID: mdl-27577394

ABSTRACT

Data sharing plays an important role in modern biomedical research. Due to the inherent sensitivity of health data, patient privacy must be protected. De-identification means transforming a dataset in such a way that it becomes extremely difficult for an attacker to link its records to identified individuals. This can be achieved with different types of data transformations. As transformation impacts the information content of a dataset, it is important to balance the increase in privacy against the decrease in data quality. To this end, models for measuring both aspects are needed. Non-Uniform Entropy is a model for data quality which is frequently recommended for de-identifying health data. In this work, we show that it cannot meaningfully measure the quality of data transformed with several important types of data transformation. We introduce a generic variant which overcomes this limitation. We performed experiments with real-world datasets, which show that our method provides a unified framework in which the quality of differently transformed data can be compared to find a good or even optimal solution to a given data de-identification problem. We have implemented our method into ARX, an open source anonymization tool for biomedical data.


Subject(s)
Confidentiality , Information Dissemination , Information Storage and Retrieval/methods , Information Storage and Retrieval/standards , Quality Control , Biomedical Research
16.
Methods Inf Med ; 55(4): 347-55, 2016 Aug 05.
Article in English | MEDLINE | ID: mdl-27322502

ABSTRACT

BACKGROUND: Data sharing is a central aspect of modern biomedical research. It is accompanied by significant privacy concerns, and data often needs to be protected from re-identification. With methods of de-identification, datasets can be transformed in such a way that it becomes extremely difficult to link their records to identified individuals. The most important challenge in this process is to find an adequate balance between the increase in privacy and the decrease in data quality. OBJECTIVES: Accurately measuring the risk of re-identification in a specific data sharing scenario is an important aspect of data de-identification. Overestimating risks will significantly degrade data quality, while underestimating them will leave data prone to attacks on privacy. Several models have been proposed for measuring risks, but there is a lack of generic methods for risk-based data de-identification. The aim of the work described in this article was to bridge this gap and to show how the quality of de-identified datasets can be improved by using risk models to tailor the process of de-identification to a concrete context. METHODS: We implemented a generic de-identification process and several models for measuring re-identification risks into the ARX de-identification tool for biomedical data. By integrating the methods into an existing framework, we were able to automatically transform datasets in such a way that information loss is minimized while re-identification risks are guaranteed to meet a user-defined threshold. We performed an extensive experimental evaluation to analyze how different risk models and assumptions about the goals and the background knowledge of an attacker affect the quality of de-identified data. RESULTS: The results of our experiments show that data quality can be improved significantly by using risk models for data de-identification. On a scale where 100 % represents the original input dataset and 0 % represents a dataset from which all information has been removed, the loss of information content could be reduced by up to 10 % when protecting datasets against strong adversaries and by up to 24 % when protecting datasets against weaker adversaries. CONCLUSIONS: The methods studied in this article are well suited for protecting sensitive biomedical data, and our implementation is available as open source software. Our results can be used by data custodians to increase the information content of de-identified data by tailoring the process to a specific data sharing scenario. Improving data quality is important for fostering the adoption of de-identification methods in biomedical research.
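
As a concrete example of such a risk model, under the widely used prosecutor attacker model the re-identification risk of a record is the reciprocal of the size of its equivalence class, so a user-defined risk threshold translates directly into a minimum class size. A minimal sketch of that check (illustrative only; the models evaluated in the article are considerably more refined):

```java
// Sketch: prosecutor-model risk check. A record in an equivalence
// class of size n has re-identification risk 1/n, so a risk
// threshold t is met iff every class has size >= ceil(1/t).
// (Production code should do this comparison exactly, not in doubles.)
public class RiskThreshold {
    static boolean meetsThreshold(int[] classSizes, double threshold) {
        int minSize = (int) Math.ceil(1.0 / threshold);
        for (int n : classSizes) {
            if (n < minSize) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] sizes = {5, 8, 6};
        System.out.println(meetsThreshold(sizes, 0.2));  // true: all classes >= 5
        System.out.println(meetsThreshold(sizes, 0.1));  // false: would need >= 10
    }
}
```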


Subject(s)
Biomedical Research , Databases, Factual , Patient Identification Systems , Computer Security , Data Accuracy , Humans , Models, Theoretical , Risk
17.
BMC Med Inform Decis Mak ; 16: 49, 2016 Apr 30.
Article in English | MEDLINE | ID: mdl-27130179

ABSTRACT

BACKGROUND: Privacy must be protected when sensitive biomedical data is shared, e.g. for research purposes. Data de-identification is an important safeguard, where datasets are transformed to meet two conflicting objectives: minimizing re-identification risks while maximizing data quality. Typically, de-identification methods search a solution space of possible data transformations to find a good solution to a given de-identification problem. In this process, parts of the search space must be excluded to maintain scalability. OBJECTIVES: The set of transformations which are solution candidates is typically narrowed down by storing the results obtained during the search process and then using them to predict properties of the output of other transformations in terms of privacy (first objective) and data quality (second objective). However, due to the exponential growth of the size of the search space, previous implementations of this method are not well-suited when datasets contain many attributes which need to be protected. As this is often the case with biomedical research data, e.g. as a result of longitudinal collection, we have developed a novel method. METHODS: Our approach combines the mathematical concept of antichains with a data structure inspired by prefix trees to represent properties of a large number of data transformations while requiring only a minimal amount of information to be stored. To analyze the improvements which can be achieved by adopting our method, we have integrated it into an existing algorithm and we have also implemented a simple best-first branch and bound search (BFS) algorithm as a first step towards methods which fully exploit our approach. We have evaluated these implementations with several real-world datasets and the k-anonymity privacy model. RESULTS: When integrated into existing de-identification algorithms for low-dimensional data, our approach reduced memory requirements by up to one order of magnitude and execution times by up to 25 %. This allowed us to increase the size of solution spaces which could be processed by almost a factor of 10. When using the simple BFS method, we were able to further increase the size of the solution space by a factor of three. When used as a heuristic strategy for high-dimensional data, the BFS approach outperformed a state-of-the-art algorithm by up to 12 % in terms of the quality of output data. CONCLUSIONS: This work shows that implementing methods of data de-identification for real-world applications is a challenging task. Our approach solves a problem often faced by data custodians: a lack of scalability of de-identification software when used with datasets having realistic schemas and volumes. The method described in this article has been implemented into ARX, an open source de-identification software for biomedical data.
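
The prediction of properties of unvisited transformations mentioned above typically rests on a monotonicity property of the generalization lattice: for privacy models such as k-anonymity, any transformation that generalizes at least as much in every attribute as a known-anonymous one is anonymous as well. A minimal sketch of this dominance check (the antichain/prefix-tree structure described in the article addresses the harder problem of representing many such vectors compactly, which this sketch does not attempt):

```java
// Sketch: predictive tagging via lattice dominance. A transformation
// is a vector of per-attribute generalization levels.
public class Dominance {
    // true iff b generalizes at least as much as a in every attribute
    static boolean dominates(int[] b, int[] a) {
        for (int i = 0; i < a.length; i++) {
            if (b[i] < a[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] knownAnonymous = {1, 2, 0};   // found to satisfy k-anonymity
        int[] candidate      = {2, 2, 1};
        // By monotonicity, the candidate is k-anonymous too: no need to check it.
        System.out.println(dominates(candidate, knownAnonymous)); // true
    }
}
```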


Subject(s)
Algorithms , Confidentiality , Medical Informatics/methods , Models, Statistical , Humans
18.
J Neurol ; 263(5): 961-972, 2016 May.
Article in English | MEDLINE | ID: mdl-26995359

ABSTRACT

The m.8344A>G mutation in the MTTK gene, which encodes the mitochondrial transfer RNA for lysine, is traditionally associated with myoclonic epilepsy with ragged-red fibres (MERRF), a multisystemic mitochondrial disease characterised by myoclonus, seizures, cerebellar ataxia, and mitochondrial myopathy with ragged-red fibres. We studied the clinical and paraclinical phenotype of 34 patients with the m.8344A>G mutation, mainly derived from the nationwide mitoREGISTER, the multicentric registry of the German network for mitochondrial disorders (mitoNET). Mean age at symptom onset was 24.5 ± 10.9 years (range 6-48 years), with adult onset in 75 % of the patients. In our cohort, the canonical features traditionally associated with MERRF (seizures, myoclonus, cerebellar ataxia and ragged-red fibres) occurred in only 61, 59, 70, and 63 % of the patients, respectively. In contrast, other features such as hearing impairment were even more frequent (72 %). Other common features in our cohort were migraine (52 %), psychiatric disorders (54 %), respiratory dysfunction (45 %), gastrointestinal symptoms (38 %), dysarthria (36 %), and dysphagia (35 %). Brain MRI revealed cerebral and/or cerebellar atrophy in 43 % of our patients. There was no correlation between the heteroplasmy level in blood and age at onset or clinical phenotype. Our findings further broaden the clinical spectrum of the m.8344A>G mutation, document the large clinical variability between carriers of the same mutation, even within families, and indicate an overlap of the phenotype with other mitochondrial DNA-associated syndromes.


Subject(s)
MERRF Syndrome/genetics , MERRF Syndrome/physiopathology , Mutation , RNA, Transfer, Lys/genetics , RNA/genetics , Adolescent , Adult , Age of Onset , Aged , Brain/diagnostic imaging , Cohort Studies , Female , Germany/epidemiology , Humans , MERRF Syndrome/drug therapy , MERRF Syndrome/epidemiology , Male , Middle Aged , Pedigree , Phenotype , RNA, Mitochondrial , Registries
19.
Article in German | MEDLINE | ID: mdl-26809823

ABSTRACT

BACKGROUND: In addition to the Biobanking and BioMolecular resources Research Infrastructure (BBMRI), which is establishing a European research infrastructure for biobanks, a network of large European prospective cohorts (LPC) is being built to facilitate transnational research into important groups of diseases and into health care. One instrument for this is the "LPC Catalogue" database, which supports access to the biomaterials of the participating cohorts. OBJECTIVES: To present the LPC Catalogue as a relevant tool for connecting European biobanks. The LPC Catalogue has also been extended to establish compatibility with the existing Minimum Information About Biobank data Sharing (MIABIS) standard and to allow for more detailed search requests. This article describes the LPC Catalogue, its organizational and technical structure, and the aforementioned extensions. MATERIALS AND METHODS: The LPC Catalogue provides a structured overview of the participating LPCs. It offers various retrieval options and a search function. To support more detailed search requests, a new module called a "data cube" has been developed. The provision of data by the cohorts is supported by a "connector" component. RESULTS: The LPC Catalogue contains data on 22 cohorts and more than 3.8 million biosamples. So far, data on the biosamples of three cohorts have been acquired for the "cube", which is continuously being expanded. In the BBMRI-LPC, calls for scientific projects using the data and samples of the participating cohorts are currently under way, and several proposals have already been approved. CONCLUSIONS: The LPC Catalogue supports transnational access to biosamples. A comparison with existing solutions illustrates the relevance of its functionality.


Subject(s)
Biological Specimen Banks/organization & administration , Biomedical Research/organization & administration , Catalogs as Topic , Database Management Systems/organization & administration , Databases, Factual , Interinstitutional Relations , Cohort Studies , Europe , Forecasting , Information Dissemination/methods , Information Storage and Retrieval/methods , Models, Organizational , Registries , Specimen Handling/methods
20.
BMC Med Inform Decis Mak ; 15: 100, 2015 Nov 30.
Article in English | MEDLINE | ID: mdl-26621059

ABSTRACT

BACKGROUND: Collaborative collection and sharing of data have become a core element of biomedical research. Typical applications are multi-site registries which prospectively collect sensitive person-related data, often together with biospecimens. To secure these sensitive data, national and international data protection laws and regulations demand that identifying data be separated from biomedical data and that pseudonyms be introduced. However, neither the formulations in laws and regulations nor existing pseudonymization concepts are precise enough to directly provide an implementation guideline. We therefore describe core requirements as well as implementation options for registries and study databases with sensitive biomedical data. METHODS: We first analyze existing concepts and compile a set of fundamental requirements for pseudonymized data management. Then we derive a system architecture that fulfills these requirements. Next, we provide a comprehensive overview and comparison of different technical options for an implementation. Finally, we develop a generic software solution for managing pseudonymized data and show its feasibility by describing how we have used it to realize two research networks. RESULTS: We found that pseudonymization models are highly heterogeneous, even on a conceptual level. We compiled a set of requirements from different pseudonymization schemes. We propose an architecture and present an overview of technical options. Based on a selection of technical elements, we suggest a generic solution. It supports the multi-site collection and management of biomedical data. Security measures are multi-tier pseudonymity and the physical separation of data over independent backend servers. Integrated views are provided by a web-based user interface. Our approach has been successfully used to implement a national and an international rare disease network. CONCLUSIONS: We were able to identify a set of core requirements from several pseudonymization models. Considering various implementation options, we realized a generic solution which was implemented and deployed in research networks. Still, further conceptual work on pseudonymity is needed. Specifically, it remains unclear exactly how data should be separated into distributed subsets. Moreover, a thorough risk and threat analysis is needed.
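
A standard building block for such architectures is a keyed pseudonym: the identifying data and the secret key remain on the identity management server, while the research database only ever sees the derived pseudonym. The sketch below shows only this derivation step, with hypothetical names; key management and multi-tier pseudonymity, as discussed above, are where the real design effort lies:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

// Sketch: derive a stable pseudonym from a patient identifier with
// HMAC-SHA256. The secret key must live only on the identity server.
public class Pseudonym {
    static String pseudonymize(String patientId, byte[] key) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return HexFormat.of().formatHex(
                mac.doFinal(patientId.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        byte[] key = "demo-key-do-not-use-in-production".getBytes(StandardCharsets.UTF_8);
        System.out.println(pseudonymize("patient-0042", key));
    }
}
```

Unlike a plain hash, the keyed construction prevents anyone without the key from recomputing or verifying pseudonyms by brute force over the identifier space.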


Subject(s)
Biomedical Research/standards , Confidentiality/standards , Datasets as Topic/standards , Guidelines as Topic/standards , Registries/standards , Humans