Results 1 - 19 of 19
2.
NPJ Sci Food ; 7(1): 46, 2023 Sep 01.
Article in English | MEDLINE | ID: mdl-37658060

ABSTRACT

Ensuring safe and healthy food is a major challenge due to the complexity of food supply chains and their vulnerability to many internal and external factors, including food fraud. Recent research has shown that Artificial Intelligence (AI) based algorithms, in particular data-driven Bayesian Network (BN) models, are well suited to predicting future food fraud, allowing food producers to take proper action to prevent such problems from occurring. Such models become even more powerful when data from all actors in the supply chain can be used, but data sharing is hampered by differing interests, data security and data privacy. Federated learning (FL) may circumvent these issues, as demonstrated in various areas of the life sciences. In this research, we demonstrate the potential of FL technology for food fraud using a data-driven BN, integrating data from different data owners without the data leaving the owners' databases. To this end, a framework was constructed consisting of three geographically distinct data stations hosting different datasets on food fraud. Using this framework, a BN algorithm was implemented and trained on the data of the different data stations while the data remained at their physical locations, abiding by privacy principles. We demonstrated the applicability of the federated BN in food fraud and anticipate that such a framework may support stakeholders in the food supply chain in making better decisions regarding food fraud control while preserving the private and confidential nature of these data.
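
The federated idea in this abstract can be illustrated for its simplest case, parameter estimation for a BN whose structure is already fixed: each data station computes only aggregate counts (the sufficient statistics for a conditional probability table) and shares those, never the raw rows. This is a minimal sketch under stated assumptions, not the authors' implementation. The variables `product_category` and `fraud`, the station data, and the functions `local_counts`, `aggregate` and `conditional_probs` are hypothetical, and structure learning and safeguards such as secure aggregation are out of scope.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical record layout: (product_category, fraud), where fraud is "yes" or "no".
Record = Tuple[str, str]


def local_counts(records: List[Record]) -> Counter:
    """Runs inside one data station; only these aggregate counts leave the station."""
    return Counter(records)


def aggregate(station_counts: List[Counter]) -> Counter:
    """Runs at the coordinator: sums the per-station counts into pooled statistics."""
    total: Counter = Counter()
    for counts in station_counts:
        total.update(counts)  # Counter.update adds counts instead of replacing them
    return total


def conditional_probs(joint: Counter) -> Dict[str, Dict[str, float]]:
    """Maximum-likelihood estimate of P(fraud | product_category) from pooled counts."""
    parent_totals: Counter = Counter()
    for (parent, _child), n in joint.items():
        parent_totals[parent] += n
    cpt: Dict[str, Dict[str, float]] = {}
    for (parent, child), n in joint.items():
        cpt.setdefault(parent, {})[child] = n / parent_totals[parent]
    return cpt


# Hypothetical data held at three separate stations.
station_a = [("spices", "yes"), ("spices", "no"), ("dairy", "no")]
station_b = [("spices", "yes"), ("dairy", "no"), ("dairy", "yes")]
station_c = [("spices", "no"), ("dairy", "no")]

cpt = conditional_probs(
    aggregate([local_counts(s) for s in (station_a, station_b, station_c)])
)
print(cpt)
```

Because counts are additive, the pooled estimate is identical to what a centralized model would learn from the combined rows, which is why count-based BN parameter learning is a natural fit for federation.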

4.
J Biomed Semantics ; 13(1): 12, 2022 04 25.
Article in English | MEDLINE | ID: mdl-35468846

ABSTRACT

BACKGROUND: The COVID-19 pandemic has challenged healthcare systems and research worldwide. Data is collected all over the world and needs to be integrated and made available to other researchers quickly. However, the various heterogeneous information systems that are used in hospitals can result in fragmentation of health data across multiple data 'silos' that are not interoperable for analysis. Consequently, clinical observations in hospitalised patients are not ready to be reused efficiently and in a timely manner. There is a need to adapt the research data management in hospitals to make COVID-19 observational patient data machine actionable, i.e. more Findable, Accessible, Interoperable and Reusable (FAIR) for humans and machines. We therefore applied the FAIR principles in the hospital to make patient data more FAIR. RESULTS: In this paper, we present our FAIR approach to transform COVID-19 observational patient data collected in the hospital into machine actionable digital objects to answer medical doctors' research questions. With this objective, we conducted a coordinated FAIRification among stakeholders based on ontological models for data and metadata, and a FAIR based architecture that complements the existing data management. We applied FAIR Data Points for metadata exposure, turning investigational parameters into a FAIR dataset. We demonstrated that this dataset is machine actionable by means of three different computational activities: federated queries of patient data alongside existing open knowledge sources across the world through the Semantic Web, implementing Web APIs for data query interoperability, and building applications on top of these FAIR patient data for FAIR data analytics in the hospital.
CONCLUSIONS: Our work demonstrates that a FAIR research data management plan based on ontological models for data and metadata, Open Science, Semantic Web technologies, and FAIR Data Points provides a data infrastructure in the hospital for machine actionable FAIR Digital Objects. These FAIR data are prepared to be reused for federated analysis, linkable to other FAIR data such as Linked Open Data, and reusable to develop software applications on top of them for hypothesis generation and knowledge discovery.


Subject(s)
COVID-19 , Pandemics , COVID-19/epidemiology , Hospitals , Humans , Metadata , Semantic Web
5.
Sci Data ; 9(1): 169, 2022 04 13.
Article in English | MEDLINE | ID: mdl-35418585

ABSTRACT

The genomes of thousands of individuals are profiled within Dutch healthcare and research each year. However, this valuable genomic data, associated clinical data and consent are captured in different ways and stored across many systems and organizations. This makes it difficult to discover rare disease patients, reuse data for personalized medicine and establish research cohorts based on specific parameters. FAIR Genomes aims to enable NGS data reuse by developing metadata standards for the data descriptions needed to FAIRify genomic data while also addressing ELSI issues. We developed a semantic schema of essential data elements harmonized with international FAIR initiatives. The FAIR Genomes schema v1.1 contains 110 elements in 9 modules. It reuses common ontologies such as NCIT, DUO and EDAM, only introducing new terms when necessary. The schema is represented by a YAML file that can be transformed into templates for data entry software (EDC) and programmatic interfaces (JSON, RDF) to ease genomic data sharing in research and healthcare. The schema, documentation and MOLGENIS reference implementation are available at https://fairgenomes.org .


Subject(s)
High-Throughput Nucleotide Sequencing , Metadata , Delivery of Health Care , Genomics , Humans , Software
6.
J Biomed Semantics ; 13(1): 9, 2022 03 15.
Article in English | MEDLINE | ID: mdl-35292119

ABSTRACT

BACKGROUND: The European Platform on Rare Disease Registration (EU RD Platform) aims to address the fragmentation of European rare disease (RD) patient data, scattered among hundreds of independent and non-coordinating registries, by establishing standards for integration and interoperability. The first practical output of this effort was a set of 16 Common Data Elements (CDEs) that should be implemented by all RD registries. Interoperability, however, requires decisions beyond data elements, including data models, formats, and semantics. Within the European Joint Programme on Rare Diseases (EJP RD), we aim to further the goals of the EU RD Platform by generating reusable RD semantic model templates that follow the FAIR Data Principles. RESULTS: Through a team-based iterative approach, we created semantically grounded models to represent each of the CDEs, using the SemanticScience Integrated Ontology as the core framework for representing the entities and their relationships. Within that framework, we mapped the concepts represented in the CDEs, and their possible values, into domain ontologies such as the Orphanet Rare Disease Ontology, Human Phenotype Ontology and National Cancer Institute Thesaurus. Finally, we created an exemplar, reusable ETL pipeline that we will be deploying over these non-coordinating data repositories to assist them in creating model-compliant FAIR data without requiring site-specific coding or expertise in Linked Data or FAIR. CONCLUSIONS: Within the EJP RD project, we determined that creating reusable, expert-designed templates reduced or eliminated the requirement for our participating biomedical domain experts and rare disease data hosts to understand OWL semantics. This enabled them to publish highly expressive FAIR data using tools and approaches that were already familiar to them.


Subject(s)
Common Data Elements , Rare Diseases , Humans , Registries , Semantics , Workflow
7.
Hum Mutat ; 43(6): 717-733, 2022 06.
Article in English | MEDLINE | ID: mdl-35178824

ABSTRACT

Rare disease patients are more likely to receive a rapid molecular diagnosis nowadays thanks to the wide adoption of next-generation sequencing. However, many cases remain undiagnosed even after exome or genome analysis, because the methods used missed the molecular cause in a known gene, or a novel causative gene could not be identified and/or confirmed. To address these challenges, the RD-Connect Genome-Phenome Analysis Platform (GPAP) facilitates the collation, discovery, sharing, and analysis of standardized genome-phenome data within a collaborative environment. Authorized clinicians and researchers submit pseudonymised phenotypic profiles encoded using the Human Phenotype Ontology, and raw genomic data which is processed through a standardized pipeline. After an optional embargo period, the data are shared with other platform users, with the objective that similar cases in the system and queries from peers may help diagnose the case. Additionally, the platform enables bidirectional discovery of similar cases in other databases from the Matchmaker Exchange network. To facilitate genome-phenome analysis and interpretation by clinical researchers, the RD-Connect GPAP provides a powerful user-friendly interface and leverages dozens of information sources. As a result, the resource has already helped diagnose hundreds of rare disease patients and discover new disease-causing genes.


Subject(s)
Genomics , Rare Diseases , Exome , Genetic Association Studies , Genomics/methods , Humans , Phenotype , Rare Diseases/diagnosis , Rare Diseases/genetics
8.
Orphanet J Rare Dis ; 16(1): 376, 2021 09 04.
Article in English | MEDLINE | ID: mdl-34481493

ABSTRACT

BACKGROUND: Patient data registries that are FAIR (Findable, Accessible, Interoperable, and Reusable for humans and computers) facilitate research across multiple resources. This is particularly relevant to rare diseases, where data often are scarce and scattered. Specific research questions can be asked across FAIR rare disease registries and other FAIR resources without physically combining the data. Further, FAIR implies well-defined, transparent access conditions, which supports making sensitive data as open as possible and as closed as necessary. RESULTS: We successfully developed and implemented a process of making a rare disease registry for vascular anomalies FAIR from its conception (de novo). Here, we describe the five phases of this process in detail: (i) pre-FAIRification, (ii) facilitating FAIRification, (iii) data collection, (iv) generating FAIR data in real time, and (v) using FAIR data. This includes the creation of an electronic case report form and a semantic data model of the elements to be collected (in this case: the "Set of Common Data Elements for Rare Disease Registration" released by the European Commission), and the technical implementation of automatic, real-time data FAIRification in an Electronic Data Capture system. Further, we describe how we contribute to the four facets of FAIR, and how our FAIRification process can be reused by other registries. CONCLUSIONS: In conclusion, a detailed de novo FAIRification process of a registry for vascular anomalies is described. To a large extent, the process may be reused by other rare disease registries, and we envision this work to be a substantial contribution to an ecosystem of FAIR rare disease resources.


Subject(s)
Ecosystem , Rare Diseases , Humans , Rare Diseases/epidemiology , Registries
9.
J Biomed Inform ; 122: 103897, 2021 10.
Article in English | MEDLINE | ID: mdl-34454078

ABSTRACT

INTRODUCTION: Existing methods to make data Findable, Accessible, Interoperable, and Reusable (FAIR) are usually carried out in a post hoc manner: after the research project is conducted and data are collected. De novo FAIRification, on the other hand, incorporates the FAIRification steps into the process of a research project. In medical research, data is often collected and stored via electronic Case Report Forms (eCRFs) in Electronic Data Capture (EDC) systems. By implementing a de novo FAIRification process in such a system, the reusability and, thus, scalability of FAIRification across research projects can be greatly improved. In this study, we developed and implemented a novel method for de novo FAIRification via an EDC system. We evaluated our method by applying it to the Registry of Vascular Anomalies (VASCA). METHODS: Our method, which is independent of the specific EDC system and research project, ensures that eCRF data entered into an EDC system can be transformed into machine-readable, FAIR data using a semantic data model (a canonical representation of the data, based on ontology concepts and semantic web standards) and mappings from the model to questions on the eCRF. The FAIRified data are stored in a triple store and can, together with associated metadata, be accessed and queried through a FAIR Data Point. The method was implemented in Castor EDC, an EDC system, through a data transformation application. The FAIRness of the output of the method, the FAIRified data and metadata, was evaluated using the FAIR Evaluation Services. RESULTS: We successfully applied our FAIRification method to the VASCA registry. Data entered on eCRFs is automatically transformed into machine-readable data and can be accessed and queried using SPARQL queries in the FAIR Data Point. Twenty-one FAIR Evaluator tests pass and one test regarding the metadata persistence policy fails, since this policy is not in place yet.
CONCLUSION: In this study, we developed a novel method for de novo FAIRification via an EDC system. Its application in the VASCA registry and the automated FAIR evaluation show that the method can be used to make clinical research data FAIR when they are entered in an eCRF without any intervention from data management and data entry personnel. Due to the generic approach and developed tooling, we believe that our method can be used in other registries and clinical trials as well.


Subject(s)
Biomedical Research , Metadata , Data Management , Electronics , Registries
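
The eCRF-to-RDF step described in the METHODS of this entry can be illustrated with a small sketch that applies a mapping from eCRF questions to predicates and serializes the answers as N-Triples, a format that can be loaded into a triple store. It is a hypothetical illustration, not the Castor EDC data transformation application. The namespace `https://example.org/registry/`, the question IDs, and the names `MAPPING`, `escape_literal` and `to_ntriples` are assumptions; a real semantic data model would use ontology IRIs and typed literals.

```python
from typing import Dict, List, Tuple

# Hypothetical namespace; a real deployment would use IRIs from its semantic data model.
BASE = "https://example.org/registry/"

# Hypothetical mapping from eCRF question IDs to (predicate IRI, value_is_iri).
MAPPING: Dict[str, Tuple[str, bool]] = {
    "sex": (BASE + "hasSex", False),
    "diagnosis": (BASE + "hasDiagnosis", True),
    "note": (BASE + "hasNote", False),
}


def escape_literal(value: str) -> str:
    """Escape characters that may not appear unescaped in an N-Triples string literal."""
    return (
        value.replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(record_id: str, answers: Dict[str, str]) -> List[str]:
    """Turn one eCRF submission into N-Triples lines; unmapped questions are skipped."""
    subject = f"<{BASE}patient/{record_id}>"
    triples = []
    for question, value in answers.items():
        if question not in MAPPING:
            continue
        predicate, value_is_iri = MAPPING[question]
        obj = f"<{value}>" if value_is_iri else f'"{escape_literal(value)}"'
        triples.append(f"{subject} <{predicate}> {obj} .")
    return triples


triples = to_ntriples(
    "p001",
    {
        "sex": "female",
        "diagnosis": "https://example.org/code/diagnosis-1",
        "note": 'reported "mild" symptoms',
        "internal_flag": "x",
    },
)
print("\n".join(triples))
```

Once loaded into a triple store behind a FAIR Data Point, triples of this shape could then be queried with SPARQL, as the abstract describes.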
10.
Stud Health Technol Inform ; 279: 144-146, 2021 May 07.
Article in English | MEDLINE | ID: mdl-33965931

ABSTRACT

BACKGROUND: Integration of heterogeneous resources is key for Rare Disease research. Within the EJP RD, common Application Programming Interface specifications are proposed for discovery of resources and data records. This is not sufficient for automated processing between RD resources and for meeting the FAIR principles. OBJECTIVE: To design a solution that makes the EJP RD API specification more FAIR for machines. METHODS: A FAIR Data Point (FDP) is used to expose machine-actionable metadata of digital resources, and it is configured to store its content in a semantic database to be FAIR at the source. RESULTS: A solution was designed based on the grlc server as middleware to implement the EJP RD API specification on top of the FDP. CONCLUSION: grlc reduces the potential API implementation overhead faced by maintainers who use FAIR at the source.


Subject(s)
Rare Diseases , Software , Databases, Factual , Humans , Internet , Metadata , Semantics
11.
Sci Data ; 8(1): 10, 2021 01 15.
Article in English | MEDLINE | ID: mdl-33452270

ABSTRACT

Rett syndrome (RTT) is a rare neurological disorder mostly caused by genetic variants in MECP2. Making new MECP2 variants and the related phenotypes available provides data for better understanding of disease mechanisms and faster identification of variants for diagnosis. This is, however, currently hampered by the lack of interoperability between genotype-phenotype databases. Here, we demonstrate on the example of MECP2 in RTT that by making the genotype-phenotype data more Findable, Accessible, Interoperable, and Reusable (FAIR), we can facilitate prioritization and analysis of variants. In total, 10,968 MECP2 variants were successfully integrated. Among these variants, 863 unique confirmed RTT-causing and 209 unique confirmed benign variants were found. This dataset was used for comparison of pathogenicity prediction tools, protein consequences, and identification of ambiguous variants. Prediction tools generally recognised the RTT-causing and benign variants; however, there was a broad range of overlap. Nineteen variants were identified that were annotated as both disease-causing and benign, suggesting that there are additional factors in these cases contributing to disease development.


Subject(s)
Methyl-CpG-Binding Protein 2/genetics , Mutation , Rett Syndrome/etiology , DNA Mutational Analysis , Data Analysis , Humans , Rett Syndrome/genetics
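
The final step in this entry, flagging variants annotated as both disease-causing and benign, reduces to grouping the integrated annotations by variant and checking for conflicting labels. A minimal sketch, assuming variant identifiers have already been normalized to a single representation during integration; the record layout, the placeholder IDs `variant-1` to `variant-3`, the source names and the `ambiguous_variants` function are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# The two labels whose co-occurrence marks a variant as ambiguous.
CONFLICT = {"disease-causing", "benign"}


def ambiguous_variants(annotations: Iterable[Tuple[str, str, str]]) -> Set[str]:
    """Return IDs of variants labelled both disease-causing and benign across sources.

    Each annotation is (variant_id, classification, source); classifications are
    compared case-insensitively after trimming whitespace.
    """
    labels: Dict[str, Set[str]] = defaultdict(set)
    for variant_id, classification, _source in annotations:
        labels[variant_id].add(classification.strip().lower())
    return {variant for variant, seen in labels.items() if CONFLICT <= seen}


# Hypothetical integrated annotations from two databases.
annotations = [
    ("variant-1", "Disease-causing", "database_x"),
    ("variant-1", "benign", "database_y"),
    ("variant-2", "benign", "database_x"),
    ("variant-3", "disease-causing", "database_y"),
    ("variant-3", "disease-causing", "database_x"),
]
print(sorted(ambiguous_variants(annotations)))
```

Keeping the source alongside each label would let the same pass report which databases disagree, which is useful when following up on each ambiguous variant.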
12.
Stud Health Technol Inform ; 271: 115-116, 2020 Jun 23.
Article in English | MEDLINE | ID: mdl-32578552

ABSTRACT

BACKGROUND: Connecting currently existing, heterogeneous rare disease (RD) registries would greatly facilitate epidemiological and clinical research. To increase their interoperability, the European Union developed a set of Common Data Elements (CDEs) for RD registries. OBJECTIVES: To implement the CDEs and the FAIR data principles in the Registry of Vascular Anomalies (VASCA). METHODS: We created a semantic model for the CDEs and transformed it into a Resource Description Framework (RDF) template. The electronic case report forms (eCRF) were mapped to the RDF template and published in a FAIR Data Point (FDP). RESULTS: The FAIR VASCA registry was successfully implemented using Castor EDC (Electronic Data Capture) software. CONCLUSION: FAIR technology allows researchers to query and combine data from different registries in real time.


Subject(s)
Common Data Elements , Registries , Software , Humans , Rare Diseases , Semantics
14.
Biomed Res Int ; 2017: 8327980, 2017.
Article in English | MEDLINE | ID: mdl-29214177

ABSTRACT

Patient registries are an essential tool to increase current knowledge regarding rare diseases. Understanding these data is a vital step to improve patient treatments and to create the most adequate tools for personalized medicine. However, the growing number of disease-specific patient registries also brings new technical challenges. Usually, these systems are developed as closed data silos, with independent formats and models, lacking comprehensive mechanisms to enable data sharing. To tackle these challenges, we developed a Semantic Web based solution that allows connecting distributed and heterogeneous registries, enabling the federation of knowledge between multiple independent environments. This semantic layer creates a holistic view over a set of anonymised registries, supporting semantic data representation, integrated access, and querying. The implemented system gave us the opportunity to answer challenging questions across dispersed rare disease patient registries. Interconnecting those registries using Semantic Web technologies benefits our final solution in that we can query single or multiple instances according to our needs. The outcome is a unique semantic layer, connecting miscellaneous registries and delivering a lightweight holistic perspective over the wealth of knowledge stemming from linked rare disease patient registries.


Subject(s)
Database Management Systems/statistics & numerical data , Information Storage and Retrieval/statistics & numerical data , Rare Diseases/epidemiology , Registries/statistics & numerical data , Semantic Web/statistics & numerical data , Computational Biology/methods , Databases, Factual/statistics & numerical data , Humans , Information Dissemination/methods , Internet/statistics & numerical data , Software/statistics & numerical data
15.
Sci Rep ; 7(1): 12575, 2017 10 03.
Article in English | MEDLINE | ID: mdl-28974727

ABSTRACT

Duchenne muscular dystrophy (DMD) is a muscular dystrophy with a high incidence of learning and behavioural problems and is associated with neurodevelopmental disorders. To gain more insight into the role of dystrophin in this cognitive phenotype, we performed a comprehensive analysis of the expression patterns of dystrophin isoforms across human brain development, using unique transcriptomic data from the Allen Human Brain and BrainSpan atlases. Dystrophin isoforms show large changes in expression through life, with pronounced differences between the foetal and adult human brain. The Dp140 isoform was expressed in the cerebral cortex only in foetal life stages, while in the cerebellum it was also expressed postnatally. The Purkinje isoform Dp427p was virtually absent. The expression of dystrophin isoforms was significantly associated with genes implicated in neurodevelopmental disorders, like autism spectrum disorders or attention-deficit hyperactivity disorders, which are known to be associated with DMD. We also identified relevant functional associations of the different isoforms, like an association with axon guidance or neuron differentiation during early development. Our results point to the crucial role of several dystrophin isoforms in the development and function of the human brain.


Subject(s)
Autism Spectrum Disorder/genetics , Cerebral Cortex/metabolism , Dystrophin/genetics , Muscular Dystrophy, Duchenne/genetics , Protein Isoforms/genetics , Autism Spectrum Disorder/metabolism , Autism Spectrum Disorder/physiopathology , Axons/metabolism , Cell Differentiation/genetics , Cerebral Cortex/growth & development , Fetus/metabolism , Gene Expression Regulation, Developmental/genetics , Humans , Muscular Dystrophy, Duchenne/metabolism , Muscular Dystrophy, Duchenne/physiopathology , Mutation , Neurons/metabolism , Neuropsychological Tests , Phenotype , Purkinje Cells/metabolism , Purkinje Cells/pathology , Transcriptome/genetics
16.
PLoS One ; 11(2): e0149621, 2016.
Article in English | MEDLINE | ID: mdl-26919047

ABSTRACT

High-throughput experimental methods such as medical sequencing and genome-wide association studies (GWAS) identify increasingly large numbers of potential relations between genetic variants and diseases. Both biological complexity (millions of potential gene-disease associations) and the accelerating rate of data production necessitate computational approaches to prioritize and rationalize potential gene-disease relations. Here, we use concept profile technology to expose from the biomedical literature both explicitly stated gene-disease relations (the explicitome) and a much larger set of implied gene-disease associations (the implicitome). Implicit relations are largely unknown to, or even unintended by, the original authors, but they vastly extend the reach of existing biomedical knowledge for identification and interpretation of gene-disease associations. The implicitome can be used in conjunction with experimental data resources to rationalize both known and novel associations. We demonstrate the usefulness of the implicitome by rationalizing known and novel gene-disease associations, including those from GWAS. To facilitate the re-use of implicit gene-disease associations, we publish our data in compliance with FAIR Data Publishing recommendations [https://www.force11.org/group/fairgroup] using nanopublications. An online tool (http://knowledge.bio) is available to explore established and potential gene-disease associations in the context of other biomedical relations.


Subject(s)
Computational Biology/methods , Databases, Genetic , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans
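
Implicit gene-disease associations in this entry rest on the idea that two concepts can be related through shared intermediate concepts even if they are never mentioned together. A simplified sketch, assuming each concept profile is a sparse vector of weights over related concepts and that relatedness is scored with cosine similarity; the concept profile technology used in the study has its own weighting and matching scheme, and the profiles and the `cosine` function below are hypothetical.

```python
import math
from typing import Dict

# A concept profile: related concept -> association weight (hypothetical values).
Profile = Dict[str, float]


def cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity between two sparse concept profiles; 0.0 if either is empty."""
    dot = sum(a[c] * b[c] for c in a.keys() & b.keys())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical profiles: the gene and the disease share intermediate concepts,
# so they score as related even without any direct co-mention.
gene_profile = {"apoptosis": 0.8, "mitochondria": 0.6}
disease_profile = {"apoptosis": 0.6, "mitochondria": 0.8}
unrelated_profile = {"photosynthesis": 1.0}

print(round(cosine(gene_profile, disease_profile), 2))
print(cosine(gene_profile, unrelated_profile))
```

Ranking all gene-disease pairs by such a score, and discarding pairs already stated explicitly, is one simple way to surface candidate implicit associations for follow-up.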
17.
J Biomed Semantics ; 6: 5, 2015.
Article in English | MEDLINE | ID: mdl-26464783

ABSTRACT

Data from high-throughput experiments often produce far more results than can ever appear in the main text or tables of a single research article. In these cases, the majority of new associations are often archived either as supplemental information in an arbitrary format or in publisher-independent databases that can be difficult to find. These data are not only lost from scientific discourse, but are also elusive to automated search, retrieval and processing. Here, we use the nanopublication model to make scientific assertions that were concluded from a workflow analysis of Huntington's Disease data machine-readable, interoperable, and citable. We followed the nanopublication guidelines to semantically model our assertions as well as their provenance metadata and authorship. We demonstrate interoperability by linking nanopublication provenance to the Research Object model. These results indicate that nanopublications can give researchers an incentive to expose data that is interoperable and machine-readable for future use and preservation, and to receive credit for their effort. Nanopublications can play a leading role in hypothesis generation, offering opportunities for large-scale data integration.

18.
PLoS One ; 10(7): e0127612, 2015.
Article in English | MEDLINE | ID: mdl-26154165

ABSTRACT

MOTIVATION: Reproducing the results from a scientific paper can be challenging due to the absence of the data and computational tools required for their analysis. In addition, details relating to the procedures used to obtain the published results can be difficult to discern due to the use of natural language when reporting how experiments have been performed. The Investigation/Study/Assay (ISA), Nanopublications (NP), and Research Objects (RO) models are conceptual data modelling frameworks that can structure such information from scientific papers. Computational workflow platforms can also be used to reproduce analyses of data in a principled manner. We assessed the extent to which ISA, NP, and RO models, together with the Galaxy workflow system, can capture the experimental processes and reproduce the findings of a previously published paper reporting on the development of SOAPdenovo2, a de novo genome assembler. RESULTS: Executable workflows were developed using Galaxy, which reproduced results that were consistent with the published findings. A structured representation of the information in the SOAPdenovo2 paper was produced by combining the use of ISA, NP, and RO models. By structuring the information in the published paper using these data and scientific workflow modelling frameworks, it was possible to explicitly declare elements of experimental design, variables, and findings. The models served as guides in the curation of scientific information, and this led to the identification of inconsistencies in the original published paper, thereby allowing its authors to publish corrections in the form of an erratum.
AVAILABILITY: SOAPdenovo2 scripts, data, and results are available through the GigaScience Database: http://dx.doi.org/10.5524/100044; the workflows are available from GigaGalaxy: http://galaxy.cbiit.cuhk.edu.hk; and the representations using the ISA, NP, and RO models are available through the SOAPdenovo2 case study website http://isa-tools.github.io/soapdenovo2/. CONTACT: philippe.rocca-serra@oerc.ox.ac.uk and susanna-assunta.sansone@oerc.ox.ac.uk.


Subject(s)
Computational Biology/methods , Models, Theoretical , Peer Review, Research , Reproducibility of Results
19.
Genome Biol ; 16: 22, 2015 Jan 05.
Article in English | MEDLINE | ID: mdl-25723102

ABSTRACT

The FANTOM5 project investigates transcription initiation activities in more than 1,000 human and mouse primary cells, cell lines and tissues using CAGE. Based on manual curation of sample information and development of an ontology for sample classification, we assemble the resulting data into a centralized data resource (http://fantom.gsc.riken.jp/5/). This resource contains web-based tools and data-access points for the research community to search and extract data related to samples, genes, promoter activities, transcription factors and enhancers across the FANTOM5 atlas.


Subject(s)
Genomics/methods , Promoter Regions, Genetic , Software , Transcription Initiation, Genetic , Animals , Computational Biology/methods , Databases, Genetic , Datasets as Topic , Gene Expression Profiling , Humans , Mice , Transcriptome , User-Computer Interface