Results 1 - 20 of 677
1.
Front Cell Infect Microbiol ; 14: 1384809, 2024.
Article in English | MEDLINE | ID: mdl-38774631

ABSTRACT

Introduction: Sharing microbiome data among researchers fosters innovation and reduces research costs. Practically, this means that the (meta)data have to be standardized, transparent and readily available to researchers. The microbiome data and associated metadata are then described with regard to composition and origin, in order to maximize the possibilities for application in various contexts of research. Here, we propose a set of tools and protocols to develop a real-time FAIR (Findable, Accessible, Interoperable and Reusable) compliant database for the handling and storage of human microbiome and host-associated data. Methods: The conflicts arising from privacy laws with respect to metadata, possible human genome sequences in metagenome shotgun data and FAIR implementations are discussed. Alternate pathways for achieving compliance in such conflicts are analyzed. Sample-traceable and sensitive microbiome data, such as DNA sequences or geolocated metadata, are identified, and the role of the GDPR (General Data Protection Regulation) is considered. For the construction of the database, procedures were implemented to make data FAIR compliant while preserving the privacy of the participants providing the data. Results and discussion: An open-source development platform, Supabase, was used to implement the microbiome database. Researchers can deploy this real-time database to access, upload, download and interact with human microbiome data in a FAIR-compliant manner. In addition, a large language model (LLM) interface powered by ChatGPT was developed and deployed to enable knowledge dissemination and non-expert usage of the database.
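The abstract names Supabase but shows no code; as a minimal sketch, a researcher might query such a database from Python with the supabase-py client like this. The table and column names ("samples", "body_site", "run_accession", "collection_year") are hypothetical, as are the URL and key placeholders; only the client calls are standard supabase-py usage.

```python
from supabase import create_client

SUPABASE_URL = "https://your-project.supabase.co"  # placeholder
SUPABASE_KEY = "your-anon-key"                     # placeholder

client = create_client(SUPABASE_URL, SUPABASE_KEY)

# Fetch gut samples, selecting only non-sensitive columns (hypothetical schema).
response = (
    client.table("samples")
    .select("run_accession, body_site, collection_year")
    .eq("body_site", "gut")
    .execute()
)
for row in response.data:
    print(row["run_accession"], row["collection_year"])
```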


Subject(s)
Microbiota , Humans , Microbiota/genetics , Databases, Factual , Metadata , Metagenome , Information Dissemination , Computational Biology/methods , Metagenomics/methods , Databases, Genetic
2.
Sci Data ; 11(1): 524, 2024 May 22.
Article in English | MEDLINE | ID: mdl-38778016

ABSTRACT

Datasets consist of measurement data and metadata. Metadata provides context that is essential for understanding and (re-)using data. Various metadata standards exist for different methods, systems and contexts. However, relevant information arises at different stages across the data lifecycle. Often, this information is defined and standardized only at the publication stage, which can lead to data loss and increased workload. In this study, we developed the Metadatasheet, a metadata standard based on interviews with members of two biomedical consortia and a systematic screening of data repositories. It aligns with the data lifecycle, allowing metadata to be recorded as it arises within Microsoft Excel, a widely used data-recording software. Additionally, we provide an implementation, the Metadata Workbook, that offers user-friendly features such as automation, dynamic adaptation, metadata integrity checks, and export options for various metadata standards. By design, and due to its extensive documentation, the proposed metadata standard simplifies the recording and structuring of metadata for biomedical scientists, promoting practicality and convenience in data management. This framework can accelerate scientific progress by enhancing collaboration and knowledge transfer throughout the intermediate steps of data creation.
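As a hedged sketch of consuming an Excel-based metadata sheet with pandas: the sheet name, required fields, and layout below are illustrative assumptions, not the published Metadatasheet specification; the integrity check mirrors the kind of completeness validation the Metadata Workbook is described as offering.

```python
import pandas as pd

REQUIRED_FIELDS = ["sample_id", "organism", "collection_date", "operator"]

# Load the metadata sheet (hypothetical file and sheet names).
meta = pd.read_excel("study_metadata.xlsx", sheet_name="Metadatasheet")

# Flag records with missing values in the required fields.
missing = meta[meta[REQUIRED_FIELDS].isna().any(axis=1)]
if not missing.empty:
    print(f"{len(missing)} records have incomplete required metadata:")
    print(missing[REQUIRED_FIELDS].to_string(index=False))
```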


Subject(s)
Data Management , Metadata , Biomedical Research , Data Management/standards , Metadata/standards , Software
5.
BMC Bioinformatics ; 25(1): 184, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38724907

ABSTRACT

BACKGROUND: Major advances in sequencing technologies and the sharing of data and metadata in science have resulted in a wealth of publicly available datasets. However, working with, and especially curating, public omics datasets remains challenging despite these efforts. While a growing number of initiatives aim to re-use previous results, these efforts present limitations that often lead to the need for further in-house curation and processing. RESULTS: Here, we present the Omics Dataset Curation Toolkit (OMD Curation Toolkit), a Python 3 package designed to accompany and guide the researcher during the curation of metadata and FASTQ files of public omics datasets. This workflow provides a standardized framework with multiple capabilities (collection, control check, treatment and integration) to facilitate the arduous task of curating public sequencing data projects. While centered on the European Nucleotide Archive (ENA), the majority of the provided tools are generic and can be used to curate datasets from other sources. CONCLUSIONS: The toolkit thus offers valuable tools for the in-house curation previously needed to re-use public omics data. Owing to its workflow structure and capabilities, it can be easily used and can benefit investigators developing novel omics meta-analyses based on sequencing data.
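The metadata-collection step the toolkit standardizes can be illustrated with a generic fetch from the ENA Portal API using requests; the endpoint, result type, and field names are standard ENA Portal API parameters, while the project accession is a placeholder and this is not the toolkit's own interface.

```python
import csv
import io
import requests

ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"

params = {
    "accession": "PRJEBxxxxx",  # placeholder project accession
    "result": "read_run",
    "fields": "run_accession,sample_accession,fastq_ftp,library_strategy",
    "format": "tsv",
}
resp = requests.get(ENA_FILEREPORT, params=params, timeout=60)
resp.raise_for_status()

# Each row describes one sequencing run and where its FASTQ files live.
for row in csv.DictReader(io.StringIO(resp.text), delimiter="\t"):
    print(row["run_accession"], row["library_strategy"])
```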


Subject(s)
Data Curation , Software , Workflow , Data Curation/methods , Metadata , Databases, Genetic , Genomics/methods , Computational Biology/methods
6.
Sci Data ; 11(1): 503, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38755173

ABSTRACT

Nanomaterials hold great promise for improving our society, and it is crucial to understand their effects on biological systems in order to enhance their properties and ensure their safety. However, the lack of consistency in experimental reporting, the absence of universally accepted machine-readable metadata standards, and the challenge of combining such standards hamper the reusability of previously produced data for risk assessment. Fortunately, the research community has responded to these challenges by developing minimum reporting standards that address several of these issues. By converting twelve published minimum reporting standards into a machine-readable representation using FAIR maturity indicators, we have created a machine-friendly approach to annotate and assess datasets' reusability according to those standards. Furthermore, our NanoSafety Data Reusability Assessment (NSDRA) framework includes a metadata generator web application that can be integrated into experimental data management, and a new web application that can summarize the reusability of nanosafety datasets for one or more subsets of maturity indicators, tailored to specific computational risk assessment use cases. This approach enhances the transparency, communication, and reusability of experimental data and metadata. With this improved FAIR approach, we can facilitate the reuse of nanosafety research for exploration, toxicity prediction, and regulation, thereby advancing the field and benefiting society as a whole.
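To make the idea of a machine-readable reporting standard concrete, here is an illustrative sketch of a weighted indicator checklist and a reusability score computed against a dataset's metadata. The indicator names and weights are hypothetical stand-ins, not the NSDRA maturity indicators themselves.

```python
# Hypothetical weighted indicators derived from a minimum reporting standard.
indicators = {
    "material_characterization_reported": 1.0,
    "exposure_dose_reported": 1.0,
    "assay_protocol_referenced": 0.5,
    "units_machine_readable": 0.5,
}

def reusability_score(dataset_metadata: dict) -> float:
    """Fraction of weighted indicators satisfied by a dataset's metadata."""
    total = sum(indicators.values())
    met = sum(w for name, w in indicators.items() if dataset_metadata.get(name))
    return met / total

example = {"material_characterization_reported": True, "units_machine_readable": True}
print(f"reusability: {reusability_score(example):.2f}")  # -> 0.50
```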


Subject(s)
Nanostructures , Metadata , Nanostructures/toxicity , Risk Assessment
7.
BMC Med Inform Decis Mak ; 24(1): 136, 2024 May 27.
Article in English | MEDLINE | ID: mdl-38802886

ABSTRACT

BACKGROUND: The selection of data elements is a decisive task in the development of a health registry. Having the right metadata is crucial for answering the particular research questions. Furthermore, the set of data elements determines a registry's readiness for interoperability and data reuse to a major extent. Six health registries shared and published their metadata within a German funding initiative. As one step toward a common set of data elements, a selection of those metadata was evaluated with regard to their appropriateness for broader usage. METHODS: Each registry was asked to contribute a 10% selection of its data elements to an evaluation sample. The survey was set up with the online survey tool "LimeSurvey Cloud". The registries and an accompanying project participated in the survey, with one vote per project. The data elements were offered in content groups along with the question of whether the data element is appropriate for health registries on a broader scale. The question could be answered on a five-point Likert scale; "no answer" was also allowed. The level of agreement was assessed using weighted Cohen's kappa and Kendall's coefficient of concordance. RESULTS: The evaluation sample consisted of 269 data elements. Of these, 169 data elements received a mean grade of "perhaps recommendable" or higher and were selected. These data elements belong predominantly to the groups demography, education/occupation, medication, and nutrition. Half of the registries saw their share decline relative to their percentage of data elements in the evaluation sample; one remained stable. The level of concordance was adequate. CONCLUSIONS: The survey yielded a set of 169 data elements recommended for health registries. When developing a registry, this set could be a valuable aid in selecting the metadata appropriate to answer the registry's research questions. However, due to the high specificity of research questions, data elements beyond this set will be needed to cover the whole range of interests of a registry. A broader discussion and subsequent surveys are needed to establish a common set of data elements on an international scale.
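The two agreement statistics named in the abstract can be computed with standard tools, as sketched below; the Likert ratings are invented illustrations, not the study's data, and the Kendall's W implementation omits the tie correction for brevity.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.metrics import cohen_kappa_score

# Weighted Cohen's kappa for two raters on a 5-point Likert scale.
rater_a = [5, 4, 4, 2, 3, 5, 1, 4]
rater_b = [4, 4, 5, 2, 2, 5, 2, 4]
kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")

def kendalls_w(ratings: np.ndarray) -> float:
    """Kendall's coefficient of concordance for an (m raters x n items) array."""
    m, n = ratings.shape
    ranks = np.vstack([rankdata(row) for row in ratings])  # rank items per rater
    rank_sums = ranks.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12.0 * s / (m**2 * (n**3 - n))  # tie correction omitted

ratings = np.array([rater_a, rater_b, [5, 3, 4, 1, 3, 4, 2, 5]])
print(f"weighted kappa = {kappa:.2f}, Kendall's W = {kendalls_w(ratings):.2f}")
```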


Subject(s)
Registries , Registries/standards , Germany , Humans , Surveys and Questionnaires , Metadata
8.
Philos Trans R Soc Lond B Biol Sci ; 379(1904): 20230104, 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38705176

ABSTRACT

Technological advancements in biological monitoring have facilitated the study of insect communities at unprecedented spatial scales. This progress allows more comprehensive coverage of the diversity within a given area while minimizing disturbance and reducing the need for extensive human labour. Compared with traditional methods, these novel technologies offer the opportunity to examine biological patterns that were previously beyond our reach. However, to address the pressing scientific inquiries of the future, data must be easily accessible, interoperable and reusable for the global research community. Biodiversity information standards and platforms provide the necessary infrastructure to standardize and share biodiversity data. This paper explores the possibilities and prerequisites of publishing insect data obtained through novel monitoring methods via GBIF, the most comprehensive global biodiversity data infrastructure. We describe the essential components of metadata standards and existing data standards for occurrence data on insects, including data extensions. By addressing the current opportunities, limitations, and future development of GBIF's publishing framework, we hope to encourage researchers both to share data and to contribute to the further development of biodiversity data standards and publishing models. Wider commitments to open data initiatives will promote data interoperability and support cross-disciplinary scientific research and key policy indicators. This article is part of the theme issue 'Towards a toolkit for global insect biodiversity monitoring'.
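The occurrence data standard GBIF ingests is Darwin Core; a minimal sketch of writing one occurrence record with real Darwin Core term names follows. The values are invented examples, and "MachineObservation" is the basisOfRecord value typically used for automated sensors.

```python
import csv

DWC_FIELDS = [
    "occurrenceID", "basisOfRecord", "scientificName", "eventDate",
    "decimalLatitude", "decimalLongitude", "samplingProtocol", "individualCount",
]

record = {
    "occurrenceID": "urn:example:trap42:2024-06-01:0001",  # must be globally unique
    "basisOfRecord": "MachineObservation",  # automated monitoring device
    "scientificName": "Bombus terrestris",
    "eventDate": "2024-06-01T10:15:00",
    "decimalLatitude": 60.1699,
    "decimalLongitude": 24.9384,
    "samplingProtocol": "automated camera trap",
    "individualCount": 1,
}

with open("occurrence.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=DWC_FIELDS)
    writer.writeheader()
    writer.writerow(record)
```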


Subject(s)
Biodiversity , Information Dissemination , Insecta , Animals , Entomology/methods , Entomology/standards , Information Dissemination/methods , Metadata
9.
Stud Health Technol Inform ; 313: 198-202, 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38682530

ABSTRACT

Secondary use of clinical health data implies prior integration of mostly heterogeneous and multidimensional data sets. A clinical data warehouse addresses the technological and organizational framework required for this by making any data available for analysis. However, users of a data warehouse often do not have a comprehensive overview of all available data and only know about their own data in their own systems, a situation also referred to as a 'data siloed state'. This problem can be addressed, and ultimately solved, by implementing a data catalog. Its core function is a search engine that allows searching the metadata collected from different data sources and thereby provides access to all the data there is. With this in mind, we conducted an exploratory online market survey followed by a vendor comparison as a prerequisite for selecting a data catalog system. Vendor performance was assessed against seven predetermined and weighted selection criteria. Although three vendors achieved the highest scores, the results were close together. Detailed investigations and test installations are needed to further narrow down the selection.
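The weighted scoring behind such a vendor comparison reduces to simple arithmetic, sketched below with seven invented criteria, weights, and scores; the abstract confirms only that seven weighted criteria were used, not what they were.

```python
criteria_weights = {  # hypothetical criteria; weights sum to 1.0
    "search_engine": 0.25,
    "metadata_harvesting": 0.15,
    "access_control": 0.15,
    "interoperability": 0.15,
    "usability": 0.10,
    "vendor_support": 0.10,
    "cost": 0.10,
}
assert abs(sum(criteria_weights.values()) - 1.0) < 1e-9

vendor_scores = {  # raw scores per criterion on an invented 1-5 scale
    "Vendor A": {"search_engine": 5, "metadata_harvesting": 4, "access_control": 4,
                 "interoperability": 3, "usability": 4, "vendor_support": 3, "cost": 3},
    "Vendor B": {"search_engine": 4, "metadata_harvesting": 5, "access_control": 3,
                 "interoperability": 4, "usability": 4, "vendor_support": 4, "cost": 4},
}

for vendor, scores in vendor_scores.items():
    total = sum(criteria_weights[c] * scores[c] for c in criteria_weights)
    print(f"{vendor}: {total:.2f}")
```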


Subject(s)
Data Warehousing , Electronic Health Records , Search Engine , Humans , Information Storage and Retrieval/methods , Metadata
10.
Genome Biol ; 25(1): 100, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38641812

ABSTRACT

Multiplexed assays of variant effect (MAVEs) have emerged as a powerful approach for interrogating thousands of genetic variants in a single experiment. The flexibility and widespread adoption of these techniques across diverse disciplines have led to a heterogeneous mix of data formats and descriptions, which complicates the downstream use of the resulting datasets. To address these issues and promote reproducibility and reuse of MAVE data, we define a set of minimum information standards for MAVE data and metadata and outline a controlled vocabulary aligned with established biomedical ontologies for describing these experimental designs.
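As an illustration of what a minimum-information record for a MAVE dataset might look like, here is a hedged sketch with a trivial completeness check. All field names are hypothetical stand-ins rather than the published standard, and the ontology identifier is an explicit placeholder.

```python
REQUIRED = ["target_gene", "variant_library", "assay_type",
            "sequencing_strategy", "score_calculation"]

record = {
    "target_gene": "BRCA1",
    "variant_library": "site-saturation mutagenesis",
    "assay_type": "protein abundance assay",
    "sequencing_strategy": "barcode counting",
    "score_calculation": "log ratio of post- vs pre-selection counts",
    "assay_ontology_term": "OBI:placeholder",  # controlled-vocabulary hook
}

missing = [k for k in REQUIRED if not record.get(k)]
print("complete" if not missing else f"missing: {missing}")
```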


Subject(s)
Metadata , Research Design , Reproducibility of Results
11.
Database (Oxford) ; 20242024 Apr 03.
Article in English | MEDLINE | ID: mdl-38581360

ABSTRACT

When a scientific dataset evolves or is reused in workflows creating derived datasets, the integrity of the dataset and its metadata, including provenance, needs to be securely preserved, with assurances that they are not accidentally or maliciously altered in the process. A secure method to efficiently share and verify the data as well as the metadata is essential for the reuse of scientific data. The National Science Foundation (NSF)-funded Open Science Chain (OSC) uses a consortium blockchain to provide a cyberinfrastructure solution that maintains the integrity of provenance metadata for published datasets and provides a way to perform independent verification of the dataset, promoting reuse and reproducibility. The NSF- and National Institutes of Health (NIH)-funded Neuroscience Gateway (NSG) provides a freely available web portal that allows neuroscience researchers to execute computational data analysis pipelines on high-performance computing resources. Combined, the OSC and NSG platforms form an efficient, integrated framework to automatically and securely preserve and verify the integrity of the artifacts used in research workflows on the NSG platform. This paper presents the results of the first study that integrates the OSC and NSG frameworks to track the provenance of neurophysiological signal data analysis for studying brain network dynamics using the Neuro-Integrative Connectivity tool, which is deployed on the NSG platform. Database URL: https://www.opensciencechain.org.
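The generic integrity idea behind this approach can be sketched with a local checksum: hash the artifact and compare against an independently recorded value (which OSC stores on a consortium blockchain). The file name and the recorded hash below are placeholders; this is the underlying pattern, not OSC's API.

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large datasets fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

recorded = "<hash registered at publication time>"  # placeholder
current = sha256_of("ephys_recording.nwb")          # hypothetical artifact
print("intact" if current == recorded else "dataset or metadata has changed")
```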


Subject(s)
Neurosciences , Publications , Reproducibility of Results , Databases, Factual , Metadata
12.
PLoS One ; 19(4): e0295474, 2024.
Article in English | MEDLINE | ID: mdl-38568922

ABSTRACT

Insect monitoring is essential to design effective conservation strategies, which are indispensable to mitigate worldwide declines and biodiversity loss. For this purpose, traditional monitoring methods are widely established and can provide data with a high taxonomic resolution. However, processing of captured insect samples is often time-consuming and expensive, which limits the number of potential replicates. Automated monitoring methods can facilitate data collection at a higher spatiotemporal resolution with a comparatively lower effort and cost. Here, we present the Insect Detect DIY (do-it-yourself) camera trap for non-invasive automated monitoring of flower-visiting insects, which is based on low-cost off-the-shelf hardware components combined with open-source software. Custom trained deep learning models detect and track insects landing on an artificial flower platform in real time on-device and subsequently classify the cropped detections on a local computer. Field deployment of the solar-powered camera trap confirmed its resistance to high temperatures and humidity, which enables autonomous deployment during a whole season. On-device detection and tracking can estimate insect activity/abundance after metadata post-processing. Our insect classification model achieved a high top-1 accuracy on the test dataset and generalized well on a real-world dataset with captured insect images. The camera trap design and open-source software are highly customizable and can be adapted to different use cases. With custom trained detection and classification models, as well as accessible software programming, many possible applications surpassing our proposed deployment method can be realized.
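The metadata post-processing step mentioned above amounts to aggregating per-detection records into activity estimates; a sketch with pandas follows. The CSV column names (timestamp, insect_class, track_id) are assumptions, not the project's actual export format.

```python
import pandas as pd

det = pd.read_csv("detections.csv", parse_dates=["timestamp"])

# Count unique track IDs per hour and class as a proxy for insect activity.
activity = (
    det.groupby([pd.Grouper(key="timestamp", freq="h"), "insect_class"])["track_id"]
    .nunique()
    .rename("visits")
    .reset_index()
)
print(activity.head())
```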


Subject(s)
Insecta , Software , Animals , Biodiversity , Data Collection , Metadata
13.
PLoS One ; 19(4): e0302426, 2024.
Article in English | MEDLINE | ID: mdl-38662676

ABSTRACT

Research data sharing has become an expected component of scientific research and scholarly publishing practice over the last few decades, due in part to requirements for federally funded research. As part of a larger effort to better understand the workflows and costs of public access to research data, this project conducted a high-level analysis of where academic research data is most frequently shared. To do this, we leveraged the DataCite and Crossref application programming interfaces (APIs), searching the Publisher field to determine which data repositories were used by researchers from six academic research institutions between 2012 and 2022. In addition, we ran a preliminary analysis of the quality of the metadata associated with these published datasets, comparing the extent to which information was missing from metadata fields deemed important for public access to research data. Results show that the top 10 publishers accounted for 89.0% to 99.8% of the datasets connected with the institutions in our study. Known data repositories, including institutional data repositories hosted by those institutions, were initially absent from our sample due to varying metadata standards and practices. We conclude that the metadata quality landscape for published research datasets is uneven; key information, such as author affiliation, is often incomplete or missing from source data repositories and aggregators. To enhance the findability, accessibility, interoperability, and reusability (FAIRness) of research data, we provide a set of concrete recommendations that repositories and data authors can adopt to improve the scholarly metadata associated with shared datasets.
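The kind of DataCite REST API query used here, combined with a check for missing metadata fields, can be sketched as follows. The affiliation query string is a placeholder assumption; the endpoint, JSON:API paging parameter, and the publisher/creators attribute structure are standard DataCite usage.

```python
import requests

resp = requests.get(
    "https://api.datacite.org/dois",
    params={
        "query": 'creators.affiliation.name:"Example University"',  # placeholder
        "page[size]": 100,
    },
    timeout=60,
)
resp.raise_for_status()

# Flag records missing the Publisher field or any creator affiliation.
for doi in resp.json()["data"]:
    attrs = doi["attributes"]
    publisher = attrs.get("publisher") or "MISSING"
    has_affil = any(c.get("affiliation") for c in attrs.get("creators", []))
    print(doi["id"], publisher, "affiliation-ok" if has_affil else "affiliation-missing")
```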


Subject(s)
Information Dissemination , Metadata , Information Dissemination/methods , Humans , Biomedical Research
14.
Lab Anim (NY) ; 53(3): 67-79, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38438748

ABSTRACT

Although biomedical research is experiencing a data explosion, the accumulation of vast quantities of data alone does not guarantee a primary objective for science: building upon existing knowledge. Data collected that lack appropriate metadata cannot be fully interrogated or integrated into new research projects, leading to wasted resources and missed opportunities for data repurposing. This issue is particularly acute for research using animals, where concerns regarding data reproducibility and ensuring animal welfare are paramount. Here, to address this problem, we propose a minimal metadata set (MNMS) designed to enable the repurposing of in vivo data. MNMS aligns with an existing validated guideline for reporting in vivo data (ARRIVE 2.0) and contributes to making in vivo data FAIR-compliant. Scenarios where MNMS should be implemented in diverse research environments are presented, highlighting opportunities and challenges for data repurposing at different scales. We conclude with a 'call for action' to key stakeholders in biomedical research to adopt and apply MNMS to accelerate both the advancement of knowledge and the betterment of animal welfare.


Subject(s)
Biomedical Research , Metadata , Animals , Reproducibility of Results , Animal Welfare
15.
PLoS One ; 19(3): e0297404, 2024.
Article in English | MEDLINE | ID: mdl-38446758

ABSTRACT

Film festivals are a key component of the global film industry in terms of trendsetting, publicity, trade, and collaboration. We present an unprecedented analysis of the international film festival circuit, which has so far remained relatively understudied quantitatively, partly due to the limited availability of suitable data sets. We use large-scale data from the Cinando platform of the Cannes Film Market, widely used by industry professionals. We explicitly model festival events as a global network connected by shared films and quantify festivals as aggregates of the metadata of their showcased films. Importantly, we argue against using simple count distributions for discrete labels such as language or production country, as such categories are typically not equidistant. Rather, we propose embedding them in continuous latent vector spaces. We demonstrate how these "festival embeddings" provide insight into changes in programmed content over time, predict festival connections, and can be used to measure diversity in film festival programming across various cultural, social, and geographical variables, all of which constitute an aspect of public value creation by film festivals. Our results provide a novel mapping of the film festival circuit from 2009 to 2021 (616 festivals, 31,989 unique films), highlighting festival types that occupy specific niches, diverse series, and those that evolve over time. We also discuss how these quantitative findings fit into media studies and research on public value creation by cultural industries. With festivals occupying a central position in the film industry, investigations into the data they generate hold opportunities for researchers to better understand industry dynamics and cultural impact, and for organizers, policymakers, and industry actors to make more informed, data-driven decisions. We hope our proposed methodological approach to festival data paves the way for more comprehensive film festival studies and large-scale quantitative cultural event analytics in general.
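One simple way to realize the idea of embedding discrete labels in a continuous space is to factorize a festival-by-label co-occurrence matrix, so related labels end up close together instead of being treated as equidistant categories. The sketch below uses scikit-learn with invented counts; it illustrates the concept, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

festivals = ["fest_A", "fest_B", "fest_C", "fest_D"]
labels = ["FR", "US", "KR", "AR"]  # e.g. production countries
# counts[i, j] = number of films at festival i with label j (invented data)
counts = np.array([
    [12, 3, 1, 4],
    [10, 2, 0, 6],
    [1, 15, 2, 0],
    [2, 1, 11, 1],
])

svd = TruncatedSVD(n_components=2, random_state=0)
festival_vecs = svd.fit_transform(counts)  # festival embeddings
label_vecs = svd.components_.T             # label embeddings

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print("fest_A vs fest_B:", round(cosine(festival_vecs[0], festival_vecs[1]), 2))
```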


Subject(s)
Holidays , Industry , Geography , Language , Metadata
16.
Bioinformatics ; 40(3)2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38445753

ABSTRACT

SUMMARY: Python is the most commonly used language for deep learning (DL). Existing Python packages for mass spectrometry imaging (MSI) data are not optimized for DL tasks. We therefore introduce pyM2aia, a Python package for MSI data analysis with a focus on memory-efficient handling, processing and convenient data access for DL applications. pyM2aia provides interfaces to its parent application M2aia, which offers interactive capabilities for exploring and annotating MSI data in imzML format. pyM2aia utilizes the image input and output routines, data formats, and processing functions of M2aia, ensures data interchangeability, and enables the writing of readable and easy-to-maintain DL pipelines by providing batch generators for typical MSI data-access strategies. We showcase the package in several examples, including imzML metadata parsing, signal processing, ion-image generation, and, in particular, DL model training and inference for spectrum-wise approaches, ion-image-based approaches, and approaches that use spectral and spatial information simultaneously. AVAILABILITY AND IMPLEMENTATION: The Python package, code and examples are available at https://m2aia.github.io/m2aia.
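Since pyM2aia's own batch-generator API is not shown in the abstract, here is a hedged, library-independent illustration of the low-level imzML access such generators build on, using the separate pyimzML package; the file path is a placeholder.

```python
import numpy as np
from pyimzml.ImzMLParser import ImzMLParser

parser = ImzMLParser("example.imzML")  # placeholder file path

# Iterate spectra with their pixel coordinates -- the raw material a DL
# batch generator would assemble into training tensors.
for idx, (x, y, z) in enumerate(parser.coordinates):
    mzs, intensities = parser.getspectrum(idx)
    tic = float(np.sum(intensities))  # total ion count per pixel
    if idx < 3:
        print(f"pixel ({x}, {y}): {len(mzs)} peaks, TIC={tic:.1f}")
```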


Subject(s)
Deep Learning , Software , Mass Spectrometry/methods , Language , Metadata
17.
PLoS One ; 19(3): e0296810, 2024.
Article in English | MEDLINE | ID: mdl-38483886

ABSTRACT

Contact matrices are a commonly adopted data representation, used to develop compartmental models for epidemic spreading that account for contact heterogeneities across age groups. Their estimation, however, is generally time- and effort-consuming, and model-driven strategies to quantify the contacts are often needed. In this article we focus on household contact matrices, which describe the contacts among the members of a family, and develop a parametric model to describe them. This model combines demographic and easily quantifiable survey-based data and is tested on high-resolution proximity data collected at two sites in South Africa. Given its simplicity and interpretability, we expect our method to be easily applied to other contexts as well, and we identify relevant questions that need to be addressed during the data collection procedure.
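A baseline construction of a household contact matrix from household rosters, under the common assumption that every pair of household members is in contact, can be sketched as follows; the age groups and toy households are invented, and this is not the paper's parametric model.

```python
import numpy as np

age_groups = ["0-14", "15-49", "50+"]
households = [  # each household as a list of member age-group indices
    [0, 1, 1],      # child + two adults
    [1, 1, 2, 2],   # two adults + two elders
    [0, 0, 1],
]

k = len(age_groups)
contacts = np.zeros((k, k))
pop = np.zeros(k)

for hh in households:
    for i, gi in enumerate(hh):
        pop[gi] += 1
        for j, gj in enumerate(hh):
            if i != j:
                contacts[gi, gj] += 1

# Mean number of within-household contacts a member of group i has with group j.
matrix = contacts / pop[:, None]
print(np.round(matrix, 2))
```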


Subject(s)
Epidemics , Metadata , Surveys and Questionnaires , Epidemiological Models , South Africa , Contact Tracing/methods
18.
Astrobiology ; 24(2): 131-137, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38393827

ABSTRACT

As scientific investigations increasingly adopt Open Science practices, reuse of data becomes paramount. However, despite decades of progress in internet search tools, finding relevant astrobiology datasets for an envisioned investigation remains challenging due to the precise and atypical needs of the astrobiology researcher. In response, we have developed the Astrobiology Resource Metadata Standard (ARMS), a metadata standard designed to uniformly describe astrobiology "resources," that is, virtually any product of astrobiology research. Those resources include datasets, physical samples, software (modeling codes and scripts), publications, websites, images, videos, presentations, and so on. ARMS has been formulated to describe astrobiology resources generated by individual scientists or smaller scientific teams, rather than larger mission teams who may be required to use more complex archival metadata schemes. In the following, we discuss the participatory development process, give an overview of the metadata standard, describe its current use in practice, and close with a discussion of additional possible uses and extensions.


Subject(s)
Exobiology , Metadata , Software
19.
Mol Cell Proteomics ; 23(3): 100731, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38331191

ABSTRACT

Proteomics data sharing has profound benefits at the individual level as well as at the community level. While data sharing has increased over the years, mostly due to journal and funding agency requirements, researchers' reluctance to share data is evident, as many share only the bare minimum dataset required to publish an article. In many cases, proper metadata is missing, essentially rendering the dataset useless. This behavior can be explained by a lack of incentives, insufficient awareness, or a lack of clarity surrounding ethical issues. Through adequate training at research institutes, researchers can realize the benefits associated with data sharing and can help make data sharing the norm for the field of proteomics, as it has been the standard in genomics for decades. In this article, we have put together the various repository options available for proteomics data. We have also added the pros and cons of those repositories to help researchers select the repository most suitable for their data submission. It is also important to note that a few types of proteomics data have the potential to re-identify an individual in certain scenarios. In such cases, extra caution should be taken to remove any personal identifiers before sharing on public repositories. Datasets that would be useless without personal identifiers need to be shared via a controlled-access repository, so that only authorized researchers can access the data and personal identifiers are kept safe.


Subject(s)
Privacy , Proteomics , Humans , Genomics , Metadata , Information Dissemination
20.
Sci Data ; 11(1): 179, 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38332144

ABSTRACT

Data standardization promotes a common framework through which researchers can utilize others' data and is one of the leading methods neuroimaging researchers use to share and replicate findings. As of today, standardizing datasets requires technical expertise such as coding and knowledge of file formats. We present ezBIDS, a tool for converting neuroimaging data and associated metadata to the Brain Imaging Data Structure (BIDS) standard. ezBIDS contains four major features: (1) No installation or programming requirements. (2) Handling of both imaging and task events data and metadata. (3) Semi-automated inference and guidance for adherence to BIDS. (4) Multiple data management options: download BIDS data to local system, or transfer to OpenNeuro.org or to brainlife.io. In sum, ezBIDS requires neither coding proficiency nor knowledge of BIDS, and is the first BIDS tool to offer guided standardization, support for task events conversion, and interoperability with OpenNeuro.org and brainlife.io.
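The core of what a BIDS conversion like ezBIDS produces can be sketched directly: the standard directory naming plus the required dataset_description.json, whose "Name" and "BIDSVersion" fields are the two required entries in the BIDS specification. The study name and author below are placeholders.

```python
import json
from pathlib import Path

root = Path("my_bids_dataset")
(root / "sub-01" / "anat").mkdir(parents=True, exist_ok=True)

description = {
    "Name": "Example study",       # required by BIDS
    "BIDSVersion": "1.9.0",        # required by BIDS
    "Authors": ["A. Researcher"],  # optional
}
(root / "dataset_description.json").write_text(json.dumps(description, indent=2))

# Imaging files then follow the sub-<label>/<datatype>/ naming pattern, e.g.:
# my_bids_dataset/sub-01/anat/sub-01_T1w.nii.gz
```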


Subject(s)
Metadata , Neuroimaging , Data Display , Data Analysis