1.
Proc Natl Acad Sci U S A ; 120(43): e2310223120, 2023 Oct 24.
Article in English | MEDLINE | ID: mdl-37844243

ABSTRACT

Physical laws, such as the laws of motion, gravity, electromagnetism, and thermodynamics, codify the general behavior of varied macroscopic natural systems across space and time. We propose that an additional, hitherto-unarticulated law is required to characterize familiar macroscopic phenomena of our complex, evolving universe. An important feature of the classical laws of physics is the conceptual equivalence of specific characteristics shared by an extensive, seemingly diverse body of natural phenomena. Identifying potential equivalencies among disparate phenomena (for example, falling apples and orbiting moons or hot objects and compressed springs) has been instrumental in advancing the scientific understanding of our world through the articulation of laws of nature. A pervasive wonder of the natural world is the evolution of varied systems, including stars, minerals, atmospheres, and life. These evolving systems appear to be conceptually equivalent in that they display three notable attributes: 1) They form from numerous components that have the potential to adopt combinatorially vast numbers of different configurations; 2) processes exist that generate numerous different configurations; and 3) configurations are preferentially selected based on function. We identify universal concepts of selection (static persistence, dynamic persistence, and novelty generation) that underpin function and drive systems to evolve through the exchange of information between the environment and the system. Accordingly, we propose a "law of increasing functional information": The functional information of a system will increase (i.e., the system will evolve) if many different configurations of the system undergo selection for one or more functions.
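The abstract does not define "functional information" quantitatively. A minimal sketch, assuming the definition used in earlier literature on the topic (the negative base-2 logarithm of the fraction of configurations that reach at least a given degree of function), might look as follows. The function name, the threshold parameter, and the sample values are all hypothetical and serve only as illustration.

```python
import math
from typing import Sequence


def functional_information(function_values: Sequence[float], threshold: float) -> float:
    """Return -log2 of the fraction of configurations with function >= threshold.

    Assumption: each entry is the measured degree of function of one
    sampled configuration; this definition is not stated in the abstract.
    """
    if not function_values:
        raise ValueError("need at least one configuration")
    hits = sum(1 for value in function_values if value >= threshold)
    if hits == 0:
        return math.inf  # no sampled configuration reaches the threshold
    return -math.log2(hits / len(function_values))


# Hypothetical degrees of function for eight sampled configurations.
sampled = [0.1, 0.2, 0.2, 0.4, 0.5, 0.7, 0.9, 1.0]
low_bar = functional_information(sampled, 0.1)   # every configuration qualifies
high_bar = functional_information(sampled, 0.9)  # only 2 of 8 qualify
```

Under this assumed definition, a stricter functional threshold is met by fewer configurations and therefore corresponds to more bits of functional information, which is one way to read the abstract's claim that selection for function increases it.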

2.
Methods Mol Biol ; 2703: 3-22, 2023.
Article in English | MEDLINE | ID: mdl-37646933

ABSTRACT

The FAIR data principle as a commitment to support long-term research data management is widely accepted in the scientific community. However, although many established infrastructures provide comprehensive and long-term stable services and platforms, a large quantity of research data is still hidden. Currently, high-throughput plant genomics and phenomics technologies are producing research data in abundance, the storage of which is not covered by established core databases. This concerns the data volume, for example, time series of images or high-resolution hyperspectral data; the quality of data formatting and annotation, e.g., with regard to structure and annotation specifications of core databases; uncovered data domains; or organizational constraints prohibiting primary data storage outside institutional boundaries. To share these potentially dark data in a FAIR way and master these challenges, the ELIXIR Germany/de.NBI service Plant Genomic and Phenomics Research Data Repository (PGP) implements an on-premise approach, which allows research data to be kept in place and wrapped in a FAIR-aware software infrastructure. In this chapter, the e!DAL infrastructure software and the PGP repository are presented as a best practice on how to easily set up FAIR-compliant and intuitive research data services.


Subject(s)
Genomics , Phenomics , Data Management , Databases, Factual , Germany
3.
J Integr Bioinform ; 19(4), 2022 Dec 01.
Article in English | MEDLINE | ID: mdl-36065132

ABSTRACT

In recent years, progress in data collection in the life sciences has created increasing demand and opportunities for advanced bioinformatics. This includes data management as well as the individual data analysis and often covers the entire data life cycle. A variety of tools have been developed to store, share, or reuse the data produced in the different domains such as genotyping. Imputation in particular, as a subfield of genotyping, requires good Research Data Management (RDM) strategies to enable use and re-use of genotypic data. To aim for sustainable software, it is necessary to develop tools and surrounding ecosystems that are reusable and maintainable. Reusability in the context of streamlined tools can be achieved, for example, by standardizing the input and output of the different tools and adapting to open and broadly used file formats. By using such established file formats, the tools can also be connected with others, improving the overall interoperability of the software. Finally, it is important to build strong communities that maintain the tools by developing and contributing new features and maintenance updates. In this article, concepts for this will be presented for an imputation service.


Subject(s)
Computational Biology , Ecosystem , Genotype , Software
4.
Plant J ; 111(2): 335-347, 2022 07.
Article in English | MEDLINE | ID: mdl-35535481

ABSTRACT

The research data life cycle from project planning to data publishing is an integral part of current research. Until the last decade, researchers were responsible for all associated phases in addition to the actual research and were assisted only at certain points by IT or bioinformaticians. Starting with advances in sequencing, the automation of analytical methods in all life science fields, including in plant phenotyping, has led to ever-increasing amounts of ever more complex data. The tasks associated with these challenges now often exceed the expertise of and infrastructure available to scientists, leading to an increased risk of data loss over time. The IPK Gatersleben has one of the world's largest germplasm collections and two decades of experience in crop plant research data management. In this article we show how challenges in modern, data-driven research can be addressed by data stewards. Based on concrete use cases, data management processes and best practices from plant phenotyping, we describe which expertise and skills are required and how data stewards as an integral actor can enhance the quality of a necessary digital transformation in progressive research.


Subject(s)
Big Data , Phenomics , Plants , Crops, Agricultural/genetics , Plants/genetics
5.
Front Plant Sci ; 12: 732608, 2021.
Article in English | MEDLINE | ID: mdl-34659298

ABSTRACT

Gene pairs resulting from whole genome duplication (WGD), so-called ohnologous genes, are retained if at least one member of the pair undergoes neo- or sub-functionalization. Phylogenetic analyses of the ohnologous genes ALBOSTRIANS (HvAST/HvCMF7) and ALBOSTRIANS-LIKE (HvASL/HvCMF3) of barley (Hordeum vulgare) revealed them as members of a subfamily of genes coding for CCT motif (CONSTANS, CONSTANS-LIKE and TIMING OF CAB1) proteins characterized by a single CCT domain and a putative N-terminal chloroplast transit peptide. Recently, we showed that HvCMF7 is needed for chloroplast ribosome biogenesis. Here we demonstrate that mutations in HvCMF3 lead to seedlings delayed in development. They exhibit a yellowish/light green (xantha) phenotype and successively develop pale green leaves. Compared to wild type, plastids of mutant seedlings show a decreased PSII efficiency, impaired processing and reduced amounts of ribosomal RNAs; they contain fewer thylakoids and grana with a higher number of more loosely stacked thylakoid membranes. Site-directed mutagenesis of HvCMF3 identified a previously unknown functional domain, which is highly conserved within this subfamily of CCT domain-containing proteins. HvCMF3:GFP fusion constructs were localized to plastids and nucleus. Hvcmf3Hvcmf7 double mutants exhibited a xantha-albino or albino phenotype depending on the strength of the molecular lesion of the HvCMF7 allele. The chloroplast ribosome deficiency is discussed as the primary observed defect of the Hvcmf3 mutants. Based on our observations, the genes HvCMF3 and HvCMF7 have similar but not identical functions in chloroplast development of barley, supporting our hypothesis of neo-/sub-functionalization between both ohnologous genes.

6.
Brief Bioinform ; 22(5), 2021 09 02.
Article in English | MEDLINE | ID: mdl-33589928

ABSTRACT

This article describes use case studies and self-assessments of the FAIR status of de.NBI services to illustrate the challenges and requirements involved in adhering to the FAIR (findable, accessible, interoperable and reusable) data principles in a large distributed bioinformatics infrastructure. We address the challenge of heterogeneity of wet lab technologies, data, metadata, software, computational workflows and the levels of implementation and monitoring of FAIR principles within the different bioinformatics sub-disciplines joined in de.NBI. On the one hand, this broad service landscape and the excellent network of experts are a strong basis for the development of useful research data management plans. On the other hand, the large number of tools and techniques maintained by distributed teams renders FAIR compliance challenging.


Subject(s)
Data Management/methods , Metadata , Neural Networks, Computer , Proteomics/methods , Software , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , International Cooperation , Phenotype , Plants/genetics , Proteome , Self-Assessment , Workflow
7.
Gigascience ; 9(10), 2020 10 22.
Article in English | MEDLINE | ID: mdl-33090199

ABSTRACT

BACKGROUND: The FAIR data principle as a commitment to support long-term research data management is widely accepted in the scientific community. Although the ELIXIR Core Data Resources and other established infrastructures provide comprehensive and long-term stable services and platforms for FAIR data management, a large quantity of research data is still hidden or at risk of getting lost. Currently, high-throughput plant genomics and phenomics technologies are producing research data in abundance, the storage of which is not covered by established core databases. This concerns the data volume, e.g., time series of images or high-resolution hyper-spectral data; the quality of data formatting and annotation, e.g., with regard to structure and annotation specifications of core databases; uncovered data domains; or organizational constraints prohibiting primary data storage outside institutional boundaries. RESULTS: To share these potentially dark data in a FAIR way and master these challenges, the ELIXIR Germany/de.NBI service Plant Genomic and Phenomics Research Data Repository (PGP) implements a "bring the infrastructure to the data" approach, which allows research data to be kept in place and wrapped in a FAIR-aware software infrastructure. This article presents new features of the e!DAL infrastructure software and the PGP repository as a best practice on how to easily set up FAIR-compliant and intuitive research data services. Furthermore, the integration of the ELIXIR Authentication and Authorization Infrastructure (AAI) and data discovery services are introduced as means to lower technical barriers and to increase the visibility of research data. CONCLUSION: The e!DAL software has matured into a powerful and FAIR-compliant infrastructure, while keeping the focus on flexible setup and integration into existing infrastructures and into the daily research process.


Subject(s)
Information Dissemination , Software , Databases, Factual , Genomics , Plants
8.
Front Plant Sci ; 11: 701, 2020.
Article in English | MEDLINE | ID: mdl-32595658

ABSTRACT

Genebanks harbor a large treasure trove of untapped plant genetic diversity. A growing world population and a changing climate require an increase in the production and development of stress resistant plant cultivars while decreasing the acreage. These requirements for improved plant cultivars can be supported by the broader exploitation of plant genetic resources (PGR) as inputs for genomics-assisted breeding. To support this process we have developed BRIDGE, a data warehouse and exploratory data analysis tool for genebank genomics of barley (Hordeum vulgare L.). Using efficient technologies for data storage, data transfer and web development, we facilitate access to digital genebank resources of barley by prioritizing the interactive and visual analysis of integrated genotypic and phenotypic data. The underlying data resulted from a barley genebank genomics study cataloging sequence and morphological data of 22,626 barley accessions, mainly from the German Federal ex situ genebank. BRIDGE consists of interactively coupled modules to visualize integrated, curated and quality checked data, such as variation data, results of dimensionality reduction and genome wide association studies (GWAS), phenotyping results, passport data as well as the geographic distribution of germplasm samples. The core component is a manager for custom collections of germplasm. A search module to find and select germplasm by passport and phenotypic attributes is included as well as modules to export genotypic data in gzip-compressed variant call format (VCF) files and phenotypic data in MIAPPE-compliant ISA-Tab files. BRIDGE is accessible at the following URL: https://bridge.ipk-gatersleben.de.
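The abstract mentions export of genotypic data as gzip-compressed VCF files. As a rough illustration of how such an export could be consumed downstream, the sketch below reads a gzip-compressed VCF stream with the Python standard library and reports the sample names and the number of variant records. The function name and the tiny in-memory file are made up for this example; they are not part of BRIDGE, and real exports would be read from disk instead.

```python
import gzip
import io
from typing import BinaryIO, List, Tuple


def summarize_vcf(stream: BinaryIO) -> Tuple[List[str], int]:
    """Return (sample names, number of variant records) from a gzip-compressed VCF."""
    samples: List[str] = []
    n_variants = 0
    with gzip.open(stream, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            if line.startswith("#CHROM"):
                # Eight fixed columns (CHROM..INFO) plus FORMAT; samples follow.
                samples = line.rstrip("\n").split("\t")[9:]
                continue
            if line.strip():
                n_variants += 1
    return samples, n_variants


# Tiny in-memory example with made-up accessions and variants.
vcf_text = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tACC_1\tACC_2\n"
    "chr1H\t1042\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t1/1\n"
    "chr2H\t58817\t.\tC\tT\t.\tPASS\t.\tGT\t0/1\t0/0\n"
)
compressed = io.BytesIO(gzip.compress(vcf_text.encode("utf-8")))
samples, n_variants = summarize_vcf(compressed)
```

Reading line by line keeps memory use flat, which matters for exports covering thousands of accessions.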

9.
New Phytol ; 227(1): 260-273, 2020 07.
Article in English | MEDLINE | ID: mdl-32171029

ABSTRACT

Enabling data reuse and knowledge discovery is increasingly critical in modern science, and requires an effort towards standardising data publication practices. This is particularly challenging in the plant phenotyping domain, due to its complexity and heterogeneity. We have produced the MIAPPE 1.1 release, which enhances the existing MIAPPE standard in coverage, to support perennial plants, in structure, through an explicit data model, and in clarity, through definitions and examples. We evaluated MIAPPE 1.1 by using it to express several heterogeneous phenotyping experiments in a range of different formats, to demonstrate its applicability and the interoperability between the various implementations. Furthermore, the extended coverage is demonstrated by the fact that one of the datasets could not have been described under MIAPPE 1.0. MIAPPE 1.1 marks a major step towards enabling plant phenotyping data reusability, thanks to its extended coverage, and especially the formalisation of its data model, which facilitates its implementation in different formats. Community feedback has been critical to this development, and will be a key part of ensuring adoption of the standard.


Subject(s)
Phenomics , Plants , Plants/genetics
10.
J Integr Bioinform ; 16(4), 2020 Jan 08.
Article in English | MEDLINE | ID: mdl-31913851

ABSTRACT

Genetic variance within the genotypes of a population and its mapping to phenotypic variance in a systematic, high-throughput manner is of interest for biodiversity and breeding research. Besides the established and efficient high-throughput genotyping technologies, phenotyping capabilities have received increased attention in the last decade. This has resulted in a growing amount of phenotype data from well-scaling, automated sensor platforms. Data stewardship is thus a central component in making experimental data from multiple domains interoperable and reusable. To ensure standardized and comprehensive sharing of scientific and experimental data among domain experts, the FAIR data principles are applied for machine readability and scalability. In this context, the BrAPI consortium provides comprehensive and commonly agreed FAIR guidelines for offering scientific data through a BrAPI layer in a RESTful manner. This paper presents the concepts, best practices and implementations to meet these challenges. As one of the world's leading plant research institutes, the IPK-Gatersleben has a vital interest in transforming legacy data infrastructures into a bio-digital resource center for plant genetic resources (PGR). This paper also demonstrates the benefits of integrated database back-ends, established data stewardship processes, and FAIR data exposition through machine-readable, highly scalable programmatic interfaces.


Subject(s)
Databases, Genetic , Plants/genetics , Programming Languages , Computational Biology , Information Management , Internet , Phenotype , Plant Breeding , Seeds/genetics , User-Computer Interface
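The RESTful, machine-readable access described in the abstract above can be sketched from the client side. The example below builds a paginated germplasm query and unpacks a BrAPI-style JSON envelope (a `metadata.pagination` block plus `result.data`). The server address and the response content are hypothetical and used only for illustration; no network call is made, and the abstract itself does not specify any endpoint.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlencode

# Hypothetical server address; substitute a real BrAPI endpoint.
BASE_URL = "https://brapi.example.org/brapi/v2"


def germplasm_url(page: int, page_size: int, crop: str) -> str:
    """Build a paginated germplasm query URL."""
    query = urlencode({"commonCropName": crop, "page": page, "pageSize": page_size})
    return f"{BASE_URL}/germplasm?{query}"


def extract_records(payload: str) -> List[Dict[str, Any]]:
    """Pull the record list out of a BrAPI-style response envelope."""
    return json.loads(payload)["result"]["data"]


def has_more_pages(payload: str) -> bool:
    """Check the pagination block (pages are zero-based)."""
    pagination = json.loads(payload)["metadata"]["pagination"]
    return pagination["currentPage"] + 1 < pagination["totalPages"]


# Made-up response body standing in for what a server might return.
example_response = json.dumps({
    "metadata": {"pagination": {"currentPage": 0, "pageSize": 2,
                                "totalCount": 5, "totalPages": 3}},
    "result": {"data": [
        {"germplasmDbId": "g1", "germplasmName": "Accession A"},
        {"germplasmDbId": "g2", "germplasmName": "Accession B"},
    ]},
})

url = germplasm_url(page=0, page_size=2, crop="barley")
records = extract_records(example_response)
more = has_more_pages(example_response)
```

A client would typically loop, incrementing `page` until `has_more_pages` returns false.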
11.
F1000Res ; 9, 2020.
Article in English | MEDLINE | ID: mdl-33728038

ABSTRACT

Experimental data is only useful to other researchers if it is findable, accessible, interoperable, and reusable (FAIR). The ISA-Tab framework enables scientists to publish metadata about their experiments in a plain text, machine-readable format that aims to confer that interoperability and reusability. A Python software package (isatools) is currently being developed to programmatically produce these metadata files. For Java-based environments, there is no equivalent solution yet. While the isatools package provides a lot of flexibility and a wealth of different features for the Python ecosystem, a package for JVM-based applications might offer the speed and scalability needed for writing very large ISA-Tab files, making the ISA framework available in an even wider range of situations and environments. Here we present a lightweight and scalable Java library (isa4j) for generating metadata files in the ISA-Tab format, which integrates into existing JVM applications and is particularly well suited to generating very large files. It is modeled after the ISA core specifications and designed in keeping with isatools conventions, making it consistent and intuitive to use for the community. isa4j is implemented in Java (JDK11+) and freely available under the terms of the MIT license from the Central Maven Repository (https://mvnrepository.com/artifact/de.ipk-gatersleben/isa4j). The source code, detailed documentation, usage examples and performance evaluations can be found at https://github.com/IPK-BIT/isa4j.


Subject(s)
Metadata , Software , Writing
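The point made in the abstract above about writing very large ISA-Tab files comes down to streaming rows rather than building the whole table in memory. The sketch below illustrates that idea with the Python standard library only. It does not use the isatools or isa4j APIs, the function names are hypothetical, and the three column headers are a simplified selection of ISA-Tab study-file columns.

```python
import csv
import io
from typing import Iterable, Iterator, TextIO, Tuple

# Column headers follow ISA-Tab study-file naming; the selection is illustrative.
HEADER = ["Source Name", "Characteristics[Organism]", "Sample Name"]


def make_rows(n: int) -> Iterator[Tuple[str, str, str]]:
    """Yield hypothetical source/sample rows lazily, one at a time."""
    for i in range(1, n + 1):
        yield (f"plant_{i}", "Hordeum vulgare", f"sample_{i}")


def write_study_table(rows: Iterable[Tuple[str, str, str]], out: TextIO) -> int:
    """Stream rows to a tab-delimited table and return the number written."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


buffer = io.StringIO()
written = write_study_table(make_rows(3), buffer)
lines = buffer.getvalue().splitlines()
```

Because `make_rows` is a generator, the same function can write millions of rows to a file handle without holding them all in memory at once.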
12.
Sci Data ; 6(1): 137, 2019 07 29.
Article in English | MEDLINE | ID: mdl-31358775

ABSTRACT

Genebanks are valuable sources of genetic diversity, which can help to cope with future problems of global food security caused by a continuously growing population, stagnating yields and climate change. However, the scarcity of phenotypic and genotypic characterization of genebank accessions severely restricts their use in plant breeding. To ensure the seed integrity of individual accessions, phenotypic characterizations are performed during periodic regeneration cycles in the field. This study provides non-orthogonal historical data of 12,754 spring and winter wheat accessions characterized for flowering time, plant height, and thousand grain weight during 70 years of seed regeneration at the German genebank. Supported by historical weather observations, outliers were removed following a previously described quality assessment pipeline. In this way, ready-to-use processed phenotypic data across regeneration years were generated and further validated. We encourage international and national genebanks to increase their efforts to transform into bio-digital resource centers. A first important step could be to unlock their historical data treasures, allowing scientists and breeders to make an educated choice of accessions.


Subject(s)
Seeds/genetics , Triticum/genetics , Conservation of Natural Resources , Crops, Agricultural/genetics , Models, Statistical , Phenotype , Seed Bank , Weather
13.
Plant J ; 97(1): 182-198, 2019 01.
Article in English | MEDLINE | ID: mdl-30500991

ABSTRACT

Recent advances in genomics technologies have greatly accelerated the progress in both fundamental plant science and applied breeding research. Concurrently, high-throughput plant phenotyping is becoming widely adopted in the plant community, promising to alleviate the phenotypic bottleneck. While these technological breakthroughs are significantly accelerating quantitative trait locus (QTL) and causal gene identification, challenges to enable even more sophisticated analyses remain. In particular, care needs to be taken to standardize, describe and conduct experiments robustly while relying on plant physiology expertise. In this article, we review the state of the art regarding genome assembly and the future potential of pangenomics in plant research. We also describe the necessity of standardizing and describing phenotypic studies using the Minimum Information About a Plant Phenotyping Experiment (MIAPPE) standard to enable the reuse and integration of phenotypic data. In addition, we show how deep phenotypic data might yield novel trait-trait correlations and review how to link phenotypic data to genomic data. Finally, we provide perspectives on the golden future of machine learning and its potential in linking phenotypes to genomic features.


Subject(s)
Genetic Association Studies , Genome, Plant/genetics , Genomics , Machine Learning , Phenomics , Plants/genetics , Phenotype , Quantitative Trait Loci/genetics
14.
Sci Data ; 5: 180278, 2018 12 04.
Article in English | MEDLINE | ID: mdl-30512010

ABSTRACT

The scarce knowledge on phenotypic characterization restricts the usage of genetic diversity of plant genetic resources in research and breeding. We describe original and ready-to-use processed data for approximately 60% of ~22,000 barley accessions hosted at the Federal ex situ Genebank for Agricultural and Horticultural Plant Species. The dataset gathers records for three traits with agronomic relevance: flowering time, plant height and thousand grain weight. This information was collected for seven decades for winter and spring barley during the seed regeneration routine. The curated data represent a source for research on genetics and genomics of adaptive and yield related traits in cereals due to the importance of barley as model organism. This data could be used to predict the performance of non-phenotyped individuals in other collections through genomic prediction. Moreover, the dataset empowers the utilization of phenotypic diversity of genetic resources for crop improvement.


Subject(s)
Genetic Variation , Hordeum/genetics , Biological Variation, Population , Hordeum/growth & development , Seeds
15.
Gigascience ; 7(2), 2018 02 01.
Article in English | MEDLINE | ID: mdl-29346559

ABSTRACT

Background: Image-based high-throughput phenotyping technologies have been rapidly developed in plant science recently, and they provide a great potential to gain more valuable information than traditionally destructive methods. Predicting plant biomass is regarded as a key purpose for plant breeders and ecologists. However, it is a great challenge to find a predictive biomass model across experiments. Results: In the present study, we constructed 4 predictive models to examine the quantitative relationship between image-based features and plant biomass accumulation. Our methodology has been applied to 3 consecutive barley (Hordeum vulgare) experiments with control and stress treatments. The results proved that plant biomass can be accurately predicted from image-based parameters using a random forest model. The high prediction accuracy based on this model will contribute to relieving the phenotyping bottleneck in biomass measurement in breeding applications. The prediction performance is still relatively high across experiments under similar conditions. The relative contribution of individual features for predicting biomass was further quantified, revealing new insights into the phenotypic determinants of the plant biomass outcome. Furthermore, methods could also be used to determine the most important image-based features related to plant biomass accumulation, which would be promising for subsequent genetic mapping to uncover the genetic basis of biomass. Conclusions: We have developed quantitative models to accurately predict plant biomass accumulation from image data. We anticipate that the analysis results will be useful to advance our views of the phenotypic determinants of plant biomass outcome, and the statistical methods can be broadly used for other plant species.


Subject(s)
Crops, Agricultural/anatomy & histology , Decision Trees , Hordeum/anatomy & histology , Image Processing, Computer-Assisted/statistics & numerical data , Imaging, Three-Dimensional/methods , Algorithms , Biomass , Crops, Agricultural/physiology , Droughts , Hordeum/physiology , Phenotype , Stress, Physiological
16.
J Biotechnol ; 261: 37-45, 2017 Nov 10.
Article in English | MEDLINE | ID: mdl-28698099

ABSTRACT

Plant genetic resources are a substantial opportunity for plant breeding, preservation and maintenance of biological diversity. As part of the German Network for Bioinformatics Infrastructure (de.NBI) the German Crop BioGreenformatics Network (GCBN) focuses mainly on crop plants and provides both data and software infrastructure which are tailored to the needs of the plant research community. Our mission and key objectives include: (1) provision of transparent access to germplasm seeds, (2) the delivery of improved workflows for plant gene annotation, and (3) implementation of bioinformatics services that link genotypes and phenotypes. This review introduces the GCBN's spectrum of web-services and integrated data resources that address common research problems in the plant genomics community.


Subject(s)
Genome, Plant/genetics , Genomics , Plants/genetics , Databases, Genetic , Genotype , Phenotype , Software
17.
J Biotechnol ; 261: 46-52, 2017 Nov 10.
Article in English | MEDLINE | ID: mdl-28602791

ABSTRACT

Recent advances in sequencing technologies have greatly accelerated the rate of plant genome and applied breeding research. Despite this advancing trend, plant genomes continue to present numerous difficulties to the standard tools and pipelines not only for genome assembly but also gene annotation and downstream analysis. Here we give a perspective on tools, resources and services necessary to assemble and analyze plant genomes and link them to plant phenotypes.


Subject(s)
Crops, Agricultural/genetics , Genome, Plant/genetics , Genomics , Molecular Sequence Annotation , Phenotype
18.
Plant Methods ; 12: 44, 2016.
Article in English | MEDLINE | ID: mdl-27843484

ABSTRACT

BACKGROUND: Plant phenotypic data shrouds a wealth of information which, when accurately analysed and linked to other data types, brings to light the knowledge about the mechanisms of life. As phenotyping is a field of research comprising manifold, diverse and time-consuming experiments, the findings can be fostered by reusing and combining existing datasets. Their correct interpretation, and thus replicability, comparability and interoperability, is possible provided that the collected observations are equipped with an adequate set of metadata. So far there have been no common standards governing phenotypic data description, which has hampered data exchange and reuse. RESULTS: In this paper we propose guidelines for proper handling of the information about plant phenotyping experiments, in terms of both the recommended content of the description and its formatting. We provide a document called "Minimum Information About a Plant Phenotyping Experiment", which specifies what information about each experiment should be given, and a Phenotyping Configuration for the ISA-Tab format, which allows this information to be organised in practice within a dataset. We provide examples of ISA-Tab-formatted phenotypic data, and a general description of a few systems where the recommendations have been implemented. CONCLUSIONS: Acceptance of the rules described in this paper by the plant phenotyping community will help to achieve findable, accessible, interoperable and reusable data.

19.
Plant Genome ; 9(1), 2016 03.
Article in English | MEDLINE | ID: mdl-27898761

ABSTRACT

The genome sequences of many important Triticeae species, including bread wheat (Triticum aestivum L.) and barley (Hordeum vulgare L.), remained uncharacterized for a long time because of their high repeat content, large sizes, and polyploidy. As a result of improvements in sequencing technologies and novel analysis strategies, several of these have recently been deciphered. These efforts have generated new insights into Triticeae biology and genome organization and have important implications for downstream usage by breeders, experimental biologists, and comparative genomicists. transPLANT is an EU-funded project aimed at constructing hardware, software, and data infrastructure for genome-scale research in the life sciences. Since the Triticeae data are intrinsically complex, heterogeneous, and distributed, the transPLANT consortium has undertaken efforts to develop common data formats and tools that enable the exchange and integration of data from distributed resources. Here we present an overview of the individual Triticeae genome resources hosted by transPLANT partners, introduce the objectives of transPLANT, and outline common developments and interfaces supporting integrated data access.


Subject(s)
Genome, Plant , Genomics/methods , Poaceae/genetics , Evolution, Molecular , Hordeum/genetics , Polyploidy , Triticum/genetics
20.
Sci Data ; 3: 160055, 2016 Aug 16.
Article in English | MEDLINE | ID: mdl-27529152

ABSTRACT

With the implementation of novel automated, high-throughput methods and facilities in recent years, plant phenomics has developed into a highly interdisciplinary research domain integrating biology, engineering and bioinformatics. Here we present a dataset of a non-invasive high-throughput plant phenotyping experiment, which uses imaging and image analysis-based approaches to monitor the growth and development of 484 Arabidopsis thaliana plants (thale cress). The result is a comprehensive dataset of images and extracted phenotypical features. Such datasets require detailed documentation, standardized description of experimental metadata as well as sustainable data storage and publication in order to ensure the reproducibility of experiments, data reuse and comparability among the scientific community. Therefore, the dataset presented here has been annotated using the standardized ISA-Tab format, taking into account the recently published recommendations for the semantic description of plant phenotyping experiments.


Subject(s)
Arabidopsis/genetics , Phenotype , Arabidopsis Proteins , Computational Biology , Genome, Plant , Genomics , Growth and Development , Image Processing, Computer-Assisted , Information Storage and Retrieval , Plant Development , Plant Leaves , Plant Roots , Plant Shoots , Plants , Reproducibility of Results , Software