1.
Bioinformatics ; 39(6), 2023 06 01.
Article in English | MEDLINE | ID: mdl-37220903

ABSTRACT

MOTIVATION: Developing new crop varieties with superior performance is highly important to ensure robust and sustainable global food security. The speed of variety development is limited by long field cycles and advanced generation selections in plant breeding programs. While methods to predict yield from genotype or phenotype data have been proposed, improved performance and integrated models are needed.

RESULTS: We propose a machine learning model that leverages both genotype and phenotype measurements by fusing genetic variants with multiple data sources collected by unmanned aerial systems. We use a deep multiple instance learning framework with an attention mechanism that sheds light on the importance given to each input during prediction, enhancing interpretability. Our model reaches 0.754 ± 0.024 Pearson correlation coefficient when predicting yield in similar environmental conditions; a 34.8% improvement over the genotype-only linear baseline (0.559 ± 0.050). We further predict yield on new lines in an unseen environment using only genotypes, obtaining a prediction accuracy of 0.386 ± 0.010, a 13.5% improvement over the linear baseline. Our multi-modal deep learning architecture efficiently accounts for plant health and environment, distilling the genetic contribution and providing excellent predictions. Yield prediction algorithms leveraging phenotypic observations during training therefore promise to improve breeding programs, ultimately speeding up delivery of improved varieties.

AVAILABILITY AND IMPLEMENTATION: Available at https://github.com/BorgwardtLab/PheGeMIL (code) and https://doi.org/doi:10.5061/dryad.kprr4xh5p (data).


Subject(s)
Deep Learning , Phenomics , Triticum/genetics , Plant Breeding/methods , Selection, Genetic , Phenotype , Genotype , Genomics/methods , Edible Grain/genetics
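
The attention mechanism mentioned in the abstract above can be illustrated with a minimal sketch of attention-based multiple instance pooling. This is not the authors' implementation (their code is in the linked repository); it assumes a common formulation in which each input embedding is scored by a small tanh layer, the scores are normalised with a softmax, and the bag representation is the weighted sum. The function name `attention_mil_pool` and the parameters `V` and `w` are hypothetical.

```python
import math
import random


def attention_mil_pool(instances, V, w):
    """Pool a bag of instance embeddings into one vector.

    instances: list of embeddings, each a list of length d
    V:         d x h projection matrix (hypothetical learned parameter)
    w:         scoring vector of length h (hypothetical learned parameter)
    Returns (bag_embedding, attention_weights).
    """
    scores = []
    for h_k in instances:
        # Hidden representation tanh(h_k V), one value per hidden unit.
        hidden = [
            math.tanh(sum(x * V[i][j] for i, x in enumerate(h_k)))
            for j in range(len(w))
        ]
        scores.append(sum(w_j * z_j for w_j, z_j in zip(w, hidden)))

    # Numerically stable softmax over the instances in the bag.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Bag embedding is the attention-weighted sum of instance embeddings.
    dim = len(instances[0])
    bag = [sum(a * h_k[d] for a, h_k in zip(weights, instances)) for d in range(dim)]
    return bag, weights


# Toy bag: 4 inputs (e.g. one genotype embedding and three image embeddings,
# purely illustrative), each with 6 dimensions.
rng = random.Random(0)
instances = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(4)]
V = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(6)]
w = [rng.gauss(0, 1) for _ in range(3)]

bag, weights = attention_mil_pool(instances, V, w)
print("attention weights:", [round(a, 3) for a in weights])
```

The weights sum to one across the bag, so they can be read as the relative importance assigned to each input for a given prediction, which is the interpretability property the abstract refers to.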
2.
Plant J ; 112(3): 772-785, 2022 11.
Article in English | MEDLINE | ID: mdl-36106415

ABSTRACT

Evolutionary change following gene duplication can lead to functionally divergent paralogous proteins. If comprising identical subunits, their random assortment would also form potentially detrimental heteromeric proteins. In Arabidopsis, the ARF GTPase guanine-nucleotide exchange factor GNOM is essential for polar recycling of auxin-efflux transporter PIN1 from endosomes to the basal plasma membrane, whereas its paralog GNL1 mediates retrograde Golgi-endoplasmic reticulum traffic. Here we show that both GNOM and GNL1 form homodimers but no heterodimers. To assess the biological significance of this, we generated transgenic plants expressing engineered heterodimer-compatible GNOM variants. Those plants showed developmental defects such as the failure to produce lateral roots. To identify mechanisms underlying heterodimer prevention, we analyzed interactions of the N-terminal dimerization and cyclophilin-binding (DCB) domain. Each DCB domain interacted with the complementary fragment (ΔDCB) both of their own and of the paralogous protein. However, only DCBGNOM interacted with itself, whereas DCBGNL1 failed to interact with itself and with DCBGNOM. GNOM variants in which the DCB domain was removed or replaced by DCBGNL1 revealed a role for DCB-DCB interaction in the prevention of GNOM-GNL1 heterodimers, whereas DCB-ΔDCB interaction was essential for dimer formation and GNOM function. Our data suggest a model of early DCB-DCB interaction that facilitates GNOM homodimer formation, indirectly precluding formation of detrimental heterodimers.


Subject(s)
Arabidopsis Proteins , Arabidopsis , Arabidopsis/genetics , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Dimerization , Guanine Nucleotide Exchange Factors/genetics , Guanine Nucleotide Exchange Factors/metabolism , Golgi Apparatus/metabolism , Peptidylprolyl Isomerase/metabolism
3.
Bioinformatics ; 38(13): 3454-3461, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35639661

ABSTRACT

MOTIVATION: Protein design has become increasingly important for medical and biotechnological applications. Because of the complex mechanisms underlying protein formation, the creation of a novel protein requires tedious and time-consuming computational or experimental protocols. At the same time, machine learning has enabled the solving of complex problems by leveraging large amounts of available data, more recently with great improvements on the domain of generative modeling. Yet, generative models have mainly been applied to specific sub-problems of protein design.

RESULTS: Here, we approach the problem of general-purpose protein design conditioned on functional labels of the hierarchical Gene Ontology. Since a canonical way to evaluate generative models in this domain is missing, we devise an evaluation scheme of several biologically and statistically inspired metrics. We then develop the conditional generative adversarial network ProteoGAN and show that it outperforms several classic and more recent deep-learning baselines for protein sequence generation. We further give insights into the model by analyzing hyperparameters and ablation baselines. Lastly, we hypothesize that a functionally conditional model could generate proteins with novel functions by combining labels and provide first steps into this direction of research.

AVAILABILITY AND IMPLEMENTATION: The code and data underlying this article are available on GitHub at https://github.com/timkucera/proteogan, and can be accessed with doi:10.5281/zenodo.6591379.

SUPPLEMENTARY INFORMATION: Supplemental data are available at Bioinformatics online.


Subject(s)
Machine Learning , Proteins , Proteins/metabolism , Gene Ontology
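
To make "conditioned on functional labels of the hierarchical Gene Ontology" in the abstract above more concrete, the following is a rough sketch of how a conditioning vector for a generator might be built. It is not taken from ProteoGAN. It assumes that labels are propagated to their ancestors in the hierarchy, encoded as a multi-hot vector, and concatenated with a noise vector. The hierarchy `PARENTS`, its term names, and all function names are hypothetical toy placeholders, not real Gene Ontology terms.

```python
import random

# Hypothetical toy hierarchy (child -> list of parents); not real GO terms.
PARENTS = {
    "root": [],
    "a": ["root"],
    "a1": ["a"],
    "b": ["root"],
}
TERMS = sorted(PARENTS)  # fixed term order for the multi-hot encoding


def with_ancestors(labels):
    """Return the given labels plus every ancestor reachable in PARENTS."""
    seen = set()
    stack = list(labels)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS[term])
    return seen


def condition_vector(labels):
    """Multi-hot encoding of the labels after ancestor propagation."""
    active = with_ancestors(labels)
    return [1.0 if term in active else 0.0 for term in TERMS]


def generator_input(labels, noise_dim=4, seed=0):
    """Concatenate Gaussian noise with the condition vector.

    Concatenation is one simple way to condition a generator; it is an
    assumption here, not a description of the published architecture.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + condition_vector(labels)


print(TERMS)
print(condition_vector(["a1"]))
print(len(generator_input(["a1", "b"])))
```

Combining labels, as the abstract hypothesizes for generating novel functions, corresponds in this sketch to passing more than one term to `condition_vector`, which activates the union of the terms and their ancestors.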