Results 1 - 4 of 4

1.
bioRxiv ; 2024 Feb 17.
Article in English | MEDLINE | ID: mdl-38405704

ABSTRACT

Neural networks have emerged as immensely powerful tools for predicting functional genomic regions, as evidenced notably by recent successes in deciphering gene regulatory logic. However, a systematic evaluation of how model architectures and training strategies impact genomics model performance is lacking. To address this gap, we held a DREAM Challenge where competitors trained models on a dataset of millions of random promoter DNA sequences and corresponding expression levels, experimentally determined in yeast, to best capture the relationship between regulatory DNA and gene expression. For a robust evaluation of the models, we designed a comprehensive suite of benchmarks encompassing various sequence types. While some benchmarks produced similar results across the top-performing models, others differed substantially. All top-performing models used neural networks but diverged in their architectures and in novel training strategies tailored to genomic sequence data. To dissect how architectural and training choices impact performance, we developed the Prix Fixe framework to divide any given model into logically equivalent building blocks. We tested all possible combinations for the top three models and observed performance improvements for each. The DREAM Challenge models not only achieved state-of-the-art results on our comprehensive yeast dataset but also consistently surpassed existing benchmarks on Drosophila and human genomic datasets. Overall, we demonstrate that high-quality gold-standard genomics datasets can drive significant progress in model development.
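The idea of splitting each top model into interchangeable building blocks and testing every combination can be illustrated with a small bookkeeping sketch. The slot names (`first_layers`, `core_layers`, `final_layers`) and the model labels are hypothetical placeholders, not the Prix Fixe framework's actual API; a real implementation would also need each block's input and output shapes to be compatible.

```python
from itertools import product

# Hypothetical slot names; the actual Prix Fixe framework may divide models differently.
SLOTS = ("first_layers", "core_layers", "final_layers")
# Hypothetical labels standing in for the top three challenge models.
MODELS = ("model_A", "model_B", "model_C")


def all_combinations(models=MODELS, slots=SLOTS):
    """Return every assignment of a source model to each block slot."""
    return [dict(zip(slots, choice)) for choice in product(models, repeat=len(slots))]


def is_original(combo):
    """A combination taking every block from the same model reproduces that model."""
    return len(set(combo.values())) == 1


if __name__ == "__main__":
    combos = all_combinations()
    hybrids = [c for c in combos if not is_original(c)]
    print(len(combos), "combinations,", len(hybrids), "of them hybrids")
```

Under these assumptions, three models and three slots give 3^3 = 27 combinations: the 3 original models plus 24 hybrids, each of which would then be trained and scored on the same benchmarks.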

2.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37490428

ABSTRACT

MOTIVATION: The increasing volume of data from high-throughput experiments, including parallel reporter assays, facilitates the development of complex deep-learning approaches for modeling DNA regulatory grammar. RESULTS: Here, we introduce LegNet, an EfficientNetV2-inspired convolutional network for modeling short gene regulatory regions. By approaching the sequence-to-expression regression problem as a soft classification task, LegNet secured first place for the autosome.org team in the DREAM 2022 challenge of predicting gene expression from gigantic parallel reporter assays. Using published data, we demonstrate that LegNet outperforms existing models and accurately predicts both gene expression per se and the effects of single-nucleotide variants. Furthermore, we show how LegNet can be used in the manner of a diffusion network for the rational design of promoter sequences yielding the desired expression level. AVAILABILITY AND IMPLEMENTATION: https://github.com/autosome-ru/LegNet. The GitHub repository includes Jupyter Notebook tutorials and Python scripts, released under the MIT license, to reproduce the results presented in the study.


Subject(s)
Deep Learning; Regulatory Sequences, Nucleic Acid; DNA; Promoter Regions, Genetic; Software
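Treating sequence-to-expression regression as soft classification can be sketched as follows. This is a minimal illustration under stated assumptions: expression values are taken to lie on a 0 to 17 scale split into 18 unit-width bins, the training target is a normal distribution centered on the measured value (sigma = 0.5) discretized over those bins, the loss is a KL divergence, and the scalar prediction is the expected bin value under the network's softmax output. These choices and all function names are illustrative assumptions, not LegNet's published code.

```python
import math

N_BINS = 18   # assumed number of expression bins (0, 1, ..., 17)
SIGMA = 0.5   # assumed spread of the soft target around the measured value


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def soft_target(y, n_bins=N_BINS, sigma=SIGMA):
    """Spread a scalar expression value y over integer bins as a discretized normal.

    Bin k covers [k - 0.5, k + 0.5); the first and last bins absorb the tails,
    so the probabilities sum to 1.
    """
    probs = []
    for k in range(n_bins):
        lower = 0.0 if k == 0 else normal_cdf(k - 0.5, y, sigma)
        upper = 1.0 if k == n_bins - 1 else normal_cdf(k + 0.5, y, sigma)
        probs.append(upper - lower)
    return probs


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_value(probs):
    """Collapse a distribution over bins back to a scalar expression prediction."""
    return sum(k * p for k, p in enumerate(probs))


def kl_divergence(p, q):
    """KL(p || q) between a soft target p and a predicted distribution q."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi <= 0:
                return math.inf  # p puts mass where q puts none
            total += pi * math.log(pi / qi)
    return total


if __name__ == "__main__":
    target = soft_target(4.2)
    print("most likely bin:", max(range(N_BINS), key=target.__getitem__))
    print("expected value:", expected_value(target))
```

With these assumptions, a measured value of 4.2 places most of its mass on bin 4 and a smaller share on bin 5. At inference time, the softmax output would be collapsed back to a scalar with `expected_value`. One plausible motivation for this framing is that neighboring bins share credit instead of being treated as unrelated classes.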
3.
J Exp Bot ; 73(7): 2021-2034, 2022 04 05.
Article in English | MEDLINE | ID: mdl-34940828

ABSTRACT

C4 photosynthesis increases the efficiency of carbon fixation by spatially separating high concentrations of molecular oxygen from Rubisco. The specialized leaf anatomy required for this separation evolved independently many times. The morphology of C4 root systems is also distinctive and adapted to support high rates of photosynthesis; however, little is known about the molecular mechanisms that have driven the evolution of C4 root system architecture. Using a mutant screen in the C4 model plant Setaria italica, we identify Siaux1-1 and Siaux1-2 as root system architecture mutants. Unlike in S. viridis, AUX1 promotes lateral root development in S. italica. A cell-by-cell analysis of the Siaux1-1 root apical meristem revealed changes in the distribution of cell volumes in all cell layers and a dependence of the frequency of protophloem and protoxylem strands on SiAUX1. We explore the molecular basis of the role of SiAUX1 in seedling development using an RNA-seq analysis of wild-type and Siaux1-1 plants and present novel targets for SiAUX1-dependent gene regulation. Using selective sweep and haplotype analyses of SiAUX1, we show that Hap-2412TT in the promoter region of SiAUX1 is an allele that is associated with lateral root number and has been strongly selected for during Setaria domestication.


Subject(s)
Setaria Plant; Domestication; Photosynthesis; Plant Leaves/genetics; Setaria Plant/genetics
4.
BMC Genomics ; 21(Suppl 8): 490, 2020 Jul 28.
Article in English | MEDLINE | ID: mdl-32723302

ABSTRACT

BACKGROUND: There is a plethora of methods for genome-wide association studies. However, only a few of them may be classified as multi-trait and multi-locus, i.e., methods that consider the influence of multiple genetic variants on several correlated phenotypes. RESULTS: We propose a multi-trait multi-locus model that employs structural equation modeling (SEM) to describe complex associations between SNPs and traits: multi-trait multi-locus SEM (mtmlSEM). The structure of our model makes it possible to discriminate between pleiotropic and single-trait SNPs with direct and indirect effects. We also propose an automatic procedure to construct the model using factor analysis and the maximum likelihood method. To estimate the large number of parameters in the model, we performed Bayesian inference and implemented Gibbs sampling. An important feature of the model is that it correctly handles non-normally distributed variables, such as some traits and variants. CONCLUSIONS: We applied the model to Vavilov's collection of 404 chickpea (Cicer arietinum L.) accessions with 20-fold cross-validation. We analyzed 16 phenotypic traits, which we organized into five groups, and found around 230 SNPs associated with traits, 60 of which had pleiotropic effects. The model demonstrated high accuracy in predicting trait values.


Subject(s)
Genome-Wide Association Study/statistics & numerical data; Latent Class Analysis; Polymorphism, Single Nucleotide/genetics; Quantitative Trait Loci/genetics; Bayes Theorem; Genotype; Humans; Likelihood Functions
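The mtmlSEM abstract mentions Bayesian inference via Gibbs sampling. The full structural equation model is too large for a short example, so the sketch below swaps in a much simpler model: a single-predictor Bayesian linear regression, y = b * x + noise, with a normal prior on b and an inverse-gamma prior on the noise variance. It only shows the Gibbs pattern of alternately drawing each parameter from its full conditional distribution. The priors, hyperparameters, and function names are illustrative assumptions, not those of mtmlSEM.

```python
import random


def gibbs_linear_regression(x, y, n_iter=3000, burn_in=500,
                            tau2=100.0, a0=2.0, b0=1.0, seed=0):
    """Gibbs sampler for y_i = b * x_i + e_i, e_i ~ N(0, s2).

    Priors (assumed for illustration): b ~ N(0, tau2), s2 ~ InvGamma(a0, b0).
    Returns the retained draws of b and s2 after burn-in.
    """
    rng = random.Random(seed)
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    b, s2 = 0.0, 1.0  # arbitrary starting values
    b_draws, s2_draws = [], []
    for it in range(n_iter):
        # 1) b | s2, y is normal (conjugate update).
        var_b = 1.0 / (sxx / s2 + 1.0 / tau2)
        mean_b = var_b * sxy / s2
        b = rng.gauss(mean_b, var_b ** 0.5)
        # 2) s2 | b, y is inverse-gamma: draw a gamma-distributed precision and invert it.
        rss = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
        shape = a0 + n / 2.0
        rate = b0 + rss / 2.0
        s2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)  # gammavariate takes a scale, not a rate
        if it >= burn_in:
            b_draws.append(b)
            s2_draws.append(s2)
    return b_draws, s2_draws


if __name__ == "__main__":
    data_rng = random.Random(42)
    x = [data_rng.uniform(-2.0, 2.0) for _ in range(200)]
    y = [2.0 * xi + data_rng.gauss(0.0, 1.0) for xi in x]  # true slope 2, noise sd 1
    b_draws, s2_draws = gibbs_linear_regression(x, y)
    print("posterior mean of b:", sum(b_draws) / len(b_draws))
    print("posterior mean of s2:", sum(s2_draws) / len(s2_draws))
```

In a model with many parameters, such as a multi-trait SEM, the same loop would cycle through many more conditional draws per iteration; the two-parameter version here only keeps the mechanics visible.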