Results 1 - 6 of 6
1.
Nucleic Acids Res ; 52(12): 6802-6810, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38828788

ABSTRACT

The computational design of synthetic DNA sequences with designer in vivo properties is gaining traction in the field of synthetic genomics. We propose here a computational method which combines a kinetic Monte Carlo framework with a deep mutational screening based on deep learning predictions. We apply our method to build regular nucleosome arrays with tailored nucleosomal repeat lengths (NRL) in yeast. Our design was validated in vivo by successfully engineering and integrating thousands of kilobases long tandem arrays of computationally optimized sequences which could accommodate NRLs much larger than the yeast natural NRL (namely 197 and 237 bp, compared to the natural NRL of ∼165 bp). RNA-seq results show that transcription of the arrays can occur but is not driven by the NRL. The computational method proposed here delineates the key sequence rules for nucleosome positioning in yeast and should be easily applicable to other sequence properties and other genomes.
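The combination of a kinetic Monte Carlo loop with a full single-mutation screen, as described in this abstract, can be sketched in miniature. This is an illustrative toy and not the authors' implementation: `toy_energy` is a hypothetical stand-in for the deep learning predictor, and the target pattern, temperature, and step count are arbitrary assumptions.

```python
import math
import random

ALPHABET = "ACGT"


def toy_energy(seq: str) -> float:
    """Hypothetical stand-in for a trained predictor (lower is better).

    Counts mismatches against a periodic ACGT pattern; a real design
    would compare a predicted nucleosome profile to a target profile.
    """
    return float(sum(1 for i, c in enumerate(seq) if c != ALPHABET[i % 4]))


def single_mutants(seq: str):
    """Enumerate every single-nucleotide substitution (the mutational screen)."""
    for i, ref in enumerate(seq):
        for alt in ALPHABET:
            if alt != ref:
                yield seq[:i] + alt + seq[i + 1:]


def kmc_design(seq, energy, n_steps=200, temperature=0.5, seed=0):
    """Rejection-free (kinetic) Monte Carlo over single mutations.

    Each step scores all mutants, assigns each a Boltzmann-like rate,
    picks one mutant with probability proportional to its rate, and
    advances the simulated clock by an exponential waiting time.
    """
    rng = random.Random(seed)
    current, e_cur = seq, energy(seq)
    best, e_best = current, e_cur
    elapsed = 0.0
    for _ in range(n_steps):
        mutants = list(single_mutants(current))
        energies = [energy(m) for m in mutants]
        rates = [math.exp(-(e - e_cur) / temperature) for e in energies]
        k = rng.choices(range(len(mutants)), weights=rates, k=1)[0]
        elapsed += -math.log(1.0 - rng.random()) / sum(rates)
        current, e_cur = mutants[k], energies[k]
        if e_cur < e_best:  # remember the best sequence visited so far
            best, e_best = current, e_cur
    return best, e_best, elapsed
```

Calling `kmc_design("A" * 20, toy_energy)` returns the best sequence visited, its energy, and the simulated time. Scoring every mutant at every step is the expensive part, which is why a fast learned predictor matters in practice.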


Subject(s)
Nucleosomes , Saccharomyces cerevisiae , Nucleosomes/metabolism , Nucleosomes/genetics , Nucleosomes/chemistry , Saccharomyces cerevisiae/genetics , Computer Simulation , Monte Carlo Method , DNA/genetics , DNA/chemistry , DNA/metabolism , Base Sequence , Deep Learning , Chromatin Assembly and Disassembly
2.
PeerJ ; 10: e13613, 2022.
Article in English | MEDLINE | ID: mdl-35769139

ABSTRACT

The tremendous amount of biological sequence data available, combined with the recent methodological breakthrough in deep learning in domains such as computer vision or natural language processing, is leading today to the transformation of bioinformatics through the emergence of deep genomics, the application of deep learning to genomic sequences. We review here the new applications that the use of deep learning enables in the field, focusing on three aspects: the functional annotation of genomes, the sequence determinants of the genome functions and the possibility to write synthetic genomic sequences.


Subject(s)
Deep Learning , Neural Networks, Computer , Genomics , Computational Biology
3.
J Mol Biol ; 434(7): 167497, 2022 Apr 15.
Article in English | MEDLINE | ID: mdl-35189129

ABSTRACT

The artificial 601 DNA sequence is often used to constrain the position of nucleosomes on a DNA molecule in vitro. Although the ability of the 147 base pair sequence to precisely position a nucleosome in vitro is well documented, application of this property in vivo has been explored only in a few studies and yielded contradictory conclusions. Our goal in the present study was to test the ability of the 601 sequence to dictate nucleosome positioning in Saccharomyces cerevisiae in the context of a long tandem repeat array inserted in a yeast chromosome. We engineered such arrays with three different repeat sizes, namely 167, 197 and 237 base pairs. Although our arrays are able to position nucleosomes in vitro, analysis of nucleosome occupancy in vivo revealed that nucleosomes are not preferentially positioned as expected on the 601-core sequence along the repeats and that the measured nucleosome repeat length does not correspond to the one expected by design. Altogether, our results demonstrate that the rules defining nucleosome positions on this DNA sequence in vitro are not valid in vivo, at least in this chromosomal context, questioning the relevance of using the 601 sequence in vivo to achieve precise nucleosome positioning on designer synthetic DNA sequences.


Subject(s)
Nucleosomes , Saccharomyces cerevisiae , Tandem Repeat Sequences , Chromatin Assembly and Disassembly , DNA, Fungal/genetics , DNA, Fungal/metabolism , Genetic Engineering , Nucleosomes/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Tandem Repeat Sequences/genetics
4.
Genome Res ; 31(2): 317-326, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33355297

ABSTRACT

Genetically modified genomes are often used today in many areas of fundamental and applied research. In many studies, coding or noncoding regions are modified in order to change protein sequences or gene expression levels. Modifying one or several nucleotides in a genome can also lead to unexpected changes in the epigenetic regulation of genes. When designing a synthetic genome with many mutations, it would thus be very informative to be able to predict the effect of these mutations on chromatin. We develop here a deep learning approach that quantifies the effect of every possible single mutation on nucleosome positions on the full Saccharomyces cerevisiae genome. This type of annotation track can be used when designing a modified S. cerevisiae genome. We further highlight how this track can provide new insights on the sequence-dependent mechanisms that drive nucleosomes' positions in vivo.
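A per-position mutation track of the kind described in this abstract could be computed along the following lines. This is a sketch under stated assumptions: `toy_predict` is a hypothetical placeholder for a trained network (here a sliding-window AT fraction), and the summed absolute difference between profiles is only one plausible way to score a mutation, not necessarily the score used in the paper.

```python
import numpy as np

ALPHABET = "ACGT"


def toy_predict(seq: str, window: int = 5) -> np.ndarray:
    """Hypothetical placeholder for a trained model.

    Returns a sliding-window AT fraction with one value per base.
    """
    at = np.array([c in "AT" for c in seq], dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(at, kernel, mode="same")


def mutation_track(seq: str, predict=toy_predict) -> np.ndarray:
    """Score every possible single substitution.

    track[i, j] is the summed absolute change in the predicted profile
    when position i is replaced by ALPHABET[j]; entries for the
    reference base itself are left at zero.
    """
    reference = predict(seq)
    track = np.zeros((len(seq), len(ALPHABET)))
    for i, base in enumerate(seq):
        for j, alt in enumerate(ALPHABET):
            if alt == base:
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            track[i, j] = np.abs(predict(mutant) - reference).sum()
    return track
```

Taking `mutation_track(seq).max(axis=1)` gives a single value per position, which can then be exported as an annotation track alongside the genome.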

5.
Bioinformatics ; 37(11): 1593-1594, 2021 Jul 12.
Article in English | MEDLINE | ID: mdl-33135730

ABSTRACT

SUMMARY: Prediction of genomic annotations from DNA sequences using deep learning is becoming a flourishing field with many applications. Nevertheless, there are still difficulties in handling data in order to conveniently build and train models dedicated to specific end-user tasks. keras_dna is designed for easy implementation of Keras models (TensorFlow high-level API) for genomics. It can handle standard bioinformatics file formats as inputs, such as bigwig, gff, bed, wig, bedGraph or fasta, and returns standardized inputs for model training. keras_dna is designed to implement existing models but also to facilitate the development of new models that can have single or multiple targets or inputs. AVAILABILITY AND IMPLEMENTATION: Freely available under an MIT License using pip install keras_dna or by cloning the GitHub repo at https://github.com/etirouthier/keras_dna.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
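To make "standardized inputs for model training" concrete, the sketch below one-hot encodes a DNA string and yields batches of fixed-length windows paired with a coverage value. It deliberately does not use the keras_dna API: `one_hot` and `window_batches` are hypothetical helper names written in plain NumPy, and the choice of the window-centre coverage as the target is an assumption.

```python
import numpy as np

ALPHABET = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as an array of shape (len(seq), 4).

    Bases outside ACGT (for example N) become all-zero rows.
    """
    index = {base: k for k, base in enumerate(ALPHABET)}
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        k = index.get(base)
        if k is not None:
            encoded[i, k] = 1.0
    return encoded


def window_batches(seq, coverage, window, batch_size):
    """Yield (inputs, targets) batches.

    inputs has shape (batch, window, 4); targets holds the coverage
    value at the centre of each window.
    """
    encoded = one_hot(seq)
    starts = range(0, len(seq) - window + 1)
    for b in range(0, len(starts), batch_size):
        chunk = starts[b:b + batch_size]
        inputs = np.stack([encoded[s:s + window] for s in chunk])
        targets = np.array(
            [coverage[s + window // 2] for s in chunk], dtype=np.float32
        )
        yield inputs, targets
```

A generator of this shape can be consumed batch by batch in a training loop; a real pipeline would read the sequence and coverage from files such as fasta and bigwig rather than from in-memory values.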


Subject(s)
Deep Learning , Software , DNA/genetics , Genome , Genomics
6.
PeerJ Comput Sci ; 6: e278, 2020.
Article in English | MEDLINE | ID: mdl-33816929

ABSTRACT

The application of deep neural networks is a rapidly expanding field now reaching many disciplines, including genomics. In particular, convolutional neural networks have been exploited for identifying the functional role of short genomic sequences. These approaches rely on gathering large sets of sequences with a known functional role, extracting those sequences from whole-genome annotations. These sets are then split into learning, test and validation sets in order to train the networks. While the obtained networks perform well on validation sets, they often perform poorly when applied to whole genomes, in which the ratio of positive to negative examples can be very different from that in the training set. We here address this issue by assessing the genome-wide performance of networks trained with sets exhibiting different ratios of positive to negative examples. As a case study, we use sequences encompassing gene starts from the RefGene database as positive examples and random genomic sequences as negative examples. We then demonstrate that models trained using data from one organism can be used to predict gene-start sites in a related species, when using training sets providing good genome-wide performance. This cross-species application of convolutional neural networks provides a new way to annotate any genome from existing high-quality annotations in a related reference species. It also provides a way to determine whether the sequence motifs recognised by chromatin-associated proteins in different species are conserved or not.
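The effect of the positive-to-negative ratio can be illustrated with a short sketch. The helper names are hypothetical and the rates used in the usage note are made-up illustrative values, not results from the paper. `subsample` builds a training set with a chosen number of negatives per positive, and `precision_at_ratio` applies the identity precision = TPR / (TPR + FPR × N/P) for a classifier whose true and false positive rates are held fixed.

```python
import random


def subsample(positives, negatives, neg_per_pos, seed=0):
    """Build a labelled set with about neg_per_pos negatives per positive.

    Returns a shuffled list of (example, label) pairs, label 1 for
    positives and 0 for negatives.
    """
    rng = random.Random(seed)
    n_neg = min(len(negatives), neg_per_pos * len(positives))
    sample = [(x, 1) for x in positives]
    sample += [(x, 0) for x in rng.sample(negatives, n_neg)]
    rng.shuffle(sample)
    return sample


def precision_at_ratio(tpr, fpr, neg_per_pos):
    """Expected precision when negatives outnumber positives neg_per_pos to 1.

    Derived from TP = tpr * P and FP = fpr * N with N = neg_per_pos * P.
    """
    return tpr / (tpr + fpr * neg_per_pos)
```

Comparing `precision_at_ratio(0.9, 0.05, 1)` with `precision_at_ratio(0.9, 0.05, 100)` shows how the same classifier loses most of its precision once negatives dominate, which is the situation a genome-wide scan creates.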
