1.
BMC Bioinformatics ; 24(1): 391, 2023 Oct 18.
Article in English | MEDLINE | ID: mdl-37853347

ABSTRACT

BACKGROUND: The rapid development of omics acquisition techniques has induced the production of a large volume of heterogeneous and multi-level omics datasets, which require specific and sometimes complex analyses to obtain relevant biological information. Here, we present ASTERICS (version 2.5), a publicly available web interface for the analysis of omics datasets. RESULTS: ASTERICS is designed to make both standard and complex exploratory and integration analysis workflows easily available to biologists and to provide high-quality interactive plots. Special care has been taken to provide comprehensive documentation of the implemented analyses and to guide users toward sound analysis choices for specific omics data. Data and analyses are organized in a comprehensive graphical workflow within the ASTERICS workspace to facilitate the understanding of the successive data edits and analyses leading to a given result. CONCLUSION: ASTERICS provides an easy-to-use platform for omics data exploration and integration. The modular organization of its open-source code makes it easy for external contributors to incorporate new workflows and analyses. ASTERICS is available at https://asterics.miat.inrae.fr and can also be deployed using the provided Docker images.


Subjects
Software , Workflow
2.
Sci Data ; 10(1): 369, 2023 06 08.
Article in English | MEDLINE | ID: mdl-37291142

ABSTRACT

Inspired by the production of reference data sets in the Genome in a Bottle project, we sequenced one Charolais heifer with different technologies: Illumina paired-end, Oxford Nanopore, Pacific Biosciences (HiFi and CLR), 10X Genomics linked-reads, and Hi-C. In order to generate haplotypic assemblies, we also sequenced both parents with short reads. From these data, we built two haplotyped trio high-quality reference genomes and a consensus assembly, using up-to-date software packages. The assemblies obtained using PacBio HiFi reach a size of 3.2 Gb, which is significantly larger than the 2.7 Gb ARS-UCD1.2 reference. The BUSCO score of the consensus assembly reaches a completeness of 95.8% among highly conserved mammal genes. We also identified 35,866 structural variants larger than 50 base pairs. This assembly is a contribution to the bovine pangenome for the "Charolais" breed. These datasets will prove to be useful resources, enabling the community to gain additional insight into sequencing technologies for applications such as SNP, indel or structural variant calling, and de novo assembly.


Subjects
Genomics , High-Throughput Nucleotide Sequencing , Animals , Cattle , Female , Benchmarking , Genome , Sequence Analysis, DNA
3.
Genes (Basel) ; 14(3)2023 03 07.
Article in English | MEDLINE | ID: mdl-36980936

ABSTRACT

By pairing to messenger RNAs (mRNAs for short), microRNAs (miRNAs) regulate gene expression in animals and plants. Accurately identifying which mRNAs interact with a given miRNA and the precise location of the interaction sites is crucial to reaching a more complete view of the regulatory network of an organism. Only a few experimental approaches, however, allow the identification of both within a single experiment. Computational predictions of miRNA-mRNA interactions thus remain generally the first step used, despite their drawback of a high rate of false-positive predictions. The major computational approaches available rely on a diversity of features, among which anchoring the miRNA seed and measuring mRNA accessibility are the key ones, with the first being universally used, while the use of the second remains controversial. Revisiting the importance of each is the aim of this paper, which uses Cross-Linking, Ligation, And Sequencing of Hybrids (CLASH) datasets to achieve this goal. Contrary to what might be expected, the results are more ambiguous regarding the use of the seed match as a feature, while accessibility appears to be a feature worth considering, indicating that, at least under some conditions, it may favour anchoring by miRNAs.


Subjects
Gene Expression Regulation , MicroRNAs , RNA, Messenger , MicroRNAs/genetics , MicroRNAs/metabolism , RNA, Messenger/genetics
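The seed-anchoring feature discussed in the abstract above can be illustrated with a minimal sketch. It assumes the common convention that the seed spans miRNA nucleotides 2 to 8 and that a candidate site is a perfect reverse-complement match in the mRNA. The function name `seed_sites` and the toy mRNA are illustrative and are not taken from the paper, which evaluates this feature on CLASH data rather than with this code.

```python
# Illustrative sketch only: locate perfect matches to the miRNA seed in an mRNA.
# Assumption: the seed is miRNA nucleotides 2-8 (1-based), a common convention.

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_sites(mirna, mrna, start=1, end=8):
    """Return 0-based mRNA positions pairing perfectly with the miRNA seed."""
    seed = mirna[start:end]                    # nucleotides 2-8 of the miRNA
    site = seed.translate(COMPLEMENT)[::-1]    # reverse complement of the seed
    positions = []
    index = mrna.find(site)
    while index != -1:
        positions.append(index)
        index = mrna.find(site, index + 1)     # allow overlapping occurrences
    return positions


let7a = "UGAGGUAGUAGGUUGUAUAGUU"               # mature let-7a sequence
toy_mrna = "AAACUACCUCAAA"                     # hypothetical target fragment
print(seed_sites(let7a, toy_mrna))
```

A seed match alone says nothing about site accessibility, which is the second feature the abstract weighs; that would require a secondary-structure prediction step not sketched here.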
4.
BMC Bioinformatics ; 23(1): 495, 2022 Nov 18.
Article in English | MEDLINE | ID: mdl-36401177

ABSTRACT

BACKGROUND: Sequencing is the key method to study the impact of short RNAs, which include microRNAs, tRNA-derived RNAs, and piwi-interacting RNAs, among others. The first step to make use of these reads is to map them to a genome. Existing mapping tools have been developed with long RNAs in mind, and, so far, no tool has been conceived for short RNAs. However, short RNAs have several distinctive features which make them different from messenger RNAs: they are shorter, they are often redundant, they can be produced by duplicated loci, and they may be edited at their ends. RESULTS: In this work, we present a new tool, srnaMapper, that exhaustively maps these reads with all these features in mind, and is most efficient when applied to reads no longer than 50 base pairs. We show, on several datasets, that srnaMapper is very efficient in terms of computation time and handling of editing errors: it retrieves all the hits, with an arbitrary number of errors, in a time comparable with that of non-exhaustive tools.


Subjects
High-Throughput Nucleotide Sequencing , MicroRNAs , Sequence Analysis, RNA/methods , High-Throughput Nucleotide Sequencing/methods , RNA, Small Interfering , RNA, Transfer
5.
PLoS One ; 17(9): e0273253, 2022.
Article in English | MEDLINE | ID: mdl-36070299

ABSTRACT

Circular RNA (circRNA) is a noncoding RNA class with important implications for gene expression regulation, mostly by interaction with other RNA species or RNA-binding proteins. While the commonly applied short-read Illumina RNA-sequencing techniques can be used to detect circRNAs, their full sequence is not revealed. However, the complete sequence information is needed to analyze potential interactions and thus the mechanism of action of circRNAs. Here, we present an improved protocol to enrich and sequence full-length circRNAs by using the Oxford Nanopore long-read sequencing platform. The protocol involves an enrichment of lowly abundant circRNAs by exonuclease treatment and negative selection of linear RNAs. Then, a cDNA library is created and amplified by PCR. This protocol provides enough material for several sequencing runs. The library is used as input for ligation-based sequencing together with native barcoding. Stringent quality control of the libraries is ensured by a combination of Qubit, Fragment Analyzer and qRT-PCR. Multiplexing of up to 4 libraries yields more than 1-2 million reads per library, of which 1-2% are circRNA-specific reads, with >99% of them full-length. The protocol works well with human cancer cell lines. We further provide suggestions for the bioinformatic analysis of the created data, as well as the limitations of our approach together with recommendations for troubleshooting and interpretation. Taken together, this protocol enables reliable full-length analysis of circRNAs, a noncoding RNA type involved in a growing number of physiologic and pathologic conditions. Associated content: https://dx.doi.org/10.17504/protocols.io.rm7vzy8r4lx1/v2.


Subjects
Nanopores , RNA, Circular , High-Throughput Nucleotide Sequencing/methods , Humans , RNA/genetics , Sequence Analysis, RNA/methods
6.
Virus Evol ; 7(2): veab093, 2021 Sep.
Article in English | MEDLINE | ID: mdl-35299790

ABSTRACT

Highly pathogenic avian influenza viruses (HPAIVs) evolve from low pathogenic avian influenza viruses (LPAIVs) of the H5 and H7 subtypes. This evolution is characterized by the acquisition of a multi-basic cleavage site (MBCS) motif in the hemagglutinin (HA) that leads to an extended viral tropism and severe disease in poultry. One key unanswered question is whether the risk of transition to HPAIVs is similar for all LPAIVs H5 or H7 strains, or whether specific determinants in the HA sequence of some H5 or H7 LPAIV strains correlate with a higher risk of transition to HPAIVs. Here, we determined if specific features of the conserved RNA stem-loop located at the HA cleavage site-encoding region could be detected along the LPAIV to HPAIV evolutionary pathway. Analysis of the thermodynamic stability of the predicted RNA structures showed no specific patterns common to HA sequences leading to HPAIVs and distinct from those remaining LPAIVs. However, RNA structure clustering analysis revealed that most of the American lineage ancestors leading to H7 emergences via recombination shared the same viral RNA (vRNA) structure topology at the HA1/HA2 boundary region. Our study thus identified predicted secondary RNA structures present in the HA of H7 viruses, which could promote genetic recombination and acquisition of a multibasic cleavage site motif (MBCS).

7.
PLoS One ; 15(5): e0231738, 2020.
Article in English | MEDLINE | ID: mdl-32463818

ABSTRACT

High-throughput sequencing makes it possible to provide the genome-wide distribution of small non-coding RNAs in a single experiment, and has contributed greatly to the identification and understanding of these RNAs in the last decade. Small non-coding RNAs gather a wide collection of classes, such as microRNAs, tRNA-derived fragments, small nucleolar RNAs and small nuclear RNAs, to name a few. As usual in RNA-seq studies, the sequencing step is followed by a feature quantification step: when a genome is available, the reads are aligned to the genome, their genomic positions are compared to the already available annotations, and the corresponding features are quantified. However, a problem arises when many reads map at several positions, and while different strategies exist to circumvent this problem, all of them are biased. In this article, we present a new strategy that compares all the reads that map at several positions, and their annotations when available. In many cases, all the hits co-localize with the same feature annotation (a duplicated miRNA or a duplicated gene, for instance). When different annotations exist for a given read, we propose to merge existing features and provide the counts for the merged features. This new strategy has been implemented in a tool, mmannot, freely available at https://github.com/mzytnicki/mmannot.


Subjects
RNA, Small Untranslated/genetics , Sequence Analysis, RNA/methods , Software , Genomics , High-Throughput Nucleotide Sequencing , MicroRNAs/genetics , Molecular Sequence Annotation
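The merging strategy described in the mmannot abstract above can be sketched in a few lines. This is a simplified illustration, not mmannot's actual implementation: the input layout (one feature name per hit, `None` for an unannotated hit), the `--` separator for merged feature names, and the choice to ignore unannotated hits are all assumptions made for the example.

```python
# Illustrative sketch only: count multi-mapping reads per (possibly merged) feature.
# Assumptions: each read comes with one feature name per hit (None if the hit is
# unannotated); unannotated hits are ignored; merged names are joined with "--".
from collections import Counter


def count_merged_features(read_hits):
    """Map each read to a single, possibly merged, feature and count reads."""
    counts = Counter()
    for hits in read_hits.values():
        features = sorted({name for name in hits if name is not None})
        if not features:
            continue                           # no annotation at any hit
        counts["--".join(features)] += 1       # one feature, or a merged one
    return counts


reads = {
    "read1": ["mir-1a", "mir-1a"],             # all hits in the same feature
    "read2": ["mir-1a", "mir-1b"],             # duplicated miRNA: merged feature
    "read3": [None, "geneX"],                  # only one hit is annotated
    "read4": [None],                           # unannotated read, not counted
}
print(count_merged_features(reads))
```

Sorting the feature names before joining keeps the merged label stable regardless of the order in which the hits are reported.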
8.
Front Microbiol ; 10: 2701, 2019.
Article in English | MEDLINE | ID: mdl-31824468

ABSTRACT

CRISPR/Cas systems provide adaptive defense mechanisms against invading nucleic acids in prokaryotes. Because of its interest as a genetic tool, the Type II CRISPR/Cas9 system from Streptococcus pyogenes has been extensively studied. It includes the Cas9 endonuclease that is dependent on a dual-guide RNA made of a tracrRNA and a crRNA. Target recognition relies on crRNA annealing and the presence of a protospacer adjacent motif (PAM). Mollicutes are currently the bacteria with the smallest genome in which CRISPR/Cas systems have been reported. Many of them are pathogenic to humans and animals (mycoplasmas and ureaplasmas) or plants (phytoplasmas and some spiroplasmas). A global survey was conducted to identify and compare CRISPR/Cas systems found in the genome of these minimal bacteria. Complete or degraded systems classified as Type II-A and less frequently as Type II-C were found in the genome of 21 out of 52 representative mollicutes species. Phylogenetic reconstructions predicted a common origin of all CRISPR/Cas systems of mycoplasmas and at least two origins were suggested for spiroplasmas systems. Cas9 in mollicutes were structurally related to the S. aureus Cas9 except the PI domain involved in the interaction with the PAM, suggesting various PAMs might be recognized by Cas9 of different mollicutes. Structure of the predicted crRNA/tracrRNA hybrids was conserved and showed typical stem-loop structures pairing the Direct Repeat part of crRNAs with the 5' region of tracrRNAs. Most mollicutes crRNA/tracrRNAs showed G + C% significantly higher than the genome, suggesting a selective pressure for maintaining stability of these secondary structures. Examples of CRISPR spacers matching with mollicutes phages were found, including the textbook case of Mycoplasma cynos strain C142 having no prophage sequence but a CRISPR/Cas system with spacers targeting prophage sequences that were found in the genome of another M. cynos strain that is devoid of a CRISPR system.
Despite their small genome size, mollicutes have maintained protective means against invading DNAs, including restriction/modification and CRISPR/Cas systems. The apparent lack of CRISPR/Cas systems in several groups of species including main pathogens of humans, ruminants, and plants suggests different evolutionary routes or a lower risk of phage infection in specific ecological niches.

9.
Microbiol Resour Announc ; 8(31)2019 Aug 01.
Article in English | MEDLINE | ID: mdl-31371528

ABSTRACT

We present the draft genome sequence of Tubulinosema ratisbonensis, a microsporidium species infecting Drosophila melanogaster. A total of 3,013 protein-encoding genes and an array of transposable elements were identified. This work represents a necessary step to develop a novel model of host-parasite relationships using the highly tractable genetic model D. melanogaster.

10.
Methods Enzymol ; 612: 47-66, 2018.
Article in English | MEDLINE | ID: mdl-30502954

ABSTRACT

In this study, we compared different computational methods used for genome-wide determination of mRNA half-lives in Escherichia coli, with a special focus on the impact of considering a delay before the onset of mRNA decay after transcription arrest. A wide variety of datasets were analyzed, obtained with different technical methods for mRNA quantification (microarrays, RNA-seq, and RT-qPCR) and under different bacterial growth conditions. The exponential decay of mRNA levels was fitted using both linear and exponential models, with or without a delay. We showed that for all the models, independently of mRNA quantification methods and growth conditions, ignoring the delay resulted in only a modest overestimation of the half-life. For approximately 80% of the mRNAs, differences in mRNA half-life values were less than 34 s. The correlation between half-lives estimated with and without a delay was extremely high. However, the slope of the linear regression between the half-lives with and without a delay tended to decrease with the delay. For the few mRNAs for which taking into account the delay influenced the estimated half-life, the impact was dependent on the model and the growth condition. The smallest impact was obtained for the linear model.


Subjects
Escherichia coli/genetics , RNA Stability/physiology , RNA, Bacterial/metabolism , RNA, Messenger/metabolism , RNA Stability/genetics , Transcription, Genetic/genetics
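The linear model mentioned in the half-life abstract above can be illustrated as follows. This sketch assumes one simple way of handling the delay: time points before the delay are discarded, and a straight line is fitted by least squares to the log-transformed levels, giving a half-life of ln(2) divided by the decay rate. The function name, the synthetic data and this particular treatment of the delay are assumptions for illustration; the paper's own fitting procedures may differ.

```python
# Illustrative sketch only: mRNA half-life from a linear fit on log levels.
# Assumption: the delay is handled by dropping time points earlier than `delay`.
import math


def half_life_linear(times, levels, delay=0.0):
    """Estimate the half-life (same unit as `times`) from ln(level) vs time."""
    points = [(t, math.log(c)) for t, c in zip(times, levels) if t >= delay]
    if len(points) < 2:
        raise ValueError("need at least two time points after the delay")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = sxy / sxx                          # least-squares slope, equals -k
    if slope >= 0:
        raise ValueError("levels do not decay over the fitted interval")
    return math.log(2) / -slope


# Synthetic example: 60 s plateau, then decay with a true half-life of 120 s.
times = [0, 30, 60, 120, 180, 240, 300]
levels = [100.0 if t < 60 else 100.0 * 0.5 ** ((t - 60) / 120) for t in times]
print(half_life_linear(times, levels, delay=60))   # close to 120
print(half_life_linear(times, levels))             # larger: delay ignored
```

On this synthetic series, ignoring the delay flattens the fitted slope and so overestimates the half-life, which is the direction of bias the abstract reports.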
11.
Nat Plants ; 4(7): 440-452, 2018 07.
Article in English | MEDLINE | ID: mdl-29915331

ABSTRACT

Oaks are an important part of our natural and cultural heritage. Not only are they ubiquitous in our most common landscapes [1] but they have also supplied human societies with invaluable services, including food and shelter, since prehistoric times [2]. With 450 species spread throughout Asia, Europe and America [3], oaks constitute a critical global renewable resource. The longevity of oaks (several hundred years) probably underlies their emblematic cultural and historical importance. Such long-lived sessile organisms must persist in the face of a wide range of abiotic and biotic threats over their lifespans. We investigated the genomic features associated with such a long lifespan by sequencing, assembling and annotating the oak genome. We then used the growing number of whole-genome sequences for plants (including tree and herbaceous species) to investigate the parallel evolution of genomic characteristics potentially underpinning tree longevity. A further consequence of the long lifespan of trees is their accumulation of somatic mutations during mitotic divisions of stem cells present in the shoot apical meristems. Empirical [4] and modelling [5] approaches have shown that intra-organismal genetic heterogeneity can be selected for [6] and provides direct fitness benefits in the arms race with short-lived pests and pathogens through a patchwork of intra-organismal phenotypes [7]. However, there is no clear proof that large-statured trees consist of a genetic mosaic of clonally distinct cell lineages within and between branches. Through this case study of oak, we demonstrate the accumulation and transmission of somatic mutations and the expansion of disease-resistance gene families in trees.


Subjects
Genome, Plant/genetics , Quercus/genetics , Biological Evolution , DNA, Plant/genetics , Genetic Variation/genetics , Longevity/genetics , Mutation , Phylogeny , Sequence Analysis, DNA
12.
BMC Genomics ; 18(1): 882, 2017 Nov 16.
Article in English | MEDLINE | ID: mdl-29145803

ABSTRACT

BACKGROUND: Small regulatory RNAs (sRNAs) are widely found in bacteria and play key roles in many important physiological and adaptation processes. Studying their evolution and screening for events of coevolution with other genomic features is a powerful way to better understand their origin and assess a common functional or adaptive relationship between them. However, evolution and coevolution of sRNAs with coding genes have been sparsely investigated in bacterial pathogens. RESULTS: We designed a robust and generic phylogenomics approach that detects correlated evolution between sRNAs and protein-coding genes using their observed and inferred patterns of presence-absence in a set of annotated genomes. We applied this approach to 79 complete genomes of the Listeria genus and identified 52 accessory sRNAs, most of which were present in the Listeria common ancestor and lost during Listeria evolution. We detected significant coevolution between 23 sRNAs and 52 coding genes and inferred the Listeria sRNA-coding genes coevolution network. We characterized a main hub of 12 sRNAs that coevolved with genes encoding cell wall proteins and virulence factors. Among them, an sRNA specific to the L. monocytogenes species, rli133, coevolved with genes involved either in pathogenicity or in interaction with host cells, possibly acting as a direct negative post-transcriptional regulator. CONCLUSIONS: Our approach allowed the identification of candidate sRNAs potentially involved in pathogenicity and host interaction, consistent with recent findings on known pathogenicity actors. We highlight four sRNAs coevolving with seven internalin genes, some of which are important virulence factors in Listeria.


Subjects
Bacterial Proteins/genetics , Evolution, Molecular , Listeria/genetics , RNA, Small Untranslated/genetics , Gene Regulatory Networks , Genes, Bacterial , Genome, Bacterial , Listeria/pathogenicity
13.
Genome Announc ; 5(32)2017 Aug 10.
Article in English | MEDLINE | ID: mdl-28798184

ABSTRACT

Staphylococcus aureus is an opportunistic Gram-positive pathogen responsible for a wide range of infections, from minor skin abscesses to life-threatening diseases. Here, we report the draft genome assembly and current annotation of the HG001 strain, a derivative of the RN1 (NCTC8325) strain with restored rsbU (a positive activator of SigB).

14.
DNA Res ; 24(3): 251-260, 2017 Jun 01.
Article in English | MEDLINE | ID: mdl-28338834

ABSTRACT

Microsporidia are ubiquitous intracellular pathogens whose opportunistic nature led to their increased recognition with the rise of the AIDS pandemic. As the RNA world was largely unexplored in this parasitic lineage, we developed a dedicated in silico methodology to carry out exhaustive identification of ncRNAs across the Encephalitozoon and Nosema genera. Thus, the previously missing U1 small nuclear RNA (snRNA) and small nucleolar RNAs (snoRNAs) targeting only the LSU rRNA were highlighted and were further validated using 5' and 3' RACE-PCR experiments. Overall, the 15 ncRNAs that were found to be shared between Encephalitozoon and Nosema spp. may represent the minimal core set required for parasitic life. Interestingly, the systematic presence of a CCC- or GGG-like motif in 5' of all ncRNA and mRNA gene transcripts, regardless of the RNA polymerase involved, suggests that the RNA polymerase machineries in microsporidia species could use common factors. Our data provide additional insights in accordance with the simplification processes observed in these reduced genomes and underline the usefulness of sequencing closely related species to help identify highly divergent ncRNAs in these parasites.


Subjects
Encephalitozoon/genetics , Genome, Fungal , Nosema/genetics , RNA, Untranslated/metabolism , Transcription, Genetic , Base Sequence , Computer Simulation , Genomics , RNA, Small Nuclear/metabolism , RNA, Small Nucleolar/metabolism
15.
mSystems ; 2(2)2017.
Article in English | MEDLINE | ID: mdl-28317029

ABSTRACT

As for many model organisms, the amount of Listeria omics data produced has recently increased exponentially. There are now >80 published complete Listeria genomes, around 350 different transcriptomic data sets, and 25 proteomic data sets available. The analysis of these data sets through a systems biology approach and the generation of tools for biologists to browse these various data are a challenge for bioinformaticians. We have developed a web-based platform, named Listeriomics, that integrates different tools for omics data analyses, i.e., (i) an interactive genome viewer to display gene expression arrays, tiling arrays, and sequencing data sets along with proteomics and genomics data sets; (ii) an expression and protein atlas that connects every gene, small RNA, antisense RNA, or protein with the most relevant omics data; (iii) a specific tool for exploring protein conservation through the Listeria phylogenomic tree; and (iv) a coexpression network tool for the discovery of potential new regulations. Our platform integrates all the complete Listeria species genomes, transcriptomes, and proteomes published to date. This website allows navigation among all these data sets with enriched metadata in a user-friendly format and can be used as a central database for systems biology analysis. IMPORTANCE In the last decades, Listeria has become a key model organism for the study of host-pathogen interactions, noncoding RNA regulation, and bacterial adaptation to stress. To study these mechanisms, several genomics, transcriptomics, and proteomics data sets have been produced. We have developed Listeriomics, an interactive web platform to browse and correlate these heterogeneous sources of information. Our website will allow listeriologists and microbiologists to decipher key regulation mechanisms by using a systems biology approach.

16.
BMC Genomics ; 17: 164, 2016 Mar 01.
Article in English | MEDLINE | ID: mdl-26931235

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) have emerged as important post-transcriptional regulators of gene expression in a wide variety of physiological processes. They can control both temporal and spatial gene expression and are believed to regulate 30 to 70% of the genes. Data are, however, limited for fish species, with only 9 out of the 30,000 fish species present in miRBase. The aim of the current study was to discover and characterize rainbow trout (Oncorhynchus mykiss) miRNAs in a large number of tissues using next-generation sequencing in order to provide an extensive repertoire of rainbow trout miRNAs. RESULTS: A total of 38 different samples corresponding to 16 different tissues or organs were individually sequenced and analyzed independently in order to identify a large number of miRNAs with high confidence. This led to the identification of 2946 miRNA loci in the rainbow trout genome, including 445 already known miRNAs. Differential expression analysis was performed in order to identify miRNAs exhibiting specific or preferential expression among the 16 analyzed tissues. In most cases, miRNAs exhibit a specific pattern of expression in only a few tissues. The expression data from sRNA sequencing were confirmed by RT-qPCR. In addition, novel miRNAs are described in rainbow trout that had not been previously reported in other species. CONCLUSION: This study represents the first characterization of the rainbow trout miRNA transcriptome from a wide variety of tissues and sets out an extensive repertoire of rainbow trout miRNAs. It provides a starting point for future studies aimed at understanding the roles of miRNAs in major physiological processes such as growth, reproduction or adaptation to stress. This rainbow trout miRNA repertoire provides a novel resource to advance genomic research in salmonid species.


Subjects
MicroRNAs/genetics , Oncorhynchus mykiss/genetics , Transcriptome , Animals , High-Throughput Nucleotide Sequencing , Sequence Analysis, RNA
17.
Bioinformatics ; 32(3): 456-8, 2016 Feb 01.
Article in English | MEDLINE | ID: mdl-26454273

ABSTRACT

SUMMARY: Biologists produce large data sets and need rich yet simple web portals in which they can upload and analyze their files. Providing such tools requires masking the complexity induced by the required High Performance Computing (HPC) environment. The connection between interface and computing infrastructure is usually specific to each portal. With Jflow, we introduce a Workflow Management System (WMS) composed of jQuery plug-ins, which can easily be embedded in any web application, and a Python library providing all the features required to set up, run and monitor workflows. AVAILABILITY AND IMPLEMENTATION: Jflow is available under the GNU General Public License (GPL) at http://bioinfo.genotoul.fr/jflow. The package comes with full documentation, a quick start guide and a running test portal. CONTACT: Jerome.Mariette@toulouse.inra.fr.


Subjects
Computational Biology/methods , Database Management Systems , Information Storage and Retrieval , Internet , Software , Databases, Factual , Humans , Workflow
18.
Mol Ecol Resour ; 16(1): 254-65, 2016 Jan.
Article in English | MEDLINE | ID: mdl-25944057

ABSTRACT

The 1.5 Gbp/2C genome of pedunculate oak (Quercus robur) has been sequenced. A strategy was established for dealing with the challenges imposed by the sequencing of such a large, complex and highly heterozygous genome by a whole-genome shotgun (WGS) approach, without the use of costly and time-consuming methods, such as fosmid or BAC clone-based hierarchical sequencing methods. The sequencing strategy combined short and long reads. Over 49 million reads provided by Roche 454 GS-FLX technology were assembled into contigs and combined with shorter Illumina sequence reads from paired-end and mate-pair libraries of different insert sizes, to build scaffolds. Errors were corrected and gaps filled with Illumina paired-end reads and contaminants detected, resulting in a total of 17,910 scaffolds (>2 kb) corresponding to 1.34 Gb. Fifty per cent of the assembly was accounted for by 1468 scaffolds (N50 of 260 kb). Initial comparison with the phylogenetically related Prunus persica gene model indicated that genes for 84.6% of the proteins present in peach (mean protein coverage of 90.5%) were present in our assembly. The second and third steps in this project are genome annotation and the assignment of scaffolds to the oak genetic linkage map. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement, the oak genome data have been released into public sequence repositories in advance of publication. In this presubmission paper, the oak genome consortium describes its principal lines of work and future directions for analyses of the nature, function and evolution of the oak genome.


Subjects
Genome, Plant , Quercus/genetics , Models, Genetic , Molecular Sequence Annotation , Phylogeny , Quercus/classification , Sequence Analysis, DNA
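The N50 statistic quoted in the oak assembly abstract above (half of the assembly contained in 1468 scaffolds, N50 of 260 kb) can be computed with a short sketch. The function name `n50` and the toy scaffold lengths are illustrative; the scaffold lengths of the oak assembly itself are not reproduced here.

```python
# Illustrative sketch only: N50 and L50 of an assembly from its scaffold lengths.

def n50(lengths):
    """Return (N50, L50): the scaffold length at which the cumulative sum of
    lengths, sorted from longest to shortest, first reaches half of the total,
    and the number of scaffolds needed to get there."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("no scaffold lengths given")


toy_scaffolds = [10, 8, 6, 4, 2]               # hypothetical lengths, in kb
print(n50(toy_scaffolds))
```

In the abstract's terms, the second returned value corresponds to the 1468 scaffolds and the first to the 260 kb.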
19.
PLoS One ; 10(9): e0139075, 2015.
Article in English | MEDLINE | ID: mdl-26421846

ABSTRACT

The proper prediction of the gene catalogue of an organism is essential to obtain a representative snapshot of its overall lifestyle, especially when it is not amenable to culturing. Microsporidia are obligate intracellular, sometimes hard to culture, eukaryotic parasites known to infect members of every animal phylum. To date, sequencing and annotation of microsporidian genomes have revealed a poor gene complement with highly reduced gene sizes. In the present paper, we investigated whether such gene sizes may have induced biases in the methodologies used for genome annotation, with an emphasis on small coding sequence (CDS) gene prediction. Using better delineated intergenic regions from four Encephalitozoon genomes, we predicted de novo new small CDSs with sizes ranging from 78 to 255 bp (median 168) and corroborated these predictions by RACE-PCR experiments in Encephalitozoon cuniculi. Most of the newly found genes are present in other distantly related microsporidian species, suggesting their biological relevance. The present study provides a better framework for annotating microsporidian genomes and for training and evaluating new computational methods dedicated to detecting ultra-small genes in various organisms.


Subjects
Encephalitozoon/genetics , Genes, Fungal/genetics , Genome Size , Genomics , Open Reading Frames/genetics , Base Sequence , DNA, Intergenic/genetics , Molecular Sequence Annotation , Molecular Sequence Data , Phylogeny , Reproducibility of Results
20.
BMC Bioinformatics ; 16: 179, 2015 May 29.
Article in English | MEDLINE | ID: mdl-26022464

ABSTRACT

BACKGROUND: Several methods exist for the prediction of precursor miRNAs (pre-miRNAs) in genomic or sRNA-seq (small RNA sequences) data produced by NGS (Next Generation Sequencing). One key piece of information used for this task is the characteristic hairpin structure adopted by pre-miRNAs, which is in general identified using RNA folders whose complexity is cubic in the size of the input. The vast majority of pre-miRNA predictors then rely on further information learned from previously validated miRNAs from the same or a closely related genome for the final prediction of new miRNAs. With this paper, we wished to address three main issues. The first was methodological and aimed at obtaining a more time-efficient predictor, without losing accuracy, which represented the second issue. We indeed aimed at better predicting miRNAs at a genome scale, but also from sRNA-seq data where in some cases, notably in plants, the current folding methods often infer the wrong structure. The third issue is related to the fact that it is important to rely as little as possible on previously recorded examples of miRNAs. We therefore also sought a method that is less dependent on previous miRNA records. RESULTS: As concerns the first and second issues, we present a novel alternative to a classical folder, based on a thermodynamic Nearest-Neighbour (NN) model for computing the free energy and predicting the classical hairpin structure of a pre-miRNA. We show that the free energies thus computed correlate well with those of RNAFOLD. This novel method, called MIRINHO, has quadratic instead of cubic complexity and is also much more efficient in practice. When applied to sRNA-seq data of plants, it gives in general better results than classical folders. On the third issue, we show that MIRINHO, which uses as its only knowledge the length of the loops and stem-arms and the free energy of the pre-miRNA hairpin, compares well with algorithms that require more information.
The results, obtained with different datasets, are indeed similar to those of other approaches with which such a comparison was possible. These needed to be publicly available software tools that could be used on a large input. In some cases, MIRINHO is even better in terms of sensitivity or precision. CONCLUSION: We provide a simpler and much faster method with very reasonable sensitivity and precision, which can be applied without special adaptation to the prediction of both animal and plant pre-miRNAs, using as input either genomic sequences or sRNA-seq data.


Subjects
Arabidopsis/genetics , Genome , High-Throughput Nucleotide Sequencing/methods , Insects/genetics , MicroRNAs/genetics , Sequence Analysis, RNA/methods , Software , Algorithms , Animals , Base Pairing , Base Sequence , Genomics/methods , Molecular Sequence Data , Sequence Homology, Nucleic Acid