2.
Bioinformatics ; 31(24): 3897-905, 2015 Dec 15.
Article in English | MEDLINE | ID: mdl-26315901

ABSTRACT

MOTIVATION: Long non-coding RNAs (lncRNAs), which are non-coding RNAs of length above 200 nucleotides, play important biological functions such as gene expression regulation. To fully reveal the functions of lncRNAs, a fundamental step is to annotate them in various species. However, as lncRNAs tend to encode one or multiple open reading frames, it is not trivial to distinguish these long non-coding transcripts from protein-coding genes in transcriptomic data. RESULTS: In this work, we design a new tool that calculates the coding potential of a transcript using a machine learning model (random forest) based on multiple features including sequence characteristics of putative open reading frames, translation scores based on ribosomal coverage, and conservation against characterized protein families. The experimental results show that our tool competes favorably with existing coding potential computation tools in lncRNA identification. AVAILABILITY AND IMPLEMENTATION: The scripts and data can be downloaded at https://github.com/zhangy72/LncRNA-ID.
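
One of the feature groups named in the abstract, sequence characteristics of putative open reading frames, can be illustrated with a minimal sketch. This is not the LncRNA-ID implementation: the function names `longest_orf` and `orf_features`, the forward-strand-only scan, and the choice of ORF length and coverage as features are assumptions made for illustration.

```python
# Illustrative sketch only; names and feature choices are assumptions,
# not taken from the LncRNA-ID scripts.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG...stop ORF on the forward
    strand (end is exclusive and includes the stop codon), or None."""
    seq = seq.upper().replace("U", "T")
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for the next ORF in this frame
    return best


def orf_features(seq):
    """Compute two simple ORF features for one transcript sequence."""
    orf = longest_orf(seq)
    if orf is None:
        return {"orf_length": 0, "orf_coverage": 0.0}
    length = orf[1] - orf[0]
    return {"orf_length": length, "orf_coverage": length / len(seq)}
```

Features like these would form only part of the input to a classifier such as the random forest mentioned above; the translation scores and protein-family conservation described in the abstract need ribosome profiling data and profile searches, which this sketch does not attempt.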


Subjects
Machine Learning, Long Noncoding RNA/genetics, Software, Animals, Humans, Mice, Open Reading Frames, Proteins/genetics, Ribosomes/metabolism
3.
Plant Physiol ; 164(2): 513-24, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24306534

ABSTRACT

We have optimized and extended the widely used annotation engine MAKER in order to better support plant genome annotation efforts. New features include better parallelization for large repeat-rich plant genomes, noncoding RNA annotation capabilities, and support for pseudogene identification. We have benchmarked the resulting software tool kit, MAKER-P, using the Arabidopsis (Arabidopsis thaliana) and maize (Zea mays) genomes. Here, we demonstrate the ability of the MAKER-P tool kit to automatically update, extend, and revise the Arabidopsis annotations in light of newly available data and to annotate pseudogenes and noncoding RNAs absent from The Arabidopsis Information Resource 10 build. Our results demonstrate that MAKER-P can be used to manage and improve the annotations of even Arabidopsis, perhaps the best-annotated plant genome. We have also installed and benchmarked MAKER-P on the Texas Advanced Computing Center. We show that this public resource can de novo annotate the entire Arabidopsis and maize genomes in less than 3 h and produce annotations of comparable quality to those of the current The Arabidopsis Information Resource 10 and maize V2 annotation builds.


Subjects
Arabidopsis/genetics, Computational Biology/methods, Plant Genome/genetics, Molecular Sequence Annotation/methods, Software, Zea mays/genetics, Alternative Splicing/genetics, Exons/genetics, Plant Genes/genetics, Pseudogenes/genetics, Repetitive Nucleic Acid Sequences/genetics, Reproducibility of Results
4.
Mamm Genome ; 24(11-12): 484-99, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24202129

ABSTRACT

The diversity of dog breeds makes the domestic dog a valuable model for identifying genes responsible for many phenotypic and behavioral traits. The brain, in particular, is a region of interest for the analysis of molecular changes that are involved in dog-specific behavioral phenotypes. However, such studies are handicapped due to incomplete annotation of the dog genome. We present a high-coverage transcriptome of the dog brain using RNA-Seq. Two areas of the brain, hypothalamus and cerebral cortex, were selected for their roles in cognition, emotion, and neuroendocrine functions. We detected many novel features of the dog transcriptome, including 13,799 novel exons, 51,357 exons with unique 5' or 3' modifications, and many novel alternative splicing events. We provide some examples of novel features in genes that are related to domestication, including ADCY8, SMOC2, and PRNP. We also found 247 novel protein-coding genes and 328 noncoding RNAs, including 57 long noncoding RNAs that represent the first empirical evidence for a large fraction of noncoding RNAs in the dog. In addition, we analyze both gene expression and alternative splicing differences between the hypothalamus and cerebral cortex and find that there is very little overlap between genes that are differentially alternatively spliced and genes that are differentially expressed. We thereby suggest that researchers who want to pinpoint the genetic causes for dog breed-specific traits and diseases should not confine their studies to gene expression alone, but should consider other factors such as alternative splicing and changes in untranslated regions.


Subjects
Cerebral Cortex/metabolism, Dogs/genetics, Hypothalamus/metabolism, Transcriptome, Alternative Splicing, Animals, Brain/metabolism, Cerebral Cortex/chemistry, Dogs/metabolism, Exons, Male, Untranslated RNA/genetics, Untranslated RNA/metabolism
5.
BMC Bioinformatics ; 14 Suppl 2: S1, 2013.
Article in English | MEDLINE | ID: mdl-23369147

ABSTRACT

BACKGROUND: Accurate secondary structure prediction provides important information for understanding the tertiary structures and thus the functions of ncRNAs. However, the accuracy of the native structure derivation of ncRNAs is still not satisfactory, especially on sequences containing pseudoknots. It was recently shown that using abstract shapes, which retain adjacency and nesting of structural features but disregard the length details of helix and loop regions, can improve the performance of structure prediction. In this work, we use SVM-based feature selection to derive the consensus abstract shape of homologous ncRNAs and apply the predicted shape to structure prediction including pseudoknots. RESULTS: Our approach was applied to predict shapes and secondary structures on hundreds of ncRNA data sets with and without pseudoknots. The experimental results show that we can achieve 18% higher accuracy in shape prediction than the state-of-the-art consensus shape prediction tools. Using predicted shapes in structure prediction allows us to achieve approximately 29% higher sensitivity and 10% higher positive predictive value than other pseudoknot prediction tools. CONCLUSIONS: Extensive analysis of RNA properties based on SVM allows us to identify important properties of sequences and structures related to their shapes. The combination of mass data analysis and SVM-based feature selection makes our approach a promising method for shape and structure prediction. The implemented tools, KnotShape and KnotStructure, are open source software and can be downloaded at: http://www.cse.msu.edu/~achawana/KnotShape.
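
The idea of an abstract shape, keeping the nesting and adjacency of helices while discarding helix and loop lengths, can be sketched for a pseudoknot-free structure in dot-bracket notation. This is an illustrative assumption about the most abstract shape level only: the function name `abstract_shape` is hypothetical, the sketch does not handle pseudoknots, and it is not the KnotShape algorithm, which predicts a consensus shape with SVM-based feature selection.

```python
# Illustrative sketch only; not the KnotShape implementation.
def abstract_shape(dot_bracket):
    """Reduce a pseudoknot-free dot-bracket string to a bracket-only shape.

    Unpaired bases are dropped, and any run of base pairs that encloses
    exactly one inner helix is merged with it, so only branching
    (nesting and adjacency of helices) remains.
    """
    root = []          # children of the virtual root
    stack = [root]     # path from the root to the currently open pair
    for ch in dot_bracket:
        if ch == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure: extra ')'")
            stack.pop()
        elif ch != ".":
            raise ValueError(f"unsupported character: {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced structure: missing ')'")

    def render(node):
        # Skip through pairs that enclose exactly one pair: this merges
        # stacked pairs and ignores bulges and internal loops.
        while len(node) == 1:
            node = node[0]
        return "[" + "".join(render(child) for child in node) + "]"

    return "".join(render(child) for child in root)
```

For instance, a single hairpin of any stem length reduces to one bracket pair, while two hairpins enclosed by an outer helix keep their nesting and adjacency.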


Subjects
Nucleic Acid Conformation, Untranslated RNA/chemistry, Software, Support Vector Machine, Computational Biology, Untranslated RNA/genetics
6.
PLoS Genet ; 8(11): e1003064, 2012.
Article in English | MEDLINE | ID: mdl-23166516

ABSTRACT

Unicellular marine algae have promise for providing sustainable and scalable biofuel feedstocks, although no single species has emerged as a preferred organism. Moreover, adequate molecular and genetic resources prerequisite for the rational engineering of marine algal feedstocks are lacking for most candidate species. Heterokonts of the genus Nannochloropsis naturally have high cellular oil content and are already in use for industrial production of high-value lipid products. First success in applying reverse genetics by targeted gene replacement makes Nannochloropsis oceanica an attractive model to investigate the cell and molecular biology and biochemistry of this fascinating organism group. Here we present the assembly of the 28.7 Mb genome of N. oceanica CCMP1779. RNA sequencing data from nitrogen-replete and nitrogen-depleted growth conditions support a total of 11,973 genes, of which in addition to automatic annotation some were manually inspected to predict the biochemical repertoire for this organism. Among others, more than 100 genes putatively related to lipid metabolism, 114 predicted transcription factors, and 109 transcriptional regulators were annotated. Comparison of the N. oceanica CCMP1779 gene repertoire with the recently published N. gaditana genome identified 2,649 genes likely specific to N. oceanica CCMP1779. Many of these N. oceanica-specific genes have putative orthologs in other species or are supported by transcriptional evidence. However, because similarity-based annotations are limited, functions of most of these species-specific genes remain unknown. Aside from the genome sequence and its analysis, protocols for the transformation of N. oceanica CCMP1779 are provided. 
The availability of genomic and transcriptomic data for Nannochloropsis oceanica CCMP1779, along with efficient transformation protocols, provides a blueprint for future detailed gene functional analysis and genetic engineering of Nannochloropsis species by a growing academic community focused on this genus.


Subjects
Genome, Molecular Sequence Annotation, Stramenopiles/genetics, Base Sequence, Genomics, Nitrogen/administration & dosage, Nitrogen/metabolism, DNA Sequence Analysis, RNA Sequence Analysis/methods, Species Specificity, Stramenopiles/growth & development, Genetic Transformation
7.
J Bioinform Comput Biol ; 9(2): 317-37, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21523935

ABSTRACT

Many noncoding RNAs (ncRNAs) function through both their sequences and secondary structures. Thus, secondary structure derivation is an important issue in today's RNA research. The state-of-the-art structure annotation tools are based on comparative analysis, which derives consensus structure of homologous ncRNAs. Despite promising results from existing ncRNA aligning and consensus structure derivation tools, there is a need for more efficient and accurate ncRNA secondary structure modeling and alignment methods. In this work, we introduce a consensus structure derivation approach based on grammar string, a novel ncRNA secondary structure representation that encodes an ncRNA's sequence and secondary structure in the parameter space of a context-free grammar (CFG) and a full RNA grammar including pseudoknots. Being a string defined on a special alphabet constructed from a grammar, grammar string converts ncRNA alignment into sequence alignment. We derive consensus secondary structures from hundreds of ncRNA families from BraliBase 2.1 and 25 families containing pseudoknots using grammar string alignment. Our experiments have shown that grammar string-based structure derivation competes favorably in consensus structure quality with Murlet and RNASampler. Source code and experimental data are available at http://www.cse.msu.edu/~yannisun/grammar-string.


Subjects
Nucleic Acid Conformation, Untranslated RNA/chemistry, Untranslated RNA/genetics, Sequence Alignment/statistics & numerical data, Computational Biology, Computer Simulation, Consensus Sequence, Human Genome, Humans, Molecular Models, Thermodynamics