Results 1 - 20 of 26
1.
Methods Mol Biol ; 2802: 33-55, 2024.
Article in English | MEDLINE | ID: mdl-38819555

ABSTRACT

The identification of orthologous genes is relevant for comparative genomics, phylogenetic analysis, and functional annotation. There are many computational tools for the prediction of orthologous groups as well as web-based resources that offer orthology datasets for download and online analysis. This chapter presents a simple and practical guide to the process of orthologous group prediction, using a dataset of 10 prokaryotic proteomes as example. The orthology methods covered are OrthoMCL, COGtriangles, OrthoFinder2, and OMA. The authors compare the number of orthologous groups predicted by these various methods, and present a brief workflow for the functional annotation and reconstruction of phylogenies from inferred single-copy orthologous genes. The chapter also demonstrates how to explore two orthology databases: eggNOG6 and OrthoDB.
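The selection of single-copy orthologous groups mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the chapter's workflow: the function name, the gene and species identifiers, and the dictionary-based input format are all assumptions, and real tools such as OrthoFinder2 report these groups in their own output files.

```python
from collections import Counter

def single_copy_groups(groups, species_of, n_species):
    """Return IDs of groups holding exactly one gene from each of n_species species."""
    selected = []
    for group_id, genes in groups.items():
        # Count how many genes each species contributes to this group.
        counts = Counter(species_of[gene] for gene in genes)
        if len(counts) == n_species and all(c == 1 for c in counts.values()):
            selected.append(group_id)
    return sorted(selected)

# Toy data: gene and species names are hypothetical.
species_of = {
    "a1": "A", "a2": "A", "a3": "A",
    "b1": "B", "b2": "B", "b3": "B",
    "c1": "C", "c2": "C",
}
groups = {
    "OG1": ["a1", "b1", "c1"],        # one gene per species
    "OG2": ["a2", "a3", "b2", "c2"],  # species A has two copies
    "OG3": ["b3"],                    # species A and C are missing
}

print(single_copy_groups(groups, species_of, n_species=3))
```

Only groups that pass this filter would normally be concatenated for phylogeny reconstruction, since paralogs and missing species complicate the alignment.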


Subjects
Genomics , Phylogeny , Genomics/methods , Computational Biology/methods , Software , Prokaryotic Cells/metabolism , Databases, Genetic , Molecular Sequence Annotation/methods , Multigene Family , Genome, Bacterial
2.
Brief Bioinform ; 25(3), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38706315

ABSTRACT

To date, more than 251 million proteins have been deposited in UniProtKB. However, only 0.25% have been annotated with one of the more than 15000 possible Pfam family domains. The current annotation protocol integrates knowledge from manually curated family domains, obtained using sequence alignments and hidden Markov models. This approach has been successful for automatically growing the Pfam annotations, although at a low rate in comparison to protein discovery. Just a few years ago, deep learning models were proposed for automatic Pfam annotation. However, these models demand a considerable amount of training data, which can be a challenge with poorly populated families. To address this issue, we propose and evaluate here a novel protocol based on transfer learning. This requires the use of protein large language models (LLMs), trained with self-supervision on big unannotated datasets in order to obtain sequence embeddings. Then, the embeddings can be used with supervised learning on a small and annotated dataset for a specialized task. In this protocol we have evaluated several cutting-edge protein LLMs together with machine learning architectures to improve the current prediction of protein domain annotations. Results are significantly better than the state of the art for protein family classification, reducing the prediction error by an impressive 60% compared to standard methods. We explain how LLM embeddings can be used for protein annotation in a concrete and easy way, and provide the pipeline in a GitHub repo. Full source code and data are available at https://github.com/sinc-lab/llm4pfam.
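The second stage of the transfer-learning idea, supervised learning on top of fixed embeddings, can be illustrated with a deliberately simple classifier. The sketch below is not the authors' pipeline: it assumes the embeddings were already computed by some protein LLM (here replaced by toy 2-D vectors), the family labels are hypothetical, and a nearest-centroid rule stands in for whatever machine learning architecture is actually used.

```python
import math
from collections import defaultdict

def fit_centroids(embeddings, labels):
    """Average the embedding vectors of each family to get one centroid per label."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def predict(centroids, vec):
    """Assign the family whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))

# Toy "embeddings" (assumption: real ones would come from a protein LLM
# and have hundreds or thousands of dimensions).
train_vectors = [[0.9, 0.1], [1.0, 0.0], [0.1, 0.9], [0.0, 1.0]]
train_labels = ["FAMILY_A", "FAMILY_A", "FAMILY_B", "FAMILY_B"]  # hypothetical labels

centroids = fit_centroids(train_vectors, train_labels)
print(predict(centroids, [0.8, 0.2]))
```

The point of the design is that the expensive model is trained once without labels, while the small labeled dataset only has to fit a lightweight classifier.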


Subjects
Databases, Protein , Proteins , Proteins/chemistry , Molecular Sequence Annotation/methods , Computational Biology/methods , Machine Learning
3.
Rev. Bras. Parasitol. Vet. (Online) ; 32(1): e012322, 2023. tab, maps, illus
Article in English | VETINDEX | ID: biblio-1416451

ABSTRACT

Hemoplasmas are non-cultivable bacterial parasites of erythrocytes that infect domestic and wild animals, as well as humans. Their means of transmission and pathogenesis remain contentious issues and difficult to evaluate in wild animals. Procyon cancrivorus is a South American carnivore and occurs in all Brazilian biomes. In this study, we aimed to investigate occurrences of hemoplasmas infecting P. cancrivorus and to identify their 16S rRNA gene, in southern Brazil. DNA was extracted from spleen and blood samples of P. cancrivorus (n = 9) from different locations. Hemoplasma DNA was detected in six samples, based on 16S rRNA gene amplification and phylogenetic analysis. Four of the six sequences belonged to the "Mycoplasma haemofelis group", which is closely related to genotypes detected in Procyon lotor from the USA; one was within the "Mycoplasma suis group", closely related to "Candidatus Mycoplasma haemominutum"; and one was within the intermediate group between these clusters. Thus, these sequences showed that the molecular identity of hemoplasmas in the population studied was very variable. In five positive animals, Amblyomma aureolatum ticks and a flea (Ctenocephalides felis felis) were collected. The present study describes the first molecular detection of mycoplasmas in P. cancrivorus.(AU)


Hemotropic mycoplasmas (hemoplasmas) are non-cultivable bacterial parasites of erythrocytes that infect domestic and wild animals as well as humans. Their transmission and pathogenesis are debated and difficult to evaluate in wild animals. The crab-eating raccoon (Procyon cancrivorus) is a South American carnivore that occurs in all Brazilian biomes. The aim of the present study was to investigate the occurrence of hemoplasmas infecting P. cancrivorus and to identify their 16S rRNA gene in southern Brazil. DNA was extracted from spleen and blood samples of P. cancrivorus (n = 9). Hemoplasma DNA was detected in six samples, based on 16S rRNA gene amplification and phylogenetic analysis. Four of the six sequences belong to the "Mycoplasma haemofelis group" and are closely related to genotypes detected in Procyon lotor from the USA; one falls within the "Mycoplasma suis group", which is closely related to "Candidatus Mycoplasma haemominutum"; and one falls within the intermediate group between these clusters, showing that there is genetic diversity of hemoplasmas in the population studied. Amblyomma aureolatum ticks and a flea (Ctenocephalides felis) were collected from five positive animals. The present study provides the first molecular detection of mycoplasmas in P. cancrivorus.(AU)


Subjects
Parasitic Diseases, Animal/diagnosis , Raccoons/microbiology , RNA, Ribosomal, 16S/analysis , Brazil , Molecular Sequence Annotation/methods
4.
Rev. bras. parasitol. vet ; 32(3): e004623, 2023. maps, tab, graph
Article in English | VETINDEX | ID: biblio-1444794

ABSTRACT

The aim of this study was to determine the presence of deoxyribonucleic acid (DNA) from Toxoplasma gondii, Sarcocystis spp. and Neospora caninum, in tissues of wild boars slaughtered in southern Brazil. A total of 156 samples were collected from different organs of 25 wild boars, and DNA from at least one of the protozoa investigated was detected in 79 samples. To differentiate between infectious agents, restriction fragment length polymorphism was performed using the restriction enzymes DdeI and HpaII. For N. caninum, conventional PCR was performed with specific primers. The DNA of at least one of the studied pathogens was detected in each animal: 26.58% for T. gondii, 68.36% for Sarcocystis spp. and 5.06% for N. caninum. Coinfection between T. gondii and Sarcocystis spp. occurred in 14 animals, between T. gondii and N. caninum in only one male animal, between Sarcocystis spp. and N. caninum in a female, while co-infection with the three agents was equally observed in only one male animal. Considering the high frequency of detection and its zoonotic risk, especially T. gondii, it appears that wild boars can be potential sources of transmission of infectious agents and the adoption of monitoring measures in these populations should be prioritized.(AU)


The aim of this study was to determine the presence of deoxyribonucleic acid (DNA) from Toxoplasma gondii, Sarcocystis spp. and Neospora caninum in tissues of wild boars slaughtered in southern Brazil. A total of 156 samples were collected from different organs of 25 wild boars, and DNA from at least one of the protozoa investigated was detected in 79 samples. To differentiate between the infectious agents, restriction fragment length polymorphism analysis was performed using the restriction enzymes DdeI and HpaII. For N. caninum, conventional PCR was performed with specific primers. DNA from at least one of the pathogens studied was detected in every animal: 26.58% for T. gondii, 68.36% for Sarcocystis spp. and 5.06% for N. caninum. Coinfection with T. gondii and Sarcocystis spp. occurred in 14 animals; with T. gondii and N. caninum in only one male animal; with Sarcocystis spp. and N. caninum in one female; and coinfection with all three agents was likewise observed in only one male animal. Considering the high frequency of detection and the zoonotic risk, especially of T. gondii, wild boars can be potential sources of transmission of infectious agents, and the adoption of monitoring measures in these populations should be prioritized.(AU)


Subjects
Animals , Toxoplasma/cytology , DNA/analysis , Sarcocystis/cytology , Neospora/cytology , Molecular Sequence Annotation/methods , Brazil , Sus scrofa/parasitology
5.
PLoS One ; 16(4): e0249801, 2021.
Article in English | MEDLINE | ID: mdl-33836025

ABSTRACT

Crustaceans are major constituents of aquatic ecosystems and, as such, changes in their behavior and the structure and function of their bodies can serve as indicators of alterations in their immediate environment, such as those associated with climate change and anthropogenic contamination. We have used bioinformatics and a de novo transcriptome assembly approach to identify potential targets for developing specific antibodies to serve as nervous system function markers for freshwater prawns of Macrobrachium spp. Total RNA was extracted from brain ganglia of Macrobrachium carcinus freshwater prawns and Illumina Next Generation Sequencing was performed using an Eel Pond mRNA Seq Protocol to construct a de novo transcriptome. Sequencing yielded 97,202,662 sequences: 47,630,546 paired and 1,941,570 singletons. Assembly with Trinity resulted in 197,898 assembled contigs from which 30,576 were annotated: 9,600 by orthology, 17,197 by homology, and 3,779 by transcript families. We looked for glutamate receptor contigs, because of their central role in crustacean excitatory neurotransmission, and found 138 contigs related to ionotropic receptors, 32 related to metabotropic receptors, and 18 to unidentified receptors. After performing multiple sequence alignments across different organisms and antigenicity analysis, we were able to develop antibodies for prawn AMPA ionotropic glutamate receptor 1, metabotropic glutamate receptor 1 and 4, and ionotropic NMDA glutamate receptor subunit 2B, with the expectation that the availability of these antibodies will help broaden knowledge regarding the underlying structural and functional mechanisms involved in prawn behavioral responses to environmental impacts. The Macrobrachium carcinus brain transcriptome can be an important tool for examining changes in many other nervous system molecules as a function of developmental stages, or in response to particular conditions or treatments.
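The counts reported in the abstract can be cross-checked with a few lines of arithmetic. The check below assumes that "paired" counts read pairs (two sequences each), which is an interpretation of the abstract rather than something it states; under that reading the totals agree exactly.

```python
# Figures taken from the abstract.
total_sequences = 97_202_662
read_pairs = 47_630_546          # assumption: each pair contributes two sequences
singletons = 1_941_570

annotated_total = 30_576
annotated_by = {"orthology": 9_600, "homology": 17_197, "transcript families": 3_779}

# Sequences implied by pairs plus singletons.
implied_sequences = 2 * read_pairs + singletons
# Annotated contigs implied by the three annotation routes.
implied_annotated = sum(annotated_by.values())

print(implied_sequences == total_sequences)
print(implied_annotated == annotated_total)
```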


Subjects
Antibodies/immunology , Brain/metabolism , Ecosystem , Molecular Sequence Annotation/methods , Palaemonidae/genetics , Receptors, Glutamate/genetics , Animals , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Palaemonidae/metabolism , Receptors, Glutamate/immunology , Receptors, Glutamate/metabolism , Transcriptome
6.
PLoS Comput Biol ; 17(3): e1008797, 2021 03.
Article in English | MEDLINE | ID: mdl-33788829

ABSTRACT

Genome annotation conceptually consists of inferring and assigning biological information to gene products. Over the years, numerous pipelines and computational tools have been developed aiming to automate this task and assist researchers in gaining knowledge about target genes of study. However, even with these technological advances, manual annotation or manual curation is necessary, where the information attributed to the gene products is verified and enriched. Despite being called the gold standard process for depositing data in a biological database, the task of manual curation requires significant time and effort from researchers who sometimes have to parse through numerous products in various public databases. To assist with this problem, we present CODON, a tool for manual curation of genomic data, capable of performing the prediction and annotation process. This software makes use of a finite state machine in the prediction process and automatically annotates products based on information obtained from the UniProt database. CODON is equipped with a simple and intuitive graphic interface that assists in manual curation, enabling the user to make decisions about the analysis based on identity, alignment length, and the name of the organism in which the product obtained a match. Further, visual analysis of all matches found in the database is possible, which significantly benefits the curation task, since the user has at their disposal all the information available for a given product. An analysis performed on eleven organisms was used to test the efficiency of this tool by comparing the results of prediction and annotation through CODON to ones from the NCBI and RAST platforms.
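How a finite state machine can drive gene prediction is easiest to see on a toy open reading frame (ORF) scanner. The abstract does not describe CODON's actual states or transitions, so the sketch below is a generic illustration only: the two states, the function name, the single forward reading frame, and the standard start and stop codons are all assumptions.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Scan reading frame 0 of the forward strand with a two-state machine.

    Returns (start, end) index pairs; end is exclusive and includes the stop codon.
    min_codons counts every codon in the ORF, start and stop included.
    """
    orfs = []
    state = "SEARCH"
    start = None
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if state == "SEARCH":
            # Transition SEARCH -> IN_ORF on a start codon.
            if codon == START:
                state, start = "IN_ORF", i
        else:
            # Transition IN_ORF -> SEARCH on a stop codon, emitting the ORF.
            if codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                state, start = "SEARCH", None
    return orfs

print(find_orfs("CCCATGAAATTTTAGCCC"))
```

A fuller predictor would run the same machine over all six reading frames and apply a minimum-length cutoff suited to the organism.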


Subjects
Bacteria/genetics , Genomics/methods , Molecular Sequence Annotation/methods , Software , Databases, Genetic , User-Computer Interface
7.
Sci Rep ; 10(1): 13957, 2020 08 18.
Article in English | MEDLINE | ID: mdl-32811897

ABSTRACT

Mugil incilis (lisa) is an important commercial fish species in many countries, living along the coasts of the western Atlantic Ocean. It has been used as a model organism for environmental monitoring and ecotoxicological investigations. Nevertheless, available genomic and transcriptomic information for this organism is extremely deficient. The aim of this study was to characterize the M. incilis hepatic transcriptome using Illumina paired-end sequencing. A total of 32,082,124 RNA-Seq read pairs were generated utilizing the HiSeq platform and subsequently cleaned and assembled into 93,912 contigs (N50 = 2,019 bp). The analysis of species distribution revealed that M. incilis contigs had the highest number of hits to Stegastes partitus (13.4%). Using a sequence similarity search against the public databases GO and KEGG, a total of 7,301 and 16,967 contigs were annotated, respectively. The KEGG database showed that genes related to environmental information, metabolism and organismal system pathways were highly annotated. Complete or partial coding DNA sequences for several candidate genes associated with stress responses/detoxification of xenobiotics, as well as housekeeping genes, were employed to design primers that were successfully tested and validated by RT-qPCR. This study presents the first transcriptome resources for Mugil incilis and provides basic information for the development of genomic tools, such as the identification of RNA markers, useful to analyze environmental impacts on this Caribbean fish species.
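The N50 statistic quoted for the assembly is the length of the contig at which the cumulative length, taken from the longest contig downward, first reaches half of the total assembly length. A minimal sketch of that standard definition follows; the function name and the toy contig lengths are illustrative and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")

# Toy contig lengths (hypothetical).
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```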


Subjects
Smegmamorpha/genetics , Transcriptome/genetics , Animals , Computational Biology/methods , Expressed Sequence Tags , Gene Expression Profiling/methods , Genome/genetics , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Liver/metabolism , Molecular Sequence Annotation/methods , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods
8.
Database (Oxford) ; 2020, 2020 01 01.
Article in English | MEDLINE | ID: mdl-32507889

ABSTRACT

Modern biology produces data at a staggering rate. Yet, much of this biological data is still isolated in the text, figures, tables and supplementary materials of articles. As a result, biological information created at great expense is significantly underutilised. The protein motif biology field does not have sufficient resources to curate the corpus of motif-related literature and, to date, only a fraction of the available articles have been curated. In this study, we develop a set of tools and a web resource, 'articles.ELM', to rapidly identify the motif literature articles pertinent to a researcher's interest. At the core of the resource is a manually curated set of about 8000 motif-related articles. These articles are automatically annotated with a range of relevant biological data allowing in-depth search functionality. Machine-learning article classification is used to group articles based on their similarity to manually curated motif classes in the Eukaryotic Linear Motif resource. Articles can also be manually classified within the resource. The 'articles.ELM' resource permits the rapid and accurate discovery of relevant motif articles, thereby improving the visibility of motif literature and simplifying the recovery of valuable biological insights sequestered within scientific articles. Consequently, this web resource removes a critical bottleneck in scientific productivity for the motif biology field. Database URL: http://slim.icr.ac.uk/articles/.


Subjects
Amino Acid Motifs , Data Mining/methods , Databases, Protein , Molecular Sequence Annotation , Molecular Sequence Annotation/classification , Molecular Sequence Annotation/methods , Publications/classification
9.
Mol Genet Genomics ; 295(4): 837-841, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32300860

ABSTRACT

This work presents a new method and tool to solve a common problem for molecular biologists and geneticists who use molecular markers in their research and development: the curation of sequences. Omics studies conducted by molecular biologists and geneticists usually involve the use of molecular markers. AFLP, cDNA-AFLP, and MSAP are examples of markers that provide information at the genomic, transcriptomic, and epigenomic levels, respectively. These three types of molecular markers use adaptors that serve as the template for PCR amplification. The adaptor sequences have to be removed before the results are analyzed. Since a large number of sequences is usually obtained in these studies, this data clean-up can demand considerable time and effort. To automate this work, an R package named CleanBSequences was created; it allows sequences to be curated in bulk, quickly and without errors, and can be used offline. Curation is performed by aligning the forward and/or reverse primers or the ends of cloning vectors with the sequences to be removed. After the alignment, new subsequences are generated without the fragments not desired by the user, i.e., the sequences required by the techniques. In conclusion, the CleanBSequences tool facilitates the work of researchers, reducing time, effort, and working errors. The tool therefore addresses the problems related to the curation of sequences obtained with some types of molecular markers. In addition, being open source, CleanBSequences is a flexible tool that can be extended in the future to respond to new problems.
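The clean-up step described above can be illustrated with a much simpler stand-in. CleanBSequences is an R package that locates adaptors by alignment; the Python sketch below does not reproduce it and instead uses exact prefix and suffix matching, with a hypothetical function name and made-up adaptor sequences, purely to show the shape of the operation.

```python
def trim_adaptors(seq, forward, reverse):
    """Strip `forward` from the 5' end and `reverse` from the 3' end when present.

    Exact matching only (assumption): an alignment-based tool would also
    tolerate mismatches and partial adaptors.
    """
    if forward and seq.startswith(forward):
        seq = seq[len(forward):]
    if reverse and seq.endswith(reverse):
        seq = seq[:-len(reverse)]
    return seq

# Hypothetical adaptor sequences flanking a biological fragment.
forward_adaptor = "AAGG"
reverse_adaptor = "CCTT"
reads = ["AAGGACGTACGTCCTT", "AAGGTTTT", "ACGT"]

cleaned = [trim_adaptors(read, forward_adaptor, reverse_adaptor) for read in reads]
print(cleaned)
```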


Subjects
Computational Biology , Genetic Markers/genetics , Molecular Biology/methods , Software , Epigenomics/methods , Genomics/methods , Molecular Sequence Annotation/methods , Sequence Alignment/methods , Sequence Analysis/methods , Transcriptome/genetics
10.
Sci Rep ; 10(1): 1053, 2020 01 23.
Article in English | MEDLINE | ID: mdl-31974515

ABSTRACT

The common toad Rhinella arenarum is widely distributed in Argentina, where it is utilised as an autochthonous model in ecotoxicological research and environmental toxicology. However, the lack of a reference genome makes molecular assays and gene expression studies difficult to carry out on this non-model species. To address this issue, we performed a genome-wide transcriptome analysis on R. arenarum larvae through massive RNA sequencing, followed by de novo assembly, annotation, and gene prediction. We obtained 57,407 well-annotated transcripts representing 99.4% of transcriptome completeness (available at http://rhinella.uncoma.edu.ar). We also defined a set of 52,800 high-confidence lncRNA transcripts and demonstrated the reliability of the transcriptome data to perform phylogenetic analysis. Our comprehensive transcriptome analysis of R. arenarum represents a valuable resource to perform functional genomic studies and to identify potential molecular biomarkers in ecotoxicological research.


Subjects
Bufonidae/genetics , Genome/genetics , Transcriptome/genetics , Animals , Argentina , Base Sequence , Female , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Male , Molecular Sequence Annotation/methods , RNA, Long Noncoding/genetics , Sequence Analysis, RNA
11.
Sci Rep ; 8(1): 12034, 2018 08 13.
Article in English | MEDLINE | ID: mdl-30104688

ABSTRACT

Metagenomics research has recently thrived owing to improvements in DNA sequencing technologies, driving the emergence of new analysis tools and the growth of taxonomic databases. However, there is no all-purpose strategy that can guarantee the best result for a given project, and there are several combinations of software, parameters and databases that can be tested. Therefore, we performed an impartial comparison, using statistical measures of classification for eight bioinformatic tools and four taxonomic databases, defining a benchmark framework to evaluate each tool in a standardized context. Using in silico simulated data for 16S rRNA amplicons and whole metagenome shotgun data, we compared the results from different software and database combinations to detect biases related to algorithms or database annotation. Using our benchmark framework, researchers can define cut-off values to evaluate the expected error rate and coverage for their results, regardless of the score used by each software. A quick guide to select the best tool, all datasets and scripts to reproduce our results and benchmark any new method are available at https://github.com/Ales-ibt/Metagenomic-benchmark. Finally, we stress the importance of gold standards, database curation and manual inspection of taxonomic profiling results for a better and more accurate description of microbial diversity.
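The abstract does not list which statistical measures of classification were used, so the following sketch simply shows three common ones (precision, sensitivity and F1 score) computed from counts of true positives, false positives and false negatives. The function name and the example counts are assumptions for illustration, not values from the benchmark.

```python
def classification_metrics(tp, fp, fn):
    """Compute precision, sensitivity (recall) and F1 from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"precision": precision, "sensitivity": sensitivity, "f1": f1}

# Hypothetical counts for one tool/database combination on simulated reads.
metrics = classification_metrics(tp=80, fp=20, fn=20)
print(metrics)
```

With simulated data the true taxon of every read is known, which is what makes counts such as these computable in the first place.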


Subjects
Computational Biology/methods , Leptospira interrogans/genetics , Metagenome/genetics , Metagenomics/methods , Algorithms , Base Sequence , Databases, Genetic , Leptospira interrogans/classification , Molecular Sequence Annotation/methods , RNA, Ribosomal, 16S/genetics , Sequence Analysis, DNA , Software
12.
Sci Rep ; 8(1): 1794, 2018 01 29.
Article in English | MEDLINE | ID: mdl-29379090

ABSTRACT

Downstream analysis of genomic and transcriptomic sequence data is often executed by functional annotation that can be performed by various bioinformatics tools and biological databases. However, a fast, fully integrated tool is not available for such analysis. Besides, currently available software cannot produce analytic lists of annotations and graphs to help users evaluate the output results. Therefore, we present the Gene Ontology Functional Enrichment Annotation Tool (GO FEAT), a free web platform for functional annotation and enrichment of genomic and transcriptomic data based on sequence homology search. The analysis can be customized and visualized as per users' needs and specifications. GO FEAT is freely available at http://computationalbiology.ufpa.br/gofeat/ and its source code is hosted at https://github.com/fabriciopa/gofeat.


Subjects
Genomics/methods , Molecular Sequence Annotation/methods , Transcriptome/genetics , Computational Biology/methods , Databases, Genetic , Gene Ontology , Software
13.
Sci Rep ; 7(1): 17837, 2017 12 19.
Article in English | MEDLINE | ID: mdl-29259202

ABSTRACT

Although human mesenchymal stem cells (hMSCs) are a powerful tool for cell therapy, prolonged culture times result in replicative senescence or acquisition of tumorigenic features. To identify a molecular signature for senescence, we compared the transcriptome of senescent and young hMSCs with normal karyotype (hMSCs/n) and with a constitutional inversion of chromosome 3 (hMSCs/inv). Senescent and young cells from both lineages showed differentially expressed genes (DEGs), with higher levels in senescent hMSCs/inv. Among the 30 DEGs in senescent hMSCs/inv, 11 are new candidates for biomarkers of cellular senescence. The functional categories most represented in senescent hMSCs were related to cellular development, cell growth/proliferation, cell death, cell signaling/interaction, and cell movement. Mapping of DEGs onto biological networks revealed matrix metalloproteinase-1, thrombospondin 1, and epidermal growth factor acting as topological bottlenecks. In the comparison between senescent hMSCs/n and senescent hMSCs/inv, other functional annotations such as segregation of chromosomes, mitotic spindle formation, and mitosis and proliferation of tumor lines were most represented. We found that many genes were categorized into functional annotations related to tumors in both comparisons, with the relation to tumors being highest in senescent hMSCs/inv. The data presented here improve our understanding of the molecular mechanisms underlying the onset of cellular senescence as well as tumorigenesis.


Subjects
Carcinogenesis/genetics , Cellular Senescence/genetics , Mesenchymal Stem Cells/physiology , Biomarkers/metabolism , Cell Death/genetics , Cell Movement/genetics , Cell Proliferation/genetics , Cells, Cultured , Chromosome Segregation/genetics , Chromosomes, Human, Pair 3/genetics , Humans , Mesenchymal Stem Cells/metabolism , Mitosis/genetics , Molecular Sequence Annotation/methods , Phenotype , Signal Transduction/genetics
14.
Genet Mol Res ; 14(4): 15276-84, 2015 Nov 30.
Article in English | MEDLINE | ID: mdl-26634491

ABSTRACT

The leaves of tobacco plants were used to analyze differences in protein content of tobacco grown in the four main flue-cured tobacco-producing areas of Sichuan Province, China. An improved protein extraction method, isoelectric focusing/sodium dodecyl sulfate-polyacrylamide gel electrophoresis two-dimensional gel electrophoretic separation, was used to extract and separate total protein from tobacco leaves. Proteomic maps with relatively high resolution and repeatability were produced. At isoelectric points 4 to 7 and molecular weights ranging from 20 to 100 kDa, we detected 1032, 1030, 1019, and 1011 clearly visible protein spots in tobacco leaves from the four study areas. Proteome comparison between these protein spots showed that 119 spots with a greater than 2-fold change in expression quantity contributed to the variation in expression. Of these, 115 were successfully identified and annotated. According to the annotation results, these proteins participate in photosynthesis, energy metabolism, mineral nutrition, terpene metabolism, defensive reaction, and other physiological and biochemical processes. This study preliminarily explains the effects of ecological conditions on the physiological metabolism of tobacco leaves and how such effects directly or indirectly contribute to tobacco leaf quality.


Subjects
Nicotiana/genetics , Plant Leaves/genetics , Plant Proteins/genetics , Protein Processing, Post-Translational/genetics , Proteome/genetics , China , Energy Metabolism/genetics , Molecular Sequence Annotation/methods , Photosynthesis/genetics , Proteomics/methods
15.
J Bioinform Comput Biol ; 13(6): 1550021, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26223200

ABSTRACT

Noncoding RNAs (ncRNAs) have been the focus of intense research over the last few years. Since characteristics and signals of ncRNAs are not entirely known, researchers use different computational tools together with their biological knowledge to predict putative ncRNAs. In this context, this work presents ncRNA-Agents, a multi-agent system to annotate ncRNAs based on the output of different tools, using inference rules to simulate biologists' reasoning. Experiments with data from the fungus Saccharomyces cerevisiae allowed us to measure the performance of ncRNA-Agents, which showed better sensitivity than Infernal, a widely used tool for annotating ncRNAs. In addition, novel putative ncRNAs were identified in data from the fungi Schizosaccharomyces pombe and Paracoccidioides brasiliensis, which demonstrated the usefulness of our approach. ncRNA-Agents can be found at: http://www.biomol.unb.br/ncrna-agents.


Subjects
Computational Biology/methods , RNA, Untranslated/genetics , Software , Databases, Genetic , Molecular Sequence Annotation/methods , Paracoccidioides/genetics , Saccharomyces cerevisiae/genetics , Schizosaccharomyces/genetics
16.
Genet Mol Res ; 13(4): 10891-7, 2014 Dec 19.
Article in English | MEDLINE | ID: mdl-25526209

ABSTRACT

Gene annotation plays a key role in subsequent biochemical and molecular biological studies of various organisms. There are some errors in the original annotation of sequenced genomes because of the lack of sufficient data, and these errors may propagate into other genomes. Therefore, genome annotation must be checked from time to time to evaluate newly accumulated data. In this study, we evaluated the gene density of 2606 bacteria or archaea and selected the 2 genomes with extreme values, the minimum (Chloroflexus aurantiacus strain J-10-fl) and the maximum (Natrinema sp. J7-2), for genome re-annotation. In the genome of C. aurantiacus strain J-10-fl, we identified 17 new genes with definite functions and eliminated 34 non-coding open reading frames; in the genome of Natrinema sp. J7-2, we eliminated 118 non-coding open reading frames. Our re-annotation procedure may provide a reference for improving the annotation of other bacterial genomes.
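The screening step, ranking genomes by gene density and picking the extremes, can be sketched as below. The abstract does not define gene density, so genes per kilobase is an assumption here, and the genome names and counts are hypothetical placeholders rather than values from the study.

```python
def gene_density(n_genes, genome_bp):
    """Genes per kilobase of genome (assumed definition)."""
    return n_genes / (genome_bp / 1000)

def extremes(genomes):
    """Return the names of the genomes with the lowest and highest gene density."""
    density = {name: gene_density(n, bp) for name, (n, bp) in genomes.items()}
    lowest = min(density, key=density.get)
    highest = max(density, key=density.get)
    return lowest, highest

# Hypothetical genomes: name -> (annotated gene count, genome length in bp).
genomes = {
    "genome_A": (3000, 5_000_000),
    "genome_B": (4000, 4_000_000),
    "genome_C": (4500, 3_000_000),
}
print(extremes(genomes))
```

Outliers at either end are natural re-annotation candidates: an unusually low density may hide missed genes, while an unusually high one may include spurious open reading frames.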


Subjects
Chloroflexus/genetics , Halobacteriaceae/genetics , Molecular Sequence Annotation/methods , Archaeal Proteins/genetics , Bacterial Proteins/genetics , Chloroflexus/classification , Genome Size , Genome, Archaeal , Genome, Bacterial , Halobacteriaceae/classification , Sequence Analysis, DNA
17.
Rev. bras. pesqui. méd. biol ; Braz. j. med. biol. res;47(10): 834-841, 10/2014. tab, graph
Article in English | LILACS | ID: lil-722173

ABSTRACT

In this study, biomarkers and transcription factor motifs were identified in order to investigate the etiology and phenotypic severity of Down syndrome. GSE 1281, GSE 1611, and GSE 5390 were downloaded from the Gene Expression Omnibus (GEO). A robust multiarray analysis (RMA) algorithm was applied to detect differentially expressed genes (DEGs). In order to screen for biological pathways and to interrogate the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database, the Database for Annotation, Visualization and Integrated Discovery (DAVID) was used to carry out a gene ontology (GO) function enrichment for DEGs. Finally, a transcriptional regulatory network was constructed, and a hypergeometric distribution test was applied to select for significantly enriched transcription factor motifs. CBR1, DYRK1A, HMGN1, ITSN1, RCAN1, SON, TMEM50B, and TTC3 were each up-regulated two-fold in Down syndrome samples compared to normal samples; of these, SON and TTC3 were newly reported. CBR1, DYRK1A, HMGN1, ITSN1, RCAN1, SON, TMEM50B, and TTC3 were located on human chromosome 21 (mouse chromosome 16). The DEGs were significantly enriched in macromolecular complex subunit organization and focal adhesion pathways. Eleven significantly enriched transcription factor motifs (PAX5, EGR1, XBP1, SREBP1, OLF1, MZF1, NFY, NFKAPPAB, MYCMAX, NFE2, and RP58) were identified. The DEGs and transcription factor motifs identified in our study provide biomarkers for the understanding of Down syndrome pathogenesis and progression.
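The hypergeometric distribution test mentioned above asks how likely it is to see at least k motif-bearing genes among n DEGs when K of the N background genes carry the motif. The sketch below computes that upper-tail probability directly from binomial coefficients. The function name and the example numbers are illustrative assumptions, and the abstract does not state how the study set up its background or corrected for multiple testing.

```python
from math import comb

def hypergeom_pvalue(N, K, n, k):
    """Upper-tail probability P(X >= k) for a hypergeometric variable.

    N: background genes, K: background genes with the motif,
    n: genes drawn (e.g. DEGs), k: drawn genes with the motif.
    """
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Illustrative numbers only (not from the study).
print(hypergeom_pvalue(N=10, K=5, n=5, k=5))
```

A small p-value means the motif appears among the DEGs more often than random sampling from the background would explain.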


Subjects
Animals , Humans , Mice , Rats , Amino Acid Motifs/genetics , Computational Biology/methods , Down Syndrome/genetics , Gene Regulatory Networks/genetics , Transcription Factors/analysis , Algorithms , Biomarkers/analysis , Databases, Genetic , Down Syndrome/etiology , Gene Expression , Gene Ontology , Molecular Sequence Annotation/methods , Phenotype , Protein Array Analysis/methods , Up-Regulation/genetics
18.
Braz J Med Biol Res ; 47(10): 834-41, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25118625

ABSTRACT

In this study, biomarkers and transcription factor motifs were identified in order to investigate the etiology and phenotypic severity of Down syndrome. GSE 1281, GSE 1611, and GSE 5390 were downloaded from the Gene Expression Omnibus (GEO). A robust multiarray analysis (RMA) algorithm was applied to detect differentially expressed genes (DEGs). In order to screen for biological pathways and to interrogate the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database, the Database for Annotation, Visualization, and Integrated Discovery (DAVID) was used to carry out a Gene Ontology (GO) functional enrichment analysis of the DEGs. Finally, a transcriptional regulatory network was constructed, and a hypergeometric distribution test was applied to select significantly enriched transcription factor motifs. CBR1, DYRK1A, HMGN1, ITSN1, RCAN1, SON, TMEM50B, and TTC3 were each up-regulated two-fold in Down syndrome samples compared to normal samples; of these, SON and TTC3 were newly reported. CBR1, DYRK1A, HMGN1, ITSN1, RCAN1, SON, TMEM50B, and TTC3 were located on human chromosome 21 (mouse chromosome 16). The DEGs were significantly enriched in macromolecular complex subunit organization and focal adhesion pathways. Eleven significantly enriched transcription factor motifs (PAX5, EGR1, XBP1, SREBP1, OLF1, MZF1, NFY, NFKAPPAB, MYCMAX, NFE2, and RP58) were identified. The DEGs and transcription factor motifs identified in our study provide biomarkers for the understanding of Down syndrome pathogenesis and progression.
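
The abstract does not give the details of its hypergeometric distribution test, so the following is only a minimal sketch of how such a motif enrichment test is commonly set up. It assumes an upper-tail test: given a background of genes, some of which carry a motif, it computes the probability of seeing at least the observed number of motif-carrying genes among the DEGs. The function name, parameter names, and the example counts are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_pvalue(population: int, motif_in_population: int,
                                sample: int, motif_in_sample: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= motif_in_sample).

    population          -- number of background genes (assumed input)
    motif_in_population -- background genes carrying the motif
    sample              -- number of DEGs drawn from the background
    motif_in_sample     -- DEGs carrying the motif
    """
    if not (0 <= motif_in_population <= population and 0 <= sample <= population):
        raise ValueError("counts must lie between 0 and the population size")

    upper = min(motif_in_population, sample)
    if motif_in_sample > upper:
        return 0.0  # more hits than is possible under the model
    lower = max(motif_in_sample, 0)

    # Sum the exact integer counts of all outcomes at least as extreme,
    # then divide once by the total number of possible samples.
    tail = sum(
        comb(motif_in_population, i)
        * comb(population - motif_in_population, sample - i)
        for i in range(lower, upper + 1)
    )
    return tail / comb(population, sample)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    p = hypergeom_enrichment_pvalue(20000, 400, 150, 12)
    print(f"enrichment p-value: {p:.3g}")
```

In practice one p-value would be computed per motif and then corrected for multiple testing; whether and how the authors did so is not stated in the abstract.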


Subjects
Amino Acid Motifs/genetics , Computational Biology/methods , Down Syndrome/genetics , Gene Regulatory Networks/genetics , Transcription Factors/analysis , Algorithms , Animals , Biomarkers/analysis , Databases, Genetic , Down Syndrome/etiology , Gene Expression , Gene Ontology , Humans , Mice , Molecular Sequence Annotation/methods , Phenotype , Protein Array Analysis/methods , Rats , Up-Regulation/genetics
19.
PLoS One ; 9(2): e89162, 2014.
Article in English | MEDLINE | ID: mdl-24586563

ABSTRACT

The volume and diversity of biological data are increasing at very high rates. Vast amounts of protein sequences and structures, protein and genetic interactions and phenotype studies have been produced. The majority of data generated by high-throughput devices is automatically annotated because manual annotation is not feasible. Thus, efficient and precise automatic annotation methods are required to ensure the quality and reliability of both the biological data and associated annotations. We proposed ENZYMatic Annotation Predictor (ENZYMAP), a technique to characterize and predict EC number changes based on annotations from UniProt/Swiss-Prot using a supervised learning approach. We evaluated ENZYMAP experimentally, using test data sets from both UniProt/Swiss-Prot and UniProt/TrEMBL, and showed that predicting EC changes using selected types of annotation is possible. Finally, we compared ENZYMAP and DETECT with respect to their predictions and checked both against the UniProt/Swiss-Prot annotations. ENZYMAP was shown to be more accurate than DETECT, coming closer to the actual changes in UniProt/Swiss-Prot. Our proposal is intended to be an automatic complementary method (one that can be used together with other techniques, such as those based on protein sequence and structure) that helps to improve the quality and reliability of enzyme annotations over time by suggesting possible corrections, anticipating annotation changes and propagating the implicit knowledge to the whole dataset.


Subjects
Databases, Protein , Enzymes , Molecular Sequence Annotation/methods , Software , Animals , Computational Biology/methods , Enzymes/chemistry , Enzymes/metabolism , Forecasting , Humans , Models, Molecular , Protein Structure, Tertiary
20.
Malar J ; 11: 375, 2012 Nov 15.
Article in English | MEDLINE | ID: mdl-23153225

ABSTRACT

BACKGROUND: The signal peptide is one of the most important motifs involved in protein trafficking and it ultimately influences protein function. Considering the expected functional conservation among orthologs, it was hypothesized that divergence in signal peptides within orthologous groups is mainly due to N-terminal protein sequence misannotation. Thus, discrepancies in signal peptide prediction of orthologous proteins were used to identify misannotated proteins in five Plasmodium species. METHODS: Signal peptide prediction (SignalP) and orthology (OrthoMCL) were combined in an innovative strategy to identify orthologous groups showing discrepancies in signal peptide prediction among their protein members (Mixed groups). In a comparative analysis, multiple alignments for each of these groups and gene models were visually inspected in search of misannotated proteins and, whenever possible, alternative gene models were proposed. Thresholds for signal peptide prediction parameters were also modified to reduce their impact as a possible source of discrepancy among orthologs. Validation of new gene models was based on RT-PCR (few examples) or on experimental evidence already published (ApiLoc). RESULTS: The rate of misannotated proteins was significantly higher in Mixed groups than in Positive or Negative groups, corroborating the proposed hypothesis. A total of 478 proteins were reannotated, and a change of signal peptide prediction from negative to positive was the most common outcome. Reannotations triggered the conversion of almost 50% of all Mixed groups, which were further reduced by optimization of signal peptide prediction parameters. CONCLUSIONS: The methodological novelty proposed here, combining orthology and signal peptide prediction, proved to be an effective strategy for the identification of proteins with incorrectly annotated N-terminal sequences, and it might have an important impact on the data available for genome-wide searching of potential vaccine and drug targets and proteins involved in host/parasite interactions, as demonstrated for five Plasmodium species.
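
The grouping step described in METHODS can be illustrated with a small sketch. It assumes that a Positive group is one in which every member is predicted to have a signal peptide, a Negative group is one in which no member is, and a Mixed group contains both kinds of prediction; the abstract names these categories but does not define them formally. The function name and the protein identifiers are hypothetical, and the boolean calls stand in for SignalP output that would be parsed separately.

```python
def classify_group(has_signal_peptide: dict[str, bool]) -> str:
    """Label one orthologous group by the agreement of its signal peptide calls.

    has_signal_peptide maps each member protein ID to True when a signal
    peptide is predicted for it and False otherwise (assumed input format).
    """
    if not has_signal_peptide:
        raise ValueError("an orthologous group needs at least one member")

    calls = set(has_signal_peptide.values())
    if calls == {True}:
        return "Positive"
    if calls == {False}:
        return "Negative"
    return "Mixed"  # members disagree: candidate for manual inspection


if __name__ == "__main__":
    # Hypothetical orthologous groups with made-up protein IDs.
    groups = {
        "OG_0001": {"sp1_a": True, "sp2_a": True, "sp3_a": True},
        "OG_0002": {"sp1_b": False, "sp2_b": False},
        "OG_0003": {"sp1_c": True, "sp2_c": False, "sp3_c": True},
    }
    for name, members in groups.items():
        print(name, classify_group(members))
```

Only the groups labelled Mixed would then go on to the alignment and gene model inspection that the abstract describes.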


Subjects
Computational Biology/methods , Molecular Sequence Annotation/methods , Plasmodium/genetics , Protein Sorting Signals , Protozoan Proteins/genetics