Results 1 - 7 of 7
1.
Patterns (N Y) ; 4(2): 100691, 2023 Feb 10.
Article in English | MEDLINE | ID: mdl-36873903

ABSTRACT

The automatic annotation of the protein universe is still an unresolved challenge. Today, there are 229,149,489 entries in the UniProtKB database, but only 0.25% of them have been functionally annotated. This manual process integrates knowledge from the protein families database Pfam, annotating family domains using sequence alignments and hidden Markov models. Under this approach, Pfam annotations have grown slowly in recent years. Recently, deep learning models have appeared that can learn evolutionary patterns from unaligned protein sequences. However, these models require large-scale data, while many families contain just a few sequences. Here, we contend that this limitation can be overcome by transfer learning, exploiting the full potential of self-supervised learning on large unannotated data and then supervised learning on a small labeled dataset. We show results in which errors in protein family prediction can be reduced by 55% with respect to standard methods.

2.
Gene ; 869: 147393, 2023 Jun 15.
Article in English | MEDLINE | ID: mdl-36966978

ABSTRACT

In angiosperms, the mitochondrial cox2 gene harbors up to two introns, commonly referred to as cox2i373 and cox2i691. We studied the cox2 gene from 222 fully sequenced mitogenomes from 30 angiosperm orders and analyzed the evolution of its introns. Unlike cox2i373, cox2i691 shows a distribution among plants that is shaped by frequent intron loss events driven by localized retroprocessing. In addition, cox2i691 exhibits sporadic elongations, frequently in intron domain IV. Such elongations are poorly related to repeat content, and two of them contain LINE transposons, suggesting that the increase in intron size is very likely due to nuclear intracellular DNA transfer followed by incorporation into the mitochondrial DNA. Surprisingly, we found that cox2i691 is erroneously annotated as absent in 30 mitogenomes deposited in public databases. Although each of the cox2 introns is ∼1.5 kb in length, a cox2i691 of 4.2 kb has been reported in Acacia ligulata (Fabaceae). It is still unclear whether its unusual length is due to a trans-splicing arrangement or to the loss of functionality of the interrupted cox2. By analyzing short-read RNA sequencing data from Acacia with a multi-step computational strategy, we found that the Acacia cox2 is functional and that its long intron is spliced in cis very efficiently despite its length.


Subject(s)
Magnoliopsida , Introns/genetics , Magnoliopsida/genetics , Mitochondria/genetics , RNA Splicing , Base Sequence
3.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35136916

ABSTRACT

The gene ontology (GO) provides a hierarchical structure with a controlled vocabulary composed of terms describing functions and localization of gene products. Recent works propose vector representations, also known as embeddings, of GO terms that capture meaningful information about them. Significant performance improvements have been observed when these representations are used on diverse downstream tasks, such as the measurement of semantic similarity between GO terms and functional similarity between proteins. Despite the success shown by these approaches, existing embeddings of GO terms still fail to capture crucial structural features of the GO. Here, we present anc2vec, a novel protocol based on neural networks for constructing vector representations of GO terms by preserving three important ontological features of each term: its ontological uniqueness, its ancestor hierarchy and its sub-ontology membership. The advantages of using anc2vec are demonstrated by systematic experiments on diverse tasks: visualization, sub-ontology prediction, inference of structurally related terms, retrieval of terms from aggregated embeddings, and prediction of protein-protein interactions. In these tasks, experimental results show that the performance of anc2vec representations is better than that of recent approaches. This demonstrates that higher performance on diverse tasks can be achieved by embeddings that better represent the structure of the GO. Full source code and data are available at https://github.com/sinc-lab/anc2vec.
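One of the tasks listed above, retrieving terms from aggregated embeddings, can be pictured with a minimal sketch. It assumes that aggregation is an element-wise mean and that similarity is cosine similarity; both choices, the toy vectors and the names `term_a`, `term_b` and `term_c` are hypothetical and are not taken from anc2vec or its repository.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def aggregate(vectors):
    """Element-wise mean of several embeddings (one simple aggregation choice)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def nearest_terms(query, embeddings, k=2):
    """Return the k term IDs whose embeddings are most cosine-similar to `query`."""
    ranked = sorted(embeddings, key=lambda term: cosine(query, embeddings[term]), reverse=True)
    return ranked[:k]

# Toy 3-dimensional embeddings for hypothetical terms (not real anc2vec output).
embeddings = {
    "term_a": [1.0, 0.0, 0.0],
    "term_b": [0.9, 0.1, 0.0],
    "term_c": [0.0, 0.0, 1.0],
}
query = aggregate([embeddings["term_a"], embeddings["term_b"]])
print(nearest_terms(query, embeddings))  # ['term_a', 'term_b']
```

The retrieval step only works as well as the embeddings allow: if related terms are not close in the vector space, averaging their vectors will not point back to them.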


Subject(s)
Semantics , Software , Computational Biology/methods , Gene Ontology , Neural Networks, Computer , Proteins/genetics
4.
Comput Biol Med ; 136: 104682, 2021 09.
Article in English | MEDLINE | ID: mdl-34343887

ABSTRACT

In land plant mitochondria, C-to-U RNA editing converts cytidines into uridines at highly specific RNA positions called editing sites. This editing step is essential for the correct functioning of mitochondrial proteins. When sequence homology information is used, edited positions can be computationally predicted with high precision. However, predictions based on the sequence contexts of such edited positions often result in lower precision, which limits further advances in novel genetic engineering techniques for RNA regulation. Here, a deep convolutional neural network called Deepred-Mt is proposed. It predicts C-to-U editing events based on the 40 nucleotides flanking a given cytidine. Unlike existing methods, Deepred-Mt was optimized by using editing extent information, novel strategies of data augmentation, and a large-scale training dataset constructed with deep RNA sequencing data of 21 plant mitochondrial genomes. In comparison to predictive methods based on sequence homology, Deepred-Mt attains significantly better predictive performance, in terms of both average precision and F1 score. In addition, our approach is able to recognize well-known sequence motifs linked to RNA editing, and it shows that the local RNA structure surrounding editing sites may be a relevant factor regulating their editing. These results demonstrate that Deepred-Mt is an effective tool for predicting C-to-U RNA editing in plant mitochondria. Source code, datasets, and detailed use cases are freely available at https://github.com/aedera/deepredmt.
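The input described above, the 40 nucleotides flanking a candidate cytidine, can be illustrated with a small preprocessing sketch that a convolutional model could consume. It assumes 20 nt on each side, excludes the central C, pads sequence ends with `N` and uses one-hot encoding; these choices and the function names are hypothetical and do not reproduce the Deepred-Mt code.

```python
BASES = "ACGT"

def flanking_window(seq, pos, flank=20, pad="N"):
    """Return `flank` nt upstream plus `flank` nt downstream of the cytidine at `pos`.

    The central C itself is excluded (an assumption), so the default window is 40 nt.
    Positions beyond either end of `seq` are filled with `pad`.
    """
    if seq[pos] != "C":
        raise ValueError(f"position {pos} holds {seq[pos]!r}, not a cytidine")
    padded = pad * flank + seq + pad * flank
    center = pos + flank  # index of the cytidine inside `padded`
    upstream = padded[center - flank:center]
    downstream = padded[center + 1:center + 1 + flank]
    return upstream + downstream

def one_hot(window):
    """Encode each nucleotide as a 4-vector over A, C, G, T; any other symbol becomes all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in window]

window = flanking_window("AAAACGGGG", 4, flank=2)
print(window)           # AAGG
print(one_hot(window))  # [[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0]]
```

Padding with `N` keeps every window the same length, which a fixed-size convolutional input requires, even for cytidines near the ends of a transcript.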


Subject(s)
Mitochondria , RNA Editing , Mitochondria/genetics , RNA Editing/genetics
5.
Methods Mol Biol ; 2181: 13-34, 2021.
Article in English | MEDLINE | ID: mdl-32729072

ABSTRACT

Computers are able to systematically exploit RNA-seq data, allowing us to efficiently detect RNA editing sites on a genome-wide scale. This chapter introduces a very flexible computational framework for detecting RNA editing sites in plant organelles. This framework comprises three major steps: RNA-seq data processing, RNA read alignment, and RNA editing site detection. Each step is discussed in sufficient detail to be implemented by the reader. As a case study, the framework is applied to publicly available sequencing data to detect C-to-U RNA editing sites in the coding sequences of the mitochondrial genome of Nicotiana tabacum.
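The third step above, RNA editing site detection, can be sketched once reads have been aligned and summarized into per-position base counts (a pileup). This minimal sketch assumes plus-strand counts only, uses hypothetical thresholds `min_depth` and `min_extent`, and ignores other mismatch types; it is not the chapter's implementation.

```python
from dataclasses import dataclass

@dataclass
class EditingSite:
    position: int   # 0-based position in the reference
    c_reads: int    # reads supporting the unedited C
    t_reads: int    # reads supporting T (U in the transcript)
    extent: float   # editing extent = t_reads / (c_reads + t_reads)

def call_c_to_u_sites(reference, base_counts, min_depth=10, min_extent=0.1):
    """Flag reference C positions where aligned RNA reads show T instead of C.

    reference   : reference sequence in the DNA alphabet (plus strand)
    base_counts : one dict per reference position, e.g. {"C": 12, "T": 8}
    min_depth, min_extent : hypothetical filtering thresholds
    """
    sites = []
    for i, ref_base in enumerate(reference):
        if ref_base != "C":
            continue  # C-to-U editing can only occur at genomic cytidines
        c = base_counts[i].get("C", 0)
        t = base_counts[i].get("T", 0)
        depth = c + t
        if depth < min_depth:
            continue  # too few informative reads to estimate the editing extent
        extent = t / depth
        if extent >= min_extent:
            sites.append(EditingSite(i, c, t, extent))
    return sites

counts = [{"A": 20}, {"C": 5, "T": 15}, {"G": 20}, {"C": 19, "T": 1}]
for site in call_c_to_u_sites("ACGC", counts):
    print(site.position, round(site.extent, 2))  # 1 0.75
```

Reporting the editing extent alongside each call, rather than a yes/no label, keeps partially edited sites visible for downstream analysis.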


Subject(s)
Computational Biology/methods , Genome, Mitochondrial , Mitochondria/genetics , Nicotiana/genetics , RNA Editing/genetics , RNA, Mitochondrial/genetics , Cytidine/chemistry , Cytidine/genetics , High-Throughput Nucleotide Sequencing , Mitochondria/metabolism , RNA, Mitochondrial/metabolism , Software , Nicotiana/metabolism , Transcriptome , Uridine/chemistry , Uridine/genetics
6.
New Phytol ; 229(3): 1701-1714, 2021 02.
Article in English | MEDLINE | ID: mdl-32929737

ABSTRACT

Although horizontal gene transfer (HGT) is common in angiosperm mitochondrial DNAs (mtDNAs), few cases of functional foreign genes have been identified. The one outstanding candidate for large-scale functional HGT is the holoparasite Lophophytum mirabile, whose mtDNA has lost most native genes but contains intact foreign homologs acquired from legume host plants. To investigate the extent to which this situation results from functional replacement of native by foreign genes, functional mitochondrial gene transfer to the nucleus, and/or loss of mitochondrial biochemical function in the context of extreme parasitism, we examined the Lophophytum mitochondrial and nuclear transcriptomes by deep paired-end RNA sequencing. Most foreign mitochondrial genes in Lophophytum are highly transcribed, accurately spliced, and efficiently RNA edited. By contrast, we found no evidence for functional gene transfer to the nucleus or loss of mitochondrial functions in Lophophytum. Many functional replacements occurred via the physical replacement of native genes by foreign genes. Some of these events probably occurred as the final act of HGT itself. Lophophytum mtDNA has experienced an unprecedented level of functional replacement of native genes by foreign copies. This raises important questions concerning population-genetic and molecular regimes that underlie such a high level of foreign gene takeover.


Subject(s)
Genes, Mitochondrial , Genome, Mitochondrial , DNA, Mitochondrial , Evolution, Molecular , Gene Transfer, Horizontal/genetics , Phylogeny
7.
Plant Mol Biol ; 97(3): 215-231, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29761268

ABSTRACT

KEY MESSAGE: Our understanding of the dynamics and evolution of RNA editing in angiosperms is in part limited by the few editing sites identified to date. This study identified 10,217 editing sites from 17 diverse angiosperms. Our analyses confirmed the universality of certain features of RNA editing and offer new evidence on the loss of editing sites in angiosperms. RNA editing is a post-transcriptional process that converts cytidines (C) into uridines (U) in organellar transcripts of angiosperms. These substitutions mostly take place in mitochondrial messenger RNAs at specific positions called editing sites. Using publicly available RNA-seq data, this study identified 10,217 editing sites in mitochondrial protein-coding genes of 17 diverse angiosperms. Even though other types of mismatches were also identified, we did not find evidence of non-canonical editing processes. The results showed an uneven distribution of editing sites among species, genes, and codon positions. The analyses revealed that editing sites were conserved across angiosperms, although some sites were species-specific. Non-synonymous editing sites were particularly highly conserved (~80%) across the plant species and were efficiently edited (~80% editing extent). In contrast, editing sites at third codon positions were poorly conserved (~30%) and only partially edited (~40% editing extent). We found that the loss of editing sites during angiosperm evolution occurs mainly through the replacement of editing sites by thymidines, rather than through degradation of the editing recognition motif around editing sites. Consecutive and highly conserved editing sites had been replaced by thymidines as a result of retroprocessing, by which edited transcripts are reverse transcribed to cDNA and then integrated into the genome by homologous recombination. This phenomenon was more pronounced in eudicots and in the gene cox1. These results suggest that retroprocessing is a widespread driving force underlying the loss of editing sites in angiosperm mitochondria.
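The comparison above between non-synonymous sites and sites at third codon positions rests on classifying each edit by codon position and by whether it changes the encoded amino acid. A minimal sketch of that classification follows, assuming an in-frame coding sequence starting at index 0 and the standard genetic code; the function name `classify_edit` is hypothetical and not from the study.

```python
BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG order of the three codon positions.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

def classify_edit(cds, pos):
    """Classify a C-to-U edit at 0-based `pos` of an in-frame coding sequence.

    Returns (codon_position, is_synonymous); U is written as T (DNA alphabet).
    """
    if cds[pos] != "C":
        raise ValueError(f"position {pos} holds {cds[pos]!r}, not a cytidine")
    offset = pos % 3
    codon = cds[pos - offset:pos - offset + 3]
    edited = codon[:offset] + "T" + codon[offset + 1:]
    return offset + 1, CODON_TABLE[codon] == CODON_TABLE[edited]

print(classify_edit("ATGTCA", 4))  # (2, False): TCA (Ser) -> TTA (Leu)
print(classify_edit("GGC", 2))     # (3, True):  GGC (Gly) -> GGT (Gly)
```

Edits at third codon positions are often synonymous under this classification, which is one reason they can be lost or partially edited with little effect on the protein.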


Subject(s)
Magnoliopsida/genetics , Mitochondria/genetics , RNA Editing , Base Pair Mismatch , Codon/genetics , Genes, Plant/genetics , Genome, Mitochondrial/genetics , Phylogeny , RNA Editing/genetics , Thymidine , Transcriptome/genetics