Results 1 - 7 of 7
1.
Brief Funct Genomics ; 23(4): 441-451, 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-38242863

ABSTRACT

Cell type identification is an important task for single-cell RNA-sequencing (scRNA-seq) data analysis. Many prediction methods have recently been proposed, but predictive accuracy on difficult cell type identification tasks is still low. In this work, we propose a novel Gaussian noise augmentation-based scRNA-seq contrastive learning method (GsRCL) to learn discriminative feature representations for cell type identification tasks. A large-scale computational evaluation suggests that GsRCL outperforms other state-of-the-art predictive methods on difficult cell type identification tasks, while the conventional random genes masking augmentation-based contrastive learning method also improves the accuracy of easy cell type identification tasks in general.
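The two augmentation strategies named in this abstract can be illustrated with a minimal sketch. It is not the GsRCL implementation: the function names, the noise scale `sigma`, the masking fraction, and the toy expression vector are all assumptions made for illustration. In contrastive learning, two augmented views of the same cell are typically treated as a positive pair.

```python
import random

def gaussian_noise_view(expression, sigma=0.1, rng=None):
    """Return a copy of the profile with zero-mean Gaussian noise added to every gene."""
    if rng is None:
        rng = random.Random()
    return [x + rng.gauss(0.0, sigma) for x in expression]

def random_mask_view(expression, mask_fraction=0.2, rng=None):
    """Return a copy of the profile with a random subset of genes set to zero."""
    if rng is None:
        rng = random.Random()
    n_masked = int(len(expression) * mask_fraction)
    masked = set(rng.sample(range(len(expression)), n_masked))
    return [0.0 if i in masked else x for i, x in enumerate(expression)]

# Hypothetical normalised expression profile of one cell (10 genes).
rng = random.Random(0)
cell = [0.0, 1.2, 3.4, 0.0, 2.1, 0.7, 0.0, 5.0, 1.1, 0.3]

# Two independently noised views of the same cell form a positive pair.
view_a = gaussian_noise_view(cell, sigma=0.1, rng=rng)
view_b = gaussian_noise_view(cell, sigma=0.1, rng=rng)

# The masking alternative zeroes out genes instead of perturbing them.
masked = random_mask_view(cell, mask_fraction=0.2, rng=rng)
```

Noise perturbs every gene slightly, whereas masking removes a few genes entirely; which one helps more on a given task is an empirical question that the abstract reports on.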


Subjects
RNA-Seq , Single-Cell Analysis , Single-Cell Analysis/methods , RNA-Seq/methods , Normal Distribution , Machine Learning , Humans , Sequence Analysis, RNA/methods , Computational Biology/methods , Algorithms , Single-Cell Gene Expression Analysis
2.
Cell Rep ; 39(12): 110979, 2022 06 21.
Article in English | MEDLINE | ID: mdl-35732129

ABSTRACT

Vertebrate evolution was accompanied by two rounds of whole-genome duplication followed by functional divergence in terms of regulatory circuits and gene expression patterns. As a basal and slow-evolving chordate species, amphioxus is an ideal paradigm for exploring the origin and evolution of vertebrates. Single-cell sequencing has been widely used to construct the developmental cell atlas of several representative species of vertebrates (human, mouse, zebrafish, and frog) and tunicates (sea squirts). Here, we perform single-nucleus RNA sequencing (snRNA-seq) and single-cell assay for transposase accessible chromatin sequencing (scATAC-seq) for different stages of amphioxus (covering embryogenesis and adult tissues). With the datasets generated, we construct a developmental tree for amphioxus cell fate commitment and lineage specification and characterize the underlying key regulators and genetic regulatory networks. The data are publicly available on the online platform AmphioxusAtlas.


Subjects
Lancelets , Animals , Chromatin/genetics , Gene Expression , Genome , Lancelets/genetics , Mice , Zebrafish/genetics
3.
PLoS One ; 14(7): e0209958, 2019.
Article in English | MEDLINE | ID: mdl-31335894

ABSTRACT

Protein-protein interaction network data provides valuable information that infers direct links between genes and their biological roles. This information brings a fundamental hypothesis for protein function prediction: interacting proteins tend to have similar functions. With the help of recently developed network embedding feature generation methods and deep maxout neural networks, it is possible to extract functional representations that encode direct links between protein-protein interaction information and protein function. Our novel method, STRING2GO, successfully adopts deep maxout neural networks to learn functional representations simultaneously encoding both protein-protein interactions and functional predictive information. The experimental results show that STRING2GO outperforms other protein-protein interaction network-based prediction methods and one benchmark method adopted in a recent large-scale protein function prediction competition.
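For readers unfamiliar with maxout networks, the sketch below shows what a single maxout layer computes: each unit outputs the maximum of several affine functions of its input. This is a generic illustration of the maxout activation only, not the STRING2GO architecture; the function names, weights, and the two-dimensional toy "embedding" are assumptions chosen so the result is easy to check by hand.

```python
def maxout_unit(inputs, pieces):
    """Return the largest affine response among the unit's pieces.

    pieces is a list of (weights, bias) pairs, one per linear piece.
    """
    return max(
        sum(w * x for w, x in zip(weights, inputs)) + bias
        for weights, bias in pieces
    )

def maxout_layer(inputs, units):
    """Apply every maxout unit to the same input vector."""
    return [maxout_unit(inputs, pieces) for pieces in units]

# Hypothetical 2-dimensional network embedding of one protein.
embedding = [1.0, -2.0]

# Two units with two linear pieces each (hand-picked, not learned).
units = [
    [([1.0, 0.0], 0.0), ([-1.0, 0.0], 0.0)],  # max(x0, -x0) = |x0|
    [([0.0, 1.0], 0.0), ([0.0, 0.0], 0.0)],   # max(x1, 0) = ReLU(x1)
]

hidden = maxout_layer(embedding, units)  # [1.0, 0.0]
```

The hand-picked weights show that a maxout unit can reproduce familiar activations such as the absolute value or ReLU; in a trained network the pieces are learned rather than fixed.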


Subjects
Computational Biology , Neural Networks, Computer , Protein Interaction Mapping , Protein Interaction Maps , Proteins , Humans , Proteins/genetics , Proteins/metabolism
4.
PLoS One ; 13(6): e0198216, 2018.
Article in English | MEDLINE | ID: mdl-29889900

ABSTRACT

Machine learning methods for protein function prediction are urgently needed, especially now that a substantial fraction of known sequences remains unannotated despite the extensive use of functional assignments based on sequence similarity. One major bottleneck supervised learning faces in protein function prediction is the structured, multi-label nature of the problem, because biological roles are represented by lists of terms from hierarchically organised controlled vocabularies such as the Gene Ontology. In this work, we build on recent developments in the area of deep learning and investigate the usefulness of multi-task deep neural networks (MTDNN), which consist of upstream shared layers upon which are stacked in parallel as many independent modules (additional hidden layers with their own output units) as the number of output GO terms (the tasks). MTDNN learns individual tasks partially using shared representations and partially from task-specific characteristics. When no close homologues with experimentally validated functions can be identified, MTDNN gives more accurate predictions than baseline methods based on annotation frequencies in public databases or homology transfers. More importantly, the results show that MTDNN binary classification accuracy is higher than that of alternative machine learning-based methods that do not exploit commonalities and differences among prediction tasks. Interestingly, compared with a single-task predictor, the performance improvement is not linearly correlated with the number of tasks in MTDNN; rather, medium-sized models provide more improvement in our case. One of the advantages of MTDNN is that, given a set of features, it does not require a bootstrap feature selection procedure as traditional machine learning algorithms do. Overall, the results indicate that the proposed MTDNN algorithm improves the performance of protein function prediction. Nevertheless, there is still considerable room for deep learning techniques to further enhance prediction ability.
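The architecture described in this abstract (shared upstream layers feeding one independent module per GO term) can be sketched as a forward pass. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer sizes, ReLU hidden activations, sigmoid outputs, random initialisation, and all function names are hypothetical, and training is omitted.

```python
import math
import random

def dense_relu(inputs, weights, biases):
    """Affine layer followed by ReLU; weights holds one row per output unit."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_mtdnn(n_features, n_shared, n_task_hidden, n_tasks, rng):
    """One shared layer plus one independent module (head) per task."""
    shared = init_layer(n_features, n_shared, rng)
    heads = []
    for _ in range(n_tasks):
        hidden = init_layer(n_shared, n_task_hidden, rng)
        out_weights = [rng.uniform(-0.5, 0.5) for _ in range(n_task_hidden)]
        heads.append((hidden, out_weights, 0.0))
    return shared, heads

def predict(model, features):
    """Return one probability per task (for example, per GO term)."""
    shared, heads = model
    # The shared representation is computed once and reused by every head.
    shared_repr = dense_relu(features, *shared)
    scores = []
    for hidden, out_weights, out_bias in heads:
        task_repr = dense_relu(shared_repr, *hidden)
        logit = sum(w * x for w, x in zip(out_weights, task_repr)) + out_bias
        scores.append(sigmoid(logit))
    return scores

# Hypothetical setup: 8 input features, 3 GO terms treated as 3 binary tasks.
model = build_mtdnn(n_features=8, n_shared=6, n_task_hidden=4, n_tasks=3,
                    rng=random.Random(0))
scores = predict(model, [0.5] * 8)
```

Because every head reads the same shared representation, gradients from all tasks would update the shared layer during training, while each head's own parameters capture task-specific characteristics.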


Subjects
Databases, Protein , Machine Learning , Neural Networks, Computer , Proteins , Humans , Proteins/chemistry , Proteins/genetics , Proteins/metabolism
5.
PLoS Comput Biol ; 13(10): e1005791, 2017 Oct.
Article in English | MEDLINE | ID: mdl-29045400

ABSTRACT

Accurate gene or protein function prediction is a key challenge in the post-genome era. Most current methods perform well on molecular function prediction, but struggle to provide useful annotations relating to biological process functions due to the limited power of sequence-based features in that functional domain. In this work, we systematically evaluate the predictive power of temporal transcription expression profiles for protein function prediction in Drosophila melanogaster. Our results show significantly better performance on predicting protein function when transcription expression profile-based features are integrated with sequence-derived features, compared with the sequence-derived features alone. We also observe that the combination of expression-based and sequence-based features leads to further improvement of accuracy on predicting all three domains of gene function. Based on the optimal feature combinations, we then propose a novel multi-classifier-based function prediction method for Drosophila melanogaster proteins, FFPred-fly+. Interpreting our machine learning models also allows us to identify some of the underlying links between biological processes and developmental stages of Drosophila melanogaster.


Subjects
Computational Biology/methods , Drosophila Proteins/genetics , Drosophila melanogaster/growth & development , Gene Expression Profiling/methods , Transcriptome/genetics , Animals , Cluster Analysis , Computer Simulation , Drosophila Proteins/analysis , Drosophila Proteins/metabolism , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Models, Statistical , Phenotype , Transcriptome/physiology
6.
Hum Mol Genet ; 25(21): 4804-4818, 2016 11 01.
Article in English | MEDLINE | ID: mdl-28175300

ABSTRACT

In model organisms, over 2,000 genes have been shown to modulate aging, the collection of which we call the 'gerontome'. Although some individual aging-related genes have been the subject of intense scrutiny, their analysis as a whole has been limited. In particular, the genetic interaction of aging and age-related pathologies remains a subject of debate. In this work, we perform a systematic analysis of the gerontome across species, including human aging-related genes. First, by classifying aging-related genes as pro- or anti-longevity, we define distinct pathways and genes that modulate aging in different ways. Our subsequent comparison of aging-related genes with age-related disease genes reveals species-specific effects with strong overlaps between aging and age-related diseases in mice, yet surprisingly few overlaps in lower model organisms. We discover that genetic links between aging and age-related diseases are due to a small fraction of aging-related genes which also tend to have a high network connectivity. Other insights from our systematic analysis include assessing how using datasets with genes more or less studied than average may result in biases, showing that age-related disease genes have faster molecular evolution rates, and predicting new aging-related drugs based on drug-gene interaction data. Overall, this is the largest systems-level analysis of the genetics of aging to date and the first to discriminate anti- and pro-longevity genes, revealing new insights on aging-related genes as a whole and their interactions with age-related diseases.


Subjects
Aging/genetics , Longevity/genetics , Age Factors , Animals , Caenorhabditis elegans , Databases, Nucleic Acid , Drosophila , Evolution, Molecular , Genome, Human , Humans , Mice , Saccharomyces cerevisiae , Sequence Analysis, DNA/methods
7.
Article in English | MEDLINE | ID: mdl-26357215

ABSTRACT

Ageing is a highly complex biological process that is still poorly understood. With the growing amount of ageing-related data available on the web, in particular concerning the genetics of ageing, it is timely to apply data mining methods to that data, in order to try to discover novel patterns that may assist ageing research. In this work, we introduce new hierarchical feature selection methods for the classification task of data mining and apply them to ageing-related data from four model organisms: Caenorhabditis elegans (worm), Saccharomyces cerevisiae (yeast), Drosophila melanogaster (fly), and Mus musculus (mouse). The main novel aspect of the proposed feature selection methods is that they exploit hierarchical relationships in the set of features (Gene Ontology terms) in order to improve the predictive accuracy of the Naïve Bayes and 1-Nearest Neighbour (1-NN) classifiers, which are used to classify model organisms' genes into pro-longevity or anti-longevity genes. The results show that our hierarchical feature selection methods, when used together with Naïve Bayes and 1-NN classifiers, obtain higher predictive accuracy than the standard (without feature selection) Naïve Bayes and 1-NN classifiers, respectively. We also discuss the biological relevance of a number of Gene Ontology terms very frequently selected by our algorithms in our datasets.
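One simple way to exploit hierarchical relationships among Gene Ontology features is to discard annotated terms that are already implied by a more specific annotated term. The sketch below illustrates that general idea only; it is not claimed to be any of the feature selection methods proposed in the paper, and the toy hierarchy, term names, and function names are assumptions.

```python
def ancestors(term, parents):
    """Collect every ancestor of a term by following parent links upward."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

def most_specific_terms(annotated, parents):
    """Keep only annotated terms that are not an ancestor of another annotated term."""
    implied = set()
    for term in annotated:
        implied |= ancestors(term, parents)
    return {term for term in annotated if term not in implied}

# Hypothetical hierarchy: D is-a B, B is-a A, C is-a A (A is the root).
parents = {"D": ["B"], "B": ["A"], "C": ["A"], "A": []}

# A gene annotated with D is implicitly annotated with B and A as well.
gene_terms = {"A", "B", "C", "D"}
selected = most_specific_terms(gene_terms, parents)  # {"C", "D"}
```

Removing implied ancestors reduces redundancy among features, which matters for classifiers such as Naïve Bayes that assume features are conditionally independent.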


Subjects
Cellular Senescence/genetics , Computational Biology/methods , Data Mining/methods , Models, Genetic , Algorithms , Animals , Bayes Theorem , Caenorhabditis elegans/genetics , Databases, Genetic , Drosophila melanogaster/genetics , Gene Ontology , Mice , Saccharomyces cerevisiae/genetics