Results 1 - 6 of 6
1.
IEEE/ACM Trans Comput Biol Bioinform ; 14(6): 1482-1488, 2017.
Article in English | MEDLINE | ID: mdl-27483459

ABSTRACT

Remote homology detection represents a central problem in bioinformatics, where the challenge is to detect functionally related proteins when their sequence similarity is low. Recent solutions employ representations derived from the sequence profile, obtained by replacing each amino acid of the sequence by the corresponding most probable amino acid in the profile. However, the information contained in the profile could be exploited more deeply, provided that there is a representation able to capture and properly model such crucial evolutionary information. In this paper, we propose a novel profile-based representation for sequences, called soft Ngram. This representation, which extends the traditional Ngram scheme (obtained by grouping N consecutive amino acids), permits considering all of the evolutionary information in the profile: this is achieved by extracting Ngrams from the whole profile, equipping them with a weight directly computed from the corresponding evolutionary frequencies. We illustrate two different approaches to model the proposed representation and to derive a feature vector, which can be effectively used for classification using a support vector machine (SVM). A thorough evaluation on three benchmarks demonstrates that the new approach outperforms other Ngram-based methods, and shows very promising results also in comparison with a broader spectrum of techniques.
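The soft Ngram idea described in this abstract can be illustrated with a minimal sketch. It assumes the profile is given as one amino-acid frequency table per sequence position, and it assumes that the weight of an Ngram is the product of the frequencies of its residues; the abstract only says the weight is computed from the evolutionary frequencies, so the product rule, the function name `soft_ngrams`, and the `min_freq` parameter are illustrative choices, not the authors' exact formulation.

```python
from collections import defaultdict
from itertools import product


def soft_ngrams(profile, n=2, min_freq=0.0):
    """Accumulate weighted N-gram counts over every window of n profile columns.

    profile: list of dicts, one per position, mapping amino acid -> frequency.
    The weight of an N-gram inside a window is the product of its residue
    frequencies (an assumption made for this sketch).
    """
    features = defaultdict(float)
    for start in range(len(profile) - n + 1):
        window = profile[start:start + n]
        # Keep only residues whose frequency exceeds the threshold in each column.
        columns = [[(aa, f) for aa, f in col.items() if f > min_freq]
                   for col in window]
        # Enumerate every combination of one residue per column.
        for combo in product(*columns):
            gram = "".join(aa for aa, _ in combo)
            weight = 1.0
            for _, f in combo:
                weight *= f
            features[gram] += weight
    return dict(features)


# Toy three-position profile (made-up frequencies, for illustration only).
profile = [
    {"A": 0.7, "G": 0.3},
    {"L": 1.0},
    {"K": 0.5, "R": 0.5},
]
feats = soft_ngrams(profile, n=2)
```

With a traditional (hard) Ngram scheme only the most probable residue per position would contribute, so the toy profile would yield a single count for "AL" and one for either "LK" or "LR". In this sketch every residue with non-zero frequency contributes in proportion to its weight, and the resulting dictionary can be mapped to a fixed-length feature vector for an SVM.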


Subjects
Computational Biology/methods; Proteins/chemistry; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Sequence Homology, Amino Acid; ROC Curve; Support Vector Machine
2.
Artif Intell Med ; 70: 1-11, 2016 06.
Article in English | MEDLINE | ID: mdl-27431033

ABSTRACT

OBJECTIVE: High-throughput technologies have generated an unprecedented amount of high-dimensional gene expression data. Algorithmic approaches could be extremely useful to distill information and derive compact, interpretable representations of the statistical patterns present in the data. This paper proposes a mining approach to extract an informative representation of gene expression profiles based on a generative model called the Counting Grid (CG). METHOD: Using the CG model, gene expression values are arranged on a discrete grid, learned in such a way that "similar" co-expression patterns are arranged in close proximity, thus resulting in an intuitive visualization of the dataset. Moreover, the model makes it possible to identify the genes that distinguish between classes (e.g. different types of cancer). Finally, each sample can be characterized with a discriminative signature, extracted from the model, that can be effectively employed for classification. RESULTS: A thorough evaluation on several gene expression datasets demonstrates the suitability of the proposed approach from a twofold perspective: numerically, we reached state-of-the-art classification accuracies on 5 datasets out of 7, and similar results when the approach was tested in a gene selection setting (with a stability always above 0.87); clinically, by confirming that many of the genes highlighted by the model as significant also play a key role in cancer biology. CONCLUSION: The proposed framework can be successfully exploited to meaningfully visualize the samples, detect medically relevant genes, and properly classify samples.
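How a sample is placed on a counting grid can be sketched as follows. The sketch assumes an already-learned grid (the learning step, typically done with EM, is omitted), uses a one-dimensional ring instead of a two-dimensional grid for brevity, and treats each sample as a vector of non-negative counts per gene. In the counting grid model a sample is explained by the average of the cell distributions inside a window; the names `window_distributions` and `best_location` and the toy numbers are hypothetical.

```python
import math


def window_distributions(pi, window):
    """h[k][z]: mean of pi[i][z] over the `window` cells starting at k (with wrap-around)."""
    size = len(pi)
    n_features = len(pi[0])
    h = []
    for k in range(size):
        cells = [pi[(k + offset) % size] for offset in range(window)]
        h.append([sum(cell[z] for cell in cells) / window
                  for z in range(n_features)])
    return h


def best_location(counts, h):
    """Index of the window whose distribution gives the highest log-likelihood.

    Assumes every window probability is strictly positive wherever the
    sample has a non-zero count.
    """
    def log_lik(dist):
        return sum(c * math.log(p) for c, p in zip(counts, dist) if c > 0)

    scores = [log_lik(dist) for dist in h]
    return max(range(len(h)), key=scores.__getitem__)


# Toy grid: 4 cells on a ring, 3 "genes"; each row is a distribution over genes.
pi = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
h = window_distributions(pi, window=2)
sample = [9, 1, 0]          # made-up expression counts for one sample
loc = best_location(sample, h)
```

Because neighbouring windows share cells, samples with similar count profiles tend to map to nearby positions, which is what makes the grid usable as a visualization. The vector of per-position log-likelihoods could also serve as a signature for a downstream classifier, though the abstract does not specify how its discriminative signature is built.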


Subjects
Algorithms; Data Mining; Gene Expression Profiling; Cluster Analysis; Genes, Neoplasm; Humans; Neoplasms/genetics
3.
AIDS ; 30(5): 701-11, 2016 Mar 13.
Article in English | MEDLINE | ID: mdl-26730570

ABSTRACT

OBJECTIVES: AIDS is caused by CD4 T-cell depletion. Although combination antiretroviral therapy can restore blood T-cell numbers, the clonal diversity of the reconstituting cells, critical for immunocompetence, is not well defined. METHODS: We performed an extensive analysis of parameters of thymic function in perinatally HIV-1-infected (n = 39) and control (n = 28) participants ranging from 13 to 23 years of age. CD4 T cells, including naive (CD27 CD45RA) and recent thymic emigrant (RTE) (CD31/CD45RA) cells, were quantified by flow cytometry. Deep sequencing was used to examine T-cell receptor (TCR) sequence diversity in sorted RTE CD4 T cells. RESULTS: Infected participants had reduced CD4 T-cell levels with predominant depletion of the memory subset and preservation of naive cells. RTE CD4 T-cell levels were normal in most infected individuals, and enhanced thymopoiesis was indicated by higher proportions of CD4 T cells containing TCR recombination excision circles. Memory CD4 T-cell depletion was highly associated with CD8 T-cell activation in HIV-1-infected persons, and plasma interleukin-7 levels were correlated with naive CD4 T cells, suggesting activation-driven loss and compensatory enhancement of thymopoiesis. Deep sequencing of CD4 T-cell receptor sequences in well-compensated infected persons demonstrated supranormal diversity, providing additional evidence of enhanced thymic output. CONCLUSION: Despite up to two decades of infection, many individuals have remarkable thymic reserve to compensate for ongoing CD4 T-cell loss, although there is ongoing viral replication and immune activation despite combination antiretroviral therapy. The longer-term sustainability of this physiology remains to be determined.


Subjects
CD4-Positive T-Lymphocytes/immunology; HIV Infections/immunology; HIV-1/growth & development; T-Lymphocyte Subsets/immunology; Thymus Gland/physiology; Adolescent; CD4-Positive T-Lymphocytes/chemistry; CD4-Positive T-Lymphocytes/classification; Female; Flow Cytometry; Genetic Variation; HIV Infections/virology; High-Throughput Nucleotide Sequencing; Humans; Leukocyte Common Antigens/analysis; Male; Platelet Endothelial Cell Adhesion Molecule-1/analysis; Receptors, Antigen, T-Cell/genetics; Sequence Analysis, DNA; T-Lymphocyte Subsets/chemistry; T-Lymphocyte Subsets/classification; Tumor Necrosis Factor Receptor Superfamily, Member 7/analysis; Young Adult
4.
Article in English | MEDLINE | ID: mdl-26451830

ABSTRACT

Protein remote homology detection represents a crucial and challenging task in bioinformatics: although effective methods have appeared in recent years, in several cases a proper characterization of remote evolutionary correlation cannot be derived. In such situations, information derived from other sources may help, provided that such (even partial) information can be properly integrated into existing models. In this paper, we provide some evidence that this route is feasible: inspired by the multimodal retrieval literature, we show how a simple multimodal approach can be exploited to improve a model learned from a set of sequences, by using knowledge derived from a partial set of corresponding 3D structures. We investigate (with the SCOP 1.53 benchmark) the suitability of the proposed multimodal scheme, showing that a beneficial effect can be obtained even when only a very small number of structures is available. A further detailed analysis on a member of the GPCR superfamily confirms that this multimodal approach can extract information that cannot be obtained from sequence-based techniques.


Subjects
Algorithms; Pattern Recognition, Automated/methods; Proteins/chemistry; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Sequence Homology, Amino Acid; Amino Acid Sequence; Molecular Sequence Data
5.
Pac Symp Biocomput ; : 288-99, 2014.
Article in English | MEDLINE | ID: mdl-24297555

ABSTRACT

The immune system gathers evidence of the execution of various molecular processes, both foreign and the cells' own, as time- and space-varying sets of epitopes, small linear or conformational segments of the proteins involved in these processes. Epitopes do not have any obvious ordering in this scheme: the immune system simply sees these epitope sets as disordered "bags" of simple signatures, based on whose contents the actions need to be decided. The immense landscape of possible bags of epitopes is shaped by the cellular pathways in various cells, as well as by the characteristics of the internal sampling process that chooses epitopes and brings them to the cellular surface. As a consequence, upon infection by the same pathogen, different individuals' cells present very different epitope sets. Modeling this landscape should thus be a key step in computational immunology. We show that among possible bag-of-words models, the counting grid is the best fit for modeling cellular presentation. We describe each patient by a bag of peptides they are likely to present on the cellular surface. In regression tests, we found that, compared to the state of the art, counting grids explain more than twice as much of the log viral load variance in these patients. This is potentially a significant advancement in the field, given that a large part of the log viral load variance also depends on the infecting HIV strain, and that HIV polymorphisms themselves are known to strongly associate with HLA types, both effects beyond what is modeled here.


Subjects
HIV/genetics; HIV/immunology; Models, Immunological; Viral Load/statistics & numerical data; Computational Biology; Epitopes/genetics; HIV Antigens/genetics; HIV Infections/immunology; HIV Infections/virology; HLA Antigens/genetics; HLA Antigens/metabolism; Histocompatibility Testing; Host-Pathogen Interactions/genetics; Host-Pathogen Interactions/immunology; Humans; Precision Medicine; Regression Analysis
6.
Article in English | MEDLINE | ID: mdl-23221091

ABSTRACT

In recent years, a particular class of probabilistic graphical models, called topic models, has proven to be a useful and interpretable tool for understanding and mining microarray data. In this context, such models have been applied almost exclusively in the clustering scenario, whereas the classification task has been disregarded by researchers. In this paper, we thoroughly investigate the use of topic models for classification of microarray data, starting from ideas proposed in other fields (e.g., computer vision). A classification scheme is proposed, based on highly interpretable features extracted from topic models, resulting in a hybrid generative-discriminative approach; an extensive experimental evaluation, involving 10 different literature benchmarks, confirms the suitability of topic models for classifying expression microarray data.


Subjects
Computational Biology/methods; Data Mining/methods; Databases, Factual; Microarray Analysis/methods; Models, Statistical; Bayes Theorem; Semantics