Search | VHL Regional Portal

1.

CellBoost: A pipeline for machine assisted annotation in Neuroanatomy.

Qian, Kui; Friedman, Beth; Takatoh, Jun; Wang, Fan; Kleinfeld, David; Freund, Yoav.

bioRxiv ; 2024 Jan 21.

Article in English | MEDLINE | ID: mdl-38293051

ABSTRACT

One of the important yet labor intensive tasks in neuroanatomy is the identification of select populations of cells. Current high-throughput techniques enable marking cells with histochemical fluorescent molecules as well as through the genetic expression of fluorescent proteins. Modern scanning microscopes allow high resolution multi-channel imaging of the mechanically or optically sectioned brain with thousands of marked cells per square millimeter. Manual identification of all marked cells is prohibitively time consuming. At the same time, simple segmentation algorithms suffer from high error rates and sensitivity to variation in fluorescent intensity and spatial distribution. We present a methodology that combines human judgement and machine learning that serves to significantly reduce the labor of the anatomist while improving the consistency of the annotation. As a demonstration, we analyzed murine brains with marked premotor neurons in the brainstem. We compared the error rate of our method to the disagreement rate among human anatomists. This comparison shows that our method can reduce the time to annotate by as much as ten-fold without significantly increasing the rate of errors. We show that our method achieves significant reduction in labor while achieving an accuracy that is similar to the level of agreement between different anatomists.

2.

An active texture-based digital atlas enables automated mapping of structures and markers across brains.

Chen, Yuncong; McElvain, Lauren E; Tolpygo, Alexander S; Ferrante, Daniel; Friedman, Beth; Mitra, Partha P; Karten, Harvey J; Freund, Yoav; Kleinfeld, David.

Nat Methods ; 16(4): 341-350, 2019 04.

Article in English | MEDLINE | ID: mdl-30858600

ABSTRACT

Brain atlases enable the mapping of labeled cells and projections from different brains onto a standard coordinate system. We address two issues in the construction and use of atlases. First, expert neuroanatomists ascertain the fine-scale pattern of brain tissue, the 'texture' formed by cellular organization, to define cytoarchitectural borders. We automate the processes of localizing landmark structures and alignment of brains to a reference atlas using machine learning and training data derived from expert annotations. Second, we construct an atlas that is active; that is, augmented with each use. We show that the alignment of new brains to a reference atlas can continuously refine the coordinate system and associated variance. We apply this approach to the adult murine brainstem and achieve a precise alignment of projections in cytoarchitecturally ill-defined regions across brains from different animals.

Subjects

Brain Mapping/methods; Brain/diagnostic imaging; Computational Biology/methods; Image Processing, Computer-Assisted/methods; Algorithms; Animals; Brain/anatomy & histology; Brain Stem/diagnostic imaging; Machine Learning; Magnetic Resonance Imaging; Male; Mice; Mice, Inbred C57BL; Motor Neurons; Neuroanatomy; Neurons; Probability; Spinal Cord/diagnostic imaging
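
The idea in the abstract above of an atlas whose coordinate system and variance are refined with each newly aligned brain can be illustrated with a running mean and variance update. The sketch below uses Welford's online algorithm on a single scalar coordinate. It is not the authors' method; the class name `RunningLandmarkStats` and the sample values are hypothetical.

```python
class RunningLandmarkStats:
    """Running mean and sample variance of one landmark coordinate (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, value):
        """Fold one newly aligned brain's coordinate into the estimates."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    def variance(self):
        """Sample variance; defined as 0.0 until at least two values are seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


# Hypothetical coordinates (arbitrary units) of one landmark in three brains.
stats = RunningLandmarkStats()
for coordinate in (1.0, 2.0, 3.0):
    stats.update(coordinate)

print(stats.mean, stats.variance())  # 2.0 1.0
```

Each call to `update` costs constant time and memory, so the estimates can be refreshed whenever another brain is aligned without storing earlier measurements.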

3.

An online learning approach to occlusion boundary detection.

Jacobson, Natan; Freund, Yoav; Nguyen, Truong Q.

IEEE Trans Image Process ; 21(1): 252-61, 2012 Jan.

Article in English | MEDLINE | ID: mdl-21788193

ABSTRACT

We propose a novel online learning-based framework for occlusion boundary detection in video sequences. This approach does not require any prior training and instead "learns" occlusion boundaries by updating a set of weights for the online learning Hedge algorithm at each frame instance. Whereas previous training-based methods perform well only on data similar to the trained examples, the proposed method is well suited for any video sequence. We demonstrate the performance of the proposed detector both for the CMU data set, which includes hand-labeled occlusion boundaries, and for a novel video sequence. In addition to occlusion boundary detection, the proposed algorithm is capable of classifying occlusion boundaries by angle and by whether the occluding object is covering or uncovering the background.

Subjects

Algorithms; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Photography/methods; Video Recording/methods; Artificial Intelligence; Image Enhancement/methods; Online Systems; Reproducibility of Results; Sensitivity and Specificity
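
The abstract above refers to updating a set of weights with the Hedge algorithm at each frame. A minimal sketch of one generic Hedge (multiplicative weights) round is given below. The learning rate `eta`, the three "experts", and their per-frame losses are assumptions made for illustration and are not taken from the paper.

```python
import math


def hedge_update(weights, losses, eta=0.5):
    """One Hedge round: scale each weight by exp(-eta * loss), then renormalize."""
    updated = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(updated)
    return [w / total for w in updated]


# Three hypothetical experts with uniform initial weights.
weights = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical per-frame losses in [0, 1]; expert 0 is always right.
for frame_losses in ([0.0, 1.0, 1.0], [0.0, 1.0, 0.5]):
    weights = hedge_update(weights, frame_losses)

print(weights)
```

After the two frames the weight of expert 0 is the largest, because its cumulative loss is the smallest. No training phase is needed: the weights adapt as frames arrive.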

4.

Automatic identification of fluorescently labeled brain cells for rapid functional imaging.

Valmianski, Ilya; Shih, Andy Y; Driscoll, Jonathan D; Matthews, David W; Freund, Yoav; Kleinfeld, David.

J Neurophysiol ; 104(3): 1803-11, 2010 Sep.

Article in English | MEDLINE | ID: mdl-20610792

ABSTRACT

The on-line identification of labeled cells and vessels is a rate-limiting step in scanning microscopy. We use supervised learning to formulate an algorithm that rapidly and automatically tags fluorescently labeled somata in full-field images of cortex and constructs an optimized scan path through these cells. A single classifier works across multiple subjects, regions of the cortex of similar depth, and different magnification and contrast levels without the need to retrain the algorithm. Retraining only has to be performed when the morphological properties of the cells change significantly. In conjunction with two-photon laser scanning microscopy and bulk-labeling of cells in layers 2/3 of rat parietal cortex with a calcium indicator, we can automatically identify ~50 cells within 1 min and sample them at ~100 Hz with a signal-to-noise ratio of ~10.

Subjects

Fluorescent Dyes/analysis; Microscopy, Confocal/methods; Somatosensory Cortex/chemistry; Somatosensory Cortex/cytology; Animals; Rats; Rats, Sprague-Dawley; Somatosensory Cortex/physiology; Time Factors

5.

Minimizing off-target signals in RNA fluorescent in situ hybridization.

Arvey, Aaron; Hermann, Anita; Hsia, Cheryl C; Ie, Eugene; Freund, Yoav; McGinnis, William.

Nucleic Acids Res ; 38(10): e115, 2010 Jun.

Article in English | MEDLINE | ID: mdl-20164092

ABSTRACT

Fluorescent in situ hybridization (FISH) techniques are becoming extremely sensitive, to the point where individual RNA or DNA molecules can be detected with small probes. At this level of sensitivity, the elimination of 'off-target' hybridization is of crucial importance, but typical probes used for RNA and DNA FISH contain sequences repeated elsewhere in the genome. We find that very short (e.g. 20 nt) perfect repeated sequences within much longer probes (e.g. 350-1500 nt) can produce significant off-target signals. The extent of noise is surprising given the long length of the probes and the short length of non-specific regions. When we remove the small regions of repeated sequence from either short or long probes, we find that the signal-to-noise ratio is increased by orders of magnitude, putting us in a regime where fluorescent signals can be considered to be a quantitative measure of target transcript numbers. As the majority of genes in complex organisms contain repeated k-mers, we provide genome-wide annotations of k-mer-uniqueness at http://cbio.mskcc.org/~aarvey/repeatmap.

Subjects

In Situ Hybridization, Fluorescence/methods; RNA Probes/chemistry; RNA, Messenger/analysis; Animals; Drosophila Proteins/genetics; Drosophila melanogaster/embryology; Drosophila melanogaster/genetics; Embryo, Nonmammalian/chemistry; Nuclear Proteins/genetics; RNA, Messenger/chemistry; Repetitive Sequences, Nucleic Acid; Transcription Factors/genetics
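
The abstract above attributes off-target signal to short perfect repeats shared between a probe and the rest of the genome. A minimal sketch of how such shared k-mers could be located is given below. It is not the authors' tool: the function name `repeated_kmers` and the toy sequences are hypothetical, the example uses k = 4 rather than 20 nt to stay readable, and reverse-complement matches are ignored for simplicity.

```python
def repeated_kmers(probe, background, k=20):
    """Return (offset, k-mer) pairs for probe k-mers that also occur in background."""
    background_kmers = {
        background[i:i + k] for i in range(len(background) - k + 1)
    }
    return [
        (i, probe[i:i + k])
        for i in range(len(probe) - k + 1)
        if probe[i:i + k] in background_kmers
    ]


# Toy sequences; a real check would scan a whole genome with the probe's own locus masked.
probe = "ACGTTGCA"
background = "GGGTTGCAGG"

hits = repeated_kmers(probe, background, k=4)
print(hits)  # [(2, 'GTTG'), (3, 'TTGC'), (4, 'TGCA')]
```

Probe regions covered by the reported offsets are the candidates for removal before probe synthesis.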

6.

Visualization of individual Scr mRNAs during Drosophila embryogenesis yields evidence for transcriptional bursting.

Paré, Adam; Lemons, Derek; Kosman, Dave; Beaver, William; Freund, Yoav; McGinnis, William.

Curr Biol ; 19(23): 2037-42, 2009 Dec 15.

Article in English | MEDLINE | ID: mdl-19931455

ABSTRACT

The detection and counting of transcripts within single cells via fluorescent in situ hybridization (FISH) has allowed researchers to ask quantitative questions about gene expression at the level of individual cells. This method is often preferable to quantitative RT-PCR, because it does not necessitate destruction of the cells being probed and maintains spatial information that may be of interest. Until now, studies using FISH at single-molecule resolution have only been rigorously carried out in isolated cells (e.g., yeast cells or mammalian cell culture). Here, we describe the detection and counting of transcripts within single cells of fixed, whole-mount Drosophila embryos via a combination of FISH, immunohistochemistry, and image segmentation. Our method takes advantage of inexpensive, long RNA probes detected with antibodies, and we present novel evidence to show that we can robustly detect single mRNA molecules. We use this method to characterize transcription at the endogenous locus of the Hox gene Sex combs reduced (Scr), by comparing a stably expressing group of cells to a group that only transiently expresses the gene. Our data provide evidence for transcriptional bursting, as well as for divergent "accumulation" and "maintenance" phases of gene activity at the Scr locus.

Subjects

Drosophila Proteins/metabolism; Drosophila/embryology; RNA, Messenger/metabolism; Transcription Factors/metabolism; Transcription, Genetic/physiology; Animals; Drosophila Proteins/genetics; Gene Expression Regulation, Developmental/physiology; Immunohistochemistry; In Situ Hybridization, Fluorescence; RNA, Messenger/genetics; Transcription Factors/genetics

7.

ResBoost: characterizing and predicting catalytic residues in enzymes.

Alterovitz, Ron; Arvey, Aaron; Sankararaman, Sriram; Dallett, Carolina; Freund, Yoav; Sjölander, Kimmen.

BMC Bioinformatics ; 10: 197, 2009 Jun 27.

Article in English | MEDLINE | ID: mdl-19558703

ABSTRACT

BACKGROUND: Identifying the catalytic residues in enzymes can aid in understanding the molecular basis of an enzyme's function and has significant implications for designing new drugs, identifying genetic disorders, and engineering proteins with novel functions. Since experimentally determining catalytic sites is expensive, better computational methods for identifying catalytic residues are needed. RESULTS: We propose ResBoost, a new computational method to learn characteristics of catalytic residues. The method effectively selects and combines rules of thumb into a simple, easily interpretable logical expression that can be used for prediction. We formally define the rules of thumb that are often used to narrow the list of candidate residues, including residue evolutionary conservation, 3D clustering, solvent accessibility, and hydrophilicity. ResBoost builds on two methods from machine learning, the AdaBoost algorithm and Alternating Decision Trees, and provides precise control over the inherent trade-off between sensitivity and specificity. We evaluated ResBoost using cross-validation on a dataset of 100 enzymes from the hand-curated Catalytic Site Atlas (CSA). CONCLUSION: ResBoost achieved 85% sensitivity for a 9.8% false positive rate and 73% sensitivity for a 5.7% false positive rate. ResBoost reduces the number of false positives by up to 56% compared to the use of evolutionary conservation scoring alone. We also illustrate the ability of ResBoost to identify recently validated catalytic residues not listed in the CSA.

Subjects

Computational Biology/methods; Enzymes/chemistry; Software; Binding Sites; Catalysis; Databases, Protein
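
The abstract above reports sensitivity at a given false positive rate and describes control over the trade-off between the two. A minimal sketch of how both quantities follow from a score threshold is given below. The function name `sensitivity_and_fpr`, the residue scores, and the labels are hypothetical and are not ResBoost output.

```python
def sensitivity_and_fpr(scores, labels, threshold):
    """Sensitivity and false positive rate when predicting positive for score >= threshold.

    labels use 1 for a catalytic residue and 0 for a non-catalytic one.
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical classifier scores for five residues, two of them catalytic.
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]

strict = sensitivity_and_fpr(scores, labels, threshold=0.5)
lenient = sensitivity_and_fpr(scores, labels, threshold=0.2)
print(strict, lenient)
```

Lowering the threshold from 0.5 to 0.2 raises sensitivity from 0.5 to 1.0 in this toy case, at the cost of a false positive rate that rises from 1/3 to 2/3.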

8.

Image-based crystal detection: a machine-learning approach.

Liu, Roy; Freund, Yoav; Spraggon, Glen.

Acta Crystallogr D Biol Crystallogr ; 64(Pt 12): 1187-95, 2008 Dec.

Article in English | MEDLINE | ID: mdl-19018095

ABSTRACT

The ability of computers to learn from and annotate large databases of crystallization-trial images provides not only the ability to reduce the workload of crystallization studies, but also an opportunity to annotate crystallization trials as part of a framework for improving screening methods. Here, a system is presented that scores sets of images based on the likelihood of containing crystalline material as perceived by a machine-learning algorithm. The system can be incorporated into existing crystallization-analysis pipelines, whereby specialists examine images as they normally would with the exception that the images appear in rank order according to a simple real-valued score. Promising results are shown for 319 112 images associated with 150 structures solved by the Joint Center for Structural Genomics pipeline during the 2006-2007 year. Overall, the algorithm achieves a mean receiver operating characteristic score of 0.919 and a 78% reduction in human effort per set when considering an absolute score cutoff for screening images, while incurring a loss of five out of 150 structures.

Subjects

Artificial Intelligence; Crystallography, X-Ray/methods; Image Processing, Computer-Assisted/methods; Proteins/chemistry; Algorithms; Crystallization; Crystallography, X-Ray/instrumentation; Crystallography, X-Ray/trends; Database Management Systems/economics; Database Management Systems/instrumentation; Image Interpretation, Computer-Assisted; Image Processing, Computer-Assisted/instrumentation; ROC Curve
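
The abstract above ranks images by a real-valued score and reports a receiver operating characteristic score. Assuming that score means the area under the ROC curve, the sketch below computes it as the fraction of (crystal, non-crystal) image pairs that the scores rank correctly, with ties counted as half. The function name `roc_auc` and the data are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for five images; label 1 marks an image containing a crystal.
image_scores = [0.9, 0.8, 0.4, 0.3, 0.1]
image_labels = [1, 0, 1, 0, 0]

auc = roc_auc(image_scores, image_labels)
print(round(auc, 3))  # 0.833
```

The pairwise form is quadratic in the number of images, which is fine for a small set; a sort-based rank formula would be the usual choice at the scale of hundreds of thousands of images.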

9.

Lamellipodial actin mechanically links myosin activity with adhesion-site formation.

Giannone, Grégory; Dubin-Thaler, Benjamin J; Rossier, Olivier; Cai, Yunfei; Chaga, Oleg; Jiang, Guoying; Beaver, William; Döbereiner, Hans-Günther; Freund, Yoav; Borisy, Gary; Sheetz, Michael P.

Cell ; 128(3): 561-75, 2007 Feb 09.

Article in English | MEDLINE | ID: mdl-17289574

ABSTRACT

Cell motility proceeds by cycles of edge protrusion, adhesion, and retraction. Whether these functions are coordinated by biochemical or biomechanical processes is unknown. We find that myosin II pulls the rear of the lamellipodial actin network, causing upward bending, edge retraction, and initiation of new adhesion sites. The network then separates from the edge and condenses over the myosin. Protrusion resumes as lamellipodial actin regenerates from the front and extends rearward until it reaches newly assembled myosin, initiating the next cycle. Upward bending, observed by evanescence and electron microscopy, results in ruffle formation when adhesion strength is low. Correlative fluorescence and electron microscopy shows that the regenerating lamellipodium forms a cohesive, separable layer of actin above the lamellum. Thus, actin polymerization periodically builds a mechanical link, the lamellipodium, connecting myosin motors with the initiation of adhesion sites, suggesting that the major functions driving motility are coordinated by a biomechanical process.

Subjects

Actins/metabolism; Cell Adhesion; Myosins/metabolism; Pseudopodia/chemistry; Animals; Cell Movement; Fibroblasts/cytology; Mice; Microscopy, Electron; Microscopy, Fluorescence; Myosin Type II/genetics; Myosin Type II/metabolism; Periodicity; Polymers/metabolism; Pseudopodia/ultrastructure

10.

A classification-based framework for predicting and analyzing gene regulatory response.

Kundaje, Anshul; Middendorf, Manuel; Shah, Mihir; Wiggins, Chris H; Freund, Yoav; Leslie, Christina.

BMC Bioinformatics ; 7 Suppl 1: S5, 2006 Mar 20.

Article in English | MEDLINE | ID: mdl-16723008

ABSTRACT

BACKGROUND: We have recently introduced a predictive framework for studying gene transcriptional regulation in simpler organisms using a novel supervised learning algorithm called GeneClass. GeneClass is motivated by the hypothesis that in model organisms such as Saccharomyces cerevisiae, we can learn a decision rule for predicting whether a gene is up- or down-regulated in a particular microarray experiment based on the presence of binding site subsequences ("motifs") in the gene's regulatory region and the expression levels of regulators such as transcription factors in the experiment ("parents"). GeneClass formulates the learning task as a classification problem--predicting +1 and -1 labels corresponding to up- and down-regulation beyond the levels of biological and measurement noise in microarray measurements. Using the Adaboost algorithm, GeneClass learns a prediction function in the form of an alternating decision tree, a margin-based generalization of a decision tree. METHODS: In the current work, we introduce a new, robust version of the GeneClass algorithm that increases stability and computational efficiency, yielding a more scalable and reliable predictive model. The improved stability of the prediction tree enables us to introduce a detailed post-processing framework for biological interpretation, including individual and group target gene analysis to reveal condition-specific regulation programs and to suggest signaling pathways. Robust GeneClass uses a novel stabilized variant of boosting that allows a set of correlated features, rather than single features, to be included at nodes of the tree; in this way, biologically important features that are correlated with the single best feature are retained rather than decorrelated and lost in the next round of boosting. 
Other computational developments include fast matrix computation of the loss function for all features, allowing scalability to large datasets, and the use of abstaining weak rules, which results in a more shallow and interpretable tree. We also show how to incorporate genome-wide protein-DNA binding data from ChIP-chip experiments into the GeneClass algorithm, and we use an improved noise model for gene expression data. RESULTS: Using the improved scalability of Robust GeneClass, we present larger-scale experiments on a yeast environmental stress dataset, training and testing on all genes and using a comprehensive set of potential regulators. We demonstrate the improved stability of the features in the learned prediction tree, and we show the utility of the post-processing framework by analyzing two groups of genes in yeast--the protein chaperones and a set of putative targets of the Nrg1 and Nrg2 transcription factors--and suggesting novel hypotheses about their transcriptional and post-transcriptional regulation. Detailed results and Robust GeneClass source code are available for download from http://www.cs.columbia.edu/compbio/robust-geneclass.

Subjects

Computational Biology/methods; Gene Expression Profiling/methods; Gene Expression Regulation; Algorithms; Amino Acid Motifs; Binding Sites; Data Interpretation, Statistical; Databases, Protein; Fungal Proteins/chemistry; Heat-Shock Proteins/metabolism; Molecular Chaperones/chemistry; Oligonucleotide Array Sequence Analysis/methods; Saccharomyces cerevisiae/metabolism

11.

Identifying metabolic enzymes with multiple types of association evidence.

Kharchenko, Peter; Chen, Lifeng; Freund, Yoav; Vitkup, Dennis; Church, George M.

BMC Bioinformatics ; 7: 177, 2006 Mar 29.

Article in English | MEDLINE | ID: mdl-16571130

ABSTRACT

BACKGROUND: Existing large-scale metabolic models of sequenced organisms commonly include enzymatic functions which cannot be attributed to any gene in that organism. Existing computational strategies for identifying such missing genes rely primarily on sequence homology to known enzyme-encoding genes. RESULTS: We present a novel method for identifying genes encoding a specific metabolic function based on the local structure of the metabolic network and multiple types of functional association evidence, including clustering of genes on the chromosome, similarity of phylogenetic profiles, gene expression, protein fusion events and others. Using E. coli and S. cerevisiae metabolic networks, we illustrate the predictive ability of each individual type of association evidence and show that significantly better predictions can be obtained based on the combination of all data. In this way our method is able to predict 60% of enzyme-encoding genes of E. coli metabolism within the top 10 (out of 3551) candidates for their enzymatic function, and as the top candidate in 43% of cases. CONCLUSION: We illustrate that a combination of genome context and other functional association evidence is effective in predicting genes encoding metabolic enzymes. Our approach does not rely on direct sequence homology to known enzyme-encoding genes, and can be used in conjunction with traditional homology-based metabolic reconstruction methods. The method can also be used to target orphan metabolic activities.

Subjects

Enzymes/genetics; Escherichia coli Proteins/genetics; Genome, Bacterial; Saccharomyces cerevisiae Proteins/genetics; Sequence Analysis, Protein/methods; Energy Metabolism/genetics; Phylogeny

12.

Profile-based string kernels for remote homology detection and motif extraction.

Kuang, Rui; Ie, Eugene; Wang, Ke; Wang, Kai; Siddiqi, Mahira; Freund, Yoav; Leslie, Christina.

J Bioinform Comput Biol ; 3(3): 527-50, 2005 Jun.

Article in English | MEDLINE | ID: mdl-16108083

ABSTRACT

We introduce novel profile-based string kernels for use with support vector machines (SVMs) for the problems of protein classification and remote homology detection. These kernels use probabilistic profiles, such as those produced by the PSI-BLAST algorithm, to define position-dependent mutation neighborhoods along protein sequences for inexact matching of k-length subsequences ("k-mers") in the data. By use of an efficient data structure, the kernels are fast to compute once the profiles have been obtained. For example, the time needed to run PSI-BLAST in order to build the profiles is significantly longer than both the kernel computation time and the SVM training time. We present remote homology detection experiments based on the SCOP database where we show that profile-based string kernels used with SVM classifiers strongly outperform all recently presented supervised SVM methods. We further examine how to incorporate predicted secondary structure information into the profile kernel to obtain a small but significant performance improvement. We also show how we can use the learned SVM classifier to extract "discriminative sequence motifs"--short regions of the original profile that contribute almost all the weight of the SVM classification score--and show that these discriminative motifs correspond to meaningful structural features in the protein data. The use of PSI-BLAST profiles can be seen as a semi-supervised learning technique, since PSI-BLAST leverages unlabeled data from a large sequence database to build more informative profiles. Recently presented "cluster kernels" give general semi-supervised methods for improving SVM protein classification performance. We show that our profile kernel results also outperform cluster kernels while providing much better scalability to large datasets.

Subjects

Algorithms; Artificial Intelligence; Pattern Recognition, Automated/methods; Proteins/chemistry; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Amino Acid Motifs; Amino Acid Sequence; Molecular Sequence Data; Proteins/analysis; Proteins/classification; Sequence Homology, Amino Acid
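
The abstract above builds on string kernels that compare sequences through their k-mers. As background only, the sketch below implements the plain exact-match spectrum kernel, a simpler relative of the profile kernel: it counts shared k-mers and uses no PSI-BLAST profiles or mutation neighborhoods. The function name `spectrum_kernel` and the toy sequences are hypothetical.

```python
from collections import Counter


def spectrum_kernel(seq_a, seq_b, k=3):
    """Inner product of the k-mer count vectors of two sequences (exact matches only)."""
    counts_a = Counter(seq_a[i:i + k] for i in range(len(seq_a) - k + 1))
    counts_b = Counter(seq_b[i:i + k] for i in range(len(seq_b) - k + 1))
    return sum(count * counts_b[kmer] for kmer, count in counts_a.items())


# Toy protein fragments sharing the 3-mers "KVL" and "VLA".
value = spectrum_kernel("MKVLA", "KVLAM", k=3)
print(value)  # 2
```

The resulting value can be used as a kernel entry for an SVM. The profile kernel described in the abstract relaxes the exact-match requirement by letting each k-mer also match its profile-defined neighbors.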

13.

Predicting genetic regulatory response using classification.

Middendorf, Manuel; Kundaje, Anshul; Wiggins, Chris; Freund, Yoav; Leslie, Christina.

Bioinformatics ; 20 Suppl 1: i232-40, 2004 Aug 04.

Article in English | MEDLINE | ID: mdl-15262804

ABSTRACT

MOTIVATION: Studying gene regulatory mechanisms in simple model organisms through analysis of high-throughput genomic data has emerged as a central problem in computational biology. Most approaches in the literature have focused either on finding a few strong regulatory patterns or on learning descriptive models from training data. However, these approaches are not yet adequate for making accurate predictions about which genes will be up- or down-regulated in new or held-out experiments. By introducing a predictive methodology for this problem, we can use powerful tools from machine learning and assess the statistical significance of our predictions. RESULTS: We present a novel classification-based method for learning to predict gene regulatory response. Our approach is motivated by the hypothesis that in simple organisms such as Saccharomyces cerevisiae, we can learn a decision rule for predicting whether a gene is up- or down-regulated in a particular experiment based on (1) the presence of binding site subsequences ('motifs') in the gene's regulatory region and (2) the expression levels of regulators such as transcription factors in the experiment ('parents'). Thus, our learning task integrates two qualitatively different data sources: genome-wide cDNA microarray data across multiple perturbation and mutant experiments along with motif profile data from regulatory sequences. We convert the regression task of predicting real-valued gene expression measurements to a classification task of predicting +1 and -1 labels, corresponding to up- and down-regulation beyond the levels of biological and measurement noise in microarray measurements. The learning algorithm employed is boosting with a margin-based generalization of decision trees, alternating decision trees. This large-margin classifier is sufficiently flexible to allow complex logical functions, yet sufficiently simple to give insight into the combinatorial mechanisms of gene regulation. 
We observe encouraging prediction accuracy on experiments based on the Gasch S.cerevisiae dataset, and we show that we can accurately predict up- and down-regulation on held-out experiments. We also show how to extract significant regulators, motifs and motif-regulator pairs from the learned models for various stress responses. Our method thus provides predictive hypotheses, suggests biological experiments, and provides interpretable insight into the structure of genetic regulatory networks. AVAILABILITY: The MLJava package is available upon request to the authors. Supplementary: Additional results are available from http://www.cs.columbia.edu/compbio/geneclass

Subjects

Chromosome Mapping/methods; Gene Expression Regulation/physiology; Models, Genetic; Proteome/metabolism; Regulatory Elements, Transcriptional/genetics; Signal Transduction/genetics; Transcription Factors/genetics; Binding Sites; Computer Simulation; Protein Binding; Saccharomyces cerevisiae Proteins/physiology; Sequence Analysis, DNA/methods; Transcriptional Activation/physiology
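
The abstract above converts real-valued expression measurements into +1 and -1 labels for changes that exceed the noise level. A minimal sketch of that conversion is given below. The function name `discretize_expression`, the threshold value, and the log-ratios are hypothetical, and the paper's actual noise model is not reproduced here.

```python
def discretize_expression(log_ratios, noise_threshold):
    """Map log expression ratios to +1 (up), -1 (down), or 0 (within noise)."""
    labels = []
    for value in log_ratios:
        if value > noise_threshold:
            labels.append(1)
        elif value < -noise_threshold:
            labels.append(-1)
        else:
            labels.append(0)  # within noise: no usable label
    return labels


# Hypothetical log2 expression ratios for five genes in one experiment.
log_ratios = [2.1, -0.3, -1.7, 0.4, 1.2]
labels = discretize_expression(log_ratios, noise_threshold=1.0)
print(labels)  # [1, 0, -1, 0, 1]
```

In this sketch, examples labeled 0 would be left out of training, so the classifier only sees up- or down-regulation that stands clear of the assumed noise band.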

14.

Profile-based string kernels for remote homology detection and motif extraction.

Kuang, Rui; Ie, Eugene; Wang, Ke; Wang, Kai; Siddiqi, Mahira; Freund, Yoav; Leslie, Christina.

Proc IEEE Comput Syst Bioinform Conf ; : 152-60, 2004.

Article in English | MEDLINE | ID: mdl-16448009

ABSTRACT

We introduce novel profile-based string kernels for use with support vector machines (SVMs) for the problems of protein classification and remote homology detection. These kernels use probabilistic profiles, such as those produced by the PSI-BLAST algorithm, to define position-dependent mutation neighborhoods along protein sequences for inexact matching of k-length subsequences ("k-mers") in the data. By use of an efficient data structure, the kernels are fast to compute once the profiles have been obtained. For example, the time needed to run PSI-BLAST in order to build the profiles is significantly longer than both the kernel computation time and the SVM training time. We present remote homology detection experiments based on the SCOP database where we show that profile-based string kernels used with SVM classifiers strongly outperform all recently presented supervised SVM methods. We also show how we can use the learned SVM classifier to extract "discriminative sequence motifs" -- short regions of the original profile that contribute almost all the weight of the SVM classification score -- and show that these discriminative motifs correspond to meaningful structural features in the protein data. The use of PSI-BLAST profiles can be seen as a semi-supervised learning technique, since PSI-BLAST leverages unlabeled data from a large sequence database to build more informative profiles. Recently presented "cluster kernels" give general semi-supervised methods for improving SVM protein classification performance. We show that our profile kernel results are comparable to cluster kernels while providing much better scalability to large datasets.

Subjects

Algorithms; Gene Expression Profiling/methods; Gene Expression/genetics; Pattern Recognition, Automated/methods; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Amino Acid Motifs; Artificial Intelligence; Cluster Analysis; Sequence Homology, Amino Acid
