Results 1 - 11 of 11
1.
PLoS One ; 14(1): e0210966, 2019.
Article in English | MEDLINE | ID: mdl-30689648

ABSTRACT

Early prediction of the potential for neurological recovery after resuscitation from cardiac arrest is difficult but important. Currently, no clinical finding or combination of findings is sufficient to accurately predict or preclude favorable recovery of comatose patients in the first 24 to 48 hours after resuscitation. Thus, life-sustaining therapy is often continued for several days in patients whose irrecoverable injury is not yet recognized. Conversely, early withdrawal of life-sustaining therapy increases mortality among patients who otherwise might have gone on to recover. In this work, we present Canonical Autocorrelation Analysis (CAA) and Canonical Autocorrelation Embeddings (CAE), novel methods suitable for identifying complex patterns in high-resolution multivariate data often collected in highly monitored clinical environments such as intensive care units. CAE embeds sets of datapoints onto a space that characterizes their latent correlation structures and allows direct comparison of these structures through the use of a distance metric. The methodology may be particularly suitable when the unit of analysis is not just an individual datapoint but a dataset, as for instance in patients for whom physiological measures are recorded over time, and where changes of correlation patterns in these datasets are informative for the task at hand. We present a proof of concept to illustrate the potential utility of CAE by applying it to characterize electroencephalographic recordings from 80 comatose survivors of cardiac arrest, aiming to identify patients who will survive to hospital discharge with favorable functional recovery. Our results show that, with very low probability of making a Type 1 error, we are able to identify 32.5% of patients who are likely to have a good neurological outcome, some of whom have otherwise unfavorable clinical characteristics. Importantly, some of these had a 5% predicted chance of favorable recovery based on initial illness severity measures alone. Providing this information to support clinical decision-making could motivate the continuation of life-sustaining therapies for these patients.
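
The idea of treating a whole recording, rather than a single datapoint, as the unit of analysis can be illustrated with a much simpler stand-in than CAA/CAE. The sketch below is not the authors' method: it assumes each recording is a small list of channels, summarizes it by its plain Pearson correlation matrix, and compares two recordings by the Frobenius distance between those matrices. All names (`correlation_matrix`, `frobenius_distance`, the toy recordings) are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(channels):
    """Summarize one recording (a list of channels) by its correlation structure."""
    return [[pearson(a, b) for b in channels] for a in channels]

def frobenius_distance(m1, m2):
    """Distance between two correlation structures (assumed metric, not CAE's)."""
    return math.sqrt(sum((a - b) ** 2
                         for r1, r2 in zip(m1, m2)
                         for a, b in zip(r1, r2)))

# Toy recordings: rec_c is an affine rescaling of rec_a, so its correlation
# structure is the same; rec_b pairs two weakly correlated channels.
t = [i / 10 for i in range(50)]
rec_a = [[math.sin(v) for v in t], [math.sin(v) + 0.1 * v for v in t]]
rec_b = [[math.sin(v) for v in t], [math.cos(v) for v in t]]
rec_c = [[2 * math.sin(v) + 1 for v in t], [2 * (math.sin(v) + 0.1 * v) for v in t]]

m_a, m_b, m_c = (correlation_matrix(r) for r in (rec_a, rec_b, rec_c))
d_same = frobenius_distance(m_a, m_c)  # same structure, different scale
d_diff = frobenius_distance(m_a, m_b)  # different structure
```

Here `d_same` is essentially zero while `d_diff` is not, which is the property a structure-based embedding relies on: recordings are close when their channels co-vary in the same way, regardless of amplitude.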


Subjects
Electroencephalography/statistics & numerical data , Heart Arrest/physiopathology , Heart Arrest/therapy , Adult , Aged , Algorithms , Cardiopulmonary Resuscitation , Coma/physiopathology , Coma/therapy , Decision Support Systems, Clinical , Female , Humans , Male , Middle Aged , Models, Neurological , Multivariate Analysis , Prognosis , Recovery of Function/physiology , Survival Analysis
2.
BMC Genomics ; 16 Suppl 10: S7, 2015.
Article in English | MEDLINE | ID: mdl-26449793

ABSTRACT

We present a computational framework tailored for the modeling of the complex, dynamic relationships that are encountered in splicing regulation. The starting point is whole-genome transcriptomic data from high-throughput array or sequencing methods that are used to quantify gene expression and alternative splicing across multiple contexts. This information is used as input for state-of-the-art methods for Graphical Model Selection in order to recover the structure of a composite network that simultaneously models exon co-regulation and their cognate regulators. Community structure detection and social network analysis methods are used to identify distinct modules and key actors within the network. As a proof of concept for our framework, we studied the splicing regulatory network for Drosophila development using the publicly available modENCODE data. The final model offers a comprehensive view of the splicing circuitry that underlies fly development. Identified modules are associated with major developmental hallmarks including maternally loaded RNAs, onset of zygotic gene expression, transitions between life stages and sex differentiation. Within-module key actors include well-known developmental-specific splicing regulators from the literature, while additional factors previously unassociated with developmental-specific splicing are also highlighted. Finally, we analyze an extensive battery of Splicing Factor knock-down transcriptome data and demonstrate that our approach captures true regulatory relationships.


Subjects
Alternative Splicing/genetics , Computational Biology , Gene Regulatory Networks/genetics , Transcriptome/genetics , Exons/genetics , Gene Expression Regulation , Genome
3.
Bioinformatics ; 30(16): 2280-7, 2014 Aug 15.
Article in English | MEDLINE | ID: mdl-24764459

ABSTRACT

MOTIVATION: Although the majority of gene histories found in a clade of organisms are expected to be generated by a common process (e.g. the coalescent process), it is well known that numerous other coexisting processes (e.g. horizontal gene transfers, gene duplication and subsequent neofunctionalization) will cause some genes to exhibit a history distinct from those of the majority of genes. Such 'outlying' gene trees are considered to be biologically interesting, and identifying these genes has become an important problem in phylogenetics. RESULTS: We propose and implement kdetrees, a non-parametric method for estimating distributions of phylogenetic trees, with the goal of identifying trees that are significantly different from the rest of the trees in the sample. Our method compares favorably with a similar recently published method, featuring an improvement of one polynomial order of computational complexity (to quadratic in the number of trees analyzed), with simulation studies suggesting only a small penalty to classification accuracy. Application of kdetrees to a set of Apicomplexa genes identified several unreliable sequence alignments that had escaped previous detection, as well as a gene independently reported as a possible case of horizontal gene transfer. We also analyze a set of Epichloë genes, fungi symbiotic with grasses, successfully identifying a contrived instance of paralogy. AVAILABILITY AND IMPLEMENTATION: Our method for estimating tree distributions and identifying outlying trees is implemented as the R package kdetrees and is available for download from CRAN.
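
A rough sketch of the general idea (kernel density scores over pairwise distances, then flagging low-density items) is given below. It is not the kdetrees implementation or its API: each "tree" is assumed to be already reduced to a hypothetical numeric vector, Euclidean distance stands in for a tree distance, the bandwidth is fixed by hand, and the cutoff rule (first quartile minus `k` times the interquartile range) is an assumption chosen for illustration. The double loop over items is quadratic in the number of trees.

```python
import math
import statistics

def kde_scores(points, bandwidth):
    """Leave-one-out Gaussian kernel density score for every item."""
    n = len(points)
    scores = []
    for i in range(n):
        total = sum(math.exp(-0.5 * (math.dist(points[i], points[j]) / bandwidth) ** 2)
                    for j in range(n) if j != i)
        scores.append(total / (n - 1))
    return scores

def flag_outliers(scores, k=1.5):
    """Indices whose score falls below Q1 - k * IQR (assumed cutoff rule)."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    cutoff = q1 - k * (q3 - q1)
    return [i for i, s in enumerate(scores) if s < cutoff]

# Hypothetical tree vectors: nine similar trees and one far from the rest.
trees = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.05, 1.0), (0.95, 1.0),
         (1.0, 1.05), (1.0, 0.95), (1.1, 1.1), (0.9, 0.9), (4.0, 4.0)]

scores = kde_scores(trees, bandwidth=0.5)
outliers = flag_outliers(scores)  # index 9 is the only item flagged
```

With real data the vectors would be replaced by a pairwise tree distance, and the bandwidth would need to be chosen from the data rather than fixed.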


Subjects
Phylogeny , Algorithms , Apicomplexa/genetics , Epichloe/genetics , Gene Transfer, Horizontal , Genes , Sequence Alignment , Software , Statistics, Nonparametric
4.
BMC Bioinformatics ; 13: 210, 2012 Aug 21.
Article in English | MEDLINE | ID: mdl-22909268

ABSTRACT

BACKGROUND: The increased use of multi-locus data sets for phylogenetic reconstruction has increased the need to determine whether a set of gene trees significantly deviates from the phylogenetic patterns of other genes. Such unusual gene trees may have been influenced by other evolutionary processes such as selection, gene duplication, or horizontal gene transfer. RESULTS: Motivated by this problem, we propose a nonparametric goodness-of-fit test for two empirical distributions of gene trees, and we developed the software GeneOut to estimate a p-value for the test. Our approach maps trees into a multi-dimensional vector space and then applies support vector machines (SVMs) to measure the separation between two sets of pre-defined trees. We use a permutation test to assess the significance of the SVM separation. To demonstrate the performance of GeneOut, we applied it to the comparison of gene trees simulated within different species trees across a range of species tree depths. Applied directly to sets of simulated gene trees with large sample sizes, GeneOut was able to detect very small differences between two sets of gene trees generated under different species trees. Our statistical test can also include tree reconstruction into its test framework through a variety of phylogenetic optimality criteria. When applied to DNA sequence data simulated from different sets of gene trees, results in the form of receiver operating characteristic (ROC) curves indicated that GeneOut performed well in the detection of differences between sets of trees with different distributions in a multi-dimensional space. Furthermore, it controlled false positive and false negative rates very well, indicating a high degree of accuracy. CONCLUSIONS: The non-parametric nature of our statistical test provides fast and efficient analyses, and makes it an applicable test for any scenario where evolutionary or other factors can lead to trees with different multi-dimensional distributions. The software GeneOut is freely available under the GNU public license.
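
The permutation step described above can be sketched as follows. This is not GeneOut: to stay within the standard library, the SVM separation is swapped for a much simpler statistic, the distance between group centroids, and the trees are assumed to be already mapped to hypothetical numeric vectors. Function names and the toy data are illustrative only.

```python
import math
import random

def centroid(vectors):
    """Coordinate-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def separation(a, b):
    """Stand-in separation statistic: distance between group centroids."""
    return math.dist(centroid(a), centroid(b))

def permutation_p_value(a, b, n_perm=999, seed=0):
    """Share of random relabelings whose separation is at least the observed one."""
    rng = random.Random(seed)
    observed = separation(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if separation(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical tree vectors: set_b is set_a shifted far away.
set_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (0.0, 0.05)]
set_b = [(x + 5.0, y + 5.0) for x, y in set_a]

p_different = permutation_p_value(set_a, set_b)  # small: the sets are well separated
p_identical = permutation_p_value(set_a, set_a)  # exactly 1.0: observed separation is 0
```

The same loop works with any separation statistic, so an SVM margin could be plugged in where `separation` is called, at the cost of retraining the classifier for each permutation.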


Subjects
Phylogeny , Sequence Analysis, DNA/methods , Software , Support Vector Machine , Base Sequence , Gene Duplication , Gene Transfer, Horizontal , Genes
5.
Bioinformatics ; 27(17): 2361-7, 2011 Sep 01.
Article in English | MEDLINE | ID: mdl-21752801

ABSTRACT

MOTIVATION: Motif discovery is now routinely used in high-throughput studies including large-scale sequencing and proteomics. These datasets present new challenges. The first is speed. Many motif discovery methods do not scale well to large datasets. Another issue is identifying discriminative rather than generative motifs. Such discriminative motifs are important for identifying co-factors and for explaining changes in behavior between different conditions. RESULTS: To address these issues we developed a method for DECOnvolved Discriminative motif discovery (DECOD). DECOD uses a k-mer count table and so its running time is independent of the size of the input set. By deconvolving the k-mers DECOD considers context information without using the sequences directly. DECOD outperforms previous methods both in speed and in accuracy when using simulated and real biological benchmark data. We performed new binding experiments for p53 mutants and used DECOD to identify p53 co-factors, suggesting new mechanisms for p53 activation. AVAILABILITY: The source code and binaries for DECOD are available at http://www.sb.cs.cmu.edu/DECOD CONTACT: zivbj@cs.cmu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
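
To make the k-mer count table concrete, here is a minimal sketch, assuming two small sets of sequences and a naive score (difference in relative k-mer frequency between the sets). It is not DECOD and omits the deconvolution step entirely; the function names and toy sequences are hypothetical. The point it illustrates is that once the table is built, ranking depends only on the number of distinct k-mers, not on the number of input sequences.

```python
from collections import Counter

def kmer_counts(sequences, k):
    """Count every overlapping k-mer across a set of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def discriminative_kmers(positive, negative, k, top=3):
    """Rank k-mers by relative frequency in positive minus negative sequences."""
    pos, neg = kmer_counts(positive, k), kmer_counts(negative, k)
    pos_total = sum(pos.values()) or 1
    neg_total = sum(neg.values()) or 1
    score = {m: pos[m] / pos_total - neg[m] / neg_total
             for m in set(pos) | set(neg)}
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(score, key=lambda m: (-score[m], m))[:top]

# Toy data: every positive sequence contains the planted motif TGACA.
positive = ["ACGTGACA", "TTGACATT", "GGTGACAC"]
negative = ["ACGTACGT", "TTTTAAAA", "GGCCGGCC"]

best = discriminative_kmers(positive, negative, k=4, top=2)
print(best)  # ['GACA', 'TGAC']
```

Both top 4-mers are fragments of the planted motif, which hints at why overlapping k-mers need to be combined before a full motif can be reported.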


Subjects
DNA/chemistry , Nucleotide Motifs , Sequence Analysis, DNA , Algorithms , Base Sequence , Tumor Suppressor Protein p53/metabolism
6.
Bull Math Biol ; 73(4): 795-810, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21409513

ABSTRACT

To infer a phylogenetic tree from a set of DNA sequences, typically a multiple alignment is first used to obtain homologous bases. The inferred phylogeny can be very sensitive to how the alignment was created. We develop tools for analyzing the robustness of phylogeny to perturbations in alignment parameters in the Needleman-Wunsch (NW) algorithm. Our main tool is parametric alignment, with novel improvements that are of general interest in parametric inference. Using parametric alignment and a Gaussian distribution on alignment parameters, we derive probabilities of optimal alignment summaries and inferred phylogenies. We apply our method to analyze intronic sequences from Drosophila flies. We show that phylogeny estimates can be sensitive to the choice of alignment parameters, and that parametric alignment elucidates the relationship between alignment parameters and reconstructed trees.
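
The dependence of an alignment on its parameters can be explored with a small experiment. The sketch below is a Monte Carlo stand-in, not the exact parametric alignment described above: it assumes a linear gap penalty, a fixed match score of 1, and independent Gaussian draws for the mismatch and gap scores (means and standard deviation chosen arbitrarily), then tallies how often each alignment summary (matches, mismatches, gaps) is optimal. Function names are hypothetical.

```python
import random
from collections import Counter

def nw_summary(a, b, match, mismatch, gap):
    """Needleman-Wunsch global alignment with a linear gap score.
    Returns (matches, mismatches, gaps) for one optimal alignment."""
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Boundaries are built cumulatively so the traceback comparisons are exact.
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback, preferring the diagonal move, then a gap in b, then a gap in a.
    i, j = n, m
    matches = mismatches = gaps = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                if a[i - 1] == b[j - 1]:
                    matches += 1
                else:
                    mismatches += 1
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            gaps, i = gaps + 1, i - 1
        else:
            gaps, j = gaps + 1, j - 1
    return matches, mismatches, gaps

def summary_probabilities(a, b, mean=(-1.0, -2.0), sd=0.5, draws=500, seed=0):
    """Estimate how often each summary is optimal under Gaussian parameters."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(draws):
        mismatch = rng.gauss(mean[0], sd)
        gap = rng.gauss(mean[1], sd)
        tally[nw_summary(a, b, 1.0, mismatch, gap)] += 1
    return {summary: count / draws for summary, count in tally.items()}

probs = summary_probabilities("GATTACAGT", "GACTATAG")
```

Sampling only approximates the probabilities; parametric alignment instead characterizes exactly which regions of parameter space yield each optimal summary.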


Subjects
Phylogeny , Sequence Alignment/statistics & numerical data , Alcohol Dehydrogenase/genetics , Algorithms , Animals , Base Sequence/genetics , Drosophila/genetics , Drosophila Proteins/genetics , Introns/genetics , Normal Distribution , Probability , Sequence Alignment/methods , Sequence Homology, Nucleic Acid , Software , Synaptotagmins/genetics
7.
Article in English | MEDLINE | ID: mdl-20802801

ABSTRACT

We propose a statistical method to test whether two phylogenetic trees with given alignments are significantly incongruent. Our method compares the two distributions of phylogenetic trees given by two input alignments, instead of comparing point estimations of trees. This statistical approach can be applied to gene tree analysis, for example to detect unusual events in genome evolution such as horizontal gene transfer and reshuffling. Our method uses difference of means to compare two distributions of trees, after mapping trees into a vector space. Bootstrapping alignment columns can then be applied to obtain p-values. To compute distances between means, we employ a "kernel method" which speeds up distance calculations when trees are mapped in a high-dimensional feature space, e.g., splits or quartets feature space. In this pilot study, we first test our statistical method on data sets simulated under a coalescence model, to test whether two alignments are generated by congruent gene trees. We follow our simulation results with applications to data sets of gophers and lice, grasses and their endophytes, and different fungal genes from the same genome. A companion toolkit, Phylotree, is provided to facilitate computational experiments.
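
The kernel shortcut for the distance between means rests on a standard identity: the squared distance between two sample means in feature space equals the average kernel value within the first sample, plus the average within the second, minus twice the average across samples. The sketch below assumes each tree is represented as a set of split labels and uses the number of shared splits as a linear kernel in the splits feature space; it is an illustration of that identity, not the Phylotree toolkit, and the names and split labels are hypothetical.

```python
def split_kernel(tree1, tree2):
    """Linear kernel in split feature space: the number of shared splits."""
    return len(tree1 & tree2)

def mean_kernel(xs, ys):
    """Average kernel value over all pairs drawn from xs and ys."""
    return sum(split_kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))

def squared_mean_distance(xs, ys):
    """Squared distance between sample means, with no explicit feature vectors."""
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)

# Two hypothetical five-taxon trees that share one of their two splits.
tree_1 = frozenset({"AB|CDE", "ABC|DE"})
tree_2 = frozenset({"AC|BDE", "ABC|DE"})

sample_x = [tree_1, tree_1]
sample_y = [tree_2, tree_2]

print(squared_mean_distance(sample_x, sample_x))  # 0.0
print(squared_mean_distance(sample_x, sample_y))  # 2.0
```

The value 2.0 matches the explicit calculation: the two mean vectors differ by one unit in exactly two split coordinates. A p-value would then come from recomputing this statistic on bootstrap resamples of the alignment columns, which is not shown here.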

8.
Bioinformatics ; 25(12): 1476-83, 2009 Jun 15.
Article in English | MEDLINE | ID: mdl-19357096

ABSTRACT

MOTIVATION: Many biological systems operate in a similar manner across a large number of species or conditions. Cross-species analysis of sequence and interaction data is often applied to determine the function of new genes. In contrast to these static measurements, microarrays measure the dynamic, condition-specific response of complex biological systems. The recent exponential growth in microarray expression datasets allows researchers to combine expression experiments from multiple species to identify genes that are not only conserved in sequence but also operate in a similar way in the different species studied. RESULTS: In this review we discuss the computational and technical challenges associated with these studies, the approaches that have been developed to address these challenges and the advantages of cross-species analysis of microarray data. We show how successful application of these methods leads to insights that cannot be obtained when analyzing data from a single species. We also highlight current open problems and discuss possible ways to address them.


Subjects
Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Animals , Databases, Genetic , Humans , Species Specificity
9.
Algorithms Mol Biol ; 3: 5, 2008 Apr 30.
Article in English | MEDLINE | ID: mdl-18447942

ABSTRACT

The popular neighbor-joining (NJ) algorithm used in phylogenetics is a greedy algorithm for finding the balanced minimum evolution (BME) tree associated to a dissimilarity map. From this point of view, NJ is "optimal" when the algorithm outputs the tree which minimizes the balanced minimum evolution criterion. We use the fact that the NJ tree topology and the BME tree topology are determined by polyhedral subdivisions of the spaces of dissimilarity maps [equation; see text] to study the optimality of the neighbor-joining algorithm. In particular, we investigate and compare the polyhedral subdivisions for n

10.
Bull Math Biol ; 69(8): 2723-35, 2007 Nov.
Article in English | MEDLINE | ID: mdl-17874271

ABSTRACT

The human genotope is the convex hull of all allele frequency vectors that can be obtained from the genotypes present in the human population. In this paper, we take a few initial steps toward a description of this object, which may be fundamental for future population-based genetics studies. Here we use data from the HapMap Project, restricted to two ENCODE regions, to study a subpolytope of the human genotope. We study three different approaches for obtaining informative low-dimensional projections of this subpolytope. The projections are specified by projection onto a few tag SNPs, principal component analysis, and archetypal analysis. We describe the application of our geometric approach to identifying structure in populations based on single nucleotide polymorphisms.


Subjects
Genome, Human , Genotype , Genetic Variation , Genomics/statistics & numerical data , Humans , Mathematics , Models, Genetic , Polymorphism, Single Nucleotide , Principal Component Analysis
11.
PLoS Comput Biol ; 2(6): e73, 2006 Jun 23.
Article in English | MEDLINE | ID: mdl-16789815

ABSTRACT

The classic algorithms of Needleman-Wunsch and Smith-Waterman find a maximum a posteriori probability alignment for a pair hidden Markov model (PHMM). To process large genomes that have undergone complex genome rearrangements, almost all existing whole genome alignment methods apply fast heuristics to divide genomes into small pieces that are suitable for Needleman-Wunsch alignment. In these alignment methods, it is standard practice to fix the parameters and to produce a single alignment for subsequent analysis by biologists. As the number of alignment programs applied on a whole genome scale continues to increase, so does the disagreement in their results. The alignments produced by different programs vary greatly, especially in non-coding regions of eukaryotic genomes where the biologically correct alignment is hard to find. Parametric alignment is one possible remedy. This methodology resolves the issue of robustness to changes in parameters by finding all optimal alignments for all possible parameters in a PHMM. Our main result is the construction of a whole genome parametric alignment of Drosophila melanogaster and Drosophila pseudoobscura. This alignment draws on existing heuristics for dividing whole genomes into small pieces for alignment, and it relies on advances we have made in computing convex polytopes that allow us to parametrically align non-coding regions using biologically realistic models. We demonstrate the utility of our parametric alignment for biological inference by showing that cis-regulatory elements are more conserved between Drosophila melanogaster and Drosophila pseudoobscura than previously thought. We also show how whole genome parametric alignment can be used to quantitatively assess the dependence of branch length estimates on alignment parameters.


Subjects
Algorithms , Chromosome Mapping/methods , Drosophila/genetics , Genome, Insect/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Animals , Base Sequence , Conserved Sequence , Molecular Sequence Data , Sequence Homology, Nucleic Acid