Results 1 - 11 of 11
1.
BMC Med Genomics ; 13(Suppl 10): 151, 2020 10 22.
Article in English | MEDLINE | ID: mdl-33087128

ABSTRACT

BACKGROUND: Bronchoscopy for suspected lung cancer has low diagnostic sensitivity, rendering many inconclusive results. The Bronchial Genomic Classifier (BGC) was developed to help with patient management by identifying those with low risk of lung cancer when bronchoscopy is inconclusive. The BGC was trained and validated on patients in the Airway Epithelial Gene Expression in the Diagnosis of Lung Cancer (AEGIS) trials. A modern patient cohort, the BGC Registry, showed differences in key clinical factors from the AEGIS cohorts, with less smoking history, smaller nodules and older age. Additionally, we discovered interfering factors (inhaled medication and sample collection timing) that impacted gene expression and potentially disguised genomic cancer signals. METHODS: In this study, we leveraged multiple cohorts and next generation sequencing technology to develop a robust Genomic Sequencing Classifier (GSC). To address demographic composition shift and interfering factors, we synergized three algorithmic strategies: 1) ensemble of clinical dominant and genomic dominant models; 2) development of hierarchical regression models where the main effects from clinical variables were regressed out prior to the genomic impact being fitted in the model; and 3) targeted placement of genomic and clinical interaction terms to stabilize the effect of interfering factors. The final GSC model uses 1232 genes and four clinical covariates - age, pack-years, inhaled medication use, and specimen collection timing. RESULTS: In the validation set (N = 412), the GSC down-classified low and intermediate pre-test risk subjects to very low and low post-test risk with a specificity of 45% (95% CI 37-53%) and a sensitivity of 91% (95% CI 81-97%), resulting in a negative predictive value of 95% (95% CI 89-98%). Twelve percent of intermediate pre-test risk subjects were up-classified to high post-test risk with a positive predictive value of 65% (95% CI 44-82%), and 27% of high pre-test risk subjects were up-classified to very high post-test risk with a positive predictive value of 91% (95% CI 78-97%). CONCLUSIONS: The GSC overcame the impact of interfering factors and achieved consistent performance across multiple cohorts. It demonstrated diagnostic accuracy in both down- and up-classification of cancer risk, providing physicians actionable information for many patients with inconclusive bronchoscopy.
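
The link between sensitivity, specificity and predictive values reported in this abstract can be illustrated with a minimal sketch based on Bayes' rule. The function name `predictive_values` is hypothetical, and the 20% prevalence used below is an assumed illustrative value, not a figure taken from the abstract; only the 91% sensitivity and 45% specificity come from the text.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' rule.

    All arguments are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity are the point estimates quoted in the abstract;
# the prevalence is an ASSUMED value chosen only for illustration.
ppv, npv = predictive_values(sensitivity=0.91, specificity=0.45, prevalence=0.20)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under that assumed prevalence, a test with modest specificity can still yield a high negative predictive value, which is the property a rule-out classifier depends on.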


Subjects
Exome Sequencing , Genetic Predisposition to Disease , Lung Neoplasms/genetics , Models, Genetic , Transcriptome , Aged , Female , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Lung Neoplasms/diagnosis , Machine Learning , Male , Middle Aged , Registries , Republic of Korea , Sequence Analysis, RNA
2.
Bioinformatics ; 31(2): 225-32, 2015 Jan 15.
Article in English | MEDLINE | ID: mdl-25266225

ABSTRACT

MOTIVATION: Genomic analyses of many solid cancers have demonstrated extensive genetic heterogeneity between as well as within individual tumors. However, statistical methods for classifying tumors by subtype based on genomic biomarkers generally entail an all-or-none decision, which may be misleading for clinical samples containing a mixture of subtypes and/or normal cell contamination. RESULTS: We have developed a mixed-membership classification model, called glad, that simultaneously learns a sparse biomarker signature for each subtype as well as a distribution over subtypes for each sample. We demonstrate the accuracy of this model on simulated data, in-vitro mixture experiments, and clinical samples from the Cancer Genome Atlas (TCGA) project. We show that many TCGA samples are likely a mixture of multiple subtypes. AVAILABILITY: A python module implementing our algorithm is available from http://genomics.wpi.edu/glad/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Biomarkers, Tumor/genetics , Computational Biology/methods , Neoplasms/classification , Neoplasms/genetics , Software , Computer Simulation , Data Interpretation, Statistical , Gene Expression Profiling , Gene Regulatory Networks , Humans
3.
PLoS Genet ; 6(9): e1001099, 2010 Sep 09.
Article in English | MEDLINE | ID: mdl-20838588

ABSTRACT

Plasmodium parasites, the causal agents of malaria, result in more than 1 million deaths annually. Plasmodium are unicellular eukaryotes with small ∼23 Mb genomes encoding ∼5200 protein-coding genes. The protein-coding genes comprise about half of these genomes. Although evolutionary processes have a significant impact on malaria control, the selective pressures within Plasmodium genomes are poorly understood, particularly in the non-protein-coding portion of the genome. We use evolutionary methods to describe selective processes in both the coding and non-coding regions of these genomes. Based on genome alignments of seven Plasmodium species, we show that protein-coding, intergenic and intronic regions are all subject to purifying selection and we identify 670 conserved non-genic elements. We then use genome-wide polymorphism data from P. falciparum to describe short-term selective processes in this species and identify some candidate genes for balancing (diversifying) selection. Our analyses suggest that there are many functional elements in the non-genic regions of these genomes and that adaptive evolution has occurred more frequently in the protein-coding regions of the genome.


Subjects
Genome, Protozoan/genetics , Malaria/parasitology , Parasites/genetics , Plasmodium/genetics , Selection, Genetic , Animals , Conserved Sequence/genetics , Genes, Protozoan/genetics , Open Reading Frames/genetics , Phylogeny , Species Specificity , Time Factors
4.
J Biol Chem ; 283(30): 21187-97, 2008 Jul 25.
Article in English | MEDLINE | ID: mdl-18487200

ABSTRACT

Type I collagen, the predominant protein of vertebrates, polymerizes with type III and V collagens and non-collagenous molecules into large cable-like fibrils, yet how the fibril interacts with cells and other binding partners remains poorly understood. To help reveal insights into the collagen structure-function relationship, a data base was assembled including hundreds of type I collagen ligand binding sites and mutations on a two-dimensional model of the fibril. Visual examination of the distribution of functional sites, and statistical analysis of mutation distributions on the fibril suggest it is organized into two domains. The "cell interaction domain" is proposed to regulate dynamic aspects of collagen biology, including integrin-mediated cell interactions and fibril remodeling. The "matrix interaction domain" may assume a structural role, mediating collagen cross-linking, proteoglycan interactions, and tissue mineralization. Molecular modeling was used to superimpose the positions of functional sites and mutations from the two-dimensional fibril map onto a three-dimensional x-ray diffraction structure of the collagen microfibril in situ, indicating the existence of domains in the native fibril. Sequence searches revealed that major fibril domain elements are conserved in type I collagens through evolution and in the type II/XI collagen fibril predominant in cartilage. Moreover, the fibril domain model provides potential insights into the genotype-phenotype relationship for several classes of human connective tissue diseases, mechanisms of integrin clustering by fibrils, the polarity of fibril assembly, heterotypic fibril function, and connective tissue pathology in diabetes and aging.


Subjects
Collagen/chemistry , Amino Acid Sequence , Animals , Binding Sites , Computational Biology , Humans , Integrins/chemistry , Ligands , Models, Biological , Molecular Conformation , Molecular Sequence Data , Mutation , Protein Structure, Tertiary , Sequence Homology, Amino Acid , X-Ray Diffraction
5.
Article in English | MEDLINE | ID: mdl-20531976

ABSTRACT

Statistical evolutionary models provide an important mechanism for describing and understanding the escape response of a viral population under a particular therapy. We present a new hierarchical model that incorporates spatially varying mutation and recombination rates at the nucleotide level. It also maintains separate parameters for treatment and control groups, which allows us to estimate treatment effects explicitly. We use the model to investigate the sequence evolution of HIV populations exposed to a recently developed antisense gene therapy, as well as a more conventional drug therapy. The detection of biologically relevant and plausible signals in both therapy studies demonstrates the effectiveness of the method.

6.
Proc Natl Acad Sci U S A ; 102(22): 7900-5, 2005 May 31.
Article in English | MEDLINE | ID: mdl-15911755

ABSTRACT

Sequence comparison across multiple organisms aids in the detection of regions under selection. However, resource limitations require a prioritization of genomes to be sequenced. This prioritization should be grounded in two considerations: the lineal scope encompassing the biological phenomena of interest, and the optimal species within that scope for detecting functional elements. We introduce a statistical framework for optimal species subset selection, based on maximizing power to detect conserved sites. Analysis of a phylogenetic star topology shows theoretically that the optimal species subset is not in general the most evolutionarily diverged subset. We then demonstrate this finding empirically in a study of vertebrate species. Our results suggest that marsupials are prime sequencing candidates.


Subjects
Data Interpretation, Statistical , Evolution, Molecular , Genomics/methods , Phylogeny , Vertebrates/genetics , Animals , Conserved Sequence/genetics , Likelihood Functions , Species Specificity
7.
Proc Natl Acad Sci U S A ; 102(9): 3453-8, 2005 Mar 01.
Article in English | MEDLINE | ID: mdl-15716358

ABSTRACT

We previously characterized nutrient-specific transcriptional changes in Escherichia coli upon limitation of nitrogen (N) or sulfur (S). These global homeostatic responses presumably minimize the slowing of growth under a particular condition. Here, we characterize responses to slow growth per se that are not nutrient-specific. The latter help to coordinate the slowing of growth, and in the case of down-regulated genes, to conserve scarce N or S for other purposes. Three effects were particularly striking. First, although many genes under control of the stationary phase sigma factor RpoS were induced and were apparently required under S-limiting conditions, one or more was inhibitory under N-limiting conditions, or RpoS itself was inhibitory. RpoS was, however, universally required during nutrient downshifts. Second, limitation for N and S greatly decreased expression of genes required for synthesis of flagella and chemotaxis, and the motility of E. coli was decreased. Finally, unlike the response of all other met genes, transcription of metE was decreased under S- and N-limiting conditions. The metE product, a methionine synthase, is one of the most abundant proteins in E. coli grown aerobically in minimal medium. Responses of metE to S and N limitation pointed to an interesting physiological rationale for the regulatory subcircuit controlled by the methionine activator MetR.


Subjects
Escherichia coli/genetics , Genes, Bacterial , Nitrogen/metabolism , Sulfur/metabolism , Bacterial Proteins/physiology , Flagella/genetics , Protein Biosynthesis/genetics , RNA, Messenger/genetics , Sigma Factor/physiology , Transcription, Genetic/physiology
8.
J Bacteriol ; 187(3): 1074-90, 2005 Feb.
Article in English | MEDLINE | ID: mdl-15659685

ABSTRACT

We determined global transcriptional responses of Escherichia coli K-12 to sulfur (S)- or nitrogen (N)-limited growth in adapted batch cultures and cultures subjected to nutrient shifts. Using two limitations helped to distinguish between nutrient-specific changes in mRNA levels and common changes related to the growth rate. Both homeostatic and slow growth responses were amplified upon shifts. This made detection of these responses more reliable and increased the number of genes that were differentially expressed. We analyzed microarray data in several ways: by determining expression changes after use of a statistical normalization algorithm, by hierarchical and k-means clustering, and by visual inspection of aligned genome images. Using these tools, we confirmed known homeostatic responses to global S limitation, which are controlled by the activators CysB and Cbl, and found that S limitation propagated into methionine metabolism, synthesis of FeS clusters, and oxidative stress. In addition, we identified several open reading frames likely to respond specifically to S availability. As predicted from the fact that the ddp operon is activated by NtrC, synthesis of cross-links between diaminopimelate residues in the murein layer was increased under N-limiting conditions, as was the proportion of tripeptides. Both of these effects may allow increased scavenging of N from the dipeptide D-alanine-D-alanine, the substrate of the Ddp system.


Subjects
Escherichia coli K12/metabolism , Nitrogen/metabolism , Sulfur/metabolism , Cluster Analysis , DNA, Bacterial/genetics , DNA, Complementary/genetics , Escherichia coli K12/genetics , Gene Expression Regulation, Bacterial , Genome, Bacterial , Homeostasis , Models, Genetic , Oligonucleotide Array Sequence Analysis , Peptidoglycan/genetics , Transcription, Genetic
9.
Bioinformatics ; 20(12): 1850-60, 2004 Aug 12.
Article in English | MEDLINE | ID: mdl-14988105

ABSTRACT

MOTIVATION: Phylogenetic shadowing is a comparative genomics principle that allows for the discovery of conserved regions in sequences from multiple closely related organisms. We develop a formal probabilistic framework for combining phylogenetic shadowing with feature-based functional annotation methods. The resulting model, a generalized hidden Markov phylogeny (GHMP), applies to a variety of situations where functional regions are to be inferred from evolutionary constraints. RESULTS: We show how GHMPs can be used to predict complete shared gene structures in multiple primate sequences. We also describe shadower, our implementation of such a prediction system. We find that shadower outperforms previously reported ab initio gene finders, including comparative human-mouse approaches, on a small sample of diverse exonic regions. Finally, we report on an empirical analysis of shadower's performance which reveals that as few as five well-chosen species may suffice to attain maximal sensitivity and specificity in exon demarcation. AVAILABILITY: A Web server is available at http://bonaire.lbl.gov/shadower


Subjects
Algorithms , Chromosome Mapping/methods , Evolution, Molecular , Gene Expression Profiling/methods , Models, Genetic , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Markov Chains , Models, Statistical , Phylogeny , Sequence Homology, Nucleic Acid , Software
10.
Proc Natl Acad Sci U S A ; 100(16): 9232-7, 2003 Aug 05.
Article in English | MEDLINE | ID: mdl-12878731

ABSTRACT

High-pressure liquid chromatography-tandem mass spectrometry was used to obtain a protein profile of Escherichia coli strain MG1655 grown in minimal medium with glycerol as the carbon source. By using cell lysate from only 3 × 10^8 cells, at least four different tryptic peptides were detected for each of 404 proteins in a short 4-h experiment. At least one peptide with a high reliability score was detected for 986 proteins. Because membrane proteins were underrepresented, a second experiment was performed with a preparation enriched in membranes. An additional 161 proteins were detected, of which from half to two-thirds were membrane proteins. Overall, 1,147 different E. coli proteins were identified, almost 4 times as many as had been identified previously by using other tools. The protein list was compared with the transcription profile obtained on Affymetrix GeneChips. Expression of 1,113 (97%) of the genes whose protein products were found was detected at the mRNA level. The arithmetic mean mRNA signal intensity for these genes was 3-fold higher than that for all 4,300 protein-coding genes of E. coli. Thus, GeneChip data confirmed the high reliability of the protein list, which contains about one-fourth of the proteins of E. coli. Detection of even those membrane proteins and proteins of undefined function that are encoded by the same operons (transcriptional units) encoding proteins on the list remained low.
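
The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
# Counts taken directly from the abstract.
first_experiment = 986       # proteins with at least one high-reliability peptide
membrane_experiment = 161    # additional proteins from the membrane-enriched preparation
with_mrna_signal = 1113      # identified proteins whose genes were detected at the mRNA level
protein_coding_genes = 4300  # protein-coding genes of E. coli, as stated in the abstract

# Total proteins identified across both experiments.
total_identified = first_experiment + membrane_experiment

# Share of identified proteins confirmed at the mRNA level.
mrna_fraction = with_mrna_signal / total_identified

# Share of all protein-coding genes represented in the protein list.
proteome_fraction = total_identified / protein_coding_genes

print(total_identified)
print(f"{mrna_fraction:.1%} detected at the mRNA level")
print(f"{proteome_fraction:.1%} of protein-coding genes covered")
```

The two experiments sum to the reported 1,147 proteins, and the derived fractions agree with the abstract's "97%" and "about one-fourth".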


Subjects
Escherichia coli/metabolism , Transcription, Genetic , Cell Membrane/metabolism , Chromatography, High Pressure Liquid , Escherichia coli/physiology , Glycerol/chemistry , Mass Spectrometry , Oligonucleotide Array Sequence Analysis , Proteome , RNA, Messenger/metabolism , Time Factors
11.
Science ; 299(5611): 1391-4, 2003 Feb 28.
Article in English | MEDLINE | ID: mdl-12610304

ABSTRACT

Nonhuman primates represent the most relevant model organisms to understand the biology of Homo sapiens. The recent divergence and associated overall sequence conservation between individual members of this taxon have nonetheless largely precluded the use of primates in comparative sequence studies. We used sequence comparisons of an extensive set of Old World and New World monkeys and hominoids to identify functional regions in the human genome. Analysis of these data enabled the discovery of primate-specific gene regulatory elements and the demarcation of the exons of multiple genes. Much of the information content of the comprehensive primate sequence comparisons could be captured with a small subset of phylogenetically close primates. These results demonstrate the utility of intraprimate sequence comparisons to discover common mammalian as well as primate-specific functional elements in the human genome, which are unattainable through the evaluation of more evolutionarily distant species.


Subjects
Genome, Human , Genome , Phylogeny , Primates/genetics , Sequence Analysis, DNA , Animals , Apolipoproteins A/genetics , Biological Evolution , Cebidae/genetics , Cercopithecidae/genetics , Computational Biology , Conserved Sequence , DNA-Binding Proteins/metabolism , Electrophoretic Mobility Shift Assay , Exons , Gene Expression Regulation , Hominidae/genetics , Humans , Hylobates/genetics , Likelihood Functions , Regulatory Sequences, Nucleic Acid , Species Specificity , Tumor Cells, Cultured