1.
J Am Stat Assoc ; 114(526): 723-734, 2019.
Article in English | MEDLINE | ID: mdl-31391793

ABSTRACT

We consider the problem of learning a conditional Gaussian graphical model in the presence of latent variables. Building on recent advances in this field, we suggest a method that decomposes the parameters of a conditional Markov random field into the sum of a sparse and a low-rank matrix. We derive convergence bounds for this estimator and show that it is well-behaved in the high-dimensional regime as well as "sparsistent" (i.e., capable of recovering the graph structure). We then show how proximal gradient algorithms and semi-definite programming techniques can be employed to fit the model to thousands of variables. Through extensive simulations, we illustrate the conditions required for identifiability and show that there is a wide range of situations in which this model performs significantly better than its counterparts, for example, by accommodating more latent variables. Finally, the suggested method is applied to two datasets comprising individual-level data on genetic variants and metabolite levels. We show that our results replicate better than those of alternative approaches and exhibit enriched biological signal. Supplementary materials for this article are available online.
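As a rough illustration of the proximal gradient idea mentioned in this abstract, the sketch below applies elementwise soft-thresholding, the proximal operator of an L1 penalty, which is a common way to update a sparse component. The L1 penalty, the function names and the threshold value are assumptions made for illustration; the abstract does not state the exact penalties, and the low-rank update (typically singular-value thresholding) is omitted.

```python
def soft_threshold(value, threshold):
    """Proximal operator of threshold * |x|: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def prox_l1(matrix, threshold):
    """Apply soft-thresholding to every entry of a matrix given as a list of rows."""
    return [[soft_threshold(v, threshold) for v in row] for row in matrix]


# Hypothetical parameter estimate after one gradient step (illustrative numbers).
estimate = [[1.5, -0.2], [0.05, -2.0]]
sparse_part = prox_l1(estimate, 0.5)
print(sparse_part)
```

Small entries are set exactly to zero while large entries are shrunk by the threshold, which is how an L1 penalty produces a sparse estimate.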

2.
Genet Epidemiol ; 43(5): 532-547, 2019 07.
Article in English | MEDLINE | ID: mdl-30920090

ABSTRACT

Genome-wide association studies (GWAS) are a powerful tool for understanding the genetic basis of diseases and traits, but most studies have been conducted in isolation, with a focus on either a single or a set of closely related phenotypes. We describe MetABF, a simple Bayesian framework for performing integrative meta-analysis across multiple GWAS using summary statistics. The approach is applicable across a wide range of study designs and can increase the power by 50% compared with standard frequentist tests when only a subset of studies have a true effect. We demonstrate its utility in a meta-analysis of 20 diverse GWAS which were part of the Wellcome Trust Case Control Consortium 2. The novelty of the approach is its ability to explore, and assess the evidence for a range of possible true patterns of association across studies in a computationally efficient framework.


Subjects
Genome-Wide Association Study; Bayes Theorem; Case-Control Studies; Computer Simulation; Humans; Models, Genetic; Phenotype; Polymorphism, Single Nucleotide/genetics
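To make the idea of assessing patterns of association from summary statistics concrete, here is a minimal sketch that computes a Wakefield-style approximate Bayes factor per study and, assuming independent effects across studies, multiplies them to score every subset of studies that might carry a true effect. This is not the MetABF code: the function names, the prior standard deviation of 0.2 and the summary statistics are all hypothetical, and correlated or fixed-effect priors are not covered.

```python
import math
from itertools import combinations


def approx_bayes_factor(beta, se, prior_sd):
    """Approximate Bayes factor for H1 (effect ~ N(0, prior_sd^2)) against H0 (no effect)."""
    v = se ** 2
    w = prior_sd ** 2
    z2 = (beta / se) ** 2
    shrink = w / (v + w)
    return math.sqrt(1.0 - shrink) * math.exp(0.5 * z2 * shrink)


def subset_bayes_factors(betas, ses, prior_sd):
    """Bayes factor for each non-empty subset of studies having independent true effects."""
    abfs = [approx_bayes_factor(b, s, prior_sd) for b, s in zip(betas, ses)]
    results = {}
    for size in range(1, len(abfs) + 1):
        for subset in combinations(range(len(abfs)), size):
            results[subset] = math.prod(abfs[i] for i in subset)
    return results


# Hypothetical log odds ratios and standard errors for three studies.
betas = [0.20, 0.02, 0.18]
ses = [0.04, 0.05, 0.05]
bfs = subset_bayes_factors(betas, ses, prior_sd=0.2)
best = max(bfs, key=bfs.get)
print(best, bfs[best])
```

With these illustrative numbers the best-supported pattern includes the first and third studies only, because the second study's Bayes factor is below one and would reduce the product.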
3.
J Virol ; 93(1)2019 01 01.
Article in English | MEDLINE | ID: mdl-30333167

ABSTRACT

Accurate determination of the genetic diversity present in the HIV quasispecies is critical for the development of a preventative vaccine: in particular, little is known about viral genetic diversity for the second type of HIV, HIV-2. A better understanding of HIV-2 biology is relevant to the HIV vaccine field because a substantial proportion of infected people experience long-term viral control, and prior HIV-2 infection has been associated with slower HIV-1 disease progression in coinfected subjects. The majority of traditional and next-generation sequencing methods have relied on target amplification prior to sequencing, introducing biases that may obscure the true signals of diversity in the viral population. Additionally, target enrichment through PCR requires a priori sequence knowledge, which is lacking for HIV-2. Therefore, a target-enrichment-free method of library preparation would be valuable for the field. We applied an RNA shotgun sequencing (RNA-Seq) method without PCR amplification to cultured viral stocks and patient plasma samples from HIV-2-infected individuals. Libraries generated from total plasma RNA were analyzed with a two-step pipeline: (i) de novo genome assembly, followed by (ii) read remapping. By this approach, whole-genome sequences were generated with a 28× to 67× mean depth of coverage. Assembled reads showed a low level of GC bias, and comparison of the genome diversities at the intrahost level showed low diversity in the accessory gene vpx in all patients. Our study demonstrates that RNA-Seq is a feasible full-genome de novo sequencing method for blood plasma samples collected from HIV-2-infected individuals.

IMPORTANCE An accurate picture of viral genetic diversity is critical for the development of a globally effective HIV vaccine. However, sequencing strategies are often complicated by target enrichment prior to sequencing, introducing biases that can distort variant frequencies, which are not easily corrected for in downstream analyses. Additionally, detailed a priori sequence knowledge is needed to inform robust primer design when employing PCR amplification, a factor that is often lacking when working with tropical diseases localized in developing countries. Previous work has demonstrated that direct RNA shotgun sequencing (RNA-Seq) can be used to circumvent these issues for hepatitis C virus (HCV) and norovirus. We applied RNA-Seq to total RNA extracted from HIV-2 blood plasma samples, demonstrating the applicability of this technique to HIV-2 and allowing us to generate a dynamic picture of genetic diversity over the whole genome of HIV-2 in the context of low-bias sequencing.


Subjects
HIV Infections/virology; HIV-2/genetics; RNA, Viral/blood; Sequence Analysis, RNA/methods; Africa, Western; Bias; Female; Genome, Viral; HIV Infections/blood; HIV-2/classification; Humans; Male; Phylogeny; Quasispecies; Sequence Analysis, RNA/standards
4.
Genome Res ; 28(12): 1779-1790, 2018 12.
Article in English | MEDLINE | ID: mdl-30355600

ABSTRACT

Mosaic mutations present in the germline have important implications for reproductive risk and disease transmission. We previously demonstrated a phenomenon occurring in the male germline, whereby specific mutations arising spontaneously in stem cells (spermatogonia) lead to clonal expansion, resulting in elevated mutation levels in sperm over time. This process, termed "selfish spermatogonial selection," explains the high spontaneous birth prevalence and strong paternal age-effect of disorders such as achondroplasia and Apert, Noonan and Costello syndromes, with direct experimental evidence currently available for specific positions of six genes (FGFR2, FGFR3, RET, PTPN11, HRAS, and KRAS). We present a discovery screen to identify novel mutations and genes showing evidence of positive selection in the male germline, by performing massively parallel simplex PCR using RainDance technology to interrogate mutational hotspots in 67 genes (51.5 kb in total) in 276 biopsies of testes from five men (median age, 83 yr). Following ultradeep sequencing (about 16,000×), development of a low-frequency variant prioritization strategy, and targeted validation, we identified 61 distinct variants present at frequencies as low as 0.06%, including 54 variants not previously directly associated with selfish selection. The majority (80%) of variants identified have previously been implicated in developmental disorders and/or oncogenesis and include mutations in six newly associated genes (BRAF, CBL, MAP2K1, MAP2K2, RAF1, and SOS1), all of which encode components of the RAS-MAPK pathway and activate signaling. Our findings extend the link between mutations dysregulating the RAS-MAPK pathway and selfish selection, and show that the aging male germline is a repository for such deleterious mutations.


Subjects
Mitogen-Activated Protein Kinases/metabolism; Mutation; Signal Transduction; Testis/metabolism; ras Proteins/metabolism; Aged; Aged, 80 and over; Genetic Variation; Humans; Male; Middle Aged
5.
PLoS One ; 12(5): e0178169, 2017.
Article in English | MEDLINE | ID: mdl-28542371

ABSTRACT

Adult male germline stem cells (spermatogonia) proliferate by mitosis and, after puberty, generate spermatocytes that undertake meiosis to produce haploid spermatozoa. Germ cells are under evolutionary constraint to curtail mutations and maintain genome integrity. Despite constant turnover, spermatogonia very rarely form tumors, so-called spermatocytic tumors (SpT). In line with the previous identification of FGFR3 and HRAS selfish mutations in a subset of cases, candidate gene screening of 29 SpTs identified an oncogenic NRAS mutation in two cases. To gain insights into the etiology of SpT and into properties of the male germline, we performed whole-genome sequencing of five tumors (4/5 with matched normal tissue). The acquired single nucleotide variant load was extremely low (~0.2 per Mb), with an average of 6 (2-9) non-synonymous variants per tumor, none of which is likely to be oncogenic. The observed mutational signature of SpTs is strikingly similar to that of germline de novo mutations, mostly involving C>T transitions with a significant enrichment in the ACG trinucleotide context. The tumors exhibited extensive aneuploidy (50-99 autosomes/tumor) involving whole chromosomes, with recurrent gains of chr9 and chr20 and loss of chr7, suggesting that aneuploidy itself represents the initiating oncogenic event. We propose that SpT etiology recapitulates the unique properties of male germ cells; because of evolutionary constraints to maintain a low point mutation rate, rare tumorigenic driver events are caused by a combination of gene imbalance mediated via whole-chromosome aneuploidy. Finally, we propose a general framework of male germ cell tumor pathology that accounts for their mutational landscape, timing and cellular origin.


Subjects
Biomarkers, Tumor/genetics; Genome, Human; Germ-Line Mutation/genetics; High-Throughput Nucleotide Sequencing/methods; Spermatocytes/pathology; Testicular Neoplasms/genetics; DNA Copy Number Variations/genetics; DNA Methylation; Humans; Male; Receptor, Fibroblast Growth Factor, Type 3; Sexual Maturation; Spermatocytes/metabolism; Testicular Neoplasms/pathology
6.
Nat Genet ; 49(5): 666-673, 2017 May.
Article in English | MEDLINE | ID: mdl-28394351

ABSTRACT

Outcomes of hepatitis C virus (HCV) infection and treatment depend on viral and host genetic factors. Here we use human genome-wide genotyping arrays and new whole-genome HCV viral sequencing technologies to perform a systematic genome-to-genome study of 542 individuals who were chronically infected with HCV, predominantly genotype 3. We show that both alleles of genes encoding human leukocyte antigen molecules and genes encoding components of the interferon lambda innate immune system drive viral polymorphism. Additionally, we show that IFNL4 genotypes determine HCV viral load through a mechanism dependent on a specific amino acid residue in the HCV NS5A protein. These findings highlight the interplay between the innate immune system and the viral genome in HCV control.


Subjects
Adaptive Immunity/genetics; Genome, Human/genetics; Genome, Viral/genetics; Hepacivirus/genetics; Hepatitis C, Chronic/genetics; Immunity, Innate/genetics; Alleles; Genetic Variation; Genotype; HLA Antigens/genetics; Hepacivirus/physiology; Hepatitis C, Chronic/virology; Host-Pathogen Interactions/genetics; Humans; Interleukins/genetics; Logistic Models; Principal Component Analysis; Viral Load/genetics; Viral Nonstructural Proteins/genetics
7.
PLoS Comput Biol ; 12(5): e1004842, 2016 05.
Article in English | MEDLINE | ID: mdl-27145223

ABSTRACT

A central challenge in the analysis of genetic variation is to provide realistic genome simulation across millions of samples. Present day coalescent simulations do not scale well, or use approximations that fail to capture important long-range linkage properties. Analysing the results of simulations also presents a substantial challenge, as current methods to store genealogies consume a great deal of space, are slow to parse and do not take advantage of shared structure in correlated trees. We solve these problems by introducing sparse trees and coalescence records as the key units of genealogical analysis. Using these tools, exact simulation of the coalescent with recombination for chromosome-sized regions over hundreds of thousands of samples is possible, and substantially faster than present-day approximate methods. We can also analyse the results orders of magnitude more quickly than with existing methods.


Subjects
Genetic Variation; Models, Genetic; Pedigree; Algorithms; Computational Biology; Computer Simulation; Evolution, Molecular; Genetics, Population; Humans; Recombination, Genetic; Sample Size
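The following sketch illustrates how coalescence records can encode correlated trees compactly. It assumes each record holds a genomic interval, a parent node, its children and a coalescence time; this field layout, the class and function names and the toy records are assumptions for illustration and are not taken from the authors' software.

```python
from typing import Dict, List, NamedTuple, Tuple


class CoalescenceRecord(NamedTuple):
    left: float                # start of the genomic interval (inclusive)
    right: float               # end of the genomic interval (exclusive)
    node: int                  # parent (ancestral) node
    children: Tuple[int, ...]  # nodes that coalesce into the parent
    time: float                # time of the coalescence event


def tree_at(records: List[CoalescenceRecord], position: float) -> Dict[int, int]:
    """Return the child -> parent map of the tree covering a genomic position."""
    parent = {}
    for rec in records:
        if rec.left <= position < rec.right:
            for child in rec.children:
                parent[child] = rec.node
    return parent


def root_of(parent: Dict[int, int], node: int) -> int:
    """Follow parent links upward until a node without a parent is reached."""
    while node in parent:
        node = parent[node]
    return node


# Toy example: samples 0, 1, 2 over the interval [0, 10).
records = [
    CoalescenceRecord(0, 10, 3, (0, 1), 0.2),  # shared by both trees, stored once
    CoalescenceRecord(0, 5, 4, (2, 3), 0.7),   # root of the left-hand tree
    CoalescenceRecord(5, 10, 5, (2, 3), 1.1),  # root of the right-hand tree
]
print(tree_at(records, 2.0), root_of(tree_at(records, 7.0), 0))
```

The record for node 3 spans the whole region, so the structure shared by the two adjacent trees is stored only once rather than repeated per tree.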
8.
Bioinformatics ; 32(12): 1898-900, 2016 06 15.
Article in English | MEDLINE | ID: mdl-26873930

ABSTRACT

MOTIVATION: For many classes of disease, the same genetic risk variants underlie many related phenotypes or disease subtypes. Multinomial logistic regression provides an attractive framework to analyze multi-category phenotypes, and explore the genetic relationships between these phenotype categories. We introduce Trinculo, a program that implements a wide range of multinomial analyses in a single fast package that is designed to be easy to use by users of standard genome-wide association study software. AVAILABILITY AND IMPLEMENTATION: An open source C implementation, with code and binaries for Linux and Mac OSX, is available for download at http://sourceforge.net/projects/trinculo. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: lj4@well.ox.ac.uk.


Subjects
Bayes Theorem; Genome-Wide Association Study; Logistic Models; Phenotype; Software; Humans
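As a minimal sketch of the multinomial logistic model referred to in this abstract, the code below turns per-category intercepts and log odds ratios into category probabilities for a given genotype, with category 0 (for example, controls) as the reference. It is not Trinculo's implementation; the function name and all parameter values are hypothetical.

```python
import math


def category_probabilities(genotype, intercepts, log_odds_ratios):
    """P(category | genotype) under a multinomial logit with category 0 as reference."""
    # The reference category has a linear score fixed at zero.
    scores = [0.0] + [a + b * genotype for a, b in zip(intercepts, log_odds_ratios)]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical model: controls plus two disease subtypes, genotype coded as 0, 1 or 2.
for genotype in (0, 1, 2):
    probs = category_probabilities(genotype, intercepts=[-1.0, -1.5], log_odds_ratios=[0.3, 0.0])
    print(genotype, [round(p, 3) for p in probs])
```

A variant with a non-zero log odds ratio for one subtype and zero for the other, as in this toy setup, is the kind of subtype-specific pattern that a multi-category model can distinguish from a shared effect.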
9.
Nat Genet ; 47(3): 226-34, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25599401

ABSTRACT

We report a large multicenter genome-wide association study of Plasmodium falciparum resistance to artemisinin, the frontline antimalarial drug. Across 15 locations in Southeast Asia, we identified at least 20 mutations in kelch13 (PF3D7_1343700) affecting the encoded propeller and BTB/POZ domains, which were associated with a slow parasite clearance rate after treatment with artemisinin derivatives. Nonsynonymous polymorphisms in fd (ferredoxin), arps10 (apicoplast ribosomal protein S10), mdr2 (multidrug resistance protein 2) and crt (chloroquine resistance transporter) also showed strong associations with artemisinin resistance. Analysis of the fine structure of the parasite population showed that the fd, arps10, mdr2 and crt polymorphisms are markers of a genetic background on which kelch13 mutations are particularly likely to arise and that they correlate with the contemporary geographical boundaries and population frequencies of artemisinin resistance. These findings indicate that the risk of new resistance-causing mutations emerging is determined by specific predisposing genetic factors in the underlying parasite population.


Subjects
Antimalarials/pharmacology; Artemisinins/pharmacology; Genome, Protozoan; Plasmodium falciparum/drug effects; Plasmodium falciparum/genetics; Drug Resistance/genetics; Genetic Predisposition to Disease/genetics; Genome-Wide Association Study/methods; Humans; Malaria, Falciparum/drug therapy; Malaria, Falciparum/parasitology; Mutation; Polymorphism, Single Nucleotide
10.
Proc Natl Acad Sci U S A ; 110(50): 20152-7, 2013 Dec 10.
Article in English | MEDLINE | ID: mdl-24259709

ABSTRACT

The RAS proto-oncogene Harvey rat sarcoma viral oncogene homolog (HRAS) encodes a small GTPase that transduces signals from cell surface receptors to intracellular effectors to control cellular behavior. Although somatic HRAS mutations have been described in many cancers, germline mutations cause Costello syndrome (CS), a congenital disorder associated with predisposition to malignancy. Based on the epidemiology of CS and the occurrence of HRAS mutations in spermatocytic seminoma, we proposed that activating HRAS mutations become enriched in sperm through a process akin to tumorigenesis, termed selfish spermatogonial selection. To test this hypothesis, we quantified the levels, in blood and sperm samples, of HRAS mutations at the p.G12 codon and compared the results to changes at the p.A11 codon, at which activating mutations do not occur. The data strongly support the role of selection in determining HRAS mutation levels in sperm, and hence the occurrence of CS, but we also found differences from the mutation pattern in tumorigenesis. First, the relative prevalence of mutations in sperm correlates weakly with their in vitro activating properties and occurrence in cancers. Second, specific tandem base substitutions (predominantly GC>TT/AA) occur in sperm but not in cancers; genomewide analysis showed that this same mutation is also overrepresented in constitutional pathogenic and polymorphic variants, suggesting a heightened vulnerability to these mutations in the germline. We developed a statistical model to show how both intrinsic mutation rate and selfish selection contribute to the mutational burden borne by the paternal germline.


Subjects
Aging/genetics; Carcinogenesis/genetics; Costello Syndrome/genetics; Germ Cells/chemistry; Proto-Oncogene Proteins p21(ras)/genetics; Selection, Genetic/genetics; Adult; Aged; Aging/blood; Codon/genetics; Humans; Male; Middle Aged; Models, Statistical; Mutation/genetics; Proto-Oncogene Mas
11.
J Clin Endocrinol Metab ; 98(4): E796-800, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23450047

ABSTRACT

CONTEXT: The tumorigenic role of genetic abnormalities in sporadic pituitary nonfunctioning adenomas (NFAs), which usually originate from gonadotroph cells, is unknown. OBJECTIVE: The objective of the study was to identify somatic genetic abnormalities in sporadic pituitary NFAs. DESIGN: Whole-exome sequencing was performed using DNA from 7 pituitary NFAs and leukocyte samples obtained from the same patients. Somatic variants were confirmed by dideoxynucleotide sequencing, and candidate driver genes were assessed in an additional 24 pituitary NFAs. RESULTS: Whole-exome sequencing achieved a high degree of coverage such that approximately 97% of targeted bases were represented by more than 10 base reads; 24 somatic variants were identified and confirmed in the discovery set of 7 pituitary NFAs (mean 3.5 variants/tumor; range 1-7). Approximately 80% of variants occurred as missense single nucleotide variants and the remainder were synonymous changes or small frameshift deletions. Each of the 24 mutations occurred in independent genes with no recurrent mutations. Mutations were not observed in genes previously associated with pituitary tumorigenesis, although somatic variants in putative driver genes including platelet-derived growth factor D (PDGFD), N-myc down-regulated gene family member 4 (NDRG4), and Zipper sterile-α-motif kinase (ZAK) were identified; however, DNA sequence analysis of these in the validation set of 24 pituitary NFAs did not reveal any mutations indicating that these genes are unlikely to contribute significantly in the etiology of sporadic pituitary NFAs. CONCLUSIONS: Pituitary NFAs harbor few somatic mutations consistent with their low proliferation rates and benign nature, but mechanisms other than somatic mutation are likely involved in the etiology of sporadic pituitary NFAs.


Subjects
Adenoma/genetics; Exome/genetics; Pituitary Neoplasms/genetics; Sequence Analysis, DNA; Adenoma/epidemiology; Adenoma/physiopathology; Adult; Aged; Aged, 80 and over; DNA Mutational Analysis; Female; Gene Expression Regulation, Neoplastic; Genetic Association Studies; Humans; Male; Microarray Analysis; Middle Aged; Mutation/physiology; Pituitary Neoplasms/epidemiology; Pituitary Neoplasms/physiopathology; Sequence Analysis, DNA/methods; Transcriptome
12.
Science ; 339(6127): 1578-82, 2013 Mar 29.
Article in English | MEDLINE | ID: mdl-23413192

ABSTRACT

Instances in which natural selection maintains genetic variation in a population over millions of years are thought to be extremely rare. We conducted a genome-wide scan for long-lived balancing selection by looking for combinations of SNPs shared between humans and chimpanzees. In addition to the major histocompatibility complex, we identified 125 regions in which the same haplotypes are segregating in the two species, all but two of which are noncoding. In six cases, there is evidence for an ancestral polymorphism that persisted to the present in humans and chimpanzees. Regions with shared haplotypes are significantly enriched for membrane glycoproteins, and a similar trend is seen among shared coding polymorphisms. These findings indicate that ancient balancing selection has shaped human variation and point to genes involved in host-pathogen interactions as common targets.


Subjects
Genome, Human/genetics; Host-Pathogen Interactions/genetics; Pan troglodytes/genetics; Selection, Genetic; Animals; Base Sequence; Genetic Association Studies; Haplotypes; Humans; Molecular Sequence Data; Pedigree; Polymorphism, Single Nucleotide
13.
Nat Genet ; 45(2): 136-44, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23263490

ABSTRACT

Many individuals with multiple or large colorectal adenomas or early-onset colorectal cancer (CRC) have no detectable germline mutations in the known cancer predisposition genes. Using whole-genome sequencing, supplemented by linkage and association analysis, we identified specific heterozygous POLE or POLD1 germline variants in several multiple-adenoma and/or CRC cases but in no controls. The variants associated with susceptibility, POLE p.Leu424Val and POLD1 p.Ser478Asn, have high penetrance, and POLD1 mutation was also associated with endometrial cancer predisposition. The mutations map to equivalent sites in the proofreading (exonuclease) domain of DNA polymerases ɛ and δ and are predicted to cause a defect in the correction of mispaired bases inserted during DNA replication. In agreement with this prediction, the tumors from mutation carriers were microsatellite stable but tended to acquire base substitution mutations, as confirmed by yeast functional assays. Further analysis of published data showed that the recently described group of hypermutant, microsatellite-stable CRCs is likely to be caused by somatic POLE mutations affecting the exonuclease domain.


Subjects
Adenoma/genetics; Colorectal Neoplasms/genetics; DNA Mismatch Repair/genetics; DNA Polymerase III/genetics; DNA Polymerase II/genetics; DNA Replication/genetics; Models, Molecular; Exodeoxyribonucleases/genetics; Genetic Linkage; Genome-Wide Association Study; Germ-Line Mutation/genetics; Humans; Microsatellite Repeats/genetics; Pedigree; Poly-ADP-Ribose Binding Proteins; Schizosaccharomyces/genetics; Sequence Analysis, DNA
14.
PLoS Genet ; 8(12): e1003074, 2012.
Article in English | MEDLINE | ID: mdl-23236289

ABSTRACT

β-III spectrin is present in the brain and is known to be important in the function of the cerebellum. Heterozygous mutations in SPTBN2, the gene encoding β-III spectrin, cause Spinocerebellar Ataxia Type 5 (SCA5), an adult-onset, slowly progressive, autosomal-dominant pure cerebellar ataxia. SCA5 is sometimes known as "Lincoln ataxia," because the largest known family is descended from relatives of the United States President Abraham Lincoln. Using targeted capture and next-generation sequencing, we identified a homozygous stop codon in SPTBN2 in a consanguineous family in which childhood developmental ataxia co-segregates with cognitive impairment. The cognitive impairment could result from mutations in a second gene, but further analysis using whole-genome sequencing combined with SNP array analysis did not reveal any evidence of other mutations. We also examined a mouse knockout of β-III spectrin in which ataxia and progressive degeneration of cerebellar Purkinje cells have been previously reported and found morphological abnormalities in neurons from prefrontal cortex and deficits in object recognition tasks, consistent with the human cognitive phenotype. These data provide the first evidence that β-III spectrin plays an important role in cortical brain development and cognition, in addition to its function in the cerebellum; and we conclude that cognitive impairment is an integral part of this novel recessive ataxic syndrome, Spectrin-associated Autosomal Recessive Cerebellar Ataxia type 1 (SPARCA1). In addition, the identification of SPARCA1 and normal heterozygous carriers of the stop codon in SPTBN2 provides insights into the mechanism of molecular dominance in SCA5 and demonstrates that the cell-specific repertoire of spectrin subunits underlies a novel group of disorders, the neuronal spectrinopathies, which includes SCA5, SPARCA1, and a form of West syndrome.


Subjects
Cerebellum; Spectrin/genetics; Spinocerebellar Ataxias; Adult; Animals; Cerebellum/growth & development; Cerebellum/pathology; Chromosome Mapping; Cognition Disorders/genetics; Humans; Mice; Mice, Knockout; Mutation; Neurons/metabolism; Neurons/pathology; Purkinje Cells/pathology; Spinocerebellar Ataxias/genetics; Spinocerebellar Ataxias/physiopathology
15.
Nat Genet ; 44(12): 1294-301, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23104008

ABSTRACT

To further investigate susceptibility loci identified by genome-wide association studies, we genotyped 5,500 SNPs across 14 associated regions in 8,000 samples from a control group and 3 diseases: type 2 diabetes (T2D), coronary artery disease (CAD) and Graves' disease. We defined, using Bayes theorem, credible sets of SNPs that were 95% likely, based on posterior probability, to contain the causal disease-associated SNPs. In 3 of the 14 regions, TCF7L2 (T2D), CTLA4 (Graves' disease) and CDKN2A-CDKN2B (T2D), much of the posterior probability rested on a single SNP, and, in 4 other regions (CDKN2A-CDKN2B (CAD) and CDKAL1, FTO and HHEX (T2D)), the 95% sets were small, thereby excluding most SNPs as potentially causal. Very few SNPs in our credible sets had annotated functions, illustrating the limitations in understanding the mechanisms underlying susceptibility to common diseases. Our results also show the value of more detailed mapping to target sequences for functional studies.


Subjects
Coronary Artery Disease/genetics; Diabetes Mellitus, Type 2/genetics; Genetic Loci; Genetic Predisposition to Disease; Genome-Wide Association Study; Graves Disease/genetics; Alpha-Ketoglutarate-Dependent Dioxygenase FTO; Bayes Theorem; CTLA-4 Antigen/genetics; Cyclin-Dependent Kinase 5/genetics; Cyclin-Dependent Kinase Inhibitor p15/genetics; Genes, p16; Homeodomain Proteins/genetics; Humans; Polymorphism, Single Nucleotide; Proteins/genetics; Transcription Factor 7-Like 2 Protein/genetics; Transcription Factors/genetics; tRNA Methyltransferases
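The credible-set construction described in this abstract can be sketched as follows, assuming exactly one causal SNP per region and equal prior probability for every SNP, so that each SNP's posterior probability is its Bayes factor divided by the regional total. These assumptions, the function name and the Bayes factors are illustrative and are not taken from the study.

```python
def credible_set(bayes_factors, coverage=0.95):
    """Smallest set of SNPs whose summed posterior probability reaches the coverage."""
    total = sum(bayes_factors.values())
    ranked = sorted(bayes_factors.items(), key=lambda item: item[1], reverse=True)
    chosen = []
    cumulative = 0.0
    for snp, bf in ranked:
        chosen.append(snp)
        cumulative += bf / total  # posterior probability under equal priors
        if cumulative >= coverage:
            break
    return chosen, cumulative


# Hypothetical Bayes factors for four SNPs in one associated region.
region = {"snpA": 900.0, "snpB": 60.0, "snpC": 30.0, "snpD": 10.0}
snps, mass = credible_set(region)
print(snps, round(mass, 3))
```

When a single SNP carries most of the posterior probability, as snpA does here, the 95% set is small and most other SNPs in the region are excluded as candidates.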
16.
Science ; 334(6062): 1518-24, 2011 Dec 16.
Article in English | MEDLINE | ID: mdl-22174245

ABSTRACT

Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.


Subjects
Data Interpretation, Statistical; Algorithms; Animals; Baseball/statistics & numerical data; Female; Gene Expression; Genes, Fungal; Genomics/methods; Humans; Intestines/microbiology; Male; Metagenome; Mice; Obesity; Saccharomyces cerevisiae/genetics
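To give a feel for the quantity behind MIC, the sketch below computes the mutual information of two variables on a grid, normalizes it by the logarithm of the smaller grid dimension, and takes the maximum over a family of grids. It is a crude stand-in rather than the published algorithm: it searches only equal-width grids, whereas MIC optimizes the grid boundaries, and the cell budget of n^0.6 is assumed here as a default. All function names are hypothetical.

```python
import math
from collections import Counter


def bin_index(value, low, high, bins):
    """Index of the equal-width bin containing value; the maximum falls in the last bin."""
    if high == low:
        return 0
    return min(int((value - low) / (high - low) * bins), bins - 1)


def normalized_grid_mi(xs, ys, x_bins, y_bins):
    """Mutual information on one equal-width grid, divided by log(min(x_bins, y_bins))."""
    n = len(xs)
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    cells = Counter(
        (bin_index(x, x_lo, x_hi, x_bins), bin_index(y, y_lo, y_hi, y_bins))
        for x, y in zip(xs, ys)
    )
    x_marg, y_marg = Counter(), Counter()
    for (i, j), count in cells.items():
        x_marg[i] += count
        y_marg[j] += count
    mi = sum(
        (count / n) * math.log(count * n / (x_marg[i] * y_marg[j]))
        for (i, j), count in cells.items()
    )
    return mi / math.log(min(x_bins, y_bins))


def crude_mic(xs, ys):
    """Maximum normalized grid MI over equal-width grids with at most n**0.6 cells."""
    max_cells = len(xs) ** 0.6  # assumed cell budget
    best = 0.0
    x_bins = 2
    while x_bins * 2 <= max_cells:
        y_bins = 2
        while x_bins * y_bins <= max_cells:
            best = max(best, normalized_grid_mi(xs, ys, x_bins, y_bins))
            y_bins += 1
        x_bins += 1
    return best


xs = list(range(100))
print(crude_mic(xs, xs), crude_mic(xs, [5] * 100))
```

A noiseless functional relationship such as y = x scores close to 1, while a constant y scores 0, matching the intended behavior of a normalized dependence measure.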
17.
Bioinformatics ; 27(15): 2156-8, 2011 Aug 01.
Article in English | MEDLINE | ID: mdl-21653522

ABSTRACT

SUMMARY: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API. AVAILABILITY: http://vcftools.sourceforge.net


Subjects
Genetic Variation; Genomics/methods; Information Storage and Retrieval/methods; Software; Alleles; Genome, Human; Genotype; Humans
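For readers unfamiliar with the format, here is a minimal sketch of reading one VCF data line, relying on the eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). It does not use VCFtools or its Perl API; the function name is hypothetical, and header lines, genotype columns and compressed input are deliberately ignored.

```python
def parse_vcf_line(line):
    """Parse the eight fixed columns of a single VCF data line into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, filt, info = fields[:8]
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            # INFO entries without "=" are flags; values are kept as strings.
            info_dict[key] = value if sep else True
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": None if var_id == "." else var_id,
        "ref": ref,
        "alt": alt.split(","),  # several alternate alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
    }


record = parse_vcf_line("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2")
print(record["chrom"], record["pos"], record["alt"], record["info"])
```

For real files, a maintained parser is preferable, since the format also defines typed INFO fields in the header and per-sample genotype columns that this sketch skips.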
18.
Blood ; 118(3): 670-4, 2011 Jul 21.
Article in English | MEDLINE | ID: mdl-21596858

ABSTRACT

Since an association between the human leukocyte antigen (HLA) region and Hodgkin lymphoma (HL) was first reported in 1967, many studies have reported associations between HL risk and both single nucleotide polymorphism (SNP) and classic HLA allele variation in the major histocompatibility complex. However, population stratification and the extent and complexity of linkage disequilibrium within the major histocompatibility complex have hindered efforts to fine-map causal signals. Using SNP data to impute alleles at classic HLA loci, we have conducted an integrated analysis of HL risk within the HLA region in 582 early-onset HL cases and 4736 controls. We confirm that the strongest signal of association comes from an SNP located in the class II region, rs6903608 (odds ratio [OR] = 1.79, P = 6.63 × 10⁻¹⁹), which is unlikely to be driven by association to HLA-DRB, DQA, or DQB alleles. In addition, we identify independent signals at rs2281389 (OR = 1.73, P = 6.31 × 10⁻¹³), a SNP that maps closely to HLA-DPB1, and the class II HLA allele DQA1*02:01 (OR = 0.56, P = 1.51 × 10⁻⁷). These data suggest that multiple independent loci within the HLA class II region contribute to the risk of developing early-onset HL.


Subjects
Chromosomes, Human, Pair 6; HLA Antigens/genetics; Hodgkin Disease/epidemiology; Hodgkin Disease/genetics; Age of Onset; Genetic Predisposition to Disease/epidemiology; Genetic Predisposition to Disease/genetics; Humans; Models, Genetic; Polymorphism, Single Nucleotide; Risk Factors
19.
Genome Biol ; 12(4): R33, 2011.
Article in English | MEDLINE | ID: mdl-21463505

ABSTRACT

BACKGROUND: The human malaria parasite Plasmodium falciparum survives pressures from the host immune system and antimalarial drugs by modifying its genome. Genetic recombination and nucleotide substitution are the two major mechanisms that the parasite employs to generate genome diversity. A better understanding of these mechanisms may provide important information for studying parasite evolution, immune evasion and drug resistance. RESULTS: Here, we used a high-density tiling array to estimate the genetic recombination rate among 32 progeny of a P. falciparum genetic cross (7G8 × GB4). We detected 638 recombination events and constructed a high-resolution genetic map. Comparing genetic and physical maps, we obtained an overall recombination rate of 9.6 kb per centimorgan and identified 54 candidate recombination hotspots. Similar to centromeres in other organisms, the sequences of P. falciparum centromeres are found in chromosome regions largely devoid of recombination activity. Motifs enriched in hotspots were also identified, including a 12-bp G/C-rich motif with 3-bp periodicity that may interact with a protein containing 11 predicted zinc finger arrays. CONCLUSIONS: These results show that the P. falciparum genome has a high recombination rate, although it also follows the overall rule of meiosis in eukaryotes with an average of approximately one crossover per chromosome per meiosis. GC-rich repetitive motifs identified in the hotspot sequences may play a role in the high recombination rate observed. The lack of recombination activity in centromeric regions is consistent with the observations of reduced recombination near the centromeres of other organisms.


Subjects
Crossing Over, Genetic; Meiosis/genetics; Plasmodium falciparum/genetics; Recombination, Genetic/genetics; Chromosome Mapping; Crosses, Genetic; Genetic Variation; Genome, Protozoan; Humans; Malaria/parasitology
20.
Science ; 331(6019): 920-4, 2011 Feb 18.
Article in English | MEDLINE | ID: mdl-21330547

ABSTRACT

Efforts to identify the genetic basis of human adaptations from polymorphism data have sought footprints of "classic selective sweeps" (in which a beneficial mutation arises and rapidly fixes in the population). Yet it remains unknown whether this form of natural selection was common in our evolution. We examined the evidence for classic sweeps in resequencing data from 179 human genomes. As expected under a recurrent-sweep model, we found that diversity levels decrease near exons and conserved noncoding regions. In contrast to expectation, however, the trough in diversity around human-specific amino acid substitutions is no more pronounced than around synonymous substitutions. Moreover, relative to the genome background, amino acid and putative regulatory sites are not significantly enriched in alleles that are highly differentiated between populations. These findings indicate that classic sweeps were not a dominant mode of human adaptation over the past ~250,000 years.


Subjects
Biological Evolution; Genetic Variation; Genome, Human; Selection, Genetic; Adaptation, Biological; Amino Acid Substitution; Chromosomes, Human, X/genetics; Conserved Sequence; Evolution, Molecular; Exons; Gene Frequency; Haplotypes; Humans; Models, Genetic; Molecular Sequence Annotation; Mutation; Polymorphism, Single Nucleotide; Recombination, Genetic; Untranslated Regions