Results 1 - 20 of 23
1.
Nature ; 2024 May 20.
Article in English | MEDLINE | ID: mdl-38768635

ABSTRACT

Rare coding variants that significantly impact function provide insights into the biology of a gene [1-3]. However, ascertaining their frequency requires large sample sizes [4-8]. Here, we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. 23% of the Regeneron Genetics Center Million Exome data (RGC-ME) comes from non-European individuals of African, East Asian, Indigenous American, Middle Eastern, and South Asian ancestry. This catalogue includes over 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss-of-function, we identify 3,988 loss-of-function intolerant genes, including 86 that were previously assessed as tolerant and 1,153 lacking established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions depleted of missense variants despite being tolerant to pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this important resource of coding variation from the RGC-ME accessible via a public variant allele frequency browser.

2.
Nature ; 599(7886): 628-634, 2021 11.
Article in English | MEDLINE | ID: mdl-34662886

ABSTRACT

A major goal in human genetics is to use natural variation to understand the phenotypic consequences of altering each protein-coding gene in the genome. Here we used exome sequencing [1] to explore protein-altering variants and their consequences in 454,787 participants in the UK Biobank study [2]. We identified 12 million coding variants, including around 1 million loss-of-function and around 1.8 million deleterious missense variants. When these were tested for association with 3,994 health-related traits, we found 564 genes with trait associations at P ≤ 2.18 × 10⁻¹¹. Rare variant associations were enriched in loci from genome-wide association studies (GWAS), but most (91%) were independent of common variant signals. We discovered several risk-increasing associations with traits related to liver disease, eye disease and cancer, among others, as well as risk-lowering associations for hypertension (SLC9A3R2), diabetes (MAP3K15, FAM234A) and asthma (SLC27A3). Six genes were associated with brain imaging phenotypes, including two involved in neural development (GBE1, PLD1). Of the signals available and powered for replication in an independent cohort, 81% were confirmed; furthermore, association signals were generally consistent across individuals of European, Asian and African ancestry. We illustrate the ability of exome sequencing to identify gene-trait associations, elucidate gene function and pinpoint effector genes that underlie GWAS signals at scale.


Subject(s)
Biological Specimen Banks , Databases, Genetic , Exome Sequencing , Exome/genetics , Africa/ethnology , Asia/ethnology , Asthma/genetics , Diabetes Mellitus/genetics , Europe/ethnology , Eye Diseases/genetics , Female , Genetic Predisposition to Disease/genetics , Genetic Variation , Genome-Wide Association Study , Humans , Hypertension/genetics , Liver Diseases/genetics , Male , Mutation , Neoplasms/genetics , Quantitative Trait, Heritable , United Kingdom
3.
Science ; 373(6550)2021 07 02.
Article in English | MEDLINE | ID: mdl-34210852

ABSTRACT

Large-scale human exome sequencing can identify rare protein-coding variants with a large impact on complex traits such as body adiposity. We sequenced the exomes of 645,626 individuals from the United Kingdom, the United States, and Mexico and estimated associations of rare coding variants with body mass index (BMI). We identified 16 genes with an exome-wide significant association with BMI, including those encoding five brain-expressed G protein-coupled receptors (CALCR, MC4R, GIPR, GPR151, and GPR75). Protein-truncating variants in GPR75 were observed in ~4/10,000 sequenced individuals and were associated with 1.8 kilograms per square meter lower BMI and 54% lower odds of obesity in the heterozygous state. Knock out of Gpr75 in mice resulted in resistance to weight gain and improved glycemic control in a high-fat diet model. Inhibition of GPR75 may provide a therapeutic strategy for obesity.


Subject(s)
Body Mass Index , Exome/genetics , Obesity/genetics , Receptors, G-Protein-Coupled/genetics , Animals , Genetic Variation , Humans , Mice , Mice, Knockout , Sequence Analysis, DNA , Weight Gain/genetics
4.
Nat Genet ; 53(7): 942-948, 2021 07.
Article in English | MEDLINE | ID: mdl-34183854

ABSTRACT

The UK Biobank Exome Sequencing Consortium (UKB-ESC) is a private-public partnership between the UK Biobank (UKB) and eight biopharmaceutical companies that will complete the sequencing of exomes for all ~500,000 UKB participants. Here, we describe the early results from ~200,000 UKB participants and the features of this project that enabled its success. The biopharmaceutical industry has increasingly used human genetics to improve success in drug discovery. Recognizing the need for large-scale human genetics data, as well as the unique value of the data access and contribution terms of the UKB, the UKB-ESC was formed. As a result, exome data from 200,643 UKB enrollees are now available. These data include ~10 million exonic variants, a rich resource of rare coding variation that is particularly valuable for drug discovery. The UKB-ESC precompetitive collaboration has further strengthened academic and industry ties and has provided teams with an opportunity to interact with and learn from the wider research community.


Subject(s)
Biological Specimen Banks , Drug Discovery , Exome Sequencing , Human Genetics , Research , Drug Discovery/methods , Genomics/methods , Humans , United Kingdom
5.
Clin Infect Dis ; 73(11): e4400-e4408, 2021 12 06.
Article in English | MEDLINE | ID: mdl-32897368

ABSTRACT

BACKGROUND: Respiratory syncytial virus (RSV) is a major cause of childhood medically attended respiratory infection (MARI). METHODS: We conducted a randomized, double-blind, placebo-controlled phase 3 trial in 1154 preterm infants of 1 or 2 doses of suptavumab, a human monoclonal antibody that can bind and block a conserved epitope on RSV A and B subtypes, for the prevention of RSV MARI. The primary endpoint was proportion of subjects with RSV-confirmed hospitalizations or outpatient lower respiratory tract infection (LRTI). RESULTS: There were no significant differences between primary endpoint rates (8.1%, placebo; 7.7%, 1-dose; 9.3%, 2-dose). Suptavumab prevented RSV A infections (relative risks, .38; 95% confidence interval [CI], .14-1.05 in the 1-dose group and .39 [95% CI, .14-1.07] in the 2-dose group; nominal significance of combined suptavumab group vs placebo; P = .0499), while increasing the rate of RSV B infections (relative risk 1.36 [95% CI, .73-2.56] in the 1-dose group and 1.69 [95% CI, .92-3.08] in the 2-dose group; nominal significance of combined suptavumab group vs placebo; P = .12). Sequenced RSV isolates demonstrated no suptavumab epitope changes in RSV A isolates, while all RSV B isolates had 2-amino acid substitution in the suptavumab epitope that led to loss of neutralization activity. Treatment emergent adverse events were balanced across treatment groups. CONCLUSIONS: Suptavumab did not reduce overall RSV hospitalizations or outpatient LRTI because of a newly circulating mutant strain of RSV B. Genetic variation in circulating RSV strains will continue to challenge prevention efforts. CLINICAL TRIALS REGISTRATION: NCT02325791.


Subject(s)
Respiratory Syncytial Virus Infections , Respiratory Syncytial Virus, Human , Antibodies, Monoclonal/therapeutic use , Antiviral Agents , Humans , Infant , Infant, Newborn , Infant, Premature , Respiratory Syncytial Virus Infections/drug therapy , Respiratory Syncytial Virus Infections/prevention & control
6.
Nature ; 586(7831): 749-756, 2020 10.
Article in English | MEDLINE | ID: mdl-33087929

ABSTRACT

The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world [1]. Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.


Subject(s)
Databases, Genetic , Exome Sequencing , Exome/genetics , Loss of Function Mutation/genetics , Phenotype , Aged , Bone Density/genetics , Collagen Type VI/genetics , Demography , Female , Genes, BRCA1 , Genes, BRCA2 , Genotype , Humans , Ion Channels/genetics , Male , Middle Aged , Neoplasms/genetics , Penetrance , Peptide Fragments/genetics , United Kingdom , Varicose Veins/genetics , ras GTPase-Activating Proteins/genetics
7.
Am J Hum Genet ; 102(5): 874-889, 2018 05 03.
Article in English | MEDLINE | ID: mdl-29727688

ABSTRACT

Large-scale human genetics studies are ascertaining increasing proportions of populations as they continue growing in both number and scale. As a result, the amount of cryptic relatedness within these study cohorts is growing rapidly and has significant implications on downstream analyses. We demonstrate this growth empirically among the first 92,455 exomes from the DiscovEHR cohort and, via a custom simulation framework we developed called SimProgeny, show that these measures are in line with expectations given the underlying population and ascertainment approach. For example, within DiscovEHR we identified ∼66,000 close (first- and second-degree) relationships, involving 55.6% of study participants. Our simulation results project that >70% of the cohort will be involved in these close relationships, given that DiscovEHR scales to 250,000 recruited individuals. We reconstructed 12,574 pedigrees by using these relationships (including 2,192 nuclear families) and leveraged them for multiple applications. The pedigrees substantially improved the phasing accuracy of 20,947 rare, deleterious compound heterozygous mutations. Reconstructed nuclear families were critical for identifying 3,415 de novo mutations in ∼1,783 genes. Finally, we demonstrate the segregation of known and suspected disease-causing mutations, including a tandem duplication that occurs in LDLR and causes familial hypercholesterolemia, through reconstructed pedigrees. In summary, this work highlights the prevalence of cryptic relatedness expected among large healthcare population-genomic studies and demonstrates several analyses that are uniquely enabled by large amounts of cryptic relatedness.


Subject(s)
Exome/genetics , Precision Medicine , Cohort Studies , Computer Simulation , Electronic Health Records , Exons/genetics , Family , Female , Genetics, Population , Geography , Heterozygote , Humans , Male , Mutation/genetics , Pedigree , Phenotype , Reproducibility of Results
8.
JAMA ; 312(18): 1870-9, 2014 Nov 12.
Article in English | MEDLINE | ID: mdl-25326635

ABSTRACT

IMPORTANCE: Clinical whole-exome sequencing is increasingly used for diagnostic evaluation of patients with suspected genetic disorders. OBJECTIVE: To perform clinical whole-exome sequencing and report (1) the rate of molecular diagnosis among phenotypic groups, (2) the spectrum of genetic alterations contributing to disease, and (3) the prevalence of medically actionable incidental findings such as FBN1 mutations causing Marfan syndrome. DESIGN, SETTING, AND PATIENTS: Observational study of 2000 consecutive patients with clinical whole-exome sequencing analyzed between June 2012 and August 2014. Whole-exome sequencing tests were performed at a clinical genetics laboratory in the United States. Results were reported by clinical molecular geneticists certified by the American Board of Medical Genetics and Genomics. Tests were ordered by the patient's physician. The patients were primarily pediatric (1756 [88%]; mean age, 6 years; 888 females [44%], 1101 males [55%], and 11 fetuses [1% gender unknown]), demonstrating diverse clinical manifestations most often including nervous system dysfunction such as developmental delay. MAIN OUTCOMES AND MEASURES: Whole-exome sequencing diagnosis rate overall and by phenotypic category, mode of inheritance, spectrum of genetic events, and reporting of incidental findings. RESULTS: A molecular diagnosis was reported for 504 patients (25.2%) with 58% of the diagnostic mutations not previously reported. Molecular diagnosis rates for each phenotypic category were 143/526 (27.2%; 95% CI, 23.5%-31.2%) for the neurological group, 282/1147 (24.6%; 95% CI, 22.1%-27.2%) for the neurological plus other organ systems group, 30/83 (36.1%; 95% CI, 26.1%-47.5%) for the specific neurological group, and 49/244 (20.1%; 95% CI, 15.6%-25.8%) for the nonneurological group. 
The Mendelian disease patterns of the 527 molecular diagnoses included 280 (53.1%) autosomal dominant, 181 (34.3%) autosomal recessive (including 5 with uniparental disomy), 65 (12.3%) X-linked, and 1 (0.2%) mitochondrial. Of 504 patients with a molecular diagnosis, 23 (4.6%) had blended phenotypes resulting from 2 single gene defects. About 30% of the positive cases harbored mutations in disease genes reported since 2011. There were 95 medically actionable incidental findings in genes unrelated to the phenotype but with immediate implications for management in 92 patients (4.6%), including 59 patients (3%) with mutations in genes recommended for reporting by the American College of Medical Genetics and Genomics. CONCLUSIONS AND RELEVANCE: Whole-exome sequencing provided a potential molecular diagnosis for 25% of a large cohort of patients referred for evaluation of suspected genetic conditions, including detection of rare genetic events and new mutations contributing to disease. The yield of whole-exome sequencing may offer advantages over traditional molecular diagnostic approaches in certain patients.
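
The percentages in this abstract can be recomputed from the counts it reports. The following is a minimal sketch for readers who want to check the arithmetic; it is not part of the published abstract, and the names `groups`, `inheritance`, and `rate_percent` are illustrative choices rather than anything used by the authors. It does not attempt to reproduce the confidence intervals, since the abstract does not state which interval method was used.

```python
# Counts copied from the abstract: (patients with a molecular diagnosis, patients in group).
groups = {
    "neurological": (143, 526),
    "neurological plus other organ systems": (282, 1147),
    "specific neurological": (30, 83),
    "nonneurological": (49, 244),
}

# Inheritance patterns of the 527 molecular diagnoses, also from the abstract.
inheritance = {
    "autosomal dominant": 280,
    "autosomal recessive": 181,
    "X-linked": 65,
    "mitochondrial": 1,
}


def rate_percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Per-group diagnosis rates.
rates = {name: rate_percent(d, n) for name, (d, n) in groups.items()}

# The four groups should add up to the full cohort and the overall diagnosis count.
total_diagnosed = sum(d for d, _ in groups.values())
total_patients = sum(n for _, n in groups.values())
overall = rate_percent(total_diagnosed, total_patients)

# Share of each inheritance pattern among all molecular diagnoses.
n_diagnoses = sum(inheritance.values())
shares = {name: rate_percent(k, n_diagnoses) for name, k in inheritance.items()}

for name, value in rates.items():
    print(f"{name}: {value}%")
print(f"overall: {total_diagnosed}/{total_patients} = {overall}%")
for name, value in shares.items():
    print(f"{name}: {value}% of {n_diagnoses} diagnoses")
```

Under these inputs the four phenotypic groups sum to the 2000 patients and 504 diagnosed patients stated in the abstract, and the inheritance counts sum to the 527 molecular diagnoses (more than 504 because 23 patients carried two diagnoses).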


Subject(s)
Exome , Genetic Diseases, Inborn/diagnosis , Molecular Diagnostic Techniques , Sequence Analysis, DNA/methods , Adolescent , Adult , Child , Child, Preschool , Female , Fetus , Genetic Testing , Genomics , Humans , Incidental Findings , Infant , Infant, Newborn , Male , Mutation , Phenotype , Referral and Consultation
9.
Genome Med ; 5(6): 57, 2013.
Article in English | MEDLINE | ID: mdl-23806086

ABSTRACT

BACKGROUND: The debate regarding the relative merits of whole genome sequencing (WGS) versus exome sequencing (ES) centers around comparative cost, average depth of coverage for each interrogated base, and their relative efficiency in the identification of medically actionable variants from the myriad of variants identified by each approach. Nevertheless, few genomes have been subjected to both WGS and ES, using multiple next generation sequencing platforms. In addition, no personal genome has been so extensively analyzed using DNA derived from peripheral blood as opposed to DNA from transformed cell lines that may either accumulate mutations during propagation or clonally expand mosaic variants during cell transformation and propagation. METHODS: We investigated a genome that was studied previously by SOLiD chemistry using both ES and WGS, and now perform six independent ES assays (Illumina GAII (x2), Illumina HiSeq (x2), Life Technologies' Personal Genome Machine (PGM) and Proton), and one additional WGS (Illumina HiSeq). RESULTS: We compared the variants identified by the different methods and provide insights into the differences among variants identified between ES runs in the same technology platform and among different sequencing technologies. We resolved the true genotypes of medically actionable variants identified in the proband through orthogonal experimental approaches. Furthermore, ES identified an additional SH3TC2 variant (p.M1?) that likely contributes to the phenotype in the proband. CONCLUSIONS: ES identified additional medically actionable variant calls and helped resolve ambiguous single nucleotide variants (SNV) documenting the power of increased depth of coverage of the captured targeted regions. Comparative analyses of WGS and ES reveal that pseudogenes and segmental duplications may explain some instances of apparent disease mutations in unaffected individuals.

10.
PLoS Genet ; 9(4): e1003443, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23593035

ABSTRACT

We report on results from whole-exome sequencing (WES) of 1,039 subjects diagnosed with autism spectrum disorders (ASD) and 870 controls selected from the NIMH repository to be of similar ancestry to cases. The WES data came from two centers using different methods to produce sequence and to call variants from it. Therefore, an initial goal was to ensure the distribution of rare variation was similar for data from different centers. This proved straightforward by filtering called variants by fraction of missing data, read depth, and balance of alternative to reference reads. Results were evaluated using seven samples sequenced at both centers and by results from the association study. Next we addressed how the data and/or results from the centers should be combined. Gene-based analyses of association was an obvious choice, but should statistics for association be combined across centers (meta-analysis) or should data be combined and then analyzed (mega-analysis)? Because of the nature of many gene-based tests, we showed by theory and simulations that mega-analysis has better power than meta-analysis. Finally, before analyzing the data for association, we explored the impact of population structure on rare variant analysis in these data. Like other recent studies, we found evidence that population structure can confound case-control studies by the clustering of rare variants in ancestry space; yet, unlike some recent studies, for these data we found that principal component-based analyses were sufficient to control for ancestry and produce test statistics with appropriate distributions. After using a variety of gene-based tests and both meta- and mega-analysis, we found no new risk genes for ASD in this sample. Our results suggest that standard gene-based tests will require much larger samples of cases and controls before being effective for gene discovery, even for a disorder like ASD.


Subject(s)
Child Development Disorders, Pervasive/genetics , Exome , Genome-Wide Association Study , Case-Control Studies , Child , Child Development Disorders, Pervasive/physiopathology , Genetic Predisposition to Disease , Genetic Variation , Humans , Population Control , Sequence Analysis, DNA , Software
11.
Neuron ; 77(2): 235-42, 2013 Jan 23.
Article in English | MEDLINE | ID: mdl-23352160

ABSTRACT

To characterize the role of rare complete human knockouts in autism spectrum disorders (ASDs), we identify genes with homozygous or compound heterozygous loss-of-function (LoF) variants (defined as nonsense and essential splice sites) from exome sequencing of 933 cases and 869 controls. We identify a 2-fold increase in complete knockouts of autosomal genes with low rates of LoF variation (≤ 5% frequency) in cases and estimate a 3% contribution to ASD risk by these events, confirming this observation in an independent set of 563 probands and 4,605 controls. Outside the pseudoautosomal regions on the X chromosome, we similarly observe a significant 1.5-fold increase in rare hemizygous knockouts in males, contributing to another 2% of ASDs in males. Taken together, these results provide compelling evidence that rare autosomal and X chromosome complete gene knockouts are important inherited risk factors for ASD.


Subject(s)
Child Development Disorders, Pervasive/diagnosis , Child Development Disorders, Pervasive/genetics , Demography/methods , Gene Deletion , Loss of Heterozygosity/genetics , Case-Control Studies , Child Development Disorders, Pervasive/epidemiology , Child, Preschool , Chromosomes, Human, X/genetics , Female , Genetic Variation/genetics , Homozygote , Humans , Linkage Disequilibrium/genetics , Male , Risk Factors
12.
Physiol Genomics ; 43(18): 1029-37, 2011 Sep 22.
Article in English | MEDLINE | ID: mdl-21771880

ABSTRACT

Our objective was to resequence insulin receptor substrate 2 (IRS2) to identify variants associated with obesity- and diabetes-related traits in Hispanic children. Exonic and intronic segments, 5' and 3' flanking regions of IRS2 (∼14.5 kb), were bidirectionally sequenced for single nucleotide polymorphism (SNP) discovery in 934 Hispanic children using 3730XL DNA Sequencers. Additionally, 15 SNPs derived from Illumina HumanOmni1-Quad BeadChips were analyzed. Measured genotype analysis tested associations between SNPs and obesity and diabetes-related traits. Bayesian quantitative trait nucleotide analysis was used to statistically infer the most likely functional polymorphisms. A total of 140 SNPs were identified with minor allele frequencies (MAF) ranging from 0.001 to 0.47. Forty-two of the 70 coding SNPs result in nonsynonymous amino acid substitutions relative to the consensus sequence; 28 SNPs were detected in the promoter, 12 in introns, 28 in the 3'-UTR, and 2 in the 5'-UTR. Two insertion/deletions (indels) were detected. Ten independent rare SNPs (MAF = 0.001-0.009) were associated with obesity-related traits (P = 0.01-0.00002). SNP 10510452_139 in the promoter region was shown to have a high posterior probability (P = 0.77-0.86) of influencing BMI, fat mass, and waist circumference in Hispanic children. SNP 10510452_139 contributed between 2 and 4% of the population variance in body weight and composition. None of the SNPs or indels were associated with diabetes-related traits or accounted for a previously identified quantitative trait locus on chromosome 13 for fasting serum glucose. Rare but not common IRS2 variants may play a role in the regulation of body weight but not an essential role in fasting glucose homeostasis in Hispanic children.


Subject(s)
Fasting/metabolism , Glucose/metabolism , Hispanic or Latino/genetics , Homeostasis/genetics , Insulin Receptor Substrate Proteins/genetics , Obesity/genetics , Polymorphism, Single Nucleotide/genetics , Bayes Theorem , Child , Diabetes Mellitus/genetics , Genetic Predisposition to Disease , Genotyping Techniques , Humans , Linkage Disequilibrium/genetics , Quantitative Trait, Heritable , Sequence Analysis, DNA
13.
Hum Mol Genet ; 20(17): 3366-75, 2011 Sep 01.
Article in English | MEDLINE | ID: mdl-21624971

ABSTRACT

Autism spectrum disorders (ASDs) are a heterogeneous group of neuro-developmental disorders. While significant progress has been made in the identification of genes and copy number variants associated with syndromic autism, little is known to date about the etiology of idiopathic non-syndromic autism. Sanger sequencing of 21 known autism susceptibility genes in 339 individuals with high-functioning, idiopathic ASD revealed de novo mutations in at least one of these genes in 6 of 339 probands (1.8%). Additionally, multiple events of oligogenic heterozygosity were seen, affecting 23 of 339 probands (6.8%). Screening of a control population for novel coding variants in CACNA1C, CDKL5, HOXA1, SHANK3, TSC1, TSC2 and UBE3A by the same sequencing technology revealed that controls were carriers of oligogenic heterozygous events at significantly (P < 0.01) lower rate, suggesting oligogenic heterozygosity as a new potential mechanism in the pathogenesis of ASDs.


Subject(s)
Child Development Disorders, Pervasive/genetics , Adolescent , Child , Child, Preschool , Genetic Predisposition to Disease/genetics , Heterozygote , Humans , Male , Mutation
14.
Nat Commun ; 1: 131, 2010 Nov 30.
Article in English | MEDLINE | ID: mdl-21119644

ABSTRACT

Accurately determining the distribution of rare variants is an important goal of human genetics, but resequencing of a sample large enough for this purpose has been unfeasible until now. Here, we applied Sanger sequencing of genomic PCR amplicons to resequence the diabetes-associated genes KCNJ11 and HHEX in 13,715 people (10,422 European Americans and 3,293 African Americans) and validated amplicons potentially harbouring rare variants using 454 pyrosequencing. We observed far more variation (expected variant-site count ∼578) than would have been predicted on the basis of earlier surveys, which could only capture the distribution of common variants. By comparison with earlier estimates based on common variants, our model shows a clear genetic signal of accelerating population growth, suggesting that humanity harbours a myriad of rare, deleterious variants, and that disease risk and the burden of disease in contemporary populations may be heavily influenced by the distribution of rare variants.

15.
Nature ; 467(7311): 52-8, 2010 Sep 02.
Article in English | MEDLINE | ID: mdl-20811451

ABSTRACT

Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called 'HapMap 3', includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of

Subject(s)
DNA Copy Number Variations , Genome, Human , Polymorphism, Single Nucleotide , Population Groups/genetics , Human Genome Project , Humans
16.
Nature ; 455(7216): 1069-75, 2008 Oct 23.
Article in English | MEDLINE | ID: mdl-18948947

ABSTRACT

Determining the genetic basis of cancer requires comprehensive analyses of large collections of histopathologically well-classified primary tumours. Here we report the results of a collaborative study to discover somatic mutations in 188 human lung adenocarcinomas. DNA sequencing of 623 genes with known or potential relationships to cancer revealed more than 1,000 somatic mutations across the samples. Our analysis identified 26 genes that are mutated at significantly high frequencies and thus are probably involved in carcinogenesis. The frequently mutated genes include tyrosine kinases, among them the EGFR homologue ERBB4; multiple ephrin receptor genes, notably EPHA3; vascular endothelial growth factor receptor KDR; and NTRK genes. These data provide evidence of somatic mutations in primary lung adenocarcinoma for several tumour suppressor genes involved in other cancers (including NF1, APC, RB1 and ATM) and for sequence changes in PTPRD as well as the frequently deleted gene LRP1B. The observed mutational profiles correlate with clinical features, smoking status and DNA repair defects. These results are reinforced by data integration including single nucleotide polymorphism array and gene expression array. Our findings shed further light on several important signalling pathways involved in lung adenocarcinoma, and suggest new molecular targets for treatment.


Subject(s)
Adenocarcinoma, Bronchiolo-Alveolar/genetics , Lung Neoplasms/genetics , Mutation/genetics , Female , Gene Dosage , Gene Expression Regulation, Neoplastic , Genes, Tumor Suppressor , Humans , Male , Proto-Oncogenes/genetics
17.
BMC Microbiol ; 7: 99, 2007 Nov 06.
Article in English | MEDLINE | ID: mdl-17986343

ABSTRACT

BACKGROUND: Community acquired (CA) methicillin-resistant Staphylococcus aureus (MRSA) increasingly causes disease worldwide. USA300 has emerged as the predominant clone causing superficial and invasive infections in children and adults in the USA. Epidemiological studies suggest that USA300 is more virulent than other CA-MRSA. The genetic determinants that render virulence and dominance to USA300 remain unclear. RESULTS: We sequenced the genomes of two pediatric USA300 isolates: one CA-MRSA and one CA-methicillin susceptible (MSSA), isolated at Texas Children's Hospital in Houston. DNA sequencing was performed by Sanger dideoxy whole genome shotgun (WGS) and 454 Life Sciences pyrosequencing strategies. The sequence of the USA300 MRSA strain was rigorously annotated. In USA300-MRSA 2658 chromosomal open reading frames were predicted and 3.1 and 27 kilobase (kb) plasmids were identified. USA300-MSSA contained a 20 kb plasmid with some homology to the 27 kb plasmid found in USA300-MRSA. Two regions found in USA300-MRSA were absent in USA300-MSSA. One of these carried the arginine deiminase operon that appears to have been acquired from S. epidermidis. The USA300 sequence was aligned with other sequenced S. aureus genomes and regions unique to USA300 MRSA were identified. CONCLUSION: USA300-MRSA is highly similar to other MRSA strains based on whole genome alignments and gene content, indicating that the differences in pathogenesis are due to subtle changes rather than to large-scale acquisition of virulence factor genes. The USA300 Houston isolate differs from another sequenced USA300 strain isolate, derived from a patient in San Francisco, in plasmid content and a number of sequence polymorphisms. Such differences will provide new insights into the evolution of pathogens.


Subject(s)
Staphylococcal Infections/epidemiology , Staphylococcus aureus/genetics , Adolescent , Anti-Bacterial Agents/pharmacology , Base Sequence , Genomic Islands/genetics , Humans , Hydrolases/genetics , Methicillin Resistance , Molecular Epidemiology , Molecular Sequence Data , Open Reading Frames/genetics , Plasmids/genetics , Polymorphism, Genetic , Staphylococcus aureus/drug effects , United States/epidemiology
18.
PLoS One ; 2(9): e928, 2007 Sep 26.
Article in English | MEDLINE | ID: mdl-17895969

ABSTRACT

BACKGROUND: Bacillus spores are notoriously resistant to unfavorable conditions such as UV radiation, gamma-radiation, H2O2, desiccation, chemical disinfection, or starvation. Bacillus pumilus SAFR-032 survives standard decontamination procedures of the Jet Propulsion Lab spacecraft assembly facility, and both spores and vegetative cells of this strain exhibit elevated resistance to UV radiation and H2O2 compared to other Bacillus species. PRINCIPAL FINDINGS: The genome of B. pumilus SAFR-032 was sequenced and annotated. Lists of genes relevant to DNA repair and the oxidative stress response were generated and compared to B. subtilis and B. licheniformis. Differences in conservation of genes, gene order, and protein sequences are highlighted because they potentially explain the extreme resistance phenotype of B. pumilus. The B. pumilus genome includes genes not found in B. subtilis or B. licheniformis and conserved genes with sequence divergence, but paradoxically lacks several genes that function in UV or H2O2 resistance in other Bacillus species. SIGNIFICANCE: This study identifies several candidate genes for further research into UV and H2O2 resistance. These findings will help explain the resistance of B. pumilus and are applicable to understanding sterilization survival strategies of microbes.


Subject(s)
Bacillus/genetics , DNA Repair , Drug Resistance, Bacterial/genetics , Hydrogen Peroxide/pharmacology , Bacillus/drug effects , Bacillus/radiation effects , Gamma Rays , Genes, Bacterial , Genome, Bacterial , Oxidative Stress , Sequence Analysis, DNA , Spores, Bacterial/drug effects , Spores, Bacterial/genetics , Spores, Bacterial/radiation effects , Ultraviolet Rays
19.
Science ; 316(5822): 222-34, 2007 Apr 13.
Article in English | MEDLINE | ID: mdl-17431167

ABSTRACT

The rhesus macaque (Macaca mulatta) is an abundant primate species that diverged from the ancestors of Homo sapiens about 25 million years ago. Because they are genetically and physiologically similar to humans, rhesus monkeys are the most widely used nonhuman primate in basic and applied biomedical research. We determined the genome sequence of an Indian-origin Macaca mulatta female and compared the data with chimpanzees and humans to reveal the structure of ancestral primate genomes and to identify evidence for positive selection and lineage-specific expansions and contractions of gene families. A comparison of sequences from individual animals was used to investigate their underlying genetic diversity. The complete description of the macaque genome blueprint enhances the utility of this animal model for biomedical research and improves our understanding of the basic biology of the species.


Subject(s)
Evolution, Molecular , Genome , Macaca mulatta/genetics , Animals , Biomedical Research , Female , Gene Duplication , Gene Rearrangement , Genetic Diseases, Inborn , Genetic Variation , Humans , Male , Multigene Family , Mutation , Pan troglodytes/genetics , Sequence Analysis, DNA , Species Specificity
20.
Nature ; 440(7088): 1194-8, 2006 Apr 27.
Article in English | MEDLINE | ID: mdl-16641997

ABSTRACT

After the completion of a draft human genome sequence, the International Human Genome Sequencing Consortium has proceeded to finish and annotate each of the 24 chromosomes comprising the human genome. Here we describe the sequencing and analysis of human chromosome 3, one of the largest human chromosomes. Chromosome 3 comprises just four contigs, one of which currently represents the longest unbroken stretch of finished DNA sequence known so far. The chromosome is remarkable in having the lowest rate of segmental duplication in the genome. It also includes a chemokine receptor gene cluster as well as numerous loci involved in multiple human cancers such as the gene encoding FHIT, which contains the most common constitutive fragile site in the genome, FRA3B. Using genomic sequence from chimpanzee and rhesus macaque, we were able to characterize the breakpoints defining a large pericentric inversion that occurred some time after the split of Homininae from Ponginae, and propose an evolutionary history of the inversion.


Subject(s)
Chromosomes, Human, Pair 3/genetics , Animals , Base Sequence , Chromosome Breakage/genetics , Chromosome Inversion/genetics , Contig Mapping , CpG Islands/genetics , DNA, Complementary/genetics , Evolution, Molecular , Expressed Sequence Tags , Human Genome Project , Humans , Macaca mulatta/genetics , Molecular Sequence Data , Pan troglodytes/genetics , Sequence Analysis, DNA , Synteny/genetics