1.
N Engl J Med ; 388(17): 1559-1571, 2023 Apr 27.
Article in English | MEDLINE | ID: mdl-37043637

ABSTRACT

BACKGROUND: Pediatric disorders include a range of highly penetrant, genetically heterogeneous conditions amenable to genomewide diagnostic approaches. Finding a molecular diagnosis is challenging but can have profound lifelong benefits. METHODS: We conducted a large-scale sequencing study involving more than 13,500 families with probands with severe, probably monogenic, difficult-to-diagnose developmental disorders from 24 regional genetics services in the United Kingdom and Ireland. Standardized phenotypic data were collected, and exome sequencing and microarray analyses were performed to investigate novel genetic causes. We developed an iterative variant analysis pipeline and reported candidate variants to clinical teams for validation and diagnostic interpretation to inform communication with families. Multiple regression analyses were performed to evaluate factors affecting the probability of diagnosis. RESULTS: A total of 13,449 probands were included in the analyses. On average, we reported 1.0 candidate variant per parent-offspring trio and 2.5 variants per singleton proband. Using clinical and computational approaches to variant classification, we made a diagnosis in approximately 41% of probands (5502 of 13,449). Of 3599 probands in trios who received a diagnosis by clinical assertion, approximately 76% had a pathogenic de novo variant. Another 22% of probands (2997 of 13,449) had variants of uncertain significance in genes that were strongly linked to monogenic developmental disorders. Recruitment in a parent-offspring trio had the largest effect on the probability of diagnosis (odds ratio, 4.70; 95% confidence interval [CI], 4.16 to 5.31). 
Probands were less likely to receive a diagnosis if they were born extremely prematurely (i.e., 22 to 27 weeks' gestation; odds ratio, 0.39; 95% CI, 0.22 to 0.68), had in utero exposure to antiepileptic medications (odds ratio, 0.44; 95% CI, 0.29 to 0.67), had mothers with diabetes (odds ratio, 0.52; 95% CI, 0.41 to 0.67), or were of African ancestry (odds ratio, 0.51; 95% CI, 0.31 to 0.78). CONCLUSIONS: Among probands with severe, probably monogenic, difficult-to-diagnose developmental disorders, multimodal analysis of genomewide data had good diagnostic power, even after previous attempts at diagnosis. (Funded by the Health Innovation Challenge Fund and Wellcome Sanger Institute.)
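
The proportions and the trio odds ratio quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are hypothetical, and the consistency check on the confidence interval assumes a Wald-type interval (symmetric on the log scale), which the abstract does not state.

```python
import math

# Counts quoted in the abstract.
total_probands = 13_449
diagnosed = 5_502
vus_only = 2_997


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


diagnostic_yield = share(diagnosed, total_probands)
vus_share = share(vus_only, total_probands)

# Assumption: if the 95% CI is symmetric on the log scale, the geometric
# mean of its bounds should recover the reported point estimate (4.70).
ci_low, ci_high = 4.16, 5.31
implied_odds_ratio = math.sqrt(ci_low * ci_high)

print(f"diagnosed: {diagnostic_yield:.1f}%")
print(f"variants of uncertain significance: {vus_share:.1f}%")
print(f"odds ratio implied by CI bounds: {implied_odds_ratio:.2f}")
```

The rounded shares agree with the "approximately 41%" and "22%" in the text, and the implied odds ratio is close to the reported 4.70.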


Subject(s)
Genomics , Rare Diseases , Child , Humans , Exome , Ireland/epidemiology , United Kingdom/epidemiology , Rare Diseases/diagnosis , Rare Diseases/epidemiology , Rare Diseases/genetics , Oligonucleotide Array Sequence Analysis , Genetic Association Studies , Neurodevelopmental Disorders/diagnosis , Neurodevelopmental Disorders/genetics , Congenital Abnormalities/diagnosis , Congenital Abnormalities/genetics , Growth Disorders/diagnosis , Growth Disorders/genetics , Facies , Child Behavior Disorders/diagnosis , Child Behavior Disorders/genetics , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/genetics
2.
Cell Genom ; 3(4): 100280, 2023 Apr 12.
Article in English | MEDLINE | ID: mdl-37082143

ABSTRACT

The use of induced pluripotent stem cells (iPSC) as models for development and human disease has enabled the study of otherwise inaccessible tissues. A remaining challenge in developing reliable models is our limited understanding of the factors driving irregular differentiation of iPSCs, particularly the impact of acquired somatic mutations. We leveraged data from a pooled dopaminergic neuron differentiation experiment of 238 iPSC lines profiled with single-cell RNA and whole-exome sequencing to study how somatic mutations affect differentiation outcomes. We found that deleterious somatic mutations in key developmental genes, notably the BCOR gene, are strongly associated with failure in dopaminergic neuron differentiation and a larger proliferation rate in culture. We further identified broad differences in cell type composition between incorrectly and successfully differentiating lines, as well as significant changes in gene expression contributing to the inhibition of neurogenesis. Our work calls for caution in interpreting differentiation-related phenotypes in disease-modeling experiments.

3.
Nat Genet ; 54(9): 1406-1416, 2022 09.
Article in English | MEDLINE | ID: mdl-35953586

ABSTRACT

We explored human induced pluripotent stem cells (hiPSCs) derived from different tissues to gain insights into genomic integrity at single-nucleotide resolution. We used genome sequencing data from two large hiPSC repositories involving 696 hiPSCs and daughter subclones. We find ultraviolet light (UV)-related damage in ~72% of skin fibroblast-derived hiPSCs (F-hiPSCs), occasionally resulting in substantial mutagenesis (up to 15 mutations per megabase). We demonstrate remarkable genomic heterogeneity between independent F-hiPSC clones derived during the same round of reprogramming due to oligoclonal fibroblast populations. In contrast, blood-derived hiPSCs (B-hiPSCs) had fewer mutations and no UV damage but a high prevalence of acquired BCOR mutations (26.9% of lines). We reveal strong selection pressure for BCOR mutations in F-hiPSCs and B-hiPSCs and provide evidence that they arise in vitro. Directed differentiation of hiPSCs and RNA sequencing showed that BCOR mutations have functional consequences. Our work strongly suggests that detailed nucleotide-resolution characterization is essential before using hiPSCs.


Subject(s)
Induced Pluripotent Stem Cells , Cell Differentiation/genetics , Genomics , Humans , Mutation , Nucleotides , Proto-Oncogene Proteins/genetics , Repressor Proteins/genetics
4.
Nature ; 605(7910): 503-508, 2022 05.
Article in English | MEDLINE | ID: mdl-35545669

ABSTRACT

Mutations in the germline generate all evolutionary genetic variation and are a cause of genetic disease. Parental age is the primary determinant of the number of new germline mutations in an individual's genome1,2. Here we analysed the genome-wide sequences of 21,879 families with rare genetic diseases and identified 12 individuals with a hypermutated genome with between two and seven times more de novo single-nucleotide variants than expected. In most families (9 out of 12), the excess mutations came from the father. Two families had genetic drivers of germline hypermutation, with fathers carrying damaging genetic variation in DNA-repair genes. For five of the families, paternal exposure to chemotherapeutic agents before conception was probably a key driver of hypermutation. Our results suggest that the germline is well protected from mutagenic effects, hypermutation is rare, the number of excess mutations is relatively modest and most individuals with a hypermutated genome will not have a genetic disease.


Subject(s)
Genetic Diseases, Inborn , Germ Cells , Germ-Line Mutation , Age Factors , Genetic Diseases, Inborn/genetics , Germ-Line Mutation/genetics , Humans , Male , Mutagenesis/genetics , Mutation , Parents , Polymorphism, Single Nucleotide
5.
Am J Hum Genet ; 108(11): 2186-2194, 2021 11 04.
Article in English | MEDLINE | ID: mdl-34626536

ABSTRACT

Structural variation (SV) describes a broad class of genetic variation greater than 50 bp in size. SVs can cause a wide range of genetic diseases and are prevalent in rare developmental disorders (DDs). Individuals presenting with DDs are often referred for diagnostic testing with chromosomal microarrays (CMAs) to identify large copy-number variants (CNVs) and/or with single-gene, gene-panel, or exome sequencing (ES) to identify single-nucleotide variants, small insertions/deletions, and CNVs. However, individuals with pathogenic SVs undetectable by conventional analysis often remain undiagnosed. Consequently, we have developed the tool InDelible, which interrogates short-read sequencing data for split-read clusters characteristic of SV breakpoints. We applied InDelible to 13,438 probands with severe DDs recruited as part of the Deciphering Developmental Disorders (DDD) study and discovered 63 rare, damaging variants in genes previously associated with DDs missed by standard SNV, indel, or CNV discovery approaches. Clinical review of these 63 variants determined that about half (30/63) were plausibly pathogenic. InDelible was particularly effective at ascertaining variants between 21 and 500 bp in size and increased the total number of potentially pathogenic variants identified by DDD in this size range by 42.9%. Of particular interest were seven confirmed de novo variants in MECP2, which represent 35.0% of all de novo protein-truncating variants in MECP2 among DDD study participants. InDelible provides a framework for the discovery of pathogenic SVs that are most likely missed by standard analytical workflows and has the potential to improve the diagnostic yield of ES across a broad range of genetic diseases.
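
The idea of looking for split-read clusters at SV breakpoints can be illustrated with a toy example. This is a minimal sketch, not InDelible's implementation: the function names, the `min_clip` and `min_support` thresholds, and the example reads are all hypothetical. It only assumes the standard SAM conventions that `POS` is the 1-based leftmost aligned base and that `S` in a CIGAR string marks soft-clipped bases.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def clip_breakpoints(pos, cigar, min_clip=5):
    """Return reference coordinates at which a read is soft-clipped."""
    # Hard clips carry no sequence, so they are ignored here.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    breakpoints = []
    if ops and ops[0][1] == "S" and ops[0][0] >= min_clip:
        breakpoints.append(pos)  # clip on the left edge of the alignment
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        breakpoints.append(pos + ref_len)  # first base past the aligned part
    return breakpoints


def split_read_clusters(reads, min_support=3):
    """Count clipped reads per coordinate; keep well-supported positions."""
    counts = Counter()
    for pos, cigar in reads:
        counts.update(clip_breakpoints(pos, cigar))
    return {bp: n for bp, n in counts.items() if n >= min_support}


# Hypothetical (POS, CIGAR) pairs: four reads are clipped at position 150.
example_reads = [
    (100, "50M25S"),
    (110, "40M35S"),
    (120, "30M45S"),
    (150, "20S55M"),
    (300, "75M"),
]
clusters = split_read_clusters(example_reads)
print(clusters)
```

A real tool would also need to handle strand, mapping quality, and the sequence content of the clipped bases, none of which is modelled here.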


Subject(s)
Developmental Disabilities/diagnosis , Developmental Disabilities/genetics , Exome Sequencing/methods , Child , Female , Humans , Male , Methyl-CpG-Binding Protein 2/genetics
6.
Am J Hum Genet ; 108(6): 1083-1094, 2021 06 03.
Article in English | MEDLINE | ID: mdl-34022131

ABSTRACT

Clinical genetic testing of protein-coding regions identifies a likely causative variant in only around half of developmental disorder (DD) cases. The contribution of regulatory variation in non-coding regions to rare disease, including DD, remains very poorly understood. We screened 9,858 probands from the Deciphering Developmental Disorders (DDD) study for de novo mutations in the 5' untranslated regions (5' UTRs) of genes within which variants have previously been shown to cause DD through a dominant haploinsufficient mechanism. We identified four single-nucleotide variants and two copy-number variants upstream of MEF2C in a total of ten individual probands. We developed multiple bespoke and orthogonal experimental approaches to demonstrate that these variants cause DD through three distinct loss-of-function mechanisms, disrupting transcription, translation, and/or protein function. These non-coding region variants represent 23% of likely diagnoses identified in MEF2C in the DDD cohort, but these would all be missed in standard clinical genetics approaches. Nonetheless, these variants are readily detectable in exome sequence data, with 30.7% of 5' UTR bases across all genes well covered in the DDD dataset. Our analyses show that non-coding variants upstream of genes within which coding variants are known to cause DD are an important cause of severe disease and demonstrate that analyzing 5' UTRs can increase diagnostic yield. We also show how non-coding variants can help inform both the disease-causing mechanism underlying protein-coding variants and dosage tolerance of the gene.


Subject(s)
5' Untranslated Regions , Developmental Disabilities/etiology , Genetic Predisposition to Disease , Loss of Function Mutation , Child , Cohort Studies , DNA Copy Number Variations , Developmental Disabilities/pathology , Humans , MEF2 Transcription Factors/genetics , Exome Sequencing
7.
Gigascience ; 10(2)2021 02 16.
Article in English | MEDLINE | ID: mdl-33594436

ABSTRACT

BACKGROUND: Since the original publication of the VCF and SAM formats, an explosion of software tools has been created to process these data files. To facilitate this, a library was produced out of the original SAMtools implementation, with a focus on performance and robustness. The file formats themselves have become international standards under the jurisdiction of the Global Alliance for Genomics and Health. FINDINGS: We present a software library for providing programmatic access to sequencing alignment and variant formats. It was born out of the widely used SAMtools and BCFtools applications. Considerable improvements have been made to the original code plus many new features including newer access protocols, the addition of the CRAM file format, better indexing and iterators, and better use of threading. CONCLUSION: Since the original Samtools release, performance has been considerably improved, with a BAM read-write loop running 5 times faster and BAM to SAM conversion 13 times faster (both using 16 threads, compared to Samtools 0.1.19). Widespread adoption has seen HTSlib downloaded >1 million times from GitHub and conda. The C library has been used directly by an estimated 900 GitHub projects and has been incorporated into Perl, Python, Rust, and R, significantly expanding the number of uses via other languages. HTSlib is open source and is freely available from htslib.org under MIT/BSD license.


Subject(s)
High-Throughput Nucleotide Sequencing , Reading , Sequence Alignment , Software , Writing
8.
Gigascience ; 10(2)2021 02 16.
Article in English | MEDLINE | ID: mdl-33590861

ABSTRACT

BACKGROUND: SAMtools and BCFtools are widely used programs for processing and analysing high-throughput sequencing data. They include tools for file format conversion and manipulation, sorting, querying, statistics, variant calling, and effect analysis amongst other methods. FINDINGS: The first version appeared online 12 years ago and has been maintained and further developed ever since, with many new features and improvements added over the years. The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines. CONCLUSION: Both SAMtools and BCFtools are freely available on GitHub under the permissive MIT licence, free for both non-commercial and commercial use. Both packages have been installed >1 million times via Bioconda. The source code and documentation are available from https://www.htslib.org.


Subject(s)
High-Throughput Nucleotide Sequencing , Software , Genome , Genomics
9.
Nature ; 586(7831): 757-762, 2020 10.
Article in English | MEDLINE | ID: mdl-33057194

ABSTRACT

De novo mutations in protein-coding genes are a well-established cause of developmental disorders1. However, genes known to be associated with developmental disorders account for only a minority of the observed excess of such de novo mutations1,2. Here, to identify previously undescribed genes associated with developmental disorders, we integrate healthcare and research exome-sequence data from 31,058 parent-offspring trios of individuals with developmental disorders, and develop a simulation-based statistical test to identify gene-specific enrichment of de novo mutations. We identified 285 genes that were significantly associated with developmental disorders, including 28 that had not previously been robustly associated with developmental disorders. Although we detected more genes associated with developmental disorders, much of the excess of de novo mutations in protein-coding genes remains unaccounted for. Modelling suggests that more than 1,000 genes associated with developmental disorders have not yet been described, many of which are likely to be less penetrant than the currently known genes. Research access to clinical diagnostic datasets will be critical for completing the map of genes associated with developmental disorders.
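
The general shape of a simulation-based enrichment test can be sketched as follows. This is not the authors' test, whose details are not given in the abstract: it assumes de novo counts per gene follow a Poisson distribution under the null, and the per-chromosome mutation rate, the observed count, and all function names are hypothetical values chosen for illustration.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulated_enrichment_p(observed, rate_per_chrom, n_trios, n_sim=20_000, seed=0):
    """Empirical p-value for seeing at least `observed` de novo mutations."""
    rng = random.Random(seed)
    # Each child carries two parental chromosomes that can acquire a mutation.
    expected = 2 * n_trios * rate_per_chrom
    hits = sum(poisson_draw(expected, rng) >= observed for _ in range(n_sim))
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sim + 1)


# Hypothetical gene: 10 observed de novo mutations in 31,058 trios,
# with an assumed null rate of 1e-5 per chromosome per generation.
p_enriched = simulated_enrichment_p(observed=10, rate_per_chrom=1e-5, n_trios=31_058)
p_null = simulated_enrichment_p(observed=0, rate_per_chrom=1e-5, n_trios=31_058)
print(p_enriched, p_null)
```

With a finite number of simulations, the smallest p-value this approach can report is 1 / (n_sim + 1), so a genome-wide significance threshold would require far more simulations or an analytic tail calculation.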


Subject(s)
DNA Mutational Analysis , Data Analysis , Databases, Genetic , Datasets as Topic , Delivery of Health Care/statistics & numerical data , Developmental Disabilities/genetics , Genetic Diseases, Inborn/genetics , Cohort Studies , DNA Copy Number Variations/genetics , Developmental Disabilities/diagnosis , Europe , Female , Genetic Diseases, Inborn/diagnosis , Germ-Line Mutation/genetics , Haploinsufficiency/genetics , Humans , Male , Mutation, Missense/genetics , Penetrance , Perinatal Death , Sample Size
10.
Science ; 367(6484)2020 03 20.
Article in English | MEDLINE | ID: mdl-32193295

ABSTRACT

Genome sequences from diverse human groups are needed to understand the structure of genetic variation in our species and the history of, and relationships between, different populations. We present 929 high-coverage genome sequences from 54 diverse human populations, 26 of which are physically phased using linked-read sequencing. Analyses of these genomes reveal an excess of previously undocumented common genetic variation private to southern Africa, central Africa, Oceania, and the Americas, but an absence of such variants fixed between major geographical regions. We also find deep and gradual population separations within Africa, contrasting population size histories between hunter-gatherer and agriculturalist groups in the past 10,000 years, and a contrast between single Neanderthal but multiple Denisovan source populations contributing to present-day human populations.


Subject(s)
Genetic Variation , Genetics, Population , Genome, Human , Whole Genome Sequencing , Africa , Americas , Animals , Asia , DNA Copy Number Variations , Haplotypes , Hominidae/genetics , Humans , INDEL Mutation , Neanderthals/genetics , Oceania , Phylogeny , Polymorphism, Single Nucleotide , Population Density , Racial Groups/genetics
11.
Nat Methods ; 17(4): 414-421, 2020 04.
Article in English | MEDLINE | ID: mdl-32203388

ABSTRACT

Bulk and single-cell DNA sequencing has enabled reconstructing clonal substructures of somatic tissues from frequency and cooccurrence patterns of somatic variants. However, approaches to characterize phenotypic variations between clones are not established. Here we present cardelino (https://github.com/single-cell-genetics/cardelino), a computational method for inferring the clonal tree configuration and the clone of origin of individual cells assayed using single-cell RNA-seq (scRNA-seq). Cardelino flexibly integrates information from imperfect clonal trees inferred based on bulk exome-seq data, and sparse variant alleles expressed in scRNA-seq data. We apply cardelino to a published cancer dataset and to newly generated matched scRNA-seq and exome-seq data from 32 human dermal fibroblast lines, identifying hundreds of differentially expressed genes between cells from different somatic clones. These genes are frequently enriched for cell cycle and proliferation pathways, indicating a role for cell division genes in somatic evolution in healthy skin.


Subject(s)
Fibroblasts/metabolism , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Software , Algorithms , Cell Cycle , Cell Proliferation , Humans , Melanoma , Mutation , Transcriptome
12.
Nat Commun ; 10(1): 4630, 2019 10 11.
Article in English | MEDLINE | ID: mdl-31604926

ABSTRACT

Mobile genetic elements (MEs) are segments of DNA which can copy themselves and other transcribed sequences through the process of retrotransposition (RT). In humans, several disorders have been attributed to RT, but the role of RT in severe developmental disorders (DD) has not yet been explored. Here we identify RT-derived events in 9738 exome sequenced trios with DD-affected probands. We ascertain 9 de novo MEs, 4 of which are likely causative of the patient's symptoms (0.04%), as well as 2 de novo gene retroduplications. Beyond identifying likely diagnostic RT events, we estimate genome-wide germline ME mutation rate and selective constraint and demonstrate that coding RT events have signatures of purifying selection equivalent to those of truncating mutations. Overall, our analysis represents a comprehensive interrogation of the impact of retrotransposition on protein coding genes and a framework for future evolutionary and disease studies.


Subject(s)
Developmental Disabilities/genetics , Genetic Variation , Retroelements/physiology , Humans , Mutation Rate , Retroelements/genetics
13.
Bioinformatics ; 35(15): 2555-2561, 2019 08 01.
Article in English | MEDLINE | ID: mdl-30576415

ABSTRACT

MOTIVATION: Very low-depth sequencing has been proposed as a cost-effective approach to capture low-frequency and rare variation in complex trait association studies. However, a full characterization of the genotype quality and association power for very low-depth sequencing designs is still lacking. RESULTS: We perform cohort-wide whole-genome sequencing (WGS) at low depth in 1239 individuals (990 at 1× depth and 249 at 4× depth) from an isolated population, and establish a robust pipeline for calling and imputing very low-depth WGS genotypes from standard bioinformatics tools. Using genotyping chip, whole-exome sequencing (75× depth) and high-depth (22×) WGS data in the same samples, we examine in detail the sensitivity of this approach, and show that imputed 1× WGS recapitulates 95.2% of variants found by imputed GWAS with an average minor allele concordance of 97% for common and low-frequency variants. In our study, 1× further allowed the discovery of 140 844 true low-frequency variants with 73% genotype concordance when compared to high-depth WGS data. Finally, using association results for 57 quantitative traits, we show that very low-depth WGS is an efficient alternative to imputed GWAS chip designs, allowing the discovery of up to twice as many true association signals as the classical imputed GWAS design. AVAILABILITY AND IMPLEMENTATION: The HELIC genotype and WGS datasets have been deposited to the European Genome-phenome Archive (https://www.ebi.ac.uk/ega/home): EGAD00010000518; EGAD00010000522; EGAD00010000610; EGAD00001001636, EGAD00001001637. The peakplotter software is available at https://github.com/wtsi-team144/peakplotter, the transformPhenotype app can be downloaded at https://github.com/wtsi-team144/transformPhenotype. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing , Polymorphism, Single Nucleotide , Genotype , Humans , Multifactorial Inheritance , Whole Genome Sequencing
14.
Nat Genet ; 50(11): 1574-1583, 2018 11.
Article in English | MEDLINE | ID: mdl-30275530

ABSTRACT

We report full-length draft de novo genome assemblies for 16 widely used inbred mouse strains and find extensive strain-specific haplotype variation. We identify and characterize 2,567 regions on the current mouse reference genome exhibiting the greatest sequence diversity. These regions are enriched for genes involved in pathogen defence and immunity and exhibit enrichment of transposable elements and signatures of recent retrotransposition events. Combinations of alleles and genes unique to an individual strain are commonly observed at these loci, reflecting distinct strain phenotypes. We used these genomes to improve the mouse reference genome, resulting in the completion of 10 new gene structures. Also, 62 new coding loci were added to the reference genome annotation. These genomes identified a large, previously unannotated, gene (Efcab3-like) encoding 5,874 amino acids. Mutant Efcab3-like mice display anomalies in multiple brain regions, suggesting a possible role for this gene in the regulation of brain development.


Subject(s)
Chromosome Mapping , Genetic Loci , Genome , Haplotypes , Mice, Inbred Strains/genetics , Animals , Animals, Laboratory , Chromosome Mapping/veterinary , Haplotypes/genetics , Mice , Mice, Inbred BALB C/genetics , Mice, Inbred C3H/genetics , Mice, Inbred C57BL/genetics , Mice, Inbred CBA/genetics , Mice, Inbred DBA/genetics , Mice, Inbred NOD/genetics , Mice, Inbred Strains/classification , Molecular Sequence Annotation , Phylogeny , Polymorphism, Single Nucleotide , Species Specificity
15.
Science ; 360(6392): 1024-1027, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29853687

ABSTRACT

Little is known regarding the first people to enter the Americas and their genetic legacy. Genomic analysis of the oldest human remains from the Americas showed a direct relationship between a Clovis-related ancestral population and all modern Central and South Americans as well as a deep split separating them from North Americans in Canada. We present 91 ancient human genomes from California and Southwestern Ontario and demonstrate the existence of two distinct ancestries in North America, which possibly split south of the ice sheets. A contribution from both of these ancestral populations is found in all modern Central and South Americans. The proportions of these two ancestries in ancient and modern populations are consistent with a coastal dispersal and multiple admixture events.


Subject(s)
Biological Evolution , Emigration and Immigration , Genome, Human , Population/genetics , California , Humans , Ontario
17.
Am J Hum Genet ; 101(2): 274-282, 2017 Aug 03.
Article in English | MEDLINE | ID: mdl-28757201

ABSTRACT

The Canaanites inhabited the Levant region during the Bronze Age and established a culture that became influential in the Near East and beyond. However, the Canaanites, unlike most other ancient Near Easterners of this period, left few surviving textual records and thus their origin and relationship to ancient and present-day populations remain unclear. In this study, we sequenced five whole genomes from ∼3,700-year-old individuals from the city of Sidon, a major Canaanite city-state on the Eastern Mediterranean coast. We also sequenced the genomes of 99 individuals from present-day Lebanon to catalog modern Levantine genetic diversity. We find that a Bronze Age Canaanite-related ancestry was widespread in the region, shared among urban populations inhabiting the coast (Sidon) and inland populations (Jordan) who likely lived in farming societies or were pastoral nomads. This Canaanite-related ancestry derived from mixture between local Neolithic populations and eastern migrants genetically related to Chalcolithic Iranians. We estimate, using linkage-disequilibrium decay patterns, that admixture occurred 6,600-3,550 years ago, coinciding with recorded massive population movements in Mesopotamia during the mid-Holocene. We show that present-day Lebanese derive most of their ancestry from a Canaanite-related population, which therefore implies substantial genetic continuity in the Levant since at least the Bronze Age. In addition, we find Eurasian ancestry in the Lebanese not present in Bronze Age or earlier Levantines. We estimate that this Eurasian ancestry arrived in the Levant around 3,750-2,170 years ago during a period of successive conquests by distant populations.


Subject(s)
DNA, Mitochondrial/genetics , Ethnicity/genetics , Genetics, Population/methods , Genome, Human/genetics , Genetic Variation/genetics , History, Ancient , Humans , Lebanon , Linkage Disequilibrium , Male , White People/genetics
19.
Am J Hum Genet ; 100(6): 865-884, 2017 Jun 01.
Article in English | MEDLINE | ID: mdl-28552196

ABSTRACT

Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the broader allelic architecture of 12 anthropometric traits associated with height, body mass, and fat distribution in up to 267,616 individuals. We report 106 genome-wide significant signals that have not been previously identified, including 9 low-frequency variants pointing to functional candidates. Of the 106 signals, 6 are in genomic regions that have not been implicated with related traits before, 28 are independent signals at previously reported regions, and 72 represent previously reported signals for a different anthropometric trait. 71% of signals reside within genes and fine mapping resolves 23 signals to one or two likely causal variants. We confirm genetic overlap between human monogenic and polygenic anthropometric traits and find signal enrichment in cis expression QTLs in relevant tissues. Our results highlight the potential of WGS strategies to enhance biologically relevant discoveries across the frequency spectrum.


Subject(s)
Anthropometry , Genome, Human , Genome-Wide Association Study , Quantitative Trait Loci/genetics , Sequence Analysis, DNA/methods , Body Height/genetics , Cohort Studies , DNA Methylation/genetics , Databases, Genetic , Female , Genetic Variation , Humans , Lipodystrophy/genetics , Male , Meta-Analysis as Topic , Obesity/genetics , Physical Chromosome Mapping , Sex Characteristics , Syndrome , United Kingdom
20.
Nature ; 546(7658): 370-375, 2017 06 15.
Article in English | MEDLINE | ID: mdl-28489815

ABSTRACT

Technology utilizing human induced pluripotent stem cells (iPS cells) has enormous potential to provide improved cellular models of human disease. However, variable genetic and phenotypic characterization of many existing iPS cell lines limits their potential use for research and therapy. Here we describe the systematic generation, genotyping and phenotyping of 711 iPS cell lines derived from 301 healthy individuals by the Human Induced Pluripotent Stem Cells Initiative. Our study outlines the major sources of genetic and phenotypic variation in iPS cells and establishes their suitability as models of complex human traits and cancer. Through genome-wide profiling we find that 5-46% of the variation in different iPS cell phenotypes, including differentiation capacity and cellular morphology, arises from differences between individuals. Additionally, we assess the phenotypic consequences of genomic copy-number alterations that are repeatedly observed in iPS cells. We also present a comprehensive map of common regulatory variants affecting the transcriptome of human pluripotent cells.


Subject(s)
Genetic Variation/genetics , Induced Pluripotent Stem Cells/metabolism , Cells, Cultured , Cellular Reprogramming/genetics , DNA Copy Number Variations/genetics , Gene Expression Regulation/genetics , Genotype , Humans , Organ Specificity , Phenotype , Quality Control , Quantitative Trait Loci/genetics , Transcriptome/genetics