1.
BMC Bioinformatics ; 22(1): 459, 2021 Sep 25.
Article in English | MEDLINE | ID: mdl-34563119

ABSTRACT

BACKGROUND: We present ARCHes, a fast and accurate haplotype-based approach for inferring an individual's ancestry composition. Our approach works by modeling haplotype diversity from a large, admixed cohort of hundreds of thousands, then annotating those models with population information from reference panels of known ancestry. RESULTS: The running time of ARCHes does not depend on the size of a reference panel because training and testing are separate processes, and the inferred population-annotated haplotype models can be written to disk and reused to label large test sets in parallel (in our experiments, it averages less than one minute to assign ancestry from 32 populations using 10 CPU). We test ARCHes on public data from the 1000 Genomes Project and the Human Genome Diversity Project (HGDP) as well as simulated examples of known admixture. CONCLUSIONS: Our results demonstrate that ARCHes outperforms RFMix at correctly assigning both global and local ancestry at finer population scales regardless of the amount of population admixture.


Subject(s)
Genetics, Population , Genome, Human , Haplotypes , Humans , Polymorphism, Single Nucleotide
2.
PLoS Genet ; 9(11): e1003925, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24244192

ABSTRACT

The Caribbean basin is home to some of the most complex interactions in recent history among previously diverged human populations. Here, we investigate the population genetic history of this region by characterizing patterns of genome-wide variation among 330 individuals from three of the Greater Antilles (Cuba, Puerto Rico, Hispaniola), two mainland (Honduras, Colombia), and three Native South American (Yukpa, Bari, and Warao) populations. We combine these data with a unique database of genomic variation in over 3,000 individuals from diverse European, African, and Native American populations. We use local ancestry inference and tract length distributions to test different demographic scenarios for the pre- and post-colonial history of the region. We develop a novel ancestry-specific PCA (ASPCA) method to reconstruct the sub-continental origin of Native American, European, and African haplotypes from admixed genomes. We find that the most likely source of the indigenous ancestry in Caribbean islanders is a Native South American component shared among inland Amazonian tribes, Central America, and the Yucatan peninsula, suggesting extensive gene flow across the Caribbean in pre-Columbian times. We find evidence of two pulses of African migration. The first pulse--which today is reflected by shorter, older ancestry tracts--consists of a genetic component more similar to coastal West African regions involved in early stages of the trans-Atlantic slave trade. The second pulse--reflected by longer, younger tracts--is more similar to present-day West-Central African populations, supporting historical records of later transatlantic deportation. Surprisingly, we also identify a Latino-specific European component that has significantly diverged from its parental Iberian source populations, presumably as a result of small European founder population size. We demonstrate that the ancestral components in admixed genomes can be traced back to distinct sub-continental source populations with far greater resolution than previously thought, even when limited pre-Columbian Caribbean haplotypes have survived.


Subject(s)
Black People/genetics , Gene Flow , Genetics, Population , Indians, North American/genetics , White People/genetics , Caribbean Region , DNA, Mitochondrial/genetics , Demography , Genomics , Haplotypes , Hispanic or Latino/genetics , Humans
3.
PLoS Genet ; 9(12): e1004023, 2013.
Article in English | MEDLINE | ID: mdl-24385924

ABSTRACT

There is great scientific and popular interest in understanding the genetic history of populations in the Americas. We wish to understand when different regions of the continent were inhabited, where settlers came from, and how current inhabitants relate genetically to earlier populations. Recent studies unraveled parts of the genetic history of the continent using genotyping arrays and uniparental markers. The 1000 Genomes Project provides a unique opportunity for improving our understanding of population genetic history by providing over a hundred sequenced low coverage genomes and exomes from Colombian (CLM), Mexican-American (MXL), and Puerto Rican (PUR) populations. Here, we explore the genomic contributions of African, European, and especially Native American ancestry to these populations. Estimated Native American ancestry is 48% in MXL, 25% in CLM, and 13% in PUR. Native American ancestry in PUR is most closely related to populations surrounding the Orinoco River basin, confirming the Southern American ancestry of the Taíno people of the Caribbean. We present new methods to estimate the allele frequencies in the Native American fraction of the populations, and model their distribution using a demographic model for three ancestral Native American populations. These ancestral populations likely split in close succession: the most likely scenario, based on a peopling of the Americas 16 thousand years ago (kya), supports that the MXL Ancestors split 12.2kya, with a subsequent split of the ancestors to CLM and PUR 11.7kya. The model also features effective populations of 62,000 in Mexico, 8,700 in Colombia, and 1,900 in Puerto Rico. Modeling Identity-by-descent (IBD) and ancestry tract length, we show that post-contact populations also differ markedly in their effective sizes and migration patterns, with Puerto Rico showing the smallest effective size and the earlier migration from Europe. Finally, we compare IBD and ancestry assignments to find evidence for relatedness among European founders to the three populations.


Subject(s)
Gene Frequency/genetics , Genetics, Population , Human Migration , Indians, North American/genetics , Black People/genetics , Chromosome Mapping , Exome , Genome, Human , Hispanic or Latino/genetics , Human Genome Project , Humans , Mexican Americans/genetics , Mexico , Puerto Rico , Racial Groups/genetics , White People/genetics
4.
PLoS Genet ; 8(1): e1002397, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22253600

ABSTRACT

North African populations are distinct from sub-Saharan Africans based on cultural, linguistic, and phenotypic attributes; however, the time and the extent of genetic divergence between populations north and south of the Sahara remain poorly understood. Here, we interrogate the multilayered history of North Africa by characterizing the effect of hypothesized migrations from the Near East, Europe, and sub-Saharan Africa on current genetic diversity. We present dense, genome-wide SNP genotyping array data (730,000 sites) from seven North African populations, spanning from Egypt to Morocco, and one Spanish population. We identify a gradient of likely autochthonous Maghrebi ancestry that increases from east to west across northern Africa; this ancestry is likely derived from "back-to-Africa" gene flow more than 12,000 years ago (ya), prior to the Holocene. The indigenous North African ancestry is more frequent in populations with historical Berber ethnicity. In most North African populations we also see substantial shared ancestry with the Near East, and to a lesser extent sub-Saharan Africa and Europe. To estimate the time of migration from sub-Saharan populations into North Africa, we implement a maximum likelihood dating method based on the distribution of migrant tracts. In order to first identify migrant tracts, we assign local ancestry to haplotypes using a novel, principal component-based analysis of three ancestral populations. We estimate that a migration of western African origin into Morocco began about 40 generations ago (approximately 1,200 ya); a migration of individuals with Nilotic ancestry into Egypt occurred about 25 generations ago (approximately 750 ya). Our genomic data reveal an extraordinarily complex history of migrations, involving at least five ancestral populations, into North Africa.
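The generation-to-year conversions quoted in the abstract above can be checked with simple arithmetic. The sketch below assumes a generation time of about 30 years; that figure is not stated in the abstract and is only inferred from its own pairs of numbers (40 generations for roughly 1,200 years, 25 generations for roughly 750 years). The function name is hypothetical and chosen for illustration.

```python
# Assumption: ~30 years per generation. This value is inferred from the
# abstract's own figures (40 generations ~ 1,200 ya; 25 generations ~ 750 ya),
# not stated explicitly by the authors.
YEARS_PER_GENERATION = 30


def generations_to_years(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert an admixture time in generations to approximate years ago."""
    return generations * years_per_generation


# Migration events and generation counts as reported in the abstract.
events = [
    ("western African origin into Morocco", 40),
    ("Nilotic ancestry into Egypt", 25),
]

for label, gens in events:
    years = generations_to_years(gens)
    print(f"{label}: {gens} generations ~ {years} years ago")
```

A different assumed generation time (for example 25 or 28 years) would shift both dates proportionally, so the year values are best read as approximate.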


Subject(s)
Black People/genetics , Gene Flow/genetics , Genetic Variation , Population Dynamics , Population , Africa South of the Sahara/ethnology , Africa, Northern , Black People/history , DNA, Mitochondrial/genetics , Egypt, Ancient , Emigration and Immigration , Europe , Gene Pool , Genomics , Genotype , Haplotypes , History, Ancient , Humans , Middle East , Morocco , Polymorphism, Single Nucleotide , White People/genetics , White People/history
5.
PLoS Genet ; 7(9): e1002280, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21935354

ABSTRACT

Whole-genome sequencing harbors unprecedented potential for characterization of individual and family genetic variation. Here, we develop a novel synthetic human reference sequence that is ethnically concordant and use it for the analysis of genomes from a nuclear family with history of familial thrombophilia. We demonstrate that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci. We infer recombination sites to the lowest median resolution demonstrated to date (< 1,000 base pairs). We use family inheritance state analysis to control sequencing error and inform family-wide haplotype phasing, allowing quantification of genome-wide compound heterozygosity. We develop a sequence-based methodology for Human Leukocyte Antigen typing that contributes to disease risk prediction. Finally, we advance methods for analysis of disease and pharmacogenomic risk across the coding and non-coding genome that incorporate phased variant data. We show these methods are capable of identifying multigenic risk for inherited thrombophilia and informing the appropriate pharmacological therapy. These ethnicity-specific, family-based approaches to interpretation of genetic variation are emblematic of the next generation of genetic risk assessment using whole-genome sequencing.


Subject(s)
DNA Mutational Analysis/methods , Genes, Synthetic , Genetic Variation , Genome-Wide Association Study/methods , Thrombophilia/genetics , Alleles , Base Sequence , Female , Genetic Predisposition to Disease , Genome, Human , Genotype , Haplotypes , Humans , Male , Pedigree , Reference Standards , Risk Assessment , Sequence Alignment , Sequence Analysis, DNA
6.
J Hum Genet ; 54(9): 547-9, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19629136

ABSTRACT

Multiple sclerosis (MS) is a complex neurological trait. Allelic variation in the MHC class II region exerts the single strongest effect on MS genetic risk. The clinical onset of the disease is extremely variable, and can range from the first to the ninth decade of life. Epidemiological studies have suggested a modest genetic component to the age of onset (AO) of MS. Previous studies have shown that HLA-DRB1*1501 may be associated with a younger AO. Here, we sought to uncover any effect of HLA-DRB1*1501 on the AO of MS in a large Canadian cohort. A total of 1816 MS patients were genotyped for HLA-DRB1. Patients carrying HLA-DRB1*1501 were shown to have a small, but significantly lower, AO than patients without the allele (P=0.03). HLA-DRB1*1501 was also shown to reduce the mean AO in both progressive and relapsing forms of the disease. An investigation of parent-of-origin effects indicated that the lower AO for HLA-DRB1*1501 patients arises from maternally transmitted HLA-DRB1*1501 haplotypes (maternal HLA-DRB1*1501 mean AO=28.4 years, paternal=30.3 years; P=0.009). HLA-DRB1*1501 exerts a modest, but significant effect on the AO of all forms of MS. Parent-of-origin effects at the MHC are further implicated in MS disease pathogenesis.


Subject(s)
HLA-DR Antigens/genetics , Haplotypes/genetics , Multiple Sclerosis/genetics , Adult , Age of Onset , Alleles , Canada , Female , Genetic Predisposition to Disease , Genotype , HLA-DR Antigens/immunology , HLA-DRB1 Chains , Humans , Male , Parents , Phenotype , Risk Factors
7.
Genome Biol ; 9(11): R165, 2008.
Article in English | MEDLINE | ID: mdl-19025653

ABSTRACT

Whole genome tiling arrays are a key tool for profiling global genetic and expression variation. In this study we present our methods for detecting transcript level variation, splicing variation and allele specific expression in Arabidopsis thaliana. We also developed a generalized hidden Markov model for profiling transcribed fragment variation de novo. Our study demonstrates that whole genome tiling arrays are a powerful platform for dissecting natural transcriptome variation at multi-dimension and high resolution.


Subject(s)
Arabidopsis/genetics , Gene Expression Profiling , Genome, Plant , Polymorphism, Genetic , Alternative Splicing , Arabidopsis/metabolism , Gene Expression Regulation, Plant , Markov Chains
8.
Proc Natl Acad Sci U S A ; 103(39): 14412-6, 2006 Sep 26.
Article in English | MEDLINE | ID: mdl-16971485

ABSTRACT

Many Saccharomyces cerevisiae duplicate genes that were derived from an ancient whole-genome duplication (WGD) unexpectedly show a small synonymous divergence (K(S)), a higher sequence similarity to each other than to orthologues in Saccharomyces bayanus, or slow evolution compared with the orthologue in Kluyveromyces waltii, a non-WGD species. This decelerated evolution was attributed to gene conversion between duplicates. Using approximately 300 WGD gene pairs in four species and their orthologues in non-WGD species, we show that codon-usage bias and protein-sequence conservation are two important causes for decelerated evolution of duplicate genes, whereas gene conversion is effective only in the presence of strong codon-usage bias or protein-sequence conservation. Furthermore, we find that change in mutation pattern or in tDNA copy number changed codon-usage bias and increased the K(S) distance between K. waltii and S. cerevisiae. Intriguingly, some proteins showed fast evolution before the radiation of WGD species but little or no sequence divergence between orthologues and paralogues thereafter, indicating that functional conservation after the radiation may also be responsible for decelerated evolution in duplicates.


Subject(s)
Codon/genetics , Gene Duplication , Phylogeny , Yeasts/genetics , DNA, Fungal/genetics , Genes, Fungal/genetics
9.
Mol Biol Evol ; 23(6): 1136-43, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16527865

ABSTRACT

In Saccharomyces, an ancient whole-genome duplication (WGD) and widespread duplicate gene deletion resulted in extensive reorganization of adjacent gene relationships. We have studied the evolution of adjacent gene pairs' identity, orientation, and spacing following whole-genome duplication and deletion (WGD-D) using comparative genomic analyses and simulations. Surveying adjacent gene organization across the Saccharomyces species complex, we find a genome-wide bias toward divergently and convergently transcribed gene pairs in all species but a reduction in this bias in the species that underwent WGD-D. Among neutral models of WGD-D, only single-gene deletion can produce the appropriate reduction in orientation bias and recapitulate the pattern of short, highly dispersed deletions we observe in Saccharomyces cerevisiae. To characterize the dynamics of WGD-D, we trace the conservation and creation of adjacent gene pairs along the S. cerevisiae lineage. We find that newly created adjacencies have a tandem orientation bias, while adjacencies conserved from prior to WGD-D have the same divergent-convergent bias as found in the species that diverged before WGD. We also find that adjacent gene pairs produced by WGD-D gained greater intergenic spacing but that this is reduced in the older adjacencies. Given this, and the preponderance of short deleted blocks, we argue that the deletion phase of WGD-D occurred primarily by small inactivating mutations followed by numerous small deletions. Newly created adjacent gene pairs also have an initial increase in mean log2 expression ratios and maximal expression levels, suggesting that increased intergenic spacing caused a genome-wide reduction in transcriptional interference.


Subject(s)
Gene Deletion , Gene Duplication , Genome, Fungal , Saccharomyces cerevisiae/genetics , Saccharomyces/genetics , Evolution, Molecular , Gene Expression , Genes, Fungal , Oligonucleotide Array Sequence Analysis
10.
Proc Natl Acad Sci U S A ; 103(7): 2232-6, 2006 Feb 14.
Article in English | MEDLINE | ID: mdl-16461903

ABSTRACT

The question of how duplicate genes are retained in a population remains controversial. The duplication-degeneration-complementation model, which involves no positive selection, stipulates a higher retention rate of duplicate genes in a small population than in a large one. This model has been accepted by many evolutionists. However, we found considerably more retentions and fewer losses of duplicate genes in the mouse genome than in the human genome, although the population size of rodents is in general larger than that of primates. Indeed, in nearly every interval of synonymous divergence between duplicate genes, the number of gene retentions in mouse is larger than that in human. Our findings suggest a more important role of positive selection in duplicate retention than duplication-degeneration-complementation. In addition, certain functional categories show a higher tendency of lineage-specific expansion than expected, suggesting lineage-specific selection or functional bias in retained duplicates.


Subject(s)
Gene Duplication , Genes, Duplicate/genetics , Genome, Human/genetics , Genome/genetics , Selection, Genetic , Animals , Cell Lineage/genetics , Humans , Mice , Models, Genetic