ABSTRACT
Since the 1960s, East African athletes, mainly from Kenya and Ethiopia, have dominated long-distance running events in both the male and female categories. Demographic studies have further shown that one ethnic group in each of these countries is overrepresented among elite endurance runners: the Kalenjin, from Kenya, and the Oromo, from Ethiopia. This raises the possibility that the dominance results from genetic and/or cultural factors. However, studies of the life histories of these athletes, or of loci previously associated with endurance athletic performance, have not produced a compelling explanation. Here, we used a population approach to identify peaks of genetic differentiation for these two ethnic groups and compared the list of genes close to these regions with a list, manually curated by us, of genes associated in GWAS with traits possibly relevant to endurance running. We found a significant enrichment in both populations (Kalenjin, P = 0.048; Oromo, P = 1.6 × 10⁻⁵). These traits are mainly related to anthropometry, the circulatory and respiratory systems, energy metabolism, and calcium homeostasis. Our results reinforce the notion that endurance running is a systemic activity with a complex genetic architecture, and they point to new candidate genes for future studies. Finally, we argue that a deterministic view of the relationship between genetics and sports must be avoided, as it is both scientifically incorrect and prone to reinforcing population (racial) stereotyping.
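Peaks of genetic differentiation are usually located by computing a differentiation statistic such as F_ST along the genome. The abstract does not name the estimator, the reference populations, or the window size, so the following is only a sketch under stated assumptions: it uses Hudson's F_ST, combined across SNPs as a ratio of averages, in non-overlapping windows of SNPs. The function names and the default window of 50 SNPs are illustrative, not taken from the study.

```python
def hudson_fst_terms(p1, p2, n1, n2):
    """Numerator and denominator of Hudson's F_ST for one biallelic SNP.

    p1, p2: sample allele frequencies in populations 1 and 2.
    n1, n2: numbers of sampled chromosomes (twice the number of diploid individuals).
    """
    # Squared frequency difference, corrected for within-sample sampling variance.
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Expected between-population heterozygosity.
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def hudson_fst(p1s, p2s, n1, n2):
    """Combine SNPs as a ratio of averages (sum of numerators / sum of denominators)."""
    terms = [hudson_fst_terms(a, b, n1, n2) for a, b in zip(p1s, p2s)]
    total_den = sum(d for _, d in terms)
    if total_den == 0:
        raise ValueError("all SNPs are monomorphic for the same allele")
    return sum(n for n, _ in terms) / total_den


def windowed_fst(p1s, p2s, n1, n2, window=50):
    """F_ST in consecutive, non-overlapping windows of `window` SNPs (hypothetical helper)."""
    return [
        hudson_fst(p1s[i:i + window], p2s[i:i + window], n1, n2)
        for i in range(0, len(p1s), window)
    ]
```

Windows whose values fall in the upper tail of the genome-wide distribution would be candidate peaks. The tail threshold is a separate, study-specific choice.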
Subject(s)
Athletic Performance , Running , Black People/genetics , Ethnicity/genetics , Female , Humans , Male , Physical Endurance/genetics
ABSTRACT
Genetic and omics analyses frequently require independent observations, which real datasets do not guarantee. When relatedness cannot be accounted for, the usual solution is to remove related individuals (or observations), which reduces the available data. We developed NAToRA, a network-based relatedness-pruning method that removes unwanted relationships from a dataset while minimizing its reduction. It uses the node degree centrality metric to identify highly connected nodes (individuals) and implements heuristics that approximate the minimal reduction of a dataset, which allows it to be applied to complex datasets. Compared with two other popular population genetics tools (PLINK and KING), NAToRA showed the best combination of removing all relatives while keeping the largest number of individuals in all datasets tested, with effects on the allele frequency spectrum and Principal Component Analysis similar to those of PLINK and KING. NAToRA is freely available both as a standalone tool that can easily be incorporated into a pipeline and as a graphical web tool that allows visualization of the relatedness networks. NAToRA also accepts a variety of relationship metrics as input, which facilitates its use. We also release the genealogy simulator used for several of the tests performed in this study.
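The core idea of degree-based pruning can be illustrated with a greedy rule: treat each related pair as an edge, repeatedly remove the individual with the most remaining relatives, and stop when no edges are left. This is a common approximation to a minimum vertex cover and is only a sketch; NAToRA's own heuristics are not specified in the abstract and may differ. The relatedness cutoff that defines the edges, the function name, and the tie-breaking rule are assumptions.

```python
from collections import defaultdict


def prune_related(pairs):
    """Greedy degree-based pruning of a relatedness network (illustrative sketch).

    pairs: iterable of (id_a, id_b) for pairs whose relatedness exceeds a chosen cutoff.
    Returns the set of individuals to remove so that no related pair remains.
    """
    graph = defaultdict(set)
    for a, b in pairs:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)

    removed = set()
    while True:
        # Degree centrality of every node that still has at least one relative.
        degrees = {node: len(nbrs) for node, nbrs in graph.items() if nbrs}
        if not degrees:
            break
        # Highest degree first; ties broken by the lexicographically smallest ID.
        target = min(degrees, key=lambda n: (-degrees[n], str(n)))
        for nbr in graph[target]:
            graph[nbr].discard(target)
        graph[target].clear()
        removed.add(target)
    return removed
```

For the pairs A-B, A-C, A-D, and E-F, the sketch removes only A and E and keeps B, C, D, and F: removing the hub A resolves three relationships at once, which is why highly connected nodes are prioritized.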
ABSTRACT
Pyruvate kinase (PK), encoded by the PKLR gene, is a key enzyme in glycolysis that controls the integrity of erythrocytes. Owing to Plasmodium selection, mutations causing PK deficiency, which leads to hemolytic anemia, are associated with resistance to malaria in sub-Saharan Africa and with susceptibility to intracellular pathogens in experimental models. In this case-control study, we enrolled 4,555 individuals and investigated whether PKLR single nucleotide polymorphisms (SNPs) putatively selected for malaria resistance are associated with susceptibility to leprosy across Brazil (Manaus, North; Salvador, Northeast; Rondonópolis, Midwest; and Rio de Janeiro, Southeast) and to tuberculosis in Mozambique. Haplotype T/G/G (rs1052176/rs4971072/rs11264359) was associated with leprosy susceptibility in Rio de Janeiro (OR = 2.46, p = 0.00001) and Salvador (OR = 1.57, p = 0.04), and with tuberculosis in Mozambique (OR = 1.52, p = 0.07). According to GTEx, this haplotype downregulates PKLR expression in nerve and skin, and it might subtly modulate serum ferritin and haptoglobin levels. Furthermore, we observed genetic signatures of positive selection in the HCN3 gene (xpEHH > 2, indicating recent selection) in Europe but not in Africa, involving 6 SNPs that are PKLR/HCN3 eQTLs. However, this evidence was not corroborated by the other tests (FST, Tajima's D, and iHS). Altogether, we provide evidence that a PKLR locus common in Africans contributes to mycobacterial susceptibility in populations of African descent, and we highlight, for the first time, PKLR as a susceptibility gene for leprosy and TB.
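The odds ratios above come from the study's association models, which may include covariates. As an unadjusted illustration only, an odds ratio and its Wald 95% confidence interval can be computed from a 2 × 2 table of haplotype carriers and non-carriers among cases and controls. The function name and the counts in the example are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval from a 2 x 2 table.

    a: exposed cases, b: unexposed cases, c: exposed controls, d: unexposed controls.
    z: standard-normal quantile (1.96 for a 95% interval).
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; a continuity correction would be needed")
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) (Woolf's method).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi
```

For example, `odds_ratio_ci(40, 60, 20, 80)` gives OR = 8/3 ≈ 2.67. The interval is symmetric on the log scale, not on the OR scale.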
Subject(s)
Malaria/genetics , Polymorphism, Single Nucleotide , Pyruvate Kinase/genetics , Adult , Brazil , Case-Control Studies , Female , Gene Frequency , Genetic Predisposition to Disease , Haplotypes , Humans , Linkage Disequilibrium , Logistic Models , Male , Middle Aged , Mozambique , Pyruvate Kinase/deficiency , Young Adult
ABSTRACT
BACKGROUND/OBJECTIVES: Admixed populations are a resource for studying the global genetic architecture of complex phenotypes, which is critical given that non-European populations are severely underrepresented in genomic studies. Here, we study the genetic architecture of BMI in children, young adults, and elderly individuals from the admixed population of Brazil. SUBJECTS/METHODS: Leveraging admixture in Brazilians, whose chromosomes are mosaics of fragments of Native American, European, and African origin, we used genome-wide data to perform admixture mapping and fine-mapping of body mass index (BMI) in three Brazilian population-based cohorts from the Northeast (Salvador), Southeast (Bambuí), and South (Pelotas). RESULTS: We found significant associations with African-associated alleles in children from Salvador (PALD1 and ZMIZ1 genes) and in young adults from Pelotas (NOD2 and MTUS2 genes). More importantly, in Pelotas, rs114066381, mapped in a potential regulatory region, is significantly associated only in females (p = 2.76e-06). This variant is rare in Europeans but reaches frequencies of ~3% in West Africa, and it has a strong female-specific effect (95% CI: 2.32-5.65 kg/m² per A allele). We confirmed this sex-specific association and replicated its strong effect for an adjusted fat mass index in the same Pelotas cohort, and for BMI in another Brazilian cohort from São Paulo (Southeast Brazil). A meta-analysis confirmed the significant association. Remarkably, while the frequency of the rs114066381-A allele ranges from 0.8 to 2.1% in the studied populations, it reaches ~9% among women with morbid obesity from Pelotas, São Paulo, and Bambuí. The effect size of rs114066381 is at least five times larger than that of the FTO SNPs rs9939609 and rs1558902, which are already emblematic for their large effects. CONCLUSIONS: We identified six candidate SNPs associated with BMI. 
rs114066381 stands out for its large, replicated effect and its high frequency in women with morbid obesity. We demonstrate how admixed populations are a source of new, relevant phenotype-associated genetic variants.
Subject(s)
Body Mass Index , Genetics, Population , Polymorphism, Single Nucleotide , Aged , Aged, 80 and over , Alleles , Brazil , Child , Child, Preschool , Chromosome Mapping , Female , Humans , Male , Middle Aged , Phenotype , Regulatory Sequences, Nucleic Acid , Sex Factors , Young Adult
ABSTRACT
The Transatlantic Slave Trade transported more than 9 million Africans to the Americas between the early 16th and the mid-19th centuries. We performed a genome-wide analysis of 6,267 individuals from 25 populations to infer how different African groups contributed to North American, South American, and Caribbean populations in the context of geographic and geopolitical factors, and we compared the genetic data with demographic records of the Transatlantic Slave Trade. We observed that the West-Central Africa- and Western Africa-associated ancestry clusters are more prevalent in the northern latitudes of the Americas, whereas the South/East Africa-associated ancestry cluster is more prevalent in the southern latitudes. This pattern results from geographic and geopolitical factors that led to population differentiation. However, between-population differentiation of the African gene pool is substantially lower within the Americas than in the regions of origin in Africa, underscoring the importance of historical factors that favored admixture between individuals of different African origins in the New World. This between-population homogenization in the Americas is consistent with the excess of West-Central Africa ancestry (the most prevalent in the Americas) in the United States and Southeast Brazil relative to historical-demographic expectations. We also inferred that in most of the Americas, intercontinental admixture intensified between 1750 and 1850, which correlates strongly with the peak of arrivals from Africa. This study contributes a population genetics perspective to the ongoing social, cultural, and political debate regarding ancestry, admixture, and the mestizaje process in the Americas.
Subject(s)
Black People/genetics , Enslavement/history , Gene Pool , Genome, Human , Human Migration/history , Africa , Americas , History, 16th Century , History, 17th Century , History, 18th Century , History, 19th Century , Humans , Phylogeography
ABSTRACT
Age-related cognitive decline (ACD) is the gradual loss of cognitive function with age. Most genetic risk factors for ACD have been identified in European populations, and none have been reported in admixed Latin American individuals. We performed admixture mapping, a genome-wide association study (GWAS), and fine-mapping to examine genetic factors associated with the 15-year cognitive trajectory of 1,407 Brazilian older adults, comprising 14,956 Mini-Mental State Examination measures. Participants were enrolled as part of the Bambuí-Epigen Cohort Study of Aging. Our admixture mapping analysis identified a genomic region (3p24.2) in which increased Native American ancestry was significantly associated with faster ACD. Fine-mapping of this region identified a single nucleotide polymorphism (SNP), rs142380904 (ß = -0.044, SE = 0.01, p = 7.5 × 10⁻⁵), associated with ACD. In addition, our GWAS identified 24 associated SNPs, most of them in genes previously reported to influence cognitive function. The top six associated SNPs accounted for 18.5% of the ACD variance in our data. Furthermore, our longitudinal study replicated previous GWAS hits for cognitive decline and Alzheimer's disease. Our 15-year longitudinal study identified both ancestry-specific and cosmopolitan genetic variants associated with ACD in Brazilians, highlighting the need for more trans-ancestry genomic studies, especially in underrepresented ethnic groups.
Subject(s)
Aging , Cognitive Dysfunction/genetics , Polymorphism, Single Nucleotide , Age Factors , Aged , Brazil/epidemiology , Cognition , Cognitive Dysfunction/etiology , Cohort Studies , Female , Follow-Up Studies , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Male , Middle Aged
ABSTRACT
After the colonization of the Americas by Europeans and the consequent Trans-Atlantic Slave Trade, most Native American populations in eastern Brazil disappeared or went through an admixture process that produced a population composed of three main genetic components: European, sub-Saharan African, and Native American. The study of Native American genetic history is challenged by the scarcity of genome-wide samples from Native American populations, the technical difficulties of ancient DNA studies, and the low proportion of the Native American component in admixed Brazilian populations (on average 7%). We analyzed genome-wide data of 5,825 individuals from three locations in eastern Brazil: Salvador (North-East), Bambui (South-East), and Pelotas (South), and we reconstructed populations that emulate the Native American groups living around the sampling locations in the 16th century. This genetic reconstruction was performed after local ancestry analysis of the admixed Brazilian populations, by rearranging the Native American haplotypes into reconstructed individuals with full Native American ancestry (51 reconstructed individuals in Salvador, 45 in Bambui, and 197 in Pelotas). We compared the reconstructed populations with nonadmixed Native American populations from other regions of Brazil using haplotype-based methods. Our results reveal a population structure shaped by the dichotomy between groups related to Tupi- and Jê-speaking ancestry. We also show evidence of a decrease in the diversity of nonadmixed Native American groups after European contact, in contrast with the reconstructed populations, suggesting that the admixed Brazilian population is a reservoir of Native American genetic diversity.
Subject(s)
Indians, South American/genetics , Brazil , Genetic Variation , Genome, Human , Geography , Haplotypes , Humans , Population Density
ABSTRACT
Populations in sub-Saharan Africa have historically been exposed to intense selection from chronic infection with falciparum malaria. Interestingly, the populations with the highest malaria intensity can be identified by the increased occurrence of endemic Burkitt Lymphoma (eBL), a pediatric cancer that affects populations with intense malaria exposure, in the so-called "eBL belt" of sub-Saharan Africa. However, the genetic histories of sub-Saharan populations, and the effects of intense malaria exposure on them, remain poorly explored. To determine whether historical migrations and intense malaria exposure have shaped the genetic composition of eBL belt populations, we genotyped ~4.3 million SNPs in 1,708 individuals from Ghana and Northern Uganda, which are located on opposite sides of the eBL belt, have ≥ 7 months/year of intense malaria exposure, and have published evidence of a high incidence of BL. Among 35 Ghanaian tribes, we found a predominantly West-Central African ancestry and genomic footprints of gene flow from Gambian and East African populations. In Uganda, the North West population showed a predominantly Nilotic ancestry, the North Central population was a mixture of Nilotic and Southern Bantu ancestry, and the Southwest Ugandan population showed a predominantly Southern Bantu ancestry. Our results support the hypothesis of diverse ancestral origins of the Ugandan, Kenyan, and Tanzanian Great Lakes African populations, reflecting a confluence of Nilotic, Cushitic, and Bantu migrations in the last 3000 years. Natural selection analyses suggest, for the first time, a strong positive selection signal in the ATP2B4 gene (rs10900588) in Northern Ugandan populations. These findings provide important baseline genomic data to facilitate disease association studies, including of eBL, in eBL belt populations.
Subject(s)
Burkitt Lymphoma/genetics , Gene Flow , Malaria, Falciparum/genetics , Selection, Genetic , Adolescent , Africa South of the Sahara , Aged , Burkitt Lymphoma/epidemiology , Case-Control Studies , Child , Child, Preschool , Endemic Diseases , Female , Genetics, Population , Genome-Wide Association Study , Ghana/epidemiology , Human Migration , Humans , Incidence , Infant , Infant, Newborn , Malaria, Falciparum/epidemiology , Male , Middle Aged , Models, Genetic , Plasma Membrane Calcium-Transporting ATPases/genetics , Polymorphism, Single Nucleotide , Uganda/epidemiology
ABSTRACT
OBJECTIVES: To investigate the association between African and Native American genomic ancestry and long-term cognitive trajectories in admixed Brazilians. DESIGN: Population-based longitudinal study. SETTING: Bambui-Epigen (Brazil) cohort study. PARTICIPANTS: Adults aged 60 and older (N=1,215). MEASUREMENTS: Participants were followed from January 1997 to December 2011. Cognitive function was assessed annually using the Mini-Mental State Examination (MMSE), totaling 12,208 measurements. We used linear mixed-effects pattern models to assess MMSE score trajectories. Ancestry was assessed using a genome-wide approach. RESULTS: After adjustment for covariates, the highest quintile of African ancestry was associated with poorer baseline cognitive performance (ß=-0.73, 95% confidence interval (CI)=-1.36 to -0.11) but not with cognitive trajectory. Educational level modified the baseline association between the highest African ancestry and cognitive performance: the association was observed only among those with very low education (<4 years) (ß=-1.13, 95% CI=-2.02 to -0.23). No association was found between Native American ancestry and baseline cognitive function or its trajectory. CONCLUSION: Genomic African and Native American ancestry levels had no prognostic value for age-related cognitive decline in this admixed population.
Subject(s)
Black People/genetics , Cognition , Cognitive Dysfunction/ethnology , Cognitive Dysfunction/genetics , Indians, North American/genetics , Adult , Aged , Aged, 80 and over , Aging/ethnology , Aging/genetics , Brazil/epidemiology , Cognitive Dysfunction/epidemiology , Educational Status , Female , Genome-Wide Association Study , Genomics , Humans , Linear Models , Longitudinal Studies , Male , Middle Aged , Risk Factors
ABSTRACT
Cryptic relatedness is a confounding factor in genetic diversity and genetic association studies, and strategies to reduce cryptic relatedness in a sample are a crucial step before downstream genetic analyses. This study uses a node selection algorithm, based on network degree centrality, to evaluate its applicability and impact on the evaluation of genetic diversity and population stratification. A total of 1,036 Guzerá (Bos indicus) females were genotyped using the Illumina BovineSNP50 v2 BeadChip. Four strategies were compared. The first and second strategies consist of an iterative exclusion of the most related individuals based on the PLINK kinship coefficient (φij) and VanRaden's φij, respectively. The third and fourth strategies were based on a node selection algorithm. The fourth strategy, Network G matrix, preserved the largest number of individuals, with better diversity and representation of the initial sample. Determining the most probable number of populations was directly affected by the kinship metric. Network G matrix was the best strategy for reducing relatedness because it produced a larger sample with more distantly related individuals, a distribution more similar to that of the full data set in the MDS plots, and a better representation of the population structure. Resampling strategies using VanRaden's φij as the relationship metric were better for inferring the relationships among individuals. Moreover, the resampling strategies directly affected the genomic inflation values in genome-wide association studies. The node selection algorithm also provides a better selection of the most central individuals to be removed, yielding a more representative sample.
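The "G matrix" in the fourth strategy is a genomic relationship matrix whose entries can serve as edge weights in the relatedness network. The abstract does not give its formula; a minimal sketch assuming VanRaden's first method, G = ZZ'/(2Σp_j(1 − p_j)), where Z is the 0/1/2 genotype matrix centered by 2p_j, is shown below. Off-diagonal entries estimate additive relationships (about twice the kinship coefficient). Estimating allele frequencies from the sample itself, and the function name, are assumptions.

```python
def vanraden_g(genotypes):
    """Genomic relationship matrix (VanRaden method 1) from 0/1/2 allele counts.

    genotypes: list of individuals, each a list of allele counts at the same SNPs.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each SNP, estimated from the sample.
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")
    # Center each genotype by its expectation 2p under Hardy-Weinberg.
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    # G = Z Z' / (2 * sum p(1 - p)).
    return [[sum(a * b for a, b in zip(zi, zj)) / scale for zj in z] for zi in z]
```

Thresholding the off-diagonal entries of G would give the related pairs that a degree-based node selection step then prunes. The threshold itself is a study-specific choice.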
Subject(s)
Cattle/genetics , Genetic Variation , Genomics/methods , Algorithms , Animals , Datasets as Topic , Female , Genotyping Techniques/veterinary
ABSTRACT
While multiallelic copy number variation (mCNV) loci are a major component of genomic variation, quantifying the individual copy number at a locus and defining genotypes are challenging. Few methods exist to study how mCNV genetic diversity is apportioned within and between populations (i.e. to define the population genetic structure of mCNVs). These inferences are critical in populations with a small effective size, such as Amerindians, that may not fit the Hardy-Weinberg model due to inbreeding, assortative mating, population subdivision, natural selection, or a combination of these evolutionary factors. We propose a likelihood-based method that simultaneously infers mCNV allele frequencies and the population structure parameter f, which quantifies the departure of homozygosity from the Hardy-Weinberg expectation. This method is implemented in the freely available software CNVice, which also infers individual genotypes using information from both the population and trios, if available. We studied the population genetics of five immune-related mCNV loci associated with complex diseases (beta-defensins, CCL3L1/CCL4L1, FCGR3A, FCGR3B, and FCGR2C) in 12 traditional Native American populations and found that the population structure parameters inferred for these mCNVs are comparable to, but lower than, those for single nucleotide polymorphisms studied in the same populations.
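The parameter f enters a multiallelic genotype model in the standard (Wright) way: homozygote frequencies are inflated by f·p_i(1 − p_i) and heterozygote frequencies are scaled by (1 − f), so the probabilities still sum to 1. The sketch below assumes allele-level genotypes are observed and allele frequencies are known. CNVice instead works from copy-number measurements and estimates frequencies and f jointly, so this only illustrates the model and its likelihood, not the software. The grid search and function names are illustrative simplifications.

```python
import math
from itertools import combinations_with_replacement


def genotype_probs(freqs, f):
    """Genotype probabilities at a multiallelic locus under Wright's inbreeding model.

    freqs: dict allele -> frequency (summing to 1); f: excess homozygosity over HWE.
    Keys of the result are sorted allele pairs, e.g. (1, 2).
    """
    probs = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        pa, pb = freqs[a], freqs[b]
        if a == b:
            probs[(a, b)] = pa * pa + f * pa * (1 - pa)  # homozygote
        else:
            probs[(a, b)] = 2 * pa * pb * (1 - f)  # heterozygote
    return probs


def log_likelihood(genotype_counts, freqs, f):
    """Multinomial log-likelihood (up to a constant) of observed genotype counts."""
    probs = genotype_probs(freqs, f)
    return sum(n * math.log(probs[g]) for g, n in genotype_counts.items() if n > 0)


def estimate_f(genotype_counts, freqs, grid=None):
    """Grid-search maximum-likelihood estimate of f with allele frequencies held fixed."""
    grid = grid if grid is not None else [i / 100 for i in range(100)]
    return max(grid, key=lambda f: log_likelihood(genotype_counts, freqs, f))
```

For a biallelic toy locus with counts 3000/4000/3000 (homozygote/heterozygote/homozygote) and equal allele frequencies, the likelihood is maximized at f = 0.2, reflecting a 20% deficit of heterozygotes relative to the Hardy-Weinberg expectation of 5000.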
Subject(s)
Alleles , Gene Frequency/immunology , Genetic Loci/immunology , Models, Genetic , Polymorphism, Single Nucleotide , Female , Genetics, Population , Humans , Indians, South American , Male , Multilocus Sequence Typing , Peru
ABSTRACT
Several genome-wide association studies have investigated the influence of genetic polymorphisms on the development of allergic diseases, but few of them have included the X chromosome. The aim of the present study was to perform an X chromosome-wide association study (X-WAS) for asthma symptoms. The study included 1307 children, of whom 294 were asthma cases. DNA was genotyped using the Illumina HumanOmni 2.5 BeadChip. Statistical analyses were performed in PLINK 1.9, MACH 1.0, and Minimac2. The variant rs12007907 (g.29483892C>A) in the IL1RAPL gene was suggestively associated with asthma symptoms in the discovery set (odds ratio (OR)=0.49, 95% confidence interval (CI): 0.37-0.67; P=3.33 × 10⁻⁶). This result was replicated in the ProAr cohort in men only (OR=0.45, 95% CI: 0.21-0.95; P=0.038). Furthermore, investigating the functional role of rs12007907 in the production of a Th2-type cytokine, IL-13, we found a negative association between the minor allele A and IL-13 production in the discovery set (P=0.044). Gene-based analysis revealed that NUDT10 was the gene most consistently associated with asthma symptoms in the discovery sample. In conclusion, the rs12007907 variant in the IL1RAPL gene was negatively associated with asthma and IL-13 production in our study, and a sex-specific association was observed in one of the validation samples. This suggests an effect on asthma susceptibility that may explain differences in severe asthma frequency between women and men.
Subject(s)
Asthma/genetics , Interleukin-1 Receptor Accessory Protein/genetics , Polymorphism, Single Nucleotide , Case-Control Studies , Child , Female , Humans , Interleukin-13/genetics , Interleukin-13/metabolism , Latin America , Male , Pyrophosphatases/genetics , Sex Factors
ABSTRACT
The Brazilian population is considered to be highly admixed. The main contributing ancestral populations were European and African, with Amerindians contributing to a lesser extent. The aim of this study was to provide a resource for determining and quantifying individual continental ancestry using the smallest possible number of SNPs, thus allowing a cost- and time-efficient strategy for genomic ancestry determination. We identified and validated a minimum set of 192 ancestry informative markers (AIMs) for the genetic ancestry determination of Brazilian populations. These markers were selected on the basis of their distribution throughout the human genome and their capacity to be genotyped on widely available commercial platforms. We analyzed genotyping data from 6487 individuals belonging to three Brazilian cohorts. Estimates of individual admixture using this 192-AIM panel were highly correlated with estimates using ~370,000 genome-wide SNPs: 91%, 92%, and 74% for the African, European, and Native American ancestry components, respectively. In addition, the 192 AIMs are well distributed among populations from these ancestral continents, allowing greater freedom in the choice of reference populations in future studies with this panel. We also observed that genetic ancestry inferred from the AIMs provides association results similar to those obtained with ancestry inferred from genome-wide data (370K SNPs) in a simple regression model with rs1426654 genotypes (a SNP related to skin pigmentation) as the dependent variable. In conclusion, these markers can be used to identify and accurately quantify the ancestry of Latin American or US Hispanic/Latino individuals, in particular in the context of fine-mapping strategies that require the quantification of continental ancestry in thousands of individuals.
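The agreement between the 192-AIM and genome-wide ancestry estimates is reported as percentages, without stating whether they are correlation coefficients (r) or squared correlations (r²). Assuming Pearson's r, the comparison for one ancestry component reduces to the sketch below, where each list holds per-individual ancestry proportions from the two approaches; squaring the result gives r². The function name is illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of ancestry proportions."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    # Covariance and variances, all scaled by n (the scale cancels in the ratio).
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a sequence has zero variance")
    return sxy / math.sqrt(sxx * syy)
```

A high r alone does not show that the two panels agree on absolute ancestry levels, since r is unaffected by a constant offset or scaling between the two sets of estimates.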
Subject(s)
Genome, Human , Polymorphism, Single Nucleotide , Population/genetics , American Indian or Alaska Native , Black People , Brazil , Genetic Markers , Humans , Pedigree , Skin Pigmentation/genetics , White People
ABSTRACT
The study objective is to examine the role of African genome origin in baseline and 11-year blood pressure trajectories in community-based ethnoracially admixed older adults in Brazil. Data come from 1272 participants (aged ≥60 years) of the Bambui cohort study of aging during 11 years of follow-up. Outcome measures were systolic blood pressure, diastolic blood pressure, and hypertension control. Potential confounding variables were demographic characteristics, socioeconomic position (schooling and household income), and health indicators (smoking, sedentary lifestyle, high-density lipoprotein cholesterol, waist circumference, diabetes mellitus, and cardiovascular diseases), including antihypertensive drug use. We used 370,539 single-nucleotide polymorphisms to estimate each individual's African, European, and Native American trihybrid ancestry proportions. Median African, European, and Native American ancestry were 9.6%, 84.0%, and 5.3%, respectively. Among those with African ancestry, 59.4% came from East and 40.6% from West Africa. Baseline systolic and diastolic blood pressure, controlled hypertension, and their respective trajectories were not significantly (P>0.05) associated with the level (in quintiles) of African genomic ancestry. Similar results were found for West and East African subcontinental origins. Lower schooling level (<4 years versus higher) showed a significant and positive association with systolic blood pressure (Adjusted ß=2.92; 95% confidence interval, 0.85-4.99). Lower monthly household income per capita (
Subject(s)
Aging/ethnology , Black People/genetics , Blood Pressure/physiology , Forecasting , Genomics/methods , Hypertension/ethnology , Aged , Aging/genetics , Blood Pressure Determination , Brazil/epidemiology , Female , Follow-Up Studies , Humans , Hypertension/genetics , Hypertension/physiopathology , Male , Middle Aged , Prevalence , Retrospective Studies , Risk Factors , Socioeconomic Factors
ABSTRACT
BACKGROUND: Asthma is a chronic disease of the airways, and despite recent advances in identifying associated genetic regions, the underlying mechanisms have yet to be explored. Several genome-wide association studies have been carried out in recent years, but none of them has involved Latin American populations with a high level of admixture, such as the Brazilian population. METHODS: 1246 children were recruited from a longitudinal cohort study in Salvador, Brazil. Asthma symptoms were identified using the International Study of Asthma and Allergies in Childhood (ISAAC) questionnaire. Following quality control, 1,877,526 autosomal SNPs were tested for association with childhood asthma symptoms by logistic regression under an additive genetic model. We complemented the analysis with an estimate of the phenotypic variance explained by common genetic variants. Replication was investigated in independent Mexican and US Latino samples. RESULTS: Two chromosomal regions reached genome-wide significance for childhood asthma symptoms: the 14q11 region flanking the DAD1 and OXA1L genes (rs1999071, MAF 0.32, OR 1.78, 95% CI 1.45-2.18, p-value 2.83 × 10⁻⁸) and the 15q22 region flanking the FOXB1 gene (rs10519031, MAF 0.04, OR 3.0, 95% CI 2.02-4.49, p-value 6.68 × 10⁻⁸, and rs8029377, MAF 0.03, OR 2.49, 95% CI 1.76-3.53, p-value 2.45 × 10⁻⁷). eQTL analysis suggests that rs1999071 regulates the expression of the OXA1L gene. However, the original findings were not replicated in the Mexican or US Latino samples. CONCLUSIONS: We conclude that the 14q11 and 15q22 regions may be associated with asthma symptoms in childhood.
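Under an additive genetic model, each SNP's genotype is coded as the count (0, 1, 2) of the tested allele, and exp(β₁) from the logistic regression is the per-allele odds ratio. A minimal single-SNP sketch fitted by Newton-Raphson is shown below. It omits the covariates a real GWAS would include, and the function name and toy data are illustrative assumptions.

```python
import math


def fit_additive_logistic(genotypes, phenotypes, max_iter=50, tol=1e-10):
    """Fit logit P(case) = b0 + b1 * g by Newton-Raphson, with no covariates.

    genotypes: allele counts 0/1/2 (additive coding); phenotypes: 1 = case, 0 = control.
    Returns (b0, b1); exp(b1) is the per-allele odds ratio.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Score vector (s0, s1) and information matrix [[h00, h01], [h01, h11]].
        s0 = s1 = h00 = h01 = h11 = 0.0
        for g, y in zip(genotypes, phenotypes):
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
            w = mu * (1.0 - mu)
            s0 += y - mu
            s1 += (y - mu) * g
            h00 += w
            h01 += w * g
            h11 += w * g * g
        det = h00 * h11 - h01 * h01
        if det == 0:
            raise ValueError("singular information matrix (e.g. monomorphic SNP)")
        # Newton step: beta += I^-1 * score, using the explicit 2x2 inverse.
        d0 = (h11 * s0 - h01 * s1) / det
        d1 = (-h01 * s0 + h00 * s1) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Toy data: 40 cases / 20 controls with g = 1, 60 cases / 80 controls with g = 0.
g = [1] * 60 + [0] * 140
y = [1] * 40 + [0] * 20 + [1] * 60 + [0] * 80
b0, b1 = fit_additive_logistic(g, y)
```

With only 0/1 genotypes the fitted exp(b1) equals the 2 × 2 table odds ratio, (40 × 80)/(20 × 60) = 8/3, which is a convenient check of the fit.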
Subject(s)
Asthma/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Child , Child, Preschool , Chromosomes, Human, Pair 14/genetics , Female , Humans , Latin America , Male , Metabolic Networks and Pathways/genetics , Phenotype , Polymorphism, Single Nucleotide/genetics , Principal Component Analysis
ABSTRACT
While South Americans are underrepresented in human genomic diversity studies, Brazil has been a classical model for population genetics studies of admixture. We present the results of the EPIGEN Brazil Initiative, the most comprehensive genomic analysis of any Latin American population to date. A population-based genome-wide analysis of 6,487 individuals was performed in the context of worldwide genomic diversity to elucidate how ancestry, kinship, and inbreeding interact in three populations with different histories from the Northeast (African ancestry: 50%), Southeast, and South (both with European ancestry >70%) of Brazil. We showed that ancestry-positive assortative mating permeated Brazilian history. We traced European ancestry in the Southeast/South to a wider European/Middle Eastern region than in the Northeast, where ancestry seems restricted to Iberia. Using an approximate Bayesian computation framework that we developed, we infer more recent European immigration to the Southeast/South than to the Northeast. Also, the low Native American ancestry observed (6-8%) was mostly introduced in different regions of Brazil soon after the European Conquest. We broadened our understanding of the African diaspora, whose major destination was Brazil, by revealing that Brazilians display two within-Africa ancestry components: one associated with non-Bantu/western Africans (more evident in the Northeast and in African Americans) and one associated with Bantu/eastern Africans (more present in the Southeast/South). Furthermore, whole-genome analysis of 30 individuals (42-fold coverage) shows that continental admixture, rather than local post-Columbian history, is the main and complex determinant of the individual amount of deleterious genotypes.
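Approximate Bayesian computation replaces an intractable likelihood with simulation: parameters are drawn from a prior, data are simulated under the model, and draws whose simulated summary statistics fall close to the observed ones are kept as an approximate posterior sample. The EPIGEN framework uses demographic models of immigration timing that the abstract does not detail. The toy model below (a single ancestry proportion, with the number of markers carrying the focal ancestry as the summary statistic) is hypothetical and only illustrates the rejection step.

```python
import random


def abc_rejection(observed_count, n_markers, n_sims=10000, tolerance=2, seed=1):
    """Rejection ABC for a toy ancestry proportion alpha with a Uniform(0, 1) prior.

    Hypothetical model: each of n_markers independently carries the focal ancestry
    with probability alpha; the summary statistic is the number of such markers.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        alpha = rng.random()  # 1. draw a parameter from the prior
        simulated = sum(rng.random() < alpha for _ in range(n_markers))  # 2. simulate data
        if abs(simulated - observed_count) <= tolerance:  # 3. compare summary statistics
            accepted.append(alpha)  # 4. keep draws that reproduce the observation
    return accepted


samples = abc_rejection(observed_count=30, n_markers=100)
posterior_mean = sum(samples) / len(samples)
```

Shrinking `tolerance` brings the accepted draws closer to the exact posterior at the cost of fewer accepted simulations, which is the central trade-off in tuning rejection ABC.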
Subject(s)
Genetics, Population , Mutation , Black People/genetics , Brazil , Humans , White People/genetics
ABSTRACT
Brazil never had segregation laws defining membership of an ethnoracial group. Thus, the Brazilian population is mixed, and its ethnoracial classification is complex. Previous studies showed conflicting results on the correlation between genome ancestry and ethnoracial classification in Brazilians. We used 370,539 Single Nucleotide Polymorphisms to quantify this correlation in 5,851 community-dwelling individuals in the South (Pelotas), Southeast (Bambui), and Northeast (Salvador) of Brazil. European ancestry was predominant in Pelotas and Bambui (median = 85.3% and 83.8%, respectively). African ancestry was highest in Salvador (median = 50.5%). The strength of the association between the phenotype (ethnoracial self-classification) and the median proportion of African ancestry varied widely across populations, with pseudo R² values of 0.50 in Pelotas, 0.22 in Bambui, and 0.13 in Salvador. The continuous proportion of African genomic ancestry showed a significant S-shaped positive association with self-reported Blacks in the three sites, and the reverse trend was found for self-reported Whites, with the most consistent classifications at the extremes of high and low African ancestry. For self-classified Mixed individuals, the predicted probability of this classification was bell-shaped across the range of African ancestry. Our results support the view that ethnoracial self-classification is affected by both genome ancestry and non-biological factors.
Subject(s)
Epigen/genetics , Ethnicity/genetics , Adult , Black People/genetics , Brazil , Child , Child, Preschool , Cohort Studies , Genetics, Population/methods , Genomics/methods , Humans , Longitudinal Studies , Middle Aged , Phenotype , Polymorphism, Single Nucleotide/genetics , White People/genetics , Young Adult
ABSTRACT
BACKGROUND: Archaeological evidence indicates millennia-old cultural contacts between the Peruvian Coast-Andes and the Amazon Yunga, a rainforest transitional region between the Andes and Lower Amazonia. To clarify the relationship between the cultural and biological evolution of these populations, in particular between Amazon Yungas and Andeans, we used DNA sequence data, a model-based Bayesian approach, and several statistical validations to infer a set of demographic parameters. RESULTS: We found that the genetic diversity of the Shimaa (an Amazon Yunga population) is a subset of that of Quechuas from the Central Andes. Using the Isolation-with-Migration population genetics model, we inferred that the Shimaa ancestors were a small subgroup that split less than 5300 years ago (after the development of complex societies) from an ancestral Andean population. After the split, the most plausible scenario compatible with our results is that the ancestors of the Shimaa moved toward the Peruvian Amazon Yunga and adopted the culture and language of some of their neighbors, but not a substantial amount of their genes. We validated our results using Approximate Bayesian Computation, posterior predictive tests, and the analysis of pseudo-observed datasets. CONCLUSIONS: We presented a case study in which model-based Bayesian approaches, combined with the necessary statistical validations, shed light on the prehistoric demographic relationship between Andeans and a population from the Amazon Yunga. Our results offer a testable model for the peopling of this large transitional environmental region between the Andes and Lower Amazonia. However, studies with larger samples and more populations from these regions are necessary to confirm whether the predominantly Andean biological origin of the Shimaa is the rule and not the exception.