Results 1 - 20 of 69
1.
Electrophoresis ; 45(9-10): 877-884, 2024 May.
Article in English | MEDLINE | ID: mdl-38196015

ABSTRACT

A macrohaplotype combines multiple types of phased DNA variants, increasing forensic discrimination power. High-quality long sequencing reads, for example, PacBio HiFi reads, provide data to detect macrohaplotypes in multiploidy and DNA mixtures. However, the bioinformatics tools for detecting macrohaplotypes are lacking. In this study, we developed a bioinformatics software, MacroHapCaller, in which targeted loci (i.e., short tandem repeats [STRs], single nucleotide polymorphisms, and insertions and deletions) are genotyped and combined with novel algorithms to call macrohaplotypes from long reads. MacroHapCaller uses physical phasing (i.e., read-backed phasing) to identify macrohaplotypes, and thus it can detect multi-allelic macrohaplotypes for a given sample. MacroHapCaller was validated with data generated from our designed targeted PacBio HiFi sequencing pipeline, which sequenced ∼8-kb amplicon regions harboring 20 core forensic STR loci in human benchmark samples HG002 and HG003. MacroHapCaller was also validated on whole-genome long-read sequencing data. Robust and accurate genotyping and phased macrohaplotypes were obtained with MacroHapCaller compared with the known ground truth. MacroHapCaller achieved a higher or consistent genotyping accuracy and faster speed than existing tools HipSTR and DeepVar. MacroHapCaller enables efficient macrohaplotype analysis from high-throughput sequencing data and supports applications using discriminating macrohaplotypes.
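
The idea of read-backed phasing in the abstract above can be illustrated with a minimal sketch: when one long read spans every targeted locus, the alleles seen on that read form one physically phased macrohaplotype, and counting identical allele combinations across reads yields candidate macrohaplotypes. The function name, the input layout, and the `min_reads` threshold are hypothetical; this is not MacroHapCaller's actual algorithm or interface.

```python
from collections import Counter

def call_macrohaplotypes(read_alleles, loci, min_reads=2):
    """Count allele combinations observed together on single reads.

    read_alleles: one dict per read, mapping locus name -> allele call.
    loci: ordered locus names that define the macrohaplotype.
    min_reads: hypothetical read-support threshold (an assumption).
    """
    counts = Counter()
    for calls in read_alleles:
        # Only a read covering every locus gives a physically phased haplotype.
        if all(locus in calls for locus in loci):
            counts[tuple(calls[locus] for locus in loci)] += 1
    # Keep haplotypes with enough supporting reads, most supported first.
    return [(hap, n) for hap, n in counts.most_common() if n >= min_reads]

# Toy reads with made-up locus names and allele labels.
reads = [
    {"STR": "12", "SNP1": "A", "INDEL1": "del"},
    {"STR": "12", "SNP1": "A", "INDEL1": "del"},
    {"STR": "14", "SNP1": "G", "INDEL1": "ref"},
    {"STR": "14", "SNP1": "G", "INDEL1": "ref"},
    {"STR": "14", "SNP1": "G", "INDEL1": "ref"},
    {"STR": "13", "SNP1": "A"},  # does not span INDEL1, so it is skipped
]
haplotypes = call_macrohaplotypes(reads, ["STR", "SNP1", "INDEL1"])
```

Because the haplotypes are counted rather than forced into two alleles, more than two can be reported for one sample, which is what makes the approach usable for mixtures.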


Subject(s)
Haplotypes , High-Throughput Nucleotide Sequencing , Polymorphism, Single Nucleotide , Polyploidy , Sequence Analysis, DNA , Software , Humans , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Algorithms , Computational Biology/methods , DNA/genetics , DNA/analysis , Microsatellite Repeats/genetics , Forensic Genetics/methods , Genotyping Techniques/methods
2.
Electrophoresis ; 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-38010138

ABSTRACT

Genetic genealogy has been more frequently used in forensic investigations in identifying criminals. However, the current genetic genealogy applications usually do not consider lineage markers (including both Y and mitochondrial deoxyribonucleic acid (DNA)), which is probably because not all distant relatives share the same lineage markers. In addition, no study has shown how to use lineage markers and what methods or thresholds should be applied in genetic genealogy. In this study, we developed a method to quickly determine if two single-nucleotide polymorphism (SNP) profiles are from the same paternal or maternal lineages by using a mismatch frequency of the SNPs in Y-chromosomal or mitochondrial DNA. For both Y and mitochondrial SNPs, profile pairs from the same or different lineages can be decided with high accuracies (i.e., 0.380% or 0.157% error rates with Y and mitochondrial DNA, respectively). With kinship coefficient filtering based on autosomal SNPs, the accuracies of determining maternal and paternal lineage can be further improved (i.e., 0.120% or 0.057% error rates with Y and mitochondrial DNA, respectively, using a threshold of kinship coefficient >0). This study shows that lineage markers can be very powerful tools with high accuracies to determine lineages, which could help solve cases and reduce costs for genetic genealogy investigations.
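
A mismatch-frequency comparison of two haploid lineage-marker profiles might look like the sketch below. The SNP identifiers, the function names, and the `threshold` value are placeholders chosen for illustration; the abstract does not state the threshold used in the study.

```python
def mismatch_frequency(profile_a, profile_b):
    """Fraction of SNPs typed in both profiles whose alleles differ.

    Profiles map SNP id -> haploid allele (Y or mitochondrial);
    None marks a no-call and is excluded from the comparison.
    """
    shared = [
        snp for snp in profile_a
        if snp in profile_b
        and profile_a[snp] is not None
        and profile_b[snp] is not None
    ]
    if not shared:
        raise ValueError("no SNPs typed in both profiles")
    mismatches = sum(profile_a[snp] != profile_b[snp] for snp in shared)
    return mismatches / len(shared)

def same_lineage(profile_a, profile_b, threshold=0.01):
    # threshold is a placeholder, not a value reported in the abstract.
    return mismatch_frequency(profile_a, profile_b) <= threshold

# Toy profiles with made-up SNP ids.
a = {"snp1": "A", "snp2": "G", "snp3": "T", "snp4": None}
b = {"snp1": "A", "snp2": "G", "snp3": "C", "snp4": "A"}
freq = mismatch_frequency(a, b)  # one mismatch among three shared SNPs
```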

3.
Front Genet ; 14: 1227176, 2023.
Article in English | MEDLINE | ID: mdl-37533432

ABSTRACT

Calling tandem repeat (TR) variants from DNA sequences is of both theoretical and practical significance. Some bioinformatics tools have been developed for detecting or genotyping TRs. However, little work has been done on genotyping TR alleles from long-read sequencing data, and the accuracy of genotyping TR alleles from next-generation sequencing data still needs to be improved. Herein, a novel algorithm is described to retrieve TR regions from sequence alignment, and a software program TRcaller has been developed and integrated into a web portal to call TR alleles from both short- and long-read sequences, both whole genome and targeted sequences generated from multiple sequencing platforms. All TR alleles are genotyped as haplotypes and the robust alleles will be reported, even multiple alleles in a DNA mixture. TRcaller could provide substantially higher accuracy (>99% in 289 human individuals) in detecting TR alleles and run orders of magnitude faster (e.g., ∼2 s for 300x human sequence data) than the mainstream software tools. The web portal preselected 119 TR loci from forensics, genealogy, and disease-related TR loci. TRcaller is validated to be scalable in various applications, such as DNA forensics and disease diagnosis, which can be expanded into other fields like breeding programs. Availability: TRcaller is available at https://www.trcaller.com/SignIn.aspx.

4.
Electrophoresis ; 44(13-14): 1080-1087, 2023 07.
Article in English | MEDLINE | ID: mdl-37016479

ABSTRACT

Y chromosome Short Tandem Repeat (STR) haplotypes have been used in assisting forensic investigations primarily for identification and male lineage determination. The current SWGDAM interpretation guidelines for Y-STR typing provide helpful guidance on those purposes but do not address the issue of kinship analysis with Y-STR haplotypes. Because of the high mutation rate of Y-STRs, there are complex missing person cases in which inconsistent Y-STR haplotypes between true paternal lineage relatives will arise, as well as cases in which two or more male references in the same lineage differ in their haplotypes. Therefore, more useful methods are needed for interpreting the Y-STR haplotype data. Computational methods and interpretation guidelines have been developed specifically addressing this issue, either using a mismatch-based counting method or a pedigree likelihood ratio method. In this study, a software program, MPKin-YSTR, was developed by implementing those more sophisticated methods. This software should be able to improve the interpretation of complex cases with Y-STR haplotype evidence. Thus, more biological evidence will be interpreted, which in turn will result in more investigation leads to help solve crimes.


Subject(s)
Chromosomes, Human, Y , Microsatellite Repeats , Humans , Male , Haplotypes/genetics , Chromosomes, Human, Y/genetics , Microsatellite Repeats/genetics , Pedigree , Genetics, Population
5.
BMC Bioinformatics ; 23(1): 497, 2022 Nov 19.
Article in English | MEDLINE | ID: mdl-36402991

ABSTRACT

BACKGROUND: Tandem repeats (TR), highly variable genomic variants, are widely used in individual identification, disease diagnostics, and evolutionary studies. The recent advances in sequencing technologies and bioinformatic tools facilitate calling TR haplotypes genome-wide. Both length-based and sequence-based TR alleles are used in different applications. However, sequence-based TR alleles could provide the highest precision in characterizing TR haplotypes. An easy-to-use bioinformatic tool that identifies the differences at the single nucleotide level between or among TR haplotypes is therefore needed. RESULTS: In this study, we developed a Universal STR Allele Toolkit (USAT) for TR haplotype analysis, which takes TR haplotype output from existing tools to perform allele size conversion, sequence comparison of haplotypes, figure plotting, comparison for allele distribution, and interactive visualization. An exemplary application of USAT for analysis of the CODIS core STR loci for DNA forensics with benchmarking human individuals demonstrated the capabilities of USAT. USAT has user-friendly graphic interfaces and runs fast in major computing operating systems with parallel computing enabled. CONCLUSION: USAT is a user-friendly bioinformatics software for interpretation, visualization, and comparisons of TRs.


Subject(s)
Computational Biology , Microsatellite Repeats , Humans , Alleles , Haplotypes , Sequence Analysis, DNA
6.
Front Genet ; 13: 971242, 2022.
Article in English | MEDLINE | ID: mdl-36263419

ABSTRACT

Estimating the relationships between individuals is one of the fundamental challenges in many fields. In particular, relationship estimation could provide valuable information for missing persons cases. The recently developed investigative genetic genealogy approach uses high-density single nucleotide polymorphisms (SNPs) to determine close and more distant relationships, in which hundreds of thousands to tens of millions of SNPs are generated either by microarray genotyping or whole-genome sequencing. The current studies usually assume the SNP profiles were generated with minimum errors. However, in the missing person cases, the DNA samples can be highly degraded, and the SNP profiles generated from these samples usually contain lots of errors. In this study, a machine learning approach was developed for estimating the relationships with high error SNP profiles. In this approach, a hierarchical classification strategy was employed first to classify the relationships by degree and then the relationship types within each degree separately. As for each classification, feature selection was implemented to gain better performance. Both simulated and real data sets with various genotyping error rates were utilized in evaluating this approach, and the accuracies of this approach were higher than individual measures; namely, this approach was more accurate and robust than the individual measures for SNP profiles with genotyping errors. In addition, the highest accuracy could be obtained by providing the same genotyping error rates in train and test sets, and thus estimating genotyping errors of the SNP profiles is critical to obtaining high accuracy of relationship estimation.

7.
Forensic Sci Int Genet ; 61: 102785, 2022 11.
Article in English | MEDLINE | ID: mdl-36206658

ABSTRACT

One of the fundamental goals of forensic genetics is sample attribution, i.e., whether an item of evidence can be associated with some person or persons. The most common scenario involves a direct comparison, e.g., between DNA profiles from an evidentiary item and a sample collected from a person of interest. Less common is an indirect comparison in which kinship is used to potentially identify the source of the evidence. Because of the sheer amount of information lost in the hereditary process for comparison purposes, sampling a limited set of loci may not provide enough resolution to accurately resolve a relationship. Instead, whole genome techniques can sample the entirety of the genome or a sufficiently large portion of the genome and as such they may effect better relationship determinations. While relatively common in other areas of study, whole genome techniques have only begun to be explored in the forensic sciences. As such, bioinformatic pipelines are introduced for estimating kinship by massively parallel sequencing of whole genomes using approaches adapted from the medical and population genomic literature. The pipelines are designed to characterize a person's entire genome, not just some set of targeted markers. Two different variant callers are considered, contrasting a classical variant calling algorithm (BCFtools) to a more modern deep convolutional neural network (DeepVariant). Two different bioinformatic pipelines specific to each variant caller are introduced and evaluated in a titration series. Filters and thresholds are then optimized specifically for the purposes of estimating kinship as determined by the KING-robust algorithm. With the appropriate filtering and thresholds in place both tools perform similarly, with DeepVariant tending to produce more accurate genotypes, though the resultant types of inaccuracies tended to produce slightly less accurate overall estimates of relatedness.
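
For readers unfamiliar with KING-robust, the sketch below computes a pairwise kinship coefficient from biallelic genotypes using the between-family robust estimator as it is commonly stated: the squared genotype distance, scaled by the smaller of the two heterozygote counts. The genotype coding (0, 1, 2 alternate-allele counts, `None` for missing) and the function name are assumptions made for illustration; this is not the pipeline described in the abstract, and the KING software applies additional filtering.

```python
def king_robust_kinship(g1, g2):
    """Kinship estimate for one pair from genotypes coded 0/1/2 (None = missing).

    phi = (N_het_het - 2 * N_opposite_hom) / (2 * m) + 1/2 - (N_het1 + N_het2) / (4 * m)
    where m = min(N_het1, N_het2).
    """
    het_het = opp_hom = het1 = het2 = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # use only sites typed in both individuals
        het1 += a == 1
        het2 += b == 1
        het_het += a == 1 and b == 1
        opp_hom += (a, b) in ((0, 2), (2, 0))
    m = min(het1, het2)
    if m == 0:
        raise ValueError("no heterozygous sites in at least one individual")
    return (het_het - 2 * opp_hom) / (2 * m) + 0.5 - (het1 + het2) / (4 * m)

# A profile compared with itself (duplicate or identical twin) gives 0.5.
dup = king_robust_kinship([0, 1, 2, 1], [0, 1, 2, 1])
```

Genotyping errors change the heterozygote and opposite-homozygote counts directly, which is one way to see why variant-caller error profiles affect the resulting relatedness estimates.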


Subject(s)
High-Throughput Nucleotide Sequencing , Polymorphism, Single Nucleotide , Humans , High-Throughput Nucleotide Sequencing/methods , Computational Biology/methods , Genotype , Algorithms
8.
Int J Legal Med ; 136(6): 1541-1549, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36057692

ABSTRACT

Laboratories and their criminal justice systems are confronted with challenges for implementing new technologies, practices, and policies even when there appears to be demonstrable benefits to operational performance. Impacting decisions are the often higher costs associated with, for example, new technologies, limited current budgets, and making hard decisions on what to sacrifice to take on the seemingly better approach. A prospective cost-benefit analysis (CBA) could help an agency better formulate its strategies and plans and more importantly delineate how a relatively small increase in cost to take on, for example, a new technology can have a large impact on the system (e.g., the agency, other agencies, victims and families, and taxpayers). To demonstrate the process and potential value a CBA was performed on the use of an alternate and more expensive swab with reported better DNA yield and being certified human DNA free (i.e., nylon 4N6FLOQSwabs®), versus the traditional less costly swab (i.e., cotton swab). Assumptions are described, potential underestimates and overestimates noted, different values applied (for low and modest to high), and potential benefits (monetary and qualitative) presented. The overall outcome is that the cost of using the more expensive technology pales compared with the potential tangible and intangible benefits. This approach could be a guide for laboratories (and associated criminal justice systems) worldwide to support increased funding, although the costs and benefits may vary locally and for different technologies, practices, and policies. With well-developed CBAs, goals of providing the best services to support the criminal justice system and society can be attained.


Subject(s)
Nylons , Cost-Benefit Analysis , Humans , Prospective Studies
9.
Front Genet ; 13: 882268, 2022.
Article in English | MEDLINE | ID: mdl-35846115

ABSTRACT

Technological advances in sequencing and single nucleotide polymorphism (SNP) genotyping microarray technology have facilitated advances in forensic analysis beyond short tandem repeat (STR) profiling, enabling the identification of unknown DNA samples and distant relationships. Forensic genetic genealogy (FGG) has facilitated the identification of distant relatives of both unidentified remains and unknown donors of crime scene DNA, invigorating the use of biological samples to resolve open cases. Forensic samples are often degraded or contain only trace amounts of DNA. In this study, the accuracy of genome-wide relatedness methods and identity by descent (IBD) segment approaches was evaluated in the presence of challenges commonly encountered with forensic data: missing data and genotyping error. Pedigree whole-genome simulations were used to estimate the genotypes of thousands of individuals with known relationships using multiple populations with different biogeographic ancestral origins. Simulations were also performed with varying error rates and types. Using these data, the performance of different methods for quantifying relatedness was benchmarked across these scenarios. When the genotyping error was low (<1%), IBD segment methods outperformed genome-wide relatedness methods for close relationships and are more accurate at distant relationship inference. However, with an increasing genotyping error (1-5%), methods that do not rely on IBD segment detection are more robust and outperform IBD segment methods. The reduced call rate had little impact on either class of methods. These results have implications for the use of dense SNP data in forensic genomics for distant kinship analysis and FGG, especially when the sample quality is low.

10.
Leg Med (Tokyo) ; 57: 102073, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35453076

ABSTRACT

In this population study 1541 samples in total were collected and analyzed. The samples were collected from five jurisdictions: North macro region (n = 272), Central macro region (n = 404), South macro region (n = 272), East macro region (n = 197), and the Lima macro region (n = 396). The samples were analyzed using the Investigator 24 plex GO and Investigator 24 plex QS kits which enable typing of 21 autosomal STR loci and an amelogenin marker for sex determination. The combined power of discrimination and the combined probability of exclusion for the total population were 0.9999999999 and 0.99999978, respectively. These population geographic subgroupings are similar, supporting that the combined Peruvian data or individual subgroupings could be used for generating statistics in forensic casework.
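
The combined statistics reported above are conventionally obtained from per-locus values under the assumption that the loci are independent: the combined value is one minus the product of the per-locus complements. A minimal sketch follows; the per-locus numbers are invented for illustration and are not taken from the Peruvian data set.

```python
import math

def combined_power(per_locus_values):
    """Combine per-locus PD (or PE) values, assuming independent loci.

    combined = 1 - prod(1 - value_i)
    """
    return 1.0 - math.prod(1.0 - v for v in per_locus_values)

# Hypothetical per-locus powers of discrimination for two loci.
cpd = combined_power([0.90, 0.95])  # 1 - (0.10 * 0.05)
```

With 21 loci that are each highly polymorphic, the product of the complements shrinks quickly, which is why combined values very close to 1 are typical.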


Subject(s)
Forensic Genetics , Microsatellite Repeats , Amelogenin/genetics , DNA Fingerprinting , Forensic Anthropology , Gene Frequency , Genetics, Population , Humans , Microsatellite Repeats/genetics , Peru
11.
F1000Res ; 11: 18, 2022.
Article in English | MEDLINE | ID: mdl-35222994

ABSTRACT

Motivation: SNP-based kinship analysis with genome-wide relationship estimation and IBD segment analysis methods produces results that often require further downstream processing and manipulation. A dedicated software package that consistently and intuitively implements this analysis functionality is needed. Results: Here we present the skater R package for SNP-based kinship analysis, testing, and evaluation with R. The skater package contains a suite of well-documented tools for importing, parsing, and analyzing pedigree data, performing relationship degree inference, benchmarking relationship degree classification, and summarizing IBD segment data. Availability: The skater package is implemented as an R package and is released under the MIT license at https://github.com/signaturescience/skater. Documentation is available at https://signaturescience.github.io/skater.


Subject(s)
Genome , Pedigree , Polymorphism, Single Nucleotide , Computational Biology , Humans , Software
12.
Int J Legal Med ; 136(2): 565-567, 2022 Mar.
Article in English | MEDLINE | ID: mdl-34613462

ABSTRACT

With the advent of expanded STR (short tandem repeats) typing kits, it was necessary to determine allele frequencies and other appropriate population data parameters for El Salvador. Samples were collected from the central, east, and west regions of the country and typed for 21 forensically relevant STR loci. The data indicate that all loci are highly polymorphic, the three regions are genetically similar, and the population data are similar to those of US Hispanics. The results of this study support that the allele frequency data described herein can be used for statistical calculations for human identity testing in El Salvador.


Subject(s)
DNA Fingerprinting , Genetics, Population , Gene Frequency , Hispanic or Latino , Humans , Microsatellite Repeats
13.
F1000Res ; 11: 775, 2022.
Article in English | MEDLINE | ID: mdl-38779458

ABSTRACT

Motivation: Genotyping error can impact downstream single nucleotide polymorphism (SNP)-based analyses. Simulating various modes and levels of error can help investigators better understand potential biases caused by miscalled genotypes. Methods: We have developed and validated vcferr, a tool to probabilistically simulate genotyping error and missingness in variant call format (VCF) files. We demonstrate how vcferr could be used to address a research question by introducing varying levels of error of different types into a sample in a simulated pedigree and assessing how kinship analysis degrades as a function of the kind and type of error. Software availability: vcferr is available for installation via PyPI (https://pypi.org/project/vcferr/) or conda (https://anaconda.org/bioconda/vcferr). The software is released under the MIT license with source code available on GitHub (https://github.com/signaturescience/vcferr).
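
The general idea of probabilistic error and missingness simulation can be sketched in a few lines of plain Python. This is a generic illustration only: it does not use or reproduce vcferr's interface, and the function name, parameters, and uniform error model are assumptions.

```python
import random

GENOTYPES = ("0/0", "0/1", "1/1")

def add_genotype_error(genotypes, error_rate, missing_rate=0.0, seed=None):
    """Return a copy of biallelic genotype calls with simulated noise.

    Each call is set to missing ("./.") with probability missing_rate,
    or replaced by a different genotype with probability error_rate.
    The replacement is drawn uniformly, which is a simplifying assumption.
    """
    rng = random.Random(seed)
    noisy = []
    for gt in genotypes:
        u = rng.random()
        if u < missing_rate:
            noisy.append("./.")
        elif u < missing_rate + error_rate:
            noisy.append(rng.choice([g for g in GENOTYPES if g != gt]))
        else:
            noisy.append(gt)
    return noisy

truth = ["0/0", "0/1", "1/1", "0/1", "0/0"]
noisy = add_genotype_error(truth, error_rate=0.05, missing_rate=0.02, seed=1)
```

Running the same downstream analysis on `truth` and on `noisy` at several error rates gives a simple picture of how the results degrade.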

14.
Genes (Basel) ; 12(11)2021 10 20.
Article in English | MEDLINE | ID: mdl-34828255

ABSTRACT

Wet-lab based studies have exploited emerging single-cell technologies to address the challenges of interpreting forensic mixture evidence. However, little effort has been dedicated to developing a systematic approach to interpreting the single-cell profiles derived from the mixtures. This study is the first attempt to develop a comprehensive interpretation workflow in which single-cell profiles from mixtures are interpreted individually and holistically. In this approach, the genotypes from each cell are assessed, the number of contributors (NOC) of the single-cell profiles is estimated, followed by developing a consensus profile of each contributor, and finally the consensus profile(s) can be used for a DNA database search or comparing with known profiles to determine their potential sources. The potential of this single-cell interpretation workflow was assessed by simulation with various mixture scenarios and empirical allele drop-out and drop-in rates, the accuracies of estimating the NOC, the accuracies of recovering the true alleles by consensus, and the capabilities of deconvolving mixtures with related contributors. The results support that the single-cell based mixture interpretation can provide a precision that cannot be achieved with current standard CE-STR analyses. A new paradigm for mixture interpretation is available to enhance the interpretation of forensic genetic casework.
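
The consensus step of such a workflow could, under simple assumptions, be a per-locus vote across the cells already assigned to one contributor: an allele is kept when it appears in at least a given fraction of cells, which tolerates occasional drop-out and drop-in. The `min_fraction` threshold, function name, and data layout are hypothetical; the abstract does not describe the rule used in the study.

```python
from collections import Counter

def consensus_profile(cell_profiles, min_fraction=0.5):
    """Build a consensus profile from single-cell STR profiles.

    cell_profiles: one dict per cell, mapping locus -> set of observed alleles.
    min_fraction: hypothetical share of cells that must show an allele.
    """
    n_cells = len(cell_profiles)
    loci = set().union(*(profile.keys() for profile in cell_profiles))
    consensus = {}
    for locus in sorted(loci):
        # Count in how many cells each allele was observed at this locus.
        counts = Counter(
            allele
            for profile in cell_profiles
            for allele in set(profile.get(locus, ()))
        )
        consensus[locus] = sorted(
            allele for allele, c in counts.items() if c / n_cells >= min_fraction
        )
    return consensus

# Toy cells from one contributor: cell 2 shows drop-out of allele 16,
# cell 3 shows drop-in of allele 14 at D3S1358.
cells = [
    {"D3S1358": {15, 16}, "vWA": {17}},
    {"D3S1358": {15}, "vWA": {17, 18}},
    {"D3S1358": {14, 15, 16}, "vWA": {17, 18}},
]
profile = consensus_profile(cells)
```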


Subject(s)
DNA/analysis , Forensic Genetics , Single-Cell Analysis/methods , Algorithms , Alleles , Cluster Analysis , DNA/chemistry , DNA/genetics , DNA Contamination , DNA Fingerprinting/methods , Forensic Genetics/methods , Forensic Genetics/trends , Genetic Techniques , Genotype , Humans , Microsatellite Repeats
15.
Int J Legal Med ; 135(6): 2189-2198, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34378071

ABSTRACT

Deconvoluting mixture samples is one of the most challenging problems confronting DNA forensic laboratories. Efforts have been made to provide solutions regarding mixture interpretation. The probabilistic interpretation of Short Tandem Repeat (STR) profiles has increased the number of complex mixtures that can be analyzed. A portion of complex mixture profiles, particularly for mixtures with a high number of contributors, are still being deemed uninterpretable. Novel forensic markers, such as Single Nucleotide Variants (SNV) and microhaplotypes, also have been proposed to allow for better mixture interpretation. However, these markers both have lower discrimination power than STRs and are not compatible with CODIS or other national DNA databanks worldwide. The short-read sequencing (SRS) technologies can facilitate mixture interpretation by identifying intra-allelic variations within STRs. Unfortunately, the short size of the amplicons containing STR markers and sequence reads limit the alleles that can be attained per STR. The latest long-read sequencing (LRS) technologies can overcome this limitation in some samples in which larger DNA fragments (including both STRs and SNVs) with definitive phasing are available. Based on the LRS technologies, this study developed a novel CODIS compatible forensic marker, called a macrohaplotype, which combines a CODIS STR and flanking variants to offer an extremely high number of haplotypes and hence very high discrimination power per marker. The macrohaplotype will substantially improve mixture interpretation capabilities. Based on publicly accessible data, a panel of 20 macrohaplotypes with sizes of ~8 kbp and maximal discrimination power was designed. The statistical evaluation demonstrates that these macrohaplotypes substantially outperform CODIS STRs for mixture interpretation, particularly for mixtures with a high number of contributors, as well as other forensic applications. Based on these results, efforts should be undertaken to build a complete workflow, both wet-lab and bioinformatics, to precisely call the variants and generate the macrohaplotypes based on the LRS technologies.


Subject(s)
DNA Fingerprinting , Microsatellite Repeats , DNA/genetics , Haplotypes , High-Throughput Nucleotide Sequencing , Humans , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
16.
J Forensic Sci ; 66(5): 1637-1646, 2021 Sep.
Article in English | MEDLINE | ID: mdl-33885147

ABSTRACT

For the past two to three decades, forensic DNA evidence has been analyzed with a limited number of short tandem repeats (STRs), and these STRs are usually assumed to be independent for statistical calculations. With the development and implementation of the MPS technologies, more autosomal markers, both single nucleotide polymorphisms (SNPs) and STRs, can be analyzed. A number of these markers are physically very close to each other, and it may not be appropriate to assume all these markers are genetically unlinked or in linkage equilibrium. In this study, publicly accessible genomic data from five representative populations were used to evaluate the genetic linkage and linkage disequilibrium (LD) between autosomal markers represented in six major commercial panels (in total, 362 markers). Among the 3041 syntenic marker pairs, 1524 pairs had sex-average genetic distances <50 cM, and thus, these marker pairs can be considered as genetically linked. Among the 143 marker pairs with physical distances <1 Mb, 19 LD haplotype blocks (comprising 39 SNPs in total) were detected for at least one of the tested populations. Statistical methods for interpreting linked markers and/or markers in LD were suggested for various case scenarios.
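
Linkage disequilibrium between two biallelic markers is commonly summarized by the coefficient D and by r². The sketch below computes both from phased haplotypes; the 0/1 allele coding and the toy data are assumptions for illustration and do not reproduce the block-detection method used in the study.

```python
def ld_statistics(haplotypes):
    """Return (D, r_squared) for two biallelic markers.

    haplotypes: list of (allele_at_A, allele_at_B) pairs with alleles coded 0/1.
    D = p_AB - p_A * p_B;  r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("at least one marker is monomorphic")
    return d, d * d / denom

# Alleles always travel together: complete LD.
d_full, r2_full = ld_statistics([(1, 1), (1, 1), (0, 0), (0, 0)])
# All four haplotypes equally frequent: linkage equilibrium.
d_none, r2_none = ld_statistics([(1, 1), (1, 0), (0, 1), (0, 0)])
```

When r² is well above zero, multiplying single-marker genotype frequencies overstates the weight of evidence, which is why haplotype-based calculations are suggested for such marker pairs.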


Subject(s)
Forensic Genetics , Genetic Linkage , High-Throughput Nucleotide Sequencing , Linkage Disequilibrium , DNA Fingerprinting , Genetic Markers , Humans , Microsatellite Repeats , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
17.
Forensic Sci Int Genet ; 53: 102494, 2021 07.
Article in English | MEDLINE | ID: mdl-33740707

ABSTRACT

The VeriFiler™ Plus PCR Amplification Kit is a 6-dye multiplex assay that simultaneously amplifies a set of 23 autosomal markers (D3S1358, vWA, D16S539, CSF1PO, D6S1043, D8S1179, D21S11, D18S51, D5S818, D2S441, D19S433, FGA, D10S1248, D22S1045, D1S1656, D13S317, D7S820, Penta E, Penta D, TH01, D12S391, D2S1338, and TPOX), a quality indicator system, and two sex-identification markers. Combined, the markers satisfy the requirements of the Chinese National autosomal DNA database as well as expanded CODIS (Combined DNA Index System). The VeriFiler Plus kit was developed with an improved Master Mix which incorporates the brighter TED™ dye, and accommodates a higher sample loading volume thus allowing for increased sensitivity and enabling maximum information recovery from challenging casework samples including touch, degraded, and inhibited samples. Here, we report the results of the developmental validation study which followed the SWGDAM (Scientific Working Group on DNA Analysis Methods) guidelines and includes data for PCR-based studies, sensitivity, species specificity, stability, precision, reproducibility and repeatability, concordance, stutter, DNA mixtures, and performance on mock casework samples. The results validate the multiplex design as well as demonstrate the kit's robustness, reliability, and suitability as an assay for human identification with casework DNA samples.


Subject(s)
Multiplex Polymerase Chain Reaction/instrumentation , Animals , DNA Degradation, Necrotic , DNA Fingerprinting , Female , Forensic Genetics/instrumentation , Genetics, Population , Humans , Male , Microsatellite Repeats , Reproducibility of Results , Species Specificity
18.
J Forensic Sci ; 66(2): 430-443, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33136341

ABSTRACT

There are several indirect database searching approaches to identify the potential source of a forensic biological sample. These DNA-based approaches are familial searching, Y-STR database searching, and investigative genetic genealogy (IGG). The first two strategies use forensic DNA databases managed by the government, and the latter uses databases managed by private citizens or companies. Each of these search strategies relies on DNA testing to identify relatives of the donor of the crime scene sample, provided such profiles reside in the DNA database(s). All three approaches have been successfully used to identify the donor of biological evidence, which assisted in solving criminal cases or identifying unknown human remains. This paper describes and compares these approaches in terms of genotyping technologies, searching methods, database structures, searching efficiency, data quality, data security, and costs, and raises some potential privacy and legal considerations for further discussion by stakeholders and scientists. Y-STR database searching and IGG are advantageous since they are able to assist in more cases than familial searching by readily identifying distant relatives. In contrast, familial searching can be performed more readily with existing laboratory systems. Every country or state may have its own unique economic, technical, cultural, and legal considerations and should decide the best approach(es) to fit those circumstances. Regardless of the approach, the ultimate goal should be the same: generate investigative leads and solve active and cold criminal cases to support public safety, under stringent policies and security practices designed to protect the privacy of its citizenry.


Subject(s)
Databases, Nucleic Acid , Information Storage and Retrieval/methods , Pedigree , Chromosomes, Human, Y , DNA Fingerprinting , Forensic Genetics/methods , Humans , Microsatellite Repeats , Polymorphism, Single Nucleotide
20.
J Hum Genet ; 65(5): 461-468, 2020 May.
Article in English | MEDLINE | ID: mdl-32081902

ABSTRACT

Predicting the biogeographical ancestries of populations and unknown individuals based on ancestry-informative markers (AIMs) has been widely applied in providing DNA clues to criminal investigations, correcting the factor of population stratification in genome-wide association studies (GWAS), and working as the basis of predicting the externally visible characteristics (EVCs) of individuals. The present study chose the Chinese Xinjiang Kazak (XJK) group as the research object, using a 165 AIM-SNP panel with next generation sequencing (NGS) technology to reveal its ancestral information and genetic background by referencing the populations' data from 1000 Genomes Phase 3. After the Bonferroni correction, there were no significant deviations at the 165 AIM-SNP loci except two loci that were homozygous in the studied XJK group. Ancestry information inference and population genetic analyses were conducted based on multiple statistical methods such as forensic statistical parameter analyses, estimation of the success ratios with cross-validation, population tree, principal component analysis (PCA), and genetic structure analysis. The present results revealed that the XJK group had the admixed ancestral components of East Asian and European populations with a ratio of about 62:37.


Subject(s)
Asian People , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Polymorphism, Single Nucleotide , Asian People/ethnology , Asian People/genetics , China/ethnology , Humans , White People/ethnology , White People/genetics