Search | VHL Regional Portal

Sequencing of autosomal, mitochondrial and Y-chromosomal forensic markers in the People of the British Isles cohort detects population structure dominated by patrilineages.

Huszar, Tunde I; Bodmer, Walter F; Hutnik, Katarzyna; Wetton, Jon H; Jobling, Mark A.

Forensic Sci Int Genet ; 59: 102725, 2022 07.

Article in English | MEDLINE | ID: mdl-35640311

ABSTRACT

Short tandem repeat (STR) polymorphisms are traditionally assessed by measuring allele lengths via capillary electrophoresis (CE). Massively parallel sequencing (MPS) reveals differences among alleles of the same length, thus improving discrimination, but also identifying groups of alleles likely related by descent. These may have relatively restricted geographical distributions and thus MPS could detect population structure more effectively than CE-based analysis. We addressed this question by applying an MPS multiplex, the Promega PowerSeq™ Auto/Mito/Y System prototype, to 362 individuals chosen to represent a wide geographical spread from the People of the British Isles (PoBI) cohort, which represents at least three generations of local rural ancestry. As well as 22 autosomal STRs (aSTRs; equivalent to PowerPlex Fusion loci) the system sequences 23 Y-STRs (the PowerPlexY 23 loci) and the control region (CR) of mitochondrial DNA (mtDNA), allowing population structure to be compared across biparentally and uniparentally inherited segments of the genome. For all loci, FST-based tests of population structure were done based on historical, linguistic, and geographical partitions, and for aSTRs the clustering algorithm STRUCTURE was also applied. STRs were considered using both length and sequence. Sequencing increased aSTR allele diversity by 87.5% compared to CE-based designations, reducing random match probability to 1.25E-30, compared to a CE-based 6.72E-27. Significant population structure was detectable in just one pairwise comparison (Central/South East England compared to the rest), and for sequence-based alleles only. The 362 samples carried 308 distinct mtDNA CR haplotypes corresponding to 13 broad haplogroups, representing a haplotype diversity of 0.9985 ( ± 0.0005), and a haplotype match probability of 0.0043. No significant population structure was observed. Y-STR haplotypes belonged to ten broad predicted Y-haplogroups. 
Allele diversity increased by 33% when considered at the sequence rather than length level, although haplotype diversity was unchanged at 0.999969 ( ± 0.000001); haplotype match probability was 2.79E-03. In contrast to the biparentally and maternally inherited loci, Y-STR haplotypes showed significant population structure at several levels, but most markedly in a comparison of regions subject to Anglo-Saxon influence in the east with the rest of the sample. This was evident for both length- and sequence-based allele designations, with no systematic difference between the two. We conclude that MPS analysis of aSTRs or Y-STRs does not generally reveal stronger population structure than length-based analysis, that UK maternal lineages are not significantly structured, and that Y-STR haplotypes reveal significant population structure that may reflect the Anglo-Saxon migrations to Britain in the 6th century.
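The haplotype diversity and match probability figures quoted above are linked by a simple relationship. The following is a minimal sketch, assuming Nei's unbiased estimator was used (the abstract does not name the estimator) and using a made-up toy sample; the function names are illustrative only. Plugging the reported match probabilities and n = 362 into the formula gives values consistent with the reported diversities.

```python
from collections import Counter


def match_probability(haplotypes):
    """Sum of squared haplotype frequencies in the sample."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())


def haplotype_diversity(haplotypes):
    """Nei's unbiased estimator: n / (n - 1) * (1 - sum of p_i squared).

    Assumption: the abstract does not state which estimator was used.
    """
    n = len(haplotypes)
    return n / (n - 1) * (1.0 - match_probability(haplotypes))


def diversity_from_match_probability(n, mp):
    """Same formula, starting from an already computed match probability."""
    return n / (n - 1) * (1.0 - mp)


# Toy sample (hypothetical labels): one haplotype seen twice, two seen once.
toy = ["H1", "H1", "H2", "H3"]
toy_mp = match_probability(toy)
toy_hd = haplotype_diversity(toy)

# Figures reported in the abstract for 362 individuals.
mt_hd = diversity_from_match_probability(362, 0.0043)   # mtDNA control region
y_hd = diversity_from_match_probability(362, 2.79e-3)   # Y-STR haplotypes

# Fold reduction in aSTR random match probability, CE versus sequence.
rmp_fold = 6.72e-27 / 1.25e-30

print(f"toy: MP={toy_mp:.3f}, HD={toy_hd:.4f}")
print(f"mtDNA HD from MP: {mt_hd:.4f}")
print(f"Y-STR HD from MP: {y_hd:.6f}")
print(f"aSTR RMP fold reduction: {rmp_fold:.0f}")
```

Small differences from the reported Y-STR diversity are expected here because the match probability in the abstract is rounded to three significant figures.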

Subject(s)

Chromosomes, Human, Y , DNA Fingerprinting , DNA, Mitochondrial/genetics , High-Throughput Nucleotide Sequencing , Humans , Microsatellite Repeats , Sequence Analysis, DNA

A multi-dimensional evaluation of the 'NIST 1032' sample set across four forensic Y-STR multiplexes.

Steffen, Carolyn R; Huszar, Tunde I; Borsuk, Lisa A; Vallone, Peter M; Gettings, Katherine B.

Forensic Sci Int Genet ; 57: 102655, 2022 03.

Article in English | MEDLINE | ID: mdl-35007854

ABSTRACT

This manuscript reports Y-chromosomal short tandem repeat (Y-STR) haplotypes for 1032 male U.S. population samples across 30 Y-STR loci characterized by three capillary electrophoresis (CE) length-based kits (PowerPlex Y23 System, Yfiler Plus PCR Amplification Kit, and Investigator Argus Y-28 QS Kit) and one sequence-based kit (ForenSeq DNA Signature Prep Kit): DYF387S1, DYS19, DYS385 a/b, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS449, DYS456, DYS458, DYS460, DYS481, DYS505, DYS518, DYS522, DYS533, DYS549, DYS570, DYS576, DYS612, DYS627, DYS635, DYS643, and Y-GATA-H4. The length-based Y-STR haplotypes include six loci that are not reported in the sequence-based kit (DYS393, DYS449, DYS456, DYS458, DYS518, and DYS627), whereas three loci included in the sequence-based kit are not present in length-based kits (DYS505, DYS522, and DYS612). For the latter, a custom multiplex was used to generate CE length-based data, allowing 1032 samples to be evaluated for concordance across the 30 Y-STR loci included in these four commercial Y-STR typing kits. Discordances between typing methods were analyzed further to assess underlying causes such as primer binding site mutations and flanking region insertions/deletions. Allele-level frequency and statistical information is provided for sequenced loci, excluding the multi-copy loci DYF387S1 and DYS385 a/b, for which locus-specific haplotype-level frequencies are provided instead. The resulting data reveals the degree of information gained through sequencing: 88% of sequenced Y-STR loci contain additional sequence-based alleles compared to length-based data, with the DYS389II locus containing the most additional alleles (51) observed by sequencing. Despite these allelic increases, only minimal improvement was observed in haplotype resolution by sequence, with all four commercial kits providing a similar ability to differentiate length-based haplotypes in this sample set. 
Finally, a subset of 369 male samples were compared to their corresponding additionally sequenced father samples, revealing the sequence basis for the 50 length-based changes observed, and no additional sequence-based mutations. GenBank accession numbers are reported for each unique sequence, and associated records are available in the STRSeq Y-Chromosomal STR Loci National Center for Biotechnology Information (NCBI) BioProject, accession PRJNA380347. Haplotype data is updated in the Y-STR Haplotype Reference Database (YHRD) for the 'NIST 1032' data set to now achieve the level of maximal haplotype of YHRD. All supplementary files including revisions to previously published Y-STR data are available in the NIST Public Data Repository: U.S. population data for human identification markers, DOI 10.18434/t4/1500024.
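A concordance evaluation of the kind described above can be sketched as a per-sample, per-locus comparison restricted to the loci that both typing methods report. This is an illustration only and not the authors' pipeline: the sample identifiers, allele values and the `find_discordances` helper are hypothetical, and only the locus names are taken from the abstract.

```python
def find_discordances(calls_a, calls_b):
    """Compare allele calls from two typing methods.

    Each argument maps sample -> {locus: allele}. Only samples and loci
    present in both inputs are compared. Returns the number of
    comparisons made and a list of discordant calls.
    """
    compared = 0
    discordant = []
    for sample in sorted(set(calls_a) & set(calls_b)):
        shared_loci = set(calls_a[sample]) & set(calls_b[sample])
        for locus in sorted(shared_loci):
            compared += 1
            a = calls_a[sample][locus]
            b = calls_b[sample][locus]
            if a != b:
                discordant.append((sample, locus, a, b))
    return compared, discordant


# Hypothetical calls. DYS393 is typed by the length-based kits only,
# so it is skipped in the comparison.
ce_calls = {
    "S1": {"DYS19": "14", "DYS390": "24", "DYS393": "13"},
    "S2": {"DYS19": "15", "DYS390": "23", "DYS393": "13"},
}
seq_calls = {
    "S1": {"DYS19": "14", "DYS390": "24"},
    "S2": {"DYS19": "15", "DYS390": "24"},
}

n_compared, discordances = find_discordances(ce_calls, seq_calls)
print(n_compared, discordances)
```

Each discordant tuple would then be a candidate for follow-up, for example checking for a primer binding site mutation or a flanking region insertion or deletion.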

Subject(s)

Chromosomes, Human, Y , DNA Fingerprinting , DNA Fingerprinting/methods , Gene Frequency , Genetics, Population , Haplotypes , Humans , Male , Microsatellite Repeats

An Introductory Overview of Open-Source and Commercial Software Options for the Analysis of Forensic Sequencing Data.

Huszar, Tunde I; Gettings, Katherine B; Vallone, Peter M.

Genes (Basel) ; 12(11)2021 10 29.

Article in English | MEDLINE | ID: mdl-34828345

ABSTRACT

The top challenges of adopting new methods for forensic DNA analysis in routine laboratories are often the capital investment and the expertise required to implement and validate such methods locally. In the case of next-generation sequencing, several specifically forensic commercial options have become available in the last decade, offering reliable and validated solutions. Despite this, the readily available expertise to analyze, interpret and understand such data is still perceived to be lagging behind. This review gives an introductory overview for forensic scientists who are beginning to implement next-generation sequencing locally and who, because most in the field do not have a bioinformatics background, may find it difficult to navigate the new terms and analysis options available. The currently available open-source and commercial software options for forensic sequencing data analysis are summarized here to provide an accessible starting point for those fairly new to the forensic application of massively parallel sequencing.

Subject(s)

Computational Biology/methods , DNA Fingerprinting/methods , Forensic Genetics/methods , Sequence Analysis, DNA/methods , Software , Data Interpretation, Statistical , High-Throughput Nucleotide Sequencing , Humans , Microsatellite Repeats

Subdividing Y-chromosome haplogroup R1a1 reveals Norse Viking dispersal lineages in Britain.

Lall, Gurdeep Matharu; Larmuseau, Maarten H D; Wetton, Jon H; Batini, Chiara; Hallast, Pille; Huszar, Tunde I; Zadik, Daniel; Aase, Sigurd; Baker, Tina; Balaresque, Patricia; Bodmer, Walter; Børglum, Anders D; de Knijff, Peter; Dunn, Hayley; Harding, Stephen E; Løvvik, Harald; Dupuy, Berit Myhre; Pamjav, Horolma; Tillmar, Andreas O; Tomaszewski, Maciej; Tyler-Smith, Chris; Verdugo, Marta Pereira; Winney, Bruce; Vohra, Pragya; Story, Joanna; King, Turi E; Jobling, Mark A.

Eur J Hum Genet ; 29(3): 512-523, 2021 03.

Article in English | MEDLINE | ID: mdl-33139852

ABSTRACT

The influence of Viking-Age migrants to the British Isles is obvious in archaeological and place-names evidence, but their demographic impact has been unclear. Autosomal genetic analyses support Norse Viking contributions to parts of Britain, but show no signal corresponding to the Danelaw, the region under Scandinavian administrative control from the ninth to eleventh centuries. Y-chromosome haplogroup R1a1 has been considered as a possible marker for Viking migrations because of its high frequency in peninsular Scandinavia (Norway and Sweden). Here we select ten Y-SNPs to discriminate informatively among hg R1a1 sub-haplogroups in Europe, analyse these in 619 hg R1a1 Y chromosomes including 163 from the British Isles, and also type 23 short-tandem repeats (Y-STRs) to assess internal diversity. We find three specifically Western-European sub-haplogroups, two of which predominate in Norway and Sweden, and are also found in Britain; star-like features in the STR networks of these lineages indicate histories of expansion. We ask whether geographical distributions of hg R1a1 overall, and of the two sub-lineages in particular, correlate with regions of Scandinavian influence within Britain. Neither shows any frequency difference between regions that have higher (≥10%) or lower autosomal contributions from Norway and Sweden, but both are significantly overrepresented in the region corresponding to the Danelaw. These differences between autosomal and Y-chromosomal histories suggest either male-specific contribution, or the influence of patrilocality. Comparison of modern DNA with recently available ancient DNA data supports the interpretation that two sub-lineages of hg R1a1 spread with the Vikings from peninsular Scandinavia.

Subject(s)

Chromosomes, Human, Y/genetics , Haplotypes , Human Migration , Evolution, Molecular , Humans , Male , Minisatellite Repeats , Pedigree , Polymorphism, Single Nucleotide , Scandinavian and Nordic Countries , United Kingdom

Mitigating the effects of reference sequence bias in single-multiplex massively parallel sequencing of the mitochondrial DNA control region.

Huszar, Tunde I; Wetton, Jon H; Jobling, Mark A.

Forensic Sci Int Genet ; 40: 9-17, 2019 05.

Article in English | MEDLINE | ID: mdl-30682697

ABSTRACT

Sequence analysis of the mitochondrial DNA (mtDNA) control region can provide forensically useful information, particularly in challenging samples where autosomal DNA profiling fails. Sub-division of the 1122-bp region into shorter PCR fragments improves data recovery, and such fragments can be analysed together via massively parallel sequencing (MPS). Here, we generate mtDNA data using the prototype PowerSeq™ Auto/Mito/Y System (Promega) MPS assay, in which a single PCR reaction amplifies ten overlapping amplicons of the control region, in a set of 101 highly diverse samples representing most major clades of the mtDNA phylogeny. The overlapping multiplex design leads to non-uniform coverage in the regions of overlap, where it is further increased by short amplicons generated alongside the intended products. Primer sequences in targeted amplification libraries are a potential source of reference sequence bias and thus should be removed, but the proprietary nature of the primers in commercial kits necessitates an alternative approach that minimises data loss: here, we introduce the bioinformatic selection of sequencing reads spanning putative primer sites (Overarching Read Enrichment Option, OREO). While OREO performs well in mitigating the effects of primer sequences at the ends of sequence reads, we still find evidence of the internalisation of primer-derived sequences by overlap extension, which may compromise the ability to call variants or to measure heteroplasmy in primer-binding regions. The commercially available PowerSeq™ CRM Nested System design prevents primer internalisation, as shown in a reanalysis of a subset of 57 samples that contain possible heteroplasmies. In combination with OREO, the CRM Nested kit mitigates reference sequence bias, allowing heteroplasmic variants to be estimated down to a 5% threshold. 
Provided appropriate steps are taken in data processing, single-reaction multiplex assays represent robust tools to analyse mtDNA control region variation. The OREO approach will allow users to bypass the effects of unknown primer sequences in any single-reaction tiled multiplex and eliminate primer-derived bias in overlapping amplicon sequencing studies, in both forensic and non-forensic settings.
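The read-selection idea behind OREO can be illustrated with a small filter. This is a sketch under stated assumptions and not the published implementation: the abstract says only that reads spanning putative primer sites are selected, so the `Read` structure, the coordinate convention, the `margin` parameter and the toy coordinates below are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Read(NamedTuple):
    name: str
    start: int  # 0-based alignment start, inclusive
    end: int    # alignment end, exclusive


def spans_site(read: Read, site: Tuple[int, int], margin: int = 0) -> bool:
    """True if the read covers the whole site plus `margin` bases each side.

    A read that merely starts or ends inside the site may carry
    primer-derived sequence at its end, so it is rejected.
    """
    site_start, site_end = site
    return read.start <= site_start - margin and read.end >= site_end + margin


def select_spanning_reads(reads: List[Read], site: Tuple[int, int],
                          margin: int = 5) -> List[Read]:
    """Keep only reads that overarch the putative primer site."""
    return [r for r in reads if spans_site(r, site, margin)]


# Hypothetical putative primer site and aligned reads.
primer_site = (100, 120)
reads = [
    Read("r1", 50, 200),   # overarches the site: kept
    Read("r2", 100, 250),  # starts at the site: rejected
    Read("r3", 0, 118),    # ends inside the site: rejected
    Read("r4", 95, 125),   # covers the site with exactly the margin: kept
]

selected = select_spanning_reads(reads, primer_site)
print([r.name for r in selected])
```

In this sketch the variant calls within the putative primer site would be made only from the selected reads, in which that region is internal to the read rather than at its end.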

Subject(s)

DNA, Mitochondrial/genetics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA , DNA Fingerprinting , Humans , Phylogeny , Polymerase Chain Reaction , Polymorphism, Single Nucleotide

A phylogenetic framework facilitates Y-STR variant discovery and classification via massively parallel sequencing.

Huszar, Tunde I; Jobling, Mark A; Wetton, Jon H.

Forensic Sci Int Genet ; 35: 97-106, 2018 07.

Article in English | MEDLINE | ID: mdl-29679929

ABSTRACT

Short tandem repeats on the male-specific region of the Y chromosome (Y-STRs) are permanently linked as haplotypes, and therefore Y-STR sequence diversity can be considered within the robust framework of a phylogeny of haplogroups defined by single nucleotide polymorphisms (SNPs). Here we use massively parallel sequencing (MPS) to analyse the 23 Y-STRs in Promega's prototype PowerSeq™ Auto/Mito/Y System kit (containing the markers of the PowerPlex® Y23 [PPY23] System) in a set of 100 diverse Y chromosomes whose phylogenetic relationships are known from previous megabase-scale resequencing. Including allele duplications and alleles resulting from likely somatic mutation, we characterised 2311 alleles, demonstrating 99.83% concordance with capillary electrophoresis (CE) data on the same sample set. The set contains 267 distinct sequence-based alleles (an increase of 58% compared to the 169 detectable by CE), including 60 novel Y-STR variants phased with their flanking sequences which have not been reported previously to our knowledge. Variation includes 46 distinct alleles containing non-reference variants of SNPs/indels in both repeat and flanking regions, and 145 distinct alleles containing repeat pattern variants (RPV). For DYS385a,b, DYS481 and DYS390 we observed repeat count variation in short flanking segments previously considered invariable, and suggest new MPS-based structural designations based on these. We considered the observed variation in the context of the Y phylogeny: several specific haplogroup associations were observed for SNPs and indels, reflecting the low mutation rates of such variant types; however, RPVs showed less phylogenetic coherence and more recurrence, reflecting their relatively high mutation rates. 
In conclusion, our study reveals considerable additional diversity at the Y-STRs of the PPY23 set via MPS analysis, demonstrates high concordance with CE data, facilitates nomenclature standardisation, and places Y-STR sequence variants in their phylogenetic context.
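The difference between length-based and sequence-based allele counts can be made concrete with a toy example. The sketch below uses invented repeat sequences and a hypothetical locus name. It counts distinct alleles both ways, then applies the same percentage calculation to the counts reported in the abstract (169 by CE, 267 by sequence).

```python
def count_alleles(sequences_by_locus):
    """Count distinct alleles by length and by full sequence.

    `sequences_by_locus` maps a locus name to a list of observed allele
    sequences. Two alleles of equal length but different sequence count
    once by length and twice by sequence.
    """
    by_length = set()
    by_sequence = set()
    for locus, sequences in sequences_by_locus.items():
        for seq in sequences:
            by_length.add((locus, len(seq)))
            by_sequence.add((locus, seq))
    return len(by_length), len(by_sequence)


def percent_increase(base, new):
    """Relative increase from `base` to `new`, in percent."""
    return 100.0 * (new - base) / base


# Invented alleles: two share a length of 40 bp but differ in one repeat.
toy = {
    "LOCUS_A": [
        "TCTA" * 10,
        "TCTA" * 9 + "TCTG",
        "TCTA" * 11,
    ]
}
n_length, n_sequence = count_alleles(toy)
toy_gain = percent_increase(n_length, n_sequence)

# Counts reported in the abstract.
reported_gain = percent_increase(169, 267)

print(n_length, n_sequence, toy_gain)
print(round(reported_gain))
```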

Subject(s)

Chromosomes, Human, Y , High-Throughput Nucleotide Sequencing , Microsatellite Repeats , DNA Fingerprinting , Genetic Variation , Humans , Male , Phylogeny , Polymerase Chain Reaction
