Search | VHL Regional Portal

1.

Machine Learning: a new era for cardiovascular pregnancy physiology and cardio-obstetrics research.

Ricci, Contessa A; Crysup, Benjamin; Phillips, Nicole R; Ray, William C; Santillan, Mark K; Trask, Aaron J; Woerner, August E; Goulopoulou, Styliani.

Am J Physiol Heart Circ Physiol ; 2024 Jun 07.

Article in English | MEDLINE | ID: mdl-38847756

ABSTRACT

The maternal cardiovascular system undergoes functional and structural adaptations during pregnancy and postpartum to support increased metabolic demands of offspring and placental growth, labor, and delivery, as well as recovery from childbirth. Pregnancy thus poses physiological stress upon the maternal cardiovascular system, and in the absence of an appropriate response it imparts potential risks for cardiovascular complications and adverse outcomes. The proportion of pregnancy-related maternal deaths from cardiovascular events has been steadily increasing, contributing to high rates of maternal mortality. Despite advances in cardiovascular physiology research, there is still no comprehensive understanding of maternal cardiovascular adaptations in healthy pregnancies, and far less is known about pregnancies with complications. Further, current tools for prognosis of cardiovascular complications during pregnancy are limited. Machine learning (ML) offers new and effective tools for investigating mechanisms involved in pregnancy-related cardiovascular complications as well as for developing potential therapies. The main goal of this review is to summarize existing research that uses ML to understand mechanisms of cardiovascular physiology during pregnancy and to develop prediction models for clinical application in pregnant patients. We also provide an overview of ML fundamentals and a discussion about platforms that can be used to enhance understanding of cardiovascular adaptations to pregnancy. Finally, we address the interpretability and explainability of ML outcomes, consequences of model bias, and ethics of ML use.

2.

Identifying distant relatives using benchtop-scale sequencing.

Woerner, August E; Novroski, Nicole M; Mandape, Sammed; King, Jonathan L; Crysup, Benjamin; Coble, Michael D.

Forensic Sci Int Genet ; 69: 103005, 2024 03.

Article in English | MEDLINE | ID: mdl-38171224

ABSTRACT

The genetic component of forensic genetic genealogy (FGG) is an estimate of kinship, often conducted at genome scales between a great number of individuals. The promise of FGG is substantial: in concert with genealogical records and other nongenetic information, it can indirectly identify a person of interest. A downside of FGG is cost, as it is currently expensive and requires chemistries uncommon to forensic genetic laboratories (microarrays and high throughput sequencing). The more common benchtop sequencers can be coupled with a targeted PCR assay to conduct FGG, though such approaches have limited resolution for kinship. This study evaluates low-pass sequencing, an alternative strategy that is accessible to benchtop sequencers and can produce resolutions comparable to high-pass sequencing. Samples from a three-generation pedigree were augmented to include up to 7th degree relatives (using whole genome pedigree simulations) and the ability to recover the true kinship coefficient was assessed using algorithms qualitatively similar to those found in GEDmatch. We show that up to 7th degree relatives can be reliably inferred from 1 × whole genome sequencing obtainable from desktop sequencers.
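The relationship between a kinship coefficient and a degree of relatedness can be sketched as follows. This is a minimal illustration, assuming the textbook expectation that d-th degree relatives have a kinship coefficient of 1/2^(d+1) and that degree is assigned by rounding on the log2 scale. The function names and the `max_degree` cutoff are hypothetical and do not come from the study or from GEDmatch.

```python
import math

def expected_kinship(degree):
    """Expected kinship coefficient for relatives of a given degree.

    Degree 0 is taken to mean the same person (or an identical twin): 0.5.
    Degree 1 (parent/child, full siblings): 0.25, degree 2: 0.125, and so on.
    """
    return 0.5 ** (degree + 1)

def infer_degree(kinship, max_degree=7):
    """Assign the nearest degree on a log2 scale, or None if unrelated.

    max_degree is an illustrative cutoff (assumption), beyond which a pair
    is reported as unrelated.
    """
    if kinship <= 0:
        return None
    # -log2(k) - 1 equals d exactly when k equals the expectation for degree d.
    degree = math.floor(-math.log2(kinship) - 1 + 0.5)
    degree = max(degree, 0)
    return degree if degree <= max_degree else None
```

Under these assumptions, an estimate of 0.125 maps to degree 2. A 7th degree pair has an expected kinship of 0.5 ** 8, which is why small estimation errors matter more for distant relatives.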

Subject(s)

Algorithms , High-Throughput Nucleotide Sequencing , Humans , Pedigree , Polymorphism, Single Nucleotide , Genotype , DNA Fingerprinting

3.

Mixture detection with Demixtify.

Woerner, August E; Crysup, Benjamin; King, Jonathan L; Novroski, Nicole M; Coble, Michael D.

Forensic Sci Int Genet ; 69: 102980, 2024 03.

Article in English | MEDLINE | ID: mdl-38016331

ABSTRACT

The de facto genetic markers of forensics are short tandem repeats (STRs). There are many analytical tools designed to work with STRs, including techniques for analyzing and assessing DNA mixtures. In contrast, the nascent field of forensic genetic genealogy often relies on biallelic single nucleotide polymorphisms (SNPs). Tools designed for the forensic assessment of SNPs are somewhat lacking, especially for DNA mixtures. In this paper we introduce Demixtify, a program that detects DNA mixtures using biallelic SNPs. Demixtify is quite powerful; highly imbalanced mixtures can be detected (≤1:99, considering in silico and in vitro mixtures) when coverage is ample. Demixtify can also detect mixtures in low coverage (∼1×) samples (when the mixture is relatively balanced). Demixtify includes an empirical estimator of sequence error that is specific to the markers assayed, making it especially relevant to the forensic community. Orthogonal techniques are also developed to characterize in vitro mixtures, as well as samples thought to be single source, and the results of these approaches serve to validate the techniques presented.

Subject(s)

DNA Fingerprinting , DNA , Humans , DNA/genetics , Sequence Analysis, DNA/methods , Polymorphism, Single Nucleotide , Microsatellite Repeats , High-Throughput Nucleotide Sequencing

4.

Using unique molecular identifiers to improve allele calling in low-template mixtures.

Crysup, Benjamin; Mandape, Sammed; King, Jonathan L; Muenzler, Melissa; Kapema, Kapema Bupe; Woerner, August E.

Forensic Sci Int Genet ; 63: 102807, 2023 03.

Article in English | MEDLINE | ID: mdl-36462297

ABSTRACT

PCR artifacts are an ever-present challenge in sequencing applications. These artifacts can seriously limit the analysis and interpretation of low-template samples and mixtures, especially with respect to a minor contributor. In medicine, molecular barcoding techniques have been employed to decrease the impact of PCR error and to allow the examination of low-abundance somatic variation. In principle, it should be possible to apply the same techniques to the forensic analysis of mixtures. To that end, several short tandem repeat loci were selected for targeted sequencing, and a bioinformatic pipeline for analyzing the sequence data was developed. The pipeline notes the relevant unique molecular identifiers (UMIs) attached to each read and, using machine learning, filters the noise products out of the set of potential alleles. To evaluate this pipeline, DNA from pairs of individuals was mixed at different ratios (1:1, 1:9) and sequenced with different starting amounts of DNA (10, 1 and 0.1 ng). Naïvely using the information in the molecular barcodes led to increased performance, with the machine learning resulting in an additional benefit. In concrete terms, using the UMI data results in less noise for a given amount of drop-out. For instance, if thresholds are selected that filter out a quarter of the true alleles, using read counts accepts 2381 noise alleles and using raw UMI counts accepts 1726 noise alleles, while the machine learning approach only accepts 307.
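The difference between read counts and raw UMI counts can be illustrated with a minimal sketch. The input format (pairs of UMI and allele strings), the function name, and the toy data are assumptions made for illustration. This is not the authors' pipeline, and it omits the machine learning filter entirely.

```python
from collections import Counter, defaultdict

def count_alleles(reads):
    """Return (read_counts, umi_counts) for an iterable of (umi, allele) pairs.

    read_counts tallies every read. umi_counts tallies distinct UMIs per
    allele, so PCR duplicates of one template molecule are counted once.
    """
    read_counts = Counter()
    umis_per_allele = defaultdict(set)
    for umi, allele in reads:
        read_counts[allele] += 1
        umis_per_allele[allele].add(umi)
    umi_counts = {allele: len(umis) for allele, umis in umis_per_allele.items()}
    return dict(read_counts), umi_counts

# Hypothetical toy data: allele "12" is seen on three distinct molecules,
# while product "11" is one molecule amplified four times.
reads = [("AAC", "12"), ("AAC", "12"), ("GTT", "12"), ("CCA", "12"),
         ("TGA", "11"), ("TGA", "11"), ("TGA", "11"), ("TGA", "11")]
read_counts, umi_counts = count_alleles(reads)
```

In this toy case both alleles have four reads, but the UMI counts are 3 and 1, so a threshold on UMI counts separates them where a threshold on read counts cannot.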

Subject(s)

DNA , High-Throughput Nucleotide Sequencing , Humans , Alleles , DNA/analysis , DNA Fingerprinting/methods , Sequence Analysis, DNA , Microsatellite Repeats

5.

Optimized variant calling for estimating kinship.

Woerner, August E; Mandape, Sammed; Kapema, Kapema Bupe; Duque, Tiffany M; Smuts, Amy; King, Jonathan L; Crysup, Benjamin; Wang, Xuewen; Huang, Meng; Ge, Jianye; Budowle, Bruce.

Forensic Sci Int Genet ; 61: 102785, 2022 11.

Article in English | MEDLINE | ID: mdl-36206658

ABSTRACT

One of the fundamental goals of forensic genetics is sample attribution, i.e., whether an item of evidence can be associated with some person or persons. The most common scenario involves a direct comparison, e.g., between DNA profiles from an evidentiary item and a sample collected from a person of interest. Less common is an indirect comparison in which kinship is used to potentially identify the source of the evidence. Because of the sheer amount of information lost in the hereditary process for comparison purposes, sampling a limited set of loci may not provide enough resolution to accurately resolve a relationship. Instead, whole genome techniques can sample the entirety of the genome or a sufficiently large portion of the genome and as such they may effect better relationship determinations. While relatively common in other areas of study, whole genome techniques have only begun to be explored in the forensic sciences. As such, bioinformatic pipelines are introduced for estimating kinship by massively parallel sequencing of whole genomes using approaches adapted from the medical and population genomic literature. The pipelines are designed to characterize a person's entire genome, not just some set of targeted markers. Two different variant callers are considered, contrasting a classical variant calling algorithm (BCFtools) to a more modern deep convolutional neural network (DeepVariant). Two different bioinformatic pipelines specific to each variant caller are introduced and evaluated in a titration series. Filters and thresholds are then optimized specifically for the purposes of estimating kinship as determined by the KING-robust algorithm. With the appropriate filtering and thresholds in place both tools perform similarly, with DeepVariant tending to produce more accurate genotypes, though the resultant types of inaccuracies tended to produce slightly less accurate overall estimates of relatedness.
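For readers unfamiliar with KING-robust, the sketch below shows the commonly cited form of the pairwise estimator, which uses only counts of heterozygous and opposite-homozygous sites. Genotypes are assumed to be coded as alternate-allele counts (0, 1, 2), with None for missing calls. The function name and coding are illustrative assumptions, and the formula should be checked against the KING documentation before any real use. It is not the pipeline described in the abstract.

```python
def king_robust(genotypes_i, genotypes_j):
    """Pairwise kinship estimate from two equal-length genotype lists.

    Sites where either genotype is None (missing) are skipped.
    """
    n_het_het = 0   # both individuals heterozygous
    n_opp_hom = 0   # opposite homozygotes (0 vs 2)
    n_het_i = 0     # heterozygous sites in individual i
    n_het_j = 0     # heterozygous sites in individual j
    for a, b in zip(genotypes_i, genotypes_j):
        if a is None or b is None:
            continue
        if a == 1:
            n_het_i += 1
        if b == 1:
            n_het_j += 1
        if a == 1 and b == 1:
            n_het_het += 1
        if {a, b} == {0, 2}:
            n_opp_hom += 1
    smaller = min(n_het_i, n_het_j)
    if smaller == 0:
        raise ValueError("no heterozygous sites; estimate undefined")
    return ((n_het_het - 2 * n_opp_hom) / (2 * smaller)
            + 0.5
            - (n_het_i + n_het_j) / (4 * smaller))
```

Comparing a sample with itself gives 0.5 under this form. Because the estimate depends directly on heterozygous and opposite-homozygous counts, miscalled heterozygotes shift it, which is one reason caller-specific filtering can matter for kinship.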

Subject(s)

High-Throughput Nucleotide Sequencing , Polymorphism, Single Nucleotide , Humans , High-Throughput Nucleotide Sequencing/methods , Computational Biology/methods , Genotype , Algorithms

6.

A genotype likelihood function for DNA mixtures.

Crysup, Benjamin; Woerner, August E.

Forensic Sci Int Genet ; 61: 102776, 2022 11.

Article in English | MEDLINE | ID: mdl-36152508

ABSTRACT

The recent advent of genetic genealogy has brought about a renewed interest in genome-scale forensic analyses, of which kinship estimation is a critical component. Most genomic kinship estimators consider SNPs (single nucleotide polymorphisms), often leveraging the co-inheritance of shared alleles to inform their analyses. While current estimators cannot directly evaluate mixed samples, there exist well-established SNP-based kinship estimators tailored to considering challenged samples, including low-pass whole genome sequencing. As an example, several studies have shown remarkable success in imputing genotype posterior probabilities in low template samples when linked sites are considered. Critical to these approaches is the ability to account for genotype uncertainty; the lack of an expression for a genotype likelihood in imbalanced mixtures has prevented direct application. This work develops such an expression. The formulation is fully compatible with genotype imputation software, suggesting a genomic pipeline that estimates genotype likelihoods, performs imputation, and then estimates kinship when the sample is a mixture. Further, when framed as an imbalanced mixture, the problem of mixture deconvolution is reducible to the problem of genotyping mixed samples. Herein, the ability to genotype two-person mixtures is assessed through example and in silico settings. While certain mixture scenarios and classes of sites are inherently inseparable, simulations of read depths between 60 and 190 appear to produce likelihoods of sufficient magnitude to deconvolve two-person mixtures whenever the mixture fraction is moderately imbalanced. The described approach and results suggest a path forward for estimating the kinship coefficient (and similar inferences on relatedness) when the sample is a mixture.
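To make the idea of a genotype likelihood in an imbalanced two-person mixture concrete, here is a minimal sketch for one biallelic SNP. It assumes a simple binomial read-sampling model with a known minor-contributor fraction and a symmetric per-read error rate. The model, the parameter names, and the default error rate are all illustrative assumptions, not the expression derived in the paper.

```python
from itertools import product
from math import comb

def mixture_genotype_likelihoods(n_ref, n_alt, minor_fraction, error=0.01):
    """Likelihood of the observed read counts for each joint genotype.

    Keys are (major_genotype, minor_genotype), each an alternate-allele
    count in {0, 1, 2}. minor_fraction is the minor contributor's share
    of the DNA (assumed known here).
    """
    depth = n_ref + n_alt
    likelihoods = {}
    for g_major, g_minor in product((0, 1, 2), repeat=2):
        # Expected alternate-allele fraction among template molecules.
        p_alt = (1 - minor_fraction) * g_major / 2 + minor_fraction * g_minor / 2
        # Symmetric sequencing error flips a read's allele with probability `error`.
        p_obs = p_alt * (1 - error) + (1 - p_alt) * error
        likelihoods[(g_major, g_minor)] = (
            comb(depth, n_alt) * p_obs ** n_alt * (1 - p_obs) ** n_ref
        )
    return likelihoods

# Hypothetical site: 80 reference reads, 20 alternate reads, 20% minor contributor.
site = mixture_genotype_likelihoods(n_ref=80, n_alt=20, minor_fraction=0.2)
best = max(site, key=site.get)
```

With these toy counts, the highest likelihood falls on a homozygous-reference major contributor and a homozygous-alternate minor contributor, (0, 2). In a balanced mixture (minor fraction 0.5), swapping the two genotypes gives the same expected allele fraction, which illustrates why some scenarios are inherently inseparable.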

Subject(s)

DNA Fingerprinting , DNA , Humans , Likelihood Functions , Genotype , DNA Fingerprinting/methods , Alleles , DNA/genetics , DNA/analysis

7.

Evaluating the Impact of Dropout and Genotyping Error on SNP-Based Kinship Analysis With Forensic Samples.

Turner, Stephen D; Nagraj, V P; Scholz, Matthew; Jessa, Shakeel; Acevedo, Carlos; Ge, Jianye; Woerner, August E; Budowle, Bruce.

Front Genet ; 13: 882268, 2022.

Article in English | MEDLINE | ID: mdl-35846115

ABSTRACT

Technological advances in sequencing and single nucleotide polymorphism (SNP) genotyping microarray technology have facilitated advances in forensic analysis beyond short tandem repeat (STR) profiling, enabling the identification of unknown DNA samples and distant relationships. Forensic genetic genealogy (FGG) has facilitated the identification of distant relatives of both unidentified remains and unknown donors of crime scene DNA, invigorating the use of biological samples to resolve open cases. Forensic samples are often degraded or contain only trace amounts of DNA. In this study, the accuracy of genome-wide relatedness methods and identity by descent (IBD) segment approaches was evaluated in the presence of challenges commonly encountered with forensic data: missing data and genotyping error. Pedigree whole-genome simulations were used to estimate the genotypes of thousands of individuals with known relationships using multiple populations with different biogeographic ancestral origins. Simulations were also performed with varying error rates and types. Using these data, the performance of different methods for quantifying relatedness was benchmarked across these scenarios. When the genotyping error was low (<1%), IBD segment methods outperformed genome-wide relatedness methods for close relationships and are more accurate at distant relationship inference. However, with an increasing genotyping error (1-5%), methods that do not rely on IBD segment detection are more robust and outperform IBD segment methods. The reduced call rate had little impact on either class of methods. These results have implications for the use of dense SNP data in forensic genomics for distant kinship analysis and FGG, especially when the sample quality is low.

8.

Techniques for estimating genetically variable peptides and semi-continuous likelihoods from massively parallel sequencing data.

Woerner, August E; Crysup, Benjamin; Hewitt, F Curtis; Gardner, Myles W; Freitas, Michael A; Budowle, Bruce.

Forensic Sci Int Genet ; 59: 102719, 2022 07.

Article in English | MEDLINE | ID: mdl-35526505

ABSTRACT

Forensic genetic investigations typically rely on analysis of DNA for attribution purposes. There are times, however, when the amount and/or the quality of the DNA is limited, and thus little or no information can be obtained regarding the source of the sample. An alternative biochemical target that also contains genetic signatures is protein. One class of genetic signatures is protein polymorphisms that are a direct consequence of simple/single/short nucleotide polymorphisms (SNPs) in DNA. However, to interpret protein polymorphisms in a forensic context, certain complexities must be understood and addressed. These complexities include: 1) SNPs can generate 0, 1, or arbitrarily many polymorphisms in a polypeptide; and 2) as an object of expression that is modulated by alleles, genes and interactions with the environment, proteins may be present or absent in a given sample. To address these issues, a novel approach was taken to generate the expected protein alleles in a reference sample based on whole genome (or exome) sequence data and assess the significance of the evidence using a haplotype-based semi-continuous likelihood algorithm that leverages whole proteome data. Converting the genomic information into the proteomic information allows for the zero-to-many relationship between SNPs and genetically variable peptides (GVPs) to be abstracted away. When viewed as a haplotype, many GVPs that correspond to the same SNP are equivalent to many SNPs in perfect linkage disequilibrium (LD). As long as the likelihood formulation correctly accounts for LD, the correspondence between the SNP and the proteome can be safely neglected. Tests were performed on simulated samples, including single-source and two-person mixtures, and the power of using a classical semi-continuous likelihood versus one that has been adapted to neglect drop-out was compared. Additionally, summary statistics and a rudimentary set of decision guidelines were introduced to help identify mixtures from protein data.

Subject(s)

Proteome , Proteomics , DNA/genetics , Genotype , High-Throughput Nucleotide Sequencing/methods , Humans , Peptides/analysis , Peptides/genetics , Polymorphism, Single Nucleotide , Proteome/genetics , Sequence Analysis, DNA

9.

Determining Informative Microbial Single Nucleotide Polymorphisms for Human Identification.

Sherier, Allison J; Woerner, August E; Budowle, Bruce.

Appl Environ Microbiol ; 88(7): e0005222, 2022 04 12.

Article in English | MEDLINE | ID: mdl-35285713

ABSTRACT

The skin microbiome is a highly abundant and relatively stable source of DNA that may be utilized for human identification (HID). In this study, a set of single nucleotide polymorphisms (SNPs) with a high mean estimated Wright's fixation index (FST) (>0.1) and widespread abundance (found in ≥75% of samples compared) were selected from a diverse set of markers in the hidSkinPlex panel. The least absolute shrinkage and selection operator (LASSO) was used in a novel machine learning framework to generate a SNP panel and predict the human host from skin microbiome samples collected from the hand, manubrium, and foot. The framework was devised to emulate a new unknown person introduced to the algorithm and to match samples from that person against a population database. Unknown samples were classified with 96% accuracy (Matthews correlation coefficient [MCC], 0.954) in the test (n = 225 samples) data set. A final panel of informative SNPs was determined for HID (hidSkinPlex+) using all 51 individuals sampled at three body sites in triplicate. The hidSkinPlex+ panel comprises 365 SNPs and yielded prediction accuracy for the correct host of 95% (MCC = 0.949). The accuracy of the hidSkinPlex+ panel may be somewhat overestimated due to using 26 individuals from the training data set for the selection of the final panel. However, this accuracy still provides an indication of performance when tested on new samples. IMPORTANCE One of the fundamental goals in forensic genetics is to identify the source of biological evidence. Methods for detecting human DNA have advanced and can be quite sensitive, but not all DNA samples are amenable to current methods. However, the human skin microbiome is a source of DNA with high copy numbers, and it has the potential for high discriminatory power. The hidSkinPlex panel has been used for HID; however, some aspects of it could be improved. 
Missing information is ambiguous, as it is unclear if marker drop-out is a by-product of a low-template sample or if the reasons for not observing a marker are biological. Such ambiguity may confound methods for HID, and as such, an improved marker set (hidSkinPlex+) was designed that is considerably smaller and more robust to drop-out (365 SNPs contained in 135 markers) yet still can be used to accurately predict the human host.

Subject(s)

Microbiota , Polymorphism, Single Nucleotide , DNA , Forensic Anthropology , Genotype , High-Throughput Nucleotide Sequencing/methods , Humans , Microbiota/genetics , Sequence Analysis, DNA

10.

skater: an R package for SNP-based kinship analysis, testing, and evaluation.

Turner, Stephen D; Nagraj, V P; Scholz, Matthew; Jessa, Shakeel; Acevedo, Carlos; Ge, Jianye; Woerner, August E; Budowle, Bruce.

F1000Res ; 11: 18, 2022.

Article in English | MEDLINE | ID: mdl-35222994

ABSTRACT

Motivation: SNP-based kinship analysis with genome-wide relationship estimation and IBD segment analysis methods produces results that often require further downstream processing and manipulation. A dedicated software package that consistently and intuitively implements this analysis functionality is needed. Results: Here we present the skater R package for SNP-based kinship analysis, testing, and evaluation with R. The skater package contains a suite of well-documented tools for importing, parsing, and analyzing pedigree data, performing relationship degree inference, benchmarking relationship degree classification, and summarizing IBD segment data. Availability: The skater package is implemented as an R package and is released under the MIT license at https://github.com/signaturescience/skater. Documentation is available at https://signaturescience.github.io/skater.

Subject(s)

Genome , Pedigree , Polymorphism, Single Nucleotide , Computational Biology , Humans , Software

11.

ProSynAR: a reference aware read merger.

Crysup, Benjamin; Budowle, Bruce; Woerner, August E.

Bioinformatics ; 38(7): 2052-2053, 2022 03 28.

Article in English | MEDLINE | ID: mdl-35020788

ABSTRACT

MOTIVATION: Read-merging algorithms that look solely at the reads can misalign and mis-merge the reads (especially near repetitive sequences). RESULTS: The C++ program ProSynAR has been written to take the reads' position in the reference into account when performing (and deciding whether to perform) a merge. AVAILABILITY: *Nix users can retrieve the source from GitHub (https://github.com/Benjamin-Crysup/prosynar). Windows binary available at https://github.com/Benjamin-Crysup/prosynar/releases/download/1.0/prosynar.zip. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

High-Throughput Nucleotide Sequencing , Software , Sequence Analysis, DNA , Algorithms , Repetitive Sequences, Nucleic Acid

12.

vcferr: Development, validation, and application of a single nucleotide polymorphism genotyping error simulation framework.

Nagraj, V P; Scholz, Matthew; Jessa, Shakeel; Ge, Jianye; Woerner, August E; Huang, Meng; Budowle, Bruce; Turner, Stephen D.

F1000Res ; 11: 775, 2022.

Article in English | MEDLINE | ID: mdl-38779458

ABSTRACT

Motivation: Genotyping error can impact downstream single nucleotide polymorphism (SNP)-based analyses. Simulating various modes and levels of error can help investigators better understand potential biases caused by miscalled genotypes. Methods: We have developed and validated vcferr, a tool to probabilistically simulate genotyping error and missingness in variant call format (VCF) files. We demonstrate how vcferr could be used to address a research question by introducing varying levels of error of different types into a sample in a simulated pedigree and assessing how kinship analysis degrades as a function of the kind and level of error. Software availability: vcferr is available for installation via PyPI (https://pypi.org/project/vcferr/) or conda (https://anaconda.org/bioconda/vcferr). The software is released under the MIT license with source code available on GitHub (https://github.com/signaturescience/vcferr).
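The general idea of probabilistic error and missingness simulation can be sketched in a few lines. The example below works on a plain list of genotypes coded as alternate-allele counts and uses a single error rate for all genotype changes. It does not use or reproduce vcferr's interface or its error model; the function name and parameters are hypothetical.

```python
import random

def simulate_genotype_error(genotypes, p_error=0.0, p_missing=0.0, seed=None):
    """Return a copy of `genotypes` with random errors and missing calls.

    Each genotype (0, 1, or 2) is first dropped to None with probability
    p_missing; otherwise it is replaced by a different genotype, chosen
    uniformly, with probability p_error.
    """
    rng = random.Random(seed)  # a local generator keeps runs reproducible
    simulated = []
    for genotype in genotypes:
        if rng.random() < p_missing:
            simulated.append(None)
        elif rng.random() < p_error:
            alternatives = [g for g in (0, 1, 2) if g != genotype]
            simulated.append(rng.choice(alternatives))
        else:
            simulated.append(genotype)
    return simulated
```

Running a kinship estimator on the output at increasing values of `p_error` or `p_missing` is one way to see how estimates degrade, in the spirit of the experiment the abstract describes.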

13.

Optimization of proteomics sample preparation for forensic analysis of skin samples.

Baniasad, Maryam; Reed, Andrew J; Lai, Stella M; Zhang, Liwen; Schulte, Kathleen Q; Smith, Alan R; LeSassier, Danielle S; Weber, Katharina L; Hewitt, F Curtis; Woerner, August E; Gardner, Myles W; Wysocki, Vicki H; Freitas, Michael A.

J Proteomics ; 249: 104360, 2021 10 30.

Article in English | MEDLINE | ID: mdl-34481086

ABSTRACT

We present an efficient protein extraction and in-solution enzymatic digestion protocol optimized for mass spectrometry-based proteomics studies of human skin samples. Human skin cells are a proteinaceous matrix that can enable forensic identification of individuals. We performed a systematic optimization of proteomic sample preparation for a protein-based human forensic identification application. Digestion parameters, including incubation duration, temperature, and the type and concentration of surfactant, were systematically varied to maximize digestion completeness. Through replicate digestions, parameter optimization was performed to maximize repeatability and increase the number of identified peptides and proteins. Final digestion conditions were selected based on the parameters that yielded the greatest percent of peptides with zero missed tryptic cleavages, which benefit the analysis of genetically variable peptides (GVPs). We evaluated the final digestion conditions for identification of GVPs by applying MS-based proteomics on a mixed-donor sample. The results were searched against a human proteome database appended with a database of GVPs constructed from known non-synonymous single nucleotide polymorphisms (SNPs) that occur at known population frequencies. The aim of this study was to demonstrate the potential of our proteomics sample preparation for future implementation of GVP analysis by forensic laboratories to facilitate human identification. SIGNIFICANCE: Genetically variable peptides (GVPs) can provide forensic evidence that is complementary to traditional DNA profiling and be potentially used for human identification. An efficient protein extraction and reproducible digestion method of skin proteins is a key contributor for downstream analysis of GVPs and further development of this technology in forensic application. In this study, we optimized the enzymatic digestion conditions, such as incubation time and temperature, for skin samples. 
Our study is among the first attempts towards optimization of proteomics sample preparation for protein-based skin identification in forensic applications such as touch samples. Our digestion method employs RapiGest (an acid-labile surfactant), trypsin enzymatic digestion, and an incubation time of 16 h at 37 °C.

Subject(s)

Peptides , Proteomics , Forensic Medicine , Humans , Mass Spectrometry , Proteome , Trypsin

14.

MMDIT: A tool for the deconvolution and interpretation of mitochondrial DNA mixtures.

Mandape, Sammed N; Smart, Utpal; King, Jonathan L; Muenzler, Melissa; Kapema, Kapema Bupe; Budowle, Bruce; Woerner, August E.

Forensic Sci Int Genet ; 55: 102568, 2021 11.

Article in English | MEDLINE | ID: mdl-34416654

ABSTRACT

Short tandem repeats of the nuclear genome have been the preferred markers for analyzing forensic DNA mixtures. However, when nuclear DNA in a sample is degraded or limited, mitochondrial DNA (mtDNA) markers provide a powerful alternative. Though historically considered challenging, the interpretation and analysis of mtDNA mixtures have recently seen renewed interest with the advent of massively parallel sequencing. However, there are only a few software tools available for mtDNA mixture interpretation. To address this gap, the Mitochondrial Mixture Deconvolution and Interpretation Tool (MMDIT) was developed. MMDIT is an interactive application complete with a graphical user interface that allows users to deconvolve mtDNA (whole or partial genomes) mixtures into constituent donor haplotypes and estimate random match probabilities on these resultant haplotypes. In cases where deconvolution might not be feasible, the software allows mixture analysis directly within a binary framework (i.e. qualitatively, only using data on allele presence/absence). This paper explains the functionality of MMDIT, using an example of an in vitro two-person mtDNA mixture with a ratio of 1:4. The uniqueness of MMDIT lies in its ability to resolve mixtures into complete donor haplotypes using a statistical phasing framework before mixture analysis and evaluating statistical weights employing a novel graph algorithm approach. MMDIT is the first available open-source software that can automate mtDNA mixture deconvolution and analysis. The MMDIT web application can be accessed online at https://www.unthsc.edu/mmdit/. The source code is available at https://github.com/SammedMandape/MMDIT_UI and archived on zenodo (https://doi.org/10.5281/zenodo.4770184).

Subject(s)

DNA, Mitochondrial , High-Throughput Nucleotide Sequencing , DNA, Mitochondrial/genetics , Haplotypes , Humans , Sequence Analysis, DNA , Software

15.

Population Informative Markers Selected Using Wright's Fixation Index and Machine Learning Improves Human Identification Using the Skin Microbiome.

Sherier, Allison J; Woerner, August E; Budowle, Bruce.

Appl Environ Microbiol ; 87(20): e0120821, 2021 09 28.

Article in English | MEDLINE | ID: mdl-34379455

ABSTRACT

Microbial DNA, shed from human skin, can be distinctive to its host and, thus, help individualize donors of forensic biological evidence. Previous studies have utilized single-locus microbial DNA markers (e.g., 16S rRNA) to assess the presence/absence of personal microbiota to profile human hosts. However, since the taxonomic composition of the microbiome is in constant fluctuation, this approach may not be sufficiently robust for human identification (HID). Multimarker approaches may be more powerful. Additionally, genetic differentiation, rather than taxonomic distinction, may be more individualizing. To this end, the nondominant hands of 51 individuals were sampled in triplicate (n = 153). They were analyzed for markers in the hidSkinPlex, a multiplex panel comprising candidate markers for skin microbiome profiling. Single-nucleotide polymorphisms (SNPs) with the highest Wright's fixation index (FST) estimates were then selected for predicting donor identity using a support vector machine (SVM) learning model. FST is an estimate of the genetic differences within and between populations. Three different SNP selection criteria were employed: SNPs with the highest-ranking FST estimates (i) common between any two samples regardless of markers present (termed overall); (ii) each marker common between samples (termed per marker); and (iii) common to all samples used to train the SVM algorithm for HID (termed selected). The SNPs chosen based on criteria for overall, per marker, and selected methods resulted in an accuracy of 92.00%, 94.77%, and 88.00%, respectively. The results support that estimates of FST, combined with SVM, can notably improve forensic HID via skin microbiome profiling. IMPORTANCE There is a need for additional genetic information to help identify the source of biological evidence found at a crime scene. The human skin microbiome is a potentially abundant source of DNA that can enable the identification of a donor of biological evidence. 
With microbial profiling for human identification, there will be an additional source of DNA to identify individuals as well as to exclude individuals wrongly associated with biological evidence, thereby improving the utility of forensic DNA profiling to support criminal investigations.
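As a rough illustration of how FST contrasts within-population and between-population variation, the sketch below computes it for one biallelic SNP as (H_T - H_S) / H_T, with populations weighted equally. This is a textbook form chosen as an assumption, not necessarily the estimator used in the study, and the function name is hypothetical. In a microbiome setting, each "population" might correspond to the samples from one host.

```python
def fst_biallelic(allele_freqs):
    """FST for one biallelic SNP from per-population allele frequencies.

    H_S is the mean expected heterozygosity within populations; H_T is the
    expected heterozygosity of the pooled (equally weighted) population.
    """
    n_pops = len(allele_freqs)
    p_bar = sum(allele_freqs) / n_pops
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # the site is monomorphic overall, so there is no differentiation
    h_within = sum(2 * p * (1 - p) for p in allele_freqs) / n_pops
    return (h_total - h_within) / h_total
```

Frequencies of 0.2 and 0.8 in two populations give an FST of 0.36 under this form, while identical frequencies give 0. Ranking SNPs by such a value favors sites whose alleles differ between hosts more than within them.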

Subject(s)

Microbiota , Skin/microbiology , Bacteria/genetics , Forensic Anthropology , Humans , Machine Learning , Polymorphism, Single Nucleotide , Support Vector Machine

16.

Evaluation of Promega PowerSeq™ Auto/Y systems prototype on an admixed sample of Rio de Janeiro, Brazil: Population data, sensitivity, stutter and mixture studies.

Moura-Neto, Rodrigo; King, Jonathan L; Mello, Isadora; Dias, Victor; Crysup, Benjamin; Woerner, August E; Budowle, Bruce; Silva, Rosane.

Forensic Sci Int Genet ; 53: 102516, 2021 07.

Article in English | MEDLINE | ID: mdl-33878618

ABSTRACT

Forensic DNA typing typically relies on the length-based (LB) separation of PCR products containing short tandem repeat loci (STRs). Massively parallel sequencing (MPS) elucidates an additional level of STR motif and flanking region variation. Also, MPS enables simultaneous analysis of different marker types (autosomal STRs and SNPs for lineage and identification purposes), reducing both the amount of sample used and the turn-around-time of analysis. Therefore, MPS methodologies are being considered as an additional tool in forensic genetic casework. The PowerSeq™ Auto/Y System (Promega Corp), a multiplex forensic kit for MPS, enables analysis of the 22 autosomal STR markers (plus Amelogenin) from the PowerPlex® Fusion 6C kit and 23 Y-STR markers from the PowerPlex® Y23 kit. Population data were generated from 140 individuals from an admixed sample from Rio de Janeiro, Brazil. All samples were processed according to the manufacturers' recommended protocols. Raw data (FastQ) were generated for each indexed sample and analyzed using STRait Razor v2s and the PowerSeqv2.config file. The subsequent population data showed the largest increase in expected heterozygosity (23%), from LB to sequence-based (SB) analyses at the D5S818 locus. An unreported allele was found at the D21S11 locus. The random match probability across all loci decreased from 5.9 × 10⁻²⁸ to 7.6 × 10⁻³³. Sensitivity studies using 1, 0.25, 0.062 and 0.016 ng of DNA input were analyzed in triplicate. Full Y-STR profiles were detected in all samples, and no autosomal allele drop-out was observed with 62 pg of input DNA. For mixture studies, 1 ng of genomic DNA from a male and female sample at 1:1, 1:4, 1:9, 1:19 and 1:49 proportions were analyzed in triplicate. Clearly resolvable alleles (i.e., no stacking or shared alleles) were obtained at a 1:19 male to female contributor ratio. 
The minus one stutter (-1) increased with the longest uninterrupted stretch (LUS) allele size reads and according to simple or compound/complex repeats. The haplotype-specific stutter rates add more information for mixed samples interpretation. These data support the use of the PowerSeqTM Auto/Y systems prototype kit (22 autosomal STR loci, 23 Y-STR loci and Amelogenin) for forensic genetics applications.
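
As a rough illustration of why sequence-based (SB) typing can raise expected heterozygosity relative to length-based (LB) typing, the sketch below collapses sequence-level alleles to their lengths and compares the two values. The allele labels, frequencies, and function names are hypothetical and are not taken from the study; expected heterozygosity is computed as one minus the sum of squared allele frequencies.

```python
from collections import defaultdict


def expected_heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(f * f for f in freqs)


def collapse_to_length(seq_freqs):
    """Sum sequence-level allele frequencies that share the same repeat length."""
    by_length = defaultdict(float)
    for (length, _variant), freq in seq_freqs.items():
        by_length[length] += freq
    return dict(by_length)


# Hypothetical locus: two sequence variants ("a", "b") share length 12.
sb_freqs = {(12, "a"): 0.3, (12, "b"): 0.3, (13, "a"): 0.4}
lb_freqs = collapse_to_length(sb_freqs)

het_lb = expected_heterozygosity(lb_freqs.values())  # approximately 0.48
het_sb = expected_heterozygosity(sb_freqs.values())  # approximately 0.66
relative_increase = (het_sb - het_lb) / het_lb
```

Length-based typing cannot distinguish the two length-12 variants, so it sees fewer, more common alleles and therefore a lower expected heterozygosity at this made-up locus.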

Subject(s)

DNA Fingerprinting/instrumentation , High-Throughput Nucleotide Sequencing/instrumentation , Microsatellite Repeats , Brazil , Chromosomes, Human, Y , Female , Gene Frequency , Genetic Markers , Humans , Male , Polymerase Chain Reaction , Sequence Analysis, DNA

17.

A Continuous Statistical Phasing Framework for the Analysis of Forensic Mitochondrial DNA Mixtures.

Smart, Utpal; Cihlar, Jennifer Churchill; Mandape, Sammed N; Muenzler, Melissa; King, Jonathan L; Budowle, Bruce; Woerner, August E.

Genes (Basel) ; 12(2)2021 01 20.

Article in English | MEDLINE | ID: mdl-33498312

ABSTRACT

Despite the benefits of quantitative data generated by massively parallel sequencing, resolving mitotypes from mixtures occurring in certain ratios remains challenging. In this study, a bioinformatic mixture deconvolution method centered on population-based phasing was developed and validated. The method was first tested on 270 in silico two-person mixtures varying in mixture proportions. An assortment of external reference panels containing information on haplotypic variation (from similar and different haplogroups) was leveraged to assess the effect of panel composition on phasing accuracy. Building on these simulations, mitochondrial genomes from the Human Mitochondrial DataBase were sourced to populate the panels and key parameter values were identified by deconvolving an additional 7290 in silico two-person mixtures. Finally, employing an optimized reference panel and phasing parameters, the approach was validated with in vitro two-person mixtures with differing proportions. Deconvolution was most accurate when the haplotypes in the mixture were similar to haplotypes present in the reference panel and when the mixture ratios were neither highly imbalanced nor subequal (e.g., 4:1). Overall, errors in haplotype estimation were largely bounded by the accuracy of the mixture's genotype results. The proposed framework is the first available approach that automates the reconstruction of complete individual mitotypes from mixtures, even in ratios that have traditionally been considered problematic.

Subject(s)

DNA, Mitochondrial , Forensic Genetics/methods , High-Throughput Nucleotide Sequencing , Models, Statistical , Algorithms , Bayes Theorem , Computational Biology/methods , Genome, Mitochondrial , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Polymorphism, Single Nucleotide , Reproducibility of Results , Sequence Analysis, DNA/methods

18.

STRait Razor Online: An enhanced user interface to facilitate interpretation of MPS data.

King, Jonathan L; Woerner, August E; Mandape, Sammed N; Kapema, Kapema Bupe; Moura-Neto, Rodrigo Soares; Silva, Rosane; Budowle, Bruce.

Forensic Sci Int Genet ; 52: 102463, 2021 05.

Article in English | MEDLINE | ID: mdl-33493821

ABSTRACT

Since 2013, STRait Razor has enabled analysis of massively parallel sequencing (MPS) data from various marker systems such as short tandem repeats, single nucleotide polymorphisms, insertion/deletions, and mitochondrial DNA. In this paper, STRait Razor Online (SRO), available at https://www.unthsc.edu/straitrazor, is introduced as an interactive, Shiny-based user interface for primary analysis of MPS data and secondary analysis of STRait Razor haplotype pileups. This software can be accessed from any common browser via desktop, tablet, or smartphone device. SRO is available also as a standalone application and open-source R script available at https://github.com/ExpectationsManaged/STRaitRazorOnline. The local application is capable of batch processing of both fastq files and primary analysis output. Processed batches generate individual report folders and summary reports at the locus- and haplotype-level in a matter of minutes. For example, the processing of data from ~700 samples generated with the ForenSeq Signature Preparation Kit from allsequences.txt to a final table can be performed in ~40 min whereas the Excel-based workbooks can take 35-60 h to compile a subset of the tables generated by SRO. To facilitate analysis of single-source, reference samples, a preliminary triaging system was implemented that calls potential alleles and flags loci suspected of severe heterozygote imbalance. When compared to published, manually curated data sets, 98.72 % of software-assigned allele calls without manual interpretation were consistent with curated data sets, 0.99 % loci were presented to the user for interpretation due to heterozygote imbalance, and the remaining 0.29 % of loci were inconsistent due to the analytical thresholds used across the studies.

Subject(s)

High-Throughput Nucleotide Sequencing , Software , User-Computer Interface , DNA Fingerprinting , Humans , Internet , Microsatellite Repeats , Sequence Analysis, DNA

19.

ProDerAl: reference position dependent alignment.

Crysup, Benjamin; Budowle, Bruce; Woerner, August E.

Bioinformatics ; 37(16): 2479-2480, 2021 Aug 25.

Article in English | MEDLINE | ID: mdl-33459758

ABSTRACT

MOTIVATION: Current read-mapping software uses a singular specification of alignment parameters with respect to the reference. In the presence of varying reference structures (such as the repetitive regions of the human genome), alignments can be improved if those parameters are allowed to vary. RESULTS: To that end, the C++ program ProDerAl was written to refine previously generated alignments using varying parameters for these problematic regions. Synthetic benchmarks show that this realignment can result in an order of magnitude fewer misaligned bases. AVAILABILITY AND IMPLEMENTATION: *Nix users can retrieve the source from GitHub (https://github.com/Benjamin-Crysup/proderal.git). Windows binary available at https://github.com/Benjamin-Crysup/proderal/releases/download/v1.1/proderal.zip. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

20.

mixIndependR: a R package for statistical independence testing of loci in database of multi-locus genotypes.

Song, Bing; Woerner, August E; Planz, John.

BMC Bioinformatics ; 22(1): 12, 2021 Jan 06.

Article in English | MEDLINE | ID: mdl-33407074

ABSTRACT

BACKGROUND: Multi-locus genotype data are widely used in population genetics and disease studies. In evaluating the utility of multi-locus data, the independence of markers is commonly considered in many genomic assessments. Generally, pairwise non-random associations are tested by linkage disequilibrium; however, the dependence of one panel might be triplet, quartet, or other. Therefore, compatible and user-friendly software is necessary for testing and assessing the global linkage disequilibrium among mixed genetic data. RESULTS: This study describes a software package for testing the mutual independence of mixed genetic datasets. Mutual independence is defined as no non-random associations among all subsets of the tested panel. The new R package "mixIndependR" calculates basic genetic parameters like allele frequency, genotype frequency, heterozygosity, Hardy-Weinberg equilibrium, and linkage disequilibrium (LD) by mutual independence from population data, regardless of the type of markers, such as simple nucleotide polymorphisms, short tandem repeats, insertions and deletions, and any other genetic markers. A novel method of assessing the dependence of mixed genetic panels is developed in this study and functionally analyzed in the software package. By comparing the observed distribution of two common summary statistics (the number of heterozygous loci [K] and the number of shared alleles [X]) with their expected distributions under the assumption of mutual independence, the overall independence is tested. CONCLUSION: The package "mixIndependR" is compatible with all categories of genetic markers and detects the overall non-random associations. Compared to pairwise disequilibrium, the approach described herein tends to have higher power, especially when the number of markers is large. With this package, more multi-functional or stronger genetic panels can be developed, like mixed panels with different kinds of markers. In population genetics, the package "mixIndependR" makes it possible to discover more about admixture of populations, natural selection, genetic drift, and population demographics, as a more powerful method of detecting LD. Moreover, this new approach can optimize variant selection in disease studies and contribute to panel combination for treatments in multimorbidity. Application of this approach in real data is expected in the future, and this might bring a leap in the field of genetic technology. AVAILABILITY: The R package mixIndependR is available on the Comprehensive R Archive Network (CRAN) at: https://cran.r-project.org/web/packages/mixIndependR/index.html
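
The expected distribution of the number of heterozygous loci (K) under mutual independence can be sketched as follows. This is not the mixIndependR implementation, which is an R package; it is a minimal Python illustration that assumes each locus is heterozygous independently with a known probability, in which case K follows a Poisson binomial distribution. The function name and example probabilities are hypothetical.

```python
def heterozygous_count_distribution(het_probs):
    """Return [P(K = 0), ..., P(K = n)] for n independent loci.

    het_probs[i] is the assumed probability that locus i is heterozygous.
    The distribution is built one locus at a time by dynamic programming.
    """
    dist = [1.0]  # with zero loci, K = 0 with certainty
    for h in het_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            nxt[k] += p * (1.0 - h)  # this locus is homozygous
            nxt[k + 1] += p * h      # this locus is heterozygous
        dist = nxt
    return dist


# Two hypothetical loci, each heterozygous with probability 0.5.
example = heterozygous_count_distribution([0.5, 0.5])  # [0.25, 0.5, 0.25]
```

An observed histogram of K across individuals could then be compared with this expected distribution, for instance with a goodness-of-fit test; the abstract does not specify which test the package uses.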

Subject(s)

Genetic Loci/genetics , Genomics/methods , Software , Databases, Genetic , Genotype , Linkage Disequilibrium/genetics
