Search | VHL Regional Portal

1.

ProbRank: An efficient DNA database search method for complex mixtures per a quantitative likelihood ratio model.

Hoogenboom, Jerry; Sijen, Titia; Benschop, Corina.

Forensic Sci Int Genet ; 65: 102884, 2023 07.

Article in English | MEDLINE | ID: mdl-37150077

ABSTRACT

Searching a DNA database with a DNA profile from an evidentiary trace can provide investigative leads in a forensic case. Various search approaches exist, such as conventional methods based on matching alleles and more advanced methods that compute likelihood ratios (LRs) while accounting for drop-in and drop-out. Here we examine the potential of using a quantitative LR model that takes peak heights into account (the EuroForMix model incorporated in the ProbRank method), compared with a qualitative LR model (the LRmix model implemented in the SmartRank method). Both methods present DNA database candidates in order of decreasing LR. Especially for minor contributors in complex mixtures, the method using the quantitative model outperforms the method using the qualitative model in terms of sensitivity and specificity, as more true donors and fewer adventitious matches are retrieved. ProbRank is to be implemented in DNAStatistX and is sufficiently fast for daily use.
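
The last step described above, presenting database candidates in order of decreasing LR, can be sketched as follows. This is a minimal illustration under stated assumptions: the `Candidate` type, the `rank_candidates` helper and its threshold parameter are hypothetical, and the LR values are assumed to be precomputed by some model (neither EuroForMix nor LRmix is reproduced here). LRs are handled on the log10 scale so that very large values do not overflow.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    profile_id: str   # hypothetical database profile identifier
    log10_lr: float   # precomputed log10 likelihood ratio for this candidate


def rank_candidates(
    candidates: List[Candidate],
    min_log10_lr: float = 0.0,
    top_n: Optional[int] = None,
) -> List[Candidate]:
    """Return candidates with log10 LR >= min_log10_lr, sorted by decreasing LR.

    Hypothetical helper: only the ranking step is shown, not the LR model.
    """
    kept = [c for c in candidates if c.log10_lr >= min_log10_lr]
    kept.sort(key=lambda c: c.log10_lr, reverse=True)
    return kept if top_n is None else kept[:top_n]


if __name__ == "__main__":
    database_hits = [
        Candidate("A", 2.5),
        Candidate("B", 7.1),
        Candidate("C", -1.0),
        Candidate("D", 4.0),
    ]
    for c in rank_candidates(database_hits):
        print(c.profile_id, c.log10_lr)
```

Keeping the threshold and list length as parameters reflects that a ranked list is meant as a set of investigative leads to review, not a single answer.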

Subject(s)

Databases, Nucleic Acid , Software , Humans , DNA Fingerprinting/methods , Likelihood Functions , Complex Mixtures/genetics , Microsatellite Repeats

2.

Advancing FDSTools by integrating STRNaming 1.1.

Hoogenboom, Jerry; Weiler, N; Busscher, L; Struik, L; Sijen, Titia; van der Gaag, Kristiaan J.

Forensic Sci Int Genet ; 61: 102768, 2022 11.

Article in English | MEDLINE | ID: mdl-35994887

ABSTRACT

The introduction of massively parallel sequencing in forensic analysis has been facilitated by typing kits, analysis software and allele naming tools such as the ForenSeq DNA Signature Prep (DSP) kit, FDSTools and STRNaming, respectively. Here we describe how FDSTools 2.0, with integrated and refined STRNaming nomenclature, was validated for implementation under ISO 17025 accreditation for the ForenSeq DSP kit. Newly added options result in efficient automatic allele calling for the majority of markers, while specific settings are applied to 'novel' sequence variants to avoid calling the residual variable noise observed in samples sequenced with the ForenSeq DSP kit, which seems to arise during PCR. Genome-wide built-in reference data greatly simplify the configuration of allele naming for human targets.

Subject(s)

DNA Fingerprinting , Microsatellite Repeats , Humans , High-Throughput Nucleotide Sequencing , Alleles , DNA , Sequence Analysis, DNA , Polymorphism, Single Nucleotide

3.

The LOVD3 platform: efficient genome-wide sharing of genetic variants.

Fokkema, Ivo F A C; Kroon, Mark; López Hernández, Julia A; Asscheman, Daan; Lugtenburg, Ivar; Hoogenboom, Jerry; den Dunnen, Johan T.

Eur J Hum Genet ; 29(12): 1796-1803, 2021 12.

Article in English | MEDLINE | ID: mdl-34521998

ABSTRACT

Gene variant databases are the backbone of DNA-based diagnostics. These databases, also called Locus-Specific DataBases (LSDBs), store information on variants in the human genome and their observed phenotypic consequences. The largest collection of public databases uses the free, open-source LOVD software platform. To cope with the current demand for online databases, we have entirely redesigned the LOVD software. LOVD3 is genome-centered and can be used to store summary variant data, as well as full case-level data with information on individuals, phenotypes, screenings, and variants. While built on a standard core, the software is highly flexible and allows personalization to cope with the widely differing demands of gene/disease database curators. LOVD3 follows current standards and includes tools to check variant descriptions, generate HTML files of reference sequences, predict the consequences of exon deletions/duplications on the reading frame, and link to genomic views in the different genome browsers. It includes APIs to collect and submit data. The software is used by about 100 databases, of which 56 public LOVD instances are registered on our website and together contain 1,000,000,000 variant observations in 1,500,000 individuals. Forty-two LOVD instances share data with the federated LOVD data network containing 3,000,000 unique variants in 23,000 genes. This network can be queried directly, quickly identifying LOVD instances containing relevant information on a searched variant.

Subject(s)

Databases, Genetic/standards , Polymorphism, Genetic , Software , Genetic Predisposition to Disease , Genome, Human , Genome-Wide Association Study/methods , Humans

4.

Application of a probabilistic genotyping software to MPS mixture STR data is supported by similar trends in LRs compared with CE data.

Benschop, Corina C G; van der Gaag, Kristiaan J; de Vreede, Jennifer; Backx, Anouk J; de Leeuw, Rick H; Zuñiga, Sofia; Hoogenboom, Jerry; de Knijff, Peter; Sijen, Titia.

Forensic Sci Int Genet ; 52: 102489, 2021 05.

Article in English | MEDLINE | ID: mdl-33677249

ABSTRACT

The interpretation of short tandem repeat (STR) profiles can be challenging when, for example, alleles are masked due to allele sharing among contributors and/or are subject to drop-out, for instance from sample degradation. Mixture interpretation can be improved by increasing the number of STRs and/or using loci with a higher discriminatory power. Both capillary electrophoresis (CE, 6-dye) and massively parallel sequencing (MPS) provide a platform for analysing relatively large numbers of autosomal STRs. In addition, MPS distinguishes between sequence variants, which increases discriminatory power. MPS also allows small amplicon sizes for all loci, as spacing by size is not an issue, which is beneficial with degraded DNA. Altogether, MPS has the potential to increase the weight of evidence for true contributors to (complex) DNA profiles. In this study, likelihood ratio (LR) calculations were performed using STR profiles obtained with two different MPS systems and analysed using different settings: 1) MPS PowerSeq™ Auto System profiles analysed using FDSTools equipped with optimized settings such as noise correction, 2) ForenSeq™ DNA Signature Prep Kit profiles analysed using the default settings in the Universal Analysis Software (UAS), and 3) ForenSeq™ DNA Signature Prep Kit profiles analysed using FDSTools empirically adapted to cope with one-directional reads and provisional, basic settings. The LRs were calculated with the continuous model EuroForMix from genotyping data for two- to four-person mixtures varying in mixture proportion, level of drop-out and allele sharing. The LR results for the more than 2000 sets of propositions were affected by the differences in the number of markers and analysis settings among the three approaches. Nevertheless, trends for true and non-contributors, effects of replicates, assigned number of contributors, and model validation results were comparable for the three MPS approaches and similar to the trends known for CE data. Based on this analogy, we regard the probabilistic interpretation of MPS STR data as fit for forensic DNA casework. In addition, guidelines were derived on when to apply LR calculations to MPS autosomal STR data and how to report the corresponding results.

Subject(s)

DNA Fingerprinting , High-Throughput Nucleotide Sequencing , Likelihood Functions , Software , Alleles , Electrophoresis, Capillary , Genotype , Humans , Microsatellite Repeats , Sequence Analysis, DNA

5.

STRNaming: Generating simple, informative names for sequenced STR alleles in a standardised and automated manner.

Hoogenboom, Jerry; Sijen, Titia; van der Gaag, Kristiaan J.

Forensic Sci Int Genet ; 52: 102473, 2021 05.

Article in English | MEDLINE | ID: mdl-33607395

ABSTRACT

The introduction of massively parallel sequencing in the forensic domain has exposed the need for a comprehensive nomenclature for sequenced short tandem repeat (STR) alleles. In general, three strategies are available: 1) the full sequence mapped to the human genome reference sequence, which ensures exact data exchange; 2) shortened, human-readable formats for forensic reporting and data presentation; and 3) very short codes that enable compact figures and tables but do not convey any sequence information. Here, we describe an algorithm of the second type: STRNaming, which generates human-readable names for sequenced STR alleles. STRNaming is guided by a reference sequence at each locus and then functions independently to automatically assign a unique, sequence-descriptive name that also includes the capillary electrophoresis allele number. STRNaming settings were established based on preferences surveyed internationally in the forensic community. These settings ensure that a small change in the sequence corresponds to a small change in the allele name, which is helpful for recognising, for instance, stutter products. Sequence variants outside of the repeat units are indicated as simple variant calls. Since the STR name is sequence-descriptive, the sequence can be traced back from the allele name. Because STRNaming is fully guided by an assignable reference sequence, no central coordination or configuration is required, and the method will work for any STR locus in current or future use, whether autosomal, Y- or X-chromosomal. The algorithm is publicly available online and offline.
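
STRNaming's actual naming rules are more elaborate than can be shown here. As a simplified illustration of the general idea of a short, sequence-descriptive name from which the sequence can be traced back, the sketch below collapses an STR sequence into a bracketed repeat notation. The function name, the greedy matching strategy, the caller-supplied list of repeat units and the output format are all assumptions, not STRNaming's algorithm or output.

```python
from typing import List


def bracket_notation(sequence: str, units: List[str], min_repeats: int = 2) -> str:
    """Collapse an STR sequence into bracketed repeat blocks, e.g. '[TCTA]3 [TCTG]2'.

    Hypothetical, simplified illustration: at each position the repeat unit with the
    most consecutive copies is chosen greedily; bases not covered by at least
    `min_repeats` copies of any unit are written out literally.
    """
    units = [u for u in units if u]  # an empty unit would never advance the scan
    blocks: List[str] = []
    literal = ""
    i = 0
    while i < len(sequence):
        best_unit, best_count = "", 0
        for unit in units:
            count = 0
            while sequence.startswith(unit, i + count * len(unit)):
                count += 1
            if count > best_count:
                best_unit, best_count = unit, count
        if best_count >= min_repeats:
            if literal:  # flush pending non-repeat bases before the block
                blocks.append(literal)
                literal = ""
            blocks.append(f"[{best_unit}]{best_count}")
            i += best_count * len(best_unit)
        else:
            literal += sequence[i]  # base not part of any repeat block
            i += 1
    if literal:
        blocks.append(literal)
    return " ".join(blocks)


print(bracket_notation("TCTA" * 3 + "TCTG" * 2, ["TCTA", "TCTG"]))
```

Because each block records the unit and its copy number, the sequence can be rebuilt from the name, and losing one repeat unit (as in a typical stutter product) changes a single number, for example `[TCTA]3 [TCTG]2` becoming `[TCTA]2 [TCTG]2`.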

Subject(s)

Algorithms , Alleles , Microsatellite Repeats , DNA Fingerprinting , Genome, Human , Humans , Sequence Analysis, DNA

6.

Multi-laboratory validation of DNAxs including the statistical library DNAStatistX.

Benschop, Corina C G; Hoogenboom, Jerry; Bargeman, Fiep; Hovers, Pauline; Slagter, Martin; van der Linden, Jennifer; Parag, Raymond; Kruise, Dennis; Drobnic, Katja; Klucevsek, Gregor; Parson, Walther; Berger, Burkhard; Laurent, Francois Xavier; Faivre, Magalie; Ulus, Ayhan; Schneider, Peter; Bogus, Magdalena; Kneppers, Alexander L J; Sijen, Titia.

Forensic Sci Int Genet ; 49: 102390, 2020 11.

Article in English | MEDLINE | ID: mdl-32937255

ABSTRACT

This study describes a multi-laboratory validation of DNAxs, a DNA eXpert System for the data management and probabilistic interpretation of DNA profiles [1], and its statistical library DNAStatistX, in which four laboratories participated besides the organising laboratory. The software was modified to read multiple data formats, and the study was performed prior to the release of the software to the forensic community. The first exercise explored all main functionalities of DNAxs, with feedback on user-friendliness, installation and general performance. Next, every laboratory performed likelihood ratio (LR) calculations using its own dataset and a dataset provided by the organising laboratory. The organising laboratory performed LR calculations using all datasets. The datasets were generated with different STR typing kits or analysis systems and consisted of samples varying in DNA amount, mixture ratio, number of contributors and drop-out level. Hypothesis sets had the correct, an under- or an over-assigned number of contributors, and true or false donors as person of interest. When comparing the results between laboratories, the LRs were mostly within one unit on the log10 scale. The few LR results that deviated more showed differences in the parameters estimated by the optimizer within DNAStatistX. Some of these were indicated by failed iteration results, others by a failed model validation, since unrealistic hypotheses were included. When these results that do not meet the quality criteria were excluded, in accordance with interpretation guidelines, none of the analyses in the different laboratories yielded a different statement in the casework report. Nonetheless, changes in software parameters that minimized differences in outcomes were sought, which made the DNAStatistX module more robust. Overall, the software was found intuitive, user-friendly and valid for use in multiple laboratories.
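
The between-laboratory comparison described above (LRs mostly within one unit on the log10 scale) can be expressed as a simple check. This is a sketch under assumptions: the dictionaries, hypothesis-set IDs and LR values are made up for illustration, and the one-unit tolerance mirrors the figure in the abstract rather than any documented DNAStatistX setting.

```python
import math
from typing import Dict, List


def flag_discordant(
    reference: Dict[str, float],
    other: Dict[str, float],
    tolerance_log10: float = 1.0,
) -> List[str]:
    """Return IDs of hypothesis sets whose LRs differ by more than `tolerance_log10`.

    Both dicts map a hypothetical hypothesis-set ID to a positive LR value;
    only IDs present in both dicts are compared.
    """
    flagged = []
    for key in sorted(reference.keys() & other.keys()):
        diff = abs(math.log10(reference[key]) - math.log10(other[key]))
        if diff > tolerance_log10:
            flagged.append(key)
    return flagged


organising_lab = {"H1": 1e6, "H2": 2e3, "H3": 0.01}
participating_lab = {"H1": 3e6, "H2": 5e5, "H3": 0.02}
print(flag_discordant(organising_lab, participating_lab))  # ['H2']
```

Comparing on the log10 scale makes a threefold difference at LR = 10^6 count the same as a threefold difference at LR = 0.01, which matches how LRs are usually reported.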

Subject(s)

DNA Fingerprinting , Laboratories , Likelihood Functions , Software , Data Management , Humans , Microsatellite Repeats , Statistics as Topic

7.

Automated estimation of the number of contributors in autosomal short tandem repeat profiles using a machine learning approach.

Benschop, Corina C G; van der Linden, Jennifer; Hoogenboom, Jerry; Ypma, Rolf; Haned, Hinda.

Forensic Sci Int Genet ; 43: 102150, 2019 11.

Article in English | MEDLINE | ID: mdl-31476660

ABSTRACT

The number of contributors (NOC) to (complex) autosomal STR profiles cannot be determined with absolute certainty due to complicating factors such as allele sharing and allelic drop-out. The precision of NOC estimation can be improved by increasing the number of (highly polymorphic) markers, by using massively parallel sequencing instead of capillary electrophoresis, and/or by using more profile information than the allele counts alone. In this study, we focussed on machine learning approaches in order to make maximum use of the profile information. To this end, a set of 590 PowerPlex® Fusion 6C profiles with one to five contributors was generated from a total of 1174 different donors. This set varied in template DNA amount, mixture proportion, levels of allele sharing, allelic drop-out and degradation. The dataset contained labels with the known NOC and was split into a training, test and hold-out set. The training set was used to optimize ten different algorithms with selection of profile characteristics. Per profile, over 250 characteristics, denoted 'features', were calculated. These features were based on allele counts, peak heights and allele frequencies. The features most related to the NOC were selected based on partial correlation using the training set. Next, the performance of each model (= combination of features plus algorithm) was examined using the test set. A random forest classifier with 19 features, denoted the 'RFC19-model', showed the best performance and was selected for further validation. Results showed improved accuracy compared with the conventional maximum allele count approach and an in-house nC-tool based on the total allele count. The method is extremely fast and regarded as useful for application in forensic casework.
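
For contrast with the machine-learning model, the conventional maximum allele count approach it was compared against can be sketched in a few lines. The profile representation (a dict from locus name to observed allele labels) and the example alleles are hypothetical; the RFC19 model itself, with its 19 selected features, is not reproduced here.

```python
import math
from typing import Dict, List


def max_allele_count_noc(profile: Dict[str, List[str]]) -> int:
    """Conventional lower bound on the number of contributors (NOC).

    Each contributor carries at most two alleles per autosomal locus, so the
    locus with the most distinct alleles implies at least ceil(n / 2) donors.
    Allele sharing and drop-out can hide alleles, so the true NOC may be higher.
    """
    max_alleles = max(len(set(alleles)) for alleles in profile.values())
    return math.ceil(max_alleles / 2)


mixture = {
    "D3S1358": ["15", "16", "17"],
    "vWA": ["14", "17", "18", "19", "20"],
    "FGA": ["21", "22"],
}
print(max_allele_count_noc(mixture))  # 3
```

This count ignores peak heights and allele frequencies entirely, which is exactly the information the feature-based approach above adds.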

Subject(s)

DNA Fingerprinting/methods , DNA/genetics , Machine Learning , Microsatellite Repeats , Algorithms , Alleles , DNA Degradation, Necrotic , Gene Frequency , Humans

8.

DNAxs/DNAStatistX: Development and validation of a software suite for the data management and probabilistic interpretation of DNA profiles.

Benschop, Corina C G; Hoogenboom, Jerry; Hovers, Pauline; Slagter, Martin; Kruise, Dennis; Parag, Raymond; Steensma, Kristy; Slooten, Klaas; Nagel, Jord H A; Dieltjes, Patrick; van Marion, Vincent; van Paassen, Heidi; de Jong, Jeroen; Creeten, Christophe; Sijen, Titia; Kneppers, Alexander L J.

Forensic Sci Int Genet ; 42: 81-89, 2019 09.

Article in English | MEDLINE | ID: mdl-31254947

ABSTRACT

The data management, interpretation and comparison of sets of DNA profiles can be complex, time-consuming and error-prone when performed manually. This, combined with the growing numbers of genetic markers in forensic identification systems, calls for expert systems that can automatically compare genotyping results within (large) sets of DNA profiles and assist in profile interpretation. To that aim, we developed a user-friendly software program, or DNA eXpert System, denoted DNAxs. This software includes features to view, infer and match autosomal short tandem repeat profiles, with connectivity to upstream and downstream software programs. Furthermore, DNAxs embeds the 'DNAStatistX' module, a statistical library that contains a probabilistic algorithm to calculate likelihood ratios (LRs). This algorithm is largely based on the source code of the quantitative probabilistic genotyping system EuroForMix [1]. The statistical library, DNAStatistX, supports parallel computing, which can be delegated to a computer cluster, and enables automated queuing of requested LR calculations. DNAStatistX is written in Java and is accessible separately or via DNAxs. Using true and non-contributors to DNA profiles with up to four contributors, the accuracy and precision of DNAStatistX were assessed by comparing its results to those of EuroForMix. Results were the same, apart from rare differences that could be attributed to the different optimizers used in the two programs. Implementation of dye-specific detection thresholds resulted in larger likelihood values and thus a better explanation of the data used in this study. Furthermore, processing time, robustness of DNAStatistX results and the circumstances under which model validations failed were examined. Finally, guidelines for application of the software are shared as an example. The DNAxs software is future-proof, as its modular approach allows novel functionalities to be incorporated.
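
Dye-specific detection thresholds, mentioned above, amount to filtering each peak against the threshold of its own dye channel rather than against a single global value. The sketch below is hypothetical: the `Peak` type, the dye names, the locus/dye pairings, the RFU thresholds and the fallback default are illustrative assumptions, not DNAxs or DNAStatistX settings.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Peak:
    locus: str
    allele: str
    dye: str            # dye channel label (illustrative, not kit-specific)
    height_rfu: float   # peak height in relative fluorescence units


def apply_detection_thresholds(
    peaks: List[Peak],
    thresholds_rfu: Dict[str, float],
    default_rfu: float = 50.0,
) -> List[Peak]:
    """Keep peaks at or above the detection threshold of their own dye channel.

    Dyes missing from `thresholds_rfu` fall back to `default_rfu` (an assumption).
    """
    return [p for p in peaks if p.height_rfu >= thresholds_rfu.get(p.dye, default_rfu)]


peaks = [
    Peak("D8S1179", "13", "blue", 45.0),
    Peak("TH01", "9.3", "yellow", 60.0),
    Peak("vWA", "17", "green", 55.0),
    Peak("vWA", "18", "green", 30.0),
]
kept = apply_detection_thresholds(peaks, {"blue": 40.0, "yellow": 80.0})
print([(p.locus, p.allele) for p in kept])
```

A lower threshold in a sensitive channel keeps low-level peaks that a single conservative global threshold would discard, which is one way more of the data can be explained by the model.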

Subject(s)

DNA Fingerprinting , Data Management , Likelihood Functions , Software , Algorithms , DNA, Mitochondrial/genetics , Datasets as Topic , Genotyping Techniques , High-Throughput Nucleotide Sequencing , Humans , Microsatellite Repeats , Software Design , Statistics as Topic

9.

Human-associated microbial populations as evidence in forensic casework.

Quaak, Frederike C A; van Duijn, Tineke; Hoogenboom, Jerry; Kloosterman, Ate D; Kuiper, Irene.

Forensic Sci Int Genet ; 36: 176-185, 2018 09.

Article in English | MEDLINE | ID: mdl-30036744

ABSTRACT

In forensic investigations involving human biological traces, cell type identification is often required. Identifying the cell type from which a human STR profile has originated can assist in verifying scenarios. Several techniques have been developed for this purpose, most of which focus on molecular characteristics of human cells. Here we present a microarray method focusing on the microbial populations that are associated with human cell material. A microarray with 863 probes targeting (sets of) species, specific genera, groups of genera or families was designed for this study and evaluated with samples from different body sites: hand, foot, groin, penis, vagina, mouth and faeces. In total, 175 samples from healthy individuals were analysed. In addition to human faeces, 15 feline and 15 canine faecal samples were included. Both clustering and classification analysis were used for data analysis. Faecal and oral samples could clearly be distinguished from vaginal and skin samples, and canine and feline faeces could also be differentiated from human faeces. Some penis samples showed high similarity to vaginal samples, others to skin samples. Discriminating between samples from different skin sites proved to be challenging. As a proof of principle, twenty-one mock case samples were analysed with the microarray method. All mock case samples were clustered or classified within the correct main cluster/group. Only two of the mock case samples were assigned to the wrong sub-cluster/class; with classification, one additional sample was placed in the wrong sub-class. Overall, the microarray method is a valuable addition to existing cell typing techniques. Combining the results of microbial population analysis with, for instance, mRNA typing can increase the evidential value of a trace, since both techniques focus on independent targets within a sample.

Subject(s)

Bacteria/isolation & purification , Microarray Analysis , Adolescent , Adult , Aged , Animals , Bacteria/genetics , Biodiversity , Cats , DNA Probes , DNA, Bacterial/genetics , DNA, Bacterial/isolation & purification , Dogs , Feces/microbiology , Female , Foot/microbiology , Groin/microbiology , Hand/microbiology , Humans , Male , Middle Aged , Mouth/microbiology , Penis/microbiology , Polymerase Chain Reaction , Principal Component Analysis , Skin/microbiology , Vagina/microbiology , Young Adult

10.

An image-processing methodology for extracting bloodstain pattern features.

Arthur, Ravishka M; Humburg, Philomena J; Hoogenboom, Jerry; Baiker, Martin; Taylor, Michael C; de Bruin, Karla G.

Forensic Sci Int ; 277: 122-132, 2017 Aug.

Article in English | MEDLINE | ID: mdl-28646752

ABSTRACT

There is a growing trend in forensic science to develop methods to make forensic pattern comparison tasks more objective. This has generally involved the application of suitable image-processing methods to provide numerical data for identification or comparison. This paper outlines a unique image-processing methodology that can be utilised by analysts to generate reliable pattern data that will assist them in forming objective conclusions about a pattern. A range of features were defined and extracted from a laboratory-generated impact spatter pattern. These features were based in part on bloodstain properties commonly used in the analysis of spatter bloodstain patterns. The values of these features were consistent with properties reported qualitatively for such patterns. The image-processing method developed shows considerable promise as a way to establish measurable discriminating pattern criteria that are lacking in current bloodstain pattern taxonomies.
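
A classic per-stain property in spatter analysis is the impact angle estimated from the width and length of an ellipse fitted to the stain, alpha = arcsin(W / L). The sketch below computes it together with two other simple shape features. The function name, the choice of features and the assumption that width and length have already been measured are illustrative; this is not the feature set defined in the paper.

```python
import math
from typing import Dict


def stain_features(width_mm: float, length_mm: float) -> Dict[str, float]:
    """Simple features of one bloodstain from its fitted-ellipse axes.

    Assumes width (minor axis) and length (major axis) were already measured,
    for example by ellipse fitting in an image-processing step.
    """
    if not 0 < width_mm <= length_mm:
        raise ValueError("expected 0 < width <= length")
    ratio = width_mm / length_mm
    return {
        "aspect_ratio": ratio,
        # classic estimate of the impact angle relative to the target surface
        "impact_angle_deg": math.degrees(math.asin(ratio)),
        "area_mm2": math.pi * (width_mm / 2) * (length_mm / 2),
    }


print(stain_features(2.0, 4.0))
```

Numerical features like these, computed for every stain in a pattern, can then be summarised (for example as distributions) to give the measurable pattern criteria the abstract refers to.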

Subject(s)

Blood Stains , Image Processing, Computer-Assisted/methods , Animals , Forensic Sciences/methods , Humans , Statistics as Topic

11.

Length and repeat-sequence variation in 58 STRs and 94 SNPs in two Spanish populations.

Casals, Ferran; Anglada, Roger; Bonet, Núria; Rasal, Raquel; van der Gaag, Kristiaan J; Hoogenboom, Jerry; Solé-Morata, Neus; Comas, David; Calafell, Francesc.

Forensic Sci Int Genet ; 30: 66-70, 2017 09.

Article in English | MEDLINE | ID: mdl-28633070

ABSTRACT

We have genotyped the 58 STRs (27 autosomal, 24 Y-STRs and 7 X-STRs) and 94 autosomal SNPs included in Illumina ForenSeq™ Primer Mix A in 88 Spanish Roma (Gypsy) samples and 143 Catalans. Since this platform is based on massively parallel sequencing, we used simple R scripts to uncover the sequence variation in the repeat region. Across the 58 STRs, we found 541 length-based alleles, which became 804 different alleles after considering repeat-sequence variation. All loci in both populations were in Hardy-Weinberg equilibrium. FST between both populations was 0.0178 for autosomal SNPs, 0.0146 for autosomal STRs, 0.0101 for X-STRs and 0.1866 for Y-STRs. Combined a priori statistics were quite large; for instance, pooling all the autosomal loci, the a priori probabilities of discriminating a suspect become 1 - (2.3 × 10^-70) and 1 - (5.9 × 10^-73) for Roma and Catalans, respectively, and the chances of excluding a false father in a trio are 1 - (2.6 × 10^-20) and 1 - (2.0 × 10^-21).
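
Combined a priori figures of this kind follow from per-locus match probabilities multiplied across loci. A minimal sketch, assuming Hardy-Weinberg genotype frequencies and independence between loci (consistent with the equilibrium tests reported above, but not a statement of the authors' exact procedure); the allele frequencies are made up. The probability of discriminating two random individuals is then 1 minus the combined match probability.

```python
import math
from itertools import combinations
from typing import List, Sequence


def match_probability(allele_freqs: Sequence[float]) -> float:
    """Per-locus random match probability under Hardy-Weinberg equilibrium.

    Sums the squared genotype frequencies: p_i**2 for homozygotes and
    2 * p_i * p_j for heterozygotes.
    """
    if not math.isclose(sum(allele_freqs), 1.0, abs_tol=1e-6):
        raise ValueError("allele frequencies should sum to 1")
    homozygotes = sum((p * p) ** 2 for p in allele_freqs)
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(allele_freqs, 2))
    return homozygotes + heterozygotes


def combined_match_probability(loci: List[Sequence[float]]) -> float:
    """Product of per-locus match probabilities, assuming independent loci."""
    return math.prod(match_probability(freqs) for freqs in loci)


loci = [[0.5, 0.5], [0.2, 0.3, 0.5]]
pm = combined_match_probability(loci)
print(f"combined match probability: {pm:.6g}")
print(f"probability of discrimination: 1 - ({pm:.2e})")
```

With dozens of polymorphic loci the product shrinks multiplicatively, which is why pooled autosomal figures reach magnitudes like 10^-70 while still fitting comfortably in double precision.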

Subject(s)

Ethnicity/genetics , Microsatellite Repeats , Polymorphism, Single Nucleotide , Alleles , Chromosomes, Human, X , Chromosomes, Human, Y , Female , Genetics, Population , High-Throughput Nucleotide Sequencing , Humans , Male , Spain

12.

FDSTools: A software package for analysis of massively parallel sequencing data with the ability to recognise and correct STR stutter and other PCR or sequencing noise.

Hoogenboom, Jerry; van der Gaag, Kristiaan J; de Leeuw, Rick H; Sijen, Titia; de Knijff, Peter; Laros, Jeroen F J.

Forensic Sci Int Genet ; 27: 27-40, 2017 03.

Article in English | MEDLINE | ID: mdl-27914278

ABSTRACT

Massively parallel sequencing (MPS) is on the verge of broad-scale application in forensic research and casework. The improved capability to analyse evidentiary traces representing unbalanced mixtures is often mentioned as one of the major advantages of this technique. However, most of the available software packages that analyse forensic short tandem repeat (STR) sequencing data are not well suited for high-throughput analysis of such mixed traces. The largest challenge is the presence of stutter artefacts in STR amplifications, which are not readily distinguished from minor contributions. FDSTools is an open-source software solution developed for this purpose. The level of stutter formation is influenced by various aspects of the sequence, such as the length of the longest uninterrupted stretch occurring in an STR. When MPS is used, STRs are evaluated as sequence variants that each have particular stutter characteristics, which can be precisely determined. FDSTools uses a database of reference samples to determine stutter and other systematic PCR or sequencing artefacts for each individual allele. In addition, stutter models are created for each repeating element in order to predict stutter artefacts for alleles that are not included in the reference set. This information is subsequently used to recognise and compensate for the noise in a sequence profile. The result is a better representation of the true composition of a sample. Using Promega PowerSeq™ Auto System data from 450 reference samples and 31 two-person mixtures, we show that the FDSTools correction module decreases stutter ratios above 20% to below 3%. Consequently, much lower levels of contribution in the mixed traces are detected. FDSTools contains modules to visualise the data in an interactive format, allowing users to filter data with their own preferred thresholds.
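
The correction idea described above (predict the stutter a known allele produces and compensate for it) can be illustrated with a deliberately simplified subtraction. This is not FDSTools's algorithm: the function, the allele labels and the per-allele stutter ratios below are hypothetical, each ratio is treated as a known fraction of the parent's reads appearing at the stutter position, and parents are taken at their observed (uncorrected) read counts.

```python
from typing import Dict, Tuple


def correct_stutter(
    reads: Dict[str, float],
    stutter_ratios: Dict[Tuple[str, str], float],
) -> Dict[str, float]:
    """Subtract predicted stutter from a sequence profile (simplified sketch).

    reads: allele label -> observed read count.
    stutter_ratios: (parent, stutter_product) -> expected fraction of the
        parent's reads that end up at the stutter product.
    Predicted stutter uses the parent's observed reads; results are clipped at 0.
    """
    corrected = {allele: float(count) for allele, count in reads.items()}
    for (parent, product), ratio in stutter_ratios.items():
        if parent in reads and product in corrected:
            predicted = ratio * reads[parent]
            corrected[product] = max(0.0, corrected[product] - predicted)
    return corrected


profile = {"[AGAT]12": 1000, "[AGAT]11": 150, "[AGAT]9": 800, "[AGAT]8": 70}
ratios = {("[AGAT]12", "[AGAT]11"): 0.12, ("[AGAT]9", "[AGAT]8"): 0.08}
print(correct_stutter(profile, ratios))
```

After correction, the reads left at `[AGAT]11` and `[AGAT]8` (about 30 and 6 here) are what stutter does not explain; such residual signal is what could point to a minor contributor once the noise is compensated.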

Subject(s)

Artifacts , High-Throughput Nucleotide Sequencing , Microsatellite Repeats , Software , Alleles , Amelogenin/genetics , DNA Fingerprinting , Humans , Polymerase Chain Reaction

13.

Massively parallel sequencing of short tandem repeats-Population data and mixture analysis results for the PowerSeq™ system.

van der Gaag, Kristiaan J; de Leeuw, Rick H; Hoogenboom, Jerry; Patel, Jaynish; Storts, Douglas R; Laros, Jeroen F J; de Knijff, Peter.

Forensic Sci Int Genet ; 24: 86-96, 2016 09.

Article in English | MEDLINE | ID: mdl-27347657

ABSTRACT

Current forensic DNA analysis predominantly involves identification of human donors by analysis of short tandem repeats (STRs) using capillary electrophoresis (CE). Recent developments in massively parallel sequencing (MPS) technologies offer new possibilities for the analysis of STRs, since they might overcome some of the limitations of CE analysis. In this study, 17 STRs and Amelogenin were sequenced at high coverage using a prototype version of the Promega PowerSeq™ system for 297 population samples from the Netherlands, Nepal, Bhutan and Central African Pygmies. In addition, 45 two-person mixtures with minor contributions down to 1% were analysed to investigate the performance of this system for mixed samples. Regarding fragment length, complete concordance between the MPS and CE-based data was found, supporting the reliability of the MPS PowerSeq™ system. As expected, MPS showed a broader allele range, a higher power of discrimination and a higher exclusion rate. The high-coverage sequencing data were used to determine stutter characteristics for all loci, and stutter ratios were compared to CE data. Separating alleles that have the same length but exhibit different stutter ratios lowers the overall variation in stutter ratio and helps to differentiate stutters from genuine alleles in mixed samples. All alleles of the minor contributors were detected in the sequence reads, even for the 1% contributions, but analysis of mixtures below 5% without prior information on the mixture ratio is complicated by PCR and sequencing artefacts.
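
The point that separating same-length alleles by sequence lowers the variation in stutter ratio can be illustrated by computing mean stutter ratios (stutter reads divided by parent-allele reads) keyed either by full sequence or by length alone, as CE effectively does. The observation tuples, allele labels and read counts below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (length-based allele, sequence-based allele, parent reads, stutter reads)
Observation = Tuple[str, str, int, int]


def mean_stutter_ratios(observations: List[Observation], by: str = "sequence") -> Dict[str, float]:
    """Mean stutter ratio per allele, keyed by full sequence or by length only."""
    if by not in ("sequence", "length"):
        raise ValueError("by must be 'sequence' or 'length'")
    ratios: Dict[str, List[float]] = defaultdict(list)
    for length_allele, sequence_allele, parent_reads, stutter_reads in observations:
        key = sequence_allele if by == "sequence" else length_allele
        ratios[key].append(stutter_reads / parent_reads)
    return {key: sum(values) / len(values) for key, values in ratios.items()}


obs = [
    ("12", "12_seqA", 1000, 100),
    ("12", "12_seqA", 2000, 200),
    ("12", "12_seqB", 1000, 40),
]
print(mean_stutter_ratios(obs, by="sequence"))
print(mean_stutter_ratios(obs, by="length"))
```

Pooled by length, allele 12 gets a single averaged ratio (0.08 here) that fits neither sequence variant well, whereas keying by sequence keeps the 0.10 and 0.04 ratios apart.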

Subject(s)

Genetics, Population , High-Throughput Nucleotide Sequencing , Microsatellite Repeats , Sequence Analysis, DNA , Africa, Central , Amelogenin/genetics , Asia, Western , Humans , Netherlands , Racial Groups/genetics
