1.
J Integr Bioinform ; 20(3)2023 Sep 01.
Article in English | MEDLINE | ID: mdl-37602733

ABSTRACT

With the rapid growth of massively parallel sequencing technologies, an increasing number of laboratories are utilising sequenced DNA fragments for genomic analyses. Interpretation of sequencing data is, however, strongly dependent on bioinformatics processing, which is often too demanding for clinicians and researchers without a computational background. Another problem is the reproducibility of computational analyses across separate computational centres with inconsistent versions of installed libraries and bioinformatics tools. We propose an easily extensible set of computational pipelines, called SnakeLines, for processing sequencing reads, including mapping, assembly, variant calling, viral identification, transcriptomics, and metagenomics analysis. Individual steps of an analysis, along with methods and their parameters, can be readily modified in a single configuration file. The provided pipelines are embedded in virtual environments that ensure isolation of required resources from the host operating system, rapid deployment, and reproducibility of analysis across different Unix-based platforms. SnakeLines is a powerful framework for the automation of bioinformatics analyses, with emphasis on simple set-up, modification, extensibility, and reproducibility. The framework is already routinely used in various research projects and their applications, especially in the Slovak national surveillance of SARS-CoV-2.


Subject(s)
Genomics , Software , Reproducibility of Results , Genomics/methods , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods
2.
Int J Mol Sci ; 20(18)2019 Sep 07.
Article in English | MEDLINE | ID: mdl-31500242

ABSTRACT

Copy number variants (CNVs) are an important type of human genome variation; they play a significant role in evolution and contribute to population diversity and human genetic diseases. In recent years, next generation sequencing has become a valuable tool for clinical diagnostics and provides sensitive and accurate approaches for detecting CNVs. In our previous work, we described a non-invasive prenatal test (NIPT) based on low-coverage massively parallel whole-genome sequencing of total plasma DNA for detection of CNV aberrations ≥600 kbp. We reanalyzed NIPT genomic data from 5018 patients to evaluate CNV aberrations in the Slovak population. Our analysis of autosomal chromosomes identified 225 maternal CNVs (47 deletions; 178 duplications) ranging from 600 to 7820 kbp. According to the ClinVar database, 137 CNVs (60.89%) fully overlapped with previously annotated variants, 66 CNVs (29.33%) partially overlapped, and 22 CNVs (9.78%) did not overlap with any previously described variant. Identified variants were further classified with the AnnotSV method. In summary, we identified 129 likely benign variants, 13 variants of uncertain significance, and 83 likely pathogenic variants. In this study, we use NIPT as a valuable source of population-specific data. Our results suggest the utility of genomic data from a commercial CNV analysis test as background for a population study.


Subject(s)
DNA Copy Number Variations , High-Throughput Nucleotide Sequencing/methods , Whole Genome Sequencing/methods , Chromosome Mapping , DNA/blood , Female , Humans , Pregnancy , Prenatal Diagnosis , Segmental Duplications, Genomic , Sequence Deletion , Slovakia
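
The three overlap categories reported in this abstract (full, partial, none) can be illustrated with a simple interval comparison. This is only a sketch under stated assumptions: the function name `classify_overlap`, the `(start, end)` tuples, and the rule that "full" means a single annotated variant covers the whole CNV are all hypothetical; the abstract does not state the exact criteria used for the ClinVar comparison.

```python
def classify_overlap(cnv, annotated):
    """Classify a CNV as 'full', 'partial', or 'none' against annotated variants.

    cnv and every entry of annotated are (start, end) tuples with inclusive
    coordinates on the same chromosome. Assumed definition (not from the
    source): 'full' means one annotated variant covers the entire CNV.
    """
    start, end = cnv
    partial = False
    for a_start, a_end in annotated:
        if a_start <= start and end <= a_end:
            return "full"  # the CNV lies entirely inside an annotated variant
        if a_start <= end and start <= a_end:
            partial = True  # the intervals share at least one position
    return "partial" if partial else "none"
```

A call such as `classify_overlap((100, 200), [(150, 300)])` would fall into the partial category under this assumed definition.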
3.
Int J Mol Sci ; 20(16)2019 Aug 14.
Article in English | MEDLINE | ID: mdl-31416246

ABSTRACT

The reliability of non-invasive prenatal testing (NIPT) is highly dependent on accurate estimation of fetal fraction. Several methods have been proposed to date, utilizing different attributes of the analyzed genomic material, for example the length and genomic location of sequenced DNA fragments. These two sources of information are relatively unrelated, but so far, there have been no published attempts to combine them to get an improved predictor. We collected 2454 single euploid male fetus samples from women undergoing NIPT. Fetal fractions were calculated using several proposed predictors and the state-of-the-art SeqFF method. Predictions were compared with the reference Y-based method. We demonstrate that prediction based on the length of sequenced DNA fragments may achieve nearly the same precision as the state-of-the-art methods based on their genomic locations. We also show that a combination of several sample attributes leads to a predictor that has superior prediction accuracy over any single approach. Finally, appropriate weighting of samples in the training process may achieve higher accuracy for samples with low fetal fraction and so allow more reliability for subsequent testing for genomic aberrations. We propose several improvements in fetal fraction estimation with a special focus on the samples most prone to wrong conclusions.


Subject(s)
DNA Fragmentation , Fetal Development/genetics , Fetus , Genetic Testing , Prenatal Diagnosis/methods , Adult , Base Composition , Female , Genetic Testing/methods , High-Throughput Nucleotide Sequencing , Humans , Pregnancy , Prognosis , Reproducibility of Results
4.
Int J Mol Sci ; 20(14)2019 Jul 11.
Article in English | MEDLINE | ID: mdl-31336782

ABSTRACT

Recent advances in massively parallel shotgun sequencing opened up new options for affordable non-invasive prenatal testing (NIPT) for fetal aneuploidy from DNA material extracted from maternal plasma. Tests typically compare chromosomal distributions of a tested sample with a control set of healthy samples with unaffected fetuses. Deviations above certain threshold levels are reported as positive findings. The main problem with this approach is that the variance of the control set is dependent on the number of sequenced fragments. The higher the number, the more precise the estimation of actual chromosomal proportions is. Testing a sample with a number of sequenced reads that differs greatly from that used in training may thus lead to over- or under-estimation of its variance, and so lead to false predictions. We propose the calculation of a variance for each tested sample adaptively, based on the actual number of its sequenced fragments. We demonstrate how it leads to more stable predictions, mainly in real-world diagnostics with highly divergent inter-sample coverage.


Subject(s)
Models, Statistical , Molecular Diagnostic Techniques , Prenatal Diagnosis , Female , Genetic Testing , High-Throughput Nucleotide Sequencing , Humans , Pregnancy , Prenatal Diagnosis/methods
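
The dependence of the variance on read count described in this abstract can be illustrated with a simplified binomial model. This is a sketch, not the authors' formula: it assumes each read lands on the chromosome of interest independently with probability `expected_proportion`, so the variance of the observed proportion is p(1 - p)/n. The function and parameter names are hypothetical.

```python
import math

def adaptive_z_score(chrom_reads, total_reads, expected_proportion):
    """Z-score whose variance adapts to the sample's own read count.

    Assumes a binomial model (an illustration only): with n total reads and
    expected proportion p, the observed proportion has variance p * (1 - p) / n.
    """
    observed = chrom_reads / total_reads
    variance = expected_proportion * (1.0 - expected_proportion) / total_reads
    return (observed - expected_proportion) / math.sqrt(variance)
```

Under this model, the same observed deviation yields a larger z-score for a deeply sequenced sample than for a shallow one, which is why a fixed variance taken from a control set with a different read count can mislead.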
5.
J Biotechnol ; 299: 72-78, 2019 Jun 20.
Article in English | MEDLINE | ID: mdl-31054297

ABSTRACT

Low-coverage massively parallel genome sequencing for non-invasive prenatal testing (NIPT) of common aneuploidies is one of the most rapidly adopted and relatively low-cost DNA tests. Since aggregation of reads from a large number of samples overcomes the problem of extremely low coverage of individual samples, we describe the possible re-use of the data generated during NIPT for genome-scale, population-specific frequency determination of small DNA variants, requiring no additional costs except those for the NIPT itself. We applied our method to a data set comprising 1501 original NIPT results and evaluated the findings on different levels, from in silico population frequency comparisons up to wet lab validation analyses using a gold-standard method based on Sanger sequencing. The high reliability of variant calling and allelic frequency determination that we observed suggests that these NIPT data could serve as valuable alternatives to large-scale population studies, even for smaller countries around the world.


Subject(s)
Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Prenatal Diagnosis/methods , Computational Biology/economics , Female , Gene Frequency , High-Throughput Nucleotide Sequencing/economics , Humans , Pregnancy , Prenatal Diagnosis/economics , Reproducibility of Results , Slovakia , Whole Genome Sequencing/economics
6.
J Biotechnol ; 298: 64-75, 2019 Jun 10.
Article in English | MEDLINE | ID: mdl-30998956

ABSTRACT

Although massively parallel sequencing (MPS) is becoming common practice in both research and routine clinical care, confirmation requirements of identified DNA variants using alternative methods are still topics of debate. When evaluating variants directly from MPS data, different read depth statistics, together with specialized genotype quality scores, are therefore of high relevance. Here we report the results of our validation study performed in two different ways: 1) confirmation of MPS-identified variants using Sanger sequencing; and 2) simultaneous Sanger and MPS analysis of exons of selected genes. Detailed examination of false-positive and false-negative findings revealed typical error sources connected to low read depth/coverage, an incomplete reference genome, indel realignment problems, as well as microsatellite-associated amplification errors leading to base miscalling. However, all these error types were identifiable with thorough manual revision of aligned reads according to specific patterns of distributions of variants and their corresponding reads. Moreover, our results show that both basic quantitative metrics (such as total read counts, alternative allele read counts and allelic balance) and specific genotype quality scores depend on the bioinformatics pipeline used, thus stressing the need to establish specific thresholds for these metrics in each laboratory and for each involved pipeline independently.


Subject(s)
DNA/genetics , Genome, Human/genetics , Germ Cells , High-Throughput Nucleotide Sequencing/methods , Exons/genetics , Genetic Variation/genetics , Genotype , Humans , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA/methods , Software
7.
Bioinformatics ; 35(8): 1284-1291, 2019 Apr 15.
Article in English | MEDLINE | ID: mdl-30219853

ABSTRACT

MOTIVATION: Non-invasive prenatal testing (NIPT) is currently among the most researched topics in obstetric care. While current state-of-the-art NIPT solutions achieve high sensitivity and specificity, they still struggle with a considerable number of samples that cannot be concluded with certainty. Such uninformative results are often subject to repeated blood sampling and re-analysis, usually after two weeks, and this period may cause stress to the expectant mothers as well as increase the overall cost of the test. RESULTS: We propose a supplementary method to traditional z-scores to reduce the number of such uninformative calls. The method is based on a novel analysis of the length profile of circulating cell-free DNA, which compares the change in such profiles when random-based and length-based elimination of some fragments is performed. The proposed method is not as accurate as the standard z-score; however, our results suggest that the combination of these two independent methods correctly resolves a substantial portion of healthy samples with an uninformative result. Additionally, we discuss how the proposed method can be used to identify maternal aberrations, thus reducing the risk of false positive and false negative calls. AVAILABILITY AND IMPLEMENTATION: The open-source code of the proposed methods, together with test data, is freely available for non-commercial users on GitHub at https://github.com/jbudis/lambda. SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online.


Subject(s)
Prenatal Diagnosis , Female , Humans , Pregnancy , Sensitivity and Specificity
8.
Bioinformatics ; 35(8): 1310-1317, 2019 Apr 15.
Article in English | MEDLINE | ID: mdl-30203023

ABSTRACT

MOTIVATION: Short tandem repeats (STRs) are stretches of repetitive DNA in which short sequences, typically made of 2-6 nucleotides, are repeated several times. Since STRs have many important biological roles and also belong to the most polymorphic parts of the human genome, they have been utilized in several molecular-genetic applications. Precise genotyping of STR alleles has therefore been of high relevance during the last decades. Despite this, massively parallel sequencing (MPS) still lacks the analysis methods to fully utilize the information value of STRs in genome-scale assays. RESULTS: We propose an alignment-free algorithm, called Dante, for genotyping and characterization of STR alleles at user-specified known loci based on sequence reads originating from STR loci of interest. The method accounts for natural deviations from the expected sequence, such as variation in the repeat count, sequencing errors, ambiguous bases and complex loci containing several different motifs. In addition, we implemented a correction for copy number defects caused by the polymerase-induced stutter effect as well as a prediction of STR expansions that, according to the conventional view, cannot be fully captured by inherently short MPS reads. We tested Dante on simulated datasets and on datasets obtained by targeted sequencing of protein-coding parts of thousands of selected clinically relevant genes. In both these datasets, Dante outperformed the HipSTR and GATK genotyping tools. Furthermore, Dante was able to predict allele expansions in all tested clinical cases. AVAILABILITY AND IMPLEMENTATION: Dante is open source software, freely available for download at https://github.com/jbudis/dante. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing , Microsatellite Repeats , Alleles , Genotype , Humans , Sequence Analysis, DNA
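
To make the notion of an STR repeat count concrete, the sketch below counts the longest uninterrupted run of a motif in a single read. It is a deliberately naive illustration and is not Dante's algorithm: it ignores sequencing errors, ambiguous bases, stutter and complex loci, all of which the abstract says Dante handles. The function name `longest_repeat_run` is hypothetical.

```python
import re

def longest_repeat_run(read, motif):
    """Return the largest number of consecutive exact copies of motif in read.

    Naive exact matching only (an illustration, not Dante's method).
    """
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    # Each match is a maximal run of back-to-back motif copies.
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(read)]
    return max(runs, default=0)
```

For instance, a read containing four back-to-back `CAG` units followed later by a single isolated `CAG` would be reported as a run of four.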
9.
Extremophiles ; 20(5): 795-808, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27338271

ABSTRACT

Different protocols based on Illumina high-throughput DNA sequencing and denaturing gradient gel electrophoresis (DGGE)-cloning were developed and applied for investigating hot spring related samples. The study was focused on three target genes: archaeal and bacterial 16S rRNA and mcrA of methanogenic microflora. The shorter read lengths of Illumina sequencing, currently the most popular sequencing technology, do not allow analysis of the complete 16S rRNA region, or of longer gene fragments, as was possible with Sanger sequencing. Here, we demonstrate that there is no need for special indexed or tailed primer sets dedicated to short variable regions of 16S rRNA since the presented approach allows the analysis of complete bacterial 16S rRNA amplicons (V1-V9) and longer archaeal 16S rRNA and mcrA sequences. A sample augmented with a transposon is represented by a set of approximately 300 bp long fragments that can be easily sequenced by Illumina MiSeq. Furthermore, a low proportion of chimeric sequences was observed. DGGE-cloning based strategies were performed combining semi-nested PCR, DGGE and clone library construction. Comparing both investigation methods, a certain degree of complementarity was observed, confirming that the DGGE-cloning approach is not obsolete. Novel protocols were created for several types of laboratories, utilizing the traditional DGGE technique or using the most modern Illumina sequencing.


Subject(s)
DNA, Archaeal/chemistry , DNA, Bacterial/chemistry , Hot Springs/microbiology , Microbiota , Sequence Analysis, DNA/methods , DNA, Archaeal/genetics , DNA, Bacterial/genetics , Denaturing Gradient Gel Electrophoresis/methods , Polymerase Chain Reaction/methods , RNA, Ribosomal, 16S/genetics
10.
PLoS One ; 10(12): e0144811, 2015.
Article in English | MEDLINE | ID: mdl-26669558

ABSTRACT

OBJECTIVES: The aims of this study were to test the utility of benchtop NGS platforms for NIPT for trisomy 21 using previously published z score calculation methods and to optimize the sample preparation and data analysis with the use of in silico and physical size selection methods. METHODS: Samples from 130 pregnant women were analyzed by whole genome sequencing on the benchtop NGS systems Ion Torrent PGM and MiSeq. The targeted yield of 3 million raw reads on each platform was used for z score calculation. The impact of in silico and physical size selection on the analytical performance of the test was studied. RESULTS: Using a z score value of 3 as the cut-off, 98.11%-100% (104-106/106) specificity and 100% (24/24) sensitivity were observed for Ion Torrent PGM, and 99.06%-100% (105-106/106) specificity and 100% (24/24) sensitivity for MiSeq. After in silico based size selection both platforms reached 100% specificity and sensitivity. Following the physical size selection, z scores of tested trisomic samples increased significantly (p = 0.0141 and p = 0.025 for Ion Torrent PGM and MiSeq, respectively). CONCLUSIONS: Noninvasive prenatal testing for chromosome 21 trisomy using benchtop NGS systems led to results equivalent to previously published studies performed on high-to-ultrahigh throughput NGS systems. The in silico size selection led to higher specificity of the test. Physical size selection performed on isolated DNA led to a significant increase in z scores. The observed results could represent a basis for increasing the cost effectiveness of the test and thus help with its penetration worldwide.


Subject(s)
Chromosomes, Human, Pair 21/genetics , Computer Simulation , Down Syndrome/genetics , High-Throughput Nucleotide Sequencing/methods , Prenatal Diagnosis/methods , Female , Humans , Ions , Pregnancy , Reproducibility of Results
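
The z score decision rule with a cut-off of 3 mentioned in this abstract can be sketched as follows. This is a generic illustration, assuming the z score compares a sample's chromosome 21 read fraction with the mean and sample standard deviation of euploid controls; the function names, the input values, and the choice of a strict `>` comparison are assumptions, since the abstract only refers to previously published calculation methods.

```python
import statistics

def chr21_z_score(sample_fraction, control_fractions):
    """Z score of a sample's chr21 read fraction against euploid controls."""
    mean = statistics.mean(control_fractions)
    sd = statistics.stdev(control_fractions)  # sample standard deviation
    return (sample_fraction - mean) / sd

def is_trisomy21_positive(z, cutoff=3.0):
    """Flag a sample as positive when its z score exceeds the cut-off."""
    return z > cutoff

def percent(correct, total):
    """Percentage rounded to two decimals, e.g. for specificity."""
    return round(100.0 * correct / total, 2)
```

The `percent` helper reproduces the way the reported specificity ranges relate to their counts, for example 104 correctly classified samples out of 106.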