Results 1 - 4 of 4
1.
PeerJ ; 8: e10501, 2020.
Article in English | MEDLINE | ID: mdl-33354434

ABSTRACT

BACKGROUND: Low-coverage sequencing is a cost-effective way to obtain reads spanning an entire genome. However, read depth at each locus is low, making sequencing error difficult to separate from actual variation. Prior to variant calling, sequencer reads are aligned to a reference genome, with alignments stored in Sequence Alignment/Map (SAM) files. Each alignment has a mapping quality (MAPQ) score indicating the probability that a read is incorrectly aligned. This study investigated the recalibration of probability estimates used to compute MAPQ scores for improving variant calling performance in single-sample, low-coverage settings. MATERIALS AND METHODS: Simulated tomato, hot pepper and rice genomes were implanted with known variants. From these, simulated paired-end reads were generated at low coverage and aligned to the original reference genomes. Features extracted from the SAM-formatted alignment files for tomato were used to train machine learning models to detect incorrectly aligned reads and output estimates of the probability of misalignment for each read in all three data sets. MAPQ scores were then re-computed from these estimates. Next, the SAM files were updated with the new MAPQ scores. Finally, variant calling was performed on the original and recalibrated alignments and the results were compared. RESULTS: Incorrectly aligned reads comprised only 0.16% of the reads in the training set. This severe class imbalance required special consideration for model training. The F1 score for detecting misaligned reads ranged from 0.76 to 0.82. The best performing model was used to compute new MAPQ scores. Single Nucleotide Polymorphism (SNP) detection was improved after mapping score recalibration. In rice, recall for called SNPs increased by 5.2%, while for tomato and pepper it increased by 3.1% and 1.5%, respectively. For all three data sets the precision of SNP calls ranged from 0.91 to 0.95, and was largely unchanged by mapping score recalibration.
CONCLUSION: Recalibrating MAPQ scores delivers modest improvements in single-sample variant calling results. Some variant callers operate on multiple samples simultaneously. They exploit every sample's reads to compensate for the low read-depth of individual samples. This improves polymorphism detection and genotype inference. It may be that small improvements in single-sample settings translate to larger gains in a multi-sample experiment. A study to investigate this is ongoing.
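
The abstract above describes re-computing MAPQ scores from model estimates of the probability of misalignment. A minimal sketch of that conversion follows, assuming the Phred-scaled definition used for SAM mapping quality (MAPQ = -10 log10 of the probability that the mapping is wrong, rounded to an integer). The function name and the cap of 60 are illustrative assumptions and are not taken from the article.

```python
import math

# Assumed cap: aligners differ in the maximum MAPQ they report; 60 is only illustrative.
MAX_MAPQ = 60


def mapq_from_probability(p_misaligned: float, max_mapq: int = MAX_MAPQ) -> int:
    """Convert an estimated probability of misalignment into a Phred-scaled MAPQ.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if not 0.0 <= p_misaligned <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p_misaligned == 0.0:
        # log10(0) is undefined, so fall back to the cap.
        return max_mapq
    score = -10.0 * math.log10(p_misaligned)
    # Round to the nearest integer and never exceed the cap.
    return min(max_mapq, int(round(score)))


if __name__ == "__main__":
    # Example probabilities such as a classifier might output for individual reads.
    for p in (0.5, 0.1, 0.001):
        print(p, mapq_from_probability(p))
```

Under this definition a read with an estimated 10% chance of misalignment receives a low score, while a read with a 0.1% chance receives a much higher one, which is how a better-calibrated probability can change which reads a variant caller trusts.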

2.
World J Microbiol Biotechnol ; 36(7): 103, 2020 Jul 02.
Article in English | MEDLINE | ID: mdl-32613458

ABSTRACT

Food poisoning from consumption of food contaminated with non-typhoidal Salmonella spp. is a global problem. A modified high-resolution DNA melting curve analysis (m-HRMa) was introduced to discriminate effectively among closely related HRM curves of amplicons generated from selected Salmonella genome sequences, enabling Salmonella spp. to be classified into discrete clusters. Combining m-HRMa with serogroup identification (ms-HRMa) improved the assignment of Salmonella spp. to clusters. In addition, a machine learning algorithm based on dynamic time warping (DTW) was employed to provide a simple and rapid protocol for clustering analysis and to create a phylogenetic tree of Salmonella strains (n = 40) collected from homes, farms and slaughterhouses in northern Thailand. DTW and ms-HRMa clustering analyses generated molecular signatures of the Salmonella isolates, yielding 25 ms-HRM and 28 DTW clusters compared with 14 clusters from a standard HRM analysis, and the combination of both analyses permitted molecular subtyping of each Salmonella isolate. Results from DTW and ms-HRMa cluster analyses were in good agreement with those obtained from enterobacterial repetitive intergenic consensus sequence PCR clustering. While conventional serotyping of Clusters 1 and 2 revealed six different Salmonella serotypes, the majority being S. Weltevreden, the new Salmonella subtyping protocol identified five S. Weltevreden subtypes, with S. Weltevreden subtype DTW4-M1 being predominant. Based on knowledge of the sources of Salmonella subtypes, transmission of S. Weltevreden in northern Thailand was likely to be farm-to-farm through contaminated chicken stool. In conclusion, the rapid, robust and specific Salmonella subtyping protocol developed in the study can be performed in a local setting, enabling swift control and preventive measures to be initiated against potential epidemics of salmonellosis.


Subject(s)
Algorithms , Machine Learning , Nucleic Acid Denaturation , Salmonella Infections/microbiology , Salmonella/classification , Salmonella/genetics , Salmonella/isolation & purification , Animals , Bacterial Typing Techniques , Chickens/microbiology , DNA Fingerprinting/methods , Feces/microbiology , Humans , Phylogeny , Polymerase Chain Reaction , Salmonella Infections/transmission , Serogroup , Serotyping , Thailand
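
The abstract for entry 2 names dynamic time warping (DTW) as the basis for clustering melting curves. The sketch below shows the textbook DTW distance between two numeric sequences, which is the kind of pairwise measure a clustering step could build on. It is a generic illustration: the abstract does not state which DTW variant, local cost, or curve representation was used, and the sample values are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    Uses absolute difference as the local cost (an assumption for illustration)
    and the standard O(len(a) * len(b)) dynamic programme.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] aligned to the same b element again
                cost[i][j - 1],      # b[j-1] aligned to the same a element again
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


if __name__ == "__main__":
    # Hypothetical, made-up curve samples; not data from the study.
    curve_x = [1.0, 2.0, 3.0]
    curve_y = [1.0, 2.0, 2.0, 3.0]
    print(dtw_distance(curve_x, curve_y))
```

Because DTW allows one sequence to be locally stretched against the other, two curves with the same shape but slightly shifted transitions can score as close, which a point-by-point comparison would penalise.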
3.
PeerJ ; 8: e9113, 2020.
Article in English | MEDLINE | ID: mdl-32587791

ABSTRACT

BACKGROUND: Nontyphoidal Salmonella spp. constitute a major bacterial cause of food poisoning. Each Salmonella serotype exhibits distinct virulence in humans. METHOD: A small cohort study was conducted to characterize several aspects of Salmonella isolates obtained from stool of diarrheal patients (n = 26) admitted to Phayao Ram Hospital, Phayao province, Thailand. A simple CRISPR 2 molecular analysis was developed to rapidly type Salmonella isolates employing both uniplex PCR and high-resolution melting (HRM) curve analysis. RESULTS: CRISPR 2 monoplex PCR generated a single Salmonella serotype-specific amplicon, showing S. 4,[5],12:i:- with the highest frequency (42%), followed by S. Enteritidis (15%) and S. Stanley (11%); S. Typhimurium was not detected. CRISPR 2 HRM-PCR allowed further classification of S. 4,[5],12:i:- isolates based on their specific CRISPR 2 signature sequences. The highest prevalence of Salmonella infection was during the summer season (April to August). Additional studies were conducted using standard multiplex HRM-PCR typing, which confirmed CRISPR 2 PCR results and, using a machine-learning algorithm, clustered the majority of Salmonella serotypes into six clades; repetitive element-based (ERIC) PCR, which clustered the serotypes into three clades only; antibiogram profiling, which revealed the majority resistant to ampicillin (69%); and tests for extended-spectrum β-lactamase production (two isolates) and PCR-based detection of bla alleles. CONCLUSION: CRISPR 2 PCR provided a simple assay for detection and identification of clinically relevant Salmonella serotypes. In conjunction with antibiogram profiling and a rapid assay for β-lactamase producers, this approach should facilitate detection and appropriate treatment of salmonellosis in a local hospital setting. In addition, CRISPR 2 HRM-PCR profiling enabled clustering of S. 4,[5],12:i:- isolates according to CRISPR 2 locus signature sequences, indicative of their different evolutionary trajectories, thereby providing a powerful tool for future epidemiological studies of virulent Salmonella serotypes.

4.
Int J Bioinform Res Appl ; 11(2): 111-29, 2015.
Article in English | MEDLINE | ID: mdl-25786791

ABSTRACT

The human gut is one of the most densely populated microbial communities in the world. The interaction of microbes with human host cells is responsible for several disease conditions and is critical to human health. It is imperative to understand the relationships between these microbial communities within the human gut and their roles in disease. In this study we analyse the microbial communities within the human gut and their role in Inflammatory Bowel Disease (IBD). The bacterial communities were interrogated using Length Heterogeneity PCR (LH-PCR) fingerprinting of mucosal and luminal associated microbial communities for a class of healthy and diseased patients.


Subject(s)
Bacteria/genetics , Bacteria/isolation & purification , Inflammatory Bowel Diseases/microbiology , Intestinal Mucosa/microbiology , Pattern Recognition, Automated/methods , Polymerase Chain Reaction/methods , Algorithms , Humans , Microbiota/genetics , Reproducibility of Results , Sensitivity and Specificity , Support Vector Machine