Results 1 - 20 of 14,383
1.
Mol Biol Rep ; 51(1): 710, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38824241

ABSTRACT

BACKGROUND: Circular RNA (circRNA) is a key player in regulating the multidirectional differentiation of stem cells. Previous research by our group found that the blue light-emitting diode (LED) had a promoting effect on the osteogenic/odontogenic differentiation of human stem cells from apical papilla (SCAPs). This research aimed to investigate the differential expression of circRNAs during the osteogenic/odontogenic differentiation of SCAPs regulated by blue LED. MATERIALS AND METHODS: SCAPs were divided into the irradiation group (4 J/cm²) and the control group (0 J/cm²), and cultivated in an osteogenic/odontogenic environment. The differentially expressed circRNAs during osteogenic/odontogenic differentiation of SCAPs promoted by blue LED were detected by high-throughput sequencing, and preliminarily verified by qRT-PCR. Functional prediction of these circRNAs was performed using Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG), and the circRNA-miRNA-mRNA networks were also constructed. RESULTS: Sequencing showed that 301 circRNAs were differentially expressed. GO and KEGG analyses suggested that these circRNAs were associated with some signaling pathways related to osteogenic/odontogenic differentiation. The circRNA-miRNA-mRNA networks were also successfully constructed. CONCLUSION: CircRNAs were involved in the osteogenic/odontogenic differentiation of SCAPs promoted by blue LED. In this biological process, circRNA-miRNA-mRNA networks served an important purpose, and circRNAs regulated this process through certain signaling pathways.


Subject(s)
Cell Differentiation , Dental Papilla , Light , Odontogenesis , Osteogenesis , RNA, Circular , Stem Cells , RNA, Circular/genetics , RNA, Circular/metabolism , Humans , Osteogenesis/genetics , Cell Differentiation/genetics , Stem Cells/metabolism , Stem Cells/cytology , Odontogenesis/genetics , Dental Papilla/cytology , Dental Papilla/metabolism , MicroRNAs/genetics , MicroRNAs/metabolism , Gene Ontology , Cells, Cultured , Gene Expression Profiling/methods , RNA, Messenger/genetics , RNA, Messenger/metabolism , Gene Regulatory Networks , High-Throughput Nucleotide Sequencing/methods , Gene Expression Regulation/radiation effects , Blue Light
2.
J Clin Virol ; 173: 105695, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38823290

ABSTRACT

Metagenomics is gradually being implemented for diagnosing infectious diseases. However, in-depth protocol comparisons for viral detection have been limited to individual sets of experimental workflows and laboratories. In this study, we present a benchmark of metagenomics protocols used in clinical diagnostic laboratories initiated by the European Society for Clinical Virology (ESCV) Network on NGS (ENNGS). A mock viral reference panel was designed to mimic low biomass clinical specimens. The panel was used to assess the performance of twelve metagenomic wet lab protocols currently in use in the diagnostic laboratories of participating ENNGS member institutions. Both Illumina and Nanopore, shotgun and targeted capture probe protocols were included. Performance metrics sensitivity, specificity, and quantitative potential were assessed using a central bioinformatics pipeline. Overall, viral pathogens with loads down to 10⁴ copies/ml (corresponding to CT values of 31 in our PCR assays) were detected by all the evaluated metagenomic wet lab protocols. In contrast, lower abundant mixed viruses of CT values of 35 and higher were detected only by a minority of the protocols. Considering the reference panel as the gold standard, optimal thresholds to define a positive result were determined per protocol, based on the horizontal genome coverage. Implementing these thresholds, sensitivity and specificity of the protocols ranged from 67 to 100 % and 87 to 100 %, respectively. A variety of metagenomic protocols are currently in use in clinical diagnostic laboratories. Detection of low abundant viral pathogens and mixed infections remains a challenge, implying the need for standardization of metagenomic analysis for use in clinical settings.
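
The threshold-based evaluation described in this abstract can be illustrated with a minimal sketch. It assumes each panel entry is reduced to a horizontal genome coverage fraction and a flag saying whether the virus was truly in the reference panel; the function name, the data and the 0.1 threshold are hypothetical and are not taken from the ENNGS pipeline.

```python
from typing import Iterable, Tuple


def evaluate_threshold(
    calls: Iterable[Tuple[float, bool]], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for one coverage threshold.

    calls: pairs of (horizontal genome coverage, virus truly present in panel).
    A result counts as positive when coverage >= threshold (an assumption).
    Both truly present and truly absent entries must occur in `calls`.
    """
    tp = fn = tn = fp = 0
    for coverage, present in calls:
        positive = coverage >= threshold
        if present and positive:
            tp += 1
        elif present:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical example data, not results from the study.
example_calls = [(0.9, True), (0.5, True), (0.05, True), (0.02, False), (0.3, False)]
example_sens, example_spec = evaluate_threshold(example_calls, threshold=0.1)
```

Scanning a range of thresholds with such a function and keeping the one with the best trade-off is one plausible way to pick a per-protocol cut-off; the abstract does not state which optimisation criterion was used.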


Subject(s)
Benchmarking , Metagenomics , Sensitivity and Specificity , Viruses , Metagenomics/methods , Metagenomics/standards , Humans , Viruses/genetics , Viruses/classification , Viruses/isolation & purification , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Virus Diseases/diagnosis , Virus Diseases/virology , Computational Biology/methods
3.
Sci Rep ; 14(1): 13138, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38849509

ABSTRACT

Colorectal cancer (CRC) is a global health concern, and the incidence of early-onset (EO) CRC is trending upward. This study delves into the genomic landscape of EO-CRC, specifically focusing on pediatric (PED) and young adult (YA) patients, comparing them with adult (AD) CRC. In this retrospective monocentric investigation, we performed targeted next-generation sequencing to compare the mutational profile of 38 EO-CRC patients (eight PED and 30 YA) to that of a 'control group' consisting of 56 AD-CRCs. Our findings reveal distinct molecular profiles in EO-CRC, notably in the WNT and PI3K-AKT pathways. In pediatrics, we observed a significantly higher frequency of RNF43 mutations, whereas APC mutations were more prevalent in adult cases. These observations suggest age-related differences in the activation of the WNT pathway. Pathway and copy number variation analyses reveal that AD-CRC and YA-CRC are more similar to each other than to the pediatric cases. PED shows a peculiar profile with CDK6 amplification and enrichment of the lysine degradation pathway. These findings may open doors for personalized therapies, such as PI3K-AKT pathway inhibitors or CDK6 inhibitors for pediatric patients. Additionally, the distinct molecular signatures of EO-CRC underscore the need for age-specific treatment strategies and precision medicine. This study emphasizes the importance of comprehensive molecular investigations in EO-CRCs, which can potentially improve diagnostic accuracy, prognosis, and therapeutic decisions for these patients. Collaboration between the pediatric and adult oncology communities is fundamental to improving oncological outcomes for this rare and challenging pediatric tumor.


Subject(s)
Colorectal Neoplasms , Mutation , Humans , Colorectal Neoplasms/genetics , Male , Female , Child , Young Adult , Adolescent , Adult , Retrospective Studies , Child, Preschool , DNA Copy Number Variations , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Wnt Signaling Pathway/genetics
4.
BMC Genomics ; 25(1): 573, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38849740

ABSTRACT

BACKGROUND: The single-pass long reads generated by third-generation sequencing technology exhibit a higher error rate. However, the circular consensus sequencing (CCS) produces shorter reads. Thus, it is effective to manage the error rate of long reads algorithmically with the help of the homologous high-precision and low-cost short reads from the Next Generation Sequencing (NGS) technology. METHODS: In this work, a hybrid error correction method (NmTHC) based on a generative neural machine translation model is proposed to automatically capture discrepancies within the aligned regions of long reads and short reads, as well as the contextual relationships within the long reads themselves for error correction. Akin to natural language sequences, the long read can be regarded as a special "genetic language" and be processed with the idea of generative neural networks. The algorithm builds a sequence-to-sequence (seq2seq) framework with a Recurrent Neural Network (RNN) as the core layer. The long reads before and after correction are regarded as the sentences in the source and target language of translation, and the alignment information of long reads with short reads is used to create the special corpus for training. The well-trained model can be used to predict the corrected long read. RESULTS: NmTHC outperforms the latest mainstream hybrid error correction methods on real-world datasets from two mainstream platforms, including PacBio and Nanopore. Our experimental evaluation results demonstrate that NmTHC can align more bases with the reference genome without any segmenting in the six benchmark datasets, proving that it enhances alignment identity without sacrificing any length advantages of long reads. CONCLUSION: Consequently, NmTHC reasonably adopts the generative Neural Machine Translation (NMT) model to transform hybrid error correction tasks into machine translation problems and provides a novel perspective for solving long-read error correction problems with the ideas of Natural Language Processing (NLP). More remarkably, the proposed methodology is sequencing-technology-independent and can produce more precise reads.


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing , Neural Networks, Computer , High-Throughput Nucleotide Sequencing/methods , Humans , Machine Learning
5.
Commun Biol ; 7(1): 675, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38824179

ABSTRACT

The three-dimensional (3D) organization of genome is fundamental to cell biology. To explore 3D genome, emerging high-throughput approaches have produced billions of sequencing reads, which is challenging and time-consuming to analyze. Here we present Microcket, a package for mapping and extracting interacting pairs from 3D genomics data, including Hi-C, Micro-C, and derivant protocols. Microcket utilizes a unique read-stitch strategy that takes advantage of the long read cycles in modern DNA sequencers; benchmark evaluations reveal that Microcket runs much faster than the current tools along with improved mapping efficiency, and thus shows high potential in accelerating and enhancing the biological investigations into 3D genome. Microcket is freely available at https://github.com/hellosunking/Microcket .


Subject(s)
Genomics , Software , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Analysis, DNA/methods , Data Analysis
6.
Cell Mol Biol (Noisy-le-grand) ; 70(6): 7-13, 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38836688

ABSTRACT

SARS-CoV-2 has been identified by the WHO as a new virus causing mild to severe respiratory illnesses that belong to the Coronavirus family. The virus underwent rapid and continuous changes in the genetic material, especially the S gene, during COVID-19 pandemic and generated a number of new variants announced by WHO in late 2020. Mutations in the S gene have greatly affected virus pathogenesis as the spike protein is responsible for many critical processes. Delta and Omicron variants were studied extensively due to increased mortality and morbidity rates associated with their pandemic waves. This study aimed to analyse the S gene through NGS in an attempt to identify and characterize the circulating variants among the infected population in Erbil/Iraq. Nasopharyngeal and throat swab samples were collected from hospitalized and non-hospitalized patients with COVID-19 symptoms in Erbil City/Iraq from the 1st of November 2021 to the 28th of February 2022. Following confirmation of SARS-CoV-2 infection by RT-PCR, 15 samples were selected and sent to Intergen Lab (Ankara/Turkey) for NGS and analysis. Following analysis and alignment of the received sequences with the Wuhan-Hu-1 strain (wild-type), Delta variant was identified in 13 samples, and Omicron in two. On the whole, different mutation classes have been observed including nonsynonymous, synonymous, non-frameshift deletions and a non-frameshift insertion. The Delta-specific set of mutations, L452R, T478K and P681R, was detected in all Delta isolates. Both Omicron variants appeared to have 35 mutations. D614G variation was conserved in both variants.


Subject(s)
COVID-19 , High-Throughput Nucleotide Sequencing , Mutation , SARS-CoV-2 , Spike Glycoprotein, Coronavirus , Humans , Spike Glycoprotein, Coronavirus/genetics , COVID-19/virology , COVID-19/genetics , COVID-19/epidemiology , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification , High-Throughput Nucleotide Sequencing/methods , Mutation/genetics , Male , Female
7.
Microb Genom ; 10(6), 2024 Jun.
Article in English | MEDLINE | ID: mdl-38833287

ABSTRACT

It is now possible to assemble near-perfect bacterial genomes using Oxford Nanopore Technologies (ONT) long reads, but short-read polishing is usually required for perfection. However, the effect of short-read depth on polishing performance is not well understood. Here, we introduce Pypolca (with default and careful parameters) and Polypolish v0.6.0 (with a new careful parameter). We then show that: (1) all polishers other than Pypolca-careful, Polypolish-default and Polypolish-careful commonly introduce false-positive errors at low read depth; (2) most of the benefit of short-read polishing occurs by 25× depth; (3) Polypolish-careful almost never introduces false-positive errors at any depth; and (4) Pypolca-careful is the single most effective polisher. Overall, we recommend the following polishing strategies: Polypolish-careful alone when depth is very low (<5×), Polypolish-careful and Pypolca-careful when depth is low (5-25×), and Polypolish-default and Pypolca-careful when depth is sufficient (>25×).
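
The depth-based recommendation at the end of this abstract amounts to a small decision rule, sketched below. The function name is hypothetical, the returned strings are just labels for the polisher modes named in the abstract (not command lines), and the handling of the exact boundaries 5× and 25× is an assumption because the abstract does not spell it out.

```python
from typing import List


def recommend_polishers(short_read_depth: float) -> List[str]:
    """Map short-read depth (fold coverage) to the recommended polisher modes.

    Boundary handling is assumed: exactly 5x and exactly 25x are treated
    as part of the "low depth" (5-25x) band.
    """
    if short_read_depth < 0:
        raise ValueError("depth must be non-negative")
    if short_read_depth < 5:
        # Very low depth: Polypolish-careful alone.
        return ["Polypolish-careful"]
    if short_read_depth <= 25:
        # Low depth: both careful modes.
        return ["Polypolish-careful", "Pypolca-careful"]
    # Sufficient depth: Polypolish default mode plus Pypolca-careful.
    return ["Polypolish-default", "Pypolca-careful"]
```

For example, `recommend_polishers(12)` returns both careful modes, which matches the 5-25× band in the abstract.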


Subject(s)
Genome, Bacterial , Nanopores , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Nanopore Sequencing/methods , Bacteria/genetics , Bacteria/classification , Software , Genomics/methods
8.
Gigascience ; 13, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38832466

ABSTRACT

BACKGROUND: Due to human error, sample swapping in large cohort studies with heterogeneous data types (e.g., mix of Oxford Nanopore Technologies, Pacific Bioscience, Illumina data, etc.) remains a common issue plaguing large-scale studies. At present, all sample swapping detection methods require costly and unnecessary (e.g., if data are only used for genome assembly) alignment, positional sorting, and indexing of the data in order to compare similarly. As studies include more samples and new sequencing data types, robust quality control tools will become increasingly important. FINDINGS: The similarity between samples can be determined using indexed k-mer sequence variants. To increase statistical power, we use coverage information on variant sites, calculating similarity using a likelihood ratio-based test. Per sample error rate, and coverage bias (i.e., missing sites) can also be estimated with this information, which can be used to determine if a spatially indexed principal component analysis (PCA)-based prescreening method can be used, which can greatly speed up analysis by preventing exhaustive all-to-all comparisons. CONCLUSIONS: Because this tool processes raw data, is faster than alignment, and can be used on very low-coverage data, it can save an immense degree of computational resources in standard quality control (QC) pipelines. It is robust enough to be used on different sequencing data types, important in studies that leverage the strengths of different sequencing technologies. In addition to its primary use case of sample swap detection, this method also provides information useful in QC, such as error rate and coverage bias, as well as population-level PCA ancestry analysis visualization.


Subject(s)
High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Humans , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Principal Component Analysis , Computational Biology/methods , Algorithms
9.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38833323

ABSTRACT

The emergence and rapid spread of SARS-CoV-2 prompted the global community to identify innovative approaches to diagnose infection and sequence the viral genome because at several points in the pandemic positive case numbers exceeded the laboratory capacity to characterize sufficient samples to adequately respond to the spread of emerging variants. From week 10, 2020, to week 13, 2023, Slovenian routine complete genome sequencing (CGS) surveillance network yielded 41 537 complete genomes and revealed a typical molecular epidemiology with early lineages gradually being replaced by Alpha, Delta, and finally Omicron. We developed a targeted next-generation sequencing based variant surveillance strategy dubbed Spike Screen through sample pooling and selective SARS-CoV-2 spike gene amplification in conjunction with CGS of individual cases to increase throughput and cost-effectiveness. Spike Screen identifies variant of concern (VOC) and variant of interest (VOI) signature mutations, analyses their frequencies in sample pools, and calculates the number of VOCs/VOIs at the population level. The strategy was successfully applied for detection of specific VOC/VOI mutations prior to their confirmation by CGS. Spike Screen complemented CGS efforts with an additional 22 897 samples sequenced in two time periods: between week 42, 2020, and week 24, 2021, and between week 37, 2021, and week 2, 2022. The results showed that Spike Screen can be applied to monitor VOC/VOI mutations among large volumes of samples in settings with limited sequencing capacity through reliable and rapid detection of novel variants at the population level and can serve as a basis for public health policy planning.


Subject(s)
COVID-19 , High-Throughput Nucleotide Sequencing , SARS-CoV-2 , Spike Glycoprotein, Coronavirus , Humans , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification , High-Throughput Nucleotide Sequencing/methods , COVID-19/virology , COVID-19/diagnosis , COVID-19/epidemiology , Spike Glycoprotein, Coronavirus/genetics , Mutation , Genome, Viral , Slovenia/epidemiology
10.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38828640

ABSTRACT

Cell hashing, a nucleotide barcode-based method that allows users to pool multiple samples and demultiplex in downstream analysis, has gained widespread popularity in single-cell sequencing due to its compatibility, simplicity, and cost-effectiveness. Despite these advantages, the performance of this method remains unsatisfactory under certain circumstances, especially in experiments that have imbalanced sample sizes or use many hashtag antibodies. Here, we introduce a hybrid demultiplexing strategy that increases accuracy and cell recovery in multi-sample single-cell experiments. This approach correlates the results of cell hashing and genetic variant clustering, enabling precise and efficient cell identity determination without additional experimental costs or efforts. In addition, we developed HTOreader, a demultiplexing tool for cell hashing that improves the accuracy of cut-off calling by avoiding the dominance of negative signals in experiments with many hashtags or imbalanced sample sizes. When compared to existing methods using real-world datasets, this hybrid approach and HTOreader consistently generate reliable results with increased accuracy and cell recovery.


Subject(s)
Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Algorithms , Software , High-Throughput Nucleotide Sequencing/methods , Computational Biology/methods
11.
Appl Microbiol Biotechnol ; 108(1): 367, 2024 Jun 08.
Article in English | MEDLINE | ID: mdl-38850297

ABSTRACT

Recent microbiome research has incorporated a higher number of samples through more participants in a study, longitudinal studies, and meta-analyses across studies. Physical limitations in a sequencing machine can result in samples spread across sequencing runs. Here we present the results of sequencing nearly 1000 16S rRNA gene sequences in fecal (stabilized and swab) and oral (swab) samples from multiple human microbiome studies and positive controls that were conducted with identical standard operating procedures. Sequencing was performed in the same center across 18 different runs. The simplified mock community showed limitations in accuracy, while precision (e.g., technical variation) was robust for the mock community and actual human positive control samples. Technical variation was the lowest for stabilized fecal samples, followed by fecal swab samples, and then oral swab samples. The order of technical variation stability was inverse of DNA concentrations (e.g., highest in stabilized fecal samples), highlighting the importance of DNA concentration in reproducibility and urging caution when analyzing low biomass samples. Coefficients of variation at the genus level also followed the same trend for lower variation with higher DNA concentrations. Technical variation across both sample types and the two human sampling locations was significantly less than the observed biological variation. Overall, this research provides comparisons between technical and biological variation, highlights the importance of using positive controls, and provides semi-quantified data to better understand variation introduced by sequencing runs. KEY POINTS: • Mock community and positive control accuracy were lower than precision. • Samples with lower DNA concentration had increased technical variation across sequencing runs. • Biological variation was significantly higher than technical variation due to sequencing runs.
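
The coefficient of variation mentioned in this abstract is the standard deviation divided by the mean. The sketch below assumes the input is the relative abundance of one genus in the same control sample across several sequencing runs, and that the sample (n-1) standard deviation is used; the abstract does not say which estimator the authors chose, and the function name is hypothetical.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(abundances: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean of the abundances.

    Requires at least two values and a non-zero mean.
    """
    mean = statistics.mean(abundances)
    if mean == 0:
        raise ValueError("mean abundance is zero; CV is undefined")
    return statistics.stdev(abundances) / mean
```

A lower value for a genus across runs indicates more reproducible measurements, which is the sense in which the abstract links lower variation to higher DNA concentration.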


Subject(s)
DNA, Bacterial , Feces , Microbiota , RNA, Ribosomal, 16S , Sequence Analysis, DNA , Humans , RNA, Ribosomal, 16S/genetics , Feces/microbiology , Microbiota/genetics , Sequence Analysis, DNA/methods , DNA, Bacterial/genetics , Bacteria/genetics , Bacteria/classification , Bacteria/isolation & purification , Reproducibility of Results , Mouth/microbiology , High-Throughput Nucleotide Sequencing/methods
12.
J Cancer Res Clin Oncol ; 150(6): 296, 2024 Jun 08.
Article in English | MEDLINE | ID: mdl-38850363

ABSTRACT

Spatial transcriptomics (ST) provides novel insights into the tumor microenvironment (TME). ST allows the quantification and illustration of gene expression profiles in the spatial context of tissues, including both the cancer cells and the microenvironment in which they are found. In cancer research, ST has already provided novel insights into cancer metastasis, prognosis, and immunotherapy responsiveness. The clinical precision oncology application of next-generation sequencing (NGS) and RNA profiling of tumors relies on bulk methods that lack spatial context. The ability to preserve spatial information is now possible, as it allows us to capture tumor heterogeneity and multifocality. In this narrative review, we summarize precision oncology, discuss tumor sequencing in the clinic, and review the available ST research methods, including seqFISH, MERFISH (Vizgen), CosMx SMI (NanoString), Xenium (10x), Visium (10x), Stereo-seq (STOmics), and GeoMx DSP (NanoString). We then review the current ST literature with a focus on solid tumors organized by tumor type. Finally, we conclude by addressing an important question: how will spatial transcriptomics ultimately help patients with cancer?


Subject(s)
Neoplasms , Transcriptome , Tumor Microenvironment , Humans , Neoplasms/genetics , Neoplasms/pathology , Tumor Microenvironment/genetics , Gene Expression Profiling/methods , Precision Medicine/methods , High-Throughput Nucleotide Sequencing/methods
13.
Folia Biol (Praha) ; 70(1): 62-73, 2024.
Article in English | MEDLINE | ID: mdl-38830124

ABSTRACT

Germline DNA testing using the next-generation sequencing (NGS) technology has become the analytical standard for the diagnostics of hereditary diseases, including cancer. Its increasing use places high demands on correct sample identification, independent confirmation of prioritized variants, and their functional and clinical interpretation. To streamline these processes, we introduced parallel DNA and RNA capture-based NGS using identical capture panel CZECANCA, which is routinely used for DNA analysis of hereditary cancer predisposition. Here, we present the analytical workflow for RNA sample processing and its analytical and diagnostic performance. Parallel DNA/RNA analysis allowed credible sample identification by calculating the kinship coefficient. The RNA capture-based approach enriched transcriptional targets for the majority of clinically relevant cancer predisposition genes to a degree that allowed analysis of the effect of identified DNA variants on mRNA processing. By comparing the panel and whole-exome RNA enrichment, we demonstrated that the tissue-specific gene expression pattern is independent of the capture panel. Moreover, technical replicates confirmed high reproducibility of the tested RNA analysis. We concluded that parallel DNA/RNA NGS using the identical gene panel is a robust and cost-effective diagnostic strategy. In our setting, it allows routine analysis of 48 DNA/RNA pairs using NextSeq 500/550 Mid Output Kit v2.5 (150 cycles) in a single run with sufficient coverage to analyse 226 cancer predisposition and candidate genes. This approach can replace laborious Sanger confirmatory sequencing, increase testing turnaround, reduce analysis costs, and improve interpretation of the impact of variants by analysing their effect on mRNA processing.


Subject(s)
Genetic Predisposition to Disease , High-Throughput Nucleotide Sequencing , Humans , High-Throughput Nucleotide Sequencing/methods , Neoplasms/genetics , Neoplasms/diagnosis , RNA/genetics , Reproducibility of Results , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods , DNA/genetics
14.
PLoS One ; 19(6): e0303938, 2024.
Article in English | MEDLINE | ID: mdl-38843147

ABSTRACT

Oxford Nanopore Technologies (ONT) sequencing is a promising technology. We assessed the performance of the new ONT R10 flowcells and V14 rapid sequencing chemistry for whole genome sequencing of Mycobacterium tuberculosis (Mtb) DNA extracted from clinical primary liquid cultures (CPLCs). Using the recommended protocols for MinION Mk1C, R10.4.1 MinION flowcells, and the ONT Rapid Sequencing Kit V14 on six CPLC samples, we obtained a pooled library yield of 10.9 ng/µl, generated 1.94 Gb of sequenced bases and 214k reads after 48h in a first sequencing run. Only half (49%) of all generated reads met the Phred Quality score threshold (>8). To assess if the low data output and sequence quality were due to impurities present in DNA extracted directly from CPLCs, we added a pre-library preparation bead-clean-up step and included purified DNA obtained from an Mtb subculture as a control sample in a second sequencing run. The library yield for DNA extracted from four CPLCs and one Mtb subculture (control) was similar (10.0 ng/µl); 2.38 Gb of bases and 822k reads were produced. The quality was slightly better with 66% of the produced reads having a Phred Quality >8. A third run of DNA from six CPLCs with bead clean-up pre-processing produced a low library yield (±1 Gb of bases, 166k reads) of low quality (51% of reads with a Phred Quality score >8). A median depth of coverage above 10× was only achieved for five of 17 (29%) sequenced libraries. Compared to Illumina WGS of the same samples, accurate lineage predictions and full drug resistance profiles from the generated ONT data could not be determined by TBProfiler. Further optimization of the V14 ONT rapid sequencing chemistry and library preparation protocol is needed for clinical Mtb WGS applications.
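
To put the Phred Quality threshold (>8) in this abstract into context: a Phred score Q corresponds to an error probability of 10^(-Q/10), so Q = 8 is roughly a 16% per-base error probability. The sketch below shows that conversion and the share of reads passing a threshold. It assumes one mean quality value per read is already available; the function names are hypothetical and this is not the filtering code used in the study.

```python
from typing import Sequence


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to the corresponding error probability."""
    return 10 ** (-q / 10)


def fraction_passing(mean_read_qualities: Sequence[float], threshold: float = 8.0) -> float:
    """Fraction of reads whose mean quality is strictly above the threshold."""
    if not mean_read_qualities:
        raise ValueError("no reads supplied")
    passing = sum(1 for q in mean_read_qualities if q > threshold)
    return passing / len(mean_read_qualities)
```

With hypothetical per-read qualities `[5, 9, 12, 8]`, two of four reads exceed the threshold, comparable in spirit to the roughly 50% pass rates reported for the first and third runs.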


Subject(s)
DNA, Bacterial , Mycobacterium tuberculosis , Mycobacterium tuberculosis/genetics , Humans , DNA, Bacterial/genetics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Nanopores , Nanopore Sequencing/methods , Genome, Bacterial , Whole Genome Sequencing/methods , Tuberculosis/microbiology , Tuberculosis/diagnosis , Gene Library
15.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38851298

ABSTRACT

Deletion is a crucial type of genomic structural variation and is associated with numerous genetic diseases. The advent of third-generation sequencing technology has facilitated the analysis of complex genomic structures and the elucidation of the mechanisms underlying phenotypic changes and disease onset due to genomic variants. Importantly, it has introduced innovative perspectives for deletion variants calling. Here we propose a method named Dual Attention Structural Variation (DASV) to analyze deletion structural variations in sequencing data. DASV converts gene alignment information into images and integrates them with genomic sequencing data through a dual attention mechanism. Subsequently, it employs a multi-scale network to precisely identify deletion regions. Compared with four widely used genome structural variation calling tools: cuteSV, SVIM, Sniffles and PBSV, the results demonstrate that DASV consistently achieves a balance between precision and recall, enhancing the F1 score across various datasets. The source code is available at https://github.com/deconvolution-w/DASV.


Subject(s)
High-Throughput Nucleotide Sequencing , Software , Humans , High-Throughput Nucleotide Sequencing/methods , Sequence Deletion , Sequence Analysis, DNA/methods , Algorithms , Genomics/methods , Computational Biology/methods
16.
Sci Rep ; 14(1): 13069, 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38844820

ABSTRACT

Insertion mutations in exon 20 of the epidermal growth factor receptor gene (EGFR exon20ins) are rare, heterogeneous alterations observed in non-small cell lung cancer (NSCLC). With a few exceptions, they are associated with primary resistance to established EGFR tyrosine kinase inhibitors (TKIs). As patients carrying EGFR exon20ins may be eligible for treatment with novel therapeutics (the bispecific antibody amivantamab, the TKI mobocertinib, or potential future innovations), they need to be identified reliably in clinical practice, for which quality-based routine genetic testing is crucial. Spearheaded by the German Quality Assurance Initiative Pathology, two international proficiency tests were run, assessing the performance of 104 participating institutes detecting EGFR exon20ins in tissue and/or plasma samples. EGFR exon20ins were most reliably identified using next-generation sequencing (NGS). Interestingly, success rates of institutes using commercially available mutation-/allele-specific quantitative (q)PCR were below 30% for tissue samples and 0% for plasma samples. Most of these mutation-/allele-specific (q)PCR assays are not designed to detect the whole spectrum of EGFR exon20ins mutations, leading to false negative results. These data suggest that NGS is a suitable method to detect EGFR exon20ins in various types of patient samples and is superior to the detection spectrum of commercially available assays.


Subject(s)
Carcinoma, Non-Small-Cell Lung , ErbB Receptors , Exons , High-Throughput Nucleotide Sequencing , Lung Neoplasms , Humans , ErbB Receptors/genetics , High-Throughput Nucleotide Sequencing/methods , Carcinoma, Non-Small-Cell Lung/genetics , Lung Neoplasms/genetics , Laboratory Proficiency Testing , Antibodies, Bispecific/therapeutic use , Mutagenesis, Insertional , Protein Kinase Inhibitors/therapeutic use
19.
J Med Virol ; 96(5): e29652, 2024 May.
Article in English | MEDLINE | ID: mdl-38727029

ABSTRACT

Human papillomavirus (HPV) genotyping is widely used, particularly in combination with high-risk (HR) HPV tests for cervical cancer screening. We developed a genotyping method using sequences of approximately 800 bp in the E6/E7 region obtained by PacBio single molecule real-time sequencing (SMRT) and evaluated its performance against MY09-11 L1 sequencing and after the APTIMA HPV genotyping assay. The levels of concordance of PacBio E6/E7 SMRT sequencing with MY09-11 L1 sequencing and APTIMA HPV genotyping were 100% and 90.8%, respectively. The sensitivity of PacBio E6/E7 SMRT was slightly greater than that of L1 sequencing and, as expected, lower than that of HR-HPV tests. In the context of cervical cancer screening, PacBio E6/E7 SMRT is then best used after a positive HPV test. PacBio E6/E7 SMRT genotyping is an attractive alternative for HR and LR-HPV genotyping of clinical samples. PacBio SMRT sequencing provides unbiased genotyping and can detect multiple HPV infections and haplotypes within a genotype.


Subject(s)
Genotype , Genotyping Techniques , Papillomaviridae , Papillomavirus Infections , Humans , Papillomavirus Infections/virology , Papillomavirus Infections/diagnosis , Female , Genotyping Techniques/methods , Papillomaviridae/genetics , Papillomaviridae/classification , Papillomaviridae/isolation & purification , Sensitivity and Specificity , Uterine Cervical Neoplasms/virology , Uterine Cervical Neoplasms/diagnosis , Sequence Analysis, DNA/methods , Early Detection of Cancer/methods , Oncogene Proteins, Viral/genetics , DNA, Viral/genetics , High-Throughput Nucleotide Sequencing/methods
20.
Microbiome ; 12(1): 84, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38725076

ABSTRACT

BACKGROUND: Emergence of antibiotic resistance in bacteria is an important threat to global health. Antibiotic resistance genes (ARGs) are some of the key components to define bacterial resistance and their spread in different environments. Identification of ARGs, particularly from high-throughput sequencing data of the specimens, is the state-of-the-art method for comprehensively monitoring their spread and evolution. Current computational methods to identify ARGs mainly rely on alignment-based sequence similarities with known ARGs. Such approaches are limited by choice of reference databases and may potentially miss novel ARGs. The similarity thresholds are usually simple and could not accommodate variations across different gene families and regions. It is also difficult to scale up when sequence data are increasing. RESULTS: In this study, we developed ARGNet, a deep neural network that incorporates an unsupervised learning autoencoder model to identify ARGs and a multiclass classification convolutional neural network to classify ARGs that do not depend on sequence alignment. This approach enables a more efficient discovery of both known and novel ARGs. ARGNet accepts both amino acid and nucleotide sequences of variable lengths, from partial (30-50 aa; 100-150 nt) sequences to full-length protein or genes, allowing its application in both target sequencing and metagenomic sequencing. Our performance evaluation showed that ARGNet outperformed other deep learning models including DeepARG and HMD-ARG in most of the application scenarios especially quasi-negative test and the analysis of prediction consistency with phylogenetic tree. ARGNet has a reduced inference runtime by up to 57% relative to DeepARG. CONCLUSIONS: ARGNet is flexible, efficient, and accurate at predicting a broad range of ARGs from the sequencing data. ARGNet is freely available at https://github.com/id-bioinfo/ARGNet , with an online service provided at https://ARGNet.hku.hk .


Subject(s)
Bacteria , Neural Networks, Computer , Bacteria/genetics , Bacteria/drug effects , Bacteria/classification , Drug Resistance, Bacterial/genetics , Anti-Bacterial Agents/pharmacology , High-Throughput Nucleotide Sequencing/methods , Computational Biology/methods , Genes, Bacterial/genetics , Drug Resistance, Microbial/genetics , Humans , Deep Learning