Results 1 - 20 of 21,524
1.
Folia Biol (Praha) ; 70(1): 62-73, 2024.
Article in English | MEDLINE | ID: mdl-38830124

ABSTRACT

Germline DNA testing using next-generation sequencing (NGS) technology has become the analytical standard for the diagnostics of hereditary diseases, including cancer. Its increasing use places high demands on correct sample identification, independent confirmation of prioritized variants, and their functional and clinical interpretation. To streamline these processes, we introduced parallel DNA and RNA capture-based NGS using the identical capture panel CZECANCA, which is routinely used for DNA analysis of hereditary cancer predisposition. Here, we present the analytical workflow for RNA sample processing and its analytical and diagnostic performance. Parallel DNA/RNA analysis allowed credible sample identification by calculating the kinship coefficient. The RNA capture-based approach enriched transcriptional targets for the majority of clinically relevant cancer predisposition genes to a degree that allowed analysis of the effect of identified DNA variants on mRNA processing. By comparing panel and whole-exome RNA enrichment, we demonstrated that the tissue-specific gene expression pattern is independent of the capture panel. Moreover, technical replicates confirmed high reproducibility of the tested RNA analysis. We conclude that parallel DNA/RNA NGS using the identical gene panel is a robust and cost-effective diagnostic strategy. In our setting, it allows routine analysis of 48 DNA/RNA pairs using the NextSeq 500/550 Mid Output Kit v2.5 (150 cycles) in a single run, with sufficient coverage to analyse 226 cancer predisposition and candidate genes. This approach can replace laborious confirmatory Sanger sequencing, shorten testing turnaround, reduce analysis costs, and improve interpretation of variant impact by analysing the effect of variants on mRNA processing.
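The abstract does not say which kinship estimator was used. Purely as an illustration, the sketch below assumes a KING-robust-style estimator computed over genotypes called at the same sites in the DNA and the RNA library, coded as alternate-allele counts (0/1/2); the function name `king_robust_kinship` is hypothetical. A DNA/RNA pair from the same person should score close to 0.5, while unrelated samples score near or below 0.

```python
from typing import Optional, Sequence


def king_robust_kinship(g1: Sequence[Optional[int]], g2: Sequence[Optional[int]]) -> float:
    """KING-robust-style kinship between two genotype vectors.

    Genotypes are alternate-allele counts (0, 1, 2); None marks a missing
    call, and such sites are skipped.
    """
    het_het = ibs0 = het1 = het2 = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # only compare sites called in both samples
        het1 += a == 1
        het2 += b == 1
        het_het += a == 1 and b == 1
        ibs0 += {a, b} == {0, 2}  # opposite homozygotes share no allele
    if het1 + het2 == 0:
        raise ValueError("no heterozygous sites to compare")
    return (het_het - 2 * ibs0) / (het1 + het2)


# Identical genotypes (e.g. DNA and RNA from one person) give 0.5.
print(king_robust_kinship([0, 1, 2, 1, 0, 1], [0, 1, 2, 1, 0, 1]))
```

In RNA, allele-specific expression can make a heterozygous site look homozygous, so in practice such a comparison would likely be restricted to well-covered expressed sites (an assumption, not a detail given in the abstract).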


Subject(s)
Genetic Predisposition to Disease , High-Throughput Nucleotide Sequencing , Humans , High-Throughput Nucleotide Sequencing/methods , Neoplasms/genetics , Neoplasms/diagnosis , RNA/genetics , Reproducibility of Results , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods , DNA/genetics
2.
Commun Biol ; 7(1): 675, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38824179

ABSTRACT

The three-dimensional (3D) organization of the genome is fundamental to cell biology. To explore the 3D genome, emerging high-throughput approaches have produced billions of sequencing reads, which are challenging and time-consuming to analyze. Here we present Microcket, a package for mapping and extracting interacting pairs from 3D genomics data, including Hi-C, Micro-C, and derivative protocols. Microcket uses a unique read-stitch strategy that takes advantage of the long read cycles of modern DNA sequencers. Benchmark evaluations reveal that Microcket runs much faster than current tools with improved mapping efficiency, and it therefore shows high potential for accelerating and enhancing biological investigations of the 3D genome. Microcket is freely available at https://github.com/hellosunking/Microcket .
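The abstract does not describe how Microcket's read-stitch strategy works internally. One common form of read stitching, merging the two mates of a pair into a single longer fragment when their ends overlap, can be sketched as below; exact-match overlap and the `min_overlap` parameter are simplifying assumptions, and real tools also tolerate sequencing errors in the overlap.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def stitch_pair(read1: str, read2: str, min_overlap: int = 10) -> Optional[str]:
    """Merge a read pair whose 3' ends overlap exactly; return None otherwise.

    read2 is given as sequenced (opposite strand), so it is reverse-complemented
    to put both mates on the same strand before looking for the overlap.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    mate = reverse_complement(read2)
    # Try the longest overlap first and stop at the first exact match.
    for olen in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-olen:] == mate[:olen]:
            return read1 + mate[olen:]
    return None


fragment = "ACGTTGCAAGGCTTACCGAT"
print(stitch_pair(fragment[:15], reverse_complement(fragment[5:])))  # ACGTTGCAAGGCTTACCGAT
```

Pairs that return None would be kept as separate mates and mapped as a pair.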


Subject(s)
Genomics , Software , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Analysis, DNA/methods , Data Analysis
3.
Gigascience ; 13, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38832466

ABSTRACT

BACKGROUND: Sample swapping due to human error remains a common issue in large cohort studies with heterogeneous data types (e.g., a mix of Oxford Nanopore Technologies, Pacific Biosciences, and Illumina data). At present, all sample swap detection methods require alignment, positional sorting, and indexing of the data before samples can be compared, steps that are costly and otherwise unnecessary (e.g., if the data are used only for genome assembly). As studies include more samples and new sequencing data types, robust quality control tools will become increasingly important. FINDINGS: The similarity between samples can be determined using indexed k-mer sequence variants. To increase statistical power, we use coverage information at variant sites and calculate similarity with a likelihood ratio-based test. Per-sample error rate and coverage bias (i.e., missing sites) can also be estimated from this information. These estimates determine whether a spatially indexed principal component analysis (PCA)-based prescreening method can be applied, which can greatly speed up analysis by avoiding exhaustive all-to-all comparisons. CONCLUSIONS: Because this tool processes raw data, is faster than alignment, and can be used on very low-coverage data, it can save substantial computational resources in standard quality control (QC) pipelines. It is robust enough to be used on different sequencing data types, which is important in studies that leverage the strengths of different sequencing technologies. In addition to its primary use case of sample swap detection, the method also provides information useful for QC, such as error rate and coverage bias, as well as population-level PCA ancestry visualization.
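The FINDINGS describe a likelihood ratio test on coverage at variant sites. The sketch below illustrates that general idea under a simple model chosen here for illustration, not the tool's published method: (ref, alt) read counts per site, a fixed sequencing error rate, Hardy-Weinberg genotype priors from a known alternate-allele frequency strictly between 0 and 1, and a comparison of "same individual" (one shared genotype) against "different individuals" (independent genotypes). Positive scores favour a match; binomial coefficients cancel in the ratio and are omitted.

```python
import math
from typing import List, Sequence, Tuple

Counts = Tuple[int, int]  # (reference-allele reads, alternate-allele reads)


def genotype_likelihoods(counts: Counts, err: float) -> List[float]:
    """P(reads | genotype) for 0, 1, 2 alternate alleles, up to a shared constant."""
    ref, alt = counts
    return [p_alt ** alt * (1 - p_alt) ** ref for p_alt in (err, 0.5, 1 - err)]


def identity_llr(a: Sequence[Counts], b: Sequence[Counts],
                 alt_freqs: Sequence[float], err: float = 0.01) -> float:
    """Sum of per-site log likelihood ratios: same individual vs. unrelated.

    Works in linear space for clarity; very deep sites could underflow, so a
    production version would use log-sum-exp instead.
    """
    llr = 0.0
    for counts_a, counts_b, p in zip(a, b, alt_freqs):
        prior = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]  # Hardy-Weinberg
        la = genotype_likelihoods(counts_a, err)
        lb = genotype_likelihoods(counts_b, err)
        same = sum(w * x * y for w, x, y in zip(prior, la, lb))
        diff = (sum(w * x for w, x in zip(prior, la))
                * sum(w * y for w, y in zip(prior, lb)))
        llr += math.log(same / diff)
    return llr


print(identity_llr([(0, 10), (5, 5)], [(0, 8), (4, 6)], [0.5, 0.5]) > 0)  # True
print(identity_llr([(0, 10)], [(10, 0)], [0.5]) < 0)                     # True
```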


Subject(s)
High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Humans , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Principal Component Analysis , Computational Biology/methods , Algorithms
4.
Microb Genom ; 10(6), 2024 Jun.
Article in English | MEDLINE | ID: mdl-38833287

ABSTRACT

It is now possible to assemble near-perfect bacterial genomes using Oxford Nanopore Technologies (ONT) long reads, but short-read polishing is usually required for perfection. However, the effect of short-read depth on polishing performance is not well understood. Here, we introduce Pypolca (with default and careful parameters) and Polypolish v0.6.0 (with a new careful parameter). We then show that: (1) all polishers other than Pypolca-careful, Polypolish-default and Polypolish-careful commonly introduce false-positive errors at low read depth; (2) most of the benefit of short-read polishing occurs by 25× depth; (3) Polypolish-careful almost never introduces false-positive errors at any depth; and (4) Pypolca-careful is the single most effective polisher. Overall, we recommend the following polishing strategies: Polypolish-careful alone when depth is very low (<5×), Polypolish-careful and Pypolca-careful when depth is low (5-25×), and Polypolish-default and Pypolca-careful when depth is sufficient (>25×).
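The recommendations in the final sentence amount to a small decision rule on short-read depth. The function below simply encodes that rule; treating exactly 5× and 25× as part of the "low" band is our reading of "5-25×", and the returned strings are tool/setting labels from the abstract, not command-line syntax.

```python
from typing import List


def recommended_polishers(short_read_depth: float) -> List[str]:
    """Polishing strategy by short-read depth, following the abstract's recommendations."""
    if short_read_depth < 5:    # very low depth
        return ["Polypolish-careful"]
    if short_read_depth <= 25:  # low depth (5-25x)
        return ["Polypolish-careful", "Pypolca-careful"]
    return ["Polypolish-default", "Pypolca-careful"]  # sufficient depth (>25x)


print(recommended_polishers(12))  # ['Polypolish-careful', 'Pypolca-careful']
```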


Subject(s)
Genome, Bacterial , Nanopores , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Nanopore Sequencing/methods , Bacteria/genetics , Bacteria/classification , Software , Genomics/methods
5.
Mol Biol Rep ; 51(1): 601, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38693276

ABSTRACT

BACKGROUND: Hemibagrus punctatus (Jerdon, 1849) is a critically endangered bagrid catfish endemic to the Western Ghats of India, whose population is declining due to anthropogenic activities. The current study aims to compare the mitogenome of H. punctatus with those of other bagrid catfishes and to provide insights into their evolutionary relationships. METHODS AND RESULTS: Samples were collected from Hemmige, Karnataka, India. In the present study, the mitogenome of H. punctatus was successfully assembled, and its phylogenetic relationships with other Bagridae species were studied. Total genomic DNA was extracted from the samples using the phenol-chloroform-isoamyl alcohol method. Samples were sequenced, and the Illumina paired-end reads were assembled into a contig of 16,517 bp. The mitochondrial genome was annotated using MitoFish and MitoAnnotator (Iwasaki et al., 2013). A robust phylogenetic analysis employing neighbor-joining (NJ; maximum composite likelihood) and ASAP methods supports the placement of H. punctatus within the family Bagridae, validating the taxonomic status of this species. Overall, this research enriches our understanding of the H. punctatus mitogenome, sheds light on its evolutionary dynamics within the Bagridae, and contributes to broader knowledge of mitochondrial genes in evolutionary biology. CONCLUSIONS: The study's findings contribute to a better understanding of the mitogenome of H. punctatus and provide insights into the evolutionary relationships among other Hemibagrus species.


Subject(s)
Catfishes , Endangered Species , Genome, Mitochondrial , Phylogeny , Animals , Genome, Mitochondrial/genetics , Catfishes/genetics , Catfishes/classification , India , Sequence Analysis, DNA/methods , DNA, Mitochondrial/genetics , Evolution, Molecular , RNA, Transfer/genetics
7.
HLA ; 103(5): e15488, 2024 May.
Article in English | MEDLINE | ID: mdl-38699815

ABSTRACT

HLA-C*03:620 differs from the HLA-C*03:04:01:02 allele by one nucleotide substitution in exon 3.
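Describing a new allele as differing from its closest known allele by a single substitution amounts to comparing aligned sequences position by position. A minimal sketch, assuming the two sequences are already aligned and of equal length; the short sequences in the example are made up and are not HLA-C sequences.

```python
from typing import List, Tuple


def substitutions(reference: str, query: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, reference base, query base) for each mismatch."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, r, q) for i, (r, q) in enumerate(zip(reference, query)) if r != q]


print(substitutions("GCTCCCACTCC", "GCTCCGACTCC"))  # [(6, 'C', 'G')]
```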


Subject(s)
Alleles , Asian People , Base Sequence , Exons , HLA-C Antigens , Histocompatibility Testing , Humans , HLA-C Antigens/genetics , Asian People/genetics , Sequence Analysis, DNA/methods , Codon , Sequence Alignment , Polymorphism, Single Nucleotide , East Asian People
10.
HLA ; 103(5): e15498, 2024 May.
Article in English | MEDLINE | ID: mdl-38699849

ABSTRACT

Genomic full-length sequence of HLA-B*37:46 was identified by a group-specific sequencing approach in a Chinese individual.


Subject(s)
Alleles , Asian People , HLA-B Antigens , Histocompatibility Testing , Sequence Analysis, DNA , Humans , HLA-B Antigens/genetics , Sequence Analysis, DNA/methods , Histocompatibility Testing/methods , Asian People/genetics , Exons , Base Sequence
11.
Brief Bioinform ; 25(3), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38701418

ABSTRACT

Coverage quantification is required for many sequencing datasets in genomics research. However, most existing tools fail to provide comprehensive statistical results and gain little performance from multithreading. Here, we present PanDepth, an ultra-fast and efficient tool for calculating coverage and depth from sequencing alignments. PanDepth outperforms other tools in computation time and memory efficiency for both BAM- and CRAM-format alignment files, regardless of read length. It employs chromosome-level parallel computation and optimized data structures, resulting in ultrafast computation and low memory use. It accepts sorted or unsorted BAM and CRAM alignment files as well as GTF-, GFF- and BED-formatted interval files or a specified window size. When provided with a reference genome sequence and with GC content calculation enabled, PanDepth also reports GC content statistics, enhancing the accuracy and reliability of copy number variation analysis. Overall, PanDepth is a powerful tool that accelerates scientific discovery in genomics research.
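As a reminder of what the headline statistics mean: mean depth is the number of aligned bases divided by region length, and coverage is the fraction of positions reached by at least `min_depth` reads. The sketch below computes both for one chromosome from already-extracted alignment intervals using a difference array, plus a GC fraction helper. It is a toy illustration, not PanDepth's implementation; BAM/CRAM parsing is omitted. Per the abstract, PanDepth parallelizes across chromosomes, and a per-chromosome function like this is the kind of unit such parallelism would distribute.

```python
from typing import Dict, Iterable, Tuple


def depth_stats(intervals: Iterable[Tuple[int, int]], chrom_len: int,
                min_depth: int = 1) -> Dict[str, float]:
    """Mean depth and coverage of one chromosome from 0-based, half-open read intervals."""
    diff = [0] * (chrom_len + 1)
    for start, end in intervals:
        diff[start] += 1  # a read starts covering here
        diff[end] -= 1    # ...and stops covering here
    depth = total = covered = 0
    for pos in range(chrom_len):
        depth += diff[pos]  # running sum = depth at pos
        total += depth
        covered += depth >= min_depth
    return {"mean_depth": total / chrom_len, "coverage": covered / chrom_len}


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (N and others ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


print(depth_stats([(0, 5), (3, 8)], chrom_len=10))  # {'mean_depth': 1.0, 'coverage': 0.8}
print(gc_fraction("ACGTNN"))                        # 0.5
```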


Subject(s)
Genomics , Software , Genomics/methods , Humans , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Base Composition , DNA Copy Number Variations , Computational Biology/methods , Algorithms , Sequence Alignment/methods
13.
J Med Virol ; 96(5): e29652, 2024 May.
Article in English | MEDLINE | ID: mdl-38727029

ABSTRACT

Human papillomavirus (HPV) genotyping is widely used, particularly in combination with high-risk (HR) HPV tests for cervical cancer screening. We developed a genotyping method using sequences of approximately 800 bp in the E6/E7 region obtained by PacBio single-molecule real-time (SMRT) sequencing and evaluated its performance against MY09-11 L1 sequencing and the APTIMA HPV genotyping assay. The concordance of PacBio E6/E7 SMRT sequencing with MY09-11 L1 sequencing and APTIMA HPV genotyping was 100% and 90.8%, respectively. The sensitivity of PacBio E6/E7 SMRT was slightly higher than that of L1 sequencing and, as expected, lower than that of HR-HPV tests. In the context of cervical cancer screening, PacBio E6/E7 SMRT is therefore best used after a positive HPV test. PacBio E6/E7 SMRT genotyping is an attractive alternative for HR- and low-risk (LR) HPV genotyping of clinical samples. PacBio SMRT sequencing provides unbiased genotyping and can detect multiple HPV infections and haplotypes within a genotype.
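The concordance and sensitivity figures above are simple proportions. The sketch below shows one way to compute them, assuming each sample's result is the set of HPV types detected (so multiple infections compare as sets) and that concordance requires an identical set; the abstract does not state its exact concordance criterion, and the example calls are invented.

```python
from typing import AbstractSet, Optional, Sequence


def percent_concordance(calls_a: Sequence[Optional[AbstractSet[str]]],
                        calls_b: Sequence[Optional[AbstractSet[str]]]) -> float:
    """Percentage of samples, genotyped by both methods, with identical HPV type sets."""
    pairs = [(a, b) for a, b in zip(calls_a, calls_b) if a is not None and b is not None]
    return 100 * sum(a == b for a, b in pairs) / len(pairs)


def sensitivity(test_positive: Sequence[bool], truly_positive: Sequence[bool]) -> float:
    """TP / (TP + FN) against a reference classification."""
    tp = sum(t and r for t, r in zip(test_positive, truly_positive))
    fn = sum((not t) and r for t, r in zip(test_positive, truly_positive))
    return tp / (tp + fn)


# None = sample not genotyped by that method; it is excluded from the comparison.
print(percent_concordance([{"16"}, {"18", "31"}, None], [{"16"}, {"18"}, {"52"}]))  # 50.0
```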


Subject(s)
Genotype , Genotyping Techniques , Papillomaviridae , Papillomavirus Infections , Humans , Papillomavirus Infections/virology , Papillomavirus Infections/diagnosis , Female , Genotyping Techniques/methods , Papillomaviridae/genetics , Papillomaviridae/classification , Papillomaviridae/isolation & purification , Sensitivity and Specificity , Uterine Cervical Neoplasms/virology , Uterine Cervical Neoplasms/diagnosis , Sequence Analysis, DNA/methods , Early Detection of Cancer/methods , Oncogene Proteins, Viral/genetics , DNA, Viral/genetics , High-Throughput Nucleotide Sequencing/methods
14.
Mol Biol Rep ; 51(1): 639, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38727924

ABSTRACT

BACKGROUND: Peucedani Radix, also known as "Qian-hu", is a traditional Chinese medicine derived from Peucedanum praeruptorum Dunn. It is widely used to treat wind-heat colds and coughs accompanied by excessive phlegm. However, because of morphological similarities, limited resources, and heightened market demand, numerous substitutes and adulterants of Peucedani Radix have emerged on the herbal medicine market. Moreover, Peucedani Radix is typically dried and sliced for sale, which makes traditional identification methods challenging. MATERIALS AND METHODS: We examined and compared 104 commercial "Qian-hu" samples from various Chinese medicinal markets and 44 species representing the genuine source, adulterants, or substitutes, using the ITS2 mini-barcode region to elucidate the botanical origins of commercial "Qian-hu". A nucleotide signature specific to Peucedani Radix was then developed by analyzing the polymorphic sites within the aligned ITS2 sequences. RESULTS: DNA extraction and PCR amplification succeeded in 100% and 93.3% of samples, respectively. Forty-five samples were authentic "Qian-hu", while the remaining samples were all adulterants originating from nine distinct species. Peucedani Radix, its substitutes, and adulterants were successfully distinguished on the neighbor-joining tree. The 24-bp nucleotide signature (5'-ATTGTCGTACGAATCCTCGTCGTC-3') revealed distinct differences between Peucedani Radix and its common substitutes and adulterants. The newly designed specific primers (PR-F/PR-R) can amplify the nucleotide signature region from commercial samples and processed materials with severe DNA degradation. CONCLUSIONS: We advocate the use of ITS2 and the nucleotide signature for rapid and precise identification of herbal medicines and their adulterants to help regulate the Chinese herbal medicine industry.
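Screening a sequence for the 24-bp signature reported above reduces to a string search on both strands. A minimal sketch, assuming the input is an ITS2 read or contig as plain text; the `max_mismatches` tolerance is an illustrative addition, not a published threshold, and in the study the signature region is amplified with the PR-F/PR-R primers.

```python
from typing import List, Tuple

SIGNATURE = "ATTGTCGTACGAATCCTCGTCGTC"  # 24-bp Peucedani Radix signature (5'->3')
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_signature(seq: str, signature: str = SIGNATURE,
                   max_mismatches: int = 0) -> List[Tuple[str, int]]:
    """Return (strand, 0-based start) for every window matching the signature."""
    seq = seq.upper()
    targets = {"+": signature, "-": reverse_complement(signature)}
    k = len(signature)
    hits = []
    for pos in range(len(seq) - k + 1):
        window = seq[pos:pos + k]
        for strand, target in targets.items():
            mismatches = sum(x != y for x, y in zip(window, target))
            if mismatches <= max_mismatches:
                hits.append((strand, pos))
    return hits


print(find_signature("GGG" + SIGNATURE + "TTT"))  # [('+', 3)]
```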


Subject(s)
DNA Barcoding, Taxonomic , DNA, Plant , DNA, Plant/genetics , DNA Barcoding, Taxonomic/methods , Drugs, Chinese Herbal/standards , Apiaceae/genetics , Apiaceae/classification , Medicine, Chinese Traditional/standards , DNA, Ribosomal Spacer/genetics , Drug Contamination , Plants, Medicinal/genetics , Phylogeny , Sequence Analysis, DNA/methods , Polymerase Chain Reaction/methods , Nucleotides/genetics , Nucleotides/analysis
15.
Sci Rep ; 14(1): 10217, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38702416

ABSTRACT

Mitochondrial DNA sequences are frequently transferred into the nuclear genome, generating nuclear mitochondrial DNA sequences (NUMTs). Here, we analysed NUMTs in the domestic yak genome for the first time. We obtained 499 alignment matches covering 340.2 kbp of the yak nuclear genome. After a merging step, we identified 167 NUMT regions with a total length of ~503 kbp, representing 0.02% of the nuclear genome. We found copies of all mitochondrial regions, and most NUMT regions were intergenic or intronic and largely untranscribed. Ninety-eight distinct NUMT regions from domestic yak showed high homology with the cow and/or wild yak genomes, suggesting selection or hybridization between domestic/wild yak and cow. To rule out the possibility that the identified NUMTs were artifacts of the domestic yak genome assembly, we experimentally validated five NUMT regions by PCR amplification. Because NUMT regions are highly similar to the mitochondrial genome, they can confound mitochondrial DNA studies in domestic yak; special care is therefore needed when selecting primers for PCR amplification of mitochondrial DNA sequences.
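The "merging step" is not specified in the abstract. One common way to merge alignment hits into regions is to sort the intervals and join those that overlap or lie within a gap threshold, sketched below with a hypothetical `max_gap` parameter. Joining hits across gaps is one way a merged total (here ~503 kbp) can exceed the summed length of the raw matches (340.2 kbp), though the abstract does not say that this is the reason.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) on one chromosome


def merge_intervals(intervals: List[Interval], max_gap: int = 0) -> List[Interval]:
    """Merge intervals that overlap or are separated by at most max_gap bases."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + max_gap:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current region
        else:
            merged.append([start, end])              # start a new region
    return [(s, e) for s, e in merged]


def total_length(intervals: List[Interval]) -> int:
    return sum(end - start for start, end in intervals)


hits = [(10, 20), (15, 30), (100, 120)]
print(merge_intervals(hits))              # [(10, 30), (100, 120)]
print(merge_intervals(hits, max_gap=80))  # [(10, 120)]
```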


Subject(s)
Cell Nucleus , DNA, Mitochondrial , Genome, Mitochondrial , Animals , Cattle/genetics , DNA, Mitochondrial/genetics , Cell Nucleus/genetics , Animals, Domestic/genetics , Sequence Analysis, DNA/methods
16.
Appl Microbiol Biotechnol ; 108(1): 319, 2024 May 06.
Article in English | MEDLINE | ID: mdl-38709303

ABSTRACT

Shotgun metagenomic sequencing experiments are finding a wide range of applications. Nonetheless, there are still limited guidelines regarding the number of sequences needed to acquire meaningful information for taxonomic profiling and antimicrobial resistance gene (ARG) identification. In this study, we explored this issue in the context of the oral microbiota by sequencing four human plaque samples and one microbial community standard at a very high number of sequences (~100 million) and by evaluating the performance of microbial identification and ARG detection through a downsampling procedure. When investigating the impact of a decreasing number of sequences on quantitative taxonomic profiling in the microbial community standard datasets, we found some discrepancies between the identified microbial species and their abundances and the expected ones. These differences were consistent throughout downsampling, suggesting that they stem from limitations of the taxonomic profiling methods. Overall, the number of sequences had a strong impact on metagenomic samples at the qualitative (i.e., presence/absence) level in terms of loss of information, especially in experiments with fewer than 40 million reads, whereas abundance estimation was minimally affected, with only slight variations in low-abundance species. The presence of ARGs was also assessed: a total of 133 ARGs were identified. Notably, 23% of them were inconsistently detected as present or absent across downsampled datasets of the same sample. Moreover, over half of the ARGs were lost in datasets with fewer than 20 million reads. This study highlights the importance of carefully considering sequencing depth and suggests guidelines for designing shotgun metagenomics experiments that maximize the value of oral microbiome analyses. Our findings suggest different optimal numbers of sequences for different study aims: 40 million for microbiota profiling, 50 million for low-abundance species detection, and 20 million for ARG identification. KEY POINTS:
• Forty million sequences are a cost-efficient solution for microbiota profiling.
• Fifty million sequences allow low-abundance species detection.
• Twenty million sequences are recommended for ARG identification.
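The downsampling procedure can be illustrated by randomly drawing reads without replacement and recording which taxa (or ARGs) still reach a detection threshold. In the sketch below, each read is represented only by its assigned label, the labels and counts are invented, and `min_reads` is a hypothetical threshold; the study's actual profiling tools and thresholds are not given in the abstract.

```python
import random
from collections import Counter
from typing import List, Set


def detected_labels(read_labels: List[str], n_reads: int,
                    min_reads: int = 1, seed: int = 0) -> Set[str]:
    """Labels (taxa or ARGs) supported by >= min_reads reads in a random subsample."""
    rng = random.Random(seed)                     # fixed seed for reproducibility
    subsample = rng.sample(read_labels, n_reads)  # draw without replacement
    counts = Counter(subsample)
    return {label for label, count in counts.items() if count >= min_reads}


reads = ["taxon_A"] * 900 + ["taxon_B"] * 99 + ["rare_taxon"] * 1
for depth in (1000, 100, 10):
    print(depth, sorted(detected_labels(reads, depth)))
```

Rare labels are the first to drop out as the subsample shrinks, which mirrors the presence/absence losses described above, while the proportions of abundant labels stay roughly stable.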


Subject(s)
Bacteria , Dental Plaque , Metagenomics , Microbiota , Humans , Metagenomics/methods , Dental Plaque/microbiology , Microbiota/genetics , Bacteria/genetics , Bacteria/classification , Bacteria/isolation & purification , Drug Resistance, Bacterial/genetics , Sequence Analysis, DNA/methods , Metagenome
17.
BMC Genom Data ; 25(1): 44, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38714950

ABSTRACT

BACKGROUND: China has thousands of years of goat breeding and abundant goat genetic resources, and the Hainan black goat is one of the high-quality local goat breeds in China. To conserve the germplasm resources of the Hainan black goat, facilitate its genetic improvement, and further protect the genetic diversity of goats, there is an urgent need to develop a single nucleotide polymorphism (SNP) chip for the Hainan black goat. RESULTS: In this study, we designed a 10K liquid chip for the Hainan black goat based on genotyping by pinpoint sequencing of liquid-captured targets (cGPS). A total of 45,588 candidate SNP sites were obtained, from which 10,677 representative SNP sites were selected for probe design; the probes finally covered 9,993 intervals and formed the 10K cGPS liquid chip for the Hainan black goat. To verify the chip, several southern Chinese goat breeds and a sheep breed with a phenotype similar to that of the Hainan black goat were selected, and a total of 104 samples were used to verify its clustering ability. The site detection rate was 97.34%-99.93%, and 84.5% of SNP sites were polymorphic. The heterozygosity rate was 3.08%-36.80%. More than 99.4% of sites had a depth above 10×. The repetition rate was 99.66%-99.82%. The average concordance between the cGPS liquid chip results and resequencing results was 85.58%. In addition, phylogenetic tree clustering analysis verified that the SNP sites on the chip had good clustering ability. CONCLUSION: These results indicate that we have successfully developed and verified the 10K cGPS liquid chip for the Hainan black goat, which provides a useful tool for genome analysis of this breed. Moreover, the chip is conducive to the research and protection of Hainan black goat germplasm resources and lays a solid foundation for subsequent breeding work.
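Several of the validation metrics above are simple per-sample or per-site proportions. A minimal sketch, assuming genotypes coded as alternate-allele counts (0/1/2) with None for a failed call; the abstract does not give the exact formulas, so these definitions (e.g. concordance computed only over sites called on both platforms) are assumptions.

```python
from typing import Optional, Sequence

Genotypes = Sequence[Optional[int]]


def detection_rate(genotypes: Genotypes) -> float:
    """Fraction of sites with a genotype call."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def heterozygosity_rate(genotypes: Genotypes) -> float:
    """Fraction of called sites that are heterozygous."""
    called = [g for g in genotypes if g is not None]
    return sum(g == 1 for g in called) / len(called)


def concordance(chip: Genotypes, reseq: Genotypes) -> float:
    """Fraction of sites called on both platforms that have identical genotypes."""
    both = [(a, b) for a, b in zip(chip, reseq) if a is not None and b is not None]
    return sum(a == b for a, b in both) / len(both)


sample = [0, 1, None, 2, 1]
print(detection_rate(sample), heterozygosity_rate(sample))  # 0.8 0.5
```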


Subject(s)
Goats , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide , Animals , Goats/genetics , Polymorphism, Single Nucleotide/genetics , Oligonucleotide Array Sequence Analysis/methods , China , Genotyping Techniques/methods , Genotype , Sequence Analysis, DNA/methods , Breeding/methods
18.
BMC Bioinformatics ; 25(1): 180, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38720249

ABSTRACT

BACKGROUND: High-throughput sequencing (HTS) has become the gold standard approach for variant analysis in cancer research. However, somatic variants may occur at low fractions due to contamination from normal cells or tumor heterogeneity, which poses a significant challenge for standard HTS analysis pipelines. The problem is exacerbated in scenarios with minimal tumor DNA, such as circulating tumor DNA in plasma. Assessing the sensitivity and detection limits of HTS approaches in such cases is paramount but time-consuming and expensive, because specialized experimental protocols and a sufficient quantity of samples are required for processing and analysis. To overcome these limitations, we propose a new computational approach specifically designed to generate artificial datasets suitable for this task, simulating ultra-deep targeted sequencing data with low-fraction variants, and we demonstrate their effectiveness in benchmarking low-fraction variant calling. RESULTS: Our approach generates artificial raw reads that mimic real data without relying on pre-existing data, using NEAT, a fine-grained read simulator that generates artificial datasets with models learned from multiple different datasets. It then incorporates low-fraction variants to simulate somatic mutations in samples with minimal tumor DNA content. To demonstrate the suitability of the resulting artificial datasets for benchmarking low-fraction variant calling, we used them as ground truth to evaluate the performance of widely used variant calling algorithms; they allowed us to define tuned parameter values for major variant callers, considerably improving their detection of very low-fraction variants.
CONCLUSIONS: Our findings highlight both the pivotal role of our approach in creating adequate artificial datasets with low tumor fraction, facilitating rapid prototyping and benchmarking of algorithms for this type of dataset, and the pressing need to advance low-fraction variant calling techniques.
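Why low-fraction variants are hard to call even at ultra-deep coverage can be seen from a simple binomial model: at depth n and variant allele fraction f, the number of variant-supporting reads is roughly Binomial(n, f), so a caller that requires at least k supporting reads detects the variant with probability P(X ≥ k). This idealized model ignores sequencing errors and the other effects that read simulators such as NEAT model, and the parameter values below are illustrative only.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(X >= min_alt_reads) for X ~ Binomial(depth, vaf)."""
    # Probability of seeing fewer alt reads than the caller's threshold.
    p_below = sum(comb(depth, k) * vaf ** k * (1 - vaf) ** (depth - k)
                  for k in range(min_alt_reads))
    return 1.0 - p_below


for vaf in (0.05, 0.01, 0.001):
    print(vaf, round(detection_probability(depth=2000, vaf=vaf, min_alt_reads=5), 3))
```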


Subject(s)
Benchmarking , High-Throughput Nucleotide Sequencing , Neoplasms , High-Throughput Nucleotide Sequencing/methods , Humans , Neoplasms/genetics , Mutation , Algorithms , DNA, Neoplasm/genetics , Sequence Analysis, DNA/methods , Computational Biology/methods
20.
Microb Genom ; 10(5), 2024 May.
Article in English | MEDLINE | ID: mdl-38713194

ABSTRACT

Whole-genome reconstruction of bacterial pathogens has become an important tool for tracking transmission and antimicrobial resistance gene spread, but highly accurate and complete assemblies have historically been achievable largely only with hybrid long- and short-read sequencing. We previously found that the Oxford Nanopore Technologies (ONT) R10.4/kit12 flowcell/chemistry produced improved assemblies over the R9.4.1/kit10 combination; however, long-read-only assemblies contained more errors than Illumina-ONT hybrid assemblies. ONT has since released an R10.4.1/kit14 flowcell/chemistry upgrade and recommended the use of bovine serum albumin (BSA) during library preparation, both of which reportedly increase accuracy and yield. It has also released updated basecallers trained on native bacterial DNA containing methylation sites, intended to fix systematic basecalling errors, including common adenosine (A) to guanine (G) and cytosine (C) to thymine (T) substitutions. To evaluate these improvements, we sequenced four bacterial reference strains, namely Escherichia coli, Klebsiella pneumoniae, Pseudomonas aeruginosa and Staphylococcus aureus, and nine genetically diverse E. coli bloodstream infection-associated isolates from different phylogroups and sequence types, both with and without BSA. These sequences were assembled de novo and compared against Illumina-corrected reference genomes. In this small evaluation of 13 isolates, we found that nanopore long-read-only R10.4.1/kit14 assemblies with updated basecallers trained on bacterial methylated DNA are accurate at ≥40× depth, sufficient to be cost-effective compared with hybrid ONT/Illumina sequencing in our setting.
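Assembly accuracy in comparisons like this is often summarized as a Phred-scaled quality score, Q = -10 log10(errors / assembly length), and read depth as sequenced bases divided by genome size. The helpers below apply these standard definitions; the example numbers are illustrative and are not results from the study.

```python
import math


def assembly_qscore(errors: int, assembly_length: int) -> float:
    """Phred-scaled consensus accuracy; an error-free assembly is reported as infinity."""
    if errors == 0:
        return math.inf
    return -10 * math.log10(errors / assembly_length)


def mean_depth(total_sequenced_bases: int, genome_length: int) -> float:
    """Average read depth (fold coverage) over the genome."""
    return total_sequenced_bases / genome_length


print(round(assembly_qscore(errors=5, assembly_length=5_000_000), 1))          # 60.0
print(mean_depth(total_sequenced_bases=200_000_000, genome_length=5_000_000))  # 40.0
```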


Subject(s)
Genome, Bacterial , Nanopores , High-Throughput Nucleotide Sequencing/methods , Escherichia coli/genetics , Staphylococcus aureus/genetics , Sequence Analysis, DNA/methods , Pseudomonas aeruginosa/genetics , Nanopore Sequencing/methods , DNA, Bacterial/genetics , Klebsiella pneumoniae/genetics , Whole Genome Sequencing/methods , Bacteria/genetics , Bacteria/classification , Humans