Results 1 - 10 of 10
1.
Comput Struct Biotechnol J ; 23: 2289-2303, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38840832

ABSTRACT

The rapid progression of genomics and proteomics has been driven by the advent of advanced sequencing technologies, large, diverse, and readily available omics datasets, and the evolution of computational data processing capabilities. The vast amount of data generated by these advancements necessitates efficient algorithms to extract meaningful information. K-mers serve as a valuable tool when working with large sequencing datasets, offering several advantages in computational speed and memory efficiency and carrying the potential for intrinsic biological functionality. This review provides an overview of the methods, applications, and significance of k-mers in genomic and proteomic data analyses, as well as the utility of absent sequences, including nullomers and nullpeptides, in disease detection, vaccine development, therapeutics, and forensic science. Therefore, the review highlights the pivotal role of k-mers in addressing current genomic and proteomic problems and underscores their potential for future breakthroughs in research.
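As an illustration of the k-mer idea summarized above, the following is a minimal sketch (not taken from the review) of counting overlapping k-mers and listing the k-mers of a fixed length that never occur in a sequence. The function names `count_kmers` and `absent_kmers` and the toy sequence are hypothetical. The sketch ignores reverse complements and does not search for the shortest absent length, which a true nullomer search would require.

```python
from collections import Counter
from itertools import product


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def absent_kmers(sequence: str, k: int, alphabet: str = "ACGT") -> list[str]:
    """List all k-mers over the alphabet that never occur in the sequence."""
    present = count_kmers(sequence, k)
    candidates = ("".join(letters) for letters in product(alphabet, repeat=k))
    return [kmer for kmer in candidates if kmer not in present]


# Toy input, invented for illustration only.
toy = "ACGTACGTTGCA"
counts = count_kmers(toy, 2)
missing = absent_kmers(toy, 2)
```

A `Counter` keeps memory proportional to the number of distinct k-mers observed, which is why hashing-based counting is a common starting point before moving to more compact structures.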

2.
Comput Struct Biotechnol J ; 23: 1919-1928, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38711760

ABSTRACT

The decrease in sequencing expenses has facilitated the creation of reference genomes and proteomes for an expanding array of organisms. Nevertheless, no established repository that details organism-specific genomic and proteomic sequences of specific lengths, referred to as kmers, exists to our knowledge. In this article, we present kmerDB, a database accessible through an interactive web interface that provides kmer-based information from genomic and proteomic sequences in a systematic way. kmerDB currently contains 202,340,859,107 base pairs and 19,304,903,356 amino acids, spanning 54,039 and 21,865 reference genomes and proteomes, respectively, as well as 6,905,362 and 149,305,183 genomic and proteomic species-specific sequences, termed quasi-primes. Additionally, we provide access to 5,186,757 nucleic and 214,904,089 peptide sequences absent from every genome and proteome, termed primes. kmerDB features a user-friendly interface offering various search options and filters for easy parsing and searching. The service is available at: www.kmerdb.com.
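The two categories named in this abstract can be sketched with plain set operations. This is an assumption-laden toy, not kmerDB's implementation: here a quasi-prime is taken to be a k-mer of a fixed length found in exactly one species of the input, and a prime is a k-mer found in none of them. The names `kmer_set`, `quasi_primes`, `primes` and the two toy genomes are hypothetical.

```python
from itertools import product


def kmer_set(sequence: str, k: int) -> set[str]:
    """Collect the distinct overlapping k-mers of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def quasi_primes(genomes: dict[str, str], k: int) -> dict[str, set[str]]:
    """For each species, keep the k-mers that occur in no other species."""
    sets = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    result = {}
    for name, kmers in sets.items():
        others = set().union(*(s for other, s in sets.items() if other != name))
        result[name] = kmers - others
    return result


def primes(genomes: dict[str, str], k: int, alphabet: str = "ACGT") -> set[str]:
    """Return the k-mers over the alphabet that occur in none of the genomes."""
    observed = set().union(*(kmer_set(seq, k) for seq in genomes.values()))
    universe = {"".join(letters) for letters in product(alphabet, repeat=k)}
    return universe - observed


# Toy genomes, invented for illustration only.
toy_genomes = {"species_a": "AACCG", "species_b": "CCGGT"}
specific = quasi_primes(toy_genomes, 2)
never_seen = primes(toy_genomes, 2)
```

Enumerating the full universe of k-mers grows as the alphabet size raised to the power k, so this approach is only practical for short k.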

3.
NAR Genom Bioinform ; 6(2): lqae029, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38584871

ABSTRACT

The prevalence of nucleic and peptide short sequences across organismal genomes and proteomes has not been thoroughly investigated. We examined 45 785 reference genomes and 21 871 reference proteomes, spanning archaea, bacteria, eukaryotes and viruses to calculate the rarity of short sequences in them. To capture this, we developed a metric of the rarity of each sequence in nature, the rarity index. We find that the frequency of certain dipeptides in rare oligopeptide sequences is hundreds of times lower than expected, which is not the case for any dinucleotides. We also generate predictive regression models that infer the rarity of nucleic and proteomic sequences across nature or within each domain of life and viruses separately. When examining each of the three domains of life and viruses separately, the R² performance of the model predicting rarity for 5-mer peptides from mono- and dipeptides ranged between 0.814 and 0.932. A separate model predicting rarity for 10-mer oligonucleotides from mono- and dinucleotides achieved R² performance between 0.408 and 0.606. Our results indicate that the mono- and dinucleotide composition of nucleic sequences and the mono- and dipeptide composition of peptide sequences can explain a significant proportion of the variance in their frequencies in nature.

4.
Sci Adv ; 10(13): eadi4393, 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38536919

ABSTRACT

The Drosophila brain contains tens of thousands of distinct cell types. Thousands of different transgenic lines reproducibly target specific neuron subsets, yet most still express in several cell types. Furthermore, most lines were developed without a priori knowledge of where the transgenes would be expressed. To aid in the development of cell type-specific tools for neuronal identification and manipulation, we developed an iterative assay for transposase-accessible chromatin (ATAC) approach. Open chromatin regions (OCRs) enriched in neurons, compared to whole bodies, drove transgene expression preferentially in subsets of neurons. A second round of ATAC-seq from these specific neuron subsets revealed additional enriched OCR2s that further restricted transgene expression within the chosen neuron subset. This approach allows for continued refinement of transgene expression, and we used it to identify neurons relevant for sleep behavior. Furthermore, this approach is widely applicable to other cell types and to other organisms.


Subject(s)
Chromatin , Transposases , Chromatin/genetics , Transposases/genetics , Transposases/metabolism , High-Throughput Nucleotide Sequencing , Chromatin Immunoprecipitation Sequencing , Neurons/metabolism , Sequence Analysis, DNA
5.
Cancer Gene Ther ; 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38351138

ABSTRACT

Early detection of cancer can significantly improve patient outcomes; however, sensitive and highly specific biomarkers for cancer detection are currently missing. Nullomers are the shortest sequences that are absent from the human genome but can emerge due to somatic mutations in cancer. We examine over 10,000 whole exome sequencing matched tumor-normal samples to characterize nullomer emergence across exonic regions of the genome. We also identify nullomer emerging mutational hotspots within tumor genes. Finally, we provide evidence for the identification of nullomers in cell-free RNA from peripheral blood samples, enabling detection of multiple tumor types. We show multiple tumor classification models with an AUC greater than 0.9, including a hepatocellular carcinoma classifier with an AUC greater than 0.99.
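The notion of a sequence that is absent from a reference but emerges through a mutation can be sketched as follows. This is a toy under stated assumptions, not the study's pipeline: it handles a single-base substitution, uses a fixed k, ignores reverse complements, and treats "absent" as absent from the toy reference string only. The function names and the example sequence are hypothetical.

```python
def kmer_set(sequence: str, k: int) -> set[str]:
    """Collect the distinct overlapping k-mers of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def emerging_absent_kmers(reference: str, position: int, alt_base: str, k: int) -> set[str]:
    """Return k-mers created by a substitution that occur nowhere in the reference."""
    mutated = reference[:position] + alt_base + reference[position + 1:]
    # Only k-mers overlapping the mutated position can be new.
    start = max(0, position - k + 1)
    end = min(len(mutated), position + k)
    window = mutated[start:end]
    return kmer_set(window, k) - kmer_set(reference, k)


# Toy reference and substitution (T -> G at 0-based position 3), invented for illustration.
toy_reference = "ACGTACGT"
new_kmers = emerging_absent_kmers(toy_reference, position=3, alt_base="G", k=3)
```

Restricting the scan to the window around the variant means each mutation requires checking at most k candidate k-mers against the reference set.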

6.
Lung Cancer ; 186: 107424, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37979487

ABSTRACT

INTRODUCTION: NELSON and NLST prompted the implementation of lung cancer screening programs in the United States followed by several European countries. This study aimed to assess the sensitivity of different screening criteria among patients with lung cancer in Greece and investigate reasons for ineligibility. METHODS: We performed a retrospective analysis on patients with lung cancer referred to the largest referral center in Athens, Greece, between June 2014 and May 2023. The proportion of patients who would meet the updated USPSTF and NLST criteria was compared to the corresponding proportion of the Greek population over 15 years of age. RESULTS: Out of 2434 patients with lung cancer, 77.4 % (N = 1883) would meet the updated USPSTF criteria, and 58.9 % (N = 1439) would meet the NLST criteria at diagnosis; the corresponding proportions for the Greek population over 15 years would be 13.8 % and 8.2 %, respectively. Ineligible patients were more likely to be female, former or never-smokers, have adenocarcinoma histology, and have driver mutations (p < 0.001). CONCLUSIONS: Although the updated USPSTF criteria demonstrated good sensitivity, a substantial proportion of patients with lung cancer would still not be eligible for screening. Future studies to shape a comprehensive screening strategy should focus on the incorporation of additional risk factors for lung cancer, including air pollution and individual genetic susceptibility.


Subject(s)
Lung Neoplasms , Humans , Female , United States , Male , Lung Neoplasms/diagnostic imaging , Lung Neoplasms/epidemiology , Greece/epidemiology , Retrospective Studies , Early Detection of Cancer , Smoking/adverse effects , Mass Screening , Tomography, X-Ray Computed
7.
Genes (Basel) ; 13(10), 2022 Oct 16.
Article in English | MEDLINE | ID: mdl-36292760

ABSTRACT

There is growing interest in saliva microRNAs (miRNAs) as non-invasive biomarkers for human disease. Such an approach requires understanding how differences in experimental design affect miRNA expression. Variations in technical methodologies, coupled with inter-individual variability may reduce study reproducibility and generalizability. Another barrier facing salivary miRNA biomarker research is a lack of recognized "control miRNAs". In one of the largest studies of human salivary miRNA to date (922 healthy individuals), we utilized 1225 saliva samples to quantify variability in miRNA expression resulting from aligner selection (Bowtie1 vs. Bowtie2), saliva collection method (expectorated vs. swabbed), RNA stabilizer (presence vs. absence), and individual biological factors (sex, age, body mass index, exercise, caloric intake). Differential expression analyses revealed that absence of RNA stabilizer introduced the greatest variability, followed by differences in methods of collection and aligner. Biological factors generally affected a smaller number of miRNAs. We also reported coefficients of variations for 643 miRNAs consistently present in saliva, highlighting several salivary miRNAs to serve as reference genes. Thus, the results of this analysis can be used by researchers to optimize parameters of salivary miRNA measurement, exclude miRNAs confounded by numerous biologic factors, and identify appropriate miRNA controls.


Subject(s)
MicroRNAs , Saliva , Humans , Saliva/chemistry , Reproducibility of Results , MicroRNAs/genetics , MicroRNAs/metabolism , Biomarkers/metabolism
8.
BMC Genomics ; 23(1): 399, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35614386

ABSTRACT

BACKGROUND: Gene regulation is critical for proper cellular function. Next-generation sequencing technology has revealed the presence of regulatory networks that regulate gene expression and essential cellular functions. Studies investigating the epigenome have begun to uncover the complex mechanisms regulating transcription. Assay for transposase-accessible chromatin by sequencing (ATAC-seq) is quickly becoming the assay of choice for many epigenomic investigations. However, whether intervention-mediated changes in accessible chromatin determined by ATAC-seq can be harnessed to generate intervention-inducible reporter constructs has not been systematically assayed. RESULTS: We used the insulin signaling pathway as a model to investigate chromatin regions and gene expression changes using ATAC- and RNA-seq in insulin-treated Drosophila S2 cells. We found correlations between ATAC- and RNA-seq data, especially when stratifying differentially-accessible chromatin regions by annotated feature type. In particular, our data demonstrated a weak but significant correlation between chromatin regions annotated to enhancers (1-2 kb from the transcription start site) and downstream gene expression. We cloned candidate enhancer regions upstream of luciferase and demonstrate insulin-inducibility of several of these reporters. CONCLUSIONS: Insulin-induced chromatin accessibility determined by ATAC-seq reveals enhancer regions that drive insulin-inducible reporter gene expression.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Chromatin , Animals , Chromatin/genetics , Drosophila/genetics , High-Throughput Nucleotide Sequencing , Insulin/pharmacology , Transposases/genetics
9.
Sci Rep ; 12(1): 6043, 2022 Apr 11.
Article in English | MEDLINE | ID: mdl-35411004

ABSTRACT

Assay for transposase-accessible chromatin by sequencing (ATAC-seq) is rapidly becoming the assay of choice to investigate chromatin-mediated gene regulation, largely because of low input requirements, a fast workflow, and the ability to interrogate the entire genome in an untargeted manner. Many studies using ATAC-seq use mammalian or human-derived tissues, and established protocols work well in these systems. However, ATAC-seq is not yet widely used in Drosophila. Vinegar flies present several advantages over mammalian systems that make them an excellent model for ATAC-seq studies, including abundant genetic tools that allow straightforward targeting, transgene expression, and genetic manipulation that are not available in mammalian models. Because current ATAC-seq protocols are not optimized to use flies, we developed an optimized workflow that accounts for several complicating factors present in Drosophila. We examined parameters affecting nuclei isolation, including input size, freezing time, washing, and possible confounds from retinal pigments. Then, we optimized the enzymatic steps of library construction to account for the smaller Drosophila genome size. Finally, we used our optimized protocol to generate ATAC-seq libraries that meet ENCODE quality metrics. Our optimized protocol enables extensive ATAC-seq experiments in Drosophila, thereby leveraging the advantages of this powerful model system to understand chromatin-mediated gene regulation.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Chromatin , Animals , Chromatin/genetics , Drosophila/genetics , Drosophila/metabolism , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , High-Throughput Nucleotide Sequencing/methods , Mammals/metabolism , Neurons/metabolism , Sequence Analysis, DNA/methods , Transposases/genetics , Transposases/metabolism
10.
BMC Biol ; 19(1): 31, 2021 Feb 16.
Article in English | MEDLINE | ID: mdl-33593351

ABSTRACT

BACKGROUND: Proper regulation of feeding is important for an organism's well-being and survival and involves a motivational component directing the search for food. Dissecting the molecular and neural mechanisms of motivated feeding behavior requires assays that allow quantification of both motivation and food intake. Measurements of motivated behavior usually involve assessing physical effort or overcoming an aversive stimulus. Food intake in Drosophila can be determined in a number of ways, including by measuring the time a fly's proboscis interacts with a food source associated with an electrical current in the fly liquid-food interaction counter (FLIC). Here, we show that electrical current flowing through flies during this interaction is aversive, and we describe a modified assay to measure motivation in Drosophila. RESULTS: Food intake is reduced during the interaction with FLIC when the electrical current is turned on, which provides a confounding variable in studies of motivated behavior. Based on the FLIC, we engineer a novel assay, the fly liquid-food electroshock assay (FLEA), which allows for current adjustments for each feeding well. Using the FLEA, we show that both external incentives and internal motivational state can serve as drivers for flies to overcome higher current (electric shock) to obtain superior food. Unlike similar assays in which bitterness is the aversive stimulus for the fly to overcome, we show that current perception is not discounted as flies become more food-deprived. Finally, we use genetically manipulated flies to show that neuropeptide F, an orthologue of mammalian NPY previously implicated in regulation of feeding motivation, is required for sensory processing of electrical current. CONCLUSION: The FLEA is therefore a novel assay to accurately measure incentive motivation in Drosophila. Using the FLEA, we also show that neuropeptide F is required for proper perception or processing of an electroshock, a novel function for this neuropeptide involved in the processing of external and internal stimuli.


Subject(s)
Drosophila melanogaster/physiology , Electroshock , Insect Proteins/metabolism , Neuropeptides/metabolism , Animals , Avoidance Learning/physiology , Feeding Behavior/physiology , Food/classification , Male , Taste Perception/physiology