
kmerDB: A database encompassing the set of genomic and proteomic sequence information for each species.

Mouratidis, Ioannis; Baltoumas, Fotis A; Chantzi, Nikol; Patsakis, Michail; Chan, Candace S Y; Montgomery, Austin; Konnaris, Maxwell A; Aplakidou, Eleni; Georgakopoulos, George C; Das, Anshuman; Chartoumpekis, Dionysios V; Kovac, Jasna; Pavlopoulos, Georgios A; Georgakopoulos-Soares, Ilias.

Comput Struct Biotechnol J ; 23: 1919-1928, 2024 Dec.

Article in English | MEDLINE | ID: mdl-38711760

ABSTRACT

The decrease in sequencing expenses has facilitated the creation of reference genomes and proteomes for an expanding array of organisms. Nevertheless, to our knowledge, no established repository exists that details organism-specific genomic and proteomic sequences of specific lengths, referred to as kmers. In this article, we present kmerDB, a database accessible through an interactive web interface that provides kmer-based information from genomic and proteomic sequences in a systematic way. kmerDB currently contains 202,340,859,107 base pairs and 19,304,903,356 amino acids, spanning 54,039 and 21,865 reference genomes and proteomes, respectively, as well as 6,905,362 and 149,305,183 genomic and proteomic species-specific sequences, termed quasi-primes. Additionally, we provide access to 5,186,757 nucleic and 214,904,089 peptide sequences absent from every genome and proteome, termed primes. kmerDB features a user-friendly interface offering various search options and filters for easy parsing and searching. The service is available at: www.kmerdb.com.
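
The prime and quasi-prime definitions above can be made concrete with a small sketch. This is not kmerDB's implementation; it is a minimal Python illustration assuming that a quasi-prime is a kmer found in exactly one species' sequences and a prime is a kmer over the alphabet that occurs in none of them. The function names are hypothetical, and for simplicity the sketch scans a single strand (reverse complements are ignored) and enumerates all |alphabet|^k candidates, which is only practical for small k.

```python
from itertools import product


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def primes_and_quasi_primes(genomes: dict[str, str], k: int, alphabet: str = "ACGT"):
    """Hypothetical helper: primes (absent everywhere) and quasi-primes (in one species only).

    Assumption: only the given strand is scanned; reverse complements are ignored.
    """
    letters = set(alphabet)
    # Map every observed k-mer to the set of species whose sequence contains it.
    owners: dict[str, set[str]] = {}
    for species, seq in genomes.items():
        for kmer in kmers(seq.upper(), k):
            if set(kmer) <= letters:  # skip k-mers with ambiguous symbols such as N
                owners.setdefault(kmer, set()).add(species)

    # Primes: every possible k-mer over the alphabet that no sequence contains.
    candidates = ("".join(p) for p in product(alphabet, repeat=k))
    primes = [kmer for kmer in candidates if kmer not in owners]

    # Quasi-primes: observed k-mers that occur in exactly one species.
    quasi_primes: dict[str, set[str]] = {}
    for kmer, species_set in owners.items():
        if len(species_set) == 1:
            (only,) = species_set
            quasi_primes.setdefault(only, set()).add(kmer)
    return primes, quasi_primes


genomes = {"sp1": "ACGTAC", "sp2": "ACGGTT"}
primes, quasi = primes_and_quasi_primes(genomes, k=2)
print(primes)  # ['AA', 'AG', 'AT', 'CA', 'CC', 'CT', 'GA', 'GC', 'TC', 'TG']
print({sp: sorted(ks) for sp, ks in quasi.items()})  # {'sp1': ['TA'], 'sp2': ['GG', 'TT']}
```

For proteomes the same function can be called with a 20-letter amino-acid alphabet, although exhaustive enumeration then becomes infeasible beyond very short k.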

Visualizing metagenomic and metatranscriptomic data: A comprehensive review.

Aplakidou, Eleni; Vergoulidis, Nikolaos; Chasapi, Maria; Venetsianou, Nefeli K; Kokoli, Maria; Panagiotopoulou, Eleni; Iliopoulos, Ioannis; Karatzas, Evangelos; Pafilis, Evangelos; Georgakopoulos-Soares, Ilias; Kyrpides, Nikos C; Pavlopoulos, Georgios A; Baltoumas, Fotis A.

Comput Struct Biotechnol J ; 23: 2011-2033, 2024 Dec.

Article in English | MEDLINE | ID: mdl-38765606

ABSTRACT

The fields of Metagenomics and Metatranscriptomics involve the examination of complete nucleotide sequences, gene identification, and analysis of potential biological functions within diverse organisms or environmental samples. Despite the vast opportunities for discovery in metagenomics, the sheer volume and complexity of sequence data often present challenges in processing, analysis, and visualization. This article highlights the critical role of advanced visualization tools in enabling effective exploration, querying, and analysis of these complex datasets. Emphasizing the importance of accessibility, the article categorizes various visualizers based on their intended applications and describes their utility in empowering bioinformaticians and non-bioinformaticians to interpret and derive insights from meta-omics data effectively.

Flame (v2.0): advanced integration and interpretation of functional enrichment results from multiple sources.

Karatzas, Evangelos; Baltoumas, Fotis A; Aplakidou, Eleni; Kontou, Panagiota I; Stathopoulos, Panos; Stefanis, Leonidas; Bagos, Pantelis G; Pavlopoulos, Georgios A.

Bioinformatics ; 39(8), 2023 Aug 01.

Article in English | MEDLINE | ID: mdl-37540207

ABSTRACT

Functional enrichment is the process of identifying implicated functional terms from a given input list of genes or proteins. In this article, we present Flame (v2.0), a web tool which offers a combinatorial approach by merging and visualizing results from widely used functional enrichment applications, while also allowing various flexible input options. In this version, Flame utilizes the aGOtool, g:Profiler, WebGestalt, and Enrichr pipelines and presents their outputs separately or in combination following a visual analytics approach. For intuitive representations and easier interpretation, it uses interactive plots such as parameterizable networks, heatmaps, barcharts, and scatter plots. Users can also: (i) handle multiple protein/gene lists and analyse union and intersection sets simultaneously through interactive UpSet plots, (ii) automatically extract genes and proteins from free text through text-mining and Named Entity Recognition (NER) techniques, (iii) upload single nucleotide polymorphisms (SNPs) and extract their related genes, or (iv) analyse multiple lists of differentially expressed proteins/genes after selecting them interactively from a parameterizable volcano plot. Compared to the 197 organisms supported by the previous version, Flame (v2.0) currently allows enrichment for 14,436 organisms. AVAILABILITY AND IMPLEMENTATION: Web Application: http://flame.pavlopouloslab.info. Code: https://github.com/PavlopoulosLab/Flame. Docker: https://hub.docker.com/r/pavlopouloslab/flame.

Subject(s)

Proteins , Software , Data Mining
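
Flame does not compute enrichment itself; per the abstract, it merges results from aGOtool, g:Profiler, WebGestalt, and Enrichr. To illustrate what a single functional enrichment step typically computes, the following is a minimal sketch of a generic over-representation test (a one-sided hypergeometric p-value) in plain Python. The term annotations, background size, and function names are hypothetical, and this is not presented as the exact statistical procedure of Flame or of any tool listed above.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population M, n annotated, N drawn)."""
    total = comb(M, N)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1)) / total


def enrich(query: list[str], annotations: dict[str, set[str]], background_size: int):
    """Rank terms by over-representation of query genes (no multiple-testing correction)."""
    genes = set(query)
    results = []
    for term, members in annotations.items():
        overlap = genes & members
        if overlap:
            p = hypergeom_sf(len(overlap), background_size, len(members), len(genes))
            results.append((term, len(overlap), p))
    return sorted(results, key=lambda r: r[2])  # smallest p-value first


annotations = {"term_A": {"g1", "g2", "g3"}, "term_B": {"g4", "g5"}}
res = enrich(["g1", "g2", "g3"], annotations, background_size=10)
print(res)  # term_A: overlap 3, p = 1/120; term_B has no overlap and is skipped
```

Production services also correct for testing many terms at once (for example with the Benjamini-Hochberg procedure); that step is omitted here for brevity.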

Exploring microbial functional biodiversity at the protein family level-From metagenomic sequence reads to annotated protein clusters.

Baltoumas, Fotis A; Karatzas, Evangelos; Paez-Espino, David; Venetsianou, Nefeli K; Aplakidou, Eleni; Oulas, Anastasis; Finn, Robert D; Ovchinnikov, Sergey; Pafilis, Evangelos; Kyrpides, Nikos C; Pavlopoulos, Georgios A.

Front Bioinform ; 3: 1157956, 2023.

Article in English | MEDLINE | ID: mdl-36959975

ABSTRACT

Metagenomics has enabled access to the genetic repertoire of natural microbial communities. Metagenome shotgun sequencing has become the method of choice for studying and classifying microorganisms from various environments. To this end, several methods have been developed to process and analyze the sequence data from raw reads to end-products such as predicted protein sequences or families. In this article, we provide a thorough review to simplify such processes and discuss the alternative methodologies that can be followed in order to explore biodiversity at the protein family level. We provide details on analysis tools and comment on their scalability as well as their advantages and disadvantages. Finally, we report the available data repositories and recommend various approaches for protein family annotation related to phylogenetic distribution, structure prediction and metadata enrichment.
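
The step from predicted protein sequences to protein families usually involves sequence clustering. As a toy illustration only, the sketch below groups proteins greedily by shared 3-mer (Jaccard) similarity: longer sequences are processed first, and each sequence joins the first representative it resembles or else becomes a new representative. This is a swapped-in simplification, not a method taken from the review; dedicated clustering tools such as CD-HIT or MMseqs2 rely on alignment-based identity and far more efficient indexing, and the values of k and the similarity threshold here are arbitrary assumptions.

```python
def kmer_set(seq: str, k: int = 3) -> set[str]:
    """All length-k substrings of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared-k-mer similarity between two k-mer sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_cluster(proteins: dict[str, str], k: int = 3, threshold: float = 0.5):
    """Hypothetical greedy clustering: representative name -> list of member names."""
    # Longest-first ordering; Python's sort is stable, so ties keep input order.
    order = sorted(proteins, key=lambda name: len(proteins[name]), reverse=True)
    clusters: dict[str, list[str]] = {}
    rep_kmers: dict[str, set[str]] = {}
    for name in order:
        ks = kmer_set(proteins[name], k)
        for rep, rks in rep_kmers.items():
            if jaccard(ks, rks) >= threshold:
                clusters[rep].append(name)  # join the first similar representative
                break
        else:
            clusters[name] = [name]  # no match: start a new cluster
            rep_kmers[name] = ks
    return clusters


proteins = {"p1": "MKTAYIAKQR", "p2": "MKTAYIAKQL", "p3": "GGSSWWPPLL"}
print(greedy_cluster(proteins))  # {'p1': ['p1', 'p2'], 'p3': ['p3']}
```

In a real workflow, each resulting cluster would then be annotated, for example by the phylogenetic distribution, structure prediction, and metadata enrichment approaches the review discusses.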
