Results 1 - 20 of 28
1.
Comput Struct Biotechnol J ; 23: 2011-2033, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38765606

ABSTRACT

The fields of Metagenomics and Metatranscriptomics involve the examination of complete nucleotide sequences, gene identification, and analysis of potential biological functions within diverse organisms or environmental samples. Despite the vast opportunities for discovery in metagenomics, the sheer volume and complexity of sequence data often present challenges in processing, analysis, and visualization. This article highlights the critical role of advanced visualization tools in enabling effective exploration, querying, and analysis of these complex datasets. Emphasizing the importance of accessibility, the article categorizes various visualizers based on their intended applications and highlights their utility in empowering bioinformaticians and non-bioinformaticians to interpret and derive insights from meta-omics data effectively.

2.
Comput Struct Biotechnol J ; 23: 1919-1928, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38711760

ABSTRACT

The decrease in sequencing expenses has facilitated the creation of reference genomes and proteomes for an expanding array of organisms. Nevertheless, no established repository that details organism-specific genomic and proteomic sequences of specific lengths, referred to as kmers, exists to our knowledge. In this article, we present kmerDB, a database accessible through an interactive web interface that provides kmer-based information from genomic and proteomic sequences in a systematic way. kmerDB currently contains 202,340,859,107 base pairs and 19,304,903,356 amino acids, spanning 54,039 and 21,865 reference genomes and proteomes, respectively, as well as 6,905,362 and 149,305,183 genomic and proteomic species-specific sequences, termed quasi-primes. Additionally, we provide access to 5,186,757 nucleic and 214,904,089 peptide sequences absent from every genome and proteome, termed primes. kmerDB features a user-friendly interface offering various search options and filters for easy parsing and searching. The service is available at: www.kmerdb.com.
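The kmer terminology in the abstract above can be illustrated with a minimal Python sketch. It assumes toy sequences and a plain set-based approach; the function names and the `toy` data are hypothetical, and nothing here reflects how kmerDB itself computes or stores its entries (for example, strand handling is ignored).

```python
from itertools import product


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def quasi_primes(genomes, k):
    """Map each species to the kmers that occur in that species only."""
    sets = {name: kmers(seq, k) for name, seq in genomes.items()}
    result = {}
    for name, own in sets.items():
        # Union of the kmers of every other species (empty if there are none).
        others = set().union(*(s for n, s in sets.items() if n != name))
        result[name] = own - others
    return result


def primes(genomes, k, alphabet="ACGT"):
    """Return the kmers over the alphabet that occur in none of the genomes."""
    seen = set().union(*(kmers(seq, k) for seq in genomes.values()))
    every = {"".join(p) for p in product(alphabet, repeat=k)}
    return every - seen


# Hypothetical toy "genomes", far smaller than any real reference sequence.
toy = {"species_a": "ACGTAC", "species_b": "GTACGG"}
species_specific = quasi_primes(toy, 3)
absent_everywhere = primes(toy, 2)
```

Enumerating every possible kmer, as `primes` does here, only scales to small k; it is meant to make the definitions concrete rather than to be efficient.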

3.
Nucleic Acids Res ; 52(D1): D502-D512, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37811892

ABSTRACT

The Novel Metagenome Protein Families Database (NMPFamsDB) is a database of metagenome- and metatranscriptome-derived protein families, whose members have no hits to proteins of reference genomes or Pfam domains. Each protein family is accompanied by multiple sequence alignments, Hidden Markov Models, taxonomic information, ecosystem and geolocation metadata, sequence and structure predictions, as well as 3D structure models predicted with AlphaFold2. In its current version, NMPFamsDB hosts over 100 000 protein families, each with at least 100 members. The reported protein families significantly expand (more than double) the number of known protein sequence clusters from reference genomes and reveal new insights into their habitat distribution, origins, functions and taxonomy. We expect NMPFamsDB to be a valuable resource for microbial proteome-wide analyses and for further discovery and characterization of novel functions. NMPFamsDB is publicly available at http://www.nmpfamsdb.org/ or https://bib.fleming.gr/NMPFamsDB.


Subject(s)
Databases, Protein , Metagenome , Proteins , Amino Acid Sequence , Databases, Factual , Ecosystem , Proteins/chemistry , Geography
4.
bioRxiv ; 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38045264

ABSTRACT

Massively parallel reporter assays (MPRAs) represent a set of high-throughput technologies that measure the functional effects of thousands of sequences/variants on gene regulatory activity. There are several different variations of MPRA technology and they are used for numerous applications, including regulatory element discovery, variant effect measurement, saturation mutagenesis, synthetic regulatory element generation or characterization of evolutionary gene regulatory differences. Despite their many designs and uses, there is no comprehensive database that incorporates the results of these experiments. To address this, we developed MPRAbase, a manually curated database that currently harbors 129 experiments, encompassing 17,718,677 elements tested across 35 cell types and 4 organisms. The MPRAbase web interface (http://www.mprabase.com) serves as a centralized user-friendly repository to download existing MPRA data for independent analysis and is designed with the ability to allow researchers to share their published data for rapid dissemination to the community.

5.
Comput Struct Biotechnol J ; 21: 5382-5393, 2023.
Article in English | MEDLINE | ID: mdl-38022693

ABSTRACT

Analysis and interpretation of high-throughput transcriptional and chromatin accessibility data at single-cell (sc) resolution are still open challenges in the biomedical field. The existence of countless bioinformatics tools for the different analytical steps increases the complexity of data interpretation and the difficulty of deriving biological insights. In this article, we present SCALA, a bioinformatics tool for analysis and visualization of single-cell RNA sequencing (scRNA-seq) and Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) datasets, enabling either independent or integrative analysis of the two modalities. SCALA combines standard types of analysis by integrating multiple software packages varying from quality control to the identification of distinct cell populations and cell states. Additional analysis options enable functional enrichment, cellular trajectory inference, ligand-receptor analysis, and regulatory network reconstruction. SCALA is fully parameterizable, presenting data in tabular format and producing publication-ready visualizations. The different available analysis modules can aid biomedical researchers in exploring, analyzing, and visualizing their data without any prior experience in coding. We demonstrate the functionality of SCALA through two use-cases related to TNF-driven arthritic mice, handling both scRNA-seq and scATAC-seq datasets. SCALA is developed in R, Shiny and JavaScript and is mainly available as a standalone version, while an online service of more limited capacity can be found at http://scala.pavlopouloslab.info or https://scala.fleming.gr.

6.
Nature ; 622(7983): 594-602, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37821698

ABSTRACT

Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.


Subject(s)
Metagenome , Metagenomics , Microbiology , Proteins , Cluster Analysis , Metagenome/genetics , Metagenomics/methods , Proteins/chemistry , Proteins/classification , Proteins/genetics , Databases, Protein , Protein Conformation
7.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37540207

ABSTRACT

Functional enrichment is the process of identifying implicated functional terms from a given input list of genes or proteins. In this article, we present Flame (v2.0), a web tool which offers a combinatorial approach through merging and visualizing results from widely used functional enrichment applications while also allowing various flexible input options. In this version, Flame utilizes the aGOtool, g:Profiler, WebGestalt, and Enrichr pipelines and presents their outputs separately or in combination following a visual analytics approach. For intuitive representations and easier interpretation, it uses interactive plots such as parameterizable networks, heatmaps, barcharts, and scatter plots. Users can also: (i) handle multiple protein/gene lists and analyse union and intersection sets simultaneously through interactive UpSet plots, (ii) automatically extract genes and proteins from free text through text-mining and Named Entity Recognition (NER) techniques, (iii) upload single nucleotide polymorphisms (SNPs) and extract their related genes, or (iv) analyse multiple lists of differentially expressed proteins/genes after selecting them interactively from a parameterizable volcano plot. Compared to the previous version of 197 supported organisms, Flame (v2.0) currently allows enrichment for 14 436 organisms. AVAILABILITY AND IMPLEMENTATION: Web Application: http://flame.pavlopouloslab.info. Code: https://github.com/PavlopoulosLab/Flame. Docker: https://hub.docker.com/r/pavlopouloslab/flame.


Subject(s)
Proteins , Software , Data Mining
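
Functional enrichment, as defined in the abstract above, is often computed as an over-representation test. The sketch below assumes a one-sided hypergeometric test, which is a generic textbook choice and not necessarily the statistic used by Flame or by any of the pipelines it wraps; the function name and parameters are hypothetical, and multiple-testing correction is left out.

```python
from math import comb


def enrichment_p_value(hits, list_size, term_size, background_size):
    """Probability of seeing at least `hits` term-annotated genes in a random
    list of `list_size` genes drawn from `background_size` genes, of which
    `term_size` carry the annotation (one-sided hypergeometric test)."""
    upper = min(list_size, term_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(term_size, x) * comb(background_size - term_size, list_size - x)
        for x in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 2 of 2 input genes carry a term that annotates
# 5 of the 10 genes in the background.
p = enrichment_p_value(hits=2, list_size=2, term_size=5, background_size=10)
```

A small p-value suggests the term appears in the input list more often than chance alone would explain; in practice the values for many terms would be corrected together.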
8.
NAR Genom Bioinform ; 5(2): lqad053, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37260509

ABSTRACT

Arena3Dweb is an interactive web tool that visualizes multi-layered networks in 3D space. In this update, Arena3Dweb supports directed networks as well as up to nine different types of connections between pairs of nodes with the use of Bézier curves. It comes with different color schemes (light/gray/dark mode), custom channel coloring, four node clustering algorithms which one can run on-the-fly, visualization in VR mode and predefined layer layouts (zig-zag, star and cube). This update also includes enhanced navigation controls (mouse orbit controls, layer dragging and layer/node selection), while its newly developed API allows integration with external applications as well as saving and loading of sessions in JSON format. Finally, a dedicated Cytoscape app has been developed, through which users can automatically send their 2D networks from Cytoscape to Arena3Dweb for 3D multi-layer visualization. Arena3Dweb is accessible at http://arena3d.pavlopouloslab.info or http://arena3d.org.

9.
Front Bioinform ; 3: 1157956, 2023.
Article in English | MEDLINE | ID: mdl-36959975

ABSTRACT

Metagenomics has enabled accessing the genetic repertoire of natural microbial communities. Metagenome shotgun sequencing has become the method of choice for studying and classifying microorganisms from various environments. To this end, several methods have been developed to process and analyze the sequence data from raw reads to end-products such as predicted protein sequences or families. In this article, we provide a thorough review to simplify such processes and discuss the alternative methodologies that can be followed in order to explore biodiversity at the protein family level. We provide details for analysis tools and we comment on their scalability as well as their advantages and disadvantages. Finally, we report the available data repositories and recommend various approaches for protein family annotation related to phylogenetic distribution, structure prediction and metadata enrichment.

10.
Biomolecules ; 13(2)2023 02 01.
Article in English | MEDLINE | ID: mdl-36830638

ABSTRACT

Receptor tyrosine kinases (RTKs) form a highly important group of protein receptors of the eukaryotic cell membrane. They control many vital cellular functions and are involved in the regulation of complex signaling networks. Mutations in RTKs have been associated with different types of cancers and other diseases. Although they are very important for proper cell function, they have been experimentally studied in a limited range of eukaryotic species. Currently, there is no available database for RTKs providing information about their function, expression, and interactions. Therefore, the identification of RTKs in multiple organisms, the documentation of their characteristics, and the collection of related information would be very useful. In this paper, we present a novel RTK detection pipeline (RTK-PRED) and the Receptor Tyrosine Kinases Database (TyReK-DB). RTK-PRED combines profile HMMs with transmembrane topology prediction to identify and classify potential RTKs. Proteins of all eukaryotic reference proteomes of the UniProt database were used as input in RTK-PRED leading to a filtered dataset of 20,478 RTKs. Based on the information collected for these RTKs from multiple databases, the relational TyReK database was created.


Subject(s)
Neoplasms , Proteome , Humans , Receptor Protein-Tyrosine Kinases/metabolism , Signal Transduction/physiology , Neoplasms/metabolism , Tyrosine
11.
Membranes (Basel) ; 13(1)2023 Jan 03.
Article in English | MEDLINE | ID: mdl-36676869

ABSTRACT

The nuclear envelope (NE) is a double-membrane system surrounding the nucleus of eukaryotic cells. A large number of proteins are localized in the NE, performing a wide variety of functions, from the bidirectional exchange of molecules between the cytoplasm and the nucleus to chromatin tethering, genome organization, regulation of signaling cascades, and many others. Despite its importance, several aspects of the NE, including its protein-protein interactions, remain understudied. In this work, we present NucEnvDB, a publicly available database of NE proteins and their interactions. Each database entry contains useful annotation including a description of its position in the NE, its interactions with other proteins, and cross-references to major biological repositories. In addition, the database provides users with a number of visualization and analysis tools, including the ability to construct and visualize protein-protein interaction networks and perform functional enrichment analysis for clusters of NE proteins and their interaction partners. The capabilities of NucEnvDB and its analysis tools are showcased by two informative case studies, exploring protein-protein interactions in Hutchinson-Gilford progeria and during SARS-CoV-2 infection at the level of the nuclear envelope.

12.
Biomolecules ; 12(4)2022 03 30.
Article in English | MEDLINE | ID: mdl-35454109

ABSTRACT

Finding, exploring and filtering frequent sentence-based associations between a disease and a biomedical entity, co-mentioned in disease-related PubMed literature, is a growing challenge as the volume of publications increases. Darling is a web application that uses Named Entity Recognition to identify human-related biomedical terms in PubMed articles, mentioned in OMIM, DisGeNET and Human Phenotype Ontology (HPO) disease records, and generates an interactive biomedical entity association network. Nodes in this network represent genes, proteins, chemicals, functions, tissues, diseases, environments and phenotypes. Users can search by identifiers, terms/entities or free text and explore the relevant abstracts in an annotated format.


Subject(s)
Proteins , Software , Data Mining , Phenotype , PubMed
13.
Bioinform Adv ; 2(1): vbac036, 2022.
Article in English | MEDLINE | ID: mdl-36699373

ABSTRACT

Motivation: Network biology is a dominant player in today's multi-omics era. Therefore, the need for visualization tools which can efficiently cope with intra-network heterogeneity emerges. Results: NORMA-2.0 is a web application which uses efficient layouts to group together areas of interest in a network. In this version, NORMA-2.0 utilizes three different strategies to make such groupings as distinct as possible while it preserves all of the properties from its first version where one can handle multiple networks and annotation files simultaneously. Availability and implementation: The web resource is available at http://norma.pavlopouloslab.info/. The source code is freely available at https://github.com/PavlopoulosLab/NORMA.

14.
NAR Genom Bioinform ; 3(4): lqab090, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34632381

ABSTRACT

Extracting and processing information from documents is of great importance as lots of experimental results and findings are stored in local files. Therefore, extracting and analyzing biomedical terms from such files in an automated way is absolutely necessary. In this article, we present OnTheFly2.0, a web application for extracting biomedical entities from individual files such as plain texts, office documents, PDF files or images. OnTheFly2.0 can generate informative summaries in popup windows containing knowledge related to the identified terms along with links to various databases. It uses the EXTRACT tagging service to perform named entity recognition (NER) for genes/proteins, chemical compounds, organisms, tissues, environments, diseases, phenotypes and gene ontology terms. Multiple files can be analyzed, whereas identified terms such as proteins or genes can be explored through functional enrichment analysis or be associated with diseases and PubMed entries. Finally, protein-protein and protein-chemical networks can be generated with the use of STRING and STITCH services. To demonstrate its capacity for knowledge discovery, we interrogated published meta-analyses of clinical biomarkers of severe COVID-19 and uncovered inflammatory and senescence pathways that impact disease pathogenesis. OnTheFly2.0 currently supports 197 species and is available at http://bib.fleming.gr:3838/OnTheFly/ and http://onthefly.pavlopouloslab.info.

15.
Biology (Basel) ; 10(7)2021 Jul 14.
Article in English | MEDLINE | ID: mdl-34356520

ABSTRACT

Functional enrichment is a widely used method for interpreting experimental results by identifying classes of proteins/genes associated with certain biological functions, pathways, diseases, or phenotypes. Despite the variety of existing tools, most of them can process only a single list at a time, which makes combinatorial analysis more complicated and prone to errors. In this article, we present FLAME, a web tool for combining multiple lists prior to enrichment analysis. Users can upload several lists and use interactive UpSet plots, as an alternative to Venn diagrams, to handle unions or intersections among the given input files. Functional and literature enrichment, along with gene conversions, are offered by g:Profiler and aGOtool applications for 197 organisms. FLAME can analyze genes/proteins for related articles, Gene Ontologies, pathways, annotations, regulatory motifs, domains, diseases, and phenotypes, and can also generate protein-protein interactions derived from STRING. We have validated FLAME by interrogating gene expression data associated with the sensitivity of the distal part of the large intestine to experimental colitis-propelled colon cancer. FLAME comes with an interactive user-friendly interface for easy list manipulation and exploration, while results can be visualized as interactive and parameterizable heatmaps, barcharts, Manhattan plots, networks, and tables.
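
Combining several gene lists before enrichment, as the abstract above describes, comes down to set operations. The sketch below groups items by the exact combination of lists they appear in, which is the kind of breakdown an UpSet plot typically displays. The function name and gene lists are hypothetical, and this is only an illustration of the idea, not FLAME's implementation.

```python
def exclusive_intersections(lists):
    """Map each combination of list names to the items found in exactly
    those lists and in no others."""
    membership = {}
    for name, items in lists.items():
        for item in set(items):
            membership.setdefault(item, set()).add(name)
    groups = {}
    for item, names in membership.items():
        groups.setdefault(frozenset(names), set()).add(item)
    return groups


# Hypothetical input lists.
gene_lists = {
    "list_a": ["TP53", "TNF"],
    "list_b": ["TNF", "IL6"],
}
groups = exclusive_intersections(gene_lists)
shared = groups[frozenset({"list_a", "list_b"})]
union = set().union(*groups.values())
```

Any of the resulting groups, or the union of several of them, could then be passed on as the input list for an enrichment step.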

16.
Biomolecules ; 11(8)2021 08 20.
Article in English | MEDLINE | ID: mdl-34439912

ABSTRACT

Technological advances in high-throughput techniques have resulted in tremendous growth of complex biological datasets providing evidence regarding various biomolecular interactions. To cope with this data flood, computational approaches, web services, and databases have been implemented to deal with issues such as data integration, visualization, exploration, organization, scalability, and complexity. Nevertheless, as the number of such sets increases, it is becoming more and more difficult for an end user to know what the scope and focus of each repository is and how redundant the information between them is. Several repositories have a more general scope, while others focus on specialized aspects, such as specific organisms or biological systems. Unfortunately, many of these databases are self-contained or poorly documented and maintained. For a clearer view, in this article we provide a comprehensive categorization, comparison and evaluation of such repositories for different bioentity interaction types. We discuss most of the publicly available services based on their content, sources of information, data representation methods, user-friendliness, scope and interconnectivity, and we comment on their strengths and weaknesses. We aim for this review to reach a broad readership varying from biomedical beginners to experts and serve as a reference article in the field of Network Biology.


Subject(s)
Medical Informatics/methods , Protein Interaction Mapping/methods , Software , Systems Biology/methods , Animals , Computational Biology/methods , Databases, Factual , Humans , Protein Binding , Protein Interaction Maps , RNA/metabolism , Signal Transduction
17.
Comput Biol Med ; 135: 104557, 2021 08.
Article in English | MEDLINE | ID: mdl-34139436

ABSTRACT

Clustering is the process of grouping different data objects based on similar properties. Clustering has applications in various case studies from several fields such as graph theory, image analysis, pattern recognition, statistics and others. Nowadays, there are numerous algorithms and tools able to generate clustering results. However, different algorithms or parameterizations may produce quite dissimilar cluster sets. In this way, the user is often forced to manually filter and compare these results in order to decide which of them generates the ideal clusters. To automate this process, in this study, we present VICTOR, the first fully interactive and dependency-free visual analytics web application which allows the visual comparison of the results of various clustering algorithms. VICTOR can handle multiple cluster set results simultaneously and compare them using ten different metrics. Clustering results can be filtered and compared to each other with the use of data tables or interactive heatmaps, bar plots, correlation networks, Sankey and Circos plots. We demonstrate VICTOR's functionality using three examples. In the first case, we compare five different network clustering algorithms on a yeast protein-protein interaction dataset whereas in the second example, we test four different parameters of the MCL clustering algorithm on the same dataset. Finally, as a third example, we compare four different meta-analyses with hierarchically clustered differentially expressed genes found to be involved in myocardial infarction. VICTOR is available at http://victor.pavlopouloslab.info or http://bib.fleming.gr:3838/VICTOR.


Subject(s)
Algorithms , Benchmarking , Cluster Analysis
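
Comparing two clustering results, as the abstract above discusses, needs a similarity metric. The sketch below uses the Rand index purely as an illustration. The abstract does not name VICTOR's ten metrics, so this choice is an assumption, as are the function name and the toy labels.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two clusterings agree, that is, both
    place the pair in the same cluster or both place it in different ones."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # fewer than two objects: nothing to disagree on
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical cluster labels for four objects from two algorithms.
same_partition = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
different_partition = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the index depends only on which objects are grouped together, relabelling the clusters leaves the score unchanged.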
18.
Nucleic Acids Res ; 49(W1): W36-W45, 2021 07 02.
Article in English | MEDLINE | ID: mdl-33885790

ABSTRACT

Efficient integration and visualization of heterogeneous biomedical information in a single view is a key challenge. In this study, we present Arena3Dweb, the first, fully interactive and dependency-free, web application which allows the visualization of multilayered graphs in 3D space. With Arena3Dweb, users can integrate multiple networks in a single view along with their intra- and inter-layer connections. For clearer and more informative views, users can choose between a plethora of layout algorithms and apply them on a set of selected layers either individually or in combination. Users can align networks and highlight node topological features, whereas each layer as well as the whole scene can be translated, rotated and scaled in 3D space. User-selected edge colors can be used to highlight important paths, while node positioning, coloring and resizing can be adjusted on-the-fly. In its current version, Arena3Dweb supports weighted and unweighted undirected graphs and is written in R, Shiny and JavaScript. We demonstrate the functionality of Arena3Dweb using two different use-case scenarios; one regarding drug repurposing for SARS-CoV-2 and one related to GPCR signaling pathways implicated in melanoma. Arena3Dweb is available at http://bib.fleming.gr:3838/Arena3D or http://bib.fleming.gr/Arena3D.


Subject(s)
Algorithms , Data Visualization , Internet , Protein Interaction Maps , Software , COVID-19/metabolism , Color , Drug Repositioning , Humans , Melanoma/drug therapy , Melanoma/metabolism , Programming Languages , Receptors, Endothelin/metabolism , SARS-CoV-2/metabolism , Signal Transduction , COVID-19 Drug Treatment
19.
J Proteome Res ; 19(1): 511-524, 2020 01 03.
Article in English | MEDLINE | ID: mdl-31774292

ABSTRACT

G-protein coupled receptors (GPCRs) mediate crucial physiological functions in humans, have been implicated in an array of diseases, and are therefore prime drug targets. GPCRs signal via a multitude of pathways, mainly through G-proteins and β-arrestins, to regulate effectors responsible for cellular responses. The limited number of transducers results in different GPCRs exerting control on the same pathway, while the availability of signaling proteins in a cell defines the result of GPCR activation. The aim of this study was to construct the extended human GPCR network (hGPCRnet) and examine the effect that cell-type specificity has on GPCR signaling pathways. To achieve this, protein-protein interaction data between GPCRs, G-protein coupled receptor kinases (GRKs), Gα subunits, β-arrestins, and effectors were combined with protein expression data in cell types. This resulted in the hGPCRnet, a very large interconnected network, and similar cell-type-specific networks in which distinct GPCR signaling pathways were formed. Finally, a user-friendly web application, hGPCRnet (http://bioinformatics.biol.uoa.gr/hGPCRnet), was created to allow for the visualization and exploration of these networks and of GPCR signaling pathways. This work, and the resulting application, can be useful in further studies of GPCR function and pharmacology.


Subject(s)
Alzheimer Disease/metabolism , Drug-Related Side Effects and Adverse Reactions/metabolism , Neoplasms/metabolism , Receptors, G-Protein-Coupled/metabolism , Cluster Analysis , Data Visualization , Databases, Protein , Humans , Protein Interaction Maps , Signal Transduction , Software , beta-Arrestins/metabolism
20.
J Struct Biol ; 207(3): 260-269, 2019 09 01.
Article in English | MEDLINE | ID: mdl-31170474

ABSTRACT

ALECT2 (leukocyte chemotactic factor 2) amyloidosis is one of the most recently identified amyloid-related diseases, with LECT2 amyloids commonly found in different types of tissues. Under physiological conditions, LECT2 is a 16 kDa multifunctional protein produced by the hepatocytes and secreted into circulation. The pathological mechanisms causing LECT2 transition into the amyloid state are still largely unknown. In the case of ALECT2 patients, there is no disease-causing mutation, yet almost all patients carry a common polymorphism that appears to be necessary but not sufficient to directly trigger amyloidogenesis. In this work, we followed a reductionist methodology in order to detect critical amyloidogenic "hot-spots" during the fibrillation of LECT2. By associating experimental and computational assays, this approach reveals the explicit amyloidogenic core of human LECT2 and pinpoints regions with distinct amyloidogenic properties. The fibrillar architecture of LECT2 polymers, based on our results, provides a wealth of detailed information about the amyloidogenic "hot-spot" interactions and represents a starting point for future peptide-driven intervention in ALECT2 amyloidosis.


Subject(s)
Amyloid/chemistry , Amyloidosis/genetics , Intercellular Signaling Peptides and Proteins/chemistry , Polymorphism, Single Nucleotide , Amino Acid Sequence , Amyloid/metabolism , Amyloid/ultrastructure , Amyloidosis/diagnosis , Amyloidosis/metabolism , Binding Sites/genetics , Hepatocytes/cytology , Hepatocytes/metabolism , Humans , Intercellular Signaling Peptides and Proteins/genetics , Intercellular Signaling Peptides and Proteins/metabolism , Microscopy, Electron , Models, Molecular , Protein Aggregates , Protein Aggregation, Pathological , Protein Binding , Protein Conformation