1.
Front Bioinform ; 3: 1304099, 2023.
Article in English | MEDLINE | ID: mdl-38076030

ABSTRACT

The recent breakthroughs of Large Language Models (LLMs) in natural language processing have opened the way to significant advances in protein research. Indeed, the relationships between human natural language and the "language of proteins" invite the application and adaptation of LLMs to protein modeling and design. Considering the impressive results of GPT-4 and other recently developed LLMs in processing, generating and translating human languages, we anticipate analogous results with the language of proteins. Indeed, protein language models have already been trained to accurately predict protein properties and to generate novel, functionally characterized proteins, achieving state-of-the-art results. In this paper we discuss the promises and the open challenges raised by this novel and exciting research area, and we propose our perspective on how LLMs will affect protein modeling and design.

2.
BMC Bioinformatics ; 23(Suppl 2): 154, 2022 Dec 12.
Article in English | MEDLINE | ID: mdl-36510125

ABSTRACT

BACKGROUND: Cis-regulatory regions (CRRs) are non-coding regions of the DNA that finely control the spatio-temporal pattern of transcription; they are involved in a wide range of pivotal processes such as the development of specific cell lines/tissues and the dynamic cell response to physiological stimuli. Recent studies showed that genetic variants occurring in CRRs are strongly correlated with pathogenicity or deleteriousness. Considering the central role of CRRs in the regulation of physiological and pathological conditions, the correct identification of CRRs and of their tissue-specific activity status through Machine Learning methods plays a major role in dissecting the impact of genetic variants on human diseases. Unfortunately, the problem is still open, though some promising results have already been reported by (deep) machine-learning based methods that predict active promoters and enhancers in specific tissues or cell lines by encoding epigenetic or spectral features directly extracted from DNA sequences. RESULTS: We present the experiments we performed to compare two Deep Neural Networks, a Feed-Forward Neural Network model working on epigenomic features, and a Convolutional Neural Network model working only on genomic sequence, targeted to the identification of enhancer and promoter activity in specific cell lines. While performing experiments to understand how the experimental setup influences the prediction performance of the methods, we particularly focused on (1) automatic model selection performed by Bayesian optimization and (2) exploring different data rebalancing setups for reducing negative unbalancing effects. CONCLUSIONS: Results show that (1) automatic model selection by Bayesian optimization improves the quality of the learner; (2) data rebalancing considerably impacts the prediction performance of the models; test set rebalancing may provide over-optimistic results, and should therefore be applied cautiously; (3) despite working on sequence data, convolutional models obtain performance close to that of feed-forward models working on epigenomic information, which suggests that sequence data also carries informative content for CRR-activity prediction. We therefore suggest combining both models/data types in future work.
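The caution about test set rebalancing can be illustrated with a minimal sketch. It assumes random undersampling of the negative class, which is only one possible rebalancing setup and is not necessarily the one used in the paper; the function name, the toy region labels and the split sizes are all hypothetical. The point is that the held-out split is left at its original class proportions.

```python
import random

def undersample_training_set(examples, labels, max_neg_per_pos=1, seed=0):
    """Randomly drop negatives so that at most max_neg_per_pos remain per positive."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep_neg = rng.sample(neg, min(len(neg), max_neg_per_pos * len(pos)))
    keep = sorted(pos + keep_neg)
    return [examples[i] for i in keep], [labels[i] for i in keep]

# Hypothetical toy data: label 1 marks an active region, label 0 an inactive one.
X = [f"region_{i}" for i in range(10)]
y = [1, 0, 0, 0, 0, 1, 0, 0, 1, 0]

# Split first, then rebalance the training part only. The test part keeps
# its original imbalance, so the evaluation is not made artificially easy.
X_train, y_train = X[:6], y[:6]
X_test, y_test = X[6:], y[6:]

X_bal, y_bal = undersample_training_set(X_train, y_train)
```

Here the training split goes from 2 positives and 4 negatives to 2 and 2, while the test split is untouched.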


Subject(s)
Deep Learning , Humans , Bayes Theorem , Regulatory Sequences, Nucleic Acid , Neural Networks, Computer , Machine Learning
3.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35679533

ABSTRACT

Patient similarity networks (PSNs), where patients are represented as nodes and their similarities as weighted edges, are being increasingly used in clinical research. These networks provide an insightful summary of the relationships among patients and can be exploited by inductive or transductive learning algorithms for the prediction of patient outcome, phenotype and disease risk. PSNs can also be easily visualized, thus offering a natural way to inspect complex heterogeneous patient data and providing some level of explainability of the predictions obtained by machine learning algorithms. The advent of high-throughput technologies, enabling us to acquire high-dimensional views of the same patients (e.g. omics data, laboratory data, imaging data), calls for the development of data fusion techniques for PSNs in order to leverage this rich heterogeneous information. In this article, we review existing methods for integrating multiple biomedical data views to construct PSNs, together with the different patient similarity measures that have been proposed. We also review methods that have appeared in the machine learning literature but have not yet been applied to PSNs, thus providing a resource to navigate the vast machine learning literature existing on this topic. In particular, we focus on methods that could be used to integrate very heterogeneous datasets, including multi-omics data as well as data derived from clinical information and medical imaging.
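As an illustration of the data structure described above (patients as nodes, similarities as weighted edges), the following is a minimal sketch for a single data view. It assumes cosine similarity and a fixed threshold, both chosen here only for illustration; the patient identifiers, profiles and function names are hypothetical and do not come from the reviewed methods.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_psn(profiles, threshold=0.5):
    """Return {(patient_a, patient_b): weight} for pairs with similarity >= threshold."""
    ids = sorted(profiles)
    edges = {}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            w = cosine(profiles[a], profiles[b])
            if w >= threshold:
                edges[(a, b)] = w
    return edges

# Hypothetical profiles (e.g. three measurements per patient).
profiles = {
    "p1": [1.0, 0.0, 2.0],
    "p2": [2.0, 0.0, 4.0],
    "p3": [0.0, 3.0, 0.0],
}
psn = build_psn(profiles)
```

In this toy case only p1 and p2 are linked, because their profiles point in the same direction while p3 is orthogonal to both. Integrating several views would require combining one such network per view, which this sketch does not attempt.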


Subject(s)
Algorithms , Machine Learning
4.
Bioinformatics ; 37(23): 4526-4533, 2021 12 07.
Article in English | MEDLINE | ID: mdl-34240108

ABSTRACT

MOTIVATION: Automated protein function prediction is a complex multi-class, multi-label, structured classification problem in which protein functions are organized in a controlled vocabulary, according to the Gene Ontology (GO). 'Hierarchy-unaware' classifiers, also known as 'flat' methods, predict GO terms without exploiting the inherent structure of the ontology, potentially violating the True-Path-Rule (TPR) that governs the GO, while 'hierarchy-aware' approaches, even if they obey the TPR, do not always show clear improvements with respect to flat methods, or do not scale well when applied to the full GO. RESULTS: To overcome these limitations, we propose Hierarchical Ensemble Methods for Directed Acyclic Graphs (HEMDAG), a family of highly modular hierarchical ensembles of classifiers, able to build upon any flat method and to provide 'TPR-safe' predictions, by leveraging a combination of isotonic regression and TPR learning strategies. Extensive experiments on synthetic and real data across several organisms firstly show that HEMDAG can be used as a general tool to improve the predictions of flat classifiers, and secondly that HEMDAG is competitive versus state-of-the-art hierarchy-aware learning methods proposed in the last CAFA international challenges. AVAILABILITY AND IMPLEMENTATION: Fully tested R code freely available at https://anaconda.org/bioconda/r-hemdag. Tutorial and documentation at https://hemdag.readthedocs.io. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
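To make the True-Path-Rule constraint concrete, here is a minimal sketch of a top-down correction that caps each term's score at the lowest score among its parents. This is a simple heuristic written for illustration in Python; it is not the isotonic regression or TPR ensemble strategy of HEMDAG, and it does not use the R package's API. The term names and scores are hypothetical, and `parents` is assumed to list every term, with the root mapped to an empty list.

```python
from graphlib import TopologicalSorter

def top_down_tpr_correction(scores, parents):
    """Return scores in which no term scores higher than any of its parents."""
    # TopologicalSorter takes node -> predecessors and yields predecessors first,
    # so every parent is corrected before its children are visited.
    order = TopologicalSorter(parents).static_order()
    corrected = {}
    for term in order:
        bound = min((corrected[p] for p in parents.get(term, ())), default=1.0)
        corrected[term] = min(scores[term], bound)
    return corrected

# Hypothetical four-term DAG: C has two parents, A and B.
parents = {"root": [], "A": ["root"], "B": ["root"], "C": ["A", "B"]}
flat_scores = {"root": 0.9, "A": 0.4, "B": 0.8, "C": 0.7}

tpr_safe = top_down_tpr_correction(flat_scores, parents)
```

The flat score of C (0.7) exceeds that of its parent A (0.4), which would violate the rule; after the correction C is lowered to 0.4 and the other scores are unchanged.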


Subject(s)
Algorithms , Computational Biology , Gene Ontology , Computational Biology/methods , Proteins/metabolism
5.
Hematol Oncol ; 39(3): 364-379, 2021 Aug.
Article in English | MEDLINE | ID: mdl-33497493

ABSTRACT

Wnt/Fzd signaling has been implicated in hematopoietic stem cell maintenance and in acute leukemia establishment. In our previous work, we described a recurrent rearrangement involving the WNT10B locus (WNT10BR), characterized by the expression of the WNT10BIVS1 transcript variant, in acute myeloid leukemia. To determine the occurrence of WNT10BR in T-cell acute lymphoblastic leukemia (T-ALL), we retrospectively analyzed an Italian cohort of patients (n = 20) and detected a high incidence (13/20) of WNT10BIVS1 expression. To address genes involved in the WNT10B molecular response, we designed a Wnt-targeted RNA sequencing panel. In the analysis of Wnt agonists and antagonists, the expression of the FZD6, LRP5, and PROM1 genes stands out in WNT10BIVS1-positive patients compared to negative ones. Using MOLT4 and MUTZ-2 as leukemic cell models, which are characterized by the expression of WNT10BIVS1, we have observed that WNT10B drives major Wnt activation to the FZD6 receptor complex through receipt of ligand. Additionally, short hairpin RNA (shRNA)-mediated gene silencing and small molecule-mediated inhibition of WNT secretion have been observed to interfere with the WNT10B/FZD6 interaction. We have therefore identified that WNT10BIVS1 knockdown, or pharmacological interference by the LGK974 porcupine (PORCN) inhibitor, reduces WNT10B/FZD6 protein complex formation and significantly impairs intracellular effectors and leukemic expansion. These results describe the molecular circuit induced by WNT10B and suggest WNT10B/FZD6 as a new target in the T-ALL treatment strategy.


Subject(s)
Frizzled Receptors/metabolism , Gene Expression Regulation, Leukemic , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/metabolism , Proto-Oncogene Proteins/biosynthesis , Wnt Proteins/biosynthesis , Wnt Signaling Pathway , Acyltransferases/antagonists & inhibitors , Acyltransferases/genetics , Acyltransferases/metabolism , Female , Frizzled Receptors/genetics , HeLa Cells , Humans , Male , Membrane Proteins/antagonists & inhibitors , Membrane Proteins/genetics , Membrane Proteins/metabolism , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/drug therapy , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/genetics , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/pathology , Proto-Oncogene Proteins/genetics , Pyrazines/pharmacology , Pyridines/pharmacology , Wnt Proteins/genetics
6.
Australas J Dermatol ; 62(2): e162-e169, 2021 May.
Article in English | MEDLINE | ID: mdl-33125722

ABSTRACT

BACKGROUND: Histiocytoses are haematological disorders of bone marrow origin that share many biological and clinical features with haematological neoplasms. The association between histiocytoses of the cutaneous-group and myeloid malignancies is a poorly investigated topic of high biological and clinical impact. METHODS: We performed a systematic review of the scientific literature, compliant with PRISMA guidelines, to unravel the clinical and pathological features of this intriguing association. FINDINGS: We gathered and analysed 102 patients. Most were children with generalised cutaneous eruptions and displayed risk organ involvement (i.e. bone marrow, spleen, liver). Interestingly, all these features are uncommonly encountered in C-group histiocytosis not associated with haematological neoplasms. CONCLUSIONS: Our review shows that generalised eruptions and risk organ involvement in cutaneous-group histiocytosis should raise a suspicion for a concomitant myeloid neoplasm both in children and in adults and warrant further investigations. A rapid recognition of this association is required to start a prompt and effective therapeutic management given the aggressive behaviour of the associated myeloid neoplasm in most instances.


Subject(s)
Hematologic Neoplasms/complications , Histiocytosis/complications , Skin Diseases/complications , Humans
7.
PLoS One ; 15(12): e0244241, 2020.
Article in English | MEDLINE | ID: mdl-33351828

ABSTRACT

The visual exploration and analysis of biomolecular networks is of paramount importance for identifying hidden and complex interaction patterns among proteins. Although many tools have been proposed for this task, they are mainly focused on the query and visualization of a single protein with its neighborhood. The global exploration of the entire network and the interpretation of its underlying structure remain difficult, mainly due to the excessively large size of biomolecular networks. In this paper we propose a novel multi-resolution representation and exploration approach that exploits hierarchical community detection algorithms for the identification of communities occurring in biomolecular networks. The proposed graphical rendering combines two types of nodes (proteins and communities) and three types of edges (protein-protein, community-community, protein-community), and displays communities at different resolutions, allowing the user to interactively zoom in and out from different levels of the hierarchy. Links among communities are shown in terms of relationships and functional correlations among the biomolecules they contain. This form of navigation can also be combined by the user with a vertex-centric visualization for identifying the communities holding a target biomolecule. Since communities gather limited-size groups of correlated proteins, the visualization and exploration of complex and large networks becomes feasible on off-the-shelf computers. The proposed graphical exploration strategies have been implemented and integrated in UNIPred-Web, a web application that we recently introduced for combining the UNIPred algorithm, able to address both integration and protein function prediction in an imbalance-aware fashion, with an easy-to-use vertex-centric exploration of the integrated network. The tool has been substantially revised from different standpoints, including the prediction core algorithm. Several tests on networks of different size and connectivity have been conducted to show the potential of our methodology; moreover, enrichment analyses have been performed to assess the biological meaningfulness of detected communities. Finally, a CoV-human network has been embedded in the system, and a corresponding case study presented, including the visualization and the prediction of human host proteins that potentially interact with SARS-CoV-2 proteins.


Subject(s)
COVID-19/genetics , Internet , Metabolic Networks and Pathways/genetics , SARS-CoV-2/genetics , Algorithms , COVID-19/metabolism , COVID-19/virology , Humans , Proteins/genetics , Proteins/metabolism , SARS-CoV-2/metabolism , SARS-CoV-2/pathogenicity
8.
Sci Rep ; 10(1): 3612, 2020 02 27.
Article in English | MEDLINE | ID: mdl-32107391

ABSTRACT

Methods for phenotype and outcome prediction are largely based on inductive supervised models that use selected biomarkers to make predictions, without explicitly considering the functional relationships between individuals. We introduce a novel network-based approach named Patient-Net (P-Net) in which biomolecular profiles of patients are modeled in a graph-structured space that represents gene expression relationships between patients. Then a kernel-based semi-supervised transductive algorithm is applied to the graph to explore the overall topology of the graph and to predict the phenotype/clinical outcome of patients. Experimental tests involving several publicly available datasets of patients afflicted with pancreatic, breast, colon and colorectal cancer show that our proposed method is competitive with state-of-the-art supervised and semi-supervised predictive systems. Importantly, P-Net also provides interpretable models that can be easily visualized to gain clues about the relationships between patients, and to formulate hypotheses about their stratification.


Subject(s)
Breast Neoplasms/diagnosis , Colorectal Neoplasms/diagnosis , Gene Regulatory Networks , Neural Networks, Computer , Pancreatic Neoplasms/diagnosis , Algorithms , Artificial Intelligence , Breast Neoplasms/epidemiology , Colorectal Neoplasms/epidemiology , Computational Biology/methods , Datasets as Topic , Female , Humans , Individuality , Male , Pancreatic Neoplasms/epidemiology , Phenotype , Prognosis , Transcriptome , Treatment Outcome
9.
Genes Chromosomes Cancer ; 59(5): 295-308, 2020 05.
Article in English | MEDLINE | ID: mdl-31846142

ABSTRACT

Blastic plasmacytoid dendritic cell neoplasm (BPDCN) is a rare and highly aggressive hematological malignancy with a poorly understood pathobiology and no effective therapeutic options. Although a few recurrent genetic defects (e.g., single nucleotide changes, indels, large chromosomal aberrations) have been identified in BPDCN, none are disease-specific, and more importantly, none explain its genesis or clinical behavior. In this study, we performed the first high-resolution whole-genome analysis of BPDCN with a special focus on structural genomic alterations by using whole-genome sequencing and RNA sequencing. Our study, the first to characterize the landscape of genomic rearrangements and copy number alterations of BPDCN at nucleotide-level resolution, revealed that IKZF1, a gene encoding a transcription factor required for the differentiation of plasmacytoid dendritic cell precursors, is focally inactivated through recurrent structural alterations in this neoplasm. In concordance with the genomic data, transcriptome analysis revealed that conserved IKZF1 target genes display a loss-of-IKZF1 expression pattern. Furthermore, up-regulation of cellular processes responsible for cell-cell and cell-ECM interactions, which is a hallmark of IKZF1 deficiency, was prominent in BPDCN. Our findings suggest that IKZF1 inactivation plays a central role in the pathobiology of the disease, and consequently, therapeutic approaches directed at reestablishing the function of this gene might be beneficial for patients.


Subject(s)
Dendritic Cells/pathology , Hematologic Neoplasms/genetics , Hematologic Neoplasms/pathology , Ikaros Transcription Factor/genetics , Plasmacytoma/genetics , Plasmacytoma/pathology , Adult , Aged , Aged, 80 and over , Blast Crisis/genetics , Blast Crisis/metabolism , Blast Crisis/pathology , Cell Adhesion/physiology , Chromosome Aberrations , Databases, Genetic , Dendritic Cells/metabolism , Female , Hematologic Neoplasms/metabolism , Humans , Ikaros Transcription Factor/antagonists & inhibitors , Male , Middle Aged , Phosphatidylinositol 3-Kinases/metabolism , Plasmacytoma/metabolism , Transcription Factors/metabolism , Whole Genome Sequencing/methods
10.
BMC Bioinformatics ; 20(1): 733, 2019 Dec 27.
Article in English | MEDLINE | ID: mdl-31881821

ABSTRACT

BACKGROUND: The protein ki67 (pki67) is a marker of tumor aggressiveness, and its expression has been proven to be useful in the prognostic and predictive evaluation of several types of tumors. To numerically quantify the pki67 presence in cancerous tissue areas, pathologists generally analyze histochemical images to count the number of tumor nuclei marked for pki67. This allows estimating the ki67-index, that is, the percentage of tumor nuclei positive for pki67 over all the tumor nuclei. Given the high image resolution and dimensions, its estimation by expert clinicians is particularly laborious and time consuming. Although automatic cell counting techniques have been presented, the problem is still open. RESULTS: In this paper we present a novel automatic approach for the estimation of the ki67-index. The method starts by exploiting the STRESS algorithm to produce a color-enhanced image where all pixels belonging to nuclei are easily identified by thresholding, and then separated into positive (i.e. pixels belonging to nuclei marked for pki67) and negative by a binary classification tree. Next, positive and negative nuclei pixels are processed separately by two multiscale procedures identifying isolated nuclei and separating adjoining nuclei. The multiscale procedures exploit two Bayesian classification trees to recognize positive and negative nuclei-shaped regions. CONCLUSIONS: The evaluation of the computed results, both through experts' visual assessments and through the comparison of the computed indexes with those of experts, proved that the prototype is promising, so that experts believe in its potential as a tool to be exploited in clinical practice as a valid aid for clinicians estimating the ki67-index. The MATLAB source code is open source for research purposes.
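The definition of the ki67-index given in the abstract (percentage of pki67-positive tumor nuclei over all tumor nuclei) reduces to a one-line computation once the nuclei have been counted. The sketch below covers only that final step, in Python rather than the MATLAB of the original tool; the function name and the counts are hypothetical, and the segmentation and classification stages are not reproduced.

```python
def ki67_index(positive_nuclei, negative_nuclei):
    """Percentage of pki67-positive tumor nuclei over all counted tumor nuclei."""
    total = positive_nuclei + negative_nuclei
    if total == 0:
        raise ValueError("no tumor nuclei counted")
    return 100.0 * positive_nuclei / total

# Hypothetical counts for one image: 30 positive and 90 negative nuclei.
index = ki67_index(30, 90)
```

With these counts the index is 25.0, i.e. a quarter of the counted tumor nuclei are marked for pki67.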


Subject(s)
Image Processing, Computer-Assisted/methods , Ki-67 Antigen/analysis , Neoplasms/chemistry , Algorithms , Animals , Bayes Theorem , Cell Nucleus/chemistry , Humans , Mice , Software
11.
BMC Bioinformatics ; 20(1): 422, 2019 Aug 14.
Article in English | MEDLINE | ID: mdl-31412768

ABSTRACT

BACKGROUND: One of the main issues in the automated protein function prediction (AFP) problem is the integration of multiple networked data sources. The UNIPred algorithm was proposed to efficiently integrate the protein networks (in a function-specific fashion) by taking into account the imbalance that characterizes protein annotations, and to subsequently predict novel hypotheses about unannotated proteins. UNIPred is publicly available as R code, which may limit its use by non-expert users. Moreover, its application requires effort in the acquisition and preparation of the networks to be integrated. Finally, the UNIPred source code does not handle the visualization of the resulting consensus network, whereas suitable views of the network topology are necessary to explore and interpret existing protein relationships. RESULTS: We address the aforementioned issues by proposing UNIPred-Web, a user-friendly Web tool for the application of the UNIPred algorithm to a variety of biomolecular networks, already supplied by the system, and for the visualization and exploration of protein networks. We support different organisms and different types of networks (e.g., co-expression, shared domains and physical interaction networks). Users are supported in the different phases of the process, ranging from the selection of the networks and the protein function to be predicted, to the navigation of the integrated network. The system also supports the upload of user-defined protein networks. The vertex-centric and highly interactive approach of UNIPred-Web allows a focused exploration of specific proteins, and an interactive analysis of large sub-networks with only a few mouse clicks. CONCLUSIONS: UNIPred-Web offers practical and intuitive (visual) guidance to biologists interested in gaining insights into protein biomolecular functions. UNIPred-Web provides facilities for the integration of networks, and supplies a framework for the imbalance-aware protein network integration of nine organisms, the prediction of thousands of GO protein functions, and an easy-to-use graphical interface for the visual analysis, navigation and interpretation of the integrated networks and of the functional predictions.


Subject(s)
Computational Biology/methods , Internet , Protein Interaction Maps , Proteins/metabolism , Software , Algorithms , User-Computer Interface
13.
BMC Bioinformatics ; 19(Suppl 10): 353, 2018 Oct 15.
Article in English | MEDLINE | ID: mdl-30367594

ABSTRACT

BACKGROUND: Several problems in network biology and medicine can be cast into a framework where entities are represented through partially labeled networks, and the aim is inferring the labels (usually binary) of the unlabeled part. Connections represent functional or genetic similarity between entities, while the labelings are often highly unbalanced, that is, one class is largely under-represented: for instance, in automated protein function prediction (AFP), for most Gene Ontology terms only a few proteins are annotated, and in the disease-gene prioritization problem only a few genes are actually known to be involved in the etiology of a given disease. Imbalance-aware approaches to accurately predict node labels in biological networks are therefore required. Furthermore, such methods must be scalable, since input data can be large, as, for instance, in the context of multi-species protein networks. RESULTS: We propose a novel semi-supervised parallel enhancement of COSNET, an imbalance-aware algorithm built on a Hopfield neural model recently suggested to solve the AFP problem. By adopting an efficient representation of the graph and assuming a sparse network topology, we empirically show that it can be efficiently applied to networks with millions of nodes. The key strategy to speed up the computations is to partition nodes into independent sets so as to process each set in parallel by exploiting the power of GPU accelerators. This parallel technique ensures the convergence to asymptotically stable attractors, while preserving the asynchronous dynamics of the original model. Detailed experiments on real data and artificial big instances of the problem highlight the scalability and efficiency of the proposed method. CONCLUSIONS: By parallelizing COSNET we achieved on average a speed-up of 180x in solving the AFP problem in the S. cerevisiae, Mus musculus and Homo sapiens organisms, while lowering memory requirements. In addition, to show the potential applicability of the method to huge biomolecular networks, we predicted node labels in artificially generated sparse networks involving hundreds of thousands to millions of nodes.
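The idea of partitioning nodes into independent sets can be sketched with a greedy graph coloring: nodes that receive the same color share no edge, so their updates do not depend on each other and could in principle be processed together. The abstract does not say how the partition is computed, so greedy coloring is an assumption made here for illustration; the code below is sequential Python rather than GPU code, and the function name and toy graph are hypothetical.

```python
def independent_sets(adjacency):
    """Greedy coloring: group nodes so that no two nodes in a group are adjacent."""
    color = {}
    for node in sorted(adjacency):
        # Colors already taken by the neighbors that have been visited so far.
        used = {color[n] for n in adjacency[node] if n in color}
        c = 0
        while c in used:
            c += 1
        color[node] = c
    groups = {}
    for node, c in color.items():
        groups.setdefault(c, []).append(node)
    return [groups[c] for c in sorted(groups)]

# Hypothetical sparse graph: a triangle (0, 1, 2) with node 3 attached to node 2.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
sets = independent_sets(adjacency)
```

For this graph the result is three groups, `[0, 3]`, `[1]` and `[2]`. Nodes 0 and 3 are not connected, so they fall in the same group, while the triangle forces three distinct colors.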


Subject(s)
Algorithms , Computer Graphics , Gene Regulatory Networks , Animals , Gene Ontology , Humans , Mice , Protein Interaction Maps/genetics , Proteins/genetics , Saccharomyces cerevisiae/genetics , Time Factors