1.
IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 3163-3172, 2023.
Article in English | MEDLINE | ID: mdl-37030791

ABSTRACT

Since many biological processes are governed by protein-protein interactions, understanding which mutations lead to a disruption in these interactions is profoundly important for cancer research. Most of the existing methods focus on the stability of the protein without considering the specific effects of a mutation on its interactions with other proteins. Here, we focus on somatic mutations that appear on the interface regions of the protein and predict the interactions that would be affected by a mutation of interest. We build an ensemble model, Predator, that classifies the interface mutations as disruptive or nondisruptive based on the predicted effects of mutations on specific protein-protein interactions. We show that Predator outperforms existing approaches in the literature in terms of prediction accuracy. We then apply Predator to various TCGA cancer cohorts and perform comprehensive analysis at cohort level, patient level, and gene level to determine the genes whose interface mutations tend to disrupt their interactions. The predictions obtained by Predator shed light on interesting patterns in several genes for each cohort regarding their potential as cancer drivers. Our analyses further reveal that the identified genes and their frequently disrupted partners exhibit patterns of mutual exclusivity across the cancer cohorts under study.


Subjects
Neoplasms, Humans, Mutation/genetics, Neoplasms/genetics, Proteins/genetics
2.
Bioinformatics ; 38(13): 3407-3414, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35579340

ABSTRACT

MOTIVATION: A major challenge in cancer genomics is to distinguish the driver mutations that are causally linked to cancer from passenger mutations that do not contribute to cancer development. The majority of existing methods provide a single driver gene list for the entire cohort of patients. However, since mutation profiles of patients from the same cancer type show a high degree of heterogeneity, a more ideal approach is to identify patient-specific drivers. RESULTS: We propose a novel method that integrates genomic data, biological pathways and protein connectivity information for personalized identification of driver genes. The method is formulated on a personalized bipartite graph for each patient. Our approach provides a personalized ranking of the mutated genes of a patient based on the sum of weighted 'pairwise pathway coverage' scores across all the samples, where appropriate pairwise patient similarity scores are used as weights to normalize these coverage scores. We compare our method against five state-of-the-art patient-specific cancer gene prioritization methods. The comparisons are with respect to a novel evaluation method that takes into account the personalized nature of the problem. We show that our approach outperforms the existing alternatives for both the TCGA and the cell line data. In addition, we show that the KEGG/Reactome pathways enriched in our ranked genes and those that are enriched in cell lines' reference sets overlap significantly when compared to the overlaps achieved by the rankings of the alternative methods. Our findings can provide valuable information toward the development of personalized treatments and therapies. AVAILABILITY AND IMPLEMENTATION: All the codes and data are available at https://github.com/abu-compbio/PersonaDrive, and the data underlying this article are available in Zenodo, at https://doi.org/10.5281/zenodo.6520187. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Neoplasms, Humans, Neoplasms/genetics, Genomics/methods, Precision Medicine/methods, Mutation, Oncogenes
3.
Front Genet ; 12: 746495, 2021.
Article in English | MEDLINE | ID: mdl-34899838

ABSTRACT

One of the key concepts employed in cancer driver gene identification is that of mutual exclusivity (ME): a driver mutation is less likely to occur in case of an earlier mutation that has common functionality in the same molecular pathway. Several ME tests have been proposed recently; however, the current protocols to evaluate ME tests have two main limitations. First, the evaluations are mostly with respect to simulated data, and second, the evaluation metrics lack a network-centric view. The latter is especially crucial as the notion of common functionality can be achieved through searching for interaction patterns in relevant networks. We propose a network-centric framework to evaluate the pairwise significances found by statistical ME tests. It has three main components. The first component consists of metrics employed in the network-centric ME evaluations. Such metrics are designed so that network knowledge and the reference set of known cancer genes are incorporated in ME evaluations under a careful definition of proper control groups. The other two components are designed as further mechanisms to avoid confounders inherent in ME detection on top of the network-centric view. To this end, our second objective is to dissect the side effects caused by mutation load artifacts, where mutations driving tumor subtypes with low mutation load might be incorrectly diagnosed as mutually exclusive. Finally, as part of the third main component, the confounding issue stemming from the use of nonspecific interaction networks generated as combinations of interactions from different tissues is resolved through the creation and use of tissue-specific networks in the proposed framework. The data, the source code and useful scripts are available at: https://github.com/abu-compbio/NetCentric.
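
To make the notion of a pairwise statistical ME test concrete, the sketch below computes a one-sided hypergeometric (Fisher-type) p-value for two genes being mutated in the same samples less often than independence would predict. It is a generic textbook test chosen for illustration and is not claimed to be one of the tests evaluated in the paper; the function name `mutual_exclusivity_pvalue` and the toy cohort are assumptions.

```python
from math import comb


def mutual_exclusivity_pvalue(mutated_a, mutated_b, n_samples):
    """One-sided hypergeometric p-value for under-co-occurrence.

    mutated_a, mutated_b: sets of sample ids mutated in gene A / gene B.
    n_samples: total number of samples in the cohort.
    Returns P(overlap <= observed overlap) if mutations were independent.
    """
    a, b = len(mutated_a), len(mutated_b)
    observed = len(mutated_a & mutated_b)
    total = comb(n_samples, b)
    # Smallest overlap that is possible given the two mutation counts.
    lowest = max(0, a + b - n_samples)
    p = 0.0
    for k in range(lowest, observed + 1):
        p += comb(a, k) * comb(n_samples - a, b - k) / total
    return p


# Toy cohort of 10 samples (hypothetical data): the genes never co-occur.
gene_a = {0, 1, 2, 3, 4}
gene_b = {5, 6, 7, 8, 9}
p_exclusive = mutual_exclusivity_pvalue(gene_a, gene_b, n_samples=10)
```

A small p-value suggests the pair co-occurs less often than chance alone would explain. A test of this form looks only at the mutation matrix, so it carries none of the network knowledge that the proposed evaluation framework adds.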

4.
BMC Bioinformatics ; 22(1): 62, 2021 Feb 10.
Article in English | MEDLINE | ID: mdl-33568049

ABSTRACT

BACKGROUND: Recent cancer genomic studies have generated detailed molecular data on a large number of cancer patients. A key remaining problem in cancer genomics is the identification of driver genes. RESULTS: We propose BetweenNet, a computational approach that integrates genomic data with a protein-protein interaction network to identify cancer driver genes. BetweenNet utilizes a measure based on betweenness centrality on patient-specific networks to identify the so-called outlier genes that correspond to dysregulated genes for each patient. Setting up the relationship between the mutated genes and the outliers through a bipartite graph, it employs a random-walk process on the graph, which provides the final prioritization of the mutated genes. We compare BetweenNet against state-of-the-art cancer gene prioritization methods on lung, breast, and pan-cancer datasets. CONCLUSIONS: Our evaluations show that BetweenNet is better at recovering known cancer genes based on multiple reference databases. Additionally, we show that the GO terms and the reference pathways enriched in BetweenNet-ranked genes and those that are enriched in known cancer genes overlap significantly when compared to the overlaps achieved by the rankings of the alternative methods.


Subjects
Genomics, Neoplasms, Oncogenes, Protein Interaction Maps, Gene Regulatory Networks, Humans, Neoplasms/genetics
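
The betweenness centrality that the BetweenNet abstract above builds on can be sketched with Brandes' algorithm for an unweighted, undirected graph. This shows only the plain centrality computation. How BetweenNet constructs patient-specific networks and derives outlier genes is not specified in the abstract and is not reproduced here. The function name and the four-node toy network with placeholder gene labels are assumptions.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj: dict mapping each node to a list of its neighbours.
    Returns a dict of unnormalised betweenness scores.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in score.items()}


# Hypothetical path-shaped interaction network: G1 - G2 - G3 - G4.
toy_ppi = {"G1": ["G2"], "G2": ["G1", "G3"], "G3": ["G2", "G4"], "G4": ["G3"]}
centrality = betweenness_centrality(toy_ppi)
```

In the toy path, the two inner nodes lie on every shortest path between the ends and therefore receive the highest scores, while the end nodes score zero.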
5.
Sci Rep ; 10(1): 21971, 2020 12 15.
Article in English | MEDLINE | ID: mdl-33319839

ABSTRACT

The majority of the previous methods for identifying cancer driver modules output nonoverlapping modules. This assumption is biologically inaccurate as genes can participate in multiple molecular pathways. This is particularly true for cancer-associated genes as many of them are network hubs connecting functionally distinct sets of genes. It is important to provide combinatorial optimization problem definitions modeling this biological phenomenon and to suggest efficient algorithms for their solution. We provide a formal definition of the Overlapping Driver Module Identification in Cancer (ODMIC) problem. We show that the problem is NP-hard. We propose a seed-and-extend based heuristic named DriveWays that identifies overlapping cancer driver modules from the graph built from the IntAct PPI network. DriveWays incorporates mutual exclusivity, coverage, and the network connectivity information of the genes. We show that DriveWays outperforms the state-of-the-art methods in recovering well-known cancer driver genes on TCGA pan-cancer data. Additionally, DriveWays' output modules show a stronger enrichment for the reference pathways in almost all cases. Overall, we show that enabling modules to overlap improves the recovery of functional pathways filtered with known cancer drivers, which essentially constitute the reference set of cancer-related pathways.


Subjects
Algorithms, Computational Biology/methods, Neoplasms/genetics, Humans, ROC Curve
6.
Bioinformatics ; 36(3): 872-879, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31432076

ABSTRACT

MOTIVATION: Genomic analyses from large cancer cohorts have revealed the mutational heterogeneity problem which hinders the identification of driver genes based only on mutation profiles. One way to tackle this problem is to incorporate the fact that genes act together in functional modules. The connectivity knowledge present in existing protein-protein interaction (PPI) networks together with mutation frequencies of genes and the mutual exclusivity of cancer mutations can be utilized to increase the accuracy of identifying cancer driver modules. RESULTS: We present a novel edge-weighted random walk-based approach that incorporates connectivity information in the form of protein-protein interactions (PPIs), mutual exclusivity and coverage to identify cancer driver modules. MEXCOwalk outperforms several state-of-the-art computational methods on TCGA pan-cancer data in terms of recovering known cancer genes, providing modules that are capable of classifying normal and tumor samples and that are enriched for mutations in specific cancer types. Furthermore, the risk scores determined with output modules can stratify patients into low-risk and high-risk groups in multiple cancer types. MEXCOwalk identifies modules containing both well-known cancer genes and putative cancer genes that are rarely mutated in the pan-cancer data. The data, the source code and useful scripts are available at: https://github.com/abu-compbio/MEXCOwalk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology, Neoplasms, Algorithms, Gene Regulatory Networks, Humans, Mutation, Software
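
The edge-weighted random walk named in the MEXCOwalk abstract above can be illustrated with a generic random walk with restart on a weighted graph. The abstract does not give the formula that turns mutual exclusivity and coverage into edge weights, so the weights below are arbitrary placeholders. The function name, the restart probability `beta=0.4` and the toy graph are likewise assumptions, not values taken from the paper.

```python
def random_walk_with_restart(weights, restart, beta=0.4, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - beta) * W p + beta * r until the L1 change < tol.

    weights: dict {u: {v: w_uv}} with non-negative weights; an undirected
             edge is stored in both directions and every node is a key.
    restart: dict {node: probability} whose values sum to 1.
    beta: restart probability (illustrative default).
    """
    nodes = list(weights)
    out_sum = {u: sum(weights[u].values()) for u in nodes}
    p = {u: restart.get(u, 0.0) for u in nodes}
    for _ in range(max_iter):
        nxt = {u: beta * restart.get(u, 0.0) for u in nodes}
        for u in nodes:
            if out_sum[u] == 0:
                # Isolated node: its walking mass stays where it is.
                nxt[u] += (1.0 - beta) * p[u]
                continue
            for v, w in weights[u].items():
                # Move mass along each edge in proportion to its weight.
                nxt[v] += (1.0 - beta) * p[u] * w / out_sum[u]
        change = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical three-gene graph; the A-B edge is three times heavier than A-C.
toy_weights = {"A": {"B": 3.0, "C": 1.0}, "B": {"A": 3.0}, "C": {"A": 1.0}}
visit_prob = random_walk_with_restart(toy_weights, restart={"A": 1.0})
```

Heavier edges attract more of the walker's probability mass, so in the toy graph node B ends up with a higher stationary probability than node C.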
7.
BMC Syst Biol ; 11(1): 110, 2017 Nov 22.
Article in English | MEDLINE | ID: mdl-29166896

ABSTRACT

BACKGROUND: Identification of driver genes related to certain types of cancer is an important research topic. Several systems biology approaches have been suggested, in particular for the identification of breast cancer (BRCA) related genes. Such approaches usually rely on differential gene expression and/or mutational landscape data. In some cases interaction network data is also integrated to identify cancer-related modules computationally. RESULTS: We provide a framework for the comparative graph-theoretical analysis of networks integrating the relevant gene expression, mutations, and protein-protein interaction network data. The comparisons involve a graph-theoretical analysis of normal and tumor network pairs across all instances of a given set of breast cancer samples. The network measures under consideration are based on appropriate formulations of various centrality measures: betweenness, clustering coefficients, degree centrality, random walk distances, graph-theoretical distances, and Jaccard index centrality. CONCLUSIONS: Among all the studied centrality-based graph-theoretical properties, we show that a betweenness-based measure better differentiates BRCA genes across all normal versus tumor network pairs than the rest of the popular centrality-based measures. The AUROC and AUPR values of the gene lists ordered with respect to the measures under study as compared to NCBI BioSystems pathway and the COSMIC database of cancer genes are the largest with the betweenness-based differentiation, followed by the measure based on degree centrality. In order to test the robustness of the suggested measures in prioritizing cancer genes, we further tested the two most promising measures, those based on betweenness and degree centralities, on randomly rewired networks. We show that both measures are quite resilient to noise in the input interaction network.
We also compared the same measures against a state-of-the-art alternative disease gene prioritization method, MUFFINN. We show that both our graph-theoretical measures outperform MUFFINN prioritizations in terms of ROC and precision/recall analysis. Finally, we filter the ordered list of the best measure, the betweenness-based differentiation, via a maximum-weight independent set formulation and investigate the top 50 genes with regard to literature verification. We show that almost all genes in the list are verified by the breast cancer literature and three genes are presented as novel genes that may potentially be BRCA-related but are missing from the literature.


Subjects
Breast Neoplasms/genetics, Gene Regulatory Networks, Tumor Suppressor Genes, Female, Humans, Theoretical Models, Mutation
8.
Bioinformatics ; 33(4): 537-544, 2017 02 15.
Article in English | MEDLINE | ID: mdl-27797764

ABSTRACT

Motivation: Analysis of protein-protein interaction (PPI) networks provides invaluable insight into several systems biology problems. High-throughput experimental techniques together with computational methods provide large-scale PPI networks. However, a major issue with these networks is their erroneous nature; they contain false-positive interactions and usually many more false-negatives. Recently, several computational methods have been proposed for network reconstruction based on topology, where given an input PPI network the goal is to reconstruct the network by identifying false-positives/-negatives as correctly as possible. Results: We observe that the existing topology-based network reconstruction algorithms suffer from several shortcomings. An important issue is the scalability of their computational requirements, especially in terms of execution times, with the network sizes. They have only been tested on small-scale networks thus far and when applied on large-scale networks of popular PPI databases, the executions require unreasonable amounts of time, or may even crash without producing any output for some instances even after several months of execution. We provide an algorithm, RedNemo, for the topology-based network reconstruction problem. It provides more accurate networks than the alternatives in terms of biological quality, as measured by most metrics based on gene ontology annotations. The recovery of a high-confidence network modified via random edge removals and rewirings is also better with RedNemo than with the alternatives under most of the experimented removal/rewiring ratios. Furthermore, through extensive tests on databases of varying sizes, we show that RedNemo achieves these results with much better running time performances. Availability and Implementation: Supplementary material including source code, useful scripts, experimental data and the results are available at http://webprs.khas.edu.tr/~cesim/RedNemo.tar.gz.
Contact: cesim@khas.edu.tr. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Protein Interaction Mapping/methods, Proteins/metabolism, Software, Algorithms, Animals, Gene Ontology, Humans, Molecular Sequence Annotation, Saccharomyces cerevisiae/metabolism, Systems Biology/methods
9.
Bioinformatics ; 31(14): 2356-63, 2015 Jul 15.
Article in English | MEDLINE | ID: mdl-25788620

ABSTRACT

MOTIVATION: Network prediction as applied to protein-protein interaction (PPI) networks has received considerable attention within the last decade. Because of the limitations of experimental techniques for interaction detection and network construction, several computational methods for PPI network reconstruction and growth have been suggested. Such methods usually limit the scope of study to a single network, employing data based on genomic context, structure, domain, sequence information or existing network topology. Incorporating multiple species network data for network reconstruction and growth entails the design of novel models encompassing both network reconstruction and network alignment, since the goal of network alignment is to provide functionally orthologous proteins from multiple networks and such orthology information can be used in guiding interolog transfers. However, such an approach raises the classical chicken-or-egg problem: alignment methods assume error-free networks, whereas network prediction via orthology works effectively if the functionally orthologous proteins are determined with high precision. Thus, to resolve this intertwinement, we propose a framework to handle both problems simultaneously, that of SImultaneous Prediction and Alignment of Networks (SiPAN). RESULTS: We present an algorithm that solves the SiPAN problem in accordance with its simultaneous nature. Bearing the same name as the defined problem itself, the SiPAN algorithm employs state-of-the-art alignment and topology-based interaction confidence construction algorithms, which are used as benchmark methods for comparison purposes as well. To demonstrate the effectiveness of the proposed network reconstruction via SiPAN, we consider two scenarios: one that preserves the network sizes and the other where the network sizes are increased.
Through extensive tests on real-world biological data, we show that the network qualities of SiPAN reconstructions are as good as those of original networks and in some cases SiPAN networks are even better, especially for the former scenario. An alternative state-of-the-art network reconstruction algorithm, random walk with resistance, produces networks considerably worse than the original networks and those reproduced via SiPAN in both cases. AVAILABILITY AND IMPLEMENTATION: Freely available at http://webprs.khas.edu.tr/~cesim/SiPAN.tar.gz.


Subjects
Algorithms, Computational Biology/methods, Biological Models, Protein Interaction Mapping/methods, Protein Interaction Maps, Proteins/metabolism, Protein Sequence Analysis/methods, Humans, Proteins/chemistry
10.
Bioinformatics ; 30(4): 531-9, 2014 Feb 15.
Article in English | MEDLINE | ID: mdl-24336414

ABSTRACT

MOTIVATION: Global many-to-many alignment of biological networks has been a central problem in comparative biological network studies. Given a set of biological interaction networks, the informal goal is to group together related nodes. For the case of protein-protein interaction networks, such groups are expected to form clusters of functionally orthologous proteins. Construction of such clusters for networks from different species may prove useful in determining evolutionary relationships, in predicting the functions of proteins with unknown functions and in verifying those with estimated functions. RESULTS: A central informal objective in constructing clusters of orthologous proteins is to guarantee that each cluster is composed of members with high homological similarity, usually determined via sequence similarities, and that the interactions of the proteins involved in the same cluster are conserved across the input networks. We provide a formal definition of the global many-to-many alignment of multiple protein-protein interaction networks that captures this informal objective. We show the computational intractability of the suggested definition. We provide a heuristic method based on backbone extraction and merge strategy (BEAMS) for the problem. We finally show, through experiments based on biological significance tests, that the proposed BEAMS algorithm performs better than the state-of-the-art approaches. Furthermore, the computational burden of the BEAMS algorithm in terms of execution speed and memory requirements is more reasonable than the competing algorithms. AVAILABILITY AND IMPLEMENTATION: Supplementary material including code implementations in LEDA C++, experimental data and the results are available at http://webprs.khas.edu.tr/~cesim/BEAMS.tar.gz.


Subjects
Algorithms, Protein Interaction Mapping/methods, Proteins/metabolism, Sequence Alignment/methods, Protein Sequence Analysis/methods, Animals, Humans, Biological Models
11.
Bioinformatics ; 29(13): i145-53, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23812978

ABSTRACT

MOTIVATION: Given a pair of metabolic pathways, an alignment of the pathways corresponds to a mapping between similar substructures of the pair. Successful alignments may provide useful applications in phylogenetic tree reconstruction, drug design and overall may enhance our understanding of cellular metabolism. RESULTS: We consider the problem of providing one-to-many alignments of reactions in a pair of metabolic pathways. We first provide a constrained alignment framework applicable to the problem. We show that the constrained alignment problem even in a primitive setting is computationally intractable, which justifies efforts for designing efficient heuristics. We present our Constrained Alignment of Metabolic Pathways (CAMPways) algorithm designed for this purpose. Through extensive experiments involving a large pathway database, we demonstrate that when compared with a state-of-the-art alternative, the CAMPways algorithm provides better alignment results on metabolic networks as far as measures based on same-pathway inclusion and biochemical significance are concerned. The execution speed of our algorithm constitutes yet another important improvement over alternative algorithms. AVAILABILITY: Open source codes, executable binary, useful scripts, all the experimental data and the results are freely available as part of the Supplementary Material at http://code.google.com/p/campways/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms, Metabolic Networks and Pathways, Metabolic Engineering
12.
Bioinformatics ; 29(7): 917-24, 2013 Apr 01.
Article in English | MEDLINE | ID: mdl-23413436

ABSTRACT

MOTIVATION: Given protein-protein interaction (PPI) networks of a pair of species, a pairwise global alignment corresponds to a one-to-one mapping between their proteins. Based on the presupposition that such a mapping provides pairs of functionally orthologous proteins accurately, the results of the alignment may then be used in comparative systems biology problems such as function prediction/verification or construction of evolutionary relationships. RESULTS: We show that the problem is NP-hard even for the case where the pair of networks are simply paths. We next provide a polynomial time heuristic algorithm, SPINAL, which consists of two main phases. In the first coarse-grained alignment phase, we construct all pairwise initial similarity scores based on pairwise local neighborhood matchings. Using the produced similarity scores, the fine-grained alignment phase produces the final one-to-one mapping by iteratively growing a locally improved solution subset. Both phases make use of the construction of neighborhood bipartite graphs and the contributors as a common primitive. We assess the performance of our algorithm on the PPI networks of yeast, fly, human and worm. We show that based on the accuracy measures used in relevant work, our method outperforms the state-of-the-art algorithms. Furthermore, our algorithm does not suffer from scalability issues, as such accurate results are achieved in reasonable running times as compared with the benchmark algorithms. AVAILABILITY: Supplementary Document, open source codes, useful scripts, all the experimental data and the results are freely available at http://code.google.com/p/spinal/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms, Protein Interaction Mapping/methods, Animals, Humans, Sequence Alignment, Protein Sequence Analysis
13.
Bioinformatics ; 27(11): 1583-4, 2011 Jun 01.
Article in English | MEDLINE | ID: mdl-21478488

ABSTRACT

SUMMARY: We present our protein-protein interaction (PPI) network visualization system RobinViz (reliability-oriented bioinformatic networks visualization). Notable features of the system are clustering the PPI network based on gene ontology (GO) annotations or biclustered gene expression data, providing a clustered visualization model based on a central/peripheral duality, computing layouts with algorithms specialized for interaction reliabilities represented as weights, and completely automated data acquisition and processing. AVAILABILITY: RobinViz is free, open-source software protected under GPL. It is written in C++ and Python, and consists of almost 30 000 lines of code, excluding the employed libraries. Source code, user manual and other Supplementary Material are available for download at http://code.google.com/p/robinviz/.


Subjects
Computer Graphics, Protein Interaction Mapping, Software, Algorithms, Cluster Analysis
14.
Bioinformatics ; 26(20): 2594-600, 2010 Oct 15.
Article in English | MEDLINE | ID: mdl-20733064

ABSTRACT

MOTIVATION: Biclustering gene expression data is the problem of extracting submatrices of genes and conditions exhibiting significant correlation across both the rows and the columns of a data matrix of expression values. Even the simplest versions of the problem are computationally hard. Most of the proposed solutions therefore employ greedy iterative heuristics that locally optimize a suitably assigned scoring function. METHODS: We provide a fast and simple pre-processing algorithm called localization that reorders the rows and columns of the input data matrix in such a way as to group correlated entries in small local neighborhoods within the matrix. The proposed localization algorithm takes its roots from effective use of graph-theoretical methods applied to problems exhibiting a similar structure to that of biclustering. In order to evaluate the effectiveness of the localization pre-processing algorithm, we focus on three representative greedy iterative heuristic methods. We show how the localization pre-processing can be incorporated into each representative algorithm to improve biclustering performance. Furthermore, we propose a simple biclustering algorithm, Random Extraction After Localization (REAL), that randomly extracts submatrices from the localization pre-processed data matrix, eliminates those with low similarity scores, and provides the rest as correlated structures representing biclusters. RESULTS: We compare the proposed localization pre-processing with another pre-processing alternative, non-negative matrix factorization. We show that our fast and simple localization procedure provides similar or even better results than the computationally heavy matrix factorization pre-processing with regard to H-value tests.
We next demonstrate that the performances of the three representative greedy iterative heuristic methods improve with localization pre-processing when biological correlations in the form of functional enrichment and PPI verification constitute the main performance criteria. The fact that the random extraction method based on localization, REAL, performs better than the representative greedy heuristic methods under the same criteria also confirms the effectiveness of the suggested pre-processing method. AVAILABILITY: Supplementary material including code implementations in LEDA C++ library, experimental data, and the results are available at http://code.google.com/p/biclustering/ CONTACTS: cesim@khas.edu.tr; melihsozdinler@boun.edu.tr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods, Gene Expression Profiling/methods, Algorithms, Cluster Analysis, Factual Databases, Oligonucleotide Array Sequence Analysis/methods, Automated Pattern Recognition/methods