1.
Nat Commun ; 14(1): 186, 2023 01 17.
Article in English | MEDLINE | ID: mdl-36650144

ABSTRACT

Dynamic processes on networks, be it information transfer in the Internet, contagious spreading in a social network, or neural signaling, take place along shortest or nearly shortest paths. Computing shortest paths is a straightforward task when the network of interest is fully known, and there are a plethora of computational algorithms for this purpose. Unfortunately, our maps of most large networks are substantially incomplete due to either the highly dynamic nature of networks, or high cost of network measurements, or both, rendering traditional path finding methods inefficient. We find that shortest paths in large real networks, such as the network of protein-protein interactions and the Internet at the autonomous system level, are not random but are organized according to latent-geometric rules. If nodes of these networks are mapped to points in latent hyperbolic spaces, shortest paths in them align along geodesic curves connecting endpoint nodes. We find that this alignment is sufficiently strong to allow for the identification of shortest path nodes even in the case of substantially incomplete networks, where numbers of missing links exceed those of observable links. We demonstrate the utility of latent-geometric path finding in problems of cellular pathway reconstruction and communication security.
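
The geometric idea in this abstract can be illustrated with a minimal sketch. It assumes each node already has polar coordinates (r, theta) in a hyperbolic plane of curvature -1. The abstract does not give the embedding procedure or the scoring rule, so the function names and the "detour" score below are hypothetical illustrations, not the authors' method. A node lying exactly on the geodesic between two endpoints has a detour of zero, so nodes with small detours are candidate shortest-path nodes.

```python
import math

def hyperbolic_distance(p, q):
    """Distance between two points given as (r, theta) in the hyperbolic plane (curvature -1)."""
    r1, t1 = p
    r2, t2 = q
    # Hyperbolic law of cosines; clamp to 1.0 to guard against rounding below acosh's domain.
    arg = math.cosh(r1) * math.cosh(r2) - math.sinh(r1) * math.sinh(r2) * math.cos(t1 - t2)
    return math.acosh(max(arg, 1.0))

def detour(coords, src, dst, node):
    """Extra length incurred by routing src -> node -> dst instead of following the geodesic."""
    a, b, x = coords[src], coords[dst], coords[node]
    return hyperbolic_distance(a, x) + hyperbolic_distance(x, b) - hyperbolic_distance(a, b)

def rank_by_detour(coords, src, dst):
    """Return (node, detour) pairs for all other nodes, smallest detour first."""
    scores = [(n, detour(coords, src, dst, n)) for n in coords if n not in (src, dst)]
    return sorted(scores, key=lambda item: item[1])

# Hypothetical coordinates: "on" sits on the radial geodesic between "a" and "b".
coords = {"a": (1.0, 0.0), "b": (3.0, 0.0), "on": (2.0, 0.0), "off": (2.0, 1.5)}
ranking = rank_by_detour(coords, "a", "b")
```

Because the ranking uses only node coordinates, it does not require the links between nodes to be observed, which is the property the abstract exploits for incomplete networks.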


Subject(s)
Algorithms , Signal Transduction , Communication , Cell Communication
2.
Biomolecules ; 14(1), 2023 Dec 27.
Article in English | MEDLINE | ID: mdl-38254640

ABSTRACT

Until recently, efforts in population genetics have been focused primarily on people of European ancestry. To attenuate this bias, global population studies, such as the 1000 Genomes Project, have revealed differences in genetic variation across ethnic groups. How many of these differences can be attributed to population-specific traits? To answer this question, the mutation data must be linked with functional outcomes. A new "edgotype" concept has been proposed, which emphasizes the interaction-specific, "edgetic", perturbations caused by mutations in the interacting proteins. In this work, we performed systematic in silico edgetic profiling of ~50,000 non-synonymous SNVs (nsSNVs) from the 1000 Genomes Project by leveraging our semi-supervised learning approach SNP-IN tool on a comprehensive set of over 10,000 protein interaction complexes. We interrogated the functional roles of the variants and their impact on the human interactome and compared the results with the pathogenic variants disrupting PPIs in the same interactome. Our results demonstrated that a considerable number of nsSNVs from healthy populations could rewire the interactome. We also showed that the proteins enriched with interaction-disrupting mutations were associated with diverse functions and had implications in a broad spectrum of diseases. Further analysis indicated that distinct gene edgetic profiles among major populations could shed light on the molecular mechanisms behind the population phenotypic variances. Finally, the network analysis revealed that the disease-associated modules surprisingly harbored a higher density of interaction-disrupting mutations from healthy populations. The variation in the cumulative network damage within these modules could potentially account for the observed disparities in disease susceptibility, which are distinctly specific to certain populations. Our work demonstrates the feasibility of a large-scale in silico edgetic study, and reveals insights into the orchestrated play of population-specific mutations in the human interactome.


Subject(s)
Genetic Profile , Research Design , Humans , Mutation , Phenotype , Supervised Machine Learning
3.
BMC Bioinformatics ; 22(1): 149, 2021 Mar 23.
Article in English | MEDLINE | ID: mdl-33757430

ABSTRACT

BACKGROUND: A common approach for sequencing studies is to do joint-calling and store variants of all samples in a single file. If new samples are continually added or controls are re-used for several studies, the cost and time required to perform joint-calling for each analysis can become prohibitive.

RESULTS: We present ATAV, an analysis platform for large-scale whole-exome and whole-genome sequencing projects. ATAV stores variant and per site coverage data for all samples in a centralized database, which is efficiently queried by ATAV to support diagnostic analyses for trios and singletons, as well as rare-variant collapsing analyses for finding disease associations in complex diseases. Runtime logs ensure full reproducibility and the modularized ATAV framework makes it extensible to continuous development. Besides helping with the identification of disease-causing variants for a range of diseases, ATAV has also enabled the discovery of disease-genes by rare-variant collapsing on datasets containing more than 20,000 samples. Analyses to date have been performed on data of more than 110,000 individuals demonstrating the scalability of the framework. To allow users to easily access variant-level data directly from the database, we provide a web-based interface, the ATAV data browser (http://atavdb.org/). Through this browser, summary-level data for more than 40,000 samples can be queried by the general public representing a mix of cases and controls of diverse ancestries. Users have access to phenotype categories of variant carriers, as well as predicted ancestry, gender, and quality metrics. In contrast to many other platforms, the data browser is able to show data of newly-added samples in real-time and therefore evolves rapidly as more and more samples are sequenced.

CONCLUSIONS: Through ATAV, users have public access to one of the largest variant databases for patients sequenced at a tertiary care center and can look up any genes or variants of interest. Additionally, since the entire code is freely available on GitHub, ATAV can easily be deployed by other groups that wish to build their own platform, database, and user interface.


Subject(s)
Exome Sequencing , Genetics, Population/instrumentation , Genomics , Software , Databases, Genetic , Humans , Phenotype , Reproducibility of Results
4.
Viruses ; 12(4), 2020 03 25.
Article in English | MEDLINE | ID: mdl-32218151

ABSTRACT

During its first two and a half months, the recently emerged 2019 novel coronavirus, SARS-CoV-2, has already infected over one-hundred thousand people worldwide and has taken more than four thousand lives. However, the swiftly spreading virus also caused an unprecedentedly rapid response from the research community facing the unknown health challenge of potentially enormous proportions. Unfortunately, the experimental research to understand the molecular mechanisms behind the viral infection and to design a vaccine or antivirals is costly and takes months to develop. To expedite the advancement of our knowledge, we leveraged data about the related coronaviruses that is readily available in public databases and integrated these data into a single computational pipeline. As a result, we provide comprehensive structural genomics and interactomics roadmaps of SARS-CoV-2 and use this information to infer the possible functional differences and similarities with the related SARS coronavirus. All data are made publicly available to the research community.


Subject(s)
Betacoronavirus/genetics , Viral Proteins/genetics , Animals , Betacoronavirus/chemistry , Binding Sites , Biological Evolution , COVID-19 , Chiroptera/virology , Computational Biology , Conserved Sequence , Coronavirus Infections , Coronavirus Nucleocapsid Proteins , Genome, Viral , Genomics , Humans , Ligands , Models, Molecular , Nucleocapsid Proteins/chemistry , Pandemics , Phosphoproteins , Phylogeny , Pneumonia, Viral , Protein Interaction Mapping , Protein Structure, Tertiary , Severe acute respiratory syndrome-related coronavirus , SARS-CoV-2 , Sequence Alignment , Spike Glycoprotein, Coronavirus/chemistry , Viral Envelope Proteins/chemistry , Viral Matrix Proteins/chemistry
5.
Genes (Basel) ; 10(11), 2019 11 15.
Article in English | MEDLINE | ID: mdl-31731769

ABSTRACT

Rapid progress in high-throughput -omics technologies moves us one step closer to the datacalypse in life sciences. In spite of the already generated volumes of data, our knowledge of the molecular mechanisms underlying complex genetic diseases remains limited. Increasing evidence shows that biological networks are essential, albeit not sufficient, for a better understanding of these mechanisms. The identification of disease-specific functional modules in the human interactome can provide a more focused insight into the mechanistic nature of the disease. However, carving a disease network module from the whole interactome is a difficult task. In this paper, we propose a computational framework, Discovering most IMpacted SUbnetworks in interactoMe (DIMSUM), which enables the integration of genome-wide association studies (GWAS) and functional effects of mutations into the protein-protein interaction (PPI) network to improve disease module detection. Specifically, our approach incorporates and propagates the functional impact of non-synonymous single nucleotide polymorphisms (nsSNPs) on PPIs to implicate the genes that are most likely influenced by the disruptive mutations, and to identify the module with the greatest functional impact. Comparison against state-of-the-art seed-based module detection methods shows that our approach could yield modules that are biologically more relevant and have a stronger association with the studied disease. We expect our method to become part of the common toolbox for disease module analysis, facilitating the discovery of new disease markers.
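
As an illustration of what propagating an impact score over a PPI network can look like, the sketch below uses a generic random walk with restart on a toy undirected graph. This is a stand-in technique chosen for illustration: the abstract does not specify DIMSUM's propagation scheme, and the function name, the `restart` parameter, and the toy network are all assumptions.

```python
def propagate_impact(adj, seed_scores, restart=0.5, iters=50):
    """Random walk with restart: spread seed scores over an undirected graph.

    adj maps each node to a list of neighbours (every neighbour must also be a key).
    seed_scores maps some nodes to non-negative impact scores (hypothetical values).
    """
    total = sum(seed_scores.values())
    if total <= 0:
        raise ValueError("seed_scores must contain at least one positive score")
    # Normalised seed distribution that the walk restarts from.
    p0 = {n: seed_scores.get(n, 0.0) / total for n in adj}
    p = dict(p0)
    for _ in range(iters):
        nxt = {n: restart * p0[n] for n in adj}
        for u, neighbours in adj.items():
            flow = (1.0 - restart) * p[u]
            if not neighbours:
                nxt[u] += flow  # isolated node keeps its own mass
                continue
            for v in neighbours:
                nxt[v] += flow / len(neighbours)
        p = nxt
    return p

# Toy path network A - B - C with all impact initially on gene A.
toy_network = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
scores = propagate_impact(toy_network, {"A": 1.0})
```

In this toy run the score decays with network distance from the seed, so genes close to a disruptive mutation rank higher than distant ones. A module could then be grown from the top-ranked genes, although the module selection step is not sketched here.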


Subject(s)
Algorithms , Disease/genetics , Gene Regulatory Networks , Genomics/methods , Protein Interaction Mapping/methods , Databases, Genetic , Datasets as Topic , Genome-Wide Association Study , Humans , Mutation , Polymorphism, Single Nucleotide/genetics , Protein Interaction Maps/genetics , Software
6.
J Mol Biol ; 430(18 Pt A): 2974-2992, 2018 09 14.
Article in English | MEDLINE | ID: mdl-30017919

ABSTRACT

Non-synonymous mutations linked to complex diseases often have a global impact on a biological system, affecting large biomolecular networks and pathways. However, the magnitude of the mutation-driven effects on the macromolecular network is yet to be fully explored. In this work, we present a systematic multi-level characterization of human mutations associated with genetic disorders by determining their individual and combined interaction-rewiring, "edgetic," effects on the human interactome. Our in silico analysis highlights the intrinsic differences and important similarities between the pathogenic single-nucleotide variants (SNVs) and frameshift mutations. We show that pathogenic SNVs are more likely to cause gene pleiotropy than pathogenic frameshift mutations and are enriched on the protein interaction interfaces. Functional profiling of SNVs indicates widespread disruption of the protein-protein interactions and synergistic effects of SNVs. The coverage of our approach is several times greater than that of a recently published experimental study and has minimal overlap with it, while the distributions of determined edgotypes between the two sets of profiled mutations are remarkably similar. Case studies reveal the central role of interaction-disrupting mutations in type 2 diabetes mellitus and suggest the importance of studying mutations that abnormally strengthen the protein interactions in cancer. With the advancement of next-generation sequencing technology that drives precision medicine, there is an increasing demand for understanding the changes in molecular mechanisms caused by patient-specific genetic variation. The current and future in silico edgotyping tools present a cheap and fast solution to deal with the rapidly growing data sets of discovered mutations.


Subject(s)
Computational Biology , High-Throughput Nucleotide Sequencing , Polymorphism, Single Nucleotide , Transcriptome , Computational Biology/methods , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/metabolism , Gene Expression Profiling , Genetic Association Studies/methods , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Neoplasms/genetics , Neoplasms/metabolism , Protein Interaction Mapping , Protein Interaction Maps
7.
BMC Proc ; 10(Suppl 7): 275-281, 2016.
Article in English | MEDLINE | ID: mdl-27980649

ABSTRACT

Statistical association studies are an important tool in detecting novel disease genes. However, for sequencing data, association studies confront the challenge of low power because of relatively small data sample size and rare variants. Incorporating biological information that reflects disease mechanism is likely to strengthen the association evidence of disease genes, and thus increase the power of association studies. In this paper, we annotate non-synonymous single-nucleotide variants according to protein binding sites (BSs) by using a more accurate BS prediction method. We then incorporate this information into the association study through a statistical framework of likelihood ratio test (LRT) based on a weighted burden score of single-nucleotide variants (SNVs). The strategy is applied to Genetic Analysis Workshop 19 exome-sequencing data for detecting novel genes associated with hypertension. The SNV-weighting LRT idea is empirically verified by the simulated phenotypes (336 cases and 1607 controls), and the weights based on BS annotation are applied to the real phenotypes (394 cases and 1457 controls). This strategy of weighting the prior information on protein functional sites is shown to be superior to the unweighted LRT and serves as a good complement to the existing association tests. Several putative genes are reported; some of them are functionally related to hypertension according to the previous evidence in the literature.
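
The two ingredients named in this abstract, a weighted burden score and a likelihood ratio test, can be sketched as follows. The abstract does not give the weights or the likelihood model, so the genotype matrix, the weight values, and the one-degree-of-freedom chi-square reference distribution are assumptions made for illustration only.

```python
import math

def weighted_burden(genotypes, weights):
    """Per-individual burden: sum over variants of weight * minor-allele count (0, 1 or 2)."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]

def lrt_pvalue(loglik_null, loglik_alt):
    """Likelihood ratio statistic and its p-value under a chi-square with 1 degree of freedom.

    Assumes the alternative model adds exactly one parameter (the burden effect).
    """
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    # For 1 degree of freedom, the chi-square survival function equals erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Hypothetical data: two individuals, three variants; the second variant is
# up-weighted as if it fell in a predicted binding site.
genotypes = [[0, 1, 2], [1, 0, 0]]
weights = [1.0, 2.0, 0.5]
burden = weighted_burden(genotypes, weights)
```

In a full analysis the two log-likelihoods would come from fitting a phenotype model without and with the burden score as a covariate. That fitting step is omitted here.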

8.
Methods ; 79-80: 18-31, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25944472

ABSTRACT

Tremendous advances in Next Generation Sequencing (NGS) and high-throughput omics methods have brought us one step closer towards mechanistic understanding of the complex disease at the molecular level. In this review, we discuss four basic regulatory mechanisms implicated in complex genetic diseases, such as cancer, neurological disorders, heart disease, diabetes, and many others. The mechanisms, including genetic variations, copy-number variations, posttranscriptional variations, and epigenetic variations, can be detected using a variety of NGS methods. We propose that malfunctions detected in these mechanisms are not necessarily independent, since these malfunctions are often found associated with the same disease and targeting the same gene, group of genes, or functional pathway. As an example, we discuss possible rewiring effects of the cancer-associated genetic, structural, and posttranscriptional variations on the protein-protein interaction (PPI) network centered around P53 protein. The review highlights multi-layered complexity of common genetic disorders and suggests that integration of NGS and omics data is a critical step in developing new computational methods capable of deciphering this complexity.


Subject(s)
Genetic Predisposition to Disease , Genetic Variation , Sequence Analysis, DNA/methods , Computational Biology , DNA Copy Number Variations , Epigenomics/methods , Genome-Wide Association Study , Genomics/methods , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Sequence Alignment