Genome analysis and knowledge-driven variant interpretation with TGex.

Dahary, Dvir; Golan, Yaron; Mazor, Yaron; Zelig, Ofer; Barshir, Ruth; Twik, Michal; Iny Stein, Tsippi; Rosner, Guy; Kariv, Revital; Chen, Fei; Zhang, Qiang; Shen, Yiping; Safran, Marilyn; Lancet, Doron; Fishilevich, Simon.

BMC Med Genomics ; 12(1): 200, 2019 12 30.

Article in English | MEDLINE | ID: mdl-31888639

ABSTRACT

BACKGROUND: The clinical genetics revolution ushers in great opportunities, accompanied by significant challenges. The fundamental mission in clinical genetics is to analyze genomes, and to identify the most relevant genetic variations underlying a patient's phenotypes and symptoms. The adoption of Whole Genome Sequencing requires novel capacities for interpretation of non-coding variants. RESULTS: We present TGex, the Translational Genomics expert, a novel genome variation analysis and interpretation platform, with remarkable exome analysis capacities and a pioneering approach of non-coding variants interpretation. TGex's main strength is combining state-of-the-art variant filtering with knowledge-driven analysis made possible by VarElect, our highly effective gene-phenotype interpretation tool. VarElect leverages the widely used GeneCards knowledgebase, which integrates information from > 150 automatically-mined data sources. Access to such a comprehensive data compendium also facilitates TGex's broad variant annotation, supporting evidence exploration, and decision making. TGex has an interactive, user-friendly, and easy adaptive interface, ACMG compliance, and an automated reporting system. Beyond comprehensive whole exome sequence capabilities, TGex encompasses innovative non-coding variants interpretation, towards the goal of maximal exploitation of whole genome sequence analyses in the clinical genetics practice. This is enabled by GeneCards' recently developed GeneHancer, a novel integrative and fully annotated database of human enhancers and promoters. Examining use-cases from a variety of TGex users world-wide, we demonstrate its high diagnostic yields (42% for single exome and 50% for trios in 1500 rare genetic disease cases) and critical actionable genetic findings. 
The platform's support for integration with EHR and LIMS through dedicated APIs facilitates automated retrieval of patient data for TGex's customizable reporting engine, establishing a rapid and cost-effective workflow for an entire range of clinical genetic testing, including rare disorders, cancer predisposition, tumor biopsies and health screening. CONCLUSIONS: TGex is an innovative tool for the annotation, analysis and prioritization of coding and non-coding genomic variants. It provides access to an extensive knowledgebase of genomic annotations, with intuitive and flexible configuration options, allows quick adaptation, and addresses various workflow requirements. It thus simplifies and accelerates variant interpretation in clinical genetics workflows, with remarkable diagnostic yield, as exemplified in the described use cases. TGex is available at http://tgex.genecards.org/.

Subject(s)

Genetic Variation , Genomics/methods , Databases, Genetic , Gene Frequency , Genotype , Humans , Molecular Sequence Annotation , Phenotype , Software , User-Computer Interface , Workflow

Rational confederation of genes and diseases: NGS interpretation via GeneCards, MalaCards and VarElect.

Rappaport, Noa; Fishilevich, Simon; Nudel, Ron; Twik, Michal; Belinky, Frida; Plaschkes, Inbar; Stein, Tsippi Iny; Cohen, Dana; Oz-Levi, Danit; Safran, Marilyn; Lancet, Doron.

Biomed Eng Online ; 16(Suppl 1): 72, 2017 Aug 18.

Article in English | MEDLINE | ID: mdl-28830434

ABSTRACT

BACKGROUND: A key challenge in the realm of human disease research is next generation sequencing (NGS) interpretation, whereby identified filtered variant-harboring genes are associated with a patient's disease phenotypes. This necessitates bioinformatics tools linked to comprehensive knowledgebases. The GeneCards suite databases, which include GeneCards (human genes), MalaCards (human diseases) and PathCards (human pathways) together with additional tools, are presented with the focus on MalaCards utility for NGS interpretation as well as for large scale bioinformatic analyses. RESULTS: VarElect, our NGS interpretation tool, leverages the broad information in the GeneCards suite databases. MalaCards algorithms unify disease-related terms and annotations from 69 sources. Further, MalaCards defines hierarchical relatedness: aliases, disease families, a related diseases network, categories and ontological classifications. GeneCards and MalaCards delineate and share a multi-tiered, scored gene-disease network, with stringency levels, including the definition of elite status (high-quality gene-disease pairs coming from manually curated, trustworthy sources), which covers 4500 genes for 8000 diseases. This unique resource is key to NGS interpretation by VarElect. VarElect, a comprehensive search tool that helps infer both direct and indirect links between genes and user-supplied disease/phenotype terms, is robustly strengthened by the information found in MalaCards. The indirect mode benefits from GeneCards' diverse gene-to-gene relationships, including SuperPaths (integrated biological pathways from 12 information sources). We are currently adding an important information layer in the form of "disease SuperPaths", generated from the gene-disease matrix by an algorithm similar to that previously employed for biological pathway unification. This allows the discovery of novel gene-disease and disease-disease relationships.
The advent of whole genome sequencing necessitates capacities to go beyond protein coding genes. GeneCards is highly useful in this respect, as it also addresses 101,976 non-protein-coding RNA genes. In a more recent development, we are currently adding an inclusive map of regulatory elements and their inferred target genes, generated by integration from 4 resources. CONCLUSIONS: MalaCards provides a rich big-data scaffold for in silico biomedical discovery within the gene-disease universe. VarElect, which depends significantly on both GeneCards and MalaCards power, is a potent tool for supporting the interpretation of wet-lab experiments, notably NGS analyses of disease. The GeneCards suite has thus transcended its 2-decade role in biomedical research, maturing into a key player in clinical investigation.

Subject(s)

Computational Biology/methods , Disease/genetics , High-Throughput Nucleotide Sequencing , Databases, Genetic , Genomics , Humans , Phenotype

GeneHancer: genome-wide integration of enhancers and target genes in GeneCards.

Fishilevich, Simon; Nudel, Ron; Rappaport, Noa; Hadar, Rotem; Plaschkes, Inbar; Iny Stein, Tsippi; Rosen, Naomi; Kohn, Asher; Twik, Michal; Safran, Marilyn; Lancet, Doron; Cohen, Dana.

Database (Oxford) ; 2017, 2017 01 01.

Article in English | MEDLINE | ID: mdl-28605766

ABSTRACT

A major challenge in understanding gene regulation is the unequivocal identification of enhancer elements and uncovering their connections to genes. We present GeneHancer, a novel database of human enhancers and their inferred target genes, in the framework of GeneCards. First, we integrated a total of 434 000 reported enhancers from four different genome-wide databases: the Encyclopedia of DNA Elements (ENCODE), the Ensembl regulatory build, the functional annotation of the mammalian genome (FANTOM) project and the VISTA Enhancer Browser. Employing an integration algorithm that aims to remove redundancy, GeneHancer portrays 285 000 integrated candidate enhancers (covering 12.4% of the genome), 94 000 of which are derived from more than one source, and each assigned an annotation-derived confidence score. GeneHancer subsequently links enhancers to genes, using: tissue co-expression correlation between genes and enhancer RNAs, as well as enhancer-targeted transcription factor genes; expression quantitative trait loci for variants within enhancers; and capture Hi-C, a promoter-specific genome conformation assay. The individual scores based on each of these four methods, along with gene-enhancer genomic distances, form the basis for GeneHancer's combinatorial likelihood-based scores for enhancer-gene pairing. Finally, we define 'elite' enhancer-gene relations reflecting both a high-likelihood enhancer definition and a strong enhancer-gene association. GeneHancer predictions are fully integrated in the widely used GeneCards Suite, whereby candidate enhancers and their annotations are displayed on every relevant GeneCard. This assists in the mapping of non-coding variants to enhancers, and via the linked genes, forms a basis for variant-phenotype interpretation of whole-genome sequences in health and disease. Database URL: http://www.genecards.org/.
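Illustrative note (not part of the indexed record): the abstract describes integrating enhancers from several sources while removing redundancy, but it does not give the algorithm. The Python sketch below shows one simple way such an integration could work, assuming that redundancy means overlapping genomic intervals on the same chromosome. The `Enhancer` record, the `merge_enhancers` function and the coordinates are hypothetical and are not GeneHancer's actual method or data.

```python
from typing import NamedTuple


class Enhancer(NamedTuple):
    """Hypothetical enhancer record: one reported interval from one source."""
    chrom: str
    start: int
    end: int
    source: str


def merge_enhancers(records):
    """Merge overlapping intervals per chromosome and track contributing sources.

    Assumption: two reported enhancers are redundant when their intervals
    overlap on the same chromosome. The real integration rules may differ.
    """
    merged = []
    for rec in sorted(records, key=lambda r: (r.chrom, r.start, r.end)):
        last = merged[-1] if merged else None
        if last and last["chrom"] == rec.chrom and rec.start <= last["end"]:
            # Overlap with the previous integrated element: extend it.
            last["end"] = max(last["end"], rec.end)
            last["sources"].add(rec.source)
        else:
            merged.append({"chrom": rec.chrom, "start": rec.start,
                           "end": rec.end, "sources": {rec.source}})
    return merged


# Made-up coordinates, for illustration only.
records = [
    Enhancer("chr1", 100, 200, "ENCODE"),
    Enhancer("chr1", 150, 260, "FANTOM"),
    Enhancer("chr1", 400, 450, "VISTA"),
    Enhancer("chr2", 100, 180, "Ensembl"),
]
integrated = merge_enhancers(records)
# Elements supported by more than one source, analogous to the
# "derived from more than one source" subset mentioned in the abstract.
multi_source = [e for e in integrated if len(e["sources"]) > 1]
```

Keeping the set of contributing sources on each merged element is what would allow a source-count-based confidence annotation later on; how GeneHancer actually derives its confidence score is not specified in the abstract.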

Subject(s)

Databases, Nucleic Acid , Enhancer Elements, Genetic , Genome , Sequence Analysis, DNA/methods , Web Browser , Genome-Wide Association Study , Predictive Value of Tests

MalaCards: an amalgamated human disease compendium with diverse clinical and genetic annotation and structured search.

Rappaport, Noa; Twik, Michal; Plaschkes, Inbar; Nudel, Ron; Iny Stein, Tsippi; Levitt, Jacob; Gershoni, Moran; Morrey, C Paul; Safran, Marilyn; Lancet, Doron.

Nucleic Acids Res ; 45(D1): D877-D887, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27899610

ABSTRACT

The MalaCards human disease database (http://www.malacards.org/) is an integrated compendium of annotated diseases mined from 68 data sources. MalaCards has a web card for each of ~20 000 disease entries, in six global categories. It portrays a broad array of annotation topics in 15 sections, including Summaries, Symptoms, Anatomical Context, Drugs, Genetic Tests, Variations and Publications. The Aliases and Classifications section reflects an algorithm for disease name integration across often-conflicting sources, providing effective annotation consolidation. A central feature is a balanced Genes section, with scores reflecting the strength of disease-gene associations. This is accompanied by other gene-related disease information such as pathways, mouse phenotypes and GO-terms, stemming from MalaCards' affiliation with the GeneCards Suite of databases. MalaCards' capacity to inter-link information from complementary sources, along with its elaborate search function, relational database infrastructure and convenient data dumps, allows it to tackle its rich disease annotation landscape, and facilitates systems analyses and genome sequence interpretation. MalaCards adopts a 'flat' disease-card approach, but each card is mapped to popular hierarchical ontologies (e.g. International Classification of Diseases, Human Phenotype Ontology and Unified Medical Language System) and also contains information about multi-level relations among diseases, thereby providing an optimal tool for disease representation and scrutiny.

Subject(s)

Computational Biology , Databases, Genetic , Genetic Association Studies/methods , Algorithms , Computational Biology/methods , Genetic Predisposition to Disease , Genetic Variation , Genomics/methods , Humans , Molecular Sequence Annotation , Web Browser

ORDB, HORDE, ODORactor and other on-line knowledge resources of olfactory receptor-odorant interactions.

Marenco, Luis; Wang, Rixin; McDougal, Robert; Olender, Tsviya; Twik, Michal; Bruford, Elspeth; Liu, Xinyi; Zhang, Jian; Lancet, Doron; Shepherd, Gordon; Crasto, Chiquito.

Database (Oxford) ; 2016, 2016.

Article in English | MEDLINE | ID: mdl-27694208

ABSTRACT

We present here an exploration of the evolution of three well-established, web-based resources dedicated to the dissemination of information related to olfactory receptors (ORs) and their functional ligands, odorants. These resources are: the Olfactory Receptor Database (ORDB), the Human Olfactory Data Explorer (HORDE) and ODORactor. ORDB is a repository of genomic and proteomic information related to ORs and other chemosensory receptors, such as taste and pheromone receptors. Three companion databases closely integrated with ORDB are OdorDB, ORModelDB and OdorMapDB; these resources are part of the SenseLab suite of databases (http://senselab.med.yale.edu). HORDE (http://genome.weizmann.ac.il/horde/) is a semi-automatically populated database of the OR repertoires of human and several mammals. ODORactor (http://mdl.shsmu.edu.cn/ODORactor/) provides information related to OR-odorant interactions from the perspective of the odorant. All three resources are connected to each other via web-links. Database URL: http://senselab.med.yale.edu; http://genome.weizmann.ac.il/horde/; http://mdl.shsmu.edu.cn/ODORactor/.

Subject(s)

Databases, Protein , Odorants , Receptors, Odorant/chemistry , Receptors, Odorant/metabolism , Animals , Humans , Proteomics , Receptors, Odorant/genetics

VarElect: the phenotype-based variation prioritizer of the GeneCards Suite.

Stelzer, Gil; Plaschkes, Inbar; Oz-Levi, Danit; Alkelai, Anna; Olender, Tsviya; Zimmerman, Shahar; Twik, Michal; Belinky, Frida; Fishilevich, Simon; Nudel, Ron; Guan-Golan, Yaron; Warshawsky, David; Dahary, Dvir; Kohn, Asher; Mazor, Yaron; Kaplan, Sergey; Iny Stein, Tsippi; Baris, Hagit N; Rappaport, Noa; Safran, Marilyn; Lancet, Doron.

BMC Genomics ; 17 Suppl 2: 444, 2016 06 23.

Article in English | MEDLINE | ID: mdl-27357693

ABSTRACT

BACKGROUND: Next generation sequencing (NGS) provides a key technology for deciphering the genetic underpinnings of human diseases. Typical NGS analyses of a patient depict tens of thousands of non-reference coding variants, but only one or very few are expected to be significant for the relevant disorder. In a filtering stage, one employs family segregation, rarity in the population, predicted protein impact and evolutionary conservation as a means for shortening the variation list. However, narrowing down further towards culprit disease genes usually entails laborious seeking of gene-phenotype relationships, consulting numerous separate databases. Thus, a major challenge is to transition from the few hundred shortlisted genes to the most viable disease-causing candidates. RESULTS: We describe a novel tool, VarElect (http://ve.genecards.org), a comprehensive phenotype-dependent variant/gene prioritizer, based on the widely-used GeneCards, which helps rapidly identify causal mutations with extensive evidence. The GeneCards suite offers an effective and speedy alternative, whereby >120 gene-centric automatically-mined data sources are jointly available for the task. VarElect capitalizes on this wealth of information, as well as on GeneCards' powerful free-text Boolean search and scoring capabilities, proficiently matching variant-containing genes to submitted disease/symptom keywords. The tool also leverages the rich disease and pathway information of MalaCards, the human disease database, and PathCards, the unified pathway (SuperPaths) database, both within the GeneCards Suite. The VarElect algorithm infers direct as well as indirect links between genes and phenotypes, the latter benefitting from GeneCards' diverse gene-to-gene data links in GenesLikeMe. Finally, our tool offers an extensive gene-phenotype evidence portrayal ("MiniCards") and hyperlinks to the parent databases.
CONCLUSIONS: We demonstrate that VarElect compares favorably with several often-used NGS phenotyping tools, thus providing a robust facility for ranking genes, pointing out their likelihood to be related to a patient's disease. VarElect's capacity to automatically process numerous NGS cases, either in stand-alone format or in VCF-analyzer mode (TGex and VarAnnot), is indispensable for emerging clinical projects that involve thousands of whole exome/genome NGS analyses.
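Illustrative note (not part of the indexed record): the abstract outlines two stages, a variant filtering stage (segregation, rarity, predicted impact) followed by keyword-based gene prioritization with direct and indirect gene-phenotype links. The Python sketch below is a toy version of that two-stage idea under stated assumptions: the field names (`af`, `impact`, `segregates`), the thresholds, the gene names and the hit-count scoring are all hypothetical and do not reproduce VarElect's scoring or the GeneCards search engine.

```python
def filter_variants(variants, max_af=0.01, impacts=("HIGH", "MODERATE")):
    """Keep rare, impactful variants that segregate with the disease.

    Thresholds and field names are assumptions made for this sketch.
    """
    return [v for v in variants
            if v["af"] <= max_af and v["impact"] in impacts and v["segregates"]]


def rank_genes(genes, annotations, gene_links, keywords):
    """Rank genes by (direct, indirect) keyword hits, highest first.

    direct:   keyword occurrences in the gene's own annotation text.
    indirect: best keyword hit count among genes linked to it.
    """
    keywords = [k.lower() for k in keywords]

    def hits(gene):
        text = annotations.get(gene, "").lower()
        return sum(text.count(k) for k in keywords)

    scores = {}
    for gene in genes:
        direct = hits(gene)
        indirect = max((hits(n) for n in gene_links.get(gene, ())), default=0)
        scores[gene] = (direct, indirect)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up variants, annotations and gene-to-gene links.
variants = [
    {"gene": "GENE_A", "af": 0.0005, "impact": "HIGH", "segregates": True},
    {"gene": "GENE_B", "af": 0.2, "impact": "HIGH", "segregates": True},    # too common
    {"gene": "GENE_C", "af": 0.001, "impact": "MODERATE", "segregates": True},
    {"gene": "GENE_D", "af": 0.001, "impact": "LOW", "segregates": True},   # low impact
]
annotations = {
    "GENE_A": "associated with cardiomyopathy and arrhythmia",
    "GENE_E": "cardiomyopathy gene",
}
gene_links = {"GENE_C": ["GENE_E"]}  # e.g. a shared pathway (hypothetical)

shortlist = [v["gene"] for v in filter_variants(variants)]
ranking = rank_genes(shortlist, annotations, gene_links, ["cardiomyopathy"])
```

Sorting on the tuple places any directly matching gene ahead of genes that match only through a linked gene, which mirrors the direct-versus-indirect distinction in the abstract without claiming to reproduce its scoring.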

Subject(s)

Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Algorithms , Data Mining , Databases, Genetic , Genome, Human , Humans , Phenotype

The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses.

Stelzer, Gil; Rosen, Naomi; Plaschkes, Inbar; Zimmerman, Shahar; Twik, Michal; Fishilevich, Simon; Stein, Tsippi Iny; Nudel, Ron; Lieder, Iris; Mazor, Yaron; Kaplan, Sergey; Dahary, Dvir; Warshawsky, David; Guan-Golan, Yaron; Kohn, Asher; Rappaport, Noa; Safran, Marilyn; Lancet, Doron.

Curr Protoc Bioinformatics ; 54: 1.30.1-1.30.33, 2016 06 20.

Article in English | MEDLINE | ID: mdl-27322403

ABSTRACT

GeneCards, the human gene compendium, enables researchers to effectively navigate and inter-relate the wide universe of human genes, diseases, variants, proteins, cells, and biological pathways. Our recently launched Version 4 has a revamped infrastructure facilitating faster data updates, better-targeted data queries, and friendlier user experience. It also provides a stronger foundation for the GeneCards suite of companion databases and analysis tools. Improved data unification includes gene-disease links via MalaCards and merged biological pathways via PathCards, as well as drug information and proteome expression. VarElect, another suite member, is a phenotype prioritizer for next-generation sequencing, leveraging the GeneCards and MalaCards knowledgebase. It automatically infers direct and indirect scored associations between hundreds or even thousands of variant-containing genes and disease phenotype terms. VarElect's capabilities, either independently or within TGex, our comprehensive variant analysis pipeline, help prepare for the challenge of clinical projects that involve thousands of exome/genome NGS analyses. © 2016 by John Wiley & Sons, Inc.

Subject(s)

Data Mining/methods , Databases, Genetic , Genomics/methods , Sequence Analysis/methods , High-Throughput Nucleotide Sequencing , Humans , Phenotype , Proteome , Software/standards

MalaCards: A Comprehensive Automatically-Mined Database of Human Diseases.

Rappaport, Noa; Twik, Michal; Nativ, Noam; Stelzer, Gil; Bahir, Iris; Stein, Tsippi Iny; Safran, Marilyn; Lancet, Doron.

Curr Protoc Bioinformatics ; 47: 1.24.1-19, 2014 Sep 08.

Article in English | MEDLINE | ID: mdl-25199789

ABSTRACT

Systems medicine provides insights into mechanisms of human diseases, and expedites the development of better diagnostics and drugs. To facilitate such strategies, we initiated MalaCards, a compendium of human diseases and their annotations, integrating and often remodeling information from 64 data sources. MalaCards employs, among others, the proven automatic data-mining strategies established in the construction of GeneCards, our widely used compendium of human genes. The development of MalaCards poses many algorithmic challenges, such as disease name unification, integrated classification, gene-disease association, and disease-targeted expression analysis. MalaCards displays a Web card for each of >19,000 human diseases, with 17 sections, including textual summaries, related diseases, related genes, genetic variations and tests, and relevant publications. Also included are a powerful search engine and a variety of categorized disease lists. This unit describes two basic protocols to search and browse MalaCards effectively.

Subject(s)

Automation , Data Mining , Database Management Systems , Disease , Humans , User-Computer Interface

MalaCards: an integrated compendium for diseases and their annotation.

Rappaport, Noa; Nativ, Noam; Stelzer, Gil; Twik, Michal; Guan-Golan, Yaron; Stein, Tsippi Iny; Bahir, Iris; Belinky, Frida; Morrey, C Paul; Safran, Marilyn; Lancet, Doron.

Database (Oxford) ; 2013: bat018, 2013.

Article in English | MEDLINE | ID: mdl-23584832

ABSTRACT

Comprehensive disease classification, integration and annotation are crucial for biomedical discovery. At present, disease compilation is incomplete, heterogeneous and often lacking systematic inquiry mechanisms. We introduce MalaCards, an integrated database of human maladies and their annotations, modeled on the architecture and strategy of the GeneCards database of human genes. MalaCards mines and merges 44 data sources to generate a computerized card for each of 16 919 human diseases. Each MalaCard contains disease-specific prioritized annotations, as well as inter-disease connections, empowered by the GeneCards relational database, its searches and GeneDecks set analyses. First, we generate a disease list from 15 ranked sources, using disease-name unification heuristics. Next, we use four schemes to populate MalaCards sections: (i) directly interrogating disease resources, to establish integrated disease names, synonyms, summaries, drugs/therapeutics, clinical features, genetic tests and anatomical context; (ii) searching GeneCards for related publications, and for associated genes with corresponding relevance scores; (iii) analyzing disease-associated gene sets in GeneDecks to yield affiliated pathways, phenotypes, compounds and GO terms, sorted by a composite relevance score and presented with GeneCards links; and (iv) searching within MalaCards itself, e.g. for additional related diseases and anatomical context. The latter forms the basis for the construction of a disease network, based on shared MalaCards annotations, embodying associations based on etiology, clinical features and clinical conditions. This broadly disposed network has a power-law degree distribution, suggesting that this might be an inherent property of such networks. Work in progress includes hierarchical malady classification, ontological mapping and disease set analyses, striving to make MalaCards an even more effective tool for biomedical research. 
Database URL: http://www.malacards.org/
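Illustrative note (not part of the indexed record): the abstract describes a disease network built from shared MalaCards annotations and reports a power-law degree distribution. The Python sketch below shows, on made-up data, how such a network and its degree distribution could be computed. It assumes two diseases are connected when they share at least one annotation term; the function names and the data are hypothetical, and the sample is far too small to exhibit a power law.

```python
from collections import Counter
from itertools import combinations


def build_network(annotations):
    """Connect two diseases when their annotation sets intersect (assumption)."""
    edges = set()
    for a, b in combinations(sorted(annotations), 2):
        if annotations[a] & annotations[b]:
            edges.add((a, b))
    return edges


def degree_distribution(edges):
    """Map each degree to the number of nodes with that degree.

    Nodes with no edges do not appear in the edge list and are not counted.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


# Made-up annotation sets keyed by hypothetical disease identifiers.
annotations = {
    "disease_1": {"fever", "rash"},
    "disease_2": {"rash"},
    "disease_3": {"fever"},
    "disease_4": {"cough"},
}
edges = build_network(annotations)
distribution = degree_distribution(edges)
```

On a real network, one would plot the degree counts on log-log axes and check for an approximately linear trend before describing the distribution as power-law.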

Subject(s)

Databases, Genetic , Disease/genetics , Molecular Sequence Annotation , Data Mining , Humans , Internet
