Results 1 - 18 of 18
1.
Front Genet ; 12: 642991, 2021.
Article in English | MEDLINE | ID: mdl-33763122

ABSTRACT

The steady elaboration of the Metagenomic and Metadesign of Subways and Urban Biomes (MetaSUB) international consortium project raises important new questions about the origin, variation, and antimicrobial resistance of the collected samples. CAMDA (Critical Assessment of Massive Data Analysis, http://camda.info/) forum organizes annual challenges where different bioinformatics and statistical approaches are tested on samples collected around the world for bacterial classification and prediction of geographical origin. This work proposes a method which not only predicts the locations of unknown samples, but also estimates the relative risk of antimicrobial resistance through spatial modeling. We introduce a new component in the standard analysis as we apply a Bayesian spatial convolution model which accounts for spatial structure of the data as defined by the longitude and latitude of the samples and assess the relative risk of antimicrobial resistance taxa across regions which is relevant to public health. We can then use the estimated relative risk as a new measure for antimicrobial resistance. We also compare the performance of several machine learning methods, such as Gradient Boosting Machine, Random Forest, and Neural Network to predict the geographical origin of the mystery samples. All three methods show consistent results with some superiority of Random Forest classifier. In our future work we can consider a broader class of spatial models and incorporate covariates related to the environment and climate profiles of the samples to achieve more reliable estimation of the relative risk related to antimicrobial resistance.

2.
Cancer Res ; 81(2): 282-288, 2021 01 15.
Article in English | MEDLINE | ID: mdl-33115802

ABSTRACT

Although next-generation sequencing is widely used in cancer to profile tumors and detect variants, most somatic variant callers used in these pipelines identify variants at the lowest possible granularity, single-nucleotide variants (SNV). As a result, multiple adjacent SNVs are called individually instead of as a multi-nucleotide variant (MNV). With this approach, the amino acid change from the individual SNV within a codon could be different from the amino acid change based on the MNV that results from combining SNV, leading to incorrect conclusions about the downstream effects of the variants. Here, we analyzed 10,383 variant call files (VCF) from the Cancer Genome Atlas (TCGA) and found 12,141 incorrectly annotated MNVs. Analysis of seven commonly mutated genes from 178 studies in cBioPortal revealed that MNVs were consistently missed in 20 of these studies, whereas they were correctly annotated in 15 more recent studies. At the BRAF V600 locus, the most common example of MNV, several public datasets reported separate BRAF V600E and BRAF V600M variants instead of a single merged V600K variant. VCFs from the TCGA Mutect2 caller were used to develop a solution to merge SNV to MNV. Our custom script used the phasing information from the SNV VCF and determined whether SNVs were at the same codon and needed to be merged into MNV before variant annotation. This study shows that institutions performing NGS for cancer genomics should incorporate the step of merging MNV as a best practice in their pipelines. SIGNIFICANCE: Identification of incorrect mutation calls in TCGA, including clinically relevant BRAF V600 and KRAS G12, will influence research and potentially clinical decisions.
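
The BRAF V600 case in this abstract can be illustrated with a small sketch. It is not the authors' script: it does not read VCF files or phasing tags, and it simply assumes that both SNVs sit on the same haplotype. The function names and the four-entry codon table are hypothetical and cover only the codons needed here, written in coding-strand orientation.

```python
# Minimal codon table: only the codons needed for the BRAF V600 illustration.
CODON_TABLE = {"GTG": "V", "GAG": "E", "ATG": "M", "AAG": "K"}


def apply_snvs(codon, snvs):
    """Return the codon after applying SNVs given as (offset_in_codon, alt_base)."""
    bases = list(codon)
    for offset, alt in snvs:
        bases[offset] = alt
    return "".join(bases)


def annotate(ref_codon, snvs, position):
    """Format an amino acid change such as 'V600K' for the given SNVs."""
    alt_codon = apply_snvs(ref_codon, snvs)
    return f"{CODON_TABLE[ref_codon]}{position}{CODON_TABLE[alt_codon]}"


ref = "GTG"       # codon 600 of BRAF (valine), coding-strand orientation
snv_a = (1, "A")  # second base of the codon, T>A
snv_b = (0, "A")  # first base of the codon, G>A

# Annotating each SNV on its own gives two different amino acid changes.
separate = [annotate(ref, [snv_a], 600), annotate(ref, [snv_b], 600)]

# Merging the two in-phase SNVs into one MNV before annotation gives a third.
merged = annotate(ref, [snv_a, snv_b], 600)

print(separate, merged)
```

Annotating the SNVs separately yields V600E and V600M, while the merged codon AAG yields V600K, which is the discrepancy the abstract describes.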


Subject(s)
Genome, Human , Genomics/standards , Molecular Sequence Annotation/standards , Mutation , Neoplasms/genetics , Polymorphism, Single Nucleotide , Scientific Experimental Error/statistics & numerical data , Algorithms , High-Throughput Nucleotide Sequencing/methods , Humans , Neoplasms/pathology
3.
Biotechniques ; 69(6): 420-426, 2020 12.
Article in English | MEDLINE | ID: mdl-33103912

ABSTRACT

Although next-generation sequencing assays are routinely carried out using samples from cancer trials, the sequencing data are not always of the required quality. There is a need to evaluate the performance of tissue collection sites and provide feedback about the quality of next-generation sequencing data. This study used a modeling approach based on whole exome sequencing quality control (QC) metrics to evaluate the relative performance of sites participating in the Bristol Myers Squibb Immuno-Oncology clinical trials sample collection. We identified several sample swap events. Overall, most sites performed well and few showed poor performance. These findings can increase awareness of sample failure and improve the quality of samples.


Subject(s)
Exome Sequencing , Models, Theoretical , Specimen Handling , Clinical Laboratory Techniques , Humans , Quality Control , Exome Sequencing/standards
4.
Mol Diagn Ther ; 23(4): 507-520, 2019 08.
Article in English | MEDLINE | ID: mdl-31250328

ABSTRACT

INTRODUCTION: Tumor mutational burden (TMB) has emerged as a clinically relevant biomarker that may be associated with immune checkpoint inhibitor efficacy. Standardization of TMB measurement is essential for implementing diagnostic tools to guide treatment. OBJECTIVE: Here we describe the in-depth evaluation of bioinformatic TMB analysis by whole exome sequencing (WES) in formalin-fixed, paraffin-embedded samples from a phase III clinical trial. METHODS: In the CheckMate 026 clinical trial, TMB was retrospectively assessed in 312 patients with non-small-cell lung cancer (58% of the intent-to-treat population) who received first-line nivolumab treatment or standard-of-care chemotherapy. We examined the sensitivity of TMB assessment to bioinformatic filtering methods and assessed concordance between TMB data derived by WES and the FoundationOne® CDx assay. RESULTS: TMB scores comprising synonymous, indel, frameshift, and nonsense mutations (all mutations) were 3.1-fold higher than data including missense mutations only, but values were highly correlated (Spearman's r = 0.99). Scores from CheckMate 026 samples including missense mutations only were similar to those generated from data in The Cancer Genome Atlas, but those including all mutations were generally higher. Using databases for germline subtraction (instead of matched controls) showed a trend for race-dependent increases in TMB scores. WES and FoundationOne CDx outputs were highly correlated (Spearman's r = 0.90). CONCLUSIONS: Parameter variation can impact TMB calculations, highlighting the need for standardization. Encouragingly, differences between assays could be accounted for by empirical calibration, suggesting that reliable TMB assessment across assays, platforms, and centers is achievable.


Subject(s)
Biomarkers, Tumor , Carcinoma, Non-Small-Cell Lung/genetics , Computational Biology , Lung Neoplasms/genetics , Mutation , Carcinoma, Non-Small-Cell Lung/mortality , Carcinoma, Non-Small-Cell Lung/pathology , Computational Biology/methods , Genetic Association Studies , Genetic Predisposition to Disease , Humans , Lung Neoplasms/pathology , Prognosis , Reproducibility of Results , Exome Sequencing , Workflow
5.
Mol Omics ; 15(1): 67-76, 2019 02 11.
Article in English | MEDLINE | ID: mdl-30702115

ABSTRACT

The scientific value of re-analyzing existing datasets is often proportional to the complexity of the data. Proteomics data are inherently complex and can be analyzed at many levels, including proteins, peptides, and post-translational modifications to verify and/or develop new hypotheses. In this paper, we present our re-analysis of a previously published study comparing colon biopsy samples from ulcerative colitis (UC) patients to non-affected controls. We used a different statistical approach, employing a linear mixed-effects regression model and analyzed the data both on the protein and peptide level. In addition to confirming and reinforcing the original finding of upregulation of neutrophil extracellular traps (NETs), we report novel findings, including that Extracellular Matrix (ECM) degradation and neutrophil maturation are involved in the pathology of UC. The pharmaceutically most relevant differential protein expressions were confirmed using immunohistochemistry as an orthogonal method. As part of this study, we also compared proteomics data to previously published mRNA expression data. These comparisons indicated compensatory regulation at transcription levels of the ECM proteins we identified and open possible new avenues for drug discovery.


Subject(s)
Colitis, Ulcerative/metabolism , Colitis, Ulcerative/pathology , Extracellular Matrix/metabolism , Biopsy , Case-Control Studies , Colon/metabolism , Colon/pathology , Humans , Hydroxyproline/metabolism , Proteins/genetics , Proteins/metabolism , Quality Control
6.
Cancer Discov ; 8(7): 822-835, 2018 07.
Article in English | MEDLINE | ID: mdl-29773717

ABSTRACT

KRAS is the most common oncogenic driver in lung adenocarcinoma (LUAC). We previously reported that STK11/LKB1 (KL) or TP53 (KP) comutations define distinct subgroups of KRAS-mutant LUAC. Here, we examine the efficacy of PD-1 inhibitors in these subgroups. Objective response rates to PD-1 blockade differed significantly among KL (7.4%), KP (35.7%), and K-only (28.6%) subgroups (P < 0.001) in the Stand Up To Cancer (SU2C) cohort (174 patients) with KRAS-mutant LUAC and in patients treated with nivolumab in the CheckMate-057 phase III trial (0% vs. 57.1% vs. 18.2%; P = 0.047). In the SU2C cohort, KL LUAC exhibited shorter progression-free (P < 0.001) and overall (P = 0.0015) survival compared with KRASMUT;STK11/LKB1WT LUAC. Among 924 LUACs, STK11/LKB1 alterations were the only marker significantly associated with PD-L1 negativity in TMBIntermediate/High LUAC. The impact of STK11/LKB1 alterations on clinical outcomes with PD-1/PD-L1 inhibitors extended to PD-L1-positive non-small cell lung cancer. In Kras-mutant murine LUAC models, Stk11/Lkb1 loss promoted PD-1/PD-L1 inhibitor resistance, suggesting a causal role. Our results identify STK11/LKB1 alterations as a major driver of primary resistance to PD-1 blockade in KRAS-mutant LUAC. Significance: This work identifies STK11/LKB1 alterations as the most prevalent genomic driver of primary resistance to PD-1 axis inhibitors in KRAS-mutant lung adenocarcinoma. Genomic profiling may enhance the predictive utility of PD-L1 expression and tumor mutation burden and facilitate establishment of personalized combination immunotherapy approaches for genomically defined LUAC subsets. Cancer Discov; 8(7); 822-35. ©2018 AACR. See related commentary by Etxeberria et al., p. 794. This article is highlighted in the In This Issue feature, p. 781.


Subject(s)
Adenocarcinoma of Lung/drug therapy , Drug Resistance, Neoplasm/genetics , Lung Neoplasms/drug therapy , Mutation , Nivolumab/therapeutic use , Protein Serine-Threonine Kinases/genetics , Proto-Oncogene Proteins p21(ras)/genetics , AMP-Activated Protein Kinase Kinases , Adenocarcinoma of Lung/genetics , Adenocarcinoma of Lung/metabolism , Adenocarcinoma of Lung/therapy , Adult , Aged , Aged, 80 and over , Animals , Antineoplastic Agents, Immunological/pharmacology , Antineoplastic Agents, Immunological/therapeutic use , Disease Models, Animal , Humans , Immunotherapy , Lung Neoplasms/genetics , Lung Neoplasms/metabolism , Lung Neoplasms/therapy , Male , Mice , Middle Aged , Nivolumab/pharmacology , Prognosis , Programmed Cell Death 1 Receptor/antagonists & inhibitors , Progression-Free Survival
7.
Biotechnol Bioeng ; 115(9): 2377-2382, 2018 09.
Article in English | MEDLINE | ID: mdl-29777592

ABSTRACT

This study reports findings of an unusual cluster of mutations spanning 22 bp (base pairs) in a monoclonal antibody expression vector. It was identified by two orthogonal methods: mass spectrometry on expressed protein and next-generation sequencing (NGS) on the plasmid DNA. While the initial NGS analysis confirmed the designed sequence modification, intact mass analysis detected an additional mass of the antibody molecule expressed in CHO cells. The extra mass was eventually found to be associated with unmatched nucleotides in a distal region by checking full-length sequence alignment plots. Interestingly, the complementary sequence of the mutated sequence was a reverse sequence of the original sequence and flanked by two 10-bp reverse-complementary sequences, leading to an undesirable DNA recombination. The finding highlights the necessity of rigorous examination of expression vector design and early monitoring of molecule integrity at both DNA and protein levels to prevent clones from having sequence variants during cell line development.
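
The sequence relationship described in this abstract matches an inversion between inverted repeats, which a short sketch can make concrete. The sequences below are invented toy data, since the abstract gives no actual sequences, and reading the event as an inversion is an interpretation of the abstract rather than a statement from it.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Return the base-by-base complement, keeping the same order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return complement(seq)[::-1]


# Toy data (hypothetical): a 22-bp segment flanked by two 10-bp arms
# that are reverse complements of each other.
left_arm = "ACGTTGCAAG"
right_arm = reverse_complement(left_arm)
original = "GATTACAGGCTTAACCGGATCA"

construct = left_arm + original + right_arm

# Inverting the whole construct leaves the arms unchanged and replaces
# the internal segment with its reverse complement.
inverted = reverse_complement(construct)
mutated = inverted[len(left_arm):len(left_arm) + len(original)]

# The property stated in the abstract: the complement of the mutated
# segment reads as the original segment in reverse.
print(complement(mutated) == original[::-1])
```

Because the two arms are reverse complements of each other, inverting the region between them leaves the flanks intact and changes only the internal 22 bp, which would explain why the variant appears as a compact cluster of mismatches.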


Subject(s)
Antibodies/metabolism , Genetic Vectors , Immunologic Factors/metabolism , Mutation , Recombinant Proteins/metabolism , Animals , Antibodies/chemistry , Antibodies/genetics , CHO Cells , Cricetulus , High-Throughput Nucleotide Sequencing , Immunologic Factors/chemistry , Immunologic Factors/genetics , Mass Spectrometry , Plasmids , Recombinant Proteins/chemistry , Recombinant Proteins/genetics , Recombination, Genetic
8.
Neurol Genet ; 3(6): e207, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29264398

ABSTRACT

OBJECTIVE: To examine the incidence of nonsynonymous missense variants in SCN9A (NaV1.7), SCN10A (NaV1.8), and SCN11A (NaV1.9) in patients with painful and nonpainful peripheral neuropathy. METHODS: Next-generation sequencing was performed on 457 patient DNA samples provided by the Peripheral Neuropathy Research Registry (PNRR). The patient diagnosis was as follows: 278 idiopathic peripheral neuropathy (67% painful and 33% nonpainful) and 179 diabetic distal polyneuropathy (77% painful and 23% nonpainful). RESULTS: We identified 36 (SCN9A), 31 (SCN10A), and 15 (SCN11A) nonsynonymous missense variants, with 47.7% of patients carrying a low-frequency (minor allele frequency <5%) missense variant in at least 1 gene. The incidence of previously reported gain-of-function missense variants was low (≤3%), and these were detected in patients with and without pain. There were no significant differences in missense variant allele frequencies of any gene, or SCN9A haplotype frequencies, between PNRR patients with painful or nonpainful peripheral neuropathy. PNRR patient SCN9A and SCN11A missense variant allele frequencies were not significantly different from the Exome Variant Server, European American (EVS-EA) reference population. For SCN10A, there was a significant increase in the alternate allele frequency of the common variant p.V1073A and low-frequency variant p.S509P in PNRR patients compared with EVS-EA and the 1000 Genomes European reference populations. CONCLUSIONS: These results suggest that identification of a genetically defined subpopulation for testing of NaV1.7 inhibitors in patients with peripheral neuropathy is unlikely and that additional factors, beyond expression of previously reported disease "mutations," are more important for the development of painful neuropathy than previously discussed.

9.
Bioinformatics ; 33(23): 3811-3812, 2017 Dec 01.
Article in English | MEDLINE | ID: mdl-28961906

ABSTRACT

SUMMARY: The simplicity and precision of the CRISPR/Cas9 system have ushered in a new era of gene editing. Screening for desired clones with CRISPR-mediated genomic edits in a large number of samples is made possible by next-generation sequencing (NGS) due to its multiplexing. Here we present the CRISPR-DAV (CRISPR Data Analysis and Visualization) pipeline to analyze CRISPR NGS data in a high-throughput manner. In the pipeline, Burrows-Wheeler Aligner and Assembly Based ReAlignment are used for small and large indel detection, and results are presented in a comprehensive set of charts and an interactive alignment view. AVAILABILITY AND IMPLEMENTATION: CRISPR-DAV is available at GitHub and Docker Hub repositories: https://github.com/pinetree1/crispr-dav.git and https://hub.docker.com/r/pinetree1/crispr-dav/. CONTACT: xuning.wang@bms.com.


Subject(s)
Clone Cells , Clustered Regularly Interspaced Short Palindromic Repeats , High-Throughput Nucleotide Sequencing/methods , INDEL Mutation , Sequence Analysis, DNA/methods , Software , Bacteria/genetics , Genome, Bacterial , Genomics/methods
10.
Adv Ther ; 33(7): 1169-79, 2016 07.
Article in English | MEDLINE | ID: mdl-27287851

ABSTRACT

INTRODUCTION: The combination of daclatasvir (DCV, pan-genotypic NS5A inhibitor) plus asunaprevir (ASV; NS3 protease inhibitor) is approved in Japan, Korea and other countries for the treatment of chronic hepatitis C virus (HCV) genotype (GT)-1. A high (~90 to 100%) sustained virologic response (SVR) with DCV/ASV therapy has been achieved by excluding patients infected with HCV GT-1b with baseline NS5A resistance-associated variants (RAVs) at L31 or Y93H detected by direct sequencing (DS). We set out to determine whether patients with minor variants at NS5A-L31 or -Y93H, detected by next-generation sequencing (NGS), impacted SVR rates with DCV/ASV therapy. METHODS: Baseline samples from 222 interferon (IFN)-ineligible/intolerant (N = 135) and prior non-responder (N = 87) patients infected with GT-1b who were treated with DCV/ASV for 24 weeks in the Phase 3 clinical study AI447026 were prepared for NGS (Ion-Torrent platform). The prevalence of baseline NS5A RAVs and their impact on SVR when observed at ≥1% by NGS in a patient's virus population were examined. NGS and DS (sensitivity ≥20%) data were compared. RESULTS: The prevalence of baseline NS5A RAVs at L31 or Y93H was 29% (63/219) and 18% (39/214) by NGS and DS, respectively. SVR24 rates were comparable in patients without observed baseline L31 or Y93H polymorphisms whether assessed by NGS (96%; 148/154) or by the less sensitive DS platform (95%; 164/173). CONCLUSION: Optimal SVR rates (≥95%) to DCV/ASV treatment were achieved using DS to exclude patients infected with GT-1b with NS5A RAVs at L31 or Y93H representing ≥20% of their virus population. Exclusion by NGS of patients with minor variants in NS5A (<20%) did not enhance SVR rates. These results suggest that the presence of minor variants in NS5A does not appear to impact the overall SVR rate in patients with GT-1b treated with DCV/ASV. FUNDING: This study was sponsored by Bristol-Myers Squibb. TRIAL REGISTRATION: ClinicalTrials.gov identifier: NCT01497834.
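
The percentages quoted in this abstract can be checked directly against the counts it reports. The sketch below only recomputes those ratios; the dictionary labels are shorthand introduced here, not terms from the study.

```python
# Counts (numerator, denominator) exactly as reported in the abstract.
counts = {
    "baseline RAVs by NGS": (63, 219),
    "baseline RAVs by DS": (39, 214),
    "SVR24 without RAVs, NGS": (148, 154),
    "SVR24 without RAVs, DS": (164, 173),
}

# Round each ratio to the nearest whole percent, as the abstract does.
percentages = {
    label: round(100 * numerator / denominator)
    for label, (numerator, denominator) in counts.items()
}

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

The recomputed values (29%, 18%, 96%, and 95%) agree with the figures given in the RESULTS section.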


Subject(s)
Antiviral Agents/therapeutic use , Hepacivirus/genetics , Hepatitis C, Chronic/drug therapy , Imidazoles/therapeutic use , Isoquinolines/therapeutic use , Sulfonamides/therapeutic use , Adult , Aged , Antiviral Agents/administration & dosage , Carbamates , Drug Therapy, Combination , Genotype , Humans , Imidazoles/administration & dosage , Isoquinolines/administration & dosage , Japan , Male , Middle Aged , Polymorphism, Genetic , Pyrrolidines , Sulfonamides/administration & dosage , Valine/analogs & derivatives , Viral Nonstructural Proteins/genetics , Young Adult
11.
Mol Cancer Ther ; 14(10): 2167-74, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26253517

ABSTRACT

The BET (bromodomain and extra-terminal) proteins bind acetylated histones and recruit protein complexes to promote transcription elongation. In hematologic cancers, BET proteins have been shown to regulate expression of MYC and other genes that are important to disease pathology. Pharmacologic inhibition of BET protein binding has been shown to inhibit tumor growth in MYC-dependent cancers, such as multiple myeloma. In this study, we demonstrate that small cell lung cancer (SCLC) cells are exquisitely sensitive to growth inhibition by the BET inhibitor JQ1. JQ1 treatment has no impact on MYC protein expression, but results in downregulation of the lineage-specific transcription factor ASCL1. SCLC cells that are sensitive to JQ1 are also sensitive to ASCL1 depletion by RNAi. Chromatin immunoprecipitation studies confirmed the binding of the BET protein BRD4 to the ASCL1 enhancer, and the ability of JQ1 to disrupt the interaction. The importance of ASCL1 as a potential driver oncogene in SCLC is further underscored by the observation that ASCL1 is overexpressed in >50% of SCLC specimens, an extent greater than that observed for other putative oncogenes (MYC, MYCN, and SOX2) previously implicated in SCLC. Our studies have provided a mechanistic basis for the sensitivity of SCLC to BET inhibition and a rationale for the clinical development of BET inhibitors in this disease with high unmet medical need.


Subject(s)
Antineoplastic Agents/pharmacology , Azepines/pharmacology , Basic Helix-Loop-Helix Transcription Factors/genetics , Lung Neoplasms/metabolism , Nuclear Proteins/metabolism , Small Cell Lung Carcinoma/metabolism , Transcription Factors/metabolism , Triazoles/pharmacology , Basic Helix-Loop-Helix Transcription Factors/metabolism , Cell Cycle Proteins , Cell Line, Tumor , Cell Proliferation/drug effects , Cell Survival/drug effects , Drug Resistance, Neoplasm/genetics , Drug Screening Assays, Antitumor , Enhancer Elements, Genetic , Gene Expression , Gene Expression Regulation, Neoplastic/drug effects , Humans , Inhibitory Concentration 50 , Lung Neoplasms/drug therapy , Lung Neoplasms/genetics , Protein Binding , Small Cell Lung Carcinoma/drug therapy , Small Cell Lung Carcinoma/genetics , Transcriptome/drug effects
12.
Methods Mol Biol ; 1101: 31-42, 2014.
Article in English | MEDLINE | ID: mdl-24233776

ABSTRACT

Most high-throughput methods used in molecular biology generate gene lists. Interpreting large gene lists can reveal mechanistic insights and generate useful testable hypotheses. The process can be cumbersome and challenging. Multiple commercial and open solutions currently exist that can aid researchers in the functional annotation of gene lists. The process of gene set annotation includes dataset preparation, which is method specific, gene list annotation, and analysis and interpretation of the significant associations that were found. In this chapter, we demonstrate how WebGestalt can be applied to gene lists generated from transcriptional profiling data.


Subject(s)
Antibodies, Monoclonal/pharmacology , Antineoplastic Agents/pharmacology , Gene Expression Regulation, Neoplastic/drug effects , Molecular Sequence Annotation/methods , Software , Binding Sites , Biopsy , Gene Ontology , Gene Regulatory Networks , HLA-D Antigens/physiology , Humans , Ipilimumab , Neoplasms/drug therapy , Neoplasms/genetics , Neoplasms/metabolism , Regulatory Sequences, Nucleic Acid , Signal Transduction , Transcription Factors/physiology
13.
Nucleic Acids Res ; 37(Database issue): D54-60, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18971253

ABSTRACT

The PAZAR database unites independently created and maintained data collections of transcription factor and regulatory sequence annotation. The flexible PAZAR schema permits the representation of diverse information derived from experiments ranging from biochemical protein-DNA binding to cellular reporter gene assays. Data collections can be made available to the public, or restricted to specific system users. The data 'boutiques' within the shopping-mall-inspired system facilitate the analysis of genomics data and the creation of predictive models of gene regulation. Since its initial release, PAZAR has grown in terms of data, features and through the addition of an associated package of software tools called the ORCA toolkit (ORCAtk). ORCAtk allows users to rapidly develop analyses based on the information stored in the PAZAR system. PAZAR is available at http://www.pazar.info. ORCAtk can be accessed through convenient buttons located in the PAZAR pages or via our website at http://www.cisreg.ca/ORCAtk.


Subject(s)
Databases, Genetic , Gene Expression Regulation , Regulatory Elements, Transcriptional , Software , Transcription Factors/metabolism , Base Sequence , Binding Sites , Conserved Sequence , Sequence Alignment , Sequence Analysis, DNA
14.
Genome Biol ; 8(10): R207, 2007.
Article in English | MEDLINE | ID: mdl-17916232

ABSTRACT

PAZAR is an open-access and open-source database of transcription factor and regulatory sequence annotation with associated web interface and programming tools for data submission and extraction. Curated boutique data collections can be maintained and disseminated through the unified schema of the mall-like PAZAR repository. The Pleiades Promoter Project collection of brain-linked regulatory sequences is introduced to demonstrate the depth of annotation possible within PAZAR. PAZAR, located at http://www.pazar.info, is open for business.


Subject(s)
Databases, Genetic , Internet , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/genetics
15.
Methods Mol Biol ; 408: 19-33, 2007.
Article in English | MEDLINE | ID: mdl-18314575

ABSTRACT

High-throughput experiments in biology often produce sets of genes of potential interest. Some of those gene sets might be of considerable size. Therefore, computer-assisted analysis is necessary for the biological interpretation of the gene sets, and for creating working hypotheses, which can be tested experimentally. One obvious way to analyze gene set data is to associate the genes with a particular biological feature, for example, a given pathway. Statistical analysis could be used to evaluate if a gene set is truly associated with a feature. Over the past few years many tools that perform such analysis have been created. In this chapter, using WebGestalt as an example, it will be explained in detail how to associate gene sets with functional annotations, pathways, publication records, and protein domains.


Subject(s)
Databases, Genetic , Genetic Techniques/statistics & numerical data , Software , Computational Biology , Data Interpretation, Statistical , Gene Expression Profiling/statistics & numerical data , Genomics/statistics & numerical data , Oligonucleotide Array Sequence Analysis/statistics & numerical data
16.
J Biomed Biotechnol ; 2005(2): 172-80, 2005 Jun 30.
Article in English | MEDLINE | ID: mdl-16046823

ABSTRACT

Gene expression microarray data can be used for the assembly of genetic coexpression network graphs. Using mRNA samples obtained from recombinant inbred Mus musculus strains, it is possible to integrate allelic variation with molecular and higher-order phenotypes. The depth of quantitative genetic analysis of microarray data can be vastly enhanced utilizing this mouse resource in combination with powerful computational algorithms, platforms, and data repositories. The resulting network graphs transect many levels of biological scale. This approach is illustrated with the extraction of cliques of putatively co-regulated genes and their annotation using gene ontology analysis and cis-regulatory element discovery. The causal basis for co-regulation is detected through the use of quantitative trait locus mapping.

17.
Nucleic Acids Res ; 33(Web Server issue): W741-8, 2005 Jul 01.
Article in English | MEDLINE | ID: mdl-15980575

ABSTRACT

High-throughput technologies have led to the rapid generation of large-scale datasets about genes and gene products. These technologies have also shifted our research focus from 'single genes' to 'gene sets'. We have developed a web-based integrated data mining system, WebGestalt (http://genereg.ornl.gov/webgestalt/), to help biologists in exploring large sets of genes. WebGestalt is composed of four modules: gene set management, information retrieval, organization/visualization, and statistics. The management module uploads, saves, retrieves and deletes gene sets, as well as performs Boolean operations to generate the unions, intersections or differences between different gene sets. The information retrieval module currently retrieves information for up to 20 attributes for all genes in a gene set. The organization/visualization module organizes and visualizes gene sets in various biological contexts, including Gene Ontology, tissue expression pattern, chromosome distribution, metabolic and signaling pathways, protein domain information and publications. The statistics module recommends and performs statistical tests to suggest biological areas that are important to a gene set and warrant further investigation. In order to demonstrate the use of WebGestalt, we have generated 48 gene sets with genes over-represented in various human tissue types. Exploration of all the 48 gene sets using WebGestalt is available for the public at http://genereg.ornl.gov/webgestalt/wg_enrich.php.
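
Two of the operations described in this abstract, Boolean combination of gene sets and a statistical test for over-representation, can be sketched in a few lines. The abstract does not name the tests WebGestalt runs, so the hypergeometric test here is an assumption chosen because it is a common option for enrichment analysis. The gene lists and counts are toy values, and `hypergeom_sf` is a hypothetical helper, not part of WebGestalt.

```python
from math import comb


def hypergeom_sf(k, population, annotated, drawn):
    """P(X >= k): chance of seeing at least k annotated genes when `drawn`
    genes are sampled without replacement from `population` genes, of which
    `annotated` carry the category of interest."""
    upper = min(annotated, drawn)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(population, drawn)


# Boolean operations on two toy gene sets.
set_a = {"TP53", "KRAS", "BRAF", "MYC"}
set_b = {"KRAS", "BRAF", "EGFR"}

union = set_a | set_b
intersection = set_a & set_b
difference = set_a - set_b

# Toy enrichment question: 20 genes in the reference, 5 annotated to a
# category, 4 genes in the input set, 2 of which are annotated.
p_value = hypergeom_sf(2, population=20, annotated=5, drawn=4)

print(sorted(intersection), sorted(difference), p_value)
```

A small p-value would suggest that the category is over-represented in the input set relative to the reference. In practice many categories are tested at once, so a multiple-testing correction would normally follow.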


Subject(s)
Genes , Software , Computer Graphics , Data Interpretation, Statistical , Databases, Genetic , Gene Expression , Genomics , Humans , Internet , Proteomics , Systems Integration , Tissue Distribution , User-Computer Interface
18.
BMC Bioinformatics ; 5: 16, 2004 Feb 18.
Article in English | MEDLINE | ID: mdl-14975175

ABSTRACT

BACKGROUND: Microarray and other high-throughput technologies are producing large sets of interesting genes that are difficult to analyze directly. Bioinformatics tools are needed to interpret the functional information in the gene sets. RESULTS: We have created a web-based tool for data analysis and data visualization for sets of genes called GOTree Machine (GOTM). This tool was originally intended to analyze sets of co-regulated genes identified from microarray analysis but is adaptable for use with other gene sets from other high-throughput analyses. GOTree Machine generates a GOTree, a tree-like structure to navigate the Gene Ontology Directed Acyclic Graph for input gene sets. This system provides user friendly data navigation and visualization. Statistical analysis helps users to identify the most important Gene Ontology categories for the input gene sets and suggests biological areas that warrant further study. GOTree Machine is available online at http://genereg.ornl.gov/gotm/. CONCLUSION: GOTree Machine has a broad application in functional genomic, proteomic and other high-throughput methods that generate large sets of interesting genes; its primary purpose is to help users sort for interesting patterns in gene sets.
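
Navigating a directed acyclic graph as a tree usually involves rolling gene annotations up from specific terms to their ancestors, so that each node shows every gene at or below it. The sketch below illustrates that idea on a four-term toy graph. The term names, gene names, and functions are hypothetical, and the abstract does not say that GOTM is implemented this way.

```python
# Toy DAG: each term maps to its parent terms. Term "C" has two parents,
# which is what makes this a DAG rather than a tree.
PARENTS = {"root": [], "A": ["root"], "B": ["root"], "C": ["A", "B"]}

# Direct annotations: each gene maps to the terms it is annotated to.
DIRECT = {"gene1": ["C"], "gene2": ["A"], "gene3": ["B"]}


def ancestors(term):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_per_term(direct):
    """Collect, for each term, the genes annotated to it or to a descendant."""
    genes = {term: set() for term in PARENTS}
    for gene, terms in direct.items():
        for term in terms:
            for node in {term} | ancestors(term):
                genes[node].add(gene)
    return genes


rolled_up = genes_per_term(DIRECT)
for term in sorted(rolled_up):
    print(term, sorted(rolled_up[term]))
```

Storing genes in sets means a gene that reaches an ancestor through more than one path, such as `gene1` reaching `root` through both `A` and `B`, is counted only once.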


Subject(s)
Genes, Insect/physiology , Genes/physiology , Animals , Cluster Analysis , Computational Biology/statistics & numerical data , Computer Graphics/statistics & numerical data , Data Interpretation, Statistical , Databases, Genetic/statistics & numerical data , Diptera/genetics , Gene Expression Profiling/statistics & numerical data , Genome , Genome, Human , Humans , Internet , Mice , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Rats , Software/statistics & numerical data , Software Design , User-Computer Interface