Results 1 - 20 of 25
1.
Cancer Res ; 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38924467

ABSTRACT

Adaptive metabolic switches are proposed to underlie conversions between cellular states during normal development as well as in cancer evolution. Metabolic adaptations represent important therapeutic targets in tumors, highlighting the need to characterize the full spectrum, characteristics, and regulation of metabolic switches. To test the hypothesis that metabolic switches associated with specific metabolic states can be recognized by locating large alternating gene expression patterns, we developed a method to identify interspersed gene sets by massively correlated biclustering (MCbiclust) and to predict their metabolic wiring. Applying the method to breast cancer transcriptome datasets revealed a series of gene sets with switch-like behavior that could be used to predict mitochondrial content, metabolic activity, and central carbon flux in tumors. The predictions were experimentally validated by bioenergetic profiling and metabolic flux analysis of ¹³C-labelled substrates. The metabolic switch positions also distinguished between cellular states, correlating with tumor pathology, prognosis, and chemosensitivity. The method is applicable to any large and heterogeneous transcriptome dataset to discover metabolic and associated pathophysiological states.

2.
Bioinformatics ; 39(7)2023 07 01.
Article in English | MEDLINE | ID: mdl-37364005

ABSTRACT

MOTIVATION: Liquid Chromatography Tandem Mass Spectrometry experiments aim to produce high-quality fragmentation spectra, which can be used to annotate metabolites. However, current Data-Dependent Acquisition approaches may fail to collect spectra of sufficient quality and quantity, and they extend poorly across multiple samples, either because they do not share information across samples or because they require manual expert input. RESULTS: We present TopNEXt, a real-time scan prioritization framework that improves data acquisition in multi-sample Liquid Chromatography Tandem Mass Spectrometry metabolomics experiments. TopNEXt extends traditional Data-Dependent Acquisition exclusion methods across multiple samples by using a Region of Interest and intensity-based scoring system. Through both simulated and lab experiments, we show that methods incorporating these novel concepts acquire fragmentation spectra for an additional 10% of our set of target peaks, with an additional 20% of acquisition intensity. By increasing the quality and quantity of fragmentation spectra, TopNEXt can help improve metabolite identification with a potential impact across a variety of experimental contexts. AVAILABILITY AND IMPLEMENTATION: TopNEXt is implemented as part of the ViMMS framework and the latest version can be found at https://github.com/glasgowcompbio/vimms. A stable version used to produce our results is available at DOI 10.5281/zenodo.7468914.
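To make the idea of cross-sample, intensity-aware exclusion more concrete, here is a minimal Python sketch under stated assumptions. It is not the ViMMS or TopNEXt API: the class `CrossSampleExcluder`, its m/z (ppm) and retention-time tolerances, and the rule "score a region by its intensity if it is new, or only by the intensity gain if it was already fragmented in an earlier sample" are hypothetical simplifications of the Region of Interest and intensity-based scoring described above.

```python
from dataclasses import dataclass, field


@dataclass
class FragmentedRegion:
    """A previously fragmented region (hypothetical stand-in for an ROI)."""
    mz: float
    rt: float
    intensity: float  # precursor intensity at the time it was fragmented


@dataclass
class CrossSampleExcluder:
    """Toy exclusion list that persists across samples (hypothetical)."""
    mz_tol_ppm: float = 10.0
    rt_tol: float = 15.0  # seconds
    history: list = field(default_factory=list)

    def _match(self, mz, rt):
        # Return the most intense earlier fragmentation of this region, if any.
        best = None
        for region in self.history:
            close_mz = abs(region.mz - mz) <= region.mz * self.mz_tol_ppm * 1e-6
            close_rt = abs(region.rt - rt) <= self.rt_tol
            if close_mz and close_rt and (best is None or region.intensity > best.intensity):
                best = region
        return best

    def score(self, mz, rt, intensity):
        """Full intensity for a new region; otherwise only the gain over the best earlier scan."""
        previous = self._match(mz, rt)
        if previous is None:
            return intensity
        return max(0.0, intensity - previous.intensity)

    def pick_top_n(self, candidates, n):
        """Rank (mz, rt, intensity) candidates and fragment the top n with a positive score."""
        scored = sorted(
            ((self.score(mz, rt, i), mz, rt, i) for mz, rt, i in candidates),
            reverse=True,
        )
        chosen = [(mz, rt, i) for s, mz, rt, i in scored[:n] if s > 0]
        for mz, rt, i in chosen:
            self.history.append(FragmentedRegion(mz, rt, i))
        return chosen
```

Scoring by intensity gain rather than excluding a region outright lets a region be re-fragmented in a later sample if it appears there at higher intensity, while ions never fragmented before are preferred; this is one way to trade coverage against spectral quality.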


Subject(s)
Metabolomics , Mass Spectrometry/methods , Chromatography, Liquid/methods , Metabolomics/methods
3.
Front Mol Biosci ; 10: 1130781, 2023.
Article in English | MEDLINE | ID: mdl-36959982

ABSTRACT

Data-Dependent and Data-Independent Acquisition modes (DDA and DIA, respectively) are both widely used to acquire MS2 spectra in untargeted liquid chromatography tandem mass spectrometry (LC-MS/MS) metabolomics analyses. Despite their wide use, few studies have systematically compared their MS/MS spectral annotation performance in untargeted settings, owing to the lack of ground truth and the cost of running a large number of acquisitions. Here, we present a systematic in silico comparison of these two acquisition methods in untargeted metabolomics by extending our Virtual Metabolomics Mass Spectrometer (ViMMS) framework with a DIA module. Our results show that the relative performance of these methods varies, with the average number of co-eluting ions being the most important factor. At low numbers of co-eluting ions, DIA outperforms DDA, but at higher numbers DDA has the advantage, as DIA can no longer deal with the large amount of overlapping ion chromatograms. Results from simulation were further validated on an actual mass spectrometer, demonstrating that conclusions drawn from ViMMS simulations translate well into the real world. A key advantage of this work is the versatility of ViMMS in simulating different parameters of both DDA and DIA modes. Researchers can explore and compare the performance of different acquisition methods within ViMMS without running expensive and time-consuming experiments on real samples. By identifying the strengths and limitations of each acquisition method, researchers can optimize their choice and obtain more accurate and robust results. Furthermore, the ability to simulate and validate results using ViMMS can save significant time and resources by reducing the number of experiments required. 
This work not only provides valuable insights into the performance of DDA and DIA but also opens the door to further advances in LC-MS/MS data acquisition methods.
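The abstract identifies the average number of co-eluting ions as the key factor separating DDA and DIA performance. As an illustrative assumption (ViMMS may define the quantity differently), it can be computed from each ion's chromatographic elution window as the mean number of other ions whose windows overlap it; the function name `mean_coeluting` is hypothetical.

```python
def mean_coeluting(peaks):
    """Average number of *other* ions whose elution window overlaps each ion's window.

    peaks: list of (rt_start, rt_end) tuples, e.g. in seconds.
    Two closed intervals [s1, e1] and [s2, e2] overlap when s1 <= e2 and s2 <= e1.
    """
    if not peaks:
        return 0.0
    total = 0
    for i, (s1, e1) in enumerate(peaks):
        # Count every other peak whose window intersects this one.
        total += sum(
            1 for j, (s2, e2) in enumerate(peaks)
            if j != i and s1 <= e2 and s2 <= e1
        )
    return total / len(peaks)
```

The higher this number, the more precursors share an acquisition window in DIA, so their fragments are mixed in the same MS2 spectra, which is consistent with the reported loss of DIA's advantage at high co-elution.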

4.
Bioinformatics ; 38(3): 730-737, 2022 01 12.
Article in English | MEDLINE | ID: mdl-33471074

ABSTRACT

MOTIVATION: High-throughput gene expression can be used to address a wide range of fundamental biological problems, but datasets of an appropriate size are often unavailable. Moreover, existing transcriptomics simulators have been criticized because they fail to emulate key properties of gene expression data. In this article, we develop a method based on a conditional generative adversarial network to generate realistic transcriptomics data for Escherichia coli and humans. We assess the performance of our approach across several tissues and cancer types. RESULTS: We show that our model preserves several gene expression properties significantly better than widely used simulators, such as SynTReN or GeneNetWeaver. The synthetic data preserve tissue- and cancer-specific properties of transcriptomics data. Moreover, they exhibit real gene clusters and ontologies at both local and global scales, suggesting that the model learns to approximate the gene expression manifold in a biologically meaningful way. AVAILABILITY AND IMPLEMENTATION: Code is available at: https://github.com/rvinas/adversarial-gene-expression. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Escherichia coli , Gene Expression Profiling , Humans , Gene Expression Profiling/methods , Gene Expression
5.
JAMA Netw Open ; 4(10): e2124946, 2021 10 01.
Article in English | MEDLINE | ID: mdl-34633425

ABSTRACT

Importance: Machine learning could be used to predict the likelihood of diagnosis and severity of illness. Lack of COVID-19 patient data has hindered the data science community from developing models to aid in the response to the pandemic. Objectives: To describe the rapid development and evaluation, by citizen scientists, of clinical algorithms that predict COVID-19 diagnosis and hospitalization from patient data, provide an unbiased assessment of model performance, and benchmark model performance on subgroups. Design, Setting, and Participants: This diagnostic and prognostic study operated a continuous, crowdsourced challenge using a model-to-data approach to securely enable the use of regularly updated COVID-19 patient data from the University of Washington by participants from May 6 to December 23, 2020. A postchallenge analysis was conducted from December 24, 2020, to April 7, 2021, to assess the generalizability of models on the cumulative data set as well as subgroups stratified by age, sex, race, and time of COVID-19 test. By December 23, 2020, this challenge engaged 482 participants from 90 teams and 7 countries. Main Outcomes and Measures: Machine learning algorithms used patient data to output a score that represented the probability of patients receiving a positive COVID-19 test result or being hospitalized within 21 days after receiving a positive COVID-19 test result. Algorithms were evaluated using area under the receiver operating characteristic curve (AUROC) and area under the precision recall curve (AUPRC) scores. Ensemble models aggregating models from the top challenge teams were developed and evaluated. Results: In the analysis using the cumulative data set, the best performance for COVID-19 diagnosis prediction was an AUROC of 0.776 (95% CI, 0.775-0.777) and an AUPRC of 0.297, and for hospitalization prediction, an AUROC of 0.796 (95% CI, 0.794-0.798) and an AUPRC of 0.188. 
Analysis of the top models submitted to the challenge showed consistently better performance in the female group than in the male group. Among all age groups, the best performance was obtained for the 25- to 49-year age group, and the worst performance was obtained for the group aged 17 years or younger. Conclusions and Relevance: In this diagnostic and prognostic study, models submitted by citizen scientists achieved high performance for the prediction of COVID-19 testing and hospitalization outcomes. Evaluation of challenge models on demographic subgroups and prospective data revealed performance discrepancies, providing insights into the potential bias and limitations in the models.
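The two evaluation metrics named above can be computed without any library. The sketch below assumes binary labels (1 = positive outcome) and one score per patient. It computes AUROC as the probability that a randomly chosen positive outscores a randomly chosen negative (ties count one half), and AUPRC as average precision, which is one common estimator among several; the challenge's own scoring code may differ.

```python
def auroc(scores, labels):
    """AUROC via the pairwise (Mann-Whitney) formulation; O(n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """AUPRC estimated as average precision: mean precision at each positive's rank."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` predictions
    return total / n_pos
```

For a classifier that ranks at random, the expected AUPRC is roughly the fraction of positives, so AUPRC values such as those reported above are best read against the outcome prevalence rather than against 0.5.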


Subject(s)
Algorithms , Benchmarking , COVID-19/diagnosis , Clinical Decision Rules , Crowdsourcing , Hospitalization/statistics & numerical data , Machine Learning , Adolescent , Adult , Aged , Aged, 80 and over , Area Under Curve , COVID-19/epidemiology , COVID-19/therapy , COVID-19 Testing , Child , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Male , Middle Aged , Models, Statistical , Prognosis , ROC Curve , Severity of Illness Index , Washington/epidemiology , Young Adult
6.
EMBO Rep ; 21(9): e48260, 2020 09 03.
Article in English | MEDLINE | ID: mdl-32783398

ABSTRACT

IκB kinase ε (IKKε) is a key molecule at the crossroads of inflammation and cancer. Known to regulate cytokine secretion via NFκB and IRF3, the kinase is also a breast cancer oncogene, overexpressed in a variety of tumours. However, the extent to which IKKε remodels cellular metabolism is currently unknown. Here, we used metabolic tracer analysis to show that IKKε orchestrates a complex metabolic reprogramming that affects mitochondrial metabolism and consequently serine biosynthesis, independently of its canonical signalling role. We found that IKKε upregulates the serine biosynthesis pathway (SBP) indirectly, by limiting the utilisation of glucose-derived pyruvate in the TCA cycle and thereby inhibiting oxidative phosphorylation. Inhibition of mitochondrial function induces activating transcription factor 4 (ATF4), which in turn drives upregulation of the expression of SBP genes. Importantly, pharmacological reversal of the IKKε-induced metabolic phenotype reduces proliferation of breast cancer cells. Finally, we show that in a highly proliferative set of ER-negative, basal breast tumours, IKKε and PSAT1 are both overexpressed, corroborating the link between IKKε and the SBP in the clinical context.


Subject(s)
Breast Neoplasms , I-kappa B Kinase , Mitochondria , Serine/biosynthesis , Breast Neoplasms/genetics , Female , Humans , I-kappa B Kinase/genetics , Mitochondria/genetics , Mitochondria/metabolism , Oncogenes/genetics
7.
Cell ; 178(6): 1299-1312.e29, 2019 09 05.
Article in English | MEDLINE | ID: mdl-31474368

ABSTRACT

Metformin is the first-line therapy for treating type 2 diabetes and a promising anti-aging drug. We set out to address the fundamental question of how gut microbes and nutrition, key regulators of host physiology, influence the effects of metformin. Combining two tractable genetic models, the bacterium E. coli and the nematode C. elegans, we developed a high-throughput four-way screen to define the underlying host-microbe-drug-nutrient interactions. We show that microbes integrate cues from metformin and the diet through the phosphotransferase signaling pathway that converges on the transcriptional regulator Crp. A detailed experimental characterization of metformin effects downstream of Crp, in combination with metabolic modeling of the microbiota in metformin-treated type 2 diabetic patients, predicts the production of microbial agmatine, a regulator of metformin effects on host lipid metabolism and lifespan. Our high-throughput screening platform paves the way for identifying exploitable drug-nutrient-microbiome interactions to improve host health and longevity through targeted microbiome therapies.


Subject(s)
Diabetes Mellitus, Type 2/drug therapy , Gastrointestinal Microbiome/drug effects , Host Microbial Interactions/drug effects , Hypoglycemic Agents/therapeutic use , Metformin/therapeutic use , Agmatine/metabolism , Animals , Caenorhabditis elegans/microbiology , Cyclic AMP Receptor Protein , Escherichia coli/drug effects , Escherichia coli/genetics , Humans , Hypoglycemic Agents/pharmacology , Lipid Metabolism/drug effects , Longevity/drug effects , Metformin/pharmacology , Nutrients/metabolism
8.
Methods Mol Biol ; 1928: 469-478, 2019.
Article in English | MEDLINE | ID: mdl-30725470

ABSTRACT

Transcription of a large set of nuclear-encoded genes underlies the biogenesis of mitochondria and is regulated by a complex network of transcription factors and co-regulators. Remarkable heterogeneity can be detected in the expression of these genes across cell types and tissues, and the recent availability of large gene expression compendia allows specific mitochondrial biogenesis patterns to be quantified. We have developed a method to perform this task effectively. Massively correlated biclustering (MCbiclust) is a novel bioinformatics method that has been successfully applied to identify co-regulation patterns in large gene sets that underlie essential cellular functions and determine cell types. The method has recently been evaluated and is available as an R package in Bioconductor. One potential application of the method is to compare the expression of nuclear-encoded mitochondrial genes, or larger sets of metabolism-related genes, between different cell types or cellular metabolic states. Here we describe the essential steps for using MCbiclust to investigate the co-regulation of mitochondrial genes and metabolic pathways.


Subject(s)
Cluster Analysis , Computational Biology , Gene Expression Profiling , Gene Expression Regulation , Genes, Mitochondrial , Mitochondria/metabolism , Algorithms , Computational Biology/methods , Databases, Genetic , Gene Expression Profiling/methods , Gene Regulatory Networks , Metabolic Networks and Pathways
9.
Nucleic Acids Res ; 45(15): 8712-8730, 2017 Sep 06.
Article in English | MEDLINE | ID: mdl-28911113

ABSTRACT

The potential to understand fundamental biological processes from gene expression data has grown in parallel with the recent explosion in the size of data collections. However, exploiting this potential requires novel analytical methods capable of discovering large co-regulated gene networks. We found current methods limited in the size of correlated gene sets they could discover within biologically heterogeneous data collections, hampering the identification of fundamental cellular processes controlled by multiple genes, such as energy metabolism, organelle biogenesis and stress responses. Here we describe a novel biclustering algorithm called Massively Correlated Biclustering (MCbiclust) that selects samples and genes from large datasets with maximal correlated gene expression, allowing regulation of complex networks to be examined. The method has been evaluated using synthetic data and applied to large bacterial and cancer cell datasets. We show that the large biclusters discovered, which had so far eluded existing techniques, are biologically relevant, and thus MCbiclust has great potential in the analysis of transcriptomics data to identify large-scale unknown effects hidden within the data. The identified massive biclusters can be used to develop improved transcriptomics-based diagnostic tools for diseases caused by altered gene expression, or for further network analysis to understand genotype-phenotype correlations.
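As a simplified illustration of scoring a bicluster by how strongly its genes co-vary across a chosen subset of samples, the sketch below computes the mean absolute pairwise Pearson correlation of a gene set over selected samples and greedily grows a seed set of samples. This is a hypothetical toy, not the MCbiclust Bioconductor implementation; the function names, the dictionary input format, and the greedy rule are assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def correlation_score(expr, genes, samples):
    """Mean |r| over all gene pairs, using only the selected sample columns.

    expr: dict mapping gene name -> list of expression values (one per sample).
    """
    pairs = list(combinations(genes, 2))
    if not pairs:
        raise ValueError("need at least two genes")
    total = 0.0
    for g1, g2 in pairs:
        x = [expr[g1][s] for s in samples]
        y = [expr[g2][s] for s in samples]
        total += abs(pearson(x, y))
    return total / len(pairs)


def greedy_samples(expr, genes, seed, n_total):
    """Grow `seed` one sample at a time, always adding the sample that maximizes the score."""
    n_samples = len(next(iter(expr.values())))
    chosen = list(seed)
    while len(chosen) < n_total:
        remaining = [s for s in range(n_samples) if s not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda s: correlation_score(expr, genes, chosen + [s]))
        chosen.append(best)
    return chosen
```

Using absolute correlation rewards both co-activation and anti-correlation, so a gene set whose members switch in opposite directions across samples still scores highly.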


Subject(s)
Algorithms , Datasets as Topic , Gene Expression Profiling , Gene Regulatory Networks/physiology , High-Throughput Nucleotide Sequencing , Neoplasms/genetics , Cluster Analysis , Databases, Genetic , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Gene Expression Regulation , Genes, Regulator , Genetic Association Studies/methods , Genetic Association Studies/statistics & numerical data , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Oligonucleotide Array Sequence Analysis/methods , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Phenotype
10.
Cell ; 169(3): 442-456.e18, 2017 04 20.
Article in English | MEDLINE | ID: mdl-28431245

ABSTRACT

Fluoropyrimidines are the first-line treatment for colorectal cancer, but their efficacy is highly variable between patients. We asked whether gut microbes, a known source of inter-individual variability, affect drug efficacy. Combining two tractable genetic models, the bacterium E. coli and the nematode C. elegans, we performed three-way high-throughput screens that unraveled the complexity underlying host-microbe-drug interactions. We report that microbes can bolster or suppress the effects of fluoropyrimidines through metabolic drug interconversion involving bacterial vitamin B6, B9, and ribonucleotide metabolism. In addition, disturbances in bacterial deoxynucleotide pools amplify 5-fluorouracil-induced autophagy and cell death in host cells, an effect regulated by the nucleoside diphosphate kinase ndk-1. Our data suggest a two-way bacterial mediation of fluoropyrimidine effects on host metabolism, which contributes to drug efficacy. These findings highlight the potential therapeutic power of manipulating intestinal microbiota to ensure host metabolic health and treat disease.


Subject(s)
Antineoplastic Agents/metabolism , Escherichia coli/metabolism , Fluorouracil/metabolism , Gastrointestinal Microbiome , Animals , Autophagy , Caenorhabditis elegans , Cell Death , Colorectal Neoplasms/drug therapy , Diet , Escherichia coli/enzymology , Escherichia coli/genetics , Humans , Models, Animal , Pentosyltransferases/genetics
11.
Am J Med Genet B Neuropsychiatr Genet ; 174(3): 235-250, 2017 Apr.
Article in English | MEDLINE | ID: mdl-27696737

ABSTRACT

Response to antidepressant (AD) treatment may be a more polygenic trait than previously hypothesized, with many genetic variants interacting in as yet unclear ways. In this study, we used methods that can automatically learn to detect patterns of statistical regularity from a sparsely distributed signal across hippocampal transcriptome measurements in a large-scale animal pharmacogenomic study, in order to uncover genomic variations associated with AD response. The study used four inbred mouse strains of both sexes, two drug treatments (escitalopram and nortriptyline), and a saline control group. Multi-class and binary classification using Machine Learning (ML) and regularization algorithms, combined with iterative and univariate feature selection methods including InfoGain, mRMR, ANOVA, and chi-square, were used to uncover genomic markers associated with AD response. Relevant genes were selected based on Jaccard distance and carried forward for gene-network analysis. Linear association methods uncovered only one gene associated with drug treatment response. The implementation of ML algorithms, together with feature reduction methods, revealed a set of 204 genes associated with SSRI response and 241 genes associated with NRI response. Although only 10% of genes overlapped across the two drugs, network analysis showed that both drugs modulated the CREB pathway through different molecular mechanisms. Through careful implementation and optimisations, the algorithms detected a weak signal that was used to predict whether an animal was treated with nortriptyline (77%) or escitalopram (67%) on an independent testing set. The results from this study indicate that the molecular signature of AD treatment may include a much broader range of genomic markers than previously hypothesized, suggesting that response to medication may be as complex as the pathology. The search for biomarkers of antidepressant treatment response could therefore consider a larger number of genetic markers and their interactions. 
Through predominantly different molecular targets and mechanisms of action, the two drugs modulate the same Creb1 pathway, which plays a key role in neurotrophic responses and in inflammatory processes. © 2016 The Authors. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics Published by Wiley Periodicals, Inc.
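The abstract does not specify how Jaccard distance was applied. Assuming it was used to compare the gene lists returned by different feature-selection methods (for example InfoGain, mRMR, and ANOVA), a minimal sketch might look like this; `pairwise_jaccard`, `consensus`, the `min_methods` threshold, and any gene identifiers in example input are hypothetical.

```python
from collections import Counter
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; 0.0 for identical sets, 1.0 for disjoint sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_jaccard(selections):
    """Jaccard distance between every pair of methods' selected gene lists.

    selections: dict mapping method name -> iterable of selected genes.
    """
    return {
        (m1, m2): jaccard_distance(selections[m1], selections[m2])
        for m1, m2 in combinations(sorted(selections), 2)
    }


def consensus(selections, min_methods):
    """Genes selected by at least `min_methods` different methods, sorted by name."""
    counts = Counter(g for genes in selections.values() for g in set(genes))
    return sorted(g for g, c in counts.items() if c >= min_methods)
```

A low distance between two methods' lists suggests that they agree on the signal, while genes recurring across several methods are natural candidates to carry forward into network analysis.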


Subject(s)
Antidepressive Agents/therapeutic use , Serotonin and Noradrenaline Reuptake Inhibitors/pharmacology , Animals , Citalopram/therapeutic use , Cyclic AMP Response Element-Binding Protein , Depression/drug therapy , Depressive Disorder/drug therapy , Depressive Disorder/genetics , Disease Models, Animal , Female , Hippocampus , Male , Mice , Multifactorial Inheritance/genetics , Nortriptyline/therapeutic use , Pharmacogenetics , Selective Serotonin Reuptake Inhibitors/therapeutic use , Serotonin and Noradrenaline Reuptake Inhibitors/therapeutic use , Transcriptome/genetics , Treatment Outcome
12.
Genome Biol ; 17(1): 184, 2016 09 07.
Article in English | MEDLINE | ID: mdl-27604469

ABSTRACT

BACKGROUND: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. RESULTS: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1 with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 with those of CAFA2. CONCLUSIONS: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and that predictions in the biological process and human phenotype ontologies are relatively diverse. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.


Subject(s)
Computational Biology , Proteins/chemistry , Software , Structure-Activity Relationship , Algorithms , Databases, Protein , Gene Ontology , Humans , Molecular Sequence Annotation , Proteins/genetics
13.
Nucleic Acids Res ; 41(Web Server issue): W349-57, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23748958

ABSTRACT

Here, we present the new UCL Bioinformatics Group's PSIPRED Protein Analysis Workbench. The Workbench unites all of our previously available analysis methods into a single web-based framework. The new web portal provides a greatly streamlined user interface with a number of new features that allow users to better explore their results. We offer a number of additional services to enable computationally scalable execution of our prediction methods; these include SOAP and XML-RPC web server access and new Hadoop packages. All software and services are available via the UCL Bioinformatics Group website at http://bioinf.cs.ucl.ac.uk/.


Subject(s)
Protein Conformation , Software , Animals , Internet , Membrane Proteins/chemistry , Mice , Proteins/chemistry , Sequence Analysis, Protein , Structural Homology, Protein
14.
BMC Bioinformatics ; 14 Suppl 3: S1, 2013.
Article in English | MEDLINE | ID: mdl-23514099

ABSTRACT

BACKGROUND: Accurate protein function annotation is a severe bottleneck when utilizing the deluge of high-throughput, next-generation sequencing data. Keeping database annotations up to date has become a major scientific challenge that requires the development of reliable automatic predictors of protein function. The CAFA experiment provided a unique opportunity to undertake comprehensive 'blind testing' of many diverse approaches for automated function prediction. We report on the methodology we used for this challenge and on the lessons we learnt. METHODS: Our method integrates into a single framework a wide variety of biological information sources, encompassing sequence, gene expression and protein-protein interaction data, as well as annotations in UniProt entries. The methodology transfers functional categories based on the results from complementary homology-based and feature-based analyses. We generated the final molecular function and biological process assignments by combining the initial predictions in a probabilistic manner that takes into account the hierarchical structure of the Gene Ontology. RESULTS: We propose a novel scoring function called COmbined Graph-Information Content similarity (COGIC) score for the comparison of predicted functional categories and benchmark data. We demonstrate that our integrative approach provides increased scope and accuracy over both the component methods and the naïve predictors. In line with previous studies, we find that molecular function predictions are more accurate than biological process assignments. CONCLUSIONS: Overall, the results indicate that there is considerable room for improvement in the field. The community must still invest a great deal of effort to make automated function prediction a useful and routine component of the life scientist's toolbox. 
As already witnessed in other areas, community-wide blind testing experiments will be pivotal in establishing standards for the evaluation of prediction accuracy, in fostering advancements and new ideas, and ultimately in recording progress.
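The methods section above describes combining predictions while respecting the Gene Ontology hierarchy. One common way to enforce such consistency, shown here as an assumption rather than the authors' exact procedure, is the true-path rule: a term's score should be at least the highest score among its descendants. The sketch takes a hypothetical `parents` mapping from each term to its parent terms; the term names in any example input are placeholders.

```python
def propagate_scores(scores, parents):
    """Return a copy of `scores` made consistent with a DAG of terms.

    Every ancestor ends up with a score >= the maximum score of its descendants.
    scores:  dict term -> predicted probability
    parents: dict term -> list of parent terms (the ontology's is_a edges)
    """
    result = dict(scores)  # do not mutate the caller's predictions
    changed = True
    while changed:  # repeat until a full pass makes no update
        changed = False
        for term, score in list(result.items()):
            for parent in parents.get(term, []):
                if result.get(parent, 0.0) < score:
                    result[parent] = score  # push the child's score up one level
                    changed = True
    return result
```

Because scores only ever increase to values already present, the loop terminates on any acyclic ontology; a probabilistic combination like the one described above would typically replace this simple max rule.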


Subject(s)
Proteins/physiology , Computational Biology/methods , Databases, Protein , Evolution, Molecular , Gene Expression , Molecular Sequence Annotation , Protein Interaction Mapping , Proteins/chemistry , Proteins/genetics , Sequence Analysis
15.
Nat Methods ; 10(3): 221-7, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23353650

ABSTRACT

Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.


Subject(s)
Computational Biology/methods , Molecular Biology/methods , Molecular Sequence Annotation , Proteins/physiology , Algorithms , Animals , Databases, Protein , Exoribonucleases/classification , Exoribonucleases/genetics , Exoribonucleases/physiology , Forecasting , Humans , Proteins/chemistry , Proteins/classification , Proteins/genetics , Species Specificity
16.
Bioinformatics ; 29(4): 413-9, 2013 Feb 15.
Article in English | MEDLINE | ID: mdl-23239673

ABSTRACT

MOTIVATION: Linkage analysis remains an important tool in elucidating the genetic component of disease and has become even more important with the advent of whole exome sequencing, enabling the user to focus on only those genomic regions co-segregating with Mendelian traits. Unfortunately, methods to perform multipoint linkage analysis scale poorly with either the number of markers or with the size of the pedigree. Large pedigrees with many markers can only be evaluated with Markov chain Monte Carlo (MCMC) methods that are slow to converge and, because no attempts have been made to exploit parallelism, massively underuse available processing power. Here, we describe SWIFTLINK, a novel application that performs MCMC linkage analysis by spreading the computational burden between multiple processor cores and a graphics processing unit (GPU) simultaneously. SWIFTLINK was designed around the concept of explicitly matching the characteristics of an algorithm with the underlying computer architecture to maximize performance. RESULTS: We implement our approach using existing Gibbs samplers redesigned for parallel hardware. We applied SWIFTLINK to a real-world dataset, performing parametric multipoint linkage analysis on a highly consanguineous pedigree with EAST syndrome, containing 28 members, where a subset of individuals were genotyped with single nucleotide polymorphisms (SNPs). In our experiments with a four-core CPU and GPU, SWIFTLINK achieves an 8.5× speed-up over the single-threaded version and a 109× speed-up over the popular linkage analysis program SIMWALK. AVAILABILITY: SWIFTLINK is available at https://github.com/ajm/swiftlink. All source code is licensed under GPLv3.


Subject(s)
Genetic Linkage , Software , Algorithms , Genomics , Hearing Loss, Sensorineural/genetics , Humans , Intellectual Disability/genetics , Markov Chains , Monte Carlo Method , Pedigree , Polymorphism, Single Nucleotide , Seizures/genetics
17.
J Clin Microbiol ; 50(7): 2419-27, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22553238

ABSTRACT

The introduction of pneumococcal conjugate vaccines necessitates continued monitoring of circulating strains to assess vaccine efficacy and replacement serotypes. Conventional serological methods are costly, labor-intensive, and prone to misidentification, while current DNA-based methods have limited serotype coverage requiring multiple PCR primers. In this study, a computer algorithm was developed to interrogate the capsulation locus (cps) of vaccine serotypes to locate primer pairs in conserved regions that border variable regions and could differentiate between serotypes. In silico analysis of cps from 92 serotypes indicated that a primer pair spanning the regulatory gene cpsB could putatively amplify 84 serotypes and differentiate 46. This primer set was specific to Streptococcus pneumoniae, with no amplification observed for other species, including S. mitis, S. oralis, and S. pseudopneumoniae. One hundred thirty-eight pneumococcal strains covering 48 serotypes were tested. Of 23 vaccine serotypes included in the study, most (19/22, 86%) were identified correctly at least to the serogroup level, including all of the 13-valent conjugate vaccine and other replacement serotypes. Reproducibility was demonstrated by the correct sequetyping of different strains of a serotype. This novel sequence-based method employing a single PCR primer pair is cost-effective and simple. Furthermore, it has the potential to identify new serotypes that may evolve in the future.
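The primer-search step above can be illustrated with a toy sketch over a gapless multiple alignment: find windows where every sequence agrees (candidate primer sites), then pair windows that enclose variable columns, so the amplicon between them can differ between serotypes. The function names, window length `k`, and 100% identity threshold are assumptions; the published algorithm, primer melting-temperature checks, and real cps sequences are not reproduced here.

```python
from collections import Counter


def column_identity(alignment):
    """Fraction of sequences sharing the most common base at each alignment column."""
    n = len(alignment)
    return [Counter(col).most_common(1)[0][1] / n for col in zip(*alignment)]


def conserved_windows(alignment, k, min_identity=1.0):
    """Start positions of length-k windows in which every column is conserved."""
    ident = column_identity(alignment)
    return [
        i for i in range(len(ident) - k + 1)
        if all(x >= min_identity for x in ident[i:i + k])
    ]


def flanking_pairs(alignment, k, min_variable=1):
    """(forward, reverse) window starts that do not overlap and enclose
    at least `min_variable` non-identical columns between them."""
    ident = column_identity(alignment)
    starts = conserved_windows(alignment, k)
    pairs = []
    for a in starts:
        for b in starts:
            if b >= a + k:  # reverse site lies entirely downstream of the forward site
                between = ident[a + k:b]
                if sum(1 for x in between if x < 1.0) >= min_variable:
                    pairs.append((a, b))
    return pairs
```

A single pair found this way would, in principle, amplify every aligned sequence while the variable region between the sites carries the sequence differences used for typing.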


Subject(s)
Molecular Typing/methods , Polymerase Chain Reaction/methods , Streptococcus pneumoniae/classification , Streptococcus pneumoniae/genetics , Computational Biology , Conserved Sequence , DNA Primers/genetics , DNA, Bacterial/chemistry , DNA, Bacterial/genetics , Humans , Molecular Sequence Data , Pneumococcal Infections/microbiology , Reproducibility of Results , Sensitivity and Specificity , Sequence Analysis, DNA , Serotyping/methods , Streptococcus pneumoniae/isolation & purification
18.
Ann Hum Genet ; 74(6): 555-65, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20946257

ABSTRACT

Gene duplications represent an important class of evolutionary events that is likely to have contributed to the unique human phenotype in the short evolutionary time since the human-chimpanzee divergence. With the availability of both human and chimpanzee genome drafts in high coverage re-sequencing assemblies and the high annotation quality of most human genes, it should now be possible to identify all human lineage-specific gene duplication events (human inparalogues) and a few pioneering studies have attempted to do that. However, the different levels of coverage in the human and chimpanzee genome assemblies, and the differing levels of gene annotation, have led to problematic assumptions and oversimplifications in the algorithms and the datasets used to detect human lineage-specific gene duplications. In this study, we have developed a set of bioinformatic tools to overcome a number of the conceptual problems that are prevalent in previous studies and have collected a reliable and representative set of human inparalogues.
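The core idea behind a human inparalogue (two human genes that duplicated after the human-chimpanzee split) can be illustrated with a simplified similarity rule: the two human copies should be more similar to each other than either is to any chimpanzee gene. The sketch below is a generic illustration under that assumption, not the tools developed in the study; `score` is a hypothetical symmetric similarity function, and the sketch deliberately ignores the assembly-coverage and annotation issues the abstract identifies as the main pitfalls.

```python
def human_inparalogue_pairs(human_genes, chimp_genes, score):
    """Return human gene pairs more similar to each other than either is to
    any chimpanzee gene. `score(a, b)` is a hypothetical symmetric similarity
    (higher = more similar)."""
    # Best similarity of each human gene to any chimpanzee gene.
    best_chimp = {h: max(score(h, c) for c in chimp_genes) for h in human_genes}
    pairs = []
    for i, a in enumerate(human_genes):
        for b in human_genes[i + 1:]:
            s = score(a, b)
            if s > best_chimp[a] and s > best_chimp[b]:
                pairs.append((a, b))
    return pairs


# Toy similarities (illustrative values only, not real data).
TOY_SIM = {
    frozenset({"H1", "H2"}): 0.99,  # putative human-specific duplicate pair
    frozenset({"H1", "C1"}): 0.97,
    frozenset({"H2", "C1"}): 0.96,
    frozenset({"H3", "C2"}): 0.98,  # ordinary one-to-one ortholog
}


def toy_score(a, b):
    return TOY_SIM.get(frozenset({a, b}), 0.0)


if __name__ == "__main__":
    print(human_inparalogue_pairs(["H1", "H2", "H3"], ["C1", "C2"], toy_score))
```

With the toy values, only `("H1", "H2")` is reported. A missing or poorly assembled chimpanzee ortholog would lower `best_chimp` and create a false positive, which is one concrete way uneven coverage can bias this kind of rule.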


Subject(s)
Computational Biology , Evolution, Molecular , Gene Duplication , Genome, Human , Algorithms , Animals , Humans , Models, Biological , Molecular Sequence Annotation , Pan troglodytes/genetics , Proteome/genetics
19.
PLoS One ; 3(7): e2712, 2008 Jul 16.
Article in English | MEDLINE | ID: mdl-18628962

ABSTRACT

BACKGROUND: While much progress has been made in understanding stem cell (SC) function, a complete description of the molecular mechanisms regulating SCs is not yet established. This lack of knowledge is a major barrier holding back the discovery of therapeutic uses of SCs. We investigated the value of a novel meta-analysis of microarray gene expression in mouse SCs to aid the elucidation of regulatory mechanisms common to SCs and particular SC types. METHODOLOGY/PRINCIPAL FINDINGS: We added value to previously published microarray gene expression data by characterizing the promoter type likely to regulate transcription. Promoters of up-regulated genes in SCs were characterized in terms of alternative promoter (AP) usage and CpG-richness, with the aim of correlating features known to affect transcriptional control with SC function. We found that SCs have a higher proportion of up-regulated genes using CpG-rich promoters compared with the negative controls. Comparing subsets of SC type with the controls, a slightly different story unfolds: the differences between the proliferating adult SCs and the embryonic SCs versus the negative controls are statistically significant, whilst the difference between the quiescent adult SCs and the negative controls is not. On examination of AP usage, no difference was observed between SCs and the controls. However, comparing the subsets of SC type with the controls, the quiescent adult SCs are found to up-regulate a larger proportion of genes that have APs compared to the controls, and the converse is true for the proliferating adult SCs and the embryonic SCs. CONCLUSIONS/SIGNIFICANCE: These findings suggest that looking at features associated with control of transcription is a promising future approach for characterizing "stemness" and that further investigations of stemness could benefit from separate considerations of different SC states. For example, "proliferating-stemness" is shown here, in terms of promoter usage, to be distinct from "quiescent-stemness".
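The two steps in this analysis, classifying promoters as CpG-rich and comparing the proportion of CpG-rich promoters between SC and control gene sets, can be sketched as below. The abstract does not give the exact criteria or statistical test, so both are assumptions: the thresholds follow the classic CpG-island style (GC fraction above 0.5 and observed/expected CpG ratio above 0.6, without the usual minimum window length), and the comparison uses a two-sided two-proportion z-test. Function names are hypothetical.

```python
import math


def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc, obs_exp


def is_cpg_rich(seq, min_gc=0.5, min_obs_exp=0.6):
    """Assumed CpG-island-style thresholds; the study's cut-offs may differ."""
    gc, obs_exp = cpg_stats(seq)
    return gc > min_gc and obs_exp > min_obs_exp


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between proportions x1/n1 and x2/n2,
    e.g. CpG-rich promoters among up-regulated SC genes vs. controls."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # all or none CpG-rich in both groups: no evidence
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

For example, `cpg_stats("CGCGCGCG")` gives a GC fraction of 1.0 and an observed/expected ratio of 2.0, whereas "AACCGGTT" sits exactly at 0.5 GC and is therefore not classed as CpG-rich. A Fisher's exact test would be a reasonable alternative to the z-test when the gene sets are small.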


Subject(s)
Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Stem Cells/cytology , Animals , Computational Biology/methods , CpG Islands , Genomics , Mice , Models, Genetic , Promoter Regions, Genetic , Research Design , Transcription, Genetic
20.
Curr Protein Pept Sci ; 8(2): 181-8, 2007 Apr.
Article in English | MEDLINE | ID: mdl-17430199

ABSTRACT

Domain prediction from sequence is a particularly challenging task, and currently, a large variety of different methodologies are employed to tackle the task. Here we try to classify these diverse approaches into a number of broad categories. Completely automatic domain prediction from sequence alone is currently fraught with problems, but this should not be so surprising since human experts currently have significant disagreement on domain assignment even when given the structures. It can be argued that we should only test the domain prediction methods on benchmark data that human experts agree upon and this is the approach we take in this paper. Even for the data sets on which human experts agree, automatic structure-based domain assignment still cannot always agree, and so again it is still unlikely that domain prediction methods will reliably obtain correct results completely automatically. We make the argument that computer-assisted domain prediction is a more achievable goal. With this aim in mind, we present the DomPred server. This server provides the user with the results from two completely different categories of method (DPS and DomSSEA). In this paper, each method is individually benchmarked against one of the latest domain prediction benchmarks to provide information about their respective reliabilities. A variety of different benchmark scores are employed since the accuracy of a domain prediction method depends critically on what types of results one wishes to obtain (single/multi-domain classification, domain number, residue linker positions, etc.). Both of these methods, implemented within the DomPred server, can also suggest alternative domain predictions, allowing the user to make the final decision based on these results and applying their own background knowledge to the problem. The DomPred server is available from the URL: http://bioinf.cs.ucl.ac.uk/software.html.
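The point that accuracy "depends critically on what types of results one wishes to obtain" can be made concrete by scoring one prediction three ways. The sketch below is illustrative only and is not the benchmark used in the paper: it represents domain boundaries as residue positions of linkers (so k boundaries means k + 1 domains) and assumes a ±20-residue tolerance for counting a boundary as correct, a common convention that the abstract does not specify.

```python
def boundary_recall(predicted, true, tolerance=20):
    """Fraction of true domain boundaries matched by a predicted boundary
    within +/- `tolerance` residues (the tolerance is an assumed convention)."""
    if not true:
        return 1.0 if not predicted else 0.0
    hits = sum(1 for t in true if any(abs(p - t) <= tolerance for p in predicted))
    return hits / len(true)


def evaluate_prediction(predicted, true, tolerance=20):
    """Score one protein three ways. Boundaries are linker residue positions,
    so a protein with k boundaries has k + 1 domains."""
    return {
        # Single- vs. multi-domain classification.
        "single_multi_correct": bool(predicted) == bool(true),
        # Exact number of domains.
        "domain_number_correct": len(predicted) == len(true),
        # Positional accuracy of the linkers.
        "boundary_recall": boundary_recall(predicted, true, tolerance),
    }
```

A prediction of boundaries at residues 120 and 300 for a protein with one true boundary at 125 is correct as a multi-domain call and locates the real linker (recall 1.0), yet gets the domain number wrong. The same prediction can therefore look good or bad depending on which score a benchmark reports.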


Subject(s)
Computers , Databases, Protein , Proteins/chemistry , Protein Conformation