Results 1 - 11 of 11
1.
Am J Hum Genet ; 97(5): 677-90, 2015 Nov 05.
Article in English | MEDLINE | ID: mdl-26544803

ABSTRACT

Genetic prediction based on either identity by state (IBS) sharing or pedigree information has been investigated extensively with best linear unbiased prediction (BLUP) methods. Such methods were pioneered in plant and animal-breeding literature and have since been applied to predict human traits, with the aim of eventual clinical utility. However, methods to combine IBS sharing and pedigree information for genetic prediction in humans have not been explored. We introduce a two-variance-component model for genetic prediction: one component for IBS sharing and one for approximate pedigree structure, both estimated with genetic markers. In simulations using real genotypes from the Candidate-gene Association Resource (CARe) and Framingham Heart Study (FHS) family cohorts, we demonstrate that the two-variance-component model achieves gains in prediction r² over standard BLUP at current sample sizes, and we project, based on simulations, that these gains will continue to hold at larger sample sizes. Accordingly, in analyses of four quantitative phenotypes from CARe and two quantitative phenotypes from FHS, the two-variance-component model significantly improves prediction r² in each case, with up to a 20% relative improvement. We also find that standard mixed-model association tests can produce inflated test statistics in datasets with related individuals, whereas the two-variance-component model corrects for inflation.


Subject(s)
Cardiovascular Diseases/diagnosis , Genetic Markers , Genome-Wide Association Study , Models, Genetic , Models, Statistical , Quantitative Trait Loci , Cardiovascular Diseases/genetics , Computer Simulation , Datasets as Topic , Family , Genetic Association Studies , Genomics/methods , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics , Principal Component Analysis , Selection, Genetic/genetics
2.
Cell ; 161(3): 647-660, 2015 Apr 23.
Article in English | MEDLINE | ID: mdl-25910212

ABSTRACT

How disease-associated mutations impair protein activities in the context of biological networks remains mostly undetermined. Although a few renowned alleles are well characterized, functional information is missing for over 100,000 disease-associated variants. Here we functionally profile several thousand missense mutations across a spectrum of Mendelian disorders using various interaction assays. The majority of disease-associated alleles exhibit wild-type chaperone binding profiles, suggesting they preserve protein folding or stability. While common variants from healthy individuals rarely affect interactions, two-thirds of disease-associated alleles perturb protein-protein interactions, with half corresponding to "edgetic" alleles affecting only a subset of interactions while leaving most other interactions unperturbed. With transcription factors, many alleles that leave protein-protein interactions intact affect DNA binding. Different mutations in the same gene leading to different interaction profiles often result in distinct disease phenotypes. Thus disease-associated alleles that perturb distinct protein activities rather than grossly affecting folding and stability are relatively widespread.


Subject(s)
Disease/genetics , Mutation, Missense , Protein Interaction Maps , Proteins/genetics , Proteins/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Genome-Wide Association Study , Humans , Open Reading Frames , Protein Folding , Protein Stability
3.
Nat Genet ; 47(3): 284-90, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25642633

ABSTRACT

Linear mixed models are a powerful statistical tool for identifying genetic associations and avoiding confounding. However, existing methods are computationally intractable in large cohorts and may not optimize power. All existing methods require time cost O(MN²) (where N is the number of samples and M is the number of SNPs) and implicitly assume an infinitesimal genetic architecture in which effect sizes are normally distributed, which can limit power. Here we present a far more efficient mixed-model association method, BOLT-LMM, which requires only a small number of O(MN) time iterations and increases power by modeling more realistic, non-infinitesimal genetic architectures via a Bayesian mixture prior on marker effect sizes. We applied BOLT-LMM to 9 quantitative traits in 23,294 samples from the Women's Genome Health Study (WGHS) and observed significant increases in power, consistent with simulations. Theory and simulations show that the boost in power increases with cohort size, making BOLT-LMM appealing for genome-wide association studies in large cohorts.


Subject(s)
Bayes Theorem , Genetic Association Studies/methods , Genome, Human , Algorithms , Female , Genotyping Techniques , Humans , Linear Models , Polymorphism, Single Nucleotide , Quantitative Trait Loci
4.
Cell ; 158(2): 434-448, 2014 Jul 17.
Article in English | MEDLINE | ID: mdl-25036637

ABSTRACT

Chaperones are abundant cellular proteins that promote the folding and function of their substrate proteins (clients). In vivo, chaperones also associate with a large and diverse set of cofactors (cochaperones) that regulate their specificity and function. However, how these cochaperones regulate protein folding and whether they have chaperone-independent biological functions is largely unknown. We combined mass spectrometry and quantitative high-throughput LUMIER assays to systematically characterize the chaperone-cochaperone-client interaction network in human cells. We uncover hundreds of chaperone clients, delineate their participation in specific cochaperone complexes, and establish a surprisingly distinct network of protein-protein interactions for cochaperones. As a salient example of the power of such analysis, we establish that NUDC family cochaperones specifically associate with structurally related but evolutionarily distinct β-propeller folds. We provide a framework for deciphering the proteostasis network and its regulation in development and disease and expand the use of chaperones as sensors for drug-target engagement.


Subject(s)
HSP70 Heat-Shock Proteins/metabolism , HSP90 Heat-Shock Proteins/metabolism , Protein Interaction Maps , Humans , Protein Folding , Tacrolimus Binding Proteins/metabolism
5.
Genetics ; 197(3): 1045-9, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24788602

ABSTRACT

Using a reduced subset of SNPs in a linear mixed model can improve power for genome-wide association studies, yet this can result in insufficient correction for population stratification. We propose a hybrid approach using principal components that does not inflate statistics in the presence of population stratification and improves power over standard linear mixed models.


Subject(s)
Genome-Wide Association Study , Principal Component Analysis , Confounding Factors, Epidemiologic , Genetics, Population , Humans , Linear Models , Models, Genetic , Multiple Sclerosis/genetics , Polymorphism, Single Nucleotide/genetics
6.
BMC Syst Biol ; 8: 13, 2014 Feb 07.
Article in English | MEDLINE | ID: mdl-24507381

ABSTRACT

BACKGROUND: Accurate estimation of parameters of biochemical models is required to characterize the dynamics of molecular processes. This problem is intimately linked to identifying the most informative experiments for accomplishing such tasks. While significant progress has been made, effective experimental strategies for parameter identification and for distinguishing among alternative network topologies remain unclear. We approached these questions in an unbiased manner using a unique community-based approach in the context of the DREAM initiative (Dialogue for Reverse Engineering Assessment of Methods). We created an in silico test framework under which participants could probe a network with hidden parameters by requesting a range of experimental assays; results of these experiments were simulated according to a model of network dynamics only partially revealed to participants.
RESULTS: We proposed two challenges; in the first, participants were given the topology and underlying biochemical structure of a 9-gene regulatory network and were asked to determine its parameter values. In the second challenge, participants were given an incomplete topology with 11 genes and asked to find three missing links in the model. In both challenges, a budget was provided to buy experimental data generated in silico with the model and mimicking the features of different common experimental techniques, such as microarrays and fluorescence microscopy. Data could be bought at any stage, allowing participants to implement an iterative loop of experiments and computation.
CONCLUSIONS: A total of 19 teams participated in this competition. The results suggest that the combination of state-of-the-art parameter estimation and a varied set of experimental methods using a few datasets, mostly fluorescence imaging data, can accurately determine parameters of biochemical models of gene regulation. However, the task is considerably more difficult if the gene network topology is not completely defined, as in challenge 2. Importantly, we found that aggregating independent parameter predictions and network topology across submissions creates a solution that can be better than the one from the best-performing submission.


Subject(s)
Computational Biology/methods , Gene Regulatory Networks , Computer Simulation , Kinetics , Models, Genetic , Time Factors
7.
BMC Bioinformatics ; 14: 299, 2013 Oct 04.
Article in English | MEDLINE | ID: mdl-24093595

ABSTRACT

BACKGROUND: Comprehensive protein-protein interaction (PPI) maps are a powerful resource for uncovering the molecular basis of genetic interactions and providing mechanistic insights. Over the past decade, high-throughput experimental techniques have been developed to generate PPI maps at proteome scale, first using yeast two-hybrid approaches and more recently via affinity purification combined with mass spectrometry (AP-MS). Unfortunately, data from both protocols are prone to both high false positive and false negative rates. To address these issues, many methods have been developed to post-process raw PPI data. However, with few exceptions, these methods only analyze binary experimental data (in which each potential interaction tested is deemed either observed or unobserved), neglecting quantitative information available from AP-MS such as spectral counts.
RESULTS: We propose a novel method for incorporating quantitative information from AP-MS data into existing PPI inference methods that analyze binary interaction data. Our approach introduces a probabilistic framework that models the statistical noise inherent in observations of co-purifications. Using a sampling-based approach, we model the uncertainty of interactions with low spectral counts by generating an ensemble of possible alternative experimental outcomes. We then apply the existing method of choice to each alternative outcome and aggregate results over the ensemble. We validate our approach on three recent AP-MS data sets and demonstrate performance comparable to or better than state-of-the-art methods. Additionally, we provide an in-depth discussion comparing the theoretical bases of existing approaches and identify common aspects that may be key to their performance.
CONCLUSIONS: Our sampling framework extends the existing body of work on PPI analysis using binary interaction data to apply to the richer quantitative data now commonly available through AP-MS assays. This framework is quite general, and many enhancements are likely possible. Fruitful future directions may include investigating more sophisticated schemes for converting spectral counts to probabilities and applying the framework to direct protein complex prediction methods.


Subject(s)
Computational Biology/methods , Mass Spectrometry/methods , Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/metabolism , Chromatography, Affinity/methods , Databases, Protein
8.
Med Teach ; 35(8): e1340-64, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23848374

ABSTRACT

BACKGROUND: Traditionally, clinical learning for medical students consists of short-term and opportunistic encounters with primarily acute-care patients, supervised by an array of clinician preceptors. In response to educational concerns, some medical schools have developed longitudinal placements rather than short-term rotations. Many of these longitudinal placements are also integrated across the core clinical disciplines, are commonly termed longitudinal integrated clerkships (LICs) and are often situated in rural locations. This review aimed to explore, analyse and synthesise evidence relating to the effectiveness of longitudinal placements for medical students, in particular to determine which aspects are most critical to successful outcomes.
METHOD: Extensive search of the literature resulted in 1679 papers and abstracts being considered, with 53 papers ultimately being included for review. The review group coded these 53 papers according to standard BEME review guidelines. Specific information extracted included: data relating to effectiveness, the location of the study, number of students involved, format, length and description of placement, the learning outcomes, research design, the impact level for evaluation and the main evaluation methods and findings. We applied a realist approach to consider what works well for whom and under what circumstances.
FINDINGS: The early LICs were all community-based immersion programs, situated in general practice and predominantly in rural settings. More recent LIC innovations were situated in tertiary-level specialist ambulatory care in urban settings. Not all placements were integrated across medical disciplines but were longitudinal in relation to location, patient base and/or supervision. Twenty-four papers focussed on one of four programs from different viewpoints. Most evaluations were student opinion (survey, interview, focus group) and/or student assessment results. Placements varied from one half day per week for six months through to full time immersion for more than 12 months. The predominant mechanism relating to factors influencing effectiveness was continuity of one or more of: patient care, supervision and mentorship, peer group and location. The success of LICs and participation satisfaction depended on the preparation of both students and clinical supervisors, and the level of support each received from their academic institutions.
CONCLUSION: Longitudinal placements, including longitudinal integrated placements, are gaining in popularity as an alternative to traditional block rotations. Although relatively few established LICs currently exist, medical schools may look for ways to incorporate some of the principles of LICs more generally in their clinical education programmes. Further research is required to ascertain the optimum length of time for placements depending on the defined learning outcomes and timing within the programme, which students are most likely to benefit and the effects of context such as location and type of integration.


Subject(s)
Clinical Clerkship/organization & administration , Education, Medical, Undergraduate/organization & administration , Attitude of Health Personnel , Behavior , Career Choice , Clinical Clerkship/standards , Clinical Competence , Education, Medical, Undergraduate/standards , Educational Measurement , Health Knowledge, Attitudes, Practice , Humans , Learning , Mentors , Peer Group , Program Evaluation , Residence Characteristics , Time Factors
9.
Environ Sci Technol ; 46(1): 19-26, 2012 Jan 03.
Article in English | MEDLINE | ID: mdl-21776976

ABSTRACT

Soil contamination near munitions plants and testing grounds is a serious environmental concern that can result in the formation of tissue chemical residue in exposed animals. Quantitative prediction of tissue residue still represents a challenging task despite long-term interest and pursuit, as tissue residue formation is the result of many dynamic processes including uptake, transformation, and assimilation. The availability of high-dimensional microarray gene expression data presents a new opportunity for computational predictive modeling of tissue residue from changes in expression profile. Here we analyzed a 240-sample data set with measurements of transcriptome-wide gene expression and tissue residue of two chemicals, 2,4,6-trinitrotoluene (TNT) and 1,3,5-trinitro-1,3,5-triazacyclohexane (RDX), in the earthworm Eisenia fetida. We applied two different computational approaches, LASSO (Least Absolute Shrinkage and Selection Operator) and RF (Random Forest), to identify predictor genes and build predictive models. Each approach was tested alone and in combination with a prior variable selection procedure that involved the Wilcoxon rank-sum test and HOPACH (Hierarchical Ordered Partitioning And Collapsing Hybrid). Model evaluation results suggest that LASSO was the best performer of minimum complexity on the TNT data set, whereas the combined Wilcoxon-HOPACH-RF approach achieved the highest prediction accuracy on the RDX data set. Our models separately identified two small sets of ca. 30 predictor genes for RDX and TNT. We have demonstrated that both LASSO and RF are powerful tools for quantitative prediction of tissue residue. They also leave more unknown than explained, however, allowing room for improvement with other computational methods and extension to mixture contamination scenarios.


Subject(s)
Explosive Agents/toxicity , Gene Expression Regulation/drug effects , Models, Biological , Oligochaeta/drug effects , Oligochaeta/genetics , Oligonucleotide Array Sequence Analysis , Organ Specificity/genetics , Animals , DNA Probes/metabolism , Databases, Genetic , Environmental Monitoring , Molecular Sequence Annotation , Organ Specificity/drug effects , Reproducibility of Results , Survival Analysis , Toxicity Tests , Triazines/toxicity , Trinitrotoluene/toxicity
10.
Sci Signal ; 4(196): rs10, 2011 Oct 25.
Article in English | MEDLINE | ID: mdl-22028469

ABSTRACT

Characterizing the extent and logic of signaling networks is essential to understanding specificity in such physiological and pathophysiological contexts as cell fate decisions and mechanisms of oncogenesis and resistance to chemotherapy. Cell-based RNA interference (RNAi) screens enable the inference of large numbers of genes that regulate signaling pathways, but these screens cannot provide network structure directly. We describe an integrated network around the canonical receptor tyrosine kinase (RTK)-Ras-extracellular signal-regulated kinase (ERK) signaling pathway, generated by combining parallel genome-wide RNAi screens with protein-protein interaction (PPI) mapping by tandem affinity purification-mass spectrometry. We found that only a small fraction of the total number of PPI or RNAi screen hits was isolated under all conditions tested and that most of these represented the known canonical pathway components, suggesting that much of the core canonical ERK pathway is known. Because most of the newly identified regulators are likely cell type- and RTK-specific, our analysis provides a resource for understanding how output through this clinically relevant pathway is regulated in different contexts. We report in vivo roles for several of the previously unknown regulators, including CG10289 and PpV, the Drosophila orthologs of two components of the serine/threonine-protein phosphatase 6 complex; the Drosophila ortholog of TepIV, a glycophosphatidylinositol-linked protein mutated in human cancers; CG6453, a noncatalytic subunit of glucosidase II; and Rtf1, a histone methyltransferase.


Subject(s)
Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Genomics/methods , MAP Kinase Signaling System , Proteomics/methods , Algorithms , Animals , Blotting, Western , Cell Line , Drosophila/cytology , Drosophila/genetics , Drosophila/metabolism , Extracellular Signal-Regulated MAP Kinases/genetics , Extracellular Signal-Regulated MAP Kinases/metabolism , Gene Regulatory Networks , Immunoprecipitation , Models, Genetic , Protein Binding , Protein Interaction Mapping/methods , RNA Interference , Receptor Protein-Tyrosine Kinases/genetics , Receptor Protein-Tyrosine Kinases/metabolism , Wings, Animal/growth & development , Wings, Animal/metabolism , ras Proteins/genetics , ras Proteins/metabolism
11.
PLoS One ; 6(12): e29095, 2011.
Article in English | MEDLINE | ID: mdl-22216175

ABSTRACT

A major goal of large-scale genomics projects is to enable the use of data from high-throughput experimental methods to predict complex phenotypes such as disease susceptibility. The DREAM5 Systems Genetics B Challenge solicited algorithms to predict soybean plant resistance to the pathogen Phytophthora sojae from training sets including phenotype, genotype, and gene expression data. The challenge test set was divided into three subcategories, one requiring prediction based on only genotype data, another on only gene expression data, and the third on both genotype and gene expression data. Here we present our approach, primarily using regularized regression, which received the best-performer award for subchallenge B2 (gene expression only). We found that despite the availability of 941 genotype markers and 28,395 gene expression features, optimal models determined by cross-validation experiments typically used fewer than ten predictors, underscoring the importance of strong regularization in noisy datasets with far more features than samples. We also present substantial analysis of the training and test setup of the challenge, identifying high variance in performance on the gold standard test sets.


Subject(s)
Glycine max/microbiology , Phytophthora/physiology , Genotype , Phenotype , Glycine max/genetics