1.
bioRxiv ; 2024 May 04.
Article in English | MEDLINE | ID: mdl-38746274

ABSTRACT

The explosion of sequence data has allowed the rapid growth of protein language models (pLMs). pLMs have now been employed in many frameworks including variant-effect and peptide-specificity prediction. Traditionally, for protein-protein or peptide-protein interactions (PPIs), corresponding sequences are either co-embedded followed by post-hoc integration or the sequences are concatenated prior to embedding. Interestingly, no method utilizes a language representation of the interaction itself. We developed an interaction LM (iLM), which uses a novel language to represent interactions between protein/peptide sequences. Sliding Window Interaction Grammar (SWING) leverages differences in amino acid properties to generate an interaction vocabulary. This vocabulary is the input into an LM followed by a supervised prediction step where the LM's representations are used as features. SWING was first applied to predicting peptide:MHC (pMHC) interactions. SWING was not only successful at generating Class I and Class II models whose predictive performance is comparable to state-of-the-art approaches, but the unique Mixed Class model was also successful at jointly predicting both classes. Further, the SWING model trained only on Class I alleles was predictive for Class II, a complex prediction task not attempted by any existing approach. For de novo data, using only Class I or Class II data, SWING also accurately predicted Class II pMHC interactions in murine models of SLE (MRL/lpr model) and T1D (NOD model), which were validated experimentally. To further evaluate SWING's generalizability, we tested its ability to predict the disruption of specific protein-protein interactions by missense mutations. Although modern methods like AlphaMissense and ESM1b can predict interfaces and variant effects/pathogenicity per mutation, they are unable to predict interaction-specific disruptions. SWING was successful at accurately predicting the impact of both Mendelian mutations and population variants on PPIs. This is the first generalizable approach that can accurately predict interaction-specific disruptions by missense mutations with only sequence information. Overall, SWING is a first-in-class generalizable zero-shot iLM that learns the language of PPIs.
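
The idea of building an interaction vocabulary from a sliding window can be illustrated with a minimal sketch. This is not the SWING implementation: the abstract does not say which amino acid properties, binning, or word length are used, so the Kyte-Doolittle hydropathy scale, the rounding to single digits, and the names `interaction_string`, `interaction_words`, and `k` are all assumptions made here for illustration only.

```python
# Hypothetical sketch of a sliding-window interaction vocabulary.
# Assumption: one property (Kyte-Doolittle hydropathy) stands in for the
# unspecified "amino acid properties" mentioned in the abstract.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def interaction_string(peptide: str, protein: str, offset: int) -> str:
    """Encode one alignment of the peptide against a protein window.

    Each aligned residue pair becomes one digit: the absolute hydropathy
    difference, rounded and capped at 9 (an assumed binning scheme).
    """
    window = protein[offset:offset + len(peptide)]
    digits = []
    for a, b in zip(peptide, window):
        diff = abs(KD[a] - KD[b])
        digits.append(str(min(int(round(diff)), 9)))
    return "".join(digits)


def interaction_words(peptide: str, protein: str, k: int = 3) -> list[str]:
    """Slide the peptide along the protein and split each encoding into k-mers.

    The resulting k-mers play the role of "words" that a language model
    could embed; the choice of k is an assumption.
    """
    words = []
    for offset in range(len(protein) - len(peptide) + 1):
        encoded = interaction_string(peptide, protein, offset)
        words.extend(encoded[i:i + k] for i in range(len(encoded) - k + 1))
    return words


if __name__ == "__main__":
    # Toy sequences, not real peptide or MHC data.
    print(interaction_words("AIK", "AIRLKV", k=2))
```

In a pipeline like the one the abstract describes, such words would feed a language model whose learned representations become features for a supervised predictor; those later steps are not shown here.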

2.
PLoS Comput Biol ; 20(4): e1012032, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38683863

ABSTRACT

Public health decisions must be made about when and how to implement interventions to control an infectious disease epidemic. These decisions should be informed by data on the epidemic as well as current understanding about the transmission dynamics. Such decisions can be posed as statistical questions about scientifically motivated dynamic models. Thus, we encounter the methodological task of building credible, data-informed decisions based on stochastic, partially observed, nonlinear dynamic models. This necessitates addressing the tradeoff between biological fidelity and model simplicity, and the reality of misspecification for models at all levels of complexity. We assess current methodological approaches to these issues via a case study of the 2010-2019 cholera epidemic in Haiti. We consider three dynamic models developed by expert teams to advise on vaccination policies. We evaluate previous methods used for fitting these models, and we demonstrate modified data analysis strategies leading to improved statistical fit. Specifically, we present approaches for diagnosing model misspecification and the consequent development of improved models. Additionally, we demonstrate the utility of recent advances in likelihood maximization for high-dimensional nonlinear dynamic models, enabling likelihood-based inference for spatiotemporal incidence data using this class of models. Our workflow is reproducible and extendable, facilitating future investigations of this disease system.


Subject(s)
Cholera, Haiti/epidemiology, Cholera/epidemiology, Cholera/transmission, Cholera/prevention & control, Humans, Computational Biology/methods, Epidemics/statistics & numerical data, Epidemics/prevention & control, Epidemiological Models, Health Policy, Likelihood Functions, Stochastic Processes, Models, Statistical
3.
Nat Methods ; 21(5): 835-845, 2024 May.
Article in English | MEDLINE | ID: mdl-38374265

ABSTRACT

Modern multiomic technologies can generate deep multiscale profiles. However, differences in data modalities, multicollinearity of the data, and large numbers of irrelevant features make analyses and integration of high-dimensional omic datasets challenging. Here we present Significant Latent Factor Interaction Discovery and Exploration (SLIDE), a first-in-class interpretable machine learning technique for identifying significant interacting latent factors underlying outcomes of interest from high-dimensional omic datasets. SLIDE makes no assumptions regarding data-generating mechanisms, comes with theoretical guarantees regarding identifiability of the latent factors/corresponding inference, and has rigorous false discovery rate control. Using SLIDE on single-cell and spatial omic datasets, we uncovered significant interacting latent factors underlying a range of molecular, cellular and organismal phenotypes. SLIDE outperforms/performs at least as well as a wide range of state-of-the-art approaches, including other latent factor approaches. More importantly, it provides biological inference beyond prediction that other methods do not afford. Thus, SLIDE is a versatile engine for biological discovery from modern multiomic datasets.


Subject(s)
Machine Learning, Humans, Computational Biology/methods, Animals, Single-Cell Analysis/methods, Algorithms
4.
iScience ; 24(1): 102009, 2021 Jan 22.
Article in English | MEDLINE | ID: mdl-33490917

ABSTRACT

Circadian rhythms regulate adaptive alterations in mammalian physiology and are maximally entrained by the short wavelength blue spectrum; cataracts block the transmission of light, particularly blue light. Cataract surgery is performed with two types of intraocular lenses (IOL): (1) conventional IOL that transmit the entire visible spectrum and (2) blue-light-filtering (BF) IOL that block the short wavelength blue spectrum. We hypothesized that the transmission properties of IOL are associated with long-term survival. This retrospective cohort study of a 15-hospital healthcare system identified 9,108 participants who underwent bilateral cataract surgery; 3,087 were implanted with conventional IOL and 6,021 received BF-IOL. Multivariable Cox proportional hazards models that included several a priori determined subgroup and sensitivity analyses yielded estimates supporting that conventional IOL compared with BF-IOL may be associated with significantly reduced risk of long-term death. Confirming these differences and identifying any potential causal mechanisms await the conduct of appropriately controlled prospective translational trials.
