Search | VHL Regional Portal

1.

TSpred: a robust prediction framework for TCR-epitope interactions using paired chain TCR sequence data.

Kim, Ha Young; Kim, Sungsik; Park, Woong-Yang; Kim, Dongsup.

Bioinformatics ; 2024 Jul 25.

Article in English | MEDLINE | ID: mdl-39052940

ABSTRACT

MOTIVATION: Prediction of T-cell receptor (TCR)-epitope interactions is important for many applications in biomedical research, such as cancer immunotherapy and vaccine design. The prediction of TCR-epitope interactions remains challenging, especially for novel epitopes, due to the scarcity of available data. RESULTS: We propose TSpred, a new deep learning approach for the pan-specific prediction of TCR binding specificity based on paired chain TCR data. We develop a robust model that generalizes well to unseen epitopes by combining the predictive power of a CNN and an attention mechanism. In particular, we design a reciprocal attention mechanism that focuses on extracting the patterns underlying TCR-epitope interactions. Upon a comprehensive evaluation of our model, we find that TSpred achieves state-of-the-art performance in both seen and unseen epitope specificity prediction tasks. Also, compared to other predictors, TSpred is more robust to bias related to peptide imbalance in the dataset. In addition, the reciprocal attention component of our model allows for model interpretability by capturing structurally important binding regions. Results indicate that TSpred is a robust and reliable method for the task of TCR-epitope binding prediction. AVAILABILITY: Source code is available at https://github.com/ha01994/TSpred. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

2.

AbFlex: designing antibody complementarity determining regions with flexible CDR definition.

Jeon, Woosung; Kim, Dongsup.

Bioinformatics ; 40(3)2024 Mar 04.

Article in English | MEDLINE | ID: mdl-38449295

ABSTRACT

MOTIVATION: Antibodies are proteins that the immune system produces in response to foreign pathogens. Designing antibodies that specifically bind to antigens is a key step in developing antibody therapeutics. The complementarity determining regions (CDRs) of the antibody are mainly responsible for binding to the target antigen, and therefore must be designed to recognize the antigen. RESULTS: We develop an antibody design model, AbFlex, that exhibits state-of-the-art performance in terms of structure prediction accuracy and amino acid recovery rate. Furthermore, >38% of newly designed antibody models are estimated to have better binding energies for their antigens than wild types. The effectiveness of the model is attributed to two different strategies that are developed to overcome the difficulty associated with the scarcity of antibody-antigen complex structure data. One strategy is to use an equivariant graph neural network model that is more data-efficient. More importantly, a new data augmentation strategy based on the flexible definition of CDRs significantly increases the performance of the CDR prediction model. AVAILABILITY AND IMPLEMENTATION: The source code and implementation are available at https://github.com/wsjeon92/AbFlex.

Subject(s)

Antigen-Antibody Complex , Complementarity Determining Regions , Complementarity Determining Regions/chemistry , Complementarity Determining Regions/metabolism , Amino Acid Sequence , Models, Molecular , Antigen-Antibody Complex/chemistry , Antigens

3.

Applying network link prediction in drug discovery: an overview of the literature.

Son, Jeongtae; Kim, Dongsup.

Expert Opin Drug Discov ; 19(1): 43-56, 2024.

Article in English | MEDLINE | ID: mdl-37794688

ABSTRACT

INTRODUCTION: Network representation can give a holistic view of relationships among biomedical entities through network topology. Link prediction estimates the probability of link formation between a pair of unconnected nodes. In the drug discovery process, link prediction not only enables the detection of connectivity patterns but also predicts the effects of one biomedical entity on multiple entities simultaneously and vice versa, which is useful for many applications. AREAS COVERED: The authors provide a comprehensive overview of network link prediction in drug discovery. Link prediction methodologies such as similarity-based approaches, embedding-based approaches, probabilistic model-based approaches, and preprocessing methods are summarized with examples. In addition to describing their properties and limitations, the authors discuss the applications of link prediction in drug discovery based on the relationship between biomedical concepts. EXPERT OPINION: Link prediction is a powerful method to infer the existence of novel relationships in drug discovery. However, link prediction has been hampered by the sparsity of data and the lack of negative links in biomedical networks. With preprocessing to balance positive and negative samples and the collection of more data, the authors believe it is possible to develop more reliable link prediction methods that can become invaluable tools for successful drug discovery.

Subject(s)

Drug Discovery , Models, Statistical , Humans , Drug Discovery/methods
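To make the "similarity-based approaches" in this overview concrete, here is a minimal sketch that scores every unconnected node pair in a small undirected network with three classic neighborhood indices (common neighbors, Jaccard, Adamic-Adar). The toy graph and node names are hypothetical, and these three indices are only a representative sample of the similarity-based family, not the review's full method set.

```python
import math
from itertools import combinations

def neighbors(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def score_pairs(adj):
    """Score every unconnected node pair with three neighborhood similarity indices."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked; link prediction ranks only missing edges
        common = adj[u] & adj[v]
        union = adj[u] | adj[v]
        scores[(u, v)] = {
            "common_neighbors": len(common),
            "jaccard": len(common) / len(union) if union else 0.0,
            # Adamic-Adar down-weights shared neighbors that are hubs; a shared
            # neighbor always has degree >= 2, so log() is strictly positive.
            "adamic_adar": sum(1.0 / math.log(len(adj[w])) for w in common),
        }
    return scores

# Hypothetical toy interaction network (not data from the review)
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
scores = score_pairs(neighbors(EDGES))
best = max(scores, key=lambda p: scores[p]["adamic_adar"])
print(best, scores[best])  # the pair sharing two neighbors, ('A', 'D'), ranks first
```

Note that on a strictly bipartite drug-target graph, drug-target pairs never share a neighbor, so in practice such indices are usually applied to projected or path-based variants; that caveat is a general property of these indices rather than a claim from the review.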

4.

G-RANK: an equivariant graph neural network for the scoring of protein-protein docking models.

Kim, Ha Young; Kim, Sungsik; Park, Woong-Yang; Kim, Dongsup.

Bioinform Adv ; 3(1): vbad011, 2023.

Article in English | MEDLINE | ID: mdl-36818727

ABSTRACT

Motivation: Protein complex structure prediction is important for many applications in bioengineering. A widely used method for predicting the structure of protein complexes is computational docking. Although many tools for scoring protein-protein docking models have been developed, it is still a challenge to accurately identify near-native models for unknown protein complexes. A recently proposed model called the geometric vector perceptron-graph neural network (GVP-GNN), a subtype of equivariant graph neural networks, has demonstrated success in various 3D molecular structure modeling tasks. Results: Herein, we present G-RANK, a GVP-GNN-based method for the scoring of protein-protein docking models. When evaluated on two different test datasets, G-RANK achieved a performance competitive with or better than the state-of-the-art scoring functions. We expect G-RANK to be a useful tool for various applications in biological engineering. Availability and implementation: Source code is available at https://github.com/ha01994/grank. Contact: kds@kaist.ac.kr. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

5.

Prediction of drug-target interactions through multi-task learning.

Moon, Chaeyoung; Kim, Dongsup.

Sci Rep ; 12(1): 18323, 2022 10 31.

Article in English | MEDLINE | ID: mdl-36316405

ABSTRACT

Identifying the binding between target proteins and molecules is essential in drug discovery. Multi-task learning has been introduced to facilitate knowledge sharing among tasks when the amount of information for each task is small. However, multi-task learning sometimes worsens the overall performance or creates a trade-off between individual tasks' performance. In this study, we propose a general multi-task learning scheme that not only increases the average performance but also minimizes individual performance degradation, through group selection and knowledge distillation. The groups are selected on the basis of chemical similarity between the ligand sets of targets, and similar targets in the same group are trained together. During training, we apply knowledge distillation with teacher annealing: the multi-task learning models are guided by the predictions of the single-task learning models. This method results in higher average performance than single-task learning and classic multi-task learning. Further analysis reveals that multi-task learning is particularly effective for low-performance tasks, and knowledge distillation helps the model avoid degradation of individual task performance in multi-task learning.

Subject(s)

Learning , Machine Learning , Drug Discovery , Ligands , Proteins
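Knowledge distillation with teacher annealing, as mentioned in this abstract, can be sketched as a training target that blends the single-task teacher's prediction with the ground-truth label. The sketch below assumes a linear schedule in which the weight on the gold label grows from 0 to 1 over training (the scheme popularized by the BAM! multi-task distillation work); the paper's actual schedule, loss, and hyperparameters may differ, and the function names are hypothetical.

```python
import math

def annealed_target(gold, teacher_prob, step, total_steps):
    """Blend the gold label with the single-task teacher's predicted probability.

    Assumed linear schedule: lam rises from 0 to 1, so early training follows
    the teacher and late training follows the ground-truth label.
    """
    lam = min(step / total_steps, 1.0)
    return lam * gold + (1.0 - lam) * teacher_prob

def bce(student_prob, target, eps=1e-12):
    """Binary cross-entropy between the multi-task student's probability and a soft target."""
    p = min(max(student_prob, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

# Hypothetical example: an active compound (gold = 1) that the teacher scored 0.7
for step in (0, 500, 1000):
    target = annealed_target(1.0, 0.7, step, 1000)
    print(step, round(target, 3), round(bce(0.8, target), 4))
```

Because the soft target starts at the teacher's output, the multi-task student is initially pulled toward what the dedicated single-task model learned, which is the mechanism the abstract credits with limiting per-task degradation.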

6.

DeepLUCIA: predicting tissue-specific chromatin loops using Deep Learning-based Universal Chromatin Interaction Annotator.

Yang, Dongchan; Chung, Taesu; Kim, Dongsup.

Bioinformatics ; 38(14): 3501-3512, 2022 07 11.

Article in English | MEDLINE | ID: mdl-35640981

ABSTRACT

MOTIVATION: The importance of chromatin loops in gene regulation is broadly accepted. There are mainly two approaches to predicting chromatin loops: the transcription factor (TF) binding-dependent approach and the genomic variation-based approach. However, neither of these approaches provides an adequate understanding of gene regulation in human tissues. To address this issue, we developed a deep learning-based chromatin loop prediction model called Deep Learning-based Universal Chromatin Interaction Annotator (DeepLUCIA). RESULTS: Although DeepLUCIA does not use the TF binding profile data on which previous TF binding-dependent methods critically rely, its prediction accuracies are comparable to those of the previous TF binding-dependent methods. More importantly, DeepLUCIA enables tissue-specific chromatin loop prediction from tissue-specific epigenomes, which cannot be handled by genomic variation-based approaches. We demonstrated the utility of DeepLUCIA by predicting several novel target genes of SNPs identified in genome-wide association studies targeting Brugada syndrome, COVID-19 severity and age-related macular degeneration. AVAILABILITY AND IMPLEMENTATION: DeepLUCIA is freely available at https://github.com/bcbl-kaist/DeepLUCIA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

COVID-19 , Deep Learning , Humans , Chromatin , Genome-Wide Association Study , Genomics/methods

7.

Brain physiome: A concept bridging in vitro 3D brain models and in silico models for predicting drug toxicity in the brain.

Seo, Yoojin; Bang, Seokyoung; Son, Jeongtae; Kim, Dongsup; Jeong, Yong; Kim, Pilnam; Yang, Jihun; Eom, Joon-Ho; Choi, Nakwon; Kim, Hong Nam.

Bioact Mater ; 13: 135-148, 2022 Jul.

Article in English | MEDLINE | ID: mdl-35224297

ABSTRACT

In the last few decades, adverse reactions to pharmaceuticals have been evaluated using 2D in vitro models and animal models. However, with increasing computational power, and as the key drivers of cellular behavior have been identified, in silico models have emerged. These models are time-efficient and cost-effective, but the prediction of adverse reactions to unknown drugs using these models requires relevant experimental input. Accordingly, the physiome concept has emerged to bridge experimental datasets with in silico models. The brain physiome describes the systemic interactions of its components, which are organized into a multilevel hierarchy. Because of the limitations in obtaining experimental data corresponding to each physiome component from 2D in vitro models and animal models, 3D in vitro brain models, including brain organoids and brain-on-a-chip, have been developed. In this review, we present the concept of the brain physiome and its hierarchical organization, including cell- and tissue-level organizations. We also summarize recently developed 3D in vitro brain models and link them with the elements of the brain physiome as a guideline for dataset collection. The connection between in vitro 3D brain models and in silico modeling will lead to the establishment of cost-effective and time-efficient in silico models for the prediction of the safety of unknown drugs.

8.

Rising of LOXHD1 as a signature causative gene of down-sloping hearing loss in people in their teens and 20s.

Kim, Bong Jik; Jeon, Hyoung Won; Jeon, Woosung; Han, Jin Hee; Oh, Jayoung; Yi, Nayoung; Kim, Min Young; Kim, Minah; Kim, Justin Namju; Kim, Bo Hye; Hyon, Joon Young; Kim, Dongsup; Koo, Ja-Won; Oh, Doo-Yi; Choi, Byung Yoon.

J Med Genet ; 59(5): 470-480, 2022 05.

Article in English | MEDLINE | ID: mdl-33753533

ABSTRACT

BACKGROUND: Down-sloping sensorineural hearing loss (SNHL) in people in their teens and 20s hampers efficient learning and communication and in-depth social interactions. Nonetheless, its aetiology remains largely unclear, with the exception of some potential causative genes, none of which stands out especially in people in their teens and 20s. Here, we examined the role and genotype-phenotype correlation of lipoxygenase homology domain 1 (LOXHD1) in down-sloping SNHL through a cohort study. METHODS: Based on the Seoul National University Bundang Hospital (SNUBH) genetic deafness cohort, in which the patients show varying degrees of deafness and different onset ages (n=1055), we have established the 'SNUBH Teenager-Young Adult Down-sloping SNHL' cohort (10-35 years old) (n=47), all of whom underwent exome sequencing. Three-dimensional molecular modelling, minigene splicing assay and short tandem repeat marker genotyping were performed, and medical records were reviewed. RESULTS: LOXHD1 accounted for 33.3% of all genetically diagnosed cases of down-sloping SNHL (n=18) and 12.8% of cases in the whole down-sloping SNHL cohort (n=47) of young adults. We identified a potential common founder allele, as well as an interesting genotype-phenotype correlation. We also showed that transcript 6 is necessary and probably sufficient for normal hearing. CONCLUSIONS: LOXHD1 exceeds other genes in its contribution to down-sloping SNHL in young adults, rising as a signature causative gene, and shows a potential but interesting genotype-phenotype correlation.

Subject(s)

Deafness , Hearing Loss, Sensorineural , Hearing Loss , Adolescent , Adult , Carrier Proteins/genetics , Cohort Studies , Hearing Loss, Sensorineural/diagnosis , Hearing Loss, Sensorineural/epidemiology , Hearing Loss, Sensorineural/genetics , Humans , Lipoxygenase , Young Adult

9.

An enhanced variant effect predictor based on a deep generative model and the Born-Again Networks.

Kim, Ha Young; Jeon, Woosung; Kim, Dongsup.

Sci Rep ; 11(1): 19127, 2021 09 27.

Article in English | MEDLINE | ID: mdl-34580383

ABSTRACT

The development of an accurate and reliable variant effect prediction tool is important for research in human genetic diseases. A large number of predictors have been developed towards this goal, yet many of these predictors suffer from the problem of data circularity. Here we present MTBAN (Mutation effect predictor using the Temporal convolutional network and the Born-Again Networks), a method for predicting the deleteriousness of variants. We apply a form of knowledge distillation technique known as the Born-Again Networks (BAN) to a previously developed deep autoregressive generative model, mutationTCN, to achieve an improved performance in variant effect prediction. As the model is fully unsupervised and trained only on the evolutionarily related sequences of a protein, it does not suffer from the problem of data circularity which is common across supervised predictors. When evaluated on a test dataset consisting of deleterious and benign human protein variants, MTBAN shows an outstanding predictive ability compared to other well-known variant effect predictors. We also offer a user-friendly web server to predict variant effects using MTBAN, freely accessible at http://mtban.kaist.ac.kr . To our knowledge, MTBAN is the first variant effect prediction tool based on a deep generative model that provides a user-friendly web server for the prediction of deleteriousness of variants.

10.

Development of a graph convolutional neural network model for efficient prediction of protein-ligand binding affinities.

Son, Jeongtae; Kim, Dongsup.

PLoS One ; 16(4): e0249404, 2021.

Article in English | MEDLINE | ID: mdl-33831016

ABSTRACT

Prediction of protein-ligand interactions is a critical step during the initial phase of drug discovery. We propose a novel deep-learning-based model for predicting protein-ligand binding affinity, named GraphBAR, based on a graph convolutional neural network. Graph convolutional neural networks reduce the computational time and resources that are normally required by traditional convolutional neural network models. In this technique, the structure of a protein-ligand complex is represented as a graph of multiple adjacency matrices, whose entries are affected by distances, and a feature matrix that describes the molecular properties of the atoms. We evaluated the predictive power of GraphBAR for protein-ligand binding affinities using the PDBbind datasets and demonstrated the efficiency of the graph convolution. Given the computational efficiency of graph convolutional neural networks, we also performed data augmentation to improve model performance. We found that data augmentation with docking simulation data could improve the prediction accuracy, although the improvement does not appear to be significant. The high prediction performance and speed of GraphBAR suggest that such networks can serve as valuable tools in drug discovery.

Subject(s)

Computer Graphics , Neural Networks, Computer , Proteins/metabolism , Ligands , Protein Binding
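The GraphBAR abstract describes a complex as several distance-dependent adjacency matrices plus an atom feature matrix. A minimal sketch of that representation, assuming the matrices are formed by binning pairwise atom distances into shells, is shown below; the shell boundaries, the 0/1 entries, the plain neighbor-sum aggregation, and the function names are hypothetical choices, and the sketch does not distinguish protein atoms from ligand atoms as a real featurizer would.

```python
import math

def distance_adjacencies(coords, cutoffs):
    """Build one 0/1 adjacency matrix per distance shell.

    coords  : list of (x, y, z) atom positions
    cutoffs : increasing shell boundaries in angstroms (hypothetical values)
    Matrix k marks atom pairs with cutoffs[k-1] <= d < cutoffs[k];
    the first shell starts at 0 and pairs beyond the last cutoff are ignored.
    """
    n = len(coords)
    mats = [[[0] * n for _ in range(n)] for _ in cutoffs]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(coords[i], coords[j])
            lower = 0.0
            for k, upper in enumerate(cutoffs):
                if lower <= d < upper:
                    mats[k][i][j] = 1
                    break
                lower = upper
    return mats

def aggregate(adj, features):
    """One message-passing step: each atom sums its neighbors' feature vectors."""
    n, f = len(features), len(features[0])
    return [[sum(adj[i][j] * features[j][c] for j in range(n)) for c in range(f)]
            for i in range(n)]

# Hypothetical three-atom example with two-dimensional atom features
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.5, 0.0, 0.0)]
features = [[1, 0], [0, 1], [1, 1]]
mats = distance_adjacencies(coords, [2.0, 4.0])
per_shell = [aggregate(a, features) for a in mats]
print(per_shell)
```

Keeping one matrix per shell lets a graph convolution learn separate weights for close-contact and longer-range interactions, which is one plausible reading of "multiple adjacency matrices whose entries are affected by distances".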

11.

Autonomous molecule generation using reinforcement learning and docking to develop potential novel inhibitors.

Jeon, Woosung; Kim, Dongsup.

Sci Rep ; 10(1): 22104, 2020 12 16.

Article in English | MEDLINE | ID: mdl-33328504

ABSTRACT

We developed a computational method named Molecule Optimization by Reinforcement Learning and Docking (MORLD) that automatically generates and optimizes lead compounds by combining reinforcement learning and docking to develop predicted novel inhibitors. This model requires only a target protein structure and directly modifies ligand structures to obtain higher predicted binding affinity for the target protein without any other training data. Using MORLD, we were able to generate potential novel inhibitors against discoidin domain receptor 1 kinase (DDR1) in less than 2 days on a moderate computer. We also demonstrated MORLD's ability to generate predicted novel agonists for the D4 dopamine receptor (D4DR) from scratch without virtual screening on an ultra large compound library. The free web server is available at http://morld.kaist.ac.kr .

12.

CRDS: Consensus Reverse Docking System for target fishing.

Lee, Aeri; Kim, Dongsup.

Bioinformatics ; 36(3): 959-960, 2020 02 01.

Article in English | MEDLINE | ID: mdl-31432077

ABSTRACT

MOTIVATION: Identification of putative drug targets is a critical step for explaining the mechanism of drug action against multiple targets, finding new therapeutic indications for existing drugs and unveiling the adverse drug reactions. One important approach is to use the molecular docking. However, its widespread utilization has been hindered by the lack of easy-to-use public servers. Therefore, it is vital to develop a streamlined computational tool for target prediction by molecular docking on a large scale. RESULTS: We present a fully automated web tool named Consensus Reverse Docking System (CRDS), which predicts potential interaction sites for a given drug. To improve hit rates, we developed a strategy of consensus scoring. CRDS carries out reverse docking against 5254 candidate protein structures using three different scoring functions (GoldScore, Vina and LeDock from GOLD version 5.7.1, AutoDock Vina version 1.1.2 and LeDock version 1.0, respectively), and those scores are combined into a single score named Consensus Docking Score (CDS). The web server provides the list of top 50 predicted interaction sites, docking conformations, 10 most significant pathways and the distribution of consensus scores. AVAILABILITY AND IMPLEMENTATION: The web server is available at http://pbil.kaist.ac.kr/CRDS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Computers , Proteins , Consensus , Ligands , Molecular Conformation , Molecular Docking Simulation
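The abstract does not give the formula for the Consensus Docking Score, so the sketch below illustrates one common consensus strategy instead: rank the candidate targets under each scoring function separately, then average the ranks. It assumes that higher GoldScore values are better while Vina and LeDock energies are better when lower (more negative); the target names and score values are hypothetical, and this rank averaging should not be read as CRDS's actual CDS definition.

```python
def ranks(scores, higher_is_better):
    """Return rank 1..n for each target under one scoring function (1 = best).

    Ties keep their input order because sorted() is stable.
    """
    order = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {target: r for r, target in enumerate(order, start=1)}

def consensus(score_tables):
    """Average per-function ranks; a lower mean rank means stronger consensus.

    score_tables: list of (scores_by_target, higher_is_better) pairs.
    """
    rank_tables = [ranks(s, hib) for s, hib in score_tables]
    targets = rank_tables[0].keys()
    return {t: sum(rt[t] for rt in rank_tables) / len(rank_tables) for t in targets}

# Hypothetical scores for one query drug docked against three candidate targets
gold = {"P1": 62.0, "P2": 55.0, "P3": 48.0}    # GoldScore fitness: higher is better
vina = {"P1": -8.1, "P2": -9.0, "P3": -6.5}    # Vina energy (kcal/mol): lower is better
ledock = {"P1": -7.2, "P2": -6.8, "P3": -5.9}  # LeDock energy (kcal/mol): lower is better

cds = consensus([(gold, True), (vina, False), (ledock, False)])
shortlist = sorted(cds, key=cds.get)  # analogous to reporting the top-N predicted targets
print(shortlist, cds)
```

Working with ranks rather than raw scores sidesteps the fact that the three functions use different units and scales, which is the main obstacle any consensus scheme has to handle.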

13.

Prediction of mutation effects using a deep temporal convolutional network.

Kim, Ha Young; Kim, Dongsup.

Bioinformatics ; 36(7): 2047-2052, 2020 04 01.

Article in English | MEDLINE | ID: mdl-31746978

ABSTRACT

MOTIVATION: Accurate prediction of the effects of genetic variation is a major goal in biological research. Towards this goal, numerous machine learning models have been developed to learn information from evolutionary sequence data. The most effective method so far is a deep generative model based on the variational autoencoder (VAE) that models the distributions using a latent variable. In this study, we propose a deep autoregressive generative model named mutationTCN, which employs dilated causal convolutions and an attention mechanism to model inter-residue correlations in a biological sequence. RESULTS: We show that this model is competitive with the VAE model when tested against a set of 42 high-throughput mutation scan experiments, with a mean improvement in Spearman rank correlation of ~0.023. In particular, compared with the latent variable model, our model can more efficiently capture information from multiple sequence alignments with a lower effective number of sequences, such as viral sequence families. Also, we extend this architecture to a semi-supervised learning framework, which shows high prediction accuracy. We show that our model enables direct optimization of the data likelihood and allows for a simple and stable training process. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/ha01994/mutationTCN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Machine Learning , Neural Networks, Computer , Mutation , Sequence Alignment , Software
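The dilated causal convolutions named in the mutationTCN abstract can be illustrated with a single-channel sketch: each output position only combines the current input and inputs spaced `dilation` steps into the past, so that, after shifting by one position, a residue's predicted distribution depends only on the residues before it. The kernel weights and the dilation schedule (1, 2, 4, 8) below are illustrative assumptions rather than the published configuration, and the attention component is not shown.

```python
def dilated_causal_conv(x, weights, dilation):
    """1-D causal convolution: output[t] sees only x[t], x[t-d], x[t-2d], ...

    weights[0] multiplies the current position, weights[1] the position d
    steps back, and so on. Positions before the sequence start count as
    zeros (implicit left padding), so no output depends on a future input.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out

def receptive_field(kernel_size, dilations):
    """Number of input positions visible to one output after stacking the layers."""
    return 1 + (kernel_size - 1) * sum(dilations)

signal = [1, 2, 3, 4, 5]
print(dilated_causal_conv(signal, [1.0, 1.0], dilation=2))
# Doubling dilations grow the receptive field exponentially with depth:
print(receptive_field(kernel_size=2, dilations=[1, 2, 4, 8]))
```

The exponential growth of the receptive field with stacked, doubling dilations is what lets a fairly shallow convolutional model capture long-range inter-residue correlations without recurrence.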

14.

In-Silico Molecular Binding Prediction for Human Drug Targets Using Deep Neural Multi-Task Learning.

Lee, Kyoungyeul; Kim, Dongsup.

Genes (Basel) ; 10(11)2019 11 07.

Article in English | MEDLINE | ID: mdl-31703452

ABSTRACT

In in-silico prediction of molecular binding for human genomes, promising results have been demonstrated by deep neural multi-task learning due to its strength in training tasks with imbalanced data and its ability to avoid over-fitting. Although the interrelation between tasks is known to be important for successful multi-task learning, its adverse effect has been underestimated. In this study, we used molecular interaction data of human targets from ChEMBL to train and test various multi-task and single-task networks and examined the effectiveness of multi-task learning for different compositions of targets. Targets were clustered based on sequence similarity in their binding domains, and various target sets were chosen from the clusters. By comparing the performance of deep neural architectures for each target set, we found that similarity within a target set is highly important for reliable multi-task learning. For a diverse target set or for all human targets, the performance of multi-task learning was lower than that of single-task learning, but multi-task learning outperformed single-task learning for the target set containing similar targets. From this insight, we developed Multiple Partial Multi-Task learning, which is suitable for binding prediction for human drug targets.

Subject(s)

Deep Learning , Drug Discovery/methods , Small Molecule Libraries/pharmacology , Databases, Chemical , Humans , Molecular Docking Simulation/methods , Protein Binding , Small Molecule Libraries/chemistry

15.

Optimization of a microarray for fission yeast.

Kim, Dong-Uk; Lee, Minho; Han, Sangjo; Nam, Miyoung; Lee, Sol; Lee, Jaewoong; Woo, Jihye; Kim, Dongsup; Hoe, Kwang-Lae.

Genomics Inform ; 17(3): e28, 2019 Sep.

Article in English | MEDLINE | ID: mdl-31610624

ABSTRACT

Bar-code (tag) microarrays of yeast gene-deletion collections facilitate the systematic identification of genes required for growth in any condition of interest. Anti-sense strands of amplified bar-codes hybridize with ~10,000 (5,000 each for up- and down-tags) different kinds of sense-strand probes on an array. In this study, we optimized the hybridization processes of an array for fission yeast. Compared to the first version of the array (11 µm, 100K) consisting of three sectors with probe pairs (perfect match and mismatch), the second version (11 µm, 48K) could represent ~10,000 up-/down-tags in quadruplicate along with 1,508 negative controls in quadruplicate and a single set of 1,000 unique negative controls at randomly dispersed positions without mismatch pairs. For PCR, the optimal annealing temperature (maximizing yield and minimizing extra bands) was 58°C for both tags. Intriguingly, up-tags required 3× higher amounts of blocking oligonucleotides than down-tags. A 1:1 mix ratio between up- and down-tags was satisfactory. A lower temperature (25°C) was optimal for cultivation instead of a normal temperature (30°C) because of extra temperature-sensitive mutants in a subset of the deletion library. Activation of frozen pooled cells for >1 day showed better resolution of intensity than no activation. A tag intensity analysis showed that tag(s) of 4,316 of the 4,526 strains tested were represented at least once; 3,706 strains were represented by both tags, 4,072 strains by up-tags only, and 3,950 strains by down-tags only. The results indicate that this microarray will be a powerful analytical platform for elucidating currently unknown gene functions.

16.

A compendium of promoter-centered long-range chromatin interactions in the human genome.

Jung, Inkyung; Schmitt, Anthony; Diao, Yarui; Lee, Andrew J; Liu, Tristin; Yang, Dongchan; Tan, Catherine; Eom, Junghyun; Chan, Marilynn; Chee, Sora; Chiang, Zachary; Kim, Changyoun; Masliah, Eliezer; Barr, Cathy L; Li, Bin; Kuan, Samantha; Kim, Dongsup; Ren, Bing.

Nat Genet ; 51(10): 1442-1449, 2019 10.

Article in English | MEDLINE | ID: mdl-31501517

ABSTRACT

A large number of putative cis-regulatory sequences have been annotated in the human genome, but the genes they control remain poorly defined. To bridge this gap, we generate maps of long-range chromatin interactions centered on 18,943 well-annotated promoters for protein-coding genes in 27 human cell/tissue types. We use this information to infer the target genes of 70,329 candidate regulatory elements and suggest potential regulatory function for 27,325 noncoding sequence variants associated with 2,117 physiological traits and diseases. Integrative analysis of these promoter-centered interactome maps reveals widespread enhancer-like promoters involved in gene regulation and common molecular pathways underlying distinct groups of human traits and diseases.

Subject(s)

Chromatin/metabolism , Gene Expression Regulation , Genome, Human , Promoter Regions, Genetic , Regulatory Sequences, Nucleic Acid , Transcription Factors/metabolism , Chromatin/genetics , Genomics , Humans , Transcription Factors/genetics

17.

Global Analysis of Intercellular Homeodomain Protein Transfer.

Lee, Eun Jung; Kim, Namsuk; Park, Jun Woo; Kang, Kyung Hwa; Kim, Woo-Il; Sim, Nam Suk; Jeong, Chan-Seok; Blackshaw, Seth; Vidal, Marc; Huh, Sung-Oh; Kim, Dongsup; Lee, Jeong Ho; Kim, Jin Woo.

Cell Rep ; 28(3): 712-722.e3, 2019 07 16.

Article in English | MEDLINE | ID: mdl-31315049

ABSTRACT

The homeodomain is found in hundreds of transcription factors that play roles in fate determination via cell-autonomous regulation of gene expression. However, some homeodomain-containing proteins (HPs) are thought to be secreted and penetrate neighboring cells to affect the recipient cell fate. To determine whether this is a general characteristic of HPs, we carried out a large-scale validation of intercellular transfer of HPs. Our screening reveals that intercellular transfer is a general feature of HPs, but it occurs in a cell-context-sensitive manner. We also found that secretion is not solely a function of the homeodomain but is supported by external motifs containing hydrophobic residues. Thus, mutations of hydrophobic residues of HPs abrogate secretion and consequently interfere with HP function in recipient cells. Collectively, our study proposes that HP transfer is an intercellular communication method that couples the functions of interacting cells.

Subject(s)

Cell Communication/genetics , Homeodomain Proteins/metabolism , Protein Transport/genetics , Amino Acid Motifs/genetics , Animals , Brain/embryology , Brain/metabolism , Cell Line , Female , High-Throughput Screening Assays , Homeodomain Proteins/genetics , Humans , Hydrophobic and Hydrophilic Interactions , Mice , Mice, Inbred C57BL , Mice, Knockout , Mutation , Pregnancy , Retina/metabolism

18.

FP2VEC: a new molecular featurizer for learning molecular properties.

Jeon, Woosung; Kim, Dongsup.

Bioinformatics ; 35(23): 4979-4985, 2019 12 01.

Article in English | MEDLINE | ID: mdl-31070725

ABSTRACT

MOTIVATION: Quantitative structure-activity relationship (QSAR) modeling is one of the most successful approaches for predicting the properties of chemical compounds. The prediction accuracy of QSAR models has recently been greatly improved by employing deep learning technology. In particular, newly developed molecular featurizers based on graph convolution operations on molecular graphs significantly outperform the conventional extended connectivity fingerprint (ECFP) feature in both classification and regression tasks, indicating that it is critical to develop more effective featurizers to fully realize the power of deep learning techniques. Motivated by the clear analogy between chemical compounds and natural languages, this work develops a new molecular featurizer, FP2VEC, which represents a chemical compound as a set of trainable embedding vectors. RESULTS: To implement and test our new featurizer, we build a QSAR model using a simple convolutional neural network (CNN) architecture that has been successfully used for natural language processing tasks such as sentence classification. By testing our new method on several benchmark datasets, we demonstrate that the combination of FP2VEC and the CNN model can achieve competitive results in many QSAR tasks, especially classification tasks. We also demonstrate that the FP2VEC model is especially effective for multitask learning. AVAILABILITY AND IMPLEMENTATION: FP2VEC is available from https://github.com/wsjeon92/FP2VEC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Deep Learning , Natural Language Processing , Quantitative Structure-Activity Relationship , Software
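The FP2VEC idea of representing a compound as "a set of trainable embedding vectors" can be sketched as an embedding lookup: each "on" bit of a fingerprint indexes a row of a table, and the compound becomes a matrix analogous to a sentence of word vectors that a text CNN can consume. In this sketch the fingerprint is supplied directly as a list of hypothetical on-bit indices (no cheminformatics toolkit is used), and the table size, embedding dimension, sorting, and zero-padding are assumptions; in a real model the table would be a parameter updated by backpropagation, whereas here it is only randomly initialized.

```python
import random

class FingerprintEmbedding:
    """Hypothetical lookup table mapping each fingerprint bit to a vector."""

    def __init__(self, n_bits, dim, seed=0):
        rng = random.Random(seed)
        # One row per possible fingerprint bit; trainable in a real model.
        self.table = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_bits)]

    def embed(self, on_bits, max_len):
        """Return a (max_len x dim) matrix: on-bit vectors first, then zero padding.

        Bits are sorted so the same compound always yields the same row order;
        compounds with more than max_len on bits are truncated.
        """
        dim = len(self.table[0])
        rows = [self.table[b] for b in sorted(on_bits)[:max_len]]
        rows += [[0.0] * dim for _ in range(max_len - len(rows))]
        return rows

emb = FingerprintEmbedding(n_bits=1024, dim=8)
compound = [5, 900, 17]  # hypothetical on-bit indices for one molecule
matrix = emb.embed(compound, max_len=5)
print(len(matrix), len(matrix[0]))
```

Fixing the matrix height with padding gives every compound the same input shape, which is what a sentence-classification CNN with fixed-width filters expects.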

19.

Prediction of binding property of RNA-binding proteins using multi-sized filters and multi-modal deep convolutional neural network.

Chung, Taesu; Kim, Dongsup.

PLoS One ; 14(4): e0216257, 2019.

Article in English | MEDLINE | ID: mdl-31026297

ABSTRACT

RNA-binding proteins (RBPs) are important in the regulation of gene expression through post-transcriptional control of RNAs, and in immune system development and function. With advances in sequencing technology, numerous RNA sequences are being discovered without knowledge of their binding partner RBPs. Therefore, the demand for accurate methods to predict RBP binding sites is increasing. There have been many attempts at RBP binding site prediction using various machine-learning techniques combined with various RNA features. In this work, we present a new deep convolutional neural network model trained on CLIP-seq datasets using multi-sized filters and multi-modal features to predict the binding properties of RBPs. With this model, we integrated sequence and structure information to extract sequence motifs, structure motifs, and combined motifs at the same time. RBP binding site prediction on the RBP-24 dataset was compared with two multi-modal methods, GraphProt and Deepnet-rbp, using the area under the curve (AUC) of the receiver operating characteristic (ROC). Our method (average AUC = 0.920) outperformed GraphProt (average AUC = 0.888) on 20 RBPs and Deepnet-rbp (average AUC = 0.902) on 15 RBPs. The improvement was achieved by using multi-sized convolution filters, with an average relative error reduction of 17%. By introducing a new RNA structure representation, the structure probability matrix, the average relative error was reduced by 3% compared with a one-hot encoded secondary structure representation. Interestingly, the structure probability matrix was more effective for ALKBH5, where the relative error reduction was 30%. We developed a new sequence motif enrichment method, which we call the response enrichment method. We successfully enriched sequence motifs for 12 RBPs, which closely resembled motifs reported in other sources (RBPgroup and CISBP-RNA). Finally, by analyzing these results together, we found an intricate interplay between sequence motifs and structure motifs, which agrees with other studies.

Subject(s)

Deep Learning , Neural Networks, Computer , Area Under Curve , Base Sequence , Nucleotide Motifs/genetics , Protein Binding
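The "multi-sized filters" idea in this abstract can be sketched on the sequence channel alone: one-hot encode the RNA, slide filters of different widths along it, global-max-pool each filter's responses, and concatenate the results. The filters below are hypothetical hand-made motif detectors (a filter built from a motif's own one-hot encoding scores the number of matching bases), whereas the published model learns its filters and also uses structure inputs such as the structure probability matrix, which this sketch omits.

```python
ALPHABET = "ACGU"

def one_hot(seq):
    """Encode an RNA sequence as a list of 4-dim one-hot vectors (A, C, G, U)."""
    return [[1.0 if base == a else 0.0 for a in ALPHABET] for base in seq]

def conv_max(x, filt):
    """Slide one (width x 4) filter along the sequence and return its max response.

    Returns -inf if the sequence is shorter than the filter.
    """
    width = len(filt)
    best = float("-inf")
    for start in range(len(x) - width + 1):
        s = sum(filt[k][c] * x[start + k][c] for k in range(width) for c in range(4))
        best = max(best, s)
    return best

def multi_size_features(seq, filters):
    """Global-max-pool each filter's responses and concatenate them into one vector."""
    x = one_hot(seq)
    return [conv_max(x, f) for f in filters]

# Hypothetical filters of widths 3 and 5, built from toy motifs
filters = [one_hot("GAC"), one_hot("UUUUU")]
print(multi_size_features("AGGACUUUUA", filters))
```

Because each width sees motifs of a different length, concatenating the pooled outputs lets a downstream classifier use short and long patterns together, which is the intuition behind the reported gain from multi-sized filters.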

20.

Hypomorphic Mutations in TONSL Cause SPONASTRIME Dysplasia.

Chang, Hae Ryung; Cho, Sung Yoon; Lee, Jae Hoon; Lee, Eunkyung; Seo, Jieun; Lee, Hye Ran; Cavalcanti, Denise P; Mäkitie, Outi; Valta, Helena; Girisha, Katta M; Lee, Chung; Neethukrishna, Kausthubham; Bhavani, Gandham S; Shukla, Anju; Nampoothiri, Sheela; Phadke, Shubha R; Park, Mi Jung; Ikegawa, Shiro; Wang, Zheng; Higgs, Martin R; Stewart, Grant S; Jung, Eunyoung; Lee, Myeong-Sok; Park, Jong Hoon; Lee, Eun A; Kim, Hongtae; Myung, Kyungjae; Jeon, Woosung; Lee, Kyoungyeul; Kim, Dongsup; Kim, Ok-Hwa; Choi, Murim; Lee, Han-Woong; Kim, Yonghwan; Cho, Tae-Joon.

Am J Hum Genet ; 104(3): 439-453, 2019 03 07.

Article in English | MEDLINE | ID: mdl-30773278

ABSTRACT

SPONASTRIME dysplasia is a rare, recessive skeletal dysplasia characterized by short stature, facial dysmorphism, and aberrant radiographic findings of the spine and long bone metaphysis. No causative genetic alterations for SPONASTRIME dysplasia have yet been determined. Using whole-exome sequencing (WES), we identified bi-allelic TONSL mutations in 10 of 13 individuals with SPONASTRIME dysplasia. TONSL is a multi-domain scaffold protein that interacts with DNA replication and repair factors and which plays critical roles in resistance to replication stress and the maintenance of genome integrity. We show here that cellular defects in dermal fibroblasts from affected individuals are complemented by the expression of wild-type TONSL. In addition, in vitro cell-based assays and in silico analyses of TONSL structure support the pathogenicity of those TONSL variants. Intriguingly, a knock-in (KI) Tonsl mouse model leads to embryonic lethality, implying the physiological importance of TONSL. Overall, these findings indicate that genetic variants resulting in reduced function of TONSL cause SPONASTRIME dysplasia and highlight the importance of TONSL in embryonic development and postnatal growth.

Subject(s)

Fibroblasts/pathology , Genes, Lethal , Mutation , NF-kappa B/genetics , Osteochondrodysplasias/pathology , Adolescent , Adult , Animals , Cells, Cultured , Child , Child, Preschool , DNA Damage , Dermis/metabolism , Dermis/pathology , Female , Fibroblasts/metabolism , Humans , Infant , Infant, Newborn , Mice , Mice, Inbred C57BL , Osteochondrodysplasias/genetics , Exome Sequencing/methods , Young Adult
