
CalCleaveMKL: a Tool for Calpain Cleavage Prediction.

duVerle, David A; Mamitsuka, Hiroshi.

Methods Mol Biol ; 1915: 121-147, 2019.

Article in English | MEDLINE | ID: mdl-30617801

ABSTRACT

Calpain, an intracellular Ca²⁺-dependent cysteine protease, is known to play a role in a wide range of metabolic pathways through limited proteolysis of its substrates. However, only a limited number of these substrates are currently known, with the exact mechanism of substrate recognition and cleavage by calpain still largely unknown. Current sequencing technologies have made it possible to compile large amounts of cleavage data and brought greater understanding of the underlying protein interactions. However, the practical impossibility of exhaustively retrieving substrate sequences through experimentation alone has created the need for efficient computational prediction methods. Such methods must be able to quickly mark substrate candidates and putative cleavage sites for further analysis. While many methods exist for both calpain and other types of proteolytic actions, the expected reliability of these methods depends heavily on the type and complexity of proteolytic action, as well as the availability of well-labeled experimental datasets, which both vary greatly across enzyme families. This chapter introduces CalCleaveMKL: a tool for calpain cleavage prediction based on multiple kernel learning, an extension to the classic support vector machine framework that is able to train complex models based on rich, heterogeneous feature sets, leading to significantly improved prediction quality. Along with its improved accuracy, the method used by CalCleaveMKL provided numerous insights on the respective importance of sequence-related features, such as solvent accessibility and secondary structure. It notably demonstrated there existed significant specificity differences across calpain subtypes, despite previous assumption to the contrary. An online implementation of this prediction tool is available at http://calpain.org.

Subject(s)

Calpain/chemistry , Computational Biology/methods , Software , Algorithms , Binding Sites , Caspases/chemistry , Protein Structure, Secondary , Proteolysis , Substrate Specificity , Support Vector Machine

SiBIC: A Tool for Generating a Network of Biclusters Captured by Maximal Frequent Itemset Mining.

Takahashi, Kei-Ichiro; duVerle, David A; Yotsukura, Sohiya; Takigawa, Ichigaku; Mamitsuka, Hiroshi.

Methods Mol Biol ; 1807: 95-111, 2018.

Article in English | MEDLINE | ID: mdl-30030806

ABSTRACT

Biclustering extracts coexpressed genes under certain experimental conditions, providing more precise insight into genetic behavior than one-dimensional clustering. For understanding the biological features of genes in a single bicluster, visualizations such as heatmaps or parallel coordinate plots and tools for enrichment analysis are widely used. However, handling many biclusters simultaneously remains a challenge. We therefore developed a web service named SiBIC, which uses maximal frequent itemset mining to exhaustively discover significant biclusters and organizes them into networks of overlapping biclusters, where nodes are gene sets and edges show their overlaps in the detected biclusters. SiBIC provides a graphical user interface for manipulating a gene set network, where users can find target gene sets based on the enriched network. This chapter provides a user guide to SiBIC, along with background on the development of the software. SiBIC is available at http://utrecht.kuicr.kyoto-u.ac.jp:8080/sibic/faces/index.jsp.

Subject(s)

Data Mining/methods , Software , Cluster Analysis , Gene Expression Regulation , Gene Regulatory Networks , Internet

CellTree: an R/bioconductor package to infer the hierarchical structure of cell populations from single-cell RNA-seq data.

duVerle, David A; Yotsukura, Sohiya; Nomura, Seitaro; Aburatani, Hiroyuki; Tsuda, Koji.

BMC Bioinformatics ; 17(1): 363, 2016 Sep 13.

Article in English | MEDLINE | ID: mdl-27620863

ABSTRACT

BACKGROUND: Single-cell RNA sequencing is fast becoming one of the standard methods for gene expression measurement, providing unique insights into cellular processes. A number of methods, based on general dimensionality reduction techniques, have been suggested to help infer and visualise the underlying structure of cell populations from single-cell expression levels, yet their models generally lack proper biological grounding and struggle to identify complex differentiation paths. RESULTS: Here we introduce cellTree: an R/Bioconductor package that uses a novel statistical approach, based on document analysis techniques, to produce tree structures outlining the hierarchical relationship between single-cell samples, while identifying latent groups of genes that can provide biological insights. CONCLUSIONS: With cellTree, we provide experimentalists with an easy-to-use tool, based on statistically and biologically sound algorithms, to efficiently explore and visualise single-cell RNA data. The cellTree package is publicly available in the online Bioconductor repository at: http://bioconductor.org/packages/cellTree/.

Subject(s)

RNA/genetics , Sequence Analysis, RNA/methods , Stem Cells/immunology , Cell Differentiation , Humans

Discovering combinatorial interactions in survival data.

Duverle, David A; Takeuchi, Ichiro; Murakami-Tonami, Yuko; Kadomatsu, Kenji; Tsuda, Koji.

Bioinformatics ; 29(23): 3053-9, 2013 Dec 01.

Article in English | MEDLINE | ID: mdl-24037215

ABSTRACT

MOTIVATION: Although several methods exist to relate high-dimensional gene expression data to various clinical phenotypes, finding combinations of features in such input remains a challenge, particularly when fitting complex statistical models such as those used for survival studies. RESULTS: Our proposed method builds on existing 'regularization path-following' techniques to produce regression models that can extract arbitrarily complex patterns of input features (such as gene combinations) from large-scale data that relate to a known clinical outcome. Through the use of the data's structure and itemset mining techniques, we are able to avoid combinatorial complexity issues typically encountered with such methods, and our algorithm runs in time of the same order as single-variable versions. Applied to data from various clinical studies of cancer patient survival time, our method was able to produce a number of promising gene-interaction candidates whose tumour-related roles appear to be confirmed by the literature.

Subject(s)

Breast Neoplasms/mortality , Computational Biology/methods , Gene Regulatory Networks , Neoplasm Proteins/genetics , Neuroblastoma/mortality , Algorithms , Breast Neoplasms/genetics , Female , Gene Expression Profiling , Humans , Likelihood Functions , Logistic Models , Models, Biological , Neuroblastoma/genetics , Proportional Hazards Models , Risk Factors , Survival Rate

A review of statistical methods for prediction of proteolytic cleavage.

duVerle, David A; Mamitsuka, Hiroshi.

Brief Bioinform ; 13(3): 337-49, 2012 May.

Article in English | MEDLINE | ID: mdl-22138323

ABSTRACT

A fundamental component of systems biology, proteolytic cleavage is involved in nearly all aspects of cellular activities: from gene regulation to cell lifecycle regulation. Current sequencing technologies have made it possible to compile large amounts of cleavage data and brought greater understanding of the underlying protein interactions. However, the practical impossibility of exhaustively retrieving substrate sequences through experimentation alone has long highlighted the need for efficient computational prediction methods. Such methods must be able to quickly mark substrate candidates and putative cleavage sites for further analysis. Available methods and expected reliability depend heavily on the type and complexity of proteolytic action, as well as the availability of well-labelled experimental data sets: factors varying greatly across enzyme families. For this review, we chose to give a quick overview of the general issues and challenges in cleavage prediction methods followed by a more in-depth presentation of major techniques and implementations, with a focus on two particular families of cysteine proteases: caspases and calpains. Through their respective differences in proteolytic specificity (high for caspases, broader for calpains) and data availability (much lower for calpains), we aimed to illustrate the strengths and limitations of techniques ranging from position-based matrices and decision trees to more flexible machine-learning methods such as hidden Markov models and Support Vector Machines. In addition to a technical overview for each family of algorithms, we tried to provide elements of evaluation and performance comparison across methods.

Subject(s)

Mathematical Computing , Algorithms , Binding Sites , Calpain/metabolism , Caspases/metabolism , Markov Chains , Proteolysis , Support Vector Machine

Calpain cleavage prediction using multiple kernel learning.

DuVerle, David A; Ono, Yasuko; Sorimachi, Hiroyuki; Mamitsuka, Hiroshi.

PLoS One ; 6(5): e19035, 2011 May 03.

Article in English | MEDLINE | ID: mdl-21559271

ABSTRACT

Calpain, an intracellular Ca²⁺-dependent cysteine protease, is known to play a role in a wide range of metabolic pathways through limited proteolysis of its substrates. However, only a limited number of these substrates are currently known, with the exact mechanism of substrate recognition and cleavage by calpain still largely unknown. While previous research has successfully applied standard machine-learning algorithms to accurately predict substrate cleavage by other similar types of proteases, their approach does not extend well to calpain, possibly due to its particular mode of proteolytic action and limited amount of experimental data. Through the use of Multiple Kernel Learning, a recent extension to the classic Support Vector Machine framework, we were able to train complex models based on rich, heterogeneous feature sets, leading to significantly improved prediction quality (6% over highest AUC score produced by state-of-the-art methods). In addition to producing a stronger machine-learning model for the prediction of calpain cleavage, we were able to highlight the importance and role of each feature of substrate sequences in defining specificity: primary sequence, secondary structure and solvent accessibility. Most notably, we showed there existed significant specificity differences across calpain sub-types, despite previous assumption to the contrary. Prediction accuracy was further successfully validated using, as an unbiased test set, mutated sequences of calpastatin (endogenous inhibitor of calpain) modified to no longer block calpain's proteolytic action. An online implementation of our prediction tool is available at http://calpain.org.

Subject(s)

Calpain/chemistry , Algorithms , Area Under Curve , Artificial Intelligence , Binding Sites , Biochemistry/methods , Calcium/chemistry , Calcium-Binding Proteins/chemistry , Computational Biology/methods , Humans , Models, Statistical , Normal Distribution , Protein Binding , Protein Structure, Secondary , Protein Structure, Tertiary , Reproducibility of Results , Software
