1.
J Cheminform ; 3: 22, 2011 Jun 27.
Article in English | MEDLINE | ID: mdl-21708012

ABSTRACT

BACKGROUND: We present a machine learning approach to the problem of protein ligand interaction prediction. We focus on a set of binding data obtained from 113 different protein kinases and 20 inhibitors. It was attained through ATP site-dependent binding competition assays and constitutes the first available dataset of this kind. We extract information about the investigated molecules from various data sources to obtain an informative set of features. RESULTS: A Support Vector Machine (SVM) as well as a decision tree algorithm (C5/See5) is used to learn models based on the available features which in turn can be used for the classification of new kinase-inhibitor pair test instances. We evaluate our approach using different feature sets and parameter settings for the employed classifiers. Moreover, the paper introduces a new way of evaluating predictions in such a setting, where different amounts of information about the binding partners can be assumed to be available for training. Results on an external test set are also provided. CONCLUSIONS: In most of the cases, the presented approach clearly outperforms the baseline methods used for comparison. Experimental results indicate that the applied machine learning methods are able to detect a signal in the data and predict binding affinity to some extent. For SVMs, the binding prediction can be improved significantly by using features that describe the active site of a kinase. For C5, besides diversity in the feature set, alignment scores of conserved regions turned out to be very useful.
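The evaluation idea mentioned in the abstract, where different amounts of information about the binding partners are available for training, can be illustrated with a small sketch. This is one possible reading and not the protocol from the paper: test pairs are grouped by whether their kinase and their inhibitor also occur in the training pairs. The function name, the group labels and the identifiers are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (kinase_id, inhibitor_id); the identifiers used below are made up.
Pair = Tuple[str, str]


def group_test_pairs(train: List[Pair], test: List[Pair]) -> Dict[str, List[Pair]]:
    """Group test pairs by which binding partner also occurs in training.

    Assumption: "amount of information available" is approximated by whether
    the kinase and/or the inhibitor of a test pair was seen during training.
    """
    seen_kinases: Set[str] = {kinase for kinase, _ in train}
    seen_inhibitors: Set[str] = {inhibitor for _, inhibitor in train}

    groups: Dict[str, List[Pair]] = {
        "both_seen": [],
        "kinase_only": [],
        "inhibitor_only": [],
        "neither_seen": [],
    }
    for kinase, inhibitor in test:
        kinase_seen = kinase in seen_kinases
        inhibitor_seen = inhibitor in seen_inhibitors
        if kinase_seen and inhibitor_seen:
            groups["both_seen"].append((kinase, inhibitor))
        elif kinase_seen:
            groups["kinase_only"].append((kinase, inhibitor))
        elif inhibitor_seen:
            groups["inhibitor_only"].append((kinase, inhibitor))
        else:
            groups["neither_seen"].append((kinase, inhibitor))
    return groups


if __name__ == "__main__":
    train_pairs = [("K1", "I1"), ("K2", "I2")]
    test_pairs = [("K1", "I2"), ("K1", "I3"), ("K3", "I1"), ("K3", "I3")]
    for name, pairs in group_test_pairs(train_pairs, test_pairs).items():
        print(name, pairs)
```

Reporting a classifier's accuracy separately for each group would show how much it depends on having seen a binding partner before.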

2.
Mol Inform ; 30(2-3): 205-18, 2011 Mar 14.
Article in English | MEDLINE | ID: mdl-27466774

ABSTRACT

We present a novel (Q)SAR approach that detects groups of structures for local (Q)SAR modeling. The algorithm combines clustering and classification or regression for making predictions on chemical structure data. A clustering procedure producing clusters with shared structural scaffolds is applied as a preprocessing step, before a (local) model is learned for each relevant cluster. Instead of using only one global model (classical approach), we use weighted local models for predictions of query compounds dependent on cluster memberships. The approach is evaluated and compared against standard statistical (Q)SAR algorithms on various datasets. The results show that in many cases the application of local models significantly improves the predictive power of the derived (Q)SAR models compared to the classical approach, to models that are induced by a fingerprint-based or a hierarchical clustering approach and to locally weighted learning.
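The prediction step described above, weighted local models chosen by cluster membership, can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' algorithm: cluster memberships are taken as given non-negative weights, the local predictions are combined by a weighted average, and falling back to a global model when no local model applies is an assumption of this sketch. All names are hypothetical.

```python
from typing import Callable, Dict, Sequence

# A model maps a feature vector to a numeric prediction.
Model = Callable[[Sequence[float]], float]


def predict_weighted_local(
    x: Sequence[float],
    local_models: Dict[str, Model],
    memberships: Dict[str, float],
    global_model: Model,
) -> float:
    """Combine local model predictions, weighted by cluster membership.

    memberships maps cluster id -> non-negative weight for the query x.
    Clusters without a local model, or with zero weight, are ignored.
    Assumption: if nothing remains, the global model is used instead.
    """
    weighted = [
        (weight, local_models[cluster](x))
        for cluster, weight in memberships.items()
        if cluster in local_models and weight > 0
    ]
    total_weight = sum(weight for weight, _ in weighted)
    if total_weight == 0:
        return global_model(x)
    return sum(weight * pred for weight, pred in weighted) / total_weight


if __name__ == "__main__":
    # Toy constant models standing in for learned (Q)SAR models.
    models = {"scaffold_a": lambda x: 1.0, "scaffold_b": lambda x: 3.0}
    fallback = lambda x: 10.0
    query = [0.2, 0.7]
    print(predict_weighted_local(query, models, {"scaffold_a": 0.75, "scaffold_b": 0.25}, fallback))  # 1.5
    print(predict_weighted_local(query, models, {}, fallback))  # 10.0
```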

3.
Bioinformatics ; 27(2): 204-10, 2011 Jan 15.
Article in English | MEDLINE | ID: mdl-21098432

ABSTRACT

MOTIVATION: The slow growth of expert-curated databases compared to experimental databases makes it necessary to build upon highly accurate automated processing pipelines to make the most of the data until curation becomes available. We address this problem in the context of protein structures and their classification into structural and functional classes, more specifically, the structural classification of proteins (SCOP). Structural alignment methods like Vorolign already provide good classification results, but effectively work in a 1-Nearest Neighbor mode. Model-based (in contrast to instance-based) approaches so far have been shown to be of limited value due to small classes arising in such classification schemes. RESULTS: In this article, we describe how kernels defined in terms of Vorolign scores can be used in SVM learning, and explore variants of combined instance-based and model-based learning, up to exclusively model-based learning. Our results suggest that kernels based on Vorolign scores are effective and that model-based learning can yield highly competitive classification results for the prediction of SCOP families. AVAILABILITY: The code is made available at: http://wwwkramer.in.tum.de/research/applications/vorolign-kernel.
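To illustrate what defining a kernel in terms of pairwise alignment scores can look like, the sketch below turns a square matrix of similarity scores into a symmetric matrix with unit diagonal. It is a generic construction chosen for illustration, not the kernel defined in the article: the scores are assumed to be higher for more similar structures and strictly positive on the diagonal, and the function name is hypothetical. Symmetrising and normalising does not by itself guarantee a positive semi-definite matrix, so a real application would need to check or enforce that property.

```python
import math
from typing import List

Matrix = List[List[float]]


def scores_to_kernel(scores: Matrix) -> Matrix:
    """Build a symmetric, unit-diagonal similarity matrix from raw scores.

    Step 1: symmetrise, because score(i, j) and score(j, i) may differ.
    Step 2: normalise each entry by the geometric mean of the self-scores.
    Assumption: self-scores on the diagonal are strictly positive.
    """
    n = len(scores)
    sym = [
        [0.5 * (scores[i][j] + scores[j][i]) for j in range(n)]
        for i in range(n)
    ]
    return [
        [sym[i][j] / math.sqrt(sym[i][i] * sym[j][j]) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    raw = [[4.0, 1.0], [3.0, 16.0]]  # made-up scores, not Vorolign output
    for row in scores_to_kernel(raw):
        print(row)
```

A matrix of this form could then be passed to an SVM implementation that accepts precomputed kernels.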


Subject(s)
Algorithms , Artificial Intelligence , Protein Structure, Tertiary , Classification/methods , Models, Molecular , Proteins/chemistry , Proteins/classification
4.
J Cheminform ; 2(1): 7, 2010 Aug 31.
Article in English | MEDLINE | ID: mdl-20807436

ABSTRACT

OpenTox provides an interoperable, standards-based Framework for the support of predictive toxicology data management, algorithms, modelling, validation and reporting. It is relevant to satisfying the chemical safety assessment requirements of the REACH legislation as it supports access to experimental data, (Quantitative) Structure-Activity Relationship models, and toxicological information through an integrating platform that adheres to regulatory requirements and OECD validation principles. Initial research defined the essential components of the Framework including the approach to data access, schema and management, use of controlled vocabularies and ontologies, architecture, web service and communications protocols, and selection and integration of algorithms for predictive modelling. OpenTox provides end-user oriented tools to non-computational specialists, risk assessors, and toxicological experts in addition to Application Programming Interfaces (APIs) for developers of new applications. OpenTox actively supports public standards for data representation, interfaces, vocabularies and ontologies, Open Source approaches to core platform components, and community-based collaboration approaches, so as to progress system interoperability goals. The OpenTox Framework includes APIs and services for compounds, datasets, features, algorithms, models, ontologies, tasks, validation, and reporting which may be combined into multiple applications satisfying a variety of different user needs. OpenTox applications are based on a set of distributed, interoperable OpenTox API-compliant REST web services. The OpenTox approach to ontology allows for efficient mapping of complementary data coming from different datasets into a unifying structure having a shared terminology and representation. Two initial OpenTox applications are presented as an illustration of the potential impact of OpenTox for high-quality and consistent structure-activity relationship modelling of REACH-relevant endpoints: ToxPredict which predicts and reports on toxicities for endpoints for an input chemical structure, and ToxCreate which builds and validates a predictive toxicity model based on an input toxicology dataset. Because of the extensible nature of the standardised Framework design, barriers of interoperability between applications and content are removed, as the user may combine data, models and validation from multiple sources in a dependable and time-effective way.

5.
Article in English | MEDLINE | ID: mdl-20671325

ABSTRACT

Automatic classification of proteins using machine learning is an important problem that has received significant attention in the literature. One feature of this problem is that expert-defined hierarchies of protein classes exist and can potentially be exploited to improve classification performance. In this article, we investigate empirically whether this is the case for two such hierarchies. We compare multiclass classification techniques that exploit the information in those class hierarchies and those that do not, using logistic regression, decision trees, bagged decision trees, and support vector machines as the underlying base learners. In particular, we compare hierarchical and flat variants of ensembles of nested dichotomies. The latter have been shown to deliver strong classification performance in multiclass settings. We present experimental results for synthetic, fold recognition, enzyme classification, and remote homology detection data. Our results show that exploiting the class hierarchy improves performance on the synthetic data but not in the case of the protein classification problems. Based on this, we recommend that strong flat multiclass methods be used as a baseline to establish the benefit of exploiting class hierarchies in this area.
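For readers unfamiliar with nested dichotomies, the sketch below shows how a single dichotomy tree yields class probabilities and how an ensemble averages several trees. It is a minimal illustration, not the experimental setup of the article: each internal node would normally hold a binary base learner evaluated on the query instance, whereas here the branch probability `p_left` is a fixed number chosen for the example. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Node:
    classes: FrozenSet[str]
    # P(left subset | this node's subset); fixed here for illustration,
    # in practice produced by a binary classifier for a given instance.
    p_left: float = 1.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def class_probabilities(node: Node, prob: float = 1.0) -> Dict[str, float]:
    """Multiply branch probabilities along the path to each leaf class."""
    if node.left is None or node.right is None:
        (label,) = node.classes  # a leaf holds exactly one class
        return {label: prob}
    result = class_probabilities(node.left, prob * node.p_left)
    result.update(class_probabilities(node.right, prob * (1.0 - node.p_left)))
    return result


def ensemble_probabilities(trees: List[Node]) -> Dict[str, float]:
    """Average the class probabilities of several dichotomy trees."""
    totals: Dict[str, float] = {}
    for tree in trees:
        for label, p in class_probabilities(tree).items():
            totals[label] = totals.get(label, 0.0) + p
    return {label: total / len(trees) for label, total in totals.items()}


if __name__ == "__main__":
    leaf = lambda name: Node(frozenset({name}))
    # Tree: {a, b, c} -> {a} versus {b, c}; {b, c} -> {b} versus {c}
    inner = Node(frozenset({"b", "c"}), p_left=0.25, left=leaf("b"), right=leaf("c"))
    root = Node(frozenset({"a", "b", "c"}), p_left=0.5, left=leaf("a"), right=inner)
    print(class_probabilities(root))  # {'a': 0.5, 'b': 0.125, 'c': 0.375}
```

In this picture a hierarchical variant would restrict the tree shapes to splits consistent with the expert-defined class hierarchy, while a flat variant would draw the splits without that constraint.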


Subject(s)
Algorithms , Artificial Intelligence , Proteins/chemistry , Proteins/classification , Computing Methodologies , Molecular Sequence Data , Pattern Recognition, Automated , Protein Folding