Results 1 - 4 of 4
1.
Comput Biol Med ; 66: 330-6, 2015 Nov 01.
Article in English | MEDLINE | ID: mdl-26476414

ABSTRACT

Recombinant protein overexpression, an important biotechnological process, is governed by complex biological rules that are mostly unknown. An intelligent algorithm is therefore needed to determine the expression level of a recombinant protein without resource-intensive, lab-based trial-and-error experiments. The purpose of this study is to propose, for the first time in the literature, a predictive model that estimates the level of recombinant protein overexpression using a machine learning approach based on the sequence, expression vector, and expression host. The expression host was confined to Escherichia coli, which is the most popular bacterial host for overexpressing recombinant proteins. To make the problem tractable, the overexpression level was categorized as low, medium, or high. A set of features likely to affect the overexpression level was generated from known facts (e.g. gene length) and knowledge gathered from the related literature. Then, a representative subset of these features was determined using feature selection techniques. Finally, a predictive model was developed using a random forest classifier, which was able to adequately classify the small, imbalanced, multi-class dataset constructed. The results showed that the predictive model achieved a promising accuracy of 80% on average in estimating the overexpression level of a recombinant protein.
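As a rough illustration of the feature-generation and labelling steps described in this abstract, the following Python sketch derives two sequence features and bins an expression measurement into the three classes. Only gene length and the low/medium/high categories come from the abstract; the GC-content feature, the function names, and the cut-off values are assumptions made for this example, and the feature selection and random forest steps are not reproduced.

```python
from collections import Counter


def sequence_features(dna: str) -> dict:
    """Compute two simple sequence-derived features for a gene.

    gene_length is named in the abstract; gc_content is an assumed
    extra feature added only for illustration.
    """
    seq = dna.upper()
    counts = Counter(seq)
    length = len(seq)
    gc = (counts["G"] + counts["C"]) / length if length else 0.0
    return {"gene_length": length, "gc_content": gc}


def categorize_expression(level: float,
                          low_cut: float = 10.0,
                          high_cut: float = 50.0) -> str:
    """Map a measured expression level to low / medium / high.

    The cut-offs are hypothetical; the abstract does not state how
    the three classes were delimited.
    """
    if level < low_cut:
        return "low"
    if level < high_cut:
        return "medium"
    return "high"


features = sequence_features("ATGGCGTTAACC")
label = categorize_expression(32.5)
```

Feature dictionaries and labels built this way could then be passed to any multi-class classifier; the study itself reports using a random forest.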


Subject(s)
Escherichia coli/metabolism , Gene Expression Regulation, Bacterial , Machine Learning , Recombinant Proteins/biosynthesis , Algorithms , Artificial Intelligence , Databases, Protein , Gene Expression Profiling/methods , Reproducibility of Results
2.
BMC Bioinformatics ; 15: 134, 2014 May 08.
Article in English | MEDLINE | ID: mdl-24885721

ABSTRACT

BACKGROUND: Over the last 20 years in biotechnology, the production of recombinant proteins has been a crucial bioprocess in both the biopharmaceutical and research arenas in terms of human health, scientific impact and economic volume. Although logical strategies of genetic engineering have been established, protein overexpression is still an art. In particular, heterologous expression is often hindered by low production levels and frequent failure for opaque reasons. The problem is accentuated because there is no generic solution available to enhance heterologous overexpression. For a given protein, the extent of its solubility can indicate the quality of its function. Over 30% of synthesized proteins are not soluble. Under given experimental circumstances, including temperature, expression host, etc., protein solubility is a feature ultimately defined by its sequence. To date, numerous machine learning based methods have been proposed to predict the solubility of a protein solely from its amino acid sequence. Despite 20 years of research on the matter, no comprehensive review of the published methods is available. RESULTS: This paper presents an extensive review of the existing models for predicting protein solubility in the Escherichia coli recombinant protein overexpression system. The models are investigated and compared with regard to the datasets used, features, feature selection methods, machine learning techniques and accuracy of prediction. A discussion of the models is provided at the end. CONCLUSIONS: This study aims to investigate extensively the machine learning based methods for predicting recombinant protein solubility, so as to offer both a general and a detailed understanding for researchers in the field. Some of the models present acceptable prediction performance and convenient user interfaces. These models can be considered valuable tools for predicting recombinant protein overexpression results before performing real laboratory experiments, thus saving labour, time and cost.


Subject(s)
Artificial Intelligence , Escherichia coli/genetics , Recombinant Proteins/chemistry , Amino Acid Sequence , Escherichia coli/metabolism , Recombinant Proteins/biosynthesis , Solubility
3.
Protein Expr Purif ; 95: 92-5, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24333540

ABSTRACT

UNLABELLED: Recombinant protein production is a significant biotechnological process, as it allows researchers to produce a specific protein in desired quantities. Escherichia coli (E. coli) is the most popular heterologous expression host for the production of recombinant proteins owing to advantages such as low cost, high productivity, well-characterized genetics, simple growth requirements and rapid growth. A number of factors influence the expression level of a recombinant protein in E. coli: the gene to be expressed, the expression vector, the expression host, and the culture conditions. The major motivation for developing our database, EcoliOverExpressionDB, is to provide a means for researchers to quickly locate key factors in the overexpression of certain proteins. Such information would be a useful guide for the overexpression of similar proteins in E. coli. To the best of the present researchers' knowledge, EcoliOverExpressionDB is the first database of recombinant protein expression experiments, in general and specifically in E. coli, that gathers the parameters influencing protein overexpression and the results in one place. AVAILABILITY: EcoliOverExpressionDB is freely available and accessible using all major browsers at http://birg4.fbb.utm.my:8080/EcoliOverExpressionDB/.


Subject(s)
Biotechnology , Databases, Factual , Escherichia coli , Recombinant Proteins , Escherichia coli/genetics , Escherichia coli/metabolism , Internet , Recombinant Proteins/genetics , Recombinant Proteins/metabolism
4.
Int J Data Min Bioinform ; 7(4): 397-415, 2013.
Article in English | MEDLINE | ID: mdl-23798224

ABSTRACT

A protein contact map is a simplified representation of a protein's spatial structure. The committee machine is a machine learning method that allots the learning task to a number of learners and divides the input space into subspaces. The learners' responses to an input are combined to produce the system's final response, which is more accurate than any single learner's response. In this study, we propose a novel method, called CMP_model, for contact map prediction based on the committee machine. Compared with two other models, the proposed model shows a considerable gain (an accuracy improvement from 0.05 to 0.15).
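The idea of combining several learners' responses into one final response can be sketched as follows. This is not the CMP_model itself: the averaging rule, the 0.5 threshold, the toy learners, and the two-element input (sequence separation and a hydrophobicity score) are all assumptions chosen for illustration, and the division of the input space into subspaces is omitted.

```python
from typing import Callable, List, Sequence

# A learner maps a feature vector for a residue pair to a contact probability.
Learner = Callable[[Sequence[float]], float]


def committee_predict(learners: List[Learner],
                      x: Sequence[float],
                      threshold: float = 0.5) -> bool:
    """Average the learners' contact probabilities and threshold the mean."""
    if not learners:
        raise ValueError("committee needs at least one learner")
    mean = sum(learner(x) for learner in learners) / len(learners)
    return mean >= threshold


# Three toy learners (hypothetical); x = (sequence_separation, hydrophobicity).
def by_separation(x: Sequence[float]) -> float:
    return 1.0 if x[0] < 6 else 0.0


def by_hydrophobicity(x: Sequence[float]) -> float:
    return min(1.0, max(0.0, x[1]))


def constant_prior(x: Sequence[float]) -> float:
    return 0.2


committee = [by_separation, by_hydrophobicity, constant_prior]
near_pair = committee_predict(committee, (4, 0.9))
far_pair = committee_predict(committee, (10, 0.3))
```

A fuller committee machine would typically add a gating step that weights each learner according to the region of input space the example falls in, rather than averaging all learners equally.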


Subject(s)
Artificial Intelligence , Protein Interaction Maps , Proteins/chemistry , Binding Sites , Models, Molecular , Protein Conformation , Protein Folding