1.
Phys Chem Chem Phys ; 26(10): 8380-8389, 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38404232

ABSTRACT

The quest for high-performance solar cell absorbers has garnered significant attention in photovoltaic research in recent years. To overcome the Shockley-Queisser (SQ) limit of ∼31% for a single-junction solar cell and realize higher power conversion efficiency, the concept of an intermediate band solar cell (IBSC) has been proposed. This involves the incorporation of an intermediate band (IB) to assist the three band-edge absorptions within the single absorber layer. BaSnS2 has a forbidden gap of appropriate width to host an IB. In this work, doping of BaSnS2 was studied based on hybrid functional calculations. The results demonstrated that isolated and half-filled IBs were generated with suitable energy states in the band gap region after group-IIIA element (i.e., Al, Ga, and In) doping at the Sn site. Theoretical efficiencies under one-sun illumination of 39.0%, 44.3%, and 39.7% were obtained for a 25% doping concentration of Al, Ga, and In, respectively, all larger than the single-junction SQ limit. Furthermore, the dopants have lower formation energies when substituting at the Sn site compared to occupying the Ba and S sites, which helps realize a proper IB with three band-edge absorptions. Therefore, group-IIIA element doped BaSnS2 is proposed as a high-efficiency absorber for IBSCs.

2.
Sensors (Basel) ; 23(24)2023 Dec 06.
Article in English | MEDLINE | ID: mdl-38139493

ABSTRACT

Autism spectrum disorder (ASD) is a multifaceted neurodevelopmental condition that significantly impacts children's social, behavioral, and communicative capacities. Despite extensive research, the precise etiological origins of ASD remain elusive, with observable connections to brain activity. In this study, we propose a novel framework for ASD detection that extracts features from functional magnetic resonance imaging (fMRI) data and from phenotypic data separately. Specifically, we employ recursive feature elimination (RFE) for feature selection of fMRI data and subsequently apply graph neural networks (GNN) to extract informative features from the chosen data. Moreover, we devise a phenotypic feature extractor (PFE) to extract phenotypic features effectively. We then fuse the features and validate them on the ABIDE dataset, achieving 78.7% and 80.6% accuracy, respectively, thereby showcasing competitive performance compared to state-of-the-art methods. The proposed framework provides a promising direction for the development of effective diagnostic tools for ASD.


Subject(s)
Autism Spectrum Disorder , Child , Humans , Autism Spectrum Disorder/diagnostic imaging , Communication , Neural Networks, Computer , Brain/diagnostic imaging , Magnetic Resonance Imaging , Brain Mapping
3.
J Comput Biol ; 30(9): 1019-1033, 2023 09.
Article in English | MEDLINE | ID: mdl-37702623

ABSTRACT

In the field of drug development and repositioning, the prediction of drug-disease associations is a critical task. A recently proposed method for predicting drug-disease associations based on graph convolution relies heavily on the features of adjacent nodes within the homogeneous network for characterizing information. However, this method lacks node attribute information from heterogeneous networks and can therefore hardly provide valuable insights for predicting drug-disease associations. In this study, a novel drug-disease association prediction model called DAHNGC is proposed, which is based on a graph convolutional neural network. This model includes two feature extraction methods that are specifically designed to extract the attribute characteristics of drugs and diseases from both homogeneous and heterogeneous networks. First, the DropEdge technique is added to the graph convolutional neural network to alleviate the oversmoothing problem and obtain the characteristics of the same nodes of drugs or diseases in the homogeneous network. Then, an automatic feature extraction method in the heterogeneous network is designed to obtain the features of drugs or diseases at different nodes. Finally, the obtained features are fed into a fully connected network for nonlinear transformation, and the potential drug-disease pairs are obtained by bilinear decoding. Experimental results demonstrate that the DAHNGC model exhibits good predictive performance for drug-disease associations.
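
The DropEdge step named in this abstract can be illustrated with a minimal sketch. It assumes the common formulation in which a fixed fraction of edges is removed at random before each training pass; the function name `drop_edge` and its parameters are hypothetical and are not taken from the DAHNGC paper.

```python
import random


def drop_edge(edges, drop_rate, rng):
    """Return a random subset of `edges` with about `drop_rate` of them removed.

    Hypothetical helper: `edges` is a list of (u, v) pairs, `drop_rate` is a
    fraction in [0, 1), and `rng` is a random.Random instance so that the
    sampling is reproducible.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    n_drop = int(round(drop_rate * len(edges)))
    # Sample the edges to keep without replacement; the order is not preserved.
    return rng.sample(edges, len(edges) - n_drop)
```

In a training loop, a fresh subset would typically be drawn every epoch, so the network sees a slightly different graph each time.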


Subject(s)
Drug Development , Neural Networks, Computer
4.
Interdiscip Sci ; 14(2): 607-622, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35428965

ABSTRACT

Emerging evidence indicates that miRNAs have strong relationships with many human diseases. Investigating these associations will contribute to elucidating the activities of miRNAs and pathogenesis mechanisms, and will provide new opportunities for disease diagnosis and drug discovery. Therefore, it is important to identify potential associations between miRNAs and diseases. The existing databases of miRNA-disease associations (MDAs) only provide the known MDAs, which can be regarded as positive samples. However, the unknown MDAs cannot be regarded as reliable negative samples. To deal with this uncertainty, we proposed a convolutional neural network (CNN) framework, named DNRLCNN, that predicts MDAs based on a latent feature matrix extracted from positive samples only. First, by considering only the positive samples in the calculation, we captured the latent feature matrix for complex interactions between miRNAs and diseases in a low-dimensional space. Then, we constructed a feature vector for each miRNA-disease pair based on the feature representation. Finally, we applied a modified CNN to the feature vector to predict MDAs. As a result, our model achieves better performance than other state-of-the-art CNN-based methods in fivefold cross-validation on both the miRNA-disease association prediction task (average AUC of 0.9030) and the miRNA-phenotype association prediction task (average AUC of 0.9442). In addition, we carried out case studies on two human diseases: all of the top-50 predicted miRNAs for lung neoplasms are confirmed by the HMDD v3.2 and dbDEMC 2.0 databases, and 98% of the top-50 predicted miRNAs for heart failure are confirmed. The experimental results show that our model is capable of inferring potential disease-related miRNAs.


Subject(s)
MicroRNAs , Algorithms , Computational Biology/methods , Genetic Predisposition to Disease , Humans , MicroRNAs/genetics , Neural Networks, Computer
5.
BMC Bioinformatics ; 22(1): 248, 2021 May 13.
Article in English | MEDLINE | ID: mdl-33985429

ABSTRACT

BACKGROUND: Some proposed methods for identifying essential proteins achieve better results by using biological information. Gene expression data is generally used to identify essential proteins. However, gene expression data is prone to fluctuations, which may affect the accuracy of essential protein identification. Therefore, we propose an essential protein identification method based on gene expression and PPI network data that calculates the similarity of the "active" and "inactive" states of gene expression in a cluster of the PPI network. Our experiments show that the method can improve the accuracy of predicting essential proteins. RESULTS: In this paper, we propose a new measure named JDC, which is based on PPI network data and gene expression data. The JDC method offers a dynamic threshold method to binarize gene expression data. After that, it combines the degree centrality and the Jaccard similarity index to calculate the JDC score for each protein in the PPI network. We benchmark the JDC method on four organisms, and evaluate our method by using ROC analysis, modular analysis, jackknife analysis, overlapping analysis, top analysis, and accuracy analysis. The results show that the performance of JDC is better than that of DC, IC, EC, SC, BC, CC, NC, PeC, and WDC. We also compare JDC with the NF-PIN and TS-PIN methods, which predict essential proteins through active PPI networks constructed from dynamic gene expression. CONCLUSIONS: We demonstrate that the new centrality measure, JDC, is more efficient than state-of-the-art prediction methods with the same input. The main ideas behind JDC are as follows: (1) Essential proteins generally form densely connected clusters in the PPI network. (2) Binarizing gene expression data can screen out fluctuations in gene expression profiles. (3) The essentiality of a protein depends on the similarity of the "active" and "inactive" states of gene expression in a cluster of the PPI network.
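
The three ideas listed in this abstract can be sketched in a few lines. This is an illustrative approximation, not the published JDC formula: the per-gene mean is used as a stand-in for the paper's dynamic threshold, and the score is simply the sum of Jaccard similarities between a protein's binarized profile and those of its neighbors. All function names are hypothetical.

```python
from statistics import mean


def binarize(profile):
    """Mark each time point as active (1) or inactive (0).

    Assumption: the per-gene mean is the threshold; the JDC paper defines
    its own dynamic threshold, which is not reproduced here.
    """
    threshold = mean(profile)
    return [1 if value > threshold else 0 for value in profile]


def jaccard(a, b):
    """Jaccard index of two equal-length 0/1 vectors (0.0 if both are all zero)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def jdc_like_scores(graph, expression):
    """Sum, over each protein's neighbors, the similarity of binarized profiles.

    `graph` maps a protein to the set of its interaction partners and
    `expression` maps a protein to its expression time series.
    """
    active = {p: binarize(profile) for p, profile in expression.items()}
    return {
        p: sum(jaccard(active[p], active[n]) for n in neighbors)
        for p, neighbors in graph.items()
    }
```

A protein with many neighbors that switch on and off together with it receives a high score, which mirrors points (1) and (3) of the abstract.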


Subject(s)
Protein Interaction Maps , Saccharomyces cerevisiae Proteins , Algorithms , Computational Biology , Protein Interaction Mapping , ROC Curve , Saccharomyces cerevisiae Proteins/metabolism , Transcriptome
6.
J Comput Biol ; 28(7): 637-649, 2021 07.
Article in English | MEDLINE | ID: mdl-33439753

ABSTRACT

Essential proteins possess critical functions for cell survival. Identifying essential proteins improves our understanding of how a cell works and also plays a vital role in the research fields of disease treatment and drug development. Recently, some machine-learning methods and ensemble learning methods have been proposed to identify essential proteins by introducing effective protein features. However, existing ensemble learning methods focus only on the choice of base classifiers. In this article, we propose a novel ensemble learning framework called multi-ensemble to integrate different base classifiers. The multi-ensemble method adopts the idea of multi-view learning: it selects multiple base classifiers and trains them by continually adding the samples that are predicted correctly by the other base classifiers. We applied multi-ensemble to yeast data and Escherichia coli data. The results show that our approach achieved better performance than both individual classifiers and the other ensemble learning methods.


Subject(s)
Computational Biology/methods , Escherichia coli/metabolism , Proteins/analysis , Yeasts/metabolism , Algorithms , Escherichia coli Proteins/metabolism , Fungal Proteins/metabolism , Genes, Essential , Machine Learning
7.
Brief Bioinform ; 22(2): 1729-1750, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32118252

ABSTRACT

Proteins are dominant executors of living processes. Compared to genetic variations, changes in the molecular structure and state of a protein (i.e. proteoforms) are more directly related to pathological changes in diseases. Characterizing proteoforms involves identifying and locating primary structure alterations (PSAs) in proteoforms, which is of practical importance for the advancement of the medical profession. With the development of mass spectrometry (MS) technology, the characterization of proteoforms based on top-down MS technology has become possible. This type of method is relatively new and faces many challenges. Since the proteoform identification is the most important process in characterizing proteoforms, we comprehensively review the existing proteoform identification methods in this study. Before identifying proteoforms, the spectra need to be preprocessed, and protein sequence databases can be filtered to speed up the identification. Therefore, we also summarize some popular deconvolution algorithms, various filtering algorithms for improving the proteoform identification performance and various scoring methods for localizing proteoforms. Moreover, commonly used methods were evaluated and compared in this review. We believe our review could help researchers better understand the current state of the development in this field and design new efficient algorithms for the proteoform characterization.


Subject(s)
Mass Spectrometry/methods , Proteins/chemistry , Algorithms , Amino Acid Sequence , Databases, Protein
8.
Mol Genet Genomics ; 296(1): 223-233, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33159254

ABSTRACT

Circular RNAs (circRNAs) are a special class of non-coding RNAs with covalently closed-loop structures. Studies prove that circRNAs perform critical roles in various biological processes, and the aberrant expression of circRNAs is closely related to tumorigenesis. Therefore, identifying potential circRNA-disease associations is beneficial to understand the pathogenesis of complex diseases at the circRNA level and helps biomedical researchers and practitioners to discover diagnostic biomarkers accurately. However, it is tremendously laborious and time-consuming to discover disease-related circRNAs with conventional biological experiments. In this study, we develop an integrative framework, called iCDA-CMG, to predict potential associations between circRNAs and diseases. By incorporating multi-source prior knowledge, including known circRNA-disease associations, disease similarities and circRNA similarities, we adopt a collective matrix completion-based graph learning model to prioritize the most promising disease-related circRNAs for guiding laborious clinical trials. The results show that iCDA-CMG outperforms other state-of-the-art models in terms of cross-validation and independent prediction. Moreover, the case studies for several representative cancers suggest the effectiveness of iCDA-CMG in screening circRNA candidates for human diseases, which will contribute to elucidating the pathogenesis mechanisms and unveiling new opportunities for disease diagnosis and targeted therapy.


Subject(s)
Algorithms , Models, Statistical , Neoplasms/genetics , RNA, Circular/genetics , RNA, Neoplasm/genetics , Computational Biology/methods , Datasets as Topic , Humans , Models, Genetic , Neoplasms/classification , Neoplasms/diagnosis , Neoplasms/pathology , RNA, Circular/metabolism , RNA, Neoplasm/metabolism , Research Design
9.
Genomics ; 112(5): 3407-3415, 2020 09.
Article in English | MEDLINE | ID: mdl-32561349

ABSTRACT

Circular RNAs (circRNAs) have been shown to be implicated in various pathological processes and to play vital roles in tumors. Increasing evidence has shown that circRNAs can serve as an important class of regulators, which have great potential to become a new type of biomarker for tumor diagnosis and treatment. However, their biological functions remain largely unknown, and it is costly and tremendously laborious to investigate the molecular mechanisms of circRNAs in human diseases based on conventional wet-lab experiments. The emergence and rapid growth of genomics data sources has provided new opportunities for us to decipher the underlying relationships between circRNAs and diseases by computational models. Therefore, it is appealing to develop powerful computational models to discover potential disease-associated circRNAs. Here, we develop an in-silico method with graph-based multi-label learning for large-scale prediction of potential circRNA-disease associations and discovery of the most promising disease circRNAs. To fully exploit the different characteristics of circRNA space and disease space and to maintain the local geometric structures of the data, graph regularization and mixed-norm constraint terms are also incorporated into the model to help make predictions. Results and case studies show that the proposed method outperforms other models and could effectively infer potential associations with high accuracy.


Subject(s)
Computer Simulation , Disease/genetics , RNA, Circular , Algorithms , Animals , Computational Biology/methods , Humans , Mice , Rats
10.
Genes (Basel) ; 11(2)2020 01 31.
Article in English | MEDLINE | ID: mdl-32023848

ABSTRACT

Essential genes are a group of genes that are indispensable for cell survival and cell fertility. Studying human essential genes not only helps scientists reveal the underlying biological mechanisms of a human cell but also guides disease treatment. Recently, the publication of human essential gene data has made it possible for researchers to train a machine-learning classifier by using some features of the known human essential genes and to use the classifier to predict new human essential genes. Previous studies have found that the essentiality of genes closely relates to their properties in the protein-protein interaction (PPI) network. In this work, we propose a novel supervised method to predict human essential genes by embedding the PPI network. Our approach implements a biased random walk on the network to get the network context of each node. Then, the node pairs are input into an artificial neural network to learn representation vectors that maximally preserve the network structure and the properties of the nodes in the network. Finally, the features are put into an SVM classifier to predict human essential genes. The prediction results on two human PPI networks show that our method achieves better performance than methods that use either genes' sequence information or genes' centrality properties in the network as input features. Moreover, it also outperforms methods that represent the PPI network by other previous approaches.
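
A biased random walk of the kind mentioned in this abstract can be sketched as follows. The sketch assumes a node2vec-style second-order walk with a return parameter `p` and an in-out parameter `q`; the abstract does not state which bias scheme the authors use, so the weighting rule and all names here are assumptions.

```python
import random


def biased_walk(graph, start, length, p, q, rng):
    """Generate one walk of up to `length` nodes over `graph`.

    `graph` maps each node to the set of its neighbors. The step weights
    follow the node2vec convention (an assumption): 1/p to return to the
    previous node, 1 to move to a neighbor of the previous node, and 1/q
    to move further away.
    """
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = sorted(graph[current])
        if not neighbors:
            break  # dead end: stop early
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step is unbiased
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbors:
            if candidate == previous:
                weights.append(1.0 / p)
            elif candidate in graph[previous]:
                weights.append(1.0)
            else:
                weights.append(1.0 / q)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk
```

Walks collected from every node would then supply the (node, context) pairs that the abstract feeds into the neural network.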


Subject(s)
Computational Biology/methods , Genes, Essential , Protein Interaction Maps , Databases, Genetic , Humans , Supervised Machine Learning
11.
IEEE Trans Nanobioscience ; 17(3): 243-250, 2018 07.
Article in English | MEDLINE | ID: mdl-29993553

ABSTRACT

Essential proteins, as a vital part of maintaining cell life, play an important role in the study of biology and drug design. With the generation of large amounts of biological data related to essential proteins, an increasing number of computational methods have been proposed. Unlike methods that adopt a single machine learning method or an ensemble machine learning method, this paper proposes a prediction framework named XGBFEMF for identifying essential proteins. It includes a SUB-EXPAND-SHRINK method for constructing composite features from the original features and obtaining a better subset of features for essential protein prediction, and a model fusion method for obtaining a more effective prediction model. We carry out experiments on yeast data to assess the performance of XGBFEMF with ROC analysis, accuracy analysis, and top analysis. In addition, we set up experiments on E. coli data to validate the performance. The test results show that the XGBFEMF framework can effectively improve many essential indicators. Finally, we analyze each step in the XGBFEMF framework; our results show that each step of the SUB-EXPAND-SHRINK method, as well as the multi-model fusion step, can improve prediction performance.


Subject(s)
Computational Biology/methods , Protein Interaction Mapping/methods , Proteins , Algorithms , Databases, Protein , Proteins/classification , Proteins/physiology , Software
12.
Methods ; 124: 69-77, 2017 07 15.
Article in English | MEDLINE | ID: mdl-28576328

ABSTRACT

MicroRNAs have been reported to be closely related to diseases due to their deregulation of the expression of target mRNAs. Detecting disease-related microRNAs is helpful for disease therapies. With the development of high-throughput experimental techniques, a large number of microRNAs have been sequenced. However, it is still a big challenge to identify which microRNAs are related to diseases. Recently, researchers have become interested in combining multiple kinds of biological information to identify the associations between microRNAs and diseases. In this work, we have proposed a novel method to predict microRNA-disease associations based on four biological properties: microRNA, disease, gene, and environment factor. Compared with previous methods, our method makes predictions not only by using the prior knowledge of associations among microRNAs, diseases, environment factors, and genes, but also by using the internal relationships among these biological properties. We constructed four biological networks based on the similarity of microRNAs, diseases, environment factors, and genes, respectively. Then random walks were performed on the four networks unequally. During a walk, the associations can be inferred from the neighbors in the same network. Meanwhile, the association information can be transferred from one network to another. The experimental results showed that our method achieved better prediction performance than other existing state-of-the-art methods.
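
The network propagation idea in this abstract can be illustrated on a single network with a random walk with restart. This is a simplified sketch: the published method walks unequally over four coupled networks, which is not reproduced here, and the function name, the `restart` value, and the handling of isolated nodes are assumptions.

```python
def random_walk_with_restart(graph, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Propagate association scores from `seeds` over one network.

    `graph` maps each node to its neighbors. At every step a walker jumps
    back to a seed with probability `restart`, otherwise it moves to a
    uniformly chosen neighbor. Returns the (approximately) stationary
    probability of each node.
    """
    nodes = sorted(graph)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    current = dict(start)
    for _ in range(max_iter):
        updated = {n: restart * start[n] for n in nodes}
        for node in nodes:
            neighbors = graph[node]
            if not neighbors:
                # Assumption: an isolated node keeps its own probability mass.
                updated[node] += (1.0 - restart) * current[node]
                continue
            share = (1.0 - restart) * current[node] / len(neighbors)
            for neighbor in neighbors:
                updated[neighbor] += share
        if max(abs(updated[n] - current[n]) for n in nodes) < tol:
            return updated
        current = updated
    return current
```

Nodes that end up with high probability are the candidates most closely connected to the seed set, which is how associations are "inferred from the neighbors" in the abstract's wording.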


Subject(s)
Algorithms , Cardiovascular Diseases/genetics , Gene Regulatory Networks , MicroRNAs/genetics , Neoplasms/genetics , RNA, Messenger/genetics , Schizophrenia/genetics , Area Under Curve , Cardiovascular Diseases/metabolism , Cardiovascular Diseases/pathology , Databases, Genetic , Gene Expression Regulation , Gene-Environment Interaction , Humans , MicroRNAs/metabolism , Neoplasms/metabolism , Neoplasms/pathology , RNA, Messenger/metabolism , Risk Factors , Schizophrenia/metabolism , Schizophrenia/pathology , Signal Transduction
13.
IEEE/ACM Trans Comput Biol Bioinform ; 14(6): 1399-1409, 2017.
Article in English | MEDLINE | ID: mdl-28113634

ABSTRACT

Since proteins are digested into a mixture of peptides in the preprocessing step of tandem mass spectrometry (MS), it is difficult to determine which specific protein a shared peptide belongs to. In recent studies, besides tandem MS data and peptide identification information, some other information has been exploited to infer proteins. Unlike methods that first use only tandem MS data to infer proteins and then use network information to refine them, this study proposes a protein inference method named TMSIN, which uses interactome networks directly. As two interacting proteins should co-exist, it is reasonable to assume that if one of the interacting proteins is confidently inferred in a sample, its interacting partners should also have a high probability of being present in the same sample. Therefore, we can use the neighborhood information of a protein in an interactome network to adjust the probability that the shared peptide belongs to the protein. In TMSIN, a multi-weighted graph is constructed by incorporating the bipartite graph with interactome network information, where the bipartite graph is built with the peptide identification information. Based on multi-weighted graphs, TMSIN adopts an iterative workflow to infer proteins. At each iterative step, the probability that a shared peptide belongs to a specific protein is calculated by using Bayes' rule based on the neighbor protein support scores of each protein to which the shared peptide maps. We carried out experiments on yeast data and human data to evaluate the performance of TMSIN in terms of ROC, q-value, and accuracy. The experimental results show that the AUC scores yielded by TMSIN are 0.742 and 0.874 on the yeast dataset and the human dataset, respectively, and that TMSIN yields the maximum number of true positives when the q-value is less than or equal to 0.05. The overlap analysis shows that TMSIN is an effective complementary approach for protein inference.


Subject(s)
Computational Biology/methods , Protein Interaction Mapping/methods , Proteins/chemistry , Tandem Mass Spectrometry/methods , Algorithms , Area Under Curve , Databases, Protein , Humans , Peptides/analysis , Peptides/chemistry , Proteins/analysis , Yeasts/genetics , Yeasts/metabolism
14.
Bioinformatics ; 32(12): 1788-96, 2016 06 15.
Article in English | MEDLINE | ID: mdl-26833342

ABSTRACT

MOTIVATION: Advances in next-generation sequencing technologies and the availability of short read data enable the detection of structural variations (SVs). Deletions, an important type of SV, have been suggested to be associated with genetic diseases. There are three types of deletions: blunt deletions, deletions with microhomologies and deletions with microinsertions. The last two types are very common in the human genome, but they are difficult to detect. Furthermore, finding deletions from sequencing data remains challenging. It is highly appealing to develop sensitive and accurate methods to detect deletions from sequencing data, especially deletions with microhomology and deletions with microinsertion. RESULTS: We present a novel method called Sprites (SPlit Read re-alIgnment To dEtect Structural variants) which finds deletions from sequencing data. It aligns a whole soft-clipping read rather than its clipped part to the target sequence, a segment of the reference which is determined by spanning reads, in order to find the longest prefix or suffix of the read that has a match in the target sequence. This alignment aims to solve the problem of deletions with microhomologies and deletions with microinsertions. Using both simulated and real data we show that Sprites performs better on detecting deletions compared with other current methods in terms of F-score. AVAILABILITY AND IMPLEMENTATION: Sprites is open source software and freely available at https://github.com/zhangzhen/sprites CONTACT: jxwang@mail.csu.edu.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
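
The core search described in this abstract, finding the longest prefix or suffix of a read that matches somewhere in the target sequence, can be sketched with exact string matching. This is a deliberately naive illustration: the actual Sprites implementation and its alignment scoring are not reproduced, and the function names are hypothetical.

```python
def longest_matching_prefix(read, target):
    """Return (length, position) of the longest read prefix found in `target`.

    Tries prefixes from longest to shortest and reports the first exact
    occurrence; returns (0, -1) when not even one base matches.
    """
    for length in range(len(read), 0, -1):
        position = target.find(read[:length])
        if position != -1:
            return length, position
    return 0, -1


def longest_matching_suffix(read, target):
    """Return (length, position) of the longest read suffix found in `target`."""
    for length in range(len(read), 0, -1):
        position = target.find(read[-length:])
        if position != -1:
            return length, position
    return 0, -1
```

For a read that spans a deletion breakpoint, the matched prefix (or suffix) marks where the read stops agreeing with the target segment, which is the position a deletion caller would inspect next.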


Subject(s)
High-Throughput Nucleotide Sequencing , Genome, Human , Humans , Sequence Deletion , Software
15.
Article in English | MEDLINE | ID: mdl-26357321

ABSTRACT

Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial to uncover the structure of biological networks. In this paper, ClusterViz, an app for Cytoscape 3 for cluster analysis and visualization, has been developed. In order to reduce complexity and enable extensibility, we designed the architecture of ClusterViz based on the Open Services Gateway Initiative framework. According to the architecture, the implementation of ClusterViz is partitioned into three modules: the ClusterViz interface, clustering algorithms, and visualization and export. ClusterViz facilitates the comparison of the results of different algorithms for further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Because the clustering algorithms module adopts an abstract algorithm interface, more clustering algorithms can be included in the future. To illustrate the usability of ClusterViz, we provide three examples with detailed steps from important scientific articles, which show that our tool has helped several research teams in their work on the mechanisms of biological networks.


Subject(s)
Cluster Analysis , Computational Biology/methods , Software , Algorithms , User-Computer Interface
16.
Curr Protein Pept Sci ; 15(6): 529-39, 2014.
Article in English | MEDLINE | ID: mdl-25059324

ABSTRACT

Accurate annotation of protein functions is still a big challenge for understanding life in the post-genomic era. Recently, some methods have been developed to solve the problem by incorporating functional similarity of GO terms into the protein-protein interaction (PPI) network. They are based on the observations that a protein tends to share some common functions with proteins that interact with it in the PPI network, and that two similar GO terms in the functional interrelationship network usually co-annotate some common proteins. However, these methods annotate functions of proteins by considering neighbors of proteins and neighbors of GO terms at the same level, and few attempts have been made to investigate their difference. Given the topological and structural difference between the PPI network and the functional interrelationship network, we first investigate at which level neighbors of proteins tend to have functional associations and at which level neighbors of GO terms usually co-annotate some common proteins. Then, an unbalanced Bi-random walk (UBiRW) algorithm, which iteratively walks a different number of steps in the two networks, is adopted to find protein-GO term associations according to some known associations. Experiments are carried out on S. cerevisiae data. The results show that our method achieves better prediction performance not only than methods that only use PPI network data, but also than methods that consider neighbors of proteins and of GO terms at the same level.


Subject(s)
Algorithms , Protein Interaction Mapping/methods , Protein Interaction Maps , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism
17.
Article in English | MEDLINE | ID: mdl-26355787

ABSTRACT

Essential proteins are vital for an organism's viability under a variety of conditions. Many experimental and computational methods have been developed to identify essential proteins. Computational prediction of essential proteins based on the global protein-protein interaction (PPI) network is severely restricted because of the insufficiency of the PPI data, but fortunately gene expression profiles help to make up for the deficiency. In this work, the Pearson correlation coefficient (PCC) is used to bridge the gap between PPI and gene expression data. Based on PCC and the edge clustering coefficient (ECC), a new centrality measure, the weighted degree centrality (WDC), is developed to achieve reliable prediction of essential proteins. WDC is employed to identify essential proteins in the yeast and E. coli PPI networks in order to estimate its performance. For comparison, other prediction methods are also applied to identify essential proteins. Several evaluation methods are used to analyze the results from the various prediction approaches. The prediction results and comparative analyses are shown in the paper. Furthermore, the parameter λ in the WDC method is analyzed in detail and an optimal λ value is found. Based on the optimal λ value, the difference between WDC and another prediction method, PeC, is discussed. The analyses show that WDC outperforms other methods including DC, BC, CC, SC, EC, IC, NC, and PeC. At the same time, the analyses also indicate that integrating different data sources is an effective way to predict essential proteins.
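
How PCC, ECC and the parameter λ might combine into a weighted degree can be sketched as below. The sketch assumes that each edge is weighted by λ·ECC + (1 − λ)·PCC and that a protein's score is the sum of the weights of its edges; the abstract does not give the formula, so this weighting, the ECC denominator, and all names are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y) if spread_x and spread_y else 0.0


def edge_clustering_coefficient(graph, u, v):
    """Triangles through edge (u, v) divided by the maximum possible number.

    Assumption: the denominator is min(deg(u) - 1, deg(v) - 1).
    """
    triangles = len(graph[u] & graph[v])
    possible = min(len(graph[u]) - 1, len(graph[v]) - 1)
    return triangles / possible if possible > 0 else 0.0


def wdc_like_scores(graph, expression, lam):
    """Sum lam * ECC + (1 - lam) * PCC over every edge of each protein."""
    return {
        u: sum(
            lam * edge_clustering_coefficient(graph, u, v)
            + (1.0 - lam) * pearson(expression[u], expression[v])
            for v in neighbors
        )
        for u, neighbors in graph.items()
    }
```

Setting `lam` to 1 reduces the score to a purely topological measure and setting it to 0 uses co-expression only, which is why the abstract devotes a separate analysis to choosing λ.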


Subject(s)
Computational Biology/methods , Protein Interaction Maps/genetics , Proteins/chemistry , Proteins/metabolism , Transcriptome/genetics , Cluster Analysis , Proteins/genetics , ROC Curve
18.
BMC Genomics ; 14 Suppl 4: S7, 2013.
Article in English | MEDLINE | ID: mdl-24267033

ABSTRACT

BACKGROUND: Essential proteins are indispensable for cell survival. Identifying essential proteins is very important for improving our understanding of how a cell works. There are various types of features related to the essentiality of proteins. Many methods have been proposed to combine some of them to predict essential proteins. However, it is still a big challenge to design an effective method that predicts them by integrating different features and explains how the selected features determine the essentiality of a protein. Gene expression programming (GEP) is a learning algorithm that learns relationships between variables in sets of data and then builds models to explain these relationships. RESULTS: In this work, we propose a GEP-based method to predict essential proteins by combining some biological features and topological features. We carry out experiments on S. cerevisiae data. The experimental results show that our method achieves better prediction performance than methods using individual features. Moreover, our method outperforms some machine learning methods and performs as well as a method obtained by combining the outputs of eight machine learning methods. CONCLUSIONS: The accuracy of predicting essential proteins can be improved by using the GEP method to combine some topological features and biological features.


Subject(s)
Artificial Intelligence , Genes, Essential , Proteins/metabolism , Software , Algorithms , Cell Survival/genetics , Computational Biology/methods , Gene Expression , Models, Genetic , Saccharomyces cerevisiae , Saccharomyces cerevisiae Proteins/metabolism