Search | VHL Regional Portal

Anti-CRISPRdb: a comprehensive online resource for anti-CRISPR proteins.

Dong, Chuan; Hao, Ge-Fei; Hua, Hong-Li; Liu, Shuo; Labena, Abraham Alemayehu; Chai, Guoshi; Huang, Jian; Rao, Nini; Guo, Feng-Biao.

Nucleic Acids Res ; 46(D1): D393-D398, 2018 01 04.

Article in English | MEDLINE | ID: mdl-29036676

ABSTRACT

CRISPR-Cas is a tool that is widely used for gene editing. However, unexpected off-target effects may occur as a result of long-term nuclease activity. Anti-CRISPR proteins, which are powerful molecules that inhibit the CRISPR-Cas system, may have the potential to promote better utilization of the CRISPR-Cas system in gene editing, especially for gene therapy. Additionally, more in-depth research on these proteins would help researchers to better understand the co-evolution of bacteria and phages. Therefore, it is necessary to collect and integrate data on various types of anti-CRISPRs. Herein, data on these proteins were gathered manually by screening the literature. Then, the first online resource, anti-CRISPRdb, was constructed to organize these proteins effectively. It contains the available protein sequences, DNA sequences, coding regions, source organisms, taxonomy, virulence, protein interactors and their corresponding three-dimensional structures. Users can access our database at http://cefg.uestc.edu.cn/anti-CRISPRdb/ without registration. We believe that the anti-CRISPRdb can be used as a resource to facilitate research on anti-CRISPR proteins and in related fields.

Subject(s)

Bacteriophages/physiology , CRISPR-Cas Systems , Databases, Protein , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Proteins/metabolism

An Approach for Predicting Essential Genes Using Multiple Homology Mapping and Machine Learning Algorithms.

Hua, Hong-Li; Zhang, Fa-Zhan; Labena, Abraham Alemayehu; Dong, Chuan; Jin, Yan-Ting; Guo, Feng-Biao.

Biomed Res Int ; 2016: 7639397, 2016.

Article in English | MEDLINE | ID: mdl-27660763

ABSTRACT

Investigating essential genes is important for understanding the minimal gene set of a cell and for discovering potential drug targets. In this study, a novel approach based on multiple homology mapping and machine learning was introduced to predict essential genes. We focused on 25 bacteria whose essential genes have been characterized. The highest area under the receiver operating characteristic (ROC) curve (AUC) was 0.9716 in a tenfold cross-validation test. Suitable features were used to construct models for making predictions in distantly related bacteria. Prediction accuracy was evaluated by the agreement between the predictions and the known essential genes of the target species. The highest AUC of 0.9552 and an average AUC of 0.8314 were achieved when making predictions across organisms. A recently released independent dataset from Synechococcus elongatus was obtained to further assess the performance of our model. The AUC of the predictions on this dataset was 0.7855, which is higher than that of other methods. This research shows that features obtained by homology mapping alone can achieve results comparable to, or even better than, those of integrated features. The work also indicates that a machine learning-based method can assign more effective weight coefficients than an empirical formula based on biological knowledge.
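
The AUC values quoted in the abstract above summarize how well a scoring model ranks essential genes ahead of non-essential ones. The following is a minimal sketch of that metric, not the authors' code: the function name `roc_auc` and the toy labels and scores are hypothetical, and the pairwise (Mann-Whitney) formulation is used here for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 values (1 = essential gene, by assumption here).
    scores: iterable of model scores, higher meaning "more likely essential".
    Returns the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a correct pair.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical toy data: two essential and two non-essential genes.
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.5, 0.1]
example_auc = roc_auc(example_labels, example_scores)  # 3 of 4 pairs ranked correctly
```

In a tenfold cross-validation setting, a value like this would typically be computed on each held-out fold and then averaged or pooled; which of the two the study used is not stated in the abstract.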

Combining pseudo dinucleotide composition with the Z curve method to improve the accuracy of predicting DNA elements: a case study in recombination spots.

Dong, Chuan; Yuan, Ya-Zhou; Zhang, Fa-Zhan; Hua, Hong-Li; Ye, Yuan-Nong; Labena, Abraham Alemayehu; Lin, Hao; Chen, Wei; Guo, Feng-Biao.

Mol Biosyst ; 12(9): 2893-900, 2016 08 16.

Article in English | MEDLINE | ID: mdl-27410247

ABSTRACT

Pseudo dinucleotide composition (PseDNC) and Z curve showed excellent performance in the classification of nucleotide sequences in bioinformatics. Inspired by the principle of Z curve theory, we improved PseDNC to give the phase-specific PseDNC (psPseDNC). In this study, we used the prediction of recombination spots as a case to illustrate the capability of psPseDNC and also PseDNC fused with Z curve theory based on a novel machine learning method named large margin distribution machine (LDM). We verified that combining the two widely used approaches performed better than using PseDNC alone with a support vector machine (SVM)-based model. The best Matthews correlation coefficient (MCC) achieved by our LDM-based model was 0.7037 through the rigorous jackknife test and improved by ~6.6%, ~3.2%, and ~2.4% compared with three previous studies. Similarly, the accuracy was improved by 3.2% compared with our previous iRSpot-PseDNC web server through an independent data test. These results demonstrate that the joint use of PseDNC and Z curve enhances performance and can extract more information from a biological sequence. To facilitate research in this area, we constructed a user-friendly web server for predicting hot/cold spots, HcsPredictor, which can be freely accessed from . In summary, we provided a united algorithm by integrating Z curve with PseDNC. We hope this united algorithm could be extended to other classification problems in DNA elements.
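
To make the idea of fusing Z curve variables with dinucleotide features concrete, here is a simplified sketch. It is an assumption-laden illustration, not the psPseDNC method of the paper: it uses the standard frequency-based Z curve components and plain dinucleotide frequencies, and it omits the physicochemical correlation terms of PseDNC and the phase-specific extension. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def z_curve_components(seq):
    """Frequency-based Z curve variables (x, y, z) of a DNA sequence.

    x: purine (A, G) versus pyrimidine (C, T)
    y: amino (A, C) versus keto (G, T)
    z: weak (A, T) versus strong (C, G) hydrogen bonding
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    a, c, g, t = (counts[b] / len(seq) for b in BASES)
    return ((a + g) - (c + t), (a + c) - (g + t), (a + t) - (c + g))


def dinucleotide_composition(seq):
    """Frequencies of the 16 overlapping dinucleotides, in AA, AC, ..., TT order."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two bases")
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    return [counts[x + y] / len(pairs) for x, y in product(BASES, repeat=2)]


def fused_features(seq):
    """Concatenate 3 Z curve variables with 16 dinucleotide frequencies."""
    return list(z_curve_components(seq)) + dinucleotide_composition(seq)


# Hypothetical toy sequence: all purines, so x is 1 and y, z are 0.
toy_features = fused_features("AAGG")
```

A vector like `toy_features` could then be passed to any classifier; the paper itself reports using a large margin distribution machine, which is not reproduced here.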

Subject(s)

Computational Biology/methods , DNA/chemistry , DNA/genetics , Nucleotides , Algorithms , Genome, Fungal , ROC Curve , Recombination, Genetic , Reproducibility of Results , Sensitivity and Specificity , Support Vector Machine , Web Browser
