1.
Article in Chinese | WPRIM (Western Pacific) | ID: wpr-1015620

ABSTRACT

DNA double-strand breaks (DSBs) are a serious form of DNA damage in cells and are closely associated with a variety of genomic instability disorders, including cancer, abnormal recombination, and defects in neuronal development. Because of cost and technical barriers, high-resolution DSB mapping by high-throughput sequencing remains very limited, which hinders our understanding of DSBs in the genomes of different species. We therefore developed classification models based on random forest (RF), support vector machine (SVM), and logistic regression (LR) classifiers to predict DSB loci across the whole genome of human NHEK cells. In addition to the epigenetic and DNA shape features commonly used in previous prediction studies, we found that DNA sequence features (k-mer frequency, GC content, GC skew, and mutual information) can also characterize DSB sites. Prediction accuracy improved further when DNA physical properties, chemical shifts, and autocorrelation information were taken into account. With all of the above features combined, LR gave the best prediction performance (AUC = 0.97), comparable to a previously reported result (AUC = 0.964). In addition, an optimal feature set of 294 features was obtained by incremental feature search, with a corresponding AUC of 0.974.
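
As a rough illustration of the DNA sequence features named in the abstract, the sketch below computes GC content, GC skew, and k-mer frequencies for a single sequence window. It is not the authors' pipeline: the function names, the default k = 2, the overlapping-window counting, and the convention GC skew = (G - C) / (G + C) are assumptions made for this example, and mutual information is left out.

```python
from collections import Counter
from itertools import product


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C); returns 0.0 when the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def kmer_frequencies(seq: str, k: int = 2) -> dict:
    """Relative frequency of every k-mer over A, C, G, T.

    Overlapping windows are counted; windows containing other symbols
    (for example N) are ignored. The default k is an assumption.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def sequence_features(seq: str, k: int = 2) -> list:
    """Concatenate GC content, GC skew, and k-mer frequencies into one vector."""
    freqs = kmer_frequencies(seq, k)
    return [gc_content(seq), gc_skew(seq)] + [freqs[m] for m in sorted(freqs)]
```

For k = 2 the resulting vector has 18 entries (2 scalar features plus 16 dinucleotide frequencies). A vector of this kind could then be passed to any of the classifiers mentioned above.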

2.
J Bioinform Comput Biol; 16(1): 1840003, 2018 02.
Article in English | MEDLINE | ID: mdl-29382253

ABSTRACT

Predicting the promoter activity of a DNA fragment is an important task in computational biology. Approaches that use physical properties of DNA to predict bacterial promoters have recently gained a lot of attention. To select an adequate set of physical properties for training a classifier, various characteristics of the DNA molecule should be taken into consideration. Here, we present a systematic approach that allows us to select less correlated properties for classification by means of both correlation and cophenetic coefficients as well as concordance matrices. To prove this concept, we have developed the first classifier that uses not only sequence and static physical properties of a DNA fragment, but also dynamic properties of DNA open states. As a result, the best-performing models reached accuracy values of up to 90% for all types of sequences. Furthermore, we have demonstrated that the classifier can serve as a reliable tool for distinguishing promoter DNA fragments from promoter islands despite the similarity of their nucleotide sequences.
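
The idea of keeping only weakly correlated properties can be sketched with a simple greedy filter. This is a deliberate simplification, not the published method: it uses only the Pearson correlation coefficient (the abstract also mentions cophenetic coefficients and concordance matrices), and the function names, the 0.8 threshold, and the input layout are assumptions made for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists.

    Returns 0.0 if either list is constant (correlation undefined).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_less_correlated(columns, threshold=0.8):
    """Greedily keep properties weakly correlated with those already kept.

    columns: dict mapping a property name to its list of values,
             one value per DNA fragment (hypothetical layout).
    threshold: assumed cut-off on |r|; a property is dropped if its
               absolute correlation with any kept property reaches it.
    Properties are visited in the dict's insertion order.
    """
    selected = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[s])) < threshold for s in selected):
            selected.append(name)
    return selected
```

Because the filter is greedy, the result depends on the order in which properties are visited; placing the properties considered most informative first is one way to make that choice explicit.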


Subject(s)
Computational Biology/methods, DNA, Bacterial/classification, Escherichia coli K12/genetics, Promoter Regions, Genetic, DNA, Bacterial/chemistry, DNA, Bacterial/genetics, Genome, Bacterial, Static Electricity