Machine Learning-based Identification of the Sites of DNA Double-Strand-Break in the Human Genome / 中国生物化学与分子生物学报
Article in Zh | WPRIM | ID: wpr-1015620
Responsible library: WPRO
ABSTRACT
DNA double-strand breaks (DSBs) are a serious form of DNA damage in cells and are closely related to a variety of genomic instability diseases, including cancer, abnormal recombination, and neuronal development. Because of cost and technical barriers, high-resolution DSB maps produced by high-throughput sequencing remain very limited, which hinders our understanding of DSBs in the genomes of different species. We therefore developed classification models based on random forest (RF), support vector machine (SVM), and logistic regression (LR) classifiers to predict DSB sites across the whole genome of human NHEK cells. In addition to the epigenetic features and DNA shape features commonly used in previous prediction studies, we found that DNA sequence features (k-mer frequency, GC content, GC skew, mutual information) can also characterize DSB sites. Prediction accuracy improved further when DNA physical properties, chemical shifts, and autocorrelation information were taken into account. With all of these features combined, logistic regression (LR) gave the best prediction performance (AUC = 0.97), which is comparable to the previously reported result (AUC = 0.964). In addition, an optimal feature set of 294 features was obtained by incremental feature search, with a corresponding AUC of 0.974.
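As an illustration of the sequence features named in the abstract, the sketch below computes GC content, GC skew, and k-mer frequencies for a single DNA window. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the default k = 2, and the common GC-skew definition (G - C) / (G + C) are assumptions, since the abstract does not specify them.

```python
from collections import Counter
from itertools import product


def gc_content(seq: str) -> float:
    """Fraction of G and C among the A/C/G/T bases of the sequence."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / total if total else 0.0


def gc_skew(seq: str) -> float:
    """Assumed definition: (G - C) / (G + C); 0.0 if there is no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def kmer_frequencies(seq: str, k: int = 2) -> dict:
    """Relative frequency of each of the 4**k k-mers, using overlapping windows.

    Windows containing characters other than A/C/G/T (e.g. N) are skipped.
    """
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter(w for w in windows if set(w) <= set("ACGT"))
    total = sum(counts.values())
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def sequence_features(seq: str, k: int = 2) -> dict:
    """Collect the features above into one flat dictionary (hypothetical layout)."""
    features = {"gc_content": gc_content(seq), "gc_skew": gc_skew(seq)}
    for kmer, freq in kmer_frequencies(seq, k).items():
        features[f"kmer_{kmer}"] = freq
    return features
```

Calling `sequence_features(window, k=2)` on each candidate genomic window would yield one row of a feature matrix that could then be passed to a classifier such as RF, SVM, or LR.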
Full text: 1 Index: WPRIM Language: Zh Journal: Chinese Journal of Biochemistry and Molecular Biology Publication year: 2023 Document type: Article