
LRT-CLUSTER: A New Clustering Algorithm Based on Likelihood Ratio Test to Identify Driving Genes.

Quan, Chenxu; Liu, Fenghui; Qi, Lin; Tie, Yun.

Interdiscip Sci ; 15(2): 217-230, 2023 Jun.

Article in English | MEDLINE | ID: mdl-36848004

ABSTRACT

Somatic mutations often occur at highly recurrent sites in protein sequences, which suggests that the positional clustering of somatic missense mutations can be used to identify driver genes. However, traditional clustering algorithms over-fit the background signal, are poorly suited to mutation data, and leave room for improvement in identifying genes with low-frequency mutations. In this paper, we propose a linear clustering algorithm based on the likelihood ratio test to identify driver genes. First, the polynucleotide mutation rate is calculated based on prior knowledge from the likelihood ratio test. Then, a simulated data set is obtained from the background mutation rate model. Finally, an unsupervised peak clustering algorithm is applied separately to the somatic mutation data and the simulated data to identify driver genes. The experimental results show that our method achieves a better balance of precision and sensitivity. It can also identify driver genes missed by other methods, making it an effective complement to them. We also discover some potential linkages between genes, and between genes and mutation sites, which are of great value for targeted drug therapy research.

Method framework: Our proposed model framework is as follows.
a. The mutation sites and the number of mutations in tumor gene elements are counted.
b. The nucleotide context mutation frequency is counted based on the likelihood ratio test, and the background mutation rate model is obtained.
c. Using Monte Carlo simulation, data sets with the same number of mutations as the gene elements are randomly sampled to obtain simulated mutation data; the sampling frequency of each mutation site is related to the polynucleotide mutation rate.
d. The original mutation data and the randomly reconstructed simulated mutation data are each clustered by peak density, and the corresponding clustering scores are obtained.
e. Through step d, the clustering statistics and the score of each gene segment are obtained from the original single-nucleotide mutation data.
f. From the observed score and the simulated clustering scores, the p-value of the corresponding gene segment is calculated.
g. Through step d, the clustering statistics and the score of each gene segment are obtained from the simulated single-nucleotide mutation data.
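
Steps c to g can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the per-site rate dictionary, and the `window` parameter are hypothetical, and a simple neighbour-count score stands in for the peak density clustering score, whose exact form the abstract does not give. The sketch only shows how an empirical p-value can be obtained by comparing an observed clustering score with scores from rate-weighted Monte Carlo resampling.

```python
import random
from collections import Counter


def clustering_score(positions, window=3):
    """Toy density score (an assumption, not the paper's score): for every
    mutation, count the other mutations within `window` positions, then sum."""
    counts = Counter(positions)
    score = 0
    for pos, n in counts.items():
        nearby = sum(counts.get(p, 0) for p in range(pos - window, pos + window + 1))
        score += n * (nearby - 1)  # exclude the mutation itself
    return score


def simulate_positions(site_rates, n_mutations, rng):
    """Step c: draw as many mutations as were observed, with each site's
    sampling weight given by its background mutation rate."""
    sites = list(site_rates)
    weights = [site_rates[s] for s in sites]
    return rng.choices(sites, weights=weights, k=n_mutations)


def empirical_p_value(observed_positions, site_rates, n_sim=1000, seed=0):
    """Steps d to g: score the observed data, score each simulated data set,
    and report the fraction of simulations scoring at least as high."""
    rng = random.Random(seed)
    observed = clustering_score(observed_positions)
    hits = 0
    for _ in range(n_sim):
        simulated = simulate_positions(site_rates, len(observed_positions), rng)
        if clustering_score(simulated) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)  # add-one correction avoids p = 0


# Hypothetical gene segment of 200 sites with a uniform background rate.
uniform_rates = {site: 1.0 for site in range(1, 201)}
clustered = [49, 50, 50, 50, 51, 51, 52, 120]
scattered = [10, 25, 40, 70, 100, 130, 160, 190]

p_clustered = empirical_p_value(clustered, uniform_rates)
p_scattered = empirical_p_value(scattered, uniform_rates)
print(p_clustered, p_scattered)
```

In this toy setting the tightly clustered segment receives a small p-value, while the scattered segment, whose score is zero, receives a p-value of 1. A real background model would replace the uniform rates with context-dependent rates as described in step b.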

Subjects

Algorithms, Neoplasms, Humans, Likelihood Functions, Computer Simulation, Cluster Analysis, Mutation/genetics, Nucleotides, Neoplasms/genetics
