Using amino acid features to identify the pathogenicity of influenza B virus.
Kou, Zheng; Fan, Xinyue; Li, Junjie; Shao, Zehui; Qiang, Xiaoli.
  • Kou Z; Institute of Computing Science and Technology, Guangzhou University, Guangzhou, 510006, China. kouzhengcn@foxmail.com.
  • Fan X; Institute of Computing Science and Technology, Guangzhou University, Guangzhou, 510006, China.
  • Li J; Institute of Computing Science and Technology, Guangzhou University, Guangzhou, 510006, China.
  • Shao Z; Institute of Computing Science and Technology, Guangzhou University, Guangzhou, 510006, China.
  • Qiang X; School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, 510006, China. qiangxl@gzhu.edu.cn.
Infect Dis Poverty ; 11(1): 50, 2022 May 04.
Article in English | MEDLINE | ID: covidwho-1883543
ABSTRACT

BACKGROUND:

Influenza B virus can cause highly pathogenic epidemics and therefore poses a serious threat to public health. This paper proposes a feature representation algorithm to identify the pathogenicity phenotype of influenza B virus.

METHODS:

The dataset included all 11 influenza virus proteins encoded by the eight genome segments of 1724 strains. Two types of features were used hierarchically to build the prediction model. Amino acid features were derived directly from 67 feature descriptors and fed into random forest classifiers, which output informative features in the form of a predicted class label and a prediction probability. The sequential forward search strategy was used to optimize these informative features. The final features for each strain were low-dimensional and combined knowledge from different perspectives; they were used to build the machine learning model for pathogenicity identification.
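
The sequential forward search step can be illustrated with a minimal greedy sketch. It starts from an empty feature set and repeatedly adds the single feature that most improves a score, stopping when no candidate helps. The function name `sequential_forward_search`, the `score` callback, and the toy weights below are hypothetical. In the setting described above, the score would presumably be a validation metric of a classifier trained on the candidate subset of informative features, but the abstract does not specify the exact criterion.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def sequential_forward_search(
    n_features: int,
    score: Callable[[Sequence[int]], float],
    max_features: Optional[int] = None,
) -> Tuple[List[int], float]:
    """Greedy sequential forward selection over feature indices 0..n_features-1.

    `score` is a hypothetical, caller-supplied function that rates a subset
    (e.g. cross-validated accuracy of a model trained on those features).
    """
    limit = n_features if max_features is None else min(max_features, n_features)
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < limit:
        candidates = [i for i in range(n_features) if i not in selected]
        # Score the current subset extended by each remaining candidate.
        cand_score, cand = max((score(selected + [i]), i) for i in candidates)
        if cand_score <= best_score:
            break  # no single addition improves the subset, so stop early
        selected.append(cand)
        best_score = cand_score
    return selected, best_score


# Toy usage with invented per-feature weights and a penalty of 1 per feature.
weights = [5, 0, 3, -1]
subset, best = sequential_forward_search(
    len(weights), lambda s: sum(weights[i] for i in s) - len(s)
)
print(subset, best)  # [0, 2] 6
```

The greedy search evaluates O(n²) subsets in the worst case instead of all 2ⁿ, which is what makes it practical when each evaluation means training a classifier.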

RESULTS:

Forty signature positions were identified by entropy screening. Mutations at position 135 of the hemagglutinin protein had the highest entropy value (1.06). After the informative features were generated directly from the 67 random forest models, the optimal dimensions of the class and probabilistic features were 4 and 3, respectively. The optimal class features achieved a maximum accuracy of 94.2% and a maximum Matthews correlation coefficient of 88.4%, while the optimal probabilistic features achieved a maximum accuracy of 94.1% and a maximum Matthews correlation coefficient of 88.2%. The optimized features outperformed both the original informative features and the amino acid features from individual descriptors. The sequential forward search strategy performed better than the classical ensemble method.
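
The entropy screening and the two reported metrics can be sketched as follows. The sketch assumes Shannon entropy over the residues observed at each aligned position, using the natural logarithm. The abstract states neither the log base nor the screening threshold, so it is unknown whether the reported 1.06 was computed this way. The Matthews correlation coefficient is computed as a value in [-1, 1]; the percentages above correspond to that value multiplied by 100. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def position_entropy(residues: Sequence[str], base: float = math.e) -> float:
    """Shannon entropy of the residues at one alignment position (natural log by default)."""
    counts = Counter(residues)
    total = len(residues)
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())


def rank_positions(alignment: Sequence[str], k: int) -> List[Tuple[int, float]]:
    """Return the k most variable positions as (1-based position, entropy), highest first.

    Assumes all sequences in `alignment` are already aligned to equal length.
    """
    length = len(alignment[0])
    scores = [
        (pos + 1, position_entropy([seq[pos] for seq in alignment]))
        for pos in range(length)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:k]


def accuracy_and_mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and Matthews correlation coefficient for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Toy alignment (invented): position 3 is the most variable, position 1 is conserved.
print([pos for pos, _ in rank_positions(["MKA", "MKT", "MRA", "MKG"], 2)])  # [3, 2]
```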

CONCLUSIONS:

The optimized informative features performed best and were used to build a predictive model that identifies highly pathogenic influenza B virus phenotypes and provides early risk warning for disease control.

Full text: Available Collection: International databases Database: MEDLINE Main subject: Influenza B virus / Amino Acids Type of study: Prognostic study / Randomized controlled trials Language: English Journal: Infect Dis Poverty Year: 2022 Document Type: Article Affiliation country: S40249-022-00974-0
