A framework for predicting variable-length epitopes of human-adapted viruses using machine learning methods.

Yin, Rui; Zhu, Xianghe; Zeng, Min; Wu, Pengfei; Li, Min; Kwoh, Chee Keong

Yin R; Department of Biomedical Informatics, Harvard Medical School, Boston, USA.
Zhu X; Department of Statistics, University of Oxford, Oxford, UK.
Zeng M; Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China.
Wu P; Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China.
Li M; Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China.
Kwoh CK; School of Computer Science and Engineering, Nanyang Technological University, Singapore.

Brief Bioinform; 23(5); 2022-09-20.

Article in English | MEDLINE | ID: covidwho-1948166

ABSTRACT

The coronavirus disease 2019 (COVID-19) pandemic has alerted people to the threat posed by viruses. Vaccination is the most effective way to prevent the disease from spreading. The interaction between antibodies and antigens clears infectious organisms from the host. Identifying B-cell epitopes is therefore critical in vaccine design, the development of disease diagnostics and antibody production. However, traditional experimental methods for determining epitopes are time-consuming and expensive, and the predictive performance of existing in silico methods is not satisfactory. This paper develops a general framework for predicting variable-length linear B-cell epitopes specific to human-adapted viruses, using machine learning approaches based on the ProtVec representation of peptides and the physicochemical properties of amino acids. QR decomposition is incorporated into the embedding process, which enables our models to handle variable-length sequences. Experimental results on large immune epitope datasets show that our proposed model outperforms state-of-the-art methods in terms of AUROC (0.827) and AUPR (0.831) on the testing set. Moreover, sequence analysis identifies, with high precision, the viral category of the corresponding predicted epitopes. Therefore, this framework is shown to reliably identify linear B-cell epitopes of human-adapted viruses from protein sequences, and it could provide assistance in potential future pandemics and epidemics.
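The abstract does not spell out how QR decomposition is combined with the embedding, so the following Python sketch shows only one plausible way to turn a variable-length peptide into a fixed-length feature vector. Everything in it is an assumption made for illustration: the 3-mer lookup table is filled with random numbers rather than trained ProtVec vectors, the embedding size `DIM` is 8 rather than the 100 dimensions of the published ProtVec vectors, and the helper names `toy_kmer_table` and `embed_peptide` are hypothetical. The idea is that the triangular factor R of the stacked 3-mer matrix has a width fixed by the embedding size, whatever the peptide length.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIM = 8  # assumed toy embedding size, chosen small for readability


def toy_kmer_table(dim=DIM, seed=0):
    """Hypothetical stand-in for a trained 3-mer embedding table (random values)."""
    rng = np.random.default_rng(seed)
    kmers = [a + b + c for a in AMINO_ACIDS for b in AMINO_ACIDS for c in AMINO_ACIDS]
    vectors = rng.standard_normal((len(kmers), dim))
    return dict(zip(kmers, vectors))


def embed_peptide(peptide, table, dim=DIM):
    """Map a peptide of any length >= 3 to a vector of dim * (dim + 1) / 2 features."""
    kmers = [peptide[i:i + 3] for i in range(len(peptide) - 2)]
    if not kmers:
        raise ValueError("peptide must contain at least 3 residues")
    # One row per overlapping 3-mer: shape (n_kmers, dim), so the row count varies.
    # A residue outside the 20 standard amino acids raises KeyError here.
    E = np.stack([table[k] for k in kmers])
    # R has shape (min(n_kmers, dim), dim) and satisfies R.T @ R == E.T @ E.
    R = np.linalg.qr(E, mode="r")
    # Make the diagonal non-negative so the factor does not depend on sign choices.
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    R = signs[:, None] * R
    # Zero-pad short peptides up to a (dim, dim) upper-triangular matrix.
    padded = np.zeros((dim, dim))
    padded[: R.shape[0]] = R
    # Keep only the upper triangle; the entries below it are zero by construction.
    return padded[np.triu_indices(dim)]


table = toy_kmer_table()
short_features = embed_peptide("ACDEFGHIK", table)
long_features = embed_peptide("ACDEFGHIKLMNPQRSTVWY", table)
print(short_features.shape, long_features.shape)
```

Both peptides yield vectors of the same length, which could then be concatenated with physicochemical descriptors and passed to a standard classifier. Whether the paper uses the R factor, the Q factor, or another construction is not stated in this record.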

Subject(s)

COVID-19; Viruses; Amino Acids; Epitope Mapping/methods; Epitopes, B-Lymphocyte; Humans; Machine Learning; Peptides/chemistry

Keywords

Epitope prediction; Human viruses; Machine learning; Variable-length

Full text: Available
Collection: International databases
Database: MEDLINE
Main subject: Viruses / COVID-19
Type of study: Prognostic study
Topics: Vaccines
Limits: Humans
Language: English
Journal subject: Biology / Medical Informatics
Year: 2022
Document Type: Article
Affiliation country: Bib
