Hamlet-Pattern-Based Automated COVID-19 and Influenza Detection Model Using Protein Sequences.
Erten, Mehmet; Acharya, Madhav R; Kamath, Aditya P; Sampathila, Niranjana; Bairy, G Muralidhar; Aydemir, Emrah; Barua, Prabal Datta; Baygin, Mehmet; Tuncer, Ilknur; Dogan, Sengul; Tuncer, Turker.
  • Erten M; Laboratory of Medical Biochemistry, Malatya Training and Research Hospital, 44000 Malatya, Turkey.
  • Acharya MR; Department of Biomedical Engineering, Manipal Academy of Higher Education, Manipal 04478, India.
  • Kamath AP; Center for Biomedical Engineering, Brown University, Providence, RI 02912, USA.
  • Sampathila N; Department of Biomedical Engineering, Manipal Academy of Higher Education, Manipal 04478, India.
  • Bairy GM; Department of Biomedical Engineering, Manipal Academy of Higher Education, Manipal 04478, India.
  • Aydemir E; Department of Management Information, College of Management, Sakarya University, 54050 Sakarya, Turkey.
  • Barua PD; School of Management & Enterprise, University of Southern Queensland, Toowoomba, QLD 4350, Australia.
  • Baygin M; Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW 2007, Australia.
  • Tuncer I; Department of Computer Engineering, Faculty of Engineering, Ardahan University, 75000 Ardahan, Turkey.
  • Dogan S; Elazig Governorship, Interior Ministry, 23119 Elazig, Turkey.
  • Tuncer T; Department of Digital Forensics Engineering, Technology Faculty, Firat University, 23119 Elazig, Turkey.
Diagnostics (Basel); 12(12), 2022 Dec 15.
Article in English | MEDLINE | ID: covidwho-2163270
ABSTRACT
SARS-CoV-2 and Influenza-A can present similar symptoms. Computer-aided diagnosis can help facilitate screening for the two conditions, and may be especially relevant and useful in the current COVID-19 pandemic because seasonal Influenza-A infection can still occur. We have developed a novel text-based classification model for discriminating between the two conditions using protein sequences of varying lengths. We downloaded viral protein sequences of SARS-CoV-2 and Influenza-A with varying lengths (all 100 or greater) from the NCBI database and randomly selected 16,901 SARS-CoV-2 and 19,523 Influenza-A sequences to form a two-class study dataset. We used a new feature extraction function based on a unique pattern, HamletPat, generated from the text of Shakespeare's Hamlet, and a signum function to extract local binary pattern-like bits from overlapping fixed-length (27) blocks of the protein sequences. The bits were converted to decimal map signals from which histograms were extracted and concatenated to form a final feature vector of length 1280. The iterative Chi-square function selected the 340 most discriminative features to feed to an SVM with a Gaussian kernel for classification. The model attained 99.92% and 99.87% classification accuracy rates using hold-out (75:25 split ratio) and five-fold cross-validation, respectively. The excellent performance of the lightweight, handcrafted HamletPat-based classification model suggests that it can be a valuable tool for screening protein sequences to discriminate between SARS-CoV-2 and Influenza-A infections.
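The feature extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the actual HamletPat index pairs, so `make_placeholder_pattern` below generates hypothetical random pairs as a stand-in. The sketch also assumes 8-bit codes and five map signals (5 x 256 = 1280, which matches the stated feature length but is an inference, not a stated design), and an assumed signum convention of 1 when the first value is greater than or equal to the second.

```python
import random
from typing import List, Tuple

BLOCK_LEN = 27     # overlapping block length stated in the abstract
BITS_PER_MAP = 8   # assumption: 8-bit codes, so each histogram has 256 bins
NUM_MAPS = 5       # assumption: 5 maps x 256 bins = 1280 features


def make_placeholder_pattern(seed: int = 0) -> List[Tuple[int, int]]:
    """Hypothetical stand-in for HamletPat: random index pairs within a block.

    The real pattern is derived from the text of Hamlet and is not given
    in the abstract, so these pairs are purely illustrative.
    """
    rng = random.Random(seed)
    return [(rng.randrange(BLOCK_LEN), rng.randrange(BLOCK_LEN))
            for _ in range(NUM_MAPS * BITS_PER_MAP)]


def signum_bit(a: int, b: int) -> int:
    """Binary signum comparison (assumed convention): 1 if a >= b, else 0."""
    return 1 if a >= b else 0


def extract_features(sequence: str,
                     pattern: List[Tuple[int, int]]) -> List[int]:
    """Return concatenated histograms of LBP-like codes for one sequence."""
    codes = [ord(ch) for ch in sequence]  # map residues to integers
    bins = 2 ** BITS_PER_MAP
    hist = [0] * (NUM_MAPS * bins)
    # Slide a window of BLOCK_LEN over the sequence with stride 1.
    for start in range(len(codes) - BLOCK_LEN + 1):
        block = codes[start:start + BLOCK_LEN]
        bits = [signum_bit(block[i], block[j]) for i, j in pattern]
        # Pack each group of 8 bits into one decimal map value and count it.
        for m in range(NUM_MAPS):
            value = 0
            for bit in bits[m * BITS_PER_MAP:(m + 1) * BITS_PER_MAP]:
                value = (value << 1) | bit
            hist[m * bins + value] += 1
    return hist


if __name__ == "__main__":
    pattern = make_placeholder_pattern(seed=0)
    features = extract_features("MKT" * 40, pattern)  # toy sequence, length 120
    print(len(features))
```

In a full pipeline, vectors like these would then go through feature selection and classification. With scikit-learn, a plain (non-iterative) approximation could use `SelectKBest(chi2, k=340)` followed by `SVC(kernel="rbf")`, though that differs from the iterative Chi-square selector named in the abstract.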
Full text: Available Collection: International databases Database: MEDLINE Type of study: Diagnostic study / Experimental Studies / Prognostic study / Randomized controlled trials Language: English Year: 2022 Document Type: Article Affiliation country: Diagnostics12123181
