Identification of SARS-CoV-2 origin: Using Ngrams, principal component analysis and Random Forest algorithm.
El Boujnouni, Hamoucha; Rahouti, Mohamed; El Boujnouni, Mohamed.
  • El Boujnouni H; Research Center of Plant and Microbial Biotechnologies, Biodiversity, and Environment, Faculty of Sciences, Mohammed V University in Rabat, PO Box 1014, Morocco.
  • Rahouti M; Research Center of Plant and Microbial Biotechnologies, Biodiversity, and Environment, Faculty of Sciences, Mohammed V University in Rabat, PO Box 1014, Morocco.
  • El Boujnouni M; Laboratory of Information Technologies, National School of Applied Sciences, Chouaib Doukkali University in El Jadida, PO Box 1166, Morocco.
Inform Med Unlocked ; 24: 100577, 2021.
Article in English | MEDLINE | ID: covidwho-1193338
ABSTRACT
COVID-19 is an infectious disease caused by the newly discovered SARS-CoV-2 virus. The virus causes a respiratory tract infection whose symptoms include dry cough, fever, tiredness and, in more severe cases, breathing difficulty. SARS-CoV-2 is extremely contagious and is spreading rapidly all over the world, and the scientific community is working tirelessly to find an effective treatment. This paper aims to determine the origin of the virus by comparing its nucleic acid sequence with those of all members of the Coronaviridae family. The study uses a new approach that combines three techniques: Ngrams (for text categorization), Principal Component Analysis (for dimensionality reduction) and the Random Forest algorithm (for supervised classification). The experimental results show that a large set of SARS-CoV-2 genomes, collected from different locations around the world, presents significant similarities to those found in pangolins. This finding confirms some previous results obtained by other methods, which also suggest that pangolins should be considered as possible hosts in the emergence of the new coronavirus.
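To make the first stage of the pipeline concrete, the following is a minimal sketch of how a nucleotide sequence might be turned into an N-gram frequency vector. The abstract does not state the N-gram length, the normalization, or how ambiguous bases were handled, so the function names (`kmer_vocabulary`, `ngram_frequencies`), the choice of overlapping windows, the skipping of non-ACGT symbols, and the toy sequences are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: only unambiguous nucleotides are counted


def kmer_vocabulary(n):
    """Return every possible n-gram over ACGT in a fixed (lexicographic) order."""
    return ["".join(p) for p in product(ALPHABET, repeat=n)]


def ngram_frequencies(sequence, n=3):
    """Return the relative frequency of each n-gram in a nucleotide sequence.

    Overlapping windows are used. Windows containing a symbol outside ACGT
    (for example N for an unknown base) are ignored.
    """
    sequence = sequence.upper()
    vocab = kmer_vocabulary(n)
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    total = sum(counts[k] for k in vocab)
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[k] / total for k in vocab]


# Hypothetical toy sequences; real genomes are tens of thousands of bases long.
toy_genomes = {"sample_a": "ATGCGATACGCTTGA", "sample_b": "ATGNNCGATTACA"}
features = {name: ngram_frequencies(seq, n=2) for name, seq in toy_genomes.items()}
```

Each genome becomes a fixed-length vector with 4^n entries. In a pipeline like the one the abstract describes, such vectors would then be reduced with Principal Component Analysis and passed to a Random Forest classifier trained on labeled Coronaviridae genomes.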
Full text: Available | Collection: International databases | Database: MEDLINE | Type of study: Randomized controlled trials | Language: English | Journal: Inform Med Unlocked | Year: 2021 | Document Type: Article | Affiliation country: J.imu.2021.100577
