CoVEffect: interactive system for mining the effects of SARS-CoV-2 mutations and variants based on deep learning.
Serna García, Giuseppe; Al Khalaf, Ruba; Invernici, Francesco; Ceri, Stefano; Bernasconi, Anna.
  • Serna García G; Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy.
  • Al Khalaf R; Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy.
  • Invernici F; Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy.
  • Ceri S; Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy.
  • Bernasconi A; Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy.
Gigascience; 12; 2022-12-28.
Article in English | MEDLINE | ID: covidwho-20242676
ABSTRACT

BACKGROUND:

Literature about SARS-CoV-2 widely discusses the effects of variations that have spread in the past 3 years. Such information is dispersed across the texts of many research articles, which hinders its practical integration with related datasets (e.g., millions of SARS-CoV-2 sequences available to the community). We aim to fill this gap by mining literature abstracts to extract, for each variant/mutation, its related effects (in epidemiological, immunological, clinical, or viral kinetics terms) with labeled higher/lower levels in relation to the nonmutated virus.

RESULTS:

The proposed framework comprises (i) the provisioning of abstracts from a COVID-19-related big data corpus (CORD-19) and (ii) the identification of mutation/variant effects in abstracts using a GPT2-based prediction model. These techniques enable the prediction of mutations/variants with their effects and levels in 2 distinct scenarios: (i) the batch annotation of the most relevant CORD-19 abstracts and (ii) the on-demand annotation of any user-selected CORD-19 abstract through the CoVEffect web application (http://gmql.eu/coveffect), which assists expert users with semiautomated data labeling. On the interface, users can inspect the predictions and correct them; user inputs can then extend the training dataset used by the prediction model. Our prototype model was trained through a carefully designed process, using a minimal and highly diversified pool of samples.

CONCLUSIONS:

The CoVEffect interface serves for the assisted annotation of abstracts, allowing the download of curated datasets for further use in data integration or analysis pipelines. The overall framework can be adapted to resolve similar unstructured-to-structured text translation tasks, which are typical of biomedical domains.
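
As a rough illustration of the unstructured-to-structured translation described above, the sketch below parses text generated by a model into (entity, effect, level) records. It is not the authors' implementation: the pipe-separated output format, the `EffectAnnotation` class, and the `parse_annotations` function are assumptions made for this example, and the input string is invented for illustration rather than taken from the paper's results.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical linearized output format, assumed for illustration only:
#   "<entity> | <effect> | <level>", one tuple per line.
LEVELS = {"higher", "lower"}


@dataclass(frozen=True)
class EffectAnnotation:
    entity: str  # mutation or variant name
    effect: str  # e.g. an epidemiological or immunological effect
    level: str   # "higher" or "lower" relative to the nonmutated virus


def parse_annotations(generated: str) -> List[EffectAnnotation]:
    """Turn generated text into structured records, skipping malformed lines."""
    records = []
    for line in generated.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3 or not all(parts):
            continue  # wrong number of fields or an empty field
        entity, effect, level = parts
        if level.lower() not in LEVELS:
            continue  # level label outside the assumed vocabulary
        records.append(EffectAnnotation(entity, effect, level.lower()))
    return records


# Invented input, for illustration only.
example = "D614G | viral transmission | higher\nmalformed line"
for record in parse_annotations(example):
    print(record)
```

Discarding malformed lines rather than raising an error mirrors the semiautomated setting the abstract describes, in which an expert reviews and corrects predictions before they are reused.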

Full text: Available | Collection: International databases | Database: MEDLINE | Main subject: Deep Learning / COVID-19 | Type of study: Experimental study / Prognostic study | Topics: Variants | Limits: Humans | Language: English | Year: 2022 | Document type: Article | Affiliation country: Gigascience
