Generator based approach to analyze mutations in genomic datasets.
Jain, Siddharth; Xiao, Xiongye; Bogdan, Paul; Bruck, Jehoshua.
  • Jain S; California Institute of Technology, Electrical Engineering, Pasadena, 91125, USA.
  • Xiao X; University of Southern California, Electrical and Computer Engineering, Los Angeles, 90007, USA.
  • Bogdan P; University of Southern California, Electrical and Computer Engineering, Los Angeles, 90007, USA. pbogdan@usc.edu.
  • Bruck J; California Institute of Technology, Electrical Engineering, Pasadena, 91125, USA. bruck@caltech.edu.
Sci Rep; 11(1): 21084, 2021 Oct 26.
Article in English | MEDLINE | ID: covidwho-1493213
ABSTRACT
In contrast to the conventional approach of directly comparing genomic sequences using sequence alignment tools, we propose a computational approach that performs comparisons between sequence generators. These sequence generators are learned via a data-driven approach that empirically computes the state machine generating the genomic sequence of interest. Because the state-machine-based generator of a sequence is independent of the sequence length, it provides an efficient method to compute the statistical distance between large sets of genomic sequences. Moreover, our technique provides a fast and efficient method to cluster large datasets of genomic sequences, characterize their temporal and spatial evolution in a continuous manner, and gain insight into locality-sensitive information about the sequences, all without any need for alignment. Furthermore, we show that the technique can be used to detect local regions with mutation activity, which can then be used to aid alignment techniques in the fast discovery of mutations. To demonstrate the efficacy of our technique on real genomic data, we cluster different strains of SARS-CoV-2 viral sequences, characterize their evolution, and identify regions of the viral sequence with mutations.
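As an illustration only, the idea of comparing generators instead of aligned sequences can be sketched in a few functions. This is a minimal sketch under stated assumptions, not the authors' method: it assumes the generator can be approximated by a fixed-order Markov chain whose states are the preceding k bases, and it uses Laplace smoothing plus mean total-variation distance as a stand-in for the paper's statistical distance. The function names (`learn_generator`, `generator_distance`, `distance_matrix`, `mutation_hotspots`) and the parameters `k`, `window`, `step` and `threshold` are hypothetical. The windowed comparison also assumes the two sequences are roughly co-linear and of similar length, since it compares windows at the same offsets without aligning them.

```python
from collections import Counter, defaultdict
from itertools import product

ALPHABET = "ACGT"


def learn_generator(seq: str, k: int = 3) -> dict[str, dict[str, float]]:
    """Estimate a k-th order Markov chain (hypothetical stand-in for the
    paper's learned state machine): each state is the preceding k bases and
    maps to a probability distribution over the next base."""
    counts = defaultdict(Counter)
    seq = seq.upper()
    for i in range(len(seq) - k):
        state, nxt = seq[i:i + k], seq[i + k]
        # Skip transitions that touch ambiguous symbols such as 'N'.
        if all(ch in ALPHABET for ch in state) and nxt in ALPHABET:
            counts[state][nxt] += 1
    generator = {}
    # Enumerate all 4**k states so every generator has the same fixed size,
    # independent of how long the input sequence is.
    for letters in product(ALPHABET, repeat=k):
        state = "".join(letters)
        c = counts.get(state, Counter())
        total = sum(c.values())
        # Laplace (+1) smoothing: unseen states become uniform distributions.
        generator[state] = {b: (c[b] + 1) / (total + len(ALPHABET)) for b in ALPHABET}
    return generator


def generator_distance(g1, g2) -> float:
    """Mean total-variation distance between per-state next-base distributions
    (an assumed metric, not necessarily the one used in the paper)."""
    if g1.keys() != g2.keys():
        raise ValueError("generators must use the same order k")
    tv = [0.5 * sum(abs(g1[s][b] - g2[s][b]) for b in ALPHABET) for s in g1]
    return sum(tv) / len(tv)


def distance_matrix(seqs, k: int = 3):
    """Pairwise generator distances, e.g. as input to a clustering routine."""
    gens = [learn_generator(s, k) for s in seqs]
    return [[generator_distance(a, b) for b in gens] for a in gens]


def mutation_hotspots(ref, query, window=200, step=100, k=2, threshold=0.05):
    """Flag windows whose local generators differ by more than `threshold`.
    Windows are compared at the same offsets, so this assumes roughly
    co-linear sequences of similar length (no alignment is performed)."""
    flagged = []
    for start in range(0, min(len(ref), len(query)) - window + 1, step):
        d = generator_distance(learn_generator(ref[start:start + window], k),
                               learn_generator(query[start:start + window], k))
        if d > threshold:
            flagged.append((start, d))
    return flagged
```

Because each generator in this sketch has a fixed 4^k states regardless of input length, a distance matrix over many genomes costs one linear pass per sequence plus comparisons of fixed-size tables, and the matrix could then be passed to any standard clustering routine. Windows flagged by `mutation_hotspots` would be candidates for targeted alignment, not confirmed mutations.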
Full text: Available | Collection: International databases | Database: MEDLINE | Main subject: Computational Biology / Genomics / SARS-CoV-2 / COVID-19 / Mutation | Type of study: Diagnostic study | Limits: Humans | Language: English | Journal: Sci Rep | Year: 2021 | Document Type: Article | Affiliation country: S41598-021-00609-8