Longest k-tuple Common Sub-Strings
2022 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2022
; : 63-66, 2022.
Article
in English
| Scopus | ID: covidwho-2232482
ABSTRACT
We focus on a new problem that is formulated to find a longest k-tuple of common sub-strings (abbr. k-CSSs) of two or more strings. We present a suffix tree based algorithm for this problem, which can find a longest k-CSS of m strings in O(kmn-{k}) time and O(kmn) space where n is the length sum of the m strings. This algorithm can be used to approximate the longest k-CSS problem to a performance ratio frac{1}{epsilon} in O(kmn-{lceilepsilon krceil}) time for epsilonin(0,1]. Since the algorithm has the space complexity in linear order of n, it will show advantage in comparing particularly long strings. This algorithm proves that the problem that asks to find a longest gapped pattern of non-constant number of strings is polynomial time solvable if the gap number is restricted constant, although the problem without any restriction on the gap number was proved NP-Hard. Using a C++ tool that is reliant on the algorithm, we performed experiments of finding longest 2-CSSs, 3-CSSs and 5-CSSs of 2 14 COVID-19 S-proteins. Under the help of longest 2-CSSs and 3-CSSs of COVID-19 S-proteins, we identified the mutation sites in the S-proteins of two COVID-19 variants Delta and Omicron. The algorithm based tool is available for downloading at https//github.com/lytt0/k-CSS. © 2022 IEEE.
Full text:
Available
Collection:
Databases of international organizations
Database:
Scopus
Topics:
Variants
Language:
English
Journal:
2022 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2022
Year:
2022
Document Type:
Article
Similar
MEDLINE
...
LILACS
LIS