Search | VHL Regional Portal

1.

New Construction of Family of MLCS Algorithms.

Shi, Haihe; Wang, Jun.

J Healthc Eng ; 2021: 6636710, 2021.

Article in English | MEDLINE | ID: mdl-33542799

ABSTRACT

The multiple longest common subsequence (MLCS) problem involves finding all the longest common subsequences of multiple character sequences. This problem is encountered in a variety of areas, including data mining, text processing, and bioinformatics, and is particularly important for biological sequence analysis. By taking the MLCS problem and algorithms for its solution as research domain, this study analyzes the domain of multiple longest common subsequence algorithms, extracts features that algorithms in the domain do and do not have in common, and creates a domain feature model for the MLCS problem by using generic programming, domain engineering, abstraction, and related technologies. A component library for the domain is designed based on the feature model for the MLCS problem, and the partition and recur (PAR) platform is used to ensure that highly reliable MLCS algorithms can be quickly assembled through component assembly. This study provides a valuable reference for obtaining rapid solutions to problems of biological sequence analysis and improves the reliability and assembly flexibility of assembling algorithms.

Subject(s)

Algorithms , Computational Biology , Data Mining , Humans , Reproducibility of Results

2.

A New Implementation of Genome Rearrangement Problem.

Jing, Xiaoqian; Shi, Haihe.

J Healthc Eng ; 2021: 6692775, 2021.

Article in English | MEDLINE | ID: mdl-33552456

ABSTRACT

Unsigned reverse genome rearrangement is an important part of bioinformatics research, which is widely used in biological similarity and homology analysis, revealing biological inheritance, variation, and evolution. Branch and bound, simulated annealing, and other algorithms in unsigned reverse genome rearrangement algorithm are rare in practical application because of their huge time and space consumption, and greedy algorithms are mostly used at present. By deeply analyzing the domain of unsigned reverse genome rearrangement algorithm based on greedy strategy (unsigned reverse genome rearrangement algorithm (URGRA) based on greedy strategy), the domain features are modeled, and the URGRA algorithm components are interactively designed according to the production programming method. With the support of the PAR platform, the algorithm component library of the URGRA is formally realized, and the concrete algorithm is generated by assembly, which improves the reliability of the assembly algorithm.

Subject(s)

Computational Biology , Genome , Algorithms , Computational Biology/methods , Humans , Reproducibility of Results

3.

Efficient Generation of RNA Secondary Structure Prediction Algorithm Under PAR Framework.

Shi, Haihe; Jing, Xiaoqian.

Front Plant Sci ; 12: 830042, 2021.

Article in English | MEDLINE | ID: mdl-35126440

ABSTRACT

Prediction of RNA secondary structure is an important part of bioinformatics genomics research. Mastering RNA secondary structure can help us to better analyze protein synthesis, cell differentiation, metabolism, and genetic processes and thus reveal the genetic laws of organisms. Comparative sequence analysis, support vector machine, centroid method, and other algorithms in RNA secondary structure prediction algorithms often use dynamic programming algorithm to predict RNA secondary structure because of their huge time and space consumption and complex data structure. In this article, the domain of RNA secondary structure prediction algorithm based on dynamic programming (DP-SSP) is analyzed in depth, and the domain features are modeled. According to the generative programming method, the DP-SSP algorithm components are interactively designed. With the support of PAR platform, the DP-SSP algorithm component library is formally realized. Finally, the concrete algorithm is generated through component assembly, which improves the efficiency and reliability of algorithm development.

4.

Component-Based Design and Assembly of Heuristic Multiple Sequence Alignment Algorithms.

Shi, Haihe; Zhang, Xuchu.

Front Genet ; 11: 105, 2020.

Article in English | MEDLINE | ID: mdl-32174970

ABSTRACT

In recent years, there has been an explosive increase in the amount of bioinformatics data produced, but data are not information. The purpose of bioinformatics research is to obtain information with biological significance from large amounts of data. Multiple sequence alignment is widely used in sequence homology detection, protein secondary and tertiary structure prediction, phylogenetic tree analysis, and other fields. Existing research mainly focuses on the specific steps of the algorithm or on specific problems, and there is a lack of high-level abstract domain algorithm frameworks. As a result, multiple sequence alignment algorithms are complex, redundant, and difficult to understand, and it is not easy for users to select the appropriate algorithm, which may lead to computing errors. Here, through in-depth study and analysis of the heuristic multiple sequence alignment algorithm (HMSAA) domain, a domain-feature model and an interactive model of HMSAA components have been established according to the generative programming method. With the support of the PAR (partition and recur) platform, the HMSAA algorithm component library is formalized and a specific alignment algorithm is assembled, thus improving the reliability of algorithm assembly. This work provides a valuable theoretical reference for the applications of other biological sequence analysis algorithms.

5.

Efficient Multiple Sequences Alignment Algorithm Generation via Components Assembly Under PAR Framework.

Shi, Haipeng; Shi, Haihe; Xu, Shenghua.

Front Genet ; 11: 628175, 2020.

Article in English | MEDLINE | ID: mdl-33613626

ABSTRACT

As a key algorithm in bioinformatics, sequence alignment algorithm is widely used in sequence similarity analysis and genome sequence database search. Existing research focuses mainly on the specific steps of the algorithm or is for specific problems, lack of high-level abstract domain algorithm framework. Multiple sequence alignment algorithms are more complex, redundant, and difficult to understand, and it is not easy for users to select the appropriate algorithm; some computing errors may occur. Based on our constructed pairwise sequence alignment algorithm component library and the convenient software platform PAR, a few expansion domain components are developed for multiple sequence alignment application domain, and specific multiple sequence alignment algorithm can be designed, and its corresponding program, i.e., C++/Java/Python program, can be generated efficiently and thus enables the improvement of the development efficiency of complex algorithms, as well as accuracy of sequence alignment calculation. A star alignment algorithm is designed and generated to demonstrate the development process.

6.

Research on Components Assembly Platform of Biological Sequences Alignment Algorithm.

Shi, Haihe; Wu, Gang; Zhang, Xuchu; Wang, Jun; Shi, Haipeng; Xu, Shenghua.

Front Genet ; 11: 630923, 2020.

Article in English | MEDLINE | ID: mdl-33552143

ABSTRACT

After years of development, the complexity of the biological sequence alignment algorithm is gradually increasing, and the lack of high abstract level domain research leads to the complexity of its algorithm development and improvement. By applying the idea of software components to the design and development of algorithms, the development efficiency and reliability of biological sequence alignment algorithms can be effectively improved. The component assembly platform applies related assembly technology, which simplifies the operation difficulty of component assembly and facilitates the maintenance and optimization of the algorithm. At the same time, a friendly visual interface is used to intuitively complete the assembly of algorithm components, and an executable sequence alignment algorithm program is obtained, which can directly carry out alignment computing.

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL