Results 1 - 17 of 17
1.
Article in English | MEDLINE | ID: mdl-38241106

ABSTRACT

Identifying motifs within sets of protein sequences constitutes a pivotal challenge in proteomics, imparting insights into protein evolution, function prediction, and structural attributes. Motifs hold the potential to unveil crucial protein aspects like transcription factor binding sites and protein-protein interaction regions. However, prevailing techniques for identifying motif sequences in extensive protein collections often entail significant time investments. Furthermore, ensuring the accuracy of obtained results remains a persistent challenge in motif discovery. This paper introduces an innovative approach, a branch and bound algorithm, for exact motif identification across diverse lengths. This algorithm exhibits superior performance in terms of reduced runtime and enhanced result accuracy, as compared to existing methods. To achieve this objective, the study constructs a comprehensive tree structure encompassing potential motif evolution pathways. Subsequently, the tree is pruned based on motif length and targeted similarity thresholds. The proposed algorithm efficiently identifies all potential motif subsequences, characterized by maximal similarity, within expansive protein sequence datasets. Experimental findings affirm the algorithm's efficacy, highlighting its superior performance in terms of runtime, motif count, and accuracy, in comparison to prevalent practical techniques.

2.
Biosystems ; 226: 104869, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36858110

ABSTRACT

The sequencing of eukaryotic genomes has shown that tandem repeats are abundant in their sequences. In addition to affecting some cellular processes, tandem repeats in the genome may be associated with specific diseases and have been the key to resolving criminal cases. Any tool developed for detecting tandem repeats must be accurate, fast, and useable in thousands of laboratories worldwide, including those with not very advanced computing capabilities. The proposed method, the Rapid Perfect Tandem Repeat Finder (RPTRF), minimizes the need for excess character comparison processing by indexing the input file and significantly helps to accelerate and prepare the output without artifacts by using an interval tree in the filtering section. The experiments demonstrated that the RPTRF is very fast in discovering all perfect tandem repeats of all categories of any genomic sequences. Although the detection of imperfect TRs is not the focus of the RPTRF, comparisons show that it even outperforms some other tools (in five selected gold standards) designed explicitly for this purpose. The implemented tool and how to use it are available on GitHub.


Subjects
Genomics, Tandem Repeat Sequences, Base Sequence, Tandem Repeat Sequences/genetics, DNA Sequence Analysis
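
As a rough illustration of what detecting a "perfect tandem repeat" means in the abstract above, the following minimal Python sketch scans a sequence for runs in which a unit of length k repeats back to back. It is a naive scan and not the RPTRF algorithm, which the abstract says relies on indexing the input file and on an interval tree for filtering. The function name `find_perfect_tandem_repeats` and its parameters `max_unit` and `min_copies` are assumptions made for this example.

```python
def find_perfect_tandem_repeats(seq, max_unit=6, min_copies=2):
    """Naive scan for perfect tandem repeats (illustrative only, not RPTRF).

    Returns (start, unit, copies) tuples such that
    seq[start:start + len(unit) * copies] is `copies` adjacent copies of `unit`.
    """
    results = []
    n = len(seq)
    for k in range(1, max_unit + 1):          # candidate repeat-unit length
        i = 0
        while i + 2 * k <= n:
            unit = seq[i:i + k]
            # Skip units that are themselves repeats of a shorter unit
            # (e.g. "ATAT" is reported under k=2, not k=4).
            if any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k)):
                i += 1
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies:
                results.append((i, unit, copies))
                i += copies * k               # jump past the reported run
            else:
                i += 1
    return results


repeats = find_perfect_tandem_repeats("GGACACACTT", max_unit=3)
```

Because the scan jumps past each reported run, phase-shifted copies of the same repeat (for example "CA" inside "ACACAC") are not reported separately. A production tool would also need the filtering step the abstract mentions to suppress such overlaps across unit lengths.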
3.
Bioinformatics ; 38(10): 2734-2741, 2022 05 13.
Article in English | MEDLINE | ID: mdl-35561171

ABSTRACT

SUMMARY: Topology determination is one of the most important intermediate steps toward building the atomic structure of proteins from their medium-resolution cryo-electron microscopy (cryo-EM) map. The main goal in topology determination is to identify correct matches (i.e. assignment and direction) between secondary structure elements (SSEs) (α-helices and β-sheets) detected in a protein sequence and in the cryo-EM density map. Despite many recent advances in molecular biology technologies, the problem remains challenging. To overcome it, this article proposes a linear programming-based topology determination (LPTD) method to solve the secondary structure topology problem in three-dimensional geometrical space. Through modeling of the protein's sequence with the aid of extracting highly reliable features and a distance-based scoring function, the secondary structure matching problem is transformed into a complete weighted bipartite graph matching problem. Subsequently, an algorithm based on linear programming is developed as a decision-making strategy to extract the true topology (native topology) among all possible topologies. The proposed automatic framework is verified using 12 experimental and 15 simulated α-β proteins. Results demonstrate that LPTD is highly efficient and extremely fast: for 77% of cases in the dataset, the native topology was detected as the first-rank topology in <2 s. Besides, this method is able to successfully handle large complex proteins with as many as 65 SSEs. Such a large number of SSEs has never been solved with current tools/methods. AVAILABILITY AND IMPLEMENTATION: The LPTD package (source code and data) is publicly available at https://github.com/B-Behkamal/LPTD. Moreover, two test samples as well as instructions for using the graphical user interface have been provided in the shared readme file. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Linear Programming, Proteins, Cryoelectron Microscopy/methods, Molecular Models, Protein Conformation, Protein Secondary Structure, Proteins/chemistry
4.
Biomolecules ; 11(12)2021 11 26.
Article in English | MEDLINE | ID: mdl-34944417

ABSTRACT

Cryo-electron microscopy (cryo-EM) is a structural technique that has played a significant role in protein structure determination in recent years. Compared to the traditional methods of X-ray crystallography and NMR spectroscopy, cryo-EM is capable of producing images of much larger protein complexes. However, cryo-EM reconstructions are limited to medium-resolution (~4-10 Å) for some cases. At this resolution range, a cryo-EM density map can hardly be used to directly determine the structure of proteins at atomic level resolutions, or even at their amino acid residue backbones. At such a resolution, only the position and orientation of secondary structure elements (SSEs) such as α-helices and β-sheets are observable. Consequently, finding the mapping of the secondary structures of the modeled structure (SSEs-A) to the cryo-EM map (SSEs-C) is one of the primary concerns in cryo-EM modeling. To address this issue, this study proposes a novel automatic computational method to identify SSEs correspondence in three-dimensional (3D) space. Initially, through a modeling of the target sequence with the aid of extracting highly reliable features from a generated 3D model and map, the SSEs matching problem is formulated as a 3D vector matching problem. Afterward, the 3D vector matching problem is transformed into a 3D graph matching problem. Finally, a similarity-based voting algorithm combined with the principle of least conflict (PLC) concept is developed to obtain the SSEs correspondence. To evaluate the accuracy of the method, a testing set of 25 experimental and simulated maps with a maximum of 65 SSEs is selected. Comparative studies are also conducted to demonstrate the superiority of the proposed method over some state-of-the-art techniques. The results demonstrate that the method is efficient, robust, and works well in the presence of errors in the predicted secondary structures of the cryo-EM images.


Subjects
Computational Biology/methods, Proteins/chemistry, Cryoelectron Microscopy, X-Ray Crystallography, Molecular Models, Protein Secondary Structure, Support Vector Machine
5.
Comput Biol Chem ; 94: 107552, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34390958

ABSTRACT

The three-dimensional structures of proteins determine their functions, and incorrect folding of their β-strands can be the cause of many diseases. There are two major approaches for determining protein structures: computational prediction and experimental methods that employ technologies such as cryo-electron microscopy. Due to the high costs of experimental methods, the extended wait times of their lengthy processes, and the incompleteness of their results, computational prediction is an attractive alternative. As the focus of the present paper, β-sheet structure prediction is a major portion of overall protein structure prediction. Prediction of other substructures, such as α-helices, is simpler and has lower computational time complexity. Brute force methods are the most common approach, and dynamic programming is also utilized to generate all possible conformations. The current study introduces the Subset Sum Approach (SSA) for the direct search space generation method, which is shown to outperform the dynamic programming approach in terms of both time and space. For the first time, the present work has calculated both the state space cardinality of the dynamic programming approach and the search space cardinality of the general brute force approaches. With respect to a set of pruning rules, SSA has demonstrated higher efficiency in both time and accuracy in comparison to state-of-the-art methods.


Subjects
Proteins/chemistry, Software, Beta-Strand Protein Conformation
6.
Data Brief ; 36: 107057, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33898662

ABSTRACT

The data presented in this article is related to the research article entitled "Developing an ultra-efficient microsatellite discoverer to find structural differences between SARS-CoV-1 and Covid-19" [Naghibzadeh et al. 2020]. Simple tandem repeats (microsatellites, STR) are extracted and investigated across all viral families from four main viral realms. An ultra-efficient and reliable software tool, which was recently developed by the authors and published in the above-mentioned article, is used for extracting STRs. The analysis is done for k-mer tandem repeats where k varies from one to seven. In particular, the frequency of trimer STRs is shown to be low in RNA viruses compared with DNA viruses. Special attention is paid to seven zoonotic viruses from the family Coronaviridae, which caused several severe human crises during the last two decades, including MERS, SARS 2003 and Covid-19.

7.
J Mol Graph Model ; 103: 107815, 2021 03.
Article in English | MEDLINE | ID: mdl-33338845

ABSTRACT

Cryo-electron microscopy (cryo-EM) has recently emerged as a prominent biophysical method for macromolecular structure determination. Many research efforts have been devoted to producing cryo-EM images (density maps) at near-atomic resolution. Despite many advances in technology, the resolution of the generated density maps may not be sufficiently adequate and informative to directly construct the atomic structure of proteins. At medium-resolution (∼4-10 Å), secondary structure elements (α-helices and β-sheets) are discernible, whereas finding the correspondence of secondary structure elements detected in the density map with those on the sequence remains a challenging problem. In this paper, an automatic framework is proposed to solve the α-helix correspondence problem in three-dimensional space. Through modeling of the sequence with the aid of a novel strategy, the α-helix correspondence problem is initially transformed into a complete weighted bipartite graph matching problem. An innovative correlation-based scoring function based on a well-known and robust statistical method is proposed for weighting the graph. Moreover, two local optimization algorithms, the Greedy and Improved Greedy algorithms, are presented to find the α-helix correspondence. A widely used data set including 16 reconstructed and 4 experimental cryo-EM maps was chosen to verify the accuracy and reliability of the proposed automatic method. The experimental results demonstrate that the automatic method is highly efficient (86.25% accuracy), robust (11.3% error rate), fast (∼1.4 s), and works independently of the cryo-EM skeleton.


Subjects
Algorithms, Proteins, Cryoelectron Microscopy, Molecular Models, Alpha-Helical Protein Conformation, Reproducibility of Results
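
The abstract above casts helix correspondence as matching on a complete weighted bipartite graph and solves it with greedy local optimization. The sketch below shows one generic greedy strategy under stated assumptions: edges are visited from highest to lowest score and accepted when both endpoints are still free. It is not the paper's Greedy or Improved Greedy algorithm, whose details the abstract does not give; the function name and the score matrix are invented for illustration.

```python
def greedy_bipartite_matching(weights):
    """Greedy matching on a complete weighted bipartite graph.

    weights[i][j] is the score for pairing left node i (e.g. a sequence helix)
    with right node j (e.g. a density-map helix); higher is better.
    Returns a dict {left_index: right_index}.
    """
    edges = [
        (w, i, j)
        for i, row in enumerate(weights)
        for j, w in enumerate(row)
    ]
    # Visit edges from highest to lowest score; ties broken by (i, j) order.
    edges.sort(key=lambda e: (-e[0], e[1], e[2]))
    used_left, used_right, matching = set(), set(), {}
    for w, i, j in edges:
        if i not in used_left and j not in used_right:
            matching[i] = j
            used_left.add(i)
            used_right.add(j)
    return matching


# Hypothetical 3x3 score matrix (values invented for illustration).
scores = [
    [0.9, 0.2, 0.1],
    [0.8, 0.7, 0.3],
    [0.1, 0.4, 0.6],
]
matching = greedy_bipartite_matching(scores)
```

A greedy pass is a local strategy and can miss the best overall matching: on the matrix `[[0.9, 0.8], [0.8, 0.1]]` it pairs 0 with 0 and 1 with 1 for a total of 1.0, while the other pairing scores 1.6.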
8.
BMC Bioinformatics ; 21(1): 400, 2020 Sep 10.
Article in English | MEDLINE | ID: mdl-32912135

ABSTRACT

BACKGROUND: Infectious diseases are a cruel assassin with millions of victims around the world each year. Understanding the infectious mechanism of viruses is indispensable for their inhibition. One of the best ways of unveiling this mechanism is to investigate the host-pathogen protein-protein interaction network. In this paper we try to disclose many properties of this network. We focus on human as host and integrate 32,859 experimentally determined interactions between human proteins and virus proteins from several databases. We investigate different properties of human proteins targeted by virus proteins and find that most of them have considerably high centrality scores in the human intra-species protein-protein interaction network. Investigating the network properties of human proteins targeted by different virus proteins can help us to design multipurpose drugs. RESULTS: As the host-pathogen protein-protein interaction network is a bipartite network and centrality measures for this type of network are scarce, we proposed seven new centrality measures for analyzing bipartite networks. Applying them to different virus strains reveals the non-randomness of the attack strategies of virus proteins, which could help us in drug design, hence elevating the quality of life. They could also be used in detecting host essential proteins. Essential proteins are those whose functions are critical for the survival of their host. One of the proposed centralities, named diversity of predators, outperforms the other existing centralities in terms of detecting essential proteins and could be used as an optimal marker of essential proteins. CONCLUSIONS: Different centralities were applied to analyze the human protein-protein interaction network and to detect characteristics of human proteins targeted by virus proteins. Moreover, seven new centralities were proposed to analyze the host-pathogen protein-protein interaction network and to detect pathogens' favorite host protein victims. Comparing different centralities in detecting essential proteins reveals that diversity of predators (one of the proposed centralities) is the best essential protein marker.


Subjects
Host-Pathogen Interactions, Protein Interaction Maps, Proteins/metabolism, Communicable Diseases/metabolism, Communicable Diseases/pathology, Communicable Diseases/virology, Protein Databases, Humans, User-Computer Interface, Viruses/pathogenicity
9.
Math Biosci Eng ; 17(4): 3109-3129, 2020 04 15.
Article in English | MEDLINE | ID: mdl-32987519

ABSTRACT

More than ten million deaths make influenza virus one of the deadliest in history. About half a million severe illnesses resulting from influenza are reported annually. Influenza is a parasite which needs the host cellular machinery to replicate its genome. To reach the host, viral proteins need to interact with the host proteins. Therefore, identification of the host-virus protein interaction network (HVIN) is one of the crucial steps in treating viral diseases. Because experimental identification of HVIN is expensive, time-consuming and laborious, researchers are forced to use computational methods instead of experimental ones to obtain a better understanding of HVIN. In this study, several features are extracted from physicochemical properties of amino acids and combined with different centralities of the human protein-protein interaction network (HPPIN) to predict protein-protein interactions between human proteins and Alphainfluenzavirus proteins (HI-PPIs). Ensemble learning methods were used to predict such PPIs. Our model reached 0.93 accuracy, 0.91 sensitivity and 0.95 specificity. Moreover, a database including 694522 new PPIs was constructed from the prediction results of the model. Further analysis showed that HPPIN centralities, gene ontology semantic similarity and conjoint triad of virus proteins are the most important features to predict HI-PPIs.


Subjects
Alphavirus, Human Influenza, Orthomyxoviridae, Host-Pathogen Interactions, Humans, Protein Interaction Mapping, Protein Interaction Maps
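
For readers less familiar with the three figures reported in the abstract above, accuracy, sensitivity and specificity follow directly from the confusion-matrix counts of a binary classifier. The sketch below uses the standard definitions; the counts are hypothetical, chosen only so that a balanced set of 100 positives and 100 negatives reproduces rates of the reported magnitude, and they are not the paper's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts for illustration only; not taken from the paper.
acc, sens, spec = classification_metrics(tp=91, fp=5, tn=95, fn=9)
```

With these invented counts, 91 of 100 true interactions and 95 of 100 non-interactions are classified correctly, so 186 of 200 predictions are right overall.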
10.
Iran J Biotechnol ; 18(1): e2547, 2020 Jan.
Article in English | MEDLINE | ID: mdl-32884959

ABSTRACT

BACKGROUND: Many problems of combinatorial optimization, which are solvable only in exponential time, are known to be Non-Deterministic Polynomial hard (NP-hard). With the advent of parallel machines, new opportunities have emerged to develop effective solutions for NP-hard problems. However, solving these problems in polynomial time needs massively parallel machines and has not been practical up to now. OBJECTIVES: DNA (deoxyribonucleic acid) computing provides a fantastic method to solve NP-hard problems in polynomial time. Accordingly, one of the famous NP-hard problems is the assignment problem, which is designed to find the best assignment of n jobs to n persons in a way that maximizes the profit or minimizes the cost. MATERIAL AND METHODS: Applying the biomolecular operations of the Adleman-Lipton model, a novel parallel DNA algorithm has been proposed for solving the assignment problem. RESULTS: The proposed algorithm can solve the problem in O(n) time complexity, using just O(n^2) initial DNA strands in comparison with the n^n initial sequences used by the other methods. CONCLUSIONS: In this article, using DNA computing, we proposed a parallel DNA algorithm to solve the assignment problem in linear time.
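
To make the problem statement in the abstract above concrete, the following sketch solves a tiny instance of the assignment problem (n jobs to n persons, minimizing total cost) by exhaustive enumeration on a conventional computer. It only illustrates the problem being solved and has nothing to do with the DNA algorithm itself; the function name and the cost matrix are assumptions made for this example.

```python
from itertools import permutations


def solve_assignment_bruteforce(cost):
    """Return (best_total_cost, assignment) where assignment[person] = job."""
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):       # n! candidate assignments
        total = sum(cost[person][job] for person, job in enumerate(perm))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, list(best_perm)


# Hypothetical cost matrix: cost[person][job], values invented for illustration.
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
best_total, assignment = solve_assignment_bruteforce(cost)
```

Here person 0 takes job 1, person 1 takes job 0 and person 2 takes job 2, for a total cost of 5. Enumerating every permutation is only practical for very small n, which is why the size of the candidate space matters.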

11.
Inform Med Unlocked ; 20: 100413, 2020.
Article in English | MEDLINE | ID: mdl-32838020

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the novel coronavirus which caused the coronavirus disease 2019 pandemic and infected more than 12 million victims and resulted in over 560,000 deaths in 213 countries around the world. Having no symptoms in the first week of infection increases the rate of spreading the virus. The increasing number of infected individuals and the high mortality necessitate the immediate development of proper diagnostic methods and effective treatments. SARS-CoV-2, similar to other viruses, needs to interact with the host proteins to reach the host cells and replicate its genome. Consequently, virus-host protein-protein interaction (PPI) identification could be useful in predicting the behavior of the virus and the design of antiviral drugs. Identification of virus-host PPIs using experimental approaches is very time consuming and expensive. Computational approaches could be acceptable alternatives for many preliminary investigations. In this study, we developed a new method to predict SARS-CoV-2-human PPIs. Our model is a three-layer network in which the first layer contains the Alphainfluenzavirus proteins most similar to SARS-CoV-2 proteins. The second layer contains protein-protein interactions between Alphainfluenzavirus proteins and human proteins. The last layer reveals protein-protein interactions between SARS-CoV-2 proteins and human proteins by using the clustering coefficient network property on the first two layers. To further analyze the results of our prediction network, we investigated human proteins targeted by SARS-CoV-2 proteins and reported the most central human proteins in the human PPI network. Moreover, differentially expressed genes from previous studies were investigated, and the PPIs of the SARS-CoV-2-human network whose human proteins were related to upregulated genes were reported.

12.
Inform Med Unlocked ; 19: 100356, 2020.
Article in English | MEDLINE | ID: mdl-32501423

ABSTRACT

MOTIVATION: Recently, the outbreak of Coronavirus-Covid-19 has forced the World Health Organization to declare a pandemic status. A genome sequence is the core of this virus which interferes with the normal activities of its counterparts within humans. Analysis of its genome may provide clues toward the proper treatment of patients and the design of new drugs and vaccines. Microsatellites are composed of short genome subsequences which are successively repeated many times in the same direction. They are highly variable in terms of their building blocks, number of repeats, and their locations in the genome sequences. This mutability property has been the source of many diseases. Usually the host genome is analyzed to diagnose possible diseases in the victim. In this research, the focus is concentrated on the attacker's genome for discovery of its malicious properties. RESULTS: The focus of this research is the microsatellites of both SARS and Covid-19. An accurate and highly efficient computer method for identifying all microsatellites in the genome sequences is discovered and implemented, and it is used to find all microsatellites in the Coronavirus-Covid-19 and SARS2003. The Microsatellite discovery is based on an efficient indexing technique called K-Mer Hash Indexing. The method is called Fast Microsatellite Discovery (FMSD) and it is used for both SARS and Covid-19. A table composed of all microsatellites is reported. There are many differences between SARS and Covid-19, but there is an outstanding difference which requires further investigation. AVAILABILITY: FMSD is freely available at https://gitlab.com/FUM_HPCLab/fmsd_project, implemented in C on Linux-Ubuntu system. Software related contact: hossein_savari@mail.um.ac.ir.
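
The abstract above names "K-Mer Hash Indexing" as the basis of FMSD but does not describe it. As a loosely related illustration, the sketch below builds a generic k-mer index with a Python dictionary, mapping each k-mer to the positions where it starts. This is an assumption about what such an index might look like in its simplest form and is not FMSD's actual data structure; the function name `build_kmer_index` is hypothetical.

```python
from collections import defaultdict


def build_kmer_index(seq, k):
    """Map every k-mer in seq to the ascending list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return dict(index)


index = build_kmer_index("ATATATGC", 2)
```

In this toy sequence the 2-mer "AT" starts at positions 0, 2 and 4. Occurrences of the same k-mer spaced exactly k apart are the signature of a tandem repeat of that unit, which is one reason an index of this kind can help in microsatellite discovery.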

13.
IEEE/ACM Trans Comput Biol Bioinform ; 16(6): 1936-1947, 2019.
Article in English | MEDLINE | ID: mdl-29994539

ABSTRACT

Predicting β-sheet topology (β-topology) is one of the most critical intermediate steps towards protein structure and function prediction. The β-topology prediction problem is defined as the determination of the optimal arrangement of β-strand interactions within protein β-sheets. Significant efforts have been made to predict β-topologies. However, due to the inaccurate determination of interactions among β-strands and the huge topological space of proteins with a large number of β-strands, more efficient methods are required to improve both the accuracy and speed of β-topology prediction. In order to attain higher accuracy, the current paper introduces a bidirectional strand-strand interaction graph and considers all possible orientations (parallel and antiparallel) and orders of β-strand pairwise interactions. For the first time, the β-topology prediction is transformed into a maximum weight disjoint path cover solution by conserving all potential topologies. Moreover, to manage the computation time, a set of candidate β-sheets is generated and an optimization process is applied to select a subset of maximum score disjoint β-sheets as a predicted β-topology. The proposed method is comprehensively compared with state-of-the-art methods. The experimental results on the BetaSheet916 and BetaSheet1452 datasets reveal that the current study's approach enhances performance measurements as well as reduces the runtime.


Subjects
Computational Biology/methods, Beta-Strand Protein Conformation, Proteins/chemistry, Algorithms, Bayes Theorem, Protein Databases, Humans, Markov Chains, Software
14.
Comput Biol Med ; 104: 241-249, 2019 01.
Article in English | MEDLINE | ID: mdl-30530227

ABSTRACT

Sequence-based predictions of beta-residue contacts and beta-sheet structures contain key information for protein structure prediction. However, the determination of beta-sheet structures poses numerous challenges due to long-range beta-residue interactions and the huge number of possible beta-sheet structures. The prediction of residue contacts based on deep learning models has recently gained attention, and its results have led to improvement in protein structure prediction. In addition, to reduce the computational complexity of determining beta-sheet structures, it has been suggested that this problem be transformed into graph-based solutions. Consequently, the current work proposes BetaDL, a combination of deep learning and a graph-based beta-sheet structure predictor. BetaDL adopts deep learning models to capture beta-residue contacts and improve beta-sheet structure predictions. In addition, a graph-based approach is presented to model the beta-sheet conformational space and a new score function is introduced to evaluate beta-sheets. Furthermore, the present study demonstrates that the beta-sheet structure can be predicted within an acceptable computational time by the utilization of a heuristic maximum weight independent set solution. When compared to state-of-the-art methods, experimental results from the BetaSheet916 and BetaSheet1452 datasets indicate that BetaDL improves the accuracy of beta-residue contact and beta-sheet structure prediction. Using BetaDL, beta-sheet structures are predicted with a 4% and 6% improvement in the F1-score at the residue and strand levels, respectively. BetaDL's source code and data are available at http://kerg.um.ac.ir/index.php/datasets/#BetaDL.


Subjects
Computational Biology, Protein Databases, Deep Learning, Molecular Models, Proteins/chemistry, Software, Beta-Strand Protein Conformation
15.
Comput Biol Chem ; 70: 142-155, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28881217

ABSTRACT

Predicting the β-sheet structure of a protein is one of the most important intermediate steps towards the identification of its tertiary structure. However, it is regarded as the primary bottleneck due to the presence of non-local interactions between several discontinuous regions in β-sheets. To achieve reliable long-range interactions, a promising approach is to enumerate and rank all β-sheet conformations for a given protein and find the one with the highest score. The problem with this solution is that the search space of the problem grows exponentially with respect to the number of β-strands. Additionally, brute-force calculation in this conformational space leads to a combinatorial explosion with intractable computational complexity. The main contribution of this paper is to generate and search the space of the problem efficiently to reduce the time complexity of the problem. To achieve this, two tree structures, called sheet-tree and grouping-tree, are proposed. They model the search space by breaking it into sub-problems. Then, an advanced dynamic programming method is proposed that stores the intermediate results, avoids repetitive calculation by reusing them efficiently in successive steps, and reduces the space of the problem by removing those intermediate results that will no longer be required in later steps. As a consequence, the following contributions have been made. Firstly, more accurate β-sheet structures are found by searching all possible conformations, and secondly, the time complexity of the problem is reduced by searching the space of the problem efficiently, which makes the proposed method applicable to predicting β-sheet structures with a high number of β-strands. Experimental results on the BetaSheet916 dataset showed significant improvements of the proposed method in both execution time and prediction accuracy in comparison with the state-of-the-art β-sheet structure prediction methods. Moreover, we investigate the effect of different contact map predictors on the performance of the proposed method using the BetaSheet1452 dataset. The source code is available at http://www.conceptsgate.com/BetaTop.rar.


Subjects
Algorithms, Computational Biology, Protein Secondary Structure
16.
Biosystems ; 162: 24-34, 2017 Dec.
Article in English | MEDLINE | ID: mdl-28860070

ABSTRACT

High-throughput methods have provided us with a large amount of data pertaining to protein-protein interaction networks. The alignment of these networks enables us to better understand biological systems. Given the fact that the alignment of networks is computationally intractable, it is important to introduce a more efficient and accurate algorithm which finds similar areas among networks that are as large as possible. This paper proposes a new algorithm named INDEX for the global alignment of protein-protein interaction networks. INDEX has multiple phases. First, it computes topological and biological scores of proteins and creates the initial alignment based on the proposed matching score strategy. Using network topologies and aligned proteins, it then selects a set of high scoring proteins in each phase and extends new alignments around them until the final alignment is obtained. Proposing a new alignment strategy, detailed consideration of matching scores, and growth of the alignment core has led INDEX to obtain a larger common connected subgraph with a much greater number of edges compared with previous methods. Regarding other measures such as edge correctness, symmetric substructure score, and runtime, the proposed algorithm performed considerably better than existing popular methods. Our results show that INDEX can be a promising method for identifying functionally conserved interactions. AVAILABILITY: The INDEX executable file is available at https://github.com/a-mir/index/.


Subjects
Algorithms, Computational Biology/methods, Protein Interaction Mapping/methods, Protein Interaction Maps, Animals, Humans, Proteins/chemistry, Proteins/metabolism, Software
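
The abstract above evaluates alignments with edge correctness and the symmetric substructure score without defining them. The sketch below computes both under their commonly used definitions, which are assumed here rather than taken from the paper: edge correctness is the fraction of edges of the first network whose images are edges of the second, and the symmetric substructure score divides the same conserved-edge count by the edges of the first network plus the edges of the second network induced on the aligned nodes, minus the conserved edges. The function name and the toy networks are invented for illustration.

```python
def alignment_scores(edges1, edges2, mapping):
    """Return (edge_correctness, s3) for a node mapping from network 1 to network 2.

    edges1, edges2: iterables of undirected edges given as (u, v) pairs.
    mapping: dict sending every node of network 1 to a node of network 2.
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    # Edges of network 1 whose images are also edges of network 2.
    conserved = sum(1 for e in e1 if frozenset(mapping[u] for u in e) in e2)
    # Edges of network 2 with both endpoints among the aligned nodes.
    aligned_nodes = set(mapping.values())
    induced = sum(1 for e in e2 if e <= aligned_nodes)
    edge_correctness = conserved / len(e1)
    s3 = conserved / (len(e1) + induced - conserved)
    return edge_correctness, s3


# Toy example: a 3-node path aligned onto a triangle with one extra edge.
edges1 = [("a", "b"), ("b", "c")]
edges2 = [(1, 2), (2, 3), (1, 3), (3, 4)]
ec, s3 = alignment_scores(edges1, edges2, {"a": 1, "b": 2, "c": 3})
```

In this toy case both path edges are conserved, so edge correctness is 1.0, while the symmetric score is 2/3 because the aligned region of the second network contains an edge, (1, 3), with no counterpart in the first. That penalty for mapping a sparse region onto a denser one is the usual motivation for reporting both measures.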
17.
J Theor Biol ; 417: 43-50, 2017 03 21.
Article in English | MEDLINE | ID: mdl-28108305

ABSTRACT

One of the main tasks towards the prediction of protein β-sheet structure is to predict the native alignment of β-strands. The alignment of two β-strands defines similar regions that may reflect functional, structural, or evolutionary relationships between them. Therefore, any improvement in β-strand alignment not only reduces the computational search space but also improves β-sheet structure prediction accuracy. To define the alignment scores, previous studies utilized predicted residue-residue contacts (contact maps). However, there are two serious problems in using them. First, the precision of contact map prediction techniques, especially for long-range contacts (i.e., β-residues), is still not satisfactory. Second, the residue-residue contact predictors usually utilize general properties of amino acids and disregard the structural features of β-residues. In this paper, we consider β-structure information, which is estimated from protein β-sheet data sets, as alignment scores. However, the predicted contact maps are used as prior knowledge about residues. They are used for strengthening or weakening the alignment scores in our algorithm. Thus, we can utilize both β-residue and β-structure information in the alignment of β-strands. The structure of the dynamic programming of the alignment algorithm is changed in order to work with our prior knowledge. Moreover, the Four Russians method is applied to the proposed alignment algorithm in order to reduce the time complexity of the problem. For evaluating the proposed method, we applied it to the state-of-the-art β-sheet structure prediction methods. The experimental results on the BetaSheet916 data set showed significant improvements in the execution time, the accuracy of β-strand alignment and consequently β-sheet structure prediction accuracy. The results are available at http://conceptsgate.com/BetaSheet.


Subjects
Algorithms, Molecular Models, Beta-Strand Protein Conformation, Computational Biology/methods, Protein Databases, Software