1.
IEEE J Biomed Health Inform ; 27(12): 6029-6038, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37703167

ABSTRACT

Medical entity normalization is an important task for medical information processing. The Unified Medical Language System (UMLS), a well-developed medical terminology system, is crucial for medical entity normalization. However, the UMLS primarily consists of English medical terms. For languages other than English, such as Chinese, a significant challenge for normalizing medical entities is the lack of robust terminology systems. To address this issue, we propose a translation-enhancing training strategy that incorporates the translation and synonym knowledge of the UMLS into a language model using a contrastive learning approach. In this work, we propose a cross-lingual pre-trained language model called TeaBERT, which can align synonymous Chinese and English medical entities across languages at the concept level. In our evaluation, the TeaBERT language model outperformed previous cross-lingual language models, with Acc@5 values of 92.54%, 87.14% and 84.77% on the ICD10-CN, CHPO and RealWorld-v2 datasets, respectively. It also achieved a new state-of-the-art cross-lingual entity mapping performance without fine-tuning. The translation-enhancing strategy is applicable to other languages that face a similar challenge due to the absence of well-developed medical terminology systems.
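The Acc@5 figures reported above measure how often the correct concept appears among the five highest-ranked candidates. The following is a minimal sketch of that metric, not code from the paper; the function name `acc_at_k` and the concept IDs in the toy data are hypothetical.

```python
from typing import Sequence


def acc_at_k(ranked_candidates: Sequence[Sequence[str]],
             gold: Sequence[str],
             k: int = 5) -> float:
    """Fraction of queries whose gold concept ID is among the top-k candidates."""
    if len(ranked_candidates) != len(gold):
        raise ValueError("need exactly one ranked candidate list per gold label")
    if not gold:
        return 0.0
    hits = sum(1 for cands, g in zip(ranked_candidates, gold) if g in cands[:k])
    return hits / len(gold)


# Toy data with made-up concept IDs (assumption, for illustration only).
ranked = [
    ["C001", "C002", "C003"],   # gold C002 is ranked second
    ["C010", "C011", "C012"],   # gold C099 is missing
    ["C020", "C021", "C022"],   # gold C020 is ranked first
]
gold = ["C002", "C099", "C020"]

print(acc_at_k(ranked, gold, k=5))
print(acc_at_k(ranked, gold, k=1))
```

With k=5, two of the three toy queries count as hits; with k=1, only the last one does.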


Subject(s)
Language , Unified Medical Language System , International Classification of Diseases , Natural Language Processing
2.
Virol Sin ; 38(4): 541-548, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37211247

ABSTRACT

The Influenza A (H1N1) pdm09 virus caused a global pandemic in 2009 and has circulated seasonally ever since. As the continual genetic evolution of hemagglutinin in this virus leads to antigenic drift, rapid identification of antigenic variants and characterization of the antigenic evolution are needed. In this study, we developed PREDAC-H1pdm, a model to predict antigenic relationships between H1N1pdm viruses and identify antigenic clusters for post-2009 pandemic H1N1 strains. Our model performed well in predicting antigenic variants, which was helpful in influenza surveillance. By mapping the antigenic clusters for H1N1pdm, we found that substitutions on the Sa epitope were common for H1N1pdm, whereas for the former seasonal H1N1, substitutions on the Sb epitope were more common in antigenic evolution. Additionally, the localized epidemic pattern of H1N1pdm was more obvious than that of the former seasonal H1N1, which could make vaccine recommendation more sophisticated. Overall, the antigenic relationship prediction model we developed provides a rapid determination method for identifying antigenic variants, and the further analysis of evolutionary and epidemic characteristics can facilitate vaccine recommendations and influenza surveillance for H1N1pdm.


Subject(s)
Influenza A Virus, H1N1 Subtype , Influenza Vaccines , Influenza, Human , Humans , Influenza A Virus, H1N1 Subtype/genetics , Influenza, Human/epidemiology , Epitopes/genetics , Evolution, Molecular , Phylogeny , Hemagglutinin Glycoproteins, Influenza Virus/genetics
3.
Health Data Sci ; 3: 0011, 2023.
Article in English | MEDLINE | ID: mdl-38487197

ABSTRACT

Background: Chinese medical entities have not been organized comprehensively due to the lack of well-developed terminology systems, which poses a challenge to processing Chinese medical texts for fine-grained medical knowledge representation. To unify Chinese medical terminologies, mapping Chinese medical entities to their English counterparts in the Unified Medical Language System (UMLS) is an efficient solution. However, these mappings have not been investigated sufficiently in previous research. In this study, we explore strategies for mapping Chinese medical entities to the UMLS and systematically evaluate the mapping performance. Methods: First, Chinese medical entities are translated to English using multiple web-based translation engines. Then, 3 mapping strategies are investigated: (a) string-based, (b) semantic-based, and (c) string and semantic similarity combined. In addition, cross-lingual pretrained language models are applied to map Chinese medical entities to UMLS concepts without translation. All of these strategies are evaluated on the ICD10-CN, Chinese Human Phenotype Ontology (CHPO), and RealWorld datasets. Results: The linear combination method based on the SapBERT and term frequency-inverse document frequency bag-of-words models performs best on all evaluation datasets, with top-5 accuracies of 91.85%, 82.44%, and 78.43% on the ICD10-CN, CHPO, and RealWorld datasets, respectively. Conclusions: In our study, we explore strategies for mapping Chinese medical entities to the UMLS and identify a satisfactory linear combination method. Our investigation will facilitate Chinese medical entity normalization and inspire research that focuses on Chinese medical ontology development.
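The combined strategy described above can be sketched as a weighted sum of a string-similarity score and a semantic-similarity score. The sketch below is an illustration under stated assumptions, not the authors' implementation: the string score is a small pure-Python TF-IDF bag-of-words cosine, the semantic scores are passed in as precomputed numbers (in the paper they would come from SapBERT embeddings), and the weight `alpha`, the function names, and the toy scores are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_vectors(terms: List[str]) -> List[Dict[str, float]]:
    """TF-IDF bag-of-words vectors with smoothed IDF, fitted on the given terms."""
    docs = [Counter(t.lower().split()) for t in terms]
    df = Counter(word for d in docs for word in d)   # document frequency per word
    n = len(docs)
    return [
        {w: tf * (math.log((1 + n) / (1 + df[w])) + 1.0) for w, tf in d.items()}
        for d in docs
    ]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_candidates(query: str,
                    candidates: List[str],
                    semantic_scores: Dict[str, float],
                    alpha: float = 0.5,
                    k: int = 5) -> List[Tuple[str, float]]:
    """Rank candidates by alpha * string_sim + (1 - alpha) * semantic_sim."""
    vectors = tfidf_vectors([query] + candidates)
    q_vec, cand_vecs = vectors[0], vectors[1:]
    scored = [
        (cand, alpha * cosine(q_vec, vec) + (1 - alpha) * semantic_scores.get(cand, 0.0))
        for cand, vec in zip(candidates, cand_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy example: an already-translated query and made-up semantic scores.
candidates = ["myocardial infarction", "acute kidney injury", "cerebral infarction"]
semantic = {"myocardial infarction": 0.9, "acute kidney injury": 0.2, "cerebral infarction": 0.5}
top = rank_candidates("acute myocardial infarction", candidates, semantic)
print(top[0][0])
```

A weighted sum keeps the two signals interpretable: `alpha` near 1 trusts surface overlap, `alpha` near 0 trusts the embedding model. The weight actually used in the study is not stated in the abstract.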

4.
J Med Internet Res ; 24(6): e37213, 2022 06 03.
Article in English | MEDLINE | ID: mdl-35657661

ABSTRACT

BACKGROUND: Phenotype information in electronic health records (EHRs) is mainly recorded in unstructured free text, which cannot be directly used for clinical research. EHR-based deep-phenotyping methods can structure phenotype information in EHRs with high fidelity, making them a focus of medical informatics. However, developing a deep-phenotyping method for non-English EHRs (eg, Chinese EHRs) is challenging. Although numerous EHR resources exist in China, fine-grained annotation data that are suitable for developing deep-phenotyping methods are limited. It is challenging to develop a deep-phenotyping method for Chinese EHRs in such a low-resource scenario. OBJECTIVE: In this study, we aimed to develop a deep-phenotyping method with good generalization ability for Chinese EHRs based on limited fine-grained annotation data. METHODS: The core of the methodology was to identify linguistic patterns of phenotype descriptions in Chinese EHRs with a sequence motif discovery tool and perform deep phenotyping of Chinese EHRs by recognizing linguistic patterns in free text. Specifically, 1000 Chinese EHRs were manually annotated based on a fine-grained information model, PhenoSSU (Semantic Structured Unit of Phenotypes). The annotation data set was randomly divided into a training set (n=700, 70%) and a testing set (n=300, 30%). The process for mining linguistic patterns was divided into three steps. First, free text in the training set was encoded as single-letter sequences (P: phenotype, A: attribute). Second, a biological sequence analysis tool, MEME (Multiple Expectation Maximums for Motif Elicitation), was used to identify motifs in the single-letter sequences. Finally, the identified motifs were reduced to a series of regular expressions representing linguistic patterns of PhenoSSU instances in Chinese EHRs.
Based on the discovered linguistic patterns, we developed a deep-phenotyping method for Chinese EHRs, including a deep learning-based method for named entity recognition and a pattern recognition-based method for attribute prediction. RESULTS: In total, 51 sequence motifs with statistical significance were mined from 700 Chinese EHRs in the training set and were combined into six regular expressions. It was found that these six regular expressions could be learned from a mean of 134 (SD 9.7) annotated EHRs in the training set. The deep-phenotyping algorithm for Chinese EHRs could recognize PhenoSSU instances with an overall accuracy of 0.844 on the test set. For the subtask of entity recognition, the algorithm achieved an F1 score of 0.898 with the Bidirectional Encoder Representations from Transformers-bidirectional long short-term memory and conditional random field model; for the subtask of attribute prediction, the algorithm achieved a weighted accuracy of 0.940 with the linguistic pattern-based method. CONCLUSIONS: We developed a simple but effective strategy to perform deep phenotyping of Chinese EHRs with limited fine-grained annotation data. Our work will promote the secondary use of Chinese EHRs and give inspiration to other non-English-speaking countries.
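The pattern-matching step described above (encode text as single-letter sequences, then apply regular expressions) can be sketched as follows. This is an illustrative sketch only: the six regular expressions from the study are not given in the abstract, so the pattern `A*PA*` is a hypothetical stand-in, as are the letter "O" for tokens that are neither phenotype nor attribute, the function names, and the English toy tokens.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical pattern: one phenotype with any directly adjacent attributes.
PHENOSSU_PATTERN = re.compile(r"A*PA*")


def encode(labels: List[str]) -> str:
    """Map token labels to one letter each: P (phenotype), A (attribute), O (other)."""
    return "".join(label if label in ("P", "A") else "O" for label in labels)


def find_units(tokens: List[Tuple[str, str]]) -> List[Dict[str, object]]:
    """Group each phenotype token with the attribute tokens matched around it."""
    sequence = encode([label for _, label in tokens])
    units = []
    for match in PHENOSSU_PATTERN.finditer(sequence):
        # One letter per token, so match offsets index directly into `tokens`.
        span = tokens[match.start():match.end()]
        phenotype = next(text for text, label in span if label == "P")
        attributes = [text for text, label in span if label == "A"]
        units.append({"phenotype": phenotype, "attributes": attributes})
    return units


tokens = [
    ("patient", "O"), ("reports", "O"), ("severe", "A"), ("cough", "P"),
    ("for", "O"), ("three days", "A"), ("and", "O"), ("fever", "P"),
]
print(encode([label for _, label in tokens]))
print(find_units(tokens))
```

In this toy run the attribute "three days" is not attached to "cough" because an "O" token separates them; handling such gaps is the kind of variation that additional mined patterns would need to cover.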


Subject(s)
Electronic Health Records , Medical Informatics , Algorithms , Humans , Phenotype , Semantics
5.
IEEE J Biomed Health Inform ; 26(8): 4142-4152, 2022 08.
Article in English | MEDLINE | ID: mdl-35609107

ABSTRACT

Electronic health record (EHR) resources are valuable but remain underexplored because most clinical information, especially phenotype information, is buried in the free text of EHRs. An intelligent annotation tool plays an important role in unlocking the full potential of EHRs by transforming free-text phenotype information into a computer-readable form. Deep phenotyping has shown its advantage in representing phenotype information in EHRs with high fidelity; however, most existing annotation tools are not suitable for the deep phenotyping task. Here, we developed an intelligent annotation tool named PIAT with a major focus on the deep phenotyping of Chinese EHRs. PIAT can improve the annotation efficiency for EHR-based deep phenotyping with a simple but effective interactive interface, automatic preannotation support, and a learning mechanism. Specifically, experts can proofread automatic annotation results from the annotation algorithm in the web-based interactive interface, and EHRs reviewed by experts can be used for evolving the underlying annotation algorithm. In this way, the annotation process of deep phenotyping EHRs will become easier. In conclusion, we create a powerful intelligent system for the deep phenotyping of Chinese EHRs. It is hoped that our work will inspire further studies in constructing intelligent systems for deep phenotyping English and non-English EHRs.


Subject(s)
Algorithms , Electronic Health Records , China , Phenotype
6.
Front Endocrinol (Lausanne) ; 13: 839829, 2022.
Article in English | MEDLINE | ID: mdl-35282438

ABSTRACT

Objective: The purpose of this study was to predict elevated TSH levels by developing an effective machine learning model based on large-scale physical examination results. Methods: Subjects who underwent general physical examinations from January 2015 to December 2019 were enrolled in this study. A total of 21 clinical parameters were analyzed, including six demographic parameters (sex, age, etc.) and 15 laboratory parameters (thyroid peroxidase antibody (TPO-Ab), thyroglobulin antibody (TG-Ab), etc.). The risk factors for elevated TSH levels identified in the univariate and multivariate logistic analyses were used to construct machine learning models. Four machine learning models were trained to predict the outcome of elevated TSH levels one year/two years after patient enrollment, including decision tree (DT), linear regression (LR), eXtreme Gradient boosting (XGBoost), and support vector machine (SVM). Feature importance was calculated in the machine learning models to show which parameter plays a vital role in predicting elevated TSH levels. Results: A total of 12,735 individuals were enrolled in this study. Univariate and multivariate logistic regression analyses showed that elevated TSH levels were significantly correlated with gender, FT3/FT4, total cholesterol (TC), TPO-Ab, TG-Ab, creatinine (Cr), and triglycerides (TG). Among the four machine learning models, XGBoost performed best in the one-year task of predicting elevated TSH levels (AUC 0.87 ± 0.03). The most critical feature in this model was FT3/FT4, followed by TPO-Ab and other clinical parameters. In the two-year task of predicting elevated TSH levels, none of the four models performed well. Conclusions: In this study, we trained an effective XGBoost model for predicting elevated TSH levels one year after patient enrollment. The measurement of FT3 and FT4 could provide an early warning of elevated TSH levels to prevent related thyroid diseases.


Subject(s)
Thyrotropin , Thyroxine , Humans , Machine Learning , Physical Examination , Triiodothyronine
8.
J Med Internet Res ; 23(6): e26892, 2021 06 15.
Article in English | MEDLINE | ID: mdl-34128811

ABSTRACT

BACKGROUND: Phenotypes characterize the clinical manifestations of diseases and provide important information for diagnosis. Therefore, the construction of phenotype knowledge graphs for diseases is valuable to the development of artificial intelligence in medicine. However, phenotype knowledge graphs in current knowledge bases such as WikiData and DBpedia are coarse-grained knowledge graphs because they only consider the core concepts of phenotypes while neglecting the details (attributes) associated with these phenotypes. OBJECTIVE: To characterize the details of disease phenotypes for clinical guidelines, we proposed a fine-grained semantic information model named PhenoSSU (semantic structured unit of phenotypes). METHODS: PhenoSSU is an "entity-attribute-value" model by its very nature, and it aims to capture the full semantic information underlying phenotype descriptions with a series of attributes and values. A total of 193 clinical guidelines for infectious diseases from Wikipedia were selected as the study corpus, and 12 attributes from SNOMED-CT were introduced into the PhenoSSU model based on the co-occurrences of phenotype concepts and attribute values. The expressive power of the PhenoSSU model was evaluated by analyzing whether PhenoSSU instances could capture the full semantics underlying the descriptions of the corresponding phenotypes. To automatically construct fine-grained phenotype knowledge graphs, a hybrid strategy that first recognized phenotype concepts with the MetaMap tool and then predicted the attribute values of phenotypes with machine learning classifiers was developed. RESULTS: Fine-grained phenotype knowledge graphs of 193 infectious diseases were manually constructed with the BRAT annotation tool. 
A total of 4020 PhenoSSU instances were annotated in these knowledge graphs, and 3757 of them (89.5%) were found to be able to capture the full semantics underlying the descriptions of the corresponding phenotypes listed in clinical guidelines. By comparison, other information models, such as the clinical element model and the HL7 fast health care interoperability resource model, could only capture the full semantics underlying 48.4% (2034/4020) and 21.8% (914/4020) of the descriptions of phenotypes listed in clinical guidelines, respectively. The hybrid strategy achieved an F1-score of 0.732 for the subtask of phenotype concept recognition and an average weighted accuracy of 0.776 for the subtask of attribute value prediction. CONCLUSIONS: PhenoSSU is an effective information model for the precise representation of phenotype knowledge for clinical guidelines, and machine learning can be used to improve the efficiency of constructing PhenoSSU-based knowledge graphs. Our work will potentially shift the focus of medical knowledge engineering from a coarse-grained level to a more fine-grained level.


Subject(s)
Communicable Diseases , Semantics , Artificial Intelligence , Communicable Diseases/diagnosis , Humans , Pattern Recognition, Automated , Phenotype
9.
Biosaf Health ; 2(4): 206-209, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32905055

ABSTRACT

Since coronavirus disease 2019 (COVID-19) might circulate in the following seasons, it is essential to understand how COVID-19 influences other respiratory diseases, especially influenza. In this study, we analyzed influenza activity from mid-November 2019 to March 2020 in the Chinese mainland and found that the influenza season ended much earlier than previous seasons for all subtypes and lineages, which may have resulted from the circulation of COVID-19 and measures such as travel control and personal protection. These findings provide rudimentary knowledge of the co-circulation patterns of the two types of viruses.

10.
Bioinformatics ; 36(10): 3251-3253, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32049310

ABSTRACT

MOTIVATION: Newly emerging influenza viruses keep challenging global public health. To evaluate the potential risk of the viruses, it is critical to rapidly determine the phenotypes of the viruses, including the antigenicity, host, virulence and drug resistance. RESULTS: Here, we built FluPhenotype, a one-stop platform to rapidly determine the phenotypes of influenza A viruses. The input of FluPhenotype is the complete or partial genomic/protein sequences of the influenza A viruses. The output presents five types of information about the viruses: (i) sequence annotation including the gene and protein names as well as the open reading frames, (ii) potential hosts and human-adaptation-associated amino acid markers, (iii) antigenic and genetic relationships with the vaccine strains of different HA subtypes, (iv) mammalian virulence-related amino acid markers and (v) drug resistance-related amino acid markers. FluPhenotype will be a useful bioinformatic tool for surveillance and early warnings of newly emerging influenza A viruses. AVAILABILITY AND IMPLEMENTATION: It is publicly available from: http://www.computationalbiology.cn:18888/IVEW. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Influenza A virus , Influenza, Human , Orthomyxoviridae , Amino Acid Sequence , Animals , Hemagglutinin Glycoproteins, Influenza Virus , Humans , Influenza A virus/genetics
11.
J Biomed Inform ; 102: 103372, 2020 02.
Article in English | MEDLINE | ID: mdl-31901507

ABSTRACT

BACKGROUND: A wealth of clinical information is buried in the free text of electronic health records (EHRs), and converting clinical information to a machine-understandable form is crucial for the secondary use of EHRs. Laboratory test results, as one of the most important types of clinical information, are written in various styles in the free text of EHRs. This creates great difficulties for data integration and utilization of EHRs. Therefore, developing technology to normalize different expressions of laboratory test results in free text is indispensable for the secondary use of EHRs. METHODS: In this study, we developed a knowledge-based method named LATTE (transforming lab test results), which could transform various expressions of laboratory test results into a normalized and machine-understandable format. We first identified the analyte of a laboratory test result with a dictionary-based method and then designed a series of rules to detect information associated with the analyte, including its specimen, measured value, unit of measure, conclusive phrase and sampling factor. We determined whether a test result is normal or abnormal by understanding the meaning of conclusive phrases or by comparing its measured value with an appropriate normal range. Finally, we converted various expressions of laboratory test results, either in numeric or textual form, into a normalized form as "specimen-analyte-abnormality". With this method, a laboratory test with the same type of abnormality would have the same representation, regardless of the way that it is mentioned in free text. RESULTS: LATTE was developed and optimized on a training set including 8894 laboratory test results from 756 EHRs, and evaluated on a test set including 3740 laboratory test results from 210 EHRs.
Compared to experts' annotations, LATTE achieved a precision of 0.936, a recall of 0.897 and an F1 score of 0.916 on the training set, and a precision of 0.892, a recall of 0.843 and an F1 score of 0.867 on the test set. For 223 laboratory tests with at least two different expression forms in the test set, LATTE transformed 85.7% (2870/3350) of laboratory test results into a normalized form. Besides, LATTE achieved F1 scores above 0.8 for EHRs from 18 of 21 different hospital departments, indicating its generalization capabilities in normalizing laboratory test results. CONCLUSION: In conclusion, LATTE is an effective method for normalizing various expressions of laboratory test results in free text of EHRs. LATTE will facilitate EHR-based applications such as cohort querying, patient clustering and machine learning. AVAILABILITY: LATTE is freely available for download on GitHub (https://github.com/denglizong/LATTE).
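The normalization idea described in the METHODS above (find the analyte by dictionary lookup, decide abnormality from a measured value or a conclusive phrase, emit "specimen-analyte-abnormality") can be sketched in a few lines. This is a toy illustration and not LATTE's actual rule set, which is available from the GitHub repository cited above; the dictionary contents, the reference ranges (illustrative values only, not clinical guidance), the phrase list, and the function name `normalize` are all assumptions.

```python
import re
from typing import Optional

# Hypothetical analyte dictionary: name -> (specimen, lower bound, upper bound).
ANALYTES = {
    "potassium": ("serum", 3.5, 5.0),
    "hemoglobin": ("blood", 120.0, 160.0),
}

# Hypothetical conclusive phrases mapped to an abnormality label.
PHRASES = {
    "elevated": "high", "increased": "high",
    "decreased": "low", "reduced": "low",
    "normal": "normal",
}

NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def normalize(mention: str) -> Optional[str]:
    """Return 'specimen-analyte-abnormality', or None if the mention is not understood."""
    text = mention.lower()
    analyte = next((name for name in ANALYTES if name in text), None)
    if analyte is None:
        return None
    specimen, low, high = ANALYTES[analyte]

    number = NUMBER_RE.search(text)
    if number:
        # Numeric form: compare the first number found with the reference range.
        value = float(number.group())
        status = "low" if value < low else "high" if value > high else "normal"
    else:
        # Textual form: look for a whole-word conclusive phrase.
        status = next(
            (label for phrase, label in PHRASES.items()
             if re.search(rf"\b{phrase}\b", text)),
            None,
        )
        if status is None:
            return None
    return f"{specimen}-{analyte}-{status}"


print(normalize("Potassium 6.1 mmol/L"))
print(normalize("elevated potassium"))
```

Both toy mentions map to the same normalized string, which is the property the abstract highlights: one representation per abnormality, regardless of how the result was written.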


Subject(s)
Clinical Laboratory Techniques/standards , Electronic Health Records , China , Humans , Knowledge Bases , Machine Learning
12.
Sci Rep ; 7(1): 14295, 2017 10 30.
Article in English | MEDLINE | ID: mdl-29085020

ABSTRACT

Many host-specific mutations have been detected in influenza A viruses (IAVs). However, their effects on hydrogen bond (H-bond) variations have rarely been investigated. In this study, 60 host-specific sites were identified in the internal proteins of avian and human IAVs, 27 of which contained mutations with effects on H-bonds. In addition, 30 group-specific sites were detected in HA and NA. Twenty-six of 36 mutations existing at these group-specific sites caused H-bond loss or formation in at least one subtype. The number of mutations in isolates of 2009 pandemic H1N1, human-infecting H5N1 and H7N9 varied. The combinations of mutations and H-bond changes in these three subtypes of IAVs were also different. In addition, the mutations in isolates of H5N1 were more scattered than those in 2009 pandemic H1N1 and H7N9. Eight wave-specific mutations in isolates of the fifth H7N9 wave were also identified. Three of them, R140K in HA, Y170H in NA, and R340K in PB2, were capable of resulting in H-bond loss. These host-, group- and wave-specific H-bond variations provide a new perspective for understanding the changes of structural features in the human adaptation of IAVs.


Subject(s)
Hemagglutinin Glycoproteins, Influenza Virus/genetics , Influenza A Virus, H1N1 Subtype/genetics , Influenza A Virus, H5N1 Subtype/genetics , Influenza A Virus, H7N9 Subtype/genetics , Neuraminidase/genetics , Virus Attachment , Adaptation, Physiological , Humans , Hydrogen Bonding , Influenza A Virus, H1N1 Subtype/metabolism , Influenza A Virus, H5N1 Subtype/metabolism , Influenza A Virus, H7N9 Subtype/metabolism , Mutation/genetics
13.
Bioinformatics ; 33(12): 1881-1882, 2017 Jun 15.
Article in English | MEDLINE | ID: mdl-28174895

ABSTRACT

MOTIVATION: Previously, we developed a computational model to identify genomic co-occurrence networks that was applied to capture the coevolution patterns within genomes of influenza viruses. To facilitate easy public use of this model, an R package 'cooccurNet' is presented here. RESULTS: 'cooccurNet' includes functionalities for the construction and analysis of residue (e.g. nucleotides, amino acids and SNPs) co-occurrence networks. In addition, a new method for measuring residue coevolution, defined as the residue co-occurrence score (RCOS), is proposed and implemented in 'cooccurNet' based on the co-occurrence network. AVAILABILITY AND IMPLEMENTATION: 'cooccurNet' is publicly available on CRAN repositories under the GPL-3 Open Source License ( http://cran.r-project.org/package=cooccurNet ). CONTACT: taijiao@ibms.pumc.edu.cn or pys2013@hnu.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computer Simulation , Genome, Viral , Genomics/methods , Orthomyxoviridae/genetics , Software , Evolution, Molecular , Polymorphism, Single Nucleotide
14.
Bioinformatics ; 32(16): 2526-7, 2016 08 15.
Article in English | MEDLINE | ID: mdl-27153622

ABSTRACT

MOTIVATION: Timely surveillance of the antigenic dynamics of the influenza virus is critical for accurate selection of vaccine strains, which is important for effective prevention of viral spread and infection. RESULTS: Here, we provide a computational platform, called PREDAC-H3, for antigenic surveillance of human influenza A(H3N2) virus based on the sequence of surface protein hemagglutinin (HA). PREDAC-H3 not only determines the antigenic variants and antigenic cluster (grouped for similar antigenicity) to which the virus belongs, based on HA sequences, but also allows visualization of the spatial distribution and temporal dynamics of antigenic clusters of viruses isolated from around the world, thus assisting in antigenic surveillance of human influenza A(H3N2) virus. AVAILABILITY AND IMPLEMENTATION: It is publicly available from: http://biocloud.hnu.edu.cn/influ411/html/index.php. CONTACT: yshu@cnic.org.cn or taijiao@moon.ibp.ac.cn.


Subject(s)
Computational Biology/methods , Epidemiological Monitoring , Hemagglutinins , Influenza A Virus, H3N2 Subtype , Influenza, Human/epidemiology , Sequence Analysis, DNA , Antigenic Variation , Antigens, Viral , DNA, Viral , Hemagglutinin Glycoproteins, Influenza Virus , Humans , Influenza A virus
15.
Nat Mater ; 15(2): 217-26, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26569474

ABSTRACT

The notion that animals can detect the Earth's magnetic field was once ridiculed, but is now well established. Yet the biological nature of this magnetosensing phenomenon remains unknown. Here, we report a putative magnetic receptor (Drosophila CG8198, here named MagR) and a multimeric magnetosensing rod-like protein complex, identified by theoretical postulation and genome-wide screening, and validated with cellular, biochemical, structural and biophysical methods. The magnetosensing complex consists of the identified putative magnetoreceptor and known magnetoreception-related photoreceptor cryptochromes (Cry), has the attributes of both Cry- and iron-based systems, and exhibits spontaneous alignment in magnetic fields, including that of the Earth. Such a protein complex may form the basis of magnetoreception in animals, and may lead to applications across multiple fields.


Subject(s)
Iron-Sulfur Proteins/metabolism , Magnetics , Animals , Antibodies , Biocompatible Materials , Biophysics , Columbidae/metabolism , Computer Simulation , Drosophila melanogaster/metabolism , Gene Expression Regulation , Genome-Wide Association Study , Iron-Sulfur Proteins/genetics , Microscopy, Electron , Models, Molecular , Mutagenesis , Protein Conformation , Protein Transport , RNA, Messenger/genetics , RNA, Messenger/metabolism , Retina/metabolism
18.
Bioinformatics ; 30(17): 2440-6, 2014 Sep 01.
Article in English | MEDLINE | ID: mdl-24813541

ABSTRACT

MOTIVATION: Protein domains are fundamental units of protein structure, function and evolution; thus, it is critical to gain a deep understanding of protein domain organization. Previous works have attempted to identify key residues involved in organization of domain architecture. Because one of the most important characteristics of domain architecture is the arrangement of secondary structure elements (SSEs), here we present a picture of domain organization through an integrated consideration of SSE arrangements and residue contact networks. RESULTS: In this work, by representing SSEs as main-chain scaffolds and side-chain interfaces and through construction of residue contact networks, we have identified the SSE interfaces well packed within protein domains as SSE packing clusters. In total, 17 334 SSE packing clusters were recognized from 9015 Structural Classification of Proteins domains of <40% sequence identity. The similar SSE packing clusters were observed not only among domains of the same folds, but also among domains of different folds, indicating their roles as common scaffolds for organization of protein domains. Further analysis of 14 small single-domain proteins reveals a high correlation between the SSE packing clusters and the folding nuclei. Consistent with their important roles in domain organization, SSE packing clusters were found to be more conserved than other regions within the same proteins. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Protein Structure, Secondary , Protein Structure, Tertiary , Computational Biology/methods , Humans , Models, Molecular , Protein Kinases/chemistry
20.
PLoS One ; 9(2): e89935, 2014.
Article in English | MEDLINE | ID: mdl-24587135

ABSTRACT

Many template-based modeling (TBM) methods have been developed in recent years that allow for protein structure prediction and for the study of structure-function relationships for proteins. One major problem all TBM algorithms face, however, is their unsatisfactory performance when the proteins under consideration have low homology to known templates. To improve the performance of TBM methods for such targets, a novel model evaluation method was developed here, and named MEFTop. Our method focuses on evaluating the topology by using two novel groups of features: secondary structure element (SSE) contact information and 3-dimensional topology information. By combining the MEFTop algorithm with FR-t5, a threading program developed by our group, we found that this modified TBM program, which was named FR-t5-M, exhibited significant improvements in predictive abilities for low-homology protein targets. We further showed that MEFTop could be a generalized method to improve threading programs for low-homology protein targets. The software (FR-t5-M and MEFTop) is available to non-commercial users at our website: http://jianglab.ibp.ac.cn/lims/FRt5M/FRt5M.html.


Subject(s)
Algorithms , Models, Molecular , Proteins/chemistry , Software , Structural Homology, Protein , Protein Conformation