Results 1 - 7 of 7
2.
Sci Rep ; 13(1): 20304, 2023 Nov 20.
Article in English | MEDLINE | ID: mdl-37985846

ABSTRACT

Because of the limited effectiveness of prevailing phylogenetic methods when applied to highly divergent protein sequences, the phylogenetic analysis problem remains challenging. Here, we propose a sequence-based evolutionary distance algorithm termed sequence distance (SD), which innovatively incorporates site-to-site correlation within protein sequences into the distance estimation. In protein superfamilies, SD can effectively distinguish evolutionary relationships both within and between protein families, producing phylogenetic trees that closely align with those based on structural information, even with sequence identity less than 20%. SD is highly correlated with the similarity of the protein structure, and can calculate evolutionary distances for thousands of protein pairs within seconds using a single CPU, which is significantly faster than most protein structure prediction methods that demand high computational resources and long run times. The development of SD will significantly advance phylogenetics, providing researchers with a more accurate and reliable tool for exploring evolutionary relationships.


Subject(s)
Biological Evolution , Evolution, Molecular , Phylogeny , Sequence Alignment , Proteins/genetics , Proteins/chemistry , Algorithms
3.
PLoS One ; 12(3): e0173583, 2017.
Article in English | MEDLINE | ID: mdl-28273143

ABSTRACT

The heat-tolerance mechanisms of (hyper)thermophilic proteins provide a unique opportunity to investigate the unsolved protein folding problem. To determine whether the interval between residues in sequence plays a role in thermostability, we constructed a sequence interval-dependent value function to calculate the residue pair frequency. Using statistical analysis of a large sequence database, we identified a new sequence arrangement pattern in which like-charged residues tend to be adjacent, while unlike-charged residues are distributed over longer intervals. This finding indicated that increasing the intervals between unlike-charged residues can increase protein thermostability, with the arrangement patterns of these charged residues serving as thermodynamically favorable nucleation points for protein folding. We also found that the residue pairs K-E, R-E, L-V and V-V separated by long sequence intervals play important roles in increasing protein thermostability. This work demonstrated a novel approach that treats sequence intervals as keys to understanding protein folding. The relationships we found between residue arrangement and protein thermostability can be used in industry and academia to aid the design of thermostable proteins.
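The abstract does not define the sequence interval-dependent value function, so the sketch below only illustrates the underlying idea of counting residue pairs at a fixed sequence interval. The function name `pair_counts` and the toy sequence are hypothetical, and plain counting is a stand-in assumption rather than the authors' function.

```python
from collections import Counter

def pair_counts(sequence, interval):
    """Count ordered residue pairs (a, b) where b lies `interval` positions after a."""
    return Counter(
        (sequence[i], sequence[i + interval])
        for i in range(len(sequence) - interval)
    )

# Made-up toy sequence, used only to show the call shape.
toy = "KEKERVVL"
adjacent = pair_counts(toy, 1)   # pairs of neighbouring residues
spaced = pair_counts(toy, 2)     # pairs separated by one residue

print(adjacent[("K", "E")], spaced[("K", "K")])
```

Dividing each count by the total number of pairs at that interval would give a frequency that can be compared across intervals.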


Subject(s)
Models, Molecular , Protein Folding , Proteins/chemistry , Amino Acid Sequence , Amino Acids/chemistry , Thermodynamics
4.
Sci Rep ; 6: 21280, 2016 Feb 17.
Article in English | MEDLINE | ID: mdl-26883082

ABSTRACT

In the early stages of infection, Human Immunodeficiency Virus Type 1 (HIV-1) generally selects CCR5 as the primary coreceptor for entering the host cell. As infection progresses, the virus evolves and may exhibit a coreceptor switch to CXCR4. Accurate determination of coreceptor usage and identification of key mutational patterns associated with the tropism switch are essential for selecting appropriate therapies and understanding the mechanism of coreceptor change. We developed a classifier composed of two coreceptor-specific weight matrices (CMs) based on a full-scale dataset. For this classifier, we found an AUC of 0.97, an accuracy of 95.21% and an MCC of 0.885 (sensitivity 92.92%; specificity 95.54%) in a ten-fold cross-validation, outperforming all other methods on an independent dataset (13% higher MCC value than geno2pheno and 15% higher MCC value than PSSM). A web server (http://spg.med.tsinghua.edu.cn/CM.html) based on our classifier is provided. Patterns of genetic mutations that occur along with coreceptor transitions were further identified based on the score of each sequence. Six pairs of one-AA mutational patterns and three pairs of two-AA mutational patterns were found to be associated with an increasing propensity for X4 tropism. These mutational patterns offer new insights into the mechanism of coreceptor switch and can aid in monitoring it.
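For readers unfamiliar with the MCC (Matthews correlation coefficient) reported above, the following is a minimal sketch of its standard definition from a binary confusion matrix. The counts are hypothetical and are not taken from the paper, which does not give its confusion matrix in the abstract.

```python
import math

def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0

# Hypothetical counts, chosen only to illustrate the call.
example = mcc(tp=90, fn=10, tn=95, fp=5)
print(round(example, 3))
```

Unlike accuracy, the MCC stays informative when the two classes are of very different sizes, which is one reason it is often reported alongside sensitivity and specificity.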


Subject(s)
HIV Infections/genetics , HIV Infections/virology , HIV-1/physiology , Mutation , Receptors, CCR5/genetics , Receptors, HIV/genetics , Viral Tropism , Algorithms , Computational Biology/methods , Datasets as Topic , HIV Infections/metabolism , Humans , ROC Curve , Receptors, CCR5/metabolism , Receptors, HIV/metabolism , Reproducibility of Results
5.
PLoS One ; 9(6): e100081, 2014.
Article in English | MEDLINE | ID: mdl-24925130

ABSTRACT

Accurate estimates of HIV-1 incidence are essential for monitoring epidemic trends and evaluating intervention efforts. However, the long asymptomatic stage of HIV-1 infection makes it difficult to effectively distinguish incident infections from chronic ones. Current incidence assays based on serology or viral sequence diversity both still lack accuracy. In the present work, a sequence clustering based diversity (SCBD) assay was devised by utilizing the fact that viral sequences derived from each transmitted/founder (T/F) strain tend to cluster together at an early stage, and that only the intra-cluster diversity is correlated with the time since HIV-1 infection. Dot-matrix pairwise alignment was used to eliminate the disproportionate impact of insertions/deletions (indels) and recombination events, and the proportion of clusterable sequences (Pc) was used as an index to identify late chronic infections with declined viral genetic diversity. Tested on a dataset containing 398 incident and 163 chronic infection cases collected from the Los Alamos HIV database (last modified 2/8/2012), our SCBD method achieved 99.5% sensitivity and 98.8% specificity, with an overall accuracy of 99.3%. Further analysis and evaluation also suggested that its performance was not affected by factors such as viral subtype and transmission route. The SCBD method demonstrated the potential of sequencing-based techniques to become useful for identifying incident infections. Its use may be most advantageous for settings with low to moderate incidence relative to available resources. The online service is available at http://www.bioinfo.tsinghua.edu.cn:8080/SCBD/index.jsp.
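The reported rates can be checked against the stated case counts. The sketch below assumes that 396 of the 398 incident cases and 161 of the 163 chronic cases were classified correctly; these counts are inferred from the rounded percentages and are not stated in the abstract.

```python
def rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# 398 incident and 163 chronic cases are from the abstract;
# the split into correct and incorrect calls is an inferred assumption.
sens, spec, acc = rates(tp=396, fn=2, tn=161, fp=2)
print(f"{sens:.1%} {spec:.1%} {acc:.1%}")  # prints "99.5% 98.8% 99.3%"
```

Under these assumed counts, the three figures are mutually consistent: 557 correct calls out of 561 cases gives the stated overall accuracy.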


Subject(s)
HIV Infections/epidemiology , HIV-1/genetics , Models, Statistical , Serogroup , AIDS Serodiagnosis , HIV Infections/transmission , Humans , Incidence , Sequence Alignment
6.
PLoS One ; 7(6): e37653, 2012.
Article in English | MEDLINE | ID: mdl-22723837

ABSTRACT

Because of the increasing gap between the data from sequencing and structural genomics, the accurate prediction of the structural class of a protein domain solely from the primary sequence has remained a challenging problem in structural biology. Traditional sequence-based predictors generally select several sequence features and then feed them directly into a classification program to identify the structural class. The current best sequence-based predictor achieved an overall accuracy of 74.1% when tested on a widely used, non-homologous benchmark dataset, 25PDB. In the present work, we built a multiple linear regression (MLR) model to convert the 440-dimensional (440D) sequence feature vector extracted from the Position Specific Scoring Matrix (PSSM) of a protein domain to a 4-dimensional (4D) structural feature vector, which could then be used to predict the four major structural classes. We performed 10-fold cross-validation and jackknife tests of the method on a large non-homologous dataset containing 8,244 domains distributed among the four major classes. Our approach outperformed all of the existing sequence-based methods with an overall accuracy of 83.1%, which is even higher than that of methods based on predicted secondary structure.


Subject(s)
Computational Biology/methods , Proteins/chemistry , Proteins/classification , Protein Conformation , Protein Structure, Tertiary
7.
J Biol Chem ; 283(46): 31690-6, 2008 Nov 14.
Article in English | MEDLINE | ID: mdl-18779322

ABSTRACT

A double mutant cycle (DMC) approach was employed to estimate the effect of temperature on the contribution of two highly conserved salt bridges to protein stability in the hyperthermophilic protein Ssh10b. The coupling free energies were 2.4 ± 0.4 kJ/mol at 298 K and 2.2 ± 0.4 kJ/mol at 353 K for Glu-54/Arg-57, and 6.0 ± 0.2 kJ/mol at 298 K and 5.9 ± 0.6 kJ/mol at 353 K for Glu-36/Lys-68. The stability free energy of Ssh10b decreases greatly with increasing temperature, while the direct contribution of these two salt bridges to protein stability remains almost constant, providing evidence supporting the theoretical prediction that salt bridges are extremely resilient to temperature increases and thus are especially suited to improving protein stability at high temperatures. The reason for the difference in coupling free energy between salt bridges Glu-54/Arg-57 and Glu-36/Lys-68 is discussed. Comparing our results with published DMC data for the contribution of salt bridges to stability in other proteins, we found that the energy contribution of a salt bridge formed by two charged residues far apart in the primary sequence is higher than that of a salt bridge formed between two residues close together in sequence. This finding is useful for engineering proteins with enhanced thermostability.
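In a double mutant cycle, the coupling free energy is commonly computed from the unfolding free energies of the wild type, the two single mutants, and the double mutant. A minimal sketch under that common convention follows. The function names and numeric inputs are hypothetical and are not Ssh10b measurements. The sign convention used in the paper may differ, and the error estimate assumes independent uncertainties.

```python
import math

def coupling_free_energy(dg_wt, dg_mut_a, dg_mut_b, dg_double):
    """Coupling free energy (kJ/mol) from four unfolding free energies.

    Computed as the effect of mutation A in the wild-type background
    minus its effect in the background that already carries mutation B.
    """
    return (dg_wt - dg_mut_a) - (dg_mut_b - dg_double)

def combined_error(*errors):
    """Combine independent standard errors in quadrature."""
    return math.sqrt(sum(e * e for e in errors))

# Hypothetical unfolding free energies in kJ/mol, for illustration only.
ddg_int = coupling_free_energy(30.0, 26.0, 27.0, 25.0)
err = combined_error(0.2, 0.2, 0.2, 0.2)
print(ddg_int, round(err, 2))
```

If the two mutated residues do not interact, the effects of the single mutations are additive and the coupling free energy is zero. A nonzero value is attributed to the pairwise interaction, here the salt bridge.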


Subject(s)
Archaeal Proteins/chemistry , Archaeal Proteins/metabolism , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/metabolism , Salts/chemistry , Sulfolobus/chemistry , Sulfolobus/metabolism , Amino Acid Sequence , Archaeal Proteins/genetics , Archaeal Proteins/isolation & purification , Crystallography, X-Ray , DNA-Binding Proteins/genetics , DNA-Binding Proteins/isolation & purification , Gene Expression , Models, Molecular , Molecular Sequence Data , Mutation/genetics , Protein Folding , Protein Multimerization , Protein Structure, Quaternary , Sequence Alignment , Sequence Homology, Amino Acid , Sulfolobus/genetics , Temperature , Thermodynamics