Results 1 - 15 of 15
1.
PeerJ ; 11: e16600, 2023.
Article in English | MEDLINE | ID: mdl-38089911

ABSTRACT

DNA 5-methylcytosine (5mC) is widely present in multicellular eukaryotes and plays important roles in various developmental and physiological processes and in a wide range of human diseases. Thus, it is essential to accurately detect 5mC sites. Although current sequencing technologies can map genome-wide 5mC sites, these experimental methods are both costly and time-consuming. To achieve a fast and accurate prediction of 5mC sites, we propose a new computational approach, BERT-5mC. First, we pre-trained a domain-specific BERT (bidirectional encoder representations from transformers) model by using human promoter sequences as the language corpus. BERT is a deep bidirectional language representation model based on the Transformer architecture. Second, we fine-tuned the domain-specific BERT model on the 5mC training dataset to build the model. The cross-validation results show that our model achieves an AUROC of 0.966, which is higher than other state-of-the-art methods such as iPromoter-5mC, 5mC_Pred, and BiLSTM-5mC. Furthermore, our model was evaluated on the independent test set, where it achieves an AUROC of 0.966 that is also higher than other state-of-the-art methods. Moreover, we analyzed the attention weights generated by BERT to identify a number of nucleotide distributions that are closely associated with 5mC modifications. To facilitate the use of our model, we built a webserver which can be freely accessed at: http://5mc-pred.zhulab.org.cn.


Subject(s)
5-Methylcytosine , DNA , Humans , DNA/genetics , Electric Power Supplies , Eukaryota , Language
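The AUROC values quoted in the abstract above can be read as the probability that a randomly chosen positive site receives a higher score than a randomly chosen negative site. The following is a minimal Python sketch of that rank-based calculation. The helper name `auroc` and the toy labels and scores are invented for illustration; they are not taken from BERT-5mC or its datasets.

```python
def auroc(labels, scores):
    """Rank-based AUROC: fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy data: 1 = methylated site, 0 = unmethylated site.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = auroc(toy_labels, toy_scores)  # 8 of 9 pairs are ordered correctly
```

The pairwise loop is quadratic in the number of examples, so it is suited to illustration only; library implementations typically sort the scores once instead.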
2.
ACS Omega ; 8(44): 41930-41942, 2023 Nov 07.
Article in English | MEDLINE | ID: mdl-37969991

ABSTRACT

As one of the most important post-translational modifications (PTMs), lysine acetylation (Kace) plays an important role in various biological activities. Traditional experimental methods for identifying Kace sites are inefficient and expensive. Instead, several machine learning methods have been developed for Kace site prediction, and hand-crafted features have been used to encode the protein sequences. However, two challenges remain: the complex biological information may be under-represented by these hand-crafted features, and the small-sample issue of some species needs to be addressed. We propose a novel model, MSTL-Kace, which was developed based on a transfer learning strategy with a pretrained bidirectional encoder representations from transformers (BERT) model. In this model, the high-level embeddings were extracted from species-specific BERT models, and a two-stage fine-tuning strategy was used to deal with the small-sample issue. Specifically, a domain-specific BERT model was pretrained using all of the sequences in our data sets and was then fine-tuned, or two-stage fine-tuned, on the training data set of each species to obtain the species-specific BERT models. Afterward, the embeddings of residues were extracted from the fine-tuned model and fed to different downstream learning algorithms. After comparison, the best model for the six prokaryotic species was built by using a random forest. The results for the independent test sets show that our model outperforms the state-of-the-art methods on all six species. The source codes and data for MSTL-Kace are available at https://github.com/leo97king/MSTL-Kace.

3.
Plants (Basel) ; 12(19), 2023 Sep 23.
Article in English | MEDLINE | ID: mdl-37836107

ABSTRACT

Weeds seriously affect the yield and quality of crops. Because manual weeding is time-consuming and laborious, the use of herbicides becomes an effective way to solve the harm caused by weeds in fields. Both 5-enolpyruvyl shikimate-3-phosphate synthetase (EPSPS) and acetyltransferase genes (bialaphos resistance, BAR) are widely used to improve crop resistance to herbicides. However, cotton, as the most important natural fiber crop, is not tolerant to herbicides in China, and the EPSPS and BAR family genes have not yet been characterized in cotton. Therefore, we explore the genes of these two families to provide candidate genes for the study of herbicide resistance mechanisms. In this study, 8, 8, 4, and 5 EPSPS genes and 6, 6, 5, and 5 BAR genes were identified in allotetraploid Gossypium hirsutum and Gossypium barbadense, diploid Gossypium arboreum and Gossypium raimondii, respectively. Members of the EPSPS and BAR families were classified into three subgroups based on the distribution of phylogenetic trees, conserved motifs, and gene structures. In addition, the promoter sequences of EPSPS and BAR family members included growth and development, stress, and hormone-related cis-elements. Based on the expression analysis, the family members showed tissue-specific expression and differed significantly in response to abiotic stresses. Finally, qRT-PCR analysis revealed that the expression levels of GhEPSPS3, GhEPSPS4, and GhBAR1 were significantly upregulated after exogenous spraying of herbicides. Overall, we characterized the EPSPS and BAR gene families of cotton at the genome-wide level, which will provide a basis for further studying the functions of EPSPS and BAR genes during growth and development and herbicide stress.

4.
J Proteome Res ; 22(3): 718-728, 2023 03 03.
Article in English | MEDLINE | ID: mdl-36749151

ABSTRACT

Neuropeptides play pivotal roles in different physiological processes and are related to different kinds of diseases. Identification of neuropeptides is of great benefit for studying the mechanism of these physiological processes and the treatment of neurological disorders. Several state-of-the-art neuropeptide predictors have been developed by using a two-layer stacking ensemble algorithm. Although the two-layer stacking ensemble algorithm can improve the feature representability, these models are complex and are not as efficient as models based on a single classifier. In this study, we proposed a new model, NeuroPpred-SVM, to predict neuropeptides based on the embeddings of Bidirectional Encoder Representations from Transformers and other sequential features by using a support vector machine (SVM). The experimental results indicate that our model achieved a cross-validation area under the receiver operating characteristic (AUROC) curve of 0.969 on the training data set and an AUROC of 0.966 on the independent test set. By comparing our model with four other state-of-the-art models, NeuroPIpred, PredNeuroP, NeuroPpred-Fuse, and NeuroPpred-FRL, on the independent test set, our model achieved the highest AUROC, Matthews correlation coefficient, accuracy, and specificity, which indicates that our model outperforms the existing models. We believe that NeuroPpred-SVM could be a useful tool for identifying neuropeptides with high accuracy and low cost. The data sets and Python code are available at https://github.com/liuyf-a/NeuroPpred-SVM.


Subject(s)
Neuropeptides , Support Vector Machine , Algorithms , ROC Curve , Area Under Curve
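The abstract above compares predictors by Matthews correlation coefficient, accuracy, and specificity. As a reminder of how these are derived from a binary confusion matrix, here is a short Python sketch. The function names and the counts are hypothetical and do not come from the NeuroPpred-SVM evaluation.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def specificity(tn, fp):
    """Fraction of true negatives among all actual negatives."""
    return tn / (tn + fp)


# Hypothetical counts for a balanced test set of 200 peptides.
tp, tn, fp, fn = 90, 85, 15, 10
toy_mcc = mcc(tp, tn, fp, fn)
toy_acc = accuracy(tp, tn, fp, fn)
toy_spec = specificity(tn, fp)
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, which is why it is often reported alongside AUROC when classes may be imbalanced.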
5.
Interdiscip Sci ; 15(2): 293-305, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36646842

ABSTRACT

Accurately detecting linear B-cell epitopes (BCEs) is of great value in vaccine design, immunodiagnostic tests, antibody production, and disease prevention and treatment. Wet-lab experiments for determining linear BCEs are both expensive and laborious and cannot meet the demands of modern large-scale protein sequence data. Instead, computational methods can efficiently identify linear BCEs at low cost. Although several computational methods are available, their performance is still not satisfactory. Thus, we propose a new method, LBCE-XGB, to predict linear BCEs based on the XGBoost algorithm. To represent the biological information concealed in peptide sequences, the embeddings of the residues were obtained from a pre-trained domain-specific BERT model. In addition, five other types of features, including amino acid composition and the amino acid antigenicity scale, were also extracted. The best feature combination was determined according to the cross-validation results. Against the models developed by other deep learning and machine learning algorithms, LBCE-XGB achieves the top performance with an AUROC of 0.845 for fivefold cross-validation. The results on the independent test set show that our model attains an AUROC of 0.838, which is substantially higher than other state-of-the-art methods. The outcomes indicate that the representations of BERT could be an effective feature in predicting linear BCEs, and we believe that LBCE-XGB could be a useful tool for detecting linear B-cell epitopes with high accuracy and low cost.


Subject(s)
Algorithms , Epitopes, B-Lymphocyte , Amino Acid Sequence , Antigens/chemistry , Amino Acids
6.
Front Bioinform ; 2: 834153, 2022.
Article in English | MEDLINE | ID: mdl-36304324

ABSTRACT

As one of the most important posttranslational modifications (PTMs), protein lysine glycation changes the characteristics of the proteins and leads to the dysfunction of the proteins, which may cause diseases. Accurately detecting the glycation sites is of great benefit for understanding the biological function and potential mechanism of glycation in the treatment of diseases. However, experimental methods are expensive and time-consuming for lysine glycation site identification. Instead, computational methods, with their higher efficiency and lower cost, could be an important supplement to the experimental methods. In this study, we proposed a novel predictor, BERT-Kgly, for protein lysine glycation site prediction, which was developed by extracting embedding features of protein segments from pretrained Bidirectional Encoder Representations from Transformers (BERT) models. Three pretrained BERT models were explored to get the embeddings with optimal representability, and three downstream deep networks were employed to build our models. Our results showed that the model based on embeddings extracted from the BERT model pretrained on 556,603 protein sequences of UniProt outperforms other models. In addition, an independent test set was used to evaluate and compare our model with other existing methods, which indicated that our model was superior to other existing models.
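Site predictors of this kind usually work on fixed-length protein segments centered on each candidate lysine. The abstract above does not state the segment length or padding scheme, so the sketch below makes its own assumptions: a symmetric window of `flank` residues on each side and an `X` padding character at the sequence ends. The function name `lysine_windows` and the example sequence are hypothetical.

```python
def lysine_windows(sequence, flank=15, pad="X"):
    """Return (1-based position, segment) for every lysine (K) in `sequence`.

    Each segment has length 2 * flank + 1 with the lysine in the middle.
    Positions that fall outside the sequence are filled with `pad`.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Index i in `sequence` sits at i + flank in `padded`, so the
            # slice below is centered on the lysine.
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


# Hypothetical six-residue sequence with a small window for readability.
toy_windows = lysine_windows("MKTAYK", flank=2)
```

In a pipeline like the one described, each segment would then be passed to a pretrained model to obtain embeddings; that step is omitted here.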

7.
Front Genet ; 13: 853258, 2022.
Article in English | MEDLINE | ID: mdl-35432446

ABSTRACT

As one of the most important post-transcriptional modifications of RNA, 5-cytosine-methylation (m5C) is reported to closely relate to many chemical reactions and biological functions in cells. Recently, several computational methods have been proposed for identifying m5C sites. However, the accuracy and efficiency are still not satisfactory. In this study, we proposed a new method, m5Cpred-XS, for predicting m5C sites of H. sapiens, M. musculus, and A. thaliana. First, the powerful SHAP method was used to select the optimal feature subset from seven different kinds of sequence-based features. Second, different machine learning algorithms were used to train the models. The results of five-fold cross-validation indicate that the model based on XGBoost achieved the highest prediction accuracy. Finally, our model was compared with other state-of-the-art models, which indicates that m5Cpred-XS is superior to other methods. Moreover, we deployed the model on a web server that can be accessed through http://m5cpred-xs.zhulab.org.cn/, and m5Cpred-XS is expected to be a useful tool for studying m5C sites.

8.
Biochemistry ; 61(24): 2861-2869, 2022 12 20.
Article in English | MEDLINE | ID: mdl-35414181

ABSTRACT

Capnine (2-amino-3-hydroxy-15-methylhexadecane-1-sulfonate) and capnoids (N-fatty acylated capnine derivatives) are sulfonolipids present in the outer membrane of gliding bacteria in the phylum Bacteroidetes and play a role in their unique gliding motility. They are structurally similar to sphingolipids and are thought to be biosynthesized via a similar pathway. Here we report the identification and biochemical characterization of the capnine biosynthetic enzymes cysteate synthase (CapA) and cysteate-C-fatty acyltransferase (CapB) from the pathogenic gliding bacterium Capnocytophaga ochracea and NAD(P)H-dependent dehydrocapnine reductase CapC from the avian pathogen Ornithobacterium rhinotracheale. CapA catalyzes the formation of cysteate from O-phospho-L-serine and sulfite, and CapB catalyzes the formation of dehydrocapnine from cysteate and 13-methyl-myristoyl-CoA, followed by reduction by CapC. CapA is closely related to cystathionine-β-synthase but distantly related to the archaeal cysteate synthase. Close homologues of CapA, CapB, and the CapA isozyme archaeal cysteate synthase are present in many Bacteroidetes bacteria, including environmental, pathogenic, and human oral and intestinal microbiome bacteria, suggesting the widespread ability of these bacteria to biosynthesize capnine and related sulfonolipids.


Subject(s)
Alkanesulfonic Acids , Cysteic Acid , Humans , Cysteic Acid/metabolism , Biosynthetic Pathways , Bacteria/metabolism , Bacteroidetes
9.
Org Biomol Chem ; 20(7): 1532-1537, 2022 02 16.
Article in English | MEDLINE | ID: mdl-35129563

ABSTRACT

We report for the first time the coupling of activated thioamides with alcohols to efficiently form thionoesters via a palladium-catalyzed C-N cleavage strategy. The new approach employs thioamides as a thioacylating reagent to give thionoesters in moderate to good yields. Notably, this methodology demonstrates a broad substrate scope, as alkyl/aryl alcohols are well tolerated, and this process might facilitate the synthesis of sulfur-containing compounds under simple and mild conditions.

10.
Front Psychol ; 12: 747656, 2021.
Article in English | MEDLINE | ID: mdl-35002843

ABSTRACT

Short-form video applications have become increasingly popular due to their strong appeal, especially among college students. With this trend, the phenomenon of short-form video application addiction (SVA) has also become prominent and poses a great risk to individuals' health and adaptation. Against this background, the present study aimed to examine the association between perceived stress and SVA, as well as its mechanism: the mediating role of self-compensation motivation (SCM) and the moderating role of shyness. A total of 896 Chinese college students were recruited to complete a set of questionnaires on perceived stress (PS), SCM, shyness, and SVA. The results show that PS was positively associated with SVA, and SCM partially mediated this association. In addition, both the direct association between PS and SVA and the indirect effect of SCM were moderated by shyness and were stronger for individuals with higher levels of shyness. The results could not only deepen our understanding of the underlying factors of SVA but also provide suggestions for relevant prevention and intervention procedures.

11.
BMC Bioinformatics ; 21(1): 489, 2020 Oct 30.
Article in English | MEDLINE | ID: mdl-33126851

ABSTRACT

BACKGROUND: As one of the most common post-transcriptional modifications (PTCM) in RNA, 5-cytosine-methylation plays important roles in many biological functions such as RNA metabolism and cell fate decision. Through accurate identification of 5-methylcytosine (m5C) sites on RNA, researchers can better understand the exact role of 5-cytosine-methylation in these biological functions. In recent years, computational methods of predicting m5C sites have attracted considerable interest because of their efficiency and low cost. However, both the accuracy and efficiency of these methods are not yet satisfactory and need further improvement. RESULTS: In this work, we have developed a new computational method, m5CPred-SVM, to identify m5C sites in three species, H. sapiens, M. musculus and A. thaliana. To build this model, we first collected benchmark datasets following three recently published methods. Then, six types of sequence-based features were generated based on RNA segments, and the sequential forward feature selection strategy was used to obtain the optimal feature subset. After that, the performance of models based on different learning algorithms was compared, and the model based on the support vector machine provided the highest prediction accuracy. Finally, our proposed method, m5CPred-SVM, was compared with several existing methods, and the result showed that m5CPred-SVM offered substantially higher prediction accuracy than previously published methods. It is expected that our method, m5CPred-SVM, can become a useful tool for accurate identification of m5C sites. CONCLUSION: In this study, by introducing position-specific propensity related features, we built a new model, m5CPred-SVM, to predict RNA m5C sites of three different species. The result shows that our model outperformed the existing state-of-the-art models. Our model is available for users through a web server at https://zhulab.ahu.edu.cn/m5CPred-SVM .


Subject(s)
5-Methylcytosine/metabolism , RNA/genetics , Support Vector Machine , Animals , Arabidopsis/genetics , Base Sequence , Humans , Internet , Mice , ROC Curve
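The sequential forward feature selection mentioned in the abstract above is a greedy procedure: start from an empty set, repeatedly add the feature that most improves a score, and stop when no addition helps. The sketch below shows the generic algorithm in Python. The scoring function, feature names (`f1`, `f2`, `f3`), and stopping threshold are all invented for illustration; in practice the score would typically be a cross-validated performance measure, and the paper's exact procedure may differ.

```python
def sequential_forward_selection(features, score, min_gain=0.0):
    """Greedily grow a feature subset while the score keeps improving.

    `score` maps a list of feature names to a number (higher is better).
    Returns the selected features in the order they were added and the
    score of the final subset.
    """
    selected = []
    best = score(selected)
    remaining = list(features)
    while remaining:
        # Evaluate every one-feature extension of the current subset.
        top_score, top_feature = max((score(selected + [f]), f) for f in remaining)
        if top_score - best <= min_gain:
            break  # no candidate improves the score enough
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


def toy_score(subset):
    """Hypothetical score: each feature adds a gain but costs a fixed penalty."""
    gains = {"f1": 0.20, "f2": 0.10, "f3": 0.02}
    return 0.5 + sum(gains[f] for f in subset) - 0.03 * len(subset)


toy_selected, toy_best = sequential_forward_selection(["f1", "f2", "f3"], toy_score)
```

With these toy gains the procedure adds `f1`, then `f2`, and stops because adding `f3` would lower the score.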
12.
Biochem Biophys Res Commun ; 533(4): 1109-1114, 2020 12 17.
Article in English | MEDLINE | ID: mdl-33036753

ABSTRACT

Sulfoquinovose (6-deoxy-6-sulfoglucose, SQ) is a component of sulfolipids found in the photosynthetic membranes of plants and other photosynthetic organisms, and is one of the most abundant organosulfur compounds in nature. Microbial degradation of SQ, termed sulfoglycolysis, constitutes an important component of the biogeochemical sulfur cycle. Two sulfoglycolysis pathways have been reported, with one resembling the Embden-Meyerhof-Parnas (sulfo-EMP) pathway, and the other resembling the Entner-Doudoroff (sulfo-ED) pathway. Here we report a third sulfoglycolysis pathway in the bacterium Bacillus megaterium DSM 1804, in which sulfosugar cleavage is catalyzed by the transaldolase SqvA, which converts 6-deoxy-6-sulfofructose and glyceraldehyde 3-phosphate into fructose-6-phosphate and (S)-sulfolactaldehyde. Variations of this transaldolase-dependent sulfoglycolysis (sulfo-TAL) pathway are present in diverse bacteria, and add to the diversity of mechanisms for the degradation of this abundant organosulfur compound.


Subject(s)
Bacillus megaterium/metabolism , Glycolysis , Metabolic Networks and Pathways , Methylglucosides/metabolism , Transaldolase/metabolism , Bacillus megaterium/enzymology , Chromatography, Liquid , Computational Biology , Gene Expression , Glycolysis/genetics , Mass Spectrometry , Metabolic Networks and Pathways/genetics , Multigene Family , Phylogeny
13.
Article in English | MEDLINE | ID: mdl-19964559

ABSTRACT

Arterial stiffness is an important index for cardiovascular events. The objective of this study is to examine possible parameters related to arterial stiffness that can be estimated during simple arm movements. An experiment was conducted on 32 subjects divided into two groups, one aged 26 ± 4 years and the other 61 ± 9 years. The pulse transit time (PTT), measured from the electrocardiogram to the finger photoplethysmogram (PPG), and the amplitude of the PPG were calculated beat-to-beat for the subjects while they had their arms lowered. The results of the study showed that the ratio between percentage changes in PTT and finger height is significantly different for the two groups of subjects with different age and health conditions, indicating that parameters can potentially be extracted from this procedure to represent the difference in arterial stiffness of the two groups of subjects.


Subject(s)
Arteries/physiology , Compliance , Adult , Humans
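The beat-to-beat pulse transit time described in the abstract above is the delay from each ECG R peak to the matching pulse in the finger PPG. A minimal Python sketch of that pairing follows. It assumes R-peak times and PPG feature times (in seconds) have already been detected; the abstract does not say which PPG feature (foot, peak, or maximum slope) was used, and the function name and toy timings are hypothetical.

```python
def beat_to_beat_ptt(r_peak_times, ppg_feature_times):
    """Pair each R peak with the first PPG feature before the next R peak.

    Both inputs are sorted lists of times in seconds. Returns one value per
    R peak: the delay in seconds, or None if no PPG feature was found for
    that beat.
    """
    ptts = []
    j = 0
    for k, r in enumerate(r_peak_times):
        next_r = r_peak_times[k + 1] if k + 1 < len(r_peak_times) else float("inf")
        # Skip PPG features that occur at or before this R peak.
        while j < len(ppg_feature_times) and ppg_feature_times[j] <= r:
            j += 1
        if j < len(ppg_feature_times) and ppg_feature_times[j] < next_r:
            ptts.append(ppg_feature_times[j] - r)
            j += 1
        else:
            ptts.append(None)  # missed beat or detection failure
    return ptts


# Hypothetical timings: three R peaks, but only two detected PPG pulses.
toy_ptt = beat_to_beat_ptt([0.00, 0.80, 1.60], [0.25, 1.06])
```

Returning `None` for unmatched beats keeps the output aligned with the R-peak list, so later steps can decide how to handle gaps.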
14.
Article in English | MEDLINE | ID: mdl-19162907

ABSTRACT

Pulse arrival time (PAT) has been proposed for measuring blood pressure (BP) noninvasively and continuously. A challenge of the PAT-based BP measurement technique is to calibrate it individually. The objective of this study is to examine a previously proposed model-based calibration method utilizing hydrostatic pressure for BP estimation. A preliminary experiment has been conducted on eight subjects aged from 23 to 36. Each subject was asked to raise his/her right arm to five different heights (H). At each height, PAT and brachial BP were measured from the elevated arm and the resting arm respectively. The data recorded at each height were used to calibrate a subject-dependent coefficient b which was then used to estimate his/her brachial SBP before and after exercise. It was found that the estimation results were influenced by H and k, which is a constant time interval subtracted from PAT. In this study, the estimation errors were found to be more sensitive to H than to k for -30 ≤ H ≤ 30 cm and 40 ≤ k ≤ 70 ms.


Subject(s)
Blood Pressure Monitoring, Ambulatory/instrumentation , Blood Pressure/physiology , Adult , Blood Pressure Monitoring, Ambulatory/methods , Calibration , Humans , Hydrostatic Pressure , Pulse/instrumentation , Pulse/methods , Young Adult
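The calibration in the abstract above relies on the hydrostatic pressure change produced by raising or lowering the arm. For orientation, the sketch below evaluates the standard relation ΔP = -ρgH in Python. It assumes H is measured relative to heart level and uses a typical blood density of 1060 kg/m³; neither assumption is stated in the abstract, and the paper's model (including the coefficient b and interval k) is not reproduced here.

```python
BLOOD_DENSITY_KG_M3 = 1060.0  # assumed typical value, not from the abstract
GRAVITY_M_S2 = 9.80665        # standard gravity
PA_PER_MMHG = 133.322         # pascals per millimetre of mercury


def hydrostatic_offset_mmhg(height_cm):
    """Pressure change at a site `height_cm` above the reference level.

    Positive heights (arm raised) give a negative offset because local
    arterial pressure falls; negative heights give a positive offset.
    """
    height_m = height_cm / 100.0
    return -BLOOD_DENSITY_KG_M3 * GRAVITY_M_S2 * height_m / PA_PER_MMHG


# Offsets at the extremes of the height range examined in the study.
offset_raised = hydrostatic_offset_mmhg(30.0)
offset_lowered = hydrostatic_offset_mmhg(-30.0)
```

Under these assumptions a 30 cm change in height shifts local pressure by roughly 23 mmHg, which gives a sense of the pressure span available for calibration.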
15.
Conf Proc IEEE Eng Med Biol Soc ; 2006: 6404-5, 2006.
Article in English | MEDLINE | ID: mdl-17947191

ABSTRACT

The changes in pulse transit time (PTT) during the continuous slow deflation of a brachial cuff have been reported previously; however, the PTTs obtained for specific cuff pressures during inflation or deflation have not been compared before. Therefore, the objective of this study is to examine the differences in PTT when cuff pressure (P(cuff)) was raised or deflated to the desired level. Sixteen subjects participated in this study and, according to their systolic blood pressure (SBP) and diastolic blood pressure (DBP), 8 levels of P(cuff) were predetermined for them individually. P(cuff) was directly raised to each predetermined level while 20 seconds of electrocardiographic and photoplethysmographic signals were recorded for the calculation of PTT. Another set of recordings was taken when P(cuff) was raised above the SBP and deflated to the predetermined levels. The results of this study showed that PTT increased significantly when P(cuff) was larger than 80% of DBP, regardless of whether P(cuff) was reached by inflation or deflation. Overall, no significant difference was found between PTT obtained during inflation and deflation for 12 out of the 16 subjects. To conclude, changes in PTT are mainly induced by the level of cuff pressure when there is no prolonged period of artery occlusion.


Subject(s)
Blood Pressure Determination , Blood Pressure Monitors , Heart Rate , Pulse , Adolescent , Adult , Blood Pressure , Diastole , Electrocardiography/methods , Humans , Male , Pressure , Systole