Results 1 - 20 of 27
1.
Comput Biol Med ; 179: 108904, 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39047504

ABSTRACT

Urinary tract stones are a common and frequently recurring medical issue. Accurately predicting the success rate after surgery can help avoid ineffective medical procedures and reduce unnecessary healthcare costs. This study collected data from patients with upper ureter stones who underwent extracorporeal shock wave lithotripsy, including cases of successful as well as unsuccessful stone removal after the first and second lithotripsy procedures, and constructed prediction systems for the outcomes of the first and second lithotripsy procedures. Features were extracted from three categories of information: patient characteristics, stone characteristics, and extracorporeal shock wave lithotripsy machine data, and additional features were created using Feature Creation. Finally, the impact of features on the models was analyzed using six methods to calculate feature importance. Our prediction model for the first lithotripsy, selected from among 43 methods and seven ensemble learning techniques, achieves an AUC of 0.91. For the second lithotripsy, the AUC reaches 0.76. The results indicate that the detailed and binary information provided by patients regarding their history of stone experiences contributes differently to the predictive accuracy of the first and second lithotripsy procedures. The prediction tool is available at https://predictor.isu.edu.tw/ks.
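
The AUC values reported above can be read as the probability that a randomly chosen successful case receives a higher predicted score than a randomly chosen unsuccessful case. The following is a minimal sketch of that rank-based calculation; the function name, labels, and scores are hypothetical and are not taken from the study or its prediction tool.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 outcomes (1 = successful stone removal, assumed coding)
    scores: predicted probabilities or scores, same length as labels
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, for illustration only.
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.6, 0.1]
example_auc = roc_auc(example_labels, example_scores)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a sketch; a sorted-rank formulation would be preferable for large cohorts.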

3.
Cancers (Basel) ; 15(14)2023 Jul 20.
Article in English | MEDLINE | ID: mdl-37509351

ABSTRACT

(1) Background: Breast cancer is the second leading cause of cancer death among women. The accurate prediction of survival intervals will help physicians make informed decisions about treatment strategies or the use of palliative care. (2) Methods: Gene expression is predictive and correlates to patient prognosis. To establish a reliable prediction tool, we collected a total of 1187 RNA-seq data points from breast cancer patients (median age 58 years) in Fragments Per Kilobase Million (FPKM) format from the TCGA database. Among them, we selected 144 patients with date of death information to establish the SaBrcada-AD dataset. We first normalized the SaBrcada-AD dataset to TPM to build the survival prediction model SaBrcada. After normalization and dimension raising, we used the differential gene expression data to test eight different deep learning architectures. Considering the effect of age on prognosis, we also performed a stratified random sampling test on all ages between the lower and upper quartiles of patient age, 48 and 69 years; (3) Results: Stratifying by age 61, the performance of SaBrcada built by GoogLeNet was improved to a highest accuracy of 0.798. We also built a free website tool to provide five predicted survival periods: within six months, six months to one year, one to three years, three to five years, or over five years, for clinician reference. (4) Conclusions: We built the prediction model, SaBrcada, and the website tool of the same name for breast cancer survival analysis. Through these models and tools, clinicians will be provided with survival interval information as a basis for formulating precision medicine.
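
The abstract does not spell out how the FPKM values were normalized to TPM. A minimal sketch, assuming the commonly used conversion in which each sample's FPKM values are rescaled to sum to one million, is shown below; the function name and numbers are illustrative only.

```python
def fpkm_to_tpm(fpkm_values):
    """Convert one sample's FPKM values to TPM.

    Assumes the standard rescaling TPM_i = FPKM_i / sum(FPKM) * 1e6,
    applied per sample across all genes.
    """
    total = sum(fpkm_values)
    if total == 0:
        raise ValueError("all FPKM values are zero; TPM is undefined")
    return [value / total * 1_000_000 for value in fpkm_values]


# Hypothetical three-gene sample, for illustration only.
sample_fpkm = [10.0, 30.0, 60.0]
sample_tpm = fpkm_to_tpm(sample_fpkm)
```

After conversion, every sample sums to the same total, which makes expression values comparable across patients.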

4.
Article in English | MEDLINE | ID: mdl-35270634

ABSTRACT

Purpose: This study examined the accommodative response and pupil size of myopic adults using a double-mirror system (DMS). The viewing distance could be extended to 2.285 m by using a DMS, which resulted in a reduction in the accommodative response and an increase in pupil size. By using a DMS, the reduction of the accommodative response could relieve eye fatigue from near work. Methods: Sixty subjects aged between 18 and 22 years were recruited in this study, and the average age was 20.67 ± 1.09 years. There were two main steps in the experimental process. In the first step, we examined the subjects' refraction state and visual function, and then fitted disposable contact lenses with a corresponding refractive error. In the second step, the subjects gazed at an object from a viewing distance of 0.4 m and at a virtual image through a DMS, respectively, and the accommodative response and pupil size were measured using an open field autorefractor. Results: When the subjects gazed at the object from a distance of 0.4 m, or gazed at the virtual image through a DMS, the mean value of the accommodative response was 1.74 ± 0.43 or 0.16 ± 0.47 D, and the pupil size was 3.98 ± 0.06 mm or 4.18 ± 0.58 mm, respectively. With an increase in the viewing distance from 0.4 m to 2.285 m, the accommodative response was significantly reduced by about 1.58 D and the pupil size was significantly enlarged by about 0.2 mm. For three asterisk targets of different sizes (1 cm × 1 cm, 2 cm × 2 cm, and 3 cm × 3 cm), the mean accommodative response and pupil size through the DMS were 0.19 ± 0.16, 0.27 ± 0.24, 0.26 ± 0.19 D; and 4.20 ± 1.02, 3.94 ± 0.73, 4.21 ± 0.57 mm, respectively. The changes of the accommodative response and pupil size were not significant with the size of the targets (p > 0.05). In the low or high myopia group, the accommodative response at 0.4 m and 2.285 m was 1.68 ± 0.42 D and 0.21 ± 0.48 D; and 1.88 ± 0.25 D and 0.05 ± 0.40 D, respectively. The accommodative response was significantly reduced by 1.47 D and 1.83 D for these two groups. The accommodative microfluctuations (AMFs) were stable when a DMS was used; in contrast, the AMFs were unstable at a viewing distance of 0.4 m. Conclusions: In this study, the imaging through a DMS extended the viewing distance and enlarged the image, and resulted in a reduction in the accommodative response and an increase in the pupil size. For the low myopia group and the high myopia group, the accommodative response and pupil size were statistically significantly different before and after the use of the DMS. The reduction of the accommodative response could be applied for the improvement of asthenopia.


Subject(s)
Asthenopia, Myopia, Ocular Accommodation, Adolescent, Adult, Humans, Pupil/physiology, Ocular Refraction, Vision Tests, Young Adult
5.
Article in English | MEDLINE | ID: mdl-35206141

ABSTRACT

There are many factors that affect vitamin D supplementation, including those from the theory of planned behaviour (TPB); however, how the perceived benefit acts in the model remains unknown. In the current study, we tested the efficacy of the TPB and the impacts of the perceived benefit (PBE) in the model. The subjects were 287 customers who purchased vitD from pharmacies in major cities in Taiwan. A structured questionnaire was used to collect the data. t-tests, analysis of variance (ANOVA), regression analyses, and path analysis via SPSS and AMOS were used to analyse the data. The original TPB model explained 47.5% of the variance of intention with the three variables of attitude (ß = 0.261), perceived behavioural control (ß = 0.183), and subjective norms (ß = 0.169). The model that incorporated PBE increased the explained variance to 59.7%, and PBE became the strongest predictor (ß = 0.310) and a significant mediator linking attitude, subjective norms, perceived control (ANC) with supplementation intention. PBE and attitude were the two most important variables in predicting vitD supplementation intention. We suggest that updated information regarding dietary sources of vitD and its benefits should be included in health- or nutrition-related courses in education programs for the overall health of the nation.


Subject(s)
Attitude, Intention, Dietary Supplements, Humans, Psychological Theory, Surveys and Questionnaires, Vitamin D
6.
Viruses ; 13(8)2021 08 03.
Article in English | MEDLINE | ID: mdl-34452396

ABSTRACT

Upon invasion by foreign pathogens, specific antibodies can identify specific foreign antigens and disable them. As a result of this ability, antibodies can help with vaccine production and food allergen detection in patients. Many studies have focused on predicting linear B-cell epitopes, but only two prediction tools are currently available to predict the sub-type of an epitope. NIgPred was developed as a prediction tool for IgA, IgE, and IgG. NIgPred integrates various heterologous features with machine-learning approaches. Differently from previous studies, our study considered peptide-characteristic correlation and autocorrelation features. Sixty kinds of classifier were applied to construct the best prediction model. Furthermore, the genetic algorithm and hill-climbing algorithm were used to select the most suitable features for improving the accuracy and reducing the time complexity of the training model. NIgPred was found to be superior to the currently available tools for predicting IgE epitopes and IgG epitopes on independent test sets. Moreover, NIgPred achieved a prediction accuracy of 100% for the IgG epitopes of a coronavirus data set. NIgPred is publicly available at our website.


Subject(s)
B-Lymphocyte Epitopes/immunology, Immunoglobulin A/immunology, Immunoglobulin E/immunology, Immunoglobulin G/immunology, Machine Learning, SARS-CoV-2/immunology, Algorithms, COVID-19/immunology, Coronavirus Envelope Proteins/immunology, Coronavirus Nucleocapsid Proteins/immunology, B-Lymphocyte Epitopes/chemistry, Humans, Phosphoproteins/immunology, Software, Coronavirus Spike Glycoprotein/immunology, Viral Matrix Proteins/immunology
7.
Front Genet ; 12: 798107, 2021.
Article in English | MEDLINE | ID: mdl-34976025

ABSTRACT

Changing the expression of flanking genes by inserting T-DNA into the genome is a common approach in rice functional gene research. However, whether the expression of a gene of interest is enhanced must be validated experimentally. Consequently, to improve the efficiency of screening activated genes, we established a model to predict gene expression in T-DNA mutants through machine learning methods. We gathered experimental datasets consisting of gene expression data in T-DNA mutants and captured the PROMOTER and MIDDLE sequences for encoding. In first-layer models, support vector machine (SVM) models were constructed with nine features consisting of information about biological function and local and global sequences. Feature encoding based on the PROMOTER sequence was weighted by logistic regression. The second-layer models integrated 16 first-layer models with minimum redundancy maximum relevance (mRMR) feature selection and the LADTree algorithm, which were selected from nine feature selection methods and 65 classification methods, respectively. The accuracy of the final two-layer machine learning model, referred to as TIMgo, was 99.3% based on fivefold cross-validation, and 85.6% based on independent testing. We discovered that the information within the local sequence made a greater contribution to classification than the global sequence. TIMgo had a good predictive ability for target genes within 20 kb from the 35S enhancer. Based on the analysis of significant sequences, the G-box regulatory sequence may also play an important role in the activation mechanism of the 35S enhancer.

8.
Int J Mol Sci ; 21(21)2020 Oct 24.
Article in English | MEDLINE | ID: mdl-33114312

ABSTRACT

Protein phosphorylation is one of the most important post-translational modifications, and many biological processes are related to phosphorylation, such as DNA repair, transcriptional regulation and signal transduction and, therefore, abnormal regulation of phosphorylation usually causes diseases. If we can accurately predict human phosphorylation sites, this could help to solve human diseases. Therefore, we developed a kinase-specific phosphorylation prediction system, GasPhos, and proposed a new feature selection approach, called Gas, based on the ant colony system and a genetic algorithm and used performance evaluation strategies focused on different kinases to choose the best learning model. Gas uses the mean decrease Gini index (MDGI) as a heuristic value for path selection and adopts binary transformation strategies and new state transition rules. GasPhos can predict phosphorylation sites for six kinases and showed better performance than other phosphorylation prediction tools. The disease-related phosphorylated proteins that were predicted with GasPhos are also discussed. Finally, Gas can be applied to other issues that require feature selection, which could help to improve prediction performance. GasPhos is available at http://predictor.nchu.edu.tw/GasPhos.


Subject(s)
Computational Biology/methods, Phosphotransferases/chemistry, Algorithms, Genetic Predisposition to Disease, Humans, Machine Learning, Phosphorylation, Phosphotransferases/genetics, Software
9.
Biomed Res Int ; 2020: 2654815, 2020.
Article in English | MEDLINE | ID: mdl-32566676

ABSTRACT

Information about the expression status of hormone receptors such as estrogen receptor (ER), progesterone receptor (PR), and Her-2 is crucial in the management and prognosis of breast cancer. Therefore, the retrieval and analysis of hormone receptor expression characteristics in metastatic breast cancer may be valuable in breast cancer study. Herein, we report a text mining tool based on word/phrase matching that retrieves hormone receptor expression data of regional or distant metastatic breast cancer from pathology reports. It was tested on pathology reports at the China Medical University Hospital from 2013 to 2018. The tool showed specificities of 91.6% and 63.3% for the detection of regional lymph node metastasis and distant metastasis, respectively. Sensitivity in immunohistochemical study result extraction in these cases was 98.6% for distant metastasis and 78.3% for regional lymph node metastasis. Statistical analysis of the retrieved data showed significant differences in PR and Her-2 expression between regional and metastatic breast cancer, which is consistent with previous studies. In conclusion, our study shows that metastatic breast cancer hormone receptor expression characteristics can be retrieved by text mining. The algorithm designed in this study may be useful in future studies about text mining in pathology reports.


Subject(s)
Breast Neoplasms, Data Mining/methods, ErbB-2 Receptor/metabolism, Estrogen Receptors/metabolism, Progesterone Receptors/metabolism, Algorithms, Breast Neoplasms/classification, Breast Neoplasms/diagnosis, Breast Neoplasms/metabolism, Breast Neoplasms/pathology, Computational Biology, Female, Humans, Lymphatic Metastasis
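
The specificity and sensitivity figures quoted in this abstract follow the usual confusion-matrix definitions. A minimal sketch of those two ratios is given below; the function names and the counts in the example are hypothetical, since the abstract reports only the resulting percentages.

```python
def sensitivity(true_positives, false_negatives):
    """Share of actual positive reports that the tool detected."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Share of actual negative reports that the tool correctly rejected."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts, for illustration only (not from the study).
example_sensitivity = sensitivity(true_positives=9, false_negatives=1)
example_specificity = specificity(true_negatives=3, false_positives=1)
```
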
10.
PLoS One ; 15(6): e0234084, 2020.
Article in English | MEDLINE | ID: mdl-32497121

ABSTRACT

Hepatocellular carcinoma (HCC), which is associated with an absence of obvious symptoms and poor prognosis, is the second leading cause of cancer death worldwide. Genome-wide molecular biology studies should provide biological insights into HCC development. Based on the importance of phosphorylation for signal transduction, several protein kinase inhibitors have been developed that improve the survival of cancer patients. However, a comprehensive database of HCC-related phosphorylated biomarkers (HCCPMs) and novel HCCPMs prediction platform has been lacking. We have thus constructed the dBMHCC databases to provide expression profiles, phosphorylation and drug information, and evidence type; gathered information on HCC-related pathways and their involved genes as candidate HCC biomarkers; and established a system for evaluating protein phosphorylation and HCC-related biomarkers to improve the reliability of biomarker prediction. The resulting dBMHCC contains 611 notable HCC-related genes, 234 HCC-related pathways, 17 phosphorylation-related motifs and their 255 corresponding protein kinases, 5955 HCC biomarkers, and 1077 predicted HCCPMs. Methionine adenosyltransferase 2B (MAT2B) and acireductone dioxygenase 1 (ADI1), which regulate HCC development and hepatitis C virus infection, respectively, were among the top 10 HCCPMs predicted by dBMHCC. Platelet-derived growth factor receptor alpha (PDGFRA), which had the highest evaluation score, was identified as the target of one HCC drug (Regorafenib), five cancer drugs, and four non-cancer drugs. dBMHCC is an open resource for HCC phosphorylated biomarkers, which supports researchers investigating the development of HCC and designing novel diagnosis methods and drug treatments. Database URL: http://predictor.nchu.edu.tw/dBMHCC.


Subject(s)
Tumor Biomarkers/metabolism, Hepatocellular Carcinoma/metabolism, Computational Biology/methods, Factual Databases, Liver Neoplasms/metabolism, Animals, Hepatocellular Carcinoma/diagnosis, Humans, Internet, Liver Neoplasms/diagnosis, Mice, Phosphorylation, Prognosis
11.
PLoS One ; 15(4): e0232087, 2020.
Article in English | MEDLINE | ID: mdl-32348325

ABSTRACT

Many proteins exist in nature as oligomers with various quaternary structural attributes rather than as single chains. Predicting these attributes is an essential task in computational biology for the advancement of proteomics. However, the existing methods do not consider the integration of heterogeneous coding and the accuracy of subunit categories with limited data. To this end, we proposed a tool that can predict more than 12 subunit protein oligomers, QUATgo. Three kinds of sequence coding were used: dipeptide composition, which was used for the first time to predict protein quaternary structural attributes; protein half-life characteristics; and a modified version of the functional domain composition coding proposed by predecessors, which we adapted to solve the problem of large feature vectors. QUATgo solves the problem of insufficient data for a single subunit using a two-stage architecture and uses 10-fold cross-validation to test the predictive accuracy of the classifier. QUATgo has 49.0% cross-validation accuracy and 31.1% independent test accuracy. In the case study, the accuracy of QUATgo can reach 61.5% for predicting the quaternary structure of influenza virus hemagglutinin proteins. Finally, QUATgo is freely accessible to the public as a web server via the site http://predictor.nchu.edu.tw/QUATgo.


Subject(s)
Computational Biology/methods, Machine Learning, Quaternary Protein Structure, Proteins/chemistry, Protein Sequence Analysis/methods, Software, Viral Proteins/chemistry, Algorithms, Animals, Protein Databases, Humans, Protein Domains, Proteins/classification, Support Vector Machine
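
Dipeptide composition, one of the sequence codings named in this abstract, is usually defined as the relative frequency of each of the 400 ordered amino-acid pairs in a sequence. A minimal sketch under that usual definition follows; the function name and example sequence are hypothetical, and the exact encoding used by QUATgo may differ.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(sequence):
    """Return a 400-element vector of dipeptide frequencies for a protein sequence.

    Overlapping pairs are counted; pairs containing non-standard residues are skipped.
    """
    sequence = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Hypothetical toy sequence, for illustration only.
example_vector = dipeptide_composition("ACAC")
```

The fixed length of 400 is what makes this coding convenient as classifier input regardless of protein length.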
12.
Comput Struct Biotechnol J ; 18: 622-630, 2020.
Article in English | MEDLINE | ID: mdl-32226595

ABSTRACT

Protein mutations can lead to structural changes that affect protein function and result in disease occurrence. In protein engineering, drug design, and optimization industries, mutations are often used to improve protein stability or to change protein properties while maintaining stability. To provide possible candidates for novel protein design, several computational tools for predicting protein stability changes have been developed. Although many prediction tools are available, each tool employs different algorithms and features. This can produce conflicting prediction results that make it difficult for users to decide upon the correct protein design. Therefore, this study proposes an integrated prediction tool, iStable 2.0, which integrates 11 sequence-based and structure-based prediction tools by machine learning and adds protein sequence information as features. Three coding modules are designed for the system, an Online Server Module, a Stand-alone Module and a Sequence Coding Module, to improve the prediction performance of the previous version of the system. The final integrated structure-based classification model has a higher Matthews correlation coefficient than that of the single prediction tool (0.708 vs 0.547, respectively), and the Pearson correlation coefficient of the regression model likewise improves from 0.669 to 0.714. The sequence-based model not only successfully integrates off-the-shelf predictors but also improves the Matthews correlation coefficient of the best single prediction tool by at least 0.161, which is better than the individual structure-based prediction tools. In addition, both the Sequence Coding Module and the Stand-alone Module maintain performance with only a 5% decrease of the Matthews correlation coefficient when the integrated online tools are unavailable. iStable 2.0 is available at http://ncblab.nchu.edu.tw/iStable2.
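
The Matthews correlation coefficient (MCC) used to compare the models above is computed from the four cells of a binary confusion matrix. A minimal sketch of the standard formula follows; the function name is illustrative, and returning 0.0 when the denominator is zero is a common convention assumed here rather than something stated in the abstract.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator
```

The value ranges from -1 (complete disagreement) through 0 (no better than chance) to 1 (perfect prediction), which is why it is often preferred over accuracy on imbalanced data.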

13.
Sci Rep ; 10(1): 1466, 2020 01 30.
Article in English | MEDLINE | ID: mdl-32001758

ABSTRACT

MicroRNAs (miRNAs) are short non-coding RNAs that regulate gene expression and biological processes through binding to messenger RNAs. Predicting the relationship between miRNAs and their targets is crucial for research and clinical applications. Many tools have been developed to predict miRNA-target interactions, but variable results among the different prediction tools have caused confusion for users. To solve this problem, we developed miRgo, an application that integrates many of these tools. To train the prediction model, extreme values and median values from four different data combinations, which were obtained via an energy distribution function, were used to find the most representative dataset. Support vector machines were used to integrate 11 prediction tools, and numerous feature types used in these tools were classified into six categories (binding energy, scoring function, evolution evidence, binding type, sequence property, and structure) to simplify feature selection. In addition, a novel evaluation indicator, the Chu-Hsieh-Liang (CHL) index, was developed to improve the prediction power in positive data for feature selection. miRgo achieved better results than all other prediction tools in evaluation by an independent testing set and by its subset of functionally important genes. The tool is available at http://predictor.nchu.edu.tw/miRgo.


Subject(s)
MicroRNAs/metabolism, Support Vector Machine, Computational Biology/methods, Gene Expression Regulation, Humans, MicroRNAs/physiology, Statistical Models, Theoretical Models
14.
Math Biosci ; 315: 108217, 2019 09.
Article in English | MEDLINE | ID: mdl-31220511

ABSTRACT

Influenza type A, a serious infectious disease of the human respiratory tract, poses an enormous threat to human health worldwide. It leads to high mortality rates in poultry, pigs, and humans. The primary target identity regions for the human immune system are hemagglutinin (HA) and neuraminidase (NA), two surface proteins of the influenza A virus. Research and development of vaccines is highly complex because the influenza A virus evolves rapidly. This study focused on three genetic features of viral surface proteins: ribonucleic acid (RNA) sequence conservation, linear B-cell epitopes, and N-linked glycosylation. On the basis of these three properties, we analyzed 12,832 HA and 9487 NA protein sequences, which we retrieved from the influenza virus database. We classified the viral surface protein sequences into the 18 HA and 11 NA subtypes that have been identified thus far. Using available analytic tools, we searched for the representative strain of each virus subtype. Furthermore, using machine learning methods, we looked for conservation regions with sequences showing linear B-cell epitopes and N-linked glycosylation. Compared to the prediction of the Immune Epitope Database (IEDB) antibody neutralization response (i.e., screening of antibody sequence regions), in this study, the virus sequence coverage was large and accurate and contained N-linked glycosylation sites. The results of this study proved that we can use the machine learning-based prediction method to solve the problem of vaccine invalidation that occurred during the rapid evolution of the influenza A virus and also as a prevaccine assessment. In addition, the screening fragments can be used as a universal influenza vaccine design reference in the future.


Subject(s)
Conserved Sequence, B-Lymphocyte Epitopes, Influenza Virus Hemagglutinin Glycoproteins, Influenza A Virus, Human Influenza, Machine Learning, Neuraminidase, Viral Proteins, Protein Databases, Glycosylation, Humans, Influenza A Virus/classification
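
Candidate N-linked glycosylation sites, one of the three features analyzed in this study, are conventionally located with the sequon N-X-S/T, where X is any residue except proline. A minimal sketch of that textbook motif scan is shown below; it is not the machine learning method of the study, and the function name and example sequence are hypothetical.

```python
import re

# Lookahead keeps matches zero-width after the N, so overlapping sequons are all found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_n_glycosylation_sequons(protein_sequence):
    """Return 0-based positions of asparagines that start an N-X-S/T sequon (X != P)."""
    return [match.start() for match in SEQUON.finditer(protein_sequence.upper())]


# Hypothetical toy peptide, for illustration only.
example_sites = find_n_glycosylation_sequons("MNGTNPSANNTS")
```

A sequon match only marks a potential site; whether it is actually glycosylated depends on structural context that a pattern scan cannot capture.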
16.
PLoS Comput Biol ; 15(5): e1006942, 2019 05.
Article in English | MEDLINE | ID: mdl-31067213

ABSTRACT

T-DNA activation-tagging technology is widely used to study rice gene functions. When T-DNA inserts into the genome, flanking gene expression may be altered by the CaMV 35S enhancer, but the affected genes still need to be validated by biological experiments. We have developed the EAT-Rice platform to predict the flanking gene expression of T-DNA insertion sites in rice mutants. Three kinds of DNA sequences, UPS1K, DISTANCE, and MIDDLE, were retrieved and encoded to build a two-layer machine learning prediction model. In the first-layer models, the features nucleotide context (N-gram), cis-regulatory elements (Motif), nucleotide physicochemical properties (NPC), and CG-island (CGI) were used to build SVM models by analysing the concealed information embedded within the three kinds of sequences. Logistic regression was used to estimate the probability of gene activation, which served as the feature-encoding weight within the first-layer models. In the second-layer models, the NaiveBayesUpdateable algorithm was used to integrate these first-layer models, and the system performance was 88.33% on 5-fold cross-validation and 79.17% on independent testing. Among the three kinds of sequences, the model constructed from MIDDLE contributed most to the system for identifying the activated genes. The EAT-Rice system provided better performance and gene expression prediction at further distances when compared to the TRIM database. An online server based on EAT-Rice is available at http://predictor.nchu.edu.tw/EAT-Rice.


Subject(s)
Bacterial DNA/genetics, Forecasting/methods, Oryza/genetics, Base Sequence, Plant DNA/genetics, Gene Expression/genetics, Plant Gene Expression Regulation/genetics, Machine Learning, Statistical Models, Insertional Mutagenesis/methods, Mutation/genetics, Genetically Modified Plants, Transcriptional Activation/genetics
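
The nucleotide context (N-gram) feature named in this abstract can be understood as counting every overlapping run of k consecutive nucleotides in a sequence. A minimal sketch of such k-mer counting follows; the function name, the choice of k, and the example sequence are assumptions for illustration, and the encoding actually used by EAT-Rice is not described in the abstract.

```python
from collections import Counter

NUCLEOTIDES = set("ACGT")


def ngram_counts(dna_sequence, k):
    """Count overlapping k-mers in a DNA sequence, skipping windows with non-ACGT symbols."""
    dna_sequence = dna_sequence.upper()
    windows = (dna_sequence[i:i + k] for i in range(len(dna_sequence) - k + 1))
    return Counter(w for w in windows if set(w) <= NUCLEOTIDES)


# Hypothetical toy sequence, for illustration only.
example_counts = ngram_counts("ACGTAC", 2)
```

Dividing each count by the total number of windows would turn the counts into frequencies, which are easier to compare across sequences of different lengths.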
17.
Open Med (Wars) ; 14: 91-98, 2019.
Article in English | MEDLINE | ID: mdl-30847396

ABSTRACT

BACKGROUND: Hormone receptors of breast cancer, such as estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (Her-2), are important prognostic factors for breast cancer. OBJECTIVE: The current study aimed to develop a method to retrieve the statistics of hormone receptor expression status, documented in pathology reports, given their importance in research for primary and recurrent breast cancer, and quality management of pathology laboratories. METHOD: A two-stage text mining approach via regular expression-based word/phrase matching, was developed to retrieve the data. RESULTS: The method achieved a sensitivity of 98.8%, 98.7% and 98.4% for extraction of ER, PR, and Her-2 results. The hormone expression status from 3679 primary and 44 recurrent breast cancer cases was successfully retrieved with the method. Statistical analysis of these data showed that the recurrent disease had a significantly lower positivity rate for ER (54.5% vs 76.5%, p=0.001278) than primary breast cancer and a higher positivity rate for Her-2 (48.8% vs 16.2%, p=9.79e-8). These results corroborated the previous literature. CONCLUSION: Text mining on pathology reports using the developed method may benefit research of primary and recurrent breast cancer.
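
Regular expression-based word/phrase matching of the kind described in the METHOD can be illustrated with a single pattern that pairs a receptor name with the nearest following result word. This is a minimal single-stage sketch, not the two-stage method of the study; the pattern, the function name, and the report wording are all hypothetical.

```python
import re

# Receptor name, then any text up to a sentence break, then a result word.
RECEPTOR_PATTERN = re.compile(
    r"\b(ER|PR|HER-?2)\b[^.;\n]*?\b(positive|negative)\b",
    re.IGNORECASE,
)


def extract_receptor_status(report_text):
    """Return {'ER': ..., 'PR': ..., 'HER2': ...} for the markers found in a report."""
    results = {}
    for marker, status in RECEPTOR_PATTERN.findall(report_text):
        key = marker.upper().replace("-", "")
        results.setdefault(key, status.lower())  # keep the first mention of each marker
    return results


# Hypothetical report sentence, for illustration only.
example_report = "ER: positive (90%). PR: negative. Her-2: negative (score 1+)."
example_status = extract_receptor_status(example_report)
```

Real pathology reports vary far more in wording (abbreviations, negations, scores such as 2+), so a production pattern set would need to be validated against annotated reports, as the study did.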

18.
Sci Rep ; 8(1): 15512, 2018 10 19.
Article in English | MEDLINE | ID: mdl-30341374

ABSTRACT

Most modern tools used to predict sites of small ubiquitin-like modifier (SUMO) binding (referred to as SUMOylation) use algorithms, chemical features of the protein, and consensus motifs. However, these tools rarely consider the influence of post-translational modification (PTM) information for other sites within the same protein on the accuracy of prediction results. This study applied the Random Forest machine learning method, as well as motif screening models and a feature selection combination mechanism, to develop a SUMOylation prediction system, referred to as SUMOgo. In the prediction method, PTM sites were coded as new functional features in addition to structural features, such as sequence-based binary coding, encoded chemical features of proteins, and encoded secondary structure information that is important for PTM. Twenty cycles of prediction were conducted with a 1:1 combination of positive test data and random negative data. The Matthews correlation coefficient of SUMOgo reached 0.511, which is higher than that of current commonly used tools. This study further verified the important role of PTM in SUMOgo and includes a case study on CREB binding protein (CREBBP). The website for the final tool is http://predictor.nchu.edu.tw/SUMOgo.


Subject(s)
Algorithms, Computational Biology/methods, Lysine/metabolism, Post-Translational Protein Processing, Sumoylation, Amino Acid Motifs, Consensus Sequence, Protein Databases, ROC Curve, Small Ubiquitin-Related Modifier Proteins/chemistry, Small Ubiquitin-Related Modifier Proteins/metabolism
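
The evaluation protocol in this abstract pairs the positive test data with an equal number of randomly drawn negatives, repeated for twenty cycles. A minimal sketch of that kind of balanced resampling is given below; the function name, the seed handling, and sampling without replacement within each cycle are assumptions, since the abstract does not describe these details.

```python
import random


def balanced_cycles(positives, negatives, cycles=20, seed=0):
    """Yield (positives, sampled_negatives) pairs with a 1:1 ratio for each cycle.

    Negatives are drawn without replacement within a cycle and independently
    across cycles (assumed behaviour).
    """
    if len(negatives) < len(positives):
        raise ValueError("need at least as many negatives as positives")
    rng = random.Random(seed)  # fixed seed so the cycles are reproducible
    for _ in range(cycles):
        yield list(positives), rng.sample(list(negatives), len(positives))


# Hypothetical site identifiers, for illustration only.
example_cycles = list(
    balanced_cycles(["pos1", "pos2"], ["neg1", "neg2", "neg3", "neg4", "neg5"])
)
```

Averaging a metric such as the Matthews correlation coefficient over the cycles reduces the dependence of the result on any single random draw of negatives.
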
19.
Genes (Basel) ; 9(2)2018 Feb 14.
Article in English | MEDLINE | ID: mdl-29443925

ABSTRACT

Protein quaternary structure complex is also known as a multimer, which plays an important role in a cell. The dimer structure of transcription factors is involved in gene regulation, but the trimer structure of virus-infection-associated glycoproteins is related to the human immunodeficiency virus. The classification of the protein quaternary structure complex for the post-genome era of proteomics research will be of great help. Classification systems among protein quaternary structures have not been widely developed. Therefore, we designed the architecture of a two-layer machine learning technique in this study, and developed the classification system PClass. The protein quaternary structure of the complex is divided into five categories, namely, monomer, dimer, trimer, tetramer, and other subunit classes. In the framework of the bootstrap method with a support vector machine, we propose a new model selection method. Each type of complex is classified based on sequences, entropy, and accessible surface area, thereby generating a plurality of feature modules. Subsequently, the optimal model of effectiveness is selected as each kind of complex feature module. In this stage, the optimal performance can reach as high as 70% of Matthews correlation coefficient (MCC). The second layer of construction combines the first-layer module to integrate mechanisms and the use of six machine learning methods to improve the prediction performance. This system can be improved over 10% in MCC. Finally, we analyzed the performance of our classification system using transcription factors in dimer structure and virus-infection-associated glycoprotein in trimer structure. PClass is available via a web interface at http://predictor.nchu.edu.tw/PClass/.

20.
Entropy (Basel) ; 20(12)2018 Dec 19.
Article in English | MEDLINE | ID: mdl-33266711

ABSTRACT

Thermostability is a protein property that impacts many types of studies, including protein activity enhancement, protein structure determination, and drug development. However, most computational tools designed to predict protein thermostability require tertiary structure data as input. The few tools that are dependent only on the primary structure of a protein to predict its thermostability have one or more of the following problems: a slow execution speed, an inability to make large-scale mutation predictions, and the absence of temperature and pH as input parameters. Therefore, we developed a computational tool, named KStable, that is sequence-based, computationally rapid, and includes temperature and pH values to predict changes in the thermostability of a protein upon the introduction of a mutation at a single site. KStable was trained using basis features and minimal redundancy-maximal relevance (mRMR) features, and 58 classifiers were subsequently tested. To find the representative features, a regular-mRMR method was developed. When KStable was evaluated with an independent test set, it achieved an accuracy of 0.708.
