Results 1 - 17 of 17
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38622358

ABSTRACT

N6-methyladenosine (m6A) is the most abundant mRNA modification within mammalian cells, holding pivotal significance in the regulation of mRNA stability, translation and splicing. Furthermore, it plays a critical role in the regulation of RNA degradation by primarily recruiting the YTHDF2 reader protein. However, the selective regulation of mRNA decay of the m6A-methylated mRNA through YTHDF2 binding is poorly understood. To improve our understanding, we developed m6A-BERT-Deg, a BERT model adapted for predicting YTHDF2-mediated degradation of m6A-methylated mRNAs. We meticulously assembled a high-quality training dataset by integrating multiple data sources for the HeLa cell line. To overcome the limitation of small training samples, we employed a pre-training-fine-tuning strategy by first performing a self-supervised pre-training of the model on 427 760 unlabeled m6A site sequences. The test results demonstrated the importance of this pre-training strategy in enabling m6A-BERT-Deg to outperform other benchmark models. We further conducted a comprehensive model interpretation and revealed a surprising finding that the presence of co-factors in proximity to m6A sites may disrupt YTHDF2-mediated mRNA degradation, subsequently enhancing mRNA stability. We also extended our analyses to the HEK293 cell line, shedding light on the context-dependent YTHDF2-mediated mRNA degradation.


Subjects
Adenine , RNA-Binding Proteins , Transcription Factors , Animals , Humans , HEK293 Cells , HeLa Cells , RNA Stability , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Transcription Factors/metabolism
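
The m6A-BERT-Deg abstract above mentions self-supervised pre-training on unlabeled m6A site sequences but does not describe the objective. The following is a minimal sketch, assuming a BERT-style masked-token objective over overlapping k-mers. The k-mer size, the 15% mask rate, and the function names `kmer_tokens` and `mask_tokens` are illustrative assumptions, not details taken from the paper.

```python
import random

def kmer_tokens(seq, k=3):
    """Split an RNA sequence into overlapping k-mers (k=3 is an assumption)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def mask_tokens(tokens, mask_rate=0.15, seed=0, mask_token="[MASK]"):
    """Hide a random subset of tokens and return (masked tokens, labels).

    `labels` maps each masked position to the original token, which is what
    a masked-language model would be trained to recover.
    """
    rng = random.Random(seed)
    # Mask at least one token, but never more than are available.
    n_mask = min(len(tokens), max(1, round(mask_rate * len(tokens))))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    labels = {}
    for p in positions:
        labels[p] = masked[p]
        masked[p] = mask_token
    return masked, labels

# Toy sequence, not taken from the training data described in the abstract.
tokens = kmer_tokens("GGACUGGACU", k=3)
masked, labels = mask_tokens(tokens)
```

Because the labels come from the sequence itself, this kind of objective needs no degradation annotations, which is consistent with the abstract's point about a small labeled training set.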
2.
bioRxiv ; 2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38313267

ABSTRACT

Motivation: Molecular Regulatory Pathways (MRPs) are crucial for understanding biological functions. Knowledge Graphs (KGs) have become vital in organizing and analyzing MRPs, providing structured representations of complex biological interactions. Current tools for mining KGs from biomedical literature are inadequate in capturing complex, hierarchical relationships and contextual information about MRPs. Large Language Models (LLMs) like GPT-4 offer a promising solution, with advanced capabilities to decipher the intricate nuances of language. However, their potential for end-to-end KG construction, particularly for MRPs, remains largely unexplored. Results: We present reguloGPT, a novel GPT-4 based in-context learning prompt, designed for the end-to-end joint named entity recognition, N-ary relationship extraction, and context predictions from a sentence that describes regulatory interactions with MRPs. Our reguloGPT approach introduces a context-aware relational graph that effectively embodies the hierarchical structure of MRPs and resolves semantic inconsistencies by embedding context directly within relational edges. We created a benchmark dataset including 400 annotated PubMed titles on N6-methyladenosine (m6A) regulations. Rigorous evaluation of reguloGPT on the benchmark dataset demonstrated marked improvement over existing algorithms. We further developed a novel G-Eval scheme, leveraging GPT-4 for annotation-free performance evaluation and demonstrated its agreement with traditional annotation-based evaluations. Utilizing reguloGPT predictions on m6A-related titles, we constructed the m6A-KG and demonstrated its utility in elucidating m6A's regulatory mechanisms in cancer phenotypes across various cancers. These results underscore reguloGPT's transformative potential for extracting biological knowledge from the literature.
Availability and implementation: The source code of reguloGPT, the m6A title and benchmark datasets, and m6A-KG are available at: https://github.com/Huang-AI4Medicine-Lab/reguloGPT.

3.
ArXiv ; 2024 Jan 15.
Article in English | MEDLINE | ID: mdl-38292306

ABSTRACT

N6-methyladenosine (m6A) is the most abundant mRNA modification within mammalian cells, holding pivotal significance in the regulation of mRNA stability, translation, and splicing. Furthermore, it plays a critical role in the regulation of RNA degradation by primarily recruiting the YTHDF2 reader protein. However, the selective regulation of mRNA decay of the m6A-methylated mRNA through YTHDF2 binding is poorly understood. To improve our understanding, we developed m6A-BERT-Deg, a BERT model adapted for predicting YTHDF2-mediated degradation of m6A-methylated mRNAs. We meticulously assembled a high-quality training dataset by integrating multiple data sources for the HeLa cell line. To overcome the limitation of small training samples, we employed a pre-training-fine-tuning strategy by first performing a self-supervised pre-training of the model on 427,760 unlabeled m6A site sequences. The test results demonstrated the importance of this pre-training strategy in enabling m6A-BERT-Deg to outperform other benchmark models. We further conducted a comprehensive model interpretation and revealed a surprising finding that the presence of co-factors in proximity to m6A sites may disrupt YTHDF2-mediated mRNA degradation, subsequently enhancing mRNA stability. We also extended our analyses to the HEK293 cell line, shedding light on the context-dependent YTHDF2-mediated mRNA degradation.

4.
Cancers (Basel) ; 14(19)2022 Sep 29.
Article in English | MEDLINE | ID: mdl-36230685

ABSTRACT

Deep learning has been applied in precision oncology to address a variety of gene expression-based phenotype predictions. However, gene expression data's unique characteristics challenge the computer vision-inspired design of popular Deep Learning (DL) models such as Convolutional Neural Network (CNN) and ask for the need to develop interpretable DL models tailored for transcriptomics study. To address the current challenges in developing an interpretable DL model for modeling gene expression data, we propose a novel interpretable deep learning architecture called T-GEM, or Transformer for Gene Expression Modeling. We provided the detailed T-GEM model for modeling gene-gene interactions and demonstrated its utility for gene expression-based predictions of cancer-related phenotypes, including cancer type prediction and immune cell type classification. We carefully analyzed the learning mechanism of T-GEM and showed that the first layer has broader attention while higher layers focus more on phenotype-related genes. We also showed that T-GEM's self-attention could capture important biological functions associated with the predicted phenotypes. We further devised a method to extract the regulatory network that T-GEM learns by exploiting the attributions of self-attention weights for classifications and showed that the network hub genes were likely markers for the predicted phenotypes.

5.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34929734

ABSTRACT

Since its selection as the method of the year in 2013, single-cell technologies have become mature enough to provide answers to complex research questions. With the growth of single-cell profiling technologies, there has also been a significant increase in data collected from single-cell profilings, resulting in computational challenges to process these massive and complicated datasets. To address these challenges, deep learning (DL) is positioned as a competitive alternative for single-cell analyses besides the traditional machine learning approaches. Here, we survey a total of 25 DL algorithms and their applicability for a specific step in the single cell RNA-seq processing pipeline. Specifically, we establish a unified mathematical representation of variational autoencoder, autoencoder, generative adversarial network and supervised DL models, compare the training strategies and loss functions for these models, and relate the loss functions of these models to specific objectives of the data processing step. Such a presentation will allow readers to choose suitable algorithms for their particular objective at each step in the pipeline. We envision that this survey will serve as an important information portal for learning the application of DL for scRNA-seq analysis and inspire innovative uses of DL to address a broader range of new challenges in emerging multi-omics and spatial single-cell sequencing.


Subjects
Deep Learning , RNA-Seq/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis , Gene Expression Profiling/methods , Humans , Machine Learning , Sequence Analysis, RNA/methods , Transcriptome
6.
Membranes (Basel) ; 11(11)2021 Oct 27.
Article in English | MEDLINE | ID: mdl-34832055

ABSTRACT

Hydrogen-air proton exchange membrane fuel cells (PEMFCs) and direct methanol fuel cells (DMFCs) are excellent fuel cells with high limits of energy density. However, the low carbon monoxide (CO) tolerance of the Pt electrode catalyst in hydrogen-air PEMFCs and methanol permeation in DMFCs have greatly hindered their extensive use. Applying polybenzimidazole (PBI) membranes can avoid these problems. Their high thermal stability allows PBI membranes to work at elevated temperatures, at which the CO tolerance can be significantly improved; their excellent methanol resistance also makes them suitable for DMFCs. However, the poor proton conductivity of pristine PBI makes it difficult to apply directly in fuel cells. In the past decades, researchers have made great efforts to improve the proton conductivity of PBI membranes, and various effective modification methods have been proposed. To provide engineers and researchers with a basis for further improving the properties of fuel cells with PBI membranes, this paper reviews key research on the modification of PBI membranes in both hydrogen-air PEMFCs and DMFCs aimed at improving the proton conductivity. The modification methods have been classified and the obtained properties have been included. A guide for designing modifications on PBI membranes for high-performance fuel cells is provided.

7.
J Mech Behav Biomed Mater ; 124: 104834, 2021 12.
Article in English | MEDLINE | ID: mdl-34544016

ABSTRACT

3D image-based finite element (FE) and bone volume fraction (BV/TV)/fabric tensor modeling techniques are currently used to determine the apparent stiffness tensor of trabecular bone for assessing its anisotropic elastic behavior. Inspired by the recent success of deep learning (DL) techniques, we hypothesized that DL modeling techniques could be used to predict the apparent stiffness tensor of trabecular bone directly using dual-energy X-ray absorptiometry (DXA) images. To test the hypothesis, a convolutional neural network (CNN) model was trained and validated to predict the apparent stiffness tensor of trabecular bone cubes using their DXA images. Trabecular bone cubes obtained from human cadaver proximal femurs were used to obtain simulated DXA images as input, and the apparent stiffness tensor of the trabecular cubes determined by using micro-CT based FE simulations was used as output (ground truth) to train the DL model. The prediction accuracy of the DL model was evaluated by comparing it with the micro-CT based FE models, histomorphometric parameter based multiple linear regression models, and BV/TV/fabric tensor based multiple linear regression models. The results showed that DXA image-based DL model achieved high fidelity in predicting the apparent stiffness tensor of trabecular bone cubes (R2 = 0.905-0.973), comparable to or better than the histomorphometric parameter based multiple linear regression and BV/TV/fabric tensor based multiple linear regression models, thus supporting the hypothesis of this study. The outcome of this study could be used to help develop DXA image-based DL techniques for clinical assessment of bone fracture risk.


Subjects
Cancellous Bone , Deep Learning , Absorptiometry, Photon , Anisotropy , Bone Density , Cancellous Bone/diagnostic imaging , Finite Element Analysis , Humans , X-Ray Microtomography
8.
Anal Biochem ; 618: 114120, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33535061

ABSTRACT

Enhancers are non-coding DNA sequences bound by proteins called transcription factors. They function as distant regulators of gene transcription and participate in the development and maintenance of cell types and tissues. Since experimental validation of enhancers is expensive and time-consuming, many computational methods have been developed to predict enhancers and their strength. However, most of these methods still lack good performance in the prediction of enhancer strength. Here, we present a method to predict Enhancers Strength (i.e., strong and weak) by using Augmented data and Residual Convolutional Neural Network (ES-ARCNN). To train ES-ARCNN, we used two data augmentation tricks (i.e., reverse complement and shift) to previously identified enhancers for enlarging a previously identified dataset of enhancers. We further employed a residual convolutional neural network and trained it using the augmented dataset. Compared with other state-of-the-art methods in the 10-fold cross-validation (CV) test, ES-ARCNN has the best performance with the accuracy of 66.17%, and the tricks of data augmentation can effectively improve the prediction performance. We further tested ES-ARCNN on an independent dataset and obtained 65.5% accuracy, which has more than 4% improvement over the other three existing methods. The results in 10CV and independent tests show that ES-ARCNN can effectively predict the enhancer strength. The transcription factor binding sites (TFBSs) enrichment analysis shows that from the mechanistic perspective, enhancer strength is associated with a higher density of important TFBSs in a tissue. A user-friendly web-application is also provided at http://compgenomics.utsa.edu/ES-ARCNN/.


Subjects
Computational Biology , Databases, Genetic , Enhancer Elements, Genetic , Models, Genetic , Neural Networks, Computer , Transcription Factors/metabolism , Humans
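
The ES-ARCNN abstract above names two data augmentation tricks, reverse complement and shift, without giving implementation details. The following is a minimal sketch of both, assuming fixed-length DNA strings. The function names and the choice to pad shifted sequences with `N` are assumptions for illustration; the original method may shift by re-extracting neighbouring genomic sequence instead.

```python
# Hypothetical helpers; not taken from the ES-ARCNN source code.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def shift(seq: str, offset: int, pad: str = "N") -> str:
    """Shift a sequence by `offset` positions while keeping its length.

    A positive offset moves the sequence to the right, a negative one to
    the left. Vacated positions are filled with `pad` (an assumption).
    """
    n = len(seq)
    if offset >= 0:
        return (pad * offset + seq)[:n]
    return (seq + pad * (-offset))[-offset:]

def augment(seq: str, offsets=(-1, 1)) -> list[str]:
    """Original sequence, its reverse complement, and shifted copies of both."""
    base = [seq.upper(), reverse_complement(seq)]
    shifted = [shift(s, k) for s in base for k in offsets]
    return base + shifted

# Toy sequence for illustration only.
examples = augment("AACG")
```

Each input sequence yields six training examples here. The augmented copies inherit the label (strong or weak) of the sequence they were derived from.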
9.
Bone Rep ; 13: 100295, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32695850

ABSTRACT

Dual-energy X-ray absorptiometry (DXA) is widely used for clinical assessment of bone mineral density (BMD). Recent evidence shows that DXA images may also contain microstructural information of trabecular bones. However, no current image processing techniques could aptly extract the information. Inspired by the success of deep learning techniques in medical image analyses, we hypothesized in this study that DXA image-based deep learning models could predict the major microstructural features of trabecular bone with a reasonable accuracy. To test the hypothesis, 1249 trabecular cubes (6 mm × 6 mm × 6 mm) were digitally dissected out from the reconstruction of seven human cadaveric proximal femurs using microCT scans. From each cube, simulated DXA images in designated projections were generated, and the histomorphometric parameters (i.e., BV/TV, BS, Tb.Th, DA, Conn. D, and SMI) of the cube were determined using Image J. Convolutional neural network (CNN) models were trained using the simulated DXA images to predict the histomorphometric parameters of trabecular bone cubes. The results exhibited that the CNN models achieved high fidelity in predicting these histomorphometric parameters (from R = 0.80 to R = 0.985), showing that the DL models exhibited the capability of predicting the microstructural features using DXA images. This study also showed that the number and resolution of input simulated DXA images had considerable impacts on the prediction accuracy of the DL models. These findings support the hypothesis of this study and indicate a high potential of using DXA images in prediction of osteoporotic bone fracture risk.

10.
PLoS Pathog ; 16(1): e1008114, 2020 01.
Article in English | MEDLINE | ID: mdl-31951641

ABSTRACT

Infection by Kaposi's sarcoma-associated herpesvirus (KSHV) is necessary for the development of Kaposi's sarcoma (KS), which most often develops in HIV-infected individuals. KS frequently has oral manifestations and KSHV DNA can be detected in oral cells. Numerous types of cancer are associated with the alteration of microbiome including bacteria and virus. We hypothesize that oral bacterial microbiota affects or is affected by oral KS and the presence of oral cell-associated KSHV DNA. In this study, oral and blood specimens were collected from a cohort of HIV/KSHV-coinfected individuals all previously diagnosed with KS, and were classified as having oral KS with any oral cell-associated KSHV DNA status (O-KS, n = 9), no oral KS but with oral cell-associated KSHV DNA (O-KSHV, n = 10), or with neither oral KS nor oral cell-associated KSHV DNA (No KSHV, n = 10). We sequenced the hypervariable V1-V2 region of the 16S rRNA gene present in oral cell-associated DNA by next generation sequencing. The diversity, richness, relative abundance of operational taxonomic units (OTUs) and taxonomic composition of oral microbiota were analyzed and compared across the 3 studied groups. We found impoverishment of oral microbial diversity and enrichment of specific microbiota in O-KS individuals compared to O-KSHV or No KSHV individuals. These results suggest that HIV/KSHV coinfection and oral microbiota might impact one another and influence the development of oral KS.


Subjects
Bacteria/isolation & purification , DNA, Viral/genetics , HIV Infections/microbiology , Herpesvirus 8, Human/genetics , Microbiota , Mouth/microbiology , Sarcoma, Kaposi/virology , Bacteria/classification , Bacteria/genetics , Cohort Studies , Coinfection/immunology , Coinfection/microbiology , Coinfection/virology , Cross-Sectional Studies , DNA, Viral/metabolism , HIV Infections/complications , HIV Infections/immunology , HIV Infections/virology , Herpesvirus 8, Human/isolation & purification , Herpesvirus 8, Human/physiology , Humans , Mouth/virology , Phylogeny , Sarcoma, Kaposi/complications , Sarcoma, Kaposi/immunology , Sarcoma, Kaposi/microbiology
11.
BMC Med Genomics ; 12(1): 119, 2019 Aug 12.
Article in English | MEDLINE | ID: mdl-31405368

ABSTRACT

Following publication of the original article [1], the authors provided an updated funding statement to the article. The updated statement is as follows.

12.
BMC Med Genomics ; 12(Suppl 1): 18, 2019 01 31.
Article in English | MEDLINE | ID: mdl-30704458

ABSTRACT

BACKGROUND: The study of high-throughput genomic profiles from a pharmacogenomics viewpoint has provided unprecedented insights into the oncogenic features modulating drug response. A recent study screened for the response of a thousand human cancer cell lines to a wide collection of anti-cancer drugs and illuminated the link between cellular genotypes and vulnerability. However, due to essential differences between cell lines and tumors, to date the translation into predicting drug response in tumors remains challenging. Recently, advances in deep learning have revolutionized bioinformatics and introduced new techniques to the integration of genomic data. Its application on pharmacogenomics may fill the gap between genomics and drug response and improve the prediction of drug response in tumors. RESULTS: We proposed a deep learning model to predict drug response (DeepDR) based on mutation and expression profiles of a cancer cell or a tumor. The model contains three deep neural networks (DNNs), i) a mutation encoder pre-trained using a large pan-cancer dataset (The Cancer Genome Atlas; TCGA) to abstract core representations of high-dimension mutation data, ii) a pre-trained expression encoder, and iii) a drug response predictor network integrating the first two subnetworks. Given a pair of mutation and expression profiles, the model predicts IC50 values of 265 drugs. We trained and tested the model on a dataset of 622 cancer cell lines and achieved an overall prediction performance of mean squared error at 1.96 (log-scale IC50 values). The performance was superior in prediction error or stability than two classical methods (linear regression and support vector machine) and four analog DNN models of DeepDR, including DNNs built without TCGA pre-training, partly replaced by principal components, and built on individual types of input data. We then applied the model to predict drug response of 9059 tumors of 33 cancer types. 
Using per-cancer and pan-cancer settings, the model predicted both known, including EGFR inhibitors in non-small cell lung cancer and tamoxifen in ER+ breast cancer, and novel drug targets, such as vinorelbine for TTN-mutated tumors. The comprehensive analysis further revealed the molecular mechanisms underlying the resistance to a chemotherapeutic drug docetaxel in a pan-cancer setting and the anti-cancer potential of a novel agent, CX-5461, in treating gliomas and hematopoietic malignancies. CONCLUSIONS: Here we present, as far as we know, the first DNN model to translate pharmacogenomics features identified from in vitro drug screening to predict the response of tumors. The results covered both well-studied and novel mechanisms of drug resistance and drug targets. Our model and findings improve the prediction of drug response and the identification of novel therapeutic options.


Subjects
Antineoplastic Agents/pharmacology , Deep Learning , Genomics/methods , Benzothiazoles/pharmacology , Cell Line, Tumor , Docetaxel/pharmacology , Humans , Mutation , Naphthyridines/pharmacology , Transcriptome/drug effects
13.
BMC Syst Biol ; 12(Suppl 8): 142, 2018 12 21.
Article in English | MEDLINE | ID: mdl-30577835

ABSTRACT

BACKGROUND: Bioinformatics tools have been developed to interpret gene expression data at the gene set level, and these gene set based analyses improve the biologists' capability to discover functional relevance of their experiment design. While elucidating gene set individually, inter-gene sets association is rarely taken into consideration. Deep learning, an emerging machine learning technique in computational biology, can be used to generate an unbiased combination of gene set, and to determine the biological relevance and analysis consistency of these combining gene sets by leveraging large genomic data sets. RESULTS: In this study, we proposed a gene superset autoencoder (GSAE), a multi-layer autoencoder model with the incorporation of a priori defined gene sets that retain the crucial biological features in the latent layer. We introduced the concept of the gene superset, an unbiased combination of gene sets with weights trained by the autoencoder, where each node in the latent layer is a superset. Trained with genomic data from TCGA and evaluated with their accompanying clinical parameters, we showed gene supersets' ability of discriminating tumor subtypes and their prognostic capability. We further demonstrated the biological relevance of the top component gene sets in the significant supersets. CONCLUSIONS: Using autoencoder model and gene superset at its latent layer, we demonstrated that gene supersets retain sufficient biological information with respect to tumor subtypes and clinical prognostic significance. Superset also provides high reproducibility on survival analysis and accurate prediction for cancer subtypes.


Subjects
Genomics/methods , Adenocarcinoma of Lung/diagnosis , Adenocarcinoma of Lung/genetics , Breast Neoplasms/genetics , Humans , Machine Learning , Prognosis , Survival Analysis
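
The GSAE abstract above describes latent nodes ("supersets") as weighted combinations of a priori defined gene sets. One way to picture this, offered only as a hedged sketch, is an encoder whose first layer is masked by gene-set membership and whose second layer mixes gene-set scores into supersets. The masking scheme, the omission of biases and activations, and all names and numbers below are assumptions for illustration, not the published architecture.

```python
def gene_set_mask(genes, gene_sets):
    """Binary membership matrix: one row per gene set, one column per gene."""
    return [[1.0 if g in members else 0.0 for g in genes]
            for members in gene_sets.values()]

def gene_set_scores(expr, weights, mask):
    """Masked linear layer: each gene set only sees its own member genes."""
    return [sum(w * m * x for w, m, x in zip(w_row, m_row, expr))
            for w_row, m_row in zip(weights, mask)]

def superset_scores(set_scores, superset_weights):
    """Each superset is a weighted combination of gene-set scores."""
    return [sum(w * s for w, s in zip(row, set_scores))
            for row in superset_weights]

# Toy data: gene-set membership, weights, and expression values are made up.
genes = ["TP53", "EGFR", "MYC"]
gene_sets = {"SET_A": {"TP53", "MYC"}, "SET_B": {"EGFR"}}
mask = gene_set_mask(genes, gene_sets)
expr = [2.0, 3.0, 4.0]
weights = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]      # would be learned in practice
set_scores = gene_set_scores(expr, weights, mask)
supersets = superset_scores(set_scores, [[0.5, 0.5]])
```

In a trained model the weights would come from minimizing reconstruction error, and the largest superset weights would indicate which gene sets contribute most to each latent node.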
14.
Brain Sci ; 8(4)2018 Apr 23.
Article in English | MEDLINE | ID: mdl-29690601

ABSTRACT

Varying indoor environmental conditions are known to affect office workers' performance; past research studies have reported that unfavorable indoor temperature and air quality cause sick building syndrome (SBS) among office workers. Thus, investigating factors that can predict performance in changing indoor environments has become a highly important research topic with significant impact on our society. While past research studies have attempted to determine predictors for performance, they do not provide satisfactory prediction ability. Therefore, in this preliminary study, we attempt to predict performance during office-work tasks triggered by different indoor room temperatures (22.2 °C and 30 °C) from human brain signals recorded using electroencephalography (EEG). Seven participants were recruited, from whom EEG, skin temperature, heart rate and thermal survey questionnaires were collected. Regression analyses were carried out to investigate the effectiveness of using EEG power spectral densities (PSD) as predictors of performance. Our results indicate that EEG PSDs as predictors provide the highest R² (> 0.70), which is 17 times higher than that obtained using other physiological signals as predictors, and are more robust. Finally, the paper provides insight into the selected predictors based on brain activity patterns for low- and high-performance levels under different indoor temperatures.

15.
Int J Mol Sci ; 15(2): 3220-33, 2014 Feb 21.
Article in English | MEDLINE | ID: mdl-24566145

ABSTRACT

Protein-protein interactions (PPIs) play a key role in many cellular processes. Unfortunately, the experimental methods currently used to identify PPIs are both time-consuming and expensive. These obstacles could be overcome by developing computational approaches to predict PPIs. Here, we report two methods of amino acids feature extraction: (i) distance frequency with PCA reducing the dimension (DFPCA) and (ii) amino acid index distribution (AAID) representing the protein sequences. In order to obtain the most robust and reliable results for PPI prediction, pairwise kernel function and support vector machines (SVM) were employed to avoid the concatenation order of two feature vectors generated with two proteins. The highest prediction accuracies of AAID and DFPCA were 94% and 93.96%, respectively, using the 10 CV test, and the results of pairwise radial basis kernel function are considerably improved over those based on radial basis kernel function. Overall, the PPI prediction tool, termed PPI-PKSVM, which is freely available at http://159.226.118.31/PPI/index.html, promises to become useful in such areas as bio-analysis and drug development.


Subjects
Proteins/metabolism , Support Vector Machine , Algorithms , Amino Acids/chemistry , Internet , Protein Interaction Maps , Proteins/chemistry , Software
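
The PPI-PKSVM abstract above says a pairwise kernel was used so that the result does not depend on the order in which the two proteins' feature vectors are combined. The abstract does not give the formula. The following sketch assumes the commonly used symmetric pairwise form K((a, b), (c, d)) = k(a, c)·k(b, d) + k(a, d)·k(b, c) with a radial basis function as the base kernel k. The function names and the `gamma` value are illustrative.

```python
import math

def rbf(x, y, gamma=0.5):
    """Radial basis kernel between two equal-length feature vectors."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

def pairwise_rbf(pair1, pair2, gamma=0.5):
    """Order-invariant kernel between protein pairs (a, b) and (c, d).

    Swapping a and b (or c and d) leaves the value unchanged, so no
    concatenation order has to be chosen for a protein pair.
    """
    a, b = pair1
    c, d = pair2
    return (rbf(a, c, gamma) * rbf(b, d, gamma)
            + rbf(a, d, gamma) * rbf(b, c, gamma))

# Toy two-dimensional feature vectors, not real DFPCA or AAID features.
a, b = [0.0, 0.0], [1.0, 0.0]
c, d = [0.0, 1.0], [1.0, 1.0]
k_ab_cd = pairwise_rbf((a, b), (c, d))
k_ba_cd = pairwise_rbf((b, a), (c, d))
```

Here `k_ab_cd` and `k_ba_cd` agree, which is the property that a plain radial basis kernel on concatenated vectors lacks. A Gram matrix built from such a function could be passed to any SVM implementation that accepts precomputed kernels.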
16.
Mol Inform ; 33(3): 230-9, 2014 Mar.
Article in English | MEDLINE | ID: mdl-27485691

ABSTRACT

Fast and effective prediction of signal peptides (SP) and their cleavage sites is of great importance in computational biology. The approaches developed to predict signal peptides can be roughly divided into machine learning based and sliding window based. In order to further increase the prediction accuracy and organism coverage for SP cleavage sites, we propose a novel method for predicting SP cleavage sites called Signal-CTF that utilizes machine learning and sliding windows, and is designed for N-terminal secretory proteins in a large variety of organisms including human, animal, plant, virus, bacteria, fungi and archaea. Signal-CTF consists of three distinct elements: (1) a subsite-coupled and regularization function with a scaled window of fixed width that selects a set of candidates of possible secretion-cleavable segments for a query secretory protein; (2) a sum fusion system that integrates the outcomes from aligning the cleavage site template sequence with each of the aforementioned candidates in a scaled window of fixed width to determine the best candidate cleavage sites for the query secretory protein; (3) a voting system that identifies the ultimate signal peptide cleavage site among all possible results derived from using scaled windows of different width. When compared with the Signal-3L and SignalP 4.0 predictors, the prediction accuracy of Signal-CTF is 4-12% higher than that of Signal-3L for human, animal and eukaryote, and 10-25% higher than that of SignalP 4.0 for eukaryota, Gram-positive bacteria and Gram-negative bacteria. Compared with the PRED-SIGNAL and SignalP 4.0 predictors on the 32 archaea secretory proteins used in Bagos's paper, the prediction accuracy of Signal-CTF is 12.5% and 25% higher than that of PRED-SIGNAL and SignalP 4.0, respectively. The prediction results for several long signal peptides show that Signal-CTF can better predict cleavage sites for long signal peptides than SignalP, Phobius, Philius, SPOCTOPUS, Signal-CF and Signal-3L. These results show that Signal-CTF is more accurate and flexible in predicting signal peptides of different characteristics for many organisms. Signal-CTF is freely available as a web-server at http://darwin2.cbi.utsa.edu/minniweb/index.html.
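
The third element of Signal-CTF described above is a voting system over the cleavage sites obtained with windows of different width. The abstract does not specify the voting rule. The following sketch assumes a simple plurality vote, with ties resolved in favour of the site that was proposed first. The function name and the example window widths and positions are hypothetical.

```python
from collections import Counter

def vote_cleavage_site(predictions):
    """Pick the cleavage position proposed by the most window widths.

    `predictions` maps a window width to the cleavage position predicted
    with that width. Returns (position, number of votes).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions.values())
    # most_common keeps first-seen order among equal counts.
    site, votes = counts.most_common(1)[0]
    return site, votes

# Made-up predictions for four window widths.
site, votes = vote_cleavage_site({15: 22, 17: 22, 19: 24, 21: 22})
```

In this toy case three of the four window widths agree on position 22, so that position is returned with three votes.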

17.
Anal Biochem ; 449: 164-71, 2014 Mar 15.
Article in English | MEDLINE | ID: mdl-24361712

ABSTRACT

Revealing the subcellular location of newly discovered protein sequences can bring insight to their function and guide research at the cellular level. The rapidly increasing number of sequences entering the genome databanks has called for the development of automated analysis methods. Currently, most existing methods used to predict protein subcellular locations cover only one, or a very limited number of species. Therefore, it is necessary to develop reliable and effective computational approaches to further improve the performance of protein subcellular prediction and, at the same time, cover more species. The current study reports the development of a novel predictor called MSLoc-DT to predict the protein subcellular locations of human, animal, plant, bacteria, virus, fungi, and archaea by introducing a novel feature extraction approach termed Amino Acid Index Distribution (AAID) and then fusing gene ontology information, sequential evolutionary information, and sequence statistical information through four different modes of pseudo amino acid composition (PseAAC) with a decision template rule. Using the jackknife test, MSLoc-DT can achieve 86.5, 98.3, 90.3, 98.5, 95.9, 98.1, and 99.3% overall accuracy for human, animal, plant, bacteria, virus, fungi, and archaea, respectively, on seven stringent benchmark datasets. Compared with other predictors (e.g., Gpos-PLoc, Gneg-PLoc, Virus-PLoc, Plant-PLoc, Plant-mPLoc, ProLoc-Go, Hum-PLoc, GOASVM) on the gram-positive, gram-negative, virus, plant, eukaryotic, and human datasets, the new MSLoc-DT predictor is much more effective and robust. Although the MSLoc-DT predictor is designed to predict the single location of proteins, our method can be extended to multiple locations of proteins by introducing multilabel machine learning approaches, such as the support vector machine and deep learning, as substitutes for the K-nearest neighbor (KNN) method. 
As a user-friendly web server, MSLoc-DT is freely accessible at http://bioinfo.ibp.ac.cn/MSLOC_DT/index.html.


Subjects
Artificial Intelligence , Computational Biology/methods , Proteins/analysis , Subcellular Fractions/chemistry , Amino Acid Sequence , Animals , Databases, Protein , Gene Ontology , Humans , Molecular Sequence Data