1.
Front Endocrinol (Lausanne) ; 15: 1376220, 2024.
Article in English | MEDLINE | ID: mdl-38562414

ABSTRACT

Background: Identification of patients at risk for type 2 diabetes mellitus (T2DM) can not only prevent complications and reduce suffering but also ease the health care burden. While routine physical examination can provide useful information for diagnosis, manual exploration of routine physical examination records is not feasible due to the high prevalence of T2DM. Objectives: We aim to build interpretable machine learning models for T2DM diagnosis and uncover important diagnostic indicators from physical examination, including age- and sex-related indicators. Methods: In this study, we present three weighted diversity density (WDD)-based algorithms for T2DM screening that use physical examination indicators. The algorithms are highly transparent and interpretable, and two of them tolerate missing values. Patients: Regarding the dataset, we collected data on 43 physical examination indicators from 11,071 cases of T2DM patients and 126,622 healthy controls at the Affiliated Hospital of Southwest Medical University. After data processing, we used a data matrix containing 16,004 EHRs and 43 clinical indicators for modelling. Results: The indicators were ranked according to their model weights, and the top 25% of indicators were found to be directly or indirectly related to T2DM. We further investigated the clinical characteristics of different age and sex groups, and found that the algorithms can detect relevant indicators specific to these groups. The algorithms performed well in T2DM screening, with the highest area under the receiver operating characteristic curve (AUC) reaching 0.9185. Conclusion: This work utilized the interpretable WDD-based algorithms to construct T2DM diagnostic models based on physical examination indicators. By modeling data grouped by age and sex, we identified several predictive markers related to age and sex, uncovering characteristic differences among various groups of T2DM patients.


Subjects
Diabetes Mellitus, Type 2; Humans; Diabetes Mellitus, Type 2/diagnosis; Diabetes Mellitus, Type 2/epidemiology; Machine Learning; Algorithms; ROC Curve; Biomarkers
2.
Proteomics ; : e2300184, 2024 Apr 21.
Article in English | MEDLINE | ID: mdl-38643383

ABSTRACT

Unconventional secretory proteins (USPs) are vital for cell-to-cell communication and are necessary for proper physiological processes. Unlike classical proteins that follow the conventional secretory pathway via the Golgi apparatus, these proteins are released using unconventional pathways. The primary modes of secretion for USPs are exosomes and ectosomes, which originate from the endoplasmic reticulum. Accurate and rapid identification of exosome-mediated secretory proteins is crucial for gaining valuable insights into the regulation of non-classical protein secretion and intercellular communication, as well as for the advancement of novel therapeutic approaches. Although computational methods based on amino acid sequence prediction exist for predicting unconventional proteins secreted by exosomes (UPSEs), they suffer from significant limitations in terms of algorithmic accuracy. In this study, we propose a novel approach to predict UPSEs by combining multiple deep learning models that incorporate both protein sequences and evolutionary information. Our approach utilizes a convolutional neural network (CNN) to extract protein sequence information, while various densely connected neural networks (DNNs) are employed to capture evolutionary conservation patterns. By combining six distinct deep learning models, we have created a superior framework that surpasses previous approaches, achieving an ACC score of 77.46% and an MCC score of 0.5406 on an independent test dataset.
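
Editorial note: the ACC and MCC scores quoted in this abstract are standard binary-classification metrics. The sketch below shows how both follow from a confusion matrix. The function name and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, Matthews correlation coefficient) for binary counts.

    tp, tn, fp, fn are the true-positive, true-negative, false-positive and
    false-negative counts. The name and signature are hypothetical.
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Made-up counts, for illustration only.
acc, mcc = accuracy_and_mcc(tp=40, tn=45, fp=5, fn=10)
print(f"ACC={acc:.4f} MCC={mcc:.4f}")
```

MCC uses all four cells of the confusion matrix, which is one reason it is commonly reported next to accuracy when the classes are imbalanced.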

3.
Comput Struct Biotechnol J ; 21: 4836-4848, 2023.
Article in English | MEDLINE | ID: mdl-37854634

ABSTRACT

Autophagy is a primary mechanism for maintaining cellular homeostasis. The synergistic actions of autophagy-related (ATG) proteins strictly regulate the whole autophagic process. Therefore, accurate identification of ATGs is a first and critical step to reveal the molecular mechanism underlying the regulation of autophagy. Current computational methods can predict ATGs from primary protein sequences, but owing to the limitations of algorithms, significant room for improvement still exists. In this research, we propose EnsembleDL-ATG, an ensemble deep learning framework that aggregates multiple deep learning models to predict ATGs from protein sequence and evolutionary information. We first evaluated the performance of individual networks for various feature descriptors to identify the most promising models. Then, we explored all possible combinations of independent models to select the most effective ensemble architecture. The final framework was built and maintained by an organization of four different deep learning models. Experimental results show that our proposed method achieves a prediction accuracy of 94.5% and MCC of 0.890, which are nearly 4% and 0.08 higher than ATGPred-FL, respectively. Overall, EnsembleDL-ATG is the first ATG machine learning predictor based on ensemble deep learning. The benchmark data and code utilized in this study can be accessed for free at https://github.com/jingry/autoBioSeqpy/tree/2.0/examples/EnsembleDL-ATG.
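
Editorial note: the abstract describes exploring all possible combinations of independent models to choose an ensemble. A minimal sketch of such an exhaustive search follows. It assumes each model outputs a positive-class probability per sample and that an ensemble simply averages those probabilities; the abstract does not state the aggregation rule, and the model names and numbers here are hypothetical.

```python
from itertools import combinations


def ensemble_accuracy(prob_lists, labels, threshold=0.5):
    """Accuracy of an ensemble that averages per-model probabilities."""
    n_models = len(prob_lists)
    correct = 0
    for i, y in enumerate(labels):
        mean_p = sum(probs[i] for probs in prob_lists) / n_models
        correct += int((mean_p >= threshold) == bool(y))
    return correct / len(labels)


def best_ensemble(model_probs, labels):
    """Try every non-empty subset of models and keep the most accurate one.

    Ties are resolved in favour of the subset found first, which means
    smaller subsets win over larger ones with the same accuracy.
    """
    names = sorted(model_probs)
    best_combo, best_acc = None, -1.0
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            acc = ensemble_accuracy([model_probs[n] for n in combo], labels)
            if acc > best_acc:
                best_combo, best_acc = combo, acc
    return best_combo, best_acc


# Hypothetical validation-set probabilities for three models.
labels = [1, 0, 1, 0]
model_probs = {
    "cnn": [0.9, 0.6, 0.4, 0.2],
    "rnn": [0.6, 0.2, 0.7, 0.6],
    "dnn": [0.8, 0.3, 0.45, 0.1],
}
print(best_ensemble(model_probs, labels))
```

The search visits 2^n - 1 subsets, so it is only practical for a small pool of candidate models.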

4.
Front Microbiol ; 14: 1175925, 2023.
Article in English | MEDLINE | ID: mdl-37275146

ABSTRACT

Post-transcriptional RNA modifications, also known as the epitranscriptome, play crucial roles in the regulation of gene expression during development. Recently, deep learning (DL) has been employed for RNA modification site prediction and has shown promising results. However, due to the lack of relevant studies, it is unclear which DL architecture is best suited for some pyrimidine modifications, such as 5-methyluridine (m5U). To fill this knowledge gap, we first performed a comparative evaluation of various commonly used DL models for epigenetic studies with the help of autoBioSeqpy. We identified optimal architectural variations for m5U site classification, optimizing the layer depth and neuron width. Second, we used this knowledge to develop Deepm5U, an improved convolutional-recurrent neural network that accurately predicts m5U sites from RNA sequences. We successfully applied Deepm5U to transcriptome-wide m5U profiling data across different sequencing technologies and cell types. Third, we showed that the techniques for interpreting deep neural networks, including LayerUMAP and DeepSHAP, can provide important insights into the internal operation and behavior of models. Overall, we offered practical guidance for the development, benchmarking, and analysis of deep learning models when designing new algorithms for RNA modifications.

5.
ACS Omega ; 8(22): 19728-19740, 2023 Jun 06.
Article in English | MEDLINE | ID: mdl-37305295

ABSTRACT

N7-Methylguanosine (m7G) is a crucial post-transcriptional RNA modification that plays a pivotal role in regulating gene expression. Accurately identifying m7G sites is a fundamental step in understanding the biological functions and regulatory mechanisms associated with this modification. While whole-genome sequencing is the gold standard for RNA modification site detection, it is a time-consuming, expensive, and intricate process. Recently, computational approaches, especially deep learning (DL) techniques, have gained popularity in achieving this objective. Convolutional neural networks and recurrent neural networks are examples of DL algorithms that have emerged as versatile tools for modeling biological sequence data. However, developing an efficient network architecture with superior performance remains a challenging task, requiring significant expertise, time, and effort. To address this, we previously introduced a tool called autoBioSeqpy, which streamlines the design and implementation of DL networks for biological sequence classification. In this study, we utilized autoBioSeqpy to develop, train, evaluate, and fine-tune sequence-level DL models for predicting m7G sites. We provided detailed descriptions of these models, along with a step-by-step guide on their execution. The same methodology can be applied to other systems dealing with similar biological questions. The benchmark data and code utilized in this study can be accessed for free at http://github.com/jingry/autoBioSeqpy/tree/2.0/examples/m7G.

6.
J Adv Res ; 41: 219-231, 2022 11.
Article in English | MEDLINE | ID: mdl-36328750

ABSTRACT

INTRODUCTION: The top priority in drug development is to identify novel and effective drug targets. In vitro assays are frequently used for this purpose; however, traditional experimental approaches are insufficient for large-scale exploration of novel drug targets, as they are expensive, time-consuming and laborious. Therefore, computational methods have emerged in recent decades as an alternative to aid experimental drug discovery studies by developing sophisticated predictive models to estimate unknown drugs/compounds and their targets. The recent success of deep learning (DL) techniques in machine learning and artificial intelligence has further attracted a great deal of attention in the biomedicine field, including computational drug discovery. OBJECTIVES: This study focuses on the practical applications of deep learning algorithms for predicting druggable proteins and proposes a powerful predictor for fast and accurate identification of potential drug targets. METHODS: Using a gold-standard dataset, we explored several typical protein features and different deep learning algorithms and evaluated their performance in a comprehensive way. We provide an overview of the entire experimental process, including protein features and descriptors, neural network architectures, libraries and toolkits for deep learning modelling, performance evaluation metrics, model interpretation and visualization. RESULTS: Experimental results show that the hybrid model (architecture: CNN-RNN (BiLSTM) + DNN; feature: dictionary encoding + DC_TC_CTD) performed better than the other models on the benchmark dataset. This hybrid model was able to achieve 90.0% accuracy and 0.800 MCC on the test dataset and 84.8% and 0.703 on a nonredundant independent test dataset, which are comparable to those of existing methods. CONCLUSION: We developed the first deep learning-based classifier for fast and accurate identification of potential druggable proteins. We hope that this study will be helpful for future researchers who would like to use deep learning techniques to develop relevant predictive models.


Subjects
Deep Learning; Artificial Intelligence; Neural Networks, Computer; Algorithms; Machine Learning; Proteins
7.
iScience ; 25(12): 105530, 2022 Dec 22.
Article in English | MEDLINE | ID: mdl-36425757

ABSTRACT

Despite the impressive success of deep learning techniques in various types of classification and prediction tasks, interpreting these models and explaining their predictions are still major challenges. In this article, we present an easy-to-use command line tool capable of visualizing and analyzing alternative representations of biological observations learned by deep learning models. This new tool, namely, layerUMAP, integrates autoBioSeqpy software and the UMAP library to address learned high-level representations. An important advantage of the tool is that it provides an interactive option that enables users to visualize the outputs of hidden layers along the depth of the model. We use two different classes of examples to illustrate the potential power of layerUMAP, and the results demonstrate that layerUMAP can provide insightful visual feedback about models and further guide us to develop better models.

8.
Brief Bioinform ; 23(6), 2022 11 19.
Article in English | MEDLINE | ID: mdl-36305428

ABSTRACT

Predicting RNA solvent accessibility using only primary sequence data can be regarded as sequence-based prediction work. Currently, the established studies for sequence-based RNA solvent accessibility prediction are limited by the small number of available datasets and by black-box prediction. To address these issues, we first expanded the available RNA structures and then developed a sequence-based model using modified attention layers with different receptive fields to conform to the stem-loop structure of RNA chains. We measured the improvement with an extended dataset and further explored the model's interpretability by analysing the model structures, attention values and hyperparameters. Finally, we found that the developed model regarded the pieces of a sequence as templates during the training process. This work will be helpful for researchers who would like to build RNA attribute prediction models using deep learning in the future.


Subjects
RNA; Solvents/chemistry; RNA/genetics
9.
Front Microbiol ; 13: 843425, 2022.
Article in English | MEDLINE | ID: mdl-35401453

ABSTRACT

DNA N4-methylcytosine (4mC) is a pivotal epigenetic modification that plays an essential role in DNA replication, repair, expression and differentiation. To gain insight into the biological functions of 4mC, it is critical to identify its modification sites in the genome. In recent years, deep learning has become increasingly popular and is frequently employed for 4mC site identification. However, a systematic analysis of how to build predictive models using deep learning techniques is still lacking. In this work, we first summarized all existing deep learning-based predictors and systematically analyzed their models, features and datasets. Then, using a typical standard dataset with three species (A. thaliana, C. elegans, and D. melanogaster), we assessed the contribution of different model architectures, encoding methods and the attention mechanism in establishing a deep learning-based model for 4mC site prediction. After a series of optimizations, a convolutional-recurrent neural network architecture using one-hot encoding and the attention mechanism achieved the best overall prediction performance. Extensive comparison experiments were conducted based on the same dataset. This work will be helpful for researchers who would like to build 4mC prediction models using deep learning in the future.
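
Editorial note: one-hot encoding, which this abstract reports as the best-performing input representation, maps each nucleotide to a 4-element indicator vector. The sketch below is a generic illustration, not the paper's code. Encoding unknown bases such as N as all-zero rows is an assumption made here.

```python
def one_hot_encode(seq, alphabet="ACGT"):
    """Encode a DNA sequence as a list of one-hot rows, one row per base.

    Bases outside the alphabet (for example 'N') become all-zero rows;
    that handling is an assumption of this sketch.
    """
    index = {base: i for i, base in enumerate(alphabet)}
    encoded = []
    for base in seq.upper():
        row = [0] * len(alphabet)
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


for base, row in zip("ACGN", one_hot_encode("ACGN")):
    print(base, row)
```

The resulting sequence-length by 4 matrix is the kind of input a convolutional or recurrent layer typically consumes.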

10.
NAR Genom Bioinform ; 3(4): lqab086, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34617013

ABSTRACT

Type III secretion systems (T3SSs) are bacterial membrane-embedded nanomachines that allow a number of human, plant and animal pathogens to inject virulence factors directly into the cytoplasm of eukaryotic cells. Export of effectors through T3SSs is critical for motility and virulence of most Gram-negative pathogens. Current computational methods can predict type III secreted effectors (T3SEs) from amino acid sequences, but due to algorithmic constraints, reliable and large-scale prediction of T3SEs in Gram-negative bacteria remains a challenge. Here, we present DeepT3 2.0 (http://advintbioinforlab.com/deept3/), a novel web server that integrates different deep learning models for genome-wide prediction of T3SEs from a bacterium of interest. DeepT3 2.0 combines various deep learning architectures including convolutional, recurrent, convolutional-recurrent and multilayer neural networks to learn N-terminal representations of proteins specifically for T3SE prediction. Outcomes from the different models are processed and integrated for discriminating T3SEs and non-T3SEs. Because it leverages diverse models and an integrative deep learning framework, DeepT3 2.0 outperforms existing methods in validation datasets. In addition, the features learned from networks are analyzed and visualized to explain how models make their predictions. We propose DeepT3 2.0 as an integrated and accurate tool for the discovery of T3SEs.

11.
Hum Mutat ; 42(6): 667-684, 2021 06.
Article in English | MEDLINE | ID: mdl-33822436

ABSTRACT

One of the greatest challenges in human genetics is deciphering the link between functional variants in noncoding sequences and the pathophysiology of complex diseases. To address this issue, many methods have been developed to sort functional single-nucleotide variants (SNVs) from neutral SNVs in noncoding regions. In this study, we integrated well-established features and commonly used datasets and merged them into large-scale datasets based on a random forest model, which yielded promising performance and outperformed some cutting-edge approaches. Our analyses of feature importance and data coverage also provide certain clues for future research in enhancing the prediction of functional noncoding SNVs.


Subjects
Algorithms; Computational Biology/methods; Disease/genetics; RNA, Untranslated/genetics; Computer Simulation; Databases, Genetic; Datasets as Topic; Genetic Predisposition to Disease/genetics; Genetic Testing/methods; Humans; Polymorphism, Single Nucleotide; Reproducibility of Results; Sensitivity and Specificity; Software Design
12.
Front Microbiol ; 12: 605782, 2021.
Article in English | MEDLINE | ID: mdl-33552038

ABSTRACT

Gram-negative bacteria can deliver secreted proteins (also known as secreted effectors) directly into host cells through type III secretion system (T3SS), type IV secretion system (T4SS), and type VI secretion system (T6SS) and cause various diseases. These secreted effectors are heavily involved in the interactions between bacteria and host cells, so their identification is crucial for the discovery and development of novel anti-bacterial drugs. It is currently challenging to accurately distinguish type III secreted effectors (T3SEs) and type IV secreted effectors (T4SEs) because neither T3SEs nor T4SEs contain N-terminal signal peptides, and some of these effectors have similar evolutionarily conserved profiles and sequence motifs. To address this challenge, we develop a deep learning (DL) approach called DeepT3_4 to correctly classify T3SEs and T4SEs. We generate an amino-acid character dictionary and sequence-based features extracted from effector proteins and subsequently implement these features into a hybrid model that integrates recurrent neural networks (RNNs) and deep neural networks (DNNs). After training the model, the hybrid neural network classifies secreted effectors into two different classes with an accuracy, F-value, and recall of over 80.0%. Our approach represents the first DL approach for the classification of T3SEs and T4SEs, providing a promising supplementary tool for further secretome studies.

13.
Mol Ther Nucleic Acids ; 22: 862-870, 2020 Dec 04.
Article in English | MEDLINE | ID: mdl-33230481

ABSTRACT

Cancer is one of the most dangerous diseases to human health. The accurate prediction of anticancer peptides (ACPs) would be valuable for the development and design of novel anticancer agents. Current deep neural network models have obtained state-of-the-art prediction accuracy for the ACP classification task. However, based on existing studies, it remains unclear which deep learning architecture achieves the best performance. Thus, in this study, we first present a systematic exploration of three important deep learning architectures: convolutional, recurrent, and convolutional-recurrent networks for distinguishing ACPs from non-ACPs. We find that the recurrent neural network with bidirectional long short-term memory cells is superior to other architectures. By utilizing the proposed model, we implement a sequence-based deep learning tool (DeepACP) to accurately predict the likelihood of a peptide exhibiting anticancer activity. The results indicate that DeepACP outperforms several existing methods and can be used as an effective tool for the prediction of anticancer peptides. Furthermore, we visualize and understand the deep learning model. We hope that our strategy can be extended to identify other types of peptides and may provide more assistance to the development of proteomics and new drugs.

14.
Biomed Res Int ; 2020: 1475368, 2020.
Article in English | MEDLINE | ID: mdl-32908867

ABSTRACT

In clinical cancer research, how to accurately stratify patients based on genomic data is a hot topic. With the development of next-generation sequencing technology, more and more types of genomic features, such as mRNA expression level, can be used to distinguish cancer patients. Previous studies commonly stratified patients by using a single type of genomic features, which can only reflect one aspect of the cancer. In fact, multiscale genomic features will provide more information and may be helpful for clinical prediction. In addition, most of the conventional machine learning algorithms use a handcrafted gene set as features to construct models, which is generally selected by a statistical method with an arbitrary cut-off, e.g., p value < 0.05. The genes in the gene set are not necessarily related to the cancer and will make the model unreliable. Therefore, in our study, we thoroughly investigated the performance of different machine learning methods on stratifying breast cancer patients with a single type of genomic features. Then, we proposed a strategy, which can take into account the degree of correlation between genes and cancer patients, to identify the features from mRNAs and microRNAs, and evaluated the performance of the models with the new combined features of the multiscale genomic features. The results showed that, compared with the models constructed with a single type of features, the models with the multiscale genomic features generated by our proposed method achieved better performance on stratifying the ER status of breast cancer patients. Moreover, we found that the identified multiscale genomic features were closely related to the cancer by gene set enrichment analysis, indicating that our proposed strategy can well reflect the biological relevance of the genes to breast cancer. In conclusion, modelling with multiscale genomic features closely related to the cancer not only can guarantee the prediction performance of the models but also can effectively provide candidate genes for interpreting the mechanisms of cancer.


Subjects
Breast Neoplasms/genetics; Models, Genetic; Algorithms; Carcinoma, Renal Cell/genetics; Databases, Genetic; Female; Gene Expression Regulation, Neoplastic; Gene Ontology; Genomics/methods; Humans; Kidney Neoplasms/genetics; Machine Learning; MicroRNAs/genetics; RNA, Messenger/genetics; Receptors, Estrogen/genetics; Receptors, Estrogen/metabolism; Thyroid Neoplasms/genetics
15.
J Chem Inf Model ; 60(8): 3755-3764, 2020 08 24.
Article in English | MEDLINE | ID: mdl-32786512

ABSTRACT

Deep learning has proven to be a powerful method with applications in various fields including image, language, and biomedical data. Thanks to the libraries and toolkits such as TensorFlow, PyTorch, and Keras, researchers can use different deep learning architectures and data sets for rapid modeling. However, the available implementations of neural networks using these toolkits are usually designed for a specific research and are difficult to transfer to other work. Here, we present autoBioSeqpy, a tool that uses deep learning for biological sequence classification. The advantage of this tool is its simplicity. Users only need to prepare the input data set and then use a command line interface. Then, autoBioSeqpy automatically executes a series of customizable steps including text reading, parameter initialization, sequence encoding, model loading, training, and evaluation. In addition, the tool provides various ready-to-apply and adapt model templates to improve the usability of these networks. We introduce the application of autoBioSeqpy on three biological sequence problems: the prediction of type III secreted proteins, protein subcellular localization, and CRISPR/Cas9 sgRNA activity. autoBioSeqpy is freely available with examples at https://github.com/jingry/autoBioSeqpy.


Subjects
Deep Learning; Neural Networks, Computer; Protein Transport
16.
Front Pharmacol ; 10: 1489, 2019.
Article in English | MEDLINE | ID: mdl-31992983

ABSTRACT

Toxicogenomics (TGx) is a powerful method to evaluate toxicity and is widely used in both in vivo and in vitro assays. For in vivo TGx, reduction, refinement, and replacement represent the unremitting pursuit of live-animal tests, but in vitro assays, as alternatives, usually demonstrate poor correlation with real in vivo assays. In living subjects, in addition to drug effects, inner-environmental reactions also affect genetic variation, and these two factors are further jointly reflected in gene abundance. Thus, finding a strategy to factorize inner-environmental factor from in vivo assays based on gene expression levels and to further utilize in vitro data to better simulate in vivo data is needed. We proposed a strategy based on post-modified non-negative matrix factorization, which can estimate the gene expression profiles and contents of major factors in samples. The applicability of the strategy was first verified, and the strategy was then utilized to simulate in vivo data by correcting in vitro data. The similarities between real in vivo data and simulated data (single-dose 0.72, repeat-doses 0.75) were higher than those observed when directly comparing real in vivo data with in vitro data (single-dose 0.56, repeat-doses 0.70). Moreover, by keeping environment-related factor, a simulation can always be generated by using in vitro data to provide potential substitutions for in vivo TGx and to reduce the launch of live-animal tests.

17.
Int J Genomics ; 2018: 8124950, 2018.
Article in English | MEDLINE | ID: mdl-29546047

ABSTRACT

In genetic data modeling, the use of a limited number of samples for modeling and predicting, especially well below the attribute number, is difficult due to the enormous number of genes detected by a sequencing platform. In addition, many studies commonly use machine learning methods to evaluate genetic datasets to identify potential disease-related genes and drug targets, but to the best of our knowledge, the information associated with the selected gene set was not thoroughly elucidated in previous studies. To identify a relatively stable scheme for modeling limited samples in the gene datasets and reveal the information that they contain, the present study first evaluated the performance of a series of modeling approaches for predicting clinical endpoints of cancer and later integrated the results using various voting protocols. As a result, we proposed a relatively stable scheme that used a set of methods with an ensemble algorithm. Our findings indicated that the ensemble methodologies are more reliable than single machine learning algorithms for predicting cancer prognoses as well as for evaluating gene function. The ensemble methodologies provide a more complete coverage of relevant genes, which can facilitate the exploration of cancer mechanisms and the identification of potential drug targets.
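
Editorial note: the abstract mentions integrating model results through voting protocols without saying which ones were used. Simple majority voting is one common protocol and is sketched below as an illustration. The function name, the tie-breaking rule (smallest label wins) and the example predictions are assumptions.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one list by majority vote.

    predictions holds one list of labels per model, all of equal length.
    Ties go to the smallest label, an arbitrary choice made for this sketch.
    """
    voted = []
    for sample_preds in zip(*predictions):
        counts = Counter(sample_preds)
        top = max(counts.values())
        voted.append(min(label for label, c in counts.items() if c == top))
    return voted


# Hypothetical predictions from three classifiers on three samples.
print(majority_vote([[1, 0, 0], [1, 0, 1], [0, 1, 0]]))
```

Weighted voting, where each model's vote is scaled by its validation performance, is a frequent alternative.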

18.
Sci Rep ; 7: 43709, 2017 03 06.
Article in English | MEDLINE | ID: mdl-28262806

ABSTRACT

Genome-wide association studies (GWAS) have identified more than sixty single nucleotide polymorphisms (SNPs) associated with increased risk for type 2 diabetes (T2D). However, the identification of causal risk SNPs for T2D pathogenesis was complicated by the fact that each risk SNP is a surrogate for hundreds of SNPs, most of which reside in non-coding regions. Here we provide a comprehensive annotation of 65 known T2D related SNPs and inspect putative functional SNPs probably causing protein dysfunction, response element disruptions of known transcription factors related to T2D genes and regulatory response element disruption of four histone marks in pancreas and pancreas islet. Among the newly identified risk SNPs, some were reported as T2D related SNPs in recent studies. Further, we found that accumulation of modest effects of single sites markedly enhanced the risk prediction based on 1989 T2D samples and 3000 healthy controls. The AROC value increased from 0.58 to 0.62 by only using genotype score when putative risk SNPs were added. Besides, the net reclassification improvement is 10.03% on the addition of new risk SNPs. Taken together, functional annotation could provide a list of prioritized potential risk SNPs for the further estimation on the T2D susceptibility of individuals.


Subjects
Diabetes Mellitus, Type 2/genetics; Genetic Predisposition to Disease; Genome-Wide Association Study; Polymorphism, Single Nucleotide; Computational Biology/methods; Diabetes Mellitus, Type 2/metabolism; Epigenesis, Genetic; Exons; Genomics/methods; Histones/metabolism; Humans; Linkage Disequilibrium; Molecular Sequence Annotation; Odds Ratio; Promoter Regions, Genetic; ROC Curve; Regulatory Sequences, Nucleic Acid; Risk Assessment; Transcription Factors/metabolism
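
Editorial note: the abstract for this entry evaluates a genotype score by the area under the ROC curve. The sketch below assumes an unweighted score (the sum of risk-allele counts per individual) and computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, counting ties as one half. Whether the study weighted its score is not stated in the abstract, and the genotypes shown are invented.

```python
def genotype_score(risk_allele_counts):
    """Unweighted genotype score: total risk alleles (0, 1 or 2 per SNP)."""
    return sum(risk_allele_counts)


def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Each case-control pair contributes 1 if the case scores higher,
    0.5 on a tie and 0 otherwise.
    """
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Invented risk-allele counts at three SNPs per individual.
cases = [[2, 1, 1], [1, 1, 0], [2, 2, 1]]
controls = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]
case_scores = [genotype_score(g) for g in cases]
control_scores = [genotype_score(g) for g in controls]
print(round(auc(case_scores, control_scores), 4))
```

The pairwise loop is quadratic in the sample size; for cohorts of thousands of individuals a rank-based computation would be the usual choice.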
19.
Sci Rep ; 6: 28720, 2016 06 28.
Article in English | MEDLINE | ID: mdl-27349736

ABSTRACT

The interactions among the genes within a disease are helpful for better understanding the hierarchical structure of its complex biological system. Most current methodologies need information on known interactions between genes or proteins to create the network connections. However, these methods are limited in clinical cancer research because different cancers not only share common interactions among genes but also have specific interactions that distinguish them from each other. Moreover, it is still difficult to decide the boundaries of the sub-networks. Therefore, we proposed a strategy to construct a gene network by using the sparse inverse covariance matrix of gene expression data, and divide it into a series of functional modules by an adaptive partition algorithm. The strategy was validated by using the microarray data of three cancers and the RNA-sequencing data of glioblastoma. The different modules in the network exhibited specific functions in cancer progression. Moreover, based on the gene expression profiles in the modules, the risk of death was well predicted in the clustering analysis and the binary classification, indicating that our strategy can be beneficial for investigating cancer mechanisms and promoting the clinical applications of network-based methodologies in cancer research.


Subjects
Databases, Nucleic Acid; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Gene Regulatory Networks; Glioblastoma; Neuroblastoma; Glioblastoma/genetics; Glioblastoma/metabolism; Humans; Neuroblastoma/genetics; Neuroblastoma/metabolism
20.
Biomark Med ; 9(11): 1067-78, 2015.
Article in English | MEDLINE | ID: mdl-26501374

ABSTRACT

AIMS: Although RNA-sequencing has been widely used to identify the differentially expressed genes (DEGs) as biomarkers to guide the therapeutic treatment, it is necessary to investigate the concordance of DEGs identified by microarray and RNA-sequencing for the clinical prognosis. MATERIAL & METHODS: By using The Cancer Genome Atlas data sets, we thoroughly investigated the concordance of DEGs identified from microarray and RNA-sequencing data and their molecular functions. RESULTS: The DEGs identified by both technologies averaged ~98.6% overlap. The cancer-related gene sets were significantly enriched with the DEGs and consistent between the two technologies. CONCLUSIONS: The high consistency of DEGs in their regulation directionality and molecular functions indicated the good reproducibility between microarray and RNA-sequencing in identifying potential oncogenes for clinical prognosis.


Subjects
Biomarkers, Tumor/genetics; Oligonucleotide Array Sequence Analysis; Oncogenes/genetics; Sequence Analysis, RNA; Humans; Neoplasms/diagnosis; Neoplasms/genetics; Prognosis
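
Editorial note: the abstract for this entry reports the overlap and regulation directionality of DEGs across two platforms but does not define either measure. The sketch below uses one plausible definition, stated here as an assumption: overlap is the number of shared DEGs divided by the size of the smaller DEG list, and directional concordance is the fraction of shared DEGs with the same sign of log fold change. The gene values are invented.

```python
def deg_concordance(microarray, rnaseq):
    """Return (overlap, directional concordance) for two DEG tables.

    Each argument maps gene symbol to log fold change. Both definitions
    used here are assumptions of this sketch, not the paper's.
    """
    shared = set(microarray) & set(rnaseq)
    if not shared:
        return 0.0, 0.0
    overlap = len(shared) / min(len(microarray), len(rnaseq))
    same_direction = sum(
        1 for gene in shared if (microarray[gene] > 0) == (rnaseq[gene] > 0)
    )
    return overlap, same_direction / len(shared)


# Invented log fold changes, for illustration only.
microarray = {"TP53": -1.2, "MYC": 2.0, "EGFR": 1.5, "BRCA1": -0.8}
rnaseq = {"TP53": -0.9, "MYC": 1.7, "EGFR": -0.4}
overlap, direction = deg_concordance(microarray, rnaseq)
print(f"overlap={overlap:.2f} same-direction={direction:.2f}")
```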