Results 1 - 20 of 85
1.
Nucleic Acids Res ; 50(D1): D1208-D1215, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34792145

ABSTRACT

DNA methylation has a growing potential for use as a biomarker because of its involvement in disease. DNA methylation data have also substantially grown in volume during the past 5 years. To facilitate access to these fragmented data, we developed DiseaseMeth version 3.0 based on DiseaseMeth version 2.0, in which the number of diseases included increased from 88 to 162 and the number of high-throughput profile samples increased from 32 701 to 49 949. Experimentally confirmed associations grew by 448 pairs obtained by manual literature mining from 1472 papers in PubMed. The search, analyze and tools sections were updated to increase performance. In particular, FunctionSearch now provides functional enrichment of genes from localized GO and KEGG annotation. We have also developed a unified analysis pipeline for identifying differentially DNA methylated genes (DMGs) from the original data stored in the database. In total, 22 718 DMGs were found in 99 diseases. These DMGs can be applied in disease evaluation using two self-developed online tools, Methylation Disease Correlation and Cancer Prognosis & Co-Methylation. All query results can be downloaded and can also be displayed through a box plot, heatmap or network module according to whichever search section is used. DiseaseMeth version 3.0 is freely available at http://diseasemeth.edbc.org/.


Subject(s)
DNA Methylation/genetics , Databases, Factual , Gene Expression Profiling/classification , Genetic Diseases, Inborn/classification , Biomarkers, Tumor/genetics , Genetic Diseases, Inborn/genetics , Humans , Neoplasms/classification , Neoplasms/genetics , PubMed
2.
Nucleic Acids Res ; 50(D1): D1164-D1171, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34634794

ABSTRACT

Drug response in many diseases varies dramatically because of complex genomic and functional features and contexts. Cellular diversity of human tissues, especially tumors, is one of the major factors contributing to differences in drug response across samples. With the accumulation of single-cell RNA sequencing (scRNA-seq) data, it is now possible to study the drug response to different treatments at single-cell resolution. Here, we present CeDR Atlas (available at https://ngdc.cncb.ac.cn/cedr), a knowledgebase reporting computational inference of cellular drug response for hundreds of cell types from various tissues. We took advantage of the high-throughput profiling of drug-induced gene expression available through the Connectivity Map resource (CMap) as well as hundreds of scRNA-seq data covering cells from a wide variety of organs/tissues, diseases, and conditions. Currently, CeDR maintains the results for more than 582 single cell data objects for human, mouse and cell lines, including about 140 phenotypes and 1250 tissue-cell combination types. All the results can be explored and searched by keywords for drugs, cell types, tissues, diseases, and signature genes. Overall, CeDR fine maps drug response at cellular resolution and sheds light on the design of combinatorial treatments, drug resistance and even drug side effects.


Subject(s)
Biomarkers, Pharmacological , Databases, Factual , Neoplasms/drug therapy , Software , Animals , Gene Expression Profiling/classification , Humans , Knowledge Bases , Mice , Neoplasms/classification , RNA-Seq/classification , Single-Cell Analysis/classification , Exome Sequencing/classification
3.
Nucleic Acids Res ; 49(17): e99, 2021 09 27.
Article in English | MEDLINE | ID: mdl-34214174

ABSTRACT

Though transcriptomics technologies have evolved rapidly in the past decades, integrative analysis of mixed data between microarray and RNA-seq remains challenging due to the inherent variability difference between them. Here, Rank-In was proposed to correct the nonbiological effects across the two technologies, enabling freely blended data for consolidated analysis. Rank-In was rigorously validated via the public cell and tissue samples tested by both technologies. On the two reference samples of the SEQC project, Rank-In not only perfectly classified the 44 profiles but also achieved the best accuracy of 0.9 on predicting TaqMan-validated DEGs. More importantly, on 327 Glioblastoma (GBM) profiles and 248, 523 heterogeneous colon cancer profiles respectively, only Rank-In could successfully discriminate every single cancer profile from normal controls, while the other methods could not. Furthermore, on different sizes of mixed seq-array GBM profiles, Rank-In robustly reproduced a median DEG overlap ranging from 0.74 to 0.83 among top genes, whereas the other methods never exceeded 0.72. Being the first effective method enabling cross-technology analysis of mixed data, Rank-In accommodates hybrids of array and seq profiles for integrative study on large/small, paired/unpaired and balanced/imbalanced samples, opening the possibility of reducing the sampling space of clinical cancer patients. Rank-In can be accessed at http://www.badd-cao.net/rank-in/index.html.


Subject(s)
Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis/methods , RNA-Seq/methods , Cluster Analysis , Colonic Neoplasms/diagnosis , Colonic Neoplasms/genetics , Diagnosis, Differential , Gene Expression Profiling/classification , Glioblastoma/diagnosis , Glioblastoma/genetics , Humans , Internet , Neoplasms/diagnosis , Reproducibility of Results , Sensitivity and Specificity
4.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-34020547

ABSTRACT

Cancer is a highly heterogeneous disease caused by dysregulation in different cell types and tissues. However, different cancers may share common mechanisms. It is critical to identify decisive genes involved in the development and progression of cancer, and joint analysis of multiple cancers may help to discover overlapping mechanisms among different cancers. In this study, we proposed a fusion feature selection framework based on an ensemble method, named Fisher score and Gradient Boosting Decision Tree (FS-GBDT), to select robust and decisive feature genes in high-dimensional gene expression datasets. Joint analysis of 11 human cancer types was conducted to explore the key feature gene subset of cancer. To verify the efficacy of FS-GBDT, we compared it with four other common feature selection algorithms using a Support Vector Machine (SVM) classifier. FS-GBDT achieved the highest values on the evaluation indicators, outperforming the other four methods. In addition, we performed gene ontology analysis and literature validation of the key gene subset, and this subset was classified into several functional modules. Functional modules can be used as markers of disease in place of single genes, which are difficult to find reproducibly in gene chip applications, and to study the core mechanisms of cancer.
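
The Fisher score part of such a framework can be illustrated with a minimal sketch. This is not the authors' FS-GBDT implementation: the function names, the toy expression values and the labels are hypothetical, and the gradient boosting stage is omitted. The score used here is the common definition, between-class scatter divided by within-class scatter, computed per gene.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Between-class scatter divided by within-class scatter for one gene."""
    overall = mean(values)
    between = 0.0
    within = 0.0
    for cls in sorted(set(labels)):
        group = [v for v, lab in zip(values, labels) if lab == cls]
        between += len(group) * (mean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    # A gene that is constant within every class separates the classes perfectly.
    return between / within if within > 0 else float("inf")


def rank_genes(expression, labels):
    """Return gene names sorted from most to least discriminative."""
    scores = {gene: fisher_score(vals, labels) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: six samples, two genes.
expression = {
    "gene_a": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],  # differs strongly between classes
    "gene_b": [2.0, 3.0, 2.5, 2.1, 2.9, 2.6],  # nearly identical between classes
}
labels = ["normal"] * 3 + ["tumor"] * 3
ranking = rank_genes(expression, labels)
print(ranking)
```

In a two-stage scheme of this kind, the top-ranked genes from the filter step would then be passed to a second selector or classifier.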


Subject(s)
Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Neoplasms/genetics , Support Vector Machine , Cluster Analysis , Decision Trees , Gene Expression Profiling/classification , Gene Ontology , Humans , Neoplasms/pathology , Reproducibility of Results
5.
Cancer Med ; 10(11): 3782-3793, 2021 06.
Article in English | MEDLINE | ID: mdl-33987975

ABSTRACT

Relapsed acute lymphoblastic leukaemia (ALL) remains a prevalent paediatric cancer and one of the most common causes of mortality from malignancy in children. Tailoring the intensity of therapy according to early stratification is a promising strategy but remains a major challenge due to heterogeneity and subtyping difficulty. In this study, we subgroup B-precursor ALL patients by gene expression profiles, using non-negative matrix factorization and minimum description length which unsupervisedly determines the number of subgroups. Within each of the four subgroups, logistic and Cox regression with elastic net regularization are used to build models predicting minimal residual disease (MRD) and relapse-free survival (RFS) respectively. Measured by area under the receiver operating characteristic curve (AUC), subgrouping improves prediction of MRD in one subgroup which mostly overlaps with subtype TCF3-PBX1 (AUC = 0·986 in the training set and 1·0 in the test set), compared to a global model published previously. The models predicting RFS displayed acceptable concordance in training set and discriminate high-relapse-risk patients in three subgroups of the test set (Wilcoxon test p = 0·048, 0·036, and 0·016). Genes playing roles in the models are specific to different subgroups. The improvement of subgrouped MRD prediction and the differences of genes in prediction models of subgroups suggest that the heterogeneity of B-precursor ALL can be handled by subgrouping according to gene expression profiles to improve the prediction accuracy.


Subject(s)
Gene Expression Profiling , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Adolescent , Child , Child, Preschool , Disease-Free Survival , Female , Gene Expression Profiling/classification , Humans , Infant , Logistic Models , Male , Neoplasm, Residual , Precursor Cell Lymphoblastic Leukemia-Lymphoma/classification , Proportional Hazards Models , ROC Curve , Recurrence , Young Adult
6.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33876181

ABSTRACT

Gene expression profiling has played a significant role in the identification and classification of tumor molecules. In gene expression data, only a few feature genes are closely related to tumors. It is a challenging task to select highly discriminative feature genes, and existing methods fail to deal with this problem efficiently. This article proposes a novel metaheuristic approach for gene feature extraction, called variable neighborhood learning Harris Hawks optimizer (VNLHHO). First, the F-score is used for a primary selection of the genes in gene expression data to narrow down the selection range of the feature genes. Subsequently, a variable neighborhood learning strategy is constructed to balance the global exploration and local exploitation of the Harris Hawks optimization. Finally, mutation operations are employed to increase the diversity of the population, so as to prevent the algorithm from falling into a local optimum. In addition, a novel activation function is used to convert the continuous solution of the VNLHHO into binary values, and a naive Bayesian classifier is utilized as a fitness function to select feature genes that can help classify biological tissues of binary and multi-class cancers. An experiment is conducted on gene expression profile data of eight types of tumors. The results show that the classification accuracy of the VNLHHO is greater than 96.128% for tumors in the colon, nervous system and lungs and 100% for the rest. We compare seven other algorithms and demonstrate the superiority of the VNLHHO in terms of the classification accuracy, fitness value and AUC value in feature selection for gene expression data.


Subject(s)
Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , Machine Learning , Neoplasms/genetics , Animals , Cluster Analysis , Databases, Factual/statistics & numerical data , Gene Expression Profiling/classification , Gene Expression Regulation, Neoplastic , Humans , Internet , Models, Genetic , Mutation , Neoplasms/classification , Reproducibility of Results
7.
Biol Reprod ; 97(3): 353-364, 2017 Sep 01.
Article in English | MEDLINE | ID: mdl-29025079

ABSTRACT

Early mammalian embryonic transcriptomes are dynamic throughout the process of preimplantation development. Cataloging of primate transcriptomics during early development has been accomplished in humans, but global characterization of transcripts is lacking in the rhesus macaque: a key model for human reproductive processes. We report here the systematic classification of individual macaque transcriptomes using RNA-Seq technology from the germinal vesicle stage oocyte through the blastocyst stage embryo. Major differences in gene expression were found between sequential stages, with the 4- to 8-cell stages showing the highest level of differential gene expression. Analysis of putative transcription factor binding sites also revealed a striking increase in key regulatory factors in 8-cell embryos, indicating a strong likelihood of embryonic genome activation occurring at this stage. Furthermore, clustering analyses of gene co-expression throughout this period resulted in distinct groups of transcripts significantly associated to the different embryo stages assayed. The sequence data provided here along with characterizations of major regulatory transcript groups present a comprehensive atlas of polyadenylated transcripts that serves as a useful resource for comparative studies of preimplantation development in humans and other species.


Subject(s)
Blastocyst/physiology , Gene Expression Profiling/classification , Gene Expression Profiling/methods , Oocytes/physiology , Transcriptome/genetics , Transcriptome/physiology , Animals , Binding Sites , Chromosome Mapping , Cluster Analysis , DNA, Complementary/genetics , Embryonic Development/genetics , Female , Gene Expression Regulation, Developmental/genetics , Macaca mulatta , Pregnancy , RNA/genetics , Transcription Factors/metabolism
8.
Diagn Cytopathol ; 44(11): 867-873, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27534929

ABSTRACT

BACKGROUND: The gene expression classifier (GEC; Afirma-Veracyte) has proven to be an effective triage modality in the management of thyroid nodules. We evaluate our institutional experience with GEC, specifically examining performance as a first line testing strategy versus in conjunction with repeat fine needle aspiration (FNA), usage trends based on clinical setting, and performance related to diagnostic categories of The Bethesda System for Reporting Thyroid Cytology (TBSRTC). METHODS: All nodules undergoing GEC analysis from 1/2011 to 12/2015 at the Hospital of the University of Pennsylvania were identified using electronic database search methods. Corresponding cytologic diagnoses, GEC results, origin of the sample (in-house vs. satellite site), number and diagnosis of prior FNA's, and clinical and histologic follow-up were collected. RESULTS: The cohort included 294 nodules. Of these, 145 (49%) were classified as benign, 136 (46%) as suspicious, and 13 (5%) as quantity insufficient by GEC. Surgical resection was performed in 130 (130/294-44%) cases (107, 82% "suspicious" by GEC); final histopathologic diagnosis was benign in 85 (65%) and malignant in 45 (35%) cases. Three false negative diagnoses were identified in the setting of GEC analysis as a first line testing strategy. Most cases with GEC as a first line testing strategy came from satellite clinical sites (112, 66%). CONCLUSIONS: The GEC showed improved performance characteristics when coupled with a repeat FNA. It continues to be of low specificity and positive predictive value in oncocytic follicular lesions. Diagn. Cytopathol. 2016;44:867-873. © 2016 Wiley Periodicals, Inc.


Subject(s)
Endoscopic Ultrasound-Guided Fine Needle Aspiration/standards , Gene Expression Profiling/standards , Molecular Diagnostic Techniques/standards , Thyroid Nodule/pathology , Biomarkers/metabolism , Endoscopic Ultrasound-Guided Fine Needle Aspiration/statistics & numerical data , Gene Expression Profiling/classification , Gene Expression Profiling/statistics & numerical data , Hospitals, University/statistics & numerical data , Humans , Molecular Diagnostic Techniques/classification , Molecular Diagnostic Techniques/statistics & numerical data , Reproducibility of Results , Sensitivity and Specificity , Thyroid Nodule/metabolism
9.
J Comput Biol ; 23(7): 603-14, 2016 07.
Article in English | MEDLINE | ID: mdl-27104372

ABSTRACT

Similarity (or conversely distance) measures are at the heart of most bioinformatic applications. When the similarity involves only a small subset of features out of many, global similarity measures may be significantly affected by noise. Selecting only a subset of (putatively relevant) features for comparison is a widespread solution to the problem, albeit one affected by arbitrariness and manual intervention. The problem is becoming more and more important due to the increasing amount of experimental data available. In recent years measures based on ranking similarities between two datasets have been proposed. Here, we use one of the proposed rank similarity measures, sharing some aspects with the fraction enrichment score used for protein structure prediction and the gene set enrichment analysis, and test its performance in classifying experiments. The discrimination ability of the similarity measures based on the overlap of ranked genes tested here compares well or better with standard measures of similarity. This conclusion supports the use of rank-based proximity measures to gain further insight in dataset comparisons, particularly on expression data obtained by different technologies (e.g., RNA-seq and microarrays).
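
The general idea of a similarity based on the overlap of ranked genes can be sketched as follows. This is a generic top-k overlap, not the specific measure evaluated in the article; the function name, the gene identifiers and the values are hypothetical. Because only ranks are compared, the two inputs may be on entirely different scales, as with microarray intensities and RNA-seq counts.

```python
def top_k_overlap(scores_a, scores_b, k):
    """Fraction of genes shared by the k highest-ranked genes of two profiles."""
    top_a = set(sorted(scores_a, key=scores_a.get, reverse=True)[:k])
    top_b = set(sorted(scores_b, key=scores_b.get, reverse=True)[:k])
    return len(top_a & top_b) / k


# Hypothetical profiles on different scales (array intensities vs. read counts).
microarray = {"g1": 9.1, "g2": 8.7, "g3": 2.0, "g4": 1.5}
rnaseq = {"g1": 1500, "g2": 30, "g3": 900, "g4": 5}

overlap = top_k_overlap(microarray, rnaseq, k=2)
print(overlap)  # top-2 sets are {g1, g2} and {g1, g3}, so the overlap is 0.5
```

The choice of k is itself a parameter; scanning several values of k is one way to avoid fixing an arbitrary cutoff.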


Subject(s)
Gene Expression Profiling/classification , Oligonucleotide Array Sequence Analysis/classification , Proteins/genetics , Algorithms , Computational Biology/methods
10.
Am J Ophthalmol ; 162: 20-27.e1, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26596399

ABSTRACT

PURPOSE: To determine whether any conventional clinical prognostic factors for metastasis from uveal melanoma retain prognostic significance in multivariate models incorporating gene expression profile (GEP) class of the tumor cells. DESIGN: Prospective, interventional case series with a prognostic model. METHODS: Single-institution study of GEP testing and other conventional prognostic factors for metastasis and metastatic death in 299 patients with posterior uveal melanoma evaluated by fine-needle aspiration biopsy (FNAB) at the time of or shortly prior to initial treatment. Univariate prognostic significance of all evaluated potential prognostic variables (patient age, largest linear basal diameter of tumor [LBD], tumor thickness, intraocular location of tumor, melanoma cytomorphologic subtype, and GEP class) was performed by comparison of Kaplan-Meier event rate curves and univariate Cox proportional hazards modeling. Multivariate prognostic significance of combinations of significant prognostic factors identified by univariate analysis was performed using step-up and step-down Cox proportional hazards modeling. RESULTS: GEP class was the strongest prognostic factor for metastatic death in this series. However, tumor LBD, tumor thickness, and intraocular tumor location also proved to be significant individual prognostic factors in this study. On multivariate analysis, a 2-term model that incorporated GEP class and largest basal diameter was associated with strong independent significance of each of the factors. CONCLUSION: Although GEP test is the most robust prognostic indicator in uveal melanoma and early studies of mostly larger tumors found that no clinicopathologic factors had significant prognostic value independent of GEP, our single-center study, which included a substantial proportion of smaller tumors, showed that both GEP and LBD of the tumor are independent prognostic factors for metastasis and metastatic death in multivariate analysis.


Subject(s)
Melanoma/diagnosis , Melanoma/genetics , Transcriptome/genetics , Uveal Neoplasms/diagnosis , Uveal Neoplasms/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Biopsy, Fine-Needle , Female , Gene Expression Profiling/classification , Genes, Neoplasm , Humans , Male , Melanoma/classification , Melanoma/mortality , Middle Aged , Neoplasm Proteins/genetics , Prognosis , Proportional Hazards Models , Prospective Studies , Survival Rate , Uveal Neoplasms/classification , Uveal Neoplasms/mortality
11.
PLoS One ; 10(11): e0141874, 2015.
Article in English | MEDLINE | ID: mdl-26562156

ABSTRACT

One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that is informed by the recovery outcome.
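
The structure of the random effects model described above can be illustrated by simulating from it. This is only a sketch of the data-generating side under stated assumptions (standard normal random effects, normal errors, an outcome with no additional noise); it does not reproduce the authors' Markov chain Monte Carlo fitting procedure, and all names and parameter values are hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def simulate(n_obs, cluster_sizes, betas, error_sd=0.3, seed=0):
    """Each gene = its cluster's random effect + independent error.
    The outcome is a linear combination of the cluster random effects."""
    rng = random.Random(seed)
    genes, outcomes = [], []
    for _ in range(n_obs):
        effects = [rng.gauss(0.0, 1.0) for _ in cluster_sizes]
        row = []
        for effect, size in zip(effects, cluster_sizes):
            row.extend(effect + rng.gauss(0.0, error_sd) for _ in range(size))
        genes.append(row)
        outcomes.append(sum(b * u for b, u in zip(betas, effects)))
    return genes, outcomes


def column(matrix, j):
    return [row[j] for row in matrix]


# Two clusters of three genes; only the first cluster drives the outcome.
genes, outcomes = simulate(n_obs=300, cluster_sizes=[3, 3], betas=[1.0, 0.0])
same_cluster = pearson(column(genes, 0), column(genes, 1))
different_cluster = pearson(column(genes, 0), column(genes, 3))
outcome_link = pearson(column(genes, 0), outcomes)
print(same_cluster, different_cluster, outcome_link)
```

Genes sharing a random effect come out strongly correlated with each other, and genes in the cluster with a nonzero coefficient are also correlated with the outcome, which is the joint structure the clustering is meant to recover.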


Subject(s)
Algorithms , Gene Expression Profiling/statistics & numerical data , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Wounds and Injuries/genetics , Cluster Analysis , Computer Simulation , Gene Expression Profiling/classification , Gene Expression Profiling/methods , Humans , Markov Chains , Models, Genetic , Models, Statistical , Monte Carlo Method , Oligonucleotide Array Sequence Analysis/classification , Oligonucleotide Array Sequence Analysis/methods , Outcome Assessment, Health Care/methods , Outcome Assessment, Health Care/statistics & numerical data
12.
Plant Physiol ; 169(4): 2684-99, 2015 12.
Article in English | MEDLINE | ID: mdl-26438786

ABSTRACT

A plethora of diverse programmed cell death (PCD) processes has been described in living organisms. In animals and plants, different forms of PCD play crucial roles in development, immunity, and responses to the environment. While the molecular control of some animal PCD forms such as apoptosis is known in great detail, we still know comparatively little about the regulation of the diverse types of plant PCD. In part, this deficiency in molecular understanding is caused by the lack of reliable reporters to detect PCD processes. Here, we addressed this issue by using a combination of bioinformatics approaches to identify commonly regulated genes during diverse plant PCD processes in Arabidopsis (Arabidopsis thaliana). Our results indicate that the transcriptional signatures of developmentally controlled cell death are largely distinct from the ones associated with environmentally induced cell death. Moreover, different cases of developmental PCD share a set of cell death-associated genes. Most of these genes are evolutionarily conserved within the green plant lineage, arguing for an evolutionarily conserved core machinery of developmental PCD. Based on this information, we established an array of specific promoter-reporter lines for developmental PCD in Arabidopsis. These PCD indicators represent a powerful resource that can be used in addition to established morphological and biochemical methods to detect and analyze PCD processes in vivo and in planta.


Subject(s)
Apoptosis/genetics , Arabidopsis Proteins/genetics , Arabidopsis/genetics , Gene Expression Profiling/methods , Arabidopsis/growth & development , Arabidopsis Proteins/classification , Computational Biology/methods , Gene Expression Profiling/classification , Gene Expression Regulation, Developmental/drug effects , Gene Expression Regulation, Developmental/radiation effects , Gene Expression Regulation, Plant/drug effects , Gene Expression Regulation, Plant/radiation effects , Hydrogen Peroxide/pharmacology , Microscopy, Confocal , Oligonucleotide Array Sequence Analysis/methods , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Oxidants/pharmacology , Plants, Genetically Modified , Reverse Transcriptase Polymerase Chain Reaction , Sodium Chloride/pharmacology , Transcriptome/drug effects , Transcriptome/radiation effects , Ultraviolet Rays
13.
OMICS ; 19(8): 471-7, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26230532

ABSTRACT

High-throughput assays from genomics, proteomics, metabolomics, and next generation sequencing produce massive omics datasets that are challenging to analyze in biological or clinical contexts. Thus far, there is no publicly available program for converting quantitative omics data into input formats to be used in off-the-shelf robust phylogenetic programs. To the best of our knowledge, this is the first report on creation of two Windows-based programs, OmicsTract and SynpExtractor, to address this gap. We note, as a way of introduction and development of these programs, that one particularly useful bioinformatics inferential modeling is the phylogenetic cladogram. Cladograms are multidimensional tools that show the relatedness between subgroups of healthy and diseased individuals and the latter's shared aberrations; they also reveal some characteristics of a disease that would not otherwise be apparent by other analytical methods. OmicsTract and SynpExtractor were written for the respective tasks of (1) accommodating advanced phylogenetic parsimony analysis (through standard programs of MIX [from PHYLIP] and TNT), and (2) extracting shared aberrations at the cladogram nodes. OmicsTract converts comma-delimited data tables by assigning each data point a binary value ("0" for normal states and "1" for abnormal states), then outputs the converted data tables in the proper input file formats for MIX or with embedded commands for TNT. SynpExtractor uses outfiles from MIX and TNT to extract the shared aberrations of each node of the cladogram, matching them with identifying labels from the dataset and exporting them into a comma-delimited file. Labels may be gene identifiers in gene-expression datasets or m/z values in mass spectrometry datasets. By automating these steps, OmicsTract and SynpExtractor offer a veritable opportunity for rapid and standardized phylogenetic analyses of omics data; their model can also be extended to next generation sequencing (NGS) data. We make OmicsTract and SynpExtractor publicly and freely available for non-commercial use in order to strengthen and build capacity for the phylogenetic paradigm of omics analysis.
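
The binary conversion step can be sketched in a few lines. The abstract does not state how a value is judged abnormal, so this sketch assumes one plausible rule: a value is coded "1" when it falls outside the range observed in the normal specimens. It is not OmicsTract's code, it does not write MIX or TNT input files, and the function name, column names and values are hypothetical.

```python
import csv
import io


def binarize_table(csv_text, normal_columns):
    """Code each value 0 (inside the normal specimens' range) or 1 (outside it)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Positions of the normal specimens among the value columns (label column skipped).
    normal_idx = [header.index(name) - 1 for name in normal_columns]
    coded = {}
    for row in reader:
        label, values = row[0], [float(v) for v in row[1:]]
        normal_values = [values[i] for i in normal_idx]
        low, high = min(normal_values), max(normal_values)
        coded[label] = [0 if low <= v <= high else 1 for v in values]
    return header[1:], coded


# Hypothetical comma-delimited table: two normal (N) and two tumor (T) specimens.
table = """gene,N1,N2,T1,T2
geneA,1.0,1.4,3.2,1.2
geneB,5.0,4.6,4.8,2.1
"""
specimens, binary = binarize_table(table, ["N1", "N2"])
print(specimens)
print(binary)
```

Each specimen then becomes a string of 0/1 character states, which is the kind of input a parsimony program expects.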


Subject(s)
Gene Expression Profiling/classification , Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/genetics , Software , Algorithms , Datasets as Topic , Genomics/methods , High-Throughput Nucleotide Sequencing , Humans , Information Dissemination , Information Storage and Retrieval , Male , Metabolomics/methods , Prostate/metabolism , Prostate/pathology , Prostatic Neoplasms/pathology
14.
Ren Fail ; 37(7): 1219-24, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26156684

ABSTRACT

OBJECTIVE: We attempt to explore the pathogenesis of diabetic nephropathy (DN) and the specific genes with aberrant expression in it. METHODS: The gene expression profile GSE1009 was downloaded from the Gene Expression Omnibus database, including 3 normal function glomeruli and DN glomeruli from cadaveric donor kidneys. The differentially expressed genes (DEGs) were analyzed and the aberrant gene-related functions were predicted by informatics methods. The protein-protein interaction (PPI) networks for DEGs were constructed and the functional sub-network was screened. RESULTS: A total of 416 DEGs were found in DN samples compared with normal controls, including 404 up-regulated genes and 12 down-regulated genes. DEGs were involved in the process of combination to saccharides and the decline of tissue repairing ability of the organisms. The genes VEGFA, ACTG1 and HSP90AA1 had high degree in the PPI network. The main biological process of genes in the sub-network was related to cell proliferation and signal transmission by cell membrane receptors. CONCLUSION: Significant nodes in the PPI network provide new insights into the mechanism of DN. VEGFA, ACTG1 and HSP90AA1 may be potential targets in DN treatment.
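
"Degree" in a PPI network is simply the number of distinct interaction partners of a node, and high-degree nodes are the candidate hubs. A minimal sketch of that calculation follows; the edge list uses placeholder node names and is entirely hypothetical, not data from GSE1009 or from any interaction database.

```python
from collections import Counter


def node_degrees(edges):
    """Count distinct partners per node in an undirected network."""
    # frozenset makes (a, b) and (b, a) the same edge; self-loops are ignored.
    unique = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for edge in unique:
        for node in edge:
            degree[node] += 1
    return degree


# Hypothetical toy network with placeholder node names.
edges = [
    ("P1", "P2"),
    ("P1", "P3"),
    ("P1", "P4"),
    ("P4", "P2"),
    ("P5", "P3"),
    ("P2", "P1"),  # duplicate of the first edge in reverse order
]
degrees = node_degrees(edges)
hub, hub_degree = degrees.most_common(1)[0]
print(hub, hub_degree)
```

Ranking nodes by this count is the simplest way to shortlist hubs; other centrality measures can give a different ordering.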


Subject(s)
Computational Biology , Diabetic Nephropathies/genetics , Gene Expression Profiling/classification , Oligonucleotide Array Sequence Analysis/methods , Databases, Factual , Down-Regulation , Humans , Linear Models , Up-Regulation
15.
Comput Biol Med ; 64: 292-8, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25712072

ABSTRACT

Micro-array data are typically characterized by high dimensional features with a small number of samples. Several problems in identifying genes causing diseases from micro-array data can be transformed into the problem of classifying the features extracted from gene expression in micro-array data. However, too many features can cause low prediction accuracy as well as high computational complexity. Dimensional reduction is a method to eliminate irrelevant features to improve the prediction accuracy. Typically, the eigenvalues or dimensional data variance from principal component analysis are used as criteria to select relevant features. This approach is simple but not efficient since it does not concern the degree of data overlap in each dimension in the feature space. A new method to select relevant features based on degree of dimensional data overlap with proper feature selection was introduced. Furthermore, our study concentrated on small sized data sets which usually occur in reality. The experimental results signified that this new approach can achieve substantially higher prediction accuracy when compared with other methods.


Subject(s)
Computational Biology/methods , Gene Expression Profiling/classification , Gene Expression Profiling/methods , Algorithms , Humans , Neoplasms/genetics , Neoplasms/metabolism , Oligonucleotide Array Sequence Analysis , Principal Component Analysis , ROC Curve , Support Vector Machine
16.
Am J Ophthalmol ; 159(2): 248-56, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25448994

ABSTRACT

PURPOSE: To determine the frequency of discordant gene expression profile (GEP) classification of posterior uveal melanomas sampled at 2 tumor sites by fine-needle aspiration biopsy (FNAB). DESIGN: Prospective single-institution longitudinal study performed in conjunction with a multicenter validation study of the prognostic value of GEP class of posterior uveal melanoma cells for metastasis and metastatic death. METHODS: FNAB aspirates of 80 clinically diagnosed primary choroidal and ciliochoroidal melanomas were obtained from 2 tumor sites prior to or at the time of initial ocular tumor treatment and submitted for independent GEP testing and classification. Frequency of discordant GEP classification of these specimens was determined. RESULTS: Using the support vector machine learning algorithm favored by the developer of the GEP test employed in this study, 9 of the 80 cases (11.3% [95% confidence interval: 9.0%-13.6%]) were clearly discordant. If cases with a failed classification at 1 site or a low confidence class assignment by the support vector machine algorithm at 1 or both sites are also regarded as discordant, then this frequency rises to 13 of the 80 cases (16.3% [95% confidence interval: 13.0%-19.6%]). CONCLUSION: Sampling of a clinically diagnosed posterior uveal melanoma at a single site for prognostic GEP testing is associated with a substantial probability of misclassification. Two-site sampling of such tumors with independent GEP testing of each specimen may be advisable to lessen the probability of underestimating an individual patient's prognostic risk of metastasis and metastatic death.


Subject(s)
Choroid Neoplasms/classification , Gene Expression Profiling/classification , Gene Frequency , Melanoma/classification , Neoplasm Proteins/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Algorithms , Biopsy, Fine-Needle , Brachytherapy , Choroid Neoplasms/genetics , Choroid Neoplasms/mortality , Choroid Neoplasms/pathology , Female , Humans , Male , Melanoma/genetics , Melanoma/mortality , Melanoma/secondary , Middle Aged , Polymerase Chain Reaction , Prognosis , Prospective Studies , Transcriptome
17.
Otolaryngol Clin North Am ; 47(4): 573-93, 2014 Aug.
Article in English | MEDLINE | ID: mdl-25041959

ABSTRACT

Thyroid fine-needle aspiration biopsies are cytologically indeterminate in 15% to 30% of cases. When cytologically indeterminate thyroid nodules undergo diagnostic surgery, approximately three-quarters prove to be histologically benign. The Afirma Gene Expression Classifier (GEC) achieves a negative predictive value of at least 94% for indeterminate nodules. Most Afirma GEC benign nodules can be clinically observed, as suggested by the National Comprehensive Cancer Network Thyroid Carcinoma Guideline. More than half of the benign nodules with indeterminate cytology (Bethesda categories III/IV) can be identified as GEC benign and removed from the surgical pool to prevent unnecessary diagnostic surgery.


Subject(s)
Gene Expression Profiling/methods , Thyroid Nodule/diagnosis , Thyroid Nodule/genetics , Biopsy, Fine-Needle , Cytodiagnosis/methods , DNA Mutational Analysis , Gene Expression Profiling/classification , Gene Expression Regulation, Neoplastic , Humans , Immunohistochemistry , Sensitivity and Specificity , Thyroid Gland/pathology , Thyroid Nodule/pathology , Thyroidectomy/economics
18.
Proc Natl Acad Sci U S A ; 111(23): E2423-30, 2014 Jun 10.
Article in English | MEDLINE | ID: mdl-24912181

ABSTRACT

To modulate the expression of genes involved in nitrogen assimilation, the cyanobacterial PII-interacting protein X (PipX) interacts with the global transcriptional regulator NtcA and the signal transduction protein PII, a protein found in all three domains of life as an integrator of signals of the nitrogen and carbon balance. PipX can form alternate complexes with NtcA and PII, and these interactions are stimulated and inhibited, respectively, by 2-oxoglutarate, providing a mechanistic link between PII signaling and NtcA-regulated gene expression. Here, we demonstrate that PipX is involved in a much wider interaction network. The effect of pipX alleles on transcript levels was studied by RNA sequencing of S. elongatus strains grown in the presence of either nitrate or ammonium, followed by multivariate analyses of relevant mutant/control comparisons. As a result of this process, 222 genes were classified into six coherent groups of differentially regulated genes, two of which, containing either NtcA-activated or NtcA-repressed genes, provided further insights into the function of NtcA-PipX complexes. The remaining four groups suggest the involvement of PipX in at least three NtcA-independent regulatory pathways. Our results pave the way to uncover new regulatory interactions and mechanisms in the control of gene expression in cyanobacteria.


Subject(s)
Bacterial Proteins/genetics , DNA-Binding Proteins/genetics , Gene Expression Regulation, Bacterial , Synechococcus/genetics , Transcription Factors/genetics , Ammonium Compounds/metabolism , Ammonium Compounds/pharmacology , Bacterial Proteins/metabolism , Base Sequence , DNA-Binding Proteins/metabolism , Gene Expression Profiling/classification , Ketoglutaric Acids/pharmacology , Models, Genetic , Molecular Sequence Data , Multivariate Analysis , Mutation , Nitrates/metabolism , Nitrates/pharmacology , Nitrogen/metabolism , Nitrogen/pharmacology , Nucleotide Motifs/genetics , PII Nitrogen Regulatory Proteins/genetics , PII Nitrogen Regulatory Proteins/metabolism , Promoter Regions, Genetic/genetics , Protein Binding/drug effects , Sequence Homology, Nucleic Acid , Synechococcus/metabolism , Transcription Factors/metabolism , Transcription Initiation Site
19.
ScientificWorldJournal ; 2014: 593503, 2014.
Article in English | MEDLINE | ID: mdl-24790574

ABSTRACT

Relative Expression Analysis (RXA) uses ordering relationships within a small collection of genes and has been applied successfully to classification using microarray data. As checking all possible subsets of genes is computationally infeasible, RXA algorithms require feature selection and multiple restrictive assumptions. Our main contribution is a specialized evolutionary algorithm (EA) for top-scoring pairs, called EvoTSP, which allows finding more advanced gene relations. We managed to unify the major variants of relative expression algorithms through the EA and to introduce weights to the top-scoring pairs. Experimental validation of EvoTSP on publicly available microarray datasets showed that the proposed solution significantly outperforms other relative expression algorithms in terms of accuracy and allows exploring a much larger solution space.
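To make the idea of a top-scoring pair concrete, the following is a minimal sketch of the classic exhaustive top-scoring-pair classifier that relative expression methods build on. It is not EvoTSP: there is no evolutionary search and no pair weighting. The function names `fit_tsp` and `predict_tsp` and the toy data are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def fit_tsp(samples, labels):
    """Return (i, j, vote_if_less) for the single best-scoring gene pair.

    samples: list of expression vectors (one list of floats per sample)
    labels:  list of 0/1 class labels, same length as samples
    Assumes both classes are present in the training data.
    """
    n_genes = len(samples[0])
    by_class = {0: [], 1: []}
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    best = None
    for i, j in combinations(range(n_genes), 2):
        # Fraction of samples in each class where gene i is below gene j.
        p0 = sum(x[i] < x[j] for x in by_class[0]) / len(by_class[0])
        p1 = sum(x[i] < x[j] for x in by_class[1]) / len(by_class[1])
        score = abs(p0 - p1)
        if best is None or score > best[0]:
            # Predict the class in which the ordering x[i] < x[j] is more common.
            best = (score, i, j, 0 if p0 > p1 else 1)
    _, i, j, vote_if_less = best
    return i, j, vote_if_less


def predict_tsp(model, x):
    """Classify one expression vector using only the ordering of the chosen pair."""
    i, j, vote_if_less = model
    return vote_if_less if x[i] < x[j] else 1 - vote_if_less


# Toy data (invented): gene 0 is below gene 1 in class 0 and above it in class 1.
toy_samples = [[1, 5, 0], [2, 6, 9], [7, 3, 4], [8, 2, 1]]
toy_labels = [0, 0, 1, 1]
model = fit_tsp(toy_samples, toy_labels)
```

Because the exhaustive loop visits every pair, its cost grows quadratically with the number of genes, and larger gene subsets grow faster still. This illustrates why the abstract describes checking all subsets as computationally infeasible.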


Subject(s)
Algorithms , Computational Biology/methods , Evolution, Molecular , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Gene Expression Profiling/classification , Gene Expression Profiling/statistics & numerical data , Genetic Fitness , Genetic Variation , Mutation , Oligonucleotide Array Sequence Analysis/classification , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Recombination, Genetic , Selection, Genetic
20.
BMC Bioinformatics ; 14: 350, 2013 Dec 03.
Article in English | MEDLINE | ID: mdl-24299119

ABSTRACT

BACKGROUND: Drosophila melanogaster has been established as a model organism for investigating developmental gene interactions. The spatio-temporal gene expression patterns of Drosophila melanogaster can be visualized by in situ hybridization and documented as digital images. Automated and efficient tools for analyzing these expression images will provide biological insights into gene functions, interactions, and networks. To facilitate pattern recognition and comparison, many web-based resources have been created to conduct comparative analysis based on body part keywords and the associated images. With the fast accumulation of images from high-throughput techniques, manual inspection of images will seriously impede the pace of biological discovery. It is thus imperative to design an automated system for efficient image annotation and comparison. RESULTS: We present a computational framework to perform anatomical keyword annotation for Drosophila gene expression images. The spatial sparse coding approach is used to represent local patches of images in comparison with the well-known bag-of-words (BoW) method. Three pooling functions, namely max pooling, average pooling and Sqrt (square root of mean squared statistics) pooling, are employed to transform the sparse codes to image features. Based on the constructed features, we develop both an image-level scheme and a group-level scheme to tackle the key challenges in annotating Drosophila gene expression pattern images automatically. To deal with the imbalanced data distribution inherent in image annotation tasks, the undersampling method is applied together with majority vote. Results on Drosophila embryonic expression pattern images verify the efficacy of our approach. CONCLUSION: In our experiment, the three pooling functions perform comparably well in feature dimension reduction. The undersampling with majority vote is shown to be effective in tackling the problem of imbalanced data. Moreover, combining sparse coding and the image-level scheme leads to consistent performance improvement in keyword annotation.
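The three pooling functions named in the abstract can be illustrated with a short sketch. It assumes each image is represented by a list of per-patch sparse code vectors and that pooling is applied independently to each code dimension. The function name `pool`, the `how` parameter, and the example values are hypothetical. The abstract does not say whether absolute values are taken before pooling, so the sketch works on the raw values.

```python
import math


def pool(codes, how):
    """Collapse per-patch code vectors into one image-level feature vector.

    codes: list of equal-length lists of floats, one per image patch
    how:   "max", "average", or "sqrt" (square root of the mean of squares)
    """
    columns = list(zip(*codes))  # one tuple of patch values per code dimension
    if how == "max":
        return [max(col) for col in columns]
    if how == "average":
        return [sum(col) / len(col) for col in columns]
    if how == "sqrt":
        return [math.sqrt(sum(v * v for v in col) / len(col)) for col in columns]
    raise ValueError(f"unknown pooling function: {how}")


# Invented sparse codes for four patches, two code dimensions each.
patch_codes = [[0.0, 3.0], [4.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
max_features = pool(patch_codes, "max")
avg_features = pool(patch_codes, "average")
sqrt_features = pool(patch_codes, "sqrt")
```

Whatever the number of patches, each pooling function returns a vector whose length equals the code dimension, which is what allows images with different patch counts to share one feature space.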


Subject(s)
Drosophila melanogaster/cytology , Drosophila melanogaster/genetics , Gene Expression Regulation, Developmental , Genome, Insect/genetics , Models, Genetic , Molecular Sequence Annotation/methods , Animals , Cell Differentiation/genetics , Cell Division/genetics , Computational Biology/classification , Computational Biology/methods , Drosophila melanogaster/embryology , Gene Expression Profiling/classification , Gene Expression Profiling/methods , High-Throughput Screening Assays , Molecular Sequence Annotation/classification , Predictive Value of Tests , Support Vector Machine