1.
F1000Res ; 6, 2017.
Article in English | MEDLINE | ID: mdl-29123641

ABSTRACT

The availability of high-throughput molecular profiling techniques has provided more accurate and informative data for regular clinical studies. Nevertheless, complex computational workflows are required to interpret these data. Over the past years, the data volume has been growing explosively, requiring robust human data management to organise and integrate the data efficiently. For this reason, we set up an ELIXIR implementation study, together with the Translational research IT (TraIT) programme, to design a data ecosystem that is able to link raw and interpreted data. In this project, the data from the TraIT Cell Line Use Case (TraIT-CLUC) are used as a test case for this system. Within this ecosystem, we use the European Genome-phenome Archive (EGA) to store raw molecular profiling data; tranSMART to collect interpreted molecular profiling data and clinical data for corresponding samples; and Galaxy to store, run and manage the computational workflows. We can integrate these data by linking their repositories systematically. To showcase our design, we have structured the TraIT-CLUC data, which contain a variety of molecular profiling data types, for storage in both tranSMART and EGA. The metadata provided allows referencing between tranSMART and EGA, fulfilling the cycle of data submission and discovery; we have also designed a data flow from EGA to Galaxy, enabling reanalysis of the raw data in Galaxy. In this way, users can select patient cohorts in tranSMART, trace them back to the raw data and perform (re)analysis in Galaxy. Our conclusion is that the majority of metadata does not necessarily need to be stored (redundantly) in both databases, but that instead FAIR persistent identifiers should be available for well-defined data ontology levels: study, data access committee, physical sample, data sample and raw data file. This approach will pave the way for the stable linkage and reuse of data.
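
The abstract's central recommendation is to give each of five well-defined ontology levels (study, data access committee, physical sample, data sample, raw data file) a persistent identifier, instead of storing most metadata redundantly in both tranSMART and EGA. Below is a minimal Python sketch of that idea, assuming a simple in-memory model. The class and function names (`PID`, `DataSample`, `raw_files_for_cohort`) and all identifier strings are hypothetical and do not correspond to any tranSMART, EGA or Galaxy API.

```python
from dataclasses import dataclass, field

# The five ontology levels named in the abstract.
LEVELS = ("study", "data_access_committee", "physical_sample",
          "data_sample", "raw_data_file")


@dataclass(frozen=True)
class PID:
    """A persistent identifier tagged with the ontology level it names."""
    level: str
    value: str

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown ontology level: {self.level}")


@dataclass
class DataSample:
    """A data sample derived from one physical sample, with its raw files."""
    pid: PID
    physical_sample: PID
    raw_files: list[PID] = field(default_factory=list)


def raw_files_for_cohort(cohort: set[PID], samples: list[DataSample]) -> list[PID]:
    """Follow physical-sample identifiers (e.g. a cohort selected in tranSMART)
    down to raw-data-file identifiers (e.g. files archived in EGA)."""
    return [f for s in samples if s.physical_sample in cohort for f in s.raw_files]


# Hypothetical identifiers, for illustration only.
ps1 = PID("physical_sample", "PS-0001")
ps2 = PID("physical_sample", "PS-0002")
samples = [
    DataSample(PID("data_sample", "DS-0001"), ps1,
               [PID("raw_data_file", "RAW-0001"), PID("raw_data_file", "RAW-0002")]),
    DataSample(PID("data_sample", "DS-0002"), ps2,
               [PID("raw_data_file", "RAW-0003")]),
]
print([f.value for f in raw_files_for_cohort({ps1}, samples)])
```

Because only identifiers are shared, each repository keeps its own metadata, and the link from a selected cohort to its raw files is a lookup rather than a copy.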

2.
Front Genet ; 4: 289, 2013.
Article in English | MEDLINE | ID: mdl-24391662

ABSTRACT

Integrating gene expression data with secondary data such as pathway or protein-protein interaction (PPI) data has been proposed as a promising approach for improved outcome prediction of cancer patients. Methods employing this approach usually aggregate the expression of genes into new composite features, with the secondary data guiding this aggregation. Previous studies were limited to a few data sets with a small number of patients. Moreover, each study used different data and evaluation procedures, which makes it difficult to objectively assess the gain in classification performance. Here we introduce the Amsterdam Classification Evaluation Suite (ACES). ACES is a Python package to objectively evaluate classification and feature-selection methods, and it contains methods for pooling and normalizing Affymetrix microarrays from different studies. It is simple to use and therefore facilitates the comparison of new approaches with best-in-class approaches. In addition to the methods described in our earlier study (Staiger et al., 2012), we have included two prominent prognostic gene signatures specific to breast cancer outcome, one additional composite-feature selection method and two network-based gene ranking methods. Using the evaluation pipeline, we show that current composite-feature classification methods do not outperform simple single-gene classifiers in predicting outcome in breast cancer. Furthermore, we find that the stability of features across different data sets is not higher for composite features either. Most strikingly, we observe that prediction performance is not affected when features are extracted from randomized PPI networks.
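
ACES is described as containing methods for pooling and normalizing Affymetrix microarrays from different studies, but the abstract does not say which normalization it applies. As a stand-in, the Python sketch below shows one common way to pool studies: standardize each gene within each study separately (removing study-specific offsets and scales), then concatenate the samples. The function names `standardize_study` and `pool_studies` are hypothetical and are not part of the ACES API.

```python
from statistics import mean, pstdev


def standardize_study(matrix):
    """Z-score every gene (row) within one study.

    matrix: list of rows, one per gene, each a list of per-sample values.
    Genes with zero variance in this study are mapped to all zeros.
    """
    out = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        out.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return out


def pool_studies(studies):
    """Standardize each study separately, then concatenate samples gene by gene.

    All studies must list the same genes in the same row order.
    """
    n_genes = len(studies[0])
    if any(len(s) != n_genes for s in studies):
        raise ValueError("studies must share the same gene rows")
    normalized = [standardize_study(s) for s in studies]
    return [sum((s[g] for s in normalized), []) for g in range(n_genes)]


# Two toy studies measuring the same two genes; study B is offset by +100 for gene 0.
study_a = [[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]]
study_b = [[101.0, 102.0, 103.0], [7.0, 9.0, 11.0]]
print(pool_studies([study_a, study_b]))
```

Standardizing per study before pooling keeps a constant shift between laboratories or array batches from masquerading as a biological difference between patients.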

3.
PLoS One ; 7(4): e34796, 2012.
Article in English | MEDLINE | ID: mdl-22558100

ABSTRACT

Recently, several classifiers that combine primary tumor data, such as gene expression data, with secondary data sources, such as protein-protein interaction networks, have been proposed for predicting outcome in breast cancer. In these approaches, new composite features are typically constructed by aggregating the expression levels of several genes, and the secondary data sources are used to guide this aggregation. Although many studies claim that these approaches improve classification performance over single-gene classifiers, the gain in performance is difficult to assess, mainly because different breast cancer data sets and validation procedures are used to assess performance. Here we address these issues by employing a large cohort of six breast cancer data sets as a benchmark set and by performing an unbiased evaluation of the classification accuracies of the different approaches. Contrary to previous claims, we find that composite-feature classifiers do not outperform simple single-gene classifiers. We investigate the effect of (1) the number of selected features; (2) the specific gene set from which features are selected; (3) the size of the training set; and (4) the heterogeneity of the data set on the performance of composite-feature and single-gene classifiers. Strikingly, we find that randomizing the secondary data sources, which destroys all biological information in them, does not degrade the performance of composite-feature classifiers. Finally, we show that when a proper correction for gene set size is performed, the stability of single-gene sets is similar to that of composite-feature sets. Based on these results, there is currently no reason to prefer prognostic classifiers based on composite features over single-gene classifiers for predicting outcome in breast cancer.
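
To make "aggregating the expression levels of several genes, guided by a secondary data source" and "randomizing the secondary data source" concrete, here is a minimal Python sketch. It assumes one specific aggregation (the mean expression of a gene and its direct network neighbors) and one specific randomization (shuffling gene labels over the network's nodes, which keeps the topology but destroys which genes interact). Both choices, and the names `composite_features` and `permute_node_labels`, are illustrative assumptions, not necessarily the procedures evaluated in the study.

```python
import random


def composite_features(expression, network):
    """One composite feature per seed gene: the mean expression of the gene
    and its direct neighbors in the network.

    expression: dict gene -> list of per-patient values (same length for all genes)
    network: dict gene -> set of neighboring genes (undirected, symmetric)
    """
    features = {}
    for seed, neighbors in network.items():
        members = sorted(g for g in {seed, *neighbors} if g in expression)
        if not members:
            continue
        n_patients = len(expression[members[0]])
        features[seed] = [
            sum(expression[g][p] for g in members) / len(members)
            for p in range(n_patients)
        ]
    return features


def permute_node_labels(network, seed=None):
    """Randomize a network by shuffling gene labels over its nodes.

    The degree sequence and edge count are preserved; the biological
    pairing of genes is destroyed.
    """
    rng = random.Random(seed)
    nodes = sorted(network)
    shuffled = nodes[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(nodes, shuffled))
    return {relabel[g]: {relabel[h] for h in nbrs} for g, nbrs in network.items()}


expr = {"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 6.0]}
net = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(composite_features(expr, net))
print(composite_features(expr, permute_node_labels(net, seed=1)))
```

Comparing classifiers trained on features from the real network with those trained on features from a label-permuted network is one way to test whether the network's biological content, rather than the averaging itself, drives performance.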


Subjects
Biomarkers, Tumor/genetics , Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Classification/methods , Computational Biology/methods , Gene Expression Regulation, Neoplastic/genetics , Databases, Genetic/classification , Female , Gene Expression Profiling/methods , Humans , Predictive Value of Tests , Prognosis

4.
Mol Biol Evol ; 26(8): 1773-80, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19387010

ABSTRACT

Most mitochondrial proteins are synthesized in the cytosol of eukaryotic cells as precursor proteins carrying N-terminal extensions, called transit peptides or presequences, that mediate their specific transport into mitochondria. However, plant cells possess a second potential target organelle for such transit peptides: the chloroplast. It can therefore be assumed that mitochondrial transit peptides in plants face an increased demand for specificity, which in turn leads to fewer degrees of freedom in these transit peptides than in those of nonplant organisms. Our study investigates this hypothesis using fractal dimension. Statistical analysis of sequence data shows that the fractal dimension of mitochondrial transit peptides in plants is indeed significantly lower than that of mitochondrial transit peptides from nonplant organisms.
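
The abstract does not say how a fractal dimension was computed for a transit peptide. Purely as an illustration, the Python sketch below encodes a peptide as its Kyte-Doolittle hydropathy profile and estimates the Higuchi fractal dimension of that numeric series. Both the encoding and the estimator are assumptions swapped in for exposition and need not match the study's method; the names `higuchi_fd` and `sequence_fd` and the `k_max` parameter are hypothetical. A lower value indicates a smoother, less irregular profile.

```python
import math

# Kyte-Doolittle hydropathy scale (one value per standard amino acid).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def higuchi_fd(x, k_max=5):
    """Higuchi fractal dimension of a numeric series x.

    For each lag k, the normalized curve length L(k) is averaged over the
    k possible start offsets; the dimension is the least-squares slope of
    log L(k) against log(1/k).
    """
    n = len(x)
    if k_max < 2 or n - 1 < k_max:
        raise ValueError("need k_max >= 2 and len(x) > k_max")
    log_inv_k, log_len = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):  # 0-based start offset
            steps = (n - 1 - m) // k
            if steps == 0:
                continue
            total = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                        for i in range(1, steps + 1))
            lengths.append(total * (n - 1) / (steps * k) / k)
        mean_len = sum(lengths) / len(lengths)
        if mean_len == 0:
            raise ValueError(f"series is constant at lag {k}")
        log_inv_k.append(math.log(1.0 / k))
        log_len.append(math.log(mean_len))
    mx = sum(log_inv_k) / len(log_inv_k)
    my = sum(log_len) / len(log_len)
    num = sum((a - mx) * (b - my) for a, b in zip(log_inv_k, log_len))
    den = sum((a - mx) ** 2 for a in log_inv_k)
    return num / den


def sequence_fd(peptide, k_max=5):
    """Map a peptide to its hydropathy profile and return its Higuchi FD."""
    return higuchi_fd([KYTE_DOOLITTLE[aa] for aa in peptide.upper()], k_max)


# An arbitrary example sequence; a straight line gives exactly 1, white noise about 2.
print(sequence_fd("MLSLRQSIRFFKPATRTLCSSRYLL"))
print(higuchi_fd(list(range(100))))
```

Comparing the distribution of such values between plant and nonplant transit peptides, for example with a rank-based test, would mirror the kind of statistical comparison the abstract reports.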


Subjects
Arabidopsis Proteins/genetics , Mice , Mitochondria/genetics , Mitochondrial Proteins/genetics , Peptides/genetics , Saccharomyces cerevisiae/metabolism , Algorithms , Animals , Arabidopsis Proteins/metabolism , Chloroplasts/genetics , Fractals , Mitochondria/metabolism , Mitochondrial Proteins/metabolism , Peptides/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism