Results 1 - 12 of 12
1.
Genome Biol ; 9 Suppl 2: S3, 2008.
Article in English | MEDLINE | ID: mdl-18834494

ABSTRACT

BACKGROUND: The goal of the gene normalization task is to link genes or gene products mentioned in the literature to biological databases. This is a key step in an accurate search of the biological literature. It is a challenging task, even for the human expert; genes are often described rather than referred to by gene symbol and, confusingly, one gene name may refer to different genes (often from different organisms). For BioCreative II, the task was to list the Entrez Gene identifiers for human genes or gene products mentioned in PubMed/MEDLINE abstracts. We selected abstracts associated with articles previously curated for human genes. We provided 281 expert-annotated abstracts containing 684 gene identifiers for training, and a blind test set of 262 documents containing 785 identifiers, with a gold standard created by expert annotators. Inter-annotator agreement was measured at over 90%. RESULTS: Twenty groups submitted one to three runs each, for a total of 54 runs. Three systems achieved F-measures (balanced precision and recall) between 0.80 and 0.81. Combining the system outputs using simple voting schemes and classifiers obtained improved results; the best composite system achieved an F-measure of 0.92 with 10-fold cross-validation. A 'maximum recall' system based on the pooled responses of all participants gave a recall of 0.97 (with precision 0.23), identifying 763 out of 785 identifiers. CONCLUSION: Major advances for the BioCreative II gene normalization task include broader participation (20 versus 8 teams) and a pooled system performance comparable to human experts, at over 90% agreement. These results show promise as tools to link the literature with biological databases.
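The balanced F-measure quoted in this abstract is the harmonic mean of precision and recall. The short Python sketch below recomputes the recall of the pooled 'maximum recall' system from the reported counts (763 of 785) and derives the F-measure implied by the reported precision of 0.23. The function and variable names are illustrative assumptions, not part of the BioCreative II evaluation software.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract for the pooled 'maximum recall' system.
pooled_recall = 763 / 785      # identifiers found / identifiers in the gold standard
pooled_precision = 0.23        # as reported
pooled_f = f_measure(pooled_precision, pooled_recall)

print(f"recall={pooled_recall:.2f}  F={pooled_f:.2f}")
```

With these inputs the recall rounds to 0.97, matching the abstract, while the low precision pulls the balanced F-measure down to roughly 0.37, which illustrates why the pooled system is described in terms of recall rather than F-measure.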


Subjects
Computational Biology/methods , Genes , Societies, Scientific , Abstracting and Indexing , Animals , Databases, Genetic , Humans , MEDLINE , PubMed , Reproducibility of Results
2.
Bioinform Biol Insights ; 2: 291-305, 2008 May 28.
Article in English | MEDLINE | ID: mdl-19812783

ABSTRACT

INTRODUCTION: Numerous methods exist for basic processing, e.g. normalization, of microarray gene expression data. These methods have an important effect on the final analysis outcome. Therefore, it is crucial to select methods appropriate for a given dataset in order to assure the validity and reliability of expression data analysis. Furthermore, biological interpretation requires expression values for genes, which are often represented by several spots or probe sets on a microarray. How best to integrate spot/probe set values into gene values has so far been a somewhat neglected problem. RESULTS: We present a case study comparing different between-array normalization methods with respect to the identification of differentially expressed genes. Our results show that it is feasible and necessary to use prior knowledge on gene expression measurements to select an adequate normalization method for the given data. Furthermore, we provide evidence that combining spot/probe set p-values into gene p-values for detecting differentially expressed genes has advantages compared to combining expression values for spots/probe sets into gene expression values. The comparison of different methods suggests using Stouffer's method for this purpose. The study has been conducted on gene expression experiments investigating human joint cartilage samples of osteoarthritis related groups: a cDNA microarray (83 samples, four groups) and an Affymetrix (26 samples, two groups) data set. CONCLUSION: The apparently straightforward steps of gene expression data analysis, e.g. between-array normalization and detection of differentially regulated genes, can be accomplished by numerous different methods. We analyzed multiple methods and their possible effects and thereby demonstrate the importance of the individual decisions taken during data processing. We give guidelines for evaluating normalization outcomes. An overview of these effects via appropriate measures and plots compared to prior knowledge is essential for the biological interpretation of gene expression measurements.
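Stouffer's method, which the abstract recommends for combining spot/probe set p-values into a gene p-value, converts each p-value to a standard normal z-score, sums the z-scores, and rescales by the square root of their number. The sketch below is a minimal illustration that assumes one-sided p-values strictly between 0 and 1 and equal weights; the abstract does not state which variant or weighting the authors used, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_combine(p_values):
    """Combine one-sided p-values with Stouffer's unweighted Z method.

    Assumes every p-value lies strictly between 0 and 1.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    std_normal = NormalDist()
    # Convert each p-value to a z-score; a small p gives a large positive z.
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in p_values]
    # Under the null hypothesis the rescaled sum is again standard normal.
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    return 1.0 - std_normal.cdf(combined_z)


# Two probe sets for the same gene, each borderline on its own.
gene_p = stouffer_combine([0.05, 0.05])
print(gene_p)
```

Two probe sets that each reach p = 0.05 combine to a gene-level p-value of about 0.01, which shows how agreement between probe sets strengthens the evidence for a gene.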

3.
Am J Pathol ; 171(3): 938-46, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17640966

ABSTRACT

Interleukin (IL)-1 is one of the most important catabolic cytokines in rheumatoid arthritis. In this study, we were interested in whether we could identify IL-1 expression and activity within normal and osteoarthritic cartilage. mRNA expression of IL-1beta and of one of its major target genes, IL-6, was observed at very low levels in normal cartilage, whereas only a minor up-regulation of these cytokines was noted in osteoarthritic cartilage, suggesting that IL-1 signaling is not a major event in osteoarthritis. However, immunolocalization of central mediators involved in IL-1 signaling pathways [38-kd protein kinases, phospho (P)-38-kd protein kinases, extracellular signal-regulated kinase 1/2, P-extracellular signal-regulated kinase 1/2, c-Jun NH(2)-terminal kinase 1/2, P-c-Jun NH(2)-terminal kinase 1/2, and nuclear factor kappaB] showed that the four IL-1 signaling cascades are functional in normal and osteoarthritic articular chondrocytes. In vivo, we found that IL-1 expression and signaling mechanisms were detectable in the upper zones of normal cartilage, whereas these observations were more pronounced in the upper portions of osteoarthritic cartilage. Given these expression and distribution patterns, our data support two roles for IL-1 in the pathophysiology of articular cartilage. First, chondrocytes in the upper zone of osteoarthritic articular cartilage seem to activate catabolic signaling pathways that may be in response to diffusion of external IL-1 from the synovial fluid. Second, IL-1 seems to be involved in normal cartilage tissue homeostasis as shown by identification of baseline expression patterns and signaling cascade activation.


Subjects
Cartilage, Articular/metabolism , Chondrocytes/physiology , Interleukin-1beta/metabolism , Interleukin-6/metabolism , Osteoarthritis/metabolism , Signal Transduction/physiology , Adult , Aged , Cartilage, Articular/pathology , Cells, Cultured , Chondrocytes/cytology , Extracellular Signal-Regulated MAP Kinases/genetics , Extracellular Signal-Regulated MAP Kinases/metabolism , Humans , Interleukin-1beta/genetics , Interleukin-6/genetics , JNK Mitogen-Activated Protein Kinases/genetics , JNK Mitogen-Activated Protein Kinases/metabolism , Middle Aged , NF-kappa B/genetics , NF-kappa B/metabolism
4.
Clin Orthop Relat Res ; 460: 226-33, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17327807

ABSTRACT

The cDNA array technology is a powerful tool to analyze a high number of genes in parallel. We investigated whether large-scale gene expression analysis allows clustering and identification of cellular phenotypes of chondrocytes in different in vivo and in vitro conditions. In 100% of cases, clustering analysis distinguished between in vivo and in vitro samples, suggesting fundamental differences in chondrocytes in situ and in vitro regardless of the culture conditions or disease status. It also allowed us to differentiate between healthy and osteoarthritic cartilage. The clustering also revealed the relative importance of the investigated culturing conditions (stimulation agent, stimulation time, bead/monolayer). We augmented the cluster analysis with a statistical search for genes showing differential expression. The identified genes provided hints to the molecular basis of the differences between the sample classes. Our approach shows the power of modern bioinformatic algorithms for understanding and classifying chondrocytic phenotypes in vivo and in vitro. Although it does not generate new experimental data per se, it provides valuable information regarding the biology of chondrocytes and may provide tools for diagnosing and staging the osteoarthritic disease process.


Subjects
Chondrocytes/physiology , DNA, Complementary/analysis , Cluster Analysis , Computational Biology , Gene Expression , Genome, Human , Humans , In Vitro Techniques , Interleukin-1/genetics , Lymphotoxin-alpha/genetics , Microarray Analysis/methods , Osteoarthritis/genetics , Phenotype , Statistics, Nonparametric
5.
Bioinformatics ; 23(3): 365-71, 2007 Feb 01.
Article in English | MEDLINE | ID: mdl-17142812

ABSTRACT

MOTIVATION: The discovery of regulatory pathways, signal cascades, metabolic processes or disease models requires knowledge of individual relations, such as physical or regulatory interactions between genes and proteins. Most interactions mentioned in the free text of biomedical publications are not yet contained in structured databases. RESULTS: We developed RelEx, an approach for relation extraction from free text. It is based on natural language preprocessing producing dependency parse trees and applying a small number of simple rules to these trees. We applied RelEx to a comprehensive set of one million MEDLINE abstracts dealing with gene and protein relations and extracted approximately 150,000 relations with an estimated performance of both 80% precision and 80% recall. AVAILABILITY: The natural language preprocessing tools used are free for academic research. Test sets and relation term lists are available from our website (http://www.bio.ifi.lmu.de/publications/RelEx/).


Subjects
Gene Expression/physiology , Information Storage and Retrieval/methods , MEDLINE , Natural Language Processing , Protein Interaction Mapping/methods , Proteins/genetics , Proteins/metabolism , Algorithms , Database Management Systems , Proteins/classification , Software
6.
Arthritis Rheum ; 54(11): 3533-44, 2006 Nov.
Article in English | MEDLINE | ID: mdl-17075858

ABSTRACT

OBJECTIVE: Despite many research efforts in recent decades, the major pathogenetic mechanisms of osteoarthritis (OA), including gene alterations occurring during OA cartilage degeneration, are poorly understood, and there is no disease-modifying treatment approach. The present study was therefore initiated in order to identify differentially expressed disease-related genes and potential therapeutic targets. METHODS: This investigation consisted of a large gene expression profiling study performed based on 78 normal and disease samples, using a custom-made complementary DNA array covering >4,000 genes. RESULTS: Many differentially expressed genes were identified, including the expected up-regulation of anabolic and catabolic matrix genes. In particular, the down-regulation of important oxidative defense genes, i.e., the genes for superoxide dismutases 2 and 3 and glutathione peroxidase 3, was prominent. This indicates that continuous oxidative stress to the cells and the matrix is one major underlying pathogenetic mechanism in OA. Also, genes that are involved in the phenotypic stability of cells, a feature that is greatly reduced in OA cartilage, appeared to be suppressed. CONCLUSION: Our findings provide a reference data set on gene alterations in OA cartilage and, importantly, indicate major mechanisms underlying central cell biologic alterations that occur during the OA disease process. These results identify molecular targets that can be further investigated in the search for therapeutic interventions.


Subjects
Cartilage/pathology , Gene Expression Profiling/methods , Osteoarthritis, Knee/genetics , Osteoarthritis, Knee/pathology , Aged , Aged, 80 and over , Cell Differentiation , Chondrocytes/metabolism , Chondrocytes/pathology , Cluster Analysis , DNA Fingerprinting , Energy Metabolism/genetics , Gene Expression Profiling/standards , Genetic Markers , Genetic Predisposition to Disease/epidemiology , Humans , Incidence , Middle Aged , Osteoarthritis, Knee/epidemiology , Reproducibility of Results , Severity of Illness Index
7.
Bioinformatics ; 22(19): 2356-63, 2006 Oct 01.
Article in English | MEDLINE | ID: mdl-16882647

ABSTRACT

MOTIVATION: Two important questions for the analysis of gene expression measurements from different sample classes are (1) how to classify samples and (2) how to identify meaningful gene signatures (ranked gene lists) exhibiting the differences between classes and sample subsets. Solutions to both questions have immediate biological and biomedical applications. To achieve optimal classification performance, a suitable combination of classifier and gene selection method needs to be specifically selected for a given dataset. The selected gene signatures can be unstable and the resulting classification accuracy unreliable, particularly when considering different subsets of samples. Both unstable gene signatures and overestimated classification accuracy can impair biological conclusions. METHODS: We address these two issues by repeatedly evaluating the classification performance of all models, i.e. pairwise combinations of various gene selection and classification methods, for random subsets of arrays (sampling). A model score is used to select the most appropriate model for the given dataset. Consensus gene signatures are constructed by extracting those genes frequently selected over many samplings. Sampling additionally permits measurement of the stability of the classification performance for each model, which serves as a measure of model reliability. RESULTS: We analyzed a large gene expression dataset with 78 measurements of four different cartilage sample classes. Classifiers trained on subsets of measurements frequently produce models with highly variable performance. Our approach provides reliable classification performance estimates via sampling. In addition to reliable classification performance, we determined stable consensus signatures (i.e. gene lists) for sample classes. Manual literature screening showed that these genes are highly relevant to our gene expression experiment with osteoarthritic cartilage. We compared our approach to others based on a publicly available dataset on breast cancer. AVAILABILITY: R package at http://www.bio.ifi.lmu.de/~davis/edaprakt
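The consensus-signature idea described in this abstract (select genes on many random subsets of arrays, then keep the genes that are selected frequently) can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the two-class setup, the gene score (absolute difference in class means), the subset fraction, and the frequency threshold. The authors' R package may differ on all of these points.

```python
import random
from collections import Counter


def rank_genes(samples, labels):
    """Rank gene indices by absolute difference in class means (a stand-in score)."""
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        a = [s[g] for s, y in zip(samples, labels) if y == 0]
        b = [s[g] for s, y in zip(samples, labels) if y == 1]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)


def consensus_signature(samples, labels, top_k=2, rounds=200,
                        fraction=0.7, min_frequency=0.8, seed=0):
    """Return genes that land in the top_k list in most random subsets of arrays."""
    if len(set(labels)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    subset_size = max(2, int(fraction * len(samples)))
    counts = Counter()
    done = 0
    while done < rounds:
        subset = rng.sample(indices, subset_size)
        sub_labels = [labels[i] for i in subset]
        if len(set(sub_labels)) < 2:
            continue  # redraw: a subset needs arrays from both classes
        sub_samples = [samples[i] for i in subset]
        counts.update(rank_genes(sub_samples, sub_labels)[:top_k])
        done += 1
    return sorted(g for g, c in counts.items() if c / rounds >= min_frequency)


# Toy data: 6 arrays x 3 genes; genes 0 and 2 separate the classes, gene 1 does not.
toy_samples = [[0, 5.0, 0], [0, 5.1, 0], [0, 4.9, 0],
               [10, 5.0, 8], [10, 5.1, 8], [10, 4.9, 8]]
toy_labels = [0, 0, 0, 1, 1, 1]
print(consensus_signature(toy_samples, toy_labels))
```

The per-gene selection frequency in `counts` also gives a rough stability measure: genes that appear in only a minority of samplings are the unstable signature members the abstract warns about.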


Subjects
Algorithms , Gene Expression Profiling/methods , Gene Expression , Neoplasms/metabolism , Oligonucleotide Array Sequence Analysis/methods , Osteoarthritis/metabolism , Proteins/metabolism , Biomarkers/analysis , Cartilage/metabolism , Computer Simulation , Humans , Models, Biological , Models, Statistical , Neoplasms/genetics , Osteoarthritis/genetics , Reproducibility of Results , Sensitivity and Specificity
8.
BMC Bioinformatics ; 7: 372, 2006 Aug 09.
Article in English | MEDLINE | ID: mdl-16899134

ABSTRACT

BACKGROUND: Frequently, several alternative names are in use for biological objects such as genes and proteins. Applications like manual literature search, automated text-mining, named entity identification, gene/protein annotation, and linking of knowledge from different information sources require knowledge of all names in use for a given gene or protein. Various organism-specific or general public databases aim at organizing knowledge about genes and proteins. These databases can be used for deriving gene and protein name dictionaries. So far, little is known about the differences between databases in terms of size, ambiguities and overlap. RESULTS: We compiled five gene and protein name dictionaries for each of the five model organisms (yeast, fly, mouse, rat, and human) from different organism-specific and general public databases. We analyzed the degree of ambiguity of gene and protein names within and between dictionaries, to a lexicon of common English words and domain-related non-gene terms, and we compared different data sources in terms of size of extracted dictionaries and overlap of synonyms between those. The study shows that the number of genes/proteins and synonyms covered in individual databases varies significantly for a given organism, and that the degree of ambiguity of synonyms varies significantly between different organisms. Furthermore, it shows that, despite considerable efforts of co-curation, the overlap of synonyms in different data sources is rather moderate and that the degree of ambiguity of gene names with common English words and domain-related non-gene terms varies depending on the considered organism. CONCLUSION: In conclusion, these results indicate that the combination of data contained in different databases allows the generation of gene and protein name dictionaries that contain significantly more names in use than dictionaries obtained from individual data sources. Furthermore, curation of combined dictionaries considerably increases size and decreases ambiguity. The entries of the curated synonym dictionary are available for manual querying, editing, and PubMed- or Google-search via the ProThesaurus-wiki. For automated querying via custom software, we offer a web service and an exemplary client application.
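The two kinds of ambiguity measured in this abstract (a name shared by several database entries, and a name that coincides with a common English word) can be made concrete with a small Python sketch. The identifiers, names, and word list below are invented toy data, and the case-insensitive comparison is an assumption; the abstract does not say how the authors normalized names.

```python
from collections import defaultdict


def ambiguous_synonyms(dictionary, common_words=frozenset()):
    """Find names shared by several identifiers and names that are common words.

    `dictionary` maps an identifier to an iterable of names. Names are compared
    case-insensitively, which is an assumption of this sketch.
    """
    owners = defaultdict(set)
    for identifier, names in dictionary.items():
        for name in names:
            owners[name.lower()].add(identifier)
    shared = {name: ids for name, ids in owners.items() if len(ids) > 1}
    clashes = {name for name in owners if name in common_words}
    return shared, clashes


# Hypothetical entries; not taken from any real database.
toy_dictionary = {
    "GENE_A": ["abc1", "white"],
    "GENE_B": ["ABC1", "xyz"],
}
shared, clashes = ambiguous_synonyms(toy_dictionary, common_words={"white"})
print(shared)
print(clashes)
```

Dividing the size of `shared` or `clashes` by the total number of distinct names would give a per-dictionary ambiguity rate of the kind that could be compared across organisms or data sources.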


Subjects
Abstracting and Indexing/methods , Genes , Proteins , Terminology as Topic , Abstracting and Indexing/standards , Animals , Databases, Genetic/standards , Databases, Protein/standards , Dictionaries as Topic , Humans , Information Storage and Retrieval/methods , Information Storage and Retrieval/standards , Mice , Rats , Reproducibility of Results , User-Computer Interface
9.
Bioinformatics ; 21 Suppl 2: ii259-67, 2005 Sep 01.
Article in English | MEDLINE | ID: mdl-16204115

ABSTRACT

MOTIVATION: The interpretation of expression data without appropriate expert knowledge is difficult and usually limited to exploratory data analysis, such as clustering and detecting differentially regulated genes. However, comparing experimental results against manually compiled knowledge resources might limit or bias the perspective on the data. Thus, manual analysis by experts is required to obtain confident predictions about involved processes. RESULTS: We present an algorithm to simultaneously derive interpretations of expression measurements together with biological hypotheses from biomedical publications. It identifies active functional contexts ('concepts'), i.e. gene clusters that exhibit both significant gene expression and a coherent literature profile. Manual intervention by an expert in specifying prior knowledge is not required. The approach scales to realistic applications and does not rely on controlled vocabularies or pathway resources. We validated our algorithm by analyzing a current juvenile arthritis dataset. A number of gene clusters and accompanying literature topics are identified as an interpretation of the data that coincide well with the phenotype and biological processes known to be involved in the disease. We demonstrate that generated clusters are both more sensitive and more specific than Gene Ontology categories detected on the same data. The method allows for in-depth investigation of subsets of genes, the associated literature topics and publications. AVAILABILITY: Supplementary data on clusters is available upon request.


Subjects
Expert Systems , Gene Expression Profiling/methods , Gene Expression/physiology , Information Storage and Retrieval/methods , Natural Language Processing , Periodicals as Topic , Proteome/metabolism , Systems Integration
10.
Bioinformatics ; 21 Suppl 2: ii268-9, 2005 Sep 01.
Article in English | MEDLINE | ID: mdl-16204117

ABSTRACT

UNLABELLED: Biologists routinely use Microsoft Office applications for standard analysis tasks. Despite ubiquitous internet resources, information needed for everyday work is often not directly and seamlessly available. Here we describe a very simple and easily extendable mechanism using Web Services to enrich standard MS Office applications with internet resources. We demonstrate its capabilities by providing a Web-based thesaurus for biological objects, which maps names to database identifiers and vice versa via an appropriate synonym list. The client application ProTag makes these features available in MS Office applications using Smart Tags and Add-Ins. AVAILABILITY: http://services.bio.ifi.lmu.de/prothesaurus/


Subjects
Computational Biology/methods , Databases, Factual , Internet , Natural Language Processing , User-Computer Interface , Vocabulary, Controlled , Word Processing/methods , Database Management Systems , Information Storage and Retrieval/methods , Terminology as Topic
11.
BMC Bioinformatics ; 6 Suppl 1: S14, 2005.
Article in English | MEDLINE | ID: mdl-15960826

ABSTRACT

BACKGROUND: Identification of gene and protein names in biomedical text is a challenging task as the corresponding nomenclature has evolved over time. This has led to multiple synonyms for individual genes and proteins, as well as names that may be ambiguous with other gene names or with general English words. The Gene List Task of the BioCreAtIvE challenge evaluation enables comparison of systems addressing the problem of protein and gene name identification on common benchmark data. METHODS: The ProMiner system uses a pre-processed synonym dictionary to identify potential name occurrences in the biomedical text and associate protein and gene database identifiers with the detected matches. It follows a rule-based approach and its search algorithm is geared towards recognition of multi-word names. To account for the large number of ambiguous synonyms in the considered organisms, the system has been extended to use specific variants of the detection procedure for highly ambiguous and case-sensitive synonyms. Based on all detected synonyms for one abstract, the most plausible database identifiers are associated with the text. Organism specificity is addressed by a simple procedure based on additionally detected organism names in an abstract. RESULTS: The extended ProMiner system has been applied to the test cases of the BioCreAtIvE competition with highly encouraging results. In blind predictions, the system achieved an F-measure of approximately 0.8 for the organisms mouse and fly and about 0.9 for the organism yeast.


Subjects
Computational Biology/methods , Genes , Pattern Recognition, Automated/methods , Proteins/classification , Recognition, Psychology , Software , Computational Biology/standards , Pattern Recognition, Automated/standards , Software/standards
12.
BMC Bioinformatics ; 6 Suppl 1: S15, 2005.
Article in English | MEDLINE | ID: mdl-15960827

ABSTRACT

BACKGROUND: Significant parts of biological knowledge are available only as unstructured text in articles of biomedical journals. By automatically identifying gene and gene product (protein) names and mapping these to unique database identifiers, it becomes possible to extract and integrate information from articles and various data sources. We present a simple and efficient approach that identifies gene and protein names in texts and returns database identifiers for matches. It has been evaluated in the recent BioCreAtIvE entity extraction and mention normalization task by an independent jury. METHODS: Our approach is based on the use of synonym lists that map the unique database identifiers for each gene/protein to the different synonym names. For yeast and mouse, synonym lists were used as provided by the organizers who generated them from public model organism databases. The synonym list for fly was generated directly from the corresponding organism database. The lists were then extensively curated in a largely automated procedure and matched against MEDLINE abstracts by exact text matching. Rule-based and support vector machine-based post filters were designed and applied to improve precision. RESULTS: Our procedure showed high recall and precision with F-measures of 0.897 for yeast and 0.764/0.773 for mouse in the BioCreAtIvE assessment (Task 1B) and 0.768 for fly in a post-evaluation. CONCLUSION: The results were close to the best across all submissions. Depending on the synonym properties it can be crucial to consider context and to filter out erroneous matches. This is especially important for fly, which has a very challenging nomenclature for the protein name identification task. Here, the support vector machine-based post filter proved to be very effective.
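The core step described in this abstract, matching a synonym list against text by exact string matching and returning the database identifier for each hit, can be sketched as follows. This is a minimal Python illustration, not the authors' system: the synonym entries and identifiers are invented, the longest-name-first ordering and the word-boundary checks are assumptions, and the rule-based and support vector machine post filters are not modeled.

```python
import re


def find_identifiers(text, synonyms):
    """Return (matched name, identifier) pairs for exact, case-sensitive matches.

    `synonyms` maps a name to a database identifier (toy data in this sketch).
    """
    if not synonyms:
        return []
    # Try longer names first so a multi-word name wins over a shorter prefix.
    names = sorted(synonyms, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in names)
    # Lookarounds keep a name from matching inside a longer word.
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")
    return [(m.group(0), synonyms[m.group(0)]) for m in pattern.finditer(text)]


# Hypothetical synonym list; names and identifiers are made up.
toy_synonyms = {"abc": "ID1", "abc kinase 2": "ID2"}
sentence = "The abc kinase 2 gene and abc were studied, unlike abcd."
hits = find_identifiers(sentence, toy_synonyms)
print(hits)
```

Exact matching of this kind favors recall on well-curated lists, which is consistent with the abstract's point that post filters are then needed to remove erroneous matches, especially for organisms whose gene names overlap with ordinary words.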


Subjects
Pattern Recognition, Automated/methods , Pattern Recognition, Automated/standards , Proteins/classification , Terminology as Topic , Animals , Databases, Factual/classification , Drosophila , Mice , Proteins/genetics , Saccharomyces cerevisiae/chemistry , Saccharomyces cerevisiae/genetics