Results 1 - 9 of 9
1.
Bioinform Adv ; 3(1): vbad146, 2023.
Article in English | MEDLINE | ID: mdl-37881170

ABSTRACT

Motivation: Recent advances in highly multiplexed imaging have provided unprecedented insights into the complex cellular organization of tissues, with many applications in translational medicine. However, downstream analyses of multiplexed imaging data face several technical limitations, and although some computational methods and bioinformatics tools are available, deciphering the complex spatial organization of cellular ecosystems remains a challenging problem. Results: To mitigate this problem, we develop a novel computational tool, LOCATOR (anaLysis Of CAncer Tissue micrOenviRonment), for spatial analysis of cancer tissue microenvironments using data acquired from mass cytometry imaging technologies. LOCATOR introduces a graph-based representation of tissue images to describe features of the cellular organization and deploys downstream analysis and visualization utilities that can be used for data-driven patient-risk stratification. Our case studies using mass cytometry imaging data from two well-annotated breast cancer cohorts re-confirmed that the spatial organization of the tumour-immune microenvironment is strongly associated with the clinical outcome in breast cancer. In addition, we report interesting potential associations between the spatial organization of macrophages and patients' survival. Our work introduces an automated and versatile analysis tool for mass cytometry imaging data with many applications in future cancer research projects. Availability and implementation: Datasets and code for LOCATOR are publicly available at https://github.com/RezvanEhsani/LOCATOR.
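
The graph-based representation mentioned in the abstract can be illustrated with a minimal sketch. It assumes each segmented cell is reduced to a centroid and a cell-type label, and that two cells are linked when their centroids lie within a fixed radius. The function names, the radius rule and the toy coordinates are all hypothetical; this is not LOCATOR's actual implementation, which the abstract does not specify.

```python
import math
from collections import Counter


def build_cell_graph(cells, radius):
    """Link cells whose centroids are at most `radius` apart.

    `cells` maps a cell id to (x, y, cell_type). The radius rule is an
    illustrative assumption, not the construction used by LOCATOR.
    """
    graph = {cell_id: set() for cell_id in cells}
    ids = list(cells)
    for i, a in enumerate(ids):
        ax, ay, _ = cells[a]
        for b in ids[i + 1:]:
            bx, by, _ = cells[b]
            if math.hypot(ax - bx, ay - by) <= radius:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def neighbour_type_counts(cells, graph, cell_id):
    """Count the cell types found among the direct neighbours of one cell."""
    return Counter(cells[n][2] for n in graph[cell_id])


# Toy tissue: three nearby cells and one distant tumour cell.
cells = {
    "c1": (0.0, 0.0, "tumour"),
    "c2": (3.0, 4.0, "macrophage"),
    "c3": (0.0, 6.0, "T cell"),
    "c4": (50.0, 50.0, "tumour"),
}
graph = build_cell_graph(cells, radius=10.0)
print(neighbour_type_counts(cells, graph, "c1"))
```

Per-cell neighbourhood counts of this kind are one simple way to turn a tissue image into features that describe cellular organization; the pairwise loop is quadratic in the number of cells, so a spatial index would be needed for real images.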

2.
BMC Res Notes ; 14(1): 162, 2021 Apr 30.
Article in English | MEDLINE | ID: mdl-33931103

ABSTRACT

OBJECTIVE: Properties of gene products can be described or annotated with Gene Ontology (GO) terms. But for many genes we have limited information about their products, for example with respect to function. This is particularly true for long non-coding RNAs (lncRNAs), where the function in most cases is unknown. However, it has been shown that annotation as described by GO terms to some extent can be predicted by enrichment analysis on properties of co-expressed genes. RESULTS: GAPGOM integrates two relevant algorithms, lncRNA2GOA and TopoICSim, into a user-friendly R package. Here lncRNA2GOA does annotation prediction by co-expression, whereas TopoICSim estimates similarity between GO graphs, which can be used for benchmarking of prediction performance, but also for comparison of GO graphs in general. The package provides an improved implementation of the original tools, with substantial improvements in performance and documentation, unified interfaces, and additional features.
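
Annotation prediction by enrichment analysis, as described in the abstract, can be sketched with a one-sided hypergeometric test, which is a common choice for this kind of analysis. Whether lncRNA2GOA uses exactly this test is not stated here, so the function below, its name and the example numbers are assumptions for illustration only.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in the sample.

    population: number of genes in the background set
    annotated:  background genes carrying the GO term
    sample:     number of co-expressed genes
    hits:       co-expressed genes carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total


# Hypothetical numbers: 20000 background genes, 200 with the term,
# 50 genes co-expressed with the lncRNA, 5 of which carry the term.
print(enrichment_p_value(20000, 200, 50, 5))
```

A small p-value suggests that the term is over-represented among the co-expressed genes and is therefore a candidate annotation for the unannotated gene. In practice the p-values would also need correction for testing many GO terms.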


Subject(s)
Benchmarking , Computational Biology , Algorithms , Gene Ontology , Molecular Sequence Annotation
3.
Cancer Inform ; 19: 1176935120965542, 2020.
Article in English | MEDLINE | ID: mdl-33116353

ABSTRACT

The k-Nearest Neighbor (kNN) classifier represents a simple and very general approach to classification. Still, the performance of kNN classifiers can often compete with more complex machine-learning algorithms. The core of kNN depends on a "guilt by association" principle where classification is performed by measuring the similarity between a query and a set of training patterns, often computed as distances. The relative performance of kNN classifiers is closely linked to the choice of distance or similarity measure, and it is therefore relevant to investigate the effect of using different distance measures when comparing biomedical data. In this study on classification of cancer data sets, we have used both common and novel distance measures, including the novel distance measures Sobolev and Fisher, and we have evaluated the performance of kNN with these distances on four cancer data sets of different types. We find that the performance when using the novel distance measures is comparable to the performance with more well-established measures, in particular for the Sobolev distance. We define a robust ranking of all the distance measures according to overall performance. Several distance measures show robust performance in kNN over several data sets, in particular the Hassanat, Sobolev, and Manhattan measures. Some of the other measures show good performance on selected data sets but seem to be more sensitive to the nature of the classification data. It is therefore important to benchmark distance measures on similar data prior to classification to identify the most suitable measure in each case.
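
The dependence of kNN on the chosen distance measure can be made concrete with a minimal sketch in which the distance is a pluggable function. Only the Manhattan and Euclidean distances are shown; the Sobolev, Fisher and Hassanat measures are not defined in the abstract and are therefore left out. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3, distance=manhattan):
    """Return the majority label among the k training patterns nearest to the query."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy two-feature data set with two well-separated classes.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["normal", "normal", "normal", "tumour", "tumour", "tumour"]

print(knn_predict(train_X, train_y, (1.1, 1.0), k=3, distance=manhattan))
print(knn_predict(train_X, train_y, (5.0, 5.1), k=3, distance=euclidean))
```

Passing the distance as an argument makes it straightforward to benchmark several measures on the same data before settling on one, which is the practice the abstract recommends.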

4.
BMC Bioinformatics ; 21(1): 134, 2020 Apr 06.
Article in English | MEDLINE | ID: mdl-32252623

ABSTRACT

BACKGROUND: Diseases like cancer will lead to changes in gene expression, and it is relevant to identify key regulatory genes that can be linked directly to these changes. This can be done by computing a Regulatory Impact Factor (RIF) score for relevant regulators. However, this computation is based on estimating correlated patterns of gene expression, often Pearson correlation, and an assumption about a set of specific regulators, normally transcription factors. This study explores alternative measures of correlation, using the Fisher and Sobolev metrics, and an extended set of regulators, including epigenetic regulators and long non-coding RNAs (lncRNAs). Data on prostate cancer have been used to explore the effect of these modifications. RESULTS: A tool for computation of RIF scores with alternative correlation measures and extended sets of regulators was developed and tested on gene expression data for prostate cancer. The study showed that the Fisher and Sobolev metrics lead to improved identification of well-documented regulators of gene expression in prostate cancer, and the sets of identified key regulators showed improved overlap with previously defined gene sets of relevance to cancer. The extended set of regulators led to the identification of several interesting candidates for further studies, including lncRNAs. Several key processes were identified as important, including spindle assembly and the epithelial-mesenchymal transition (EMT). CONCLUSIONS: The study has shown that using alternative metrics of correlation can improve the performance of tools based on correlation of gene expression in genomic data. The Fisher and Sobolev metrics should also be considered in other correlation-based applications.


Subject(s)
Computational Biology/methods , Epigenesis, Genetic , Transcription Factors/metabolism , Databases, Genetic , Epithelial-Mesenchymal Transition , Gene Expression Regulation, Neoplastic , Humans , Male , Prostatic Neoplasms/genetics , Prostatic Neoplasms/pathology , RNA, Long Noncoding/metabolism , Transcription Factors/genetics
5.
BMC Bioinformatics ; 19(1): 533, 2018 Dec 19.
Article in English | MEDLINE | ID: mdl-30567492

ABSTRACT

BACKGROUND: Almost 16,000 human long non-coding RNA (lncRNA) genes have been identified in the GENCODE project. However, the function of most of them remains to be discovered. The function of lncRNAs and other novel genes can be predicted by identifying significantly enriched annotation terms in already annotated genes that are co-expressed with the lncRNAs. However, such approaches are sensitive to the methods that are used to estimate the level of co-expression. RESULTS: We have tested and compared two well-known statistical metrics (Pearson and Spearman) and two geometrical metrics (Sobolev and Fisher) for identification of the co-expressed genes, using experimental expression data across 19 normal human tissues. We have also used a benchmarking approach based on semantic similarity to evaluate how well these methods are able to predict annotation terms, using a well-annotated set of protein-coding genes. CONCLUSION: This work shows that geometrical metrics, in particular in combination with the statistical metrics, will predict annotation terms more efficiently than traditional approaches. Tests on selected lncRNAs confirm that it is possible to predict the function of these genes given a reliable set of expression data. The software used for this investigation is freely available.
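
The two statistical metrics named in the abstract can be sketched as follows, together with a simple threshold rule for picking co-expressed genes. The Sobolev and Fisher metrics are omitted because the abstract does not define them. The threshold rule, the function names and the toy expression profiles are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def coexpressed(target, profiles, threshold=0.95, metric=pearson):
    """Names of genes whose profile correlates with the target at or above the threshold."""
    return sorted(g for g, p in profiles.items() if metric(target, p) >= threshold)


# Toy expression profiles across five tissues (hypothetical values).
lncrna = [1.0, 2.0, 3.0, 4.0, 5.0]
profiles = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0, 10.0],
    "GENE_B": [5.0, 3.0, 4.0, 1.0, 2.0],
    "GENE_C": [1.0, 2.0, 3.0, 5.0, 4.0],
}
print(coexpressed(lncrna, profiles, threshold=0.95))
print(coexpressed(lncrna, profiles, threshold=0.8, metric=spearman))
```

The set of co-expressed genes changes with both the metric and the threshold, which is why the abstract notes that function prediction is sensitive to how co-expression is estimated.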


Subject(s)
Computational Biology/methods , Gene Expression Profiling , Gene Expression Regulation , Molecular Sequence Annotation , RNA, Long Noncoding/metabolism , Software , High-Throughput Nucleotide Sequencing , Humans , RNA, Long Noncoding/genetics
6.
BMC Bioinformatics ; 17(1): 459, 2016 Nov 14.
Article in English | MEDLINE | ID: mdl-27842491

ABSTRACT

BACKGROUND: Transcription factors are key proteins in the regulation of gene transcription. An important step in this process is the opening of chromatin in order to make genomic regions available for transcription. Data on DNase I hypersensitivity has previously been used to label a subset of transcription factors as Pioneers, Settlers and Migrants to describe their potential role in this process. These labels represent an interesting hypothesis on gene regulation and possibly a useful approach for data analysis, and therefore we wanted to expand the set of labeled transcription factors to include as many known factors as possible. We have used a well-annotated dataset of 1175 transcription factors as input to supervised machine learning methods, using the subset with previously assigned labels as training set. We then used the final classifier to label the additional transcription factors according to their potential role as Pioneers, Settlers and Migrants. The full set of labeled transcription factors was used to investigate associated properties and functions of each class, including an analysis of interaction data for transcription factors based on DNA co-binding and protein-protein interactions. We also used the assigned labels to analyze a previously published set of gene lists associated with a time course experiment on cell differentiation. RESULTS: The analysis showed that the classification of transcription factors with respect to their potential role in chromatin opening largely was determined by how they bind to DNA. Each subclass of transcription factors was enriched for properties that seemed to characterize the subclass relative to its role in gene regulation, with very general functions for Pioneers, whereas Migrants to a larger extent were associated with specific processes. 
Further analysis showed that the expanded classification is a useful resource for analyzing other datasets on transcription factors with respect to their potential role in gene regulation. The analysis of transcription factor interaction data showed complementary differences between the subclasses, where transcription factors labeled as Pioneers often interact with other transcription factors through DNA co-binding, whereas Migrants to a larger extent use protein-protein interactions. The analysis of time course data on cell differentiation indicated a shift in the regulatory program associated with Pioneer-like transcription factors during differentiation. CONCLUSIONS: The expanded classification is an interesting resource for analyzing data on gene regulation, as illustrated here on transcription factor interaction data and data from a time course experiment. The potential regulatory function of transcription factors seems largely to be determined by how they bind DNA, but is also influenced by how they interact with each other through cooperativity and protein-protein interactions.


Subject(s)
Gene Expression Regulation , Transcription Factors/metabolism , Chromatin/genetics , Chromatin/metabolism , DNA/genetics , DNA/metabolism , Genomics , Humans , Transcription Factors/genetics
7.
BMC Bioinformatics ; 17(1): 296, 2016 Jul 29.
Article in English | MEDLINE | ID: mdl-27473391

ABSTRACT

BACKGROUND: The Gene Ontology (GO) is a dynamic, controlled vocabulary that describes the cellular function of genes and proteins according to three major categories: biological process, molecular function and cellular component. It has become widely used in many bioinformatics applications for annotating genes and measuring their semantic similarity, rather than their sequence similarity. Generally speaking, semantic similarity measures involve the GO tree topology, information content of GO terms, or a combination of both. RESULTS: Here we present a new semantic similarity measure called TopoICSim (Topological Information Content Similarity) which uses information on the specific paths between GO terms based on the topology of the GO tree, and the distribution of information content along these paths. The TopoICSim algorithm was evaluated on two human benchmark datasets based on KEGG pathways and Pfam domains grouped as clans, using GO terms from either the biological process or molecular function. The performance of the TopoICSim measure compared favorably to five existing methods. Furthermore, the TopoICSim similarity was also tested on gene/protein sets defined by correlated gene expression, using three human datasets, and showed improved performance compared to two previously published similarity measures. Finally, we used an online benchmarking resource which evaluates any similarity measure against a set of 11 similarity measures in three tests, using gene/protein sets based on sequence similarity, Pfam domains, and enzyme classifications. The results for TopoICSim showed improved performance relative to most of the measures included in the benchmarking, and in particular a very robust performance throughout the different tests. CONCLUSIONS: The TopoICSim similarity measure provides a competitive method with robust performance for quantification of semantic similarity between genes and proteins based on GO annotations.
An R script for TopoICSim is available at http://bigr.medisin.ntnu.no/tools/TopoICSim.R .


Subject(s)
Computational Biology/methods , Gene Ontology , Algorithms , Humans , Molecular Sequence Annotation , Semantics , Vocabulary, Controlled
8.
Database (Oxford) ; 2015: bav067, 2015.
Article in English | MEDLINE | ID: mdl-26153137

ABSTRACT

Epigenetics refers to stable and long-term alterations of cellular traits that are not caused by changes in the DNA sequence per se. Rather, covalent modifications of DNA and histones affect gene expression and genome stability via proteins that recognize and act upon such modifications. Many enzymes that catalyse epigenetic modifications or are critical for enzymatic complexes have been discovered, and this is encouraging investigators to study the role of these proteins in diverse normal and pathological processes. Rapidly growing knowledge in the area has resulted in the need for a resource that compiles, organizes and presents curated information to the researchers in an easily accessible and user-friendly form. Here we present EpiFactors, a manually curated database providing information about epigenetic regulators, their complexes, targets and products. EpiFactors contains information on 815 proteins, including 95 histones and protamines. For 789 of these genes, we include expression values across several samples, in particular a collection of 458 human primary cell samples (for approximately 200 cell types, in many cases from three individual donors), covering most mammalian cell steady states, 255 different cancer cell lines (representing approximately 150 cancer subtypes) and 134 human postmortem tissues. Expression values were obtained by the FANTOM5 consortium using the Cap Analysis of Gene Expression technique. EpiFactors also contains information on 69 protein complexes that are involved in epigenetic regulation. The resource is practical for a wide range of users, including biologists, pharmacologists and clinicians.


Subject(s)
Databases, Genetic , Epigenesis, Genetic , Genomic Instability , Histones , Neoplasm Proteins , Neoplasms , Protamines , Epigenomics , Histones/biosynthesis , Histones/genetics , Humans , Neoplasm Proteins/biosynthesis , Neoplasm Proteins/genetics , Neoplasms/genetics , Neoplasms/metabolism , Protamines/genetics , Protamines/metabolism
9.
BMC Res Notes ; 8: 82, 2015 Mar 14.
Article in English | MEDLINE | ID: mdl-25890365

ABSTRACT

BACKGROUND: Transcription factors are essential proteins for regulating gene expression. This regulation depends upon specific features of the transcription factors, including how they interact with DNA, how they interact with each other, and how they are post-translationally modified. Reliable information about key properties associated with transcription factors will therefore be useful for data analysis, in particular of data from high-throughput experiments. RESULTS: We have used an existing list of 1978 human proteins described as transcription factors to make a well-annotated data set, which includes information on Pfam domains, DNA-binding domains, post-translational modifications and protein-protein interactions. We have then used this data set for enrichment analysis. We have investigated correlations within this set of features, and between the features and more general protein properties. We have also used the data set to analyze previously published gene lists associated with cell differentiation, cancer, and tissue distribution. CONCLUSIONS: The study shows that a well-annotated feature list for transcription factors is a useful resource for extensive data analysis, both of transcription factor properties in general and of properties associated with specific processes. However, the study also shows that such analyses are easily biased by incomplete coverage in experimental data, and by how gene sets are defined.


Subject(s)
Transcription Factors/metabolism , Binding Sites , DNA/metabolism , Humans , Protein Binding