1.
BMC Bioinformatics ; 16: 261, 2015 Aug 19.
Article in English | MEDLINE | ID: mdl-26283178

ABSTRACT

BACKGROUND: Multiple high-throughput molecular profiles obtained with omics technologies can be collected for the same individuals. Combining these data, rather than exploiting them separately, can significantly increase the power of clinically relevant patient subclassification. RESULTS: We propose a multi-view approach in which the information from different data layers (views) is integrated at the level of the results of each single-view clustering iteration. It works by factorizing the membership matrices in a late-integration manner. We evaluated the effectiveness and performance of our method on six multi-view cancer datasets. In all cases, we found patient subclasses with statistical significance, identifying novel subgroups not previously emphasized in the literature. Our method performed better than other multi-view clustering algorithms and, unlike other existing methods, it is able to quantify the contribution of single views to the final results. CONCLUSION: Our observations suggest that integration of prior information with genomic features in the subtyping analysis is an effective strategy in identifying disease subgroups. The methodology is implemented in R and the source code is available online at http://neuronelab.unisa.it/a-multi-view-genomic-data-integration-methodology/ .


Subject(s)
Algorithms , Genomics/methods , Cluster Analysis , MicroRNAs/genetics , MicroRNAs/metabolism , Sequence Analysis, RNA
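The late-integration idea in the abstract of record 1 can be illustrated with a minimal Python sketch. It is not the authors' R implementation: the function names, the two toy views and the choice of multiplicative-update non-negative factorization are assumptions made for illustration only. Each view's clustering is encoded as a binary membership matrix, the matrices are stacked into X, and X is approximated as P H; the columns of H then suggest a consensus cluster for each patient.

```python
import random

def membership_matrix(labels, k):
    """Binary k x n matrix: entry (c, j) is 1 when patient j is in cluster c."""
    return [[1.0 if lab == c else 0.0 for lab in labels] for c in range(k)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def frobenius_error(x, p, h):
    """Squared Frobenius norm of X - P H."""
    ph = matmul(p, h)
    return sum((xv - pv) ** 2 for xr, pr in zip(x, ph) for xv, pv in zip(xr, pr))

def init_factors(x, rank, seed=0):
    """Strictly positive random starting factors (seeded for reproducibility)."""
    rng = random.Random(seed)
    p = [[rng.random() + 0.1 for _ in range(rank)] for _ in x]
    h = [[rng.random() + 0.1 for _ in x[0]] for _ in range(rank)]
    return p, h

def factorize(x, rank, iters=200, seed=0, eps=1e-9):
    """Approximate X by P H with non-negative multiplicative updates."""
    p, h = init_factors(x, rank, seed)
    for _ in range(iters):
        pt = transpose(p)
        num, den = matmul(pt, x), matmul(matmul(pt, p), h)
        h = [[hv * nv / (dv + eps) for hv, nv, dv in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(p, matmul(h, ht))
        p = [[pv * nv / (dv + eps) for pv, nv, dv in zip(pr, nr, dr)]
             for pr, nr, dr in zip(p, num, den)]
    return p, h

# Hypothetical single-view clusterings of six patients (toy data, not from the paper).
view_a = [0, 0, 0, 1, 1, 1]
view_b = [0, 0, 1, 1, 1, 1]

# Stack the per-view membership matrices: 4 rows (2 clusters x 2 views), 6 columns.
X = membership_matrix(view_a, 2) + membership_matrix(view_b, 2)
P, H = factorize(X, rank=2)

# Consensus cluster of each patient: the factor with the largest weight in H.
consensus = [max(range(2), key=lambda c: H[c][j]) for j in range(len(view_a))]
```

In this sketch the rows of P belong to individual view clusters, so comparing their magnitudes across views is one rough way to see how much each view contributes to a factor; how the published method quantifies view contributions is not described in the abstract.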
2.
BMC Bioinformatics ; 16: 151, 2015 May 12.
Article in English | MEDLINE | ID: mdl-25962835

ABSTRACT

BACKGROUND: Omics technologies make it possible to assay the state of a large number of different features (e.g., mRNA expression, miRNA expression, copy number variation, DNA methylation, etc.) from the same samples. The objective of these experiments is usually to find a reduced set of significant features, which can be used to differentiate the conditions assayed. For the development of novel computational feature selection methods, this task is challenging because of the lack of fully annotated biological datasets to be used for benchmarking. A possible way to tackle this problem is to generate appropriate synthetic datasets, whose composition and behaviour are fully controlled and known a priori. RESULTS: Here we propose a novel method centred on the generation of networks of interactions among different biological molecules, especially those involved in regulating gene expression. Synthetic datasets are obtained from models based on ordinary differential equations with known parameters. Our results show that the generated datasets mimic the behaviour of real data well, since popular data analysis methods are able to selectively identify existing interactions. CONCLUSIONS: The proposed method can be used in conjunction with real biological datasets in the assessment of data mining techniques. The main strength of this method is full control over the simulated data while retaining coherence with the real biological processes. The R package MVBioDataSim is freely available to the scientific community at http://neuronelab.unisa.it/?p=1722.


Subject(s)
Algorithms , Computational Biology/methods , Computer Simulation , Gene Expression Profiling/methods , Gene Regulatory Networks , Genomics/methods , DNA Copy Number Variations , DNA Methylation , Datasets as Topic , Gene Expression Regulation , Humans , MicroRNAs/genetics
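As a hedged illustration of how a synthetic dataset can be generated from an ordinary differential equation with known parameters (record 2), the Python sketch below simulates one assumed interaction: a miRNA that accelerates the degradation of its target mRNA. The equation, parameter values and function names are illustrative assumptions and are not taken from the MVBioDataSim package.

```python
import random

def simulate_mrna(mirna, alpha=2.0, delta=0.5, k=1.0, dt=0.01, steps=5000):
    """Euler integration of dm/dt = alpha - (delta + k * mirna) * m, from m = 0.

    alpha: transcription rate, delta: basal degradation rate,
    k: strength of miRNA-mediated repression (all hypothetical values).
    """
    m = 0.0
    for _ in range(steps):
        m += dt * (alpha - (delta + k * mirna) * m)
    return m

def synthetic_dataset(n_samples, seed=0, noise_sd=0.05):
    """One (miRNA level, noisy steady-state mRNA level) pair per sample."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        mirna = rng.uniform(0.0, 2.0)
        mrna = simulate_mrna(mirna) + rng.gauss(0.0, noise_sd)
        rows.append((mirna, mrna))
    return rows

dataset = synthetic_dataset(50)
```

Because the steady state of this equation is alpha / (delta + k * mirna), the true miRNA-mRNA relationship in the generated data is known in advance, which is what makes such data usable for benchmarking feature selection methods.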
3.
Acta Neuropathol ; 126(4): 575-94, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23955600

ABSTRACT

Head and neck paragangliomas, rare neoplasms of the paraganglia composed of nests of neurosecretory and glial cells embedded in vascular stroma, provide a remarkable example of organoid tumor architecture. To identify genes and pathways commonly deregulated in head and neck paraganglioma, we integrated high-density genome-wide copy number variation (CNV) analysis with microRNA and immunomorphological studies. Gene-centric CNV analysis of 24 cases identified a list of 104 genes most significantly targeted by tumor-associated alterations. The "NOTCH signaling pathway" was the most significantly enriched term in the list (P = 0.002 after Bonferroni or Benjamini correction). Expression of the relevant NOTCH pathway proteins in sustentacular (glial), chief (neuroendocrine) and endothelial cells was confirmed by immunohistochemistry in 47 head and neck paraganglioma cases. There were no relationships between level and pattern of NOTCH1/JAG2 protein expression and germline mutation status in the SDH genes, implicated in paraganglioma predisposition, or the presence/absence of immunostaining for SDHB, a surrogate marker of SDH mutations. Interestingly, NOTCH upregulation was observed also in cases with no evidence of CNVs at NOTCH signaling genes, suggesting altered epigenetic modulation of this pathway. To address this issue we performed microarray-based microRNA expression analyses. Notably 5 microRNAs (miR-200a,b,c and miR-34b,c), including those most downregulated in the tumors, correlated to NOTCH signaling and directly targeted NOTCH1 in in vitro experiments using SH-SY5Y neuroblastoma cells. Furthermore, lentiviral transduction of miR-200s and miR-34s in patient-derived primary tympano-jugular paraganglioma cell cultures was associated with NOTCH1 downregulation and increased levels of markers of cell toxicity and cell death. Taken together, our results provide an integrated view of common molecular alterations associated with head and neck paraganglioma and reveal an essential role of NOTCH pathway deregulation in this tumor type.


Subject(s)
Epigenesis, Genetic/physiology , Head and Neck Neoplasms/genetics , Head and Neck Neoplasms/pathology , Paraganglioma/genetics , Paraganglioma/pathology , Receptors, Notch/genetics , Receptors, Notch/physiology , Signal Transduction/genetics , Signal Transduction/physiology , Blotting, Western , Caspases/metabolism , Cell Death/genetics , Cell Line, Tumor , DNA Mutational Analysis , Fluorescent Antibody Technique , Humans , Immunohistochemistry , Lentivirus/genetics , Microarray Analysis , Microscopy, Immunoelectron , Peripheral Nerves/metabolism , RNA, Messenger/biosynthesis , RNA, Messenger/genetics , Real-Time Polymerase Chain Reaction , Succinate Dehydrogenase/genetics , Transfection
4.
BMC Bioinformatics ; 11: 8, 2010 Jan 05.
Article in English | MEDLINE | ID: mdl-20051127

ABSTRACT

BACKGROUND: Complex diseases are multifactorial traits caused by both genetic and environmental factors. They represent the major part of human diseases and include those with the largest prevalence and mortality (cancer, heart disease, obesity, etc.). Although a large amount of information has been collected about both genetic and environmental risk factors, there are few examples of studies on their interactions in the epidemiological literature. One reason may be the incomplete knowledge of the power of statistical methods designed to search for risk factors and their interactions in these data sets. An improvement in this direction would lead to a better understanding and description of gene-environment interactions. To this end, a possible strategy is to test the different statistical methods against data sets where the underlying phenomenon is completely known and fully controllable, for example simulated ones. RESULTS: We present a mathematical approach that models gene-environment interactions. With this method it is possible to generate simulated populations having gene-environment interactions of any form, involving any number of genetic and environmental factors and also allowing non-linear interactions such as epistasis. In particular, we implemented a simple version of this model in a Gene-Environment iNteraction Simulator (GENS), a tool designed to simulate case-control data sets where a one gene-one environment interaction influences the disease risk. The main aim has been to allow the input of population characteristics by using standard epidemiological measures and to implement constraints to make the simulator behaviour biologically meaningful. CONCLUSIONS: With the multi-logistic model implemented in GENS it is possible to simulate case-control samples of complex disease where gene-environment interactions influence the disease risk. The user has full control of the main characteristics of the simulated population and a Monte Carlo process allows random variability. A knowledge-based approach reduces the complexity of the mathematical model by using reasonable biological constraints and makes the simulation more understandable in biological terms. Simulated data sets can be used for the assessment of novel statistical methods or for the evaluation of the statistical power when designing a study.


Subject(s)
Disease/etiology , Environment , Genetic Predisposition to Disease/genetics , Models, Statistical , Disease/genetics , Humans , Monte Carlo Method , Risk Factors
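The kind of simulation described in record 4 can be sketched in Python as follows. This is a simplified illustration, not the GENS implementation: it uses a single logistic risk function with one gene-by-environment interaction term, and the coefficients, allele frequency, exposure rate and function names are all hypothetical. The exact parameterization of the multi-logistic model is not given in the abstract.

```python
import math
import random

def disease_risk(g, e, b0=-3.0, bg=0.5, be=0.3, bge=1.0):
    """Logistic disease risk with a gene-environment interaction term.

    g: number of risk alleles (0, 1 or 2), e: exposure (0 or 1).
    All coefficients are hypothetical illustration values.
    """
    z = b0 + bg * g + be * e + bge * g * e
    return 1.0 / (1.0 + math.exp(-z))

def simulate_population(n, allele_freq=0.3, exposure_rate=0.4, seed=0):
    """Monte Carlo draw of (genotype, exposure, disease status) for n people."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        # Two independent allele draws give Hardy-Weinberg genotype proportions.
        g = sum(rng.random() < allele_freq for _ in range(2))
        e = 1 if rng.random() < exposure_rate else 0
        d = 1 if rng.random() < disease_risk(g, e) else 0
        people.append((g, e, d))
    return people

def sample_case_control(people, n_each, seed=1):
    """Draw equal numbers of cases and controls from the simulated population."""
    rng = random.Random(seed)
    cases = [p for p in people if p[2] == 1]
    controls = [p for p in people if p[2] == 0]
    return rng.sample(cases, n_each), rng.sample(controls, n_each)

population = simulate_population(20000)
cases, controls = sample_case_control(population, 500)
```

Since the risk function is fully specified, any method applied to the sampled cases and controls can be judged by whether it recovers the interaction term that was built in.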
5.
PLoS One ; 4(8): e6824, 2009 Aug 31.
Article in English | MEDLINE | ID: mdl-19718455

ABSTRACT

BACKGROUND: Colorectal cancer is mainly attributed to diet, but the role exerted by foods remains unclear because the factors involved are extremely complex. Geography has a substantial impact on foods. Correlations between international variation in colorectal cancer-associated mutation patterns and food availabilities could highlight the influence of foods on colorectal mutagenesis. METHODOLOGY: To test this hypothesis, we applied techniques based on hierarchical clustering, feature extraction and selection, and statistical pattern recognition to the analysis of 2,572 colorectal cancer-associated TP53 mutations from 12 countries/geographic areas. For food availabilities, we relied on data extracted from the Food Balance Sheets of the Food and Agriculture Organization of the United Nations. Dendrograms for mutation sites, mutation types and food patterns were constructed through Ward's hierarchical clustering algorithm and their stability was assessed by evaluating silhouette values. Feature selection used entropy-based measures for similarity between clusterings, combined with principal component analysis by exhaustive and heuristic approaches. CONCLUSION/SIGNIFICANCE: Mutations clustered in two major geographic groups, one including only Western countries, the other Asia and parts of Europe. This was determined by variation in the frequency of transitions at CpGs, the most common mutation type. Higher frequencies of transitions at CpGs in the cluster that included only Western countries mainly reflected higher frequencies of mutations at CpG codons 175, 248 and 273, the three major TP53 hotspots. Pearson's correlation scores, computed between the principal components of the data matrices for mutation types, food availability and mutation sites, demonstrated statistically significant correlations between transitions at CpGs and both mutation sites and availabilities of meat, milk, sweeteners and animal fats, the energy-dense foods at the basis of "Western" diets. This is best explained by differential exposure to nitrosative DNA damage due to foods that promote metabolic stress and chronic inflammation.


Subject(s)
Colorectal Neoplasms/genetics , CpG Islands , Food Supply , Genes, p53 , Geography , Mutation , Cluster Analysis , Colorectal Neoplasms/epidemiology , Humans
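Record 5 combines Ward's hierarchical clustering with silhouette values to assess cluster stability. The Python sketch below shows both steps on toy data; it is a minimal pure-Python illustration, not the analysis pipeline used in the study, and the six two-dimensional points are hypothetical stand-ins for per-country mutation or food profiles.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def ward(points, k):
    """Agglomerative clustering with Ward's criterion, stopped at k clusters.

    At each step, merge the pair of clusters whose union gives the smallest
    increase in within-cluster sum of squares.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([points[i] for i in clusters[a]])
                cb = centroid([points[i] for i in clusters[b]])
                na, nb = len(clusters[a]), len(clusters[b])
                cost = na * nb / (na + nb) * sqdist(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def mean_silhouette(points, labels):
    """Average silhouette value; points in singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(sqdist(p, q) ** 0.5)
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                             # mean distance within own cluster
        b = min(sum(d) / len(d) for d in dists.values())    # nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical profiles forming two well-separated groups (toy data).
profiles = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1),
            (5.0, 5.1), (5.1, 4.9), (4.9, 5.0)]
labels = ward(profiles, 2)
quality = mean_silhouette(profiles, labels)
```

A mean silhouette close to 1 indicates compact, well-separated clusters, while values near 0 suggest that the partition is unstable.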
6.
Neural Netw ; 16(3-4): 297-319, 2003.
Article in English | MEDLINE | ID: mdl-12672427

ABSTRACT

In the last decade, the use of neural networks (NN) and of other soft computing methods has begun to spread in the astronomical community as well, a community which, due to the required accuracy of the measurements, is usually reluctant to use automatic tools to perform even the most common tasks of data reduction and data mining. The federation of heterogeneous large astronomical databases, which is foreseen in the framework of the astrophysical virtual observatory and national virtual observatory projects, is, however, posing unprecedented data mining and visualization problems which will find a rather natural and user-friendly answer in artificial intelligence tools based on NNs, fuzzy sets or genetic algorithms. This review is aimed at both astronomers (who often have little knowledge of the methodological background) and computer scientists (who often know little about potentially interesting applications), and is therefore structured as follows: after giving a short introduction to the subject, we summarize the methodological background and focus our attention on some of the most interesting fields of application, namely: object extraction and classification, time series analysis, noise identification, and data mining. Most of the original work described in the paper has been performed in the framework of the AstroNeural collaboration (Napoli-Salerno).


Subject(s)
Astronomy/classification , Astronomy/methods , Neural Networks, Computer