1.
Genomics ; 94(6): 423-32, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19699293

ABSTRACT

Biomarker development for prediction of patient response to therapy is one of the goals of molecular profiling of human tissues. Due to the large number of transcripts, relatively limited number of samples, and high variability of data, identification of predictive biomarkers is a challenge for data analysis. Furthermore, many genes may be responsible for drug response differences, but often only a few are sufficient for accurate prediction. Here we present an analysis approach, the Convergent Random Forest (CRF) method, for the identification of highly predictive biomarkers. The aim is to select from genome-wide expression data a small number of non-redundant biomarkers that could be developed into a simple and robust diagnostic tool. Our method combines the Random Forest classifier and gene expression clustering to rank and select a small number of predictive genes. We evaluated the CRF approach by analyzing four different data sets. The first set contains transcript profiles of whole blood from rheumatoid arthritis patients, collected before anti-TNF treatment, and their subsequent response to the therapy. In this set, CRF identified 8 transcripts predicting response to therapy with 89% accuracy. We also applied the CRF to the analysis of three previously published expression data sets. For all sets, we have compared the CRF and recursive support vector machines (RSVM) approaches to feature selection and classification. In all cases the CRF selects a much smaller number of features (five to eight genes) while achieving similar or better performance on both training and independent testing sets of data. For both methods, performance estimates from cross-validation are similar to performance on independent samples. The method has been implemented in R and is available from the authors upon request: Jadwiga.Bienkowska@biogenidec.com.


Subject(s)
Algorithms , Antirheumatic Agents/pharmacology , Arthritis, Rheumatoid/drug therapy , Biomarkers/blood , Decision Trees , Drug Monitoring/methods , Gene Expression Profiling/methods , Genome-Wide Association Study , Tumor Necrosis Factor-alpha/antagonists & inhibitors , Adenocarcinoma/genetics , Antirheumatic Agents/therapeutic use , Arthritis, Rheumatoid/blood , Breast Neoplasms/pathology , Cluster Analysis , Disease Progression , Female , Humans , Leukemia, Myeloid, Acute/genetics , Male , Neoplasm Metastasis , Oligonucleotide Array Sequence Analysis , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Prognosis , Prostatic Neoplasms/genetics , Transcription, Genetic , Treatment Outcome
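
As a rough illustration of the selection idea in the abstract above (rank genes by classifier importance, then use expression clusters so the final panel is small and non-redundant), a minimal Python sketch follows. It is one plausible reading only, not the authors' R implementation: the importance scores and cluster labels are taken as given inputs, and the function name `select_nonredundant` and the toy values are hypothetical.

```python
def select_nonredundant(importance, cluster_of, k):
    """Pick up to k genes, best-first, keeping at most one gene per cluster.

    importance: dict gene -> importance score (assumed to come from a
                classifier such as a Random Forest; not computed here)
    cluster_of: dict gene -> expression-cluster label (assumed given)
    """
    chosen, used_clusters = [], set()
    # Visit genes from most to least important.
    for gene in sorted(importance, key=importance.get, reverse=True):
        if cluster_of[gene] in used_clusters:
            continue  # a more important gene already represents this cluster
        chosen.append(gene)
        used_clusters.add(cluster_of[gene])
        if len(chosen) == k:
            break
    return chosen


# Hypothetical toy data: g1 and g2 are co-expressed (same cluster).
importance = {"g1": 0.9, "g2": 0.8, "g3": 0.5, "g4": 0.4}
cluster_of = {"g1": "A", "g2": "A", "g3": "B", "g4": "C"}
panel = select_nonredundant(importance, cluster_of, k=2)
```

Here `g2` is skipped because it shares a cluster with the higher-ranked `g1`, so the second slot goes to a gene from a different cluster.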
2.
J Urol ; 180(3): 1126-30, 2008 Sep.
Article in English | MEDLINE | ID: mdl-18639284

ABSTRACT

PURPOSE: We identified significantly hypermethylated genes in clear cell renal cell carcinoma. MATERIALS AND METHODS: We previously identified a set of underexpressed genes in renal cell carcinoma tissue through transcriptional profiling and a robust computational screen. We selected 19 of these genes for hypermethylation analysis using a rigorous search for the best candidate regions, considering CpG islands and transcription factor binding sites. The genes were analyzed for hypermethylation in the DNA of 38 matched clear cell renal cell carcinoma and normal samples using matrix-assisted laser desorption ionization time-of-flight mass spectrometry. The significance of hypermethylation was assessed using 3 statistical tests. We validated the down-regulation of significantly hypermethylated genes at the RNA and protein levels in a separate set of patients using reverse transcriptase-polymerase chain reaction, immunohistochemistry and Western blots. RESULTS: We found 7 significantly hypermethylated regions from 6 down-regulated genes, including SFRP1, which was previously shown to be hypermethylated in renal cell carcinoma and other cancer types. CONCLUSIONS: To our knowledge, we report for the first time that another 5 genes (SCNN1B, SYT6, DACH1, and the tumor suppressors TFAP2A and MT1G) are hypermethylated in renal cell carcinoma. Robust computational screens and the high-throughput methylation assay resulted in an enriched set of novel genes that are epigenetically altered in clear cell renal cell carcinoma. Overall, the detection of hypermethylation in these highly down-regulated genes suggests that assaying for their methylation using cells from urine or blood could provide the basis for a viable diagnostic test.


Subject(s)
Biomarkers, Tumor/genetics , Carcinoma, Renal Cell/genetics , CpG Islands/genetics , DNA Methylation , Epigenesis, Genetic , Kidney Neoplasms/genetics , Blotting, Western , Down-Regulation , Epithelial Sodium Channels/genetics , Eye Proteins/genetics , Humans , Intercellular Signaling Peptides and Proteins/genetics , Membrane Proteins/genetics , Metallothionein/genetics , Reverse Transcriptase Polymerase Chain Reaction , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization , Synaptotagmins/genetics , Transcription Factor AP-2/genetics , Transcription Factors/genetics
3.
Cancer Res ; 67(22): 10669-76, 2007 Nov 15.
Article in English | MEDLINE | ID: mdl-18006808

ABSTRACT

Gene expression analysis has identified biologically relevant subclasses of breast cancer. However, most classification schemes do not robustly cluster all HER2+ breast cancers, in part due to limitations and bias of clustering techniques used. In this article, we propose an alternative approach that first separates the HER2+ tumors using a gene amplification signal for Her2/neu amplicon genes and then applies consensus ensemble clustering separately to the HER2+ and HER2- clusters to look for further substructure. We applied this procedure to a microarray data set of 286 early-stage breast cancers treated only with surgery and radiation and identified two basal and four luminal subtypes in the HER2- tumors, as well as two novel and robust HER2+ subtypes. HER2+ subtypes had median distant metastasis-free survival of 99 months [95% confidence interval (95% CI), 83-118 months] and 33 months (95% CI, 11-54 months), respectively, and recurrence rates of 11% and 58%, respectively. The low recurrence subtype had a strong relative overexpression of lymphocyte-associated genes and was also associated with a prominent lymphocytic infiltration on histologic analysis. These data suggest that early-stage HER2+ cancers associated with lymphocytic infiltration are a biologically distinct subtype with an improved natural history.


Subject(s)
Breast Neoplasms/metabolism , Gene Expression Regulation, Neoplastic , Lymphocytes/metabolism , Receptor, ErbB-2/biosynthesis , Cell Proliferation , Cluster Analysis , Computational Biology/methods , Databases, Genetic , Gene Expression Profiling , Humans , Multigene Family , Neoplasm Invasiveness , Principal Component Analysis , RNA, Messenger/metabolism , Recurrence
4.
BMC Bioinformatics ; 8: 291, 2007 Aug 06.
Article in English | MEDLINE | ID: mdl-17683614

ABSTRACT

BACKGROUND: Clustering analysis of microarray data is often criticized for giving ambiguous results because of sensitivity to data perturbation or clustering techniques used. In this paper, we describe a new method based on principal component analysis and ensemble consensus clustering that avoids these problems. RESULTS: We illustrate the method on a public microarray dataset from 36 breast cancer patients of whom 31 were diagnosed with at least two of three pathological stages of disease (atypical ductal hyperplasia (ADH), ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC)). Our method identifies an optimum set of genes and divides the samples into stable clusters which correlate with clinical classification into Luminal, Basal-like and Her2+ subtypes. Our analysis reveals a hierarchical portrait of breast cancer progression and identifies genes and pathways for each stage, grade and subtype. An intriguing observation is that the disease phenotype is distinguishable in ADH and progresses along distinct pathways for each subtype. The genetic signature for disease heterogeneity across subtypes is greater than the heterogeneity of progression from DCIS to IDC within a subtype, suggesting that the disease subtypes have distinct progression pathways. Our method identifies six disease subtype clusters and one normal cluster. The first split separates the normal samples from the cancer samples. Next, the cancer cluster splits into low grade (pathological grades 1 and 2) and high grade (pathological grades 2 and 3) while the normal cluster is unchanged. Further, the low grade cluster splits into two subclusters and the high grade cluster into four. The final six disease clusters are mapped into one Luminal A, three Luminal B, one Basal-like and one Her2+. CONCLUSION: We confirm that the cancer phenotype can be identified at an early stage because the genes altered in this stage progressively alter further as the disease progresses through DCIS into IDC. We identify six subtypes of disease which have distinct genetic signatures and remain separated in the clustering hierarchy. Our findings suggest that the heterogeneity of disease across subtypes is higher than the heterogeneity of the disease progression within a subtype, indicating that the subtypes are in fact distinct diseases.


Subject(s)
Biomarkers, Tumor/analysis , Breast Neoplasms/diagnosis , Breast Neoplasms/metabolism , Carcinoma, Ductal/diagnosis , Carcinoma, Ductal/metabolism , Gene Expression Profiling/methods , Neoplasm Proteins/analysis , Algorithms , Artificial Intelligence , Diagnosis, Computer-Assisted/methods , Disease Progression , Female , Humans , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated/methods , Principal Component Analysis , Reproducibility of Results , Sensitivity and Specificity
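
The abstract above does not say how the ensemble consensus is formed. One common generic device, shown here purely as an assumption, is a co-association table: for each pair of samples, the fraction of clustering runs (over data perturbations or different algorithms) that place the pair in the same cluster. The function name `coassociation` and the toy label vectors are hypothetical.

```python
from itertools import combinations


def coassociation(labelings):
    """Fraction of clustering runs in which each pair of samples co-clusters.

    labelings: list of label lists, one per run, all of the same length.
    Cluster label values need not match across runs; only agreement
    within a run matters.
    """
    n = len(labelings[0])
    counts = {pair: 0 for pair in combinations(range(n), 2)}
    for labels in labelings:
        for i, j in counts:
            if labels[i] == labels[j]:
                counts[(i, j)] += 1
    return {pair: c / len(labelings) for pair, c in counts.items()}


# Three hypothetical runs over four samples.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first run, labels swapped
    [0, 0, 0, 1],
]
agreement = coassociation(runs)
```

Pairs with agreement near 1 are stable across runs; a final consensus clustering could then be derived from this table, for example by hierarchical clustering on `1 - agreement`.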
5.
Genome Inform ; 18: 130-40, 2007.
Article in English | MEDLINE | ID: mdl-18546481

ABSTRACT

We describe a new method based on principal component analysis and robust consensus ensemble clustering to identify and elucidate the subtypes of breast cancer disease. The method was applied to microarray gene expression data using micro-dissection of samples from 36 breast cancer patients with at least two of three pathological stages of disease. Controls were normal breast epithelial cells from 3 disease-free patients. Our method identified an optimum set of genes and strong, stable clusters which correlated well with clinical classification into Luminal, Basal and Her2+ subtypes based on ER, PR and Her2 status. It also revealed a hierarchical portrait of disease progression through various grades and stages and identified genes and functional pathways for each stage, grade and disease subtype. We found that gene expression heterogeneity across subtypes is much greater than the heterogeneity of progression from DCIS to IDC within a subtype, suggesting that the disease subtypes are distinct disease processes. Averaging over data perturbations and clustering methods is critical in the robust identification of subtypes and gene markers for grade and progression.


Subject(s)
Breast Neoplasms/classification , Oligonucleotide Array Sequence Analysis , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Case-Control Studies , Disease Progression , Gene Expression Profiling , Genes, erbB-2 , Humans , Receptors, Estrogen/genetics
6.
Cancer Inform ; 3: 65-92, 2007 Feb 09.
Article in English | MEDLINE | ID: mdl-19455236

ABSTRACT

Microarray gene expression profiling has been used to distinguish histological subtypes of renal cell carcinoma (RCC), and consequently to identify specific tumor markers. The analytical procedures currently in use find sets of genes whose average differential expression across the two categories differs significantly. In general, each of the markers thus identified does not distinguish tumor from normal with 100% accuracy, although the group as a whole might be able to do so. For the purpose of developing a widely used, economically viable diagnostic signature, however, large groups of genes are not likely to be useful. Here we use two different methods, one a support vector machine variant, and the other an exhaustive search, to reanalyze data previously generated in our lab (Lenburg et al. 2003). We identify 158 genes, each having an expression level that is higher (lower) in every tumor sample than in any normal sample, and each having a minimum differential expression across the two categories at a significance of 0.01. The set is highly enriched in cancer-related genes (p = 1.6 × 10⁻¹²), containing 43 genes previously associated with either RCC or other types of cancer. Many of the biomarkers appear to be associated with the central alterations known to be required for cancer transformation. These include the oncogenes JAZF1, AXL, ABL2; tumor suppressors RASD1, PTPRO, TFAP2A, CDKN1C; and genes involved in proteolysis or cell-adhesion such as WASF2 and PAPPA.
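
The separation criterion stated in the abstract above (higher, or lower, in every tumor sample than in any normal sample) can be checked gene by gene. The sketch below shows only that criterion; it leaves out the additional significance requirement on the minimum differential expression, and the function name `perfectly_separating` and the toy expression values are hypothetical.

```python
def perfectly_separating(expr, tumor_idx, normal_idx):
    """Return genes whose tumor and normal expression ranges do not overlap.

    expr: dict gene -> list of expression values, one per sample
    tumor_idx, normal_idx: sample positions for each class
    """
    hits = {}
    for gene, values in expr.items():
        tumor = [values[i] for i in tumor_idx]
        normal = [values[i] for i in normal_idx]
        if min(tumor) > max(normal):
            hits[gene] = "up"    # higher in every tumor than in any normal
        elif max(tumor) < min(normal):
            hits[gene] = "down"  # lower in every tumor than in any normal
    return hits


# Hypothetical data: samples 0-1 are tumor, samples 2-3 are normal.
expr = {
    "geneA": [5.0, 6.0, 1.0, 2.0],
    "geneB": [1.0, 0.5, 3.0, 4.0],
    "geneC": [2.0, 5.0, 3.0, 4.0],  # ranges overlap, so not reported
}
hits = perfectly_separating(expr, tumor_idx=[0, 1], normal_idx=[2, 3])
```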

7.
Genome Inform ; 16(1): 245-53, 2005.
Article in English | MEDLINE | ID: mdl-16362927

ABSTRACT

High-throughput gene expression profiling can identify sets of genes that are differentially expressed between different phenotypes. Discovering marker genes is particularly important in diagnosis of a cancer phenotype. However, gene sets produced to date are too large to be economically viable diagnostics. We use a hybrid decision tree-discriminant analysis to identify small sets of genes, i.e. single genes and gene pairs, which separate normal samples from different stages of tumor samples. Half the samples are selected for training to form the probability distribution of expression values of each gene. The distributions for the tumor and normal phenotypes are then used to classify the test samples. The algorithm also identifies gene pairs by combining the probability distributions to construct a decision tree which is used to determine the class of test samples. After a series of training and testing sessions, genes and gene pairs that classify all samples correctly are recorded. The method was applied to a breast cancer data set, and classifier genes that distinguish normal breast from different stages of breast tumor were identified. The genes were ranked according to their minimum Euclidean distance between the expression values in tumor and normal samples. The algorithm picked known cancer-related genes and also found genes that were not identified as differentially expressed by t-test with a 2-fold cut-off. Overall, the method generates possible diagnostic genes and gene pairs for a specific disease phenotype to pursue further biological interpretations in cancer biology.


Subject(s)
Biomarkers, Tumor/genetics , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Data Interpretation, Statistical , Decision Trees , Genes, Neoplasm , Algorithms , Breast Neoplasms/classification , Breast Neoplasms/metabolism , Endothelin-3/genetics , Female , Gene Expression Profiling , Humans , Hyperplasia/genetics , Hyperplasia/metabolism , Hyperplasia/pathology , Mammary Glands, Human/metabolism , Neoplasm Invasiveness/genetics , Neoplasm Staging , Phenotype , Precancerous Conditions/genetics , Probability
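
The single-gene step described in the abstract above (estimate a per-class distribution of expression values from training samples, then assign each test sample to the more likely class) can be sketched as follows. The abstract does not state the form of the distribution; a Gaussian per class is assumed here for illustration only, the gene-pair decision tree is not shown, and all names and values are hypothetical.

```python
import math
from statistics import mean, stdev


def fit_gaussian(values):
    """Estimate (mean, standard deviation) from training expression values."""
    return mean(values), stdev(values)


def log_density(x, mu, sigma):
    # Gaussian log-density up to an additive constant shared by both classes.
    return -math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2


def classify(x, tumor_params, normal_params):
    """Assign x to whichever class gives it the higher (log-)density."""
    if log_density(x, *tumor_params) > log_density(x, *normal_params):
        return "tumor"
    return "normal"


# Hypothetical training values for one gene (the "training half").
train_tumor = [8.1, 7.9, 8.4, 8.0]
train_normal = [3.0, 3.3, 2.8, 3.1]
tumor_params = fit_gaussian(train_tumor)
normal_params = fit_gaussian(train_normal)

# Hypothetical test values for the same gene.
predictions = [classify(x, tumor_params, normal_params) for x in [7.5, 3.4]]
```

Repeating this over several random train/test splits and keeping only genes that classify every test sample correctly would mirror the "series of training and testing sessions" the abstract mentions.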