Results 1 - 20 of 23
1.
Cancer Res ; 79(1): 263-273, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30487137

ABSTRACT

Low-dose CT (LDCT) is widely accepted as the preferred method for detecting pulmonary nodules. However, the determination of whether a nodule is benign or malignant involves either repeated scans or invasive procedures that sample the lung tissue. Noninvasive methods to assess these nodules are needed to reduce unnecessary invasive tests. In this study, we have developed a pulmonary nodule classifier (PNC) using RNA from whole blood collected in RNA-stabilizing PAXgene tubes that addresses this need. Samples were prospectively collected from high-risk and incidental subjects with a positive lung CT scan. A total of 821 samples from 5 clinical sites were analyzed. Malignant samples were predominantly stage 1 by pathologic diagnosis and 97% of the benign samples were confirmed by 4 years of follow-up. A panel of diagnostic biomarkers was selected from a subset of the samples assayed on Illumina microarrays that achieved a ROC-AUC of 0.847 on independent validation. The microarray data were then used to design a biomarker panel of 559 gene probes to be validated on the clinically tested NanoString nCounter platform. RNA from 583 patients was used to assess and refine the NanoString PNC (nPNC), which was then validated on 158 independent samples (ROC-AUC = 0.825). The nPNC outperformed three clinical algorithms in discriminating malignant from benign pulmonary nodules ranging from 6-20 mm using just 41 diagnostic biomarkers. Overall, this platform provides an accurate, noninvasive method for the diagnosis of pulmonary nodules in patients with non-small cell lung cancer. SIGNIFICANCE: These findings describe a minimally invasive and clinically practical pulmonary nodule classifier that has good diagnostic ability at distinguishing benign from malignant pulmonary nodules.


Subject(s)
Biomarkers, Tumor/genetics , Carcinoma, Non-Small-Cell Lung/diagnosis , Gene Expression Profiling , Lung Neoplasms/diagnosis , Multiple Pulmonary Nodules/diagnosis , Tomography, X-Ray Computed/methods , Aged , Algorithms , Biomarkers, Tumor/blood , Carcinoma, Non-Small-Cell Lung/blood , Carcinoma, Non-Small-Cell Lung/diagnostic imaging , Carcinoma, Non-Small-Cell Lung/genetics , Diagnosis, Differential , Female , Gene Expression Regulation, Neoplastic , Humans , Lung Neoplasms/blood , Lung Neoplasms/diagnostic imaging , Lung Neoplasms/genetics , Male , Middle Aged , Multiple Pulmonary Nodules/blood , Multiple Pulmonary Nodules/diagnostic imaging , Multiple Pulmonary Nodules/genetics , Prospective Studies
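
The ROC-AUC values quoted in the abstract above (0.847 and 0.825) can be read as the probability that a randomly chosen malignant sample receives a higher classifier score than a randomly chosen benign one. A minimal sketch of that calculation follows; the function name and the toy scores are hypothetical and are not taken from the study.

```python
def roc_auc(scores_malignant, scores_benign):
    """Rank-based ROC-AUC: fraction of (malignant, benign) pairs in which
    the malignant sample scores higher; ties count as half a win."""
    wins = 0.0
    for m in scores_malignant:
        for b in scores_benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(scores_malignant) * len(scores_benign))


# Hypothetical classifier scores, for illustration only.
toy_malignant = [0.9, 0.8, 0.4]
toy_benign = [0.7, 0.3, 0.2]
print(roc_auc(toy_malignant, toy_benign))
```

The pairwise form is quadratic in the number of samples, which is acceptable for cohorts of a few hundred; a sort-based version would be preferable for larger data.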
2.
Mol Ecol ; 23(22): 5524-37, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25314618

ABSTRACT

Hibernation is an energy-saving adaptation that involves a profound suppression of physical activity that can continue for 6-8 months in highly seasonal environments. While immobility and disuse generate muscle loss in most mammalian species, in contrast, hibernating bears and ground squirrels demonstrate limited muscle atrophy over the prolonged periods of physical inactivity during winter, suggesting that hibernating mammals have adaptive mechanisms to prevent disuse muscle atrophy. To identify common transcriptional programmes that underlie molecular mechanisms preventing muscle loss, we conducted a large-scale gene expression screen in hind limb muscles comparing hibernating and summer-active black bears and arctic ground squirrels using custom 9600 probe cDNA microarrays. A molecular pathway analysis showed an elevated proportion of overexpressed genes involved in all stages of protein biosynthesis and ribosome biogenesis in muscle of both species during torpor of hibernation that suggests induction of translation at different hibernation states. The induction of protein biosynthesis probably contributes to attenuation of disuse muscle atrophy through the prolonged periods of immobility of hibernation. The lack of directional changes in genes of protein catabolic pathways does not support the importance of metabolic suppression for preserving muscle mass during winter. Coordinated reduction in multiple genes involved in oxidation-reduction and glucose metabolism detected in both species is consistent with metabolic suppression and lower energy demand in skeletal muscle during inactivity of hibernation.


Subject(s)
Adaptation, Physiological/genetics , Comparative Genomic Hybridization , Hibernation , Muscular Atrophy/genetics , Sciuridae/genetics , Ursidae/genetics , Animals , Male , Oligonucleotide Array Sequence Analysis , Protein Biosynthesis , Transcriptome
4.
Oncoimmunology ; 1(8): 1414-1416, 2012 Nov 01.
Article in English | MEDLINE | ID: mdl-23243612

ABSTRACT

Attempts to refine and improve outcome predictions using tumor gene expression have been recently reported. We show that peripheral blood mononuclear cell (PBMC)-associated gene signatures can predict outcome in non-small cell lung carcinoma patients independent of demographic data or TNM staging, and that this information may persist after tumor resection.

5.
PLoS One ; 7(3): e34392, 2012.
Article in English | MEDLINE | ID: mdl-22479623

ABSTRACT

Prediction of cancer recurrence in patients with non-small cell lung cancer (NSCLC) currently relies on the assessment of clinical characteristics including age, tumor stage, and smoking history. A better prediction of early stage cancer patients with poorer survival and late stage patients with better survival is needed to design patient-tailored treatment protocols. We analyzed gene expression in RNA from peripheral blood mononuclear cells (PBMC) of NSCLC patients to identify signatures predictive of overall patient survival. We find that PBMC gene expression patterns from NSCLC patients, like patterns from tumors, have information predictive of patient outcomes. We identify and validate a 26 gene prognostic panel that is independent of clinical stage. Many additional prognostic genes are specific to myeloid cells and are more highly expressed in patients with shorter survival. We also observe that significant numbers of prognostic genes change expression levels in PBMC collected after tumor resection. These post-surgery gene expression profiles may provide a means to re-evaluate prognosis over time. These studies further suggest that patient outcomes are not solely determined by tumor gene expression profiles but can also be influenced by the immune response as reflected in peripheral immune cells.


Subject(s)
Carcinoma, Non-Small-Cell Lung/diagnosis , Gene Expression Regulation, Neoplastic , Leukocytes, Mononuclear/metabolism , Lung Neoplasms/diagnosis , Aged , Aged, 80 and over , Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Non-Small-Cell Lung/pathology , Female , Gene Expression Profiling , Humans , Leukocytes, Mononuclear/pathology , Lung/immunology , Lung/metabolism , Lung/pathology , Lung Neoplasms/genetics , Lung Neoplasms/pathology , Male , Middle Aged , Prognosis , Survival Analysis
6.
PLoS One ; 7(2): e31241, 2012.
Article in English | MEDLINE | ID: mdl-22359580

ABSTRACT

Inflammatory Bowel Disease, comprising Crohn's Disease and Ulcerative Colitis (UC), is a complex, multi-factorial inflammatory disorder of the gastrointestinal tract. In this study we have explored the utility of naturally occurring circulating miRNAs as potential blood-based biomarkers for non-invasive prediction of UC incidences. Whole genome maps of circulating miRNAs in micro-vesicles, Peripheral Blood Mononuclear Cells and platelets have been constructed from a cohort of 20 UC patients and 20 normal individuals. Through Significance Analysis of Microarrays, a signature of 31 differentially expressed platelet-derived miRNAs has been identified and biomarker performance estimated through a non-probabilistic binary linear classification using Support Vector Machines. Through this approach, classifier measurements reveal a predictive score of 92.8% accuracy, 96.2% specificity and 89.5% sensitivity in distinguishing UC patients from normal individuals. Additionally, the platelet-derived biomarker signature can be validated at 88% accuracy through qPCR assays, and a majority of the miRNAs in this panel can be demonstrated to sub-stratify into 4 highly correlated intensity-based clusters. Analysis of predicted targets of these biomarkers reveals an enrichment of pathways associated with cytoskeleton assembly, transport, membrane permeability and regulation of transcription factors engaged in a variety of regulatory cascades that are consistent with a cell-mediated immune response model of intestinal inflammation. Interestingly, comparison of the miRNA biomarker panel and genetic loci implicated in IBD through genome-wide association studies identifies a physical linkage between hsa-miR-941 and a UC susceptibility locus located on Chr 20. Taken together, analysis of these expression maps outlines a promising catalog of novel platelet-derived miRNA biomarkers of clinical utility and provides insight into the potential biological function of these candidates in disease pathogenesis.


Subject(s)
Colitis, Ulcerative/diagnosis , Genome-Wide Association Study , MicroRNAs/blood , Biomarkers/blood , Case-Control Studies , Humans , Inflammation/immunology , Predictive Value of Tests , Sensitivity and Specificity , Support Vector Machine
7.
Funct Integr Genomics ; 12(2): 357-65, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22351243

ABSTRACT

Physical inactivity reduces mechanical load on the skeleton, which leads to losses of bone mass and strength in non-hibernating mammalian species. Although bears are largely inactive during hibernation, they show no loss in bone mass and strength. To obtain insight into molecular mechanisms preventing disuse bone loss, we conducted a large-scale screen of transcriptional changes in trabecular bone comparing winter hibernating and summer non-hibernating black bears using a custom 12,800 probe cDNA microarray. A total of 241 genes were differentially expressed (P < 0.01 and fold change >1.4) in the ilium bone of bears between winter and summer. The Gene Ontology and Gene Set Enrichment Analysis showed an elevated proportion in hibernating bears of overexpressed genes in six functional sets of genes involved in anabolic processes of tissue morphogenesis and development including skeletal development, cartilage development, and bone biosynthesis. Apoptosis genes demonstrated a tendency for downregulation during hibernation. No coordinated directional changes were detected for genes involved in bone resorption, although some genes responsible for osteoclast formation and differentiation (Ostf1, Rab9a, and c-Fos) were significantly underexpressed in bone of hibernating bears. Elevated expression of multiple anabolic genes without induction of bone resorption genes, and the down regulation of apoptosis-related genes, likely contribute to the adaptive mechanism that preserves bone mass and structure through prolonged periods of immobility during hibernation.


Subject(s)
Hibernation/genetics , Ilium/anatomy & histology , Ilium/physiology , Up-Regulation , Ursidae/physiology , Animals , Apoptosis/genetics , Biosynthetic Pathways/genetics , Bone Resorption/genetics , Gene Expression , Gene Expression Profiling , Gene Expression Regulation , Genes , Ilium/metabolism , Male , Oligonucleotide Array Sequence Analysis , Organ Size , Osteogenesis/genetics , Ursidae/genetics , Ursidae/metabolism
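
The selection rule stated in the abstract above (P < 0.01 and fold change > 1.4) can be written as a simple filter. This is a sketch only: it assumes the fold-change cutoff applies in either direction (up or down in winter), which the abstract does not spell out, and the function name, argument names and example values are hypothetical.

```python
def is_differentially_expressed(p_value, winter_mean, summer_mean,
                                p_cutoff=0.01, fold_cutoff=1.4):
    """Return True when a gene passes both the P-value and fold-change cutoffs.

    Assumption: the fold change is symmetric, so a gene that is 1.4-fold
    lower in winter passes just like one that is 1.4-fold higher.
    Means are expected on a linear (not log) scale and must be positive.
    """
    ratio = winter_mean / summer_mean
    fold_change = ratio if ratio >= 1.0 else 1.0 / ratio
    return p_value < p_cutoff and fold_change > fold_cutoff


# Hypothetical values, for illustration only.
print(is_differentially_expressed(0.004, 210.0, 120.0))  # 1.75-fold up
print(is_differentially_expressed(0.004, 130.0, 120.0))  # ~1.08-fold
print(is_differentially_expressed(0.200, 210.0, 120.0))  # P too large
```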
8.
Clin Cancer Res ; 17(18): 5867-77, 2011 Sep 15.
Article in English | MEDLINE | ID: mdl-21807633

ABSTRACT

PURPOSE: To characterize the interactions of non-small cell lung cancer (NSCLC) tumors with the immune system at the level of mRNA and microRNA (miRNA) expression and to define expression signatures that characterize the presence of a malignant tumor versus a nonmalignant nodule. EXPERIMENTAL DESIGN: We have examined the changes of both mRNA and miRNA expression levels in peripheral blood mononuclear cells (PBMC) between paired samples collected from NSCLC patients before and after tumor removal using Illumina gene expression arrays. RESULTS: We found that malignant tumor removal significantly changes expression of more than 3,000 protein-coding genes, especially genes in pathways associated with suppression of the innate immune response, including natural killer cell signaling and apoptosis-associated ceramide signaling. Binding sites for the ETS domain transcription factors ELK1, ELK4, and SPI1 were enriched in promoter regions of genes upregulated in the presence of a tumor. Additional important regulators included five miRNAs expressed at significantly higher levels before tumor removal. Repressed protein-coding targets of those miRNAs included many transcription factors, several involved in immunologically important pathways. Although there was a significant overlap in the effects of malignant tumors and benign lung nodules on PBMC gene expression, we identified one gene panel which indicates a tumor or nodule presence and a second panel that can distinguish malignant from nonmalignant nodules. CONCLUSIONS: A tumor presence in the lung influences mRNA and miRNA expression in PBMC and this influence is reversed by tumor removal. These results suggest that PBMC gene expression signatures could be used for lung cancer diagnosis.


Subject(s)
Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Non-Small-Cell Lung/immunology , Gene Expression Regulation, Neoplastic/immunology , Lung Neoplasms/genetics , Lung Neoplasms/immunology , Aged , Aged, 80 and over , Binding Sites/genetics , Carcinoma, Non-Small-Cell Lung/surgery , Cluster Analysis , Female , Gene Expression Profiling , Humans , Leukocytes, Mononuclear/metabolism , Lung Neoplasms/surgery , Lymphocyte Activation/genetics , Lymphocyte Subsets/metabolism , Male , MicroRNAs/genetics , Middle Aged , Models, Biological , Organ Specificity/genetics , Promoter Regions, Genetic , Transcription Factors/metabolism
9.
BMC Genomics ; 12: 171, 2011 03 31.
Article in English | MEDLINE | ID: mdl-21453527

ABSTRACT

BACKGROUND: Hibernation is an adaptive strategy to survive in highly seasonal or unpredictable environments. The molecular and genetic basis of hibernation physiology in mammals has only recently been studied using large scale genomic approaches. We analyzed gene expression in the American black bear, Ursus americanus, using a custom 12,800 cDNA probe microarray to detect differences in expression that occur in heart and liver during winter hibernation in comparison to summer active animals. RESULTS: We identified 245 genes in heart and 319 genes in liver that were differentially expressed between winter and summer. The expression of 24 genes was significantly elevated during hibernation in both heart and liver. These genes are mostly involved in lipid catabolism and protein biosynthesis and include RNA binding protein motif 3 (Rbm3), which enhances protein synthesis at mildly hypothermic temperatures. Elevated expression of protein biosynthesis genes suggests induction of translation that may be related to adaptive mechanisms reducing cardiac and muscle atrophies over extended periods of low metabolism and immobility during hibernation in bears. Coordinated reduction of transcription of genes involved in amino acid catabolism suggests redirection of amino acids from catabolic pathways to protein biosynthesis. We identify transcriptional changes in the liver that are common to black bears and small mammalian hibernators; these include induction of genes responsible for fatty acid β-oxidation and carbohydrate synthesis and depression of genes involved in lipid biosynthesis, carbohydrate catabolism, cellular respiration and detoxification pathways. CONCLUSIONS: Our findings show that modulation of gene expression during winter hibernation represents a molecular mechanism of adaptation to extreme environments.


Subject(s)
Heart/physiology , Hibernation/physiology , Liver/physiology , Ursidae/genetics , Adaptation, Physiological , Animals , DNA, Complementary/genetics , Gene Expression Profiling , Gene Expression Regulation , Hibernation/genetics , Male , Oligonucleotide Array Sequence Analysis , Seasons , Ursidae/physiology
10.
Cancer Res ; 70(23): 9991-10001, 2010 Dec 01.
Article in English | MEDLINE | ID: mdl-21118961

ABSTRACT

Identifying the functions of proteins, which associate with specific subnuclear structures, is critical to understanding eukaryotic nuclear dynamics. Sp100 is a prototypical protein of ND10/PML nuclear bodies, which colocalizes with Daxx and the proto-oncogenic PML. Sp100 isoforms contain SAND, PHD, Bromo, and HMG domains and are highly sumoylated, all characteristics suggestive of a role in chromatin-mediated gene regulation. A role for Sp100 in oncogenesis has not been defined previously. Using selective Sp100 isoform-knockdown approaches, we show that normal human diploid fibroblasts with reduced Sp100 levels rapidly senesce. Subsequently, small rapidly dividing Sp100 minus cells emerge from the senescing fibroblasts and are found to be highly tumorigenic in nude mice. The derivation of these tumorigenic cells from the parental fibroblasts is confirmed by microsatellite analysis. The small rapidly dividing Sp100 minus cells now also lack ND10/PML bodies, and exhibit genomic instability and p53 cytoplasmic sequestration. They have also activated MYC, RAS, and TERT pathways and express mesenchymal to epithelial transdifferentiation (MET) markers. Reintroduction of expression of only the Sp100A isoform is sufficient to maintain senescence and to inhibit emergence of the highly tumorigenic cells. Global transcriptome studies, quantitative PCR, and protein studies, as well as immunolocalization studies during the course of the transformation, reveal that a transient expression of stem cell markers precedes the malignant transformation. These results identify a role for Sp100 as a tumor suppressor in addition to its role in maintaining ND10/PML bodies and in the epigenetic regulation of gene expression.


Subject(s)
Antigens, Nuclear/genetics , Autoantigens/genetics , Embryonic Stem Cells/metabolism , Fibroblasts/metabolism , Tumor Suppressor Proteins/genetics , Animals , Antigens, Nuclear/metabolism , Autoantigens/metabolism , Blotting, Western , Cell Transformation, Neoplastic/genetics , Cells, Cultured , Cellular Senescence/genetics , Epithelial-Mesenchymal Transition/genetics , Fibroblasts/cytology , Gene Expression Profiling , HEK293 Cells , Humans , Male , Mice , Mice, Nude , Neoplasms, Experimental/genetics , Neoplasms, Experimental/metabolism , Neoplasms, Experimental/pathology , Nuclear Proteins/metabolism , Oligonucleotide Array Sequence Analysis , Promyelocytic Leukemia Protein , Proto-Oncogene Proteins c-myc/metabolism , RNA Interference , Reverse Transcriptase Polymerase Chain Reaction , Transcription Factors/metabolism , Transplantation, Heterologous , Tumor Suppressor Proteins/metabolism , ras Proteins/metabolism
11.
Cancer Res ; 69(24): 9202-10, 2009 Dec 15.
Article in English | MEDLINE | ID: mdl-19951989

ABSTRACT

Early diagnosis of lung cancer followed by surgery presently is the most effective treatment for non-small cell lung cancer (NSCLC). An accurate, minimally invasive test that could detect early disease would permit timely intervention and potentially reduce mortality. Recent studies have shown that the peripheral blood can carry information related to the presence of disease, including prognostic information and information on therapeutic response. We have analyzed gene expression in peripheral blood mononuclear cell samples including 137 patients with NSCLC tumors and 91 patient controls with nonmalignant lung conditions, including histologically diagnosed benign nodules. Subjects were primarily smokers and former smokers. We have identified a 29-gene signature that separates these two patient classes with 86% accuracy (91% sensitivity, 80% specificity). Accuracy in an independent validation set, including samples from a new location, was 78% (sensitivity of 76% and specificity of 82%). An analysis of this NSCLC gene signature in 18 NSCLCs taken presurgery, with matched samples from 2 to 5 months postsurgery, showed that in 78% of cases, the signature was reduced postsurgery and disappeared entirely in 33%. Our results show the feasibility of using peripheral blood gene expression signatures to identify early-stage NSCLC in at-risk populations.


Subject(s)
Carcinoma, Non-Small-Cell Lung/diagnosis , Leukocytes, Mononuclear/physiology , Lung Diseases/diagnosis , Lung Neoplasms/diagnosis , Adult , Aged , Aged, 80 and over , Carcinoma, Non-Small-Cell Lung/blood , Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Non-Small-Cell Lung/immunology , Diagnosis, Differential , Early Detection of Cancer/methods , Female , Gene Expression Profiling , Humans , Leukocytes, Mononuclear/immunology , Leukocytes, Mononuclear/metabolism , Lung Diseases/blood , Lung Diseases/genetics , Lung Diseases/immunology , Lung Neoplasms/blood , Lung Neoplasms/genetics , Lung Neoplasms/immunology , Male , Middle Aged , Smoking/adverse effects
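
The training-set figures in the abstract above are internally consistent, which can be checked with a few lines of arithmetic: overall accuracy is the class-size-weighted average of sensitivity and specificity. The sketch below uses only the numbers quoted in the abstract (137 NSCLC patients, 91 controls, 91% sensitivity, 80% specificity); the function name is hypothetical.

```python
def implied_accuracy(n_cases, n_controls, sensitivity, specificity):
    """Overall accuracy implied by sensitivity, specificity and class sizes."""
    correct = sensitivity * n_cases + specificity * n_controls
    return correct / (n_cases + n_controls)


# Figures quoted in the abstract for the 29-gene signature.
accuracy = implied_accuracy(137, 91, 0.91, 0.80)
print(round(accuracy, 3))  # about 0.866, in line with the reported 86%
```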
12.
BMC Bioinformatics ; 10: 337, 2009 Oct 15.
Article in English | MEDLINE | ID: mdl-19832995

ABSTRACT

BACKGROUND: Classification using microarray datasets is usually based on a small number of samples for which tens of thousands of gene expression measurements have been obtained. The selection of the genes most significant to the classification problem is a challenging issue in high dimension data analysis and interpretation. A previous study with SVM-RCE (Recursive Cluster Elimination) suggested that classification based on groups of correlated genes sometimes exhibits better performance than classification using single genes. Large databases of gene interaction networks provide an important resource for the analysis of genetic phenomena and for classification studies using interacting genes. We now demonstrate that an algorithm which integrates network information with recursive feature elimination based on SVM exhibits good performance and improves the biological interpretability of the results. We refer to the method as SVM with Recursive Network Elimination (SVM-RNE). RESULTS: Initially, one thousand genes selected by t-test from a training set are filtered so that only genes that map to a gene network database remain. The Gene Expression Network Analysis Tool (GXNA) is applied to the remaining genes to form n clusters of genes that are highly connected in the network. Linear SVM is used to classify the samples using these clusters, and a weight is assigned to each cluster based on its importance to the classification. The least informative clusters are removed while retaining the remainder for the next classification step. This process is repeated until an optimal classification is obtained. CONCLUSION: More than 90% accuracy can be obtained in classification of selected microarray datasets by integrating the interaction network information with the gene expression information from the microarrays. The Matlab version of SVM-RNE can be downloaded from http://web.macam.ac.il/~myousef.


Subject(s)
Biomarkers , Gene Regulatory Networks , Oligonucleotide Array Sequence Analysis/methods , Artificial Intelligence , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis/classification , Pattern Recognition, Automated
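
The first step described in the abstract above (rank genes by t-test on the training set, keep the top one thousand, then retain only those that map to a network database) might look roughly like the following. This is a sketch under stated assumptions, not the authors' Matlab implementation: the abstract does not say which t-test variant was used, so Welch's statistic is assumed here, and the function names, data layout and toy values are hypothetical.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic for two lists of expression values (>= 2 values each)."""
    se2 = (statistics.variance(group_a) / len(group_a)
           + statistics.variance(group_b) / len(group_b))
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    if se2 == 0:
        # Degenerate case: no within-group variation at all.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(se2)


def select_network_genes(expr_a, expr_b, network_genes, top_k=1000):
    """Rank genes by |t|, keep the top_k, then drop genes absent from the network.

    expr_a / expr_b map gene name -> expression values in class A / class B.
    """
    ranked = sorted(expr_a,
                    key=lambda g: abs(welch_t(expr_a[g], expr_b[g])),
                    reverse=True)
    return [g for g in ranked[:top_k] if g in network_genes]


# Toy data, for illustration only.
class_a = {"A": [1.0, 1.1, 0.9], "B": [1.0, 2.0, 3.0], "C": [5.0, 5.1, 4.9]}
class_b = {"A": [3.0, 3.1, 2.9], "B": [1.1, 2.1, 2.9], "C": [1.0, 1.1, 0.9]}
print(select_network_genes(class_a, class_b, {"A", "B"}, top_k=2))
```

In the toy data, gene C ranks highest but is dropped because it is not in the (hypothetical) network set, and gene B is in the network but does not make the top-k cut.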
13.
Physiol Genomics ; 37(2): 108-18, 2009 04 10.
Article in English | MEDLINE | ID: mdl-19240299

ABSTRACT

We conducted a large-scale gene expression screen using the 3,200 cDNA probe microarray developed specifically for Ursus americanus to detect expression differences in liver and skeletal muscle that occur during winter hibernation compared with animals sampled during summer. The expression of 12 genes, including RNA binding protein motif 3 (Rbm3), that are mostly involved in protein biosynthesis, was induced during hibernation in both liver and muscle. The Gene Ontology and Gene Set Enrichment analysis consistently showed a highly significant enrichment of the protein biosynthesis category by overexpressed genes in both liver and skeletal muscle during hibernation. Coordinated induction in transcriptional level of genes involved in protein biosynthesis is a distinctive feature of the transcriptome in hibernating black bears. This finding implies induction of translation and suggests an adaptive mechanism that contributes to a unique ability to reduce muscle atrophy over prolonged periods of immobility during hibernation. Comparing expression profiles in bears to small mammalian hibernators shows a general trend during hibernation of transcriptional changes that include induction of genes involved in lipid metabolism and carbohydrate synthesis as well as depression of genes involved in the urea cycle and detoxification function in liver.


Subject(s)
Gene Expression Profiling , Hibernation/genetics , Liver/metabolism , Muscle, Skeletal/metabolism , Protein Biosynthesis/genetics , Ursidae/genetics , Animals , Basal Metabolism , Body Temperature , Gene Library , Genomics/methods , Male , Oligonucleotide Array Sequence Analysis , Reverse Transcriptase Polymerase Chain Reaction , Ursidae/metabolism , Ursidae/physiology
14.
Algorithms Mol Biol ; 3: 2, 2008 Jan 28.
Article in English | MEDLINE | ID: mdl-18226233

ABSTRACT

BACKGROUND: The application of machine learning to classification problems that depend only on positive examples is gaining attention in the computational biology community. We and others have described the use of two-class machine learning to identify novel miRNAs. These methods require the generation of an artificial negative class. However, designation of the negative class can be problematic and, if it is not properly done, can affect the performance of the classifier dramatically and/or yield a biased estimate of performance. We present a study using one-class machine learning for microRNA (miRNA) discovery and compare one-class to two-class approaches using naïve Bayes and Support Vector Machines. These results are compared to published two-class miRNA prediction approaches. We also examine the ability of the one-class and two-class techniques to identify miRNAs in newly sequenced species. RESULTS: Of all methods tested, we found that two-class naïve Bayes and Support Vector Machines gave the best accuracy using our selected features and optimally chosen negative examples. One-class methods showed average accuracies of 70-80% versus 90% for the two two-class methods on the same feature sets. However, some one-class methods outperform some recently published two-class approaches with different selected features. Using the EBV genome as an external validation of the method, we found one-class machine learning to work as well as or better than a two-class approach in identifying true miRNAs as well as predicting new miRNAs. CONCLUSION: One-class and two-class methods can both give useful classification accuracies when the negative class is well characterized. The advantage of one-class methods is that they eliminate guessing at the optimal features for the negative class when they are not well defined. In these cases one-class methods can be superior to two-class methods when the features which are chosen as representative of the positive class are well defined.
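
To make the one-class idea in the abstract above concrete: a one-class model is fitted on positive examples only and accepts a new item if it looks sufficiently like them, so no artificial negative set is needed. The sketch below uses a deliberately simple stand-in (distance to the centroid of the positives, with a radius taken from the training distances); it is not one of the methods evaluated in the paper, and the feature vectors and names are hypothetical.

```python
import math


def fit_one_class(positives, quantile=0.9):
    """Fit a centroid and acceptance radius from positive examples only."""
    n = len(positives)
    dim = len(positives[0])
    centroid = [sum(p[i] for p in positives) / n for i in range(dim)]
    distances = sorted(math.dist(p, centroid) for p in positives)
    # Radius = distance at the requested quantile of the training distances.
    radius = distances[min(n - 1, int(quantile * n))]
    return centroid, radius


def is_positive(model, x):
    """Accept x when it lies within the learned radius of the centroid."""
    centroid, radius = model
    return math.dist(x, centroid) <= radius


# Hypothetical two-feature vectors for known positives, for illustration only.
known_positives = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.0)]
model = fit_one_class(known_positives)
print(is_positive(model, (1.0, 1.0)))
print(is_positive(model, (5.0, 5.0)))
```

A two-class method would instead need labelled negatives to place a boundary between the classes, which is exactly the step the abstract identifies as problematic.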

15.
Bioinformatics ; 23(22): 2987-92, 2007 Nov 15.
Article in English | MEDLINE | ID: mdl-17925304

ABSTRACT

MOTIVATION: Most computational methodologies for miRNA:mRNA target gene prediction use the seed segment of the miRNA and require cross-species sequence conservation in this region of the mRNA target. Methods that do not rely on conservation generate too many predictions to validate. We describe a target prediction method (NBmiRTar) that does not require sequence conservation, using instead machine learning with a naïve Bayes classifier. It generates a model from sequence and miRNA:mRNA duplex information from validated targets and artificially generated negative examples. Both the 'seed' and 'out-seed' segments of the miRNA:mRNA duplex are used for target identification. RESULTS: The application of machine-learning techniques to the features we have used is a useful and general approach for microRNA target gene prediction. Our technique produces fewer false positive predictions and fewer target candidates to be tested. It exhibits higher sensitivity and specificity than algorithms that rely on conserved genomic regions to decrease false positive predictions.


Subject(s)
Artificial Intelligence , Gene Targeting/methods , MicroRNAs/genetics , Pattern Recognition, Automated/methods , RNA Probes/genetics , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Algorithms , Base Sequence , Bayes Theorem , Molecular Sequence Data
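
A naïve Bayes classifier of the kind named in the abstract above scores each class by multiplying its prior by per-feature likelihoods estimated from labelled examples. The following is a generic sketch with categorical features and Laplace smoothing; it is not NBmiRTar itself, and the two features ("seed match" and "duplex energy"), their values and the function names are hypothetical placeholders.

```python
import math
from collections import Counter


def train_naive_bayes(samples, labels, alpha=1.0):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    n_features = len(samples[0])
    value_counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    vocab = [set() for _ in range(n_features)]
    for x, y in zip(samples, labels):
        for i, v in enumerate(x):
            value_counts[y][i][v] += 1
            vocab[i].add(v)
    return class_counts, value_counts, vocab, alpha


def predict(model, x):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, value_counts, vocab, alpha = model
    total = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for c, n_c in class_counts.items():
        logp = math.log(n_c / total)
        for i, v in enumerate(x):
            # Laplace-smoothed estimate of P(feature i = v | class c).
            logp += math.log((value_counts[c][i][v] + alpha)
                             / (n_c + alpha * len(vocab[i])))
        if logp > best_logp:
            best_label, best_logp = c, logp
    return best_label


# Hypothetical training examples: (seed match, duplex energy).
X = [("perfect", "low"), ("perfect", "low"), ("perfect", "high"),
     ("mismatch", "high"), ("mismatch", "high"), ("perfect", "high")]
y = ["target", "target", "target", "non-target", "non-target", "non-target"]
nb = train_naive_bayes(X, y)
print(predict(nb, ("perfect", "low")))
print(predict(nb, ("mismatch", "high")))
```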
16.
Clin Cancer Res ; 13(10): 2905-15, 2007 May 15.
Article in English | MEDLINE | ID: mdl-17504990

ABSTRACT

PURPOSE: The risk of developing metastatic squamous cell carcinoma for patients with head and neck squamous cell carcinoma (HNSCC) is very high. Because these patients are often heavy tobacco users, they are also at risk for developing a second primary cancer, with squamous cell carcinoma of the lung (LSCC) being the most common. The distinction between a lung metastasis and a primary LSCC is currently based on certain clinical and histologic criteria, although the accuracy of this approach remains in question. EXPERIMENTAL DESIGN: Gene expression patterns derived from 28 patients with HNSCC or LSCC from a single center were analyzed using penalized discriminant analysis. Validation was done on previously published data for 134 total subjects from four independent Affymetrix data sets. RESULTS: We identified a panel of 10 genes (CXCL13, COL6A2, SFTPB, KRT14, TSPYL5, TMP3, KLK10, MMP1, GAS1, and MYH2) that accurately distinguished these two tumor types. This 10-gene classifier was validated on 122 subjects derived from four independent data sets and an average accuracy of 96% was shown. Gene expression values were validated by quantitative reverse transcription-PCR derived on 12 independent samples (seven HNSCC and five LSCC). The 10-gene classifier was also used to determine the site of origin of 12 lung lesions from patients with prior HNSCC. CONCLUSIONS: The results suggest that penalized discriminant analysis using these 10 genes will be highly accurate in determining the origin of squamous cell carcinomas in the lungs of patients with previous head and neck malignancies.


Subject(s)
Carcinoma, Squamous Cell/classification , Gene Expression Profiling , Genes, Neoplasm , Head and Neck Neoplasms/classification , Lung Neoplasms/classification , Carcinoma, Squamous Cell/diagnosis , Carcinoma, Squamous Cell/secondary , Cohort Studies , Female , Head and Neck Neoplasms/diagnosis , Head and Neck Neoplasms/pathology , Humans , Lung Neoplasms/diagnosis , Lung Neoplasms/secondary , Male , Middle Aged
17.
BMC Bioinformatics ; 8: 144, 2007 May 02.
Article in English | MEDLINE | ID: mdl-17474999

ABSTRACT

BACKGROUND: Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. Selecting the genes that are important for distinguishing the sample classes being compared poses a challenging problem in high-dimensional data analysis. We describe a new procedure for selecting significant genes, recursive cluster elimination (RCE), as an alternative to recursive feature elimination (RFE). We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures that use RFE. RESULTS: We have developed a novel method for selecting significant genes in comparative gene expression studies. This method, which we refer to as SVM-RCE, combines K-means, a clustering method, to identify correlated gene clusters, and Support Vector Machines (SVMs), a supervised machine learning classification method, to identify and score (rank) those gene clusters for the purpose of classification. K-means is used initially to group genes into clusters. Recursive cluster elimination (RCE) is then applied to iteratively remove those clusters of genes that contribute the least to the classification performance. SVM-RCE identifies the clusters of correlated genes that are most significantly differentially expressed between the sample classes. Using gene clusters rather than individual genes improves supervised classification accuracy on the same data compared with SVM or Penalized Discriminant Analysis (PDA) with recursive feature elimination (SVM-RFE and PDA-RFE), which remove genes based on their individual discriminant weights. CONCLUSION: SVM-RCE provides improved classification accuracy on complex microarray datasets compared with SVM-RFE or PDA-RFE applied to the same datasets. SVM-RCE identifies clusters of correlated genes that, when considered together, provide greater insight into the structure of the microarray data. Clustering genes for classification appears to result in some concomitant clustering of samples into subgroups. Our present implementation of SVM-RCE groups genes using the correlation metric. The success of the SVM-RCE method in classification suggests that gene interaction networks or other biologically relevant metrics that group genes based on functional parameters might also be useful.
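
The cluster-then-eliminate loop described in this abstract can be sketched in a few lines of dependency-free Python. This is an illustration under stated assumptions, not the authors' implementation: genes are grouped by K-means on z-scored expression profiles (a rough proxy for the correlation metric), and each cluster is scored by the leave-one-out accuracy of a nearest-centroid classifier, which stands in for the SVM so that the example needs no external library. The function names, the parameters `n_clusters` and `final_clusters`, and the toy data are all hypothetical.

```python
import math

def zscore(v):
    """Standardize one gene's expression profile across samples."""
    m = sum(v) / len(v)
    sd = math.sqrt(sum((x - m) ** 2 for x in v) / len(v)) or 1.0
    return [(x - m) / sd for x in v]

def kmeans(profiles, k, iters=20):
    """Group gene profiles into at most k clusters (lists of profile indices)."""
    # Deterministic start: evenly spaced profiles serve as initial centers.
    centers = [profiles[i * len(profiles) // k] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for g, p in enumerate(profiles):
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[d.index(min(d))].append(g)
        for i, grp in enumerate(groups):
            if grp:  # keep the old center if a cluster went empty
                centers[i] = [sum(profiles[g][j] for g in grp) / len(grp)
                              for j in range(len(profiles[0]))]
    return [grp for grp in groups if grp]

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes`.

    Stand-in for the SVM scoring step; assumes >= 2 samples per class.
    """
    hits = 0
    for i in range(len(X)):
        best, best_d = None, None
        for label in sorted(set(y)):
            rows = [X[s] for s in range(len(X)) if s != i and y[s] == label]
            d = sum((X[i][g] - sum(r[g] for r in rows) / len(rows)) ** 2
                    for g in genes)
            if best_d is None or d < best_d:
                best, best_d = label, d
        hits += best == y[i]
    return hits / len(X)

def rce_sketch(X, y, n_clusters=3, final_clusters=1):
    """Recluster the surviving genes and drop the worst cluster each round."""
    genes = list(range(len(X[0])))
    for n in range(n_clusters, final_clusters, -1):
        profiles = [zscore([row[g] for row in X]) for g in genes]
        clusters = [[genes[i] for i in grp]
                    for grp in kmeans(profiles, min(n, len(genes)))]
        if len(clusters) < 2:
            break
        scores = [loo_accuracy(X, y, c) for c in clusters]
        worst = scores.index(min(scores))
        genes = sorted(g for ci, c in enumerate(clusters)
                       if ci != worst for g in c)
    return genes

# Toy data (made up): genes 0-1 track the class label, genes 2-5 do not.
gene_rows = [
    [1, 1, 1, 1, 5, 5, 5, 5],
    [2, 2, 2, 2, 6, 6, 6, 6],
    [1, 5, 1, 5, 1, 5, 1, 5],
    [2, 6, 2, 6, 2, 6, 2, 6],
    [1, 1, 5, 5, 1, 1, 5, 5],
    [2, 2, 6, 6, 2, 2, 6, 6],
]
X = [list(sample) for sample in zip(*gene_rows)]  # rows = samples
y = [0, 0, 0, 0, 1, 1, 1, 1]
selected = rce_sketch(X, y, n_clusters=3, final_clusters=1)
```

On this toy input the two uninformative clusters are removed in turn and only the class-correlated genes remain in `selected`. A closer reproduction would replace `loo_accuracy` with cross-validated SVM accuracy.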


Subject(s)
Databases, Genetic/classification , Gene Expression Profiling/classification , Gene Expression Regulation, Neoplastic/genetics , Multigene Family/genetics , Databases, Genetic/statistics & numerical data , Gene Expression/genetics , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Head and Neck Neoplasms/genetics , Humans , Male , Prostatic Neoplasms/genetics
18.
Bioinformatics ; 22(11): 1325-34, 2006 Jun 01.
Article in English | MEDLINE | ID: mdl-16543277

ABSTRACT

MOTIVATION: Most computational methodologies for microRNA gene prediction utilize techniques based on sequence conservation and/or structural similarity. In this study we describe a new technique, which is applicable across several species, for predicting miRNA genes. This technique is based on machine learning, using the Naive Bayes classifier. It automatically generates a model from the training data, which consists of sequence and structure information of known miRNAs from a variety of species. RESULTS: Our study shows that the application of machine learning techniques, along with the integration of data from multiple species, is a useful and general approach for miRNA gene prediction. Based on our experiments, we believe that this new technique is applicable to an extensive range of eukaryotic genomes. Specific structure and sequence features are first used to identify miRNAs, followed by a comparative analysis to decrease the number of false positives (FPs). The resulting algorithm exhibits higher specificity and similar sensitivity compared to currently used algorithms that rely on conserved genomic regions to decrease the rate of FPs.
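
As a rough illustration of the classifier named in this abstract, the sketch below fits a Gaussian Naive Bayes model on two numeric features per candidate sequence. The abstract does not state which feature model or features were used, so the Gaussian variant, the two features (GC fraction and fraction of paired bases in the predicted hairpin), and every value in the toy data are assumptions made only for the example.

```python
import math

def fit_gaussian_nb(X, y):
    """Estimate per-class prior, feature means, and feature variances."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        # A small floor keeps every variance strictly positive.
        varis = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                 for col, m in zip(cols, means)]
        model[label] = (len(rows) / len(X), means, varis)
    return model

def predict(model, x):
    """Return the label with the highest log posterior.

    Features are treated as conditionally independent given the class,
    which is the defining Naive Bayes assumption.
    """
    def log_post(label):
        prior, means, varis = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
            for v, m, s in zip(x, means, varis))
    return max(sorted(model), key=log_post)

# Hypothetical training set: [GC fraction, paired-base fraction];
# label 1 = known miRNA hairpin, label 0 = background sequence.
X_train = [[0.45, 0.80], [0.50, 0.85], [0.48, 0.78],
           [0.40, 0.40], [0.55, 0.35], [0.60, 0.45]]
y_train = [1, 1, 1, 0, 0, 0]
nb_model = fit_gaussian_nb(X_train, y_train)
hairpin_call = predict(nb_model, [0.47, 0.82])
background_call = predict(nb_model, [0.50, 0.38])
```

The comparative-analysis step that the abstract uses to reduce false positives is not modeled here.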


Subject(s)
Computational Biology/methods , MicroRNAs/genetics , Sequence Analysis, DNA/methods , Algorithms , Animals , Artificial Intelligence , Base Sequence , Bayes Theorem , Humans , Molecular Sequence Data , Sequence Homology, Nucleic Acid , Species Specificity
19.
Physiol Genomics ; 25(2): 346-53, 2006 Apr 13.
Article in English | MEDLINE | ID: mdl-16464973

ABSTRACT

Hibernation is an energy-saving strategy adopted by a wide range of mammals to survive highly seasonal or unpredictable environments. Arctic ground squirrels living in Alaska provide an extreme example, with 6- to 9-month-long hibernation seasons when body temperature alternates between levels near 0 °C during torpor and 37 °C during arousal episodes. Heat production during hibernation is provided, in part, by nonshivering thermogenesis that occurs in large deposits of brown adipose tissue (BAT). BAT is active at tissue temperatures from 0 to 37 °C during rewarming and continuously at near 0 °C during torpor in subfreezing conditions. Despite its crucial role in hibernation, the global gene expression patterns in BAT during hibernation compared with the nonhibernation season remain largely unknown. We report a large-scale study of differential gene expression in BAT between winter hibernating and summer active arctic ground squirrels using mouse microarrays. Selected differentially expressed genes identified on the arrays were validated by quantitative real-time PCR using ground squirrel-specific primers. Our results show that the mRNA levels of the genes involved in nearly every step of the biochemical pathway leading to nonshivering thermogenesis are significantly increased in BAT during hibernation, whereas those of genes involved in protein biosynthesis are significantly decreased compared with summer active animals in August. Surprisingly, the differentially expressed genes also include adipocyte differentiation-related protein or adipophilin (Adfp), gap junction protein 1 (Gja1), and secreted protein acidic and cysteine-rich (Sparc), which may play a role in enhancing thermogenesis at low tissue temperatures in BAT.


Subject(s)
Adipose Tissue, Brown/metabolism , Gene Expression Regulation , Hibernation/genetics , Oligonucleotide Array Sequence Analysis , Alaska , Animals , Connexins/genetics , Connexins/metabolism , Gene Expression Profiling/methods , Membrane Proteins , Mice , Osteonectin/genetics , Osteonectin/metabolism , Peptides/genetics , Peptides/metabolism , Perilipin-2 , RNA, Messenger/metabolism , Reproducibility of Results , Sciuridae , Seasons , Thermogenesis/genetics
20.
Blood ; 107(8): 3189-96, 2006 Apr 15.
Article in English | MEDLINE | ID: mdl-16403914

ABSTRACT

We previously identified a small number of genes using cDNA arrays that accurately diagnosed patients with Sézary Syndrome (SS), the erythrodermic and leukemic form of cutaneous T-cell lymphoma (CTCL). We now report the development of a quantitative real-time polymerase chain reaction (qRT-PCR) assay that uses expression values for just 5 of those genes: STAT4, GATA-3, PLS3, CD1D, and TRAIL. qRT-PCR data from peripheral blood mononuclear cells (PBMCs) accurately classified 88% of 17 patients with high blood tumor burden and 100% of 12 healthy controls in the training set using Fisher linear discriminant analysis (FLDA). The same 5 genes were then assayed on 56 new samples from 49 SS patients with blood tumor burdens of 5% to 99% and 69 samples from 65 new healthy controls. The average accuracy over 1000 resamplings was 90% using FLDA and 88% using support vector machine (SVM). We also tested the classifier on 14 samples from patients with CTCL with no detectable peripheral involvement and 3 patients with atopic dermatitis with severe erythroderma. The accuracy was 100% in identifying these samples as non-SS patients. These results are the first to demonstrate that gene expression profiling by quantitative PCR on a selected number of critical genes can be employed to diagnose SS at the molecular level.
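
For readers unfamiliar with Fisher linear discriminant analysis, the following is a minimal two-class sketch: the projection direction is w = Sw^-1 (m1 - m0), where Sw is the pooled within-class scatter matrix and m0, m1 are the class means, and a sample is assigned to class 1 when its projection exceeds the midpoint between the projected means. The function names and toy values are hypothetical, the example uses two features rather than the five genes of the assay, and the resampling scheme mentioned in the abstract is not reproduced because its details are not given there.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes A is non-singular; a singular scatter matrix would need
    regularization, which this sketch omits.
    """
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_flda(X, y):
    """Return (w, threshold) for labels 0/1 using w = Sw^-1 (m1 - m0)."""
    d = len(X[0])
    groups = {k: [x for x, t in zip(X, y) if t == k] for k in (0, 1)}
    means = {k: [sum(col) / len(g) for col in zip(*g)]
             for k, g in groups.items()}
    # Pooled within-class scatter matrix.
    Sw = [[0.0] * d for _ in range(d)]
    for k, g in groups.items():
        for x in g:
            dev = [v - m for v, m in zip(x, means[k])]
            for i in range(d):
                for j in range(d):
                    Sw[i][j] += dev[i] * dev[j]
    w = solve(Sw, [a - b for a, b in zip(means[1], means[0])])
    mid = [(a + b) / 2 for a, b in zip(means[1], means[0])]
    return w, sum(wi * mi for wi, mi in zip(w, mid))

def classify(w, threshold, x):
    """Project x onto w and compare with the midpoint threshold."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) > threshold)

# Toy expression values (made up): label 0 = control, label 1 = patient.
X_toy = [[1, 2], [2, 1], [2, 3], [3, 2],
         [5, 6], [6, 5], [6, 7], [7, 6]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]
w, threshold = fit_flda(X_toy, y_toy)
control_call = classify(w, threshold, [2, 2])
patient_call = classify(w, threshold, [6, 5])
```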


Subject(s)
Gene Expression Regulation, Leukemic , Sezary Syndrome/diagnosis , Skin Neoplasms/diagnosis , Tumor Burden , Dermatitis, Atopic/diagnosis , Dermatitis, Atopic/genetics , Dermatitis, Atopic/pathology , Dermatitis, Exfoliative/diagnosis , Dermatitis, Exfoliative/genetics , Dermatitis, Exfoliative/pathology , Humans , Predictive Value of Tests , Reverse Transcriptase Polymerase Chain Reaction/methods , Sezary Syndrome/genetics , Sezary Syndrome/pathology , Skin Neoplasms/genetics , Skin Neoplasms/pathology , Tumor Burden/genetics