Search | VHL Regional Portal

1.

Integrative multi-omics analyses to identify the genetic and functional mechanisms underlying ovarian cancer risk regions.

Dareng, Eileen O; Coetzee, Simon G; Tyrer, Jonathan P; Peng, Pei-Chen; Rosenow, Will; Chen, Stephanie; Davis, Brian D; Dezem, Felipe Segato; Seo, Ji-Heui; Nameki, Robbin; Reyes, Alberto L; Aben, Katja K H; Anton-Culver, Hoda; Antonenkova, Natalia N; Aravantinos, Gerasimos; Bandera, Elisa V; Beane Freeman, Laura E; Beckmann, Matthias W; Beeghly-Fadiel, Alicia; Benitez, Javier; Bernardini, Marcus Q; Bjorge, Line; Black, Amanda; Bogdanova, Natalia V; Bolton, Kelly L; Brenton, James D; Budzilowska, Agnieszka; Butzow, Ralf; Cai, Hui; Campbell, Ian; Cannioto, Rikki; Chang-Claude, Jenny; Chanock, Stephen J; Chen, Kexin; Chenevix-Trench, Georgia; Chiew, Yoke-Eng; Cook, Linda S; DeFazio, Anna; Dennis, Joe; Doherty, Jennifer A; Dörk, Thilo; du Bois, Andreas; Dürst, Matthias; Eccles, Diana M; Ene, Gabrielle; Fasching, Peter A; Flanagan, James M; Fortner, Renée T; Fostira, Florentia; Gentry-Maharaj, Aleksandra.

Am J Hum Genet ; 111(6): 1061-1083, 2024 Jun 06.

Article in English | MEDLINE | ID: mdl-38723632

ABSTRACT

To identify credible causal risk variants (CCVs) associated with different histotypes of epithelial ovarian cancer (EOC), we performed genome-wide association analysis for 470,825 genotyped and 10,163,797 imputed SNPs in 25,981 EOC cases and 105,724 controls of European origin. We identified five histotype-specific EOC risk regions (p value <5 × 10⁻⁸) and confirmed previously reported associations for 27 risk regions. Conditional analyses identified an additional 11 signals independent of the primary signal at six risk regions (p value <10⁻⁵). Fine mapping identified 4,008 CCVs in these regions, of which 1,452 CCVs were located in ovarian cancer-related chromatin marks with significant enrichment in active enhancers, active promoters, and active regions for CCVs from each EOC histotype. Transcriptome-wide association and colocalization analyses across histotypes using tissue-specific and cross-tissue datasets identified 86 candidate susceptibility genes in known EOC risk regions and 32 genes in 23 additional genomic regions that may represent novel EOC risk loci (false discovery rate <0.05). Finally, by integrating genome-wide HiChIP interactome analysis with transcriptome-wide association study (TWAS), variant effect predictor, transcription factor ChIP-seq, and motifbreakR data, we identified candidate gene-CCV interactions at each locus. This included risk loci where TWAS identified one or more candidate susceptibility genes (e.g., HOXD-AS2, HOXD8, and HOXD3 at 2q31) and other loci where no candidate gene was identified (e.g., MYC and PVT1 at 8q24) by TWAS. In summary, this study describes a functional framework and provides a greater understanding of the biological significance of risk alleles and candidate gene targets at EOC susceptibility loci identified by a genome-wide association study.

Subject(s)

Genetic Predisposition to Disease , Genome-Wide Association Study , Ovarian Neoplasms , Polymorphism, Single Nucleotide , Humans , Female , Ovarian Neoplasms/genetics , Ovarian Neoplasms/pathology , Carcinoma, Ovarian Epithelial/genetics , Transcriptome , Risk Factors , Genomics/methods , Case-Control Studies , Multiomics

2.

Exome sequencing identifies HELB as a novel susceptibility gene for non-mucinous, non-high-grade-serous epithelial ovarian cancer.

Dicks, Ed M; Tyrer, Jonathan P; Ezquina, Suzana; Jones, Michelle; Baierl, John; Peng, Pei-Chen; Diaz, Michael; Goode, Ellen; Winham, Stacey J; Dörk, Thilo; Van Gorp, Toon; De Fazio, Ana; Bowtell, David; Odunsi, Kunle; Moysich, Kirsten; Pavanello, Marina; Campbell, Ian; Brenton, James D; Ramus, Susan J; Gayther, Simon A; Pharoah, Paul D P.

medRxiv ; 2024 Apr 03.

Article in English | MEDLINE | ID: mdl-38633804

ABSTRACT

Rare, germline loss-of-function variants in a handful of genes that encode DNA repair proteins have been shown to be associated with epithelial ovarian cancer, with a stronger association for the high-grade serous histotype. The aim of this study was to collate exome sequencing data from multiple epithelial ovarian cancer case cohorts and controls in order to systematically evaluate the role of coding, loss-of-function variants across the genome in epithelial ovarian cancer risk. We assembled exome data for a total of 2,573 non-mucinous cases (1,876 high-grade serous and 697 non-high-grade serous) and 13,925 controls. Harmonised variant calling and quality control filtering were applied across the different data sets. We carried out a gene-by-gene simple burden test for association of rare loss-of-function variants (minor allele frequency < 0.1%) with all non-mucinous ovarian cancer, high-grade serous ovarian cancer and non-high-grade serous ovarian cancer using logistic regression adjusted for the top four principal components to account for cryptic population structure and genetic ancestry. Seven of the top 10 associated genes were the known ovarian cancer susceptibility genes BRCA1, BRCA2, BRIP1, RAD51C, RAD51D, MSH6 and PALB2 (false discovery probability < 0.1). A further four genes (HELB, OR2T35, NBN and MYO1A) had a false discovery rate of less than 0.1. Of these, HELB was most strongly associated with the non-high-grade serous histotype (P = 1.3×10⁻⁶, FDR = 9.1×10⁻⁴). Further support for this association comes from the observation that loss-of-function variants in this gene are also associated with age at natural menopause, and Mendelian randomisation analysis shows an association between genetically predicted age at natural menopause and endometrioid ovarian cancer, but not high-grade serous ovarian cancer.

3.

Editorial viewpoints of scientific publishing for early-career research scientists.

Peng, Pei-Chen; Coleman, Fadie T.

BMC Proc ; 18(Suppl 1): 4, 2024 Jan 16.

Article in English | MEDLINE | ID: mdl-38229056

ABSTRACT

While the structure and composition of the scientific manuscript are well known within scientific communities, insider knowledge such as the tricks of the trade and editorial viewpoints of scientific publishing is often less known to early-career research scientists. This article focuses on the key aspects of scientific publishing, including tips for success geared towards senior postdocs and junior faculty. It also highlights important considerations for getting manuscripts published in an efficient and successful manner.

4.

ChatGPT and large language models in academia: opportunities and challenges.

Meyer, Jesse G; Urbanowicz, Ryan J; Martin, Patrick C N; O'Connor, Karen; Li, Ruowang; Peng, Pei-Chen; Bright, Tiffani J; Tatonetti, Nicholas; Won, Kyoung Jae; Gonzalez-Hernandez, Graciela; Moore, Jason H.

BioData Min ; 16(1): 20, 2023 Jul 13.

Article in English | MEDLINE | ID: mdl-37443040

ABSTRACT

The introduction of large language models (LLMs) that allow iterative "chat" in late 2022 is a paradigm shift that enables generation of text often indistinguishable from that written by humans. LLM-based chatbots have immense potential to improve academic work efficiency, but the ethical implications of their fair use and inherent bias must be considered. In this editorial, we discuss this technology from the academic's perspective with regard to its limitations and utility for academic writing, education, and programming. We end with our stance with regard to using LLMs and chatbots in academia, which is summarized as (1) we must find ways to effectively use them, (2) their use does not constitute plagiarism (although they may produce plagiarized text), (3) we must quantify their bias, (4) users must be cautious of their poor accuracy, and (5) the future is bright for their application to research and as an academic tool.

5.

Copy Number Variants Are Ovarian Cancer Risk Alleles at Known and Novel Risk Loci.

DeVries, Amber A; Dennis, Joe; Tyrer, Jonathan P; Peng, Pei-Chen; Coetzee, Simon G; Reyes, Alberto L; Plummer, Jasmine T; Davis, Brian D; Chen, Stephanie S; Dezem, Felipe Segato; Aben, Katja K H; Anton-Culver, Hoda; Antonenkova, Natalia N; Beckmann, Matthias W; Beeghly-Fadiel, Alicia; Berchuck, Andrew; Bogdanova, Natalia V; Bogdanova-Markov, Nadja; Brenton, James D; Butzow, Ralf; Campbell, Ian; Chang-Claude, Jenny; Chenevix-Trench, Georgia; Cook, Linda S; DeFazio, Anna; Doherty, Jennifer A; Dörk, Thilo; Eccles, Diana M; Eliassen, A Heather; Fasching, Peter A; Fortner, Renée T; Giles, Graham G; Goode, Ellen L; Goodman, Marc T; Gronwald, Jacek; Håkansson, Niclas; Hildebrandt, Michelle A T; Huff, Chad; Huntsman, David G; Jensen, Allan; Kar, Siddhartha; Karlan, Beth Y; Khusnutdinova, Elza K; Kiemeney, Lambertus A; Kjaer, Susanne K; Kupryjanczyk, Jolanta; Labrie, Marilyne; Lambrechts, Diether; Le, Nhu D; Lubinski, Jan.

J Natl Cancer Inst ; 114(11): 1533-1544, 2022 Nov 14.

Article in English | MEDLINE | ID: mdl-36210504

ABSTRACT

BACKGROUND: Known risk alleles for epithelial ovarian cancer (EOC) account for approximately 40% of the heritability for EOC. Copy number variants (CNVs) have not been investigated as EOC risk alleles in a large population cohort. METHODS: Single nucleotide polymorphism array data from 13,071 EOC cases and 17,306 controls of White European ancestry were used to identify CNVs associated with EOC risk using a rare admixture maximum likelihood test for gene burden and a by-probe ratio test. We performed enrichment analysis of CNVs at known EOC risk loci and functional biofeatures in ovarian cancer-related cell types. RESULTS: We identified statistically significant risk associations with CNVs at known EOC risk genes: BRCA1 (P_EOC = 1.60E-21; OR_EOC = 8.24), RAD51C (P for high-grade serous ovarian cancer [HGSOC], P_HGSOC = 5.5E-4; odds ratio [OR], OR_HGSOC = 5.74, deletion), and BRCA2 (P_HGSOC = 7.0E-4; OR_HGSOC = 3.31, deletion). Four suggestive associations (P < .001) were identified for rare CNVs. Risk-associated CNVs were enriched (P < .05) at known EOC risk loci identified by genome-wide association study. Noncoding CNVs were enriched in active promoters and insulators in EOC-related cell types. CONCLUSIONS: CNVs in BRCA1 have been previously reported in smaller studies, but their observed frequency in this large population-based cohort, along with the CNVs observed at BRCA2 and RAD51C gene loci in EOC cases, suggests that these CNVs are potentially pathogenic and may contribute to the spectrum of disease-causing mutations in these genes. CNVs are likely to occur in a wider set of susceptibility regions, with potential implications for clinical genetic testing and disease prevention.

Subject(s)

Genome-Wide Association Study , Ovarian Neoplasms , Female , Humans , Carcinoma, Ovarian Epithelial/genetics , Alleles , DNA Copy Number Variations , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide , Ovarian Neoplasms/genetics , Ovarian Neoplasms/pathology

6.

DNA methylation and transcriptomic features are preserved throughout disease recurrence and chemoresistance in high grade serous ovarian cancers.

Gull, Nicole; Jones, Michelle R; Peng, Pei-Chen; Coetzee, Simon G; Silva, Tiago C; Plummer, Jasmine T; Reyes, Alberto Luiz P; Davis, Brian D; Chen, Stephanie S; Lawrenson, Kate; Lester, Jenny; Walsh, Christine; Rimel, Bobbie J; Li, Andrew J; Cass, Ilana; Berg, Yonatan; Govindavari, John-Paul B; Rutgers, Joanna K L; Berman, Benjamin P; Karlan, Beth Y; Gayther, Simon A.

J Exp Clin Cancer Res ; 41(1): 232, 2022 Jul 27.

Article in English | MEDLINE | ID: mdl-35883104

ABSTRACT

BACKGROUND: Little is known about the role of global DNA methylation in recurrence and chemoresistance of high grade serous ovarian cancer (HGSOC). METHODS: We performed whole genome bisulfite sequencing and transcriptome sequencing in 62 primary and recurrent tumors from 28 patients with stage III/IV HGSOC, of which 11 patients carried germline, pathogenic BRCA1 and/or BRCA2 mutations. RESULTS: Landscapes of genome-wide methylation (on average 24.2 million CpGs per tumor) and transcriptomes in primary and recurrent tumors showed extensive heterogeneity between patients but were highly preserved in tumors from the same patient. We identified significant differences in the burden of differentially methylated regions (DMRs) in tumors from BRCA1/2 compared to non-BRCA1/2 carriers (mean 659 DMRs and 388 DMRs in paired comparisons respectively). We identified overexpression of immune pathways in BRCA1/2 carriers compared to non-carriers, implicating an increased immune response in improved survival (P = 0.006) in these BRCA1/2 carriers. CONCLUSION: These findings indicate methylome and gene expression programs established in the primary tumor are conserved throughout disease progression, even after extensive chemotherapy treatment, and that changes in methylation and gene expression are unlikely to serve as drivers for chemoresistance in HGSOC.

Subject(s)

DNA Methylation , Ovarian Neoplasms , Drug Resistance, Neoplasm/genetics , Female , Humans , Neoplasm Recurrence, Local/drug therapy , Neoplasm Recurrence, Local/genetics , Ovarian Neoplasms/drug therapy , Ovarian Neoplasms/genetics , Ovarian Neoplasms/pathology , Transcriptome

7.

Ovarian Cancer Risk Variants Are Enriched in Histotype-Specific Enhancers and Disrupt Transcription Factor Binding Sites.

Jones, Michelle R; Peng, Pei-Chen; Coetzee, Simon G; Tyrer, Jonathan; Reyes, Alberto Luiz P; Corona, Rosario I; Davis, Brian; Chen, Stephanie; Dezem, Felipe; Seo, Ji-Heui; Kar, Siddartha; Dareng, Eileen; Berman, Benjamin P; Freedman, Matthew L; Plummer, Jasmine T; Lawrenson, Kate; Pharoah, Paul; Hazelett, Dennis J; Gayther, Simon A.

Am J Hum Genet ; 107(4): 622-635, 2020 Oct 01.

Article in English | MEDLINE | ID: mdl-32946763

ABSTRACT

Quantifying the functional effects of complex disease risk variants can provide insights into mechanisms underlying disease biology. Genome-wide association studies have identified 39 regions associated with risk of epithelial ovarian cancer (EOC). The vast majority of these variants lie in the non-coding genome, where they likely function through interaction with gene regulatory elements. In this study we first estimated the heritability explained by known common low penetrance risk alleles for EOC. The narrow-sense heritability (h_g^2) of EOC overall and high-grade serous ovarian cancers (HGSOCs) were estimated to be 5%-6%. Partitioned SNP heritability across broad functional categories indicated a significant contribution of regulatory elements to EOC heritability. We collated epigenomic profiling data for 77 cell and tissue types from Roadmap Epigenomics and ENCODE, and from H3K27Ac ChIP-seq data generated in 26 ovarian cancer and precursor-related cell and tissue types. We identified significant enrichment of risk single-nucleotide polymorphisms (SNPs) in active regulatory elements marked by H3K27Ac in HGSOCs. To further investigate how risk SNPs in active regulatory elements influence predisposition to ovarian cancer, we used motifbreakR to predict the disruption of transcription factor binding sites. We identified 469 candidate causal risk variants in H3K27Ac peaks that are predicted to significantly break transcription factor (TF) motifs. The most frequently broken motif was REST (p value = 0.0028), which has been reported as both a tumor suppressor and an oncogene. Overall, these systematic functional annotations with epigenomic data improve interpretation of EOC risk variants and shed light on likely cells of origin.

Subject(s)

Carcinoma, Ovarian Epithelial/genetics , Co-Repressor Proteins/genetics , Cystadenocarcinoma, Serous/genetics , Enhancer Elements, Genetic , Histones/genetics , Nerve Tissue Proteins/genetics , Ovarian Neoplasms/genetics , Alleles , Binding Sites , Carcinoma, Ovarian Epithelial/diagnosis , Carcinoma, Ovarian Epithelial/pathology , Chromosome Mapping , Co-Repressor Proteins/metabolism , Cystadenocarcinoma, Serous/diagnosis , Cystadenocarcinoma, Serous/pathology , Female , Genetic Predisposition to Disease , Genome, Human , Genome-Wide Association Study , Histones/metabolism , Humans , Inheritance Patterns , Nerve Tissue Proteins/metabolism , Ovarian Neoplasms/diagnosis , Ovarian Neoplasms/pathology , Penetrance , Polymorphism, Single Nucleotide , Risk

8.

The Role of Chromatin Accessibility in cis-Regulatory Evolution.

Peng, Pei-Chen; Khoueiry, Pierre; Girardot, Charles; Reddington, James P; Garfield, David A; Furlong, Eileen E M; Sinha, Saurabh.

Genome Biol Evol ; 11(7): 1813-1828, 2019 Jul 01.

Article in English | MEDLINE | ID: mdl-31114856

ABSTRACT

Transcription factor (TF) binding is determined by sequence as well as chromatin accessibility. Although the role of accessibility in shaping TF-binding landscapes is well recorded, its role in evolutionary divergence of TF binding, which in turn can alter cis-regulatory activities, is not well understood. In this work, we studied the evolution of genome-wide binding landscapes of five major TFs in the core network of mesoderm specification, between Drosophila melanogaster and Drosophila virilis, and examined its relationship to accessibility and sequence-level changes. We generated chromatin accessibility data from three important stages of embryogenesis in both Drosophila melanogaster and Drosophila virilis and recorded conservation and divergence patterns. We then used multivariable models to correlate accessibility and sequence changes to TF-binding divergence. We found that accessibility changes can in some cases, for example, for the master regulator Twist and for earlier developmental stages, more accurately predict binding change than is possible using TF-binding motif changes between orthologous enhancers. Accessibility changes also explain a significant portion of the codivergence of TF pairs. We noted that accessibility and motif changes offer complementary views of the evolution of TF binding and developed a combined model that captures the evolutionary data much more accurately than either view alone. Finally, we trained machine learning models to predict enhancer activity from TF binding and used these functional models to argue that motif and accessibility-based predictors of TF-binding change can substitute for experimentally measured binding change, for the purpose of predicting evolutionary changes in enhancer activity.

Subject(s)

Chromatin/metabolism , Drosophila Proteins/metabolism , Transcription Factors/metabolism , Animals , Chromatin/genetics , Drosophila Proteins/genetics , Drosophila melanogaster , Evolution, Molecular , Protein Binding , Transcription Factors/genetics

9.

Uncoupling evolutionary changes in DNA sequence, transcription factor occupancy and enhancer activity.

Khoueiry, Pierre; Girardot, Charles; Ciglar, Lucia; Peng, Pei-Chen; Gustafson, E Hilary; Sinha, Saurabh; Furlong, Eileen Em.

Elife ; 6, 2017 Aug 09.

Article in English | MEDLINE | ID: mdl-28792889

ABSTRACT

Sequence variation within enhancers plays a major role in both evolution and disease, yet its functional impact on transcription factor (TF) occupancy and enhancer activity remains poorly understood. Here, we assayed the binding of five essential TFs over multiple stages of embryogenesis in two distant Drosophila species (with 1.4 substitutions per neutral site), identifying thousands of orthologous enhancers with conserved or diverged combinatorial occupancy. We used these binding signatures to dissect two properties of developmental enhancers: (1) potential TF cooperativity, using signatures of co-associations and co-divergence in TF occupancy. This revealed conserved combinatorial binding despite sequence divergence, suggesting protein-protein interactions sustain conserved collective occupancy. (2) Enhancer in-vivo activity, revealing orthologous enhancers with conserved activity despite divergence in TF occupancy. Taken together, we identify enhancers with diverged motifs yet conserved occupancy and others with diverged occupancy yet conserved activity, emphasising the need to functionally measure the effect of divergence on enhancer activity.

Subject(s)

DNA/metabolism , Enhancer Elements, Genetic , Evolution, Molecular , Transcription Factors/metabolism , Animals , Drosophila/embryology , Drosophila/genetics , Protein Binding

10.

Quantitative modeling of gene expression using DNA shape features of binding sites.

Peng, Pei-Chen; Sinha, Saurabh.

Nucleic Acids Res ; 44(13): e120, 2016 Jul 27.

Article in English | MEDLINE | ID: mdl-27257066

ABSTRACT

Prediction of gene expression levels driven by regulatory sequences is pivotal in genomic biology. A major focus in transcriptional regulation is sequence-to-expression modeling, which interprets the enhancer sequence based on transcription factor concentrations and DNA binding specificities and predicts precise gene expression levels in varying cellular contexts. Such models largely rely on the position weight matrix (PWM) model for DNA binding, and the effect of alternative models based on DNA shape remains unexplored. Here, we propose a statistical thermodynamics model of gene expression using DNA shape features of binding sites. We used rigorous methods to evaluate the fits of expression readouts of 37 enhancers regulating spatial gene expression patterns in the Drosophila embryo, and show that DNA shape-based models perform arguably better than PWM-based models. We also observed that DNA shape captures information complementary to the PWM, in a way that is useful for expression modeling. Furthermore, we tested if combining shape and PWM-based features provides better predictions than using either binding model alone. Our work demonstrates that the increasingly popular DNA-binding models based on local DNA shape can be useful in sequence-to-expression modeling. It also provides a framework for future studies to predict gene expression better than with PWM models alone.

Subject(s)

DNA-Binding Proteins/genetics , Drosophila melanogaster/genetics , Embryonic Development/genetics , Gene Expression Regulation, Developmental/genetics , Animals , Binding Sites/genetics , Computational Biology , DNA/genetics , DNA-Binding Proteins/biosynthesis , Position-Specific Scoring Matrices , Regulatory Sequences, Nucleic Acid/genetics , Thermodynamics

11.

Incorporating chromatin accessibility data into sequence-to-expression modeling.

Peng, Pei-Chen; Hassan Samee, Md Abul; Sinha, Saurabh.

Biophys J ; 108(5): 1257-67, 2015 Mar 10.

Article in English | MEDLINE | ID: mdl-25762337

ABSTRACT

Prediction of gene expression levels from regulatory sequences is one of the major challenges of genomic biology today. A particularly promising approach to this problem is that taken by thermodynamics-based models that interpret an enhancer sequence in a given cellular context specified by transcription factor concentration levels and predict precise expression levels driven by that enhancer. Such models have so far not accounted for the effect of chromatin accessibility on interactions between transcription factor and DNA and consequently on gene-expression levels. Here, we extend a thermodynamics-based model of gene expression, called GEMSTAT (Gene Expression Modeling Based on Statistical Thermodynamics), to incorporate chromatin accessibility data and quantify its effect on accuracy of expression prediction. In the new model, called GEMSTAT-A, accessibility at a binding site is assumed to affect the transcription factor's binding strength at the site, whereas all other aspects are identical to the GEMSTAT model. We show that this modification results in significantly better fits in a data set of over 30 enhancers regulating spatial expression patterns in the blastoderm-stage Drosophila embryo. It is important to note that the improved fits result not from an overall elevated accessibility in active enhancers but from the variation of accessibility levels within an enhancer. With whole-genome DNA accessibility measurements becoming increasingly popular, our work demonstrates how such data may be useful for sequence-to-expression models. It also calls for future advances in modeling accessibility levels from sequence and the transregulatory context, so as to predict accurately the effect of cis and trans perturbations on gene expression.

Subject(s)

Chromatin Assembly and Disassembly , Chromatin/genetics , Models, Genetic , Animals , Chromatin/metabolism , Drosophila/genetics , Drosophila/growth & development , Gene Expression Regulation, Developmental , Thermodynamics

12.

Comparison of feature selection methods for cross-laboratory microarray analysis.

Liu, Hsi-Che; Peng, Pei-Chen; Hsieh, Tzung-Chien; Yeh, Ting-Chi; Lin, Chih-Jen; Chen, Chien-Yu; Hou, Jen-Yin; Shih, Lee-Yung; Liang, Der-Cherng.

IEEE/ACM Trans Comput Biol Bioinform ; 10(3): 593-604, 2013.

Article in English | MEDLINE | ID: mdl-24091394

ABSTRACT

The amount of microarray gene expression data has grown exponentially. To apply these data in extensive studies, integrated analysis of cross-laboratory (cross-lab) data has become a trend, and thus choosing an appropriate feature selection method is an essential issue. This paper focuses on feature selection for Affymetrix (Affy) microarray studies across different labs. We investigate four feature selection methods: t-test, significance analysis of microarrays (SAM), rank products (RP), and random forest (RF). The four methods are applied to acute lymphoblastic leukemia, acute myeloid leukemia, breast cancer, and lung cancer Affy data, which consist of three cross-lab data sets each. We utilize a rank-based normalization method to reduce the bias from cross-lab data sets. Training on one data set or on two combined data sets to test the remaining data set(s) are both considered. Balanced accuracy is used for prediction evaluation. This study provides comprehensive comparisons of the four feature selection methods in cross-lab microarray analysis. Results show that SAM has the best classification performance. RF also gets high classification accuracy, but it is not as stable as SAM. The most naive method is the t-test, but its performance is the worst among the four methods. In this study, we further discuss the influence of the number of training samples, the number of selected genes, and the issue of unbalanced data sets.

Subject(s)

Gene Expression Profiling/methods , Models, Statistical , Neoplasms/genetics , Neoplasms/metabolism , Oligonucleotide Array Sequence Analysis/methods , Databases, Factual , Gene Expression Regulation, Neoplastic/genetics , Humans
