Search | VHL Regional Portal

The Gene Expression Barcode 3.0: improved data processing and mining tools.

McCall, Matthew N; Jaffee, Harris A; Zelisko, Susan J; Sinha, Neeraj; Hooiveld, Guido; Irizarry, Rafael A; Zilliox, Michael J.

Nucleic Acids Res ; 42(Database issue): D938-43, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24271388

ABSTRACT

The Gene Expression Barcode project, http://barcode.luhs.org, seeks to determine the genes expressed for every tissue and cell type in humans and mice. Understanding the absolute expression of genes across tissues and cell types has applications in basic cell biology, hypothesis generation for gene function and clinical predictions using gene expression signatures. In its current version, this project uses the abundant publicly available microarray data sets combined with a suite of single-array preprocessing, quality control and analysis methods. In this article, we present the improvements that have been made since the previous version of the Gene Expression Barcode in 2011. These include a variety of new data mining tools and summaries, estimated transcriptomes and curated annotations.

Subject(s)

Databases, Genetic , Gene Expression Profiling , Animals , Data Mining , Humans , Internet , Mice , Oligonucleotide Array Sequence Analysis , Software , Transcriptome

fRMA ST: frozen robust multiarray analysis for Affymetrix Exon and Gene ST arrays.

McCall, Matthew N; Jaffee, Harris A; Irizarry, Rafael A.

Bioinformatics ; 28(23): 3153-4, 2012 Dec 01.

Article in English | MEDLINE | ID: mdl-23044545

ABSTRACT

SUMMARY: Frozen robust multiarray analysis (fRMA) is a single-array preprocessing algorithm that retains the advantages of multiarray algorithms and removes certain batch effects by downweighting probes that have high between-batch residual variance. Here, we extend the fRMA algorithm to two new microarray platforms--Affymetrix Human Exon and Gene 1.0 ST--by modifying the fRMA probe-level model and extending the frma package to work with oligo ExonFeatureSet and GeneFeatureSet objects. AVAILABILITY AND IMPLEMENTATION: All packages are implemented in R. Source code and binaries are freely available through the Bioconductor project. Convenient links to all software and data packages can be found at http://mnmccall.com/software. CONTACT: mccallm@gmail.com.
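The downweighting idea in this abstract can be illustrated with a minimal Python sketch. It assumes that the probe effects and the within-batch and between-batch residual variances have already been estimated ("frozen") from a reference set of arrays, and it summarizes a probe set with a plain inverse-variance weighted mean. The frma package itself uses a more elaborate, robust summarization, so this is not its implementation; the function and parameter names are hypothetical.

```python
def frozen_summary(log_pm, probe_effects, within_var, between_var):
    """Simplified fRMA-style summary of one probe set on a single array.

    Hypothetical inputs, all assumed precomputed from reference batches:
      log_pm        -- normalized log2 probe intensities from the new array
      probe_effects -- frozen probe effects
      within_var    -- frozen within-batch residual variance per probe
      between_var   -- frozen between-batch residual variance per probe
    """
    # Remove the frozen probe effects so every probe estimates the same level.
    adjusted = [y - p for y, p in zip(log_pm, probe_effects)]
    # Inverse-variance weights: probes with high between-batch variance count less.
    weights = [1.0 / (w + b) for w, b in zip(within_var, between_var)]
    return sum(wt * a for wt, a in zip(weights, adjusted)) / sum(weights)


# Hypothetical probe set of three probes; the third has high between-batch variance.
value = frozen_summary(
    log_pm=[8.0, 9.0, 12.0],
    probe_effects=[0.0, 1.0, 2.0],
    within_var=[0.1, 0.1, 0.1],
    between_var=[0.1, 0.1, 1.9],
)
print(round(value, 3))
```

Here the adjusted probe values are 8, 8 and 10; the batch-sensitive third probe pulls the summary only to about 8.095, whereas an unweighted mean would give about 8.667.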

Subject(s)

Algorithms , Oligonucleotide Array Sequence Analysis/methods , Software , Alternative Splicing , Computational Biology/methods , Exons , Humans , Models, Theoretical , Oligonucleotide Probes/genetics

Peak intraocular pressure and glaucomatous progression in primary open-angle glaucoma.

Konstas, Anastasios G P; Quaranta, Luciano; Mikropoulos, Dimitrios G; Nasr, Mayssa B; Russo, Andrea; Jaffee, Harris A; Stewart, Jeanette A; Stewart, William C.

J Ocul Pharmacol Ther ; 28(1): 26-32, 2012 Feb.

Article in English | MEDLINE | ID: mdl-22004074

ABSTRACT

PURPOSE: To evaluate the effect of 24-h peak intraocular pressure (IOP) on the progression of primary open-angle glaucoma (POAG) and the 24-h time points that best predict peak pressure. METHODS: A retrospective analysis of clinical data evaluating long-term glaucomatous progression in patients with POAG who had previously participated in a 24-h study by the authors (IOP readings at 2/6/10 A.M. and 2/6/10 P.M.); had ≥3 treated 10 A.M. (±1 h) IOP measurements over 5 years after an untreated 24-h baseline; and had a treated 24-h curve with a 10 A.M. IOP within ±2 mmHg of the 10 A.M. mean IOP over 5 years. RESULTS: We included 98 nonprogressed and 53 progressed patients with POAG (n=151). The mean 24-h peak IOP (mmHg) was 19.9±2.7 for progressed and 18.3±2.0 for nonprogressed patients (P<0.001). Progressed patients also showed a higher mean 24-h IOP. Generally, patients with a mean or peak daytime IOP (readings at 10 A.M., 2 P.M. and 6 P.M.) or a 24-h peak IOP of ≤18 mmHg remained nonprogressed in 75%-78% of cases. Further, measuring IOP at night found a higher peak in only 20% of cases, and that peak was within 2 mmHg of the daytime peak in 98% of cases. A multivariate regression analysis showed only 24-h peak IOP as an independent risk factor for progression (P=0.002). CONCLUSIONS: This study suggests that daytime peak IOP may be clinically important in predicting long-term glaucomatous progression. Further, daytime peak IOP may be as useful as daytime mean IOP and, in most cases, as 24-h peak IOP in helping to guide long-term treatment in POAG.

Subject(s)

Glaucoma, Open-Angle/physiopathology , Intraocular Pressure/physiology , Aged , Disease Progression , Female , Follow-Up Studies , Humans , Male , Middle Aged , Multivariate Analysis , Regression Analysis , Retrospective Studies , Risk Factors , Time Factors

The Gene Expression Barcode: leveraging public data repositories to begin cataloging the human and murine transcriptomes.

McCall, Matthew N; Uppal, Karan; Jaffee, Harris A; Zilliox, Michael J; Irizarry, Rafael A.

Nucleic Acids Res ; 39(Database issue): D1011-5, 2011 Jan.

Article in English | MEDLINE | ID: mdl-21177656

ABSTRACT

Various databases have harnessed the wealth of publicly available microarray data to address biological questions ranging from across-tissue differential expression to homologous gene expression. Despite their practical value, these databases rely on relative measures of expression and are unable to address the most fundamental question--which genes are expressed in a given cell type. The Gene Expression Barcode is the first database to provide reliable absolute measures of expression for most annotated genes for 131 human and 89 mouse tissue types, including diseased tissue. This is made possible by a novel algorithm that leverages information from the GEO and ArrayExpress public repositories to build statistical models that permit converting data from a single microarray into expressed/unexpressed calls for each gene. For selected platforms, users may upload data and obtain results in a matter of seconds. The raw data, curated annotation, and code used to create our resource are also available at http://rafalab.jhsph.edu/barcode.
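The abstract does not spell out the statistical model, but the basic step of turning data from one array into expressed/unexpressed calls can be sketched in Python under a simplifying assumption: for each gene, the mean and standard deviation of its intensity when unexpressed are taken as already estimated from GEO/ArrayExpress arrays, and a gene is called expressed when its z-score against that distribution exceeds a cutoff. The gene names, parameter values and cutoff below are hypothetical and are not taken from the Barcode resource.

```python
def barcode_calls(expr, unexpr_mean, unexpr_sd, z_cutoff):
    """Convert one array's gene-level values into (z-score, call) pairs.

    Assumes (hypothetically) that unexpr_mean and unexpr_sd hold, for each
    gene, the parameters of its "unexpressed" intensity distribution,
    estimated beforehand from public data. Call 1 = expressed, 0 = unexpressed.
    """
    calls = {}
    for gene, value in expr.items():
        # Distance from the unexpressed distribution, in standard deviations.
        z = (value - unexpr_mean[gene]) / unexpr_sd[gene]
        calls[gene] = (z, 1 if z > z_cutoff else 0)
    return calls


# Hypothetical log2 values for two genes on a single array.
calls = barcode_calls(
    expr={"GeneA": 10.0, "GeneB": 4.5},
    unexpr_mean={"GeneA": 4.0, "GeneB": 4.0},
    unexpr_sd={"GeneA": 1.0, "GeneB": 0.5},
    z_cutoff=5.0,
)
print(calls)
```

Because the per-gene parameters are fixed in advance, a single uploaded array can be processed on its own, which is consistent with the abstract's point that results come back in seconds.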

Subject(s)

Databases, Genetic , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Animals , Humans , Mice , Software

Redefining CpG islands using hidden Markov models.

Wu, Hao; Caffo, Brian; Jaffee, Harris A; Irizarry, Rafael A; Feinberg, Andrew P.

Biostatistics ; 11(3): 499-514, 2010 Jul.

Article in English | MEDLINE | ID: mdl-20212320

ABSTRACT

The DNA of most vertebrates is depleted in CpG dinucleotides: a C followed by a G in the 5' to 3' direction. CpGs are the target for DNA methylation, a chemical modification of cytosine (C) heritable during cell division and the most well-characterized epigenetic mechanism. The remaining CpGs tend to cluster in regions referred to as CpG islands (CGI). Knowing CGI locations is important because they mark functionally relevant epigenetic loci in development and disease. For various mammals, including human, a readily available and widely used list of CGI is available from the UCSC Genome Browser. This list was derived using algorithms that search for regions satisfying a definition of CGI proposed by Gardiner-Garden and Frommer more than 20 years ago. Recent findings, enabled by advances in technology that permit direct measurement of epigenetic endpoints at a whole-genome scale, motivate the need to adapt the current CGI definition. In this paper, we propose a procedure, guided by hidden Markov models, that permits an extensible approach to detecting CGI. The main advantage of our approach over others is that it summarizes the evidence for CGI status as probability scores. This provides flexibility in the definition of a CGI and facilitates the creation of CGI lists for other species. The utility of this approach is demonstrated by generating the first CGI lists for invertebrates, and by the fact that we can create CGI lists that substantially increase overlap with recently discovered epigenetic marks. A CGI list and the probability scores, as a function of genome location, for each species are available at http://www.rafalab.org.
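For reference, the Gardiner-Garden and Frommer definition mentioned in this abstract is commonly stated as a region of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6, where observed/expected = (CpG count × length) / (C count × G count). The Python sketch below checks a single sequence against those thresholds. It illustrates the classic rule-based definition only, not the hidden Markov model procedure proposed in the paper, and the function name is hypothetical.

```python
def gardiner_garden_frommer(seq, min_length=200, min_gc=0.5, min_obs_exp=0.6):
    """Return (GC content, observed/expected CpG ratio, meets criteria)."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    # "CG" cannot overlap itself, so a non-overlapping count finds every CpG.
    cpg = seq.count("CG")
    gc_content = (c + g) / n if n else 0.0
    # Observed CpGs relative to the number expected from base composition.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    is_cgi = n >= min_length and gc_content > min_gc and obs_exp > min_obs_exp
    return gc_content, obs_exp, is_cgi


# A 200-bp toy sequence of repeated CpGs passes; an AT-rich one does not.
print(gardiner_garden_frommer("CG" * 100))
print(gardiner_garden_frommer("AT" * 150))
```

Because every criterion is a hard cutoff, a region just below any threshold is rejected outright; the probability scores described in the abstract are meant to replace this all-or-nothing decision.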

Subject(s)

CpG Islands/genetics , Epigenesis, Genetic/genetics , Markov Chains , Models, Genetic , Models, Statistical , Genome, Human/genetics , Humans

Comparison of Affymetrix GeneChip expression measures.

Irizarry, Rafael A; Wu, Zhijin; Jaffee, Harris A.

Bioinformatics ; 22(7): 789-94, 2006 Apr 01.

Article in English | MEDLINE | ID: mdl-16410320

ABSTRACT

MOTIVATION: In the Affymetrix GeneChip system, preprocessing occurs before one obtains expression level measurements. Because the number of competing preprocessing methods was large and growing, we developed a benchmark to help users identify the best method for their application. A webtool was made available for developers to benchmark their procedures. At the time of writing over 50 methods had been submitted. RESULTS: We benchmarked 31 probe set algorithms using a U95A dataset of spike-in controls. Using this dataset, we found that background correction, one of the main steps in preprocessing, has the largest effect on performance. In particular, background correction appears to improve accuracy but, in general, worsen precision. The benchmark results put this balance in perspective. Furthermore, we have improved some of the original benchmark metrics to provide more detailed information regarding precision and accuracy. A handful of methods stand out as providing the best balance using spike-in data with the older U95A array, although different experiments on more current arrays may benchmark differently. AVAILABILITY: The affycomp package, now version 1.5.2, continues to be available as part of the Bioconductor project (http://www.bioconductor.org). The webtool continues to be available at http://affycomp.biostat.jhsph.edu. CONTACT: rafa@jhu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
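The accuracy/precision trade-off described for background correction can be made concrete with two simple metrics on spike-in data, sketched in Python: accuracy as the least-squares slope of observed versus nominal log2 concentration (a slope of 1 means a change in concentration is fully reflected in the measure) and precision as the average standard deviation across replicate arrays. These are simplified stand-ins chosen for illustration, not the exact affycomp metric definitions, and all numbers below are made up.

```python
from statistics import mean, stdev


def signal_slope(nominal_log2, observed_log2):
    """Least-squares slope of observed vs nominal log2 concentration (1 is ideal)."""
    mx, my = mean(nominal_log2), mean(observed_log2)
    sxy = sum((x - mx) * (y - my) for x, y in zip(nominal_log2, observed_log2))
    sxx = sum((x - mx) ** 2 for x in nominal_log2)
    return sxy / sxx


def replicate_sd(replicates):
    """Average per-gene standard deviation across replicate arrays (lower is better)."""
    return mean(stdev(values) for values in replicates)


nominal = [0.0, 1.0, 2.0, 3.0, 4.0]
compressed = [5.0, 5.5, 6.0, 6.5, 7.0]  # toy measure that understates changes
uncompressed = [3.0, 4.0, 5.0, 6.0, 7.0]  # toy measure that tracks them fully
print(signal_slope(nominal, compressed), signal_slope(nominal, uncompressed))
print(replicate_sd([[5.0, 5.1, 4.9], [6.0, 6.0, 6.0]]))
```

With these toy numbers the compressed measure has a slope of 0.5 and the uncompressed one a slope of 1.0; a method can score better on one metric and worse on the other, which is the balance the benchmark is meant to expose.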

Subject(s)

Algorithms , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Benchmarking , Oligonucleotide Array Sequence Analysis/instrumentation , Reproducibility of Results , Software

Patients' perceptions of the value of current vision: assessment of preference values among patients with subfoveal choroidal neovascularization--The Submacular Surgery Trials Vision Preference Value Scale: SST Report No. 6.

Bass, Eric B; Marsh, Marsha J; Mangione, Carol M; Bressler, Neil M; Childs, Ashley L; Dong, Li Ming; Hawkins, Barbara S; Jaffee, Harris A; Miskala, Päivi.

Arch Ophthalmol ; 122(12): 1856-67, 2004 Dec.

Article in English | MEDLINE | ID: mdl-15596591

ABSTRACT

OBJECTIVE: To improve understanding and awareness of the impact of subfoveal choroidal neovascularization (CNV) on health-related quality of life, we sought to measure the preference value that patients with subfoveal CNV assigned to their health and vision status. PATIENTS AND METHODS: Patients with subfoveal CNV completed telephone interviews about their quality of life prior to enrollment and random treatment assignment in the Submacular Surgery Trials, a set of multicenter randomized controlled trials evaluating outcomes of submacular surgery compared with observation. The interviewers asked patients to rate their current vision on a scale from 0 (completely blind) to 100 (perfect vision). The interviewers also asked them to rate complete blindness and then perfect vision, assuming their health otherwise was the same as it was at the time of the interview, on a scale from 0 (dead) to 100 (perfect health with perfect vision). Scores were converted to a 0 to 1 preference value scale for health and vision status, where 0 represents death and 1 represents perfect health and vision. RESULTS: Of 1015 participants enrolled in the Submacular Surgery Trials, 996 completed interviews that included the rating questions, and 792 (80%) answered all 3 rating questions in a manner permitting calculation of a single overall preference value for their current health and vision status on a scale from 0 (dead) to 1 (perfect). The mean preference value was 0.64 (median, 0.68; interquartile range, 0.51-0.80). The preference values correlated with age (Pearson correlation coefficient, -0.11; P = .002), patients' self-rated perception of overall health (Spearman correlation coefficient, 0.36; P<.001), and self-reported perception of vision (Spearman correlation coefficient, 0.47; P<.001). 
The preference values were significantly lower with poorer visual acuity in the better eye and greater evidence of dysfunction on either the Hospital Anxiety and Depression Scale or the Physical or Mental Component Summary scales of the Short Form-36 Health Survey but did not differ significantly by gender or other baseline characteristics such as race, treatment assignment, or size of the CNV lesion. CONCLUSIONS: Vision loss from subfoveal CNV is associated with patient preference values that are as low as or lower than values previously reported for other serious medical conditions such as dialysis-dependent renal failure and AIDS, indicating that both unilateral and bilateral CNV have a profound impact on how patients feel about their overall health-related quality of life.

Subject(s)

Attitude to Health , Choroidal Neovascularization/psychology , Patient Satisfaction , Patients/psychology , Quality of Life/psychology , Vision, Ocular/physiology , Adult , Aged , Aged, 80 and over , Blindness/psychology , Choroidal Neovascularization/surgery , Cross-Sectional Studies , Female , Fovea Centralis , Health Status , Humans , Male , Middle Aged , Surveys and Questionnaires

A benchmark for Affymetrix GeneChip expression measures.

Cope, Leslie M; Irizarry, Rafael A; Jaffee, Harris A; Wu, Zhijin; Speed, Terence P.

Bioinformatics ; 20(3): 323-31, 2004 Feb 12.

Article in English | MEDLINE | ID: mdl-14960458

ABSTRACT

MOTIVATION: The defining feature of oligonucleotide expression arrays is the use of several probes to assay each targeted transcript. This is a bonanza for the statistical geneticist, who can create probeset summaries with specific characteristics. There are now several methods available for summarizing probe level data from the popular Affymetrix GeneChips, but it is difficult to identify the best method for a given inquiry. RESULTS: We have developed a graphical tool to evaluate summaries of Affymetrix probe level data. Plots and summary statistics offer a picture of how an expression measure performs in several important areas. This picture facilitates the comparison of competing expression measures and the selection of methods suitable for a specific investigation. The key is a benchmark data set consisting of a dilution study and a spike-in study. Because the truth is known for these data, we can identify statistical features of the data for which the expected outcome is known in advance. Those features highlighted in our suite of graphs are justified by questions of biological interest and motivated by the presence of appropriate data.

Subject(s)

Algorithms , Gene Expression Profiling/methods , Gene Expression Profiling/standards , Oligonucleotide Array Sequence Analysis/methods , Oligonucleotide Array Sequence Analysis/standards , Software , User-Computer Interface , Benchmarking/methods , Computer Graphics , Gene Expression Profiling/instrumentation , Information Storage and Retrieval/methods , Information Storage and Retrieval/standards , Oligonucleotide Array Sequence Analysis/instrumentation , Reference Standards , Reproducibility of Results , Sensitivity and Specificity , Sequence Analysis, DNA/instrumentation , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards , United States
