Search | VHL Regional Portal

1.

On a Structural Similarity Index Approach for Floating-Point Data.

Baker, Allison H; Pinard, Alexander; Hammerling, Dorit M.

IEEE Trans Vis Comput Graph ; PP2023 Nov 15.

Article in English | MEDLINE | ID: mdl-37966929

ABSTRACT

Data visualization is typically a critical component of post-processing analysis workflows for floating-point output data from large simulation codes, such as global climate models. For example, images are often created from the raw data as a means for evaluation against a reference dataset or image. While the popular Structural Similarity Index Measure (SSIM) is a useful tool for such image comparisons, generating large numbers of images can be costly when simulation data volumes are substantial. In fact, computational cost considerations motivated our development of an alternative to the SSIM, which we refer to as the Data SSIM (DSSIM). The DSSIM is conceptually similar to the SSIM, but can be applied directly to the floating-point data as a means of assessing data quality. We present the DSSIM in the context of quantifying differences due to lossy compression on large volumes of simulation data from a popular climate model. Bypassing image creation results in a sizeable performance gain for this case study. In addition, we show that the DSSIM is useful in terms of avoiding plot-specific (but data-independent) choices that can affect the SSIM. While our work is motivated by and evaluated with climate model output data, the DSSIM may prove useful for other applications involving large volumes of simulation data.

2.

Toward Multiscale Measurement-Informed Methane Inventories: Reconciling Bottom-Up Site-Level Inventories with Top-Down Measurements Using Continuous Monitoring Systems.

Daniels, William S; Wang, Jiayang Lyra; Ravikumar, Arvind P; Harrison, Matthew; Roman-White, Selina A; George, Fiji C; Hammerling, Dorit M.

Environ Sci Technol ; 57(32): 11823-11833, 2023 08 15.

Article in English | MEDLINE | ID: mdl-37506319

ABSTRACT

Government policies and corporate strategies aimed at reducing methane emissions from the oil and gas sector increasingly rely on measurement-informed, site-level emission inventories, as conventional bottom-up inventories poorly capture temporal variability and the heavy-tailed nature of methane emissions. This work is based on an 11-month methane measurement campaign at oil and gas production sites. We find that operator-level top-down methane measurements are lower during the end-of-project phase than during the baseline phase. However, gaps persist between end-of-project top-down measurements and bottom-up site-level inventories, which we reconcile with high-frequency data from continuous monitoring systems (CMS). Specifically, we use CMS to (i) validate specific snapshot measurements and determine how they relate to the temporal emission profile of a given site and (ii) create a measurement-informed, site-level inventory that can be validated with top-down measurements to update conventional bottom-up inventories. This work presents a real-world demonstration of how to reconcile CMS rate estimates and top-down snapshot measurements jointly with bottom-up inventories at the site level. More broadly, it demonstrates the importance of multiscale measurements when creating measurement-informed, site-level emission inventories, which is a critical aspect of recent regulatory requirements in the Inflation Reduction Act, voluntary methane initiatives such as the Oil and Gas Methane Partnership 2.0, and corporate strategies.

Subject(s)

Air Pollutants , Methane , Methane/analysis , Natural Gas/analysis , Air Pollutants/analysis

3.

Data-driven evaluation of the Boston marathon qualifying times.

Albrecht, Laura; Ring-Jarvi, Ross; Hammerling, Dorit.

PLoS One ; 18(4): e0283851, 2023.

Article in English | MEDLINE | ID: mdl-37075050

ABSTRACT

The Boston Marathon is one of the most prestigious running races in the world. From its inception in 1897, popularity grew to a point in 1970 where qualifying times were implemented to cap the number of participants. Currently, women's qualifying times in each age group are thirty minutes slower than the men's qualifying times equating to a 16.7% adjustment for the 18-34 age group, decreasing with age to a 10.4% adjustment for the 80+ age group. This setup somewhat counter-intuitively implies that women become faster with age relative to men. We present a data-driven approach to determine qualifying standards that lead to an equal proportion of qualifiers in each age category and gender. We had to exclude the 75-79 and 80+ age groups from analysis due to limited data. When minimizing the difference in proportion of men and women qualifying, the women's times for the 65-69 and 70-74 age groups are 4-5 minutes slower than the current qualifying standard, while they are 0 to 3 minutes faster for all other age groups.

Subject(s)

Marathon Running , Running , Male , Humans , Female , Boston , Sex Factors , Time

4.

Multiscale Methane Measurements at Oil and Gas Facilities Reveal Necessary Frameworks for Improved Emissions Accounting.

Wang, Jiayang Lyra; Daniels, William S; Hammerling, Dorit M; Harrison, Matthew; Burmaster, Kaylyn; George, Fiji C; Ravikumar, Arvind P.

Environ Sci Technol ; 56(20): 14743-14752, 2022 10 18.

Article in English | MEDLINE | ID: mdl-36201663

ABSTRACT

Methane mitigation from the oil and gas (O&G) sector represents a key near-term global climate action opportunity. Recent legislation in the United States requires updating current methane reporting programs for oil and gas facilities with empirical data. While technological advances have led to improvements in methane emissions measurements and monitoring, the overall effectiveness of mitigation strategies rests on quantifying spatially and temporally varying methane emissions more accurately than the current approaches. In this work, we demonstrate a quantification, monitoring, reporting, and verification framework that pairs snapshot measurements with continuous emissions monitoring systems (CEMS) to reconcile measurements with inventory estimates and account for intermittent emission events. We find that site-level emissions exhibit significant intraday and daily emission variations. Snapshot measurements of methane can span over 3 orders of magnitude and may have limited application in developing annualized inventory estimates at the site level. Consequently, while official inventories underestimate methane emissions on average, emissions at individual facilities can be higher or lower than inventory estimates. Using CEMS, we characterize distributions of frequency and duration of intermittent emission events. Technologies that allow high sampling frequency such as CEMS, paired with a mechanistic understanding of facility-level events, are key to an accurate accounting of short-duration, episodic, and high-volume events that are often missed in snapshot surveys and to scale snapshot measurements to annualized emissions estimates.

Subject(s)

Air Pollutants , Natural Gas , Air Pollutants/analysis , Methane/analysis , Natural Gas/analysis , Sulfides , United States , United States Environmental Protection Agency

5.

Advancing data compression via noise detection.

Hammerling, Dorit M; Baker, Allison H.

Nat Comput Sci ; 1(11): 711-712, 2021 Nov.

Article in English | MEDLINE | ID: mdl-38217144

6.

A Case Study Competition Among Methods for Analyzing Large Spatial Data.

Heaton, Matthew J; Datta, Abhirup; Finley, Andrew O; Furrer, Reinhard; Guinness, Joseph; Guhaniyogi, Rajarshi; Gerber, Florian; Gramacy, Robert B; Hammerling, Dorit; Katzfuss, Matthias; Lindgren, Finn; Nychka, Douglas W; Sun, Furong; Zammit-Mangion, Andrew.

J Agric Biol Environ Stat ; 24(3): 398-425, 2019.

Article in English | MEDLINE | ID: mdl-31496633

ABSTRACT

The Gaussian process is an indispensable tool for spatial data analysts. The onset of the "big data" era, however, has lead to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low-rank structures and/or multi-core and multi-threaded computing environments to facilitate computation. This study provides, first, an introductory overview of several methods for analyzing large spatial data. Second, this study describes the results of a predictive competition among the described methods as implemented by different groups with strong expertise in the methodology. Specifically, each research group was provided with two training datasets (one simulated and one observed) along with a set of prediction locations. Each group then wrote their own implementation of their method to produce predictions at the given location and each was subsequently run on a common computing environment. The methods were then compared in terms of various predictive diagnostics. Supplementary materials regarding implementation details of the methods and code are available for this article online. ELECTRONIC SUPPLEMENTARY MATERIAL: Supplementary materials for this article are available at 10.1007/s13253-018-00348-w.

7.

Completing the results of the 2013 Boston marathon.

Hammerling, Dorit; Cefalu, Matthew; Cisewski, Jessi; Dominici, Francesca; Parmigiani, Giovanni; Paulson, Charles; Smith, Richard L.

PLoS One ; 9(4): e93800, 2014.

Article in English | MEDLINE | ID: mdl-24727904

ABSTRACT

The 2013 Boston marathon was disrupted by two bombs placed near the finish line. The bombs resulted in three deaths and several hundred injuries. Of lesser concern, in the immediate aftermath, was the fact that nearly 6,000 runners failed to finish the race. We were approached by the marathon's organizers, the Boston Athletic Association (BAA), and asked to recommend a procedure for projecting finish times for the runners who could not complete the race. With assistance from the BAA, we created a dataset consisting of all the runners in the 2013 race who reached the halfway point but failed to finish, as well as all runners from the 2010 and 2011 Boston marathons. The data consist of split times from each of the 5 km sections of the course, as well as the final 2.2 km (from 40 km to the finish). The statistical objective is to predict the missing split times for the runners who failed to finish in 2013. We set this problem in the context of the matrix completion problem, examples of which include imputing missing data in DNA microarray experiments, and the Netflix prize problem. We propose five prediction methods and create a validation dataset to measure their performance by mean squared error and other measures. The best method used local regression based on a K-nearest-neighbors algorithm (KNN method), though several other methods produced results of similar quality. We show how the results were used to create projected times for the 2013 runners and discuss potential for future application of the same methodology. We present the whole project as an example of reproducible research, in that we are able to make the full data and all the algorithms we have used publicly available, which may facilitate future research extending the methods or proposing completely different approaches.

Subject(s)

Running/physiology , Humans , Physical Endurance/physiology , Sports

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL