Search | VHL Regional Portal

1.

Wasm-iCARE: a portable and privacy-preserving web module to build, validate, and apply absolute risk models.

Balasubramanian, Jeya Balaji; Choudhury, Parichoy Pal; Mukhopadhyay, Srijon; Ahearn, Thomas; Chatterjee, Nilanjan; García-Closas, Montserrat; Almeida, Jonas S.

JAMIA Open ; 7(2): ooae055, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38938691

ABSTRACT

Objectives: Absolute risk models estimate an individual's future disease risk over a specified time interval. Applications utilizing server-side risk tooling, the R-based iCARE (R-iCARE), to build, validate, and apply absolute risk models, face limitations in portability and privacy due to their need for circulating user data in remote servers for operation. We overcome this by porting iCARE to the web platform. Materials and Methods: We refactored R-iCARE into a Python package (Py-iCARE) and then compiled it to WebAssembly (Wasm-iCARE), a portable web module that operates within the privacy of the user's device. Results: We showcase the portability and privacy of Wasm-iCARE through 2 applications: for researchers to statistically validate risk models and to deliver them to end-users. Both applications run entirely on the client side, requiring no downloads or installations, and keep user data on-device during risk calculation. Conclusions: Wasm-iCARE fosters accessible and privacy-preserving risk tools, accelerating their validation and delivery.

2.

FAIR privacy-preserving operation of large genomic variant calling format (VCF) data without download or installation.

Martins, Yasmmin C; Bhawsar, Praphulla Ms; Balasubramanian, Jeya B; Russ, Daniel; Wong, Wendy Sw; Maass, Wolfgang; Almeida, Jonas S.

AMIA Jt Summits Transl Sci Proc ; 2024: 65-74, 2024.

Article in English | MEDLINE | ID: mdl-38827109

ABSTRACT

Motivation: The proliferation of genetic testing and consumer genomics represents a logistic challenge to the personalized use of GWAS data in VCF format. Specifically, the challenge is retrieving target genetic variation from large compressed files filled with unrelated variation information. Compounding the data traversal challenge, privacy-sensitive VCF files are typically managed as large stand-alone single files (no companion index file) composed of variable-sized compressed chunks, hosted in consumer-facing environments with no native support for hosted execution. Results: A portable JavaScript module was developed to support in-browser fetching of partial content using byte-range requests. This includes on-the-fly decompression of irregularly positioned compressed chunks, coupled with a binary search algorithm iteratively identifying chromosome-position ranges. The in-browser zero-footprint solution (no downloads, no installations) enables the interoperability, reusability, and user-facing governance advanced by the FAIR principles for stewardship of scientific data. Availability: https://episphere.github.io/vcf, including supplementary material.
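The idea of combining partial reads with a binary search over compressed chunks can be illustrated with a minimal Python sketch. This is not the published JavaScript module: the helper names (`build_archive`, `read_range`, `find_chunk`) are hypothetical, the chunk boundaries are assumed to be known in advance (the abstract describes locating irregularly positioned chunks on the fly, without an index), an in-memory slice stands in for an HTTP byte-range request, and all records are assumed to sit on a single chromosome in sorted order.

```python
import gzip

def build_archive(chunks):
    """Concatenate independently gzip-compressed chunks.

    Returns the archive bytes and a list of (offset, length) spans.
    Knowing the spans up front is an assumption of this sketch.
    """
    blob, spans = b"", []
    for lines in chunks:
        comp = gzip.compress("\n".join(lines).encode())
        spans.append((len(blob), len(comp)))
        blob += comp
    return blob, spans

def read_range(blob, offset, length):
    """Stand-in for an HTTP byte-range request on a remote file."""
    return blob[offset:offset + length]

def first_position(blob, span):
    """Decompress one chunk and return the position of its first record."""
    text = gzip.decompress(read_range(blob, *span)).decode()
    first_line = text.split("\n", 1)[0]
    return int(first_line.split("\t")[1])  # column 2 holds the position

def find_chunk(blob, spans, target_pos):
    """Binary search for the last chunk whose first position <= target_pos."""
    lo, hi = 0, len(spans) - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if first_position(blob, spans[mid]) <= target_pos:
            lo = mid
        else:
            hi = mid - 1
    return lo

if __name__ == "__main__":
    demo = [
        ["1\t100\t.", "1\t200\t."],
        ["1\t300\t.", "1\t400\t."],
        ["1\t500\t."],
    ]
    blob, spans = build_archive(demo)
    print(find_chunk(blob, spans, 350))
```

Each probe decompresses only one chunk, so the number of partial reads grows with the logarithm of the chunk count rather than with file size.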

3.

Finding Regions of Interest in Whole Slide Images Using Multiple Instance Learning.

Afonso, Martim; Bhawsar, Praphulla M S; Saha, Monjoy; Almeida, Jonas S; Oliveira, Arlindo L.

ArXiv ; 2024 Apr 11.

Article in English | MEDLINE | ID: mdl-38903738

ABSTRACT

Whole Slide Images (WSI), obtained by high-resolution digital scanning of microscope slides at multiple scales, are the cornerstone of modern Digital Pathology. However, they represent a particular challenge to AI-based/AI-mediated analysis because pathology labeling is typically done at slide-level, instead of tile-level. It is not just that medical diagnostics is recorded at the specimen level; the detection of oncogene mutation is also experimentally obtained, and recorded by initiatives like The Cancer Genome Atlas (TCGA), at the slide level. This configures a dual challenge: a) accurately predicting the overall cancer phenotype and b) finding out what cellular morphologies are associated with it at the tile level. To address these challenges, a weakly supervised Multiple Instance Learning (MIL) approach was explored for two prevalent cancer types, Invasive Breast Carcinoma (TCGA-BRCA) and Lung Squamous Cell Carcinoma (TCGA-LUSC). This approach was explored for tumor detection at low magnification levels and TP53 mutations at various levels. Our results show that a novel additive implementation of MIL matched the performance of the reference implementation (AUC 0.96), and was only slightly outperformed by Attention MIL (AUC 0.97). More interestingly from the perspective of the molecular pathologist, these different AI architectures identify distinct sensitivities to morphological features (through the detection of Regions of Interest, RoI) at different magnification levels. Tellingly, TP53 mutation was most sensitive to features at the higher magnifications where cellular morphology is resolved.

4.

mSigSDK - private, at scale, computation of mutation signatures.

Ge, Aaron; Zhang, Tongwu; Martins, Yasmmin Côrtes; Landi, Maria Teresa; Park, Brian; Chen, Kailing; Balasubramanian, Jeya; Almeida, Jonas S.

ArXiv ; 2024 Jan 19.

Article in English | MEDLINE | ID: mdl-38327678

ABSTRACT

In our previous work, we demonstrated that it is feasible to perform analysis on mutation signature data without the need for downloads or installations and analyze individual patient data at scale without compromising privacy. Building on this foundation, we developed an in-browser Software Development Kit (a JavaScript SDK), mSigSDK, to facilitate the orchestration of distributed data processing workflows and graphic visualization of mutational signature analysis results. We strictly adhered to modern web computing standards, particularly the modularization standards set by the ECMAScript ES6 framework (JavaScript modules). Our approach allows for the computation to be entirely performed by secure delegation to the computational resources of the user's own machine (in-browser), without any downloads or installations. The mSigSDK was developed primarily as a companion library to the mSig Portal resource of the National Cancer Institute Division of Cancer Epidemiology and Genetics (NIH/NCI/DCEG), with a focus on FAIR extensibility as components of other researchers' own data science constructs. Anticipated extensions include the programmatic operation of other mutation signature API ecosystems such as SIGNAL and COSMIC, advancing towards a data commons for mutational signature research (Grossman et al., 2016).

5.

Image-based multiplex immune profiling of cancer tissues: translational implications. A report of the International Immuno-oncology Biomarker Working Group on Breast Cancer.

Jahangir, Chowdhury Arif; Page, David B; Broeckx, Glenn; Gonzalez, Claudia A; Burke, Caoimbhe; Murphy, Clodagh; Reis-Filho, Jorge S; Ly, Amy; Harms, Paul W; Gupta, Rajarsi R; Vieth, Michael; Hida, Akira I; Kahila, Mohamed; Kos, Zuzana; van Diest, Paul J; Verbandt, Sara; Thagaard, Jeppe; Khiroya, Reena; Abduljabbar, Khalid; Acosta Haab, Gabriela; Acs, Balazs; Adams, Sylvia; Almeida, Jonas S; Alvarado-Cabrero, Isabel; Azmoudeh-Ardalan, Farid; Badve, Sunil; Baharun, Nurkhairul Bariyah; Bellolio, Enrique R; Bheemaraju, Vydehi; Blenman, Kim Rm; Botinelly Mendonça Fujimoto, Luciana; Burgues, Octavio; Chardas, Alexandros; Cheang, Maggie Chon U; Ciompi, Francesco; Cooper, Lee Ad; Coosemans, An; Corredor, Germán; Dantas Portela, Flavio Luis; Deman, Frederik; Demaria, Sandra; Dudgeon, Sarah N; Elghazawy, Mahmoud; Fernandez-Martín, Claudio; Fineberg, Susan; Fox, Stephen B; Giltnane, Jennifer M; Gnjatic, Sacha; Gonzalez-Ericsson, Paula I; Grigoriadis, Anita.

J Pathol ; 262(3): 271-288, 2024 03.

Article in English | MEDLINE | ID: mdl-38230434

ABSTRACT

Recent advances in the field of immuno-oncology have brought transformative changes in the management of cancer patients. The immune profile of tumours has been found to have key value in predicting disease prognosis and treatment response in various cancers. Multiplex immunohistochemistry and immunofluorescence have emerged as potent tools for the simultaneous detection of multiple protein biomarkers in a single tissue section, thereby expanding opportunities for molecular and immune profiling while preserving tissue samples. By establishing the phenotype of individual tumour cells when distributed within a mixed cell population, the identification of clinically relevant biomarkers with high-throughput multiplex immunophenotyping of tumour samples has great potential to guide appropriate treatment choices. Moreover, the emergence of novel multi-marker imaging approaches can now provide unprecedented insights into the tumour microenvironment, including the potential interplay between various cell types. However, there are significant challenges to widespread integration of these technologies in daily research and clinical practice. This review addresses the challenges and potential solutions within a structured framework of action from a regulatory and clinical trial perspective. New developments within the field of immunophenotyping using multiplexed tissue imaging platforms and associated digital pathology are also described, with a specific focus on translational implications across different subtypes of cancer. © 2024 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.

Subject(s)

Breast Neoplasms , Humans , Female , Biomarkers, Tumor/genetics , Prognosis , Phenotype , United Kingdom , Tumor Microenvironment

6.

Associations between actigraphy-measured sleep duration, continuity, and timing with mortality in the UK Biobank.

Saint-Maurice, Pedro F; Freeman, Joshua R; Russ, Daniel; Almeida, Jonas S; Shams-White, Marissa M; Patel, Shreya; Wolff-Hughes, Dana L; Watts, Eleanor L; Loftfield, Erikka; Hong, Hyokyoung G; Moore, Steven C; Matthews, Charles E.

Sleep ; 47(3), 2024 Mar 11.

Article in English | MEDLINE | ID: mdl-38066693

ABSTRACT

STUDY OBJECTIVES: To examine the associations between sleep duration, continuity, timing, and mortality using actigraphy among adults. METHODS: Data were from a cohort of 88 282 adults (40-69 years) in UK Biobank that wore a wrist-worn triaxial accelerometer for 7 days. Actigraphy data were processed to generate estimates of sleep duration and other sleep characteristics including wake after sleep onset (WASO), number of 5-minute awakenings, and midpoint for sleep onset/wake-up and the least active 5 hours (L5). Data were linked to mortality outcomes with follow-up to October 31, 2021. We implemented Cox models (hazard ratio, confidence intervals [HR, 95% CI]) to quantify sleep associations with mortality. Models were adjusted for demographics, lifestyle factors, and medical conditions. RESULTS: Over an average of 6.8 years 2973 deaths occurred (1700 cancer, 586 CVD deaths). Overall sleep duration was significantly associated with risk for all-cause (p < 0.01), cancer (p < 0.01), and CVD (p = 0.03) mortality. For example, when compared to sleep durations of 7.0 hrs/d, durations of 5 hrs/d were associated with a 29% higher risk for all-cause mortality (HR: 1.29 [1.09, 1.52]). WASO and number of awakenings were not associated with mortality. Individuals with L5 early or late midpoints (<2:30 or ≥ 3:30) had a ~20% higher risk for all-cause mortality, compared to those with intermediate L5 midpoints (3:00-3:29; p ≤ 0.01; e.g. HR ≥ 3:30: 1.19 [1.07, 1.32]). CONCLUSIONS: Shorter sleep duration and both early and late sleep timing were associated with a higher mortality risk. These findings reinforce the importance of public health efforts to promote healthy sleep patterns in adults.

Subject(s)

Cardiovascular Diseases , Neoplasms , Adult , Humans , Actigraphy , Sleep Duration , UK Biobank , Biological Specimen Banks , Sleep

7.

Actigraphy-derived measures of sleep and risk of prostate cancer in the UK Biobank.

Freeman, Joshua R; Saint-Maurice, Pedro F; Watts, Eleanor L; Moore, Steven C; Shams-White, Marissa M; Wolff-Hughes, Dana L; Russ, Daniel E; Almeida, Jonas S; Caporaso, Neil E; Hong, Hyokyoung G; Loftfield, Erikka; Matthews, Charles E.

J Natl Cancer Inst ; 116(3): 434-444, 2024 Mar 07.

Article in English | MEDLINE | ID: mdl-38013591

ABSTRACT

BACKGROUND: Studies of sleep and prostate cancer are almost entirely based on self-report, with limited research using actigraphy. Our goal was to evaluate actigraphy-measured sleep and prostate cancer and to expand on findings from prior studies of self-reported sleep. METHODS: We prospectively examined 34 260 men without a history of prostate cancer in the UK Biobank. Sleep characteristics were measured over 7 days using actigraphy. We calculated sleep duration, onset, midpoint, wake-up time, social jetlag (difference in weekend-weekday sleep midpoints), sleep efficiency (percentage of time spent asleep between onset and wake-up time), and wakefulness after sleep onset. Cox proportional hazards models were used to estimate covariate-adjusted hazards ratios (HRs) and 95% confidence intervals (CIs). RESULTS: Over 7.6 years, 1152 men were diagnosed with prostate cancer. Sleep duration was not associated with prostate cancer risk. Sleep midpoint earlier than 4:00 am was not associated with prostate cancer risk, though sleep midpoint of 5:00 am or later was suggestively associated with lower prostate cancer risk but had limited precision (earlier than 4:00 am vs 4:00-4:59 am HR = 1.00, 95% CI = 0.87 to 1.16; 5:00 am or later vs 4:00-4:59 am HR = 0.79, 95% CI = 0.57 to 1.10). Social jetlag was not associated with greater prostate cancer risk (1 to <2 hours vs <1 hour HR = 1.06, 95% CI = 0.89 to 1.25; ≥2 hours vs <1 hour HR = 0.90, 95% CI = 0.65 to 1.26). Compared with men who averaged less than 30 minutes of wakefulness after sleep onset per day, men with 60 minutes or more had a higher risk of prostate cancer (HR = 1.20, 95% CI = 1.00 to 1.43). CONCLUSIONS: Of the sleep characteristics studied, higher wakefulness after sleep onset, a measure of poor sleep quality, was associated with greater prostate cancer risk. Replication of our findings between wakefulness after sleep onset and prostate cancer is warranted.

Subject(s)

Actigraphy , Prostatic Neoplasms , Male , Humans , UK Biobank , Biological Specimen Banks , Sleep , Prostatic Neoplasms/epidemiology

8.

MedicaidJS: a FAIR approach to real-time drug analytics.

Agarwal, Kunaal; Kim, Hae Rin; Almeida, Jonas S; Sandoval, Lorena.

Bioinform Adv ; 3(1): vbad170, 2023.

Article in English | MEDLINE | ID: mdl-38075478

ABSTRACT

Motivation: As prescription drug prices have drastically risen over the past decade, so has the need for real-time drug tracking resources. In spite of increased public availability of raw data sources, individual drug metrics remain concealed behind intricate nomenclature and complex data models. Some web applications, such as GoodRX, provide insight into real-time drug prices but offer limited interoperability. To overcome both obstacles we pursued the direct programmatic operation of the stateless Application Programming Interfaces (HTTP REST APIs) maintained by the Food and Drug Administration (FDA), Medicaid, and National Library of Medicine. These data-intensive resources represent an opportunity to develop Software Development Kits (SDK) to streamline drug metrics without downloads or installations, in a manner that addresses the FAIR principles for stewardship in scientific data: Findability, Accessibility, Interoperability, and Reusability. These principles provide a guideline for continual stewardship of scientific data. Results: MedicaidJS SDK was developed to orchestrate API calls to three complementary data resources: Medicaid (data.medicaid.gov), Food and Drug Administration (open.fda.gov), and the National Library of Medicine RxNorm (lhncbc.nlm.nih.gov/RxNav). MedicaidJS synthesizes response data from each platform into a zero-footprint JavaScript modular library that provides data wrangling, analysis, and generation of embeddable interactive visualizations. The SDK is served on GitHub with live examples on ObservableHQ notebooks. It is freely available and can be embedded into web applications as modules returning structured JSON data with standardized identifiers. Availability and implementation: Open source code publicly available at https://github.com/episphere/medicaid, live at episphere.github.io/medicaid, supplementary interactive Observable Notebooks at observablehq.com/@medicaidsdk/medicaidsdk.

9.

EpiVECS: exploring spatiotemporal epidemiological data using cluster embedding and interactive visualization.

Mason, Lee; Hicks, Blànaid; Almeida, Jonas S.

Sci Rep ; 13(1): 21193, 2023 Dec 01.

Article in English | MEDLINE | ID: mdl-38040776

ABSTRACT

The analysis of data over space and time is a core part of descriptive epidemiology, but the complexity of spatiotemporal data makes this challenging. There is a need for methods that simplify the exploration of such data for tasks such as surveillance and hypothesis generation. In this paper, we use combined clustering and dimensionality reduction methods (hereafter referred to as 'cluster embedding' methods) to spatially visualize patterns in epidemiological time-series data. We compare several cluster embedding techniques to see which performs best along a variety of internal cluster validation metrics. We find that methods based on k-means clustering generally perform better than self-organizing maps on real world epidemiological data, with some minor exceptions. We also introduce EpiVECS, a tool which allows the user to perform cluster embedding and explore the results using interactive visualization. EpiVECS is available as a privacy preserving, in-browser open source web application at https://episphere.github.io/epivecs.

10.

PRScalc, a privacy-preserving calculation of raw polygenic risk scores from direct-to-consumer genomics data.

Sandoval, Lorena; Jafri, Saleet; Balasubramanian, Jeya Balaji; Bhawsar, Praphulla; Edelson, Jacob L; Martins, Yasmmin; Maass, Wolfgang; Chanock, Stephen J; Garcia-Closas, Montserrat; Almeida, Jonas S.

Bioinform Adv ; 3(1): vbad145, 2023.

Article in English | MEDLINE | ID: mdl-37868335

ABSTRACT

Motivation: Currently, the Polygenic Score (PGS) Catalog curates over 400 publications on over 500 traits corresponding to over 3000 polygenic risk scores (PRSs). To assess the feasibility of privately calculating the underlying multivariate relative risk for individuals with consumer genomics data, we developed an in-browser PRS calculator for genomic data that does not circulate any data or engage in any computation outside of the user's personal device. Results: A prototype personal risk score calculator, created for research purposes, was developed to demonstrate how the PGS Catalog can be privately and readily applied to data from widely available direct-to-consumer genetic testing services, such as 23andMe. No software download, installation, or configuration is needed. The PRS web calculator matches individual PGS Catalog entries with an individual's 23andMe genome data composed of 600k to 1.4 M single-nucleotide polymorphisms (SNPs). Beta coefficients provide researchers with a convenient assessment of risk associated with matched SNPs. This in-browser application was tested in a variety of personal devices, including smartphones, establishing the feasibility of privately calculating personal risk scores with up to a few thousand reference genetic variations and from the full 23andMe SNP data file (compressed or not). Availability and implementation: The PRScalc web application is developed in JavaScript, HTML, and CSS and is available on GitHub (https://episphere.github.io/prs) under an MIT license. The datasets were derived from sources in the public domain: [PGS Catalog, Personal Genome Project].
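The matching of scoring-file entries against a genotype file can be sketched as a weighted sum of effect-allele counts. The Python below is an illustration only, not PRScalc's code: the function name `raw_prs` and the dictionary layouts are hypothetical, genotypes are assumed to be two-letter calls keyed by rsID, calls containing "-" are treated as missing, and strand or allele-harmonization issues are ignored.

```python
def raw_prs(weights, genotype):
    """Sum beta * (number of effect alleles) over variants present in both inputs.

    weights:  rsID -> (effect_allele, beta), as might be read from a scoring file
    genotype: rsID -> two-letter call such as "AG" (layout assumed for this sketch)
    Returns the raw score and the number of matched variants.
    """
    score, matched = 0.0, 0
    for rsid, (effect_allele, beta) in weights.items():
        call = genotype.get(rsid)
        if call is None or "-" in call:
            continue  # variant absent from the genotype file, or not called
        score += beta * call.count(effect_allele)
        matched += 1
    return score, matched

if __name__ == "__main__":
    weights = {"rs1": ("A", 0.5), "rs2": ("G", -0.2), "rs3": ("T", 0.1)}
    genotype = {"rs1": "AG", "rs2": "GG", "rs4": "CC"}
    print(raw_prs(weights, genotype))
```

Reporting the matched count alongside the score matters because a score built from only a fraction of the reference variants is not comparable to one built from all of them.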

11.

Wasm-iCARE: a portable and privacy-preserving web module to build, validate, and apply absolute risk models.

Balasubramanian, Jeya Balaji; Choudhury, Parichoy Pal; Mukhopadhyay, Srijon; Ahearn, Thomas; Chatterjee, Nilanjan; García-Closas, Montserrat; Almeida, Jonas S.

ArXiv ; 2023 Oct 13.

Article in English | MEDLINE | ID: mdl-37873020

ABSTRACT

Objective: Absolute risk models estimate an individual's future disease risk over a specified time interval. Applications utilizing server-side risk tooling, such as the R-based iCARE (R-iCARE), to build, validate, and apply absolute risk models, face serious limitations in portability and privacy due to their need for circulating user data in remote servers for operation. Our objective was to overcome these limitations. Materials and Methods: We refactored R-iCARE into a Python package (Py-iCARE) then compiled it to WebAssembly (Wasm-iCARE): a portable web module, which operates entirely within the privacy of the user's device. Results: We showcase the portability and privacy of Wasm-iCARE through two applications: for researchers to statistically validate risk models, and to deliver them to end-users. Both applications run entirely on the client-side, requiring no downloads or installations, and keep user data on-device during risk calculation. Conclusions: Wasm-iCARE fosters accessible and privacy-preserving risk tools, accelerating their validation and delivery.

12.

Quest markup for developing FAIR questionnaire modules for epidemiologic studies.

Russ, Daniel E; Gerlanc, Nicole M; Shen, Brian; Patel, Bhaumik; de González, Amy Berrington; Freedman, Neal D; Cusack, Julie M; Gaudet, Mia M; García-Closas, Montserrat; Almeida, Jonas S.

BMC Med Inform Decis Mak ; 23(1): 238, 2023 10 25.

Article in English | MEDLINE | ID: mdl-37880712

ABSTRACT

BACKGROUND: Online questionnaires are commonly used to collect information from participants in epidemiological studies. This requires building questionnaires using machine-readable formats that can be delivered to study participants using web-based technologies such as progressive web applications. However, the paucity of open-source markup standards with support for complex logic makes collaborative development of web-based questionnaire modules difficult. This often prevents interoperability and reusability of questionnaire modules across epidemiological studies. RESULTS: We developed an open-source markup language for presentation of questionnaire content and logic, Quest, within a real-time renderer that enables the user to test logic (e.g., skip patterns) and view the structure of data collection. We provide the Quest markup language, an in-browser markup rendering tool, a questionnaire development tool, and an example web application that embeds the renderer, developed for The Connect for Cancer Prevention Study. CONCLUSION: A markup language can specify both the content and logic of a questionnaire as plain text. Questionnaire markup, such as Quest, can become a standard format for storing questionnaires or sharing questionnaires across the web. Quest is a step towards generation of FAIR data in epidemiological studies by facilitating reusability of questionnaires and data interoperability using open-source tools.

Subject(s)

Software , Humans , Surveys and Questionnaires , Epidemiologic Studies

13.

Spatial analyses of immune cell infiltration in cancer: current methods and future directions: A report of the International Immuno-Oncology Biomarker Working Group on Breast Cancer.

Page, David B; Broeckx, Glenn; Jahangir, Chowdhury Arif; Verbandt, Sara; Gupta, Rajarsi R; Thagaard, Jeppe; Khiroya, Reena; Kos, Zuzana; Abduljabbar, Khalid; Acosta Haab, Gabriela; Acs, Balazs; Akturk, Guray; Almeida, Jonas S; Alvarado-Cabrero, Isabel; Azmoudeh-Ardalan, Farid; Badve, Sunil; Baharun, Nurkhairul Bariyah; Bellolio, Enrique R; Bheemaraju, Vydehi; Blenman, Kim Rm; Botinelly Mendonça Fujimoto, Luciana; Bouchmaa, Najat; Burgues, Octavio; Cheang, Maggie Chon U; Ciompi, Francesco; Cooper, Lee Ad; Coosemans, An; Corredor, Germán; Dantas Portela, Flavio Luis; Deman, Frederik; Demaria, Sandra; Dudgeon, Sarah N; Elghazawy, Mahmoud; Ely, Scott; Fernandez-Martín, Claudio; Fineberg, Susan; Fox, Stephen B; Gallagher, William M; Giltnane, Jennifer M; Gnjatic, Sacha; Gonzalez-Ericsson, Paula I; Grigoriadis, Anita; Halama, Niels; Hanna, Matthew G; Harbhajanka, Aparna; Hardas, Alexandros; Hart, Steven N; Hartman, Johan; Hewitt, Stephen; Hida, Akira I.

J Pathol ; 260(5): 514-532, 2023 08.

Article in English | MEDLINE | ID: mdl-37608771

ABSTRACT

Modern histologic imaging platforms coupled with machine learning methods have provided new opportunities to map the spatial distribution of immune cells in the tumor microenvironment. However, there exists no standardized method for describing or analyzing spatial immune cell data, and most reported spatial analyses are rudimentary. In this review, we provide an overview of two approaches for reporting and analyzing spatial data (raster versus vector-based). We then provide a compendium of spatial immune cell metrics that have been reported in the literature, summarizing prognostic associations in the context of a variety of cancers. We conclude by discussing two well-described clinical biomarkers, the breast cancer stromal tumor infiltrating lymphocytes score and the colon cancer Immunoscore, and describe investigative opportunities to improve clinical utility of these spatial biomarkers. © 2023 The Pathological Society of Great Britain and Ireland.

Subject(s)

Colonic Neoplasms , Humans , Biomarkers , Benchmarking , Lymphocytes, Tumor-Infiltrating , Spatial Analysis , Tumor Microenvironment

14.

Pitfalls in machine learning-based assessment of tumor-infiltrating lymphocytes in breast cancer: A report of the International Immuno-Oncology Biomarker Working Group on Breast Cancer.

Thagaard, Jeppe; Broeckx, Glenn; Page, David B; Jahangir, Chowdhury Arif; Verbandt, Sara; Kos, Zuzana; Gupta, Rajarsi; Khiroya, Reena; Abduljabbar, Khalid; Acosta Haab, Gabriela; Acs, Balazs; Akturk, Guray; Almeida, Jonas S; Alvarado-Cabrero, Isabel; Amgad, Mohamed; Azmoudeh-Ardalan, Farid; Badve, Sunil; Baharun, Nurkhairul Bariyah; Balslev, Eva; Bellolio, Enrique R; Bheemaraju, Vydehi; Blenman, Kim Rm; Botinelly Mendonça Fujimoto, Luciana; Bouchmaa, Najat; Burgues, Octavio; Chardas, Alexandros; Chon U Cheang, Maggie; Ciompi, Francesco; Cooper, Lee Ad; Coosemans, An; Corredor, Germán; Dahl, Anders B; Dantas Portela, Flavio Luis; Deman, Frederik; Demaria, Sandra; Doré Hansen, Johan; Dudgeon, Sarah N; Ebstrup, Thomas; Elghazawy, Mahmoud; Fernandez-Martín, Claudio; Fox, Stephen B; Gallagher, William M; Giltnane, Jennifer M; Gnjatic, Sacha; Gonzalez-Ericsson, Paula I; Grigoriadis, Anita; Halama, Niels; Hanna, Matthew G; Harbhajanka, Aparna; Hart, Steven N.

J Pathol ; 260(5): 498-513, 2023 08.

Article in English | MEDLINE | ID: mdl-37608772

ABSTRACT

The clinical significance of the tumor-immune interaction in breast cancer is now established, and tumor-infiltrating lymphocytes (TILs) have emerged as predictive and prognostic biomarkers for patients with triple-negative (estrogen receptor, progesterone receptor, and HER2-negative) breast cancer and HER2-positive breast cancer. How computational assessments of TILs might complement manual TIL assessment in trial and daily practices is currently debated. Recent efforts to use machine learning (ML) to automatically evaluate TILs have shown promising results. We review state-of-the-art approaches and identify pitfalls and challenges of automated TIL evaluation by studying the root cause of ML discordances in comparison to manual TIL quantification. We categorize our findings into four main topics: (1) technical slide issues, (2) ML and image analysis aspects, (3) data challenges, and (4) validation issues. The main reason for discordant assessments is the inclusion of false-positive areas or cells identified by performance on certain tissue patterns or design choices in the computational implementation. To aid the adoption of ML for TIL assessment, we provide an in-depth discussion of ML and image analysis, including validation issues that need to be considered before reliable computational reporting of TILs can be incorporated into the trial and routine clinical management of patients with triple-negative breast cancer. © 2023 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.

Subject(s)

Mammary Neoplasms, Animal , Triple Negative Breast Neoplasms , Humans , Animals , Lymphocytes, Tumor-Infiltrating , Biomarkers , Machine Learning

15.

epiDonate - distributed serverless data infrastructure for epidemiological studies.

Almeida, Jonas S; Patel, Bhaumik; Russ, Daniel E; Bhawsar, Praphulla; Saint-Maurice, Pedro F; Matthews, Charles; Anand, Adit; Ferguson, Martin; Johnson, Davin; Brotzman, Michelle; Gerlanc, Nicole; Chanock, Stephen; de Gonzalez, Amy Berrington; Gaudet, Mia; Garcia-Closas, Montserrat.

AMIA Jt Summits Transl Sci Proc ; 2023: 25-31, 2023.

Article in English | MEDLINE | ID: mdl-37350888

ABSTRACT

Motivation: Epidemiological studies face two important challenges: the need to ingest ever more complex data types, and mounting concerns about participant privacy and data governance. These two challenges are compounded by the expectation that data infrastructure will eventually need to facilitate cross-registration of participants by multiple epidemiological studies. Implementation: The portable web-service epiDonate was developed using the serverless model known as FaaS (Function-as-a-Service). The reference implementation uses Node.js. The implementation relies on a simple tokenization scheme, mediated by a public API, that a) distinguishes admin from participant roles, with b) extensible permission configuration operating a read/write structure. General Features: The critical design feature of epiDonate is the absence of business logic on the server-side (the web service). The simplicity removes the need to customize virtual machines and enables ecosystems of multiple web applications backed by one or more data donation deployments. Availability: https://episphere.github.io/donate.

16.

Interpretable, non-mechanistic forecasting using empirical dynamic modeling and interactive visualization.

Mason, Lee; Berrington de Gonzalez, Amy; Garcia-Closas, Montserrat; Chanock, Stephen J; Hicks, Blànaid; Almeida, Jonas S.

PLoS One ; 18(4): e0277149, 2023.

Article in English | MEDLINE | ID: mdl-37011060

ABSTRACT

Forecasting methods are notoriously difficult to interpret, particularly when the relationship between the data and the resulting forecasts is not obvious. Interpretability is an important property of a forecasting method because it allows the user to complement the forecasts with their own knowledge, a process which leads to more applicable results. In general, mechanistic methods are more interpretable than non-mechanistic methods, but they require explicit knowledge of the underlying dynamics. In this paper, we introduce EpiForecast, a tool which performs interpretable, non-mechanistic forecasts using interactive visualization and a simple, data-focused forecasting technique based on empirical dynamic modelling. EpiForecast's primary feature is a four-plot interactive dashboard which displays a variety of information to help the user understand how the forecasts are generated. In addition to point forecasts, the tool produces distributional forecasts using a kernel density estimation method; these are visualized using color gradients to produce a quick, intuitive visual summary of the estimated future. To keep the work FAIR and preserve privacy, we have released the tool as an entirely in-browser web-application.
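For readers unfamiliar with empirical dynamic modelling, one widely used technique in that family is simplex projection: embed the series in delay coordinates, find past states that resemble the current one, and average what happened next. The Python sketch below illustrates that general technique under stated assumptions; it is not taken from EpiForecast, and the function name `simplex_forecast`, the embedding dimension `E`, and the exponential weighting are choices made for this example.

```python
import math

def simplex_forecast(series, E=2, k=None):
    """One-step-ahead forecast by simplex projection (illustrative sketch).

    E is the embedding dimension; k is the number of neighbours (default E + 1).
    """
    k = k or E + 1
    # Delay vectors (x[t-E+1], ..., x[t]) for every t that has E observations.
    vectors = [series[t - E + 1:t + 1] for t in range(E - 1, len(series))]
    target = vectors[-1]  # the most recent state, whose successor is unknown
    # Library: earlier states paired with the value observed right after them.
    library = [(vectors[i], series[i + E]) for i in range(len(vectors) - 1)]
    nearest = sorted((math.dist(v, target), nxt) for v, nxt in library)[:k]
    d_min = nearest[0][0]
    if d_min == 0:
        # Exact matches exist: average their successors directly.
        exact = [nxt for d, nxt in nearest if d == 0]
        return sum(exact) / len(exact)
    # Closer neighbours receive exponentially larger weights.
    weights = [math.exp(-d / d_min) for d, _ in nearest]
    return sum(w * nxt for w, (_, nxt) in zip(weights, nearest)) / sum(weights)

if __name__ == "__main__":
    print(simplex_forecast([1, 2, 3, 1, 2, 3, 1, 2, 3], E=2))
```

The neighbours that drive a forecast are themselves observed stretches of the series, which is one reason methods of this kind lend themselves to visual inspection.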

17.

Moving Toward Findable, Accessible, Interoperable, Reusable Practices in Epidemiologic Research.

García-Closas, Montserrat; Ahearn, Thomas U; Gaudet, Mia M; Hurson, Amber N; Balasubramanian, Jeya Balaji; Choudhury, Parichoy Pal; Gerlanc, Nicole M; Patel, Bhaumik; Russ, Daniel; Abubakar, Mustapha; Freedman, Neal D; Wong, Wendy S W; Chanock, Stephen J; Berrington de Gonzalez, Amy; Almeida, Jonas S.

Am J Epidemiol ; 192(6): 995-1005, 2023 06 02.

Article in English | MEDLINE | ID: mdl-36804665

ABSTRACT

Data sharing is essential for reproducibility of epidemiologic research, replication of findings, pooled analyses in consortia efforts, and maximizing study value to address multiple research questions. However, barriers related to confidentiality, costs, and incentives often limit the extent and speed of data sharing. Epidemiological practices that follow Findable, Accessible, Interoperable, Reusable (FAIR) principles can address these barriers by making data resources findable with the necessary metadata, accessible to authorized users, and interoperable with other data, to optimize the reuse of resources with appropriate credit to its creators. We provide an overview of these principles and describe approaches for implementation in epidemiology. Increasing degrees of FAIRness can be achieved by moving data and code from on-site locations to remote, accessible ("Cloud") data servers, using machine-readable and nonproprietary files, and developing open-source code. Adoption of these practices will improve daily work and collaborative analyses and facilitate compliance with data sharing policies from funders and scientific journals. Achieving a high degree of FAIRness will require funding, training, organizational support, recognition, and incentives for sharing research resources, both data and code. However, these costs are outweighed by the benefits of making research more reproducible, impactful, and equitable by facilitating the reuse of precious research resources by the scientific community.

Subject(s)

Confidentiality , Information Dissemination , Humans , Reproducibility of Results , Software , Epidemiologic Studies

18.

PLCOjs, a FAIR GWAS web SDK for the NCI Prostate, Lung, Colorectal and Ovarian Cancer Genetic Atlas project.

Ruan, Eric; Nemeth, Erika; Moffitt, Richard; Sandoval, Lorena; Machiela, Mitchell J; Freedman, Neal D; Huang, Wen-Yi; Wong, Wendy; Chen, Kai-Ling; Park, Brian; Jiang, Kevin; Hicks, Belynda; Liu, Jia; Russ, Daniel; Minasian, Lori; Pinsky, Paul; Chanock, Stephen J; Garcia-Closas, Montserrat; Almeida, Jonas S.

Bioinformatics ; 38(18): 4434-4436, 2022 09 15.

Article in English | MEDLINE | ID: mdl-35900159

ABSTRACT

MOTIVATION: The Division of Cancer Epidemiology and Genetics (DCEG) and the Division of Cancer Prevention (DCP) at the National Cancer Institute (NCI) have recently generated genome-wide association study (GWAS) data for multiple traits in the Prostate, Lung, Colorectal, and Ovarian (PLCO) Genomic Atlas project. The GWAS included 110 000 participants. The dissemination of the genetic association data through a data portal called GWAS Explorer, in a manner that addresses the modern expectations of FAIR reusability by data scientists and engineers, is the main motivation for the development of the open-source JavaScript software development kit (SDK) reported here. RESULTS: The PLCO GWAS Explorer resource relies on a public stateless HTTP application programming interface (API) deployed as the sole backend service for both the landing page's web application and third-party analytical workflows. The core PLCOjs SDK is mapped to each of the API methods, and also to each of the reference graphic visualizations in the GWAS Explorer. A few additional visualization methods extend it. As is the norm with web SDKs, no download or installation is needed and modularization supports targeted code injection for web applications, reactive notebooks (Observable) and node-based web services. AVAILABILITY AND IMPLEMENTATION: code at https://github.com/episphere/plco; project page at https://episphere.github.io/plco.

Subject(s)

Colorectal Neoplasms , Ovarian Neoplasms , United States , Male , Humans , Female , Genome-Wide Association Study , National Cancer Institute (U.S.) , Prostate , Software , Ovarian Neoplasms/genetics , Lung

19.

Browser-based Data Annotation, Active Learning, and Real-Time Distribution of Artificial Intelligence Models: From Tumor Tissue Microarrays to COVID-19 Radiology.

Bhawsar, Praphulla M S; Abubakar, Mustapha; Schmidt, Marjanka K; Camp, Nicola J; Cessna, Melissa H; Duggan, Máire A; García-Closas, Montserrat; Almeida, Jonas S.

J Pathol Inform ; 12: 38, 2021.

Article in English | MEDLINE | ID: mdl-34760334

ABSTRACT

BACKGROUND: Artificial intelligence (AI) is fast becoming the tool of choice for scalable and reliable analysis of medical images. However, constraints in sharing medical data outside the institutional or geographical space, as well as difficulties in getting AI models and modeling platforms to work across different environments, have led to a "reproducibility crisis" in digital medicine. METHODS: This study details the implementation of a web platform that can be used to mitigate these challenges by orchestrating a digital pathology AI pipeline, from raw data to model inference, entirely on the local machine. We discuss how this federated platform provides governed access to data by consuming the Application Program Interfaces exposed by cloud storage services, allows the addition of user-defined annotations, facilitates active learning for training models iteratively, and provides model inference computed directly in the web browser at practically zero cost. The latter is of particular relevance to clinical workflows because the code, including the AI model, travels to the user's data, which stays private to the governance domain where it was acquired. RESULTS: We demonstrate that the web browser can be a means of democratizing AI and advancing data socialization in medical imaging backed by consumer-facing cloud infrastructure such as Box.com. As a case study, we test the accompanying platform end-to-end on a large dataset of digital breast cancer tissue microarray core images. We also showcase how it can be applied in contexts separate from digital pathology by applying it to a radiology dataset containing COVID-19 computed tomography images. CONCLUSIONS: The platform described in this report resolves the challenges to the findable, accessible, interoperable, reusable stewardship of data and AI models by integrating with cloud storage to maintain user-centric governance over the data. It also enables distributed, federated computation for AI inference over those data and proves the viability of client-side AI in medical imaging. AVAILABILITY: The open-source application is publicly available at , with a short video demonstration at .

20.

Racial and Ethnic Disparities in Excess Deaths During the COVID-19 Pandemic, March to December 2020.

Shiels, Meredith S; Haque, Anika T; Haozous, Emily A; Albert, Paul S; Almeida, Jonas S; García-Closas, Montserrat; Nápoles, Anna M; Pérez-Stable, Eliseo J; Freedman, Neal D; Berrington de González, Amy.

Ann Intern Med ; 174(12): 1693-1699, 2021 12.

Article in English | MEDLINE | ID: mdl-34606321

ABSTRACT

BACKGROUND: Although racial/ethnic disparities in U.S. COVID-19 death rates are striking, focusing on COVID-19 deaths alone may underestimate the true effect of the pandemic on disparities. Excess death estimates capture deaths both directly and indirectly caused by COVID-19. OBJECTIVE: To estimate U.S. excess deaths by racial/ethnic group. DESIGN: Surveillance study. SETTING: United States. PARTICIPANTS: All decedents. MEASUREMENTS: Excess deaths and excess deaths per 100 000 persons from March to December 2020 were estimated by race/ethnicity, sex, age group, and cause of death, using provisional death certificate data from the Centers for Disease Control and Prevention (CDC) and U.S. Census Bureau population estimates. RESULTS: An estimated 2.88 million deaths occurred between March and December 2020. Compared with the number of expected deaths based on 2019 data, 477 200 excess deaths occurred during this period, with 74% attributed to COVID-19. Age-standardized excess deaths per 100 000 persons among Black, American Indian/Alaska Native (AI/AN), and Latino males and females were more than double those in White and Asian males and females. Non-COVID-19 excess deaths also disproportionately affected Black, AI/AN, and Latino persons. Compared with White males and females, non-COVID-19 excess deaths per 100 000 persons were 2 to 4 times higher in Black, AI/AN, and Latino males and females, including deaths due to diabetes, heart disease, cerebrovascular disease, and Alzheimer disease. Excess deaths in 2020 resulted in substantial widening of racial/ethnic disparities in all-cause mortality from 2019 to 2020. LIMITATIONS: Completeness and availability of provisional CDC data; no estimates of precision around results. CONCLUSION: There were profound racial/ethnic disparities in excess deaths in the United States in 2020 during the COVID-19 pandemic, resulting in rapid increases in racial/ethnic disparities in all-cause mortality between 2019 and 2020. PRIMARY FUNDING SOURCE: National Institutes of Health Intramural Research Program.

Subject(s)

COVID-19/ethnology , COVID-19/mortality , Ethnic and Racial Minorities/statistics & numerical data , Health Status Disparities , Pandemics , Adolescent , Adult , Age Distribution , Aged , Aged, 80 and over , Cause of Death , Child , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Male , Middle Aged , Population Surveillance , SARS-CoV-2 , Sex Distribution , United States/epidemiology , Young Adult
