Search | VHL Regional Portal

1.

Finding Regions of Interest in Whole Slide Images Using Multiple Instance Learning.

Afonso, Martim; Bhawsar, Praphulla M S; Saha, Monjoy; Almeida, Jonas S; Oliveira, Arlindo L.

ArXiv ; 2024 Apr 11.

Article in English | MEDLINE | ID: mdl-38903738

ABSTRACT

Whole Slide Images (WSI), obtained by high-resolution digital scanning of microscope slides at multiple scales, are the cornerstone of modern Digital Pathology. However, they represent a particular challenge to AI-based/AI-mediated analysis because pathology labeling is typically done at slide-level, instead of tile-level. Not only are medical diagnoses recorded at the specimen level; the detection of oncogene mutations is also experimentally obtained, and recorded by initiatives like The Cancer Genome Atlas (TCGA), at the slide level. This poses a dual challenge: a) accurately predicting the overall cancer phenotype and b) finding out what cellular morphologies are associated with it at the tile level. To address these challenges, a weakly supervised Multiple Instance Learning (MIL) approach was explored for two prevalent cancer types, Invasive Breast Carcinoma (TCGA-BRCA) and Lung Squamous Cell Carcinoma (TCGA-LUSC). This approach was explored for tumor detection at low magnification levels and for TP53 mutation detection at various levels. Our results show that a novel additive implementation of MIL matched the performance of the reference implementation (AUC 0.96), and was only slightly outperformed by Attention MIL (AUC 0.97). More interestingly from the perspective of the molecular pathologist, these different AI architectures show distinct sensitivities to morphological features (through the detection of Regions of Interest, RoI) at different magnification levels. Tellingly, TP53 mutation was most sensitive to features at the higher magnifications where cellular morphology is resolved.

2.

FAIR privacy-preserving operation of large genomic variant calling format (VCF) data without download or installation.

Martins, Yasmmin C; Bhawsar, Praphulla Ms; Balasubramanian, Jeya B; Russ, Daniel; Wong, Wendy Sw; Maass, Wolfgang; Almeida, Jonas S.

AMIA Jt Summits Transl Sci Proc ; 2024: 65-74, 2024.

Article in English | MEDLINE | ID: mdl-38827109

ABSTRACT

Motivation: The proliferation of genetic testing and consumer genomics represents a logistic challenge to the personalized use of GWAS data in VCF format. Specifically, the challenge is retrieving target genetic variation from large compressed files filled with unrelated variation information. Compounding the data traversal challenge, privacy-sensitive VCF files are typically managed as large stand-alone single files (no companion index file) composed of variable-sized compressed chunks, hosted in consumer-facing environments with no native support for hosted execution. Results: A portable JavaScript module was developed to support in-browser fetching of partial content using byte-range requests. This includes on-the-fly decompression of irregularly positioned compressed chunks, coupled with a binary search algorithm that iteratively identifies chromosome-position ranges. The in-browser zero-footprint solution (no downloads, no installations) enables the interoperability, reusability, and user-facing governance advanced by the FAIR principles for stewardship of scientific data. Availability: https://episphere.github.io/vcf, including supplementary material.
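
The range-based binary search described in this abstract can be illustrated with a minimal Python sketch. It is not the published JavaScript module. It assumes an uncompressed, header-free, single-chromosome file sorted by position, and it replaces the HTTP byte-range request with a slice of an in-memory buffer. The decompression of irregularly positioned chunks is left out entirely. All function names are hypothetical.

```python
from typing import Optional, Tuple

def read_range(data: bytes, start: int, length: int) -> bytes:
    """Stand-in for an HTTP byte-range request (assumption: no real network I/O)."""
    return data[start:start + length]

def read_line(data: bytes, start: int, window: int = 64) -> bytes:
    """Fetch successive ranges from `start` until a newline or end of file.

    Returns the bytes read, including the newline when one was found.
    """
    buf = b""
    while True:
        chunk = read_range(data, start + len(buf), window)
        buf += chunk
        nl = buf.find(b"\n")
        if nl != -1:
            return buf[:nl + 1]
        if not chunk:  # end of file reached without a newline
            return buf

def first_record_at(data: bytes, offset: int) -> Optional[Tuple[int, int, int]]:
    """Return (line_start, next_line_start, position) for the first complete
    record that starts at or after `offset`, or None if there is none."""
    start = offset
    if offset > 0:
        # Look one byte back so an offset sitting exactly on a line start is kept.
        partial = read_line(data, offset - 1)
        if not partial.endswith(b"\n"):
            return None
        start = offset - 1 + len(partial)
    line = read_line(data, start)
    if not line.strip():
        return None
    position = int(line.split(b"\t")[1])  # second column holds POS
    return start, start + len(line), position

def find_first_at_or_after(data: bytes, target: int) -> Optional[Tuple[int, int, int]]:
    """Binary search over byte offsets for the first record with POS >= target."""
    lo, hi = 0, len(data)
    while lo < hi:
        mid = (lo + hi) // 2
        rec = first_record_at(data, mid)
        if rec is None or rec[2] >= target:
            hi = mid      # the answer starts at or before the record seen from mid
        else:
            lo = rec[1]   # every record up to this one is too small
    return first_record_at(data, lo)

# Small synthetic file: CHROM, POS, ID separated by tabs.
positions = [100, 250, 300, 1000, 5000, 72000]
data = b"".join(f"chr1\t{p}\trs{i}\n".encode() for i, p in enumerate(positions))
hit = find_first_at_or_after(data, 301)
```

Each probe reads only a small window near the midpoint, so the number of range reads grows with the logarithm of the file size rather than with the file size itself.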

3.

PRScalc, a privacy-preserving calculation of raw polygenic risk scores from direct-to-consumer genomics data.

Sandoval, Lorena; Jafri, Saleet; Balasubramanian, Jeya Balaji; Bhawsar, Praphulla; Edelson, Jacob L; Martins, Yasmmin; Maass, Wolfgang; Chanock, Stephen J; Garcia-Closas, Montserrat; Almeida, Jonas S.

Bioinform Adv ; 3(1): vbad145, 2023.

Article in English | MEDLINE | ID: mdl-37868335

ABSTRACT

Motivation: Currently, the Polygenic Score (PGS) Catalog curates over 400 publications on over 500 traits corresponding to over 3000 polygenic risk scores (PRSs). To assess the feasibility of privately calculating the underlying multivariate relative risk for individuals with consumer genomics data, we developed an in-browser PRS calculator for genomic data that does not circulate any data or engage in any computation outside of the user's personal device. Results: A prototype personal risk score calculator, created for research purposes, was developed to demonstrate how the PGS Catalog can be privately and easily applied to readily available direct-to-consumer genetic testing services, such as 23andMe. No software download, installation, or configuration is needed. The PRS web calculator matches individual PGS Catalog entries with an individual's 23andMe genome data composed of 600k to 1.4 M single-nucleotide polymorphisms (SNPs). Beta coefficients provide researchers with a convenient assessment of risk associated with matched SNPs. This in-browser application was tested on a variety of personal devices, including smartphones, establishing the feasibility of privately calculating personal risk scores with up to a few thousand reference genetic variations and from the full 23andMe SNP data file (compressed or not). Availability and implementation: The PRScalc web application is developed in JavaScript, HTML, and CSS and is available at https://episphere.github.io/prs under an MIT license. The datasets were derived from sources in the public domain: [PGS Catalog, Personal Genome Project].
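
The matching step described in this abstract can be sketched in Python as follows. The sketch assumes the conventional definition of a raw polygenic risk score, the sum over matched SNPs of the beta coefficient times the number of effect alleles carried. The abstract does not confirm that PRScalc computes exactly this. The function name, the input layout, and the toy values are hypothetical.

```python
from typing import Dict, Tuple

def raw_prs(
    genotypes: Dict[str, str],
    weights: Dict[str, Tuple[str, float]],
) -> Tuple[float, int]:
    """Return (score, matched) for one individual.

    genotypes: rsID -> two-letter genotype call, e.g. "AG" (assumed layout).
    weights:   rsID -> (effect allele, beta) taken from one scoring entry.
    """
    score = 0.0
    matched = 0
    for rsid, (effect_allele, beta) in weights.items():
        genotype = genotypes.get(rsid)
        if genotype is None:
            continue  # variant in the scoring entry but absent from the genome file
        matched += 1
        dosage = genotype.count(effect_allele)  # 0, 1, or 2 copies
        score += beta * dosage
    return score, matched

# Toy data: three genotyped SNPs, three scored SNPs, two of them shared.
genotypes = {"rs1": "AG", "rs2": "CC", "rs3": "TT"}
weights = {"rs1": ("A", 0.5), "rs2": ("C", 0.25), "rs4": ("G", 0.1)}
score, matched = raw_prs(genotypes, weights)
```

Returning the number of matched variants alongside the score helps show how much of a scoring entry the genotype file actually covers.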

4.

epiDonate - distributed serverless data infrastructure for epidemiological studies.

Almeida, Jonas S; Patel, Bhaumik; Russ, Daniel E; Bhawsar, Praphulla; Maurice, Pedro F Saint-; Matthews, Charles; Anand, Adit; Ferguson, Martin; Johnson, Davin; Brotzman, Michelle; Gerlanc, Nicole; Chanock, Stephen; de Gonzalez, Amy Berrington; Gaudet, Mia; Garcia-Closas, Montserrat.

AMIA Jt Summits Transl Sci Proc ; 2023: 25-31, 2023.

Article in English | MEDLINE | ID: mdl-37350888

ABSTRACT

Motivation: Epidemiological studies face two important challenges: the need to ingest ever more complex data types, and mounting concerns about participant privacy and data governance. These two challenges are compounded by the expectation that data infrastructure will eventually need to facilitate cross-registration of participants by multiple epidemiological studies. Implementation: The portable web-service epiDonate was developed using the serverless model known as FaaS (Function-as-a-Service). The reference implementation uses Node.js. The implementation relies on a simple tokenization scheme, mediated by a public API, that a) distinguishes admin from participant roles, with b) extensible permission configuration operating a read/write structure. General Features: The critical design feature of epiDonate is the absence of business logic on the server-side (the web service). This simplicity removes the need to customize virtual machines and enables ecosystems of multiple web applications backed by one or more data donation deployments. Availability: https://episphere.github.io/donate.

5.

Browser-based Data Annotation, Active Learning, and Real-Time Distribution of Artificial Intelligence Models: From Tumor Tissue Microarrays to COVID-19 Radiology.

Bhawsar, Praphulla M S; Abubakar, Mustapha; Schmidt, Marjanka K; Camp, Nicola J; Cessna, Melissa H; Duggan, Máire A; García-Closas, Montserrat; Almeida, Jonas S.

J Pathol Inform ; 12: 38, 2021.

Article in English | MEDLINE | ID: mdl-34760334

ABSTRACT

BACKGROUND: Artificial intelligence (AI) is fast becoming the tool of choice for scalable and reliable analysis of medical images. However, constraints in sharing medical data outside the institutional or geographical space, as well as difficulties in getting AI models and modeling platforms to work across different environments, have led to a "reproducibility crisis" in digital medicine. METHODS: This study details the implementation of a web platform that can be used to mitigate these challenges by orchestrating a digital pathology AI pipeline, from raw data to model inference, entirely on the local machine. We discuss how this federated platform provides governed access to data by consuming the Application Program Interfaces exposed by cloud storage services, allows the addition of user-defined annotations, facilitates active learning for training models iteratively, and provides model inference computed directly in the web browser at practically zero cost. The latter is of particular relevance to clinical workflows because the code, including the AI model, travels to the user's data, which stays private to the governance domain where it was acquired. RESULTS: We demonstrate that the web browser can be a means of democratizing AI and advancing data socialization in medical imaging backed by consumer-facing cloud infrastructure such as Box.com. As a case study, we test the accompanying platform end-to-end on a large dataset of digital breast cancer tissue microarray core images. We also showcase how it can be applied in contexts separate from digital pathology by applying it to a radiology dataset containing COVID-19 computed tomography images. CONCLUSIONS: The platform described in this report resolves the challenges to the findable, accessible, interoperable, reusable stewardship of data and AI models by integrating with cloud storage to maintain user-centric governance over the data. It also enables distributed, federated computation for AI inference over those data and proves the viability of client-side AI in medical imaging. AVAILABILITY: The open-source application is publicly available at , with a short video demonstration at .

6.

Genomic and evolutionary classification of lung cancer in never smokers.

Zhang, Tongwu; Joubert, Philippe; Ansari-Pour, Naser; Zhao, Wei; Hoang, Phuc H; Lokanga, Rachel; Moye, Aaron L; Rosenbaum, Jennifer; Gonzalez-Perez, Abel; Martínez-Jiménez, Francisco; Castro, Andrea; Muscarella, Lucia Anna; Hofman, Paul; Consonni, Dario; Pesatori, Angela C; Kebede, Michael; Li, Mengying; Gould Rothberg, Bonnie E; Peneva, Iliana; Schabath, Matthew B; Poeta, Maria Luana; Costantini, Manuela; Hirsch, Daniela; Heselmeyer-Haddad, Kerstin; Hutchinson, Amy; Olanich, Mary; Lawrence, Scott M; Lenz, Petra; Duggan, Maire; Bhawsar, Praphulla M S; Sang, Jian; Kim, Jung; Mendoza, Laura; Saini, Natalie; Klimczak, Leszek J; Islam, S M Ashiqul; Otlu, Burcak; Khandekar, Azhar; Cole, Nathan; Stewart, Douglas R; Choi, Jiyeon; Brown, Kevin M; Caporaso, Neil E; Wilson, Samuel H; Pommier, Yves; Lan, Qing; Rothman, Nathaniel; Almeida, Jonas S; Carter, Hannah; Ried, Thomas.

Nat Genet ; 53(9): 1348-1359, 2021 09.

Article in English | MEDLINE | ID: mdl-34493867

ABSTRACT

Lung cancer in never smokers (LCINS) is a common cause of cancer mortality but its genomic landscape is poorly characterized. Here high-coverage whole-genome sequencing of 232 LCINS showed 3 subtypes defined by copy number aberrations. The dominant subtype (piano), which is rare in lung cancer in smokers, features somatic UBA1 mutations, germline AR variants and stem cell-like properties, including low mutational burden, high intratumor heterogeneity, long telomeres, frequent KRAS mutations and slow growth, as suggested by the occurrence of cancer drivers' progenitor cells many years before tumor diagnosis. The other subtypes are characterized by specific amplifications and EGFR mutations (mezzo-forte) and whole-genome doubling (forte). No strong tobacco smoking signatures were detected, even in cases with exposure to secondhand tobacco smoke. Genes within the receptor tyrosine kinase-Ras pathway had distinct impacts on survival; five genomic alterations independently doubled mortality. These findings create avenues for personalized treatment in LCINS.

Subject(s)

DNA Copy Number Variations/genetics , Lung Neoplasms/genetics , Lung Neoplasms/pathology , Non-Smokers/statistics & numerical data , Adult , Aged , Aged, 80 and over , ErbB Receptors/genetics , Female , Genome/genetics , Genome-Wide Association Study , Humans , Male , Middle Aged , Neoplastic Stem Cells/pathology , Proto-Oncogene Proteins p21(ras)/genetics , Receptors, Androgen/genetics , Risk Factors , Smoking/genetics , Ubiquitin-Activating Enzymes/genetics , Whole Genome Sequencing , Young Adult

7.

Mortality Tracker: the COVID-19 case for real time web APIs as epidemiology commons.

Almeida, Jonas S; Shiels, Meredith; Bhawsar, Praphulla; Patel, Bhaumik; Nemeth, Erika; Moffitt, Richard; Closas, Montserrat Garcia; Freedman, Neal; Berrington, Amy.

Bioinformatics ; 37(14): 2073-2074, 2021 08 04.

Article in English | MEDLINE | ID: mdl-33135727

ABSTRACT

MOTIVATION: Mortality Tracker is an in-browser application for data wrangling, analysis, dissemination and visualization of public time series of mortality in the United States. It was developed in response to requests by epidemiologists for portable real time assessment of the effect of COVID-19 on other causes of death and all-cause mortality. This is performed by comparing 2020 real time values with observations from the same week in the previous 5 years, and by enabling the extraction of temporal snapshots of mortality series that facilitate modeling the interdependence between its causes. RESULTS: Our solution employs a scalable 'Data Commons at Web Scale' approach that abstracts all stages of the data cycle as in-browser components. Specifically, the data wrangling computation, not just the orchestration of data retrieval, takes place in the browser, without any requirement to download or install software. This approach, where operations that would normally be computed server-side are mapped to in-browser SDKs, is sometimes loosely described as Web APIs, a designation adopted here. AVAILABILITY AND IMPLEMENTATION: https://episphere.github.io/mortalitytracker; webcast demo: youtu.be/ZsvCe7cZzLo. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

COVID-19 , Computers , Humans , Information Storage and Retrieval , SARS-CoV-2 , Software
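
For entry 7 (Mortality Tracker), the same-week comparison described in the abstract can be sketched in Python. The abstract does not say how the previous five years are summarized. Using their mean as a baseline and reporting their minimum and maximum is an assumption made here for illustration, as are the function name, the input layout, and the toy counts.

```python
from statistics import mean
from typing import Dict

Series = Dict[int, Dict[int, int]]  # year -> week number -> death count

def compare_to_prior_years(
    deaths: Series, year: int = 2020, n_prior: int = 5
) -> Dict[int, Dict[str, float]]:
    """For each week of `year`, compare the count with the same week in prior years."""
    result: Dict[int, Dict[str, float]] = {}
    for week, count in sorted(deaths.get(year, {}).items()):
        prior = [
            deaths[y][week]
            for y in range(year - n_prior, year)
            if week in deaths.get(y, {})
        ]
        if not prior:
            continue  # no earlier observations for this week
        baseline = mean(prior)  # assumption: mean of the prior years as baseline
        result[week] = {
            "observed": count,
            "baseline": baseline,
            "prior_min": min(prior),
            "prior_max": max(prior),
            "excess": count - baseline,
        }
    return result

# Toy weekly counts, not real data.
deaths = {
    2015: {14: 100, 15: 98},
    2016: {14: 110, 15: 102},
    2017: {14: 90, 15: 100},
    2018: {14: 105, 15: 99},
    2019: {14: 95, 15: 101},
    2020: {14: 130, 15: 150},
}
summary = compare_to_prior_years(deaths)
```

Keeping the prior minimum and maximum next to the baseline makes it easy to see whether an observed week falls outside the range of the earlier years.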
