
FAIR privacy-preserving operation of large genomic variant calling format (VCF) data without download or installation.

Martins, Yasmmin C; Bhawsar, Praphulla MS; Balasubramanian, Jeya B; Russ, Daniel; Wong, Wendy SW; Maass, Wolfgang; Almeida, Jonas S.

AMIA Jt Summits Transl Sci Proc ; 2024: 65-74, 2024.

Article in English | MEDLINE | ID: mdl-38827109

ABSTRACT

Motivation: The proliferation of genetic testing and consumer genomics represents a logistic challenge to the personalized use of GWAS data in VCF format: specifically, the challenge of retrieving target genetic variation from large compressed files filled with unrelated variation information. Compounding the data traversal challenge, privacy-sensitive VCF files are typically managed as large stand-alone single files (no companion index file) composed of variable-sized compressed chunks, hosted in consumer-facing environments with no native support for hosted execution. Results: A portable JavaScript module was developed to support in-browser fetching of partial content using byte-range requests. This includes on-the-fly decompression of irregularly positioned compressed chunks, coupled with a binary search algorithm that iteratively identifies chromosome-position ranges. The in-browser zero-footprint solution (no downloads, no installations) enables the interoperability, reusability, and user-facing governance advanced by the FAIR principles for stewardship of scientific data. Availability: https://episphere.github.io/vcf, including supplementary material.
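
The combination of byte-range requests and binary search described in the abstract can be illustrated with a minimal sketch. This is not the published module's code: the function names (`rangeHeader`, `compareKeys`, `narrow`) and the `probe` callback are hypothetical. `probe(offset)` is assumed to fetch a small byte range starting at `offset`, decompress the chunk it lands in, and resolve to the chromosome and position of the first complete record at or after that offset (or `null` if there is none). Chromosomes are assumed to be mapped to integers in file order.

```javascript
// Header for an inclusive byte range, in the form used by HTTP range requests.
function rangeHeader(start, end) {
  return { Range: `bytes=${start}-${end}` };
}

// Order keys by chromosome index first, then by position.
function compareKeys(a, b) {
  return a.chrom !== b.chrom ? a.chrom - b.chrom : a.pos - b.pos;
}

// Halve the byte interval [lo, hi) until it is at most `windowSize` bytes wide.
// `probe` is the hypothetical callback described above; `windowSize` must be >= 1.
async function narrow(fileSize, target, probe, windowSize = 65536) {
  let lo = 0;
  let hi = fileSize;
  while (hi - lo > windowSize) {
    const mid = lo + Math.floor((hi - lo) / 2);
    const key = await probe(mid);
    if (key === null || compareKeys(key, target) > 0) {
      hi = mid; // first record at or after mid is past the target
    } else {
      lo = mid; // first record at or after mid is at or before the target
    }
  }
  return { start: lo, end: hi };
}
```

Under these assumptions, a caller would fetch the returned window with `fetch(url, { headers: rangeHeader(start, end - 1) })` and scan the decompressed records for the target. With repeated keys, the window ends near the last matching record, so a real implementation would likely scan backward as well.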

Selective model averaging with Bayesian rule learning for predictive biomedicine.

Balasubramanian, Jeya B; Visweswaran, Shyam; Cooper, Gregory F; Gopalakrishnan, Vanathi.

AMIA Jt Summits Transl Sci Proc ; 2014: 17-22, 2014.

Article in English | MEDLINE | ID: mdl-25717394

ABSTRACT

Accurate disease classification and biomarker discovery remain challenging tasks in biomedicine. In this paper, we develop and test a practical approach to combining evidence from multiple models when making predictions using selective Bayesian model averaging of probabilistic rules. This method is implemented within a Bayesian Rule Learning system and compared to model selection when applied to twelve biomedical datasets using the area under the ROC curve measure of performance. Cross-validation results indicate that selective Bayesian model averaging statistically significantly outperforms model selection on average in these experiments, suggesting that combining predictions from multiple models may lead to more accurate quantification of classifier uncertainty. This approach would directly impact the generation of robust predictions on unseen test data, while also increasing knowledge for biomarker discovery and mechanisms that underlie disease.
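
The general idea of averaging predictions over a selected subset of models can be sketched as follows. This is an illustration only, not the Bayesian Rule Learning system's implementation: the selection rule (keep the `k` highest-scoring models) and the field names `logScore` and `prob` are assumptions. `logScore` stands for a log posterior score known up to a constant, and `prob` for the model's predicted probability of the positive class.

```javascript
// Average predicted probabilities over the k highest-scoring models,
// weighting each by its normalized posterior score (assumed selection rule).
function selectiveAverage(models, k) {
  const top = [...models].sort((a, b) => b.logScore - a.logScore).slice(0, k);
  if (top.length === 0) {
    throw new Error("at least one model is required");
  }
  // Subtract the largest log score before exponentiating for numerical stability.
  const maxLog = top[0].logScore;
  const weights = top.map((m) => Math.exp(m.logScore - maxLog));
  const total = weights.reduce((sum, w) => sum + w, 0);
  return top.reduce((sum, m, i) => sum + (weights[i] / total) * m.prob, 0);
}
```

With `k = 1` this reduces to ordinary model selection, which is the baseline the abstract compares against.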
