Search | VHL Regional Portal

Correction: Reproducible big data science: A case study in continuous FAIRness.

Madduri, Ravi; Chard, Kyle; D'Arcy, Mike; Jung, Segun C; Rodriguez, Alexis; Sulakhe, Dinanath; Deutsch, Eric; Funk, Cory; Heavner, Ben; Richards, Matthew; Shannon, Paul; Glusman, Gustavo; Price, Nathan; Kesselman, Carl; Foster, Ian.

PLoS One ; 18(11): e0294883, 2023.

Article in English | MEDLINE | ID: mdl-37988378

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pone.0213013.].

Building a collaborative cloud platform to accelerate heart, lung, blood, and sleep research.

Ahalt, Stan; Avillach, Paul; Boyles, Rebecca; Bradford, Kira; Cox, Steven; Davis-Dusenbery, Brandi; Grossman, Robert L; Krishnamurthy, Ashok; Manning, Alisa; Paten, Benedict; Philippakis, Anthony; Borecki, Ingrid; Chen, Shu Hui; Kaltman, Jon; Ladwa, Sweta; Schwartz, Chip; Thomson, Alastair; Davis, Sarah; Leaf, Alison; Lyons, Jessica; Sheets, Elizabeth; Bis, Joshua C; Conomos, Matthew; Culotti, Alessandro; Desain, Thomas; Digiovanna, Jack; Domazet, Milan; Gogarten, Stephanie; Gutierrez-Sacristan, Alba; Harris, Tim; Heavner, Ben; Jain, Deepti; O'Connor, Brian; Osborn, Kevin; Pillion, Danielle; Pleiness, Jacob; Rice, Ken; Rupp, Garrett; Serret-Larmande, Arnaud; Smith, Albert; Stedman, Jason P; Stilp, Adrienne; Barsanti, Teresa; Cheadle, John; Erdmann, Christopher; Farlow, Brandy; Gartland-Gray, Allie; Hayes, Julie; Hiles, Hannah; Kerr, Paul.

J Am Med Inform Assoc ; 30(7): 1293-1300, 2023 06 20.

Article in English | MEDLINE | ID: mdl-37192819

ABSTRACT

Research increasingly relies on interrogating large-scale data resources. The NIH National Heart, Lung, and Blood Institute developed the NHLBI BioData Catalyst® (BDC), a community-driven ecosystem where researchers, including bench and clinical scientists, statisticians, and algorithm developers, find, access, share, store, and compute on large-scale datasets. This ecosystem provides secure, cloud-based workspaces, user authentication and authorization, search, tools and workflows, applications, and new innovative features to address community needs, including exploratory data analysis, genomic and imaging tools, tools for reproducibility, and improved interoperability with other NIH data science platforms. BDC offers straightforward access to large-scale datasets and computational resources that support precision medicine for heart, lung, blood, and sleep conditions, leveraging separately developed and managed platforms to maximize flexibility based on researcher needs, expertise, and backgrounds. Through the NHLBI BioData Catalyst Fellows Program, BDC facilitates scientific discoveries and technological advances. BDC also facilitated accelerated research on the coronavirus disease-2019 (COVID-19) pandemic.

Subjects

COVID-19 , Cloud Computing , Humans , Ecosystem , Reproducibility of Results , Lung , Software

A Model of Minor Histocompatibility Antigens in Allogeneic Hematopoietic Cell Transplantation.

Martin, Paul J; Levine, David M; Storer, Barry E; Zheng, Xiuwen; Jain, Deepti; Heavner, Ben; Norris, Brandon M; Geraghty, Daniel E; Spellman, Stephen R; Sather, Cassie L; Wu, Feinan; Hansen, John A.

Front Immunol ; 12: 782152, 2021.

Article in English | MEDLINE | ID: mdl-34868058

ABSTRACT

Minor histocompatibility antigens (mHAg) composed of peptides presented by HLA molecules can cause immune responses involved in graft-versus-host disease (GVHD) and graft-versus-leukemia effects after allogeneic hematopoietic cell transplantation (HCT). The current study was designed to identify individual graft-versus-host genomic mismatches associated with altered risks of acute or chronic GVHD or relapse after HCT between HLA-genotypically identical siblings. Our results demonstrate that in allogeneic HCT between a pair of HLA-identical siblings, a mHAg manifests as a set of peptides originating from annotated proteins and non-annotated open reading frames, which i) are encoded by a group of highly associated recipient genomic mismatches, ii) bind to HLA allotypes in the recipient, and iii) evoke a donor immune response. Attribution of the immune response and consequent clinical outcomes to individual peptide components within this set will likely differ from patient to patient according to their HLA types.

Subjects

Hematopoietic Stem Cell Transplantation , Minor Histocompatibility Antigens/immunology , Transplantation Immunology , Adolescent , Adult , Aged , Alleles , Child , Child, Preschool , Disease Susceptibility/immunology , Female , Genetic Predisposition to Disease , Genetic Variation , Graft vs Host Disease/epidemiology , Graft vs Host Disease/etiology , HLA Antigens/genetics , HLA Antigens/immunology , Hematopoietic Stem Cell Transplantation/adverse effects , Hematopoietic Stem Cell Transplantation/methods , Humans , Incidence , Infant , Infant, Newborn , Linkage Disequilibrium , Male , Middle Aged , Minor Histocompatibility Antigens/genetics , Peptides/genetics , Peptides/immunology , Transplantation, Homologous , Young Adult

BinomiRare: A robust test for association of a rare genetic variant with a binary outcome for mixed models and any case-control proportion.

Sofer, Tamar; Lee, Jiwon; Kurniansyah, Nuzulul; Jain, Deepti; Laurie, Cecelia A; Gogarten, Stephanie M; Conomos, Matthew P; Heavner, Ben; Hu, Yao; Kooperberg, Charles; Haessler, Jeffrey; Vasan, Ramachandran S; Cupples, L Adrienne; Coombes, Brandon J; Seyerle, Amanda; Gharib, Sina A; Chen, Han; O'Connell, Jeffrey R; Zhang, Man; Gottlieb, Daniel J; Psaty, Bruce M; Longstreth, W T; Rotter, Jerome I; Taylor, Kent D; Rich, Stephen S; Guo, Xiuqing; Boerwinkle, Eric; Morrison, Alanna C; Pankow, James S; Johnson, Andrew D; Pankratz, Nathan; Reiner, Alex P; Redline, Susan; Smith, Nicholas L; Rice, Kenneth M; Schifano, Elizabeth D.

HGG Adv ; 2(3), 2021 Jul 08.

Article in English | MEDLINE | ID: mdl-34337551

ABSTRACT

Whole-genome sequencing (WGS) and whole-exome sequencing studies have become increasingly available and are being used to identify rare genetic variants associated with health and disease outcomes. Investigators routinely use mixed models to account for genetic relatedness or other clustering variables (e.g., family or household) when testing genetic associations. However, no existing tests of the association of a rare variant with a binary outcome in the presence of correlated data control the type 1 error where there are (1) few individuals harboring the rare allele, (2) a small proportion of cases relative to controls, and (3) covariates to adjust for. Here, we address all three issues in developing a framework for testing rare variant association with a binary trait in individuals harboring at least one risk allele. In this framework, we estimate outcome probabilities under the null hypothesis and then use them, within the individuals with at least one risk allele, to test variant associations. We extend the BinomiRare test, which was previously proposed for independent observations, and develop the Conway-Maxwell-Poisson (CMP) test and study their properties in simulations. We show that the BinomiRare test always controls the type 1 error, while the CMP test sometimes does not. We then use the BinomiRare test to test the association of rare genetic variants in target genes with small-vessel disease (SVD) stroke, short sleep, and venous thromboembolism (VTE), in whole-genome sequence data from the Trans-Omics for Precision Medicine (TOPMed) program.

Atlas of Transcription Factor Binding Sites from ENCODE DNase Hypersensitivity Data across 27 Tissue Types.

Funk, Cory C; Casella, Alex M; Jung, Segun; Richards, Matthew A; Rodriguez, Alex; Shannon, Paul; Donovan-Maiye, Rory; Heavner, Ben; Chard, Kyle; Xiao, Yukai; Glusman, Gustavo; Ertekin-Taner, Nilufer; Golde, Todd E; Toga, Arthur; Hood, Leroy; Van Horn, John D; Kesselman, Carl; Foster, Ian; Madduri, Ravi; Price, Nathan D; Ament, Seth A.

Cell Rep ; 32(7): 108029, 2020 08 18.

Article in English | MEDLINE | ID: mdl-32814038

ABSTRACT

Characterizing the tissue-specific binding sites of transcription factors (TFs) is essential to reconstruct gene regulatory networks and predict functions for non-coding genetic variation. DNase-seq footprinting enables the prediction of genome-wide binding sites for hundreds of TFs simultaneously. Despite the public availability of high-quality DNase-seq data from hundreds of samples, a comprehensive, up-to-date resource for the locations of genomic footprints is lacking. Here, we develop a scalable footprinting workflow using two state-of-the-art algorithms: Wellington and HINT. We apply our workflow to detect footprints in 192 ENCODE DNase-seq experiments and predict the genomic occupancy of 1,515 human TFs in 27 human tissues. We validate that these footprints overlap true-positive TF binding sites from ChIP-seq. We demonstrate that the locations, depth, and tissue specificity of footprints predict effects of genetic variants on gene expression and capture a substantial proportion of genetic risk for complex traits.

Subjects

Binding Sites/genetics , Deoxyribonucleases/metabolism , Genomics/methods , Transcription Factors/metabolism , Humans

Reproducible big data science: A case study in continuous FAIRness.

PLoS One ; 14(4): e0213013, 2019.

Article in English | MEDLINE | ID: mdl-30973881

ABSTRACT

Big biomedical data create exciting opportunities for discovery, but make it difficult to capture analyses and outputs in forms that are findable, accessible, interoperable, and reusable (FAIR). In response, we describe tools that make it easy to capture, and assign identifiers to, data and code throughout the data lifecycle. We illustrate the use of these tools via a case study involving a multi-step analysis that creates an atlas of putative transcription factor binding sites from terabytes of ENCODE DNase I hypersensitive sites sequencing data. We show how the tools automate routine but complex tasks, capture analysis algorithms in understandable and reusable forms, and harness fast networks and powerful cloud computers to process data rapidly, all without sacrificing usability or reproducibility, thus ensuring that big data are not hard-to-(re)use data. We evaluate our approach via a user study, and show that 91% of participants were able to replicate a complex analysis involving considerable data volumes.

Subjects

Big Data , Data Science/statistics & numerical data , Databases, Factual/statistics & numerical data , Algorithms , Humans , Information Dissemination , Longitudinal Studies , Software

Predictive Big Data Analytics: A Study of Parkinson's Disease Using Large, Complex, Heterogeneous, Incongruent, Multi-Source and Incomplete Observations.

Dinov, Ivo D; Heavner, Ben; Tang, Ming; Glusman, Gustavo; Chard, Kyle; Darcy, Mike; Madduri, Ravi; Pa, Judy; Spino, Cathie; Kesselman, Carl; Foster, Ian; Deutsch, Eric W; Price, Nathan D; Van Horn, John D; Ames, Joseph; Clark, Kristi; Hood, Leroy; Hampstead, Benjamin M; Dauer, William; Toga, Arthur W.

PLoS One ; 11(8): e0157077, 2016.

Article in English | MEDLINE | ID: mdl-27494614

ABSTRACT

BACKGROUND: A unique archive of Big Data on Parkinson's Disease is collected, managed and disseminated by the Parkinson's Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson's disease (PD) risk to trauma, genetics, environment, co-morbidities, or lifestyle. The defining characteristics of Big Data (large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources) all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data. METHODS AND FINDINGS: Collective representation of the multi-source data facilitates the aggregation and harmonization of complex data elements. This enables joint modeling of the complete data, leading to the development of Big Data analytics, predictive synthesis, and statistical validation. Using heterogeneous PPMI data, we developed a comprehensive protocol for end-to-end data characterization, manipulation, processing, cleaning, analysis and validation. Specifically, we (i) introduce methods for rebalancing imbalanced cohorts, (ii) utilize a wide spectrum of classification methods to generate consistent and powerful phenotypic predictions, and (iii) generate reproducible machine-learning based classification that enables the reporting of model parameters and diagnostic forecasting based on new data.
We evaluated several complementary model-based predictive approaches, which failed to generate accurate and reliable diagnostic predictions. However, the results of several machine-learning based classification methods indicated significant power to predict Parkinson's disease in the PPMI subjects (consistent accuracy, sensitivity, and specificity exceeding 96%, confirmed using statistical n-fold cross-validation). Clinical (e.g., Unified Parkinson's Disease Rating Scale (UPDRS) scores), demographic (e.g., age), genetics (e.g., rs34637584, chr12), and derived neuroimaging biomarker (e.g., cerebellum shape index) data all contributed to the predictive analytics and diagnostic forecasting. CONCLUSIONS: Model-free Big Data machine learning-based classification methods (e.g., adaptive boosting, support vector machines) can outperform model-based techniques in terms of predictive precision and reliability (e.g., forecasting patient diagnosis). We observed that statistical rebalancing of cohort sizes yields better discrimination of group differences, specifically for predictive analytics based on heterogeneous and incomplete PPMI data. UPDRS scores play a critical role in predicting diagnosis, which is expected based on the clinical definition of Parkinson's disease. Even without longitudinal UPDRS data, however, the accuracy of model-free machine learning based classification is over 80%. The methods, software and protocols developed here are openly shared and can be employed to study other neurodegenerative disorders (e.g., Alzheimer's, Huntington's, amyotrophic lateral sclerosis), as well as for other predictive Big Data analytics applications.

Subjects

Databases, Factual , Parkinson Disease/diagnosis , Aged , Disease Progression , Female , Humans , Logistic Models , Male , Neuroimaging , Parkinson Disease/genetics , Parkinson Disease/pathology , Support Vector Machine
