Search | VHL Regional Portal

RWD-Cockpit: Application for Quality Assessment of Real-world Data.

Babrak, Lmar Marie; Smakaj, Erand; Agac, Teyfik; Asprion, Petra Maria; Grimberg, Frank; der Werf, Daan Van; van Ginkel, Erwin Willem; Tosoni, Deniz David; Clay, Ieuan; Degen, Markus; Brodbeck, Dominique; Natali, Eriberto Noel; Schkommodau, Erik; Miho, Enkelejda.

JMIR Form Res ; 6(10): e29920, 2022 Oct 18.

Article in English | MEDLINE | ID: mdl-35266872

ABSTRACT

BACKGROUND: Digital technologies are transforming the health care system. A large part of this information is generated as real-world data (RWD). Data from electronic health records and digital biomarkers have the potential to reveal associations between the benefits and adverse events of medicines, establish new patient-stratification principles, expose unknown disease correlations, and inform on preventive measures. The impact for health care payers and providers, the biopharmaceutical industry, and governments is massive in terms of health outcomes, quality of care, and cost. However, a framework to assess the preliminary quality of RWD is missing, which hinders the conduct of population-based observational studies to support regulatory decision-making and real-world evidence. OBJECTIVE: To address the need to qualify RWD, we aimed to build a web application as a tool to translate the characterization of selected quality parameters of RWD into a metric, and to propose a standard framework for evaluating the quality of RWD. METHODS: The RWD-Cockpit systematically scores data sets based on proposed quality metrics and customizable variables chosen by the user. Sleep RWD generated de novo, together with publicly available data sets, were used to validate the usability and applicability of the web application. The RWD quality score is based on the evaluation of 7 variables: manageability specifies access and publication status; complexity distinguishes univariate, multivariate, and longitudinal data; sample size indicates the size of the sample or samples; privacy and liability stipulates privacy rules; accessibility specifies how the data set can be accessed and at what granularity; periodicity specifies how often the data set is updated; and standardization specifies whether the data set adheres to any specific technical or metadata standard. These variables are associated with several descriptors that define specific characteristics of the data set.
RESULTS: To address the need to qualify RWD, we built the RWD-Cockpit web application, which proposes a framework and applies a common standard for a preliminary evaluation of RWD quality across data sets (molecular, phenotypical, and social). This standard can be further personalized by the community while retaining an internal standard. Applied to 2 different case studies (de novo-generated sleep data and publicly available data sets), the RWD-Cockpit could identify variables whose improvement might increase data quality and report them to researchers. CONCLUSIONS: The results from applying the framework of RWD metrics implemented in the RWD-Cockpit application suggest that multiple data sets can be preliminarily evaluated in terms of quality using the proposed metrics. The output scores (quality identifiers) provide a first quality assessment for the use of RWD. Although extensive challenges remain in setting RWD quality standards, our proposal can serve as an initial blueprint for community efforts to characterize RWD quality for regulated settings.
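The abstract names the 7 variables behind the RWD quality score and says users can customize them, but it does not give the scoring scale or how the variables are combined. As a minimal sketch under stated assumptions, the snippet below assumes each variable has already been scored on a 0 to 1 scale from its descriptors and aggregates them with a weighted mean (equal weights by default). The function `quality_score`, the 0 to 1 scale, and the weighting are hypothetical illustrations, not the RWD-Cockpit's published method.

```python
from typing import Dict, Optional

# The seven variables named in the RWD-Cockpit abstract.
VARIABLES = (
    "manageability",
    "complexity",
    "sample_size",
    "privacy_and_liability",
    "accessibility",
    "periodicity",
    "standardization",
)


def quality_score(
    scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Return the weighted mean of per-variable scores.

    Hypothetical scheme: each variable is assumed to be pre-scored on a
    0-1 scale from its descriptors, and the default weights are equal.
    Neither assumption is taken from the RWD-Cockpit itself.
    """
    if weights is None:
        weights = {v: 1.0 for v in VARIABLES}
    for v in VARIABLES:
        if v not in scores:
            raise ValueError(f"missing score for {v!r}")
        if not 0.0 <= scores[v] <= 1.0:
            raise ValueError(f"score for {v!r} must lie in [0, 1]")
    total_weight = sum(weights[v] for v in VARIABLES)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[v] * scores[v] for v in VARIABLES) / total_weight


# Usage with made-up scores for an illustrative data set.
example = {v: 1.0 for v in VARIABLES}
example["periodicity"] = 0.0  # e.g. a data set that is never updated
print(round(quality_score(example), 3))  # prints 0.857
```

A weighted mean keeps the score on the same 0 to 1 scale as its inputs, and passing custom weights is one simple way to model the user-chosen emphasis the abstract describes.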

Machine Learning Detects Anti-DENV Signatures in Antibody Repertoire Sequences.

Horst, Alexander; Smakaj, Erand; Natali, Eriberto Noel; Tosoni, Deniz; Babrak, Lmar Marie; Meier, Patrick; Miho, Enkelejda.

Front Artif Intell ; 4: 715462, 2021.

Article in English | MEDLINE | ID: mdl-34708197

ABSTRACT

Dengue infection is a global threat. As of today, there is no universal treatment for dengue fever, and no vaccine is unreservedly recommended by the World Health Organization. Investigating the specific immune response to dengue virus would support the discovery of antibodies as therapeutics for passive immunization, as well as vaccine design. High-throughput sequencing enables the identification of the multitude of antibodies elicited in response to dengue infection at the sequence level. Artificial intelligence can mine the complex data generated and has the potential to uncover patterns in entire antibody repertoires and to detect signatures distinctive of single virus-binding antibodies. However, these machine learning methods have not yet been harnessed to determine the immune response to dengue virus. To enable the application of machine learning, we benchmarked existing methods for encoding biological and chemical knowledge as inputs and investigated novel encoding techniques. We applied different machine learning methods, such as neural networks, random forests, and support vector machines, and explored the parameter space to determine the best-performing algorithms for the detection and prediction of antibody patterns at the repertoire and antibody sequence levels in dengue-infected individuals. Our results show that immune response signatures to dengue are detectable at both the antibody repertoire and the antibody sequence levels. By combining machine learning with phylogenies and network analysis, we generated novel sequences that present dengue-binding-specific signatures. These results might aid further antibody discovery and support vaccine design.
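The abstract mentions benchmarking methods for encoding sequences as inputs to machine learning models but does not list them. One common baseline for amino-acid sequences is one-hot encoding, sketched below as an illustration only; it is an assumption that resembles, and is not necessarily identical to, any encoding used in the study. The function name `one_hot`, the right-padding to `max_len`, and the rejection of non-standard residues are hypothetical design choices.

```python
# The 20 standard amino acids; non-standard residues are rejected below.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence: str, max_len: int) -> list[int]:
    """Flattened one-hot encoding of an amino-acid sequence.

    Positions beyond len(sequence) stay all-zero, so sequences of
    different lengths map to vectors of the same size.
    """
    if len(sequence) > max_len:
        raise ValueError("sequence is longer than max_len")
    width = len(AMINO_ACIDS)
    vector = [0] * (max_len * width)
    for pos, aa in enumerate(sequence.upper()):
        if aa not in AA_INDEX:
            raise ValueError(f"unknown residue {aa!r} at position {pos}")
        vector[pos * width + AA_INDEX[aa]] = 1
    return vector


# Usage with a short, made-up CDR3-like fragment.
features = one_hot("CARDY", max_len=8)
print(len(features), sum(features))  # prints: 160 5
```

Fixed-length vectors like these can be stacked into a feature matrix for classifiers such as random forests or support vector machines; encodings that carry chemical knowledge (for example, physicochemical property scales) would replace the 0/1 columns with per-residue property values.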

Benchmarking immunoinformatic tools for the analysis of antibody repertoire sequences.

Smakaj, Erand; Babrak, Lmar; Ohlin, Mats; Shugay, Mikhail; Briney, Bryan; Tosoni, Deniz; Galli, Christopher; Grobelsek, Vendi; D'Angelo, Igor; Olson, Branden; Reddy, Sai; Greiff, Victor; Trück, Johannes; Marquez, Susanna; Lees, William; Miho, Enkelejda.

Bioinformatics ; 36(6): 1731-1739, 2020 03 01.

Article in English | MEDLINE | ID: mdl-31873728

ABSTRACT

SUMMARY: Antibody repertoires reveal insights into the biology of the adaptive immune system and empower diagnostics and therapeutics. Multiple tools are currently available for the annotation of antibody sequences. All downstream analyses, such as choosing lead drug candidates, depend on the correct annotation of these sequences; however, the performance of these tools has not been thoroughly compared. Here, we benchmark the performance of commonly used immunoinformatic tools, i.e. IMGT/HighV-QUEST, IgBLAST and MiXCR, in terms of reproducibility of annotation output, accuracy and speed, using simulated and experimental high-throughput sequencing datasets. We analyzed changes in the IMGT reference germline database over the last 10 years in order to assess the reproducibility of the annotation output. We found that only 73/183 (40%) V, D and J human genes were shared between the reference germline sets used by the tools, and that the annotation results differed between tools. In terms of alignment accuracy, MiXCR had the highest average gene mishit frequency (0.02) and IgBLAST the lowest (0.004). Reproducibility of the complementarity-determining region 3 (CDR3) amino acid output ranged from 4.3% to 77.6% with preprocessed data. In addition, we assessed the run time of the tools: MiXCR was the fastest in terms of the number of sequences processed per unit of time. These results indicate that immunoinformatic analyses depend greatly on the choice of bioinformatics tool. Our results support immunoinformaticians in making informed decisions based on repertoire composition and sequencing platform. AVAILABILITY AND IMPLEMENTATION: All tools utilized in the paper are free for academic use. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Benchmarking , High-Throughput Nucleotide Sequencing , Antibodies , Humans , Reproducibility of Results
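With simulated data the true germline gene of each sequence is known, so a gene mishit frequency can be computed as the fraction of sequences assigned a different gene than the true one. At the reported averages, that is roughly 20 mishits per 1,000 sequences for MiXCR versus 4 per 1,000 for IgBLAST. The abstract does not spell out its exact definition (for example, gene versus allele level, or how unassigned sequences are counted), so the gene-level comparison below, which strips the IMGT allele suffix such as `*02`, is an assumption. The helper names `gene_name` and `mishit_frequency` are hypothetical.

```python
def gene_name(allele: str) -> str:
    """Strip the IMGT allele suffix, e.g. 'IGHV1-2*02' -> 'IGHV1-2'."""
    return allele.split("*", 1)[0]


def mishit_frequency(true_alleles: list[str], called_alleles: list[str]) -> float:
    """Fraction of sequences whose called gene differs from the true gene.

    Comparison is at gene level (allele suffixes ignored); this is an
    illustrative assumption, not the benchmark's stated definition.
    """
    if len(true_alleles) != len(called_alleles):
        raise ValueError("inputs must have the same length")
    if not true_alleles:
        raise ValueError("inputs must not be empty")
    mishits = sum(
        gene_name(t) != gene_name(c) for t, c in zip(true_alleles, called_alleles)
    )
    return mishits / len(true_alleles)


# Usage with made-up truth and calls for four simulated sequences.
truth = ["IGHV1-2*02", "IGHV3-23*01", "IGHV4-34*01", "IGHV1-69*01"]
calls = ["IGHV1-2*04", "IGHV3-23*01", "IGHV4-59*01", "IGHV1-69*01"]
print(mishit_frequency(truth, calls))  # prints 0.25
```

Here the first call differs only in allele and so counts as correct at gene level, while `IGHV4-59` versus `IGHV4-34` is the single gene mishit.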
