Results 1 - 20 of 1,562
1.
Metabolomics ; 20(4): 73, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38980450

ABSTRACT

INTRODUCTION: During the Metabolomics 2023 conference, the Metabolomics Quality Assurance and Quality Control Consortium (mQACC) presented a QA/QC workshop for LC-MS-based untargeted metabolomics. OBJECTIVES: The Best Practices Working Group disseminated recent findings from community forums and discussed aspects to include in a living guidance document. METHODS: Presentations focused on reference materials, data quality review, metabolite identification/annotation and quality assurance. RESULTS: Live polling results and follow-up discussions offered a broad international perspective on QA/QC practices. CONCLUSIONS: Community input gathered from this workshop series is being used to shape the living guidance document, a continually evolving QA/QC best practices resource for metabolomics researchers.


Subject(s)
Mass Spectrometry , Metabolomics , Quality Control , Metabolomics/methods , Metabolomics/standards , Chromatography, Liquid/methods , Chromatography, Liquid/standards , Mass Spectrometry/methods , Humans , Consensus , Liquid Chromatography-Mass Spectrometry
2.
JMIR Med Inform ; 12: e52934, 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38973192

ABSTRACT

Background: The traditional clinical trial data collection process requires a clinical research coordinator who is authorized by the investigators to read from the hospital's electronic medical record. Using electronic source data opens a new path to extract patients' data from electronic health records (EHRs) and transfer them directly to an electronic data capture (EDC) system; this method is often referred to as eSource. eSource technology in a clinical trial data flow can improve data quality without compromising timeliness. At the same time, improved data collection efficiency reduces clinical trial costs. Objective: This study aims to explore how to extract clinical trial-related data from hospital EHR systems, transform the data into a format required by the EDC system, and transfer it into sponsors' environments, and to evaluate the transferred data sets to validate the availability, completeness, and accuracy of building an eSource dataflow. Methods: A prospective clinical trial study registered on the Drug Clinical Trial Registration and Information Disclosure Platform was selected, and the following data modules were extracted from the structured data of 4 case report forms: demographics, vital signs, local laboratory data, and concomitant medications. The extracted data was mapped and transformed, deidentified, and transferred to the sponsor's environment. Data validation was performed based on availability, completeness, and accuracy. Results: In a secure and controlled data environment, clinical trial data was successfully transferred from a hospital EHR to the sponsor's environment with 100% transcriptional accuracy, but the availability and completeness of the data could be improved. Conclusions: Data availability was low due to some required fields in the EDC system not being available directly in the EHR. Some data is also still in an unstructured or paper-based format. 
The top-level design of the eSource technology and the construction of hospital electronic data standards should help lay a foundation for a full electronic data flow from EHRs to EDC systems in the future.

3.
Behav Res Methods ; 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38977607

ABSTRACT

To detect careless and insufficient effort (C/IE) survey responders, researchers can use infrequency items - items that almost no one agrees with (e.g., "When a friend greets me, I generally try to say nothing back") - and frequency items - items that almost everyone agrees with (e.g., "I try to listen when someone I care about is telling me something"). Here, we provide initial validation for two sets of these items: the 14-item Invalid Responding Inventory for Statements (IDRIS) and the 6-item Invalid Responding Inventory for Adjectives (IDRIA). Across six studies (N1 = 536; N2 = 701; N3 = 500; N4 = 499; N5 = 629, N6 = 562), we found consistent evidence that the IDRIS is capable of detecting C/IE responding among statement-based scales (e.g., the HEXACO-PI-R) and the IDRIA is capable of detecting C/IE responding among both adjective-based scales (e.g., the Lex-20) and adjective-derived scales (e.g., the BFI-2). These findings were robust across different analytic approaches (e.g., Pearson correlations; Spearman rank-order correlations), different indices of C/IE responding (e.g., person-total correlations; semantic synonyms; horizontal cursor variability), and different sample types (e.g., US undergraduate students; Nigerian survey panel participants). Taken together, these results provide promising evidence for the utility of the IDRIS and IDRIA in detecting C/IE responding.
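One of the C/IE indices mentioned in this abstract, the person-total correlation, can be sketched in a few lines. This is a minimal illustration with hypothetical Likert responses; the actual IDRIS/IDRIA items and cutoffs are not reproduced here.

```python
def person_total_correlations(responses):
    """Correlate each respondent's answers with the item means computed
    from all respondents; low or negative r suggests C/IE responding."""
    n_items = len(responses[0])
    item_means = [sum(r[i] for r in responses) / len(responses)
                  for i in range(n_items)]

    def pearson(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    return [pearson(r, item_means) for r in responses]

# Hypothetical 1-5 Likert responses; the last respondent answers erratically
responses = [
    [5, 4, 5, 1, 2],
    [4, 5, 4, 2, 1],
    [5, 5, 4, 1, 1],
    [1, 5, 2, 5, 1],
]
print([round(r, 2) for r in person_total_correlations(responses)])
```

A respondent whose answers track the sample-wide item means poorly is a candidate C/IE responder; studies such as this one typically combine several indices before excluding anyone.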

4.
Clin Chem Lab Med ; 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38965828

ABSTRACT

There is a need for standards for generation and reporting of Biological Variation (BV) reference data. The absence of standards affects the quality and transportability of BV data, compromising important clinical applications. To address this issue, international expert groups under the European Federation of Clinical Chemistry and Laboratory Medicine (EFLM) have developed an online resource (https://tinyurl.com/bvmindmap) in the form of an interactive mind map that serves as a guideline for researchers planning, performing and reporting BV studies. The mind map addresses study design, data analysis, and reporting criteria, providing embedded links to relevant references and resources. It also incorporates a checklist approach, identifying a Minimum Data Set (MDS) to enable the transportability of BV data and incorporates the Biological Variation Data Critical Appraisal Checklist (BIVAC) to assess study quality. The mind map is open to access and is disseminated through the EFLM BV Database website, promoting accessibility and compliance to a reporting standard, thereby providing a tool to be used to ensure data quality, consistency, and comparability of BV data. Thus, comparable to the STARD initiative for diagnostic accuracy studies, the mind map introduces a Standard for Reporting Biological Variation Data Studies (STARBIV), which can enhance the reporting quality of BV studies, foster user confidence, provide better decision support, and be used as a tool for critical appraisal. Ongoing refinement is expected to adapt to emerging methodologies, ensuring a positive trajectory toward improving the validity and applicability of BV data in clinical practice.

5.
J Comp Eff Res ; : e240095, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38967245

ABSTRACT

In this update, we discuss recent US FDA guidance offering more specific guidelines on appropriate study design and analysis to support causal inference for non-interventional studies and the launch of the European Medicines Agency (EMA) and the Heads of Medicines Agencies (HMA) public electronic catalogues. We also highlight an article recommending assessing data quality and suitability prior to protocol finalization and a Journal of the American Medical Association-endorsed framework for using causal language when publishing real-world evidence studies. Finally, we explore the potential of large language models to automate the development of health economic models.

6.
JMIR Public Health Surveill ; 10: e49127, 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38959048

ABSTRACT

BACKGROUND: Electronic health records (EHRs) play an increasingly important role in delivering HIV care in low- and middle-income countries. The data collected are used for direct clinical care, quality improvement, program monitoring, public health interventions, and research. Despite widespread EHR use for HIV care in African countries, challenges remain, especially in collecting high-quality data. OBJECTIVE: We aimed to assess data completeness, accuracy, and timeliness compared to paper-based records, and factors influencing data quality in a large-scale EHR deployment in Rwanda. METHODS: We randomly selected 50 health facilities (HFs) using OpenMRS, an EHR system that supports HIV care in Rwanda, and performed a data quality evaluation. All HFs were part of a larger randomized controlled trial, with 25 HFs receiving an enhanced EHR with clinical decision support systems. Trained data collectors visited the 50 HFs to collect 28 variables from the paper charts and the EHR system using the Open Data Kit app. We measured data completeness, timeliness, and the degree of matching of the data in paper and EHR records, and calculated concordance scores. Factors potentially affecting data quality were drawn from a previous survey of users in the 50 HFs. RESULTS: We randomly selected 3467 patient records, reviewing both paper and EHR copies (194,152 total data items). Data completeness exceeded the 85% threshold for all data elements except viral load (VL) results, second-line, and third-line drug regimens. Matching scores for data values were close to or above the 85% threshold, except for dates, particularly for drug pickups and VL. The mean data concordance was 10.2 (SD 1.28) for 15 (68%) variables. HF and user factors (eg, years of EHR use, technology experience, EHR availability and uptime, and intervention status) were tested for correlation with data quality measures.
EHR system availability and uptime was positively correlated with concordance, whereas users' experience with technology was negatively correlated with concordance. The alerts for missing VL results implemented at 11 intervention HFs showed clear evidence of improving timeliness and completeness of initially low matching of VL results in the EHRs and paper records (11.9%-26.7%; P<.001). Similar effects were seen on the completeness of the recording of medication pickups (18.7%-32.6%; P<.001). CONCLUSIONS: The EHR records in the 50 HFs generally had high levels of completeness except for VL results. Matching results were close to or above the 85% threshold for nondate variables. Higher EHR stability and uptime, and alerts for entering VL both strongly improved data quality. Most data were considered fit for purpose, but more regular data quality assessments, training, and technical improvements in EHR forms, data reports, and alerts are recommended. The application of quality improvement techniques described in this study should benefit a wide range of HFs and data uses for clinical care, public health, and disease surveillance.


Subject(s)
Data Accuracy , Electronic Health Records , HIV Infections , Health Facilities , Rwanda , Electronic Health Records/statistics & numerical data , Electronic Health Records/standards , Humans , Cross-Sectional Studies , HIV Infections/drug therapy , Health Facilities/statistics & numerical data , Health Facilities/standards
7.
BMC Med Inform Decis Mak ; 24(1): 152, 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38831432

ABSTRACT

BACKGROUND: Machine learning (ML) has emerged as the predominant computational paradigm for analyzing large-scale datasets across diverse domains. The assessment of dataset quality stands as a pivotal precursor to the successful deployment of ML models. In this study, we introduce DREAMER (Data REAdiness for MachinE learning Research), an algorithmic framework leveraging supervised and unsupervised machine learning techniques to autonomously evaluate the suitability of tabular datasets for ML model development. DREAMER is openly accessible as a tool on GitHub and Docker, facilitating its adoption and further refinement within the research community. RESULTS: The proposed model in this study was applied to three distinct tabular datasets, resulting in notable enhancements in their quality with respect to readiness for ML tasks, as assessed through established data quality metrics. Our findings demonstrate the efficacy of the framework in substantially augmenting the original dataset quality, achieved through the elimination of extraneous features and rows. This refinement yielded improved accuracy across both supervised and unsupervised learning methodologies. CONCLUSION: Our software presents an automated framework for data readiness, aimed at enhancing the integrity of raw datasets to facilitate robust utilization within ML pipelines. Through our proposed framework, we streamline the original dataset, resulting in enhanced accuracy and efficiency within the associated ML algorithms.


Subject(s)
Machine Learning , Humans , Datasets as Topic , Unsupervised Machine Learning , Algorithms , Supervised Machine Learning , Software
8.
BMC Public Health ; 24(1): 1513, 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38840063

ABSTRACT

BACKGROUND: Quality smoking data is crucial for assessing smoking-related health risk and eligibility for interventions related to that risk. Smoking information collected in primary care practices (PCPs) is a major data source; however, little is known about the PCP smoking data quality. This project compared PCP smoking data to that collected in the Maori and Pacific Abdominal Aortic Aneurysm (AAA) screening programme. METHODS: A two stage review was conducted. In Stage 1, data quality was assessed by comparing the PCP smoking data recorded close to AAA screening episodes with the data collected from participants at the AAA screening session. Inter-rater reliability was analysed using Cohen's kappa scores. In Stage 2, an audit of longitudinal smoking status was conducted, of a subset of participants potentially misclassified in Stage 1. Data were compared in three groups: current smoker (smoke at least monthly), ex-smoker (stopped > 1 month ago) and never smoker (smoked < 100 cigarettes in lifetime). RESULTS: Of the 1841 people who underwent AAA screening, 1716 (93%) had PCP smoking information. Stage 1 PCP smoking data showed 82% concordance with the AAA data (adjusted kappa 0.76). Fewer current or ex-smokers were recorded in PCP data. In the Stage 2 analysis of discordant and missing data (N = 313), 212 were enrolled in the 29 participating PCPs, and of these 13% were deceased and 41% had changed PCP. Of the 93 participants still enrolled in the participating PCPs, smoking status had been updated for 43%. Data on quantity, duration, or quit date of smoking were largely missing in PCP records. The AAA data of ex-smokers who were classified as never smokers in the Stage 2 PCP data (N = 27) showed a median smoking cessation duration of 32 years (range 0-50 years), with 85% (N = 23) having quit more than 15 years ago. CONCLUSIONS: PCP smoking data quality compared with the AAA data is consistent with international findings. 
PCP data captured fewer current and ex-smokers, suggesting ongoing improvement is important. Intervention programmes based on smoking status should consider complementary mechanisms to ensure eligible individuals are not missed from programme invitation.
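The inter-rater reliability statistic used in Stage 1 is Cohen's kappa; its computation on two sets of categorical smoking-status labels can be sketched as follows. The labels below are hypothetical, not the study's records.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each source's marginal frequencies
    ca, cb = Counter(labels_a), Counter(labels_b)
    expected = sum(ca[k] * cb.get(k, 0) for k in ca) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical smoking-status pairs (PCP record vs. AAA screening interview)
pcp = ["current", "ex", "never", "never", "ex", "current"]
aaa = ["current", "ex", "never", "ex", "ex", "current"]
print(round(cohens_kappa(pcp, aaa), 2))  # → 0.75
```

Kappa near 1 indicates agreement well beyond chance; the study's adjusted kappa of 0.76 falls in the range commonly described as substantial agreement.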


Subject(s)
Aortic Aneurysm, Abdominal , Primary Health Care , Smoking , Humans , New Zealand/epidemiology , Male , Aortic Aneurysm, Abdominal/diagnosis , Female , Middle Aged , Aged , Smoking/epidemiology , Data Accuracy , Native Hawaiian or Other Pacific Islander/statistics & numerical data , Mass Screening , Maori People
9.
BMC Med Inform Decis Mak ; 24(1): 155, 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38840250

ABSTRACT

BACKGROUND: Diagnosis can often be recorded in electronic medical records (EMRs) as free-text or using a term with a diagnosis code. Researchers, governments, and agencies, including organisations that deliver incentivised primary care quality improvement programs, frequently utilise coded data only and often ignore free-text entries. Diagnosis data are reported for population healthcare planning including resource allocation for patient care. This study sought to determine if diagnosis counts based on coded diagnosis data only, led to under-reporting of disease prevalence and if so, to what extent for six common or important chronic diseases. METHODS: This cross-sectional data quality study used de-identified EMR data from 84 general practices in Victoria, Australia. Data represented 456,125 patients who attended one of the general practices three or more times in two years between January 2021 and December 2022. We reviewed the percentage and proportional difference between patient counts of coded diagnosis entries alone and patient counts of clinically validated free-text entries for asthma, chronic kidney disease, chronic obstructive pulmonary disease, dementia, type 1 diabetes and type 2 diabetes. RESULTS: Undercounts were evident in all six diagnoses when using coded diagnoses alone (2.57-36.72% undercount), of these, five were statistically significant. Overall, 26.4% of all patient diagnoses had not been coded. There was high variation between practices in recording of coded diagnoses, but coding for type 2 diabetes was well captured by most practices. CONCLUSION: In Australia clinical decision support and the reporting of aggregated patient diagnosis data to government that relies on coded diagnoses can lead to significant underreporting of diagnoses compared to counts that also incorporate clinically validated free-text diagnoses. Diagnosis underreporting can impact on population health, healthcare planning, resource allocation, and patient care. 
We propose the use of phenotypes derived from clinically validated text entries to enhance the accuracy of diagnosis and disease reporting. There are existing technologies and collaborations from which to build trusted mechanisms to provide greater reliability of general practice EMR data used for secondary purposes.


Subject(s)
Electronic Health Records , General Practice , Humans , Cross-Sectional Studies , General Practice/statistics & numerical data , Electronic Health Records/standards , Victoria , Chronic Disease , Clinical Coding/standards , Data Accuracy , Population Health/statistics & numerical data , Male , Female , Middle Aged , Adult , Australia , Aged , Diabetes Mellitus, Type 2/diagnosis , Diabetes Mellitus, Type 2/epidemiology
10.
Antibiotics (Basel) ; 13(6)2024 May 29.
Article in English | MEDLINE | ID: mdl-38927169

ABSTRACT

Antibiotic resistance poses a significant threat to global public health due to complex interactions between bacterial genetic factors and external influences such as antibiotic misuse. Artificial intelligence (AI) offers innovative strategies to address this crisis. For example, AI can analyze genomic data to detect resistance markers early on, enabling early interventions. In addition, AI-powered decision support systems can optimize antibiotic use by recommending the most effective treatments based on patient data and local resistance patterns. AI can accelerate drug discovery by predicting the efficacy of new compounds and identifying potential antibacterial agents. Although progress has been made, challenges persist, including data quality, model interpretability, and real-world implementation. A multidisciplinary approach that integrates AI with other emerging technologies, such as synthetic biology and nanomedicine, could pave the way for effective prevention and mitigation of antimicrobial resistance, preserving the efficacy of antibiotics for future generations.

11.
Nurs Inq ; : e12648, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38865286

ABSTRACT

Big data refers to extremely large data generated at high volume, velocity, variety, and veracity. The nurse scientist is uniquely positioned to leverage big data to suggest novel hypotheses on patient care and the healthcare system. The purpose of this paper is to provide an introductory guide to understanding the use and capability of big data for nurse scientists. Herein, we discuss the practical, ethical, social, and educational implications of using big data in nursing research. Some practical challenges with the use of big data include data accessibility, data quality, missing data, variable data standards, fragmentation of health data, and software considerations. Opposing ethical positions arise with the use of big data, and arguments for and against the use of big data are underpinned by concerns about confidentiality, anonymity, and autonomy. The use of big data has health equity dimensions and addressing equity in data is an ethical imperative. There is a need to incorporate competencies needed to leverage big data for nursing research into advanced nursing educational curricula. Nursing science has a great opportunity to evolve and embrace the potential of big data. Nurse scientists should not be spectators but collaborators and drivers of policy change to better leverage and harness the potential of big data.

12.
Value Health ; 27(6): 692-701, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38871437

ABSTRACT

This ISPOR Good Practices report provides a framework for assessing the suitability of electronic health records data for use in health technology assessments (HTAs). Although electronic health record (EHR) data can fill evidence gaps and improve decisions, several important limitations can affect its validity and relevance. The ISPOR framework includes 2 components: data delineation and data fitness for purpose. Data delineation provides a complete understanding of the data and an assessment of its trustworthiness by describing (1) data characteristics; (2) data provenance; and (3) data governance. Fitness for purpose comprises (1) data reliability items, ie, how accurate and complete the estimates are for answering the question at hand and (2) data relevance items, which assess how well the data are suited to answer the particular question from a decision-making perspective. The report includes a checklist specific to EHR data reporting: the ISPOR SUITABILITY Checklist. It also provides recommendations for HTA agencies and policy makers to improve the use of EHR-derived data over time. The report concludes with a discussion of limitations and future directions in the field, including the potential impact from the substantial and rapid advances in the diffusion and capabilities of large language models and generative artificial intelligence. The report's immediate audiences are HTA evidence developers and users. We anticipate that it will also be useful to other stakeholders, particularly regulators and manufacturers, in the future.


Subject(s)
Checklist , Electronic Health Records , Technology Assessment, Biomedical , Electronic Health Records/standards , Humans , Reproducibility of Results , Advisory Committees , Decision Making
13.
J Med Internet Res ; 26: e50295, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38941134

ABSTRACT

Artificial intelligence (AI)-based clinical decision support systems are gaining momentum by relying on a greater volume and variety of secondary use data. However, the uncertainty, variability, and biases in real-world data environments still pose significant challenges to the development of health AI, its routine clinical use, and its regulatory frameworks. Health AI should be resilient against real-world environments throughout its lifecycle, including the training and prediction phases and maintenance during production, and health AI regulations should evolve accordingly. Data quality issues, variability over time or across sites, information uncertainty, human-computer interaction, and fundamental rights assurance are among the most relevant challenges. If health AI is not designed resiliently with regard to these real-world data effects, potentially biased data-driven medical decisions can risk the safety and fundamental rights of millions of people. In this viewpoint, we review the challenges, requirements, and methods for resilient AI in health and provide a research framework to improve the trustworthiness of next-generation AI-based clinical decision support.


Subject(s)
Artificial Intelligence , Decision Support Systems, Clinical , Humans
14.
Bioengineering (Basel) ; 11(6)2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38927811

ABSTRACT

Accurate and automated segmentation of brain tissue images can significantly streamline clinical diagnosis and analysis. Manual delineation needs improvement due to its laborious and repetitive nature, while automated techniques encounter challenges stemming from disparities in magnetic resonance imaging (MRI) acquisition equipment and accurate labeling. Existing software packages, such as FSL and FreeSurfer, do not fully replace ground truth segmentation, highlighting the need for an efficient segmentation tool. To better capture the essence of cerebral tissue, we introduce nnSegNeXt, an innovative segmentation architecture built upon the foundations of quality assessment. This pioneering framework effectively addresses the challenges posed by missing and inaccurate annotations. To enhance the model's discriminative capacity, we integrate a 3D convolutional attention mechanism instead of conventional convolutional blocks, enabling simultaneous encoding of contextual information through the incorporation of multiscale convolutional features. Our methodology was evaluated on four multi-site T1-weighted MRI datasets from diverse sources, magnetic field strengths, scanning parameters, temporal instances, and neuropsychiatric conditions. Empirical evaluations on the HCP, SALD, and IXI datasets reveal that nnSegNeXt surpasses the esteemed nnUNet, achieving Dice coefficients of 0.992, 0.987, and 0.989, respectively, and demonstrating superior generalizability across four distinct projects with Dice coefficients ranging from 0.967 to 0.983. Additionally, extensive ablation studies have been implemented to corroborate the effectiveness of the proposed model. These findings represent a notable advancement in brain tissue analysis, suggesting that nnSegNeXt holds the promise to significantly refine clinical workflows.

15.
Pharmacy (Basel) ; 12(3)2024 May 31.
Article in English | MEDLINE | ID: mdl-38921962

ABSTRACT

This study delves into the challenges of pharmaceutical forecasting within the Ethiopian public pharmaceutical supply chain, which is vital for ensuring medicine availability and optimizing healthcare delivery. It aims to identify and analyze key hindrances to pharmaceutical forecasting in Ethiopia, employing qualitative analysis through semi-structured interviews with stakeholders. Thematic analysis using NVIVO 14 software reveals challenges including finance-related constraints, workforce shortages, and data quality issues. Financial challenges arise from funding uncertainties, causing delayed procurement and stockouts. Workforce shortages hinder accurate forecasting, while data quality issues result from incomplete and untimely reporting. Recommendations include prioritizing healthcare financing, investing in workforce development, and improving data quality through technological advancements and enhanced coordination among stakeholders.

16.
Health Informatics J ; 30(2): 14604582241259336, 2024.
Article in English | MEDLINE | ID: mdl-38848696

ABSTRACT

Keeping track of data semantics and data changes in the databases is essential to support retrospective studies and the reproducibility of longitudinal clinical analysis by preventing false conclusions from being drawn from outdated data. A knowledge model combined with a temporal model plays an essential role in organizing the data and improving query expressiveness across time and multiple institutions. This paper presents a modelling framework for temporal relational databases using an ontology to derive a shareable and interoperable data model. The framework is based on two approaches: OntoRela, an ontology-driven database modelling approach, and the Unified Historicization Framework, a temporal database modelling approach. The method was applied to hospital organizational structures to show the impact of tracking organizational changes on data quality assessment, healthcare activities and data access rights. The paper demonstrated the usefulness of an ontology to provide a formal, interoperable, and reusable definition of entities and their relationships, as well as the adequacy of the temporal database to store, trace, and query data over time.


Subject(s)
Databases, Factual , Humans , Hospital Administration/methods , Data Management/methods
17.
Sci Rep ; 14(1): 10379, 2024 05 06.
Article in English | MEDLINE | ID: mdl-38710783

ABSTRACT

Citizen science (CS) is the most effective tool for overcoming the limitations of government and/or professional data collection. To compensate for quantitative limitations of the 'Winter Waterbird Census of Korea', we conducted a total of four bird monitoring surveys via CS from 2021 to 2022. To use CS data alongside national data, we studied CS data quality and improvement utilizing (1) digit-based analysis using Benford's law and (2) comparative analysis with national data. In addition, we performed bird community analysis using CS-specific data, demonstrating the necessity of CS. Neither CS nor the national data adhered to Benford's law. Alpha diversity (number of species and Shannon index) was lower, and total beta diversity was higher for the CS data than national data. Regarding the observed bird community, the number of species per family was similar; however, the number of individuals per family/species differed. We also identified the necessity of CS by confirming the possibility of predicting bird communities using CS-specific data. CS was influenced by various factors, including the perceptions of the survey participants and their level of experience. Therefore, conducting CS after systematic training can facilitate the collection of higher-quality data.
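The digit-based screen applied here rests on Benford's law, under which a leading digit d occurs with probability log10(1 + 1/d). A minimal first-digit deviation check might look like the sketch below; the bird counts are hypothetical, and real analyses use a formal goodness-of-fit test rather than a raw deviation score.

```python
import math
from collections import Counter

def benford_deviation(counts):
    """Mean absolute deviation between observed leading-digit
    frequencies and the Benford's-law expectation."""
    digits = [int(str(abs(c))[0]) for c in counts if c != 0]
    n = len(digits)
    freq = Counter(digits)
    expected = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
    return sum(abs(freq.get(d, 0) / n - expected[d])
               for d in range(1, 10)) / 9

# Hypothetical per-survey bird counts
bird_counts = [1, 12, 3, 150, 2, 27, 1, 18, 9, 4, 110, 2]
print(round(benford_deviation(bird_counts), 3))
```

A deviation near zero is consistent with Benford's law; larger values, as reported for both the CS and national data here, suggest the counts do not follow the expected digit distribution.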


Subject(s)
Birds , Censuses , Citizen Science , Animals , Birds/physiology , Republic of Korea , Biodiversity
18.
J Biomed Inform ; 155: 104660, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38788889

ABSTRACT

INTRODUCTION: Electronic Health Records (EHR) are a useful data source for research, but their usability is hindered by measurement errors. This study investigated an automatic error detection algorithm for adult height and weight measurements in EHR for the All of Us Research Program (All of Us). METHODS: We developed reference charts for adult heights and weights that were stratified on participant sex. Our analysis included 4,076,534 height and 5,207,328 weight measurements from ~150,000 participants. Errors were identified using modified standard deviation scores, differences from their expected values, and significant changes between consecutive measurements. We evaluated our method with chart-reviewed heights (8,092) and weights (9,039) from 250 randomly selected participants and compared it with the current cleaning algorithm in All of Us. RESULTS: The proposed algorithm flagged 1.4 % of height and 1.5 % of weight measurements as errors in the full cohort. Sensitivity was 90.4 % (95 % CI: 79.0-96.8 %) for heights and 65.9 % (95 % CI: 56.9-74.1 %) for weights. Precision was 73.4 % (95 % CI: 60.9-83.7 %) for heights and 62.9 % (95 % CI: 54.0-71.1 %) for weights. In comparison, the current cleaning algorithm has inferior performance in sensitivity (55.8 %) and precision (16.5 %) for height errors while having higher precision (94.0 %) and lower sensitivity (61.9 %) for weight errors. DISCUSSION: Our proposed algorithm performed better at detecting height errors than weight errors. It can serve as a valuable addition to the current All of Us cleaning algorithm for identifying erroneous height values.
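The error-detection logic described in this abstract combines SD scores against a reference distribution with checks on changes between consecutive measurements. A simplified sketch follows, with illustrative thresholds and reference values rather than the study's actual sex-stratified reference charts.

```python
def flag_errors(values, ref_mean, ref_sd, z_cut=4.0, jump_cut=None):
    """Flag measurements whose SD score exceeds z_cut, or that change
    implausibly relative to the last trusted measurement."""
    flags = []
    prev = None
    for v in values:
        z = (v - ref_mean) / ref_sd
        is_error = abs(z) > z_cut
        if jump_cut is not None and prev is not None:
            is_error = is_error or abs(v - prev) > jump_cut
        flags.append(is_error)
        if not is_error:
            prev = v  # only non-flagged values update the baseline
    return flags

# Hypothetical adult heights in cm; 17.2 and 1723.0 mimic unit/typo errors
heights = [172.0, 171.5, 17.2, 172.3, 1723.0]
print(flag_errors(heights, ref_mean=170.0, ref_sd=10.0, jump_cut=20.0))
# → [False, False, True, False, True]
```

Flagged values would then be reviewed or excluded, analogous to the cleaning step the study evaluates against chart review.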


Subject(s)
Algorithms , Body Height , Body Weight , Electronic Health Records , Humans , Male , Adult , Female , Middle Aged , United States , Reference Values , Aged , Young Adult
19.
J Proteome Res ; 23(6): 1926-1936, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38691771

ABSTRACT

Data-independent acquisition has seen breakthroughs that enable comprehensive proteome profiling using short gradients. As the proteome coverage continues to increase, the quality of the data generated becomes much more relevant. Using Spectronaut, we show that the default search parameters can be easily optimized to minimize the occurrence of false positives across different samples. Using an immunological infection model system to demonstrate the impact of adjusting search settings, we analyzed Mus musculus macrophages and compared their proteome to macrophages spiked with Candida albicans. This experimental system enabled the identification of "false positives" as Candida albicans peptides and proteins should not be present in the Mus musculus-only samples. We show that adjusting the search parameters reduced "false positive" identifications by 89% at the peptide and protein level, thereby considerably increasing the quality of the data. We also show that these optimized parameters incurred a moderate cost, only reducing the overall number of "true positive" identifications across each biological replicate by <6.7% at both the peptide and protein level. We believe the value of our updated search parameters extends beyond a two-organism analysis and would be of great value to any DIA experiment analyzing heterogeneous populations of cell types or tissues.
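The two-organism design gives a direct empirical false-positive count: any Candida albicans identification reported in a mouse-only sample must be wrong. A sketch of that tally, assuming UniProt-style organism mnemonics on the protein IDs; the ID list is hypothetical.

```python
def spikein_false_positives(blank_ids, spikein_tag="_CANAL"):
    """Count spike-in-organism IDs reported in the host-only (blank)
    sample; in this design each such ID is a false positive."""
    fp = [p for p in blank_ids if p.endswith(spikein_tag)]
    return len(fp), len(fp) / len(blank_ids)

# Hypothetical protein IDs from a mouse-only replicate
blank_ids = ["ALBU_MOUSE", "ENO1_CANAL", "ACTB_MOUSE", "G3P_MOUSE"]
print(spikein_false_positives(blank_ids))  # → (1, 0.25)
```

Comparing this tally before and after tightening the search parameters quantifies the false-positive reduction the abstract reports.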


Subject(s)
Candida albicans , Macrophages , Proteome , Proteomics , Animals , Mice , Proteome/analysis , Proteomics/methods , Macrophages/metabolism , Macrophages/immunology , Data Accuracy , Peptides/analysis