1.
medRxiv ; 2024 Aug 19.
Article in English | MEDLINE | ID: mdl-39228726

ABSTRACT

Background: Generative large language models (LLMs) represent a significant advancement in natural language processing, achieving state-of-the-art performance across various tasks. However, their application in clinical settings using real electronic health records (EHRs) is still rare and presents numerous challenges. Objective: This study aims to systematically review the use of generative LLMs, and the effectiveness of relevant techniques in patient care-related topics involving EHRs, summarize the challenges faced, and suggest future directions. Methods: A Boolean search for peer-reviewed articles was conducted on May 19th, 2024 using PubMed and Web of Science to include research articles published since 2023, which was one month after the release of ChatGPT. The search results were deduplicated. Multiple reviewers, including biomedical informaticians, computer scientists, and a physician, screened the publications for eligibility and conducted data extraction. Only studies utilizing generative LLMs to analyze real EHR data were included. We summarized the use of prompt engineering, fine-tuning, multimodal EHR data, and evaluation metrics. Additionally, we identified current challenges in applying LLMs in clinical settings as reported by the included studies and proposed future directions. Results: The initial search identified 6,328 unique studies, with 76 studies included after eligibility screening. Of these, 67 studies (88.2%) employed zero-shot prompting; five of them reported 100% accuracy on five specific clinical tasks. Nine studies used advanced prompting strategies; four tested these strategies experimentally, finding that prompt engineering improved performance, with one study noting a non-linear relationship between the number of examples in a prompt and performance improvement.
Eight studies explored fine-tuning generative LLMs; all reported performance improvements on specific tasks, but three of them noted potential performance degradation after fine-tuning on certain tasks. Only two studies utilized multimodal data, which improved LLM-based decision-making and enabled accurate rare disease diagnosis and prognosis. The studies employed 55 different evaluation metrics for 22 purposes, such as correctness, completeness, and conciseness. Two studies investigated LLM bias, with one detecting no bias and the other finding that male patients received more appropriate clinical decision-making suggestions. Six studies identified hallucinations, such as fabricating patient names in structured thyroid ultrasound reports. Additional challenges included, but were not limited to, the impersonal tone of LLM consultations, which made patients uncomfortable, and the difficulty patients had in understanding LLM responses. Conclusion: Our review indicates that few studies have employed advanced computational techniques to enhance LLM performance. The diverse evaluation metrics used highlight the need for standardization. LLMs currently cannot replace physicians due to challenges such as bias, hallucinations, and impersonal responses.

2.
medRxiv ; 2024 May 06.
Article in English | MEDLINE | ID: mdl-38633810

ABSTRACT

Background: Large language models (LLMs) have shown promising performance in various healthcare domains, but their effectiveness in identifying specific clinical conditions in real medical records is less explored. This study evaluates LLMs for detecting signs of cognitive decline in real electronic health record (EHR) clinical notes, comparing their error profiles with traditional models. The insights gained will inform strategies for performance enhancement. Methods: This study, conducted at Mass General Brigham in Boston, MA, analyzed clinical notes from the four years prior to a 2019 diagnosis of mild cognitive impairment in patients aged 50 and older. We used a random annotated sample of 4,949 note sections, filtered with keywords related to cognitive functions, for model development. For testing, a random annotated sample of 1,996 note sections without keyword filtering was utilized. We developed prompts for two LLMs, Llama 2 and GPT-4, on HIPAA-compliant cloud-computing platforms using multiple approaches (e.g., both hard and soft prompting and error analysis-based instructions) to select the optimal LLM-based method. Baseline models included a hierarchical attention-based neural network and XGBoost. Subsequently, we constructed an ensemble of the three models using a majority vote approach. Results: GPT-4 demonstrated superior accuracy and efficiency compared to Llama 2, but did not outperform traditional models. The ensemble model outperformed the individual models, achieving a precision of 90.3%, a recall of 94.2%, and an F1-score of 92.2%. Notably, the ensemble model showed a significant improvement in precision, increasing from a range of 70%-79% to above 90%, compared to the best-performing single model. Error analysis revealed that 63 samples were incorrectly predicted by at least one model; however, only 2 cases (3.2%) were mutual errors across all models, indicating diverse error profiles among them.
Conclusions: LLMs and traditional machine learning models trained using local EHR data exhibited diverse error profiles. The ensemble of these models was found to be complementary, enhancing diagnostic performance. Future research should investigate integrating LLMs with smaller, localized models and incorporating medical data and domain knowledge to enhance performance on specific tasks.
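The majority-vote ensemble described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`majority_vote`, `precision_recall_f1`, `f1_from`) and the toy labels are assumptions made up for the example. It shows how three models that err on different samples can be combined, and it re-derives the reported F1-score from the reported precision and recall.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one list by majority vote.

    `predictions` holds one list of labels per model, all the same length.
    With three models and binary labels a tie is impossible.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Hypothetical toy labels (1 = cognitive decline mentioned, 0 = not).
y_true = [1, 1, 0, 0, 1, 0]
llm_pred = [1, 0, 0, 1, 1, 0]      # errs on samples 1 and 3
han_pred = [1, 1, 0, 0, 0, 0]      # errs on sample 4
xgb_pred = [0, 1, 1, 0, 1, 0]      # errs on samples 0 and 2

ensemble_pred = majority_vote([llm_pred, han_pred, xgb_pred])
ensemble_scores = precision_recall_f1(y_true, ensemble_pred)

# Consistency check on the figures reported in the abstract.
reported_f1 = round(f1_from(0.903, 0.942), 3)
```

In this toy case every model makes at least one mistake, but no two models are wrong on the same sample, so the vote recovers the true labels. That is the situation the abstract describes when it reports that only 2 of 63 errors were shared by all models.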

3.
Sci Rep ; 14(1): 7831, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38570569

ABSTRACT

The objective of this study is to develop and evaluate natural language processing (NLP) and machine learning models to predict infant feeding status from clinical notes in the Epic electronic health records system. The primary outcome was the classification of infant feeding status from clinical notes using Medical Subject Headings (MeSH) terms. Annotation of notes was completed using TeamTat to uniquely classify clinical notes according to infant feeding status. We trained 6 machine learning models to classify infant feeding status: logistic regression, random forest, XGBoost gradient descent, k-nearest neighbors, and support-vector classifier. Models were compared on overall accuracy, precision, recall, and F1 score. Our modeling corpus was a balanced sample with an equal number of clinical notes in each class. We manually reviewed 999 notes that represented 746 mother-infant dyads with a mean gestational age of 38.9 weeks and a mean maternal age of 26.6 years. The most frequent feeding status classification in this study was exclusive breastfeeding [n = 183 (18.3%)], followed by exclusive formula bottle feeding [n = 146 (14.6%)], and exclusive feeding of expressed mother's milk [n = 102 (10.2%)], with mixed feeding being the least frequent [n = 23 (2.3%)]. Our final analysis evaluated the classification of clinical notes as breast, formula/bottle, and missing. The machine learning models were trained on these three classes after performing balancing and downsampling. The XGBoost model outperformed all others by achieving an accuracy of 90.1%, a macro-averaged precision of 90.3%, a macro-averaged recall of 90.1%, and a macro-averaged F1 score of 90.1%. Our results demonstrate that natural language processing can be applied to clinical notes stored in the electronic health records to classify infant feeding status.
Early identification of breastfeeding status using NLP on unstructured electronic health records data can be used to inform precision public health interventions focused on improving lactation support for postpartum patients.
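The macro-averaged metrics reported for the three-class task (breast, formula/bottle, missing) can be computed as in the sketch below. This is an illustration under stated assumptions, not the study's code: `macro_metrics` is a hypothetical helper and the labels are invented toy data. Macro-averaging computes each metric per class and then takes the unweighted mean, so every class counts equally.

```python
def macro_metrics(y_true, y_pred, labels):
    """Return macro-averaged (precision, recall, F1) over `labels`."""
    precisions, recalls, f1s = [], [], []
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        # One-vs-rest counts for this class.
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical balanced toy sample: three notes per class.
labels = ["breast", "formula/bottle", "missing"]
y_true = (["breast"] * 3) + (["formula/bottle"] * 3) + (["missing"] * 3)
y_pred = ["breast", "breast", "formula/bottle",
          "formula/bottle", "formula/bottle", "formula/bottle",
          "missing", "missing", "breast"]

macro_p, macro_r, macro_f1 = macro_metrics(y_true, y_pred, labels)
accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
```

When every class has the same number of notes, macro-averaged recall equals overall accuracy. This matches the identical 90.1% values the abstract reports for accuracy and macro-averaged recall on its balanced sample.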


Subject(s)
Machine Learning , Natural Language Processing , Female , Humans , Infant , Software , Electronic Health Records , Mothers
4.
J Am Soc Mass Spectrom ; 34(12): 2857-2863, 2023 Dec 06.
Article in English | MEDLINE | ID: mdl-37874901

ABSTRACT

Liquid chromatography-mass spectrometry (LC-MS) metabolomics studies produce high-dimensional data that must be processed by a complex network of informatics tools to generate analysis-ready data sets. As the first computational step in metabolomics, data processing increasingly challenges researchers to develop customized computational workflows applicable to LC-MS metabolomics analysis. Ontology-based automated workflow composition (AWC) systems provide a feasible approach for developing computational workflows that consume high-dimensional molecular data. We used the Automated Pipeline Explorer (APE) to create an AWC for LC-MS metabolomics data processing across three use cases. Our results show that APE predicted 145 data processing workflows across all three use cases. We identified six traditional workflows and six novel workflows. Through manual review, we found that one-third of novel workflows were executable, meaning that the data processing function could be completed without error. When selecting the top six workflows from each use case, the computationally viable rate of our predicted workflows reached 45%. Collectively, our study demonstrates the feasibility of developing an AWC system for LC-MS metabolomics data processing.


Subject(s)
Hominidae , Software , Animals , Workflow , Metabolomics/methods , Mass Spectrometry , Chromatography, Liquid/methods
5.
Nutrients ; 15(17), 2023 Aug 29.
Article in English | MEDLINE | ID: mdl-37686800

ABSTRACT

Epidemiological data demonstrate that bovine whole milk is often substituted for human milk during the first 12 months of life and may be associated with adverse infant outcomes. The objective of this study is to interrogate the human and bovine milk metabolome at 2 weeks of life to identify unique metabolites that may impact infant health outcomes. Human milk (n = 10) was collected at 2 weeks postpartum from normal-weight mothers (pre-pregnant BMI < 25 kg/m²) who vaginally delivered term infants and were exclusively breastfeeding their infants for at least 2 months. Similarly, bovine milk (n = 10) was collected 2 weeks postpartum from normal-weight primiparous Holstein dairy cows. Untargeted data were acquired on all milk samples using high-resolution liquid chromatography-high-resolution tandem mass spectrometry (HR LC-MS/MS). MS data pre-processing from feature calling to metabolite annotation was performed using MS-DIAL and MS-FLO. Our results revealed that more than 80% of the milk metabolome is shared between human and bovine milk samples during early lactation. Unbiased analysis of identified metabolites revealed that nearly 80% of milk metabolites may contribute to microbial metabolism and microbe-host interactions. Collectively, these results highlight untargeted metabolomics as a potential strategy to identify unique and shared metabolites in bovine and human milk that may relate to and impact infant health outcomes.


Subject(s)
Breast Feeding , Tandem Mass Spectrometry , Animals , Female , Infant , Pregnancy , Humans , Cattle , Chromatography, Liquid , Lactation , Milk, Human , Metabolomics
6.
Metabolomics ; 19(2): 11, 2023 Feb 06.
Article in English | MEDLINE | ID: mdl-36745241

ABSTRACT

BACKGROUND: Liquid chromatography-high resolution mass spectrometry (LC-HRMS) is a popular approach for metabolomics data acquisition and requires many data processing software tools. The FAIR Principles - Findability, Accessibility, Interoperability, and Reusability - were proposed to promote open science and reusable data management, and to maximize the benefit obtained from contemporary and formal scholarly digital publishing. More recently, the FAIR principles were extended to include Research Software (FAIR4RS). AIM OF REVIEW: This study facilitates open science in metabolomics by providing an implementation solution for adopting FAIR4RS in LC-HRMS metabolomics data processing software. We believe our evaluation guidelines and results can help improve the FAIRness of research software. KEY SCIENTIFIC CONCEPTS OF REVIEW: We evaluated 124 LC-HRMS metabolomics data processing software tools obtained from a systematic review and selected 61 for detailed evaluation using FAIR4RS-related criteria, which were extracted from the literature along with internal discussions. We assigned each criterion one or more FAIR4RS categories through discussion. The minimum, median, and maximum percentages of criteria fulfillment of software were 21.6%, 47.7%, and 71.8%. Statistical analysis revealed no significant improvement in FAIRness over time. We identified four criteria that covered multiple FAIR4RS categories but had low fulfillment percentages: (1) no software had semantic annotation of key information; (2) only 6.3% of evaluated software were registered to Zenodo and received DOIs; (3) only 14.5% of selected software had official software containerization or virtual machine; (4) only 16.7% of evaluated software had fully documented functions in code. Based on the results, we discussed improvement strategies and future directions.


Subject(s)
Metabolomics , Software , Metabolomics/methods , Chromatography, Liquid/methods , Mass Spectrometry/methods , Data Management
7.
Metabolites ; 12(1), 2022 Jan 17.
Article in English | MEDLINE | ID: mdl-35050209

ABSTRACT

Clinical metabolomics has emerged as a novel approach for biomarker discovery with the translational potential to guide next-generation therapeutics and precision health interventions. However, reproducibility in clinical research employing metabolomics data is challenging. Checklists are a helpful tool for promoting reproducible research. Existing checklists that promote reproducible metabolomics research focus primarily on metadata and may not be sufficient to ensure reproducible metabolomics data processing. This paper provides a checklist of actions that researchers need to take to make computational steps reproducible for clinical metabolomics studies. We developed an eight-item checklist that includes criteria related to reusable data sharing and reproducible computational workflow development. We also provided recommended tools and resources to complete each item, as well as a GitHub project template to guide the process. The checklist is concise and easy to follow. Studies that follow this checklist and use the recommended resources may help other researchers reproduce metabolomics results easily and efficiently.

8.
JMIR Pediatr Parent ; 4(1): e23842, 2021 Mar 05.
Article in English | MEDLINE | ID: mdl-33666558

ABSTRACT

BACKGROUND: Electronic health records (EHRs) hold great potential for longitudinal mother-baby studies, ranging from assessing study feasibility to facilitating patient recruitment to streamlining study visits and data collection. Existing studies on the perspectives of pregnant and breastfeeding women on EHR use have been limited to the use of EHRs to engage in health care rather than to participate in research. OBJECTIVE: The aim of this study is to explore the perspectives of pregnant and breastfeeding women on releasing their own and their infants' EHR data for longitudinal research to identify factors affecting their willingness to participate in research. METHODS: We conducted semistructured interviews with pregnant or breastfeeding women from Alachua County, Florida. Participants were asked about their familiarity with EHRs and EHR patient portals, their comfort with releasing maternal and infant EHR data to researchers, the length of time of the data release, and whether individual research test results should be included in the EHR. The interviews were transcribed verbatim. Transcripts were organized and coded using the NVivo 12 software (QSR International), and coded data were thematically analyzed using constant comparison. RESULTS: Participants included 29 pregnant or breastfeeding women aged between 22 and 39 years. More than half of the sample had at least an associate degree. Nearly all participants (27/29, 93%) were familiar with EHRs and had experience accessing an EHR patient portal. Less than half of the participants (12/29, 41%) were willing to make EHR data available to researchers for the duration of a study or longer. Participants' concerns about sharing EHRs for research purposes emerged in 3 thematic domains: privacy and confidentiality, transparency by the research team, and surrogate decision-making on behalf of infants.
The potential release of sensitive or stigmatizing information, such as mental or sexual health history, was considered in the decisions to release EHRs. Some participants viewed the simultaneous use of their EHRs for both health care and research as potentially beneficial, whereas others expressed concerns about mixing their health care with research. CONCLUSIONS: This exploratory study indicates that pregnant and breastfeeding women may be willing to release EHR data to researchers if researchers adequately address their concerns regarding the study design, communication, and data management. Pregnant and breastfeeding women should be included in EHR-based research as long as researchers are prepared to address their concerns.

9.
Sci Total Environ ; 764: 143963, 2021 Apr 10.
Article in English | MEDLINE | ID: mdl-33385644

ABSTRACT

Consumption of licit and/or illicit compounds during sporting events has traditionally been monitored using population surveys, medical records, and law enforcement seizure data. This pilot study evaluated the temporal and geospatial patterns in drug consumption during a university football game from wastewater using liquid chromatography tandem mass spectrometry (LC-MS/MS). Untreated wastewater samples were collected from three locations within or near the same football stadium every 30 min during a university football game. This analysis leveraged two LC-MS/MS instruments (a Waters Acquity TQD and a Shimadzu 8040) to analyze samples for 58 licit or illicit compounds and some of their metabolites. Bayesian multilevel models were implemented to estimate mass load and population-level drug consumption, while accounting for multiple instrument runs and concentrations censored at the lower limit of quantitation. Overall, 29 compounds were detected in at least one wastewater sample collected during the game. The 10 most common compounds included opioids, anorectics, stimulants, and decongestants. For compounds detected in more than 50% of samples, temporal trends in median mass load were correlated with the timing of the game; peak loads for cocaine and tramadol occurred during the first quarter of the game and for phentermine during the third quarter. Stadium-wide estimates of the number of doses of drugs consumed were rank ordered as follows: oxycodone (n = 3246) > hydrocodone (n = 2260) > phentermine (n = 513) > cocaine (n = 415) > amphetamine (n = 372) > tramadol (n = 360) > pseudoephedrine (n = 324). This analysis represents the most comprehensive assessment of drug consumption during a university football game and indicates that wastewater-based epidemiology has potential to inform public health interventions focused on reducing recreational drug consumption during large-scale sporting events.


Subject(s)
Wastewater , Water Pollutants, Chemical , Bayes Theorem , Chromatography, Liquid , Humans , Pilot Projects , Substance Abuse Detection , Tandem Mass Spectrometry , Universities , Wastewater/analysis , Water Pollutants, Chemical/analysis
10.
BMC Pregnancy Childbirth ; 21(1): 67, 2021 Jan 20.
Article in English | MEDLINE | ID: mdl-33472584

ABSTRACT

BACKGROUND: Investigation of the microbiome during early life has stimulated an increasing number of cohort studies in pregnant and breastfeeding women that require non-invasive biospecimen collection. The objective of this study was to explore pregnant and breastfeeding women's perspectives on longitudinal clinical studies that require non-invasive biospecimen collection and how they relate to study logistics and research participation. METHODS: We completed in-depth semi-structured interviews with 40 women who were either pregnant (n = 20) or breastfeeding (n = 20) to identify their understanding of longitudinal clinical research, the motivations and barriers to their participation in such research, and their preferences for providing non-invasive biospecimen samples. RESULTS: Perspectives on research participation were focused on breastfeeding and perinatal education. Participants cited direct benefits of research participation that included flexible childcare, lactation support, and incentives and compensation. Healthcare providers, physician offices, and social media were cited as credible sources and channels for recruitment. Participants viewed lengthy study visits and child protection as the primary barriers to research participation. The barriers to biospecimen collection were centered on stool sampling, inadequate instructions, and drop-off convenience. CONCLUSION: Women in this study were interested in participating in clinical studies that require non-invasive biospecimen collection, and motivations to participate center on breastfeeding and the potential to make a scientific contribution that helps others. Effectively recruiting pregnant or breastfeeding participants for longitudinal microbiome studies requires protocols that account for participant interests and consideration for their time.


Subject(s)
Breast Feeding/psychology , Health Knowledge, Attitudes, Practice , Pregnant Women/psychology , Research Subjects/psychology , Specimen Handling/psychology , Adolescent , Adult , Female , Florida , Humans , Interviews as Topic , Longitudinal Studies , Middle Aged , Motivation , Pregnancy , Young Adult