Results 1 - 20 of 29,427
1.
JMIR Med Educ ; 10: e52784, 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39140269

ABSTRACT

Background: With the increasing application of large language models (LLMs) such as ChatGPT across industries, their potential in the medical domain, especially in standardized examinations, has become a focal point of research. Objective: The aim of this study was to assess the clinical performance of ChatGPT, focusing on its accuracy and reliability on the Chinese National Medical Licensing Examination (CNMLE). Methods: The CNMLE 2022 question set, consisting of 500 single-answer multiple-choice questions, was reclassified into 15 medical subspecialties. Each question was tested 8 to 12 times in Chinese on the OpenAI platform from April 24 to May 15, 2023. Three key factors were considered: the model version (GPT-3.5 vs GPT-4.0), a system-role prompt tailored to the medical subspecialty, and repetition for coherence. The passing accuracy threshold was set at 60%. χ2 tests and κ values were used to evaluate the model's accuracy and consistency. Results: GPT-4.0 achieved a passing accuracy of 72.7%, significantly higher than that of GPT-3.5 (54%; P<.001). The variability rate of repeated responses from GPT-4.0 was lower than that of GPT-3.5 (9% vs 19.5%; P<.001). Both models nonetheless showed relatively good response coherence, with κ values of 0.778 and 0.610, respectively. System roles numerically increased accuracy for both GPT-4.0 (0.3%-3.7%) and GPT-3.5 (1.3%-4.5%) and reduced variability by 1.7% and 1.8%, respectively (P>.05). In subgroup analysis, ChatGPT achieved comparable accuracy across question types (P>.05). GPT-4.0 surpassed the accuracy threshold in 14 of 15 subspecialties on the first response, while GPT-3.5 did so in 7 of 15. Conclusions: GPT-4.0 passed the CNMLE and outperformed GPT-3.5 in accuracy, consistency, and medical subspecialty expertise. Adding a system role yielded modest, nonsignificant gains in reliability and answer coherence. GPT-4.0 shows promising potential in medical education and clinical practice and merits further study.
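The abstract names its two statistics outright, χ2 for accuracy and κ for response consistency. As a minimal Python sketch of their standard form (the counts are placeholders approximating the reported rates, and the choice of Fleiss' κ for repeated-response agreement is an assumption, since the paper does not state which κ variant it used):

import numpy as np
from scipy.stats import chi2_contingency
from statsmodels.stats.inter_rater import fleiss_kappa

# Accuracy comparison: correct vs. incorrect first responses per model.
gpt4 = [364, 136]    # ~72.7% of 500 questions (placeholder counts)
gpt35 = [270, 230]   # ~54.0% of 500 questions
chi2, p, dof, _ = chi2_contingency([gpt4, gpt35])
print(f"chi2={chi2:.1f}, p={p:.2g}")

# Consistency across repeated runs: rows are questions, columns are the
# five answer options, cells count how many of ~10 runs chose each option.
rng = np.random.default_rng(0)
counts = rng.multinomial(10, [0.8, 0.05, 0.05, 0.05, 0.05], size=500)
print(f"Fleiss kappa = {fleiss_kappa(counts):.3f}")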


Subject(s)
Educational Measurement, Licensure, Medical, Humans, China, Educational Measurement/methods, Educational Measurement/standards, Reproducibility of Results, Clinical Competence/standards
2.
J Biomed Inform ; 157: 104707, 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39142598

ABSTRACT

OBJECTIVE: Traditional knowledge-based and machine learning diagnostic decision support systems have benefited from integrating the medical domain knowledge encoded in the Unified Medical Language System (UMLS). The emergence of large language models (LLMs) to supplant traditional systems raises questions about the quality and extent of the medical knowledge in the models' internal knowledge representations and the need for external knowledge sources. The objective of this study is threefold: to probe the diagnosis-related medical knowledge of popular LLMs, to examine the benefit of providing UMLS knowledge to LLMs (grounding the diagnosis predictions), and to evaluate the correlations between human judgments and UMLS-based metrics for LLM generations. METHODS: We evaluated diagnoses generated by LLMs from consumer health questions and daily care notes in electronic health records, using the ConsumerQA and Problem Summarization datasets. Probing LLMs for UMLS knowledge was performed by prompting the LLM to complete diagnosis-related UMLS knowledge paths. Grounding the predictions was examined in an approach that integrated UMLS graph paths and clinical notes in prompting the LLMs; the results were compared to prompting without the UMLS paths. The final experiments examined the alignment of different evaluation metrics, UMLS-based and non-UMLS, with human expert evaluation. RESULTS: In probing UMLS knowledge, GPT-3.5 significantly outperformed Llama2 and a simple baseline, yielding an F1 score of 10.9% in completing one-hop UMLS paths for a given concept. Grounding diagnosis predictions with UMLS paths improved the results for both models on both tasks, with the highest improvement (4%) in SapBERT score. There was only a weak correlation between the widely used evaluation metrics (ROUGE and SapBERT) and human judgments. CONCLUSION: We found that while popular LLMs contain some medical knowledge in their internal representations, augmentation with UMLS knowledge provides performance gains for diagnosis generation. The UMLS needs to be tailored to the task to improve the LLMs' predictions. Finding evaluation metrics that align with human judgments better than traditional ROUGE and BERT-based scores remains an open research question.
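Grounding here means injecting retrieved UMLS paths into the prompt alongside the clinical text. A hedged illustration of that prompt construction (the function name, prompt wording, and example path are invented for this sketch, not taken from the paper):

def umls_grounded_prompt(note: str, paths: list[tuple[str, str, str]]) -> str:
    # Each path is a (source concept, relation, target concept) triple.
    facts = "\n".join(f"- {s} --{rel}--> {t}" for s, rel, t in paths)
    return (
        "Relevant UMLS knowledge paths:\n"
        f"{facts}\n\n"
        f"Clinical note:\n{note}\n\n"
        "Based on the note and the knowledge paths above, list the most "
        "likely diagnoses."
    )

prompt = umls_grounded_prompt(
    "Pt reports crushing substernal chest pain radiating to the left arm.",
    [("chest pain", "may_be_symptom_of", "myocardial infarction")],
)
print(prompt)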

3.
Toxicology ; : 153933, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39181527

ABSTRACT

To underpin scientific evaluations of chemical risks, agencies such as the European Food Safety Authority (EFSA) rely heavily on the outcome of systematic reviews, which currently require extensive manual effort. One specific challenge is the meaningful use of the vast amounts of valuable data from new approach methodologies (NAMs), which are mostly reported in an unstructured way in the scientific literature. In the EFSA-initiated project 'AI4NAMS', the potential of large language models (LLMs) was explored. Models from the GPT (Generative Pre-trained Transformer) family were used for searching, extracting, and integrating data from scientific publications for NAM-based risk assessment. A case study on bisphenol A (BPA), a substance of very high concern due to its adverse effects on human health, focused on the structured extraction of information on test systems measuring biological activities of BPA. Fine-tuning of a GPT-3 model (Curie base model) for extraction tasks was tested, and the performance of the fine-tuned model was compared to that of a ready-to-use model (text-davinci-002). To update the findings from the AI4NAMS project and to check for technical progress, the fine-tuning exercise was repeated with a newer ready-to-use model (text-davinci-003) as comparison. In both cases, the fine-tuned Curie model was superior to the ready-to-use model. Performance improvement was also evident between text-davinci-002 and the newer text-davinci-003. Our findings demonstrate how fine-tuning and rapid general technical development improve model performance, and they contribute to the growing number of investigations on the use of AI in scientific and regulatory tasks.
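For context, the legacy OpenAI fine-tuning pipeline that the Curie base model belonged to consumed JSONL prompt/completion pairs. A minimal sketch of that data format (the record below is invented for illustration, not taken from the AI4NAMS corpus):

import json

records = [{
    "prompt": ("Extract the test system from the passage:\n"
               "\"BPA activity was measured in an MCF-7 cell "
               "proliferation assay.\"\n\n###\n\n"),
    "completion": " MCF-7 cell proliferation assay END",
}]
with open("bpa_extraction_train.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
# The file would then be passed to the since-deprecated fine-tune endpoint,
# e.g. `openai api fine_tunes.create -t bpa_extraction_train.jsonl -m curie`.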

4.
Trends Cogn Sci ; 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39181735

ABSTRACT

Young children's screen time is increasing, raising concerns about its negative impact on language development, particularly vocabulary. However, digital media is used in a variety of ways that likely impact language development differentially. Instead of asking 'how much' screen time children get, the focus should be on how digital media is used.

5.
Drug Discov Today ; 29(10): 104139, 2024 Aug 19.
Article in English | MEDLINE | ID: mdl-39154773

ABSTRACT

Automatic parsing of eligibility criteria in clinical trials is crucial for cohort recruitment and, in turn, for data validity and trial completion. Recent years have witnessed an explosion of powerful machine learning (ML) and natural language processing (NLP) models that can streamline the patient-accrual process. In this PRISMA-based scoping review, we comprehensively evaluate the existing literature on the application of ML/NLP models for parsing clinical trial eligibility criteria. The review covers 9160 papers published between 2000 and 2024, with 88 publications subjected to data charting along 17 dimensions. Our review indicates insufficient use of state-of-the-art artificial intelligence (AI) models in the analysis of clinical protocols.

6.
Curr Biol ; 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39168122

ABSTRACT

Infants' thoughts are classically characterized as iconic, perceptual-like representations.1,2,3 Less clear is whether preverbal infants also possess a propositional language of thought, where mental symbols are combined according to syntactic rules, very much like words in sentences.4,5,6,7,8,9,10,11,12,13,14,15,16,17 Because it is rich, productive, and abstract, a language of thought would provide a key to explaining impressive achievements in early infancy, from logical inference to representation of false beliefs.18,19,20,21,22,23,24,25,26,27,28,29,30,31 A propositional language-including a language of thought5-implies thematic roles that, in a sentence, indicate the relation between noun and verb phrases, defining who acts on whom; i.e., who is the agent and who is the patient.32,33,34,35,36,37,38,39 Agent and patient roles are abstract in that they generally apply to different situations: whether A kicks, helps, or kisses B, A is the agent and B is the patient. Do preverbal infants represent abstract agent and patient roles? We presented 7-month-olds (n = 143) with sequences of scenes where the posture or relative positioning of two individuals indicated that, across different interactions, A acted on B. Results from habituation (experiment 1) and pupillometry paradigms (experiments 2 and 3) demonstrated that infants showed surprise when roles eventually switched (B acted on A). Thus, while encoding social interactions, infants fill in an abstract relational structure that marks the roles of agent and patient and that can be accessed via different event scenes and properties of the event participants (body postures or positioning). This mental process implies a combinatorial capacity that lays the foundations for productivity and compositionality in language and cognition.

7.
Thromb Res ; 241: 109105, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39116484

ABSTRACT

BACKGROUND: Identification of pulmonary embolism (PE) across a cohort currently requires burdensome manual review. Previous approaches to automating the capture of PE diagnoses have either been too complex for widespread use or have lacked external validation. We sought to develop and validate the Regular Expression Aided Determination of PE (READ-PE) algorithm, which uses a portable text-matching approach to identify PE in reports from computed tomography with angiography (CTA). METHODS: We identified derivation and validation cohorts of final radiology reports for CTAs obtained on adults (≥18 years) at two independent, quaternary academic emergency departments (EDs) in the United States. All reports were in English. We manually reviewed CTA reports for PE as a reference standard. In the derivation cohort, we developed the READ-PE algorithm by iteratively combining regular expressions to identify PE. We validated the READ-PE algorithm in an independent cohort and compared its performance against three prior algorithms using sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and the F1 score. RESULTS: Among 2948 CTAs in the derivation cohort, 10.8% had PE, and the READ-PE algorithm reached 93% sensitivity, 99% specificity, 94% PPV, 99% NPV, and a 0.93 F1 score, compared to F1 scores ranging from 0.50 to 0.85 for the three prior algorithms. Among 1206 CTAs in the validation cohort, 9.2% had PE, and the algorithm had 98% sensitivity, 98% specificity, 85% PPV, 100% NPV, and a 0.91 F1 score. CONCLUSIONS: The externally validated READ-PE algorithm identifies PE in English-language CTA reports from the ED with high accuracy. The algorithm may be used in the electronic health record to identify PE for research or surveillance. If implemented at other EDs, it should first undergo local validation and may require maintenance over time.
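A toy sketch of the regular-expression strategy the abstract describes, a positive pattern plus a negation filter (the patterns below are illustrative guesses, not the published READ-PE rules):

import re

POSITIVE = re.compile(
    r"\b(acute|segmental|subsegmental)?\s*pulmonary emboli(sm|us)?\b",
    re.IGNORECASE,
)
NEGATION = re.compile(
    r"\b(no|without|negative for)\b[^.]{0,40}\bpulmonary emboli",
    re.IGNORECASE,
)

def flag_pe(report: str) -> bool:
    # Flag a CTA report as PE-positive unless the mention is negated.
    return bool(POSITIVE.search(report)) and not NEGATION.search(report)

print(flag_pe("Findings: acute pulmonary embolism in the right lower lobe."))  # True
print(flag_pe("Impression: No evidence of pulmonary embolism."))               # False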


Subject(s)
Algorithms, Pulmonary Embolism, Pulmonary Embolism/diagnostic imaging, Pulmonary Embolism/diagnosis, Humans, Female, Male, Middle Aged, Adult, Computed Tomography Angiography/methods, Aged, Tomography, X-Ray Computed/methods, Cohort Studies
8.
Account Res ; : 1-17, 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39109816

ABSTRACT

The recent emergence of large language models (LLMs) and other forms of artificial intelligence (AI) has led people to wonder whether they could act as authors on scientific papers. This paper argues that AI systems should not be included on the author byline. We agree with current commentators that LLMs are incapable of taking responsibility for their work and thus do not meet current authorship guidelines. We also identify further problems with responsibility and authorship, and the problems go deeper: AI tools do not write in a meaningful sense, nor do they have persistent identities. From a broader publication-ethics perspective, adopting AI authorship would have detrimental effects on an already overly competitive and stressed publishing ecosystem. Deterrence is possible, as backward-looking tools will likely be able to identify past AI usage. Finally, we question the value of using AI to produce more research simply for publication's sake.

9.
Ophthalmol Ther ; 2024 Aug 24.
Article in English | MEDLINE | ID: mdl-39180701

ABSTRACT

A large language model (LLM) is an artificial intelligence (AI) model that uses natural language processing (NLP) to understand, interpret, and generate human-like language responses from unstructured text input. Its real-time response capabilities and eloquent dialogue enhance the interactive user experience in human-AI communication like never before. By gathering several sources on the internet, LLM chatbots can interact and respond to a wide range of queries, including problem solving, text summarization, and creating informative notes. Since ophthalmology is one of the medical fields integrating image analysis, telemedicine, AI, and other technologies, LLMs are likely to play an important role in eye care in the near future. This review summarizes the performance and potential applicability of LLMs in ophthalmology according to currently available publications.

10.
Front Aging Neurosci ; 16: 1398015, 2024.
Article in English | MEDLINE | ID: mdl-39170898

ABSTRACT

Introduction: Numerous studies have highlighted cognitive benefits in lifelong bilinguals during aging, manifesting as superior performance on cognitive tasks compared to monolingual counterparts. Yet the cognitive impact of acquiring a new language in older adulthood remains unexplored. In this study, we assessed both behavioral and fMRI responses during a Stroop task in older adults, before and after a language-learning intervention. Methods: A group of 41 participants (ages 60-80) from a predominantly monolingual environment underwent a four-month online language course in a new language of their preference. The intervention mandated engagement for 90 minutes a day, five days a week, and daily tracking was employed to monitor progress and retention. All participants completed a color-word Stroop task inside the scanner before and after the language instruction period. Results: Performance on the Stroop task, as measured by accuracy and reaction time, improved following the language-learning intervention. In the neuroimaging data, we observed significant differences in activity between congruent and incongruent trials in key regions of the prefrontal and parietal cortex, consistent with previous reports using the Stroop paradigm. We also found that the amount of time participants spent with the language-learning program was related to differential activity in these brain areas: people who spent more time with the program showed a greater increase in differential activity between congruent and incongruent trials after the intervention relative to before. Discussion: Future research is needed to determine the optimal parameters for language learning as an effective cognitive intervention for aging populations. We propose that, with sufficient engagement, language learning can enhance specific domains of cognition such as the executive functions. These results extend our understanding of cognitive reserve and its augmentation through targeted interventions, setting a foundation for future investigations.

11.
Cureus ; 16(7): e65083, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39171020

ABSTRACT

Objectives Large language models (LLMs), for example, ChatGPT, have performed exceptionally well in various fields. Of note, their success in answering postgraduate medical examination questions has been previously reported, indicating their possible utility in surgical education and training. This study evaluated the performance of four different LLMs on the American Board of Thoracic Surgery's (ABTS) Self-Education and Self-Assessment in Thoracic Surgery (SESATS) XIII question bank to investigate the potential applications of these LLMs in the education and training of future surgeons. Methods The dataset in this study comprised 400 best-of-four questions from the SESATS XIII exam. This included 220 adult cardiac surgery questions, 140 general thoracic surgery questions, 20 congenital cardiac surgery questions, and 20 cardiothoracic critical care questions. The GPT-3.5 (OpenAI, San Francisco, CA) and GPT-4 (OpenAI) models were evaluated, as well as Med-PaLM 2 (Google Inc., Mountain View, CA) and Claude 2 (Anthropic Inc., San Francisco, CA), and their respective performances were compared. The subspecialties included were adult cardiac, general thoracic, congenital cardiac, and critical care. Questions requiring visual information, such as clinical images or radiology, were excluded. Results GPT-4 demonstrated a significant improvement over GPT-3.5 overall (87.0% vs. 51.8% of questions answered correctly, p < 0.0001). GPT-4 also exhibited consistently improved performance across all subspecialties, with accuracy rates ranging from 70.0% to 90.0%, compared to 35.0% to 60.0% for GPT-3.5. When using the GPT-4 model, ChatGPT performed significantly better on the adult cardiac and general thoracic subspecialties (p < 0.0001). Conclusions Large language models, such as ChatGPT with the GPT-4 model, demonstrate impressive skill in understanding complex cardiothoracic surgical clinical information, achieving an overall accuracy rate of nearly 90.0% on the SESATS question bank. Our study shows significant improvement between successive GPT iterations. As LLM technology continues to evolve, its potential use in surgical education, training, and continuous medical education is anticipated to enhance patient outcomes and safety in the future.
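Because both models answered the same 400 questions, a paired test is one defensible way to compare them; the abstract reports p-values without naming its test, so McNemar's test here is an assumption, with hypothetical cell counts chosen to match the reported 87.0% and 51.8% accuracies:

import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Rows: GPT-4 correct/incorrect; columns: GPT-3.5 correct/incorrect.
table = np.array([[200, 148],    # GPT-4 correct: 348/400 = 87.0%
                  [  7,  45]])   # GPT-3.5 correct: 207/400 = ~51.8%
result = mcnemar(table, exact=True)
print(f"statistic={result.statistic}, p={result.pvalue:.3g}")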

12.
Neuroimage ; 298: 120809, 2024 Aug 24.
Article in English | MEDLINE | ID: mdl-39187220

ABSTRACT

Conceptual preparation is the very first step in language production. Endogenous semantic variables, reflecting the inherent semantic properties of concepts, can influence productive lexical retrieval by modulating both conceptual activation and lexical selection. Yet empirical understanding of this process and its underlying mechanisms remains limited. Here, inspired by previous theoretical models and preliminary findings, we proposed a Behavioral-Neural Dual Swinging Model (DSM), describing the swing between conceptual facilitation and lexical interference and extending to neural resource allocation during these processes. To test the model, we examined the joint effect of semantic richness and semantic density on productive word retrieval both behaviorally and neurally, using a picture-naming paradigm. The results support the DSM, showing that productive retrieval is driven by the swing between semantic richness-induced conceptual facilitation, managed primarily in semantic-related regions, and semantic density-induced lexical interference, managed in control-related regions. Moreover, the conceptual facilitation accumulated from semantic richness plays a decisive role, mitigating interference from competitors as well as the neural demands in control-related regions.

13.
J Psycholinguist Res ; 53(5): 66, 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39160280

ABSTRACT

The fluency of second language (L2) speech can be influenced by L2 proficiency, but also by differences in the efficiency of cognitive operations and personal speaking styles. The nature of cognitive fluency is still, however, little understood. Therefore, we studied the cognitive fluency of Finnish advanced students of English (N = 64) to understand how the efficiency of cognitive processing influences speech rate. Cognitive fluency was operationalised as automaticity of lexical access (measured by rapid word recognition) and attention control (measured by the Stroop task). The tasks were conducted in both L1 (Finnish) and L2 (English) to examine the (dis)similarity of processing in the two languages. Speech rate in a monologue task was used as the dependent measure of speaking performance. The results showed that after controlling for the L1 speech rate and L1 cognitive fluency, the L2 attention control measures explained a small amount of additional variance in L2 speech rate. These results are discussed in relation to the cognitive fluency framework and general speaking proficiency research.
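The analysis implied here is hierarchical regression: enter the L1 measures first, then test whether L2 attention control adds explanatory variance. A sketch with synthetic data and invented variable names (the study's actual variables and modeling details may differ):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 64  # matches the study's sample size
df = pd.DataFrame({
    "l1_speech_rate": rng.normal(4.0, 0.5, n),   # syllables/second
    "l1_stroop": rng.normal(100, 30, n),         # interference cost, ms
    "l2_stroop": rng.normal(120, 35, n),
})
df["l2_speech_rate"] = (0.6 * df.l1_speech_rate - 0.004 * df.l2_stroop
                        + rng.normal(0, 0.3, n))

base = smf.ols("l2_speech_rate ~ l1_speech_rate + l1_stroop", data=df).fit()
full = smf.ols("l2_speech_rate ~ l1_speech_rate + l1_stroop + l2_stroop",
               data=df).fit()
print(f"Delta R^2 from adding L2 attention control: "
      f"{full.rsquared - base.rsquared:.3f}")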


Subject(s)
Cognition, Multilingualism, Speech, Humans, Speech/physiology, Male, Female, Adult, Cognition/physiology, Young Adult, Attention/physiology, Language, Psycholinguistics, Finland
14.
Healthc Inform Res ; 30(3): 266-276, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39160785

ABSTRACT

OBJECTIVES: Sepsis is a leading global cause of mortality, and predicting its outcomes is vital for improving patient care. This study explored the capabilities of ChatGPT, a state-of-the-art natural language processing model, in predicting in-hospital mortality for sepsis patients. METHODS: This study utilized data from the Korean Sepsis Alliance (KSA) database, collected between 2019 and 2021 and focusing on adult intensive care unit (ICU) patients, to determine whether ChatGPT could predict all-cause mortality at 7 and 30 days after ICU admission. Structured prompts enabled ChatGPT to engage in in-context learning, with the number of patient examples varying from zero to six. The predictive capabilities of ChatGPT-3.5-turbo and ChatGPT-4 were then compared against a gradient boosting model (GBM) using various performance metrics. RESULTS: From the KSA database, 4,786 patients formed the 7-day mortality prediction dataset, of whom 718 died, and 4,025 patients formed the 30-day dataset, with 1,368 deaths. Age and clinical markers (e.g., Sequential Organ Failure Assessment score and lactic acid levels) showed significant differences between survivors and non-survivors in both datasets. For 7-day mortality predictions, the area under the receiver operating characteristic curve (AUROC) was 0.70-0.83 for GPT-4, 0.51-0.70 for GPT-3.5, and 0.79 for the GBM. The AUROC for 30-day mortality was 0.51-0.59 for GPT-4, 0.47-0.57 for GPT-3.5, and 0.76 for the GBM. Zero-shot predictions of mortality from ICU admission to day 30 showed AUROCs from the mid-0.60s to 0.75 for GPT-4 and mainly from 0.47 to 0.63 for GPT-3.5. CONCLUSIONS: GPT-4 demonstrated potential in predicting short-term in-hospital mortality, although its performance varied across evaluation metrics.
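A hedged sketch of the in-context-learning setup and the AUROC scoring the abstract describes (prompt wording, field names, and all values are invented placeholders, not KSA data):

from sklearn.metrics import roc_auc_score

def few_shot_prompt(examples: list[dict], query: str) -> str:
    # k worked examples precede the query patient, as in in-context learning.
    shots = "\n\n".join(
        f"Patient: {ex['summary']}\nDied within 7 days: {ex['label']}"
        for ex in examples
    )
    return (f"{shots}\n\nPatient: {query}\n"
            "Died within 7 days? Answer Yes or No with a probability:")

# Scoring: model-assigned probabilities against observed outcomes.
y_true = [0, 1, 0, 0, 1]             # placeholder outcomes
y_prob = [0.2, 0.7, 0.1, 0.4, 0.6]   # placeholder model probabilities
print(f"AUROC = {roc_auc_score(y_true, y_prob):.2f}")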

15.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39162312

ABSTRACT

Antibodies play a pivotal role in immune defense and serve as key therapeutic agents. The process of affinity maturation, wherein antibodies evolve through somatic mutations to achieve heightened specificity and affinity to target antigens, is crucial for an effective immune response. Despite their significance, assessing antibody-antigen binding affinity remains challenging due to limitations in conventional wet-lab techniques. To address this, we introduce AntiFormer, a graph-based large language model designed to predict antibody binding affinity. AntiFormer incorporates sequence information into a graph-based framework, allowing for precise prediction of binding affinity. Through extensive evaluations, AntiFormer demonstrates superior performance compared with existing methods, offering accurate predictions with reduced computational time. Application of AntiFormer to severe acute respiratory syndrome coronavirus 2 patient samples reveals antibodies with strong neutralizing capabilities, providing insights for therapeutic development and vaccination strategies. Furthermore, analysis of individual samples following influenza vaccination elucidates differences in antibody response between young and older adults. AntiFormer identifies specific clonotypes with enhanced binding affinity post-vaccination, particularly in young individuals, suggesting age-related variations in immune response dynamics. Moreover, our findings underscore the importance of large clonotype categories in driving affinity maturation and immune modulation. Overall, AntiFormer is a promising approach for accelerating antibody-based diagnostics and therapeutics, bridging the gap between traditional methods and complex antibody maturation processes.


Asunto(s)
SARS-CoV-2 , Humanos , SARS-CoV-2/inmunología , SARS-CoV-2/genética , COVID-19/virología , COVID-19/inmunología , Afinidad de Anticuerpos , Anticuerpos Antivirales/inmunología , Anticuerpos Neutralizantes/inmunología , Biología Computacional/métodos , Unión Proteica
16.
F1000Res ; 13: 499, 2024.
Article in English | MEDLINE | ID: mdl-39165348

ABSTRACT

Background: Learning apps can help non-native learners of Arabic through speaking and writing exercises. As learners improve in the language, they become more confident in interacting with the community, which affects their cultural intelligence (CQ) and acculturation (AC). This study aimed to explore the relationship between CQ and AC among non-native learners of Arabic, and additionally to investigate the potential impacts of learning apps and gender. Methods: This study used a correlational approach, involving a sample of 102 non-native Arabic language learners in Jordan. To assess these factors, the study used the Cultural Intelligence Scale and the Acculturation Survey. Results: The findings revealed a positive correlation between CQ and AC. Furthermore, the use of apps can improve CQ and AC levels. In addition, gender did not play a significant role in influencing learners. Conclusion: The utilization of educational apps has been shown to enhance both CQ and AC. Learners should therefore be encouraged to engage with these apps, as they foster cultural awareness and thereby facilitate the process of learning Arabic.


Asunto(s)
Aculturación , Lenguaje , Aprendizaje , Aplicaciones Móviles , Humanos , Masculino , Femenino , Adulto , Adulto Joven , Árabes , Adolescente , Jordania , Inteligencia , Encuestas y Cuestionarios
17.
Rev Bras Med Trab ; 22(1): e20231241, 2024.
Article in English | MEDLINE | ID: mdl-39165532

ABSTRACT

This article explores the impact and potential applications of large language models in Occupational Medicine. Large language models have the ability to provide support for medical decision-making, patient screening, summarization and creation of technical, scientific, and legal documents, training and education for doctors and occupational health teams, as well as patient education, potentially leading to lower costs, reduced time expenditure, and a lower incidence of human errors. Despite promising results and a wide range of applications, large language models also have significant limitations in terms of their accuracy, the risk of generating false information, and incorrect recommendations. Various ethical aspects that have not been well elucidated by the medical and academic communities should also be considered, and the lack of regulation by government entities can create areas of legal uncertainty regarding their use in Occupational Medicine and in the legal environment. Significant future improvements can be expected in these models in the coming years, and further studies on the applications of large language models in Occupational Medicine should be encouraged.



18.
Front Artif Intell ; 7: 1343214, 2024.
Article in English | MEDLINE | ID: mdl-39165903

ABSTRACT

The relevance and importance of voting advice applications (VAAs) are demonstrated by their popularity among potential voters. On average, around 30% of voters take into account the recommendations of these applications during elections. The comparison between potential voters' and parties' positions is made on the basis of VAA policy statements on which users are asked to express opinions. VAA designers devote substantial time and effort to analyzing domestic and international politics to formulate policy statements and select those to be included in the application. This procedure involves manually reading and evaluating a large volume of publicly available data, primarily party manifestos. A problematic part of the work is the limited time frame. This study proposes a system to assist VAA designers in formulating, revising, and selecting policy statements. Using pre-trained language models and machine learning methods to process politics-related textual data, the system produces a set of suggestions corresponding to relevant VAA statements. Experiments were conducted using party manifestos and YouTube comments from Japan, combined with VAA policy statements from six Japanese and two European VAAs. The technical approaches used in the system are based on the BERT language model, which is known for its capability to capture the context of words in the documents. Although the output of the system does not completely eliminate the need for manual human assessment, it provides valuable suggestions for updating VAA policy statements on an objective, i.e., bias-free, basis.
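A minimal sketch of the BERT-based matching step such a system needs, scoring manifesto sentences against existing VAA statements by embedding similarity (the model choice and example sentences are assumptions, not the paper's configuration):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

statements = ["The consumption tax rate should be raised."]
manifesto = [
    "Our party will gradually raise the consumption tax to fund welfare.",
    "We promise more support for regional tourism.",
]
emb_s = model.encode(statements, convert_to_tensor=True)
emb_m = model.encode(manifesto, convert_to_tensor=True)
scores = util.cos_sim(emb_m, emb_s)  # rows: manifesto sentences
for sent, score in zip(manifesto, scores[:, 0]):
    print(f"{score.item():.2f}  {sent}")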

19.
Front Hum Neurosci ; 18: 1421435, 2024.
Article in English | MEDLINE | ID: mdl-39165904

ABSTRACT

Neurolinguistic assessments play a vital role in neurological examinations, revealing a wide range of language and communication impairments associated with developmental disorders and acquired neurological conditions. Yet, a thorough neurolinguistic assessment is time-consuming and laborious and takes valuable resources from other tasks. To empower clinicians, healthcare providers, and researchers, we have developed Open Brain AI (OBAI). The aim of this computational platform is twofold. First, it aims to provide advanced AI tools to facilitate spoken and written language analysis, automate the analysis process, and reduce the workload associated with time-consuming tasks. The platform currently incorporates multilingual tools for English, Danish, Dutch, Finnish, French, German, Greek, Italian, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, and Swedish. The tools involve models for (i) audio transcription, (ii) automatic translation, (iii) grammar error correction, (iv) transcription to the International Phonetic Alphabet, (v) readability scoring, (vi) phonology, morphology, syntax, semantic measures (e.g., counts and proportions), and lexical measures. Second, it aims to support clinicians in conducting their research and automating everyday tasks with "OBAI Companion," an AI language assistant that facilitates language processing, such as structuring, summarizing, and editing texts. OBAI also provides tools for automating spelling and phonology scoring. This paper reviews OBAI's underlying architectures and applications and shows how OBAI can help professionals focus on higher-value activities, such as therapeutic interventions.
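As a small illustration of one tool class the platform covers, readability scoring, here is a sketch using the textstat package as a stand-in (not necessarily OBAI's own implementation):

import textstat

sample = ("The patient was asked to describe the picture and to retell "
          "the story in her own words.")
print("Flesch reading ease:", textstat.flesch_reading_ease(sample))
print("Flesch-Kincaid grade:", textstat.flesch_kincaid_grade(sample))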

20.
PNAS Nexus ; 3(8): pgae278, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39166099

ABSTRACT

Theorists have argued that morality builds on several core modular foundations. When do different moral foundations emerge in life? Prior work has explored the conceptual development of different aspects of morality in childhood. Here, we offer an alternative approach to investigate the developmental emergence of moral foundations through the lexicon, namely the words used to talk about moral foundations. We develop a large-scale longitudinal analysis of the linguistic mentions of five moral foundations (in both virtuous and vicious forms) in naturalistic speech between English-speaking children with ages ranging from 1 to 6 and their caretakers. Using computational methods, we collect a dataset of 1,371 human-annotated moral utterances and automatically annotate around one million utterances in child-caretaker conversations. We discover that in childhood, words for expressing the individualizing moral foundations (i.e. Care/Harm, Fairness/Cheating) tend to emerge earlier and more frequently than words for expressing the binding moral foundations (i.e. Authority/Subversion, Loyalty/Betrayal, Purity/Degradation), and words for Care/Harm are expressed substantially more often than the other foundations. We find significant differences between children and caretakers in how often they talk about Fairness, Cheating, and Degradation. Furthermore, we show that the information embedded in childhood speech allows computational models to predict moral judgment of novel scenarios beyond the scope of child-caretaker conversations. Our work provides a large-scale documentation of the moral foundational lexicon in early linguistic communication in English and forges a new link between moral language development and computational studies of morality.
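A toy sketch of lexicon-based utterance tagging in the spirit of this pipeline (the mini-lexicon is invented for illustration; the study itself combines human annotation with trained classifiers):

import re

MORAL_LEXICON = {
    "care/harm": {"hurt", "kind", "gentle", "help", "ouch"},
    "fairness/cheating": {"fair", "share", "turn", "cheat"},
    "authority/subversion": {"obey", "listen", "rules"},
}

def tag_utterance(utterance: str) -> set[str]:
    # Tokenize crudely and report every foundation with a lexicon hit.
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    return {fd for fd, lex in MORAL_LEXICON.items() if words & lex}

print(tag_utterance("Be gentle, don't hurt the kitty"))
print(tag_utterance("It's my turn now, that's not fair!"))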
