1.
Stud Health Technol Inform ; 310: 584-588, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269876

ABSTRACT

We document the procedure and performance of a rule-based NLP system that, using transfer learning, automatically extracts essential named entities related to drug errors from Japanese free-text incident reports. Subsequently, we used the rule-based annotated data to fine-tune a pre-trained BERT model and examined the performance of medication-related incident report prediction. The rule-based pipeline achieved a macro-F1-score of 0.81 on an internal dataset, and the BERT model fine-tuned with rule-annotated data achieved macro-F1-scores of 0.97 and 0.75 for the named entity recognition and relation extraction tasks, respectively. The model can be deployed to other, similar problems in medication-related clinical texts.


Subject(s)
Learning , Natural Language Processing , Humans , Medication Errors/prevention & control , Recognition, Psychology , Machine Learning
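The macro-F1-scores reported in the abstract above are, by the standard definition, the unweighted mean of per-class F1 scores. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name `macro_f1` and the entity labels (`DRUG`, `DOSE`, `ROUTE`) are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Return the unweighted mean of per-class F1 scores.

    `gold` and `predicted` are equal-length sequences of class labels.
    A class with no true positives, false positives, or false negatives
    cannot occur here, because labels are taken from the inputs.
    """
    labels = sorted(set(gold) | set(predicted))
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was missed

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
gold = ["DRUG", "DRUG", "DOSE", "ROUTE"]
pred = ["DRUG", "DOSE", "DOSE", "ROUTE"]
score = macro_f1(gold, pred)
```

Because every class contributes equally to the mean, a rare entity type that is recognized poorly lowers the macro-F1-score as much as a common one would.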
2.
JMIR Med Educ ; 10: e51388, 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38227356

ABSTRACT

Large-scale medical data sets are vital for hands-on education in health data science but are often inaccessible due to privacy concerns. Addressing this gap, we developed the Health Gym project, a free and open-source platform designed to generate synthetic health data sets applicable to various areas of data science education, including machine learning, data visualization, and traditional statistical models. Initially, we generated 3 synthetic data sets for sepsis, acute hypotension, and antiretroviral therapy for HIV infection. This paper discusses the educational applications of Health Gym's synthetic data sets. We illustrate this through their use in postgraduate health data science courses delivered by the University of New South Wales, Australia, and a Datathon event, involving academics, students, clinicians, and local health district professionals. We also include adaptable worked examples using our synthetic data sets, designed to enrich hands-on tutorial and workshop experiences. Although we highlight the potential of these data sets in advancing data science education and health care artificial intelligence, we also emphasize the need for continued research into the inherent limitations of synthetic data.


Subject(s)
Artificial Intelligence , HIV Infections , Humans , Data Science , HIV Infections/drug therapy , Health Education , Exercise
3.
J Biomed Inform ; 144: 104436, 2023 08.
Article in English | MEDLINE | ID: mdl-37451495

ABSTRACT

OBJECTIVE: Clinical data's confidential nature often limits the development of machine learning models in healthcare. Generative adversarial networks (GANs) can synthesise realistic datasets, but suffer from mode collapse, resulting in low diversity and bias towards majority demographics and common clinical practices. This work proposes an extension to the classic GAN framework that includes a variational autoencoder (VAE) and an external memory mechanism to overcome these limitations and generate synthetic data accurately describing imbalanced class distributions commonly found in clinical variables.

METHODS: The proposed method generated a synthetic dataset related to antiretroviral therapy for human immunodeficiency virus (ART for HIV). We evaluated it based on five metrics: (1) accurately representing imbalanced class distribution; (2) the realism of the individual variables; (3) the realism among variables; (4) patient disclosure risk; and (5) the utility of the generated dataset for developing downstream machine learning models.

RESULTS: The proposed method overcomes the issue of mode collapse and generates a synthetic dataset that accurately describes imbalanced class distributions commonly found in clinical variables. The generated data has a patient disclosure risk of 0.095%, lower than the 9% threshold stated by Health Canada and the European Medicines Agency, making it suitable for distribution to the research community with high security. The generated data also has high utility, indicating the potential of the proposed method to enable the development of downstream machine learning algorithms for healthcare applications using synthetic data.

CONCLUSION: Our proposed extension to the classic GAN framework, which includes a VAE and an external memory mechanism, represents a promising approach towards generating synthetic data that accurately describe imbalanced class distributions commonly found in clinical variables. This method overcomes the limitations of GANs and creates more realistic datasets with higher patient cohort diversity, facilitating the development of downstream machine learning algorithms for healthcare applications.


Subject(s)
HIV Infections , HIV , Humans , Algorithms , Benchmarking , Disclosure , HIV Infections/drug therapy
4.
Sci Data ; 9(1): 693, 2022 11 11.
Article in English | MEDLINE | ID: mdl-36369205

ABSTRACT

In recent years, the machine learning research community has benefited tremendously from the availability of openly accessible benchmark datasets. Clinical data are usually not openly available due to their confidential nature. This has hampered the development of reproducible and generalisable machine learning applications in health care. Here we introduce the Health Gym - a growing collection of highly realistic synthetic medical datasets that can be freely accessed to prototype, evaluate, and compare machine learning algorithms, with a specific focus on reinforcement learning. The three synthetic datasets described in this paper present patient cohorts with acute hypotension and sepsis in the intensive care unit, and people with human immunodeficiency virus (HIV) receiving antiretroviral therapy. The datasets were created using a novel generative adversarial network (GAN). The distributions of variables, and correlations between variables and trends in variables over time in the synthetic datasets mirror those in the real datasets. Furthermore, the risk of sensitive information disclosure associated with the public distribution of the synthetic datasets is estimated to be very low.


Subject(s)
Algorithms , Comprehensive Health Care , Machine Learning , Humans