ABSTRACT
Objectives: Skin cancer is a prevalent type of malignancy, necessitating efficient diagnostic tools. This study aimed to develop an automated skin lesion classification model using the dynamically expandable representation (DER) incremental learning algorithm. This algorithm adapts to new data and expands its classification capabilities, with the goal of creating a scalable and efficient system for diagnosing skin cancer. Methods: The DER model with incremental learning was applied to the HAM10000 and ISIC 2019 datasets. Validation involved two steps: initially, training and evaluating the HAM10000 dataset against a fixed ResNet-50; subsequently, performing external validation of the trained model using the ISIC 2019 dataset. The model’s performance was assessed using precision, recall, the F1-score, and area under the precision-recall curve. Results: The developed skin lesion classification model demonstrated high accuracy and reliability across various types of skin lesions, achieving a weighted-average precision, recall, and F1-score of 0.918, 0.808, and 0.847, respectively. The model’s discrimination performance was reflected in an average area under the curve (AUC) value of 0.943. Further external validation with the ISIC 2019 dataset confirmed the model’s effectiveness, as shown by an AUC of 0.911. Conclusions: This study presents an optimized skin lesion classification model based on the DER algorithm, which shows high performance in disease classification with the potential to expand its classification range. The model demonstrated robust results in external validation, indicating its adaptability to new disease classes.
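The weighted-average precision, recall, and F1-score reported above can be illustrated with a minimal sketch. It assumes the usual support-weighted definition, in which each class's score is weighted by its share of the true labels. The function name `weighted_prf` and the toy labels are invented for illustration and are not taken from the study.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Return support-weighted precision, recall, and F1 over all true classes."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label in sorted(support):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Per-class scores, guarding against empty denominators.
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support[label] / total  # class share of the true labels
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


# Toy labels, invented for illustration only (not data from the study).
y_true = ["nv", "nv", "nv", "mel", "mel", "bcc"]
y_pred = ["nv", "nv", "mel", "mel", "nv", "bcc"]
precision, recall, f1 = weighted_prf(y_true, y_pred)
print(precision, recall, f1)
```

Under this definition the weighted recall equals overall accuracy, which is one reason it is usually reported alongside precision and F1 rather than alone.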
ABSTRACT
Objective: Early detection of and intervention for developmental disabilities (DDs) are critical to improving the long-term outcomes of affected children. In this study, our objective was to use facial landmark features from a mobile application to distinguish between children with DDs and typically developing (TD) children. Methods: The present study recruited 89 children, including 33 diagnosed with DD and 56 TD children. The aim was to examine the effectiveness of a deep learning classification model using facial video collected from children through a mobile-based application. The study participants underwent comprehensive developmental assessments, in which the child completed the Korean Psychoeducational Profile-Revised and the caregiver completed the Korean version of the Vineland Adaptive Behavior Scale, the Korean version of the Childhood Autism Rating Scale, the Social Responsiveness Scale, and the Child Behavior Checklist. We extracted facial landmarks from recorded videos using the mobile application and performed DD classification using long short-term memory with stratified 5-fold cross-validation. Results: The classification model showed an average accuracy of 0.88 (range: 0.78–1.00), an average precision of 0.91 (range: 0.75–1.00), and an average F1-score of 0.80 (range: 0.60–1.00). Upon interpreting prediction results using SHapley Additive exPlanations (SHAP), we verified that the most crucial variable was the nodding head angle variable, with a median SHAP score of 2.6. All of the top 10 contributing variables exhibited significant differences in distribution between children with DD and TD children (p<0.05). Conclusion: The results of this study provide evidence that facial landmarks extracted from readily available mobile-based video data can be used to detect DD at an early stage.
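Stratified 5-fold cross-validation, as named in the methods above, keeps the DD/TD ratio roughly constant in every fold. The sketch below shows only the splitting step, using the reported group sizes (33 DD, 56 TD). The function `stratified_kfold`, the round-robin assignment, and the seed are illustrative assumptions, not the study's implementation, and no model is trained here.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that preserve class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a near-equal share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for fold in folds[:i] + folds[i + 1:] for j in fold)
        yield train_idx, test_idx


# Group sizes from the abstract; the ordering of subjects is arbitrary.
labels = ["DD"] * 33 + ["TD"] * 56
splits = list(stratified_kfold(labels, k=5))
for train_idx, test_idx in splits:
    n_dd = sum(1 for i in test_idx if labels[i] == "DD")
    print(len(train_idx), len(test_idx), n_dd)
```

With only 33 DD children, each test fold holds six or seven DD cases, which helps explain the wide per-fold ranges reported in the results.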
ABSTRACT
Objectives: The objective of this study was to develop and validate a multicenter-based, multi-model, time-series deep learning model for predicting drug-induced liver injury (DILI) in patients taking angiotensin receptor blockers (ARBs). The study leveraged a national-level multicenter approach, utilizing electronic health records (EHRs) from six hospitals in Korea. Methods: A retrospective cohort analysis was conducted using EHRs from six hospitals in Korea, comprising a total of 10,852 patients whose data were converted to the Common Data Model. The study assessed the incidence rate of DILI among patients taking ARBs and compared it to a control group. Temporal patterns of important variables were analyzed using an interpretable time-series model. Results: The overall incidence rate of DILI among patients taking ARBs was found to be 1.09%. The incidence rates varied for each specific ARB drug and institution, with valsartan having the highest rate (1.24%) and olmesartan having the lowest rate (0.83%). The DILI prediction models showed varying performance, measured by the average area under the receiver operating characteristic curve, with telmisartan (0.93), losartan (0.92), and irbesartan (0.90) exhibiting higher classification performance. The aggregated attention scores from the models highlighted the importance of variables such as hematocrit, albumin, prothrombin time, and lymphocytes in predicting DILI. Conclusions: Implementing a multicenter-based time-series classification model provided evidence that could be valuable to clinicians regarding temporal patterns associated with DILI in ARB users. This information supports informed decisions regarding appropriate drug use and treatment strategies.
ABSTRACT
In this study, the Search Your Mind (S.Y.M., 心) project aimed to collect prospective digital phenotypic data centered on mood and anxiety symptoms across psychiatric disorders through a smartphone application (app) platform while using both centralized and decentralized research designs. The centralized research design is a hybrid of a general prospective observational study and a digital platform-based study, and it includes face-to-face research such as informed written consent, clinical evaluation, and blood sampling. It also includes digital phenotypic assessment through an application-based platform using wearable devices. Meanwhile, the decentralized research design is a non-face-to-face study in which anonymous participants agree to electronic informed consent forms on the app. It also exclusively uses an application-based platform to acquire individualized digital phenotypic data. We expect to collect clinical, biological, and digital phenotypic data centered on mood and anxiety symptoms, and we propose a possible model of centralized and decentralized research design.
ABSTRACT
Objectives: This review article examines international examples of personal health records (PHRs) in advanced countries and discusses the implications of these examples for the establishment and utilization of PHRs in South Korea. Methods: This article synthesized PHR case reports of Organization for Economic Co-operation and Development (OECD) member countries, the Global Digital Health Partnership website on PHRs, and patient portals of individual countries to review the status of PHR services. The concept and significance of PHRs were also discussed with respect to PHR utilization status in European Union and OECD countries. Results: A review of international PHR services showed that the countries shared common points regarding the establishment of Electronic Health Records and national health information infrastructure. In addition, the countries provided services centered on primary healthcare institutions and public hospitals. However, promoting more positive participation and increasing the PHR acceptance rate requires workflow integration, including Electronic Medical Records, the provision of incentives, and the preparation of a supportive legal framework. Conclusions: South Korea is also conducting a national-level PHR project. Since the scope of PHRs is extensive and a wide range of PHR services must be connected, an extensive trial-and-error process will be necessary. A long-term strategy should be prepared, and necessary resources should be secured to establish national-level PHRs.
ABSTRACT
PURPOSE: Discontinuation of hormone therapy is known to lead to a poorer prognosis in breast cancer patients. We aimed to investigate the prescription gap as a prompt index of medication adherence by using prescription data extracted from patient electronic medical records. METHODS: A total of 5,928 patients diagnosed with invasive, non-metastatic breast cancer, who underwent surgery from January 1, 1997 to December 31, 2009, were enrolled retrospectively. The prescription data for 4.5 years of hormonal treatment and breast cancer-related events after treatment completion were analyzed. We examined the characteristics and prognoses of breast cancer in patients with and without a 4-week gap. RESULTS: Patients with a gap showed a significantly higher risk of breast cancer recurrence, distant metastasis, breast cancer-specific death, and overall death after adjustment (hazard ratio [HR], 1.389; 95% confidence interval [CI], 1.089–1.772; HR, 1.568; 95% CI, 1.158–2.123; HR, 2.108; 95% CI, 1.298–3.423; and HR, 2.102; 95% CI, 1.456–3.034, respectively). When patients were categorized based on gap summation, the lower third (160 days) and fourth (391 days) quartiles showed a significantly higher risk of distant metastasis (HR, 1.758; 95% CI, 1.186–2.606 and HR, 1.844; 95% CI, 1.262–2.693, respectively). CONCLUSION: A gap of > 4 weeks in hormonal treatment has negative effects on breast cancer prognosis, and can hence be used as a sentinel index of higher risk due to treatment non-adherence. Further evaluation is needed to determine whether the gap can be used as a universal index for monitoring the adherence to hormonal treatment.
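How a prescription gap might be derived from prescription records can be sketched as follows. The sketch assumes each record carries a start date and a number of days supplied, and it counts a gap as the uncovered days between the end of one supply and the start of the next. The abstract does not state the exact definition used, so the functions `prescription_gaps` and `has_gap`, the 28-day threshold, and the sample dates are illustrative assumptions.

```python
from datetime import date, timedelta


def prescription_gaps(prescriptions):
    """Return the uncovered intervals, in days, between consecutive supplies.

    `prescriptions` is a list of (start_date, days_supplied) tuples.
    """
    gaps = []
    covered_until = None
    for start, days in sorted(prescriptions):
        if covered_until is not None and start > covered_until:
            gaps.append((start - covered_until).days)
        end = start + timedelta(days=days)
        covered_until = end if covered_until is None else max(covered_until, end)
    return gaps


def has_gap(prescriptions, threshold_days=28):
    """True if any single uncovered interval exceeds the threshold (assumed 4 weeks)."""
    return any(gap > threshold_days for gap in prescription_gaps(prescriptions))


# Invented example records, not patient data.
records = [
    (date(2005, 1, 1), 90),
    (date(2005, 4, 1), 90),
    (date(2005, 8, 15), 90),
]
gaps = prescription_gaps(records)
print(gaps, sum(gaps), has_gap(records))
```

Summing the list gives a per-patient total comparable in spirit to the gap summation used for the quartile analysis.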
Subject(s)
Humans , Breast Neoplasms , Breast , Electronic Health Records , Estrogen Antagonists , Medication Adherence , Neoplasm Metastasis , Prescriptions , Prognosis , Recurrence , Retrospective Studies
ABSTRACT
Recent rapid advances in artificial intelligence (AI), especially in deep learning methods, have produced meaningful results in many areas. However, to achieve meaningful results for healthcare through AI, it is important to understand the meaning and characteristics of data in that area. For medical AI, a simple approach that accumulates massive amounts of data based on existing big data concepts cannot provide meaningful results in the healthcare field. We need well-curated data as opposed to a simple aggregation of data. The purpose of this study is to present the types and characteristics of healthcare data and future directions for the successful combination of AI and medical care.
Subject(s)
Artificial Intelligence , Delivery of Health Care , Korea , Learning , Machine Learning
ABSTRACT
Lung squamous cell cancer (SCC) is typically found in smokers and has a very low incidence in non-smokers, indicating differences in the tumor biology of lung SCC in smokers and non-smokers. However, the specific mutations that drive tumor growth in non-smokers have not been identified. To identify mutations in lung SCC of non-smokers, we performed a genetic analysis using array comparative genomic hybridization (ArrayCGH). We analyzed 19 patients with lung SCC who underwent surgical treatment between April 2005 and April 2015. Clinical characteristics were reviewed, and DNA was extracted from fresh frozen lung cancer specimens. All copy number alterations from ArrayCGH were validated using The Cancer Genome Atlas (TCGA) copy number variation (CNV) data of lung SCC. We examined the frequency of copy number changes according to smoking status (non-smoker [n = 8] or smoker [n = 11]). We identified 16 significantly altered regions from the ArrayCGH data; three gain and four loss regions overlapped with those of the TCGA lung squamous cell carcinoma (LUSC) patients. Within these overlapping significant regions, we detected 15 genes that have been reported in the Cancer Gene Census. We also found that the proto-oncogene GAB2 (11q14.1) was significantly amplified in non-smoker patients and vice versa in both ArrayCGH and TCGA data. Immunohistochemical analyses showed that GAB2 protein was relatively upregulated in non-smoker tissues compared with smoker tissues (37.5% vs. 9.0%, P = 0.007). GAB2 amplification may have an important role in the development of lung SCC in non-smokers. GAB2 may represent a potential biomarker for lung SCC in non-smokers.
Subject(s)
Humans , Biology , Carcinoma, Squamous Cell , Censuses , Comparative Genomic Hybridization , DNA , Epithelial Cells , Genes, Neoplasm , Genome , Incidence , Lung Neoplasms , Lung , Neoplasms, Squamous Cell , Proto-Oncogenes , Smoke , Smoking
ABSTRACT
The propensity score is defined as the probability of each individual study subject being assigned to a group of interest for comparison purposes. Propensity score adjustment is a method of ensuring an even distribution of confounders between groups, thereby increasing between-group comparability. Propensity score analysis is therefore an increasingly applied statistical method in observational studies. The purpose of this article was to provide a step-by-step, nonmathematical, conceptual guide to propensity score analysis, with particular emphasis on propensity score matching. Software program code used for propensity score matching was also presented.
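The matching step can be illustrated with a minimal sketch of greedy 1:1 nearest-neighbour matching without replacement. It is not the program code presented in the article. The propensity scores are assumed to have been estimated already (for example by logistic regression), and the function `match_nearest`, the caliper of 0.05, and the subject IDs are hypothetical.

```python
def match_nearest(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    `treated` and `control` map subject IDs to estimated propensity scores.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(control)
    pairs = []
    # Process treated subjects in order of increasing score.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        # Accept the match only if it lies within the caliper.
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


# Invented scores for illustration only.
treated = {"T1": 0.30, "T2": 0.62, "T3": 0.90}
control = {"C1": 0.28, "C2": 0.33, "C3": 0.60, "C4": 0.10}
pairs = match_nearest(treated, control)
print(pairs)
```

Here T3 stays unmatched because no control lies within the caliper, which shows how matching trades sample size for comparability.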
Subject(s)
Female , Humans , Male , Middle Aged , Propensity Score , Radiology/methods , Research Design , Research Personnel , Software
ABSTRACT
De-identification of personal health information is essential for using clinical data without requiring written informed consent from patients. Previous de-identification methods used natural language processing technology to remove identifiers in clinical narrative text, but these methods focused only on narrative text written in English. In this study, we propose a regular expression-based de-identification method for bilingual clinical records written in Korean and English. To develop and validate regular expression rules, we obtained training and validation datasets composed of 6,039 clinical notes of 20 types and 5,000 notes of 33 types, respectively. Fifteen regular expression rules were constructed using the development dataset, and those rules achieved 99.87% precision and 96.25% recall for the validation dataset. Our de-identification method successfully removed the identifiers in diverse types of bilingual clinical narrative texts. This method will thus help physicians perform retrospective research more easily.
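A regular expression-based approach of this kind can be sketched as an ordered list of patterns, each replaced by a category tag. The rules below are hypothetical examples written for this illustration and are not the fifteen rules developed in the study. The note text, the label "환자명" (patient name), and the identifier formats are likewise assumptions.

```python
import re

# Hypothetical rules for illustration; not the fifteen rules from the study.
RULES = [
    # Resident registration number: six digits, hyphen, seven digits.
    ("RRN", re.compile(r"\b\d{6}-\d{7}\b")),
    # Mobile phone number such as 010-1234-5678.
    ("PHONE", re.compile(r"\b01\d-\d{3,4}-\d{4}\b")),
    # Dates written as YYYY-MM-DD, YYYY.MM.DD, or YYYY/MM/DD.
    ("DATE", re.compile(r"\b\d{4}[-./]\d{1,2}[-./]\d{1,2}\b")),
    # Korean name following a "patient name" label.
    ("NAME", re.compile(r"(?<=환자명: )[가-힣]{2,4}")),
    # English surname following "Dr. ".
    ("NAME", re.compile(r"(?<=Dr\. )[A-Z][a-z]+")),
]


def deidentify(text):
    """Apply each rule in order, replacing matches with a bracketed tag."""
    for tag, pattern in RULES:
        text = pattern.sub(f"[{tag}]", text)
    return text


# Invented note, not real patient data.
note = "환자명: 홍길동 (800101-1234567), seen by Dr. Kim on 2012-03-05. Tel 010-1234-5678."
clean = deidentify(note)
print(clean)
```

Rule order matters in a scheme like this: the longer, more specific identifier patterns run first so that a later, looser pattern cannot partially match them.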
Subject(s)
Humans , Algorithms , Data Anonymization , Electronic Health Records , Health Records, Personal , Multilingualism , Natural Language Processing , Research Design
ABSTRACT
OBJECTIVES: Health Avatar Beans was developed for the management of chronic kidney disease and end-stage renal disease (ESRD). This article describes the DialysisNet system in Health Avatar Beans for the seamless management of ESRD based on the personal health record. METHODS: For hemodialysis data modeling, we identified common data elements for hemodialysis information (CDEHI). We used the ASTM continuity of care record (CCR) and ISO/IEC 11179 as the compliance method with a standard model for the CDEHI. We mapped the CDEHI to the contents of the ASTM CCR and created the metadata from that mapping. The metadata was transformed and parsed into the database and verified according to the ASTM CCR/XML schema definition (XSD). DialysisNet was created as an iPad application. The contents of the CDEHI were categorized for effective management. For the evaluation of information transfer, we used CarePlatform, which was developed for data access. The metadata of the CDEHI in DialysisNet was exchanged by the CarePlatform with semantic interoperability. RESULTS: The CDEHI was separated into a contents list for individual patient data, a contents list for hemodialysis center data, a consultation and transfer form, and clinical decision support data. After matching to the CCR, the CDEHI was transformed to metadata, which was then transformed to XML and verified according to the ASTM CCR/XSD. DialysisNet gives specific consideration to visualization, graphics, images, statistics, and the database. CONCLUSIONS: We created the DialysisNet application, which can integrate and manage data sources for hemodialysis information based on CCR standards.
Subject(s)
Humans , Chronic Disease , Compliance , Continuity of Patient Care , Fabaceae , Health Information Management , Health Records, Personal , Information Storage and Retrieval , Kidney Failure, Chronic , Renal Dialysis , Renal Insufficiency, Chronic , Semantics
ABSTRACT
OBJECTIVES: Extension of the standard model while retaining compliance with it is a challenging issue because there is currently no method for semantically or syntactically verifying an extended data model. A metadata-based extended model, named CCR+, was designed and implemented to achieve interoperability between standard and extended models. METHODS: A multilayered validation method was devised to validate the standard and extended models. The American Society for Testing and Materials (ASTM) Continuity of Care Record (CCR) standard was selected to evaluate the CCR+ model; two CCR XML files and one CCR+ XML file were evaluated. RESULTS: In total, 188 metadata were extracted from the ASTM CCR standard; these metadata are semantically interconnected and registered in the metadata registry. An extended-data-model-specific validation file was generated from these metadata. This file can be used in a smartphone application (Health Avatar CCR+) as a part of a multilayered validation. The new CCR+ model was successfully evaluated via a patient-centric exchange scenario involving multiple hospitals, with the results supporting both syntactic and semantic interoperability between the standard CCR and the extended CCR+ model. CONCLUSIONS: A feasible method for delivering an extended model that complies with the standard model is presented herein. There is a great need to extend static standard models such as the ASTM CCR in various domains; the methods presented here represent an important reference for achieving interoperability between standard and extended models.
Subject(s)
Humans , Compliance , Health Records, Personal , Methods , Semantics
ABSTRACT
OBJECTIVES: Classifying the data elements (DEs) used in clinical documents is challenging, even across ISO/IEC 11179-compliant clinical metadata registries (MDRs), because there is no reliable standard for identifying DEs. We propose the Clinical Data Element Ontology (CDEO) for unified indexing and retrieval of DEs across MDRs. METHODS: The CDEO was developed through harmonization of existing clinical document models and empirical analysis of MDRs. For specific classification using the data element concept (DEC), the Simple Knowledge Organization System was chosen to represent and organize the DECs. Six basic requirements that the CDEO must meet were also set, including that the indexing target be a DEC and that DECs be organized using their semantic relationships. For evaluation of the CDEO, three indexers mapped 400 DECs to more than one CDEO term in order to determine whether the CDEO produces a consistent index for a given DEC. The level of agreement among the indexers was determined by calculating the intraclass correlation coefficient (ICC). RESULTS: We developed the CDEO with 578 concepts. Through two application use-case scenarios, the usability of the CDEO was evaluated, and it fully met all of the considered requirements. The ICC among the three indexers was estimated to be 0.59 (95% confidence interval, 0.52-0.66). CONCLUSIONS: The CDEO organizes DECs originating from different MDRs into a single unified conceptual structure. It enables highly selective search and retrieval of relevant DEs from multiple MDRs for clinical documentation and clinical research data aggregation.
Subject(s)
Abstracting and Indexing , Classification , Data Collection , Information Dissemination , Information Storage and Retrieval , Registries , Semantics
ABSTRACT
Around the world, electronic health records data are being shared and exchanged between different systems for direct patient care, as well as for research, reimbursement, quality assurance, epidemiology, public health, and policy development. It is important to communicate the semantic meaning of the clinical data when exchanging electronic health records data. In order to achieve semantic interoperability of clinical data, it is important not only to specify clinical entries and documents and the structure of data in electronic health records, but also to use clinical terminology to describe clinical data. There are three types of clinical terminology: interface terminology to support user-friendly structured data entry; reference terminology to store, retrieve, and analyze clinical data; and classification to aggregate clinical data for secondary use. In order to use electronic health records data efficiently, healthcare providers first need to record clinical content using a systematic and controlled interface terminology; the clinical content then needs to be stored with reference terminology in a clinical data repository or data warehouse; and finally, the clinical content can be converted into a classification for reimbursement and statistical reporting. For electronic health records data collected at the point of care to be used for secondary purposes, it is necessary to map reference terminology to interface terminology and classification. It is necessary to adopt clinical terminology in electronic health records systems to ensure a high level of semantic interoperability.
Subject(s)
Humans , Dietary Sucrose , Electronic Health Records , Health Personnel , Patient Care , Policy Making , Public Health , Semantics
ABSTRACT
Infection by microorganisms may cause fatally erroneous interpretations in biological research based on cell culture. Contamination by microorganisms in cell culture is quite frequent (5% to 35%). However, current approaches to identifying contamination have many limitations, such as a high cost in time and labor and difficulty in interpreting the results. In this paper, we propose a model to predict cell infection using a microarray technique, which gives an overview of the whole genome profile. By analyzing 62 microarray expression profiles under various experimental conditions that altered cell type, source of infection, and collection time, we discovered 5 marker genes: NM_005298, NM_016408, NM_014588, S76389, and NM_001853. In addition, we discovered that two of these genes, S76389 and NM_001853, are involved in a Mycoplasma-specific infection process. We also suggest models to predict the source of infection, cell type, or time after infection. We implemented a web-based prediction tool for microarray data, named Prediction of Microbial Infection (http://www.snubi.org/software/PMI).
Subject(s)
Humans , Algorithms , Cell Line , Chondrocytes/cytology , Databases, Genetic , Gene Expression Profiling , Host-Pathogen Interactions , Keratinocytes/cytology , Models, Genetic , Mycoplasma/genetics , Oligonucleotide Array Sequence Analysis
ABSTRACT
Gene Expression Omnibus (GEO) holds the largest collection of gene-expression microarray data, which has grown exponentially. Microarray data in GEO have been generated in many different formats and often lack standardized annotation and documentation. It is hard to know whether preprocessing has been applied to a dataset and, if so, in what way. Standard-based integration of heterogeneous data formats and metadata is necessary for comprehensive data query, analysis, and mining. We attempted to integrate the heterogeneous microarray data in GEO based on the Minimum Information About a Microarray Experiment (MIAME) standard. We unified the data fields of the GEO Data table and mapped the attributes of GEO metadata to MIAME elements. We also distinguished non-preprocessed raw datasets from processed and other datasets by using a two-step classification method. Most of the procedures were developed as semi-automated algorithms with some degree of text mining. We localized 2,967 Platforms, 4,867 Series, and 103,590 Samples covering 279 organisms, integrated them into a standard-based relational schema, and developed a comprehensive query interface to extract them. Our tool, GEOQuest, is available at http://www.snubi.org/software/GEOQuest/
Subject(s)
Data Mining , DNA , Gene Expression , Mining , Oligonucleotide Array Sequence Analysis
ABSTRACT
OBJECTIVE: Clinical trials are the most time-consuming and expensive part of the drug development process. Clinical Trial Management Systems (CTMSs) help sponsors of clinical trials manage all aspects of planning, performance, and reporting. Most conventional systems provide data processing functions using database management system (DBMS) procedures, which cause DBMS dependency problems. Thus, researchers who are unfamiliar with databases find these systems hard to handle. It is also difficult to share Electronic Case Report Forms (eCRFs) between institutions because conventional systems rely on specific software. METHODS: PhactaManager was developed to solve these problems by introducing an XML Layer in the application tier and using an Entity-Attribute-Value model in the database tier. RESULTS: PhactaManager is a three-tier clinical trial management system that has an XML Layer. The XML Layer provides a common, DBMS-independent eCRF document processing platform. We also developed an XML-based eCRF Grammar to describe eCRF documents. The XML data elements described by the eCRF Grammar are composed into eCRFs by PhactaDesigner, an eCRF document design program. CONCLUSION: We achieved DBMS independence by implementing the XML Layer in PhactaManager. The development of the eCRF Grammar enables the standardization of eCRF design, data correction, and data sharing in multicenter clinical trials.
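The combination of an Entity-Attribute-Value (EAV) table in the database tier with an XML representation in the application tier can be sketched as below. This is not PhactaManager's schema or its eCRF Grammar. The table name `ecrf_value`, the element names `eCRF` and `item`, and the sample values are hypothetical, and SQLite stands in for whichever DBMS is used.

```python
import sqlite3
import xml.etree.ElementTree as ET

# One generic table holds every form field as an (entity, attribute, value) row,
# so adding a new eCRF field needs no schema change.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ecrf_value (entity TEXT, attribute TEXT, value TEXT)")
rows = [
    ("visit-001", "subject_id", "S-17"),
    ("visit-001", "systolic_bp", "128"),
    ("visit-001", "adverse_event", "none"),
]
conn.executemany("INSERT INTO ecrf_value VALUES (?, ?, ?)", rows)


def entity_to_xml(connection, entity):
    """Rebuild one form instance from its EAV rows as an XML string."""
    root = ET.Element("eCRF", {"id": entity})
    query = "SELECT attribute, value FROM ecrf_value WHERE entity = ? ORDER BY rowid"
    for attribute, value in connection.execute(query, (entity,)):
        item = ET.SubElement(root, "item", {"name": attribute})
        item.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = entity_to_xml(conn, "visit-001")
print(xml_text)
```

Because the application only sees the XML document, the storage layer could in principle be swapped without changing form-handling code, which is the kind of DBMS independence the abstract describes.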
Subject(s)
Database Management Systems , Information Dissemination
ABSTRACT
Toxicogenomics has recently emerged in the field of toxicology, and the DNA microarray technique has become a common strategy in predictive toxicology, which studies the molecular mechanisms caused by exposure to chemical or environmental stress. Although microarray experiments offer extensive genomic information to researchers, the high-dimensional character of the data often makes it hard to extract meaningful results. We therefore developed a toxicant enrichment analysis similar to the common enrichment approach. We also developed a web-based system, graPT, to enable prediction of the toxic endpoints of experimental chemicals.
Subject(s)
Oligonucleotide Array Sequence Analysis , Toxicogenetics , Toxicology
ABSTRACT
Pharmacogenomics research requires an intelligent integration of large-scale genomic and clinical data with public and private knowledge resources. We developed a web-based knowledge base for KPRN (Korea Pharmacogenomics Research Network, http://kprn.snubi.org/). Four major types of information are integrated: genetic variation, drug information, disease information, and literature annotation. Eighteen Korean pharmacogenomics research groups have collaboratively submitted 859 genotype data sets for 91 disease-related genes. Integrative analysis and visualization of this large collection of data, supported by integrated biomedical pathways and ontology resources, are provided through a user-friendly interface and a visualization engine powered by the Generic Genome Browser.