Results 1 - 20 of 2,018
1.
Sci Rep ; 14(1): 12697, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38830890

ABSTRACT

Melanoma, the deadliest form of skin cancer, has seen a steady increase in incidence rates worldwide, posing a significant challenge to dermatologists. Early detection is crucial for improving patient survival rates. However, performing total body screening (TBS), i.e., identifying suspicious lesions or ugly ducklings (UDs) by visual inspection, can be challenging and often requires sound expertise in pigmented lesions. To assist users of varying expertise levels, an artificial intelligence (AI) decision support tool was developed. Our solution identifies and characterizes UDs from real-world wide-field patient images. It employs a state-of-the-art object detection algorithm to locate and isolate all skin lesions present in a patient's total body images. These lesions are then sorted based on their level of suspiciousness using a self-supervised AI approach, tailored to the specific context of the patient under examination. A clinical validation study was conducted to evaluate the tool's performance. The results demonstrated an average sensitivity of 95% for the top-10 AI-identified UDs on skin lesions selected by the majority of experts in pigmented skin lesions. The study also found that the tool increased dermatologists' confidence when formulating a diagnosis, and the average majority agreement with the top-10 AI-identified UDs reached 100% when assisted by our tool. With the development of this AI-based decision support tool, we aim to address the shortage of specialists, enable faster consultation times for patients, and demonstrate the impact and usability of AI-assisted screening. Future developments will include expanding the dataset to include histologically confirmed melanoma and validating the tool for additional body regions.


Subject(s)
Early Detection of Cancer , Melanoma , Skin Neoplasms , Supervised Machine Learning , Humans , Skin Neoplasms/diagnosis , Melanoma/diagnosis , Early Detection of Cancer/methods , Artificial Intelligence , Algorithms , Male , Female , Skin/pathology
2.
BMC Med Inform Decis Mak ; 24(1): 152, 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38831432

ABSTRACT

BACKGROUND: Machine learning (ML) has emerged as the predominant computational paradigm for analyzing large-scale datasets across diverse domains. The assessment of dataset quality stands as a pivotal precursor to the successful deployment of ML models. In this study, we introduce DREAMER (Data REAdiness for MachinE learning Research), an algorithmic framework leveraging supervised and unsupervised machine learning techniques to autonomously evaluate the suitability of tabular datasets for ML model development. DREAMER is openly accessible as a tool on GitHub and Docker, facilitating its adoption and further refinement within the research community. RESULTS: The proposed model in this study was applied to three distinct tabular datasets, resulting in notable enhancements in their quality with respect to readiness for ML tasks, as assessed through established data quality metrics. Our findings demonstrate the efficacy of the framework in substantially augmenting the original dataset quality, achieved through the elimination of extraneous features and rows. This refinement yielded improved accuracy across both supervised and unsupervised learning methodologies. CONCLUSION: Our software presents an automated framework for data readiness, aimed at enhancing the integrity of raw datasets to facilitate robust utilization within ML pipelines. Through our proposed framework, we streamline the original dataset, resulting in enhanced accuracy and efficiency within the associated ML algorithms.
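The DREAMER code itself lives in the GitHub and Docker releases mentioned above; the sketch below is only a rough, hypothetical illustration of the kind of readiness screening the abstract describes (drop near-empty rows and constant features, then report a cross-validated baseline score). Function and column handling are assumptions, not the tool's API.

```python
# Illustrative sketch only -- not the DREAMER implementation.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def readiness_screen(df: pd.DataFrame, target: str):
    X, y = df.drop(columns=[target]), df[target]
    # Drop rows that are mostly missing and columns with no variation.
    X = X.loc[X.isna().mean(axis=1) < 0.5]
    y = y.loc[X.index]
    X = X.loc[:, X.nunique() > 1].fillna(X.median(numeric_only=True))
    # Cross-validated baseline accuracy as a crude readiness score.
    score = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0),
                            X.select_dtypes("number"), y, cv=5).mean()
    return X, score
```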


Subject(s)
Machine Learning , Humans , Datasets as Topic , Unsupervised Machine Learning , Algorithms , Supervised Machine Learning , Software
3.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38801702

ABSTRACT

Self-supervised learning plays an important role in molecular representation learning because labeled molecular data are usually limited in many tasks, such as chemical property prediction and virtual screening. However, most existing molecular pre-training methods focus on one modality of molecular data, and the complementary information of two important modalities, SMILES and graph, is not fully explored. In this study, we propose an effective multi-modality self-supervised learning framework for molecular SMILES and graph. Specifically, SMILES data and graph data are first tokenized so that they can be processed by a unified Transformer-based backbone network, which is trained by a masked reconstruction strategy. In addition, we introduce a specialized non-overlapping masking strategy to encourage fine-grained interaction between these two modalities. Experimental results show that our framework achieves state-of-the-art performance in a series of molecular property prediction tasks, and a detailed ablation study demonstrates the efficacy of the multi-modality framework and the masking strategy.
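The paper's tokenization and masking details are not reproduced here; the sketch below only illustrates the general idea of a non-overlapping (complementary) mask across two modalities, under the simplifying assumption that SMILES tokens and graph nodes can be aligned to the same atom indices.

```python
# Illustrative sketch of non-overlapping masking across two modalities.
# Assumes SMILES tokens and graph nodes share atom indices (a simplification).
import numpy as np

def complementary_masks(n_atoms: int, mask_ratio: float = 0.5, seed: int = 0):
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_atoms)
    k = int(mask_ratio * n_atoms)
    smiles_masked = np.zeros(n_atoms, dtype=bool)
    graph_masked = np.zeros(n_atoms, dtype=bool)
    smiles_masked[perm[:k]] = True       # these atoms are hidden on the SMILES side...
    graph_masked[perm[k:]] = True        # ...and the rest on the graph side,
    return smiles_masked, graph_masked   # so no atom is masked in both modalities.
```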


Subject(s)
Supervised Machine Learning , Algorithms , Computational Biology/methods
4.
Comput Biol Med ; 176: 108547, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38728994

ABSTRACT

Self-supervised pre-training and fully supervised fine-tuning paradigms have received much attention as a way to address the data annotation problem in deep learning. Compared with traditional pre-training on large natural image datasets, medical self-supervised learning methods learn rich representations from the unlabeled data itself, thus avoiding the distribution shift between different image domains. However, current state-of-the-art medical pre-training methods are designed for specific downstream tasks, making them less flexible and difficult to apply to new tasks. In this paper, we propose grid mask image modeling, a flexible and general self-supervised method to pre-train medical vision transformers for 3D medical image segmentation. Our goal is to guide networks to learn the correlations between organs and tissues by reconstructing original images based on partial observations. These relationships are consistent within the human body and invariant to disease type or imaging modality. To achieve this, we design a Siamese framework consisting of an online branch and a target branch. An adaptive and hierarchical masking strategy is employed in the online branch to (1) learn the boundaries or small contextual mutation regions within images and (2) learn high-level semantic representations from deeper layers of the multiscale encoder. In addition, the target branch provides representations for contrastive learning to further reduce representation redundancy. We evaluate our method through segmentation performance on two public datasets. The experimental results demonstrate that our method outperforms other self-supervised methods. Code is available at https://github.com/mobiletomb/Gmim.
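The adaptive, hierarchical masking described above is more involved than this; as a starting point, the sketch below shows only a basic grid-style masking of non-overlapping 3D patches for a reconstruction pretext task. Patch size and mask ratio are illustrative assumptions.

```python
# Minimal sketch of grid-style masking on a 3D volume (not the paper's adaptive,
# hierarchical strategy): split the volume into non-overlapping cubes and zero out
# a random subset before reconstruction.
import numpy as np

def grid_mask_3d(volume: np.ndarray, patch: int = 16, mask_ratio: float = 0.6, seed: int = 0):
    rng = np.random.default_rng(seed)
    d, h, w = volume.shape
    masked = volume.copy()
    grid = [(z, y, x)
            for z in range(0, d - patch + 1, patch)
            for y in range(0, h - patch + 1, patch)
            for x in range(0, w - patch + 1, patch)]
    n_mask = int(mask_ratio * len(grid))
    for idx in rng.choice(len(grid), size=n_mask, replace=False):
        z, y, x = grid[idx]
        masked[z:z + patch, y:y + patch, x:x + patch] = 0
    return masked
```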


Subject(s)
Imaging, Three-Dimensional , Humans , Imaging, Three-Dimensional/methods , Deep Learning , Algorithms , Supervised Machine Learning
5.
Cell ; 187(10): 2502-2520.e17, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38729110

ABSTRACT

Human tissue, which is inherently three-dimensional (3D), is traditionally examined through standard-of-care histopathology as limited two-dimensional (2D) cross-sections that can insufficiently represent the tissue due to sampling bias. To holistically characterize histomorphology, 3D imaging modalities have been developed, but clinical translation is hampered by complex manual evaluation and lack of computational platforms to distill clinical insights from large, high-resolution datasets. We present TriPath, a deep-learning platform for processing tissue volumes and efficiently predicting clinical outcomes based on 3D morphological features. Recurrence risk-stratification models were trained on prostate cancer specimens imaged with open-top light-sheet microscopy or microcomputed tomography. By comprehensively capturing 3D morphologies, 3D volume-based prognostication achieves superior performance to traditional 2D slice-based approaches, including clinical/histopathological baselines from six certified genitourinary pathologists. Incorporating greater tissue volume improves prognostic performance and mitigates risk prediction variability from sampling bias, further emphasizing the value of capturing larger extents of heterogeneous morphology.


Subject(s)
Imaging, Three-Dimensional , Prostatic Neoplasms , Supervised Machine Learning , Humans , Male , Deep Learning , Imaging, Three-Dimensional/methods , Prognosis , Prostatic Neoplasms/pathology , Prostatic Neoplasms/diagnostic imaging , X-Ray Microtomography/methods
6.
Comput Biol Med ; 176: 108554, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38744013

ABSTRACT

Kidney tumors are among the most common diseases affecting people around the world. The risk of kidney disease increases due to factors such as the consumption of ready-made food and bad habits. Early diagnosis of kidney tumors is essential for effective treatment, reducing side effects, and reducing the number of deaths. With the development of computer-aided diagnostic methods, the need for accurate renal tumor classification is also increasing. Because traditional methods based on manual detection are time-consuming, tedious, and costly, deep learning (DL) methods can perform high-accuracy kidney tumor detection (KTD) faster and at lower cost. Among the current challenges of artificial intelligence-assisted KTD, obtaining more precise diagnostic information and classifying with high accuracy are critical for reliable clinical decision-making and treatment planning. This motivates us to propose a more effective DL model that can assist specialist physicians in the diagnosis of kidney tumors. In this way, the workload of radiologists can be alleviated and errors in clinical diagnoses that may occur due to the complex structure of the kidney can be prevented. A large amount of data is needed during the training of the developed methods. Although various studies have been conducted to reduce the amount of data with feature selection techniques, these techniques provide little improvement in the classification accuracy rate. In this paper, a masked autoencoder (MAE) is proposed for KTD, which can produce effective results on datasets containing few samples and can be directly pre-trained and fine-tuned. Self-supervised learning (SSL) is achieved through self-distillation (SD), which is reintroduced into the loss calculation using masked patches. In SSLSD-KTD, the SD loss is calculated on the latent representations of the encoder and decoder outputs; the encoder captures local attention, while the decoder transfers its global attention to calculate the losses. The SSLSD-KTD method reached 98.04 % classification accuracy on the KAUH-kidney dataset, comprising 8400 samples, and 82.14 % on the CT-kidney dataset, containing 840 samples. By adding more external information to the SSLSD-KTD method with transfer learning, accuracy results of 99.82 % and 95.24 % were obtained on the same datasets. Experimental results have shown that the SSLSD-KTD method can effectively extract kidney tumor features with limited data and can be an aid, or even an alternative, for radiologists in decision-making in the diagnosis of the disease.


Subject(s)
Kidney Neoplasms , Tomography, X-Ray Computed , Humans , Kidney Neoplasms/diagnostic imaging , Kidney Neoplasms/classification , Tomography, X-Ray Computed/methods , Supervised Machine Learning , Deep Learning , Kidney/diagnostic imaging , Male , Female , Radiographic Image Interpretation, Computer-Assisted/methods
7.
Comput Methods Programs Biomed ; 251: 108229, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38761413

ABSTRACT

BACKGROUND AND OBJECTIVE: Optical coherence tomography (OCT) is currently one of the most advanced retinal imaging methods. Retinal biomarkers in OCT images are of clinical significance and can assist ophthalmologists in diagnosing lesions. Compared with fundus images, OCT can provide higher resolution segmentation. However, image annotation at the bounding box level needs to be performed carefully by ophthalmologists and is difficult to obtain. In addition, the large variation in the shape of different retinal markers and the inconspicuous appearance of biomarkers make it difficult for existing deep learning-based methods to detect them effectively. To overcome these challenges, we propose a novel network for the detection of retinal biomarkers in OCT images. METHODS: We first address the issue of labeling cost using a novel weakly semi-supervised object detection method with point annotations, which can reduce bounding box-level annotation effort. To extend the method to the detection of biomarkers in OCT images, we propose multiple consistent regularizations for the point-to-box regression network to deal with the shortage of supervision, aiming to learn more accurate regression mappings. Furthermore, in the subsequent fully supervised detection, we propose a cross-scale feature enhancement module to alleviate the detection problems caused by the large-scale variation of biomarkers. We also propose a dynamic label assignment strategy to distinguish samples of different importance more flexibly, thereby reducing detection errors due to the indistinguishable appearance of the biomarkers. RESULTS: When using our detection network, our regressor achieves an AP value of 20.83 % when utilizing a 5 % fully labeled dataset partition, surpassing the performance of other comparative methods at 5 % and 10 % labeling and coming close to the 20.87 % result achieved by Point DETR under 20 % full labeling. When using Group R-CNN as the point-to-box regressor, our detector achieves 27.21 % AP in the 50 % fully labeled dataset experiment, a 7.42 % AP improvement compared to our detection network baseline, Faster R-CNN. CONCLUSIONS: The experimental findings not only demonstrate the effectiveness of our approach with minimal bounding box annotations but also highlight the enhanced biomarker detection performance of the proposed module. We have included a detailed algorithmic flow in the supplementary material.


Subject(s)
Algorithms , Biomarkers , Retina , Tomography, Optical Coherence , Tomography, Optical Coherence/methods , Humans , Retina/diagnostic imaging , Deep Learning , Image Processing, Computer-Assisted/methods , Supervised Machine Learning , Neural Networks, Computer , Image Interpretation, Computer-Assisted/methods
8.
Comput Biol Med ; 176: 108605, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38772054

ABSTRACT

In this work, we study various hybrid models of entropy-based and representativeness sampling techniques in the context of active learning for medical segmentation, in particular examining the role of UMAP (Uniform Manifold Approximation and Projection) as a technique for capturing representativeness. Although UMAP has been shown to be viable as a general-purpose dimension reduction method in diverse areas, its role in deep learning-based medical segmentation has not yet been extensively explored. Using the cardiac and prostate datasets in the Medical Segmentation Decathlon for validation, we found that a novel hybrid Entropy-UMAP sampling technique achieved a statistically significant Dice score advantage over the random baseline (3.2% for cardiac, 4.5% for prostate) and attained the highest Dice coefficient among the spectrum of 10 distinct active learning methodologies we examined. This provides preliminary evidence of an interesting synergy between entropy-based and UMAP methods when the former precedes the latter in a hybrid model of active learning.
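As a hedged reading of the Entropy-UMAP hybrid (entropy first, then representativeness), one query step might look like the sketch below: pre-filter the unlabeled pool by predictive entropy, then pick one sample per k-means cluster in a UMAP embedding. The pool-size factor and cluster heuristics are assumptions, not the paper's settings.

```python
# Hedged sketch of an Entropy -> UMAP query step for active learning.
import numpy as np
import umap                                  # pip install umap-learn
from sklearn.cluster import KMeans

def entropy_umap_query(probs: np.ndarray, features: np.ndarray,
                       n_query: int = 16, pool_factor: int = 5, seed: int = 0):
    # probs: (N, C) softmax outputs; features: (N, D) embeddings of unlabeled samples.
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    candidates = np.argsort(entropy)[-n_query * pool_factor:]     # most uncertain samples
    emb = umap.UMAP(n_components=2, random_state=seed).fit_transform(features[candidates])
    labels = KMeans(n_clusters=n_query, random_state=seed, n_init=10).fit_predict(emb)
    # One sample per cluster as a crude representativeness criterion.
    picks = [candidates[np.where(labels == c)[0][0]] for c in range(n_query)]
    return np.array(picks)                   # indices into the unlabeled pool
```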


Subject(s)
Entropy , Humans , Male , Deep Learning , Prostate/diagnostic imaging , Image Processing, Computer-Assisted/methods , Supervised Machine Learning , Heart
9.
Comput Biol Med ; 176: 108609, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38772056

ABSTRACT

Semi-supervised medical image segmentation presents a compelling approach to streamline large-scale image analysis, alleviating annotation burdens while maintaining comparable performance. Despite recent strides in cross-supervised training paradigms, challenges persist in addressing sub-network disagreement and training efficiency and reliability. In response, our paper introduces a novel cross-supervised learning framework, the Quality-driven Deep Cross-supervised Learning Network (QDC-Net). QDC-Net incorporates both an evidential sub-network and a vanilla sub-network, leveraging their complementary strengths to effectively handle disagreement. To improve the reliability and efficiency of semi-supervised training, we introduce a real-time quality estimation of the model's segmentation performance and propose a directional cross-training approach through the design of directional weights. We further design a truncated form of sample-wise loss weighting to mitigate the impact of inaccurate predictions and collapsed samples in semi-supervised training. Extensive experiments on the LA and Pancreas-CT datasets demonstrate that QDC-Net surpasses other state-of-the-art methods in semi-supervised medical image segmentation. Code release is available at https://github.com/Medsemiseg.
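QDC-Net's exact loss is not spelled out in the abstract; the sketch below only illustrates the general idea of truncated sample-wise loss weighting, where pseudo-labelled samples with low estimated quality are zeroed out and the rest are weighted by their quality score. The threshold and weighting form are assumptions.

```python
# Illustrative sketch (not QDC-Net's formulation) of truncated sample-wise loss weighting.
import torch

def truncated_weighted_loss(per_sample_loss: torch.Tensor,
                            quality: torch.Tensor,
                            threshold: float = 0.5) -> torch.Tensor:
    # per_sample_loss: (N,) unsupervised losses; quality: (N,) estimated scores in [0, 1].
    weights = torch.where(quality >= threshold, quality, torch.zeros_like(quality))
    return (weights * per_sample_loss).sum() / weights.sum().clamp(min=1e-8)
```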


Subject(s)
Supervised Machine Learning , Humans , Deep Learning , Image Processing, Computer-Assisted/methods , Pancreas/diagnostic imaging , Tomography, X-Ray Computed
10.
Zhonghua Wei Zhong Bing Ji Jiu Yi Xue ; 36(4): 345-352, 2024 Apr.
Article in Chinese | MEDLINE | ID: mdl-38813626

ABSTRACT

OBJECTIVE: To construct and validate the best predictive model for 28-day death risk in patients with septic shock based on different supervised machine learning algorithms. METHODS: Patients with septic shock meeting the Sepsis-3 criteria were selected from the Medical Information Mart for Intensive Care-IV v2.0 (MIMIC-IV v2.0). According to the principle of random allocation, 70% of these patients were used as the training set and 30% as the validation set. Relevant predictive variables were extracted from three aspects: demographic characteristics and basic vital signs; serum indicators within 24 hours of intensive care unit (ICU) admission and complications possibly affecting indicators; and functional scoring and advanced life support. The predictive efficacy of models constructed using five mainstream machine learning algorithms, including the decision tree classification and regression tree (CART), random forest (RF), support vector machine (SVM), linear regression (LR), and super learner [SL; combining CART, RF and extreme gradient boosting (XGBoost)], for 28-day death in patients with septic shock was compared, and the best algorithm model was selected. The optimal predictive variables were determined by intersecting the results from the LASSO regression, RF, and XGBoost algorithms, and a predictive model was constructed. The predictive efficacy of the model was validated by drawing the receiver operating characteristic (ROC) curve, the accuracy of the model was assessed using calibration curves, and the practicality of the model was verified through decision curve analysis (DCA). RESULTS: A total of 3 295 patients with septic shock were included, with 2 164 surviving and 1 131 dying within 28 days, giving a mortality of 34.32%. Of these, 2 307 were in the training set (792 deaths within 28 days, a mortality of 34.33%) and 988 in the validation set (339 deaths within 28 days, a mortality of 34.31%). Five machine learning models were established based on the training set data. After including variables from the three aspects, the area under the ROC curve (AUC) of the RF, SVM, and LR models for predicting 28-day death in septic shock patients in the validation set was 0.823 [95% confidence interval (95%CI) 0.795-0.849], 0.823 (95%CI 0.796-0.849), and 0.810 (95%CI 0.782-0.838), respectively, which were higher than those of the CART model (AUC = 0.750, 95%CI 0.717-0.782) and the SL model (AUC = 0.756, 95%CI 0.724-0.789). These three algorithms were therefore determined to be the best. After integrating variables from the three aspects, 16 optimal predictive variables were identified through intersection by the LASSO regression, RF, and XGBoost algorithms, including the highest pH value, the highest albumin (Alb), the highest body temperature, the lowest lactic acid (Lac), the highest Lac, the highest serum creatinine (SCr), the highest Ca2+, the lowest hemoglobin (Hb), the lowest white blood cell count (WBC), age, simplified acute physiology score III (SAPS III), the highest WBC, acute physiology score III (APS III), the lowest Na+, body mass index (BMI), and the shortest activated partial thromboplastin time (APTT) within 24 hours of ICU admission. ROC curve analysis showed that the Logistic regression model constructed with the above 16 optimal predictive variables was the best predictive model, with an AUC of 0.806 (95%CI 0.778-0.835) in the validation set. The calibration curve and DCA curve showed that this model had high accuracy and the highest net benefit could reach 0.3, significantly outperforming traditional models based on a single functional score [APS III, SAPS III, and sequential organ failure assessment (SOFA) scores], with AUCs (95%CI) of 0.746 (0.715-0.778), 0.765 (0.734-0.796), and 0.625 (0.589-0.661), respectively. CONCLUSIONS: The Logistic regression model constructed using the 16 optimal predictive variables, including pH value, Alb, body temperature, Lac, SCr, Ca2+, Hb, WBC, SAPS III score, APS III score, Na+, BMI, and APTT, is the best predictive model for the 28-day death risk in patients with septic shock. Its performance is stable, with high discriminative ability and accuracy.
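As an illustration of the variable-selection-by-intersection step described above (not the study's code), the sketch below keeps features ranked highly by an L1-penalized logistic regression standing in for LASSO, a random forest, and a gradient-boosting model standing in for XGBoost, then fits a logistic regression and reports the validation AUC. Function name, feature handling, and hyperparameters are placeholders.

```python
# Hedged sketch of feature selection by intersection followed by logistic regression.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def intersect_and_fit(X_train, y_train, X_val, y_val, feature_names, top_k=16):
    # X_* are numpy arrays; feature_names is a list of column names.
    def top_features(importances):
        return set(np.array(feature_names)[np.argsort(importances)[-top_k:]])
    lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X_train, y_train)
    rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
    gb = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
    selected = (top_features(np.abs(lasso.coef_[0]))
                & top_features(rf.feature_importances_)
                & top_features(gb.feature_importances_))       # intersection of the three
    idx = [feature_names.index(f) for f in sorted(selected)]
    model = LogisticRegression(max_iter=1000).fit(X_train[:, idx], y_train)
    auc = roc_auc_score(y_val, model.predict_proba(X_val[:, idx])[:, 1])
    return sorted(selected), auc
```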


Subject(s)
Algorithms , Shock, Septic , Supervised Machine Learning , Support Vector Machine , Humans , Shock, Septic/mortality , Shock, Septic/diagnosis , Female , Prognosis , Intensive Care Units , Male , Middle Aged , Machine Learning , Decision Trees
11.
PLoS One ; 19(5): e0299583, 2024.
Article in English | MEDLINE | ID: mdl-38696410

ABSTRACT

The mapping of metabolite-specific data to pathways within cellular metabolism is a major data analysis step needed for biochemical interpretation. A variety of machine learning approaches, particularly deep learning approaches, have been used to predict these metabolite-to-pathway mappings, utilizing a training dataset of known metabolite-to-pathway mappings. A few such training datasets have been derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG). However, several prior published machine learning approaches utilized an erroneous KEGG-derived training dataset that used SMILES molecular representation strings (the KEGG-SMILES dataset) and contained a sizable proportion (~26%) of duplicate entries. The presence of so many duplicates taints the training and testing sets generated from k-fold cross-validation of the KEGG-SMILES dataset. Therefore, the k-fold cross-validation performance of the resulting machine learning models was grossly inflated by the erroneous presence of these duplicate entries. Here we describe and evaluate the KEGG-SMILES dataset so that others may avoid using it. We also identify the prior publications that utilized this erroneous KEGG-SMILES dataset so their machine learning results can be properly and critically evaluated. In addition, we demonstrate the reduction in model k-fold cross-validation (CV) performance after de-duplicating the KEGG-SMILES dataset. This is a cautionary tale about properly vetting prior published benchmark datasets before using them in machine learning approaches. We hope others will avoid similar mistakes.
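The practical remedy implied by this cautionary tale is to deduplicate before cross-validation. A minimal sketch, assuming a hypothetical dataframe with a SMILES column, a pathway label, and numeric/categorical feature columns, is shown below.

```python
# Cautionary sketch: compare CV scores with and without duplicate entries, so that
# identical records cannot appear in both training and test folds.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def cv_with_and_without_duplicates(df: pd.DataFrame, smiles_col="smiles", label_col="pathway"):
    def run_cv(frame):
        X = pd.get_dummies(frame.drop(columns=[smiles_col, label_col]))
        return cross_val_score(RandomForestClassifier(random_state=0),
                               X, frame[label_col], cv=5).mean()
    inflated = run_cv(df)                                    # duplicates leak across folds
    deduped = run_cv(df.drop_duplicates(subset=smiles_col))  # duplicates removed first
    return inflated, deduped
```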


Subject(s)
Metabolic Networks and Pathways , Supervised Machine Learning , Humans , Datasets as Topic
12.
Sci Rep ; 14(1): 10820, 2024 May 11.
Article in English | MEDLINE | ID: mdl-38734825

ABSTRACT

Advancements in clinical treatment are increasingly constrained by the limitations of supervised learning techniques, which depend heavily on large volumes of annotated data. The annotation process is not only costly but also demands substantial time from clinical specialists. Addressing this issue, we introduce the S4MI (Self-Supervision and Semi-Supervision for Medical Imaging) pipeline, a novel approach that leverages advancements in self-supervised and semi-supervised learning. These techniques engage in auxiliary tasks that do not require labeling, thus simplifying the scaling of machine supervision compared to fully-supervised methods. Our study benchmarks these techniques on three distinct medical imaging datasets to evaluate their effectiveness in classification and segmentation tasks. Notably, we observed that self-supervised learning significantly surpassed the performance of supervised methods in the classification of all evaluated datasets. Remarkably, the semi-supervised approach demonstrated superior outcomes in segmentation, outperforming fully-supervised methods while using 50% fewer labels across all datasets. In line with our commitment to contributing to the scientific community, we have made the S4MI code openly accessible, allowing for broader application and further development of these methods. The code can be accessed at https://github.com/pranavsinghps1/S4MI .


Subject(s)
Image Processing, Computer-Assisted , Supervised Machine Learning , Humans , Image Processing, Computer-Assisted/methods , Diagnostic Imaging/methods , Algorithms
13.
PeerJ ; 12: e17361, 2024.
Article in English | MEDLINE | ID: mdl-38737741

ABSTRACT

Phytoplankton, found in oceans, seas and large water bodies, are the world's largest oxygen producers and play crucial roles in the marine food chain. Unbalanced biogeochemical features such as salinity, pH and minerals can retard their growth. With advances in hardware, the use of Artificial Intelligence techniques for building intelligent decision-making systems is rapidly increasing. We therefore attempt to address this gap by applying supervised regression to reanalysis data targeting phytoplankton levels in global waters. The presented experiment applies different supervised machine learning regression techniques, such as random forest, extra trees, bagging and histogram-based gradient boosting regressors, to reanalysis data obtained from the Copernicus Global Ocean Biogeochemistry Hindcast dataset. Results obtained from the experiment predicted phytoplankton levels with a coefficient of determination (R2) of up to 0.96. After further validation with larger datasets, the model can be deployed in a production environment to complement in-situ measurement efforts.
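A minimal sketch of the regression comparison described above, assuming the biogeochemical predictors and phytoplankton target are already loaded as arrays; the split and hyperparameters are illustrative, not the study's settings.

```python
# Sketch comparing the four regressor families named in the abstract by held-out R^2.
from sklearn.ensemble import (RandomForestRegressor, ExtraTreesRegressor,
                              BaggingRegressor, HistGradientBoostingRegressor)
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

def compare_regressors(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
    models = {
        "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
        "extra_trees": ExtraTreesRegressor(n_estimators=200, random_state=0),
        "bagging": BaggingRegressor(random_state=0),
        "hist_gradient_boosting": HistGradientBoostingRegressor(random_state=0),
    }
    return {name: r2_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
            for name, m in models.items()}
```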


Subject(s)
Machine Learning , Phytoplankton , Remote Sensing Technology , Remote Sensing Technology/methods , Remote Sensing Technology/instrumentation , Oceans and Seas , Environmental Monitoring/methods , Supervised Machine Learning
14.
J Neural Eng ; 21(3), 2024 May 17.
Article in English | MEDLINE | ID: mdl-38757187

ABSTRACT

Objective. For research on brain-computer interfaces (BCIs), it is crucial to design an MI-EEG recognition model that has high classification accuracy and strong generalization ability and does not rely on a large number of labeled training samples. Approach. In this paper, we propose a self-supervised MI-EEG recognition method based on one-dimensional multi-task convolutional neural networks and long short-term memory (1-D MTCNN-LSTM). The model is divided into two stages: a signal transform identification stage and a pattern recognition stage. In the signal transform identification phase, the signal transform dataset is recognized by the upstream 1-D MTCNN-LSTM network model. Subsequently, the backbone network from the signal transform identification phase is transferred to the pattern recognition phase, where it is fine-tuned using a trace amount of labeled data to obtain the final motion recognition model. Main results. The upstream stage of this study achieves more than 95% recognition accuracy for EEG signal transforms, up to 100%. For MI-EEG pattern recognition, the model obtained recognition accuracies of 82.04% and 87.14%, with F1 scores of 0.7856 and 0.839, on the BCIC-IV-2b and BCIC-IV-2a datasets. Significance. The improved accuracy demonstrates the superiority of the proposed method, which is a promising approach for accurate classification of MI-EEG in BCI systems.
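The abstract does not list the exact signal transforms used in the upstream stage; the sketch below shows one plausible way to generate a pretext dataset in which the transform identity serves as the pseudo-label. The transform set is an assumption, not the paper's.

```python
# Hedged sketch of pretext-task data generation for signal-transform recognition.
import numpy as np

TRANSFORMS = {
    0: lambda x: x,                                        # original
    1: lambda x: x + np.random.normal(0, 0.05, x.shape),   # additive noise
    2: lambda x: x * 1.5,                                  # amplitude scaling
    3: lambda x: x[..., ::-1].copy(),                      # time reversal
    4: lambda x: -x,                                       # polarity inversion
}

def make_pretext_dataset(windows: np.ndarray):
    # windows: (N, channels, samples) unlabeled EEG segments.
    xs, ys = [], []
    for w in windows:
        for label, fn in TRANSFORMS.items():
            xs.append(fn(w))
            ys.append(label)                               # transform identity is the label
    return np.stack(xs), np.array(ys)
```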


Subject(s)
Brain-Computer Interfaces , Electroencephalography , Imagination , Neural Networks, Computer , Electroencephalography/methods , Humans , Imagination/physiology , Supervised Machine Learning , Pattern Recognition, Automated/methods
15.
BMC Med Inform Decis Mak ; 24(1): 126, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38755563

ABSTRACT

BACKGROUND: Chest X-ray imaging based abnormality localization, essential in diagnosing various diseases, faces significant clinical challenges due to complex interpretations and the growing workload of radiologists. While recent advances in deep learning offer promising solutions, there is still a critical issue of domain inconsistency in cross-domain transfer learning, which hampers the efficiency and accuracy of diagnostic processes. This study aims to address the domain inconsistency problem and improve the automatic abnormality localization performance of heterogeneous chest X-ray image analysis, particularly in detecting abnormalities, by developing a self-supervised learning strategy called "BarlowTwins-CXR". METHODS: We utilized two publicly available datasets: the NIH Chest X-ray Dataset and VinDr-CXR. The BarlowTwins-CXR approach was conducted in a two-stage training process. Initially, self-supervised pre-training was performed using an adjusted Barlow Twins algorithm on the NIH dataset with a ResNet50 backbone pre-trained on ImageNet. This was followed by supervised fine-tuning on the VinDr-CXR dataset using Faster R-CNN with a Feature Pyramid Network (FPN). The study employed mean Average Precision (mAP) at an Intersection over Union (IoU) of 50% and Area Under the Curve (AUC) for performance evaluation. RESULTS: Our experiments showed a significant improvement in model performance with BarlowTwins-CXR. The approach achieved a 3% increase in mAP50 accuracy compared to traditional ImageNet pre-trained models. In addition, the Ablation CAM method revealed enhanced precision in localizing chest abnormalities. The study involved 112,120 images from the NIH dataset and 18,000 images from the VinDr-CXR dataset, indicating robust training and testing samples. CONCLUSION: BarlowTwins-CXR significantly enhances the efficiency and accuracy of chest X-ray image-based abnormality localization, outperforming traditional transfer learning methods and effectively overcoming domain inconsistency in cross-domain scenarios. Our experimental results demonstrate the potential of using self-supervised learning to improve the generalizability of models in medical settings with limited amounts of heterogeneous data. This approach can be instrumental in aiding radiologists, particularly in high-workload environments, offering a promising direction for future AI-driven healthcare solutions.
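The authors' adjusted Barlow Twins algorithm is not detailed in the abstract; the sketch below shows the standard Barlow Twins objective that such pre-training builds on (the cross-correlation matrix between two views' embeddings is pushed toward the identity), not the paper's training code.

```python
# Standard Barlow Twins loss (illustrative, PyTorch).
import torch

def barlow_twins_loss(z1: torch.Tensor, z2: torch.Tensor, lambd: float = 5e-3) -> torch.Tensor:
    # z1, z2: (batch, dim) projector outputs for two augmented views of the same batch.
    n, d = z1.shape
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / n                      # (dim, dim) cross-correlation matrix
    diag = torch.diagonal(c)
    on_diag = (diag - 1).pow(2).sum()        # push diagonal toward 1
    off_diag = c.pow(2).sum() - diag.pow(2).sum()   # push off-diagonal toward 0
    return on_diag + lambd * off_diag
```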


Subject(s)
Radiography, Thoracic , Supervised Machine Learning , Humans , Deep Learning , Radiographic Image Interpretation, Computer-Assisted/methods , Datasets as Topic
16.
BMC Med Inform Decis Mak ; 24(1): 127, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38755570

ABSTRACT

BACKGROUND: Medical records are a valuable source for understanding patient health conditions. Doctors often use these records to assess health without solely depending on time-consuming and complex examinations. However, these records may not always be directly relevant to a patient's current health issue. For instance, information about common colds may not be relevant to a more specific health condition. While experienced doctors can effectively navigate through unnecessary details in medical records, this excess information presents a challenge for machine learning models in predicting diseases electronically. To address this, we have developed 'al-BERT', a new disease prediction model that leverages the BERT framework. This model is designed to identify crucial information from medical records and use it to predict diseases. 'al-BERT' operates on the principle that the structure of sentences in diagnostic records is similar to regular linguistic patterns. However, just as stuttering in speech can introduce 'noise' or irrelevant information, similar issues can arise in written records, complicating model training. To overcome this, 'al-BERT' incorporates a semi-supervised layer that filters out irrelevant data from patient visitation records. This process aims to refine the data, resulting in more reliable indicators for disease correlations and enhancing the model's predictive accuracy and utility in medical diagnostics. METHOD: To discern noise diseases within patient records, especially those resembling influenza-like illnesses, our approach employs a customized semi-supervised learning algorithm equipped with a focused attention mechanism. This mechanism is specifically calibrated to enhance the model's sensitivity to chronic conditions while concurrently distilling salient features from patient records, thereby augmenting the predictive accuracy and utility of the model in clinical settings. We evaluate the performance of al-BERT using real-world health insurance data provided by Taiwan's National Health Insurance. RESULT: In our study, we evaluated our model against two others: one based on BERT that uses complete disease records, and another variant that includes extra filtering techniques. Our findings show that models incorporating filtering mechanisms typically perform better than those using the entire, unfiltered dataset. Our approach resulted in improved outcomes across several key measures: AUC-ROC (an indicator of a model's ability to distinguish between classes), precision (the accuracy of positive predictions), recall (the model's ability to find all relevant cases), and overall accuracy. Most notably, our model showed a 15% improvement in recall compared to the current best-performing method in the field of disease prediction. CONCLUSION: The conducted ablation study affirms the advantages of our attention mechanism and underscores the crucial role of the selection module within al-BERT.


Subject(s)
Electronic Health Records , Humans , Supervised Machine Learning , Machine Learning
17.
Biometrics ; 80(2), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38768225

ABSTRACT

Conventional supervised learning usually operates under the premise that data are collected from the same underlying population. However, challenges may arise when integrating new data from different populations, resulting in a phenomenon known as dataset shift. This paper focuses on prior probability shift, where the distribution of the outcome varies across datasets but the conditional distribution of features given the outcome remains the same. To tackle the challenges posed by such shift, we propose an estimation algorithm that can efficiently combine information from multiple sources. Unlike existing methods that are restricted to discrete outcomes, the proposed approach accommodates both discrete and continuous outcomes. It also handles high-dimensional covariate vectors through variable selection using an adaptive least absolute shrinkage and selection operator penalty, producing efficient estimates that possess the oracle property. Moreover, a novel semiparametric likelihood ratio test is proposed to check the validity of prior probability shift assumptions by embedding the null conditional density function into Neyman's smooth alternatives (Neyman, 1937) and testing study-specific parameters. We demonstrate the effectiveness of our proposed method through extensive simulations and a real data example. The proposed methods serve as a useful addition to the repertoire of tools for dealing with dataset shifts.
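For readers unfamiliar with prior probability shift, the standard posterior adjustment (the general idea, not the paper's multi-source estimator) reweights source-trained class posteriors by the ratio of target to source class priors, since the class-conditional feature distribution is assumed unchanged:

```python
# Worked sketch of posterior adjustment under prior probability shift:
# p_target(y|x) is proportional to p_source(y|x) * p_target(y) / p_source(y).
import numpy as np

def adjust_for_prior_shift(source_posteriors: np.ndarray,
                           source_priors: np.ndarray,
                           target_priors: np.ndarray) -> np.ndarray:
    # source_posteriors: (N, C) p_source(y | x); priors: (C,) class frequencies.
    ratio = target_priors / source_priors
    unnormalized = source_posteriors * ratio          # proportional to p_target(y | x)
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)
```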


Subject(s)
Algorithms , Computer Simulation , Models, Statistical , Probability , Humans , Likelihood Functions , Biometry/methods , Data Interpretation, Statistical , Supervised Machine Learning
18.
Cell Syst ; 15(5): 475-482.e6, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38754367

ABSTRACT

Image-based spatial transcriptomics methods enable transcriptome-scale gene expression measurements with spatial information but require complex, manually tuned analysis pipelines. We present Polaris, an analysis pipeline for image-based spatial transcriptomics that combines deep-learning models for cell segmentation and spot detection with a probabilistic gene decoder to quantify single-cell gene expression accurately. Polaris offers a unifying, turnkey solution for analyzing spatial transcriptomics data from multiplexed error-robust FISH (MERFISH), sequential fluorescence in situ hybridization (seqFISH), or in situ RNA sequencing (ISS) experiments. Polaris is available through the DeepCell software library (https://github.com/vanvalenlab/deepcell-spots) and https://www.deepcell.org.


Subject(s)
Deep Learning , Gene Expression Profiling , In Situ Hybridization, Fluorescence , Transcriptome , In Situ Hybridization, Fluorescence/methods , Transcriptome/genetics , Gene Expression Profiling/methods , Software , Humans , Single-Cell Analysis/methods , Image Processing, Computer-Assisted/methods , Single Molecule Imaging/methods , Animals , Supervised Machine Learning
19.
Sci Rep ; 14(1): 12543, 2024 May 31.
Article in English | MEDLINE | ID: mdl-38822075

ABSTRACT

The present study combined a supervised machine learning framework with an unsupervised method, finite mixture modeling, to identify prognostically meaningful subgroups of diverse chronic pain patients undergoing interdisciplinary treatment. Questionnaire data collected at pre-treatment and 1-year follow-up from 11,995 patients in the Swedish Quality Registry for Pain Rehabilitation were used. Indicators measuring pain characteristics, psychological aspects, and social functioning and general health status were used to form subgroups, and pain interference at follow-up was used for model selection and performance evaluation. A nested cross-validation procedure was used to determine the number of classes (inner cross-validation) and the prediction accuracy of the selected model among unseen cases (outer cross-validation). A four-class solution was identified as the optimal model. The identified subgroups were separable on the indicators, predictive of long-term outcomes, and related to background characteristics. Results are discussed in relation to previous attempts to cluster patients with diverse chronic pain conditions. Our analytical approach, the first to combine mixture modeling with supervised, targeted learning, provides a promising framework that can be further extended and optimized for improving accurate prognosis in pain treatment and identifying clinically meaningful subgroups among chronic pain patients.
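The unsupervised half of this pipeline can be sketched with a Gaussian mixture model whose class count is chosen by an information criterion; note that the study instead selected and evaluated models by out-of-sample prediction of pain interference inside a nested cross-validation, which this simplification omits.

```python
# Hedged sketch: fit finite (Gaussian) mixture models with different class counts
# and pick the number of classes by BIC, then return subgroup assignments.
import numpy as np
from sklearn.mixture import GaussianMixture

def select_mixture_classes(indicators: np.ndarray, max_classes: int = 8, seed: int = 0):
    fits = {k: GaussianMixture(n_components=k, random_state=seed).fit(indicators)
            for k in range(1, max_classes + 1)}
    best_k = min(fits, key=lambda k: fits[k].bic(indicators))
    return best_k, fits[best_k].predict(indicators)       # subgroup assignments
```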


Subject(s)
Chronic Pain , Supervised Machine Learning , Humans , Male , Female , Middle Aged , Prognosis , Adult , Aged , Sweden , Surveys and Questionnaires
20.
Nat Commun ; 15(1): 3942, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38729933

ABSTRACT

In clinical oncology, many diagnostic tasks rely on the identification of cells in histopathology images. While supervised machine learning techniques necessitate the need for labels, providing manual cell annotations is time-consuming. In this paper, we propose a self-supervised framework (enVironment-aware cOntrastive cell represenTation learning: VOLTA) for cell representation learning in histopathology images using a technique that accounts for the cell's mutual relationship with its environment. We subject our model to extensive experiments on data collected from multiple institutions comprising over 800,000 cells and six cancer types. To showcase the potential of our proposed framework, we apply VOLTA to ovarian and endometrial cancers and demonstrate that our cell representations can be utilized to identify the known histotypes of ovarian cancer and provide insights that link histopathology and molecular subtypes of endometrial cancer. Unlike supervised models, we provide a framework that can empower discoveries without any annotation data, even in situations where sample sizes are limited.


Subject(s)
Endometrial Neoplasms , Ovarian Neoplasms , Humans , Female , Endometrial Neoplasms/pathology , Ovarian Neoplasms/pathology , Machine Learning , Supervised Machine Learning , Algorithms , Image Processing, Computer-Assisted/methods