Results 1 - 20 of 373
1.
Cureus ; 16(6): e61606, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38962619

ABSTRACT

We present the case of a 56-year-old female with a significant medical history of cholelithiasis and recurrent choledocholithiasis. Following an elective cholecystectomy, an obstructing gallstone in the common bile duct led to a series of interventions, including endoscopic retrograde cholangiopancreatography and stent placement. The patient was scheduled for a robot-assisted laparoscopic common bile duct exploration. Due to severe adhesions, the procedure was converted to an open approach through a large right upper quadrant incision. A continuous external oblique intercostal block with catheter placement was performed at the end of surgery in the operating room. Peripheral nerve blocks have become an integral part of multimodal pain management strategies. This case report describes the successful implementation of an ultrasound-guided right external oblique intercostal block and catheter placement for postoperative pain control and minimization of opioid use. The case highlights the efficacy and safety of ultrasound-guided peripheral nerve blocks for postoperative pain management; successful pain control contributed to the patient's overall postoperative recovery.

2.
Disabil Rehabil Assist Technol ; : 1-8, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38967320

ABSTRACT

Multi-Modality Aphasia Treatment (M-MAT) is an effective group intervention for post-stroke aphasia. M-MAT employs interactive card games and the modalities of gesture, drawing, reading, and writing to improve spoken language. However, there are challenges to the implementation of group interventions such as M-MAT, particularly for people who cannot travel or who live in rural areas. To maximise access to this effective treatment, we aimed to adapt M-MAT to a telehealth format (M-MAT Tele). The Human-Centred Design Framework guided the adaptation. We identified the intended context of use (outpatient/community rehabilitation) and the stakeholders (clinicians, people with aphasia, health service funders). People with aphasia and practising speech pathologists were invited to co-design M-MAT Tele in a series of iterative workshops to ensure the end product was user-friendly and clinically feasible. Co-design allowed us to understand the hardware, software, and other constraints and preferences of end users. In particular, clinicians (n = 3) required software compatible with a range of telehealth platforms, and people with aphasia (n = 3) valued solutions with minimal technical demands and costs for participants. Co-design within the Human-Centred Design Framework led to a telehealth solution compatible with all major telehealth platforms, with minimal hardware or software requirements. Pilot testing is underway to confirm the acceptability of M-MAT Tele to clinicians and people with aphasia, with the aim of providing an effective, accessible tool for aphasia therapy in telehealth settings.

3.
PeerJ Comput Sci ; 10: e2077, 2024.
Article in English | MEDLINE | ID: mdl-38983227

ABSTRACT

Background: Dyslexia is a neurological disorder that affects an individual's language processing abilities. Early care and intervention can help dyslexic individuals succeed academically and socially. Recent developments in deep learning (DL) approaches motivate researchers to build dyslexia detection models (DDMs). DL approaches facilitate the integration of multi-modality data, yet few multi-modality-based DDMs exist. Methods: In this study, the authors built a DL-based DDM using multi-modality data. A squeeze-and-excitation (SE)-integrated MobileNet V3 model, a self-attention (SA)-based EfficientNet B7 model, and an SA-based bidirectional long short-term memory (Bi-LSTM) model with early stopping were developed to extract features from magnetic resonance imaging (MRI), functional MRI (fMRI), and electroencephalography (EEG) data. In addition, the authors fine-tuned the LightGBM model using the Hyperband optimization technique to detect dyslexia from the extracted features. Three datasets containing fMRI, MRI, and EEG data were used to evaluate the performance of the proposed DDM. Results: The findings supported the significance of the proposed DDM in detecting dyslexia with limited computational resources. The proposed model outperformed existing DDMs, producing accuracies of 98.9%, 98.6%, and 98.8% on the fMRI, MRI, and EEG datasets, respectively. Healthcare centers and educational institutions can benefit from the proposed model to identify dyslexia in its initial stages. The interpretability of the proposed model could be improved by integrating vision-transformer-based feature extraction.
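
The final modeling step described here, tuning LightGBM with Hyperband, can be sketched with Optuna's HyperbandPruner. This is a minimal illustration on synthetic stand-in features; the search space, trial budget, and dataset are assumptions, not the authors' configuration, and activating the pruner fully would additionally require reporting intermediate scores via callbacks.

```python
import lightgbm as lgb
import optuna
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the extracted fMRI/MRI/EEG feature matrix.
X, y = make_classification(n_samples=1000, n_features=64, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

def objective(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 16, 256),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
        "n_estimators": 300,
    }
    model = lgb.LGBMClassifier(**params)
    model.fit(X_tr, y_tr)
    return accuracy_score(y_val, model.predict(X_val))

study = optuna.create_study(
    direction="maximize",
    pruner=optuna.pruners.HyperbandPruner(min_resource=1, max_resource=300,
                                          reduction_factor=3),
)
study.optimize(objective, n_trials=30)
print(study.best_params)
```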

4.
Medicina (Kaunas) ; 60(7)2024 Jul 01.
Article in English | MEDLINE | ID: mdl-39064511

ABSTRACT

Mitral regurgitation (MR) is a widely prevalent valvular heart disease (VHD) with a significant impact on the healthcare system and patient prognosis. Transcatheter mitral valve interventions (TMVI) are now well-established techniques in the therapeutic armamentarium for managing patients with either primary or functional MR. Although guidelines give indications for the correct management of this VHD, the wide heterogeneity of patients' clinical backgrounds and valvular and cardiac anatomies makes each patient a unique case, in which selecting the appropriate device requires multimodal imaging evaluation and multidisciplinary discussion. Proper pre-procedural evaluation plays a pivotal role in judging the feasibility of TMVI, and close cooperation between imagers and interventionalists is crucial for procedural success. This manuscript provides an overview of the main parameters that must be evaluated for appropriate device selection, pre-procedural planning, intra-procedural guidance, and post-operative assessment in the setting of TMVI, and offers insights into future perspectives for structural cardiovascular imaging.


Subject(s)
Cardiac Catheterization , Heart Valve Prosthesis Implantation , Mitral Valve Insufficiency , Mitral Valve , Multimodal Imaging , Humans , Mitral Valve Insufficiency/surgery , Mitral Valve Insufficiency/diagnostic imaging , Multimodal Imaging/methods , Heart Valve Prosthesis Implantation/methods , Heart Valve Prosthesis Implantation/instrumentation , Heart Valve Prosthesis Implantation/standards , Mitral Valve/surgery , Mitral Valve/diagnostic imaging , Cardiac Catheterization/methods , Cardiac Catheterization/instrumentation
5.
Comput Med Imaging Graph ; 116: 102414, 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38981250

ABSTRACT

The use of multi-modality non-contrast images (i.e., T1FS, T2FS, and DWI) for segmenting liver tumors eliminates the use of contrast agents and is crucial for clinical diagnosis. However, discovering the most useful information for fusing multi-modality images into an accurate segmentation remains challenging due to inter-modal interference. In this paper, we propose a dual-stream multi-level fusion framework (DM-FF) to, for the first time, accurately segment liver tumors directly from non-contrast multi-modality images. DM-FF first uses an attention-based encoder-decoder to extract multi-level feature maps corresponding to a specified representation of each modality. It then employs two types of fusion modules: one fuses learned features into a shared representation across modalities to exploit commonalities and improve performance, and the other fuses segmentation decision evidence to capture differences between modalities and prevent interference caused by modality conflict. By integrating these three components, DM-FF enables multi-modality non-contrast images to cooperate with one another and yields accurate segmentation. Evaluated on 250 patients with different tumor types imaged on two MRI scanners, DM-FF achieves a Dice score of 81.20% and improves Dice by at least 11% over eight state-of-the-art segmentation architectures. These results indicate that DM-FF can significantly advance the development and deployment of non-contrast liver tumor segmentation.
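
A minimal PyTorch sketch of the two fusion levels described here, feature-level fusion for commonalities and decision-level fusion for conflicting evidence, follows. Module names, shapes, and the weighting scheme are illustrative assumptions, not the published DM-FF design.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Fuses per-modality feature maps into a shared representation."""
    def __init__(self, channels: int, n_modalities: int = 3):
        super().__init__()
        self.proj = nn.Conv2d(channels * n_modalities, channels, kernel_size=1)

    def forward(self, feats):                 # list of (B, C, H, W), one per modality
        return self.proj(torch.cat(feats, dim=1))

class DecisionFusion(nn.Module):
    """Combines per-modality segmentation logits with learned weights,
    down-weighting modalities whose decision evidence conflicts."""
    def __init__(self, n_modalities: int = 3):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(n_modalities))

    def forward(self, logits):                # list of (B, 1, H, W) decision maps
        w = torch.softmax(self.weights, dim=0)
        return sum(wi * li for wi, li in zip(w, logits))

# Toy usage with three non-contrast modalities (T1FS, T2FS, DWI)
feats = [torch.randn(2, 32, 64, 64) for _ in range(3)]
logits = [torch.randn(2, 1, 64, 64) for _ in range(3)]
shared = FeatureFusion(32)(feats)
seg = DecisionFusion()(logits)
```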

6.
Article in English | MEDLINE | ID: mdl-39060655

ABSTRACT

To evaluate left atrial (LA) function and strain parameters by cardiac magnetic resonance imaging (CMR) in patients with non-ischemic cardiomyopathy (NICM) and to evaluate the association of these parameters with long-term clinical outcomes. We retrospectively included 92 patients with NICM and 50 subjects with no significant cardiovascular disease (control group). We calculated LA volumes using the Simpson area-length method to derive the LA ejection fraction and expansion index. LA reservoir (ƐR), conduit (ƐCD), and contractile (ƐCT) strain were measured using dedicated CMR software (cvi42, Circle Cardiovascular Imaging Inc., version 5.14). An adjusted multivariate regression analysis was performed to determine the association of LA parameters with death and heart failure hospitalization (HFH). NICM patients were older, with a male preponderance: mean age 59.6 ± 15.9 years, 64% male, and 73% white, versus 52.2 ± 12.4 years, 34% male, and 64% white for controls. LA strain values were significantly lower in NICM patients than in controls. During a median follow-up of 58.9 months, 12 patients (13%) died and 33 (35.9%) had an HFH. None of the clinical or CMR factors were significantly associated with death. On multivariate analysis, after adjusting for age and significant univariate variables, ƐR was the only variable significantly associated with HFH (OR 0.98, CI 0.96-1.0). Unadjusted and adjusted Cox proportional hazards models stratified by the median ƐR (~18%) showed a significant difference in HFH over time (χ2 statistic = 21.1; P value = 0.03). In NICM patients, all LA strain components were reduced, and ƐR was significantly associated with HFH.
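
The adjusted survival analysis described here can be illustrated with the lifelines library. The data frame below is simulated stand-in data that only mirrors the cohort's broad characteristics (n = 92, median ƐR near 18%); it is not the study data, and the covariate list is an assumption.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 92  # matches the NICM cohort size; values here are simulated
df = pd.DataFrame({
    "time_months": rng.exponential(50, n),     # follow-up time
    "hfh": rng.integers(0, 2, n),              # heart-failure hospitalization event
    "age": rng.normal(60, 15, n),
    "reservoir_strain": rng.normal(18, 6, n),  # ƐR (%), median ~18% per the abstract
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time_months", event_col="hfh")
cph.print_summary()  # hazard ratios with CIs, adjusted for the listed covariates
```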

7.
Med Phys ; 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39042362

ABSTRACT

BACKGROUND: Cardiac applications in radiation therapy are rapidly expanding, including magnetic resonance guided radiation therapy (MRgRT) for real-time gating, for targeting and avoidance near the heart, and for treating ventricular tachycardia (VT). PURPOSE: This work describes the development and implementation of a novel multi-modality and magnetic resonance (MR)-compatible cardiac phantom. METHODS: The patient-informed 3D model was derived from manual contouring of a contrast-enhanced coronary computed tomography angiography scan, exported as a stereolithography model, then post-processed to simulate a female heart of average volume. The model was 3D-printed using Elastic50A to provide MR contrast against the water background. Two rigid acrylic modules containing cardiac structures were designed and assembled, retrofitted to an MR-safe programmable motor supplying cardiac and respiratory motion in the superior-inferior direction. One module contained a cavity for an ion chamber (IC); the other was equipped with multiple interchangeable cavities for plastic scintillation detectors (PSDs). Images were acquired on a 0.35 T MR-linac for validation of phantom geometry and motion and for simulated online treatment planning and delivery. Three motion profiles were prescribed: patient-derived cardiac (sine waveform, 4.3 mm peak-to-peak, 60 beats/min), respiratory (cos4 waveform, 30 mm peak-to-peak, 12 breaths/min), and a superposition of cardiac (sine waveform, 4 mm peak-to-peak, 70 beats/min) and respiratory (cos4 waveform, 24 mm peak-to-peak, 12 breaths/min). The amplitude of the motion profiles was evaluated from sagittal cine images at eight frames/s with a resolution of 2.4 mm × 2.4 mm. Gated dosimetry experiments were performed using the two module configurations to calculate dose relative to the stationary condition. A CT-based VT treatment plan was delivered twice under cone-beam CT guidance, and cumulative stationary doses to multi-point PSDs were evaluated. RESULTS: No artifacts were observed on any images acquired during phantom operation. Phantom excursions measured 49.3 ± 25.8%/66.9 ± 14.0%, 97.0 ± 2.2%/96.4 ± 1.7%, and 90.4 ± 4.8%/89.3 ± 3.5% of prescription for the cardiac, respiratory, and cardio-respiratory motion profiles for the 2-chamber (PSD) and 12-substructure (IC) phantom modules, respectively. In the gated experiments, the cumulative dose was <2% from expected using the IC module. Real-time dose measured by the PSDs at a 10 Hz acquisition rate demonstrated the ability to detect the dosimetric consequences of cardiac, respiratory, and cardio-respiratory motion when sampling different locations during a single delivery, and the stability of the phantom's dosimetric results over repeated cycles in the high-dose and high-gradient regions. For the VT delivery, the high-dose PSD was <1% from expected (5-6 cGy deviation at 5.9 Gy/fraction), and high-gradient/low-dose regions had deviations <3.6% (6.3 cGy less than the expected 1.73 Gy/fraction). CONCLUSIONS: A novel multi-modality modular heart phantom was designed, constructed, and used for gated radiotherapy experiments on a 0.35 T MR-linac. The phantom was capable of mimicking cardiac, respiratory, and cardio-respiratory motion while supporting dosimetric evaluations of gated procedures using IC and PSD configurations. Time-resolved PSDs with small sensitive volumes appear promising for low-amplitude/high-frequency motion and multi-point data acquisition for advanced dosimetric capabilities. Demonstrating VT planning and delivery further extends the phantom toward the unmet needs of cardiac applications in radiotherapy.
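
The three prescribed motion profiles can be reproduced in a few lines of NumPy. The amplitudes and rates below come directly from the abstract; the 10 s duration and sampling at the 8 frames/s cine rate are illustrative choices.

```python
import numpy as np

fs = 8.0                                  # cine frame rate used for evaluation (frames/s)
t = np.arange(0, 10, 1 / fs)              # 10 s of motion, illustrative duration

def cardiac(t, pp_mm=4.3, bpm=60):
    """Sinusoidal cardiac motion with peak-to-peak amplitude pp_mm."""
    return (pp_mm / 2) * np.sin(2 * np.pi * (bpm / 60) * t)

def respiratory(t, pp_mm=30.0, brpm=12):
    """cos^4 respiratory motion, a common breathing-surrogate waveform."""
    return pp_mm * np.cos(np.pi * (brpm / 60) * t) ** 4

# Superposition profile: 4 mm pp cardiac at 70 beats/min plus
# 24 mm pp respiratory at 12 breaths/min, as prescribed
combined = cardiac(t, pp_mm=4.0, bpm=70) + respiratory(t, pp_mm=24.0, brpm=12)
```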

8.
Photoacoustics ; 38: 100630, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39040971

ABSTRACT

A comprehensive understanding of a tumor is required for accurate diagnosis and effective treatment; however, no single imaging modality currently provides sufficient information. Photoacoustic (PA) imaging is a hybrid imaging technique with high spatial resolution and detection sensitivity that can be combined with ultrasound (US) imaging to provide both optical and acoustic contrast. Elastography can noninvasively map the elasticity distribution of biological tissue, which reflects pathological conditions. In this study, we incorporated PA elastography into a commercial US/PA imaging system to develop a tri-modality imaging system, which was tested for tumor detection in four mice with different physiological conditions. The results show that this tri-modality system provides complementary information on acoustic, optical, and mechanical properties. The resulting tumor visualization and dimension estimation can support more comprehensive tissue characterization for diagnosis and treatment.

10.
Neural Netw ; 178: 106406, 2024 May 22.
Article in English | MEDLINE | ID: mdl-38838393

ABSTRACT

Low-light conditions pose significant challenges to vision tasks such as salient object detection (SOD) due to insufficient photons. Light-insensitive RGB-T SOD models mitigate this problem to some extent, but their performance is limited because they focus only on spatial feature fusion while ignoring frequency discrepancies. To this end, we propose SFMNet, an RGB-T SOD model for low-light scenes that mines spatial-frequency cues. SFMNet consists of spatial-frequency feature exploration (SFFE) modules and spatial-frequency feature interaction (SFFI) modules. Specifically, the SFFE module separates spatial-frequency features and adaptively extracts high- and low-frequency features, while the SFFI module integrates cross-modality and cross-domain information to capture effective feature representations. By deploying both modules in a top-down pathway, our method generates high-quality saliency predictions. Furthermore, we construct the first low-light RGB-T SOD dataset as a benchmark for evaluating performance. Extensive experiments demonstrate that SFMNet achieves higher accuracy than existing models in low-light scenes.
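
Separating feature content into high- and low-frequency parts, which the SFFE module is described as doing adaptively, can be illustrated with a fixed Fourier-domain split. The circular low-pass mask and radius below are simplifying assumptions, not the learned SFMNet mechanism.

```python
import torch

def split_frequency(x: torch.Tensor, radius: int = 8):
    """Split feature maps (B, C, H, W) into low- and high-frequency parts
    using a centered circular mask in the 2D Fourier domain."""
    _, _, H, W = x.shape
    freq = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    yy, xx = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    dist = ((yy - H // 2) ** 2 + (xx - W // 2) ** 2).float().sqrt()
    mask = (dist <= radius).to(freq.dtype)          # 1 inside the low-pass radius
    low = torch.fft.ifft2(torch.fft.ifftshift(freq * mask, dim=(-2, -1))).real
    return low, x - low                             # (low-frequency, high-frequency)

low, high = split_frequency(torch.randn(2, 16, 64, 64))
```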

11.
bioRxiv ; 2024 May 23.
Article in English | MEDLINE | ID: mdl-38826413

ABSTRACT

Background: Volumetry of subregions in the medial temporal lobe (MTL) computed from automatic segmentation in MRI can track neurodegeneration in Alzheimer's disease. However, image quality may vary in MRI, and poor-quality MR images can lead to unreliable segmentation of MTL subregions. Considering that different MRI contrast mechanisms and field strengths (jointly referred to as "modalities" here) offer distinct advantages in imaging different parts of the MTL, we developed a multi-modality segmentation model using both 7 tesla (7T) and 3 tesla (3T) structural MRI to obtain robust segmentation in poor-quality images. Method: MRI modalities including 3T T1-weighted, 3T T2-weighted, 7T T1-weighted, and 7T T2-weighted (7T-T2w) scans of 197 participants were collected from a longitudinal aging study at the Penn Alzheimer's Disease Research Center. Among them, 7T-T2w was used as the primary modality, and all other modalities were rigidly registered to it. A model derived from nnU-Net took these registered modalities as input and output subregion segmentations in 7T-T2w space. 7T-T2w images from 25 selected training participants, most of which were of high quality, were manually segmented to train the multi-modality model. Modality augmentation, which randomly replaced certain modalities with Gaussian noise, was applied during training to guide the model to extract information from all modalities. To compare the proposed model with a baseline single-modality model in the full dataset with mixed high/poor image quality, we evaluated the ability of the derived volume/thickness measures to discriminate amyloid-positive mild cognitive impairment (A+MCI) and amyloid-negative cognitively unimpaired (A-CU) groups, as well as the stability of these measurements in longitudinal data. Results: The multi-modality model delivered good performance regardless of 7T-T2w quality, while the single-modality model under-segmented subregions in poor-quality images. The multi-modality model generally demonstrated stronger discrimination of A+MCI versus A-CU. Intra-class correlation and Bland-Altman plots demonstrated that the multi-modality model had higher longitudinal segmentation consistency in all subregions, while the single-modality model had low consistency in poor-quality images. Conclusion: The multi-modality MRI segmentation model provides an improved biomarker for neurodegeneration in the MTL that is robust to image quality. It also provides a framework for other studies that may benefit from multimodal imaging.
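
The modality augmentation step, randomly replacing input modalities with Gaussian noise so the model cannot over-rely on any one channel, might look like the sketch below. The channel layout, drop probability, and the choice to always keep the primary 7T-T2w channel are assumptions.

```python
import torch

def modality_augment(batch: torch.Tensor, p_drop: float = 0.25, primary: int = 3):
    """Randomly replace non-primary modality channels with Gaussian noise.

    batch: (B, M, H, W, D) stack of co-registered modalities; the primary
    channel (assumed here to be 7T-T2w at index `primary`) is never dropped.
    """
    out = batch.clone()
    B, M = batch.shape[:2]
    for b in range(B):
        for m in range(M):
            if m != primary and torch.rand(1).item() < p_drop:
                out[b, m] = torch.randn_like(batch[b, m])  # noise stand-in
    return out

augmented = modality_augment(torch.randn(2, 4, 32, 32, 32))
```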

12.
Precis Clin Med ; 7(2): pbae012, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38912415

ABSTRACT

Background: The prognosis of breast cancer is often unfavorable, emphasizing the need for early metastasis risk detection and accurate treatment predictions. This study aimed to develop a novel multi-modal deep learning model using preoperative data to predict disease-free survival (DFS). Methods: We retrospectively collected pathology imaging, molecular, and clinical data from The Cancer Genome Atlas and one independent institution in China. We developed a novel Deep Learning Clinical Medicine Based Pathological Gene Multi-modal (DeepClinMed-PGM) model for DFS prediction, integrating clinicopathological data with molecular insights. The patients comprised a training cohort (n = 741), an internal validation cohort (n = 184), and an external testing cohort (n = 95). Results: Integrating multi-modal data into the DeepClinMed-PGM model significantly improved area under the receiver operating characteristic curve (AUC) values. In the training cohort, AUC values for 1-, 3-, and 5-year DFS predictions increased to 0.979, 0.957, and 0.871, while in the external testing cohort the values reached 0.851, 0.878, and 0.938 for 1-, 2-, and 3-year DFS predictions, respectively. The DeepClinMed-PGM model's robust discriminative capabilities were consistently evident across cohorts, including the training cohort [hazard ratio (HR) 0.027, 95% confidence interval (CI) 0.0016-0.046, P < 0.0001], the internal validation cohort (HR 0.117, 95% CI 0.041-0.334, P < 0.0001), and the external cohort (HR 0.061, 95% CI 0.017-0.218, P < 0.0001). Additionally, the model demonstrated C-index values of 0.925, 0.823, and 0.864 in the three cohorts, respectively. Conclusion: This study introduces an approach to breast cancer prognosis that integrates imaging, molecular, and clinical data for enhanced predictive accuracy, offering promise for personalized treatment strategies.

13.
Med Phys ; 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38896829

ABSTRACT

BACKGROUND: Head and neck (HN) gross tumor volume (GTV) auto-segmentation is challenging due to the morphological complexity and low image contrast of targets. Multi-modality images, including computed tomography (CT) and positron emission tomography (PET), are used in routine clinical practice to assist radiation oncologists in accurate GTV delineation. However, the availability of PET imaging may not always be guaranteed. PURPOSE: To develop a deep learning segmentation framework for automated GTV delineation of HN cancers using a combination of PET/CT images, while addressing the challenge of missing PET data. METHODS: Two datasets were included in this study. Dataset I: 524 (training) and 359 (testing) oropharyngeal cancer patients from different institutions with their PET/CT pairs provided by the HECKTOR Challenge; Dataset II: 90 HN patients (testing) from a local institution with their planning CT and PET/CT pairs. To handle potentially missing PET images, a model training strategy named the "Blank Channel" method was implemented. To simulate the absence of a PET image, a blank array with the same dimensions as the CT image was generated to meet the dual-channel input requirement of the deep learning model. During training, the model was randomly presented with either a real PET/CT pair or a blank/CT pair, allowing it to learn the relationship between the CT image and the corresponding GTV delineation from whichever modalities were available. As a result, the model can handle flexible inputs during prediction, making it suitable for cases where PET images are missing. To evaluate performance, we trained the model on the training patients from Dataset I and tested it on Dataset II. We compared our model (Model 1) with two models trained for specific modality segmentations: Model 2, trained with only CT images, and Model 3, trained with real PET/CT pairs. Performance was evaluated using quantitative metrics, including the Dice similarity coefficient (DSC), mean surface distance (MSD), and 95% Hausdorff distance (HD95). In addition, we evaluated Model 1 and Model 3 using the 359 test cases in Dataset I. RESULTS: Our proposed model (Model 1) achieved promising results for GTV auto-segmentation using PET/CT images, with the flexibility of missing PET images. Specifically, when assessed with only CT images in Dataset II, Model 1 achieved a DSC of 0.56 ± 0.16, an MSD of 3.4 ± 2.1 mm, and an HD95 of 13.9 ± 7.6 mm. When PET images were included, performance improved to a DSC of 0.62 ± 0.14, an MSD of 2.8 ± 1.7 mm, and an HD95 of 10.5 ± 6.5 mm. These results are comparable to those achieved by Model 2 and Model 3, illustrating Model 1's effectiveness in utilizing flexible input modalities. Further analysis using the test dataset from Dataset I showed that Model 1 achieved an average DSC of 0.77, surpassing the overall average DSC of 0.72 among all participants in the HECKTOR Challenge. CONCLUSIONS: We successfully refined a multi-modal segmentation tool for accurate GTV delineation in HN cancer. Our method addresses the issue of missing PET images by allowing flexible data input, thereby providing a practical solution for clinical settings where access to PET imaging may be limited.
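
The "Blank Channel" strategy is straightforward to sketch: when PET is unavailable, or randomly during training, a blank array of the CT's dimensions fills the second input channel. Zero-filling and the 50% blanking probability below are assumptions; the abstract specifies only a "blank array."

```python
from typing import Optional

import numpy as np

def make_input(ct: np.ndarray, pet: Optional[np.ndarray],
               p_blank: float = 0.5, training: bool = True) -> np.ndarray:
    """Build the dual-channel (CT, PET) model input. A blank array with the
    CT's dimensions stands in for PET when it is missing; PET is also
    randomly blanked during training so the model learns both input modes."""
    if pet is None or (training and np.random.rand() < p_blank):
        pet = np.zeros_like(ct)            # blank channel simulating missing PET
    return np.stack([ct, pet], axis=0)     # shape: (2, *ct.shape)
```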

14.
Cureus ; 16(5): e59935, 2024 May.
Article in English | MEDLINE | ID: mdl-38854259

ABSTRACT

BACKGROUND: The routine use of multimodal analgesia results in lower pain scores with minimal side effects and opioid utilization. MATERIALS AND METHODS: A prospective, cross-sectional, observational study was conducted among orthopedicians practicing across India to assess professional opinions on using analgesics to manage orthopedic pain effectively. RESULTS: A total of 530 orthopedicians participated in this survey. Over 50% of the participants responded that tramadol, with or without paracetamol, was the therapy of choice for acute pain. Nearly 50% of the participants mentioned that multimodal interventions can sometimes help to manage pain. A total of 55.6% of participants reported that non-steroidal anti-inflammatory drugs were the most commonly used analgesics in their clinical practice, while 25.7% reported using tramadol most commonly. By clinical efficacy ranking, the combination of tramadol plus paracetamol (44.3%) was ranked first among analgesic combinations, followed by aceclofenac plus paracetamol (40.0%). Severity of pain (62.6%), followed by age (60.6%) and duration of therapy (52.6%), were the most common factors considered when prescribing the tramadol plus paracetamol combination. Gastrointestinal and renal effects were reported as the most common safety concerns encountered with analgesics. CONCLUSION: The combination of tramadol and paracetamol was identified as the most preferred choice of analgesics for prolonged orthopedic pain management.

15.
Echocardiography ; 41(6): e15859, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38853624

ABSTRACT

Aortic stenosis (AS) stands as the most common valvular heart disease in developed countries and is characterized by progressive narrowing of the aortic valve orifice resulting in elevated transvalvular flow resistance, left ventricular hypertrophy, and progressive increased risk of heart failure and sudden death. This narrative review explores clinical challenges and evolving perspectives in moderate AS, where discrepancies between aortic valve area and pressure gradient measurements may pose diagnostic and therapeutic quandaries. Transthoracic echocardiography is the first-line imaging modality for AS evaluation, yet cases of discordance may require the application of ancillary noninvasive diagnostic modalities. This review underscores the importance of accurate grading of AS severity, especially in low-gradient phenotypes, emphasizing the need for vigilant follow-up. Current clinical guidelines primarily recommend aortic valve replacement for severe AS, potentially overlooking latent risks in moderate disease stages. The noninvasive multimodality imaging approach-including echocardiography, cardiac magnetic resonance, computed tomography, and nuclear techniques-provides unique insights into adaptive and maladaptive cardiac remodeling in AS and offers a promising avenue to deliver precise indications and exact timing for intervention in moderate AS phenotypes and asymptomatic patients, potentially improving long-term outcomes. Nevertheless, what we may have gleaned from a large amount of observational data is still insufficient to build a robust framework for clinical decision-making in moderate AS. Future research will prioritize randomized clinical trials designed to weigh the benefits and risks of preemptive aortic valve replacement in the management of moderate AS, as directed by specific imaging and nonimaging biomarkers.


Subject(s)
Aortic Valve Stenosis , Aortic Valve , Echocardiography , Humans , Aortic Valve Stenosis/physiopathology , Aortic Valve Stenosis/surgery , Echocardiography/methods , Aortic Valve/diagnostic imaging , Aortic Valve/surgery , Aortic Valve/physiopathology , Severity of Illness Index
16.
Sensors (Basel) ; 24(10)2024 May 18.
Article in English | MEDLINE | ID: mdl-38794076

ABSTRACT

Object detection is one of the core technologies for autonomous driving. Current road object detection mainly relies on visible light, which is prone to missed detections and false alarms in rainy, night-time, and foggy scenes. Multispectral object detection based on the fusion of RGB and infrared images can effectively address the challenges of complex and changing road scenes, improving the detection performance of current algorithms in complex scenarios. However, previous multispectral detection algorithms suffer from issues such as poor fusion of dual-mode information, poor detection performance for multi-scale objects, and inadequate utilization of semantic information. To address these challenges and enhance the detection performance in complex road scenes, this paper proposes a novel multispectral object detection algorithm called MRD-YOLO. In MRD-YOLO, we utilize interaction-based feature extraction to effectively fuse information and introduce the BIC-Fusion module with attention guidance to fuse different modal information. We also incorporate the SAConv module to improve the model's detection performance for multi-scale objects and utilize the AIFI structure to enhance the utilization of semantic information. Finally, we conduct experiments on two major public datasets, FLIR_Aligned and M3FD. The experimental results demonstrate that compared to other algorithms, the proposed algorithm achieves superior detection performance in complex road scenes.

17.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38801702

ABSTRACT

Self-supervised learning plays an important role in molecular representation learning because labeled molecular data are usually limited in many tasks, such as chemical property prediction and virtual screening. However, most existing molecular pre-training methods focus on a single modality of molecular data, and the complementary information of the two important modalities, SMILES and graph, is not fully explored. In this study, we propose an effective multi-modality self-supervised learning framework for molecular SMILES and graph data. Specifically, SMILES data and graph data are first tokenized so that they can be processed by a unified Transformer-based backbone network, which is trained with a masked reconstruction strategy. In addition, we introduce a specialized non-overlapping masking strategy to encourage fine-grained interaction between the two modalities. Experimental results show that our framework achieves state-of-the-art performance on a series of molecular property prediction tasks, and a detailed ablation study demonstrates the efficacy of the multi-modality framework and the masking strategy.
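
The non-overlapping masking strategy can be sketched as sampling disjoint masked index sets for the two token streams, so a position masked in one modality stays visible in the other and reconstruction must draw on cross-modality information. The shared token indexing and mask ratio below are illustrative assumptions, not the paper's exact scheme.

```python
import torch

def non_overlapping_masks(n_tokens: int, mask_ratio: float = 0.25):
    """Sample disjoint boolean masks for the SMILES and graph token streams."""
    perm = torch.randperm(n_tokens)
    k = int(n_tokens * mask_ratio)
    smiles_mask = torch.zeros(n_tokens, dtype=torch.bool)
    graph_mask = torch.zeros(n_tokens, dtype=torch.bool)
    smiles_mask[perm[:k]] = True          # masked in the SMILES stream
    graph_mask[perm[k:2 * k]] = True      # disjoint set masked in the graph stream
    return smiles_mask, graph_mask

s_mask, g_mask = non_overlapping_masks(64)
assert not (s_mask & g_mask).any()        # no position is masked in both streams
```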


Subject(s)
Supervised Machine Learning , Algorithms , Computational Biology/methods
18.
Med Image Anal ; 96: 103214, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38815358

ABSTRACT

Multi-modal ophthalmic image classification plays a key role in diagnosing eye diseases, as it integrates information from different sources to complement their respective performances. However, recent improvements have mainly focused on accuracy, often neglecting the importance of confidence and robustness in predictions for diverse modalities. In this study, we propose a novel multi-modality evidential fusion pipeline for eye disease screening. It provides a measure of confidence for each modality and elegantly integrates the multi-modality information using a multi-distribution fusion perspective. Specifically, our method first utilizes normal inverse gamma prior distributions over pre-trained models to learn both aleatoric and epistemic uncertainty for uni-modality. Then, the normal inverse gamma distribution is analyzed as the Student's t distribution. Furthermore, within a confidence-aware fusion framework, we propose a mixture of Student's t distributions to effectively integrate different modalities, imparting the model with heavy-tailed properties and enhancing its robustness and reliability. More importantly, the confidence-aware multi-modality ranking regularization term induces the model to more reasonably rank the noisy single-modal and fused-modal confidence, leading to improved reliability and accuracy. Experimental results on both public and internal datasets demonstrate that our model excels in robustness, particularly in challenging scenarios involving Gaussian noise and modality missing conditions. Moreover, our model exhibits strong generalization capabilities to out-of-distribution data, underscoring its potential as a promising solution for multimodal eye disease screening.
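
The step from the normal inverse gamma (NIG) prior to the Student's t distribution follows the standard marginalization identity, sketched here for a single modality with the usual NIG parameterization $(\gamma, \nu, \alpha, \beta)$ (the notation is assumed; the abstract does not spell it out):

```latex
% Marginalizing (mu, sigma^2) ~ NIG(gamma, nu, alpha, beta) out of a
% Gaussian likelihood yields a heavy-tailed Student's t predictive:
p(y \mid \gamma, \nu, \alpha, \beta)
  = \iint \mathcal{N}\!\left(y \mid \mu, \sigma^{2}\right)
          \,\mathrm{NIG}\!\left(\mu, \sigma^{2} \mid \gamma, \nu, \alpha, \beta\right)
          \, d\mu \, d\sigma^{2}
  = \mathrm{St}\!\left(y \;\middle|\; \gamma,\ \frac{\beta(1+\nu)}{\nu\alpha},\ 2\alpha\right)
```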


Subject(s)
Eye Diseases , Humans , Eye Diseases/diagnostic imaging , Multimodal Imaging , Reproducibility of Results , Image Interpretation, Computer-Assisted/methods , Algorithms , Machine Learning
19.
Phys Med Biol ; 69(12)2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38776945

ABSTRACT

Objective. In oncology, clinical decision-making relies on a multitude of data modalities, including histopathological, radiological, and clinical factors. Despite the emergence of computer-aided multimodal decision-making systems for predicting hepatocellular carcinoma (HCC) recurrence post-hepatectomy, existing models often employ simplistic feature-level concatenation, leading to redundancy and suboptimal performance. Moreover, these models frequently lack effective integration with clinically relevant data and encounter challenges in integrating diverse scales and dimensions, as well as in incorporating the liver background, which holds clinical significance but has previously been overlooked. Approach. To address these limitations, we propose two approaches. First, we introduce a tensor fusion method, which offers distinct advantages in handling multi-scale and multi-dimensional data fusion and can enhance overall performance. Second, we pioneer consideration of the liver background's impact, integrating it into the feature extraction process with a deep learning segmentation-based algorithm. This inclusion aligns the model more closely with real-world clinical scenarios, as the liver background may contain information relevant to postoperative recurrence. Main results. We collected radiomics (MRI) and histopathological images from 176 cases diagnosed by experienced clinicians across two independent centers. Our proposed network underwent training and 5-fold cross-validation on this dataset before validation on an external test dataset comprising 40 cases. Our model demonstrated outstanding performance in predicting early postoperative recurrence of HCC, achieving an AUC of 0.883. Significance. These findings mark significant progress in addressing challenges related to multimodal data fusion and hold promise for more accurate clinical outcome predictions. In this study, we incorporated the global 3D liver background, which is crucial to prognosis assessment, into the model and analyzed the whole liver background in addition to the tumor region. MRI and histopathological images of HCC were fused in a high-dimensional feature space using tensor techniques to address the cross-scale data integration issue.
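
Tensor fusion is commonly implemented as a batched outer product of modality embeddings, so every pairwise feature interaction appears explicitly in the fused representation. Below is a minimal sketch under that assumption; the paper's exact construction and dimensions are not specified in the abstract.

```python
import torch

def tensor_fusion(h_img: torch.Tensor, h_path: torch.Tensor) -> torch.Tensor:
    """Outer-product tensor fusion of two modality embeddings (batch-wise).

    Appending a constant 1 keeps the unimodal terms alongside the bimodal
    interactions, in the spirit of tensor fusion networks."""
    one = torch.ones(h_img.shape[0], 1, device=h_img.device)
    a = torch.cat([h_img, one], dim=1)        # (B, d1 + 1) MRI/radiomics features
    b = torch.cat([h_path, one], dim=1)       # (B, d2 + 1) histopathology features
    fused = torch.einsum("bi,bj->bij", a, b)  # (B, d1 + 1, d2 + 1) interactions
    return fused.flatten(start_dim=1)         # flatten for a downstream classifier

# Toy usage: 8 cases, 32-d MRI features, 64-d pathology features
z = tensor_fusion(torch.randn(8, 32), torch.randn(8, 64))  # -> (8, 33 * 65)
```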


Subject(s)
Carcinoma, Hepatocellular , Liver Neoplasms , Liver Neoplasms/diagnostic imaging , Liver Neoplasms/surgery , Liver Neoplasms/pathology , Humans , Carcinoma, Hepatocellular/diagnostic imaging , Carcinoma, Hepatocellular/surgery , Carcinoma, Hepatocellular/pathology , Neoplasm Recurrence, Local/diagnostic imaging , Image Processing, Computer-Assisted/methods , Magnetic Resonance Imaging , Recurrence , Deep Learning
20.
Int J Comput Assist Radiol Surg ; 19(7): 1409-1417, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38780829

ABSTRACT

PURPOSE: The modern operating room is becoming increasingly complex, requiring innovative intra-operative support systems. While the focus of surgical data science has largely been on video analysis, integrating surgical computer vision with natural language capabilities is emerging as a necessity. Our work aims to advance visual question answering (VQA) in the surgical context with scene graph knowledge, addressing two main challenges in current surgical VQA systems: removing question-condition bias in the surgical VQA dataset and incorporating scene-aware reasoning in the surgical VQA model design. METHODS: First, we propose a surgical scene graph-based dataset, SSG-VQA, generated by employing segmentation and detection models on publicly available datasets. We build surgical scene graphs using spatial and action information of instruments and anatomies. These graphs are fed into a question engine, generating diverse QA pairs. We then propose SSG-VQA-Net, a novel surgical VQA model incorporating a lightweight Scene-embedded Interaction Module, which integrates geometric scene knowledge into the VQA model design by employing cross-attention between the textual and the scene features. RESULTS: Our comprehensive analysis shows that the SSG-VQA dataset is more complex, diverse, geometrically grounded, unbiased, and surgical-action-oriented than existing surgical VQA datasets, and that SSG-VQA-Net outperforms existing methods across question types and complexities. We highlight that the primary limitation of current surgical VQA systems is the lack of scene knowledge needed to answer complex queries. CONCLUSION: We present a novel surgical VQA dataset and model and show that results can be significantly improved by incorporating geometric scene features in the VQA model design. We point out that the bottleneck of the current surgical visual question-answering model lies in learning the encoded representation rather than decoding the sequence. Our SSG-VQA dataset provides a diagnostic benchmark for testing the scene understanding and reasoning capabilities of the model. The source code and the dataset will be made publicly available at: https://github.com/CAMMA-public/SSG-VQA.
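
The Scene-embedded Interaction Module is described as cross-attention between textual and scene features; a generic PyTorch sketch of that pattern follows. The class name, dimensions, and residual/norm arrangement are illustrative assumptions, not the published SSG-VQA-Net design.

```python
import torch
import torch.nn as nn

class SceneTextCrossAttention(nn.Module):
    """Question tokens attend over scene-graph node features, so answers can
    be grounded in instrument/anatomy geometry (hypothetical module name)."""
    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text_tokens, scene_nodes):
        # text_tokens: (B, T, d) question embedding; scene_nodes: (B, N, d)
        fused, _ = self.attn(query=text_tokens, key=scene_nodes, value=scene_nodes)
        return self.norm(text_tokens + fused)   # residual + norm, Transformer-style

# Toy usage: a 12-token question attending over 20 scene-graph nodes
out = SceneTextCrossAttention()(torch.randn(2, 12, 256), torch.randn(2, 20, 256))
```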


Subject(s)
Operating Rooms , Humans , Surgery, Computer-Assisted/methods , Natural Language Processing , Video Recording