Results 1 - 20 of 70
1.
Article in English | MEDLINE | ID: mdl-38861438

ABSTRACT

Early diagnosis of Alzheimer's disease (AD) is crucial for its prevention, and hippocampal atrophy is a significant lesion for early diagnosis. The current DL-based AD diagnosis methods only focus on either AD classification or hippocampus segmentation independently, neglecting the correlation between the two tasks and lacking pathological interpretability. To address this issue, we propose a Reliable Hippo-guided Learning model for Alzheimer's Disease diagnosis (RLAD), which employs multi-task learning for AD classification as a main task supplemented by hippocampus segmentation. More specifically, our model consists of 1) a hybrid shared features encoder that encodes local and global information in MRI to enhance the model's ability to learn discriminative features; 2) Task Specific Decoders to accomplish AD classification and hippocampus segmentation; and 3) Task Coordination module to correlate the two tasks and guide the classification task to focus on the hippocampus area. Our proposed RLAD model is evaluated on MRI scans of 1631 subjects from three independent datasets, including ADNI-1, ADNI-2, and HarP. Our extensive experimental results demonstrate that the proposed model significantly improves the performance of AD classification and hippocampus segmentation with strong generalization capabilities. Our implementation and model are available at https://github.com/LeoLjl/Explainable-Alzheimer-s-Disease-Diagnosis.

2.
Eur Heart J Digit Health ; 5(3): 235-246, 2024 May.
Article in English | MEDLINE | ID: mdl-38774373

ABSTRACT

Aims: Patients with atrial fibrillation (AF) have a higher risk of ischaemic stroke and death. While anticoagulants are effective at reducing these risks, they increase the risk of bleeding. Current clinical risk scores only perform modestly in predicting adverse outcomes, especially for the outcome of death. We aimed to test the multi-label gradient boosting decision tree (ML-GBDT) model in predicting risks for adverse outcomes in a prospective global AF registry. Methods and results: We studied patients from phase II/III of the Global Registry on Long-Term Oral Anti-Thrombotic Treatment in Patients with Atrial Fibrillation registry between 2011 and 2020. The outcomes were all-cause death, ischaemic stroke, and major bleeding within 1 year following the AF. We trained the ML-GBDT model and compared its discrimination with the clinical scores in predicting patient outcomes. A total of 25 656 patients were included [mean age 70.3 years (SD 10.3); 44.8% female]. Within 1 year after AF, ischaemic stroke occurred in 215 (0.8%), major bleeding in 405 (1.6%), and death in 897 (3.5%) patients. Our model achieved an optimized area under the curve in predicting death (0.785, 95% CI: 0.757-0.813) compared with the Charlson Comorbidity Index (0.747, P = 0.007), ischaemic stroke (0.691, 0.626-0.756) compared with CHA2DS2-VASc (0.613, P = 0.028), and major bleeding (0.698, 0.651-0.745) as opposed to HAS-BLED (0.607, P = 0.002), with improvement in net reclassification index (10.0, 12.5, and 23.6%, respectively). Conclusion: The ML-GBDT model outperformed clinical risk scores in predicting the risks in patients with AF. This approach could be used as a single multifaceted holistic tool to optimize patient risk assessment and mitigate adverse outcomes when managing AF.
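The event rates quoted in the abstract above follow directly from the reported counts. As a quick arithmetic check (a minimal sketch; the variable and function names are illustrative and not taken from the study), the percentages can be recomputed as follows:

```python
# Counts as reported in the abstract; names are illustrative only.
TOTAL_PATIENTS = 25656
EVENTS = {"ischaemic stroke": 215, "major bleeding": 405, "death": 897}


def event_rate_percent(events: int, total: int) -> float:
    """Return the 1-year event rate as a percentage, rounded to one decimal."""
    return round(100.0 * events / total, 1)


rates = {name: event_rate_percent(n, TOTAL_PATIENTS) for name, n in EVENTS.items()}
print(rates)
```

Every outcome occurs in fewer than 4% of patients, so each prediction target is strongly imbalanced, which is worth keeping in mind when reading the AUC values.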

3.
Article in English | MEDLINE | ID: mdl-38781059

ABSTRACT

This paper proposes a novel transformer-based framework to generate accurate class-specific object localization maps for weakly supervised semantic segmentation (WSSS). Leveraging the insight that the attended regions of the one-class token in the standard vision transformer can generate class-agnostic localization maps, we investigate the transformer's capacity to capture class-specific attention for class-discriminative object localization by learning multiple class tokens. We present the Multi-Class Token transformer, which incorporates multiple class tokens to enable class-aware interactions with patch tokens. This is facilitated by a class-aware training strategy that establishes a one-to-one correspondence between output class tokens and ground-truth class labels. We also introduce a Contrastive-Class-Token (CCT) module to enhance the learning of discriminative class tokens, enabling the model to better capture the unique characteristics of each class. Consequently, the proposed framework effectively generates class-discriminative object localization maps from the class-to-patch attentions associated with different class tokens. To refine these localization maps, we propose the utilization of patch-level pairwise affinity derived from the patch-to-patch transformer attention. Furthermore, the proposed framework seamlessly complements the Class Activation Mapping (CAM) method, yielding significant improvements in WSSS performance on PASCAL VOC 2012 and MS COCO 2014. These results underline the importance of the class token for WSSS. The code and models are publicly available.
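The idea of reading class-specific localization maps off the class-to-patch attention can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: it assumes a single head-averaged attention matrix in which the class tokens come first, and the function name `class_to_patch_maps` and the toy sizes are hypothetical.

```python
import numpy as np


def class_to_patch_maps(attn: np.ndarray, num_classes: int, grid: tuple) -> np.ndarray:
    """Slice class-to-patch attention out of a full token-to-token attention matrix.

    attn: (T, T) attention with T = num_classes + H * W, class tokens first
          (the token ordering is an assumption of this sketch).
    Returns an array of shape (num_classes, H, W), min-max normalised per class.
    """
    h, w = grid
    assert attn.shape == (num_classes + h * w, num_classes + h * w)
    # Rows: class tokens (queries). Columns: patch tokens (keys).
    maps = attn[:num_classes, num_classes:].reshape(num_classes, h, w)
    lo = maps.min(axis=(1, 2), keepdims=True)
    hi = maps.max(axis=(1, 2), keepdims=True)
    return (maps - lo) / (hi - lo + 1e-8)


# Toy example: 3 class tokens and a 4 x 4 patch grid with random attention.
rng = np.random.default_rng(0)
num_classes, grid = 3, (4, 4)
tokens = num_classes + grid[0] * grid[1]
logits = rng.normal(size=(tokens, tokens))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # row-wise softmax
maps = class_to_patch_maps(attn, num_classes, grid)
print(maps.shape)
```

The patch-to-patch block of the same matrix, `attn[num_classes:, num_classes:]`, is the part the abstract describes as the source of pairwise affinity for refinement.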

4.
IEEE Trans Image Process ; 33: 2639-2651, 2024.
Article in English | MEDLINE | ID: mdl-38551827

ABSTRACT

Current semi-supervised video object segmentation (VOS) methods often employ the entire features of one frame to predict object masks and update memory. This introduces significant redundant computations. To reduce redundancy, we introduce a Region Aware Video Object Segmentation (RAVOS) approach, which predicts regions of interest (ROIs) for efficient object segmentation and memory storage. RAVOS includes a fast object motion tracker to predict object ROIs in the next frame. For efficient segmentation, object features are extracted based on the ROIs, and an object decoder is designed for object-level segmentation. For efficient memory storage, we propose motion path memory to filter out redundant context by memorizing the features within the motion path of objects. In addition to RAVOS, we also propose a large-scale occluded VOS dataset, dubbed OVOS, to benchmark the performance of VOS models under occlusions. Evaluations on the DAVIS and YouTube-VOS benchmarks and our new OVOS dataset show that our method achieves state-of-the-art performance with significantly faster inference time, e.g., 86.1 J&F at 42 FPS on DAVIS and 84.4 J&F at 23 FPS on YouTube-VOS. Project page: ravos.netlify.app.

5.
Article in English | MEDLINE | ID: mdl-38478447

ABSTRACT

Most existing weakly supervised semantic segmentation (WSSS) methods rely on class activation mapping (CAM) to extract coarse class-specific localization maps using image-level labels. Prior works have commonly used an off-line heuristic thresholding process that combines the CAM maps with off-the-shelf saliency maps produced by a general pretrained saliency model to produce more accurate pseudo-segmentation labels. We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from these saliency maps and the significant intertask correlation between saliency detection and semantic segmentation. In the proposed AuxSegNet+, saliency detection and multilabel image classification are used as auxiliary tasks to improve the primary task of semantic segmentation with only image-level ground-truth labels. We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps. In particular, we propose a cross-task dual-affinity learning module to learn both pairwise and unary affinities, which are used to enhance the task-specific features and predictions by aggregating both query-dependent and query-independent global context for both saliency detection and semantic segmentation. The learned cross-task pairwise affinity can also be used to refine and propagate CAM maps to provide better pseudo labels for both tasks. Iterative improvement of segmentation performance is enabled by cross-task affinity learning and pseudo-label updating. Extensive experiments demonstrate the effectiveness of the proposed approach with new state-of-the-art WSSS results on the challenging PASCAL VOC and MS COCO benchmarks.

6.
Sci Rep ; 14(1): 6163, 2024 03 14.
Article in English | MEDLINE | ID: mdl-38485985

ABSTRACT

This study explores the effectiveness of Explainable Artificial Intelligence (XAI) for predicting suicide risk from medical tabular data. Given the common challenge of limited datasets in health-related Machine Learning (ML) applications, we use data augmentation in tandem with ML to enhance the identification of individuals at high risk of suicide. We use SHapley Additive exPlanations (SHAP) for XAI and traditional correlation analysis to rank feature importance, pinpointing primary factors influencing suicide risk and preventive measures. Experimental results show that the Random Forest (RF) model excels in accuracy, F1 score, and AUC (>97% across metrics). According to SHAP, anger issues, depression, and social isolation emerge as top predictors of suicide risk, while individuals with high incomes, esteemed professions, and higher education present the lowest risk. Our findings underscore the effectiveness of ML and XAI in suicide risk assessment, offering valuable insights for psychiatrists and facilitating informed clinical decisions.


Subject(s)
Artificial Intelligence , Suicide , Humans , Machine Learning , Anger , Risk Assessment
7.
PLoS One ; 18(12): e0279953, 2023.
Article in English | MEDLINE | ID: mdl-38096321

ABSTRACT

INTRODUCTION: Natural language processing (NLP) uses various computational methods to analyse and understand human language, and has been applied to data acquired at Emergency Department (ED) triage to predict various outcomes. The objective of this scoping review is to evaluate how NLP has been applied to data acquired at ED triage, assess if NLP-based models outperform humans or current risk stratification techniques when predicting outcomes, and assess if incorporating free-text improves the predictive performance of models when compared to predictive models that use only structured data. METHODS: All English language peer-reviewed research that applied an NLP technique to free-text obtained at ED triage was eligible for inclusion. We excluded studies focusing solely on disease surveillance, and studies that used information obtained after triage. We searched the electronic databases MEDLINE, Embase, Cochrane Database of Systematic Reviews, Web of Science, and Scopus for medical subject headings and text keywords related to NLP and triage. Databases were last searched on 01/01/2022. Risk of bias in studies was assessed using the Prediction model Risk of Bias Assessment Tool (PROBAST). Due to the high level of heterogeneity between studies and high risk of bias, a meta-analysis was not conducted. Instead, a narrative synthesis is provided. RESULTS: In total, 3730 studies were screened, and 20 studies were included. The population size varied greatly between studies ranging from 1.8 million patients to 598 triage notes. The most common outcomes assessed were prediction of triage score, prediction of admission, and prediction of critical illness. NLP models achieved high accuracy in predicting need for admission, triage score, critical illness, and mapping free-text chief complaints to structured fields. Incorporating both structured data and free-text data improved results when compared to models that used only structured data. 
However, the majority of studies (80%) were assessed to have a high risk of bias, and only one study reported the deployment of an NLP model into clinical practice. CONCLUSION: Unstructured free-text triage notes have been used by NLP models to predict clinically relevant outcomes. However, the majority of studies have a high risk of bias, most research is retrospective, and there are few examples of implementation into clinical practice. Future work is needed to prospectively assess if applying NLP to data acquired at ED triage improves ED outcomes when compared to usual clinical practice.


Subject(s)
Natural Language Processing , Triage , Critical Illness , Emergency Service, Hospital , Retrospective Studies , Systematic Reviews as Topic
8.
Comput Struct Biotechnol J ; 21: 5676-5685, 2023.
Article in English | MEDLINE | ID: mdl-38058296

ABSTRACT

Long non-coding ribonucleic acids (lncRNAs) have been shown to play an important role in plant gene regulation, involving both epigenetic and transcript regulation. LncRNAs are transcripts longer than 200 nucleotides that are not translated into functional proteins but can be translated into small peptides. Machine learning models have predominantly used transcriptome data with manually defined features to detect lncRNAs, however, they often underrepresent the abundance of lncRNAs and can be biased in their detection. Here we present a study using Natural Language Processing (NLP) models to identify plant lncRNAs from genomic sequences rather than transcriptomic data. The NLP models were trained to predict lncRNAs for seven model and crop species (Zea mays, Arabidopsis thaliana, Brassica napus, Brassica oleracea, Brassica rapa, Glycine max and Oryza sativa) using publicly available genomic references. We demonstrated that lncRNAs can be accurately predicted from genomic sequences with the highest accuracy of 83.4% for Z. mays and the lowest accuracy of 57.9% for B. rapa, revealing that genome assembly quality might affect the accuracy of lncRNA identification. Furthermore, we demonstrated the potential of using NLP models for cross-species prediction with an average of 63.1% accuracy using target species not previously seen by the model. As more species are incorporated into the training datasets, we expect the accuracy to increase, becoming a more reliable tool for uncovering novel lncRNAs. Finally, we show that the models can be interpreted using explainable artificial intelligence to identify motifs important to lncRNA prediction and that these motifs frequently flanked the lncRNA sequence.

9.
Life (Basel) ; 13(9)2023 Sep 05.
Article in English | MEDLINE | ID: mdl-37763273

ABSTRACT

Atrial fibrillation arises mainly due to abnormalities in the cardiac conduction system and is associated with anatomical remodeling of the atria and the pulmonary veins. Cardiovascular imaging techniques, such as echocardiography, computed tomography, and magnetic resonance imaging, are crucial in the management of atrial fibrillation, as they not only provide anatomical context to evaluate structural alterations but also help in determining treatment strategies. However, interpreting these images requires significant human expertise. The potential of artificial intelligence in analyzing these images has been repeatedly suggested due to its ability to automate the process with precision comparable to human experts. This review summarizes the benefits of artificial intelligence in enhancing the clinical care of patients with atrial fibrillation through cardiovascular image analysis. It provides a detailed overview of the two most critical steps in image-guided AF management, namely, segmentation and classification. For segmentation, the state-of-the-art artificial intelligence methodologies and the factors influencing the segmentation performance are discussed. For classification, the applications of artificial intelligence in the diagnosis and prognosis of atrial fibrillation are provided. Finally, this review also scrutinizes the current challenges hindering the clinical applicability of these methods, with the aim of guiding future research toward more effective integration into clinical practice.

10.
IEEE Trans Image Process ; 32: 4800-4811, 2023.
Article in English | MEDLINE | ID: mdl-37610890

ABSTRACT

Cross-resolution person re-identification (CRReID) is a challenging and practical problem that involves matching low-resolution (LR) query identity images against high-resolution (HR) gallery images. Query images often suffer from resolution degradation due to the different capturing conditions from real-world cameras. State-of-the-art solutions for CRReID either learn a resolution-invariant representation or adopt a super-resolution (SR) module to recover the missing information from the LR query. In this paper, we propose an alternative SR-free paradigm to directly compare HR and LR images via a dynamic metric that is adaptive to the resolution of a query image. We realize this idea by learning resolution-adaptive representations for cross-resolution comparison. We propose two resolution-adaptive mechanisms to achieve this. The first mechanism encodes the resolution specifics into different subvectors in the penultimate layer of the deep neural network, creating a varying-length representation. To better extract resolution-dependent information, we further propose to learn resolution-adaptive masks for intermediate residual feature blocks. A novel progressive learning strategy is proposed to train those masks properly. These two mechanisms are combined to boost the performance of CRReID. Experimental results show that the proposed method outperforms existing approaches and achieves state-of-the-art performance on multiple CRReID benchmarks.

11.
PLoS One ; 18(8): e0290642, 2023.
Article in English | MEDLINE | ID: mdl-37651380

ABSTRACT

INTRODUCTION: Surveys conducted internationally have found widespread interest in artificial intelligence (AI) amongst medical students. No similar surveys have been conducted in Western Australia (WA) and it is not known how medical students in WA feel about the use of AI in healthcare or their understanding of AI. We aim to assess WA medical students' attitudes towards AI in general, AI in healthcare, and the inclusion of AI education in the medical curriculum. METHODS: A digital survey instrument was developed based on a review of available literature and consultation with subject matter experts. The survey was piloted with a group of medical students and refined based on their feedback. We then sent this anonymous digital survey to all medical students in WA (approximately 1539 students). Responses were open from the 7th of September 2021 to the 7th of November 2021. Students' categorical responses were quantitatively analysed, and free text comments from the survey were qualitatively analysed using open coding techniques. RESULTS: Overall, 134 students answered one or more questions (8.9% response rate). The majority of students (82.0%) were 20-29 years old, studying medicine as a postgraduate degree (77.6%), and had started clinical rotations (62.7%). Students were interested in AI (82.6%), self-reported having a basic understanding of AI (84.8%), but few agreed that they had an understanding of the basic computational principles of AI (33.3%) or the limitations of AI (46.2%). Most students (87.5%) had not received teaching in AI. The majority of students (58.6%) agreed that AI should be part of medical training and most (72.7%) wanted more teaching focusing on AI in medicine. Medical students appeared optimistic regarding the role of AI in medicine, with most (74.4%) agreeing with the statement that AI will improve medicine in general. The majority (56.6%) of medical students were not concerned about the impact of AI on their job security as a doctor. 
Students selected radiology (72.6%), pathology (58.2%), and medical administration (44.8%) as the specialties most likely to be impacted by AI, and psychiatry (61.2%), palliative care (48.5%), and obstetrics and gynaecology (41.0%) as the specialties least likely to be impacted by AI. Qualitative analysis of free text comments identified the use of AI as a tool, and that doctors will not be replaced as common themes. CONCLUSION: Medical students in WA appear to be interested in AI. However, they have not received education about AI and do not feel they understand its basic computational principles or limitations. AI appears to be a current deficit in the medical curriculum in WA, and most students surveyed were supportive of its introduction. These results are consistent with previous surveys conducted internationally.


Subject(s)
Obstetrics , Students, Medical , Female , Pregnancy , Humans , Young Adult , Adult , Australia , Artificial Intelligence , Attitude , Delivery of Health Care
12.
Comput Methods Programs Biomed ; 240: 107685, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37429247

ABSTRACT

BACKGROUND AND OBJECTIVE: The generation of three-dimensional (3D) medical images has great application potential since it takes into account the 3D anatomical structure. Two problems prevent effective training of a 3D medical generative model: (1) 3D medical images are expensive to acquire and annotate, resulting in an insufficient number of training images, and (2) a large number of parameters are involved in 3D convolution. METHODS: We propose a novel GAN model called 3D Split&Shuffle-GAN. To address the 3D data scarcity issue, we first pre-train a two-dimensional (2D) GAN model using abundant image slices and inflate the 2D convolution weights to improve the initialization of the 3D GAN. Novel 3D network architectures are proposed for both the generator and discriminator of the GAN model to significantly reduce the number of parameters while maintaining the quality of image generation. Several weight inflation strategies and parameter-efficient 3D architectures are investigated. RESULTS: Experiments on both heart (Stanford AIMI Coronary Calcium) and brain (Alzheimer's Disease Neuroimaging Initiative) datasets show that our method leads to improved 3D image generation quality (an improvement of 14.7 in Fréchet inception distance) with significantly fewer parameters (only 48.5% of the baseline method). CONCLUSIONS: We built a parameter-efficient 3D medical image generation model. Owing to its efficiency and effectiveness, it has the potential to generate high-quality 3D brain and heart images for real use cases.
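Inflating 2D convolution weights to initialise a 3D network can be illustrated with one common strategy: replicate each 2D kernel along a new depth axis and rescale by the depth. The abstract does not say which inflation strategies were investigated, so the sketch below is an assumption for illustration only, and the function name `inflate_conv2d_weight` is hypothetical.

```python
import numpy as np


def inflate_conv2d_weight(w2d: np.ndarray, depth: int) -> np.ndarray:
    """Inflate a 2D convolution weight to 3D by replicate-and-rescale.

    w2d: (out_channels, in_channels, kH, kW)
    Returns (out_channels, in_channels, depth, kH, kW). Dividing by `depth`
    keeps the response unchanged for an input that is constant along depth.
    """
    w3d = np.repeat(w2d[:, :, None, :, :], depth, axis=2)
    return w3d / depth


# Toy weight: 2 output channels, 3 input channels, 3 x 3 kernels.
w2d = np.arange(2 * 3 * 3 * 3, dtype=float).reshape(2, 3, 3, 3)
w3d = inflate_conv2d_weight(w2d, depth=4)
print(w3d.shape)
```

Summing the inflated weight over the depth axis recovers the original 2D kernel, which is the property that makes the pre-trained 2D weights a sensible starting point.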


Subject(s)
Image Processing, Computer-Assisted , Imaging, Three-Dimensional , Image Processing, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Brain/diagnostic imaging , Neuroimaging
13.
Comput Methods Programs Biomed ; 240: 107717, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37454499

ABSTRACT

BACKGROUND: Cardiac exercise stress testing (EST) offers a non-invasive approach to the management of patients with suspected coronary artery disease (CAD). However, up to 30% of EST results are either inconclusive or non-diagnostic, which results in significant resource wastage. Our aim was to build machine learning (ML) based models, using patients' demographic (age, sex) and pre-test clinical information (reason for performing test, medications, blood pressure, heart rate, and resting electrocardiogram), capable of predicting EST results beforehand, including those with inconclusive or non-diagnostic results. METHODS: A total of 30,710 patients (mean age 54.0 years, 69% male) were included in the study, with 25% randomly sampled into the test set and the remaining samples split into training and validation sets at a ratio of 9:1. We constructed different ML models from pre-test variables and compared their discriminant power using the area under the receiver operating characteristic curve (AUC). RESULTS: A network of Oblivious Decision Trees provided the best discriminant power (AUC=0.83, sensitivity=69%, specificity=78%) for predicting inconclusive EST results. A total of 2010 inconclusive ESTs were correctly identified in the testing set. CONCLUSIONS: Our ML model, developed using demographic and pre-test clinical information, can accurately predict EST results and could be used to identify patients with inconclusive or non-diagnostic results beforehand. Our system could thus be used as a personalised decision support tool by clinicians for optimizing the diagnostic test selection strategy for CAD patients and to reduce healthcare expenditure by reducing non-diagnostic or inconclusive ESTs.
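The data split described in the METHODS can be made concrete with a short calculation. This is a sketch only: the abstract gives the proportions but not the rounding convention or the resulting set sizes, so the sizes computed here are approximate and the function name `split_sizes` is hypothetical.

```python
def split_sizes(n_total: int, test_fraction: float = 0.25, train_to_val=(9, 1)):
    """Return (n_train, n_val, n_test) for a hold-out test set followed by a
    train/validation split of the remainder. Rounding is an assumption."""
    n_test = round(n_total * test_fraction)
    n_rest = n_total - n_test
    n_val = round(n_rest * train_to_val[1] / sum(train_to_val))
    n_train = n_rest - n_val
    return n_train, n_val, n_test


# 30,710 patients, 25% held out for testing, remainder split 9:1.
n_train, n_val, n_test = split_sizes(30710)
print(n_train, n_val, n_test)
```

Under these assumptions roughly two thirds of all patients end up in the training set, about 7.5% in validation, and a quarter in the test set.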


Subject(s)
Coronary Artery Disease , Deep Learning , Humans , Middle Aged , Coronary Artery Disease/diagnosis , Exercise Test/methods , Coronary Angiography , Diagnostic Tests, Routine
14.
PLoS One ; 18(6): e0286460, 2023.
Article in English | MEDLINE | ID: mdl-37289835

ABSTRACT

Hajj, the Muslim pilgrimage, is a large mass gathering event that involves performing rituals at several sites on specific days and times in a fixed order, thereby requiring transport of pilgrims between sites. For the past two decades, Hajj transport has relied on conventional and shuttle buses, train services, and pilgrims walking along pedestrian routes that link these sites. To ensure smooth and efficient transport during Hajj, specific groups of pilgrims are allocated with the cooperation of Hajj authorities to specific time windows, modes, and routes. However, the large number of pilgrims, delays and changes in bus schedules/timetables, and occasional lack of coordination between transport modes have often caused congestion or delays in pilgrim transfer between sites, with a cascading effect on transport management. This study focuses on modelling and simulating the transport of pilgrims between the sites using a discrete event simulation tool called "ExtendSim". Three transport modules were validated, and different scenarios were developed. These scenarios consider changes in the percentages of pilgrims allocated to each transport mode and the scheduling of various modes. The results can aid authorities to make informed decisions regarding transport strategies for managing the transport infrastructure and fleets. The proposed solutions could be implemented with judicious allocation of resources, through pre-event planning and real-time monitoring during the event.


Subject(s)
Islam , Travel , Records , Saudi Arabia
15.
Article in English | MEDLINE | ID: mdl-37043325

ABSTRACT

Freezing of Gait (FoG) is a common symptom of Parkinson's disease (PD), manifesting as a brief, episodic absence, or marked reduction in walking, despite a patient's intention to move. Clinical assessment of FoG events from manual observations by experts is both time-consuming and highly subjective. Therefore, machine learning-based FoG identification methods would be desirable. In this article, we address this task as a fine-grained human action recognition problem based on vision inputs. A novel deep learning architecture, namely, higher order polynomial transformer (HP-Transformer), is proposed to incorporate pose and appearance feature sequences to formulate fine-grained FoG patterns. In particular, a higher order self-attention mechanism is proposed based on higher order polynomials. To this end, linear, bilinear, and trilinear transformers are formulated in pursuit of discriminative fine-grained representations. These representations are treated as multiple streams and further fused by a cross-order fusion strategy for FoG detection. Comprehensive experiments on a large in-house dataset collected during clinical assessments demonstrate the effectiveness of the proposed method, and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.92 is achieved for detecting FoG.

16.
IEEE Trans Neural Netw Learn Syst ; 34(8): 4153-4166, 2023 Aug.
Article in English | MEDLINE | ID: mdl-34752411

ABSTRACT

Social reviews are indispensable resources for modern consumers' decision making. To influence the reviews for financial gain, some companies may choose to pay groups of fraudsters rather than individuals to demote or promote products and services. This is because consumers are more likely to be misled by a large number of similar reviews produced by a group of fraudsters. Semantic relation, such as content similarity (CS) and polarity similarity, is an important factor characterizing solicited group frauds. Recent approaches to fraudster group detection employed handcrafted features of group behaviors that failed to capture the semantic relation of review text from the reviewers. In this article, we propose the first neural approach, HIN-RNN, a heterogeneous information network (HIN) compatible recurrent neural network (RNN) for fraudster group detection that makes use of semantic similarity and requires no handcrafted features. The HIN-RNN provides a unifying architecture for representation learning of each reviewer, with the initial vector as the sum of word embeddings (SoWEs) of all review text written by the same reviewer, concatenated with the ratio of negative reviews. Given a co-review network representing reviewers who have reviewed the same items with similar ratings and the reviewers' vector representation, a collaboration matrix is captured through the HIN-RNN training. The proposed approach is demonstrated to be effective with marked improvement over state-of-the-art approaches on both the Yelp (22% and 12% in terms of recall and F1-value, respectively) and Amazon (4% and 2% in terms of recall and F1-value, respectively) datasets.
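The initial reviewer vector described above, a sum of word embeddings with the negative-review ratio appended, can be sketched as follows. The toy embedding table, the whitespace tokenisation, and the rating threshold used to call a review negative are all assumptions made for illustration; none of them comes from the paper.

```python
import numpy as np


def reviewer_vector(reviews, ratings, embeddings, dim, negative_threshold=2):
    """Build an initial reviewer representation.

    Sums the embeddings of every word in every review by one reviewer
    (unknown words contribute zeros), then appends the fraction of reviews
    whose rating is at or below `negative_threshold` (an assumed cut-off).
    """
    sowe = np.zeros(dim)
    for text in reviews:
        for word in text.lower().split():
            sowe += embeddings.get(word, np.zeros(dim))
    negative_ratio = sum(r <= negative_threshold for r in ratings) / len(ratings)
    return np.concatenate([sowe, [negative_ratio]])


# Toy 2-dimensional embeddings (hypothetical values).
embeddings = {
    "great": np.array([1.0, 0.0]),
    "bad": np.array([0.0, 1.0]),
    "food": np.array([1.0, 1.0]),
}
vec = reviewer_vector(["Great food", "bad food"], [5, 1], embeddings, dim=2)
print(vec)
```

The resulting vector has one more dimension than the word embeddings, with the last entry carrying the polarity signal.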

17.
IEEE Trans Neural Netw Learn Syst ; 34(3): 1588-1600, 2023 03.
Article in English | MEDLINE | ID: mdl-34464270

ABSTRACT

Freezing of gait (FoG) is identified as a sudden and brief episode of movement cessation despite the intention to continue walking. It is one of the most disabling symptoms of Parkinson's disease (PD) and often leads to falls and injuries. Many computer-aided FoG detection methods have been proposed to use data collected from unimodal sources, such as motion sensors, pressure sensors, and video cameras. However, few multimodal methods have sought to maximize the value of all the information collected from different modalities in clinical assessments and improve the FoG detection performance. Therefore, in this study, a novel end-to-end deep architecture, namely graph fusion neural network (GFN), is proposed for multimodal learning-based FoG detection by combining footstep pressure maps and video recordings. GFN constructs multimodal graphs by treating the encoded features of each modality as vertex-level inputs and measures their adjacency patterns to construct complementary FoG representations, thus reducing the representation redundancy among different modalities. In addition, since GFN is devised to process multimodal graphs of arbitrary structures, it is expected to achieve superior performance with inputs containing missing modalities, compared to the alternative unimodal methods. A multimodal FoG dataset was collected, which included clinical assessment videos and footstep pressure sequences of 340 trials from 20 PD patients. Our proposed GFN shows great promise for multimodal FoG detection with an area under the curve (AUC) of 0.882. To the best of our knowledge, this is one of the first studies to utilize multimodal learning for automated FoG detection, which offers significant opportunities for better patient assessments and clinical trials in the future.


Subject(s)
Gait Disorders, Neurologic , Parkinson Disease , Humans , Parkinson Disease/diagnosis , Parkinson Disease/therapy , Gait Disorders, Neurologic/diagnosis , Neural Networks, Computer , Gait , Movement
18.
IEEE Trans Pattern Anal Mach Intell ; 45(2): 1335-1352, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35358041

ABSTRACT

We propose a novel framework to learn the spatiotemporal variability in longitudinal 3D shape data sets, which contain observations of objects that evolve and deform over time. This problem is challenging since surfaces come with arbitrary parameterizations and thus, they need to be spatially registered. Also, different deforming objects, hereinafter referred to as 4D surfaces, evolve at different speeds and thus they need to be temporally aligned. We solve this spatiotemporal registration problem using a Riemannian approach. We treat a 3D surface as a point in a shape space equipped with an elastic Riemannian metric that measures the amount of bending and stretching that the surfaces undergo. A 4D surface can then be seen as a trajectory in this space. With this formulation, the statistical analysis of 4D surfaces can be cast as the problem of analyzing trajectories embedded in a nonlinear Riemannian manifold. However, performing the spatiotemporal registration, and subsequently computing statistics, on such nonlinear spaces is not straightforward as they rely on complex nonlinear optimizations. Our core contribution is the mapping of the surfaces to the space of Square-Root Normal Fields (SRNF) where the [Formula: see text] metric is equivalent to the partial elastic metric in the space of surfaces. Thus, by solving the spatial registration in the SRNF space, the problem of analyzing 4D surfaces becomes the problem of analyzing trajectories embedded in the SRNF space, which has a Euclidean structure. In this paper, we develop the building blocks that enable such analysis. 
These include: (1) the spatiotemporal registration of arbitrarily parameterized 4D surfaces even in the presence of large elastic deformations and large variations in their execution rates; (2) the computation of geodesics between 4D surfaces; (3) the computation of statistical summaries, such as means and modes of variation, of collections of 4D surfaces; and (4) the synthesis of random 4D surfaces. We demonstrate the performance of the proposed framework using 4D facial surfaces and 4D human body shapes.
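As a rough illustration of the SRNF representation mentioned in the abstract above, the sketch below uses the definition common in the SRNF literature, q = n / sqrt(|n|), where n is the unnormalized surface normal f_u x f_v. The abstract itself does not state this formula, so it is an assumption here, as are the function names, the unit-square parameter domain, and the use of a plain L2 distance between the two fields. No spatial or temporal registration is performed.

```python
import math


def srnf(f, u, v, h=1e-5):
    """Square-root normal field q = n / sqrt(|n|) of surface f at (u, v).

    f maps (u, v) to a 3D point; partial derivatives are approximated
    with central finite differences of step h.
    """
    fu = [(a - b) / (2 * h) for a, b in zip(f(u + h, v), f(u - h, v))]
    fv = [(a - b) / (2 * h) for a, b in zip(f(u, v + h), f(u, v - h))]
    # Unnormalized normal n = f_u x f_v; its length is the local area element.
    n = (
        fu[1] * fv[2] - fu[2] * fv[1],
        fu[2] * fv[0] - fu[0] * fv[2],
        fu[0] * fv[1] - fu[1] * fv[0],
    )
    norm = math.sqrt(sum(c * c for c in n))
    if norm == 0.0:
        return (0.0, 0.0, 0.0)
    scale = math.sqrt(norm)
    return tuple(c / scale for c in n)


def srnf_l2_distance(f1, f2, grid=20):
    """Midpoint-rule L2 distance between the SRNFs of f1 and f2 on [0, 1]^2."""
    cell = 1.0 / grid
    total = 0.0
    for i in range(grid):
        for j in range(grid):
            u, v = (i + 0.5) * cell, (j + 0.5) * cell
            q1, q2 = srnf(f1, u, v), srnf(f2, u, v)
            total += sum((a - b) ** 2 for a, b in zip(q1, q2)) * cell * cell
    return math.sqrt(total)
```

Since |q|^2 equals |n|, the squared L2 norm of an SRNF equals the surface area, which makes a flat square a convenient sanity check: a unit plane patch and the same patch scaled by 2 have SRNFs (0, 0, 1) and (0, 0, 2) everywhere.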

19.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 3200-3225, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35700242

ABSTRACT

Human Action Recognition (HAR) aims to understand human behavior and assign a label to each action. It has a wide range of applications, and therefore has been attracting increasing attention in the field of computer vision. Human actions can be represented using various data modalities, such as RGB, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, radar, and WiFi signal, which encode different sources of useful yet distinct information and have various advantages depending on the application scenarios. Consequently, many existing works have investigated different types of approaches for HAR using various modalities. In this article, we present a comprehensive survey of recent progress in deep learning methods for HAR based on the type of input data modality. Specifically, we review the current mainstream deep learning methods for single data modalities and multiple data modalities, including the fusion-based and the co-learning-based frameworks. We also present comparative results on several benchmark datasets for HAR, together with insightful observations and inspiring future research directions.


Subject(s)
Algorithms , Pattern Recognition, Automated , Humans , Pattern Recognition, Automated/methods , Acceleration , Human Activities
20.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 6511-6536, 2023 May.
Article in English | MEDLINE | ID: mdl-36063506

ABSTRACT

In recent years, machine learning (ML) techniques, in particular deep learning (DL) methods, have gained a lot of momentum in solving inverse imaging problems, often surpassing the performance of hand-crafted approaches. Traditionally, analytical methods have been used to solve inverse imaging problems such as image restoration, inpainting, and superresolution. Unlike analytical methods for which the problem is explicitly defined and the domain knowledge is carefully engineered into the solution, DL models do not benefit from such prior knowledge and instead make use of large datasets to predict an unknown solution to the inverse problem. Recently, a new paradigm of training deep models using a single image, named untrained neural network prior (UNNP), has been proposed to solve a variety of inverse tasks, e.g., restoration and inpainting. Since then, many researchers have proposed various applications and variants of UNNP. In this paper, we present a comprehensive review of such studies and various UNNP applications for different tasks and highlight various open problems that require further research.
