Search | VHL Regional Portal

1.

Federated learning using model projection for multi-center disease diagnosis with non-IID data.

Du, Jie; Li, Wei; Liu, Peng; Vong, Chi-Man; You, Yongke; Lei, Baiying; Wang, Tianfu.

Neural Netw ; 178: 106409, 2024 May 24.

Article in English | MEDLINE | ID: mdl-38823069

ABSTRACT

Multi-center disease diagnosis aims to build a global model for all involved medical centers. Due to privacy concerns, it is infeasible to collect data from multiple centers for training (i.e., centralized learning). Federated Learning (FL) is a decentralized framework that enables multiple clients (e.g., medical centers) to collaboratively train a global model while retaining patient data locally for privacy. However, in practice, the data across medical centers are not independently and identically distributed (Non-IID), causing two challenging issues: (1) catastrophic forgetting at clients, i.e., the local model at clients will forget the knowledge received from the global model after local training, causing reduced performance; and (2) invalid aggregation at the server, i.e., the global model at the server may not be favorable to some clients after model aggregation, resulting in a slow convergence rate. To mitigate these issues, an innovative Federated learning using Model Projection (FedMoP) is proposed, which guarantees: (1) the loss of local model on global data does not increase after local training without accessing the global data so that the performance will not be degenerated; and (2) the loss of global model on local data does not increase after aggregation without accessing local data so that convergence rate can be improved. Extensive experimental results show that our FedMoP outperforms state-of-the-art FL methods in terms of accuracy, convergence rate and communication cost. In particular, our FedMoP also achieves comparable or even higher accuracy than centralized learning. Thus, our FedMoP can ensure privacy protection while outperforming centralized learning in accuracy and communication cost.

2.

Multi-Scale Masked Autoencoders for Cross-Session Emotion Recognition.

Pang, Miaoqi; Wang, Hongtao; Huang, Jiayang; Vong, Chi-Man; Zeng, Zhiqiang; Chen, Chuangquan.

IEEE Trans Neural Syst Rehabil Eng ; 32: 1637-1646, 2024.

Article in English | MEDLINE | ID: mdl-38619940

ABSTRACT

Affective brain-computer interfaces (aBCIs) have garnered widespread applications, with remarkable advancements in utilizing electroencephalogram (EEG) technology for emotion recognition. However, the time-consuming process of annotating EEG data, inherent individual differences, non-stationary characteristics of EEG data, and noise artifacts in EEG data collection pose formidable challenges in developing subject-specific cross-session emotion recognition models. To simultaneously address these challenges, we propose a unified pre-training framework based on multi-scale masked autoencoders (MSMAE), which utilizes large-scale unlabeled EEG signals from multiple subjects and sessions to extract noise-robust, subject-invariant, and temporal-invariant features. We subsequently fine-tune the obtained generalized features with only a small amount of labeled data from a specific subject for personalization and enable cross-session emotion recognition. Our framework emphasizes: 1) Multi-scale representation to capture diverse aspects of EEG signals, obtaining comprehensive information; 2) An improved masking mechanism for robust channel-level representation learning, addressing missing channel issues while preserving inter-channel relationships; and 3) Invariance learning for regional correlations in spatial-level representation, minimizing inter-subject and inter-session variances. Under these elaborate designs, the proposed MSMAE exhibits a remarkable ability to decode emotional states from a different session of EEG data during the testing phase. Extensive experiments conducted on the two publicly available datasets, i.e., SEED and SEED-IV, demonstrate that the proposed MSMAE consistently achieves stable results and outperforms competitive baseline methods in cross-session emotion recognition.

Subject(s)

Algorithms , Brain-Computer Interfaces , Electroencephalography , Emotions , Humans , Emotions/physiology , Electroencephalography/methods , Female , Male , Machine Learning , Artifacts , Adult , Neural Networks, Computer

3.

Fast Broad Multiview Multi-Instance Multilabel Learning (FBM3L) With Viewwise Intercorrelation.

Lai, Qi; Vong, Chi-Man; Zhou, Jianhang; Zhou, Yimin; Chen, C L Philip.

IEEE Trans Neural Netw Learn Syst ; PP2023 Jun 28.

Article in English | MEDLINE | ID: mdl-37379190

ABSTRACT

Multiview multi-instance multilabel learning (M3L) is a popular research topic during the past few years in modeling complex real-world objects such as medical images and subtitled video. However, existing M3L methods suffer from relatively low accuracy and training efficiency for large datasets due to several issues: 1) the viewwise intercorrelation (i.e., the correlations of instances and/or bags between different views) are neglected; 2) the diverse correlations (e.g., viewwise intercorrelation, interinstance correlation, and interlabel correlation) are not jointly considered; and 3) high computation burden for training process over bags, instances, and labels across different views. To resolve these issues, a novel framework called fast broad M3L (FBM3L) is proposed with three innovations: 1) utilization of viewwise intercorrelation for better modeling of M3L tasks while existing M3L methods have not considered; 2) based on graph convolutional network (GCN) and broad learning system (BLS), a viewwise subnetwork is newly designed to achieve joint learning among the diverse correlations; and 3) under BLS platform, FBM3L can learn multiple subnetworks jointly across all views with significantly less training time. Experiments show that FBM3L is highly competitive (or even better than) in all evaluation metrics up to 64% in average precision (AP) and much faster than most M3L (or MIML) methods (up to 1030 times), especially on large multiview datasets ( ≥ 260 K objects).

4.

An Adaptive Deep Metric Learning Loss Function for Class-Imbalance Learning via Intraclass Diversity and Interclass Distillation.

Du, Jie; Zhang, Xiaoci; Liu, Peng; Vong, Chi-Man; Wang, Tianfu.

IEEE Trans Neural Netw Learn Syst ; PP2023 Jun 28.

Article in English | MEDLINE | ID: mdl-37379193

ABSTRACT

Deep metric learning (DML) has been widely applied in various tasks (e.g., medical diagnosis and face recognition) due to the effective extraction of discriminant features via reducing data overlapping. However, in practice, these tasks also easily suffer from two class-imbalance learning (CIL) problems: data scarcity and data density, causing misclassification. Existing DML losses rarely consider these two issues, while CIL losses cannot reduce data overlapping and data density. In fact, it is a great challenge for a loss function to mitigate the impact of these three issues simultaneously, which is the objective of our proposed intraclass diversity and interclass distillation (IDID) loss with adaptive weight in this article. IDID-loss generates diverse features within classes regardless of the class sample size (to alleviate the issues of data scarcity and data density) and simultaneously preserves the semantic correlations between classes using learnable similarity when pushing different classes away from each other (to reduce overlapping). In summary, our IDID-loss provides three advantages: 1) it can simultaneously mitigate all the three issues while DML and CIL losses cannot; 2) it generates more diverse and discriminant feature representations with higher generalization ability, compared with DML losses; and 3) it provides a larger improvement on the classes of data scarcity and density with a smaller sacrifice on easy class accuracy, compared with CIL losses. Experimental results on seven public real-world datasets show that our IDID-loss achieves the best performances in terms of G-mean, F1-score, and accuracy when compared with both state-of-the-art (SOTA) DML and CIL losses. In addition, it gets rid of the time-consuming fine-tuning process over the hyperparameters of loss function.

5.

A novel sequential structure for lightweight multi-scale feature learning under limited available images.

Liu, Peng; Du, Jie; Vong, Chi-Man.

Neural Netw ; 164: 124-134, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37148608

ABSTRACT

Although multi-scale feature learning can improve the performances of deep models, its parallel structure quadratically increases the model parameters and causes deep models to become larger and larger when enlarging the receptive fields (RFs). This leads to deep models easily suffering from over-fitting issue in many practical applications where the available training samples are always insufficient or limited. In addition, under this limited situation, although lightweight models (with fewer model parameters) can effectively reduce over-fitting, they may suffer from under-fitting because of insufficient training data for effective feature learning. In this work, a lightweight model called Sequential Multi-scale Feature Learning Network (SMF-Net) is proposed to alleviate these two issues simultaneously using a novel sequential structure of multi-scale feature learning. Compared to both deep and lightweight models, the proposed sequential structure in SMF-Net can easily extract features with larger RFs for multi-scale feature learning only with a few and linearly increased model parameters. The experimental results on both classification and segmentation tasks demonstrate that our SMF-Net only has 1.25M model parameters (5.3% of Res2Net50) with 0.7G FLOPS (14.6% of Res2Net50) for classification and 1.54M parameters (8.9% of UNet) with 3.35G FLOPs (10.9% of UNet) for segmentation but achieves higher accuracy than SOTA deep models and lightweight models, even when the training data is very limited available.

Subject(s)

Image Processing, Computer-Assisted , Learning

6.

Class-Incremental Learning Method With Fast Update and High Retainability Based on Broad Learning System.

Du, Jie; Liu, Peng; Vong, Chi-Man; Chen, Chuangquan; Wang, Tianfu; Chen, C L Philip.

IEEE Trans Neural Netw Learn Syst ; PP2023 Mar 29.

Article in English | MEDLINE | ID: mdl-37030863

ABSTRACT

Machine learning aims to generate a predictive model from a training dataset of a fixed number of known classes. However, many real-world applications (such as health monitoring and elderly care) are data streams in which new data arrive continually in a short time. Such new data may even belong to previously unknown classes. Hence, class-incremental learning (CIL) is necessary, which incrementally and rapidly updates an existing model with the data of new classes while retaining the existing knowledge of old classes. However, most current CIL methods are designed based on deep models that require a computationally expensive training and update process. In addition, deep learning based CIL (DCIL) methods typically employ stochastic gradient descent (SGD) as an optimizer that forgets the old knowledge to a certain extent. In this article, a broad learning system-based CIL (BLS-CIL) method with fast update and high retainability of old class knowledge is proposed. Traditional BLS is a fast and effective shallow neural network, but it does not work well on CIL tasks. However, our proposed BLS-CIL can overcome these issues and provide the following: 1) high accuracy due to our novel class-correlation loss function that considers the correlations between old and new classes; 2) significantly short training/update time due to the newly derived closed-form solution for our class-correlation loss without iterative optimization; and 3) high retainability of old class knowledge due to our newly derived recursive update rule for CIL (RULL) that does not replay the exemplars of all old classes, as contrasted to the exemplars-replaying methods with the SGD optimizer. The proposed BLS-CIL has been evaluated over 12 real-world datasets, including seven tabular/numerical datasets and six image datasets, and the compared methods include one shallow network and seven classical or state-of-the-art DCIL methods. Experimental results show that our BIL-CIL can significantly improve the classification performance over a shallow network by a large margin (8.80%-48.42%). It also achieves comparable or even higher accuracy than DCIL methods, but greatly reduces the training time from hours to minutes and the update time from minutes to seconds.

7.

Parameter-Free Loss for Class-Imbalanced Deep Learning in Image Classification.

Du, Jie; Zhou, Yanhong; Liu, Peng; Vong, Chi-Man; Wang, Tianfu.

IEEE Trans Neural Netw Learn Syst ; 34(6): 3234-3240, 2023 Jun.

Article in English | MEDLINE | ID: mdl-34524963

ABSTRACT

Current state-of-the-art class-imbalanced loss functions for deep models require exhaustive tuning on hyperparameters for high model performance, resulting in low training efficiency and impracticality for nonexpert users. To tackle this issue, a parameter-free loss (PF-loss) function is proposed, which works for both binary and multiclass-imbalanced deep learning for image classification tasks. PF-loss provides three advantages: 1) training time is significantly reduced due to NO tuning on hyperparameter(s); 2) it dynamically pays more attention on minority classes (rather than outliers compared to the existing loss functions) with NO hyperparameters in the loss function; and 3) higher accuracy can be achieved since it adapts to the changes of data distribution in each mini-batch instead of the fixed hyperparameters in the existing methods during training, especially when the data are highly skewed. Experimental results on some classical image datasets with different imbalance ratios (IR, up to 200) show that PF-loss reduces the training time down to 1/148 of that spent by compared state-of-the-art losses and simultaneously achieves comparable or even higher accuracy in terms of both G-mean and area under receiver operating characteristic (ROC) curve (AUC) metrics, especially when the data are highly skewed.

8.

Fast AUC Maximization Learning Machine With Simultaneous Outlier Detection.

Sun, Yichen; Vong, Chi Man; Wang, Shitong.

IEEE Trans Cybern ; 53(11): 6843-6857, 2023 Nov.

Article in English | MEDLINE | ID: mdl-35476558

ABSTRACT

While AUC maximizing support vector machine (AUCSVM) has been developed to solve imbalanced classification tasks, its huge computational burden will make AUCSVM become impracticable and even computationally forbidden for medium or large-scale imbalanced data. In addition, minority class sometimes means extremely important information for users or is corrupted by noises and/or outliers in practical application scenarios such as medical diagnosis, which actually inspires us to generalize the AUC concept to reflect such importance or upper bound of noises or outliers. In order to address these issues, by means of both the generalized AUC metric and the core vector machine (CVM) technique, a fast AUC maximizing learning machine, called ρ -AUCCVM, with simultaneous outlier detection is proposed in this study. ρ -AUCCVM has its notorious merits: 1) it indeed shares the CVM's advantage, that is, asymptotically linear time complexity with respect to the total number of sample pairs, together with space complexity independent on the total number of sample pairs and 2) it can automatically determine the importance of the minority class (assuming no noise) or the upper bound of noises or outliers. Extensive experimental results about benchmarking imbalanced datasets verify the above advantages of ρ -AUCCVM.

9.

Accurate and Efficient Large-Scale Multi-Label Learning With Reduced Feature Broad Learning System Using Label Correlation.

Huang, Jintao; Vong, Chi-Man; Chen, C L Philip; Zhou, Yimin.

IEEE Trans Neural Netw Learn Syst ; 34(12): 10240-10253, 2023 Dec.

Article in English | MEDLINE | ID: mdl-35436203

ABSTRACT

Multi-label learning for large-scale data is a grand challenge because of a large number of labels with a complex data structure. Hence, the existing large-scale multi-label methods either have unsatisfactory classification performance or are extremely time-consuming for training utilizing a massive amount of data. A broad learning system (BLS), a flat network with the advantages of succinct structures, is appropriate for addressing large-scale tasks. However, existing BLS models are not directly applicable for large-scale multi-label learning due to the large and complex label space. In this work, a novel multi-label classifier based on BLS (called BLS-MLL) is proposed with two new mechanisms: kernel-based feature reduction module and correlation-based label thresholding. The kernel-based feature reduction module contains three layers, namely, the feature mapping layer, enhancement nodes layer, and feature reduction layer. The feature mapping layer employs elastic network regularization to solve the randomness of features in order to improve performance. In the enhancement nodes layer, the kernel method is applied for high-dimensional nonlinear conversion to achieve high efficiency. The newly constructed feature reduction layer is used to further significantly improve both the training efficiency and accuracy when facing high-dimensionality with abundant or noisy information embedded in large-scale data. The correlation-based label thresholding enables BLS-MLL to generate a label-thresholding function for effective conversion of the final decision values to logical outputs, thus, improving the classification performance. Finally, experimental comparisons among six state-of-the-art multi-label classifiers on ten datasets demonstrate the effectiveness of the proposed BLS-MLL. The results of the classification performance show that BLS-MLL outperforms the compared algorithms in 86% of cases with better training efficiency in 90% of cases.

10.

Fuzzy KNN Method With Adaptive Nearest Neighbors.

Bian, Zekang; Vong, Chi Man; Wong, Pak Kin; Wang, Shitong.

IEEE Trans Cybern ; 52(6): 5380-5393, 2022 Jun.

Article in English | MEDLINE | ID: mdl-33232252

ABSTRACT

Due to its strong performance in handling uncertain and ambiguous data, the fuzzy k -nearest-neighbor method (FKNN) has realized substantial success in a wide variety of applications. However, its classification performance would be heavily deteriorated if the number k of nearest neighbors was unsuitably fixed for each testing sample. This study examines the feasibility of using only one fixed k value for FKNN on each testing sample. A novel FKNN-based classification method, namely, fuzzy KNN method with adaptive nearest neighbors (A-FKNN), is devised for learning a distinct optimal k value for each testing sample. In the training stage, after applying a sparse representation method on all training samples for reconstruction, A-FKNN learns the optimal k value for each training sample and builds a decision tree (namely, A-FKNN tree) from all training samples with new labels (the learned optimal k values instead of the original labels), in which each leaf node stores the corresponding optimal k value. In the testing stage, A-FKNN identifies the optimal k value for each testing sample by searching the A-FKNN tree and runs FKNN with the optimal k value for each testing sample. Moreover, a fast version of A-FKNN, namely, FA-FKNN, is designed by building the FA-FKNN decision tree, which stores the optimal k value with only a subset of training samples in each leaf node. Experimental results on 32 UCI datasets demonstrate that both A-FKNN and FA-FKNN outperform the compared methods in terms of classification accuracy, and FA-FKNN has a shorter running time.

Subject(s)

Algorithms , Cluster Analysis

11.

Novel Efficient RNN and LSTM-Like Architectures: Recurrent and Gated Broad Learning Systems and Their Applications for Text Classification.

Du, Jie; Vong, Chi-Man; Chen, C L Philip.

IEEE Trans Cybern ; 51(3): 1586-1597, 2021 Mar.

Article in English | MEDLINE | ID: mdl-32086231

ABSTRACT

High accuracy of text classification can be achieved through simultaneous learning of multiple information, such as sequence information and word importance. In this article, a kind of flat neural networks called the broad learning system (BLS) is employed to derive two novel learning methods for text classification, including recurrent BLS (R-BLS) and long short-term memory (LSTM)-like architecture: gated BLS (G-BLS). The proposed two methods possess three advantages: 1) higher accuracy due to the simultaneous learning of multiple information, even compared to deep LSTM that extracts deeper but single information only; 2) significantly faster training time due to the noniterative learning in BLS, compared to LSTM; and 3) easy integration with other discriminant information for further improvement. The proposed methods have been evaluated over 13 real-world datasets from various types of text classification. From the experimental results, the proposed methods achieve higher accuracies than LSTM while taking significantly less training time on most evaluated datasets, especially when the LSTM is in deep architecture. Compared to R-BLS, G-BLS has an extra forget gate to control the flow of information (similar to LSTM) to further improve the accuracy on text classification so that G-BLS is more effective while R-BLS is more efficient.

12.

Intelligent diagnosis of gastric intestinal metaplasia based on convolutional neural network and limited number of endoscopic images.

Yan, Tao; Wong, Pak Kin; Choi, I Cheong; Vong, Chi Man; Yu, Hon Ho.

Comput Biol Med ; 126: 104026, 2020 11.

Article in English | MEDLINE | ID: mdl-33059237

ABSTRACT

BACKGROUND: Gastric intestinal metaplasia (GIM) is a precancerous lesion of gastric cancer. Currently, diagnosis of GIM is based on the experience of a physician, which is liable to interobserver variability. Thus, an intelligent diagnostic (ID) system, based on narrow-band and magnifying narrow-band images, was constructed to provide objective assistance in the diagnosis of GIM. METHOD: We retrospectively collected 1880 endoscopic images (1048 GIM and 832 non-GIM) via biopsy from 336 patients confirmed histologically as GIM or non-GIM, from the Kiang Wu Hospital, Macau. We developed an ID system with these images using a modified convolutional neural network algorithm. A separate test dataset containing 477 pathologically confirmed images (242 GIM and 235 non-GIM) from 80 patients was used to test the performance of the ID system. Experienced endoscopists also examined the same test dataset, for comparison with the ID system. One of the challenges faced in this study was that it was difficult to obtain a large number of training images. Thus, data augmentation and transfer learning were applied together. RESULTS: The area under the receiver operating characteristic curve was 0.928 for the pre-patient analysis of the ID system, while the sensitivities, specificities, and accuracies of the ID system against those of the human experts were (91.9% vs. 86.5%, p-value = 1.000) (86.0% vs. 81.4%, p-value = 0.754), and (88.8% vs. 83.8%, p-value = 0.424), respectively. Even though the three indices of the ID system were slightly higher than those of the human experts, there were no significant differences. CONCLUSIONS: In this pilot study, a novel ID system was developed to diagnose GIM. This system exhibits promising diagnostic performance. It is believed that the proposed system has the potential for clinical application in the future.

Subject(s)

Precancerous Conditions , Stomach Neoplasms , Humans , Metaplasia/diagnostic imaging , Narrow Band Imaging , Neural Networks, Computer , Pilot Projects , Precancerous Conditions/diagnostic imaging , Prospective Studies , Retrospective Studies , Stomach Neoplasms/diagnostic imaging

13.

Accurate and efficient sequential ensemble learning for highly imbalanced multi-class data.

Vong, Chi-Man; Du, Jie.

Neural Netw ; 128: 268-278, 2020 Aug.

Article in English | MEDLINE | ID: mdl-32454371

ABSTRACT

Multi-class classification for highly imbalanced data is a challenging task in which multiple issues must be resolved simultaneously, including (i) accuracy on classifying highly imbalanced multi-class data; (ii) training efficiency for large data; and (iii) sensitivity to high imbalance ratio (IR). In this paper, a novel sequential ensemble learning (SEL) framework is designed to simultaneously resolve these issues. SEL framework provides a significant property over traditional AdaBoost, in which the majority samples can be divided into multiple small and disjoint subsets for training multiple weak learners without compromising accuracy (while AdaBoost cannot). To ensure the class balance and majority-disjoint property of subsets, a learning strategy called balanced and majority-disjoint subsets division (BMSD) is developed. Unfortunately it is difficult to derive a general learner combination method (LCM) for any kind of weak learner. In this work, LCM is specifically designed for extreme learning machine, called LCM-ELM. The proposed SEL framework with BMSD and LCM-ELM has been compared with state-of-the-art methods over 16 benchmark datasets. In the experiments, under highly imbalanced multi-class data (IR up to 14K; data size up to 493K), (i) the proposed works improve the performance in different measures including G-mean, macro-F, micro-F, MAUC; (ii) training time is significantly reduced.

Subject(s)

Machine Learning/standards , Data Collection/standards

14.

Robust Online Multilabel Learning Under Dynamic Changes in Data Distribution With Labels.

Du, Jie; Vong, Chi-Man.

IEEE Trans Cybern ; 50(1): 374-385, 2020 Jan.

Article in English | MEDLINE | ID: mdl-31107670

ABSTRACT

In this paper, a robust online multilabel learning method dealing with dynamically changing multilabel data streams is proposed. The proposed method has three advantages: 1) higher accuracy due to a newly defined objective function based on labels ranking; 2) fast training and update based on a newly derived closed-form (rather than gradient descent based) solution for the new objective function; and 3) high robustness to a newly identified concept drift in multilabel data streams, namely, changes in data distribution with labels (CDDL). The high robustness benefits from two novel works: 1) a new sequential update rule that preserves the labels ranking information learned from all old (but discarded) samples while updating the model only based on new incoming samples and 2) a fixed threshold for label bipartition that is insensitive to any kind of changes in data distribution including CDDL. The proposed method has been evaluated over 13 benchmark datasets from various domains. As shown in the experimental results, the proposed work is highly robust to CDDL in both the sequential model update and multilabel thresholding. Furthermore, the proposed method improves the performance in different evaluation measures, including Hamming loss, F1-measure, Precision, and Recall while taking short training time on most evaluated datasets.

15.

Author Correction: Drug screening of cancer cell lines and human primary tumors using droplet microfluidics.

Wong, Ada Hang-Heng; Li, Haoran; Jia, Yanwei; Mak, Pui-In; da Silva Martins, Rui Paulo; Liu, Yan; Vong, Chi Man; Wong, Hang Cheong; Wong, Pak Kin; Wang, Haitao; Sun, Heng; Deng, Chu-Xia.

Sci Rep ; 9(1): 18660, 2019 Dec 04.

Article in English | MEDLINE | ID: mdl-31796858

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

16.

3D2SeqViews: Aggregating Sequential Views for 3D Global Feature Learning by CNN With Hierarchical Attention Aggregation.

Han, Zhizhong; Lu, Honglei; Liu, Zhenbao; Vong, Chi-Man; Liu, Yu-Shen; Zwicker, Matthias; Han, Junwei; Chen, C L Philip.

IEEE Trans Image Process ; 28(8): 3986-3999, 2019 Aug.

Article in English | MEDLINE | ID: mdl-30872228

ABSTRACT

Learning 3D global features by aggregating multiple views is important. Pooling is widely used to aggregate views in deep learning models. However, pooling disregards a lot of content information within views and the spatial relationship among the views, which limits the discriminability of learned features. To resolve this issue, 3D to Sequential Views (3D2SeqViews) is proposed to more effectively aggregate the sequential views using convolutional neural networks with a novel hierarchical attention aggregation. Specifically, the content information within each view is first encoded. Then, the encoded view content information and the sequential spatiality among the views are simultaneously aggregated by the hierarchical attention aggregation, where view-level attention and class-level attention are proposed to hierarchically weight sequential views and shape classes. View-level attention is learned to indicate how much attention is paid to each view by each shape class, which subsequently weights sequential views through a novel recursive view integration. Recursive view integration learns the semantic meaning of view sequence, which is robust to the first view position. Furthermore, class-level attention is introduced to describe how much attention is paid to each shape class, which innovatively employs the discriminative ability of the fine-tuned network. 3D2SeqViews learns more discriminative features than the state-of-the-art, which leads to the outperforming results in shape classification and retrieval under three large-scale benchmarks.

17.

Unsupervised Learning of 3-D Local Features From Raw Voxels Based on a Novel Permutation Voxelization Strategy.

Han, Zhizhong; Liu, Zhenbao; Han, Junwei; Vong, Chi-Man; Bu, Shuhui; Chen, C L Philip.

IEEE Trans Cybern ; 49(2): 481-494, 2019 Feb.

Article in English | MEDLINE | ID: mdl-29990288

ABSTRACT

Effective 3-D local features are significant elements for 3-D shape analysis. Existing hand-crafted 3-D local descriptors are effective but usually involve intensive human intervention and prior knowledge, which burdens the subsequent processing procedures. An alternative resorts to the unsupervised learning of features from raw 3-D representations via popular deep learning models. However, this alternative suffers from several significant unresolved issues, such as irregular vertex topology, arbitrary mesh resolution, orientation ambiguity on the 3-D surface, and rigid and slightly nonrigid transformation invariance. To tackle these issues, we propose an unsupervised 3-D local feature learning framework based on a novel permutation voxelization strategy to learn high-level and hierarchical 3-D local features from raw 3-D voxels. Specifically, the proposed strategy first applies a novel voxelization which discretizes each 3-D local region with irregular vertex topology and arbitrary mesh resolution into regular voxels, and then, a novel permutation is applied to permute the voxels to simultaneously eliminate the effect of rotation transformation and orientation ambiguity on the surface. Based on the proposed strategy, the permuted voxels can fully encode the geometry and structure of each local region in regular, sparse, and binary vectors. These voxel vectors are highly suitable for the learning of hierarchical common surface patterns by stacked sparse autoencoder with hierarchical abstraction and sparse constraint. Experiments are conducted on three aspects for evaluating the learned local features: 1) global shape retrieval; 2) partial shape retrieval; and 3) shape correspondence. The experimental results show that the learned local features outperform the other state-of-the-art 3-D shape descriptors.

18.

SeqViews2SeqLabels: Learning 3D Global Features via Aggregating Sequential Views by RNN With Attention.

Han, Zhizhong; Shang, Mingyang; Liu, Zhenbao; Vong, Chi-Man; Liu, Yu-Shen; Zwicker, Matthias; Han, Junwei; Chen, C L Philip.

IEEE Trans Image Process ; 28(2): 658-672, 2019 Feb.

Article in English | MEDLINE | ID: mdl-30183634

ABSTRACT

Learning 3D global features by aggregating multiple views has been introduced as a successful strategy for 3D shape analysis. In recent deep learning models with end-to-end training, pooling is a widely adopted procedure for view aggregation. However, pooling merely retains the max or mean value over all views, which disregards the content information of almost all views and also the spatial information among the views. To resolve these issues, we propose Sequential Views To Sequential Labels (SeqViews2SeqLabels) as a novel deep learning model with an encoder-decoder structure based on recurrent neural networks (RNNs) with attention. SeqViews2SeqLabels consists of two connected parts, an encoder-RNN followed by a decoder-RNN, that aim to learn the global features by aggregating sequential views and then performing shape classification from the learned global features, respectively. Specifically, the encoder-RNN learns the global features by simultaneously encoding the spatial and content information of sequential views, which captures the semantics of the view sequence. With the proposed prediction of sequential labels, the decoder-RNN performs more accurate classification using the learned global features by predicting sequential labels step by step. Learning to predict sequential labels provides more and finer discriminative information among shape classes to learn, which alleviates the overfitting problem inherent in training using a limited number of 3D shapes. Moreover, we introduce an attention mechanism to further improve the discriminative ability of SeqViews2SeqLabels. This mechanism increases the weight of views that are distinctive to each shape class, and it dramatically reduces the effect of selecting the first view position. Shape classification and retrieval results under three large-scale benchmarks verify that SeqViews2SeqLabels learns more discriminative global features by more effectively aggregating sequential views than state-of-the-art methods.

19.

Deep Spatiality: Unsupervised Learning of Spatially-Enhanced Global and Local 3D Features by Deep Neural Network with Coupled Softmax.

Han, Zhizhong; Liu, Zhenbao; Vong, Chi-Man; Liua, Yu-Shen; Bu, Shuhui; Han, Junwei; Chen, C L Philip.

IEEE Trans Image Process ; 27(9): 3049-3063, 2018 06.

Article in English | MEDLINE | ID: mdl-29993805

ABSTRACT

The discriminability of Bag-of-Words representations can be increased via encoding the spatial relationship among virtual words on 3D shapes. However, this encoding task involves several issues, including arbitrary mesh resolutions, irregular vertex topology, orientation ambiguity on 3D surface, invariance to rigid and non-rigid shape transformations. To address these issues, a novel unsupervised spatial learning framework based on deep neural network, deep spatiality (DS), is proposed. Specifically, DS employs two novel components: spatial context extractor and deep context learner. Spatial context extractor extracts the spatial relationship among virtual words in a local region into a raw spatial representation. Along a consistent circular direction, a directed circular graph is constructed to encode relative positions between pairwise virtual words in each face ring into a relative spatial matrix. By decomposing each relative spatial matrix using SVD, the raw spatial representation is formed, from which deep context learner conducts unsupervised learning of global and local features. Deep context learner is a deep neural network with a novel model structure to adapt the proposed coupled softmax layer, which encodes not only the discriminative information among local regions but also the one among global shapes. Experimental results show that DS outperforms state-of-the-art methods.

Subject(s)

Deep Learning , Imaging, Three-Dimensional/methods , Unsupervised Machine Learning , Software

20.

Postboosting Using Extended G-Mean for Online Sequential Multiclass Imbalance Learning.

Vong, Chi-Man; Du, Jie; Wong, Chi-Man; Cao, Jiu-Wen.

IEEE Trans Neural Netw Learn Syst ; 29(12): 6163-6177, 2018 12.

Article in English | MEDLINE | ID: mdl-29993897

ABSTRACT

In this paper, a novel learning method called postboosting using extended G-mean (PBG) is proposed for online sequential multiclass imbalance learning (OS-MIL) in neural networks. PBG is effective due to three reasons. 1) Through postadjusting a classification boundary under extended G-mean, the challenging issue of imbalanced class distribution for sequentially arriving multiclass data can be effectively resolved. 2) A newly derived update rule for online sequential learning is proposed, which produces a high G-mean for current model and simultaneously possesses almost the same information of its previous models. 3) A dynamic adjustment mechanism provided by extended G-mean is valid to deal with the unresolved challenging dense-majority problem and two dynamic changing issues, namely, dynamic changing data scarcity (DCDS) and dynamic changing data diversity (DCDD). Compared to other OS-MIL methods, PBG is highly effective on resolving DCDS, while PBG is the only method to resolve dense-majority and DCDD. Furthermore, PBG can directly and effectively handle unscaled data stream. Experiments have been conducted for PBG and two popular OS-MIL methods for neural networks under massive binary and multiclass data sets. Through the analyses of experimental results, PBG is shown to outperform the other compared methods on all data sets in various aspects including the issues of data scarcity, dense-majority, DCDS, DCDD, and unscaled data.

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL