1.
Neural Netw ; 161: 142-153, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36745939

ABSTRACT

Segmenting road portions from satellite images is challenging due to complex backgrounds, occlusion, shadows, clouds, and other optical artifacts. Accurate and continuous/connected road network extraction requires combining both local and global cues. This paper proposes a model using fractional derivative-based weighted skip connections on a densely connected convolutional neural network for road segmentation. The weights of the skip connections are determined using the Grünwald-Letnikov fractional derivative. Fractional derivatives, being non-local in nature, incorporate memory into the system and thereby combine both local and global features. Experiments have been performed on two widely used open-source benchmark databases, viz. the Massachusetts Road database (MRD) and the Ottawa Road database (ORD). These datasets represent different road topographies and network structures, including varying road widths and complexities. The results reveal that the proposed system outperforms other state-of-the-art methods, achieving an F1-score of 0.748 and an mIoU of 0.787 at fractional order 0.4 on the MRD, and an mIoU of 0.9062 at fractional order 0.5 on the ORD.
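A minimal sketch of the central ingredient as we read it: Grünwald-Letnikov (GL) coefficients used as weights on dense skip connections. The recurrence for the coefficients is standard; the layer shapes, names, and the exact placement of the weights are our assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

def gl_weights(order: float, n: int) -> torch.Tensor:
    # c_0 = 1, c_k = c_{k-1} * (1 - (order + 1) / k)  =>  c_k = (-1)^k * binom(order, k)
    c = [1.0]
    for k in range(1, n):
        c.append(c[-1] * (1.0 - (order + 1.0) / k))
    return torch.tensor(c)

class GLDenseBlock(nn.Module):
    """Densely connected block whose skip connections are scaled by GL coefficients."""
    def __init__(self, channels: int, depth: int, order: float = 0.4):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(depth)]
        )
        self.register_buffer("w", gl_weights(order, depth + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for conv in self.convs:
            # newest feature gets weight c_0 = 1; older features get GL "memory" weights,
            # which is how the non-local (global) information enters each layer
            mixed = sum(self.w[i] * f for i, f in enumerate(reversed(feats)))
            feats.append(torch.relu(conv(mixed)))
        return feats[-1]
```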


Subject(s)
Image Processing, Computer-Assisted , Neural Networks, Computer , Image Processing, Computer-Assisted/methods , Databases, Factual
2.
Neural Netw ; 159: 57-69, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36535129

ABSTRACT

Human activity recognition (HAR) using drone-mounted cameras has attracted considerable interest from the computer vision research community in recent years. A robust and efficient HAR system plays a pivotal role in fields like video surveillance, crowd behavior analysis, sports analysis, and human-computer interaction. What makes it challenging are complex poses, differing viewpoints, and the environmental scenarios in which the action takes place. To address these complexities, this paper proposes a novel Sparse Weighted Temporal Attention (SWTA) module that uses sparsely sampled video frames to obtain global weighted temporal attention. The proposed SWTA comprises two parts: first, a temporal segment network that sparsely samples a given set of frames; second, weighted temporal attention, which fuses attention maps derived from optical flow with raw RGB images. This is followed by a basenet comprising a convolutional neural network (CNN) module and fully connected layers that perform the activity recognition. The SWTA network can be used as a plug-in module in existing deep CNN architectures, optimizing them to learn temporal information and eliminating the need for a separate temporal stream. It has been evaluated on three publicly available benchmark datasets, namely Okutama, MOD20, and Drone-Action. The proposed model achieves accuracies of 72.76%, 92.56%, and 78.86% on the respective datasets, surpassing the previous state-of-the-art performance by margins of 25.26%, 18.56%, and 2.94%, respectively.
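A hedged sketch of the two SWTA ingredients as described above: TSN-style sparse temporal sampling, and a spatial attention map derived from optical-flow magnitude applied to the RGB frames. The function names and the residual fusion rule are illustrative assumptions.

```python
import torch

def sparse_sample(video: torch.Tensor, segments: int) -> torch.Tensor:
    """Pick one frame per temporal segment (TSN-style). video: (T, C, H, W)."""
    t = video.shape[0]
    idx = torch.linspace(0, t - 1, segments).round().long()
    return video[idx]

def weighted_temporal_attention(rgb: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """rgb: (N, 3, H, W); flow: (N, 2, H, W). Returns attention-weighted RGB frames."""
    # spatial attention from normalized flow magnitude: moving regions get emphasis
    mag = flow.norm(dim=1, keepdim=True)                 # (N, 1, H, W)
    attn = mag.flatten(2).softmax(dim=-1).view_as(mag)   # weights sum to 1 per frame
    return rgb * (1.0 + attn)                            # residual-style modulation
```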


Subject(s)
Benchmarking , Unmanned Aerial Devices , Humans , Learning , Neural Networks, Computer , Recognition, Psychology
3.
Neural Netw ; 146: 11-21, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34839089

ABSTRACT

Human activity recognition (HAR) is an important task in many applications such as smart homes, sports analysis, and healthcare services. Popular modalities for HAR in the literature involve computer vision and inertial sensors; however, these face serious limitations with respect to varying illumination, background clutter, obtrusiveness, and other factors. In recent years, WiFi channel state information (CSI) based activity recognition has been gaining momentum due to its many advantages, including easy deployability and cost-effectiveness. This work proposes CSITime, a modified InceptionTime network architecture serving as a generic architecture for CSI-based human activity recognition. We treat CSI activity recognition as a multivariate time series problem. The methodology of CSITime is threefold. First, we pre-process CSI signals and apply data augmentation using two label-mixing strategies, mixup and cutmix, to enhance the neural network's learning. Second, in the basic block of CSITime, features from multiple convolutional kernels are concatenated and passed through a self-attention layer followed by a fully connected layer with Mish activation. The CSITime network consists of six such blocks followed by a global average pooling layer and a final fully connected layer for classification. Third, when training the network, instead of adopting general procedures such as early stopping, we use the one-cycle policy and cosine annealing to schedule the learning rate. The proposed model has been tested on the publicly available ARIL, StanWiFi, and SignFi benchmark datasets, achieving accuracies of 98.20%, 98%, and 95.42%, respectively, for WiFi-based activity recognition. This improves on state-of-the-art accuracies by 3.3%, 0.67%, and 0.82% on ARIL, StanWiFi, and SignFi, respectively. In the lab-5 users' scenario of the SignFi dataset, where the training and testing data come from different distributions, our model achieved an accuracy 2.17% higher than the state of the art, demonstrating its comparative robustness.
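A minimal sketch of one CSITime-style basic block as described: parallel Conv1d kernels whose outputs are concatenated, then self-attention and a fully connected layer with Mish. The channel counts, kernel sizes, and head count are assumptions.

```python
import torch
import torch.nn as nn

class CSITimeBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int = 32, kernels=(9, 19, 39)):
        super().__init__()
        # multiple convolutional kernels over the CSI time axis (InceptionTime-style)
        self.branches = nn.ModuleList(
            [nn.Conv1d(in_ch, out_ch, k, padding=k // 2) for k in kernels]
        )
        d = out_ch * len(kernels)
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.fc = nn.Sequential(nn.Linear(d, d), nn.Mish())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) -- a multivariate CSI time series
        h = torch.cat([b(x) for b in self.branches], dim=1)  # concat branch features
        h = h.transpose(1, 2)                                # (B, T, d) for attention
        h, _ = self.attn(h, h, h)                            # self-attention layer
        return self.fc(h).transpose(1, 2)                    # back to (B, d, T)
```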


Subject(s)
Human Activities , Privacy , Algorithms , Humans , Neural Networks, Computer , Recognition, Psychology
4.
Article in English | MEDLINE | ID: mdl-34851833

ABSTRACT

Unsupervised cross-dataset person re-identification (Re-ID), which aims to transfer knowledge from a labeled source domain to an unlabeled target domain, has recently attracted increasing attention. There are two common frameworks: pixel alignment, which transfers low-level knowledge, and feature alignment, which transfers high-level knowledge. In this article, we propose a novel recurrent autoencoder (RAE) framework to unify these two kinds of methods and inherit their merits. Specifically, the proposed RAE includes three modules: a feature-transfer (FT) module, a pixel-transfer (PT) module, and a fusion module. The FT module utilizes an encoder to map source and target images to a shared feature space in which features are identity-discriminative and the gap between source and target features is reduced. The PT module uses a decoder to reconstruct the original images from their features; here, the images reconstructed from target features should be in the source style, so that low-level knowledge is propagated to the target domain. After transferring both high- and low-level knowledge with these two modules, we design a bilinear pooling layer to fuse the two kinds of knowledge. Extensive experiments on the Market-1501, DukeMTMC-ReID, and MSMT17 datasets show that our method significantly outperforms both pixel-alignment and feature-alignment Re-ID methods and achieves new state-of-the-art results.
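A hedged sketch of the fusion step only: combining a feature from the FT path and a feature from the PT path with a bilinear pooling layer. The dimensions and the final normalization are assumptions.

```python
import torch
import torch.nn as nn

class BilinearFusion(nn.Module):
    """Fuse high-level (FT) and low-level (PT) features via bilinear pooling."""
    def __init__(self, d_high: int, d_low: int, d_out: int):
        super().__init__()
        self.bilinear = nn.Bilinear(d_high, d_low, d_out)

    def forward(self, f_high: torch.Tensor, f_low: torch.Tensor) -> torch.Tensor:
        fused = self.bilinear(f_high, f_low)            # pairwise feature interactions
        return nn.functional.normalize(fused, dim=-1)   # L2-normalize for retrieval
```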

6.
IEEE Trans Fuzzy Syst ; 29(1): 34-45, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33408453

ABSTRACT

Traditional deep learning methods are sub-optimal at classifying ambiguous features, which often arise in noisy, hard-to-predict categories, and especially in distinguishing semantic scores. Semantic scoring, which depends on semantic logic for evaluation, inevitably involves fuzzy descriptions and missing concepts; for example, the ambiguous relationship between normal and probably normal always presents unclear boundaries (normal - more likely normal - probably normal). Thus, human error is common when annotating images. Differing from existing methods that focus on modifying the kernel structure of neural networks, this study proposes a dominant fuzzy fully connected layer (FFCL) for Breast Imaging Reporting and Data System (BI-RADS) scoring and validates the universality of the proposed structure. The model aims to develop complementary scoring properties for semantic paradigms while constructing fuzzy rules based on an analysis of human thought patterns, and in particular to reduce the influence of semantic conglutination. Specifically, a semantic-sensitive defuzzifier layer projects features occupied by relative categories into a semantic space, and a fuzzy decoder modifies the probabilities of the last output layer with reference to the global trend. Moreover, the ambiguous semantic space between two relative categories shrinks during the learning phases, as the positive and negative growth trends of one category among its relatives are considered. We first used the Euclidean distance (ED) to measure the gap between the real and predicted scores, and then employed a two-sample t-test to demonstrate the advantage of the FFCL architecture. Extensive experimental results on the CBIS-DDSM dataset show that our FFCL structure achieves superior performance for both triple and multiclass classification in BI-RADS scoring, outperforming state-of-the-art methods.
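A very loose sketch of the ambiguity idea only, not the authors' FFCL: smoothing the output probabilities of adjacent ordinal BI-RADS categories so that mass shared between "relative" categories is made explicit. The overlap fraction and the triangular sharing rule are assumptions.

```python
import numpy as np

def fuzzy_defuzzify(probs: np.ndarray, overlap: float = 0.2) -> np.ndarray:
    """Redistribute a fraction of each class's mass to its ordinal neighbors."""
    k = probs.shape[-1]
    smoothed = (1.0 - overlap) * probs
    for i in range(k):
        for j in (i - 1, i + 1):            # adjacent (relative) categories only
            if 0 <= j < k:
                smoothed[..., j] += overlap / 2.0 * probs[..., i]
    # renormalize (edge classes have only one neighbor to receive mass from)
    return smoothed / smoothed.sum(axis=-1, keepdims=True)
```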

7.
Neural Netw ; 135: 1-12, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33310193

ABSTRACT

Knowledge graph reasoning aims to find reasoning paths for relations over incomplete knowledge graphs (KGs). Prior works may not take into account that the reward for each position (vertex in the graph) may differ. We propose a distance-aware reward in the reinforcement learning framework that assigns different rewards to different positions. We also observe that KG embeddings are learned from independent triples and therefore cannot fully cover the information described in the local neighborhood. To this end, we integrate a graph self-attention (GSA) mechanism to capture more comprehensive entity information from neighboring entities and relations. To let the model remember the path, we combine the GSA mechanism with a GRU to maintain a memory of the relations along the path. Our approach can train the agent in one pass, eliminating any pre-training or fine-tuning process and thus significantly reducing the problem complexity. Experimental results demonstrate the effectiveness of our method, and we find that our model can mine more balanced paths for each relation.
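A minimal sketch of what a distance-aware reward could look like: positions closer to the target entity earn more. The exponential decay form and the use of networkx are our assumptions, not the paper's definition.

```python
import networkx as nx

def distance_aware_reward(kg: nx.Graph, vertex, target, gamma: float = 0.8) -> float:
    """Reward shaped by the graph distance from the current vertex to the target."""
    try:
        d = nx.shortest_path_length(kg, vertex, target)
    except nx.NetworkXNoPath:
        return 0.0           # unreachable positions earn nothing
    return gamma ** d        # d = 0 (answer reached) yields the full reward of 1
```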


Subject(s)
Databases, Factual , Deep Learning , Pattern Recognition, Automated/methods , Reinforcement, Psychology , Algorithms , Databases, Factual/trends , Deep Learning/trends , Humans , Knowledge , Pattern Recognition, Automated/trends
8.
Neural Netw ; 133: 40-56, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33125917

ABSTRACT

Conversational sentiment analysis is an emerging yet challenging subtask of sentiment analysis. It aims to discover the affective state of, and sentimental change in, each person in a conversation based on their opinions. There exists a wealth of interaction information that affects speaker sentiment in conversations. However, existing sentiment analysis approaches are insufficient for this subtask for two primary reasons: the lack of benchmark conversational sentiment datasets and the inability to model interactions between individuals. To address these issues, we first present a new conversational dataset that we created and made publicly available, named ScenarioSA, to support the development of conversational sentiment analysis models. We then investigate how interaction dynamics are associated with conversations and study the multidimensional nature of interactions, comprising understandability, credibility, and influence. Finally, we propose an interactive long short-term memory (LSTM) network for conversational sentiment analysis that models interactions between speakers by (1) adding a confidence gate before each LSTM hidden unit to estimate the credibility of the previous speakers and (2) combining the output gate with the learned influence scores to incorporate the influence of the previous speakers. Extensive experiments conducted on ScenarioSA and IEMOCAP show that our model outperforms a wide range of strong baselines and achieves competitive results with state-of-the-art approaches.
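A hedged sketch of the two modifications as described: a confidence gate applied to the previous speaker's hidden state, and a learned influence score scaling the output. Both gate parameterizations are assumptions layered on a standard LSTM cell.

```python
import torch
import torch.nn as nn

class InteractiveLSTMCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.confidence = nn.Sequential(nn.Linear(input_size + hidden_size, 1), nn.Sigmoid())
        self.influence = nn.Sequential(nn.Linear(hidden_size, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor, state):
        h_prev, c_prev = state
        # (1) confidence gate: how credible is the previous speaker's state?
        g = self.confidence(torch.cat([x, h_prev], dim=-1))
        h, c = self.cell(x, (g * h_prev, c_prev))
        # (2) scale the output by a learned influence score for this speaker
        return self.influence(h) * h, (h, c)
```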


Subject(s)
Emotions/physiology , Memory, Long-Term/physiology , Memory, Short-Term/physiology , Neural Networks, Computer , Voice Recognition/physiology , Communication , Humans
9.
Neural Netw ; 132: 190-210, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32911304

ABSTRACT

This article proposes a novel and comprehensive framework for describing the probabilistic nature of the decision-making process. We suggest extending the quantum-like Bayesian network formalism to incorporate the notion of maximum expected utility in order to model paradoxical, sub-optimal, and irrational human decisions. What distinguishes this work is that we take advantage of the quantum interference effects produced in quantum-like Bayesian networks during inference to influence the probabilities used to compute the maximum expected utility of a decision. The proposed quantum-like decision model is able to (1) predict the probability distributions found in different experiments reported in the literature by modelling uncertainty through quantum interference, (2) identify decisions that decision-makers perceive to be optimal within their belief space but that are actually irrational with respect to expected utility theory, and (3) provide an understanding of how a decision-maker's beliefs evolve within a decision-making scenario. The proposed model has the potential to provide new insights in decision science, as well as direct implications for decision support systems that deal with human data, such as in economics, finance, and psychology.
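A small sketch of the core quantity: a quantum-like (interfered) probability feeding an expected-utility choice. The single interference angle theta is a free parameter in this reading; how the full model sets it is not shown here.

```python
import numpy as np

def quantum_like_prob(p1: float, p2: float, theta: float) -> float:
    """Classical law of total probability plus a quantum interference term."""
    interference = 2.0 * np.sqrt(p1 * p2) * np.cos(theta)
    return float(np.clip(p1 + p2 + interference, 0.0, 1.0))

def best_action(utilities: dict, probs: dict) -> str:
    """Pick the action maximizing expected utility under the given probabilities."""
    return max(utilities, key=lambda a: probs[a] * utilities[a])

# theta = 0 recovers classical probability; theta near pi suppresses the
# classically predicted probability, which is how paradoxical (seemingly
# irrational) choices can still maximize the perceived expected utility.
```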


Subject(s)
Decision Making , Probability , Quantum Theory , Uncertainty , Bayes Theorem , Humans , Problem Solving
10.
Neural Netw ; 129: 43-54, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32563024

ABSTRACT

Tracklet association methods learn cross-camera retrieval ability through associating underlying cross-camera positive samples, and have proven successful in the unsupervised person re-identification (Re-ID) task. However, most of them use inefficient association strategies that cost long training hours yet yield low performance. To solve this, we propose an effective end-to-end exemplar association (EEA) framework. EEA mainly adopts three strategies to improve efficiency: (1) end-to-end exemplar-based training, (2) exemplar association, and (3) a dynamic selection threshold. The first accelerates the training process, while the others aim to improve the tracklet association precision. Compared with existing tracklet association methods, EEA markedly reduces the training cost and achieves higher performance. Extensive experiments and ablation studies on seven Re-ID datasets demonstrate the superiority of the proposed EEA over most state-of-the-art unsupervised and domain-adaptation Re-ID methods.
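A hedged sketch of exemplar association with a dynamic selection threshold: exemplar pairs are associated only when their similarity clears a threshold that relaxes as training progresses. The linear schedule and cosine similarity are assumptions.

```python
import numpy as np

def associate(features: np.ndarray, epoch: int, t0: float = 0.9, decay: float = 0.02):
    """Return index pairs whose cosine similarity exceeds this epoch's threshold."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    thr = max(t0 - decay * epoch, 0.5)       # threshold loosens as features improve
    i, j = np.where(np.triu(sim, k=1) > thr) # upper triangle: each pair counted once
    return list(zip(i.tolist(), j.tolist())), thr
```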


Subject(s)
Biometric Identification/methods , Unsupervised Machine Learning/standards , Biometric Identification/standards , Unsupervised Machine Learning/economics
11.
Neural Netw ; 128: 294-304, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32470795

ABSTRACT

RGB-Infrared (IR) person re-identification (Re-ID) is very challenging due to the large cross-modality variations between RGB and IR images. Since there are no correspondence labels between pairs of RGB and IR images, most methods try to alleviate these variations with set-level alignment, reducing the marginal distribution divergence between the entire RGB and IR sets. However, this set-level alignment strategy may misalign some instances, which limits the performance of RGB-IR Re-ID. Differing from existing methods, in this paper we propose to generate cross-modality paired images and perform both global set-level and fine-grained instance-level alignment. Our proposed method enjoys several merits. First, it performs set-level alignment by disentangling modality-specific and modality-invariant features; compared with conventional methods, ours explicitly removes the modality-specific features, so the modality variation can be better reduced. Second, given cross-modality unpaired images of a person, our method generates cross-modality paired images from the exchanged features, allowing us to perform instance-level alignment directly by minimizing the distance between every pair of images. Third, our method learns a latent manifold space in which we can randomly sample and generate many images of unseen classes; training with these images makes the learned identity feature space smoother, so it generalizes better at test time. Finally, extensive experimental results on two standard benchmarks demonstrate that the proposed model performs favorably against state-of-the-art methods.
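A hedged sketch of the generation step: encode modality-specific and modality-invariant parts, then decode with the specific part swapped to synthesize the paired image in the other modality. The encoder/decoder modules and the concatenation layout are assumptions.

```python
import torch
import torch.nn as nn

class PairedGenerator(nn.Module):
    """Swap modality-specific codes to generate cross-modality paired images."""
    def __init__(self, enc_spec: nn.Module, enc_inv: nn.Module, dec: nn.Module):
        super().__init__()
        self.enc_spec, self.enc_inv, self.dec = enc_spec, enc_inv, dec

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor):
        # modality-specific codes carry style; modality-invariant codes carry identity
        s_rgb, s_ir = self.enc_spec(rgb), self.enc_spec(ir)
        i_rgb, i_ir = self.enc_inv(rgb), self.enc_inv(ir)
        fake_ir = self.dec(torch.cat([s_ir, i_rgb], dim=1))   # RGB identity, IR style
        fake_rgb = self.dec(torch.cat([s_rgb, i_ir], dim=1))  # IR identity, RGB style
        return fake_rgb, fake_ir   # paired with the inputs for instance-level alignment
```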


Subject(s)
Biometric Identification/methods , Machine Learning , Infrared Rays
14.
Neural Netw ; 123: 191-216, 2020 Mar.
Article in English | MEDLINE | ID: mdl-31884181

ABSTRACT

Deep kernel learning has been well explored for multi-class classification tasks; however, relatively little work has been done for one-class classification (OCC). OCC requires samples from only one class to train the model. Most recently, a deep architecture based on the kernel regularized least-squares (KRL) method was developed for the OCC task. This paper introduces a novel extension of that method by embedding minimum-variance information within the architecture. This embedding improves the generalization capability of the classifier by reducing the intra-class variance. In contrast to traditional deep learning methods, this method can work effectively with small datasets. We conduct a comprehensive set of experiments on 18 benchmark datasets (13 biomedical and 5 others) to demonstrate the performance of the proposed classifier, comparing the results with 16 state-of-the-art one-class classifiers. Further, we also test our method on two real-world biomedical tasks, viz. the detection of Alzheimer's disease from structural magnetic resonance imaging data and the detection of breast cancer from histopathological images. The proposed method exhibits an F1-score more than 5% higher than existing state-of-the-art methods on various biomedical benchmark datasets, making it viable for application in biomedical fields where relatively little data is available.
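A rough sketch of a kernel regularized least-squares one-class model with a minimum-variance term on the outputs. The paper's exact objective and deep stacking are not reproduced here; the closed form below merely illustrates how a variance penalty can be folded into a KRL solve.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_mv_krl(X, lam=0.1, tau=0.1, gamma=1.0):
    """Minimize ||1 - K a||^2 + lam ||a||^2 + tau * variance of outputs K a.

    Setting the gradient to zero gives (KK + lam I + tau K C K) a = K 1,
    where C = I - 11^T/n centers the outputs (so a^T K C K a is their variance).
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    C = np.eye(n) - np.ones((n, n)) / n
    A = K @ K + lam * np.eye(n) + tau * K @ C @ K
    return np.linalg.solve(A, K @ np.ones(n))

def score(X_train, X_test, a, gamma=1.0):
    """Outputs near 1 resemble the training class; far from 1 flags outliers."""
    return rbf_kernel(X_test, X_train, gamma) @ a
```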


Subject(s)
Image Processing, Computer-Assisted/methods , Magnetic Resonance Imaging/methods , Alzheimer Disease/diagnostic imaging , Breast Neoplasms/diagnostic imaging , Female , Humans , Image Processing, Computer-Assisted/standards , Least-Squares Analysis , Practice Guidelines as Topic
15.
Neural Netw ; 122: 407-419, 2020 Feb.
Article in English | MEDLINE | ID: mdl-31794950

ABSTRACT

This paper proposes a novel method for person identification based on the fusion of iris and periocular biometrics. The challenges of image acquisition under near-infrared or visible-wavelength light in both constrained and unconstrained environments are considered. The proposed system comprises image preprocessing and data augmentation followed by feature learning for classification. In image preprocessing, the annular iris portion is segmented out of an eyeball image and then transformed into a fixed-size image region, and the iris localization parameters are used to extract the local periocular region. Owing to the differing imaging environments, the images suffer from various noise artifacts, which creates data insufficiency and complicates the recognition task. To overcome this, a novel data augmentation technique is introduced. For the feature extraction and classification tasks, the well-known VGG16, ResNet50, and Inception-v3 CNN architectures are employed. The iris and periocular scores are fused to increase the performance of the recognition system. Extensive experimental results are demonstrated on four benchmark iris databases, namely MMU1, UPOL, CASIA-Iris-Distance, and UBIRIS.v2. Comparison with state-of-the-art methods on these databases shows the robustness and effectiveness of the proposed approach.
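A hedged sketch of the fusion step only: class scores from an iris network and a periocular network combined by a weighted sum. The fusion weight, the score-level (rather than feature-level) fusion, and the use of torchvision backbones are assumptions.

```python
import torch
import torchvision.models as models

iris_net = models.vgg16(num_classes=100)     # e.g. trained on normalized iris regions
peri_net = models.resnet50(num_classes=100)  # e.g. trained on periocular crops

def fused_identity(iris_img: torch.Tensor, peri_img: torch.Tensor, w: float = 0.6):
    """Weighted score-level fusion; returns the predicted identity index per image."""
    with torch.no_grad():
        s_iris = iris_net(iris_img).softmax(dim=-1)
        s_peri = peri_net(peri_img).softmax(dim=-1)
    return (w * s_iris + (1.0 - w) * s_peri).argmax(dim=-1)
```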


Subject(s)
Biometric Identification/methods , Deep Learning , Image Processing, Computer-Assisted/methods , Iris , Algorithms , Databases, Factual , Face , Humans