Results 1 - 13 of 13
1.
Sensors (Basel) ; 23(13)2023 Jul 07.
Article in English | MEDLINE | ID: mdl-37448078

ABSTRACT

Recently, stereoscopic image quality assessment has attracted a lot of attention. However, compared with 2D image quality assessment, it is much more difficult to assess the quality of stereoscopic images because 3D visual perception is not yet well understood. This paper proposes a novel no-reference quality assessment metric for stereoscopic images using natural scene statistics, with consideration of both the quality of the cyclopean image and 3D visual perceptual information (binocular fusion and binocular rivalry). In the proposed method, not only is the quality of the cyclopean image considered, but binocular rivalry and other intrinsic 3D visual properties are also exploited. Specifically, to better characterize the quality of the cyclopean image, features of the cyclopean image in both the spatial domain and the transformed domain are extracted based on the natural scene statistics (NSS) model. Furthermore, to better capture the intrinsic properties of the stereoscopic image, the binocular rivalry effect and other 3D visual properties are also considered during feature extraction. After adaptive feature pruning using principal component analysis, the proposed metric achieves improved accuracy. The experimental results show that the proposed metric achieves good and consistent alignment with subjective assessment of stereoscopic images in comparison with existing methods, with the highest SROCC (0.952) and PLCC (0.962) scores being obtained on the LIVE 3D database Phase I.
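As a hedged illustration of the spatial-domain NSS feature step, the sketch below computes mean-subtracted contrast-normalized (MSCN) coefficients and a few summary statistics on a random stand-in cyclopean image; the function names, the Gaussian window size, and the choice of four moments are assumptions rather than the paper's exact feature set.

import numpy as np
from scipy.ndimage import gaussian_filter

def mscn_coefficients(image, sigma=7/6, eps=1e-8):
    # Mean-subtracted contrast-normalized (MSCN) coefficients,
    # a standard spatial-domain NSS representation.
    image = image.astype(np.float64)
    mu = gaussian_filter(image, sigma)
    sigma_map = np.sqrt(np.abs(gaussian_filter(image * image, sigma) - mu * mu))
    return (image - mu) / (sigma_map + eps)

def nss_spatial_features(cyclopean):
    # Illustrative feature vector: mean, variance, skewness and kurtosis
    # of the MSCN coefficients of a (hypothetical) cyclopean image.
    mscn = mscn_coefficients(cyclopean).ravel()
    m, v = mscn.mean(), mscn.var()
    skew = ((mscn - m) ** 3).mean() / (v ** 1.5 + 1e-12)
    kurt = ((mscn - m) ** 4).mean() / (v ** 2 + 1e-12)
    return np.array([m, v, skew, kurt])

# usage on a random stand-in image
features = nss_spatial_features(np.random.rand(64, 64))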


Subject(s)
Depth Perception , Imaging, Three-Dimensional , Imaging, Three-Dimensional/methods , Visual Perception , Attention , Databases, Factual
2.
Article in English | MEDLINE | ID: mdl-37027547

ABSTRACT

Recently, learning-based algorithms have shown impressive performance in underwater image enhancement. Most of them resort to training on synthetic data and obtain outstanding performance. However, these deep methods ignore the significant domain gap between synthetic and real data (i.e., the inter-domain gap), so models trained on synthetic data often fail to generalize well to real-world underwater scenarios. Moreover, the complex and changeable underwater environment also causes a large distribution gap within the real data itself (i.e., the intra-domain gap). Almost no research focuses on this problem, and existing techniques therefore often produce visually unpleasing artifacts and color distortions on various real images. Motivated by these observations, we propose a novel Two-phase Underwater Domain Adaptation network (TUDA) to simultaneously minimize the inter-domain and intra-domain gaps. Concretely, in the first phase, a new triple-alignment network is designed, including a translation part for enhancing the realism of input images, followed by a task-oriented enhancement part. By performing image-level, feature-level, and output-level adaptation in these two parts through joint adversarial learning, the network can better build invariance across domains and thus bridge the inter-domain gap. In the second phase, real data are classified into easy and hard samples according to the assessed quality of the enhanced images, using a new rank-based underwater quality assessment method. By leveraging implicit quality information learned from rankings, this method can more accurately assess the perceptual quality of enhanced images. Using pseudo labels from the easy part, an easy-hard adaptation technique is then conducted to effectively decrease the intra-domain gap between easy and hard samples. Extensive experimental results demonstrate that the proposed TUDA is significantly superior to existing works in terms of both visual quality and quantitative metrics.
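As a hedged sketch of the rank-based idea (not the authors' network), the snippet below shows a pairwise margin ranking loss, in which an image assumed to rank higher in quality should receive a higher score, together with an easy-hard split driven by assessed quality; the margin, threshold, and scores are placeholders.

import numpy as np

def pairwise_ranking_loss(score_better, score_worse, margin=1.0):
    # Hinge-style margin ranking loss: the image assumed to rank higher in
    # quality should score above the lower-ranked one by at least the margin.
    return np.maximum(0.0, margin - (score_better - score_worse))

def easy_hard_split(scores, threshold):
    # Split real-world samples into 'easy' and 'hard' sets by assessed quality.
    scores = np.asarray(scores)
    return scores >= threshold, scores < threshold

# toy usage with made-up quality scores
losses = pairwise_ranking_loss(np.array([0.9, 0.4]), np.array([0.3, 0.6]))
easy, hard = easy_hard_split([0.9, 0.4, 0.7, 0.2], threshold=0.5)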

3.
IEEE Trans Pattern Anal Mach Intell ; 45(2): 2652-2659, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35452385

ABSTRACT

Subspace clustering is useful for clustering data points according to their underlying subspaces. Many methods have been presented in recent years, among which Sparse Subspace Clustering (SSC), Low-Rank Representation (LRR), and Least Squares Regression clustering (LSR) are three representative ones. These approaches achieve good results by assuming the structure of errors as a prior and removing errors in the original input space by modeling them in their objective functions. In this paper, we propose a novel method from an energy perspective that eliminates errors in the projected space rather than the input space. Since the block-diagonal property can lead to correct clustering, we measure the correctness of a block in the projected space with an energy function. A correct block corresponds to the subset of columns with the maximal energy. The energy of a block is defined based on the unary, pairwise, and high-order similarities of its columns. We relax the energy function of a block and approximate it by a constrained homogeneous function. Moreover, we propose an efficient iterative algorithm to remove errors in the projected space. Both theoretical analysis and experiments show the superiority of our method over existing solutions to the clustering problem, especially when noise exists.
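The following is a minimal sketch, under assumed definitions, of how the energy of a candidate block of projected columns might combine a unary column term with pairwise column similarity; the higher-order term and the constrained homogeneous relaxation from the paper are omitted, and the weights alpha and beta are illustrative.

import numpy as np

def block_energy(Z, cols, alpha=1.0, beta=1.0):
    # Illustrative energy of a candidate block: unary column strength plus
    # pairwise column similarity (the paper also uses higher-order terms).
    B = Z[:, cols]                                   # projected columns in the block
    unary = np.linalg.norm(B, axis=0).sum()
    Bn = B / (np.linalg.norm(B, axis=0, keepdims=True) + 1e-12)
    S = Bn.T @ Bn                                    # pairwise cosine similarities
    pairwise = (np.abs(S).sum() - len(cols)) / 2.0   # off-diagonal pairs only
    return alpha * unary + beta * pairwise

# toy usage: energy of the first three columns of a random projection
Z = np.random.randn(10, 6)
e = block_energy(Z, cols=[0, 1, 2])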

4.
Vet Res Forum ; 14(11): 589-594, 2023.
Article in English | MEDLINE | ID: mdl-38169479

ABSTRACT

Activity patterns and time budgets play a crucial role in the successful farming and management of animals. In this study, the behavior patterns of 53 forest musk deer (Moschus berezovskii) were analyzed from October 2nd to 16th, 2021, throughout the day and night. The results showed a distinct dawn-dusk activity rhythm in the captive forest musk deer, with activity peaks observed at dawn (07:00 - 10:00) and dusk (16:00 - 19:00). Additionally, there were smaller activity peaks lasting less than an hour during the night-time (00:00 - 04:00). A comparison of behavior ratios between peak and off-peak periods showed that all behaviors except rumination (RU) differed significantly. Furthermore, no significant differences were found in the behavior ratios of the forest musk deer between the daytime and the night-time. During the daytime, the percentages of time spent on locomotion (32.87 ± 3.38%), feeding (14.43 ± 1.81%), and RU (5.62 ± 1.46%) were slightly higher than at night. Based on these findings, management strategies for musk deer farming should be matched to the animals' activity patterns and behavioral rhythms. Doing so can enhance farming outputs and contribute to the welfare of captive forest musk deer.

5.
Opt Express ; 30(23): 42224-42240, 2022 Nov 07.
Article in English | MEDLINE | ID: mdl-36366680

ABSTRACT

To alleviate the spatial-angular trade-off in sampled light fields (LFs), LF super-resolution (SR) has been studied. Most current LFSR methods consider only a limited set of relations in LFs, which leads to insufficient exploitation of the multi-dimensional information. To address this issue, we present a multi-model fusion framework for LFSR in this paper. Models that capture the LF from distinct aspects are integrated to constitute the fusion framework. The number and arrangement of these models, together with the depth of each model, determine the performance of the framework; we analyze these factors comprehensively to reach the best SR result. However, the models in the framework are isolated from each other because each requires its own input representation. To tackle this issue, the representation alternate convolution (RAC) is introduced. With fusion conducted successfully through the RAC, the multi-dimensional information in LFs is fully exploited. Experimental results demonstrate that our method achieves superior performance against state-of-the-art techniques both quantitatively and qualitatively.
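As a hedged illustration of why models built on different LF representations need their inputs rearranged before they can exchange information (the role the RAC plays in the paper), the sketch below reshapes a 4D light field between a sub-aperture-image stack and a macro-pixel (angular-patch) layout; the array shapes are assumptions.

import numpy as np

def to_sai_stack(lf):
    # Light field (U, V, H, W) -> stack of sub-aperture images (U*V, H, W).
    U, V, H, W = lf.shape
    return lf.reshape(U * V, H, W)

def to_macro_pixel(lf):
    # Light field (U, V, H, W) -> macro-pixel layout (H, W, U, V),
    # i.e. the angular patch observed at every spatial position.
    return np.transpose(lf, (2, 3, 0, 1))

# toy 5x5-view light field with 32x32 views
lf = np.random.rand(5, 5, 32, 32)
sai = to_sai_stack(lf)     # (25, 32, 32)
mp = to_macro_pixel(lf)    # (32, 32, 5, 5)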

6.
Sensors (Basel) ; 22(11)2022 May 31.
Article in English | MEDLINE | ID: mdl-35684826

ABSTRACT

Event detection is an important task in the field of natural language processing, which aims to detect trigger words in a sentence and classify them into specific event types. Event detection suffers from data sparsity and event-instance imbalance problems on small-scale datasets. For this reason, the correlation information among event types can be used to alleviate these problems. In this paper, we design a Hierarchical Attention Neural Network for Event Types (HANN-ET). Specifically, we select Long Short-Term Memory (LSTM) as the semantic encoder and utilize dynamic multi-pooling and a Graph Attention Network (GAT) to enrich the sentence feature. Meanwhile, we build several upper-level event type modules and employ a weighted attention aggregation mechanism to integrate them and obtain correlated event type information. Each upper-level module is implemented as a Neural Module Network (NMN), event types within the same upper-level module can share information, and the attention aggregation mechanism provides effective bias scores for the trigger word classifier. We conduct extensive experiments on the ACE2005 and MAVEN datasets, and the results show that our approach outperforms previous state-of-the-art methods, achieving competitive F1 scores of 78.9% on the ACE2005 dataset and 68.8% on the MAVEN dataset.
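A minimal sketch of the weighted attention aggregation step described above: each upper-level event-type module emits a vector, and attention conditioned on the sentence feature combines them into one bias vector for the trigger classifier. The shapes, the bilinear scoring form, and the random weights are assumptions, not the paper's exact architecture.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_type_modules(sentence_feat, module_outputs, query_weights):
    # module_outputs: (num_modules, dim); query_weights: (dim, dim).
    # Attention over modules, conditioned on the sentence feature, combines
    # the per-module vectors into a single bias vector.
    scores = module_outputs @ (query_weights @ sentence_feat)   # (num_modules,)
    attn = softmax(scores)
    return attn @ module_outputs                                # (dim,)

# toy usage with assumed dimensions
dim, modules = 8, 3
bias = aggregate_type_modules(np.random.rand(dim),
                              np.random.rand(modules, dim),
                              np.random.rand(dim, dim))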


Subject(s)
Natural Language Processing , Neural Networks, Computer , Language , Memory, Long-Term , Semantics
7.
IEEE Trans Image Process ; 30: 6459-6472, 2021.
Article in English | MEDLINE | ID: mdl-34236964

ABSTRACT

Recently, many deep-learning-based studies have been conducted to explore the potential quality improvement of compressed videos. These methods mostly utilize either spatial or temporal information to perform frame-level video enhancement. However, they fail to combine spatial and temporal information to adaptively exploit adjacent patches when enhancing the current patch, and they achieve limited enhancement performance, especially on scene-changing and strong-motion videos. To overcome these limitations, we propose a patch-wise spatial-temporal quality enhancement network which first extracts spatial and temporal features, then recalibrates and fuses them. Specifically, we design a temporal and spatial-wise attention-based feature distillation structure to adaptively utilize the adjacent patches for distilling patch-wise temporal features. To adaptively enhance different patches with spatial and temporal information, a channel and spatial-wise attention fusion block is proposed to achieve patch-wise recalibration and fusion of spatial and temporal features. Experimental results demonstrate that our network achieves a peak signal-to-noise ratio improvement of 0.55 - 0.69 dB over the compressed videos at different quantization parameters, outperforming state-of-the-art approaches.
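The following is a hedged, parameter-free sketch of the channel and spatial-wise attention fusion idea: spatial and temporal feature maps are concatenated, gated per channel by a pooled channel statistic, and then gated per position by a pooled spatial statistic. The learned convolutional layers of the actual block are replaced here by simple averaging surrogates.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_spatial_attention_fusion(spatial_feat, temporal_feat):
    # Inputs: (C, H, W) spatial and temporal feature maps for one patch.
    fused = np.concatenate([spatial_feat, temporal_feat], axis=0)  # (2C, H, W)
    # channel attention: gate each channel by its global average response
    channel_gate = sigmoid(fused.mean(axis=(1, 2)))[:, None, None]
    fused = fused * channel_gate
    # spatial attention: gate each position by the channel-averaged response
    spatial_gate = sigmoid(fused.mean(axis=0))[None, :, :]
    return fused * spatial_gate

# toy usage with 16-channel 8x8 feature maps
out = channel_spatial_attention_fusion(np.random.rand(16, 8, 8),
                                       np.random.rand(16, 8, 8))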

8.
IEEE Trans Image Process ; 27(9): 4195-4206, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29870341

ABSTRACT

3D High Efficiency Video Coding (3D-HEVC), built on the latest-generation video coding standard HEVC, was developed for compressing the multi-view video plus depth format. It further adds several new intra prediction modes, the depth-modeling modes (DMMs), to the intra candidate modes for a better representation of edges in depth maps, which drastically increases the computational complexity. Depth intra mode decision over the DMMs and the existing intra modes is a very time-consuming part due to the huge complexity of full rate-distortion (RD) cost calculation. In this paper, a low-complexity intra mode selection algorithm is proposed to reduce the complexity of depth intra prediction in both intra-frames and inter-frames. An experimental analysis is first performed to study the inter-view correlation and the inter-component (texture video and its associated depth) correlation in intra coding information, such as the intra mode and RD cost. All intra modes available in 3D-HEVC are classified into three activity classes assigned different mode-weight factors, and the coding mode complexity of a coding unit (CU) is defined according to the intra mode information from available spatiotemporal, inter-view, and inter-component neighboring coded CUs. The coding mode complexity analysis is used to assign different candidate intra modes to different types of CUs. The optimal intra prediction mode and the RD cost value at the current CU depth level are further used to skip unnecessary intra prediction sizes. Experimental results show that the proposed fast depth intra coding algorithm achieves a 61% complexity reduction in intra prediction while incurring only a 0.2% Bjontegaard metric increase for coded and synthesized views compared with the test model of 3D-HEVC.
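As a hedged sketch of the neighbor-driven idea (not the paper's actual statistics or thresholds), the snippet below derives a coding mode complexity from the activity classes of already-coded neighboring CUs and maps it to a candidate intra mode set; the class weights and thresholds are placeholders.

def cu_mode_complexity(neighbor_modes, mode_weight):
    # Illustrative coding-mode complexity of a CU: the average of the
    # activity-class weights of already-coded spatiotemporal / inter-view /
    # inter-component neighbors (weights here are placeholders).
    if not neighbor_modes:
        return None   # no neighbors available: fall back to the full mode search
    return sum(mode_weight[m] for m in neighbor_modes) / len(neighbor_modes)

def candidate_intra_modes(complexity, low=0.5, high=1.5):
    # Map complexity to a candidate set: smooth CUs test few modes, complex CUs
    # keep the full set including depth-modeling modes (thresholds are assumed).
    if complexity is None or complexity > high:
        return "full set (35 HEVC intra modes + DMMs)"
    if complexity < low:
        return "Planar, DC and the most probable modes only"
    return "most probable modes + coarse directional subset"

# toy usage: three neighbors falling into assumed activity classes
weights = {"smooth": 0.2, "edge": 2.0, "texture": 1.0}
c = cu_mode_complexity(["smooth", "edge", "smooth"], weights)
print(candidate_intra_modes(c))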

9.
Appl Opt ; 56(29): 8291-8302, 2017 Oct 10.
Article in English | MEDLINE | ID: mdl-29047696

ABSTRACT

Stereoscopic imaging technology has become increasingly prevalent, driven by both the entertainment industry and scientific applications. However, objective quality assessment of stereoscopic images remains a challenging task. In this paper, we propose a novel stereoscopic image quality assessment (SIQA) method that jointly considers monocular perception and binocular interaction. As the most significant contribution of this study, the binocular perceptual properties of simple and complex cells are considered for full-reference (FR) SIQA. Specifically, the proposed scheme first simulates the receptive fields of simple cells (one class of V1 neurons) using a push-pull combination of receptive field responses, which represents a monocular cue. Further, the receptive fields of complex cells (the other class of V1 neurons) are simulated using the binocular energy response and the binocular rivalry response, which represent a binocular cue. Subsequently, various quality-aware features are extracted from the simulated V1 response by calculating the self-weighted histogram of the local binary pattern on four types of similarity feature maps that change in the presence of distortions. Finally, kernel ridge regression is used to model the nonlinear relationship between the quality-aware features and objective quality scores. The performance of our method is evaluated on popular stereoscopic image databases and shown to be competitive with state-of-the-art FR SIQA algorithms.
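As an illustration of the final regression step named above, the sketch below fits Gaussian-kernel ridge regression from quality-aware feature vectors to subjective scores; the kernel width, regularization weight, and the random stand-in data are assumptions.

import numpy as np

def kernel_ridge_fit(X, y, gamma=0.1, lam=1e-3):
    # Gaussian-kernel ridge regression: maps quality-aware features to scores.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return alpha

def kernel_ridge_predict(X_train, alpha, X_test, gamma=0.1):
    sq = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq) @ alpha

# toy usage with random stand-in features and scores
X, y = np.random.rand(20, 10), np.random.rand(20)
alpha = kernel_ridge_fit(X, y)
pred = kernel_ridge_predict(X, alpha, np.random.rand(5, 10))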

10.
PLoS One ; 12(2): e0171018, 2017.
Article in English | MEDLINE | ID: mdl-28182719

ABSTRACT

The 3D High Efficiency Video Coding (3D-HEVC) standard aims to code 3D videos that usually contain multi-view texture videos and their corresponding depth information. It inherits the same quadtree prediction structure as HEVC to code both texture videos and depth maps. Each coding unit (CU) can be recursively split into four equal sub-CUs. At each CU depth level, 10 types of inter modes and 35 types of intra modes are enabled in inter frames. Furthermore, inter-view prediction tools are applied to each view in the test model of 3D-HEVC (HTM), which uses variable-size disparity-compensated prediction to exploit inter-view correlation among neighboring views. It also exploits redundancies between a texture video and its associated depth using inter-component coding tools. These tools achieve the highest coding efficiency for 3D videos but require very high computational complexity. In this paper, we propose a context-adaptive fast CU processing algorithm to jointly optimize the most complex components of HTM, including the CU depth level decision, mode decision, motion estimation (ME), and disparity estimation (DE) processes. It is based on the hypothesis that the optimal CU depth level, prediction mode, and motion vector of a CU are correlated with those of spatiotemporal, inter-view, and inter-component neighboring CUs. We analyze the video content based on coding information from neighboring CUs and early classify each CU into one of five categories, i.e., DE-omitted CU, ME-DE-omitted CU, SPLIT CU, Non-SPLIT CU, and normal CU; each type of CU then adaptively adopts a different processing strategy. Experimental results show that the proposed algorithm saves 70% of the encoder runtime on average with only a 0.1% BD-rate increase on coded views and a 0.8% BD-rate increase on synthesized views. Our algorithm outperforms state-of-the-art algorithms in terms of coding time saving or achieves better RD performance.
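A hedged sketch of the early five-way CU classification: the decision rules below are simple placeholder heuristics built from neighbor coding information, not the statistical criteria derived in the paper.

def classify_cu(neighbors):
    # Each neighbor is a dict with assumed keys: 'mode', 'split', 'uses_disparity'.
    if not neighbors:
        return "normal CU"
    all_skip = all(n["mode"] == "SKIP" for n in neighbors)
    all_split = all(n["split"] for n in neighbors)
    any_disparity = any(n["uses_disparity"] for n in neighbors)
    if all_split:
        return "SPLIT CU"              # go directly to the next depth level
    if all_skip and not any_disparity:
        return "ME-DE-omitted CU"      # skip both motion and disparity estimation
    if not any_disparity:
        return "DE-omitted CU"         # skip disparity estimation only
    if all(not n["split"] for n in neighbors):
        return "Non-SPLIT CU"          # stop splitting at this depth
    return "normal CU"

# toy usage with two placeholder neighbors
print(classify_cu([{"mode": "SKIP", "split": False, "uses_disparity": False},
                   {"mode": "SKIP", "split": False, "uses_disparity": False}]))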


Subject(s)
Image Processing, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Video Recording/methods
11.
IEEE Trans Image Process ; 23(10): 4232-41, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25069112

ABSTRACT

In high efficiency video coding (HEVC), the tree-structured coding unit (CU) is adopted to allow recursive splitting into four equally sized blocks. At each depth level (or CU size), up to 35 intraprediction modes are enabled, including a planar mode, a DC mode, and 33 directional modes. The intraprediction via exhaustive mode search exploited in the test model of HEVC (HM) effectively improves coding efficiency, but results in a very high computational complexity. In this paper, a fast CU size decision algorithm for HEVC intracoding is proposed to speed up the process by reducing the number of candidate CU sizes that need to be checked for each treeblock. The novelty of the proposed algorithm lies in the following two aspects: 1) an early determination of the CU size with adaptive thresholds is developed based on texture homogeneity and 2) a novel bypass strategy for intraprediction on large CU sizes is proposed based on the combination of texture properties and coding information from neighboring coded CUs. Experimental results show that the proposed CU size decision algorithm achieves a computational complexity reduction of up to 67%, while incurring only a 0.06-dB loss in peak signal-to-noise ratio or a 1.08% increase in bit rate compared with the original coding in HM.
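As a hedged illustration of the early determination step, the snippet below declares a treeblock homogeneous, and therefore skips checking smaller CU sizes, when its luma variance falls below a threshold; in the paper the thresholds are adaptive, whereas the fixed value here is purely illustrative.

import numpy as np

def early_cu_size_decision(block, threshold):
    # If the treeblock is texturally homogeneous (low variance), keep the
    # large CU size and skip the smaller candidate sizes.
    variance = np.var(block.astype(np.float64))
    return "stop splitting" if variance < threshold else "check smaller CU sizes"

# toy usage on a flat block and a noisy block with an assumed threshold
flat = np.full((64, 64), 128)
noisy = np.random.randint(0, 256, (64, 64))
print(early_cu_size_decision(flat, threshold=25.0))   # stop splitting
print(early_cu_size_decision(noisy, threshold=25.0))  # check smaller CU sizes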

12.
Opt Lett ; 38(5): 700-2, 2013 Mar 01.
Article in English | MEDLINE | ID: mdl-23455270

ABSTRACT

We propose an efficient regional histogram (RH)-based computation model for saliency detection in natural images. First, the global histogram is constructed by performing an adaptive color quantization on the original image. Then multiple RHs are built on the basis of the region segmentation result, and the color-spatial similarity between each pixel and each RH is calculated accordingly. Two efficient measures, distinctiveness and compactness of each RH, are evaluated based on the color difference with the global histogram and the color distribution over the whole image, respectively. Finally, the pixel-level saliency map is generated by integrating the color-spatial similarity measures with the distinctiveness and compactness measures. Experimental results on a dataset containing 1000 test images with ground truths demonstrate that the proposed saliency model consistently outperforms state-of-the-art saliency models.
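As a hedged sketch of the two measures described above, the snippet below computes an illustrative distinctiveness score (histogram-weighted color difference against the global histogram) and a compactness score (inverse weighted spatial spread of the supporting pixels); the exact formulations in the paper differ, and the random inputs are stand-ins.

import numpy as np

def distinctiveness(region_hist, global_hist, bin_colors):
    # Color distinctiveness of a regional histogram: histogram-weighted distance
    # of its quantized colors from the global color distribution.
    diff = np.linalg.norm(bin_colors[:, None, :] - bin_colors[None, :, :], axis=-1)
    return float(region_hist @ diff @ global_hist)

def compactness(pixel_positions, weights):
    # Spatial compactness: inverse of the weighted spatial variance of the pixels
    # supporting the histogram; compact color clusters are more likely salient.
    w = weights / (weights.sum() + 1e-12)
    center = (pixel_positions * w[:, None]).sum(axis=0)
    spread = (w * np.linalg.norm(pixel_positions - center, axis=1) ** 2).sum()
    return 1.0 / (1.0 + spread)

# toy usage with 8 quantized colors and 100 supporting pixels
rh, gh = np.random.dirichlet(np.ones(8)), np.random.dirichlet(np.ones(8))
colors = np.random.rand(8, 3)
d = distinctiveness(rh, gh, colors)
c = compactness(np.random.rand(100, 2), np.random.rand(100))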

13.
IEEE Trans Image Process ; 21(5): 2582-91, 2012 May.
Article in English | MEDLINE | ID: mdl-22155957

ABSTRACT

A joint model of scalable video coding (SVC) uses exhaustive mode and motion searches to select the best prediction mode and motion vector for each macroblock (MB), achieving high coding efficiency at the cost of computational complexity. If major characteristics of a coding MB, such as the complexity of the prediction mode and the motion property, can be identified and used to adjust motion estimation (ME), one can design an algorithm that adapts coding parameters to the video content. This way, unnecessary mode and motion searches can be avoided. In this paper, we propose content-adaptive ME for SVC, including analyses of mode complexity and motion property to assist mode and motion searches. An experimental analysis is performed to study interlayer and spatial correlations in the coding information. Based on these correlations, the motion and mode characteristics of the current MB are identified and used to adjust each step of ME at the enhancement layer, including mode decision, search-range selection, and prediction direction selection. Experimental results show that the proposed algorithm can significantly reduce the computational complexity of SVC while maintaining nearly the same rate-distortion performance as the original encoder.
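A minimal sketch of one of the adjustments named above, search-range selection: the enhancement-layer search window is shrunk when the co-located base-layer motion vector indicates slow motion. The magnitude thresholds and window sizes are assumptions for illustration only.

def select_search_range(base_layer_mv, default_range=32):
    # Content-adaptive search range for the enhancement layer, driven by the
    # co-located base-layer motion vector (thresholds are assumed).
    magnitude = max(abs(base_layer_mv[0]), abs(base_layer_mv[1]))
    if magnitude <= 4:
        return 8          # nearly static content: small window around the predictor
    if magnitude <= 16:
        return 16         # moderate motion
    return default_range  # strong or irregular motion: keep the full search range

# toy usage
print(select_search_range((2, -1)))    # 8
print(select_search_range((20, 35)))   # 32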


Subject(s)
Algorithms , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Motion , Pattern Recognition, Automated/methods , Subtraction Technique , Video Recording/methods , Reproducibility of Results , Sensitivity and Specificity