1.
IEEE Trans Image Process ; 32: 3092-3107, 2023.
Article in English | MEDLINE | ID: mdl-37204945

ABSTRACT

In this paper, we propose novel extensions to JPEG 2000 for the coding of discontinuous media, which includes piecewise-smooth imagery such as depth maps and optical flows. These extensions use breakpoints to model discontinuity boundary geometry and apply a breakpoint-dependent Discrete Wavelet Transform (BP-DWT) to the input imagery. The highly scalable and accessible coding features provided by the JPEG 2000 compression framework are preserved by our proposed extensions, with the breakpoint and transform components encoded as independent bit streams that can be progressively decoded. Comparative rate-distortion results are provided along with corresponding visual examples which highlight the advantages of using breakpoint representations with the accompanying BP-DWT and embedded bit-plane coding. Recently, our proposed extensions have been adopted and are in the process of being published as a new Part 17 to the JPEG 2000 family of coding standards.

2.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 5700-5714, 2022 Sep.
Article in English | MEDLINE | ID: mdl-34048338

ABSTRACT

In this paper, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques tend to ignore the joint statistics of weights and activations, producing sub-optimal CNN performance at a given quantization bit-rate, or consider their joint statistics during training only and do not facilitate efficient compression of already trained CNN models. We optimally transform (decorrelate) and quantize the weights post-training using a rate-distortion framework to improve compression at any given quantization bit-rate. Transform quantization unifies quantization and dimensionality reduction (decorrelation) techniques in a single framework to facilitate low bit-rate compression of CNNs and efficient inference in the transform domain. We first introduce a theory of rate and distortion for CNN quantization and pose optimum quantization as a rate-distortion optimization problem. We then show that this problem can be solved using optimal bit-depth allocation following decorrelation by the optimal End-to-end Learned Transform (ELT) we derive in this paper. Experiments demonstrate that transform quantization advances the state of the art in CNN compression in both retrained and non-retrained quantization scenarios. In particular, we find that transform quantization with retraining is able to compress CNN models such as AlexNet, ResNet and DenseNet to very low bit-rates (1-2 bits).
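To illustrate the general idea (this is not the paper's End-to-end Learned Transform, just a generic KLT stand-in), the following Python sketch decorrelates a toy weight matrix with the eigenvectors of its covariance, allocates bits by transform-domain variance, and quantizes each dimension uniformly. The layer size, bit-allocation rule, and base bit-depth are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 9))           # toy "CNN layer": 64 filters of 9 weights each

# Decorrelate with a KLT: eigenvectors of the empirical weight covariance
C = np.cov(W, rowvar=False)
_, U = np.linalg.eigh(C)
Z = W @ U                               # transform-domain (decorrelated) weights

# Crude bit allocation: more bits for higher-variance transform dimensions
var = Z.var(axis=0)
bits = np.clip(np.round(3 + 0.5 * np.log2(var / var.mean())), 0, 8).astype(int)

def uniform_quantize(z, b):
    """Uniform quantization of z to 2**b levels (b = 0 zeroes the dimension)."""
    if b == 0:
        return np.zeros_like(z)
    step = (z.max() - z.min() + 1e-12) / (2 ** b)
    return np.round(z / step) * step

Zq = np.column_stack([uniform_quantize(Z[:, i], bits[i]) for i in range(Z.shape[1])])
W_hat = Zq @ U.T                        # inverse transform back to the weight domain
rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
```

Inference in the transform domain, as the abstract describes, would operate on `Zq` directly rather than reconstructing `W_hat`.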

3.
Article in English | MEDLINE | ID: mdl-32286976

ABSTRACT

Recently, many fast implementations of the bilateral and nonlocal filters have been proposed based on lattice and vector quantization, e.g., clustering, in higher dimensions. However, these approaches can still be inefficient owing to the complexities of the resampling process or of filtering the high-dimensional resampled signal. In contrast, simple scalar resampling of the high-dimensional signal after decorrelation presents the opportunity to filter signals using multi-rate signal processing techniques. This work proposes the Gaussian lifting framework for efficient and accurate bilateral and nonlocal means filtering, appealing to the similarities between separable wavelet transforms and Gaussian pyramids. Accurately implementing the filter is important not only for image processing applications, but also for a number of recently proposed bilateral-regularized inverse problems, where the accuracy of the solutions ultimately depends on an accurate filter implementation. We show that our Gaussian lifting approach filters images more accurately and efficiently across many filter scales. Adaptive lifting schemes for bilateral and nonlocal means filtering are also explored.
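For reference, the brute-force bilateral filter — the exact filter that the fast lattice/quantization and lifting schemes approximate — can be sketched in a few lines of Python. The kernel widths, window radius, and test image below are arbitrary illustrative choices.

```python
import numpy as np

def bilateral_filter(img, sigma_s=2.0, sigma_r=0.1, radius=4):
    """Brute-force bilateral filter: spatial Gaussian times range Gaussian."""
    h, w = img.shape
    out = np.empty_like(img)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys**2 + xs**2) / (2 * sigma_s**2))
    pad = np.pad(img, radius, mode='edge')
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 2*radius + 1, j:j + 2*radius + 1]
            rangew = np.exp(-(win - img[i, j])**2 / (2 * sigma_r**2))
            wgt = spatial * rangew
            out[i, j] = (wgt * win).sum() / wgt.sum()
    return out

# Demo: a noisy step edge is smoothed on each side but the edge survives,
# because the range kernel assigns near-zero weight across the intensity jump.
rng = np.random.default_rng(0)
img = np.zeros((16, 16))
img[:, 8:] = 1.0
img += 0.02 * rng.normal(size=(16, 16))
out = bilateral_filter(img)
```

Its O(pixels x window) cost is exactly what motivates the fast approximations discussed above.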

4.
Article in English | MEDLINE | ID: mdl-32286986

ABSTRACT

We propose the fast optical flow extractor, a filtering method that recovers artifact-free optical flow fields from HEVC-compressed video. To extract accurate optical flow fields, we form a regularized optimization problem that considers the smoothness of the solution and the pixelwise confidence weights of an artifact-ridden HEVC motion field. Solving such an optimization problem is slow, so we first convert the problem into a confidence-weighted filtering task. By leveraging the already-available HEVC motion parameters, we achieve a 100-fold speed-up in running time compared to similar methods, while producing subpixel-accurate flow estimates. The fast optical flow extractor is useful when video frames are already available in coded formats. Our method is not specific to a coder, and works with motion fields from video coders such as H.264/AVC and HEVC.

5.
Article in English | MEDLINE | ID: mdl-31613756

ABSTRACT

This paper proposes graph Laplacian regularization for robust estimation of optical flow. First, we analyze the spectral properties of dense graph Laplacians and show that dense graphs achieve a better trade-off between preserving flow discontinuities and filtering noise, compared with the usual Laplacian. Using this analysis, we then propose a robust optical flow estimation method based on Gaussian graph Laplacians. We revisit the framework of iteratively reweighted least-squares from the perspective of graph edge reweighting, and employ the Welsch loss function to preserve flow discontinuities and handle occlusions. Our experiments using the Middlebury and MPI-Sintel optical flow datasets demonstrate the robustness and the efficiency of our proposed approach.
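The iteratively reweighted least-squares (IRLS) view with Welsch-loss edge reweighting can be illustrated on a toy 1-D chain graph — a drastic simplification of the paper's dense Gaussian graph Laplacians, with `lam` and `kappa` chosen arbitrarily.

```python
import numpy as np

def welsch_weight(r, kappa=0.2):
    # Welsch-loss-derived weight: down-weights large residuals (discontinuities)
    return np.exp(-(r / kappa) ** 2 / 2)

def irls_graph_denoise(y, lam=2.0, iters=20, kappa=0.2):
    """IRLS on a chain graph: min ||x - y||^2 + lam * sum_e w_e (x_i - x_j)^2,
    with edge weights w_e re-estimated each iteration from the Welsch loss."""
    n = len(y)
    x = y.copy()
    for _ in range(iters):
        w = welsch_weight(np.diff(x), kappa)     # graph edge reweighting
        # Build the reweighted graph Laplacian of the chain
        L = np.zeros((n, n))
        for i in range(n - 1):
            L[i, i] += w[i]; L[i + 1, i + 1] += w[i]
            L[i, i + 1] -= w[i]; L[i + 1, i] -= w[i]
        x = np.linalg.solve(np.eye(n) + lam * L, y)
    return x

# Demo: noise in flat regions is smoothed, but the discontinuity is preserved
# because its large residual receives a near-zero Welsch weight.
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(20), np.ones(20)]) + 0.05 * rng.normal(size=40)
x = irls_graph_denoise(y)
```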

6.
J Acoust Soc Am ; 145(4): 2254, 2019 Apr.
Article in English | MEDLINE | ID: mdl-31046345

ABSTRACT

Three-dimensional acoustic soundfields can be represented by a set of spherical harmonic coefficients that are extracted from pressure signals recorded by a spherical microphone array. The extraction method used and the truncation order chosen introduce spatial aliasing errors in the coefficients and truncation error in a reconstructed signal. This paper proposes a spatial Wiener filter (SWF) extraction method that uses second-order statistics of typical soundfield characteristics (signal power, estimated source locations, and internal microphone noise) and accounts for the presence of coefficients beyond the truncation order to reduce spatial aliasing in the extracted coefficients. The SWF can also distinguish between "wanted" and "unwanted" sources, reducing the contributions of unwanted sources to the extracted coefficients. The SWF is compared against the state-of-the-art methods, namely the regularized (or generalized) inverse and orthonormal extraction methods, which are explored under a similar framework to the SWF. The authors compare these methods and show the benefit of the SWF for plane waves, with varying assumptions about the source characteristics. The SWF can also extract coefficients beyond the traditional truncation limit of a given array, unlike the other methods.
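The underlying estimator is the classic linear MMSE (Wiener) filter. A generic sketch, using a random matrix as a stand-in for the spherical-harmonic sampling matrix (in practice it would be built from spherical harmonics evaluated at the microphone positions), compares it against a plain generalized inverse; dimensions, the coefficient power profile, and the noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 32, 16                               # microphones, SH coefficients (stand-in sizes)
Y = rng.normal(size=(M, N)) / np.sqrt(M)    # stand-in for the SH sampling matrix
R_a = np.diag(1.0 / (1.0 + np.arange(N)))   # assumed coefficient power profile
sigma2 = 1e-2                               # microphone self-noise power

# Spatial Wiener filter: a_hat = R_a Y^T (Y R_a Y^T + sigma2 I)^{-1} p
G_wiener = R_a @ Y.T @ np.linalg.inv(Y @ R_a @ Y.T + sigma2 * np.eye(M))
G_pinv = np.linalg.pinv(Y)                  # unregularized generalized-inverse baseline

# Average extraction MSE over random soundfields and noise realizations
mse_w = mse_p = 0.0
for _ in range(200):
    a = rng.multivariate_normal(np.zeros(N), R_a)   # true coefficients
    p = Y @ a + np.sqrt(sigma2) * rng.normal(size=M)  # array pressure signals
    mse_w += np.sum((G_wiener @ p - a) ** 2)
    mse_p += np.sum((G_pinv @ p - a) ** 2)
mse_w /= 200
mse_p /= 200
```

The Wiener estimator wins on average because it exploits the assumed second-order statistics, which is the core of the SWF argument above.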

7.
IEEE Trans Image Process ; 28(9): 4313-4327, 2019 Sep.
Article in English | MEDLINE | ID: mdl-30908217

ABSTRACT

In this paper, we are interested in the compression of image sets or video with considerable changes in illumination. We develop a framework to decompose frames into illumination fields and texture in order to achieve sparser representations of frames, which is beneficial for compression. Illumination variations, or contrast ratio factors, among frames are described by a full-resolution multiplicative field. First, we propose a Lifting-based Illumination Adaptive Transform (LIAT) framework which incorporates illumination compensation into temporal wavelet transforms. We estimate a full-resolution illumination field, taking heed of its spatial sparsity with a rate-distortion (R-D) driven framework. An affine mesh model is also developed as a point of comparison. We find the operational coding cost of the subband frames by modeling a typical t + 2D wavelet video coding system. While our general findings on R-D optimization are applicable to a range of coding frameworks, in this paper we report results based on employing JPEG 2000 coding tools. The experimental results highlight the benefits of the proposed R-D driven illumination estimation and compensation in comparison with alternative scalable coding methods and with non-scalable AVC and HEVC coding schemes employing weighted prediction.

8.
IEEE Trans Image Process ; 28(7): 3205-3218, 2019 Jul.
Article in English | MEDLINE | ID: mdl-30676962

ABSTRACT

We present a compression scheme for multiview imagery that facilitates high scalability and accessibility of the compressed content. Our scheme relies upon constructing, at a single base view, a disparity model for a group of views, and then utilizing this base-anchored model to infer disparity at all views belonging to the group. We employ a hierarchical disparity-compensated inter-view transform where the corresponding analysis and synthesis filters are applied along the geometric flows defined by the base-anchored disparity model. The output of this inter-view transform, along with the disparity information, is subjected to spatial wavelet transforms and embedded block-based coding. Rate-distortion results reveal superior performance to the x265 anchor chosen by the JPEG Pleno standards activity for the coding of multiview imagery captured by high-density camera arrays.

9.
IEEE Trans Image Process ; 28(1): 343-355, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30176592

ABSTRACT

We address the problem of decoding Joint Photographic Experts Group (JPEG)-encoded images with fewer visual artifacts. We view the decoding task as an ill-posed inverse problem and find a regularized solution using a convex, graph Laplacian-regularized model. Since the resulting problem is non-smooth and entails non-local regularization, we use fast high-dimensional Gaussian filtering techniques with the proximal gradient descent method to solve our convex problem efficiently. Our patch-based "coefficient graph" is better suited than traditional pixel-based ones for regularizing smooth non-stationary signals such as natural images, and relates directly to classic non-local means de-noising of images. We also extend our graph along the temporal dimension to handle the decoding of M-JPEG-encoded video. Despite the minimalistic nature of our convex problem, it produces decoded images of similar quality to other, more complex state-of-the-art methods while being up to five times faster. We also expound on the relationship between our method and the classic ANCE method, reinterpreting ANCE from a graph-based regularization perspective.
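The flavor of such Laplacian-regularized decoding can be shown on a toy 1-D signal: minimize a chain-graph smoothness energy subject to staying inside the known quantization bins, via projected (proximal) gradient. This is a drastic simplification of the paper's patch-based coefficient graph and Gaussian-filtering solver; all parameters are illustrative.

```python
import numpy as np

def deblock_1d(y_q, q, lam=1.0, iters=200, step=0.2):
    """Toy graph-Laplacian-regularized decoding of a quantized signal:
       min_x  lam * x^T L x   s.t.  |x_i - y_q_i| <= q/2
    solved by projected (proximal) gradient on a chain-graph Laplacian."""
    n = len(y_q)
    # chain-graph Laplacian
    L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    L[0, 0] = L[-1, -1] = 1
    x = y_q.copy()
    lo, hi = y_q - q / 2, y_q + q / 2
    for _ in range(iters):
        x = x - step * lam * (L @ x)       # gradient step on the smoothness term
        x = np.clip(x, lo, hi)             # prox of the quantization-bin indicator
    return x

# Demo: a quantized ramp ("staircase") is smoothed within its quantization bins,
# i.e., the decoder removes blocking-style steps while staying consistent
# with the coded data.
y = np.linspace(0, 1, 32)
q = 0.5
y_q = np.round(y / q) * q
x = deblock_1d(y_q, q)
```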

10.
IEEE Trans Image Process ; 27(6): 3100-3113, 2018 06.
Article in English | MEDLINE | ID: mdl-29993601

ABSTRACT

This paper presents a method leveraging coded motion information to obtain a fast, high-quality motion field estimation. The method is inspired by a recent trend followed by a number of top-performing optical flow estimation schemes that first estimate a sparse set of features between two frames, and then use an edge-preserving interpolation scheme (EPIC) to obtain a piecewise-smooth motion field that respects moving object boundaries. In order to skip the time-consuming estimation of features, we propose to directly derive motion seeds from decoded HEVC block motion; we call the resulting scheme "HEVC-EPIC". We propose motion seed weighting strategies that account for the fact that some motion seeds are less reliable than others. Experiments on a large variety of challenging sequences and various bit-rates show that HEVC-EPIC runs significantly faster than EPIC flow, while producing motion fields that have a slightly lower average endpoint error (A-EPE). HEVC-EPIC opens the door to seamlessly integrating HEVC motion into video analysis and enhancement tasks. When employed as input to a frame-rate upsampling scheme, the average Y-PSNR of the interpolated frames using HEVC-EPIC motion slightly outperforms EPIC flow across the tested bit-rates, while running an order of magnitude faster.

11.
IEEE Trans Image Process ; 26(6): 2972-2987, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28422683

ABSTRACT

Dictionary learning has emerged as a promising alternative to the conventional hybrid coding framework. However, the rigid structure of sequential training and prediction degrades its performance in scalable video coding. This paper proposes a progressive dictionary learning framework with a hierarchical predictive structure for scalable video coding, especially in the low-bit-rate region. For pyramidal layers, sparse representation based on a spatio-temporal dictionary is adopted to improve the coding efficiency of enhancement layers with a guarantee of reconstruction performance. The overcomplete dictionary is trained to adaptively capture local structures along motion trajectories as well as exploit the correlations between neighboring layers of resolutions. Furthermore, progressive dictionary learning is developed to enable scalability in the temporal domain and restrict error propagation in a closed-loop predictor. Under the hierarchical predictive structure, online learning is leveraged to guarantee the training and prediction performance with an improved convergence rate. To accommodate the state-of-the-art scalable extension of H.264/AVC and the latest High Efficiency Video Coding (HEVC), standardized codec cores are utilized to encode the base and enhancement layers. Experimental results show that the proposed method outperforms the latest scalable extension of HEVC and HEVC simulcast over extensive test sequences with various resolutions.

12.
IEEE Trans Image Process ; 25(3): 1095-108, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26742132

ABSTRACT

This paper proposes a new method of calculating a matching metric for motion estimation. The proposed method splits the information in the source images into multiple scale and orientation subbands, reduces the subband values to a binary representation via an adaptive thresholding algorithm, and uses mutual information to model the similarity of corresponding square windows in each image. A moving window strategy is applied to recover a dense estimated motion field whose properties are explored. The proposed matching metric is a sum of mutual information scores across space, scale, and orientation. This facilitates the exploitation of information diversity in the source images. Experimental comparisons are performed amongst several related approaches, revealing that the proposed matching metric is better able to exploit information diversity, generating more accurate motion fields.
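A minimal sketch of a mutual-information matching score between binarized patches follows; the simple sign threshold stands in for the paper's adaptive thresholding, and the random patches are purely illustrative.

```python
import numpy as np

def mutual_information(a, b, bins=2):
    """MI (in bits) between two equal-size sample arrays via a joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

# Binarize subband-like values by sign (a crude stand-in for adaptive thresholding)
rng = np.random.default_rng(0)
patch = rng.normal(size=(8, 8))
same = np.sign(patch)                        # perfectly matching window
other = np.sign(rng.normal(size=(8, 8)))     # unrelated window
score_match = mutual_information(np.sign(patch), same)
score_mismatch = mutual_information(np.sign(patch), other)
```

A full matching metric in the spirit of the abstract would sum such MI scores over space, scale, and orientation subbands.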

13.
IEEE Trans Image Process ; 25(1): 39-52, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26540690

ABSTRACT

Existing video coders anchor motion fields at the frames that are to be predicted. In this paper, we demonstrate how changing the anchoring of motion fields to reference frames has some important advantages over conventional anchoring. We work with piecewise-smooth motion fields, and use breakpoints to signal discontinuities at moving object boundaries. We show how discontinuity information can be used to resolve the double mappings arising when motion is warped from reference to target frames. We present an analytical model that allows us to determine weights for texture, motion, and breakpoints to guide the rate allocation for scalable encoding. Compared with the conventional way of anchoring motion fields, the proposed scheme requires fewer bits for the coding of motion; furthermore, the reconstructed video frames contain fewer ghosting artefacts. The experimental results show superior performance compared with traditional anchoring, and demonstrate the high scalability attributes of the proposed method.

14.
IEEE Trans Image Process ; 23(9): 3802-15, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24968173

ABSTRACT

In this paper, we propose the use of "motion hints" to produce interframe predictions. A motion hint is a loose and global description of motion that can be communicated using metadata; it describes a continuous and invertible motion model over multiple frames, spatially overlapping other motion hints. A motion hint provides a reasonably accurate description of motion but only a loose description of where it is applicable; it is the task of the client to identify the exact locations where this motion model is applicable. The focus of this paper is a probabilistic multiscale approach to identifying these locations of applicability; the method is robust to noise, quantization, and contrast changes. The proposed approach employs the Laplacian pyramid; it generates motion hint probabilities from observations at each scale of the pyramid. These probabilities are then combined across the scales of the pyramid starting from the coarsest scale. The computational cost of the approach is reasonable, and only the neighborhood of a pixel is employed to determine a motion hint probability, which makes parallel implementation feasible. This paper also elaborates on how motion hint probabilities are exploited in generating interframe predictions. The scheme of this paper is applicable to closed-loop prediction, but it is more useful in open-loop prediction scenarios, such as using prediction in conjunction with remote browsing of surveillance footage, communicated by a JPEG2000 Interactive Protocol (JPIP) server. We show that the interframe predictions obtained using the proposed approach are good both visually and in terms of PSNR.
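The Laplacian pyramid that underlies the multiscale probabilities can be sketched as follows. The 1-2-1 blur kernel is one common choice (an assumption here, not necessarily the paper's); note that the pyramid is perfectly invertible by construction, regardless of the kernels used.

```python
import numpy as np

def _sep_filter(img, k):
    # separable filtering: convolve every column, then every row
    f = lambda r: np.convolve(r, k, mode='same')
    return np.apply_along_axis(f, 1, np.apply_along_axis(f, 0, img))

def downsample(img):
    # blur with a 1-2-1 kernel, then drop every other sample
    return _sep_filter(img, np.array([0.25, 0.5, 0.25]))[::2, ::2]

def upsample(img, shape):
    up = np.zeros(shape)
    up[::2, ::2] = img
    return _sep_filter(up, np.array([0.5, 1.0, 0.5]))  # interpolating kernel

def laplacian_pyramid(img, levels=3):
    pyr, cur = [], img
    for _ in range(levels):
        small = downsample(cur)
        pyr.append(cur - upsample(small, cur.shape))   # detail band
        cur = small
    pyr.append(cur)                                    # coarsest approximation
    return pyr

def reconstruct(pyr):
    cur = pyr[-1]
    for detail in reversed(pyr[:-1]):
        cur = detail + upsample(cur, detail.shape)
    return cur

rng = np.random.default_rng(0)
img = rng.normal(size=(16, 16))
pyr = laplacian_pyramid(img)
rec = reconstruct(pyr)
```

Coarse-to-fine combination of per-scale evidence, as in the abstract, would walk `pyr` from the last (coarsest) entry toward the first.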

15.
IEEE Trans Image Process ; 23(5): 2222-34, 2014 May.
Article in English | MEDLINE | ID: mdl-24686283

ABSTRACT

We present a noniterative multiresolution motion estimation strategy, involving block-based comparisons in each detail band of a Laplacian pyramid. A novel matching score is developed and analyzed. The proposed matching score is based on a class of nonlinear transformations of Laplacian detail bands, yielding 1-bit or 2-bit representations. The matching score is evaluated in a dense full-search motion estimation setting, with synthetic video frames and an optical flow data set. Together with a strategy for combining the matching scores across resolutions, the proposed method is shown to produce smoother and more robust estimates than mean square error (MSE), both in each detail band and combined. It tolerates more nontranslational motion, such as rotation, validating the analysis, while providing much better localization of motion discontinuities. We also provide an efficient implementation of the motion estimation strategy and show that the computational complexity of the approach is closely comparable to that of the traditional MSE block-based full-search motion estimation procedure.
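A 1-bit (sign-based) block matching score on a single detail band can be sketched as follows; the sign nonlinearity stands in for the paper's class of 1-bit/2-bit transformations, and the block size and search range are arbitrary choices.

```python
import numpy as np

def sign_match_score(block_a, block_b):
    # fraction of agreeing 1-bit (sign) samples; higher means a better match
    return np.mean(np.sign(block_a) == np.sign(block_b))

def full_search(ref, tgt, bx, by, bs=8, sr=4):
    """1-bit full-search block matching of the target block at (bx, by)
    against all candidate positions in the reference band."""
    best, best_mv = -1.0, (0, 0)
    block = tgt[by:by + bs, bx:bx + bs]
    for dy in range(-sr, sr + 1):
        for dx in range(-sr, sr + 1):
            y0, x0 = by + dy, bx + dx
            if y0 < 0 or x0 < 0 or y0 + bs > ref.shape[0] or x0 + bs > ref.shape[1]:
                continue
            s = sign_match_score(ref[y0:y0 + bs, x0:x0 + bs], block)
            if s > best:
                best, best_mv = s, (dx, dy)
    return best_mv, best

# Demo: a detail band shifted by (dx, dy) = (2, 1) is recovered as mv = (-2, -1)
# relative to the target block position.
rng = np.random.default_rng(3)
ref = rng.normal(size=(32, 32))
tgt = np.roll(ref, shift=(1, 2), axis=(0, 1))
mv, score = full_search(ref, tgt, 12, 12)
```

The multiresolution strategy of the abstract would run such searches per detail band and combine the scores across resolutions.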

16.
IEEE Trans Image Process ; 22(11): 4394-406, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24048014

ABSTRACT

This paper devises an augmented active surface model for the recovery of small structures in a low-resolution, high-noise setting, where the role of regularization is especially important. The emphasis here is on evaluating performance using real clinical computed tomography (CT) data, with comparisons made to an objective ground truth acquired using micro-CT. In this paper, we show that the application of conventional active contour methods to small objects leads to non-optimal results because of the inherent properties of the energy terms and their interactions with one another. We show that the blind use of a gradient-magnitude-based energy performs poorly at these object scales and that the point spread function (PSF) is a critical factor that needs to be accounted for. We propose a new model that augments the external energy with prior knowledge by incorporating the PSF and the assumption of reasonably constant underlying CT numbers.


Subject(s)
Algorithms , Imaging, Three-Dimensional/methods , Models, Biological , Pattern Recognition, Automated/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Tomography, X-Ray Computed/methods , Vestibule, Labyrinth/diagnostic imaging , Computer Simulation , Humans , Radiographic Image Enhancement/methods , Reproducibility of Results , Sensitivity and Specificity
17.
IEEE Trans Image Process ; 22(11): 4364-79, 2013 Nov.
Article in English | MEDLINE | ID: mdl-23864205

ABSTRACT

This paper investigates priority encoding transmission (PET) protection for streaming scalably compressed video streams over erasure channels, for scenarios where a small number of retransmissions is allowed. In principle, the optimal protection depends not only on the importance of each stream element, but also on the expected channel behavior. By formulating a collection of hypotheses concerning its own behavior in future transmissions, limited-retransmission PET (LR-PET) effectively constructs channel codes spanning multiple transmission slots and thus offers better protection efficiency than the original PET. As the number of transmission opportunities increases, the optimization for LR-PET becomes very challenging because the number of hypothetical retransmission paths increases exponentially. As a key contribution, this paper develops a method to derive the effective recovery-probability versus redundancy-rate characteristic for the LR-PET procedure with any number of transmission opportunities. This significantly accelerates the protection assignment procedure in the original LR-PET with only two transmissions, and also makes a quick and optimal protection assignment feasible for scenarios where more transmissions are possible. This paper also gives a concrete proof of the redundancy embedding property of the channel codes formed by LR-PET, which allows a decoupled optimization for sequentially dependent source elements with a convex utility-length characteristic. This essentially justifies the source-independent construction of the protection convex hull for LR-PET.
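For a single transmission slot, the recovery-probability versus redundancy-rate characteristic has a simple closed form for an (n, k) MDS/PET-style code over a memoryless packet-erasure channel; the multi-slot LR-PET characteristic in the paper generalizes this. A stdlib-only sketch:

```python
from math import comb

def recovery_probability(n, k, p_loss):
    """P(an (n, k) MDS/PET code recovers) = P(at least k of n packets arrive)
    over a memoryless erasure channel with per-packet loss probability p_loss."""
    p_ok = 1 - p_loss
    return sum(comb(n, i) * p_ok**i * p_loss**(n - i) for i in range(k, n + 1))

# Recovery probability versus redundancy (n - k extra packets) for fixed k:
k, p_loss = 10, 0.1
curve = [(n, recovery_probability(n, k, p_loss)) for n in range(k, k + 6)]
```

Each added redundancy packet raises the recovery probability, tracing out exactly the kind of recovery-versus-redundancy trade-off that protection assignment optimizes over.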


Subject(s)
Algorithms , Data Compression/methods , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Photography/methods , Signal Processing, Computer-Assisted , Video Recording/methods , Reproducibility of Results , Sensitivity and Specificity
18.
IEEE Trans Image Process ; 22(5): 1982-95, 2013 May.
Article in English | MEDLINE | ID: mdl-23335671

ABSTRACT

Recent work on depth map compression has revealed the importance of incorporating a description of discontinuity boundary geometry into the compression scheme. We propose a novel compression strategy for depth maps that incorporates geometry information while achieving the goals of scalability and embedded representation. Our scheme involves two separate image pyramid structures, one for breakpoints and the other for sub-band samples produced by a breakpoint-adaptive transform. Breakpoints capture geometric attributes, and are amenable to scalable coding. We develop a rate-distortion optimization framework for determining the presence and precision of breakpoints in the pyramid representation. We employ a variation of the EBCOT scheme to produce embedded bit-streams for both the breakpoint and sub-band data. Compared to JPEG 2000, our proposed scheme enables the same scalability features while achieving substantially improved rate-distortion performance in the higher bit-rate range and comparable performance at lower rates.

19.
IEEE Trans Image Process ; 20(9): 2650-63, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21411403

ABSTRACT

In a recent work, the authors proposed a novel paradigm for interactive video streaming and coined the term JPEG2000-Based Scalable Interactive Video (JSIV) for it. In this work, we investigate JSIV when motion compensation is employed to improve prediction, something that was intentionally left out in our earlier treatment. JSIV relies on three concepts: storing the video sequence as independent JPEG2000 frames to provide quality and spatial resolution scalability, prediction and conditional replenishment of code-blocks to exploit inter-frame redundancy, and loosely coupled server and client policies in which a server optimally selects the number of quality layers for each code-block transmitted and a client makes the most of the received (distorted) frames. In JSIV, the server transmission problem is optimally solved using Lagrangian-style rate-distortion optimization. The flexibility of JSIV enables us to employ a wide variety of frame prediction arrangements, including hierarchical B-frames. JSIV provides considerably better interactivity compared with existing schemes and can adapt immediately to interactive changes in client interests, such as forward or backward playback and zooming into individual frames. Experimental results show that JSIV's performance is inferior to that of SVC in conventional streaming applications while JSIV performs better in interactive browsing applications.

20.
IEEE Trans Image Process ; 20(5): 1234-48, 2011 May.
Article in English | MEDLINE | ID: mdl-21078580

ABSTRACT

The authors present a computationally efficient technique for maximum a posteriori (MAP) estimation of images in the presence of both blur and noise. The image is divided into statistically independent regions, each modelled with a wide-sense stationary (WSS) Gaussian prior. Classical Wiener filter theory is used to generate a set of convex sets in the solution space, with the solution to the MAP estimation problem lying at the intersection of these sets. The proposed algorithm uses an underlying segmentation of the image, and a means of determining and refining this segmentation is described. The algorithm is suitable for a range of image restoration problems, as it provides a computationally efficient means to deal with the shortcomings of Wiener filtering without sacrificing the computational simplicity of the filtering approach. The algorithm is also of interest from a theoretical viewpoint, as it provides a continuum of solutions between Wiener filtering and inverse filtering, depending upon the segmentation used. We do not attempt to show here that the proposed method is the best general approach to the image reconstruction problem. However, related work referenced herein shows excellent performance in the specific problem of demosaicing.
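The Wiener end of that continuum can be sketched as classical frequency-domain Wiener deconvolution, i.e., the single-region, globally WSS special case of the method above; the test signal, blur kernel, and SNR are illustrative assumptions.

```python
import numpy as np

def wiener_deconvolve(y, h, snr):
    """Frequency-domain Wiener deconvolution of a circularly blurred 1-D signal.
    As snr -> inf this approaches inverse filtering; as snr -> 0 it tends to zero."""
    H = np.fft.fft(h, n=len(y))
    G = np.conj(H) / (np.abs(H) ** 2 + 1.0 / snr)   # Wiener restoration filter
    return np.real(np.fft.ifft(np.fft.fft(y) * G))

# Demo: blur a boxcar signal circularly, add noise, then restore it.
rng = np.random.default_rng(0)
x = np.zeros(64)
x[20:40] = 1.0                                      # toy piecewise-constant signal
h = np.array([0.25, 0.5, 0.25])                     # blur kernel
y = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, 64)))
y += 0.01 * rng.normal(size=64)                     # observation noise
x_hat = wiener_deconvolve(y, h, snr=100.0)
```

The segmentation-driven method above effectively applies a filter of this kind per region, with region-specific statistics.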


Subject(s)
Image Enhancement/methods , Image Processing, Computer-Assisted/methods , Algorithms , Regression Analysis