1.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 868-886, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35025739

ABSTRACT

Point cloud (PC), a collection of discrete geometric samples of a 3D object's surface, is typically large, which entails expensive subsequent operations. Thus, PC sub-sampling is of practical importance. Previous model-based sub-sampling schemes are ad hoc in design and do not preserve the overall shape sufficiently well, while previous data-driven schemes are trained for specific pre-determined input PC sizes and sub-sampling rates and thus do not generalize well. Leveraging advances in graph sampling, we propose a fast PC sub-sampling algorithm of linear time complexity that chooses a 3D point subset while minimizing a global reconstruction error. Specifically, to articulate a sampling objective, we first assume a super-resolution (SR) method based on feature graph Laplacian regularization (FGLR) that reconstructs the original high-resolution PC, given the points chosen by a sampling matrix H. We prove that minimizing a worst-case SR reconstruction error is equivalent to maximizing the smallest eigenvalue λ_min of the matrix HᵀH + µL, where L is a symmetric, positive semi-definite matrix derived from a neighborhood graph connecting the 3D points. To arrive at a fast algorithm, instead of maximizing λ_min, we maximize a lower bound λ⁻_min(HᵀH + µL) via selection of H; this translates to a graph sampling problem for a signed graph G with self-loops specified by the graph Laplacian L. We tackle this general graph sampling problem in three steps. First, we approximate G with a balanced graph G_B specified by Laplacian L_B. Second, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we perform a similarity transform L_p = S L_B S⁻¹, so that all Gershgorin disc left-ends of L_p are aligned exactly at λ_min(L_B). Finally, we choose samples on G_B using a previous graph sampling algorithm to maximize λ⁻_min(HᵀH + µL_p) in linear time. Experimental results show that the 3D points chosen by our algorithm outperform competing schemes both numerically and visually in reconstruction quality.
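
As a rough illustration of the Gershgorin-based bound referred to above (and not the paper's GDPA pipeline, which first balances the graph and aligns disc left-ends via a similarity transform), the sketch below computes the plain Gershgorin lower bound on λ_min(HᵀH + µL) for a toy path-graph Laplacian and an arbitrary sampling set; the graph, µ, and the sampled indices are assumptions for illustration only.

```python
# Toy sketch: Gershgorin disc lower bound on lambda_min(H^T H + mu*L) for a given
# sampling set, compared with the exact smallest eigenvalue. L is a plain
# combinatorial Laplacian of a 6-node path graph; mu and the sampling set are
# arbitrary illustrative choices, not the paper's construction.
import numpy as np

def gershgorin_lower_bound(M):
    # For symmetric M, every eigenvalue lies in a disc centered at M_ii with
    # radius sum_{j != i} |M_ij|; the smallest disc left-end bounds lambda_min.
    radii = np.sum(np.abs(M), axis=1) - np.abs(np.diag(M))
    return float(np.min(np.diag(M) - radii))

n = 6
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)   # path graph
L = np.diag(A.sum(axis=1)) - A

sampled = [0, 3]                       # indices kept by the sampling matrix H
HtH = np.zeros((n, n))
HtH[sampled, sampled] = 1.0

mu = 0.5
M = HtH + mu * L
print("Gershgorin lower bound:", gershgorin_lower_bound(M))
print("true lambda_min       :", float(np.linalg.eigvalsh(M).min()))
```

Without the disc alignment step, this bound is typically loose; closing that gap is precisely what the GDPA similarity transform is for.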

2.
IEEE Trans Image Process ; 31: 4117-4132, 2022.
Article in English | MEDLINE | ID: mdl-35696478

ABSTRACT

Point cloud (PC) is a collection of discrete geometric samples of a physical object in 3D space. A PC video consists of temporal frames evenly spaced in time, each containing a static PC at one time instant. PCs in adjacent frames typically do not have point-to-point (P2P) correspondence, and thus exploiting temporal redundancy for PC restoration across frames is difficult. In this paper, we focus on the super-resolution (SR) problem for PC video: increasing the point density of PCs in video frames while preserving salient geometric features consistently across time. We accomplish this with two ideas. First, we establish partial P2P coupling between PCs of adjacent frames by interpolating interior points in a low-resolution PC patch in frame t and translating them to a corresponding patch in frame t+1, via a motion model computed by iterative closest point (ICP). Second, we promote piecewise smoothness of the 3D geometry in each patch using a feature graph Laplacian regularizer (FGLR) in an easily computable quadratic form. The two ideas translate to an unconstrained quadratic programming (QP) problem solved via a system of linear equations, where we ensure numerical stability by upper-bounding the condition number of the coefficient matrix. Finally, to improve the accuracy of the ICP motion model, we re-sample points in a super-resolved patch at time t to better match a low-resolution patch at time t+1 via bipartite graph matching after each SR iteration. Experimental results show temporally consistent super-resolved PC videos generated by our scheme, outperforming SR competitors that optimize on a per-frame basis, in two established PC metrics.
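
A minimal sketch of the FGLR-regularized quadratic step described above: observed points enter through a selection matrix H, and the remaining coordinates are obtained by solving (HᵀH + µL)x = Hᵀy with conjugate gradient. The 1-D "patch", graph, observed indices, and µ are illustrative assumptions rather than the paper's actual construction.

```python
# Toy FGLR-style quadratic step: solve (H^T H + mu*L) x = H^T y by conjugate
# gradient. A path-graph Laplacian stands in for a feature graph over patch points.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import cg

n = 8
W = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)   # toy adjacency
L = csr_matrix(np.diag(W.sum(axis=1)) - W)                      # graph Laplacian

known = np.array([0, 3, 7])            # indices with observed coordinates
y = np.zeros(n)
y[known] = [0.0, 3.0, 7.0]             # observed 1-D coordinate values

H = np.zeros((n, n))
H[known, known] = 1.0                  # H^T H as a diagonal selection matrix
A = csr_matrix(H) + 0.2 * L            # mu = 0.2 (arbitrary)

x, info = cg(A, H @ y)                 # solve (H^T H + mu*L) x = H^T y
print("converged" if info == 0 else "not converged", x.round(2))
```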

3.
IEEE Trans Image Process ; 31: 2739-2754, 2022.
Article in English | MEDLINE | ID: mdl-35324440

ABSTRACT

At present, and increasingly so in the future, much of the captured visual content will not be seen by humans. Instead, it will be used for automated machine vision analytics and may require occasional human viewing. Examples of such applications include traffic monitoring, visual surveillance, autonomous navigation, and industrial machine vision. To address such requirements, we develop an end-to-end learned image codec whose latent space is designed to support scalability from simpler to more complicated tasks. The simplest task is assigned to a subset of the latent space (the base layer), while more complicated tasks make use of additional subsets of the latent space, i.e., both the base and enhancement layer(s). For the experiments, we establish a 2-layer and a 3-layer model, each of which offers input reconstruction for human vision plus machine vision task(s), and compare them with relevant benchmarks. The experiments show that our scalable codecs offer 37%-80% bitrate savings on machine vision tasks compared to the best alternatives, while being comparable to state-of-the-art image codecs in terms of input reconstruction.


Subject(s)
Vision, Ocular; Humans
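
A schematic sketch of the latent-space scalability idea: a subset of the latent channels forms the base layer consumed by the machine-vision task, while input reconstruction uses base plus enhancement channels. The tensor size and the 64/128 channel split are assumptions for illustration, not the codec's actual layout.

```python
# Schematic latent split into base and enhancement layers (sizes are hypothetical).
import numpy as np

latent = np.random.randn(192, 16, 16)            # hypothetical latent tensor (C, H, W)

base = latent[:64]                               # base layer: enough for the simplest task
enh = latent[64:]                                # enhancement layer(s)

machine_task_input = base                        # machine task decodes the base only
human_view_input = np.concatenate([base, enh])   # reconstruction uses all layers
print(machine_task_input.shape, human_view_input.shape)
```
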
4.
Data Brief ; 41: 107892, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35198673

ABSTRACT

We present an update to the previously published dataset known as SFU-HW-Objects-v1. The new dataset, called SFU-HW-Tracks-v1, contains object annotations with unique object identities (IDs) for the High Efficiency Video Coding (HEVC) v1 Common Test Conditions (CTC) sequences. For each video frame, the ground-truth annotations include the object class ID, object ID, and bounding box location and dimensions. The dataset can be used to evaluate object tracking performance on uncompressed video sequences and to study the relationship between video compression and object tracking, which was not possible using SFU-HW-Objects-v1.
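
A hypothetical reader for per-frame annotation records of the kind described above (frame index, class ID, object ID, bounding box); the field order and file layout here are assumptions for illustration, not the documented SFU-HW-Tracks-v1 format.

```python
# Hypothetical parser for lines of the form "<frame> <class_id> <object_id> <x> <y> <w> <h>".
from dataclasses import dataclass

@dataclass
class Track:
    frame: int
    class_id: int
    object_id: int
    x: float
    y: float
    w: float
    h: float

def parse_line(line: str) -> Track:
    f, c, o, x, y, w, h = line.split()
    return Track(int(f), int(c), int(o), float(x), float(y), float(w), float(h))

print(parse_line("0 2 7 0.31 0.42 0.10 0.22"))
```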

5.
IEEE Trans Image Process ; 30: 3321-3334, 2021.
Article in English | MEDLINE | ID: mdl-33635788

ABSTRACT

We propose a neural network model to estimate the current frame from two reference frames, using affine transformation and adaptive spatially-varying filters. The estimated affine transformation allows for using shorter filters compared to existing approaches for deep frame prediction. The predicted frame is used as a reference for coding the current frame. Since the proposed model is available at both the encoder and decoder, there is no need to code or transmit motion information for the predicted frame. By making use of dilated convolutions and reduced filter length, our model is significantly smaller, yet more accurate, than any of the neural networks in prior works on this topic. Two versions of the proposed model, one for uni-directional and one for bi-directional prediction, are trained using a combination of a Discrete Cosine Transform (DCT)-based ℓ1 loss with various transform sizes, a multi-scale Mean Squared Error (MSE) loss, and an object context reconstruction loss. The trained models are integrated with the HEVC video coding pipeline. The experiments show that the proposed models achieve about 7.3%, 5.4%, and 4.2% bit savings for the luminance component on average in the Low delay P, Low delay, and Random access configurations, respectively.
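
As a sketch of the DCT-domain ℓ1 loss component (a single 8x8 transform size only; the trained models combine several transform sizes with multi-scale MSE and a further reconstruction loss), the residual can be transformed blockwise and its coefficients penalized with an absolute-value norm.

```python
# Sketch of a block-DCT l1 loss on the prediction residual (single transform size).
import numpy as np
from scipy.fft import dctn

def dct_l1_loss(pred, target, block=8):
    """Mean absolute value of block-DCT coefficients of (pred - target)."""
    diff = pred - target
    h, w = diff.shape
    total, count = 0.0, 0
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            coeffs = dctn(diff[i:i + block, j:j + block], norm="ortho")
            total += np.abs(coeffs).sum()
            count += coeffs.size
    return total / count

a = np.random.rand(64, 64)
b = a + 0.05 * np.random.randn(64, 64)
print(dct_l1_loss(a, b))
```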

6.
IEEE Trans Image Process ; 30: 3348-3361, 2021.
Article in English | MEDLINE | ID: mdl-33635790

ABSTRACT

In recent studies, collaborative intelligence (CI) has emerged as a promising framework for the deployment of Artificial Intelligence (AI)-based services on mobile/edge devices. In CI, the AI model (a deep neural network, DNN) is split between the edge and the cloud, and intermediate features are sent from the edge sub-model to the cloud sub-model. In this article, we study bit allocation for feature coding in multi-stream CI systems. We model task distortion as a function of rate using convex surfaces similar to those found in distortion-rate theory. Using such models, we are able to provide closed-form bit allocation solutions for single-task systems and scalarized multi-task systems. Moreover, we provide an analytical characterization of the full Pareto set for 2-stream k-task systems, and bounds on the Pareto set for 3-stream 2-task systems. The analytical results are examined on a variety of DNN models from the literature to demonstrate the wide applicability of the results.
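
For intuition on closed-form bit allocation over convex distortion-rate models, the sketch below uses the textbook Lagrangian solution under an assumed exponential model D_i(R_i) = a_i·2^(-2·R_i); the paper's fitted surfaces and multi-task formulations are more general, so this is illustrative only.

```python
# Closed-form Lagrangian bit allocation under an assumed exponential D-R model.
import numpy as np

def allocate_bits(a, R_total):
    """Minimize sum_i a_i*2**(-2*R_i) subject to sum_i R_i = R_total."""
    a = np.asarray(a, dtype=float)
    k = len(a)
    geo_mean = np.exp(np.mean(np.log(a)))
    # Equal-slope solution; negative rates would require clipping (water-filling).
    return R_total / k + 0.5 * np.log2(a / geo_mean)

a = [4.0, 1.0, 0.25]              # per-stream distortion scale factors (made up)
R = allocate_bits(a, R_total=6.0)
print(R, R.sum())                 # rates sum to the budget; larger a_i gets more bits
```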

7.
Data Brief ; 34: 106701, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33457477

ABSTRACT

We present an object-labelled dataset called SFU-HW-Objects-v1, which contains object labels for a set of raw video sequences. The dataset is useful for cases where both object detection accuracy and video coding efficiency need to be evaluated on the same data. Object ground-truths have been labelled for 18 of the High Efficiency Video Coding (HEVC) v1 Common Test Conditions (CTC) sequences. The object categories used for the labelling are based on the Common Objects in Context (COCO) labels. A total of 21 of the 80 original COCO object classes are found in the test sequences. Brief descriptions of the labelling process and the structure of the dataset are presented.

8.
Article in English | MEDLINE | ID: mdl-32012012

ABSTRACT

Point cloud is a collection of 3D coordinates that are discrete geometric samples of an object's 2D surfaces. Imperfection in the acquisition process means that point clouds are often corrupted with noise. Building on recent advances in graph signal processing, we design local algorithms for 3D point cloud denoising. Specifically, we design a signal-dependent feature graph Laplacian regularizer (SDFGLR) that assumes surface normals computed from point coordinates are piecewise smooth with respect to a signal-dependent graph Laplacian matrix. Using SDFGLR as a signal prior, we formulate an optimization problem with a general ℓp-norm fidelity term that can explicitly handle two types of additive noise: small but non-sparse noise such as Gaussian (using the ℓ2 fidelity term) and large but sparser noise such as Laplacian (using the ℓ1 fidelity term). To establish a linear relationship between normals and 3D point coordinates, we first perform bipartite graph approximation to divide the point cloud into two disjoint node sets (red and blue). We then optimize the red and blue nodes' coordinates alternately. For the ℓ2-norm fidelity term, we iteratively solve an unconstrained quadratic programming (QP) problem, efficiently computed using conjugate gradient with a bounded condition number to ensure numerical stability. For the ℓ1-norm fidelity term, we iteratively minimize an ℓ1-ℓ2 cost function using accelerated proximal gradient (APG), where a good step size is chosen via Lipschitz continuity analysis. Finally, we propose simple mean and median filters for flat patches of a given point cloud to estimate the noise variance given the noise type, which in turn is used to compute a weight parameter trading off the fidelity term and the signal prior in the problem formulation. Extensive experiments show state-of-the-art denoising performance among local methods using our proposed algorithms.
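
A miniature sketch of the ℓ1-fidelity variant: plain (unaccelerated) proximal gradient for min_x ||x - y||_1 + µ·xᵀLx on a 1-D signal, with the step size set from the Lipschitz constant of the smooth term. The paper applies accelerated proximal gradient to surface normals with a signal-dependent Laplacian, so everything here is simplified for illustration.

```python
# Plain proximal gradient for an l1 fidelity term plus a graph Laplacian prior.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def denoise_l1_glr(y, L, mu=1.0, iters=200):
    lip = 2.0 * mu * np.linalg.eigvalsh(L).max()   # Lipschitz constant of the smooth term's gradient
    t = 1.0 / lip
    x = y.copy()
    for _ in range(iters):
        v = x - t * (2.0 * mu * L @ x)             # gradient step on mu * x^T L x
        x = y + soft_threshold(v - y, t)           # prox of t * ||x - y||_1
    return x

n = 10
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
L = np.diag(A.sum(axis=1)) - A
clean = np.linspace(0, 1, n)
y = clean.copy()
y[4] += 2.0                                        # one large, sparse outlier
print(denoise_l1_glr(y, L, mu=2.0).round(2))
```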

9.
Data Brief ; 27: 104752, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31886334

ABSTRACT

We present two new fisheye image datasets for training object and face detection models: VOC-360 and Wider-360. The fisheye images are created by post-processing regular images collected from two well-known datasets, VOC2012 and Wider Face, using a Matlab implementation of a model that maps regular images to fisheye images. VOC-360 contains 39,575 fisheye images for object detection, segmentation, and classification. Wider-360 contains 63,897 fisheye images for face detection. These datasets will be useful for developing face and object detectors, as well as segmentation modules, for fisheye images while efforts to collect and manually annotate true fisheye images are underway.
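
A sketch of the kind of regular-to-fisheye warping used to synthesize such images, assuming an equidistant projection model (r = f·θ) and nearest-neighbor resampling; the actual mapping behind VOC-360 and Wider-360 may differ, so the projection model and focal length here are assumptions.

```python
# Warp a regular (perspective) image into a fisheye-like image (equidistant model).
import numpy as np

def to_fisheye(img, f=200.0):
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    r_fish = np.hypot(dx, dy)
    theta = r_fish / f                             # equidistant: r_fish = f * theta
    with np.errstate(invalid="ignore", divide="ignore"):
        scale = np.where(r_fish > 0, f * np.tan(theta) / r_fish, 1.0)
    src_x = np.clip(np.round(cx + dx * scale), 0, w - 1).astype(int)
    src_y = np.clip(np.round(cy + dy * scale), 0, h - 1).astype(int)
    out = img[src_y, src_x]
    out[theta > np.pi / 2] = 0                     # outside the fisheye field of view
    return out

print(to_fisheye(np.random.rand(128, 128)).shape)
```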

10.
Article in English | MEDLINE | ID: mdl-30507532

ABSTRACT

Just noticeable difference (JND) models are widely used for perceptual redundancy estimation in images and videos. A common method for measuring the accuracy of a JND model is to inject random noise into an image based on the JND model and check whether the JND-noise-contaminated image is perceptually distinguishable from the original image. Also, when comparing the accuracy of two different JND models, the model that produces the JND-noise-contaminated image with better quality at the same level of noise energy is the better model. In both of these cases, however, a subjective test is necessary, which is very time consuming and costly. In this paper, we present a full-reference metric called PDP (perceptual distinguishability predictor), which can be used to determine whether a given JND-noise-contaminated image is perceptually distinguishable from the reference image. The proposed metric employs the concept of sparse coding and extracts a feature vector out of a given image pair. The feature vector is then fed to a multilayer neural network for classification. To train the network, we built a public database of 999 natural images with distinguishability thresholds for four different JND models, obtained from an extensive subjective experiment. The results indicate that PDP achieves a high classification accuracy of 97.1%. The proposed method can be used to objectively compare various JND models without performing any subjective test. It can also be used to obtain proper scaling factors to improve the JND thresholds estimated by an arbitrary JND model.
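
A sketch of the JND noise-injection test described at the start of the abstract: add random-sign noise whose per-pixel magnitude follows a JND map, so the noise energy is fixed by the map, and then judge whether the result is distinguishable from the original. The constant JND map and the clipping range here are placeholders; a real map would come from a JND model.

```python
# Inject random-sign noise whose magnitude is given by a per-pixel JND map.
import numpy as np

def inject_jnd_noise(img, jnd_map, seed=None):
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=img.shape)
    return np.clip(img + signs * jnd_map, 0.0, 255.0)

img = np.full((64, 64), 128.0)
jnd = np.full_like(img, 3.0)                 # placeholder: 3 gray levels everywhere
noisy = inject_jnd_noise(img, jnd, seed=0)
print("noise energy per pixel:", ((noisy - img) ** 2).mean())
```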

11.
Sci Data ; 3: 160037, 2016 Jun 07.
Article in English | MEDLINE | ID: mdl-27271937

ABSTRACT

With the cost of consuming resources increasing (both economically and ecologically), homeowners need to find ways to curb consumption. The Almanac of Minutely Power dataset Version 2 (AMPds2) has been released to help computational sustainability researchers, power and energy engineers, building scientists and technologists, utility companies, and eco-feedback researchers test their models, systems, algorithms, or prototypes on real house data. In the vast majority of cases, real-world datasets lead to more accurate models and algorithms. AMPds2 is the first dataset to capture all three main types of consumption (electricity, water, and natural gas) over a long period of time (2 years) and to provide 11 measurement characteristics for electricity. No other such dataset from Canada exists. Each meter has 730 days of captured data. We also include environmental and utility billing data for cost analysis. The AMPds2 data have been pre-cleaned to provide consistent and comparable accuracy results across different researchers and machine learning algorithms.

12.
IEEE Trans Image Process ; 23(1): 19-33, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24107933

ABSTRACT

In region-of-interest (ROI)-based video coding, ROI parts of the frame are encoded with higher quality than non-ROI parts. At low bit rates, such encoding may produce attention-grabbing coding artifacts, which may draw the viewer's attention away from the ROI, thereby degrading visual quality. In this paper, we present a saliency-aware video compression method for ROI-based video coding. The proposed method aims at reducing salient coding artifacts in non-ROI parts of the frame in order to keep the user's attention on the ROI. Further, the method allows saliency to increase in high-quality parts of the frame and to decrease in non-ROI parts. Experimental results indicate that the proposed method improves the visual quality of encoded video relative to conventional rate-distortion-optimized video coding, as well as two state-of-the-art perceptual video coding methods.


Subject(s)
Algorithms; Artifacts; Data Compression/methods; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Signal Processing, Computer-Assisted; Video Recording/methods; Photography/methods; Reproducibility of Results; Sensitivity and Specificity
13.
IEEE Trans Image Process ; 22(12): 4825-40, 2013 Dec.
Article in English | MEDLINE | ID: mdl-23955763

ABSTRACT

A new method for video watermarking is presented in this paper. In the proposed method, data are embedded in the LL subband of wavelet coefficients, and decoding is performed based on the comparison among the elements of the first principal component resulting from empirical principal component analysis (PCA). The locations for data embedding are selected such that they offer the most robust PCA-based decoding. Data are inserted in the LL subband in an adaptive manner based on the energy of high-frequency subbands and visual saliency. Extensive testing was performed under various types of attacks, such as spatial attacks (uniform and Gaussian noise and median filtering), compression attacks (MPEG-2, H.263, and H.264), and temporal attacks (frame repetition, frame averaging, frame swapping, and frame rate conversion). The results show that the proposed method offers improved performance compared with several methods from the literature, especially under additive noise and compression attacks.
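
A toy sketch of the "embed in the LL subband, reconstruct with the inverse DWT" part, using PyWavelets; the paper's actual scheme selects embedding locations adaptively from high-frequency energy and visual saliency and decodes via PCA, none of which is shown here, and the random location choice and embedding strength are assumptions.

```python
# Additive embedding of bits in the LL subband of a one-level DWT (requires PyWavelets).
import numpy as np
import pywt

def embed_ll(img, bits, strength=4.0, wavelet="haar", seed=0):
    LL, (LH, HL, HH) = pywt.dwt2(img, wavelet)
    rng = np.random.default_rng(seed)
    locs = rng.choice(LL.size, size=len(bits), replace=False)   # embedding locations
    flat = LL.ravel().copy()
    flat[locs] += strength * (2 * np.asarray(bits) - 1)          # +/- strength per bit
    return pywt.idwt2((flat.reshape(LL.shape), (LH, HL, HH)), wavelet)

frame = np.random.rand(64, 64) * 255
marked = embed_ll(frame, bits=[1, 0, 1, 1])
print(np.abs(marked - frame).max())
```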

14.
IEEE Trans Image Process ; 22(1): 300-13, 2013 Jan.
Article in English | MEDLINE | ID: mdl-22910117

ABSTRACT

Despite the recent progress in both pixel-domain and compressed-domain video object tracking, the need for a tracking framework with both reasonable accuracy and reasonable complexity still exists. This paper presents a method for tracking moving objects in H.264/AVC-compressed video sequences using a spatio-temporal Markov random field (ST-MRF) model. An ST-MRF model naturally integrates the spatial and temporal aspects of the object's motion. Built upon such a model, the proposed method works in the compressed domain and uses only the motion vectors (MVs) and block coding modes from the compressed bitstream to perform tracking. First, the MVs are preprocessed through intracoded block motion approximation and global motion compensation. At each frame, the decision of whether a particular block belongs to the object being tracked is made with the help of the ST-MRF model, which is updated from frame to frame in order to follow the changes in the object's motion. The proposed method is tested on a number of standard sequences, and the results demonstrate its advantages over some of the recent state-of-the-art methods.

15.
IEEE Trans Image Process ; 21(2): 898-903, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21859619

ABSTRACT

This correspondence describes a publicly available database of eye-tracking data, collected on a set of standard video sequences that are frequently used in video compression, processing, and transmission simulations. A unique feature of this database is that it contains eye-tracking data for both the first and second viewings of the sequence. We have made available the uncompressed video sequences and the raw eye-tracking data for each sequence, along with different visualizations of the data and a preliminary analysis based on two well-known visual attention models.


Subject(s)
Algorithms; Databases, Factual; Eye Movements/physiology; Video Recording; Adult; Female; Humans; Image Processing, Computer-Assisted; Internet; Male
16.
IEEE Trans Image Process ; 20(11): 3195-206, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21435974

ABSTRACT

In video transmission over packet-based networks, packet losses often occur in bursts. In this paper, we present a novel packetization method for increasing the robustness of compressed video against bursty packet losses. The proposed method is based on creating a coding order of macroblocks (MBs) so that the blocks that are close to each other in the coding order end up being far from each other in the frame. We formulate this idea as a discrete optimization problem, prove its NP-hardness, and discuss several possible solution methods. Experimental results indicate that the proposed method improves the quality of reconstructed frames under burst loss by several decibels compared to conventional flexible MB ordering techniques, and about 0.7 dB compared to the state-of-the-art method called explicit chessboard wipe.
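
A toy greedy heuristic that illustrates the stated objective (macroblocks adjacent in coding order should be spatially far apart in the frame); the paper formulates this as an NP-hard discrete optimization and uses its own solution methods, so the window size and greedy rule here are assumptions, not the proposed packetization method.

```python
# Greedy "dispersive" macroblock ordering: pick the block farthest from the
# most recently coded blocks.
def dispersive_order(rows, cols, window=3):
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    order = [coords.pop(0)]
    while coords:
        recent = order[-window:]                 # neighbors in coding order
        def min_dist(p):
            return min((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 for q in recent)
        nxt = max(coords, key=min_dist)          # spatially farthest from recent blocks
        coords.remove(nxt)
        order.append(nxt)
    return order

print(dispersive_order(4, 4)[:8])
```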

17.
IEEE Trans Image Process ; 19(10): 2693-704, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20442048

ABSTRACT

In this paper, we present joint decoding of JPEG2000 bitstreams and Reed-Solomon codes in the context of unequal loss protection. Using the error resilience features of JPEG2000 bitstreams, the joint decoder helps restore the erased symbols when the Reed-Solomon decoder fails to retrieve them on its own. However, the joint decoding process can become time-consuming due to the search through the set of possible erased symbols. We propose the use of smaller codeblocks and the transmission of a relatively small amount of side information with high reliability as two approaches to accelerate the joint decoding process. The accelerated joint decoder can deliver essentially the same quality enhancement as the non-accelerated one, while operating several times faster.

18.
Nucleic Acids Res ; 34(Web Server issue): W560-5, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16845070

ABSTRACT

Nucleosomes, a basic structural unit of eukaryotic chromatin, play a significant role in regulating gene expression. We have developed a web tool based on DNA sequences known from empirical and theoretical studies to influence DNA bending and flexibility, and to exclude nucleosomes. NXSensor (available at http://www.sfu.ca/~ibajic/NXSensor/) finds nucleosome exclusion sequences, evaluates their length and spacing, and computes an 'accessibility score' giving the proportion of base pairs likely to be nucleosome-free. Application of NXSensor to the promoter regions of housekeeping (HK) genes and those of tissue-specific (TS) genes revealed a significant difference between the two classes of gene, the former being significantly more open, on average, particularly near transcription start sites (TSSs). NXSensor should be a useful tool in assessing the likelihood of nucleosome formation in regions involved in gene regulation and other aspects of chromatin function.


Subject(s)
DNA-Binding Proteins/metabolism; Nucleosomes/chemistry; Promoter Regions, Genetic; Sequence Analysis, DNA/methods; Software; Transcription Factors/metabolism; Binding Sites; DNA/chemistry; Internet; Nucleic Acid Conformation; Nucleosomes/metabolism; User-Computer Interface
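
In the spirit of the tool, the sketch below flags long A/T runs as putative nucleosome-exclusion sequences and reports the fraction of bases they cover as a crude "accessibility score"; the real NXSensor uses a curated set of exclusion elements and its own length and spacing rules, so the motif definition and run-length threshold here are assumptions.

```python
# Crude accessibility score: fraction of bases covered by long A/T runs.
import re

def accessibility_score(seq, min_run=12):
    seq = seq.upper()
    covered = sum(m.end() - m.start() for m in re.finditer(r"[AT]{%d,}" % min_run, seq))
    return covered / len(seq)

promoter = "GC" * 50 + "A" * 20 + "GC" * 30 + "ATATATATATATAT" + "GC" * 20
print(round(accessibility_score(promoter), 3))
```
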
19.
IEEE Trans Image Process ; 15(5): 1226-35, 2006 May.
Article in English | MEDLINE | ID: mdl-16671303

ABSTRACT

In this paper, we present an adaptive maximum a posteriori (MAP) error concealment algorithm for dispersively packetized wavelet-coded images. We model the subbands of a wavelet-coded image as Markov random fields, and use the edge characteristics in a particular subband, and regularity properties of subband/wavelet samples across scales, to adapt the potential functions locally. The resulting adaptive MAP estimation gives PSNR advantages of up to 0.7 dB compared to the competing algorithms. The advantage is most evident near the edges, which helps improve the visual quality of the reconstructed images.


Subject(s)
Algorithms; Artifacts; Computer Communication Networks; Data Compression/methods; Image Enhancement/methods; Signal Processing, Computer-Assisted; Computer Graphics; Computer Simulation; Data Interpretation, Statistical; Likelihood Functions; Models, Statistical; Numerical Analysis, Computer-Assisted
20.
IEEE Trans Image Process ; 12(10): 1211-25, 2003.
Article in English | MEDLINE | ID: mdl-18237888

ABSTRACT

In this paper, we present a method of creating domain-based multiple descriptions of images and video. These descriptions are created by partitioning the transform domain of the signal into sets whose points are maximally separated from each other. This property enables simple error concealment methods to produce good estimates of lost signal samples. We present the approach in the context of Internet transmission of subband/wavelet-coded images and scalable motion compensated three-dimensional (3D) subband/wavelet-coded video, but applications are not limited to these scenarios. The results indicate that the proposed methods offer improvements over similar competing methods by up to 1 dB for images, and several decibels for video. Visual quality is also improved.
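
A toy sketch of the domain-based partition idea: split transform coefficients into two maximally separated (checkerboard) sets, send them as separate descriptions, and conceal a lost description by averaging the surviving neighbors. The paper's partitions and concealment methods are more general; this only illustrates the principle.

```python
# Checkerboard split of transform coefficients into two descriptions, with simple
# neighbor-averaging concealment when one description is lost.
import numpy as np

def make_descriptions(coeffs):
    mask = (np.indices(coeffs.shape).sum(axis=0) % 2 == 0)      # checkerboard pattern
    return np.where(mask, coeffs, np.nan), np.where(~mask, coeffs, np.nan)

def conceal(received):
    out = received.copy()
    padded = np.pad(received, 1, constant_values=np.nan)
    for i, j in zip(*np.where(np.isnan(received))):
        neigh = [padded[i, j + 1], padded[i + 2, j + 1], padded[i + 1, j], padded[i + 1, j + 2]]
        out[i, j] = np.nanmean(neigh)                            # average available neighbors
    return out

coeffs = np.arange(16, dtype=float).reshape(4, 4)
d0, d1 = make_descriptions(coeffs)
print(conceal(d0).round(1))                                      # description d1 lost
```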
