Results 1 - 9 of 9
1.
Int J Comput Vis ; 131(1): 259-283, 2023.
Article in English | MEDLINE | ID: mdl-36624862

ABSTRACT

The understanding of human-object interactions is fundamental in First Person Vision (FPV). Visual tracking algorithms that follow the objects manipulated by the camera wearer can provide useful information to effectively model such interactions. In recent years, the computer vision community has significantly improved the performance of tracking algorithms for a large variety of target objects and scenarios. Despite a few previous attempts to exploit trackers in the FPV domain, a methodical analysis of the performance of state-of-the-art trackers is still missing. This research gap raises the question of whether current solutions can be used "off-the-shelf" or whether more domain-specific investigations should be carried out. This paper aims to answer that question. We present the first systematic investigation of single object tracking in FPV. Our study extensively analyses the performance of 42 algorithms, including generic object trackers and baseline FPV-specific trackers. The analysis focuses on different aspects of the FPV setting, introduces new performance measures, and relates tracking performance to FPV-specific tasks. The study is made possible through the introduction of TREK-150, a novel benchmark dataset composed of 150 densely annotated video sequences. Our results show that object tracking in FPV poses new challenges to current visual trackers. We highlight the factors causing this behavior and point out possible research directions. Despite these difficulties, we show that trackers bring benefits to FPV downstream tasks requiring short-term object tracking. We expect generic object tracking to gain popularity in FPV as new, FPV-specific methodologies are investigated. Supplementary Information: The online version contains supplementary material available at 10.1007/s11263-022-01694-6.
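A minimal sketch of the standard success measure used in this kind of tracker evaluation: per-frame intersection-over-union (IoU) between predicted and ground-truth boxes, thresholded to obtain a success rate. The (x, y, w, h) box format and the 0.5 threshold are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two boxes given as (x, y, w, h)."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ix = max(0.0, min(ax1 + aw, bx1 + bw) - max(ax1, bx1))
    iy = max(0.0, min(ay1 + ah, by1 + bh) - max(ay1, by1))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def success_rate(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of frames whose overlap exceeds the threshold."""
    overlaps = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    return float((overlaps > threshold).mean())
```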

3.
IEEE Trans Pattern Anal Mach Intell ; 43(11): 4125-4141, 2021 11.
Article in English | MEDLINE | ID: mdl-32365017

ABSTRACT

Since its introduction in 2018, EPIC-KITCHENS has attracted attention as the largest egocentric video benchmark, offering a unique viewpoint on people's interaction with objects, their attention, and even intention. In this paper, we detail how this large-scale dataset was captured by 32 participants in their native kitchen environments and densely annotated with actions and object interactions. Our videos depict non-scripted daily activities, as recording started every time a participant entered their kitchen. Recording took place in four countries, with participants of ten different nationalities, resulting in highly diverse kitchen habits and cooking styles. Our dataset features 55 hours of video consisting of 11.5M frames, which we densely labelled for a total of 39.6K action segments and 454.2K object bounding boxes. Our annotation is unique in that the participants narrated their own videos (after recording), thus reflecting true intention, and we crowd-sourced ground truths based on these narrations. We describe our object, action, and anticipation challenges, and evaluate several baselines over two test splits: seen and unseen kitchens. We introduce new baselines that highlight the multimodal nature of the dataset and the importance of explicit temporal modelling to discriminate fine-grained actions (e.g., 'closing' a tap from 'opening' it).


Subject(s)
Algorithms , Cooking , Attention , Humans
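A hedged sketch of how densely annotated action segments of this kind can be loaded and summarized. The CSV column names below (start_frame, stop_frame, verb, noun) and the file name are illustrative assumptions, not the dataset's actual schema.

```python
import csv
from collections import Counter

def load_segments(path):
    """Yield action segments from a CSV of temporal annotations."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield {
                "start": int(row["start_frame"]),
                "stop": int(row["stop_frame"]),
                "action": (row["verb"], row["noun"]),
            }

segments = list(load_segments("action_segments.csv"))
durations = [s["stop"] - s["start"] for s in segments]
print("segments:", len(segments))
print("most common actions:",
      Counter(s["action"] for s in segments).most_common(5))
```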
4.
IEEE Trans Pattern Anal Mach Intell ; 43(11): 4021-4036, 2021 11.
Article in English | MEDLINE | ID: mdl-32386143

ABSTRACT

In this paper, we tackle the problem of egocentric action anticipation, i.e., predicting which actions the camera wearer will perform in the near future and which objects they will interact with. Specifically, we contribute Rolling-Unrolling LSTM, a learning architecture that anticipates actions from egocentric videos. The method is based on three components: 1) an architecture comprising two LSTMs that model the sub-tasks of summarizing the past and inferring the future, 2) a Sequence Completion Pre-Training technique that encourages the LSTMs to focus on their respective sub-tasks, and 3) a Modality ATTention (MATT) mechanism that efficiently fuses multi-modal predictions obtained by processing RGB frames, optical flow fields, and object-based features. The proposed approach is validated on EPIC-Kitchens, EGTEA Gaze+, and ActivityNet. The experiments show that the proposed architecture is state-of-the-art in the domain of egocentric videos, achieving top performance in the 2019 EPIC-Kitchens egocentric action anticipation challenge. The approach also achieves competitive performance on ActivityNet with respect to methods not based on unsupervised pre-training, and it generalizes to the tasks of early action recognition and action recognition. To encourage research on this challenging topic, we have made our code, trained models, and pre-extracted features available at our web page: http://iplab.dmi.unict.it/rulstm.


Subject(s)
Algorithms , Attention , Humans , Learning
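A minimal sketch of a modality attention fusion of the kind described above: per-modality class scores are fused with weights predicted from the concatenated modality features. The three modalities (RGB, flow, object-based features) follow the abstract; the layer sizes and overall wiring are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ModalityAttentionFusion(nn.Module):
    def __init__(self, feat_dim, num_classes, num_modalities=3):
        super().__init__()
        # One classifier per modality (e.g., RGB, optical flow, objects).
        self.classifiers = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in range(num_modalities)]
        )
        # Attention network: modality weights from concatenated features.
        self.attention = nn.Linear(feat_dim * num_modalities, num_modalities)

    def forward(self, feats):  # feats: list of (batch, feat_dim) tensors
        # Per-modality predictions: (batch, num_modalities, num_classes).
        scores = torch.stack(
            [clf(f) for clf, f in zip(self.classifiers, feats)], dim=1
        )
        # Softmax over modalities: (batch, num_modalities).
        weights = torch.softmax(self.attention(torch.cat(feats, dim=1)), dim=1)
        # Weighted sum of per-modality predictions: (batch, num_classes).
        return (weights.unsqueeze(-1) * scores).sum(dim=1)
```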
5.
IEEE Trans Image Process ; 26(2): 696-710, 2017 Feb.
Article in English | MEDLINE | ID: mdl-27849539

ABSTRACT

Perspective cameras are the most popular imaging sensors used in computer vision. However, many application fields, including automotive, surveillance, and robotics, require the use of wide-angle cameras (e.g., fisheye), which make it possible to acquire a larger portion of the scene with a single device, at the cost of introducing noticeable radial distortion into the images. Affine covariant feature detectors have proved successful in a variety of computer vision applications, including object recognition, image registration, and visual search. Moreover, their robustness to a series of variabilities related to both the scene and the image acquisition process has been thoroughly studied in the literature. In this paper, we investigate their effectiveness on fisheye images, providing both theoretical and experimental analyses. As a theoretical outcome, we show that the inherently non-linear radial distortion can be locally approximated by linear functions with a reasonably small error. The experimental analysis builds on Mikolajczyk's benchmark to assess the robustness of three popular affine region detectors (maximally stable extremal regions, and the Harris and Hessian affine region detectors) with respect to different variabilities as well as to radial distortion. To support the evaluations, we rely on the Oxford dataset and introduce a novel benchmark dataset comprising 50 images depicting different scene categories. Experiments are carried out both on rectilinear images to which radial distortion is artificially added and on real-world images acquired with fisheye lenses. Our analysis points out that affine region detectors can be effectively employed directly on fisheye images and that radial distortion can be locally modeled as an additional affine variability.
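A brief sketch of running one of the affine region detectors studied above (MSER) directly on a fisheye image with OpenCV, which the paper's conclusion suggests is viable. The file name is a placeholder.

```python
import cv2

img = cv2.imread("fisheye_image.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Maximally stable extremal regions, one of the three detectors evaluated.
mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(gray)
print(f"detected {len(regions)} maximally stable extremal regions")
```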

6.
Comput Biol Med ; 77: 23-39, 2016 10 01.
Article in English | MEDLINE | ID: mdl-27498058

ABSTRACT

Automatic food understanding from images is an interesting challenge with applications in different domains. In particular, food intake monitoring is becoming more and more important because of the key role it plays in health and market economies. In this paper, we address the study of food image processing from the perspective of Computer Vision. As a first contribution, we present a survey of studies in the context of food image processing, from the early attempts to the current state-of-the-art methods. Since retrieval and classification engines able to work on food images are required to build automatic systems for diet monitoring (e.g., to be embedded in wearable cameras), we focus our attention on the representation of food images, because it plays a fundamental role in the understanding engines. Food retrieval and classification is a challenging task because food exhibits high variability and intrinsic deformability. To properly study the peculiarities of different image representations, we propose the UNICT-FD1200 dataset. It is composed of 4754 food images of 1200 distinct dishes acquired during real meals. Each food plate is acquired multiple times, and the overall dataset presents both geometric and photometric variability. The images of the dataset have been manually labeled into 8 categories: Appetizer, Main Course, Second Course, Single Course, Side Dish, Dessert, Breakfast, and Fruit. We have performed tests employing different state-of-the-art representations to assess their performance on the UNICT-FD1200 dataset. Finally, we propose a new representation based on the perceptual concept of Anti-Textons, which is able to encode spatial information between Textons and outperforms other representations in the context of food retrieval and classification.


Subject(s)
Algorithms , Food/classification , Image Processing, Computer-Assisted/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Cell Phone , Diet/classification , Humans , Mobile Applications , Photography
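A hedged sketch of the texton pipeline that representations like the one proposed above build on: per-pixel filter-bank responses are clustered into a texton vocabulary, and each image is described by its texton histogram. The filter bank and vocabulary size are illustrative assumptions; the paper's Anti-Textons additionally encode spatial relations between textons, which this sketch does not reproduce.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.cluster import KMeans

def filter_responses(gray):
    """Per-pixel responses to a small Gaussian-derivative filter bank."""
    responses = []
    for sigma in (1.0, 2.0, 4.0):
        responses.append(gaussian_filter(gray, sigma))                # smooth
        responses.append(gaussian_filter(gray, sigma, order=(0, 1)))  # d/dx
        responses.append(gaussian_filter(gray, sigma, order=(1, 0)))  # d/dy
    return np.stack(responses, axis=-1).reshape(-1, len(responses))

def texton_histogram(gray, kmeans):
    """Normalized histogram of texton assignments for one image."""
    labels = kmeans.predict(filter_responses(gray))
    hist = np.bincount(labels, minlength=kmeans.n_clusters).astype(float)
    return hist / hist.sum()

# Vocabulary learned once on training images; 64 is an arbitrary choice:
# train_pixels = np.vstack([filter_responses(g) for g in training_images])
# kmeans = KMeans(n_clusters=64).fit(train_pixels)
```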
7.
IEEE Trans Image Process ; 23(5): 2081-95, 2014 May.
Article in English | MEDLINE | ID: mdl-24723572

ABSTRACT

Content-aware image resizing techniques take the visual content of images into account during the resizing process. The basic idea behind these algorithms is the removal of vertical and/or horizontal paths of pixels (i.e., seams) containing little salient information. In this paper, we present a method that exploits the gradient vector flow (GVF) of the image to establish the paths to be considered during resizing. The relevance of each GVF path is derived directly from an energy map related to the magnitude of the GVF associated with the image to be resized. To give the visual content of the images more weight during content-aware resizing, we also propose to select the generated GVF paths based on their visual saliency properties. In this way, visually important image regions are better preserved in the final resized image. The proposed technique has been tested, both qualitatively and quantitatively, on a representative dataset of 1000 images labeled with their corresponding salient objects (i.e., ground-truth maps). Experimental results demonstrate that our method preserves crucial salient regions better than other state-of-the-art algorithms.
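A minimal sketch of the dynamic-programming seam search that content-aware resizing methods of this kind build on. For simplicity, the energy map here is assumed to be a plain per-pixel energy (e.g., gradient magnitude); the paper instead derives its energy from the magnitude of the GVF and from visual saliency.

```python
import numpy as np

def minimal_vertical_seam(energy):
    """Return one column index per row tracing the lowest-energy seam."""
    h, w = energy.shape
    cost = energy.astype(float)
    # Forward pass: each pixel accumulates the cheapest of its three
    # upper neighbors (up-left, up, up-right).
    for i in range(1, h):
        left = np.r_[np.inf, cost[i - 1, :-1]]
        right = np.r_[cost[i - 1, 1:], np.inf]
        cost[i] += np.minimum(np.minimum(left, cost[i - 1]), right)
    # Backward pass: trace the seam from the cheapest bottom pixel.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(0, j - 1), min(w, j + 2)
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
    return seam
```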

8.
Sensors (Basel) ; 13(2): 2515-29, 2013 Feb 18.
Article in English | MEDLINE | ID: mdl-23429514

ABSTRACT

This paper describes both the hardware and software components of a system that detects counterfeit Euro banknotes. The proposed system is also able to recognize banknote values. Unlike other state-of-the-art methods, the proposed approach uses banknote images acquired with a near-infrared camera to perform recognition and authentication. This makes it possible to build a system that can effectively deal with real forgeries, which are usually not detectable under visible light. The hardware does not use any mechanical parts, so the overall system is low-cost. The proposed solution is robust to ambient light and banknote positioning: users simply lay the banknote to be analyzed on a flat glass surface, and the system detects forgery as well as recognizes the banknote value. The effectiveness of the proposed solution has been tested on a dataset composed of genuine and fake Euro banknotes provided by Italy's central bank.
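A hedged sketch of the value-recognition half of such a system: match ORB features of the acquired near-infrared image against reference images of each denomination and pick the best-matching one. The matching criterion, distance threshold, and reference layout are illustrative assumptions, not the paper's method.

```python
import cv2

orb = cv2.ORB_create()
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def recognize_value(query_img, references):
    """references: dict mapping denomination -> reference NIR image."""
    _, q_desc = orb.detectAndCompute(query_img, None)
    best_value, best_score = None, -1
    for value, ref_img in references.items():
        _, r_desc = orb.detectAndCompute(ref_img, None)
        if q_desc is None or r_desc is None:
            continue
        matches = matcher.match(q_desc, r_desc)
        # Count sufficiently close matches as a crude similarity score.
        score = sum(1 for m in matches if m.distance < 40)
        if score > best_score:
            best_value, best_score = value, score
    return best_value
```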

9.
Article in English | MEDLINE | ID: mdl-17354961

ABSTRACT

A new method is proposed to unambiguously define a geometric partitioning of 3D models of the female thorax. A breast partitioning scheme is derived from simple geometric primitives and well-defined anatomical points, and relevant measurements can be extracted from the resulting partitions. Our method has been tested on a number of 3D breast models acquired with a commercial scanner in real clinical cases.


Subject(s)
Breast/anatomy & histology , Breast/surgery , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Mammaplasty/methods , Outcome Assessment, Health Care/methods , Surgery, Computer-Assisted/methods , Algorithms , Artificial Intelligence , Female , Humans , Image Enhancement/methods , Pattern Recognition, Automated/methods , Prognosis , Reproducibility of Results , Sensitivity and Specificity , Treatment Outcome
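A brief sketch of the kind of plane-based partitioning a scheme like this relies on: vertices of a 3D model are split by a plane passing through anatomical landmark points. The specific landmarks and planes of the paper's partitioning scheme are not reproduced here.

```python
import numpy as np

def split_by_plane(vertices, p0, p1, p2):
    """Partition (N, 3) vertices by the plane through points p0, p1, p2."""
    normal = np.cross(p1 - p0, p2 - p0)   # plane normal from three landmarks
    side = (vertices - p0) @ normal       # signed distance (up to scale)
    return vertices[side >= 0], vertices[side < 0]
```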