Results 1 - 10 of 10
1.
Int J Comput Vis ; 131(1): 259-283, 2023.
Article in English | MEDLINE | ID: mdl-36624862

ABSTRACT

The understanding of human-object interactions is fundamental in First Person Vision (FPV). Visual tracking algorithms that follow the objects manipulated by the camera wearer can provide useful information to effectively model such interactions. In recent years, the computer vision community has significantly improved the performance of tracking algorithms for a large variety of target objects and scenarios. Despite a few previous attempts to exploit trackers in the FPV domain, a methodical analysis of the performance of state-of-the-art trackers is still missing. This research gap raises the question of whether current solutions can be used "off-the-shelf" or whether more domain-specific investigations should be carried out. This paper aims to answer this question. We present the first systematic investigation of single object tracking in FPV. Our study extensively analyses the performance of 42 algorithms, including generic object trackers and baseline FPV-specific trackers. The analysis focuses on different aspects of the FPV setting, introduces new performance measures, and relates the results to FPV-specific tasks. The study is made possible through the introduction of TREK-150, a novel benchmark dataset composed of 150 densely annotated video sequences. Our results show that object tracking in FPV poses new challenges to current visual trackers. We highlight the factors causing such behavior and point out possible research directions. Despite these difficulties, we show that trackers bring benefits to FPV downstream tasks requiring short-term object tracking. We expect generic object tracking to gain popularity in FPV as new and FPV-specific methodologies are investigated. Supplementary Information: The online version contains supplementary material available at 10.1007/s11263-022-01694-6.
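
As a concrete illustration of how tracker performance is commonly scored in such studies, the sketch below computes the standard bounding-box overlap (IoU) and a mean-overlap "success" score over a sequence. This is a generic minimal example, not the TREK-150 evaluation toolkit.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))  # horizontal overlap
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))  # vertical overlap
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def success_score(predictions, ground_truths):
    """Mean overlap between predicted and annotated boxes over a sequence."""
    overlaps = [iou(p, g) for p, g in zip(predictions, ground_truths)]
    return sum(overlaps) / len(overlaps)
```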

2.
Comput Med Imaging Graph ; 102: 102142, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36446308

ABSTRACT

Convolutional neural networks (CNNs) applied to magnetic resonance imaging (MRI) have demonstrated their ability in the automatic diagnosis of knee injuries. Despite the promising results, the currently available solutions do not take into account the particular anatomy of knee disorders. Existing works have shown that injuries are localized in small-sized knee regions near the center of MRI scans. Based on such insights, we propose MRPyrNet, a CNN architecture capable of extracting more relevant features from these regions. Our solution is composed of a Feature Pyramid Network with Pyramidal Detail Pooling and can be plugged into any existing CNN-based diagnostic pipeline. The first module enhances the CNN's intermediate features to better detect the small-sized appearance of disorders, while the second captures such evidence while preserving its detailed information. An extensive evaluation campaign is conducted to assess the potential of the proposed solution in depth. The experimental results demonstrate that applying MRPyrNet to baseline methodologies improves their diagnostic capability, especially for anterior cruciate ligament tears and meniscal tears, thanks to MRPyrNet's ability to exploit the relevant appearance features of such disorders. Code is available at https://github.com/matteo-dunnhofer/MRPyrNet.
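
The core intuition, pooling detail from the central region of a feature map at several scales, can be sketched as follows. This is a hypothetical pure-Python illustration on nested lists; the actual MRPyrNet modules operate on CNN feature tensors (see the linked repository).

```python
def center_crop(fmap, size):
    """Take the central size x size window of a square 2D feature map."""
    n = len(fmap)
    off = (n - size) // 2
    return [row[off:off + size] for row in fmap[off:off + size]]

def detail_pyramid(fmap, sizes):
    """Max-pool each centered crop into one activation per scale, so small,
    centrally located evidence survives pooling at full detail."""
    return [max(max(row) for row in center_crop(fmap, s)) for s in sizes]
```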


Subject(s)
Magnetic Resonance Imaging , Neural Networks, Computer
3.
Med Image Anal ; 60: 101631, 2020 Feb.
Article in English | MEDLINE | ID: mdl-31927473

ABSTRACT

The tracking of the knee femoral condyle cartilage during ultrasound-guided minimally invasive procedures is important to avoid damaging this structure during such interventions. In this study, we propose a new deep learning method to track the femoral condyle cartilage, accurately and efficiently, in ultrasound sequences acquired under several clinical conditions mimicking realistic surgical setups. Our solution, which we name Siam-U-Net, requires minimal user initialization and combines a deep learning segmentation method with a siamese framework for tracking the cartilage in temporal and spatio-temporal sequences of 2D ultrasound images. Through extensive performance validation based on the Dice Similarity Coefficient, we demonstrate that our algorithm tracks the femoral condyle cartilage with an accuracy comparable to that of experienced surgeons. We additionally show that the proposed method outperforms state-of-the-art segmentation models and trackers in localizing the cartilage. We believe the proposed solution has potential for ultrasound guidance in minimally invasive knee procedures.
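
The Dice Similarity Coefficient used for validation is simple to state: DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks. A minimal sketch:

```python
def dice(mask_a, mask_b):
    """Dice Similarity Coefficient of two flat binary masks of equal length."""
    inter = sum(a and b for a, b in zip(mask_a, mask_b))  # overlapping pixels
    total = sum(mask_a) + sum(mask_b)
    return 2.0 * inter / total if total else 1.0          # two empty masks agree
```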


Subject(s)
Cartilage, Articular/diagnostic imaging , Image Processing, Computer-Assisted/methods , Knee Joint/diagnostic imaging , Neural Networks, Computer , Ultrasonography, Interventional/methods , Arthroscopy , Deep Learning , Female , Healthy Volunteers , Humans , Imaging, Three-Dimensional , Male
4.
Neural Netw ; 124: 20-38, 2020 Apr.
Article in English | MEDLINE | ID: mdl-31962232

ABSTRACT

Classification of high-dimensional data suffers from the curse of dimensionality and over-fitting. The neural tree is a powerful method that combines local feature selection and recursive partitioning to address these problems, but it produces deep trees when classifying high-dimensional data. On the other hand, if shallower trees are used, classification accuracy decreases or over-fitting increases. This paper introduces a novel Neural Tree exploiting Expert Nodes (NTEN) to classify high-dimensional data. It is based on a decision tree structure whose internal nodes are expert nodes performing multi-dimensional splitting. Each expert node has three decision-making abilities: first, it selects the most suitable neural network with respect to the data complexity; second, it evaluates the degree of over-fitting; third, it clusters the features to jointly minimize redundancy and overlap. To this aim, metaheuristic optimization algorithms including GA, NSGA-II, PSO, and ACO are applied. Based on these concepts, each expert node splits a class when over-fitting is low and clusters the features when over-fitting is high. Some theoretical results on NTEN are derived, and experiments on 35 standard datasets show that NTEN achieves good classification results and reduces tree depth without over-fitting or degrading accuracy.
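
A toy illustration (not the NTEN implementation) of the split-or-cluster rule described above, under the assumption that over-fitting is estimated as the gap between training and validation accuracy:

```python
def overfit_gap(train_acc, val_acc):
    """A common proxy for over-fitting: train/validation accuracy gap."""
    return train_acc - val_acc

def expert_node_action(train_acc, val_acc, threshold=0.1):
    """Decide, as an expert node would, whether to split a class (low
    over-fitting) or cluster features to cut redundancy (high over-fitting).
    The threshold value is an illustrative assumption."""
    if overfit_gap(train_acc, val_acc) > threshold:
        return "cluster_features"
    return "split_class"
```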


Subject(s)
Neural Networks, Computer , Data Management/methods , Decision Trees
5.
IEEE Trans Image Process ; 26(4): 1650-1665, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28092558

ABSTRACT

Existing approaches for person re-identification are mainly based on creating distinctive representations or on learning optimal metrics. The achieved results are then provided in the form of a ranked list of matching persons. It often happens that the true match is not ranked first but appears among the top positions, mostly due to visual ambiguities shared between the true match and other "similar" persons. To date, there has been no study of the visual ambiguities that limit re-identification performance within the first ranks. We believe that an analysis of the similar appearances in the first ranks can help detect, and hence remove, such visual ambiguities. We propose to achieve this goal by introducing an unsupervised post-ranking framework. Once the initial ranking is available, content and context sets are extracted. These are then exploited to remove the visual ambiguities and to obtain a discriminant feature space, which is finally used to compute the new ranking. An in-depth analysis of the performance achieved on three public benchmark datasets supports our claims. On every dataset, the proposed method remarkably improves the first-rank results and outperforms state-of-the-art approaches.
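
The post-ranking idea can be sketched with a simplified neighbourhood-overlap criterion: re-score each gallery candidate by how much its own top-k neighbourhood (a crude stand-in for the paper's content/context sets) overlaps the probe's. This is an assumption-laden toy, not the paper's discriminant-space method.

```python
def top_k(dists, k):
    """Indices of the k smallest distances."""
    return set(sorted(range(len(dists)), key=dists.__getitem__)[:k])

def rerank(probe_dists, gallery_dists, k=2):
    """Re-rank gallery items by neighbourhood overlap with the probe.

    probe_dists[i]      -- distance from the probe to gallery item i
    gallery_dists[i][j] -- distance from gallery item i to gallery item j
    """
    probe_set = top_k(probe_dists, k)
    scores = []
    for i in range(len(probe_dists)):
        cand_set = top_k(gallery_dists[i], k)
        jac = len(probe_set & cand_set) / len(probe_set | cand_set)
        scores.append((1.0 - jac, i))  # high overlap -> better (smaller) score
    return [i for _, i in sorted(scores)]
```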

6.
IEEE Trans Cybern ; 47(11): 3530-3541, 2017 Nov.
Article in English | MEDLINE | ID: mdl-27249845

ABSTRACT

Plenty of research has been conducted to obtain the best re-identification performance between single camera pairs. However, none of the current approaches addresses re-identification in a camera network by considering the network topology (i.e., the structure of the monitored environment). We introduce a distributed network person re-identification framework with the following contributions: 1) a camera matching cost that measures the re-identification performance between nodes of the network, and 2) a derivation of the distance-vector algorithm that learns the network topology, thus prioritizing and limiting the cameras queried to match a probe. Results on three benchmark datasets show that the network topology can be learned in an unsupervised fashion and that network-wide re-identification performance improves. As a side effect, communication bandwidth usage is reduced.
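
A toy sketch of the distance-vector idea: each camera iteratively relaxes its matching-cost vector via its neighbours (Bellman-Ford style), and the learned costs are then used to decide which cameras to query first for a probe. The cost values and relaxation schedule here are illustrative assumptions, not learned re-identification scores.

```python
def distance_vector(costs, rounds=None):
    """Relax all-pairs camera matching costs via neighbours.

    costs[i][j] -- direct matching cost between cameras i and j
                   (float('inf') when the pair is unconnected)
    """
    n = len(costs)
    d = [row[:] for row in costs]
    for _ in range(rounds or n):
        for i in range(n):
            for j in range(n):
                for k in range(n):  # relax path i -> k -> j
                    d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

def camera_priority(d, probe_cam):
    """Cameras ordered by learned cost from the probe's camera."""
    return sorted((j for j in range(len(d)) if j != probe_cam),
                  key=lambda j: d[probe_cam][j])
```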


Subject(s)
Biometric Identification/methods , Image Processing, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Algorithms , Humans , Video Recording
7.
IEEE Trans Image Process ; 24(12): 5645-58, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26452280

ABSTRACT

Person re-identification in a non-overlapping multi-camera scenario is an open and interesting challenge. While the task can hardly be completed by machines, we, as humans, are inherently able to sample the relevant details of a person's appearance that allow us to solve the problem correctly in a fraction of a second. Thus, knowing where a human might fixate to recognize a person is of paramount interest for re-identification. Inspired by human gaze behavior, we identify the salient regions of a person's appearance to tackle the problem. Toward this objective, we introduce the following main contributions. A kernelized graph-based approach detects the salient regions of a person's appearance, which are later used to weight the feature extraction process. The proposed person representation combines visual features computed both with and without saliency weighting. These are then exploited in a pairwise multiple metric learning framework. Finally, the non-Euclidean metrics, learned separately for each feature, are fused to re-identify a person. The proposed kernelized saliency-based person re-identification through multiple metric learning has been evaluated on four publicly available benchmark datasets, showing superior performance over state-of-the-art approaches (e.g., a rank-1 correct recognition rate of 42.41% on the VIPeR dataset).
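
Two ingredients of the pipeline, saliency weighting of features and fusion of per-feature metrics, can be sketched minimally as follows; the actual kernelized saliency detection and metric learning are far richer than this illustration.

```python
def weight_by_saliency(features, saliency):
    """Scale each feature by the saliency of the region it came from."""
    return [f * s for f, s in zip(features, saliency)]

def fused_distance(feat_sets_a, feat_sets_b, metrics):
    """Fuse per-feature distances into one matching score, one (possibly
    separately learned) metric per feature set."""
    return sum(m(a, b) for m, a, b in zip(metrics, feat_sets_a, feat_sets_b))
```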


Subject(s)
Algorithms , Biometric Identification/methods , Image Processing, Computer-Assisted/methods , Machine Learning , Humans
8.
IEEE Trans Pattern Anal Mach Intell ; 37(8): 1656-69, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26353002

ABSTRACT

Person re-identification in a non-overlapping multicamera scenario is an open challenge in computer vision because of the large changes in appearance caused by variations in viewing angle, lighting, background clutter, and occlusion over multiple cameras. As a result of these variations, features describing the same person get transformed between cameras. To model the transformation of features, the feature space is nonlinearly warped to obtain the "warp functions". The warp functions between two instances of the same target form the set of feasible warp functions, while those between instances of different targets form the set of infeasible warp functions. In this work, we build upon the observation that feature transformations between cameras lie in a nonlinear function space of all possible feature transformations. The space consisting of all the feasible and infeasible warp functions is the warp function space (WFS). We propose to learn a discriminating surface separating these two sets of warp functions in the WFS and to re-identify persons by classifying a test warp function as feasible or infeasible. Toward this objective, a Random Forest (RF) classifier is employed, which effectively chooses the warp function components according to their importance in separating the feasible and the infeasible warp functions in the WFS. Extensive experiments on five datasets show the superior performance of the proposed approach over state-of-the-art person re-identification methods. Our approach outperforms all other methods when large illumination variations are considered, and at the same time reaches the best average performance over multiple combinations of the datasets, showing that it is not designed solely to address a specific challenge posed by a particular dataset.
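
The warp-function idea can be illustrated with a deliberately crude stand-in: approximate the "warp" between two appearances as the element-wise feature difference, and classify it as feasible or infeasible with a nearest-centroid rule in place of the paper's Random Forest. Every modelling choice below is a simplifying assumption.

```python
def warp(feat_a, feat_b):
    """Crude stand-in for a warp function: per-dimension feature change."""
    return [b - a for a, b in zip(feat_a, feat_b)]

def centroid(warps):
    """Per-dimension mean of a set of warp vectors."""
    return [sum(d) / len(d) for d in zip(*warps)]

def is_feasible(test_warp, feasible_warps, infeasible_warps):
    """Label a test warp by its nearest class centroid in warp space."""
    def sq_dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))
    return (sq_dist(test_warp, centroid(feasible_warps))
            <= sq_dist(test_warp, centroid(infeasible_warps)))
```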


Subject(s)
Algorithms , Biometric Identification/methods , Databases, Factual , Humans , Video Recording
9.
Neural Netw ; 27: 81-90, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22071271

ABSTRACT

This paper proposes a new neural tree (NT) architecture, the balanced neural tree (BNT), to reduce tree size and improve classification with respect to classical NTs. To achieve this result, two main innovations are introduced: (a) perceptron substitution and (b) pattern removal. The first innovation aims to balance the structure of the tree: if the last-trained perceptron largely misclassifies the given training set into a reduced number of classes, that perceptron is substituted with a new one. The second consists of a new criterion for removing tough training patterns that cause over-fitting. Finally, a new error function based on the depth of the tree is introduced to reduce perceptron training time. The proposed BNT has been tested on various synthetic and real datasets. The experimental results show that it leads to satisfactory results in terms of both tree-depth reduction and classification accuracy.


Subject(s)
Decision Trees , Neural Networks, Computer , Pattern Recognition, Automated/methods
10.
Sensors (Basel) ; 9(4): 2252-70, 2009.
Article in English | MEDLINE | ID: mdl-22574011

ABSTRACT

This paper surveys the main technological aspects of advanced visual-based surveillance systems. A brief historical view of such systems, from their origins to the present day, is given, together with a short description of the main Italian research projects on surveillance applications over the last twenty years. The paper then describes the main characteristics of an advanced visual sensor network that (a) directly processes locally acquired digital data, (b) automatically modifies intrinsic (focus, iris) and extrinsic (pan, tilt, zoom) parameters to increase the quality of acquired data, and (c) automatically selects the best subset of sensors to monitor a given moving object in the observed environment.
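
Point (c), selecting the best subset of sensors for a given moving object, can be sketched as a simple greedy choice over per-camera visibility scores; the scores themselves are assumed inputs here, whereas a real system would estimate them from geometry and image quality.

```python
def select_sensors(visibility, k):
    """Pick the k cameras with the highest visibility of the target.

    visibility -- dict mapping camera id to a score in [0, 1]
    """
    ranked = sorted(visibility, key=visibility.get, reverse=True)
    return ranked[:k]
```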
