1.
IEEE Trans Pattern Anal Mach Intell ; 45(7): 8538-8552, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37015490

ABSTRACT

This article examines performance evaluation criteria for basic vision tasks involving sets of objects, namely object detection, instance-level segmentation, and multi-object tracking. The rankings of algorithms by a criterion can fluctuate with different choices of parameters, e.g., the Intersection over Union (IoU) threshold, making the resulting evaluations unreliable. More importantly, there is no means to verify whether we can trust the evaluations of a criterion. This work suggests a notion of trustworthiness for performance criteria, which requires (i) robustness to parameters for reliability, (ii) contextual meaningfulness in sanity tests, and (iii) consistency with mathematical requirements such as the metric properties. We observe that these requirements are overlooked by many widely used criteria, and we explore alternative criteria using metrics for sets of shapes. We also assess all these criteria against the suggested requirements for trustworthiness.
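To make the threshold sensitivity concrete, here is a minimal sketch (the detectors and per-object IoU values are hypothetical, not the article's experiments) showing how a recall-style score can rank two detectors differently depending on the IoU threshold:

```python
# Minimal sketch: IoU and threshold-dependent rankings.

def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Per-ground-truth IoUs achieved by two hypothetical detectors.
ious = {"A": [0.90, 0.55], "B": [0.65, 0.65]}
for thr in (0.5, 0.6, 0.7):
    recall = {name: sum(v >= thr for v in vals) / len(vals)
              for name, vals in ious.items()}
    print(thr, recall)
# thr=0.5 ties the detectors, thr=0.6 ranks B first, thr=0.7 ranks A
# first: the ranking depends on the parameter, which is exactly the
# reliability concern raised above.
```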

2.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 6748-6765, 2023 Jun.
Article in English | MEDLINE | ID: mdl-33798067

ABSTRACT

We present JRDB, a novel egocentric dataset collected from our social mobile manipulator JackRabbot. The dataset includes 64 minutes of annotated multimodal sensor data: stereo cylindrical 360° RGB video at 15 fps, 3D point clouds from two 16-beam Velodyne LiDARs, line 3D point clouds from two Sick LiDARs, an audio signal, RGB-D video at 30 fps, a 360° spherical image from a fisheye camera, and encoder values from the robot's wheels. Our dataset incorporates data from traditionally underrepresented scenes such as indoor environments and pedestrian areas, all from the ego-perspective of the robot, both stationary and navigating. The dataset has been annotated with over 2.4 million bounding boxes spread over five individual cameras and 1.8 million associated 3D cuboids around all people in the scenes, totaling over 3,500 time-consistent trajectories. Together with the dataset and annotations, we launch a benchmark and metrics for 2D and 3D person detection and tracking. With this dataset, which we plan to extend with further types of annotation in the future, we hope to provide a new source of data and a test bench for research in egocentric robot vision, autonomous navigation, and all perceptual tasks around social robotics in human environments.


Subjects
Built Environment; Robotics; Visual Perception; Humans; Algorithms; Benchmarking; Robotics/methods
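As a rough illustration of how such annotations fit together, the sketch below defines hypothetical record types for 2D boxes and 3D cuboids linked by a shared track identity; the field names are assumptions for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass

@dataclass
class Box2D:
    camera_id: int   # one of the five individual cameras
    x: float         # top-left corner, pixels
    y: float
    w: float         # box width and height, pixels
    h: float
    track_id: int    # time-consistent person identity

@dataclass
class Cuboid3D:
    cx: float        # cuboid center in the robot's frame, meters
    cy: float
    cz: float
    length: float
    width: float
    height: float
    yaw: float       # heading about the vertical axis, radians
    track_id: int    # shared with the same person's 2D boxes

# Grouping Box2D and Cuboid3D records by track_id across frames yields
# the time-consistent trajectories that a tracking benchmark scores.
```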
3.
Nat Commun ; 12(1): 5721, 2021 Oct 6.
Article in English | MEDLINE | ID: mdl-34615862

ABSTRACT

The intertwined processes of learning and evolution in complex environmental niches have resulted in a remarkable diversity of morphological forms. Moreover, many aspects of animal intelligence are deeply embodied in these evolved morphologies. However, the principles governing relations between environmental complexity, evolved morphology, and the learnability of intelligent control remain elusive, because performing large-scale in silico experiments on evolution and learning is challenging. Here, we introduce Deep Evolutionary Reinforcement Learning (DERL), a computational framework that can evolve diverse agent morphologies to learn challenging locomotion and manipulation tasks in complex environments. Leveraging DERL, we demonstrate several relations between environmental complexity, morphological intelligence, and the learnability of control. First, environmental complexity fosters the evolution of morphological intelligence, as quantified by the ability of a morphology to facilitate the learning of novel tasks. Second, we demonstrate a morphological Baldwin effect: in our simulations, evolution rapidly selects morphologies that learn faster, thereby enabling behaviors learned late in the lifetime of early ancestors to be expressed early in the descendants' lifetime. Third, we suggest a mechanistic basis for these relationships through the evolution of morphologies that are more physically stable and energy efficient, and that can therefore facilitate learning and control.


Subjects
Biological Evolution; Deep Learning; Reward; Animals; Computer Simulation
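The outer loop of such a framework can be sketched as tournament-based evolution in which a candidate morphology's fitness is the reward its controller attains after a fixed learning budget. Everything below (the toy one-parameter morphology and the stand-in for inner-loop reinforcement learning) is an illustrative assumption, not the paper's implementation.

```python
import random

def learn_and_evaluate(morphology, steps=100):
    """Stand-in for inner-loop RL: here, stabler morphologies learn faster."""
    stability = 1.0 / (1.0 + abs(morphology["limb_ratio"] - 1.0))
    return stability * steps  # proxy for post-learning task reward

def mutate(morphology):
    child = dict(morphology)
    child["limb_ratio"] += random.gauss(0.0, 0.1)
    return child

population = [{"limb_ratio": random.uniform(0.2, 3.0)} for _ in range(16)]
for generation in range(100):
    a, b = random.sample(population, 2)   # tournament of two
    winner, loser = (a, b) if learn_and_evaluate(a) >= learn_and_evaluate(b) else (b, a)
    population.remove(loser)              # the faster learner survives
    population.append(mutate(winner))     # and reproduces with variation

print(sorted(m["limb_ratio"] for m in population))
```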
4.
NPJ Digit Med ; 4(1): 145, 2021 Oct 07.
Article in English | MEDLINE | ID: mdl-34620993

ABSTRACT

Biology has become a prime area for the deployment of deep learning and artificial intelligence (AI), enabled largely by the massive data sets that the field can generate. Key to most AI tasks is the availability of a sufficiently large, labeled data set with which to train AI models. In the context of microscopy, it is easy to generate image data sets containing millions of cells and structures. However, it is challenging to obtain large-scale, high-quality annotations for AI models. Here, we present HALS (Human-Augmenting Labeling System), a human-in-the-loop data-labeling AI that begins uninitialized and learns annotations from a human in real time. Using a multi-part AI composed of three deep learning models, HALS learns from just a few examples and immediately decreases the workload of the annotator while increasing the quality of their annotations. Using a highly repetitive use case (annotating cell types) and running experiments with seven pathologists (experts in the microscopic analysis of biological specimens), we demonstrate a manual work reduction of 90.60% and an average data-quality boost of 4.34%, measured across four use cases and two tissue stain types.
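A minimal human-in-the-loop labeling loop in this spirit might look as follows; the single linear classifier, the confidence threshold, and the synthetic features are stand-ins, not the paper's three-model system.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier(loss="log_loss")
classes = np.array([0, 1])
seen_any = False
human_queries = 0

for step in range(200):
    x = rng.normal(size=(1, 4))          # features of one cell
    oracle_label = int(x[0, 0] > 0.0)    # stand-in for the pathologist
    if seen_any and model.predict_proba(x)[0].max() > 0.9:
        label = int(model.predict(x)[0])  # confident: accept the model's label
    else:
        label = oracle_label              # uncertain: ask the human
        human_queries += 1
    model.partial_fit(x, [label], classes=classes)  # online update
    seen_any = True

# As the model improves, an ever-smaller share of cells needs the human.
print(f"human queried on {human_queries}/200 cells")
```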

5.
IEEE Trans Pattern Anal Mach Intell ; 40(2): 467-481, 2018 Feb.
Article in English | MEDLINE | ID: mdl-28287959

ABSTRACT

There is large variation in the activities that humans perform in their everyday lives. We consider modeling these composite human activities, which comprise multiple basic-level actions, in a completely unsupervised setting. Our model learns high-level co-occurrence and temporal relations between the actions. We treat the video as a sequence of short-term action clips, which contain human-words and object-words. An activity is a set of action-topics and object-topics indicating which actions are present and which objects are being interacted with. We then propose a new probabilistic model relating the words and the topics. It allows us to model long-range action relations that commonly exist in composite activities, which was challenging for previous works. We apply our model to unsupervised action segmentation and clustering, and to a novel application that detects forgotten actions, which we call action patching. For evaluation, we contribute a new challenging RGB-D activity video dataset recorded by the new Kinect v2, which contains several human daily activities as compositions of multiple actions interacting with different objects. Moreover, we develop a robotic system that watches people and reminds them using our action patching algorithm. Our robotic setup can be easily deployed on any assistive robot.
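As a toy illustration of the word/topic view described above (the distributions, vocabulary, and clip structure are invented for the sketch, not the paper's learned model):

```python
import numpy as np

rng = np.random.default_rng(1)
n_topics = 3
vocab = ["reach", "pour", "stir", "open", "close"]        # toy human-words
topic_word = rng.dirichlet(np.ones(len(vocab)), size=n_topics)

def generate_activity(n_clips=8):
    """Generate one composite activity as a sequence of short clips."""
    activity_topics = rng.dirichlet(np.ones(n_topics))    # per-activity mix
    clips = []
    for _ in range(n_clips):
        z = rng.choice(n_topics, p=activity_topics)       # draw an action-topic
        w = rng.choice(len(vocab), p=topic_word[z])       # draw a human-word
        clips.append((int(z), vocab[int(w)]))
    return clips

print(generate_activity())
```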

6.
Front Psychol ; 5: 417, 2014.
Article in English | MEDLINE | ID: mdl-24904452

ABSTRACT

Building fine-grained visual recognition systems capable of recognizing tens of thousands of categories has received much attention in recent years. The well-known semantic hierarchical structure of categories and concepts has been shown to provide a key prior that allows for optimal predictions. The hierarchical organization of various domains and concepts has been the subject of extensive research and led to the development of the WordNet domains hierarchy (Fellbaum, 1998), which was also used to organize the images in the ImageNet dataset (Deng et al., 2009), in which the category count approaches the human capacity. Still, for the human visual system, the form of the hierarchy must be discovered with minimal use of supervision or innate knowledge. In this work, we propose a new Bayesian generative model for learning such domain hierarchies based on semantic input. Our model is motivated by the super-subordinate organization of domain labels and concepts that characterizes WordNet, and accounts for several important challenges: maintaining context information when progressing deeper into the hierarchy, learning a coherent semantic concept for each node, and modeling uncertainty in the perception process.
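One simple generative mechanism for growing such a hierarchy is a nested "rich-get-richer" walk down the tree, sketched below; the specific prior and the omission of the semantic input are simplifying assumptions, not the paper's model.

```python
import random

def sample_path(node, depth, gamma=1.0):
    """Walk down the tree: pick an existing child in proportion to its
    popularity, or create a new child with weight gamma."""
    path = [node["name"]]
    for d in range(depth):
        total = sum(c["count"] for c in node["children"]) + gamma
        r = random.uniform(0, total)
        for child in node["children"]:
            if r < child["count"]:
                break
            r -= child["count"]
        else:  # r fell in the gamma mass: branch a new concept node
            child = {"name": f"{node['name']}/{d}", "count": 0, "children": []}
            node["children"].append(child)
        child["count"] += 1
        path.append(child["name"])
        node = child
    return path

root = {"name": "root", "count": 0, "children": []}
for _ in range(5):
    print(sample_path(root, depth=3))  # paths cluster on popular branches
```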

7.
IEEE Trans Pattern Anal Mach Intell ; 36(6): 1242-1257, 2014 Jun.
Article in English | MEDLINE | ID: mdl-26353284

ABSTRACT

This paper presents a principled framework for analyzing collective activities at different levels of semantic granularity from videos. Our framework is capable of jointly tracking multiple individuals, recognizing activities performed by individuals in isolation (i.e., atomic activities such as walking or standing), recognizing interactions between pairs of individuals (i.e., interaction activities), and understanding the activities of groups of individuals (i.e., collective activities). A key property of our work is that it can coherently combine bottom-up information stemming from detections or fragments of tracks (tracklets) with top-down evidence. Top-down evidence is provided by a newly proposed descriptor that captures the coherent behavior of groups of individuals in a spatio-temporal neighborhood of the sequence. Top-down evidence provides contextual information for establishing accurate associations between detections or tracklets across frames and, thus, for obtaining more robust tracking results. Bottom-up evidence percolates upwards so as to automatically infer collective activity labels. Experimental results on two challenging datasets support our theoretical claims and indicate that our model achieves enhanced tracking results and the best collective activity classification results to date.


Subjects
Image Processing, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Videotape Recording/methods; Algorithms; Human Activities/classification; Humans; Posture/physiology
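The combination of bottom-up and top-down evidence for association can be sketched as mixing two affinity matrices and solving the resulting assignment problem; the matrices and weights below are invented for illustration, not the paper's model.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Bottom-up affinity (appearance/position) between two tracklets and two
# new detections, plus a top-down context score (e.g., agreement with the
# surrounding group's behavior). All values are illustrative.
bottom_up = np.array([[0.9, 0.2],
                      [0.3, 0.8]])
top_down = np.array([[0.7, 0.1],
                     [0.2, 0.9]])
score = 0.6 * bottom_up + 0.4 * top_down     # weighted combination
rows, cols = linear_sum_assignment(-score)   # negate to maximize total score
print([(int(r), int(c)) for r, c in zip(rows, cols)])  # [(0, 0), (1, 1)]
```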
8.
IEEE Trans Pattern Anal Mach Intell ; 36(7): 1370-83, 2014 Jul.
Article in English | MEDLINE | ID: mdl-26353309

ABSTRACT

In the last few years, substantially different approaches have been adopted for segmenting and detecting "things" (object categories that have a well-defined shape, such as people and cars) and "stuff" (object categories that have an amorphous spatial extent, such as grass and sky). While things have typically been detected by sliding-window or Hough-transform-based methods, detection of stuff is generally formulated as a pixel- or segment-wise classification problem. This paper proposes a framework for scene understanding that models both things and stuff using a common representation while preserving their distinct nature by using a property list. This representation allows us to enforce sophisticated geometric and semantic relationships between thing and stuff categories via property interactions in a single graphical model. We use the latest advances in discrete optimization to efficiently perform maximum a posteriori (MAP) inference in this model. We evaluate our method on the Stanford dataset by comparing it against state-of-the-art methods for object segmentation and detection. We also show that our method achieves competitive performance on the challenging PASCAL '09 segmentation dataset.
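A toy version of MAP inference with a thing/stuff interaction term, solved here by brute-force enumeration rather than the large-scale discrete optimization the paper uses; the labels, costs, and the single geometric rule are invented for illustration.

```python
import itertools

labels = ["person", "grass", "sky"]
unary = {0: {"person": 0.2, "grass": 0.9, "sky": 0.8},   # per-region label costs
         1: {"person": 0.7, "grass": 0.3, "sky": 0.6},
         2: {"person": 0.9, "grass": 0.8, "sky": 0.1}}
above = [(0, 1), (1, 2)]   # region i sits directly above region j

def pairwise(upper, lower):
    """Geometric rule: no region may rest on top of 'sky'."""
    return 10.0 if lower == "sky" else 0.0

def energy(assignment):
    return (sum(unary[i][assignment[i]] for i in unary)
            + sum(pairwise(assignment[i], assignment[j]) for i, j in above))

best = min(itertools.product(labels, repeat=3), key=energy)
# The interaction overrides region 2's low unary cost for 'sky'.
print(best, energy(best))
```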

9.
IEEE Trans Pattern Anal Mach Intell ; 35(7): 1577-1591, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23681988

ABSTRACT

In this paper, we present a general framework for tracking multiple, possibly interacting, people from a mobile vision platform. To determine all of the trajectories robustly and in a 3D coordinate system, we estimate both the camera's ego-motion and the people's paths within a single coherent framework. The tracking problem is framed as finding the maximum a posteriori (MAP) solution of a posterior probability and is solved using the reversible jump Markov chain Monte Carlo (RJ-MCMC) particle filtering method. We evaluate our system on challenging datasets taken from moving cameras, including an outdoor street-scene video dataset as well as an indoor RGB-D dataset collected in an office. Experimental evidence shows that the proposed method can robustly estimate a camera's motion from dynamic scenes and stably track people who are moving independently or interacting.


Subjects
Human Activities/classification; Image Processing, Computer-Assisted/methods; Movement/physiology; Pattern Recognition, Automated/methods; Video Recording/methods; Humans; Markov Chains; Monte Carlo Method
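A heavily simplified, fixed-dimension Metropolis sketch of the joint state such a method explores (one camera offset plus per-person positions, with randomly chosen move types); the toy likelihood and the omission of trans-dimensional birth/death moves and of the particle-filter structure are deliberate simplifications, not the paper's algorithm.

```python
import math
import random

observations = [1.0, 2.2, 3.1]   # observed 1D image positions of 3 people

def log_post(cam, people):
    """Toy posterior: a weak prior on the camera offset plus a Gaussian
    match between camera-shifted person positions and the observations."""
    fit = sum((cam + p - o) ** 2 for p, o in zip(people, observations))
    return -0.5 * cam ** 2 - fit

cam, people = 0.0, [0.0, 0.0, 0.0]
for step in range(5000):
    new_cam, new_people = cam, list(people)
    if random.random() < 0.5:                 # move type: perturb the camera
        new_cam += random.gauss(0.0, 0.1)
    else:                                     # move type: perturb one person
        i = random.randrange(len(people))
        new_people[i] += random.gauss(0.0, 0.1)
    if math.log(random.random()) < log_post(new_cam, new_people) - log_post(cam, people):
        cam, people = new_cam, new_people     # Metropolis accept

print(round(cam, 2), [round(p, 2) for p in people])
```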