1.
Front Robot AI ; 10: 1076780, 2023.
Article in English | MEDLINE | ID: mdl-37205224

ABSTRACT

We present Affordance Recognition with One-Shot Human Stances (AROS), a one-shot learning approach that uses an explicit representation of interactions between highly articulated human poses and 3D scenes. The approach is one-shot since it does not require iterative training or retraining to add new affordance instances. Furthermore, only one or a small handful of examples of the target pose are needed to describe the interactions. Given a 3D mesh of a previously unseen scene, we can predict affordance locations that support the interactions and generate corresponding articulated 3D human bodies around them. We evaluate the performance of our approach on three public datasets of scanned real environments with varied degrees of noise. Through rigorous statistical analysis of crowdsourced evaluations, our results show that our one-shot approach is preferred up to 80% of the time over data-intensive baselines.
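A minimal sketch (not the authors' code) of the core idea behind scoring a candidate placement of a stored example pose in a new scene: designated contact joints should lie close to scene geometry while the rest of the body stays clear. The names pose_joints, contact_idx, and the distance thresholds are illustrative assumptions.

import numpy as np
from scipy.spatial import cKDTree

def affordance_score(pose_joints, contact_idx, scene_points,
                     contact_tol=0.05, clearance_tol=0.02):
    """pose_joints: (J, 3) joint positions of the example pose, already translated
    to a candidate location; scene_points: (N, 3) points sampled from the scene mesh."""
    tree = cKDTree(scene_points)
    dists, _ = tree.query(pose_joints)           # distance of each joint to the scene
    contact = dists[contact_idx]                 # joints that should touch the scene
    free = np.delete(dists, contact_idx)         # joints that should stay clear
    supported = np.mean(contact < contact_tol)   # fraction of contacts satisfied
    penetrating = np.mean(free < clearance_tol)  # fraction of free joints colliding
    return supported - penetrating               # higher = better affordance candidate

In use, many candidate locations and orientations across the scene mesh would be scored this way and the best-scoring ones kept, which is what makes the approach one-shot: adding a new affordance only requires storing a new example pose, not retraining.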

3.
Front Neurosci ; 16: 909448, 2022.
Article in English | MEDLINE | ID: mdl-36046469

ABSTRACT

Many types of Convolutional Neural Network (CNN) models and training methods have been proposed in recent years, aiming to provide efficiency for embedded and edge devices with limited computation and memory resources. The wide variety of architectures makes this a complex task that has to balance generality with efficiency. Among the most interesting camera-sensor architectures are Pixel Processor Arrays (PPAs). This study presents two methods that are useful for embedded CNNs in general but particularly suitable for PPAs. The first is for training purely binarized CNNs; the second is for deploying larger models with a model-swapping paradigm that loads model components dynamically. Specifically, this study trains and implements networks with batch normalization and adaptive thresholds for binary activations. We then convert batch normalization and binary activations into a bias matrix that can be applied in parallel with a single add/subtract operation. For dynamic model swapping, we propose decomposing applications that exceed the capacity of a PPA into sub-tasks solved by tree networks that are loaded dynamically as needed. We demonstrate our approaches on various tasks, including classification, localization, and coarse segmentation, on a highly resource-constrained PPA sensor-processor.
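A minimal sketch, not the paper's implementation, of how batch normalization followed by a binary (sign) activation folds into a single per-channel threshold, so that at inference time the activation reduces to one add/subtract and a sign test. Variable names are illustrative; channels with gamma exactly zero are not handled.

import numpy as np

def fold_bn_into_threshold(gamma, beta, mean, var, eps=1e-5):
    """Return per-channel thresholds t and signs s such that
    sign(BN(x)) = +1  iff  s * (x - t) >= 0."""
    std = np.sqrt(var + eps)
    t = mean - beta * std / gamma   # input value where the BN output crosses zero
    s = np.sign(gamma)              # gamma < 0 flips the comparison direction
    return t, s

def binary_activation(x, t, s):
    # x: (channels, H, W) pre-BN feature maps. The comparison is an add/subtract
    # followed by a sign test, which maps naturally onto per-pixel PPA arithmetic.
    return np.where(s[:, None, None] * (x - t[:, None, None]) >= 0, 1, -1)

This follows from gamma * (x - mean) / sqrt(var + eps) + beta >= 0 being equivalent to x >= mean - beta * sqrt(var + eps) / gamma when gamma is positive, with the inequality reversed when gamma is negative.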

4.
Sci Robot ; 7(67): eabl7755, 2022 06 29.
Article in English | MEDLINE | ID: mdl-35767647

ABSTRACT

Vision processing for the control of agile autonomous robots requires low-latency computation within a limited power and space budget, which is challenging for conventional computing hardware. Parallel processor arrays (PPAs) are a new class of vision sensor devices that exploit advances in semiconductor technology, embedding a processor within each pixel of the image sensor array. Sensed pixel data are processed on the focal plane, and only a small amount of relevant information is transmitted out of the vision sensor. This tight integration of sensing, processing, and memory within a massively parallel computing architecture leads to an interesting trade-off among high performance, low latency, low power, low cost, and versatility in a machine vision system. Here, we review the history of image sensing and processing hardware from the perspective of in-pixel computing and outline the key features of a state-of-the-art smart camera system based on a PPA device, illustrated by the SCAMP-5 system. We describe several robotic applications for agile ground and aerial vehicles, demonstrating PPA sensing functionalities including high-speed odometry, target tracking, and obstacle detection and avoidance. We conclude with insight and perspective on the future development of PPA devices, including their application and benefits within agile, robust, adaptable, and lightweight robotics.
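An illustrative sketch (assumed, not SCAMP-5 code) of the in-pixel computing idea the review describes: every pixel performs the same local operation, and only a small summary leaves the array rather than the full image.

import numpy as np

def per_pixel_step(prev_frame, frame, threshold=0.1):
    # Each "processing element" compares its own pixel across frames; one vectorized
    # operation here stands in for the parallel per-pixel hardware.
    motion = np.abs(frame - prev_frame) > threshold
    # Only a handful of numbers are read out: how much moved, and where.
    ys, xs = np.nonzero(motion)
    count = len(xs)
    centroid = (xs.mean(), ys.mean()) if count else (None, None)
    return count, centroid   # a few bytes out, instead of a whole image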


Subject(s)
Robotics; Computers; Vision, Ocular; Visual Perception
5.
J Vis ; 21(3): 13, 2021 03 01.
Article in English | MEDLINE | ID: mdl-33688920

ABSTRACT

Eye movements can support ongoing manipulative actions, but a class of so-called look-ahead fixations (LAFs) are related to future tasks. We examined LAFs in a complex natural task: assembling a camping tent. Tent assembly is a relatively uncommon task and requires the completion of multiple subtasks in sequence over a 5- to 20-minute duration. Participants wore a head-mounted camera and eye tracker, and subtasks and LAFs were annotated. We document four novel aspects of LAFs. First, LAFs were not random; their frequency was biased toward certain objects and subtasks. Second, latencies were larger than previously noted, with 35% of LAFs occurring within 10 seconds before motor manipulation and 75% within 100 seconds. Third, LAF behavior extends far into future subtasks: only 47% of LAFs were made to objects relevant to the current subtask, and 75% were to objects used within the five upcoming steps. Last, LAFs were often directed repeatedly to the target before manipulation, suggesting memory volatility. LAFs with short fixation-action latencies have been hypothesized to benefit future visual search and/or motor manipulation. However, the diversity of LAFs suggests that they may also reflect scene exploration and task relevance, as well as longer-term problem solving and task planning.


Subject(s)
Camping; Eye Movements/physiology; Fixation, Ocular/physiology; Psychomotor Performance/physiology; Adult; Female; Humans; Male; Young Adult
6.
Front Neurorobot ; 14: 45, 2020.
Article in English | MEDLINE | ID: mdl-32733228

ABSTRACT

Agents that need to act on their surroundings can benefit significantly from perceiving their interaction possibilities, or affordances. In this paper we combine the benefits of the Interaction Tensor, a straightforward geometrical representation that captures multiple object-scene interactions, with deep learning saliency for fast parsing of affordances in the environment. Our approach works with visually perceived 3D point clouds and makes it possible to query a 3D scene for locations that support affordances such as sitting or riding, as well as interactions with everyday objects, such as where to hang an umbrella or place a mug. Crucially, the nature of the interaction description exhibits one-shot generalization. Experiments with numerous synthetic and real RGB-D scenes, validated by human subjects, show that the representation enables the prediction of affordance candidate locations in novel environments from a single training example. The approach also allows for a highly parallelizable, multiple-affordance representation and runs at fast rates. The combination of a deep neural network that learns to estimate scene saliency with the one-shot geometric representation aligns well with the expectation that computational models for affordance estimation should be perceptually direct and economical.
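A heavily simplified sketch, not the authors' Interaction Tensor code: a one-shot affordance descriptor is stored as interaction keypoints with offset ("provenance") vectors toward the scene, and a test location in a new scene is scored by how many of those offsets still land on nearby surface. All names and thresholds are illustrative assumptions.

import numpy as np
from scipy.spatial import cKDTree

def train_descriptor(keypoints, scene_points):
    """One training example: for each interaction keypoint (relative to the query
    object's origin), record the offset to its nearest scene point."""
    tree = cKDTree(scene_points)
    _, idx = tree.query(keypoints)
    return scene_points[idx] - keypoints          # (K, 3) provenance vectors

def score_location(test_point, keypoints, provenance, new_scene_points, tol=0.05):
    """Translate the descriptor to a candidate location in a new scene and count
    how many expected contact points are still supported by nearby geometry."""
    tree = cKDTree(new_scene_points)
    expected = test_point + keypoints + provenance   # where scene surface should be
    dists, _ = tree.query(expected)
    return np.mean(dists < tol)                      # fraction of the interaction preserved

Because the score is an independent nearest-neighbour test per keypoint, many test locations (and several affordances sharing the same scene) can be evaluated in parallel, which is in the spirit of the multiple-affordance representation described above.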

7.
Front Robot AI ; 7: 126, 2020.
Article in English | MEDLINE | ID: mdl-33501292

ABSTRACT

Environments in which Global Positioning System (GPS), or more generally Global Navigation Satellite System (GNSS), signals are denied or degraded pose problems for the guidance, navigation, and control of autonomous systems. This can make operating in hostile GNSS-impaired environments, such as indoors or in urban and natural canyons, impossible or extremely difficult. Pixel Processor Array (PPA) cameras, in conjunction with other on-board sensors, can be used to address this problem, aiding in tracking, localization, and control. In this paper we demonstrate the use of a PPA device, the SCAMP vision chip, which combines perception and compute capabilities on the same device, to aid real-time navigation and control of aerial robots. A PPA consists of an array of Processing Elements (PEs), each of which features light capture, processing, and storage capabilities. This allows various image processing tasks to be performed efficiently and directly on the sensor itself. We demonstrate visual odometry and target identification running concurrently on board a single PPA vision chip at a combined frequency in the region of 400 Hz. Results from outdoor multirotor test flights are given, along with comparisons against baseline GPS results. The SCAMP PPA's high dynamic range (HDR) and ability to run multiple algorithms at adaptive rates make the sensor well suited to outdoor flight of small UAS in GNSS-challenged or GNSS-denied environments. HDR allows operation to continue during the transition from indoor to outdoor environments, and in other situations where there are significant variations in light levels. Additionally, the PPA only needs to output specific information, such as optic flow and target position, rather than entire images. This significantly reduces the bandwidth required for communication between the sensor and the on-board flight computer, enabling high-frame-rate, low-power operation.
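A back-of-the-envelope sketch of the bandwidth argument made above. The 400 Hz rate comes from the abstract; the resolution and data sizes are illustrative assumptions, not measured figures from the paper.

frame_rate_hz = 400              # combined estimation rate reported in the abstract
resolution = 256 * 256           # assumed pixel count for a PPA-class sensor
bytes_per_pixel = 1              # assumed 8-bit readout

full_frames = frame_rate_hz * resolution * bytes_per_pixel      # ~26 MB/s
flow_and_target = frame_rate_hz * (2 * 4 + 2 * 2)               # 2 float32 flow values + 2 int16 coords
print(f"full frames: {full_frames / 1e6:.1f} MB/s, "
      f"flow + target: {flow_and_target / 1e3:.1f} kB/s")

Under these assumptions, transmitting only the odometry and target estimates is roughly three to four orders of magnitude less data than streaming full frames at the same rate, which is what makes the high-rate, low-power operation practical.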

8.
PLoS One ; 10(6): e0127769, 2015.
Article in English | MEDLINE | ID: mdl-26126116

ABSTRACT

The workflows involved in industrial assembly and production activities are becoming increasingly complex. Performing these workflows efficiently and safely is demanding for workers, particularly for infrequent or repetitive tasks. This burden on workers can be eased by introducing smart assistance systems. This article presents a scalable concept and an integrated system demonstrator designed for this purpose. The basic idea is to learn workflows by observing multiple expert operators and then transfer the learnt workflow models to novice users. Being entirely learning-based, the proposed system can be applied to various tasks and domains. This idea has been realized in a prototype that combines hardware and software components, designed with interoperability in mind, that push the state of the art. The emphasis of this article is on the algorithms developed for the prototype: 1) fusion of inertial and visual sensor information from an on-body sensor network (BSN) to robustly track the user's pose in magnetically polluted environments; 2) learning-based computer vision algorithms to map the workspace, localize the sensor with respect to the workspace, and capture objects, even as they are carried; 3) domain-independent and robust workflow recovery and monitoring algorithms based on spatiotemporal pairwise relations deduced from object and user movement with respect to the scene; and 4) context-sensitive augmented reality (AR) user feedback using a head-mounted display (HMD). A distinguishing feature of the developed algorithms is that they all operate solely on data from the on-body sensor network; no external instrumentation is needed. The feasibility of the chosen approach for the complete action-perception-feedback loop is demonstrated on three increasingly complex datasets representing manual industrial tasks. These limited-size datasets indicate the potential of the chosen technology as a combined entity and also point out limitations of the system.
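A minimal sketch (assumed, not the paper's implementation) of the kind of spatiotemporal pairwise relation used for workflow recovery in item 3: positions of tracked entities over time are reduced to qualitative per-frame symbols that a workflow model could then match. Thresholds and labels are illustrative.

import numpy as np
from itertools import combinations

def pairwise_relations(tracks, near=0.1, still=0.005):
    """tracks: dict name -> (T, 3) array of positions over T frames.
    Returns dict (a, b) -> list of per-frame relation symbols."""
    relations = {}
    for a, b in combinations(sorted(tracks), 2):
        d = np.linalg.norm(tracks[a] - tracks[b], axis=1)       # distance per frame
        symbols = []
        for t in range(1, len(d)):
            prox = "near" if d[t] < near else "far"
            trend = ("approaching" if d[t] < d[t - 1] - still
                     else "receding" if d[t] > d[t - 1] + still
                     else "static")
            symbols.append(f"{prox}/{trend}")
        relations[(a, b)] = symbols
    return relations

Because the relations are defined only between entities the body-worn sensors observe, this style of representation needs no external instrumentation, matching the constraint stated in the abstract.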


Subject(s)
Algorithms; Occupational Health; Workflow; Cognition; Humans; Imaging, Three-Dimensional; Learning; Occupational Medicine; Systems Integration; User-Computer Interface