1.
IEEE Trans Vis Comput Graph ; 30(5): 2767-2775, 2024 May.
Article in English | MEDLINE | ID: mdl-38564356

ABSTRACT

High-precision virtual environments are increasingly important for various education, simulation, training, performance, and entertainment applications. We present HoloCamera, an innovative volumetric capture instrument to rapidly acquire, process, and create cinematic-quality virtual avatars and scenarios. The HoloCamera consists of a custom-designed free-standing structure with 300 high-resolution RGB cameras mounted with uniform spacing spanning the four sides and the ceiling of a room-sized studio. The light field acquired from these cameras is streamed through a distributed array of GPUs that interleave the processing and transmission of 4K resolution images. The distributed compute infrastructure that powers these RGB cameras consists of 50 Jetson AGX Xavier boards, with each processing unit dedicated to driving and processing imagery from six cameras. A high-speed Gigabit Ethernet network fabric seamlessly interconnects all computing boards. In this systems paper, we provide an in-depth description of the steps involved and lessons learned in constructing such a cutting-edge volumetric capture facility that can be generalized to other such facilities. We delve into the techniques employed to achieve precise frame synchronization and spatial calibration of cameras, careful determination of angled camera mounts, image processing from the camera sensors, and the need for a resilient and robust network infrastructure. To advance the field of volumetric capture, we are releasing a high-fidelity static light-field dataset, which will serve as a benchmark for further research and applications of cinematic-quality volumetric light fields.
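As a rough illustration of the scale described above, the sketch below maps the 300 cameras onto the 50 Jetson AGX Xavier boards (six per board) and estimates the per-board streaming load for 4K RGB frames. The camera, board, and cameras-per-board counts come from the abstract; the frame rate, bit depth, and compression ratio are illustrative assumptions, not figures from the paper.

```python
# Back-of-the-envelope sketch of the HoloCamera compute layout described above.
# Camera count, board count, and cameras-per-board come from the abstract;
# frame rate, bit depth, and compression ratio are illustrative assumptions.

NUM_CAMERAS = 300
NUM_BOARDS = 50
CAMERAS_PER_BOARD = NUM_CAMERAS // NUM_BOARDS  # 6, as stated in the abstract

# Assumed capture parameters (NOT from the paper).
WIDTH, HEIGHT = 3840, 2160        # 4K RGB
BYTES_PER_PIXEL = 3               # 8-bit RGB
FPS = 30                          # assumed frame rate
COMPRESSION_RATIO = 50            # assumed on-board compression before streaming

def board_for_camera(camera_id: int) -> int:
    """Map a camera index (0-299) to the Jetson board (0-49) that drives it."""
    return camera_id // CAMERAS_PER_BOARD

def per_board_bandwidth_mbps() -> float:
    """Estimated post-compression streaming load per board, in megabits/s."""
    raw_bytes_per_s = WIDTH * HEIGHT * BYTES_PER_PIXEL * FPS * CAMERAS_PER_BOARD
    return raw_bytes_per_s * 8 / COMPRESSION_RATIO / 1e6

if __name__ == "__main__":
    assert board_for_camera(299) == NUM_BOARDS - 1
    print(f"{CAMERAS_PER_BOARD} cameras per board")
    print(f"~{per_board_bandwidth_mbps():.0f} Mb/s per board under the assumed settings")
```

Under these assumed settings the per-board load lands in the hundreds of megabits per second, which is one way to see why a Gigabit Ethernet fabric and per-board processing are a natural fit for this layout.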

2.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 6820-6831, 2023 Jun.
Article in English | MEDLINE | ID: mdl-34705636

ABSTRACT

Egocentric vision holds great promise for increasing access to visual information and improving the quality of life for blind people. While we strive to improve recognition performance, it remains difficult to identify which object is of interest to the user; the object may not even be included in the frame due to challenges in camera aiming without visual feedback. Also, gaze information, commonly used to infer the area of interest in egocentric vision, is often not dependable. However, blind users tend to include their hand either interacting with the object they wish to recognize or simply placing it in proximity for better camera aiming. We propose a method that leverages the hand as contextual information for recognizing an object of interest. In our method, the output of a pre-trained hand segmentation model is infused into later convolutional layers of our object recognition network, with separate output layers for localization and classification. Using egocentric datasets from sighted and blind individuals, we show that hand-priming achieves more accurate localization than other approaches that encode hand information. Given only object centers along with labels, our method achieves classification performance comparable to the state-of-the-art method that uses bounding boxes with labels.


Subject(s)
Quality of Life, Visually Impaired Persons, Humans, Algorithms, Hand, Visual Perception
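A minimal PyTorch-style sketch of the hand-priming idea described above: a precomputed hand segmentation map is downsampled and fused into a later convolutional stage of an object recognition backbone, which then feeds two separate heads for localization and classification. The layer sizes, fusion operator, and head designs are illustrative assumptions, not the authors' exact architecture.

```python
# Illustrative sketch (not the authors' exact model): fusing a pre-computed
# hand segmentation map into later convolutional layers, with separate
# localization and classification heads.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HandPrimedRecognizer(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Early backbone layers operating on the RGB image alone.
        self.early = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Later layers that also see the (downsampled) hand mask.
        self.late = nn.Sequential(
            nn.Conv2d(64 + 1, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
        )
        # Separate output heads.
        self.loc_head = nn.Conv2d(128, 1, 1)          # heatmap for the object center
        self.cls_head = nn.Linear(128, num_classes)   # object class logits

    def forward(self, image: torch.Tensor, hand_mask: torch.Tensor):
        feats = self.early(image)
        # Resize the hand mask to the feature resolution and concatenate.
        mask = F.interpolate(hand_mask, size=feats.shape[-2:], mode="bilinear",
                             align_corners=False)
        feats = self.late(torch.cat([feats, mask], dim=1))
        center_heatmap = self.loc_head(feats)
        logits = self.cls_head(feats.mean(dim=(2, 3)))  # global average pooling
        return center_heatmap, logits

if __name__ == "__main__":
    model = HandPrimedRecognizer()
    img = torch.randn(1, 3, 224, 224)    # RGB frame
    mask = torch.rand(1, 1, 224, 224)    # output of a hand segmentation model
    heatmap, logits = model(img, mask)
    print(heatmap.shape, logits.shape)   # [1, 1, 28, 28] and [1, 10]
```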
3.
Article in English | MEDLINE | ID: mdl-38545919

ABSTRACT

To ensure that AI-infused systems work for disabled people, we need to bring accessibility datasets sourced from this community into the development lifecycle. However, many ethical and privacy concerns limit greater data inclusion, making such datasets not readily available. We present a pair of studies in which 13 blind participants engage in data-capturing activities and reflect, with and without probing, on various factors that influence their decision to share their data via an AI dataset. We see how different factors influence blind participants' willingness to share study data as they assess risk-benefit tradeoffs. The majority support sharing their data to improve technology but also express concerns over commercial use, associated metadata, and the lack of transparency about the impact of their data. These insights have implications for the development of responsible practices for stewarding accessibility datasets and can contribute to broader discussions in this area.

4.
Web4All (2022) ; 2022 Apr.
Article in English | MEDLINE | ID: mdl-37942017

ABSTRACT

Researchers have adopted remote methods, such as online surveys and video conferencing, to overcome challenges in conducting in-person usability testing, such as participation, user representation, and safety. However, remote user evaluation on hardware testbeds is limited, especially for blind participants, as such methods restrict access to observations of user interactions. We employ smart glasses in usability testing with blind people and share our lessons from a case study conducted in blind participants' homes (N = 12), where the experimenter can access participants' activities via dual video conferencing: a third-person view via a laptop camera and a first-person view via smart glasses worn by the participant. We show that smart glasses hold potential for observing participants' interactions with smartphone testbeds remotely; on average, 58.7% of the interactions were fully captured via the first-person view compared to 3.7% via the third-person view. However, this gain is not uniform across participants, as it is susceptible to head movements that orient the ear towards a sound source, which highlights the need for a more inclusive camera form factor. We also share our lessons learned in dealing with the lack of screen reader support in smart glasses, a rapidly draining battery, and Internet connectivity in remote studies with blind participants.

5.
ASSETS ; 2022 Oct.
Article in English | MEDLINE | ID: mdl-36916963

ABSTRACT

Teachable object recognizers address a very practical need for blind people: instance-level object recognition. However, they assume one can visually inspect the photos provided for training, a critical and inaccessible step for those who are blind. In this work, we engineer data descriptors that address this challenge. They indicate in real time whether the object in the photo is cropped or too small, whether a hand is included, whether the photo is blurred, and how much the photos vary from each other. Our descriptors are built into an open-source testbed iOS app called MYCam. In a remote user study in blind participants' homes (N = 12), we show how descriptors, even when error-prone, support experimentation and have a positive impact on the quality of the training set that can translate to model performance, though this gain is not uniform. Participants found the app simple to use, indicating that they could effectively train it and that the descriptors were useful. However, many found the training tedious, opening discussions around the need for balance between information, time, and cognitive load.
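A minimal sketch of the kind of real-time descriptors mentioned above, computed here with OpenCV: a blur score from the variance of the Laplacian and a cropped/too-small check from an object bounding box. The thresholds and the assumption that a detector already supplies the bounding box are illustrative; this is not the MYCam implementation.

```python
# Illustrative photo-quality descriptors (not the MYCam implementation).
# Assumes an object bounding box is already available, e.g. from a detector.
import cv2
import numpy as np

def blur_score(image_bgr: np.ndarray) -> float:
    """Variance of the Laplacian; lower values suggest a blurrier photo."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())

def object_descriptors(image_shape, box, min_area_ratio=0.05, margin=2):
    """Flag whether the object looks cropped or too small.

    box is (x, y, w, h) in pixels; thresholds are illustrative assumptions.
    """
    img_h, img_w = image_shape[:2]
    x, y, w, h = box
    too_small = (w * h) / float(img_w * img_h) < min_area_ratio
    cropped = (x <= margin or y <= margin or
               x + w >= img_w - margin or y + h >= img_h - margin)
    return {"too_small": too_small, "cropped": cropped}

if __name__ == "__main__":
    img = np.zeros((480, 640, 3), dtype=np.uint8)
    print(blur_score(img))                                   # 0.0 for a flat image
    print(object_descriptors(img.shape, (600, 10, 50, 50)))  # touches the right edge -> cropped
```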

6.
ASSETS ; 2022 Oct.
Article in English | MEDLINE | ID: mdl-36939417

ABSTRACT

As data-driven systems are increasingly deployed at scale, ethical concerns have arisen around unfair and discriminatory outcomes for historically marginalized groups that are underrepresented in training data. In response, work around AI fairness and inclusion has called for datasets that are representative of various demographic groups. In this paper, we contribute an analysis of the representativeness of age, gender, and race & ethnicity in accessibility datasets, i.e., datasets sourced from people with disabilities and older adults, which can potentially play an important role in mitigating bias for inclusive AI-infused applications. We examine the current state of representation within these accessibility datasets by reviewing publicly available information for 190 of them. We find that accessibility datasets represent diverse ages but have gender and race representation gaps. Additionally, we investigate how the sensitive and complex nature of demographic variables (e.g., gender, race & ethnicity) makes classification difficult and inconsistent, with the source of labeling often unknown. By reflecting on the current challenges and opportunities for representation of disabled data contributors, we hope our effort expands the space of possibility for greater inclusion of marginalized communities in AI-infused systems.

7.
Article in English | MEDLINE | ID: mdl-35069983

ABSTRACT

Iteratively building and testing machine learning models can help children develop creativity, flexibility, and comfort with machine learning and artificial intelligence. We explore how children use machine teaching interfaces with a team of 14 children (aged 7-13 years) and adult co-designers. Children trained image classifiers and tested each other's models for robustness. Our study illuminates how children reason about ML concepts, offering these insights for designing machine teaching experiences for children: (i) ML metrics (e.g. confidence scores) should be visible for experimentation; (ii) ML activities should enable children to exchange models for promoting reflection and pattern recognition; and (iii) the interface should allow quick data inspection (e.g. images vs. gestures).

8.
Article in English | MEDLINE | ID: mdl-35187410

ABSTRACT

Negative attitudes shape experiences with stigmatized conditions such as dementia, from affecting social relationships to influencing willingness to adopt technology. Consequently, attitudinal change has been identified as one lever to improve life for people with stigmatized conditions. Though recognized as a scalable approach, social media has not been studied in terms of how it should best be designed or deployed to target attitudes and understanding of dementia. Through a mixed-methods design with 123 undergraduate college students, we study the effect of being exposed to dementia-related media, including content produced by people with dementia. We selected undergraduate college students as the target of our intervention, as they represent the next generation that will work and interact with individuals with dementia. Our analysis describes changes over a period of two weeks in attitudes and understanding of the condition. The shifts in understanding of dementia that we found in our qualitative analysis were not captured by the instrument we selected to assess understanding of dementia. While small improvements in positive and overall attitudes were seen across all interventions and the control, we observe a different pattern with negative attitudes, where transcriptions of content produced by people with dementia significantly reduced negative attitudes. The discussion presents implications for supporting people with dementia as content producers, doing so in ways that best affect attitudes and understanding by drawing on research on cues and interactive media, and supporting students in changing their perspectives towards people with dementia.

9.
ASSETS ; 1, 2021.
Article in English | MEDLINE | ID: mdl-35187541

ABSTRACT

Datasets sourced from people with disabilities and older adults play an important role in innovation, benchmarking, and mitigating bias for both assistive and inclusive AI-infused applications. However, they are scarce. We conduct a systematic review of 137 accessibility datasets manually located across different disciplines over the last 35 years. Our analysis highlights how researchers navigate tensions between benefits and risks in data collection and sharing. We uncover patterns in data collection purpose, terminology, sample size, data types, and data sharing practices across communities of focus. We conclude by critically reflecting on challenges and opportunities related to locating and sharing accessibility datasets, calling for technical, legal, and institutional privacy frameworks that are more attuned to concerns from these communities.

10.
ASSETS ; 17, 2021.
Article in English | MEDLINE | ID: mdl-35187542

ABSTRACT

The majority of online video content remains inaccessible to people with visual impairments due to the lack of audio descriptions depicting the video scenes. Content creators have traditionally relied on professionals to author audio descriptions, but their services are costly and not readily available. We investigate the feasibility of creating more cost-effective audio descriptions that are also of high quality by involving novices. Specifically, we designed, developed, and evaluated ViScene, a web-based collaborative audio description authoring tool that enables a sighted novice author and a reviewer, either sighted or blind, to interact and contribute to scene descriptions (SDs), text that can be transformed into audio through text-to-speech. Through a mixed-design study with N = 60 participants, we assessed the quality of SDs created by sighted novices with feedback from both sighted and blind reviewers. Our results showed that with ViScene novices could produce content that is Descriptive, Objective, Referable, and Clear at a cost of US$2.81 to US$5.48 per video minute (pvm), which is 54% to 96% lower than the professional service. However, the descriptions fell short on other quality dimensions (e.g., learning, a measure of how well an SD conveys the video's intended message). While professional audio describers remain the gold standard, for content creators who cannot afford them, ViScene offers a cost-effective alternative, ultimately leading to a more accessible medium.
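Since SDs are plain text rendered to audio via text-to-speech, a minimal sketch of that last step might look like the following, using the gTTS library as one example engine; the library choice and file naming are assumptions and are not part of ViScene.

```python
# Minimal sketch: turning a scene description (SD) into an audio file via
# text-to-speech. gTTS is used here only as an example engine; ViScene's
# actual pipeline is not described at this level in the abstract.
from gtts import gTTS

def scene_description_to_audio(sd_text: str, out_path: str = "sd.mp3") -> str:
    """Render one SD to an MP3 file and return its path."""
    gTTS(text=sd_text, lang="en").save(out_path)
    return out_path

if __name__ == "__main__":
    print(scene_description_to_audio("A person walks into a dimly lit kitchen."))
```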

11.
ASSETS ; 21, 2021.
Article in English | MEDLINE | ID: mdl-35187543

ABSTRACT

The spatial behavior of passersby can be critical for blind individuals who want to initiate interactions, preserve personal space, or practice social distancing during a pandemic. Among other use cases, wearable cameras employing computer vision can be used to extract proxemic signals of others and thus increase access to the spatial behavior of passersby for blind people. Analyzing data collected in a study with blind (N=10) and sighted (N=40) participants, we explore: (i) visual information on approaching passersby captured by a head-worn camera; (ii) pedestrian detection algorithms for extracting proxemic signals such as passerby presence, relative position, distance, and head pose; and (iii) opportunities and limitations of using wearable cameras to help blind people access proxemics related to nearby people. Our observations and findings provide insights into dyadic behaviors for assistive pedestrian detection and lead to implications for the design of future head-worn cameras and interactions.
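One of the proxemic signals mentioned above, distance, can be roughly estimated from a pedestrian detector's bounding box with a pinhole-camera approximation. The sketch below assumes a known focal length in pixels and an average pedestrian height; neither value comes from the paper, and this is not the study's detection pipeline.

```python
# Rough pinhole-camera distance estimate from a pedestrian bounding box.
# Focal length and assumed person height are illustrative, not from the study.
from dataclasses import dataclass

ASSUMED_PERSON_HEIGHT_M = 1.7   # average standing height (assumption)
FOCAL_LENGTH_PX = 900.0         # camera focal length in pixels (assumption)

@dataclass
class Detection:
    x: float   # left edge of the box, pixels
    y: float   # top edge of the box, pixels
    w: float   # box width, pixels
    h: float   # box height, pixels

def estimate_distance_m(det: Detection) -> float:
    """Distance ~= focal_length * real_height / pixel_height."""
    return FOCAL_LENGTH_PX * ASSUMED_PERSON_HEIGHT_M / det.h

def relative_position(det: Detection, frame_width: int) -> str:
    """Coarse left/center/right position of the passerby in the frame."""
    cx = det.x + det.w / 2
    if cx < frame_width / 3:
        return "left"
    if cx > 2 * frame_width / 3:
        return "right"
    return "center"

if __name__ == "__main__":
    det = Detection(x=500, y=100, w=120, h=420)
    print(f"{estimate_distance_m(det):.1f} m, {relative_position(det, 1280)}")
```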

12.
ASSETS ; 93, 2021.
Article in English | MEDLINE | ID: mdl-35187544

ABSTRACT

Audio descriptions (ADs) can increase access to videos for blind people. Researchers have explored different mechanisms for generating ADs, with some of the most recent studies involving paid novices; to improve the quality of their ADs, novices receive feedback from reviewers. However, reviewer feedback is not instantaneous. To explore the potential for real-time feedback through automation, in this paper we analyze 1,120 comments that 40 sighted novices received from a sighted or a blind reviewer. We find that feedback patterns tend to fall under four themes: (i) Quality: comments on different AD quality variables; (ii) Speech Act: the utterance or speech action that the reviewers used; (iii) Required Action: the recommended action the authors should take to improve the AD; and (iv) Guidance: the additional help the reviewers gave to the authors. We discuss which of these patterns could be automated within the review process as design implications for future AD collaborative authoring systems.
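As a toy illustration of the automation opportunity discussed above, a keyword-based tagger could route reviewer comments to the four themes; the keyword lists below are invented for illustration and are not derived from the paper's codebook.

```python
# Toy keyword-based tagger for reviewer comments (illustrative only; the
# keyword lists are NOT the paper's codebook).
THEME_KEYWORDS = {
    "Quality": ["clear", "objective", "descriptive", "accurate", "succinct"],
    "Speech Act": ["suggest", "ask", "praise", "question"],
    "Required Action": ["add", "remove", "rephrase", "shorten", "fix"],
    "Guidance": ["example", "for instance", "refer to", "guideline"],
}

def tag_comment(comment: str) -> list[str]:
    """Return every theme whose keywords appear in the comment."""
    text = comment.lower()
    return [theme for theme, words in THEME_KEYWORDS.items()
            if any(word in text for word in words)] or ["Untagged"]

if __name__ == "__main__":
    print(tag_comment("Please rephrase this sentence and add what the character is wearing."))
    # -> ['Required Action']
```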

13.
IEEE Winter Conf Appl Comput Vis ; 2020: 3411-3421, 2020 Mar.
Article in English | MEDLINE | ID: mdl-32803098

ABSTRACT

Egocentric vision holds great promise for increasing access to visual information and improving the quality of life for people with visual impairments, with object recognition being one of the daily challenges for this population. While we strive to improve recognition performance, it remains difficult to identify which object is of interest to the user; the object may not even be included in the frame due to challenges in camera aiming without visual feedback. Also, gaze information, commonly used to infer the area of interest in egocentric vision, is often not dependable. However, blind users tend to include their hand either interacting with the object that they wish to recognize or simply placing it in proximity for better camera aiming. We propose localization models that leverage the presence of the hand as contextual information for priming the center area of the object of interest. In our approach, hand segmentation is fed to either the entire localization network or its last convolutional layers. Using egocentric datasets from sighted and blind individuals, we show that hand-priming achieves higher precision than other approaches, such as fine-tuning, multi-class, and multi-task learning, which also encode hand-object interactions in localization.

14.
Article in English | MEDLINE | ID: mdl-32803195

ABSTRACT

Blind people have limited access to information about their surroundings, which is important for ensuring one's safety, managing social interactions, and identifying approaching pedestrians. With advances in computer vision, wearable cameras can provide equitable access to such information. However, the always-on nature of these assistive technologies poses privacy concerns for parties that may get recorded. We explore this tension from both perspectives, those of sighted passersby and blind users, taking into account camera visibility, in-person versus remote experience, and extracted visual information. We conduct two studies: an online survey with MTurkers (N=206) and an in-person experience study between pairs of blind (N=10) and sighted (N=40) participants, where blind participants wear a working prototype for pedestrian detection and pass by sighted participants. Our results suggest that the perspectives of both users and bystanders, along with the factors mentioned above, need to be carefully considered to mitigate potential social tensions.

15.
ASSETS ; 2020 Oct.
Article in English | MEDLINE | ID: mdl-34282410

ABSTRACT

Audio descriptions can make the visual content in videos accessible to people with visual impairments. However, the majority of online videos lack audio descriptions, due in part to the shortage of experts who can create high-quality descriptions. We present ViScene, a web-based authoring tool that taps into the larger pool of sighted non-experts and helps them generate high-quality descriptions via two feedback mechanisms: succinct visualizations and comments from an expert. Through a mixed-design study with N = 6 participants, we explore the usability of ViScene and the quality of the descriptions created by sighted non-experts with and without feedback comments. Our results indicate that non-experts can produce better descriptions with feedback comments; preliminary insights also highlight the role that people with visual impairments can play in providing this feedback.

16.
ASSETS ; 72, 2020.
Article in English | MEDLINE | ID: mdl-34423335

ABSTRACT

Datasets and data sharing play an important role in innovation, benchmarking, mitigating bias, and understanding the complexity of real-world AI-infused applications. However, there is a scarcity of available data generated by people with disabilities with the potential for training or evaluating machine learning models. This is partially due to smaller populations, disparate characteristics, and a lack of expertise for data annotation, as well as privacy concerns. Even when data are collected and made publicly available, they are often difficult to locate. We present a novel data-surfacing repository, called IncluSet, that allows researchers and the disability community to discover and link accessibility datasets. The repository is pre-populated with information about 139 existing datasets: 65 made publicly available, 25 available upon request, and 49 not shared by the authors but described in their manuscripts. More importantly, IncluSet is designed to expose existing and new dataset contributions so they may be discoverable through Google Dataset Search.
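Google Dataset Search discovers datasets primarily through schema.org Dataset markup embedded in web pages, so a repository like the one described above would typically emit metadata of this shape for each entry. The sketch below generates such JSON-LD in Python; the field values are placeholders, and this is not IncluSet's actual code.

```python
# Sketch: emitting schema.org Dataset JSON-LD so a dataset page can be picked
# up by Google Dataset Search. Field values are placeholders, not IncluSet data.
import json

def dataset_jsonld(name: str, description: str, url: str, license_url: str) -> str:
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
    }
    # This JSON-LD would be embedded in the dataset's landing page inside a
    # <script type="application/ld+json"> tag.
    return json.dumps(record, indent=2)

if __name__ == "__main__":
    print(dataset_jsonld(
        name="Example accessibility dataset",
        description="Placeholder description of a dataset sourced from blind participants.",
        url="https://example.org/datasets/example",
        license_url="https://creativecommons.org/licenses/by/4.0/",
    ))
```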

17.
Article in English | MEDLINE | ID: mdl-32043091

ABSTRACT

Camera manipulation confounds the use of object recognition applications by blind people. This is exacerbated when photos from this population are also used to train models, as with teachable machines, where out-of-frame or partially included objects against cluttered backgrounds degrade performance. Leveraging prior evidence on the ability of blind people to coordinate hand movements using proprioception, we propose a deep learning system that jointly models hand segmentation and object localization for object classification. We investigate the utility of hands as a natural interface for including and indicating the object of interest in the camera frame. We confirm the potential of this approach by analyzing existing datasets from people with visual impairments for object recognition. With a new publicly available egocentric dataset and an extensive error analysis, we provide insights into this approach in the context of teachable recognizers.

18.
ASSETS ; 2019: 83-95, 2019 Oct.
Article in English | MEDLINE | ID: mdl-32783045

ABSTRACT

For people with visual impairments, photography is essential in identifying objects through remote sighted help and image recognition apps. This is especially the case for teachable object recognizers, where recognition models are trained on users' photos. Here, we propose real-time feedback for communicating the location of an object of interest in the camera frame. Our audio-haptic feedback is powered by a deep learning model that estimates the object center location based on its proximity to the user's hand. To evaluate our approach, we conducted a user study in the lab, where participants with visual impairments (N = 9) used our feedback to train and test their object recognizer in vanilla and cluttered environments. We found that very few photos did not include the object (2% in the vanilla environment and 8% in the cluttered one) and that recognition performance was promising even for participants with no prior camera experience. Participants tended to trust the feedback even though they knew it could be wrong. Our cluster analysis indicates that better feedback is associated with photos that include the entire object. Our results provide insights into factors that can degrade feedback and recognition performance in teachable interfaces.
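A minimal sketch of how real-time feedback like the one described above could map an estimated object center to a spoken or haptic cue: the framing thresholds and cue wording are illustrative assumptions, not the study's implementation, and the object center is assumed to come from a model such as the one the abstract describes.

```python
# Illustrative mapping from an estimated object center to a directional cue
# (not the study's implementation; thresholds and wording are assumptions).
def framing_cue(cx: float, cy: float, frame_w: int, frame_h: int,
                center_tolerance: float = 0.15) -> str:
    """Return a short cue telling the user how to aim the camera.

    (cx, cy) is the estimated object center in pixels, with y increasing downward.
    """
    # Normalized offsets from the frame center, each in [-0.5, 0.5].
    dx = cx / frame_w - 0.5
    dy = cy / frame_h - 0.5
    if abs(dx) <= center_tolerance and abs(dy) <= center_tolerance:
        return "object centered"
    # Guide the aim toward where the object currently appears in the frame.
    horizontal = "aim left" if dx < 0 else "aim right"
    vertical = "aim up" if dy < 0 else "aim down"
    return horizontal if abs(dx) >= abs(dy) else vertical

if __name__ == "__main__":
    print(framing_cue(200, 540, 1920, 1080))   # object far to the left -> "aim left"
```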
