1.
Sensors (Basel) ; 24(12)2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38931706

ABSTRACT

The remarkable human ability to predict others' intent during physical interactions develops at a very early age and is crucial for development. Intent prediction, defined as the simultaneous recognition and generation of human-human interactions, has many applications, such as assistive robotics, human-robot interaction, video and robotic surveillance, and autonomous driving. However, models for solving the problem are scarce. This paper proposes two attention-based agent models to predict the intent of interacting 3D skeletons by sampling them via a sequence of glimpses. The novelty of these agent models is that they are inherently multimodal, consisting of perceptual and proprioceptive pathways. The action (attention) is driven by the agent's generation error, not by reinforcement. At each sampling instant, the agent completes the partially observed skeletal motion and infers the interaction class. It learns where and what to sample by minimizing the generation and classification errors. Extensive evaluation of our models is carried out on benchmark datasets and in comparison to a state-of-the-art model for intent prediction, which reveals that the classification and generation accuracies of one of the proposed models are comparable to those of the state of the art even though our model contains fewer trainable parameters. The insights gained from our model designs can inform the development of efficient agents for the future of artificial intelligence (AI).


Subject(s)
Algorithms , Humans , Robotics/methods , Attention/physiology
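
The abstract above outlines a glimpse-driven, multimodal agent whose attention is steered by generation error rather than reward. A minimal sketch of such an agent loop is given below; the module names, dimensions, and loss weighting are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GlimpseAgent(nn.Module):
    """Hypothetical two-pathway agent: perceptual (what was seen) and
    proprioceptive (where it looked), driving generation and classification."""
    def __init__(self, joint_dim=75, hidden=128, n_classes=8):
        super().__init__()
        self.perceptual = nn.GRU(joint_dim, hidden, batch_first=True)
        self.proprioceptive = nn.GRU(2, hidden, batch_first=True)
        self.generator = nn.Linear(2 * hidden, joint_dim)    # completes the motion
        self.classifier = nn.Linear(2 * hidden, n_classes)   # infers the interaction class

    def forward(self, glimpses, locations):
        # glimpses: (B, T, joint_dim) sampled skeleton frames
        # locations: (B, T, 2) where each glimpse was taken
        _, h_see = self.perceptual(glimpses)
        _, h_act = self.proprioceptive(locations)
        state = torch.cat([h_see[-1], h_act[-1]], dim=-1)
        return self.generator(state), self.classifier(state)

agent = GlimpseAgent()
glimpses, locations = torch.randn(4, 5, 75), torch.rand(4, 5, 2)
next_frame, logits = agent(glimpses, locations)
# Both errors are minimized; the generation error also drives where to glimpse next.
loss = nn.functional.mse_loss(next_frame, torch.randn(4, 75)) \
     + nn.functional.cross_entropy(logits, torch.randint(0, 8, (4,)))
loss.backward()
```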
2.
Sci Rep ; 13(1): 3305, 2023 02 27.
Article in English | MEDLINE | ID: mdl-36849543

ABSTRACT

Multiple attention-based models that recognize objects via a sequence of glimpses have reported results on handwritten numeral recognition. However, no attention-tracking data for handwritten numeral or alphabet recognition is available. Availability of such data would allow attention-based models to be evaluated against human performance. We collect mouse-click attention-tracking data from 382 participants trying to recognize handwritten numerals and alphabets (upper- and lowercase) from images via sequential sampling. Images from benchmark datasets are presented as stimuli. The collected dataset, called AttentionMNIST, consists of a sequence of sample (mouse click) locations, the predicted class label(s) at each sampling, and the duration of each sampling. On average, our participants observe only 12.8% of an image for recognition. We propose a baseline model to predict the location and the class(es) a participant will select at the next sampling. When exposed to the same stimuli and experimental conditions as our participants, a highly cited attention-based reinforcement learning model falls short of human efficiency.


Subject(s)
Benchmarking , Recognition, Psychology , Humans , Reinforcement, Psychology
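
The dataset described above pairs each stimulus with a sequence of sample locations, per-sample class guesses, and per-sample durations. A hypothetical record layout in that spirit is sketched below; the field names, patch size, and image size are assumptions, not the released schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AttentionTrial:
    stimulus_id: str                     # benchmark image shown to the participant
    true_label: str                      # ground-truth numeral or letter
    click_xy: List[Tuple[int, int]]      # mouse-click (sample) locations, in order
    guesses: List[List[str]]             # class label(s) selected after each sample
    durations_ms: List[float]            # time spent on each sample

    def observed_fraction(self, patch: int = 28, image: int = 112) -> float:
        """Rough fraction of the image revealed, assuming square glimpse patches."""
        revealed = {(x // patch, y // patch) for x, y in self.click_xy}
        return len(revealed) / (image // patch) ** 2

trial = AttentionTrial("mnist-0421", "7", [(30, 15), (60, 80)], [["1", "7"], ["7"]], [820.0, 610.0])
print(trial.observed_fraction())   # 0.125 here, in the spirit of the 12.8% figure above
```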
3.
Front Psychol ; 9: 5, 2018.
Article in English | MEDLINE | ID: mdl-29441027

ABSTRACT

Cross-situational learning and social pragmatic theories are prominent mechanisms for learning word meanings (i.e., word-object pairs). In this paper, the role of reinforcement in early word learning by an artificial agent is investigated. When exposed to a group of speakers, the agent comes to understand an initial set of vocabulary items belonging to the language used by the group. Both cross-situational learning and social pragmatic theory are taken into account. As social cues, joint attention and prosodic cues in the caregiver's speech are considered. During agent-caregiver interaction, the agent selects a word from the caregiver's utterance and learns the relations between that word and the objects in its visual environment. The "novel words to novel objects" language-specific constraint is assumed for computing rewards. The models are learned by maximizing the expected reward using reinforcement learning algorithms [i.e., table-based algorithms: Q-learning, SARSA, SARSA-λ, and neural network-based algorithms: Q-learning for neural network (Q-NN), neural-fitted Q-network (NFQ), and deep Q-network (DQN)]. Neural network-based reinforcement learning models are chosen over table-based models for better generalization and quicker convergence. Simulations are carried out using the mother-infant interaction CHILDES dataset for learning word-object pairings. Reinforcement is modeled in two cross-situational learning cases: (1) with joint attention (Attentional models), and (2) with joint attention and prosodic cues (Attentional-prosodic models). Attentional-prosodic models outperform Attentional models on the word-learning task. The Attentional-prosodic DQN outperforms existing word-learning models for the same task.
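
As a rough illustration of the table-based variants listed above, the sketch below runs one Q-learning update for a single word-object choice. The state/action encoding, reward values, and hyperparameters are assumptions; the full models also use the joint-attention and prosodic cues described in the abstract.

```python
import random
from collections import defaultdict

Q = defaultdict(float)                 # Q[(word, obj)] -> estimated value
alpha, eps = 0.1, 0.2                  # learning rate and exploration rate (assumed)
known_words, known_objects = set(), set()

def reward(word, obj, true_pairs):
    if (word, obj) in true_pairs:
        return 1.0
    if word not in known_words and obj not in known_objects:
        return 0.2                     # loose stand-in for the novel-word/novel-object constraint
    return 0.0

def step(word, visible_objects, true_pairs):
    # epsilon-greedy choice of an object for the selected word
    if random.random() < eps:
        obj = random.choice(visible_objects)
    else:
        obj = max(visible_objects, key=lambda o: Q[(word, o)])
    r = reward(word, obj, true_pairs)
    Q[(word, obj)] += alpha * (r - Q[(word, obj)])   # one-step update toward the reward
    known_words.add(word); known_objects.add(obj)
    return obj, r

# Toy interaction: the caregiver says "ball" while a ball and a cup are visible.
print(step("ball", ["ball", "cup"], {("ball", "ball")}))
```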

4.
J Acoust Soc Am ; 142(1): EL102, 2017 07.
Article in English | MEDLINE | ID: mdl-28764471

ABSTRACT

A corpus of recordings of deaf speech is introduced. Adults who were pre- or post-lingually deafened, as well as those with normal hearing, read standardized speech passages totaling 11 h of .wav recordings. Preliminary acoustic analyses are included to provide a glimpse of the kinds of analyses that can be conducted with this corpus. Long-term average speech spectra as well as spectral moment analyses provide considerable insight into differences observed in the speech of talkers judged to have low, medium, or high speech intelligibility.
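
The two analyses named above, long-term average spectra and spectral moments, can be approximated from a recording along the following lines; the file name, Welch settings, and moment definitions are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import welch

fs, x = wavfile.read("passage.wav")          # hypothetical corpus file
x = x.astype(float)
if x.ndim > 1:                               # fold stereo to mono if needed
    x = x.mean(axis=1)

f, ltas = welch(x, fs=fs, nperseg=2048)      # long-term average spectrum

p = ltas / ltas.sum()                        # treat the LTAS as a distribution over frequency
centroid = np.sum(f * p)                               # 1st spectral moment (Hz)
spread = np.sqrt(np.sum(((f - centroid) ** 2) * p))    # 2nd moment: spectral spread (Hz)
skewness = np.sum(((f - centroid) ** 3) * p) / spread ** 3
kurtosis = np.sum(((f - centroid) ** 4) * p) / spread ** 4
```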

5.
J Acoust Soc Am ; 138(3): EL229-35, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26428818

ABSTRACT

The problem of nonlinear acoustic-to-articulatory inversion mapping is investigated in the feature space using two models: the deep belief network (DBN), which is the state of the art, and the general regression neural network (GRNN). The task is to estimate a set of articulatory features for improved speech recognition. Experiments with the MOCHA-TIMIT and MNGU0 databases reveal that, for speech inversion, GRNN yields a lower root-mean-square error and a higher correlation than DBN. It is also shown that the conjunction of acoustic and GRNN-estimated articulatory features yields state-of-the-art accuracy in broad-class phonetic classification and phoneme recognition using less computational power.


Subject(s)
Neural Networks, Computer , Phonetics , Speech Acoustics , Speech Production Measurement/methods , Computer Simulation , Databases, Factual , Female , Humans , Male , Nonlinear Dynamics , Numerical Analysis, Computer-Assisted , Pattern Recognition, Automated , Regression Analysis , Reproducibility of Results
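
For readers unfamiliar with the GRNN, the sketch below gives a minimal version of the mapping it computes: each articulatory frame is a kernel-weighted average of training targets. Feature dimensions, the kernel bandwidth, and the toy data are assumptions, not the experimental setup.

```python
import numpy as np

def grnn_predict(X_train, Y_train, X_test, sigma=0.5):
    """X_train: (N, d_acoustic), Y_train: (N, d_articulatory), X_test: (M, d_acoustic)."""
    # Squared Euclidean distance from every test frame to every training frame: (M, N)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))        # Gaussian kernel weights
    w /= w.sum(axis=1, keepdims=True)           # normalize per test frame
    return w @ Y_train                          # kernel-weighted average of targets

rng = np.random.default_rng(0)
X_tr, Y_tr = rng.normal(size=(200, 13)), rng.normal(size=(200, 6))   # toy "acoustic"/"articulatory" frames
Y_hat = grnn_predict(X_tr, Y_tr, rng.normal(size=(10, 13)))
print(Y_hat.shape)                              # (10, 6): one articulatory vector per test frame
```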
6.
Top Cogn Sci ; 3(4): 760-77, 2011 Oct.
Article in English | MEDLINE | ID: mdl-25164509

ABSTRACT

Diagrams are a form of spatial representation that supports reasoning and problem solving. Even when diagrams are external, and especially when there are no external representations, problem solving often calls for internal representations, that is, representations in cognition, of diagrammatic elements and internal perceptions on them. General cognitive architectures, Soar and ACT-R to name the most prominent, do not have representations and operations to support diagrammatic reasoning. In this article, we examine some requirements for such internal representations and processes in cognitive architectures. We discuss the degree to which DRS, our earlier proposal for such an internal representation for diagrams, meets these requirements. In DRS, diagrams are not raw images but compositions of objects that can be individuated and thus symbolized, while, unlike traditional symbols, the referent of each symbol is an object that retains its perceptual essence, namely, its spatiality. This duality provides a way to resolve what anti-imagists thought was a contradiction in mental imagery: the compositionality of mental images, which seemed to be unique to symbol systems, and their support of a perceptual experience of images and some types of perception on them. We briefly review the use of DRS to augment Soar and ACT-R with a diagrammatic representation component. We identify issues for further research.


Subject(s)
Cognition/physiology , Imagination/physiology , Models, Psychological , Problem Solving/physiology , Visual Perception/physiology , Humans
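
A toy illustration of the duality described above, an object that is individuated as a symbol yet retains its spatial extent, might look like the following; the class and method names are illustrative, not the DRS specification.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class DiagramObject:
    symbol: str                                  # individuated, nameable token
    bbox: Tuple[float, float, float, float]      # retained spatiality: x0, y0, x1, y1

    def left_of(self, other: "DiagramObject") -> bool:
        """A perceptual operation computed from the objects' spatial extents."""
        return self.bbox[2] <= other.bbox[0]

# Composition of objects (symbolic) that still supports spatial perception.
a = DiagramObject("circle-A", (0.0, 0.0, 1.0, 1.0))
b = DiagramObject("square-B", (2.0, 0.0, 3.0, 1.0))
assert a.left_of(b)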
7.
IEEE Trans Neural Netw ; 18(5): 1463-71, 2007 Sep.
Article in English | MEDLINE | ID: mdl-18220194

ABSTRACT

The phenomenon of self-organization has been of special interest to the neural network community throughout the last couple of decades. In this paper, we study a variant of the self-organizing map (SOM) that models the self-organization of the particles forming a string when the string is tightened from one or both of its ends. The proposed variant, called the string-tightening self-organizing neural network (STON), can be used to solve certain practical problems, such as computing shortest homotopic paths, smoothing paths to avoid sharp turns, and computing convex hulls. These problems are of considerable interest in computational geometry, robotics path planning, artificial intelligence (AI) (diagrammatic reasoning), very large scale integration (VLSI) routing, and geographical information systems. Given a set of obstacles and a string with two fixed terminal points in a 2-D space, the STON model continuously tightens the given string until the unique shortest configuration in terms of the Euclidean metric is reached. The STON minimizes the total length of a string on convergence by dynamically creating and selecting feature vectors in a competitive manner. A proof of correctness of this anytime algorithm and experimental results obtained by its deployment are presented in the paper.


Subject(s)
Algorithms , Models, Theoretical , Nerve Net , Pattern Recognition, Automated/methods , Computer Simulation , Reproducibility of Results , Sensitivity and Specificity
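
The sketch below is a heavily reduced, obstacle-free illustration of the tightening idea: interior points of a polyline with fixed endpoints are repeatedly pulled toward their neighbors' midpoints, shortening the string. STON's competitive creation and selection of feature vectors, and its handling of obstacles, are omitted.

```python
import numpy as np

def tighten(points, iters=200, step=0.5):
    pts = np.asarray(points, dtype=float)
    for _ in range(iters):
        mid = 0.5 * (pts[:-2] + pts[2:])         # midpoints of each interior point's neighbors
        pts[1:-1] += step * (mid - pts[1:-1])    # move interior points; endpoints stay fixed
        # In the full model, points falling inside an obstacle would be pushed back
        # outside it here, so the string wraps tightly around obstacles instead.
    return pts

string = [(0, 0), (1, 3), (2, -2), (3, 4), (4, 0)]   # jagged initial configuration
print(tighten(string))                               # approaches the straight segment between the endpoints
```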