Results 1 - 15 of 15
1.
J Integr Neurosci ; 22(5): 124, 2023 Aug 16.
Article in English | MEDLINE | ID: mdl-37735137

ABSTRACT

BACKGROUND: Simulating the whole processing route of speech production and speech perception on a computer in a neurobiologically inspired way remains a challenge. Only a few neural models of speech production exist, and these concentrate on either the cognitive-linguistic component or the lower-level sensorimotor component of speech production and speech perception. Moreover, these existing models are second-generation neural network models using rate-based neuron approaches. The aim of this paper is to describe recent work developing a third-generation spiking-neuron neural network capable of modeling the whole process of speech production, including its cognitive and sensorimotor components.

METHODS: Our neural model of speech production was developed within the Neural Engineering Framework (NEF), incorporating the Semantic Pointer Architecture (SPA), which allows the construction of large-scale neural models of the functioning brain based on only a few essential and neurobiologically well-grounded modeling or construction elements (i.e., single spiking neurons, neural connections, neuron ensembles, state buffers, associative memories, modules for binding and unbinding of states, modules for time-scale generation (oscillators) and ramp-signal generation (integrators), modules for input-signal processing, modules for action selection, etc.).

RESULTS: We demonstrated that this modeling approach is capable of constructing a fully functional model of speech production from these modeling elements (i.e., biologically motivated spiking-neuron micro-circuits or micro-networks). The model is capable of (i) modeling the whole processing chain of speech production and, in part, of speech perception based on leaky integrate-and-fire spiking neurons and (ii) simulating (macroscopic) speaking behavior in a realistic way using neurobiologically plausible (microscopic) neural construction elements.

CONCLUSIONS: The model presented here is a promising approach for describing speech processing in a bottom-up manner, using a set of micro-circuit neural network elements to generate a large-scale neural network. In addition, the model conforms to top-down designs, as it can be condensed into the box-and-arrow models derived from functional imaging and electrophysiological data acquired during speech-processing tasks.
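The NEF construction elements listed in the abstract correspond closely to the primitives of the Nengo simulator. The following minimal sketch is not the authors' model; it only illustrates, under assumed ensemble sizes and an assumed scalar input, two of the named elements: a spiking LIF ensemble acting as a state buffer and a recurrently connected ensemble acting as a ramp-signal generator (integrator).

```python
# Minimal sketch of NEF construction elements in Nengo (not the authors' model).
# Assumptions: ensemble sizes, the 1-D represented state, and the constant input
# are illustrative choices, not values from the paper.
import nengo

model = nengo.Network(label="NEF building blocks")
with model:
    # Input-signal processing: a constant "go" signal feeding the network.
    go = nengo.Node(output=1.0)

    # State buffer: a population of spiking LIF neurons representing a scalar.
    state = nengo.Ensemble(n_neurons=200, dimensions=1, neuron_type=nengo.LIF())

    # Ramp-signal generation (integrator): a recurrently connected ensemble
    # that accumulates its input over time.
    ramp = nengo.Ensemble(n_neurons=200, dimensions=1, neuron_type=nengo.LIF())
    tau = 0.1
    nengo.Connection(ramp, ramp, synapse=tau)               # recurrent memory
    nengo.Connection(go, ramp, transform=tau, synapse=tau)  # scaled input

    nengo.Connection(go, state, synapse=0.01)
    p_state = nengo.Probe(state, synapse=0.01)
    p_ramp = nengo.Probe(ramp, synapse=0.01)

with nengo.Simulator(model) as sim:
    sim.run(1.0)

print(sim.data[p_state][-1], sim.data[p_ramp][-1])  # state near 1, ramp rising
```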


Subject(s)
Semantics , Speech , Neurons , Neural Networks, Computer , Computer Simulation
2.
Front Hum Neurosci ; 16: 844529, 2022.
Article in English | MEDLINE | ID: mdl-35634209

ABSTRACT

A broad sketch of a model of speech production is outlined that describes developmental aspects of its cognitive-linguistic and sensorimotor components. A description of the emergence of phonological knowledge is a central point of our model sketch. It is shown that the phonological form level emerges during speech acquisition and becomes an important representation at the interface between cognitive-linguistic and sensorimotor processes. Motor planning and motor programming are defined as separate processes in our model sketch, and both are shown to draw on phonological information. Two computational simulation experiments based on quantitative implementations (simulation models) are undertaken as a proof of principle for key ideas of the model sketch: (i) the emergence of phonological information over developmental stages, (ii) the adaptation process for generating new motor programs, and (iii) the importance of various forms of phonological representation in that process. Based on the ideas developed within our sketch of a production model and their quantitative realization in the simulation models, motor planning can be defined as the process of identifying a succession of executable chunks within a currently activated phoneme sequence and of coding them as raw gesture scores. Motor programming can be defined as the process of building up the complete set of motor commands by specifying all gestures in detail (a fully specified gesture score including temporal relations). This full specification of gesture scores is achieved in our model either by adapting motor information from phonologically similar syllables (adapting approach) or by assembling motor programs from sub-syllabic units (assembling approach).
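As a toy illustration of the planning/programming distinction drawn in the abstract (not the authors' simulation models), the sketch below chunks a phoneme string into syllable-sized units and then fills in each motor program either by adapting a phonologically similar stored syllable or by assembling it from sub-syllabic units. The syllable repository, similarity measure, and "program" placeholders are all invented for illustration.

```python
# Toy sketch of the adapting vs. assembling routes described above.
# The stored syllables, similarity measure, and "programs" are hypothetical
# placeholders, not data or code from the paper.
from difflib import SequenceMatcher

# Hypothetical action repository: frequent syllables with stored motor programs.
REPOSITORY = {"ba": "program<ba>", "na": "program<na>", "ta": "program<ta>"}
SUBSYLLABIC = {"b": "gesture<b>", "l": "gesture<l>", "a": "gesture<a>"}

def plan(phonemes, syllable_boundaries):
    """Motor planning: chunk the phoneme sequence into executable syllables."""
    chunks, start = [], 0
    for end in syllable_boundaries:
        chunks.append("".join(phonemes[start:end]))
        start = end
    return chunks  # raw gesture scores, one per syllable

def program(syllable, similarity_threshold=0.6):
    """Motor programming: fully specify one syllable's gesture score."""
    if syllable in REPOSITORY:                      # frequent syllable: direct access
        return REPOSITORY[syllable]
    best = max(REPOSITORY, key=lambda s: SequenceMatcher(None, s, syllable).ratio())
    if SequenceMatcher(None, best, syllable).ratio() >= similarity_threshold:
        return f"adapted({REPOSITORY[best]} -> {syllable})"    # adapting approach
    return "+".join(SUBSYLLABIC[p] for p in syllable)          # assembling approach

for syl in plan(list("bala"), syllable_boundaries=[2, 4]):
    print(syl, "->", program(syl))
```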

3.
Front Robot AI ; 9: 796739, 2022.
Article in English | MEDLINE | ID: mdl-35494539

ABSTRACT

Modeling speech production and speech articulation is still an evolving research topic. Some current core questions are: What is the underlying (neural) organization for controlling speech articulation? How can speech articulators such as the lips and tongue, and their movements, be modeled in an efficient yet biologically realistic way? How can high-quality articulatory-acoustic models be developed that lead to high-quality articulatory speech synthesis? On the one hand, computer modeling will help us to uncover the underlying biological as well as acoustic-articulatory concepts of speech production; on the other hand, further modeling efforts will help us to reach the goal of high-quality articulatory-acoustic speech synthesis based on more detailed knowledge of vocal tract acoustics and speech articulation. Currently, articulatory models are not able to reach the quality level of corpus-based speech synthesis. Moreover, biomechanical and neuromuscular approaches are complex and still not usable for sentence-level speech synthesis. This paper surveys many computer-implemented articulatory models and provides criteria for dividing articulatory models into different categories. A major recent research question, namely how to control articulatory models in a neurobiologically adequate manner, is discussed in detail. It can be concluded that there is a strong need to develop articulatory-acoustic models further, both to test quantitative neurobiologically based control concepts for speech articulation and to uncover the remaining details of human articulatory and acoustic signal generation. Furthermore, these efforts may help us to approach the goal of establishing high-quality articulatory-acoustic as well as neurobiologically grounded speech synthesis.

4.
Front Comput Neurosci ; 14: 573554, 2020.
Article in English | MEDLINE | ID: mdl-33262697

ABSTRACT

Our understanding of the neurofunctional mechanisms of speech production and their pathologies is still incomplete. In this paper, a comprehensive model of speech production based on the Neural Engineering Framework (NEF) is presented. This model is able to activate sensorimotor plans based on cognitive-functional processes (i.e., generation of the intention of an utterance, selection of words and syntactic frames, generation of the phonological form and motor plan; the feedforward mechanism). Since the generation of the different states of an utterance is tied to different levels in the speech production hierarchy, it is shown that different forms of speech errors as well as speech disorders arise at, or are linked to, different levels and different modules of the speech production model. In addition, the influence of the inner feedback mechanisms on normal as well as on disordered speech is examined in terms of the model. The model uses a small number of core concepts provided by the NEF, and we show that these are sufficient to create this neurobiologically detailed model of the complex process of speech production in a manner that is, we believe, clear, efficient, and understandable.

5.
Front Psychol ; 11: 1594, 2020.
Article in English | MEDLINE | ID: mdl-32774315

ABSTRACT

BACKGROUND: To produce and understand words, humans access the mental lexicon. From a functional perspective, the long-term memory component of the mental lexicon comprises three levels: the concept level, the lemma level, and the phonological level. Different kinds of word information are stored at each level. Semantic as well as phonological cues can help to facilitate word access during a naming task, especially when neural dysfunctions are present. The processing corresponding to word access occurs in specific parts of working memory. Neural models that simulate speech processing help to uncover the complex relationships between neural dysfunctions and the corresponding behavioral patterns.

METHODS: The Neural Engineering Framework (NEF) and the Semantic Pointer Architecture (SPA) are used to develop a quantitative neural model of the mental lexicon and its access during speech processing. By simulating a picture-naming task (WWT 6-10), the influence of cues is investigated after introducing neural dysfunctions into the model at different levels of the mental lexicon.

RESULTS: First, the neural model is able to simulate the test behavior of typically developing children who exhibit no lexical dysfunction. Second, test performance worsens as larger degrees of dysfunction are introduced. Third, if the severity of the dysfunction is not too high, phonological and semantic cues lead to an increase in the number of correctly named words, with phonological cues being more effective than semantic cues.

CONCLUSION: Our simulation results are in line with human experimental data. Specifically, phonological cues seem not only to activate phonologically similar items within the phonological level but also to support higher-level processing during access of the mental lexicon. Thus, the neural model introduced in this paper offers a promising approach to modeling the mental lexicon and to incorporating it into a comprehensive model of language processing.
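The toy sketch below illustrates the three-level lexicon and the cueing effect described in the abstract: when activation of the target entry is degraded, an initial-phoneme or semantic-field cue can push retrieval back above threshold. The vocabulary, activation values, cue bonuses, and threshold are invented for illustration; this is a dictionary-based stand-in, not the NEF/SPA model or the WWT 6-10 items.

```python
# Toy illustration of cue-supported retrieval over a three-level lexicon
# (concept -> lemma -> phonological form). Vocabulary, activations, threshold,
# and cue bonuses are invented; this is not the NEF/SPA model from the paper.
import random

LEXICON = {
    "dog": {"lemma": "dog", "phon": "/dog/", "semantic_field": "animal"},
    "cat": {"lemma": "cat", "phon": "/kat/", "semantic_field": "animal"},
    "car": {"lemma": "car", "phon": "/kar/", "semantic_field": "vehicle"},
}

def name_picture(target, dysfunction=0.0, phon_cue=None, sem_cue=None, threshold=0.5):
    """Return the produced phonological form, or None if lexical access fails."""
    scores = {}
    for word, entry in LEXICON.items():
        activation = (1.0 if word == target else 0.2) * (1.0 - dysfunction)
        activation += random.gauss(0.0, 0.05)                      # neural noise
        if phon_cue and entry["phon"].startswith(phon_cue):        # initial-phoneme cue
            activation += 0.3
        if sem_cue and entry["semantic_field"] == sem_cue:         # semantic-field cue
            activation += 0.15
        scores[word] = activation
    best = max(scores, key=scores.get)
    return LEXICON[best]["phon"] if scores[best] >= threshold else None

random.seed(0)
print(name_picture("dog", dysfunction=0.6))                  # degraded access: often fails
print(name_picture("dog", dysfunction=0.6, phon_cue="/d"))   # phonological cue helps
```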

6.
Front Psychol ; 10: 1462, 2019.
Article in English | MEDLINE | ID: mdl-31354560

ABSTRACT

A comprehensive model of speech processing and speech learning has been established. The model comprises a mental lexicon, an action repository, and an articulatory-acoustic module for executing motor plans and generating auditory and somatosensory feedback information (Kröger and Cao, 2015). In this study, the model was trained on a "model language" comprising three auditory and motor realizations of each of 70 monosyllabic words in order to simulate early phases of speech acquisition (babbling and imitation). We were able to show that (i) the emergence of phonetic-phonological features results from an increasing degree of ordering of syllable representations within the action repository and that (ii) this ordering or arrangement of syllables is mainly shaped by auditory information. Somatosensory information helps to increase the speed of learning; in particular, consonantal features such as place of articulation are learned earlier if auditory information is accompanied by somatosensory information. It can be concluded that somatosensory information, which is already generated during the babbling and imitation phases of speech acquisition, is very helpful especially for learning features such as place of articulation. After learning is completed, acoustic information together with semantic information is sufficient for determining the phonetic-phonological information from the speech signal. Moreover, it is possible to learn phonetic-phonological features such as place of articulation from auditory and semantic information only, but not as fast as when somatosensory information is also available during the early stages of learning.

7.
Front Robot AI ; 6: 62, 2019.
Article in English | MEDLINE | ID: mdl-33501077

ABSTRACT

Many medical screenings used for the diagnosis of neurological, psychological, or language and speech disorders access the language and speech processing system. Typically, patients are asked to complete a task (perception) and then to give answers verbally or in writing (production). Analyzing cognitive or higher-level linguistic impairments therefore requires either that specific parts of a patient's language and speech processing system are working correctly, or that verbal instructions are replaced by pictures (avoiding auditory perception) and oral answers by pointing (avoiding speech articulation). The first goal of this paper is to propose a large-scale neural model which comprises the cognitive and lexical levels of the human neural system and which is able to simulate the human behavior occurring in such medical screenings. The second goal is to relate (microscopic) neural deficits introduced into the model to the corresponding (macroscopic) behavioral deficits resulting from the model simulations. The Neural Engineering Framework and the Semantic Pointer Architecture are used to develop the large-scale neural model. Parts of two medical screenings are simulated: (1) a word-naming screening for the detection of developmental problems in lexical storage and lexical retrieval; and (2) a screening of cognitive abilities for the detection of mild cognitive impairment and early dementia. Both screenings include cognitive, language, and speech processing, and for both screenings the same model is simulated with and without neural deficits (physiological vs. pathological case). While the simulation of both screenings results in the expected normal behavior in the physiological case, the simulations show a clear deviation in behavior, e.g., an increase in errors, in the pathological case. Moreover, specific types of neural dysfunction resulting from different types of neural defect lead to differences in the type and strength of the observed behavioral deficits.
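The following minimal Nengo sketch illustrates, in a generic way, the micro-to-macro relation described above: a "microscopic" deficit, here crudely modeled as a reduced neuron count plus injected membrane noise in one ensemble, shows up as a larger "macroscopic" output error. This is not the screening model from the paper; the deficit style, ensemble sizes, and input signal are assumptions.

```python
# Minimal sketch relating a microscopic deficit to macroscopic behavior
# (not the screening model from the paper). The "deficit" is crudely modeled
# by shrinking one ensemble and injecting noise; all sizes are assumptions.
import numpy as np
import nengo

def decode_error(n_neurons, noise_std):
    model = nengo.Network()
    with model:
        stim = nengo.Node(lambda t: np.sin(2 * np.pi * t))
        noise = None
        if noise_std > 0:
            noise = nengo.processes.WhiteNoise(dist=nengo.dists.Gaussian(0, noise_std))
        ens = nengo.Ensemble(n_neurons=n_neurons, dimensions=1, noise=noise)
        nengo.Connection(stim, ens)
        p_in = nengo.Probe(stim)
        p_out = nengo.Probe(ens, synapse=0.01)
    with nengo.Simulator(model, progress_bar=False) as sim:
        sim.run(1.0)
    # Root-mean-square error between the input and the decoded output.
    return np.sqrt(np.mean((sim.data[p_out] - sim.data[p_in]) ** 2))

print("physiological:", decode_error(n_neurons=200, noise_std=0.0))
print("pathological: ", decode_error(n_neurons=20, noise_std=0.5))
```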

8.
Front Comput Neurosci ; 12: 41, 2018.
Article in English | MEDLINE | ID: mdl-29928197

ABSTRACT

Background: Parkinson's disease affects many motor processes, including speech. Besides drug treatment, deep brain stimulation (DBS) of the subthalamic nucleus (STN) and the globus pallidus internus (GPi) has emerged as an effective therapy.

Goal: We present a neural model that simulates a syllable repetition task and evaluate its performance while varying the dopamine level in the striatum and the level of activity reduction in the STN or GPi.

Method: The Neural Engineering Framework (NEF) is used to build a model of syllable sequencing through a cortico-basal ganglia-thalamus-cortex circuit. The model is able to simulate a failing substantia nigra pars compacta (SNc), as occurs in Parkinson's patients. We simulate syllable sequencing parameterized by (i) the tonic dopamine level in the striatum and (ii) the average neural activity in the STN or GPi.

Results: With decreased dopamine levels, the model produces syllable sequencing errors in the form of skipped and swapped syllables, repetitions of the same syllable, breaking off and restarting in the middle of a sequence, and cessation ("freezing") of sequences. We also find that reducing (inhibiting) activity in either the STN or the GPi reduces the occurrence of syllable sequencing errors.

Conclusion: The model predicts that inhibiting activity in the STN or GPi can reduce syllable sequencing errors in Parkinson's patients. Since DBS also reduces syllable sequencing errors in Parkinson's patients, we suggest that STN or GPi inhibition is one mechanism through which DBS achieves this effect.
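A minimal action-selection sketch using Nengo's built-in basal ganglia and thalamus networks is shown below: three candidate "syllables" compete via their utilities, and scaling those utilities by a factor serves as a crude stand-in for varying tonic striatal dopamine. This is not the authors' cortico-basal ganglia-thalamus-cortex model; the utilities and the scaling factor are assumptions.

```python
# Minimal action-selection sketch with Nengo's basal ganglia / thalamus networks
# (not the authors' model). Scaling candidate utilities by `dopamine` is a crude
# stand-in for tonic striatal dopamine; utilities and the factor are assumptions.
import numpy as np
import nengo

def select_syllable(dopamine=1.0, utilities=(0.8, 0.5, 0.3)):
    model = nengo.Network()
    with model:
        # Candidate "syllables" compete via their scaled utilities.
        utility_in = nengo.Node(dopamine * np.asarray(utilities))
        bg = nengo.networks.BasalGanglia(dimensions=3)
        thal = nengo.networks.Thalamus(dimensions=3)
        nengo.Connection(utility_in, bg.input)
        nengo.Connection(bg.output, thal.input)
        p = nengo.Probe(thal.output, synapse=0.01)
    with nengo.Simulator(model, progress_bar=False) as sim:
        sim.run(0.5)
    return sim.data[p][-1]  # near 1 for the selected action, near 0 otherwise

print("normal dopamine:", np.round(select_syllable(dopamine=1.0), 2))
print("low dopamine:   ", np.round(select_syllable(dopamine=0.3), 2))  # weak, unreliable selection
```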

9.
Front Comput Neurosci ; 10: 51, 2016.
Article in English | MEDLINE | ID: mdl-27303287

ABSTRACT

Production and comprehension of speech are closely interwoven. For example, the ability to detect an error in one's own speech, halt speech production, and finally correct the error can be explained by assuming an inner speech loop that continuously compares the word representations induced by production with those induced by perception at various cognitive levels (e.g., the conceptual, word, or phonological level). Because spontaneous speech errors are relatively rare, a picture naming and halt paradigm can be used to evoke them. In this paradigm, picture presentation (target word initiation) is followed by an auditory stop signal (distractor word) for halting speech production. The current study seeks to understand the neural mechanisms governing self-detection of speech errors by developing a biologically inspired neural model of the inner speech loop. The neural model is based on the Neural Engineering Framework (NEF) and consists of a network of about 500,000 spiking neurons. In the first experiment, we induce simulated speech errors semantically and phonologically. In the second experiment, we simulate a picture naming and halt task. Target-distractor word pairs were balanced with respect to variation in phonological and semantic similarity. The results of the first experiment show that speech errors are successfully detected by a monitoring component in the inner speech loop. The results of the second experiment show that the model correctly reproduces human behavioral data on the picture naming and halt task. In particular, the halting rate in the production of target words was lower for phonologically similar distractor words than for semantically similar or fully dissimilar distractor words. We thus conclude that the neural architecture proposed here to model the inner speech loop reflects important interactions between production and perception at the phonological and semantic levels.
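The schematic numpy sketch below illustrates only the monitoring idea behind the inner speech loop: production-side and perception-side word representations are compared, and production halts when their similarity drops below a threshold. Random unit vectors stand in for semantic pointers; the dimensionality, vocabulary, and threshold are assumptions, and this is not the 500,000-neuron spiking model from the paper.

```python
# Schematic illustration of the monitoring idea behind the inner speech loop
# (not the spiking model from the paper). Random unit vectors stand in for
# semantic pointers; dimensionality, vocabulary, and threshold are assumptions.
import numpy as np

rng = np.random.default_rng(0)
D = 64

def pointer():
    v = rng.normal(size=D)
    return v / np.linalg.norm(v)

vocab = {w: pointer() for w in ["cat", "dog", "car"]}

def monitor(produced, perceived, halt_threshold=0.7):
    """Compare the production-induced and perception-induced representations."""
    similarity = float(vocab[produced] @ vocab[perceived])
    return "continue" if similarity >= halt_threshold else "halt"

print(monitor("cat", "cat"))   # matching representations -> continue
print(monitor("cat", "dog"))   # mismatch detected        -> halt
```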

10.
J Acoust Soc Am ; 137(3): 1503-12, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25786961

ABSTRACT

Vocal emotions are signaled by specific patterns of prosodic parameters, most notably pitch, phone duration, intensity, and phonation type. Phonation type has so far been the least accessible of these parameters in emotion research, because it is difficult to extract from speech signals and difficult to manipulate in natural or synthetic speech. The present study built on recent advances in articulatory speech synthesis to control phonation type exclusively in re-synthesized German sentences spoken with seven different emotions. The goal was to find out to what extent a change of phonation type alone affects the perception of these emotions. Therefore, portrayed emotional utterances were re-synthesized with their original phonation type, as well as with purely breathy, modal, and pressed phonation, and then rated by listeners with respect to the perceived emotions. Highly significant effects of phonation type on the recognition rates of the original emotions were found, except for disgust. While fear, anger, and the neutral emotion require specific phonation types for correct perception, sadness, happiness, boredom, and disgust rely primarily on other prosodic parameters. These results can help to improve the expression of emotions in synthesized speech and to facilitate the robust automatic recognition of vocal emotions.


Subject(s)
Emotions , Loudness Perception , Phonation , Pitch Perception , Speech Acoustics , Speech Perception , Voice Quality , Acoustics , Adult , Female , Humans , Male , Pattern Recognition, Automated , Phonetics , Signal Processing, Computer-Assisted , Sound Spectrography , Speech Production Measurement , Time Factors , Young Adult
11.
Front Psychol ; 5: 236, 2014.
Article in English | MEDLINE | ID: mdl-24688478

ABSTRACT

Based on the incremental nature of knowledge acquisition, in this study we propose a growing self-organizing neural network approach for modeling the acquisition of auditory and semantic categories. We introduce an Interconnected Growing Self-Organizing Maps (I-GSOM) algorithm, which takes associations between auditory information and semantic information into consideration. Direct phonetic-semantic association is simulated in order to model language acquisition in its early phases, such as the babbling and imitation stages, in which no phonological representations exist. Based on the I-GSOM algorithm, we conducted experiments using paired acoustic and semantic training data. We use a cyclical reinforcing and reviewing training procedure to model the teaching and learning process between children and their communication partners. A reinforcing-by-link training procedure and a link-forgetting procedure are introduced to model the acquisition of associative relations between auditory and semantic information. Experimental results indicate that (1) I-GSOM is well able to learn the auditory and semantic categories present in the training data; (2) clear auditory and semantic boundaries can be found in the network representation; (3) cyclical reinforcing and reviewing training leads to detailed categorization as well as detailed clustering, while keeping the previously learned clusters and the previously developed network structure stable; and (4) reinforcing-by-link training leads to well-perceived auditory-semantic associations. Our I-GSOM model suggests that it is important to associate auditory information with semantic information during language acquisition. Despite its high level of abstraction, our I-GSOM approach can be interpreted as a biologically inspired neurocomputational model.
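The sketch below is a much-simplified, fixed-size stand-in for the I-GSOM idea: two small self-organizing maps (auditory and semantic) are trained with numpy, and a Hebbian-style link matrix between their winning nodes is reinforced on each co-occurrence. The growing, reviewing, and forgetting mechanisms of the actual algorithm are omitted, and the map sizes, learning rates, and training data are invented.

```python
# Much-simplified, fixed-size stand-in for the I-GSOM idea described above:
# two small SOMs plus Hebbian-style association links between their winners.
# Map sizes, learning rates, and training data are invented; the growing,
# reviewing, and link-forgetting mechanisms of the real algorithm are omitted.
import numpy as np

rng = np.random.default_rng(1)

class MiniMap:
    """A tiny 1-D self-organizing map."""
    def __init__(self, n_nodes, dim):
        self.w = rng.normal(size=(n_nodes, dim))

    def winner(self, x):
        return int(np.argmin(np.linalg.norm(self.w - x, axis=1)))

    def update(self, x, lr=0.2, radius=1.0):
        win = self.winner(x)
        for i in range(len(self.w)):
            h = np.exp(-((i - win) ** 2) / (2 * radius ** 2))  # neighborhood factor
            self.w[i] += lr * h * (x - self.w[i])
        return win

aud_map, sem_map = MiniMap(8, dim=4), MiniMap(8, dim=3)
links = np.zeros((8, 8))  # auditory node x semantic node association strengths

# Paired (auditory, semantic) training data for three word categories.
data = [(rng.normal(loc=c, size=4), rng.normal(loc=c, size=3))
        for c in (-1.0, 0.0, 1.0) for _ in range(50)]

for aud, sem in data:
    a, s = aud_map.update(aud), sem_map.update(sem)
    links[a, s] += 1.0  # reinforcing-by-link: strengthen the co-activated pair

# After training, an auditory input activates its associated semantic node.
test_aud = rng.normal(loc=1.0, size=4)
print("associated semantic node:", int(np.argmax(links[aud_map.winner(test_aud)])))
```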

12.
Front Hum Neurosci ; 7: 121, 2013.
Article in English | MEDLINE | ID: mdl-23576970

ABSTRACT

A speech-action repository (SAR), or "mental syllabary," has been proposed as a central module for the sensorimotor processing of syllables. In this approach, syllables occurring frequently within a language are assumed to be stored as holistic sensorimotor patterns, while infrequent syllables need to be assembled from sub-syllabic units. Thus, frequent syllables are processed efficiently and quickly during production or perception by direct activation of their sensorimotor patterns. Whereas several behavioral psycholinguistic studies have provided evidence in support of the existence of a syllabary, fMRI studies have failed to demonstrate its neural reality. In the present fMRI study, a reaction paradigm using homogeneous vs. heterogeneous syllable blocks is used with overt vs. covert speech production and auditory vs. visual presentation modes. Two complementary data analyses were performed: (1) in a logical conjunction, activation for syllable processing independent of input modality and response mode was assessed in order to support the assumption of a supramodal hub within a SAR; (2) in addition, priming effects in the BOLD response to homogeneous vs. heterogeneous blocks were measured in order to identify brain regions that show reduced activity during repeated production/perception of a specific syllable, and thereby to determine state maps. The auditory-visual conjunction analysis revealed an activation network comprising the bilateral precentral gyrus (PrCG) and the left inferior frontal gyrus (IFG; area 44). These results are compatible with the notion of a supramodal hub within the SAR. The main effect of homogeneity priming revealed an activation pattern of areas within the frontal, temporal, and parietal lobes. These findings are taken to represent sensorimotor state maps of the SAR. In conclusion, the present study provides preliminary evidence for a SAR.
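For readers unfamiliar with conjunction analyses, the short sketch below illustrates the general logic with a minimum-statistic conjunction over synthetic arrays: a voxel counts as "supramodal" only if it is suprathreshold in both the auditory and the visual contrast map. The arrays and threshold are synthetic; this is not the study's data or its exact analysis pipeline.

```python
# Schematic minimum-statistic conjunction across two presentation modes
# (synthetic t-maps and an arbitrary threshold; not the study's pipeline).
import numpy as np

rng = np.random.default_rng(2)
t_auditory = rng.normal(size=(4, 4))   # toy "t-map", auditory presentation
t_visual = rng.normal(size=(4, 4))     # toy "t-map", visual presentation
threshold = 1.5

# A voxel is "supramodal" only if it is suprathreshold in BOTH modalities,
# i.e., the minimum of the two statistics exceeds the threshold.
conjunction = np.minimum(t_auditory, t_visual) > threshold
print("supramodal voxels:", int(conjunction.sum()))
```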

14.
Cogn Process ; 11(3): 187-205, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20012153

ABSTRACT

The concept of an action as the basic motor-control unit of goal-directed movement behavior has been used primarily for private or non-communicative actions like walking, reaching, or grasping. In this paper, literature is reviewed indicating that this concept can also be applied in all domains of face-to-face communication, such as speech, co-verbal facial expression, and co-verbal gesturing. Three domain-specific types of actions, i.e., speech actions, facial actions, and hand-arm actions, are defined, and a model is proposed that elucidates the underlying biological mechanisms of action production, action perception, and action acquisition in all domains of face-to-face communication. This model can serve as a theoretical framework for empirical analysis and for simulation with embodied conversational agents, and thus for advanced human-computer interaction technologies.


Subject(s)
Communication , Face , Facial Expression , Movement/physiology , Perception/physiology , Humans , Models, Psychological
15.
J Speech Lang Hear Res ; 51(4): 914-21, 2008 Aug.
Article in English | MEDLINE | ID: mdl-18658061

ABSTRACT

PURPOSE: Northern Digital Instruments (NDI; Waterloo, Ontario, Canada) manufactures a commercially available magnetometer device called Aurora that features real-time display of sensor position tracked in 3 dimensions. To test its potential for speech production research, data were collected to assess the measurement accuracy and reliability of the system. METHOD: First, sensors affixed at a known distance on a rigid ruler were moved systematically through the measurement space. Second, sensors attached to the speech articulators of a human participant were tracked during various speech tasks. RESULTS: In the ruler task, results showed mean distance errors of less than 1 mm, with some sensitivity to location within the measurement field. In the speech tasks, Euclidean distance between jaw-mounted sensors showed comparable accuracy; however, a high incidence of missing samples was observed, positively correlated with sensor velocity. CONCLUSIONS: The real-time positional feedback provided by the system makes it potentially useful in speech therapy applications. The overall missing data rate observed during speech tasks makes use of the system in its current form problematic for the quantitative measurement of speech articulator movements; however, NDI is actively working to improve the Aurora system for use in this context.
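The small numpy sketch below illustrates the two accuracy measures described above: the error of the inter-sensor Euclidean distance against a known separation, and the missing-sample rate from dropped (NaN) frames. The 30 mm separation, noise levels, and synthetic trajectories are illustrative assumptions, not the study's data.

```python
# Sketch of the two accuracy measures described above: inter-sensor distance
# error against a known separation, and the missing-sample rate. The 30 mm
# separation and the synthetic trajectories are illustrative, not study data.
import numpy as np

rng = np.random.default_rng(3)
known_distance_mm = 30.0
n_samples = 1000

# Synthetic 3-D positions (mm) for two sensors a fixed distance apart,
# with measurement noise and a fraction of dropped (NaN) frames.
sensor_a = rng.normal(scale=5.0, size=(n_samples, 3))
sensor_b = (sensor_a + np.array([known_distance_mm, 0.0, 0.0])
            + rng.normal(scale=0.3, size=(n_samples, 3)))
sensor_b[rng.random(n_samples) < 0.05] = np.nan   # ~5% missing samples

distance = np.linalg.norm(sensor_b - sensor_a, axis=1)
valid = ~np.isnan(distance)

print(f"mean distance error: {np.nanmean(np.abs(distance - known_distance_mm)):.2f} mm")
print(f"missing sample rate: {100 * (1 - valid.mean()):.1f} %")
```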


Subject(s)
Speech-Language Pathology/instrumentation , Speech/physiology , Equipment Design , Humans , Research/instrumentation , Speech Production Measurement