Search | VHL Regional Portal

Differential Representation of Articulatory Gestures and Phonemes in Precentral and Inferior Frontal Gyri.

Mugler, Emily M; Tate, Matthew C; Livescu, Karen; Templer, Jessica W; Goldrick, Matthew A; Slutzky, Marc W.

J Neurosci ; 38(46): 9803-9813, 2018 11 14.

Article in English | MEDLINE | ID: mdl-30257858

ABSTRACT

Speech is a critical form of human communication and is central to our daily lives. Yet, despite decades of study, an understanding of the fundamental neural control of speech production remains incomplete. Current theories model speech production as a hierarchy from sentences and phrases down to words, syllables, speech sounds (phonemes), and the actions of vocal tract articulators used to produce speech sounds (articulatory gestures). Here, we investigate the cortical representation of articulatory gestures and phonemes in ventral precentral and inferior frontal gyri in men and women. Our results indicate that ventral precentral cortex represents gestures to a greater extent than phonemes, while inferior frontal cortex represents both gestures and phonemes. These findings suggest that speech production shares a common cortical representation with that of other types of movement, such as arm and hand movements. This has important implications both for our understanding of speech production and for the design of brain-machine interfaces to restore communication to people who cannot speak.

SIGNIFICANCE STATEMENT

Despite being studied for decades, the production of speech by the brain is not fully understood. In particular, the most elemental parts of speech, speech sounds (phonemes) and the movements of vocal tract articulators used to produce these sounds (articulatory gestures), have both been hypothesized to be encoded in motor cortex. Using direct cortical recordings, we found evidence that primary motor and premotor cortices represent gestures to a greater extent than phonemes. Inferior frontal cortex (part of Broca's area) appears to represent both gestures and phonemes. These findings suggest that speech production shares a similar cortical organizational structure with the movement of other body parts.

Subject(s)

Brain Mapping/methods , Electrocorticography/methods , Frontal Lobe/physiology , Gestures , Prefrontal Cortex/physiology , Speech/physiology , Adult , Brain Mapping/instrumentation , Female , Humans , Male , Movement/physiology , Photic Stimulation/methods

Multistream articulatory feature-based models for visual speech recognition.

Saenko, Kate; Livescu, Karen; Glass, James; Darrell, Trevor.

IEEE Trans Pattern Anal Mach Intell ; 31(9): 1700-7, 2009 Sep.

Article in English | MEDLINE | ID: mdl-19574628

ABSTRACT

We study the problem of automatic visual speech recognition (VSR) using dynamic Bayesian network (DBN)-based models consisting of multiple sequences of hidden states, each corresponding to an articulatory feature (AF) such as lip opening (LO) or lip rounding (LR). A bank of discriminative articulatory feature classifiers provides input to the DBN, in the form of either virtual evidence (VE) (scaled likelihoods) or raw classifier margin outputs. We present experiments on two tasks, a medium-vocabulary word-ranking task and a small-vocabulary phrase recognition task. We show that articulatory feature-based models outperform baseline models, and we study several aspects of the models, such as the effects of allowing articulatory asynchrony, of using dictionary-based versus whole-word models, and of incorporating classifier outputs via virtual evidence versus alternative observation models.
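The abstract mentions passing classifier outputs to the DBN as virtual evidence in the form of scaled likelihoods. A minimal sketch of one common way to obtain scaled likelihoods, assuming each articulatory feature classifier emits a per-frame posterior over the feature's values and that value priors are estimated from training data: dividing each posterior p(value | frame) by its prior p(value) gives a quantity proportional to the frame likelihood p(frame | value). The function name, the three lip-opening values, and all numbers are hypothetical illustrations, not taken from the paper.

```python
from typing import List, Sequence


def scaled_likelihoods(posteriors: Sequence[Sequence[float]],
                       priors: Sequence[float]) -> List[List[float]]:
    """Convert per-frame posteriors p(value | frame) into scaled likelihoods
    p(value | frame) / p(value), which are proportional to p(frame | value)."""
    # Priors are assumed to be relative frequencies from training data (hypothetical).
    if any(p <= 0 for p in priors):
        raise ValueError("priors must be strictly positive")
    result = []
    for frame in posteriors:
        if len(frame) != len(priors):
            raise ValueError("each frame needs one posterior per feature value")
        # Divide each posterior by the prior of the same feature value.
        result.append([post / prior for post, prior in zip(frame, priors)])
    return result


if __name__ == "__main__":
    # Hypothetical lip-opening (LO) values: closed, narrow, wide.
    priors = [0.5, 0.25, 0.25]
    frames = [[0.5, 0.25, 0.25], [0.2, 0.5, 0.3]]
    print(scaled_likelihoods(frames, priors))
```

A frame whose posterior equals the prior yields a scaled likelihood of 1 for every value, i.e. it carries no evidence beyond what the prior already encodes.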

Subject(s)

Image Interpretation, Computer-Assisted/methods , Lip/anatomy & histology , Lip/physiology , Lipreading , Models, Biological , Speech Production Measurement/methods , Speech Recognition Software , Algorithms , Computer Simulation , Humans , Image Enhancement/methods , Models, Anatomic , Pattern Recognition, Automated/methods

Speech production knowledge in automatic speech recognition.

King, Simon; Frankel, Joe; Livescu, Karen; McDermott, Erik; Richmond, Korin; Wester, Mirjam.

J Acoust Soc Am ; 121(2): 723-42, 2007 Feb.

Article in English | MEDLINE | ID: mdl-17348495

ABSTRACT

Although much is known about how speech is produced, and research into speech production has resulted in measured articulatory data, feature systems of different kinds, and numerous models, speech production knowledge is almost totally ignored in current mainstream approaches to automatic speech recognition. Representations of speech production allow simple explanations for many phenomena observed in speech which cannot be easily analyzed from either acoustic signal or phonetic transcription alone. In this article, a survey of a growing body of work in which such representations are used to improve automatic speech recognition is provided.

Subject(s)

Phonation , Phonetics , Speech Acoustics , Speech Production Measurement , Speech Recognition Software , Bayes Theorem , Humans , Likelihood Functions , Markov Chains , Neural Networks, Computer , Semantics , Sound Spectrography , Speech Articulation Tests

Landmark-based speech recognition: report of the 2004 Johns Hopkins Summer Workshop.

Hasegawa-Johnson, Mark; Baker, James; Borys, Sarah; Chen, Ken; Coogan, Emily; Greenberg, Steven; Juneja, Amit; Kirchhoff, Katrin; Livescu, Karen; Mohan, Srividya; Muller, Jennifer; Sonmez, Kemal; Wang, Tianyu.

Proc IEEE Int Conf Acoust Speech Signal Process ; 1(1415088): 1213-1216, 2005.

Article in English | MEDLINE | ID: mdl-19212454

ABSTRACT

Three research prototype speech recognition systems are described, all of which use recently developed methods from artificial intelligence (specifically support vector machines, dynamic Bayesian networks, and maximum entropy classification) in order to implement, in the form of an automatic speech recognizer, current theories of human speech perception and phonology (specifically landmark-based speech perception, nonlinear phonology, and articulatory phonology). All three systems begin with a high-dimensional multiframe acoustic-to-distinctive feature transformation, implemented using support vector machines trained to detect and classify acoustic phonetic landmarks. Distinctive feature probabilities estimated by the support vector machines are then integrated using one of three pronunciation models: a dynamic programming algorithm that assumes canonical pronunciation of each word, a dynamic Bayesian network implementation of articulatory phonology, or a discriminative pronunciation model trained using the methods of maximum entropy classification. Log probability scores computed by these models are then combined, using log-linear combination, with other word scores available in the lattice output of a first-pass recognizer, and the resulting combination score is used to compute a second-pass speech recognition output.
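The final step described above combines log probability scores by log-linear combination and uses the result for second-pass output. A minimal sketch, assuming each first-pass lattice hypothesis has already been reduced to a dictionary of named log scores and that the combination weights were tuned elsewhere (for example on held-out data): the combined score is the weighted sum of the log scores, equivalent to a product of the underlying probabilities each raised to its weight. The score names, weights, and hypotheses are hypothetical and not taken from the workshop report.

```python
from typing import Mapping


def log_linear_score(scores: Mapping[str, float],
                     weights: Mapping[str, float]) -> float:
    """Weighted sum of log-domain scores: sum_i w_i * log p_i."""
    # A KeyError here means a score has no weight, treated as a configuration error.
    return sum(weights[name] * value for name, value in scores.items())


def rescore(hypotheses: Mapping[str, Mapping[str, float]],
            weights: Mapping[str, float]) -> str:
    """Return the hypothesis with the highest combined score (second-pass output)."""
    return max(hypotheses, key=lambda hyp: log_linear_score(hypotheses[hyp], weights))


if __name__ == "__main__":
    # Hypothetical weights and per-hypothesis log scores.
    weights = {"acoustic": 1.0, "language_model": 0.8, "pronunciation": 0.5}
    lattice = {
        "recognize speech": {"acoustic": -120.0, "language_model": -8.0, "pronunciation": -3.0},
        "wreck a nice beach": {"acoustic": -118.0, "language_model": -14.0, "pronunciation": -9.0},
    }
    print(rescore(lattice, weights))  # prints "recognize speech" (-127.9 vs -133.7)
```

Here the second hypothesis has the better acoustic score, but the weighted language-model and pronunciation scores overturn it, which is the kind of trade-off a log-linear combination is meant to arbitrate.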
