EARSHOT: A Minimal Neural Network Model of Incremental Human Speech Recognition.

Magnuson, James S; You, Heejo; Luthra, Sahil; Li, Monica; Nam, Hosung; Escabí, Monty; Brown, Kevin; Allopenna, Paul D; Theodore, Rachel M; Monto, Nicholas; Rueckl, Jay G.

Cogn Sci ; 44(4): e12823, 2020 04.

Article in English | MEDLINE | ID: mdl-32274861

ABSTRACT

Despite the lack of invariance problem (the many-to-many mapping between acoustics and percepts), human listeners experience phonetic constancy and typically perceive what a speaker intends. Most models of human speech recognition (HSR) have side-stepped this problem, working with abstract, idealized inputs and deferring the challenge of working with real speech. In contrast, carefully engineered deep learning networks allow robust, real-world automatic speech recognition (ASR). However, the complexities of deep learning architectures and training regimens make it difficult to use them to provide direct insights into mechanisms that may support HSR. In this brief article, we report preliminary results from a two-layer network that borrows one element from ASR, long short-term memory nodes, which provide dynamic memory for a range of temporal spans. This allows the model to learn to map real speech from multiple talkers to semantic targets with high accuracy, with human-like timecourse of lexical access and phonological competition. Internal representations emerge that resemble phonetically organized responses in human superior temporal gyrus, suggesting that the model develops a distributed phonological code despite no explicit training on phonetic or phonemic targets. The ability to work with real speech is a major advance for cognitive models of HSR.

Subject(s)

Computer Simulation, Models, Neurological, Neural Networks, Computer, Speech Perception, Speech, Female, Humans, Male, Phonetics, Semantics

Universal Features in Phonological Neighbor Networks.

Brown, Kevin S; Allopenna, Paul D; Hunt, William R; Steiner, Rachael; Saltzman, Elliot; McRae, Ken; Magnuson, James S.

Entropy (Basel) ; 20(7), 2018 Jul 12.

Article in English | MEDLINE | ID: mdl-33265615

ABSTRACT

Human speech perception involves transforming a continuous acoustic signal into discrete linguistically meaningful units (phonemes) while simultaneously causing a listener to activate words that are similar to the spoken utterance and to each other. The Neighborhood Activation Model posits that phonological neighbors (two forms [words] that differ by one phoneme) compete significantly for recognition as a spoken word is heard. This definition of phonological similarity can be extended to an entire corpus of forms to produce a phonological neighbor network (PNN). We study PNNs for five languages: English, Spanish, French, Dutch, and German. Consistent with previous work, we find that the PNNs share a consistent set of topological features. Using an approach that generates random lexicons with increasing levels of phonological realism, we show that even random forms with minimal relationship to any real language, combined with only the empirical distribution of language-specific phonological form lengths, are sufficient to produce the topological properties observed in the real language PNNs. The resulting pseudo-PNNs are insensitive to the level of linguistic realism in the random lexicons but quite sensitive to the shape of the form length distribution. We therefore conclude that "universal" features seen across multiple languages are really string universals, not language universals, and arise primarily due to limitations in the kinds of networks generated by the one-step neighbor definition. Taken together, our results indicate that caution is warranted when linking the dynamics of human spoken word recognition to the topological properties of PNNs, and that the investigation of alternative similarity metrics for phonological forms should be a priority.
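
The one-step neighbor definition described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes that "differ by one phoneme" covers a single substitution, insertion, or deletion (an assumption; the abstract does not spell out the rule), and the toy lexicon, its transcriptions, and the names `is_neighbor` and `build_pnn` are hypothetical.

```python
from itertools import combinations


def is_neighbor(a, b):
    """Return True if phoneme tuples a and b differ by exactly one
    substitution, insertion, or deletion (assumed neighbor rule)."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    # Lengths differ by one: make a the shorter form, then check whether
    # deleting a single phoneme from b yields a.
    if len(a) > len(b):
        a, b = b, a
    return any(b[:i] + b[i + 1:] == a for i in range(len(b)))


def build_pnn(lexicon):
    """Build an undirected neighbor network as an adjacency dict.

    lexicon maps each word to a tuple of phoneme symbols.
    """
    graph = {word: set() for word in lexicon}
    for w1, w2 in combinations(lexicon, 2):
        if is_neighbor(lexicon[w1], lexicon[w2]):
            graph[w1].add(w2)
            graph[w2].add(w1)
    return graph


# Hypothetical toy lexicon with made-up, ARPAbet-like transcriptions.
toy_lexicon = {
    "cat": ("k", "ae", "t"),
    "bat": ("b", "ae", "t"),
    "cap": ("k", "ae", "p"),
    "at": ("ae", "t"),
    "cast": ("k", "ae", "s", "t"),
    "dog": ("d", "ao", "g"),
}

pnn = build_pnn(toy_lexicon)
degree = {word: len(neighbors) for word, neighbors in pnn.items()}
```

In this toy network, "cat" is linked to "bat", "cap", "at", and "cast", while "dog" is isolated. The degree of each node is the simplest of the topological properties that such a network exposes; the pairwise comparison is quadratic in lexicon size, which is adequate for a sketch but not for a full corpus.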
