Results 1 - 20 of 73
1.
PLoS One ; 16(10): e0258178, 2021.
Article in English | MEDLINE | ID: mdl-34597350

ABSTRACT

Measurements of the physical outputs of speech (vocal tract geometry and acoustic energy) are high-dimensional, but linguistic theories posit a low-dimensional set of categories such as phonemes and phrase types. How can it be determined when and where in high-dimensional articulatory and acoustic signals there is information related to theoretical categories? For a variety of reasons, it is problematic to quantify mutual information between hypothesized categories and signals directly. To address this issue, a multi-scale analysis method is proposed for localizing category-related information in an ensemble of speech signals using machine learning algorithms. By analyzing how classification accuracy on unseen data varies as the temporal extent of training input is systematically restricted, inferences can be drawn regarding the temporal distribution of category-related information. The method can also be used to investigate redundancy between subsets of signal dimensions. Two types of theoretical categories are examined in this paper: phonemic/gestural categories and syntactic relative-clause categories. Two machine learning algorithms are also compared: linear discriminant analysis and neural networks with long short-term memory units. Both algorithms detected category-related information earlier and later in signals than would be expected given standard theoretical assumptions about when linguistic categories should influence speech. The neural network algorithm identified category-related information to a greater extent than the discriminant analyses.
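
As an illustration of the temporal-restriction idea, the sketch below (entirely synthetic data standing in for the articulatory/acoustic ensemble) trains a linear discriminant classifier on progressively longer temporal windows and reports held-out accuracy; the authors' actual features, window scheme, and LSTM variant are not reproduced here.

```python
# Minimal sketch: localize category-related information by restricting
# the temporal extent of the training input (hypothetical data).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_trials, n_frames, n_channels = 200, 50, 8    # hypothetical ensemble
X = rng.normal(size=(n_trials, n_frames, n_channels))
y = rng.integers(0, 2, size=n_trials)          # two phonemic categories
X[y == 1, 20:30, :] += 0.8                     # inject category info mid-signal

for end in range(10, n_frames + 1, 10):        # grow the temporal window
    Xw = X[:, :end, :].reshape(n_trials, -1)   # flatten window to features
    Xtr, Xte, ytr, yte = train_test_split(Xw, y, random_state=0)
    acc = LinearDiscriminantAnalysis().fit(Xtr, ytr).score(Xte, yte)
    print(f"frames 0-{end}: held-out accuracy {acc:.2f}")
```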


Subjects
Acoustics; Machine Learning; Speech Perception/physiology; Speech/classification; Algorithms; Discriminant Analysis; Gestures; Humans; Neural Networks, Computer; Tongue/physiology
2.
PLoS One ; 16(4): e0250173, 2021.
Article in English | MEDLINE | ID: mdl-33930026

ABSTRACT

SUBESCO is an audio-only emotional speech corpus for the Bangla language. With more than 7 hours of recordings comprising 7000 utterances, it is the largest emotional speech corpus available for this language. Twenty native speakers participated in the gender-balanced set, each recording 10 sentences simulating seven targeted emotions. Fifty university students participated in the evaluation of the corpus. Each audio clip, except those for the Disgust emotion, was validated four times by male and female raters. Raw hit rates and unbiased hit rates were calculated, yielding scores above the chance level of responses. The overall recognition rate in human perception tests was above 70%. Kappa statistics and intra-class correlation coefficients indicated a high level of inter-rater reliability and consistency in the corpus evaluation. SUBESCO is an open-access database, licensed under Creative Commons Attribution 4.0 International, and can be downloaded free of charge from https://doi.org/10.5281/zenodo.4526477.
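
A minimal sketch of the validation metrics mentioned above, using hypothetical ratings rather than the SUBESCO data: raw hit rate against chance, and Cohen's kappa between two raters (the study's unbiased hit rates and ICC are omitted).

```python
# Raw hit rate and inter-rater kappa on hypothetical emotion ratings.
import numpy as np
from sklearn.metrics import cohen_kappa_score

emotions = ["anger", "fear", "happiness", "neutral", "sadness", "surprise"]
rng = np.random.default_rng(1)
intended = rng.integers(0, len(emotions), size=500)       # target labels
perceived = np.where(rng.random(500) < 0.7,               # ~70% recognition
                     intended, rng.integers(0, len(emotions), size=500))

raw_hit_rate = (perceived == intended).mean()
print(f"raw hit rate: {raw_hit_rate:.2%}")                # chance = 1/6

rater_b = np.where(rng.random(500) < 0.8, perceived,
                   rng.integers(0, len(emotions), size=500))
print(f"inter-rater kappa: {cohen_kappa_score(perceived, rater_b):.2f}")
```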


Subjects
Speech/classification; Adult; Bangladesh; Emotions; Female; Humans; India; Language; Male; Recognition, Psychology; Reproducibility of Results; Speech Perception; Verbal Behavior
3.
Neural Netw ; 136: 87-96, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33453522

ABSTRACT

In this paper, we propose Stacked DeBERT, short for Stacked Denoising Bidirectional Encoder Representations from Transformers. This novel model improves robustness to incomplete data, compared with existing systems, through a novel encoding scheme in BERT, a powerful language representation model based solely on attention mechanisms. Incomplete data in natural language processing refers to text with missing or incorrect words, whose presence can hinder the performance of current models that were not designed to withstand such noise yet must still perform well under it. This is because current approaches are built for, and trained on, clean and complete data, and thus cannot extract features that adequately represent incomplete data. Our proposed approach obtains intermediate input representations by applying an embedding layer to the input tokens, followed by vanilla transformers. These intermediate features are fed to novel denoising transformers, which are responsible for obtaining richer input representations. The approach uses stacks of multilayer perceptrons to reconstruct missing words' embeddings by extracting more abstract and meaningful hidden feature vectors, and bidirectional transformers for improved embedding representations. We consider two datasets for training and evaluation: the Chatbot Natural Language Understanding Evaluation Corpus and Kaggle's Twitter Sentiment Corpus. Our model shows improved F1-scores and better robustness on the informal/incorrect texts present in tweets and on texts with speech-to-text errors in sentiment and intent classification tasks.
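
A loose PyTorch sketch of the described pipeline, with illustrative sizes that are not the authors' configuration: vanilla transformer features, an MLP stack playing the role of the denoising reconstruction, a second transformer, and a classification head.

```python
# Illustrative (not the authors') stacked denoising encoder.
import torch
import torch.nn as nn

class StackedDenoisingEncoder(nn.Module):
    def __init__(self, vocab=30522, d=128, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.vanilla = nn.TransformerEncoder(layer, num_layers=2)
        self.denoise = nn.Sequential(                 # stacked MLPs that
            nn.Linear(d, 64), nn.ReLU(),              # reconstruct corrupted
            nn.Linear(64, d))                         # token embeddings
        layer2 = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.refine = nn.TransformerEncoder(layer2, num_layers=2)
        self.head = nn.Linear(d, n_classes)

    def forward(self, tokens):
        h = self.vanilla(self.embed(tokens))          # intermediate features
        h = self.refine(self.denoise(h))              # richer representations
        return self.head(h.mean(dim=1))               # pooled classification

logits = StackedDenoisingEncoder()(torch.randint(0, 30522, (4, 16)))
print(logits.shape)                                   # torch.Size([4, 2])
```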


Subjects
Databases, Factual/classification; Natural Language Processing; Neural Networks, Computer; Speech/classification; Humans; Language
4.
Elife ; 9, 2020 03 30.
Article in English | MEDLINE | ID: mdl-32223894

ABSTRACT

Speech perception presumably arises from internal models of how specific sensory features are associated with speech sounds. These features change constantly (e.g., different speakers, articulation modes, etc.), and listeners need to recalibrate their internal models by appropriately weighing new versus old evidence. Models of speech recalibration classically ignore this volatility. The effect of volatility in tasks where sensory cues were associated with arbitrary experimenter-defined categories was well described by models that continuously adapt the learning rate while keeping a single representation of the category. Using neurocomputational modelling, we show that recalibration of natural speech sound categories is better described by representing the latter at different time scales. We illustrate our proposal by modeling fast recalibration of speech sounds after experiencing the McGurk effect. We propose that working representations of speech categories are driven both by the current environment and by long-term memory representations.
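
A minimal numpy sketch of the two-timescale idea, with assumed learning rates and arbitrary units: a fast "working" estimate tracks recent evidence while a slow long-term estimate anchors it. This is a caricature of the neurocomputational model, not its implementation.

```python
# Two-timescale recalibration toy model (illustrative values).
import numpy as np

fast, slow = 0.0, 0.0            # category-boundary estimates (arbitrary units)
eta_fast, eta_slow = 0.5, 0.02   # fast vs. slow learning rates (assumed)
rng = np.random.default_rng(2)

for t in range(200):
    evidence = 1.0 + rng.normal(scale=0.3)   # e.g. McGurk-shifted tokens
    fast += eta_fast * (evidence - fast)     # rapid recalibration
    slow += eta_slow * (evidence - slow)     # slow "procedural" learning
    working = 0.5 * fast + 0.5 * slow        # combined working representation

print(f"fast={fast:.2f}, slow={slow:.2f}, working={working:.2f}")
```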


People can distinguish words or syllables even though they may sound different with every speaker. This striking ability reflects the fact that our brain is continually modifying the way we recognise and interpret the spoken word based on what we have heard before, by comparing past experience with the most recent one to update expectations. This phenomenon also occurs in the McGurk effect: an auditory illusion in which someone hears one syllable but sees a person saying another syllable and ends up perceiving a third distinct sound. Abstract models, which provide a functional rather than a mechanistic description of what the brain does, can test how humans use expectations and prior knowledge to interpret the information delivered by the senses at any given moment. Olasagasti and Giraud have now built an abstract model of how brains recalibrate perception of natural speech sounds. By fitting the model with existing experimental data using the McGurk effect, the results suggest that, rather than using a single sound representation that is adjusted with each sensory experience, the brain recalibrates sounds at two different timescales. Over and above slow "procedural" learning, the findings show that there is also rapid recalibration of how different sounds are interpreted. This working representation of speech enables adaptation to changing or noisy environments and illustrates that the process is far more dynamic and flexible than previously thought.


Subjects
Computer Simulation; Phonetics; Speech Perception; Speech/classification; Acoustic Stimulation; Auditory Perception; Humans; Speech/physiology; Time Factors
5.
IEEE J Biomed Health Inform ; 23(6): 2265-2275, 2019 11.
Article in English | MEDLINE | ID: mdl-31478879

ABSTRACT

Depression has become a common mental disorder and one of the main causes of disability worldwide. Because depressive symptoms vary across individuals, designing comprehensive and effective depression detection methods has become an urgent need. This study explored physiological and behavioral perspectives simultaneously, fusing pervasive electroencephalography (EEG) and vocal signals to make the detection of depression more objective, effective, and convenient. After extracting several effective features from these two types of signals, we trained six representative classifiers on each modality, then captured the diversity and correlation of decisions from different classifiers using a co-decision tensor and combined these decisions into the ultimate classification result with a multi-agent strategy. Experimental results on 170 subjects (81 depressed patients and 89 normal controls) showed that the proposed multi-modal depression detection strategy is superior to single-modal classifiers and other typical late-fusion strategies in accuracy, F1-score, and sensitivity. This work indicates that late fusion of pervasive physiological and behavioral signals is promising for depression detection, and that the multi-agent strategy can effectively exploit the diversity and correlation of different classifiers to reach a better final decision.
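
A simplified late-fusion sketch on hypothetical data: separate classifiers for EEG and vocal features whose class probabilities are combined with assumed weights. The paper's co-decision tensor and multi-agent combination are reduced here to a weighted average for illustration.

```python
# Late fusion of two modality-specific classifiers (hypothetical features).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=170)                      # depressed vs. control
eeg = rng.normal(size=(170, 32)) + y[:, None] * 0.4   # hypothetical features
voice = rng.normal(size=(170, 20)) + y[:, None] * 0.3

idx_tr, idx_te = train_test_split(np.arange(170), random_state=0)
p_eeg = LogisticRegression().fit(eeg[idx_tr], y[idx_tr]).predict_proba(eeg[idx_te])
p_voc = LogisticRegression().fit(voice[idx_tr], y[idx_tr]).predict_proba(voice[idx_te])

fused = 0.6 * p_eeg + 0.4 * p_voc                     # assumed modality weights
acc = (fused.argmax(axis=1) == y[idx_te]).mean()
print(f"fused accuracy: {acc:.2f}")
```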


Subjects
Depression/diagnosis; Electroencephalography/methods; Signal Processing, Computer-Assisted; Sound Spectrography/methods; Speech/classification; Algorithms; Female; Humans; Male
6.
J Speech Lang Hear Res ; 62(9): 3265-3275, 2019 09 20.
Article in English | MEDLINE | ID: mdl-31433709

ABSTRACT

Purpose: To better enable communication among researchers, clinicians, and caregivers, we aimed to assess how untrained listeners classify early infant vocalization types in comparison to terms currently used by researchers and clinicians. Method: Listeners were caregivers with no prior formal education in speech and language development. A first group of listeners reported on clinician/researcher-classified vowel, squeal, growl, raspberry, whisper, laugh, and cry vocalizations obtained from archived video/audio recordings of 10 infants from 4 through 12 months of age. A list of commonly used terms was generated based on listener responses and the standard research terminology. A second group of listeners was presented with the same vocalizations and asked to select the terms from the list that they thought best described the sounds. Results: Listeners' classifications of the vocalizations largely overlapped with published categorical descriptors and yielded additional insight into commonly used alternate terms. The biggest discrepancies were found for the vowel category. Conclusion: Prior research has shown that caregivers are accurate in identifying canonical babbling, a major prelinguistic vocalization milestone occurring at about 6-7 months of age. The present findings indicate that caregivers are also well attuned to even earlier-emerging vocalization types. This supports the value of continuing basic and clinical research on the vocal types infants produce in the first months of life and on their potential diagnostic utility, and may also help improve communication between speech-language pathologists and families.


Subjects
Child Language; Phonation/physiology; Speech/classification; Speech/physiology; Adult; Female; Hearing; Humans; Infant; Male; Young Adult
7.
IEEE J Biomed Health Inform ; 23(6): 2294-2301, 2019 11.
Article in English | MEDLINE | ID: mdl-31034426

ABSTRACT

Childhood anxiety and depression often go undiagnosed. If left untreated, these conditions, collectively known as internalizing disorders, are associated with long-term negative outcomes, including substance abuse and increased risk of suicide. This paper presents a new approach for identifying young children with internalizing disorders using a 3-minute speech task. We show that machine learning analysis of audio data from the task can identify children with an internalizing disorder with 80% accuracy (54% sensitivity, 93% specificity). The speech features most discriminative of internalizing disorder are analyzed in detail, showing that affected children exhibit especially low-pitched voices, with repeatable speech inflections and content, and a high-pitched response to surprising stimuli relative to controls. The new tool outperforms clinical thresholds on parent-reported child symptoms, which identify children with an internalizing disorder with lower accuracy (67-77% versus 80%) and similar specificity (85-100% versus 93%) and sensitivity (0-58% versus 54%) in this sample. These results point toward the future use of this approach for screening children for internalizing disorders so that interventions can be deployed when they have the highest chance of long-term success.
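
For reference, the screening metrics quoted above can be computed from a binary confusion matrix; a small sketch with hypothetical predictions:

```python
# Sensitivity and specificity from a binary confusion matrix.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]   # 1 = internalizing disorder
y_pred = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]   # hypothetical classifier output

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"sensitivity = {tp / (tp + fn):.2f}")   # true-positive rate
print(f"specificity = {tn / (tn + fp):.2f}")   # true-negative rate
```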


Subjects
Anxiety/diagnosis; Depression/diagnosis; Machine Learning; Speech/classification; Child; Child, Preschool; Female; Humans; Male; Psychopathology; Signal Processing, Computer-Assisted
8.
IEEE Trans Cybern ; 49(9): 3293-3306, 2019 Sep.
Article in English | MEDLINE | ID: mdl-29994138

ABSTRACT

It is challenging to recognize facial action units (AUs) from spontaneous facial displays, especially when they are accompanied by speech. The major reason is that in current practice the information is extracted from a single source, i.e., the visual channel. However, facial activity is highly correlated with voice in natural human communication. Instead of solely improving visual observations, this paper presents a novel audiovisual fusion framework that makes the best use of visual and acoustic cues in recognizing speech-related facial AUs. In particular, a dynamic Bayesian network is employed to explicitly model the semantic and dynamic physiological relationships between AUs and phonemes, as well as measurement uncertainty. Experiments on a pilot audiovisual AU-coded database demonstrate that the proposed framework significantly outperforms state-of-the-art visual-based methods in recognizing speech-related AUs, especially those AUs whose visual observations are impaired during speech, and, more importantly, is also superior to audio-based methods and feature-level fusion methods that employ low-level audio features, by explicitly modeling and exploiting the physiological relationships between AUs and phonemes.


Subjects
Face; Pattern Recognition, Automated/methods; Speech; Algorithms; Bayes Theorem; Face/anatomy & histology; Face/physiology; Facial Expression; Facial Muscles/physiology; Humans; Signal Processing, Computer-Assisted; Speech/classification; Speech/physiology
9.
IEEE J Biomed Health Inform ; 23(4): 1618-1630, 2019 07.
Article in English | MEDLINE | ID: mdl-30137018

ABSTRACT

Parkinson's disease is a neurodegenerative disorder characterized by a variety of motor symptoms; in particular, difficulties starting and stopping movements have been observed in patients. From a technical/diagnostic point of view, these movement changes can be assessed by modeling the transitions between voiced and unvoiced segments in speech, the movement when the patient starts or stops a new stroke in handwriting, and the movement when the patient starts or stops walking. This study proposes a methodology to model such difficulties in starting or stopping movements using information from speech, handwriting, and gait. We used those transitions to train convolutional neural networks to classify patients and healthy subjects. The neurological state of the patients was also evaluated across different stages of the disease (initial, intermediate, and advanced). In addition, we evaluated the robustness of the proposed approach with speech signals in three different languages: Spanish, German, and Czech. According to the results, the fusion of information from the three modalities classifies patients and healthy subjects highly accurately and proves suitable for assessing the neurological state of the patients across several stages of the disease. We also interpreted the feature maps obtained from the deep learning architectures with respect to the presence or absence of the disease and the neurological state of the patients. To the best of our knowledge, this is one of the first works to assess Parkinson's disease from multimodal information using a deep learning approach.
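
A rough PyTorch sketch of the kind of model described, with illustrative sizes: a small 1D CNN classifying short signal chunks centered on transitions (e.g., voiced/unvoiced boundaries). The authors' multimodal fusion and exact architecture are not reproduced.

```python
# Small 1D CNN over transition-centered signal chunks (illustrative).
import torch
import torch.nn as nn

class TransitionCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=9, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))
        self.head = nn.Linear(32, n_classes)   # patient vs. healthy control

    def forward(self, x):                      # x: (batch, 1, samples)
        return self.head(self.net(x).squeeze(-1))

chunks = torch.randn(8, 1, 1600)               # e.g. 100 ms at 16 kHz
print(TransitionCNN()(chunks).shape)           # torch.Size([8, 2])
```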


Subjects
Deep Learning; Parkinson Disease/classification; Signal Processing, Computer-Assisted; Aged; Aged, 80 and over; Databases, Factual; Female; Gait/physiology; Gait Analysis; Handwriting; Humans; Image Processing, Computer-Assisted; Male; Middle Aged; Parkinson Disease/diagnosis; Parkinson Disease/physiopathology; ROC Curve; Speech/classification
10.
J Acoust Soc Am ; 144(5): EL410, 2018 11.
Article in English | MEDLINE | ID: mdl-30522292

ABSTRACT

Recent research has revealed substantial between-speaker variation in speech rhythm, which in effect refers to the temporal coordination of consonants and vowels. In the current proof-of-concept study, we investigated the hypothesis that these idiosyncrasies arise, in part, from differences in the tongue's movement amplitude. Speech rhythm was parameterized as the percentage of speech that is vocalic (%V) in the German pronoun "sie" [ziː]. The findings support the hypothesis: all else being equal, idiosyncratic %V values were proportional to a speaker's tongue movement area. This research underlines the importance of studying language-external factors, such as a speaker's individual tongue movement behavior, when investigating variation in temporal coordination.
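
The rhythm metric %V reduces to a simple duration ratio; a minimal sketch with a hypothetical vocalic segmentation:

```python
# %V: fraction of an utterance's duration that is vocalic (hypothetical).
vocalic_intervals = [(0.05, 0.18), (0.32, 0.41)]   # (start, end) in seconds
utterance_duration = 0.60                          # seconds, assumed

vocalic_time = sum(end - start for start, end in vocalic_intervals)
percent_v = 100 * vocalic_time / utterance_duration
print(f"%V = {percent_v:.1f}")                     # here: 36.7
```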


Subjects
Movement/physiology; Speech/physiology; Tongue/physiology; Adult; Algorithms; Electromagnetic Phenomena; Female; Germany/epidemiology; Humans; Language; Male; Phonetics; Speech/classification; Time Factors; Tongue/anatomy & histology
11.
PLoS One ; 13(12): e0207452, 2018.
Article in English | MEDLINE | ID: mdl-30517122

ABSTRACT

Foreign accents have been shown to have a considerable impact on how language is processed [1]. However, the impact of a foreign accent on semantic processing is not well understood. Conflicting results have been reported by previous event-related potential (ERP) studies investigating the impact of foreign-accentedness on the N400 effect elicited by semantic violations. Furthermore, these studies have examined only a subset of the four characteristics of the N400 (i.e., onset latency, latency, amplitude, and scalp distribution), and have been conducted in linguistic environments where foreign-accented speech is relatively uncommon. The current study therefore compared the N400 effect elicited by semantic violations in native Australian English versus Mandarin-accented English, in a context where foreign-accented speech is common. Factors that may be responsible for individual variability in N400 amplitude were also investigated. The results showed no differences between the N400s elicited by native and foreign-accented speech in any of the four aforementioned characteristics. However, the analysis of individual variability revealed an effect of familiarity with foreign-accented speech on the amplitude of N400 effects for semantic violations. An effect of working memory capacity on N400 amplitude was also found. These findings highlight the relevance of the ambient linguistic environment for studies of speech processing and demonstrate the interacting influences of speaker- and listener-related factors on semantic processing.
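
A sketch of the kind of N400 amplitude comparison such studies run, on simulated ERP data: mean amplitude in a 300-500 ms window compared between conditions with a paired t-test (the window and effect size are assumptions, not the study's values).

```python
# Simulated N400 mean-amplitude comparison between two conditions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
t = np.arange(-100, 800)                       # ms, 1 kHz sampling

def simulate_erp(n400_size, n=30):             # one averaged ERP per subject
    erp = rng.normal(scale=0.5, size=(n, t.size))
    erp[:, (t > 300) & (t < 500)] -= n400_size # negative-going deflection
    return erp

native = simulate_erp(2.0)
accented = simulate_erp(2.0)                   # same size, as the study found
win = (t >= 300) & (t <= 500)
tval, p = stats.ttest_rel(native[:, win].mean(1), accented[:, win].mean(1))
print(f"t = {tval:.2f}, p = {p:.3f}")
```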


Subjects
Auditory Perception/physiology; Comprehension/physiology; Speech Perception/physiology; Adolescent; Adult; Australia; Evoked Potentials/physiology; Female; Humans; Language; Male; Memory, Short-Term/physiology; Phonetics; Recognition, Psychology; Semantics; Speech/classification
12.
Med Biol Eng Comput ; 56(6): 1041-1051, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29134413

ABSTRACT

In this paper, we present a performance comparison of 14 feature evaluation criteria and 4 classifiers for isolated Thai word classification based on electromyography (EMG) signals, to find a near-optimal criterion and classifier. Ten subjects spoke 11 Thai number words in both audible and silent modes while EMG signals were captured from five positions on the facial and neck muscles. After signal collection and preprocessing, 22 EMG features widely used in the EMG recognition field were computed and then evaluated against 14 evaluation criteria, including both independent criteria (IC) and dependent criteria (DC), for feature evaluation and selection. Subsequently, the top nine features were selected for each criterion and used as inputs to classifiers. Four types of classifiers were employed with 10-fold cross-validation to estimate classification performance. The results showed that features selected with a DC based on Fisher's least-squares linear discriminant classifier (D_FLDA), used with a linear Bayes normal classifier (LBN), gave the best average accuracies: 93.25% and 80.12% in the audible and silent modes, respectively.
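
A simplified sketch of this evaluation pipeline on hypothetical EMG features: rank the 22 features with an ANOVA F criterion standing in for the paper's 14 criteria, keep the top nine, and estimate accuracy with 10-fold cross-validation.

```python
# Feature ranking + top-9 selection + 10-fold CV (hypothetical data).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import f_classif
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
y = np.repeat(np.arange(11), 10)               # 11 words x 10 repetitions
X = rng.normal(size=(110, 22))                 # 22 EMG features, as in the paper
X[:, :5] += y[:, None] * 0.15                  # make a few features informative

scores, _ = f_classif(X, y)                    # ANOVA F as a stand-in criterion
top9 = np.argsort(scores)[-9:]
acc = cross_val_score(LinearDiscriminantAnalysis(), X[:, top9], y, cv=10)
print(f"10-fold accuracy: {acc.mean():.2f}")
```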


Subjects
Electromyography/methods; Pattern Recognition, Automated/methods; Signal Processing, Computer-Assisted; Speech/physiology; Adult; Algorithms; Female; Humans; Male; Middle Aged; Speech/classification
13.
Behav Brain Sci ; 40: e46, 2017 Jan.
Article in English | MEDLINE | ID: mdl-26434499

ABSTRACT

How does sign language compare with gesture, on the one hand, and spoken language on the other? Sign was once viewed as nothing more than a system of pictorial gestures without linguistic structure. More recently, researchers have argued that sign is no different from spoken language, with all of the same linguistic structures. The pendulum is currently swinging back toward the view that sign is gestural, or at least has gestural components. The goal of this review is to elucidate the relationships among sign language, gesture, and spoken language. We do so by taking a close look not only at how sign has been studied over the past 50 years, but also at how the spontaneous gestures that accompany speech have been studied. We conclude that signers gesture just as speakers do. Both produce imagistic gestures along with more categorical signs or words. Because at present it is difficult to tell where sign stops and gesture begins, we suggest that sign should not be compared with speech alone but should be compared with speech-plus-gesture. Although it might be easier (and, in some cases, preferable) to blur the distinction between sign and gesture, we argue that distinguishing between sign (or speech) and gesture is essential to predict certain types of learning and allows us to understand the conditions under which gesture takes on properties of sign, and speech takes on properties of gesture. We end by calling for new technology that may help us better calibrate the borders between sign and gesture.


Subjects
Gestures; Sign Language; Speech/classification; Humans; Language Development; Learning/physiology; Speech/physiology
14.
PLoS One ; 11(8): e0160588, 2016.
Article in English | MEDLINE | ID: mdl-27529813

ABSTRACT

Automatic speech processing (ASP) has recently been applied to very large datasets of naturalistically collected, daylong recordings of child speech via an audio recorder worn by young children. The system developed by the LENA Research Foundation analyzes children's speech for research and clinical purposes, with a special focus on identifying and tagging family speech dynamics and the at-home acoustic environment from the auditory perspective of the child. A primary issue for researchers, clinicians, and families using the Language ENvironment Analysis (LENA) system is the degree to which the segment labels are valid. This classification study evaluates the computer ASP output against 23 trained human judges, who made about 53,000 classification judgments on segments tagged by the LENA ASP. Results indicate performance consistent with modern ASP systems, such as those using HMM methods, with the acoustic characteristics of fundamental frequency and segment duration being most important for both human and machine classifications. The results are likely to be important for interpreting and improving ASP output.


Subjects
Informatics/methods; Speech/classification; Acoustics; Adult; Automation; Humans; Infant; Statistics as Topic
15.
Clinics (Sao Paulo) ; 71(3): 114-27, 2016 Mar.
Article in English | MEDLINE | ID: mdl-27074171

ABSTRACT

OBJECTIVE: To propose and test the applicability of a dysphonia risk screening protocol with score calculation in individuals with and without dysphonia. METHOD: This descriptive cross-sectional study included 365 individuals (41 children, 142 adult women, 91 adult men, and 91 seniors) divided into a dysphonic group and a non-dysphonic group. The protocol consisted of 18 questions, and a score was calculated using a 10-cm visual analog scale. The value measured on the visual analog scale was added to the overall score, along with the other partial scores. Speech samples allowed analysis of the overall degree of vocal deviation and the initial definition of the respective groups; after six months, the group separation was confirmed using acoustic analysis. RESULTS: The mean total scores differed between the groups in all samples, ranging from 37.0 to 57.85 in the dysphonic group and from 12.95 to 19.28 in the non-dysphonic group, with overall means of 46.09 and 15.55, respectively. High sensitivity and specificity were demonstrated when discriminating between the groups with the following cut-off points: 22.50 (children), 29.25 (adult women), 22.75 (adult men), and 27.10 (seniors). CONCLUSION: The protocol demonstrated high sensitivity and specificity in differentiating groups of individuals with and without dysphonia across the different sample groups and is thus an effective instrument for use in voice clinics.
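
Cut-off points like those reported are typically chosen from a ROC curve; a sketch with hypothetical protocol scores, using Youden's J to pick the threshold (the authors' exact selection method is not stated here).

```python
# Screening cut-off selection via Youden's J (hypothetical scores).
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(6)
scores = np.concatenate([rng.normal(46, 10, 80),   # dysphonic group
                         rng.normal(16, 5, 80)])   # non-dysphonic group
labels = np.array([1] * 80 + [0] * 80)

fpr, tpr, thresholds = roc_curve(labels, scores)
best = np.argmax(tpr - fpr)                        # Youden's J = sens + spec - 1
print(f"cut-off {thresholds[best]:.2f}: "
      f"sensitivity {tpr[best]:.2f}, specificity {1 - fpr[best]:.2f}")
```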


Subjects
Dysphonia/diagnosis; Speech/classification; Surveys and Questionnaires; Visual Analog Scale; Adult; Aged; Auditory Perception/physiology; Child; Child, Preschool; Cross-Sectional Studies; Female; Humans; Male; Middle Aged; Risk Assessment/methods; Sensitivity and Specificity; Speech Acoustics; Voice Quality; Young Adult
18.
Atten Percept Psychophys ; 78(2): 566-82, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26542400

ABSTRACT

Learning nonnative speech categories is often considered a challenging task in adulthood. This difficulty is driven by cross-language differences in the weighting of the critical auditory dimensions that differentiate speech categories. For example, previous studies have shown that differentiating Mandarin tonal categories requires attending to dimensions related to pitch height and pitch direction. Relative to native speakers of Mandarin, native English speakers underweight the pitch direction dimension. In the current study, we examined the effect of explicit instructions (dimension instruction) on native English speakers' Mandarin tone category learning within the framework of a dual-learning-systems (DLS) model. This model predicts that successful speech category learning is initially mediated by an explicit, reflective learning system that frequently utilizes unidimensional rules, with an eventual switch to a more implicit, reflexive learning system that utilizes multidimensional rules. Participants were explicitly instructed to focus on and/or ignore the pitch height dimension or the pitch direction dimension, or were given no explicit prime. Our results show that instructions directing participants to focus on pitch direction, and instructions diverting attention away from pitch height, enhanced tone categorization. Computational modeling of participant responses suggested that instruction related to pitch direction led to faster and more frequent use of multidimensional reflexive strategies and enhanced perceptual selectivity along the previously underweighted pitch direction dimension.


Subjects
Acoustic Stimulation/methods; Learning/classification; Learning/physiology; Speech Perception/physiology; Speech/classification; Speech/physiology; Adult; Attention/physiology; Female; Humans; Language; Male
19.
Psicológica (Valencia, Ed. impr.) ; 37(1): 85-104, 2016. tab
Article in English | IBECS | ID: ibc-148722

ABSTRACT

In this study, the magnitudes of local dependence generated by cloze test items and reading comprehension items were compared, and their impact on parameter estimates and test precision was investigated. An advanced English-as-a-foreign-language reading comprehension test containing three reading passages and a cloze test was analyzed with a two-parameter logistic testlet response model and a two-parameter logistic item response model. Results showed that the cloze test produced substantially higher magnitudes of local dependence than the reading items, although the level of local dependence produced by the reading items was not negligible. Further analyses demonstrated that while even substantial magnitudes of testlet effect do not affect parameter estimates, they do influence test reliability and information. Implications of the research for foreign language proficiency testing, where testlets are regularly used, are discussed.
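
For reference, a standard formulation of the two-parameter logistic testlet response model (one common parameterization, not necessarily the authors' exact one), in which a person-specific testlet effect absorbs the local dependence among items of the same testlet:

```latex
% 2PL testlet response model: \gamma_{j d(i)} is person j's random effect
% for the testlet d(i) containing item i; setting \gamma_{j d(i)} = 0
% recovers the plain 2PL item response model used for comparison.
P(X_{ij} = 1 \mid \theta_j)
  = \frac{1}{1 + \exp\!\left[-a_i\left(\theta_j - b_i - \gamma_{j\,d(i)}\right)\right]}
```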




Subjects
Humans; Male; Female; Comprehension/ethics; Comprehension/physiology; Speech/physiology; Surveys and Questionnaires/classification; Engineering/methods; Linguistics/education; Psychology, Educational/education; Comprehension/classification; Speech/classification; Surveys and Questionnaires; Engineering/classification; Linguistics/trends; Psychology, Educational/methods
20.
Sensors (Basel) ; 16(1), 2015 Dec 25.
Article in English | MEDLINE | ID: mdl-26712757

ABSTRACT

In this paper, a new supervised classification paradigm, called classifier subset selection for stacked generalization (CSS stacking), is presented for speech emotion recognition. The new approach improves the bi-level multi-classifier system known as stacked generalization by integrating an estimation of distribution algorithm (EDA) in the first layer to select the optimal subset from the standard base classifiers. The good performance of the proposed paradigm was demonstrated over different configurations and datasets. First, several CSS stacking classifiers were constructed on the RekEmozio dataset, using specific standard base classifiers and a total of 123 spectral, quality, and prosodic features computed using in-house feature extraction algorithms. These initial CSS stacking classifiers were compared to other multi-classifier systems and to the standard classifiers built on the same set of speech features. Then, new CSS stacking classifiers were built on RekEmozio using a different set of acoustic parameters (the extended version of the Geneva Minimalistic Acoustic Parameter Set, eGeMAPS) and standard classifiers, employing the best meta-classifier from the initial experiments. The performance of these two CSS stacking classifiers was evaluated and compared. Finally, the new paradigm was tested on the well-known Berlin Emotional Speech database, comparing the performance of single, standard stacking, and CSS stacking systems using the same parametrization in the second phase. All classifications were performed at the categorical level, covering the six primary emotions plus the neutral one.
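
A reduced sketch of the CSS-stacking idea on synthetic data: candidate subsets of base classifiers are scored by cross-validation and the best subset is stacked. The paper selects subsets with an estimation of distribution algorithm, replaced here by an exhaustive search over a tiny pool.

```python
# Classifier subset selection for stacking, reduced to exhaustive search.
from itertools import combinations
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, n_classes=3,
                           n_informative=8, random_state=0)
base = [("nb", GaussianNB()), ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(random_state=0))]

best_subset, best_score = None, -np.inf
for r in (2, 3):                                  # candidate subset sizes
    for subset in combinations(base, r):
        clf = StackingClassifier(list(subset),
                                 final_estimator=LogisticRegression())
        score = cross_val_score(clf, X, y, cv=5).mean()
        if score > best_score:
            best_subset, best_score = [n for n, _ in subset], score

print(f"selected base classifiers: {best_subset} ({best_score:.2f})")
```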


Subjects
Emotions/classification; Machine Learning; Pattern Recognition, Automated/methods; Speech/classification; Female; Humans; Male