1.
J Acoust Soc Am ; 155(2): 1198-1215, 2024 02 01.
Article in English | MEDLINE | ID: mdl-38341746

ABSTRACT

Speech and language development are early indicators of overall analytical and learning ability in children. The preschool classroom is a rich language environment for monitoring and ensuring growth in young children by measuring their vocal interactions with teachers and classmates. Early childhood researchers are naturally interested in analyzing naturalistic rather than controlled lab recordings to measure both the quality and quantity of such interactions. Unfortunately, present-day speech technologies are not capable of addressing the wide, dynamic scenarios of early childhood classroom settings. Due to the diversity of acoustic events and conditions in such daylong audio streams, automated speaker diarization technology would need to be advanced to address this challenging domain for audio segmentation as well as information extraction. This study investigates alternative lightweight, knowledge-distilled, deep learning-based diarization solutions for segmenting classroom interactions of 3- to 5-year-old children with teachers. In this context, the focus is on speech-type diarization, which classifies speech segments as being from either adults or children, partitioned across multiple classrooms. Our lightest CNN model achieves a best F1-score of ∼76.0% on data from two classrooms, based on dev and test sets of each classroom. It is used with automatic speech recognition-based re-segmentation modules to perform child-adult diarization. Additionally, F1-scores are obtained for individual segments with corresponding speaker tags (e.g., adult vs child), which provide knowledge for educators on child engagement through naturalistic communications. The study demonstrates the prospects of addressing educational assessment needs through communication audio stream analysis, while maintaining both the security and privacy of all children and adults. The resulting child communication metrics have been used for broad-based feedback for teachers with the help of visualizations.
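The abstract reports F1-scores for segments tagged as adult or child speech. As a minimal illustration of how such a score can be computed, the sketch below derives per-class and macro-averaged F1 from two lists of segment labels. The function names and the example labels are hypothetical, and the averaging scheme is an assumption: the abstract does not say how the reported F1 was aggregated.

```python
def f1_per_class(reference, predicted, labels=("adult", "child")):
    """Return {label: F1} for two equal-length lists of segment labels."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    pairs = list(zip(reference, predicted))
    scores = {}
    for label in labels:
        tp = sum(r == label and p == label for r, p in pairs)  # true positives
        fp = sum(r != label and p == label for r, p in pairs)  # false positives
        fn = sum(r == label and p != label for r, p in pairs)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(reference, predicted, labels=("adult", "child")):
    """Unweighted mean of the per-class F1 scores (an assumed aggregation)."""
    per_class = f1_per_class(reference, predicted, labels)
    return sum(per_class.values()) / len(per_class)


# Hypothetical segment labels, not data from the study.
reference = ["adult", "child", "child", "adult", "child", "adult"]
predicted = ["adult", "child", "adult", "adult", "child", "child"]

print(f1_per_class(reference, predicted))
print(macro_f1(reference, predicted))
```

Each class here has two true positives, one false positive, and one false negative, so both per-class scores and the macro average equal 2/3.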


Subjects
Speech Perception , Speech , Adult , Humans , Preschool Child , Communication , Language , Language Development
2.
Annu Int Conf IEEE Eng Med Biol Soc ; 2018: 4909-4913, 2018 Jul.
Article in English | MEDLINE | ID: mdl-30441444

ABSTRACT

Pediatric speech sound disorders (SSDs) encompass a wide range of speech production deficits that can interfere with children's educational growth, social engagement, and employment opportunities. Early detection of SSDs can facilitate timely intervention and minimize the potential for life-long adverse effects, but distinguishing between typical and atypical speech production in preschoolers is challenging due to developmental and individual variability in speech acquisition. In this study, we apply Gaussian Mixture Models to speech samples from 3- to 6-year-old children, recorded by parents using an iOS app. Speech-language pathologists previously classified the samples as positive ('at risk' speech, warranting a referral for a speech-language evaluation) or negative ('no risk' speech). In a series of exploratory analyses, we develop novel distance measures and group scoring techniques that show good subject-level prediction accuracy. Our results provide evidence that it may be feasible to use Speech Processing and Speaker Verification techniques to model and screen speech samples from children for possible speech sound disorders.
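The abstract does not specify its distance measures or group scoring techniques. As a generic illustration of the speaker-verification style of scoring it alludes to, the sketch below scores feature frames by a log-likelihood ratio between two diagonal-covariance Gaussian mixture models and averages the sample scores per child. This is a swapped-in textbook approach, not the authors' method; all model parameters, function names, and feature values are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def gmm_loglik(frame, gmm):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM.

    `gmm` is a list of (weight, means, variances) tuples, one per component.
    """
    component_logs = [
        math.log(weight)
        + sum(gaussian_logpdf(x, m, v) for x, m, v in zip(frame, means, variances))
        for weight, means, variances in gmm
    ]
    peak = max(component_logs)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def sample_score(frames, at_risk_gmm, no_risk_gmm):
    """Mean per-frame log-likelihood ratio for one recorded speech sample."""
    ratios = [gmm_loglik(f, at_risk_gmm) - gmm_loglik(f, no_risk_gmm) for f in frames]
    return sum(ratios) / len(ratios)


def subject_decision(samples, at_risk_gmm, no_risk_gmm, threshold=0.0):
    """Average the sample scores of one child and compare with a threshold."""
    scores = [sample_score(frames, at_risk_gmm, no_risk_gmm) for frames in samples]
    mean_score = sum(scores) / len(scores)
    return ("at risk" if mean_score > threshold else "no risk"), mean_score


# Hypothetical 2-D models; real systems would train these on acoustic features.
no_risk_gmm = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])]
at_risk_gmm = [(0.5, [3.0, 3.0], [1.0, 1.0]), (0.5, [4.0, 4.0], [1.0, 1.0])]

child_a = [[[3.4, 3.6], [3.8, 3.1]], [[2.9, 3.3]]]  # two samples
child_b = [[[0.2, 0.7], [0.9, 0.4]]]                # one sample

print(subject_decision(child_a, at_risk_gmm, no_risk_gmm))
print(subject_decision(child_b, at_risk_gmm, no_risk_gmm))
```

Averaging at the subject level means a single noisy recording is less likely to decide the outcome on its own, which is one plausible reason to score groups of samples rather than individual ones.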


Subjects
Speech Sound Disorder , Speech , Child , Preschool Child , Humans , Parents , Speech Production Measurement
3.
Int J Speech Lang Pathol ; 20(6): 669-679, 2018 11.
Article in English | MEDLINE | ID: mdl-30409057

ABSTRACT

Purpose: This research aimed to automatically predict intelligible speaking rate for individuals with Amyotrophic Lateral Sclerosis (ALS) based on speech acoustic and articulatory samples. Method: Twelve participants with ALS and two normal subjects produced a total of 1831 phrases. An NDI Wave system was used to collect tongue and lip movement and acoustic data synchronously. A machine learning algorithm (i.e., support vector machine) was used to predict intelligible speaking rate (speech intelligibility × speaking rate) from acoustic and articulatory features of the recorded samples. Result: Used separately, acoustic, lip movement, and tongue movement information yielded R² values of 0.652, 0.660, and 0.678 and Root Mean Squared Errors (RMSE) of 41.096, 41.166, and 39.855 words per minute (WPM) between the predicted and actual values, respectively. Combining acoustic, lip, and tongue information, we obtained the highest R² (0.712) and the lowest RMSE (37.562 WPM). Conclusion: The results revealed that our proposed analyses predicted the intelligible speaking rate of the participant with reasonably high accuracy by extracting the acoustic and/or articulatory features from one short speech sample. With further development, the analyses may be well-suited for clinical applications that require automatic speech severity prediction.
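The quantities named in this abstract can be made concrete with a small sketch: intelligible speaking rate as the product of intelligibility and speaking rate, and R² and RMSE as measures of agreement between predicted and actual values. The function names and the numbers are hypothetical. Two assumptions are made: intelligibility is expressed as a fraction between 0 and 1, and R² is the coefficient of determination (the abstract does not say whether it is that or a squared correlation).

```python
import math


def intelligible_speaking_rate(intelligibility, speaking_rate_wpm):
    """Intelligibility (assumed fraction in [0, 1]) times speaking rate in WPM."""
    return intelligibility * speaking_rate_wpm


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical values in words per minute, not data from the study.
actual = [100.0, 150.0, 200.0]
predicted = [110.0, 140.0, 200.0]

print(intelligible_speaking_rate(0.8, 150.0))  # 80% intelligible at 150 WPM
print(r_squared(actual, predicted))
print(rmse(actual, predicted))
```

For these values the residual sum of squares is 200 and the total sum of squares is 5000, so R² is 0.96 and the RMSE is about 8.16 WPM.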


Subjects
Amyotrophic Lateral Sclerosis/complications , Speech Disorders/diagnosis , Speech Disorders/etiology , Speech Recognition Software , Support Vector Machine , Adult , Aged , Female , Humans , Male , Middle Aged , Speech Acoustics , Speech Intelligibility/physiology , Speech Production Measurement/methods
4.
Article in English | MEDLINE | ID: mdl-29423454

ABSTRACT

Amyotrophic lateral sclerosis (ALS) is a rapidly progressive neurological disease that affects the speech motor functions, resulting in dysarthria, a motor speech disorder. Speech and articulation deterioration is an indicator of the disease progression of ALS; timely monitoring of the disease progression is critical for clinical management of these patients. This paper investigated machine prediction of the intelligible speaking rate of nine individuals with ALS based on a small number of speech acoustic and articulatory samples. Two feature selection techniques (decision tree and gradient boosting) were used with support vector regression for predicting the intelligible speaking rate. Experimental results demonstrated the feasibility of predicting intelligible speaking rate from only a small number of speech samples. Furthermore, adding articulatory features to acoustic features improved prediction performance when a decision tree was used as the feature selection technique.

...