Search | Portal Regional da BVS

Towards interpretable speech biomarkers: exploring MFCCs.

Tracey, Brian; Volfson, Dmitri; Glass, James; Haulcy, R'mani; Kostrzebski, Melissa; Adams, Jamie; Kangarloo, Tairmae; Brodtmann, Amy; Dorsey, E Ray; Vogel, Adam.

Sci Rep; 13(1): 22787, 2023 Dec 21.

Article in English | MEDLINE | ID: mdl-38123603

ABSTRACT

While speech biomarkers of disease have attracted increased interest in recent years, a challenge is that features derived from signal processing or machine learning approaches may lack clinical interpretability. As an example, Mel frequency cepstral coefficients (MFCCs) have been identified in several studies as a useful marker of disease, but are regarded as uninterpretable. Here we explore correlations between MFCCs and more interpretable speech biomarkers. In particular, we quantify the MFCC2 endpoint, which can be interpreted as a weighted ratio of low- to high-frequency energy, a concept that has previously been linked to disease-induced voice changes. By exploring MFCC2 in several datasets, we show how its sensitivity to disease can be increased by adjusting computation parameters.

Subjects

Speech Acoustics; Speech; Signal Processing, Computer-Assisted
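
The abstract's reading of MFCC2 as a weighted ratio of low- to high-frequency energy can be illustrated with a minimal sketch. It assumes that MFCC2 denotes the cepstral coefficient with DCT-II index 1 (the first non-constant basis function) and that log mel-band energies are already available; the function names and the toy spectra are hypothetical and are not taken from the paper.

```python
import numpy as np


def dct_weights(n_bands: int, k: int) -> np.ndarray:
    """Unnormalized DCT-II basis function k evaluated over n_bands mel bands."""
    n = np.arange(n_bands)
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_bands))


def cepstral_coefficient(log_mel_energies: np.ndarray, k: int) -> float:
    """Project log mel-band energies onto DCT-II basis function k."""
    w = dct_weights(log_mel_energies.shape[-1], k)
    return float(np.dot(log_mel_energies, w))


# Toy log mel spectra (made-up values): one tilted toward low bands,
# the other its mirror image, tilted toward high bands.
n_bands = 20
low_heavy = np.log(np.linspace(10.0, 1.0, n_bands))
high_heavy = low_heavy[::-1]

# For k = 1 the weights are a half cosine: positive over the lower half
# of the bands and negative over the upper half.
w1 = dct_weights(n_bands, 1)

# A sum of weighted log energies is the log of a weighted energy ratio,
# so the sign tracks whether low or high bands dominate.
c_low = cepstral_coefficient(low_heavy, 1)
c_high = cepstral_coefficient(high_heavy, 1)

# Adding a constant to every log energy (an overall gain change) leaves
# the coefficient unchanged, because the k = 1 weights sum to zero.
c_low_louder = cepstral_coefficient(low_heavy + 3.0, 1)
```

Under these assumptions, a spectrum dominated by low-frequency energy gives a positive coefficient and its mirror image gives a negative one of equal magnitude, which is the sense in which the coefficient behaves like a log ratio rather than an absolute level.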

Classifying Alzheimer's Disease Using Audio and Text-Based Representations of Speech.

Haulcy, R'mani; Glass, James.

Front Psychol; 11: 624137, 2020.

Article in English | MEDLINE | ID: mdl-33519651

ABSTRACT

Alzheimer's Disease (AD) is a form of dementia that affects the memory, cognition, and motor skills of patients. Extensive research has been done to develop accessible, cost-effective, and non-invasive techniques for the automatic detection of AD. Previous research has shown that speech can be used to distinguish between healthy individuals and patients with AD. In this paper, the ADReSS dataset, a dataset balanced by gender and age, was used to automatically classify AD from spontaneous speech. The performance of five classifiers, as well as a convolutional neural network and a long short-term memory network, was compared when trained on audio features (i-vectors and x-vectors) and text features (word vectors, BERT embeddings, LIWC features, and CLAN features). The same audio and text features were used to train five regression models to predict the Mini-Mental State Examination score for each patient, a score that has a maximum value of 30. The top-performing classification models were the support vector machine and random forest classifiers trained on BERT embeddings, which both achieved an accuracy of 85.4% on the test set. The best-performing regression model was the gradient boosting regression model trained on BERT embeddings and CLAN features, which had a root mean squared error of 4.56 on the test set. The performance on both tasks illustrates the feasibility of using speech to classify AD and predict neuropsychological scores.
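
For readers unfamiliar with the two metrics reported in this abstract, the sketch below shows how classification accuracy and root mean squared error (RMSE) are typically computed. The helper names and all numeric values are made up for illustration; they are not data or code from the study.

```python
import numpy as np


def accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of predicted labels that match the true labels."""
    return float(np.mean(y_true == y_pred))


def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared error between true and predicted scores."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical diagnosis labels (1 = AD, 0 = control), illustration only.
labels_true = np.array([1, 0, 1, 1, 0])
labels_pred = np.array([1, 0, 0, 1, 0])
acc = accuracy(labels_true, labels_pred)

# Hypothetical MMSE scores on the 0 to 30 scale, illustration only.
mmse_true = np.array([30.0, 24.0, 18.0, 27.0])
mmse_pred = np.array([28.0, 25.0, 22.0, 27.0])
err = rmse(mmse_true, mmse_pred)
```

Because RMSE is expressed in the same units as the score, an RMSE reported for this task can be read directly as a typical error in MMSE points on the 30-point scale.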
