Depression detection based on linear and nonlinear speech features in I-vector/SVDA framework.
Mobram, Shamim; Vali, Mansour.
  • Mobram S; Speech and Sound Processing Lab, Department of Electrical Engineering, K.N. Toosi University of Technology, Tehran, Iran. Electronic address: sh.mobram@email.kntu.ac.ir.
  • Vali M; Speech and Sound Processing Lab, Department of Electrical Engineering, K.N. Toosi University of Technology, Tehran, Iran. Electronic address: mansour.vali@eetd.kntu.ac.ir.
Comput Biol Med; 149: 105926, 2022 Oct.
Article in English | MEDLINE | ID: covidwho-2035907
ABSTRACT
This study proposes depression detection systems based on the i-vector framework for classifying speakers as depressed or healthy and for predicting depression levels according to the Beck Depression Inventory-II (BDI-II). Linear and non-linear speech features are investigated as front-end features for i-vector extraction. To exploit the complementary effects of these features, i-vector systems based on linear and non-linear features are combined through decision-level fusion. Variability compensation techniques, such as Linear Discriminant Analysis (LDA) and Within-Class Covariance Normalization (WCCN), are widely used to reduce unwanted variabilities; however, a technique more generalizable than LDA is required when limited training data are available. To address this problem, we employ a Support Vector Discriminant Analysis (SVDA) technique that uses class boundaries to find discriminatory directions. Experiments conducted on the 2014 Audio-Visual Emotion Challenge and Workshop (AVEC 2014) depression database indicate that the best accuracy improvement obtained using SVDA is about 15.15% over uncompensated i-vectors. In all cases, the experimental results confirm that decision-level fusion of i-vector systems based on three feature sets, TEO-CB-Auto-Env+Δ, Glottal+Δ, and MFCC+Δ+ΔΔ, achieves the best results. This fusion significantly improves classification, yielding an accuracy of 90%. The combination of SVDA-transformed BDI-II score prediction systems based on these three feature sets achieves an RMSE of 8.899 and an MAE of 6.991, corresponding to improvements of 29.18% and 30.34%, respectively, over the baseline system on the test partition. Furthermore, the proposed combination outperforms other audio-based studies on the AVEC 2014 database reported in the literature.
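The abstract describes a pipeline of per-feature-set i-vector subsystems, an SVDA projection, and decision-level fusion. The snippet below is a minimal sketch of that kind of fusion, not the authors' code: it assumes i-vectors are already extracted as NumPy arrays, substitutes scikit-learn's LDA for the paper's SVDA projection (SVDA is not available in standard libraries), and realizes decision-level fusion as a simple majority vote over the three subsystem predictions.

```python
# Hedged sketch (not the authors' implementation): decision-level fusion of
# three i-vector subsystems, one per feature set (TEO-CB-Auto-Env+D, Glottal+D,
# MFCC+D+DD). LDA stands in for the paper's SVDA projection, and the i-vectors
# are assumed to be precomputed arrays.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def train_subsystem(ivectors, labels):
    """Train one depressed/healthy classifier on i-vectors from a single feature set."""
    clf = make_pipeline(
        StandardScaler(),
        LinearDiscriminantAnalysis(),   # stand-in for the SVDA projection step
        SVC(kernel="linear"),
    )
    clf.fit(ivectors, labels)
    return clf


def fuse_decisions(subsystems, ivectors_per_set):
    """Decision-level fusion: majority vote over per-feature-set predictions."""
    votes = np.stack([clf.predict(X) for clf, X in zip(subsystems, ivectors_per_set)])
    return (votes.mean(axis=0) >= 0.5).astype(int)


if __name__ == "__main__":
    # Toy stand-in data: 60 speakers, 20-dimensional i-vectors per feature set.
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=60)
    feature_sets = [rng.normal(size=(60, 20)) + y[:, None] * 0.5 for _ in range(3)]

    systems = [train_subsystem(X, y) for X in feature_sets]
    fused = fuse_decisions(systems, feature_sets)
    print("fused training accuracy:", (fused == y).mean())
```

Majority voting is only one possible fusion rule; the same structure accommodates weighted or score-level combinations of the subsystem outputs.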
Full text: Available | Collection: International databases | Database: MEDLINE | Main subject: Speech / Depression | Type of study: Diagnostic study / Prognostic study | Language: English | Journal: Comput Biol Med | Year: 2022 | Document Type: Article