Results 1 - 20 of 38
1.
Med Eng Phys ; 126: 104161, 2024 04.
Article in English | MEDLINE | ID: mdl-38621841

ABSTRACT

The application of deep learning to the classification of pulse waves in Traditional Chinese Medicine (TCM) related to hypertensive target organ damage (TOD) is hindered by challenges such as low classification accuracy and inadequate generalization performance. To address these challenges, we introduce a lightweight transfer learning model named MobileNetV2SCP. This model transforms time-domain pulse waves into 36-dimensional frequency-domain waveform feature maps and establishes a dedicated pre-training network based on these maps to enhance the learning capability for small samples. To improve global feature correlation, we incorporate a novel fusion attention mechanism (SAS) into the inverted residual structure, along with the utilization of 3 × 3 convolutional layers and BatchNorm layers to mitigate model overfitting. The proposed model is evaluated using cross-validation results from 805 cases of pulse waves associated with hypertensive TOD. The assessment metrics, including Accuracy (92.74%), F1-score (91.47%), and Area Under the Curve (AUC) (97.12%), demonstrate superior classification accuracy and generalization performance compared to various state-of-the-art models. Furthermore, this study investigates the correlations between time-domain and frequency-domain features in pulse waves and their classification in hypertensive TOD. It analyzes key factors influencing pulse wave classification, providing valuable insights for the clinical diagnosis of TOD.
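
The abstract does not spell out how the 36-dimensional frequency-domain waveform feature maps are built, but a common way to turn a time-domain pulse wave into a fixed-length frequency-domain vector is to bin the FFT magnitude spectrum. The sketch below (NumPy only; the sampling rate, bin count, and frequency cut-off are illustrative assumptions, not the paper's settings) shows the general idea.

```python
import numpy as np

def pulse_wave_to_freq_features(pulse, fs=200.0, n_bins=36, f_max=25.0):
    """Bin the FFT magnitude spectrum of a 1-D pulse wave into n_bins features.
    fs, n_bins and f_max are illustrative assumptions, not the paper's values."""
    spectrum = np.abs(np.fft.rfft(pulse))            # one-sided magnitude spectrum
    freqs = np.fft.rfftfreq(len(pulse), d=1.0 / fs)  # corresponding frequencies (Hz)
    edges = np.linspace(0.0, f_max, n_bins + 1)      # bin edges up to f_max Hz
    feats = np.array([
        spectrum[(freqs >= lo) & (freqs < hi)].mean()
        if np.any((freqs >= lo) & (freqs < hi)) else 0.0
        for lo, hi in zip(edges[:-1], edges[1:])
    ])
    return feats / (feats.max() + 1e-8)              # normalise to [0, 1]

# Example: a 10-s synthetic pulse wave sampled at 200 Hz
t = np.arange(0, 10, 1 / 200.0)
wave = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.sin(2 * np.pi * 2.4 * t)
print(pulse_wave_to_freq_features(wave).shape)       # (36,)
```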


Subjects
Hypertension, Humans, Hypertension/complications
2.
Heliyon ; 10(4): e26583, 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38434048

ABSTRACT

In this manuscript, we introduce a novel methodology for modeling acoustic units within a mobile architecture, employing a synergistic combination of various motivating techniques, including deep learning, sparse coding, and wavelet networks. The core concept involves constructing a Deep Sparse Wavelet Network (DSWN) through the integration of stacked wavelet autoencoders. The DSWN is designed to classify a specific class and discern it from other classes within a dataset of acoustic units. Mel-frequency cepstral coefficients (MFCC) and perceptual linear predictive (PLP) features are utilized for encoding speech units. This approach is tailored to leverage the computational capabilities of mobile devices by establishing deep networks with minimal connections, thereby immediately reducing computational overhead. The experimental findings demonstrate the efficacy of our system when applied to a segmented corpus of Arabic words. Notwithstanding promising results, our methodology has limitations. One limitation concerns the use of a specific dataset of Arabic words; the generalizability of the Deep Sparse Wavelet Network (DSWN) to other contexts requires further investigation. We will evaluate the impact of speech variations, such as accents, on the performance of our model to gain a more nuanced understanding.
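
The wavelet-specific details of the DSWN are particular to this paper, but the stacked-sparse-autoencoder idea it builds on can be illustrated with a minimal PyTorch sketch; the layer sizes, the tanh activation, and the L1 sparsity weight below are assumptions rather than the authors' configuration.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """One sparse autoencoder stage; several can be stacked as in a DSWN-style model."""
    def __init__(self, in_dim=39, hidden_dim=16):  # e.g. 13 MFCC + deltas = 39 dims (assumed)
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Tanh())
        self.decoder = nn.Linear(hidden_dim, in_dim)

    def forward(self, x):
        h = self.encoder(x)
        return self.decoder(h), h

model = SparseAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(64, 39)                            # a batch of MFCC/PLP feature vectors
recon, hidden = model(x)
# Reconstruction loss plus an L1 penalty on hidden activations to encourage sparsity
loss = nn.functional.mse_loss(recon, x) + 1e-3 * hidden.abs().mean()
loss.backward()
opt.step()
```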

3.
Sci Rep ; 14(1): 6589, 2024 03 19.
Article in English | MEDLINE | ID: mdl-38504098

ABSTRACT

Identifying and recognizing food on the basis of its eating sounds is a challenging task, yet it plays an important role in avoiding allergenic foods, supporting the dietary preferences of people restricted to a particular diet, and showcasing cultural significance. In this research paper, the aim is to design a novel methodology that helps to identify food items by analyzing their eating sounds using various deep learning models. To achieve this objective, a system has been proposed that extracts meaningful features from food-eating sounds with the help of signal processing techniques and deep learning models for classifying them into their respective food classes. Initially, 1200 labeled audio files for 20 food items were collected and visualized to find relationships between the sound files of different food items. Later, to extract meaningful features, various techniques such as spectrograms, spectral rolloff, spectral bandwidth, and mel-frequency cepstral coefficients were used to clean the audio files and capture the unique characteristics of different food items. In the next phase, various deep learning models like GRU, LSTM, InceptionResNetV2, and a customized CNN model were trained to learn spectral and temporal patterns in audio signals. Besides this, hybridized models, i.e., Bidirectional LSTM + GRU, RNN + Bidirectional LSTM, and RNN + Bidirectional GRU, were also trained on the same labeled data to associate particular patterns of sound with their corresponding class of food item. During evaluation, the highest accuracy, precision, F1 score, and recall were obtained by GRU (99.28%), Bidirectional LSTM + GRU (97.7% and 97.3%), and RNN + Bidirectional LSTM (97.45%), respectively. The results of this study demonstrate that deep learning models have the potential to precisely identify foods on the basis of their sound.
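
As an illustration of the sequence-model stage, a minimal bidirectional GRU classifier over MFCC frame sequences might look like the PyTorch sketch below; the input dimensionality, hidden size, and sequence length are assumed placeholders, not values from the paper.

```python
import torch
import torch.nn as nn

class EatingSoundGRU(nn.Module):
    """Sketch of a bidirectional GRU classifier over MFCC frame sequences (shapes assumed)."""
    def __init__(self, n_mfcc=13, hidden=64, n_classes=20):
        super().__init__()
        self.gru = nn.GRU(n_mfcc, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):            # x: (batch, time, n_mfcc)
        out, _ = self.gru(x)         # (batch, time, 2 * hidden)
        return self.fc(out[:, -1])   # classify from the last time step

model = EatingSoundGRU()
dummy = torch.randn(4, 200, 13)      # 4 clips, 200 MFCC frames each
logits = model(dummy)                # (4, 20) class scores
```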


Subjects
Deep Learning, Humans, Recognition (Psychology), Food, Mental Recall, Records
4.
Brain Inform ; 11(1): 6, 2024 Feb 10.
Article in English | MEDLINE | ID: mdl-38340211

ABSTRACT

Sleep stage classification is a necessary step for diagnosing sleep disorders. Generally, experts use traditional methods based on every 30 seconds (s) of the biological signals, such as electrooculograms (EOGs), electrocardiograms (ECGs), electromyograms (EMGs), and electroencephalograms (EEGs), to classify sleep stages. Recently, various state-of-the-art approaches based on a deep learning model have been demonstrated to have efficient and accurate outcomes in sleep stage classification. In this paper, a novel deep convolutional neural network (CNN) combined with a long short-term memory (LSTM) model is proposed for sleep scoring tasks. A key frequency-domain feature, the Mel-frequency cepstral coefficient (MFCC), is extracted from EEG and EMG signals. The proposed method can learn features from frequency domains on different bio-signal channels. It first extracts the MFCC features from multi-channel signals and then inputs them to several convolutional layers and an LSTM layer. The learned representations are then fed to a fully connected layer and a softmax classifier for sleep stage classification. The experiments are conducted on two widely used sleep datasets, the Sleep Heart Health Study (SHHS) and Vincent's University Hospital/University College Dublin Sleep Apnoea (UCDDB), to test the effectiveness of the method. The results of this study indicate that the model can perform well in the classification of sleep stages using 2-dimensional (2D) MFCC features. The advantage of this feature is that it provides a two-dimensional data stream that retains information about each sleep stage, and using 2D data streams can reduce the time needed to retrieve the data compared with a one-dimensional stream. Another advantage of this method is that it eliminates the need for deep layers, which helps improve the efficiency of the model. For instance, with its reduced number of layers, our seven-layer model structure takes around 400 s to train and test 100 subjects in the SHHS1 dataset. Its best accuracy and Cohen's kappa are 82.35% and 0.75 for the SHHS dataset, and 73.07% and 0.63 for the UCDDB dataset, respectively.
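
MFCC extraction is normally applied to audio, but the same pipeline works on any 1-D biosignal once a sampling rate is supplied. A minimal librosa sketch for one 30-s EEG epoch follows; the 125 Hz sampling rate, filterbank size, and STFT parameters are assumptions, not the paper's settings.

```python
import numpy as np
import librosa

fs = 125                                              # assumed EEG sampling rate (Hz)
epoch = np.random.randn(30 * fs).astype(np.float32)   # one 30-s epoch; real data would come from an EDF file

# 2-D MFCC "image" for the epoch: (n_mfcc, n_frames)
mfcc = librosa.feature.mfcc(y=epoch, sr=fs, n_mfcc=13,
                            n_fft=256, hop_length=64, n_mels=26)
print(mfcc.shape)   # e.g. (13, ~60); this 2-D map is what a CNN + LSTM model would consume
```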

5.
Article in English | MEDLINE | ID: mdl-38340022

ABSTRACT

Multimodal sentiment analysis, an increasingly vital task in the realms of natural language processing and machine learning, addresses the nuanced understanding of emotions and sentiments expressed across diverse data sources. This study presents the Hybrid LXGB (Long short-term memory Extreme Gradient Boosting) Model, a novel approach for multimodal sentiment analysis that merges the strengths of long short-term memory (LSTM) and XGBoost classifiers. The primary objective is to address the intricate task of understanding emotions across diverse data sources, such as textual data, images, and audio cues. By leveraging the capabilities of deep learning and gradient boosting, the Hybrid LXGB Model achieves an exceptional accuracy of 97.18% on the CMU-MOSEI dataset, surpassing alternative classifiers, including LSTM, CNN, DNN, and XGBoost. This study not only introduces an innovative model but also contributes to the field by showcasing its effectiveness and balance in capturing the nuanced spectrum of sentiments within multimodal datasets. The comparison with equivalent studies highlights the model's remarkable success, emphasizing its potential for practical applications in real-world scenarios. The Hybrid LXGB Model offers a unique and promising perspective in the realm of multimodal sentiment analysis, demonstrating the significance of integrating LSTM and XGBoost for enhanced performance.
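
The abstract does not detail how the LSTM and XGBoost stages are coupled; one common hybrid pattern is to use LSTM hidden states as features for an XGBoost classifier, as in the hedged sketch below (random dummy data, an untrained encoder, and placeholder hyperparameters; the xgboost package is assumed to be installed, and a real system would train the encoder first).

```python
import numpy as np
import torch
import torch.nn as nn
from xgboost import XGBClassifier

class LSTMEncoder(nn.Module):
    """Encode a multimodal feature sequence into a fixed vector (sizes are assumptions)."""
    def __init__(self, in_dim=74, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)

    def forward(self, x):              # x: (batch, time, in_dim)
        _, (h, _) = self.lstm(x)
        return h[-1]                   # (batch, hidden)

encoder = LSTMEncoder()
seqs = torch.randn(200, 50, 74)        # 200 dummy multimodal feature sequences
labels = np.random.randint(0, 2, 200)  # dummy binary sentiment labels

with torch.no_grad():
    embeddings = encoder(seqs).numpy() # LSTM features handed to the boosted trees

clf = XGBClassifier(n_estimators=100, max_depth=4)
clf.fit(embeddings, labels)
print(clf.predict(embeddings[:5]))
```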

6.
Network ; 35(1): 1-26, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38018148

ABSTRACT

Heart sounds play a major role in the diagnosis of cardiac disorders, and early detection is crucial to safeguarding patients. Computerized heart sound classification strategies can deliver more exact results in a quicker and better manner. This paper proposes an automatic heart sound classification module based on a hybrid optimization-controlled deep learning strategy. The key contribution of this research is the satisfactory parameter tuning of the Deep Neural Network (DNN) classifier, which depends on the Hybrid Sneaky optimization algorithm. The developed sneaky optimization algorithm inherits the traits of questing and societal search agents. Moreover, input data from the Phonocardiogram (PCG) database undergo feature extraction, which extracts important features such as statistical and Heart Rate Variability (HRV) features; Mel-frequency cepstral coefficient (MFCC) features are added to enhance the performance of the model. The developed Sneaky optimization-based DNN classifier's performance is determined with respect to precision, accuracy, specificity, and sensitivity, which are around 97%, 96.98%, 97%, and 96.9%, respectively.


Subjects
Heart Diseases, Heart Sounds, Humans, Neural Networks (Computer), Algorithms, Factual Databases
7.
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi ; 40(6): 1152-1159, 2023 Dec 25.
Article in Chinese | MEDLINE | ID: mdl-38151938

ABSTRACT

Feature extraction methods and classifier selection are two critical steps in heart sound classification. To capture the pathological features of heart sound signals, this paper introduces a feature extraction method that combines mel-frequency cepstral coefficients (MFCC) and power spectral density (PSD). Unlike conventional classifiers, the adaptive neuro-fuzzy inference system (ANFIS) was chosen as the classifier for this study. In terms of experimental design, we compared different PSDs across various time intervals and frequency ranges, selecting the characteristics with the most effective classification outcomes. We compared four statistical properties, including mean PSD, standard deviation PSD, variance PSD, and median PSD. Through experimental comparisons, we found that combining the features of median PSD and MFCC with heart sound systolic period of 100-300 Hz yielded the best results. The accuracy, precision, sensitivity, specificity, and F1 score were determined to be 96.50%, 99.27%, 93.35%, 99.60%, and 96.35%, respectively. These results demonstrate the algorithm's significant potential for aiding in the diagnosis of congenital heart disease.
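
A hedged sketch of the described feature combination, median PSD in the 100-300 Hz band (via Welch's method) concatenated with MFCCs, is shown below; the sampling rate, Welch window length, and MFCC settings are assumptions, and the ANFIS classifier itself is not reproduced.

```python
import numpy as np
import librosa
from scipy.signal import welch

def heart_sound_features(x, fs=2000):
    """Median PSD in the 100-300 Hz band plus mean MFCCs (parameter values assumed)."""
    f, psd = welch(x, fs=fs, nperseg=512)
    band = (f >= 100) & (f <= 300)
    median_psd = np.median(psd[band])
    mfcc = librosa.feature.mfcc(y=x.astype(np.float32), sr=fs, n_mfcc=13)
    return np.concatenate([[median_psd], mfcc.mean(axis=1)])   # 1 + 13 features

x = np.random.randn(2 * 2000)          # stand-in for one systolic heart-sound segment
print(heart_sound_features(x).shape)   # (14,)
```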


Subjects
Congenital Heart Defects, Heart Sounds, Humans, Neural Networks (Computer), Algorithms
8.
Comput Biol Med ; 163: 107153, 2023 09.
Article in English | MEDLINE | ID: mdl-37321101

ABSTRACT

This study proposes a new deep learning-based method that demonstrates high performance in detecting Covid-19 disease from cough, breath, and voice signals. This impressive method, named CovidCoughNet, consists of a deep feature extraction network (InceptionFireNet) and a prediction network (DeepConvNet). The InceptionFireNet architecture, based on Inception and Fire modules, was designed to extract important feature maps. The DeepConvNet architecture, which is made up of convolutional neural network blocks, was developed to predict the feature vectors obtained from the InceptionFireNet architecture. The COUGHVID dataset containing cough data and the Coswara dataset containing cough, breath, and voice signals were used as the data sets. The pitch-shifting technique was used to augment the signal data, which significantly contributed to improving performance. Additionally, Chroma features (CF), Root mean square energy (RMSE), Spectral centroid (SC), Spectral bandwidth (SB), Spectral rolloff (SR), Zero crossing rate (ZCR), and Mel frequency cepstral coefficients (MFCC) feature extraction techniques were used to extract important features from voice signals. Experimental studies have shown that using the pitch-shifting technique improved performance by around 3% compared to raw signals. When the proposed model was used with the COUGHVID dataset (Healthy, Covid-19, and Symptomatic), a high performance of 99.19% accuracy, 0.99 precision, 0.98 recall, 0.98 F1-Score, 97.77% specificity, and 98.44% AUC was achieved. Similarly, when the voice data in the Coswara dataset was used, higher performance was achieved compared to the cough and breath studies, with 99.63% accuracy, 100% precision, 0.99 recall, 0.99 F1-Score, 99.24% specificity, and 99.24% AUC. Moreover, when compared with current studies in the literature, the proposed model was observed to exhibit highly successful performance. The codes and details of the experimental studies can be accessed from the relevant Github page: (https://github.com/GaffariCelik/CovidCoughNet).
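
A minimal sketch of the pitch-shifting augmentation and a subset of the listed features using librosa follows; the file path, the +2 semitone shift, and the averaging of frame-level features are placeholders rather than the paper's exact configuration.

```python
import numpy as np
import librosa

y, sr = librosa.load("cough.wav", sr=None)                     # placeholder path
y_shift = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)     # augmented copy, +2 semitones (assumed)

def voice_features(sig, sr):
    """A few of the listed features; frame-level values are averaged (an assumption)."""
    return np.hstack([
        librosa.feature.rms(y=sig).mean(),                     # root-mean-square energy
        librosa.feature.zero_crossing_rate(sig).mean(),        # ZCR
        librosa.feature.spectral_centroid(y=sig, sr=sr).mean(),
        librosa.feature.spectral_rolloff(y=sig, sr=sr).mean(),
        librosa.feature.chroma_stft(y=sig, sr=sr).mean(axis=1),      # 12 chroma features
        librosa.feature.mfcc(y=sig, sr=sr, n_mfcc=13).mean(axis=1),  # 13 MFCCs
    ])

X = np.vstack([voice_features(y, sr), voice_features(y_shift, sr)])  # original + augmented sample
```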


Subjects
COVID-19, Cough, Humans, Cough/diagnosis, COVID-19/diagnosis, Neural Networks (Computer)
9.
Signal Image Video Process ; 17(5): 1785-1792, 2023.
Article in English | MEDLINE | ID: mdl-36408330

ABSTRACT

This work investigates the significance of the voiced and unvoiced regions for detecting the common cold from the speech signal. In the literature, the entire speech signal is processed to detect the common cold and other diseases. This study uses a short-time energy-based approach to segment the voiced and unvoiced regions of the speech signal. Then, frame-wise mel frequency cepstral coefficients (MFCC) features are extracted from the voiced and unvoiced segments of each speech utterance, and statistics (mean, variance, skewness, and kurtosis) are calculated to get the feature vector for each speech utterance. The support vector machine (SVM) is utilized to analyze the performance of features extracted from the voiced and unvoiced regions. Results show that the features extracted from voiced segments, unvoiced segments, and complete active speech (CAS) give almost similar results using the MFCC features and SVM classifier. Therefore, rather than processing the CAS, we can process the unvoiced speech segments, which have fewer frames compared to CAS and voiced regions of speech. The processing of solely unvoiced segments can reduce the time and computation complexity of a speech signal-based common cold detection system.
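
A hedged sketch of this pipeline, short-time-energy thresholding to pick out low-energy (unvoiced) frames, MFCC statistics over those frames, and an SVM, is given below; the frame length, hop, and energy threshold are assumptions, and the data are random stand-ins.

```python
import numpy as np
import librosa
from scipy.stats import skew, kurtosis
from sklearn.svm import SVC

def unvoiced_mfcc_stats(y, sr, frame_len=400, hop=160):
    """MFCC statistics over low-energy (unvoiced) frames; all thresholds are assumptions."""
    frames = librosa.util.frame(y, frame_length=frame_len, hop_length=hop)  # (frame_len, n_frames)
    energy = (frames ** 2).sum(axis=0)
    unvoiced = energy < np.percentile(energy, 40)      # crude short-time-energy threshold
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=frame_len, hop_length=hop)
    m = mfcc[:, : unvoiced.size][:, unvoiced]          # keep MFCC frames aligned with unvoiced frames
    return np.hstack([m.mean(1), m.var(1), skew(m, 1), kurtosis(m, 1)])  # 4 x 13 = 52 features

# Dummy utterances and labels just to show the classifier stage
X = np.vstack([unvoiced_mfcc_stats(np.random.randn(16000).astype(np.float32), 16000)
               for _ in range(20)])
y_lbl = np.random.randint(0, 2, 20)
SVC(kernel="rbf").fit(X, y_lbl)
```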

10.
Int J Lang Commun Disord ; 58(2): 279-294, 2023 03.
Article in English | MEDLINE | ID: mdl-36117378

ABSTRACT

BACKGROUND: Auditory-perceptual assessment of voice is a subjective procedure. Artificial intelligence with deep learning (DL) may improve the consistency and accessibility of this task. It is unclear how a DL model performs on different acoustic features.

AIMS: To develop a generalizable DL framework for identifying dysphonia using a multidimensional acoustic feature.

METHODS & PROCEDURES: Recordings of sustained phonations of /a/ and /i/ were retrospectively collected from a clinical database. Subjects contained 238 dysphonic and 223 vocally healthy speakers of Chinese Mandarin. All audio clips were split into multiple 1.5-s segments and normalized to the same loudness level. Mel frequency cepstral coefficients and mel-spectrogram were extracted from these standardized segments. Each set of features was used in a convolutional neural network (CNN) to perform a binary classification task. The best feature was obtained through a five-fold cross-validation on a random selection of 80% data. The resultant DL framework was tested on the remaining 20% data and a public German voice database. The performance of the DL framework was compared with those of two baseline machine-learning models.

OUTCOMES & RESULTS: The mel-spectrogram yielded the best model performance, with a mean area under the receiver operating characteristic curve of 0.972 and an accuracy of 92% in classifying audio segments. The resultant DL framework significantly outperformed both baseline models in detecting dysphonic subjects on both test sets. The best outcomes were achieved when classifications were made based on all segments of both vowels, with 95% accuracy, 92% recall, 98% precision and 98% specificity on the Chinese test set, and 92%, 95%, 90% and 89%, respectively, on the German set.

CONCLUSIONS & IMPLICATIONS: This study demonstrates the feasibility of DL for automatic detection of dysphonia. The mel-spectrogram is a preferred acoustic feature for the task. This framework may be used for vocal health screening and facilitate automatic perceptual evaluation of voice in the era of big data.

WHAT THIS PAPER ADDS: What is already known on this subject: Auditory-perceptual assessment is the current gold standard in clinical evaluation of voice quality, but its value may be limited by the rater's reliability and accessibility. DL is a new method of artificial intelligence that can overcome these disadvantages and promote automatic voice assessment. This study explored the feasibility of a DL approach for automatic detection of dysphonia, along with a quantitative comparison of two common sets of acoustic features. What this study adds to existing knowledge: A CNN model is excellent at decoding multidimensional acoustic features, outperforming the baseline parameter-based models in identifying dysphonic voices. The first 13 mel-frequency cepstral coefficients (MFCCs) are sufficient for this task. The mel-spectrogram results in greater performance, indicating the acoustic features are presented in a more favourable way than the MFCCs to the CNN model. What are the potential or actual clinical implications of this work? DL is a feasible method for the detection of dysphonia. The current DL framework may be used for remote vocal health screening or documenting voice recovery after treatment. In future, DL models may potentially be used to perform auditory-perceptual tasks in an automatic, efficient, reliable and low-cost manner.
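
A minimal sketch of the preprocessing described above, splitting a sustained vowel into 1.5-s segments, normalizing each, and computing log-mel spectrograms, is shown below; the RMS-based normalization, mel settings, and file path are assumptions, not the study's exact procedure.

```python
import numpy as np
import librosa

def voice_segments_to_melspecs(y, sr, seg_sec=1.5, n_mels=64):
    """Split a sustained vowel into 1.5-s segments, RMS-normalise each,
    and return log-mel spectrograms (normalisation and mel settings are assumptions)."""
    seg_len = int(seg_sec * sr)
    specs = []
    for start in range(0, len(y) - seg_len + 1, seg_len):
        seg = y[start:start + seg_len]
        seg = seg / (np.sqrt(np.mean(seg ** 2)) + 1e-8)   # crude loudness normalisation
        mel = librosa.feature.melspectrogram(y=seg, sr=sr, n_mels=n_mels)
        specs.append(librosa.power_to_db(mel))            # log-mel "image" for the CNN
    return np.stack(specs)                                # (n_segments, n_mels, n_frames)

y, sr = librosa.load("sustained_a.wav", sr=16000)         # placeholder path
print(voice_segments_to_melspecs(y, sr).shape)
```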


Subjects
Deep Learning, Dysphonia, Humans, Dysphonia/diagnosis, Speech Acoustics, Retrospective Studies, Artificial Intelligence, Reproducibility of Results, Speech Production Measurement/methods, Acoustics
11.
Journal of Biomedical Engineering ; (6): 1152-1159, 2023.
Article in Chinese | WPRIM (Western Pacific) | ID: wpr-1008945

ABSTRACT

Feature extraction methods and classifier selection are two critical steps in heart sound classification. To capture the pathological features of heart sound signals, this paper introduces a feature extraction method that combines mel-frequency cepstral coefficients (MFCC) and power spectral density (PSD). Unlike conventional classifiers, the adaptive neuro-fuzzy inference system (ANFIS) was chosen as the classifier for this study. In terms of experimental design, we compared different PSDs across various time intervals and frequency ranges, selecting the characteristics with the most effective classification outcomes. We compared four statistical properties, including mean PSD, standard deviation PSD, variance PSD, and median PSD. Through experimental comparisons, we found that combining the features of median PSD and MFCC with heart sound systolic period of 100-300 Hz yielded the best results. The accuracy, precision, sensitivity, specificity, and F1 score were determined to be 96.50%, 99.27%, 93.35%, 99.60%, and 96.35%, respectively. These results demonstrate the algorithm's significant potential for aiding in the diagnosis of congenital heart disease.


Subjects
Humans, Heart Sounds, Neural Networks (Computer), Algorithms, Congenital Heart Defects
12.
PeerJ Comput Sci ; 9: e1740, 2023.
Article in English | MEDLINE | ID: mdl-38192463

ABSTRACT

Nowadays, biometric authentication has gained relevance due to the technological advances that have allowed its inclusion in many daily-use devices. However, this same advantage has also brought dangers, as spoofing attacks are now more common. This work addresses the vulnerabilities of automatic speaker verification authentication systems, which are prone to attacks arising from new techniques for the generation of spoofed audio. In this article, we present a countermeasure for these attacks using an approach that includes easy-to-implement feature extractors such as spectrograms and mel frequency cepstral coefficients, as well as a modular architecture based on deep neural networks. Finally, we evaluate our proposal using the well-known ASVspoof 2017 V2 database; the experiments show that the final architecture obtains the best performance, achieving an equal error rate of 6.66% on the evaluation set.
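
The headline metric here is the equal error rate; a standard, generic way to compute it from verification scores with scikit-learn (not the authors' code) is:

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(labels, scores):
    """EER: the operating point where false-acceptance and false-rejection rates meet."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2

labels = np.array([0, 0, 0, 1, 1, 1])              # 0 = spoofed, 1 = genuine (dummy data)
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.9])
print(f"EER = {equal_error_rate(labels, scores):.2%}")
```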

13.
Sensors (Basel) ; 22(23)2022 Nov 25.
Article in English | MEDLINE | ID: mdl-36501869

ABSTRACT

Gun violence has been on the rise in recent years. To help curb this negative influence on communities, machine learning strategies for gunshot detection can be developed and deployed. After outlining the procedure by which a typical type of gunshot-like sounds were measured, this paper focuses on the analysis of feature importance pertaining to gunshot and gunshot-like sounds. The random forest mean decrease in impurity and the SHapley Additive exPlanations feature importance analysis were employed for this task. From the feature importance analysis, feature reduction was then carried out. Features were extracted from 1-s audio clips via the Mel-frequency cepstral coefficients process and then reduced to a more manageable quantity using the above-mentioned feature reduction processes. These reduced features were sent to a random forest classifier. The SHapley Additive exPlanations feature importance output was compared to that of the mean decrease in impurity feature importance. The results show which Mel-frequency cepstral coefficient features are important in discriminating gunshot sounds from various gunshot-like sounds. Together with the feature importance/reduction processes, the recent uniform manifold approximation and projection method was used to compare the closeness of various gunshot-like sounds to gunshot sounds in the feature space. Finally, the approach presented in this paper provides people with a viable means to make gunshot sounds more discernible from other sounds.
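
A hedged sketch of the mean-decrease-in-impurity importance analysis on MFCC-style features follows (with scikit-learn's permutation importance as a second view); the data are random stand-ins, and the SHAP analysis used in the paper is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Dummy stand-in for per-clip MFCC features (e.g. 13 coefficients averaged over a 1-s clip)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 13))
y = rng.integers(0, 2, 300)                      # 1 = gunshot, 0 = gunshot-like (dummy labels)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
mdi = rf.feature_importances_                    # mean decrease in impurity, one value per MFCC
perm = permutation_importance(rf, X, y, n_repeats=10, random_state=0).importances_mean

for i in np.argsort(mdi)[::-1][:5]:
    print(f"MFCC {i}: MDI={mdi[i]:.3f}, permutation={perm[i]:.3f}")
```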


Subjects
Machine Learning, Sound, Humans
14.
Comput Biol Med ; 150: 106123, 2022 11.
Article in English | MEDLINE | ID: mdl-36228465

ABSTRACT

Recent investigations have started evaluating human respiratory sounds, such as recorded voice, cough, and breathing, from hospital-confirmed Covid-19 patients, which differ from the sounds of healthy persons. Cough-based detection of Covid-19 has also been considered together with non-respiratory and respiratory sound data related to all declared conditions. Covid-19 is a respiratory disease, usually caused by Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2). It is indispensable to detect positive cases early in order to reduce further spread of the virus and to treat affected patients promptly. With the constant rise in COVID-19 cases, there has been a constant rise in the need for efficient and safe ways to detect an infected individual. With cases multiplying constantly, current detection methods such as RT-PCR and rapid testing kits have fallen short in supply. An effective Covid-19 detection model using the devised hybrid Honey Badger Optimization-based Deep Neuro-Fuzzy Network (HBO-DNFN) is developed in this paper. Here, the audio signal is considered as input for detecting Covid-19. A Gaussian filter is applied to the input signal to remove noise, and then feature extraction is performed. Substantial features, such as spectral roll-off, spectral bandwidth, Mel frequency cepstral coefficients (MFCC), spectral flatness, zero crossing rate, spectral centroid, mean square energy, and spectral contrast, are extracted for further processing. Finally, the DNFN is applied for detecting Covid-19, and the deep learning model is trained by the designed hybrid HBO algorithm. Accordingly, the developed hybrid HBO method is newly designed by incorporating the Honey Badger optimization Algorithm (HBA) and the Jaya algorithm. The performance of the developed Covid-19 detection model is evaluated using three metrics: testing accuracy, sensitivity, and specificity. The developed hybrid HBO-based DNFN outperforms other existing approaches, with testing accuracy, sensitivity, and specificity of 0.9176, 0.9218, and 0.9219. All the test results are validated with the k-fold cross-validation method in order to assess the generalizability of these results. When the k-fold value is 9, the sensitivity of the existing techniques and the developed JHBO-based DNFN is 0.8982, 0.8816, 0.8938, and 0.9207, respectively. The sensitivity of the developed approach is improved by means of the Gaussian filtering model. At a k-fold value of 9, the specificity of DCNN is 0.9125, BI-AT-GRU is 0.8926, and XGBoost is 0.9014, while the developed JHBO-based DNFN achieves 0.9219.


Subjects
COVID-19, Deep Learning, Mustelidae, Humans, Animals, COVID-19/diagnosis, SARS-CoV-2, Cough, Respiratory Sounds
15.
Cognit Comput ; : 1-16, 2022 Oct 12.
Article in English | MEDLINE | ID: mdl-36247809

ABSTRACT

COVID-19 (coronavirus disease 2019) is an ongoing global pandemic caused by severe acute respiratory syndrome coronavirus 2. Recently, it has been demonstrated that the voice data of the respiratory system (i.e., speech, sneezing, coughing, and breathing) can be processed via machine learning (ML) algorithms to detect respiratory system diseases, including COVID-19. Consequently, many researchers have applied various ML algorithms to detect COVID-19 by using voice data from the respiratory system. However, most of the recent COVID-19 detection systems have worked on a limited dataset. In other words, the systems utilize cough and breath voices only and ignore the voices of the other respiratory system, such as speech and vowels. In addition, another issue that should be considered in COVID-19 detection systems is the classification accuracy of the algorithm. The particle swarm optimization-extreme learning machine (PSO-ELM) is an ML algorithm that can be considered an accurate and fast algorithm in the process of classification. Therefore, this study proposes a COVID-19 detection system by utilizing the PSO-ELM as a classifier and mel frequency cepstral coefficients (MFCCs) for feature extraction. In this study, respiratory system voice samples were taken from the Corona Hack Respiratory Sound Dataset (CHRSD). The proposed system involves thirteen different scenarios: breath deep, breath shallow, all breath, cough heavy, cough shallow, all cough, count fast, count normal, all count, vowel a, vowel e, vowel o, and all vowels. The experimental results demonstrated that the PSO-ELM was capable of attaining the highest accuracy, reaching 95.83%, 91.67%, 89.13%, 96.43%, 92.86%, 88.89%, 96.15%, 96.43%, 88.46%, 96.15%, 96.15%, 95.83%, and 82.89% for breath deep, breath shallow, all breath, cough heavy, cough shallow, all cough, count fast, count normal, all count, vowel a, vowel e, vowel o, and all vowel scenarios, respectively. The PSO-ELM is an efficient technique for the detection of COVID-19 utilizing voice data from the respiratory system.
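
As background, an extreme learning machine is a single-hidden-layer network whose input weights are random and whose output weights are solved in closed form; the minimal NumPy sketch below omits the PSO step that tunes the hidden weights in PSO-ELM, and all sizes are assumptions.

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine: random hidden layer, least-squares output weights.
    The PSO step that tunes the hidden weights/biases in PSO-ELM is omitted here."""
    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)              # hidden-layer outputs
        T = np.eye(y.max() + 1)[y]                    # one-hot targets
        self.beta = np.linalg.pinv(H) @ T             # closed-form output weights
        return self

    def predict(self, X):
        return np.argmax(np.tanh(X @ self.W + self.b) @ self.beta, axis=1)

X = np.random.randn(200, 13)                          # dummy MFCC feature vectors
y = np.random.randint(0, 2, 200)
print((ELM().fit(X, y).predict(X) == y).mean())       # training accuracy on dummy data
```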

16.
Sensors (Basel) ; 22(18)2022 Sep 14.
Article in English | MEDLINE | ID: mdl-36146316

ABSTRACT

Aphasia is a type of speech disorder that can cause speech defects in a person. Identifying the severity level of the aphasia patient is critical for the rehabilitation process. In this research, we identify ten aphasia severity levels motivated by specific speech therapies, based on the presence or absence of identified characteristics in aphasic speech, in order to give more specific treatment to the patient. In the aphasia severity level classification process, we experiment with different speech feature extraction techniques, lengths of input audio samples, and machine learning classifiers to assess their effect on classification performance. Aphasic speech is sensed by an audio sensor, recorded, divided into audio frames, and passed through an audio feature extractor before being fed into the machine learning classifier. According to the results, the mel frequency cepstral coefficient (MFCC) is the most suitable audio feature extraction method for the aphasic speech level classification process, as it outperformed the mel-spectrogram, chroma, and zero crossing rate features by a large margin. Furthermore, the classification performance is higher when 20 s audio samples are used compared with 10 s chunks, even though the performance gap is narrow. Finally, the deep neural network approach resulted in the best classification performance, which was slightly better than both K-nearest neighbor (KNN) and random forest classifiers, and it was significantly better than decision tree algorithms. Therefore, the study shows that aphasia level classification can be completed with accuracy, precision, recall, and F1-score values of 0.99 using MFCC for 20 s audio samples using the deep neural network approach in order to recommend corresponding speech therapy for the identified level. A web application was developed for English-speaking aphasia patients to self-diagnose the severity level and engage in speech therapies.


Subjects
Aphasia, Speech, Aphasia/diagnosis, Aphasia/therapy, Humans, Machine Learning, Neural Networks (Computer), Speech Therapy
17.
Knowl Based Syst ; 253: 109539, 2022 Oct 11.
Article in English | MEDLINE | ID: mdl-35915642

ABSTRACT

Alongside the currently used nasal swab testing, the COVID-19 pandemic situation would gain noticeable advantages from low-cost tests that are available at any time, anywhere, at a large scale, and with real-time answers. A novel approach for COVID-19 assessment is adopted here, discriminating negative subjects versus positive or recovered subjects. The scope is to identify potential discriminating features, highlight mid- and short-term effects of COVID on the voice, and compare two custom algorithms. A pool of 310 subjects took part in the study; recordings were collected in a low-noise, controlled setting employing three different vocal tasks. Binary classifications followed, using two different custom algorithms. The first was based on the coupling of boosting and bagging, with an AdaBoost classifier using Random Forest learners. A feature selection process was employed for the training, identifying a subset of features acting as clinically relevant biomarkers. The other approach was centered on two custom CNN architectures applied to mel-spectrograms, with a custom knowledge-based data augmentation. Performances, evaluated on an independent test set, were comparable: AdaBoost and CNN differentiated COVID-19 positive from negative with accuracies of 100% and 95% respectively, and recovered from negative individuals with accuracies of 86.1% and 75% respectively. This study highlights the possibility to identify COVID-19 positive subjects, foreseeing a tool for on-site screening, while also considering recovered subjects and the effects of COVID-19 on the voice. The two proposed novel architectures allow for the identification of biomarkers and demonstrate the ongoing relevance of traditional ML versus deep learning in speech analysis.
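
The boosting-and-bagging coupling described, AdaBoost with Random Forest base learners, can be set up in a few lines with scikit-learn; the sketch below uses synthetic data and assumed hyperparameters (and the `estimator=` keyword, which requires scikit-learn 1.2 or newer).

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.datasets import make_classification

# Boosting (AdaBoost) over bagged learners (small Random Forests); all sizes are assumptions.
clf = AdaBoostClassifier(
    estimator=RandomForestClassifier(n_estimators=20, max_depth=5),
    n_estimators=30,
)

X, y = make_classification(n_samples=300, n_features=40, random_state=0)  # stand-in for voice features
clf.fit(X, y)
print(clf.score(X, y))
```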

18.
Sensors (Basel) ; 22(11)2022 Jun 03.
Article in English | MEDLINE | ID: mdl-35684884

ABSTRACT

With conventional stethoscopes, the auscultation results may vary from one doctor to another due to a decline in his/her hearing ability with age or his/her different professional training, and the problematic cardiopulmonary sound cannot be recorded for analysis. In this paper, to resolve the above-mentioned issues, an electronic stethoscope was developed consisting of a traditional stethoscope with a condenser microphone embedded in the head to collect cardiopulmonary sounds and an AI-based classifier for cardiopulmonary sounds was proposed. Different deployments of the microphone in the stethoscope head with amplification and filter circuits were explored and analyzed using fast Fourier transform (FFT) to evaluate the effects of noise reduction. After testing, the microphone placed in the stethoscope head surrounded by cork is found to have better noise reduction. For classifying normal (healthy) and abnormal (pathological) cardiopulmonary sounds, each sample of cardiopulmonary sound is first segmented into several small frames and then a principal component analysis is performed on each small frame. The difference signal is obtained by subtracting PCA from the original signal. MFCC (Mel-frequency cepstral coefficients) and statistics are used for feature extraction based on the difference signal, and ensemble learning is used as the classifier. The final results are determined by voting based on the classification results of each small frame. After the testing, two distinct classifiers, one for heart sounds and one for lung sounds, are proposed. The best voting for heart sounds falls at 5-45% and the best voting for lung sounds falls at 5-65%. The best accuracy of 86.9%, sensitivity of 81.9%, specificity of 91.8%, and F1 score of 86.1% are obtained for heart sounds using 2 s frame segmentation with a 20% overlap, whereas the best accuracy of 73.3%, sensitivity of 66.7%, specificity of 80%, and F1 score of 71.5% are yielded for lung sounds using 5 s frame segmentation with a 50% overlap.
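
A hedged sketch of the framing-and-voting scheme, 2-s frames with 20% overlap and a majority vote over per-frame predictions, is shown below; the per-frame classifier is a stand-in lambda, and the 50% voting threshold is a simplification of the tuned voting ranges reported above.

```python
import numpy as np

def frame_signal(x, fs, frame_sec=2.0, overlap=0.2):
    """Split a recording into overlapping frames (2 s, 20 % overlap as in the heart-sound setup)."""
    frame_len = int(frame_sec * fs)
    hop = int(frame_len * (1 - overlap))
    return np.stack([x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)])

def classify_recording(x, fs, frame_classifier):
    """Majority vote over per-frame predictions (the per-frame model is a stand-in here)."""
    frames = frame_signal(x, fs)
    votes = np.array([frame_classifier(f) for f in frames])  # 0 = normal, 1 = abnormal
    return int(votes.mean() >= 0.5)                           # simple 50 % voting threshold

fs = 4000
recording = np.random.randn(10 * fs)                          # dummy 10-s cardiopulmonary recording
print(classify_recording(recording, fs, lambda f: int(f.std() > 1.0)))  # placeholder per-frame rule
```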


Subjects
Stethoscopes, Algorithms, Auscultation, Electronics, Female, Humans, Male, Respiratory Sounds, Computer-Assisted Signal Processing
19.
Int J Neural Syst ; 32(6): 2250024, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35575003

ABSTRACT

In recent years, speech emotion recognition (SER) has emerged as one of the most active human-machine interaction research areas. Innovative electronic devices, services and applications are increasingly aiming to check the user emotional state either to issue alerts under some predefined conditions or to adapt the system responses to the user emotions. Voice expression is a very rich and noninvasive source of information for emotion assessment. This paper presents a novel SER approach that is a hybrid of a time-distributed convolutional neural network (TD-CNN) and a long short-term memory (LSTM) network. Mel-frequency log-power spectrograms (MFLPSs) extracted from audio recordings are parsed by a sliding window that selects the input for the TD-CNN. The TD-CNN transforms the input image data into a sequence of high-level features that are fed to the LSTM, which carries out the overall signal interpretation. In order to reduce overfitting, the MFLPS representation allows innovative image data augmentation techniques that have no immediate equivalent on the original audio signal. Validation of the proposed hybrid architecture achieves an average recognition accuracy of 73.98% on the most widely used and hardest publicly distributed database for SER benchmarking. A permutation test confirms that this result is significantly different from random classification ([Formula: see text]). The proposed architecture outperforms state-of-the-art deep learning models as well as conventional machine learning techniques evaluated on the same database trying to identify the same number of emotions.


Subjects
Emotions, Speech, Humans, Machine Learning, Neural Networks (Computer), Perception
20.
Multimed Tools Appl ; 81(27): 39185-39205, 2022.
Article in English | MEDLINE | ID: mdl-35505670

ABSTRACT

Every respiratory-related checkup includes audio samples collected from the individual through different tools (sonograph, stethoscope). This audio is analyzed to identify pathology, which requires time and effort. The research work proposed in this paper aims at easing this task with deep learning, diagnosing lung-related pathologies using a Convolutional Neural Network (CNN) with the help of transformed features from the audio samples. The International Conference on Biomedical and Health Informatics (ICBHI) corpus dataset was used for lung sounds. Here a novel approach is proposed to pre-process the data and pass it through a newly proposed CNN architecture. The combination of the pre-processing steps MFCC, mel-spectrogram, and Chroma CENS with the CNN improves the performance of the proposed system, which helps to make an accurate diagnosis from lung sounds. The comparative analysis shows how the proposed approach performs better than previous state-of-the-art research approaches. It also shows that there is no need for a wheeze or a crackle to be present in the lung sound to carry out the classification of respiratory pathologies.
