Results 1 - 20 of 4,505
1.
Neuroscience ; 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38992565

ABSTRACT

The neuroimaging mechanisms underlying differences in the outcomes of sound therapy for tinnitus patients remain unclear. We hypothesized that abnormal hierarchical architecture is a neural biomarker that explains treatment outcome. We conducted functional connectome gradient analyses on resting-state functional MRI images acquired before intervention to investigate differences among patients with effective treatment (ET, n = 27), patients with ineffective treatment (IT, n = 41), and healthy controls (HC, n = 59). General linear models were used to analyze associations between intergroup differential regions and clinical characteristics. Partial least squares regression was employed to reveal correlations with gene expression. Compared to HC, both the ET and IT groups displayed significant differences in the default mode network. Moreover, the ET group exhibited a wider gradient range and greater gradient variance. The gradient scores of the regions differentiating the ET and HC groups were significantly correlated with Self-rating Anxiety Scale and Self-rating Depression Scale scores and were positively correlated with the transcriptional profiles of genes related to depression and anxiety. Our results indicate that the abnormalities in the ET group may be more relevant to psychiatric disorders, conferring higher therapeutic potential owing to the plasticity of the nervous system. Connectome gradient dysfunction, supported by genetic evidence, may serve as a pretreatment indicator of diverse sound therapy outcomes in tinnitus patients.
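A minimal sketch of the partial least squares step described above, relating regional gradient scores to gene expression with scikit-learn; the region count, gene panel, and all data below are synthetic placeholders rather than the study's imaging or transcriptomic data.

    # Hypothetical PLS sketch: regional gradient differences vs. gene expression.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(0)
    n_regions, n_genes = 200, 500                      # assumed parcellation / gene panel
    gene_expr = rng.normal(size=(n_regions, n_genes))  # stand-in for regional expression maps
    gradient_score = rng.normal(size=(n_regions, 1))   # stand-in for ET-vs-HC gradient scores

    pls = PLSRegression(n_components=2)
    pls.fit(gene_expr, gradient_score)
    gene_weights = pls.x_weights_[:, 0]                # gene loadings on the first component
    r = np.corrcoef(pls.transform(gene_expr)[:, 0], gradient_score[:, 0])[0, 1]
    print(gene_weights.shape, round(r, 3))             # which genes track the gradient map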

2.
Front Cell Neurosci ; 18: 1414484, 2024.
Article in English | MEDLINE | ID: mdl-38962512

ABSTRACT

Acetylcholine (ACh) is a prevalent neurotransmitter throughout the nervous system. In the brain, ACh is widely regarded as a potent neuromodulator. In neurons, ACh signals are conferred through a variety of receptors that influence a broad range of neurophysiological phenomena such as transmitter release or membrane excitability. In sensory circuitry, ACh modifies neural responses to stimuli and coordinates the activity of neurons across multiple levels of processing. These factors enable individual neurons or entire circuits to rapidly adapt to the dynamics of complex sensory stimuli, underscoring an essential role for ACh in sensory processing. In the auditory system, histological evidence shows that acetylcholine receptors (AChRs) are expressed at virtually every level of the ascending auditory pathway. Despite its apparent ubiquity in auditory circuitry, investigation of the roles of this cholinergic network has been mainly focused on the inner ear or forebrain structures, while less attention has been directed at regions between the cochlear nuclei and midbrain. In this review, we highlight what is known about cholinergic function throughout the auditory system from the ear to the cortex, but with a particular emphasis on brainstem and midbrain auditory centers. We will focus on receptor expression, mechanisms of modulation, and the functional implications of ACh for sound processing, with the broad goal of providing an overview of a newly emerging view of impactful cholinergic modulation throughout the auditory pathway.

3.
Clin Linguist Phon ; : 1-17, 2024 Jul 04.
Article in English | MEDLINE | ID: mdl-38965836

ABSTRACT

A small body of research, along with reports from educational and clinical practice, suggests that teaching literacy skills may facilitate the development of speech sound production in students with intellectual disabilities (ID). However, intervention research is needed to test this potential connection. This study aimed to investigate whether twelve weeks of systematic, digital literacy intervention enhanced speech sound production in students with ID and communication difficulties. A sample of 121 students with ID were assigned to four groups: phonics-based intervention, comprehension-based intervention, a combination of phonics- and comprehension-based intervention, and a comparison group with teaching-as-usual. Speech sound production was assessed before and after the intervention. The results for the data without the imputed variable suggested a significant positive effect of systematic, digital literacy interventions on speech sound production. However, results from sensitivity analyses with imputed missing data were more ambiguous, with the effect only approaching significance (ps = .05-.07) for one of the interventions. Nonetheless, we tentatively suggest that systematic, digital literacy intervention could support speech development in students with ID and communication difficulties. Future research should confirm and further elucidate the functional mechanisms of this link so that instruction and the pivotal abilities of speech and reading can be improved.

4.
J Neurosci Methods ; 409: 110213, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38964476

ABSTRACT

BACKGROUND: Diagnosis and severity assessment of tinnitus rely mostly on the patient's descriptions and subjective questionnaires; objective diagnostic and assessment measures are lacking, and accuracy fluctuates with the clarity of the patient's description. This complicates the timely modification of treatment strategies or therapeutic music to improve treatment efficacy. NEW METHOD: We employed a novel random convolutional kernel-based method for electrocardiogram (ECG) signal analysis to identify patients' emotional states during Music Tinnitus Sound Therapy (Music-TST) sessions. We then analyzed correlations between emotional changes in different treatment phases and differences in Tinnitus Handicap Inventory (THI) scores to determine the impact of emotions on treatment efficacy. RESULTS: This study revealed a significant correlation between patients' emotion changes during Music-TST and the therapy's effectiveness. Changes in the arousal and dominance dimensions were strongly linked to THI variations. These findings highlight the substantial impact of emotional responses on sound therapy's efficacy, offering a new perspective for understanding and optimizing tinnitus treatment. COMPARISON WITH EXISTING METHODS: In contrast to existing methods, we propose an objective indicator of the progress of sound therapy; the indicator could also provide feedback to optimize the therapeutic music. CONCLUSIONS: This study revealed the critical role of emotion changes in tinnitus sound therapy. By integrating objective ECG-based emotion analysis with traditional subjective scales such as the THI, we present an innovative approach to assess and potentially optimize therapy effectiveness. This finding could lead to more personalized and effective treatment strategies for tinnitus sound therapy.
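The abstract's ECG analysis relies on random convolutional kernels; the sketch below shows the general idea of such a transform (in the spirit of ROCKET-style features) on a synthetic ECG segment. Kernel counts, lengths, and the pooled statistics are illustrative assumptions, not the authors' exact design.

    # Toy random convolutional kernel features for a 1-D ECG segment (illustrative only).
    import numpy as np

    rng = np.random.default_rng(1)
    ecg = rng.normal(size=5000)                     # stand-in for one ECG segment

    def random_kernel_features(x, n_kernels=100):
        feats = []
        for _ in range(n_kernels):
            w = rng.normal(size=rng.choice([7, 9, 11]))
            w -= w.mean()                           # zero-mean kernel weights
            conv = np.convolve(x, w, mode="valid")
            feats.append(conv.max())                # max response
            feats.append((conv > 0).mean())         # proportion of positive values
        return np.array(feats)

    features = random_kernel_features(ecg)          # input to an emotion classifier
    print(features.shape)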

5.
J Audiol Otol ; 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38946331

ABSTRACT

Background and Objectives: Localization of a sound source in the horizontal plane depends on the listener's interaural comparison of arrival time and level. Hearing loss (HL) can reduce access to these binaural cues, possibly disrupting the localization and memory of spatial information. Thus, this study aimed to investigate horizontal sound localization performance and spatial short-term memory in listeners with actual and simulated HL. Subjects and Methods: Seventeen listeners with bilateral symmetric HL and 17 listeners with normal hearing (NH) participated in the study. The hearing thresholds of NH listeners were elevated with a spectrally shaped masking noise to simulate unilateral hearing loss (UHL) and bilateral hearing loss (BHL). Localization accuracy and errors, as well as spatial short-term memory span, were measured in the free field using a set of 11 loudspeakers arrayed over a 150° arc. Results: Localization abilities and spatial short-term memory span did not significantly differ between actual BHL listeners and BHL-simulated NH listeners. Overall, localization performance with the UHL simulation was approximately twofold worse than with the BHL simulation, and the hearing asymmetry had a detrimental effect on spatial memory. The mean localization score as a function of stimulus location in the UHL simulation was less than 30% even for frontal (0° azimuth) stimuli and was much worse on the side closer to the simulated ear. In the UHL simulation, localization responses were biased toward the side of the intact ear even when sounds came from the front. Conclusions: Hearing asymmetry induced by the UHL simulation substantially disrupted localization performance and the recall of spatial positions encoded and stored in memory, presumably because listeners had little opportunity to learn strategies to improve localization. The marked effect of hearing asymmetry on sound localization highlights the need for clinical assessments of spatial hearing in addition to conventional hearing tests.

6.
Clin Exp Dent Res ; 10(4): e917, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38973208

ABSTRACT

OBJECTIVES: To determine the correlation between the primary implant stability quotient and the implant percussion sound frequency. MATERIALS AND METHODS: A total of 14 pig ribs were scanned using a dental cone beam computed tomography (CBCT) scanner to classify the bone specimens into three distinct bone density categories by Hounsfield unit (HU) value: D1 bone: >1250 HU; D2: 850-1250 HU; D3: <850 HU. Then, 96 implants were inserted: 32 in D1 bone, 32 in D2 bone, and 32 in D3 bone. The primary implant stability quotient (ISQ) was analyzed, and the percussion sound was recorded with a wireless microphone and analyzed using frequency-analysis software. RESULTS: Statistically significant positive correlations were found between the primary ISQ and the bone density HU value (r = 0.719; p < 0.001) and between the primary ISQ and the percussion sound frequency (r = 0.606; p < 0.001). Furthermore, significant differences in primary ISQ values and percussion sound frequency were found between D1 and D2 bone, as well as between D1 and D3 bone. However, no significant differences in primary ISQ values or percussion sound frequency were found between D2 and D3 bone. CONCLUSION: The primary ISQ value and the percussion sound frequency are positively correlated.
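For readers who want to reproduce the kind of statistic reported above, the snippet below computes a Pearson correlation between ISQ and percussion frequency with SciPy; the eight value pairs are invented placeholders, not the study's measurements.

    # Hypothetical example: Pearson correlation between primary ISQ and percussion frequency.
    from scipy.stats import pearsonr

    isq  = [72, 68, 75, 60, 55, 80, 64, 70]                   # placeholder ISQ values
    freq = [2300, 2100, 2450, 1900, 1800, 2600, 2000, 2250]   # placeholder peak frequencies (Hz)

    r, p = pearsonr(isq, freq)
    print(f"r = {r:.3f}, p = {p:.4f}")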


Subject(s)
Bone Density , Cone-Beam Computed Tomography , Dental Implants , Percussion , Animals , Swine , Percussion/instrumentation , Bone Density/physiology , Sound , Ribs/surgery , Dental Implantation, Endosseous/methods , Dental Implantation, Endosseous/instrumentation , Dental Prosthesis Retention
7.
Phys Eng Sci Med ; 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38954378

ABSTRACT

The study presents a novel technique for lung auscultation based on graph theory, emphasizing the potential of graph parameters for distinguishing lung sounds and supporting earlier detection of various respiratory pathologies. The frequency spread and the component magnitudes are revealed by analyzing eighty-five bronchial (BS) and pleural rub (PS) lung sounds using the power spectral density (PSD) plot and the wavelet scalogram. A low frequency spread and persistent high-intensity frequency components are visible in BS sounds, which emanate from the uniform cross-sectional area of the trachea. The frictional rub between the pleurae causes a wider spread of low-intensity, intermittent frequency components in PS signals. From the complex networks constructed for BS and PS, the extracted graph features are graph density, transitivity, degree centrality, betweenness centrality, eigenvector centrality, and graph entropy. High values of some of these measures reflect the strong correlation between distinct segments of the BS signal, which originates from a consistent cross-sectional tracheal diameter and hence generates high-intensity, low-spread frequency components, whereas the intermittent, low-intensity components and relatively greater frequency spread of the PS signal manifest as high values of other measures. With these complex network parameters as input attributes, supervised machine learning techniques (discriminant analyses, support vector machines, k-nearest neighbors, and neural network pattern recognition (PRNN)) classify the signals with more than 90% accuracy, with the PRNN (25 neurons in the hidden layer) achieving the highest accuracy (98.82%).
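A compact sketch of how a lung-sound recording can be mapped to a complex network and the listed graph measures computed with NetworkX; the segmentation, the correlation threshold, and the degree-entropy definition are assumptions made for illustration, not the paper's exact construction.

    # Toy correlation graph from a synthetic lung sound, with the graph measures named above.
    import numpy as np
    import networkx as nx

    rng = np.random.default_rng(2)
    sound = rng.normal(size=8000)                          # stand-in for a BS/PS recording
    segments = sound.reshape(40, 200)                      # 40 nodes, one per segment
    corr = np.corrcoef(segments)

    G = nx.from_numpy_array((np.abs(corr) > 0.1).astype(int))   # assumed edge threshold
    G.remove_edges_from(nx.selfloop_edges(G))

    density      = nx.density(G)
    transitivity = nx.transitivity(G)
    deg_cent     = np.mean(list(nx.degree_centrality(G).values()))
    btw_cent     = np.mean(list(nx.betweenness_centrality(G).values()))
    eig_cent     = np.mean(list(nx.eigenvector_centrality_numpy(G).values()))
    deg = np.array([d for _, d in G.degree()], dtype=float)
    p = deg / deg.sum()
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))        # degree-distribution entropy
    print(density, transitivity, deg_cent, btw_cent, eig_cent, entropy)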

8.
JMIR AI ; 3: e51118, 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-38985504

ABSTRACT

BACKGROUND: Abdominal auscultation (i.e., listening to bowel sounds (BSs)) can be used to analyze digestion. Automated retrieval of BSs would be beneficial for assessing gastrointestinal disorders noninvasively. OBJECTIVE: This study aims to develop a multiscale spotting model to detect BSs in continuous audio data from a wearable monitoring system. METHODS: We designed a spotting model based on the Efficient-U-Net (EffUNet) architecture to analyze 10-second audio segments at a time and spot BSs with a temporal resolution of 25 ms. Evaluation data were collected across different digestive phases from 18 healthy participants and 9 patients with inflammatory bowel disease (IBD). Audio data were recorded in a daytime setting with a smart T-shirt with embedded digital microphones. The data set was annotated by independent raters with substantial agreement (Cohen κ between 0.70 and 0.75), resulting in 136 hours of labeled data. In total, 11,482 BSs were analyzed, with BS durations ranging between 18 ms and 6.3 seconds. The share of BSs in the data set (BS ratio) was 0.0089. We analyzed performance depending on noise level, BS duration, and BS event rate. We also report spotting timing errors. RESULTS: Leave-one-participant-out cross-validation of BS event spotting yielded a median F1-score of 0.73 for both healthy volunteers and patients with IBD. EffUNet detected BSs under different noise conditions with 0.73 recall and 0.72 precision. In particular, for a signal-to-noise ratio over 4 dB, more than 83% of BSs were recognized, with a precision of 0.77 or more. EffUNet recall dropped below 0.60 for BS durations of 1.5 seconds or less. At a BS ratio greater than 0.05, the precision of our model was over 0.83. For both healthy participants and patients with IBD, insertion and deletion timing errors were the largest, totaling 15.54 minutes of insertion errors and 13.08 minutes of deletion errors over the full audio data set. On our data set, EffUNet outperformed existing BS spotting models that provide similar temporal resolution. CONCLUSIONS: The EffUNet spotter is robust against background noise and can retrieve BSs of varying duration. EffUNet outperforms previous BS detection approaches on unmodified audio data containing highly sparse BS events.
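The sketch below illustrates frame-level spotting evaluation at the 25 ms resolution mentioned above: a 10-second window becomes 400 frames, and precision, recall, and F1 are computed against sparse frame labels. The labels and predictions are synthetic; this is not the EffUNet model or its data.

    # Frame-level spotting metrics for one 10-second window (synthetic labels/predictions).
    import numpy as np
    from sklearn.metrics import precision_score, recall_score, f1_score

    rng = np.random.default_rng(3)
    n_frames = int(10 / 0.025)                            # 400 frames of 25 ms
    labels = (rng.random(n_frames) < 0.01).astype(int)    # sparse BS frames (~1%)
    preds = labels.copy()
    flip = rng.random(n_frames) < 0.005                   # inject a few spotting errors
    preds[flip] = 1 - preds[flip]

    print("precision:", precision_score(labels, preds, zero_division=0))
    print("recall:   ", recall_score(labels, preds, zero_division=0))
    print("F1:       ", f1_score(labels, preds, zero_division=0))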

9.
J Pediatr ; : 114185, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38986929
10.
J Exp Biol ; 2024 Jul 11.
Article in English | MEDLINE | ID: mdl-38989535

ABSTRACT

The ability to communicate through vocalization plays a key role in the survival of animals across all vertebrate groups. While avian reptiles have received much attention for their stunning sound repertoire, non-avian reptiles have been wrongfully assumed to have less elaborate vocalization types, and little is known about the biomechanics of their sound production and the underlying neural pathways. We investigated alarm calls of Gekko gecko using audio and cineradiographic recordings. Acoustic analysis revealed three distinct call types: a sinusoidal call type (type 1), a train-like call type characterized by distinct pulse trains (type 3), and an intermediary type showing both sinusoidal and pulse-train components (type 2). Kinematic analysis of the cineradiographic recordings showed that laryngeal movements differ significantly between respiratory and vocal behavior: during respiration, animals repeatedly moved their jaws to partially open their mouths, accompanied by small glottal movements; during vocalization, the glottis was pulled back, contrasting with what has previously been reported. In vitro retrograde tracing of the nerve innervating the laryngeal constrictor and dilator muscles revealed round to fusiform motoneurons in the hindbrain-spinal cord transition ipsilateral to the labeled nerve. Taken together, our observations provide insight into the alarm calls generated by G. gecko, the biomechanics of this sound generation, and the underlying organization of the motoneurons involved. They also suggest that G. gecko may be an excellent non-avian reptile model organism for enhancing our understanding of the evolution of vertebrate vocalization.

11.
Comput Biol Med ; 178: 108698, 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38861896

ABSTRACT

Auscultation is a non-invasive and cost-effective method for the diagnosis of lung diseases, which are among the leading causes of death worldwide. However, its efficacy suffers from the limitations of analog stethoscopes and the subjective nature of human interpretation. To overcome these limitations, computer-based automated algorithms applied to digitized lung sounds have been studied for decades with the aim of accurate diagnosis. This study proposes a novel approach that uses Tunable Q-factor Wavelet Transform (TQWT)-based statistical feature extraction followed by training individual and ensemble learning models for lung disease classification. During the learning stage, various machine learning algorithms are used as individual learners, and hard and soft voting fusion approaches are employed to enhance performance by combining the predictions of the individual models. For an objective evaluation, the study was structured into two main tasks, each investigated in detail through several sub-tasks for comparison with state-of-the-art studies. Among the sub-tasks investigating patient-based classification, the highest accuracy for binary classification (healthy vs. non-healthy) was 97.63%, while accuracies of up to 66.32% were obtained for three-class classification (obstructive-related, restrictive-related, and healthy) and 53.42% for five-class classification (asthma, chronic obstructive pulmonary disease, interstitial lung disease, pulmonary infection, and healthy). In the other sub-task, which investigated sample-based classification, the proposed approach was superior to almost all previous findings. The proposed method underscores the potential of TQWT-based signal decomposition, which leverages the adaptive time-frequency resolution afforded by Q-factor adjustability. The results are very promising, and the proposed approach paves the way for more accurate and automated digital auscultation techniques in clinical settings.
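Since the classification stage rests on combining individual learners through voting, here is a minimal soft-voting sketch with scikit-learn; the synthetic feature matrix stands in for the TQWT-based statistics, and the choice of base learners is an assumption for illustration.

    # Soft-voting ensemble over synthetic stand-ins for TQWT-based statistical features.
    import numpy as np
    from sklearn.ensemble import VotingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(4)
    X = rng.normal(size=(200, 24))        # e.g., 24 statistical features per recording (assumed)
    y = rng.integers(0, 2, size=200)      # healthy vs. non-healthy (synthetic labels)

    soft_vote = VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("rf", RandomForestClassifier(n_estimators=100)),
                    ("svm", SVC(probability=True))],
        voting="soft",
    )
    print(cross_val_score(soft_vote, X, y, cv=5).mean())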

12.
Article in English | MEDLINE | ID: mdl-38862745

ABSTRACT

PURPOSE: Even though workflow analysis in the operating room has come a long way, current systems are still limited to research. In the quest for a robust, universal setup, hardly any attention has been paid to the audio modality despite its numerous advantages, such as low cost, independence from location and line of sight, and modest processing requirements. METHODOLOGY: We present an approach for audio-based event detection that relies solely on two microphones capturing the sound in the operating room. A new data set was created, with over 63 h of audio recorded and annotated at the University Hospital rechts der Isar. Sound files were labeled, preprocessed, augmented, and subsequently converted to log-mel-spectrograms that served as visual input for event classification using pretrained convolutional neural networks. RESULTS: Comparing multiple architectures, we showed that even lightweight models, such as MobileNet, can already provide promising results. Data augmentation additionally improved the classification of 11 defined classes, including, inter alia, different types of coagulation, operating table movements, and an idle class. With the newly created audio data set, an overall accuracy of 90%, a precision of 91%, and an F1-score of 91% were achieved, demonstrating the feasibility of audio-based event recognition in the operating room. CONCLUSION: With this first proof of concept, we demonstrated that audio events can serve as a meaningful source of information beyond spoken language and can easily be integrated into future workflow recognition pipelines using computationally inexpensive architectures.
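The pipeline above hinges on turning audio into log-mel-spectrogram images for a pretrained CNN; the snippet sketches that conversion with librosa on a synthetic tone. The sample rate, FFT size, hop length, and mel-band count are illustrative assumptions rather than the authors' settings.

    # Log-mel spectrogram from a placeholder clip, the kind of input a CNN would take.
    import numpy as np
    import librosa

    sr = 16000
    t = np.linspace(0, 2.0, 2 * sr, endpoint=False)
    audio = 0.1 * np.sin(2 * np.pi * 440 * t)         # placeholder 2-second clip

    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_fft=1024,
                                         hop_length=256, n_mels=64)
    log_mel = librosa.power_to_db(mel, ref=np.max)    # dB-scaled mel bands
    print(log_mel.shape)                              # (64 mel bands, time frames)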

13.
Cogn Emot ; : 1-14, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38863208

ABSTRACT

The auditory gaze cueing effect (auditory-GCE) is a faster response to auditory targets at an eye-gaze cue location than at a non-cued location. Previous research has found that the auditory-GCE can be influenced by the integration of gaze direction and the emotion conveyed by facial expressions. However, it is unclear whether the emotional information of auditory targets can be cross-modally integrated with gaze direction to affect the auditory-GCE. Here, we set neutral faces with different gaze directions as cues and three emotional sounds (fearful, happy, and neutral) as targets to investigate how the emotion of the sound target modulates the auditory-GCE. Moreover, we conducted a control experiment using arrow cues. The results show that the emotional content of sound targets influences the auditory-GCE, but only when it is induced by facial cues. Specifically, fearful sounds elicit a significantly larger auditory-GCE than happy and neutral sounds, indicating that the emotional content of auditory targets plays a modulating role. Furthermore, this modulation appears to occur only at a higher level of social meaning, involving the integration of a sound's emotional information with social gaze direction, rather than at a lower level involving the integration of direction and auditory emotion.

14.
Acta Paediatr ; 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38884542

ABSTRACT

AIM: This initial Norwegian study aimed to quantify the vibrations and sounds experienced by neonates transported by helicopter in an incubator. METHODS: Two neonatal manikins weighing 500 g and 2000 g were placed in a transport incubator and transported in an Airbus H145 D3 helicopter during standard flight profiles. Vibrations were measured on the mattress inside the incubator, and sound levels were measured inside and outside the incubator. RESULTS: The highest vibration levels were recorded during standard flight profiles with the lighter manikin, ranging from 0.27 to 0.94 m/s², compared with 0.27 to 0.76 m/s² for the heavier manikin. The measurements exceeded the action levels set by the European Union Vibration Directive for adult work environments. Sound levels inside the incubator ranged from 84.6 to 86.3 A-weighted decibels, with a C-weighted peak level of 122 decibels. Sound levels inside the incubator were approximately 10 decibels lower than outside, but amplification was observed inside the incubator at frequencies below 160 Hz. CONCLUSION: Vibrations were highest for the lighter manikin. Sound levels during helicopter transport were higher than recommended for neonatal environments, and sounds were amplified within the incubator at lower frequencies.

15.
J Ultrasound Med ; 2024 Jun 14.
Article in English | MEDLINE | ID: mdl-38873702

ABSTRACT

OBJECTIVES: To develop a robust algorithm for estimating ultrasonic axial transmission velocity in neonatal tibial bone, and to investigate the relationships between ultrasound velocity and neonatal anthropometric measurements as well as clinical biochemical markers of skeletal health. METHODS: This study presents an unsupervised learning approach for the automatic detection of the first arrival time and estimation of ultrasonic velocity from axial transmission waveforms, which potentially indicates bone quality. The proposed method combines the ReliefF algorithm and fuzzy C-means clustering. It was first validated using an in vitro dataset measured from a Sawbones phantom and was subsequently applied to in vivo signals collected from 40 infants (21 males and 19 females). The extracted neonatal ultrasonic velocity was subjected to statistical analysis to explore correlations with the infants' anthropometric features and biochemical indicators. RESULTS: The in vivo data analysis revealed significant correlations between the extracted ultrasonic velocity and the neonatal anthropometric measurements and biochemical markers. The velocity of the first arrival signals showed good associations with body weight (ρ = 0.583, P value <.001), body length (ρ = 0.583, P value <.001), and gestational age (ρ = 0.557, P value <.001). CONCLUSION: These findings suggest that fuzzy C-means clustering is highly effective in extracting the ultrasonic propagation velocity in bone and is reliably applicable to in vivo measurements. This preliminary study holds promise for advancing the development of a standardized ultrasonic tool for assessing neonatal bone health; such advancements are crucial for the accurate diagnosis of bone growth disorders.
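To make the clustering step concrete, below is a self-contained fuzzy C-means toy that separates low-amplitude "noise" samples from higher-amplitude "signal" samples in a synthetic waveform and reads off a first-arrival index; the waveform, the amplitude feature, and the 0.5 membership cutoff are illustrative assumptions, and the ReliefF feature-selection step is omitted.

    # Minimal fuzzy C-means on a synthetic axial-transmission-like waveform.
    import numpy as np

    rng = np.random.default_rng(5)
    wave = np.concatenate([rng.normal(0, 0.05, 300),   # pre-arrival noise
                           rng.normal(0, 1.0, 200)])   # signal (arrival near sample 300)
    X = np.abs(wave).reshape(-1, 1)                    # simple amplitude feature

    def fuzzy_cmeans(X, c=2, m=2.0, iters=100):
        u = rng.random((len(X), c))
        u /= u.sum(axis=1, keepdims=True)
        for _ in range(iters):
            um = u ** m
            centers = (um.T @ X) / um.sum(axis=0)[:, None]
            dist = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-12
            u = 1.0 / dist ** (2 / (m - 1))
            u /= u.sum(axis=1, keepdims=True)
        return centers, u

    centers, u = fuzzy_cmeans(X)
    sig = int(np.argmax(centers[:, 0]))                # cluster with the larger amplitude
    first_arrival = int(np.argmax(u[:, sig] > 0.5))    # first sample assigned to "signal"
    print(first_arrival)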

16.
Sci Prog ; 107(2): 368504241262195, 2024.
Article in English | MEDLINE | ID: mdl-38872447

ABSTRACT

A vestibular schwannoma is a benign tumor; however, the schwannoma itself and interventions can cause sensorineural hearing loss. Most vestibular schwannomas are unilateral tumors that affect hearing only on one side. Attention has focused on improving the quality of life for patients with unilateral hearing loss and therapeutic interventions to address this issue have been emphasized. Herein, we encountered a patient who was a candidate for hearing preservation surgery based on preoperative findings and had nonserviceable hearing after the surgery, according to the Gardner-Robertson classification. Postoperatively, the patient had decreased listening comprehension and ability to localize sound sources. He was fitted with bilateral hearing aids, and his ability to localize sound sources improved. Although the patient had postoperative nonserviceable hearing on the affected side and age-related hearing loss on the unaffected side, hearing aids in both ears were useful for his daily life. Therefore, the patient was able to maintain a binaural hearing effect and the ability to localize the sound source improved. This report emphasizes the importance of hearing preservation with vestibular schwannomas, and the demand for hearing loss rehabilitation as a postoperative complication can increase, even if hearing loss is nonserviceable.


Subject(s)
Hearing Aids , Neuroma, Acoustic , Humans , Neuroma, Acoustic/surgery , Male , Middle Aged , Hearing Loss, Sensorineural/surgery , Hearing Loss, Sensorineural/rehabilitation , Hearing Loss, Sensorineural/etiology , Quality of Life , Hearing Loss/etiology , Hearing Loss/surgery , Hearing Loss/rehabilitation , Postoperative Complications/etiology
17.
JMIR Biomed Eng ; 9: e56246, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38875677

ABSTRACT

BACKGROUND: Vocal biomarkers, derived from acoustic analysis of vocal characteristics, offer noninvasive avenues for medical screening, diagnostics, and monitoring. Previous research demonstrated the feasibility of predicting type 2 diabetes mellitus through acoustic analysis of smartphone-recorded speech. Building upon this work, this study explores the impact of audio data compression on acoustic vocal biomarker development, which is critical for broader applicability in health care. OBJECTIVE: The objective of this research is to analyze how common audio compression algorithms (MP3, M4A, and WMA) applied by 3 different conversion tools at 2 bitrates affect features crucial for vocal biomarker detection. METHODS: The impact of audio data compression on acoustic vocal biomarker development was investigated using uncompressed voice samples converted into MP3, M4A, and WMA formats at 2 bitrates (320 and 128 kbps) with MediaHuman (MH) Audio Converter, WonderShare (WS) UniConverter, and Fast Forward Moving Picture Experts Group (FFmpeg). The data set comprised recordings from 505 participants, totaling 17,298 audio files, collected using a smartphone. Participants recorded a fixed English sentence up to 6 times daily for up to 14 days. Feature extraction, including pitch, jitter, intensity, and Mel-frequency cepstral coefficients (MFCCs), was conducted using Python and Parselmouth. The Wilcoxon signed rank test and the Bonferroni correction for multiple comparisons were used for statistical analysis. RESULTS: In this study, 36,970 audio files were initially recorded from 505 participants, with 17,298 recordings meeting the fixed sentence criteria after screening. Differences between the audio conversion software, MH, WS, and FFmpeg, were notable, impacting compression outcomes such as constant or variable bitrates. Analysis encompassed diverse data compression formats and a wide array of voice features and MFCCs. Wilcoxon signed rank tests yielded P values, with those below the Bonferroni-corrected significance level indicating significant alterations due to compression. The results indicated feature-specific impacts of compression across formats and bitrates. MH-converted files exhibited greater resilience compared to WS-converted files. Bitrate also influenced feature stability, with 38 cases affected uniquely by a single bitrate. Notably, voice features showed greater stability than MFCCs across conversion methods. CONCLUSIONS: Compression effects were found to be feature specific, with MH and FFmpeg showing greater resilience. Some features were consistently affected, emphasizing the importance of understanding feature resilience for diagnostic applications. Considering the implementation of vocal biomarkers in health care, finding features that remain consistent through compression for data storage or transmission purposes is valuable. Focused on specific features and formats, future research could broaden the scope to include diverse features, real-time compression algorithms, and various recording methods. This study enhances our understanding of audio compression's influence on voice features and MFCCs, providing insights for developing applications across fields. The research underscores the significance of feature stability in working with compressed audio data, laying a foundation for informed voice data use in evolving technological landscapes.
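The statistical core of the analysis, paired Wilcoxon signed rank tests with a Bonferroni correction across features, can be sketched as below; the feature matrix and the compression perturbation are synthetic placeholders rather than the Parselmouth-extracted features from the study.

    # Paired Wilcoxon tests per feature, Bonferroni-corrected (synthetic data).
    import numpy as np
    from scipy.stats import wilcoxon

    rng = np.random.default_rng(6)
    n_features, n_recordings = 40, 100                # assumed counts for illustration
    alpha = 0.05 / n_features                         # Bonferroni-corrected threshold

    uncompressed = rng.normal(size=(n_features, n_recordings))
    compressed = uncompressed + rng.normal(0, 0.02, uncompressed.shape)

    affected = [i for i in range(n_features)
                if wilcoxon(uncompressed[i], compressed[i]).pvalue < alpha]
    print(f"{len(affected)} of {n_features} features significantly altered")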

18.
JMIR Biomed Eng ; 9: e56245, 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38875685

ABSTRACT

BACKGROUND: The digital era has witnessed an escalating dependence on digital platforms for news and information, coupled with the advent of "deepfake" technology. Deepfakes, leveraging deep learning models on extensive data sets of voice recordings and images, pose substantial threats to media authenticity, potentially leading to unethical misuse such as impersonation and the dissemination of false information. OBJECTIVE: To counteract this challenge, this study aims to introduce the concept of innate biological processes to discern between authentic human voices and cloned voices. We propose that the presence or absence of certain perceptual features, such as pauses in speech, can effectively distinguish between cloned and authentic audio. METHODS: A total of 49 adult participants representing diverse ethnic backgrounds and accents were recruited. Each participant contributed voice samples for the training of up to 3 distinct voice cloning text-to-speech models and 3 control paragraphs. Subsequently, the cloning models generated synthetic versions of the control paragraphs, resulting in a data set consisting of up to 9 cloned audio samples and 3 control samples per participant. We analyzed the speech pauses caused by biological actions such as respiration, swallowing, and cognitive processes. Five audio features corresponding to speech pause profiles were calculated. Differences between authentic and cloned audio for these features were assessed, and 5 classical machine learning algorithms were implemented using these features to create a prediction model. The generalization capability of the optimal model was evaluated through testing on unseen data, incorporating a model-naive generator, a model-naive paragraph, and model-naive participants. RESULTS: Cloned audio exhibited significantly increased time between pauses (P<.001), decreased variation in speech segment length (P=.003), increased overall proportion of time speaking (P=.04), and decreased rates of micro- and macropauses in speech (both P=.01). Five machine learning models were implemented using these features, with the AdaBoost model demonstrating the highest performance, achieving a 5-fold cross-validation balanced accuracy of 0.81 (SD 0.05). Other models included support vector machine (balanced accuracy 0.79, SD 0.03), random forest (balanced accuracy 0.78, SD 0.04), logistic regression, and decision tree (balanced accuracies 0.76, SD 0.10 and 0.72, SD 0.06). When evaluating the optimal AdaBoost model, it achieved an overall test accuracy of 0.79 when predicting unseen data. CONCLUSIONS: The incorporation of perceptual, biological features into machine learning models demonstrates promising results in distinguishing between authentic human voices and cloned audio.
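A rough illustration of the pause-profile idea: compute a few pause features from an amplitude envelope and feed them to an AdaBoost classifier. The envelope, the speaking threshold, the exact feature definitions, and the labels are all invented for the sketch and are not the study's protocol.

    # Pause-profile features from a synthetic envelope, classified with AdaBoost.
    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(7)

    def pause_features(envelope, thresh=0.1):
        speaking = envelope > thresh
        changes = np.flatnonzero(np.diff(speaking.astype(int)))
        seg_lengths = np.diff(np.r_[0, changes, len(speaking)])
        return [speaking.mean(),        # proportion of time speaking
                seg_lengths.std(),      # variation in speech/pause segment length
                len(seg_lengths)]       # number of segments (pause-rate proxy)

    X = np.array([pause_features(np.abs(rng.normal(size=2000))) for _ in range(60)])
    y = rng.integers(0, 2, size=60)     # 0 = authentic, 1 = cloned (synthetic labels)

    clf = AdaBoostClassifier(n_estimators=100)
    print(cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy").mean())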

19.
JMIR Res Protoc ; 13: e54030, 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38935945

ABSTRACT

BACKGROUND: Sound therapy methods have seen a surge in popularity, with a predominant focus on music among all types of sound stimulation. There is substantial evidence documenting the integrative impact of music therapy on psycho-emotional and physiological outcomes, rendering it beneficial for addressing stress-related conditions such as pain syndromes, depression, and anxiety. Despite these advancements, the therapeutic aspects of sound, as well as the mechanisms underlying its efficacy, remain incompletely understood. Existing research on music as a holistic cultural phenomenon often overlooks crucial aspects of sound therapy mechanisms, particularly those related to speech acoustics or the so-called "music of speech." OBJECTIVE: This study aims to provide an overview of empirical research on sound interventions to elucidate the mechanism underlying their positive effects. Specifically, we will focus on identifying therapeutic factors and mechanisms of change associated with sound interventions. Our analysis will compare the most prevalent types of sound interventions reported in clinical studies and experiments. Moreover, we will explore the therapeutic effects of sound beyond music, encompassing natural human speech and intermediate forms such as traditional poetry performances. METHODS: This review adheres to the methodological guidance of the Joanna Briggs Institute and follows the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) checklist for reporting review studies, which is adapted from the Arksey and O'Malley framework. Our search strategy encompasses PubMed, Web of Science, Scopus, and PsycINFO or EBSCOhost, covering literature from 1990 to the present. Among the different study types, randomized controlled trials, clinical trials, laboratory experiments, and field experiments were included. RESULTS: Data collection began in October 2022. We found a total of 2027 items. Our initial search uncovered an asymmetry in the distribution of studies, with a larger number focused on music therapy compared with those exploring prosody in spoken interventions such as guided meditation or hypnosis. We extracted and selected papers using Rayyan software (Rayyan) and identified 41 eligible papers after title and abstract screening. The completion of the scoping review is anticipated by October 2024, with key steps comprising the analysis of findings by May 2024, drafting and revising the study by July 2024, and submitting the paper for publication in October 2024. CONCLUSIONS: In the next step, we will conduct a quality evaluation of the papers and then chart and group the therapeutic factors extracted from them. This process aims to unveil conceptual gaps in existing studies. Gray literature sources, such as Google Scholar, ClinicalTrials.gov, nonindexed conferences, and reference list searches of retrieved studies, will be added to our search strategy to increase the number of relevant papers that we cover. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): DERR1-10.2196/54030.


Subject(s)
Music Therapy , Stress, Psychological , Humans , Stress, Psychological/therapy , Music Therapy/methods , Adult
20.
Pediatr Cardiol ; 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38937337

ABSTRACT

Research has shown that X-rays and fundus images can be used to classify gender, age group, and race, raising concerns about bias and fairness in medical AI applications. However, the potential for physiological sounds to classify sociodemographic traits has not been investigated. Exploring this gap is crucial for understanding the implications and ensuring fairness in the field of medical sound analysis. We aimed to develop classifiers to determine gender (men/women) from heart sound recordings using machine learning (ML). This was a data-driven ML analysis of the open-access CirCor DigiScope Phonocardiogram Dataset, obtained from cardiac screening programs in Brazil among volunteers under 21 years of age. Each participant completed a questionnaire and underwent a clinical examination, including electronic auscultation at four cardiac points: aortic (AV), mitral (MV), pulmonary (PV), and tricuspid (TV). We used Mel-frequency cepstral coefficients (MFCCs) to develop the ML classifiers; from each auscultation sound recording, we extracted 10 MFCCs, and in sensitivity analyses we additionally extracted 20, 30, 40, and 50 MFCCs. The most effective gender classifier was developed using PV recordings (AUC ROC = 70.3%), followed by MV recordings (AUC ROC = 58.8%); AV and TV recordings produced classifiers with AUC ROC values of 56.4% and 56.1%, respectively. Using more MFCCs did not substantially improve the classifiers. It is possible to classify males and females using phonocardiogram data. As health-related audio recordings become more prominent in ML applications, research is needed to explore whether these recordings contain signals that could distinguish sociodemographic features.
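A minimal sketch of the MFCC-plus-classifier setup described above: 10 averaged MFCCs per recording and a cross-validated AUC ROC. The heart-sound recordings, the sampling rate, the choice of logistic regression, and the labels are stand-in assumptions, not the CirCor data or the authors' exact model.

    # 10 averaged MFCCs per synthetic recording, scored with AUC ROC.
    import numpy as np
    import librosa
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(8)
    sr = 4000                                          # assumed phonocardiogram sample rate

    def mfcc_features(recording, sr, n_mfcc=10):
        mfcc = librosa.feature.mfcc(y=recording, sr=sr, n_mfcc=n_mfcc)
        return mfcc.mean(axis=1)                       # one 10-dimensional vector per recording

    X = np.array([mfcc_features(rng.normal(size=5 * sr), sr) for _ in range(80)])
    y = rng.integers(0, 2, size=80)                    # 0 = male, 1 = female (synthetic)

    print(cross_val_score(LogisticRegression(max_iter=1000), X, y,
                          cv=5, scoring="roc_auc").mean())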
