1.
IEEE Trans Neural Netw Learn Syst ; 34(11): 9439-9450, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35385390

ABSTRACT

In this article, we propose a novel loss function for training generative adversarial networks (GANs) aiming toward deeper theoretical understanding as well as improved stability and performance for the underlying optimization problem. The new loss function is based on cumulant generating functions (CGFs) giving rise to Cumulant GAN. Relying on a recently derived variational formula, we show that the corresponding optimization problem is equivalent to Rényi divergence minimization, thus offering a (partially) unified perspective of GAN losses: the Rényi family encompasses Kullback-Leibler divergence (KLD), reverse KLD, Hellinger distance, and χ²-divergence. Wasserstein GAN is also a member of cumulant GAN. In terms of stability, we rigorously prove the linear convergence of cumulant GAN to the Nash equilibrium for a linear discriminator, Gaussian distributions, and the standard gradient descent ascent algorithm. Finally, we experimentally demonstrate that image generation is more robust relative to Wasserstein GAN and it is substantially improved in terms of both inception score (IS) and Fréchet inception distance (FID) when both weaker and stronger discriminators are considered.
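As a rough illustration of what a CGF-based loss can look like, the sketch below evaluates an empirical cumulant generating function on discriminator outputs. The objective form, the function names (`cgf`, `cumulant_loss`) and the parameters `beta` and `gamma` are assumptions made for illustration; the abstract does not state the loss explicitly. Under this assumed form, letting both parameters tend to zero recovers a difference of means, which matches the statement that Wasserstein GAN is a member of the family.

```python
import math

def cgf(values, t):
    """Empirical cumulant generating function: log of the sample mean of exp(t * v)."""
    scaled = [t * v for v in values]
    m = max(scaled)  # subtract the maximum before exponentiating, for numerical stability
    return m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))

def cumulant_loss(d_real, d_fake, beta, gamma):
    """Assumed objective: -(1/beta) * CGF_real(-beta) - (1/gamma) * CGF_fake(gamma).

    A parameter equal to zero is replaced by its limit, the plain sample mean,
    so beta = gamma = 0 gives mean(d_real) - mean(d_fake).
    """
    if beta == 0:
        real_term = sum(d_real) / len(d_real)
    else:
        real_term = -cgf(d_real, -beta) / beta
    if gamma == 0:
        fake_term = sum(d_fake) / len(d_fake)
    else:
        fake_term = cgf(d_fake, gamma) / gamma
    return real_term - fake_term

# Hypothetical discriminator outputs on real and generated samples.
d_real = [0.5, 1.0, 1.5]
d_fake = [-0.2, 0.1, 0.4]
wasserstein_like = cumulant_loss(d_real, d_fake, 0, 0)
```

By Jensen's inequality, positive `beta` and `gamma` can only lower the value relative to the difference of means, which is one way to see how the parameters reshape the objective.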

2.
Comput Biol Med ; 100: 132-143, 2018 09 01.
Article in English | MEDLINE | ID: mdl-29990646

ABSTRACT

This study concerns the task of automatic structural heart abnormality risk detection from digital phonocardiogram (PCG) signals, aimed at pediatric heart disease screening applications. Recently, various systems based on convolutional neural networks (CNNs) trained on time-frequency representations of segmental PCG frames have been presented that outperform systems using hand-crafted features. This study focuses on the segmentation and time-frequency representation components of the CNN-based designs. We consider the features most commonly used in state-of-the-art systems (MFCC and Mel-spectrogram) and, as an alternative, a time-frequency representation informed by domain knowledge, namely sub-band envelopes. In tests carried out on two high-quality databases with a large set of possible settings, we show that sub-band envelopes are preferable to the most commonly used features and that period-synchronous windowing is preferable to asynchronous windowing.


Subject(s)
Databases, Factual , Heart Defects, Congenital , Heart Sounds , Neural Networks, Computer , Signal Processing, Computer-Assisted , Heart Defects, Congenital/diagnosis , Heart Defects, Congenital/physiopathology , Humans
3.
Trends Hear ; 22: 2331216518756533, 2018.
Article in English | MEDLINE | ID: mdl-29441834

ABSTRACT

Auditory processing disorder (APD) may be diagnosed when a child has listening difficulties but has normal audiometric thresholds. For adults with normal hearing and with mild-to-moderate hearing impairment, an algorithm called spectral shaping with dynamic range compression (SSDRC) has been shown to increase the intelligibility of speech when background noise is added after the processing. Here, we assessed the effect of such processing using 8 children with APD and 10 age-matched control children. The loudness of the processed and unprocessed sentences was matched using a loudness model. The task was to repeat back sentences produced by a female speaker when presented with either speech-shaped noise (SSN) or a male competing speaker (CS) at two signal-to-background ratios (SBRs). Speech identification was significantly better with SSDRC processing than without, for both groups. The benefit of SSDRC processing was greater for the SSN than for the CS background. For the SSN, scores were similar for the two groups at both SBRs. For the CS, the APD group performed significantly more poorly than the control group. The overall improvement produced by SSDRC processing could be useful for enhancing communication in a classroom where the teacher's voice is broadcast using a wireless system.


Subject(s)
Auditory Perceptual Disorders , Noise , Speech Perception , Adolescent , Auditory Perceptual Disorders/physiopathology , Child , Female , Hearing Tests , Humans , Male , Speech
4.
J Acoust Soc Am ; 141(1): 189, 2017 01.
Article in English | MEDLINE | ID: mdl-28147616

ABSTRACT

Four algorithms designed to enhance the intelligibility of speech when noise is added after processing were evaluated under the constraint that the speech should have the same loudness before and after processing, as determined using a loudness model. The algorithms applied spectral modifications and two of them included dynamic-range compression. On average, the methods with dynamic-range compression required the least level adjustment to equate loudness for the unprocessed and processed speech. Subjects with normal hearing (experiment 1) and mild-to-moderate hearing loss (experiment 2) were tested using unmodified and enhanced speech presented in speech-shaped noise (SSN) and a competing speaker (CS). The results showed (a) the algorithms with dynamic-range compression yielded the largest intelligibility gains in both experiments and for both types of background; (b) the algorithms without dynamic-range compression either yielded benefit only with the SSN or yielded no consistent benefit; (c) speech reception thresholds for unprocessed speech were higher for hearing-impaired than for normal-hearing subjects, by about 2 dB for the SSN and 6 dB for the CS. It is concluded that the enhancement methods incorporating dynamic-range compression can improve intelligibility under the equal-loudness constraint for both normal-hearing and hearing-impaired subjects and for both steady and fluctuating backgrounds.

5.
J Acoust Soc Am ; 140(1): 402, 2016 07.
Article in English | MEDLINE | ID: mdl-27475164

ABSTRACT

A model for the loudness of time-varying sounds [Glasberg and Moore (2002). J. Audio. Eng. Soc. 50, 331-342] was assessed for its ability to predict the loudness of sentences that were processed to either decrease or increase their dynamic fluctuations. In a paired-comparison task, subjects compared the loudness of unprocessed and processed sentences that had been equalized in (1) root-mean-square (RMS) level; (2) the peak long-term loudness predicted by the model; (3) the mean long-term loudness predicted by the model. Method 2 was most effective in equating the loudness of the original and processed sentences.
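Of the three equalization methods, only the first (RMS level) can be sketched without the loudness model itself. The snippet below is a minimal illustration of RMS matching; the function names and sample values are hypothetical, and methods 2 and 3 are not shown because they depend on the model's long-term loudness output.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def match_rms(processed, reference):
    """Scale `processed` by a single gain so that its RMS equals that of `reference`."""
    gain = rms(reference) / rms(processed)
    return [gain * s for s in processed]

# Hypothetical waveforms standing in for an unprocessed and a processed sentence.
reference = [0.5, -0.5, 0.5, -0.5]
processed = [0.1, -0.3, 0.2, -0.1]
equalized = match_rms(processed, reference)
```

A single broadband gain leaves the dynamic fluctuations of the processed signal unchanged, which is consistent with the finding that equal RMS level does not guarantee equal loudness.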


Subject(s)
Audiometry, Speech , Loudness Perception/physiology , Speech Intelligibility , Adult , Aged , Female , Humans , Male , Middle Aged , Models, Biological , Sound , Speech Perception , Time Factors , Young Adult
6.
J Voice ; 26(3): 372-7, 2012 May.
Article in English | MEDLINE | ID: mdl-21839613

ABSTRACT

OBJECTIVES/HYPOTHESIS: The objective was to study the role of the Greek version of the Voice Handicap Index (VHI) in comparison with the Voice Symptom Scale (VoiSS) in terms of measuring voice surgery outcome in patients with benign laryngeal lesions. STUDY DESIGN: Nonrandomized prospective. METHODS: Forty-six patients operated on for benign laryngeal lesions were enrolled in the present study. All patients were assessed according to the European Laryngological Society guidelines. In terms of self-evaluation, patients answered the Greek versions of both VHI and VoiSS, preoperatively and 6 weeks postoperatively, and the results were statistically analyzed. RESULTS: The strongest correlation was observed between the functional subscale of VHI and the impairment subscale of VoiSS, as well as between the emotional subscales of both VHI and VoiSS, pre- and postoperatively. A statistically significant change in subscale and total scores was found. VHI and VoiSS subscales and total scores correlated with the stroboscopic and aerodynamic measurements in a variable manner. Perceptual measurements, as well as shimmer and harmonic-to-noise ratio, showed significant correlation with both VHI and VoiSS subscale and total scores postoperatively. CONCLUSION: VHI and VoiSS are considered useful tools in evaluating voice surgery outcome in the Greek language.


Subject(s)
Disability Evaluation , Language , Otorhinolaryngologic Surgical Procedures , Surveys and Questionnaires , Vocal Cords/surgery , Voice Disorders/surgery , Voice Quality , Chi-Square Distribution , Emotions , Female , Greece , Humans , Male , Middle Aged , Odds Ratio , Otorhinolaryngologic Surgical Procedures/adverse effects , Phonation , Predictive Value of Tests , Prospective Studies , Quality of Life , Recovery of Function , Speech Production Measurement , Stroboscopy , Time Factors , Treatment Outcome , Video Recording , Vocal Cords/physiopathology , Voice Disorders/diagnosis , Voice Disorders/physiopathology , Voice Disorders/psychology
7.
Logoped Phoniatr Vocol ; 36(2): 60-9, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21073260

ABSTRACT

This work presents a novel approach for the automatic detection of pathological voices based on fusing the information extracted by means of mel-frequency cepstral coefficients (MFCC) and features derived from the modulation spectra (MS). The proposed system uses a two-step classification scheme. First, the MFCC and MS features were used to feed two different and independent classifiers; then the outputs of each classifier were used in a second classification stage. To establish the configuration that provides the highest detection accuracy, the fusion of information was carried out employing different classifier combination strategies. The experiments were carried out using two different databases: the one developed by The Massachusetts Eye and Ear Infirmary Voice Laboratory, and a database recorded by the Universidad Politécnica de Madrid. The results show that the combination of MFCC and MS features employing the proposed approach yields an improvement in the detection accuracy, demonstrating that both methods of parameterization are complementary.


Subject(s)
Signal Processing, Computer-Assisted , Speech Production Measurement , Voice Disorders/diagnosis , Voice Quality , Adolescent , Adult , Aged , Algorithms , Automation , Child , Databases as Topic , Female , Fourier Analysis , Humans , Male , Middle Aged , Pattern Recognition, Automated , Phonation , Predictive Value of Tests , Sound Spectrography , Speech Acoustics , Voice Disorders/physiopathology , Young Adult
8.
Article in English | MEDLINE | ID: mdl-19964970

ABSTRACT

In this paper, we consider the use of modulation spectra for voice pathology detection and classification. To reduce the high-dimensional space generated by modulation spectra, we suggest the use of higher-order singular value decomposition (HOSVD), and we propose a feature selection algorithm based on the mutual information between subjective voice quality and computed features. Using an SVM with a radial basis function (RBF) kernel as classifier, we conducted experiments on a database of sustained vowel recordings from healthy and pathological voices. For voice pathology detection, the suggested approach achieved a detection rate of 94.1% and an area under the curve (AUC) score of 97.8%. For voice pathology classification, an average detection rate and AUC of 88.6% and 94.8%, respectively, were achieved in classifying polyp against keratosis leukoplakia, adductor spasmodic dysphonia and vocal nodules.


Subject(s)
Sound Spectrography/instrumentation , Speech Production Measurement/methods , Voice Disorders/physiopathology , Voice Quality , Adult , Algorithms , Area Under Curve , Female , Fourier Analysis , Humans , Male , Middle Aged , Models, Statistical , Pattern Recognition, Automated/methods , Signal Processing, Computer-Assisted , Sound Spectrography/methods , Speech Acoustics , Voice Disorders/diagnosis
9.
Folia Phoniatr Logop ; 61(3): 153-70, 2009.
Article in English | MEDLINE | ID: mdl-19571550

ABSTRACT

In this paper, we investigate the use of jitter estimation over short time intervals (short-term jitter) for voice pathology detection in the case of running or continuous speech. Short-term jitter estimations are provided by the spectral jitter estimator (SJE), which is based on a mathematical description of the jitter phenomenon. The SJE has been shown to be robust against errors in pitch period estimations, which makes it a good candidate for measuring jitter in continuous speech. On two large databases of sustained vowel recordings from healthy and pathological voices, we suggest a threshold for the SJE for pathology detection based on cross-database validation. Applying that to a database of continuous speech (reading text) from normophonic and dysphonic speakers, a second threshold and new features are suggested for monitoring jitter in continuous speech. Detection performance of the suggested thresholds and features was evaluated using receiver operating characteristic curves and their discriminative efficiency between healthy and pathological voices was judged using the area under the curve index. In terms of area under the curve, the suggested features for reading text provide a discrimination score of about 95%, while the second threshold provides a classification rate of 87.8%. Furthermore, estimated short-term jitter values from reading text were found to confirm the studies showing a decrease of jitter with increasing fundamental frequencies, and the more frequent presence of high jitter values in the case of pathological voices as time increases.
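The two evaluation measures mentioned here, the area under the ROC curve and the classification rate at a fixed threshold, can be sketched as follows. This is a generic illustration, not the authors' evaluation code: the function names are hypothetical, the scores are made-up stand-ins for jitter features, and higher scores are assumed to indicate pathology.

```python
def auc(pathological_scores, healthy_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney statistic).

    Each (pathological, healthy) pair counts 1 if the pathological score is higher
    and 0.5 if the two scores tie.
    """
    wins = 0.0
    for p in pathological_scores:
        for h in healthy_scores:
            if p > h:
                wins += 1.0
            elif p == h:
                wins += 0.5
    return wins / (len(pathological_scores) * len(healthy_scores))

def classification_rate(pathological_scores, healthy_scores, threshold):
    """Fraction of speakers labeled correctly when scores above `threshold` are flagged."""
    correct = sum(1 for p in pathological_scores if p > threshold)
    correct += sum(1 for h in healthy_scores if h <= threshold)
    return correct / (len(pathological_scores) + len(healthy_scores))

# Made-up scores for illustration only.
pathological = [0.9, 0.8, 0.4]
healthy = [0.1, 0.5, 0.3]
area = auc(pathological, healthy)
rate = classification_rate(pathological, healthy, threshold=0.45)
```

The AUC summarizes separability over all thresholds, while the classification rate depends on the single threshold chosen, which is why the abstract reports the two figures separately.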


Subject(s)
Speech Production Measurement/methods , Speech , Voice Disorders/diagnosis , Algorithms , Area Under Curve , Databases, Factual , Dysphonia/diagnosis , Humans , Phonetics , ROC Curve , Reading , Speech Acoustics , Time Factors