Results 1 - 12 of 12
1.
IEEE J Biomed Health Inform ; 26(7): 2941-2950, 2022 07.
Article in English | MEDLINE | ID: mdl-35213321

ABSTRACT

Obstructive sleep apnea (OSA) is a chronic and prevalent condition with well-established comorbidities. However, many severe cases remain undiagnosed due to poor access to polysomnography (PSG), the gold standard for OSA diagnosis. Accurate home-based methods to screen for OSA are needed, which can be applied inexpensively to high-risk subjects to identify those who require PSG to fully assess their condition. A number of methods that analyse speech or breathing sounds to screen for OSA have been previously investigated. However, these methods have constraints that limit their use in home environments (e.g., they require specialised equipment, are not robust to background noise, are obtrusive or depend on tightly controlled conditions). This paper proposes a novel method to screen for OSA, which analyses sleep breathing sounds recorded with a smartphone at home. Audio recordings made over a whole night are divided into segments, each of which is classified for the presence or absence of OSA by a deep neural network. The apnea-hypopnea index estimated from the segments predicted as containing evidence of OSA is then used to screen for the condition. Audio recordings made during home sleep apnea testing from 103 participants for 1 or 2 nights were used to develop and evaluate the proposed system. When screening for moderate OSA, the acoustics-based system achieved a sensitivity of 0.79 and a specificity of 0.80. The sensitivity and specificity when screening for severe OSA were 0.78 and 0.93, respectively. The system is suitable for implementation on consumer smartphones.
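The screening step described in this abstract can be sketched as follows: per-segment predictions from the classifier are converted into an estimated apnea-hypopnea index (AHI, events per hour) and compared against a severity threshold. A minimal sketch, assuming illustrative values throughout: the 30-second segment length, the one-event-per-positive-segment mapping, and the 15 events/h moderate-OSA threshold are hypothetical parameters, not the paper's exact ones.

```python
def estimate_ahi(segment_predictions, segment_seconds=30.0, events_per_positive=1.0):
    """Estimate AHI from binary per-segment predictions (1 = OSA evidence).

    Assumes each positive segment contributes a fixed number of events;
    the real system's mapping from segments to events is not specified here.
    """
    hours = len(segment_predictions) * segment_seconds / 3600.0
    if hours == 0:
        return 0.0
    events = sum(segment_predictions) * events_per_positive
    return events / hours

def screen(ahi, threshold=15.0):
    """Screen positive if the estimated AHI meets the severity threshold
    (15 events/h is a commonly used cut-off for moderate OSA)."""
    return ahi >= threshold

# Example: an 8-hour night split into 960 thirty-second segments,
# 150 of which the classifier flagged as containing evidence of OSA.
preds = [1] * 150 + [0] * 810
ahi = estimate_ahi(preds)  # 150 events over 8 h = 18.75 events/h
```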


Subject(s)
Respiratory Sounds , Sleep Apnea, Obstructive , Acoustics , Home Environment , Humans , Neural Networks, Computer , Sleep Apnea, Obstructive/diagnosis
2.
Cochlear Implants Int ; 20(5): 255-265, 2019 09.
Article in English | MEDLINE | ID: mdl-31234737

ABSTRACT

Objectives: Training software to facilitate participation in conversations where overlapping talk is common was to be developed with the involvement of Cochlear implant (CI) users. Methods: Examples of common types of overlap were extracted from a recorded corpus of 3.5 hours of British English conversation. In eight meetings, an expert panel of five CI users tried out ideas for a computer-based training programme addressing difficulties in turn-taking. Results: Based on feedback from the panel, a training programme was devised. The first module consists of introductory videos. The three remaining modules, implemented in interactive software, focus on non-overlapped turn-taking, competitive overlaps and accidental overlaps. Discussion: The development process is considered in light of feedback from panel members and from an end-of-project dissemination event. Benefits, limitations and challenges of the present approach to user involvement and to the design of self-administered communication training programmes are discussed. Conclusion: The project was characterized by two innovative features: the involvement of service users not only at its outset and conclusion but throughout its course; and the exclusive use of naturally occurring conversational speech in the training programme. While both present practical challenges, the project has demonstrated the potential for ecologically valid speech rehabilitation training.


Subject(s)
Cochlear Implantation/rehabilitation , Cochlear Implants , Correction of Hearing Impairment/methods , Deafness/rehabilitation , Speech Therapy/methods , Communication , Deafness/psychology , Humans , Language , Program Evaluation , Software
3.
J Acoust Soc Am ; 143(6): EL523, 2018 06.
Article in English | MEDLINE | ID: mdl-29960497

ABSTRACT

This paper presents a bi-view (front and side) audiovisual Lombard speech corpus, which is freely available for download. It contains 5400 utterances (2700 Lombard and 2700 plain reference utterances), produced by 54 talkers, with each utterance in the dataset following the same sentence format as the audiovisual "Grid" corpus [Cooke, Barker, Cunningham, and Shao (2006). J. Acoust. Soc. Am. 120(5), 2421-2424]. Analysis of this dataset confirms previous research, showing prominent acoustic, phonetic, and articulatory speech modifications in Lombard speech. In addition, gender differences are observed in the size of the Lombard effect. Specifically, female talkers exhibit a greater increase in estimated vowel duration and a greater reduction in F2 frequency.


Subject(s)
Adaptation, Psychological , Noise/adverse effects , Speech Acoustics , Speech Perception , Visual Perception , Voice Quality , Acoustics , Adolescent , Adult , Female , Humans , Male , Phonetics , Sex Factors , Speech Production Measurement , Video Recording , Young Adult
4.
J Acoust Soc Am ; 139(2): 904-17, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26936571

ABSTRACT

Visual displays in passive sonar based on the Fourier spectrogram are underpinned by detection models that rely on signal and noise power statistics. Time-frequency representations specialised for sparse signals achieve a sharper signal representation, either by reassigning signal energy based on temporal structure or by conveying temporal structure directly. However, temporal representations involve nonlinear transformations that make it difficult to reason about how they respond to additive noise. This article analyses the effect of noise on temporal fine structure measurements such as zero crossings and instantaneous frequency. Detectors that rely on zero crossing intervals, intervals and peak amplitudes, and instantaneous frequency measurements are developed, and evaluated for the detection of a sinusoid in Gaussian noise, using the power detector as a baseline. Detectors that rely on fine structure outperform the power detector under certain circumstances; and detectors that rely on both fine structure and power measurements are superior. Reassigned spectrograms assume that the statistics used to reassign energy are reliable, but the derivation of the fine structure detectors indicates the opposite. The article closes by proposing and demonstrating the concept of a doubly reassigned spectrogram, wherein temporal measurements are reassigned according to a statistical model of the noise background.
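The temporal fine structure measurements this abstract builds its detectors on can be illustrated with a toy statistic: the intervals between successive positive-going zero crossings are nearly constant for a sinusoid but highly variable for broadband noise, so their coefficient of variation separates the two. A minimal sketch under that assumption; it is not the paper's detector, whose derivation and evaluation against a power-detector baseline are far more involved.

```python
import math
import random

def zero_crossing_intervals(x):
    """Sample counts between successive positive-going zero crossings."""
    crossings = [i for i in range(1, len(x)) if x[i - 1] < 0 <= x[i]]
    return [b - a for a, b in zip(crossings, crossings[1:])]

def interval_statistic(x):
    """Coefficient of variation of the crossing intervals: near zero for
    a clean tone, large for Gaussian noise."""
    iv = zero_crossing_intervals(x)
    if len(iv) < 2:
        return float("inf")
    m = sum(iv) / len(iv)
    var = sum((i - m) ** 2 for i in iv) / len(iv)
    return math.sqrt(var) / m

# A sinusoid crosses zero at regular intervals; noise does not.
tone = [math.sin(2 * math.pi * 0.05 * n) for n in range(400)]
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(400)]
```

A detector built on this statistic would declare a tone present when the coefficient of variation falls below a threshold; combining it with the segment's power, as the abstract describes, uses both fine structure and energy.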

5.
J Acoust Soc Am ; 136(6): 3072, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25480056

ABSTRACT

Mounting evidence suggests that listeners perceptually compensate for the adverse effects of reverberation in rooms when listening to speech monaurally. However, it is not clear whether the underlying perceptual mechanism would be at all effective in the high levels of stimulus uncertainty that are present in everyday listening. Three experiments investigated monaural compensation with a consonant identification task in which listeners heard different speech on each trial. Consonant confusions frequently arose when a greater degree of reverberation was added to a test-word than to its surrounding context, but compensation became apparent in conditions where the context reverberation was increased to match that of the test-word; here, the confusions were largely resolved. A second experiment showed that information from the test-word itself can also effect compensation. Finally, the time course of compensation was examined by applying reverberation to a portion of the preceding context; consonant identification improved as this portion increased in duration. These findings indicate a monaural compensation mechanism that is likely to be effective in everyday listening, allowing listeners to recalibrate as their reverberant environment changes.


Subject(s)
Perceptual Distortion , Perceptual Masking , Phonetics , Speech Acoustics , Speech Perception , Adult , Female , Humans , Male , Sound Spectrography
6.
J Acoust Soc Am ; 134(3): EL282-8, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23968061

ABSTRACT

Different methods of extracting speech features from an auditory model were systematically investigated in terms of their robustness to different noises. The methods either computed the average firing rate within frequency channels (spectral features) or inter-spike-intervals (timing features) from the simulated auditory nerve response. When used as the front-end for an automatic speech recognizer, timing features outperformed spectral features in Gaussian noise. However, this advantage was lost in babble, because timing features extracted the spectro-temporal structure of babble noise, which is similar to the target speaker. This suggests that different feature extraction methods are optimal depending on the background noise.
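The contrast between the two feature families in this abstract can be sketched from a simulated spike train: spectral features reduce a channel to its average firing rate, while timing features keep the distribution of inter-spike intervals. A minimal sketch with hypothetical spike times and bin settings; the actual auditory-model front end is much richer.

```python
def rate_feature(spike_times, duration):
    """Spectral-style feature: average firing rate in one channel
    (spikes per unit time over the analysis window)."""
    return len(spike_times) / duration

def isi_histogram(spike_times, bin_width, n_bins):
    """Timing-style feature: histogram of inter-spike intervals, which
    preserves temporal structure that the mean rate discards."""
    hist = [0] * n_bins
    for a, b in zip(spike_times, spike_times[1:]):
        k = int((b - a) / bin_width)
        if k < n_bins:
            hist[k] += 1
    return hist

# Example: a channel firing regularly every 10 ms over a 1000 ms window.
spikes = list(range(0, 1000, 10))
rate = rate_feature(spikes, 1000)      # 0.1 spikes per ms
isi = isi_histogram(spikes, 2, 20)     # all 99 intervals land in one bin
```

The abstract's finding follows from this distinction: in babble, the interfering spike timing resembles the target speaker's, so the histogram no longer separates them, whereas Gaussian noise leaves the timing structure distinctive.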


Subject(s)
Cochlear Nerve/physiology , Models, Neurological , Noise , Speech Acoustics , Speech Production Measurement , Computer Simulation , Fourier Analysis , Humans , Pattern Recognition, Automated , Signal-To-Noise Ratio , Sound Spectrography , Speech Production Measurement/methods , Speech Recognition Software , Time Factors
7.
Adv Exp Med Biol ; 787: 11-9; discussion 19-20, 2013.
Article in English | MEDLINE | ID: mdl-23716204

ABSTRACT

Computer models of the auditory periphery provide a tool for formulating theories concerning the relationship between the physiology of the auditory system and the perception of sounds both in normal and impaired hearing. However, the time-consuming nature of their construction constitutes a major impediment to their use, and it is important that transparent models be available on an 'off-the-shelf' basis to researchers. The MATLAB Auditory Periphery (MAP) model aims to meet these requirements and be freely available. The model can be used to simulate simple psychophysical tasks such as absolute threshold, pitch matching and forward masking and those used to measure compression and frequency selectivity. It can be used as a front end to automatic speech recognisers for the study of speech in quiet and in noise. The model can also simulate theories of hearing impairment and be used to make predictions about the efficacy of hearing aids. The use of the software will be described along with illustrations of its application in the study of the psychology of hearing.


Subject(s)
Auditory Pathways/physiology , Auditory Perception/physiology , Computer Simulation , Hearing/physiology , Models, Biological , Communication Aids for Disabled , Humans , Psychophysics/methods
8.
J Acoust Soc Am ; 132(3): 1535-41, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22978882

ABSTRACT

The potential contribution of the peripheral auditory efferent system to our understanding of speech in a background of competing noise was studied using a computer model of the auditory periphery and assessed using an automatic speech recognition system. A previous study had shown that a fixed efferent attenuation applied to all channels of a multi-channel model could improve the recognition of connected digit triplets in noise [G. J. Brown, R. T. Ferry, and R. Meddis, J. Acoust. Soc. Am. 127, 943-954 (2010)]. In the current study an anatomically justified feedback loop was used to automatically regulate separate attenuation values for each auditory channel. This arrangement resulted in a further enhancement of speech recognition over fixed-attenuation conditions. Comparisons between multi-talker babble and pink noise interference conditions suggest that the benefit originates from the model's ability to modify the amount of suppression in each channel separately according to the spectral shape of the interfering sounds.
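The per-channel regulation described in this abstract can be sketched as a simple level-dependent attenuation: each channel's smoothed level above a reference sets its own suppression, so channels dominated by a loud interferer are attenuated more. A minimal sketch, assuming hypothetical reference, gain, and ceiling values; the paper's feedback loop operates inside a full auditory-periphery model, not on scalar levels.

```python
import math

def efferent_attenuation_db(level, reference=1e-3, gain=0.5, max_db=30.0):
    """Attenuation (dB) for one channel, growing with level above the
    reference and capped at max_db. All three parameters are illustrative."""
    if level <= reference:
        return 0.0
    return min(max_db, gain * 20.0 * math.log10(level / reference))

def apply_efferent(channel_levels):
    """Return per-channel linear gains after efferent-style suppression:
    quiet channels pass unchanged, loud channels are attenuated."""
    return [10.0 ** (-efferent_attenuation_db(lev) / 20.0)
            for lev in channel_levels]

# A quiet channel keeps unity gain; a channel 40 dB above reference
# is suppressed (here by 20 dB, i.e. a linear gain of 0.1).
gains = apply_efferent([1e-4, 1e-1])
```

Because each channel computes its own attenuation, the suppression pattern follows the spectral shape of the interference, which is the property the abstract credits for the improvement over a fixed attenuation.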


Subject(s)
Auditory Pathways/physiology , Feedback, Sensory , Models, Neurological , Neural Inhibition , Noise/adverse effects , Perceptual Masking , Recognition, Psychology , Speech Perception , Comprehension , Computer Simulation , Efferent Pathways/physiology , Female , Humans , Male , Sound Spectrography , Speech Recognition Software
9.
Lang Speech ; 55(Pt 1): 57-76, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22480026

ABSTRACT

In order to explore the influence of context on the phonetic design of talk-in-interaction, we investigated the pitch characteristics of short turns (insertions) that are produced by one speaker between turns from another speaker. We investigated the hypothesis that the speaker of the insertion designs her turn as a pitch match to the prior turn in order to align with the previous speaker's agenda, whereas non-matching displays that the speaker of the insertion is non-aligning, for example to initiate a new action. Data were taken from the AMI meeting corpus, focusing on the spontaneous talk of first-language English participants. Using sequential analysis, 177 insertions were classified as either aligning or non-aligning in accordance with definitions of these terms in the Conversation Analysis literature. The degree of similarity between the pitch contour of the insertion and that of the prior speaker's turn was measured, using a new technique that integrates normalized F0 and intensity information. The results showed that aligning insertions were significantly more similar to the immediately preceding turn, in terms of pitch contour, than were non-aligning insertions. This supports the view that choice of pitch contour is managed locally, rather than by reference to an intonational lexicon.
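The pitch-matching measure this abstract describes can be illustrated as an intensity-weighted distance between z-normalized F0 contours sampled at common time points: normalization removes speaker-specific register and range, and weighting emphasises the louder, more reliable frames. A minimal sketch of that idea; the paper's technique for integrating normalized F0 and intensity is not reproduced here, and all parameter choices below are assumptions.

```python
import math

def znorm(xs):
    """Z-normalize a contour to remove overall register and range."""
    m = sum(xs) / len(xs)
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs)) or 1.0
    return [(x - m) / s for x in xs]

def weighted_contour_distance(f0_a, f0_b, intensity):
    """Intensity-weighted RMS distance between two z-normalized F0
    contours at the same time points; smaller = closer pitch match."""
    a, b = znorm(f0_a), znorm(f0_b)
    wsum = sum(intensity)
    return math.sqrt(sum(w * (x - y) ** 2
                         for x, y, w in zip(a, b, intensity)) / wsum)

# A rising contour matches itself exactly and is far from a falling one.
rising = [100.0, 110.0, 120.0, 130.0]
falling = [130.0, 120.0, 110.0, 100.0]
weights = [1.0, 1.0, 1.0, 1.0]
```

Under the paper's hypothesis, an aligning insertion would yield a small distance to the prior turn's contour and a non-aligning one a large distance.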


Subject(s)
Cues , Phonetics , Pitch Perception , Social Behavior , Speech Acoustics , Speech Perception , Speech , Verbal Behavior , Humans , Interpersonal Relations , Sound Spectrography , Speech Production Measurement , Time Factors , Voice Quality
10.
J Acoust Soc Am ; 127(2): 943-54, 2010 Feb.
Article in English | MEDLINE | ID: mdl-20136217

ABSTRACT

The neural mechanisms underlying the ability of human listeners to recognize speech in the presence of background noise are still imperfectly understood. However, there is mounting evidence that the medial olivocochlear system plays an important role, via efferents that exert a suppressive effect on the response of the basilar membrane. The current paper presents a computer modeling study that investigates the possible role of this activity on speech intelligibility in noise. A model of auditory efferent processing [Ferry, R. T., and Meddis, R. (2007). J. Acoust. Soc. Am. 122, 3519-3526] is used to provide acoustic features for a statistical automatic speech recognition system, thus allowing the effects of efferent activity on speech intelligibility to be quantified. Performance of the "basic" model (without efferent activity) on a connected digit recognition task is good when the speech is uncorrupted by noise but falls when noise is present. However, recognition performance is much improved when efferent activity is applied. Furthermore, optimal performance is obtained when the amount of efferent activity is proportional to the noise level. The results obtained are consistent with the suggestion that efferent suppression causes a "release from adaptation" in the auditory-nerve response to noisy speech, which enhances its intelligibility.


Subject(s)
Auditory Perception/physiology , Models, Neurological , Noise , Speech Perception/physiology , Speech Recognition Software , Speech , Acoustic Stimulation , Animals , Basilar Membrane/physiology , Cats , Cochlear Nucleus/physiology , Computer Simulation , Efferent Pathways/physiology , Humans , Markov Chains , Olivary Nucleus/physiology , Pattern Recognition, Automated , Pattern Recognition, Physiological/physiology , Recognition, Psychology/physiology , Sound Spectrography
11.
IEEE Trans Neural Netw ; 15(5): 1151-63, 2004 Sep.
Article in English | MEDLINE | ID: mdl-15484891

ABSTRACT

The human auditory system is able to separate acoustic mixtures in order to create a perceptual description of each sound source. It has been proposed that this is achieved by an auditory scene analysis (ASA) in which a mixture of sounds is parsed to give a number of perceptual streams, each of which describes a single sound source. It is widely assumed that ASA is a precursor of attentional mechanisms, which select a stream for attentional focus. However, recent studies suggest that attention plays a key role in the formation of auditory streams. Motivated by these findings, this paper presents a conceptual framework for auditory selective attention in which the formation of groups and streams is heavily influenced by conscious and subconscious attention. This framework is implemented as a computational model comprising a network of neural oscillators, which perform stream segregation on the basis of oscillatory correlation. Within the network, attentional interest is modeled as a Gaussian distribution in frequency. This determines the connection weights between oscillators and the attentional process, which is modeled as an attentional leaky integrator (ALI). Acoustic features are held to be the subject of attention if their oscillatory activity coincides temporally with a peak in the ALI activity. The output of the model is an "attentional stream," which encodes the frequency bands in the attentional focus at each epoch. The model successfully simulates a range of psychophysical phenomena.
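Two ingredients of the model in this abstract lend themselves to a small sketch: the Gaussian attentional interest over frequency, and the attentional leaky integrator (ALI) that accumulates weighted activity over time. A minimal sketch under stated assumptions: the channel frequencies, bandwidth, and leak constant are hypothetical, and the oscillator network itself is omitted.

```python
import math

def attentional_weights(channel_freqs, focus_hz, sigma_hz):
    """Gaussian attentional interest across frequency channels: weight
    peaks at the focus frequency and falls off with spectral distance."""
    return [math.exp(-0.5 * ((f - focus_hz) / sigma_hz) ** 2)
            for f in channel_freqs]

def leaky_integrate(inputs, leak=0.9, state=0.0):
    """Leaky integrator in the spirit of the ALI: the state decays by
    `leak` each step while accumulating the incoming (weighted) activity."""
    trace = []
    for x in inputs:
        state = leak * state + x
        trace.append(state)
    return trace

# The channel at the attentional focus receives the largest weight,
# and a single burst of input decays gradually in the integrator.
w = attentional_weights([500.0, 1000.0, 2000.0], 1000.0, 300.0)
trace = leaky_integrate([1.0, 0.0, 0.0])
```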


Subject(s)
Attention/physiology , Auditory Cortex/physiology , Auditory Pathways/physiology , Auditory Perception/physiology , Models, Neurological , Action Potentials/physiology , Animals , Biological Clocks/physiology , Humans , Memory/physiology , Neural Networks, Computer , Neurons/physiology , Normal Distribution , Synapses/physiology , Synaptic Transmission/physiology
12.
J Acoust Soc Am ; 114(4 Pt 1): 2236-52, 2003 Oct.
Article in English | MEDLINE | ID: mdl-14587621

ABSTRACT

At a cocktail party, one can selectively attend to a single voice and filter out all the other acoustical interferences. How to simulate this perceptual ability remains a great challenge. This paper describes a novel, supervised learning approach to speech segregation, in which a target speech signal is separated from interfering sounds using spatial localization cues: interaural time differences (ITD) and interaural intensity differences (IID). Motivated by the auditory masking effect, the notion of an "ideal" time-frequency binary mask is suggested, which selects the target if it is stronger than the interference in a local time-frequency (T-F) unit. It is observed that within a narrow frequency band, modifications to the relative strength of the target source with respect to the interference trigger systematic changes for estimated ITD and IID. For a given spatial configuration, this interaction produces characteristic clustering in the binaural feature space. Consequently, pattern classification is performed in order to estimate ideal binary masks. A systematic evaluation in terms of signal-to-noise ratio as well as automatic speech recognition performance shows that the resulting system produces masks very close to ideal binary ones. A quantitative comparison shows that the model yields significant improvement in performance over an existing approach. Furthermore, under certain conditions the model produces large speech intelligibility improvements with normal listeners.
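The "ideal" time-frequency binary mask at the heart of this abstract has a direct definition: a unit is kept when the target is stronger than the interference there by a local criterion. A minimal sketch over power matrices; the paper estimates this mask from binaural ITD/IID cues via pattern classification, which is not shown here, and the 0 dB criterion is the conventional default rather than a stated parameter.

```python
def ideal_binary_mask(target_power, interference_power, lc_db=0.0):
    """Ideal binary mask over a time-frequency grid: 1 where the target
    exceeds the interference by the local criterion lc_db, else 0."""
    ratio = 10.0 ** (lc_db / 10.0)
    return [[1 if t > n * ratio else 0 for t, n in zip(trow, nrow)]
            for trow, nrow in zip(target_power, interference_power)]

def apply_mask(mixture, mask):
    """Retain mixture energy only in target-dominant T-F units."""
    return [[m * b for m, b in zip(mrow, brow)]
            for mrow, brow in zip(mixture, mask)]

# One time frame, two frequency channels: the target dominates the
# first unit (4 vs 1) and the interference the second (1 vs 2).
mask = ideal_binary_mask([[4.0, 1.0]], [[1.0, 2.0]])
segregated = apply_mask([[5.0, 5.0]], mask)
```

A classifier-estimated mask is then scored by how closely it reproduces this oracle mask, which is the evaluation the abstract reports.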


Subject(s)
Attention , Perceptual Masking , Sound Localization , Speech Perception , Adult , Dichotic Listening Tests , Female , Humans , Male , Mathematical Computing , Sound Spectrography , Speech Acoustics , Speech Intelligibility