Results 1 - 6 of 6
1.
PLoS One ; 19(1): e0296347, 2024.
Article in English | MEDLINE | ID: mdl-38166055

ABSTRACT

During their creative process, designers routinely seek the feedback of end users. Yet, the collection of perceptual judgments is costly and time-consuming, since it involves repeated exposure to the designed object under elementary variations. Given the practical limits of working with human subjects, randomized protocols in interactive sound design therefore risk inefficiency, in the sense of collecting mostly uninformative judgments. This risk is all the more severe as the initial search space of design variations is vast. In this paper, we propose heuristics for reducing the design space considered during an interactive optimization process. These heuristics operate by using an approximation model, called a surrogate model, of the perceptual quantity of interest. As an application, we investigate the design of pleasant and detectable electric vehicle sounds using an interactive genetic algorithm. We compare two types of surrogate models for this task, one based on acoustical descriptors gathered from the literature and the other based on behavioral data. We find that using the proposed heuristics to reduce an original design space of 4096 possible settings by a factor of up to 64 cuts the number of iterations needed to reach the same performance by a factor of up to 2. The behavioral approach leads to the best improvement of the explored designs overall, while the acoustical approach requires an appropriate choice of acoustical descriptor to be effective. Our approach accelerates the convergence of interactive design. As such, it is particularly suitable to tasks in which exhaustive search is prohibitively slow or expensive.
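A minimal sketch of the surrogate-based pre-filtering idea this abstract describes, assuming a toy 4096-point design space and a stand-in scoring function (both hypothetical; the paper's actual surrogates are acoustical or behavioral models of perceptual quality):

```python
import numpy as np

def reduce_design_space(designs, surrogate, keep_fraction=1 / 64):
    """Keep only the top-scoring designs according to a surrogate model.

    `designs` is an (n, d) array of parameter settings; `surrogate` maps a
    design to a predicted perceptual score (higher is better).
    """
    scores = np.array([surrogate(x) for x in designs])
    n_keep = max(1, int(len(designs) * keep_fraction))
    top = np.argsort(scores)[::-1][:n_keep]  # indices of the best candidates
    return designs[top]

# Toy example: 4096 designs with 2 parameters; the stand-in surrogate
# favours designs close to the origin.
rng = np.random.default_rng(0)
space = rng.uniform(-1.0, 1.0, size=(4096, 2))
reduced = reduce_design_space(space, lambda x: -np.linalg.norm(x))
print(reduced.shape)  # (64, 2)
```

The interactive genetic algorithm would then run only on the reduced space, so each costly human judgment is spent on candidates the surrogate already deems promising.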


Subject(s)
Acoustics , Heuristics , Humans , Emotions , Sound
2.
Physiol Meas ; 43(9)2022 09 09.
Article in English | MEDLINE | ID: mdl-35688143

ABSTRACT

We describe an automatic classifier of arrhythmias based on 12-lead and reduced-lead electrocardiograms. Our classifier comprises four modules: scattering transform (ST), phase harmonic correlation (PHC), depthwise-separable convolutions (DSC), and a long short-term memory (LSTM) network. It is trained on PhysioNet/Computing in Cardiology Challenge 2021 data. The ST captures short-term temporal ECG modulations while the PHC characterizes the phase dependence of coherent ECG components. Both reduce the sampling rate to a few samples per typical heartbeat. We pass the output of the ST and PHC to a DSC layer, which combines lead responses separately for each ST or PHC coefficient and then combines the resulting values across all coefficients. At a deeper level, two LSTM layers integrate local variations of the input over long time scales. We train the whole system in an end-to-end fashion as a multilabel classification problem with one normal class and 25 arrhythmia classes. Lastly, we use canonical correlation analysis (CCA) for transfer learning from 12-lead ST and PHC representations to reduced-lead ones. After local cross-validation on the public data from the challenge, our team 'BitScattered' achieved the following results: 0.682 ± 0.0095 for 12-lead; 0.666 ± 0.0257 for 6-lead; 0.674 ± 0.0185 for 4-lead; 0.661 ± 0.0098 for 3-lead; and 0.662 ± 0.0151 for 2-lead.
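A plain-numpy sketch of the depthwise-separable convolution stage the abstract describes: a per-channel ("depthwise") filter followed by a 1x1 ("pointwise") mix across channels. The shapes and weights below are illustrative, not the paper's; a deep-learning framework would express the depthwise stage with grouped convolutions instead:

```python
import numpy as np

def depthwise_separable_conv1d(x, depth_kernels, point_weights):
    """Depthwise separable 1-D convolution, as a plain-numpy sketch.

    x: (channels, time); depth_kernels: (channels, k), one filter per channel;
    point_weights: (out_channels, channels), a 1x1 mix across channels.
    """
    channels = x.shape[0]
    # Depthwise stage: each channel convolved with its own kernel ('valid').
    depth = np.stack([np.convolve(x[c], depth_kernels[c], mode="valid")
                      for c in range(channels)])
    # Pointwise stage: linear mix across channels at every time step.
    return point_weights @ depth

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 100))    # e.g. 6 ECG leads, 100 samples
depth = rng.normal(size=(6, 5))  # one length-5 filter per lead
point = rng.normal(size=(8, 6))  # mix 6 leads into 8 feature maps
y = depthwise_separable_conv1d(x, depth, point)
print(y.shape)  # (8, 96)
```

Separating the two stages keeps the parameter count small: channels * k depthwise weights plus out_channels * channels pointwise weights, instead of a full out_channels * channels * k kernel.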


Subject(s)
Electrocardiography , Neural Networks, Computer , Algorithms , Arrhythmias, Cardiac/diagnosis , Electrocardiography/methods , Heart Rate , Humans
3.
J Acoust Soc Am ; 149(6): 4309, 2021 06.
Article in English | MEDLINE | ID: mdl-34241459

ABSTRACT

Machine listening systems for environmental acoustic monitoring face a shortage of expert annotations to be used as training data. To circumvent this issue, the emerging paradigm of self-supervised learning proposes to pre-train audio classifiers on a task whose ground truth is trivially available. Alternatively, training set synthesis consists of annotating a small corpus of acoustic events of interest, which are then automatically mixed at random to form a larger corpus of polyphonic scenes. Prior studies have considered these two paradigms in isolation but rarely in conjunction. Furthermore, the impact of data curation in training set synthesis remains unclear. To fill this gap in research, this article proposes a two-stage approach. In the self-supervised stage, we formulate a pretext task (Audio2Vec skip-gram inpainting) on unlabeled spectrograms from an acoustic sensor network. Then, in the supervised stage, we formulate a downstream task of multilabel urban sound classification on synthetic scenes. We find that training set synthesis benefits overall performance more than self-supervised learning. Interestingly, the geographical origin of the acoustic events in training set synthesis appears to have a decisive impact.
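The training-set-synthesis step can be sketched as follows: isolated, annotated events are scaled to a target signal-to-noise ratio and mixed into a background recording at random offsets, yielding a labeled polyphonic scene. This is a hypothetical minimal version (mono signals, a single global SNR), not the article's pipeline:

```python
import numpy as np

def synthesize_scene(background, events, rng, snr_db=0.0):
    """Mix isolated acoustic events into a background at random offsets.

    Each mono `event` is scaled to the requested SNR relative to the
    background and added at a random start time. Returns the mixture
    and the (onset, offset) sample labels.
    """
    scene = background.astype(float).copy()
    labels = []
    bg_rms = np.sqrt(np.mean(background ** 2))
    for event in events:
        ev_rms = np.sqrt(np.mean(event ** 2))
        gain = bg_rms / ev_rms * 10 ** (snr_db / 20)
        start = rng.integers(0, len(scene) - len(event))
        scene[start:start + len(event)] += gain * event
        labels.append((start, start + len(event)))
    return scene, labels

rng = np.random.default_rng(0)
background = rng.normal(scale=0.1, size=16000)  # 1 s of noise at 16 kHz
events = [np.sin(2 * np.pi * 440 * np.arange(1600) / 16000)] * 3
scene, labels = synthesize_scene(background, events, rng)
print(len(labels))  # 3
```

Because the mixing is automatic, a small curated event corpus can be expanded into an arbitrarily large supervised training set.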


Subject(s)
Acoustics , Sound
4.
Article in English | MEDLINE | ID: mdl-33488686

ABSTRACT

Instrumental playing techniques such as vibratos, glissandos, and trills often denote musical expressivity, in both classical and folk contexts. However, most existing approaches to music similarity retrieval fail to describe timbre beyond the so-called "ordinary" technique, use instrument identity as a proxy for timbre quality, and do not allow for customization to the perceptual idiosyncrasies of a new subject. In this article, we ask 31 human participants to organize 78 isolated notes into a set of timbre clusters. Analyzing their responses suggests that timbre perception operates within a more flexible taxonomy than those provided by instruments or playing techniques alone. In addition, we propose a machine listening model to recover the cluster graph of auditory similarities across instruments, mutes, and techniques. Our model relies on joint time-frequency scattering features to extract spectrotemporal modulations as acoustic features. Furthermore, it minimizes triplet loss in the cluster graph by means of the large-margin nearest neighbor (LMNN) metric learning algorithm. Over a dataset of 9346 isolated notes, we report a state-of-the-art average precision at rank five (AP@5) of 99.0% ± 1%. An ablation study demonstrates that removing either the joint time-frequency scattering transform or the metric learning algorithm noticeably degrades performance.
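The triplet loss this abstract minimizes can be sketched in a few lines: for an anchor note, a positive from the same timbre cluster, and a negative from a different cluster, the loss is zero once the positive sits closer than the negative by at least a margin. The embeddings below are hypothetical two-dimensional points, standing in for the scattering features:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Large-margin triplet loss on embedding vectors (numpy sketch).

    Encourages the anchor to sit closer to the positive (same timbre
    cluster) than to the negative (different cluster) by at least `margin`,
    using squared Euclidean distances.
    """
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])  # same cluster: squared distance 0.01
n = np.array([2.0, 0.0])  # other cluster: squared distance 4.0
print(triplet_loss(a, p, n))  # 0.0 -- already separated beyond the margin
```

LMNN optimizes a linear transform of the features so that such triplet violations vanish for each point's nearest neighbors; the sketch above shows only the per-triplet objective.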

5.
PLoS One ; 14(10): e0214168, 2019.
Article in English | MEDLINE | ID: mdl-31647815

ABSTRACT

Bioacoustic sensors, sometimes known as autonomous recording units (ARUs), can record sounds of wildlife over long periods of time in scalable and minimally invasive ways. Deriving per-species abundance estimates from these sensors requires detection, classification, and quantification of animal vocalizations as individual acoustic events. Yet, variability in ambient noise, both over time and across sensors, hinders the reliability of current automated systems for sound event detection (SED), such as convolutional neural networks (CNN) in the time-frequency domain. In this article, we develop, benchmark, and combine several machine listening techniques to improve the generalizability of SED models across heterogeneous acoustic environments. As a case study, we consider the problem of detecting avian flight calls from a ten-hour recording of nocturnal bird migration, recorded by a network of six ARUs in the presence of heterogeneous background noise. Starting from a CNN yielding state-of-the-art accuracy on this task, we introduce two noise adaptation techniques, respectively integrating short-term (60 ms) and long-term (30 min) context. First, we apply per-channel energy normalization (PCEN) in the time-frequency domain, which applies short-term automatic gain control to every subband in the mel-frequency spectrogram. Second, we replace the last dense layer in the network with a context-adaptive neural network (CA-NN) layer, i.e., an affine layer whose weights are dynamically adapted at prediction time by an auxiliary network taking long-term summary statistics of spectrotemporal features as input. We show that PCEN reduces temporal overfitting across dawn vs. dusk audio clips whereas context adaptation on PCEN-based summary statistics reduces spatial overfitting across sensor locations. Moreover, combining them yields state-of-the-art results that are unmatched by artificial data augmentation alone. We release a pre-trained version of our best performing system under the name of BirdVoxDetect, a ready-to-use detector of avian flight calls in field recordings.
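PCEN admits a compact closed form: a first-order IIR filter tracks the per-band background level, division by that level acts as automatic gain control, and a root compression stage tames the dynamic range. A plain-numpy sketch with commonly cited default parameters (the paper's exact settings may differ):

```python
import numpy as np

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization (PCEN), a plain-numpy sketch.

    E: (bands, frames) mel-spectrogram energies. The smoother
    M[t] = (1 - s) * M[t-1] + s * E[t] tracks the per-band background;
    dividing by M**alpha is short-term automatic gain control, and
    (x + delta)**r - delta**r compresses the dynamic range.
    """
    M = np.empty_like(E)
    M[:, 0] = E[:, 0]
    for t in range(1, E.shape[1]):
        M[:, t] = (1 - s) * M[:, t - 1] + s * E[:, t]
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r

rng = np.random.default_rng(0)
E = rng.uniform(0.0, 1.0, size=(40, 100))  # toy 40-band mel energies
Z = pcen(E)
print(Z.shape)  # (40, 100)
```

An optimized implementation is available as `librosa.pcen`, which vectorizes the smoothing filter; the loop above is kept explicit for readability.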


Subject(s)
Acoustics/instrumentation , Echolocation/physiology , Neural Networks, Computer , Signal Processing, Computer-Assisted/instrumentation , Vocalization, Animal/physiology , Animals , Birds/physiology , Flight, Animal/physiology , Noise , Reproducibility of Results
6.
Physiol Meas ; 40(7): 074001, 2019 07 23.
Article in English | MEDLINE | ID: mdl-31158822

ABSTRACT

OBJECTIVE: Early detection of sleep arousal in polysomnographic (PSG) signals is crucial for monitoring or diagnosing sleep disorders and reducing the risk of further complications, including heart disease and blood pressure fluctuations. APPROACH: In this paper, we present a new automatic detector of non-apnea arousal regions in multichannel PSG recordings. This detector cascades four different modules: a second-order scattering transform (ST) with Morlet wavelets; depthwise-separable convolutional layers; bidirectional long short-term memory (BiLSTM) layers; and dense layers. While the first two are shared across all channels, the latter two operate in a multichannel formulation. Following a deep learning paradigm, the whole architecture is trained in an end-to-end fashion in order to optimize two objectives: the detection of arousal onset and offset, and the classification of the type of arousal. MAIN RESULTS AND SIGNIFICANCE: The novelty of the approach is three-fold: it is the first use of a hybrid ST-BiLSTM network with biomedical signals; it captures frequency information lower (0.1 Hz) than the detection sampling rate (0.5 Hz); and it requires no explicit mechanism to overcome class imbalance in the data. In the follow-up phase of the 2018 PhysioNet/CinC Challenge, the proposed architecture achieved a state-of-the-art area under the precision-recall curve (AUPRC) of 0.50 on the hidden test data, tied for the second-highest official result overall.
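The AUPRC metric reported here can be computed with a short step-wise average-precision routine: rank predictions by decreasing score and average the precision at each cutoff where a true positive appears. The toy labels and scores below are illustrative, not challenge data:

```python
import numpy as np

def average_precision(y_true, y_score):
    """Area under the precision-recall curve (AP), a numpy sketch.

    Sorts predictions by decreasing score and averages precision at the
    cutoffs where recall increases (the true positives). Assumes at least
    one positive label.
    """
    order = np.argsort(y_score)[::-1]
    y = np.asarray(y_true)[order]
    tp = np.cumsum(y)  # true positives retrieved at each cutoff
    precision = tp / np.arange(1, len(y) + 1)
    return float(np.sum(precision[y == 1]) / y.sum())

y_true = [1, 0, 1, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.3, 0.2]
print(round(average_precision(y_true, y_score), 3))  # 0.806
```

For imbalanced detection problems like arousal scoring, AUPRC is preferred over ROC-based metrics because it ignores the abundant true negatives.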


Subject(s)
Arousal/physiology , Neural Networks, Computer , Sleep/physiology , Automation , Humans , Polymers , Polysomnography