1.
Data Brief ; 53: 110229, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38445201

ABSTRACT

Obtaining real-world multi-channel speech recordings is expensive and time-consuming. Therefore, multi-channel recordings are often artificially generated by convolving existing monaural speech recordings with simulated Room Impulse Responses (RIRs) from a so-called shoebox room [1] for stationary (not moving) speakers. Far-field speech processing for home automation or smart assistants has to cope with moving speakers in reverberant environments. With this dataset, we aim to support the generation of realistic speech data by providing multiple directional RIRs along a fine grid of locations in a real room. We provide directional RIR recordings for a classroom and a large corridor. These RIRs can be used to simulate moving speakers by generating random trajectories on that grid and quantizing the trajectories along the grid points. For each matching grid point, the monaural speech recording can be convolved with the RIR at this grid point. Then, the spatialized recording can be compiled using the overlap-add method for each grid point [2]. An example is provided with the data.
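
The procedure described in this abstract (quantize a trajectory to grid points, convolve with the RIR at each point, overlap-add the results) might look roughly like the following. This is a minimal single-channel sketch and not the example shipped with the dataset: the function names, the block-wise segmentation, and the one-position-per-block trajectory format are all assumptions made for illustration.

```python
def convolve(x, h):
    """Full linear convolution of two sequences (length len(x) + len(h) - 1)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def nearest_grid_point(position, grid):
    """Quantize a continuous (x, y) position to the index of the closest grid point."""
    return min(
        range(len(grid)),
        key=lambda k: (grid[k][0] - position[0]) ** 2 + (grid[k][1] - position[1]) ** 2,
    )


def simulate_moving_speaker(speech, trajectory, grid, rirs, block_len):
    """Hypothetical helper: split the speech into blocks, convolve each block with
    the RIR of the grid point nearest to the speaker position for that block,
    and overlap-add the reverberant tails into one output signal."""
    rir_len = max(len(h) for h in rirs)
    out = [0.0] * (len(speech) + rir_len - 1)
    for b, start in enumerate(range(0, len(speech), block_len)):
        block = speech[start:start + block_len]
        # Assumption: one trajectory position per block; the last one is reused if short.
        pos = trajectory[min(b, len(trajectory) - 1)]
        h = rirs[nearest_grid_point(pos, grid)]
        for n, v in enumerate(convolve(block, h)):
            out[start + n] += v  # overlap-add
    return out
```

When every grid point shares the same RIR, the block-wise result reduces to an ordinary convolution of the whole recording, which is a convenient sanity check.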

2.
IEEE Trans Pattern Anal Mach Intell ; 45(4): 5139-5157, 2023 Apr.
Article in English | MEDLINE | ID: mdl-35939468

ABSTRACT

Belief propagation (BP) is a popular method for performing probabilistic inference on graphical models. In this work, we enhance BP and propose self-guided belief propagation (SBP) that incorporates the pairwise potentials only gradually. This homotopy continuation method converges to a unique solution and increases the accuracy without increasing the computational burden. We provide a formal analysis to demonstrate that SBP finds the global optimum of the Bethe approximation for attractive models where all variables favor the same state. Moreover, we apply SBP to various graphs with random potentials and empirically show that: (i) SBP is superior in terms of accuracy whenever BP converges, and (ii) SBP obtains a unique, stable, and accurate solution whenever BP does not converge.
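
As a rough illustration of incorporating the pairwise potentials gradually, the sketch below runs BP on a binary (+1/-1) pairwise model while the couplings are scaled from 0 to 1, warm-starting each run from the previous messages. It is not the authors' implementation: the linear scaling schedule, the function names, and the log-ratio message form are assumptions, and the model is assumed to have at least one edge.

```python
import math


def run_bp(theta, J, scale, msgs, iters=200, tol=1e-10):
    """Iterate BP with couplings multiplied by `scale`.
    msgs[(i, j)] is the message from node i to node j in log-ratio (field) form."""
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            # Local field at i plus all incoming messages except the one from j.
            cavity = theta[i] + sum(msgs[(k, t)] for (k, t) in msgs if t == i and k != j)
            new[(i, j)] = math.atanh(math.tanh(scale * J[(i, j)]) * math.tanh(cavity))
        delta = max(abs(new[e] - msgs[e]) for e in msgs)
        msgs = new
        if delta < tol:
            break
    return msgs


def self_guided_bp(theta, couplings, steps=10):
    """Hypothetical driver: raise the coupling scale from 0 to 1 in `steps` stages."""
    J = {}
    for (i, j), w in couplings.items():
        J[(i, j)] = w
        J[(j, i)] = w
    msgs = {e: 0.0 for e in J}  # exact fixed point when the scale is 0
    for s in range(1, steps + 1):
        msgs = run_bp(theta, J, s / steps, msgs)
    # Approximate means E[x_i] from the local field plus all incoming messages.
    return [
        math.tanh(theta[i] + sum(m for (k, t), m in msgs.items() if t == i))
        for i in range(len(theta))
    ]
```

On a tree-structured model BP is exact, so for a two-node model the returned means can be checked against brute-force enumeration.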

3.
IEEE Trans Biomed Eng ; 69(9): 2872-2882, 2022 09.
Article in English | MEDLINE | ID: mdl-35254969

ABSTRACT

Computational methods for lung sound analysis are beneficial for computer-aided diagnosis support, storage and monitoring in critical care. In this paper, we use pre-trained ResNet models as backbone architectures for classification of adventitious lung sounds and respiratory diseases. The learned representation of the pre-trained model is transferred by using vanilla fine-tuning, co-tuning, stochastic normalization and the combination of the co-tuning and stochastic normalization techniques. Furthermore, data augmentation in both time domain and time-frequency domain is used to account for the class imbalance of the ICBHI and our multi-channel lung sound dataset. Additionally, we introduce spectrum correction to account for the variations of the recording device properties on the ICBHI dataset. Empirically, our proposed systems mostly outperform all state-of-the-art lung sound classification systems for the adventitious lung sounds and respiratory diseases of both datasets.


Subject(s)
Diagnosis, Computer-Assisted , Respiratory Sounds , Humans , Lung , Respiratory Sounds/diagnosis
4.
Article in English | MEDLINE | ID: mdl-34891244

ABSTRACT

Large annotated lung sound databases are publicly available and might be used to train algorithms for diagnosis systems. However, it might be a challenge to develop a well-performing algorithm for small non-public data, which have only a few subjects and show differences in recording devices and setup. In this paper, we use transfer learning to tackle the mismatch of the recording setup. This allows us to transfer knowledge from one dataset to another dataset for crackle detection in lung sounds. In particular, a single input convolutional neural network (CNN) model is pre-trained on a source domain using ICBHI 2017, the largest publicly available database of lung sounds. We use log-mel spectrogram features of respiratory cycles of lung sounds. The pre-trained network is used to build a multi-input CNN model, which shares the same network architecture for respiratory cycles and their corresponding respiratory phases. The multi-input model is then fine-tuned on the target domain of our self-collected lung sound database for classifying crackles and normal lung sounds. Our experimental results show significant performance improvements of 9.84% (absolute) in F-score on the target domain using the multi-input CNN model and transfer learning for crackle detection. Clinical relevance: Crackle detection in lung sounds, multi-input convolutional neural networks, transfer learning.


Subject(s)
Neural Networks, Computer , Respiratory Sounds , Algorithms , Humans , Machine Learning , Sound
5.
Annu Int Conf IEEE Eng Med Biol Soc ; 2020: 760-763, 2020 07.
Article in English | MEDLINE | ID: mdl-33018097

ABSTRACT

We propose a robust and efficient lung sound classification system using a snapshot ensemble of convolutional neural networks (CNNs). A robust CNN architecture is used to extract high-level features from log mel spectrograms. The CNN architecture is trained on a cosine cycle learning rate schedule. Capturing the best model of each training cycle allows us to obtain multiple models settled on various local optima from cycle to cycle at the cost of training a single model. Therefore, the snapshot ensemble boosts performance of the proposed system while keeping the drawback of expensive training of ensembles moderate. To deal with the class imbalance of the dataset, temporal stretching and vocal tract length perturbation (VTLP) for data augmentation and the focal loss objective are used. Empirically, our system outperforms state-of-the-art systems for the prediction task of four classes (normal, crackles, wheezes, and both crackles and wheezes) and two classes (normal and abnormal (i.e. crackles, wheezes, and both crackles and wheezes)) and achieves 78.4% and 83.7% ICBHI-specific micro-averaged accuracy, respectively. The reported accuracy is averaged over ten random splits of 80% training and 20% testing data using the ICBHI 2017 dataset of respiratory cycles.
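
Two ingredients named in this abstract, the cosine cycle learning rate schedule and the focal loss, can be written down compactly. The sketch below uses their commonly cited forms; the function names, the default focusing parameter `gamma=2.0`, and the per-step (rather than per-epoch) schedule are assumptions and may differ from what the authors used.

```python
import math


def cosine_cycle_lr(step, cycle_len, base_lr):
    """Learning rate that restarts at base_lr every cycle_len steps
    and decays along a half cosine within each cycle."""
    t = step % cycle_len
    return 0.5 * base_lr * (math.cos(math.pi * t / cycle_len) + 1.0)


def focal_loss(p_true, gamma=2.0):
    """Focal loss for one example, given the predicted probability of its true class.
    With gamma = 0 this reduces to the ordinary cross-entropy -log(p_true)."""
    return -((1.0 - p_true) ** gamma) * math.log(p_true)
```

Each restart of the schedule begins a new cycle, and one model per cycle would be kept for the ensemble. The focal loss down-weights well-classified examples, which is why it is often paired with imbalanced datasets.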


Subject(s)
Neural Networks, Computer , Respiratory Sounds , Humans , Learning , Machine Learning
6.
Comput Biol Med ; 122: 103831, 2020 07.
Article in English | MEDLINE | ID: mdl-32658732

ABSTRACT

In this paper, we present an approach for multi-channel lung sound classification, exploiting spectral, temporal and spatial information. In particular, we propose a frame-wise classification framework to process full breathing cycles of multi-channel lung sound recordings with a convolutional recurrent neural network. With our recently developed 16-channel lung sound recording device, we collect lung sound recordings from lung-healthy subjects and patients with idiopathic pulmonary fibrosis (IPF), within a clinical trial. From the lung sound recordings, we extract spectrogram features and compare different deep neural network architectures for binary classification, i.e. healthy vs. pathological. Our proposed classification framework with the convolutional recurrent neural network outperforms the other networks by achieving an F-score of F1≈92%. Together with our multi-channel lung sound recording device, we present a holistic approach to multi-channel lung sound analysis.


Subject(s)
Neural Networks, Computer , Respiratory Sounds , Humans , Lung/diagnostic imaging , Respiration
7.
IEEE Trans Pattern Anal Mach Intell ; 42(1): 246-252, 2020 01.
Article in English | MEDLINE | ID: mdl-30530353

ABSTRACT

We extend feed-forward neural networks with a Dirichlet process prior over the weight distribution. This enforces a sharing on the network weights, which can reduce the overall number of parameters drastically. We alternately sample from the posterior of the weights and the posterior of assignments of network connections to the weights. This results in a weight sharing that is adapted to the given data. In order to make the procedure feasible, we present several techniques to reduce the computational burden. Experiments show that our approach mostly outperforms models with random weight sharing. Our model is capable of reducing the memory footprint substantially while maintaining a good performance compared to neural networks without weight sharing.

8.
Annu Int Conf IEEE Eng Med Biol Soc ; 2018: 356-359, 2018 Jul.
Article in English | MEDLINE | ID: mdl-30440410

ABSTRACT

In this paper, we present a method for event detection in single-channel lung sound recordings. This includes the detection of crackles and breathing phase events (inspiration/expiration). Therefore, we propose an event detection approach with spectral features and bidirectional gated recurrent neural networks (BiGRNNs). In our experiments, we use multichannel lung sound recordings from lung-healthy subjects and patients diagnosed with idiopathic pulmonary fibrosis, collected within a clinical trial. We achieve an event-based F-score of F1 ≈ 86% for breathing phase events and F1 ≈ 72% for crackles. The proposed method shows robustness regarding the contamination of the lung sound recordings with noise, bowel and heart sounds.


Subject(s)
Neural Networks, Computer , Respiratory Sounds , Heart Sounds , Humans , Lung , Respiration , Respiratory Sounds/diagnosis , Sound , Sound Spectrography/methods
9.
IEEE Trans Biomed Eng ; 65(9): 1964-1974, 2018 09.
Article in English | MEDLINE | ID: mdl-29993398

ABSTRACT

OBJECTIVE: In this paper, we accurately detect the state-sequence first heart sound (S1)-systole-second heart sound (S2)-diastole, i.e., the positions of S1 and S2, in heart sound recordings. We propose an event detection approach without explicitly incorporating a priori information of the state duration. This renders it also applicable to recordings with cardiac arrhythmia and extendable to the detection of extra heart sounds (third and fourth heart sound), heart murmurs, as well as other acoustic events. METHODS: We use data from the 2016 PhysioNet/CinC Challenge, containing heart sound recordings and annotations of the heart sound states. From the recordings, we extract spectral and envelope features and investigate the performance of different deep recurrent neural network (DRNN) architectures to detect the state sequence. We use virtual adversarial training, dropout, and data augmentation for regularization. RESULTS: We compare our results with the state-of-the-art method and achieve an average score for the four events of the state sequence of F1 ≈ 96% on an independent test set. CONCLUSION: Our approach shows state-of-the-art performance carefully evaluated on the 2016 PhysioNet/CinC Challenge dataset. SIGNIFICANCE: In this work, we introduce a new methodology for the segmentation of heart sounds, suggesting an event detection approach with DRNNs using spectral or envelope features.


Subject(s)
Heart Sounds/physiology , Neural Networks, Computer , Phonocardiography/methods , Signal Processing, Computer-Assisted , Algorithms , Diastole/physiology , Humans , Sound Spectrography/methods , Systole/physiology
10.
IEEE Trans Pattern Anal Mach Intell ; 40(9): 2124-2136, 2018 09.
Article in English | MEDLINE | ID: mdl-28885150

ABSTRACT

Belief propagation (BP) is an iterative method to perform approximate inference on arbitrary graphical models. Whether BP converges and whether the solution is a unique fixed point depend on both the structure and the parametrization of the model. To understand this dependence, it is interesting to find all fixed points. In this work, we formulate a set of polynomial equations, the solutions of which correspond to BP fixed points. To solve such a nonlinear system we present the numerical polynomial-homotopy-continuation (NPHC) method. Experiments on binary Ising models and on error-correcting codes show how our method is capable of obtaining all BP fixed points. On Ising models with fixed parameters we show how the structure influences both the number of fixed points and the convergence properties. We further assess the accuracy of the marginals and weighted combinations thereof. Weighting marginals with their respective partition function increases the accuracy in all experiments. Contrary to the conjecture that uniqueness of BP fixed points implies convergence, we find graphs for which BP fails to converge, even though a unique fixed point exists. Moreover, we show that this fixed point gives a good approximation, and the NPHC method is able to obtain this fixed point.

11.
Curr Neurol Neurosci Rep ; 17(5): 43, 2017 May.
Article in English | MEDLINE | ID: mdl-28390033

ABSTRACT

PURPOSE OF REVIEW: Substantial research exists focusing on the various aspects and domains of early human development. However, there is a clear blind spot in early postnatal development when dealing with neurodevelopmental disorders, especially those that manifest themselves clinically only in late infancy or even in childhood. RECENT FINDINGS: This early developmental period may represent an important timeframe to study these disorders but has historically received far less research attention. We believe that only a comprehensive interdisciplinary approach will enable us to detect and delineate specific parameters for specific neurodevelopmental disorders at a very early age to improve early detection/diagnosis, enable prospective studies and eventually facilitate randomised trials of early intervention. In this article, we propose a dynamic framework for characterising neurofunctional biomarkers associated with specific disorders in the development of infants and children. We have named this automated detection 'Fingerprint Model', suggesting one possible approach to accurately and early identify neurodevelopmental disorders.


Subject(s)
Biomarkers , Early Diagnosis , Neurodevelopmental Disorders/diagnosis , Humans
12.
PLoS One ; 12(2): e0170986, 2017.
Article in English | MEDLINE | ID: mdl-28151950

ABSTRACT

The present study aimed to define differences between silent and oral reading with respect to spatial and temporal eye movement parameters. Eye movements of 22 German-speaking adolescents (14 females; mean age = 13;6 years;months) were recorded while reading an age-appropriate text silently and orally. Preschool cognitive abilities were assessed at the participants' age of 5;7 (years;months) using the Kaufman Assessment Battery for Children. The participants' reading speed and reading comprehension at the age of 13;6 (years;months) were determined using a standardized inventory to evaluate silent reading skills in German readers (Lesegeschwindigkeits- und -verständnistest für Klassen 6-12). The results show that (i) reading mode significantly influenced both spatial and temporal characteristics of eye movement patterns; (ii) articulation decreased the consistency of intraindividual reading performances with regard to a significant number of eye movement parameters; (iii) reading skills predicted the majority of eye movement parameters during silent reading, but influenced only a restricted number of eye movement parameters when reading orally; (iv) differences with respect to a subset of eye movement parameters increased with reading skills; (v) an overall preschool cognitive performance score predicted reading skills at the age of 13;6 (years;months), but not eye movement patterns during either silent or oral reading. However, we found a few significant correlations between preschool performances on subscales of sequential and simultaneous processing and eye movement parameters for both reading modes. Overall, the findings suggest that eye movement patterns depend on the reading mode. Preschool cognitive abilities were more closely related to eye movement patterns of oral than silent reading, while reading skills predicted eye movement patterns during silent reading, but less so during oral reading.


Subject(s)
Eye Movements/physiology , Reading , Adolescent , Child, Preschool , Cognition/physiology , Female , Humans , Male
13.
IEEE Trans Pattern Anal Mach Intell ; 39(10): 2030-2044, 2017 10.
Article in English | MEDLINE | ID: mdl-27875213

ABSTRACT

One of the central themes in Sum-Product networks (SPNs) is the interpretation of sum nodes as marginalized latent variables (LVs). This interpretation yields an increased syntactic or semantic structure and allows both the application of the EM algorithm and efficient MPE inference. In the literature, the LV interpretation was justified by explicitly introducing the indicator variables corresponding to the LVs' states. However, as pointed out in this paper, this approach is in conflict with the completeness condition in SPNs and does not fully specify the probabilistic model. We propose a remedy for this problem by modifying the original approach for introducing the LVs, which we call SPN augmentation. We discuss conditional independencies in augmented SPNs, formally establish the probabilistic interpretation of the sum-weights and give an interpretation of augmented SPNs as Bayesian networks. Based on these results, we find a sound derivation of the EM algorithm for SPNs. Furthermore, the Viterbi-style algorithm for MPE proposed in the literature was never proven to be correct. We show that this is indeed a correct algorithm, when applied to selective SPNs, and in particular when applied to augmented SPNs. Our theoretical results are confirmed in experiments on synthetic data and 103 real-world datasets.

14.
IEEE Trans Pattern Anal Mach Intell ; 37(4): 774-85, 2015 Apr.
Article in English | MEDLINE | ID: mdl-26353293

ABSTRACT

Bayesian network classifiers (BNCs) are typically implemented on today's desktop computers. However, many real-world applications require classifier implementation on embedded or low-power systems. Aspects for this purpose have not been studied rigorously. We partly close this gap by analyzing reduced-precision implementations of BNCs. In detail, we investigate the quantization of the parameters of BNCs with discrete-valued nodes, including the implications on the classification rate (CR). We derive worst-case and probabilistic bounds on the CR for different bit-widths. These bounds are evaluated on several benchmark datasets. Furthermore, we compare the classification performance and the robustness of BNCs with generatively and discriminatively optimized parameters, i.e. parameters optimized for high data likelihood and parameters optimized for classification, with respect to parameter quantization. Generatively optimized parameters are more robust for very low bit-widths, i.e. fewer classifications change because of quantization. However, classification performance is better for discriminatively optimized parameters for all but very low bit-widths. Additionally, we perform analysis for margin-optimized tree augmented network (TAN) structures which outperform generatively optimized TAN structures in terms of CR and robustness.
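
To make the idea of reduced-precision parameters concrete, the sketch below rounds the log-probability parameters of a naive Bayes classifier to a fixed-point grid before classification. The fixed-point format (a chosen number of fractional bits), the naive Bayes structure, and all names are assumptions for illustration; the abstract does not specify the number format that was analyzed.

```python
def quantize(value, frac_bits):
    """Round a value to the nearest multiple of 2**-frac_bits (fixed-point grid)."""
    step = 2.0 ** -frac_bits
    return round(value / step) * step


def classify(log_prior, log_cond, features, frac_bits=None):
    """Naive Bayes decision from log-parameters, optionally quantized first.
    log_cond[c][f][x] is the log-probability of feature f taking value x in class c."""
    q = (lambda v: v) if frac_bits is None else (lambda v: quantize(v, frac_bits))
    scores = []
    for c in range(len(log_prior)):
        s = q(log_prior[c])
        for f, x in enumerate(features):
            s += q(log_cond[c][f][x])
        scores.append(s)
    # Return the index of the highest-scoring class.
    return max(range(len(scores)), key=scores.__getitem__)
```

Comparing the decisions with `frac_bits=None` against small values of `frac_bits` over a test set gives a simple empirical view of how many classifications change because of quantization.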

15.
Springerplus ; 4: 243, 2015.
Article in English | MEDLINE | ID: mdl-26085973

ABSTRACT

In this paper, we apply kernel PCA for speech enhancement and derive pre-image iterations for speech enhancement. Both methods make use of a Gaussian kernel. The kernel variance serves as tuning parameter that has to be adapted according to the SNR and the desired degree of de-noising. We develop a method to derive a suitable value for the kernel variance from a noise estimate to adapt pre-image iterations to arbitrary SNRs. In experiments, we compare the performance of kernel PCA and pre-image iterations in terms of objective speech quality measures and automatic speech recognition. The speech data is corrupted by white and colored noise at 0, 5, 10, and 15 dB SNR. As a benchmark, we provide results of the generalized subspace method, of spectral subtraction, and of the minimum mean-square error log-spectral amplitude estimator. In terms of the scores of the PEASS (Perceptual Evaluation Methods for Audio Source Separation) toolbox, the proposed methods achieve a similar performance as the reference methods. The speech recognition experiments show that the utterances processed by pre-image iterations achieve a consistently better word recognition accuracy than the unprocessed noisy utterances and than the utterances processed by the generalized subspace method.

16.
Pattern Recognit ; 46(2): 464-471, 2013 Feb.
Article in English | MEDLINE | ID: mdl-24511159

ABSTRACT

The margin criterion for parameter learning in graphical models has gained significant impact in recent years. We use the maximum margin score for discriminatively optimizing the structure of Bayesian network classifiers. Furthermore, greedy hill-climbing and simulated annealing search heuristics are applied to determine the classifier structures. In the experiments, we demonstrate the advantages of maximum margin optimized Bayesian network structures in terms of classification performance compared to traditionally used discriminative structure learning methods. Stochastic simulated annealing requires fewer score evaluations than greedy heuristics. Additionally, we compare generative and discriminative parameter learning on both generatively and discriminatively structured Bayesian network classifiers. Margin-optimized Bayesian network classifiers achieve similar classification performance as support vector machines. Moreover, missing feature values during classification can be handled by discriminatively optimized Bayesian network classifiers, a case where purely discriminative classifiers usually require mechanisms to complete unknown feature values in the data first.

17.
Neurocomputing (Amst) ; 80(1): 38-46, 2012 Mar 15.
Article in English | MEDLINE | ID: mdl-22505792

ABSTRACT

Although nonnegative matrix factorization (NMF) favors a sparse and part-based representation of nonnegative data, there is no guarantee for this behavior. Several authors proposed NMF methods which enforce sparseness by constraining or penalizing the [Formula: see text] of the factor matrices. On the other hand, little work has been done using a more natural sparseness measure, the [Formula: see text]. In this paper, we propose a framework for approximate NMF which constrains the [Formula: see text] of the basis matrix, or the coefficient matrix, respectively. For this purpose, techniques for unconstrained NMF can be easily incorporated, such as multiplicative update rules, or the alternating nonnegative least-squares scheme. In experiments we demonstrate the benefits of our methods, which are comparable to or outperform existing approaches.
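
For reference, the multiplicative update rules for unconstrained NMF mentioned in this abstract can be sketched as follows, using the standard updates for the squared Frobenius error. The sparseness constraint proposed in the paper is not implemented here; the helper names, the random initialization, and the small constant `eps` guarding against division by zero are assumptions.

```python
import random


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - a) ** 2 for rv, ra in zip(V, WH) for v, a in zip(rv, ra))


def nmf_multiplicative(V, rank, iters=200, eps=1e-9, seed=0):
    """Unconstrained NMF, V ~ W H, with elementwise multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H
```

Because every update multiplies by a nonnegative ratio, the factors stay nonnegative throughout, which is what makes these rules a convenient building block for constrained variants.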

18.
IEEE Trans Pattern Anal Mach Intell ; 34(3): 521-32, 2012 Mar.
Article in English | MEDLINE | ID: mdl-21808086

ABSTRACT

We present a maximum margin parameter learning algorithm for Bayesian network classifiers using a conjugate gradient (CG) method for optimization. In contrast to previous approaches, we maintain the normalization constraints on the parameters of the Bayesian network during optimization, i.e., the probabilistic interpretation of the model is not lost. This enables us to handle missing features in discriminatively optimized Bayesian networks. In experiments, we compare the classification performance of maximum margin parameter learning to conditional likelihood and maximum likelihood learning approaches. Discriminative parameter learning significantly outperforms generative maximum likelihood estimation for naive Bayes and tree augmented naive Bayes structures on all considered data sets. Furthermore, maximizing the margin dominates the conditional likelihood approach in terms of classification performance in most cases. We provide results for a recently proposed maximum margin optimization approach based on convex relaxation. While the classification results are highly similar, our CG-based optimization is computationally up to orders of magnitude faster. Margin-optimized Bayesian network classifiers achieve classification performance comparable to support vector machines (SVMs) using fewer parameters. Moreover, we show that unanticipated missing feature values during classification can be easily processed by discriminatively optimized Bayesian network classifiers, a case where discriminative classifiers usually require mechanisms to complete unknown feature values in the data first.


Subject(s)
Bayes Theorem , Pattern Recognition, Automated/methods , Speech , Algorithms , Humans , Learning
19.
IEEE Trans Syst Man Cybern B Cybern ; 38(6): 1465-75, 2008 Dec.
Article in English | MEDLINE | ID: mdl-19022719

ABSTRACT

Recently, much work has been done on multiple object tracking on the one hand and on reference model adaptation for a single-object tracker on the other hand. In this paper, we do both tracking of multiple objects (faces of people) in a meeting scenario and online learning to incrementally update the models of the tracked objects to account for appearance changes during tracking. Additionally, we automatically initialize and terminate tracking of individual objects based on low-level features, i.e., face color, face size, and object movement. Many methods, unlike our approach, assume that the target region has been initialized by hand in the first frame. For tracking, a particle filter is incorporated to propagate sample distributions over time. We discuss the close relationship between our implemented tracker based on particle filters and genetic algorithms. Numerous experiments on meeting data demonstrate the capabilities of our tracking approach. Additionally, we provide an empirical verification of the reference model learning during tracking of indoor and outdoor scenes which supports a more robust tracking. Therefore, we report the average of the standard deviation of the trajectories over numerous tracking runs depending on the learning rate.


Subject(s)
Algorithms , Artificial Intelligence , Image Interpretation, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Subtraction Technique , Computer Simulation , Image Enhancement/methods , Models, Statistical , Motion , Online Systems , Reproducibility of Results , Sensitivity and Specificity
20.
IEEE Trans Pattern Anal Mach Intell ; 27(8): 1344-8, 2005 Aug.
Article in English | MEDLINE | ID: mdl-16119273

ABSTRACT

We propose a genetic-based expectation-maximization (GA-EM) algorithm for learning Gaussian mixture models from multivariate data. This algorithm is capable of selecting the number of components of the model using the minimum description length (MDL) criterion. Our approach benefits from the properties of genetic algorithms (GA) and the EM algorithm by combining both into a single procedure. The population-based stochastic search of the GA explores the search space more thoroughly than the EM method. Therefore, our algorithm enables escaping from locally optimal solutions since the algorithm becomes less sensitive to its initialization. The GA-EM algorithm is elitist, which maintains the monotonic convergence property of the EM algorithm. The experiments on simulated and real data show that the GA-EM outperforms the EM method since: 1) We have obtained a better MDL score while using exactly the same termination condition for both algorithms. 2) Our approach identifies the number of components which were used to generate the underlying data more often than the EM algorithm.
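
The MDL criterion used for selecting the number of components can be illustrated with its common two-part form, MDL = -log L + (k/2) log N, where k is the number of free parameters and N the number of samples. The sketch below assumes full covariance matrices and this particular form of the criterion; the exact variant used in the paper may differ, and the function names are hypothetical.

```python
import math


def gmm_num_params(n_components, dim):
    """Free parameters of a Gaussian mixture with full covariance matrices."""
    weights = n_components - 1                      # mixing weights sum to one
    means = n_components * dim
    covs = n_components * dim * (dim + 1) // 2      # symmetric matrices
    return weights + means + covs


def mdl_score(log_likelihood, n_components, dim, n_samples):
    """Two-part MDL score; lower is better."""
    k = gmm_num_params(n_components, dim)
    return -log_likelihood + 0.5 * k * math.log(n_samples)
```

Under this score, adding a component only pays off when the gain in log-likelihood exceeds the penalty for its extra parameters.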


Subject(s)
Algorithms , Artificial Intelligence , Handwriting , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Models, Statistical , Pattern Recognition, Automated/methods , Cluster Analysis , Computer Simulation , Image Enhancement/methods , Normal Distribution , Numerical Analysis, Computer-Assisted