1.
IEEE Trans Cybern ; 50(3): 1230-1239, 2020 Mar.
Article in English | MEDLINE | ID: mdl-30872254

ABSTRACT

Obtaining meaningful annotations is a tedious task, incurring considerable cost and time. Dynamic active learning and cooperative learning are recently proposed approaches to reduce the human effort of annotating data with subjective phenomena. In this paper, we introduce a novel generic annotation framework that aims to achieve the optimal trade-off between label reliability and cost reduction by making efficient use of the human and machine workforce. To this end, we use dropout to assess model uncertainty and thereby decide which instances can be labeled automatically by the machine and which require human inspection. In addition, we propose an early stopping criterion based on inter-rater agreement in order to focus human resources on those ambiguous instances that are difficult to label. In contrast to existing algorithms, the new confidence measures are applicable not only to binary classification tasks but also to regression problems. The proposed method is evaluated on the benchmark datasets for non-native English prosody estimation provided in the Interspeech Computational Paralinguistics Challenge. As a result, the novel dynamic cooperative learning algorithm yields a Spearman's correlation coefficient of 0.424, compared to 0.413 with passive learning, while reducing the amount of human annotation by 74%.
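As a rough illustration of the uncertainty-based routing described in this abstract, the sketch below uses Monte Carlo dropout: several stochastic forward passes approximate the predictive variance per instance, and only low-variance instances are labeled by the machine. The function `stochastic_predict`, the number of passes, and the variance threshold are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (assumed interfaces): route instances to machine or human
# labeling based on Monte Carlo dropout uncertainty. `stochastic_predict`
# is a hypothetical callable that runs one forward pass with dropout kept
# active at inference time and returns a scalar prediction.
import numpy as np

def route_instances(instances, stochastic_predict, n_passes=20, var_threshold=0.05):
    """Return (machine_labeled, needs_human) lists."""
    machine_labeled, needs_human = [], []
    for x in instances:
        # T stochastic forward passes approximate the predictive distribution.
        preds = np.array([stochastic_predict(x) for _ in range(n_passes)])
        mean, var = preds.mean(), preds.var()
        if var < var_threshold:          # model is confident: accept its label
            machine_labeled.append((x, mean))
        else:                            # ambiguous: send to human annotators
            needs_human.append(x)
    return machine_labeled, needs_human
```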


Subject(s)
Data Curation/methods , Man-Machine Systems , Supervised Machine Learning , Adult , Algorithms , Databases, Factual , Female , Humans , Male , Middle Aged , Reproducibility of Results , Young Adult
2.
PLoS One ; 11(5): e0154486, 2016.
Article in English | MEDLINE | ID: mdl-27176486

ABSTRACT

We propose a new recognition task in the area of computational paralinguistics: automatic recognition of eating conditions in speech, i.e., whether people are eating while speaking, and what they are eating. To this end, we introduce the audio-visual iHEARu-EAT database featuring 1.6k utterances of 30 subjects (mean age: 26.1 years, standard deviation: 2.66 years, gender balanced, German speakers), six types of food (Apple, Nectarine, Banana, Haribo Smurfs, Biscuit, and Crisps), and read as well as spontaneous speech; the database is made publicly available for research purposes. We start by demonstrating that, for automatic speech recognition (ASR), it pays off to know whether speakers are eating or not. We also propose automatic classification based both on brute-forced low-level acoustic features and on higher-level features related to intelligibility, obtained from an automatic speech recogniser. Prediction of the eating condition was performed with a Support Vector Machine (SVM) classifier in a leave-one-speaker-out evaluation framework. Results show that the binary prediction of the eating condition (i.e., eating or not eating) can easily be solved independently of the speaking condition; the obtained average recalls are all above 90%. Low-level acoustic features provide the best performance on spontaneous speech, reaching up to 62.3% average recall for multi-way classification of the eating condition, i.e., discriminating the six types of food as well as not eating. Early fusion of the intelligibility-related features with the brute-forced acoustic feature set improves the performance on read speech, reaching a 66.4% average recall for the multi-way classification task. Analysing features and classifier errors leads to a suitable ordinal scale for eating conditions, on which automatic regression can be performed with a determination coefficient of up to 56.2%.
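A minimal sketch of the evaluation scheme named in this abstract, leave-one-speaker-out SVM classification scored by unweighted (macro) average recall. The feature matrix, labels, speaker IDs, kernel, and C value are placeholders, not the paper's exact configuration.

```python
# Leave-one-speaker-out SVM evaluation with unweighted average recall.
# X (features), y (eating-condition labels) and speaker_ids are assumed
# numpy arrays provided by the caller.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC
from sklearn.metrics import recall_score

def loso_average_recall(X, y, speaker_ids, C=1.0):
    logo = LeaveOneGroupOut()
    recalls = []
    for train_idx, test_idx in logo.split(X, y, groups=speaker_ids):
        clf = SVC(kernel="linear", C=C)
        clf.fit(X[train_idx], y[train_idx])
        y_pred = clf.predict(X[test_idx])
        # Unweighted (macro) recall over classes for the held-out speaker.
        recalls.append(recall_score(y[test_idx], y_pred, average="macro"))
    return float(np.mean(recalls))
```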


Subject(s)
Eating/physiology , Food , Hearing/physiology , Speech Recognition Software , Speech/physiology , Adult , Audiovisual Aids , Automation , Databases as Topic , Female , Humans , Male , Regression Analysis , Self Report , Support Vector Machine
3.
Front Psychol ; 4: 292, 2013.
Article in English | MEDLINE | ID: mdl-23750144

ABSTRACT

Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie, or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning each of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow's pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of "the sound that something makes," in order to evaluate the system's auditory environment and its own audio output. This article aims at a first step toward such a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we interpret the worth of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that, by selecting appropriate descriptors, cross-domain arousal and valence regression is feasible, achieving significant correlations with the observer annotations of up to 0.78 for arousal (training on sound and testing on enacted speech) and 0.60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects.
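As a hedged sketch of the cross-domain evaluation described in this abstract: a regressor is trained on acoustic features from one domain (e.g. sound) and its predictions are compared to observer annotations from another domain (e.g. enacted speech) via Spearman's rank correlation. The regressor choice and the placeholder feature matrices are assumptions, not the study's exact setup.

```python
# Cross-domain arousal/valence regression scored by Spearman's rho.
# X_train/y_train come from the source domain, X_test/y_test from the
# target domain; all are assumed numpy arrays of acoustic features and
# observer-annotated arousal or valence values.
from scipy.stats import spearmanr
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def cross_domain_rho(X_train, y_train, X_test, y_test):
    model = make_pipeline(StandardScaler(), SVR(kernel="linear"))
    model.fit(X_train, y_train)          # train on one domain (e.g. sound)
    y_pred = model.predict(X_test)       # test on another (e.g. enacted speech)
    rho, _ = spearmanr(y_test, y_pred)   # rank correlation with observer ratings
    return rho
```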

4.
PLoS One ; 8(12): e78506, 2013.
Article in English | MEDLINE | ID: mdl-24391704

ABSTRACT

Without doubt, general video and sound, as found in large multimedia archives, carry emotional information. Thus, audio and video retrieval by certain emotional categories or dimensions could play a central role for tomorrow's intelligent systems, enabling search for movies with a particular mood, computer-aided scene and sound design in order to elicit certain emotions in the audience, and so on. Yet, the lion's share of research in affective computing focuses exclusively on signals conveyed by humans, such as affective speech. Uniting the fields of multimedia retrieval and affective computing is believed to lead to a multiplicity of interesting retrieval applications and, at the same time, to benefit affective computing research by moving its methodology "out of the lab" to real-world, diverse data. In this contribution, we address the problem of finding "disturbing" scenes in movies, a scenario that is highly relevant for computer-aided parental guidance. We apply large-scale segmental feature extraction combined with audio-visual classification to the particular task of detecting violence. Our system performs fully data-driven analysis, including automatic segmentation. We evaluate the system in terms of mean average precision (MAP) on the official data set of the MediaEval 2012 evaluation campaign's Affect Task, which consists of 18 original Hollywood movies, achieving up to 0.398 MAP on unseen test data in full realism. An in-depth analysis of the worth of individual features with respect to the target class and of the system errors reveals the importance of peak-related audio feature extraction and low-level histogram-based video analysis.
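A minimal sketch, under assumed inputs, of the mean average precision (MAP) metric used in the evaluation described in this abstract: average precision is computed per movie from segment-level violence scores and then averaged across movies. The per-movie input format is a placeholder, not the campaign's official scoring tool.

```python
# Mean average precision over movies: each movie contributes one
# (y_true, y_score) pair, where y_true marks violent segments and
# y_score is the classifier's confidence per segment.
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(per_movie_results):
    """per_movie_results: list of (y_true, y_score) array pairs, one per movie."""
    aps = [average_precision_score(y_true, y_score)
           for y_true, y_score in per_movie_results]
    return float(np.mean(aps))
```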


Subject(s)
Artificial Intelligence , Motion Pictures , Violence , Algorithms , Databases, Factual , Emotions , Humans , Motion Pictures/statistics & numerical data , Multimedia
5.
Genes Chromosomes Cancer ; 42(3): 299-307, 2005 Mar.
Article in English | MEDLINE | ID: mdl-15609343

ABSTRACT

Microarray technology has been proposed as an addition to the methods currently used for diagnosing leukemia. Before a new technology can be used in a diagnostic setting, the method has to be shown to produce robust results. It is known that, depending on the technical aspects of specimen sampling and target preparation, global gene expression patterns can change dramatically. Various parameters, such as RNA degradation, shipment time, sample purity, and patient age, can in principle influence measured gene expression. However, thus far, no information has been available on the robustness of a diagnostic gene expression signature. We demonstrate here that, for a subset of acute leukemia, expression profiling is applicable in a diagnostic setting, considering various influencing parameters. With the use of a set of differentially expressed genes, that is, a diagnostic gene expression signature, four genetically defined acute myeloid leukemia subtypes with recurrent chromosomal aberrations can clearly be identified. In addition, we show that preparation by different operators and the use of different sample-handling procedures did not impair the robustness of diagnostic expression signatures. In conclusion, our results provide additional support for the applicability of microarrays in a diagnostic setting, and we have been encouraged to enroll patients in a prospective study in which microarrays will be tested as an additional routine diagnostic method in parallel with standard diagnostic procedures.
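A hedged illustration, not the study's actual pipeline, of how a signature of differentially expressed genes can be selected and used to classify leukemia subtypes with cross-validation. The expression matrix, subtype labels, gene count, and classifier choice are all assumptions for the sake of the sketch.

```python
# Select a gene signature by differential expression (ANOVA F-test) and
# classify subtypes with a linear SVM, scored by cross-validation.
# `expression` (samples x genes) and `subtypes` are assumed placeholders.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def signature_classification_accuracy(expression, subtypes, n_genes=100):
    pipeline = make_pipeline(
        SelectKBest(score_func=f_classif, k=n_genes),  # differential-expression filter
        SVC(kernel="linear"),
    )
    scores = cross_val_score(pipeline, expression, subtypes, cv=5)
    return float(np.mean(scores))
```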


Subject(s)
Gene Expression Profiling , Gene Expression Regulation, Leukemic , Leukemia, Myeloid/genetics , Neoplasm Proteins/genetics , Adult , Chromosome Aberrations , Humans , Leukemia, Myeloid/diagnosis , Oligonucleotide Array Sequence Analysis , Prognosis , Translocation, Genetic