Search | VHL Regional Portal

1.

Natural-Language-Driven Multimodal Representation Learning for Audio-Visual Scene-Aware Dialog System.

Heo, Yoonseok; Kang, Sangwoo; Seo, Jungyun.

Sensors (Basel) ; 23(18)2023 Sep 14.

Article in English | MEDLINE | ID: mdl-37765933

ABSTRACT

With the development of multimedia systems in wireless environments, the rising need for artificial intelligence is to design a system that can properly communicate with humans with a comprehensive understanding of various types of information in a human-like manner. Therefore, this paper addresses an audio-visual scene-aware dialog system that can communicate with users about audio-visual scenes. It is essential to understand not only visual and textual information but also audio information in a comprehensive way. Despite the substantial progress in multimodal representation learning with language and visual modalities, there are still two caveats: ineffective use of auditory information and the lack of interpretability of the deep learning systems' reasoning. To address these issues, we propose a novel audio-visual scene-aware dialog system that utilizes a set of explicit information from each modality as a form of natural language, which can be fused into a language model in a natural way. It leverages a transformer-based decoder to generate a coherent and correct response based on multimodal knowledge in a multitask learning setting. In addition, we also address the way of interpreting the model with a response-driven temporal moment localization method to verify how the system generates the response. The system itself provides the user with the evidence referred to in the system response process as a form of the timestamp of the scene. We show the superiority of the proposed model in all quantitative and qualitative measurements compared to the baseline. In particular, the proposed model achieved robust performance even in environments using all three modalities, including audio. We also conducted extensive experiments to investigate the proposed model. In addition, we obtained state-of-the-art performance in the system response reasoning task.

2.

Atypical occurrence of anti-Ma2-associated encephalitis after breast cancer surgery and COVID-19.

Seo, Jungyun; Kim, Hong Nam; Kwon, Minsuk; Kim, Tae-Joon.

Encephalitis ; 3(3): 97-101, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37500102

ABSTRACT

In this report, we present a rare case of anti-Ma2-associated encephalitis concurrent with coronavirus disease 2019 (COVID-19) following breast cancer surgery. The patient exhibited minimal clinical symptoms of COVID-19 infection but developed seizures and altered mental status after surgery, leading to diagnosis of a classic paraneoplastic syndrome. This case highlights the possibility of paraneoplastic neurological syndrome even after cancer surgery and the need for careful consideration of post-acute infection syndromes when neurological symptoms occur following an infection.

3.

Data on preboundary lengthening in Tokyo Japanese as a function of prosodic prominence, boundary, lexical pitch accent and moraic structure.

Seo, Jungyun; Kim, Sahyang; Cho, Taehong.

Data Brief ; 35: 106919, 2021 Apr.

Article in English | MEDLINE | ID: mdl-33786344

ABSTRACT

This article provides individual speakers' acoustic durational data on preboundary (phrase-final) lengthening in Japanese. The data are based on speech recorded from fourteen native speakers of Tokyo Japanese in a laboratory setting. Each speaker produced Japanese disyllabic words with four different moraic structures (CVCV, CVCVN, CVNCV, and CVNCVN, where C stands for a non-nasal onset consonant, V for a vowel, and N for a moraic nasal coda) and two pitch accent patterns (initially-accented and unaccented). The target words were produced in carrier sentences in which they were placed in two different prosodic boundary conditions (Intonational Phrase-final ('IPf') and Intonational Phrase-medial ('IPm')) and two focus contexts (focused and unfocused). The measured raw values of acoustic duration of each segment in different conditions are included in a CSV-formatted file. Another CSV-formatted file is provided with numeric calculations in both absolute and relative terms that exhibit the magnitude of preboundary lengthening across different prominence contexts (focused/unfocused and initially-accented/unaccented). The absolute durational difference was obtained as a numeric increase of preboundary lengthening of each segment produced in phrase-final position versus phrase-medial position (i.e., Δ(IPf-IPm) where 'f' = 'final' and 'm' = 'medial'). The relative durational difference was obtained as a percentage increase of preboundary lengthening in IP-final position versus IP-medial position, which was calculated by the absolute durational difference divided by the duration of the segment in phrase-medial position and then multiplied by 100 (i.e., (Absolute difference/IPm)*100). This article also provides figures that exemplify speaker variation in terms of absolute and relative differences of preboundary lengthening as a function of pitch accent. Some theoretical aspects of the data are discussed in the full-length article entitled "Preboundary lengthening in Japanese: To what extent do lexical pitch accent and moraic structure matter?" [1].

4.

Preboundary lengthening in Japanese: To what extent do lexical pitch accent and moraic structure matter?

Seo, Jungyun; Kim, Sahyang; Kubozono, Haruo; Cho, Taehong.

J Acoust Soc Am ; 146(3): 1817, 2019 09.

Article in English | MEDLINE | ID: mdl-31590553

ABSTRACT

In this acoustic study, preboundary lengthening (PBL) in Japanese is investigated in relation to the prosodic structure in disyllabic words with different moraic and pitch accent distributions. Results showed gradient progressive PBL effects largely independent of the mora count. The domain of PBL is better explained by the syllable structure than the moraic structure. PBL, however, is attracted toward a non-final moraic nasal, showing some role of the mora. The initial pitch accent does not attract PBL directly, but it suppresses PBL of the final rime as a way of maintaining the relative prominence, showing a language-specific PBL modulation.

Subject(s)

Phonetics , Speech Acoustics , Adult , Asian People , Female , Humans , Male , Voice/physiology

5.

An Efficient Framework for Development of Task-Oriented Dialog Systems in a Smart Home Environment.

Park, Youngmin; Kang, Sangwoo; Seo, Jungyun.

Sensors (Basel) ; 18(5)2018 May 16.

Article in English | MEDLINE | ID: mdl-29772668

ABSTRACT

In recent times, with the increasing interest in conversational agents for smart homes, task-oriented dialog systems are being actively researched. However, most of these studies are focused on the individual modules of such a system, and there is an evident lack of research on a dialog framework that can integrate and manage the entire dialog system. Therefore, in this study, we propose a framework that enables the user to effectively develop an intelligent dialog system. The proposed framework ontologically expresses the knowledge required for the task-oriented dialog system's process and can build a dialog system by editing the dialog knowledge. In addition, the framework provides a module router that can indirectly run externally developed modules. Further, it enables a more intelligent conversation by providing a hierarchical argument structure (HAS) to manage the various argument representations included in natural language sentences. To verify the practicality of the framework, an experiment was conducted in which developers without any previous experience in developing a dialog system developed task-oriented dialog systems using the proposed framework. The experimental results show that even beginner dialog system developers can develop a high-level task-oriented dialog system.

6.

A neural network model with feature selection for Korean speech act classification.

Kim, Kyungsun; Kim, Harksoo; Seo, Jungyun.

Int J Neural Syst ; 14(6): 407-14, 2004 Dec.

Article in English | MEDLINE | ID: mdl-15714607

ABSTRACT

A speech act is a linguistic action intended by a speaker. Speech act classification is an essential part of a dialogue understanding system because the speech act of an utterance is closely tied with the user's intention in the utterance. We propose a neural network model for Korean speech act classification. In addition, we propose a method that extracts morphological features from surface utterances and selects effective ones among the morphological features. Using the feature selection method, the proposed neural network can partially increase precision and decrease training time. In the experiment, the proposed neural network showed better results than other models using comparatively high-level linguistic features. Based on the experimental result, we believe that the proposed neural network model is suitable for real field applications because it is easy to expand the neural network model into other domains. Moreover, we found that neural networks can be useful in speech act classification if we can convert surface sentences into vectors with fixed dimensions by using an effective feature selection method.

Subject(s)

Models, Neurological , Neural Networks, Computer , Speech Recognition Software , Speech/classification , Asian People , Humans

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL