Results 1 - 6 of 6
1.
Neural Netw; 172: 106120, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38266474

ABSTRACT

High-dimensional data such as natural images or speech signals exhibit some form of regularity, preventing their dimensions from varying independently. This suggests that there exists a lower dimensional latent representation from which the high-dimensional observed data were generated. Uncovering the hidden explanatory features of complex data is the goal of representation learning, and deep latent variable generative models have emerged as promising unsupervised approaches. In particular, the variational autoencoder (VAE) which is equipped with both a generative and an inference model allows for the analysis, transformation, and generation of various types of data. Over the past few years, the VAE has been extended to deal with data that are either multimodal or dynamical (i.e., sequential). In this paper, we present a multimodal and dynamical VAE (MDVAE) applied to unsupervised audiovisual speech representation learning. The latent space is structured to dissociate the latent dynamical factors that are shared between the modalities from those that are specific to each modality. A static latent variable is also introduced to encode the information that is constant over time within an audiovisual speech sequence. The model is trained in an unsupervised manner on an audiovisual emotional speech dataset, in two stages. In the first stage, a vector quantized VAE (VQ-VAE) is learned independently for each modality, without temporal modeling. The second stage consists in learning the MDVAE model on the intermediate representation of the VQ-VAEs before quantization. The disentanglement between static versus dynamical and modality-specific versus modality-common information occurs during this second training stage. Extensive experiments are conducted to investigate how audiovisual speech latent factors are encoded in the latent space of MDVAE. These experiments include manipulating audiovisual speech, audiovisual facial image denoising, and audiovisual speech emotion recognition. The results show that MDVAE effectively combines the audio and visual information in its latent space. They also show that the learned static representation of audiovisual speech can be used for emotion recognition with few labeled data, and with better accuracy compared with unimodal baselines and a state-of-the-art supervised model based on an audiovisual transformer architecture.
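The two-stage scheme summarized above (a per-modality VQ-VAE trained without temporal modeling, then an MDVAE fitted on the pre-quantization features with static, shared-dynamical, and modality-specific latent variables) can be sketched roughly as follows. This is a minimal PyTorch sketch under assumed dimensions and module choices, not the authors' implementation; the VQ, KL, and reconstruction losses are omitted.

```python
# Minimal PyTorch sketch of the two-stage scheme summarized above.
# Dimensions, module choices, and names are illustrative assumptions;
# VQ/commitment, KL, and reconstruction losses are omitted for brevity.
import torch
import torch.nn as nn


class VQVAE(nn.Module):
    """Stage 1: one VQ-VAE per modality, trained frame by frame (no temporal model)."""

    def __init__(self, in_dim, code_dim=64, n_codes=512):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, code_dim))
        self.codebook = nn.Embedding(n_codes, code_dim)
        self.decoder = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(), nn.Linear(256, in_dim))

    def encode(self, x):
        # Continuous features *before* quantization: this is what stage 2 consumes.
        return self.encoder(x)

    def quantize(self, z_e):
        idx = torch.cdist(z_e, self.codebook.weight).argmin(dim=-1)
        z_q = self.codebook(idx)
        return z_e + (z_q - z_e).detach()  # straight-through estimator

    def forward(self, x):
        z_e = self.encode(x)
        return self.decoder(self.quantize(z_e)), z_e


class MDVAE(nn.Module):
    """Stage 2: multimodal dynamical VAE on the pre-quantization features.

    w       : static latent, constant over an audiovisual sequence
    z_av    : dynamical latent shared by audio and video
    z_a/z_v : dynamical latents specific to each modality
    """

    def __init__(self, feat_dim=64, w_dim=16, z_dim=8):
        super().__init__()
        self.enc_w = nn.GRU(2 * feat_dim, w_dim, batch_first=True)
        self.enc_zav = nn.GRU(2 * feat_dim, 2 * z_dim, batch_first=True)
        self.enc_za = nn.GRU(feat_dim, 2 * z_dim, batch_first=True)
        self.enc_zv = nn.GRU(feat_dim, 2 * z_dim, batch_first=True)
        self.dec_a = nn.Linear(w_dim + 2 * z_dim, feat_dim)
        self.dec_v = nn.Linear(w_dim + 2 * z_dim, feat_dim)

    @staticmethod
    def reparam(stats):
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    def forward(self, feat_a, feat_v):  # (batch, time, feat_dim) each
        both = torch.cat([feat_a, feat_v], dim=-1)
        _, h_w = self.enc_w(both)
        w = h_w[-1].unsqueeze(1).expand(-1, feat_a.size(1), -1)   # static, repeated over time
        z_av = self.reparam(self.enc_zav(both)[0])                # modality-common dynamics
        z_a = self.reparam(self.enc_za(feat_a)[0])                # audio-specific dynamics
        z_v = self.reparam(self.enc_zv(feat_v)[0])                # video-specific dynamics
        rec_a = self.dec_a(torch.cat([w, z_av, z_a], dim=-1))
        rec_v = self.dec_v(torch.cat([w, z_av, z_v], dim=-1))
        return rec_a, rec_v
```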


Subjects
Learning; Speech; Emotions; Face; Recognition (Psychology)
2.
PLoS One; 18(8): e0290612, 2023.
Article in English | MEDLINE | ID: mdl-37624781

ABSTRACT

Recent deep-learning techniques have made it possible to manipulate facial expressions in digital photographs or videos; however, these techniques still lack fine and personalized ways to control their creation. Moreover, current technologies are highly dependent on large labeled databases, which limits the range and complexity of expressions that can be modeled. Thus, these technologies cannot deal with non-basic emotions. In this paper, we propose a novel interdisciplinary approach combining a Generative Adversarial Network (GAN) with a technique inspired by the cognitive sciences: psychophysical reverse correlation. Reverse correlation is a data-driven method able to extract an observer's 'mental representation' of what a given facial expression should look like. Our approach can generate facial expression prototypes that are 1) personalized, 2) available for both basic emotions and non-basic emotions absent from existing databases, and 3) obtained without the need for expertise. Personalized prototypes obtained with reverse correlation can then be applied to manipulate facial expressions. In addition, our system challenges the universality of facial expression prototypes by proposing the concepts of dominant and complementary action units to describe them. The evaluations we conducted on a limited number of emotions validate the effectiveness of our proposed method. The code is available at https://github.com/yansen0508/Mental-Deep-Reverse-Engineering.
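As a rough illustration of the reverse-correlation idea this abstract builds on, the sketch below estimates an observer's 'mental representation' of an expression as a classification image over random action-unit perturbations. The trial count, the action-unit parameterization, and the simulated observer are assumptions made only to keep the sketch runnable; they are not the paper's pipeline (which pairs reverse correlation with a GAN).

```python
# Minimal NumPy sketch of psychophysical reverse correlation in an action-unit space.
# The trial count, AU parameterization, and simulated observer are assumptions
# made only so the sketch runs end to end; they are not the paper's pipeline.
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_aus = 500, 17                      # hypothetical trials and action units

# Each trial perturbs a neutral face with random action-unit activations.
noise = rng.normal(0.0, 1.0, size=(n_trials, n_aus))


def observer_response(au_vector):
    """Stand-in for a human observer: 1 if the rendered face matches the target
    expression in the observer's mind, 0 otherwise (simulated with a hidden template)."""
    hidden_template = np.zeros(n_aus)
    hidden_template[[4, 6, 12]] = 1.0          # hypothetical AUs driving the percept
    return int(au_vector @ hidden_template + rng.normal(0.0, 1.0) > 0.0)


responses = np.array([observer_response(trial) for trial in noise])

# Classification image: the 'mental representation' is estimated as the difference
# between the noise averaged over 'yes' trials and over 'no' trials.
prototype = noise[responses == 1].mean(axis=0) - noise[responses == 0].mean(axis=0)

# Dominant action units are those carrying the largest weights in the prototype.
print("Estimated dominant AUs:", np.argsort(-np.abs(prototype))[:3])
```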


Assuntos
Emoções , Expressão Facial , Bases de Dados Factuais , Estudos Interdisciplinares , Fotografação
3.
Cancers (Basel); 15(3), 2023 Feb 02.
Article in English | MEDLINE | ID: mdl-36765906

ABSTRACT

BACKGROUND: Awake craniotomy (AC) with brain mapping for language and motor functions is often performed for tumors within or adjacent to eloquent brain regions. However, other important functions, such as vision and visuospatial and social cognition, are less frequently mapped, at least partly due to the difficulty of defining tasks suitable for the constrained AC environment. OBJECTIVE: The aim of this retrospective study was to demonstrate, through illustrative cases, how a virtual reality headset (VRH) equipped with eye tracking can open up new possibilities for the mapping of language, the visual field and complex cognitive functions in the operating room. METHODS: Virtual reality (VR) tasks performed during 69 ACs were evaluated retrospectively. Three types of VR tasks were used: VR-DO80 for language evaluation, VR-Esterman for visual field assessment and VR-TANGO for the evaluation of visuospatial and social functions. RESULTS: Surgery was performed on the right hemisphere for 29 of the 69 ACs performed (42.0%). One AC (1.5%) was performed with all three VR tasks, 14 ACs (20.3%) were performed with two VR tasks and 54 ACs (78.3%) were performed with one VR task. The median duration of VRH use per patient was 15.5 min. None of the patients had "VR sickness". Only transitory focal seizures of no consequence and unrelated to VRH use were observed during AC. Patients were able to perform all VR tasks. Eye tracking was functional, enabling the medical team to analyze the patients' attention and exploration of the visual field of the VRH directly. CONCLUSIONS: This preliminary experiment shows that VR approaches can provide neurosurgeons with a way of investigating various functions, including social cognition during AC. Given the rapid advances in VR technology and the unbelievable sense of immersion provided by the most recent devices, there is a need for ongoing reflection and discussions of the ethical and methodological considerations associated with the use of these advanced technologies in AC and brain mapping procedures.

4.
J Med Internet Res; 23(3): e24373, 2021 Mar 24.
Article in English | MEDLINE | ID: mdl-33759794

ABSTRACT

BACKGROUND: Language mapping during awake brain surgery is currently a standard procedure. However, mapping is rarely performed for other cognitive functions that are important for social interaction, such as visuospatial cognition and nonverbal language, including facial expressions and eye gaze. The main reason for this omission is the lack of tasks that are fully compatible with the restrictive environment of an operating room and awake brain surgery procedures. OBJECTIVE: This study aims to evaluate the feasibility and safety of a virtual reality headset (VRH) equipped with an eye-tracking device that is able to promote an immersive visuospatial and social virtual reality (VR) experience for patients undergoing awake craniotomy. METHODS: We recruited 15 patients with brain tumors near language and/or motor areas. Language mapping was performed with a naming task, DO 80, presented on a computer tablet and then in 2D and 3D via the VRH. Patients were also immersed in a visuospatial and social VR experience. RESULTS: None of the patients experienced VR sickness, whereas 2 patients had an intraoperative focal seizure without consequence; there was no reason to attribute these seizures to VRH use. The patients were able to perform the VR tasks. Eye tracking was functional, enabling the medical team to analyze the patients' attention and exploration of the visual field of the VRH directly. CONCLUSIONS: We found that it is possible and safe to immerse the patient in an interactive virtual environment during awake brain surgery, paving the way for new VR-based brain mapping procedures. TRIAL REGISTRATION: ClinicalTrials.gov NCT03010943; https://clinicaltrials.gov/ct2/show/NCT03010943.


Assuntos
Mapeamento Encefálico , Neoplasias Encefálicas , Realidade Virtual , Neoplasias Encefálicas/cirurgia , Feminino , Humanos , Masculino , Estudos Prospectivos , Vigília
5.
J Acoust Soc Am; 147(6): 4087, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32611174

ABSTRACT

Head-related transfer function individualization is a key issue in binaural synthesis. However, currently available databases are limited in size compared to the high dimensionality of the data. In this paper, the process of generating WiDESPREaD (wide dataset of ear shapes and pinna-related transfer functions obtained by random ear drawings), a synthetic dataset of 1000 ear shapes and matching sets of pinna-related transfer functions (PRTFs), is presented, and the dataset is made freely available to other researchers. The contributions of this article are threefold. First, from a proprietary dataset of 119 three-dimensional left-ear scans, a matching dataset of PRTFs was built by performing fast-multipole boundary element method (FM-BEM) calculations. Second, the underlying geometry of each type of high-dimensional data was investigated using principal component analysis. It was found that this linear machine-learning technique performs better at modeling and reducing data dimensionality on ear shapes than on the matching PRTF sets. Third, based on these findings, a method was devised to generate an arbitrarily large synthetic database of PRTF sets that relies on the random drawing of ear shapes and subsequent FM-BEM computations.
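The ear-shape modeling and random-drawing steps described above can be illustrated with a short sketch: fit a PCA shape model to flattened ear meshes, then sample new PCA coefficients to synthesize arbitrarily many ear geometries. The data shapes, component count, and sampling scheme are assumptions for illustration only; the FM-BEM step that turns each synthetic ear into a PRTF set is indicated only in a comment.

```python
# Minimal sketch of the ear-shape PCA model and the random-drawing step.
# Data shapes, component count, and names are illustrative assumptions; the
# FM-BEM simulation that maps each synthetic ear to a PRTF set is not shown.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Placeholder for the 119 registered left-ear meshes, each flattened to a
# vector of vertex coordinates (here a made-up dimensionality).
n_ears, n_coords = 119, 3000
ear_shapes = rng.normal(size=(n_ears, n_coords))

# Linear shape model: mean ear plus principal components of variation.
pca = PCA(n_components=30)
scores = pca.fit_transform(ear_shapes)


def random_ears(n_samples=1000):
    """Draw new ears by sampling PCA coefficients with the per-component spread
    observed in the training set, then mapping back to vertex space."""
    std = scores.std(axis=0)
    coeffs = rng.normal(0.0, std, size=(n_samples, std.size))
    return pca.inverse_transform(coeffs)


synthetic_ears = random_ears()
# Each row of `synthetic_ears` would then be re-meshed and passed to an FM-BEM
# solver to compute the matching pinna-related transfer functions.
```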


Assuntos
Pavilhão Auricular , Orelha Externa , Cabeça , Aprendizado de Máquina , Análise de Componente Principal
6.
IEEE Comput Graph Appl; 30(4): 51-61, 2010.
Article in English | MEDLINE | ID: mdl-20650728

ABSTRACT

Modern modeling and rendering techniques have produced nearly photorealistic face models, but truly expressive digital faces also require natural-looking movements. Virtual characters in today's applications often display unrealistic facial expressions. Indeed, facial animation with traditional schemes such as keyframing and motion capture demands expertise. Moreover, these traditional schemes are not well suited to interactive applications that require the real-time generation of context-dependent movements. A new animation system produces realistic, expressive facial motion at interactive speed. The system relies on a set of motion models controlling facial-expression dynamics. The models are fitted to captured motion data and therefore retain the dynamic signature of human facial expressions. They also contain a nondeterministic component that ensures variety in the long-term visual behavior. This system can efficiently animate any synthetic face. An accompanying video illustrates interactive use of the system to generate facial-animation sequences.
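As a loose illustration of a noise-driven motion model of the kind described above, the sketch below generates one facial-expression channel with second-order dynamics plus a nondeterministic term. The coefficients and noise level are assumed values, not parameters fitted to captured motion data as in the paper.

```python
# Minimal sketch of a noise-driven motion model for a single facial-expression
# channel (e.g., an eyebrow-raise blend-shape weight). Coefficients and noise
# level are assumed values, not parameters fitted to captured motion data.
import numpy as np

rng = np.random.default_rng(0)


def generate_motion(n_frames=300, target=0.6, a1=1.8, a2=-0.82, noise_std=0.01):
    """Second-order autoregressive dynamics pulled toward `target`, with a
    nondeterministic term so long sequences never repeat exactly."""
    x = np.zeros(n_frames)
    drift = (1.0 - a1 - a2) * target            # keeps the process centered on target
    for t in range(2, n_frames):
        x[t] = a1 * x[t - 1] + a2 * x[t - 2] + drift + rng.normal(0.0, noise_std)
    return np.clip(x, 0.0, 1.0)                 # blend-shape weights stay in [0, 1]


# In an interactive loop, one such model per expression channel can be stepped
# every frame and its output fed to the face rig.
curve = generate_motion()
```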


Assuntos
Face/anatomia & histologia , Expressão Facial , Modelos Biológicos , Interface Usuário-Computador , Humanos , Processamento de Imagem Assistida por Computador , Dinâmica não Linear , Análise de Componente Principal