Results 1 - 20 of 881
1.
Curr Biol ; 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-39019036

ABSTRACT

Effective detection and avoidance of environmental threats are crucial for animals' survival. Integrating sensory cues associated with threats across different modalities can significantly enhance an animal's detection and behavioral responses. However, the neural circuit-level mechanisms underlying the modulation of defensive behavior or fear responses under simultaneous multimodal sensory inputs remain poorly understood. Here, we report in mice that bimodal looming stimuli combining coherent visual and auditory signals elicit more robust defensive/fear reactions than unimodal stimuli. These include intensified escape and prolonged hiding, suggesting a heightened defensive/fear state. These various responses depend on the activity of the superior colliculus (SC), while its downstream nucleus, the parabigeminal nucleus (PBG), predominantly influences the duration of hiding behavior. The PBG temporally integrates visual and auditory signals and enhances the salience of threat signals by amplifying SC sensory responses through its feedback projection to the visual layer of the SC. Our results suggest an evolutionarily conserved pathway in defense circuits for multisensory integration and cross-modal enhancement.

2.
Front Neurosci ; 18: 1395627, 2024.
Article in English | MEDLINE | ID: mdl-39010944

ABSTRACT

Objective: This study aimed to determine whether patients with disorders of consciousness (DoC) could experience neural entrainment to individualized music, and explored the cross-modal influences of music on patients with DoC through phase-amplitude coupling (PAC). Furthermore, the study assessed the efficacy of individualized or preferred music (PM) versus relaxing music (RM) in impacting patient outcomes, and examined the role of cross-modal influences in determining these outcomes. Methods: Thirty-two patients with DoC [17 with vegetative state/unresponsive wakefulness syndrome (VS/UWS) and 15 with minimally conscious state (MCS)], alongside 16 healthy controls (HCs), were recruited for this study. Neural activity in the frontal-parietal network was recorded using scalp electroencephalography (EEG) during baseline (BL), RM, and PM conditions. Cerebral-acoustic coherence (CACoh) was used to investigate participants' ability to track music, while PAC was used to evaluate the cross-modal influences of music. Three months post-intervention, the outcomes of patients with DoC were followed up using the Coma Recovery Scale-Revised (CRS-R). Results: HCs and patients with MCS showed higher CACoh within the musical pulse frequency compared to VS/UWS patients (p = 0.016, p = 0.045 for RM; p < 0.001, p = 0.048 for PM, following Bonferroni correction). Only theta-gamma PAC demonstrated a significant interaction effect between groups and music conditions (F(2,44) = 2.685, p = 0.036). For HCs, theta-gamma PAC in the frontal-parietal network was stronger in the PM condition than in the RM (p = 0.016) and BL (p < 0.001) conditions. For patients with MCS, theta-gamma PAC was stronger in PM than in BL (p = 0.040), while no difference was observed among the three conditions in patients with VS/UWS. Additionally, MCS patients who showed improved outcomes after 3 months exhibited evident neural responses to preferred music (p = 0.019). Furthermore, the ratio of theta-gamma coupling changes in PM relative to BL predicted clinical outcomes in MCS patients (r = 0.992, p < 0.001). Conclusion: Individualized music may serve as a potential therapeutic method for patients with DoC through cross-modal influences, which rely on enhanced theta-gamma PAC within the consciousness-related network.
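Theta-gamma PAC of the kind reported here is commonly quantified with the Tort modulation index (MI). The Python sketch below illustrates that computation on a synthetic signal; the band limits, filter settings, and single-channel handling are illustrative assumptions, not the authors' pipeline.

```python
# Hedged sketch: theta-gamma phase-amplitude coupling via the Tort
# modulation index. Generic illustration only; bands and filtering
# are assumptions, not the study's exact analysis.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, lo, hi, fs, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def tort_mi(eeg, fs, phase_band=(4, 8), amp_band=(30, 45), n_bins=18):
    """Modulation index of amp_band amplitude on phase_band phase."""
    phase = np.angle(hilbert(bandpass(eeg, *phase_band, fs)))
    amp = np.abs(hilbert(bandpass(eeg, *amp_band, fs)))
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    # Mean gamma amplitude within each theta-phase bin
    mean_amp = np.array([amp[(phase >= lo) & (phase < hi)].mean()
                         for lo, hi in zip(edges[:-1], edges[1:])])
    p = mean_amp / mean_amp.sum()          # normalize to a distribution
    # KL divergence from the uniform distribution, scaled to [0, 1]
    return (np.log(n_bins) + (p * np.log(p)).sum()) / np.log(n_bins)

fs = 500.0
t = np.arange(0, 10, 1 / fs)
theta = np.sin(2 * np.pi * 6 * t)
gamma = (1 + theta) * np.sin(2 * np.pi * 40 * t)  # gamma locked to theta phase
print(tort_mi(theta + 0.5 * gamma + 0.1 * np.random.randn(t.size), fs))
```

A coupled signal like the synthetic one above yields a clearly positive MI, while phase-shuffled data drives it toward zero, which is the contrast such analyses rely on.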

3.
Sensors (Basel) ; 24(13)2024 Jun 24.
Article in English | MEDLINE | ID: mdl-39000877

ABSTRACT

In complex environments, a single visible-light image is insufficient for perceiving the environment. This paper proposes a novel dual-stream real-time detector for target detection in extreme environments such as nighttime and fog, which efficiently utilizes both visible and infrared images to achieve fast all-weather environment sensing (FAWDet). First, to allow the network to process information from different modalities simultaneously, this paper extends the state-of-the-art end-to-end detector YOLOv8 by expanding the backbone in parallel into a dual stream. Then, to avoid information loss as the network deepens, a cross-modal feature enhancement module is designed, which enhances each modality's features through cross-modal attention mechanisms, thereby effectively preserving information and improving the detection of small targets. In addition, to address the significant differences between modal features, this paper proposes a three-stage fusion strategy that optimizes feature integration by fusing the spatial, channel, and overall dimensions. Notably, the cross-modal feature fusion module is trained end to end. Extensive experiments on two datasets validate that the proposed method achieves state-of-the-art performance in detecting small targets. The cross-modal real-time detector not only demonstrates excellent stability and robust detection performance but also provides a new solution for target detection in extreme environments.
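A rough illustration of the cross-modal feature enhancement idea, each modality's feature map attending to the other, is sketched below in PyTorch. The module name, dimensions, and attention form are assumptions for illustration, not the FAWDet implementation.

```python
# Hedged sketch of dual-stream cross-modal attention enhancement:
# visible and infrared feature maps attend to each other and add the
# attended result back. Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class CrossModalEnhance(nn.Module):
    """Each modality queries the other; residual addition keeps the
    original stream intact while injecting cross-modal context."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn_rgb = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.attn_ir = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, rgb, ir):                 # (B, C, H, W) feature maps
        b, c, h, w = rgb.shape
        r = rgb.flatten(2).transpose(1, 2)      # (B, HW, C) token sequences
        i = ir.flatten(2).transpose(1, 2)
        r2, _ = self.attn_rgb(r, i, i)          # visible queries infrared
        i2, _ = self.attn_ir(i, r, r)           # infrared queries visible
        rgb = rgb + r2.transpose(1, 2).reshape(b, c, h, w)
        ir = ir + i2.transpose(1, 2).reshape(b, c, h, w)
        return rgb, ir

rgb = torch.randn(2, 64, 20, 20)
ir = torch.randn(2, 64, 20, 20)
rgb_out, ir_out = CrossModalEnhance(64)(rgb, ir)
print(rgb_out.shape, ir_out.shape)   # torch.Size([2, 64, 20, 20]) each
```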

4.
Neuroimage ; : 120720, 2024 Jul 04.
Article in English | MEDLINE | ID: mdl-38971484

ABSTRACT

This meta-analysis summarizes evidence from 44 neuroimaging experiments and characterizes the general linguistic network in early deaf individuals. Meta-analytic comparisons with hearing individuals found that a specific set of regions (in particular the left inferior frontal gyrus and posterior middle temporal gyrus) participates in supramodal language processing. Beyond previously described modality-specific differences, the present study showed that the left calcarine gyrus and the right caudate were additionally recruited in deaf compared with hearing individuals. It further showed that the bilateral posterior superior temporal gyrus is shaped by cross-modal plasticity, whereas the left frontotemporal areas are shaped by early language experience. Although an overall left-lateralized pattern for language processing was observed in the early deaf individuals, regional lateralization was altered in the inferior temporal gyrus and anterior temporal lobe. These findings indicate that the core language network functions in a modality-independent manner, and they provide a foundation for determining the contributions of sensory and linguistic experiences in shaping the neural bases of language processing.

5.
Adv Sci (Weinh) ; : e2404845, 2024 Jun 21.
Article in English | MEDLINE | ID: mdl-39031820

ABSTRACT

Constructing discriminative representations of molecules lies at the core of several domains, such as drug discovery, chemistry, and medicine. State-of-the-art methods employ graph neural networks and self-supervised learning (SSL) to learn structural representations from unlabeled data, which can then be fine-tuned for downstream tasks. Albeit powerful, these methods are pre-trained solely on molecular structures and thus often struggle with tasks involving intricate biological processes. Here, it is proposed to assist the learning of molecular representations with perturbed high-content cell microscopy images at the phenotypic level. To incorporate cross-modal pre-training, a unified framework is constructed to align the two modalities through multiple types of contrastive loss functions, which proves effective in newly formulated tasks of mutually retrieving molecules and their corresponding images. More importantly, the model can infer functional molecules from cellular images generated by genetic perturbations. In parallel, the proposed model transfers non-trivially to molecular property prediction and shows substantial improvement on clinical outcome predictions. These results suggest that such cross-modal learning can bridge molecules and phenotypes and play an important role in drug discovery.
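One plausible instance of the contrastive alignment described above is a symmetric InfoNCE objective between molecule and image embeddings. The sketch below is a generic formulation under that assumption; the encoders, batch construction, and temperature are not taken from the paper.

```python
# Hedged sketch: symmetric InfoNCE-style loss aligning molecule and
# cell-image embeddings. Paired rows are positives; other rows in the
# batch serve as in-batch negatives.
import torch
import torch.nn.functional as F

def cross_modal_infonce(mol_emb, img_emb, temperature=0.07):
    mol = F.normalize(mol_emb, dim=-1)
    img = F.normalize(img_emb, dim=-1)
    logits = mol @ img.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(mol.size(0))         # diagonal entries are positives
    # Symmetric loss: molecule->image and image->molecule retrieval
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

mol_emb = torch.randn(32, 256)   # e.g., from a graph neural network (assumed)
img_emb = torch.randn(32, 256)   # e.g., from an image encoder (assumed)
print(cross_modal_infonce(mol_emb, img_emb).item())
```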

6.
Quant Imaging Med Surg ; 14(7): 4579-4604, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-39022265

ABSTRACT

Background: The information in multimodal magnetic resonance imaging (MRI) is complementary, and combining multiple modalities for brain tumor image segmentation can improve segmentation accuracy, which has great significance for disease diagnosis and treatment. However, different degrees of missing modality data often occur in clinical practice, which may lead to serious performance degradation or even failure of brain tumor segmentation methods that rely on full-modality sequences. To solve these problems, this study aimed to design a new deep learning network for incomplete multimodal brain tumor segmentation. Methods: We propose a novel cross-modal attention fusion-based deep neural network (CMAF-Net) for incomplete multimodal brain tumor segmentation, based on a three-dimensional (3D) U-Net architecture with an encoding-decoding structure, a 3D Swin block, and a cross-modal attention fusion (CMAF) block. A convolutional encoder is initially used to extract modality-specific features, and an effective 3D Swin block is constructed to model long-range dependencies and obtain richer information for brain tumor segmentation. Then, a cross-attention-based CMAF module is proposed that can deal with different missing-modality situations by fusing features across modalities to learn shared representations of the tumor regions. Finally, the fused latent representation is decoded to obtain the final segmentation result. Additionally, a channel attention module (CAM) and a spatial attention module (SAM) are incorporated into the network to further improve robustness: the CAM helps the network focus on important feature channels, and the SAM learns the importance of different spatial regions. Results: Evaluation experiments on the widely used BraTS 2018 and BraTS 2020 datasets demonstrated the effectiveness of the proposed CMAF-Net, which achieved average Dice scores of 87.9%, 81.8%, and 64.3%, and Hausdorff distances of 4.21, 5.35, and 4.02 for whole tumor, tumor core, and enhancing tumor on the BraTS 2020 dataset, respectively, outperforming several state-of-the-art segmentation methods in missing-modality situations. Conclusions: The experimental results show that the proposed CMAF-Net can achieve accurate brain tumor segmentation under missing modalities, with promising application potential.
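Channel and spatial attention modules of the kind mentioned above are commonly implemented in the CBAM style; the following PyTorch sketch shows one such 3D formulation under that assumption (the paper's exact design may differ).

```python
# Hedged sketch of CBAM-style channel (CAM) and spatial (SAM)
# attention for 3D feature maps; an illustrative assumption, not
# CMAF-Net's exact modules.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))

    def forward(self, x):                          # x: (B, C, D, H, W)
        b, c = x.shape[:2]
        avg = self.mlp(x.mean(dim=(2, 3, 4)))      # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3, 4)))       # global max pooling
        return x * torch.sigmoid(avg + mx).view(b, c, 1, 1, 1)

class SpatialAttention(nn.Module):
    def __init__(self, kernel=7):
        super().__init__()
        self.conv = nn.Conv3d(2, 1, kernel, padding=kernel // 2)

    def forward(self, x):
        stats = torch.cat([x.mean(1, keepdim=True),
                           x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(stats))  # per-voxel weights

feat = torch.randn(1, 32, 8, 16, 16)                # toy 3D feature map
out = SpatialAttention()(ChannelAttention(32)(feat))
print(out.shape)  # torch.Size([1, 32, 8, 16, 16])
```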

7.
Front Neural Circuits ; 18: 1430783, 2024.
Article in English | MEDLINE | ID: mdl-39040685

ABSTRACT

Early life experiences shape physical and behavioral outcomes throughout life. Sensory circuits are especially susceptible to environmental and physiological changes during development. However, the impacts of different types of early life experience are often evaluated in isolation. In this mini review, we discuss the specific effects of postnatal sensory experience, sleep, social isolation, and substance exposure on barrel cortex development. Considering these concurrent factors will improve understanding of the etiology of atypical sensory perception in many neuropsychiatric and neurodevelopmental disorders.


Subjects
Somatosensory Cortex, Somatosensory Cortex/physiology, Somatosensory Cortex/growth & development, Animals, Humans, Social Isolation/psychology, Sleep/physiology
8.
Cereb Cortex ; 34(6)2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38879756

ABSTRACT

Midbrain multisensory neurons undergo a significant postnatal transition in how they process cross-modal (e.g., visual-auditory) signals. In early stages, signals derived from common events are processed competitively; at later stages, however, they are processed cooperatively such that their salience is enhanced. This transition reflects adaptation to cross-modal configurations that are consistently experienced and become informative about which correspond to common events. Tested here was the assumption that overt behaviors follow a similar maturation. Cats were reared in omnidirectional sound, thereby compromising the experience needed for this developmental process. Animals were then repeatedly exposed to different configurations of visual and auditory stimuli (e.g., spatiotemporally congruent or spatially disparate) that varied on each side of space, and their behavior was assessed using a detection/localization task. Animals showed enhanced performance to stimuli consistent with the experience provided: congruent stimuli elicited enhanced behaviors where spatially congruent cross-modal experience was provided, and spatially disparate stimuli elicited enhanced behaviors where spatially disparate cross-modal experience was provided. Cross-modal configurations not consistent with experience did not enhance responses. The presumptive benefit of such flexibility in the multisensory developmental process is to sensitize neural circuits (and the behaviors they control) to the features of the environment in which they will function. These experiments reveal that these processes have a high degree of flexibility, such that two (conflicting) multisensory principles can be implemented by cross-modal experience on opposite sides of space, even within the same animal.


Subjects
Acoustic Stimulation, Auditory Perception, Brain, Photic Stimulation, Visual Perception, Animals, Cats, Auditory Perception/physiology, Visual Perception/physiology, Photic Stimulation/methods, Brain/physiology, Brain/growth & development, Male, Female, Behavior, Animal/physiology
9.
Brain Sci ; 14(6)2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38928618

ABSTRACT

Intracerebral hemorrhage (ICH) is a critical condition characterized by high prevalence, substantial mortality, and unpredictable clinical outcomes, posing a serious threat to human health. Improving the timeliness and accuracy of prognosis assessment is crucial to minimizing the mortality and long-term disability associated with ICH. Due to the complexity of ICH, its diagnosis in clinical practice heavily relies on the professional expertise and clinical experience of physicians, and traditional prognostic methods largely depend on the specialized knowledge and subjective judgment of healthcare professionals. Meanwhile, existing artificial intelligence (AI) methodologies, which predominantly utilize features derived from computed tomography (CT) scans, fall short of capturing the multifaceted nature of ICH. Although existing methods are capable of integrating clinical information and CT images for prognosis, the effectiveness of this fusion process still requires improvement. To surmount these limitations, the present study introduces a novel AI framework, termed the ICH Network (ICH-Net), which employs a joint-attention cross-modal network to synergize clinical textual data with CT imaging features. The architecture of ICH-Net consists of three integral components: the Feature Extraction Module, which processes and abstracts salient characteristics from the clinical and imaging data; the Feature Fusion Module, which amalgamates the diverse data streams; and the Classification Module, which interprets the fused features to deliver prognostic predictions. Our evaluation, conducted through a rigorous five-fold cross-validation process, demonstrates that ICH-Net achieves a commendable accuracy of up to 87.77%, outperforming the other state-of-the-art methods examined in our research. This evidence underscores the potential of ICH-Net as a formidable tool for prognosticating ICH, promising a significant advancement in clinical decision-making and patient care.

10.
Math Biosci Eng ; 21(4): 4989-5006, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38872523

ABSTRACT

Due to irregular sampling or device failure, data collected from sensor networks often contain missing values; that is, missing time-series data occur. To address this issue, many methods have been proposed to impute randomly or non-randomly missing data. However, the imputation accuracy of these methods is not high enough for practical application, especially in the case of complete data missing (CDM). Thus, we propose a cross-modal method that imputes missing time-series data using dense spatio-temporal transformer nets (DSTTN). The model embeds spatial modal data into the time series through stacked spatio-temporal transformer blocks and dense connections. It adopts a cross-modal constraint, a graph Laplacian regularization term, to optimize the model parameters. Once trained, the model recovers missing data through an end-to-end imputation pipeline. Extensive experiments compare DSTTN with various baseline models and verify that it achieves state-of-the-art imputation performance under both random and non-random missingness. In particular, the proposed method provides a new solution to the CDM problem.
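The graph Laplacian regularization term used as a cross-modal constraint can be illustrated as follows: imputed series at sensors that are adjacent in the spatial graph are penalized for disagreeing. The adjacency matrix, weighting, and loss coefficient below are assumptions for illustration.

```python
# Hedged sketch of a graph Laplacian penalty on imputed sensor series.
import torch

def laplacian_penalty(x_hat, adj):
    """x_hat: (N, T) imputed series per sensor; adj: (N, N) edge weights.
    trace(X^T L X) = 0.5 * sum_ij adj_ij * ||x_i - x_j||^2."""
    deg = torch.diag(adj.sum(dim=1))
    lap = deg - adj                       # combinatorial graph Laplacian
    return torch.trace(x_hat.t() @ lap @ x_hat)

adj = torch.tensor([[0., 1., 0.],        # 3 sensors on a small line graph
                    [1., 0., 1.],
                    [0., 1., 0.]])
x_hat = torch.randn(3, 100, requires_grad=True)
recon_loss = torch.tensor(0.0)           # stand-in for the imputation loss
loss = recon_loss + 1e-3 * laplacian_penalty(x_hat, adj)
loss.backward()                          # gradients pull neighbors together
```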

11.
Neural Netw ; 178: 106400, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38850633

ABSTRACT

In large-scale power systems, accurately detecting and diagnosing the type of fault that occurs in the grid is a challenging problem. The classification performance of most existing grid fault diagnosis methods depends on the richness and reliability of the data; in addition, it is difficult to obtain sufficient feature information from unimodal circuit signals. To address these issues, we propose a deep residual convolutional neural network (DRCNN)-based framework for grid fault diagnosis. First, we design a comprehensive information entropy value (CIEV) evaluation metric that combines fuzzy entropy (FuzEn) and mutual approximation entropy (MutEn) to integrate multiple decomposition subsequences. Then, DRCNN and heterogeneous graph transformer (HGT) modules are constructed to extract multimodal features while accounting for modal variability. In addition, to obtain the implicit information of multimodal features and control the degree of their contribution, we incorporate a cross-modal attention fusion (CMAF) mechanism into the framework. We validate the proposed method on the three-phase transmission line dataset and the VSB power line dataset, with accuracies of 99.4% and 99.0%, respectively. The proposed method also achieves superior performance compared to classical and state-of-the-art methods.

12.
Foods ; 13(11)2024 May 23.
Article in English | MEDLINE | ID: mdl-38890857

ABSTRACT

As a prominent topic in food computing, cross-modal recipe retrieval has garnered substantial attention. However, semantic alignment between food images and recipes cannot be further enhanced because existing solutions lack intra-modal alignment. Additionally, a critical issue, food image ambiguity, is overlooked, which disrupts model convergence. To these ends, we propose a novel Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval (MMACMR). To consider inter-modal and intra-modal alignment together, the method measures ambiguous food image similarity under the guidance of the corresponding recipes. Additionally, we enhance recipe semantic representation learning with a cross-attention module between ingredients and instructions, which effectively supports food image similarity measurement. We conduct experiments on the challenging public dataset Recipe1M; our method outperforms several state-of-the-art methods on commonly used evaluation criteria.
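A minimal sketch of a cross-attention module between ingredient and instruction tokens, of the kind described above, follows; the dimensions, pooling, and normalization are illustrative assumptions rather than the MMACMR specification.

```python
# Hedged sketch: ingredient tokens attend over instruction tokens so
# each ingredient embedding absorbs how it is used in the recipe.
import torch
import torch.nn as nn

class IngredientInstructionAttention(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, ingredients, instructions):
        # Ingredients are the queries; instructions supply keys/values.
        attended, _ = self.cross(ingredients, instructions, instructions)
        tokens = self.norm(ingredients + attended)   # residual + norm
        return tokens.mean(dim=1)                    # pooled recipe-text embedding

ing = torch.randn(4, 12, 512)   # batch of 12 ingredient tokens
ins = torch.randn(4, 60, 512)   # batch of 60 instruction tokens
print(IngredientInstructionAttention()(ing, ins).shape)  # (4, 512)
```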

13.
Elife ; 12: 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38842277

ABSTRACT

Flexible responses to sensory stimuli based on changing rules are critical for adapting to a dynamic environment. However, it remains unclear how the brain encodes and uses rule information to guide behavior. Here, we made single-unit recordings while head-fixed mice performed a cross-modal sensory selection task where they switched between two rules: licking in response to tactile stimuli while rejecting visual stimuli, or vice versa. Along a cortical sensorimotor processing stream including the primary (S1) and secondary (S2) somatosensory areas, and the medial (MM) and anterolateral (ALM) motor areas, single-neuron activity distinguished between the two rules both prior to and in response to the tactile stimulus. We hypothesized that neural populations in these areas would show rule-dependent preparatory states, which would shape the subsequent sensory processing and behavior. This hypothesis was supported for the motor cortical areas (MM and ALM) by findings that (1) the current task rule could be decoded from pre-stimulus population activity; (2) neural subspaces containing the population activity differed between the two rules; and (3) optogenetic disruption of pre-stimulus states impaired task performance. Our findings indicate that flexible action selection in response to sensory input can occur via configuration of preparatory states in the motor cortex.
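Finding (1), that the current task rule can be decoded from pre-stimulus population activity, is typically tested with a cross-validated linear classifier. The sketch below shows that style of analysis on simulated spike counts; it is a generic illustration, not the paper's decoder or data.

```python
# Hedged sketch: decoding task rule from pre-stimulus population
# activity with cross-validated logistic regression on simulated data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_trials, n_neurons = 200, 80
rules = rng.integers(0, 2, n_trials)   # 0 = respond-to-touch, 1 = respond-to-vision
# Simulated pre-stimulus spike counts with a weak rule-dependent shift
rates = rng.poisson(5.0, (n_trials, n_neurons)).astype(float)
rates[rules == 1, : n_neurons // 4] += 1.0

decoder = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
acc = cross_val_score(decoder, rates, rules, cv=5)
print(f"rule decoding accuracy: {acc.mean():.2f} +/- {acc.std():.2f}")
```

Above-chance accuracy on held-out trials is the evidence that rule information is present in the pre-stimulus state.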


Subjects
Motor Cortex, Animals, Mice, Motor Cortex/physiology, Male, Somatosensory Cortex/physiology, Neurons/physiology, Female, Optogenetics, Behavior, Animal/physiology
14.
Prog Neurobiol ; 240: 102637, 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38879074

ABSTRACT

While it is well established that sensory cortical regions traditionally thought to be unimodal can be activated by stimuli from modalities other than the dominant one, functions of such foreign-modal activations are still not clear. Here we show that visual activations in early auditory cortex can be related to whether or not the monkeys engaged in audio-visual tasks, to the time when the monkeys reacted to the visual component of such tasks, and to the correctness of the monkeys' response to the auditory component of such tasks. These relationships between visual activations and behavior suggest that auditory cortex can be recruited for visually-guided behavior and that visual activations can prime auditory cortex such that it is prepared for processing future sounds. Our study thus provides evidence that foreign-modal activations in sensory cortex can contribute to a subject's ability to perform tasks on stimuli from foreign and dominant modalities.

15.
Behav Brain Res ; 470: 115072, 2024 Jul 26.
Article in English | MEDLINE | ID: mdl-38815697

ABSTRACT

Previous studies have shown that individuals not only successfully engage in cross-domain analogies but also accomplish cross-modal reasoning. Yet, the behavioral representation and neurophysiological basis of cross-modal and cross-domain analogical reasoning remain unclear. This study established three analogical reasoning conditions by combining a multi-to-multi learning-test paradigm with a four-term analogy paradigm: within-domain, cross-domain, and cross-modal conditions. Thirty participants were required to judge whether the relationship between C and D was the same as the learned relationship between A and B. Behavioral results revealed no significant differences in reaction times and accuracy between cross-domain and cross-modal conditions, but both conditions showed significantly lower accuracy than the within-domain condition. ERP results indicated a larger P2 amplitude in the cross-modal condition, while a larger N400 amplitude was observed in the cross-domain condition. These findings suggest: (1) the P2 in cross-modal analogical reasoning is associated with more difficult access to cross-modal information; (2) the N400 in cross-domain analogical reasoning is related to more challenging semantic processing. This study provides the first evidence of behavioral and ERP differences between cross-modal and cross-domain analogical reasoning, deepening our understanding of the cognitive processes involved in cross-modal analogical reasoning.


Subjects
Electroencephalography, Evoked Potentials, Reaction Time, Humans, Male, Female, Evoked Potentials/physiology, Young Adult, Reaction Time/physiology, Adult, Brain/physiology, Problem Solving/physiology
16.
Sensors (Basel) ; 24(10)2024 May 14.
Article in English | MEDLINE | ID: mdl-38793984

ABSTRACT

Fine-grained representation is fundamental to deep-learning-based species classification, and cross-modal contrastive learning is an effective method in this context. The diversity of species, coupled with the inherent contextual ambiguity of natural language, poses a primary challenge for cross-modal representation alignment of conservation-area image data. Integrating cross-modal retrieval tasks with generation tasks contributes to representation alignment based on contextual understanding. However, during contrastive learning, apart from learning the differences in the data itself, a pair of encoders inevitably learns differences caused by encoder fluctuations. The latter leads to convergence shortcuts, resulting in poor representation quality and an inaccurate reflection, within the shared feature space, of the similarity relationships between samples in the original dataset. To achieve fine-grained cross-modal representation alignment, we first propose a residual attention network to enhance consistency during momentum updates of the cross-modal encoders. Building on this, we propose momentum encoding from a multi-task perspective as a bridge for cross-modal information, effectively improving cross-modal mutual information and representation quality and optimizing the distribution of feature points within the cross-modal shared semantic space. By acquiring momentum-encoding queues for cross-modal semantic understanding through multi-tasking, we align ambiguous natural-language representations around the invariant image features of factual information, alleviating contextual ambiguity and enhancing model robustness. Experimental validation shows that our proposed multi-task cross-modal momentum encoder outperforms similar models on standardized image classification and image-text retrieval tasks on public datasets by up to 8% on the leaderboard, demonstrating the effectiveness of the proposed method. Qualitative experiments on our self-built conservation-area image-text paired dataset show that the proposed method accurately performs cross-modal retrieval and generation tasks across 8,142 species, proving its effectiveness on fine-grained conservation-area image-text datasets.
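At the core of the approach above is a momentum-updated encoder pair; a standard (MoCo-style) exponential-moving-average update is sketched below under that assumption. The momentum coefficient and the toy encoder are illustrative, not the paper's architecture.

```python
# Hedged sketch: a key encoder tracks the query encoder via an
# exponential moving average, smoothing out per-step fluctuations.
import copy
import torch
import torch.nn as nn

query_encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
key_encoder = copy.deepcopy(query_encoder)     # starts as an exact copy
for p in key_encoder.parameters():
    p.requires_grad = False                    # updated only by momentum

@torch.no_grad()
def momentum_update(q_enc, k_enc, m=0.999):
    """k <- m * k + (1 - m) * q."""
    for q_p, k_p in zip(q_enc.parameters(), k_enc.parameters()):
        k_p.mul_(m).add_(q_p, alpha=1 - m)

# After each optimizer step on the query encoder:
momentum_update(query_encoder, key_encoder)
keys = key_encoder(torch.randn(8, 128))        # stable targets for the queue
print(keys.shape)  # torch.Size([8, 32])
```

The slowly moving key encoder is what keeps queued representations consistent across training steps, which is the consistency the residual attention network above is meant to reinforce.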

17.
Plants (Basel) ; 13(9)2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38732391

ABSTRACT

Tomato leaf disease control in smart agriculture urgently requires attention and reinforcement. This paper proposes a method called LAFANet for image-text retrieval, which integrates image and text information for joint analysis of multimodal data, helping agricultural practitioners provide more comprehensive and in-depth diagnostic evidence to ensure the quality and yield of tomatoes. First, we focus on six common tomato leaf diseases with corresponding images and text descriptions, creating a Tomato Leaf Disease Image-Text Retrieval Dataset (TLDITRD) and introducing image-text retrieval into the field of tomato leaf disease retrieval. Then, using ViT and BERT models, we extract detailed image features and sequences of textual features, incorporating contextual information from image-text pairs. To address errors in image-text retrieval caused by complex backgrounds, we propose Learnable Fusion Attention (LFA) to amplify the fusion of textual and image features, thereby extracting substantial semantic insights from both modalities. To delve further into the semantic connections across modalities, we propose a False Negative Elimination-Adversarial Negative Selection (FNE-ANS) approach, which identifies adversarial negative instances that specifically target false negatives within the triplet function, thereby imposing constraints on the model. To bolster the model's capacity for generalization and precision, we propose Adversarial Regularization (AR), which incorporates adversarial perturbations during training, fortifying the model's resilience to slight variations in input data. Experimental results show that LAFANet outperformed existing state-of-the-art models on the TLDITRD dataset, with retrieval accuracies reaching 83.3% and 90.0%, and top-1, top-5, and top-10 accuracies reaching 80.3%, 93.7%, and 96.3%. LAFANet offers fresh technical backing and algorithmic insights for the retrieval of tomato leaf disease through image-text correlation.
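The FNE-ANS idea, selecting hard negatives for the triplet loss while excluding suspected false negatives, can be sketched as below. Using a text-text similarity threshold to flag false negatives is an assumption for illustration, not the paper's exact criterion.

```python
# Hedged sketch: triplet loss with hardest-negative selection that
# skips suspected false negatives (texts too similar to the positive).
import torch
import torch.nn.functional as F

def triplet_with_fne(img, txt, margin=0.2, fn_threshold=0.9):
    """img, txt: (B, D) paired embeddings. For each image anchor, pick
    the hardest text negative that is unlikely to be a false negative."""
    img = F.normalize(img, dim=-1)
    txt = F.normalize(txt, dim=-1)
    sim = img @ txt.t()                          # (B, B) image-text similarity
    pos = sim.diag()                             # matched pairs
    txt_sim = txt @ txt.t()                      # text-text similarity
    mask = torch.eye(sim.size(0), dtype=torch.bool)
    # Exclude the positive and any text too similar to it (suspected FN)
    blocked = mask | (txt_sim > fn_threshold)
    neg = sim.masked_fill(blocked, float("-inf")).max(dim=1).values
    return F.relu(margin + neg - pos).mean()

img = torch.randn(16, 256)
txt = torch.randn(16, 256)
print(triplet_with_fne(img, txt).item())
```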

18.
Neural Netw ; 178: 106403, 2024 May 23.
Article in English | MEDLINE | ID: mdl-38815470

ABSTRACT

The goal of multi-modal neural machine translation (MNMT) is to incorporate language-agnostic visual information into text to enhance machine translation performance. However, due to the inherent differences between image and text, the two modalities inevitably suffer from semantic mismatch. To tackle this issue, this paper adopts a multi-grained visual pivot-guided multi-modal fusion strategy with cross-modal contrastive disentangling to eliminate the linguistic gaps between different languages. By using the disentangled multi-grained visual information as a cross-lingual pivot, we can enhance the alignment between languages and improve MNMT performance. We first introduce text-guided stacked cross-modal disentangling modules that progressively disentangle images into two types of visual information: MT-related visual information and background information. We then effectively integrate these two kinds of multi-grained visual elements to assist target-sentence generation. Extensive experiments on four benchmark MNMT datasets demonstrate that the proposed approach achieves significant improvements over other state-of-the-art (SOTA) approaches on all test sets. In-depth analysis highlights the benefits of text-guided cross-modal disentangling and visual pivot-based multi-modal fusion strategies in MNMT. We release the code at https://github.com/nlp-mnmt/ConVisPiv-MNMT.

19.
Sensors (Basel) ; 24(7)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38610419

ABSTRACT

Through-wall radar human body pose recognition technology has broad applications in both military and civilian sectors. Identifying the current pose of targets behind walls and predicting subsequent pose changes are significant challenges. Conventional methods typically utilize radar information along with machine learning algorithms such as SVM and random forests to aid in recognition. However, these approaches have limitations, particularly in complex scenarios. In response to this challenge, this paper proposes a cross-modal supervised through-wall radar human body pose recognition method. By integrating information from both cameras and radar, a cross-modal dataset was constructed, and a corresponding deep learning network architecture was designed. During training, the network effectively learned the pose features of targets obscured by walls, enabling accurate pose recognition (e.g., standing, crouching) in scenarios with unknown wall obstructions. The experimental results demonstrated the superiority of the proposed method over traditional approaches, offering an effective and innovative solution for practical through-wall radar applications. The contribution of this study lies in the integration of deep learning with cross-modal supervision, providing new perspectives for enhancing the robustness and accuracy of target pose recognition.


Subjects
Human Body, Military Personnel, Humans, Radar, Algorithms, Machine Learning
20.
Psychon Bull Rev ; 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38653956

ABSTRACT

Whether information in working memory (WM) is stored in a domain-independent or domain-specific system is still the subject of intense debate. This study used the delayed match-to-sample paradigm, the dual-task paradigm, and the selective interference paradigm to investigate the mechanism of cross-modal storage in visual and vibrotactile WM. We postulated that WM may store cross-modal data from haptics and vision independently, and we proposed domain-specific WM storage. According to the findings, the WM can store cross-modal information from vision and haptics independently, and the storage of visual and tactile WM may be domain-specific. This study provides early support for the hypothesis that haptic and visuospatial sketchpads are dissociated. In addition, the current study provides evidence to elucidate the mechanisms by which WM stores and processes data from different modalities and content. The results also indicate that a cross-modal approach can broaden the cognitive processing bandwidth of WM.
