Results 1 - 20 of 48
1.
Article in English | MEDLINE | ID: mdl-38959142

ABSTRACT

Disentangled representation learning aims to obtain an independent latent representation without supervisory signals. However, in unsupervised settings, the independence of a representation does not guarantee interpretability that matches human intuition. In this article, we introduce conceptual representation learning, an unsupervised strategy to learn a representation and its concepts. An antonym pair forms a concept, which determines a semantically meaningful axis in the latent space. Since the connection between signifying words and signified notions is arbitrary in natural languages, verbalizing data features makes the representation meaningful to humans. We thus construct Conceptual VAE (ConcVAE), a variational autoencoder (VAE)-based generative model with an explicit process in which the semantic representation of data is generated via trainable concepts. For visual data, ConcVAE uses natural language arbitrariness as an inductive bias for unsupervised learning by means of vision-language pretraining, which can tell an unsupervised model what makes sense to humans. Qualitative and quantitative evaluations show that the conceptual inductive bias in ConcVAE effectively disentangles the latent representation in a sense-making manner without supervision. Code is available at https://github.com/ganmodokix/concvae.
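To illustrate how an antonym pair can define a concept axis, here is a minimal sketch. It assumes that each word of the pair already has an embedding vector (for example, from a vision-language text encoder) and that a data feature lives in the same space. The functions `concept_axis` and `concept_score` are hypothetical. ConcVAE learns its concepts during training, and this is not its code.

```python
import math


def concept_axis(pos_emb, neg_emb):
    """Unit vector pointing from the negative antonym's embedding toward the positive one's."""
    diff = [p - n for p, n in zip(pos_emb, neg_emb)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("antonym embeddings must differ")
    return [d / norm for d in diff]


def concept_score(feature, pos_emb, neg_emb):
    """Signed position of `feature` along the concept axis, measured from the pair's midpoint.

    Positive values lean toward the positive antonym, negative values toward the other one.
    """
    axis = concept_axis(pos_emb, neg_emb)
    midpoint = [(p + n) / 2 for p, n in zip(pos_emb, neg_emb)]
    return sum((f - m) * a for f, m, a in zip(feature, midpoint, axis))


# Toy 2-D embeddings for a hypothetical "bright"/"dark" pair.
print(concept_score([0.5, 3.0], [1.0, 0.0], [-1.0, 0.0]))
```

Only the component along the axis counts toward the score. Each concept therefore reads off one interpretable coordinate of the feature.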

2.
Bioengineering (Basel) ; 11(6)2024 May 21.
Article in English | MEDLINE | ID: mdl-38927759

ABSTRACT

This study presents a trial analysis that uses brain activity information obtained from mice to detect rheumatoid arthritis (RA) in its presymptomatic stages. Specifically, we confirmed that F759 mice, a mouse model of RA that depends on the inflammatory cytokine IL-6, and healthy wild-type mice can be classified on the basis of brain activity information, and we clarified which brain regions are useful for the presymptomatic detection of RA. To perform this analysis, we introduced a matrix completion-based approach to handle missing brain activity information. In addition, we implemented a canonical correlation-based method capable of analyzing the relationships among various types of brain activity information. This method allowed us to accurately classify F759 and wild-type mice and thereby to identify essential features, including crucial brain regions, for the presymptomatic detection of RA. In our experiment, we obtained brain activity information from 15 F759 and 10 wild-type mice and analyzed the acquired data. Experiments with four types of classifiers show that the thalamus and periaqueductal gray are effective for the classification task. Furthermore, we confirmed that classification performance was maximized when seven brain regions were used, excluding the electromyogram and nucleus accumbens.

3.
Sensors (Basel) ; 24(11)2024 May 27.
Article in English | MEDLINE | ID: mdl-38894233

ABSTRACT

This paper proposes a multimodal Transformer model that uses time-series data to detect and predict winter road surface conditions. Previous approaches to detecting or predicting road surface conditions focus on the cooperative use of multiple input modalities, e.g., images captured by fixed-point cameras (road surface images) and auxiliary data related to road surface conditions, combined through simple modality integration. Although such approaches outperform methods that use only images or only auxiliary data, how to integrate heterogeneous modalities requires further consideration. The proposed method achieves more effective modality integration using a cross-attention mechanism and time-series processing. Concretely, when multiple modalities are integrated, a feature integration technique based on a cross-attention mechanism lets the modalities compensate for each other's features, enhancing the representational ability of the integrated features. In addition, by applying time-series processing to input data spanning several timesteps, the model can account for temporal changes in road surface conditions. Experiments are conducted on both the detection and prediction tasks, using data corresponding to the current winter condition and data corresponding to the condition a few hours later, respectively. The experimental results verify the effectiveness of the proposed method for both tasks. Beyond constructing the classification model for winter road surface conditions, we make a first attempt to visualize the classification results, especially the prediction results, through an image style transfer model, presented as supplementary experiments on image generation at the end of the paper.
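The cross-attention fusion step can be illustrated with plain scaled dot-product attention. This sketch makes simplifying assumptions:

- the learned query/key/value projections, the multiple heads and the time-series processing of the actual model are omitted;
- the token values are made up;
- running attention in both directions is one possible reading of the "mutual complementation" described in the abstract, not necessarily the paper's design.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    Each query vector (e.g., a road-image feature at one timestep) attends over the
    key/value vectors of the other modality (e.g., auxiliary readings).
    Returns one output vector per query.
    """
    d = len(keys[0])
    dim_v = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)])
    return outputs


image_tokens = [[1.0, 0.0], [0.0, 1.0]]
aux_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Image features query the auxiliary modality; swapping the arguments gives the reverse direction.
image_enriched = cross_attention(image_tokens, aux_tokens, aux_tokens)
aux_enriched = cross_attention(aux_tokens, image_tokens, image_tokens)
```

Each output is a convex combination of the other modality's value vectors. This is how one modality can fill in information missing from the other.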

4.
Sensors (Basel) ; 24(10)2024 May 10.
Article in English | MEDLINE | ID: mdl-38793890

ABSTRACT

In our digitally driven society, advances in software and hardware for capturing video allow extensive gathering and analysis of large datasets. This has stimulated interest in extracting information about the environment, such as buildings and urban streets, from video data. Urban buildings and streets, as essential parts of cities, carry valuable information relevant to daily life. Extracting features from these elements and integrating them with technologies such as VR and AR can contribute to more intelligent and personalized urban public services. Despite these potential benefits, collecting videos of urban environments is challenging because of the presence of dynamic objects. Because the visible shape of the target building varies from frame to frame, frames must be selected carefully to ensure that quality features are extracted. To address this problem, we propose a novel evaluation metric that considers both video-inpainting-restoration quality and the relevance of the target object, favoring frames that minimize areas occupied by cars, maximize areas occupied by the target building, and minimize overlapping areas. This metric extends existing video-inpainting-evaluation metrics by considering the relevance of the target object and the interconnectivity between objects. We conducted experiments to validate the proposed metric using real-world datasets from the Japanese cities of Sapporo and Yokohama. The experimental results demonstrate the feasibility of selecting video frames conducive to building feature extraction.
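The three object-relevance criteria named above (less car area, more building area, less overlap) can be combined into a per-frame score. The abstract does not give the formula. The following assumptions are made for illustration:

- the linear combination, the weights `w_building`, `w_car` and `w_overlap`, and the area normalization are all hypothetical;
- the masks are taken to come from a segmentation model that is not shown;
- the inpainting-quality part of the metric is not modeled.

```python
def frame_relevance(building_mask, car_mask, w_building=1.0, w_car=1.0, w_overlap=1.0):
    """Hypothetical relevance score for one frame.

    Rewards pixels of the target building and penalizes car pixels and building/car
    overlap. Masks are equal-sized 2-D lists of booleans; the score is normalized by
    the frame area.
    """
    total = building = car = overlap = 0
    for b_row, c_row in zip(building_mask, car_mask):
        for b, c in zip(b_row, c_row):
            total += 1
            building += int(b)
            car += int(c)
            overlap += int(b and c)
    return (w_building * building - w_car * car - w_overlap * overlap) / total


def select_frames(frames, k):
    """Indices of the k highest-scoring frames; `frames` holds (building_mask, car_mask) pairs."""
    scores = [frame_relevance(b, c) for b, c in frames]
    return sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)[:k]


clear = ([[True, True], [True, True]], [[False, False], [False, False]])
occluded = ([[True, False], [False, False]], [[True, True], [False, False]])
print(select_frames([occluded, clear], k=1))
```

Ranking frames by such a score and keeping the top k is one simple way to turn a per-frame metric into a frame selection rule.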

5.
Sensors (Basel) ; 24(10)2024 May 10.
Article in English | MEDLINE | ID: mdl-38793888

ABSTRACT

In this study, we propose a method for classifying expert-novice levels using a graph convolutional network (GCN) with a confidence-aware node-level attention mechanism. In classification using an attention mechanism, the highlighted features may not be significant for accurate classification, which degrades classification performance. To address this issue, the proposed method introduces a confidence-aware node-level attention mechanism into a spatiotemporal attention GCN (STA-GCN) for expert-novice level classification. Consequently, our method can adjust the attention value of each node on the basis of the confidence of the classification, which solves this problem of attention-based classification approaches and realizes accurate classification. Furthermore, because expert-novice levels are ordinal, a classification model that accounts for this ordinality improves classification performance; the proposed method therefore minimizes a loss function that considers the ordinal relationships among the classes. By implementing these approaches, the method improves expert-novice level classification performance.
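The abstract does not specify the ordinal loss. One common way to make a classification loss respect class order is to add a second term to the cross-entropy: the expected distance between the predicted and true levels. Confusing a novice with an expert then costs more than confusing adjacent levels. The sketch below shows that idea. The function name and the weight `lam` are hypothetical, and the paper's loss may differ.

```python
import math


def ordinal_aware_loss(probs, target, lam=1.0):
    """Cross-entropy plus a penalty on the expected distance between predicted and true level.

    `probs` is a probability distribution over ordered levels (index 0 = most novice),
    `target` is the true level index, and `lam` weights the ordinal penalty.
    """
    cross_entropy = -math.log(max(probs[target], 1e-12))  # clamp avoids log(0)
    expected_distance = sum(p * abs(level - target) for level, p in enumerate(probs))
    return cross_entropy + lam * expected_distance


# Same probability on the true level, but the second prediction puts its error two levels away.
near_miss = ordinal_aware_loss([0.5, 0.5, 0.0], target=0)
far_miss = ordinal_aware_loss([0.5, 0.0, 0.5], target=0)
```

Plain cross-entropy would score both predictions identically. The distance term breaks the tie in favor of the near miss.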

6.
Sensors (Basel) ; 24(10)2024 May 13.
Article in English | MEDLINE | ID: mdl-38793943

ABSTRACT

Advances in deep learning have significantly enhanced the ability of image generation models to produce images aligned with human intentions. However, training and adapting these models to new data and tasks remain challenging because of their complexity and the risk of catastrophic forgetting. This study proposes a method that addresses these challenges by applying class-replacement techniques within a continual learning framework. The method uses selective amnesia (SA) to efficiently replace existing classes with new ones while retaining crucial information. This approach improves the model's adaptability to evolving data environments while preventing the loss of past information. We conducted a detailed evaluation of class-replacement techniques, examining their impact on the "class incremental learning" performance of models and exploring their applicability in various scenarios. The experimental results demonstrated that our proposed method could enhance the learning efficiency and long-term performance of image generation models. This study broadens the application scope of image generation technology and supports the continual improvement and adaptability of such models.

7.
Neural Netw ; 172: 106154, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38309137

ABSTRACT

Herein, we propose a novel dataset distillation method for constructing small, informative datasets that preserve the information of large original datasets. The development of deep learning models is enabled by the availability of large-scale datasets. Despite their unprecedented success, large-scale datasets considerably increase storage and transmission costs, making model training cumbersome. Moreover, using raw data for training raises privacy and copyright concerns. To address these issues, a task named dataset distillation has been introduced, which aims to synthesize a compact dataset that retains the essential information of the large original dataset. State-of-the-art (SOTA) dataset distillation methods match the gradients or network parameters obtained during training on the real and synthetic datasets. However, different network parameters contribute differently to the distillation process, and treating them uniformly degrades distillation performance. Based on this observation, we propose an importance-aware adaptive dataset distillation (IADD) method that improves distillation performance by automatically assigning importance weights to different network parameters during distillation, thereby synthesizing more robust distilled datasets. IADD outperforms other SOTA parameter-matching-based dataset distillation methods on multiple benchmark datasets and also in terms of cross-architecture generalization. In addition, an analysis of the self-adaptive weights demonstrates the effectiveness of IADD. Furthermore, the effectiveness of IADD is validated in a real-world medical application, COVID-19 detection.
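Parameter matching with importance weights can be sketched as a weighted version of the normalized loss used by parameter-matching distillation methods such as trajectory matching. Note the differences from IADD:

- in IADD the weights are assigned automatically during distillation, whereas here they are passed in as fixed numbers;
- the grouping of parameters and the function name are assumptions for illustration.

```python
def weighted_param_matching_loss(student, expert_target, expert_start, weights):
    """Normalized parameter-matching loss with per-group importance weights.

    `student`, `expert_target`, and `expert_start` are lists of parameter groups
    (each a flat list of floats): the parameters after training on synthetic data,
    the expert's parameters at the target point, and the expert's starting parameters.
    `weights` holds one importance weight per group. The weighted distance to the
    target is normalized by the weighted distance the expert itself traveled.
    """
    num = 0.0
    den = 0.0
    for s, t, st, w in zip(student, expert_target, expert_start, weights):
        num += w * sum((a - b) ** 2 for a, b in zip(s, t))
        den += w * sum((a - b) ** 2 for a, b in zip(st, t))
    return num / den
```

With uniform weights this reduces to ordinary normalized parameter matching. Raising a group's weight makes mismatches in that group dominate the loss.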


Subjects
COVID-19 , Distillation , Humans , Benchmarking , Generalization, Psychological , Privacy

8.
Sensors (Basel) ; 24(3)2024 Jan 31.
Article in English | MEDLINE | ID: mdl-38339636

ABSTRACT

Text-guided image editing has attracted attention in the fields of computer vision and natural language processing in recent years. The approach takes an image and a text prompt as input and aims to edit the image in accordance with the text prompt while preserving text-unrelated regions. The results of text-guided image editing differ depending on how the text prompt is phrased, even when the meaning is the same, and it is up to the user to decide which result best matches the intended use of the edited image. This paper assumes a situation in which edited images are posted to social media and proposes a novel text-guided image editing method to help the edited images gain attention from a larger audience. In the proposed method, we apply a pre-trained text-guided image editing method to obtain multiple edited images from multiple text prompts generated by a large language model. The proposed method then uses a novel model that predicts post scores representing engagement rates and selects, from among these edited images, the one expected to gain the most attention from the audience on social media. Subjective experiments on a dataset of real Instagram posts demonstrate that, compared with previous text-guided image editing methods, the images edited by the proposed method accurately reflect the content of the text prompts and make a more positive impression on the social media audience.


Subjects
Social Media , Humans , Language , Natural Language Processing

9.
Sensors (Basel) ; 23(23)2023 Dec 04.
Article in English | MEDLINE | ID: mdl-38067982

ABSTRACT

Traffic sign recognition is a complex and challenging yet popular problem whose solutions can assist drivers and reduce traffic accidents. Most existing traffic sign recognition methods use convolutional neural networks (CNNs) and can achieve high recognition accuracy. However, these methods require a large, carefully crafted traffic sign dataset for training. Moreover, since traffic signs differ between countries and come in many varieties, these methods must be fine-tuned to recognize new traffic sign categories. To address these issues, we propose a traffic sign matching method for zero-shot recognition. Our proposed method performs traffic sign recognition without training data by directly matching target traffic sign images against template images on the basis of their similarity. The method uses the midlevel features of CNNs to obtain robust feature representations of traffic signs without additional training or fine-tuning, and we found that midlevel features improve the accuracy of zero-shot traffic sign recognition. The proposed method achieves promising recognition results on the open German Traffic Sign Recognition Benchmark dataset and on a real-world dataset captured in Sapporo City, Japan.
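The matching step can be sketched as a nearest-template search under cosine similarity. The following are assumptions of this sketch:

- feature vectors are taken to be already extracted (for example, pooled activations from an intermediate CNN layer);
- the layer choice, the pooling and the function names are hypothetical;
- the paper's similarity measure may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_sign(query_feature, templates):
    """Label of the template whose feature vector is most similar to the query.

    `templates` maps a sign label to a feature vector, e.g. pooled mid-level CNN
    activations of a clean template image. No training step is involved.
    """
    return max(templates, key=lambda label: cosine(query_feature, templates[label]))


templates = {"stop": [1.0, 0.0, 0.2], "yield": [0.0, 1.0, 0.1]}
print(match_sign([0.9, 0.1, 0.3], templates))
```

Because recognition is a lookup against templates, supporting a new country's sign only requires adding its template vector to the dictionary.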

10.
Sensors (Basel) ; 23(22)2023 Nov 20.
Article in English | MEDLINE | ID: mdl-38005673

ABSTRACT

Text-guided image manipulation is currently a notable subject of study in the vision-and-language field. Given an image and text as inputs, these methods aim to manipulate the image according to the text while preserving text-irrelevant regions. Although there has been extensive research to improve the versatility and performance of text-guided image manipulation, research on evaluating its performance remains inadequate. This study proposes Manipulation Direction (MD), a logical and robust metric that evaluates the performance of text-guided image manipulation by focusing on changes across the image and text modalities. Specifically, we define MD as the consistency between the changes in images and in texts from before to after manipulation. Using MD, we can comprehensively evaluate how an image has changed through the manipulation and whether this change agrees with the text. Extensive experiments on Multi-Modal-CelebA-HQ and Caltech-UCSD Birds confirmed that our MD scores correlated more strongly with subjective scores for the manipulated images than existing metrics did.
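Here is a minimal sketch of the idea behind MD. It rests on two assumptions:

- images and texts are embedded in a shared space, for example by a CLIP-style encoder that is not shown;
- "consistency of changes" is measured as the cosine similarity between the image change and the text change.

The paper's exact definition may differ, and the function name is hypothetical.

```python
import math


def manipulation_direction(img_before, img_after, txt_before, txt_after):
    """Cosine similarity between the change in image embedding and the change in text embedding.

    All four arguments are vectors in one shared image-text embedding space.
    Returns a value in [-1, 1]; 1 means the image moved exactly in the direction the text did.
    """
    d_img = [a - b for a, b in zip(img_after, img_before)]
    d_txt = [a - b for a, b in zip(txt_after, txt_before)]
    dot = sum(x * y for x, y in zip(d_img, d_txt))
    norm = math.sqrt(sum(x * x for x in d_img)) * math.sqrt(sum(y * y for y in d_txt))
    return dot / norm if norm > 0 else 0.0  # if either change is zero, report no agreement
```

Using differences rather than raw embeddings means the score ignores content that stayed the same. It rewards only edits that move the image in the direction the text moved.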

11.
Sensors (Basel) ; 23(15)2023 Aug 03.
Article in English | MEDLINE | ID: mdl-37571685

ABSTRACT

Zero-shot neural decoding aims to decode image categories that were not seen during training from functional magnetic resonance imaging (fMRI) activity evoked when a person views images. However, because fMRI data are difficult to collect, the available training data are insufficient, which leads to poor generalization. Models thus suffer from the projection domain shift problem when novel target categories are decoded. In this paper, we propose a zero-shot neural decoding approach with semi-supervised multi-view embedding. We introduce a semi-supervised approach that uses additional images related to the target categories without corresponding fMRI activity patterns. Furthermore, we project fMRI activity patterns into a multi-view embedding space, i.e., the visual and semantic feature spaces of the viewed images, to effectively exploit their complementary information. We define several source and target groups whose image categories are very different and verify the zero-shot neural decoding performance. The experimental results demonstrate that the proposed approach rectifies the projection domain shift problem and outperforms existing methods.

12.
Sensors (Basel) ; 23(10)2023 May 16.
Article in English | MEDLINE | ID: mdl-37430712

ABSTRACT

In this paper, we propose a hierarchical multi-modal multi-label attribute classification model for anime illustrations using a graph convolutional network (GCN). We focus on the challenging task of multi-label attribute classification, which requires capturing subtle features intentionally highlighted by the creators of anime illustrations. To address the hierarchical nature of these attributes, we use hierarchical clustering and hierarchical label assignment to organize the attribute information into a hierarchical feature. The proposed GCN-based model effectively uses this hierarchical feature to achieve high accuracy in multi-label attribute classification. The contributions of the proposed method are as follows. First, we introduce GCN to the multi-label attribute classification task for anime illustrations, enabling the model to capture more comprehensive relationships between attributes from their co-occurrence. Second, we capture subordinate relationships among the attributes by adopting hierarchical clustering and hierarchical label assignment. Finally, we construct a hierarchical structure of the attributes that appear more frequently in anime illustrations, based on rules derived from previous studies, which helps reflect the relationships between different attributes. Experimental results on multiple datasets, in comparison with several existing methods including the state-of-the-art method, show that the proposed method is effective and extensible.

13.
Sensors (Basel) ; 23(9)2023 May 05.
Article in English | MEDLINE | ID: mdl-37177712

ABSTRACT

In soccer, quantitatively evaluating the performance of players and teams is essential to improve tactical coaching and players' decision-making abilities. To achieve this, some methods use predicted probabilities of shoot event occurrences to quantify player performances, but conventional shoot prediction models have not performed well and have failed to consider the reliability of the event probability. This paper proposes a novel method that effectively utilizes players' spatio-temporal relations and prediction uncertainty to predict shoot event occurrences with greater accuracy and robustness. Specifically, we represent players' relations as a complete bipartite graph, which effectively incorporates soccer domain knowledge, and capture latent features by applying a graph convolutional recurrent neural network (GCRNN) to the constructed graph. Our model utilizes a Bayesian neural network to predict the probability of shoot event occurrence, considering spatio-temporal relations between players and prediction uncertainty. In our experiments, we confirmed that the proposed method outperformed several other methods in terms of prediction performance, and we found that considering players' distances significantly affects the prediction accuracy.
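The complete bipartite graph can be illustrated as an adjacency matrix. Every player of one team connects to every player of the other team, and there are no within-team edges. The abstract states neither which two groups form the bipartition nor how edges are weighted. This sketch therefore assumes:

- the two groups are attacking versus defending players;
- edges carry inverse-distance weights, reflecting the finding that player distances matter;
- the function name and `eps` are hypothetical.

```python
import math


def bipartite_adjacency(attackers, defenders, eps=1e-6):
    """Weighted adjacency matrix of a complete bipartite graph between two teams.

    Players are (x, y) positions. Every attacker is linked to every defender with
    weight 1 / distance (eps avoids division by zero), and there are no edges
    within a team. Node order: attackers first, then defenders.
    """
    n_a = len(attackers)
    n = n_a + len(defenders)
    adj = [[0.0] * n for _ in range(n)]
    for i, (ax, ay) in enumerate(attackers):
        for j, (dx, dy) in enumerate(defenders):
            w = 1.0 / (math.hypot(ax - dx, ay - dy) + eps)
            adj[i][n_a + j] = w  # attacker -> defender
            adj[n_a + j][i] = w  # symmetric: defender -> attacker
    return adj


adj = bipartite_adjacency([(0.0, 0.0), (1.0, 0.0)], [(0.0, 3.0)])
```

One such matrix per frame, stacked over time, is the kind of input a graph convolutional recurrent network would consume.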

14.
Sensors (Basel) ; 23(9)2023 May 07.
Article in English | MEDLINE | ID: mdl-37177744

ABSTRACT

This study proposes a novel off-screen sound separation method based on audio-visual pre-training. In the field of audio-visual analysis, researchers have leveraged visual information for audio manipulation tasks such as sound source separation. Although such tasks rely on correspondences between audio and video, these correspondences are not always present. Specifically, sounds coming from outside the screen have no audio-visual correspondence and thus interfere with conventional audio-visual learning. The proposed method separates such off-screen sounds based on their directions of arrival using binaural audio, which provides a three-dimensional sense of sound. Furthermore, we propose a new pre-training method that can take the off-screen space into account and use the obtained representation to improve off-screen sound separation. Consequently, the proposed method can separate off-screen sounds irrespective of the direction from which they arrive. We conducted our evaluation using generated video data to circumvent the difficulty of collecting ground truth for off-screen sounds, and we confirmed the effectiveness of our methods through off-screen sound detection and separation tasks.

15.
Int J Comput Assist Radiol Surg ; 18(10): 1841-1848, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37040011

ABSTRACT

PURPOSE: Manual annotation of gastric X-ray images by doctors for gastritis detection is time-consuming and expensive. To address this, we develop a self-supervised learning method and verify its effectiveness for gastritis detection using a few annotated gastric X-ray images. METHODS: We develop a novel method that can perform explicit self-supervised learning and learn discriminative representations from gastric X-ray images. Models trained with the proposed method were fine-tuned on datasets comprising a few annotated gastric X-ray images. Five self-supervised learning methods, i.e., SimSiam, BYOL, PIRL-jigsaw, PIRL-rotation, and SimCLR, were compared with the proposed method. Furthermore, three previous methods (one pretrained on ImageNet, one trained from scratch, and one semi-supervised learning method) were compared with the proposed method. RESULTS: The proposed method's harmonic mean scores of sensitivity and specificity after fine-tuning with annotated data from 10, 20, 30, and 40 patients were 0.875, 0.911, 0.915, and 0.931, respectively. The proposed method outperformed all comparative methods, including the five self-supervised learning methods and the three previous methods. These experimental results show the effectiveness of the proposed method for gastritis detection using a few annotated gastric X-ray images. CONCLUSIONS: This paper proposes a novel self-supervised learning method based on a teacher-student architecture for gastritis detection using gastric X-ray images. The proposed method can perform explicit self-supervised learning and learn discriminative representations from gastric X-ray images, and it shows potential for clinical use in gastritis detection when only a few annotated gastric X-ray images are available.
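The harmonic mean (HM) of sensitivity and specificity reported above can be computed from confusion-matrix counts. This is a short sketch using the standard definitions. The function name and the example counts are hypothetical and are not taken from the study.

```python
def harmonic_mean_score(tp, fn, tn, fp):
    """Harmonic mean of sensitivity (TP / (TP + FN)) and specificity (TN / (TN + FP))."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    if sensitivity + specificity == 0:
        return 0.0
    return 2 * sensitivity * specificity / (sensitivity + specificity)


# Hypothetical counts: sensitivity 0.9, specificity 0.8.
print(harmonic_mean_score(tp=9, fn=1, tn=8, fp=2))
```

Unlike the arithmetic mean, the harmonic mean stays low when either sensitivity or specificity is low. A detector cannot score well by favoring one class.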


Subjects
Gastritis , Humans , X-Rays , Gastritis/diagnostic imaging , Rotation , Supervised Machine Learning

16.
Comput Biol Med ; 158: 106877, 2023 05.
Article in English | MEDLINE | ID: mdl-37019015

ABSTRACT

PROBLEM: Detecting COVID-19 from chest X-ray (CXR) images has become one of the fastest and easiest ways to detect COVID-19. However, existing methods usually use supervised transfer learning from natural images as a pretraining process. These methods do not consider the unique features of COVID-19 or the features that COVID-19 shares with other types of pneumonia. AIM: In this paper, we aim to design a novel high-accuracy COVID-19 detection method using CXR images that considers both the unique features of COVID-19 and the features it shares with other types of pneumonia. METHODS: Our method consists of two phases: self-supervised learning-based pretraining and batch knowledge ensembling-based fine-tuning. Self-supervised learning-based pretraining learns distinctive representations from CXR images without manually annotated labels. Batch knowledge ensembling-based fine-tuning, on the other hand, uses the category knowledge of the images in a batch, according to their visual feature similarities, to improve detection performance. Unlike in our previous implementation, we introduce batch knowledge ensembling into the fine-tuning phase, which reduces the memory used in self-supervised learning and improves COVID-19 detection accuracy. RESULTS: On two public COVID-19 CXR datasets, namely a large dataset and an unbalanced dataset, our method exhibited promising COVID-19 detection performance. Our method maintains high detection accuracy even when the number of annotated CXR training images is reduced significantly (e.g., using only 10% of the original dataset). In addition, our method is insensitive to changes in hyperparameters. CONCLUSION: The proposed method outperforms other state-of-the-art COVID-19 detection methods in different settings and can reduce the workloads of healthcare providers and radiologists.


Subjects
COVID-19 , Humans , COVID-19/diagnostic imaging , Radiologists , Thorax , Upper Extremity , Supervised Machine Learning

17.
Sensors (Basel) ; 23(3)2023 Jan 17.
Article in English | MEDLINE | ID: mdl-36772095

ABSTRACT

Auxiliary clinical diagnosis has been studied as a way to address the uneven and insufficient distribution of clinical resources. However, auxiliary diagnosis is still dominated by human physicians, and how to involve intelligent systems more fully in the diagnosis process is becoming a growing concern. An interactive automated clinical diagnosis system consisting of a question-answering system and a question generation system can capture a patient's condition from multiple perspectives with less physician involvement, by asking different questions that drive and guide the diagnosis. This clinical diagnosis process requires diverse information to evaluate a patient from different perspectives and obtain an accurate diagnosis, yet recently proposed medical question generation systems have not considered diversity. Thus, we propose a diversity learning-based visual question generation model that uses a multi-latent space to generate informative question sets from medical images. The proposed method generates varied questions by embedding visual and language information in different latent spaces, whose diversity is trained with our newly proposed loss. We have also added control over the categories of the generated questions, making them directional. Furthermore, we use a new metric named similarity to accurately evaluate the proposed model's performance. The experimental results on the Slake and VQA-RAD datasets demonstrate that the proposed method can generate questions with diverse information. Our model works with an answering model for interactive automated clinical diagnosis and generates datasets that can replace labor-intensive manual annotation.


Subjects
Natural Language Processing , Semantics , Humans , Language

18.
Sensors (Basel) ; 23(3)2023 Feb 02.
Article in English | MEDLINE | ID: mdl-36772694

ABSTRACT

This study presents a method for classifying distress images of road infrastructure that introduces self-supervised learning. Self-supervised learning is an unsupervised learning method that does not require class labels, so it can reduce annotation effort and allow machine learning to be applied to large numbers of unlabeled images. We propose a novel distress image classification method using contrastive learning, a type of self-supervised learning. Contrastive learning provides a domain-specific image representation by constraining similar images to be embedded near each other in the latent space. We augment each input distress image into multiple images through image transformations and construct a latent space in which the augmented images are embedded close to each other. This provides a domain-specific representation of damage in road infrastructure learned from a large number of unlabeled distress images. Finally, the representation obtained by contrastive learning is used to improve distress image classification performance: the learned contrastive model parameters are used in the distress image classification model. We thus obtain an effective distress image representation from unlabeled distress images, which have been difficult to use in the past. In the experiments, we use real-world distress images to verify the effectiveness of the proposed method for various distress types and confirm the performance improvement.
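The abstract does not name the contrastive objective. As one common example, the SimCLR-style NT-Xent loss below pulls two augmented views of the same distress image together and pushes views of other images apart. The embeddings are assumed to come from an encoder that is not shown, and the temperature value is illustrative.

```python
import math


def nt_xent_loss(embeddings, temperature=0.5):
    """SimCLR-style NT-Xent loss.

    `embeddings` holds 2N vectors where items 2k and 2k+1 are two augmented views of
    the same image; every other item in the batch serves as a negative.
    """
    normed = []
    for v in embeddings:
        n = math.sqrt(sum(x * x for x in v))
        normed.append([x / n for x in v])  # cosine similarity via unit vectors
    m = len(normed)
    total = 0.0
    for i in range(m):
        pos = i ^ 1  # index of the other view of the same image (0<->1, 2<->3, ...)
        sims = [sum(a * b for a, b in zip(normed[i], normed[j])) / temperature for j in range(m)]
        denom = sum(math.exp(sims[j]) for j in range(m) if j != i)  # exclude self-similarity
        total += -math.log(math.exp(sims[pos]) / denom)
    return total / m


aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # views of each image agree
swapped = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]  # views of each image disagree
```

The aligned batch yields the lower loss. Minimizing this objective is what embeds augmented views of the same image close together.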

19.
Int J Comput Assist Radiol Surg ; 18(4): 715-722, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36538184

ABSTRACT

PURPOSE: Given the large number of patients screened during the COVID-19 pandemic, computer-aided detection has strong potential to improve clinical workflow efficiency and reduce the incidence of infections among radiologists and healthcare providers. Since many confirmed COVID-19 cases present radiological findings of pneumonia, radiologic examinations can be useful for fast detection. Therefore, chest radiography can be used to rapidly screen for COVID-19 during patient triage, thereby determining the priority of patients' care and helping overwhelmed medical facilities in a pandemic. METHODS: In this paper, we propose a new learning scheme called self-supervised transfer learning for detecting COVID-19 from chest X-ray (CXR) images. We compared six self-supervised learning (SSL) methods (Cross, BYOL, SimSiam, SimCLR, PIRL-jigsaw, and PIRL-rotation) with the proposed method. Additionally, we compared six pretrained DCNNs (ResNet18, ResNet50, ResNet101, CheXNet, DenseNet201, and InceptionV3) with the proposed method. We provide a quantitative evaluation on the largest open COVID-19 CXR dataset and qualitative results for visual inspection. RESULTS: Our method achieved a harmonic mean (HM) score of 0.985, an AUC of 0.999, and a four-class accuracy of 0.953. We also used the visualization technique Grad-CAM++ with the proposed method to generate visual explanations for different classes of CXR images and increase interpretability. CONCLUSIONS: Our method shows that the knowledge learned from natural images through transfer learning benefits SSL on CXR images and boosts the performance of representation learning for COVID-19 detection. Our method promises to reduce the incidence of infections among radiologists and healthcare providers.


Subjects
COVID-19 , Humans , COVID-19/diagnostic imaging , Pandemics , X-Rays , Thorax , Machine Learning

20.
Sensors (Basel) ; 22(23)2022 Dec 05.
Article in English | MEDLINE | ID: mdl-36502199

ABSTRACT

This paper presents a trial analysis of the relationship between taste and biological information obtained while eating strawberries (for a sensory evaluation). This study used the visual analog scale (VAS), questionnaires used in previous studies, and human brain activity recorded while participants ate strawberries. In our analysis, we assumed that brain activity is highly correlated with taste. Under this assumption, the relationships between brain activity and other data, such as the VAS and questionnaire responses, can be analyzed through canonical correlation analysis, a multivariate analysis method. Through this analysis of brain activity, potential relationships with "taste" that are not revealed by an initial simple correlation analysis can be discovered; this is the main contribution of this study. In the experiments, we discovered a potential relationship between cultural factors (in the questionnaires) and taste. We also found a strong relationship between taste and individual information. In particular, the analysis of cross-loadings between brain activity and individual information suggests that acidity and the sugar-to-acid ratio are related to taste.


Subjects
Fragaria , Humans , Fruit