Results 1 - 20 of 2,319
1.
Med Image Anal ; 98: 103295, 2024 Aug 24.
Article in English | MEDLINE | ID: mdl-39217673

ABSTRACT

PURPOSE: Vision Transformers recently achieved performance competitive with CNNs thanks to their excellent capability for learning global representations. However, two major challenges arise when applying them to 3D image segmentation: i) because 3D medical images are large, comprehensive global information is hard to capture within reasonable computational cost; ii) insufficient local inductive bias in Transformers weakens the segmentation of detailed features such as ambiguous and subtly defined boundaries. These challenges must be addressed before the Vision Transformer mechanism can be applied effectively to medical image segmentation. METHODS: We propose a hybrid paradigm, called the Variable-Shape Mixed Transformer (VSmTrans), that integrates self-attention and convolution, combining the flexible learning of complex relationships offered by self-attention with the local prior knowledge provided by convolution. Specifically, we designed a Variable-Shape self-attention mechanism that can rapidly expand the receptive field without extra computational cost and achieves a good trade-off between global awareness and local detail. In addition, a parallel convolution paradigm introduces a strong local inductive bias that facilitates the extraction of details, while a pair of learnable parameters automatically adjusts the relative importance of the two paradigms. Extensive experiments were conducted on two public medical image datasets with different modalities: the AMOS CT dataset and the BraTS2021 MRI dataset. RESULTS: Our method achieves best average Dice scores of 88.3% and 89.7% on these datasets, surpassing the previous state-of-the-art Swin Transformer-based and CNN-based architectures. A series of ablation experiments verified the efficiency of the proposed hybrid mechanism and its components and explored the effects of the key parameters in VSmTrans.
CONCLUSIONS: The proposed hybrid Transformer-based backbone network for 3D medical image segmentation tightly integrates self-attention and convolution to exploit the advantages of both paradigms. The experimental results demonstrate our method's superiority over other state-of-the-art methods, suggesting that the hybrid paradigm is well suited to medical image segmentation. The ablation experiments also show that the proposed hybrid mechanism effectively balances large receptive fields with local inductive biases, yielding highly accurate segmentation, especially of fine details. Our code is available at https://github.com/qingze-bai/VSmTrans.
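The learnable two-branch mixing described above can be sketched in a few lines. This is a hedged NumPy illustration only: the real Variable-Shape attention and the exact way VSmTrans normalizes its pair of learnable weights are not specified in the abstract, so `hybrid_block`, the softmax normalization, and the 3-tap kernel are all assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Plain single-head self-attention over tokens (identity projections for brevity).
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores, axis=-1) @ x

def depthwise_conv1d(x, kernel):
    # Per-channel 1-D convolution along the token axis ("same" padding).
    k = len(kernel)
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        out[i] = (xp[i:i + k] * np.asarray(kernel)[:, None]).sum(axis=0)
    return out

def hybrid_block(x, alpha, beta, kernel=(0.25, 0.5, 0.25)):
    # Learnable scalars alpha/beta weight the global (attention) and
    # local (convolution) branches; softmax keeps the weights normalized.
    w = softmax(np.array([alpha, beta]))
    return w[0] * self_attention(x) + w[1] * depthwise_conv1d(x, kernel)
```

Making `alpha` large relative to `beta` pushes the block toward pure global attention; the reverse favors the local convolutional prior.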

2.
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi ; 41(4): 807-817, 2024 Aug 25.
Article in Chinese | MEDLINE | ID: mdl-39218608

ABSTRACT

High-grade serous ovarian cancer is highly malignant; by the time it is detected, it is prone to infiltrating surrounding soft tissue, metastasizing to the peritoneum and lymph nodes, peritoneal seeding, and distant metastasis. Whether recurrence occurs is therefore an important reference for surgical planning and choice of treatment. Current recurrence prediction models do not consider the potential pathological relationships between internal tissues of the whole ovary: they use convolutional neural networks to extract local region features for judgment, but their accuracy is low and their cost is high. To address this issue, this paper proposes a new lightweight deep learning model for predicting recurrence of high-grade serous ovarian cancer. The model first uses ghost convolution (GhostConv) and coordinate attention (CA) to build ghost counter residual (SCblock) modules that extract local feature information from images. It then captures global information and integrates multi-level information through the proposed layered fusion Transformer (STblock) modules, enhancing interaction between different layers. The Transformer module unfolds the feature map to compute on the corresponding region blocks, then folds it back to reduce computational cost. Finally, each STblock module fuses deep and shallow layer information and incorporates the patient's clinical metadata for recurrence prediction. Experimental results show that, compared with the mainstream lightweight mobile vision Transformer (MobileViT) network, the proposed slicer vision Transformer (SlicerViT) network improves accuracy, precision, sensitivity, and F1 score with only 1/6 of the computational cost and half the parameter count. This research confirms that the proposed model is more accurate and efficient at predicting recurrence of high-grade serous ovarian cancer.
In the future, it could serve as an auxiliary diagnostic technique to improve patient survival rates, and its light weight facilitates deployment on embedded devices.
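The GhostConv building block mentioned above (from the GhostNet line of work) generates part of the output channels with a cheap per-channel operation instead of a full convolution. A naive NumPy sketch follows; the 1x1 primary convolution, the 3x3 depthwise "cheap" op, and all shapes are illustrative assumptions, not the paper's exact module.

```python
import numpy as np

def ghost_module(x, w_primary, cheap_kernel):
    """Sketch of GhostConv: a small primary convolution produces a few
    'intrinsic' feature maps, then cheap per-channel ops generate the rest.

    x: (C_in, H, W); w_primary: (C_intrinsic, C_in) 1x1-conv weights;
    cheap_kernel: (3, 3) depthwise filter shared across channels.
    """
    # Primary 1x1 convolution -> intrinsic maps.
    intrinsic = np.einsum('oc,chw->ohw', w_primary, x)
    # Cheap operation: depthwise 3x3 filtering of each intrinsic map.
    xp = np.pad(intrinsic, ((0, 0), (1, 1), (1, 1)))
    C, H, W = intrinsic.shape
    ghost = np.zeros_like(intrinsic)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                ghost[c, i, j] = (xp[c, i:i+3, j:j+3] * cheap_kernel).sum()
    # Output = intrinsic + ghost maps, at roughly half the cost of a full conv.
    return np.concatenate([intrinsic, ghost], axis=0)
```

The channel count doubles while only the small primary convolution carries cross-channel cost, which is the source of the module's efficiency.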


Subject(s)
Algorithms , Deep Learning , Neoplasm Recurrence, Local , Neural Networks, Computer , Ovarian Neoplasms , Humans , Female , Ovarian Neoplasms/pathology , Metadata , Cystadenocarcinoma, Serous/pathology
3.
Skin Res Technol ; 30(9): e70040, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39221858

ABSTRACT

BACKGROUND: Skin cancer is one of the most frequently occurring diseases in humans. Early detection and treatment are essential to reduce malignancy. Deep learning techniques are supplementary tools that assist clinical experts in detecting and localizing skin lesions. Vision transformer (ViT)-based multiclass image classification provides fairly accurate detection and is gaining popularity owing to its legitimate multiclass prediction capability. MATERIALS AND METHODS: In this research, we propose a new ViT plus Gradient-Weighted Class Activation Mapping (Grad-CAM) architecture, named ViT-GradCAM, for detecting and classifying skin lesions by the spreading ratio on the lesion's surface area. The proposed system is trained and validated on the HAM10000 dataset, covering seven types of skin lesion. The database comprises 10,015 dermatoscopic images of varied sizes. Data preprocessing and data augmentation techniques are applied to overcome class imbalance and improve the model's performance. RESULT: The proposed ViT-based algorithm classifies the dermatoscopic images into seven classes with an accuracy of 97.28%, precision of 98.51%, recall of 95.2%, and F1 score of 94.6%. ViT-GradCAM obtains better and more accurate detection and classification than other state-of-the-art deep learning-based skin lesion detection models. The output of ViT-GradCAM is extensively visualized to highlight the actual pixels in essential regions associated with skin-specific pathologies. CONCLUSION: This research offers an alternate solution to the challenges of detecting and classifying skin lesions by combining ViTs and Grad-CAM, which together enable accurate detection and classification rather than relying solely on opaque deep learning models.
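Grad-CAM itself is well defined: the channel weights are the spatially averaged gradients of the class score with respect to a convolutional layer's activations, and the heat map is the ReLU of the weighted channel sum. A compact NumPy sketch of just that computation (the ViT-GradCAM wiring around it is not shown and would differ):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heat map from a layer's activations and the gradient of the
    class score w.r.t. those activations (both shaped (K, H, W))."""
    weights = gradients.mean(axis=(1, 2))          # global-average-pooled grads
    cam = np.tensordot(weights, feature_maps, 1)   # weighted sum over channels
    cam = np.maximum(cam, 0)                       # ReLU keeps positive evidence
    if cam.max() > 0:
        cam /= cam.max()                           # normalize to [0, 1]
    return cam
```

In practice the resulting map is upsampled to the input resolution and overlaid on the dermatoscopic image to highlight lesion-relevant pixels.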


Subject(s)
Algorithms , Deep Learning , Dermoscopy , Skin Neoplasms , Humans , Dermoscopy/methods , Skin Neoplasms/diagnostic imaging , Skin Neoplasms/classification , Skin Neoplasms/pathology , Image Interpretation, Computer-Assisted/methods , Databases, Factual , Skin/diagnostic imaging , Skin/pathology
4.
Front Artif Intell ; 7: 1384709, 2024.
Article in English | MEDLINE | ID: mdl-39219699

ABSTRACT

Agriculture is considered the backbone of Tanzania's economy, with more than 60% of residents depending on it for survival. Maize is the country's dominant and primary food crop, accounting for 45% of all farmland production. However, its productivity is limited by the difficulty of detecting maize diseases early enough. Maize streak virus (MSV) and maize lethal necrosis virus (MLN) are common diseases that farmers often detect too late, creating the need for a method to detect these diseases early enough for timely treatment. This study investigated the potential of deep-learning models for the early detection of maize diseases in Tanzania. Data were collected through direct observation of plants in the Arusha, Kilimanjaro, and Manyara regions. The study proposed convolutional neural network (CNN) and vision transformer (ViT) models, trained on four classes of imagery data: MLN, Healthy, MSV, and WRONG. The results revealed that the ViT model surpassed the CNN model, with accuracies of 93.1% and 90.96%, respectively. Further studies should focus on mobile app development and deployment of the model with greater precision for early, real-life detection of the diseases mentioned above.

5.
Biomed Eng Lett ; 14(5): 1069-1077, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39220025

ABSTRACT

Multiclass classification of brain tumors from magnetic resonance (MR) images is challenging due to high inter-class similarity. To this end, convolutional neural networks (CNNs) have been widely adopted in recent studies. However, conventional CNN architectures fail to capture the small lesion patterns of brain tumors. To tackle this issue, we propose a global transformer network, dubbed GT-Net, for multiclass brain tumor classification. GT-Net mainly comprises a global transformer module (GTM) introduced on top of a backbone network. A generalized self-attention block (GSB) is proposed to capture feature inter-dependencies across both the spatial and channel dimensions, facilitating the extraction of detailed tumor lesion information while suppressing less important information. Further, multiple GSB heads are used in the GTM to leverage global feature dependencies. We evaluate GT-Net on a benchmark dataset with several backbone networks, and the results demonstrate the effectiveness of the GTM. Comparison with state-of-the-art methods further validates the superiority of our model.
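The abstract describes attention over both the spatial and the channel dimension. A deliberately simplified NumPy sketch of that dual-axis idea follows; the actual GSB design is not specified in the abstract, so the projection-free attention and the additive combination here are assumption-level illustrations only.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dual_axis_attention(x):
    """x: (N, C) feature tokens. The spatial branch relates token positions
    via an (N, N) affinity; the channel branch relates feature channels via
    a (C, C) affinity; the two refined maps are summed."""
    n, c = x.shape
    spatial = softmax(x @ x.T / np.sqrt(c)) @ x   # token-to-token mixing
    channel = x @ softmax(x.T @ x / np.sqrt(n))   # channel-to-channel mixing
    return spatial + channel
```

Attending along the channel axis lets small, low-contrast lesion cues in a few channels reinforce each other, which is the motivation the abstract gives for going beyond purely spatial attention.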

6.
Biomed Eng Lett ; 14(5): 1023-1035, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39220023

ABSTRACT

Deep learning-based methods for fast target segmentation of computed tomography (CT) images have become increasingly popular. The success of current deep learning methods usually depends on a large amount of labeled data, and labeling medical data is a time-consuming and laborious task. This paper therefore aims to enhance the segmentation of CT images with a semi-supervised learning method. To exploit the valid information in unlabeled data, we design a semi-supervised contrastive-learning network model based on entropy constraints. We use a CNN and a Transformer to capture the image's local and global feature information, respectively. Because the pseudo-labels generated by the teacher networks are unreliable and would degrade model performance if added to training directly, unreliable samples with high entropy values are discarded to keep the model from learning the wrong features. In the student network, we also introduce a residual squeeze-and-excitation module that learns the connections between the channels of each layer's features to obtain better segmentation performance. We demonstrate the effectiveness of the proposed method on the public COVID-19 CT dataset, considering three evaluation metrics: DSC, HD95, and JC. Compared with several existing state-of-the-art semi-supervised methods, our method improves DSC by 2.3% and JC by 2.5%, and reduces HD95 by 1.9 mm. In summary, this paper designs a semi-supervised medical image segmentation method that fuses a CNN and a Transformer and employs an entropy-constrained contrastive learning loss, improving the utilization of unlabeled medical images.
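The entropy-based filtering of teacher pseudo-labels can be sketched directly: compute the predictive entropy of each teacher softmax output and keep only low-entropy (confident) samples. The 0.5 fraction-of-maximum-entropy threshold below is a hypothetical choice, not the paper's.

```python
import numpy as np

def filter_pseudo_labels(probs, max_entropy_ratio=0.5):
    """probs: (N, K) teacher softmax outputs. Keep samples whose predictive
    entropy falls below a fraction of the maximum possible entropy log(K);
    high-entropy (unreliable) samples are discarded from training."""
    eps = 1e-12
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    threshold = max_entropy_ratio * np.log(probs.shape[1])
    keep = entropy < threshold
    pseudo = probs.argmax(axis=1)
    return pseudo[keep], keep
```

A confident prediction such as (0.98, 0.01, 0.01) survives the filter, while a near-uniform prediction is dropped before it can mislead the student network.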

7.
Article in English | MEDLINE | ID: mdl-39220623

ABSTRACT

Whole brain segmentation with magnetic resonance imaging (MRI) enables the non-invasive measurement of brain regions, including total intracranial volume (TICV) and posterior fossa volume (PFV). Extending existing whole brain segmentation methodology to incorporate intracranial measurements offers a more comprehensive analysis of brain structures. Despite its potential, generalizing deep learning techniques to intracranial measurements faces data availability constraints, since few manually annotated atlases include both whole brain and TICV/PFV labels. In this paper, we enhance the hierarchical transformer UNesT for whole brain segmentation so that it segments the whole brain into 133 classes and estimates TICV/PFV simultaneously. To address data scarcity, the model is first pretrained on 4859 T1-weighted (T1w) 3D volumes sourced from 8 different sites. These volumes are processed through a multi-atlas segmentation pipeline for label generation, while TICV/PFV labels remain unavailable at this stage. Subsequently, the model is finetuned with 45 T1w 3D volumes from the Open Access Series of Imaging Studies (OASIS), where both the 133 whole brain classes and TICV/PFV labels are available. We evaluate our method with Dice similarity coefficients (DSC) and show that the model produces precise TICV/PFV estimates while maintaining comparable performance on the 132 brain regions. Code and trained model are available at: https://github.com/MASILab/UNesT/wholebrainSeg.

8.
Sci Rep ; 14(1): 20355, 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39223198

ABSTRACT

To address low accuracy in fault diagnosis of oil-immersed transformers, poor state perception, and weak real-time collaboration during diagnosis feedback, a transformer fault diagnosis method integrated with digital twins is proposed. First, fault sample balance is achieved through Iterative Nearest Neighbor Oversampling (INNOS). Second, nine-dimensional ratio features are extracted, establishing the correlation between gases dissolved in the oil and fault types. Then, sparse principal component analysis (SPCA) is used for feature fusion and dimensionality reduction. Finally, the Aquila Optimizer (AO) is introduced to optimize the parameters of a Kernel Extreme Learning Machine (KELM), yielding the optimal AO-KELM diagnosis model. The final fault diagnosis accuracy reaches 98.1013%. Combined with a digital twin model of the transformer, real-time interactive mapping between the physical entity and virtual space is achieved, enabling online diagnosis of transformer faults. Experimental results show that the proposed method has high diagnostic accuracy and strong stability, providing a reference for the intelligent operation and maintenance of transformers.
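The abstract does not list the nine ratio features it extracts. As a hedged illustration of what dissolved-gas ratio features look like, the three classic IEC 60599 ratios are the standard starting point (the paper's actual nine-dimensional set is an unknown superset or variant of such ratios):

```python
def dga_ratios(gases):
    """Classic IEC 60599 dissolved-gas-analysis ratios for oil-immersed
    transformers. `gases` maps gas name -> concentration in ppm; a tiny
    epsilon guards against division by zero for absent gases."""
    eps = 1e-9
    return {
        'C2H2/C2H4': gases['C2H2'] / (gases['C2H4'] + eps),  # arcing indicator
        'CH4/H2':    gases['CH4']  / (gases['H2']   + eps),  # discharge vs thermal
        'C2H4/C2H6': gases['C2H4'] / (gases['C2H6'] + eps),  # thermal severity
    }
```

Ratio features like these, rather than raw concentrations, are what diagnosis models such as the AO-KELM classifier typically consume, since ratios are less sensitive to oil volume and sampling variation.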

9.
Network ; : 1-21, 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39224075

ABSTRACT

Fault detection, classification, and location prediction are crucial for maintaining the stability and reliability of modern power systems, reducing economic losses, and enhancing system protection sensitivity. This paper presents a novel Hierarchical Deep Learning Approach (HDLA) for accurate and efficient fault diagnosis in transmission lines. HDLA leverages two-stage transformer-based classification and regression models to perform Fault Detection (FD), Fault Type Classification (FTC), and Fault Location Prediction (FLP) directly from synchronized raw three-phase current and voltage samples. By bypassing the need for feature extraction, HDLA significantly reduces computational complexity while achieving superior performance compared to existing deep learning methods. The efficacy of HDLA is validated on a comprehensive dataset encompassing various fault scenarios with diverse types, locations, resistances, inception angles, and noise levels. The results demonstrate significant improvements in accuracy, recall, precision, and F1-score metrics for classification, and Mean Absolute Errors (MAEs) and Root Mean Square Errors (RMSEs) for prediction, showcasing the effectiveness of HDLA for real-time fault diagnosis in power systems.

10.
Heliyon ; 10(16): e35964, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39224303

ABSTRACT

Micro-expressions are extensively studied because they fully reflect individuals' genuine emotions, yet accurate micro-expression recognition remains challenging because the underlying facial muscle motion is so subtle. This paper therefore introduces a Graph Attention Mechanism-based Motion Magnification Guided Micro-Expression Recognition Network (GAM-MM-MER) to amplify delicate muscle motions and focus on key facial landmarks. First, we propose a Swin Transformer-based network for micro-expression motion magnification (ST-MEMM) that enhances the subtle motions in micro-expression videos, unveiling imperceptible facial muscle movements. Then, we propose a graph attention mechanism-based network for micro-expression recognition (GAM-MER), which optimizes facial key-area maps and prioritizes the adjacent nodes crucial for mitigating the influence of noisy neighbors, while attending to key feature information. Finally, experimental evaluations on the CASME II and SAMM datasets demonstrate the high accuracy and effectiveness of the proposed network, with results significantly superior to existing state-of-the-art approaches. Ablation studies further provide compelling evidence of the robustness of the proposed network, substantiating its efficacy in micro-expression recognition.

11.
Front Neurorobot ; 18: 1437737, 2024.
Article in English | MEDLINE | ID: mdl-39224907

ABSTRACT

Utilizing deep features from electroencephalography (EEG) data for emotional music composition provides a novel approach to creating personalized and emotionally rich music. Compared with textual data, converting continuous EEG and music data into discrete units presents significant challenges, particularly the lack of a clear, fixed vocabulary for standardizing EEG and audio data. Without such a standard, the mapping between EEG signals and musical elements (such as rhythm, melody, and emotion) is blurry and complex. We therefore propose a method that uses clustering to create discrete representations and a Transformer model to learn the reverse mapping. Specifically, the model uses clustering labels to segment signals and independently encodes the EEG and emotional music data to construct a vocabulary, thereby achieving a discrete representation. A time-series dictionary built with clustering algorithms more effectively captures and exploits the temporal and structural relationships between EEG and audio data. To counter the insensitivity to temporal information in heterogeneous data, we adopt a multi-head attention mechanism and positional encoding so that the model can focus on information in different subspaces, enhancing its understanding of the complex internal structure of EEG and audio data. In addition, to address the mismatch between local and global information in emotion-driven music generation, we introduce an audio masking prediction loss. Our method achieves 68.19% on the Hits@20 metric, a 4.9% improvement over other methods, indicating its effectiveness.
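Building a discrete vocabulary from continuous signals via clustering can be sketched with plain k-means: each fixed-length segment is assigned the index of its nearest centroid, and that index becomes its "token". Segment length, the number of clusters, the distance metric, and the function names are all assumptions for illustration, not the paper's settings.

```python
import numpy as np

def build_codebook(segments, k, iters=20, seed=0):
    """Naive k-means over fixed-length signal segments; the learned
    centroids act as a discrete vocabulary of size k."""
    rng = np.random.default_rng(seed)
    centroids = segments[rng.choice(len(segments), k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each segment to its nearest centroid (squared L2 distance).
        ids = np.argmin(((segments[:, None, :] - centroids) ** 2).sum(-1), axis=1)
        # Move each centroid to the mean of its assigned segments.
        for j in range(k):
            members = segments[ids == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def tokenize(segments, centroids):
    """Map each segment to the id of its nearest centroid (its token)."""
    return np.argmin(((segments[:, None, :] - centroids) ** 2).sum(-1), axis=1)
```

The resulting token sequences are what a Transformer can then consume, exactly as it would consume word ids in a text vocabulary.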

12.
Adv Sci (Weinh) ; : e2405404, 2024 Aug 29.
Article in English | MEDLINE | ID: mdl-39206846

ABSTRACT

Accurate prediction of protein-ligand binding affinities is an essential challenge in structure-based drug design. Despite recent advances in data-driven affinity prediction, accuracy is still limited, partly because such methods rely on static crystal structures whereas actual binding affinities are generally determined by the thermodynamic ensemble formed by protein and ligand. One effective way to approximate such a thermodynamic ensemble is molecular dynamics (MD) simulation. Here, an MD dataset containing 3,218 different protein-ligand complexes is curated, and Dynaformer, a graph-based deep learning model, is developed to predict binding affinities by learning the geometric characteristics of protein-ligand interactions from the MD trajectories. In silico experiments demonstrate that the model exhibits state-of-the-art scoring and ranking power on the CASF-2016 benchmark dataset, outperforming the methods reported hitherto. Moreover, in a virtual screening of heat shock protein 90 (HSP90) using Dynaformer, 20 candidates are identified and their binding affinities experimentally validated. Dynaformer displays promising results in virtual drug screening, revealing 12 hit compounds (two in the submicromolar range), including several novel scaffolds. Overall, these results demonstrate that the approach offers a promising avenue for accelerating the early drug discovery process.

13.
JMIR Infodemiology ; 4: e59641, 2024 Aug 29.
Article in English | MEDLINE | ID: mdl-39207842

ABSTRACT

BACKGROUND: Manually analyzing public health-related content from social media provides valuable insights into the beliefs, attitudes, and behaviors of individuals, shedding light on trends and patterns that can inform public understanding, policy decisions, targeted interventions, and communication strategies. Unfortunately, the time and effort needed from well-trained human subject matter experts makes extensive manual social media listening unfeasible. Generative large language models (LLMs) can potentially summarize and interpret large amounts of text, but it is unclear to what extent LLMs can glean subtle health-related meanings in large sets of social media posts and reasonably report health-related themes. OBJECTIVE: We aimed to assess the feasibility of using LLMs for topic model selection or inductive thematic analysis of large contents of social media posts by attempting to answer the following question: Can LLMs conduct topic model selection and inductive thematic analysis as effectively as humans did in a prior manual study, or at least reasonably, as judged by subject matter experts? METHODS: We asked the same research question and used the same set of social media content for both the LLM selection of relevant topics and the LLM analysis of themes as was conducted manually in a published study about vaccine rhetoric. We used the results from that study as background for this LLM experiment by comparing the results from the prior manual human analyses with the analyses from 3 LLMs: GPT4-32K, Claude-instant-100K, and Claude-2-100K. We also assessed if multiple LLMs had equivalent ability and assessed the consistency of repeated analysis from each LLM. RESULTS: The LLMs generally gave high rankings to the topics chosen previously by humans as most relevant. We reject a null hypothesis (P<.001, overall comparison) and conclude that these LLMs are more likely to include the human-rated top 5 content areas in their top rankings than would occur by chance. 
Regarding theme identification, LLMs identified several themes similar to those identified by humans, with very low hallucination rates. Variability occurred between LLMs and between test runs of an individual LLM. Despite not consistently matching the human-generated themes, subject matter experts found themes generated by the LLMs were still reasonable and relevant. CONCLUSIONS: LLMs can effectively and efficiently process large social media-based health-related data sets. LLMs can extract themes from such data that human subject matter experts deem reasonable. However, we were unable to show that the LLMs we tested can replicate the depth of analysis from human subject matter experts by consistently extracting the same themes from the same data. There is vast potential, once better validated, for automated LLM-based real-time social listening for common and rare health conditions, informing public health understanding of the public's interests and concerns and determining the public's ideas to address them.


Subject(s)
Social Media , Humans , Natural Language Processing
14.
Neural Netw ; 180: 106663, 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39208459

ABSTRACT

Utilizing large-scale pretrained models is a well-known strategy for enhancing performance on various target tasks, typically achieved by fine-tuning the pretrained models on those tasks. However, naïve fine-tuning may not fully leverage the knowledge embedded in pretrained models. In this study, we introduce a novel fine-tuning method specific to Transformer architectures, called stochastic cross-attention (StochCA), which modifies the Transformer's self-attention mechanism to selectively utilize knowledge from pretrained models during fine-tuning. Specifically, in each block, cross-attention is performed instead of self-attention stochastically, according to a predefined probability, with keys and values extracted from the corresponding block of a pretrained model. In this way, the queries and channel-mixing multi-layer perceptron layers of the target model are fine-tuned to learn how to effectively exploit the rich representations of pretrained models. To verify the effectiveness of StochCA, extensive experiments are conducted on benchmarks in transfer learning and domain generalization, areas where the exploitation of pretrained models is critical. The experimental results show the superiority of StochCA over state-of-the-art approaches in both areas. Furthermore, StochCA is complementary to existing approaches and can be combined with them to further improve performance. We release the code at https://github.com/daintlab/stochastic_cross_attention.
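The core StochCA idea, cross-attending to a pretrained block's features with a predefined probability, reduces to a small sketch. Projection matrices are omitted for brevity and the function names are hypothetical; the released repository linked above contains the real implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def stoch_cross_attention(x_target, x_pretrained, p, rng):
    """One attention step of the StochCA idea. With probability p, keys and
    values come from the frozen pretrained block's features (cross-attention);
    otherwise it is ordinary self-attention on the target features.

    x_target, x_pretrained: (N, D) token features from matching blocks.
    """
    kv = x_pretrained if rng.random() < p else x_target
    scores = x_target @ kv.T / np.sqrt(x_target.shape[-1])
    return softmax(scores) @ kv
```

Because queries always come from the target model, its query and MLP parameters are the ones that learn to exploit the pretrained representations, matching the description in the abstract.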

15.
Brain Sci ; 14(8)2024 Aug 21.
Article in English | MEDLINE | ID: mdl-39199530

ABSTRACT

Epilepsy seizure prediction is vital for enhancing the quality of life for individuals with epilepsy. In this study, we introduce a novel hybrid deep learning architecture, merging DenseNet and Vision Transformer (ViT) with an attention fusion layer for seizure prediction. DenseNet captures hierarchical features and ensures efficient parameter usage, while ViT offers self-attention mechanisms and global feature representation. The attention fusion layer effectively amalgamates features from both networks, guaranteeing the most relevant information is harnessed for seizure prediction. The raw EEG signals were preprocessed using the short-time Fourier transform (STFT) to implement time-frequency analysis and convert EEG signals into time-frequency matrices. Then, they were fed into the proposed hybrid DenseNet-ViT network model to achieve end-to-end seizure prediction. The CHB-MIT dataset, including data from 24 patients, was used for evaluation and the leave-one-out cross-validation method was utilized to evaluate the performance of the proposed model. Our results demonstrate superior performance in seizure prediction, exhibiting high accuracy and low redundancy, which suggests that combining DenseNet, ViT, and the attention mechanism can significantly enhance prediction capabilities and facilitate more precise therapeutic interventions.
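The STFT preprocessing step described above is standard: window the EEG signal, take the FFT of each frame, and stack the magnitudes into a time-frequency matrix. A minimal NumPy version follows; the frame length, hop size, and Hann window are typical choices, not necessarily the paper's.

```python
import numpy as np

def stft_magnitude(signal, frame_len=64, hop=32):
    """Short-time Fourier transform magnitude: slide a Hann-windowed frame
    along the signal and FFT each frame, yielding a (frames x freq-bins)
    time-frequency matrix suitable as image-like network input."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames)
```

Each row is one time step and each column one frequency bin, so the matrix can be fed to the DenseNet and ViT branches the way an image would be.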

16.
Sensors (Basel) ; 24(16)2024 Aug 09.
Article in English | MEDLINE | ID: mdl-39204844

ABSTRACT

While digital twin networks (DTNs) can potentially estimate network strategy performance in pre-validation environments, they are still in their infancy for split learning (SL) tasks, facing challenges such as unknown non-i.i.d. data distributions, inaccurate channel states, and misreported resource availability across devices. To address these challenges, this paper proposes a TransNeural algorithm for a DTN pre-validation environment that estimates SL latency and convergence. First, the TransNeural algorithm integrates transformers to efficiently model data similarities between different devices, since differing data distributions and device participation order greatly influence SL training convergence. Second, it leverages a neural network to automatically establish the complex relationships between SL latency/convergence and the data distributions, wireless and computing resources, dataset sizes, and training iterations; deviations in user reports are also accounted for in the estimation. Simulations show that the TransNeural algorithm improves latency estimation accuracy by 9.3% and convergence estimation accuracy by 22.4% compared with traditional equation-based methods.

17.
Sensors (Basel) ; 24(16)2024 Aug 11.
Article in English | MEDLINE | ID: mdl-39204882

ABSTRACT

The centralized coordination of Connected and Automated Vehicles (CAVs) at unsignalized intersections aims to enhance traffic efficiency, driving safety, and passenger comfort. Autonomous Intersection Management (AIM) systems introduce a novel approach for centralized coordination. However, existing rule-based and optimization methods often face the challenges of poor generalization and low computational efficiency when dealing with complex traffic environments and highly dynamic traffic conditions. Additionally, current Reinforcement Learning (RL)-based methods encounter difficulties around policy inference and safety. To address these issues, this study proposes Constraint-Guided Behavior Transformer for Safe Reinforcement Learning (CoBT-SRL), which uses transformers as the policy network to achieve efficient decision-making for vehicle driving behaviors. This method leverages the ability of transformers to capture long-range dependencies and improve data sample efficiency by using historical states, actions, and reward and cost returns to predict future actions. Furthermore, to enhance policy exploration performance, a sequence-level entropy regularizer is introduced to encourage policy exploration while ensuring the safety of policy updates. Simulation results indicate that CoBT-SRL exhibits stable training progress and converges effectively. CoBT-SRL outperforms other RL methods and vehicle intersection coordination schemes (VICS) based on optimal control in terms of traffic efficiency, driving safety, and passenger comfort.

18.
Sensors (Basel) ; 24(16)2024 Aug 12.
Article in English | MEDLINE | ID: mdl-39204917

ABSTRACT

No-reference image quality assessment aims to evaluate image quality in line with human subjective perception. Current methods struggle to attend to global and local information simultaneously and suffer information loss from image resizing. To address these issues, we propose a model that combines the Swin Transformer with natural scene statistics. The model uses the Swin Transformer to extract multi-scale features and incorporates a feature enhancement module and deformable convolution to improve feature representation and adapt better to structural variations in images; it applies dual-branch attention to focus on key areas, aligning the assessment more closely with human visual perception, while the natural scene statistics compensate for the information lost to image resizing. Additionally, we use a normalized loss function to accelerate model convergence and enhance stability. We evaluate our model on six standard image quality assessment datasets (both synthetic and authentic) and show that it achieves advanced results across multiple datasets. Compared with the advanced DACNN method, our model achieves Spearman rank-order correlation coefficients of 0.922 and 0.923 on the KADID and KonIQ datasets, improvements of 1.9% and 2.4%, respectively, demonstrating outstanding performance on both synthetic and authentic scenes.
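A common concrete form of "natural scene statistics" in no-reference quality assessment is the mean-subtracted contrast-normalized (MSCN) transform used by BRISQUE-style models; whether this paper uses exactly this form is an assumption, so the sketch below (with a hypothetical 7x7 local window) is illustrative only.

```python
import numpy as np

def mscn_coefficients(image, ksize=7, C=1.0):
    """Mean-subtracted contrast-normalized coefficients:
    (I - local mean) / (local std + C), computed over a sliding window.
    Distortions perturb the statistics of these coefficients, which is
    what NSS-based quality features measure."""
    pad = ksize // 2
    padded = np.pad(image.astype(float), pad, mode='reflect')
    H, W = image.shape
    mu = np.zeros((H, W))
    sigma = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + ksize, j:j + ksize]
            mu[i, j] = patch.mean()
            sigma[i, j] = patch.std()
    return (image - mu) / (sigma + C)
```

Because MSCN statistics are computed at the original resolution, they retain information that a resized network input would lose, which matches the compensating role the abstract assigns to the NSS branch.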

19.
Sensors (Basel) ; 24(16)2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39205066

ABSTRACT

Automated segmentation algorithms for dermoscopic images serve as effective tools that assist dermatologists in clinical diagnosis. While existing deep learning-based skin lesion segmentation algorithms have achieved certain success, challenges remain in accurately delineating the boundaries of lesion regions in dermoscopic images with irregular shapes, blurry edges, and occlusions by artifacts. To address these issues, a multi-attention codec network with selective and dynamic fusion (MASDF-Net) is proposed for skin lesion segmentation in this study. In this network, we use the pyramid vision transformer as the encoder to model the long-range dependencies between features, and we innovatively designed three modules to further enhance the performance of the network. Specifically, the multi-attention fusion (MAF) module allows for attention to be focused on high-level features from various perspectives, thereby capturing more global contextual information. The selective information gathering (SIG) module improves the existing skip-connection structure by eliminating the redundant information in low-level features. The multi-scale cascade fusion (MSCF) module dynamically fuses features from different levels of the decoder part, further refining the segmentation boundaries. We conducted comprehensive experiments on the ISIC 2016, ISIC 2017, ISIC 2018, and PH2 datasets. The experimental results demonstrate the superiority of our approach over existing state-of-the-art methods.


Subject(s)
Algorithms , Neural Networks, Computer , Humans , Deep Learning , Dermoscopy/methods , Image Processing, Computer-Assisted/methods , Skin/diagnostic imaging , Skin/pathology , Image Interpretation, Computer-Assisted/methods
20.
Sensors (Basel) ; 24(16)2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39205065

ABSTRACT

The precise recognition of entire classroom meta-actions is a crucial challenge for the tailored adaptive interpretation of student behavior, given the intricacy of these actions. This paper proposes a Dynamic Position Embedding-based Model for Student Classroom Complete Meta-Action Recognition (DPE-SAR) based on the Video Swin Transformer. The model utilizes a dynamic positional embedding technique to perform conditional positional encoding. Additionally, it incorporates a deep convolutional network to improve the parsing ability of the spatial structure of meta-actions. The full attention mechanism of ViT3D is used to extract the potential spatial features of actions and capture the global spatial-temporal information of meta-actions. The proposed model exhibits exceptional performance compared to baseline models in action recognition as observed in evaluations on public datasets and smart classroom meta-action recognition datasets. The experimental results confirm the superiority of the model in meta-action recognition.
