Results 1 - 20 of 23
1.
Article in English | MEDLINE | ID: mdl-38781059

ABSTRACT

This paper proposes a novel transformer-based framework to generate accurate class-specific object localization maps for weakly supervised semantic segmentation (WSSS). Leveraging the insight that the attended regions of the one-class token in the standard vision transformer can generate class-agnostic localization maps, we investigate the transformer's capacity to capture class-specific attention for class-discriminative object localization by learning multiple class tokens. We present the Multi-Class Token transformer, which incorporates multiple class tokens to enable class-aware interactions with patch tokens. This is facilitated by a class-aware training strategy that establishes a one-to-one correspondence between output class tokens and ground-truth class labels. We also introduce a Contrastive-Class-Token (CCT) module to enhance the learning of discriminative class tokens, enabling the model to better capture the unique characteristics of each class. Consequently, the proposed framework effectively generates class-discriminative object localization maps from the class-to-patch attentions associated with different class tokens. To refine these localization maps, we propose the utilization of patch-level pairwise affinity derived from the patch-to-patch transformer attention. Furthermore, the proposed framework seamlessly complements the Class Activation Mapping (CAM) method, yielding significant improvements in WSSS performance on PASCAL VOC 2012 and MS COCO 2014. These results underline the importance of the class token for WSSS. The code and models are publicly available.
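
As a rough illustration of the class-to-patch attention mentioned in this abstract, the sketch below computes one softmax attention row per class token over a set of patch tokens and reshapes it into a small localization map. It is a simplification under stated assumptions: the tokens are made-up toy vectors, there are no learned query/key projections or multiple heads, and the function names are hypothetical rather than taken from the authors' code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_to_patch_attention(class_tokens, patch_tokens):
    """Return, for each class token, a softmax attention row over all patches."""
    scale = 1.0 / math.sqrt(len(patch_tokens[0]))
    rows = []
    for query in class_tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in patch_tokens]
        rows.append(softmax(scores))
    return rows


def to_grid(row, height, width):
    """Reshape a flat attention row into a height x width localization map."""
    return [row[r * width:(r + 1) * width] for r in range(height)]


# Toy input (made up): 2 class tokens, 4 patches on a 2x2 grid, embedding size 2.
class_tokens = [[1.0, 0.0], [0.0, 1.0]]
patch_tokens = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0], [0.0, 0.0]]

attention = class_to_patch_attention(class_tokens, patch_tokens)
maps = [to_grid(row, 2, 2) for row in attention]
```

Each class token ends up with its own map, so the first token peaks on the first patch and the second token on the second patch.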

2.
Sci Rep ; 14(1): 6163, 2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38485985

ABSTRACT

This study explores the effectiveness of Explainable Artificial Intelligence (XAI) for predicting suicide risk from medical tabular data. Given the common challenge of limited datasets in health-related Machine Learning (ML) applications, we use data augmentation in tandem with ML to enhance the identification of individuals at high risk of suicide. We use SHapley Additive exPlanations (SHAP) for XAI and traditional correlation analysis to rank feature importance, pinpointing primary factors influencing suicide risk and preventive measures. Experimental results show that the Random Forest (RF) model excels in accuracy, F1 score, and AUC (>97% across metrics). According to SHAP, anger issues, depression, and social isolation emerge as top predictors of suicide risk, while individuals with high incomes, esteemed professions, and higher education present the lowest risk. Our findings underscore the effectiveness of ML and XAI in suicide risk assessment, offering valuable insights for psychiatrists and facilitating informed clinical decisions.

Subjects
Artificial Intelligence; Suicide; Humans; Machine Learning; Anger; Risk Assessment
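
To make the idea behind SHAP-style feature attribution concrete, the sketch below computes exact Shapley values for a tiny, entirely hypothetical scoring function by averaging each feature's marginal contribution over all feature orderings. It does not use the SHAP library or the study's data or model; the function `toy_score` and its coefficients are invented for illustration only.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for idx in order:
            current[idx] = instance[idx]  # switch feature idx from baseline to actual
            value = predict(current)
            totals[idx] += value - previous
            previous = value
    return [t / len(orderings) for t in totals]


def toy_score(features):
    """Hypothetical model with one interaction term (not a clinical model)."""
    x0, x1, x2 = features
    return 0.5 * x0 + 0.3 * x1 - 0.2 * x2 + 0.1 * x0 * x1


instance = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_score, instance, baseline)
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is split evenly between the two features involved.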
3.
Article in English | MEDLINE | ID: mdl-38478447

ABSTRACT

Most existing weakly supervised semantic segmentation (WSSS) methods rely on class activation mapping (CAM) to extract coarse class-specific localization maps using image-level labels. Prior works have commonly used an off-line heuristic thresholding process that combines the CAM maps with off-the-shelf saliency maps produced by a general pretrained saliency model to produce more accurate pseudo-segmentation labels. We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from these saliency maps and the significant intertask correlation between saliency detection and semantic segmentation. In the proposed AuxSegNet+, saliency detection and multilabel image classification are used as auxiliary tasks to improve the primary task of semantic segmentation with only image-level ground-truth labels. We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps. In particular, we propose a cross-task dual-affinity learning module to learn both pairwise and unary affinities, which are used to enhance the task-specific features and predictions by aggregating both query-dependent and query-independent global context for both saliency detection and semantic segmentation. The learned cross-task pairwise affinity can also be used to refine and propagate CAM maps to provide better pseudo labels for both tasks. Iterative improvement of segmentation performance is enabled by cross-task affinity learning and pseudo-label updating. Extensive experiments demonstrate the effectiveness of the proposed approach with new state-of-the-art WSSS results on the challenging PASCAL VOC and MS COCO benchmarks.
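
The refinement of CAM maps with a pairwise affinity can be pictured as a few steps of score propagation through a row-normalized affinity matrix. The sketch below is one common way to do this and is only an assumption about the general mechanism: the affinity values are hand-written rather than learned, and the function names are hypothetical.

```python
def row_normalize(matrix):
    """Scale each row to sum to 1 so it acts as a transition distribution."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total for v in row] if total > 0 else list(row))
    return normalized


def propagate(cam, affinity, steps=1):
    """Spread CAM scores between pixels in proportion to their pairwise affinity."""
    transition = row_normalize(affinity)
    scores = list(cam)
    for _ in range(steps):
        scores = [sum(t * s for t, s in zip(row, scores)) for row in transition]
    return scores


# Toy example (made up): pixels 0-1 belong to the object, pixels 2-3 to the background.
affinity = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
]
cam = [1.0, 0.2, 0.0, 0.1]  # coarse CAM: only pixel 0 fires strongly
refined = propagate(cam, affinity, steps=2)
```

After propagation the weakly activated object pixel inherits a high score from its neighbour, while the background pixels stay low.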

4.
IEEE Trans Image Process ; 32: 4800-4811, 2023.
Article in English | MEDLINE | ID: mdl-37610890

ABSTRACT

Cross-resolution person re-identification (CRReID) is a challenging and practical problem that involves matching low-resolution (LR) query identity images against high-resolution (HR) gallery images. Query images often suffer from resolution degradation due to the different capturing conditions from real-world cameras. State-of-the-art solutions for CRReID either learn a resolution-invariant representation or adopt a super-resolution (SR) module to recover the missing information from the LR query. In this paper, we propose an alternative SR-free paradigm to directly compare HR and LR images via a dynamic metric that is adaptive to the resolution of a query image. We realize this idea by learning resolution-adaptive representations for cross-resolution comparison. We propose two resolution-adaptive mechanisms to achieve this. The first mechanism encodes the resolution specifics into different subvectors in the penultimate layer of the deep neural network, creating a varying-length representation. To better extract resolution-dependent information, we further propose to learn resolution-adaptive masks for intermediate residual feature blocks. A novel progressive learning strategy is proposed to train those masks properly. These two mechanisms are combined to boost the performance of CRReID. Experimental results show that the proposed method outperforms existing approaches and achieves state-of-the-art performance on multiple CRReID benchmarks.

5.
Comput Methods Programs Biomed ; 240: 107685, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37429247

ABSTRACT

BACKGROUND AND OBJECTIVE: The generation of three-dimensional (3D) medical images has great application potential since it takes into account the 3D anatomical structure. Two problems prevent effective training of a 3D medical generative model: (1) 3D medical images are expensive to acquire and annotate, resulting in an insufficient number of training images, and (2) a large number of parameters are involved in 3D convolution. METHODS: We propose a novel GAN model called 3D Split&Shuffle-GAN. To address the 3D data scarcity issue, we first pre-train a two-dimensional (2D) GAN model using abundant image slices and inflate the 2D convolution weights to improve the initialization of the 3D GAN. Novel 3D network architectures are proposed for both the generator and discriminator of the GAN model to significantly reduce the number of parameters while maintaining the quality of image generation. Several weight inflation strategies and parameter-efficient 3D architectures are investigated. RESULTS: Experiments on both heart (Stanford AIMI Coronary Calcium) and brain (Alzheimer's Disease Neuroimaging Initiative) datasets show that our method leads to improved 3D image generation quality (an improvement of 14.7 in Fréchet inception distance) with significantly fewer parameters (only 48.5% of the baseline method). CONCLUSIONS: We built a parameter-efficient 3D medical image generation model. Owing to its efficiency and effectiveness, it has the potential to generate high-quality 3D brain and heart images for real use cases.

Subjects
Image Processing, Computer-Assisted; Imaging, Three-Dimensional; Image Processing, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Brain/diagnostic imaging; Neuroimaging
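
One widely used way to inflate 2D convolution weights into 3D is to repeat the 2D kernel along the new depth axis and divide by the depth, so that a volume made of identical slices produces the same response as the 2D kernel did on one slice. The abstract says several inflation strategies were investigated and does not say which one was adopted, so the sketch below should be read as an example of the general idea only; all names are hypothetical.

```python
def inflate_kernel(kernel_2d, depth):
    """Repeat a 2D kernel `depth` times along a new axis, rescaled by 1/depth."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return [[[w / depth for w in row] for row in kernel_2d] for _ in range(depth)]


def response_2d(kernel_2d, patch_2d):
    """Correlation of a 2D kernel with an equally sized 2D patch."""
    return sum(w * x for k_row, p_row in zip(kernel_2d, patch_2d) for w, x in zip(k_row, p_row))


def response_3d(kernel_3d, volume):
    """Correlation of a 3D kernel with an equally sized 3D volume."""
    return sum(response_2d(k, p) for k, p in zip(kernel_3d, volume))


# Toy data (made up): a 3x3 edge filter and a volume of three identical slices.
kernel = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
patch = [[3.0, 1.0, 0.0], [4.0, 1.0, 1.0], [5.0, 2.0, 2.0]]
kernel_3d = inflate_kernel(kernel, depth=3)
volume = [patch, patch, patch]
```

With this rescaling, the inflated kernel starts out behaving like the pretrained 2D filter, which is why it can serve as an initialization for the 3D network.
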
6.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 6511-6536, 2023 May.
Article in English | MEDLINE | ID: mdl-36063506

ABSTRACT

In recent years, advancements in machine learning (ML) techniques, in particular deep learning (DL) methods, have gained a lot of momentum in solving inverse imaging problems, often surpassing the performance provided by hand-crafted approaches. Traditionally, analytical methods have been used to solve inverse imaging problems such as image restoration, inpainting, and superresolution. Unlike analytical methods for which the problem is explicitly defined and the domain knowledge is carefully engineered into the solution, DL models do not benefit from such prior knowledge and instead make use of large datasets to predict an unknown solution to the inverse problem. Recently, a new paradigm of training deep models using a single image, named untrained neural network prior (UNNP), has been proposed to solve a variety of inverse tasks, e.g., restoration and inpainting. Since then, many researchers have proposed various applications and variants of UNNP. In this paper, we present a comprehensive review of such studies and various UNNP applications for different tasks and highlight open problems that require further research.

7.
IEEE Trans Image Process ; 31: 4803-4816, 2022.
Article in English | MEDLINE | ID: mdl-35830405

ABSTRACT

Person re-identification (re-ID) is of great importance to video surveillance systems by estimating the similarity between a pair of cross-camera person shots. Current methods for estimating such similarity require a large number of labeled samples for supervised training. In this paper, we present a pseudo-pair based self-similarity learning approach for unsupervised person re-ID without human annotations. Unlike conventional unsupervised re-ID methods that use pseudo labels based on global clustering, we construct patch surrogate classes as initial supervision, and propose to assign pseudo labels to images through the pairwise gradient-guided similarity separation. This can cluster images in pseudo pairs, and the pseudos can be updated during training. Based on pseudo pairs, we propose to improve the generalization of the similarity function via a novel self-similarity learning: it learns local discriminative features from individual images via intra-similarity, and discovers the patch correspondence across images via inter-similarity. The intra-similarity learning is based on channel attention to detect diverse local features from an image. The inter-similarity learning employs a deformable convolution with a non-local block to align patches for cross-image similarity. Experimental results on several re-ID benchmark datasets demonstrate the superiority of the proposed method over the state of the art.

Subjects
Biometric Identification; Algorithms; Benchmarking; Biometric Identification/methods; Cluster Analysis; Humans
8.
Plants (Basel) ; 11(12), 2022 Jun 20.
Article in English | MEDLINE | ID: mdl-35736770

ABSTRACT

Gene models are regions of the genome that can be transcribed into RNA and translated to proteins, or belong to a class of non-coding RNA genes. The prediction of gene models is a complex process that can be unreliable, leading to false positive annotations. To help support the calling of confident conserved gene models and minimize false positives arising during gene model prediction, we have developed Truegene, a machine learning approach to classify potential low confidence gene models using 14 gene and 41 protein-based characteristics. Amino acid and nucleotide sequence-based features were calculated for conserved (high confidence) and non-conserved (low confidence) annotated genes from the published Pisum sativum Cameor genome. These features were used to train eXtreme Gradient Boost (XGBoost) classifier models to predict whether a gene model is likely to be real. The optimized models demonstrated a prediction accuracy ranging from 87% to 90% and an F1 score of 0.91-0.94. We used SHapley Additive exPlanations (SHAP) and feature importance plots to identify the features that contribute to the model predictions, and we show that protein and gene-based features can be used to build accurate models for gene prediction that have applications in supporting future gene annotation processes.

9.
IEEE Trans Image Process ; 31: 2094-2105, 2022.
Article in English | MEDLINE | ID: mdl-35196234

ABSTRACT

The goal of ground-to-aerial image geo-localization is to determine the location of a ground query image by matching it against a reference database consisting of aerial/satellite images. This task is highly challenging due to the large appearance difference caused by extreme changes in viewpoint and orientation. In this work, we show that the training difficulty is an important cue that can be leveraged to improve metric learning on cross-view images. More specifically, we propose a new Soft Exemplar Highlighting (SEH) loss to achieve online soft selection of exemplars. Adaptive weights are generated for exemplars by measuring their associated training difficulty using distance rectified logistic regression. These weights are then constrained to remove simple exemplars from training and to truncate the large weights of extremely hard exemplars, so that training avoids being trapped in a local optimum. We further use the proposed SEH loss to train two mainstream convolutional neural networks for ground-to-aerial image-based geo-localization. Experimental results on two benchmark cross-view image datasets demonstrate that the proposed method achieves significant improvements in feature discriminativeness and outperforms the state-of-the-art image-based geo-localization methods.

10.
IEEE Trans Pattern Anal Mach Intell ; 44(4): 1738-1764, 2022 Apr.
Article in English | MEDLINE | ID: mdl-33079659

ABSTRACT

Estimating depth from RGB images is a long-standing ill-posed problem, which has been explored for decades by the computer vision, graphics, and machine learning communities. Among the existing techniques, stereo matching remains one of the most widely used in the literature due to its strong connection to the human binocular system. Traditionally, stereo-based depth estimation has been addressed through matching hand-crafted features across multiple images. Despite the extensive amount of research, these traditional techniques still suffer in the presence of highly textured areas, large uniform regions, and occlusions. Motivated by their growing success in solving various 2D and 3D vision problems, deep learning for stereo-based depth estimation has attracted a growing interest from the community, with more than 150 papers published in this area between 2014 and 2019. This new generation of methods has demonstrated a significant leap in performance, enabling applications such as autonomous driving and augmented reality. In this paper, we provide a comprehensive survey of this new and continuously growing field of research, summarize the most commonly used pipelines, and discuss their benefits and limitations. In retrospect of what has been achieved so far, we also conjecture what the future may hold for deep learning-based stereo for depth estimation research.


Subjects
Deep Learning; Algorithms; Humans; Machine Learning
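
For readers unfamiliar with why stereo matching yields depth, the standard relation for a rectified stereo pair is depth = focal length x baseline / disparity. The short sketch below applies that textbook formula; the camera parameters are made-up example values, not figures from the survey.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in metres of a point seen by a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 700-pixel focal length, cameras 0.5 m apart.
near = depth_from_disparity(disparity_px=70.0, focal_length_px=700.0, baseline_m=0.5)
far = depth_from_disparity(disparity_px=35.0, focal_length_px=700.0, baseline_m=0.5)
```

Halving the disparity doubles the estimated depth, which is why small matching errors matter most for distant objects.
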
11.
Sensors (Basel) ; 20(2), 2020 Jan 13.
Article in English | MEDLINE | ID: mdl-31941132

ABSTRACT

Across the globe, remote image data is rapidly being collected for the assessment of benthic communities from shallow to extremely deep waters on continental slopes to the abyssal seas. Exploiting this data is presently limited by the time it takes for experts to identify organisms found in these images. With this limitation in mind, a large effort has been made globally to introduce automation and machine learning algorithms to accelerate both classification and assessment of marine benthic biota. One major issue lies with organisms that move with swell and currents, such as kelps. This paper presents an automatic hierarchical classification method (local binary classification), as opposed to the conventional flat classification, to classify kelps in images collected by autonomous underwater vehicles. The proposed kelp classification approach exploits learned feature representations extracted from deep residual networks. We show that these generic features outperform the traditional off-the-shelf CNN features and the conventional hand-crafted features. Experiments also demonstrate that the hierarchical classification method outperforms the traditional parallel multi-class classifications by a significant margin (90.0% vs. 57.6% and 77.2% vs. 59.0%) on the Benthoz15 and Rottnest datasets, respectively. Furthermore, we compare different hierarchical classification approaches and experimentally show that the sibling hierarchical training approach outperforms the inclusive hierarchical approach by a significant margin. We also report an application of our proposed method to study the change in kelp cover over time for annually repeated AUV surveys.

Subjects
Algorithms; Deep Learning; Kelp/classification; Australia; Automation; Databases as Topic; Image Processing, Computer-Assisted; Islands
12.
Article in English | MEDLINE | ID: mdl-31484121

ABSTRACT

Human actions represented with 3D skeleton sequences are robust to cluttered backgrounds and illumination changes. In this paper, we investigate skeleton-based action prediction, which aims to recognize an action from a partial skeleton sequence that contains incomplete action information. We propose a new Latent Global Network based on adversarial learning for action prediction. We demonstrate that the proposed network provides latent long-term global information that is complementary to the local action information of the partial sequences and helps improve action prediction. We show that action prediction can be improved by combining the latent global information with the local action information. We test the proposed method on three challenging skeleton datasets and report state-of-the-art performance.

13.
Neural Netw ; 105: 419-430, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29945061

ABSTRACT

By introducing sign constraints on the weights, this paper proposes sign constrained rectifier networks (SCRNs), whose training can be solved efficiently by the well-known majorization-minimization (MM) algorithms. We prove that the proposed two-hidden-layer SCRNs, which exhibit negative weights in the second hidden layer and negative weights in the output layer, are capable of separating any number of disjoint pattern sets. Furthermore, the proposed two-hidden-layer SCRNs can decompose the patterns of each class into several clusters so that each cluster is convexly separable from all the patterns from the other classes. This provides a means to learn the pattern structures and analyse the discriminant factors between different classes of patterns. Experimental results are provided to show the benefits of sign constraints in improving classification performance and the efficiency of the proposed MM algorithm.

Subjects
Machine Learning; Neural Networks, Computer
14.
ACS Nano ; 12(6): 6079-6088, 2018 Jun 26.
Article in English | MEDLINE | ID: mdl-29792677

ABSTRACT

In this work, we present a high-performance smart electronic nose (E-nose) system consisting of a multiplexed tin oxide (SnO2) nanotube sensor array, read-out circuit, wireless data transmission unit, mobile phone receiver, and data processing application (App). Using the designed nanotube sensor device structure in conjunction with multiple electrode materials, high-sensitivity gas detection and discrimination have been achieved at room temperature, enabling a 1000 times reduction of the sensor's power consumption as compared to a conventional device using thin film SnO2. The experimental results demonstrate that the developed E-nose can identify indoor target gases using a simple vector-matching gas recognition algorithm. In addition, the fabricated E-nose has achieved state-of-the-art sensitivity for H2 and benzene detection at room temperature with metal oxide sensors. Such a smart E-nose system can address the imperative needs for distributed environmental monitoring in smart homes, smart buildings, and smart cities.
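
The abstract does not spell out the vector-matching rule used for gas recognition. One plausible reading, sketched below purely as an assumption, is to compare the array's response vector with stored reference vectors and pick the most similar one; cosine similarity is used here because it ignores overall response magnitude. The reference values and names are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally long response vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recognize(response, library):
    """Return the library entry whose reference vector best matches the response."""
    return max(library, key=lambda name: cosine_similarity(response, library[name]))


# Hypothetical reference patterns for a 4-sensor array (values are made up).
library = {
    "hydrogen": [0.9, 0.2, 0.1, 0.4],
    "benzene": [0.2, 0.8, 0.5, 0.1],
}
sample = [1.7, 0.5, 0.3, 0.9]  # roughly twice the hydrogen pattern
label = recognize(sample, library)
```

Because the comparison is direction-based, a stronger or weaker response with the same pattern across sensors maps to the same label.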

15.
IEEE Trans Image Process ; 27(6): 2842-2855, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29570086

ABSTRACT

This paper presents a new representation of skeleton sequences for 3D action recognition. Existing methods based on hand-crafted features or recurrent neural networks cannot adequately capture the complex spatial structures and the long-term temporal dynamics of the skeleton sequences, which are very important to recognize the actions. In this paper, we propose to transform each channel of the 3D coordinates of a skeleton sequence into a clip. Each frame of the generated clip represents the temporal information of the entire skeleton sequence and one particular spatial relationship between the skeleton joints. The entire clip incorporates multiple frames with different spatial relationships, which provide useful spatial structural information of the human skeleton. We also propose a multitask convolutional neural network (MTCNN) to learn the generated clips for action recognition. The proposed MTCNN processes all the frames of the generated clips in parallel to explore the spatial and temporal information of the skeleton sequences. The proposed method has been extensively tested on six challenging benchmark datasets. Experimental results consistently demonstrate the superiority of the proposed clip representation and the feature learning method for 3D action recognition compared to the existing techniques.

16.
Sensors (Basel) ; 15(11): 29192-208, 2015 Nov 19.
Article in English | MEDLINE | ID: mdl-26610492

ABSTRACT

This paper presents a high-efficiency inductorless self-controlled rectifier for piezoelectric energy harvesting. High efficiency is achieved by discharging the piezoelectric device (PD) capacitance each time the current produced by the PD changes polarity. This is achieved automatically without the use of delay lines, thereby making the proposed circuit compatible with any type of PD. In addition, the proposed rectifier alleviates the need for an inductor, making it suitable for on-chip integration. Reported experimental results show that the proposed rectifier can harvest up to 3.9 times more energy than a full wave bridge rectifier.

17.
Appl Spectrosc ; 69(4): 473-80, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25742260

ABSTRACT

Despite the importance of data reduction as part of the processing of reflection-based classifications, this study represents one of the first in which the effects of both spatial and spectral data reductions on classification accuracies are quantified. Furthermore, the effects of approaches to data reduction were quantified for two separate classification methods, linear discriminant analysis (LDA) and support vector machine (SVM). As the model dataset, reflection data were acquired using a hyperspectral camera in 230 spectral channels from 401 to 879 nm (spectral resolution of 2.1 nm) from field pea (Pisum sativum) samples with and without internal pea weevil (Bruchus pisorum) infestation. We deployed five levels of spatial data reduction (binning) and eight levels of spectral data reduction (40 datasets). Forward stepwise LDA was used to select and include only spectral channels contributing the most to the separation of pixels from non-infested and infested field peas. Classification accuracies obtained with LDA and SVM were based on the classification of independent validation datasets. Overall, SVMs had significantly higher classification accuracies than LDAs (P < 0.01). There was a negative association between pixel resolution and classification accuracy, while spectral binning equivalent to up to 98% data reduction had negligible effect on classification accuracies. This study supports the potential use of reflection-based technologies in the quality control of food products with internal defects, and it highlights that spatial and spectral data reductions can (1) improve classification accuracies, (2) vastly decrease computer constraints, and (3) reduce analytical concerns associated with classifications of large and high-dimensional datasets.


Subjects
Food Analysis/methods; Image Processing, Computer-Assisted/methods; Spectrum Analysis/methods; Animals; Discriminant Analysis; Food/standards; Pisum sativum/chemistry; Pisum sativum/parasitology; Support Vector Machine; Weevils/chemistry
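
Spectral binning, as a form of data reduction, can be sketched as averaging groups of adjacent channels. The example below is an assumption about the general operation, not the study's exact procedure: the spectrum is synthetic, and the bin size of 46 is chosen only because it turns 230 channels into 5, which is roughly a 98% reduction.

```python
def bin_spectrum(spectrum, bin_size):
    """Average consecutive groups of `bin_size` channels (last group may be shorter)."""
    if bin_size < 1:
        raise ValueError("bin_size must be at least 1")
    chunks = (spectrum[i:i + bin_size] for i in range(0, len(spectrum), bin_size))
    return [sum(chunk) / len(chunk) for chunk in chunks]


# Synthetic 230-channel reflectance profile (made up).
spectrum = [float(i) for i in range(230)]
binned = bin_spectrum(spectrum, bin_size=46)
reduction = 1.0 - len(binned) / len(spectrum)
```

Spatial binning works the same way on neighbouring pixels instead of neighbouring channels.
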
18.
Opt Express ; 19(6): 5565-73, 2011 Mar 14.
Article in English | MEDLINE | ID: mdl-21445195

ABSTRACT

We report a micropolarizer array technology exploiting "guest-host" interactions in liquid crystals for visible imaging polarimetry. We demonstrate high resolution thin micropolarizer arrays with a 5 µm×5 µm pixel pitch and a thickness of 0.95 µm. With the "host" nematic liquid crystal molecules photo-aligned by sulfonic azo-dye SD1, we report averaged major principal transmittance, polarization efficiency and order parameter of 80.3%, 0.863 and 0.848, respectively across the 400 nm-700 nm visible spectrum range. The proposed fabrication technology completely removes the need for any selective etching during the fabrication/integration process of the micropolarizer array. Fully CMOS compatible, it is simple and cost-effective, requiring only spin-coating followed by a single ultraviolet-exposure through a "photoalignment master". This makes it well suited to low cost polarization imaging applications.

19.
Front Neuroeng ; 4: 18, 2011.
Article in English | MEDLINE | ID: mdl-22319491

ABSTRACT

Sensory perception results from the way sensory information is subsequently transformed in the brain. Olfaction is a typical example in which odor representations undergo considerable changes as they pass from olfactory receptor neurons (ORNs) to second-order neurons. First, many ORNs expressing the same receptor protein yet presenting heterogeneous dose-response properties converge onto individually identifiable glomeruli. Second, onset latency of glomerular activation is believed to play a role in encoding odor quality and quantity in the context of fast information processing. Taking inspiration from the olfactory pathway, we designed a simple yet robust glomerular latency coding scheme for processing gas sensor data. The proposed bio-inspired approach was evaluated using an in-house SnO2 sensor array. Glomerular convergence was achieved by noting the possible analogy between receptor protein expressed in ORNs and metal catalyst used across the fabricated gas sensor array. Ion implantation was another technique used to account both for sensor heterogeneity and enhanced sensitivity. The response of the gas sensor array was mapped into glomerular latency patterns, whose rank order is concentration-invariant. Gas recognition was achieved by simply looking for a "match" within a library of spatio-temporal spike fingerprints. Because of its simplicity, this approach enables the integration of sensing and processing onto a single chip.
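
The rank-order idea in this abstract can be illustrated in a few lines: if a gas is identified by the order in which sensors (or glomeruli) fire, then scaling all latencies with concentration leaves the fingerprint unchanged. The sketch below is a minimal interpretation under that assumption; the latency values, gas labels, and function names are hypothetical.

```python
def latency_rank_order(latencies):
    """Sensor indices sorted from earliest to latest onset."""
    return tuple(sorted(range(len(latencies)), key=lambda i: latencies[i]))


def recognize(latencies, library):
    """Return the gas whose stored firing order matches exactly, else None."""
    order = latency_rank_order(latencies)
    for name, fingerprint in library.items():
        if fingerprint == order:
            return name
    return None


# Hypothetical fingerprint library for a 4-sensor array.
library = {
    "gas_a": (2, 0, 1, 3),
    "gas_b": (1, 3, 0, 2),
}
low_concentration = [4.0, 6.0, 2.0, 8.0]   # slow responses
high_concentration = [2.0, 3.0, 1.0, 4.0]  # same gas, all latencies halved
```

Both latency patterns yield the firing order (2, 0, 1, 3), so both are recognized as the same gas.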

20.
Opt Express ; 18(17): 17776-87, 2010 Aug 16.
Article in English | MEDLINE | ID: mdl-20721165

ABSTRACT

In this paper, we describe the design, modeling, fabrication, and optical characterization of the first micropolarimeter array enabling full Stokes polarization imaging in the visible spectrum. The proposed micropolarimeter is fabricated by patterning a liquid-crystal (LC) layer on top of a visible-regime metal-wire-grid polarizer (MWGP) using ultraviolet sensitive sulfonic-dye-1 as the LC photoalignment material. This arrangement enables the formation of either micrometer-scale LC polarization rotators, neutral density filters or quarter wavelength retarders. These elements are in turn exploited to acquire all components of the Stokes vector, which describes all possible polarization states of light. Reported major principal transmittance of 75% and extinction ratio of 1100 demonstrate that the MWGP's superior optical characteristics are retained. The proposed liquid-crystal micropolarimeter array can be integrated on top of a complementary metal-oxide-semiconductor (CMOS) image sensor for real-time full Stokes polarization imaging.

Subjects
Light; Liquid Crystals; Microscopy, Polarization/instrumentation; Optics and Photonics/instrumentation; Semiconductors; Equipment Design; Models, Theoretical
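
As background on what "full Stokes" means, the sketch below computes the four Stokes components from the textbook six-intensity definition (linear analyzers at 0, 45, 90 and 135 degrees, plus right and left circular analyzers) and derives the degree of polarization. This is the standard definition rather than the specific pixel layout of the reported array, and the sign convention for the circular component varies between references.

```python
import math


def stokes_vector(i0, i45, i90, i135, i_rcp, i_lcp):
    """Stokes components (S0, S1, S2, S3) from six analyzer intensities."""
    s0 = i0 + i90          # total intensity
    s1 = i0 - i90          # horizontal vs. vertical linear polarization
    s2 = i45 - i135        # +45 vs. -45 degree linear polarization
    s3 = i_rcp - i_lcp     # right vs. left circular (convention-dependent sign)
    return (s0, s1, s2, s3)


def degree_of_polarization(stokes):
    """Fraction of the light that is polarized, between 0 and 1."""
    s0, s1, s2, s3 = stokes
    return math.sqrt(s1 ** 2 + s2 ** 2 + s3 ** 2) / s0


# Idealized intensities for unit-power light (values follow from Malus's law).
horizontal = stokes_vector(1.0, 0.5, 0.0, 0.5, 0.5, 0.5)
unpolarized = stokes_vector(0.5, 0.5, 0.5, 0.5, 0.5, 0.5)
```

Horizontally polarized light gives a degree of polarization of 1, while unpolarized light gives 0.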