Results 1 - 20 of 22
2.
IEEE Trans Image Process ; 32: 2252-2266, 2023.
Article in English | MEDLINE | ID: mdl-37058382

ABSTRACT

Conventional few-shot classification (FSC) aims to recognize samples from novel classes given limited labeled data. Recently, domain generalization FSC (DG-FSC) has been proposed, with the goal of recognizing novel-class samples from unseen domains. DG-FSC poses considerable challenges to many models due to the domain shift between base classes (used in training) and novel classes (encountered in evaluation). In this work, we make two novel contributions to tackle DG-FSC. Our first contribution is to propose Born-Again Network (BAN) episodic training and comprehensively investigate its effectiveness for DG-FSC. As a specific form of knowledge distillation, BAN has been shown to achieve improved generalization in conventional supervised classification with a closed-set setup. This improved generalization motivates us to study BAN for DG-FSC, and we show that BAN is promising for addressing the domain shift encountered in DG-FSC. Building on these encouraging findings, our second (major) contribution is to propose Few-Shot BAN (FS-BAN), a novel BAN approach for DG-FSC. Our proposed FS-BAN includes novel multi-task learning objectives: Mutual Regularization, Mismatched Teacher, and Meta-Control Temperature, each of which is specifically designed to overcome the central and unique challenges in DG-FSC, namely overfitting and domain discrepancy. We analyze different design choices for these techniques. We conduct comprehensive quantitative and qualitative analysis and evaluation on six datasets and three baseline models. The results suggest that our proposed FS-BAN consistently improves the generalization performance of baseline models and achieves state-of-the-art accuracy for DG-FSC. Project Page: yunqing-me.github.io/Born-Again-FS/.
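
As a rough illustration of the born-again training step that FS-BAN builds on, here is a minimal knowledge-distillation loss with a temperature-softened teacher, in the spirit of BAN; the function and parameter names (tau, alpha) are illustrative, and the paper's Mutual Regularization, Mismatched Teacher, and Meta-Control Temperature objectives are not reproduced here.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, tau=4.0, alpha=0.5):
        # Hard-label cross-entropy on the student's own predictions.
        ce = F.cross_entropy(student_logits, labels)
        # Temperature-softened KL between teacher and student distributions,
        # scaled by tau**2 as in standard knowledge distillation.
        kd = F.kl_div(
            F.log_softmax(student_logits / tau, dim=1),
            F.softmax(teacher_logits / tau, dim=1),
            reduction="batchmean",
        ) * tau ** 2
        return alpha * ce + (1 - alpha) * kd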

3.
IEEE Trans Neural Netw Learn Syst ; 34(9): 6289-6302, 2023 Sep.
Article in English | MEDLINE | ID: mdl-34982698

ABSTRACT

In this article, we adopt the mutual information (MI) maximization approach to tackle the problem of unsupervised learning of binary hash codes for efficient cross-modal retrieval. We propose a novel method, dubbed cross-modal info-max hashing (CMIMH). First, to learn informative representations that can preserve both intramodal and intermodal similarities, we leverage recent advances in estimating variational lower bounds of MI to maximize the MI between the binary representations and the input features, and between the binary representations of different modalities. By jointly maximizing these MIs under the assumption that the binary representations are modeled by multivariate Bernoulli distributions, we can effectively learn binary representations that preserve both intramodal and intermodal similarities, in a mini-batch manner with gradient descent. Furthermore, we find that trying to minimize the modality gap by learning similar binary representations for the same instance across modalities can result in less informative representations. Hence, balancing the reduction of the modality gap against the loss of modality-private information is important for cross-modal retrieval tasks. Quantitative evaluations on standard benchmark datasets demonstrate that the proposed method consistently outperforms other state-of-the-art cross-modal retrieval methods.
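
As a sketch of one ingredient, the following shows how binary codes modeled by multivariate Bernoulli distributions can be trained end-to-end with gradient descent via a straight-through estimator; this is a generic device, not the paper's variational MI estimator, and the function name is illustrative.

    import torch

    def bernoulli_binary_codes(logits):
        # Model each bit as an independent Bernoulli variable.
        probs = torch.sigmoid(logits)
        sample = torch.bernoulli(probs)      # {0,1} sample, non-differentiable
        codes = 2.0 * sample - 1.0           # map to {-1,+1} hash bits
        # Straight-through estimator: the forward pass uses the discrete
        # sample, the backward pass flows through the relaxed probabilities.
        relaxed = 2.0 * probs - 1.0
        return relaxed + (codes - relaxed).detach()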

4.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 6438-6453, 2022 Oct.
Article in English | MEDLINE | ID: mdl-34048335

ABSTRACT

The foundational assumption of machine learning is that the data under consideration is separable into classes; while intuitively reasonable, separability constraints have proven remarkably difficult to formulate mathematically. We believe this problem is rooted in the mismatch between existing statistical techniques and commonly encountered data: object representations are typically high dimensional, but statistical techniques tend to treat high dimensions as a degenerate case. To address this problem, we develop a dedicated statistical framework for machine learning in high dimensions. The framework derives from the observation that object relations form a natural hierarchy; this leads us to model objects as instances of a high-dimensional, hierarchical generative process. Using a distance-based statistical technique, also developed in this paper, we show that in such generative processes, the instances of each process in the hierarchy are almost-always encapsulated by a distinctive-shell that excludes almost-all other instances. The result is shell theory, a statistical machine learning framework in which separability constraints (distinctive-shells) are formally derived from the assumed generative process.
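
The shell phenomenon can be illustrated numerically: i.i.d. high-dimensional Gaussian samples concentrate on a thin spherical shell, which is the kind of behavior shell theory formalizes. A minimal demonstration (not the paper's distance-based technique):

    import numpy as np

    rng = np.random.default_rng(0)
    for d in (2, 100, 10000):
        x = rng.standard_normal((500, d))
        norms = np.linalg.norm(x, axis=1)
        # As d grows, norms concentrate around sqrt(d): the samples live
        # on a thin spherical shell rather than filling the space.
        print(d, norms.mean() / np.sqrt(d), norms.std() / norms.mean())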

5.
IEEE Trans Image Process ; 31: 525-540, 2022.
Article in English | MEDLINE | ID: mdl-34793299

ABSTRACT

When neural networks are employed for high-stakes decision-making, it is desirable that they provide explanations for their predictions so that we can understand the features that contributed to the decision. At the same time, it is important to flag potential outliers for in-depth verification by domain experts. In this work, we propose to unify two differing aspects of explainability with outlier detection. We argue for a broader adoption of prototype-based student networks capable of providing an example-based explanation for their prediction and, at the same time, identifying regions of similarity between the predicted sample and the examples. The examples are real prototypical cases sampled from the training set via a novel iterative prototype replacement algorithm. Furthermore, we propose to use the prototype similarity scores to identify outliers. We compare the classification performance, explanation quality, and outlier detection of our proposed network with baselines. We show that our prototype-based networks, extending beyond similarity kernels, deliver meaningful explanations and promising outlier detection results without compromising classification accuracy.
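
A minimal sketch of the inference idea, with illustrative names and a hypothetical similarity threshold: classify by the nearest prototype and flag a sample as a potential outlier when even its best prototype similarity is low.

    import numpy as np

    def prototype_predict(z, prototypes, proto_labels, outlier_threshold=0.5):
        # Cosine similarity between an embedded sample and each prototype
        # drawn from the training set.
        sims = prototypes @ z / (np.linalg.norm(prototypes, axis=1)
                                 * np.linalg.norm(z))
        best = int(np.argmax(sims))
        label = proto_labels[best]
        # Low similarity to every prototype flags a potential outlier.
        is_outlier = sims[best] < outlier_threshold
        return label, sims, is_outlier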

6.
IEEE Trans Image Process ; 30: 1882-1897, 2021.
Article in English | MEDLINE | ID: mdl-33428571

ABSTRACT

Recent successes in Generative Adversarial Networks (GANs) have affirmed the importance of using more data in GAN training. Yet it is expensive to collect data in many domains, such as medical applications. Data Augmentation (DA) has been applied in these applications. In this work, we first argue that the classical DA approach can mislead the generator into learning the distribution of the augmented data, which can differ from that of the original data. We then propose a principled framework, termed Data Augmentation Optimized for GAN (DAG), to enable the use of augmented data in GAN training while still learning the original distribution. We provide theoretical analysis to show that using our proposed DAG aligns with the original GAN in minimizing the Jensen-Shannon (JS) divergence between the original distribution and the model distribution. Importantly, the proposed DAG effectively leverages the augmented data to improve the learning of both the discriminator and the generator. We conduct experiments applying DAG to different GAN models: unconditional GAN, conditional GAN, self-supervised GAN, and CycleGAN, using datasets of natural images and medical images. The results show that DAG achieves consistent and considerable improvements across these models. Furthermore, when DAG is used with certain GAN models, the resulting systems establish state-of-the-art Fréchet Inception Distance (FID) scores. Our code is available at https://github.com/tntrung/dag-gans.
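
A rough sketch of the DAG discriminator objective, assuming one discriminator head per augmentation on a shared backbone; all names are illustrative, and the weighting and generator-side details follow the paper, not this sketch.

    import torch
    import torch.nn.functional as F

    def dag_discriminator_loss(d_heads, backbone, augmentations, real, fake):
        # One discriminator head per augmentation; averaging the per-transform
        # adversarial losses keeps the JS-divergence target of the original GAN.
        losses = []
        for head, aug in zip(d_heads, augmentations):
            r = head(backbone(aug(real)))
            f = head(backbone(aug(fake)))
            losses.append(
                F.binary_cross_entropy_with_logits(r, torch.ones_like(r))
                + F.binary_cross_entropy_with_logits(f, torch.zeros_like(f))
            )
        return torch.stack(losses).mean()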

7.
Article in English | MEDLINE | ID: mdl-32784139

ABSTRACT

This paper presents a novel framework, namely Deep Cross-modality Spectral Hashing (DCSH), to tackle the unsupervised learning of binary hash codes for efficient cross-modal retrieval. The framework is a two-step hashing approach that decouples the optimization into (1) binary optimization and (2) hashing function learning. In the first step, we propose a novel spectral-embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations. While the former is capable of well preserving the local structure of each modality, the latter reveals the hidden patterns across all modalities. In the second step, to learn mapping functions from informative data inputs (images and word embeddings) to the binary codes obtained in the first step, we leverage powerful CNNs for images and propose a CNN-based deep architecture for the text modality. Quantitative evaluations on three standard benchmark datasets demonstrate that the proposed DCSH method consistently outperforms other state-of-the-art methods.
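
The first (binary optimization) step can be sketched as spectral embedding followed by thresholding; this is a generic spectral-hashing recipe with illustrative names, not the exact DCSH algorithm.

    import numpy as np

    def spectral_binary_codes(affinity, n_bits):
        # Normalized graph Laplacian of the (cross-modal) affinity graph.
        d = affinity.sum(axis=1)
        d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
        laplacian = np.eye(len(affinity)) - d_inv_sqrt @ affinity @ d_inv_sqrt
        # The smallest non-trivial eigenvectors give the spectral embedding;
        # thresholding at zero turns them into binary codes.
        _, vecs = np.linalg.eigh(laplacian)
        embedding = vecs[:, 1:n_bits + 1]
        return np.where(embedding >= 0, 1, -1)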

8.
JMIR Med Inform ; 7(4): e14401, 2019 Sep 15.
Article in English | MEDLINE | ID: mdl-31573929

ABSTRACT

BACKGROUND: Artificial intelligence (AI)-based therapeutics, devices, and systems are vital innovations in cancer control; in particular, they allow for diagnosis, screening, precise estimation of survival, informing therapy selection, and scaling up treatment services in a timely manner. OBJECTIVE: The aim of this study was to analyze the global trends, patterns, and development of interdisciplinary landscapes in AI and cancer research. METHODS: An exploratory factor analysis was conducted to identify research domains emerging from abstract contents. The Jaccard similarity index was utilized to identify the most frequently co-occurring terms. Latent Dirichlet Allocation was used for classifying papers into corresponding topics. RESULTS: From 1991 to 2018, the number of studies examining the application of AI in cancer care grew to 3555 papers covering therapeutics, capacities, and factors associated with outcomes. The topics with the highest volume of publications were (1) machine learning, (2) comparative effectiveness evaluation of AI-assisted medical therapies, and (3) AI-based prediction. Notably, this classification revealed topics examining the incremental effectiveness of AI applications and the quality of life and functioning of patients receiving these innovations. The growing research productivity and expansion of multidisciplinary approaches are largely driven by machine learning, artificial neural networks, and AI in various clinical practices. CONCLUSIONS: The research landscape shows that the development of AI in cancer care is focused not only on improving prediction in cancer screening and AI-assisted therapeutics but also on improving other corresponding areas such as precision and personalized medicine and patient-reported outcomes.
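
For readers unfamiliar with the two text-mining ingredients, a minimal scikit-learn sketch of Jaccard similarity and Latent Dirichlet Allocation topic assignment; the toy corpus is hypothetical.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    def jaccard(a, b):
        # Jaccard similarity between two term sets: |A & B| / |A | B|.
        a, b = set(a), set(b)
        return len(a & b) / len(a | b)

    abstracts = ["ai assisted cancer screening model",
                 "machine learning survival prediction",
                 "neural network therapy selection"]
    counts = CountVectorizer(stop_words="english").fit_transform(abstracts)
    lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(counts)
    topic_of_paper = lda.transform(counts).argmax(axis=1)  # dominant topic per paper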

9.
Comput Biol Med ; 110: 144-155, 2019 07.
Article in English | MEDLINE | ID: mdl-31154258

ABSTRACT

The gene or DNA sequence in every cell does not control genetic properties on its own; rather, it does so through the translation of DNA into protein and the subsequent formation of a specific 3D structure. The biological function of a protein is tightly connected to its specific 3D structure. Prediction of the protein secondary structure is a crucial intermediate step toward elucidating its 3D structure and function. Traditional experimental methods for determining protein structure are expensive and time-consuming, which has motivated a range of computational predictors; nevertheless, the average accuracy of the suggested solutions has hardly reached beyond 80%. The possible underlying reasons are the ambiguous sequence-structure relation, noise in input protein data, class imbalance, and the high dimensionality of the encoding schemes. In this work, we utilize a compound string dissimilarity measure to directly interpret protein sequence content and avoid information loss. To improve accuracy, we employ two different classifiers, a support vector machine and a fuzzy nearest neighbor, and aggregate their classification outcomes to infer the final protein structures. We conduct comprehensive experiments to compare our model with the current state-of-the-art approaches. The experimental results demonstrate that, given a set of input sequences, our multi-component framework can accurately predict the protein structure. Moreover, the effectiveness of our unified model can be further enhanced through framework configuration.
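
A minimal sketch of the two-classifier aggregation using scikit-learn; a distance-weighted k-nearest-neighbor classifier stands in for the fuzzy nearest neighbor, and soft-probability averaging is one plausible aggregation rule, not necessarily the paper's.

    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    def ensemble_predict(X_train, y_train, X_test):
        svm = SVC(probability=True).fit(X_train, y_train)
        # Distance-weighted kNN stands in here for the fuzzy nearest neighbor.
        knn = KNeighborsClassifier(weights="distance").fit(X_train, y_train)
        # Aggregate the two classifiers by averaging their class probabilities.
        proba = (svm.predict_proba(X_test) + knn.predict_proba(X_test)) / 2.0
        return proba.argmax(axis=1)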


Subject(s)
Machine Learning , Models, Molecular , Proteins/chemistry , Protein Structure, Secondary
10.
IEEE Trans Image Process ; 28(10): 4954-4969, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31071035

ABSTRACT

Representing images by compact hash codes is an attractive approach for large-scale content-based image retrieval. In most state-of-the-art hashing-based image retrieval systems, for each image, local descriptors are first aggregated into a global representation vector. This global vector is then subjected to a hashing function to generate a binary hash code. In previous works, the aggregating and hashing processes were designed independently; hence, these frameworks may generate suboptimal hash codes. In this paper, we first propose a novel unsupervised hashing framework in which feature aggregating and hashing are designed simultaneously and optimized jointly. Specifically, our joint optimization generates aggregated representations that can be better reconstructed from their binary codes. This leads to more discriminative binary hash codes and improved retrieval accuracy. In addition, the proposed method is flexible: it can be extended to supervised hashing. When data labels are available, the framework can be adapted to learn binary codes that minimize the reconstruction loss with respect to the label vectors. Furthermore, we also propose a fast version of the state-of-the-art hashing method Binary Autoencoder for use in our proposed frameworks. Extensive experiments on benchmark datasets under various settings show that the proposed methods outperform state-of-the-art unsupervised and supervised hashing methods.
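
The reconstruction-driven view of hashing can be sketched as one alternating step of a binary-autoencoder-style objective; the names and update rule are illustrative, not the paper's relaxed algorithm.

    import numpy as np

    def binary_autoencoder_step(X, W, U):
        # Encode: binarize a linear projection of the aggregated features.
        B = np.where(X @ W >= 0, 1.0, -1.0)
        # Decode: linear reconstruction of the features from the binary codes.
        recon = B @ U
        loss = np.mean((X - recon) ** 2)
        # Closed-form refit of the decoder for fixed codes (least squares).
        U_new = np.linalg.lstsq(B, X, rcond=None)[0]
        return B, U_new, loss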

11.
J Clin Med ; 8(3)2019 Mar 14.
Article in English | MEDLINE | ID: mdl-30875745

ABSTRACT

The increasing application of Artificial Intelligence (AI) in health and medicine has attracted a great deal of research interest in recent decades. This study aims to provide a global and historical picture of research concerning AI in health and medicine. A total of 27,451 papers published between 1977 and 2018 (84.6% dated 2008-2018) were retrieved from the Web of Science platform. The descriptive analysis examined publication volume and author and country collaboration. A global network of authors' keywords and content analysis of the related scientific literature highlighted major techniques, including robotics, machine learning, artificial neural networks, artificial intelligence, and natural language processing, and their most frequent applications in clinical prediction and treatment. The number of cancer-related publications was the highest, followed by heart diseases and stroke, vision impairment, Alzheimer's disease, and depression. Moreover, the shortage of research on AI applications to some high-burden diseases suggests future directions for AI research. This study offers a first and comprehensive picture of the global efforts directed towards this increasingly important and prolific field of research and suggests the development of global and national protocols and regulations on the justification and adaptation of medical AI products.

12.
IEEE Trans Image Process ; 28(4): 1675-1690, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30452361

ABSTRACT

We present the design of an entire on-device system for large-scale urban localization using images. The proposed design integrates compact image retrieval and 2D-3D correspondence search to estimate the location in extensive city regions. Our design is GPS-agnostic and does not require a network connection. In order to overcome the resource constraints of mobile devices, we propose a system design that leverages the scalability advantage of image retrieval and the accuracy of 3D model-based localization. Furthermore, we propose a new hashing-based cascade search for fast computation of 2D-3D correspondences. In addition, we propose a new one-many RANSAC for accurate pose estimation, which addresses the challenge of repetitive building structures (e.g., windows and balconies) in urban localization. Extensive experiments demonstrate that our 2D-3D correspondence search achieves state-of-the-art localization accuracy on multiple benchmark datasets. Furthermore, our experiments on a large Google Street View image dataset show the potential of large-scale localization entirely on a typical mobile device.
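
A structural sketch of one-many RANSAC, where each 2D feature keeps several 3D candidates arising from repetitive structures; solve_pose and inlier_test are hypothetical placeholders for a minimal PnP solver and a reprojection test.

    import random

    def one_many_ransac(matches, solve_pose, inlier_test, iters=1000, sample_size=4):
        # matches: list of (feature_2d, [candidate_3d_points]) -- each 2D
        # feature may match several repetitive 3D structures.
        best_pose, best_inliers = None, 0
        for _ in range(iters):
            sample = random.sample(matches, sample_size)
            # One-many sampling: pick one 3D candidate per sampled 2D feature.
            pairs = [(p2d, random.choice(cands)) for p2d, cands in sample]
            pose = solve_pose(pairs)  # hypothetical minimal solver (e.g., P3P)
            # A match counts as an inlier if ANY of its candidates fits.
            inliers = sum(
                any(inlier_test(pose, p2d, c) for c in cands)
                for p2d, cands in matches
            )
            if inliers > best_inliers:
                best_pose, best_inliers = pose, inliers
        return best_pose, best_inliers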

13.
IEEE Trans Pattern Anal Mach Intell ; 40(3): 626-638, 2018 03.
Article in English | MEDLINE | ID: mdl-28358674

ABSTRACT

The objective of this paper is to design an embedding method that maps local features describing an image (e.g., SIFT) to a higher-dimensional representation useful for the image retrieval problem. First, motivated by the relationship between the linear approximation of a nonlinear function in high-dimensional space and the state-of-the-art feature representation used in image retrieval, i.e., VLAD, we propose a new approach for the approximation. The embedded vectors resulting from the function approximation process are then aggregated to form a single representation for image retrieval. Second, in order to make the proposed embedding method applicable to large-scale problems, we further derive a fast version in which the embedded vectors can be efficiently computed in closed form. We compare the proposed embedding methods with the state of the art in the context of image search under various settings: when the images are represented by medium-length vectors, short vectors, or binary vectors. The experimental results show that the proposed embedding methods outperform the existing state of the art on standard public image retrieval benchmarks.
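
For context, the VLAD baseline that motivates the embedding can be written compactly; this is the standard VLAD aggregation with power and L2 normalization, not the paper's proposed embedding.

    import numpy as np

    def vlad(descriptors, centroids):
        # Assign each local descriptor (e.g., SIFT) to its nearest visual word.
        dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :],
                               axis=2)
        assign = dists.argmin(axis=1)
        k, d = centroids.shape
        v = np.zeros((k, d))
        # Accumulate residuals to the assigned centroid -- the VLAD embedding.
        for i, a in enumerate(assign):
            v[a] += descriptors[i] - centroids[a]
        v = v.reshape(-1)
        v = np.sign(v) * np.sqrt(np.abs(v))        # power normalization
        return v / max(np.linalg.norm(v), 1e-12)   # L2 normalization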

14.
IEEE Trans Image Process ; 25(8): 3489-504, 2016 08.
Article in English | MEDLINE | ID: mdl-27244739

ABSTRACT

The ability to efficiently switch from one pre-encoded video stream to another (e.g., for bitrate adaptation or view switching) is important for many interactive streaming applications. Recently, stream-switching mechanisms based on distributed source coding (DSC) have been proposed. In order to reduce the overall transmission rate, these approaches provide a merge mechanism, where information is sent to the decoder such that the exact same frame can be reconstructed given that any one of a known set of side information (SI) frames is available at the decoder (e.g., each SI frame may correspond to a different stream from which we are switching). However, the use of bit-plane coding and channel coding in many DSC approaches leads to complex coding and decoding. In this paper, we propose an alternative approach for merging multiple SI frames, using a piecewise constant (PWC) function as the merge operator. In our approach, for each block to be reconstructed, a series of parameters of these PWC merge functions is transmitted in order to guarantee identical reconstruction given the known SI blocks. We consider two different scenarios. In the first case, a target frame is given first, and the merge parameters are then chosen so that this frame can be reconstructed exactly at the decoder. In contrast, in the second scenario, the reconstructed frame and the merge parameters are jointly optimized to meet a rate-distortion criterion. Experiments show that, for both scenarios, our proposed merge techniques can outperform both a recent DSC-based approach and the SP-frame approach in H.264, in terms of compression efficiency and decoder complexity.
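
The core merge idea can be sketched with a step quantizer: if the quantization step exceeds the spread of the side-information values, every SI version maps to the same reconstruction level. The shift/step parameterization below is illustrative, not the paper's exact scheme.

    import numpy as np

    def pwc_merge(si_values, step):
        # Piecewise-constant merge: any side-information value falling in the
        # same step-wide interval maps to one reconstruction level.
        si = np.asarray(si_values, dtype=float)
        assert si.max() - si.min() < step, "step must cover the SI spread"
        # Shift the quantizer grid so every SI value lands in one interval;
        # this shift is the per-block parameter the encoder would transmit.
        shift = si.min()
        return step * np.floor((si - shift) / step) + shift + step / 2.0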

15.
IEEE Trans Image Process ; 25(5): 1961-76, 2016 May.
Article in English | MEDLINE | ID: mdl-26978820

ABSTRACT

We propose an analytical model to estimate the depth-error-induced virtual view synthesis distortion (VVSD) in 3D video, taking the distance between the reference and virtual views (the virtual view position) into account. In particular, we start with a comprehensive pre-analysis and discussion of several possible VVSD scenarios. Taking the intrinsic characteristics of each scenario into consideration, we classify them into four clusters: 1) overlapping region; 2) disocclusion and boundary region; 3) edge region; and 4) infrequent region. We propose to model VVSD as the linear combination of the distortion under different scenarios (DDS) weighted by the probability under different scenarios (PDS). We show analytically that DDS and PDS can be related to the virtual view position using quadratic/biquadratic models and linear models, respectively. Experimental results verify that the proposed model is capable of estimating the relationship between VVSD and the distance between the reference and virtual views. Therefore, our model can be used to inform the reference-view setup for capture, or to estimate the distortion at certain virtual view positions, when depth information is compressed.
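
A minimal sketch of evaluating the model, assuming a linear PDS and a quadratic DDS term per scenario (the biquadratic variants are omitted); the coefficient layout is illustrative.

    import numpy as np

    def vvsd(view_distance, pds_coeffs, dds_coeffs):
        # PDS: scenario probability, linear in the virtual-view distance.
        pds = np.array([a * view_distance + b for a, b in pds_coeffs])
        # DDS: scenario distortion, quadratic in the distance.
        dds = np.array([a * view_distance**2 + b * view_distance + c
                        for a, b, c in dds_coeffs])
        # VVSD modeled as the probability-weighted sum over the scenarios.
        return float(pds @ dds)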

16.
Biol Psychol ; 115: 35-42, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26777128

ABSTRACT

An evolutionarily ancient skill we possess is the ability to distinguish between food and non-food. Our goal here is to identify the neural correlates of the visually driven 'edible-inedible' perceptual distinction. We also investigate correlates of the finer-grained likability assessment. Our stimuli depicted food or non-food items with sub-classes of appealing or unappealing exemplars. Using data-classification techniques drawn from machine learning, as well as evoked-response analyses, we sought to determine whether these four classes of stimuli could be distinguished based on the patterns of brain activity they elicited. Subjects viewed 200 images while in an MEG scanner. Our analyses yielded two successes and a surprising failure. The food/non-food distinction had a robust neural counterpart and emerged as early as 85 ms post-stimulus onset. The likable/non-likable distinction, too, was evident in the neural signals when food and non-food stimuli were grouped together, or when only the non-food stimuli were included in the analyses. However, we were unable to identify any neural correlates of this distinction when limiting the analyses to food stimuli alone. Taken together, these positive and negative results further our understanding of the substrates of a set of ecologically important judgments and have clinical implications for conditions like eating disorders and anhedonia.
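
The time-resolved decoding used to date such distinctions can be sketched as a classifier trained independently at each time point; a minimal scikit-learn version, illustrative rather than the authors' exact pipeline:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def decoding_timecourse(X, y, cv=5):
        # X: trials x sensors x time; y: stimulus class (e.g., food vs non-food).
        n_trials, n_sensors, n_times = X.shape
        scores = np.empty(n_times)
        for t in range(n_times):
            clf = LogisticRegression(max_iter=1000)
            # Decode the class from the sensor pattern at each time point; the
            # earliest above-chance time marks when the distinction emerges.
            scores[t] = cross_val_score(clf, X[:, :, t], y, cv=cv).mean()
        return scores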


Subject(s)
Discrimination, Psychological/physiology , Food , Pattern Recognition, Visual/physiology , Adolescent , Adult , Arousal/physiology , Cerebral Cortex/physiology , Feeding and Eating Disorders/physiopathology , Female , Food Preferences/physiology , Humans , Machine Learning , Magnetoencephalography , Male , Statistics as Topic , Young Adult
17.
IEEE Trans Image Process ; 25(7): 3112-3125, 2016 07.
Article in English | MEDLINE | ID: mdl-28113182

ABSTRACT

In mobile cloud gaming, high-quality, high-frame-rate game images of immense data size need to be delivered to clients over wireless networks under stringent delay requirements. For a good gaming experience, reducing the transmission bit rate of the game images is necessary. Most existing cloud gaming platforms simply employ standard, off-the-shelf video codecs for game image compression. In this paper, we propose a layered coding scheme to reduce transmission bandwidth and latency. We leverage the rendering computation of modern mobile devices to render a low-quality local game image, the base layer (BL). Instead of sending a high-quality game image, cloud servers can send enhancement-layer information, which clients can use to improve the quality of the BL. Central to the layered coding scheme is the design of a complexity-scalable BL rendering pipeline that can be executed on a range of power-constrained mobile devices. In this paper, we focus on the lighting stage in modern graphics rendering and propose a method to scale the popular Blinn-Phong lighting for use in BL rendering. We derive an information-theoretic model of Blinn-Phong lighting to estimate the rendered image entropy. The analytic model informs the optimal BL rendering design that can lead to maximum bandwidth saving subject to the constraint on the computation capability of the client. We show that the information rate of the enhancement layer can be much less than that of the high-quality game image, while the BL can be generated with only a very small amount of computation. Experimental results suggest that our analytic model is accurate in estimating the rendered image entropy. For the layered coding scheme, up to an 84% reduction in bandwidth usage can be achieved by sending the enhancement-layer information instead of the original high-quality game images compressed by H.264/AVC.
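
For reference, the Blinn-Phong model whose scaling underlies the base-layer design computes diffuse plus specular terms from the surface normal, light, view, and half vectors; a minimal sketch (the paper's complexity-scaling and entropy model are not reproduced):

    import numpy as np

    def blinn_phong(n, l, v, kd, ks, shininess):
        # n: surface normal, l: light direction, v: view direction (unit vectors).
        h = l + v
        h = h / np.linalg.norm(h)                  # half vector
        diffuse = kd * max(np.dot(n, l), 0.0)
        # Scaling the shininess exponent (or dropping the specular term) is one
        # way to trade image quality for base-layer rendering cost.
        specular = ks * max(np.dot(n, h), 0.0) ** shininess
        return diffuse + specular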

18.
Article in English | MEDLINE | ID: mdl-25571546

ABSTRACT

We investigate a mobile imaging system for early diagnosis of melanoma. Different from previous work, we focus on smartphone-captured images and propose a detection system that runs entirely on the smartphone. Smartphone-captured images taken under loosely controlled conditions introduce new challenges for melanoma detection, while processing performed on the smartphone is subject to computation and memory constraints. To address these challenges, we propose to localize the skin lesion by combining fast skin detection and fusion of two fast segmentation results. We propose new features to capture color variation and border irregularity, which are useful for smartphone-captured images. We also propose a new feature selection criterion to select a small set of good features for the final lightweight system. Our evaluation confirms the effectiveness of the proposed algorithms and features. In addition, we present our system prototype, which computes selected visual features from a user-captured skin lesion image and analyzes them to estimate the likelihood of malignancy, all on an off-the-shelf smartphone.
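
Two of the feature families can be sketched with standard stand-ins: compactness as a border-irregularity proxy and per-channel color spread inside the lesion. These are illustrative, not the paper's proposed features.

    import numpy as np

    def lesion_features(mask, rgb):
        # mask: boolean lesion segmentation; rgb: float image in [0, 1].
        area = float(mask.sum())
        # Crude perimeter: lesion pixels with at least one non-lesion 4-neighbor.
        padded = np.pad(mask, 1)
        interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                    & padded[1:-1, :-2] & padded[1:-1, 2:]) & mask
        perimeter = float(area - interior.sum())
        # Compactness 4*pi*A/P^2 is 1 for a circle, lower for ragged borders.
        compactness = 4.0 * np.pi * area / max(perimeter ** 2, 1.0)
        # Color variation: per-channel standard deviation inside the lesion.
        color_var = rgb[mask].std(axis=0)
        return compactness, color_var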


Subject(s)
Early Detection of Cancer , Image Interpretation, Computer-Assisted , Skin Neoplasms/diagnosis , Algorithms , Cell Phone , Diagnostic Imaging , Humans , Melanoma/diagnosis , Photography , Skin/pathology , Skin Pigmentation
19.
IEEE Trans Image Process ; 23(1): 185-99, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24184725

ABSTRACT

We propose an analytical model to estimate the synthesized view quality in 3D video. The model relates errors in the depth images to the synthesis quality, taking into account texture image characteristics, texture image quality, and the rendering process. In particular, we decompose the synthesis distortion into texture-error-induced distortion and depth-error-induced distortion. We analyze the depth-error-induced distortion using an approach combining frequency- and spatial-domain techniques. Experimental results with video sequences and coding/rendering tools used in MPEG 3DV activities show that our analytical model can accurately estimate the synthesis noise power. Thus, the model can be used to estimate the rendering quality for different system designs.
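
The intuition behind the depth-error term can be sketched: a depth error shifts pixels horizontally during warping, and the induced distortion is the energy of the texture minus its shifted copy, which grows with local texture variation. A toy one-row version, illustrative rather than the paper's frequency-domain derivation:

    import numpy as np

    def depth_induced_noise_power(texture_row, disparity_error):
        # Depth error -> horizontal geometric shift during view warping; the
        # induced distortion is the mean squared difference between the
        # texture and its shifted copy.
        shifted = np.roll(texture_row, int(disparity_error))
        return float(np.mean((texture_row - shifted) ** 2))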


Subject(s)
Algorithms , Artifacts , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Models, Statistical , Photography/methods , Video Recording/methods , Computer Simulation , Image Enhancement/methods , Reproducibility of Results , Sensitivity and Specificity
20.
IEEE Trans Image Process ; 22(10): 3818-29, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23674452

ABSTRACT

In general, subpixel-based downsampling can achieve higher apparent resolution of the downsampled images on LCD or OLED displays than pixel-based downsampling. Through frequency-domain analysis of subpixel-based downsampling, we discover special characteristics of the luma-chroma color transform choice for monochrome images. With these, we model the anti-aliasing filter design for subpixel-based monochrome image downsampling as a human-visual-system-based optimization problem with a two-term cost function and obtain a closed-form solution. One cost term measures the luminance distortion and the other measures the chrominance aliasing in our chosen luma-chroma space. Simulation results suggest that the proposed method can achieve sharper downsampled gray/font images compared with conventional pixel- and subpixel-based methods, without noticeable color-fringing artifacts.
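
For context, direct subpixel sampling on an RGB-stripe display can be sketched in a few lines; the paper's contribution, the closed-form HVS-optimized anti-aliasing filter applied before this sampling, is omitted here.

    import numpy as np

    def subpixel_downsample_3x(gray):
        # Direct 3x horizontal subpixel sampling on an RGB-stripe display:
        # each output pixel takes its R, G, B from three adjacent input
        # samples, tripling apparent horizontal resolution versus
        # pixel-based sampling (at the cost of chrominance aliasing).
        h, w = gray.shape
        w3 = (w // 3) * 3
        return gray[:, :w3].reshape(h, w3 // 3, 3)  # last axis -> R, G, B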
