1.
Article in English | MEDLINE | ID: mdl-35412971

ABSTRACT

Most state-of-the-art object detection methods, which are trained with high-definition images, have achieved impressive performance on several public benchmarks. However, existing detectors are often sensitive to visual variations and out-of-distribution data because of the domain gap caused by various confounders, e.g., adverse weather conditions. To bridge the gap, previous methods have mainly explored domain alignment, which requires collecting domain-specific training samples. In this paper, we introduce a novel domain adaptation model to discover a weather-condition-invariant feature representation. Specifically, we first employ a memory network to develop a confounder dictionary, which stores prototypes of object features under various scenarios. To guarantee the representativeness of each prototype in the dictionary, a dynamic item extraction strategy is used to update the memory dictionary. After that, we introduce a causal intervention reasoning module to explore the invariant representation of a specific object under different weather conditions. Finally, a categorical consistency regularization is used to constrain the similarities between categories in order to automatically search for the aligned instances among distinct domains. Experiments are conducted on several public benchmarks (RTTS, Foggy-Cityscapes, RID, and BDD 100K) with state-of-the-art performance achieved under multiple weather conditions.

2.
Ann Biomed Eng ; 49(3): 1033-1045, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33057890

ABSTRACT

A Python package is developed to segment and analyze scanning electron microscope (SEM) images of scaffolds for bone tissue engineering. The method requires only a portion of an SEM image to be labeled and used for training. The algorithm is then able to detect the pore characteristics for other SEM images acquired at different ambient conditions from different scaffolds with the same material as the labeled image. The quality of SEM images is first enhanced using histogram equalization. Then, a global thresholding method is used to perform the image analysis. The thresholding values for the SEM images are obtained using a genetic algorithm (GA). The image analysis results include distributions of pore size, pore elongation, and pore orientation. The results agree satisfactorily with the experimental data for the chitosan-alginate porous scaffolds considered. Applications of the method developed for image segmentation are not limited to scaffold pore structure analysis. The method can also be used for any SEM image containing multiple objects such as different types of cells and subcellular components.


Subject(s)
Microscopy, Electron, Scanning , Tissue Engineering , Algorithms , Bone and Bones , Porosity , Tissue Scaffolds
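
The two image-processing steps named in the abstract above (histogram equalization followed by a global threshold) can be sketched in a few lines. This is only an illustration under stated assumptions, not the package described in the abstract: the function names are hypothetical, the image is a tiny hand-made grid, pores are assumed to be the dark pixels, and the threshold is fixed by hand rather than found by a genetic algorithm.

```python
def equalize(image, levels=256):
    """Histogram-equalize a grayscale image given as a list of rows of ints."""
    pixels = [p for row in image for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in image]
    scale = (levels - 1) / (n - cdf_min)
    return [[round((cdf[p] - cdf_min) * scale) for p in row] for row in image]


def threshold(image, t):
    """Global threshold: 1 marks assumed pore (dark) pixels, 0 marks material."""
    return [[1 if p < t else 0 for p in row] for row in image]


def porosity(mask):
    """Fraction of pixels labeled as pore."""
    flat = [v for row in mask for v in row]
    return sum(flat) / len(flat)


# Hypothetical 4x4 "SEM" patch: two dark regions on a bright background.
image = [
    [10, 12, 200, 210],
    [11, 13, 205, 220],
    [190, 195, 14, 15],
    [198, 215, 16, 12],
]
equalized = equalize(image)
mask = threshold(equalized, t=128)  # t chosen by hand, not by a GA
print(mask, porosity(mask))
```

In a fuller pipeline, the hand-picked `t` would be replaced by a search over candidate thresholds scored against the labeled image portion.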
3.
IEEE Trans Pattern Anal Mach Intell ; 42(8): 1823-1841, 2020 Aug.
Article in English | MEDLINE | ID: mdl-30843818

ABSTRACT

During the last half decade, convolutional neural networks (CNNs) have triumphed over semantic segmentation, which is one of the core tasks in many applications such as autonomous driving and augmented reality. However, training CNNs requires a considerable amount of data, which is difficult to collect and laborious to annotate. Recent advances in computer graphics make it possible to train CNNs on photo-realistic synthetic imagery with computer-generated annotations. Despite this, the domain mismatch between real images and the synthetic data hinders the models' performance. Hence, we propose a curriculum-style learning approach to minimizing the domain gap in urban scene semantic segmentation. The curriculum domain adaptation solves easy tasks first to infer necessary properties about the target domain; in particular, the first task is to learn global label distributions over images and local distributions over landmark superpixels. These are easy to estimate because images of urban scenes have strong idiosyncrasies (e.g., the size and spatial relations of buildings, streets, cars, etc.). We then train a segmentation network, while regularizing its predictions in the target domain to follow those inferred properties. In experiments, our method outperforms the baselines on two datasets and three backbone networks. We also report extensive ablation studies about our approach.

4.
Ann Biomed Eng ; 48(3): 1090-1102, 2020 Mar.
Article in English | MEDLINE | ID: mdl-31654152

ABSTRACT

Freeze-casting is a popular method to produce biomaterial scaffolds with highly porous structures. The pore structure of freeze-cast biomaterial scaffolds is influenced by processing parameters but has mostly been controlled experimentally. A mathematical model integrating Computational Fluid Dynamics with Population Balance Model was developed to predict average pore size (APS) of 3D porous chitosan-alginate scaffolds and to assess the influence of the geometrical parameters of mold on scaffold pore structure. The model predicted the crystallization pattern and APS for scaffolds cast in different diameter molds and filled to different heights. The predictions demonstrated that the temperature gradient and solidification pattern affect ice crystal nucleation and growth, subsequently influencing APS homogeneity. The predicted APS compared favorably with APS measurements from a corresponding experimental dataset, validating the model. Sensitivity analysis was performed to assess the response of the APS to the three geometrical parameters of the mold: well radius; solution fill height; and spacing between wells. The pore size was most sensitive to the distance between the wells and least sensitive to solution height. This validated model demonstrates a method for optimizing the APS of freeze-cast biomaterial scaffolds that could be applied to other compositions or applications.


Subject(s)
Models, Theoretical , Tissue Engineering , Tissue Scaffolds , Alginates , Biocompatible Materials , Chitosan , Crystallization , Hydrodynamics , Porosity , Temperature
5.
Article in English | MEDLINE | ID: mdl-31021764

ABSTRACT

State-of-the-art methods for sketch classification and retrieval rely on deep convolutional neural networks to learn representations. Although deep neural networks have the ability to model images with hierarchical representations by convolution kernels, they cannot automatically extract the structural representations of object categories in a human-perceptible way. Furthermore, sketch images usually have large-scale visual variations caused by the styles of drawing or viewpoints, which make it difficult to develop generalized representations using the fixed computational mode of convolutional kernels. In this paper, our aim is to address the problem of the fixed computational mode in the feature extraction process without extra supervision. We propose a novel architecture to dynamically discover the object landmarks and learn the discriminative structural representations. Our model is composed of two components: a representative landmark discovering module that localizes the key points on the object, and a category-aware representation learning module that develops the category-specific features. Specifically, we develop a structure-aware offset layer to dynamically localize the representative landmarks, which is optimized based on the category labels without extra supervision. After that, a diversity branch is introduced to extract the global discriminative features for each category. Finally, we employ a multi-task loss function to develop an end-to-end trainable architecture. At testing time, we fuse all the predictions with different numbers of landmarks to achieve the final results. Through extensive experiments, we compare our model with several state-of-the-art methods on two challenging datasets, TU-Berlin and Sketchy, for sketch classification and retrieval, and the experimental results demonstrate the effectiveness of our proposed model.

6.
IEEE Trans Pattern Anal Mach Intell ; 41(12): 3057-3070, 2019 12.
Article in English | MEDLINE | ID: mdl-30371353

ABSTRACT

Sampling is an important and effective strategy in analyzing "big data," whereby a smaller subset of a dataset is used to estimate the characteristics of its entire population. The main goal in sampling is often to achieve a significant gain in the computational time. However, a major obstacle towards this goal is the assessment of the smallest sample size needed to ensure, with a high probability, a faithful representation of the entire dataset, especially when the dataset is composed of a large number of diverse structures (e.g., clusters). To address this problem, we propose a method referred to as the Sparse Withdrawal of Inliers in a First Trial (SWIFT) that determines the smallest sample size of a subset of a dataset sampled in one grab, with the guarantee that the subset provides a sufficient number of samples from each of the underlying structures necessary for the discovery and inference. The latter is established with high probability, and the lower bound of the smallest sample size depends on probabilistic guarantees. In addition, we derive an upper bound on the smallest sample size that allows for detection of the structures and show that the two bounds are very close to each other in a variety of scenarios. We show that the problem can be modeled using either a hypergeometric or a multinomial probability mass function (pmf), and derive accurate mathematical bounds to determine a tight approximation to the sample size, leading thus to a sparse sampling strategy. The key features of the proposed method are: (i) sparseness of the sampled subset for analyzing data, where the level of sparseness is independent of the population size; (ii) no prior knowledge of the distribution of data, or the number of underlying structures in the data; and (iii) robustness in the presence of an overwhelming number of outliers. We evaluate the method thoroughly in terms of accuracy, its behavior against different parameters, and its effectiveness in reducing the computational cost in various applications of computer vision, such as subspace clustering and structure from motion.
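
The hypergeometric model mentioned in the abstract above can be made concrete with a brute-force search. The sketch below is not the SWIFT bound itself: it handles a single structure only, the function names are hypothetical, and it simply enumerates sample sizes until the hypergeometric tail probability reaches a requested confidence.

```python
from math import comb


def prob_at_least(N, K, n, k):
    """P(X >= k) when n items are drawn without replacement from a
    population of N items, K of which belong to the structure of interest."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def smallest_sample_size(N, K, k, confidence):
    """Smallest n such that a one-grab sample contains at least k items
    from the structure with probability >= confidence (None if impossible)."""
    for n in range(k, N + 1):
        if prob_at_least(N, K, n, k) >= confidence:
            return n
    return None


# Toy numbers: 10 points, 5 of them in the structure, need at least 1 inlier.
print(smallest_sample_size(N=10, K=5, k=1, confidence=0.9))
```

With several structures, the same search could be repeated per structure and the largest resulting `n` taken as a rough guide, although that treats the structures independently, which the closed-form bounds in the paper are designed to avoid.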

7.
IEEE Trans Image Process ; 26(2): 619-632, 2017 Feb.
Article in English | MEDLINE | ID: mdl-27875221

ABSTRACT

Automatic image annotation methods are extremely beneficial for image search, retrieval, and organization systems. The lack of strict correlation between semantic concepts and visual features, referred to as the semantic gap, is a huge challenge for annotation systems. In this paper, we propose an image annotation model that incorporates contextual cues collected from sources both intrinsic and extrinsic to images, to bridge the semantic gap. The main focus of this paper is a large real-world data set of news images that we collected. Unlike standard image annotation benchmark data sets, our data set does not require human annotators to generate artificial ground truth descriptions after data collection, since our images already include contextually meaningful and real-world captions written by journalists. We thoroughly study the nature of image descriptions in this real-world data set. News image captions describe both visual contents and the contexts of images. Auxiliary information sources are also available with such images in the form of news articles and metadata (e.g., keywords and categories). The proposed framework extracts contextual cues from available sources of different data modalities and transforms them into a common representation space, i.e., the probability space. Predicted annotations are later transformed into sentence-like captions through an extractive framework applied over news articles. Our context-driven framework outperforms the state of the art on the collected data set of approximately 20 000 items, as well as on a previously available smaller news images data set.

8.
IEEE Trans Pattern Anal Mach Intell ; 39(10): 2000-2014, 2017 10.
Article in English | MEDLINE | ID: mdl-27893385

ABSTRACT

Named entities such as people, locations, and organizations play a vital role in characterizing online content. They often reflect information of interest and are frequently used in search queries. Although named entities can be detected reliably from textual content, extracting relations among them is more challenging, yet useful in various applications (e.g., news recommending systems). In this paper, we present a novel model and system for learning semantic relations among named entities from collections of news articles. We model each named entity occurrence with sparse structured logistic regression, and consider the words (predictors) to be grouped based on background semantics. This sparse group LASSO approach forces the weights of word groups that do not influence the prediction towards zero. The resulting sparse structure is utilized for defining the type and strength of relations. Our unsupervised system yields a named entities' network where each relation is typed, quantified, and characterized in context. These relations are the key to understanding news material over time and customizing newsfeeds for readers. Extensive evaluation of our system on articles from TIME magazine and BBC News shows that the learned relations correlate with static semantic relatedness measures like WLM, and capture the evolving relationships among named entities over time.

9.
IEEE Trans Biomed Eng ; 63(10): 2155-68, 2016 10.
Article in English | MEDLINE | ID: mdl-26841384

ABSTRACT

GOAL: In refractive surgery, astigmatism-correcting treatments are generally planned with the aid of some diagnostic imaging device and often executed by some computer guided laser system. In the transition from sitting down at a diagnostic device to lying down beneath a laser system, a phenomenon known as cyclotorsion (rotation of the eye within the socket) occurs. Hence, registration between lasers and diagnostic devices is necessary. The purpose of this paper is to present a newly developed algorithm that accomplishes robust registration using images of the patient's iris in the context of laser-assisted cataract surgery, and evaluate its efficacy. METHODS: The proposed iris registration algorithm was tested on real cataract patient images obtained from commercially available devices. Accuracy was measured against manual registrations performed by trained humans. Conservative bounds on success and failure rates were computed using novel statistical methods. RESULTS: The algorithm better approximated the cyclotorsion as averaged over manual measurements from three trained humans than any of the three individual humans, with a 95% tolerance interval of ±1.36°. In addition, a success rate ≥ 99.0% was observed for an acceptance threshold setting that allowed for a false registration rate ≤ 1.00 × 10⁻³%. CONCLUSION: The proposed iris registration algorithm accurately and consistently compensates for cyclotorsion in laser-assisted cataract surgery. SIGNIFICANCE: This paper details the first algorithm to be used for iris registration in laser-assisted cataract surgery. Enabling surgeons to make use of this algorithm in real surgeries is expected to have a significant impact on astigmatism management in cataract surgery.


Subject(s)
Cataract Extraction/methods , Eye Movements/physiology , Image Processing, Computer-Assisted/methods , Surgery, Computer-Assisted/methods , Algorithms , Humans , Iris/diagnostic imaging , Models, Statistical
10.
IEEE Trans Image Process ; 24(11): 4381-93, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26259245

ABSTRACT

In this paper, we focus on face clustering in videos. To promote the performance of video clustering by multiple intrinsic cues, i.e., pairwise constraints and multiple views, we propose a constrained multi-view video face clustering method under a unified graph-based model. First, unlike most existing video face clustering methods, which only employ these constraints in the clustering step, we strengthen the pairwise constraints through the whole video face clustering framework, both in sparse subspace representation and spectral clustering. In the constrained sparse subspace representation, the sparse representation is forced to explore unknown relationships. In the constrained spectral clustering, the constraints are used to guide the learning of more reasonable new representations. Second, our method considers both the video face pairwise constraints and the multi-view consistency simultaneously. In particular, the graph regularization enforces the pairwise constraints to be respected and the co-regularization penalizes the disagreement among different graphs of multiple views. Experiments on three real-world video benchmark data sets demonstrate the significant improvements of our method over the state-of-the-art methods.

11.
IEEE Trans Image Process ; 24(8): 2488-501, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25910089

ABSTRACT

We propose that the dynamics of an action in video data forms a sparse self-similar manifold in the space-time volume, which can be fully characterized by a linear rank decomposition. Inspired by the recurrence plot theory, we introduce the concept of Joint Self-Similarity Volume (Joint-SSV) to model this sparse action manifold, and hence propose a new optimized rank-1 tensor approximation of the Joint-SSV to obtain compact low-dimensional descriptors that very accurately characterize an action in a video sequence. We show that these descriptor vectors make it possible to recognize actions without explicitly aligning the videos in time in order to compensate for speed of execution or differences in video frame rates. Moreover, we show that the proposed method is generic, in the sense that it can be applied using different low-level features, such as silhouettes, tracked points, histogram of oriented gradients, and so forth. Therefore, our method does not necessarily require explicit tracking of features in the space-time volume. Our experimental results on five public data sets demonstrate that our method produces promising results and outperforms many baseline methods.

12.
IEEE Trans Image Process ; 24(4): 1302-14, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25705915

ABSTRACT

Texts in natural scenes carry critical semantic clues for understanding images. When capturing natural scene images, especially by handheld cameras, a common artifact, i.e., blur, frequently happens. To improve the visual quality of such images, deblurring techniques are desired, which also play an important role in character recognition and image understanding. In this paper, we study the problem of recovering the clear scene text by exploiting the text field characteristics. A series of text-specific multiscale dictionaries (TMD) and a natural scene dictionary are learned for separately modeling the priors on the text and nontext fields. The TMD-based text field reconstruction helps to deal with the different scales of strings in a blurry image effectively. Furthermore, an adaptive version of the nonuniform deblurring method is proposed to efficiently solve the real-world spatially varying problem. Dictionary learning allows more flexible modeling with respect to the text field property, and the combination with the nonuniform method is more appropriate in real situations where blur kernel sizes are depth dependent. Experimental results show that the proposed method achieves deblurring results with better visual quality than the state-of-the-art methods.

13.
IEEE Trans Pattern Anal Mach Intell ; 31(10): 1898-905, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19696457

ABSTRACT

We propose a new view-invariant measure for action recognition. For this purpose, we introduce the idea that the motion of an articulated body can be decomposed into rigid motions of planes defined by triplets of body points. Using the fact that the homography induced by the motion of a triplet of body points in two identical pose transitions reduces to the special case of a homology, we use the equality of two of its eigenvalues as a measure of the similarity of the pose transitions between two subjects, observed by different perspective cameras and from different viewpoints. Experimental results show that our method can accurately identify human pose transitions and actions even when they include dynamic timeline maps, and are obtained from totally different viewpoints with different unknown camera parameters.


Subject(s)
Movement/physiology , Pattern Recognition, Automated/methods , Posture/physiology , Algorithms , Databases, Factual , Human Activities , Humans , Motion
14.
IEEE Trans Image Process ; 17(7): 1061-8, 2008 Jul.
Article in English | MEDLINE | ID: mdl-18586615

ABSTRACT

In this paper, we present a novel and efficient solution to phase-shifting 2-D nonseparable Haar wavelet coefficients. While other methods either modify existing wavelets or introduce new ones to handle the lack of shift-invariance, we derive the explicit relationships between the coefficients of the shifted signal and those of the unshifted one. We then establish their computational complexity, and compare and demonstrate the superior performance of the proposed approach against classical interpolation tools in terms of accumulation of errors under successive shifting.


Subject(s)
Algorithms , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Signal Processing, Computer-Assisted , Reproducibility of Results , Sensitivity and Specificity
15.
Opt Lett ; 33(11): 1237-9, 2008 Jun 01.
Article in English | MEDLINE | ID: mdl-18516186

ABSTRACT

We previously demonstrated that radial basis functions may be preferred as a descriptor of free-form shape for a single mirror magnifier when compared to other conventional descriptions such as polynomials [Opt. Express 16, 1583 (2008)]. A key contribution is the application of radial basis functions to describe and optimize the shape of a free-form mirror in a dual-element magnifier with the specific goal of optimizing the pupil size given a 20 degrees field of view. We demonstrate a 12 mm exit pupil, 20 degrees diagonal full field of view, 15.5 mm eye clearance, 1.5 arc min resolution catadioptric dual-element magnifier design operating across the photopic visual regime. A second contribution is the explanation of why it is possible to approximate any optical mirror shape using radial basis functions.

16.
Opt Express ; 16(3): 1583-9, 2008 Feb 04.
Article in English | MEDLINE | ID: mdl-18542236

ABSTRACT

A local optical surface representation as a sum of basis functions is proposed and implemented. Specifically, we investigate the use of a linear combination of Gaussians. The proposed approach is a local descriptor of shape and we show how such surfaces are optimized to represent rotationally non-symmetric surfaces as well as rotationally symmetric surfaces. As an optical design example, a single surface off-axis mirror with multiple fields is optimized, analyzed, and compared to existing shape descriptors. For the specific case of the single surface off-axis magnifier with a 3 mm pupil, >15 mm eye relief, 24 degree diagonal full field of view, we found the linear combination of Gaussians surface to yield an 18.5% gain in the average MTF across 17 field points compared to a Zernike polynomial up to and including 10th order. The sum of local basis representation is not limited to circular apertures.


Subject(s)
Computer-Aided Design , Equipment Design/methods , Equipment Failure Analysis/methods , Lenses , Models, Theoretical , Computer Simulation
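
To make the "linear combination of Gaussians" in the abstract above concrete, here is a minimal sketch of evaluating such a surface. The exact parameterization used in the paper is not given in the abstract, so this assumes isotropic Gaussians with a shared width `sigma`; the function name, centers, and weights are all hypothetical.

```python
from math import exp


def gaussian_surface(x, y, centers, weights, sigma):
    """Surface height at (x, y) as a weighted sum of isotropic Gaussians.

    Assumed form: z(x, y) = sum_i w_i * exp(-((x - cx_i)^2 + (y - cy_i)^2) / (2 sigma^2)).
    """
    return sum(
        w * exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
        for (cx, cy), w in zip(centers, weights)
    )


# Two basis functions with unequal weights give a rotationally
# non-symmetric shape: the heights at mirrored points differ.
centers = [(-1.0, 0.0), (1.0, 0.0)]
weights = [0.5, 0.2]
left = gaussian_surface(-1.0, 0.0, centers, weights, sigma=0.5)
right = gaussian_surface(1.0, 0.0, centers, weights, sigma=0.5)
print(left, right)
```

Because each Gaussian decays quickly away from its center, changing one weight mostly affects the surface near that center, which is the sense in which the description is local.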
17.
IEEE Trans Syst Man Cybern B Cybern ; 37(4): 803-16, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17702281

ABSTRACT

In order to monitor sufficiently large areas of interest for surveillance or any event detection, we need to look beyond stationary cameras and employ an automatically configurable network of nonoverlapping cameras. These cameras need not have an overlapping field of view and should be allowed to move freely in space. Moreover, features like zooming in/out, readily available in security cameras these days, should be exploited in order to focus on any particular area of interest if needed. In this paper, a practical framework is proposed to self-calibrate dynamically moving and zooming cameras and determine their absolute and relative orientations, assuming that their relative position is known. A global linear solution is presented for self-calibrating each zooming/focusing camera in the network. After self-calibration, it is shown that only one automatically computed vanishing point and a line lying on any plane orthogonal to the vertical direction are sufficient to infer the dynamic network configuration. Our method generalizes previous work which considers restricted camera motions. Using minimal assumptions, we are able to successfully demonstrate promising results on synthetic, as well as on real data.


Subject(s)
Algorithms , Artificial Intelligence , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Pattern Recognition, Automated/methods , Photography/methods , Reproducibility of Results , Sensitivity and Specificity
18.
IEEE Trans Image Process ; 15(11): 3614-9, 2006 Nov.
Article in English | MEDLINE | ID: mdl-17076420

ABSTRACT

This paper proposes a novel method for camera calibration using images of a mirror symmetric object. Assuming unit aspect ratio and zero skew, we show that interimage homographies can be expressed as a function of only the principal point. By minimizing symmetric transfer errors, we thus obtain an accurate solution for the camera parameters. We also extend our approach to a calibration technique using images of a 1-D object with a fixed pivoting point. Unlike existing methods that rely on orthogonality or pole-polar relationship, our approach utilizes new inter-image constraints and does not require knowledge of the 3-D coordinates of feature points. To demonstrate the effectiveness of the approach, we present results for both synthetic and real images.


Subject(s)
Algorithms , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Photogrammetry/instrumentation , Photogrammetry/methods , Calibration , Information Storage and Retrieval/methods , Phantoms, Imaging , Photogrammetry/standards , Reproducibility of Results , Sensitivity and Specificity
19.
IEEE Trans Image Process ; 15(7): 1965-72, 2006 Jul.
Article in English | MEDLINE | ID: mdl-16830916

ABSTRACT

In this paper, we establish the exact relationship between the continuous and the discrete phase difference of two shifted images, and show that their discrete phase difference is a two-dimensional sawtooth signal. Subpixel registration can, thus, be performed directly in the Fourier domain by counting the number of cycles of the phase difference matrix along each frequency axis. The subpixel portion is given by the noninteger fraction of the last cycle along each axis. The problem is formulated as an overdetermined homogeneous quadratic cost function under rank constraint for the phase difference, and the shape constraint for the filter that computes the group delay. The optimal tradeoff for imposing the constraints is determined using the method of generalized cross validation. Also, in order to robustify the solution, we assume a mixture model of inlying and outlying estimated shifts and truncate our quadratic cost function using expectation maximization.


Subject(s)
Algorithms , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Signal Processing, Computer-Assisted , Subtraction Technique , Artificial Intelligence , Computer Graphics , Models, Statistical , Numerical Analysis, Computer-Assisted
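
The sawtooth behavior described in the abstract above follows from the Fourier shift theorem, which can be checked on a one-dimensional toy signal. The sketch below is a simplification and not the paper's estimator: it assumes an integer circular shift, uses a naive DFT, and reads the shift off the phase at the first frequency bin instead of counting cycles or solving the constrained cost function. All names are hypothetical.

```python
import cmath
from math import pi


def dft(x):
    """Naive discrete Fourier transform, adequate for short signals."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


def phase_difference(x, y):
    """Wrapped phase of Y[k] * conj(X[k]) for each frequency bin k."""
    X, Y = dft(x), dft(y)
    return [cmath.phase(Yk * Xk.conjugate()) for Xk, Yk in zip(X, Y)]


def estimate_integer_shift(x, y):
    """Recover a circular integer shift from the phase at bin k = 1."""
    N = len(x)
    slope = phase_difference(x, y)[1]  # equals -2*pi*s/N, wrapped
    return round(-slope * N / (2 * pi)) % N


x = [3, 1, 0, 0, 0, 0, 0, 0]
shift = 3
y = [x[(n - shift) % len(x)] for n in range(len(x))]  # y[n] = x[n - shift]

phases = phase_difference(x, y)
print([round(p, 3) for p in phases])
print(estimate_integer_shift(x, y))
```

Plotted against the bin index `k`, the wrapped phases trace a sawtooth whose number of cycles across the full frequency axis equals the shift, which is the one-dimensional analogue of the cycle counting mentioned in the abstract.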
20.
IEEE Trans Image Process ; 14(2): 222-30, 2005 Feb.
Article in English | MEDLINE | ID: mdl-15700527

ABSTRACT

In this paper, we address some of the major issues in optical flow within a new framework assuming nonstationary statistics for the motion field and for the errors. Problems addressed include the preservation of discontinuities, model/data errors, outliers, confidence measures, and performance evaluation. In solving these problems, we assume that the statistics of the motion field and the errors are not only spatially varying, but also unknown. We, thus, derive a blind adaptive technique based on generalized cross validation for estimating an independent regularization parameter for each pixel. Our formulation is pixelwise and combines existing first- and second-order constraints with a new second-order temporal constraint. We derive a new confidence measure for an adaptive rejection of erroneous and outlying motion vectors, and compare our results to other techniques in the literature. A new performance measure is also derived for estimating the signal-to-noise ratio for real sequences when the ground truth is unknown.


Subject(s)
Algorithms , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Movement , Pattern Recognition, Automated/methods , Signal Processing, Computer-Assisted , Subtraction Technique , Video Recording/methods , Cluster Analysis , Information Storage and Retrieval/methods , Models, Statistical , Numerical Analysis, Computer-Assisted , Reproducibility of Results , Sensitivity and Specificity , Stochastic Processes