1.
IEEE Trans Pattern Anal Mach Intell ; 46(7): 5092-5113, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38315601

ABSTRACT

In the field of visual scene understanding, deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection. However, most approaches operate on the close-set assumption, meaning that the model can only identify pre-defined categories that are present in the training set. Recently, open vocabulary settings were proposed due to the rapid progress of vision language pre-training. These new approaches seek to locate and recognize categories beyond the annotated label space. The open vocabulary approach is more general, practical, and effective than weakly supervised and zero-shot settings. This paper thoroughly reviews open vocabulary learning, summarizing and analyzing recent developments in the field. In particular, we begin by juxtaposing open vocabulary learning with analogous concepts such as zero-shot learning, open-set recognition, and out-of-distribution detection. Subsequently, we examine several pertinent tasks within the realms of segmentation and detection, encompassing long-tail problems, few-shot, and zero-shot settings. As a foundation for our method survey, we first elucidate the fundamental principles of detection and segmentation in close-set scenarios. Next, we examine various contexts where open vocabulary learning is employed, pinpointing recurring design elements and central themes. This is followed by a comparative analysis of recent detection and segmentation methodologies in commonly used datasets and benchmarks. Our review culminates with a synthesis of insights, challenges, and discourse on prospective research trajectories. To our knowledge, this constitutes the inaugural exhaustive literature review on open vocabulary learning.

2.
IEEE Trans Pattern Anal Mach Intell ; 46(5): 3388-3405, 2024 May.
Article in English | MEDLINE | ID: mdl-38090829

ABSTRACT

The training and inference of Graph Neural Networks (GNNs) are costly when scaling up to large-scale graphs. Graph Lottery Ticket (GLT) has presented the first attempt to accelerate GNN inference on large-scale graphs by jointly pruning the graph structure and the model weights. Though promising, GLT encounters robustness and generalization issues when deployed in real-world scenarios, which are also long-standing and critical problems in deep learning. In real-world scenarios, the distribution of unseen test data is typically diverse. We attribute the failures on out-of-distribution (OOD) data to the incapability of discerning causal patterns, which remain stable amidst distribution shifts. In traditional sparse graph learning, the model performance deteriorates dramatically as the graph/network sparsity exceeds a certain high level. Worse still, the pruned GNNs are hard to generalize to unseen graph data due to the limited training set at hand. To tackle these issues, we propose the Resilient Graph Lottery Ticket (RGLT) to find more robust and generalizable GLT in GNNs. Concretely, we reactivate a fraction of weights/edges by instantaneous gradient information at each pruning point. After sufficient pruning, we conduct environmental interventions to extrapolate potential test distribution. Finally, we perform the last several rounds of model averages to further improve generalization. We provide multiple examples and theoretical analyses that underpin the universality and reliability of our proposal. Further, RGLT has been experimentally verified across various independent identically distributed (IID) and out-of-distribution (OOD) graph benchmarks.
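
The prune-then-reactivate step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the helper name `prune_and_reactivate`, the magnitude-based pruning criterion, and the `prune_frac`/`regrow_frac` parameters are all assumptions.

```python
import numpy as np

def prune_and_reactivate(weights, grads, mask, prune_frac=0.2, regrow_frac=0.05):
    """Return an updated boolean mask (hypothetical helper, assumed criteria).

    1. Prune the active entries with the smallest weight magnitude.
    2. Reactivate the pruned entries with the largest gradient magnitude.
    """
    shape = mask.shape
    flat = np.asarray(mask, dtype=bool).ravel().copy()
    w = np.abs(np.asarray(weights, dtype=float)).ravel()
    g = np.abs(np.asarray(grads, dtype=float)).ravel()

    active = np.flatnonzero(flat)
    n_prune = int(prune_frac * active.size)
    # Prune: drop the n_prune active entries with the smallest |weight|.
    flat[active[np.argsort(w[active])[:n_prune]]] = False

    pruned = np.flatnonzero(~flat)
    n_regrow = min(int(regrow_frac * active.size), pruned.size)
    # Reactivate: restore the n_regrow pruned entries with the largest |gradient|.
    flat[pruned[np.argsort(-g[pruned])[:n_regrow]]] = True
    return flat.reshape(shape)
```

The same mask update could be applied to a graph's edge list instead of a weight matrix; the abstract mentions both weights and edges.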

3.
IEEE Trans Pattern Anal Mach Intell ; 45(11): 13024-13034, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37603491

ABSTRACT

Graph Neural Networks (GNNs) have been drawing significant attention to representation learning on graphs. Recent works developed frameworks to train very deep GNNs and showed impressive results in tasks like point cloud learning and protein interaction prediction. In this work, we study the performance of such deep models in large-scale graphs. In particular, we look at the effect of adequately choosing an aggregation function on deep models. We find that GNNs are very sensitive to the choice of aggregation functions (e.g. mean, max, and sum) when applied to different datasets. We systematically study and propose to alleviate this issue by introducing a novel class of aggregation functions named Generalized Aggregation Functions. The proposed functions extend beyond commonly used aggregation functions to a wide range of new permutation-invariant functions. Generalized Aggregation Functions are fully differentiable, where their parameters can be learned in an end-to-end fashion to yield a suitable aggregation function for each task. We show that equipped with the proposed aggregation functions, deep residual GNNs outperform state-of-the-art in several benchmarks from Open Graph Benchmark (OGB) across tasks and domains.
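
As an illustration of the idea in this abstract, the sketch below implements one differentiable, permutation-invariant aggregator that interpolates between mean and max through a single parameter. The softmax-weighted form, the function name `softmax_aggregate`, and the parameter `beta` are assumptions made for illustration; the abstract itself does not give the functional form.

```python
import numpy as np

def softmax_aggregate(messages, beta):
    """Aggregate neighbor messages (one row per neighbor) feature by feature.

    Each feature is a weighted sum of the neighbors' values, with weights
    given by a softmax over beta * value. beta = 0 gives the mean; a large
    positive beta approaches the max. (Assumed form, for illustration.)
    """
    m = np.asarray(messages, dtype=float)
    z = beta * m
    z = z - z.max(axis=0, keepdims=True)   # subtract the max for numerical stability
    w = np.exp(z)
    w = w / w.sum(axis=0, keepdims=True)   # softmax over the neighbor axis
    return (w * m).sum(axis=0)
```

Because `beta` enters only through smooth operations, it could be treated as a trainable parameter in an autodiff framework, which is the property the abstract emphasizes.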

4.
IEEE Trans Pattern Anal Mach Intell ; 45(7): 8621-8633, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37022056

ABSTRACT

The task of situation recognition aims to solve the visual reasoning problem with the ability to predict the activity happening (salient action) in an image and the nouns of all associated semantic roles playing in the activity. This poses severe challenges due to long-tailed data distributions and local class ambiguities. Prior works only propagate the local noun-level features on one single image without utilizing global information. We propose a Knowledge-aware Global Reasoning (KGR) framework to endow neural networks with the capability of adaptive global reasoning over nouns by exploiting diverse statistical knowledge. Our KGR is a local-global architecture, which consists of a local encoder to generate noun features using local relations and a global encoder to enhance the noun features via global reasoning supervised by an external global knowledge pool. The global knowledge pool is created by counting the pairwise relationships of nouns in the dataset. In this paper, we design an action-guided pairwise knowledge as the global knowledge pool based on the characteristic of the situation recognition task. Extensive experiments have shown that our KGR not only achieves state-of-the-art results on a large-scale situation recognition benchmark, but also effectively solves the long-tailed problem of noun classification by our global knowledge.

5.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 6923-6939, 2023 Jun.
Article in English | MEDLINE | ID: mdl-33872143

ABSTRACT

Convolutional neural networks (CNNs) have been very successful at solving a variety of computer vision tasks such as object classification and detection, semantic segmentation, activity understanding, to name just a few. One key enabling factor for their great performance has been the ability to train very deep networks. Despite their huge success in many tasks, CNNs do not work well with non-Euclidean data, which is prevalent in many real-world applications. Graph Convolutional Networks (GCNs) offer an alternative that allows for non-Euclidean data input to a neural network. While GCNs already achieve encouraging results, they are currently limited to architectures with a relatively small number of layers, primarily due to vanishing gradients during training. This work transfers concepts such as residual/dense connections and dilated convolutions from CNNs to GCNs in order to successfully train very deep GCNs. We show the benefit of using deep GCNs (with as many as 112 layers) experimentally across various datasets and tasks. Specifically, we achieve very promising performance in part segmentation and semantic segmentation on point clouds and in node classification of protein functions across biological protein-protein interaction (PPI) graphs. We believe that the insights in this work will open avenues for future research on GCNs and their application to further tasks not explored in this paper. The source code for this work is available at https://github.com/lightaime/deep_gcns_torch and https://github.com/lightaime/deep_gcns for PyTorch and TensorFlow implementations respectively.
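
To make the residual-connection idea concrete, here is a minimal NumPy sketch of a graph convolution layer with a skip connection. It is not taken from the linked repositories; the function names, the symmetric normalization, and the ReLU nonlinearity are assumptions chosen for a compact example.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = np.asarray(adj, dtype=float) + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def res_gcn_layer(h, a_hat, w):
    """One residual graph convolution: h + ReLU(a_hat @ h @ w).

    The skip connection keeps an identity path from input to output, which is
    what lets gradients pass through many stacked layers. w must be square so
    that the input and output feature sizes match.
    """
    return h + np.maximum(a_hat @ h @ w, 0.0)
```

With zero weights the layer reduces to the identity, so stacking many such layers does not by itself destroy the input signal.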

6.
IEEE Trans Pattern Anal Mach Intell ; 45(4): 5027-5037, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36001517

ABSTRACT

This work tackles the problem of characterizing and understanding the decision boundaries of neural networks with piecewise linear non-linearity activations. We use tropical geometry, a new development in the area of algebraic geometry, to characterize the decision boundaries of a simple network of the form (Affine, ReLU, Affine). Our main finding is that the decision boundaries are a subset of a tropical hypersurface, which is intimately related to a polytope formed by the convex hull of two zonotopes. The generators of these zonotopes are functions of the network parameters. This geometric characterization provides new perspectives to three tasks. (i) We propose a new tropical perspective to the lottery ticket hypothesis, where we view the effect of different initializations on the tropical geometric representation of a network's decision boundaries. (ii) Moreover, we propose new tropical based optimization reformulations that directly influence the decision boundaries of the network for the task of network pruning. (iii) At last, we discuss the reformulation of the generation of adversarial attacks in a tropical sense. We demonstrate that one can construct adversaries in a new tropical setting by perturbing a specific set of decision boundaries by perturbing a set of parameters in the network.

7.
Adv Sci (Weinh) ; 9(32): e2203460, 2022 11.
Article in English | MEDLINE | ID: mdl-36089657

ABSTRACT

Respiration signals reflect many underlying health conditions, including cardiopulmonary functions, autonomic disorders and respiratory distress, therefore continuous measurement of respiration is needed in various cases. Unfortunately, there is still a lack of effective portable electronic devices that meet the demands for medical and daily respiration monitoring. This work showcases a soft, wireless, and non-invasive device for quantitative and real-time evaluation of human respiration. This device simultaneously captures respiration and temperature signatures using customized capacitive and resistive sensors, encapsulated by a breathable layer, and does not limit the user's daily life. Further a machine learning-based respiration classification algorithm with a set of carefully studied features as inputs is proposed and it is deployed into mobile clients. The body status of users, such as being quiet, active and coughing, can be accurately recognized by the algorithm and displayed on clients. Moreover, multiple devices can be linked to a server network to monitor a group of users and provide each user with the statistical duration of physiological activities, coughing alerts, and body health advice. With these devices, individual and group respiratory health status can be quantitatively collected, analyzed, and stored for daily physiological signal detections as well as medical assistance.


Subjects
Wearable Electronic Devices; Humans; Monitoring, Physiologic; Respiration; Computers; Machine Learning

8.
Proc Natl Acad Sci U S A ; 119(28): e2118260119, 2022 07 12.
Article in English | MEDLINE | ID: mdl-35763567

ABSTRACT

Type VI CRISPR-Cas systems have been repurposed for various applications such as gene knockdown, viral interference, and diagnostics. However, the identification and characterization of thermophilic orthologs will expand and unlock the potential of diverse biotechnological applications. Herein, we identified and characterized a thermostable ortholog of the Cas13a family from the thermophilic organism Thermoclostridium caenicola (TccCas13a). We show that TccCas13a has a close phylogenetic relation to the HheCas13a ortholog from the thermophilic bacterium Herbinix hemicellulosilytica and shares several properties such as thermostability and inability to process its own pre-CRISPR RNA. We demonstrate that TccCas13a possesses robust cis and trans activities at a broad temperature range of 37 to 70 °C, compared with HheCas13a, which has a more limited range and lower activity. We harnessed TccCas13a thermostability to develop a sensitive, robust, rapid, and one-pot assay, named OPTIMA-dx, for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) detection. OPTIMA-dx exhibits no cross-reactivity with other viruses and a limit of detection of 10 copies/µL when using a synthetic SARS-CoV-2 genome. We used OPTIMA-dx for SARS-CoV-2 detection in clinical samples, and our assay showed 95% sensitivity and 100% specificity compared with qRT-PCR. Furthermore, we demonstrated that OPTIMA-dx is suitable for multiplexed detection and is compatible with the quick extraction protocol. OPTIMA-dx exhibits critical features that enable its use at point of care (POC). Therefore, we developed a mobile phone application to facilitate OPTIMA-dx data collection and sharing of patient sample results. This work demonstrates the power of CRISPR-Cas13 thermostable enzymes in enabling key applications in one-pot POC diagnostics and potentially in transcriptome engineering, editing, and therapies.
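
For readers less familiar with the diagnostic metrics quoted in this abstract, the sketch below shows how sensitivity and specificity are computed against a reference test. The counts in the usage note are hypothetical and are not the study's data; they are chosen only so that the arithmetic reproduces 95% and 100%.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of reference-positive samples that the assay also calls positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of reference-negative samples that the assay also calls negative."""
    return true_neg / (true_neg + false_pos)
```

With hypothetical counts of 19 true positives and 1 false negative, `sensitivity(19, 1)` is 0.95; with 20 true negatives and no false positives, `specificity(20, 0)` is 1.0.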


Subjects
Bacterial Proteins; COVID-19; CRISPR-Associated Proteins; Clostridiales; Endodeoxyribonucleases; Point-of-Care Testing; SARS-CoV-2; Bacterial Proteins/chemistry; Bacterial Proteins/classification; Bacterial Proteins/genetics; Biotechnology; COVID-19/diagnosis; CRISPR-Associated Proteins/chemistry; CRISPR-Associated Proteins/classification; CRISPR-Associated Proteins/genetics; Clostridiales/enzymology; Endodeoxyribonucleases/chemistry; Endodeoxyribonucleases/classification; Endodeoxyribonucleases/genetics; Enzyme Stability; Hot Temperature; Humans; Phylogeny; SARS-CoV-2/isolation & purification

9.
Sci Data ; 9(1): 355, 2022 06 21.
Article in English | MEDLINE | ID: mdl-35729183

ABSTRACT

Soccer videos are a rich playground for computer vision, involving many elements, such as players, lines, and specific objects. Hence, to capture the richness of this sport and allow for fine automated analyses, we release SoccerNet-v3, a major extension of the SoccerNet dataset, providing a wide variety of spatial annotations and cross-view correspondences. SoccerNet's broadcast videos contain replays of important actions, allowing us to retrieve the same action from different viewpoints. We annotate those live and replay action frames showing the same moments with exhaustive local information. Specifically, we label lines, goal parts, players, referees, teams, salient objects, and jersey numbers, and we establish player correspondences between the views. This yields 1,324,732 annotations on 33,986 soccer images, making SoccerNet-v3 the largest dataset for multi-view soccer analysis. Derived tasks may benefit from these annotations, like camera calibration, player localization, team discrimination and multi-view re-identification, which can further sustain practical applications in augmented reality and soccer analytics. Finally, we provide Python code to easily download our data and access our annotations.

10.
IEEE Trans Image Process ; 30: 5889-5904, 2021.
Article in English | MEDLINE | ID: mdl-34156942

ABSTRACT

Viewing various stereo images under different viewing conditions has escalated the need for effective object-level remapping techniques. In this paper, we propose a new object spatial mapping scheme, which adjusts the depth and size of the selected object to match user preference and viewing conditions. Existing warping-based methods often distort the shape of important objects or cannot faithfully adjust the depth/size of the selected object due to improper warping such as local rotations. In this paper, by explicitly reducing the transformation freedom degree of warping, we propose an optimization model based on axis-aligned warping for object spatial remapping. The proposed axis-aligned warping based optimization model can simultaneously adjust the depths and sizes of selected objects to their target values without introducing severe shape distortions. Moreover, we propose object consistency constraints to ensure the size/shape of parts inside a selected object to be consistently adjusted. Such constraints improve the size/shape adjustment performance while remaining robust to some extent to incomplete object extraction. Experimental results demonstrate that the proposed method achieves high flexibility and effectiveness in adjusting the size and depth of objects compared with existing methods.

11.
Plant Physiol ; 186(3): 1632-1644, 2021 07 06.
Article in English | MEDLINE | ID: mdl-33856485

ABSTRACT

Witchweeds (Striga spp.) and broomrapes (Orobanchaceae and Phelipanche spp.) are root parasitic plants that infest many crops in warm and temperate zones, causing enormous yield losses and endangering global food security. Seeds of these obligate parasites require rhizospheric, host-released stimulants to germinate, which opens up possibilities for controlling them by applying specific germination inhibitors or synthetic stimulants that induce lethal germination in the host's absence. To determine their effect on germination, root exudates or synthetic stimulants/inhibitors are usually applied to parasitic seeds in in vitro bioassays, followed by assessment of germination ratios. Although these protocols are very sensitive, the germination recording process is laborious, representing a challenge for researchers and impeding high-throughput screens. Here, we developed an automatic seed census tool to count and discriminate germinated seeds (GS) from non-GS. We combined deep learning, a powerful data-driven framework that can accelerate the procedure and increase its accuracy, for object detection with computer vision latest development based on the Faster Region-based Convolutional Neural Network algorithm. Our method showed an accuracy of 94% in counting seeds of Striga hermonthica and reduced the required time from approximately 5 min to 5 s per image. Our proposed software, SeedQuant, will be of great help for seed germination bioassays and enable high-throughput screening for germination stimulants/inhibitors. SeedQuant is an open-source software that can be further trained to count different types of seeds for research purposes.


Subjects
Germination/drug effects; Orobanchaceae/growth & development; Plant Roots/growth & development; Plant Roots/parasitology; Plant Weeds/growth & development; Software; Sorghum/parasitology; Striga/growth & development; Crops, Agricultural/growth & development; Crops, Agricultural/parasitology; Decision Making, Computer-Assisted; Deep Learning

12.
IEEE Trans Neural Netw Learn Syst ; 32(5): 2251-2265, 2021 05.
Article in English | MEDLINE | ID: mdl-32644931

ABSTRACT

In real-world scenarios (i.e., in the wild), pedestrians are often far from the camera (i.e., small scale), and they often gather together and occlude with each other (i.e., heavily occluded). However, detecting these small-scale and heavily occluded pedestrians remains a challenging problem for the existing pedestrian detection methods. We argue that these problems arise because of two factors: 1) insufficient resolution of feature maps for handling small-scale pedestrians and 2) lack of an effective strategy for extracting body part information that can directly deal with occlusion. To solve the above-mentioned problems, in this article, we propose a key-point-guided super-resolution network (coined KGSNet) for detecting these small-scale and heavily occluded pedestrians in the wild. Specifically, to address factor 1), a super-resolution network is first trained to generate a clear super-resolution pedestrian image from a small-scale one. In the super-resolution network, we exploit key points of the human body to guide the super-resolution network to recover fine details of the human body region for easier pedestrian detection. To address factor 2), a part estimation module is proposed to encode the semantic information of different human body parts where four semantic body parts (i.e., head and upper/middle/bottom body) are extracted based on the key points. Finally, based on the generated clear super-resolved pedestrian patches padded with the extracted semantic body part images at the image level, a classification network is trained to further distinguish pedestrians/backgrounds from the inputted proposal regions. Both proposed networks (i.e., super-resolution network and classification network) are optimized in an alternating manner and trained in an end-to-end fashion. 
Extensive experiments on the challenging CityPersons data set demonstrate the effectiveness of the proposed method, which achieves superior performance over previous state-of-the-art methods, especially for those small-scale and heavily occluded instances. Beyond this, we also achieve state-of-the-art performance (i.e., 3.89% MR$^{-2}$ on the reasonable subset) on the Caltech data set.

13.
IEEE Trans Pattern Anal Mach Intell ; 42(9): 2148-2164, 2020 09.
Article in English | MEDLINE | ID: mdl-31056489

ABSTRACT

In popular TV programs (such as CSI), a very low-resolution face image of a person, who is not even looking at the camera in many cases, is digitally super-resolved to a degree that suddenly the person's identity is made visible and recognizable. Of course, we suspect that this is merely a cinematographic special effect and such a magical transformation of a single image is not technically possible. Or, is it? In this paper, we push the boundaries of super-resolving (hallucinating to be more accurate) a tiny, non-frontal face image to understand how much of this is possible by leveraging the availability of large datasets and deep networks. To this end, we introduce a novel Transformative Adversarial Neural Network (TANN) to jointly frontalize very-low resolution (i.e., 16 × 16 pixels) out-of-plane rotated face images (including profile views) and aggressively super-resolve them (8×), regardless of their original poses and without using any 3D information. TANN is composed of two components: a transformative upsampling network which embodies encoding, spatial transformation and deconvolutional layers, and a discriminative network that enforces the generated high-resolution frontal faces to lie on the same manifold as real frontal face images. We evaluate our method on a large set of synthesized non-frontal face images to assess its reconstruction performance. Extensive experiments demonstrate that TANN generates both qualitatively and quantitatively superior results achieving over 4 dB improvement over the state-of-the-art.

14.
IEEE Trans Pattern Anal Mach Intell ; 41(2): 352-364, 2019 Feb.
Article in English | MEDLINE | ID: mdl-29990015

ABSTRACT

Total Variation (TV) is an effective and popular prior model in the field of regularization-based image processing. This paper focuses on total variation for removing impulse noise in image restoration. This type of noise frequently arises in data acquisition and transmission due to many reasons, e.g., a faulty sensor or analog-to-digital converter errors. Removing this noise is an important task in image restoration. State-of-the-art methods such as Adaptive Outlier Pursuit (AOP) [1], which is based on TV with $\ell_{02}$-norm data fidelity, only give sub-optimal performance. In this paper, we propose a new sparse optimization method, called $\ell_0$TV-PADMM, which solves the TV-based restoration problem with $\ell_0$-norm data fidelity. To effectively deal with the resulting non-convex non-smooth optimization problem, we first reformulate it as an equivalent biconvex Mathematical Program with Equilibrium Constraints (MPEC), and then solve it using a proximal Alternating Direction Method of Multipliers (PADMM). Our $\ell_0$TV-PADMM method finds a desirable solution to the original $\ell_0$-norm optimization problem and is proven to be convergent under mild conditions. We apply $\ell_0$TV-PADMM to the problems of image denoising and deblurring in the presence of impulse noise. Our extensive experiments demonstrate that $\ell_0$TV-PADMM outperforms state-of-the-art image restoration methods.
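
In generic form, a TV-regularized restoration problem with an $\ell_0$-norm data-fidelity term can be written as below. The notation is an assumption for illustration and is not taken from the paper: $u$ is the image to recover, $b$ the observed image, $K$ a linear operator (the identity for denoising, a blur for deblurring), $\nabla$ a discrete gradient, and $\lambda > 0$ a trade-off weight.

```latex
\min_{u} \; \| K u - b \|_0 \; + \; \lambda \, \| \nabla u \|_1
```

The first term counts the pixels where the restored image disagrees with the observation, which suits impulse noise because only a subset of pixels is corrupted; the second term is the total variation prior.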

15.
IEEE Trans Pattern Anal Mach Intell ; 41(7): 1695-1708, 2019 Jul.
Article in English | MEDLINE | ID: mdl-29994196

ABSTRACT

This paper revisits the integer programming (IP) problem, which plays a fundamental role in many computer vision and machine learning applications. The literature abounds with many seminal works that address this problem, some focusing on continuous approaches (e.g., linear program relaxation), while others on discrete ones (e.g., min-cut). However, since many of these methods are designed to solve specific IP forms, they cannot adequately satisfy the simultaneous requirements of accuracy, feasibility, and scalability. To this end, we propose a novel and versatile framework called $\ell_p$-box ADMM, which is based on two main ideas. (1) The discrete constraint is equivalently replaced by the intersection of a box and an $\ell_p$-norm sphere. (2) We infuse this equivalence into the Alternating Direction Method of Multipliers (ADMM) framework to handle the continuous constraints separately and to harness its attractive properties. More importantly, the ADMM update steps can lead to manageable sub-problems in the continuous domain. To demonstrate its efficacy, we apply it to an optimization form that occurs often in computer vision and machine learning, namely binary quadratic programming (BQP). In this case, the ADMM steps are simple and computationally efficient. Moreover, we present a theoretical analysis of the global convergence of the $\ell_p$-box ADMM through adding a perturbation with a sufficiently small factor $\epsilon$ to the original IP problem. Specifically, the globally converged solution generated by $\ell_p$-box ADMM for the perturbed IP problem will be close to the stationary and feasible point of the original IP problem within $O(\epsilon)$. We demonstrate the applicability of $\ell_p$-box ADMM on three important applications: MRF energy minimization, graph matching, and clustering. Results clearly show that it significantly outperforms existing generic IP solvers both in runtime and objective. It also achieves very competitive performance to state-of-the-art methods designed specifically for these applications.
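
The first idea in this abstract, replacing the binary constraint by the intersection of a box and an $\ell_p$-norm sphere, can be checked numerically. The sketch below assumes the box is $[0,1]^n$ and the sphere is centered at $\tfrac{1}{2}\mathbf{1}$ with $\|x - \tfrac{1}{2}\mathbf{1}\|_p^p = n/2^p$; the function names are hypothetical.

```python
import numpy as np

def in_box(x, tol=1e-9):
    """True if every coordinate lies in [0, 1] (up to tol)."""
    return bool(np.all(x >= -tol) and np.all(x <= 1.0 + tol))

def on_sphere(x, p=2.0, tol=1e-9):
    """True if ||x - 1/2||_p^p equals n / 2^p (up to tol)."""
    n = x.size
    return bool(abs(np.sum(np.abs(x - 0.5) ** p) - n / 2.0 ** p) <= tol)

def is_binary_via_lp_box(x, p=2.0):
    """A point is binary exactly when it lies in the box AND on the sphere."""
    x = np.asarray(x, dtype=float)
    return in_box(x) and on_sphere(x, p)
```

Inside the box each coordinate satisfies $|x_i - \tfrac{1}{2}| \le \tfrac{1}{2}$, so the sphere equation can hold only when every coordinate sits at 0 or 1. Neither set alone is enough: the box contains fractional points and the sphere contains points outside the box.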

16.
IEEE Trans Cybern ; 46(1): 51-63, 2016 Jan.
Article in English | MEDLINE | ID: mdl-25680224

ABSTRACT

In this paper, we formulate particle filter-based object tracking as an exclusive sparse learning problem that exploits contextual information. To achieve this goal, we propose the context-aware exclusive sparse tracker (CEST) to model particle appearances as linear combinations of dictionary templates that are updated dynamically. Learning the representation of each particle is formulated as an exclusive sparse representation problem, where the overall dictionary is composed of multiple group dictionaries that can contain contextual information. With context, CEST is less prone to tracker drift. Interestingly, we show that the popular L1 tracker is a special case of our CEST formulation. The proposed learning problem is efficiently solved using an accelerated proximal gradient method that yields a sequence of closed form updates. To make the tracker much faster, we reduce the number of learning problems to be solved by using the dual problem to quickly and systematically rank and prune particles in each frame. We test our CEST tracker on challenging benchmark sequences that involve heavy occlusion, drastic illumination changes, and large pose variations. Experimental results show that CEST consistently outperforms state-of-the-art trackers.
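
The abstract notes that the L1 tracker is a special case and that the learning problem is solved by a proximal gradient method with closed-form updates. As a simplified illustration of that special case only, the sketch below runs plain (non-accelerated) proximal gradient on an $\ell_1$-regularized least-squares problem. It is not the CEST solver; the names `soft_threshold` and `ista`, the objective $\tfrac{1}{2}\|Dc - y\|_2^2 + \lambda\|c\|_1$, and the fixed step size are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Closed-form proximal operator of t * ||.||_1: shrink each entry toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, y, lam, n_iter=200):
    """Proximal gradient for min_c 0.5 * ||D c - y||^2 + lam * ||c||_1.

    D holds dictionary templates as columns, y is one particle's appearance
    vector, and c is its sparse coefficient vector (assumed setup).
    """
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ c - y)             # gradient of the smooth term
        c = soft_threshold(c - step * grad, step * lam)
    return c
```

Each iteration is a gradient step followed by a closed-form shrinkage, which is the kind of update sequence the abstract refers to; an accelerated variant would add a momentum term between iterations.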
