Results 1 - 20 of 94
1.
Article in English | MEDLINE | ID: mdl-38963736

ABSTRACT

Cluster analysis plays an indispensable role in machine learning and data mining. Learning a good data representation is crucial for clustering algorithms. Recently, deep clustering (DC), which can learn clustering-friendly representations using deep neural networks (DNNs), has been applied to a wide range of clustering tasks. Existing surveys of DC mainly focus on single-view settings and network architectures, ignoring the complex application scenarios of clustering. To address this issue, in this article, we provide a comprehensive survey of DC from the perspective of data sources. With different data sources, we systematically distinguish the clustering methods in terms of methodology, prior knowledge, and architecture. Concretely, DC methods are introduced according to four categories, i.e., traditional single-view DC, semi-supervised DC, deep multiview clustering (MVC), and deep transfer clustering. Finally, we discuss the open challenges and potential future opportunities in different fields of DC.

2.
Neural Netw ; 176: 106341, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38692189

ABSTRACT

The strong learning ability of deep learning helps us comprehend the real physical world, making learning to simulate complicated particle systems a promising endeavour in both academia and industry. However, the complex laws of the physical world pose significant challenges to learning-based simulations, such as the varying spatial dependencies between interacting particles and the varying temporal dependencies between particle system states at different time stamps, which dominate the particles' interacting behavior and the physical systems' evolution patterns. Existing learning-based methods fail to fully account for these complexities, making them unable to yield satisfactory simulations. To better comprehend the complex physical laws, we propose a novel model, Graph Networks with Spatial-Temporal neural Ordinary Differential Equations (GNSTODE), that characterizes the varying spatial and temporal dependencies in particle systems using a unified end-to-end framework. Through training with real-world particle-particle interaction observations, GNSTODE can simulate any possible particle system with high precision. We empirically evaluate GNSTODE's simulation performance on two real-world particle systems, Gravity and Coulomb, with varying levels of spatial and temporal dependencies. The results show that GNSTODE yields better simulations than state-of-the-art methods, showing that GNSTODE can serve as an effective tool for particle simulation in real-world applications. Our code is made available at https://github.com/Guangsi-Shi/AI-for-physics-GNSTODE.


Subjects
Computer Simulation; Neural Networks, Computer; Gravitation; Physics; Deep Learning; Algorithms
3.
Article in English | MEDLINE | ID: mdl-38743549

ABSTRACT

Adversarial training (AT) is widely considered as the most promising strategy to defend against adversarial attacks and has drawn increasing interest from researchers. However, the existing AT methods still suffer from two challenges. First, they are unable to handle unrestricted adversarial examples (UAEs), which are built from scratch, as opposed to restricted adversarial examples (RAEs), which are created by adding perturbations bound by an lp norm to observed examples. Second, the existing AT methods often achieve adversarial robustness at the expense of standard generalizability (i.e., the accuracy on natural examples) because they make a tradeoff between them. To overcome these challenges, we propose a unique viewpoint that understands UAEs as imperceptibly perturbed unobserved examples. Also, we find that the tradeoff results from the separation of the distributions of adversarial examples and natural examples. Based on these ideas, we propose a novel AT approach called Provable Unrestricted Adversarial Training (PUAT), which can provide a target classifier with comprehensive adversarial robustness against both UAE and RAE, and simultaneously improve its standard generalizability. Particularly, PUAT utilizes partially labeled data to achieve effective UAE generation by accurately capturing the natural data distribution through a novel augmented triple-GAN. At the same time, PUAT extends the traditional AT by introducing the supervised loss of the target classifier into the adversarial loss and achieves the alignment between the UAE distribution, the natural data distribution, and the distribution learned by the classifier, with the collaboration of the augmented triple-GAN. Finally, the solid theoretical analysis and extensive experiments conducted on widely-used benchmarks demonstrate the superiority of PUAT.

4.
Article in English | MEDLINE | ID: mdl-38687672

ABSTRACT

Multiple instance learning (MIL) trains models from bags of instances, where each bag contains multiple instances, and only bag-level labels are available for supervision. The application of graph neural networks (GNNs) in capturing intrabag topology effectively improves MIL. Existing GNNs usually require filtering low-confidence edges among instances and adapting graph neural architectures to new bag structures. However, such asynchronous adjustments to structure and architecture are tedious and ignore their correlations. To tackle these issues, we propose a reinforced GNN framework for MIL (RGMIL), pioneering the exploitation of multiagent deep reinforcement learning (MADRL) in MIL tasks. MADRL enables the flexible definition or extension of factors that influence bag graphs or GNNs and provides synchronous control over them. Moreover, MADRL explores structure-to-architecture correlations while automating adjustments. Experimental results on multiple MIL datasets demonstrate that RGMIL achieves the best performance with excellent explainability. The code and data are available at https://github.com/RingBDStack/RGMIL.

5.
Article in English | MEDLINE | ID: mdl-38648122

ABSTRACT

While existing fairness interventions show promise in mitigating biased predictions, most studies concentrate on single-attribute protections. Although a few methods consider multiple attributes, they either require additional constraints or prediction heads, incurring high computational overhead or jeopardizing the stability of the training process. More critically, they consider per-attribute protection approaches, raising concerns about fairness gerrymandering where certain attribute combinations remain unfair. This work aims to construct a neutral domain containing fused information across all subgroups and attributes. It delivers fair predictions as the fused input contains neutralized information for all considered attributes. Specifically, we adopt mixup operations to generate samples with fused information. However, our experiments reveal that directly adopting the operations leads to degraded prediction results. The excessive mixup operations result in unrecognizable training data. To this end, we design three distinct mixup schemes that balance information fusion across attributes while retaining distinct visual features critical for training valid models. Extensive experiments with multiple datasets and up to eight sensitive attributes demonstrate that the proposed MultiFair method can deliver fairness protections for multiple attributes while maintaining valid prediction results.
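
The basic mixup operation that this abstract builds on can be sketched as follows. This is a minimal illustration of standard mixup (a convex combination of two samples and their labels), not one of the three MultiFair schemes, which the abstract does not specify. The function names `mixup` and `sample_lambda` and the default `alpha=0.4` are assumptions made for the example.

```python
import random


def mixup(x_a, y_a, x_b, y_b, lam):
    """Convexly combine two samples and their one-hot labels with weight lam."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def sample_lambda(alpha=0.4, rng=random):
    """Draw the mixing weight from Beta(alpha, alpha), as in standard mixup."""
    return rng.betavariate(alpha, alpha)


# Hypothetical toy inputs: two 3-dimensional samples from two classes.
x_mix, y_mix = mixup([1.0, 0.0, 2.0], [1.0, 0.0],
                     [3.0, 4.0, 0.0], [0.0, 1.0], lam=0.75)
```

A weight near 0.5 blends the two samples most strongly, which matches the abstract's observation that excessive mixing can leave training data unrecognizable.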

6.
Article in English | MEDLINE | ID: mdl-38408012

ABSTRACT

Community detection has become a prominent task in complex network analysis. However, most of the existing methods for community detection only focus on the lower order structure at the level of individual nodes and edges and ignore the higher order connectivity patterns that characterize the fundamental building blocks within the network. In recent years, researchers have shown interest in motifs and their role in network analysis. However, most of the existing higher order approaches are based on shallow methods, failing to capture the intricate nonlinear relationships between nodes. In order to better fuse higher order and lower order structural information, a novel deep learning framework called motif-based contrastive learning for community detection (MotifCC) is proposed. First, a higher order network is constructed based on motifs. Subnetworks are then obtained by removing isolated nodes, addressing the fragmentation issue in the higher order network. Next, the concept of contrastive learning is applied to effectively fuse various kinds of information from nodes, edges, and higher order and lower order structures. This aims to maximize the similarity of corresponding node information, while distinguishing different nodes and different communities. Finally, based on the community structure of subnetworks, the community labels of all nodes are obtained by using the idea of label propagation. Extensive experiments on real-world datasets validate the effectiveness of MotifCC.

7.
Int J Neural Syst ; 34(3): 2450009, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38318751

ABSTRACT

Large-scale benchmark datasets are crucial in advancing research within the computer science communities. They enable the development of more sophisticated AI models and serve as "golden" benchmarks for evaluating their performance. Thus, ensuring the quality of these datasets is of utmost importance for academic research and the progress of AI systems. For the emerging vision-language tasks, some datasets have been created and frequently used, such as Flickr30k, COCO, and NoCaps, which typically contain a large number of images paired with their ground-truth textual descriptions. In this paper, an automatic method is proposed to assess the quality of large-scale benchmark datasets designed for vision-language tasks. In particular, a new cross-modal matching model is developed, which is capable of automatically scoring the textual descriptions of images. Subsequently, this model is employed to evaluate the quality of vision-language datasets by automatically assigning a score to each 'ground-truth' description of every image. With good agreement between manual and automated scoring results on the datasets, our findings reveal significant disparities in the quality of the ground-truth descriptions included in the benchmark datasets. Even more surprisingly, a small portion of the descriptions are unsuitable for serving as reliable ground-truth references. These discoveries emphasize the need for careful use of these publicly accessible benchmark databases.


Subjects
Benchmarking; Databases, Factual
8.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15275-15291, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37751343

ABSTRACT

Few-shot learning aims to fast adapt a deep model from a few examples. While pre-training and meta-training can create deep models powerful for few-shot generalization, we find that pre-training and meta-training focus respectively on cross-domain transferability and cross-task transferability, which restricts their data efficiency in the entangled settings of domain shift and task shift. We thus propose the Omni-Training framework to seamlessly bridge pre-training and meta-training for data-efficient few-shot learning. Our first contribution is a tri-flow Omni-Net architecture. Besides the joint representation flow, Omni-Net introduces two parallel flows for pre-training and meta-training, responsible for improving domain transferability and task transferability respectively. Omni-Net further coordinates the parallel flows by routing their representations via the joint-flow, enabling knowledge transfer across flows. Our second contribution is the Omni-Loss, which introduces a self-distillation strategy separately on the pre-training and meta-training objectives for boosting knowledge transfer throughout different training stages. Omni-Training is a general framework to accommodate many existing algorithms. Evaluations justify that our single framework consistently and clearly outperforms the individual state-of-the-art methods on both cross-task and cross-domain settings in a variety of classification, regression and reinforcement learning problems.

9.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37401373

ABSTRACT

Recent advances and achievements of artificial intelligence (AI) as well as deep and graph learning models have established their usefulness in biomedical applications, especially in drug-drug interactions (DDIs). A DDI is a change in the effect of one drug due to the presence of another drug in the human body, and DDIs play an essential role in drug discovery and clinical research. Predicting DDIs through traditional clinical trials and experiments is an expensive and time-consuming process. To correctly apply advanced AI and deep learning, developers and users face various challenges such as the availability and encoding of data resources, and the design of computational methods. This review summarizes chemical structure based, network based, natural language processing based and hybrid methods, providing an updated and accessible guide for the broad research and development community with different domain knowledge. We introduce widely used molecular representations and describe the theoretical frameworks of graph neural network models for representing molecular structures. We present the advantages and disadvantages of deep and graph learning methods by performing comparative experiments. We discuss the potential technical challenges and highlight future directions of deep and graph learning models for accelerating DDI prediction.


Subjects
Artificial Intelligence; Neural Networks, Computer; Humans; Drug Interactions; Natural Language Processing; Drug Discovery
10.
Article in English | MEDLINE | ID: mdl-37216231

ABSTRACT

Social network alignment, which aims to link identical identities across different social platforms, is a fundamental task in social graph mining. Most existing approaches are supervised models and require a large amount of manually labeled data, which is infeasible in practice considering the yawning gap between social platforms. Recently, isomorphism across social networks has been incorporated as a complementary signal to link identities at the distribution level, which helps alleviate the dependency on sample-level annotations. Adversarial learning is adopted to learn a shared projection function by minimizing the distance between two social distributions. However, the hypothesis of isomorphism might not always hold true as social user behaviors are generally unpredictable, and thus a shared projection function is insufficient to handle the sophisticated cross-platform correlations. In addition, adversarial learning suffers from training instability and uncertainty, which may hinder model performance. In this article, we propose a novel meta-learning-based social network alignment model, Meta-SNA, to effectively capture the isomorphism and the unique characteristics of each identity. Our motivation lies in learning a shared meta-model to preserve the global cross-platform knowledge and an adaptor to learn a specific projection function for each identity. Sinkhorn distance is further introduced as the distribution closeness measurement to tackle the limitations of adversarial learning; it has an explicit optimal solution and can be efficiently computed by the matrix scaling algorithm. Empirically, we evaluate the proposed model over multiple datasets, and the experimental results demonstrate the superiority of Meta-SNA.
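
The matrix scaling computation mentioned for the Sinkhorn distance can be sketched in a few lines. This is a generic illustration of entropy-regularized optimal transport between two discrete distributions, not the Meta-SNA implementation. The function name `sinkhorn`, the regularization strength `reg`, the iteration count, and the toy cost matrix are all assumptions.

```python
import math


def sinkhorn(cost, a, b, reg=0.1, n_iters=500):
    """Entropy-regularized optimal transport via Sinkhorn matrix scaling.

    cost: n x m cost matrix; a, b: source and target weights, each summing to 1.
    Returns (transport_plan, sinkhorn_distance).
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the cost matrix.
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so that the row sums of the plan match a.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the column sums of the plan match b.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    dist = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, dist


# Hypothetical toy problem: two points on each side, zero cost on the diagonal.
cost = [[0.0, 1.0], [1.0, 0.0]]
plan, dist = sinkhorn(cost, [0.5, 0.5], [0.5, 0.5])
```

Each iteration is only a pair of row and column rescalings, which is why the abstract can describe the distance as efficiently computable.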

11.
IEEE Trans Pattern Anal Mach Intell ; 45(7): 8063-8080, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37018637

ABSTRACT

While graph representation learning methods have shown success in various graph mining tasks, what knowledge is exploited for predictions is less discussed. This paper proposes a novel Adaptive Subgraph Neural Network named AdaSNN to find critical structures in graph data, i.e., subgraphs that are dominant to the prediction results. To detect critical subgraphs of arbitrary size and shape in the absence of explicit subgraph-level annotations, AdaSNN designs a Reinforced Subgraph Detection Module to search subgraphs adaptively without heuristic assumptions or predefined rules. To encourage the subgraph to be predictive at the global scale, we design a Bi-Level Mutual Information Enhancement Mechanism including both global-aware and label-aware mutual information maximization to further enhance the subgraph representations in the perspective of information theory. By mining critical subgraphs that reflect the intrinsic property of a graph, AdaSNN can provide sufficient interpretability to the learned results. Comprehensive experimental results on seven typical graph datasets demonstrate that AdaSNN has a significant and consistent performance improvement and provides insightful results.

12.
IEEE/ACM Trans Comput Biol Bioinform ; 20(4): 2577-2586, 2023.
Article in English | MEDLINE | ID: mdl-37018664

ABSTRACT

Biomedical Named Entity Recognition (BioNER) aims at identifying biomedical entities such as genes, proteins, diseases, and chemical compounds in the given textual data. However, due to issues of ethics, privacy, and the high specialization of biomedical data, BioNER suffers from a more severe lack of quality labeled data than the general domain, especially at the token level. Facing extremely limited labeled biomedical data, this work studies the problem of gazetteer-based BioNER, which aims at building a BioNER system from scratch. It needs to identify the entities in the given sentences when there are zero token-level annotations for training. Previous works usually use sequential labeling models to solve the NER or BioNER task and obtain weakly labeled data from gazetteers when full annotations are not available. However, these labeled data are quite noisy since labels are needed for each token and the entity coverage of the gazetteers is limited. Here we propose to formulate the BioNER task as a Textual Entailment problem and solve the task via Textual Entailment with Dynamic Contrastive learning (TEDC). TEDC not only alleviates the noisy labeling issue, but also transfers the knowledge from pre-trained textual entailment models. Additionally, the dynamic contrastive learning framework contrasts the entities and non-entities in the same sentence and improves the model's discrimination ability. Experiments on two real-world biomedical datasets show that TEDC can achieve state-of-the-art performance for gazetteer-based BioNER.


Subjects
Deep Learning; Proteins
13.
Res Pract Thromb Haemost ; 7(2): 100068, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36777286

ABSTRACT

Background: Although early evidence concluded a lack of clinical benefit of convalescent plasma therapy (CPT) in COVID-19 management, recent trials have demonstrated the therapeutic potential of CPT in ambulatory care. CPT may also potentiate thromboembolic events, given the presence of coagulation factors and the prothrombotic state of COVID-19. Objectives: The present study aimed to assess and compare the clinical efficacy and the risk of venous thromboembolism (VTE)/arterial thromboembolism (ATE) of CPT in ambulatory versus hospitalized patients with COVID-19. Methods: MEDLINE, Embase, and Cochrane CENTRAL were searched from December 2019 to December 2022 for randomized controlled trials that investigated the use of CPT against placebo or standard of care in adult patients with COVID-19. The primary outcome was nonmortality disease progression. Secondary outcomes included VTE, ATE, 28-day mortality, clinical improvement, length of hospitalization, sepsis/fever, and major adverse cardiovascular events. Results: Twenty randomized controlled trials, with 21,340 patients, were included. CPT significantly reduced nonmortality disease progression in ambulatory patients (odds ratio [OR], 0.72; 95% CI, 0.56-0.92; P = .009) but not in hospitalized patients (OR, 1.03; 95% CI, 0.94-1.12; P = .58). The risk of VTE and ATE did not differ between the CPT and the control group (OR, 1.16; 95% CI, 0.82-1.66; P = .40; and OR, 1.01; 95% CI, 0.37-2.79; P = .98, respectively). No conclusive differences between CPT and control groups were noted in 28-day mortality, clinical improvement, length of hospitalization, risk of sepsis/fever, and major adverse cardiovascular events. Conclusion: Treatment of COVID-19 with CPT prevents the progression of COVID-19 in ambulatory care. It is not associated with an increased risk of VTE, ATE, or other adverse events.

14.
IEEE Trans Cybern ; 53(5): 3060-3074, 2023 May.
Article in English | MEDLINE | ID: mdl-34767522

ABSTRACT

Community detection in multiview networks has drawn an increasing amount of attention in recent years. Many approaches have been developed from different perspectives. Despite the success, the problem of community detection in adversarial multiview networks remains largely unsolved. An adversarial multiview network is a multiview network that suffers an adversarial attack on community detection in which the attackers may deliberately remove some critical edges so as to hide the underlying community structure, leading to the performance degeneration of the existing approaches. To address this problem, we propose a novel approach, called higher order connection enhanced multiview modularity (HCEMM). The main idea lies in enhancing the intracommunity connection of each view by means of utilizing the higher order connection structure. The first step is to discover the view-specific higher order Microcommunities (VHM-communities) from the higher order connection structure. Then, for each view of the original multiview network, additional edges are added to make the nodes in each of its VHM-communities fully connected like a clique, by which the intracommunity connection of the multiview network can be enhanced. Therefore, the proposed approach is able to discover the underlying community structure in a multiview network while recovering the missing edges. Extensive experiments conducted on 16 real-world datasets confirm the effectiveness of the proposed approach.
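
The edge-enhancement step described here, adding edges until each discovered group is fully connected, can be sketched for a single view as follows. How HCEMM finds the VHM-communities is not covered. The function name `densify_communities` and the toy graph are assumptions for illustration only.

```python
from itertools import combinations


def densify_communities(edges, communities):
    """Return the undirected edge set after making every listed node group a clique."""
    enhanced = {frozenset(e) for e in edges}
    for nodes in communities:
        # Add every missing pair inside the group; existing edges are unchanged.
        for u, v in combinations(sorted(set(nodes)), 2):
            enhanced.add(frozenset((u, v)))
    return enhanced


# Hypothetical view in which an attacker removed the edge (1, 3).
edges = [(1, 2), (2, 3), (4, 5)]
enhanced = densify_communities(edges, [[1, 2, 3], [4, 5]])
```

In this toy view the missing edge between nodes 1 and 3 is restored, which is the sense in which the abstract speaks of recovering missing edges.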

15.
IEEE Trans Neural Netw Learn Syst ; 34(9): 5557-5569, 2023 Sep.
Article in English | MEDLINE | ID: mdl-34878980

ABSTRACT

As deep learning models mature, one of the most pressing questions we face is: what is the ideal tradeoff between accuracy, fairness, and privacy (AFP)? Unfortunately, both the privacy and the fairness of a model come at the cost of its accuracy. Hence, an efficient and effective means of fine-tuning the balance between this trinity of needs is critical. Motivated by some curious observations in privacy-accuracy tradeoffs with differentially private stochastic gradient descent (DP-SGD), where fair models sometimes result, we conjecture that fairness might be better managed as an indirect byproduct of this process. Hence, we conduct a series of analyses, both theoretical and empirical, on the impacts of implementing DP-SGD in deep neural network models through gradient clipping and noise addition. The results show that, in deep learning, the number of training epochs is central to striking a balance between AFP because DP-SGD makes the training less stable, providing the possibility of model updates at a low discrimination level without much loss in accuracy. Based on this observation, we designed two different early stopping criteria to help analysts choose the optimal epoch at which to stop training a model so as to achieve their ideal tradeoff. Extensive experiments show that our methods can achieve an ideal balance between AFP.
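
The two DP-SGD operations named in this abstract, gradient clipping and noise addition, can be sketched as a single update step. This is a generic illustration, not the authors' code, and it omits privacy accounting and the proposed early stopping criteria. The names `clip`, `dp_sgd_step`, `max_norm`, and `noise_multiplier` are assumptions.

```python
import math
import random


def clip(grad, max_norm):
    """Rescale one per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng=random):
    """One DP-SGD update: clip each example's gradient, sum, add noise, average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    batch = len(clipped)
    noisy_mean = []
    for k in range(len(params)):
        total = sum(g[k] for g in clipped)
        # Gaussian noise with standard deviation proportional to the clipping bound.
        total += rng.gauss(0.0, noise_multiplier * max_norm)
        noisy_mean.append(total / batch)
    return [p - lr * g for p, g in zip(params, noisy_mean)]


# Hypothetical batch of two per-example gradients; noise is disabled here
# only to make the illustration deterministic.
new_params = dp_sgd_step([0.0, 0.0], [[3.0, 4.0], [0.6, 0.8]],
                         lr=0.5, max_norm=1.0, noise_multiplier=0.0)
```

With a nonzero `noise_multiplier` the same inputs give a different update on each call, which is one way to picture the training instability that the abstract links to the choice of stopping epoch.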

16.
IEEE Trans Neural Netw Learn Syst ; 34(2): 973-986, 2023 Feb.
Article in English | MEDLINE | ID: mdl-34432638

ABSTRACT

Most existing multiview clustering methods are based on the original feature space. However, the feature redundancy and noise in the original feature space limit their clustering performance. To address this problem, some multiview clustering methods learn the latent data representation linearly, but performance may decline if the relation between the latent data representation and the original data is nonlinear. Other methods, which learn the latent data representation nonlinearly, usually conduct the latent representation learning and clustering separately, so the latent data representation may not be well adapted to clustering. Furthermore, none of them model the intercluster relation and intracluster correlation of data points, which limits the quality of the learned latent data representation and therefore influences the clustering performance. To solve these problems, this article proposes a novel multiview clustering method via proximity learning in latent representation space, named multiview latent proximity learning (MLPL). First, MLPL learns the latent data representation in a nonlinear manner that takes the intercluster relation and intracluster correlation into consideration simultaneously. Second, by conducting the latent representation learning and consensus proximity learning simultaneously, MLPL learns a consensus proximity matrix with k connected components to output the clustering result directly. Extensive experiments are conducted on seven real-world datasets to demonstrate the effectiveness and superiority of the MLPL method compared with the state-of-the-art multiview clustering methods.
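
The final read-out step, in which a proximity matrix with k connected components directly yields k clusters, can be sketched as follows. This shows only how labels follow from such a matrix, not how MLPL learns it. The function name `clusters_from_proximity`, the tolerance `tol`, and the toy matrix are assumptions.

```python
def clusters_from_proximity(S, tol=1e-12):
    """Label each point by the connected component of the graph with edges S[i][j] > tol."""
    n = len(S)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if S[i][j] > tol or S[j][i] > tol:
                parent[find(i)] = find(j)

    # Map component roots to consecutive cluster ids in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical 4-point proximity matrix with two disconnected blocks (k = 2).
S = [
    [0.0, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.7],
    [0.0, 0.0, 0.7, 0.0],
]
labels = clusters_from_proximity(S)
```

Because the labels come straight from the graph structure, no separate k-means or spectral post-processing step is needed once the matrix has exactly k components.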

17.
IEEE Trans Neural Netw Learn Syst ; 34(10): 7934-7945, 2023 Oct.
Article in English | MEDLINE | ID: mdl-35157599

ABSTRACT

In multiagent learning, one of the main ways to improve learning performance is to ask for advice from another agent. Contemporary advising methods share a common limitation: a teacher agent can only advise a student agent if the teacher has experience with an identical state. However, in highly complex learning scenarios, such as autonomous driving, it is rare for two agents to experience exactly the same state, which makes the advice less of a learning aid and more of a one-time instruction. In these scenarios, with contemporary methods, agents do not really help each other learn, and the main outcome of their back-and-forth requests for advice is an exorbitant communication overhead. In human interactions, teachers are often asked for advice on what to do in situations that students are personally unfamiliar with. In such cases, we generally draw from similar experiences to formulate advice. This inspired us to provide agents with the same ability when asked for advice on an unfamiliar state. Hence, we propose a model-based self-advising method that allows agents to train a model based on states similar to the state in question to inform its response. As a result, the advice given can be used not only to resolve the current dilemma but also to handle many other similar situations that the student may come across in the future via self-advising. Compared with contemporary methods, our method brings a significant improvement in learning performance with much lower communication overhead.

18.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 980-998, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35077355

ABSTRACT

Detecting hot social events (e.g., political scandals, momentous meetings, natural hazards, etc.) from social messages is crucial as it highlights significant happenings to help people understand the real world. On account of the streaming nature of social messages, incremental social event detection models for acquiring, preserving, and updating messages over time have attracted great attention. However, the challenge is that the existing event detection methods for streaming social messages are generally confronted with ambiguous event features, dispersive text contents, and multiple languages, and hence result in low accuracy and generalization ability. In this paper, we present a novel reinForced, incremental and cross-lingual social Event detection architecture, namely FinEvent, from streaming social messages. Concretely, we first model social messages into heterogeneous graphs integrating both rich meta-semantics and diverse meta-relations, and convert them to weighted multi-relational message graphs. Second, we propose a new reinforced weighted multi-relational graph neural network framework by using a Multi-agent Reinforcement Learning algorithm to select optimal aggregation thresholds across different relations/edges to learn social message embeddings. To solve the long-tail problem in social event detection, a balanced sampling strategy guided Contrastive Learning mechanism is designed for incremental social message representation learning. Third, a new Deep Reinforcement Learning guided density-based spatial clustering model is designed to select the optimal minimum number of samples required to form a cluster and optimal minimum distance between two clusters in social event detection tasks. Finally, we implement incremental social message representation learning based on knowledge preservation on the graph neural network and achieve transferable cross-lingual social event detection. We conduct extensive experiments to evaluate FinEvent on Twitter streams, demonstrating a significant and consistent improvement in model quality with 14%-118%, 8%-170%, and 2%-21% increases in performance on offline, online, and cross-lingual social event detection tasks.

19.
IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 1746-1760, 2023.
Article in English | MEDLINE | ID: mdl-36251903

ABSTRACT

The "curse of dimensionality" brings new challenges to the feature selection (FS) problem, especially in the bioinformatics field. In this paper, we propose a hybrid Two-Stage Teaching-Learning-Based Optimization (TS-TLBO) algorithm to improve the performance of bioinformatics data classification. In the selection reduction stage, potentially informative features, as well as noisy features, are selected to effectively reduce the search space. In the following comparative self-learning stage, the teacher and the worst student with self-learning evolve together based on the duality of the FS problems to enhance the exploitation capabilities. In addition, an opposition-based learning strategy is utilized to generate initial solutions to rapidly improve the quality of the solutions. We further develop a self-adaptive mutation mechanism to improve the search performance by dynamically adjusting the mutation rate according to the teacher's convergence ability. Moreover, we integrate a differential evolutionary method with TLBO to boost the exploration ability of our algorithm. We conduct comparative experiments on 31 public datasets with different data dimensions, including 7 bioinformatics datasets, and compare our TS-TLBO algorithm with 11 related methods. The experimental results show that the TS-TLBO algorithm obtains a good feature subset with better classification performance, indicating its generality across FS problems.


Subjects
Algorithms; Computational Biology; Machine Learning
20.
IEEE Trans Pattern Anal Mach Intell ; 45(2): 2208-2225, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35380958

ABSTRACT

The predictive learning of spatiotemporal sequences aims to generate future images by learning from the historical context, where the visual dynamics are believed to have modular structures that can be learned with compositional subsystems. This paper models these structures by presenting PredRNN, a new recurrent network, in which a pair of memory cells are explicitly decoupled, operate in nearly independent transition manners, and finally form unified representations of the complex environment. Concretely, besides the original memory cell of LSTM, this network is featured by a zigzag memory flow that propagates in both bottom-up and top-down directions across all layers, enabling the learned visual dynamics at different levels of RNNs to communicate. It also leverages a memory decoupling loss to keep the memory cells from learning redundant features. We further propose a new curriculum learning strategy to force PredRNN to learn long-term dynamics from context frames, which can be generalized to most sequence-to-sequence models. We provide detailed ablation studies to verify the effectiveness of each component. Our approach is shown to obtain highly competitive results on five datasets for both action-free and action-conditioned predictive learning scenarios.
