Results 1 - 20 of 23
1.
Article in English | MEDLINE | ID: mdl-38502624

ABSTRACT

Many complex social, biological, or physical systems are characterized as networks, and recovering the missing links of a network could shed important light on its structure and dynamics. A good topological representation is crucial to accurate link modeling and prediction, yet how to account for the kaleidoscopic changes in link formation patterns remains a challenge, especially for cross-domain analyses. We propose a new link representation scheme that projects the local environment of a link onto a "dipole plane", where neighboring nodes of the link are positioned according to their relative proximity to the link's two anchors, like a dipole. In this way, the complex and discrete topology arising from link formation is turned into a differentiable point-cloud distribution, opening up new possibilities for topological feature engineering with the desired expressiveness, interpretability, and generalization. Our approach achieves results comparable or even superior to state-of-the-art GNNs, with a model up to hundreds of times smaller that runs much faster. Furthermore, it provides a universal platform to systematically profile, study, and compare link patterns from miscellaneous real-world networks. This enables a global link-pattern atlas, from which we have uncovered common patterns of link formation, namely the bridge style, the radiation style, and the community style, across a wide collection of networks of highly different nature.
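
The abstract does not spell out the proximity measure used to place neighbors on the dipole plane; the sketch below assumes shortest-path distance to each anchor, which is one plausible choice. Function and parameter names are illustrative, not the paper's.

```python
# Hypothetical sketch: project the neighborhood of a link (u, v) into a
# "dipole plane" by placing each nearby node according to its proximity to the
# two link anchors. Shortest-path distance is an assumption made here; the
# paper's actual proximity measure may differ.
import networkx as nx


def dipole_projection(G, u, v, radius=2):
    """Return {node: (d_u, d_v)} for nodes within `radius` hops of the link (u, v)."""
    d_u = nx.single_source_shortest_path_length(G, u, cutoff=radius)
    d_v = nx.single_source_shortest_path_length(G, v, cutoff=radius)
    neighborhood = (set(d_u) | set(d_v)) - {u, v}
    points = {}
    for n in neighborhood:
        # Nodes unreachable from one anchor within the radius get radius + 1.
        points[n] = (d_u.get(n, radius + 1), d_v.get(n, radius + 1))
    return points


G = nx.karate_club_graph()
cloud = dipole_projection(G, 0, 1)
print(len(cloud), "neighboring nodes projected into the dipole plane")
```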

2.
IEEE Trans Neural Netw Learn Syst ; 34(4): 1681-1691, 2023 Apr.
Article in English | MEDLINE | ID: mdl-32649280

ABSTRACT

Temporal point processes are widely used for sequential data modeling. In this article, we focus on modeling sequential event propagation on a graph, such as retweeting by social network users or news spreading between websites. Given a collection of event propagation sequences, the conventional point process model considers only the event history, i.e., it embeds the event history into a vector and ignores the latent graph structure. We propose a graph biased temporal point process (GBTPP) that leverages structural information from graph representation learning, where both the direct influence between nodes and the indirect influence from the event history are modeled. Moreover, the learned node embedding vector is integrated into the embedded event history as side information. Experiments on a synthetic dataset and two real-world datasets show the efficacy of our model compared with conventional and state-of-the-art methods.

3.
Article in English | MEDLINE | ID: mdl-35969542

ABSTRACT

As machine learning algorithms are increasingly deployed for high-impact automated decision-making, bias (in datasets or tasks) has become one of the most critical challenges in machine learning applications. Such challenges range from racial bias in face recognition to gender bias in hiring systems, where race and gender are sensitive attributes. In recent years, much progress has been made in ensuring fairness and reducing bias in standard machine learning settings. Among these efforts, learning representations that are fair with respect to the sensitive attributes has attracted increasing attention due to its flexibility in learning rich representations based on advances in deep learning. In this article, we propose graph-fair, an algorithmic approach to learning fair representations under graph Laplacian regularization, which reduces the separation between groups and the clustering within a group by encoding the sensitive attribute information into the graph. We theoretically prove the underlying connection between graph regularization and distance correlation and show that the latter can be regarded as a standardized version of the former, with the additional advantage of being scale-invariant. We therefore adopt distance correlation as the fairness constraint to decrease the dependence between sensitive attributes and latent representations, an approach we call dist-fair. In contrast to existing approaches based on dependency measures and adversarial generators, both graph-fair and dist-fair provide simple fairness constraints, eliminating the need for parameter tuning (e.g., choosing kernels) and for adversarial networks. Experiments conducted on real-world corpora indicate that our proposed fairness constraints for representation learning provide better tradeoffs between fairness and utility than existing approaches.
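
For orientation, the following is the standard empirical distance-correlation statistic (Székely et al.) computed between latent representations and sensitive attributes; it is the kind of quantity a dist-fair-style penalty would minimize. The exact training objective of the paper is not reproduced here.

```python
# Empirical distance correlation between latent representations Z and sensitive
# attributes S, usable as a scale-invariant dependence penalty.
import numpy as np


def distance_correlation(Z, S):
    Z, S = np.atleast_2d(Z.astype(float)), np.atleast_2d(S.astype(float))
    if Z.shape[0] == 1:
        Z = Z.T
    if S.shape[0] == 1:
        S = S.T
    a = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)  # pairwise distances
    b = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()           # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    return np.sqrt(max(dcov2, 0.0) / np.sqrt((A * A).mean() * (B * B).mean()))


rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 16))         # latent representations
S = rng.integers(0, 2, size=(200, 1))  # binary sensitive attribute
print("dCor(Z, S) =", distance_correlation(Z, S))  # dependence score to minimize
```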

4.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 8618-8634, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34347595

ABSTRACT

In recent years, reinforcement learning has achieved excellent results in low-dimensional static action spaces such as games and simple robotics. In practical tasks, however, the action space is usually composite, composed of multiple sub-actions with different functions, and time-varying: existing sub-actions might become temporarily invalid due to the external environment, while unseen sub-actions can be added to the current system. To address the robustness and transferability problems posed by time-varying composite action spaces, we propose a structured cooperative reinforcement learning algorithm based on the centralized-critic, decentralized-actor framework, called SCORE. We model the single-agent problem with a composite action space as a fully cooperative partially observable stochastic game and employ a graph attention network to capture the dependencies between heterogeneous sub-actions. To promote tighter cooperation among the decomposed heterogeneous agents, SCORE introduces a hierarchical variational autoencoder that maps the heterogeneous sub-action spaces into a common latent action space. We also incorporate an implicit credit assignment structure into SCORE to overcome the multi-agent credit assignment problem in this game. Experiments on a proof-of-concept task and a precision agriculture task show that SCORE has significant advantages in robustness and transferability for time-varying composite action spaces.

5.
IEEE Trans Vis Comput Graph ; 28(12): 4531-4545, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34191728

ABSTRACT

Anomaly detection is a common analytical task that aims to identify rare cases that differ from the typical cases making up the majority of a dataset. When analyzing event sequence data, anomaly detection can be complex because the sequential and temporal nature of such data results in diverse definitions and flexible forms of anomalies, which in turn makes detected anomalies harder to interpret. In this article, we propose a visual analytics approach for detecting anomalous sequences in an event sequence dataset via an unsupervised anomaly detection algorithm based on variational autoencoders. We further compare the anomalous sequences with their reconstructions and with the normal sequences through a sequence matching algorithm to identify event-level anomalies. A visual analytics system is developed to support interactive exploration and interpretation of anomalies through novel visualization designs that facilitate the comparison between anomalous and normal sequences. Finally, we quantitatively evaluate the performance of our anomaly detection algorithm, demonstrate the effectiveness of our system through case studies, and report feedback collected from study participants.
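
A minimal sketch of the underlying idea of VAE-based sequence anomaly scoring: sequences that the model reconstructs poorly (high negative ELBO) are flagged as anomalous. The MLP encoder/decoder, fixed sequence length, and all sizes are illustrative assumptions, not the paper's architecture.

```python
# Toy VAE over padded event-id sequences; anomaly score = reconstruction loss + KL
# (negative ELBO). Architecture and hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SeqVAE(nn.Module):
    def __init__(self, num_types, seq_len, latent_dim=8, hidden=64):
        super().__init__()
        self.num_types, self.seq_len = num_types, seq_len
        self.enc = nn.Sequential(nn.Linear(num_types * seq_len, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, num_types * seq_len))

    def forward(self, x):                      # x: (batch, seq_len) event ids
        onehot = F.one_hot(x, self.num_types).float().flatten(1)
        h = self.enc(onehot)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        logits = self.dec(z).view(-1, self.seq_len, self.num_types)
        return logits, mu, logvar


def anomaly_score(model, x):
    """Higher score = worse reconstruction = more anomalous (negative ELBO)."""
    logits, mu, logvar = model(x)
    recon = F.cross_entropy(logits.transpose(1, 2), x, reduction="none").sum(1)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(1)
    return recon + kl


model = SeqVAE(num_types=10, seq_len=20)
x = torch.randint(0, 10, (4, 20))              # four toy event sequences
print(anomaly_score(model, x))                 # per-sequence anomaly scores
```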

6.
Article in English | MEDLINE | ID: mdl-32956057

ABSTRACT

Deep learning of optical flow has been an active research area owing to its empirical success. Because accurate dense correspondence labels are difficult to obtain, unsupervised learning of optical flow has drawn increasing attention, yet its accuracy remains far from satisfactory. Holding the philosophy that better estimation models can be trained with better-approximated labels, which in turn can be obtained from better estimation models, we propose a self-taught learning framework that continually improves accuracy using self-generated pseudo labels. The estimated optical flow is first filtered by bidirectional flow-consistency validation, and occlusion-aware dense labels are then generated by edge-aware interpolation from selected sparse matches. Combining a reconstruction loss with a regression loss on the generated pseudo labels further improves performance. Experimental results demonstrate that our models achieve state-of-the-art results among unsupervised methods on the public KITTI, MPI-Sintel, and Flying Chairs datasets.
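
The bidirectional consistency check can be sketched as follows: a pixel's forward flow is kept as a pseudo label only if the backward flow sampled at the forward-warped location roughly cancels it. Nearest-neighbor sampling and the threshold constants are simplifications for illustration.

```python
# Forward-backward flow consistency mask for filtering pseudo labels.
import numpy as np


def consistency_mask(flow_fw, flow_bw, alpha=0.05, beta=0.5):
    """flow_fw, flow_bw: (H, W, 2) arrays; returns a boolean (H, W) validity mask."""
    H, W, _ = flow_fw.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Forward-warped coordinates, rounded (nearest neighbor) and clipped to the image.
    xw = np.clip(np.round(xs + flow_fw[..., 0]), 0, W - 1).astype(int)
    yw = np.clip(np.round(ys + flow_fw[..., 1]), 0, H - 1).astype(int)
    flow_bw_warped = flow_bw[yw, xw]
    diff = np.linalg.norm(flow_fw + flow_bw_warped, axis=-1)
    mag = np.linalg.norm(flow_fw, axis=-1) + np.linalg.norm(flow_bw_warped, axis=-1)
    return diff < alpha * mag + beta   # consistent (non-occluded) pixels


fw = np.full((64, 64, 2), 2.0)         # constant 2-pixel shift
bw = -fw                               # perfectly consistent backward flow
print(consistency_mask(fw, bw).mean(), "fraction of pixels kept as pseudo labels")
```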

7.
IEEE Trans Neural Netw Learn Syst ; 30(10): 3124-3136, 2019 Oct.
Article in English | MEDLINE | ID: mdl-30676979

ABSTRACT

Real-world sequential data are often generated by complicated and latent mechanisms, which can be formulated as event sequences occurring in the continuous time domain. In addition, continuous signals may be associated with event sequences and formulated as time series with fixed time lags. Traditionally, event sequences are modeled by parametric temporal point processes, which use explicitly defined conditional intensity functions to quantify the occurrence rates of events. However, these parametric models often take only one-sided information from event sequences into account, ignoring the information from concurrent time series, and their intensity functions are usually designed for specific tasks dependent on prior knowledge. To tackle these problems, we propose a model called recurrent point process networks, which instantiates temporal point process models with recurrent neural networks (RNNs). In particular, the intensity functions of the proposed model are implemented by two RNNs: a temporal RNN capturing the relationships among events and another RNN updating intensity functions based on the time series. Furthermore, an attention mechanism is introduced that uncovers influence strengths among events with good interpretability. Focusing on challenging tasks such as temporal event prediction and underlying relational network mining, we demonstrate the superiority of our model on both synthetic and real-world data.
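
For reference, these are the standard definitions the abstract builds on: the conditional intensity of a temporal point process and the log-likelihood of an observed event sequence. In the model above the intensity is parameterized by RNN hidden states rather than a fixed parametric form.

```latex
% Conditional intensity and log-likelihood of a temporal point process with
% events t_1 < ... < t_n observed on [0, T].
\begin{align}
  \lambda^{*}(t) &= \lim_{\Delta t \to 0}
      \frac{\mathbb{E}\!\left[N(t+\Delta t) - N(t)\mid \mathcal{H}_t\right]}{\Delta t},\\
  \log \mathcal{L} &= \sum_{i=1}^{n} \log \lambda^{*}(t_i)
      - \int_{0}^{T} \lambda^{*}(s)\, \mathrm{d}s .
\end{align}
```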

8.
Article in English | MEDLINE | ID: mdl-30136953

ABSTRACT

Event sequence data is common to a broad range of application domains, from security to health care to scholarly communication. This form of data captures information about the progression of events for an individual entity (e.g., a computer network device; a patient; an author) in the form of a series of time-stamped observations. Moreover, each event is associated with an event type (e.g., a computer login attempt, or a hospital discharge). Analyses of event sequence data have been shown to help reveal important temporal patterns, such as clinical paths resulting in improved outcomes, or an understanding of common career trajectories for scholars. Moreover, recent research has demonstrated a variety of techniques designed to overcome methodological challenges such as large volumes of data and high dimensionality. However, the effective identification and analysis of latent stages of progression, which can allow for variation within different but similarly evolving event sequences, remain a significant challenge with important real-world motivations. In this paper, we propose an unsupervised stage analysis algorithm to identify semantically meaningful progression stages as well as the critical events which help define those stages. The algorithm follows three key steps: (1) event representation estimation, (2) event sequence warping and alignment, and (3) sequence segmentation. We also present a novel visualization system, ET2, which interactively illustrates the results of the stage analysis algorithm to help reveal evolution patterns across stages. Finally, we report three forms of evaluation for ET2: (1) case studies with two real-world datasets, (2) interviews with domain expert users, and (3) a performance evaluation on the progression analysis algorithm and the visualization design.
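
The abstract names "event sequence warping and alignment" as step 2 without specifying the algorithm; dynamic time warping over per-event representation vectors is a common choice and serves as a stand-in in the sketch below.

```python
# Illustrative stand-in for the sequence warping/alignment step: classic dynamic
# time warping (DTW) over event-representation vectors. Not the paper's exact procedure.
import numpy as np


def dtw_distance(A, B):
    """A: (n, d), B: (m, d) event-representation sequences; returns DTW cost."""
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]


rng = np.random.default_rng(1)
seq_a = rng.normal(size=(12, 4))   # two toy sequences of event embeddings
seq_b = rng.normal(size=(15, 4))
print("DTW cost:", dtw_distance(seq_a, seq_b))
```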

9.
IEEE Trans Vis Comput Graph ; 24(1): 56-65, 2018 01.
Article in English | MEDLINE | ID: mdl-28866586

ABSTRACT

Event sequence data such as electronic health records, a person's academic records, or car service records, are ordered series of events which have occurred over a period of time. Analyzing collections of event sequences can reveal common or semantically important sequential patterns. For example, event sequence analysis might reveal frequently used care plans for treating a disease, typical publishing patterns of professors, and the patterns of service that result in a well-maintained car. It is challenging, however, to visually explore large numbers of event sequences, or sequences with large numbers of event types. Existing methods focus on extracting explicitly matching patterns of events using statistical analysis to create stages of event progression over time. However, these methods fail to capture latent clusters of similar but not identical evolutions of event sequences. In this paper, we introduce a novel visualization system named EventThread which clusters event sequences into threads based on tensor analysis and visualizes the latent stage categories and evolution patterns by interactively grouping the threads by similarity into time-specific clusters. We demonstrate the effectiveness of EventThread through usage scenarios in three different application domains and via interviews with an expert user.

10.
IEEE Trans Neural Netw Learn Syst ; 29(6): 2581-2594, 2018 06.
Article in English | MEDLINE | ID: mdl-28534789

ABSTRACT

Existing learning models for classification of imbalanced data sets can be grouped as either boundary-based or nonboundary-based depending on whether a decision hyperplane is used in the learning process. The focus of this paper is a new approach that leverages the advantages of both. Specifically, our model partitions the input space into three parts by creating two additional boundaries in the training process, and then makes the final decision based on a heuristic measurement between the test sample and a subset of selected training samples. Since the hyperplane used by the underlying original classifier is eliminated, the proposed model is named the boundary-eliminated (BE) model. Additionally, the pseudoinverse linear discriminant (PILD) is adopted for the BE model to obtain a novel classifier abbreviated as BEPILD. Experiments validate both the effectiveness and the efficiency of BEPILD compared with 13 state-of-the-art classification methods on 31 imbalanced and 7 standard data sets.
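
The pseudoinverse linear discriminant component can be sketched in closed form: fit a linear discriminant via the Moore-Penrose pseudoinverse and classify by sign. The boundary-elimination procedure built on top of it is not reproduced here, and the toy data below is purely illustrative.

```python
# Pseudoinverse linear discriminant (PILD) sketch: closed-form weights via pinv.
import numpy as np


def pild_fit(X, y):
    """X: (n, d) features, y: (n,) labels in {-1, +1}; returns weights (d+1,)."""
    X_aug = np.hstack([X, np.ones((len(X), 1))])   # append bias term
    return np.linalg.pinv(X_aug) @ y


def pild_predict(w, X):
    X_aug = np.hstack([X, np.ones((len(X), 1))])
    return np.sign(X_aug @ w)


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(2, 1, (10, 2))])  # imbalanced toy set
y = np.hstack([-np.ones(50), np.ones(10)])
w = pild_fit(X, y)
print("training accuracy:", (pild_predict(w, X) == y).mean())
```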

11.
IEEE Trans Neural Netw Learn Syst ; 29(6): 2660-2666, 2018 06.
Article in English | MEDLINE | ID: mdl-28534791

ABSTRACT

In this brief, we propose a novel multilabel learning framework, called multilabel self-paced learning, which incorporates the self-paced learning (SPL) scheme into the regime of multilabel learning. Specifically, we first propose a new multilabel learning formulation that introduces a self-paced function as a regularizer, so as to simultaneously prioritize label learning tasks and instances in each iteration. Considering that different multilabel learning scenarios often need different self-paced schemes during learning, we provide a general way to find the desired self-paced functions. To the best of our knowledge, this is the first work to study multilabel learning by jointly taking into consideration the complexities of both training instances and labels. Experimental results on four publicly available data sets suggest the effectiveness of our approach compared with state-of-the-art methods.
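
The simplest member of the self-paced family is the hard-threshold scheme shown below: only instance-label pairs whose current loss falls below an "age" parameter are used in each update, and the threshold is then relaxed. The paper's multilabel-specific self-paced functions generalize this rule; the numbers here are toy values.

```python
# Hard-threshold self-paced weighting over instance-label pairs.
import numpy as np


def self_paced_weights(losses, age):
    """losses: (n_instances, n_labels) current losses; returns a 0/1 weight matrix."""
    return (losses < age).astype(float)


rng = np.random.default_rng(0)
losses = rng.exponential(1.0, size=(6, 4))      # toy per-instance, per-label losses
age = 0.5
for step in range(3):
    v = self_paced_weights(losses, age)
    print(f"step {step}: using {int(v.sum())} of {v.size} instance-label pairs")
    age *= 1.5                                  # gradually admit harder pairs
```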

12.
IEEE Trans Pattern Anal Mach Intell ; 38(6): 1228-42, 2016 06.
Article in English | MEDLINE | ID: mdl-26372208

ABSTRACT

This paper addresses the problem of matching common node correspondences among multiple graphs that refer to an identical or related structure. This multi-graph matching problem involves two correlated components: i) the local pairwise matching affinity across pairs of graphs; ii) the global matching consistency that measures the uniqueness of the pairwise matchings obtained by different composition orders. Previous studies typically either enforce the matching consistency constraints from the beginning of an iterative optimization, which may propagate matching errors both over iterations and across graph pairs, or separate affinity optimization and consistency enforcement into two steps. This paper is motivated by the observation that matching consistency can serve as a regularizer in the affinity objective function, especially when the function is biased due to noise or inappropriate modeling. We propose composition-based multi-graph matching methods that incorporate the two aspects by optimizing the affinity score while gradually infusing consistency. We also propose two mechanisms to elicit the common inliers against outliers. Compelling results on synthetic and real images show the competitiveness of our algorithms.

13.
IEEE Trans Cybern ; 46(1): 27-38, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26208374

ABSTRACT

In recent years, multimedia retrieval has sparked much research interest in the multimedia, pattern recognition, and data mining communities. Although some attempts have been made along this direction, performing fast multimodal search at very large scale remains a major challenge. While hashing-based methods have recently achieved promising success in speeding up large-scale similarity search, most existing methods are designed only for uni-modal data, making them unsuitable for multimodal multimedia retrieval. In this paper, we propose a new hashing-based method for fast multimodal multimedia retrieval. The method is based on spectral analysis of the correlation matrix of different modalities. We also develop an efficient algorithm that learns parameters from the data distribution for obtaining the binary codes. We empirically compare our method with several state-of-the-art methods on two real-world multimedia data sets.
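
One simple way to turn the general spectral recipe into code is sketched below: take an SVD of the centered cross-correlation matrix of two paired modalities, project each modality onto the resulting directions, and take signs as binary codes. This illustrates the spectral idea only; the paper's actual parameter-learning step is not reproduced, and all data and sizes are synthetic assumptions.

```python
# Hedged cross-modal hashing sketch via SVD of the cross-correlation matrix.
import numpy as np


def cross_modal_codes(X, Y, n_bits=8):
    """X: (n, dx), Y: (n, dy) paired modalities; returns two (n, n_bits) sign codes."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    C = Xc.T @ Yc / len(X)                   # cross-correlation matrix (dx, dy)
    U, _, Vt = np.linalg.svd(C, full_matrices=False)
    Wx, Wy = U[:, :n_bits], Vt[:n_bits].T    # projection directions per modality
    return np.sign(Xc @ Wx), np.sign(Yc @ Wy)


rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 8))
X = latent @ rng.normal(size=(8, 32)) + 0.1 * rng.normal(size=(100, 32))  # "image" view
Y = latent @ rng.normal(size=(8, 20)) + 0.1 * rng.normal(size=(100, 20))  # "text" view
bx, by = cross_modal_codes(X, Y)
print("bit agreement between modalities:", (bx == by).mean())
```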

14.
IEEE Trans Image Process ; 24(3): 994-1009, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25576568

ABSTRACT

The problem of graph matching (GM) is NP-complete in general, and many approximate pairwise matching techniques have been proposed. In real applications, one typically needs to find consistent matchings across a batch of graphs. Sequentially performing pairwise matching is prone to error propagation along the pairwise matching sequence, and the sequences generated under different pairwise matching orders can lead to contradictory solutions. Motivated by devising a robust and consistent multiple-GM model, we propose a unified alternating optimization framework for multi-GM. In addition, we define and use two metrics related to graphwise and pairwise consistency. The former is used to find an appropriate reference graph, which induces a set of basis variables and launches the iteration procedure. The latter defines the order in which the graphs are manipulated during the iterations. We show two embodiments under the proposed framework that can cope with the nonfactorized and factorized affinity matrix, respectively. Our multi-GM model has two major characteristics: 1) the affinity information across multiple graphs is explored in each iteration by fixing part of the matching variables via a consistency-driven mechanism, and 2) the framework is flexible enough to incorporate various existing pairwise GM solvers in an out-of-the-box fashion, and can also proceed from the output of other multi-GM methods. Experimental results on both synthetic data and real images show that the proposed framework performs competitively with the state-of-the-art.

15.
J Am Med Inform Assoc ; 21(e1): e136-42, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24076750

ABSTRACT

OBJECTIVE: Electronic health records possess critical predictive information for machine-learning-based diagnostic aids. However, many traditional machine learning methods fail to simultaneously integrate textual data into the prediction process because of its high dimensionality. In this paper, we present a supervised method using Laplacian Eigenmaps to enable existing machine learning methods to estimate both low-dimensional representations of textual data and accurate predictors based on these low-dimensional representations at the same time. MATERIALS AND METHODS: We present a supervised Laplacian Eigenmap method to enhance predictive models by embedding textual predictors into a low-dimensional latent space, which preserves the local similarities among textual data in high-dimensional space. The proposed implementation performs alternating optimization using gradient descent. For the evaluation, we applied our method to over 2000 patient records from a large single-center pediatric cardiology practice to predict if patients were diagnosed with cardiac disease. In our experiments, we consider relatively short textual descriptions because of data availability. We compared our method with latent semantic indexing, latent Dirichlet allocation, and local Fisher discriminant analysis. The results were assessed using four metrics: the area under the receiver operating characteristic curve (AUC), Matthews correlation coefficient (MCC), specificity, and sensitivity. RESULTS AND DISCUSSION: The results indicate that supervised Laplacian Eigenmaps was the highest performing method in our study, achieving 0.782 and 0.374 for AUC and MCC, respectively. Supervised Laplacian Eigenmaps showed an increase of 8.16% in AUC and 20.6% in MCC over the baseline that excluded textual data and a 2.69% and 5.35% increase in AUC and MCC, respectively, over unsupervised Laplacian Eigenmaps. CONCLUSIONS: As a solution, we present a supervised Laplacian Eigenmap method to embed textual predictors into a low-dimensional Euclidean space. This method allows many existing machine learning predictors to effectively and efficiently capture the potential of textual predictors, especially those based on short texts.
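
For orientation, the unsupervised Laplacian Eigenmap step can be sketched as follows: build a kNN similarity graph over the text feature vectors and embed via the generalized eigenproblem L v = lambda D v. The supervised variant described in the paper additionally shapes the graph weights using outcome labels; the random data and sizes below are placeholders.

```python
# Unsupervised Laplacian Eigenmaps sketch (the baseline the supervised method extends).
import numpy as np
from scipy.linalg import eigh


def laplacian_eigenmaps(X, n_components=2, k=10, sigma=1.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared pairwise distances
    W = np.exp(-d2 / (2 * sigma ** 2))                     # heat-kernel weights
    np.fill_diagonal(W, 0.0)
    idx = np.argsort(-W, axis=1)[:, k:]                    # drop all but k nearest neighbors
    for i, cols in enumerate(idx):
        W[i, cols] = 0.0
    W = np.maximum(W, W.T)                                 # symmetrize
    D = np.diag(W.sum(1))
    L = D - W
    vals, vecs = eigh(L, D)                                # generalized eigenproblem
    return vecs[:, 1:n_components + 1]                     # skip the trivial eigenvector


rng = np.random.default_rng(0)
X = rng.normal(size=(80, 50))        # stand-in for high-dimensional text features
Z = laplacian_eigenmaps(X)
print(Z.shape)                       # (80, 2) low-dimensional embedding
```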


Subjects
Algorithms, Artificial Intelligence, Cardiology/methods, Diagnosis, Area Under Curve, Discriminant Analysis, Humans, Automated Pattern Recognition/methods, Pediatrics/methods, ROC Curve, Sensitivity and Specificity
16.
Article in English | MEDLINE | ID: mdl-26005312

ABSTRACT

Events in an online social network can be categorized roughly into endogenous events, where users simply respond to the actions of their neighbors within the network, and exogenous events, where users take actions due to drives external to the network. How much external drive should be provided to each user so that the network activity can be steered towards a target state? In this paper, we model social events using multivariate Hawkes processes, which can capture both endogenous and exogenous event intensities, and derive a time-dependent linear relation between the intensity of exogenous events and the overall network activity. Exploiting this connection, we develop a convex optimization framework for determining the level of external drive required for the network to reach a desired activity level. We experiment with event data gathered from Twitter and show that our method can steer the activity of the network more accurately than alternatives.
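
A steady-state simplification of this relation: for a stationary multivariate Hawkes process with branching matrix A (spectral radius below one), the mean activity satisfies lambda_bar = (I - A)^(-1) mu, so the exogenous base rates needed to hit a target activity follow by inverting that map. The paper works with the full time-dependent relation; this sketch covers only the long-run case, and the toy matrix below is arbitrary.

```python
# Steady-state Hawkes steering sketch: solve for the exogenous intensity mu that
# yields a target mean activity, assuming stationarity (spectral radius of A < 1).
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = np.abs(rng.normal(0, 0.1, (n, n)))           # toy branching matrix (kernel integrals)
assert np.max(np.abs(np.linalg.eigvals(A))) < 1  # stability condition

target_activity = np.full(n, 2.0)                # desired events per unit time per user
mu = (np.eye(n) - A) @ target_activity           # required exogenous (base) intensity

# Sanity check: plugging mu back recovers the target mean activity.
print(np.linalg.solve(np.eye(n) - A, mu))        # ~= target_activity
```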

17.
Adv Neural Inf Process Syst ; 26: 3147-3155, 2013.
Article in English | MEDLINE | ID: mdl-26752940

ABSTRACT

If a piece of information is released from a media site, can we predict whether it will spread to one million web pages within a month? This influence estimation problem is challenging because the time-sensitive nature of the task and the requirement of scalability must be addressed simultaneously. In this paper, we propose a randomized algorithm for influence estimation in continuous-time diffusion networks. Our algorithm can estimate the influence of every node in a network with |V| nodes and |E| edges to an accuracy of ε using n = O(1/ε²) randomizations and, up to logarithmic factors, O(n|E| + n|V|) computations. When used as a subroutine in a greedy influence maximization approach, our algorithm is guaranteed to find a set of C nodes with influence at least (1 - 1/e)OPT - 2Cε, where OPT is the optimal value. Experiments on both synthetic and real-world data show that the proposed algorithm easily scales up to networks of millions of nodes while significantly improving over the previous state of the art in terms of the accuracy of the estimated influence and the quality of the selected nodes for maximizing the influence.
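
To make the problem concrete, here is a naive Monte Carlo baseline for continuous-time influence, not the paper's faster randomized estimator: sample exponential transmission delays on the edges, run Dijkstra from the source, and count nodes reachable within the time window. The graph, rate, and sample count are illustrative.

```python
# Naive Monte Carlo influence estimation in a continuous-time diffusion network.
import numpy as np
import networkx as nx


def mc_influence(G, source, T, rate=1.0, n_samples=200, seed=0):
    rng = np.random.default_rng(seed)
    total = 0
    for _ in range(n_samples):
        for u, v in G.edges():
            G[u][v]["delay"] = rng.exponential(1.0 / rate)   # sampled transmission time
        lengths = nx.single_source_dijkstra_path_length(G, source, weight="delay")
        total += sum(1 for t in lengths.values() if t <= T)  # nodes infected by time T
    return total / n_samples


G = nx.gnp_random_graph(200, 0.03, seed=1, directed=True)
print("estimated influence of node 0 within T=2:", mc_influence(G, 0, T=2.0))
```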

18.
Article in English | MEDLINE | ID: mdl-24917494

ABSTRACT

In many applications of social network analysis, it is important to model the interactions and infer the influence between pairs of actors, leading to the problem of dyadic event modeling, which has attracted increasing interest recently. In this paper we focus on dyadic event attribution, an important missing-data problem in dyadic event modeling where one needs to infer the missing actor-pairs of a subset of dyadic events based on their observed timestamps. Existing works either use fixed model parameters and heuristic rules for event attribution, or assume the dyadic events across actor-pairs are independent. To address these shortcomings, we propose a probabilistic model based on mixtures of Hawkes processes that simultaneously tackles event attribution and network parameter inference, taking into consideration the dependency among dyadic events that share at least one actor. We also investigate using additive models to incorporate regularization and avoid overfitting. Our experiments on both synthetic and real-world data sets on international armed conflicts suggest that the proposed method significantly improves accuracy compared with the state of the art for dyadic event attribution.

19.
IEEE Trans Pattern Anal Mach Intell ; 34(2): 253-65, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21670485

ABSTRACT

Manifold learning algorithms seek to find a low-dimensional parameterization of high-dimensional data. They heavily rely on the notion of what can be considered as local, how accurately the manifold can be approximated locally, and, last but not least, how the local structures can be patched together to produce the global parameterization. In this paper, we develop algorithms that address two key issues in manifold learning: 1) the adaptive selection of the local neighborhood sizes when imposing a connectivity structure on the given set of high-dimensional data points and 2) the adaptive bias reduction in the local low-dimensional embedding by accounting for the variations in the curvature of the manifold as well as its interplay with the sampling density of the data set. We demonstrate the effectiveness of our methods for improving the performance of manifold learning algorithms using both synthetic and real-world data sets.

20.
IEEE Trans Image Process ; 21(2): 615-25, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21859621

ABSTRACT

Aliasing is a common artifact in low-resolution (LR) images generated by a downsampling process. Recovering the original high-resolution image from its LR counterpart while removing the aliasing artifacts is a challenging image interpolation problem. Since a natural image normally contains redundant similar patches, the values of missing pixels can be inferred from texture-relevant LR pixels. Based on this, we propose an iterative multiscale semilocal interpolation method that can effectively address the aliasing problem. The proposed method estimates each missing pixel from a set of texture-relevant semilocal LR pixels, with the texture similarity iteratively measured from a sequence of patches of varying sizes. Specifically, in each iteration, the top texture-relevant LR pixels are used to construct a data fidelity term in a maximum a posteriori estimation, and a bilateral total variation is used as the regularization term. Experimental comparisons with existing interpolation methods demonstrate that our method not only substantially alleviates the aliasing problem but also produces better results across a wide range of scenes, both in terms of quantitative evaluation and subjective visual quality.
