Search | VHL Regional Portal

An experimental investigation of kernels on graphs for collaborative recommendation and semisupervised classification.

Fouss, François; Francoisse, Kevin; Yen, Luh; Pirotte, Alain; Saerens, Marco.

Neural Netw ; 31: 53-72, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22497802

ABSTRACT

This paper presents a survey as well as an empirical comparison and evaluation of seven kernels on graphs and two related similarity matrices, which we collectively refer to as "kernels on graphs" for simplicity. They are the exponential diffusion kernel, the Laplacian exponential diffusion kernel, the von Neumann diffusion kernel, the regularized Laplacian kernel, the commute-time (or resistance-distance) kernel, the random-walk-with-restart similarity matrix, and finally, a kernel first introduced in this paper (the regularized commute-time kernel) and two kernels defined in some of our previous work and further investigated in this paper (the Markov diffusion kernel and the relative-entropy diffusion matrix). The kernel-on-graphs approach is simple and intuitive. It is illustrated by applying the nine kernels to a collaborative-recommendation task, viewed as a link prediction problem, and to a semisupervised classification task, both on several databases. The methods compute proximity measures between nodes that help study the structure of the graph. Our comparisons suggest that the regularized commute-time and the Markov diffusion kernels perform best on the investigated tasks, closely followed by the regularized Laplacian kernel.

Subject(s)

Databases, Factual/classification; Markov Chains; Statistics as Topic/classification; Random Allocation
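
The abstract above names the kernels without giving formulas. As a minimal sketch, two of them can be written under their standard definitions: the regularized Laplacian kernel K = (I + αL)^(-1) and the commute-time kernel K = L^+ (the pseudoinverse of the graph Laplacian L = D - A). The toy path graph, the value of `alpha`, and the function names are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def laplacian(A):
    """Combinatorial Laplacian L = D - A of an undirected weighted graph."""
    return np.diag(A.sum(axis=1)) - A

def regularized_laplacian_kernel(A, alpha=0.5):
    """K = (I + alpha * L)^{-1}; alpha > 0 is an illustrative choice."""
    n = A.shape[0]
    return np.linalg.inv(np.eye(n) + alpha * laplacian(A))

def commute_time_kernel(A):
    """K = L^+, the Moore-Penrose pseudoinverse of the Laplacian."""
    return np.linalg.pinv(laplacian(A))

# Toy path graph 0 - 1 - 2 - 3 with unit weights (assumed example).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

K_rl = regularized_laplacian_kernel(A)
K_ct = commute_time_kernel(A)

# Resistance distance between the two end nodes, derived from the
# commute-time kernel: r(i, j) = K[i, i] + K[j, j] - 2 * K[i, j].
r_03 = K_ct[0, 0] + K_ct[3, 3] - 2 * K_ct[0, 3]
```

On this path graph the end nodes are joined by three unit-weight edges in series, so `r_03` should come out close to 3, which is a quick sanity check on the pseudoinverse-based definition.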

The sum-over-paths covariance kernel: a novel covariance measure between nodes of a directed graph.

Mantrach, Amin; Yen, Luh; Callut, Jerome; Francoisse, Kevin; Shimbo, Masashi; Saerens, Marco.

IEEE Trans Pattern Anal Mach Intell ; 32(6): 1112-26, 2010 Jun.

Article in English | MEDLINE | ID: mdl-20431135

ABSTRACT

This work introduces a link-based covariance measure between the nodes of a weighted directed graph, where a cost is associated with each arc. To this end, a probability distribution on the (usually infinite) countable set of paths through the graph is defined by minimizing the total expected cost between all pairs of nodes while fixing the total relative entropy spread in the graph. This results in a Boltzmann distribution on the set of paths such that long (high-cost) paths occur with a low probability while short (low-cost) paths occur with a high probability. The sum-over-paths (SoP) covariance measure between nodes is then defined according to this probability distribution: two nodes are considered as highly correlated if they often co-occur on the same (preferably short) paths. The resulting covariance matrix between nodes (say n nodes in total) is a Gram matrix and therefore defines a valid kernel on the graph. It is obtained by inverting an n × n matrix depending on the costs assigned to the arcs. In the same spirit, a betweenness score is also defined, measuring the expected number of times a node occurs on a path. The proposed measures could be used for various graph mining tasks such as computing betweenness centrality, semi-supervised classification of nodes, visualization, etc., as shown in Section 7.

Randomized shortest-path problems: two related models.

Saerens, Marco; Achbany, Youssef; Fouss, François; Yen, Luh.

Neural Comput ; 21(8): 2363-404, 2009 Aug.

Article in English | MEDLINE | ID: mdl-19323635

ABSTRACT

This letter addresses the problem of designing the transition probabilities of a finite Markov chain (the policy) in order to minimize the expected cost for reaching a destination node from a source node while maintaining a fixed level of entropy spread throughout the network (the exploration). It is motivated by the following scenario. Suppose you have to route agents through a network in some optimal way, for instance, by minimizing the total travel cost. Nothing particular up to now: you could use a standard shortest-path algorithm. Suppose, however, that you want to avoid pure deterministic routing policies in order, for instance, to allow some continual exploration of the network, avoid congestion, or avoid complete predictability of your routing strategy. In other words, you want to introduce some randomness or unpredictability in the routing policy (i.e., the routing policy is randomized). This problem, which will be called the randomized shortest-path problem (RSP), is investigated in this work. The global level of randomness of the routing policy is quantified by the expected Shannon entropy spread throughout the network and is provided a priori by the designer. Then, necessary conditions to compute the optimal randomized policy (minimizing the expected routing cost) are derived. Iterating these necessary conditions, reminiscent of Bellman's value iteration equations, allows computing an optimal policy, that is, a set of transition probabilities in each node. Interestingly and surprisingly enough, this first model, while formulated in a totally different framework, is equivalent to Akamatsu's model (1996), appearing in transportation science, for a special choice of the entropy constraint. We therefore revisit Akamatsu's model by recasting it into a sum-over-paths statistical physics formalism allowing easy derivation of all the quantities of interest in an elegant, unified way. For instance, it is shown that the unique optimal policy can be obtained by solving a simple linear system of equations. This second model is therefore more convincing because of its computational efficiency and soundness. Finally, simulation results obtained on simple, illustrative examples show that the models behave as expected.

Subject(s)

Models, Statistical , Neural Networks, Computer , Computer Simulation , Humans , Probability
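
The abstract above states that the optimal randomized policy can be obtained by solving a linear system. The sketch below illustrates one commonly cited form of that computation and should be read as an assumption-laden illustration rather than the authors' exact procedure: arcs are weighted by W = P_ref ∘ exp(-θC) with the destination made absorbing, the system (I - W) z = e_d is solved, and the policy is p_ij = w_ij z_j / z_i. The uniform reference walk `P_ref`, the parameter `theta`, the function name `rsp_policy`, and the toy graph are all hypothetical choices.

```python
import numpy as np

def rsp_policy(C, dest, theta=1.0):
    """Randomized routing policy toward `dest` (illustrative sketch).

    C[i, j] is the cost of arc i -> j, or np.inf when the arc is absent.
    theta > 0 controls randomness: large theta approaches shortest-path
    routing, small theta approaches the reference random walk.
    """
    n = C.shape[0]
    adj = np.isfinite(C)
    # Reference policy (assumed): uniform over outgoing arcs.
    deg = adj.sum(axis=1, keepdims=True)
    P_ref = adj / np.maximum(deg, 1)
    # Boltzmann-weighted arcs; absent arcs get weight 0.
    W = np.where(adj, P_ref * np.exp(-theta * np.where(adj, C, 0.0)), 0.0)
    W[dest, :] = 0.0  # make the destination absorbing
    # Solve the linear system (I - W) z = e_dest.
    e = np.zeros(n)
    e[dest] = 1.0
    z = np.linalg.solve(np.eye(n) - W, e)
    # Policy p_ij = w_ij * z_j / z_i; the destination row stays all zero.
    return W * z[None, :] / z[:, None]

# Toy directed graph (assumed): 0->1 (1), 0->2 (1), 1->3 (1), 2->3 (5).
inf = np.inf
C = np.array([[inf, 1.0, 1.0, inf],
              [inf, inf, inf, 1.0],
              [inf, inf, inf, 5.0],
              [inf, inf, inf, inf]])

P = rsp_policy(C, dest=3, theta=1.0)
P_near_random = rsp_policy(C, dest=3, theta=1e-9)
```

With `theta=1.0` the policy at node 0 strongly favors the arc toward node 1, which lies on the cheaper route, while a very small `theta` leaves the policy close to the uniform reference walk. The sketch assumes the destination is reachable from every other node, so that all entries of `z` are positive.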
