Results 1 - 11 of 11
1.
Neural Netw ; 165: 654-661, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37364474

ABSTRACT

We employ properties of high-dimensional geometry to obtain insights into the capabilities of deep perceptron networks to classify large data sets. We derive conditions on network depths, types of activation functions, and numbers of parameters which imply that approximation errors behave almost deterministically. We illustrate the general results with concrete cases of popular activation functions: Heaviside, ramp sigmoid, rectified linear, and rectified power. Our probabilistic bounds on approximation errors are derived using concentration-of-measure inequalities (the method of bounded differences) and concepts from statistical learning theory.
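The concentration-of-measure machinery invoked above can be illustrated numerically. The following sketch is a generic Hoeffding-type demonstration, not the paper's construction: it compares the empirical probability that the mean of n bounded random variables deviates from its expectation with the corresponding exponential bound (all parameter values are illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, eps = 200, 100_000, 0.1

# Means of n i.i.d. Uniform[0, 1] variables across many independent trials.
means = rng.random((trials, n)).mean(axis=1)

# Empirical probability of deviating from the true mean 0.5 by more than eps.
empirical = np.mean(np.abs(means - 0.5) > eps)

# Hoeffding bound: P(|mean - 0.5| > eps) <= 2 * exp(-2 * n * eps**2).
bound = 2 * np.exp(-2 * n * eps**2)

print(f"empirical tail: {empirical:.5f}, Hoeffding bound: {bound:.5f}")
```

The empirical tail sits well below the bound, which is the "almost deterministic" behavior the abstract alludes to: deviations of size eps become exponentially unlikely as n grows.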


Subject(s)
Neural Networks, Computer
2.
IEEE Trans Neural Netw Learn Syst ; 30(9): 2746-2754, 2019 Sep.
Article in English | MEDLINE | ID: mdl-30640635

ABSTRACT

The choice of dictionaries of computational units suitable for efficient computation of binary classification tasks is investigated. To deal with exponentially growing sets of tasks with increasingly large domains, a probabilistic model is introduced. The relevance of tasks for a given application area is modeled by a product probability distribution on the set of all binary-valued functions. Approximate measures of network sparsity are studied in terms of variational norms tailored to dictionaries of computational units. Bounds on these norms are proven using the Chernoff-Hoeffding bound on sums of independent random variables that need not be identically distributed. Consequences of the probabilistic results for the choice of dictionaries of computational units are derived. It is shown that when a priori knowledge of a type of classification tasks is limited, then sparsity may be achieved only at the expense of large dictionaries.

3.
Neural Comput ; 29(8): 2203-2291, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28562221

ABSTRACT

Optimal control theory and machine learning techniques are combined to formulate and solve in closed form an optimal control formulation of online learning from supervised examples with regularization of the updates. The connections with the classical linear quadratic Gaussian (LQG) optimal control problem, of which the proposed learning paradigm is a nontrivial variation as it involves random matrices, are investigated. The obtained optimal solutions are compared with the Kalman filter estimate of the parameter vector to be learned. It is shown that the proposed algorithm is less sensitive to outliers than the Kalman estimate (thanks to the presence of the regularization term), thus providing estimates that are smoother over time. The basic formulation of the proposed online learning framework refers to a discrete-time setting with a finite learning horizon and a linear model. Various extensions are investigated, including the infinite learning horizon and, via the so-called kernel trick, the case of nonlinear models.
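The outlier-robustness claim can be illustrated with a toy scalar experiment. This is not the paper's LQG-based closed-form solution: the regularized update below is a simple proximal step, argmin_w (y_t - w)^2 + lam*(w - w_t)^2, chosen only to show how penalizing the update damps the reaction to an outlier relative to a Kalman filter with a nonvanishing gain. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true, Q, R, lam, T = 2.0, 0.05, 0.1, 9.0, 200
y = theta_true + rng.normal(0.0, np.sqrt(R), T)
y[100] += 10.0                       # inject one large outlier

# Kalman filter tracking a (possibly drifting) scalar parameter.
theta_kf, P, kf = 0.0, 1.0, []
for t in range(T):
    P += Q                           # predict: allow for parameter drift
    K = P / (P + R)                  # Kalman gain
    theta_kf += K * (y[t] - theta_kf)
    P *= 1.0 - K
    kf.append(theta_kf)

# Update-regularized estimator: argmin_w (y_t - w)^2 + lam * (w - w_t)^2,
# whose closed form shrinks the step taken on each new example.
theta_reg, reg = 0.0, []
for t in range(T):
    theta_reg = (y[t] + lam * theta_reg) / (1.0 + lam)
    reg.append(theta_reg)

kf_jump = abs(kf[100] - kf[99])
reg_jump = abs(reg[100] - reg[99])
print(f"jump caused by the outlier: Kalman {kf_jump:.2f}, "
      f"regularized {reg_jump:.2f}")
```

Both estimators converge to the true parameter, but the regularized one moves far less when the outlier arrives, at the price of slower initial convergence.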

4.
Neural Netw ; 91: 34-41, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28482227

ABSTRACT

Limitations of approximation capabilities of shallow perceptron networks are investigated. Lower bounds on approximation errors are derived for binary-valued functions on finite domains. It is proven that unless the number of network units is sufficiently large (larger than any polynomial of the logarithm of the size of the domain) a good approximation cannot be achieved for almost any uniformly randomly chosen function on a given domain. The results are obtained by combining probabilistic Chernoff-Hoeffding bounds with estimates of the sizes of sets of functions exactly computable by shallow networks with increasing numbers of units.
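A numeric companion to this result (an illustration, not the paper's proof): for a uniformly random binary-valued function on a finite domain, the correlation with every unit in a moderately large dictionary of signum perceptrons concentrates near zero, so no small linear combination of units can approximate it well. The domain size, dictionary size, and random-weight distribution below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10                                   # domain {0, 1}^10, size m = 1024
m = 2 ** d
X = ((np.arange(m)[:, None] >> np.arange(d)) & 1).astype(float)  # all points

f = rng.choice([-1.0, 1.0], size=m)      # uniformly random binary function

# Dictionary: signum perceptrons with random Gaussian weights and biases.
n_units = 5000
W = rng.normal(size=(n_units, d))
b = rng.normal(size=n_units)
G = np.sign(X @ W.T + b)                 # m x n_units matrix of unit outputs
G[G == 0] = 1.0

# Normalized correlations |<f, g>| / m.  Chernoff-Hoeffding-type bounds
# predict they are all O(sqrt(log(n_units) / m)): every unit is nearly
# orthogonal to a random target, so many units are needed.
corr = np.abs(G.T @ f) / m
print(f"max |correlation| over {n_units} units: {corr.max():.3f}")
```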


Subject(s)
Neural Networks, Computer
5.
Neural Comput ; 27(2): 388-480, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25380338

ABSTRACT

The mathematical foundations of a new theory for the design of intelligent agents are presented. The proposed learning paradigm is centered around the concept of constraint, representing the interactions with the environment, and the parsimony principle. The classical regularization framework of kernel machines is naturally extended to the case in which the agents interact with a richer environment, where abstract granules of knowledge, compactly described by different linguistic formalisms, can be translated into the unified notion of constraint for defining the hypothesis set. Constrained variational calculus is exploited to derive general representation theorems that provide a description of the optimal body of the agent (i.e., the functional structure of the optimal solution to the learning problem), which is the basis for devising new learning algorithms. We show that regardless of the kind of constraints, the optimal body of the agent is a support constraint machine (SCM) based on representer theorems that extend classical results for kernel machines and provide new representations. In a sense, the expressiveness of constraints yields a semantic-based regularization theory, which strongly restricts the hypothesis set of classical regularization. Some guidelines to unify continuous and discrete computational mechanisms are given so as to accommodate in the same framework various kinds of stimuli, for example, supervised examples and logic predicates. The proposed view of learning from constraints incorporates classical learning from examples and extends naturally to the case in which the examples are subsets of the input space, which is related to learning propositional logic clauses.

6.
IEEE Trans Neural Netw Learn Syst ; 26(9): 2019-32, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25389245

ABSTRACT

A learning paradigm is proposed and investigated, in which the classical framework of learning from examples is enhanced by the introduction of hard pointwise constraints, i.e., constraints imposed on a finite set of examples that cannot be violated. Such constraints arise, e.g., when requiring coherent decisions of classifiers acting on different views of the same pattern. The classical examples of supervised learning, which can be violated at the cost of some penalization (quantified by the choice of a suitable loss function), play the role of soft pointwise constraints. Constrained variational calculus is exploited to derive a representer theorem that provides a description of the functional structure of the optimal solution to the proposed learning paradigm. It is shown that such an optimal solution can be represented in terms of a set of support constraints, which generalize the concept of support vectors and open the door to a novel learning paradigm, called support constraint machines. The general theory is applied to derive the representation of the optimal solution to the problem of learning from hard linear pointwise constraints combined with soft pointwise constraints induced by supervised examples. In some cases, closed-form optimal solutions are obtained.
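The flavor of combining soft supervised examples with a hard linear constraint admits a closed form already in the finite-dimensional linear case. The sketch below is a toy KKT computation, not the paper's variational support-constraint-machine framework: it fits a ridge-regularized linear model subject to a hard constraint A w = c (model, constraint, and data are illustrative).

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

gamma = 0.1
A = np.array([[1.0, 1.0, 1.0]])      # hard constraint: weights sum to zero
c = np.array([0.0])

# KKT system for  min ||X w - y||^2 + gamma ||w||^2   s.t.  A w = c.
H = 2.0 * (X.T @ X + gamma * np.eye(d))
KKT = np.block([[H, A.T], [A, np.zeros((1, 1))]])
rhs = np.concatenate([2.0 * X.T @ y, c])
sol = np.linalg.solve(KKT, rhs)
w = sol[:d]                          # primal solution; sol[d:] is the multiplier

print("w =", w, " A @ w =", A @ w)
```

The hard constraint holds to machine precision, while the soft supervised examples pull w toward the (infeasible) generating parameter.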

7.
Neural Comput ; 25(4): 1029-106, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23339616

ABSTRACT

Kernel machines traditionally arise from an elegant formulation based on measuring the smoothness of the admissible solutions by the norm in the reproducing kernel Hilbert space (RKHS) generated by the chosen kernel. It was pointed out that they can be formulated in a related functional framework, in which the Green's function of suitable differential operators is thought of as a kernel. In this letter, our own picture of this intriguing connection is given by emphasizing some relevant distinctions between these different ways of measuring the smoothness of admissible solutions. In particular, we show that for some kernels, there is no associated differential operator. The crucial relevance of boundary conditions is especially emphasized, which is in fact the truly distinguishing feature of the approach based on differential operators. We provide a general solution to the problem of learning from data and boundary conditions and illustrate the significant role played by boundary conditions with examples. It turns out that the degree of freedom that arises in the traditional formulation of kernel machines is indeed a limitation, which is partly overcome when incorporating the boundary conditions. This likely holds true in many real-world applications in which there is prior knowledge about the expected behavior of classifiers and regressors on the boundary.
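The role of boundary conditions in a differential-operator view of regularization can be made concrete on a grid. The sketch below is a finite-difference toy, not the letter's Green's-function analysis: it minimizes a squared-error data term plus a penalty on the discrete second derivative while pinning the boundary values f(0) = f(1) = 0 by elimination (grid size, penalty weight, and target are illustrative).

```python
import numpy as np

rng = np.random.default_rng(4)
N, lam = 101, 1e-4
t = np.linspace(0.0, 1.0, N)
h = t[1] - t[0]

# Noisy samples of a smooth target at a random subset of grid points.
idx = rng.choice(N, size=30, replace=False)
y = np.sin(2 * np.pi * t[idx]) + 0.05 * rng.normal(size=idx.size)

# Sampling matrix S and second-difference operator D (discrete f'').
S = np.zeros((idx.size, N)); S[np.arange(idx.size), idx] = 1.0
D = (np.diag(np.ones(N - 1), 1) - 2 * np.eye(N)
     + np.diag(np.ones(N - 1), -1))[1:-1] / h**2

# Boundary conditions f(0) = 0, f(1) = 0, imposed by elimination:
# f = Z u, with Z selecting interior nodes (boundary entries fixed at 0).
Z = np.eye(N)[:, 1:-1]
A = S @ Z
B = D @ Z
u = np.linalg.solve(A.T @ A + lam * h * (B.T @ B), A.T @ y)
f = Z @ u

print(f"f(0) = {f[0]:.3f}, f(1) = {f[-1]:.3f}, rms fit error = "
      f"{np.sqrt(np.mean((f[idx] - y) ** 2)):.3f}")
```

Because the boundary values are built into the parameterization rather than left free, the recovered function satisfies them exactly, which is precisely the extra structure the differential-operator view provides over the unconstrained kernel formulation.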

8.
Neural Netw ; 24(8): 881-7, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21704495

ABSTRACT

Approximation capabilities of two types of computational models are explored: dictionary-based models (i.e., linear combinations of n-tuples of basis functions computable by units belonging to a set called "dictionary") and linear ones (i.e., linear combinations of n fixed basis functions). The two models are compared in terms of approximation rates, i.e., speeds of decrease of approximation errors for a growing number n of basis functions. Proofs of upper bounds on approximation rates by dictionary-based models are inspected, to show that for individual functions they do not imply estimates for dictionary-based models that do not hold also for some linear models. Instead, the possibility of getting faster approximation rates by dictionary-based models is demonstrated for worst-case errors in approximation of suitable sets of functions. For such sets, even geometric upper bounds hold.
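The gap between the two models can be seen in a small experiment. The following sketch (an illustration, not the paper's proof technique) approximates a narrow bump with n coefficients in a fixed Fourier basis versus n greedily selected atoms from a dictionary of translated Gaussians; the target, dictionary, and greedy (OMP-style) selection are illustrative choices.

```python
import numpy as np

m = 256
x = np.linspace(0.0, 1.0, m)
target = np.exp(-200 * (x - 0.37) ** 2)        # a narrow bump

# Fixed linear basis: the first n Fourier terms (only coefficients adapt).
def fourier_error(n):
    cols = [np.ones(m)] + [f(2 * np.pi * k * x)
                           for k in range(1, n) for f in (np.sin, np.cos)]
    B = np.column_stack(cols)[:, :n]
    coef, *_ = np.linalg.lstsq(B, target, rcond=None)
    return np.linalg.norm(target - B @ coef)

# Dictionary: Gaussian bumps at many centers; greedy selection of n atoms
# with a least-squares re-fit after each selection (OMP-style).
centers = np.linspace(0.0, 1.0, 200)
D = np.exp(-200 * (x[:, None] - centers[None, :]) ** 2)
D /= np.linalg.norm(D, axis=0)

def greedy_error(n):
    sel, r = [], target.copy()
    for _ in range(n):
        sel.append(int(np.argmax(np.abs(D.T @ r))))
        A = D[:, sel]
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        r = target - A @ coef
    return np.linalg.norm(r)

for n in (1, 3, 5):
    print(f"n = {n}: fixed basis {fourier_error(n):.4f}, "
          f"dictionary {greedy_error(n):.4f}")
```

For this target the dictionary model, free to choose which atoms to combine, drives the error down with very few units, while the fixed low-frequency basis cannot resolve the bump; this mirrors the worst-case-over-a-set comparison in the abstract.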


Subject(s)
Computer Simulation , Linear Models , Algorithms , Neural Networks, Computer , Reproducibility of Results
9.
Neural Netw ; 24(2): 171-82, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21094023

ABSTRACT

Neural networks provide a more flexible approximation of functions than traditional linear regression. In the latter, one can only adjust the coefficients in linear combinations of fixed sets of functions, such as orthogonal polynomials or Hermite functions, while for neural networks, one may also adjust the parameters of the functions which are being combined. However, some useful properties of linear approximators (such as uniqueness, homogeneity, and continuity of best approximation operators) are not satisfied by neural networks. Moreover, optimization of parameters in neural networks becomes more difficult than in linear regression. Experimental results suggest that these drawbacks of neural networks are offset by substantially lower model complexity, allowing accurate approximation even in high-dimensional cases. We give some theoretical results comparing requirements on model complexity for two types of approximators, the traditional linear ones and so-called variable-basis types, which include neural networks, radial, and kernel models. We compare upper bounds on worst-case errors in variable-basis approximation with lower bounds on such errors for any linear approximator. Using methods from nonlinear approximation and integral representations tailored to computational units, we describe some cases where neural networks outperform any linear approximator.


Subject(s)
Dictionaries as Topic , Linear Models , Neural Networks, Computer , Computational Biology , Models, Neurological , Statistics, Nonparametric
10.
Neural Comput ; 22(3): 793-829, 2010 Mar.
Article in English | MEDLINE | ID: mdl-19922296

ABSTRACT

Various regularization techniques are investigated in supervised learning from data. Theoretical features of the associated optimization problems are studied, and sparse suboptimal solutions are searched for. Rates of approximate optimization are estimated for sequences of suboptimal solutions formed by linear combinations of n-tuples of computational units, and statistical learning bounds are derived. As hypothesis sets, reproducing kernel Hilbert spaces and their subsets are considered.


Subject(s)
Artificial Intelligence , Algorithms , Linear Models , Time Factors
11.
IEEE Trans Neural Netw ; 18(1): 86-96, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17278463

ABSTRACT

A methodology to design state estimators for a class of nonlinear continuous-time dynamic systems, based on neural networks and nonlinear programming, is proposed. The estimator has the structure of a Luenberger observer with a linear gain and a parameterized (in general, nonlinear) function, whose argument is an innovation term representing the difference between the current measurement and its prediction. The estimator design problem consists in finding the values of the gain and of the parameters that guarantee the asymptotic stability of the estimation error. Toward this end, when a neural network is used to implement this function, the parameters (i.e., the neural weights) are chosen, together with the gain, by constraining the derivative of a quadratic Lyapunov function for the estimation error to be negative definite on a given compact set. It is proved that it is sufficient to impose the negative definiteness of such a derivative only on a suitably dense grid of sampling points. The gain is determined by solving a Lyapunov equation. The neural weights are searched for via nonlinear programming by minimizing a cost that penalizes violated grid-point constraints. Techniques based on low-discrepancy sequences are applied to keep the number of sampling points small and, hence, reduce the computational burden required to optimize the parameters. Numerical results are reported and comparisons with those obtained by the extended Kalman filter are made.
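The observer structure described above can be sketched on a toy nonlinear plant. The simulation below keeps only the skeleton of the method: a model copy plus a linear gain acting on the innovation; the parameterized (neural) nonlinear correction, the Lyapunov-based weight selection, and the low-discrepancy sampling are all omitted, and the plant, gain, and initial conditions are illustrative choices.

```python
import numpy as np

def f(x):
    # Nonlinear plant: an oscillator with cubic damping.
    return np.array([x[1], -x[0] - 0.5 * x[1] ** 3])

dt, T = 0.01, 2000
L = np.array([2.0, 1.0])        # linear output-injection gain (hand-tuned)

x = np.array([1.0, 0.0])        # true state
xh = np.array([0.0, 0.0])       # observer state, deliberately wrong guess
err0 = np.linalg.norm(x - xh)

for _ in range(T):
    y = x[0]                    # measurement: first state component
    innov = y - xh[0]           # innovation term
    x = x + dt * f(x)           # plant step (explicit Euler)
    # Observer: model copy plus linear gain times innovation.  The paper's
    # neural correction term is omitted in this sketch.
    xh = xh + dt * (f(xh) + L * innov)

err = np.linalg.norm(x - xh)
print(f"estimation error: initial {err0:.2f}, final {err:.4f}")
```

Even without the nonlinear correction, the hand-tuned gain drives the estimation error to near zero here; the paper's contribution is a constructive procedure, with stability guarantees, for choosing both the gain and the nonlinear term.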


Subject(s)
Algorithms , Computing Methodologies , Models, Theoretical , Neural Networks, Computer , Nonlinear Dynamics , Artificial Intelligence , Cluster Analysis , Computer Simulation