1.
Neural Comput. 2018 Sep;30(9):2568-2591.
Article in English | MEDLINE | ID: mdl-30021081

ABSTRACT

Rule extraction from black-box models is critical in domains that require model validation before deployment, such as credit scoring and medical diagnosis. Extraction is already a challenging problem in statistical learning generally, and the difficulty is even greater when highly nonlinear, recursive models, such as recurrent neural networks (RNNs), are fit to data. Here, we study the extraction of rules from second-order RNNs trained to recognize the Tomita grammars. We show that production rules can be stably extracted from trained RNNs and that, in certain cases, the extracted rules outperform the trained RNNs themselves.
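The abstract does not spell out the network's form, but the Python sketch below illustrates what a second-order RNN transition looks like and where rule extraction would hook in. The tensor shapes, the one-hot input encoding, the sigmoid activation, and the clustering note in the comments are illustrative assumptions, not details taken from the article.

```python
import numpy as np

# A minimal second-order RNN cell: the next hidden state is a bilinear
# function of the previous hidden state and the (one-hot) input symbol.
# Tensor W has shape (hidden, hidden, alphabet); sizes are illustrative.

rng = np.random.default_rng(0)
H, A = 8, 2                      # hidden units, alphabet size (e.g. {0, 1})
W = rng.normal(scale=0.5, size=(H, H, A))
b = np.zeros(H)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step(h_prev, symbol):
    """One transition: h_i = sigma(sum_{j,k} W[i,j,k] * h_prev[j] * x[k] + b_i)."""
    x = np.eye(A)[symbol]        # one-hot encoding of the input symbol
    return sigmoid(np.einsum('ijk,j,k->i', W, h_prev, x) + b)

# Rule extraction typically proceeds by running strings through the trained
# network, quantizing the visited hidden states into a finite set of clusters,
# and reading off cluster-to-cluster transitions as DFA-style production rules.
h = np.full(H, 0.5)
for s in [1, 0, 1]:              # a short string over the alphabet {0, 1}
    h = step(h, s)
```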

2.
Neural Comput. 2017 Dec;29(12):3327-3352.
Article in English | MEDLINE | ID: mdl-28957029

ABSTRACT

Learning useful information across long time lags is a critical and difficult problem for temporal neural models in tasks such as language modeling. Existing architectures that address the issue are often complex and costly to train. The differential state framework (DSF) is a simple and high-performing design that unifies previously introduced gated neural models. DSF models maintain longer-term memory by learning to interpolate between a fast-changing data-driven representation and a slowly changing, implicitly stable state. Within the DSF, a new architecture is presented, the delta-RNN. This model requires hardly any more parameters than a classical, simple recurrent network. In language modeling at the word and character levels, the delta-RNN outperforms popular complex architectures, such as the long short-term memory (LSTM) and the gated recurrent unit (GRU), and, when regularized, performs comparably to several state-of-the-art baselines. At the subword level, the delta-RNN's performance is comparable to that of complex gated architectures.
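As a rough illustration of the interpolation described above, the Python sketch below mixes a fast, data-driven candidate state with the previous state through a gate. The specific gating form, weight shapes, and names are illustrative assumptions rather than the article's exact delta-RNN parameterization.

```python
import numpy as np

# Sketch of the differential-state idea: the new state interpolates between a
# fast, data-driven candidate and the previous (slowly changing) state.

rng = np.random.default_rng(1)
D_in, D_h = 16, 32
W_x = rng.normal(scale=0.1, size=(D_h, D_in))   # input-to-hidden weights
W_h = rng.normal(scale=0.1, size=(D_h, D_h))    # hidden-to-hidden weights
b_r = np.zeros(D_h)                             # gate bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def delta_step(h_prev, x_t):
    d = np.tanh(W_x @ x_t + W_h @ h_prev)   # fast, data-driven candidate state
    r = sigmoid(d + h_prev + b_r)           # gate computed from both signals
    return (1.0 - r) * d + r * h_prev       # keep the old state where r is high

h = np.zeros(D_h)
for _ in range(5):
    h = delta_step(h, rng.normal(size=D_in))
```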

3.
Neural Comput. 2017 Apr;29(4):867-887.
Article in English | MEDLINE | ID: mdl-28095194

ABSTRACT

Previous proposals for adversarial training of deep neural nets include directly modifying the gradient, training on a mix of original and adversarial examples, using contractive penalties, and approximately optimizing constrained adversarial objective functions. In this article, we show that these proposals are all instances of optimizing a general, regularized objective we call DataGrad. The DataGrad framework, which can be viewed as a deep extension of the layerwise contractive autoencoder penalty, cleanly simplifies prior work and easily allows extensions such as adversarial training with multitask cues. In our experiments, we find that the deep gradient regularization of DataGrad (which also has L1 and L2 flavors) outperforms alternative forms of regularization, including classical L1, L2, and multitask, on both the original data set and adversarial sets. Furthermore, we find that combining multitask optimization with DataGrad adversarial training yields the most robust performance.
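To make the idea of deep gradient regularization concrete, here is a Python (PyTorch) sketch that adds a penalty on the gradient of the task loss with respect to the input. The model, the penalty weight lam, and the choice of the L2 flavor are illustrative assumptions, not the article's exact DataGrad setup.

```python
import torch
import torch.nn.functional as F

# Data-gradient regularization in the spirit of DataGrad: augment the training
# loss with a penalty on the gradient of the task loss w.r.t. the input.

model = torch.nn.Sequential(
    torch.nn.Linear(784, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
lam = 0.01                                   # penalty strength (hypothetical)

def datagrad_loss(x, y):
    x = x.clone().requires_grad_(True)       # track gradients w.r.t. the data
    task_loss = F.cross_entropy(model(x), y)
    # Gradient of the task loss w.r.t. the input; create_graph=True so the
    # penalty itself can be backpropagated through.
    (g,) = torch.autograd.grad(task_loss, x, create_graph=True)
    penalty = g.pow(2).sum(dim=1).mean()     # L2 flavor; use g.abs() for L1
    return task_loss + lam * penalty

x = torch.randn(32, 784)                     # dummy batch
y = torch.randint(0, 10, (32,))
loss = datagrad_loss(x, y)
opt.zero_grad()
loss.backward()
opt.step()
```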
