
Recurrent Neural Networks: Dynamics and Universal Approximation Theorem
Explore the dynamics of a recurrent neural network evolving with time-ordered inputs, described by a pair of state equations. Understand the state of a dynamic system, the universal approximation theorem, and the role of hidden neurons in defining the network's state.
Second-Order Network (4)
The network accepts a time-ordered sequence of inputs and evolves with dynamics defined by the pair of equations

v_{k,n} = b_k + \sum_i \sum_j w_{kij} \, x_{i,n} \, u_{j,n}
x_{k,n+1} = \varphi(v_{k,n})

where v_{k,n} is the induced local field of hidden neuron k, b_k is the associated bias, x_{k,n} is the state (output) of neuron k, u_{j,n} is the input applied to source node j, and w_{kij} is a weight of second-order neuron k.
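To make the update rule concrete, here is a minimal Python/NumPy sketch of one time-step of such a second-order neuron layer; the function name second_order_step, the choice of tanh as the sigmoidal nonlinearity, and the array shapes are assumptions made for illustration, not taken from the slides.

```python
import numpy as np

# A minimal sketch of one update step of a second-order recurrent network with
# K hidden neurons and J source nodes. Assumed shapes: w is (K, K, J), b is (K,).
def second_order_step(x, u, w, b):
    # v_k = b_k + sum_i sum_j w[k, i, j] * x_i * u_j   (induced local fields)
    v = b + np.einsum('kij,i,j->k', w, x, u)
    # Next state via a sigmoidal activation (tanh chosen here as an example).
    return np.tanh(v)

# Example usage with arbitrary sizes.
K, J = 3, 2
rng = np.random.default_rng(0)
x = 0.1 * rng.standard_normal(K)          # initial state of the hidden neurons
w = 0.1 * rng.standard_normal((K, K, J))  # second-order weights
b = 0.1 * rng.standard_normal(K)          # biases
for u in rng.standard_normal((5, J)):     # a short time-ordered input sequence
    x = second_order_step(x, u, w, b)
```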
15.3 UNIVERSAL APPROXIMATION THEOREM (1)
The state of a dynamic system is defined as: a set of quantities that summarizes all the information about the past behavior of the system that is needed to uniquely describe its future behavior, except for the purely external effects arising from the applied input (excitation). Let the q-by-1 vector x_n denote the state of a nonlinear discrete-time system, the m-by-1 vector u_n denote the input applied to the system, and the p-by-1 vector y_n denote the corresponding output of the system.
15.3 UNIVERSAL APPROXIMATION THEOREM (2)
Consider a recurrent network whose dynamic behavior, assumed to be noise free, is described by the pair of nonlinear equations

x_{n+1} = \varphi(W_a x_n + W_b u_n)    (15.10)
y_n = W_c x_n    (15.11)

Eq. (15.10) is the system (state) equation of the model, and Eq. (15.11) is the measurement equation.
15.3 UNIVERSAL APPROXIMATION THEOREM (3)
The spaces R^m, R^q, and R^p are called the input space, state space, and output space, respectively. The dimensionality of the state space, namely q, is the order of the system. Thus, the state-space model shown is an m-input, p-output recurrent model of order q.
15.3 UNIVERSAL APPROXIMATION THEOREM (4)
Note that only those neurons in the multilayer perceptron that feed back their outputs to the input layer via delays are responsible for defining the state of the recurrent network. This statement therefore excludes the neurons in the output layer from the definition of the state. For the interpretation of the matrices W_a, W_b, and W_c and the nonlinear function \varphi(\cdot), we may say the following:
15.3 UNIVERSAL APPROXIMATION THEOREM (5)
The matrix W_a represents the synaptic weights of the q neurons in the hidden layer that are connected to the feedback nodes in the input layer. The matrix W_b represents the synaptic weights of these hidden neurons that are connected to the source nodes in the input layer. To simplify the composition of Eq. (15.10), the use of bias has been excluded from the state model and from the output layer. The matrix W_c represents the synaptic weights of the p linear neurons in the output layer that are connected to the hidden neurons. The nonlinear function \varphi(\cdot) represents the sigmoidal activation function of a hidden neuron. This activation function typically takes the form of a hyperbolic tangent function,

\varphi(v) = \tanh(v) = (1 - e^{-2v}) / (1 + e^{-2v})
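As a concrete illustration of Eqs. (15.10) and (15.11), the following Python/NumPy sketch iterates the state-space model one step at a time; the function name state_space_step and the particular dimensions are assumptions chosen for illustration.

```python
import numpy as np

# A minimal sketch of the state-space recursion of Eqs. (15.10) and (15.11):
#   x_{n+1} = phi(Wa @ x_n + Wb @ u_n),   y_n = Wc @ x_n
# Shapes: Wa is q-by-q, Wb is q-by-m, Wc is p-by-q (bias omitted, as in the text).
def state_space_step(x, u, Wa, Wb, Wc):
    y = Wc @ x                          # measurement equation (15.11)
    x_next = np.tanh(Wa @ x + Wb @ u)   # system (state) equation (15.10)
    return x_next, y

# Example usage: a q = 3, m = 2, p = 1 model driven by a short input sequence.
q, m, p = 3, 2, 1
rng = np.random.default_rng(1)
Wa = 0.5 * rng.standard_normal((q, q))
Wb = 0.5 * rng.standard_normal((q, m))
Wc = rng.standard_normal((p, q))
x = np.zeros(q)
for u in rng.standard_normal((4, m)):
    x, y = state_space_step(x, u, Wa, Wb, Wc)
```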
15.3 UNIVERSAL APPROXIMATION THEOREM (6)
An important property of a recurrent neural network described by the state-space model of Eqs. (15.10) and (15.11) is that it is a universal approximator of all nonlinear dynamic systems. Specifically, we may make the following statement (Lo, 1993): Any nonlinear dynamic system may be approximated by a recurrent neural network to any desired degree of accuracy and with no restrictions imposed on the compactness of the state space, provided that the network is equipped with an adequate number of hidden neurons.
EXAMPLE 1: Fully Connected Recurrent Network
FIGURE 15.6 Fully connected recurrent network with two inputs, three hidden neurons, and one output neuron. The feedback connections are shown in blue.
EXAMPLE 1: Fully Connected Recurrent Network
To illustrate the composition of the matrices W_a, W_b, and W_c, consider the fully connected recurrent network shown in Fig. 15.6, where the feedback paths originate from the hidden neurons. In this example, we have m = 2, q = 3, and p = 1. The matrices W_a and W_b are defined as

W_a = [ feedback weights to the 1st neuron ]
      [ feedback weights to the 2nd neuron ]
      [ feedback weights to the 3rd neuron ]   (a 3-by-3 matrix)

W_b = [ b_1 | weights between the inputs and the 1st neuron ]
      [ b_2 | weights between the inputs and the 2nd neuron ]
      [ b_3 | weights between the inputs and the 3rd neuron ]   (a 3-by-3 matrix)

where the first column of W_b, consisting of b_1, b_2, and b_3, represents the bias terms applied to neurons 1, 2, and 3, respectively. The matrix W_c is a row vector defined by W_c = [1, 0, 0].
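A small Python/NumPy sketch of the matrix composition in this example follows; the random numeric values are placeholders, and prepending a fixed +1 to the input vector is one common way to realize the bias column of W_b.

```python
import numpy as np

# Illustrative shapes for Example 1: m = 2 inputs, q = 3 hidden neurons, p = 1 output.
q, m = 3, 2
rng = np.random.default_rng(2)

Wa = 0.5 * rng.standard_normal((q, q))   # feedback weights, row k -> neuron k
bias = rng.standard_normal((q, 1))       # b1, b2, b3
Win = 0.5 * rng.standard_normal((q, m))  # weights from the two source nodes
Wb = np.hstack([bias, Win])              # first column of Wb carries the biases
Wc = np.array([[1.0, 0.0, 0.0]])         # output taps the first hidden neuron

x = np.zeros(q)
for u in rng.standard_normal((5, m)):
    u_aug = np.concatenate(([1.0], u))   # fixed +1 input feeds the bias column
    y = Wc @ x                           # Eq. (15.11)
    x = np.tanh(Wa @ x + Wb @ u_aug)     # Eq. (15.10), with bias folded into Wb
```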
15.4 CONTROLLABILITY AND OBSERVABILITY
Controllability is concerned with whether it is possible to control the dynamic behavior of the network. A recurrent neural network is said to be controllable if an initial state is steerable to any desired state within a finite number of time-steps. Observability is concerned with whether it is possible to observe the results of the control applied. A recurrent network is said to be observable if the state of the network can be determined from a finite set of input/output measurements.
15.6 LEARNING ALGORITHMS (1)
There are two modes of training an ordinary (static) multilayer perceptron: batch mode and stochastic (sequential) mode. Likewise, we have two modes of training a recurrent network, epochwise training and continuous training, described as follows:
Epochwise training. For a given epoch, the recurrent network uses a temporal sequence of input-target response pairs and starts running from some initial state until it reaches a new state, at which point the training is stopped and the network is reset to an initial state for the next epoch.
Continuous training. This mode is suitable for situations where no reset states are available or where on-line learning is required. The distinguishing feature of continuous training is that the network learns while performing signal processing. Simply put, the learning process never stops.
15.6 LEARNING ALGORITHMS (2)
We will describe two different learning algorithms for recurrent networks, summarized as follows:
The back-propagation-through-time (BPTT) algorithm operates on the premise that the temporal operation of a recurrent network may be unfolded into a multilayer perceptron. This then paves the way for application of the standard back-propagation algorithm. BPTT can be implemented in epochwise mode, continuous (real-time) mode, or a combination thereof.
The real-time recurrent learning (RTRL) algorithm is derived from the state-space model described by Eqs. (15.10) and (15.11).
15.6 LEARNING ALGORITHMS (3)
BPTT requires less computation than RTRL does, but the memory space required by BPTT grows rapidly as the length of the sequence of consecutive input-target response pairs increases. Generally speaking, BPTT is better suited for off-line training, whereas RTRL is more suitable for on-line continuous training.
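Since the RTRL recursion itself is not spelled out in these slides, the following Python/NumPy sketch shows one common way to implement it for the state-space model of Eqs. (15.10) and (15.11): the sensitivities dx/dW_a and dx/dW_b are propagated forward in time, and the weights are updated at every time-step (continuous training). Function and variable names, the squared-error cost, and the learning rate are illustrative assumptions.

```python
import numpy as np

# A minimal sketch of real-time recurrent learning (RTRL) for the model
#   x_{n+1} = tanh(Wa x_n + Wb u_n),   y_n = Wc x_n.
# The weight matrices are updated in place and also returned.
def rtrl_train(inputs, targets, Wa, Wb, Wc, eta=0.01):
    q = Wa.shape[0]
    I = np.eye(q)
    x = np.zeros(q)
    S_a = np.zeros((q, q, q))            # S_a[k, i, j] = d x[k] / d Wa[i, j]
    S_b = np.zeros((q, q, Wb.shape[1]))  # S_b[k, i, j] = d x[k] / d Wb[i, j]
    for u, d in zip(inputs, targets):
        y = Wc @ x                       # measurement equation (15.11)
        e = d - y                        # instantaneous error at time n
        g = Wc.T @ e                     # error backprojected to the hidden state
        grad_Wa = -np.einsum('k,kij->ij', g, S_a)   # d(0.5*||e||^2)/dWa
        grad_Wb = -np.einsum('k,kij->ij', g, S_b)   # d(0.5*||e||^2)/dWb
        grad_Wc = -np.outer(e, x)                   # delta rule for the output layer
        # Sensitivity recursions: how the next state depends on Wa and Wb.
        v = Wa @ x + Wb @ u              # local fields, Eq. (15.10)
        dphi = 1.0 - np.tanh(v) ** 2     # derivative of tanh at v
        S_a = dphi[:, None, None] * (np.einsum('kl,lij->kij', Wa, S_a)
                                     + np.einsum('ki,j->kij', I, x))
        S_b = dphi[:, None, None] * (np.einsum('kl,lij->kij', Wa, S_b)
                                     + np.einsum('ki,j->kij', I, u))
        x = np.tanh(v)                   # state equation (15.10)
        # Gradient-descent updates after every time-step (continuous training).
        Wa -= eta * grad_Wa
        Wb -= eta * grad_Wb
        Wc -= eta * grad_Wc
    return Wa, Wb, Wc
```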
15.7 BACK PROPAGATION THROUGH TIME
The BPTT algorithm for training a recurrent network is an extension of the standard back-propagation algorithm. It may be derived by unfolding the temporal operation of the network into a layered feedforward network whose topology grows by one layer at every time-step.
Recurrent Network Unfolded in Time
W_IH = weights between input and hidden layers = W_b
W_HH = weights between hidden and hidden layers = W_a
W_OH = weights between hidden and output layers = W_c
15.7 BACK PROPAGATION THROUGH TIME (3)
To be specific, let N denote a recurrent network required to learn a temporal task, starting from time n_0 all the way up to time n. Let N* denote the feedforward network that results from unfolding the temporal operation of the recurrent network N. The unfolded network N* is related to the original network N as follows:
1. For each time-step in the interval (n_0, n], the network N* has a layer containing K neurons, where K is the number of neurons contained in the network N.
2. In every layer of the network N*, there is a copy of each neuron in the network N.
3. For each time-step l in [n_0, n], the synaptic connection from neuron i in layer l to neuron j in layer l + 1 of the network N* is a copy of the synaptic connection from neuron i to neuron j in the network N.
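The following Python/NumPy sketch illustrates epochwise BPTT in terms of this unfolding: the forward pass builds one "layer" of states per time-step with shared weights, and the backward pass applies standard back-propagation through the unfolded graph. The function name bptt_epoch_grads, the squared-error cost, and the tanh nonlinearity are assumptions made for illustration.

```python
import numpy as np

# A minimal sketch of epochwise BPTT for the state-space model of
# Eqs. (15.10) and (15.11), over one epoch of T time-steps.
def bptt_epoch_grads(inputs, targets, Wa, Wb, Wc):
    q = Wa.shape[0]
    T = len(inputs)
    # Forward pass: unfold in time, storing every state (one "layer" per step).
    xs = [np.zeros(q)]                       # x_0 is the initial (reset) state
    for u in inputs:
        xs.append(np.tanh(Wa @ xs[-1] + Wb @ u))
    errs = [d - Wc @ x for d, x in zip(targets, xs[1:])]   # e_n = d_n - y_n
    # Backward pass through the unfolded layers.
    gWa = np.zeros_like(Wa); gWb = np.zeros_like(Wb); gWc = np.zeros_like(Wc)
    lam = np.zeros(q)                        # dE/dx from steps beyond the epoch
    for n in reversed(range(T)):
        x_next, x_cur, u, e = xs[n + 1], xs[n], inputs[n], errs[n]
        gWc += -np.outer(e, x_next)          # output-layer gradient at this step
        lam = lam - Wc.T @ e                 # total error reaching state x_{n+1}
        delta = (1.0 - x_next ** 2) * lam    # back through the tanh nonlinearity
        gWa += np.outer(delta, x_cur)        # shared-weight gradients accumulate
        gWb += np.outer(delta, u)            # over all unfolded layers
        lam = Wa.T @ delta                   # propagate one layer (time-step) back
    return gWa, gWb, gWc

# Usage: after each epoch, apply gradient descent (e.g., Wa -= eta * gWa),
# reset the state, and repeat with the next epoch.
```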
EXAMPLE 3: Unfolding of a Two-Neuron Recurrent Network
Consider the two-neuron recurrent network N shown in Fig. (a), in which each feedback connection passes through a unit-time delay z^{-1}. By unfolding the temporal operation of this network in a step-by-step manner, we get the signal-flow graph shown in Fig. (b), which represents the layered feedforward network N*, where the starting time n_0 = 0.
Application of the unfolding procedure leads to two basically different implementations of back propagation through time, depending on whether epochwise training or continuous (real-time) training is used.