A recurrent neural network (RNN) is a class of neural network where connections between units form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behavior.
Recurrent neural networks must be approached differently from feedforward neural networks, both when analyzing their behavior and when training them. They can also behave chaotically, and dynamical systems theory is usually used to model and analyze them. While a feedforward network propagates data in one direction only, from input to output, a recurrent network also propagates data from later processing stages back to earlier stages.
Some of the most common recurrent neural network architectures are described here. The Elman and Jordan networks are also known as "simple recurrent networks" (SRN).
This variation on the multilayer perceptron was invented by Jeff Elman. A three-layer network is used, with the addition of a set of "context units" in the input layer. There are connections from the middle (hidden) layer to these context units, fixed with a weight of one^{[1]}. At each time step, the input is propagated in a standard feedforward fashion, and then a learning rule is applied. The fixed back connections result in the context units always maintaining a copy of the previous values of the hidden units (since they propagate over the connections before the learning rule is applied). Thus the network can maintain a sort of state, allowing it to perform tasks such as sequence prediction that are beyond the power of a standard multilayer perceptron.
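The forward pass described above can be sketched as follows. This is a minimal illustration rather than Elman's original formulation: the `tanh` activation, the function name `elman_forward`, the weight names, and the demonstration values are all assumptions, and the learning rule is omitted.

```python
import numpy as np

def elman_forward(inputs, W_in, W_ctx, W_out):
    """Run a sequence through a simple Elman-style network (no learning)."""
    context = np.zeros(W_ctx.shape[0])  # context units start at zero (assumption)
    outputs = []
    for x in inputs:
        # Hidden layer sees the current input plus the context units.
        hidden = np.tanh(W_in @ x + W_ctx @ context)
        outputs.append(W_out @ hidden)
        # Fixed weight-one back connections: context becomes a copy of hidden.
        context = hidden.copy()
    return np.array(outputs)

# Hypothetical weights chosen only to make the effect of the state visible.
W_in = np.eye(2)
W_ctx = np.eye(2)
W_out = np.ones((1, 2))
repeated = np.array([[1.0, 0.0], [1.0, 0.0]])  # the same input twice
outputs = elman_forward(repeated, W_in, W_ctx, W_out)
print(outputs.ravel())
```

Although the two inputs are identical, the two outputs differ, because the second step also receives the hidden values stored in the context units during the first step.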
The Jordan network is similar to the Elman network, except that the context units are fed from the output layer instead of the hidden layer.
See ^{[2]}
The Hopfield network is a recurrent neural network in which all connections are symmetric. It was invented by John Hopfield in 1982, and its dynamics are guaranteed to converge. If the connections are trained using Hebbian learning, the Hopfield network can act as a robust content-addressable memory that is resistant to connection alteration.
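As an illustration of content-addressable recall, the sketch below stores two patterns with a Hebbian rule and recovers one of them from a corrupted copy. The bipolar (+1/-1) encoding, the asynchronous update order, the zero diagonal, and the names `train_hebbian` and `recall` are assumptions made for this example.

```python
import numpy as np

def train_hebbian(patterns):
    """Hebbian weights for bipolar patterns: symmetric, with a zero diagonal."""
    patterns = np.asarray(patterns, dtype=float)
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0.0)  # no self-connections (assumption)
    return W

def recall(W, state, max_sweeps=20):
    """Update units one at a time until a full sweep changes nothing."""
    state = np.array(state, dtype=float)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            new_value = 1.0 if W[i] @ state >= 0 else -1.0
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state

# Two hypothetical, mutually orthogonal patterns to store.
stored = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                   [1, 1, 1, 1, -1, -1, -1, -1]])
W = train_hebbian(stored)

noisy = stored[0].copy()
noisy[0] = -noisy[0]  # corrupt one unit of the first pattern
recovered = recall(W, noisy)
print(recovered)
```

The corrupted unit is flipped back during the first sweep, after which the state matches the first stored pattern and no further unit changes.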
The echo state network (ESN) is a recurrent neural network with a sparsely connected random hidden layer. The weights of the output neurons are the only part of the network that can change and be trained. ESNs are well suited to reproducing temporal patterns.
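A minimal sketch of this idea is given below: the random hidden layer (the "reservoir") is left untouched, and only the output weights are fitted. The reservoir size, the 10% connection density, the rescaling of the spectral radius to 0.9, the ridge regression used for the readout, and the sine-wave task are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res = 50  # number of hidden (reservoir) units, an arbitrary choice

# Sparse random hidden layer, rescaled so its spectral radius is 0.9
# (a common heuristic, assumed here).
W_res = rng.normal(size=(n_res, n_res)) * (rng.random((n_res, n_res)) < 0.1)
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))
W_in = rng.uniform(-0.5, 0.5, size=n_res)

def run_reservoir(u):
    """Drive the fixed reservoir with a scalar input sequence."""
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W_in * u_t + W_res @ x)
        states.append(x.copy())
    return np.array(states)

# Task: predict the next value of a sine wave.
t = np.arange(400)
signal = np.sin(0.2 * t)
states = run_reservoir(signal[:-1])
targets = signal[1:]

# Discard the initial transient, then fit ONLY the output weights.
washout = 50
X, y = states[washout:], targets[washout:]
ridge = 1e-4
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)

mse = np.mean((X @ W_out - y) ** 2)
print(mse)
```

Because `W_in` and `W_res` are never updated, training reduces to a single linear least-squares problem for `W_out`.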
Long short-term memory (LSTM) is an artificial neural network structure that, unlike traditional RNNs, does not suffer from the vanishing gradient problem. It can therefore work with long delays and handle signals that mix low- and high-frequency components.
In this setup, the recurrent neural network with parametric bias (RNNPB) is trained to reproduce a sequence with a constant bias input. The network is capable of learning different sequences with different parametric biases. With a trained network it is also possible to find the parameter associated with an observed sequence: the sequence is backpropagated through the network to recover the bias that would produce it.
Continuous-time recurrent neural network (CTRNN)
Here, RNNs are sparsely connected together through bottlenecks, with the aim of isolating different hierarchical functions in different parts of the composite network. ^{[3]} ^{[4]} ^{[5]}
Recurrent multilayer perceptron (RMLP)^{[6]}
Training in recurrent neural networks is generally very slow.
In this approach, known as backpropagation through time (BPTT), the simple recurrent network is unfolded in time for a number of iterations and then trained with backpropagation as if it were a feedforward network.
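The unfolding can be sketched for the smallest possible case, a single recurrent unit with two scalar weights. The model `h_t = tanh(w_x * x_t + w_h * h_{t-1})`, the squared-error loss, the function name `bptt_gradients`, and the sample data are assumptions made for this illustration.

```python
import numpy as np

def bptt_gradients(w_x, w_h, xs, ys):
    """Loss and weight gradients for h_t = tanh(w_x*x_t + w_h*h_{t-1}), h_0 = 0."""
    # Forward pass: unfold the network over the whole sequence,
    # keeping every intermediate state.
    hs = [0.0]
    for x in xs:
        hs.append(np.tanh(w_x * x + w_h * hs[-1]))
    loss = 0.5 * sum((h - y) ** 2 for h, y in zip(hs[1:], ys))

    # Backward pass: ordinary backpropagation through the unfolded copies,
    # from the last time step to the first.
    g_wx = g_wh = 0.0
    dh_next = 0.0  # gradient arriving from later time steps
    for t in reversed(range(len(xs))):
        dh = (hs[t + 1] - ys[t]) + dh_next
        da = dh * (1.0 - hs[t + 1] ** 2)  # derivative of tanh
        g_wx += da * xs[t]   # the shared weight accumulates over time steps
        g_wh += da * hs[t]
        dh_next = da * w_h
    return loss, g_wx, g_wh

# Hypothetical input and target sequences.
xs = [0.5, -0.1, 0.3, 0.8]
ys = [0.2, 0.1, 0.0, 0.4]
loss, g_wx, g_wh = bptt_gradients(0.7, -0.4, xs, ys)
print(loss, g_wx, g_wh)
```

Each unfolded copy shares the same two weights, which is why their gradients are summed across time steps rather than kept separately.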
Unlike BPTT, this algorithm is local in time but not local in space.^{[7]}
Since RNN learning is very slow, genetic algorithms are a feasible alternative for weight optimization, especially in unstructured networks^{[8]}.
Genetic algorithms suit neural network training because the goal of training is to find an optimal set of weights, so training can be treated as an optimization problem. The network weights are encoded in a predefined manner, with one gene in the chromosome representing one weight link, so that the whole network is represented by a single chromosome. The population consists of many chromosomes, so many different neural networks are evolved until a stopping criterion is satisfied. Common stopping criteria are: 1) the neural network has learnt a certain percentage of the training data; 2) the mean squared error has fallen to a specified minimum value; or 3) the maximum number of training generations has been reached. The fitness function takes the reciprocal of the mean squared error of each neural network during training, so maximizing fitness reduces the mean squared error. Fitness is evaluated as follows: each weight encoded in the chromosome is assigned to the respective weight link of the network; the training set is then presented to the network, which propagates the input signals forward; and the resulting mean squared error is returned to the fitness function, which influences the genetic selection process. An overview of the entire process is shown in the algorithm below, which assumes that two parents produce a single child chromosome.
BEGIN Algorithm: Genetic algorithm for neural network training
  Initialise Population(P) of individuals (weights) representing N neural networks
  while !termination
    while i < P.size()
      1) Evaluate fitness of each neural network
      2) Selection of Parents
      3) Crossover
      4) Mutation
      5) Increment i
    end while
    Update population
  end while
  Get the best individual and copy into Neural network
  Load data for testing Neural network
END Algorithm
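One possible concrete reading of this procedure is sketched below for a one-unit recurrent network with three weights. The fitness-proportional selection, uniform crossover, Gaussian mutation, elitism (carrying the best chromosome over unchanged), fixed generation count, and all function names and parameter values are assumptions; the source does not specify these operators.

```python
import numpy as np

rng = np.random.default_rng(0)

def network_mse(weights, xs, ys):
    """Mean squared error of a one-unit recurrent network with 3 weights."""
    w_in, w_rec, w_out = weights  # one gene per weight link
    h, errors = 0.0, []
    for x, y in zip(xs, ys):
        h = np.tanh(w_in * x + w_rec * h)
        errors.append((w_out * h - y) ** 2)
    return float(np.mean(errors))

def fitness(weights, xs, ys):
    # Reciprocal of the mean squared error; the small constant avoids 1/0.
    return 1.0 / (network_mse(weights, xs, ys) + 1e-12)

def evolve(xs, ys, pop_size=30, generations=40, mutation_scale=0.1):
    population = rng.normal(size=(pop_size, 3))
    history = []  # error of the best network in each generation
    for _ in range(generations):  # stop at a maximum number of generations
        scores = np.array([fitness(c, xs, ys) for c in population])
        best = population[np.argmax(scores)].copy()
        history.append(network_mse(best, xs, ys))
        probs = scores / scores.sum()
        children = [best]  # elitism (assumption)
        while len(children) < pop_size:
            # Selection: two distinct parents, chosen in proportion to fitness.
            i, j = rng.choice(pop_size, size=2, replace=False, p=probs)
            # Crossover: each gene comes from one of the two parents.
            mask = rng.random(3) < 0.5
            child = np.where(mask, population[i], population[j])
            # Mutation: small Gaussian perturbation of every gene.
            children.append(child + rng.normal(scale=mutation_scale, size=3))
        population = np.array(children)
    scores = np.array([fitness(c, xs, ys) for c in population])
    best = population[np.argmax(scores)].copy()
    history.append(network_mse(best, xs, ys))
    return best, history

# Hypothetical task: predict the next value of a sine wave.
signal = np.sin(0.3 * np.arange(60))
best, history = evolve(signal[:-1], signal[1:])
print(history[0], history[-1])
```

Because the best chromosome is always copied into the next generation, the recorded error never increases from one generation to the next; without elitism that guarantee would not hold.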
Simulated annealing is a global optimization technique that is often used to seek a good set of weights.
Particle swarm optimization is a global optimization technique that is often used to seek a good set of weights.
