RNN LSTM GRU
1. Recurrent Neural Network (RNN) Features:
- Remembers its inputs
- Has an internal memory
- Derived from feedforward networks
- Good for sequential data (time series, speech, text, financial data, audio, video, weather)
- Information cycles through a loop, adding the immediate past to the present
- An RNN therefore has two inputs at each time step: the present input and the recent past (the previous hidden state); see the sketch after this list
- The weights for both are tuned through gradient descent and backpropagation through time (BPTT)
- Feed-forward neural networks map one input to one output; RNNs can also map one to many, many to many (e.g. translation), and many to one (e.g. classifying a voice)
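To make the "two inputs" idea concrete, here is a minimal NumPy sketch of one vanilla RNN step. The sizes, weight names (`W_xh`, `W_hh`) and random toy data are illustrative assumptions, not anything fixed by the notes above:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # One step: blend the present input x_t with the recent past h_prev.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Toy sizes (arbitrary): 4 input features, 8 hidden units.
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(4, 8))
W_hh = rng.normal(scale=0.1, size=(8, 8))
b_h = np.zeros(8)

h = np.zeros(8)                            # the internal memory, initially empty
for x_t in rng.normal(size=(5, 4)):        # a sequence of 5 time steps
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # information cycles through the loop
```

The same weights are reused at every step, which is why BPTT unrolls the loop over time before applying gradient descent.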
When to USE an RNN:
- When the temporal dynamics that connect the data matter more than the spatial content
Problems with RNNs:
- EXPLODING GRADIENTS: addressed by clipping, i.e. truncating or squashing the gradients (see the sketch below)
- VANISHING GRADIENTS: mitigated by gated architectures such as LSTM and GRU
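As a sketch of the exploding-gradient fix, here is global-norm gradient clipping in NumPy; the function name and the `max_norm=1.0` default are illustrative choices, and frameworks provide this built in (e.g. `torch.nn.utils.clip_grad_norm_` in PyTorch):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    # Global L2 norm across all gradient arrays produced by BPTT.
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # If the norm has exploded past the threshold, squash every gradient
    # by the same factor, preserving the update direction.
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads
```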
2. LSTM (Long Short-Term Memory) Features:
- A plain RNN has only a short-term memory; LSTM units give it a long-term memory as well
- Enable RNNs to remember inputs over a long period of time.
- Can read, write and delete information in its memory
- The vanishing gradient problem is solved by the LSTM because its cell state keeps the gradients steep enough, which keeps training relatively short and accuracy high
- An LSTM has three gates: input, forget and output. These gates determine whether to let new input in (input gate), delete information because it isn’t important (forget gate), or let it impact the output at the current time step (output gate); one full step is sketched below.
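A minimal sketch of one LSTM step, assuming per-gate weights kept in dicts `W` (input-to-gate), `U` (hidden-to-gate) and biases `b`; those names are illustrative, not a standard API. It shows where each of the three gates acts:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    i = sigmoid(x_t @ W["i"] + h_prev @ U["i"] + b["i"])  # input gate: let new input in?
    f = sigmoid(x_t @ W["f"] + h_prev @ U["f"] + b["f"])  # forget gate: delete unimportant info?
    o = sigmoid(x_t @ W["o"] + h_prev @ U["o"] + b["o"])  # output gate: impact the current output?
    g = np.tanh(x_t @ W["g"] + h_prev @ U["g"] + b["g"])  # candidate memory content
    c_t = f * c_prev + i * g  # cell state: the long-term memory (read/write/delete happen here)
    h_t = o * np.tanh(c_t)    # hidden state: what this time step exposes
    return h_t, c_t
```

The additive update of `c_t` is what keeps gradients from vanishing: the forget gate can pass the old cell state through almost unchanged across many steps.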
3. GRU (Gated Recurrent Unit):
- Aims to solve the vanishing gradient problem that comes with a standard recurrent neural network
- Uses an update gate and a reset gate
- Keeps information from long ago
- The update gate helps the model determine how much of the past information (from previous time steps) needs to be passed along to the future.
- The reset gate is used by the model to decide how much of the past information to forget.
- GRUs use fewer training parameters and therefore need less memory and execute faster than LSTMs, whereas LSTMs are more accurate on larger datasets. Choose an LSTM if you are dealing with long sequences and accuracy is the concern; choose a GRU for lower memory consumption and faster results.
- The GRU controls the flow of information like the LSTM unit, but without a separate memory cell; it simply exposes the full hidden state without any output control (one step is sketched below).
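For comparison with the LSTM sketch above, here is one GRU step under the same illustrative naming assumptions (`W`, `U`, `b` dicts); note there is no separate cell state, only `h`:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    z = sigmoid(x_t @ W["z"] + h_prev @ U["z"] + b["z"])  # update gate: how much past to pass along
    r = sigmoid(x_t @ W["r"] + h_prev @ U["r"] + b["r"])  # reset gate: how much past to forget
    # Candidate state: the reset gate scales the past before mixing it in.
    h_tilde = np.tanh(x_t @ W["h"] + (r * h_prev) @ U["h"] + b["h"])
    # Blend old state and candidate (conventions differ over which term z
    # multiplies; this is the form where z keeps the old state).
    return z * h_prev + (1.0 - z) * h_tilde
```

The parameter claim above follows from this structure: a GRU has three weight blocks (z, r, candidate) where an LSTM has four (i, f, o, candidate), so at the same hidden size it carries roughly three quarters of the parameters.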