RNN LSTM GRU
1. Recurrent Neural Network (RNN) Features:
- Remembers its inputs
- Has an internal memory
- Derived from feedforward networks
- Good for sequential data (time series, speech, text, financial data, audio, video, weather)
- Information cycles through a loop, adding the immediate past to the present
- An RNN therefore has two inputs at each time step: the present input and the recent past (the previous hidden state); see the sketch after this list
- The weights for both are tuned through gradient descent and backpropagation through time (BPTT)
- Feed-forward neural networks map one input to one output; RNNs can also map one to many, many to many (e.g. translation), and many to one (e.g. classifying a voice)
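To make the "two inputs" idea concrete, here is a minimal NumPy sketch of one vanilla RNN step. The sizes, weight names (`W_xh`, `W_hh`) and random toy data are illustrative assumptions, not anything fixed by the notes above:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # One step: blend the present input x_t with the recent past h_prev.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Toy sizes (arbitrary): 4 input features, 8 hidden units.
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(4, 8))
W_hh = rng.normal(scale=0.1, size=(8, 8))
b_h = np.zeros(8)

h = np.zeros(8)                            # the internal memory, initially empty
for x_t in rng.normal(size=(5, 4)):        # a sequence of 5 time steps
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # information cycles through the loop
```

The same weights are reused at every step, which is why BPTT unrolls the loop over time before applying gradient descent.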
When to USE an RNN:
- When the temporal dynamics that connect the data matter more than the spatial content
Problems with RNNs:
- EXPLODING GRADIENTS: addressed by clipping, i.e. truncating or squashing the gradients (see the sketch below)
- VANISHING GRADIENTS: mitigated by gated architectures such as LSTM and GRU
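As a sketch of the exploding-gradient fix, here is global-norm gradient clipping in NumPy; the function name and the `max_norm=1.0` default are illustrative choices, and frameworks provide this built in (e.g. `torch.nn.utils.clip_grad_norm_` in PyTorch):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    # Global L2 norm across all gradient arrays produced by BPTT.
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # If the norm has exploded past the threshold, squash every gradient
    # by the same factor, preserving the update direction.
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads
```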
2. LSTM (Long Short-Term Memory) Features:
- A plain RNN has only a short-term memory; LSTM units give it a long-term memory as well
- Enable RNNs to remember inputs over a long period of time.
- Can read, write and delete information in its memory
- The vanishing gradient problem is solved by the LSTM because its cell state keeps the gradients steep enough, which keeps training relatively short and accuracy high
- An LSTM has three gates: input, forget and output. These gates determine whether to let new input in (input gate), delete information because it isn’t important (forget gate), or let it impact the output at the current time step (output gate); one full step is sketched below.
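A minimal sketch of one LSTM step, assuming per-gate weights kept in dicts `W` (input-to-gate), `U` (hidden-to-gate) and biases `b`; those names are illustrative, not a standard API. It shows where each of the three gates acts:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    i = sigmoid(x_t @ W["i"] + h_prev @ U["i"] + b["i"])  # input gate: let new input in?
    f = sigmoid(x_t @ W["f"] + h_prev @ U["f"] + b["f"])  # forget gate: delete unimportant info?
    o = sigmoid(x_t @ W["o"] + h_prev @ U["o"] + b["o"])  # output gate: impact the current output?
    g = np.tanh(x_t @ W["g"] + h_prev @ U["g"] + b["g"])  # candidate memory content
    c_t = f * c_prev + i * g  # cell state: the long-term memory (read/write/delete happen here)
    h_t = o * np.tanh(c_t)    # hidden state: what this time step exposes
    return h_t, c_t
```

The additive update of `c_t` is what keeps gradients from vanishing: the forget gate can pass the old cell state through almost unchanged across many steps.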
3. GRU (Gated Recurrent Unit):
- Aims to solve the vanishing gradient problem that comes with a standard recurrent neural network
- Uses an update gate and a reset gate
- Keeps information from long ago
- The update gate helps the model determine how much of the past information (from previous time steps) needs to be passed along to the future.
- The reset gate is used by the model to decide how much of the past information to forget.
- GRUs use fewer training parameters and therefore need less memory and execute faster than LSTMs, whereas LSTMs are more accurate on larger datasets. Choose an LSTM if you are dealing with long sequences and accuracy is the concern; choose a GRU for lower memory consumption and faster results.
- The GRU controls the flow of information like the LSTM unit, but without a separate memory cell; it simply exposes the full hidden state without any output control (one step is sketched below).
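For comparison with the LSTM sketch above, here is one GRU step under the same illustrative naming assumptions (`W`, `U`, `b` dicts); note there is no separate cell state, only `h`:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    z = sigmoid(x_t @ W["z"] + h_prev @ U["z"] + b["z"])  # update gate: how much past to pass along
    r = sigmoid(x_t @ W["r"] + h_prev @ U["r"] + b["r"])  # reset gate: how much past to forget
    # Candidate state: the reset gate scales the past before mixing it in.
    h_tilde = np.tanh(x_t @ W["h"] + (r * h_prev) @ U["h"] + b["h"])
    # Blend old state and candidate (conventions differ over which term z
    # multiplies; this is the form where z keeps the old state).
    return z * h_prev + (1.0 - z) * h_tilde
```

The parameter claim above follows from this structure: a GRU has three weight blocks (z, r, candidate) where an LSTM has four (i, f, o, candidate), so at the same hidden size it carries roughly three quarters of the parameters.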