
Understanding LSTM and GRU Architectures
Explore the structures of LSTM and GRU neural networks, with insights on LSTM core ideas, gating mechanisms, and the role of RNN dropout for regularization. Learn about state maintenance, information flow modulation, and output gate modulation in LSTM, as well as the combination of forget and input gates in GRU. Dive into these deep learning concepts through detailed explanations and visual aids provided in the linked resources.
Presentation Transcript
LSTM: Long Short-Term Memory The basic structure of LSTM and some symbols to aid understanding. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
LSTM core ideas Two key ideas of LSTM: A backbone to carry state forward and gradients backward. Gating (pointwise multiplication) to modulate information flow. Sigmoid makes 0 < gate < 1. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
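As a general sketch (the symbols here are mine, not from the slide): each gate is a sigmoid over the current inputs, applied by pointwise multiplication to the vector it modulates, so every component is scaled by a factor strictly between 0 and 1.

```latex
g = \sigma\!\left(W \cdot [h_{t-1}, x_t] + b\right), \qquad 0 < g_j < 1, \qquad \text{gated value} = g \odot v
```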
LSTM gating: forget The f gate is the forget gate. Use the previous output, h, and the current input, x, to determine how much of the previous state, C, to suppress. E.g., C might encode the fact that we have a subject and need a verb; forget that once the verb is found. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
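In the notation of the linked post, the forget gate is a sigmoid over the previous output and the current input, and is later multiplied pointwise into the previous cell state:

```latex
f_t = \sigma\!\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)
```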
LSTM gating: input gate Input gate i determines which values of C to update. A separate tanh layer produces candidate values to add to C. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
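Following the same notation, the input gate and the candidate values produced by the tanh layer are:

```latex
i_t = \sigma\!\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad
\tilde{C}_t = \tanh\!\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)
```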
LSTM gating: update to C The forget gate does pointwise modulation of C. The input gate modulates the candidate values from the tanh layer, and this is added to C. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
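Putting the two gates together, the cell-state update in the linked post's notation is:

```latex
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
```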
LSTM gating: output o is the output gate: it modulates what part of the state C gets passed (via tanh) to the current output h. E.g., the state could encode whether a noun is singular or plural, to prepare for the verb form. But the real features are learned, not engineered. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
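In the same notation, the output gate and the resulting output are:

```latex
o_t = \sigma\!\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad
h_t = o_t \odot \tanh(C_t)
```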
GRU: Gated Recurrent Unit Combine C and h into a single state/output. Combine forget and input gates into update gate, z. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
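For reference, the GRU equations as given in the linked post, where z is the update gate and r is a reset gate applied to the previous state:

```latex
z_t = \sigma\!\left(W_z \cdot [h_{t-1}, x_t]\right), \qquad
r_t = \sigma\!\left(W_r \cdot [h_{t-1}, x_t]\right), \\
\tilde{h}_t = \tanh\!\left(W \cdot [r_t \odot h_{t-1}, x_t]\right), \qquad
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
```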
RNN dropout Dropout is used to regularize weights and prevent co-adaptation. Dropout for RNNs must respect the time invariance of weights and outputs. In the Keras GRU, dropout applies vertically (to the layer inputs), while recurrent_dropout applies horizontally (to the recurrent connections between time steps). https://arxiv.org/pdf/1512.05287.pdf
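A minimal sketch of the two dropout arguments on a Keras GRU layer (the layer size, input shape, and rates below are hypothetical, chosen only for illustration); in line with Gal & Ghahramani, the recurrent dropout mask is reused at every time step so that the time invariance noted above is respected:

```python
# Sketch: "vertical" vs. "horizontal" dropout in a Keras GRU.
# Sizes, shapes, and rates are hypothetical, not taken from the slides.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(None, 32)),  # (timesteps, features); variable-length sequences
    layers.GRU(
        64,
        dropout=0.2,            # vertical: masks the inputs x_t fed into the layer
        recurrent_dropout=0.2,  # horizontal: masks the recurrent state h_{t-1}
    ),
    layers.Dense(1),
])
model.summary()
```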