Enhancing Neural Networks with Soft Attention Mechanism

Learn about Recurrent Neural Networks (RNNs) and seq2seq models, and how the soft attention mechanism is used to improve long-range associations and selective information encoding in tasks like translation. Understand the concept of content-based attention and how it enables neural networks to focus on relevant parts of input data for better output generation.

  • Neural Networks
  • Soft Attention
  • RNNs
  • Seq2Seq
  • Translation


Presentation Transcript


  1. Attention

  2. Recurrent neural networks (RNNs). Idea: input is processed sequentially, repeating the same operation at every step. LSTM and GRU add gating variables that modulate dependence on the past (forgetting) and ease backpropagation (credit assignment). https://distill.pub/2016/augmented-rnns/
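
To make the gating idea concrete, here is a minimal sketch of a single LSTM step in PyTorch; the dimensions and parameter names are illustrative assumptions, not taken from the slides. The forget, input, and output gates decide how much of the old cell state is kept, how much new content is written, and how much is exposed to the next step.

```python
import torch

d_in, d_h = 4, 8
x_t = torch.randn(1, d_in)           # current input
h_prev = torch.zeros(1, d_h)         # previous hidden state
c_prev = torch.zeros(1, d_h)         # previous cell ("memory") state

# One linear map per gate (hypothetical parameters, for illustration only).
W = {name: torch.nn.Linear(d_in + d_h, d_h) for name in ("f", "i", "o", "g")}
z = torch.cat([x_t, h_prev], dim=-1)

f = torch.sigmoid(W["f"](z))         # forget gate: how much of c_prev to keep
i = torch.sigmoid(W["i"](z))         # input gate: how much new content to write
o = torch.sigmoid(W["o"](z))         # output gate: how much of the cell to expose
g = torch.tanh(W["g"](z))            # candidate new content

c_t = f * c_prev + i * g             # gated update: modulated "forgetting"
h_t = o * torch.tanh(c_t)            # hidden state passed to the next step
```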

  3. seq2seq. Combine two RNNs to map one sequence to another (e.g., for sentence translation). The encoder RNN compresses the input into a context vector (a generalization of a word embedding). The decoder RNN takes that vector and expands it to produce the output. https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html
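
A minimal sketch of this encoder/decoder arrangement in PyTorch, under assumed toy vocabulary sizes and dimensions: the encoder GRU compresses the source sentence into one context vector, and the decoder GRU starts from that vector to produce the output sequence.

```python
import torch
import torch.nn as nn

src_vocab, tgt_vocab, d_emb, d_h = 100, 100, 32, 64

enc_emb = nn.Embedding(src_vocab, d_emb)
encoder = nn.GRU(d_emb, d_h, batch_first=True)
dec_emb = nn.Embedding(tgt_vocab, d_emb)
decoder = nn.GRU(d_emb, d_h, batch_first=True)
out_proj = nn.Linear(d_h, tgt_vocab)

src = torch.randint(0, src_vocab, (1, 7))         # one source sentence of 7 tokens
_, context = encoder(enc_emb(src))                # (1, 1, d_h): the whole input compressed

tgt_in = torch.randint(0, tgt_vocab, (1, 5))      # teacher-forced decoder input
dec_states, _ = decoder(dec_emb(tgt_in), context) # decoder starts from the context vector
logits = out_proj(dec_states)                     # (1, 5, tgt_vocab): scores per output position
```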

  4. seq2seq: problems. It is difficult to form long-range associations: sequential processing makes the strongest associations between nearest neighbors. The entire input is also compressed into a single, usually fixed-length, vector. Question: how can we learn to attend to the relevant parts of the input when we need them for the output? That would help address both problems.

  5. Attention for translation. Learn to encode multiple pieces of information and use them selectively for the output. Encode the input sentence into a sequence of vectors, then, while decoding (translating), adaptively choose the vectors most relevant to the current output, i.e., learn to jointly align and translate. Questions: how can we learn and use a vector to decide where to focus attention, and how can we make that differentiable so it works with gradient descent? Bahdanau et al., 2015: https://arxiv.org/pdf/1409.0473.pdf
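
A rough sketch of the additive alignment model in the spirit of the Bahdanau et al. paper linked above, with assumed shapes and parameter names (W_s, W_h, v): the decoder's previous state is scored against every encoder vector, the scores are softmaxed into attention weights, and those weights produce a per-step context vector.

```python
import torch
import torch.nn as nn

d_h, src_len = 64, 7
s_prev = torch.randn(1, d_h)              # decoder state before emitting the next word
enc_states = torch.randn(src_len, d_h)    # one encoder vector per input position

W_s = nn.Linear(d_h, d_h, bias=False)     # hypothetical alignment parameters
W_h = nn.Linear(d_h, d_h, bias=False)
v = nn.Linear(d_h, 1, bias=False)

scores = v(torch.tanh(W_s(s_prev) + W_h(enc_states)))  # (src_len, 1) alignment scores
alpha = torch.softmax(scores.squeeze(-1), dim=0)        # attention weights over the input
context = alpha @ enc_states                            # (d_h,) weighted summary for this output step
```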

  6. Soft attention. Use a probability distribution over all inputs. Where classification assigns a probability to every possible output, attention uses a probability distribution to weight every possible input, learning to weight the more relevant parts more heavily. https://distill.pub/2016/augmented-rnns/

  7. Soft attention: content-based attention. Each position has a vector encoding the content there. Dot each with a query vector, then apply a softmax to get the attention weights. https://distill.pub/2016/augmented-rnns/
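
In code, content-based attention as described on this slide might look like the following sketch (shapes are assumptions for illustration): dot each position's content vector with the query, softmax the scores into a distribution, and take the weighted sum.

```python
import torch

d, n = 8, 5
contents = torch.randn(n, d)             # one content vector per input position
query = torch.randn(d)                   # what the decoder is currently looking for

scores = contents @ query                # (n,) dot-product similarity per position
weights = torch.softmax(scores, dim=0)   # probability distribution over positions
attended = weights @ contents            # (d,) softly selected content
```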

  8. Soft attention. Content-based attention allows the network to associate the relevant parts of the input with the current part of the output. Using dot products and a softmax makes the attention framework differentiable, so it fits into the usual SGD learning mechanism.
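
As a quick check that this is differentiable end to end (toy shapes and loss are assumptions), gradients flow through the softmax weights back to both the query and the content vectors, so the whole mechanism can be trained with SGD.

```python
import torch

contents = torch.randn(5, 8, requires_grad=True)   # content vectors over input positions
query = torch.randn(8, requires_grad=True)         # query vector

weights = torch.softmax(contents @ query, dim=0)   # soft attention weights
attended = weights @ contents                      # attended summary
loss = attended.sum()                              # stand-in for a real training loss
loss.backward()

print(query.grad.shape, contents.grad.shape)       # both receive gradients through the softmax
```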
