Distributed Representation of Words and Phrases - Enhancing Vector Quality and Training Speed

Presents extensions of the Skip-gram model that improve both vector quality and training speed, covering Hierarchical Softmax, Noise Contrastive Estimation, Negative Sampling, and subsampling of frequent words.

  • Representation
  • Skip-gram model
  • Vector
  • Training
  • Hierarchical Softmax

Presentation Transcript


  1. Distributed Representation of Words and Phrases and their Compositionality. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean. Presented by Chih-yuan Li (cl524@njit.edu).

  2. Introduction. In this paper we present several extensions of the Skip-gram model that improve both the quality of the vectors and the training speed.

  3. Skip-gram Model. An efficient method for learning high-quality vector representations of words from large amounts of unstructured text data.
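
  For reference, the Skip-gram objective from the paper is to maximize the average log probability of the context words within a window of size c around each word w_t, with the basic formulation defining p(w_O | w_I) as a softmax over the vocabulary of size W:

    \frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\ j \ne 0} \log p(w_{t+j} \mid w_t),
    \qquad
    p(w_O \mid w_I) = \frac{\exp\!\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{W} \exp\!\left({v'_w}^{\top} v_{w_I}\right)}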

  4. Skip-gram Model Extensions
  • Hierarchical Softmax (evaluates only about log2(W) tree nodes per prediction instead of all W output words)
  • Noise Contrastive Estimation (NCE)
  • Negative Sampling (a simplification of NCE; see the sketch below)
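
  Below is a minimal NumPy sketch of the Negative Sampling (NEG) objective for one (centre word, context word) pair; the array names v_in, v_out, and neg_out are illustrative, not from the slides, and a real trainer would also update the vectors with the gradient of this loss.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def negative_sampling_loss(v_in, v_out, neg_out):
        """Negative of the NEG objective for one training pair.

        v_in    : (d,)   input vector of the centre word
        v_out   : (d,)   output vector of the observed context word
        neg_out : (k, d) output vectors of k words drawn from the noise distribution
        """
        pos_term = np.log(sigmoid(v_out @ v_in))               # pull the true pair together
        neg_term = np.sum(np.log(sigmoid(-(neg_out @ v_in))))  # push the noise words away
        return -(pos_term + neg_term)

    # toy usage with random vectors of dimension 100 and k = 5 negative samples
    rng = np.random.default_rng(0)
    loss = negative_sampling_loss(rng.normal(size=100), rng.normal(size=100), rng.normal(size=(5, 100)))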

  5. Skip-gram Model Extensions: Subsampling of Frequent Words. Each occurrence of a frequent word is randomly discarded during training; t is the threshold, typically around 10^(-5).
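
  A small sketch of the subsampling heuristic described in the paper: each occurrence of word w_i is discarded with probability P(w_i) = 1 - sqrt(t / f(w_i)), where f(w_i) is the word's relative frequency in the corpus. The counts below are made up for illustration.

    import random

    def keep_word(word, counts, total_count, t=1e-5):
        """Return True if this occurrence survives subsampling."""
        f = counts[word] / total_count                 # relative frequency f(w_i)
        p_discard = max(0.0, 1.0 - (t / f) ** 0.5)     # P(w_i) = 1 - sqrt(t / f(w_i))
        return random.random() > p_discard

    # illustrative counts: "the" is extremely frequent, "volga" is rare and always kept
    counts = {"the": 1_000_000, "volga": 50}
    total = 20_000_000
    kept = [w for w in ["the", "volga", "the", "the"] if keep_word(w, counts, total)]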

  6. Empirical Results. vec(Berlin) - vec(Germany) + vec(France) = vec(X), where the nearest word X is Paris.
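
  A minimal sketch of how such an analogy question can be answered with trained vectors; the dict name vectors is hypothetical and is assumed to map words to unit-normalised NumPy arrays.

    import numpy as np

    def analogy(a, b, c, vectors):
        """Return the word whose vector is nearest to vec(a) - vec(b) + vec(c)."""
        target = vectors[a] - vectors[b] + vectors[c]
        target /= np.linalg.norm(target)
        best_word, best_sim = None, -np.inf
        for word, vec in vectors.items():
            if word in (a, b, c):                  # exclude the query words themselves
                continue
            sim = float(vec @ target)              # cosine similarity (vectors are unit length)
            if sim > best_sim:
                best_word, best_sim = word, sim
        return best_word

    # with good vectors, analogy("Berlin", "Germany", "France", vectors) should return "Paris"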

  7. Learning Phrases. Phrases such as New York Times and Toronto Maple Leafs are replaced by single tokens in the training data, while a frequent but non-phrasal bigram such as "this is" is left unchanged.

  8. Learning Phrases
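
  The paper forms these phrases with a simple data-driven score over unigram and bigram counts; bigrams whose score exceeds a chosen threshold are merged into a single token, and the discounting coefficient \delta prevents very infrequent word pairs from being promoted:

    \mathrm{score}(w_i, w_j) = \frac{\mathrm{count}(w_i w_j) - \delta}{\mathrm{count}(w_i) \times \mathrm{count}(w_j)}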

  9. Phrase Skip-Gram Results. Surprisingly, while we found the Hierarchical Softmax to achieve lower performance when trained without subsampling, it became the best performing method when we downsampled the frequent words. This result shows that subsampling can result in faster training and can also improve accuracy.

  10. Additive Compositionality. The Skip-gram representations exhibit another kind of linear structure that makes it possible to meaningfully combine words by an element-wise addition of their vector representations.
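
  A brief sketch of this additive behaviour, reusing the hypothetical unit-normalised vectors dict from the analogy example above; the paper reports, for instance, that vec(Vietnam) + vec(capital) is close to vec(Hanoi).

    import numpy as np

    def compose(word_a, word_b, vectors):
        """Return the word nearest to the element-wise sum of two word vectors."""
        combined = vectors[word_a] + vectors[word_b]
        combined /= np.linalg.norm(combined)
        scores = {w: float(v @ combined) for w, v in vectors.items() if w not in (word_a, word_b)}
        return max(scores, key=scores.get)

    # with good vectors, compose("Vietnam", "capital", vectors) should return a word like "Hanoi"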

  11. Comparison

  12. Contributions
  • How to train distributed representations of words and phrases with the Skip-gram model.
  • Demonstration that these representations exhibit linear structure that makes precise analogical reasoning possible.
  • Meaningful combination of words by vector addition.
  • Representation of phrases with a single token.
