Improved Hierarchical Word Sequence Language Model Using Word Association
This research presents an enhanced hierarchical word sequence (HWS) language model that leverages word association. It covers the motivation behind the model, smoothing techniques for data sparsity, and the basic idea of the proposed approach, focusing on patterns and word generation. The frequency-based and original HWS models are illustrated with a running example about a magazine editor.
Presentation Transcript
An Improved Hierarchical Word Sequence Language Model Using Word Association. 2015.11.26, Nara Institute of Science and Technology. Xiaoyi Wu, Yuji Matsumoto, Kevin Duh, Hiroyuki Shindo.
Motivation. A continuous language model trained on "a selfish man" learns the sequences "a selfish" and "selfish man", but the sequence "a man" remains unseen: data sparsity.
Motivation. Smoothing techniques help estimate probabilities such as P(man | a), yet even with 30 years' worth of newswire text, one third of trigrams are unseen (Allison et al., 2005). An alternative is to learn discontinuous sequences, e.g. extracting "a ... man" from the training sentence "a selfish man".
HWS language model. For "as soon as possible", the n-gram model is continuous and utterance-oriented (as → soon → as → possible), while the HWS model is discontinuous and pattern-oriented (the pattern "as ... as" generates "soon" and "possible").
Basic Idea of HWS. Patterns are discontinuous: a sentence is divided into several sections by a pattern such as "x is a y of z". Patterns are hierarchical: "x is z" expands to "x is y of z", which expands to "x is a y of z". Words are generated from certain positions of patterns, so words depend on patterns.
Basic Idea of HWS, on the example "Tom is a boy of nine": the structure is hierarchical ("is" at the root generates "Tom" and "of"; "of" generates "a" and "nine"; "a" generates "boy"), the pattern is discontinuous, and each word depends on its pattern.
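To make this concrete, the slide's example tree can be written as nested Python tuples; the (word, left, right) layout below is our illustration, not code from the paper.

```python
# HWS tree for "Tom is a boy of nine", as drawn on the slide.
# Each node is (word, left-subtree, right-subtree); None marks an empty side.
hws_tree = ("is",
            ("Tom", None, None),
            ("of",
             ("a", None, ("boy", None, None)),
             ("nine", None, None)))
```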
Proposed Approach (Frequency-based HWS Model). For "Mrs. Allen is a senior editor of Insight magazine", the most frequent word in the corpus, "of", becomes the root and splits the sentence into "Mrs. Allen is a senior editor" and "Insight magazine".
Proposed Approach (Frequency-based HWS Model), continued. Each side is split recursively on its most frequent word: "Insight magazine" is headed by "magazine", which generates "Insight", while "Mrs. Allen is a senior editor" is split in the same way ("a", then "is", and so on) under "of".
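A minimal sketch of this construction, assuming corpus unigram counts are available; the toy counts below are invented solely to reproduce the split order shown on the slides.

```python
from collections import Counter

def build_hws(tokens, freq):
    """Frequency-based HWS construction: the most frequent token becomes
    the root and splits the sentence into left and right sub-sequences,
    which are built recursively."""
    if not tokens:
        return None
    i = max(range(len(tokens)), key=lambda k: freq[tokens[k]])
    return (tokens[i],
            build_hws(tokens[:i], freq),
            build_hws(tokens[i + 1:], freq))

# Invented counts that rank of > a > is > magazine > Mrs. > the rest.
freq = Counter({"of": 9, "a": 8, "is": 7, "magazine": 3, "Mrs.": 2,
                "Allen": 1, "senior": 1, "editor": 1, "Insight": 1})
tree = build_hws("Mrs. Allen is a senior editor of Insight magazine".split(), freq)
```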
Proposed Approach (Original HWS Model). Repeating the split down to the leaves yields the full tree for "Mrs. Allen is a senior editor of Insight magazine": "of" heads "a" and "magazine"; "a" heads "is" and "senior"; "is" heads "Mrs.", which heads "Allen"; "senior" heads "editor"; and "magazine" heads "Insight".
Proposed Approach (Original HWS Model). From this tree, HWS bigrams are extracted as (parent, child) pairs: ($, of), (of, a), (a, is), (is, Mrs.), (Mrs., Allen), (a, senior), (senior, editor), (of, magazine), (magazine, Insight).
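Continuing the sketch above, the pair extraction is a short recursive walk (the function name is ours); on the tree built earlier it reproduces exactly the nine pairs listed on the slide.

```python
def extract_pairs(tree, parent="$"):
    """Emit (parent, child) HWS bigrams by walking the tree;
    the root is paired with the boundary symbol $."""
    if tree is None:
        return []
    word, left, right = tree
    return ([(parent, word)]
            + extract_pairs(left, word)
            + extract_pairs(right, word))

print(extract_pairs(tree))
# [('$', 'of'), ('of', 'a'), ('a', 'is'), ('is', 'Mrs.'), ('Mrs.', 'Allen'),
#  ('a', 'senior'), ('senior', 'editor'), ('of', 'magazine'), ('magazine', 'Insight')]
```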
Advantage of HWS: discontinuity. For "as soon as possible", the n-gram model captures only contiguous sequences, while HWS directly models the discontinuous pattern "as ... as".
Word Association Based HWS. The proposed model replaces raw frequency with a word association score when choosing split words. For "too much to handle", the frequency-based model splits on the most frequent word, "to", while the association-based model keeps strongly associated words such as "too" and "much" together.
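One plausible reading of this slide is that each sub-sequence's head is chosen by its association score with the parent word rather than by raw frequency; the sketch below follows that assumption (the `score` callable and the frequency fallback at the root are ours).

```python
def build_wa_hws(tokens, parent, score, freq):
    """Association-based HWS construction (our reading of the slide):
    pick the token most strongly associated with the parent word;
    the sentence root falls back to corpus frequency."""
    if not tokens:
        return None
    if parent == "$":
        i = max(range(len(tokens)), key=lambda k: freq[tokens[k]])
    else:
        i = max(range(len(tokens)), key=lambda k: score(parent, tokens[k]))
    head = tokens[i]
    return (head,
            build_wa_hws(tokens[:i], head, score, freq),
            build_wa_hws(tokens[i + 1:], head, score, freq))
```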
Extra Techniques 1/2: Directionalization. For "as soon as possible", the n-gram model uses one-side generation: ($, $, as), ($, as, soon), (as, soon, as), (soon, as, possible). HWS uses double-side generation: ($, $, as), ($, as, as), (as, as, soon), (as, as, possible).
Extra Techniques 1/2: Directionalization, continued. Each context word is tagged with the side (L or R) on which generation continues, so the HWS trigrams ($, as, as), (as, as, soon), (as, as, possible) become ($-R, as-R, as), (as-R, as-L, soon), (as-R, as-R, possible).
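A sketch of the tagging scheme as we read it from the slide: when descending into a subtree, the parent context is suffixed with the side (L/R) being generated, and the previous parent tag is passed down as the grandparent context.

```python
def directional_trigrams(tree, g="$-R", p="$-R"):
    """Directionalized HWS trigrams (our reading of the slide):
    each context word carries the side on which generation continues."""
    if tree is None:
        return []
    word, left, right = tree
    return ([(g, p, word)]
            + directional_trigrams(left, p, f"{word}-L")
            + directional_trigrams(right, p, f"{word}-R"))

asap = ("as", None, ("as", ("soon", None, None), ("possible", None, None)))
print(directional_trigrams(asap))
# [('$-R', '$-R', 'as'), ('$-R', 'as-R', 'as'),
#  ('as-R', 'as-L', 'soon'), ('as-R', 'as-R', 'possible')]
```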
Extra Techniques 2/2: Unification. When constructing an HWS structure, each word in a sentence is counted only once, so repeated words (e.g. "the" and ".") do not inflate the counts.
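The slide's wording suggests counting each word type at most once per sentence when collecting the statistics used to build HWS structures; a sketch under that reading (the helper is hypothetical, not from the paper):

```python
from collections import Counter

def unified_counts(sentences):
    """Unification as we read it: each word type contributes at most
    one count per sentence, so repeats like 'the' or '.' do not
    dominate the frequency ranking."""
    freq = Counter()
    for tokens in sentences:
        freq.update(set(tokens))
    return freq
```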
Intrinsic Experiments.
Training data: British National Corpus (449,755 sentences, 10 million words).
Test data: English Gigaword Corpus (44,702 sentences, 1 million words).
Preprocessing: NLTK tokenizer, lowercasing.
Word association score: Dice coefficient.
Smoothing methods: MKN (Modified Kneser-Ney; Chen & Goodman, 1999) and GLM (generalized language model; Pickhardt et al., 2014).
Evaluation measures: perplexity, coverage (|TR ∩ TE| / |TE|), and usage (|TR ∩ TE| / |TR|).
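The Dice coefficient and the two overlap measures above have standard forms; a minimal sketch, assuming `unigram` and `bigram` are counts collected from the training corpus (how co-occurrence is windowed is not specified here):

```python
def dice(x, y, unigram, bigram):
    """Word association score: Dice(x, y) = 2 f(x, y) / (f(x) + f(y))."""
    denom = unigram[x] + unigram[y]
    return 2.0 * bigram[(x, y)] / denom if denom else 0.0

def coverage(tr, te):
    """|TR ∩ TE| / |TE|: fraction of test n-grams seen in training."""
    return len(tr & te) / len(te)

def usage(tr, te):
    """|TR ∩ TE| / |TR|: fraction of training n-grams used by the test set."""
    return len(tr & te) / len(tr)
```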
Evaluation: intrinsic results (table in the original slides).
Extrinsic Experimental Settings.
Training data: TED Talks French-English parallel corpus (139,761 sentence pairs).
Test data: TED Talks French-English parallel corpus (1,617 sentence pairs).
Translation toolkit: the Moses system.
Evaluation measures: BLEU (Papineni et al., 2002), METEOR (Banerjee & Lavie, 2005), and TER (Snover et al., 2006).
Extrinsic Evaluation: translation results (table in the original slides).
Conclusions. We proposed an improved hierarchical word sequence language model using word association, together with two extra techniques (directionalization and unification). The proposed model models natural language more precisely than the original frequency-based HWS and performs better in both intrinsic and extrinsic experiments. Source code: https://github.com/aisophie/HWS
Thank you for your attention!