Improving Neural Parsing by Disentangling Model Combination and Reranking Effects
This paper by Daniel Fried, Mitchell Stern, and Dan Klein (UC Berkeley) improves neural parsing by separating the contribution of model combination from that of reranking, and shows that making the combination of the base parser and the generative model explicit yields further gains.
Presentation Transcript
Improving Neural Parsing by Disentangling Model Combination and Reranking Effects. Daniel Fried*, Mitchell Stern*, and Dan Klein, UC Berkeley.
Top-down generative models. A parse tree (here for "The man had an idea.") is linearized top-down as (S (NP The man ) (VP had (NP an idea ) ) . ) and scored left to right by a generative model. Two such models: GLSTM [Parsing as Language Modeling, Choe and Charniak, 2016] and GRNNG [Recurrent Neural Network Grammars, Dyer et al., 2016].
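The linearization above can be sketched in code. This is a hedged illustration, not the authors' implementation: `tree_to_actions` is a hypothetical helper that converts a bracketed parse into the NT(X) / GEN(w) / REDUCE action sequence that an RNNG-style generative model scores one action at a time.

```python
# Hypothetical sketch: convert a bracketed parse into a top-down action
# sequence (NT(X) opens a nonterminal, GEN(w) generates a word,
# REDUCE closes the most recent open nonterminal).

def tree_to_actions(bracketed):
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    actions = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "(":
            actions.append(f"NT({tokens[i + 1]})")  # open nonterminal
            i += 2
        elif tokens[i] == ")":
            actions.append("REDUCE")                # close nonterminal
            i += 1
        else:
            actions.append(f"GEN({tokens[i]})")     # generate a word
            i += 1
    return actions

actions = tree_to_actions("(S (NP The man ) (VP had (NP an idea ) ) . )")
```

A generative model assigns the tree a probability by multiplying per-action probabilities along this sequence.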
Generative models as rerankers. A base parser B produces a candidate list of parses for the input sentence x, and the generative neural model G selects y* = argmax over y in candidates(B) of p_G(x, y). (The slide shows several candidate trees for "The man had an idea.")
Generative models as rerankers. F1 on the Penn Treebank:
Choe and Charniak 2016: Charniak parser 89.7, reranked with an LSTM language model (GLSTM) 92.6.
Dyer et al. 2016: RNNG-discriminative 91.7, reranked with RNNG-generative (GRNNG) 93.3.
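The reranking setup above is just an argmax over a candidate list. A minimal sketch, with a toy lookup table standing in for the generative model's log probability (not a real parser):

```python
# Minimal sketch of generative reranking: the base parser B proposes
# candidates, and the generative model G picks the highest-scoring one.
# `score_g` is a stand-in for log p_G(x, y).

def rerank(candidates, score_g):
    """Return the candidate parse that maximizes the generative score."""
    return max(candidates, key=score_g)

# Toy usage: three candidate parses with made-up log probabilities.
toy_scores = {"parse_a": -12.3, "parse_b": -10.1, "parse_c": -11.7}
best = rerank(list(toy_scores), toy_scores.get)
```

In the real systems, `score_g` would run the LSTM or RNNG generative model over the linearized parse.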
B: necessary evil, or secret sauce? Should we try to do away with the base parser B and search directly in G? No: it is better to combine B and G more explicitly, reaching 93.9 F1 on the PTB (94.7 semi-supervised).
Using standard beam search for G. Direct decoding performs very poorly: with beam size 100, GRNNG reaches only 29.1 F1 and GLSTM 27.4 F1. (The slide contrasts the true parse prefix "(S (NP The man ..." with a beam filled by prefixes such as "(NP (NP (NP ..." that open many nonterminals before generating any words.)
Standard beam search in G fails because word generation is lexicalized: generating a word costs far more log probability than opening a nonterminal, so the beam prefers analyses that delay word generation. (The slide plots the log probability of each action along (S (NP The man ) (VP had (NP an idea ) ) . ), with large drops at the word-generation actions.)
Word-synchronous beam search: group beam items by the number of words generated so far (w0, w1, w2, ...), and prune within each group, so that partial parses compete only against others covering the same sentence prefix [Roark 2001; Titov and Henderson 2010; Charniak 2010; Buys and Blunsom 2015].
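A hedged sketch of this idea, under simplifying assumptions (items are plain tuples and a caller-supplied `expand` function stands in for the model; this is not the authors' implementation):

```python
import heapq
import math

def word_sync_beam_search(initial, expand, k):
    """Sketch of word-synchronous beam search: successors are grouped by
    the number of words generated so far, and pruning to beam size k is
    done within each group, so partial parses only compete against others
    covering the same sentence prefix. Items are tuples
    (log_prob, num_words, actions, done); `expand` yields successors."""
    beam = [initial]
    finished = []
    while beam:
        groups = {}
        for item in beam:
            for succ in expand(item):
                if succ[3]:                       # complete derivation
                    finished.append(succ)
                else:
                    groups.setdefault(succ[1], []).append(succ)
        # keep the top k items per word count, not top k overall
        beam = [s for group in groups.values()
                for s in heapq.nlargest(k, group, key=lambda it: it[0])]
    return max(finished, key=lambda it: it[0])

# Toy usage: derivations generate exactly two words from {a, b} with
# fixed probabilities; the best derivation is ["a", "a"].
def expand(item):
    logp, nwords, actions, _ = item
    return [(logp + math.log(p), nwords + 1, actions + [w], nwords + 1 == 2)
            for w, p in [("a", 0.6), ("b", 0.4)]]

best = word_sync_beam_search((0.0, 0, [], False), expand, k=1)
```

The grouping step is the key difference from standard beam search: lexical probabilities are compared like with like, so derivations that have generated a word are not crowded out by cheaper structure-only prefixes.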
Word-synchronous beam search makes direct decoding in G viable. (The slide plots F1 on the PTB against beam sizes from 100 to 1000 for GLSTM and GRNNG decoded directly and for reranking B's candidates with the same models; direct decoding climbs steadily from the low 70s as the beam grows, but stays below the reranking configurations, which sit near the top of the 70-95 range shown.)
Finding model combination effects. Add G's own search proposals (from word-synchronous beam search) to the candidate list produced by B, then rerank the combined pool with G. (The slide shows candidate trees from both B and G for "The man had an idea.")
Finding model combination effects. F1 on PTB when reranking with G:
RNNG generative model: candidates from GRNNG ∪ B score 93.7, vs. 93.5 from B alone.
LSTM generative model: candidates from GLSTM ∪ B score 93.5, vs. 92.8 from B alone.
Reranking shows implicit model combination: when G reranks only B's candidates, B's proposals filter out parses that G would wrongly prefer. In other words, B hides model errors in G.
Making model combination explicit. Can we do better by simply combining the two models' scores? Rerank the combined candidate pool B ∪ G with the interpolated score λ · log p_G(x, y) + (1 − λ) · log p_B(y | x).
Making model combination explicit. F1 on PTB, scoring candidates with G alone vs. with the combined G + B score:
G = GRNNG: candidates GRNNG ∪ B: 93.7 with G, 94.2 with G + B; candidates B: 93.5 with G, 94.0 with G + B.
G = GLSTM: candidates GLSTM ∪ B: 93.5 with G, 93.9 with G + B; candidates B: 92.8 with G, 93.9 with G + B.
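The interpolated score can be sketched as follows. The function names and toy scores are illustrative; in the real system, λ is a weight tuned on the development set and the two scores come from the generative model and the base parser.

```python
# Sketch of explicit score combination for reranking:
#   lam * log p_G(x, y) + (1 - lam) * log p_B(y | x)

def combined_score(log_p_g, log_p_b, lam):
    """Interpolate the generative and base-parser log probabilities."""
    return lam * log_p_g + (1.0 - lam) * log_p_b

def select_parse(candidates, score_g, score_b, lam):
    """Pick the candidate maximizing the interpolated score."""
    return max(candidates,
               key=lambda y: combined_score(score_g(y), score_b(y), lam))

# Toy usage: G mildly prefers parse_a, B strongly prefers parse_b;
# with lam = 0.5 the combined score sides with B.
g = {"parse_a": -10.0, "parse_b": -11.0}
b = {"parse_a": -9.0, "parse_b": -4.0}
best = select_parse(["parse_a", "parse_b"], g.get, b.get, lam=0.5)
```

Setting λ = 1 recovers pure reranking with G; intermediate values let B's score veto candidates that G alone would wrongly prefer.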
Explicit score combination prevents errors: adding B's score to G's steers reranking away from analyses that G alone would wrongly prefer. (The slide walks through an example where scoring with G + B recovers the best parse that scoring with G alone misses.)
Comparison to past work. F1 on PTB:
Choe & Charniak 2016: 92.6 (93.8 with added silver data).
Dyer et al. 2016: 93.3.
Kuncoro et al. 2017: 93.6.
Ours: GRNNG reranking B: 93.5; explicit GRNNG + B score combination: 93.9; adding GLSTM and silver data: 94.7.
Conclusion. A word-synchronous search procedure makes decoding directly in G viable (a more effective version is forthcoming: Stern et al., EMNLP 2017); reranking B's candidates with G contains implicit model combination effects; and simple, explicit score combination of B and G yields large improvements.