
Entity-Centric Coreference Resolution with Model Stacking
Discover how entity-level information influences coreference decisions in the model-stacking approach of Kevin Clark and Christopher D. Manning. Learn about mention-pair models, logistic classifiers, and how coreference clusters are built up effectively.
Presentation Transcript
Entity-Centric Coreference Resolution with Model Stacking. Kevin Clark and Christopher D. Manning (ACL-IJCNLP 2015). (Tables are taken from the above-mentioned paper.) Presented by Mamoru Komachi <komachi@tmu.ac.jp>. ACL 2015 Reading Group @ Tokyo Institute of Technology, August 26th, 2015.
Entity-level information allows early coreference decisions to inform later ones. Example: "Hillary Clinton files for divorce from Bill Clinton ahead of her campaign for presidency for 2016. [...] Clinton is confident that her poll numbers will skyrocket once the divorce is final." Once "her" is resolved to Hillary Clinton, the later mention "Clinton" can be linked to Hillary rather than Bill. Entity-centric coreference systems build up coreference clusters incrementally (Raghunathan et al., 2010; Stoyanov and Eisner, 2012; Ma et al., 2014).
Problem: how can clusters be built up effectively? Two ingredients:
- Model stacking: two mention-pair models (a classification model and a ranking model) generate features for clusters of mentions
- Imitation learning: assigns exact costs to actions based on coreference evaluation metrics, using the scores of the pairwise models to reduce the search space
Mention Pair Models: the previous approach, using local information.
Two models for predicting whether a given pair of mentions belongs to the same coreference cluster. Example: "Bill arrived, but nobody saw him. I talked to him on the phone." Classification model: are the two mentions coreferent? Ranking model: which candidate antecedent best suits the mention?
Logistic classifiers for the classification model. M: the set of all mentions in the training set; T(m): the set of true antecedents of a mention m; F(m): the set of false antecedents of m. The model considers each pair of mentions independently.
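From the definitions on this slide, the training objective is presumably the standard pairwise log-likelihood; a sketch, where modeling p as a logistic function of pair features f(a, m) is an assumption consistent with "logistic classifiers", and any regularization term is omitted:

```latex
\ell(\theta) = -\sum_{m \in M}\Bigl[\sum_{a \in T(m)} \log p_\theta(a, m)
  \;+\; \sum_{a \in F(m)} \log\bigl(1 - p_\theta(a, m)\bigr)\Bigr],
\qquad
p_\theta(a, m) = \sigma\bigl(\theta^\top f(a, m)\bigr)
```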
Logistic classifiers for the ranking model, which considers all candidate antecedents simultaneously. Max-margin training encourages the model to find the single best antecedent for a mention, but its scores are less robust as input to a downstream clustering model.
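One common max-margin ranking formulation consistent with this description (the paper's exact loss and margin scaling may differ; the linear pair scorer s is an assumption):

```latex
\ell(\theta) = \sum_{m \in M} \max\Bigl(0,\; 1 \;+\; \max_{f \in F(m)} s_\theta(f, m)
  \;-\; \max_{t \in T(m)} s_\theta(t, m)\Bigr)
```

That is, the highest-scoring true antecedent of each mention must beat every false antecedent by a margin of one.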
Features for the mention pair models:
- Distance features: the distance between the two mentions, in sentences or in number of intervening mentions
- Syntactic features: the number of embedded NPs under a mention; POS tags of the first, last, and head words
- Semantic features: named entity type, speaker identification
- Rule-based features: exact and partial string matching
- Lexical features: the first, last, and head word of the current mention
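A minimal sketch of how such pairwise features might be assembled; the mention schema and feature names here are illustrative assumptions, not the paper's exact feature set:

```python
# Illustrative mention-pair feature extraction. The dict schema for
# mentions ('sent', 'idx', 'head', 'ne_type', 'text') is hypothetical.
def pair_features(ante, ana):
    return {
        "sent_dist": ana["sent"] - ante["sent"],        # distance (sentences)
        "mention_dist": ana["idx"] - ante["idx"],       # distance (mentions)
        "head_pair": (ante["head"], ana["head"]),       # lexical
        "ne_match": ana["ne_type"] == ante["ne_type"],  # semantic
        "exact_match": ana["text"].lower() == ante["text"].lower(),    # rule-based
        "partial_match": ana["head"].lower() == ante["head"].lower(),  # rule-based
    }
```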
Entity-Centric Coreference Model: the proposed approach, using cluster features.
The entity-centric model can exhibit high coherence. Best-first clustering (Ng and Cardie, 2002) assigns as the antecedent the most probable preceding mention classified as coreferent with the current one, relying only on local information. The entity-centric model (this work) instead operates over pairs of clusters rather than pairs of mentions, building up coreference chains with agglomerative clustering: two clusters are merged if the model predicts they represent the same entity.
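For contrast, a minimal sketch of best-first clustering; the pairwise scorer `p_coref` and the 0.5 threshold are assumptions:

```python
# Best-first clustering (Ng and Cardie, 2002), sketched: link each mention
# to its most probable preceding mention, if any scores above the threshold.
def best_first(mentions, p_coref, threshold=0.5):
    antecedent = {}
    for i, m in enumerate(mentions):
        scored = [(p_coref(a, m), j) for j, a in enumerate(mentions[:i])]
        if scored:
            score, j = max(scored)
            if score > threshold:
                antecedent[i] = j  # purely local decision, never revisited
    return antecedent
```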
Inference: the search space is reduced by applying a threshold from the mention-pair models; the list of candidate pairs P is sorted to perform easy-first clustering; s is a scoring function that makes the binary decision for each merge action.
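A sketch of this inference loop; `pair_score` (the stacked pairwise model), `s` (the learned merge scorer), and the threshold `tau` are stand-ins for the paper's components, and re-scoring after each merge is simplified away:

```python
# Easy-first agglomerative clustering, sketched: candidate cluster pairs
# above a pairwise-score threshold are processed from most to least
# confident; s makes the final binary merge decision. Clusters are sets.
def easy_first_clustering(clusters, pair_score, s, tau=0.0):
    P = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))
         if pair_score(clusters[i], clusters[j]) > tau]  # prune the search space
    P.sort(key=lambda ij: pair_score(clusters[ij[0]], clusters[ij[1]]),
           reverse=True)                                 # easiest merges first
    merged = set()
    for i, j in P:
        if i not in merged and j not in merged and s(clusters[i], clusters[j]) > 0:
            clusters[i] |= clusters[j]                   # merge cluster j into i
            merged.add(j)
    return [c for k, c in enumerate(clusters) if k not in merged]
```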
Learning the entity-centric model with imitation learning. Coreference is a sequential prediction problem: future observations depend on previous actions. Imitation learning (in this work, DAgger; Ross et al., 2011) is well suited to such problems (Argall et al., 2009). Training the agent on the gold labels alone assumes that all previous decisions were correct, which is problematic in coreference, where the error rate is quite high. DAgger instead exposes the system at training time to states similar to the ones it will face at test time.
Learning the cluster-merging policy with DAgger (Ross et al., 2011), an iterative algorithm that aggregates a dataset D of states and the actions performed by the expert policy in those states. A mixing parameter controls the probability of following the expert's policy versus the current policy, and it decays exponentially as the iteration number increases.
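A generic DAgger loop, sketched; the environment interface (`run_policy`, yielding visited states), the expert, the trainer, and the decay rate `p` are assumptions, not the paper's implementation:

```python
import random

# DAgger (Ross et al., 2011), sketched. Each iteration rolls out a mixture
# of the expert and the current policy, labels every visited state with the
# expert's action, aggregates the data, and retrains.
def dagger(initial_states, expert_action, run_policy, train, n_iters=5, p=0.5):
    D = []                       # aggregated dataset of (state, expert action)
    policy = expert_action       # iteration 0 effectively follows the expert
    for i in range(n_iters):
        beta = p ** i            # P(follow expert); decays exponentially
        current = policy
        def mixed(state):
            return expert_action(state) if random.random() < beta else current(state)
        for s0 in initial_states:
            for state in run_policy(mixed, s0):          # states under the mixture
                D.append((state, expert_action(state)))  # always label with expert
        policy = train(D)        # retrain on all data gathered so far
    return policy
```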
Adding costs to actions: tune directly to optimize coreference metrics. Merging clusters influences the final score, and the order of merge operations also matters. How will a particular local decision affect the final score of the coreference system? Problem: standard coreference metrics do not decompose over clusters. Answer: roll out the remaining actions from the current state. A(s): the set of actions that can be taken from state s.
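A sketch of rollout-based costing; `step`, `rollout` (complete the episode with the current policy), `metric` (e.g., B³ F1), and `gold` are assumed helpers:

```python
# Since standard coreference metrics do not decompose over clusters, the
# cost of each action a in A(s) is estimated by taking it, rolling out the
# rest of the episode, and scoring the resulting final clustering.
def action_costs(s, A, step, rollout, metric, gold):
    scores = {a: metric(rollout(step(s, a)), gold) for a in A(s)}
    best = max(scores.values())
    return {a: best - v for a, v in scores.items()}  # cost = regret vs. best action
```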
Cluster features derived from the classification and ranking models. Between-cluster features:
- minimum and maximum probability of coreference
- average probability and average log probability of coreference
- average probability and log probability of coreference for particular pairs of grammatical mention types (pronoun or not)
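A sketch of the between-cluster statistics; `p_coref` is the trained pairwise model's probability, and the grammatical-type breakdown from the slide is omitted here:

```python
import math
from itertools import product

# Between-cluster features from the stacked pairwise model: statistics over
# p(coref) for all cross-cluster mention pairs.
def cluster_pair_features(c1, c2, p_coref):
    probs = [p_coref(m1, m2) for m1, m2 in product(c1, c2)]
    n = len(probs)
    return {
        "min_p": min(probs),
        "max_p": max(probs),
        "avg_p": sum(probs) / n,
        "avg_log_p": sum(math.log(p) for p in probs) / n,
    }
```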
Only 56 features for the entity-centric model. State features:
- whether a preceding mention pair in the list of mention pairs has the same candidate anaphor as the current one
- the index of the current mention pair in the list divided by the size of the list (i.e., what fraction of the list has been seen so far)
The entity-centric model doesn't rely on sparse lexical features; instead, it employs model stacking to exploit strong features whose scores are learned by the pairwise models.
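The two state features, sketched; representing the candidate list as (antecedent cluster, anaphor cluster) index pairs is an assumption:

```python
# State features for the merge policy: has this anaphor appeared earlier in
# the sorted candidate list, and how far through the list are we?
def state_features(pairs, i):
    anaphor = pairs[i][1]
    return {
        "anaphor_seen_before": any(p[1] == anaphor for p in pairs[:i]),
        "progress": i / len(pairs),  # fraction of the list seen so far
    }
```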
Results and discussion: the CoNLL 2012 English coreference task.
Experimental setup: CoNLL-2012 Shared Task, the English portion of OntoNotes (training: 2,802 documents; development: 343; test: 345), using the provided preprocessing (parse trees, named entities, etc.). Evaluation uses the common metrics MUC, B³, and CEAF_E, plus CoNLL F1 (the average F1 score of the three metrics), computed with the CoNLL scorer version 8.01. Mentions are detected with the rule-based system of Raghunathan et al. (2010).
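For reference, the CoNLL score is the unweighted mean of the three metric F1 scores:

```latex
F_1^{\mathrm{CoNLL}} = \frac{F_1^{\mathrm{MUC}} + F_1^{B^3} + F_1^{\mathrm{CEAF}_E}}{3}
```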
Results: the entity-centric model outperforms best-first clustering with both the classification and the ranking model.
The entity-centric model beats other state-of-the-art coreference models. This work primarily optimizes for the B³ metric during training. State-of-the-art systems use latent antecedents to learn scoring functions over mention pairs and are trained to maximize global objective functions.
The entity-centric model directly learns a coreference model that maximizes an evaluation metric, unlike:
- Post-processing of mention-pair and ranking models: closest-first clustering (Soon et al., 2001); best-first clustering (Ng and Cardie, 2002)
- Global inference models: integer linear programming (Denis and Baldridge, 2007; Finkel and Manning, 2008); graph partitioning (McCallum and Wellner, 2005; Nicolae and Nicolae, 2006); correlational clustering (McCallum and Wellner, 2003; Finley and Joachims, 2005)
Previous approaches do not directly tune against coreference metrics:
- Non-local entity-level information: cluster models (Luo et al., 2004; Yang et al., 2008; Rahman and Ng, 2011); joint inference (McCallum and Wellner, 2003; Culotta et al., 2006; Poon and Domingos, 2008; Haghighi and Klein, 2010)
- Learning trajectories of decisions: imitation learning (Daumé et al., 2005; Ma et al., 2014); structured perceptron (Stoyanov and Eisner, 2012; Fernandes et al., 2012; Björkelund and Kuhn, 2014)
Summary: Proposed an entity-centric coreference model that uses the scores produced by mention-pair models as features. Costs for cluster-merge actions are assigned using standard coreference metrics. Imitation learning is used to learn how to build up coreference chains incrementally. The proposed model outperforms the commonly used best-first method and the current state of the art.