Annotating Relation Inference in Context via Question Answering
Omer Levy and Ido Dagan, Bar-Ilan University, Israel
Relation Inference
When is a given natural-language relation implied by another?
Rule: X cures Y → X treats Y
Question: Which drug treats headaches?
Text: Aspirin cures headaches.
Relation Inference in Context
When is a given natural-language relation implied by another?
Rule: X eliminates Y → X treats Y (valid in this context)
Question: Which drug treats headaches?
Text: Aspirin eliminates headaches.
Relation Inference in Context
When is a given natural-language relation implied by another?
Rule: X eliminates Y → X treats Y (invalid in this context: eliminating patients is not treating them)
Question: Which drug treats patients?
Text: Aspirin eliminates patients.
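To make the contrast concrete, the sketch below shows how such context-sensitive pairs can be represented as labeled question-answer examples. This is only an illustrative encoding; the field names are mine, not the benchmark's actual schema.

```python
# Illustrative encoding of contextual relation-inference examples as
# labeled QA pairs. Field names are hypothetical, not the dataset schema.
from dataclasses import dataclass

@dataclass
class QAExample:
    question: str   # the hypothesis relation, phrased as a question
    text: str       # the premise assertion from a corpus
    label: bool     # does the text answer the question?

examples = [
    # "eliminates" implies "treats" in this context...
    QAExample("Which drug treats headaches?", "Aspirin eliminates headaches.", True),
    # ...but not in this one: eliminating patients is not treating them.
    QAExample("Which drug treats patients?", "Aspirin eliminates patients.", False),
]

for ex in examples:
    print(f"{ex.question!r} + {ex.text!r} -> {ex.label}")
```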
Prior Art
- DIRT (Lin and Pantel, 2001)
- Universal Schema (Rocktäschel et al., NAACL 2015)
- PPDB 2.0 (Pavlick et al., ACL 2015)
- RELLY (Grycner et al., EMNLP 2015)
This Talk
Not about relation inference algorithms, but about how to evaluate them.
Problem: current evaluations are biased and can't measure recall.
Contributions:
- A novel methodology for creating unbiased and natural datasets
- A new benchmark for relation inference in context
Extrinsic Evaluation
Usage: plug the inference algorithm into an RTE system.
Problems:
- Mixes in other semantic phenomena
- Fewer relation inference examples
- Hard to trace/analyze
- System selection introduces bias
We want an intrinsic evaluation too.
Post-hoc Evaluation
Usage: apply relation inference, then annotate the inferred facts.
1) Learn inference rules (e.g., X eliminates Y → X treats Y)
2) Apply rules to text (aspirin eliminates headaches → aspirin treats headaches)
3) Annotate for entailment
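A minimal sketch of steps 1-2 with the running example (a toy rule and toy tuples, not the actual rule-learning pipeline):

```python
# Toy sketch: apply a learned inference rule to Open IE tuples to produce
# candidate inferred facts (step 2), which are then annotated for
# entailment (step 3). The rule and tuples are placeholders.
rule = ("eliminates", "treats")   # learned: X eliminates Y -> X treats Y

tuples = [
    ("aspirin", "eliminates", "headaches"),
    ("aspirin", "eliminates", "patients"),
]

for x, r, y in tuples:
    if r == rule[0]:  # the rule's premise relation matches
        print(f"{x} {r} {y}  =>  {x} {rule[1]} {y}   [step 3: annotate for entailment]")
```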
Post-hoc Evaluation
Problems:
- Expensive research cycle
- Difficult to replicate
- Oblivious to recall
We want a pre-annotated dataset.
Pre-annotated Dataset (Zeichner et al., 2012)
Usage: compare relation inference to a recorded post-hoc evaluation.
1) Learn inference rules using DIRT
2) Apply rules to text (Open IE tuples), e.g., aspirin eliminates headaches → aspirin treats headaches
3) Annotate for entailment
Steps 1-3 are run once, and the results are recorded as (premise, hypothesis, label) records. Then:
4) Compare new algorithms' predictions against the annotated data.
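A rough sketch of step 4, scoring a new algorithm against the recorded (premise, hypothesis, label) records; the record format and the trivial baseline below are mine:

```python
# Sketch of step 4: score a new algorithm's binary entailment predictions
# against pre-annotated (premise, hypothesis, label) records.
# `records` and `predict` are illustrative placeholders.
def evaluate(records, predict):
    tp = fp = fn = 0
    for premise, hypothesis, gold in records:
        pred = predict(premise, hypothesis)
        if pred and gold:
            tp += 1
        elif pred and not gold:
            fp += 1
        elif gold:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

records = [("aspirin eliminates headaches", "aspirin treats headaches", True)]
print(evaluate(records, lambda p, h: True))  # trivial say-yes baseline
```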
Pre-annotated Dataset (Zeichner et al., 2012)
Solved: the expensive research cycle and the replication problem.
Remaining problems:
- Oblivious to recall
- Biased towards DIRT
Desired Qualities of Evaluation Scheme
- Intrinsic task
- Pre-annotated dataset
- Sensitive to recall
- Not biased towards a particular method
- Crowdsourcable
- High quality labels
Reformulating Relation Inference
Key move: recast relation inference as question answering.
Data Collection
Questions of the form "Which [t] [r] [a]?" (t = answer type, r = relation, a = argument), e.g., "Which ingredient is included in chocolate?"
Candidate answers: Open IE assertions, e.g., "chocolate is made from the cocoa bean".
Data Collection: Questions
Existing QA datasets:
- TREC (Voorhees and Tice, 2000)
- WikiAnswers (Fader et al., 2013)
- WebQuestions (Berant et al., 2013)
Key idea: naturally-occurring questions.
Manually converted to the "Which [t] [r] [a]?" form, e.g., "Who climbed the Everest?" = "Which person climbed the Everest?"
Data Collection: Candidate Answers
Extract Open IE assertions from Google's Syntactic N-grams, e.g., "chocolate is made from the cocoa bean".
Given a question "Which [t] [r] [a]?", fetch all assertions (x, r', y) where:
1) One of the arguments is equal to a
2) The other argument is a type of t (x is a t)
3) The relation is different from r (r' ≠ r)
Key idea: this yields an unbiased sample of relations.
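A sketch of this filter under the notation above; the `is_a` oracle is a stand-in for a real type/hypernym resource, not the actual implementation:

```python
# Sketch of the candidate-answer filter. Given a question "Which <t> <r> <a>?",
# keep Open IE assertions (x, r', y) where one argument equals a, the other
# argument is an instance of type t, and the relation r' differs from r.
def candidate_answers(t, r, a, assertions, is_a):
    for x, r2, y in assertions:
        if r2 == r:                    # (3) the relation must differ from r
            continue
        if x == a and is_a(y, t):      # (1) + (2), with a on the left
            yield (x, r2, y)
        elif y == a and is_a(x, t):    # (1) + (2), with a on the right
            yield (x, r2, y)

# Toy hypernym oracle standing in for a real taxonomy (e.g., WordNet).
def is_a(term, typ):
    return (term, typ) in {("the cocoa bean", "ingredient")}

hits = list(candidate_answers(
    t="ingredient", r="is included in", a="chocolate",
    assertions=[("chocolate", "is made from", "the cocoa bean")],
    is_a=is_a,
))
print(hits)  # [('chocolate', 'is made from', 'the cocoa bean')]
```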
Crowdsourced Annotation
Given 1 question + 20 matching candidate answers, annotate each candidate answer as either:
- The sentence answers the question.
- The sentence does not answer the question.
- (?) The sentence does not make sense, or is severely non-grammatical.
Crowdsourced Annotation: Masking Answers
Annotators are biased by their own world knowledge.
Q: Which country borders Ethiopia?
A: Eritrea invaded Ethiopia (annotators know that Eritrea borders Ethiopia: true)
A: Italy invaded Ethiopia (annotators know that Italy borders Ethiopia: false)
The same relation ("invaded" vs. "borders") gets inconsistent judgments depending on what annotators already know about the answer.
Filter world knowledge from annotation by masking the answer.
Crowdsourced Annotation: Masking Answers
Q: Which country borders Ethiopia?
A: [COUNTRY] invaded Ethiopia
Filter world knowledge from annotation by masking the answer: substitute the answer x with its type t.
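A minimal sketch of the masking step; the bracketed-uppercase rendering follows the slide's [COUNTRY] example:

```python
# Replace the answer term x with its (bracketed, uppercased) type t, so
# annotators judge the relation itself rather than what they already know
# about x.
def mask_answer(sentence, answer, answer_type):
    return sentence.replace(answer, f"[{answer_type.upper()}]")

print(mask_answer("Eritrea invaded Ethiopia", "Eritrea", "country"))
# -> [COUNTRY] invaded Ethiopia
```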
Crowdsourced Annotation
Bonus: no context switch (20 candidate answers per question) → faster annotation!
Crowdsourced Annotation: Aggregation
- 5 Mechanical Turk annotators per question-answer pair
- At least 4/5 must agree on the label
- Discard nonsensical/non-grammatical (?) examples
- Labeled examples (after filtering): 16,371
- Agreement with expert: F1 ≈ 90%
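A sketch of this aggregation rule as I read it (in particular, treating a '?'-majority as a discard is my assumption):

```python
# Aggregate 5 crowd labels per question-answer pair: keep an example only
# if at least 4 of 5 annotators agree, and discard pairs whose winning
# label is '?' (nonsensical / non-grammatical).
from collections import Counter

def aggregate(labels, min_agreement=4):
    """labels: list of 5 values in {'yes', 'no', '?'} -> final label or None."""
    label, count = Counter(labels).most_common(1)[0]
    if label == '?' or count < min_agreement:
        return None  # discard: nonsensical or insufficient agreement
    return label

print(aggregate(['yes', 'yes', 'yes', 'yes', 'no']))  # 'yes'
print(aggregate(['yes', 'yes', 'no', 'no', '?']))     # None (discarded)
```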
Desired Qualities of Evaluation Scheme: all satisfied
- Intrinsic task (by definition)
- Pre-annotated dataset (16K examples)
- Sensitive to recall and not biased towards a particular method (unbiased sample of relations: r' ≠ r)
- Crowdsourcable
- High quality labels (F1 ≈ 90% vs. expert)
Recap
- A novel methodology for creating unbiased and natural datasets. Key idea: reformulate relation inference as question answering.
- A new benchmark for relation inference in context. Empirical finding: current methods have very low coverage.
Going Forward
- Data is publicly available: bit.ly/2aCLuLB. It poses a new challenge.
- Code is publicly available: bit.ly/2b05EhK. Extend our methodology; build larger datasets for supervised learning.
Thank you!