Annotating Relation Inference for Improved Understanding

Discover how annotating relation inference in context can lead to deeper insights in natural language processing. Learn about evaluating algorithms and creating unbiased datasets for more accurate results.

  • Relation Inference
  • Natural Language Processing
  • Evaluation
  • Context Understanding

Presentation Transcript


  1. Annotating Relation Inference in Context via Question Answering. Omer Levy and Ido Dagan, Bar-Ilan University, Israel

  2. Relation Inference When is a given natural-language relation implied by another? X cures Y ⇒ X treats Y Question: Which drug treats headaches? Text: Aspirin cures headaches.

  3. Relation Inference in Context When is a given natural-language relation implied by another? X eliminates Y ⇒ X treats Y Question: Which drug treats headaches? Text: Aspirin eliminates headaches.

  4. Relation Inference in Context When is a given natural-language relation implied by another? X eliminates Y ⇒ X treats Y Question: Which drug treats patients? Text: Aspirin eliminates patients.
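
A minimal Python sketch of the contrast above, representing slides 2-4 as labeled premise/hypothesis pairs (the field names are illustrative, not taken from the released dataset):

    # Whether one relation implies another depends on the arguments (the context).
    examples = [
        {"premise": "Aspirin cures headaches",
         "hypothesis": "Aspirin treats headaches",
         "label": True},   # "cures" implies "treats" here
        {"premise": "Aspirin eliminates headaches",
         "hypothesis": "Aspirin treats headaches",
         "label": True},   # "eliminates" implies "treats" for ailments
        {"premise": "Aspirin eliminates patients",
         "hypothesis": "Aspirin treats patients",
         "label": False},  # the same rule fails for these arguments
    ]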

  5. Prior Art DIRT (Lin and Pantel, 2001) Universal Schema (Rocktäschel et al., NAACL 2015) PPDB 2.0 (Pavlick et al., ACL 2015) RELLY (Grycner et al., EMNLP 2015)

  6. This Talk Not about relation inference algorithms, but about how to evaluate relation inference algorithms Problem: current evaluations are biased and can't measure recall Contributions: a novel methodology for creating unbiased and natural datasets; a new benchmark for relation inference in context

  7. Evaluating Relation Inference in Context

  8. Extrinsic Evaluation Usage: plug the inference algorithm into an RTE system Problems: Mixes in other semantic phenomena Fewer relation inference examples Hard to trace/analyze System selection introduces bias We want an intrinsic evaluation too

  9. Post-hoc Evaluation Usage: apply relation inference, annotate inferred facts 1) Learn inference rules 2) Apply rules to text X eliminates Y ⇒ X treats Y aspirin eliminates headaches ⇒ aspirin treats headaches

  10. Post-hoc Evaluation Usage: apply relation inference, annotate inferred facts 1) Learn inference rules 2) Apply rules to text 3) Annotate for entailment X eliminates Y ⇒ X treats Y aspirin eliminates headaches ⇒ aspirin treats headaches

  11. Post-hoc Evaluation Usage: apply relation inference, annotate inferred facts Problems: Expensive research cycle Difficult to replicate Oblivious to recall We want a pre-annotated dataset
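
The post-hoc pipeline of slides 9-11 amounts to applying a learned rule to an extracted assertion and handing the inferred fact to annotators. A rough Python sketch under illustrative names (not the authors' code):

    # A rule maps a premise relation to a hypothesis relation,
    # e.g. "X eliminates Y" => "X treats Y".
    rule = {"premise_rel": "eliminates", "hypothesis_rel": "treats"}

    def apply_rule(rule, assertion):
        """If the assertion's relation matches the rule's premise, emit the inferred fact."""
        x, rel, y = assertion
        if rel == rule["premise_rel"]:
            return (x, rule["hypothesis_rel"], y)
        return None

    inferred = apply_rule(rule, ("aspirin", "eliminates", "headaches"))
    # -> ("aspirin", "treats", "headaches"), which is then annotated for entailment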

  12. Pre-annotated Dataset (Zeichner et al., 2012) Usage: compare relation inference to recorded post-hoc evaluation 1) Learn inference rules using DIRT 2) Apply rules to text (Open IE tuples) 3) Annotate for entailment aspirin eliminates headaches ⇒ aspirin treats headaches

  13. Pre-annotated Dataset (Zeichner et al., 2012) Usage: compare relation inference to recorded post-hoc evaluation 1) Learn inference rules using DIRT 2) Apply rules to text (Open IE tuples) 3) Annotate for entailment (steps 1-3 run once) aspirin eliminates headaches ⇒ aspirin treats headaches

  14. Pre-annotated Dataset (Zeichner et al., 2012) Usage: compare relation inference to recorded post-hoc evaluation 1) Learn inference rules using DIRT 2) Apply rules to text (Open IE tuples) 3) Annotate for entailment (steps 1-3 run once) 4) Compare new algorithms' predictions to annotated data (table columns: Premise | Hypothesis | Label | Algo; e.g., premise "aspirin eliminates headaches", hypothesis "aspirin treats headaches")

  15. Pre-annotated Dataset (Zeichner et al., 2012) Usage: compare relation inference to recorded post-hoc evaluation 1) Learn inference rules using DIRT 2) Apply rules to text (Open IE tuples) 3) Annotate for entailment (steps 1-3 run once) 4) Compare new algorithms' predictions to annotated data (table columns: Premise | Hypothesis | Label | Algo; e.g., premise "aspirin eliminates headaches", hypothesis "aspirin treats headaches")

  16. Pre-annotated Dataset (Zeichner et al., 2012) Usage: compare relation inference to recorded post-hoc evaluation Problems: Expensive research cycle Difficult to replicate Oblivious to recall Biased towards DIRT
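
Step 4 above reduces to scoring a new algorithm's yes/no predictions against the recorded labels; as slide 16 notes, every recorded pair was generated by DIRT's rules, so recall measured this way is still relative to DIRT's output. A minimal scoring sketch with illustrative field names:

    def precision_recall(examples, predict):
        """Score binary entailment predictions against pre-annotated labels."""
        tp = fp = fn = 0
        for ex in examples:
            pred = predict(ex["premise"], ex["hypothesis"])
            if pred and ex["label"]:
                tp += 1
            elif pred and not ex["label"]:
                fp += 1
            elif not pred and ex["label"]:
                fn += 1
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return precision, recall

    # Toy usage: an "everything entails" baseline on a single recorded pair.
    p, r = precision_recall(
        [{"premise": "aspirin eliminates headaches",
          "hypothesis": "aspirin treats headaches", "label": True}],
        lambda premise, hypothesis: True)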

  17. How can we do better?

  18. Desired Qualities of Evaluation Scheme Intrinsic task Pre-annotated dataset Sensitive to recall Not biased towards a particular method Crowdsourcable High quality labels

  19. Reformulating Relation Inference as Question Answering
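
The reformulation can be read as follows: a question fixes a hypothesis relation and one argument, a candidate answer supplies a premise assertion, and deciding whether the sentence answers the question is deciding whether the premise entails the instantiated hypothesis. A small illustrative sketch (the function and names are assumptions, not the released code):

    def to_inference_pair(question_rel, question_arg, assertion):
        """Map a question plus a candidate Open IE assertion to a premise/hypothesis pair."""
        x, rel, y = assertion
        answer_arg = x if y == question_arg else y
        premise = f"{x} {rel} {y}"
        hypothesis = f"{answer_arg} {question_rel} {question_arg}"
        return premise, hypothesis

    # "Which drug treats headaches?" + ("aspirin", "cures", "headaches")
    # -> ("aspirin cures headaches", "aspirin treats headaches")
    pair = to_inference_pair("treats", "headaches", ("aspirin", "cures", "headaches"))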

  20. Data Collection Questions: "Which [type] [relation] [argument]?", e.g., "Which ingredient is included in chocolate?" Candidate Answers: Open IE assertions, e.g., "chocolate is made from the cocoa bean"

  21. Data Collection: Questions Existing QA datasets: TREC (Voorhees and Tice, 2000), WikiAnswers (Fader et al., 2013), WebQuestions (Berant et al., 2013) Manually converted to the "Which [type] [relation] [argument]?" format: "Who climbed the Everest?" = "Which person climbed the Everest?"

  22. Data Collection: Questions Existing QA datasets: TREC (Voorhees and Tice, 2000), WikiAnswers (Fader et al., 2013), WebQuestions (Berant et al., 2013) key idea: naturally-occurring questions Manually converted to the "Which [type] [relation] [argument]?" format: "Who climbed the Everest?" = "Which person climbed the Everest?"

  23. Data Collection: Candidate Answers Extract Open IE assertions from Google's Syntactic N-grams, e.g., "chocolate is made from the cocoa bean" Given a question "Which [type] [relation] [argument]?", fetch all assertions where:

  24. Data Collection: Candidate Answers Extract Open IE assertions from Google's Syntactic N-grams, e.g., "chocolate is made from the cocoa bean" Given a question "Which [type] [relation] [argument]?", fetch all assertions where: 1) One of the arguments is equal to the question's argument

  25. Data Collection: Candidate Answers Extract Open IE assertions from Google's Syntactic N-grams, e.g., "chocolate is made from the cocoa bean" Given a question "Which [type] [relation] [argument]?", fetch all assertions where: 1) One of the arguments is equal to the question's argument 2) The other argument is a type of the question's type

  26. Data Collection: Candidate Answers Extract Open IE assertions from Google's Syntactic N-grams, e.g., "chocolate is made from the cocoa bean" Given a question "Which [type] [relation] [argument]?", fetch all assertions where: 1) One of the arguments is equal to the question's argument 2) The other argument is a type of the question's type 3) The relation is different from the question's relation

  27. Data Collection: Candidate Answers Extract Open IE assertions from Google's Syntactic N-grams, e.g., "chocolate is made from the cocoa bean" Given a question "Which [type] [relation] [argument]?", fetch all assertions where: 1) One of the arguments is equal to the question's argument 2) The other argument is a type of the question's type 3) The relation is different from the question's relation key idea: unbiased sample of relations
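
The three matching conditions translate directly into a filter over the extracted assertions. A sketch under stated assumptions: the is_a lookup is a hypothetical stand-in for however argument types are resolved, and none of the names come from the released code:

    # Toy data: the question "Which ingredient is included in chocolate?"
    # represented as (type, relation, argument), plus a few corpus assertions.
    question = ("ingredient", "is included in", "chocolate")
    assertions = [
        ("chocolate", "is made from", "the cocoa bean"),
        ("chocolate", "is included in", "the recipe"),  # same relation: excluded
        ("milk", "is poured into", "the glass"),        # no matching argument: excluded
    ]

    def is_a(entity, type_):
        # Hypothetical type lookup; a real system would consult a taxonomy.
        return type_ == "ingredient" and entity == "the cocoa bean"

    def matches(question, assertion):
        q_type, q_rel, q_arg = question
        x, rel, y = assertion
        if rel == q_rel:              # 3) the relation must differ from the question's
            return False
        if x == q_arg:
            other = y
        elif y == q_arg:
            other = x
        else:                         # 1) one argument must equal the question's argument
            return False
        return is_a(other, q_type)    # 2) the other argument must be of the question's type

    candidates = [a for a in assertions if matches(question, a)]
    # -> [("chocolate", "is made from", "the cocoa bean")]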

  28. Crowdsourced Annotation Given 1 question + 20 matching candidate answers Annotate each candidate answer as either: ✓ The sentence answers the question. ✗ The sentence does not answer the question. ? The sentence does not make sense, or is severely non-grammatical.

  29. Crowdsourced Annotation: Masking Answers The annotators are biased by their own world knowledge Q: Which country borders Ethiopia? A: Eritrea invaded Ethiopia

  30. Crowdsourced Annotation: Masking Answers The annotators are biased by their own world knowledge Q: Which country borders Ethiopia? A: Eritrea invaded Ethiopia (annotator knows: Eritrea borders Ethiopia)

  31. Crowdsourced Annotation: Masking Answers The annotators are biased by their own world knowledge Q: Which country borders Ethiopia? A: Eritrea invaded Ethiopia (annotator knows: Eritrea borders Ethiopia) A: Italy invaded Ethiopia

  32. Crowdsourced Annotation: Masking Answers The annotators are biased by their own world knowledge Q: Which country borders Ethiopia? A: Eritrea invaded Ethiopia (annotator knows: Eritrea borders Ethiopia) A: Italy invaded Ethiopia (annotator knows: Italy does not border Ethiopia)

  33. Crowdsourced Annotation: Masking Answers The annotators are biased by their own world knowledge Q: Which country borders Ethiopia? A: Eritrea invaded Ethiopia (annotator knows: Eritrea borders Ethiopia) A: Italy invaded Ethiopia (annotator knows: Italy does not border Ethiopia) Filter world knowledge from annotation by masking the answer

  34. Crowdsourced Annotation: Masking Answers The annotators are biased by their own world knowledge Q: Which country borders Ethiopia? A: [COUNTRY] invaded Ethiopia Filter world knowledge from annotation by masking the answer: substitute the answer's argument with its type
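
Masking can be as simple as replacing the candidate answer's argument with the question's type before the sentence is shown to annotators. A minimal sketch (illustrative, not the authors' implementation):

    def mask_answer(sentence, answer_arg, question_type):
        """Replace the answer's argument with its type placeholder."""
        return sentence.replace(answer_arg, "[" + question_type.upper() + "]")

    masked = mask_answer("Eritrea invaded Ethiopia", "Eritrea", "country")
    # -> "[COUNTRY] invaded Ethiopia"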

  35. Crowdsourced Annotation

  36. Crowdsourced Annotation No context switch → faster annotation!

  37. Crowdsourced Annotation: Aggregation 5 Mechanical Turk annotators per question-answer pair At least 4/5 must agree on label Discard nonsensical/non-grammatical (?) examples Labeled examples (after filtering): 16,371 Agreement with expert: ~90% F1
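
The aggregation rule on this slide (five judgments per question-answer pair, keep a label only when at least 4 of 5 agree, drop pairs marked nonsensical) can be sketched as follows; the exact tie-breaking and discard policy are assumptions:

    from collections import Counter

    def aggregate(judgments):
        """judgments: five labels per item, each 'yes', 'no', or '?' (nonsensical)."""
        label, votes = Counter(judgments).most_common(1)[0]
        if label == "?" or votes < 4:
            return None                 # discard: nonsensical or low agreement
        return label == "yes"

    aggregate(["yes", "yes", "yes", "yes", "no"])  # -> True
    aggregate(["yes", "no", "no", "yes", "?"])     # -> None (discarded)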

  38. Desired Qualities of Evaluation Scheme Intrinsic task Pre-annotated dataset Sensitive to recall Not biased towards a particular method Crowdsourcable High quality labels

  39. Desired Qualities of Evaluation Scheme Intrinsic task Pre-annotated dataset Sensitive to recall Not biased towards a particular method Crowdsourcable High quality labels (by definition)

  40. Desired Qualities of Evaluation Scheme Intrinsic task Pre-annotated dataset Sensitive to recall Not biased towards a particular method Crowdsourcable High quality labels (by definition) (answer relation ≠ question relation)

  41. Desired Qualities of Evaluation Scheme Intrinsic task Pre-annotated dataset Sensitive to recall Not biased towards a particular method Crowdsourcable High quality labels (by definition) (answer relation ≠ question relation) (16K examples)

  42. Desired Qualities of Evaluation Scheme Intrinsic task Pre-annotated dataset Sensitive to recall Not biased towards a particular method Crowdsourcable High quality labels (by definition) (answer relation ≠ question relation) (16K examples) (F1 90% vs expert)

  43. How many examples does post-hoc miss?

  44. Conclusions

  45. Recap Novel methodology for creating unbiased and natural datasets Key idea: Reformulate relation inference as question answering A new benchmark for relation inference in context Empirical finding: current methods have very low coverage

  46. Going Forward Data is publicly available! Poses a new challenge bit.ly/2aCLuLB Code is publicly available! Extend our methodology Larger datasets for supervised learning bit.ly/2b05EhK Thank you!
