Annotating Relation Inference for Improved Understanding

Discover how annotating relation inference in context can lead to deeper insights in natural language processing. Learn about evaluating algorithms and creating unbiased datasets for more accurate results.

  • Relation Inference
  • Natural Language Processing
  • Evaluation
  • Context Understanding

Presentation Transcript


  1. Annotating Relation Inference in Context via Question Answering. Omer Levy and Ido Dagan, Bar-Ilan University, Israel

  2. Relation Inference When is a given natural-language relation implied by another? X cures Y ⇒ X treats Y Question: Which drug treats headaches? Text: Aspirin cures headaches.

  3. Relation Inference in Context When is a given natural-language relation implied by another? X eliminates Y ⇒ X treats Y Question: Which drug treats headaches? Text: Aspirin eliminates headaches.

  4. Relation Inference in Context When is a given natural-language relation implied by another? X eliminates Y ⇒ X treats Y Question: Which drug treats patients? Text: Aspirin eliminates patients.
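
A minimal Python sketch of the contrast above, representing slides 2-4 as labeled premise/hypothesis pairs (the field names are illustrative, not taken from the released dataset):

    # Whether one relation implies another depends on the arguments (the context).
    examples = [
        {"premise": "Aspirin cures headaches",
         "hypothesis": "Aspirin treats headaches",
         "label": True},   # "cures" implies "treats" here
        {"premise": "Aspirin eliminates headaches",
         "hypothesis": "Aspirin treats headaches",
         "label": True},   # "eliminates" implies "treats" for ailments
        {"premise": "Aspirin eliminates patients",
         "hypothesis": "Aspirin treats patients",
         "label": False},  # the same rule fails for these arguments
    ]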

  5. Prior Art DIRT (Lin and Pantel, 2001) Universal Schema (Rocktäschel et al., NAACL 2015) PPDB 2.0 (Pavlick et al., ACL 2015) RELLY (Grycner et al., EMNLP 2015)

  6. This Talk Not about relation inference algorithms, but about how to evaluate relation inference algorithms Problem: current evaluations are biased and can't measure recall Contributions: a novel methodology for creating unbiased and natural datasets; a new benchmark for relation inference in context

  7. Evaluating Relation Inference in Context

  8. Extrinsic Evaluation Usage: plug the inference algorithm into an RTE system Problems: Mixes in other semantic phenomena Fewer relation inference examples Hard to trace/analyze System selection introduces bias We want an intrinsic evaluation too

  9. Post-hoc Evaluation Usage: apply relation inference, annotate inferred facts 1) Learn inference rules 2) Apply rules to text X eliminates Y ⇒ X treats Y aspirin eliminates headaches ⇒ aspirin treats headaches

  10. Post-hoc Evaluation Usage: apply relation inference, annotate inferred facts 1) Learn inference rules 2) Apply rules to text 3) Annotate for entailment X eliminates Y ⇒ X treats Y aspirin eliminates headaches ⇒ aspirin treats headaches

  11. Post-hoc Evaluation Usage: apply relation inference, annotate inferred facts Problems: Expensive research cycle Difficult to replicate Oblivious to recall We want a pre-annotated dataset
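
The post-hoc pipeline of slides 9-11 amounts to applying a learned rule to an extracted assertion and handing the inferred fact to annotators. A rough Python sketch under illustrative names (not the authors' code):

    # A rule maps a premise relation to a hypothesis relation,
    # e.g. "X eliminates Y" => "X treats Y".
    rule = {"premise_rel": "eliminates", "hypothesis_rel": "treats"}

    def apply_rule(rule, assertion):
        """If the assertion's relation matches the rule's premise, emit the inferred fact."""
        x, rel, y = assertion
        if rel == rule["premise_rel"]:
            return (x, rule["hypothesis_rel"], y)
        return None

    inferred = apply_rule(rule, ("aspirin", "eliminates", "headaches"))
    # -> ("aspirin", "treats", "headaches"), which is then annotated for entailment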

  12. Pre-annotated Dataset (Zeichner et al., 2012) Usage: compare relation inference to recorded post-hoc evaluation 1) Learn inference rules using DIRT 2) Apply rules to text (Open IE tuples) 3) Annotate for entailment aspirin eliminates headaches ⇒ aspirin treats headaches

  13. Pre-annotated Dataset (Zeichner et al., 2012) Usage: compare relation inference to recorded post-hoc evaluation 1) Learn inference rules using DIRT 2) Apply rules to text (Open IE tuples) 3) Annotate for entailment (steps 1-3 run once) aspirin eliminates headaches ⇒ aspirin treats headaches

  14. Pre-annotated Dataset (Zeichner et al., 2012) Usage: compare relation inference to recorded post-hoc evaluation 1) Learn inference rules using DIRT 2) Apply rules to text (Open IE tuples) 3) Annotate for entailment (steps 1-3 run once) 4) Compare new algorithms' predictions to annotated data (table columns: Premise | Hypothesis | Label | Algo; e.g., premise "aspirin eliminates headaches", hypothesis "aspirin treats headaches")

  15. Pre-annotated Dataset (Zeichner et al., 2012) Usage: compare relation inference to recorded post-hoc evaluation 1) Learn inference rules using DIRT 2) Apply rules to text (Open IE tuples) 3) Annotate for entailment (steps 1-3 run once) 4) Compare new algorithms' predictions to annotated data (table columns: Premise | Hypothesis | Label | Algo; e.g., premise "aspirin eliminates headaches", hypothesis "aspirin treats headaches")

  16. Pre-annotated Dataset (Zeichner et al., 2012) Usage: compare relation inference to recorded post-hoc evaluation Problems: Expensive research cycle Difficult to replicate Oblivious to recall Biased towards DIRT
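
Step 4 above reduces to scoring a new algorithm's yes/no predictions against the recorded labels; as slide 16 notes, every recorded pair was generated by DIRT's rules, so recall measured this way is still relative to DIRT's output. A minimal scoring sketch with illustrative field names:

    def precision_recall(examples, predict):
        """Score binary entailment predictions against pre-annotated labels."""
        tp = fp = fn = 0
        for ex in examples:
            pred = predict(ex["premise"], ex["hypothesis"])
            if pred and ex["label"]:
                tp += 1
            elif pred and not ex["label"]:
                fp += 1
            elif not pred and ex["label"]:
                fn += 1
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return precision, recall

    # Toy usage: an "everything entails" baseline on a single recorded pair.
    p, r = precision_recall(
        [{"premise": "aspirin eliminates headaches",
          "hypothesis": "aspirin treats headaches", "label": True}],
        lambda premise, hypothesis: True)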

  17. How can we do better?

  18. Desired Qualities of Evaluation Scheme Intrinsic task Pre-annotated dataset Sensitive to recall Not biased towards a particular method Crowdsourcable High quality labels

  19. Reformulating Relation Inference as Question Answering
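
The reformulation can be read as follows: a question fixes a hypothesis relation and one argument, a candidate answer supplies a premise assertion, and deciding whether the sentence answers the question is deciding whether the premise entails the instantiated hypothesis. A small illustrative sketch (the function and names are assumptions, not the released code):

    def to_inference_pair(question_rel, question_arg, assertion):
        """Map a question plus a candidate Open IE assertion to a premise/hypothesis pair."""
        x, rel, y = assertion
        answer_arg = x if y == question_arg else y
        premise = f"{x} {rel} {y}"
        hypothesis = f"{answer_arg} {question_rel} {question_arg}"
        return premise, hypothesis

    # "Which drug treats headaches?" + ("aspirin", "cures", "headaches")
    # -> ("aspirin cures headaches", "aspirin treats headaches")
    pair = to_inference_pair("treats", "headaches", ("aspirin", "cures", "headaches"))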

  20. Data Collection Questions: "Which [type] [relation] [argument]?", e.g., "Which ingredient is included in chocolate?" Candidate Answers: Open IE assertions, e.g., "chocolate is made from the cocoa bean"

  21. Data Collection: Questions Existing QA datasets: TREC (Voorhees and Tice, 2000), WikiAnswers (Fader et al., 2013), WebQuestions (Berant et al., 2013) Manually converted to the "Which [type] [relation] [argument]?" format: "Who climbed the Everest?" = "Which person climbed the Everest?"

  22. Data Collection: Questions Existing QA datasets: TREC (Voorhees and Tice, 2000), WikiAnswers (Fader et al., 2013), WebQuestions (Berant et al., 2013) key idea: naturally-occurring questions Manually converted to the "Which [type] [relation] [argument]?" format: "Who climbed the Everest?" = "Which person climbed the Everest?"

  23. Data Collection: Candidate Answers Extract Open IE assertions from Google's Syntactic N-grams, e.g., "chocolate is made from the cocoa bean" Given a question "Which [type] [relation] [argument]?", fetch all assertions where:

  24. Data Collection: Candidate Answers Extract Open IE assertions from Google's Syntactic N-grams, e.g., "chocolate is made from the cocoa bean" Given a question "Which [type] [relation] [argument]?", fetch all assertions where: 1) One of the arguments is equal to the question's argument

  25. Data Collection: Candidate Answers Extract Open IE assertions from Google's Syntactic N-grams, e.g., "chocolate is made from the cocoa bean" Given a question "Which [type] [relation] [argument]?", fetch all assertions where: 1) One of the arguments is equal to the question's argument 2) The other argument is a type of the question's type

  26. Data Collection: Candidate Answers Extract Open IE assertions from Google's Syntactic N-grams, e.g., "chocolate is made from the cocoa bean" Given a question "Which [type] [relation] [argument]?", fetch all assertions where: 1) One of the arguments is equal to the question's argument 2) The other argument is a type of the question's type 3) The relation is different from the question's relation

  27. Data Collection: Candidate Answers Extract Open IE assertions from Google's Syntactic N-grams, e.g., "chocolate is made from the cocoa bean" Given a question "Which [type] [relation] [argument]?", fetch all assertions where: 1) One of the arguments is equal to the question's argument 2) The other argument is a type of the question's type 3) The relation is different from the question's relation key idea: unbiased sample of relations
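
The three matching conditions translate directly into a filter over the extracted assertions. A sketch under stated assumptions: the is_a lookup is a hypothetical stand-in for however argument types are resolved, and none of the names come from the released code:

    # Toy data: the question "Which ingredient is included in chocolate?"
    # represented as (type, relation, argument), plus a few corpus assertions.
    question = ("ingredient", "is included in", "chocolate")
    assertions = [
        ("chocolate", "is made from", "the cocoa bean"),
        ("chocolate", "is included in", "the recipe"),  # same relation: excluded
        ("milk", "is poured into", "the glass"),        # no matching argument: excluded
    ]

    def is_a(entity, type_):
        # Hypothetical type lookup; a real system would consult a taxonomy.
        return type_ == "ingredient" and entity == "the cocoa bean"

    def matches(question, assertion):
        q_type, q_rel, q_arg = question
        x, rel, y = assertion
        if rel == q_rel:              # 3) the relation must differ from the question's
            return False
        if x == q_arg:
            other = y
        elif y == q_arg:
            other = x
        else:                         # 1) one argument must equal the question's argument
            return False
        return is_a(other, q_type)    # 2) the other argument must be of the question's type

    candidates = [a for a in assertions if matches(question, a)]
    # -> [("chocolate", "is made from", "the cocoa bean")]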

  28. Crowdsourced Annotation Given 1 question + 20 matching candidate answers Annotate each candidate answer as either: ✓ The sentence answers the question. ✗ The sentence does not answer the question. ? The sentence does not make sense, or is severely non-grammatical.

  29. Crowdsourced Annotation: Masking Answers The annotators are biased by their own world knowledge Q: Which country borders Ethiopia? A: Eritrea invaded Ethiopia

  30. Crowdsourced Annotation: Masking Answers The annotators are biased by their own world knowledge Q: Which country borders Ethiopia? A: Eritrea invaded Ethiopia (annotator knows: Eritrea borders Ethiopia)

  31. Crowdsourced Annotation: Masking Answers The annotators are biased by their own world knowledge Q: Which country borders Ethiopia? A: Eritrea invaded Ethiopia (annotator knows: Eritrea borders Ethiopia) A: Italy invaded Ethiopia

  32. Crowdsourced Annotation: Masking Answers The annotators are biased by their own world knowledge Q: Which country borders Ethiopia? A: Eritrea invaded Ethiopia (annotator knows: Eritrea borders Ethiopia) A: Italy invaded Ethiopia (annotator knows: Italy does not border Ethiopia)

  33. Crowdsourced Annotation: Masking Answers The annotators are biased by their own world knowledge Q: Which country borders Ethiopia? A: Eritrea invaded Ethiopia (annotator knows: Eritrea borders Ethiopia) A: Italy invaded Ethiopia (annotator knows: Italy does not border Ethiopia) Filter world knowledge from annotation by masking the answer

  34. Crowdsourced Annotation: Masking Answers The annotators are biased by their own world knowledge Q: Which country borders Ethiopia? A: [COUNTRY] invaded Ethiopia Filter world knowledge from annotation by masking the answer: substitute the answer's argument with its type
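
Masking can be as simple as replacing the candidate answer's argument with the question's type before the sentence is shown to annotators. A minimal sketch (illustrative, not the authors' implementation):

    def mask_answer(sentence, answer_arg, question_type):
        """Replace the answer's argument with its type placeholder."""
        return sentence.replace(answer_arg, "[" + question_type.upper() + "]")

    masked = mask_answer("Eritrea invaded Ethiopia", "Eritrea", "country")
    # -> "[COUNTRY] invaded Ethiopia"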

  35. Crowdsourced Annotation

  36. Crowdsourced Annotation No context switch → faster annotation!

  37. Crowdsourced Annotation: Aggregation 5 Mechanical Turk annotators per question-answer pair At least 4/5 must agree on label Discard nonsensical/non-grammatical (?) examples Labeled examples (after filtering): 16,371 Agreement with expert: ~90% F1
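
The aggregation rule on this slide (five judgments per question-answer pair, keep a label only when at least 4 of 5 agree, drop pairs marked nonsensical) can be sketched as follows; the exact tie-breaking and discard policy are assumptions:

    from collections import Counter

    def aggregate(judgments):
        """judgments: five labels per item, each 'yes', 'no', or '?' (nonsensical)."""
        label, votes = Counter(judgments).most_common(1)[0]
        if label == "?" or votes < 4:
            return None                 # discard: nonsensical or low agreement
        return label == "yes"

    aggregate(["yes", "yes", "yes", "yes", "no"])  # -> True
    aggregate(["yes", "no", "no", "yes", "?"])     # -> None (discarded)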

  38. Desired Qualities of Evaluation Scheme Intrinsic task Pre-annotated dataset Sensitive to recall Not biased towards a particular method Crowdsourcable High quality labels

  39. Desired Qualities of Evaluation Scheme Intrinsic task Pre-annotated dataset Sensitive to recall Not biased towards a particular method Crowdsourcable High quality labels (by definition)

  40. Desired Qualities of Evaluation Scheme Intrinsic task Pre-annotated dataset Sensitive to recall Not biased towards a particular method Crowdsourcable High quality labels (by definition) (answer relation ≠ question relation)

  41. Desired Qualities of Evaluation Scheme Intrinsic task Pre-annotated dataset Sensitive to recall Not biased towards a particular method Crowdsourcable High quality labels (by definition) (answer relation ≠ question relation) (16K examples)

  42. Desired Qualities of Evaluation Scheme Intrinsic task Pre-annotated dataset Sensitive to recall Not biased towards a particular method Crowdsourcable High quality labels (by definition) (answer relation ≠ question relation) (16K examples) (F1 90% vs expert)

  43. How many examples does post-hoc miss?

  44. Conclusions

  45. Recap Novel methodology for creating unbiased and natural datasets Key idea: Reformulate relation inference as question answering A new benchmark for relation inference in context Empirical finding: current methods have very low coverage

  46. Going Forward Data is publicly available! Poses a new challenge bit.ly/2aCLuLB Code is publicly available! Extend our methodology Larger datasets for supervised learning bit.ly/2b05EhK Thank you!
