Distant Supervision for Knowledge Base Population: Training and Challenges

This presentation applies distant supervision to knowledge base population, focusing on the TAC KBP 2010 slot filling task, with training data generated automatically from Wikipedia infoboxes. The approach maps infobox fields to slots, retrieves relevant sentences with information retrieval, and trains a multiclass classifier over slot candidates. Results are reported as per-slot precision, recall, and F1 when training on two thirds of the infoboxes and evaluating on the remaining third, with an overall F1 of 56.7. Challenges include improving the quality of the distantly supervised data, improving IR recall, and developing better classifiers for noisy text.

  • Knowledge base population
  • Distant supervision
  • Slot filling
  • Training
  • Challenges

Uploaded on Feb 26, 2025


Presentation Transcript


  1. Distant Supervision for Knowledge Base Population
     Mihai Surdeanu, David McClosky, John Bauer, Julie Tibshirani, Angel Chang, Valentin Spitkovsky, Christopher Manning

  2. Definition and Approach
     We took part in TAC KBP 2010 this year (both tasks).
     Slot filling task: learning a pre-defined set of relations and attributes for target entities based on documents in a collection. Example sentence: "Warren Buffett began studying at the Wharton School of Finance at the University of Pennsylvania, but transferred to the University of Nebraska, where he graduated." Extracted slots: (per:schools_attended, Warren Buffett, University of Pennsylvania) and (per:schools_attended, Warren Buffett, University of Nebraska).
     Distant supervision approach: generate training data automatically from Wikipedia infoboxes.
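
The labeling idea behind distant supervision can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the field-to-slot mapping in `FIELD_TO_SLOTS`, the NE labels in `SLOT_NE_TYPE`, the function `label_candidates`, and exact string matching are all hypothetical, and candidate mentions are assumed to be already extracted and NE-tagged. A mention becomes a positive example for every slot whose infobox value it matches (and whose expected NE label agrees); every other mention becomes an UNRELATED negative.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical one-to-many mapping from infobox fields to KBP slots.
FIELD_TO_SLOTS: Dict[str, List[str]] = {
    "alma_mater": ["per:schools_attended"],
    "birth_place": ["per:city_of_birth", "per:country_of_birth"],
}

# Hypothetical fine-grained NE label expected for each slot.
SLOT_NE_TYPE: Dict[str, str] = {
    "per:schools_attended": "ORGANIZATION",
    "per:city_of_birth": "CITY",
    "per:country_of_birth": "COUNTRY",
}

Candidate = Tuple[str, str]  # (mention text, fine-grained NE label)


def label_candidates(
    infobox: Dict[str, List[str]],
    sentences: List[Tuple[str, List[Candidate]]],
) -> List[Tuple[str, str, str]]:
    """Turn infobox facts into (sentence, mention, label) training examples."""
    # Collect the (slot, value) pairs the infobox asserts for this entity.
    facts: Set[Tuple[str, str]] = set()
    for field, values in infobox.items():
        for slot in FIELD_TO_SLOTS.get(field, []):
            for value in values:
                facts.add((slot, value))

    examples = []
    for sentence, candidates in sentences:
        for mention, ne_type in candidates:
            # Positive labels: slots whose infobox value equals the mention
            # and whose expected NE label matches the mention's NE label.
            labels = sorted(
                slot for slot, value in facts
                if value == mention and SLOT_NE_TYPE.get(slot) == ne_type
            )
            if not labels:
                labels = ["UNRELATED"]  # no infobox support: negative example
            for label in labels:
                examples.append((sentence, mention, label))
    return examples
```

One consequence is visible directly in the sketch: a true fact that the infobox happens to omit (for example, a school listed in the text but not in the infobox) is labeled UNRELATED, which is one source of noise in distantly supervised data.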

  3. Training and Evaluation
     Training: Infobox KB → map infobox fields to KBP slots (one-to-many mapping) → IR: find relevant sentences (query: entity name + slot value) → map KBP slots to fine-grained NE labels → extract +/- slot candidates → train multiclass classifier
     Evaluation: KBP query (entity name) → IR: find relevant sentences (query: entity name + trigger words) → extract slot candidates → classify candidates → inference (greedy, local) → extracted slots
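
The slides name the final evaluation step "Inference (greedy, local)" without further detail. The following is one plausible reading, offered only as a hypothetical sketch: each candidate keeps its own highest-probability label (a local decision), decisions are then accepted greedily in order of confidence, and a single-valued slot stops accepting values once it is filled. The function `greedy_local_inference`, the `threshold` parameter, and the `SINGLE_VALUED` set are all assumptions, not taken from the source.

```python
from typing import Dict, List, Tuple

# Hypothetical set of slots that accept only one value per entity.
SINGLE_VALUED = {"per:date_of_birth", "per:city_of_birth", "per:country_of_birth", "org:founded"}


def greedy_local_inference(
    predictions: List[Tuple[str, Dict[str, float]]],
    threshold: float = 0.5,
) -> Dict[str, List[str]]:
    """predictions: (candidate mention, label -> probability) from the classifier."""
    # Local step: take each candidate's highest-probability label, and drop
    # the candidate if that label is UNRELATED or its probability is too low.
    scored = []
    for mention, probs in predictions:
        label, p = max(probs.items(), key=lambda kv: kv[1])
        if label != "UNRELATED" and p >= threshold:
            scored.append((p, mention, label))

    # Greedy step: accept the most confident decisions first.
    scored.sort(reverse=True)
    slots: Dict[str, List[str]] = {}
    for _, mention, label in scored:
        values = slots.setdefault(label, [])
        if mention in values:
            continue  # duplicate value for this slot
        if label in SINGLE_VALUED and values:
            continue  # single-valued slot already filled by a stronger candidate
        values.append(mention)
    return slots
```

Because each candidate is decided independently before the greedy pass, the sketch never revisits a decision; joint inference across slots would be a different (and more expensive) design.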

  4. Results
     Training on 2/3 of the infoboxes, evaluating on the remaining 1/3. Evaluation uses only sentences that contain at least one valid slot. Rows show UNRELATED, the 10 most common slots, and the total over all slots (excluding UNRELATED). Correct, Predict, and Actual are counts of correct, predicted, and gold labels; P, R, and F1 are precision, recall, and F1 in percent.

     Label                         Correct Predict  Actual     P     R    F1
     UNRELATED                      268085  289135  295590  92.7  90.7  91.7
     org:city_of_headquarters         5835    9040    7514  64.5  77.7  70.5
     org:country_of_headquarters      2851    4638    3725  61.5  76.5  68.2
     org:founded                      3896    8199    6662  47.5  58.5  52.4
     org:parents                      1158    2292    2525  50.5  45.9  48.1
     org:top_members/employees        1282    3067    3596  41.8  35.7  38.5
     per:city_of_birth                1799    3920    3252  45.9  55.3  50.2
     per:country_of_birth             1984    4122    3204  48.1  61.9  54.2
     per:date_of_birth                3938    5427    4362  72.6  90.3  80.5
     per:member_of                    1771    3018    2887  58.7  61.3  60.0
     per:title                        1714    3364    3054  51.0  56.1  53.4
     Total (all slots)               37169   68822   62367  54.0  59.6  56.7
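
The P, R, and F1 columns follow from the three count columns: the reported figures are consistent with P = Correct / Predict, R = Correct / Actual, and F1 as their harmonic mean, which simplifies to 2 · Correct / (Predict + Actual). The helper below (its name `prf1` is ours, for illustration) recomputes the table rows from the counts, rounded to one decimal place as in the slide.

```python
from typing import Tuple


def prf1(correct: int, predicted: int, actual: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 in percent, rounded to one decimal."""
    p = correct / predicted if predicted else 0.0
    r = correct / actual if actual else 0.0
    # Harmonic mean of precision and recall; guard against 0/0.
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return round(100 * p, 1), round(100 * r, 1), round(100 * f1, 1)
```

For example, `prf1(37169, 68822, 62367)` reproduces the Total row, (54.0, 59.6, 56.7).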

  5. Challenges
     Improve the quality of the data generated through distant supervision.
     Improve IR recall: use relation-specific trigger words (or n-grams, dependency paths, etc.) to boost sentences likely to contain answers to the top of the ranking. How can these be acquired automatically?
     Better classifiers for noisy text (e.g., web snippets).
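
To make the trigger-word idea concrete, here is a minimal re-ranking sketch under stated assumptions: the trigger lists in `TRIGGERS` are hand-written and hypothetical (the slide asks precisely how to acquire them automatically), the function `rank_sentences` and its `boost` weight are illustrative, and matching is plain bag-of-words rather than n-grams or dependency paths. Sentences must mention the entity name, as in the KBP query, and each trigger hit raises the sentence's score.

```python
from typing import Dict, List, Tuple

# Hypothetical relation-specific trigger words, written by hand for illustration.
TRIGGERS: Dict[str, List[str]] = {
    "per:schools_attended": ["studied", "studying", "graduated", "attended", "alumnus"],
    "org:founded": ["founded", "established", "incorporated"],
}


def rank_sentences(
    entity: str, slot: str, sentences: List[str], boost: float = 1.0
) -> List[Tuple[float, str]]:
    """Score sentences that mention the entity, boosting those with slot triggers."""
    triggers = {t.lower() for t in TRIGGERS.get(slot, [])}
    ranked = []
    for s in sentences:
        if entity.lower() not in s.lower():
            continue  # require the entity name, as in the KBP query
        # Crude tokenization: split on whitespace, strip surrounding punctuation.
        tokens = [tok.strip(".,;:!?\"'()").lower() for tok in s.split()]
        hits = sum(1 for tok in tokens if tok in triggers)
        ranked.append((1.0 + boost * hits, s))
    # Highest score first; Python's sort is stable, so ties keep input order.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked
```

In a real IR engine the same effect would more likely come from query-time term boosting than from post-hoc re-ranking; the sketch only shows how trigger evidence shifts sentences toward the top.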
