Cross-Lingual Content Scoring: Enhancing Educational Equality

Explore cross-lingual content scoring: scoring students' free-text answers with training and test data in different languages. The approach aims to overcome data sparsity by reusing existing training data from another language, and to foster educational equality by focusing on content rather than language-specific correctness, so that non-native speakers who know the answer but struggle to express it can still show their knowledge. Discover the core ideas and motivations behind the method.

  • Cross-Lingual
  • Content Scoring
  • Educational Equality
  • Training Data
  • Test Data

Presentation Transcript


  1. Cross-Lingual Content Scoring. Andrea Horbach, Sebastian Stennmanns, Torsten Zesch, University Duisburg-Essen, Germany

  2. Cross-Lingual Content Scoring: Motivation. Core idea: content scoring of students' free-text answers where the training data and the test data are in different languages. Foster educational equality: a non-native speaker might know the answer but be unable to express it; teachers ignore spelling and grammar for content scoring; the correctness of content is not language-specific. Overcome data sparsity: re-use existing training data from a different language.

  3. Cross-Lingual Scoring: Core Idea. The standard monolingual content scoring case. Question: "After reading the group's procedure, describe what additional information you would need in order to replicate the experiment." Training data: LA1: "Some additional information you will need are the material. You also need to know the size of the container" LA2: "The additional information you need is one, the amount of vinegar you poured in each container, two, label the containers." Test data: LA1000: "You would need to know how many ml of vinegar they used, how much distilled water to rinse the samples with and how they obtained the mass of each sample." LA1001: "I would need to know the exact amount of vinegar in each container." The model is trained on the training data and applied to the test data.

  4. Cross-Lingual Scoring: Core Idea. In cross-lingual scoring, the training data are in English (LA1 and LA2, as on slide 3) but the test data are in German, so it is unclear how to apply the trained model. Test data: LA1000: "Es fehlt der Säuregehalt des Essigs. Die Menge Essig die verwendet wurde. Und welche Holzart da Holzsorten unterschiedliche Säureresistenz aufweist." (The acidity of the vinegar is missing. The amount of vinegar that was used. And which type of wood, since types of wood differ in acid resistance.) LA1001: "Wir müssen wissen, wie viel Wasser wir sammeln müssen, um die Probe zu machen" (We need to know how much water we have to collect to make the sample).

  5. Cross-Lingual Scoring: Core Idea. The English training answers are machine-translated (MT) into German; the model is then trained on the translated answers and applied to the German test answers (LA1000 and LA1001 from slide 4). LA1 (MT): "Einige zusätzliche Informationen, die Sie benötigen, sind das Material. Sie müssen auch die Größe des Containers kennen" LA2 (MT): "Die zusätzliche Information, die Sie brauchen, ist eine, die Menge an Essig, die Sie in jeden Behälter gießen, zwei, beschriften Sie die Behälter." (These are the machine translations of LA1 and LA2.)

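  6. Basic Experimental Setup: Using Machine Translation. Two options: translate the training data into the language of the test data, or translate the test data into the language of the training data. In both cases, the model is then trained and tested on data in the same language.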

  7. Outline: challenges of cross-lingual scoring; data collection; content scoring experiments. Horbach, Stennmanns, Zesch - Cross-Lingual Content Scoring | BEA 2018

  8. Challenges for Automatic Scoring. Quality of machine translation: are spelling errors translation errors, or should answers be normalized first? The correctly spelled "the vinegar" becomes "der Essig", but the misspelled "the vineger" becomes "der Vineger"; in contrast, both "separate" and the misspelling "seperate" are translated as "getrennt". Translationese. Nature of bilingual datasets: different learner populations; language and culture dependence of prompts, e.g., "If both the (US) President and the Vice President can no longer serve, who becomes President?"

  9. Pre-Study: Influence of MT Quality on Monolingual Scoring. Machine translation: Google Translate (English to German, English to Russian) and DeepL (English to German); both training and test data are translated. Data: 3 prompts from ASAP-2 (prompts 1, 2, and 10). Content scoring setup: Weka SVM classifier with token n-gram and character n-gram features. Figure: QWK per prompt for the original English data (EN) and for data translated into German by Google (DE - Google) and DeepL (DE - DeepL) and into Russian by Google (RU - Google).

  10. Collecting a Cross-Lingual Dataset. Option 1: collect data in two languages (full control over data collection, but time- and cost-intensive). Option 2: extend an existing dataset with another language: use existing data for English and re-collect data for the same prompts in, e.g., German. Requirements: prompt material available; language- and culture-independent; curriculum-independent; scoring guidelines available and applicable.

  11. Suitability of Existing Datasets. Datasets compared: ASAP-2, PG, SemEval, and Mohler & Mihalcea. Criteria: prompt available? Culture-independent? Curriculum-independent? Scoring guidelines available?

  12. Recollecting ASAP in German. Existing data (ASAP): more than 2,000 answers per prompt from US high school students; prompts cover English language arts (5), biology (2), and science (3). New data (ASAP-DE): 300 crowd-sourced answers per prompt for the 3 science prompts.

  13. Dataset Comparison: Label Distribution. Figure: relative frequency of answers with each score, EN vs. DE, for prompt 1 (0 to 3 points), prompt 2 (0 to 3 points), and prompt 10 (0 to 2 points).

  14. Dataset Comparison: Answer Length. Figure: average number of words per answer, EN vs. DE, by score (0 to 3 points).

  15. Dataset Comparison: Answer Length. Figure: average number of words per answer for EN, DE, EN translated, and DE translated, by score (0 to 3 points).

  16. Dataset Comparison: Linguistic Diversity. Figure: type-token ratio, EN vs. DE, by score (0 to 3 points).

  17. Dataset Comparison: Linguistic Diversity. Figure: type-token ratio for EN, DE, EN translated, and DE translated, by score (0 to 3 points).

  18. Content Scoring Results (QWK), baseline:
      • train ENall, test EN: 0.68
      • train EN, test EN: 0.61
      • train DE, test DE: 0.67

  19. Content Scoring Results (QWK): the baseline rows as on slide 18, plus translate both, where training and test data are both machine-translated (ENT: English data translated into German; DET: German data translated into English):
      • train ENT, test ENT: 0.58
      • train DET, test DET: 0.66

  20. Content Scoring Results (QWK): the baseline and translate-both rows as on slide 19, plus the cross-lingual conditions:
      • translate train, train ENT, test DE: 0.34
      • translate train, train DET, test EN: 0.40
      • translate test, train EN, test DET: 0.29
      • translate test, train DE, test ENT: 0.32

  21. Differences between Prompts. QWK per prompt (prompt 1 / prompt 2 / prompt 10) for the cross-lingual conditions:
      • translate train, train ENT, test DE: 0.49 / 0.08 / 0.46
      • translate train, train DET, test EN: 0.41 / 0.39 / 0.39
      • translate test, train EN, test DET: 0.35 / 0.08 / 0.43
      • translate test, train DE, test ENT: 0.26 / 0.35 / 0.33

  22. The Influence of Translationese. Hypothesis: maybe combining translated and original data is the problem. Example of an original English answer, "(A) Plastic type B was the superior in both trial 1 and trial 2. (B) Record the weight that was put on to show how much effected each plastic. Also conducting more trials (...)", and the same answer after machine translation into German and back: "Type B plastic was the supervisor in both Trial 1 and Trial 2. (B) Write down the weight that was put on to show how much each one has made plastic. Also do more experiments (...)". Idea: translate the test data and double-translate the training data, so that both consist of machine-translated text. However, this makes little difference.

  23. Conclusions and Future Work. We collected a German version of the ASAP-2 dataset: https://github.com/ltl-ude/crosslingual. In these first experiments, cross-lingual scoring using MT does not work that well, and the results depend strongly on the individual prompt. Future work: understand the influencing factors better (language, learner population, machine translation artifacts) and explore alternatives to machine translation, such as cross-lingual embeddings. Thank you! Questions?
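Slide 6 describes two ways of bringing training and test answers into one language before scoring. The following is a minimal Python sketch of that data flow, not the authors' code. The `translate` callable is a hypothetical stand-in for an MT system such as Google Translate or DeepL, and `toy_translate` is an illustrative word-for-word dictionary that exists only to make the example runnable.

```python
from typing import Callable, List, Tuple

# Hypothetical MT interface: translate(text, source_lang, target_lang) -> translated text.
Translate = Callable[[str, str, str], str]


def prepare_cross_lingual(
    train: List[str],
    train_lang: str,
    test: List[str],
    test_lang: str,
    mode: str,
    translate: Translate,
) -> Tuple[List[str], List[str]]:
    """Return (train, test) answers in one shared language.

    mode="translate_train": translate the training answers into the test language.
    mode="translate_test":  translate the test answers into the training language.
    """
    if mode == "translate_train":
        return [translate(t, train_lang, test_lang) for t in train], list(test)
    if mode == "translate_test":
        return list(train), [translate(t, test_lang, train_lang) for t in test]
    raise ValueError(f"unknown mode: {mode!r}")


# Toy word-for-word "MT" for illustration only (hypothetical, not a real system).
_EN_DE = {"the": "der", "amount": "Menge", "of": "an", "vinegar": "Essig"}
_DE_EN = {de.lower(): en for en, de in _EN_DE.items()}


def toy_translate(text: str, src: str, tgt: str) -> str:
    if (src, tgt) == ("en", "de"):
        table = _EN_DE
    elif (src, tgt) == ("de", "en"):
        table = _DE_EN
    else:
        raise ValueError(f"unsupported language pair: {src}->{tgt}")
    # Unknown words are copied unchanged (compare "the vineger" -> "der Vineger" on slide 8).
    return " ".join(table.get(w.lower(), w) for w in text.split())


if __name__ == "__main__":
    train, test = prepare_cross_lingual(
        ["the amount of vinegar"], "en", ["der Essig"], "de", "translate_train", toy_translate
    )
    print(train, test)  # ['der Menge an Essig'] ['der Essig']
```

In the translate-train setting only the training side passes through MT, so the model learns from translated text but is applied to original answers; in the translate-test setting the situation is reversed. Slide 22 returns to this asymmetry.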
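Slide 8 asks whether misspellings should be left to the MT system or normalized first. One way to try the normalization option is sketched below, assuming a reference vocabulary; the `VOCAB` list is hypothetical, and in practice it might come from a dictionary or from correctly spelled training answers. The sketch uses Python's standard `difflib.get_close_matches` to map an unknown word to its most similar vocabulary entry.

```python
import difflib
from typing import Iterable


def normalize_spelling(text: str, vocabulary: Iterable[str], cutoff: float = 0.8) -> str:
    """Replace each out-of-vocabulary token with its closest vocabulary word, if one is close enough.

    Tokens are split on whitespace; known words are lowercased, unmatched tokens are kept
    unchanged. Punctuation is not handled in this sketch.
    """
    vocab = sorted({w.lower() for w in vocabulary})
    known = set(vocab)
    normalized = []
    for token in text.split():
        word = token.lower()
        if word in known:
            normalized.append(word)
            continue
        # difflib ranks candidates by SequenceMatcher.ratio(); keep only the best one.
        match = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
        normalized.append(match[0] if match else token)
    return " ".join(normalized)


# Hypothetical reference vocabulary for the examples from slide 8.
VOCAB = ["the", "vinegar", "but", "separate", "amount", "container"]

if __name__ == "__main__":
    print(normalize_spelling("the vineger", VOCAB))   # the vinegar
    print(normalize_spelling("but seperate", VOCAB))  # but separate
```

The `cutoff` threshold trades recall for precision: lowering it corrects more misspellings but also risks replacing legitimate words that are simply missing from the vocabulary.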
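The pre-study on slide 9 uses token and character n-grams as features for a Weka SVM classifier. The sketch below shows only the feature extraction step in plain Python; the n ranges and the `tok`/`chr` feature-name prefixes are assumptions for illustration, and the resulting counts would still have to be passed to a classifier.

```python
from collections import Counter
from typing import Iterable


def token_ngrams(text: str, n: int) -> Counter:
    """Count lowercased, whitespace-separated token n-grams."""
    tokens = text.lower().split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def char_ngrams(text: str, n: int) -> Counter:
    """Count lowercased character n-grams, spaces included."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_features(text: str, token_ns: Iterable[int] = (1, 2, 3),
                   char_ns: Iterable[int] = (2, 3, 4, 5)) -> Counter:
    """Combine token and character n-gram counts into one sparse feature vector.

    The default n ranges are illustrative, not the settings used in the study.
    """
    features: Counter = Counter()
    for n in token_ns:
        features.update({f"tok{n}:{g}": c for g, c in token_ngrams(text, n).items()})
    for n in char_ns:
        features.update({f"chr{n}:{g}": c for g, c in char_ngrams(text, n).items()})
    return features


if __name__ == "__main__":
    print(ngram_features("the vinegar", token_ns=(1,), char_ns=(3,)))
```

Character n-grams give some robustness to misspellings: "vineger" and "vinegar" share three of their five character trigrams ("vin", "ine", "neg"), although they share no token unigram.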
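The results on slides 9 and 18 to 21 are reported as quadratic weighted kappa (QWK), the agreement between predicted and gold scores. Below is a minimal pure-Python implementation of the standard definition, kappa = 1 - sum(w_ij * O_ij) / sum(w_ij * E_ij), where O is the observed confusion matrix, E is the matrix expected by chance from the two score histograms, and w_ij = (i - j)^2 / (k - 1)^2 penalizes larger disagreements more heavily. The function name and signature are this sketch's own.

```python
from typing import List, Sequence


def quadratic_weighted_kappa(gold: Sequence[int], predicted: Sequence[int],
                             min_score: int, max_score: int) -> float:
    """Quadratic weighted kappa between two integer score sequences in [min_score, max_score]."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    if max_score <= min_score:
        raise ValueError("max_score must be greater than min_score")
    if any(not min_score <= s <= max_score for s in (*gold, *predicted)):
        raise ValueError("scores must lie in [min_score, max_score]")
    k = max_score - min_score + 1
    n = len(gold)

    # Observed confusion matrix O[i][j]: number of answers with gold = i and predicted = j.
    observed: List[List[float]] = [[0.0] * k for _ in range(k)]
    for g, p in zip(gold, predicted):
        observed[g - min_score][p - min_score] += 1

    # Marginal score histograms of the two raters.
    hist_gold = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    numerator = denominator = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            expected = hist_gold[i] * hist_pred[j] / n  # E[i][j] under chance agreement
            numerator += weight * observed[i][j]
            denominator += weight * expected
    if denominator == 0.0:
        # Both raters used one identical score throughout; kappa is undefined,
        # and this sketch returns 1.0 for that perfect-agreement case.
        return 1.0
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    print(quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 1, 1], 0, 1))  # 0.5
```

Judging by the score labels on slide 13, `min_score` would be 0 for all three prompts, with `max_score` 3 for prompts 1 and 2 and 2 for prompt 10.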
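Slides 13 to 17 compare the English and German data by label distribution, average answer length, and type-token ratio per score. A sketch of these three statistics follows. The slides do not state how the type-token ratio was computed; this sketch assumes it is pooled over all answers with the same score and that tokens are lowercased, whitespace-separated words. The sample answers in the usage example are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def label_distribution(scores: Sequence[int]) -> Dict[int, float]:
    """Relative frequency of each score (as on slide 13)."""
    counts = Counter(scores)
    n = len(scores)
    return {s: counts[s] / n for s in sorted(counts)}


def avg_words_per_score(answers: Sequence[str], scores: Sequence[int]) -> Dict[int, float]:
    """Average number of whitespace-separated words per answer, grouped by score (slides 14-15)."""
    lengths: Dict[int, List[int]] = defaultdict(list)
    for text, s in zip(answers, scores):
        lengths[s].append(len(text.split()))
    return {s: sum(v) / len(v) for s, v in sorted(lengths.items())}


def ttr_per_score(answers: Sequence[str], scores: Sequence[int]) -> Dict[int, float]:
    """Type-token ratio per score, pooled over all answers with that score (assumption; slides 16-17)."""
    tokens: Dict[int, List[str]] = defaultdict(list)
    for text, s in zip(answers, scores):
        tokens[s].extend(text.lower().split())
    return {s: len(set(t)) / len(t) for s, t in sorted(tokens.items()) if t}


if __name__ == "__main__":
    # Made-up example answers and scores.
    answers = ["the vinegar", "the amount of vinegar", "the size of the container"]
    scores = [0, 1, 1]
    print(label_distribution(scores))
    print(avg_words_per_score(answers, scores))  # {0: 2.0, 1: 4.5}
    print(ttr_per_score(answers, scores))
```

Pooled type-token ratios fall as the amount of text grows, so score groups with longer or more numerous answers tend to show lower values. The same functions can be applied to the translated answers compared on slides 15 and 17.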
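Slide 22 proposes making training and test data equally "translationese" by translating the German test data into English and round-trip translating the English training data (English to German and back). A minimal sketch follows; `translate` is again a hypothetical MT callable, and the tiny dictionaries behind `toy_translate` are invented purely to reproduce the kind of artifact shown on the slide ("superior" becoming "supervisor").

```python
from typing import Callable, List, Tuple

# Hypothetical MT interface: translate(text, source_lang, target_lang) -> translated text.
Translate = Callable[[str, str, str], str]


def round_trip(text: str, source: str, pivot: str, translate: Translate) -> str:
    """Translate text into a pivot language and back into its source language."""
    return translate(translate(text, source, pivot), pivot, source)


def match_translationese(train_en: List[str], test_de: List[str],
                         translate: Translate) -> Tuple[List[str], List[str]]:
    """Make both sides machine-translated English.

    Training answers: EN -> DE -> EN (double translation).
    Test answers:     DE -> EN (single translation).
    """
    train = [round_trip(t, "en", "de", translate) for t in train_en]
    test = [translate(t, "de", "en") for t in test_de]
    return train, test


# Invented toy dictionaries; the EN -> DE -> EN path turns "superior" into "supervisor".
_TOY = {
    ("en", "de"): {"superior": "Vorgesetzter", "plastic": "Kunststoff"},
    ("de", "en"): {"vorgesetzter": "supervisor", "kunststoff": "plastic"},
}


def toy_translate(text: str, src: str, tgt: str) -> str:
    table = _TOY[(src, tgt)]  # KeyError for language pairs the toy does not cover
    return " ".join(table.get(w.lower(), w) for w in text.split())


if __name__ == "__main__":
    train, test = match_translationese(
        ["plastic B was the superior"], ["Kunststoff B"], toy_translate
    )
    print(train, test)  # ['plastic B was the supervisor'] ['plastic B']
```

According to slide 22, this matching makes little difference to the results, which suggests that mixing original and translated text is not the main cause of the lower cross-lingual scores.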
