Bayesian Knowledge Tracing and Predictive Models in Educational Data Mining


Explore the concept of Bayesian Knowledge Tracing and other predictive models in educational data mining presented by Zachary A. Pardos at the PSLC Summer School 2011. Learn about the history, intuition, model parameters, and applications of Knowledge Tracing in tracking student knowledge over time. Dive into the evaluations and variations of these models for effective educational data analysis.

  • Bayesian Knowledge Tracing
  • Predictive Models
  • Educational Data Mining
  • Zachary A. Pardos
  • Learning Analysis


Presentation Transcript


  1. Bayesian Knowledge Tracing and Other Predictive Models in Educational Data Mining. Zachary A. Pardos, PSLC Summer School 2011.

  2. Outline of Talk: Introduction to Knowledge Tracing (history, intuition, model, demo, variations and other models, evaluations: Baker work / KDD); Random Forests (description, evaluations: KDD); time permitting, a vote on the next topic.

  3. Intro to Knowledge Tracing: History. Introduced in 1995 (Corbett & Anderson, UMUAI). Based on the ACT-R theory of skill knowledge (Anderson, 1993). Computations based on a variation of Bayesian calculations proposed in 1972 (Atkinson).

  4. Intro to Knowledge Tracing: Intuition. Based on the idea that practice on a skill leads to mastery of that skill. Has four parameters used to describe student performance. Relies on a KC (knowledge component) model. Tracks student knowledge over time.

  5. Intro to Knowledge Tracing. For some skill K: given a student's chronological response sequence 1 to n, predict response n+1. Example sequence for student Y: 0 1 0 1 1 0 ? (0 = incorrect response, 1 = correct response).

  6. Intro to Knowledge Tracing. Track knowledge over time (a model of learning), e.g. over the response sequence 1 0 1 0 1 1 0.

  7. Intro to Knowledge Tracing: Model Parameters. P(L0) = probability of initial knowledge; P(T) = probability of learning; P(G) = probability of guess; P(S) = probability of slip. Knowledge Tracing (KT) can be represented as a simple HMM with a latent knowledge node K and an observed question node Q at each opportunity; both nodes are two-state (0 or 1). P(L0) sets the initial state of K, P(T) governs the transitions between successive K nodes, and P(G) and P(S) govern the emission from K to Q.

  8. Intro to Knowledge Tracing: Model Parameters. The four parameters of the KT model are P(L0), P(T), P(G), and P(S), attached to the HMM as above. The probability of forgetting is assumed to be zero (fixed).

  9. Intro to Knowledge Tracing: Formulas for inference and prediction (derivation: Reye, JAIED 2004). The formulas use Bayes' theorem to make inferences about the latent variable:

  P(Ln-1 | correct) = P(Ln-1)(1 - P(S)) / [ P(Ln-1)(1 - P(S)) + (1 - P(Ln-1)) P(G) ]   (1)

  P(Ln-1 | incorrect) = P(Ln-1) P(S) / [ P(Ln-1) P(S) + (1 - P(Ln-1)) (1 - P(G)) ]   (2)

  P(Ln) = P(Ln-1 | evidence) + (1 - P(Ln-1 | evidence)) P(T)   (3)

  The predicted probability of a correct response at opportunity n is then P(Ln)(1 - P(S)) + (1 - P(Ln)) P(G).
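The update and prediction equations above can be sketched in a few lines of Python (an illustrative implementation, not code from the talk; the example parameter values are the ones used on the later "influence of parameter values" slide):

```python
# Illustrative Python sketch of equations (1)-(3); not code from the talk.
def kt_posterior(prior, correct, guess, slip):
    """Bayes update of P(L) after one observed response (Eq. 1 / Eq. 2)."""
    if correct:
        num = prior * (1 - slip)
        denom = num + (1 - prior) * guess
    else:
        num = prior * slip
        denom = num + (1 - prior) * (1 - guess)
    return num / denom

def kt_step(prior, correct, guess, slip, transit):
    """Posterior update followed by the learning transition (Eq. 3)."""
    post = kt_posterior(prior, correct, guess, slip)
    return post + (1 - post) * transit

def kt_predict(p_know, guess, slip):
    """P(correct) at the next opportunity (forgetting assumed zero)."""
    return p_know * (1 - slip) + (1 - p_know) * guess

# Trace P(L) over a short response sequence with example parameter values
p = 0.50                                     # P(L0)
for obs in [0, 1, 1]:
    p = kt_step(p, obs, guess=0.14, slip=0.09, transit=0.20)
```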

  10. Intro to Knowledge Tracing: Model Training. In the training step, values of the parameters P(T), P(G), P(S) & P(L0) are used to predict students' response sequences (e.g. Student A: 0 1 0 1 1 0; Student B: 1 0 1 1 0 0; Student C: 0 1 1 0). Ad-hoc values could be used but will likely not be the best fitting. Goal: find the set of parameter values that minimizes prediction error.

  11. Intro to Knowledge Tracing: Model Prediction (model tracing step). Skill: Subtraction. Using the fitted parameters, the estimate of knowledge P(K) is updated after each observed response; the slide traces P(K) values of 10%, 45%, 75%, 79%, and 83% over the student's last three responses to Subtraction questions in the unit (0, 1, 1), with predicted probabilities of correctness P(Q) of 71% and 74% on the test-set questions.

  12. Intro to Knowledge Tracing: Influence of parameter values. Estimate of knowledge for a student with response sequence 0 1 1 1 1 1 1 1 1 1, with P(L0) = 0.50, P(T) = 0.20, P(G) = 0.14, P(S) = 0.09: the student reached 95% probability of knowledge after the 4th opportunity.

  13. Intro to Knowledge Tracing: Influence of parameter values. Same response sequence (0 1 1 1 1 1 1 1 1 1), but with P(L0) = 0.50, P(T) = 0.20, P(G) = 0.64, P(S) = 0.03: the student reached 95% probability of knowledge only after the 8th opportunity.

  14. Intro to Knowledge Tracing (Demo)

  15. Intro to Knowledge Tracing: Variations on Knowledge Tracing (and other models)

  16. Intro to Knowledge Tracing: Prior Individualization Approach. Do all students enter a lesson with the same background knowledge? Knowledge Tracing with an individualized P(L0): an observed multi-state student node S (values 1 to N) is added to the network, and the prior becomes P(L0|S). K and Q remain two-state (0 or 1).

  17. Intro to Knowledge Tracing: Prior Individualization Approach. Conditional probability tables (CPTs) of the student node and the individualized prior node. The CPT of the observed student node is fixed (e.g. uniform: P(S = value) = 1/N for each of the N values). It is possible to have an S value for every student ID, which raises an initialization issue (where do these prior values come from?). The S value can instead represent a cluster or type of student rather than an ID.

  18. Intro to Knowledge Tracing: Prior Individualization Approach. CPT of the individualized prior node: the individualized L0 values need to be seeded (e.g. P(L0|S) values of 0.05, 0.30, 0.95, ..., 0.92 for S = 1, 2, 3, ..., N). This CPT can be fixed or the values can be learned. Fixing this CPT and seeding it with values based on a student's first response can be an effective strategy. This model, which individualizes only L0, is the Prior Per Student (PPS) model.

  19. Intro to Knowledge Tracing: Prior Individualization Approach. Bootstrapping the prior from the first response: if a student answers the first question incorrectly (S = 0), she gets a low prior (e.g. P(L0|S=0) = 0.05); if she answers it correctly (S = 1), she gets a higher prior (e.g. P(L0|S=1) = 0.30).

  20. Intro to Knowledge Tracing: Prior Individualization Approach. What values to use for the two priors?

  21. Intro to Knowledge Tracing: Prior Individualization Approach. What values to use for the two priors? Option 1: use ad-hoc values (e.g. P(L0|S=0) = 0.10, P(L0|S=1) = 0.85).

  22. Intro to Knowledge Tracing: Prior Individualization Approach. What values to use for the two priors? Option 2: learn the values (fit P(L0|S=0) and P(L0|S=1) with EM).

  23. Intro to Knowledge Tracing: Prior Individualization Approach. What values to use for the two priors? Option 3: link the prior CPT with the guess/slip CPT: P(L0|S=0) = P(S) and P(L0|S=1) = 1 - P(G).

  24. Intro to Knowledge Tracing: Prior Individualization Approach. With ASSISTments, PPS (ad-hoc) achieved an R^2 of 0.301, versus 0.176 with standard KT (Pardos & Heffernan, UMAP 2010).

  25. Intro to Knowledge Tracing: Variations on Knowledge Tracing (and other models)

  26. Intro to Knowledge Tracing. 1. BKT-BF: learns values for the four KT parameters (P(L0), P(T), P(G), P(S)) by performing a grid search (0.01 granularity) and chooses the set of parameters with the best squared error. (Baker et al., 2010)
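A minimal sketch of the BKT-BF idea in Python (hypothetical code, with a coarse 0.1 grid instead of the 0.01 granularity the slide describes, to keep the search small):

```python
import itertools

def kt_sse(params, sequences):
    """Sum of squared KT prediction errors over all response sequences."""
    L0, T, G, S = params
    sse = 0.0
    for seq in sequences:
        p = L0
        for obs in seq:
            pred = p * (1 - S) + (1 - p) * G   # P(correct) before observing
            sse += (pred - obs) ** 2
            if obs:                            # Bayes update, then learning
                post = p * (1 - S) / (p * (1 - S) + (1 - p) * G)
            else:
                post = p * S / (p * S + (1 - p) * (1 - G))
            p = post + (1 - post) * T
    return sse

def bkt_bf(sequences, step=0.1):
    """Brute-force grid search over (L0, T, G, S)."""
    grid = [i * step for i in range(1, round(1 / step))]
    return min(itertools.product(grid, repeat=4),
               key=lambda prm: kt_sse(prm, sequences))

data = [[0, 1, 0, 1, 1], [1, 1, 1, 0, 1], [0, 0, 1, 1, 1]]
L0, T, G, S = bkt_bf(data)
```

At 0.01 granularity the grid has 99^4 cells, which is why BKT-BF is usually the slowest of these fitting procedures.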

  27. Intro to Knowledge Tracing. 2. BKT-EM: learns values for the parameters with Expectation Maximization (EM), maximizing the log likelihood fit to the data. (Chang et al., 2006)

  28. Intro to Knowledge Tracing. 3. BKT-CGS: guess and slip parameters are assessed contextually using a regression on features generated from student performance in the tutor. (Baker, Corbett, & Aleven, 2008)

  29. Intro to Knowledge Tracing. 4. BKT-CSlip: uses the student's averaged contextual slip parameter learned across all incorrect actions. (Baker, Corbett, & Aleven, 2008)

  30. Intro to Knowledge Tracing. 5. BKT-LessData: limits each student's response sequence to the most recent 15 responses (max) during EM training. (Nooraiei et al., 2011)

  31. Intro to Knowledge Tracing. 6. BKT-PPS: Prior Per Student (PPS) model, which individualizes the prior parameter; students are assigned a prior based on their response to the first question. (Pardos & Heffernan, 2010)
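A sketch of how a PPS prior might be seeded from the first response, using the guess/slip linking heuristic from the earlier prior-individualization slides (illustrative parameter values, not the study's code):

```python
def pps_prior(first_correct, guess, slip):
    """Seed an individualized P(L0) from the first response, using the
    guess/slip linking heuristic: P(S) if wrong, 1 - P(G) if right."""
    return (1 - guess) if first_correct else slip

# Example with illustrative guess/slip values
prior_hi = pps_prior(True, guess=0.14, slip=0.09)
prior_lo = pps_prior(False, guess=0.14, slip=0.09)
```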

  32. Intro to Knowledge Tracing. 7. CFAR: Correct on First Attempt Rate calculates the student's percent correct on the current skill up until the question being predicted. Example responses for skill X: 0 1 0 1 0 1 _ ; the predicted next response would be 0.50. (Yu et al., 2010)
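CFAR is simple enough to state directly in code (illustrative):

```python
def cfar_predict(responses):
    """Percent correct so far on the skill = predicted next response."""
    return sum(responses) / len(responses)

# The slide's example: responses 0 1 0 1 0 1 give a prediction of 0.50
prediction = cfar_predict([0, 1, 0, 1, 0, 1])
```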

  33. Intro to Knowledge Tracing. 8. Tabling: uses the student's response sequence (max length 3) to predict the next response by looking up the average next response among students with the same sequence in the training set. Training set: Student A: 0 1 1 0; Student B: 0 1 1 1; Student C: 0 1 1 1. With the max table length set to 3, the table size is 2^0 + 2^1 + 2^2 + 2^3 = 15. Test-set student: 0 1 1 _ ; the predicted next response would be 0.66. (Wang et al., 2011)
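A sketch of the tabling idea in Python (illustrative; here the lookup key is simply the most recent up-to-3 responses, rather than the full 15-entry table described on the slide):

```python
from collections import defaultdict

def build_table(training_sequences, max_len=3):
    """Map each observed recent-response prefix (up to max_len long) to the
    average next response across the training set."""
    nexts = defaultdict(list)
    for seq in training_sequences:
        for i in range(len(seq) - 1):
            key = tuple(seq[max(0, i + 1 - max_len):i + 1])
            nexts[key].append(seq[i + 1])
    return {k: sum(v) / len(v) for k, v in nexts.items()}

# The slide's training set: students A, B, and C
table = build_table([[0, 1, 1, 0], [0, 1, 1, 1], [0, 1, 1, 1]])
pred = table[(0, 1, 1)]   # average next response after seeing 0 1 1
```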

  34. Intro to Knowledge Tracing. 9. PFA: Performance Factors Analysis, a logistic regression model which elaborates on the Rasch IRT model. It predicts performance based on the counts of the student's prior failures and successes on the current skill; an overall difficulty parameter is also fit for each skill or each item. In this study we use the variant of PFA that fits a difficulty per skill. The PFA equation is:

  m(i, j, s, f) = beta_j + gamma_j * s_ij + rho_j * f_ij,  with P(correct) = 1 / (1 + e^-m)

  where s_ij and f_ij are student i's prior counts of successes and failures on skill j. (Pavlik et al., 2009)
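The PFA prediction for a single skill can be sketched as (illustrative parameter values, not fit to any dataset):

```python
import math

def pfa_predict(successes, failures, beta, gamma, rho):
    """P(correct) = logistic(beta + gamma*successes + rho*failures)."""
    m = beta + gamma * successes + rho * failures
    return 1.0 / (1.0 + math.exp(-m))

# Hypothetical parameter values: a slightly hard skill where successes
# raise and failures lower the predicted probability of correctness
p = pfa_predict(successes=3, failures=1, beta=-0.5, gamma=0.4, rho=-0.2)
```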

  35. Intro to Knowledge Tracing: Methodology, Evaluation Study Dataset. Cognitive Tutor for Genetics; 76 CMU undergraduate students; 9 skills (no multi-skill steps); 23,706 problem solving attempts across 11,582 problem steps in the tutor; 152 average problem steps completed per student (SD = 50). Pre- and post-tests were administered with this assignment.

  36. Intro to Knowledge Tracing: Methodology, in-tutor model prediction. Predictions were made by the 9 models using 5-fold cross-validation by student, producing a table of per-response predictions (one row per student / skill / response, with each model's predicted probability alongside the actual response). Accuracy was calculated with A' for each student; those values were then averaged across students to report each model's A' (higher is better).
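A' can be computed as the probability that a randomly chosen correct response received a higher prediction than a randomly chosen incorrect one (ties counted as half); a sketch, assuming this standard pairwise formulation of A':

```python
def a_prime(preds, actuals):
    """Probability that a random positive outranks a random negative
    (ties counted as 0.5); undefined if a student has only one class."""
    pos = [p for p, a in zip(preds, actuals) if a == 1]
    neg = [p for p, a in zip(preds, actuals) if a == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy per-student predictions and actual responses
score = a_prime([0.10, 0.51, 0.77, 0.55, 0.41], [0, 1, 1, 1, 0])
```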

  37. Intro to Knowledge Tracing: Results, in-tutor model prediction. A' results averaged across students:

  Model          A'
  BKT-PPS        0.7029
  BKT-BF         0.6969
  BKT-EM         0.6957
  BKT-LessData   0.6839
  PFA            0.6629
  Tabling        0.6476
  BKT-CSlip      0.6149
  CFAR           0.5705
  BKT-CGS        0.4857

  38. Intro to Knowledge Tracing: Results, in-tutor model prediction. A' results averaged across students: no significant differences among the top three BKT variants (BKT-PPS 0.7029, BKT-BF 0.6969, BKT-EM 0.6957); significant differences between these BKT variants and PFA (BKT-LessData 0.6839, PFA 0.6629, Tabling 0.6476, BKT-CSlip 0.6149, CFAR 0.5705, BKT-CGS 0.4857).

  39. Intro to Knowledge Tracing: Methodology, ensemble in-tutor prediction. 5 ensemble methods were used, trained with the same 5-fold cross-validation folds. Ensemble methods were trained using the 9 models' predictions as the features and the actual response as the label.

  40. Intro to Knowledge Tracing: Methodology, ensemble in-tutor prediction. Ensemble methods used: 1. linear regression with no feature selection (predictions bounded between 0 and 1); 2. linear regression with feature selection (stepwise regression); 3. linear regression with only BKT-PPS & BKT-EM; 4. linear regression with only BKT-PPS, BKT-EM & BKT-CSlip; 5. logistic regression.
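Ensemble method 1 (linear regression with predictions bounded to [0, 1]) can be sketched with NumPy (toy data; the actual study used the 9 models' cross-validated predictions as features):

```python
import numpy as np

def fit_ensemble(model_preds, actuals):
    """Least-squares blend of the base models' predictions (with intercept)."""
    X = np.column_stack([np.ones(len(actuals)), model_preds])
    w, *_ = np.linalg.lstsq(X, np.asarray(actuals, dtype=float), rcond=None)
    return w

def ensemble_predict(w, row):
    """Blend one row of base-model predictions, bounding the result to [0, 1]."""
    return float(np.clip(w[0] + np.dot(w[1:], row), 0.0, 1.0))

# Toy data: two base models' predictions and the true responses
preds = np.array([[0.2, 0.3], [0.6, 0.5], [0.8, 0.9], [0.4, 0.2]])
y = [0, 1, 1, 0]
w = fit_ensemble(preds, y)
```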

  41. Intro to Knowledge Tracing: Results, in-tutor ensemble prediction. A' results averaged across students:

  Model                                                A'
  Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip    0.7028
  Ensemble: LinReg with BKT-PPS & BKT-EM               0.6973
  Ensemble: LinReg with feature selection (stepwise)   0.6954
  Ensemble: LinReg without feature selection           0.6945
  Ensemble: Logistic without feature selection         0.6854

  No significant difference between ensembles.

  42. Intro to Knowledge Tracing: Results, in-tutor ensemble & model prediction. A' results averaged across students:

  Model                                                A'
  BKT-PPS                                              0.7029
  Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip    0.7028
  Ensemble: LinReg with BKT-PPS & BKT-EM               0.6973
  BKT-BF                                               0.6969
  BKT-EM                                               0.6957
  Ensemble: LinReg with feature selection (stepwise)   0.6954
  Ensemble: LinReg without feature selection           0.6945
  Ensemble: Logistic without feature selection         0.6854
  BKT-LessData                                         0.6839
  PFA                                                  0.6629
  Tabling                                              0.6476
  BKT-CSlip                                            0.6149
  CFAR                                                 0.5705
  BKT-CGS                                              0.4857

  43. Intro to Knowledge Tracing: Results, in-tutor ensemble & model prediction. A' results calculated across all actions:

  Model                                                    A'
  Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip        0.7451
  Ensemble: LinReg without feature selection               0.7428
  Ensemble: LinReg with feature selection (stepwise)       0.7423
  Ensemble: Logistic regression without feature selection  0.7359
  Ensemble: LinReg with BKT-PPS & BKT-EM                   0.7348
  BKT-EM                                                   0.7348
  BKT-BF                                                   0.7330
  BKT-PPS                                                  0.7310
  PFA                                                      0.7277
  BKT-LessData                                             0.7220
  CFAR                                                     0.6723
  Tabling                                                  0.6712
  Contextual Slip (BKT-CSlip)                              0.6396
  BKT-CGS                                                  0.4917

  44. Random Forests: Random Forests in the KDD Cup. Motivation for trying a non-KT approach: the Bayesian method uses only KC, opportunity count, and student as features, so much information is left unutilized; another machine learning method is required. Strategy: engineer additional features from the dataset and use Random Forests to train a model.

  45. Random Forests: Strategy. Create rich feature datasets that include features derived from fields not included in the test set. The raw training rows are split into non-validation training rows (nvtrain) and two validation sets (val1, val2); from these, feature-rich versions of validation set 1 (frval1), validation set 2 (frval2), and the raw test rows (frtest) are constructed.

  46. Random Forests. Created by Leo Breiman. The method trains T separate decision tree classifiers (50-800). Each decision tree selects a random 1/P portion of the available features (here 1/3). Each tree is grown until there are at least M observations in the leaf (1-100). When classifying unseen data, each tree votes on the class; the popular vote wins (or the votes are averaged, for regression).
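The procedure just described can be sketched with depth-1 trees (stumps) in pure Python; this is a toy illustration of bootstrap sampling, random feature subsets, and majority voting, not Breiman's reference implementation (a real application would use a library such as scikit-learn's RandomForestClassifier, and this sketch omits the minimum-leaf-size parameter M):

```python
import random

random.seed(0)  # reproducible toy run

def grow_stump(rows, labels, n_feats):
    """One 'tree' (a depth-1 stump): consider a random ~1/3 of the features
    and keep the feature/threshold split with the fewest errors."""
    feats = random.sample(range(n_feats), max(1, n_feats // 3))
    best = None
    for f in feats:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            pl = round(sum(left) / len(left)) if left else 0
            pr = round(sum(right) / len(right)) if right else 0
            errs = sum(pl != y for y in left) + sum(pr != y for y in right)
            if best is None or errs < best[0]:
                best = (errs, f, t, pl, pr)
    return best[1:]

def random_forest(rows, labels, n_trees=25):
    """Train n_trees stumps, each on a bootstrap sample of the rows."""
    n, forest = len(rows), []
    for _ in range(n_trees):
        idx = [random.randrange(n) for _ in range(n)]   # bootstrap sample
        forest.append(grow_stump([rows[i] for i in idx],
                                 [labels[i] for i in idx], len(rows[0])))
    return forest

def forest_predict(forest, row):
    """Each stump votes; the majority class wins."""
    votes = [pl if row[f] <= t else pr for f, t, pl, pr in forest]
    return round(sum(votes) / len(votes))

# Toy, clearly separable data (three features, two classes)
rows = [[0.1, 0.2, 0.0], [0.3, 0.1, 0.2], [0.2, 0.0, 0.1],
        [0.9, 0.8, 1.0], [0.7, 1.0, 0.8], [0.8, 0.9, 0.9]]
labels = [0, 0, 0, 1, 1, 1]
forest = random_forest(rows, labels)
```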

  47. Random Forests: Feature Importance. Features extracted from the training set:
  Student progress features (avg. importance: 1.67): number of data points (today, since the start of the unit); number of correct responses out of the last 3, 5, 10; z-score sums for step duration, hint requests, and incorrects; skill-specific versions of all these features.
  Percent correct features (avg. importance: 1.60): percent correct of unit, section, problem, and step, and totals for each skill and for each student (10 features).
  Student Modeling Approach features (avg. importance: 1.32): the predicted probability of correct for the test row; the number of data points used in training the parameters; the final EM log likelihood fit of the parameters / data points.

  48. Random Forests. Features of the user were more important in Bridge to Algebra than in Algebra. Student progress / gaming-the-system features (Baker et al., UMUAI 2008) were important in both datasets.

  49. Random Forests.

  Algebra:
  Rank  Feature set          RMSE    Coverage
  1     All features         0.2762  87%
  2     Percent correct+     0.2824  96%
  3     All features (fill)  0.2847  97%

  Bridge to Algebra:
  Rank  Feature set          RMSE    Coverage
  1     All features         0.2712  92%
  2     All features (fill)  0.2791  99%
  3     Percent correct+     0.2800  98%

  50. Random Forests. The best Bridge to Algebra RMSE on the leaderboard was 0.2777; the Random Forest RMSE of 0.2712 here is exceptional.
