
Explore Core Methods in Educational Data Mining for Fall 2022
Dive into the world of Educational Data Mining with a focus on core methods and the upcoming final project. Learn about Deep Learning, Deep Knowledge Tracing, neural networks, and the classic perceptron. Get insights on forming project groups, understanding key project requirements, and finding teammates. Discover the basics of a perceptron and solve examples to test your understanding. Join the discussion on the final project and get ready for a hands-on learning experience.
Core Methods in Educational Data Mining EDUC 6191 Fall 2022
Final project: Let's take a few minutes to discuss the final project. It is one month away; the end of the semester is creeping up on us.
Final project: Let's go over the assignment. The most important things to note: you need a project group of 2-3, you present an idea, and you do not conduct an analysis between now and December 15.
Final project: Finding teammates. Some of you have been ultra-proactive and have already found teammates. For those of you who haven't, I recommend posting to the final-project folder on Piazza: suggest an idea you're potentially interested in to your classmates, and read their ideas. If you haven't found project partners by Thanksgiving, send me an email and I'll see if I can connect people.
Final project: Any questions?
An overly brief introduction to neural networks
The classic perceptron: A perceptron takes a set of inputs, has a weight for each input, multiplies those weights by the inputs, adds it all together, adds an intercept, and then applies a step function to get {0, 1}.
For example: We have inputs M, N, P with weights w = (1, 0, -0.5) and intercept b = 0.1. Then for M = 1, N = -7, P = 2, what is f(x)?
For example: We have inputs M, N, P with weights w = (1, 0, -0.5) and intercept b = 0.1. Then for M = -1, N = 0.003, P = 8, what is f(x)?
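To check your answers, here is a minimal perceptron sketch in Python. It assumes the step function outputs 1 when the weighted sum plus intercept is greater than 0 and 0 otherwise; the slides do not pin down that threshold convention.

```python
# Minimal perceptron sketch for the two examples above.
# Assumption: the step function returns 1 when (weights . inputs) + intercept > 0,
# and 0 otherwise.

def perceptron(inputs, weights, intercept):
    total = sum(w * x for w, x in zip(weights, inputs)) + intercept
    return 1 if total > 0 else 0

weights = [1, 0, -0.5]
intercept = 0.1

print(perceptron([1, -7, 2], weights, intercept))      # 1*1 + 0*(-7) + (-0.5)*2 + 0.1 = 0.1 -> 1
print(perceptron([-1, 0.003, 8], weights, intercept))  # -1 + 0 - 4 + 0.1 = -4.9             -> 0
```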
But actually: Modern neural networks usually use more complex decision functions than just a step function: the logistic function, the tanh function, the ReLU function (if x > 0, output x; if x <= 0, output 0), and many more.
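As a quick reference, here is a NumPy sketch of the three activation functions named above (the function names are just for illustration):

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))  # squashes values into (0, 1)

def tanh(x):
    return np.tanh(x)                # squashes values into (-1, 1)

def relu(x):
    return np.maximum(x, 0.0)        # x if x > 0, else 0
```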
That's one perceptron. And one perceptron can have multiple inputs.
But neural networks take a lot of inputs, and they can produce multiple outputs. (Image courtesy of glosser.ca, used under Creative Commons licensing.)
Neural network: Red circles are predictors, blue circles are perceptrons, green circles are predicteds. (Image courtesy of glosser.ca, used under Creative Commons licensing.)
What you see here: A single-layer neural network, and a very simple one; real networks generally have hundreds, thousands, or millions of hidden perceptrons. (Image courtesy of glosser.ca, used under Creative Commons licensing.)
But this is just a simple single-layer neural network. (Image courtesy of glosser.ca, used under Creative Commons licensing.)
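As a rough sketch of what a single-hidden-layer network computes, here is a forward pass in NumPy; the layer sizes and random weights are purely illustrative.

```python
import numpy as np

# Forward pass: predictors -> hidden perceptrons -> predicteds
rng = np.random.default_rng(0)

x = rng.normal(size=4)           # 4 predictors (red circles)
W1 = rng.normal(size=(5, 4))     # weights into 5 hidden perceptrons (blue circles)
b1 = np.zeros(5)
W2 = rng.normal(size=(2, 5))     # weights into 2 outputs (green circles)
b2 = np.zeros(2)

hidden = np.tanh(W1 @ x + b1)    # each hidden unit: weighted sum, then activation
output = W2 @ hidden + b2        # the predicteds
print(output)
```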
On to deep learning. (Image courtesy of IBM.)
Multiple hidden layers. (Image courtesy of IBM.)
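A deep network simply stacks several such hidden layers. Here is a minimal sketch of that stacking, with layer sizes chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

layer_sizes = [4, 16, 16, 16, 1]    # input, three hidden layers, one output
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    h = x
    for W in weights[:-1]:
        h = np.maximum(W @ h, 0.0)  # ReLU at each hidden layer
    return weights[-1] @ h          # linear output layer

print(forward(rng.normal(size=4)))
```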
Why does deep learning (sometimes) work better? It can capture multiple layers of abstraction, without having to do so in a way that human beings can understand.
And there are lots of ways to make things more complex still.
Often the term deep learning is reserved for recurrent neural networks (or more complex algorithms still). Recurrent neural networks fit on a sequence of events, keeping some degree of memory about previous events.
Recurrent neural networks (RNNs) feed information from later layers back to earlier layers. A node can (over time) influence itself. This allows for a sequence of outputs.
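A minimal sketch of one vanilla RNN step, assuming the standard formulation in which the previous hidden state is fed back alongside the current input (names and sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 5

W_x = rng.normal(size=(hidden_size, input_size))   # input-to-hidden weights
W_h = rng.normal(size=(hidden_size, hidden_size))  # hidden-to-hidden (recurrent) weights
b = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # The previous hidden state h_prev is how earlier events influence later ones.
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

h = np.zeros(hidden_size)
for x_t in rng.normal(size=(4, input_size)):       # a sequence of 4 events
    h = rnn_step(x_t, h)
print(h)
```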
Long short-term memory networks: An RNN variant that replaces perceptrons with LSTM units. Information propagation reduces over time for a given piece of information (long-term memory), and activation patterns in the network change once per time step (short-term memory). We will not go into full details; linear algebra is required.
LSTM unit. Note the hidden state (h), input gate (i_t), forget gate (f_t), and output gate (o_t). (Image by fdeloche, CC BY-SA 4.0.)
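For reference, here is a rough sketch of one LSTM step using the textbook gate equations (biases omitted for brevity); this is the general formulation, not any particular library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 5

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One weight matrix per gate, acting on the concatenation [x_t; h_prev].
W_i, W_f, W_o, W_c = (rng.normal(size=(hidden_size, input_size + hidden_size))
                      for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    i_t = sigmoid(W_i @ z)                        # input gate: how much new information to write
    f_t = sigmoid(W_f @ z)                        # forget gate: how much old cell state to keep
    o_t = sigmoid(W_o @ z)                        # output gate: how much of the cell to expose
    c_t = f_t * c_prev + i_t * np.tanh(W_c @ z)   # cell state (longer-term memory)
    h_t = o_t * np.tanh(c_t)                      # hidden state (short-term memory)
    return h_t, c_t

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in rng.normal(size=(4, input_size)):
    h, c = lstm_step(x_t, h, c)
print(h)
```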
Deep Knowledge Tracing (Piech et al., 2015): Finally, predict student correctness with an LSTM!
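An illustrative sketch of the DKT idea in PyTorch (not the authors' code): each student action is encoded as a one-hot vector over (skill, correct/incorrect) pairs, run through an LSTM, and the output layer predicts the probability of a correct response on each skill at the next step. The class name, sizes, and hyperparameters are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DKT(nn.Module):
    def __init__(self, num_skills, hidden_size=64):
        super().__init__()
        # Input: one-hot over (skill, correct/incorrect) pairs, hence 2 * num_skills.
        self.lstm = nn.LSTM(2 * num_skills, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, num_skills)

    def forward(self, x):                  # x: (batch, time, 2 * num_skills)
        h, _ = self.lstm(x)
        return torch.sigmoid(self.out(h))  # (batch, time, num_skills) correctness probabilities

model = DKT(num_skills=10)
fake_history = torch.zeros(1, 5, 20)       # one student, five actions, 10 skills
print(model(fake_history).shape)           # torch.Size([1, 5, 10])
```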
DKT: The initial paper reported massively better performance than original BKT or PFA (Piech et al., 2015). It seemed at first too good to be true, and Xiong et al. (2016) reported that Piech et al. (2015) had used the same data points for both training and test.
DKT: Khajah et al. (2016) compared DKT to modern extensions of BKT on the same data set; it was particularly beneficial to re-fit the item-skill mappings. Wilson et al. (2016) compared DKT to temporal IRT on the same data set. Bottom line: all three approaches appeared to perform comparably well.
Beginning of what could be called DKT-family algorithms: A range of knowledge tracing algorithms based on different variants of deep learning. There are now literally hundreds of published variants, most of them tiny tweaks to get tiny gains in performance. I will discuss some of the key issues that researchers have tried to address, and what their approaches were.
Degenerate behavior: Yeung & Yeung (2018) report degenerate behavior for DKT: getting answers right can lead to lower estimated knowledge, and probability estimates can swing wildly in short periods of time.
Degenerate behavior: Yeung & Yeung (2018) report degenerate behavior for DKT: getting answers right can lead to lower estimated knowledge, and probability estimates can swing wildly in short periods of time. They proposed adding two types of regularization to moderate these swings: increasing the weight of the current prediction for future predictions, and reducing the amount the model is allowed to change future estimates.
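A rough sketch of how those two regularization ideas could be added to a DKT training loss, in the spirit of the slide; the actual terms and coefficients in Yeung & Yeung (2018) differ, and the function and argument names here are illustrative.

```python
import torch

def regularized_loss(base_loss, preds, current_preds, current_labels,
                     lambda_r=0.1, lambda_w=0.03):
    # (1) Weight the prediction of the *current* response more heavily,
    #     so the model cannot ignore what just happened.
    reconstruction = torch.nn.functional.binary_cross_entropy(
        current_preds, current_labels)
    # (2) Penalize step-to-step "waviness": large changes in predicted
    #     knowledge estimates between consecutive time steps.
    waviness = (preds[:, 1:, :] - preds[:, :-1, :]).abs().mean()
    return base_loss + lambda_r * reconstruction + lambda_w * waviness

# Shapes-only example: predictions for 1 student, 5 time steps, 10 skills.
preds = torch.rand(1, 5, 10)
loss = regularized_loss(torch.tensor(0.7), preds,
                        torch.rand(5), torch.randint(0, 2, (5,)).float())
print(loss)
```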
Impossible to interpret in terms of skills: DKT-family models predict individual item correctness, not skills. What do you do for entirely new items? What information can you provide teachers?
Extension for latent knowledge estimation: Zhang et al. (2017) propose an extension to DKT, called DKVMN, that fits an item-skill mapping too. It is based on a Memory-Augmented Neural Network, which keeps an external memory matrix that neurons update and refer back to. The latent skills are difficult to interpret.
Extension for latent knowledge estimation: Lee & Yeung (2019) propose an alternative to DKT, called KQN, that attempts to output more interpretable latent skill estimates. Again, it fits an external memory network to fit skills, and it also attempts to fit the amount of information transfer between skills. It is still not that interpretable.
Extension for latent knowledge estimation: Scruggs et al. (2020) propose an extension to any DKT-family algorithm. A human-derived skill-item mapping is used, and predicted performance is averaged over all items in a skill, including both unseen and already-seen items. This leads to successful prediction of post-tests outside the learning system.
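A minimal sketch of that averaging step: given per-item predicted correctness from any DKT-family model and a human-derived skill-item mapping, the skill estimate is the mean prediction over all items mapped to the skill, seen or unseen. All names and numbers here are illustrative.

```python
def skill_estimate(item_predictions, skill_to_items, skill):
    # Average predicted correctness over every item mapped to the skill.
    items = skill_to_items[skill]
    return sum(item_predictions[item] for item in items) / len(items)

item_predictions = {"item1": 0.9, "item2": 0.6, "item3": 0.3}    # from a DKT-family model
skill_to_items = {"fractions": ["item1", "item2", "item3"]}      # human-derived mapping
print(skill_estimate(item_predictions, skill_to_items, "fractions"))  # 0.6
```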
What is DKT really learning? Ding & Larson (2019) demonstrate theoretically that a lot of what DKT learns is how good a student is overall.
What is DKT really learning? Zhang et al. (2021) follow this up with empirical work showing that most of the improvement in performance for DKVMN comes on the first attempt on a new skill.
What is DKT really learning? In particular, there's essentially no benefit to deep learning after several attempts on a skill (about the point where students often reach mastery, if they didn't already know the skill).
Other recent DKT variants: SAKT. Pandey & Karypis (2019) propose a DKT variant, called SAKT, which fits attentional weights between exercises and more explicitly predicts performance on the current exercise from performance on past exercises. It gets a slightly better fit, and doubles down a little more on some limitations we've already discussed.
Other recent DKT variants: SAKT. Incidentally, Pandey & Karypis (2019)'s abstract doesn't match what they actually do. The abstract claims they find relationships between KCs, yet one sentence in the paper states: "For predicting student's performance on an exercise, we used exercises as KCs." (p. 2)
Other recent DKT variants: AKT. Ghosh et al. (2020) propose a DKT variant, called AKT, which explicitly stores and uses the learner's entire past practice history for each prediction, uses an exponential decay curve to downweight past actions, and uses Rasch-model embeddings to calculate item difficulty.
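A toy sketch of the exponential-decay idea only: when attending over past actions, each raw attention score is scaled down by how far back the action occurred. This is just the intuition; AKT's actual monotonic attention mechanism and Rasch embeddings are more involved.

```python
import numpy as np

def decayed_attention(similarities, decay_rate=0.5):
    # similarities[i]: raw (non-negative) attention score for the i-th past action,
    # ordered from oldest (index 0) to most recent.
    distances = np.arange(len(similarities))[::-1]        # most recent action -> distance 0
    scores = similarities * np.exp(-decay_rate * distances)
    return scores / scores.sum()                          # normalized attention weights

print(decayed_attention(np.array([1.0, 1.0, 1.0, 1.0])))  # weights favor recent actions
```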
Adding in more information: SAINT+. Shin et al. (2021) add elapsed time and lag time as additional inputs in SAINT+. The additional information leads to better performance. The paper is unfortunately vague as to whether the current action's time variables are included in the calculation, or just previous actions'.
Adding in more information: Process-BERT. Scarlatos et al. (2022) add timing and the use of resources such as a calculator. The additional information again leads to better performance.