Educational Data Mining Assignment Challenges and Solutions

Explore technical glitches and key ambiguities in an educational data mining assignment, along with the setup of RapidMiner processes and analysis of association rules for sequential patterns.

  • Data Mining
  • Educational Data
  • Assignment Challenges
  • RapidMiner
  • Association Rules

Presentation Transcript


  1. Core Methods in Educational Data Mining HUDK4050 Fall 2015

  2. Assignment B8: I know there were some technical glitches in the assignment, not to mention a key ambiguity. Sorry about that! Let's go over the answers.

  3. Question 1: Set up a RapidMiner process using Read CSV and the GSP operator (Generalized Sequential Patterns, not the WEKA version W-GeneralizedSequentialPatterns). What should your customer id be? Options: anonid; obsnum; behavior; affect.

  5. Question 2: What should your time attribute be? Options: anonid; obsnum; behavior; affect.

  7. Did anyone have trouble with this one? Question 3: Set min support = 0.6, window size = 0.0, max gap = 5.0, min gap = 0.0, positive value = 1. Which of these association rules has the highest support? behavior-ontask behavior-ontask behavior-offtask behavior-offtask behavior-ontask affect-concentrating affect-concentrating affect-concentrating affect-concentrating affect-concentrating behavior-ontask AND affect-concentrating

  8. Did anyone have trouble with this one? Question 4: If you set window size = 2.0, what is the association rule with the highest support that is now created (but was not created under Question 3's settings)? affect-concentrating ontask AND affect-concentrating affect-concentrating behavior-ontask affect-concentrating affect-concentrating behavior-ontask affect-concentrating affect-concentrating behavior-ontask behavior-ontask behavior-ontask AND affect-concentrating behavior-ontask behavior-

  9. Did anyone have trouble with this one? Question 5: Set window size back to 0.0. Set max gap to 1.0. Which is the rule with the most items? behavior-ontask AND affect-concentrating behavior-ontask AND affect-concentrating affect-concentrating behavior-ontask affect-concentrating behavior-ontask behavior-ontask behavior-ontask AND affect-concentrating affect-concentrating affect-concentrating

  10. Question 6: Which of these is a reason why you might want to create a window size above 0? Options: Related events may be linked but separated by a few seconds; Unrelated events may be separated by a few seconds; Related events may occur at exactly the same time; Unrelated events may occur at exactly the same time.

  12. Question 7: How many students had the sequential (and immediate, max gap = 1) rule behavior-ontask -> affect-concentrating at least once? (Hint: RapidMiner may not be the easiest tool to compute this with)

  13. Does anyone want to see this calculation? Question 7: How many students had the sequential (and immediate, max gap = 1) rule behavior-ontask -> affect-concentrating at least once? (Hint: RapidMiner may not be the easiest tool to compute this with) Answer: 49
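Since RapidMiner may not be the easiest tool here, this count can be sketched in a few lines of Python. This is only a minimal sketch: the column names (anonid, obsnum, behavior, affect) come from the assignment, but the toy rows, the cell values "ontask" and "concentrating", and the reading of "immediate" as "obsnum differs by exactly 1 within a student" are all assumptions.

```python
from collections import defaultdict

def students_with_immediate_rule(rows, antecedent, consequent):
    """Return the set of students with behavior == antecedent at one
    observation and affect == consequent at the very next observation.

    rows: iterable of (anonid, obsnum, behavior, affect) tuples.
    Assumes obsnum is an integer that increases by 1 per observation.
    """
    by_student = defaultdict(list)
    for anonid, obsnum, behavior, affect in rows:
        by_student[anonid].append((obsnum, behavior, affect))

    matched = set()
    for anonid, obs in by_student.items():
        obs.sort()  # order each student's observations by obsnum
        for (n1, beh1, _), (n2, _, aff2) in zip(obs, obs[1:]):
            if n2 - n1 == 1 and beh1 == antecedent and aff2 == consequent:
                matched.add(anonid)
                break  # "at least once" is enough
    return matched

# Hypothetical toy data, not the assignment data set.
rows = [
    ("s1", 1, "ontask", "bored"),
    ("s1", 2, "ontask", "concentrating"),   # s1 matches (obs 1 -> obs 2)
    ("s2", 1, "offtask", "concentrating"),
    ("s2", 2, "ontask", "bored"),           # s2 never matches
    ("s3", 1, "ontask", "bored"),
    ("s3", 3, "ontask", "concentrating"),   # gap of 2, so not immediate
]
matched = students_with_immediate_rule(rows, "ontask", "concentrating")
print(len(matched))
```

Grouping by student first matters: without it, the last observation of one student could be paired with the first observation of the next.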

  14. Question 8: What is the confidence for sequential rule behavior-ontask -> affect-concentrating? Give three digits after the decimal point, round to nearest number. (Hint: RapidMiner may not be the easiest tool to compute this with)

  15. Question 8: What is the confidence for sequential rule behavior-ontask -> affect-concentrating? Give three digits after the decimal point, round to nearest number. (Hint: RapidMiner may not be the easiest tool to compute this with) Answer: 0.726

  16. Question 9: What is the cosine for sequential rule behavior-ontask -> affect-concentrating? Give three digits after the decimal point, round to nearest number. (Hint: RapidMiner may not be the easiest tool to compute this with)

  17. Question 9: What is the cosine for sequential rule behavior-ontask -> affect-concentrating? Give three digits after the decimal point, round to nearest number. (Hint: RapidMiner may not be the easiest tool to compute this with) Answer: 0.671

  18. Question 10: What is the lift for sequential rule behavior-ontask -> affect-concentrating? Give three digits after the decimal point, round to nearest number. (Hint: RapidMiner may not be the easiest tool to compute this with) Did anyone get the answer in the system? I think there might have been a bug.
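For reference, confidence, cosine, and lift for a rule A -> B can all be computed from four counts. The sketch below uses the standard textbook definitions; the function name and the example counts are hypothetical, and how you count "opportunities" for a sequential rule (what goes into n_total, n_a, n_b, n_ab) is left to the caller as an assumption.

```python
import math

def rule_metrics(n_total, n_a, n_b, n_ab):
    """Interestingness metrics for a rule A -> B from raw counts.

    n_total: number of cases considered
    n_a:     cases where A holds
    n_b:     cases where B holds
    n_ab:    cases where the rule A -> B holds
    """
    p_a = n_a / n_total
    p_b = n_b / n_total
    p_ab = n_ab / n_total
    return {
        "support": p_ab,
        "confidence": p_ab / p_a,                # P(A,B) / P(A)
        "cosine": p_ab / math.sqrt(p_a * p_b),   # P(A,B) / sqrt(P(A) P(B))
        "lift": p_ab / (p_a * p_b),              # P(A,B) / (P(A) P(B))
    }

# Hypothetical counts, chosen only to illustrate the arithmetic.
m = rule_metrics(n_total=100, n_a=40, n_b=50, n_ab=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts, A and B co-occur exactly as often as independence would predict, so lift comes out at 1.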

  19. Question 11 (ungraded due to technical issues): Would Merceron & Yacef say that this is an interesting association rule? Options: Yes, because cosine is over threshold; Yes, because lift is over threshold; Yes, because both lift and cosine are over threshold; No, because cosine is over threshold; No, because lift is over threshold; No, because cosine is below threshold; No, because lift is below threshold.

  20. Sorry again about the technical glitches. Any content-related technical questions or comments?

  21. ARM vs. SPM: What are the differences between Association Rule Mining and Sequential Pattern Mining?

  22. Any questions about the GSP algorithm?

  23. Perera et al. (2009): What were the three ways that Perera et al. (2009) used sequential pattern mining? What did they learn, and how did they use the information?

  24. Perera et al. (2009): 1. Overall uses of collaborative tools by groups. 2. Sequences of collaborative tool use by different group members. 3. Sequences of access of specific resources by different group members. In all cases, they found common patterns and then looked at how support differed for successful and unsuccessful groups.

  25. Perera et al. (2009): Important Findings. 1. Overall uses of collaborative tools by groups. Successful groups used the ticketing system more than the wiki; weaker groups used the wiki more. Patterns were particularly strong for group leaders.

  26. Perera et al. (2009): Important Findings. 2. Sequences of collaborative tool use by different group members. Successful groups characterized by leader opening ticket and other student working on ticket. Successful groups characterized by students other than leader opening ticket, and other students working on ticket.

  27. Perera et al. (2009): Important Findings. 3. Sequences of access of specific resources by different group members. The best groups had interactions around the same resource by multiple students. The poor groups did no work on tickets before closing them.

  28. Variants

  29. Differential Sequence Mining (Kinnebrew et al., 2013): Split data into two groups. Look for differences in pattern frequencies between groups.
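The "split, then compare frequencies" idea can be sketched as follows. This is a deliberately simplified illustration, not the procedure from Kinnebrew et al. (2013): it only looks at contiguous patterns, measures support as the fraction of sequences containing the pattern, and does no statistical testing. The action names and both groups are hypothetical.

```python
def contains(sequence, pattern):
    """True if pattern occurs as a contiguous run inside sequence."""
    n = len(pattern)
    return any(
        list(sequence[i:i + n]) == list(pattern)
        for i in range(len(sequence) - n + 1)
    )

def support(sequences, pattern):
    """Fraction of sequences that contain the pattern at least once."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)

def support_difference(group_a, group_b, pattern):
    """Positive values mean the pattern is more common in group_a."""
    return support(group_a, pattern) - support(group_b, pattern)

# Hypothetical action sequences for two groups of students.
group_a = [["read", "run", "read"], ["run", "read", "read"], ["read", "read"]]
group_b = [["run", "run", "read"], ["run", "run"], ["read", "run"]]

diff = support_difference(group_a, group_b, ["run", "read"])
print(round(diff, 3))
```

In practice you would run this over every frequent pattern found in either group and then ask which differences are large enough to be worth interpreting.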

  30. Example (Jiang et al., 2015): Compare behaviors of students who had used inquiry science environment before and students who had never used inquiry science environment before.

  31. Difference found: Experienced students are more likely to read about several topics after conducting an experiment and looking at the results. Novice students are more likely to read about a single topic after conducting several experiments and looking at the results.

  32. Motif Extraction

  33. Motif: Short, recurring pattern in a sequence of categories occurring over time.

  34. Motif in Music: Short, recurring pattern of notes in a musical composition.

  35. Motif in Music: What's the motif? http://www.youtube.com/watch?v=rRgXUFnfKIY How many times does the motif occur?

  36. Motif in Music: What's the motif? http://www.youtube.com/watch?v=rRgXUFnfKIY How many times does the motif occur? Depends on how you define it, right? And that's part of the challenge.

  37. Motif in Language: Short, recurring pattern of characters in a sequence of characters occurring over time.

  38. Motif in Genetics: Short, recurring pattern of genes in a sequence of genes occurring over time. Typically written as letters.

  39. Goal of Motif Extraction: Discern a common pattern of characters in a large corpus of characters. The characters may vary slightly from case to case.
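"May vary slightly from case to case" is what makes this harder than plain substring search. One simple way to make it concrete is to allow a fixed number of mismatched characters (Hamming distance). The sketch below assumes you already have a candidate motif and just want its approximate occurrences; the text, the motif, and the one-mismatch tolerance are all hypothetical choices.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def approximate_matches(text, motif, max_mismatches=1):
    """Return (index, substring) for every window of text that is
    within max_mismatches characters of motif."""
    k = len(motif)
    return [
        (i, text[i:i + k])
        for i in range(len(text) - k + 1)
        if hamming(text[i:i + k], motif) <= max_mismatches
    ]

# Hypothetical example: one exact copy and one copy with a substitution.
hits = approximate_matches("XXABCDEXXABQDEXX", "ABCDE", max_mismatches=1)
print(hits)  # [(2, 'ABCDE'), (9, 'ABQDE')]
```

Motif extraction proper is the harder problem of discovering the motif without being told it in advance.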

  40. Can you find the motif?

  41. Can you find the motif? UBSWWDFKLWPRHUC INUSUNSGDAAICAV XRZZWCDXOVZZJKQ JBDPXBDVEJVMBKK VBDWNLROFVUBFFW OWIFTIENDOXJXIOB AUAAOOXZAABZSBT VOVCROMCJTOLXYU HUVRYFREENDOBBGC AQJBVXJCAJLEMAU MUAWSNTVZXSFHMI LFQRKUTFRIENDOV ONJORIFCGAUGIRN PJGCHBDQIWJJTMQ LOMTPOQHJVYYMFJ IQYQHKKBNBVDFPV LWGJMVPKYOZNMSA JJLHWPZAYZIGGEH RUPMFOHPVSPPVPT BAZXVFTPQFQJVBM IGJZRMAAWJBESSS JXZFRIEMDOVZRBJY HLPMOKUOXGRIENDO IRPWYIRJISLFVFF

  42. How would you describe the motif? UBSWWDFKLWPRHUC INUSUNSGDAAICAV XRZZWCDXOVZZJKQ JBDPXBDVEJVMBKK VBDWNLROFVUBFFW OWIFTIENDOXJXIOB AUAAOOXZAABZSBT VOVCROMCJTOLXYU HUVRYFREENDOBBGC AQJBVXJCAJLEMAU MUAWSNTVZXSFHMI LFQRKUTFRIENDOV ONJORIFCGAUGIRN PJGCHBDQIWJJTMQ LOMTPOQHJVYYMFJ IQYQHKKBNBVDFPV LWGJMVPKYOZNMSA JJLHWPZAYZIGGEH RUPMFOHPVSPPVPT BAZXVFTPQFQJVBM IGJZRMAAWJBESSS JXZFRIEMDOVZRBJY HLPMOKUOXGRIENDO IRPWYIRJISLFVFF

  43. Finding motifs: Several algorithms.

  44. Finding motifs: Variant on the PROJECTION algorithm (Tompa & Buhler, 2001) used in Shanabrook et al. (2010). Only example of motif extraction in educational data mining so far.

  45. Big idea: For each character string C that could be a motif example (e.g., all character strings of the desired length), create a set of projections: random variations of C that vary in one or more ways.

  46. Big idea: For each pair of strings C1 and C2, see how many overlaps there are between their projection matrices. Take the pair with the most matches and combine them into a motif, creating a multi-example motif if 3+ get added together. Repeat until the goal number of motifs is found, or until a new motif falls below the goodness criterion.
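To get a feel for the "compare projections pairwise" step, here is a heavily simplified sketch. It is not the PROJECTION algorithm itself, nor the variant in Shanabrook et al. (2010): instead of random projections it deterministically masks one position at a time (so results are reproducible), it only finds the single best pair, and the candidate strings are hypothetical.

```python
from itertools import combinations

def projections(s):
    """All variants of s with exactly one position masked by '*'.
    (A stand-in for random projections; assumption for illustration.)"""
    return {s[:i] + "*" + s[i + 1:] for i in range(len(s))}

def best_pair(strings):
    """Return (overlap, a, b) for the pair of candidate strings whose
    projection sets share the most members."""
    best = None
    for a, b in combinations(strings, 2):
        overlap = len(projections(a) & projections(b))
        if best is None or overlap > best[0]:
            best = (overlap, a, b)
    return best

# Hypothetical candidate strings of the desired motif length.
candidates = ["ABCDE", "QZKWP", "ABQDE", "TTGHJ"]
result = best_pair(candidates)
print(result)  # (1, 'ABCDE', 'ABQDE')
```

Two strings that differ in exactly one character share exactly one masked variant, which is why the near-duplicates surface as the best pair. Masking more positions at once would tolerate more variation, at the cost of more accidental overlaps.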

  47. Motif in Education: Short, recurring pattern of behaviors in a sequence of behaviors occurring over time. Written as letters in Shanabrook et al. (2010).

  48. Detail for education: How do you segment student behavior? (1) Could use student's interaction on an entire problem, and compute letters across the whole problem; might make more sense in tutors with shorter problems. (2) Could use student's interaction on an entire problem, and define letters differently for context within the whole problem; approach used by Shanabrook et al. (2010). (3) Could use a sliding window of N actions.
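The sliding-window option is the easiest of the three to show in code. A minimal sketch, assuming each action has already been coded as a single letter; the letter string and window size below are hypothetical.

```python
def sliding_windows(actions, n):
    """Return every run of n consecutive actions, stepping by one."""
    return [actions[i:i + n] for i in range(len(actions) - n + 1)]

# Hypothetical coded action sequence for one student.
windows = sliding_windows("adgadh", 3)
print(windows)  # ['adg', 'dga', 'gad', 'adh']
```

Each window then becomes one candidate string for motif extraction. Note that consecutive windows overlap, so the same action contributes to several candidates.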

  49. Behaviors in Shanabrook et al.: hints (a, b, c). Hints is a measure of the number of hints viewed for this problem. Although each problem has a maximum number of hints, the hint count does not have an upper bound because students can repeat hints and the count will increase at each repeated view. The three categories for hints are: (a) no hints, meaning that the student did not use the hint facility for that problem, (b) meaning the student used the hint facility, but was not given the solution, and (c) last hint solved, meaning that the student was given the solution to the problem by the last hint. As described above, this metric combines two values logged by the tutor: the count of hints seen, and an indicator that the final hint giving the answer was seen. The data could have been simply binned low, medium, high hints; however, this would have missed the significance of zero hints and using hints to reveal the problem solution.

  50. Behaviors in Shanabrook et al.: secFirst (d, e, f). The seconds to first attempt is an important measure as it is during this time that the student is reading the problem and formulating their response. In previous research [6], five seconds was determined to be a threshold for this metric representing gaming: students who make a first attempt in less than five seconds are considered not working on-task. We divide secFirst into three bins: (d) less than 5 sec, (e) 5 to 30 sec, (f) greater than 30 sec. (d) represents students who are gaming the system, (e) represents a moderate time to the first attempt, (f) represents a long time to the first attempt. The cut at 30 seconds was chosen because it equalizes the distribution of bins (e and f), representing a division between a moderate and a long time to the first attempt.
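The two codings described on the last two slides translate directly into small functions. The letter categories and the 5 and 30 second cut points come from the text above; the function names, argument names, and the idea of concatenating the two letters per problem are assumptions made for illustration.

```python
def hint_letter(hints_seen, solution_shown):
    """Code hint use as 'a' (no hints), 'b' (hints used, solution not
    given), or 'c' (the last hint gave the solution)."""
    if hints_seen == 0:
        return "a"
    return "c" if solution_shown else "b"

def sec_first_letter(seconds):
    """Code seconds to first attempt as 'd' (< 5 sec), 'e' (5 to 30 sec),
    or 'f' (> 30 sec)."""
    if seconds < 5:
        return "d"
    if seconds <= 30:
        return "e"
    return "f"

# Hypothetical problems: (hints seen, solution shown, seconds to first attempt)
problems = [(0, False, 3), (2, False, 12), (4, True, 45)]
coded = [hint_letter(h, s) + sec_first_letter(t) for h, s, t in problems]
print(coded)  # ['ad', 'be', 'cf']
```

Strings built this way are the "letters" that a motif-finding step would then search over.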