Observational Methods for Affective Transitions in Real Classrooms



Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Observational Methods, Part Two. January 20, 2010

  2. Today's Class: Survey Results; Probing Question for today; Observational Methods; Probing Question for next class; Assignment 1

  3. Survey Results: Much broader response this time, thanks! Good to go with modern technology. Generally positive comments. Some contradictions between best-part and worst-part answers; the best sign one is doing well is when everyone wants contradictory changes? I will give another survey in a few classes.

  4. Today's Class: Survey Results; Probing Question for today; Observational Methods; Probing Question for next class; Assignment 1

  5. Probing Question: For today, you have read D'Mello, S., Taylor, R.S., & Graesser, A. (2007). Monitoring Affective Trajectories during Complex Learning. Proceedings of the 29th Annual Meeting of the Cognitive Science Society, 203-208, which used data from a lab study. If you wanted to study affective transitions in real classrooms, which of the methods we discussed today would be best? Why?

  6. What's the best way? First, let's list out the methods. For now, don't critique, just describe your preferred method. One per person, please. If someone else has already presented your method, no need to repeat it. If you propose something similar, quickly list the difference (no need to say why right now).

  7. For each method: What are the advantages? What are the disadvantages?

  8. Votes for each method

  9. Today's Class: Survey Results; Probing Question for today; Observational Methods; Probing Question for next class; Assignment 1

  10. Topics: Measures of agreement; Study of prevalence; Correlation to other constructs; Dynamics models; Development of EDM models (to be covered later).

  11. Agreement/Accuracy: The easiest measure of inter-rater reliability is agreement, also called accuracy: agreement = (# of agreements) / (total # of codes).
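
For concreteness, this computation might look something like the following Python sketch (not from the slides; the rater labels are made up):

    def percent_agreement(codes_a, codes_b):
        # Fraction of observations on which two raters assigned the same code
        assert len(codes_a) == len(codes_b)
        agreements = sum(a == b for a, b in zip(codes_a, codes_b))
        return agreements / len(codes_a)

    # Hypothetical labels from two raters coding the same ten observations
    rater_1 = ["on", "on", "off", "on", "off", "on", "on", "off", "on", "on"]
    rater_2 = ["on", "on", "off", "off", "off", "on", "on", "on", "on", "on"]
    print(percent_agreement(rater_1, rater_2))  # 0.8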

  12. Agreement/Accuracy: There is general agreement across fields that agreement/accuracy is not a good metric. What are some drawbacks of agreement/accuracy?

  13. Agreement/Accuracy: Let's say that Tasha and Uniqua agreed on the classification of 9,200 time sequences out of 10,000 actions, for a coding scheme with two codes: 92% accuracy. Good, right?

  14. Non-even assignment to categories: Percent agreement does poorly when there is non-even assignment to categories, which is almost always the case. Imagine an extreme case: Uniqua (correctly) picks category A 92% of the time; Tasha always picks category A. Agreement/accuracy of 92%, but essentially no information.

  15. An alternate metric: Kappa = (Agreement - Expected Agreement) / (1 - Expected Agreement)
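
Applied to the Tasha/Uniqua example above, this metric behaves the way we want: observed agreement is 0.92, but because Tasha always picks category A, chance agreement (computed as on the upcoming slides) is also 0.92, so Kappa comes out to 0. A tiny sketch, not from the slides:

    def kappa(agreement, expected_agreement):
        # Kappa = (Agreement - Expected Agreement) / (1 - Expected Agreement)
        return (agreement - expected_agreement) / (1 - expected_agreement)

    # Tasha always codes A; Uniqua codes A 92% of the time, so observed agreement
    # is 0.92 and chance agreement is 1.00 * 0.92 + 0.00 * 0.08 = 0.92
    print(kappa(0.92, 0.92))  # 0.0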

  16. Kappa: Expected agreement is computed from a table of the form:

                             Rater 2, Category 1   Rater 2, Category 2
       Rater 1, Category 1         Count                 Count
       Rater 1, Category 2         Count                 Count

  17. Kappa: Expected agreement is computed from a table of the form:

                             Rater 2, Category 1   Rater 2, Category 2
       Rater 1, Category 1         Count                 Count
       Rater 1, Category 2         Count                 Count

      Note that Kappa can be calculated for any number of categories (but only 2 raters).

  18. Cohen's (1960) Kappa is the formula for two raters. Fleiss's (1971) Kappa, which is more complex, can be used for 3+ raters. I have an Excel spreadsheet which calculates multi-category Kappa, which I would be happy to share with you.
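
For reference, two-rater Kappa extends to any number of categories by summing chance agreement across all categories. A minimal sketch (the function and the affect labels are illustrative, not the spreadsheet's implementation):

    from collections import Counter

    def cohen_kappa(codes_a, codes_b):
        # Two-rater Kappa for any number of categories
        n = len(codes_a)
        observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
        freq_a = Counter(codes_a)            # how often rater A used each category
        freq_b = Counter(codes_b)
        categories = set(codes_a) | set(codes_b)
        expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
        return (observed - expected) / (1 - expected)

    print(cohen_kappa(["bored", "flow", "flow", "confused"],
                      ["bored", "flow", "confused", "confused"]))  # 0.636...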

  19. Expected agreement: Look at the proportion of labels each coder gave to each category. To find the proportion of category-A agreements that could be expected by chance, multiply pct(coder1/categoryA) * pct(coder2/categoryA). Do the same thing for category B, and add the two values together. (Equivalently, work with raw counts: multiply the coders' category counts, sum across categories, and divide by the square of the total number of labels.) This is your expected agreement.
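
Put together with the Kappa formula, the recipe for a 2x2 table of counts might be sketched as follows (illustrative code, not from the slides):

    def kappa_from_table(a, b, c, d):
        # 2x2 agreement table of counts:
        #              rater2 = A   rater2 = B
        # rater1 = A       a            b
        # rater1 = B       c            d
        n = a + b + c + d
        observed = (a + d) / n                    # proportion of agreements
        rater1_a, rater1_b = (a + b) / n, (c + d) / n
        rater2_a, rater2_b = (a + c) / n, (b + d) / n
        expected = rater1_a * rater2_a + rater1_b * rater2_b
        return (observed - expected) / (1 - expected)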

  20. Example, a table of counts:

                        Pablo Off-Task   Pablo On-Task
      Tyrone Off-Task         20               5
      Tyrone On-Task          15              60

  21. Example (same table): What is the percent agreement?

  22. Example (same table): What is the percent agreement? 80%

  23. Example (same table): What is Tyrone's expected frequency for on-task?

  24. Example (same table): What is Tyrone's expected frequency for on-task? 75%

  25. Example (same table): What is Pablo's expected frequency for on-task?

  26. Example (same table): What is Pablo's expected frequency for on-task? 65%

  27. Example (same table): What is the expected on-task agreement?

  28. Example (same table): What is the expected on-task agreement? 0.65 * 0.75 = 0.4875

  29. Example (the table now shows the expected count 48.75 next to the 60 on-task/on-task agreements): What is the expected on-task agreement? 0.65 * 0.75 = 0.4875

  30. Example: What are Tyrone's and Pablo's expected frequencies for off-task behavior?

  31. Example: What are Tyrone's and Pablo's expected frequencies for off-task behavior? 25% and 35%

  32. Example: What is the expected off-task agreement?

  33. Example: What is the expected off-task agreement? 0.25 * 0.35 = 0.0875

  34. Example (the table now also shows the expected count 8.75 next to the 20 off-task/off-task agreements): What is the expected off-task agreement? 0.25 * 0.35 = 0.0875

  35. Example: What is the total expected agreement?

  36. Example: What is the total expected agreement? 0.4875 + 0.0875 = 0.575

  37. Example: What is kappa?

  38. Example: What is kappa? (0.8 - 0.575) / (1 - 0.575) = 0.225 / 0.425 = 0.529

  39. So is that any good? Kappa = (0.8 - 0.575) / (1 - 0.575) = 0.225 / 0.425 = 0.529
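
As a sanity check on the arithmetic, the whole example can be reproduced in a few lines of Python (a sketch, not part of the slides):

    # Tyrone/Pablo table from the example: off/off, off/on, on/off, on/on
    a, b, c, d = 20, 5, 15, 60
    n = a + b + c + d                       # 100 observations

    observed = (a + d) / n                  # 0.80 percent agreement
    expected = ((a + b) / n) * ((a + c) / n) + ((c + d) / n) * ((b + d) / n)
    # off-task: 0.25 * 0.35 = 0.0875; on-task: 0.75 * 0.65 = 0.4875; total = 0.575

    kappa = (observed - expected) / (1 - expected)
    print(round(kappa, 3))                  # 0.529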

  40. Interpreting Kappa: Kappa = 0 means agreement is at chance. Kappa = 1 means agreement is perfect. Kappa = negative infinity means agreement is perfectly inverse. Kappa > 1 means you messed up somewhere.

  41. Kappa < 0: It does happen, but usually not in the case of inter-rater reliability. Occasionally seen when Kappa is used for EDM or other types of machine learning. More on this in two months!

  42. 0 < Kappa < 1: What's a good Kappa? There is no absolute standard. For inter-rater reliability, 0.8 is usually what ed. psych. reviewers want to see. You can usually make a case that values of Kappa around 0.6 are good enough to be usable for some applications, particularly if there's a lot of data, or if you're collecting observations to drive EDM. Remember that Baker, Corbett, & Wagner (2006) had Kappa = 0.58.

  43. Landis & Koch's (1977) scale:

       Kappa           Interpretation
       < 0             No agreement
       0.0 - 0.20      Slight agreement
       0.21 - 0.40     Fair agreement
       0.41 - 0.60     Moderate agreement
       0.61 - 0.80     Substantial agreement
       0.81 - 1.00     Almost perfect agreement

  44. Why is there no standard? Because Kappa is scaled by the proportion of each category: when one class is much more prevalent, expected agreement is higher than if classes are evenly balanced.
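
To see the effect, compare two hypothetical pairs of raters who both agree on 80% of observations but have different category proportions (illustrative numbers, not from the slides):

    def kappa(observed, expected):
        return (observed - expected) / (1 - expected)

    # Balanced categories: each rater uses category A 50% of the time
    balanced = kappa(0.80, 0.5 * 0.5 + 0.5 * 0.5)   # expected = 0.50 -> kappa = 0.60
    # Skewed categories: rater 1 uses A 80% of the time, rater 2 uses A 90% of the time
    skewed = kappa(0.80, 0.8 * 0.9 + 0.2 * 0.1)     # expected = 0.74 -> kappa ~ 0.23
    print(round(balanced, 2), round(skewed, 2))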

  45. Because of this, comparing Kappa values between two studies in a principled fashion is highly difficult. A lot of work went into statistical methods for comparing Kappa values in the 1990s, with no real consensus. Informally, you can compare two studies if the proportions of each category are similar.

  46. There is a way to statistically compare two inter-rater reliabilities: junior high school meta-analysis.

  47. There is a way to statistically compare two inter-rater reliabilities: junior high school meta-analysis. Do a 1 df Chi-squared test on each reliability, convert the Chi-squared values to Z, and then compare the two Z values using the method in Rosenthal & Rosnow (1991).

  48. There is a way to statistically compare two inter-rater reliabilities: junior high school meta-analysis. Do a 1 df Chi-squared test on each reliability, convert the Chi-squared values to Z, and then compare the two Z values using the method in Rosenthal & Rosnow (1991). Or, in other words, nyardley nyardley nyoo.
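
One way this procedure might be scripted is sketched below. The comparison of two independent Z values as (Z1 - Z2) / sqrt(2) is an assumption about the Rosenthal & Rosnow (1991) method, not something stated on the slide, and the agreement tables are made up:

    import numpy as np
    from scipy.stats import chi2_contingency, norm

    def reliability_z(table):
        # 1-df Chi-squared test on a 2x2 agreement table, converted to a Z value
        chi2, _, dof, _ = chi2_contingency(np.array(table), correction=False)
        assert dof == 1
        return np.sqrt(chi2)

    # Hypothetical agreement tables from two different studies
    z1 = reliability_z([[20, 5], [15, 60]])
    z2 = reliability_z([[40, 10], [10, 40]])

    # Compare the two Z values (assumed reading of Rosenthal & Rosnow, 1991)
    z_diff = (z1 - z2) / np.sqrt(2)
    p_value = 2 * (1 - norm.cdf(abs(z_diff)))
    print(round(z_diff, 2), round(p_value, 3))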

  49. Comments? Questions?

  50. Topics: Measures of agreement; Study of prevalence; Correlation to other constructs; Dynamics models; Development of EDM models
