Challenges of Using Rasch Analysis in Grading Essays

Explore why Rasch Analysis may not be the ideal method for grading essays, as discussed by Gavin T.L. Brown from The University of Auckland. This analysis delves into the unreliability of human ratings, issues with leniency and harshness, and highlights the assumptions that may not align with the complexities of evaluating extended writing assignments.

  • Rasch Analysis
  • Essay Grading
  • Writing Assessment
  • Academic Evaluation
  • Performance Ratings

Presentation Transcript


  1. Why Rasch analysis is not the answer in grading essays. Gavin T. L. Brown, The University of Auckland (gt.brown@auckland.ac.nz). Nordic Testing and Assessment of Writing Symposium, Trondheim, September 2015.

  2. Progress in Writing. Ordered categories of quality: Excellent (more than expected or desired) … Unsatisfactory (well below expected); OR Highly Accomplished (appropriate to an expert) … Novice (appropriate to a beginner). The categories may be norm-referenced or criterion-referenced, and serve as a guide to teaching, learning, and evaluation.

  3. Rating: guided judgments of expert raters (teachers) as to the best fit of a performance to a quality stage, often supplemented by a rubric.

  4. Unreliable/inconsistent. Humans are unreliable and inconsistent in their ratings: some are far from the target but consistent (#2); others are inconsistent and rarely on target (#4); some are close to, but not on, the target (#3). Training, monitoring, and moderation are needed to gain accuracy and precision.

  5. Lenience and Harshness. Some raters are consistent but either too lenient (never give a low mark) or too harsh (never give a high mark). These tendencies can be adjusted statistically once they are identified, as sketched below.
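
A minimal sketch of that kind of statistical adjustment, assuming a simple additive severity model in which each rater's offset is their mean deviation from the grand mean (operational systems such as multi-facet Rasch estimate severity on the logit scale instead):

```python
import numpy as np

# Toy ratings: rows = essays, columns = raters.
# Rater 0 is lenient (scores run high); rater 2 is harsh (scores run low).
ratings = np.array([
    [5.0, 4.0, 3.0],
    [4.5, 3.5, 2.5],
    [5.5, 4.5, 3.5],
])

# Estimate each rater's severity as the deviation of their column mean
# from the grand mean (additive model assumed, illustration only).
severity = ratings.mean(axis=0) - ratings.mean()

# Adjusted scores remove the lenience/harshness offset.
adjusted = ratings - severity
print(severity)   # [ 1.  0. -1.]
print(adjusted)   # the three rater columns now agree
```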

  6. Assumptions of Rasch statistical modeling. All responses are a function of a single factor: the difficulty of an item and the ability of a person, measured on a common scale. No other factor matters (e.g., item discrimination, the amount of chance or guessing, choice of task, marker, etc.). But this is clearly not applicable to the realities of marking extended writing.
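
Concretely, the single-factor assumption is the dichotomous Rasch model: the probability of success depends only on the gap between person ability and item difficulty. A minimal sketch (the function name is mine):

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """Dichotomous Rasch model:
    P(X = 1 | theta, b) = exp(theta - b) / (1 + exp(theta - b)).
    Only the gap between ability and difficulty matters -- the model has
    no discrimination, guessing, task, or marker parameters."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# A person 1 logit above an item's difficulty succeeds ~73% of the time.
print(rasch_probability(ability=1.0, difficulty=0.0))  # ~0.731
```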

  7. Statistical Analysis of Ratings. Multiple facets to take into account: (1) the students' performances (p), which differ from each other in quality; (2) the task (t) each student completes (unless all do the same); (3) the raters (r), who differ from each other (unless there is only one); (4) the components (c) or sub-scores within each performance; (5) the interactions p*t, p*r, t*r, p*c, r*c, t*c, p*t*r, p*t*c, p*r*c, t*r*c, p*t*r*c. Goal: variance in scores should be attributable to the student (p), NOT to the task they did, the marker they had, the component being used to judge, or any interaction of these construct-irrelevant factors. An illustrative simulation follows below.
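
An illustrative simulation of the first three facets, assuming simple additive effects (the effect sizes are invented for the illustration): when scoring works, the person facet (p) should dominate the variance, and the task (t) and rater (r) facets should contribute little.

```python
import numpy as np

rng = np.random.default_rng(0)
n_p, n_t, n_r = 200, 4, 5           # persons, tasks, raters

person = rng.normal(0, 1.0, n_p)    # what we WANT to measure
task   = rng.normal(0, 0.2, n_t)    # construct-irrelevant
rater  = rng.normal(0, 0.2, n_r)    # construct-irrelevant

# Fully crossed p x t x r design with residual noise.
scores = (person[:, None, None] + task[None, :, None]
          + rater[None, None, :] + rng.normal(0, 0.3, (n_p, n_t, n_r)))

# Crude variance attribution: variance of each facet's marginal means.
print("person var:", scores.mean(axis=(1, 2)).var())  # large, as desired
print("task var:  ", scores.mean(axis=(0, 2)).var())  # small
print("rater var: ", scores.mean(axis=(0, 1)).var())  # small
```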

  8. Good scoring of writing if… student scores are spread across the range; markers are close to each other; tasks are close to each other; components or sub-scores are close to each other.

  9. Techniques for analysis of multiple facets: multi-facet Rasch analysis and generalisability theory. Both estimate the proportion of variance in observed scores attributable to each facet. Key difference: Rasch analysis transforms the raw score to a logit (log-odds) before analysis, as sketched below.
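
A minimal sketch of that key difference, assuming a raw score r out of a maximum m is converted to the log-odds ln(r / (m - r)); the transformation is undefined at 0 and at the maximum, which real Rasch software handles separately:

```python
import math

def raw_to_logit(raw: float, maximum: float) -> float:
    """Convert a raw score to a log-odds (logit) value, the scale
    on which Rasch analysis operates."""
    return math.log(raw / (maximum - raw))

for r in (5, 10, 15):
    print(r, "/20 ->", round(raw_to_logit(r, 20), 2))
# 5/20 -> -1.1, 10/20 -> 0.0, 15/20 -> 1.1
```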

  10. Example: e-asTTle Writing MFR analysis. [Figure: the distribution of students, tasks, markers, and sub-scores from a norming sample on a common scale, running from "highly able" at the top to "very easy" at the bottom.]

  11. Example: e-asTTle Writing MFR analysis (continued). [Figure: the same map with 1 SD bands marked.] Almost all elements are within 1 SD of each other; in particular, almost all markers are within 1 SD of each other, and almost all tasks are within 1 SD of each other.

  12. Hence: yes, markers, tasks, and components are not identical in their tendency to award marks, and the differences are beyond chance. But most variation comes, correctly, from the students, not from construct-irrelevant factors. Having established this, do we need to make any further adjustments? And if so, which value is the correct one: the highest, the lowest, the average? Which one do we agree is the true value?

  13. Hence, the real problem: in classroom operation, what is the TRUE essay score? What is a valid reference point? Each teacher's truthiness? A test score on a related construct? The statistically adjusted score? The socially agreed value assigned by experts following a systematic process?

  14. Systematic Classroom Processes: a common rubric to guide judgments and discussion, linked to the curriculum; training in use of the rubric (exemplars, feedback); moderation by another marker; simple checks for the level of agreement; discussion of differences; non-use of scores until agreement is sufficient.

  15. Analytic Writing Rubric (one dimension shown: Audience Awareness and Purpose):
  • Level 2 (Proficient): Evidence that the writer recognises that his/her opinion is needed. May state opinions from a personal perspective.
  • Level 3 (Proficient): Language generally appropriate to the audience. Some attempt to influence the reader is evident.
  • Level 4 (Proficient): Writer is aware the audience may hold a different point of view, but tends to assume there is only one, generalised, different point of view. The opening presents a point of view to the audience. Writing attempts to persuade the reader. A clearly stated, consistent position is evident.
  • Level 5 (Proficient): Identifies and relates to a concrete/specific audience. Awareness of the intended reader is evident at the beginning and end, but may be inconsistent in middle sections. Language use is appropriate and has elements which begin to be persuasive to the audience.
  • Level 6 (Proficient): Implicit awareness that the audience may hold a range of points of view. Consistently persuasive throughout for the intended audience. Tone is likely to impact on, effect change in, or manipulate the intended audience towards the author's purpose.
  In NZ the asTTle system has 7 analytic scales: Audience Awareness & Purpose, Content or Ideas, Structure or Organisation, Language and Style, Grammar, Punctuation, Spelling. Available at http://www.tki.org.nz/r/asttle/user/writing-tuhituhi-ex_e.php

  16. Moderation of scoring: cross-checking by having 2 qualified judges mark and compare scores for a common group of essays. Identical scores: the target is 70% the same. Approximately equal (+/- 1 score point): the target is 90% the same if using an A+ to F scale. These checks are sketched below. Stemler, S. E. (2004). A comparison of consensus, consistency, and measurement approaches to estimating interrater reliability. Practical Assessment, Research & Evaluation, 9(4). Available online: http://tinyurl.com/m7grwph
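
A minimal sketch of those two consensus checks, assuming integer scores on a common scale (the helper name is mine):

```python
def agreement_rates(scores_a, scores_b):
    """Exact and adjacent (+/- 1 score point) agreement between two markers.
    Targets from the slide: ~70% exact, ~90% adjacent on an A+ to F scale."""
    pairs = list(zip(scores_a, scores_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, adjacent

marker_1 = [4, 3, 5, 2, 4, 3, 5, 4, 2, 3]
marker_2 = [4, 3, 4, 2, 5, 3, 5, 4, 1, 3]
print(agreement_rates(marker_1, marker_2))  # (0.7, 1.0) -- meets both targets
```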

  17. Moderation of scoring: debate, discussion, and resolution are needed for any essay whose scores differ by more than 1 letter grade, 3/20, or 10/100. Discussion must be linked to evidence in the essay and to the criteria in the scoring guide. If agreement can't be reached, a 3rd judge is needed, who should be MORE experienced than both markers. If you meet the expected targets, you can use the scores defensibly to make decisions about learning needs and priorities, and to report. The escalation rule is sketched below.
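
The escalation rule can be expressed as a simple check, with the tolerance stated in score points (the names and example values here are illustrative):

```python
def needs_moderation(score_a: int, score_b: int, tolerance: int) -> bool:
    """Flag an essay for debate/discussion -- and, if unresolved, a more
    experienced third judge -- when two markers differ by more than the
    agreed tolerance (e.g. 1 letter grade, 3/20, or 10/100)."""
    return abs(score_a - score_b) > tolerance

print(needs_moderation(14, 16, tolerance=3))  # False: within 3 on a /20 scale
print(needs_moderation(10, 16, tolerance=3))  # True: discuss, then escalate
```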

  18. Conclusion: Multi-facet Rasch or generalisability theory can determine whether construct-irrelevant factors undermine the validity of scores. This is necessary for norming purposes. BUT classroom judgments by teachers cannot be treated the same way until the teachers are calibrated into the system with training equivalent to that of norm-marking panels. Simple inter-rater moderation statistics and discussion are sufficient to generate dependable scores.

  19. Further New Zealand writing assessment resources at:
  • http://assessment.tki.org.nz/Moderation/Moderation-resources/Moderating-asTTle-writing
  • http://assessment.tki.org.nz/Assessment-tools-resources/What-Next
  • http://assessment.tki.org.nz/Assessment-tools-resources/The-NZ-Curriculum-Exemplars/English-exemplars
