
Formative Assessment Tools Development and Validation in Education
Explore the development and validation of assessment tools in education, focusing on Formative Assessment of Instruction (FASI). Learn about FASI definition, examples, content questions, perceptions statements, and the value of content versus perceptions in the teaching and learning process.
Presentation Transcript
Assessment Tools: Development and Validation. Wendy K. Adams, Colorado School of Mines.
Introduction. Survey Tools: Formative Assessment of Instruction (FASI). Definition; Development and Validation (experts & students); Interpretation.
Definition. FASI - Formative Assessment of Instruction:
Assessment of content knowledge: typically multiple choice. Examples: FCI, BEMA, CSEM, QMCS, CUE.
Assessment of perceptions: typically Likert scale. Examples: CLASS, PTaP.
Content Question (FCI - Force Concept Inventory). Despite a very strong wind, a tennis player manages to hit a tennis ball with her racquet so that the ball passes over the net and lands in her opponent's court. Consider the following forces: 1. A downward force of gravity. 2. A force by the "hit". 3. A force exerted by the air. Which of the above forces is (are) acting on the tennis ball after it has left contact with the racquet and before it touches the ground? A. 1 only. B. 1 and 2. C. 1 and 3. D. 2 and 3. E. 1, 2, and 3.
Perceptions Statements. Each statement is rated on a five-point scale (1 2 3 4 5) running between Strongly Agree and Strongly Disagree.
"After I study a topic in physics and feel that I understand it, I have difficulty solving problems on the same topic."
"I would become a grade 7-12 teacher if the pay were equal to my other career options."
FASI Value: Content or Perceptions.
Instructors: agree that students should be able to answer these questions (content); expect students to value the subject in their daily life (perceptions).
Students: do poorly on these concept tests when they enter the course (content); disagree with experts on how the subject applies to daily life and on how to learn (perceptions).
Remarkably stubborn: conceptual understanding doesn't change much after instruction; perceptions of application and of how to learn often get less expert-like after instruction.
Key References.
AERA (American Educational Research Association), APA (American Psychological Association), & NCME (National Council on Measurement in Education). (1999). Standards for educational and psychological testing. Washington, DC: Author.
NRC (National Research Council). (2001). Knowing what students know: The science and design of educational assessment. In J. W. Pellegrino, N. Chudowsky, & R. Glaser (Eds.), Committee on the Foundations of Assessment (Board on Testing and Assessment, Center for Education, Division of Behavioral and Social Sciences and Education) (pp. 1-14). Washington, DC: National Academy Press.
Adams, W. K., & Wieman, C. E. (2011). Development and validation of instruments to measure learning of expert-like thinking. International Journal of Science Education, 33(9), 1289-1312.
Development.
Phase 1. Delineation of the purpose of the test and the scope of the construct or the extent of the domain to be measured.
Phase 2. Development and evaluation of the test specifications.
Phase 3. Development, field testing, evaluation, and selection of the items and scoring guides and procedures.
Phase 4. Assembly and evaluation of the test for operational use.
Development. Phase 2: development and evaluation of the test specifications.
Item format (forced answer).
Desired psychometric properties (low stakes, selection of a few concepts/perceptions).
Time restrictions (less than a class period / 10 minutes).
Characteristics of the population.
Test procedures (e.g., pre/post).
Development. Phase 3 (development, field testing, evaluation, and selection of the items and scoring guides and procedures) and Phase 4 (assembly and evaluation of the test for operational use): these two are the bulk of the work and constitute both Development & Validation.
Validation. Collecting evidence of validity rather than validating the instrument.
Evidence based on test content: how well does it represent the domain in question?
Evidence based on response processes (e.g., if measuring reasoning, is the test taker using reasoning or an algorithm to answer?).
Evidence based on internal structure: single dimension or several components.
Evidence based on relations to other variables.
Development & Validation.
1. Establish topics that are important to teachers. (Expert interviews)
2. Interviews and observations to identify student thinking and the ways it can deviate from expert thinking. (Student interviews)
3. Create open-ended survey questions to probe student thinking more broadly. (Collect data)
4. Create a forced answer test. (Modify)
5. Carry out validation interviews with both novices and experts on the test questions. (Expert interviews, student interviews)
6. Administer to classes and experts - run statistical tests on the results. (Collect data)
7. Modify items as necessary.
Content Questions. "assessments ... should focus on making students' thinking visible to both their teachers and themselves so that instructional strategies can be selected to support an appropriate course of future learning" (NRC guidelines, 2001, p. 4).
Straightforward questions with clear language. Students think they understand, but they don't!
Distracters (possible answers): typically 3-5 distracters per question. Include common incorrect responses. Do not include any options that are not commonly chosen.
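The advice on distracters can be checked against field-test data. The following is a minimal sketch, not the author's procedure: it tallies how often each option is chosen and lists the incorrect options that fall below a cutoff. The function names, the 10% cutoff, and the response data are all made up for illustration.

```python
from collections import Counter


def option_shares(choices):
    """Return the percentage of respondents who picked each option."""
    counts = Counter(choices)
    total = len(choices)
    return {option: 100.0 * n / total for option, n in counts.items()}


def rarely_chosen(choices, options, correct, min_share=10.0):
    """List distracters picked by fewer than min_share percent of respondents.

    min_share is an illustrative cutoff, not a value taken from the slides.
    """
    shares = option_shares(choices)
    return [
        option
        for option in options
        if option != correct and shares.get(option, 0.0) < min_share
    ]


# Hypothetical responses from 20 students to one five-option question.
responses = list("CCCCCCCCEEEEEEEBBBBA")
candidates = rarely_chosen(responses, options="ABCDE", correct="C")
print("Distracters to reconsider:", candidates)
```

An option flagged this way is a candidate for replacement with a more common incorrect answer drawn from interviews or open-ended responses.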
Perceptions Statements. Quote from things experts say and things students say that you don't like to hear.
Straightforward language, ~4th grade level; avoid using "not".
Misconceptions about surveys: positive and negative version of each statement (rarely works); do not need to ask the same thing multiple times (less reliable); does not need to be limited to one construct.
Student Interviews. Think-aloud interviews.
Students should interpret the questions/statements consistently; agree with the expert for expert-like reasons; choose the correct answer for the right reasons only.
Probe student thinking when they disagree with the expert or choose the wrong answer.
~30 interviews per version per population.
Only valid for populations used for development and validation.
Despite a very strong wind, a tennis player manages to hit a tennis ball with her racquet so that the ball passes over the net and lands in her opponent's court. Consider the following forces: 1. A downward force of gravity. 2. A force by the "hit". 3. A force exerted by the air. Which of the above forces is (are) acting on the tennis ball after it has left contact with the racquet and before it touches the ground? A. 1 only. B. 1 and 2. C. 1 and 3. D. 2 and 3. E. 1, 2, and 3.
[Figure: bar chart of the percentage of students (0 to 100) choosing each option A-E for the tennis-ball question, with one bar annotated "Very popular".]
Interviews find: Definition of force. Needs a force to be moving. Clarity of timing - isolating moments in time.
A large box is pulled with a constant horizontal force. As a result, the box moves across a level floor at a constant speed. (Two other questions here) 28. If, instead, the horizontal force pulling the box is doubled, the box's speed: A. continuously increases. B. will be double the speed but still constant. C. is greater and constant, but not necessarily twice as great. D. is greater and constant for a while and increases thereafter. E. increases for a while and is constant thereafter.
[Figure: bar chart of the percentage of students (0 to 100) choosing each option A-E for question 28 about the box pulled with a doubled force.]
Interviews find: This one needs several concepts. Understanding that there's a difference between constant speed and speeding up. Net force results in an acceleration. How can something just keep speeding up?
Statistical Analyses. Textbook cutoffs do not necessarily apply.
Reliability - are the results consistent?
a. Alternate-form coefficients
b. Test-retest or stability coefficients
c. Internal consistency coefficients: Cronbach's alpha (only works if single construct)
"The ideal approach to the study of reliability entails independent replication of the entire measurement process." (Standards 1999, p. 27)
Statistical Analyses. Textbook cutoffs do not necessarily apply.
Item Analysis:
Item difficulty: percentage of students who got the item correct.
Item discrimination: how well an item differentiates between strong and weak students, as defined by their overall test score.
Point biserial correlation: correlation between the item and students' test scores.
Item correlations: correlations between individual items (>0.6, discard one of them).
Statistical Analyses. Factor Analysis: a data-intensive technique. Identifies which groups of questions/statements are answered consistently by students. Concept tests: group concepts. Perceptions: empirical categories for analysis.
Scoring and Analysis. Course Results:
Percent correct.
Normalized gain: fraction of what they could have learned, (Post - Pre) / (100 - Pre).
Effect size: how many standard deviations of change, (Post - Pre) / (Pooled St Dev).
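The two formulas above can be sketched as follows. This is a minimal example, assuming scores are percentages, the normalized gain is computed from course averages, and the pooled standard deviation is the usual sample-size-weighted combination of the pre and post standard deviations (the slides do not define it). Function names and data are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_gain(pre_pct, post_pct):
    """Fraction of the possible improvement achieved: (Post - Pre) / (100 - Pre)."""
    if pre_pct >= 100:
        raise ValueError("normalized gain is undefined when the pre-test score is 100%")
    return (post_pct - pre_pct) / (100 - pre_pct)


def pooled_sd(pre_scores, post_scores):
    """Pooled standard deviation of two samples (an assumed definition)."""
    n_pre, n_post = len(pre_scores), len(post_scores)
    var_pre, var_post = stdev(pre_scores) ** 2, stdev(post_scores) ** 2
    return sqrt(((n_pre - 1) * var_pre + (n_post - 1) * var_post) / (n_pre + n_post - 2))


def effect_size(pre_scores, post_scores):
    """Change in the mean in units of the pooled standard deviation."""
    return (mean(post_scores) - mean(pre_scores)) / pooled_sd(pre_scores, post_scores)


# Hypothetical percent-correct scores for one course.
pre = [30, 40, 50, 35, 45]
post = [60, 70, 80, 65, 75]
print(f"<g> = {normalized_gain(mean(pre), mean(post)):.2f}")
print(f"d   = {effect_size(pre, post):.2f}")
```

Reporting both numbers alongside raw percent correct reflects the point that interpretation is not just a single number: the normalized gain adjusts for where a class started, while the effect size relates the shift to the spread of scores.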
[Figure: Pre-test Scores. Bar chart of % correct on the pre-test for male and female students, by course, at Indiana University, University of Minnesota, Harvard, Florida International University, Hull (UK), University of Wisconsin-Stout, University of Northern Colorado, Manchester (UK), and Michigan State; courses are labeled IE, Traditional, or Modeling.]
[Figure: Post-test Scores. Bar chart of % correct on the post-test for male and female students in the same courses.]
[Figure: Pre and Post. The pre-test and post-test charts shown side by side.]
Percent Gain. [Figure: bar chart of % gain = (%Post - %Pre) for male and female students, by course.] % Gain: The sheer volume of material learned, measured by FCI, for UNC IE is greater than other courses shown.
Normalized Gain. [Figure: bar chart of normalized gain <g> for male and female students, by course.] Normalized Gain: The normalized gain for IE is nearly equivalent for the females and males. <g> = (%post - %pre) / (100% - %pre)
Effect Size. [Figure: bar chart of Cohen's d effect size for male and female students, by course.] Effect Size: The effect size shows how much the mean has moved compared to the standard deviation of the population. This data clearly shows UNC IE and Indiana have the most impressive increases in movement of means. d = (%post - %pre) / (pooled sd)
Scoring and Analysis. Course or Population Results: percent agreement with the expert.
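One way to compute percent agreement with the expert for a five-point survey is sketched below. It rests on several assumptions that are not stated in the slides: responses are coded 1 to 5 with 4 and 5 on the agree side and 1 and 2 on the disagree side, a neutral 3 counts as answered but not favorable, and each statement has a single expert direction. The names and data are hypothetical; adjust the coding to match the actual instrument.

```python
# Assumed coding of the five-point scale; swap these sets if the instrument
# puts "Strongly Agree" at 1 instead of 5.
AGREE_VALUES = {4, 5}
DISAGREE_VALUES = {1, 2}


def percent_favorable(responses, expert_key):
    """Percent of answered statements where the student sides with the expert.

    responses: dict mapping statement id -> integer 1..5 (missing = skipped).
    expert_key: dict mapping statement id -> "agree" or "disagree".
    """
    favorable = 0
    answered = 0
    for statement_id, expert_side in expert_key.items():
        response = responses.get(statement_id)
        if response is None:
            continue  # skipped statements are left out of the denominator
        answered += 1
        if response in AGREE_VALUES:
            student_side = "agree"
        elif response in DISAGREE_VALUES:
            student_side = "disagree"
        else:
            student_side = "neutral"
        if student_side == expert_side:
            favorable += 1
    if answered == 0:
        raise ValueError("no scored statements were answered")
    return 100.0 * favorable / answered


# Hypothetical expert key and one student's responses.
expert_key = {"s1": "disagree", "s2": "agree", "s3": "agree", "s4": "disagree"}
student = {"s1": 1, "s2": 5, "s3": 3, "s4": 4}
print(f"{percent_favorable(student, expert_key):.0f}% favorable")
```

Averaging this score over students, overall or within a category of statements, gives the kind of percent favorable figure that can be compared pre and post.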
Course pre/post Shifts. [Figure: pre/post shifts (-25 to +25) for SCI 265 Traditional in each category: Overall, All categories, Personal Interest, Real World Connection, PS General, PS Confidence, PS Sophistication, Sense Making/Effort, Conceptual understanding, Applied Conceptual understanding.]
Beliefs by Major. [Figure: % Favorable Score (PRE), 0% to 100%, Overall and for Personal Interest, by group: 1st and 2nd yr grads; 2nd yr Phys majors; 1st yr Phys majors; whole class, mostly engineers; Bio/Physiology & Chem; Non-sci majors; Elem. teacher cand.]
CLASS Women vs. Men. One study asked TWO questions: "What would a physicist say?" and "What do you think?" [Figure: two panels plotting Favorable against Unfavorable scores for the categories Conceptual Understanding, Math, Sense Making/Effort, Real World Connections, and Personal Interest.] The "You" results are consistent with typical CLASS scores.
Conclusion.
Formative Assessments of Instruction: characterize who's in your class; identify strengths and weaknesses of instruction; compare different types of instruction within and across institutions.
Development and Validation: student interviews, expert interviews, data collection, and iteration.
Interpretation: not just a single number.