First, I would like to thank ACARA for engaging with my blog "The silent tragedy of NAPLAN, students reported in misleading bands". It is great to see that we still have an attentive and responsive public service. For that I am grateful. My problem is with the Australian body politic that drives some of NAPLAN's policies.
First a quick thanks to all those who responded to the blog. The response was heartening.
The point that I was trying to make with the blog was that there are random students doing NAPLAN who are getting wildly misleading results. Probably not a high percentage, but when over a million students undertake the test, 5% is 50,000 students. Most of those students might not even be psychologically affected. But some, perhaps those who tried hard on the promise of an ice-cream, or those who tried hard to please mum who is going through a rough trot, or those who tried hard one last time to be good at numbers or words, will be. Out of one million, the number of students may be less than 10,000, or less than 1%. But this worries the caring teacher type, as it causes unnecessary grief.
Students, perhaps more than adults, are particularly vulnerable when things are unfair. Students roughly know where they are with their school work. When they receive feedback that is fair, justified and agrees with their self-perceptions, they generally accept it thoughtfully. Fairness is a big thing in testing (for example, see Camilli, 2006; Zieky, 2015). When feedback is unfair, students can have maladaptive emotions (for example, see Vogl & Pekrun, 2016).
As a former maths teacher, I like the mathematics of assessment, often finding it more beautiful than useful. I like the simplicity, elegance and flexibility of the Rasch model (Adams & Wu, 2007; Rasch, 1960). I like the magic of plausible values, that random numbers can sometimes be more useful than real ones (Wu & Adams, 2002). And what is there not to like about a number called a Warm estimate (or WLE), which gives a student who doesn't get anything right a non-zero score? But this magic does not always work for me with NAPLAN reports.
Response 1 (it gets a bit boring here)
- in the blog, I mentioned that confidence intervals were not included; I didn't mention standard errors (a trivial difference, though)
- the confidence intervals are not shown on student reports (from what I'm told)
- the equivalence table for 2016 is 26 pages of raw numbers, hardly accessible to parents.
A hypothetical example, using the 2016 Year 3 Spelling Score Equivalence Table (see below):
- a student gets a raw score of 13
- from the table, this is a scaled score of 439.7
- from the table, the reported band is Band 5
- from the table, the scale standard error is 20.36
- therefore the 90% confidence interval half-width = 1.64 × 20.36 ≈ 33.4
- therefore the 90% confidence range is 406 (Band 4) to 473 (Band 5)
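The arithmetic above can be sketched in a few lines of Python. The scaled score and standard error are the figures quoted from the 2016 Year 3 Spelling table; 1.64 is the usual two-sided 90% normal critical value:

```python
# 90% confidence range for a NAPLAN scaled score, using the
# figures quoted above from the 2016 Year 3 Spelling table.
scaled_score = 439.7    # scaled score for a raw score of 13
standard_error = 20.36  # scale standard error from the same table

z_90 = 1.64             # two-sided 90% normal critical value

half_width = z_90 * standard_error  # about 33.4
lower = scaled_score - half_width   # about 406 (Band 4)
upper = scaled_score + half_width   # about 473 (Band 5)

print(f"90% confidence range: {lower:.0f} to {upper:.0f}")
```

The range spans two bands, which is the whole point: a single reported band hides roughly 67 scale points of uncertainty.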
so this student's graphic report looks something like this (on top), with my annotation below
So my claim that student scores are not reliably reported in NAPLAN remains, as does the observation that this unreliability is not clearly communicated to students and parents. Further, the scores are used inappropriately by the media to label students as 'improvers', 'coasters' and 'strugglers'.
Sure, and most school-based paper-and-pencil tests administered by teachers are built on classical test theory (CTT) principles. Perhaps CTT tests lack precision, but student scores relate directly to the content on the test, which enables teachers to give meaningful feedback around that content.
NAPLAN is based on the Rasch (1960) principle that the raw score is the sufficient statistic, and it reports on levels that are abstracted from the content (read the book). This makes it close to impossible to give meaningful feedback. Teachers would need to administer another test, or get evidence from elsewhere, to provide that feedback.
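The sufficient-statistic point can be made concrete with a small sketch (the five item difficulties below are invented for illustration, and the crude grid search stands in for a proper estimation routine): under the Rasch model, two different response patterns with the same raw score yield the same maximum-likelihood ability estimate, so the score can say nothing about which items a student got right or wrong.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_likelihood(theta, pattern, difficulties):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    ll = 0.0
    for x, b in zip(pattern, difficulties):
        p = rasch_p(theta, b)
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll

def mle_theta(pattern, difficulties):
    """Crude grid search for the maximum-likelihood ability."""
    grid = [t / 100.0 for t in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, pattern, difficulties))

# hypothetical difficulties for a five-item test
b = [-1.0, -0.5, 0.0, 0.5, 1.0]

# two different response patterns, both with a raw score of 3
theta1 = mle_theta([1, 1, 1, 0, 0], b)
theta2 = mle_theta([0, 1, 1, 1, 0], b)
print(theta1, theta2)  # identical: only the raw score matters
```

The estimates coincide because the score equation depends on the responses only through their sum, which is exactly what "the raw score is the sufficient statistic" means.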
Online testing may provide enhanced precision, but the feedback will remain less useful. Because students within cohorts will do different forms, teachers will need to trawl through test forms to have any hope of providing feedback. Further, the socialising narrative of the test will be diminished, which is likely to alienate marginal students.
Moss (2003) argues that in "much of what I do, I have no need to draw and warrant fixed interpretations of students' capabilities; rather, it is my job to help them make those interpretations obsolete." The reporting regime of NAPLAN makes results, in the terms of Austin (1962), performative. That is, NAPLAN reports do not so much describe as create and define. Performative statements are not true or false, but either happy or unhappy. Some of NAPLAN's student reporting is unhappy.
(I’m happy to amend or withdraw if there are errors, let me know)
Adams, R. J., & Wu, M. (2007). The mixed-coefficients multinomial logit model: A generalized form of the Rasch model. In M. von Davier & C. H. Carstensen (Eds.), Multivariate and mixture distribution Rasch models (pp. 57-75). New York, NY: Springer New York.
Austin, J. L. (1962). How to Do Things with Words. Oxford: Oxford University Press.
Camilli, G. (2006). Test fairness. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 221-256). Westport, CT: American Council on Education and Praeger Publishers.
Moss, P. A. (2003). Reconceptualizing validity for classroom assessment. Educational Measurement: Issues and Practice, 22(4), 13-25.
Rasch, G. (1980). Probabilistic Models for Some Intelligence and Attainment Tests. Chicago: MESA Press. (Original work published 1960)
Vogl, E., & Pekrun, R. (2016). Emotions that matter to achievement. In G. T. Brown & L. R. Harris (Eds.), Handbook of human and social conditions in assessment (pp. 111-128). New York: Routledge.
Wu, M., & Adams, R. J. (2002). Plausible values: Why they are important. Paper presented at the International Objective Measurement Workshop, New Orleans.
Zieky, M. J. (2015). Developing fair tests. In S. Lane, M. R. Raymond, & T. M. Haladyna (Eds.), The handbook of test development (2 ed., pp. 81-99). New York: Routledge.