The Demise of Teacher Professional Judgement

Follow-up to Constellation or Continuum – metaphors for assessment

There are many ways in which teacher professional judgement can shape schooling. Teachers can participate in the development of study designs, curriculum and syllabus, and they can also participate in exam setting, exam marking and standard setting. In this way teachers perform sophisticated social roles, mediating between systems and the lifeworld of students as well as setting and maintaining educational norms and expectations on behalf of the community. This kind of participation, where teachers both contribute to the creation of norms and learn how to teach them, is present in all systems to some extent, and highlights the important roles teachers can have as moral agents and moral leaders. However, there are currently two developments working against teachers taking on system roles as moral agents: 1) the instrumental reasoning of mathematical models, and 2) the post-conventional/post-traditional nature of technology-based education, which makes teacher participation problematic.

Instrumental Reasoning

Where once curriculum and assessment were reflections of social expectation (including the expectations of industry), this normative function has to some extent been superseded by uni-dimensional models of curriculum and assessment, mainly the Item Response Theory models (e.g. see De Ayala, 2009; Embretson & Reise, 2000; Masters, 1982; Rasch, 1980) and their associated continuum metaphor. In education systems where Item Response Theory models become prevalent, learning progressions are less determined by social expectation and more determined by instrumentally defined scale progression, so that curriculum begins to consist of ‘content that scales’ instead of content that meets social expectations. Once curriculum content consists of ‘content that scales’, teachers’ participation in standard setting is no longer required: instead of socially defined educational standards, standards can be set through cut-points, cut-scores and bands defined instrumentally, and arbitrarily, by the application of Item Response Theory based algorithms.

My thesis will argue that this phenomenon can lead to various outcomes, including 1) the alienation of teachers’ work, 2) curriculum and assessment that do not address social expectations, 3) students alienated from society and not fully socialised, and 4) a general loss of social capital across the system. It can also be seen as very efficient and cost-saving, as it doesn’t require expensive teacher engagement.

Post-conventional or post-traditional nature of education

The need to develop new educational norms and expectations during a time of rapid development in digital technology presents another issue for teacher engagement. Beavis (2010, p. 26) articulates this well when she states that factors such as cultural heritage and identity are at play not only for the student and teacher but also for the subject itself. The moral reasoning required of teachers is therefore far greater at a time when the system capacity of teachers has been greatly diminished through cutbacks. This leaves a vacated landscape that the private sector (e.g. the Ultranet; see Bajkowski, 2013) or other consortia (e.g. 21st Century Skills; see Griffin, McGaw, & Care, 2012) can seek to fill.

Discussion

Not all contemporary assessments are grounded in mathematical models. The Victorian Certificate of Education (VCE), for example, is curriculum and assessment that is firmly socially grounded. The study designs for the VCE reflect the social, cultural and economic activity of Victoria, and Victorian teachers are actively involved in their design and implementation, including exam setting and marking. The VCE also uses routine statistical techniques (standardization and normalization) to create a single score, and then an ATAR, for students, which can be used as currency in the future job and education markets in Victoria and beyond. These features make the VCE a highly regarded qualification, but such significant social buy-in will make it difficult to adapt to technology-based education, though this can be overcome with good management, good planning and sufficient resources for stakeholder engagement.
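The standardization step mentioned above can be sketched in a few lines. This is only an illustration of the routine z-score technique; the scores are invented and the real VCE scaling procedure is considerably more involved.

```python
# Standardize a set of raw scores to mean 0 and standard deviation 1
# (a z-score). The scores below are invented for illustration; actual
# VCE scaling is far more elaborate than this.

def standardize(scores):
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((s - mean) ** 2 for s in scores) / n) ** 0.5
    return [(s - mean) / sd for s in scores]

raw = [25, 30, 35, 40, 45]
z = standardize(raw)
print([round(v, 2) for v in z])  # [-1.41, -0.71, 0.0, 0.71, 1.41]
```

Once every score sits on a common scale, scores from different studies can be combined into a single aggregate, which is the general idea behind producing one number per student.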

There is also some hope in the constellation metaphor and in the use of Bayesian techniques to develop curriculum and assessment that is more comprehensive (e.g. Almond, Mislevy, Steinberg, Yan, & Williamson, 2015). However, establishing good Bayesian belief networks also requires extensive participation from experienced teachers, so the danger of the constellation metaphor is that instead of relying on teachers’ input, these networks will instead be based on trawling through learning analytics data. Should this occur, my thesis is that this too would lead to alienating circumstances for teachers and students.

My thesis will develop the view that sophisticated and socially cohesive education systems have a sufficient base of morally competent teachers who are involved in the setting of curriculum and assessment, and whose judgement is informed and supported by sophisticated data systems (constellation and continuum). Of course this could potentially bifurcate the other way, with teachers and students becoming increasingly alienated by technocratic systems.

Almond, R. G., Mislevy, R. J., Steinberg, L., Yan, D., & Williamson, D. (2015). Bayesian Networks in Educational Assessment. Tallahassee: Springer.

De Ayala, R. J. (2009). The Theory and Practice of Item Response Theory. New York: Guilford Press.

Bajkowski, B. J. (2013). News review: Vic Auditor fails Ultranet. (March).

Beavis, C. A. (2010). English in the Digital Age: Making English Digital. English in Australia, 45(2), 21–30. Retrieved from http://www98.griffith.edu.au/dspace/handle/10072/37149

Embretson, S. E., & Reise, S. P. (2000). Item Response Theory for Psychologists. L. Erlbaum Associates.

Griffin, P., McGaw, B., & Care, E. (Eds.). (2012). Assessment and Teaching of 21st Century Skills. Dordrecht: Springer. doi:10.1007/978-94-007-2324-5

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174. doi:10.1007/BF02296272

Rasch, G. (1980). Probabilistic Models for Some Intelligence and Attainment Tests. Chicago: MESA Press.

Constellation or Continuum – metaphors for assessment

In this post I want to lay the groundwork for a major shift in assessment methodology that education will experience in the coming decades. I will do so by discussing educational objectives, heuristic metaphors, and mathematical models.

To be clear, what we are talking about here are mathematical models and how they implement metaphors or ways of verbal reasoning about educational objectives.  These models inform how we think about and organise content, including assessment, at the system level. While this post will remain agnostic on the science of how the brain works, these models nevertheless inform how we approach students and organise schooling.

After discussing two metaphors, this post will discuss potential issues in their use with a view to informing teacher participation in a broader debate. While it may be unreasonable to expect teachers to understand the mathematics, it is reasonable to expect teachers to engage at the metaphorical and verbal levels.

Constellation or Continuum

The constellation and continuum metaphors have long and evolved histories in the academic and published literature. I will discuss these metaphors in terms of their main exponents and uses.

The continuum metaphor, or the ruler metaphor, is the one most Australians would be familiar with or have experienced. It is the metaphor used by both NAPLAN and PISA as part of system evaluation, and it is therefore also used in many derivative studies and by those who wish to align themselves with these methodologies. Australia has many world-leading exponents of the continuum metaphor, with Geoff Masters the best known due to his development of the Partial Credit Model (Masters, 1982), an extension of the earlier Rasch Model (Rasch, 1980). The mathematical models associated with this metaphor are generally called Rasch Models or Item Response Theory (e.g. see De Ayala, 2009; Embretson & Reise, 2000) and are often described in terms of improvements to Classical Test Theory.
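At its core, the dichotomous Rasch model places student ability and item difficulty on the same single scale and relates them through a logistic function. The sketch below is a minimal illustration, with invented parameter values:

```python
import math

def rasch_probability(theta, b):
    """Dichotomous Rasch model: the probability that a student of
    ability theta answers an item of difficulty b correctly."""
    return math.exp(theta - b) / (1 + math.exp(theta - b))

# When ability exactly matches difficulty, the probability is 0.5;
# students well above an item's difficulty approach certainty.
print(rasch_probability(0.0, 0.0))              # 0.5
print(round(rasch_probability(2.0, 0.0), 3))    # 0.881
```

The single parameter theta is the whole point of the metaphor: every student reduces to one position on one ruler, which is precisely what makes the model both powerful for system comparison and limited as a diagnostic.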

The constellation metaphor is not so well known in large-scale assessment. A well-known exponent is Robert Mislevy who, while remaining pluralistic, opened up the field through his work with others on Evidence Centred Design (ECD) (Almond, Mislevy, Steinberg, Yan, & Williamson, 2015; Mislevy, Steinberg, Almond, Haertel, & Penuel, 2003). This metaphor can also be associated with diagnostic or cognitive assessment (e.g. Leighton & Gierl, 2007, 2011; Rupp & Templin, 2008). The mathematical models associated with this metaphor include Bayesian Networks, Neural Networks and elaborations of Item Response Theory. The constellation metaphor is not as widely used because its models are more difficult to implement, although they are often used in post-hoc analysis of learning data.
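As a toy illustration of the Bayesian reasoning behind such networks, consider a single latent proficiency node with one observed response. All of the probabilities below are invented for the example; real belief networks link many proficiency and evidence nodes.

```python
# A minimal two-node belief network: latent 'mastery' -> observed 'correct'.
# Bayes' rule updates the prior belief in mastery after seeing a response.
# All probabilities here are invented for illustration.

def posterior_mastery(prior, p_correct_given_mastery,
                      p_correct_given_no_mastery, observed_correct):
    if observed_correct:
        num = prior * p_correct_given_mastery
        den = num + (1 - prior) * p_correct_given_no_mastery
    else:
        num = prior * (1 - p_correct_given_mastery)
        den = num + (1 - prior) * (1 - p_correct_given_no_mastery)
    return num / den

# Start agnostic (prior 0.5); a correct answer raises belief in mastery.
print(round(posterior_mastery(0.5, 0.9, 0.2, True), 3))   # 0.818
```

Crucially, the conditional probabilities in a real network have to come from somewhere, which is where the experienced teacher participation discussed in this post enters the picture.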

A simple example

The profound differences between the two metaphors can be illustrated through a simple example. Below is a diagram showing a simple test of eight questions, which tests the four operations first with smaller numbers and then with larger numbers. Student A can do all operations, but not with larger numbers. Student B can do just addition and subtraction.

[Diagram: the two students’ response patterns across the eight questions]

The key issue here is that each student has quite a different state of proficiency, yet the raw score for these two patterns cannot distinguish between them, so raw-score mathematical models, as used by the continuum metaphor, cannot readily detect this type of difference. A deviant response pattern may be picked up in a misfit or bias analysis, but unless there is some additional treatment these two students will be reported the same.
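The point can be made concrete in a few lines of code. The response vectors follow the example above: items 1–4 are the four operations with smaller numbers, items 5–8 the same operations with larger numbers.

```python
# Eight items: the four operations with smaller numbers (items 1-4),
# then the same operations with larger numbers (items 5-8).
# 1 = correct, 0 = incorrect.
student_a = [1, 1, 1, 1, 0, 0, 0, 0]  # all operations, smaller numbers only
student_b = [1, 1, 0, 0, 1, 1, 0, 0]  # addition and subtraction only

# Identical raw scores...
print(sum(student_a), sum(student_b))  # 4 4

# ...yet quite different proficiency states.
print(student_a == student_b)  # False
```

Any model whose input is the raw score alone has already discarded the information that separates these two students.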

The two ways of reporting these response patterns under each metaphor are illustrated below.

[Diagram: the same response patterns reported as a constellation of stars and as positions on a ruler]

It is clear that differences between the two students are lost under the continuum metaphor, but are captured under the constellation metaphor.

My hypothesis is that Australia has been captured by the continuum metaphor through the good fortune of having some of the world’s leading Item Response Theorists (Masters, Adams, Andrich, Wu, Wilson and others). It is this circumstance that has also led to a neglect of the constellation metaphor and of concern for what individual Australian students are able to do; a neglect that has contributed to a decline in overall student performance and to the paradoxical situation in which Australia is well placed to measure its own decline. This is a hypothesis only; it cannot be empirically proved, but it can be reasoned about.

Furthermore, I also contend that the continuum metaphor, with its focus on measurement, comparability and comparisons, is sometimes mistaken for the work of neoliberal forces. It is not really a conspiracy, but a by-product of some smart people working very effectively at the endeavour that interests them.

Discussion

The constellation and continuum metaphors have corresponding metaphors for how we talk about teaching. Related to the constellation metaphor are ‘who a student is’, ‘collection of knowledge’, ‘learning as growth’ and ‘depth and relation’. Related to the continuum metaphor are ‘where a student is’, ‘uni-dimensionality’, ‘teacher as conduit’ and ‘learning as filling an empty vessel’.

A particularly effective use of the continuum metaphor is as a system evaluation tool, which is why it is used in PISA, NAPLAN and TIMSS. As a system evaluation metaphor it is also very effective at detecting system biases, and it therefore served both the accountability and civil rights movements in the United States during the last century (see Gordon, 2013), which in part has led to the dominance of the metaphor today.

What is clear from the example above is that the continuum metaphor, and by extension NAPLAN, is a poor diagnostic device, able to provide little information about the student or about what to teach next, other than a vague location of where a student may sit in relation to other students.

While the constellation metaphor is better at providing diagnostic information to teachers, these sorts of assessments are also a lot more difficult to manage and implement and have therefore not been implemented at scale. Instead, the constellation metaphor is increasingly being used for post-hoc analysis and fishing exercises on causal relations in education; for example learning analytics (e.g. Behrens & DiCerbo, 2014).  For those who consider education as a purposeful activity, this type of post-hoc meaning making may be of concern.

I trust this may help some, writing it has helped clarify some of my thoughts.

Addendum

Where both the constellation and continuum metaphors are driven by mathematical models, the determination of matters such as bands and cut-scores is largely arbitrary, determined by a choice of parameter. This contrasts with traditional standard-setting procedures based on the professional judgements of groups of teachers (e.g. see Cizek, 2012) or holistic judgements in higher education (e.g. see Sadler, 2009). The metaphors can of course be used to support teacher judgement, and some methods in Cizek’s book recommend this.
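The arbitrariness is easy to demonstrate: shifting the cut-score parameter by a small amount re-labels students, with no change in anything the students actually know or can do. The ability estimates below are invented for illustration.

```python
# Invented ability estimates on a logit scale for ten students.
abilities = [-1.8, -1.1, -0.6, -0.3, -0.1, 0.1, 0.4, 0.7, 1.2, 1.9]

def count_above(cut, thetas):
    """How many students land in the top band for a given cut-score."""
    return sum(1 for t in thetas if t >= cut)

# The same cohort under two slightly different parameter choices:
print(count_above(0.0, abilities))  # 5 students labelled 'proficient'
print(count_above(0.5, abilities))  # 3 students labelled 'proficient'
```

Nothing about the students changed between the two lines; only the parameter did, which is the sense in which instrumentally set standards are arbitrary in a way teacher-set standards are not.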

Almond, R. G., Mislevy, R. J., Steinberg, L., Yan, D., & Williamson, D. (2015). Bayesian Networks in Educational Assessment. Tallahassee: Springer.

De Ayala, R. J. (2009). The Theory and Practice of Item Response Theory. New York: Guilford Press.

Behrens, J. T., & DiCerbo, K. E. (2014). Harnessing the Currents of the Digital Ocean. In J. A. Larusson & B. White (Eds.), Learning Analytics:From Research to Practice (pp. 39–60). New York: Springer.

Cizek, G. J. (Ed.). (2012). Setting Performance Standards: Foundations, Methods, and Innovations. New York: Routledge.

Embretson, S. E., & Reise, S. P. (2000). Item Response Theory for Psychologists. L. Erlbaum Associates.

Gordon, E. W. (Ed.). (2013). To Assess, to Teach, to Learn: A Vision for the Future of Assessment : Technical Report. Retrieved from http://www.gordoncommission.org/rsc/pdfs/gordon_commission_technical_report.pdf

Leighton, J. P., & Gierl, M. J. (2007). Cognitive Diagnostic Assessment for Education: Theory and Applications. New York: Cambridge University Press.

Leighton, J. P., & Gierl, M. J. (2011). The Learning Sciences in Educational Assessment: The Role of Cognitive Models. Cambridge University Press.

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174. doi:10.1007/BF02296272

Mislevy, R. J., Steinberg, L. S., Almond, R. G., Haertel, G. D., & Penuel, W. R. (2003). Leverage points for improving educational assessment (PADI technical report 2). Menlo Park: SRI International.

Rasch, G. (1980). Probabilistic Models for Some Intelligence and Attainment Tests. Chicago: MESA Press.

Rupp, A. A., & Templin, J. L. (2008). Unique Characteristics of Diagnostic Classification Models: A Comprehensive Review of the Current State-of-the-Art. Measurement: Interdisciplinary Research & Perspective, 6(4), 219–262. doi:10.1080/15366360802490866

Sadler, D. R. (2009). Indeterminacy in the use of preset criteria for assessment and grading. Assessment & Evaluation in Higher Education.


NAPLAN, Conflict of Interest and Research Ethics

Rejoinder to Timna Jacks – Company marking NAPLAN accused of conflict of interest

Timna Jacks’ article on Pearson Australia is a welcome reminder of Jean-François Lyotard’s (1979) seminal observations about data banks and the commercialization of knowledge.  The issues Lyotard identified have been emerging for decades and can be addressed through the tradition of research ethics.

[Poster: uses of data in education]

Research ethics seeks to protect the vulnerable in data collection; in this case, students. The traditional ethical concerns of informed consent and conflict of interest are central to the issue Jacks brings to light, and they have been somewhat disregarded in the data frenzy currently capturing the education sector. On informed consent there are four key ethical questions: 1) do students have a choice about participation, 2) do students trust the data collection process, 3) are students confident that their results will be used fairly, and 4) are the interests of students, data agencies and third parties balanced? The nature of students’ informed consent is unclear: there is a legislative mandate for students to attend school, but this mandate translates into a social expectation, not a compulsion, for students to participate in NAPLAN. The key issue in Jacks’ article centers on the balance of interests between parties. Are the interests of students sufficiently balanced with the interests of others? The article suggests not.

By way of background, education is currently experiencing a clash of data collection traditions. Traditionally, educational assessment focused on providing a reliable indicator that teachers could use to report to parents and that systems could provide to students for use in the broader education and job markets. A second, distinct data tradition relates to school evaluation and accountability, such as the evaluative reports made available through the MySchool website – www.myschool.edu.au. That NAPLAN provides a reliable indicator to parents and systems generates broad public support for the program, even if its curriculum coverage is somewhat limited. But there are now three other data traditions operating across education that may also be infiltrating NAPLAN and which may not be so transparent: scientific education research, quality management, and market research.

Public support for scientific education research tends to be high, but this tradition is a little more fraught. Scientific research is littered with disturbing episodes (e.g. Albert Neisser, Willowbrook, Tuskegee) but has been largely tamed through initiatives such as the Declaration of Helsinki and the work of university ethics committees. The extent to which NAPLAN data is used for scientific research, and the ethical frameworks surrounding this research, is unclear. There is therefore some justification for public concern on these matters.

Walter Shewhart’s work in quality management provides yet another data tradition. This tradition became prevalent during the industrial age to ensure the quality and reproducibility of manufacturing. Its techniques are now widely applied in the service sector and are increasingly being applied in education, where educational administrators use them to influence the work of schools and teachers.

Finally, it is the tradition of market research that is the most pernicious in education, and it is the issue at the heart of Jacks’ article: the possibility that data collected on the basis of creating a common understanding is being used for concealed strategic action. While it is unlikely that this is happening within such a large organisation, it is the possibility that it might be that is of concern, and it calls into question the social expectation that we as adults place on children to participate in NAPLAN.

The existing regulatory framework around data collection in Victoria, for example, is fragmented and patchwork. Children are mandated to attend school through the Education and Training Reform Act 2006, which is silent on participation in testing. There are also the Privacy and Data Protection Act 2014, the Health Records Act 2001 and the Public Records Act 1973. It is uncertain whether this legislative and regulatory framework appropriately addresses the underlying issues first identified by Lyotard and which Jacks alerts us to in her article.

Jacks alerts us to a significant issue in education, and we should be thankful for her efforts. But it is potentially only the tip of the iceberg in terms of ethical issues. Two actions are required of government. First, a comprehensive review of the legislative and regulatory frameworks around data collection in education. Should any shortcomings be identified, these need to be addressed and new standards promulgated to bureaucrats, contractors, parents, teachers and students. The second action relates to conflict of interest. Government needs to centralize data collection in a new statutory agency independent of education administration and commercial education services. That is, data collection, indicator production and reporting should reside in an authority independent of the department responsible for the management of schools and teachers, an authority with no other responsibility but data, indicators and reporting. This would also mitigate the kind of ethical issues the Victorian department has recently experienced.