In this post I want to lay the groundwork for a major shift in assessment methodology that education will experience in the coming decades. I will do so by discussing educational objectives, heuristic metaphors, and mathematical models.

To be clear, what we are talking about here are mathematical models and how they implement metaphors or ways of verbal reasoning about educational objectives. These models inform how we think about and organise content, including assessment, at the system level. While this post will remain agnostic on the science of how the brain works, these models nevertheless inform how we approach students and organise schooling.

After discussing two metaphors, this post will discuss potential issues in their use with a view to informing teacher participation in a broader debate. While it may be unreasonable to expect teachers to understand the mathematics, it is reasonable to expect teachers to engage at the metaphorical and verbal levels.

**Constellation or Continuum**

The constellation and continuum metaphors have long and evolved histories in the academic and published literature. I will discuss these metaphors in terms of their main exponents and uses.

The **continuum** metaphor, or the ruler metaphor, is the one most Australians would be familiar with or have experienced. It is the metaphor used by both NAPLAN and PISA as part of system evaluation. It is therefore also used in many derivative studies and by those who wish to align themselves with these methodologies. Australia has many world-leading exponents of the continuum metaphor, with Geoff Masters the best known due to his development of the Partial Credit Model (Masters, 1982), which was a development of the earlier Rasch Model (Rasch, 1980). The mathematical models associated with this metaphor are generally called Rasch Models or Item Response Theory (e.g. see de Ayala, 2009; Embretson & Reise, 2000), and they are often described in terms of improvements to Classical Test Theory.

The **constellation** metaphor is not so well known in large-scale assessment. A well-known exponent is Robert Mislevy who, while remaining pluralistic, opened up the field through his work with others on Evidence Centred Design (ECD) (Almond, Mislevy, Steinberg, Yan, & Williamson, 2015; Mislevy, Steinberg, Almond, Haertel, & Penuel, 2003). This metaphor can also be associated with diagnostic assessment or cognitive assessment (e.g. Leighton & Gierl, 2007, 2011; Rupp & Templin, 2008). The mathematical models associated with this metaphor include Bayesian Networks, Neural Networks and elaborations of Item Response Theory. The constellation metaphor is not as widely used because its models are more difficult to implement, although they are often used in post-hoc analysis of learning data.

**A simple example**

The profound differences between the two metaphors can be illustrated through a simple example. Below is a diagram showing a simple test of eight questions, which tests the four operations first with smaller numbers and then with larger numbers. *Student A* can do all operations but not with larger numbers. *Student B* can just do addition and subtraction.

The key issue here is that each student has quite a different state of proficiency, yet the raw score cannot distinguish between these two response patterns. Mathematical models based on raw scores, as used by the continuum metaphor, therefore cannot readily detect this type of difference. A deviant response pattern may be picked up in a misfit or bias analysis, but unless there is some additional treatment these two students will be reported as the same.
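One way to see why the raw score hides this difference is to look at the simplest of the continuum models, the dichotomous Rasch model. The notation here is my own choice for illustration ($\theta_n$ for the ability of student $n$, $\delta_i$ for the difficulty of item $i$, $x_{ni}$ for the scored response and $r_n$ for the raw score), and operational programs use more elaborate variants, so treat this as a sketch rather than a description of any particular assessment.

$$
P(X_{ni} = 1 \mid \theta_n, \delta_i) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}
$$

$$
P(x_{n1}, \dots, x_{nI} \mid \theta_n) = \prod_{i=1}^{I} \frac{\exp\big(x_{ni}(\theta_n - \delta_i)\big)}{1 + \exp(\theta_n - \delta_i)} = \frac{\exp\big(r_n \theta_n - \sum_{i=1}^{I} x_{ni}\,\delta_i\big)}{\prod_{i=1}^{I}\big(1 + \exp(\theta_n - \delta_i)\big)}, \qquad r_n = \sum_{i=1}^{I} x_{ni}
$$

In the second expression the ability $\theta_n$ appears only alongside the raw score $r_n$. Under this model, two students with the same raw score therefore receive the same ability estimate, whichever items they answered correctly.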

The two ways of reporting these two response patterns under each metaphor are illustrated below.

It is clear that differences between the two students are lost under the continuum metaphor, but are captured under the constellation metaphor.
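The same contrast can be sketched in a few lines of Python. The item layout, the function names, and the assumption that Student B answers the addition and subtraction items correctly at both number sizes are mine, for illustration only; this is not how any operational assessment produces its reports.

```python
# Hypothetical layout: four operations, each asked with smaller then larger numbers.
OPERATIONS = ["addition", "subtraction", "multiplication", "division"]
SIZES = ["small", "large"]
ITEMS = [(op, size) for op in OPERATIONS for size in SIZES]  # 8 items


def responses(can_do):
    """Score each item 1 or 0 according to a rule describing the student."""
    return [1 if can_do(op, size) else 0 for op, size in ITEMS]


# Student A: all four operations, but only with smaller numbers.
student_a = responses(lambda op, size: size == "small")
# Student B (assumed): addition and subtraction only, at either size.
student_b = responses(lambda op, size: op in ("addition", "subtraction"))


def continuum_report(resp):
    """Continuum-style report: a single number, here the raw score."""
    return sum(resp)


def constellation_report(resp):
    """Constellation-style report: items correct per operation and per size."""
    report = {key: 0 for key in OPERATIONS + SIZES}
    for (op, size), score in zip(ITEMS, resp):
        report[op] += score
        report[size] += score
    return report


for name, resp in [("Student A", student_a), ("Student B", student_b)]:
    print(name, continuum_report(resp), constellation_report(resp))
```

Both students receive the same continuum report, a raw score of 4, while the counts per operation and per number size differ between them.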

My hypothesis is that Australia is captured by the continuum metaphor because of its good fortune in having the leading Item Response Theorists in the world (Masters, Adams, Andrich, Wu, Wilson, etc.). This circumstance has also led to a neglect of the constellation metaphor and of concern about what individual Australian students are able to do. That neglect has, in turn, led to a decline in overall student performance and to a paradoxical situation where Australia is well placed to measure its own decline. This is only a hypothesis; it cannot be empirically proved, but it can be reasoned about.

Furthermore, I contend that the continuum metaphor, with its focus on measurement, comparability and comparisons, is sometimes mistaken for neoliberal forces. It’s not really a conspiracy, but just a by-product of some smart people working very effectively in the endeavour of their interest.

**Discussion**

The constellation and continuum metaphors have corresponding metaphors for how we talk about teaching. Related to the constellation metaphor are ‘who a student is’, ‘collection of knowledge’, ‘learning as growth’ and ‘depth and relation’. Related to the continuum metaphor are ‘where a student is’, ‘uni-dimensionality’, ‘teacher as conduit’ and ‘learning as filling an empty vessel’.

A particularly effective use of the continuum metaphor is as a system evaluation tool, which is why it is used in PISA, NAPLAN and TIMSS. As a system evaluation metaphor it is also very effective at detecting system biases, and it therefore served both the accountability and civil rights movements in the United States during the last century (see Gordon, 2013). This in part has led to the dominance of the metaphor today.

What is clear from the example above is that the continuum metaphor, and by extension NAPLAN, is a poor diagnostic device. It provides little information about the student or about what to teach next, other than a vague location where a student may be in relation to other students.

While the constellation metaphor is better at providing diagnostic information to teachers, these sorts of assessments are also a lot more difficult to manage and implement and have therefore not been implemented at scale. Instead, the constellation metaphor is increasingly being used for post-hoc analysis and fishing exercises on causal relations in education, for example in learning analytics (e.g. Behrens & DiCerbo, 2014). For those who consider education a purposeful activity, this type of post-hoc meaning making may be of concern.

I trust this may help some; writing it has helped clarify some of my thoughts.

**Addendum**

Where both the constellation and continuum metaphors are driven by mathematical models, the determination of matters such as bands and cut-scores is largely arbitrary and determined by a choice of parameter. This contrasts with traditional standard setting procedures that are based on the professional judgements of groups of teachers (e.g. see Cizek, 2012) or holistic judgements in higher education (e.g. see Sadler, 2009). The metaphors can of course be used to support teacher judgement, and some methods in Cizek’s book recommend this.

Almond, R. G., Mislevy, R. J., Steinberg, L., Yan, D., & Williamson, D. (2015). *Bayesian Networks in Educational Assessment*. Tallahassee: Springer.

de Ayala, R. J. (2009). *The Theory and Practice of Item Response Theory*. Guilford Press.

Behrens, J. T., & DiCerbo, K. E. (2014). Harnessing the Currents of the Digital Ocean. In J. A. Larusson & B. White (Eds.), *Learning Analytics: From Research to Practice* (pp. 39–60). New York: Springer.

Cizek, G. J. (Ed.). (2012). *Setting Performance Standards: Foundations, Methods, and Innovations*. New York: Routledge.

Embretson, S. E., & Reise, S. P. (2000). *Item Response Theory for Psychologists*. L. Erlbaum Associates.

Gordon, E. W. (Ed.). (2013). *To Assess, to Teach, to Learn: A Vision for the Future of Assessment: Technical Report*. Retrieved from http://www.gordoncommission.org/rsc/pdfs/gordon_commission_technical_report.pdf

Leighton, J. P., & Gierl, M. J. (2007). *Cognitive Diagnostic Assessment for Education: Theory and Applications*. New York: Cambridge University Press.

Leighton, J. P., & Gierl, M. J. (2011). *The Learning Sciences in Educational Assessment: The Role of Cognitive Models*. Cambridge University Press.

Masters, G. N. (1982). A Rasch model for partial credit scoring. *Psychometrika*, *47*(2), 149–174. doi:10.1007/BF02296272

Mislevy, R. J., Steinberg, L. S., Almond, R. G., Haertel, G. D., & Penuel, W. R. (2003). *Leverage points for improving educational assessment (PADI technical report 2)*. Menlo Park: SRI International.

Rasch, G. (1980). *Probabilistic Models for Some Intelligence and Attainment Tests*. Chicago: MESA Press.

Rupp, A. A., & Templin, J. L. (2008). Unique Characteristics of Diagnostic Classification Models: A Comprehensive Review of the Current State-of-the-Art. *Measurement: Interdisciplinary Research & Perspective*, *6*(4), 219–262. doi:10.1080/15366360802490866

Sadler, D. R. (2009). Indeterminacy in the use of preset criteria for assessment and grading. *Assessment & Evaluation in Higher Education*.