Validity, reliability, washback … raise your hand if you have ever been confused by a) what these terms mean and b) how they can actually help you in your real-life work as a teacher.

If you have done the Cambridge DELTA or an MA TESOL, you will be familiar with this terminology related to language assessment. I certainly was, and have recently re-acquainted myself with it as I have been teaching a module on online assessment on a Master's degree course and have started a British Council research project on the impact of IELTS on forced migrants.

Through these experiences, I have realised that the validity of our assessment and of formal language exams extends much further than our classrooms, with tangible consequences on students’ lives and our societies. This is how I got interested in validity: what does it mean for a test to be valid and how do language tests impact students and our society at large?



What is validity?

Broadly speaking, as Mark Smith reminds us in Unmasking the validity of exams, validity in language assessment refers to the extent to which a test measures what it is supposed to measure. You may also have heard this referred to as construct validity.

This may sound a little theoretical, but it is actually quite a practical concept: if you consider your everyday formative and summative assessment practices, you may think about whether they assess what you originally set out to assess – and crucially, what the learners think is being assessed. For example, if you have explained to your students that they are going to do a listening comprehension test, listening comprehension is the construct that you are assessing. However, if the test includes long essay questions and you also mark the students’ grammatical accuracy, to what extent is their listening being assessed as opposed to their reading, vocabulary or grammar? By the way, if you are interested in this area, you may like to have a look at Stefen O’Grady’s thoughts on Developing alternative listening assessments!



Beyond construct: consequential validity

But there is more to validity than just construct validity. Indeed, as argued by Messick (1996), validity includes a consequential aspect, which means that we should also evaluate the intended and unintended consequences of how test scores are interpreted and used in real life. Evaluating the extent to which score-based inferences are appropriate, meaningful and useful depends greatly on the social consequences of tests, and not just on whether they make sense intrinsically, i.e. whether they test what they are supposed to test.

According to Weir (2005), consequential validity should focus on three main areas: differential validity, washback and effects on society. Differential validity refers to bias, that is, whether the way that a test is built affects specific groups of candidates. This may be due to their cultural background, background knowledge, cognitive characteristics or other individual characteristics, like their age or gender. For example, if a speaking test requires a student to talk about topics that they have no idea about because they are culture-specific, this may be an issue of bias.



Washback is the impact of assessment on teaching and learning, and it is what was traditionally referred to as the impact of language assessment. You may have experienced this yourself: you know that your students have to prepare for a high-stakes language test, so you dedicate part of your lessons to preparing them for that specific exam. Although most of us are probably used to thinking about washback in negative terms, it does not always need to be negative: for example, in their article Tell your group, David Coniam, Mandy Lee Wai Man and Kerry Howard describe positive washback, as their class had fun while preparing for an oral exam.

Effects on society is the more recent (and, I think, most important) part of consequential validity. Indeed, language tests have tangible effects on people's lives and on society at large. Think, for example, about the real-life consequences of failing a standardised language test that is required for immigration purposes, or the impact on students of failing an end-of-year exam at school or university. Through language tests, language can become a tool for gatekeeping, even though this may not be an intended consequence of the test itself.


Critical Language Assessment

This is what the field of Critical Language Assessment (or Critical Language Testing) studies, starting from the assumption that the act of testing is in itself never neutral, but rather ‘a product and an agent of cultural, social, political, educational and ideological agendas that shape the lives of individual participants, teachers and learners’ (Shohamy, 1998:332).

Some questions coming from Critical Language Assessment that may be worth thinking about are:

  • What models do the tests promote?
  • What and whose agendas, if any, are behind the tests?
  • To what extent are test scores absolute and prescriptive versus open to discussion and interpretation?
  • What kind of real-life consequences do tests have on test-takers?

These are important issues to consider, especially as the drive for standardised testing in language learning has been increasing for decades: as I mentioned in my piece Taking stock and looking ahead, IELTS alone is taken by over 3 million people a year!

So what are your thoughts on assessment? What impact does assessment have in your context in terms of consequential validity? Please tell us in the comments section!



Messick, S. (1996). ‘Validity and washback in language testing’. Language Testing 13(3), 241–257. London: Sage.

Shohamy, E. (1998). ‘Critical Language Testing and Beyond’. Studies in Educational Evaluation 24(4), 331–345. Amsterdam: Elsevier.

Weir, C. J. (2005). Language testing and validation: an evidence-based approach. Basingstoke: Palgrave Macmillan.