High-Stakes Testing: Necessary but Not Sufficient

Standardized tests have been both vilified and venerated, and, as with most polarizing issues, the truth likely lies somewhere in the middle.  Despite their well-documented shortcomings, they are widely used in high-stakes circumstances, from measuring the effectiveness of learning programs to serving as gatekeepers for schools, universities, and employers.  It’s no wonder that test preparation is a multibillion-dollar industry, which, paradoxically, calls into question the validity of these tests as true measures of learner ability, aptitude, or achievement.  As it turns out, it is difficult to tell the difference between someone who truly understands the concepts being tested and someone who has simply learned to game the system.

We see this very clearly with the standardized English proficiency tests used around the world for university admission.  Just last month, I met an entrepreneur in Vietnam who told me she could teach anyone to “hack the TOEFL,” which is not far from the claims that many test preparation centers make.  Learners are often able to get the scores they need for acceptance without actually mastering the English skills they need to succeed in an English-only academic environment.  They then arrive on campus unable to understand lectures or, in the case of graduate student teachers, unable to give them.

We aren’t going to get rid of standardized tests anytime soon, but what we can do is take them with a grain of salt.  Or, rather, with some other measures of proficiency, performance, and achievement. Taking multiple measures of something in different ways, or triangulation, is a time-tested way of making sure that our measurement of a complex phenomenon is robust, comprehensive, and complete.

So how does this work with language learning?  First, we should figure out what the standardized test itself is measuring by asking some questions:

  • Is the test evaluating reading, writing, listening, and speaking performance as separate entities or are the skills tested together?  
  • Is the entire test multiple choice?  
  • Are any sections graded by human raters?
  • Does the test use academic language or does content come from some other domain?

Answering these questions will help determine what additional data points to consider in evaluating how well someone can actually use English.  In addition to scores on standardized proficiency tests, we should look at learners’ mastery of domain-specific content. For example, if we’re talking about international graduate students, we’ll want to know if they can understand the language used in academic articles.  Can they explain their own research? Can they give lectures? Including a portfolio of English materials as well as an in-person interview as part of the language evaluation process is a great way to measure these things.

Standardized tests are a blunt instrument that, under the best of circumstances, measures a test-taker’s performance at a single moment in time.  When we are measuring something as complex as language ability, we need to go beyond multiple-choice questions and academic essays that follow a prescribed format, so that both the test-taker and the organization requiring the test can see what the learner is actually capable of doing.

Katie is Voxy’s Chief Education Officer, which means she leads the teams ensuring that learners are getting the most efficient and effective educational experience possible.  She has a PhD in Second Language Acquisition and years of experience teaching languages, building language courses, and evaluating the effectiveness of language training as a research scientist.  She lectures and writes about all things related to language learning and educational technology.