Computers are taking over testing.
Such is the fear of those who worry that greater automation will lead to a shift in educational values from individuality to conformity. It’s also the hope of those who envision a future where a computer is able to offer students an exam so individualized that even the question content matches their personal interests. The state of computerized testing and preparation suggests that the testing industry is on track to a future that blends these two visions. Some examples:
- Both College Board (the maker of the SAT) and ACT are rolling out computer-based testing of their signature exams in areas offering state-sponsored testing. Number 2 pencils could soon be relegated to the dustbin of history as students key in answer choices.
- Khan Academy, in partnership with College Board, has implemented a feature for students called “Signal Check,” whereby students receive computer-generated feedback on their essay writing. Teachers can lay down their red pens as students turn to an algorithm for writing advice.
- ACT recently announced that it will soon begin using automated item generation (AIG) software, aptly named IGOR—Item GeneratOR—to increase the speed of writing new tests. Test makers can eschew the blank page as a program writes algebra problem after algebra problem.
The question Compass is asked most often is this: how soon will these tests become entirely computer-based?
The growth in the field of educational technology over even the last few years suggests that the answer should be VERY SOON. The reality is probably more like soon-ish. A closer look at each of these examples reveals technology that is slightly less than reliable because of the risks that face any online system and the extent to which human intervention is still required.
Let’s take online testing to start. ACT and College Board now offer limited computer-based testing at schools contracted through state or district-wide testing. It certainly feels like both companies are moving exams from paper to computers in the near future. But three factors slow this process: security, access, and reliability.
Both ACT and College Board are beset with claims that they are not doing enough to protect their tests and that hacking and cheating are rampant. During the redesign process for the SAT, hundreds of questions were leaked to Reuters reporters. International testing centers have been shut down with alarming frequency over the past few years. Both ACT and College Board have suggested that the move online would increase security: if there are no paper booklets, they argue, there will be nothing to steal. Such a stance begs for evidence. How could either organization reasonably safeguard against the digital attacks that have had such massive effects on businesses and institutions in the past year alone? With a motivated population of students, domestic and international, and the potential high stakes of their performance, it’s far more likely that increased online availability will only make the SAT and ACT more of a security risk.
Even if College Board and ACT were able to marshal the resources to secure their exams, they still wouldn’t have the capability to provide equitable access to every student who wanted to take an exam online. It’s not as though they are walking into a system where technological access is evenly distributed in schools across the country. If the tests go fully online, how do schools without resources provide their students the hardware required to securely take an exam? The momentum has been behind providing students testing opportunities during the school day, or at least at their own school on the weekends. Moving students from schools to dedicated test centers (as the GRE does, for instance) would run counter to that trajectory.
And then there’s network reliability. Technical issues outside of either the school’s or test provider’s control can skew results. Take South Carolina, for example. Students who took a computer-based version of the ACT ended up having to retake the exam because Amazon Web Services (AWS), which powered the software, went down, freezing the test intermittently for students. It’s hard to take a speeded test when the computer itself is slowing you down.
None of these issues are insurmountable on their own, but in concert they certainly become harder to manage as online testing scales. Increasingly, the benefits of online testing appear less compelling than the stability offered by the paper and pencil method that has worked reasonably well for so long (the occasional lost or stolen batch of tests aside).
If online testing is further off than we might expect, surely there are other areas where technology can support test preparation.
College Board and Khan Academy have been pitching their automated essay feedback for the last couple of years. Here’s how it works: educators take a large sample of essays about a single prompt and rate them across the three areas for which students receive scores: Writing, Analysis, and Reading. Then a committee arrives at a set of comments that is meant to be both instructive and encouraging to students. All of this information then gets fed into the computer, which analyzes the essays for patterns. It’s not so much that the program “reads” the essay, it’s more that it is trained to recognize syntactical patterns and the repetition of words and phrases. These patterns are then associated with both the human-derived scores and the comments appended to them.
After playing around with the program, it’s fair to say that the essay is not being read the way it would be by professional graders. For instance, if in the midst of the essay you were to write that x is “evidence of the author trying to draw a comparison between opposing positions,” you might receive a comment like: “Analysis: You’re on your way here with a description of the author’s choices. As a writer, you can’t necessarily assume that your reader will draw the same conclusions you have drawn for the evidence, so your job now is to show how this evidence works to support the argument.” This is generic feedback couched in personalized terms, and it likely stems from your use of “evidence” and “the author trying,” terms the program has learned to associate with the analysis score. The system doesn’t, and can’t, know that you did go on to show how the evidence supports the author’s argument. The computer is not offering unique feedback; it is matching feedback by comparison to other essays. The more students write in a standardized format using similar language choices, the more likely the program is to offer relevant feedback.
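The mechanism described above can be sketched in a few lines. This is a toy illustration, not the actual Khan Academy/College Board system: the trigger phrases, scoring dimensions, and canned comments below are all assumptions made up for the example. The point is only that human-written comments get attached when an essay contains phrases the model has learned to associate with a scoring area.

```python
# Toy sketch of pattern-triggered essay feedback (illustrative only).
# Real systems are trained on human-scored essays; here the "learned"
# associations are simply hard-coded.
LEARNED_TRIGGERS = {
    "analysis": {"evidence", "the author", "argues", "claims"},
    "writing": {"however", "furthermore", "in conclusion"},
}

CANNED_COMMENTS = {
    "analysis": "Analysis: You're on your way here with a description "
                "of the author's choices. Show how this evidence works "
                "to support the argument.",
    "writing": "Writing: Transitions like these help guide your reader "
               "through your argument.",
}

def feedback(essay: str) -> list[str]:
    """Return canned comments whose trigger phrases appear in the essay.

    A dimension's comment fires when at least two of its trigger
    phrases occur -- the system never 'reads' the essay's reasoning.
    """
    text = essay.lower()
    comments = []
    for dimension, triggers in LEARNED_TRIGGERS.items():
        if sum(phrase in text for phrase in triggers) >= 2:
            comments.append(CANNED_COMMENTS[dimension])
    return comments
```

Note that an essay mentioning “evidence” and “the author” receives the analysis comment whether or not the surrounding argument actually holds together, which is exactly the limitation described above.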
ACT’s turn to auto-generated items works similarly. A first glance at their press release suggests that ACT is relying on computers to develop new test forms. But that’s not quite right. What ACT will do is feed a ton of human-written questions and their metadata into a computer so that the computer can learn what the qualities are of an ACT question. From this data set it generates similar items. You can imagine a computer scanning a reading passage and offering items that ask “In line x, the word [vocabulary word] most nearly means” with four options. And then doing that over and over again.
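That fill-in-the-template loop can be made concrete with a short sketch. IGOR’s actual internals are not public, so the template string, vocabulary bank, and review flag below are hypothetical stand-ins for whatever ACT’s software really uses.

```python
import random

# Hypothetical template-based item generation (AIG) sketch.
# The vocabulary bank pairs each tested word with one key and
# three distractors; a real system would mine these from metadata.
VOCAB_BANK = {
    "temper": ["moderate", "anger", "harden", "season"],
    "novel": ["new", "fictional", "lengthy", "bound"],
}

TEMPLATE = 'In line {line}, the word "{word}" most nearly means:'

def generate_item(word: str, line: int, answer: str) -> dict:
    """Fill the stem template and shuffle the answer choices."""
    choices = VOCAB_BANK[word][:]
    random.shuffle(choices)
    return {
        "stem": TEMPLATE.format(line=line, word=word),
        "choices": choices,
        "answer": answer,
        # Generated items are queued for human review, never
        # placed directly on a test form.
        "status": "pending_human_review",
    }
```

Run against a scanned passage, a generator like this can emit vocabulary-in-context items one after another; the expensive human work shifts from drafting stems to reviewing output.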
But it’s not like these questions go straight from the computer to the test form. They are run through an extensive human review process. The more questions generated by computers, the fewer test writers required to stare at a blank page for inspiration.
Signal Check and IGOR require human intervention both before and after they run. In both cases the initial writing, whether an essay or a question, is done by people and evaluated by people. Given enough data, the computer can simulate this process to an extent. But it still requires people to review the questions or interpret the feedback.
Perhaps someday test takers will face an exam that is manufactured by computers from top to bottom, but for today’s high school students, it’s a safe bet to assume that humans will be working their analog magic at the edges of testing for some time to come.