A month has passed since the first glimpse of the new SAT. The buzz has abated and will intensify again once details are disclosed on April 16th. In an attempt to “improve the silence”, I share here what I’m seeing, and not seeing.
Though highlights of the changes have been suitably covered in the media, the responding discourse has mostly circumvented what should be our central concern. Despite the changes the SAT has undergone over the years, what the test is has always mattered less than what the test does. Rather than anticipate what the new test will contain, we should scrutinize more closely what the new test will reliably do. The issue is face validity versus predictive validity. If the former were to supplant the latter, the test’s relevance in college admissions would decline.
A test has face validity when it appears, “on its face”, to measure what is deemed important. Psychometrically, this is the weakest form of evidence, which is ironic given the SAT’s move toward testing one’s ability to anchor answers to evidence. That a test seems to measure something doesn’t mean it does. However, face validity is important because it affects public attitudes toward a test. Sound familiar?
A test has predictive validity when its scores correlate positively with criteria measured in the future, such as first-year college grades. The SAT is only relevant if colleges can rely on its predictive validity. If the new test were to deliver on this promise less convincingly than its predecessors, its usefulness would taper, but we’d need years to know that for sure. Meanwhile, colleges will accept it on its face, or not.
Predictive validity is pertinent because of what we didn’t hear, for once, on March 5th. Previously, we heard that the new SAT would improve its differentiation at its tails. Given what I know about test construction, I was skeptical: as more is asked of the SAT, might it do fewer things well? As it soon flexes to do more (by ostensibly testing less, no less), something will have to give. And until we see countervailing evidence, I sense the test’s optics will diminish.
You can’t draw finer score distinctions than the questions you have available allow. Try to imagine a test that at once improves its differentiation and focuses on fewer things, the few things that, we’ve been sententiously told, “matter most”. Or a test that opens more doors of opportunity and transforms “possibilities for everyone and anyone” and still makes meaningful distinctions across a broad range of applicants. The test is shortening, its breadth narrowing, and its questions may take longer to answer (restricted use of calculators, evidence-supported answers, more complex equations and analysis, more reading). And without deductions for wrong answers, the range of possible raw points shrinks further. This redesign is a radical dream and a psychometric nightmare.
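To put rough numbers on the raw-score point, here is a minimal Python sketch. The question counts (60 and 50) and the quarter-point deduction are illustrative assumptions, not the actual specifications of either test, and `raw_scores` is a hypothetical helper that simply enumerates every mix of right, wrong, and omitted answers.

```python
from fractions import Fraction


def raw_scores(num_questions, penalty):
    """Return the set of distinct raw scores over every mix of
    right, wrong, and omitted answers (omits score zero)."""
    scores = set()
    for right in range(num_questions + 1):
        for wrong in range(num_questions - right + 1):
            scores.add(right - penalty * wrong)
    return scores


# Hypothetical section lengths and deduction, chosen only for illustration.
QUESTIONS_LONG = 60
QUESTIONS_SHORT = 50
PENALTY = Fraction(1, 4)

with_penalty = raw_scores(QUESTIONS_LONG, PENALTY)
rights_only = raw_scores(QUESTIONS_SHORT, Fraction(0))

print("longer section, with deduction:",
      len(with_penalty), "distinct raw scores from",
      min(with_penalty), "to", max(with_penalty))
print("shorter section, rights only:",
      len(rights_only), "distinct raw scores from",
      min(rights_only), "to", max(rights_only))
```

Under these assumptions the penalized 60-question section allows 295 distinct unrounded raw scores spanning -15 to 60, while the rights-only 50-question section allows 51, spanning 0 to 50. Rounding raw scores before scaling would narrow that gap, but the direction of the effect stays the same.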
So while much focus will be on how these changes will treat test takers, the bigger issue is how they might trammel the test makers. Shorten the test, narrow its focus and perhaps reduce the number of possible raw outcomes: can they really do all this and improve the test’s performance? Or is this a shift in utility? A surrendering of a once high-stakes differentiator, recast as a capstone exam; a test that pushes out rather than pulls in?
And while the noble mission of delivering opportunity to under-represented students is well stated, I fear unintended consequences. The original SAT was meant to reveal latent potential in unlikely places, to discover talent in students who don’t necessarily have classical educations. Over time, the test was said to have lost its way. However, might the new changes only further undermine that original goal? It is said the new test will be modeled on the work of our best classroom teachers and the most rigorous course work. Whom will that benefit most? Might the test become even more yielding to the most advantaged students and even more futile for those most hindered?
If so, then will the test distinguish between two otherwise outstanding applicants as well as it does now? While unpopular to discuss out loud, selective colleges find the conspicuous difference of 300 SAT points between two otherwise comparable applicants consequential. If that difference is obscured, the test becomes less useful to its consumers.
And the footing gets even more dangerous with the test’s handling of sub-populations. That the SAT correlates with family income, for example, has not been a design flaw; it’s been a usage flaw. If the new test were engineered to break that correlation, it would earn a Pyrrhic victory on its way to defeat. Despite what some think, and even claim, colleges don’t exactly need a test on which sub-populations do better. Rather, they want a test that better identifies which individuals of a sub-population will most likely succeed. A sudden rise in scores among sub-groups is no promise that it will. And if traditionally underperforming groups further underperform, well, then that’s an even bigger problem.
To be fair, we’ve been assured that previous scores will map to new scores, which implies that the bell curve will keep its familiar shape. But the alignment of scores, given the narrowing of the test, could involve a “smushing” of scores, a side effect that contradicts earlier indications that the SAT will soon get better at distinguishing among students.
For all its limitations, the SAT has endured because it reliably produces, by design, a symmetric distribution of outcomes every year. What that distribution means is another matter. Changes to the test can crown different winners, but they can’t crown more winners unless it becomes a test of mastery. And a test of mastery will have either a negative or positive skew, take your pick. AP Exams and Subject Tests exemplify this.
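The skew claim can be illustrated with a deliberately crude Python sketch. It assumes a toy binomial model in which every test taker has the same chance of answering each of 50 equally hard questions correctly; the section length, the three probabilities, and the `score_skewness` helper are all hypothetical, and real score distributions are far messier.

```python
from math import comb


def score_skewness(num_questions, p_correct):
    """Skewness of the number-correct distribution under a toy
    binomial model (same success probability on every question)."""
    pmf = [
        comb(num_questions, k)
        * p_correct ** k
        * (1 - p_correct) ** (num_questions - k)
        for k in range(num_questions + 1)
    ]
    mean = sum(k * w for k, w in enumerate(pmf))
    variance = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    third_moment = sum((k - mean) ** 3 * w for k, w in enumerate(pmf))
    return third_moment / variance ** 1.5


NUM_QUESTIONS = 50  # hypothetical section length

# Hypothetical per-question success rates for three kinds of test.
scenarios = [
    ("differentiating test, middling difficulty", 0.5),
    ("mastery test most students pass", 0.9),
    ("mastery test most students fail", 0.1),
]
for label, p in scenarios:
    print(f"{label}: skewness {score_skewness(NUM_QUESTIONS, p):.3f}")
```

In this toy model the middling-difficulty test comes out essentially symmetric, the test most students master piles scores near the top with a tail trailing downward (negative skew), and the test few students master does the reverse (positive skew).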
And this is where President Coleman seems understandably torn: his background is in mastery but the mission of the SAT has long been differentiation. Can he yoke these competing ideals? Perhaps to a degree, but that would redefine the SAT and would be a gamble I am unsure member colleges are prepared to take.