Richard Atkinson and Saul Geiser recently published a critique of the New SAT in the New York Times. The article raises some interesting points about the SAT — and about standardized testing in general — but it also leaves much out and, in my opinion, contains some serious errors. The context of the article is vital. Much of the piece reworks the criticisms that Atkinson leveled at the SAT when he was president of the University of California and has repeated during the SAT’s various incarnations in the years since.
His 2001 lecture, “Standardized Tests and Access to American Universities,” explained his decision to ask the UC Academic Senate to abandon the SAT (at the time, the SAT I) and depend, instead, upon the SAT Subject Tests (then known as the SAT IIs), at least until a better, nobler set of exams could take their place. There were two problems with this proposal: the Academic Senate did not adopt it, and Atkinson never explained who would develop, or how they would develop, “standardized tests that are directly tied to the college preparatory courses required of students applying to UC.” In 2004, Atkinson and Patricia Pelfrey published a paper, Rethinking Admissions, in which Atkinson expanded on his criticism of the SAT I and on his support for the SAT IIs. After retiring as president of the UC, he and Geiser produced a valuable, if biased, historical sweep of the admission testing landscape going back to 1901: Reflections on a Century of College Admissions Tests. He followed it up with a lecture, the title of which sounds remarkably similar to his May 4, 2015 piece in the Times: “The New SAT: A Test at War with Itself.” [The “New SAT” referenced in 2009 was what we currently call the SAT Reasoning and had been introduced in 2005.] Unfortunately, history, like the Academic Senate, let him down. The SAT Subject Tests continue to decline in relevance. No other public university system required the Subject Tests. And the ACT and SAT had almost 1 million more test takers in 2014 than in 2001.
Most relevant to his current criticism of the SAT, though, is that no suitable replacement has arisen as a test for college admission. In fact, Atkinson’s search for the perfect test seems to depend upon it never arriving. Imagining an ideal test is easier than creating one and far easier than having it accepted. The standard formulation in Reflections is that “test X would have been a tremendous replacement for the SAT had it not been waylaid by problem Y.” There is always a waylayer. The STAR would work, except it doesn’t. The Golden State Examinations would have worked, but they didn’t. The California Standards Tests offered hope, but they were abandoned. The SAT IIs should have been crowned, but they weren’t. The APs might work, but they won’t. He likes to quote the testing philosophy of the ACT’s creator and extols the virtues of the ACT until it crashes on the rocks of norm-referenced testing (NRT), the same fate ascribed to the revised SAT (a test, it should be pointed out, that will not be offered for 10 more months). Atkinson seems to require a test that will never be required.
The authors get in the standard knocks against a straw man SAT: infamously tricky, puzzle-type items; an essay rewarding sheer verbosity; obscure language; family income; native intelligence. These criticisms, except for the essay dig, are rooted primarily in the original Scholastic Aptitude Test and apply less and less to the SAT that has evolved over the last twenty years. Atkinson admits that the new SAT will be an improvement in many areas, but norm-referencing seems to make all the difference between success and failure. There are many reasons not to like the current SAT or to lament the new SAT, but an attack on norm-referencing plays into common misperceptions and underestimates the value of such measures.
We are to believe that criterion-referenced tests (CRTs) provide measurement against “fixed academic standards.” Fixed? Criteria migrate. Criteria rapidly migrate when money is on the line. It’s amazing how much smarter students get when federal funding is about to be turned off. There is too often a fundamental misconception that measuring academic progress or academic standards is incompatible with norm-referenced testing. The Stanford Achievement Tests, the TerraNova, the ACT Aspire, and the College Board’s ReadiStep (to become the PSAT 8/9 and PSAT 10) are all Common Core-aligned norm-referenced tests. The quality and quantity of testing items tend to determine how effectively a testing instrument meets its goals.
It is implied that NRTs depend exclusively on multiple-choice items; they do not.
“Test designers accomplish this, among other ways, by using plausible-sounding ‘distractors’ to make multiple-choice items more difficult, requiring students to respond to a large number of items in a short space of time, and by dropping questions that too many students can answer correctly.”
Multiple-choice items cannot function without “plausible-sounding ‘distractors’,” and the Common Core-aligned exams that are CRTs, such as Smarter Balanced and PARCC, are heavily dependent on multiple-choice. It should also be noted that these CRTs do exactly what Atkinson and Geiser lament about the SAT: “[require] students to respond to a large number of items in a short space of time” (although I’d quibble over how demanding or unrealistic that really is). As someone who heard every tick of the clock as I raced through blue books in college, I can attest that speededness does not end with the SAT.
In Reflections, Atkinson acknowledges the tension between K-12 achievement tests and college admission needs:
“Standards for what is expected of entering freshmen at selective colleges and universities are different and usually much more rigorous than K-12 curriculum standards. They may overlap, to be sure, but they are not the same, and institutional conflicts over standards and testing are probably inevitable for this reason. College and university faculty are right to be skeptical about using K-12 tests in admissions if it means relinquishing control over entrance standards.”
Standards tests that provide only Not Proficient, Proficient, or Advanced or that group students into several levels are hardly sufficient for the needs of selective colleges and universities. The latest batch of K-12 assessment and exit exams concentrates on “college readiness” rather than success at a particular college. Moreover, the tests are often designed with an eye toward public accountability (how well is this state or school district performing?) at the expense of actionable information on the individual student.
The article misrepresents how norm-referenced exams can be interpreted.
“By design, norm-referenced tests reproduce the same bell-curve distribution of scores from one year to the next, with only minor differences. This makes it difficult to gauge progress accurately.”
A well-equated exam with a fixed reference group does an excellent job of measuring progress (well, to the extent that any test can be free of the underlying changes in test-taking population). Atkinson and Geiser are too smart to make the innumerate assumption that distributions do not change over time or that the results of criterion-referenced tests cannot fall into bell-shaped curves as well (normal, logistic, and other “bell curve” distributions are common in testing instruments). Criterion-referenced items must usually proceed through the same review, field testing, and statistical gauntlet as norm-referenced items. The selection process has far more overlap than NRT critics admit.
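To make the fixed-reference-group point concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of how the College Board or ACT actually equate forms: the raw scores are invented, they are assumed to be already equated across test forms, and `make_scale` is a hypothetical helper that applies a simple linear scaling (center 500, spread 100).

```python
from statistics import mean, pstdev


def make_scale(norm_group_raw, center=500.0, spread=100.0):
    """Build a raw-to-scaled conversion anchored to one norm group.

    Hypothetical helper: real equating is far more involved than a
    linear transformation, so treat this as a toy model only.
    """
    mu = mean(norm_group_raw)
    sigma = pstdev(norm_group_raw)

    def to_scaled(raw):
        return center + spread * (raw - mu) / sigma

    return to_scaled


# Invented raw scores, assumed to be equated across forms already.
reference_cohort = [20, 25, 30, 35, 40]
later_cohort = [23, 28, 33, 38, 43]  # every student answers 3 more items correctly

# Scale frozen on the reference cohort: later gains show up in the scores.
fixed_scale = make_scale(reference_cohort)
fixed_mean = mean(fixed_scale(r) for r in later_cohort)

# Scale rebuilt on each new cohort: the mean is pinned at the center by construction.
renormed_scale = make_scale(later_cohort)
renormed_mean = mean(renormed_scale(r) for r in later_cohort)

print(f"fixed reference group: {fixed_mean:.1f}")
print(f"re-normed every year:  {renormed_mean:.1f}")
```

With these made-up numbers the later cohort averages roughly 542 on the frozen scale, while the re-normed scale reports 500 again. Only the second design hides progress, and that design is not a requirement of norm-referencing.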
The authors make much of the fact that the bell curve means that a few questions here or there can make a big difference. This is true only to the extent that the scores of the ACT and SAT are misinterpreted. Whether NRT or CRT, test consumers must be aware of the limits of a test and its level of reliability. CRTs have come under the same criticism, as a single question can make the difference between being above or below a cut score. In other words, a student who missed Proficiency by dint of one mistake is considered to be at the same level as a student completely ignorant of the standard being tested. This is an oversimplification, but no more so than the knocks against NRT. The underlying theory behind CRT has made great strides, but cut scores still predominate and those cut scores are not as “fixed” and as accurate as Atkinson and Geiser would have us believe.
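The cut-score problem can be sketched with a few lines of Python. Everything here is an assumption for illustration: the cut score of 300 and the standard error of measurement of 10 are invented, the function name is hypothetical, and the model (true score normally distributed around the observed score) is a deliberate simplification that ignores regression toward the mean.

```python
from statistics import NormalDist


def prob_true_score_meets_cut(observed, cut, sem):
    """Chance that a student's true score is at or above the cut.

    Simplified model: the true score is treated as Normal(observed, sem).
    """
    return 1.0 - NormalDist(mu=observed, sigma=sem).cdf(cut)


CUT_SCORE = 300  # hypothetical "Proficient" threshold
SEM = 10         # hypothetical standard error of measurement

near_miss = prob_true_score_meets_cut(299, CUT_SCORE, SEM)
far_miss = prob_true_score_meets_cut(250, CUT_SCORE, SEM)

print(f"scored 299: {near_miss:.2f}")
print(f"scored 250: {far_miss:.7f}")
```

Under these assumptions the student who scored 299 has about a 46% chance of truly being at or above the cut, while the student who scored 250 has essentially none. Both receive the same Not Proficient label.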
The authors are both products of a particularly challenging time in the history of California’s higher education system. One was the first post-Prop 209 president and the other was the administrator charged with overhauling the admission process during that period. Atkinson and Geiser naturally see the SAT through the UC lens. When race-based considerations were eliminated, the UC had to take drastic measures to ensure equal opportunities for California’s residents. Holistic review was expanded and the UC followed the lead of the University of Texas’ automatic admission program by introducing Eligibility in the Local Context, which would ultimately “[draw] qualified students from among the top 9 percent of each participating high school.” [UCOP website] And within the UC system, SAT II scores had proved to be as valuable as SAT I scores in predicting student success (critics, including the College Board, have taken issue with some of Geiser’s research).
SAT scores seemed like an obstacle, so Atkinson hoped to remove them from consideration. The solution did not work even in a UC system with a relatively strong central leadership and a Master Plan that demanded inclusion. It did not work even though the UC was the only public university system with extensive experience with an alternative to the SAT, the SAT II. It is hard to imagine, then, how such a specific solution (dropping the SAT) to a specific problem (recovering from Prop 209), one that was only partially successful (ELC was never brought to the campus level), could be applied to the diverse system of 2,500 colleges and universities with 2 to 3 million new entrants each year. Should Amherst admit all A students? Should Penn drop the SAT because Atkinson thought that it made sense for the UCs? At most institutions, holistic review has long provided an important bulwark against SAT and ACT dominance. It would be hard to name an institution where high school GPA and course rigor are not far more important than test scores.
Atkinson’s most successful tactic against the SAT may not have been attempting to outlaw it but trying to improve it. By encouraging, in 2002, the College Board to move awkwardly and unwisely in 2005 to introduce SAT Reasoning, Atkinson helped forever hobble the SAT IIs. The College Board took away Writing and added it to a test that didn’t deserve it. SAT Reasoning could never quite find its reason. Now Atkinson wants to criticize the College Board for making the essay optional (it will still be required by most selective colleges). Making it mandatory almost killed them! Is it too much to ask that colleges be given a choice?
In his 2004 paper, Atkinson was laudatory of the — as yet unnamed — SAT Reasoning test.
“The new test will be in use for students entering universities in Fall 2006. In a remarkably short time, university admissions in the US will have undergone a revolutionary change—a change that will affect millions of young people. One of the clear lessons of history is that US colleges and universities, through their admissions requirements, strongly influence what is taught in the nation’s high schools. The most important reason for changing the SAT is to send a strong message to K-12 students, their teachers, and their parents that learning to write and mastering a solid background in mathematics are of critical importance. The changes being made in the test by the College Board go a long way toward accomplishing that goal. Many high schools have already introduced intensive writing programs for students in anticipation of the new essay requirement.”
He fell just short of awarding the College Board a gold star. In the current New York Times piece, he gives the revised SAT its due before coming down hard. It is a norm-referenced interloper!
He previously extolled the virtues of the ACT:
“[ACT founder Lindquist believed that] assessment should flow from standards, not the other way round… [He] insisted that achievement tests can and should measure students’ reasoning skills, albeit those developed within the context of the curriculum. Reflecting Lindquist’s philosophy, the ACT from the beginning has been tied more closely than the SAT to high-school curricula… As the ACT grew into a national test, its content came to be based on national curriculum surveys as well as analysis of state standards for K-12 instruction…The ACT exhibits many of the characteristics that one would expect of an achievement test.”
Except, of course, it is norm-referenced.
Atkinson and Geiser reject the SAT and ACT primarily because they are norm-referenced tests. Yet there is no evidence that criterion-referenced tests will a) work across the higher education landscape and b) improve upon the flaws in the current system.
In Reflections and elsewhere, Atkinson talks almost wistfully of a prelapsarian world where the College Entrance Examination Boards were tests of academic prowess — where concerns about test prep and socioeconomic distortions did not exist. Perhaps he hasn’t read the primary sources deeply enough. The first two decades of the twentieth century were full of laments about “cramming coaches.” “Test prep” had already become a term of derision. There was strong debate about how the boards — tests of “subject mastery” — were reshaping high school curricula, how schools were teaching to the test, and how wealthy students benefited from these subject tests. It sounded, in short, remarkably similar to the debate in the first two decades of the twenty-first century.
Atkinson and Geiser believe that admission tests have a strong signaling effect on K-12 education. Interestingly, the College Board and ACT have worked to solidify the same reputation. But there is little to no evidence that this has been the case beyond anecdotal accounts of analogy drills taking over from Shakespeare in English classes. The ACT and SAT have both gone through major changes in the last 25 years. They have been more impacted by the shift to common standards than the other way around. It is questionable whether the SAT or ACT has a strong signaling effect. What is clear, though, is that the primary goal of colleges is to signal “Please attend.” As Atkinson has noted, the needs of colleges are not the same as the needs of the K-12 community.
Criterion-referenced tests mesh with our desire to measure what is valuable and to reconsider those measures when values change. As a society, we can choose to “raise standards” and accept that our students might go from a 75% pass rate to a 70% pass rate. Or we might lower standards. We might dump one standard for another. We might throw out the Common Core. We might switch testing companies from McGraw-Hill to ETS to ACT to College Board. Yet why must admission offices be forced to join in? Should college presidents ratify what legislatures decide? Will one criterion-referenced test really be able to meet the needs of students, teachers, high schools, departments of education, testing companies, and colleges across the entire spectrum? The recent case of the Biology Advanced Placement exam illustrates how dependence on CRTs would come with its own set of problems. The APs have to be periodically rethought because of changing academic standards and philosophies. The Biology AP went through a particularly painful transition two years ago, and the share of students scoring a 5 plummeted from 19.4% in 2012 to 5.4% in 2013. The changes may have been salutary, but the episode dispels the notion of fixed academic standards. Pity the institution planning on using AP Bio for admission or hoping to keep a fixed sense of how it stacked up against AP Chem. The added irony is that this criterion-referenced test went from a flat distribution of scores to one resembling a bell curve.
“Norm-referenced tests like the SAT and the ACT have contributed enormously to the ‘educational arms race’ — the ferocious competition for admission at top colleges and universities.”
Atkinson first brought up the arms race in 2001. To the extent that there is an arms race, the 60-to-80-year-old ACT and SAT seem like poor candidates for having kicked it off. The problem with Atkinson’s analogy to the nuclear arms race is that the Cold War was a battle between two superpowers. Higher education is a cooperative and competitive jumble of thousands of institutional players. The competition for admission to elite universities will not be upended through a change that most people would not even recognize as such (even CRTs have scores that can be fretted over!). To oversimplify the battle as one between CRT and NRT, or between ACT and SAT, or between the current SAT and the revised SAT, is to wish the problem away rather than to de-escalate it.