I heard that some test dates are easier than others – is that true?
A) Sorry, no.
B) Well, technically yes, but it doesn’t matter.
C) For the most complete answer, read on . . .
This issue generates more confusion than any other testing topic. The concept is unintuitive, so it makes sense that students, parents, and counselors are confused.
The problem comes when testing “experts” provide bad advice backed by incorrect explanations like the ones we so often have to repair. In fact, most damage is caused by those who know just enough to be dangerous: big brains who coach the test but who lack a background in standardized test construction. Their mistaken advice comes in various forms:
“Avoid this date; you’re competing with motivated seniors then.”
“Take this date; not many smart students show up then.”
The root of the confusion is a misunderstanding of how tests are scored, specifically the notion that the pool of test takers on a given date affects the curve. Curves do not pertain to specific test dates but rather to an entire cohort of test takers in a calendar year. The SAT is not curved in the colloquial sense of grading a limited group of students against one another. Instead, it is normed against a reference group – most recently, the class of 1990. Whether a student takes the test in January, June, or October, on a Saturday, a Sunday, or a special administration, his or her performance is compared to this reference group. This statistical time travel is accomplished through anchor items: the SAT uses its unscored section to include previously administered questions. With enough of these anchor items, a current test taker is placed on the same scale as someone who took the test in the past. This makes the scaling independent of the mix of students taking the exam.
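For the technically curious, the flavor of this linking can be sketched in a few lines of code. The College Board’s actual equating machinery is far more elaborate (and proprietary), so treat the following as a toy illustration of one textbook method, mean-sigma linking, with invented numbers: the anchor items are calibrated both in the reference group and in the current administration, and comparing the two sets of difficulty estimates yields the transformation that places today’s test takers on the old scale.

```python
import statistics

def mean_sigma_link(b_anchor_ref, b_anchor_new):
    """Mean-sigma linking: return (A, B) so that A * x + B maps the
    current provisional scale onto the reference scale, for item
    difficulties and examinee abilities alike."""
    A = statistics.pstdev(b_anchor_ref) / statistics.pstdev(b_anchor_new)
    B = statistics.mean(b_anchor_ref) - A * statistics.mean(b_anchor_new)
    return A, B

# Hypothetical difficulty estimates for the same five anchor items,
# once from the reference-group calibration and once from this year's.
b_ref = [-1.2, -0.4, 0.1, 0.8, 1.5]   # reference-scale difficulties
b_new = [-1.0, -0.2, 0.3, 1.1, 1.9]   # same items, current provisional scale

A, B = mean_sigma_link(b_ref, b_new)

theta_current = 0.9                    # an examinee's ability, provisional scale
theta_reference = A * theta_current + B
print(f"A = {A:.3f}, B = {B:.3f}, ability on reference scale = {theta_reference:.3f}")
```

Because the transformation is pinned to items the reference group already answered, it does not matter who else shows up on a given Saturday.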
The concept is complex but the explanation and decision-making can be kept simple: never select a test date based on your assessment of the testing pool on that day. Students don’t influence one another; there is no comparative advantage or disadvantage. Now, if an SAT boycott left only the best students showing up for a given SAT, then it would theoretically be possible for every student to score above 2100. In practice, cohorts change little from year to year, which is why the average SAT score is relatively stable.
Common sense is helpful: if there were really advantages to taking the test on certain dates, wouldn’t colleges account for this? Have you heard of a college giving more or less credit for taking the test on a particular date?
Additional confusion stems from different tests having different raw-to-scaled conversions. Of course they do. That has everything to do with the content and nothing to do with the testing pool. Tests can be made similar, but they cannot be made identical. The equating process is designed so that fluctuations in a form’s difficulty or in raw performance can be equated back through time, via known tests, to the reference group. As the College Board explains:
“Raw scores are placed on the College Board scale of 200 to 800 through a process that adjusts scores to account for minor differences in difficulty among different versions of the test. This process, known as equating, is performed so that a student’s reported score is not affected by the version of the test taken or by the abilities of the group with whom the student takes the test . . . scores earned by students at different times can be compared.”
There is no way to predict these minor differences by month, nor is there any point in worrying about them. Scales adjust for small variances in test difficulty, not for differences in the testing population from test to test.
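Concretely, equating shows up in the published raw-to-scaled conversion tables. The tables below are invented for illustration, not real SAT conversions: the slightly harder form gives back the same scaled score at a lower raw score, so equal performance earns an equal reported score on either form.

```python
# Hypothetical raw-to-scaled conversion tables for two forms of slightly
# different difficulty (invented numbers, not real SAT tables).
easier_form = {54: 800, 53: 760, 52: 740}
harder_form = {54: 800, 53: 780, 52: 760}

raw = 53  # one wrong answer on a 54-question section
print("Easier form:", easier_form[raw])  # 760
print("Harder form:", harder_form[raw])  # 780: the miss costs less on the harder form
```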
Skeptics point to seemingly odd scoring behavior at the tails of the curve, but it’s a mistake to limit analysis to the extreme ends. A testing instrument is most accurate in the middle. The lowest-performing students get almost everything wrong, the highest-performing students get almost everything right, and so it is harder to make distinctions at the ends. That’s why we tend to see minor anomalies in the scale at the extremes. On one math test, one wrong answer might earn a 760; on another, it might earn a 780. This has nothing to do with the ability of the students and everything to do with the test itself.
This is also why we shouldn’t make fine distinctions between similar test scores. Compare two students: one with a true score (the score she would converge on after taking the test 1,000 times) of 760 and another with a true score of 780. Now think about how hard it would be to write the perfect question that the first student gets wrong every time and the second student gets right every time. Yet that is the goal of test construction. The SAT that the folks at ETS dream about would have a perfectly linear scale from 200 to 800. That’s not realistically achievable. Nor is it perfectly predictable, so colleges shouldn’t infer different outcomes from scores that close together.
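To see how little daylight there is between those two students, consider a small simulation. Assuming, purely for illustration, that each sitting adds normally distributed noise with a standard error of measurement of about 30 points (a figure in the ballpark commonly cited for an SAT section), the students swap rank order with surprising frequency:

```python
import random

random.seed(1)
SEM = 30  # assumed standard error of measurement per sitting (illustrative)

def observed(true_score):
    """One simulated sitting: true score plus measurement noise,
    rounded to the 10-point reporting grid and capped at 200-800."""
    score = random.gauss(true_score, SEM)
    return min(800, max(200, round(score / 10) * 10))

trials = 100_000
reversals = sum(observed(760) > observed(780) for _ in range(trials))
print(f"True-760 student outscores true-780 student in "
      f"{100 * reversals / trials:.0f}% of sittings")
```

Under these assumptions, the lower-true-score student wins the head-to-head more than a quarter of the time, which is exactly why a 760 and a 780 should be read as the same result.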
The only reason for a major shift in scoring would be if students collectively and suddenly got much worse or better on the SAT (or gradually but significantly changed, which would call for another recentering). With the law of large numbers at work (1.6 million test takers per year), we see no large changes in percentiles. Scaling ensures that it is no easier or harder to achieve a given score on any particular date. When we see an individual student improve, it’s for reasons that pertain to the student: preparation, maturity, experience, motivation, confidence, and cognitive stamina. Changes in the student, not changes in the testing pool.
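That stability falls straight out of the arithmetic. Assuming, for illustration only, a standard deviation of roughly 200 scaled-score points across test takers, the standard error of a cohort’s mean score shrinks with the square root of the cohort size:

```python
import math

SIGMA = 200  # assumed standard deviation of individual scores (illustrative)

for n in (100, 10_000, 1_600_000):
    se = SIGMA / math.sqrt(n)  # standard error of the cohort mean
    print(f"N = {n:>9,}: cohort mean wobbles by about +/-{se:.2f} points")
```

At 1.6 million test takers, the cohort average is pinned down to a fraction of a point, so percentiles barely move from year to year unless the population itself genuinely changes.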
The College Board reports percentiles based on class years, not individual test dates. Colleges could theoretically calculate percentiles by test date within their own applicant pools, but the whole point of the SAT’s scoring scheme is to spare colleges from having to worry about this. Technically, the College Board uses the score from the final time the student sat for the test, which is spring of 11th grade for approximately one-third of students. Again, it’s not relevant. Colleges use scaled scores to compare applicants within a class year. Colleges and the College Board review percentiles from year to year for other reasons.
It’s not light fare, but there are volumes of published research on the SAT and how it is scaled. For true test preparation experts, this is required reading. While it is possible to tutor the SAT without a psychometric background, most advisors should stick to content review and test-taking skills and avoid giving bigger-picture advice.