Sometimes the preconceptions we bring to a testing term can cause confusion. For example, most students are familiar with scaling from tests in school. Some teachers like to challenge students with really hard tests, to the point where much of a class may get only 40–60% of the questions correct. Teachers, not wanting a revolt in the classroom, typically scale (or “curve”) these results. In some cases, they may assign a fixed number of As, Bs, Cs, etc. Scaling can be thought of as mapping one set of values to another. In this case, a C might be 40–45% correct; a B might be 46–60%. When this mapping is plotted out, it typically forms a “curve.”
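To make the idea of scaling as a mapping concrete, here is a minimal sketch in Python using the hypothetical cutoffs above. The function name and the A cutoff are assumptions for illustration, not any real teacher’s or test maker’s scale.

```python
# A minimal sketch of scaling: mapping raw performance to a reported grade.
# Cutoffs follow the hypothetical example above; the A threshold is assumed.
def curve(percent_correct: float) -> str:
    if percent_correct >= 61:
        return "A"   # assumed cutoff for an A
    if percent_correct >= 46:
        return "B"   # 46-60% correct maps to a B
    if percent_correct >= 40:
        return "C"   # 40-45% correct maps to a C
    return "D"

print(curve(52))  # -> "B"
```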
This works well in many cases, but what if you are in a French class that just admitted a group of exchange students? If they dominate the top end of the curve, even a good score in this class may be assigned a B or a C. Fortunately, that’s not how scaling is done on the SAT and ACT.
A key point is whether the scale is based on the results of the current group of testers, as in the French class above (what test makers call “within-group norms”), or on a prior reference group. The latter technique can be practiced by teachers as well as standardized test makers. Perhaps our French teacher has been teaching the class for 20 years and has a highly tuned sense of what level of performance deserves an A. Rather than assign a fixed number of each grade, the teacher determines her scale in advance.
This way, a newly arrived set of transfer students from Paris won’t “break the curve.” These students will presumably receive As, but it will be no harder or easier for the rest of the class to also receive As. The scale has been determined in advance.
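As a small illustration of the difference, the sketch below contrasts the two approaches. The scores, quota, and cutoff are made up, and the function names are hypothetical.

```python
# Within-group grading: a fixed number of top scorers get an A,
# so strong newcomers can push everyone else down.
def within_group_grades(scores, a_quota=2):
    ranked = sorted(scores, reverse=True)
    a_cutoff = ranked[a_quota - 1]
    return ["A" if s >= a_cutoff else "B" for s in scores]

# Pre-set scale: the cutoff was fixed in advance, so one student's
# grade does not depend on who else happens to be in the class.
def preset_scale_grades(scores, a_cutoff=85):
    return ["A" if s >= a_cutoff else "B" for s in scores]

klass = [88, 72, 70]
with_exchange_students = klass + [97, 95]   # strong newcomers arrive

print(within_group_grades(klass))                   # ['A', 'A', 'B']
print(within_group_grades(with_exchange_students))  # ['B', 'B', 'B', 'A', 'A']
print(preset_scale_grades(klass))                   # ['A', 'B', 'B']
print(preset_scale_grades(with_exchange_students))  # ['A', 'B', 'B', 'A', 'A']
```

Under within-group norms, the student with an 88 loses the A the moment two stronger students join; under the pre-set scale, that grade is unchanged.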
In fact, standardized test makers have to be even more precise than the French teacher. They follow strict guidelines when setting their initial reference group and determining the initial scale. Once those things are set, they rarely change, because they don’t need to: a 30 on ACT English means the same thing whether the test was taken in September 2008 or September 2015. To accomplish this feat, one additional concept is needed: equating. Not every test can have the same questions (wouldn’t that make prep easy?), so not every test form can have exactly the same difficulty. However, by always mapping performance back to the reference group, ACT can make small adjustments to the scale that smooth away these differences. The math is tricky, but the goals are simple: make the results of each test date as fair as any other test date, and make sure that no student is disadvantaged by the abilities of other students taking the exam.
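A rough sketch of the idea behind equating appears below. The numbers are invented for illustration and are not ACT’s actual conversion tables; the point is only that each form gets its own raw-to-scaled mapping, tuned so the scaled score means the same thing on every form.

```python
# Hypothetical raw-to-scaled conversion tables for two test forms of
# slightly different difficulty. The easier form requires more raw points
# for the same scaled score; both map back to the same reference scale.
FORM_A = {70: 30, 69: 29, 68: 28}   # slightly easier form
FORM_B = {69: 30, 68: 29, 67: 28}   # slightly harder form

def scaled_score(form_table, raw_score):
    return form_table[raw_score]

# Comparable performances on forms of different difficulty land on the
# same scaled score, so no test date is harder or easier to score well on.
print(scaled_score(FORM_A, 70))  # 30
print(scaled_score(FORM_B, 69))  # 30
```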