Measuring Up:
What Educational Testing Really Tells Us
by Daniel Koretz
Harvard University Press
368 pages
Toward the beginning of Measuring Up: What Educational Testing Really Tells Us, Daniel Koretz offers a simple principle: a test is not a synonym for what a child has learned; it is one piece of evidence, useful in some limited ways but “unavoidably incomplete and error prone.”
I could call the rest of this very good book critical apparatus or, more accurately, artful and expanding variations on this theme. The expansion is not padding, however. The book may be a bit longer than readers knowledgeable about testing and trained in statistics might like, but Koretz’s intended audience is neither specialist nor specialized. Thus the examples he supplies to define terms like “domain” (he offers a set of words for a vocabulary test) bring the idea home clearly to general readers and are worth the little bit of extra weight.
His premise is that testing — which we freight with the concept of “accountability” — is badly misunderstood despite having been around for well over a century and having stood front and center in American education since the end of the Second World War. He is unhappily aware that people want tests to give simple answers — for example, that students are learning more math than they did three years ago, or more than other eighth-graders learned three years ago. But as a psychometrician, he also knows that a test cannot tell you anything that blunt, because even the best tests do not justify such inferences, or do so so rarely that the conclusions we draw from them are neither valid nor justifiable.
He observes, again early in Measuring Up, that we do not dismiss the “arcana” of medicine and brake repair because we do not understand them, yet we routinely dismiss the technical complexities of academic testing. We may not know what our doctor and mechanic know, but we accept that they know things we do not and trust them to apply their knowledge in a reasonable and plausible way, usually by eliminating what is highly unlikely or unwarranted by the diagnosis and then moving toward the more compelling. His apt example is Car Talk on public radio, where Tom and Ray move in tightening circles around a caller’s problem and reach a reasonable, but not definitive, conclusion. If we are willing to trust our cars (and our lives) to this approach, why should we not trust our children’s education to the same sensible criteria?
The answer, of course, is largely political. Various laws and movements, most recently No Child Left Behind, have commanded accountability, which is interpreted to mean that children are learning more. If they are not, then off with the teachers’ heads — and for good measure the principals’ and the superintendents’, too. But to understand a test, one must understand measurement error, reliability, and validity, among many other arcane ideas, as terms of art, and must also understand how slippery they can be and how easy it is to construct a bad test or to misinterpret a good one.
Consider the word “proficient.” It appears in analyses of test scores as the basic acceptable level of — forgive the jargon — learning outcomes. Yet Koretz illustrates how aleatory, or simply out-of-left-field, the definition of “proficient” may be on any given test. Perhaps more serious is his observation that, all things being equal, no test can measure all goals or tell you why a student has learned more — or appears to be learning more.
Tests cannot necessarily give either of these pieces of information. A test may indicate that a student has improved his mastery of algebra since the last test — but what do you do if his classroom performance has not improved at all? And even when a test and classroom performance both show improvement, that does not mean the teachers are doing a better job; perhaps the student is getting professional or parental tutoring. This is hardly splitting hairs. If a school system is rewarded or punished because of students’ scores on high-stakes tests, it is not particularly logical to conclude that it is teaching, and only teaching, that has made the difference. Teaching may have contributed — and to that extent the teachers and the schools are entitled to some reward, e.g., they get to keep their jobs. But outside factors — social, familial, intellectual — may play a role, and testing cannot reveal them, let alone evaluate how important they have been or even how they have worked.
What Koretz reveals, it seems to me, is a great deal about the assumptions concerning what tests can tell us and a great deal of arbitrariness in their use and interpretation. If we — the public — decide to place all our bets on a test, we are clearly making a false assumption. If we place bets on the meaning of proficiency, or on the difference between 600 and 700 on an SAT, without knowing what percentile difference that gap represents or how many standard deviations it spans, we really learn nothing of any use to us.
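To make that point concrete, here is a rough illustration of my own, not Koretz’s. It assumes the classic SAT section scale (mean 500, standard deviation about 100, scores roughly normal), values adopted for the sketch rather than anything the review or the book asserts. The short Python sketch below shows that the same 100-point gap corresponds to very different percentile jumps depending on where it falls on the curve:

    from math import erf, sqrt

    # Percentile of a score under a normal model: the CDF of N(mean, sd),
    # expressed as a percentage. Mean 500 / SD 100 are assumed, illustrative values.
    def percentile(score, mean=500.0, sd=100.0):
        return 100 * 0.5 * (1 + erf((score - mean) / (sd * sqrt(2))))

    for low, high in [(500, 600), (600, 700)]:
        gap_in_sd = (high - low) / 100.0  # same raw gap, same number of SDs
        print(f"{low} -> {high}: {gap_in_sd:.1f} SD, "
              f"percentile {percentile(low):.0f} -> {percentile(high):.0f}")

Under those assumptions, moving from 500 to 600 lifts a student from roughly the 50th to the 84th percentile, while moving from 600 to 700 lifts a student from roughly the 84th to the 98th: an identical raw gap, yet a very different change in standing, which is exactly why a scale-score difference tells us little on its own.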
Koretz is plain in stating that he is not against testing and not against educational accountability. One of his best variations is “Don’t treat any single test as providing the ‘right,’ authoritative answer. Ever.” Always, he cautions, try to find other sources of information. If this is not possible, consider the test to be “a snapshot of performance, necessarily incomplete and probably modestly different from what you would obtain if you had another, also reasonable measure.”
This suggests that Koretz sees value in testing and in what testing leads to in the United States, which, as I understand it, is higher education. In this sense, his book is a useful brick to throw at The Bell Curve by Richard Herrnstein and Charles Murray, who maintained that few children were worth educating beyond high school. Koretz may be too sanguine about how valuable long-term education is for all, but his implied case is strong.
I wish his conclusion about sensible uses of testing were somehow firmer and directed more squarely at those who make educational policy. He switches among addressing the general public, policy-makers, and academics. Asking for the answer, naturally, would be absurd, but something a little meatier might better guide those in the political-educational establishment.
And a last melancholy thought. The ideas are so well written, well intended, and well explained that I fear they will not seem — paradoxically — arcane enough, “scientific” enough to be widely accepted. Moreover, to accept Koretz’s good ideas would mean rejecting a couple of generations of half-notions, received knowledge, and conventional wisdom. But — who knows? — perhaps Measuring Up is tougher than those formidable, retrograde foes.
Stephen Joel Trachtenberg is chairman of the higher education practice at Korn/Ferry International and President Emeritus and University Professor of Public Service at The George Washington University.