Value Added, Value Lost?

Value-added testing is enjoying increased popularity. But will this new approach help children learn better?

By Gerald W. Bracey

Educators often have difficulty specifying what is “good.” They have much less trouble with the concept of “better.” To get “better,” they assess the current state of affairs, take that as a baseline and try to improve on it. This likely explains the great interest in the “value-added” model of teacher effectiveness, constructed by William Sanders at the University of Tennessee, which has been in place in that state since 1992.

Sanders claims to have developed a technique for identifying those teachers who make kids “better” – they add value by raising children’s test scores. Some of the results are impressive: Children who have three consecutive years of what Sanders calls effective teachers have sharply rising test scores; students stuck with three years of ineffective teachers have plummeting test scores.

Behind this apparently simple and precise outcome, though, are difficulties and uncertainties. To begin with, since teachers are defined as effective on the basis of their ability to produce test-score changes, it should not surprise us that children who have a sequence of such “effective” teachers would have rising test scores. It’s circular.

More importantly, the entire model depends on accepting multiple-choice tests as adequate and appropriate measures of educational outcomes, and on accepting changes in test scores as the sole indicators of effectiveness. A much more appropriate label for teachers who change test scores would be “test-effective.”

In practice, Sanders works only with norm-referenced standardized tests because they produce the extended scale his analysis needs, namely percentile ranks that run from 1 to 99. A scoring system like the 5-point Advanced Placement test scale or many states’ scales for writing assessment would be too coarse, though in theory performance tests could yield percentile ranks.

Sanders and his colleagues have argued that multiple-choice questions can measure higher-order thinking and complex knowledge. They can, but only in rare settings. Usually they don’t, and the norm-referenced commercial tests used by Tennessee (the California Achievement Test) and under consideration elsewhere definitely do not assess higher-order thinking.

Not only does the model rest on multiple-choice questions, it requires that every child be tested every year in every subject. It is thus a budget drain on most school systems, but a boon for test makers.

What is badly lacking from the Sanders model is any research on what the test-effective teachers actually do to improve test scores. For the kinds of skills tested in the elementary years, the teachers might well be relying on “drill and kill” work sheet activities. Such activities will indeed raise test scores, but few educators would call their use good pedagogy. Indeed, as summarized by psychometrician Robert Linn, the knowledge shown by gains in test scores does not generalize. That is, the skills are specific to the test. They do not transfer and do not represent increases in general achievement.

NO INDEPENDENT EVIDENCE

Also missing from the model is any independent evidence that the test-effective teachers are perceived as generally effective by parents, administrators and other teachers. Everything hangs on test scores.

Where such tests are important in accountability schemes, teaching to the test will be even more prevalent. Important learning that is not and often cannot be measured with multiple-choice tests will not be counted in the determination of value gained. For schools that focus on more expansive and richer areas of learning, “value added” cannot represent what the school is trying to accomplish. In short, while ostensibly a means to assess progress in learning, “value added” reinforces the most narrowing aspect of testing – thereby reducing, not increasing, real value.

There are also possible problems with the model itself. We must say “possible” because Sanders has refused to tell anyone how it works. Indeed, he has contracted with a private firm to provide his analysis to school systems for a fee. This has greatly angered assessment experts who would like to know how the model works in order to improve on it, debunk it, or simply explore its possibilities and limits, as is customary in the open world of research.

For instance, Sanders claims that because his model rests on prior test scores, it removes the impact of socio-economic status. That is, because it calculates changes in scores from, say, grade three to grade four, he argues it is measuring changes independent of where the child started. This is debatable – some would argue that the effects of socio-economic status are ongoing. Without access to the model, though, no debate is possible.
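To make the disputed calculation concrete, here is a minimal sketch of the naive gain-score idea described above. It is an illustration only, not Sanders’s actual model, which has never been published; the student records, percentile ranks and teacher names below are invented for the example.

```python
# A hypothetical sketch of a naive "value-added" gain-score calculation.
# NOT Sanders's proprietary model; all data here is invented.

from statistics import mean

# Hypothetical percentile ranks for the same students in grades 3 and 4,
# with each student's grade 4 teacher recorded.
records = [
    # (student, grade3_percentile, grade4_percentile, grade4_teacher)
    ("s1", 40, 48, "Teacher A"),
    ("s2", 55, 60, "Teacher A"),
    ("s3", 62, 59, "Teacher B"),
    ("s4", 35, 33, "Teacher B"),
]

# Naive "value added": the average change in percentile rank
# among the students each teacher taught.
gains_by_teacher = {}
for student, g3, g4, teacher in records:
    gains_by_teacher.setdefault(teacher, []).append(g4 - g3)

for teacher, gains in gains_by_teacher.items():
    print(f"{teacher}: mean gain = {mean(gains):+.1f} percentile points")
```

Even in this toy version, the objection raised above is visible: the difference between the two years’ scores accounts for where a child started, but not for influences, such as socio-economic status, that continue to operate during the year being measured.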

Suspicions about the utility of this particular value-added model are heightened by reports that crucial teacher-quality statistics are unstable: a teacher who is very effective one year might not be the next. This raises fundamental and vexing questions about the model’s accuracy. Again, in the absence of open and scholarly debate, these questions cannot be addressed.

Finally, there are educational implications of the process that go beyond Sanders’ specific model. As former U.S. Commissioner of Education Harold Howe put it in a letter written to the Washington Post, “In my view, what is really happening in our schools is that the worship of accountability so dominates every aspect of learning that it is narrowly defined into what can be measured conveniently by standardized tests. Sanders appears to be giving it an effective but tragic boost.”

Gerald Bracey is an educational consultant and writer based in Alexandria, Va. The above is reprinted with permission from the August issue of the FairTest Examiner.