A few weeks ago, I posted a video of David Berliner’s speech in Australia, in which he explained why teachers and teacher education programs should not be evaluated by standardized test scores. This, as you know, is the policy that was the centerpiece of the failed Race to the Top. Its main effect has been to create teacher shortages; many experienced teachers have left the profession, and enrollments in teacher education programs have sharply declined since the introduction of “value-added modeling” (VAM).
Audrey Amrein-Beardsley has done all of us a favor by transcribing Berliner’s speech. You can find it here.
Here are a few (not all) of his reasons:
“When using standardized achievement tests as the basis for inferences about the quality of teachers, and the institutions from which they came, it is easy to confuse the effects of sociological variables on standardized test scores” and the effects teachers have on those same scores. Sociological variables (e.g., chronic absenteeism) continue to distort even the best attempts by others to disentangle them from the very instructional variables of interest. These variables, which we also term biasing variables, should not be inappropriately dismissed as purportedly statistically “controlled for.”
In law, we do not hold people accountable for the actions of others; for example, when a child kills another child, the parents are not charged as guilty. Hence, “[t]he logic of holding [teachers and] schools of education responsible for student achievement does not fit into our system of law or into the moral code subscribed to by most western nations.” Relatedly, should medical schools or doctors, for that matter, be held accountable for the health of their patients? One of the best parts of his talk, in fact, is about the medical field and the parallels Berliner draws between doctors and medical schools, and teachers and colleges of education, respectively (around the 19-25 minute mark of his video presentation).
Professionals are often held harmless for their lower success rates with clients who have observable difficulties in meeting the demands and the expectations of the professionals who attend to them. In medicine again, for example, when working with impoverished patients, “[t]here is precedent for holding [doctors] harmless for their lowest success rates with clients who have observable difficulties in meeting the demands and expectations of the [doctors] who attend to them, but the dispensation we offer to physicians is not offered to teachers.”
There are other quite acceptable sources of data, besides tests, for judging the efficacy of teachers and teacher education programs. “People accept the fact that treatment and medicine may not result in the cure of a disease. Practicing good medicine is the goal, whether or not the patient gets better or lives. It is equally true that competent teaching can occur independent of student learning or of the achievement test scores that serve as proxies for said learning. A teacher can literally “save lives” and not move the metrics used to measure teacher effectiveness.
Reliance on standardized achievement test scores as the source of data about teacher quality will inevitably promote confusion between “successful” instruction and “good” instruction. “Successful” instruction gets test scores up. “Good” instruction leaves lasting impressions, fosters further interest by the students, makes them feel competent in the area, etc. Good instruction is hard to measure, but remains the goal of our finest teachers.
Relatedly, teachers affect individual students greatly, but affect standardized achievement test scores very little. All of us can think of how our own teachers impacted our lives in ways that cannot be captured on a standardized achievement test. Standardized achievement test scores are much more related to home, neighborhood and cohort than they are to teachers’ instructional capabilities. In more contemporary terms, this is also due to the fact that large-scale standardized tests have (still) never been validated to measure student growth over time, nor have they been validated to attribute that growth to teachers. “Teachers have huge effects, it’s just that the tests are not sensitive to them.”

Berliner’s speech should be sent to all the members of Congress.
“In more contemporary terms, this is also due to the fact that large-scale standardized tests have (still) never been validated to measure student growth over time, nor have they been validated to attribute that growth to teachers. ‘Teachers have huge effects, it’s just that the tests are not sensitive to them.’”
Nor is the concept of “growth” associated with the ups and downs of test scores a meaningful concept. Period.
A test score does not “grow.” It is a number. It can be higher or lower than another, but kids do not shrink or stop growing if they get a lower test score on one test than they made on another. The kids are not flatlining in “growth” if some formula has aggregated a zillion scores and determined a particular student has “not met expectations.”
Same for teachers. They do not shrink or gain in height, lose or gain weight, or get older or younger according to the rise or fall of the test scores of their students, or the scores on those teacher observation forms or those student survey forms, not one of which is valid across all subjects, every grade, every time of day, every season of a year.
Two test scores from the same subject, usually in the same year or over several consecutive years, are the basis for so-called measures of growth. Forget that the content is not the same on each test; never mind that the tests are not designed and cannot be made to function like a ruler (an interval scale).
Remove the term “growth” permanently from every discussion of test scores.
Restore the meanings of growth in connection with multiple aspects of human development, not just some degree of agility in producing test scores.
Economists are particularly bad when it comes to the bastardization of terminology.
They regularly co-opt words from other disciplines (e.g., biology, math and physics) and use them in totally new (often meaningless) ways.
That’s one of the primary reasons that “debating” with these folks is a big mistake: the words mean whatever they want them to mean.
We should never accept overreliance on blind statistics. Numbers can be misused and abused to suit a purpose or a bias. In the case of VAM, they have the potential to ruin careers or affect a teacher’s income. Statistics can be twisted and cherry-picked, as Jersey Jazzman has pointed out in his many blog posts that analyze the false assertions of reformers. As for medicine, do we really want to penalize a hospital like Columbia Presbyterian because it may have higher mortality rates than other institutions for certain procedures, when its doctors take on tough cases that are rejected by other hospitals? In the same way, do we applaud charters that only accept the easiest and cheapest to teach while we punish public schools with the highest levels of poverty? Education is not just about numbers that can be manipulated to serve an agenda.
I listened to this again today and took notes. In his 14 points he has two distinct concepts: (a) vertically equated tests and (b) instructional sensitivity.
In previous comments I may have blurred these ….
David Berliner said that Cronbach had pounded validity into his head.
Even the test makers are saying they don’t have the appropriate information to show tests are reliable and valid, as in this statement: “Although it is possible to create some growth models without vertically scaled tests, there is disagreement among researchers on the accuracy and technicalities on how to do so (CCSSO 2007).” See more at: http://www.centerforpubliceducation.org/Main-Menu/Policies/Measuring-student-growth-At-a-glance/Measuring-student-growth-A-guide-to-informed-decision-making.html#sthash.Q2SW830o.dpuf
———————-
These were two of Berliner’s points. For more information on instructional sensitivity, David Berliner cites Popham (if you want a reference on Popham’s definition, I will be pleased to supply it).
Of course, he has more than TWO; I was just counting out the two that I may have blurred in my mind (in the past, before listening again to David Berliner today).
There were excellent references in the Berliner video; one especially important is Edward Haertel, “Reliability and Validity of Inferences About Teachers Based on Student Test Scores,” Angoff Memorial Lecture, Washington, DC, March 2013 (available through ETS).