Audrey Amrein-Beardsley of Arizona State University is one
of the nation’s leading authorities on teacher evaluation. She has
the advantage of having taught middle school math for several
years. She understands better than almost any other researcher just
how flawed value-added measurement is.

Next year, her book on the
limitations of test-based accountability will be published.

I invited her to contribute to the blog so you would become familiar
with her valuable work.

She writes:

Stock Your Bunkers with VAMmunition

While “Top Ten Lists” have become a recurrent trend in
periodicals, magazines, blogs, and the like, the “Top Ten List”
presented here should satisfy the needs of this blog’s readers, and
hopefully other educators, for VAMmunition: the ammunition
practitioners need to protect themselves against the unfair
implementation and use of VAMs (i.e., value-added models).

Likewise, as “Top Ten Lists” typically serve reductionistic
purposes, in the sense that they often reduce highly complex
phenomena into easy-to-understand, easy-to-interpret, and
easy-to-use strings of information, this approach is more than
suitable here, because those who are trying to ward off the unfair
implementation and use of VAMs do not have the VAMmunition they
need to defend themselves in research-based ways.

Hopefully this list will satisfy at least some of these needs. Accordingly, I
present here the “Top Ten Bits of VAMmunition”: research-based
reasons, listed in no particular order, that all public school
educators should be able to use to defend themselves against VAMs.

1. VAM estimates should not be used to assess teacher
effectiveness. The standardized achievement tests on which VAM
estimates are based have always been, and continue to be,
developed to assess levels of student achievement, not levels of
growth in student achievement or growth in achievement that can be
attributed to teacher effectiveness. The tests on which VAM
estimates are based (among other issues) were never designed to
estimate teachers’ causal effects.

2. VAM estimates are often
unreliable. Teachers who should be (more or less) consistently
effective are being classified in sometimes highly inconsistent
ways over time. A teacher classified as “adding value” has a 25 to
50% chance of being classified as “subtracting value” the following
year(s), and vice versa. This sometimes makes the probability of a
teacher being identified as effective no different than the flip of
a coin.

3. VAM estimates are often invalid. Without adequate
reliability, as reliability is a qualifying condition for validity,
valid VAM-based interpretations are even more difficult to defend.
Likewise, very limited evidence exists that teachers who post high
or low value-added scores are similarly rated as effective or
ineffective on at least one other correlated criterion (e.g.,
teacher observational scores, teacher satisfaction surveys). The
correlations being demonstrated
across studies are not nearly high enough to support valid
interpretation or use.

4. VAM estimates can be biased. Teachers of
certain students who are almost never randomly assigned to
classrooms have more difficulties demonstrating value-added than
their comparably effective peers. Estimates for teachers who teach
inordinate proportions of English Language Learners (ELLs), special
education students, students who receive free or reduced-price
lunches, and students retained in grade are more adversely impacted
by bias. While bias can present itself in terms of reliability (e.g.,
when teachers post consistently high or low levels of value-added
over time), the illusion of consistency can sometimes be due,
rather, to teachers being consistently assigned more homogeneous
sets of students.

5. Relatedly, VAM estimates are fraught with
measurement errors that undermine their reliability and
validity, and contribute to issues of bias. These errors are caused
by inordinate amounts of inaccurate or missing data that cannot be
easily replaced or disregarded; variables that cannot be
statistically “controlled for”; differential summer learning gains
and losses and prior teachers’ residual effects that also cannot be
“controlled for”; the effects of teaching in non-traditional,
non-isolated, and non-insular classrooms; and the like.

6. VAM estimates are unfair. Issues of fairness arise when test-based
indicators and their inference-based uses impact some more than
others in consequential ways. With VAMs, only teachers of
mathematics and reading/language arts with pre- and post-test data
in certain grade levels (e.g., grades 3-8) are typically being held
accountable. Across the nation, this is leaving approximately
60-70% of teachers, including entire campuses of teachers (e.g.,
early elementary and high school teachers), as VAM-ineligible.

7. VAM estimates are non-transparent. Estimates must be made
transparent in order to be understood, so that they can ultimately
be used to “inform” change and progress in “[in]formative” ways.
However, the teachers and administrators who are to use VAM
estimates accordingly do not typically understand the VAMs or VAM
estimates being used to evaluate them, at least not well enough to
promote such change.

8. Relatedly, VAM estimates are typically of no
informative, formative, or instructional value. No research to date
suggests that VAM-use has improved teachers’ instruction or student
learning and achievement.

9. VAM estimates are being used inappropriately to make
consequential decisions. VAM estimates do not have enough
consistency, accuracy, or depth to satisfy the purposes with which
VAMs are increasingly being tasked, for example, helping to make
high-stakes decisions about whether teachers receive merit pay, are
awarded or denied tenure, or are retained or, conversely, terminated.
While proponents argue that because of VAMs’ imperfections, VAM
estimates should not be used in isolation of other indicators, the
fact of the matter is that VAMs are so imperfect they should not be
used for much of anything unless largely imperfect decisions are

10. The unintended consequences of VAM use are
continually going unrecognized, although research suggests they
continue to exist. For example, teachers are choosing not to teach
certain students, including those whom teachers deem the most
likely to hinder their potential to demonstrate value-added.
Principals are stacking classes to make sure certain teachers are
more likely to demonstrate “value-added,” or vice versa, to protect
or penalize certain teachers, respectively. Teachers are
leaving/refusing assignments to grades in which VAM-based estimates
matter most, and some teachers are leaving teaching altogether out
of discontent or in protest.

About the seriousness of these and
other unintended consequences, weighed against VAMs’ intended
consequences or the lack thereof, proponents and others simply do
not seem to give a VAM.