Archives for category: Teacher Evaluations

Bruce Baker has watched the evolution of the effort to create that magical metric that will identify the best and worst teachers so they may be evaluated, rewarded, warned, and/or fired. He concludes that the great “value-added and growth score train wreck is here.”

Despite the billions that Arne Duncan has poured into these evaluation systems, and despite the hundreds of millions that Bill Gates has targeted at a few selected districts, they remain shockingly unreliable. Baker writes:

A really, really, important point to realize is that the models that are actually being developed, estimated and potentially used by states and local public school districts for such purposes as determining which teachers get tenure, or determining teacher bonuses or salaries, who gets fired… or even which teacher preparation institutions get to keep their accreditation?…. those models increasingly appear to be complete junk! 

He analyzes the research and experience of several districts and states.

Did it occur to anyone that none of the high-performing school systems in the world are doing this disservice to their teachers?

If we continue to use junk science to rate teachers, who will want to teach?

State Commissioner John King of New York has created an educator evaluation system that is untested, one of those planes built in mid-air by uncertified mechanics and unlicensed engineers. If the plane crashes, too bad.

But not every teacher will be subject to King’s burden of paperwork and test-based evaluation! If you happen to work at the boot-camp, no-excuses charter chain that King founded, known as Uncommon Schools, you are exempt.

More proof that charter schools are not public schools!

A comment from a reader in Florida:

“I went to an Orange County Florida school board meeting and afterwards told a couple members “I am the world expert on the VAM -potentially” I smiled. They had not even looked at the equation. No one in that room or probably in the administrative building for 180,000 students had even looked at this. What had they almost passed the last meeting? Flunking, then firing all the new teachers in a high poverty school. I told the school lawyer. The attitude? They may care but they have been hoodwinked. They have all bought into voodoo school management. Who would ever want to say about the VAM: “My boss is an algorithm.”

Audrey Amrein-Beardsley of Arizona State University is one
of the nation’s leading authorities on teacher evaluation. She has
the advantage of having taught middle school math for several
years. She understands better than almost any other researcher just
how flawed value-added measurement is.

Next year, her book on the
limitations of test-based accountability will be published.

I invited her to contribute to the blog so you would become familiar
with her valuable work.

She writes:

Stock Your Bunkers with VAMmunition

While “Top Ten Lists” have become a recurrent trend in
periodicals, magazines, blogs, and the like, one “Top Ten List,”
presented here, should satisfy readers of this blog and, hopefully,
other educators’ needs for VAMmunition, or rather, the ammunition
practitioners need to protect themselves against the unfair
implementation and use of VAMs (i.e., value-added models).

Likewise, as “Top Ten Lists” typically serve reductionistic
purposes, in the sense that they often reduce highly complex
phenomena into easy-to-understand, easy-to-interpret, and
easy-to-use strings of information, this approach is more than
suitable here, given that those who are trying to ward off the unfair
implementation and use of VAMs often do not have the VAMmunition they
need to defend themselves in research-based ways.

Hopefully this list will satisfy at least some of these needs. Accordingly, I
present here the “Top Ten Bits of VAMmunition”: research-based
reasons, listed in no particular order, that all public school
educators should be able to use to defend themselves against VAMs.

1. VAM estimates should not be used to assess teacher
effectiveness. The standardized achievement tests on which VAM
estimates are based have always been, and continue to be,
developed to assess levels of student achievement, not growth
in student achievement, nor growth in achievement that can be
attributed to teacher effectiveness. The tests on which VAM
estimates are based (among other issues) were never designed to
estimate teachers’ causal effects.

2. VAM estimates are often
unreliable. Teachers who should be (more or less) consistently
effective are being classified in sometimes highly inconsistent
ways over time. A teacher classified as “adding value” has a 25 to
50% chance of being classified as “subtracting value” the following
year(s), and vice versa. This sometimes makes the probability of a
teacher being identified as effective no different than the flip of
a coin.

3. VAM estimates are often invalid. Without adequate
reliability, as reliability is a qualifying condition for validity,
valid VAM-based interpretations are even more difficult to defend.
Likewise, very limited evidence exists to show that teachers who
post high or low value-added scores are correspondingly effective
on at least one other correlated criterion (e.g., teacher observational
scores, teacher satisfaction surveys). The correlations being demonstrated
across studies are not nearly high enough to support valid
interpretation or use.

4. VAM estimates can be biased. Teachers of
certain students who are almost never randomly assigned to
classrooms have more difficulties demonstrating value-added than
their comparably effective peers. Estimates for teachers who teach
inordinate proportions of English Language Learners (ELLs), special
education students, students who receive free or reduced lunches,
and students retained in grade, are more adversely impacted by
bias. While bias can present itself in terms of reliability (e.g.,
when teachers post consistently high or low levels of value-added
over time), the illusion of consistency can sometimes be due,
rather, to teachers being consistently assigned more homogenous
sets of students.

5. Related, VAM estimates are fraught with
measurement errors that negate their levels of reliability and
validity, and contribute to issues of bias. These errors are caused
by inordinate amounts of inaccurate or missing data that cannot be
easily replaced or disregarded; variables that cannot be
statistically “controlled for;” differential summer learning gains
and losses and prior teachers’ residual effects that also cannot be
“controlled for;” the effects of teaching in non-traditional,
non-isolated, and non-insular classrooms; and the like.

6. VAM estimates are unfair. Issues of fairness arise when test-based
indicators and their inference-based uses impact some more than
others in consequential ways. With VAMs, only teachers of
mathematics and reading/language arts with pre- and post-test data
in certain grade levels (e.g., grades 3-8) are typically being held
accountable. Across the nation, this is leaving approximately
60-70% of teachers, including entire campuses of teachers (e.g.,
early elementary and high school teachers), as VAM-ineligible.

7. VAM estimates are non-transparent. Estimates must be made
transparent in order to be understood, so that they can ultimately
be used to “inform” change and progress in “[in]formative” ways.
However, the teachers and administrators who are to use VAM
estimates accordingly do not typically understand the VAMs or VAM
estimates being used to evaluate them, particularly enough so to
promote such change.

8. Related, VAM estimates are typically of no
informative, formative, or instructional value. No research to date
suggests that VAM-use has improved teachers’ instruction or student
learning and achievement.

9. VAM estimates are being used inappropriately to make consequential decisions. VAM estimates do not have enough consistency, accuracy, or depth to support the
high-stakes decisions VAMs are increasingly being tasked with
informing, for example, whether teachers receive merit pay, are
awarded or denied tenure, or are retained or terminated.
While proponents argue that, because of VAMs’ imperfections, VAM
estimates should not be used in isolation from other indicators, the
fact of the matter is that VAMs are so imperfect they should not be
used for much of anything unless largely imperfect decisions are
desired.

10. The unintended consequences of VAM use are
continuously going unrecognized, although research suggests they
continue to exist. For example, teachers are choosing not to teach
certain students, including those whom teachers deem most
likely to hinder their potential to demonstrate value-added.
Principals are stacking classes to make sure certain teachers are
more likely to demonstrate “value-added,” or vice versa, to protect
or penalize certain teachers, respectively. Teachers are
leaving/refusing assignments to grades in which VAM-based estimates
matter most, and some teachers are leaving teaching altogether out
of discontent or in protest.

About the seriousness of these and
other unintended consequences, weighed against VAMs’ intended
consequences or the lack thereof, proponents and others simply do
not seem to give a VAM.
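The instability described in item 2 of the list above is easy to see in a small simulation. The sketch below is purely illustrative: it assumes a teacher has a stable true effect, each year's VAM estimate adds noise of roughly comparable size, and "adding value" simply means an above-average estimate. The specific numbers are assumptions, not any state's actual model:

```python
import random

random.seed(0)
N = 10_000              # simulated teachers
TRUE_SD = 1.0           # spread of stable teacher effects (assumed)
NOISE_SD = 1.5          # year-to-year estimation noise (assumed)

positives = flips = 0
for _ in range(N):
    true_effect = random.gauss(0, TRUE_SD)
    # Each year's VAM estimate is the true effect plus independent noise.
    year1 = true_effect + random.gauss(0, NOISE_SD)
    year2 = true_effect + random.gauss(0, NOISE_SD)
    if year1 > 0:                 # classified as "adding value" in year 1
        positives += 1
        if year2 < 0:             # reclassified as "subtracting value" in year 2
            flips += 1

print(f"{flips / positives:.0%} of 'adding value' teachers flip the next year")
```

With noise of this magnitude, roughly 40 percent of the teachers labeled "adding value" one year are labeled "subtracting value" the next, squarely inside the 25-to-50-percent range the research reports, and not far from a coin flip.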

This reader notes that Bill Gates admits that we won’t know if his education “stuff” works for a decade. Yet based on Gates’ support for evaluating teachers by student test scores, teachers are losing their jobs. These are real people, who need to feed their children, not data points in an experiment.

Meanwhile, most researchers agree that the metric is flawed, unreliable, and unstable. Thus comes this reader’s suggestion:

“Here’s a suggestion offered only partly tongue-in-cheek. The Gates Foundation should be joined as a respondent in every suit filed for wrongful termination as a consequence of any Gates-sponsored or Gates-designed teacher evaluation system. What could be more quintessentially American than that? Moreover, when suing, it’s always better to go after the party with deep pockets. After all, should cash-strapped urban districts that have had these evaluation systems forced upon them be held solely liable?”

In the early years of this century, Bill Gates felt certain
that he knew how to fix the nation’s high schools. He pumped $2
billion into breaking them into smaller schools, often under the
same roof.

In 2008, he decided he was not pleased with the
results, and he dropped that idea.

Then, he decided that teacher evaluation was broken, and he would use his billions–plus the
billions of Race to the Top–to create a metric that would identify
the best and worst teachers.

He adamantly opposed reducing class size, even though his own children go to a school known for small classes.

His theory was that “bad” teachers identified by his
metric would be fired, while the “best” teachers would get more
money and larger classes. He gave hundreds of millions of dollars to
districts to develop the measuring stick, but so far there have been
no results.

The federal government, fully on board with the Gates
idea, now has almost every state following Gates’ plan. As
Valerie Strauss points out on her blog, Gates
now says
that it will take about a decade to determine whether his latest
hunch actually works.

So far, it has failed to produce a reliable
metric or results anywhere it was tried, and billions of dollars have been wasted.

In the meanwhile, real teachers are being fired and losing their livelihood based on Gates’ latest big
idea. Strauss writes: “Hmmm. Teachers around the country are
saddled every single year with teacher evaluation systems that his
foundation has funded, based on no record of success and highly
questionable ‘research.’”

And now Gates says he won’t know if the
reforms he is funding will work for another decade. But teachers
can lose their jobs right now because of reforms he is funding.

In the past he sounded pretty sure of what he was doing. In this 2011 op-ed
in The Washington Post (cited in Valerie’s post), he wrote: “What should policymakers do? One
approach is to get more students in front of top teachers by
identifying the top 25 percent of teachers and asking them to take
on four or five more students.” The problem with Gates is that he
tries out his ideas as if he were playing with toy soldiers.

Doesn’t anyone around him have the chutzpah to tell him that his
untested hunches don’t work and are ruining the lives of decent
people? Will anyone in his foundation be held accountable for his
latest foray into redesigning the nation’s public schools? I have
some really good ideas for him in my latest book. They have solid
research behind them. They work. They help people instead of
ruining the lives of others. They do no harm. I wish he would read
it. He could leave a lasting legacy of success rather than a long string of costly failures that harmed people who were doing good work.

Gary Rubinstein’s analysis of the charter schools founded by Congressman Jared Polis showed that the schools posted low test score growth. Congressman Polis responded in a comment (posted below) that this was understandable because his charter schools enroll very low-performing students, many of whom barely speak or read English, and many of whom are overage for their grade and far behind. It is understandable, he says, that these kids are not posting big score gains. He also notes that the teachers at his schools are not judged by value-added assessment, given the students they serve.

Congressman Polis is making my case for me but he doesn’t realize it. He should read my book.

He would discover that I support charter schools that enroll the kids who didn’t make it in public schools. They should exist to do what the public schools can’t do. They should exist to help kids who were left behind, not to skim the brightest kids from the poorest communities. Schools should not be closed because of their low scores, and their teachers should not be judged by test scores. Charter schools and public schools should collaborate, not compete. Charter schools should fill a need, as Polis’ schools seem to do, not fight public schools for market share.

If Congressman Polis would read my book, he would see that his are the kinds of charters I endorse.

If he would take the time to familiarize himself with the research on test-based accountability, he would join me in opposing it. He would withdraw his support for Colorado’s SB 191, which bases 50% of a teacher’s evaluation on student test scores. This is one of the nation’s worst, most punitive, and most ignorant teacher evaluation laws, based on no research or evidence, just the whim of young State Senator Michael Johnston, ex-TFA. There are good ways to evaluate teachers, and test-based accountability is not one of them. That is why Jared Polis’ charter schools don’t do it.

Since we have now found common ground, despite the fact that Polis called me “an evil woman,” and despite the fact that he stubbornly refuses to apologize for his outburst, I invite him to meet with me in Brooklyn to discuss whether he can overcome his irrational contempt for traditional public schools. Even though he is a billionaire, I will pick up the check for breakfast, lunch, or dinner on one condition: read my book. If you don’t like it, Jared, I will give you your money back. Just promise not to throw it at me.

Here is his comment on the blog in response to Gary’s post:

“Thank you for your post defending the efforts of New America School. New America School (NAS) serves almost entirely NEP (non-English-proficient) and LEP (limited English proficiency) students, many of whom are several grade levels behind when they enter NAS. Nearly all of their students are drop-outs or have major gaps in their education.

“Given that the tests are only available in English, the NAS students have a significant disadvantage.

“A primary metric the school uses to demonstrate success is measuring the acquisition of the English language. Many NAS students are 19 or 20 years old, and only have a 6th grade or 8th grade education prior to entering NAS. Sadly in Colorado students “age out” of public education at age 21, and few students can accomplish 4 or 5 years of learning in 1 or 2 years. But even if they don’t earn a diploma, the students gain functional English language literacy.

“This analysis is a good example of why test scores should not be the only criteria used to evaluate schools or teachers. NAS teachers are hard working and dedicated and have literally transformed lives. To be clear, I support transparency on aggregate test scores, and Mr. Rubinstein is welcome to use that information to make whatever charts he wishes to show that a school is good, bad, or otherwise but it is important to educate the reform community about the importance of alternative education and serving all kids.

“Rubinstein mentions that ‘Colorado is one of the states that has been most aggressive about tying standardized test scores to teacher evaluations and to school rankings,’ but NAS does not use standardized test scores to evaluate teachers, nor has any kind of ‘ranking’ hurt the school’s effort to fulfill its mission ‘to empower new immigrants, English language learners, and academically under-served students with the educational tools and support they need to maximize their potential, succeed and live the American dream.’”

John Thompson has an excellent post on Anthony Cody’s blog, trying to figure out why the architects of Race to the Top ignored a wealth of social science evidence by demanding more test-based accountability than even No Child Left Behind.

He notes that both Elaine Weiss of the Bolder Broader Approach and the U.S. Government Accountability Office (GAO) take a dim view of RTTT.

Elaine Weiss reviewed the evidence and found that RTTT was not likely to meet its lofty goals. States made promises they could not keep, and RTTT has been accompanied by punitive strategies, conflict, and deprofessionalization of teaching. “Districts heavily serving low-income and minority students, especially large urban districts, face some of the most severe challenges. Tight timelines and lack of resources compound RTTT’s failure to address poverty related impediments to learning. Heightened pressure on districts to produce impossible gains from an overly narrow policy agenda has made implementation difficult and often counterproductive.”

The GAO report found that implementation of teacher and principal evaluation systems was proceeding slowly and problematically. Some districts reported that the cost of implementation exceeded the value of the award. No one can say with assurance that education has been improved by the DOE’s demand to put even higher stakes on testing.

Of course, test-based evaluation of professionals is bound to be challenging because most teachers are not teaching tested subjects; many are “evaluated” by the scores of their school, or by the scores registered by students in subjects the teachers don’t teach. This is not only a challenge, it is nonsensical.

I have been searching, but I can’t find another nation in the world that is pursuing this means of evaluating teachers. If anyone who reads this knows of one, please let me know.

I am aware of several books that will be published over the next year explaining why teachers should not be evaluated by test scores. Of course, the U.S. Department of Education was warned not to do it. It was warned in a strong letter written by the National Academies of Sciences Board on Testing and Assessment. Here is a key paragraph, warning that value-added measures (VAM) were not ready to be used to evaluate teachers:

In sum, value-added methodologies should be used only after careful consideration of their appropriateness for the data that are available, and if used, should be subjected to rigorous evaluation. At present, the best use of VAM techniques is in closely studied pilot projects. Even in pilot projects, VAM estimates of teacher effectiveness should not be used as the sole or primary basis for making operational decisions because the extent to which the measures reflect the contribution of teachers themselves, rather than other factors, is not understood. Even in pilot projects, VAM estimates of teacher effectiveness should not be used to make operational decisions for teachers with students who have achievement levels that are too high or too low to be measured by the available tests because the estimates for such teachers will be essentially meaningless. Even in pilot projects, VAM estimates of teacher effectiveness that are based on data for a single class of students should not be used to make operational decisions because such estimates are far too unstable to be considered fair or reliable.

The U.S. Department of Education ignored the advice of testing experts, and now, three years after handing out $4.35 billion, there is no evidence that Race to the Top has accomplished anything other than to create massive demoralization among teachers and principals.

Herb Bassett is a teacher in Louisiana. He teaches music, but like Jersey Jazzman, has the ability to understand statistics and how they work in real classrooms.

This is a letter that he wrote about Louisiana’s new teacher evaluation system, which is as incoherent as teacher evaluation systems in other states:

 

State Superintendent John White showed his true colors when he recently praised four FirstLine charter schools that “fell in the top 10 percent of Louisiana schools in terms of improving test scores, yet ranked fewer than 10 percent of their teachers highly effective.”
 
“Amazing results,” he wrote.
 
He did not mention that one of the four schools, while ranked in the 99th percentile of improvement, declared 68 percent of its teachers Ineffective. Most of its teachers are now on a fast track to dismissal.
 
In each of the other three schools, at least 69 percent of their Value-added Assessment Model (VAM) teachers ranked Highly Effective, but none received an observation rating of Highly Effective. Not one.
 
If the VAM computer model ranked so many teachers Highly Effective, why could the principals not find at least one example of Highly Effective teaching in an observation?
 
These results clearly do not reflect student achievement or teacher quality. They deserve condemnation, not praise.
 
What does this bode for teachers and students on the coming Common Core assessments? White has predicted that due to the “rigor” of the new standards, achievement scores will go down.
 
As strange as it seems, teachers will not see lower ratings under VAM – even with the dramatic drop predicted for student scores. The VAM computer model simply ranks the teachers from highest to lowest. No matter whether the scores rise or drop dramatically, there will always be a bottom ten percent ranked Ineffective and a top twenty percent ranked Highly Effective. These quotas were set by the Louisiana Department of Education. Yes, the Department arbitrarily decided that ten percent of teachers are Ineffective and twenty percent should be Highly Effective.
 
Then why does the Compass Report show that only four percent of all teachers are Ineffective?
 
The computer model does not rank all teachers. The majority of teachers are not subject to the quotas. The purpose of the Compass Report was to show the discrepancy, and to coerce evaluators of the non-VAM teachers into matching the VAM system quotas.
 
White, however, seems to relish the thought of evaluations that cut short the quota for Highly Effective teachers.
 
Superintendent White now controls the cut-off scores for the achievement levels on the new assessments. Having seen him praise unjustifiably low teacher evaluations, should parents trust him to decide whether their children pass or fail the new assessments?  
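The quota arithmetic Bassett describes is simple to check. The sketch below uses the quotas from his letter (bottom 10 percent Ineffective, top 20 percent Highly Effective among VAM-ranked teachers); the 40 percent VAM-ranked share and the assumption that evaluators rate almost no non-VAM teachers Ineffective are hypothetical numbers chosen only to illustrate how a 10 percent quota can yield a 4 percent overall rate:

```python
# Rank-based quotas label a fixed share of VAM-ranked teachers no matter how
# raw scores move, so the overall rate depends only on quota and coverage.

VAM_QUOTA_INEFFECTIVE = 0.10   # bottom 10% of VAM-ranked teachers (per the letter)
VAM_QUOTA_HIGHLY_EFF = 0.20    # top 20% of VAM-ranked teachers (per the letter)

vam_share = 0.40               # hypothetical share of teachers who are VAM-ranked
nonvam_ineffective = 0.0       # assume evaluators rate almost no one Ineffective

# Weighted average of the two groups' Ineffective rates.
overall_ineffective = (VAM_QUOTA_INEFFECTIVE * vam_share
                       + nonvam_ineffective * (1 - vam_share))
print(f"Overall Ineffective rate: {overall_ineffective:.0%}")  # → 4%
```

Because the quota is applied to ranks rather than to scores, the 10 and 20 percent shares are fixed in advance; only the coverage (how many teachers are VAM-ranked) and the non-VAM evaluators' ratings can move the overall figure, which is exactly the discrepancy Bassett says the Compass Report was built to highlight.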

A regular reader has posted several comments that seem to
imply that not enough teachers are being fired, or that a system
in which few teachers are fired is not up to par, on the assumption
that there are many “bad” teachers who have not yet been found out.
This seems to be the assumption behind Race to the Top and the
Gates approach to evaluation: stack ranking, from top to bottom.
Fire the bottom. I responded that about 40% of teachers leave
within the first five years of starting their job. He asked for
evidence. Good question. Here are two good sources. Ken Futernick
of WestEd in San Francisco wrote an excellent article called “Incompetent
Teachers or Dysfunctional Systems?”
Matt Di Carlo wrote
a good overview of
the research here.
In no other profession do so many
people exit so rapidly. This suggests to me that states and
districts should have high standards for hiring teachers and then
should mentor new teachers, build a collegial culture, and make
sure that retention is a goal. We make a huge mistake with the new
evaluation systems, which seem intended to find and fire weak
teachers. The goal should be to make teachers better, if they are
willing to be helped. Churn is bad.