Archives for category: Teacher Evaluations

Last Friday, before the winter break, D.C. officials quietly released the news that the D.C. IMPACT evaluation system contained technical errors. It was the perfect time to reveal an embarrassing event, hoping no one would notice. Spokesmen minimized the importance of the errors, saying they affected “only” 44 teachers, one of whom was wrongfully terminated.

But Professor Audrey Amrein-Beardsley explains that what happened was “a major glitch,” not a “minor glitch.” It was not a one-time issue, but an integral part of a deeply flawed method of evaluating teachers. No amount of tinkering can overcome the fundamental flaws built into value-added measurement of teacher quality.

Beardsley writes:

VAM formulas are certainly “subject to error,” and they are subject to error always, across the board, for teachers in general as well as the 470 DC public school teachers with value-added scores based on student test scores. Put more accurately, just over 10% (n=470) of all DC teachers (n=4,000) were evaluated using their students’ test scores, which is even less than the 83% mentioned above. And for about 10% of these teachers (n=44), calculation errors were found.

This is not a “minor glitch” as written into a recent Huffington Post article covering the same story, which positions the teachers’ unions as almost irrational for “slamming the school system for the mistake and raising broader questions about the system.” It is a major glitch caused both by inappropriate “weightings” of teachers’ administrators’ and master educators’ observational scores, as well as “a small technical error” that directly impacted the teachers’ value-added calculations. It is a major glitch with major implications about which others, including not just those from the unions but many (e.g., 90%) from the research community, are concerned. It is a major glitch that does warrant additional caution about this AND all of the other statistical and other errors not mentioned but prevalent in all value-added scores (e.g., the errors always found in large-scale standardized tests particularly given their non-equivalent scales, the errors caused by missing data, the errors caused by small class sizes, the errors caused by summer learning loss/gains, the errors caused by other teachers’ simultaneous and carry-over effects, the errors caused by parental and peer effects [see also this recent post about these], etc.).

The “errors” cannot be corrected because the method itself is the problem. The errors and flaws are integral to the method. VAM is Junk Science, the use of numbers to intimidate the innumerate, the use of data to quantify the unmeasurable.

Patrick Hayes is a teacher in Charleston, South Carolina, who is leading the fight to block test-based, value-added evaluations of teachers in that district. As many posts on this blog have iterated and reiterated, most researchers think that VAM is flawed and error-ridden. (Check out Audrey Amrein-Beardsley’s blog VAMboozled and Edward Haertel’s ETS lecture.)

Hayes read about the errors in the Mathematica study of VAM in D.C., and left the following comment:

“This is awful news for DC teachers. Down here in Charleston, it’s the greatest Christmas gift imaginable.

“We’re fighting VAM-based merit pay tooth and nail. Guess who our district hired to do the work?

“Here’s the only question I have: was this what Mathematica had in mind in 2010 when they said that VAM has a 36% error rate?

http://ies.ed.gov/ncee/pubs/20104004/pdf/20104004.pdf

“Is that before or after they foul up the data?

“Tell you what, don’t ask Mathematica. I can tell you from personal experience: they REALLY don’t like talking about that study.

“I know it was before Arne Duncan handed out nearly a billion dollars in grant funding for value-added systems.

“When Mathematica published this, TIF grants were still comparatively small potatoes.

“Funny thing is, Arne’s the one who picked up the tab for that study. His name appears on page 3. Go figure.”

Last Friday, officials at the central office of the District of Columbia Public Schools quietly released the news that the teacher ratings on its highly touted IMPACT system contained errors. It was not clear how many teachers were affected. If you want to bury a policy disaster, the best time to announce it is on a Friday before a long holiday, on the assumption it will be ignored and forgotten.

Researchers have warned for the past three years that grading teachers by the test scores of their students is error-ridden, inaccurate, and unstable. Earlier this year, the distinguished psychometrician Edward Haertel of Stanford warned in a major lecture that value-added scores should not be used as a fixed percentage when evaluating teachers and should have multiple safeguards to avoid error. Did anyone at the U.S. Department of Education or anywhere else take heed? Of course not.

As Valerie Strauss notes in the linked article, this inherently flawed and demoralizing process has been widely adopted: it is a major element of Race to the Top, and states that want waivers from the impossible mandates of NCLB must agree to adopt this procedure, no matter how ill-conceived it is.

Strauss writes:

“Testing experts have long warned that using test scores to evaluate teachers is a bad idea, and that these formulas are subject to error, but such evaluation has become a central part of modern school reform. In the District, the evaluation of adults in the school system by test scores included everybody in a school building; until this year, that even included custodians. In some places around the country, teachers received evaluations based on test scores of students they never had. (It sounds incredible but it’s true.)”

Only a few weeks ago, Audrey Amrein-Beardsley wrote on her blog that the ratings on the D.C. IMPACT system made no sense.

Now the company that created the rating system has acknowledged the errors.

Let’s see if some enterprising journalist digs into this fiasco.

A reader who teaches in the District of Columbia wrote to ask my advice. He couldn’t understand his evaluation based on the District’s complex IMPACT evaluation system. I sent it to a teacher evaluation expert, Audrey Amrein-Beardsley. She wrote a post trying to make sense of the evaluation report.

Read it here.

A reader writes:

“I’m a special education teacher in New Mexico and I took this year off teaching, for medical reasons. The choice was made easier by the new teacher evaluations. Since my students have significant disabilities, they cannot take the state tests. 50% of my evaluation would then be based on how the regular education students, whom I do not teach, scored at our school. 25% of my evaluation would be based on my principal’s view of how I contributed to that score (God knows how that will be, since I do not provide instruction to any of the students who take the test). So 75% of my yearly evaluation would have been based on test scores of students who receive no instruction from me. How could this possibly be an effective indicator of student/teacher performance in my classroom?

“All this does is provide an incentive for teachers to work with the wealthiest, highest performing schools, while disincentivizing teachers from working with special needs students or in high risk or low income settings.

“I have no problem with being evaluated or critiqued as a professional, but it needs to be done with some semblance of common sense.”

Surprise! The school leadership of Charleston, South Carolina, has come up with some stale ideas and branded them as “reform.”

Nothing like copying what was tried and failed everywhere else!

The district calls it a “new” program of teacher evaluation, pay for performance, and a reconfigured salary structure, called BRIDGE, but in fact it is the status quo demanded by the U.S. Department of Education.

Every Broad-trained superintendent has the same ideas but is tasked with calling them “new” (when they are not), “evidence-based” (when they are not), and “reform” (when they are the status quo, paid for and sanctified by the U.S. Department of Education).

Patrick Hayes, a teacher in Charleston, has launched a campaign to expose the destructive plan of the district leaders, whose primary outcome will be to demoralize and drive away good teachers.

This blogger, the Charleston Area Community Voice for Education, recognizes that the new structure is not new, that it relies on “Junk Science,” and that it is “a Bridge to I Don’t Know Where.”

He writes:

BRIDGE brings into full play in Charleston many of the recent reform strategies and policies, including

  • large-scale testing,
  • using test scores to rate principal and teacher performance (VAM),
  • merit pay, and
  • Broad Academy-trained leadership (starting with the superintendent).

It is important to note that these are the reforms of the last decade or so that have produced little improvement in schools as measured by the same testing and by the recently announced PISA results. These “reforms” are the status quo; in fact, they are not reform at all. As Hayes and others have pointed out, there is no credible evidence to support the effectiveness of these efforts, at least in terms of increased learning or even measuring teacher quality.

Further, the school district has built no case for doing BRIDGE in terms of what we want for our children, teachers, and classrooms. BRIDGE appears to be a large, well-funded ($23.7 million) solution to vague, even non-existent problems. It is a solution the district apparently intends to impose on every classroom and hence every student in Charleston public schools.

Here’s the thing. There are students in all schools who are not learning to their potential. There are also schools that have issues, academic and otherwise, that need addressing. There are also schools and students doing amazingly well.

The success of those students and schools cannot be attributed to evaluation (of teachers, schools, or even the students), nor is there any evidence that evaluation will fix the problems that do exist. Hint: we already know where the problems are. To base a massive restructuring of how schools, teachers, principals, and certainly students do business and spend their days on such evaluation is bogus, and the impacts of flawed, misdirected programs in education usually drive us off a cliff.

The bottom line is this: Charleston County School District has embarked on a very large experiment, called BRIDGE, with vaguely defined goals (except, perhaps, raising test scores) and with the plan of “let’s see if this works, because we have to do something.” Of course, in science, when you’re out there exploring the unknown, you don’t know what you’ll get.

Perhaps I’m missing the point here, so maybe I need to ask my six year old granddaughter and her teacher and principal, all of whom are doing quite well, thank you.

I would like to hear an answer from the school board and superintendent addressed to Grace (who understands quite a bit) to this question:

Why are you doing this BRIDGE thing?

Go ahead. I dare you.

This is a terrific article by Nicholas Ferroni that appeared on Huffington Post.

He speaks truths that every teacher will understand.

This is what he did this week:

This week of school, like every other week, was pretty normal: I gave out about fifty dollars to various students who didn’t have lunch money; I resolved two teenage relationship issues; I comforted three girls who, for some reason, think they are so ugly that no boys will ever like them; I got three students, who have whispered three words each all year, to speak in front of the class; I paid for four students to join the gym and also offered to train them in order for them to deal with their aggression constructively; I went out of my way to make sure that five of my students, who I know are having problems at home, know that they are intelligent, strong and have so much to offer this world. So, in the education world where you deal with hundreds of uniquely individual teenagers trying to accept who they are, it’s just a normal week. I am not trying to brag because my commitment to my students is not the exception but the norm, especially at the high school where I teach where so many of my colleagues, day in and day out, give their hearts, souls and money to their students without a thought. I also do not want your sympathy because I, like most teachers, went into education for this very reason: to educate, empower and nurture youth.

Yet politicians constantly take pot shots at teachers and try to find a metric to weigh their value, usually with test scores. Teaching is so much more complicated and demanding than test prep, Ferroni explains.

He adds:

Without going into too much off topic, has anyone advocating for teacher evaluation and merit pay ever even consider what impact it will have on the performance of students in the classroom? They are incredibly naïve if they think that the fact that all accountability now lies on a teacher’s performance, and not the student, will not lead students performance to decline. Why would students work harder to excel in the classroom, when they are completely free of any responsibility for their grade? This is ultimately suggesting that each student has no role in their own success or failure in the classroom. Any one of us who has attended school knows that without a doubt that, not only are we responsible for our own academic performance, but that we are far more responsible than our teachers, our parents and even our friends were for our grades.

This brings me back to my opening paragraph; the most important role a teacher plays in the lives of his or her students is not as an examiner, but as a nurturer. Attempting to evaluate a teacher based on standardized tests is like evaluating a doctor solely on whether a patient lives, dies, or is cured. Just as every doctor gives his or her all attempting to save and cure patients, every teacher gives his or her entire self to students (who we treat more like our own children than our students). I can’t imagine a world where teachers are so fearful of losing their jobs because their students, who may be going through so many various and horrible circumstances, that they disregard the emotional role of an educator and focus solely on the academics. I will never tell a student, “Stop crying! I don’t care if you are depressed, or you haven’t eaten breakfast, or your parents beat you. I need you to do your work and study so you do well on your exam, so we meet our district goals and my pay is not garnished!”

Massachusetts released teacher ratings, and lo and behold, all the best teachers seem to be in the most affluent districts.

In Boston, the lowest ratings went to old and minority teachers. The highest ratings went to central office administrators.

EduShyster did not use the phrase “makes no sense,” which is the headline of this post.

Actually, the teacher evaluations do make sense. They are doing what they were designed to do. They are bogus. This is the regime imposed on the state by Jonah Edelman and Stand for Children. Firing old and minority teachers is the civil rights issue of our day? When will there be accountability for the people who push these demoralizing ideas? Other countries support and develop teachers, helping them improve. We rank and rate and humiliate them to force out those whose kids don’t get high scores and to make room for young college graduates who want to try their hand at teaching for a while until something better comes along.

Edward H. Haertel is one of the nation’s premier psychometricians. He is Jacks Family Professor of Education Emeritus at Stanford University. I had the pleasure of serving with him on the National Assessment Governing Board, after I joined the board in 1997. He is wise, thoughtful, and deliberate. He understands the appropriate use and misuse of standardized testing.

He was invited by the Educational Testing Service to deliver the 14th William H. Angoff Memorial Lecture, which was presented at ETS on March 21, 2013, and at the National Press Club on March 22, 2013.

This lecture should be read by every educator and policymaker in the United States. Haertel explains the research on value-added models (VAM), which attempt to measure teacher quality by the rise or fall of student test scores, and shows why VAM should not be used to grade and rank teachers.

Haertel begins by pointing out that social scientists generally agree that “teacher differences account for about 10% of the variance in student test score gains in a single year.” Out-of-school factors account for about 60% of the variance; many other influences are unexplained variables.

Small though 10% may be, it is the only part of the influence that policymakers think they can directly affect, so many states have enacted policies to give bonuses or to administer sanctions based on student test scores. In Colorado, for example, policymakers have decided that the rise or fall of test scores counts for 50% of the teacher’s evaluation, which will determine tenure, pay, and retention or firing.
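Policies like Colorado's reduce to a fixed-weight average. As a rough illustration of what a fixed 50% weight means mechanically (the function name, the 0–100 scale, and the lumping of everything else into a single "observation" component are my assumptions for the sketch, not any state's actual formula):

```python
# Illustrative sketch only: a fixed-weight composite of the kind the
# Colorado policy describes, where the value-added (VAM) component
# counts for 50% of a teacher's evaluation.

def composite_rating(vam_score: float, observation_score: float,
                     vam_weight: float = 0.5) -> float:
    """Combine a VAM score and an observation score (both assumed to be
    on a 0-100 scale) using a fixed weight for the VAM component."""
    return vam_weight * vam_score + (1 - vam_weight) * observation_score

# With a fixed 50% weight, error in the noisy VAM score passes straight
# through at half strength: a 20-point VAM swing moves the final rating
# by 10 points, regardless of how the teacher was observed in class.
print(composite_rating(60, 80))  # 70.0
print(composite_rating(40, 80))  # 60.0
```

The point of the sketch is that a fixed weight offers no safeguard: nothing in the formula can discount the VAM term when it is known to be unreliable, which is exactly what Haertel's lecture warns about.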

Haertel proceeds to demolish various myths associated with VAM, for example, the myth that the achievement gap would close completely if every child had a “top quintile” teacher or if every low-performing student had a top quintile teacher. He notes that “there is no way to assign all of the top-performing teachers to work with minority students or to replace the current teaching force with all top performers. The thought experiment cannot be translated into an actual policy.”

He notes other confounding variables: students are not randomly assigned to classrooms. Some teachers get classes that are easier or harder to teach. Changing the test will change the ratings of the teachers. The advocates of VAM routinely ignore the importance of peer effects, the peer culture of a school in which students “reinforce or discourage one another’s academic efforts.”

He adds: “In the real world of schooling, students are sorted by background and achievement through patterns of residential segregation, and they may also be grouped or tracked within schools. Ignoring this fact is likely to result in penalizing teachers of low-performing students and favoring teachers of high-performing students, just because the teachers of low-performing students cannot go as fast…Simply put, the net result of these peer effects is that VAM will not simply reward or penalize teachers according to how well or poorly they teach. They will also reward or penalize teachers according to which students they teach and which schools they teach in.”

After a careful review of the current state of research, Haertel reaches this conclusion:

“Teacher VAM scores should emphatically not be included as a substantial factor with a fixed weight in consequential teacher personnel decisions. The information they provide is simply not good enough to use in that way. It is not just that the information is noisy. Much more serious is the fact that the scores may be systematically biased for some teachers and against others, and major potential sources of bias stem from the way our school system is organized. No statistical manipulation can assure fair comparisons of teachers working in very different schools, with very different students, under very different conditions. One cannot do a good enough job of isolating the signal of teacher effects from the massive influences of students’ individual aptitudes, prior educational histories, out-of-school experiences, peer influences, and differential summer learning loss, nor can one adequately adjust away the varying academic climates of different schools. Even if acceptably small bias from all these factors could be assured, the resulting scores would still be highly unreliable and overly sensitive to the particular achievement test employed. Some of these concerns may be addressed, by using teacher scores averaged across several years of data, for example. But the interpretive argument is a chain of reasoning, and every proposition in the chain must be supported. Fixing one problem or another is not enough to make the case.”

Please read this important paper. It is the most important analysis I have read of why value-added models do not work. Since Race to the Top has promoted the use of VAM, Haertel’s analysis demonstrates why Race to the Top is demoralizing teachers across the nation, why it is destabilizing schools, and why it will ultimately not only fail to achieve its goals but will do enormous damage to teachers, students, the teaching profession, and American education.

Please send this paper to your Governor, your mayor, your state commissioner of education, your local superintendent, the members of your local board of education, and anyone else who influences education policy.


In this article at Huffington Post, Alan Singer has investigated the secret, privately funded apparatus that designs education policy in New York State.

The group is known as the Regents Research Fellows, but they are not subject to any public oversight.

They are appointed by the state commissioner, funded by big foundations, and seem to have more authority than the duly appointed Board of Regents.

It is an unusual arrangement, to say the least, in that the Fellows operate outside the legal framework of state law.

Who are they?

“The initial group of Regents Research fellows included Matthew Gross, executive director of the fund, who previously recruited business leaders to partner with schools. Gross was originally a Teach for America recruit. Other fellows were Kristen Huff, a former College Board research director who developed their advanced placement and SAT testing programs; Amy McIntosh, formerly CEO of Zagat Survey and a senior vice president at a company that provides business information, who previously developed teacher and principal effectiveness strategies for the New York City Department of Education; Julia Rafal, fellow for teacher and principal effectiveness, a TFA graduate and consultant for charter schools; and Kate Gerson. Gerson is promoted as a former New York City teacher and school principal who brings legitimate educational credentials and experience to the table. The reality, however, is that Gerson only worked for two years at a transfer school for over-aged, under-credited students before leaving for an organization called New Leaders for Schools.

“Later fellows have included Peter Swerdzewski, a psychometrics specialist from the College Board; Joshua Marland, another psychometrics specialist; Jason Schweid, also recruited from the College Board; Joshua Skolnick, an attorney, who assumed Gross’s management and fundraising responsibilities when Gross resigned; TFA graduates Ha My Vu and Joyce Macek; Beth Wurtmann, a television reporter; Jennifer Sattem; Doug Jaffe, a lawyer; Anu Malipatil, a TFA graduate and charter school advocate who also works for the Two Sigma Investment company; and Wendy Perdomo, a New York City DOE bureaucrat with no apparent teaching experience.”

Government in a democratic society should be transparent and accountable. This group is neither.