Archives for category: Teacher Evaluations

Thus far, the concept of VAM–or value-added measurement–has an unbroken record of failure. Wherever it has been tried, it has proven to be inaccurate and unstable. Teacher and student records are erroneous. Teachers are judged based on students they never taught. VAM demoralizes teachers, who understand they are being judged for factors over which they have little or no control.

The major perpetrators of this great fraud are Bill Gates, who bet hundreds of millions of dollars on the proposition that test scores could be a major factor in identifying bad teachers and firing them, and Arne Duncan, who required states to use VAM if they wanted to be eligible to get a share of his $4.35 billion Race to the Top fund.

Yet a third perpetrator was Jeb Bush, whose love affair with data is unbounded. Bush went from state to state selling “the Florida miracle,” which supposedly proved that testing and accountability were the keys to solving America’s educational problems.

One of Jeb’s acolytes was Hanna Skandera, who was chosen as Secretary of Education in New Mexico but was never confirmed because of her lack of classroom credentials. As Secretary-designate, she sought to import the Florida model of testing and accountability.

When the state released its new teacher evaluation ratings, teachers and students showed up at the Albuquerque school board meeting to complain about errors. Teachers talked about missing and incomplete data. One student said he was part of a team that placed first in the state in civics, yet he failed his end-of-course government exam.

“James Phillips teaches calculus to Advanced Placement students at Albuquerque High School. He described how the previous week had seen him publicly praised by board member Marty Esquivel, who called him the best math teacher in New Mexico. Just days later, Phillips was notified that the PED had also ranked him ‘minimally effective.’

“Wendy Simms-Small, a parent of three APS students who’d helped organize the day’s rally, said she started getting active after hearing rumors that hundreds of teachers were planning on leaving the school system.

“I got curious and wanted to find out why,” she said. “As a member of this community over many years, I have never seen the demoralization of professional individuals like this ever before.” She said the pressure of testing had also taken a toll on her kids.

“Private corporations reap great rewards when school systems implement standardized testing,” said Simms-Small, “so it’s my belief that they’re motivated financially to turn our children into pawns for profit.”

At some point, the data-obsessed federal and state policy makers will have to concede that they were wrong or face a massive parent-teacher rebellion. They are literally destroying the nation’s schools with their mad ideas. It is time for a revival of common sense or a public discussion of the true meaning of education.

This is one of the funniest YouTube videos I have ever seen (excluding a few about dogs and cats).

I promise you, you will not be disappointed. Watch it and forward it to your superintendent or your school board. Remember how I always say that VAM is Junk Science. Here is one of the world’s leading assessment experts, and his message mocks VAM as just plain Junk.

It features the great assessment expert W. James Popham as pitchman for a line of products guaranteed to raise your students’ test scores. Some of them are chewable, some are drinkable and last five hours (he warns that if the effect lasts for longer than five hours, call your physician at once).

Please watch this. You will never think about value-added assessments or rubrics the same way again.

Want to know more about Popham? Read this author bio. He is one funny guy.

Thanks to Arne Duncan, almost every state now has an elaborate teacher evaluation plan. There is no evidence that the plans identify teachers correctly, but they are widespread because Duncan believes in them with more certainty than any of his predecessors, and he is Secretary of Education.

What hath Arne wrought? Here is an account I hope he reads. It shows what a mess he has made in thousands of schools.

I’ve been trying to find the right place to share what I’ve written about the ridiculous evaluation process that occurs in Palm Beach County, FL. Friends who have read it asked that I share it with you, Diane.

My teacher evaluation rant

I’m about to write out the long, stupidly involved story of the truth behind teacher evaluations, specifically at my school, but likely not too different from anywhere else.

Five (or more) times in the school year my principal does classroom observations. There are three different kinds: one formal, forty-minute lesson observation; two 5-15 minute informal observations; and two 30-second to 2-minute walkthroughs. She evaluates me using the Robert Marzano Menu of Design Questions. There are about 60 specific behaviors within 4 different domains, each with 3-7 components, that she is looking for during those observations.

Each of the 60 behaviors is then graded on a scale of: not using, beginning, developing, applying, and innovating. The evaluator, after marking off the components, then decides how to grade the behavior. Comments, if appropriate, are also added (as in: “All of the students were actively engaged in the lesson”). At the end of each observation I get an email directing me to approve the observation.

At the end of the year, the grades for each behavior are calculated to determine if the teacher is: Highly Effective (3.2 – 4.0), Effective (2.1 – 3.1), Developing (1.2 – 2.0) or Unsatisfactory (1.0 – 1.1). Last year I was deemed Highly Effective based on my observations. I didn’t really look at the details because I was overly pleased with the results.

I should note that for my first two years teaching this was a new evaluation system. Our district decided that during the learning curve process ALL teachers would be given the same grade/evaluation level of Effective, so the observations were a tool for us to begin to look at where we could stand to improve and what we were already doing well. The fact that I was evaluated as Highly Effective had no bearing on anything, since ALL teachers were graded as Effective. Also, last year only 2 of the 4 domains were observed. This year only 3 were observed.

The added domain is for our own personal professional development. I mention this because we were instructed at the beginning of the year that ALL teachers had to have the same personal professional development goal: to improve student success through implementation of the Marzano Techniques of Teaching (see a trend?). I didn’t want this to be my goal, but I didn’t have a choice (don’t get me started). (Last year my personal goal was to improve my ELL students’ oral language assessments by 50% – I reached that goal and then some).

This year’s evaluation has come back and I am now graded as Effective with an overall score of 3.0. I was wondering how I dropped from Highly Effective to Effective, so I started looking more closely at the numbers. Here is what I saw:

I was marked as Innovating (4.0) for 12/31 behaviors.
I was marked as Applying (3.0) for 19/31 behaviors.
I had no lower marks than that.

Now in my world of calculating scores, I would multiply 4 x 12 = 48 and 3 x 19 = 57, then add them together (48 + 57 = 105), then divide by 31, which equals 3.39. A 3.39 is Highly Effective, but I was graded as 3.0 – Effective. Hmm. I called my union rep and she was not sure how that could be. She also, for what it’s worth, had a similar score drop. She remembered that there was some ridiculous way to calculate the previous two years’ scores, and thought maybe they are doing the same thing this year. Regardless, neither of us knew how our scores were calculated, so we knew we would have to ask higher-ups.

I asked my principal. She wasn’t sure how it is done, either. She suggested that we both call/email the woman at the district who is in charge. So I did. This is what I learned: if 50% or more of your marks are Highly Effective, then you are Highly Effective; if 50% or more are Effective, then you are Effective; and so on.
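For what it’s worth, the two competing scoring methods can be sketched in a few lines of Python. The marks and cutoffs are the ones described above; the district’s exact algorithm is an assumption based on the official’s description, read as checking from the top rating down.

```python
from statistics import mean
from collections import Counter

# The marks described above: 12 behaviors at Innovating (4.0),
# 19 behaviors at Applying (3.0)
marks = [4.0] * 12 + [3.0] * 19

# Method 1: the straightforward weighted average the teacher computed
avg = round(mean(marks), 2)  # (4*12 + 3*19) / 31 = 105/31 = 3.39

# Method 2: the "50% or more" rule as the district official described it,
# checked from the top rating down (this reading is an assumption)
counts = Counter(marks)
overall = next(
    rating
    for rating in sorted(counts, reverse=True)
    if counts[rating] / len(marks) >= 0.5
)

print(avg)      # 3.39 -> Highly Effective (3.2 - 4.0)
print(overall)  # 3.0  -> Effective (2.1 - 3.1)
```

The same 31 marks land in Highly Effective under the averaging rule and only Effective under the majority rule, which is exactly the gap the teacher discovered.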

Then I began to wonder, as has my union rep: how is it that I was more effective last year than I am this year? What am I NOT doing now that I did then? It turns out that I am, in fact, doing the same things. I was marked as doing the same exact components of behaviors this year as I was last year. The difference is, last year I was rated as innovating more times. So, for example, let’s say Behavior A has 6 components. Last year when I was checked off as meeting all 6, I was deemed innovating. This year those 6 components checked off are only earning me applying. WHY? HOW? Fortunately, I am friends with a number of people who have some real answers.

The answer isn’t pretty, but it’s been corroborated by more than one source. Here we go…

The principals and assistant principals were told that they were *giving out too many innovatings and that they needed to mark innovating less often*. In other words, the evaluation that is supposed to determine our level of teaching, which in turn determines our merit pay (no, we don’t really get merit pay. we’re supposed to, but that’s a whole different – let’s lie to the people of Florida – nightmare) is being manipulated by the powers that be in an effort to…I don’t know…make it seem like teachers aren’t as good as we are. So they can pay us less and blame us more. The powers that be are doing to the teachers what the high stakes testers are doing to our students: creating a system that is skewed to failure (or mediocrity).

I am outraged. Mostly I am outraged that principals and assistant principals, who know this is wrong and are being asked to downgrade their own teachers, are going along with it and not fighting back. The people who told me are in those leadership positions. They know it’s wrong. But they pooh-pooh it, saying things like, “Everyone knows those evaluations aren’t right, so what does it matter? *I* know who’s great and doing a great job, so the evaluation doesn’t really matter between you and me and the kids.”

And I almost buy it. Except for this: my evaluation is public record. Any parent can go to the district and access my evaluation score. Most parents won’t, it is true. Only parents who have a beef with a teacher would do that, as it so happens. But that’s when the difference between effective and highly effective DOES matter. I AM highly effective. I know it. My administrators know it. My students know it. But when a parent who is already certain I am picking on his/her kid or that I don’t know how to teach his/her kid goes to access my records, they see I am Effective, not Highly Effective. It’s fuel for their fire, which I do not like.

And if I go to work in another district, all the hiring people will see is Effective. I don’t like it. Not one little bit.

In this post, Gary Rubinstein writes about his long association with Michael Johnston.

Today Johnston is known in Colorado as the state senator who wrote the most punitive, anti-teacher law in the nation. At present, Harvard students are protesting the invitation to Johnston to speak at commencement.

Gary knew him from Teach for America. He describes a young man who understood and cared about his students, who saw the obstacles they confronted, and who appreciated the hard work of veteran teachers.

But something happened to Michael Johnston between 2002 and 2010. The man Gary knew turned into an accountability hawk. He became a harsh critic of teachers.

For a time he was leading the test-and-punish parade, but the parade seems to be in disarray. It is no longer the leading edge but the rearguard.

Michael Johnston was invited to be the commencement speaker at the Harvard Graduate School of Education for 2014, but some students objected and called on the school to withdraw the invitation. That’s not likely to happen, nor should it. The students and graduates should have a chance to debate the issues, to debate the value of the Rhee-Duncan-Spellings style that has long been favored by the Harvard Graduate School of Education. Now is a good time to review the research on value-added measurement. Now is a good opportunity to ask Senator Johnston what happened to quash his youthful wisdom.

John Kuhn is the superintendent of the small Perrin-Whitt Independent School District in rural Texas. He is an eloquent speaker and supporter of public education. He has spoken at national events and recently published two new books. He knows that the schools suffer not only from budget cuts but also from Washington’s wildly unrealistic expectations. He knows it would be nice if every student were bound for college, but he also knows that expectation is unrealistic and turns success into failure.

This is a wonderful interview with the Texas Tribune. You will enjoy reading it.

This is the last Q&A:

“Trib+Edu: How has your life been different since 2011?

Kuhn: Not a whole lot different in terms of my day-to-day life. I still basically do what I’ve always done for a living and that is work in a rural public school and try to serve my community to the very best of my ability. I’ve been invited to give some speeches here and there and I’ve written a couple of books … I think speaking out like I did put me in a situation to where I’ve been educated in the political reality that affects local schools.

Previously, I just kind of accepted whatever rolled down from Washington, D.C., and whatever rolled down from Austin. I kind of thought the role of a teacher and educator was just to live with dumb policies. And I don’t think that anymore. I think now that I have a moral obligation to speak up and say, “Hey, this policy is dumb. It doesn’t work and this is what we’re seeing on the frontlines.”

I’m a fan of public education. I grew up in a little, rural Texas town where the public school was the center of what we did in town. There was no mayor’s office. It’s an unincorporated town and the school was the heart of the community. And I think, politically, we’ve kind of forgotten how important public schooling is in Texas.”

The studies of value-added measurement keep on coming, and the findings usually show what an utterly absurd idea it is to think that teacher quality can be judged by student test scores. In a just world, Arne Duncan would be held accountable for the stupid and harmful theories he has imposed on the nation’s public schools. The U.S. Department of Education has become a malignant force in American education. I cannot think of any time in our nation’s history when public schools and teachers were literally endangered by the mandates coming from Washington, D.C., where the leadership is wholly ignorant of federalism.

This story in Education Week summarizes the latest batch of studies of VAM. Some researchers, having made this their area of specialization, continue to prod in hopes of good news.

But look at this:

“In a study that appears in the current issue of the American Educational Research Journal, Noelle A. Paufler and Audrey Amrein-Beardsley, a doctoral candidate and an associate professor at Arizona State University, respectively, conclude that elementary school students are not randomly distributed into classrooms. That finding is significant because random distribution of students is a technical assumption that underlies some value-added models.

“Even when value-added models do account for nonrandom classroom assignment, they typically fail to consider behavior, personality, and other factors that profoundly influenced the classroom-assignment decisions of the 378 Arizona principals surveyed. That, too, can bias value-added results.

“Perhaps most provocative of all are the preliminary results of a study that uses value-added modeling to assess teacher effects on a trait they could not plausibly change, namely, their students’ heights. The results of that study, led by Marianne P. Bitler, an economics professor at the University of California, Irvine, have been presented at multiple academic conferences this year.
“The authors found that teachers’ one-year “effects” on student height were nearly as large as their effects upon reading and math. The researchers did not find any correlation between the “value” that teachers “added” to height and the value they added to reading and math. In addition, unlike the reading and math results, which demonstrated some consistency from one year to the next, the height outcomes were not stable over time. The authors suggested that the different properties of the two models offered “some comfort.” Nevertheless, they advised caution.”

So, let’s get this right: teachers’ effects on students’ height were nearly as large as their effect on reading and math.

Perhaps Arne can just arrange to have all teachers fired (except for TFA), close every school (except “no-excuses” charter schools), and turn around the whole country.

Valerie Strauss notes the growing number of studies that debunk the value of judging teacher quality by the rise or fall of test scores, and naturally she wondered what Secretary Arne Duncan thought about them.

There was the report of the American Statistical Association, which said:

“VAMs should be viewed within the context of quality improvement, which distinguishes aspects of quality that can be attributed to the system from those that can be attributed to individual teachers, teacher preparation programs, or schools. Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions. Ranking teachers by their VAM scores can have unintended consequences that reduce quality.”

Days ago, a new Gates-funded study found no correlation between “quality teaching and the appraisals teachers received.”

Another study by a team led by Marianne P. Bitler, an economist at the University of California, said that VAM ratings had about the same relationship to reading and math scores as to changes in a student’s height.

VAM is the centerpiece of Race to the Top.

Strauss called the U.S. Department of Education to ask whether Secretary Duncan was aware of the research and whether it had changed his views. The answers: yes, he was aware of the research; no, it had not changed his views.

Four years ago, I was in Colorado to discuss education policy. This was in the heady early days of Race to the Top (which Colorado did not win, despite its whole-hearted embrace of everything Arne Duncan wanted). On one occasion, I was scheduled to debate State Senator Michael Johnston, the darling of the “reform” crowd. Johnston had written a bill that was coming to a vote that very day. His bill made student test scores count for 50% of every educator’s evaluation. An effective evaluation, his bill decreed, required growth in student scores. Johnston called his bill something like “Great Schools, Great Educators.” Every bill these days must contain at least one impossible promise in its title.

As I said, we were supposed to debate in front of a packed room of civic leaders, maybe 80 or so people. I waited and waited. No Johnston. Finally, I got up and spoke my concerns in his absence. No sooner did I finish than the doors at the back of the room opened and out popped young Senator Johnston. I say young because he appeared to be about 25, though I think he was actually 32. He was then considered the leading voice of education reform in the Legislature, even though other members were retired, experienced educators. Senator Johnston had served two years in Teach for America, then was principal of a school for two years, then ran for state senate. And now he was rewriting the state’s education laws! Truly a whiz kid!

Since he did not hear me, he did not have to respond to anything I said. Instead, he spoke in glowing terms of his legislation. He had an almost mystical faith in the amazing results that would automatically materialize as soon as teachers and principals were evaluated by the academic growth of their students. He seemed to believe that the only source of low scores was the absence of incentives and sanctions for those unmotivated, possibly lazy educators. Everyone, it seemed, wanted to believe that he knew what he was talking about.

Now, we know it takes time to phase in new policies and practices. As Bill Gates famously said, “It will take a decade to know whether this stuff works.” What he meant by “this stuff,” I guess, is the idea that privatization and measuring teacher quality by student scores will make students better educated. My own view is that we should stop looking for the “secret sauce” because it is a chimera. Instead, we should do what we know works: reduced class sizes, early childhood education, family education, experienced teachers, healthy children, a full and rich curriculum, and the wraparound services that children need. But all that is complicated, not simple; our data-driven reformers like simple solutions, the bumper sticker ideas.

But surely we should see some positive movement in Colorado, don’t you think? And it should be cumulative, stronger every year as the “reforms” take hold.

The latest state scores from Colorado–which has been dominated by data-driven reformers for a decade–are unimpressive. Actually, the scores of third-graders, who have known nothing other than a testing culture, took a slight dip; in truth, they were flat.

Oh, well, maybe next year, we will see the miracle that Senator Johnston promised. Or the year after that.

Meanwhile Senator Johnston has been invited to be Alumni Commencement Speaker at Harvard Graduate School of Education, which has aroused some protest. This is allegedly a tribute to his great accomplishment in Colorado, where every year his promises grow more hollow. How many of the graduates at HGSE would want to work under Johnston’s law? Presumably, students at HGSE read research and know that VAM is Junk Science.

Laura Chapman says it is no improvement to substitute student growth in test scores for plain old test scores. Both reduce teaching and learning to multiple choice test questions.

She writes, in response to this post:

“Instead of judging schools solely by test scores, they might be judged–at least in part–by student growth.”
This is not an improvement of any kind, but the precise language from the Race to the Top legislation (see reference below).

In federal and state policy “student growth” is just a euphemism for a gain score from pre-test to post-test, or year-to-year. In other words, the term “growth” has been thoroughly corrupted to mean just another score, and preferably a score with properties that can be processed to produce a VAM–value added score. (See reference below on the new grammar…)

Do not be misled. The marketeers who promote “growth” as if it were some gold standard or “fair” measure for judging educational activity are engaging in a propaganda campaign. Participants include the USDE and its hired hands, who know that the term “growth” has a rich and elaborate semantic reach in education. They are cynically trying to cut away understandings of growth and development as teachers understand them for individual students: multifaceted and asynchronous (e.g., bright but socially awkward; a coordinated dancer, but not an athlete; enchanted with calligraphy but with terrible handwriting). To be sure, there are normative patterns for large numbers of students, but so-called “developmental levels” also mask all of the wondrous variability in students. Forget all that: the new meaning of “growth” is a gain or increase in a metric derived from a test.

A perfect example of the marketing effort on behalf of redefining “human growth” (as a difference in metrics) is the infamous “Oak Tree Analogy” (see reference below), which conveniently ignores the fact that students, unlike trees, have minds of their own.

I call this a cynical move because the oak tree analogy is framed to place teachers in the role of workers in a nursery in charge of providing the “nutrients” that are needed for trees to thrive. This frame, as Lakoff and Johnson remind us, taps a “nurturing parent” metaphor for teachers, and also the traditional role for women. The campaign to portray teachers as bad nurturers (lazy, soft, uncaring) is nowhere more evident than in the excessive use of “rigor” and “rigorous” as obligatory adjectives for almost everything bearing on “improvements” in education. See Lakoff, G., & Johnson, M. (2008). Metaphors we live by. University of Chicago Press.

Repeat: federal and state policy documents define “growth” as a gain from pre-test to post-test scores, or a gain in year-to-year scores. Such scores are used to radically simplify judgments about districts, schools, teachers, and students. The distorted views of education produced by aggrandizing tests and “metrics,” as if these referred to the actual complexities of human growth and development (perceptual, intellectual, social, physical, creative, aesthetic), are a fraud.

For federal language for “growth” see: Final Definitions 559751-52 Federal Register / Vol. 74, No. 221 / Wednesday, November 18, 2009 / Rules and Regulations DEPARTMENT OF EDUCATION 34 CFR Subtitle B, Chapter II [Docket ID ED–2009–OESE–0006]

RIN 1810–AB07 Race to the Top Fund AGENCY: Department of Education. Retrieved from http://www.gpo.gov/fdsys/pkg/FR-2009-11-18/pdf/E9-27426.pdf

For the false comparison of human development and oak tree “growth” see:

Value-Added Research Center. (2012). Teacher effectiveness initiative value-added training oak tree analogy. Madison: University of Wisconsin. Retrieved from http://varc.wceruw.org/tutorials/oak/index.htm

For the cynical promotion of a preferred “grammar” for education see:
Reform Support Network. (2012, December). Engaging educators: Toward a new grammar and framework for educator engagement. Author. Retrieved from http://www2.ed.gov/about/inits/ed/implementation-support-unit/tech-assist/engaging-educators.pdf

The centerpiece of Race to the Top is evaluating teachers by test scores. The students of good teachers, Arne Duncan and Barack Obama believe, get higher scores. If they have low scores, it is the fault of bad teachers. There was no evidence for their beliefs, other than the speculations of economists and statisticians. Real teachers never believed the theory, because they know that many factors affect test scores, not just teachers.

Thirty-five states and DC followed Duncan’s lead, even though his hunch lacked any evidence. Lyndsey Layton has a comprehensive article in today’s Washington Post, describing the latest study to disprove Duncan’s theory.

Spurred on by Duncan, many states now use test scores to determine tenure and compensation. Duncan recently said he wants to judge the quality of teacher education programs by the test scores of students taught by their graduates.

Secretary Duncan’s love affair with standardized testing is inexplicable. There can be no question that he has caused immense damage to children, teachers, and public education.