Archives for category: Teacher Evaluations

The New York Times’ editorial today about teacher evaluation was unusually odd. It sounded as though the writer knows there is no evidence to support using student test scores, but is trying to find a rationale for doing it anyway. There is literally not a single district one can point to and say, “It’s working here. Here is proof that using test scores to evaluate teachers produces excellence.”

The editorial claimed that Montgomery County’s much-admired Peer Assistance and Review program relies on test scores. It sounds like Cinderella’s ugly sister is trying to stuff her big foot into the glass slipper. Montgomery County turned down $12 million in Race to the Top funding to avoid using test scores to evaluate teachers.

Its peer assistance program works far better than the value-added test-based evaluations now adopted in many states and districts in which test scores count for as much as 50% of a teacher’s “grade.”

Carol Burris, who has been a leader in the fight against test-based evaluation in New York, shares her reaction to this odd editorial:

Today’s editorial in The New York Times [http://www.nytimes.com/2012/09/17/opinion/in-search-of-excellent-teaching.html?_r=1&ref=opinion] on teacher evaluation is just one more beat on the same broken drum. The Times seeks to distance the Chicago plan from other evaluation plans, which, with the exception of Montgomery County’s, are more like Chicago’s than not. Montgomery County’s longstanding plan does not use test scores for evaluation, and it focuses on teacher improvement, not sorting and dismissal.
 
The column bases its arguments on the same false assumptions that folks like Michelle Rhee have sold to the public. The first is that teacher evaluation is universally broken.  This assumption comes from the report, The Widget Effect, produced by Rhee’s group, the New Teacher Project.  It drew its conclusions from a few selected districts. Evaluation is not broken in Montgomery County and it is certainly not broken at my high school.  Many districts have sound evaluation systems that help teachers become more effective—they are not teacher dismissal machines but rather supervision models designed to improve instruction.
 
The second false assumption is that excellent teachers leave districts because they are not rewarded (translation: they do not receive merit pay). Again, there is no factual evidence to support this. Merit pay is neither effective nor desired by teachers; it is a gift of public funds at a time when schools can ill afford it.
 
The third false assumption is that as long as we decrease the percentage of the evaluation number derived from VAM scores, we can make it all work. The editorial uses IMPACT as an example. It attributes the Washington, DC, school district’s decision to decrease the percentage of VAM in evaluations to ‘teacher anxiety’. I find that remark, which reformers often use to describe teacher responses to these systems, to be both paternalistic and sexist. Teachers object to VAM because they know its limitations and flaws. It was never designed to evaluate individual teachers; it was designed by researchers to be a tool to assess systems and programs. Using VAM to evaluate teachers is akin to using Lysol as a mouthwash because it kills germs on your kitchen floor.
 
Here is an example of the limitations of the New York system. Teachers and principals of grades 4-8 were recently assigned “growth scores” by the State Education Department. The model SED used was a hybrid of a growth model and a VAM. The American Institutes for Research, which created the model, also produced a technical manual to explain the resulting scores. You can find that manual here: http://usny.nysed.gov/rttt/docs/nysed-2011-beta-growth-tech-report.pdf.
 
It is well worth a careful read.  AIR was remarkably candid explaining the limitations.  Here are some highlights:
 
• Although AIR preferred to use three years of prior scores as the baseline for growth, such data was available for Grades 6-8 only. Grades 4 and 5 had limited prior data, which was reflected in larger error, especially in Grade 4.
• There was no way to identify co-teachers or support teachers, and a little over half of all student scores in grades 4 and 5 were attributed to principals only, because they could not be correctly linked to teachers.
• The only covariates (predictor variables in the model) were ELL status, SWD status (with all disabilities, mild and severe, considered the same) and economic disadvantage.
• Race, ethnicity, class size, spending, attendance and a host of other variables that are known correlates of student performance were not included.
 
Perhaps the most important problems with the model are explained on pages 24 – 30. AIR clearly shows that as the percentage of students with disabilities and students of poverty in a class or school increases, the average teacher or principal growth score decreases. In short, the larger the share of such students, the more the teacher and principal are disadvantaged by the model. Regarding ELL students, the report indicates that some teachers are advantaged, while others are disadvantaged. This should come as no surprise: well-educated students from China and students from rural areas of El Salvador with interrupted education are both classified as ELL, but their growth, as measured by test scores, is quite different.
 
Likewise, in this model, teachers who have students whose prior test scores are higher are advantaged, while teachers whose students have lower prior achievement are disadvantaged. This phenomenon, known as peer effects, has been observed in the literature since the 1980s. It is a root cause of the widening of the test score gap among classes in tracked schools. It has also been found in school-to-school comparisons. In a study of Houston schools after Katrina, the schools that received a large share of high-performing students from New Orleans saw their original students’ scores rise, and those that received a large share of low-performing students from New Orleans saw their original students’ scores decrease.
 
Perhaps the best critique of the model comes from AIR itself. They conclude “the model selected to estimate growth scores for New York State represents a first effort to produce fair and accurate estimates of individual teacher and principal effectiveness based on a limited set of data” (p. 35). Not “our best attempt,” not even a “good first attempt,” but rather a “first effort.” And yet, across the state, teachers and principals have received scores telling them that they are ineffective in producing student learning growth.
 
I can assure those who believe that teachers are simply anxious that this is not something a Xanax will cure. Teachers and principals are smart and savvy; you are mistaking outrage for anxiety.

Michael Klonsky in Chicago disagrees with Pedro Noguera’s views in the Nation about the Chicago teachers’ strike. Here Klonsky sets the record straight:

Pedro Noguera claims that the CTU “has not been willing to acknowledge that more learning time and a clear and fair basis for judging teacher effectiveness are legitimate issues that must be addressed.”

I’m a big fan of Pedro, but his latest criticism of the union is not only ill-timed but dead wrong as well. The union doesn’t oppose “more learning time” for students, as Pedro claims. From the start, they supported the idea of a longer, better school day (see the Ward Room, http://www.nbcchicago.com/blogs/ward-room/CTU-Contract-Longer-School-Day-163588976.html), including more art, music, physical education and recess, similar to the school day at the private school where Rahm and board member Penny Pritzker send their children.

The union’s approach to a longer school day moves well beyond and improves upon the mayor’s top-down imposition of more seat time on teachers, students and parents. It is true that the union has opposed the idea of a longer school day and year without any added compensation for teachers, as mandated by the board.

Pedro’s other poke at the CTU for supposedly not offering an alternative approach to improving “teacher effectiveness” is also misleading. The union, with research support coming from the CReATE group of researchers, has put forth important ideas for transforming the current inadequate evaluation system (See CReATE member Isabel Nunez’ commentary in the Sept. 12 Sun-Times http://www.suntimes.com/news/otherviews/15107882-452/standardized-test-scores-are-worst-way-to-evaluate-teachers.html).

What makes Pedro’s criticism so unfair, particularly at this time, is that the union has taken on both the more-seat-time issue and new approaches to teacher evaluation at great risk during the current contract negotiations. Perhaps he isn’t aware that since the passage of Senate Bill 7, Chicago teachers are legally barred from negotiating over anything except wage/benefit issues.

Pedro would do well to read the union’s excellent document, “The Schools Chicago’s Students Deserve” to better understand where the CTU is coming from. The report can be found at http://www.ctunet.com/blog/text/SCSD_Report-02-16-2012-1.pdf

Amanda Ripley, who usually writes pro-corporate reform articles in TIME, has an article in the Wall Street Journal about how teachers in other nations embrace “reform.”

Her first example is Finland.

That is a curious example for a devotee of today’s carrot-and-stick reforms because Finland would never permit a teacher with five weeks of training to teach. As she notes, they must complete a rigorous four-year college program PLUS a master’s degree. There is no “Teach for Finland.”

Furthermore, there is NO standardized testing in Finland. Ripley doesn’t mention that.

And teachers are not evaluated by the test scores of their students, because there are no student scores.

Also, as I saw when I visited several schools in Finland, the classes are small. About 15-19 in elementary schools, in the low-to-mid 20s in the other grades. And the elementary schools are saturated with services for children that need extra help.

What’s the lesson? How can we get to be more like Finland? After all, Finland borrowed most of its pedagogical philosophy from John Dewey.

LIFE & CULTURE
September 14, 2012, 6:29 p.m. ET
Training Teachers to Embrace Reform

Chicago-style war with unions is the past. Here’s how Finland and Ontario found a new way forward

By AMANDA RIPLEY

Making sense of the Chicago teachers’ strike (where the two sides were reportedly moving toward resolution on Friday) is like trying to understand the failure of a friend’s marriage. You can’t help speculating about who’s to blame, but you’ll never really know. In truth, it doesn’t matter. Many countries have revolutionized their education systems in recent years, but not one of them has done it through strikes, walkouts or righteous indignation.

Just about every country in the developed world has a teachers’ union, so the mere presence of a union doesn’t determine the quality of a country’s schools. There is, however, a significant relationship between the professionalism of the union and the health of an education system. The all-important issue is not how easy it is to fire the worst teachers; it’s how to elevate the entire craft without going to war with teachers.

[Photo caption: Striking Chicago public school teachers picket on Friday outside Whitney M. Young Magnet High School in Chicago.]

That’s where other countries can show us a better way. Working with unions doesn’t mean turning into Mexico, where the education system has been gifted to the union in exchange for political favors—and teenagers perform at the bottom of the world in math and reading. In a few countries, politicians and union leaders have managed not only to raise expectations but to get teachers to drink from the same punch bowl as reformers.

In Finland in the 1970s, teachers had to use special diaries to record what they taught each hour. Government inspectors made sure that a rigorous national curriculum was being followed. Teachers and principals weren’t trusted to act on their own.

At the same time, however, the government began to inject professionalism into the system. The Finns shut down the middling teacher-training schools that dotted the rural landscape and moved teacher preparation into the elite universities, where only the top echelon of high-school graduates could study (something the U.S. has never attempted). Opponents said the changes were elitist, but the reformers insisted that the country had to invest in education to survive economically. Once teachers-to-be got into the universities, they were required to master their subject matter and to spend long stretches practicing in high-performing public schools.

In the 1980s and ’90s, with higher standards and more rigorous teacher training in place, the reformers injected trust. They lifted mandates and asked the teachers themselves to design a new, smarter national curriculum. Today, Finland’s teenagers score at the top of the world on international tests.

If Finland feels too remote to serve as a model for the U.S., consider Ontario, Canada. After years of labor strife in the 1990s, a new provincial premier was elected in 2003. Dalton McGuinty chose Gerard Kennedy, a critic of the old regime, as his education minister. He spent months in school cafeterias, principals’ offices and parent meetings before the negotiations began. “You couldn’t wait until you were at the bargaining table,” explains Benjamin Levin, the former deputy minister. When it came time to negotiate a new teachers’ contract in 2005, Mr. Kennedy harangued the bargainers and kept them at the table all night on more than one occasion—deflecting the distractions that normally dominate such talks—until he finally got an agreement.

The plan that emerged put pressure on Ontario’s schools to improve results and also offered more help to educators. This worked in part because Canada already had fairly rigorous and selective education colleges, so teachers had the skills to adapt to these changes. And by giving in to teachers’ requests for smaller elementary-class sizes, politicians bought themselves enormous good will.

The system in Ontario became “a virtuous circle,” says Marc Tucker, author of “Surpassing Shanghai,” a book about top-performing education systems. “When the young people came out of their training programs, they were damn good teachers. Because of that, they were able to raise public and political confidence—and when that happened, it made it possible for them to get higher salaries and even higher quality recruits into teaching.”

For the past decade, there has been a détente in labor relations in Ontario. Despite a diverse population of students, a quarter of whom were immigrants, the province’s high-school graduation rate rose from 68% to 82%. Teacher turnover also declined dramatically. In 2009, Ontario was one of the few places in the world (aside from Finland) where 15-year-olds scored very high on international tests regardless of their socioeconomic background.

Interestingly, Ontario had its own labor flare-up this week—over a proposed wage freeze and a law that could limit strikes. But coming after years of relative harmony, the response has been reasonable so far. The union urged members to temporarily stop coaching sports and limit other voluntary activities. The situation could deteriorate, but for now, the tone in Ontario is revealing.

What happened in Chicago is about more than just Chicago. It’s about the deeper problem of transforming America’s schools. For too long our education reformers have tried to create a professional teaching corps from the top down, and union leaders have fought to maintain an untenable system. Both sides need to enter the 21st century.

—Ms. Ripley is an Emerson Fellow at the New America Foundation and the author of a forthcoming book about life in the smartest countries in the world.

Gary Rubinstein is an extraordinary math teacher who has a terrific blog.

His analysis of New York City’s teacher data reports shows that they are inaccurate, unreliable, and meaningless.

Any district or state official who is considering VAM should read Gary’s six posts.

If you do, you will discover there is no there there.

Please help this post go viral.

Every pundit–from Nick Kristof to David Brooks to the editorial writers–should read this analysis.

One of our readers got his score from the state education department. He is in a state of shock and rage:

Today I’m angry, disgusted, demoralized, and frustrated. I am also firmly resolved to fight back against the tsunami of junk ideology that all good educators face these days.

I received my ‘growth score’ today from the New York State Education Department.

I know, I really shouldn’t care what my score is. I know 100% of my students tested at or above grade level in Math and English Language Arts. I know my class’ scores were near or at the very top of my district’s scores. I know my district is also at or nearly at the top of the region’s and state’s scores. I know I work my heart out and push my students to excel. My students always, ALWAYS succeed.

Yet according to the NYSED my growth score is so-so. I’m rated effective with a growth score of 14 out of 20. Keep in mind, my students’ mean scale score in math is 708.4 and in ELA it is 678. I’m confident both scores are well above the state mean.

So why did I get a mediocre growth score?

The state’s explanation of its calculation should be an eye-opener for all of us. Check out this junk math.

See the state’s guide: Teachers_Guide_to_Interpreting_Your_Growth_Score.pdf

Here it is in a nutshell:

They compare your students with similar students and measure how your students do compared to these similar students. You are then graded based on how much better or how much ‘poorer’ your students did than these other students. They look for the gap between your students and the representative group of similar students.

Here’s the flaw:

If the students in the representative sample all do well, your ratings will be negatively affected, because your growth is based only on how much better your students did than the group. In other words, they look for a gap between your students and the group.

We all know that this year scores went up for everyone, so as they rise, individual teachers get lower ratings, because the gap doesn’t increase. Sounds nuts, doesn’t it? Goes against all the jargon about closing the gap.
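
To make that point concrete, here is a deliberately simplified sketch of the gap logic as described above. It is not the state's actual statistical model, which is far more involved. The function name `gap_score` and every number in it are invented for illustration. It only shows that when a class and its comparison group rise by the same amount, the measured gap does not move.

```python
from statistics import mean


def gap_score(class_scores, similar_scores):
    """Average gap between each student and the mean of that student's
    comparison group of 'similar' students (a hypothetical, simplified measure)."""
    return mean(s - mean(peers) for s, peers in zip(class_scores, similar_scores))


# Hypothetical scale scores, invented for illustration only.
last_year_class = [690, 700, 710]
last_year_peers = [[680, 690], [690, 700], [700, 710]]

# Assume every student, in the class and in the comparison groups,
# scores 10 points higher this year.
rise = 10
this_year_class = [s + rise for s in last_year_class]
this_year_peers = [[p + rise for p in peers] for peers in last_year_peers]

gap_before = gap_score(last_year_class, last_year_peers)
gap_after = gap_score(this_year_class, this_year_peers)

# Every raw score went up, yet the gap is identical in both years.
print(gap_before, gap_after)
```

Under these assumptions, a teacher whose students all improved is credited with no more "growth" than before, because the comparison group improved by the same amount.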

It gets worse if you happen to have some high-performing students in your class as well. Not much room for growth if you’re near the top, and your group is near the top. It’s to a teacher’s advantage, then, not to take those high-performing kids; they will hurt the teacher’s growth score.

My students did great; it’s a shame that NYS thinks they did so-so. Perhaps, if my students understood pineapples and hare races a little better, they could have correctly answered just 1 more question in that 6-hour marathon of testing, and all would be well.

We have a choice: we could start practicing saying “Welcome to Walmart” for our next career, or fight back. What say you?

http://rlratto.wordpress.com/2012/09/05/growth-scores-a-formula-for-failure/

Two years ago, the Economic Policy Institute drafted a joint statement by a group of prominent scholars of education and assessment.

Well before the current crisis over value-added assessment, this ad hoc group warned that there were many reasons to doubt the value of test-based evaluation.

Since that report was published there have been many more demonstrations of the invalidity of student tests as a measure of teaching quality.

In time, it will be clear to everyone who cares about education that this is not a good way to judge teacher quality.

Please read this article.

Eric Zorn of the Chicago Tribune has taken the time to read research.

This is especially important because the Trib has been hostile to the CTU strike.

I am especially pleased that he read Gary Rubinstein’s careful dissection of VAM in New York City.

Gary’s posts should go viral.

He shows that VAM doesn’t work.

It is meaningless.

Richard Rothstein explains that VAM is an unproven methodology with negative consequences for the quality of education.

Rothstein says he is not surprised that Chicago teachers oppose its use. He wonders why other teachers have not gone on strike for the same reason.

It has not worked anywhere.

It narrows the curriculum.

It relies too heavily on tests that were not designed to measure teacher quality.

The teachers are being used as guinea pigs for an unproven methodology that will harm education.

Marcus Winters recommends using value-added assessment to get rid of “ineffective” teachers. His paper was published by the conservative Manhattan Institute, which regularly issues his and others’ critiques of unions, tenure, seniority and any kind of job protection for teachers.

Many studies–and practical experience–have demonstrated that value-added assessment is unstable, unreliable and inaccurate. A teacher with a high rating may have a low rating the next year. The National Academy of Education and the American Educational Research Association published a joint paper warning that VAA says more about which students are assigned to the class than about teacher quality.

And then there is the problem that there is no district that has been able to demonstrate that VAA actually identifies ineffective or effective teachers. When New York City published its value-added ratings, the allegedly “worst” teacher in the city taught immigrant students who cycled in and out of her class as they learned English. As Linda Darling-Hammond and others have warned, value-added assessment will encourage teachers to shun the students with the highest needs and gifted students. Neither will produce the expected gains.

It is interesting and curious that Arne Duncan’s favorite innovation happens to be the favorite solution of the right to find and fire “ineffective” teachers.

I will never understand why the rightwingers are so devoted to high-stakes testing, which is known to produce narrowing of the curriculum, teaching to the test, gaming the system, and cheating. What’s to like? Maybe they like it because it gives them a club with which to bash teachers, their unions, and public education.

The Los Angeles Times printed a thoughtful editorial about the teachers’ strike and about evaluating teachers by student test scores.

These days it is unusual to find an editorial or opinion column asking whether the tests were designed to measure teacher quality. They were not. Frankly, the test publishers ought to be yelling bloody murder about the inappropriate use of the tests, but they are making so much money that it’s hard to hear their complaints or to expect them.

I wish more writers would look at the research about the inaccuracy and instability of value-added assessment. I wish they would think a bit about how this high-stakes testing invariably leads to teaching to the test, narrowing the curriculum, score inflation, and cheating.

The one thing it does not produce is good education. If it did, we would see it in all the best private schools. But not a single one of them uses value-added assessment or even standardized tests. That would insult the intelligence of their teachers.