Carol Burris and I Dissect A Bizarre New York Times Editorial

The New York Times’ editorial today about teacher evaluation was unusually odd. It sounded as though the writer knows there is no evidence to support using student test scores, but is trying to find a rationale for doing it anyway. There is literally not a single district one can point to and say, “It’s working here. Here is proof that using test scores to evaluate teachers produces excellence.”

The editorial claimed that Montgomery County’s much-admired Peer Assistance and Review program relies on test scores. It sounds like Cinderella’s ugly sister is trying to stuff her big foot into the glass slipper. Montgomery County turned down $12 million in Race to the Top funding to avoid using test scores to evaluate teachers.

Its peer assistance program works far better than the value-added test-based evaluations now adopted in many states and districts in which test scores count for as much as 50% of a teacher’s “grade.”

Carol Burris, who has been a leader in the fight against test-based evaluation in New York, shares her reaction to this odd editorial:

Today’s editorial in The New York Times [http://www.nytimes.com/2012/09/17/opinion/in-search-of-excellent-teaching.html?_r=1&ref=opinion] on teacher evaluation is just one more beat on the same broken drum. The Times seeks to distance the Chicago plan from other evaluation plans, which, with the exception of Montgomery County’s, are more like Chicago’s than not. Montgomery County’s longstanding plan does not use test scores for evaluation, and it focuses on teacher improvement, not sorting and dismissal.

The column bases its arguments on the same false assumptions that folks like Michelle Rhee have sold to the public. The first is that teacher evaluation is universally broken. This assumption comes from the report, The Widget Effect, produced by Rhee’s group, the New Teacher Project. It drew its conclusions from a few selected districts. Evaluation is not broken in Montgomery County and it is certainly not broken at my high school. Many districts have sound evaluation systems that help teachers become more effective—they are not teacher dismissal machines but rather supervision models designed to improve instruction.

The second false assumption is that excellent teachers leave districts because they are not rewarded (translation: they do not receive merit pay). Again, there is no factual evidence to support this. Merit pay is neither effective nor desired by teachers; it is a gift of public funds at a time when schools can ill afford it.

The third false assumption is that as long as we decrease the percentage of the evaluation number derived from value-added model (VAM) scores, we can make it all work. The editorial uses IMPACT as an example. It attributes the Washington, D.C., school district’s decision to decrease the percentage of VAM in evaluations to ‘teacher anxiety.’ I find that remark, which reformers often use to describe teacher responses to these systems, to be both paternalistic and sexist. Teachers object to VAM because they know its limitations and flaws. It was never designed to evaluate individual teachers; researchers designed it as a tool to assess systems and programs. Using VAM to evaluate teachers is akin to using Lysol as a mouthwash because it kills germs on your kitchen floor.

Here is an example of the limitations of the New York system. Teachers and principals of grades 4-8 were recently assigned “growth scores” by the State Education Department (SED). The model SED used was a hybrid of a growth model and a VAM model. The American Institutes for Research (AIR), which created the model, also produced a technical manual to explain the resulting scores. You can find that manual here: http://usny.nysed.gov/rttt/docs/nysed-2011-beta-growth-tech-report.pdf.

It is well worth a careful read. AIR was remarkably candid explaining the limitations. Here are some highlights:

• Although AIR preferred to use three years of prior scores as the baseline for growth, such data were available only for Grades 6-8. Grades 4 and 5 had limited prior data, which was reflected in larger errors, especially in Grade 4.
• There was no way to identify co-teachers or support teachers, and a little over half of all student scores in grades 4 and 5 were attributed to principals only, because they could not be correctly linked to teachers.
• The only covariates (predictor variables in the model) were English language learner (ELL) status, students with disabilities (SWD) status (with all disabilities, mild and severe, treated the same), and economic disadvantage.
• Race, ethnicity, class size, spending, attendance, and a host of other variables that are known correlates of student performance were not included.

Perhaps the most important problems with the model are explained on pages 24-30. AIR clearly shows that as the percentage of students with disabilities and students in poverty in a class or school increases, the average teacher or principal growth score decreases. In short, the larger the share of such students, the more the teacher and principal are disadvantaged by the model. Regarding ELL students, the report indicates that some teachers are advantaged, while others are disadvantaged. This should come as no surprise: well-educated students from China and students from rural areas of El Salvador with interrupted education are both classified as ELL, but their growth, as measured by test scores, is quite different.

Likewise, in this model, teachers whose students have higher prior test scores are advantaged, while teachers whose students have lower prior achievement are disadvantaged. This phenomenon, known as peer effects, has been observed in the literature since the 1980s. It is a root cause of the widening of the test score gap among classes in tracked schools. It has also been found in school-to-school comparisons. In a study of Houston schools after Hurricane Katrina, the schools that received a large share of high-performing students from New Orleans saw their original students’ scores rise, while those that received a large share of low-performing students from New Orleans saw their original students’ scores decrease.

Perhaps the best critique of the model comes from AIR itself. They conclude that “the model selected to estimate growth scores for New York State represents a first effort to produce fair and accurate estimates of individual teacher and principal effectiveness based on a limited set of data” (p. 35). Not “our best attempt,” not even “a good first attempt,” but rather “a first effort.” And yet, across the state, teachers and principals have received scores telling them that they are ineffective in producing student learning growth.

I can assure those who believe that teachers are simply anxious that this is not something Xanax will cure. Teachers and principals are smart and savvy; you are mistaking outrage for anxiety.

lellingw says:

September 17, 2012 at 8:28 pm

There is never a discussion about how people learn best and teach best when anxiety and pressure are low.

Alan says:

September 17, 2012 at 8:48 pm

I certainly find that to be the case in my classes, and I tend to do much better than average with students who have documented anxiety issues. Their numbers, by the way, are steadily rising, according to our guidance counselors.

Linda says:

September 17, 2012 at 8:29 pm

I canceled my weekend subscription to the NY Times. I am not spending my money to read that anti-public school, anti-teacher rag. Done with them!

readingexchange says:

September 17, 2012 at 8:40 pm

“It sounds like Cinderella’s ugly sister is trying to stuff her big foot into the glass slipper.”

Love it …this is really rich, Diane. You nailed it!!!!!!!!!

Duane Swacker says:

September 17, 2012 at 8:54 pm

Thank you, Carol and Diane, for the work and effort, but also for the link to the VAM manual itself. This is the type of material evidence that needs to be out there. And it is what Wilson has done with educational standards and standardized testing: using the proponents’ own literature to show the flaws, errors, and invalidities.

Duane Swacker says:

September 17, 2012 at 8:56 pm

In much the same way, if all test givers were able to read the standardized test questions, the many errors and mistakes could be pointed out. But no, can’t have that happen. Crappy product, crappy results.

- ReTiredbutMisstheKids says:
  
  September 18, 2012 at 3:25 am
  
  Yes! The New York Times Editorial Board needs to read Diane’s post “House of Cards.” Making the very lives and livelihoods of teachers (and the lives of students, since “a teacher’s working conditions are a student’s learning conditions”) dependent on greatly flawed tests makes sense… NONsense!
  
Patrice says:

September 17, 2012 at 8:57 pm

NYT is desperately trying to carry the Obama ed reform agenda. They are not going to knock teacher evaluations, since RTTT money depends on districts buying this. Follow the money. Gates, Pearson, Obama, Duncan, Emanuel… sad. Just sad.

moosesnsquirrels says:

September 17, 2012 at 9:12 pm

I suggest that we create similar VAM tests for doctors, lawyers, and politicians and start ranking them according to their contributions. Oh, and reporters and news editors too. That will solve this problem real soon!

dianerav says:

September 17, 2012 at 9:25 pm

As Leona Helmsley, widow of a real estate baron, once said, “Taxes are for the little people.” You know who gets VAM. Not doctors, lawyers, pols.

Diane Ravitch

Catherine says:

September 17, 2012 at 9:24 pm

I would like to share a piece I heard on NPR this morning.
http://www.npr.org/blogs/health/2012/09/17/161159263/teachers-expectations-can-influence-how-students-perform

This might be another irony in the long saga of students, teachers, and standardized tests. Most teachers now have access to detailed information about a student’s performance on each state’s standardized test. Here, it is the FCAT. Suppose a student did poorly on the test. Maybe they broke their glasses (a true story) and their mother didn’t have money to buy a new pair until after the exam. Maybe they speak Jamaican patois at home (as many of my students do), but when they entered school, the ELL coordinator interviewed them and asked their parents whether English was their first language; if the parents said yes, the student would not get any services, such as extended time. Maybe they disliked their teacher that year, or their parents divorced, or their dad or mother went to prison. A low score would increase the likelihood that their next year’s teacher would have lower expectations, creating a vicious cycle. Just something to think about.

Duane Swacker says:

September 17, 2012 at 9:40 pm

I heard that radio report too, just as I was pulling into a parking spot at school. From what I heard about the study, my first thought was that the study would never get approval these days: too much deception, too much reliance on a “new” IQ test (and we all know about the problems of IQ tests, especially those from almost 50 years ago), too much potential harm to the students. I haven’t read the study, but I’d bet a dime on a dollar that it has been thoroughly debunked by now. It is another one of those “Oh, if the teacher has high expectations, all students can succeed” platitude-type reports, as if teachers don’t have high expectations now, that is, until the student disabuses us of that notion (and not that the teacher even then “gives up” on the student; the teacher will keep trying to disabuse the student of the notion that they can’t or don’t want to learn).

A nun at my Catholic grade school (mind you, this was 1966-67, fairly soon after that study) announced, “You are all ‘A’ students in my mind.” We all just laughed, as we knew it wasn’t true (not for the right reasons, but we still knew it wasn’t true). We played with that one for quite a while, as only “middle schoolers” (even though that term wasn’t around then in the Catholic system) can.

- Confused says:
  
  September 18, 2012 at 5:50 am
  
  Duane,
  
  Just like when my principal says, “You know you are all the best staff I have ever had.” I’d rather have no praise than faux praise.
  My students just took their baseline online test so that my teaching can be evaluated for specific student learning targets. In grading the open-response questions with the rubric provided, I was reading mostly trash: “IDK” written in every box over and over, or “This is stupid.” I just got sooooo angry that my “value” will be based on middle school students’ hormonal levels, frontal lobe function (or lack thereof), and blood sugar level (after no breakfast, or after three packages of Pop-Tarts and a Monster Energy drink, and I have them after 10 a.m. when lunch isn’t until 12:30).
  
- kafkateach says:
  
  September 18, 2012 at 6:39 am
  
  Confused and I must have the same principal. At every staff meeting we are told that we are “the best staff in the world.” This line must come out of some principal playbook. Funny how, at the end of the year, when she had a chance to rate us on a scale of 1-50 for the first time, only a few members of her inner circle received the precious 50. Apparently, most of us are not “the best staff in the world.”
  
- Duane Swacker says:
  
  September 18, 2012 at 8:23 am
  
  Confused,
  
  Boy, am I happy that this testing/VAM crap (and that’s putting it nicely) hasn’t reached me yet (I teach HS Spanish). I do believe, though, that next year some form of VAM will be thrust upon us. So not only are the concerns you raise valid, but I will be “graded” on imaginary students that I don’t even teach. It’s going to be a battle, as I will refuse to play along with the VAM nonsense.
  
  Good thing I only have a few years left (need to contact our state retirement system to find out just what I need to do).
  
dianerav says:

September 17, 2012 at 9:51 pm

Back in 1967, there was a book that claimed that student performance would soar if teachers had high expectations. There was an experiment in which teachers were given confidential information that certain children were incredibly brilliant. According to the study, the teachers believed it–and consequently the students performed brilliantly.

Unfortunately, no one was able to replicate the findings. But the idea lived on.

Diane Ravitch

- readingexchange says:
  
  September 18, 2012 at 9:13 pm
  
  “The Rosenthal Effect”… very interesting study.
  http://psych.wisc.edu/braun/281/Intelligence/LabellingEffects.htm
  
Alan says:

September 18, 2012 at 7:40 am

I heard another NPR story this morning, the gist of which was that in Massachusetts the 10th graders did well on their standardized test. This was presented as proof that ed “reforms” were working. Actually, I suspect the sophomores did well because of a push to improve and expand MA preschool programs about a decade ago. The story also reported that younger elementary students did poorly on their test; THIS is more likely connected to recent “reform” efforts.

leonie haimson (@leoniehaimson) says:

September 17, 2012 at 9:41 pm

Also you should know that the growth scores do not differentiate free and reduced lunch students whose expected outcomes are very different.

Alan says:

September 18, 2012 at 7:43 am

In one area turnaround school, the reformy “Special Master” has made every student in the school eligible for free lunch; Ta-Dah – better test scores and closing achievement gap!

Jon Awbrey says:

September 17, 2012 at 10:36 pm

Corporate owned media organizations have a conflict of interest that prevents them from reporting the facts about public education. They are nothing more than another party of scavengers waiting for the major predators to finish taking their share of the kill.

kafkateach says:

September 18, 2012 at 2:36 am

I’m glad I’m not the only person to find that NY Times editorial to be a hodgepodge of contradictions. They state, “At their best, these evaluation systems are based on the idea that teaching is difficult to master and that high-performers tend to get that way through intensive feedback and help from colleagues.” I’ve got news for the NY Times: if you are basing 50% of my evaluation on student test scores ranked against my peers, I’m not helping anyone. If they want to bring Wall Street to Sesame Street, don’t expect teachers to play nicey-nice.

Carol burris says:

September 18, 2012 at 8:39 am

Since the press is now looking more critically at VAM because of Chicago, I think The Times, which has supported these eval systems, was arguing: Chicago bad, other evals good. Can you imagine what they would say if drug companies brought lots of new drugs to market without testing?

Carol says:

September 18, 2012 at 1:50 pm

It was also shocking that the editorial included parent surveys as one of the factors used to assess teachers. The vast majority of teachers my child had were great to good. But there were some that never should have entered a classroom. The active parents know which small percentage of teachers in a school are the lemons.

carolcorbettburris says:

September 18, 2012 at 6:59 pm

Even more shocking is that Chicago is proposing using the ACT as the growth measurement. You cannot use that test to measure the effect of an individual teacher. That test, like the SAT, measures general intelligence and is highly correlated with the education level of the mother. It is not a measure of the teaching of high school curriculum.

Diane Ravitch's Blog
