Gary Rubinstein: Gates’ MET Study Is Wrong

January 9, 2013

Gary Rubinstein took a close look at the new Gates study of teacher evaluation and says it is wrong. The media takeaway is that in evaluating teachers, test scores are more reliable than observations. But Gary, who teaches mathematics at Stuyvesant High School in New York City, says it isn’t so.

Bill Gates has put $50 million into finding the ideal way to evaluate teachers.

Gary concludes: “It seems like the point of this ‘research’ is to simply ‘prove’ that Gates was right about what he expected to be true. He hired some pretty famous economists, people who certainly know enough about math to know that their conclusions are invalid.”

Categories Gates Foundation, Bill Gates, Teacher Evaluations

30 Comments

Marge Borchert says:

January 9, 2013 at 7:00 am

Oops!! How dare a math teacher be able to figure that out!!!

Marge

Robin Johnston says:

January 9, 2013 at 8:24 am

And the scary truth is that it doesn’t take a math teacher to understand why student test scores aren’t the best measures of teacher quality. A little common sense will do.

When I was an administrative intern, we had a math teacher whose students got great results, especially her geometry students. The principal attributed this to her good teaching, but the geometry class consisted of 7 students who took geometry online. I substituted for the teacher a few times, as did several other teachers, because the students didn’t really need a teacher: they were the most advanced students in the middle school and simply learned the material independently online. I also observed her classes on occasion. It only took a little OBSERVATION and common sense to figure out that the scores had nothing to do with the teacher. Without that countervailing observation, the test scores would have been most misleading.

In fact, understanding why a student’s score on a test might not be as good as it should be is one of the marks of a good teacher, as is not taking credit for a good score when you know your teaching did not cause it.

Common sense also tells us that unless there is a pre-test, a post-test is not representative of what students learned in the class, and that unless the test covers most of what the teacher taught, it is likewise not representative. I have yet to see standardized tests that meet either of these criteria. I wrote a post about how this plays out in NC here: http://robinwilsonjohnston.edublogs.org/2012/11/26/data-central/

jcgrim says:

January 9, 2013 at 8:38 am

Gary concludes: “It seems like the point of this ‘research’ is to simply ‘prove’ that Gates was right about what he expected to be true. He hired some pretty famous economists, people who certainly know enough about math to know that their conclusions are invalid.”
This is exactly the same tactic the Milken Foundation used to sell their TEAM/TAP teacher evaluation program to the press and politicians.

Sandy says:

January 9, 2013 at 8:39 am

The irony is beautiful to behold in this one. If I were Bill Gates, I’d be soooo angry and frustrated. When men get angry, they usually make things worse. Prepare for a double down effort by Gates. The beatings will continue until morale improves!

- Duane Swacker says:
  
  January 9, 2013 at 8:27 pm
  
  “When men get angry, they usually make things worse.”
  
  Oh you sexist, what’s the equivalent of misogynist-feminoginyst?
  
  Although as a male I agree with what you have stated.
  
  - Duane Swacker says:
    
    January 9, 2013 at 8:28 pm
    
    Oh, I know, the ol Limpballs favorite slur, feminazi. How could I have forgotten the ol Limpballs?
  - Sandy says:
    
    January 9, 2013 at 8:56 pm
    
    When women get angry, they get things done right. Men immediately go into “fix it” mode without patience and reflection. Speaking from experience.
  - Sandy says:
    
    January 9, 2013 at 8:58 pm
    
    Oh, pipe down. I know for a fact that Gates is pissed about the rejection and resistance to his save the education world debacle. You watch. He will do more of what doesn’t work before he acquiesces to reality. Matter of time.
  - Duane Swacker says:
    
    January 9, 2013 at 10:36 pm
    
    Sandy,
    I hope your “pipe down” comment wasn’t directed at me as I was agreeing with you in my own misogynist way.
    Duane
  - Sandy says:
    
    January 9, 2013 at 10:39 pm
    
    All in fun. No offense intended toward you and none taken here. Life is too short. The printed word is so easily misunderstood. One of the pitfalls of social media. Happy New Year!
  - Duane Swacker says:
    
    January 9, 2013 at 11:13 pm
    
    And Happy New Years to you also! One of the problems that I have with the written language is that “tone” is very hard to discern. As a second language teacher I am very sensitive to not only meaning but tone. And don’t get me wrong, I can be an obnoxious SOB in my writing.
  - Sandy says:
    
    January 9, 2013 at 11:19 pm
    
    When SOB is necessary, go for it. Always know your audience. Hope your year teaching is going well. Hard time to be in this profession. Stay strong, don’t let the bastards get ya down.
Galton says:

January 9, 2013 at 8:48 am

The study also found that TEACHING to THE test and test prep did have a POSITIVE EFFECT on SCORES. Even though reformers claim otherwise!
Like the Gates push for small schools, the is difficult to purchase.

- Galton says:
  
  January 9, 2013 at 8:49 am
  
  Oops, s/b
  Like with the Gates push for small schools, the truth is difficult to purchase.
  
- "2old2tch" says:
  
  January 9, 2013 at 9:22 pm
  
  Of course teaching to the test affects scores. Why else do we have SAT and ACT prep courses? I wonder why there isn’t a VAM score?
  
  - Duane Swacker says:
    
    January 9, 2013 at 10:40 pm
    
    I was sitting there today listening to our asst sup talk about the new MSIP 5 system under which our schools here in MO will be “graded,” and I was thinking, “I know I could tell you all how to game the system without requiring teachers to teach to the test.” But I know they don’t want to hear what I have to say, as they only know how to “play by the ‘rules’”.
Dufrense says:

January 9, 2013 at 9:33 am

It’s a safe assumption that some principals’ evaluating is prone to inflation simply because even with detailed evaluation rubrics, training, etc., observations have a degree of subjectivity. That doesn’t mean it’s a widespread problem or that principals who do their jobs well can’t evaluate teachers.

Ed reformers conveniently ignore the problems with VAM or assert that making test scores just one of “multiple measures” mitigates VAM’s limitations. “Hey, 100% of your high-stakes evaluation isn’t based on junk science, just 40-50%!”

If someone develops a valid, reliable model based on test scores, then I’m fine with including it as part of my evaluation. But VAM isn’t such a model.

When the TN Dept of Ed issued their review of TEAM over the summer, I was struck by their insistence that the disparity between the percentage of teachers scoring a 1 on their observation and the percentage with 1 on VAM resulted from inflated observation scores:

“In many cases, evaluators are telling teachers they exceed expectations in their observation feedback when in fact student outcomes paint a very different picture. This behavior skirts managerial responsibility and ensures that districts fail to align professional development for teachers in a way that focuses on the greatest areas of need” (p. 32).

“This disparity between student results and observations signifies an unequal application of the evaluation system throughout the state” (p. 32).

Page 33 mentions that Tennessee “leads the nation in available data on teacher performance and effectiveness” and that it possesses a “tremendous amount of student outcome data received through TVAAS.”

However, there’s no mention anywhere of considering the validity or reliability of the TVAAS-based data!

Although the report focuses on the disparity between the Level 1 scores, the chart on page 32 clearly indicates disparities at Level 4 and Level 5 as well:

Level 4
TVAAS 11.9%
Observation 53%

Level 5
TVAAS 31.9%
Observation 23.2%

If the issue is actually observation score inflation, why then does the department not cite Level 4, the level with the largest disparity? Or if the source of error is the observation, not the TVAAS, why not argue that more teachers should’ve received a 5 for their observation score?

I posed these questions to a member of the Dept of Ed and received nothing but equivocation and what amounted to essentially a “We’ll just have to agree to disagree” dodge.

teachingeconomist says:

January 9, 2013 at 9:44 am

I have no doubt that peer evaluation is the best way to evaluate teachers, but I am concerned that some view this as unworkable or even unethical.

- dianerav says:
  
  January 9, 2013 at 10:02 am
  
  I agree with you. This is how college faculty are evaluated. I see nothing unethical or unworkable about peer review. The obstacle is that many see teachers as grown-up children who can’t make responsible decisions.
  
  - Duane Swacker says:
    
    January 9, 2013 at 8:37 pm
    
    I completely agree with your last statement. However, I see a lot that is unethical about peer evaluations, at least in the system that we currently have.
- Duane Swacker says:
  
  January 9, 2013 at 8:35 pm
  
  Well, considering that we teachers are not paid the big bucks to evaluate each other (that’s the administrators, who obviously have not been doing the jobs they are paid to do, seeing as how there are soooo many “failing” teachers out there; ok, turn off snarkometer), I do see it as unworkable unless it is done in the fashion of all other “professions,” where “professional” interaction is seen as part and parcel of the profession and is not used for evaluative purposes.
  
  No, I am not paid to “evaluate” other teachers, I teach. I’m not in their rooms enough (if ever) to be able to begin to “evaluate” them. If you want me to “evaluate” other teachers pay me the big bucks, eh.
  
  - Duane Swacker says:
    
    January 9, 2013 at 8:39 pm
    
    In other words, why should I do the administrator’s job without getting proper compensation? Wouldn’t you agree, TE????
  - teachingeconomist says:
    
    January 9, 2013 at 11:01 pm
    
    I suppose it depends on how one views their job. In higher education, all faculty members do view evaluation of each other as part of the job. Faculty make the decisions about who is hired, who is promoted, and who is tenured. Faculty also evaluate each other for raises. It is seen as part of being a professional.
  - Sandy says:
    
    January 9, 2013 at 11:23 pm
    
    Teachers at a local Jesuit prep school here in the Valley are evaluated by peers, parents, admins and students. Not sure how productive it is. I wonder if it works?
  - Duane Swacker says:
    
    January 9, 2013 at 11:18 pm
    
    TE,
    How can you even begin to evaluate another professor if you are not in the classroom to see how they interact with the students? I understand peer review of written work, but how can you even begin to think about evaluating another professor without being in his/her (or, as is evidently the new term, “hir”) classroom? Please explain how this works if you would.
    Thanks,
    Duane
  - teachingeconomist says:
    
    January 9, 2013 at 11:35 pm
    
    Actually, it begins with the hiring decisions. Typically a department might have a couple of hundred applications for a position. A committee of faculty members will screen the applications down to about 20, and the search committee will interview these candidates at the national meetings (this procedure varies with field, so bear in mind that I am talking about economics here). These meetings in economics usually happen between Christmas and the beginning of the spring term. The committee members use the interviews, along with letters of recommendation and examples of the candidates’ work, to narrow the choice down to usually three or four on-campus candidates. These candidates are flown out to the campus at the campus’s expense, meet with faculty members, and give a talk to the faculty and graduate students. The faculty will vote to rank the candidates, and job offers typically follow the department’s ranking.
    
    Serving on a search committee is a burden, but it is seen as part of the job.
  - "2old2tch" says:
    
    January 9, 2013 at 11:45 pm
    
    TE, that has little to do with yearly evaluations done at the undergraduate level. I doubt you have administrators sitting in the back of your class filling out a checklist or rubric on a regular basis. We need to be careful about the way we translate each other’s worlds, and I include myself in that.
  - teachingeconomist says:
    
    January 10, 2013 at 12:07 am
    
    If by administrators you mean deans and provosts, they do no evaluation at all. Everything is done through the faculty: first at the department level, next at the college level, and finally at the university level.
  - Duane Swacker says:
    
    January 10, 2013 at 7:32 am
    
    TE,
    As 2 old pointed out, the university level is not the same as the public school level. I know a little (very little) about the process at your level, mainly from the collegemisery.com site (quite humorous at that). But again, how can someone evaluate another teacher if one hasn’t been in the classroom to see how they actually teach?
    Duane
  - teachingeconomist says:
    
    January 10, 2013 at 9:06 am
    
    Elementary school is not the same as high school, but we use the same administrative structure for both.