Audrey Amrein-Beardsley reports here on the trial over teacher evaluation in New Mexico.
She is testifying Monday, so she keeps her views to herself, but she quotes others.
This is a quote from an article written by another observer at the trial:
“Joel Boyd, [a highly respected] superintendent of the Santa Fe Public Schools, testified that ‘glaring errors’ have marred the state’s ratings of teachers in his district.” He said, “We should pause and get it right,” and testified that “the state agency has not proven itself capable of identifying either effective or ineffective teachers.” Last year, when Boyd challenged the rankings of his district’s 1,000 or so teachers, New Mexico’s Public Education Department (PED) “ultimately yielded and increased numerous individual teacher rankings…[which caused] the district’s overall rating [to improve] by 17 percentage points.”
State Senator Bill Soules, who is also a recently retired teacher, testified that “his last evaluation included data from 18 students he did not teach. ‘Who are those 18 students who I am being evaluated on?’ he asked the judge.”

OK, it’s good to point out legitimate errors in the system. I’m sure Dr. Amrein-Beardsley will find and report some in her testimony. Hopefully, they will improve the data collection techniques. But suppose that 15% of the teachers identified as ineffective were false positives and did not actually fall into the bottom 10%.
That still leaves 85% of the teachers identified as ineffective truly ineffective!!! Ultimately, schools exist to educate kids effectively. If you think courts are going to be moved to leave disadvantaged kids in the classrooms of that 85% just to spare the 15% who were mistakenly identified a more thorough review, you are out of your mind. And frankly, rather evil.
Virginia,
If children were manufactured like bars of iron, I would agree with you. But here’s what you need to know: every child is different!
Are you calling the American Statistical Association “evil”?
Here’s a statement I found that says it all.
“The pacing of the curriculum and the Learning Needs of students may not be aligned with the Timing of the assessments. This can limit the Usefulness of the data that result from the assessment!” That’s because students have different learning needs in every class and every subject.
Here’s evil: basing the evaluation of a job filled with soft skills on almost nothing but quantitative data. Teaching is very qualitative as well. I’ve posted about this in response to you numerous times.
Virginia, all you want is a system that makes a judgment without actually making a judgment. The computer spits out numbers from an algorithm that is unproven and can’t be explained by its users, and an inanimate response simply says, “Hey, you’re effective!” The human beings are absolved of all decision-making and of every qualitative factor: “It’s just what the numbers tell me.” I’m sorry, but quite frankly, that’s evil.
We have a teacher in our school who specializes in our hardest-to-educate populations. I can’t tell you his VAM, but I do know that our most socially and personally needy students flock to this guy like the Pied Piper. They sense how much he cares and love the way he mentors them when they need an adult to listen to them. He’s their greatest champion. If his VAM came in low and he got laid off, those kids would lose their most trusted adult in the building. Many would be deeply upset and lost. That would be pretty evil to those kids.
I’m not opposed to statistical analysis of my job. But I am opposed to it being the dominant factor that you routinely propose. There’s way more to this job than just getting kids to spit out test scores. Way more.
Steve K: well put.
Thank you for your comments.
😎
“18 students he did not teach.” Well, there’s one error.
I think all they’re testing is math and English. If all other subjects are collateral and not tested, math and English teachers had better be getting paid substantially more for having their feet held to the fire.
From a scientific measuring standpoint, it’s a bogus system, applying math and English scores to art, social studies, science, and PE teachers. That’s too peripheral, sketchy, nebulous, and threadbare to be anything but speculation.
Virginia, I hope that someday karma smacks you upside the head and you lose your job and career because of a metric over which you have no control. For good measure, I hope this metric is published, with your name next to it, in the local paper. As others have suggested, I will no longer interact with you, either.
I suppose a legitimate error could get you fired as well. And you would lick the boots of the person who fired you? No. First of all, there is nothing legitimate about the tests, let alone the use of their results.
West Coast Teacher, please see my comments in the Seth Sandrosky thread on the interests of teachers vs. students. I think many people, including Diane, disagree with you about whether the tests have any benefits (I’m sure Duane is overjoyed with you right now, though!). Many simply think you have to analyze trends over larger sample sizes and use the test results as a small part of a larger picture. I don’t really disagree with either statement, but I may disagree with the specifics. I think about 40 samples over multiple years are sufficient, and I think the results should be used for 25-50% of the evaluation. The biggest issue in NY is that SGPs are currently being used for 70%+ of evaluations, not the 50% being touted. That’s because teachers get the full spread of SGP points (1-20), but in practice only about 60% of the spread of observation points (say, 8-20). That effectively increases the weight of the SGPs. Nobody seems to be making that legitimate argument. I would support you on that.
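The weighting argument above can be checked with a little arithmetic. This is a hypothetical sketch, not NY's actual scoring rules: it assumes the SGP and observation components are independent and spread evenly over the point ranges quoted in the comment (1-20 for SGPs, roughly 8-20 for observations in practice). The function name `effective_sgp_share` is made up for illustration.

```python
def effective_sgp_share(sgp_range, obs_range):
    """Return the SGP component's share of the composite's spread.

    Hypothetical model: both components are independent and uniformly
    distributed over their ranges, so each variance is span**2 / 12.
    """
    sgp_span = sgp_range[1] - sgp_range[0]
    obs_span = obs_range[1] - obs_range[0]
    # Share of the combined point range (a linear measure of influence).
    range_share = sgp_span / (sgp_span + obs_span)
    # Share of total variance; for independent parts this equals the
    # squared correlation between the SGP component and the total.
    sgp_var = sgp_span ** 2 / 12
    obs_var = obs_span ** 2 / 12
    variance_share = sgp_var / (sgp_var + obs_var)
    return range_share, variance_share


# Nominally a 50/50 split, using the ranges quoted in the comment.
range_share, variance_share = effective_sgp_share((1, 20), (8, 20))
print(f"SGP share of range: {range_share:.0%}")
print(f"SGP share of variance: {variance_share:.0%}")
```

Under these assumptions the nominal 50% SGP component carries about 61% of the combined point range and about 71% of the combined variance, which is roughly where a "70%+" figure lands if weight is measured as share of total variability.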
TC, which tests to use in measuring teachers is a crucial point. In business, profit sharing is done in multiple ways. You may get a bonus (possibly a large percentage of compensation) based on the entire company or a large division. You might say your group did great but got dragged down by others in the division. In fact, if your group does well and the company does poorly, the entire staff could be fired if the company goes under. That said, I’m not a fan of evaluating anyone on things they don’t primarily control. I understand the rationale for asking everyone to help students learn math by highlighting it in other subjects, but I respectfully disagree. I don’t like Florida’s model, which does this, or New Mexico’s. I think these will eventually fade away.
I completely agree that math and English teachers should be paid more for having to be accountable. Those two subjects are more important (though not the only important ones), since you can’t have a career if you can’t read and write or understand math. You will end up with the best teachers in math and English. The less effective teachers (note that VAMs are consistent across subjects) will drift toward the less-measured subjects. This happened even before VAMs were applied to teachers, since schools didn’t want poor average scores overall. They simply placed bad teachers in non-tested subjects.
Steve K, I don’t disagree that teaching requires soft skills. So does being a great quarterback. Many of a QB’s best assets revolve around leadership, being a great example, and even being a teacher to the receivers. But objective data such as wins and passer ratings capture these soft skills. Maybe not perfectly, but better than any subjective rating. The same goes for teaching. Great teachers with high VAMs don’t “test prep” all day. They inspire, motivate, and communicate well, and those qualities show up in objective scores. If we could objectively measure these things ahead of time, we could predict which teaching candidates would be effective. We cannot. Thus, we must measure.
Steve, why do you think the Pied Piper teacher would have a bad VAM? Rafe had a great VAM. Why don’t you pick out all the great teachers (before looking at their VAMs) and see how the VAMs stack up? As one NC principal told me on Twitter, he “knows” who the better and worse teachers are before looking at the VAMs. My question was: why were the ineffective teachers still teaching kids, then? Why do we need VAMs to force action on these ineffective teachers? And I never suggested VAM be a dominant factor. I support the Gates view that it should be 25-50%. And not “50%” as in NY, where it’s actually 70%. The variability of the VAM should be 50% of the total variability of the evaluation.
Diane, can you point to the part of the ASA statement that condemns using VAMs to measure teachers? They say “It is not meant to be prescriptive or advocate any particular VAM specification or promote or condemn specific uses of VAM”. They advise caution and care in publishing all the relevant caveats, but they never condemn VAM.
Virginia,
I am beginning to think I am wasting my time responding to you. Have you read the American Statistical Association report? I don’t think so. It says plainly that VAMs “typically” measure correlation, not causation; that different models produce different ratings; and that teachers account for 1%-14% of the variance in test scores. I can’t copy from the PDF of the ASA report. You will find it here: https://dianeravitch.net/2014/04/12/breaking-news-american-statistical-association-issues-caution-on-use-of-vam/
Here is the ASA statement. It rightly criticizes the use of VAM to evaluate individual teachers: http://www.amstat.org/policy/pdfs/asa_vam_statement.pdf
I am sorry but I am not going to respond to you anymore. You are a closed-minded ideologue. You think children are widgets that can be cut and shaped to size; you think that children learn because a teacher pours stuff into their heads.
Your lack of knowledge of teaching and learning makes it impossible for me to answer your comments in the future. It is a waste of my time because you never listen and you never learn. This last question about the ASA statement is typical. You didn’t read it. If you did, you didn’t understand it. It is written in plain English. Please read it again and stop pestering me by making the same comments over and over again, without any indication that you ever learn.
Virginia, great teachers don’t want to test prep; they are forced to, because not only are the stats used to evaluate teachers flawed, but so are the tests that measure both students and teachers. The Lederman case is an excellent example of VAM’s stupidity. There, an excellent teacher got a less-than-desirable rating based on test scores.
I remember reading that the gentleman who developed VAM told Joel Klein that it was not intended to be used for teachers, but Klein ignored his caveats because VAM is a punitive measure based on the margin of error. Junk science!!!
Unfortunately, this Emperor has no clothes to cover his flawed character. Even more problematic, he cannot understand a point written in very plain English, which is understandable to anyone, even someone like me who speaks English as a second language.
I couldn’t agree with Diane more. In space, you can see one jupiter spinning around the moon three times as fast as normal ones. That one is called S(upernova) G(aijin) P(lanet). It never stops until it gets shot down because it’s completely out of order.
Diane,
You’ve suffered a great walloping fool for far too long. And, no, I’m not talking about myself.
I feel schools exist to educate future human citizens, not statistically sufficient drones.
See response below
It is my understanding from reading another article about this that the AFT is supporting this case. However, that is not the case with Lederman here in NY. I take it NM doesn’t have a Democratic governor? I don’t even recall NYTeacher, our union publication, running a front-page story on this, or any story at all. But that’s a publication better for wrapping fish than reading, since it’s filled with so much false propaganda.
Rooting for both cases!!!!
AFT is not only supporting it but AFT-NM and the Albuquerque Teachers Federation (ATF) are the ones who filed the lawsuit, paying the attorney and doing all of it. The lawsuit is 100% union.
So why the double standard? Why not do the same for NY and show support for Lederman??
Thank God someone is willing to step up and support a better approach for the benefit of children and society, otherwise the profession would be bent to serve some less humane business model. Don’t praise AFT too much, though. Their record is rather cautious and complacent.
I have been disappointed by all of our union heads when it comes to true, front-line pro-activism. Not sure if it’s the need to preserve their own seat at the leaders’ table, political strategery, or what… but it has taken significant citizen action to suddenly prod them into their “Hey!!! Look how much we care now!” public positions.
There is absolutely nothing complacent about AFT-NM or the Albuquerque Teachers Federation! This lawsuit would NEVER have happened were it not for the hard work and leadership of Stephanie Ly, President of AFT-NM, and Ellen Bernstein, President of the Albuquerque Teachers Federation. There is nothing EVER cautious about their work to preserve public education. AFT-NM and ATF have formed some great coalitions with progressive organizations, but there has been very little citizen action prodding these two unions to fight for public education. They have never done what they do for a seat at the table, but rather out of their belief in social justice. They have been fighting this incredibly punitive and flawed evaluation scam since day one. It is obvious that some of the posters on this page are clueless about the leadership of President Ly and President Bernstein.
I am proud of the actions you describe in New Mexico. I am not proud of union leadership at the national level. You are right about me being clueless regarding the specific people you mention. You are lucky to have them in your corner.
This system is destroying teachers and our system. Is it any wonder that there are over 300 teacher vacancies in our state’s largest school district?
Can the group please refrain from engaging with posters whose aim is to make known their own “superior” opinions while demonstrating very little interest in learning from the group or engaging in any real dialogue about education? The result is an ego feed and not much else. A waste of time and effort, indeed.
Priscilla,
Thank you for the good advice and for reminding us of the principle:
“Don’t feed the trolls.”
I admit it, I fed the troll! But I learned so much about the ins & outs of the NM VAM while researching my response, I just had to put it out there. A truly horrible system. Up until now I’d only delved into my own state’s (NJ’s) VAM details.
Virginiasgp: let’s use your example (15% ‘false positives’ among total ineffective ratings), applied to actual 2015 NM results.
Of 20,500 NM teachers rated, 3.6%, or 738 people, scored ‘ineffective’. Let’s say 15%, or 111 people, can show their ratings were falsely calculated.
Can we conclude that the other 627 people were correctly rated ineffective? Not unless their evaluations were also examined for errors. If all 738 ineffectives were closely examined, finding only 111 false, can we conclude that the overall rating system is pretty good since there was only a 0.005 (about 0.5%) error rate? Only if all 20,500 have been closely examined and found error-free except the 111. If only the 738 ineffectives were examined for error, would we draw the alarming conclusion that there’s a 15% error rate overall? There’s no rationale for extrapolating from 738 to 20,500.
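The counts in the paragraph above can be reproduced directly. This sketch uses only the commenter's figures (20,500 rated, 3.6% ineffective) plus the hypothetical 15% false-positive rate from the earlier comment, rounding to whole people; the variable names are illustrative.

```python
TEACHERS_RATED = 20_500
INEFFECTIVE_RATE = 0.036     # 3.6% rated "ineffective" in 2015, per the comment
FALSE_POSITIVE_RATE = 0.15   # hypothetical 15% from the earlier comment

ineffective = round(TEACHERS_RATED * INEFFECTIVE_RATE)       # 738
false_positives = round(ineffective * FALSE_POSITIVE_RATE)   # 111
not_shown_false = ineffective - false_positives              # 627, not verified correct
never_checked = TEACHERS_RATED - ineffective                 # ratings nobody examined

# Valid as a system-wide error rate ONLY if every other rating was verified.
system_wide_rate = false_positives / TEACHERS_RATED

print(f"{ineffective} rated ineffective, {false_positives} shown false, "
      f"{not_shown_false} not shown false")
print(f"error rate if all other ratings were error-free: {system_wide_rate:.4f}")
print(f"ratings never examined for errors: {never_checked:,}")
```

The 0.0054 figure only means something if the 19,762 ratings that were never examined are assumed correct, which is exactly the assumption the comment questions.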
Can we conclude anything AT ALL from NM’s VAM results? Let’s look at the input data.
1. 50% based on improved student achievement, measured by NM’s EOC SBA. For example, if Matt scored 30, 30, and 30 on the EOCs for 3rd, 4th, and 5th grade, his expected score in 6th grade is 30. If he gets a 32 this year, good for the teacher. If he gets a 28, bad for the teacher. Is this the same test Matt took last year? No. Based on the same standards? No. Is this the same classroom teacher? No. Is Matt a mass-produced widget? Are his 5th and 6th grade teachers clones? What if Matt lost a parent, or became homeless, causing his attendance, study time, and score to take a dive? Tough for Matt, and for his teacher. But don’t worry: only one teacher takes a big hit, and then the dive is rolled into the next three-year average. Next year’s teacher might get a windfall if Matt gets a new home!
2. 25% based on 2 to 3 “formally scored” teacher observations lasting “at least 15 minutes,” performed by “school leaders trained and certified in the NMTEACH Observation Protocol. Guided by a rubric that describes teacher effectiveness at varying levels, observers have an objective set of criteria to evaluate teachers”… Wow. 30-45 minutes = 25% of your input. Standardize it with training and rubrics all you like; it’s still observation of humans by humans, subject to error, gaming, lying, you name it.
3. 25% based on very loosely defined “Multiple Measures”: the teacher submits samples of teaching ‘artifacts’ ranging from lesson plans to a system for communicating with parents. This section also includes the teacher’s attendance, and even a couple of percentage points’ worth of student and parent satisfaction surveys. Wide open: any input, any interpretation.
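A toy model of the three inputs above makes the Matt example and the weighting concrete. It is a hypothetical sketch, not NM's actual calculation: the 50/25/25 weights come from the list, but treating the "expected score" as the plain mean of the last three prior scores, interpreting the "three-year average" as applying to Matt's prior scores, and putting all components on a common 0-100 scale are assumptions made for illustration. NM's real value-added model is more involved than an average. All function names are made up.

```python
from statistics import mean

# Weights from the three inputs listed above.
WEIGHTS = {"achievement": 0.50, "observations": 0.25, "multiple_measures": 0.25}


def expected_score(prior_scores):
    """Toy expectation (assumption): mean of the last three prior scores."""
    return mean(prior_scores[-3:])


def teacher_credit(prior_scores, current_score):
    """Points above (+) or below (-) the student's expected score."""
    return current_score - expected_score(prior_scores)


def composite(component_scores):
    """Weighted total, assuming every component is on the same 0-100 scale."""
    if set(component_scores) != set(WEIGHTS):
        raise ValueError("expected exactly the three weighted components")
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


# Matt: 30, 30, 30 in grades 3-5.
history = [30, 30, 30]
print(teacher_credit(history, 32))   # 6th grade teacher gets +2
print(teacher_credit(history, 28))   # ... or -2

# A bad year (score 22) hits one teacher, then lowers the bar for the next.
sixth = teacher_credit(history, 22)            # -8
seventh = teacher_credit(history + [22], 30)   # about +2.67, a "windfall"
print(sixth, round(seventh, 2))

# Equal 10-point drops move the composite by different amounts.
base = {"achievement": 60, "observations": 80, "multiple_measures": 70}
print(composite(base))                                  # 67.5
print(composite({**base, "achievement": 50}))           # 62.5
print(composite({**base, "observations": 70}))          # 65.0
```

In this sketch, one bad year costs the 6th grade teacher 8 points and hands the 7th grade teacher a gain of about 2.67 points without any change in teaching, and a swing in the test-based half moves the total twice as much as the same swing in observations.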
I am an outside observer, with both business and teaching experience, but little of the latter full-time and none of it in public schools (though I did put three kids through public school in NJ). My viewpoint is primarily that of an educated, informed taxpayer and citizen. This VAM system is an insult to the intelligence.