Massachusetts is the latest battlefield in the fight over how to evaluate teachers. At the center of the conflict is the favorite idea of Arne Duncan and Bill Gates: evaluating teachers by the test scores of their students (or, if not their students, someone else’s students). The new Every Student Succeeds Act relieved states of the obligation to tie teacher evaluations to student scores. Oklahoma and Hawaii recently dropped the measure, which many researchers consider invalid and unreliable.
The state plans to impose its evaluation system on all teachers, including teachers of the arts and physical education. How the state will measure the students’ growth in music or art or sports is not clear.
Researchers at the University of Massachusetts-Amherst studied the plan and criticized it:
A 2014 report by the Center for Educational Assessment at the University of Massachusetts Amherst, which examined student growth percentiles, found the “amount of random error was substantial.”
“You might as well flip a coin,” Stephen Sireci, one of the report’s authors and a UMass professor at the Center for Educational Assessment, said in an interview. “Our research indicates that student growth percentiles are unreliable and should not be used in teacher evaluations. We see a lot of students being misclassified at the classroom level.”
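To make the scale of that random error concrete, here is a minimal simulation sketch. It is not the UMass study’s method; the spread of “true” classroom growth, the size of the error, and the rating cutoff are all invented for illustration.

```python
# Hypothetical illustration of how random error in a classroom-level growth
# measure can flip a rating. None of these numbers come from the UMass report.
import random

random.seed(0)

N_CLASSROOMS = 1000
TRUE_SD = 1.0    # assumed spread of "true" classroom growth
ERROR_SD = 2.0   # assumed random error in the measured growth
CUTOFF = 0.5     # hypothetical line between "low" and "not low" impact

def rating(score, cutoff=CUTOFF):
    """Classify a growth score into a two-level rating."""
    return "low" if score < cutoff else "not low"

misclassified = 0
for _ in range(N_CLASSROOMS):
    true_growth = random.gauss(0.0, TRUE_SD)
    measured = true_growth + random.gauss(0.0, ERROR_SD)  # add random error
    if rating(true_growth) != rating(measured):
        misclassified += 1

print(f"misclassified: {misclassified / N_CLASSROOMS:.0%}")
# When the error is this large relative to real differences, the measured
# rating disagrees with the "true" one for a large share of classrooms,
# which is what "you might as well flip a coin" is getting at.
```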
The Massachusetts Teachers Association, the largest teachers’ union in the state, has come out in opposition to the plan, as has the Massachusetts Association of School Committees, which represents the state’s elected school boards.
But state officials, led by state Commissioner Mitchell Chester, insist that they won’t back down. Boston’s superintendent, Tommy Chang, a graduate of the unaccredited Broad Superintendents Academy, is acting to implement the evaluations.
A centerpiece of Massachusetts’ effort to evaluate the performance of educators is facing mounting opposition from the state’s teacher unions as well as a growing number of school committees and superintendents.
At issue is the state’s edict to measure — based largely on test scores — how much students have learned in a given year.
The opposition is flaring as districts have fallen behind a state deadline to create a “student impact rating,” which would assign a numeric value to test score growth by classroom and school. The rating is intended to determine whether teachers or administrators are effectively boosting student achievement. The requirement — still being implemented — would apply to all educators, including music, art, and gym teachers.
“In theory it sounded like a good idea, but in practice it turned out to be an insurmountable task,” said Glenn Koocher, executive director of the Massachusetts Association of School Committees. “How do you measure a music teacher’s impact on a student’s proficiency in music? How do you measure a guidance counselor’s impact on student achievement?”
Critics question whether the data can be affected by other factors, including highly engaged parents or classrooms with disproportionate numbers of students with disabilities or other learning barriers. The requirement has also created problems in developing assessments for subjects where standardized tests are not given, such as in art and gym.
Resistance has escalated in recent weeks. On Thursday, the state’s largest teachers union, the Massachusetts Teachers Association, as well as others successfully lobbied the Senate to approve an amendment to the state budget that would no longer require student impact ratings in job evaluations. A week earlier, the Massachusetts Association of School Committees passed a policy statement urging the state to scrap the student impact ratings.
But some educators see value in the student impact ratings. Mitchell Chester, state commissioner for elementary and secondary education, defended the requirement, which has been more than five years in the making.
Commissioner Chester is deeply involved with the Common Core and the tests for Common Core. Until recently, he was chair of the PARCC Governing Board.
The educational turmoil in Massachusetts is baffling. It is the nation’s highest-scoring state on standardized tests, yet school leaders like Mitchell Chester can’t stop messing with success. Although they like to say they are “trying to close the achievement gap” or imposing tougher measures “to help minority students,” these are the very children who fall even farther behind on the new tests, which are harder than past tests and, according to teachers who have seen them, developmentally inappropriate.
What is happening in Massachusetts is the epitome of “reform” arrogance. Why doesn’t Commissioner Chester support the fine teachers he has and fight for better funding and smaller classes in hard-pressed urban districts like Boston?

“How the state will measure the students’ growth in music or art or sports is not clear.”
Why is this even a question? Just make the requirements in line with exit exams in academic subjects: give a concert at a nationally-ranked theater, a showing at a major gallery, & qualify for the farm league of a major league team. Simple! 🙂
Stupid, stupid, stupid.
Not enough stupids, F L E R P !
Smart people would admit this was a mistake and drop it. I realize they all fell in love with it and spent a billion dollars and their credibility on it but they have to let it go.
I cannot imagine the amount of time and money the state of Ohio has expended on this. All they need is one brave lawmaker to break the logjam and ADMIT ERROR. Think of all the other stuff they could be doing! I think test scores in my district would improve if they had expended half this energy on getting kids to school every day, and anyone can measure that!
Mitchell Chester almost sounds like a gameshow host.
Beware.
He’s got a goat behind curtain 3.
🙂 With so many Big Name reformers still pushing reform while the money runs out, soon there will be nothing at all behind that curtain….not even that little guy in the Wizard of Oz.
My school district cut back on music education under Ohio’s innovative ed reform strategy of cutting funding for public schools to give tax cuts to wealthy people. I can afford music lessons to replace the formerly public instruction so my son is really killing it on the baritone compared to the lower income “cohort”
What part of his band director’s VAM score should be attributed to private lessons?
.007 or .008? I can’t go on without this vital information.
“How do you measure a music teacher’s impact on a student’s proficiency in music?” In my state – NY – the question should be “How do you measure a music teacher’s impact on a student’s proficiency in math and English?” I am an elementary school band director, and 40% of my APPR is my VAM score which is taken from the common core state tests in math and ELA. The whole system is beyond stupid.
Massachusetts is determined to follow the failed recommendations that won them the RTTT money. Now they continue to try to pound a square peg into a round hole by forcing districts to disregard reason and research so they can stick to the “reform plan.” This type of upside-down thinking comes with “pay to play” incentives: they start with an erroneous conclusion and try to force everyone to abide by it.
Indefensible! So Massachusetts officials have spun an embarrassing web of deceit since the day the state ‘won’.
“How do you measure a music teacher’s impact on a student’s proficiency in music?”
That’s actually the wrong question.
The right question for k-12 would be “How do you gauge a teacher’s impact on a student’s love of music?”
I have seen first hand the effect of making “proficiency” the goal of music teaching at the middle school and high school level. It made my niece quit oboe in 8th grade and my nephew quit saxophone in 7th grade.
It’s a huge mistake to focus on “proficiency” at that age because it makes kids hate music. Proficiency comes naturally when kids like what they are doing and keep practicing.
The reality is that Massachusetts is a test-crazed state and has been for a long time, which is why their students do well on tests. But people should not make the mistake of assuming that their students are better educated than anyone else based on that fact.
So right, and I’d argue this applies to every subject in k-12. This obsessive striving for “achievement,” which causes us to ignore that education is a “human system,” is the problem. Almost paradoxically, students “achieve” better when “achievement” is not the goal. Proficiency is not bad; of course we want people to excel at things. But this sick and narrow-minded obsession with proficiency at all costs is a bad thing. It is ruining education.
To expect that all students should be “proficient” at writing an essay — or at playing the clarinet — is insane.
Both my niece and nephew actually loved playing when they started out and would actually practice even when they did not have to.
The irony is that they were actually very good (i.e., proficient) for their age.
But because of the focus on proficiency (they actually had to take tests in band! in middle school!), they came to hate the whole thing. My niece is now a college graduate and has not picked up her oboe since. Nor has my nephew.
That’s a crying shame.
It’s all too familiar a story…
Students have to take tests and be graded on everything nowadays, even in the arts, which are fundamentally against standardization and truly beyond measurement. It’s like when Bizarro Albert Einstein said, “Things only matter when they’re measured and documented — especially the most important things.”
He also said, “Art without measurement is blind… Measurement without art is just fine”
Ed
Einstein also said the opposite
SomeDAM Poet
Partially correct. Affinities for any subject can be killed by the testing regime. But note that performance in the manner of an artist is not a prerequisite for love. Think poetry, the whole symphony or jazz performance, a novel that haunts your memory with little lessons about human beings you have not met and never will meet, places you are unlikely to be, and so forth.
Also, “impact” is a terrible word. It is a perversion of concepts in education about human growth and development. You don’t achieve growth by impacting students. The concept is really absurd, if not dangerous. It is favored by the economists, who are largely responsible for corrupting the educational meaning of growth, connected with human growth and development, by reducing it to increments in test scores year over year. That is return-on-investment talk, and it has structured too much of education. “We grew our test scores, therefore we ‘impacted’ our students.” This is fit for satire, but it is serious.
For anyone interested in “impact” assessments, here is a brief history of the migration into education.
Click to access The-history-of-social-impact-assessment.pdf
Diane, I think one of us misunderstood each other. “Bizarro” Einstein is the opposite of Einstein 🙂
“In popular culture “Bizarro World” has come to mean a situation or setting which is weirdly inverted or opposite to expectations.”
https://en.wikipedia.org/wiki/Bizarro_World
It’s like saying Bizarro Arne Duncan was the best Secretary of Ed in history, or our current education policy “puts the students first” in bizarro world… 🙂
Ed, you are right. I misunderstood you.
Laura
You’re right. “Impact” is a bad word. I should have been more careful to change that word as well, to “effect” or something like that, when I copied the sentence and changed “proficiency” to “love for.”
And agree that proficiency (at playing an instrument, for example) is not necessary to have a love of music.
As far as gauging a teacher’s effect on a student’s love for music, I think one way to do it is to see how excited, or unexcited, that student is, which is actually the main point I was making.
If a student is very excited to begin with and then becomes less and less excited and eventually quits after just a year or two, there is something seriously amiss.
In the case of my niece and nephew, the focus on “proficiency” completely killed any love they had for music to begin with.
The point of middle school and high school music programs should be to encourage everyone to develop a love for music.
In my IL district, as a special education teacher, my evaluation is based on my caseload students’ progress in either their math or communications score (whichever has the highest number of students with IEP goals in that area). It has NOTHING to do with what I work with the students on, or what I teach. It’s not fair either to me (relying on someone else to teach what my students need) or to the other teacher (responsible for what someone else receives on an evaluation). And what does our union tell me? It’s the best we can do.
“And what does our union tell me? It’s the best we can do.”
Those vaunted unions sold out a long time ago (except for a few, e.g., Chicago under Lewis, MA with Madeloni as head, and ???).
If the population of self-contained special ed students resembles those I used to have, it is a wonder that any special ed teacher keeps their job beyond a year. The kids come with so much baggage that they require a lot more from a teacher than how much you can shove into their heads in a manner they can then demonstrate on a computer test that has little to do with what happens in a classroom (and in most cases it wouldn’t matter if it did). No one has ever explained to me how you differentiate instruction in such a manner that then allows students to shine on a one-size-fits-all test. I still remember the blank, unresponsive stares I got when I suggested that they needed to provide my students their IEP accommodations for MAP testing. Never happened. No one ever explained to me how all my students were supposed to make multiyear progress like those in the ideal cohort and program setup marketed to the district. Neither my students nor the program components provided came close to that dream classroom. Not that any test score would ever adequately describe even one of those students.
I wrote about the idiocy of using math and reading test scores to evaluate music teachers here, if anyone is interested: https://theconversation.com/can-it-get-more-absurd-now-music-teachers-are-being-tested-based-on-math-and-reading-scores-47995
“Our research indicates that student growth percentiles are unreliable and should not be used in teacher evaluations.”
Student growth percentiles were never intended to be used in teacher evaluations.
They are not just unreliable, but are actually invalid for that purpose.
“These growth measures [SGP’s] can be aggregated to the classroom or school level to provide descriptive information on how the group of students grew in performance over time, on average, as a subset of a larger group. But, these measures include no attempt at all to attribute that growth or a portion of that growth to individual teachers or schools. That is, sort out the extent to which that growth is a function of the teacher, as opposed to being a function of the mix of peers in the classroom.”
“SGPs, unlike VAMs, are not even intended to ‘infer’ that the growth was caused by differences in teacher or school quality.”
— Bruce Baker in Take your SGP and VAMit, damnit
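For readers curious what Baker means by a purely descriptive aggregate, here is a simplified sketch of the SGP idea. Real SGP models use quantile regression conditioned on students’ score histories; this toy version just ranks each student against peers with a similar prior score, and the data and band width are made up.

```python
# Simplified stand-in for student growth percentiles (SGPs): rank each
# student's current score among peers with a similar prior score.
# Real SGPs use quantile regression; the data here are hypothetical.
from statistics import median

def growth_percentile(student, peers, band_width=5):
    """Percentile rank of a student's current score among peers whose
    prior score falls within band_width points (crude 'academic peers')."""
    band = [p for p in peers if abs(p["prior"] - student["prior"]) <= band_width]
    below = sum(1 for p in band if p["current"] < student["current"])
    return round(100 * below / max(len(band), 1))

students = [
    {"id": 1, "prior": 230, "current": 241},
    {"id": 2, "prior": 232, "current": 236},
    {"id": 3, "prior": 228, "current": 244},
    {"id": 4, "prior": 231, "current": 239},
]

sgps = {s["id"]: growth_percentile(s, students) for s in students}

# Aggregating to a classroom median is descriptive, as Baker notes: nothing
# in this calculation attributes the growth to the teacher.
print("median classroom SGP:", median(sgps.values()))
```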
Here’s a paradox: if the people of Massachusetts are so smart (based on test scores), why have they (or at least their officials) proposed using a method for evaluating teachers that is completely invalid for that purpose?
If the people of Massachusetts are so smart, why is MA the only state in the country to exempt the legislature, the judiciary, and the governor from public records statutes?
“In theory it sounded like a good idea, but in practice it turned out to be an insurmountable task,”
NO! It’s pure garbage, theoretically speaking. Unless one is a raccoon, that “theory” garbage stinks to high heaven.
No standardized tests can adequately “measure” the intellectual growth of students in a particular class. Rather, each school should follow the lead of the NY Consortium on Performance Standards, where teachers in each class create a series of before, during, and after high-challenge, authentic benchmark assessments to observe performance on significant intellectual processes in that subject. By year’s end they can more than adequately show students’ growth over time, as evidenced in a recently published project report:
Moving from WHAT to What if? Teaching Critical Thinking with Authentic Inquiry and Assessment (Routledge, 2016)
John Barell
Can you show me what this means for kindergarten learning in art, and how to observe the “significant intellectual processes,” when a major purpose of instruction is to keep interest alive, far more than to impose critical thinking? What you seem to be seeking is something like the Goodenough-Harris Draw-a-Person Test, except that it is really a measure of prior learning and native talent and was once used as an IQ test. The test assumes that students come from a cultural and home background that has no prohibitions against drawing a human figure. The lowest score is 0, for an aimless scribble. A score of 1 is awarded for somewhat controlled lines that suggest an attempt at representation. The maximum score is 51, with one point earned (or not) for the inclusion of specific details in nine categories that reflect a perceptual and conceptual grasp of common features in a healthy, typical human.
An additional category offers a measure of motor skill in using the provided medium (a soft pencil, no eraser). National “mental age” norms are provided, with a range from 3 years, 3 months to 12 years, 9 months.
The Goodenough test assumes that a proper aim of art education is developing representational drawing skills with a focus on the human figure, over and over, grade to grade.
The test is a performance measure for certain, and many will find it challenging. It is even harder to rate performances. Raters of the drawings should be selected from a pool of evaluators who have academic training in representational drawing skills and knowledge of the drawing conventions of young children as documented in the research of scholars such as Rhoda Kellogg, Hilda Lewis, Al Hurwitz, Brent Wilson, Christine Thompson, Martha Colbert, and Kathy Weisman Topal among others.
The estimated time for one qualified judge to score one drawing is 10-15 minutes. For a minimum threshold of reliability, three judges must score each work independently. If the total scores for each drawing are more than three points apart, those drawings must be reviewed again to secure a consensus on the score.
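For what it is worth, the consensus rule described here is simple enough to write down as a few lines of code. This sketch assumes exactly the procedure in the comment, three independent judges and a re-review when scores spread more than three points; the scores themselves are hypothetical.

```python
# Sketch of the inter-rater rule described above: three judges score each
# drawing independently; a spread of more than three points triggers review.

def needs_consensus_review(scores, max_spread=3):
    """Return True if the judges' scores are too far apart to accept as-is."""
    if len(scores) < 3:
        raise ValueError("at least three independent scores are required")
    return max(scores) - min(scores) > max_spread

# Hypothetical scores from three judges for two drawings (0-51 scale).
drawings = {"drawing_A": [34, 36, 35], "drawing_B": [22, 27, 21]}

for name, scores in drawings.items():
    if needs_consensus_review(scores):
        print(f"{name}: scores {scores} spread more than 3 points, re-review")
    else:
        print(f"{name}: scores {scores} accepted (mean {sum(scores)/len(scores):.1f})")
```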
I think this way of thinking about “assessment” comes close to the idea of performance assessment, but it would also misrepresent major aims for art education and be another case of aggrandizing tests and accountability.
How about some test- and assessment-free education for a change? Refocus on evaluation: drawing forth the value of learning. In art this could mean inviting conversations about art rather than extracting some rating from those conversations to comply with an accountability system. How about some conversations about learning with particular students, and some conversations with the students’ parents/caregivers? How about some group celebrations of unexpected and idiosyncratic learning, with not a norm or bell curve in sight?
I think I am not alone in rejecting the idea that so much of educational theory, policy and practice must be organized around tests, measures, assessments, counting the “how much” learning as if that is the same as how well, how memorable, how useful, how beautiful and surprising learning can be.
Making schools test-free zones might restore some sanity to education, and wisdom, and mental space for learning.
Almost every teacher in my district will be evaluated on the combined scores from the 9th grade Algebra I, 9th grade Living Environment, 10th grade global history, 11th grade US history, and 11th grade ELA Regents tests. Student growth in this mix of tests will be used to determine teacher effectiveness and will count for 50% of a teacher’s overall evaluation. A number of NY school districts are taking this approach to avoid having to produce state-approved “local” exams for every subject at every grade level beyond the state tests already in place, excluding Common Core math and ELA in grades 3 to 8, where a four-year moratorium is in place. By the 2017-2018 school year, all of this will probably be consigned to the smoldering ash heap of not just stupid but very damaging education policy.
VAM has been “slammed,” in the words of The Washington Post, by the very people who know the most about data and measurement: the American Statistical Association (ASA). So every teacher who is unfavorably evaluated on the basis of students’ standardized test scores should vigorously oppose the evaluation, citing the ASA’s authoritative, detailed, seven-page “Statement on Using Value-Added Models for Educational Assessment” as the basis for having public employment boards and courts toss out any unfavorable evaluation based on a test-driven Value-Added Model (VAM).
Moreover, a copy of the VAM-slam ASA Statement should be posted on the union bulletin board at every school site throughout our nation and should be explained to every teacher by their union at individual site faculty meetings so that teachers are aware of what it says about how invalid it is to use standardized test results to evaluate teachers.
Even the anti-public school, anti-union Washington Post newspaper said this about the ASA Statement: “You can be certain that members of the American Statistical Association, the largest organization in the United States representing statisticians and related professionals, know a thing or two about data and measurement. The ASA just slammed the high-stakes ‘value-added method’ (VAM) of evaluating teachers that has been increasingly embraced in states as part of school-reform efforts. VAM purports to be able to take student standardized test scores and measure the ‘value’ a teacher adds to student learning through complicated formulas that can supposedly factor out all of the other influences and emerge with a valid assessment of how effective a particular teacher has been. THESE FORMULAS CAN’T ACTUALLY DO THIS (emphasis added) with sufficient reliability and validity, but school reformers have pushed this approach and now most states use VAM as part of teacher evaluations.”
The ASA Statement points out the following and many other failings of testing-based VAM:
> “VAMs typically measure correlation, not causation: Effects – positive or negative – attributed to a teacher may actually be caused by other factors that are not captured in the model.”
> “Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions.”
“System-level conditions” include everything from overcrowded and underfunded classrooms to district-and site-level management of the schools and to student poverty.
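To see what a 1% to 14% share of the variability means, here is an illustrative sketch with invented numbers, not the ASA’s analysis: it simulates scores as a teacher effect plus everything else and reports the fraction of total variance tied to teachers.

```python
# Hypothetical variance-share illustration. The standard deviations below are
# assumptions chosen so the teacher share lands near 10%, within the 1-14%
# range the ASA cites; they are not estimates from any real data.
import random

random.seed(1)

TEACHER_SD = 3.0   # assumed spread of teacher effects
OTHER_SD = 9.0     # assumed spread of everything else (students, home, system)

teacher_effects = [random.gauss(0.0, TEACHER_SD) for _ in range(50)]
scores = []
for effect in teacher_effects:
    for _ in range(25):                      # 25 students per classroom
        scores.append(effect + random.gauss(0.0, OTHER_SD))

def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

teacher_share = variance(teacher_effects) / variance(scores)
print(f"share of score variance tied to teachers: {teacher_share:.0%}")
# Most of the variation sits in the "system-level conditions" the comment
# describes, even though the teacher effect is real in this simulation.
```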
Fight back! Never, never, never give up!
I’m a special education teacher in NYC. Welcome to insanity, Massachusetts. And, yes: art and PE teachers are caught in the net as well, having their evaluations affected by academic outcomes far removed from their control.
This will do absolutely NOTHING in closing the achievement gap. It’s like making a doctor’s worth contingent on the health of his/her patients AND those of other doctors, to boot.
I believe once these “coaches” become “administrators,” they completely forget about the students, and are there to inspire SCORES!!!