Archives for category: Teacher Evaluations

Educators in New York are trying to make sense of the state’s evaluation system. The formula is supposed to consist of observations (60%), state test scores (20%), and local assessments (20%). Yet the results don’t line up with common sense or common knowledge.

Some principals seem to be giving higher observation scores to teachers they want to protect: teachers they consider valuable and do not want to lose.

“In Scarsdale, regarded as one of the best school systems in the country, no teacher has been rated ‘highly effective’ in classroom observations. It is the only district in the Lower Hudson Valley with that strict an evaluation. In Pleasantville, 99 percent of the teachers are rated as ‘highly effective’ in the same category.”

Charlotte Danielson, whose rubric is the basis for most teacher evaluation systems, called these results “laughable.”

“Pleasantville schools Superintendent Mary Fox-Alter defended her district’s classroom observation scores, which use the Danielson model, saying the state’s ‘flawed’ model had forced districts to scale or bump up the scores so ‘effective’ teachers don’t end up with a rating of ‘developing.’”

What is truly laughable is the effort to turn the art and craft of teaching into a scaled metric, like weighing apples at the supermarket. What is essentially a matter of human judgment, based on experience and wisdom, cannot be measured and graded. Its results will always be flawed, and the very act of measuring the unmeasurable will change teacher behavior to conform to the scale. If all we want is higher scores, this might be a good way to get them. If we want inspired teaching, it is not.

Audrey Amrein-Beardsley published a guest post by a rising star in the Academy, Jimmy Scherrer of North Carolina State University, who previously taught in the Los Angeles Unified School District (LAUSD).

Scherrer wrote:

“As someone who works with students in poverty [see also a recent article Scherrer wrote in the highly esteemed, peer-reviewed Educational Researcher], I am deeply troubled by the use of status measures—the raw scores of standardized assessments—for accountability purposes. The relationship between SES and standardized assessment scores is well known. Thus, using status measures for accountability purposes incentivizes teachers to work in the most advantaged schools.

“So, I am pleased with the increasing number of accountability systems that are moving away from status measures. In their place, systems seem to be favoring value-added estimates. In theory, this is a significant improvement. However, the manner in which the models are currently being used and how the estimates are currently being interpreted is intellectually criminal. The models’ limitations are obvious. But, as a learning scientist, what’s most alarming is the increasing use of the estimates generated by value-added models as a proxy for “effective” teaching…..”

“Typically, research studies on teaching and learning are framed using one of three perspectives: the behaviorist, the cognitivist, and the situative. Each perspective is associated with a different grain size. The behaviorist perspective focuses on basic skills, such as arithmetic. The cognitivist perspective focuses on conceptual understanding, such as making connections between addition and multiplication. The situative perspective focuses on practices, such as the ability to make and test conjectures. Effective teaching includes providing opportunities for students to strengthen each focus. However, traditional standardized assessments mainly contain questions that are crafted from a behaviorist perspective. The conceptual understanding that is highlighted in the cognitivist perspective and the participation in practices that is highlighted in the situative perspective are not captured on traditional standardized assessments. Thus, the only valid inference that can be made from a value-added estimate is about a teacher’s ability to teach the basic skills and knowledge associated with the behaviorist perspective.”

This, he writes, is “intellectually criminal” and “intellectually lazy.”

Tell it! VAM is Junk Science.

A teacher in Texas wrote this comment, which depicts (to me) a system where data matter more than teachers, learning, or children. Either the system is on autopilot, or it is run by people who confuse numbers with learning.

“They recruited from NC and from Spain (for bilingual teachers) this year because they did expect vacancies. I think it’s important to mention that all are not based on EVAAS because not everyone has those standardized scores. They are also based on Stanford testing in 1st and 2nd grade and for classes like PE, a district made assessment. I teach Kinder and am still waiting to find out what growth they calculated for my scores last year (and yes, they were bubble-in multiple choice tests). No one could explain to me how it was going to work, what percentage growth was required to be considered effective and how that was going to be calculated– so I’m very anxious about it. I was rated highly effective in the professional and instructional areas but who knows. We are supposed to use 2 different assessments for more validity but that doesn’t happen-they end up using the reading and math versions of the same test given the same week. I did wonder how many vacancies they had to start the new school year yesterday?”

Laura Chapman, a regular contributor to the blog, has worked in arts education for many years.

She writes:

This desire to churn the teaching workforce is not just a push from Bill Gates and lawsuits to dismantle unions.
Six economists/statisticians brought together at the Brookings Institution offered a similar plan. These number crunchers said that district-wide VAM (value-added) scores should be used to determine the most effective teachers, irrespective of the subjects and grade-levels they teach.

This proposal is efficient and absurd. It is based on the assumption that a district’s value added scores are so highly correlated with “non-value added” measures that employment decisions for all teachers can be based on the performance of teachers with value added scores.

Under this system, all teachers would also have a composite evaluation based on multiple measures such as end of course test scores, observations, and student surveys. Even so, the teachers with VAM scores would determine the employment fate of all teachers. How is this conclusion reached?

Here is the magical thinking: “For example, we would assume that the correlation between observationally-based ratings of teachers and value-added (scores) in math would be the same in history, where value-added measures are not available.”

In other words, the statisticians freely invent (impute) a missing metric for the history teacher by assuming a math teacher’s rating on a classroom observation protocol can be used as a substitute for the history teacher’s missing value added score.

Those inferential leaps are just the beginning of a larger plan that would make all teacher evaluations “comparable” without any distinctions in grade level, or subject, or conditions under which teachers work.

The Brookings policy articulates principles for dismissing up to 25% of teachers in a district, on the assumption that this action plan would increase test scores and be “fair” to every teacher. The only exception to this formula might be for teachers of exceptional children. This case of econometric thinking ignores the educational, ecological, and substantive importance of different job assignments. See “Correlation,” paragraph 5, in

The Brookings paper is not radically different (except for the 25% churn) from a USDE plan to rate all teachers by a collective, school-wide VAM limited to one of the “priority” subtests such as reading or mathematics. In Florida, for example, the school-wide VAM in reading or math is assigned to art teachers and other teachers of nontested subjects. In other words, the curriculum and instruction that really matters is narrowed to the three R’s.

The use of a collective VAM focused on reading or math is a rapid and cost-effective way to meet federal or state requirements for teacher evaluation. Moreover, in 2014, a U.S. district judge ruled that evaluators in Florida are allowed to disregard a teacher’s job assignment in rating performance. The judge ruled that this practice is legal, even if it is unfair.

Teacher ratings based on a collective value-added score are likely to increase in states where Common Core State Standards (CCSS) are adopted and tested. The CCSS call for all teachers to improve student proficiency in English Language Arts and mathematics.

Although the American Statistical Association has denounced the practice of using VAM for rating individuals, that measure is unlikely to disappear as a tool for churning the workforce.

In the Obama/Duncan/McKinsey & Co. “RESPECT” project, for example, a teacher can only be judged “highly qualified” by producing more than a year’s worth of growth (gain in test scores) in three out of every five years. Teachers without that designation have shorter up-or-out criteria to meet.

This stack-ranking system, like the Brookings plan, banishes job security and churns the teaching workforce by insisting on one-size-fits-all criteria for “effective” teachers.

Zak Jason published a fascinating interview in “Boston” magazine with Barbara Madeloni, the recently elected president of the Massachusetts Teachers Association, the state’s largest union, with 110,000 members.

I first learned of Madeloni when she was preparing teachers at the University of Massachusetts, Amherst, and she refused to give the Pearson test to evaluate new teachers. Michael Winerip wrote a story about her defiance in the New York Times, and within a matter of days, her contract was not renewed. Now all teacher candidates across the university are required to take the Pearson exam.

I learned many things from this article. I learned that Barbara was a psychotherapist before she became a high school English teacher. I learned that when she ran for union president, she was considered a very long shot. Some people thought she had no chance at all.

I learned that the State Commissioner of Education, Mitchell Chester, is also chair of the governing board of PARCC, one of the two federally funded Common Core tests. Some in the state say he has a conflict of interest.

Madeloni has called for a three-year moratorium on all testing and teacher evaluations:

“We’ve been trying to do scale, instead of human beings. We need to do human beings,” she says. She lambasts the Common Core, a national set of curriculum standards that the state adopted in 2010, as “corporate deform,” and described its architects to CommonWealth magazine as “rich white men who are deciding the course of public education for black and brown children.”

“The past and present heads of the state’s top education offices I talked to dismiss Madeloni’s rhetoric as naive, absurd, and, in the case of the moratorium, illegal. Mitchell Chester, the commissioner of the state’s Department of Elementary and Secondary Education (DESE), says he’s concerned that her “hyperbolic” vision may force the DESE to tune out the entire union.”

Chester may dismiss her, but teachers view her as a savior. “She’s the first MTA leader willing to listen to their agony, and to tell the truth about how teaching in the age of accountability can be, as Holyoke teacher Cheri Cluff puts it, “like waiting tables at a busy restaurant; you’re running and running and running, and you’ve lost your head.” Whereas past presidents and her opponent, MTA vice president Tim Sullivan, were willing to compromise with state administrators, Madeloni is combative, unapologetic, and, as Agustin Morales, another Holyoke teacher, says, “unafraid to make her life uncomfortable.”

Morales, the article notes, was elected president of his local in Holyoke with a 70% majority; he complained about the data walls, where students’ names and test scores are publicly posted. He was fired.

Madeloni is a fighter. She is outspoken and unafraid. Will she be marginalized by the state? Can the state alienate its largest union? Watch for the battles ahead. Madeloni was elected to stand up for teachers. Richard Stutman of the Boston Teachers Union has agreed to collaborate with her.

Zak Jason concluded:

“When I first talked to Madeloni soon after her election, she agreed to have me follow her throughout her first week. But just before her presidency began, she told me, “As a psychotherapist, I know the presence of someone else in the room can affect how the room behaves,” and said she would only be available for an interview, and her communications director James Sacks would join.

“As I’m about to leave her office, Madeloni turns to Sacks and asks, half-joking, “Is there anything I didn’t say that I was supposed to say?”

“What’s your vision?” he says.

“That we reclaim the vision of public education as a space for democracy, for joy, for hope, for a better future for all of our children. All of our children.”

Peter Greene sees signs that educators are fed up with the top-down mandates from non-educator Arne Duncan, fed up with the failed punitive policies of NCLB and Race to the Top. Now we know that Washington cares about one thing only: test scores, and now we know that the beneficiaries of Washington’s obsession are the testing companies. We have now had nearly 15 years of test-based incentives and sanctions and ample evidence that this approach has driven joy out of learning and failed to achieve anything that benefits students or society.

As the school year begins, let’s hope that there will be more states following Vermont’s lead by rejecting federal mandates and setting forth their own vision of what good education looks like. Let’s hope that there will be more teachers like those in Chicago and at Garfield High in Seattle who insist on doing what’s right for their students. Let’s hope that there will be more superintendents like those in Washington State who were compelled by NCLB to send home a letter saying “we are a failing school,” but added a cover letter saying that it was not true. Let’s hope that integrity, courage, and candor break out everywhere.

Paul Horton here attempts to understand why the Obama administration is waging war on teachers. He reminds us of Central Falls, when the Obama administration supported firing the entire staff of the high school. He recalls that the administration stayed neutral during the Chicago teachers’ strike, and he cites Arne Duncan’s support for the noxious Vergara decision. He could have mentioned many other instances of the administration’s hostility to teachers, such as Duncan’s support for the L.A. Times story releasing the names and ratings of teachers, the administration’s silence during the large demonstrations against Wisconsin Governor Scott Walker, or its silence as vouchers spread.

He writes:

“In sum, the war on teachers and due process for teachers is presented by many Democrats as a new war on poverty, and, somewhat obscenely, “the Civil Rights Movement of our time.” Last year Michelle Rhee, former chancellor of Washington D.C. Schools, made speeches at southern civil rights museums that proclaimed that supporting charter schools and making teachers accountable was the key to creating a more equitable America. Closing the achievement gap and not the excuse of poverty was the new focus of the new Civil Rights movement. The National Civil Rights Museum—Lorraine Motel in Memphis recently recognized Geoffrey Canada, a Harlem charter school operator and the star of the anti-public school documentary, “Waiting for Superman” as a “Civil Rights Hero.”

It was cheaper to wage war on teachers than to wage war on poverty. But that leaves so much unexplained. Why did President Obama embrace the Republican agenda of testing, accountability, and choice? Why did President Obama turn against one of the most reliable members of his party’s base? Horton doesn’t explain.

Remember that Arne Duncan said that there was too much testing, that testing was sucking the oxygen and joy out of classrooms? New York didn’t get the message. In that state, state tests count for 20% of educator evaluations, and local assessments count for another 20%. That is the agreement negotiated with the unions when the state won Race to the Top funding.

That was then, this is now.

The New York Board of Regents wants state test scores to count for 40% of the evaluations of teachers and principals. This report was confirmed to me by someone in Albany.

It matters not to the Regents that test-based evaluation is not working anywhere else. It matters not that the AERA and the National Academy of Education warned against it, warned that it would incentivize teachers to avoid high-needs students. It matters not that the American Statistical Association warned against using test scores to rate individual teachers, noting that teachers account for only about 1% to 14% of the variation in student test scores.

The ASA said: “Attaching too much importance to a single item of quantitative information is counterproductive; in fact, it can be detrimental to the goal of improving quality. In particular, making changes in response to aspects of quantitative information that are actually random variation can increase the overall variability of the system.”

Unlike the state of Vermont, which refuses to rate teachers and principals by test scores, New York’s Regents will plunge ahead, regardless of the damage they do to teachers, principals, students, and communities.

This editorial from the Tampa Bay Times was published in March, but I just discovered it and wanted to share it. Unlike the editorial writers in many other cities, the Tampa Bay Times went beyond the press releases and self-serving statements of public officials.

They pointed out that the ratings had a margin of error of 50%. “That means it is useless. Still, the state intends to base half of a teacher’s performance evaluation, and future pay, on this absurdity.

“As Tampa Bay Times staff writers Lisa Gartner and Cara Fitzpatrick reported, the state’s flawed system rates some of the region’s most honored teachers as low performers. Hillsborough County teacher of the year Patrick Boyko, a social studies teacher at Jefferson High School, scored a minus 10.23 percent, with a margin of error above 50 percent. Translation? His students scored 10 percent worse on the FCAT than typical children across the state, even though the Florida Comprehensive Assessment Test measures students in reading, writing, mathematics and science, but not social studies. Of course, it mattered little, since a margin of error larger than Boyko’s actual VAM score invalidated the whole process.”

“Even lawmakers had to acknowledge it wasn’t fair to judge teachers based on students’ performance in academic areas they do not teach. But how do you assign a numerical measurement to teachers who inspire and challenge children to read classic literature, explore scientific principles, create a piece of art, write a song, or run a 5K for the first time? In Florida, you would check to see how the kids did on their math FCAT. The system is so convoluted that one Hernando School District administrator correctly observed that the highest-rated teachers are likely the physical education staffers at A-rated schools.

“Like Florida’s controversial school grading system, these teacher evaluations, relying on the value-added model, are not credible and conflict with the school districts’ own performance standards. House Speaker Will Weatherford has said he wants to restore trust and integrity to the school grades, but he also champions a value-added concept for rating teachers — a model, he acknowledges, that is so complex he can’t explain it. Neither district administrators nor classroom teachers have confidence in this evaluation system. The Department of Education should toss its modeling and let districts devise an evaluation system for teachers that more accurately reflects the daily occurrences inside individual classrooms.”

If only other editorialists took the time to look at the VAM ratings, they too would conclude that this multimillion-dollar exercise in number-crunching is Junk Science.

Audrey Amrein-Beardsley reports that highly rated teachers are leaving the Houston public schools because of the erratic EVAAS measure. Seven teachers are suing the district over it.

In this post, she tells the story of a teacher with 15 years of experience who prefers to teach in high-needs schools.

“The one teacher highlighted in this piece, “holds a mathematics degree from the University of Houston, has taught all levels of high school mathematics for 15 years…and has repeatedly pursued assignments in high-needs schools with large Latino populations. While administrators, parents and peers have consistently rated him as a highly effective teacher, his EVAAS scores have varied wildly. While at [one district high school], he earned one of the highest EVAAS scores and year-end bonuses possible. Two years ago, teaching the same subject at [another high school] he received a below-average EVAAS score.” This teacher decided to leave the high-needs school in which his students’ performance apparently “biased” his results. He explained, “I can’t afford to be heroic. I want to be in the toughest schools, but the EVAAS model interprets my students’ challenges as my personal [and professional] failure.”

Teachers in training, she reports, are shunning Houston because of the flawed EVAAS.

Don’t forget: the purpose of EVAAS was to ensure that HISD had only “great teachers.” When will district leaders recognize it is driving away its best teachers?

