Archives for category: Teacher Evaluations

Levi Cavener, a special education teacher in Idaho, learned that Idaho will give the Common Core test (SBAC) to tenth graders even though it includes eleventh-grade content.

“However, I was shocked during this exchange when the Director told me that the decision was due to the fact that the state was worried students wouldn’t take the test seriously, and they didn’t want their data set tainted…because, you know, then the results wouldn’t be valid.

“Here is the Director’s response to my question of the logic in giving 10th graders the SBAC instead of 11th graders:

[The director said “Grade 11 is optional this year as your juniors have already met graduation requirements with the old ISATs and might not take the new tests seriously if they were used for accountability.”
Well, that’s convenient. I’m glad the State Department can cherry-pick the students who take the SBAC “seriously” and which students will not; I’m sure they will give that same privilege to teachers…oh..err…I guess not.]

See, here’s why my jaw was left open: The Director of Assessment admitted, rightly and logically, that if students won’t take the test seriously, then there is no point in assessing them because the data will be invalid. And, if that’s true, let’s not assess those kiddos, because it would be a total waste of time and resources, not to mention the fact that the data would be completely invalid.

Thus, it would be logical to conclude that if the data is not accurate, then the SDE surely wouldn’t want to tie those scores to something as significant as a teacher’s livelihood.

Oh wait…they want to do exactly that? Shucks!

According to the Idaho State Department of Education’s recent Tiered Licensure recommendations, SBAC data will be tied directly to a teacher’s certification, employment, and compensation.

Yet, if the Department of Education admits SBAC data isn’t accurate, then what in the world is it doing insisting that the data be tied to a teacher’s certification, employment, and compensation?

Insisting on tying evaluations to data that is admittedly invalid is like tying a fortune cookie to real-world events. I don’t know about you, but my lucky numbers haven’t hit the lottery; what a scam!”

The test is more than eight hours long.

Writes Levi, “Isn’t it logical to conclude that at some point kiddos decide they would rather go out to recess than keep closely reading a difficult text passage or spend more time editing a written response? When a kiddo makes that decision, do we hold the teacher responsible for the invalid data?”

And what about special education kids? “Let’s compound that scenario for special education teachers who work with a population of students qualifying for special education under the categories of Attention Deficit Hyperactivity Disorder, Emotional Disturbance, and Autism Spectrum diagnoses.

“Yup, I’m sure these students will always take the multi-day SBAC with the utmost earnestness; it’s not like the very behaviors they demonstrated to qualify for special education services to begin with would impede their ability to complete the SBAC with total validity of the results?”

Audrey Amrein-Beardsley posted a guest blog by a rising star in the Academy, Jimmy Scherrer of North Carolina State University, who previously taught in LAUSD.

Scherrer wrote:

“As someone who works with students in poverty [see also a recent article Scherrer wrote in the highly esteemed, peer-reviewed Educational Researcher], I am deeply troubled by the use of status measures—the raw scores of standardized assessments—for accountability purposes. The relationship between SES and standardized assessment scores is well known. Thus, using status measures for accountability purposes incentivizes teachers to work in the most advantaged schools.

“So, I am pleased with the increasing number of accountability systems that are moving away from status measures. In their place, systems seem to be favoring value-added estimates. In theory, this is a significant improvement. However, the manner in which the models are currently being used and how the estimates are currently being interpreted is intellectually criminal. The models’ limitations are obvious. But, as a learning scientist, what’s most alarming is the increasing use of the estimates generated by value-added models as a proxy for “effective” teaching…..”

“Typically, research studies on teaching and learning are framed using one of three perspectives: the behaviorist, the cognitivist, and the situative. Each perspective is associated with a different grain size. The behaviorist perspective focuses on basic skills, such as arithmetic. The cognitivist perspective focuses on conceptual understanding, such as making connections between addition and multiplication. The situative perspective focuses on practices, such as the ability to make and test conjectures. Effective teaching includes providing opportunities for students to strengthen each focus. However, traditional standardized assessments mainly contain questions that are crafted from a behaviorist perspective. The conceptual understanding that is highlighted in the cognitivist perspective and the participation in practices that is highlighted in the situative perspective are not captured on traditional standardized assessments. Thus, the only valid inference that can be made from a value-added estimate is about a teacher’s ability to teach the basic skills and knowledge associated with the behaviorist perspective.”

This, he writes, is “intellectually criminal” and “intellectually lazy.”

Tell it! VAM is Junk Science.

A teacher in Texas wrote this comment, which depicts (to me) a system where data matters more than teachers or learning or children; either the system is on autopilot, or it is run by people who confuse numbers with learning.

“They recruited from NC and from Spain (for bilingual teachers) this year because they did expect vacancies. I think it’s important to mention that not all evaluations are based on EVAAS, because not everyone has those standardized scores. They are also based on Stanford testing in 1st and 2nd grade and, for classes like PE, a district-made assessment. I teach Kinder and am still waiting to find out what growth they calculated for my scores last year (and yes, they were bubble-in multiple choice tests). No one could explain to me how it was going to work, what percentage growth was required to be considered effective, or how that was going to be calculated, so I’m very anxious about it. I was rated highly effective in the professional and instructional areas, but who knows. We are supposed to use 2 different assessments for more validity, but that doesn’t happen; they end up using the reading and math versions of the same test given the same week. I did wonder how many vacancies they had to start the new school year yesterday?”

Laura Chapman, a regular contributor to the blog, has worked in arts education for many years.

She writes:

This desire to churn the teaching workforce is not just a push from Bill Gates and lawsuits to dismantle unions.
Six economists/statisticians brought together at the Brookings Institution offered a similar plan. These number crunchers said that district-wide VAM (value-added) scores should be used to determine the most effective teachers, irrespective of the subjects and grade-levels they teach.

This proposal is efficient and absurd. It is based on the assumption that a district’s value added scores are so highly correlated with “non-value added” measures that employment decisions for all teachers can be based on the performance of teachers with value added scores.

Under this system, all teachers would also have a composite evaluation based on multiple measures such as end of course test scores, observations, and student surveys. Even so, the teachers with VAM scores would determine the employment fate of all teachers. How is this conclusion reached?

Here is the magical thinking: “For example, we would assume that the correlation between observationally-based ratings of teachers and value-added (scores) in math would be the same in history, where value-added measures are not available.”

In other words, the statisticians freely invent (impute) a missing metric for the history teacher by assuming a math teacher’s rating on a classroom observation protocol can be used as a substitute for the history teacher’s missing value added score.
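A toy sketch can make the imputation concrete. Everything below is made-up illustrative data (the Brookings paper offers a rationale, not code): fit the relationship between observation ratings and VAM scores among math teachers, then apply that fitted line to a history teacher's observation rating as if it produced a real VAM score.

```python
# Hypothetical illustration of the Brookings-style imputation described
# above. All numbers are invented for the sketch.

# Observation ratings (1-5 scale) and VAM scores for math teachers,
# the only group for whom both measures exist.
math_obs = [2.0, 3.0, 3.5, 4.0, 4.5]
math_vam = [-0.3, 0.0, 0.1, 0.3, 0.4]

# Fit a least-squares line vam = a + b * obs on the math teachers.
n = len(math_obs)
mean_obs = sum(math_obs) / n
mean_vam = sum(math_vam) / n
b = (sum((x - mean_obs) * (y - mean_vam) for x, y in zip(math_obs, math_vam))
     / sum((x - mean_obs) ** 2 for x in math_obs))
a = mean_vam - b * mean_obs

# A history teacher has only an observation rating; the plan assumes the
# math-teacher relationship transfers unchanged and "imputes" a VAM score.
history_obs = 3.5
imputed_vam = a + b * history_obs
print(round(imputed_vam, 3))
```

The leap is in the last two lines: nothing in the history teacher's own students ever enters the calculation, yet the output is treated as that teacher's value-added score.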

Those inferential leaps are just the beginning of a larger plan that would make all teacher evaluations “comparable” without any distinctions in grade level, or subject, or conditions under which teachers work.

The Brookings policy articulates principles for dismissing up to 25% of teachers in a district, on the assumption that this action plan would increase test scores and be “fair” to every teacher. The only exception to this formula might be for teachers of exceptional children. This case of econometric thinking ignores the educational, ecological, and substantive importance of different job assignments. See “Correlation,” Para. 5, in http://www.brookings.edu/research/reports/2011/04/26-evaluating-teachers

The Brookings paper is not radically different (except for the 25% churn) from a USDE plan to evaluate all teachers by a collective VAM for a school, limited to one of the “priority” subtests such as reading or mathematics. In Florida, for example, the schoolwide VAM in reading or math is assigned to art teachers and other teachers of nontested subjects. In other words, the curriculum and instruction that really matter are narrowed to the three R’s.

The use of a collective VAM focused on reading or math is a rapid and cost-effective way to meet federal or state requirements for teacher evaluation. Moreover, in 2014, a U.S. district judge ruled that evaluators in Florida are allowed to disregard a teacher’s job assignment in rating performance. The judge ruled that this practice is legal, even if it is unfair.

Teacher ratings based on a collective value-added score are likely to increase in states where Common Core State Standards (CCSS) are adopted and tested. The CCSS call for all teachers to improve student proficiency in English Language Arts and mathematics.

Although the American Statistical Association has denounced the practice of using VAM for rating individuals, that measure is unlikely to disappear as a tool for churning the workforce.

In the Obama/Duncan/McKinsey & Co. “RESPECT” project, for example, a teacher can only be judged “highly qualified” by producing more than a year’s worth of growth (gain in test scores) in three out of every five years. Teachers without that designation have shorter up-or-out criteria to meet.

This stack-ranking system, like the Brookings plan, banishes job security and churns the teaching workforce by insisting on one-size-fits-all criteria for “effective” teachers. http://www.ed.gov/blog/2012/02/launching-project-respect/

Peter Greene here evaluates a report by two analysts at Bellwether Education, a DC think tank, about how teachers should be evaluated. His post is a model of how to tear apart and utterly demolish the musings of people far removed from the classroom about how things ought to work.

He begins by situating its sponsor:

“I am fascinated by the concept of think tank papers, because they are so fancy in presentation, but so fanceless in content. I mean, heck– all I need to do is give myself a slick name and put any one of these blog posts into a fancy pdf format with some professional looking graphic swoops, and I would be releasing a paper every day.

“Bellwether Education, a thinky tank with connections to the standards-loving side of the conservative reformster world, has just released a paper on the state of teacher evaluation in the US. “Teacher Evaluation in an Era of Rapid Change: From ‘Unsatisfactory’ to ‘Needs Improvement.'” (Ha! I see what you did there.) Will you be surprised to discover that the research was funded by the Bill and Melinda Gates Foundation?”

He reviews what they describe as current trends and pulls each one apart.

Here is an example of a current trend and Greene’s response:

“3) Districts still don’t factor student growth into teacher evals

“Here we find the technocrat blind faith in data rearing its eyeless head again.”

The authors say: “While raw student achievement metrics are biased—in favor of students from privileged backgrounds with more educational resources—student growth measures adjust for these incoming characteristics by focusing only on knowledge acquired over the course of a school year.”

“This is a nice, and inaccurate, way to describe VAM, a statistical tool that has now been discredited more times than Donald Trump’s political acumen. But some folks still insist that if we take very narrow standardized test results and run them through incoherent number-crunching, the numbers we end up with represent useful objective data. They don’t. We start with standardized tests, which are not objective, and run them through various inaccurate variable-adjusting programs (which are not objective), and come up with a number that is crap. The authors note that there are three types of pushback to using said crap.

“Refuse. California has been requiring some version of this for decades, and many districts, including some of the biggest, simply refuse to do it.

“Delay. A time-honored technique in education, known as Wait This New Foolishness Out Until It Is Replaced By The Next Silly Thing. It persists because it works so often.

“Obscure. Many districts are using loopholes and slack to find ways to substitute administrative judgment for the Rule of Data. They present Delaware as an example of how futzing around has polluted the process and buttress that with a chart that shows statewide math score growth dropping while teacher eval scores remain the same.

“Uniformly high ratings on classroom observations, regardless of how much students learn, suggest a continued disconnect between how much students grow and the effectiveness of their teachers.

“Maybe. Or maybe it shows that the data about student growth is not valid.

“They also present Florida as an example of similar futzing. This time they note that neighboring districts have different distributions of ratings. This somehow leads them to conclude that administrators aren’t properly incorporating student data into evaluations.

“In neither state’s case do they address the correct way to use math scores to evaluate history and music teachers.”

After carefully pulling apart the report, here are the conclusions, theirs and his:

Greene reviews their recommendations:

“It’s not a fancy-pants thinky tank paper until you tell people what you think they should do. So Adelman and Chuong have some ideas for policymakers.

“Track data on various parts of new systems. Because the only thing better than bad data is really large collections of bad data. And nothing says Big Brother like a large centralized data bank.

“Investigate with local districts the source of evaluation disparities. Find out if there are real functional differences, or the data just reflect philosophical differences. Then wipe those differences out. “Introducing smart timelines for action, multiple evaluation measures including student growth, requirements for data quality, and a policy to use confidence intervals in the case of student growth measures could all protect districts and educators that set ambitious goals.

“Don’t quit before the medicine has a chance to work. Adelman and Chuong are, for instance, cheesed that the USED postponed the use of evaluation data on teachers until 2018, because those evaluations were going to totally work, eventually, somehow.

“Don’t be afraid to do lots of reformy things at once. It’ll be swell.

“Their conclusion

“Stay the course. Hang tough. Use data to make teacher decisions. Reform fatigue is setting in, but don’t be wimps.

“My conclusion

“I have never doubted for a moment that the teacher evaluation system can be improved. But this nifty paper sidesteps two huge issues.

“First, no evaluation system will ever be administrator-proof. Attempting to provide more oversight will actually reduce effectiveness, because more oversight = more paperwork, and more paperwork means that the task shifts from “do the job well” to “fill out the paperwork the right way” which is easy to fake.

“Second, the evaluation system only works if the evaluation system actually measures what it purports to measure. The current “new” systems in place across the country do not do that. Linkage to student data is spectacularly weak. We start with tests that claim to measure the full breadth and quality of students’ education; they do not. Then we attempt to create a link between those test results and teacher effectiveness, and that simply hasn’t happened yet. VAM attempted to hide that problem behind a heavy fog bank, but the smoke is clearing and it is clear that VAM is hugely invalid.

“So, having an argument about how to best make use of teacher evaluation data based on student achievement is like trying to decide which Chicago restaurant to eat supper at when you are still stranded in Tallahassee in a car with no wheels. This is not the cart before the horse. This is the cart before the horse has even been born.”

Zak Jason wrote a fascinating interview in “Boston” magazine with Barbara Madeloni, the recently elected president of the Massachusetts Teachers Association, the largest union in the state with 110,000 members.

I first learned of Madeloni when she was preparing teachers at the University of Massachusetts, Amherst, and she refused to give the Pearson test to evaluate new teachers. Michael Winerip wrote a story about her defiance in the New York Times, and within a matter of days, her contract was not renewed. Now all teacher candidates across the university are required to take the Pearson exam.

I learned many things from this article. I learned that Barbara was a psychotherapist before she became a high school English teacher. I learned that when she ran for union president, she was considered a very long shot. Some people thought she had no chance at all.

I learned that the State Commissioner of Education, Mitchell Chester, is also chair of the governing board of PARCC, one of the two federally-funded Common Core tests. Some in the state say he has a conflict of interest.

Madeloni has called for a three-year moratorium on all testing and teacher evaluations:

“We’ve been trying to do scale, instead of human beings. We need to do human beings,” she says. She lambasts the Common Core, a national set of curriculum standards that the state adopted in 2010, as “corporate deform,” and described its architects to CommonWealth magazine as “rich white men who are deciding the course of public education for black and brown children.”

“The past and present heads of the state’s top education offices I talked to dismiss Madeloni’s rhetoric as naive, absurd, and, in the case of the moratorium, illegal. Mitchell Chester, the commissioner of the state’s Department of Elementary and Secondary Education (DESE), says he’s concerned that her “hyperbolic” vision may force the DESE to tune out the entire union.”

Chester may dismiss her, but teachers view her as a savior. “She’s the first MTA leader willing to listen to their agony, and to tell the truth about how teaching in the age of accountability can be, as Holyoke teacher Cheri Cluff puts it, “like waiting tables at a busy restaurant; you’re running and running and running, and you’ve lost your head.” Whereas past presidents and her opponent, MTA vice president Tim Sullivan, were willing to compromise with state administrators, Madeloni is combative, unapologetic, and, as Agustin Morales, another Holyoke teacher, says, “unafraid to make her life uncomfortable.”

Morales, the article notes, was elected president of his local in Holyoke with a 70% majority; he complained about the data walls, where students’ names and test scores are publicly posted. He was fired.

Madeloni is a fighter. She is outspoken and unafraid. Will she be marginalized by the state? Can the state alienate its largest union? Watch for the battles ahead. Madeloni was elected to stand up for teachers. Richard Stutman of the Boston Teachers Union has agreed to collaborate with her.

Zak Jason concluded:

“When I first talked to Madeloni soon after her election, she agreed to have me follow her throughout her first week. But just before her presidency began, she told me, “As a psychotherapist, I know the presence of someone else in the room can affect how the room behaves,” and said she would only be available for an interview, and her communications director James Sacks would join.

“As I’m about to leave her office, Madeloni turns to Sacks and asks, half-joking, “Is there anything I didn’t say that I was supposed to say?”

“What’s your vision?” he says.

“That we reclaim the vision of public education as a space for democracy, for joy, for hope, for a better future for all of our children. All of our children.”

Peter Greene sees signs that educators are fed up with the top-down mandates from non-educator Arne Duncan, fed up with the failed punitive policies of NCLB and Race to the Top. Now we know that Washington cares about one thing only: test scores, and now we know that the beneficiaries of Washington’s obsession are the testing companies. We have now had nearly 15 years of test-based incentives and sanctions and ample evidence that this approach has driven joy out of learning and failed to achieve anything that benefits students or society.

As the school year begins, let’s hope that there will be more states following Vermont’s lead by rejecting federal mandates and setting forth their own vision of what good education looks like. Let’s hope that there will be more teachers like those in Chicago and at Garfield High in Seattle who insist on doing what’s right for their students. Let’s hope that there will be more superintendents like those in Washington State who were compelled by NCLB to send home a letter saying “we are a failing school,” but added a cover letter saying that it was not true. Let’s hope that integrity, courage, and candor break out everywhere.

Paul Horton here attempts to understand why the Obama administration is waging war on teachers. He reminds us of Central Falls, when the Obama administration supported firing the entire staff of the high school. He remembers when the administration was neutral during the Chicago teachers’ strike, and Arne Duncan’s support for the noxious Vergara decision. He could have mentioned many other instances of the administration’s hostility to teachers, such as Duncan’s support for the L.A. Times story releasing the names and ratings of teachers. Or the administration’s silence during the large demonstrations against Wisconsin Governor Scott Walker, or its silence as vouchers spread.

He writes:

“In sum, the war on teachers and due process for teachers is presented by many Democrats as a new war on poverty, and, somewhat obscenely, “the Civil Rights Movement of our time.” Last year Michelle Rhee, former chancellor of Washington D.C. Schools, made speeches at southern civil rights museums that proclaimed that supporting charter schools and making teachers accountable was the key to creating a more equitable America. Closing the achievement gap, and not the excuse of poverty, was the new focus of the new Civil Rights movement. The National Civil Rights Museum—Lorraine Motel in Memphis recently recognized Geoffrey Canada, a Harlem charter school operator and the star of the anti-public school documentary “Waiting for Superman,” as a “Civil Rights Hero.”

It was cheaper to wage war on teachers than to wage war on poverty. But that leaves so much unexplained. Why did President Obama embrace the Republican agenda of testing, accountability, and choice? Why did President Obama turn against one of the most reliable members of his party’s base? Horton doesn’t explain.

Remember that Arne Duncan said that there was too much testing, that testing was sucking the oxygen and joy out of classrooms? New York didn’t get the message. In that state, state tests count for 20% of educator evaluations, and local assessments count for another 20%. That is the agreement negotiated with the unions when the state won Race to the Top funding.

That was then, this is now.

The New York Board of Regents wants state test scores to count for 40% of the evaluations of teachers and principals. This report was confirmed to me by someone in Albany.

It matters not to the Regents that test-based evaluation is not working anywhere else. It matters not that AERA and the National Academy of Education warned against it, warned that it would incentivize teachers to avoid high-needs students. It matters not that the American Statistical Association warned against using test scores to rate individual teachers, since teachers account for only 1-14% of the variation in student scores.

The ASA said: “Attaching too much importance to a single item of quantitative information is counterproductive—in fact, it can be detrimental to the goal of improving quality. In particular, making changes in response to aspects of quantitative information that are actually random variation can increase the overall variability of the system.”
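The ASA's warning about reacting to random variation can be illustrated with a toy simulation (entirely hypothetical numbers, not from the ASA statement): give twenty teachers of identical true effectiveness pure-noise "VAM scores" for two years, then flag the bottom quartile each year, as a churn policy would.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# Twenty teachers with IDENTICAL true effectiveness; their observed
# "VAM scores" below are nothing but random variation.
teachers = [f"T{i}" for i in range(20)]

def bottom_quartile(scores):
    """Return the five lowest-scoring teachers out of twenty."""
    return set(sorted(scores, key=scores.get)[:5])

year1 = {t: random.gauss(0.0, 1.0) for t in teachers}
year2 = {t: random.gauss(0.0, 1.0) for t in teachers}

flagged1 = bottom_quartile(year1)
flagged2 = bottom_quartile(year2)

# Churning the "bottom" quartile mostly targets different people each
# year, because the ranking tracks noise, not quality.
print("flagged in both years:", len(flagged1 & flagged2))
```

Since every simulated teacher is identical, any overlap between the two flagged groups is coincidence; acting on such rankings is exactly the "making changes in response to... random variation" that the ASA says increases the variability of the system.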

Unlike the state of Vermont, which refuses to rate teachers and principals by test scores, New York’s Regents will plunge ahead, regardless of the damage they do to teachers, principals, students, and communities.

This editorial from the Tampa Bay Times was published in March, but I just discovered it and wanted to share it. Unlike the editorial writers in many other cities, the Tampa Bay Times went beyond the press releases and self-serving statements of public officials.

They pointed out that the ratings had a margin of error of 50%. “That means it is useless. Still, the state intends to base half of a teacher’s performance evaluation, and future pay, on this absurdity.

“As Tampa Bay Times staff writers Lisa Gartner and Cara Fitzpatrick reported, the state’s flawed system rates some of the region’s most honored teachers as low performers. Hillsborough County teacher of the year Patrick Boyko, a social studies teacher at Jefferson High School, scored a minus 10.23 percent, with a margin of error above 50 percent. Translation? His students scored 10 percent worse on the FCAT than typical children across the state, even though the Florida Comprehensive Assessment Test measures students in reading, writing, mathematics and science, but not social studies. Of course, it mattered little, since the margin of error larger than Boyko’s actual VAM score invalidated the whole process.”

“Even lawmakers had to acknowledge it wasn’t fair to judge teachers based on students’ performance in academic areas they do not teach. But how do you assign a numerical measurement to teachers who inspire and challenge children to read classic literature, explore scientific principles, create a piece of art, write a song, or run a 5K for the first time? In Florida, you would check to see how the kids did on their math FCAT. The system is so convoluted that one Hernando School District administrator correctly observed the highest rated teachers are likely the physical education staffers at A-rated schools.

“Like Florida’s controversial school grading system, these teacher evaluations, relying on the value-added model, are not credible and conflict with the school districts’ own performance standards. House Speaker Will Weatherford has said he wants to restore trust and integrity to the school grades, but he also champions a value-added concept for rating teachers — a model, he acknowledges, that is so complex he can’t explain it. Neither district administrators nor classroom teachers have confidence in this evaluation system. The Department of Education should toss its modeling and let districts devise an evaluation system for teachers that more accurately reflects the daily occurrences inside individual classrooms.”
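The margin-of-error point can be made concrete with the two figures the editorial reports (a sketch using simple interval arithmetic; Florida's actual model is far more opaque): a score of minus 10.23 with a margin of error above 50 yields a confidence interval that comfortably straddles zero.

```python
# The two figures reported by the Tampa Bay Times for teacher Patrick
# Boyko; everything else here is basic interval arithmetic.
vam_score = -10.23       # reported VAM estimate (percent)
margin_of_error = 50.0   # reported margin of error (percent)

low = vam_score - margin_of_error
high = vam_score + margin_of_error
print(f"confidence interval: [{low:.2f}, {high:.2f}]")

# The interval contains zero (and large positive values), so the data
# cannot distinguish this "low performer" from an average teacher, or
# even an excellent one.
print("indistinguishable from zero:", low < 0.0 < high)
```

An estimate whose uncertainty dwarfs the estimate itself carries no usable signal, which is precisely why the editorial calls the rating "useless."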

If only other editorialists took the time to look at the VAM ratings, they too would conclude that this multimillion dollar exercise in number-crunching is Junk Science.