Archives for category: Teacher Evaluation

Laura H. Chapman left the following comment. The word “desperate” to describe this quest for a scientific, data-based means of judging teachers is mine. Something about it smacks of anti-intellectualism: the kind of busywork exercise an engineer might design, especially one who had never taught K-12. This sort of made-up activity steals time from teaching and consumes many hours for minimal reward.

Chapman writes:

Please give at least equal attention to the 70% of teachers who have job assignments without VAMs (no state-wide tests). For this majority, USDE promotes Student Learning Objectives (SLOs) or Student Growth Objectives (SGOs), a version of 1950s management-by-objectives on steroids.

Teachers who have job-alike assignments fill in a template to describe an extended unit or course they will teach. A trained evaluator rates the SLO/SGO (e.g. “high quality” to “unacceptable” or “incomplete”).

The template requires the teacher to meet about 25 criteria, including a prediction of the pre-test to post-test gains in test scores of their students on an approved district-wide test. Districts may specify a minimum threshold for these gains.

Teachers use the same template to enter the pre- and post-test scores. An algorithm determines whether the gain meets the district threshold for expectations, then ranks teachers as below average, average, above average, or exceeding expectations.

1. The Denver SLO/SGO template is used in many states; this example is for art teachers: Denver Public Schools. (2013). Welcome to student growth objectives: New rubrics with ratings.

2. One of the first attempts to justify the use of SLOs/SGOs for RttT: Southwest Comprehensive Center at WestEd. (n.d.). Measuring student growth in non-tested grades and subjects: A primer. Phoenix, AZ: Author.

3. This USDE review shows that SLOs/SGOs have no solid research to support their use: Gill, B., Bruch, J., & Booker, K. (2013). Using alternative student growth measures for evaluating teacher performance: What the literature says (REL 2013–002). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Mid-Atlantic.

4. The USDE marketing program on behalf of SLOs/SGOs: Reform Support Network. (2012, December). A quality control toolkit for student learning objectives.

5. The USDE marketing campaign for RttT teacher evaluation and the need for district “communication SWAT teams” (p. 9): Reform Support Network. (2012, December). Engaging educators: Toward a new grammar and framework for educator engagement. Author.

6. Current uses of SLOs/SGOs by state: Lacireno-Paquet, N., Morgan, C., & Mello, D. (2014). How states use student learning objectives in teacher evaluation systems: A review of state websites. Washington, DC: U.S. Department of Education, Institute of Education Sciences.

7. Flaws in the concepts of “grade-level expectation” and “a year’s worth of growth”: Ligon, G. D. (2009). The optimal reference guide: Performing on grade level and making a year’s growth: Muddled definitions and expectations (Growth model series, Part III). Austin, TX: ESP Solutions.
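The scoring step Chapman describes (enter pre- and post-test scores, compare the mean gain to a district threshold, assign a rating band) amounts to a very simple computation. The sketch below is purely illustrative: the function name, threshold, cutoffs, and labels are all invented, since each district sets its own rules.

```python
# Illustrative sketch of the SLO/SGO scoring step Chapman describes.
# The threshold, band cutoffs, and labels are invented; real district
# rules vary and are set locally.

def rate_teacher(pre_scores, post_scores, district_threshold=10.0):
    """Classify a teacher by mean pre-to-post gain vs. a district threshold."""
    gains = [post - pre for pre, post in zip(pre_scores, post_scores)]
    mean_gain = sum(gains) / len(gains)
    if mean_gain >= 1.5 * district_threshold:
        return "exceeding expectations"
    if mean_gain >= district_threshold:
        return "above average"
    if mean_gain >= 0.5 * district_threshold:
        return "average"
    return "below average"

print(rate_teacher([50, 60, 55], [68, 72, 70]))  # mean gain 15.0 -> exceeding expectations
```

Note that a scheme like this classifies against an absolute threshold; Chapman's phrase "stack ranks" suggests some districts instead rank teachers against one another, which this sketch does not attempt.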

A few days ago, I published a post about a paper by Kirabo Jackson, explaining that the non-cognitive effects of teachers are often more important than the test scores of their students.


As it happened, mathematician Robert Berkman read the paper and explains here why it is another nail in the coffin of value-added measures, which judge teacher quality by the rise or fall of student test scores.


Berkman writes:


In this post, I’m going to examine one of the studies that no doubt had a profound impact on the members of AMSTAT that led them to this radical (but self-evident) conclusion. In 2012, the researcher C. Kirabo Jackson at Northwestern University published a “working paper” for the National Bureau of Economic Research, a private, nonprofit, nonpartisan research organization dedicated to promoting a greater understanding of how the economy works (I’m quoting here from their website.) The paper, entitled “Non-Cognitive Ability, Test Scores, and Teacher Quality: Evidence from 9th Grade Teachers in North Carolina” questions the legitimacy of evaluating a teacher based on his/her students’ test scores. Actually, it is less about “questioning” and more about “decimating” and “annihilating” the practice of VAM.


He adds:


What should be noted is that Jackson is not an educational researcher, per se. Jackson was trained in economics at Harvard and Yale and is an Associate Professor of Human Development and Social Policy. His interest is in optimizing measurement systems, not in taking positions on either side of the standardized testing debate. Although this paper has every reason to reek with indignation and anger, it makes its case in an almost understated tone and is filled with careful phrasing like “more than half of teachers who would improve long run outcomes may not be identified using test scores alone” and “one might worry that test-based accountability may induce teachers to divert effort away from improving students’ non-cognitive skills in order to improve test scores.”

But let’s get to the meat of the matter, because this paper is 42 pages long and incorporates mind-boggling statistical techniques that account for every variable one might want to filter out to answer the question: are test scores enough to judge the effectiveness of a teacher? Jackson’s unequivocal conclusion: no, not even remotely.


The only puzzle is why Arne Duncan keeps shoving VAM down the throats of states and school districts.



PS: Berkman added his credentials in a comment:

“I’m a math teacher who has worked with pre-K through college aged students for 30 years. My degrees are in Urban Studies, and Elementary Math Education. I have also done extensive work in neuroscience and numeracy, as well as technology and education, not to mention cognitive science.”



Kim Cook, a first-grade teacher in Florida, received a bonus of $400. She donated it to the Network for Public Education to fight the failed ideas of corporate reform, which prevail in her state.

She is the second teacher to donate a bonus to NPE to fight fake reforms that demean teachers and distort education. Not long ago, Kevin Strang, an instrumental music teacher from Florida, donated his $800 bonus, awarded because he teaches in a school that was rated A.

On behalf of NPE, we thank Kim and Kevin. We hope other teachers will follow their lead. We pledge to fight for you and to advance the day when non-educators and politicians stop meddling with your work and let you teach.

I asked Kim to tell me why she decided to do this. This was her reply:

“Hi Diane,

“Yes, I donated $400. I am a first grade teacher in Alachua County, Florida. I was inspired by Kevin Strang’s donation last month. I, too, received bonus money, not because I work at an “A” school, but because my school’s grade went from a “D” to a “C.”

“Here’s the catch: I don’t teach at the school that determines my school’s grade. I teach at Irby Elementary School in Alachua, Florida, which only serves grades K-2. My school’s grade is determined by students at the grade 3-5 school up the road.

“I have only been working at Irby Elementary for three years, so I have never met–never even passed in the hall–the fourth and fifth grade students whose FCAT scores determined my school’s grade. Even if I had, I completely disagree with high-stakes testing and tying teachers’ bonuses, salaries, and evaluations to those scores. I am donating my bonus money to NPE because I am fighting the failed policies of education “reformers” in every way that I can. Thank you for providing me an avenue through which to do that!

“Here is some background information on me. I am the Florida teacher who received an unsatisfactory evaluation, based on students I had never taught, at the same time I was named my school’s teacher of the year. My story made it into Valerie Strauss’ The Answer Sheet.

I am also the lead plaintiff in Florida Education Association/NEA’s lawsuit challenging the constitutionality of VAM.

With deep appreciation and respect,

Kim Cook

Arne Duncan may withdraw the waiver he extended to Washington State because it failed to adopt a test-based teacher evaluation system, as he demanded.

The first question is what this will mean for Washington State should Duncan withdraw the waiver. If the state reverts to the requirements of NCLB, then very likely every school and every district will be a “failing” school or district and therefore subject to draconian punishments, such as state takeover, takeover by a private management company, takeover by charter operators, or closure. In short, the entire state public school system would be privatized, subject to state control, or closed. The utter absurdity of NCLB would be on public display for all to see. That might be a valuable lesson for the nation, helping to hasten an end to a failed law.

Another interesting question that the Washington State issue raises is where Arne Duncan got the authority to set the terms of waivers from the law. Did Congress say he could do it? I don’t think so. Is it legal for him to create conditions that mirror Race to the Top requirements but without RTTT funding? Congress might want to know the answer to that question, especially Senator Patty Murray of Washington, who will not be happy to see her entire state branded a failure. Senator Murray is chair of the Senate Budget Committee and a member of the Senate Health, Education, Labor, and Pensions committee.

Third, why should he revoke his legally dubious waiver because a state fails to enact a program that has consistently failed wherever it was tried? Evaluating teachers by test scores has not worked anywhere and has drawn negative reviews from most education researchers, yet Duncan clings to it with religious faith.

Why should Washington State be punished for demonstrating good judgment, wisdom, and critical thinking?

Value-added measurement (VAM) produces ratings that are inaccurate and unstable. In Florida, about half of teachers don’t teach tested subjects, so they are assigned scores based on the scores of their school, meaning they are rated in relation to the scores of students they never taught and subjects they never taught. This Florida teacher explains why she was rated a 23.6583 out of 40, even though she teaches a non-tested subject.

This is irrational. Yet Arne Duncan has compelled almost every state to develop VAM ratings because he believes in them, even though there is no evidence for their value. How can a teacher’s quality be judged by the test scores of students she never taught? If that is not Junk Science, what is?

Bill Gates gave Hillsborough County, Florida, $100 million to evaluate teachers using value-added measurement. Here’s how the state’s Department of Education explains the formula in a department paper. I admit I don’t understand it. Many people don’t understand it. But whoever wrote it understands it.

Bill Gates said recently it would take at least ten years to see if this stuff “works.” I don’t think we have to wait ten years. “This stuff” doesn’t work. It doesn’t even make sense. Teachers of the gifted may be rated ineffective because their students have already hit the top, and their scores can’t go up any higher. Their ratings are Junk Science. When the same teacher gets a bonus one year but is rated ineffective the next, it shows how unstable the ratings are. That means they are not science; they are Junk Science.

There is so much more to the art and craft of teaching than standardized tests reveal. What matters most is not quantifiable, although peers and supervisors can indeed judge which teachers are best and worst. If the measure is not valid, if the measure is inaccurate and unstable, then it is wrong to use it to give people bonuses or to fire them.

In this post on her blog VAMboozled, Audrey Amrein-Beardsley reviewed a study of VAM which again identified the weaknesses of VAM. She writes:

Finally, these researchers conclude that, “even in the best scenarios and under the simplistic and idealized conditions…the potential for misclassifying above average teachers as below average or for misidentifying the ‘worst’ or ‘best’ teachers remains nontrivial.” Accordingly, misclassification rates can range “from at least seven to more than 60 percent” depending on the statistical controls and estimators used and the moderately to highly non-random student sorting practices and scenarios across …

Now, think about it. If the VAM rating can be wrong by as much as 60%, why would any school district use it to fire teachers? No wonder teachers are suing for wrongful termination! Call in the lawyers; VAM is Junk Science.
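The year-to-year flip-flop described above (a bonus one year, "ineffective" the next) is exactly what a noisy measure produces. A toy simulation, with entirely invented numbers and no connection to any real VAM model, illustrates the point: give each teacher a fixed "true" effect, add yearly measurement noise of comparable size, and see how little the bottom fifth overlaps from one year to the next.

```python
# Toy simulation (all numbers invented, not a real VAM model): teachers
# with fixed "true" effects get noisy yearly estimates; we count how
# often the same teachers land in the bottom fifth two years in a row.
import random

random.seed(1)
n = 1000
true_effect = [random.gauss(0, 1) for _ in range(n)]

def yearly_rating(effect):
    # Measurement noise assumed comparable in size to the true effect.
    return effect + random.gauss(0, 1)

def bottom_fifth(scores):
    # Indices of the lowest-rated 20% of teachers that year.
    cutoff = sorted(scores)[n // 5]
    return {i for i, s in enumerate(scores) if s < cutoff}

year1 = bottom_fifth([yearly_rating(e) for e in true_effect])
year2 = bottom_fifth([yearly_rating(e) for e in true_effect])
overlap = len(year1 & year2) / len(year1)
print(f"Teachers in the bottom fifth both years: {overlap:.0%}")
```

Under these assumed noise levels, well under half the "bottom" teachers stay in the bottom fifth the following year, even though nothing about the teachers changed; only the noise did.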

The Rochester Teachers Association is suing the state over its teacher evaluation system, alleging that it does not take into account the impact of poverty on classroom performance.

RTA says the evaluations are “junk science.”

“ALBANY, N.Y. March 10, 2014 – The Rochester Teachers Association today filed a lawsuit alleging that the Regents and State Education Department failed to adequately account for the effects of severe poverty and, as a result, unfairly penalized Rochester teachers on their APPR (Annual Professional Performance Review) evaluations.

“The suit, filed in state Supreme Court in Albany by New York State United Teachers on behalf of the RTA and more than 100 Rochester teachers, argues the State Education Department did not adequately account for student poverty in setting student growth scores on state tests in grades 4-8 math and English language arts. In addition, SED imposed rules for Student Learning Objectives and implemented evaluations in a way that made it more difficult for teachers of economically disadvantaged students to achieve a score of “effective” or better. As a result, the lawsuit alleges the Regents and SED violated teachers’ rights to fair evaluations and equal protection under the law.

“SED computes a growth score based on student performance on state standardized tests, which is then used in teacher evaluations.

“Nearly 90 percent of Rochester students live in poverty. The lawsuit says SED’s failure to appropriately compensate for student poverty when calculating student growth scores resulted in about one-third of Rochester’s teachers receiving overall ratings of “developing” or “ineffective” in 2012-13, even though 98 percent were rated “highly effective” or “effective” by their principals on the 60 points tied to their instructional classroom practices. Statewide, just 5 percent of teachers received “developing” or “ineffective” ratings.

“The State Education Department’s failure to properly factor in the devastating impact of Rochester’s poverty in setting growth scores and providing guidance for developing SLOs resulted in city teachers being unfairly rated in their evaluations,” NYSUT President Richard Iannuzzi said. “Rochester teachers work with some of the most disadvantaged students in the state. They should not face stigmatizing labels based on discredited tests and the state’s inability to adequately account for the impact of extreme poverty when measuring growth.”

“RTA President Adam Urbanski said an analysis of Rochester teachers’ evaluations for 2012-13 demonstrated clearly the effects of poverty and student attendance, for example, were not properly factored in for teachers’ evaluations. As a result, “dedicated and effective teachers received unfair ratings based on student outcomes that were beyond their control. The way the State Education Department implemented the state testing portion of APPR adds up to nothing more than junk science.”

Marc Tucker has written an excellent post on the failure of punitive accountability.

The working theory behind the Bush-Obama “reforms” is that teachers are lazy and need to be motivated by rewards and punishments and the threat of public shaming.

This is in fact a theory drawn from the early twentieth century writings of Frederick Winslow Taylor, who studied the efficiency of factory workers.

Tucker writes:

Let’s start by examining the premises behind the prevailing system. The push for test-based accountability systems to evaluate teachers has its origin in the work of a professor of agricultural statistics in Tennessee who discovered that differences in teacher quality, as measured by analyses of student test scores over time, accounted for very large differences in student performance. Many observers concluded from this that policy should concentrate on using these statistical techniques to identify poor teachers and remove them from the teaching force. At the same time, other observers, believing that parents would choose effective schools for their children over ineffective schools if only they had information as to which schools are effective, pushed to use student test data to identify and publicly label schools based on the available test score data. And, finally, policymakers passed the NCLB legislation, requiring the identification of schools as chronically underperforming, with remedies involving the replacement of school leaders and staff and, in extreme cases, closing schools down.

All of these accountability systems are essentially punitive in design and intent.  They threaten poor performing schools with public shaming, takeover and closure and poor performing individuals with public shaming and the loss of their jobs and livelihood.  The introduction of these policies was not accompanied by policies designed to improve the supply of highly qualified new teachers by making teaching a more attractive option for our most successful high school students—a key component of policy in the top-performing countries.  There is a lot of federal money available for training and professional development for teachers but no systematic federal strategy that I can discern for turning that money into systems of the kind top-performing countries use to support long-term, steady improvements in teachers’ professional practice.  I conclude that policymakers have placed their bet on teacher evaluation, not to identify the needs of teachers for development, but to identify teachers who need to be dismissed from the service.  And, further, that the way to motivate school staff to work harder and more efficiently is to threaten them with public shame and the loss of their job.

Race to the Top incorporates the ideas of economist Eric Hanushek of the Hoover Institution at Stanford University, who has argued in various writings that the way to improve results (test scores) is to “deselect” the bottom 5-10% of teachers based on the test scores of their students.

As Tucker shows, modern cognitive psychology recognizes that people are motivated to do their best not by humiliation and punishment, but by a sense of purpose, professionalism, and autonomy. Unfortunately, neither our Congress nor the policymakers in the Obama administration are familiar with modern cognitive psychology, with the work of scholars and writers like Edward Deci, Dan Ariely, and Daniel Pink, or with the organizational theory of W. Edwards Deming, who recognized that people want to do their best and must be allowed and encouraged to do it, not threatened with dire punishments.

Jersey Jazzman warns that New Jersey’s new teacher evaluation plan is expensive, wasteful, inaccurate, and has no basis in research whatever. Other than that….it stinks.

In short, he calls it Operation Hindenburg, and if you don’t know about the Hindenburg, I suggest you google it. (Watch out, as the data miners will start offering you bargain deals on used blimps.)

New Jersey’s new teacher evaluation system — code name: Operation Hindenburg — is not cheap. Superintendents around the state have been warning us about this for a while: the costs of this inflexible system are going to impose a significant financial burden on districts, making this a wasteful, unfunded mandate.

JJ writes:

But if you don’t believe me, and you don’t believe these superintendents, why not listen to a couple of scholars who have produced definitive proof of the exorbitantly high costs of AchieveNJ:

In 2012, the New Jersey State Legislature passed and the Governor signed into law the Teacher Effectiveness and Accountability for the Children of New Jersey (TEACHNJ) Act. This brief examines the following questions about the impact of this law:
• What is the effect of intensifying the teacher evaluation process on the time necessary for administrators to conduct observations in accordance with the new teacher evaluation regulations in New Jersey?
• In what ways do the demands of the new teacher evaluation system impact various types of school districts, and does this impact ameliorate or magnify existing inequities?
We find the following:
On average, the minimum amount of time dedicated solely to classroom observations will increase by over 35%. It is likely that the other time requirements for compliance with the new evaluation system, such as pre- and post-conferences, observation write-ups, and scheduling will increase correspondingly.
The new evaluation system is highly sensitive to existing faculty-to-administrator ratios, and a tremendous range of these ratios exists in New Jersey school districts across all operating types, sizes, and District Factor Groups. There is clear evidence that a greater burden is placed on districts with high faculty-to-administrator ratios by the TEACHNJ observation regulations. There is a weak correlation between per-pupil expenditures and faculty-to-administrator ratios.
The change in administrative workload will increase more in districts with a greater proportion of tenured teachers because of the additional time required for observations of this group under the new law.
The increased burden the TEACHNJ Act imposes on administrators’ time in some districts may compromise their ability to thoroughly and properly evaluate their teachers. In districts where there are not adequate resources to ensure administrators have enough time to conduct evaluations, there is an increased likelihood of substantive due process concerns in personnel decisions such as the denial or termination of tenure. [emphasis mine]



Grover (Russ) Whitehurst is worried that the public is turning against standardized testing. As George W. Bush’s director of education research, he was and is a true believer in testing. As head of the Brown Center at Brookings, once known as a bastion of liberal thought, Whitehurst wants to see the programs he tended under Bush’s NCLB survive.

Yet they are, as he puts it, “in a bit of trouble.”

He is upset to see that New York City elected a new mayor who does not share his love of testing, accountability, and choice. Bill de Blasio is a progressive Democrat.

He is not happy that the Texas legislature rolled back some of its testing requirements, responding to public protest.

But most of all, he is upset that Linda Darling-Hammond, who is senior advisor to one of the federally funded testing consortia, recommends testing in only a single grade in each of the three levels of schooling: elementary school, middle school, and high school.

He frets: What would that do to teacher assessment? How could growth scores be calculated?

Whitehurst’s recommendation: we should test more, not less!

I am not sure I follow the logic here.

How will more testing quell the growing rebellion against testing? There will be more angry moms and dads, and more Bill de Blasios elected. Maybe he is on to something.

This post was written by Charles J. Morris, Emeritus Professor of Psychology at Denison University, who lives in Indianapolis.

Does the ISTEP Measure School Quality and Teacher Effectiveness?

Charles J. Morris¹

While there appears to be general agreement that teachers can make a big difference in the lives of students, there is little evidence that performance on standardized tests provides a valid assessment of teacher effectiveness. Nonetheless, at the national, state, and local levels, we are seeing increasing use of test scores to evaluate both schools and teachers, to award merit pay, and even to sanction low-performing schools and corporations.

This growing trend toward using test scores to evaluate schools and teachers fails to recognize the evidence that factors beyond the control of schools account for most of the variation we see in test scores among school districts throughout a given state. Matthew Di Carlo of the Shanker Institute sums it up this way: “…roughly 60 percent of achievement outcomes is explained by student and family background characteristics…schooling factors explain roughly 20 percent, most of this (10-15 percent) being teacher effects.”² (The remaining variation is unexplained and considered error variance.) What this basically means is that schools and teachers are being judged to a substantial degree on the basis of factors over which they have little control.

Is the above conclusion also true for the ISTEP, Indiana’s test for measuring student performance and evaluating school quality and teacher effectiveness? The purpose of this short piece is to briefly summarize some evidence which indicates that the same conclusion holds for the ISTEP: Out-of-school factors, namely the socioeconomic profile (SES) of a school district, explain most of the variation we see in test performance from one district to the next.

Consider, for example, the following chart, which shows the percent of students who passed both the ELA (English/Language Arts) and Math portions of the 2013 ISTEP as a function of the percentage of students in the corporation (Indiana’s term for a district) who qualify for free or reduced-price lunches (FRPL, a commonly used measure of SES):


These data are based on the 56 corporations that have at least 5,000 students. As can be seen, there is a very strong correlation between the two variables: the higher the percentage of kids who qualify for FRPL, the lower the passing percentage. Put another way, if we know the socioeconomic profile of a corporation, we can make a very good prediction of where that corporation stands compared to other corporations on the ISTEP. This should not be a surprise to those familiar with the research literature. The same relationship has been found for the various standardized tests used throughout the country.
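Morris's claim, that FRPL percentage alone predicts a corporation's relative standing, is a statement about correlation, and it is easy to check once the district-level figures are in hand. The numbers below are invented placeholders (not actual Indiana data), chosen only to show the shape of the computation:

```python
# Pearson correlation between % FRPL and % passing, using invented
# placeholder data (not actual Indiana corporation figures).

frpl_pct = [12, 25, 38, 51, 64, 77, 90]   # hypothetical districts
pass_pct = [88, 80, 71, 60, 49, 41, 30]

n = len(frpl_pct)
mean_x = sum(frpl_pct) / n
mean_y = sum(pass_pct) / n
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(frpl_pct, pass_pct))
sd_x = sum((x - mean_x) ** 2 for x in frpl_pct) ** 0.5
sd_y = sum((y - mean_y) ** 2 for y in pass_pct) ** 0.5
r = cov / (sd_x * sd_y)
print(f"r = {r:.3f}")  # strongly negative: more poverty, lower pass rate
```

With the real 56-corporation data substituted in, the same few lines would quantify the relationship the charts display.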

The above results are based on the performance of all students in each corporation. The following charts show the results separately for 3rd and 8th graders:


Again, we see the same pattern for both grade levels, basically unchanged after five years of schooling in a high-scoring or low-scoring corporation. The SES influence is quite strong, independent of the schools and teachers in a particular corporation. In fact, if anything, the SES impact appears to become slightly stronger as students progress from 3rd to 8th grade.

So what are we to make of this obvious association between ISTEP scores and SES? The seemingly inescapable conclusion is that corporations and teachers deserve neither praise nor criticism for how their students compare to other corporations and teachers. Clearly, the socioeconomic profile (SES) of the corporation plays a decisive role. So I ask a simple question: Does anyone seriously believe that if Carmel and Gary (a high- and low-performing corporation, respectively) exchanged teachers, the ISTEP scores would suddenly reverse themselves? I don’t think so.

The challenge thus becomes how to respond to the fact that poorer kids are not performing well in our schools. Is there less parental involvement in these communities? Are expectations lower? Do these parents need additional help in becoming more effective mentors? Are after-school tutoring programs a possible solution? What about summer programs? Or pre-school programs? Perhaps all of the above, along with addressing the well-documented and devastating effects that poverty has on the health and well-being of poor children long before they even enter school.³

But one thing seems clear: Judging school and teacher quality on the basis of test scores offers little in the way of a solution. We need to look beyond our schools and teachers if we are going to better prepare all kids for the world they will face in the days ahead.

¹ Charles J. Morris is an Emeritus Professor of Psychology from Denison University. He resides in Indianapolis.

² Matthew Di Carlo, Shanker Institute.

³ Diane Ravitch, Reign of Error (New York: Knopf, 2013), pp. 91–98.

