Rachel E. Gabriel and Sarah L. Woulfin of the University of Connecticut ask a simple but very important question: Isn’t it time to redesign teacher evaluation? Most states are stuck with laws they wrote to apply for Race to the Top funding. Nearly a decade has passed. We now know that test-based evaluation has failed. Why are so many states and districts holding on to a failed strategy for evaluating teachers? Is it inertia? Apathy?
The model in use is obsolete. It failed. It is time to move on.
“Under RTT, teacher-evaluation policies were designed using economic theories of motivation and compensation and statistical growth tools such as value-added measurement. Evaluation policies based on principles of economics and corporate management have failed to take into account the complex and personalized work of educating students.
“While evaluation aims to address teacher performance and quality, what we don’t see is acknowledgement of teacher voice and choice in how policies affect their work. We need to create learning-focused evaluation policies for teachers that enable both students’ and teachers’ growth and align with the needs of schools, students, and communities.
“It’s clear to most educators that the current crop of teacher-evaluation systems is flawed, overwrought, and sometimes just plain broken. Detailed case studies demonstrate that some states now spend millions of dollars on contracts with data-management companies and statistical consulting firms. Many states and districts make similar investments despite the fact that researchers and policymakers question the wisdom of value-added measurement within high-stakes teacher evaluations.
“There is now an entire industry devoted to the evaluation of teaching and the management of student data. There are online professional-development video databases and classroom-walkthrough apps for school leaders—which have not demonstrated a positive effect on instruction. But all of them have inflated the edu-business marketplace…
“A learning-focused teacher-evaluation policy would create the organizational and social conditions teachers need to thrive. During goal-setting with administrators, teachers would work together to write challenging, yet attainable, goals for themselves and their students. They would also have professional-development opportunities to learn about different types of student-progress measurement tools to refine what works best. And in feedback meetings with school leaders, teachers would have space to reflect upon areas of their success and weakness. In turn, principals would devote time and energy to framing evaluation as an opportunity to learn about—rather than judge—teaching.
“To begin the transition toward this kind of evaluation, state and district administrators must shift the balance of resources away from measuring and sorting teachers into categories. School leaders must focus on subject-specific questions about teaching and learning, rather than applying a generic set of indicators. And instead of boiling teachers’ work down to a rating, leaders must share observations that help teachers extend what they do well and identify where they can grow.
“Only when we involve teachers in the process of evaluation policymaking will we come up with a system that supports and develops the teaching expertise students deserve.”
A couple of years ago, we organized a survey of teachers and administrators, focused on their experiences with current teacher evaluation systems. The responses showed widespread dissatisfaction. A team of teachers reviewed the responses and came up with six recommendations on how to improve the process. Please take a look here.
https://networkforpubliceducation.org/2016/04/6468/
Here are those recommendations:
• An immediate halt to the use of test scores as any part of teacher evaluation.
• Teacher collaboration should not be tied to evaluation but instead be a teacher-led cooperative process that focuses on their students’ and their own professional learning.
• The observation process should focus on improving instruction, resulting in reflection and dialogue between teacher and observer; the result should be a narrative, not a number.
• Evaluations should require less paperwork and documentation so that more time can be spent on reflection and improvement of instruction.
• An immediate review of the impact that evaluations have had on teachers of color and veteran teachers.
• Teachers should not be “scored” on professional development activities.
This is a horrendous and incoherent piece of bureaucrat-speak, hard to comprehend even after reading each sentence twice. I would expect this from a Washington, DC functionary, but not from a teacher. Some imperatives are statements, others are commands. Why is “scored” in quotes? Also, I did not know that one could say “halt to.”
Anthony, I wonder about the TN responses on your survey.
A recent Harvard study (sponsored by the Gates Foundation) praises TN educational policy, saying the state must be doing something right since its scores on NAEP have improved a great deal since 2009. A Chalkbeat article (Chalkbeat is also sponsored by the Gates Foundation) about the Harvard study offers this as an explanation for TN’s success:
Since 2010, higher academic standards have been an integral part of Tennessee’s long-term plan for improving public education. The other two components are an aligned state assessment and across-the-board accountability systems for students, teachers and schools, including a controversial policy to include student growth from standardized test scores in teacher evaluations.
https://www.chalkbeat.org/posts/tn/2018/05/22/from-an-f-to-an-a-tennessee-now-sets-high-expectations-for-students-says-harvard-study/
My understanding is that the TN teacher evaluation system, called TVAAS, is VAM under a different name, but it would be great if somebody more in the loop could say something more definite.
What I can say with conviction is that the so-called new TN standards are minor modifications of Common Core, despite the 18 months of work put into them. Here is a side-by-side comparison of the first 10 Kindergarten math standards (poor kindergartners!!):
Click to access cc_tncc.pdf
I poked into the new TN standards in many places, and found the same extreme similarity with Common Core. TN officials, on the other hand, claim that “Teachers are excited” about the new standards.
https://www.chalkbeat.org/posts/tn/2016/04/15/at-long-last-phase-out-of-common-core-is-official-in-tennessee/
“We now know that test-based evaluation has failed.”
Hell, we knew back then that it would fail.
There is no need to reinvent the wheel in order to change to a more authentic, useful type of evaluation. All that is needed is to dust off some of the cobwebs from evaluations done before Gates inserted himself into policies. The most legitimate form of evaluation is local, with local school districts setting the criteria and format for evaluation. The feedback provided is designed to help teachers develop their craft. There is no need to spend millions on a third-party vendor to accomplish this. While this is not a perfect system, it is far better and more useful than quasi-scientific VAM.
“All that is needed is to dust off some of the cobwebs from evaluations done before Gates inserted himself into policies.”
But it’s not new and shiny (and no one can make beaucoup bucks off of it)!
Bill Gates still wants to “stack rank” teachers and public schools, but he tried that at Microsoft and “stack ranking” failed there too.
In 2014, “Microsoft officially killed that old system, known as ‘stack ranking,’ in November. With stack ranking, no matter how well an individual performed, that person might still get a bad review because a certain percentage of folks had to be ranked as bottom performers compared to their co-workers.” …
“Another big change in the new system, across Microsoft including SMSG, is that employees will be getting regular feedback from their managers on how they are doing on their core priorities, with adjustments to their annual goals made throughout the year. They will no longer be reliant on an annual review, our source said.” …
“nor is Microsoft the only tech company that used it. GE’s CEO Jack Welch is credited with making it popular in the 1980s. He called it Rank and Yank meaning the bottom 10% would get fired, reports Forbes. We’ve heard from employees at other large tech companies who tell us they’ve experienced the same sort of thing.”
http://www.businessinsider.com/microsofts-old-employee-review-system-2014-8
Erm, now where did Gates, or whoever else at MS (Ballmer, probably?), get the idea of stack ranking? Duh, they got it from colleges and even some high schools, where it is called (surprise!) the “bell curve,” or just “curving.” I have always hated curving: it is unfair to high achievers, it always “finds” low achievers, and it allows even a bad professor to look good. So what if his questions on a test did not make any sense? The curve will cover this up and still reveal the As and the Bs and the Cs; just set arbitrary cutoff values. Sweet!
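The mechanics being objected to here are easy to make concrete. Below is a minimal, purely illustrative sketch (the cutoff fractions, scores, and function name are all invented for this example) of grading on a curve: grades are assigned by rank alone, so even uniformly low raw scores still yield a full spread of letters.

```python
# Illustrative only: grading "on a curve" with arbitrary cutoffs.
# Assumes distinct scores, for simplicity.

def curve_grades(scores, cutoffs=((0.10, "A"), (0.30, "B"), (0.70, "C"), (0.90, "D"))):
    """Assign letter grades by rank alone: the top 10% get an A, the next
    20% a B, and so on, no matter how good or bad the raw scores are."""
    ranked = sorted(scores, reverse=True)      # best score first
    n = len(ranked)
    grades = {}
    for i, score in enumerate(ranked):
        frac = (i + 1) / n                     # fraction of the class at or above this rank
        grade = "F"
        for limit, letter in cutoffs:
            if frac <= limit:
                grade = letter
                break
        grades[score] = grade
    return [grades[s] for s in scores]

# Even if every question on the test was nonsense and all raw scores are
# dismal, the curve still "reveals" As, Bs, and Cs:
print(curve_grades([12, 9, 15, 7, 11, 8, 10, 14, 6, 13]))
```

Note that nothing in the procedure ever looks at what the scores mean; the grade distribution is fixed in advance by the cutoffs, which is the commenter’s complaint.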
I never graded on a bell curve, and in the thirty years I taught, I didn’t know any teachers who graded on a bell curve. In fact, when I was teaching 8th grade, I had one English class where no one earned a failing grade. Grades were based on the work students did, and if they all did the work, it was possible that everyone could earn an A. It was even possible to earn an A in the classes I taught even if the final exam was failed, because tests were never worth more than 10 percent of the grade, and there were also challenging assignments for all of my students if they were willing to do them. Because of those assignments, there were students in all my English classes who earned 130 percent or higher, and 90 percent was an A-.
And when I was the journalism advisor for the international, national and regional award-winning high school paper for seven years and also taught one section of journalism as a class, not one student ever failed. Most earned As because they did everything and did it on time — never missed a deadline.
In fact, I was not alone among K-12 teachers in how I graded, and DA’s rant about the bell curve is wrong, because “In the U.S., strict bell-curve grading is rare at the primary and secondary school levels (elementary to high school) but is common at the university level.”
https://www.k12academics.com/education-assessment-evaluation/bell-curve-grading
Lloyd, you clearly have issues with close reading; that is, you do not read closely. I wrote “colleges and even some high schools,” which is pretty much the same as “rare at the primary and secondary school levels (elementary to high school) but is common at the university level.” I did not even mention elementary and middle schools.
But thanks for explaining how you used to grade.
Loud Laughter from me. Close Reading, seriously, for a comment from someone who is clearly biased, a sock puppet, and a failed troll. In fact, I scan most posts and comments and skip the really long ones. The reason is simple: too many of them and not enough time.
“colleges and even some high schools,” which is pretty much the same as “rare at the primary and secondary school levels (elementary to high school) but is common at the university level”? No, it’s not. Nice try. You would have done better to say you had no idea what common practice is at the primary and/or secondary level.
Lloyd, so you confirm that you answer without fully comprehending the comment you are responding to? Sure, why not, kilobytes are pretty much free. But do not expect people to answer your muddled diatribes.
Play your games BackAgain. I have nothing but contempt for your games. Your attempt to belittle or embarrass me accomplished nothing except in your biased, ignorant head.
The test-based “Value-Added Method” (VAM) of evaluating teachers has been “slammed” — quoting The Washington Post — by the very people who know the most about data measurement: the American Statistical Association (ASA). The ASA’s authoritative, detailed, VAM-slamming analysis, titled “Statement on Using Value-Added Models for Educational Assessment,” has become the basis for teachers across the nation successfully challenging VAM-based evaluations.
Even though it’s anti-public school and anti-union, the Washington Post said the following about the ASA Statement: “You can be certain that members of the American Statistical Association, the largest organization in the United States representing statisticians and related professionals, know a thing or two about data and measurement. The ASA just slammed the high-stakes ‘value-added method’ (VAM) of evaluating teachers that has been increasingly embraced in states as part of school-reform efforts. VAM purports to be able to take student standardized test scores and measure the ‘value’ a teacher adds to student learning through complicated formulas that can supposedly factor out all of the other influences and emerge with a valid assessment of how effective a particular teacher has been. THESE FORMULAS CAN’T ACTUALLY DO THIS (emphasis added) with sufficient reliability and validity, but school reformers have pushed this approach and now most states use VAM as part of teacher evaluations.”
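To see why the formulas “can’t actually do this,” it helps to look at the basic move a value-added model makes. The toy calculation below is an invented illustration, not TVAAS or any real VAM (those use far more elaborate mixed-effects models with many covariates): predict each student’s score from a prior score, then credit the teacher with the leftover residual, a residual that also absorbs class size, poverty, peer effects, and plain luck. The data and names are made up.

```python
# A toy "value-added" calculation, for illustration only. Real VAM systems
# are far more elaborate; the point here is just the basic move: predict
# each student's score, then attribute the leftover residual to the teacher.
from statistics import mean

def value_added(students):
    """students: list of (teacher, prior_score, current_score) tuples.
    Returns {teacher: mean residual}, treating the residual as the teacher's
    'added value' -- the step the ASA statement warns about, since the
    residual also absorbs everything the model failed to account for."""
    # Naive prediction: every student is expected to gain the cohort average.
    avg_gain = mean(cur - prior for _, prior, cur in students)
    residuals = {}
    for teacher, prior, cur in students:
        residuals.setdefault(teacher, []).append((cur - prior) - avg_gain)
    return {t: mean(r) for t, r in residuals.items()}

data = [("Ms. A", 60, 68), ("Ms. A", 70, 74),   # gains of 8 and 4 points
        ("Mr. B", 55, 61), ("Mr. B", 65, 73)]   # gains of 6 and 8 points
print(value_added(data))
```

With only a couple of students per teacher, a single lucky or unlucky test day flips the sign of a teacher’s “value” entirely, which is the reliability problem the ASA and the Post are describing.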
The ASA Statement points out this and many other failings of testing-based VAM: most studies find that teachers account for only a small share, roughly 1% to 14%, of the variability in student test scores, and that the majority of opportunities for improvement lie in “system-level conditions.”
“System-level conditions” include everything from overcrowded and underfunded classrooms to district- and site-level management of the schools to student poverty.
A copy of the VAM-slamming ASA Statement should be posted on the union bulletin board at every school site in the nation, and unions should explain it to every teacher at individual site faculty meetings, so that teachers are aware of what it says about how invalid it is to use standardized test results to evaluate teachers or principals. Teachers’ and principals’ unions should fight all evaluations based on student test scores, with the ASA Statement as a good foundation for that fight.
Oh, that your argument, that VAM abuse “should be explained to every teacher by their union at individual site faculty meetings so that teachers are aware of what it says about how invalid it is to use standardized test results to evaluate teachers or principals…,” could somehow backtrack and undo so many long, long YEARS of union collaboration with the teacher-evaluation game. How often have teachers heard from one union worker or another that “we have to use testing to find the bad teachers…”?
Also, student evaluations of teachers need to be revisited. There’s a wave of revamping or eliminating these going through higher ed right now, yet K-12 seems silent on the issue: https://www.insidehighered.com/news/2018/05/22/most-institutions-say-they-value-teaching-how-they-assess-it-tells-different-story
At the end of each year I had the students fill out a survey, even “grade” me. There was room for commentary. I learned a bit, even modified what I did due to the students’ evaluations, but generally the students confirmed that what I was doing in the class was working. No one else saw them but me. I know of very few other teachers who did the same sort of thing. I found it useful, and the students usually enjoyed expressing themselves and getting to “grade” the teacher. I told them up front how it was going to be used and that it had nothing to do with their grades; as a matter of fact, I didn’t look at them until grades were in.
I started that about 8 years ago as well, and I’ve found some of the evaluations to be useful…enough so that I too have made changes to my practice.
My students responded similarly to yours. They enjoyed the exercise, and the vast majority took it seriously. My relationship with my students is one based on mutual respect, and their responses show that.
This is an excellent idea for secondary teachers. Even making an audio and/or video recording of a lesson can be useful for self-reflection.
I do the same thing every year, Duane. The reflections (that’s what I call them) help me to plan lessons that are more engaging for the kids, and I learn a lot.
Absolutely, if these are what you describe. Unfortunately, many schools have centralized eval forms which create another data point for admins…
It was between the students and me. No need for an adminimal to get involved. Hell, the adminimals wouldn’t even sit down with me for an hour or so, so that I could explain what I was doing in class during the year. When I tried their eyes would glaze over after 5 minutes. Sad, very sad!
The ideas in this link come from professors concerned with literacy. Therein rests a problem.
Demands for unambiguous measures of teacher effectiveness are hard to kill because they are embedded in federal and state policies. There is also a difference between teacher-initiated surveys of their own students and the surveys now marketed to data-hungry administrators.
Here is information for David F and others about student surveys.
Here is a link to one of the first of the relatively new surveys, developed by economist Ron Ferguson and piloted for the Gates-funded Measures of Effective Teaching study. This survey, now marketed as the Tripod survey, has an embedded image of an “effective” teacher independent of the subject matter being taught. In my judgment the survey is skewed to reward teachers who assign and check homework and who function both as a helicopter monitor of students and as the “sage on the stage” giving assignments. The first version of the Tripod had a long list of questions about learning in the home environment. You can see that these were really out of bounds, intrusive, and would not be allowed by FERPA privacy laws. Here is the early elementary school version with those questions, questions NOT present in the current version: http://www.sd394.com/images/MET_Project_Elementary_Student_Survey.pdf
This report raised good questions about the Gates-funded student surveys: http://k12education.gatesfoundation.org/download/?Num=2504&filename=Asking_Students_Practitioner_Brief.pdf
You can see how the Tripod surveys are now marketed here http://tripoded.com/about-us-2/
Here is a link to student surveys from Panorama Education. These are being used in the CORE districts of California as part of their School Quality Improvement Index. https://www.panoramaed.com/panorama-student-survey
The current scales include students’ perceptions of:
Classroom Climate – the overall feel of a class including aspects of the physical, social and psychological environment;
Engagement – their behavioral, cognitive, and affective investment in the subject and classroom;
Grit – their ability to persevere through setbacks to achieve important long-term goals;
Learning Strategies – the extent to which they use metacognition and employ strategic tools to be active participants in their own learning process;
Mindset – the extent to which they believe that they have the potential to change those factors that are central to their performance in a specific class;
Pedagogical Effectiveness – the quality and quantity of their learning from a particular teacher about that teacher’s subject area;
Rigorous Expectations – whether they are being challenged by their teachers with high expectations for effort, understanding, persistence, and performance in the class;
School Belonging – the extent to which they feel that they are valued members of their school community;
Teacher-Student Relationship – the overall social and academic relationship between students and their teachers; and
Valuing of the Subject – how interesting, important, and useful a particular school subject seems.
I think these surveys are over-theorized.
Here is a current student survey from the University of Chicago, licensed for use in many districts. https://www.uchicagoimpact.org/sites/default/files/2018%20CPS%205Essentials%20Student%20Codebook.pdf
I am for peer collaborations and reflection on practice, but these require time to arrange, low stakes, and job-alike assignments before scaling up.
Student surveys are 20% of a teacher’s evaluation in Utah, and they are a joke. For the little kids, they are colors: red, yellow, and green. The kids get bored and change colors, and what about color-blind kids?
For the older kids, the surveys are done outside of school time, and very few kids do it.
Parents are supposed to do it, too, and even fewer parents do the evaluations. All we get is a number for each statement. There are no comments, so teachers truly don’t get any information from them.
And I’m sure the adminimals make sure the system is “working”, eh!
“Why are so many states and districts holding on to a failed strategy for evaluating teachers? Is it inertia? Apathy?”
Why? It’s because of Bill Gates’ dark money, that’s why.
One of the reasons VAM is failing is that teaching in American schools is highly segmented, even in elementary school. Each year kids are booted off to a different teacher; moreover, they are usually regrouped. It feels like everything is done to ensure that kids do not develop an attachment to either their peers or their teachers.
Another effect is that a teacher sees a particular student for only one year. Whatever damage was done by the prior teacher, the current teacher has to fix within a year and then add his own value. Or do even more damage. After several years the issues accumulate, and everyone points at everyone else.
The same idiocy happens with districts: when the high school district is separate from the elementary and middle school districts, all kinds of issues arise when one wants to accelerate while in middle school and integrate the middle and high school curricula in a meaningful way.
While VAM is a faulty approach to evaluating teachers in itself, its harm is exacerbated by this inefficient and downright malevolent system.
One suggestion I have is to factor community (school) engagement into tenure decisions. Teachers who advise clubs, attend school events, etc., add positively to the school environment. Of course, these things are hard to quantify, but most students can recognize an engaged teacher when they see one.
My administration evaluator last year told me that I wouldn’t get a satisfactory evaluation unless I was doing two different school activities or committees. I was the debate coach and spent over a hundred unpaid hours at tournaments and practices, yet that wasn’t enough, because it wasn’t two different activities. Be careful what you wish for.
Did you mean?:
“My adminimal evaluator last year told me that I wouldn’t get a satisfactory evaluation unless. . .
Ya think it is time?
Hmm, let me think… it was 1997 when I was the NY State Educator of Excellence and was chosen by Harvard and Pew for the NYC cohort of the standards research, but in 1998 I was told I was incompetent… even though my work was touring the nation and was a feature of seminars at the LRDC (Univ. of Pittsburgh).
NYC showed its utter contempt for an authentic teacher and emptied my employment folder of 4 decades of successful service and all my awards — including my 4 inclusions in WHO’S WHO AMONG AMERICA’S TEACHERS — to fill MY employment history with new ‘documentation’ of my incompetence.
It is 20 years later. Hundreds of thousands of experienced, dedicated, genuine teacher-professionals are GONE.
Time, LONG past!
In the “ideal” world we can construct teacher assessment systems; in the real world, supervisors make subjective judgments. Let’s be honest: in some school districts teachers are observed once a year, pro forma, and everyone receives a satisfactory rating, while in other schools “hard-ass” principals give numerous unsatisfactory ratings… Can an Artificial Intelligence HAL be fair?
No, because it’s still humans that program that HAL.
AI is no match for human stupidity. We shall prevail…
Local evaluations are not perfect, but they are far better than VAM. We will always have a few administrators who abuse power, but overall it is a better system than a mysterious algorithm.
If you want NYS to repeal the teacher evaluation law, then please sign our petition to repeal and then spread the word to others.
https://petitions.moveon.org/sign/repeal-nys-teacher-evaluatio?source=embedhomepage