Archives for category: Teacher Evaluations

Kenneth Mitchell, a school superintendent in the Lower Hudson Valley of New York, has been concerned about the costs imposed on school districts by Race to the Top. He previously estimated that six districts in his region would spend $11 million to comply with the mandates of Race to the Top, which paid these districts $400,000.

In this comment, he describes a recent meeting with lawyers about possible lawsuits that will be brought because of New York’s flawed Educator Evaluation System.


On Friday, March 14, The Lower Hudson Council of School Superintendents hosted a panel of education attorneys to address the following topic:


Supervision, Evaluation & Tenure Decisions

• What is the effect of Education Law §3012-c on a school district’s ability to terminate probationary teachers and principals?

• How might overly prescriptive, rigid statutory and regulatory policy frameworks, such as §3012-c, regarding teacher evaluation, tenure, and employment decisions withstand teacher and principal appeals?

Statistical Reliability and Validity of Data in Supervision, Evaluation & Tenure Decisions

• How might the statistical reliability and validity of measures of teaching effectiveness – state assessments, VAM, SLOs, school-wide assessment scores – affect teacher evaluation, tenure, and employment decisions?

• How will the metric of ‘confidence intervals’ be considered in a legal decision about a teacher’s effectiveness?

• How will the number of years of value-added assessment data to determine teacher quality be a factor in a teacher or principal appeal?

• In what ways will the use of locally-developed assessments, such as Student Learning Objectives (SLOs), be challenged in an appeal?

• How will the individual evaluation of a teacher based on school-wide data, such as the 4th grade math assessment, withstand an appeal?

Implementation, Professional Development, and Resources

• How will such factors as consistency, training, and quality be considered in observations and evaluations developed by supervisors?

• How will equity issues, such as the access to materials (e.g., Common Core units) or technology, be a factor in an appeal?

• Experts in child and adolescent development have asked for a review of the Common Core to ensure that all of the standards are developmentally appropriate. Since assessments are being developed on the basis of Common Core and teachers and principals are being assessed accordingly, how will the aforementioned concerns be considered?

Other References

“Evaluation Law Could Limit Ability to Terminate Probationary Teachers”; Warren Richmond III (Harris Beach), New York Law Journal, (May 2013)

“Legal Issues in the Use of Student Test Scores and VAM to Determine Educational Quality”; Diana Pullin, Education Policy Analysis (2010 Manuscript)

In addition to these references, we have posted other related legal articles on the main page of our website. We have also raised other concerns about the model that we have shared with state legislators, members of the Board of Regents, officials at the State Education Department and with representatives of the governor’s office. There are many other questions that will need to be answered once this enters the legal arena.

We shared that many in our organization have concerns that a) the design of the reform model is flawed on multiple levels; b) the expedited and unsupported implementation will further contribute to inevitable legal challenges; c) the weak technical basis and very limited or no research behind elements of the model will not withstand legal challenges. These are just a few of our concerns. As a result, school districts will be wasting even more time and money on legal costs. Unless significant changes are made on the basis of substantive evidence, New York’s reform model is headed for trouble that will move beyond the anxiety and frustration of over-tested students, angry parents, weary teachers, and harried administrators.

Peter Greene just keeps writing hit after hit. This one explains what VAM means and why it works well in manufacturing but not in dealing with human beings.

He explains how Pennsylvania measures teacher quality:

PVAAS uses a thousand points of data to project the test results for students. This is a highly complex model that three well-paid consultants could not clearly explain to seven college-educated adults, but there were lots of bars and graphs, so you know it’s really good. I searched for a comparison and first tried “sophisticated guess;” the consultant quickly corrected me – “sophisticated prediction.” I tried again – was it like a weather report, developed by comparing thousands of instances of similar conditions to predict the probability of what will happen next? Yes, I was told. That was exactly right. This makes me feel much better about PVAAS, because weather reports are the height of perfect prediction.

Here’s how it’s supposed to work.

The magic formula will factor in everything from your socio-economics through the trends over the past X years in your classroom, throw in your pre-testy thing if you like, and will spit out a prediction of how Johnny would have done on the test in some neutral universe where nothing special happened to Johnny. Your job as a teacher is to get your real Johnny to do better on The Test than Alternate Universe Johnny would.

The only thing that goes wrong is that it doesn’t work. Students are not inanimate objects like pieces of steel. So he concludes:

This is one more example of a feature of reformy stuff that is so top-to-bottom stupid that it’s hard to understand.

But whether you skim the surface, look at the philosophical basis, or dive into the math, VAM does not hold up. You may be among the people who feel like you don’t quite get it, but let me reassure you– when I titled this “VAM for Dummies,” I wasn’t talking about you. VAM is always and only for dummies; it’s just that right now, the dummies are in charge.

See? All that’s required for VAM to work is believing that the state can accurately predict exactly how well your students would have done this year if you were an average teacher. How could anything possibly go wrong??
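For readers who like to see the arithmetic, here is a toy sketch of the basic idea Greene describes: compare each student's actual score with the score the model predicted, and average the differences. This is only an illustration. Real models such as PVAAS are far more elaborate, and the function name and every number below are made up, not taken from any state's formula.

```python
# Toy illustration of the value-added idea; not PVAAS or any real state model.
# The function name and all scores are hypothetical.

def value_added(actual_scores, predicted_scores):
    """Average of (actual - predicted) across one teacher's students."""
    if len(actual_scores) != len(predicted_scores):
        raise ValueError("each student needs an actual and a predicted score")
    residuals = [a - p for a, p in zip(actual_scores, predicted_scores)]
    return sum(residuals) / len(residuals)

# "Alternate Universe Johnny" scores (the model's predictions) vs. real scores.
predicted = [70, 80, 90]
actual = [68, 81, 88]

# A negative result means the class scored below its predictions.
print(value_added(actual, predicted))
```

In this sketch the teacher's rating is only as good as the predictions. If the model's guess about each student is off, the rating is off by exactly the same amount.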

As is well known, the U.S. Department of Education zealously believes–like Michelle Rhee–that low test scores are caused by “bad” teachers. The way to find these ineffective teachers, the theory goes, is to see whose students get higher scores and whose don’t. That’s known as value-added measurement (VAM), and the DOE used Race to the Top to persuade or bribe most states to use it to discover who should be terminated.

As we also know, things have not worked out too well, as some Teachers of the Year were fired; some got a bonus one year, then got fired the next year. In many states, teachers are rated by the scores of students they never taught. The overall effect of VAM has been demoralization, even among those with high scores because they know the ratings are arbitrary.

For some reason, teachers don’t like to “win” at the expense of their colleagues and they can spot a phony deal a mile away.

But the U.S. DOE won’t give up, so they released a research brief attempting to show that VAM does work!

But Audrey Amrein-Beardsley deconstructs the brief and shows that it is a mix of ho-hum, old-hat and wrong-headed assumptions.

It’s true (but not new) that disadvantaged students have less access to the best teachers (e.g., NBCT, advanced degrees, expertise in content areas), although, as Beardsley says, the brief doesn’t suggest such things matter.

It is true that “Students’ access to effective teaching varies across districts. There is indeed a lot of variation in terms of teacher quality across districts, thanks largely to local (and historical) educational policies (e.g., district and school zoning, charter and magnet schools, open enrollment, vouchers and other choice policies promoting public school privatization), all of which continue to perpetuate these problems.”

She writes:

“What is most relevant here, though, and in particular for readers of this blog, is that the authors of this brief used misinformed approaches when writing this brief and advancing their findings. That is, they used VAMs to examine the extent to which disadvantaged students receive “less effective teaching” by defining “less effective teaching” using only VAM estimates as the indicators of effectiveness, and as relatively compared to other teachers across the schools and districts in which they found that such grave disparities exist. All the while, not once did they mention how these disparities very likely biased the relative estimates on which they based their main findings.

Most importantly, they blindly agreed to a largely unchecked and largely false assumption that the teachers caused the relatively low growth in scores rather than the low growth being caused by the bias inherent in the VAMs being used to estimate the relative levels of “effective teaching” across teachers. This is the bias that across VAMs is still, it seems weekly, becoming more apparent and of increasing concern.”

VAM in the real world is Junque Science.

This letter arrived from:

Douglas McGuirk

English Teacher

Dumont High School Dumont, NJ

My Testimony about the AchieveNJ Act:

The AchieveNJ Act is certainly doing its part to make a convoluted mess out of the art of teaching our children.

In this testimony, I will address the most readily apparent of its many problems: data collection, Student Growth Objectives, Student Growth Percentiles, PARCC tests, and the new observation system. The AchieveNJ Act, and all of its affiliated changes, is simultaneously stretching the education profession in two different directions, most likely to the point of snapping it in half. I am no longer certain about what my job description is these days; am I a teacher, one who attempts to engage students and help them understand subject matter and their world, or am I a data collector, one who keeps statistics on all manner of measurables in a theoretical attempt to improve the process of teaching in which I am often not engaged because I am busy collecting the data?

AchieveNJ seems to operate on the fallacious principle that there is an infinite amount of time. During my day, this humble English teacher will collect data, analyze data, send students out for standardized tests, be observed by an administrator, and, somewhere in and among all of that, plan lessons, grade papers, and teach students. When do all of these things happen? How do they get done? How do I prioritize if each of these items is now considered crucial?

Most days only allow for one to two hours of time not spent in front of a class. Allow me to recount a personal story of how I spent two weeks in October of 2013. Every moment I worked, excluding those during which I was contractually obligated to actually teach students, was spent doing something related to my Student Growth Objectives (SGOs). I had previously administered a benchmark assessment or pre-test (no staff member in my school is sure about what terminology to use, so we have alternately used both, to the point that the students are not sure whether they are being benchmarked, or pre-tested, or, to put it plainly, harassed into doing something they do not wish to do), so I had a stack of essays that needed scoring. To start work on my SGO, I graded the essays according to the soon-to-be obsolete NJ Holistic Scoring rubric. Then I created and organized a spreadsheet to sort and organize my data. Then I entered all of the scores into the spreadsheet. Then I read through all the emails sent by district administrators about how to create my SGO. Following that, I formally wrote my SGO and submitted it to my supervisor.

The next day, the SGO was rejected, and my supervisor told me that all SGOs had been done incorrectly and that our staff would need training. We held a department meeting to review SGO policies. We then held an after school training session to discuss the writing of SGOs. I attended both of these. After two weeks of writing and rewriting my SGO, complete with all of the Core Curriculum Content Standards pasted from the web site, I finally had an acceptable SGO. I managed to accomplish absolutely no lesson planning during this period of time. I graded no papers. I am a veteran teacher with nine years in the profession. I understand how to manage my workload, overcome setbacks, and complete my responsibilities. In short, I am a professional who maintains a diligent work ethic.

But nothing could prepare me for the amount of time I had just spent on a new part of my job that basically exists so that I can continue to prove that I should be entitled to do the other parts of my job. After I completed my SGO, my principal told our staff to make sure we save all of the data, paperwork, and student work relating to our SGO, just in case people from the State want to review the integrity of the data. Seriously? This is the most egregious assumption that there is an infinite amount of time.

When will State reviewers go back and reread mountains upon mountains of SGO data to make sure that my essay scores (which suffer from an inherent subjectivity anyway) are accurate? The real goal of the SGO process seems to be to take teachers so far out of their comfort zones, and so far from working directly with students, that they may begin to question what kind of work they are doing anyway. Wouldn’t this time spent collecting mountains of dust-collecting data be better spent planning more interesting lessons? Offering students more feedback on work they understand and view as necessary? Researching content to make myself more knowledgeable and helpful to my students? I guess not.

I have to teach my students the content needed to improve on the SGO so I can keep my job, which apparently consists of collecting even more SGO data. Just in case the SGO process is not intimidating and distracting enough, many of us (myself included) now have the threat of Student Growth Percentiles (SGPs) looming as well. The fact that these SGPs only apply to certain disciplines is inequitable and unfair to begin with, but that does not even address the fact that the correlation between my SGP score and my actual effectiveness is non-existent. Every article that I have read on this issue shows that the data produced by SGPs is statistically insignificant in its ability to determine my actual teaching effectiveness. I might as well determine a sizable portion of my evaluation by rolling dice or, to draw upon history, releasing doves and watching which way they fly. I have no control over how hard the students will work on these tests. I have no control over how thoroughly they will prepare.

From what I have read, these PARCC tests do not even have any actual effect on student grades or promotion. They are only used to evaluate me. In that case, allow me to hand-select the students who will be used to determine my effectiveness. Or better yet, the most fair thing to do would be to allow me to take the test myself, so at least I can have complete control over my own evaluation. Beyond just potentially affecting me in a random (and possibly absurd) way, the PARCC tests further reinforce the current contradictory nature of education rhetoric. What do policymakers want for our children? I consistently hear, from the mouths of our politicians, that our students are falling behind (falling behind whom?) in their critical thinking skills. (May we at least ask, how are these critical thinking skills measured? By bubble tests?) If that is the case, then shouldn’t we, as professionals, seek to introduce more critical thinking tasks, like project-based learning, into our curricula? Aren’t multiple choice standardized tests anathema to critical thinking tasks? Why is anyone promoting them, then? Where is the emphasis? Do we want students to legitimately be able to assess and evaluate on their own? Or do we want illogical measures to make sure that our teachers are, well, doing what exactly? If (some) teachers’ jobs are contingent on whether or not they achieve a high SGP score, then those teachers will, for the sake of their own self-preservation, certainly spend a great deal of time and energy trying to prepare students for those very tests, even though they cannot do the one thing that will ensure satisfactory scores, which is make the students put forth their best effort.

No students dislike learning. But many dislike education, because education consists of misguided and needlessly enervating tasks like standardized tests. Instead of spending this time engaged in critical thinking, students will be responding to questions that will be used to make sure their teachers are doing their jobs. Ironically enough, teachers will again be doing less of their jobs, as I assume we will be called upon increasingly to babysit computer labs full of children clicking vapidly through PARCC assessments. (As a side note, I am sure international test production companies like Pearson stand to profit from this arrangement immeasurably, probably at the expense of my own paycheck, most of which would have been spent in the local New Jersey economy.)

The final issue I will address in the AchieveNJ Act is the inconsistent new observation system. For starters, the public school districts across the state use two different evaluation systems: Danielson or McRel. If we are striving for consistency, why can we not agree on a single, unified observation system, so that all teachers are theoretically evaluated in the same fashion? Still, even if we achieved such uniformity, all observations would continue to suffer from the same inherent bias as the grades on students’ essays: each observer (or teacher, as is the case with the essays) has a different viewpoint (yes, even using a rubric). The administrators who serve as observers in my school have wildly varying interpretations about what constitutes an effective lesson. Even worse, some administrators are offering critiques to teachers about “how the lesson should have been conducted,” and providing less than satisfactory ratings to teachers who choose to do something in a different way.

The biggest source of all of this uncertainty and inconsistency has been the use of technology. Some of our administrators have said that we are to use technology in every single lesson, no exceptions. Others have been more lax about this requirement. I make this point to further illuminate the backwards nature of many of these evaluative changes. If we must use technology, then technology is the starting point for each and every lesson. Previously, student learning was my starting point. What tools will help my students learn? Am I there to teach them or to show off the latest and greatest tech toys in my classroom? Are observers looking for critical thinking? Are they looking at my rapport with students? Or are they there to make sure that I go through the motions (according to one person’s rubric of what constitutes effective teaching) of reaching all of my supposed requirements? The inherent subjectivity of trying to quantify the unquantifiable is of course the same issue with which I wrestle when trying to score the essays that will make up my SGO. We all now must worship at the altar of data, even though, at best, the data is fickle and, at worst, it is fraudulent.

In the end I am not quite sure how to proceed under the AchieveNJ system. To paraphrase Plato, a single part of one’s soul cannot be engaged in two contradictory actions at the same time. So the only thing I can do is to default back to the ways in which I have always taught. I will try to help my students learn. I will try to reinforce material that I think is of value. I will provide as many insights from my own experiences as I can. I will focus on the human side of teaching and learning, my AchieveNJ ratings be damned. If this system says that an intelligent and dedicated individual like me is not fit to teach the students of New Jersey, then it is even more broken than my testimony could ever hope to convey.

Kevin Strang, a high school music teacher in Orange County, Florida, won an $810.87 bonus for teaching in an A-rated school. He is donating his bonus to the Network for Public Education to fight high-stakes testing, school grading, merit pay, and the other corporate reforms that treat teachers as donkeys in need of carrots and sticks.

Kevin is a professional, and he expects to be treated as a professional.

“Strang, who has taught in Florida schools for 15 years, sent out a press release Wednesday stating that “the $810.87 received for his school’s ‘A’ rating will instead be sent to the Network for Public Education, an organization dedicated to ending the practice of linking high-stakes testing to teacher evaluations and pay….”

“Strang’s own teaching evaluation was tied to math and reading exams of ninth-graders, though he teaches music.

“I don’t feel right taking the money when there are teachers teaching at schools with different populations not receiving the money,” he said Wednesday. “It’s like I’m being rewarded for parenting skills.”

Thank you, Kevin!

You inspire all of us at NPE to fight harder for you!

Kevin Welner, director of the National Education Policy Center, wrote this commentary in response to the complaints of teachers who are evaluated by the scores of students they never taught. Few people can understand the complex algorithms underlying VAM scores, and the people who wrote these formulae can’t explain them in plain English. Yet teachers are fired or get a bonus if their incomprehensible rating is low or high. Bear in mind that few, if any, states would have adopted these measures without the financial and political pressure exerted by Arne Duncan, Race to the Top, and the Obama administration, which demanded them.

Welner writes:

“As you probably know, Diane, my biggest concerns about high-stakes accountability systems tied to measures of academic growth aren’t technical—they’re about perverse incentives. Yes, the technical problems are very real, but even if they were all somehow overcome, we’d be left with a much poorer system of education that’s narrowly focused on what’s being measured.

“Having said that, I do want to add to your earlier post concerning the Florida VAM. I think the post makes three good points but overlooks the most important one.

“As you point out, the model is nonsense when applied to educators who don’t teach the tested subjects. And as you point out, application of the model results in misclassifications—as do all such models. Finally, as you point out, very few readers can understand the model.

“But that leads to a somewhat different point that I think is very important. Florida’s legislators, its Commissioner of Education, and the members of the State Board of Education almost surely are among those who cannot understand the model. My hunch is that the AIR experts who developed FL’s model have walked through it, possibly multiple times, with these policy makers. But the math is just too complex. (Note that the excerpt you pasted from page 6 of the AIR report is just the general form of the model; if expanded it would be much more overwhelming; see the next 10 pages of the report.)

“This is not a criticism of the model or its developers; simple regression models that could be relatively easily understood have well-documented flaws. But adding vectors capturing the effect of lagged scores, mathematical descriptions of Bayesian estimates, and within-student covariance matrices—while all justified in the report—has the obvious effect of placing policy makers at the mercy of whichever experts they choose to listen to.

“This sort of problem does come up in other contexts; to some extent it’s unavoidable. When Congress votes to fund a NASA mission, the underlying math, physics and engineering are similarly beyond normal understanding. When judges hear expert testimony in a pharmaceutical case, etc., they also must confront their own limitations. But at least in those instances, there’s a procedure in place to take oppositional testimony.

“The best analogy here is probably to the defense industry, which works with people in the defense department to design a new weapons system and then helps to market it to Congress. The result is often something technically sophisticated and, for most members of Congress, well beyond their ability to understand strengths and weaknesses.

“Perhaps that’s why the non-technical evidence is so important. We can all understand the problem when a teacher explains that her evaluation is based on the academic growth of students in areas she doesn’t teach. We can all also, to some extent, understand the problem of unreliable evaluations that result in misclassifications.

“But we should, at the very least, recognize and acknowledge the reality that these policies are being adopted by policy makers who pretty much have no clue what it is that they’re putting in place.”

This teacher thought she was doing a swell job. But then ratings came out and she discovered she is the worst teacher in the state! In the past, she has won many awards, and she loves teaching. In addition:

I initiated and continue to run the chess and drama clubs with no remuneration. I do get a small stipend for being the academic games coordinator, running the Mathletes team and spelling bee for the school, along with keeping the staff and students informed of enrichment opportunities like academic competitions. I organize the field trips for my grade level and a trip for 4th and 5th graders to spend three days at an oceanographic institute in the Florida Keys.

My own 5th grade gifted students will end this year with a full understanding of three Shakespearean plays, as class sets of these and other texts were secured through my Donors Choose requests. Saturday, I’ll be the designated representative picking up free materials for my school. I write the full year’s lesson plans over the summer (then tweaking as I go).

She is the victim of the ceiling effect. Her students got such high scores last year that they can’t get higher scores this year.

She explains:

Last year, many of my students had had the highest scores on the state tests possible the year prior – a 5 out of 5. That’s how they get in to my class of gifted and high achieving students. Except, last year, they raised the bar so that the same 5th graders who scored 5s in 4th grade were much less likely to earn 5s in math and reading in 5th grade. Some still DID score 5s in math AND reading, yet were still deemed not to have made sufficient progress because they did not score as high within the 5 category as they had the year before.

It’s like expecting the members of an Olympic pole vaulting team to all individually earn gold medals every time the Olympics come around, regardless of any other factors affecting their lives, with the bar raised another five inches each go around. In a state where 40% of students pass the 5th grade science test, 100% of my students passed; but no one (at the state level) cares about science scores.

Therefore, I suck.

How nutty is this? Why does the U.S. Department of Education insist that states must adopt flawed measures? Does anyone at the U.S. Department of Education consider the consequences of their policies? Do they know anything about research or evidence? Do they care how many people’s lives or reputations they carelessly ruin with their dumb ideas?

Just wondering.
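The ceiling effect this teacher describes comes down to simple arithmetic, which a short sketch can show. It assumes a 1-to-5 reporting scale like the one she mentions and treats learning as a number added to last year's level. The function name and figures are hypothetical and do not represent Florida's actual model.

```python
# Toy illustration of a ceiling effect on a capped score scale.
# Hypothetical names and numbers; not Florida's actual growth model.

TOP_LEVEL = 5  # highest level the test can report


def measured_gain(prior_level, true_growth, ceiling=TOP_LEVEL):
    """Gain the test can show when reported scores cannot exceed the ceiling."""
    current_level = min(prior_level + true_growth, ceiling)
    return current_level - prior_level


# A student who was already at the top shows no gain, however much she learned.
print(measured_gain(5, 1))  # 0

# A student who started lower has room for the same learning to show up.
print(measured_gain(3, 1))  # 1
```

Both students learned the same amount in this sketch, yet only one shows growth, because the scale has no room above 5.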

In this age of value-added measurement, when teachers are judged by the rise or fall of their students’ test scores, it is very dangerous to teach gifted classes. Their scores are already at the top, and they have nowhere to go, so the teacher will get a low rating. It is also dangerous to teach English language learners, students with disabilities, and troubled youth. Their scores will not go up as much as the kids in affluent districts who have no issues.

Here is what happened to one teacher of gifted students:

“As a teacher of gifted students in Florida, I can attest to the fact that you are more likely to get slammed by VAM. I was rated the worst teacher at my school, the 14th worst teacher in my district, and the 146th worst teacher in the state of Florida (out of 120,000). Previously, I had a great reputation at my school among staff, parents, and students. Now that these scores have been published on the internet, I fear that future students, parents and administrators might be influenced by my extremely negative VAM ranking. Even if they aren’t, I have to worry about being slammed by VAM two years in a row, being rated “needs improvement”, losing my job and having my teaching license revoked by the state. Funny, just two years ago I was selected to be a mentor teacher by my district in the subject that I teach. Now I’m at risk of losing my career based on VAM results of a subject I don’t teach. Thanks a lot Arne.”

Mercedes Schneider came across a speech that Bill Gates gave to state legislators in 2009. It lays out the blueprint for everything that has happened in education since then. Forget what you learned in civics class. Gates gave legislators their marching orders. Duncan already had his marching orders. Gates laid out $2.3 billion to create and promote the Common Core standards. His buddy Arne handed out $350 million to test Bill’s standards. All the other pieces are there: Charter schools should replace failure factories. He is a true believer in charter magic. (We now know that charters get the same results when they have the same students.) Longitudinal data systems should be created to track students. (A parent rebellion seems to have put this on the back burner for now, although everyone seems to be mining student data, from Pearson to the SAT to the ACT.) The teacher is the key to achievement (although real research says the family and family income dwarf teacher effects). Here is the man behind the curtain, the man who loves data and measurement, not children. Lock the doors, townspeople. Bill Gates wants to measure everything about your children! Ask yourself, if this guy made $60,000 a year, would anyone listen to him?

After this blog was posted, two privacy activists, Allison White and Leonie Haimson, advised me that the collection of confidential data about children is going forward, thanks to Arne Duncan’s loosening of privacy rights under FERPA, the legislation designed to prevent data mining. They write: “Actually at least 44 states including NY are going forward with their internal P20 Longitudinal data systems – as required by federal law – which will track kids from cradle to the grave and collect their personal data from a variety of state agencies.” Leonie Haimson is leader of Class Size Matters and Privacy Matters. Allison Breidbart White is co-author of the Protect NY State School Children Petition. Please sign and share the petition.

ALSO: I transposed the numbers describing what the Gates Foundation spent on Common Core: it was $2.3 billion, not $3.2 billion. A billion here, a billion there, soon you are talking real money (I think I am paraphrasing long-gone Senator Everett Dirksen of Illinois, but who knows?)

A friend who observed the proceedings in the Vergara trial sent me the following notes, based on the testimony of Stanford professor Linda Darling-Hammond. She is probably the nation’s leading expert on issues related to teacher recruitment, preparation, retention, and support. Her testimony, based on many years of study and experience, was devastating to the plaintiff’s case.

Linda Darling-Hammond’s testimony


Yesterday, expert witness Linda Darling-Hammond, a renowned scholar and Stanford professor, refuted the main arguments of the plaintiffs’ lawyers.

Darling-Hammond, whose insights come from both research and experience, stated that measures based on student test scores do not identify effective teachers, that two years is enough time to identify teachers who should be counseled out of the profession, and that extending that period beyond two years would harm students.


On what a good evaluation process looks like.

“With respect to tenure decisions, first of all, you need to have – in the system, you need to have clear standards that you’re going to evaluate the teacher against, that express the kind of teaching practices that are expected; and a way of collecting evidence about what the teacher does in the classroom. That includes observations and may also include certain artifacts of the teacher’s work, like lesson plans, curriculum units, student work, et cetera.”

“You need well-trained evaluators who know how to apply that instrument in a consistent and effective way.

“You want to have a system in which the evaluation is organized over a period of time so that the teacher is getting clarity about what they’re expected to do, feedback about what they’re doing, and so on.

“In California – not related to the tenure decision, but separately – there is a mentoring program that may be going on side-by-side; but really, that does not feed into the tenure decisions. It’s really the observation and feedback process.”

On the problem with extending the tenure period beyond two years

“It’s important that while we want teachers to at some point have due process rights in their career, that that judgment be made relatively soon; and that a floundering teacher who is grossly ineffective is not allowed to continue for many years because a year is a long time in the life of a student.

“So I think that having the two-year mark – which means you’re making a decision usually within 19 months of the starting point of that teacher – has the interest of allowing a – of encouraging districts to make that decision in a reasonable time frame so that students aren’t exposed to struggling teachers for longer than they might need to be.”

Other reasons why two years is enough

“My opinion is that, for the first reason I mentioned earlier—the encouragement to make a judgment about a grossly ineffective teacher before many years go by is a useful reason to have a shorter tenure period – or pre-tenure period.

“But at the end of the day, the most important thing is not the amount of time; the most important thing is the quality and the intensity of the evaluation and support process that goes on for beginning teachers.”

On the benefits and importance of having a system that includes support for struggling teachers

“Well, it’s important both as a part of a due process expectation; that if somebody is told they’re not meeting a standard, they should have some help to meet that standard.

“The principal typically does not have as much time and may not have the expertise in the content area that a mentor teacher would have. For example, in physics or mathematics, usually the mentor is in the same area, so the help is more intensive and more specific.

“And in such programs, we often find that half of the teachers do improve. Others may not improve, and then the decision is more well-grounded. And when it is made, there is almost never a grievance or a lawsuit that follows because there’s been such a strong process of help.

“The benefits to students are that as teachers are getting assistance and they’re improving their practice, students are likely to be better taught.

“And in the cases where the assistance may not prove adequate to help an incompetent teacher become competent, the benefit is that that teacher is going to be removed from the classroom sooner than if, sort of, they allowed the situation to just go on for a long time, which is truncated by this process of intensive assistance….

“The benefits to districts are that by doing this, you actually end up making the evaluation process more effective, making personnel decisions in a more timely way, making them with enough of a documentation record and a due process fidelity, that very rarely does there occur a problem after that with lawsuits; which means the district spends a little bit of money to save a lot of money and to improve the effectiveness of teaching for its students.”

On peer assistance and review (PAR) and other mentoring programs

“A PAR program and other programs that mentor teachers typically improve the retention of teachers; that is, they keep more of the beginning teachers, which is where a lot of attrition occurs. But they do ensure that the teachers who leave are the ones that you’d like to have leave, as opposed to the ones who leave for other reasons.”

On firing the bottom 5% of teachers

“My opinion is that there are at least three reasons why firing the bottom 5 percent of teachers, as defined by the bottom 5 percent on an effectiveness continuum created by using the value-added test scores of their students on state tests, will not improve the overall effectiveness of teachers….

“One reason is that, as I described earlier, those value-added metrics are inaccurate for many teachers. In addition, they’re highly unstable. So the teachers who are in the bottom 5 percent in one year are unlikely to be the same teachers as who would be in the bottom 5 percent the next year, assuming they were left in place.

“And the third reason is that when you create a system that is not oriented to attract high-quality teachers and support them in their work, that location becomes a very unattractive workplace. And an empirical proof of that is the situation currently in Houston, Texas, which has been firing many teachers at the bottom end of the value-added continuum without creating stronger overall achievement, and finding that they have fewer and fewer people who are willing to come apply for jobs in the district because with the instability of those scores, the inaccuracy and bias that they represent for groups of teachers, it’s become an unattractive place to work.

“The statement is often made with respect to Finland that if you fire the bottom 5 percent [of teachers], we will be on a par with achievement in Finland. And Finland does none of those things. Finland invests in the quality of beginning teachers, trains them well, brings them into the classroom and supports them, and doesn’t need to fire a lot of teachers.”

