Archives for category: Teacher Evaluations

Kevin Strang, a high school music teacher in Orange County, Florida, won an $810.87 bonus for teaching in an A-rated school. He is donating his bonus to the Network for Public Education to fight high-stakes testing, school grading, merit pay, and the other corporate reforms that treat teachers as donkeys in need of carrots and sticks.

Kevin is a professional, and he expects to be treated as a professional.

“Strang, who has taught in Florida schools for 15 years, sent out a press release Wednesday stating that ‘the $810.87 received for his school’s “A” rating will instead be sent to the Network for Public Education, an organization dedicated to ending the practice of linking high-stakes testing to teacher evaluations and pay….’

“Strang’s own teaching evaluation was tied to math and reading exams of ninth-graders, though he teaches music.

“I don’t feel right taking the money when there are teachers teaching at schools with different populations not receiving the money,” he said Wednesday. “It’s like I’m being rewarded for parenting skills.”

Thank you, Kevin!

You inspire all of us at NPE to fight harder for you!

Kevin Welner, director of the National Education Policy Center, wrote this commentary in response to the complaints of teachers who are evaluated by the scores of students they never taught. Few people can understand the complex algorithms underlying VAM scores, and the people who wrote these formulae can’t explain them in plain English. Yet teachers are fired or get a bonus if their incomprehensible rating is low or high. Bear in mind that few, if any, states would have adopted these measures without the financial and political pressure exerted by Arne Duncan, Race to the Top, and the Obama administration, which demanded them.

Welner writes:

“As you probably know, Diane, my biggest concerns about high-stakes accountability systems tied to measures of academic growth aren’t technical—they’re about perverse incentives. Yes, the technical problems are very real, but even if they were all somehow overcome, we’d be left with a much poorer system of education that’s narrowly focused on what’s being measured.

“Having said that, I do want to add to your earlier post concerning the Florida VAM. I think the post makes three good points but overlooks the most important one.

“As you point out, the model is nonsense when applied to educators who don’t teach the tested subjects. And as you point out, application of the model results in misclassifications—as do all such models. Finally, as you point out, very few readers can understand the model.

“But that leads to a somewhat different point that I think is very important. Florida’s legislators, its Commissioner of Education, and the members of the State Board of Education almost surely are among those who cannot understand the model. My hunch is that the AIR experts who developed FL’s model have walked through it, possibly multiple times, with these policy makers. But the math is just too complex. (Note that the excerpt you pasted from page 6 of the AIR report is just the general form of the model; if expanded it would be much more overwhelming—see the next 10 pages of the report.)

“This is not a criticism of the model or its developers; simple regression models that could be relatively easily understood have well-documented flaws. But adding vectors capturing the effect of lagged scores, mathematical descriptions of Bayesian estimates, and within-student covariance matrices—while all justified in the report—has the obvious effect of placing policy makers at the mercy of whichever experts they choose to listen to.

“This sort of problem does come up in other contexts; to some extent it’s unavoidable. When Congress votes to fund a NASA mission, the underlying math, physics and engineering are similarly beyond normal understanding. When judges hear expert testimony in a pharmaceutical case, etc., they also must confront their own limitations. But at least in those instances, there’s a procedure in place to take oppositional testimony.

“The best analogy here is probably to the defense industry, which works with people in the defense department to design a new weapons system and then helps to market it to Congress. The result is often something technically sophisticated and, for most members of Congress, well beyond their ability to understand strengths and weaknesses.

“Perhaps that’s why the non-technical evidence is so important. We can all understand the problem when a teacher explains that her evaluation is based on the academic growth of students in areas she doesn’t teach. We can all also, to some extent, understand the problem of unreliable evaluations that result in misclassifications.

“But we should, at the very least, recognize and acknowledge the reality that these policies are being adopted by policy makers who pretty much have no clue what it is that they’re putting in place.”

This teacher thought she was doing a swell job. But then the ratings came out and she discovered she is the worst teacher in the state! In the past, she has won many awards, and she loves teaching. In addition, she writes:

“I initiated and continue to run the chess and drama clubs with no remuneration. I do get a small stipend for being the academic games coordinator, running the Mathletes team and spelling bee for the school, along with keeping the staff and students informed of enrichment opportunities like academic competitions. I organize the field trips for my grade level and a trip for 4th and 5th graders to spend three days at an oceanographic institute in the Florida Keys.

“My own 5th grade gifted students will end this year with a full understanding of three Shakespearean plays, as class sets of these and other texts were secured through my Donors Choose requests. Saturday, I’ll be the designated representative picking up free materials for my school. I write the full year’s lesson plans over the summer (then tweak as I go).”

She is the victim of the ceiling effect. Her students got such high scores last year that they can’t get higher scores this year. She explains:
“Last year, many of my students had had the highest scores possible on the state tests the year prior—a 5 out of 5. That’s how they get into my class of gifted and high-achieving students. Except, last year, the state raised the bar so that the same 5th graders who scored 5s in 4th grade were much less likely to earn 5s in math and reading in 5th grade. Some still DID score 5s in math AND reading, yet were still deemed not to have made sufficient progress because they did not score as high within the 5 category as they had the year before.

“It’s like expecting the members of an Olympic pole vaulting team to all individually earn gold medals every time the Olympics come around, regardless of any other factors affecting their lives, with the bar raised another five inches each go-around. In a state where 40% of students pass the 5th grade science test, 100% of my students passed; but no one (at the state level) cares about science scores. Therefore, I suck.”
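Her ceiling-effect complaint is easy to demonstrate with a toy model. The sketch below is my own illustration, not Florida's actual formula, and the noise levels and class sizes are made up: two classrooms are taught equally well, but the one that starts near the top of a capped 5-point scale shows almost no measured "growth."

```python
import random

random.seed(1)

def capped_score(ability, noise_sd=0.3):
    """A test score on a 1-to-5 scale: true ability plus noise, clipped at the cap."""
    raw = ability + random.gauss(0, noise_sd)
    return max(1.0, min(5.0, raw))

def mean_growth(priors, gain=0.3):
    """Average year-over-year score change for a classroom in which
    every student truly improved by the same amount."""
    changes = [capped_score(a + gain) - capped_score(a) for a in priors]
    return sum(changes) / len(changes)

# Two classrooms taught equally well: every student truly gains 0.3.
gifted = [random.gauss(4.95, 0.05) for _ in range(500)]   # already at the ceiling
typical = [random.gauss(3.0, 0.3) for _ in range(500)]

print(f"gifted classroom measured growth:  {mean_growth(gifted):.2f}")
print(f"typical classroom measured growth: {mean_growth(typical):.2f}")
# The gifted room's real gains are squeezed out by the 5-point cap,
# so a growth-based rating makes its teacher look worse.
```

The identical teaching produces visibly smaller measured growth in the gifted room, which is exactly what this teacher is describing.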
How nutty is this? Why does the U.S. Department of Education insist that states must adopt flawed measures? Does anyone at the U.S. Department of Education consider the consequences of their policies? Do they know anything about research or evidence? Do they care how many people’s lives or reputations they carelessly ruin with their dumb ideas? Just wondering.

In this age of value-added measurement, when teachers are judged by the rise or fall of their students’ test scores, it is very dangerous to teach gifted classes. Their scores are already at the top, and they have nowhere to go, so the teacher will get a low rating. It is also dangerous to teach English language learners, students with disabilities, and troubled youth. Their scores will not go up as much as those of kids in affluent districts who have no issues.

Here is what happened to one teacher of gifted students:

“As a teacher of gifted students in Florida, I can attest to the fact that you are more likely to get slammed by VAM. I was rated the worst teacher at my school, the 14th worst teacher in my district, and the 146th worst teacher in the state of Florida (out of 120,000). Previously, I had a great reputation at my school among staff, parents, and students. Now that these scores have been published on the internet, I fear that future students, parents and administrators might be influenced by my extremely negative VAM ranking. Even if they aren’t, I have to worry about being slammed by VAM two years in a row, being rated “needs improvement”, losing my job and having my teaching license revoked by the state. Funny, just two years ago I was selected to be a mentor teacher by my district in the subject that I teach. Now I’m at risk of losing my career based on VAM results of a subject I don’t teach. Thanks a lot Arne. http://kafkateach.wordpress.com/2014/03/01/gosh-damn-thats-a-bad-vam/”

Value-added measurement (VAM) produces ratings that are inaccurate and unstable. In Florida, about half of teachers don’t teach tested subjects, so they are assigned scores based on the scores of their school, meaning they are rated in relation to the scores of students they never taught and subjects they never taught. This Florida teacher explains why she was rated a 23.6583 out of 40, even though she teaches a non-tested subject.

This is irrational. Yet Arne Duncan has compelled almost every state to develop VAM ratings because he believes in them, even though there is no evidence for their value. How can a teacher’s quality be judged by the test scores of students she never taught? If that is not Junk Science, what is? Bill Gates gave Hillsborough County, Florida, $100 million to evaluate teachers using value-added measurement. Here’s how the state’s Department of Education explains the formula, from a department paper:

I admit I don’t understand it. Many people don’t understand it. But whoever wrote it understands it. Bill Gates said recently it would take at least ten years to see if this stuff “works.” I don’t think we have to wait ten years. “This stuff” doesn’t work. It doesn’t even make sense. Teachers of the gifted may be rated ineffective because their students have already hit the top, and their scores can’t go up any higher. Their ratings are Junk Science. When the same teacher gets a bonus one year but is rated ineffective the next year, it shows how unstable the ratings are. That means they are not science; they are Junk Science. There is so much more to the art and craft of teaching than standardized tests reveal. What matters most is not quantifiable, although peers and supervisors can indeed judge which teachers are best and worst. If the measure is not valid, if the measure is inaccurate and unstable, then it is wrong to use it to give people bonuses or to fire them.

In this post on her blog VAMboozled, Audrey Amrein-Beardsley reviewed a study of VAM which again identified its weaknesses. She writes:

“Finally, these researchers conclude that, ‘even in the best scenarios and under the simplistic and idealized conditions…the potential for misclassifying above average teachers as below average or for misidentifying the “worst” or “best” teachers remains nontrivial.’ Accordingly, misclassification rates can range ‘from at least seven to more than 60 percent’ depending on the statistical controls and estimators used and the moderately to highly non-random student sorting practices and scenarios across schools.”

Now, think about it. If the VAM rating can be wrong by as much as 60 percent, why would any school district use it to fire teachers? No wonder teachers are suing for wrongful termination! Call in the lawyers: VAM is Junk Science.
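The misclassification and instability problems are easy to reproduce in a toy simulation. This is my own sketch with made-up noise levels, not any state's published model: give every teacher a fixed true effect, add classroom-level noise each year, and watch how the ratings behave.

```python
import random

random.seed(7)

N = 10_000
true_effect = [random.gauss(0, 1) for _ in range(N)]  # each teacher's stable "true" quality

def one_year_vam(effects, noise_sd=1.5):
    """One year's VAM estimate: true effect plus classroom-composition noise.
    The noise level is a hypothetical choice, not any state's published value."""
    return [t + random.gauss(0, noise_sd) for t in effects]

year1 = one_year_vam(true_effect)
year2 = one_year_vam(true_effect)

# How often is a genuinely above-average teacher rated below average?
above = [(t, o) for t, o in zip(true_effect, year1) if t > 0]
misrate = sum(1 for t, o in above if o < 0) / len(above)
print(f"above-average teachers rated below average: {misrate:.0%}")

# How many of year 1's "bottom 5%" land in the bottom 5% again in year 2?
cut1 = sorted(year1)[N // 20]
cut2 = sorted(year2)[N // 20]
repeat = sum(1 for o1, o2 in zip(year1, year2) if o1 < cut1 and o2 < cut2)
print(f"bottom-5% repeat rate across years: {repeat / (N // 20):.0%}")
```

Even though every teacher's true quality never changes in this simulation, a large share of above-average teachers get a below-average score, and most of one year's "bottom 5%" are not in the bottom 5% the next year. That is the instability critics are describing.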

Mercedes Schneider came across a speech that Bill Gates gave to state legislators in 2009. It lays out the blueprint for everything that has happened in education since then. Forget what you learned in civics class. Gates gave legislators their marching orders. Duncan already had his marching orders. Gates laid out $2.3 billion to create and promote the Common Core standards. His buddy Arne handed out $350 million to test Bill’s standards. All the other pieces are there: Charter schools should replace failure factories. He is a true believer in charter magic. (We now know that charters get the same results when they have the same students.) Longitudinal data systems should be created to track students. (A parent rebellion seems to have put this on the back burner for now, although everyone seems to be mining student data, from Pearson to the SAT to the ACT.) The teacher is the key to achievement (although real research says that family and family income dwarf teacher effects). Here is the man behind the curtain, the man who loves data and measurement, not children. Lock the doors, townspeople. Bill Gates wants to measure everything about your children! Ask yourself: if this guy made $60,000 a year, would anyone listen to him?

UPDATE: After this blog was posted, two privacy activists, Allison White and Leonie Haimson, advised me that the collection of confidential data about children is going forward, thanks to Arne Duncan’s loosening of privacy rights under FERPA, the legislation designed to prevent data mining. They write: “Actually at least 44 states including NY are going forward with their internal P20 Longitudinal data systems – as required by federal law – which will track kids from cradle to the grave and collect their personal data from a variety of state agencies.” Leonie Haimson is the leader of Class Size Matters and Privacy Matters. Allison Breidbart White is co-author of the Protect NY State School Children Petition. Please sign and share the petition: http://bit.ly/18VBvX2

ALSO: I transposed the numbers describing what the Gates Foundation spent on Common Core: it was $2.3 billion, not $3.2 billion. A billion here, a billion there, soon you are talking real money (I think I am paraphrasing long-gone Senator Everett Dirksen of Illinois, but who knows?)

A friend who observed the proceedings in the Vergara trial sent me the following notes, based on the testimony of Stanford professor Linda Darling-Hammond. She is probably the nation’s leading expert on issues related to teacher recruitment, preparation, retention, and support. Her testimony, based on many years of study and experience, was devastating to the plaintiffs’ case.

Linda Darling-Hammond’s testimony

Overview

Yesterday, expert witness Linda Darling-Hammond, a renowned scholar and Stanford professor, refuted the main arguments of the plaintiffs’ lawyers.

Darling-Hammond, whose insights come from both research and experience, stated that measures based on student test scores do not identify effective teachers, that two years is enough time to identify teachers who should be counseled out of the profession, and that extending that period beyond two years would harm students.

Excerpts

On what a good evaluation process looks like.

“With respect to tenure decisions, first of all, you need to have – in the system, you need to have clear standards that you’re going to evaluate the teacher against, that express the kind of teaching practices that are expected; and a way of collecting evidence about what the teacher does in the classroom. That includes observations and may also include certain artifacts of the teacher’s work, like lesson plans, curriculum units, student work, et cetera.”

“You need well-trained evaluators who know how to apply that instrument in a consistent and effective way.

“You want to have a system in which the evaluation is organized over a period of time so that the teacher is getting clarity about what they’re expected to do, feedback about what they’re doing, and so on.

“In California – not related to the tenure decision, but separately – there is a mentoring program that may be going on side-by-side; but really, that does not feed into the tenure decisions. It’s really the observation and feedback process.”

On the problem with extending the tenure beyond two years

“It’s important that while we want teachers to at some point have due process rights in their career, that that judgment be made relatively soon; and that a floundering teacher who is grossly ineffective is not allowed to continue for many years because a year is a long time in the life of a student.

“So I think that having the two-year mark—which means you’re making a decision usually within 19 months of the starting point of that teacher – has the interest of allowing a – of encouraging districts to make that decision in a reasonable time frame so that students aren’t exposed to struggling teachers for longer than they might need to be.”

Other reasons why two years is enough

“My opinion is that, for the first reason I mentioned earlier—the encouragement to make a judgment about a grossly ineffective teacher before many years go by is a useful reason to have a shorter tenure period – or pre-tenure period.

“But at the end of the day, the most important thing is not the amount of time; the most important thing is the quality and the intensity of the evaluation and support process that goes on for beginning teachers.

On the benefits and importance of having a system that includes support for struggling teachers

“Well, it’s important both as a part of a due process expectation; that if somebody is told they’re not meeting a standard, they should have some help to meet that standard.

“The principal typically does not have as much time and may not have the expertise in the content area that a mentor teacher would have. For example, in physics or mathematics, usually the mentor is in the same area, so the help is more intensive and more specific.

“And in such programs, we often find that half of the teachers do improve. Others may not improve, and then the decision is more well-grounded. And when it is made, there is almost never a grievance or a lawsuit that follows because there’s been such a strong process of help.

“The benefits to students are that as teachers are getting assistance and they’re improving their practice, students are likely to be better taught.

“And in the cases where the assistance may not prove adequate to help an incompetent teacher become competent, the benefit is that that teacher is going to be removed from the classroom sooner than if, sort of, they allowed the situation to just go on for a long time, which is truncated by this process of intensive assistance….

“The benefits to districts are that by doing this, you actually end up making the evaluation process more effective, making personnel decisions in a more timely way, making them with enough of a documentation record and a due process fidelity, that very rarely does there occur a problem after that with lawsuits; which means the district spends a little bit of money to save a lot of money and to improve the effectiveness of teaching for its students.

On peer assistance and review (PAR) and other mentoring programs

“A PAR program and other programs that mentor teachers typically improve the retention of teachers; that is, they keep more of the beginning teachers, which is where a lot of attrition occurs. But they do ensure that the teachers who leave are the ones that you’d like to have leave, as opposed to the ones who leave for other reasons.”

On firing the bottom 5% of teachers

“My opinion is that there are at least three reasons why firing the bottom 5 percent of teachers, as defined by the bottom 5 percent on an effectiveness continuum created by using the value-added test scores of their students on state tests, will not improve the overall effectiveness of teachers….

“One reason is that, as I described earlier, those value-added metrics are inaccurate for many teachers. In addition, they’re highly unstable. So the teachers who are in the bottom 5 percent in one year are unlikely to be the same teachers who would be in the bottom 5 percent the next year, assuming they were left in place.

“And the third reason is that when you create a system that is not oriented to attract high-quality teachers and support them in their work, that location becomes a very unattractive workplace. And an empirical proof of that is the situation currently in Houston, Texas, which has been firing many teachers at the bottom end of the value-added continuum without creating stronger overall achievement, and finding that they have fewer and fewer people who are willing to come apply for jobs in the district because with the instability of those scores, the inaccuracy and bias that they represent for groups of teachers, it’s become an unattractive place to work.

“The statement is often made with respect to Finland that if you fire the bottom 5 percent [of teachers], we will be on a par with achievement in Finland. And Finland does none of those things. Finland invests in the quality of beginning teachers, trains them well, brings them into the classroom and supports them, and doesn’t need to fire a lot of teachers.”

The Tennessee Education Association filed a second lawsuit against the use of value-added assessment (called TVAAS in Tennessee), this time including extremist Governor Haslam and ex-TFA state commissioner Huffman in their suit.

The teachers rightly say that the evaluations are unfair, and most reputable researchers are in their corner on that point.

“TEA’s lawsuit was filed on behalf of Knox County teacher Mark Taylor, an eighth grade science teacher at Farragut Middle School. Taylor was unfairly denied an APEX bonus after his TVAAS estimate was based on the standardized test scores of only 22 of his 142 students.

“Mr. Taylor teaches four upper-level physical science courses and one regular eighth grade science class,” said Richard Colbert, TEA general counsel. “The students in the upper-level course take a locally developed end-of-course test in place of the state’s TCAP assessment. As a result, those high-performing students were not included in Mr. Taylor’s TVAAS estimate.”

“While Mr. Taylor’s observation score was ‘exceeding expectations,’ his low TVAAS estimate based on only 16 percent of his students dropped his final evaluation score below the threshold to receive the APEX bonus,” Colbert said.

“Unfortunately, Mr. Taylor’s situation is not an uncommon one. Many teachers across the state – particularly at the high school level – are being unfairly evaluated on an arbitrary percentage of their students.”

Gosh, Arne Duncan only recently hailed Tennessee as one of the stars of Race to the Top. Not so much.

The Rochester Teachers Association is suing the state for its flawed evaluation system, which unfairly judges teachers.

Erica Bryant explains why in this article.

“Years ago, I visited the Kennedy Space Center and bought a coffee mug from the gift shop. It is decorated with some NASA equations, including one used to calculate the speed an object needs to escape Earth’s gravity. This formula fits on one line.

“By contrast, the document that describes how to measure student growth for the purpose of evaluating New York’s teachers and principals is 112 pages.”
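For the record, the one-line NASA formula she is presumably describing is the escape velocity equation:

v_e = \sqrt{2GM/r}

where G is the gravitational constant, M is Earth’s mass, and r is the distance from Earth’s center. One line, as she says, versus 112 pages for New York’s growth measure.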

Even in 112 pages, the teacher evaluation system is unfair and senseless and penalizes teachers who work with the poorest students.

What a mess in Connecticut!

Robert A. Frahm writes in the Connecticut Mirror about how teachers and principals are struggling with the state’s test-based evaluation system. Teachers waste time setting paperwork goals that are low enough to make statistical “gains.” If they don’t, they may be rated ineffective.

Every principal spends hours observing teachers—one hour each time—taking copious notes, then spends hours writing up the observations.

Connecticut, one of the two or three top scoring states in the nation on NAEP (the others are Massachusetts and New Jersey), is drowning its schools and educators in mandates and paperwork.

Why? Race to the Top says it is absolutely necessary. Connecticut didn’t win Race to the Top funding, but the state is doing what Arne Duncan believes in. Stefan Pryor, the state commissioner, loves evaluating by test scores, but that’s no surprise because he was never a teacher; he is a law school graduate and co-founder of a “no excuses” charter school chain in Connecticut that is devoted to test scores at all times. The charter chain he founded is known for its high suspension rate, its high scores, and its limited enrollment of English learners.

Researchers have shown again and again that test-based accountability is flawed, inaccurate, unstable. It doesn’t work in theory, and it has not worked in five years of experience.

The article quotes the conservative advocacy group, the National Council on Teacher Quality, which applauds this discredited methodology. NCTQ is neither an accrediting body nor a research organization.

Our nation’s leading scholars and scholarly organizations have criticized test-based accountability.

In 2010, some of the nation’s most highly accomplished scholars in testing, including Robert Linn, Eva Baker, Richard Shavelson, and Lorrie Shepard, spoke out against the misuse of test scores to judge teacher quality.

The American Educational Research Association and the National Academy of Education issued a joint statement warning about VAM.

Many noted scholars, like Edward Haertel, Linda Darling-Hammond, and David Berliner, have warned about the lack of “science” behind VAM.

The highly esteemed National Research Council issued a report warning that test-based accountability had not succeeded and was unlikely to succeed. Marc Tucker recently described the failure of test-based accountability.

But the carefully researched views of our nation’s leading scholars were tossed aside by Arne Duncan, the Gates Foundation, and the phalanx of rightwing groups that support their agenda of demoralizing teachers, clearing out veterans, and turning teaching into a short-term temp job.

The article cites New Haven as an example:

“Four years ago, New Haven schools won national attention when the district and the teachers’ union developed an evaluation system that uses test results as a factor in rating teachers. Since then, dozens of teachers have resigned or been dismissed as a result of the evaluations. Last year, 20 teachers, about 1 percent of the workforce, left the district after receiving poor evaluations.”

Four years later, can anyone say that New Haven is now the best district in Connecticut? Has the achievement gap closed? Time for another investigative report.