Good news! The Governor of New Jersey, Phil Murphy, and the State Commissioner of Education, Lamont Repollet, slashed the stakes attached to PARCC testing. Until now, 30% of a teacher’s evaluation was tied to scores on the Common Core-aligned PARCC test. The governor and commissioner just dropped it to 5%.
The practice of evaluating teachers by student test scores was heavily promoted by Arne Duncan and Race to the Top. It has been widely discredited by scholarly organizations like the American Statistical Association. It remains on the books in many states as a dead vestige of the past, a zombie policy that has never worked but never died.
New Jersey drove a stake into its icy heart.
“New Jersey Commissioner of Education Dr. Lamont Repollet today announced that PARCC scores will account for only five percent of a teacher’s evaluation in New Jersey next year, down from the damaging 30 percent figure mandated by his predecessors. State law continues to require that standardized test scores play some role in teacher evaluation despite the lack of any evidence that they serve a valid purpose. In fact, researchers caution against using the scores for high-stakes decisions such as teacher evaluation. By cutting the weight given to the scores to near the bare minimum, the Department of Education and the Murphy administration have shown their respect for the research. The move also demonstrates respect for the experience and expertise of parents and educators who have long maintained that PARCC—or the Partnership for Assessment of Readiness for College and Careers—is an intrusive, harmful test that disrupts learning and does not adequately measure student learning or teacher effectiveness.
“Today’s announcement is another step by Gov. Murphy toward keeping a campaign promise to rid New Jersey’s public schools of the scourge of high-stakes testing. While tens of thousands of families across the state have already refused to subject their children to PARCC, schools are still required to administer it and educators are still subject to its arbitrary effects on their evaluation. By dramatically lowering the stakes for the test, Murphy is making it possible for educators and students alike to focus more time and attention on real teaching and learning.
“NJEA President Marie Blistan praised Gov. Murphy and Commissioner Repollet for putting the well-being of students first and for trusting parents and educators. “Governor Murphy showed that he trusts parents and educators when it comes to what’s best for students. By turning down the pressure of PARCC, he has removed a major obstacle to quality teaching and learning in New Jersey. NJEA members are highly qualified professionals who do amazing work for students every day. This decision frees us to focus on what really matters…”
“While the move to dramatically reduce the weight of PARCC in teacher evaluation is a big win for families and educators alike, it is only the first step toward ultimately eliminating PARCC and replacing it with less intrusive, more helpful ways of measuring student learning. New Jersey’s public schools are consistently rated among the very best in the nation, a position they have held for many years. Despite that, New Jersey students and educators are among the last anywhere still burdened by this failed five-year PARCC experiment. By moving away from PARCC, New Jersey’s public education community will once again be free to focus on the innovative efforts that have long served students so well.”
Since VAM (or, more precisely, SGP in the case of NJ) is really just a random number generator, it’s not clear why they should NOT drop the weighting of student test scores in the teacher evaluation to zero.
In fact, they should drop it to zero because the effect of VAM remains largely undiluted by the other factors used to evaluate teachers.
As Cathy O’Neil (Mathbabe) has pointed out, even when the weighting for VAM is small, it can be the deciding factor in whether a teacher is considered good or poor simply because VAM accounts for the vast majority of the variance in teacher evaluation scores.
Essentially, teachers are rated very similarly on the other parts of the evaluation, so VAM becomes the “decider” simply by virtue of the fact that the VAM score varies from teacher to teacher (and from one year to the next for a single teacher!).
Cathy talks about that in a blog post:
https://mathbabe.org/2016/09/30/the-one-of-many-fallacy/
She uses 50% as the weight, but the point of her post is that the actual weighting is not all that important.
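Her argument is easy to check numerically. Here is a minimal sketch in Python (the numbers, the 5% weight, and the composite() helper are my own toy assumptions, not figures from her post or from New Jersey’s actual formula): when the other components of an evaluation barely vary from teacher to teacher, even a small-weight VAM term ends up driving the final ranking.

```python
# Toy illustration: a small-weight but high-variance component dominates a composite score.
import random

random.seed(0)

VAM_WEIGHT = 0.05  # hypothetical 5% weight on the test-score component

def composite(observation, vam, vam_weight=VAM_WEIGHT):
    """Weighted evaluation: 95% classroom observation, 5% VAM-style score."""
    return (1 - vam_weight) * observation + vam_weight * vam

teachers = []
for i in range(10):
    observation = random.uniform(89, 91)   # observation ratings cluster tightly
    vam = random.uniform(0, 100)           # VAM-style scores swing wildly
    teachers.append((f"teacher_{i}", observation, vam))

# Rank by the composite score, then by the VAM score alone, and compare.
by_composite = [t[0] for t in sorted(teachers, key=lambda t: composite(t[1], t[2]), reverse=True)]
by_vam = [t[0] for t in sorted(teachers, key=lambda t: t[2], reverse=True)]

print("ranked by composite:", by_composite)
print("ranked by VAM alone:", by_vam)
# The weighted observation term spreads teachers over roughly 1.9 points, while the
# weighted VAM term spreads them over roughly 5 points, so the two orderings mostly
# agree: the nominally 5% component does most of the deciding.
```

Running the same exercise with a 50% weight, as in her post, only makes the effect more extreme; the nominal weight tells you very little about the real influence of the test-score component.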
The people making these decisions really need to start consulting people like Cathy O’Neil who understand the mathematics.
Well, it is a good start. The article clearly states that they cut the weight of the test scores to the bare minimum… then they can change the state law that requires the scores. Government does nothing fast.
ESSA requires yearly testing.
Actually, they said “near the bare” minimum and did not actually specify what that minimum is, or even if it is a specific percentage.
All they said was that
“State law continues to require that standardized test scores play SOME role in teacher evaluation despite the lack of any evidence that they serve a valid purpose. In fact, researchers caution against using the scores for high-stakes decisions such as teacher evaluation. By cutting the weight given to the scores to NEAR the bare minimum, the Department of Education and the Murphy administration have shown their respect for the research”
Near the bare minimum is not the same as “the bare minimum”.
So why stop at 5%?
Why not 0.0000001%? That is SOME.
Or 0.000000000000001%? Also “some”
If legislators and others are going to play games with teachers lives, then why not beat them at their own game?
YES.
Excellent link!
Andy “The Not So Great” – are you paying attention?
When will other states come to their senses?
Great. 👍
But how much failure and exposed corruption did it take?
VAM was thoroughly trashed by the very people who know the most about it: the American Statistical Association (ASA), the largest organization in the United States representing statisticians and related professionals, and they know a thing or two about data and measurement. The ASA slammed the deceptively labeled “value-added model” (VAM) of evaluating teachers because VAM falsely claims to be able to take student standardized test scores and, through complicated formulas that supposedly factor out all of the other influences, measure the “value” a teacher adds to student learning and emerge with a valid assessment of how effective a particular teacher has been. But the ASA lays bare the fact that THESE FORMULAS CAN’T ACTUALLY DO THIS with reliability and validity. It’s pure political ideology to claim that VAM can do these things.
In an official statement, the ASA points out the following and many other failings of testing-based VAM:

“VAMs typically measure correlation, not causation: Effects (positive or negative) attributed to a teacher may actually be caused by other factors that are not captured in the model… Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions.”

“System-level conditions” include everything from overcrowded and underfunded classrooms to district- and site-level management of the schools and to student poverty.
A copy of the VAM-slamming ASA statement should be posted on the union bulletin board at every school site in the nation, and unions should walk every teacher through it at site faculty meetings, so that teachers know what it says about how invalid it is to use standardized test results to evaluate teachers or principals. Teachers’ and principals’ unions should fight all evaluations based on student test scores, with the ASA statement as a solid foundation for that fight.
Fight back! Never, never, never give up!
Here is another statement, only one page, and you can download it without being a member of AERA.
Klees, S. J. (2016). VAMs Are Never “Accurate, Reliable, and Valid.” Educational Researcher, 45(4), 267. doi: 10.3102/0013189X16651081
http://journals.sagepub.com/stoken/rbtfl/Ozv0.uo.t/JQM/full
I would love to see more lawsuits that really do the job of killing VAM, but I also know that there are other, really draconian measures being pushed. One of these is the “School Improvement Index,” substantially funded by private foundations and originating in the CORE Districts of California, among the “networks” the Gates Foundation will be investing in.
The evil empire continues in so many different forms. I would like to see more teachers who lost jobs over VAM file lawsuits.
The problem is that those who wrote the ASA report were not forceful enough in their prescriptions for how VAM should and should not be used.
Any intelligent AND reasonable person would gather from the ASA paper that ASA considered it a bad idea to use VAM to evaluate individual teachers.
But the VAMbots are not reasonable people — and many of them (like Arne Duncan) are not very intelligent. They are the sort who will and DID claim that because ASA did not specifically say “Thou shalt not use VAM to evaluate individual teachers,” it’s fine and dandy to do so.
I understand that ASA thought it was not its place as a statistical society to make policy, but whether they like it or not, their lack of forcefulness had serious negative ramifications for teachers and schools, because it allowed people like Raj Chetty and John Friedman to continue to tout the wonders of VAM long after the ASA report had come out.
People like Chetty and Friedman need to be told in no uncertain terms that their claims are a lot of bull. They need to be quite publicly embarrassed by actual statisticians and made examples of how NOT to do statistics.
But sadly, with a few notable exceptions (Moshe Adler and Audrey Amrein-Beardsley), that did not happen.
This whole thing actually points out a significant problem. By and large, academics are reluctant to criticize policies that they consider bad when their colleagues are the ones pushing the policies. So, for example, when Raj Chetty was running around yakking up VAM, his colleagues in the statistics department at Harvard had virtually nothing to say about his work, despite the fact that it involved cherry picking and statistical “errors” that would not be tolerated of students by most stats professors teaching stats 101.
Arne still believes in VAM. No one was courageous enough, except him, to crack down on all those lying teachers.
“By and large, academics are reluctant to criticize policies that they consider bad when their colleagues are the ones pushing the policies. ”
Most scientists in academia like to stay away from politics and policies. They do not feel competent to suggest policies—and they are right about that, since the consequences are unpredictable and never backed by research.
On the other hand, scientists should feel obligated to criticize policies if they are based on questionable science. The story of VAM is a perfect example.
Another great example is Common Core: I doubt anybody is competent to set standards like that for such a big population, while all teachers can and should criticize it.
There seems to me to be very healthy debate among scientists, through papers, studies, and within their associations. The ASA statement is perhaps overly cautious from a statistical point of view, but as Laura notes, was tartly criticized by educational researchers. And regardless of whether academicians are hesitant to criticize colleagues’ work, if a study is based on flawed research, critiques debunking it appear and get publicized.
The idea that Duncan would have been forced to reconsider had the ASA’s statement been categorical is wishful thinking. No one who cared to base educational policy on research could have ignored its clear message on the unsuitability of VAM. The point needs to be made that major DofEd policy for some years now has been entirely political, and ignorant or dismissive of educational research, including stats showing lack of results and even harmful results of these policies.
Unfortunately, the ASA rebuke of VAM was not published until 2014.
Most states adopted VAM in 2010. Do you think that legislators pay attention to research?
Bethree
People like Chetty are extremely sensitive to criticism by real scientists, as evidenced by his NY Times “defense of economics as a science.”
It’s possible that it did not matter what ASA said, but if the Harvard statistics department had pointed out his cherry-picking and statistical errors, I believe it would have had an impact.
When it comes to statistics, most people understand that there is no comparison between the Harvard stats department and the Harvard econ department. It’s day (stats) and night (econ).
Chetty figured he could get away with a handwaving nonresponse to Moshe Adler because, you know, Adler is not a Haaaawvid MacArthur genius like Chetty and hence does not inspire (undeserved) awe in presidents and other influential people the way Chetty does.
And Chetty was right because Adler was mostly ignored, even by the journal that published Chetty’s nonsense.
Diane, thanks for correcting my chronology. Obama/Duncan’s 2009 Race to the Top caused most states to adopt VAM by 2010 (the only legislative research required: ‘do we need the $?’). Done deal by the time ASA spoke up.
Wouldn’t it have been great if instead of imposing VAM via bribe during financial crisis, DOEd had offered grant money for 5-yr pilot programs – collected data – based any new policy on the results? In that scenario, ASA’s 2014 ‘expert opinion’ would have gone into the hopper as more input to an ongoing, research-based process. That things “Raced” in the opposite direction is just one more example of why voters are cynical & view DC as a swamp of corruption.
Duncan couldn’t wait for evidence.
The evidence would take decades to gather. No political or business leadership would be willing to wait that long.
This is why there will always be antagonism between capitalism and academia. Presently, capitalism is winning, turning academia into businesses that serve corporations and political agendas and set short-term goals.
At my university, for example, departments and profs are evaluated based on how much grant (or any other) money they bring in to the university. We even have a “millionaires’ club,” the members of which are profs who got a grant of at least a million dollars.
SDP, thanks for the background on Chetty. Looked him up on wiki: good lord. Must be something good in those publications to garner all the attention/honors [perhaps outside the ed field]. I wonder whether colleagues’ considered criticism would have given pause to “one of the youngest tenured faculty in the history of Harvard’s economics department”/2012 MacArthur “genius.” Could you clarify what you meant in the comparison of Harvard econ & stats depts?
Why wait until next year?
“. . . PARCC—or the Partnership for Assessment of Readiness for College and Careers—is an intrusive, harmful test that disrupts learning and does not adequately measure student learning or teacher effectiveness.”
and
“. . . it is only the first step toward ultimately eliminating PARCC and replacing it with less intrusive, more helpful ways of measuring student learning.”
and from Scisne’s comment:
“In an official statement, the ASA points out the following and many other failings of testing-based VAM:
‘VAMs typically measure correlation, not causation. . .”
Taking the last one first: how can ASA not realize that one does not “measure a correlation”? One can determine a correlation. One can figure out a correlation. One can evaluate a correlation. One can assess a correlation. But how the hell does one “measure a correlation”? What is the standard unit of measurement for a correlation? What is the measuring device? How has that device been calibrated against said standard unit? Needless to say, there is no measuring going on, ay ay ay ay ay!
“Measuring student learning”:
The most misleading concept/term in education is “measuring student achievement” or “measuring student learning”. The concept has been misleading educators into deluding themselves that the teaching and learning process can be analyzed/assessed using “scientific” methods which are actually pseudo-scientific at best and at worst a complete bastardization of rationo-logical thinking and language usage.
“But how the hell does one “measure a correlation”?”
https://en.wikipedia.org/wiki/Correlation_and_dependence
In other words, Duane, once you have data, you can measure its correlation. The basic problem is not this. The basic problem is collecting quantitative data and thinking it has relevance to learning and teaching.
Here are some useful “measurements” (for what, I am not sure, but surely useful for something)
http://www.tylervigen.com/spurious-correlations
Most people don’t realize how utterly meaningless correlations can be (and usually are)
E.g.:
Suicides by hanging, strangulation and suffocation correlates very strongly (0.993796) with numbers of lawyers in North Carolina from 1999-2009.
But of course, economists think correlation is the best thing since sliced bread. In fact, without correlation, there would be no “econometrics”, which most probably would not be a bad thing.
And the worst part is that economists like Raj Chetty don’t even realize how meaningless their correlations are.
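If you want to see for yourself how little a large correlation coefficient can mean, here is a small Python sketch (the annual figures are made up for illustration, not Vigen’s actual series, and pearson_r is a hand-rolled helper): any two quantities that merely drift in the same direction for a decade will produce an r close to 1.

```python
# Toy illustration: two causally unrelated, upward-drifting series
# still yield a near-perfect Pearson correlation.
from statistics import mean

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient, no external libraries."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Made-up annual figures for two things that have nothing to do with each other:
cheese_eaten_per_person = [29.8, 30.1, 30.5, 30.6, 31.3, 31.7, 32.6, 33.1, 32.7, 32.8]
engineering_doctorates = [480, 501, 540, 552, 547, 622, 655, 701, 712, 708]

print(round(pearson_r(cheese_eaten_per_person, engineering_doctorates), 3))
# Prints a coefficient well above 0.9, simply because both series trend upward.
# A model built on correlations like this will "find" a strong relationship
# where there is no causal connection at all.
```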
I agree with SDP. Still too much. (But glad to see this.)
“The governor and Commissioner just dropped it to 5%.”
This is a great idea—in fact, why not drop it to 1%? And since federal law demands yearly standardized tests for students, don’t drop those; instead, make the tests 5 minutes long.
NJ: One down one to go: dump Danielson/ Marzano/ et al SGO part of teacher evaluation!
Real NJ pubsch teachers, correct me if I’m wrong. I have only 2nd-hand info from my sister (Danielson, NYS) & a few local friends (Marzano, NJ) dating from the first couple yrs’ implementation. But what I heard sounded like a colossal amt of teacher-generated ppwk, time-wasting & easily gamed – & very reminiscent of the [now-debunked] MBO method applied to constr projects I worked on in the ’80s [also colossal ppwk, time-wasting & easily gamed].
I have email correspondence with Danielson that includes extended responses to my questions about the research claimed to support her framework. In that exchange she acknowledges there is no research supporting its use in every grade and for every subject where it is used. The Danielson framework was used in the Gates-funded “Measures of Effective Teaching” (MET) project to rate the video clips selected by teachers…a farce claimed as “research” supporting the use of the product. The reliability issues in that flawed use were never fully acknowledged.
Glad to hear I’m not way off the mark here.
My curriculum for PreK conversational Spanish is more or less self-invented (from sparse available resources) – & taught pre-reading/ writing – & I operate under the stds/ assessment radar. It’s very easy for such a course to be aimless, & much of what I was doing early on was instinctive.
I actually learned a lot as the stds/sgo movement began trickling into PreK 10 yrs ago. But not from the actual stds/ growth measurements I encountered! (which were clumsy & restrictive & inappropriate to early lang learning). It was the concept I got: consciously applying the discipline of articulating specific objectives, observing growth, making eyeball assessments of progress, adjusting curriculum accordingly. From that flowed one-theme lesson-plans connected organically, scaffolding, & other good things that have helped me be more effective.
The Danielson et al ‘methods’ seem to me not methods at all, but plodding, user-unfriendly translations of sensible pedagogical concepts into minuscule computer-friendly bits which add nothing to teaching/learning, take away precious time, & warp the concepts. Very much parallel to what CCSS-ELA does to reading/writing pedagogy.