Linda Darling-Hammond of Stanford University posted this graphic on Twitter. It shows the flat lines and declines of U.S. scores on the international PISA test from 2000-2012.
Just think of the waste of billions of dollars for testing under both No Child Left Behind and Race to the Top. Think of the children cheated of a real education. Think about the fact that Arne Duncan says that high-stakes testing is non-negotiable in any new Elementary and Secondary Education Act.
Is it not the definition of insanity to do the same thing over and over for a dozen years without success and to expect better results if you do it for another 7-10 years?
…. and so, flat PISA scores and declines in enrollment in teacher preparation programs notwithstanding, the teacher beatings will continue until morale and scores improve. Aren’t critical thinking and using evidence supposed to be part of the Common Core? Oh, wait. That’s for test questions, not policy decisions.
I just heard from a teacher friend this morning that PARCC had randomized the test, so that on the same section of the test, some students had 7 questions while some had 23. How is this standardized and further, how does it make any sense?! Wow.
“How is this standardized and further, how does it make any sense?! Wow”
Don’t worry about that. Psychometricians and their ilk have perfected the art of fudging truth.
Yeah: psycho metricians.
TAGO!
Then, Arne Duncan should have to explain to Congress how NCLB has improved accountability based on Linda Darling-Hammond’s findings. Maybe they should have looked at information and facts prior to the reauthorization process. We should make an effort to “stop doing stupid.” The problem is facts take a back seat to special interests as the people making decisions have lost their objectivity.
If you go to Stanford’s website and look at the PowerPoint Darling-Hammond will use at her workshop, there are lots of other comparative facts. Our teachers work longer and harder, with larger classes of poorer students, than teachers in countries where students score higher on the PISA. We have less prep time and less time for collaboration, which she maintains leads to better student outcomes. https://edpolicy.stanford.edu/events/1220
But Bill Gates says we have to wait 10 years to see if the Common Core experiment works!
That’s what I loved about the editorial pages in Ohio on the CC:
“It’s just millions of students! Why not experiment?!”
They’re big risk takers on behalf of other people, those “adults”
This is the Achieve mock-up of a Common Core test score report that goes to parents.
Boy, it sure looks like they’re ranking and measuring students based on one test score. The language they use is VERY definitive and reductive. A parent reading this would naturally conclude that this test is the sum total measure of this student.
Click to access Family_Report_ELA_Gr11.pdf
I agree. An extended version of this was in the works that would tell parents how their kids were doing relative to all other schools and suggest options for them to consider… steering based on this kind of detail.
And another would tell parents if the kid was college and career ready, and by middle school name the career clusters and postsecondary options suitable for the student and/or options to catch up.
These are “recommendation systems” that Gates and others want to see elaborated in order to channel “human capital” into the “best value” paths for post-secondary education.
The data-mining and telling parents plus teachers what to do is micromanagement from afar by whoever is paying Achieve to produce this misleading report… also filled with a lot of jargon not likely to be grasped by a lot of parents.
I don’t know how you issue such a definitive report to parents while at the same time insisting one test isn’t definitive and “multiple factors” should be taken into account.
Kids are going to be defined by this test. Particularly because the promoters of the test so over-sold it as a tool. We have been told again and again that this is a great test- the best test ever! The logical extension of that is “this test defines the sum total of what this kid knows and can do”
Is that true? DOES this test predict all that? Aren’t we just taking the over-reliance on ranking people that now exists with the SAT and the ACT and pushing it down to younger and younger kids?
I don’t know if you’re watching the evolution of the Ohio vocational tracking, but they’re going off the rails with it.
I support vocational education. I don’t think everyone has to go to college. However. I knew ed reformers were going to go nuts tracking kids too early and too soon into vocational ed, and they are.
They lack restraint. They go crazy.
I’ll observe yet again that a main thesis of “Reign of Error” is that to a large degree, reform is a response to a so-called manufactured crisis: the state of America’s schools is strong, and to represent this visually, there are a few dozen pages’ worth of graphs and charts with the evidence.
So posts like this confuse me. I understand and respect Keynes’s point about facts changing, but does this mean that you’ve changed your mind since the publication of “Reign”?
Is it not a fair point from Diane that — whether you think there’s a crisis or not — these statistics demonstrate there has been no improvement through the Reformist policies?
I don’t have my first edition first printing hardcover copy of “Reign” with me right now, but off the top of my head I suspect it would attempt to dismiss what this graph says in two ways–1. The chart doesn’t account for poverty; our kids in public schools where less than 10 or 20% of the kids qualify for free lunch do as well as anyone in the world. 2. How the U.S. performs on these tests doesn’t even matter. We’ve been bringing up the rear for generations now and are nevertheless the most powerful and wealthy nation on earth.
I think I see your point, but I don’t think Diane’s comments are at odds in any way with her earlier commentary. She suggests that the crisis is manufactured, but whether you agree with that or not, the new policies have done nothing to improve scores. You can take issue with Diane’s analysis of the statistics, but I’m not sure how you could be confused about the point of her post that NCLB and other W/Rhee/Duncan Reforms have not led to measurable improvements (despite coming, as Steve K notes below, at large costs).
As far as disaggregating PISA data, Stephen Krashen did disaggregate past PISA scores. He found that when the US students’ scores were sorted by socioeconomic level, they were in the top tier at each of the socioeconomic levels. What pulls our scores down is the high level of poverty in the US. Today more than half of students in public schools are poor. Here is a link with a breakdown. http://christinemccartney.net/
“. . . these statistics demonstrate there has been no improvement through the Reformist policies?”
NO! They don’t demonstrate that.
They don’t demonstrate anything as the results are COMPLETELY INVALID as proven by Noel Wilson. Using PISA as Diane is doing is just as invalid a way of critiquing the educational malpractices of the edudeformers as it is for doing all the other magical things (evaluating students, teachers, schools, etc. . . ) that these INVALIDITIES supposedly do.
Diane can speak for herself on this. My response would be that this does not demonstrate the strength of schools. Rather, it demonstrates that a decade plus of test-based accountability has not moved the needle. Instead, it has been a financially costly exercise that has been increasingly used to leverage improvement, and it has failed.
It would suggest that NCLB and RttT have had zero effect. Sure, there are fun little selective stories and the occasional successful model (that can never really be replicated on a large scale). But the above graph demonstrates treading water at best (actually scores have declined a little over that time). It is not an indicator of the strength of the system but rather a reflection of the effect of test-based accountability policies.
I think it calls “high expectations = better performance” into question, and high expectations is absolutely central to the ed reform “movement”.
They set the NCLB goals very high- so high that they had to put in an elaborate waiver system to avert disaster. So they should ask whether there was a benefit that can be linked to “high expectations” because that idea is also central to the Common Core testing.
At a bare minimum I would want to see the data disaggregated by student characteristics before arriving at any conclusions.
The devil’s advocate might say that as of 2012, the beneficent effects of rigorous national standards tied to test score-based accountability had yet to kick in.
Great summary!
Tim, disaggregation is a fair point. I would agree. But, remember, it’s the collective national score that is cited when reformers talk. My interpretation was that Race to the Top wasn’t just a policy title but rather a slogan. To the top of what? The PISA tables!
Disaggregation would be useful, but in this day and age of cherry-picking and selective use of statistics (and selective interpretation of statistics) I think that this chart does reflect that the policy has not worked. One can also argue that it hasn’t been harmful, I guess.
And, ultimately, that appears to be Darling-Hammond’s point. What have we gotten for our foray into test-based accountability? What have we gotten for the money directed into the coffers of Pearson and the like? Not much it appears.
We must disaggregate with all deliberate speed.
Steve K: I would like to expand a little, from my POV, on your statement that “it demonstrates that a decade plus of test-based accountability has not moved the needle.”
Apart from the question of whether or not standardized tests measure anything at all or anything important or just generate scores that are used/misused/abused to suit the agenda of those pushing charters/vouchers/privatization—
There is a simple and incontrovertible fact: by their own standards, the leaders and enforcers and apologists for self-proclaimed “education reform” have failed by their single most sacred metric.
They are stuck with their unacknowledged failure to deliver the goods aka ascending numbers on their fantastical promises and wishful thinking and aspirational chimeras.
I hold them fast to one of their mantras: No Excuses. Because even when squishing figures and torturing stats and withholding data and using words to mean what they want them to mean, they don’t measure up to their own high-falutin’ rhetoric.
So when it comes to doing just the one thing they keep pushing down our throats as the ultimate and final indication & assurance of genuine learning and teaching (getting the numbers up), they can’t even do that one single thing.
With billionaires backing them, political heavyweights pushing their policies, educrats enforcing and enabling their gimmicks and schemes, they can’t move one lone needle even a little bit in the “right” direction.
Their response to all this? See a previous posting today on this blog, quoting the NJ Education Commissioner: “Whatever we’re doing, we need to double down.”
Failure, thy name is “education reform.”
😎
When will the blinders come off?????
Here’s a very unbiased and scientific report:
“Because education in this country is by-and-large a public enterprise, champions of change and defenders of the status quo must turn to elected and appointed officials to advocate for their desired outcome.”
Good Lord. “Champions of Change versus Defenders of the Status Quo”
http://www.brookings.edu/research/reports/2015/03/04-education-advocacy-whitehurst?rssid=LatestFromBrookings&utm_content=bufferc5013&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
Grover Whitehurst was research director for George W. Bush’s Institute of Education Sciences. He now heads the education program at Brookings. One of his innovations was to create an annual ranking that honors the district that has the most school choice.
CROSS POSTED AT
http://www.opednews.com/Quicklink/A-Stunning-Graphic-on-the-in-Best_Web_OpEds-Diane-Ravitch_Education_Failures_Testing-150304-633.html#comment535835
Darling-Hammond for Secretary of Education. To think that Obama almost chose her! Instead he got Duncan. What a tragedy for us all.
Don’t give Obama any more credit than he deserves. He didn’t “almost” choose Darling-Hammond. He pulled a bait-and-switch. He knew all along it would be Arne.
Barbara Riverwoman,
What do you mean “almost”?
He put her there as arm candy for the liberal mindset, and then ditched her ASAP to please his cronies. He never had Linda Darling-Hammond in mind to begin with.
Now we have Duncan, whose biggest intellectual competition is Chloe Kardashian and Teresa Giudice . . . .
Almost makes me yearn for Chloe Kardashian.
Dienne! You’re bad. 😉
http://www.livingindialogue.com/big-error-school-accountability/
It matters not how many facts accumulate to demonstrate the idiocy of the reformers; if insanity and cruelty to children make political donors some money, then that will be policy. It’s going to take a whole lot of people to overturn the systematic dismantling of public schools. I pray that happens.
Diane,
Not sure where to post this but please read and hopefully post this great article on the politics of testing in Colorado:
http://northdenvernews.com/scare-tactics-testing-dfer-mikejohnston-edreform-janegoff/
Here is part of the story of New York’s grade 3-8 ELA and math assessments, and some of the ways in which failure has been manufactured by numerous changes to the tests, performance expectations and proficiency levels over the years:
http://www.citizencombatants.net/home/testing-history-in-ny-state
Do you have anything similar for 2005 and prior?
Not at this point. I have extended this work by looking at the change in percentage at Level 1 by accountability group, particularly between 2012 and 2014–but I don’t have those slides posted yet. I’ll put them up in the next week or so.
Thanks. Not to add to your workload or slough off what I conceivably could do myself, but what I think parents would find most interesting is a table comparing the length of the tests, going back as far as possible.
FLERP,
Good point. Until I took the SAT, I never had a test that lasted longer than 45 minutes.
Yup. It doesn’t make any change at all. This is like comparing apples to oranges, or donkeys to elephants. PISA test scores reflect the students who have decent social well-being, are born into families with decent incomes, and have solid access to educational resources. Even in Japan, students from higher-income families come out on top, and students from working-class or lower-income families come out at the bottom in PISA test scores.
Again the PISA test score results don’t mean squat. The test and results suffer from all the epistemological and ontological errors that render any results COMPLETELY INVALID. All this type of talk/banter is mental masturbation that may “feel good” but certainly doesn’t accomplish the primary goal of orgasmic delight.
To understand why the test and results are less than the “real thing”, i.e., COMPLETELY INVALID read and understand Noel Wilson’s never refuted nor rebutted treatise on educational standards and standardized testing “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine
1. A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing the descriptive information about said interactions is inadequate, insufficient and inferior to the point of invalidity and unacceptability.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference,” each with distinct assumptions (epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think college professor who “knows” the students’ capabilities and grades them accordingly), the General Frame (think standardized testing that claims to have a “scientific” basis), the Specific Frame (think of learning by objective like computer-based learning, getting a correct answer before moving on to the next screen), and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error in the process is caused when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words, all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid, that is, errorless or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the test and results are. He is an advocate for the test taker not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. And a basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”) or to put in more mundane terms crap in-crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words it attempts to measure “‘something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self evident consequences. And so the circle is complete.”
In other words students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, the “I’m an ‘A’ student” is almost as harmful as “I’m an ‘F’ student” in hindering students becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
By Duane E. Swacker
Well, the first question any for profit business asks is: What markets should you be in?
Guess the “rich” found a way to make even more profits for themselves—close down public schools via repression using ccss and high-stakes tests. It’s a win-win NOT for our country, but for the few at the top of the pyramid.
Even when the very data they claim is so good points in the opposite direction, they ignore it. This is MARKETING, folks. This is, after all, JUST BUSINESS.
And, of course, follow the $$$$$ trail. Who profits? Who does not?
http://blogs.edweek.org/edweek/learning_deeply/2015/02/the_role_of_testing_to_support_deeper_learning.html
A good read from Linda Darling-Hammond.
I bow in homage to Noel Wilson and Señor Swacker. Now, having suspended our disbelief, on to the test sores.
The test scores are always supposed to rise, n’est-ce pas? And it’s assumed that the students are somehow static, unchanging from year to year. But teachers know it’s not true: one year you have all lovely children whom you want to adopt and the next it’s the class from hell.
So, what has changed since 2000? Well, we went to war in two countries, sending parents away from their children. We had a nearly total destruction of our financial system. Unemployment reached record numbers and we have had a jobless recovery. Tens of thousands of families went through foreclosures, many became homeless. Hurricane Katrina destroyed New Orleans and its public schools. TFA became a hurricane of its own. The rate of poverty among school children rose above 50%. Public school budgets have been slashed and not restored. Class sizes have grown. Teachers have lost job protections as well as hard-earned pension benefits, making teaching a tenuous profession. An unproven, developmentally inappropriate curriculum has been forced on children and teachers by a few billionaires. Money has been sucked away to charters and to paying for iPads and Chromebooks.
Since test “results” reflect the circumstances of those who take them, it must be only the hard work of classroom teachers everywhere, who have remained focused on delivering the best education they can muster to the kids in their care, which has prevented the test scores from plummeting to the very bottom of the reformsters’ charts.
Bravo, colleagues!
“…it must be only the hard work of classroom teachers everywhere, who have remained focused on delivering the best education they can muster to the kids in their care, which has prevented the test scores from plummeting to the very bottom of the reformsters’ charts.” …and the kids who insist on learning despite the “best efforts (practices)” of the power brokers to destroy all desire to do so.
Just opened this from Twitter in one of my e-mail accounts—there were six in this list and Diane was listed as number 2:
Popular in your (Twitter) network
Diane Ravitch
@DianeRavitch
A Stunning Graphic on the Failure of Test-Based Accountability wp.me/p2odLa-9Ky
No, no, no! You are all totally missing the point. Look how bad teachers have become lately!! That’s how to view this graphic. If only teachers would teach the damn standards and add rigor and make the kids have grit……. Sheesh!
You don’t seriously think the reformsters will see it any other way, right?!?!
It’s all in the spin……
Yes, it’s obvious that the reformers are defining what failure is in public education through the CCSS rank and yank testing. Nothing else counts.
It doesn’t matter if the test works or not or is seriously flawed. The test is a tool to set up (frame) the public schools in a context the reformers control that equals failure.
CCSS testing = failure and the reformers control the number that determines failure. For instance, in New York the cut score was set so 70% of students would have to fail no matter what they did on the test.
The language in NCLB supports the reformers because 100% of children must be considered college and career ready by any means possible and the tool they are using to determine failure—and even 99% college and career ready means failure for the public schools—is a test they control.
Failure under NCLB’s language is guaranteed because 100% is impossible. All the reformers needed was the CCSS standardized test as the tool that would be used to fulfill the language of NCLB and then the public schools can be shut down and replaced for profit.
Legally, a bad law is a bad law but it is still the law. NCLB was a set up from the start.
You know, I am hearing about the poor performance of teachers from many friends who are teaching. Bill Keller did a great article in the NY Times, years ago, about the need for education degrees to have the rigor of medical or law degrees.
I think that an ed degree is too often the last refuge of young people who want an ‘easy’ degree, and who are not accustomed to doing work in high school.
The profession is under attack, and if the crop of teachers who emerge are not up to the job, then, yes, public education will fail.
But also, in fairness, really smart people go to college, spend a fortune, and end up in a workplace where the top-down mandates make it impossible to motivate the kids, to really engage them. In no time, novice teacher-practitioners discover there is no support from above, that their practice contains children who disturb the process of learning, and that there are no supplies, there are huge classes, and no time to plan or meet student needs… they give up. Why dedicate yourself and give your all when no one cares?