On behalf of the National Education Policy Center at the University of Colorado, Kevin Welner and William Mathis have written an excellent overview of the failure of standardized testing as the driver of educational reform.
Here is the summary:
“In this Policy Memo, Kevin Welner and William Mathis discuss the broad research consensus that standardized tests are ineffective and even counterproductive when used to drive educational reform. Yet the debates in Washington over the reauthorization of the Elementary and Secondary Education Act largely ignore the harm and misdirection of these test-focused reforms. As a result, the proposals now on the table simply gild a demonstrably ineffective strategy, while crowding out policies with proven effectiveness. Deep-rooted trends of ever-increasing social and educational needs, as well as fewer or stagnant resources, will inevitably lead to larger opportunity gaps and achievement gaps. Testing will document this, but it will do nothing to change it. Instead, the gaps will only close with sustained investment and improvement based on proven strategies that directly increase children’s opportunities to learn.”
Congress is about to pour more billions into mandating the testing of every child in grades 3-8, every year. Given the research consensus documented here, the question is: Why?
The report begins:
“Today’s 21-year-olds were in third grade in 2002, when the No Child Left Behind Act became law. For them and their younger siblings and neighbors, test-driven accountability policies are all they’ve known. The federal government entrusted their educations to an unproven but ambitious belief that if we test children and hold educators responsible for improving test scores, we would have almost everyone scoring as “proficient” by 2014. Thus, we would achieve “equality.” This approach has not worked.
“Yet over the past 13 years, Presidents Bush and Obama remained steadfastly committed to test-based policies. These two administrations have offered federal grants through Race to the Top, so-called Flexibility Waivers under NCLB, School Improvement Grants, and various other programs to push states, districts, and schools to line up behind policies that use these same test scores in high-stakes evaluations of teachers and principals, in addition to the NCLB focus on schools. The proposed new Teacher Preparation Regulations under Title II of the Higher Education Act now attempt to expand the testing regime to teacher education programs. These expansions of test-driven accountability policies require testing even beyond that mandated by NCLB.
“Not surprisingly, current debates over the reauthorization of the Elementary and Secondary Education Act (ESEA), of which NCLB is the most recent iteration, now center around specific assessment issues such as how many tests should be given and which grades should be tested, as well as the respective roles of state and federal governments. Largely lost in these debates is whether test-based accountability policies will produce equitable educational opportunities through substantially improved schooling. This NEPC Policy Memo explains why they will not. Instead, we argue that as a nation we must engage in a serious, responsible conversation about evidence-based approaches that have the potential to meaningfully improve student opportunities and school outcomes.”

This quote says it all about test-and-punish reform:
“What was educationally significant and hard to measure has been replaced by what is educationally insignificant and easy to measure. So now we measure how well we taught what isn’t worth learning.”
– Arthur Costa, Emeritus Professor at California State University
Have any democrats been told about this study? Is there any pressure upon Obama and Arne Duncan to pull their heads out of wherever they are located?
If you give a group of students a test and the class fails as a whole, do you expect the test to fix the problem, or do you have to create a plan to reteach the material in another way? From the onset this report says that testing is effective in identifying short comings, but not in driving change. Well, here is your sign! If the data is collected and the problems are identified, but then no plan is initiated to move the ball, then you get what you have always got. In Finland, the curriculum is so well defined that it leaves little room for interpretation and no room for ineffectiveness; all classes are pass/fail. Fail it, you retake it next school year…. If the ministry sees too many children failing, then they run their version of our standardized testing (a rose is a rose by any other name), call in the stakeholders, and then turn it over to the curriculum writers to solve the problems before sending the material back to the teachers to teach…. So the reality, should we choose to accept it, is to use the data to create a plan. Wow, now there is a concept!
Using crappy data from crappy tests would create a crappy plan – what a crappy concept! Read the quote I posted and please do your best to understand just how profound it is. The Common Core aligned PARCC and SBAC tests will not produce data worth analyzing, or identify any problem worth solving.
That makes no sense to me. How would such a concept help blame teachers for failure, and prove that class size and money are unrelated to dealing with problems? How would this help in making budget cuts to further weaken public schools? How could it be used to make effective use of the term “union thugs”? I am sure you gave it all some study……but…..it lacks that explosive inspiration so dear to the reform movement.
“From the onset this report says that testing is effective in identifying short comings,” We know about the shortcomings, but we do nothing to address them. We need to take the bulk of the money to effect change, not on more identification. It is the same as knowing that a shoe is too small for you, but you keep buying the wrong size hoping they won’t pinch your feet. But they do! The outcome is the same, and so it is with more testing. More testing and resources wasted on identification will not improve the results.
Q,
“From the onset this report says that testing is effective in identifying short comings. . . ”
Unfortunately for you standardized test lovers that statement is blatantly false.
Standardized tests are COMPLETELY INVALID as proven by Noel Wilson in his never refuted nor rebutted 1997 dissertation. Read it and understand why “testing IS NOT EFFECTIVE in identifying short comings.” See:
“Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine.
1. A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing the descriptive information about said interactions is inadequate, insufficient and inferior to the point of invalidity and unacceptability.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference,” each with distinct assumptions (epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think of a college professor who “knows” the students’ capabilities and grades them accordingly), the General Frame (think of standardized testing that claims to have a “scientific” basis), the Specific Frame (think of learning by objective, as in computer-based learning: getting a correct answer before moving on to the next screen), and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program, where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error in the process is caused when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words, all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid, i.e., errorless, or at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the test and results are. He is an advocate for the test taker, not the test maker. In doing so he identifies thirteen sources of “error,” any one of which renders the test making/giving/disseminating of results invalid. And a basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory.” In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”), or, to put it in more mundane terms, crap in, crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words, it attempts to measure “’something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students, as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self evident consequences. And so the circle is complete.”
In other words students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, the “I’m an “A” student” is almost as harmful as “I’m an ‘F’ student” in hindering students becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
By Duane E. Swacker
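Wilson’s “error band” argument above can be illustrated with a small simulation (a hypothetical sketch, not from Wilson’s paper: the ability values, noise level, and function names are all made up for illustration). If an observed score is a true ability plus random measurement error, then two sittings of the same test by the same students can produce different scores and different rank orders — the ranking lives inside an error band rather than pinning down a stable truth about each student.

```python
import random

random.seed(0)

# Hypothetical illustration: ten students with evenly spaced "true"
# abilities, tested twice with random measurement error.
true_ability = [50 + 5 * i for i in range(10)]   # 50, 55, ..., 95

def administer_test(abilities, noise_sd=15):
    """One test sitting: observed score = true ability + random error."""
    return [a + random.gauss(0, noise_sd) for a in abilities]

def rank_order(scores):
    """Student indices ordered from highest to lowest observed score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])

sitting_1 = administer_test(true_ability)
sitting_2 = administer_test(true_ability)

# Same students, same "true" abilities -- yet each sitting yields
# different observed scores, so any ranking built on a single sitting
# sits inside an error band rather than fixing a stable order.
print(rank_order(sitting_1))
print(rank_order(sitting_2))
```

With the noise here (standard deviation 15, abilities only 5 points apart), adjacent students routinely swap places between sittings, which is the point: attaching one sitting’s score to the student, rather than to the interaction, treats an unstable measurement as a stable fact.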
You seem to forget the most important fact: teachers don’t create standardized tests. They are created by non-educators from testing giants named P, G, A-B, etc.
NY Teacher: I have used that quote before.
Excellent choice.
I have also used the following on this blog. It is a reminder that, barely out of the gate, NCLB was not just a predictable failure; it was already failing:
“The biggest problem with the NCLB Act is that it mistakes measuring schools for fixing them.”
[Linda Darling-Hammond in her contribution to MANY CHILDREN LEFT BEHIND; HOW THE NO CHILD LEFT BEHIND ACT IS DAMAGING OUR CHILDREN AND OUR SCHOOLS, 2004, p. 9]
Reinforced by the following comment by Alfie Kohn in his contribution to the same book, p. 86:
“How many schools will NCLB-required testing reveal to be troubled that were not previously identified as such? For the last year or so, I have challenged defenders of the law to name a single school anywhere whose inadequacy was a secret until yet another wave of standardized test results was released. So far I have had no takers.”
I saved the “best” for last, from this blog: the architect of NCLB, His Own Bad Self—Sandy Kress! Forget his delusional “there are too many bad tests” [see below] and see how he works the “you don’t fatten a pig by weighing it” argument into his version of the Teflon Defense [of his own actions].
[start blog posting quote]
Yes, there are too many tests (and too many bad tests) – but no, it’s not the fault of NCLB. “Why [states and districts] chose to have tests on top of tests on top of tests” instead of improving instruction “is beyond me,” he said. The testing mania not only spurred the anti-NCLB backlash, but it flat out didn’t work, Kress said: “If you spend all your time weighing your pig, when it comes time to sell the pig, you’re going to find out you haven’t spent enough time feeding the pig.”
[end blog posting quote]
Link: https://dianeravitch.net/2015/01/13/nclb-architect-defends-nclb/
Teachers. Students. Parents.
Opt out of the insanity.
Time for genuine learning and teaching to be given a chance.
A better education for all. Whatever it takes.
😎
Congress and USDE and the people they listen to are indifferent to evidence, amid much posturing about wanting evidence-based policy and practice.
I work in arts education where evidence from empirical research (as defined by USDE) is scant and not clearly valued over context-specific qualitative research.
Insisting on evidence-based research for every aspect of education, even from scholars who are meticulous, may well be a first-order mistake in educational thinking. Why?
There is no “steady state” in any classroom, school, or district such that the most significant factors in the education of any student can be pinned down and then “replicated or scaled up.” That seems to me the main point of pursuing “evidence”–doing triage on “what works, and does not work,” a USDE slogan that also influences what counts as “evidence.”
I think that the most enduring influences of teachers on their students are largely unmapped territory, primarily because the impulse to get data and treat it as “evidence” is dominated by very short time frames, and truncated, with research on mathematics and reading too often leading to conclusions about the whole of education.
I would like to see much more attention to what is NOT measured now, and to what is beyond measure (also the title of a forthcoming film).
Consider, for example, a demand for standards-based instruction long forwarded by federal and state policies, most recently seen in the CCSS–which Congress is morphing into an agenda for college OR career prep.
In 2013 in Ohio, there were 3,203 standards on the books, including 1,620 in the so-called Common Core, all approved at the state level, about 267 per grade, not counting those for technology, financial literacy, and others. Further, these standards may not stand still for long: they do not reflect new national standards in the sciences and the arts.
Where is the “evidence” to support this proliferation of standards? Where is the evidence that supported the specific decisions and procedures that put these standards on the books? There is a difference between facts/evidence and values.
There is more pontificating about evidence, than interest in learning from it. For example, nothing about the process or outcomes from standard-setting seems to have been learned from the Goals 2000 project. We are again drowning in standards created by groups who hope that grade-level (not grade span) standards will command more curriculum time. Time is a scarce resource. Under the Goals 2000 Educate America Act (H.R. 1804, 1994), K-12 standards were written in 14 domains of study, 24 subjects, then parsed into 259 standards, and 4100 grade-level benchmarks, some of those for grade spans.
What did not happen then, and has not happened since, is some comprehensive thinking about relationships among standards. What do those 3,203 standards in Ohio mean for the general education of students in Ohio, if they are acted upon?
Can these standards be fashioned into reasonably coherent and engaging curricula that are also feasible? Would “evidence” be relevant to those deliberations, or is this really “a vision thing”?
What is the point of having all of the standards if they do not speak to an intended curriculum and allocations of time for instruction within and across the grade levels?
What is the point of all of the standards if they do not speak to the matter of resources other than time, including budgets?
This is to say that values matter. This is to say that context matters.
I am just restating that there are no panaceas or silver bullets from “evidence,” especially when evidence is treated as if it were objective and equivalent to an enduring truth. An example from the vintage Coleman report: “The teacher is the most important in-school factor influencing student learning.”
That snippet of “evidence” has been used to justify unparalleled surveillance of teachers. It is too often coupled with a casual disregard for the resources provided to teachers and students, including the time allocations for instruction, and freedom from excessive testing for the sake of mandates for data of little actual use in the practical world of teaching and learning.
Cross posted at http://www.opednews.com/Quicklink/NEPC-The-Failure-of-Test-in-Best_Web_OpEds-Accountability_Diane-Ravitch_Education_Failures-150217-211.html#comment533947
with this comment, quoted from a post at your site: Why Most Students Will “Fail” PARCC Test.
Also, G. F. Brandenburg alerted Ravitch to this very troubling analysis by Russ Walsh of the reading levels in the PARCC test. Brandenburg titled his post: “Looks like the reading levels of the new PARCC were deliberately set so high that most students will give up.” Brandenburg says of Walsh’s findings:
“Many analysts say that mass failure is precisely the goal of the people who designed the Common Core tests: if they define “mastery” as reading and doing math two grades above current grade level, then by definition all but a tiny fraction of students will fail, and these “experts” can proclaim that public education is a failure and must be abolished.
“It’s an evil plan worthy of an evil genius.”
Russ Walsh scrutinized reading passages from the PARCC test for grades 3-8 to determine their readability and appropriateness for each grade level. He used five different measures of readability. After reviewing the outcome of his analysis, Walsh concludes:
“The PARCC sample tests show that they have certainly raised the bar when it comes to making reading comprehension passages quite difficult at every grade level.
“These results clearly show that even by the altered Lexile level standard the 4th grade passage is much too difficult for 4th grade children. I would hope that the actual PARCC would not include any material remotely like this over-reaching level of challenge for children. I would hope, but the inclusion of this passage in the sample does not give me confidence.
“The other results show that the passages chosen are about two grade levels above the readability of the grade and age of the children by measures other than the Lexile level. The results of testing children on these passages will be quite predictable. Students will score lower on the tests than on previous tests. We have already seen this in New York where test scores plummeted when the new tests were given last year. English Language Learners (ELL) and students with disabilities will be particularly hard hit because these tests will prove extraordinarily difficult to them.
“What happens when students are asked to read very difficult text? For those students who find the text challenging, but doable, they will redouble their efforts to figure it out. For the majority of children, however, who find the text at their frustration level, they may well give up. That is what frustration level in reading means. The ideal reading comprehension assessment passage will be easy for some, just right for most and challenging for some. The PARCC passages are likely to be very, very challenging for most.”
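The excerpt does not say which five readability measures Walsh used, but one widely used formula of this kind is the Flesch-Kincaid grade level, which estimates a U.S. grade level from average sentence length and syllable density. A minimal sketch (the passage counts below are invented numbers for illustration, not figures from Walsh’s analysis):

```python
def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
    """Flesch-Kincaid grade level: estimated U.S. school grade needed
    to read a text. Longer sentences and more syllables per word push
    the score (i.e., the difficulty) up."""
    return (0.39 * (total_words / total_sentences)
            + 11.8 * (total_syllables / total_words)
            - 15.59)

# Two hypothetical 100-word passages:
hard = flesch_kincaid_grade(100, 5, 140)    # long sentences, polysyllabic words
easy = flesch_kincaid_grade(100, 10, 120)   # shorter sentences, simpler words

print(round(hard, 1))  # 8.7
print(round(easy, 1))  # 2.5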