Today the federal government released the NAEP 2012 “Trends in Academic Progress.” This is known as the Long-Term Trend report. These tests seldom change in content. They are given every four years to national–not state–samples of students at ages 9, 13, and 17.
The reports say that achievement is stagnant, but it is not true. What is truly stagnant are the scores for the past four years.
There were big achievement gains from 1971-2008 for whites, blacks, and Hispanics, and big achievement gains for students at every age level tested–ages 9, 13, and 17.
From 1971-2008, in reading, black students at age 9 gained 34 points; at age 13, 25 points; at 17, 28 points.
From 1971-2008, white students at age 9 made gains of 14 points; at 13, 7 points; at 17, 4 points.
From 1971-2008, Hispanic students at age 9 gained 25 points; at 13, 10 points; at 17, 17 points.
However, for the past four years, from 2008-2012, the scores have been stagnant for every racial and ethnic group and for every age group, with the exceptions of Hispanic 13-year-olds and female 13-year-olds.
From 2008-2012, the acme of the high-stakes testing era, there were no gains for black students at ages 9 or 13 or 17.
From 2008-2012, there were no gains for white students at ages 9 or 13 or 17.
From 2008-2012, there were no gains for Hispanic students at ages 9 or 17. At 13, Hispanic students gained 7 points.
From 2008-2012, there were no gains for males in any age group.
From 2008-2012, there were no gains for females at ages 9 or 17. At age 13, females gained 3 points.
The lesson of the new report: billions spent on high-stakes testing have had minimal to no effect on test scores.
High-stakes testing has failed.
We need to take a new course.
Even the couple of numbers that measured as “statistically significant” are not necessarily meaningful. The .05 significance level means that when there is no real difference, the test will still report one about 1 time out of 20. Therefore, if you slice and dice your data into more than 20 categories and one or two show a significant difference, that could easily be an artifact of chance rather than real improvement.
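The slice-and-dice point can be checked with a little arithmetic. The sketch below is only an illustration, not anything taken from the NAEP report: it assumes every comparison is an independent test at the .05 level and that no real change occurred in any subgroup. The function names are made up for this example.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of num_tests comparisons comes out
    "significant" purely by chance.

    Assumption (not from the report): the tests are independent and
    there is no real difference in any of them.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


def expected_false_positives(num_tests: int, alpha: float = 0.05) -> float:
    """Average number of chance "significant" results under the same assumption."""
    return num_tests * alpha


if __name__ == "__main__":
    for k in (1, 5, 20, 30):
        rate = familywise_error_rate(k)
        expected = expected_false_positives(k)
        print(f"{k:>2} comparisons: P(at least one false alarm) = {rate:.2f}, "
              f"expected false alarms = {expected:.1f}")
```

Under these assumptions, 20 comparisons give roughly a 64% chance of at least one spurious "significant" result, with one false alarm expected on average. Real subgroup comparisons overlap (the same students appear in more than one category), so the true figure would differ, but the direction of the argument holds.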
Surely there was a better way to spend the billions that went to testing.
Diane,
I think you may have misinterpreted the results. It appears high stakes testing has had an impact, a negative one. Has anyone teased out regional differences between Reformy areas and states stemming back the reformer tide?
crazycrawfish: while I await further analysis by others, my first reaction is [what appears to be] your initial reaction.
In any case, I don’t think the edubullies and their accountabully underlings are going to like the discussions that will ensue on this blog and many others. NAEP hasn’t been kind to their arguments in the past; I don’t see that changing this time around either.
P.S. Thank you for all the postings on your blog.
🙂
The NAEP has a data-mining area that can generate reports. Not only does it show stagnant or declining results beginning in 2004 (there are gaps prior to that year), for example in 17-year-old math, but states with strong teacher organizations prior to the GOP takeover fared better than the national average (Massachusetts, Wisconsin, Ohio). The exception is California, but maybe that was their tax cutting and Arnold? No coincidence that 2004 is the year NCLB and Bush’s reforms started kicking in.
Those who truly understand testing will understand that it is an assessment, not an intervention. As such, high stakes testing isn’t intended to “work,” but to measure other things that are expected to work.
This would be similar to saying that a reading assessment in K didn’t “work” because the curriculum implemented was ineffective, or that your scale was broken because you didn’t lose any weight.
The “high-stakes” part makes it an intervention – pretty much *the* intervention. When the teacher’s job/the school’s survival/the kid’s graduation/etc. depend on that high stakes test, all you can do is spend your time teaching to and preparing for the test. To pretend that high-stakes testing is separate from curriculum is, at best, disingenuous.
What you’re speaking of is really the accountability movement – not using assessments to measure academic progress. It’s what’s done with those assessments. I’m against the current version of accountability in many places, and I’m with you. But that’s different from being against assessments themselves.
Real teachers know all about authentic assessments. We can determine what students know and don’t know, get and don’t get without a computerized or scantron bubble choice. We are wasting valuable time assessing and not teaching/learning. We are professionals; we know what we are doing. Edufrauds are not needed or wanted. They are a distraction.
Hi Linda – I largely agree with you and don’t really think they’re valuable, especially given the cost and time. We’ve disagreed before that all teachers can create all needed assessments “in house,” but I think we find common ground on the worth of many state tests.
Still, good or not, we can’t argue against an assessment because it doesn’t produce an intervention effect, because it’s not an intervention.
If the assessment doesn’t lead to more insight or a useful intervention to teach the student to think for themselves and become independent readers/learners, then what IS the point exactly?
Teachers, students and parents don’t regard the following as valuable: ranking and stacking, numbering and labeling, classifying and sorting, shaming and punishing.
Teaching and learning are human interactions based on trust and mutual respect.
I am not pushing widgets on an assembly line. Children are people, not data for sale.
I definitely agree with your first paragraph when you say that an assessment should be useful when planning instruction. No question.
In terms of your comments about “classifying and sorting,” activities such as that can be an important part of a learning sequence. Independent thought doesn’t happen at first – a strong foundation needs to be laid. As an example, I think it would be completely useful for a child to “classify & sort” different words into different categories of phonics patterns (e.g., short /i/ vs short /a/). That skill will ultimately promote independence, but by itself isn’t sufficient to enable “independent thinking.”
Wow! Hurray for us. We agree on something. 🙂
However, I was referring to classifying and sorting students by levels, disabilities, points scored on a standardized test. The kids know where they end up: level 5 (advanced) vs. 4 (mastery) all the way down to 1 (below basic) as well as their lexile level, whether they are sped, 504, ADD, ADHD…that’s what I meant by classifying and sorting.
However, I agree 100% with your take on those terms.
You know Linda, I think we’re in full agreement here! Let’s savor the moment 🙂
Cheers..I will tap my bottled water to you!
eded – you seem to have a consistent habit of conflating “assessment” with “standardized test”. In theory I’m not completely opposed to standardized tests if they were used for assessment purposes, but it never seems to work out that way – standardized tests are endlessly subject to mission creep and will inevitably be used for high-stakes purposes, at which point you lose the assessment value. And anyway, as Linda points out, there are other (and usually better) ways to assess students that don’t involve standardized tests.
I thought the same thing.
In the 1980s, I taught in an elementary school in a low-income area of Brooklyn. Those were tough times, with high unemployment, a raging crack epidemic, AIDS on the rise and the Reagan-conservative policy agenda gaining ascendancy. Then too, there was purposeful underinvestment in jobs, pre-school, public education, health care, housing and infrastructure. Then too, there was very significant pressure from district and city-wide administrations on low-scoring schools like ours. The newspapers ranked the schools and our low place on the list was a source of embarrassment. We worried about being declared a “school under review.” Then too, test prep was a growing part of our daily lives. In fact, our administration thought it would be a good idea to have multiple, full-scale, school-wide practice tests, to familiarize kids with the procedures and items. Each year, we started these practice drills a bit earlier in the year. One year, we started in January with monthly run-throughs. After the third one, with the real April tests looming, we met as a staff to examine the results and plan next steps. The results showed declining scores. I raised my hand to say, “All we are doing is making the kids and teachers more anxious with no apparent improvement. Can we stop now?” The administration would not back off… too risky! We conducted yet another round of practice tests. Once again, when the real test results came in, we were still in an unenviably low position on the rankings. The next year, it started all over again with the same result.
Here we are twenty-five years later promoting the same failed theory of action, but with even greater intensity. NAEP scores are stagnating after a long period of improvement. Listening to many policy makers, the solution seems to be obvious… double down on the anxiety strategies! Where have all the improvements gone, long time passing? When will they ever learn?
Diane, why do you choose a four-year period starting in 2008? Why not a six-year period, or an 8-year period, or a 10-year period? Is there something about 2008 that’s meaningful, policy-wise, or is it just that that’s where we see the trendline flatten?
Oh, I see, the tests happen every 4 years. All-righty then.
Didn’t Duncan come into power then? I guess we’ve been on a Race To Nowhere.
Gives new meaning to the Duncan cap…coined by TC yesterday.
Poor, poor Arne…it’s all falling apart. He needs a new message real quick. Eli? Bill?
Perhaps we should calculate Arne’s value added, i.e. the average NAEP score in 2012 minus the average NAEP score in 2008, and compare it with other secretaries of education (or whatever he is).
And since NY Ed administrators have no accountability score maybe we can do the same for them using New York data.
[Of course, if I was doing this for real I’d probably want to adjust for some other variables, e.g. ethnicity but probably not poverty since poverty is no excuse.]
NYC and the USDOE have had plenty of time to “reform”. Time to turn around Arne and Tweed….out with the bums. Fire 50% of the staff and hire new blood.
What’s good for the indentured servants is good for the big boys!
They are F A I L U R E S!
This is huge! Data showing exactly what I am seeing anecdotally in the classroom. Standards-based education is bad education. Accountability hawks have made things worse. Non-professional educators leading education are lost.
I just saw NBC news where the anchor said “bad news from the federal gov’t about student achievement…” It mentioned the increase in scores for all ethnic groups but claimed no improvement since the 70s, and claimed the reason was that lower-performing students were staying in school, which led to lower test scores. I have no doubt that that is a factor. But Diane is right: the accountability movement has done nothing to improve education in this country.
The problem with real-world data is that so many things happen in the real world. As many have pointed out, poverty has an important impact on learning. Perhaps the Great Recession had an impact on these scores. It is hard to tease out the impact of what goes on in the classroom from what happens in the world.
It’s hard also to deny or ignore the tremendous impact that “reform” has had in the last 11+ years, isn’t it? Since NCLB was signed into law the government has had basic control of education through manipulation by funding.
Despite the best efforts of Chester Finn and the Fordham Foundation, standards-based education has failed to produce the promised results under any circumstances. Yet Finn et al still cling to their disproven theories and claim “lack of fidelity” and the need for ever more punitive controls.
Remember, te. NO EXCUSES. If teachers and schools aren’t allowed to “tease out the impact” of outside factors, the standardistas and reformers aren’t allowed to do so either, even if that makes one’s pet theories uncomfortably off kilter. That’s not cricket!
Reminds me of the old axiom: conservatism never fails as a public policy, it can only be failed. I think we are at that point now with standards and high-stakes testing and yet we are still preparing to throw billions of dollars at these failed theories during a time of economic woe throughout the nation. Wow.
The 11-year time frame is off for this empirical result. The recession fits pretty well.
I’m afraid I don’t understand your math or your logic, te.
NCLB was signed into law in 2001. That means that the scores of the students from 2008 – 2012 reflect the students that had NCLB and Reading First standards, requirements, and high-stakes testing throughout their school years. These students were tested in 4th, 8th, and 12th grade. Those are the years the growth stopped and no progress has been made since. That’s a pretty huge factor I’d say.
What do you mean when you say “the time frame is off”? That doesn’t make sense at all.
The flattening of the scores is over the last 4 years. Shouldn’t it have shown up for the earlier grades in the drop from 2004 to 2008 scores?
Wait, are you saying that poverty is an excuse?
I have never said that poverty was not an issue in education. I am just pointing to an alternative explanation for the flattening of test scores that should be very plausible to most of the folks that post here.
We’ve had intervening recessions during the data period.
I have to wonder if the mushrooming wealth in the uppermost classes and the de-investment in job creation, research and development here in the U.S. around that time or just a little before correlates. We have heard since Clinton2 that companies are closing, decent careers are disappearing, people who once thought their jobs were secure might have to look at retraining and taking lesser paying jobs…if that is what the market and politicians believe, then why are teachers and public school/public workers taking the blame for the societal aftermath?
OFF TOPIC: Diane. Check today’s (Thursday) 2 the Advocate.com. Our nemesis John White is defending the state giving $1.2 million to Teach for America to hire more TFAs for our worst schools. He calls it “a new approach” even though Louisiana has been infested with them since the 1990s. He and Lottie Beebe got into a big argument apparently. If you can do anything toward influencing this, please do.
“The Latest NAEP Report: Don’t Believe What You Read about It”
Quite correct, Diane, including what you have written, as NAEP suffers the same logical errors and flaws that all educational standards and standardized testing entail, which render any conclusions/results invalid or, as Wilson states, “vain and illusory”. “Vain, Adjective: Having or showing an excessively high opinion of one’s appearance, abilities, or worth; and producing no result; useless: ‘a vain attempt to sleep’. Synonyms: futile – useless – unavailing – idle – conceited – empty” and “Illusory, Adjective: Based on illusion; not real: ‘she knew the safety of her room was illusory’. Synonyms: illusive – delusive – delusory – deceptive – unreal.”
Any way you slice it and dice it NAEP results are “vain and illusory” and are not a good argument to use either in support of or in denigration of public education in America (sic) today.
Diane,
have you read and understood Noel Wilson’s “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700 ?? If so, what are your thoughts about what Wilson has to say? If not, it would behoove you to do so as it is one of the most important educational policy studies in the last 50 years if not THE most important one. All other studies pale in comparison.
Duane
NAEP Scores are real scores which enable comparison, not scores thrown on a bell curve where the passing grade is arbitrarily adjusted and guarantees a fair number of winners and losers.
Reblogged this on laartsedforum and commented:
Perhaps now students can receive a full education, one that includes learning in the arts and world languages; an education that is inclusive of cultural and social learning in addition to academic learning.
My response: http://usedbooksinclass.com/2013/06/30/41-years-later-a-2-increase-in-reading-wait-um-thats-it/
Excerpt from post:
Since NAEP uses the results of standardized tests, and those standardized tests use multiple choice questions, here is my multiple choice question for consideration:
Based on the 2012 NAEP Report results, what difference(s) in reading scores separates a 17-year-old high school student in 1971 from a 17-year-old high school student in 2012?
a. 41 years
b. billions in dollars spent in training, teaching, and testing
c. a 2 % overall difference in growth in reading
d. all of the above
You could act on your most skeptical instincts about the costs and ineffectiveness of standardized testing and make a calculated guess from the title of this blog post or you could skim the 57 page report (replete with charts, graphs, graphics, etc) that does not take long to read, so you could get the information quickly to answer correctly: choice “D”.
Yes, 41 years later, a 17-year old scores only 2% higher than a previous generation that probably contained his or her parents.