Archives for category: NAEP

Bruce Baker has written an important post about the inability of pundits (and journalists) to read NAEP data.

Part of the misinterpretation is the fault of the National Assessment Governing Board, which supervises NAEP. It has a tight embargo on the scores, which are widely released to reporters. It holds a press conference, where board members and one or two carefully chosen outsiders characterize the scores.

He writes:

“Nothin’ brings out good ol’ American statistical ineptitude like the release of NAEP or PISA data. Even more disturbing is the fact that the short time window between the release of state level NAEP results and city level results for large urban districts permits the same mathematically and statistically inept pundits to reveal their complete lack of short term memory – memory regarding the relevant caveats and critiques of the meaning of NAEP data and NAEP gains in particular, that were addressed extensively only a few weeks back – a few weeks back when pundit after pundit offered wacky interpretations of how recently implemented policy changes affected previously occurring achievement gains on NAEP, and interpretations of how these policies implemented in DC and Tennessee were particularly effective (as evidenced by 2 year gains on NAEP) ignoring that states implementing similar policies did not experience such gains and that states not implementing similar policies in some cases experienced even greater gains after adjusting for starting point.

“Now that we have our NAEP TUDA results, and now that pundits can opine about how DC made greater gains than NYC because it allowed charter schools to grow faster, or teachers to be fired more readily by test scores… let’s take a look at where our big cities fit into the pictures I presented previously regarding NAEP gains and NAEP starting points.

“The first huge caveat here is that any/all of these “gains” aren’t gains at all. They are cohort average score differences which reflect differences in the composition of the cohort as much as anything else. Two year gains are suspect for other reasons, perhaps relating to quirks in sampling, etc. Certainly anyone making a big deal about which districts did or did not show statistically significant differences in mean scale scores from 2011 to 2013, without considering longer term shifts is exhibiting the extremes of Mis-NAEP-ery!”

But if NAGB wanted intelligent reporting of the results, it would release them not just to reporters but to qualified experts in psychometrics and statistics. Because it refuses to do this, NAEP results are reported like a horse race. Scores are up, scores are down. But most journalists never get past the trend lines and cannot find experts who have had time to review the scores and put them into context.

I have a personal beef here because I was given access to the embargoed data when I blogged at Education Week and had 4,000 readers weekly. Now, as an independent blogger with 120,000-150,000 readers weekly, I am not qualified to gain access to the data until after they are released (because I do not work for a journal like Edweek). I don’t claim to be a statistical expert like Bruce Baker, but surely the governing board of NAEP could release the data in advance to a diverse group of a dozen qualified experts to help journalists do a better job when the scores come out.

For more than two decades, we have heard that charter schools will “save” poor kids from “failing public schools.”

Most comparisons show that charter schools and public schools get about the same test scores if they serve the same demographics. When charter schools exclude English learners, students with severe disabilities, and students with behavioral issues, or push out students with low test scores, doing so is likely to boost their test scores artificially.

Nicole Blalock, who holds a Ph.D. and is a postdoctoral scholar at Arizona State University, compared the performance of charter schools and public schools on NAEP 2013.

She acknowledged the problems inherent in comparing the two sectors. Both are diverse, and demographic controls are not available.

Nonetheless, she identified some states where charter performance is better, and some where public school performance is better.

The result, as you might expect: Mixed.

Bottom line: charters are no panacea.

You can view here the results for the NAEP for urban districts, known as TUDA, or Trial Urban District Assessments.

Five districts volunteered to take the NAEP in 2002.

Since then, the number has grown to 21 districts.

Test scores have generally risen, though not in all districts and not at the same rate.

Demographics affect the scores, not surprisingly.

Watch for changes over time in the proportion of high-poverty students.

As a New Yorker, I was very interested in the progress of what was once known as the “New York City miracle.” It disappeared.

On NAEP TUDA 2013, there was no “New York City miracle.” For almost every group and grade, scores have been stagnant since 2007. This year, the only group that saw a gain was white students in eighth grade. Black students and Hispanic students in fourth and eighth grades saw no gains at all. Black and Hispanic scores have been flat since 2005.

Knowing of Mayor Bloomberg’s large public relations staff and his pride in having “transformed” New York City’s public schools, I was curious to see how they would spin these flat results.

Here it is, in the Wall Street Journal:

“NYC Student Test Scores Rise Slower Than Other Cities”

“City Says Its Already High Scores Are Tougher to Improve”

But New York City is not number 1; it is not even number 2.

It is in sixth, or seventh, or eighth place in reading and mathematics, as compared to cities like Charlotte, Austin, Hillsborough County, Boston, and San Diego, yet its officials feel compelled to claim that they are just too darn accomplished to make improvements.



When I spoke in Rhode Island in October, I said that test scores were at their highest point in the past 40 years. I also said that the rate of increase had slowed after the passage of NCLB and Race to the Top. The largest recent gains occurred from 2000-2003, before the implementation of NCLB. Whoever writes the PolitiFact column for the Providence Journal claimed that my statements were “mostly false,” for reasons I did not understand, since I had the graphs from the US Department of Education to back me up.

I wrote a response, which the paper did not print.

Historian-teacher John Thompson corrected PolitiFact as well; his correction is included in the previous link.

The newspaper just issued a correction, admitting its error.

It is good to set the record straight.

The Thomas B. Fordham Institute is a conservative think tank based jointly in DC and Dayton, Ohio. I was a founding board member and served on its board for many years until 2009, when I decided I could no longer support its central focus on school choice and testing. I had tried to resign earlier, but was persuaded by personal friendships to remain as an internal dissident. One of the qualities I admired about TBF was its candor in recognizing the shortcomings of its ideas and projects. In fact, when people ask me why I abandoned the rightwing crusade for choice, I often refer back to the blunt self-criticisms of TBF’s charter schools. I opposed the idea that TBF should become a charter authorizer but was outvoted. Then, over the next few years, my own illusions about charters were dashed as many of the charters we sponsored became failures.

The latest report from TBF, written by Aaron Churchill, continues the tradition of candor.

Churchill reviews the NAEP results for Ohio and acknowledges that traditional public schools significantly outperformed charter schools.

Churchill compares the performance of students eligible for free and reduced price lunch in both sectors and concludes:

“The results from this snapshot in time are not favorable to charter schools. In all four grade-subject combinations, charter school NAEP scores fall short of the non-charter school scores. And in all cases, I would consider the margin fairly wide—more so in 4th than 8th grade. In 4th grade reading, for example, non-charter students’ average score was 211, while charter students’ average score was 191, a 20 point difference.

“The difference, however, narrows in 8th grade. Charter school scores are only 5 points lower in reading and 6 in math. The standard error bars nearly overlap in 8th grade, but not quite—if the standard error bars had overlapped, the difference in scores would not have been meaningful.”
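The overlap check Churchill describes can be sketched in a few lines of Python. This is only an illustration: the 4th grade reading averages (211 and 191) come from his post, but the standard errors below are hypothetical placeholders, since the post does not report them, and the function names are my own. Overlapping error bars are a rough visual heuristic; a more formal comparison would use the standard error of the difference, which the second function approximates on the assumption that the two samples are independent.

```python
import math

def bars_overlap(mean_a, se_a, mean_b, se_b):
    """Return True if the intervals (mean +/- 1 standard error) overlap."""
    low_a, high_a = mean_a - se_a, mean_a + se_a
    low_b, high_b = mean_b - se_b, mean_b + se_b
    return low_a <= high_b and low_b <= high_a

def z_for_difference(mean_a, se_a, mean_b, se_b):
    """Difference in means divided by the standard error of the difference
    (assumes the two samples are independent)."""
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)

# Averages are from the quoted post; the standard errors are HYPOTHETICAL.
non_charter_mean, non_charter_se = 211, 1.5
charter_mean, charter_se = 191, 4.0

print(bars_overlap(non_charter_mean, non_charter_se, charter_mean, charter_se))
print(round(z_for_difference(non_charter_mean, non_charter_se, charter_mean, charter_se), 2))
```

With these placeholder standard errors the bars do not overlap, which matches the "fairly wide" 4th grade margin Churchill reports; real standard errors from the NAEP data would be needed to check the narrower 8th grade gaps.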

It was findings like these that convinced me that the proliferation of charter schools was no panacea; that most charter schools were no better and possibly weaker than traditional public schools; and that an increase in charters–especially in a state like Ohio, where the charter sector is politically powerful and seldom (if ever) held accountable–would harm children and weaken American education.

And one aside about this post: I object to the idea, recently popular, that NAEP “proficient” should be treated as a reasonable goal for most children, and that anything less is disappointing. New York, which is probably not alone, has aligned its Common Core testing to produce results aligned with NAEP “proficient,” so that anything less is considered failing. This is absurd. NAEP “proficient” represents superior achievement, not pass-fail. The only state in the nation that has reached the 50% mark is Massachusetts. Why set impossible and unrealistic goals? Did we learn nothing from the disaster of NCLB?

Some guy who works for StudentsFirst–the organization that promotes vouchers and charters and wants to strip teachers of all due process–wrote a criticism of me on Huffington Post because he doesn’t like the way I interpret NAEP data. This is silly because I served on the NAEP board for seven years and know its strengths and limitations. NAEP was designed to serve an audit function, never to be used for high stakes. Like every other standardized test, NAEP reflects socioeconomic status. The kids with the most advantages score at the top, and those with the fewest advantages cluster at the bottom. NAEP is generally known as “the gold standard” because no one knows who will take it, no student takes the whole test, and no one knows how to prepare for it. NAEP scores may reflect demographic changes or other factors.

Here Mercedes Schneider takes him to task for his misinterpretation of what I wrote.

The release of the 2013 NAEP results set off cheering among advocates of corporate reform because DC and Tennessee showed big gains. But, I pointed out, states following exactly the same formula showed small gains, no gains, or losses.

He missed the point.

Tom Loveless of the Brookings Institution doesn’t like it when politicians play games with education statistics.

In this post, he gives a lesson in the interpretation and misinterpretation of NAEP scores ranking the states.

Editor’s note:  While Diane is on a somewhat reduced blogging schedule, she has invited members of the Education Bloggers Network, a consortium of people who blog about education issues on the national, state or local level to contribute to her blog.  If you are a blogger who supports public education and would like to join the Education Bloggers Network, contact Jonathan Pelto at

This guest blog is written by Paul Thomas

During her tenure as Secretary of Education (2005-2009), Margaret Spellings announced that a jump of 7 points in NAEP reading scores from 1999-2005 was proof No Child Left Behind was working. The problem, however, was in the details:

During President George W. Bush’s tenure, NCLB was a cornerstone of his agenda, and when then-Secretary Spellings announced that test scores were proving NCLB a success, Gerald Bracey and Stephen Krashen exposed one of two possible problems with the data. Spellings either did not understand basic statistics or was misleading for political gain. Krashen detailed the deception or ineptitude by showing that the gain Spellings noted did occur from 1999 to 2005, a change of seven points. But he also revealed that the scores rose as follows: 1999 = 212; 2000 = 213; 2002 = 219; 2003 = 218; 2005 = 219. The jump Spellings used to promote NCLB and Reading First occurred from 2000 to 2002, before the implementation of Reading First. Krashen notes even more problems with claiming success for NCLB and Reading First, including:

“Bracey (2006) also notes that it is very unlikely that many Reading First children were included in the NAEP assessments in 2004 (and even 2005). NAEP is given to nine year olds, but RF is directed at grade three and lower. Many RF programs did not begin until late in 2003; in fact, Bracey notes that the application package for RF was not available until April, 2002.”
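The arithmetic behind Krashen's point can be laid out as a minimal Python sketch. The scores are the ones listed above; the variable names are my own, and the code does nothing more than break the seven-point change into its interval-by-interval pieces.

```python
# NAEP reading scores as listed by Krashen (year -> score).
scores = {1999: 212, 2000: 213, 2002: 219, 2003: 218, 2005: 219}

years = sorted(scores)

# Change in score between each pair of consecutive assessment years.
changes = {(start, end): scores[end] - scores[start]
           for start, end in zip(years, years[1:])}

total_change = scores[2005] - scores[1999]
largest_interval = max(changes, key=changes.get)

for (start, end), delta in changes.items():
    print(f"{start}-{end}: {delta:+d}")
print(f"Total 1999-2005: {total_change:+d}")
print(f"Largest single jump: {largest_interval[0]}-{largest_interval[1]}")
```

Six of the seven points fall in the 2000 to 2002 interval, and the net change from 2002 to 2005 is zero, which is why the timing of Reading First matters so much to the claim.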

With the 2013 release of NAEP data, then, shouldn’t we be skeptical of Duncan’s rush to claim victory for education reform under Obama?:

This year, Tennessee and the District of Columbia, which have both launched high-profile efforts to strengthen education by improving teacher evaluations and by other measures, showed across-the-board growth on the test compared to 2011, likely stoking more debate. Only the Defense Department schools also saw gains in both grade levels and subjects.

In Hawaii, which has also seen a concentrated effort to improve teaching quality, scores also increased with the exception of fourth grade reading. In Iowa and Washington state, scores increased except in 8th-grade math.

Specifically pointing to Tennessee, Hawaii and D.C., Education Secretary Arne Duncan said on a conference call with reporters that many of the changes seen in these states were “very, very difficult and courageous” and appear to have had an impact.

Duncan’s claims, in fact, have prompted The Wall Street Journal to announce “School Reform Delivers”:

Education Secretary Arne Duncan hailed this year’s National Assessment of Educational Progress (i.e., the nation’s report card) results on Thursday as “encouraging.” That’s true only if you look at Washington, D.C., Tennessee and states that have led on teacher accountability and other reforms….

However, a handful of states did post significant gains, and the District of Columbia and Tennessee stand out. Until very recently, Washington, D.C. was an example of public school failure. Then in 2009 former schools chancellor Michelle Rhee implemented more rigorous teacher evaluations that place a heavy emphasis on student learning. The district also tied pay to performance evaluations and eliminated tenure so that ineffective teachers could be fired.

Between 2010 and 2012, about 4% of D.C. teachers—and nearly all of those rated “ineffective”—were dismissed. About 30% of teachers rated “minimally effective” left on their own, likely because they didn’t receive a pay bump and were warned that they could be removed within a year if they failed to shape up.

Clearing out the deadwood appears to have lifted scores.

As I warned on the release date of NAEP, we should anticipate this careless and unsupported eagerness to use NAEP data as evidence of corporate reform success.

Jim Horn has highlighted that NAEP shows a powerful picture of the growing problem with re-segregation and the entrenched reality of racial and socioeconomic achievement gaps—messages ignored by Duncan. At the very least, then, Duncan is cherry-picking.

Gary Rubinstein has also dismantled the DC “miracle,” and G.F. Brandenburg provides a clear chart showing that DC gains are a continuation of a trend pre-Rhee. As Rubinstein concludes:

I’m still pretty confident that in the long run education reform based primarily on putting pressure on teachers and shutting down schools for failing to live up to the PR of charter schools will not be good for kids or for the country, in general.  I hope politicians won’t accept the first ‘gains’ chart without putting it into context with the rest of the data.

With the USDOE at Duncan’s disposal, it seems careless and inexcusable to make unproven claims that policy has caused test score changes when no one has had time to analyze the data in order to make such claims.

Like Spellings, Duncan is showing that he is either unqualified to be Secretary of Education due to a lack of understanding of statistics or that he is willing to place partisan politics above what is best for children and public education. Either way, this is yet another example of failure from the top in the world of education reform and politics.

The indefatigable Bruce Baker is at his best in this post, where he puts the NAEP scores into perspective. As he notes, it is not useful to look at a two-year test score change as a reliable indicator. It is far wiser to look at scores in a longitudinal fashion and, when possible, look at other factors that may affect test scores. Then, too, he notes that the NAEP results do not align well with Michelle Rhee’s scorecard for the states. Some of the states she considered to be tops don’t do well on NAEP, either short term or long term.

Even with Baker’s fine analysis, it makes me uneasy to see this maniacal national and international race to get the highest scores. As long as our policymakers and federal policy continue to ignore the underlying factors of child health and well-being, and the well-being of families and communities, the NAEP scores are like shadows on the wall, interesting but a distraction from the more important factors that create the conditions for a good life, including a respect for and love of learning.

While Arne Duncan and ex-Superintendent Tony Bennett were celebrating Indiana’s gains on the 2013 NAEP, researchers at Indiana University said the gains were no different from the state’s performance in past years on NAEP.

“Relative to the 1-point gains in mathematics and reading for the nation as a whole, the 5- and 4-point gains for Indiana fourth-graders appear impressive,” said Peter Kloosterman, the Martha Lea and Bill Armstrong Chair for Teacher Education and a professor of mathematics education. “However, state samples are relatively small, and thus scores tend to fluctuate more than national scores. In 2000, Indiana was 9 points above the national average in math, but that dropped to 4 points above in 2007 and 2009 before going back to 9. In reading, Indiana has fluctuated from 2 to 5 points above the national average since 2000.”

In addition:

“Regarding the latest Grade 8 results, Kloosterman said gains for Indiana students are comparable to recent years.

“Indiana is now 4 points above the national average in mathematics as compared to 2 points in 2011,” he said. “Since 2000, however, Indiana has been as high as 9 points above and as low as 2 points above. In reading, Indiana eighth-graders are now 1 point above the national average, the same as 2011 and within the window of 1 to 4 points above the national average for Indiana since 2000.”
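Kloosterman's comparison amounts to asking whether Indiana's 2013 gap over the national average falls inside the range the state has already shown since 2000. A rough Python sketch of that check follows, using only the figures quoted above; the dictionary layout and function name are my own, and the "range" is simply the lowest and highest gaps mentioned in the quotes, not a full record of every assessment year.

```python
# Points above the national average, as cited by Kloosterman.
# Each entry: (2013 gap, lowest gap cited since 2000, highest gap cited since 2000)
indiana_gaps = {
    "grade 4 math": (9, 4, 9),
    "grade 8 math": (4, 2, 9),
    "grade 8 reading": (1, 1, 4),
}

def within_past_range(current, low, high):
    """Return True if the current gap lies inside the previously cited range."""
    return low <= current <= high

for label, (current, low, high) in indiana_gaps.items():
    verdict = "within" if within_past_range(current, low, high) else "outside"
    print(f"{label}: {current} points above, {verdict} the {low}-{high} range since 2000")
```

Every 2013 gap sits inside the range already seen since 2000, which is the basis for saying the gains are comparable to past years.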

Although Indiana remains above the national average, it is not in the top tier of U.S. students. “In brief, we see substantial gains in mathematics across the nation with fourth- and eighth-graders in 2013 achieving about two grade levels above their counterparts in 1990,” Kloosterman said. “There have been gains in reading at both levels, but they are much less than a grade level. Indiana is consistently above the national average, but not at the level of the highest-performing states. These trends have held throughout all the state and national education policy changes over this period.”

Kloosterman is available to respond to questions about how to interpret the latest NAEP results. He can be reached at 812-855-9715 or

