The link was left off. It is here.

Valerie Strauss reports on an important new study by a group at Stanford University led by historian Sam Wineburg.

NAEP supporters say that the tests are able to measure skills that other standardized tests can’t: problem solving, critical thinking, etc. But this post takes issue with that notion. It was written by three Stanford University academics who are part of the Stanford History Education Group: Sam Wineburg, Mark Smith and Joel Breakstone.

Wineburg, an education and history professor in the Graduate School of Education, is the founder and executive director of the Stanford History Education Group and of Stanford’s PhD program in history education. His research interests include assessment, civic education and literacy. Smith, a former high school social studies teacher in Iowa, Texas and California, is the group’s director of assessment; his research focuses on K-12 history assessment, particularly on issues of validity and generalizability. And Breakstone, a former high school history teacher in Vermont, directs the Stanford History Education Group. His research focuses on how teachers use assessment data to inform instruction.

The National Assessment of Educational Progress is considered the “gold standard” of education testing because it is the only national longitudinal measure of student achievement, with results going back to 1970: no one can practice for it; no one knows which students will take the test; no single student takes the entire test; and samples of students in every state take portions of the tests.

But when it comes to standardized testing, there is no gold standard. It is all dross, especially now that almost all standardized tests are delivered online. Online testing is popular because it is cheap and supposedly fast. But online testing by its nature leaves no room for demonstrating thoughtfulness, divergent thinking or creative responses. It is the enemy of critical thinking.

Wineburg’s group set out to determine whether NAEP actually tests critical thinking, and found that it does not.

But what would happen [they asked] if instead of grading the kids, we graded the test makers? How? By evaluating the claims they make about what their tests actually measure.

For example, in history, NAEP claims to test not only names and dates, but critical thinking — what it calls “Historical Analysis and Interpretation.” Such questions require students to “explain points of view,” “weigh and judge different views of the past,” and “develop sound generalizations and defend these generalizations with persuasive arguments.” In college, students demonstrate these skills by writing analytical essays in which they have to put facts into context. NAEP, however, claims it can measure such skills using traditional multiple-choice questions.

We wanted to test this claim. We administered a set of Historical Analysis and Interpretation questions from NAEP’s 2010 12th-grade exam to high school students who had passed the Advanced Placement (AP) exam in U.S. History (with a score of 3 or above). We tracked students’ thinking by having them verbalize their thoughts as they solved the questions.

What we learned shocked us.

In a study that appears in the forthcoming American Educational Research Journal, we show that in 108 cases (27 students answering four different items), there was not a single instance in which students’ thinking resembled anything close to “Historical Analysis and Interpretation.” Instead, drawing on canny test-taking strategies, students typically did an end run around historical content to arrive at their answers.

Their analysis is fascinating.

It is past time that we relinquished our obsession with standardized testing.