Sue Legg used to be in charge of assessment for the state of Florida, before the current reign of educational inanity took hold. She writes:
QUANTITY VERSUS QUALITY IN TESTING
Back in the late 90s, psychometricians were searching for ways to counter attacks against multiple choice testing. Opponents argued that essay exams were needed to measure higher-order thinking skills. Too often, multiple choice tests relied on fact-based information or esoteric vocabulary as indirect measures of analytical reasoning. In response, we added essay components to state-wide assessments. Essay scoring, however, had its own problems. Readers are people; they have their own biases about essay scoring. I remember Mark Reckase, then at ACT, saying that it would take the average score of 7 readers to achieve the reliability of a multiple choice test. It was a conundrum. Do we trade the validity of the essay, which could measure critical thinking, for the reliability of a fact-based multiple choice test?
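For anyone wondering where a figure like “7 readers” comes from, psychometricians typically reason with the Spearman-Brown prophecy formula, which predicts the reliability of the average of k parallel ratings. Here is a minimal sketch in Python; the 0.50 single-reader reliability and the 0.85 multiple choice benchmark are illustrative assumptions, not Reckase’s actual figures:

```python
def spearman_brown(single_rater_reliability: float, k: int) -> float:
    """Predicted reliability of the average of k parallel ratings
    (Spearman-Brown prophecy formula)."""
    r = single_rater_reliability
    return k * r / (1 + (k - 1) * r)

# Illustrative assumptions: one essay reader has reliability ~0.50;
# a well-built multiple choice test has reliability ~0.85.
for k in (1, 3, 7):
    print(f"{k} reader(s): {spearman_brown(0.50, k):.2f}")
# 1 reader(s): 0.50
# 3 reader(s): 0.75
# 7 reader(s): 0.88  <- roughly on par with a good multiple choice test
```

Under those assumed numbers, averaging seven readers is about what it takes to match the multiple choice benchmark, which is consistent with the spirit of Reckase’s remark.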
Instead, we looked for ways to combine and improve the two types of questions. At the same time, we worked to capitalize on the possibilities for using computers to improve testing and instruction. I remember the excitement over the possibilities of using computers to adapt multiple choice tests to better measure students’ abilities without spending endless hours in testing laboratories. Essay scoring could also be more efficient if computers could substitute for some, but not all, human readers. Testing could become part of the learning process, as well as less onerous.
Enter the law of unintended consequences. Instead of helping diagnose student learning problems, testing became the face of an accountability-driven political agenda to reform rather than fund educational change. Even though testing has now become extremely expensive, it still costs less than attracting and retaining high-quality teachers and providing the support struggling students need to learn.
We do not have much to show for all the emphasis on accountability-driven reform. Serious efforts to evaluate progress have come up short. Excessive testing does not make better teachers, better schools, or better students. Instead, legislators have created a huge profit-driven testing industry, but even that industry cannot offer the advantages of new, more efficient ways to measure student learning for everyone. States are not willing to limit the frequency of testing to make it meaningful and affordable. Even the infrastructure needed to deliver these online, adaptive exams is limited.
Equally bad, the temptation to recreate the old paper-and-pencil drill-and-practice worksheets on the computer and call it innovative instruction is proving hard for online learning companies and legislators to resist. They are easy to develop and profitable. The question is how we can force reform-minded legislators to recognize that there is a difference between quality and quantity. Online learning and testing have possibilities to improve teaching and learning that are beyond exciting. Unfortunately, they may be beyond the critical thinking and problem solving skills of the current wave of school reformers. One wonders whether legislators are up to this test.

Sue,
These may seem like silly questions, but I hope you’ll pause to think about them for a moment:
Is there such a thing as an all-purpose “problem solving skill”?
Does knowing some metacognitive tricks (e.g. brainstorming) suffice to make one adept at problem solving and other higher order thinking, or is there more involved? If more’s involved, what is that “more”?
Is “analytical reasoning” something that can be taught?
If so, how do you teach it?
And if not, why should schools and teachers care about measuring it?
What underlying model of the mind informed those discussions you had about transitioning away from multiple choice tests? Was there a clearly articulated model, or was the model a bit fuzzy?
Do you know if the psychometricians you describe are well-versed in the cognitive science on learning (e.g. Dan Willingham’s, J.R. Anderson’s, and P.N. Johnson-Laird’s work)?
You seem to assume that teaching “higher-order thinking” is better than teaching facts. But can “higher-order thinking” be taught? If so, how? And does teaching facts have any role in a good education? If so, what is it? Do you see any relationship between fact-learning and higher-order thinking skills, e.g. problem solving ability?
This may actually be a bad sign for you, because I am an idiot, but I think you make a fair amount of sense on a regular basis.
Back in the dark ages (or is that where we are now?) I edited a book on critical thinking and problem solving. I once did research examining the differences in the role of fluid and crystallized intelligence in taking multiple choice and essay tests. Love this stuff!!
Ponderosa,
TAGO!
So many questions and so few, if any, rational, logical, ethical and fundamentally conceptually sound answers. But that shouldn’t necessarily keep us from trying and fighting against the edudeformers and GAGAers.
Ponderosa,
I’m not entirely sure that higher-order/critical thinking can be “taught” the way grammatical rules or mathematical formulas can, but it can be modeled by the teacher and hopefully internalized over time by students.
I teach ESL to recent immigrants, and do a lot of guided reading and guided questioning with my students, making sure to ask questions (depending on the text and circumstances, and in a form suitable for them) such as:
“Is the author relating facts or opinions? How do you know?”
“How is the author supporting their opinion? Is it effective? Why or why not?”
“Are the author’s ideas internally consistent?”
“Are the author’s analogies valid?”
“Is the author arguing about cause-and-effect? If so, are they doing so effectively? Why or why not?”
I also try to find examples of fallacious arguments to give to students, and introduce some basic logical fallacies to them.
Then again, what do I know? I’m neither a psychometrician nor a cognitive scientist, but a teacher, and according to the so-called reformers, an apparent cause of the decline and threatened fall of Western Civilization.
Michael,
Those questions certainly prompt a type of critical thinking about a text. I’m sure there’s some value in that. I guess I worry that such techniques are another version of prematurely asking kids to act like experts. Yes, expert readers ask those types of questions. But they do so AFTER they obtain the ability to comprehend the text. I fear we teachers too often discount this elementary part of reading. We want to take a shortcut to the advanced stuff (Bloom’s Taxonomy, don’t you know?), and try to get our kids there at the expense of giving them the wherewithal to comprehend on their own. And whence comes that wherewithal? In my view, it’s teaching facts, facts, facts. This world knowledge gives you word knowledge, which gives you comprehension ability. Comprehension of text gets you 95% of the way to thinking critically about it. So if we really want to make critical readers, we need to focus our energies on making kids better comprehenders, but the activities in which we have kids ape expert readers subtract from the time we should be using to teach them the world knowledge that will make them better readers. Of course, I might be wrong.
“saying that it would take the average score of 7 readers to achieve the reliability of a multiple choice test”
Interesting point on the level of subjective judgment in grading written essays. Definitely harder to quantify and less precise than measuring length, mass, or volume.
What is the confidence in the precision of the things that we quantify? On a scale of 5, your school has received a 3, with a precision of +/- 2. So critical thinking tells me: if the precision is so poor that the results verge on being meaningless, why waste the resources to try to measure such a thing to begin with?
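TC’s “3, with a precision of +/- 2” maps directly onto the standard error of measurement, SEM = SD × sqrt(1 − reliability); an approximate 95% band around an observed score is about ±1.96 SEM. A rough sketch follows, with an invented standard deviation of 1.2 for a 1-to-5 school grade; the numbers are hypothetical, chosen only to reproduce the example:

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: sd * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def band(sd: float, reliability: float, z: float = 1.96) -> float:
    """Half-width of an approximate 95% band around an observed score."""
    return z * sem(sd, reliability)

# Hypothetical: school grades on a 1-5 scale with SD ~1.2.
for r in (0.3, 0.6, 0.9):
    print(f"reliability {r:.1f}: score +/- {band(1.2, r):.1f}")
# reliability 0.3: score +/- 2.0   <- TC's "3, +/- 2" scenario
# reliability 0.6: score +/- 1.5
# reliability 0.9: score +/- 0.7
```

Read backwards, a ±2 band on a 5-point scale implies a reliability of roughly 0.3 under these assumptions, which rather supports TC’s point about wasted resources.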
TC,
“. . . if the precision is so poor that the results verge on being meaningless, why waste the resources to try to measure such a thing to begin with?”
Quit being so rational and logical.
Although I would have said that “. . . the results are meaningless”.
Reliability, in and of itself, is meaningless unless it is wedded to validity.
My students, recent immigrants all, are compelled to pass the NYS English Regents exam in order to graduate. However, that test is devised for native speakers, so how valid is the score of a student who has been in the country three years or less?
Even more ridiculous, my students are compelled to take the SAT, which comprises three-plus hours of torture for them, plus a diminishment of their self-esteem. Yet, while the multiple-choice items are reliable, how valid is the test itself for this category of student, which makes up a large portion of the public school population, especially in cities?
Michael Fiorillo: you make an excellent point.
Without quibbling over your use of terms—which would be, frankly, insulting to you—let me riff off of your comments by giving two expert opinions about the psychometric meaning of “reliability” in standardized tests.
1), “Reliable scores show little inconsistency from one measurement to the next‚ that is, they contain relatively little measurement error. Reliability is often incorrectly used to mean ‘accurate’ or ‘valid,’ but it properly refers only to the consistency of measurement. A measure can be reliable but inaccurate—such as a scale that consistently reads too high.” (Daniel Koretz, MEASURING UP: WHAT EDUCATIONAL TESTING REALLY TELLS US, 2009, pp. 30-31).
2), “Reliability in a test is a measure of stability.” (Gerald Bracey, READING EDUCATIONAL RESEARCH: HOW TO AVOID GETTING STATISTICALLY SNOOKERED, 2006, p. 145).
IMHO, they are essentially talking about the same thing, expressing themselves slightly differently because of the context in which they are using the term “reliability” as applied to standardized tests.
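Koretz’s biased-scale example is easy to make concrete: a measure can agree with itself almost perfectly and still be systematically wrong. Here is a toy simulation, with all numbers invented for illustration (statistics.correlation requires Python 3.10+):

```python
import random
import statistics

random.seed(0)
true_values = [random.gauss(70, 10) for _ in range(1000)]

def biased_scale(x: float) -> float:
    """Consistent (tiny noise) but inaccurate (reads 5 units high)."""
    return x + 5.0 + random.gauss(0, 0.5)

reading1 = [biased_scale(x) for x in true_values]
reading2 = [biased_scale(x) for x in true_values]

# "Reliability": the two readings agree almost perfectly...
r = statistics.correlation(reading1, reading2)
print(f"test-retest correlation: {r:.3f}")  # ~0.998

# ...yet every reading is off: reliable, but not valid.
bias = statistics.mean(reading1) - statistics.mean(true_values)
print(f"average bias: {bias:+.1f}")  # ~+5.0
```

High consistency with a constant error: exactly the distinction Koretz and Bracey are drawing.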
Referring to your comments, the varying understandings of the term “reliability” can serve the diversionary purposes of the leaders and enablers of the self-styled “education reform” movement.
First, tossing around the word in major media forums gives people a false impression since the vast majority of people are going to assume that “reliability” surely means something like “accurate” and/or “valid.” When pressed, the ball can simply be passed off to some fast-talking “expert” (aka accountabully underling) who will use a superficially dazzling array of jargon and numbers (aka word salad) to intimidate people into accepting an explanation they don’t really understand but find impossible to challenge or refute.
Second, the psychometric use of the term can be an unintended plus for those in favor of a “better education for all.” That is, if we press the issue. Put simply, regardless of how psychometrically reliable a particular test (especially of the high-stakes variety) is, it simply doesn’t answer some very basic questions. Just a few: a), just what is the test measuring?; b), is what the test measuring worth measuring?; c), even if the test is measuring something worth measuring, any idea of how much or how little of that is being measured?; and d), is the act of testing itself distorting and displacing genuine opportunities for teaching and learning?
I realize I am opening up another discussion on “validity” but I don’t want to make this overly long.
Thank you for your comments. Please accept mine as a complement to what you wrote.
😎
“The question is how we can force reform-minded legislators to recognize that there is a difference between quality and quantity. Online learning and testing have possibilities to improve teaching and learning that are beyond exciting. Unfortunately, they may be beyond the critical thinking and problem solving skills of the current wave of school reformers. One wonders whether legislators are up to this test.”
So much to unpack. No, I don’t mean “unpack” in edudeform and GAGA parlance; I mean the truck after a week on the river. But back to the subject at hand before I start unloading it (and by “it” I don’t mean bullshit).
I don’t “wonder whether legislators are up to this test”, whatever “this test” is. Understanding the difference between quality and quantity??? Give ’em a few minutes with me and I can easily teach them the difference. I’d give them a dozen little love slaps across the cheek and then give them a Joan Crawford one (see: http://www.triloquist.net/2011/03/theres-nothing-like-joan-crawford-slap.html). One good slap versus twelve: Quality vs. Quantity 101.
NO, online learning and testing do not have “possibilities to improve teaching and learning that are beyond exciting.” Perhaps the use of technology in the teaching and learning process has “exciting” possibilities, but certainly not in the fashion that Ms. Legg would use it.
To whom does the “they” in “they may be beyond” refer? The legislators, or online learning and testing?
A couple years ago, the guy who invented the cubicle passed away.
He intended his invention to bring a new freedom to the workplace. He envisioned it being used to flexibly create, tear down, and recreate spaces for collaborative work. Instead, we got the cubicle farm.
Same thing here. All this interesting work on testing. And what we got from it was a hammer beating down our kids and teachers, a machine for turning our curricula and pedagogy into test prep pablum.
Well, remember Nobel and dynamite: it was intended to save lives by making mining safer, and we know how things turned out. I am afraid that, with the courts, the oligarchs have a full tool set to beat us down with.