A new study by researchers at MIT, Harvard, and Brown casts doubt on the value of pursuing higher scores on standardized tests as an end in itself.
Since this has been the highest goal of federal policy since 2002, when No Child Left Behind was signed into law, the study raises questions about the billions spent on testing, test preparation, evaluating teachers and schools by test scores, firing teachers and principals because of test scores, and closing schools based on test scores.
Are test scores the Golden Fleece? No.
Yet with the release of every NAEP test or every international test, the media go into a frenzy, and Arne Duncan leads a national day of high anxiety and breast-beating about our nation’s imminent peril because test scores did not rise as much as they should have.
The new study raises the question of how much those standardized test scores mean.
The study found:
In a study of nearly 1,400 eighth-graders in the Boston public school system, the researchers found that some schools have successfully raised their students’ scores on the Massachusetts Comprehensive Assessment System (MCAS). However, those schools had almost no effect on students’ performance on tests of fluid intelligence skills, such as working memory capacity, speed of information processing, and ability to solve abstract problems.
The researchers calculated how much of the variation in MCAS scores was due to the school that students attended. For MCAS scores in English, schools accounted for 24 percent of the variation, and they accounted for 34 percent of the math MCAS variation. However, the schools accounted for very little of the variation in fluid cognitive skills — less than 3 percent for all three skills combined.
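As a rough illustration of what “variance due to the school” means here, the share can be estimated with a simple one-way variance decomposition: the between-school sum of squares divided by the total sum of squares. This is only a sketch with entirely invented numbers, not the study’s data or method:

```python
# Illustrative sketch (invented numbers, not the study's data): estimate the
# share of test-score variance attributable to which school a student attends.
import random

random.seed(0)

# Simulate 20 hypothetical schools with 50 students each. Each school has its
# own mean (between-school variation) plus student-level noise (within-school).
schools = []
for _ in range(20):
    school_mean = random.gauss(240, 8)  # between-school spread
    schools.append([random.gauss(school_mean, 15) for _ in range(50)])

all_scores = [score for school in schools for score in school]
grand_mean = sum(all_scores) / len(all_scores)

# Between-school sum of squares over total sum of squares gives the share of
# score variance "explained" by school attended.
ss_between = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in schools)
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
share = ss_between / ss_total

print(f"Share of score variance attributable to schools: {share:.0%}")
```

With the invented spreads above, schools account for roughly a fifth to a quarter of the variance, in the same ballpark as the 24 percent figure the researchers report for English MCAS scores.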
…
Even stronger evidence came from a comparison of about 200 students who had entered a lottery for admittance to a handful of Boston’s oversubscribed charter schools, many of which achieve strong improvement in MCAS scores. The researchers found that students who were randomly selected to attend high-performing charter schools did significantly better on the math MCAS than those who were not chosen, but there was no corresponding increase in fluid intelligence scores.
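The lottery design described in that excerpt is what makes the comparison credible: because winners and losers are chosen at random, a simple difference in group means estimates the schools’ effect. A minimal sketch of that logic, again with entirely invented numbers rather than the study’s data:

```python
# Illustrative sketch of a lottery comparison (invented numbers, not the
# study's data). Random admission means a plain difference in means estimates
# the effect of attending.
import random

random.seed(1)

applicants = list(range(200))
winners = set(random.sample(applicants, 100))  # randomly admitted half

records = []
for student in applicants:
    won = student in winners
    # Assume (for illustration only) that attending boosts MCAS math scores
    # by 10 points but leaves fluid-intelligence scores untouched.
    mcas = random.gauss(240, 15) + (10 if won else 0)
    fluid = random.gauss(100, 15)
    records.append((won, mcas, fluid))

def mean(xs):
    return sum(xs) / len(xs)

mcas_gap = mean([m for w, m, f in records if w]) - \
           mean([m for w, m, f in records if not w])
fluid_gap = mean([f for w, m, f in records if w]) - \
            mean([f for w, m, f in records if not w])

print(f"MCAS gap (winners minus losers):  {mcas_gap:+.1f}")
print(f"Fluid gap (winners minus losers): {fluid_gap:+.1f}")
```

The pattern the study reports looks like this sketch: a clear gap on the achievement test, and a fluid-intelligence gap that hovers around zero.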
U.S. News describes “fluid intelligence”:
Those skills are described as fluid because they require using logical thinking and problem solving in novel situations, rather than recalling previously learned facts and skills.
“It doesn’t seem like you get these skills for free in the way that you might hope, just by doing a lot of studying and being a good student,” said the study’s senior author, John Gabrieli, in a statement.
What improving test scores does do, Gabrieli said, is raise students’ “crystallized intelligence” – the ability to access information from long-term memory to use acquired knowledge and skills.
The importance – or lack thereof – of standardized tests has been widely debated by educators and state policymakers. While some argue that testing is important to track students’ performance and progress, others say there is a culture of over-testing in the United States.
He added that “crystallized intelligence” is important, but should not be the only goal of schooling:
But Gabrieli, a professor of brain and cognitive sciences at MIT, said improving crystallized skills – such as recalling previously learned facts – is still important, and that the findings should also be used to push educational policymakers to add practices that help enhance cognitive abilities as well.
“It’s valuable to push up the crystallized abilities, because if you can do more math, if you can read a paragraph and answer comprehension questions, all those things are positive,” he said in the statement. “Schools can improve crystallized abilities, and now it might be a priority to see if there are some methods for enhancing the fluid ones as well.”
“Fluid intelligence” seems to be the “higher-order thinking skills” that policymakers claim to value. But they are not measured by standardized tests.
“…Arne Duncan leads a national day of high anxiety and breast beating about our nation’s imminent peril because test scores did not rise as much as they should.”
I LOVE this statement. What an image!
Yes. It’s hilarious!!!
You can read preliminary work on this (1,368 students) at the NAEP Governing Board
Cf. Martin West; also look at his résumé so that you will know what direction he is heading. Tom Fiorillo (I think I have the spelling right) has written about this group, which is centered at Harvard’s PEPG. I have been writing critical comments for 2 years now. There is a much better study by Deborah Waber out of Boston Children’s Hospital; she does not start out with a “voucher only” ideology or the ideology of Shumpeter Peterson.
Proven right again! Maybe someone will listen. A test is NOT an indicator of academic achievement. Try this: http://savingstudents-caplee.blogspot.com/2013/12/accountability-with-honor-and-yes-we.html
quote: “Fluid intelligence” seems to be the “higher-order thinking skills” that policymakers claim to value. But they are not measured by standardized tests.”
There is a serious problem in choosing operational definitions to measure these, and there is still a lot of debate over whether the crystallized/fluid division is even legitimate. For information that is readily available to the school psychologist or teacher, look at the Woodcock-Johnson test (McGrew’s work and the IAP website) and then look at the WISC. Toronto is attempting to build a large group test that would partially get at the concepts. Also, look at Elizabeth Wiig’s material on executive function and working memory; she is the one who most closely defines what we need for language development. There are other versions that are still experimental (in the lab but not yet in public schools). I still prefer Deborah Waber’s study at Boston Children’s Hospital, where she examines these factors along with MCAS high-stakes testing.

Fordham/Education Next/PEPG/Harvard etc. are filtering the information that they say comes from Brown or MIT, so I would read with a skeptical eye. They include OCEAN (a 5-factor personality model that has some cross-cultural validity), but they only used one of the 5 factors, and they use the “GRIT” measure but adapted self-rating scales on an experimental basis. Then when they don’t get the results they wanted, they go back and say the measures weren’t very good, or they say the “students lied” (“reference bias,” they call it). If there is Gates funding for the entire study (the MIT part and the Brown author), then I would not buy any of the conclusions.

This is not research the way I was taught to do it. Someone on here wrote (probably Diane) that the research department has become a public relations department for the public schools. In this case, “research” purports to come from “neuroscience” to scare me into thinking that I don’t have a firm basis in the psychology or developmental psychology of school children? That is why I draw a contrast between the Boston Children’s Hospital study (Deborah Waber et al.) and this “MIT” study.
Children’s Hospital has been there with a track record over the 50 years that I worked with Massachusetts schools.
Jean,
It sounds like you know a thing or two about these topics. Trouble is, you write like we lay people know what you are writing about. It would be great if you could break this down for us. Please? I know there is a wealth of substance here, but all I took away was that you disagree with the MIT findings. Findings that made my heart sing.
Thank you,
JonBoy
JonBoy; sorry about that. There are so many complex issues.
I think one thing is to look at the funding source behind the study.
I can offer more information on that. There are some books that are available to the lay person; an example might be “Ungifted” by Scott Barry Kaufman but he has a lot of technical jargon in that also.
JonBoy: you are right it can get pretty “heavy duty” and, there is a lot of experimental …. not ready for public school policy. This article is an example:
“Self estimates of general, crystallized, and fluid intelligences in an ethnically diverse population”
James C. Kaufman
Learning Research Institute, California State University at San Bernardino, United States
——————
For a lay person, the library might have “Ungifted” by Scott Barry Kaufman; it takes some of the ideas and expresses them in language that is not weighed down with as much technical jargon.
“Fluid intelligence” seems to be the “higher-order thinking skills” that policymakers claim to value.
Interesting that these are the skills policy makers say they value because there is so little evidence that they employ these skills themselves.
Yes, indeed. It is very, very amusing that the very skills that they claim to value are the ones that their system does not improve.
Awesome observation Carol!
Thank you for at least acknowledging that there is “still a lot of debate” regarding these operational definitions of intelligence. Ravitch, by contrast, simply ignores the nuances of the very article she cites and frames it with a blog title (“MIT Researchers: Higher Test Scores Do Not Translate into Higher Levels of Thinking”) that completely misrepresents its content in an obvious attempt to force an agenda. You at least appear to have the professional integrity (or dare I say critical thinking skills) to weigh in on these important questions after carefully examining the research and literature.
AR,
A quick Google search yielded these results for the article in question. Based on their titles and content, which are similar to DR’s blog post, I suppose you would have us believe that each is guilty of pushing the same agenda you accuse DR of pushing.
Boston Study: What Higher Standardized Test Scores Don’t Mean- Boston NPR 90.9
Even when test scores go up, some cognitive abilities don’t- MIT News
Standardized Test Score May Not Correlate to Higher IQ- Psychcentral.com
Study: Test-score gains don’t mean cognitive gains- Washington Post
Study: High Standardized Test Scores Don’t Translate to Better Cognition- US News & World Report
Hmm…
As for DR’s agenda, saving America’s schools from the corporate ‘reformers’ who seek to profit at the expense of our children, our nation, and our democracy, there is none more noble nor more beyond reproach.
Have a nice day,
JB
Don’t fret JB, those offended or feigning concern are either on the gravy train and/or linked to the charter chain charade.
So MIT will stop using test scores to determine whom to admit? When our children and I visited MIT, we were told not to bother applying unless they invited the youngsters to apply. And how did they decide whom to invite? Top test scores.
Harvard & Brown also make test scores an important part (only part, but an important part) of admissions policies.
Tests certainly don’t measure everything that’s important. It will be interesting to see if these faculty succeed in getting MIT, Harvard, and Brown to change their admission policies.
MIT, Stanford, et al. will continue to use SATs and ACTs, as these de facto IQ tests give them a starting point for admissions.
Well, they would get in trouble (public opinion, if nothing else) if they used family income as their main admission criterion. Test scores accomplish the same thing without drawing so much criticism.
This study is on eighth graders. It has nothing to do with the SATs, ACTs, and ACH tests. This study discusses standardized testing at the lower grade levels and its ineffectiveness. I bet those who did extremely well on the 4th grade ELA last year still might not score a 2400 on the SATs. The tests are not designed the same way, and the new ELAs are not respected as valid, unlike the SATs. The CCLS have nothing to do with SAT aptitude or increasing it.
It is absurd to relegate a child’s entire education experience to the latest “study”. Our kids are not lab rats. There is a place for crystallized memory (multiplication tables, for instance). On its own, though, this type of learning leaves children with a bunch of memorized facts that they can regurgitate, but without the critical thinking skills necessary to use that information for anything meaningful.
Indeed, but the CCSS are ALMOST ENTIRELY lists of abstract skills.
Reblogged this on David R. Taylor-Thoughts on Texas Education.
Could you please specify three discrete “critical thinking skills” and explain how to teach them to school children.
I thought not.
Who are you directing your question to?
Just looking for some concrete examples.
Does it matter Joe, why don’t you answer?
He can’t answer because there are none.
If you’re confused, let me try to help. I’ll use a much more fundamental example, nothing quite as complex as critical thinking.
I can teach graphing skills using a smart board demonstration followed by guided practice.
I can teach students how to write sentences in which content-specific nouns are used in place of the typical pronouns. I first teach students what pronouns are, using common examples. I then show some sample sentences in which we identify the pronoun(s) and then substitute the content-specific noun or proper noun. The writing skill is reinforced through guided practice.
Your turn. Critical thinking skills. No fair using Wikipedia.
My teaching included asking students to help solve an environmental problem of their choice in the school’s neighborhood. Some created a recycling program for the school.
Some decided to challenge 3 large companies that were smelling up the air. It took 3 years but they succeeded. Along the way they tried a lot of strategies including studying the (weak) laws on odor emission, testifying at the legislature in support of stronger laws, talking with a reporter, going door to door with a petition, etc. They learned how to set a goal, and use various strategies to accomplish it. Then they wrote a booklet about what they learned.
I am out of date in this reply; my colleagues currently teach this field. What I relied on was the University of Wisconsin-Madison and their description of critical, creative, and independent reading. I am 3,000 miles from home so can’t look it up, and I would have to ask my colleague Dr. Giovino in Massachusetts for her current recommendations. If you are really interested in the research, look at Keith Stanovich’s books and articles on reading. When you get to the other subjects (beyond reading), I would have to ask a colleague. You might look at Jack Hassard’s blog “Art of Teaching,” renamed from “The Art of Teaching Science.” He is a professor emeritus in Georgia and also worked in Massachusetts. When it comes to history and civics education, I would ask the Millbury High School teacher in MA for his recommendation, as he is responsible for a state program (brought in when Kennedy was still in the Senate), and all of his work at the high school level is aimed at this effort.
The owner of this blog has demonstrated that she is perfectly capable of defending herself. However, I would rather she not spend her time on trivialities. So, always willing to lend a hand in support of public education, courtesy of your local neighborhood KrazyTA…
A cursory—not even careful—reading of the words of the owner of this blog shows the posting has such phrases as “cast doubt” and “raises the question” and “seems to be.” The blog heading represents the tone of the article[s] she references. Naturally, if every blog posting on every blog on the world wide web met every objection about clarity and completeness, there might not be enough bandwidth to support titles that crowded out content. But in deference to the “Show Me!” state—
“Neuroscientists at the Massachusetts Institute of Technology—working with education researchers at Harvard University and Brown University—have come to the conclusion that higher standardized test scores [whether norm-referenced or criterion-referenced] do not automatically, in all cases, and in every instance translate into higher levels of thinking/fluid intelligence/out-of-the-box creativity/cage busting achievement gap crushing innovative thinking.”
And that’s the short version of a more, er, “accurate” title.
However imperfectly, the title given by the owner of this blog has captured the tone of the article[s] in question. Agree, disagree, think it’s important, think it’s unimportant—
How’s about we get past the title and deal with the content?
Which reminds me of that old adage [ok, I admit, I just made it up]:
“Lord, deliver us from long boring titles so we can just get on with it.”
😎
“What improving test scores does do, Gabrieli said, is raise students’ “crystallized intelligence” – the ability to access information from long-term memory to use acquired knowledge and skills.”
Considering the absence of studies into how long students retain knowledge and skills, I am skeptical of how stable this “crystallized intelligence” is. Perhaps it can be more aptly described as crystallized behavior, supported artificially by school structures and somewhat meaningless outside of the school context.
Fluid intelligence implies knowledge and skills that continue to be useful in new contexts, the only capacity that can help students succeed in their lives, the only metric that should concern us. “Crystallized intelligence” sounds pretty but could well be as ephemeral as snowflakes.
But both are important, because the application of fluid intelligence depends upon declarative knowledge. You can’t understand the “Socrates is mortal” syllogism if you don’t know what “mortal” means. You can’t separate items into classes unless you know what their properties are. Sure, what constitutes essential knowledge changes, and is changing increasingly dramatically, but deep knowledge within SOME domains (which can differ dramatically from person to person) is essential for the application of fluid intelligence at sophisticated levels. For far too long, we’ve had this phoney war in U.S. education between the teach-the-skills and teach-the-facts folks. Skills are procedural acquisitions (innate or implicitly learned) and knowledge of one kind: innate, implicit, or explicit knowledge of how. To be learned, implicitly or explicitly, they have to be practiced in contexts.
Years ago, Roger Schank and his students set out to write a computer program for a robot that would go into a fast-food restaurant and order a burger and fries. The rules—the fluid intelligence—turned out to be the really easy part. The crystallized intelligence turned out to be breathtakingly difficult because there was so very, very much that the robot would have to “know” about the world in order to carry out those rules in the contexts of eating, doing a commercial transaction, interacting socially, and on and on and on.
Here would have been a more appropriate title for the blog if Ravitch were truly interested in addressing its content: “MIT Researchers: Study suggests that test scores measure some, but not all, important thinking skills.” Critical readers/thinkers pursue both the global meaning of a text and its details. That’s why rereading is important, especially when dealing with complex texts. Another important disposition of critical thinkers discussed by those who think a lot about this subject is “open-mindedness.” Good thinkers reconsider and revise their views when they recognize that their thinking has been fallacious or incomplete (see Harvard Project Zero). In short, I encourage Ravitch and KrazyTA to reread the article with an open mind.
You’re preaching about an open mind? Good thinkers? Reflect much?
I personally believe a test is beneficial, as opposed to just a “gentleman’s agreement” that would revert us back to the 50s. Sometimes the Myers-Briggs personality measure is used, but only to offer some indication of who has the persistence. GPA is an especially good predictor. (We need to know which tests have predictive ability, and I claim that the “stuff” they are putting out now has no predictive ability and is being used for the wrong purpose.)
Just wondering…
Would it be safe to say that fluid intelligence depends on crystallized intelligence? Without it, could a person still have higher-order thinking skills?
I would recommend this article from Boston Children’s Hospital.
DEVELOPMENTAL NEUROPSYCHOLOGY, 29(3), 459–477 Copyright © 2006, Lawrence Erlbaum Associates, Inc.
Executive Functions and Performance on High-Stakes Testing in Children From Urban Schools
Deborah P. Waber
Department of Psychiatry Children’s Hospital Boston, MA
Emily B. Gerber
Institute for Prevention Science New York University Child Study Center
Viana Y. Turcios and Erin R. Wagner
Department of Psychiatry Children’s Hospital Boston, MA
Peter W. Forbes
Clinical Research Program Children’s Hospital Boston, MA
High-stakes achievement testing is a centerpiece of education reform. Children from socially disadvantaged backgrounds typically perform more poorly than their more advantaged peers. The authors evaluated 91 fifth-grade children from low-income urban schools using clinical neuropsychological tests and behavioral questionnaires and obtained fourth-grade scores on state-mandated standards-based testing.
—————–
if you are unable to retrieve it I can send you the PDF
jeanhaverhill@aol.com
To me it is a much better study than this thing that “MIT” is pushing (first cited author Amy Finn)
A very important study.
I agree. The skills, knowledge and habits that are most worth measuring are the most difficult to measure, but this is no reason to abandon our efforts to do so. I wish there was more energy spent on deliberately improving standards and assessments and less on conspiracy theories and maudlin pronouncements about the harm being done to “our kids”.
sigh
Exactly what procedures did the Common Core Curriculum Commissariat and Ministry of Truth build into its top-down, totalitarian mandate to ensure specific critique, development of alternatives, and so continual improvement in standards and assessments? None. We are supposed to wait until the Politburo meets again to decide, for the rest of us, what outcomes are to be measured, how what is to be measured will be conceptualized, what the measurement instruments will be, and so on.
” . . . maudlin pronouncements”
If you read carefully, most of us are angry and frustrated, not inappropriately or overly emotional. Most of us back up our professional “pronouncements” with many years of firsthand experience working with kids. Request a visit this April and watch a room full of 12- or 13-year-old special needs students (any one of whom could be your child, if not for the grace of god) enter their 18th hour of standardized testing; tests they can barely read, tests that they know (from minute one) they have absolutely no chance to pass, tests that will ultimately be used to label them as failures. Then you can try to accuse some of us of being a bit too sentimental for your tastes.
Don’t waste too much time on him. He’s always been a hater and very rude to Diane. Supposedly he’s a professor.
Ivory tower type, eh – that explains a lot.
NYS Teacher & Linda: let’s put our “critical thinking” caps on, shall we?
It was not so long ago that the PISA scores came out. On this blog we were treated to some comments that adamantly described the results as mediocre. Not average, mind you, which can refer to something a bit more neutral like “mean” or “median” or “mode.”
IMHO, “mediocre” is a pejorative judgment, “average” [needing further qualification] can be a relatively neutral description. In the same vein, when someone uses the phrase on this blog “maudlin pronouncements” that is a pejorative judgment, without even the pretense of being a neutral description.
It is a way of contemptuously dismissing legitimate and well-founded concerns, frustration and, yes, anger.
But let’s humor the “maudlin pronouncements” crowd for a moment. Hmmmm… over 50 years ago appeared the first edition of Banesh Hoffmann’s THE TYRANNY OF TESTING (currently available as a paperback of the 1964 edition, originally published in 1962, and based on published pieces dating back a number of years). It reads in great part like a contemporary critique of ever-increasing high-stakes standardized testing.
In other words, the standardized testing folks [critical proviso: some of real talent and distinction] have had over fifty years since the first appearance of Hoffmann’s book to fix, tweak and perfect their eduproducts and—gloryosky!—they came up with…
Google “hare” and “pineapple” and “Daniel Pinkwater.”
Ok, ok, I’ll make it easy:
Link: http://blogs.wsj.com/metropolis/2012/04/20/daniel-pinkwater-on-pineapple-exam-nonsense-on-top-of-nonsense/
Link: http://www.pinkwater.com/the-story-behind-the-pineapple-and-the-hare/
I know, I know, I must be overcome with “maudlin sentiments” by offering such help, but hey…
Your local neighborhood KrazyTA is always here to lend a hand.
NYS Teacher & Linda: thank you so much for your comments.
😎
P.S. Throwing good measure after bad [hint: a numbers/stat joke]: “What was once educationally significant, but difficult to measure, has been replaced by what is insignificant and easy to measure. So now we test how well we have taught what we do not value.” — Art Costa, professor emeritus at Cal State-Fullerton [Jim Horn and Denise Wilburn, THE MISMEASURE OF EDUCATION, 2013, p. 1]
I am sure there is a better than 98% “satisfactory” [thank you, Bill Gates!] chance of certainty that you understand my point.
😎
Particularly when there is so much money to be made!
Yes, we should abandon our efforts to measure such constructs!! To understand why, and why the educational standards and standardized tests that form the basis of these supposed measurements are less than valid, i.e., invalid, and not reliable, may I suggest you read Noel Wilson’s “Educational Standards and the Problem of Error,” found at:
http://epaa.asu.edu/ojs/article/view/577/700
The sheer absurdity of relying on the concept of “measuring the immeasurable” as some sort of educational panacea has always baffled me. From the study:
“It requires an enormous suspension of rational thinking to believe that the best way to describe the complexity of any human achievement, any person’s skill in a complex field of human endeavour, is with a number that is determined by the number of test items they got correct. Yet so conditioned are we that it takes a few moments of strict logical reflection to appreciate the absurdity of this.”
And, NO, the pronouncements about the harm being done to “our kids” are not “maudlin”. The harms caused by these educational malpractices range far and wide, not only to the kids but now to adults (teachers), the schools, and communities. These supposed “measuring devices” are so inaccurate and invalid that they are laughable. How anyone believes in this bullshit (and that’s being nice) is beyond me.
Senor Swacker
Most people believe exactly what they want to believe. The truth eludes the intellectually lazy and the zealots.
Duane
Calling it BS is an insult to bulls everywhere
Sort of like believing a LOR written by a relative = CONFLICT OF INTEREST unless politics are involved.
On concepts related to intelligence: Kevin McGrew has done some of the best research. In Canada, I believe they are doing solid research in developing group tests, but these are still experimental. This is a reference that I value in Canada.
Reference Citation
• To cite the Tests of Cognitive Abilities, use:
Beal, A. Lynne (2011). Insight Test of Cognitive Abilities. Markham, ON: Canadian Test Centre.
• To cite this manual, use:
Beal, A. Lynne (2011). Examiners Manual. Insight Test of Cognitive Abilities. Markham, ON: Canadian Test Centre.
Published by CTC/Canadian Test Centre, Markham, Ontario. Copyright © 2011. Printed in Canada.
———————————————–
What the “MIT” study does (I hate to call it that; call it the Finn study, because Amy Finn is the first cited author) is build a patchwork quilt of traits/abilities by grabbing from the psychological literature. College textbooks would explain the OCEAN five-factor theory, which seems to be solid and holds up across cultures; when the Finn study wanted to look at non-cognitive factors, they “grabbed” only one of the 5 and rewrote questionnaires designed for student self-reports. When they wanted to measure the non-cognitive “trait”/skill called “GRIT,” they used Duckworth’s material. These are concepts that are “fuzzy” and difficult to measure, and there are no agreed operational definitions for the “fuzzy” concepts. We have a long tradition of measuring academic self-concept that has some solid research behind it; instead, the way to get your name in print is to grab something “sexy” and “experimental,” to be the shiniest penny in the jar. And I question the motives behind the people prompting this work if it is Education Next/Harvard/PEPG who gathered this team of “researchers” together.
That is why I am pointing out what I believe to be credible work like the Canada Test of Cognitive Abilities. There is an elitist, ego-driven push to invent something to put your name on, and that is vanity. It does not always generate sound policy decisions, because it is experimental, designed for small populations in a lab or what have you. A patchwork quilt of Boston school students and these “fuzzy” concepts with poorly chosen operational definitions (tests) is not the best way to do research that contributes to the knowledge base.
When the newest version of the Finn study becomes available I might see a different picture; I am basing my comments on what is available to me today, and that is M. West’s study of 1,368 Boston pupils (which he doesn’t want cited, but which he sent along to the President of the NAEP Governing Board to set policy and build new “creations” that are experimental, costly, and have no predictive validity). So don’t quote the study, but put it in the top-down filter at the Governing Board and then order teachers and principals that they must do it. That is another issue I keep writing the Governor of Massachusetts about.
Reblogged this on Pilant's Business Ethics Blog and commented:
Quite right. Test scores are a clumsy method of student evaluation.