All of U.S. education policy is now firmly hitched to standardized test scores.
Although the President said in his last State of the Union address that teachers should not teach to the test, he surely knows that federal policy demands teaching to the test.
Test scores determine teacher evaluation, teacher salary, teacher tenure, teacher bonuses. Test scores determine whether teachers and principals are fired. Test scores determine whether schools get closed or commended.
Test scores determine whether students are promoted or held back.
Today, the New York Times reported that a professor at the University of Texas has concluded that the standardized tests are not reliable or valid. He says they predict how students will do in the future in relation to how well they have done on the same standardized tests in the past. They do not show what students have learned.
The story begins:
“In 2006, a math pilot program for middle school students in a Dallas-area district returned surprising results.
“The students’ improved grasp of mathematical concepts stunned Walter Stroup, the University of Texas at Austin professor behind the program. But at the end of the year, students’ scores had increased only marginally on state standardized TAKS tests, unlike what Mr. Stroup had seen in the classroom.
“A similar dynamic showed up in a comparison of the students’ scores on midyear benchmark tests and what they received on their end-of-year exams. Standardized test scores the previous year were better predictors of their scores the next year than the benchmark test they had taken a few months earlier.
“Now, in studies that threaten to shake the foundation of high-stakes test-based accountability, Mr. Stroup and two other researchers said they believe they have found the reason: a glitch embedded in the DNA of the state exams that, as a result of a statistical method used to assemble them, suggests they are virtually useless at measuring the effects of classroom instruction.”
Read the whole story and re-read that last line: the tests are “virtually useless at measuring the effects of classroom instruction.”
Think of it: we have a multi-billion dollar industry that sucks resources out of the classroom, whose tests are best at predicting how students will perform on next year’s tests. The tests measure each other. They are designed to do that. They demand teaching to the test.
I don’t know whether the professor’s concerns are right. Others with technical expertise will weigh in. But what was obvious before he spoke out is that these tests are not good enough to carry the weight of determining our social structure, let alone the lives of students and teachers and principals and the fate of their schools.
Let’s hear from the testing experts about this.
My initial comment to the headline was “Duh!”
The rest of the article says it all.
Why isn’t this front page national news? Never mind. I think we all know the answer to that.
I taught 4th grade, where my students were tested for 16 years, before I retired last year. I worked with several different teachers through that process. A few years ago, as the test neared, a teacher, new to testing but with several years of teaching experience, was agonizing over the impending test. I asked her, “Who do you think will pass the test and who do you think will fail?” She rattled off the appropriate list of names. “So why do we need to give the test?” She had a real aha moment.
Why, then, are we wasting our time and the TAXPAYERS’ MONEY on these tests? Shouldn’t that money and time be going to help those we know will fail? But with those who know- the teachers- making the decisions? Could somebody- anybody- please research the cost of writing, printing, administering, grading, and posting the scores for the tests, even in just one state for just one year??????? Imagine what could be done with THAT money!
“. . . has concluded that the standardized tests are not reliable or valid”.
NSS!!
Considering that Wilson proved that in his 1997 dissertation “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577 and reiterated it in “A Little Less than Valid: An Essay Review” found at:
http://www.edrev.info/essays/v10n5index.html that shouldn’t be such headline news. But has anyone read Wilson? Very few. Wilson should be required reading for all educators.
Reblogged this on The Curmudgeonette Speaks Out and commented:
Yet more evidence that education policies are formulated by boneheads. The fact that there is only universal acceptance (and not universal outrage) over these damn tests is, in itself, an outrage.
This is one, unpublished, non-peer reviewed study. Will you accept one similar, unpublished study (albeit with the opposite findings) as the basis for an article entitled, “Are Standardized Tests More Valuable Than Thought?”
No, I’m firmly convinced that Thought is more valuable than Standardized Tests.
More Thought ❢
Less Tests ❢
Hi Jon, That’s a good play on words !
To your point though, there are plenty of great teachers who are both thought-provoking and whose children do fine on tests.
Ed- tell me about these students’ family support.
Hi Nancy, sorry, I’m not sure to which students you are referring?
To Ed:
As a great teacher I knew which students were mastering the concepts because I was constantly assessing them in my classroom.
I don’t need a high stakes test to tell me what I already know.
Non-educators and profiteers don’t understand that and they never will.
The purpose of the tests is to make money and degrade teachers..bottom line.
Since many of the deformers are not teachers, they are trying to make the profession an exact science, which it will never be. Maybe, just maybe, Gates, Bloomberg, Rhee don’t actually know what they are talking about. They may have lots of money and confidence but they do not have the classroom experience we have.
Read World Class Learners by Yong Zhao….the US is going backwards. We will lose our edge in terms of creativity and we will lose our entrepreneurial spirit. The more pressure you put on the high stakes test, the more you narrow the curriculum.
Be informed.
Hi Linda, I was not advocating for or against standardized testing in this post. I was only pointing out that the supporting evidence presented for the headline, “Are Standardized Tests Worthless?” is very weak, and asking whether similarly weak evidence would result in a post extolling the virtues of standardized testing.
Also the headline “Are Standardized Tests Worthless?” is a textbook example of confirmation bias. See here:
http://en.wikipedia.org/wiki/Confirmation_bias
“Experiments have repeatedly found that people tend to test hypotheses in a one-sided way, by searching for evidence consistent with the hypothesis they hold. Rather than searching through all of the relevant evidence, they ask questions that are phrased so that an affirmative answer supports their hypothesis.”
“Are Standardized Tests Worthless?” Textbook example of confirmation bias.
It’s the difference between providing thoughtful information vs. propaganda.
See here: http://en.wikipedia.org/wiki/Propaganda
“As opposed to impartially providing information, propaganda, in its most basic sense, presents information primarily to influence an audience. Propaganda often presents facts selectively…to encourage a particular synthesis, or uses loaded messages to produce an emotional rather than rational response to the information presented. The desired result is a change of the attitude toward the subject in the target audience to further a political agenda.”
I like to know when I’m getting information versus propaganda.
Ed, Almost everything you read about the “reform” agenda is propaganda.
The attacks on teachers.
The attacks on public education.
The spurious claims about charters and vouchers and merit pay.
It is all propaganda, cherry picked data of no value.
Diane
If what you say is correct, I would not spend time reading posts here, following up and reading sources people post, and posting my own thoughts here.
Given my beliefs, this isn’t a place where I expect to find many allies or expect to change minds on most issues.
We do agree on some issues – teachers need a strong and competent union, charters are not living up to the hype, the importance of due process for teachers, etc.
What I do hope to find here is insight as there’s more than enough propaganda coming from both sides on many of these issues.
See above referenced works by Wilson. Read and understand them. Repeat but do not rinse.
How about the report, “The Irreplaceables”, just put out today by TNTP –Rhee’s progeny? Not published in a peer-reviewed journal (Hanushek’s a technical advisor and the Gates & Waltons provided “support”), they contend that “40% of teachers with more than 7 years experience are less effective than the average novice”, in the four (unidentified) school districts they studied, based on student test scores and VAM. http://tntp.org/assets/documents/TNTP_Irreplaceables_2012.pdf
Few people may readily recognize that it’s a house of cards, especially when TNTP encourages people to spread their gospel like this: http://tntp.org/irreplaceables/spread-the-word
See Shanker blog for dissection of this promotion for the Irreplaceable brand-new inexperienced teacher, who is so much better than those awful people with experience.
I realize that four-letter acronyms are irreplaceable in teaching these days, but I couldn’t find anywhere they might have said what theirs meant.
Maybe we should have a backronym contest?
My guess — To Nullify Teaching Professionals
You only need to know two things…started by the Rheeject and funded by the Walton Foundation. The results are always what they want them to be before they begin their “study and investigation”. Survey questions can be worded to get the results you want. Anything Rhee’s name is attached to is a fungus.
Prof W, Can you point me to the “40% of teachers…” content? Which page is it on? Thanks.
Sorry, TNTP is The New Teacher Project, established by Michelle Rhee, after she left Teach for America, which has a Teaching Fellows program that follows the five week training model of TFA. So yes, “To Nullify Teaching Professionals” sounds about right.
The 40% claim is one of their ads here: http://tntp.org/irreplaceables/spread-the-word
AND–read Todd Farley’s 2009 expose Making the Grades: My Misadventures in the Standardized Testing Industry. It’s 242 well-written pages about the foibles of “standardized” (and he says it’s anything but) testing. While so much of the book is hilarious in its absurdity, it will also make you want to cry. When you’re finished, lend it out, and encourage that person to lend it to someone else and so on. Perhaps more people will become outraged, and more parents will opt their children out.
The problem with benchmark tests, and this study’s reliance on these to detect a statistical aberration, is that they aren’t…standardized. This means that their administration isn’t standardized (oftentimes high/low performing schools are less/more likely to administer them consistently) and they can be distorted by interference by instructors. This interference is unmeasured, and will be dumped into the unexplained portion of the model, which may lead to the aberrations that are found in this study.
The bigger issue, which is much less sexy but more important and statistically defensible, is that many of these assessments aren’t designed to detect student growth among high achieving students. I know nothing about the intervention that this gentleman was hawking (also, note that this seems to be stemming from his dissatisfaction with the test’s alleged inability to detect an effect of a program that he implemented, which can prompt all sorts of cognitive biases), nor do I know anything about the types of students who were served (again, I can find no peer-reviewed articles about this work, just a dissertation). But, if the program is serving high-functioning students, and teaching them advanced concepts, then it’s perhaps not surprising that the exam was unable to detect growth among these students. This is a flaw, but it’s certainly a known one, and will probably cost a lot of money to alleviate for a small percentage of students.
I’m also quite suspicious of this work’s provenance. It seems to be mostly supported by a dissertation from three years ago, from which no peer-reviewed articles have been published (that I can find). Also, I think most of the work was done by the faculty member’s graduate student, so it is sort of weird that it is being resurrected and publicized by the dissertation adviser and not, you know, the author of the work.
It’s hard to know what to make of someone who would find the provenance of a PhD thesis “suspicious” because, in its standard use, the word provenance simply refers to “the chronology of the ownership or location of a historical object.” Anyone who has read a thesis would have to know that in conformity with long-established practices such issues are typically addressed in the first few pages. Given this, one can only assume that “provenance” and “suspicion” are invoked in proximity to one another in the previous post for reasons having more to do with an effort to discredit the particular work being discussed. The implication is that somehow the artifact, in this case a PhD thesis by my former advisee Vinh Pham, is not what it purports to be, and thus is worth less than it might be if its provenance was secure.
While one might admire the elegance and subtlety of this form of malice and character assassination directed at both myself and, more importantly, my former student, I would suggest that in general: (1) PhD theses, especially those that emerge from top-rated graduate programs, are routinely cited in nearly any realm of formal inquiry as credible sources of scholarship and (2) that the best way to evaluate the quality and significance of that scholarship is to actually read it.
You might also note the NYT article does in fact refer to myself “and two other researchers” in the second of two introductory paragraphs. Both names, Drs. Vinh Pham and Guadalupe Carmona, were given to the reporter, Morgan Smith. My guess, and I should stress it is only a guess, is that she left them out only for reasons having to do with style.
Having now addressed your concerns about provenance, I would close by simply expressing our sincere hope that you might now settle into actually reading the work you seem so committed to disparaging. A place to start might be Dr. Pham’s Thesis:
VinhHuyPham09Dissertation.pdf
My comment (actually, one paragraph from my comment) was not intended as “character assassination”, and I apologize if that was the tone. Rather it was to express my puzzlement that the only findings linked to in the article, and discussed by you, the primary source, were in fact not written by you, and were from a dissertation from 2009. And, from this dissertation, I can find no peer-reviewed articles, even though, after three years, they should have worked themselves through the review process by now.
Also, in other social sciences with which I’m familiar, it is uncommon for the dissertation adviser to be the primary public advocate, three years after approval, for findings from a sponsored dissertation. If this was just an oversight by Morgan Smith, and she improperly linked to the wrong material or failed to properly attribute the author, then that’s fine. Better yet: is there a draft of the article you are preparing to “submit to multiple journals?” (I assume you are not, in fact, submitting a single article to multiple journals, and that this is just an exaggeration by the author) Can you post this?
I am in no way “disparaging” Pham’s work, unless you consider raising substantive questions about the research design and the interpretation of the findings “disparaging” (recall this comprises 3/4 of my comment). I call this science. Nor am I claiming that dissertations have no scientific value. This being said, typically in the social sciences, dissertations with empirical chapters that have strong scientific merit are submitted for review to a peer-reviewed journal. Dissertations, as I’m sure you know, vary widely in scientific rigor and quality, while top-tier academic journals, though they also vary, introduce some sort of standardization and eliminate committee biases/institutional biases that may allow inferior work to be accepted. Also, standards vary widely between departments, even at a top-tier university like UT Austin.
I do find it rather naive/precious that you expect academics and applied researchers (not to even mention curious laypeople) who are genuinely interested in the provenance of this work and these findings to sift through a 211 page dissertation, particularly since most committee members don’t even have this wherewithal.
All this being said, this would probably be a good opportunity to address the bulk of my comment, which pertains to substantive questions. The anecdote which launches the story isn’t empirically demonstrated in the dissertation, unless I’m missing it. The article is written in such a way as to imply that it is. This is, of course, no fault of yours.
Dr. Stroup, as a PhD in political science from a heavily quantitative department, I completely disagree with your claim that dissertations are routinely cited, even those from top-rated graduate programs. In fact, in the disciplines that I am familiar with, political science and economics, graduate students are heavily discouraged from citing non-peer reviewed dissertations. Until reviewed by the scientific community, specifically methodological and issue experts, these theses are nothing more than longer academic conference papers. This may be standard practice for graduate programs in education, but in the disciplines that are more firmly grounded in scientific and methodological rigor, this is not the case at all.
I have to agree with David. Dissertations are the last work of a student, and they are often the basis of the first work of a researcher. The researcher’s work is peer reviewed, however. Until it is, it’s only a working paper, and typically not cited in my discipline.
I don’t agree. In my work as a historian, I often cite dissertations. There are some that are second-rate, even third-rate. But I have found many excellent dissertations that were absolutely top-notch, opened new avenues of research, and went into the painstaking fact-gathering and analysis that many established researchers find too tedious. Anyone who reads the histories I have written–especially “Left Back” and “The Troubled Crusade”–will see that I found many excellent, valuable dissertations. And it is not true that they are not peer-reviewed. In each case, they were read and reviewed by leading faculty members, who were super-peers.
Ms. Ravitch, I am sure that there are fields where dissertations are more rigorous and appropriate. For fields that rely heavily on data and high-end statistical models to examine their questions, this is the norm. With all due respect, I am not sure what statistical model or level of methodological rigor is involved in assessing “The Troubled Crusade,” but research that involves these types of instruments must be open to review, evaluation, and replication. This is how we separate opinion and suspicion from actual evidence. There are many dissertations in the fields of political science and economics that do, in fact, contribute greatly to our field. But only after they are presented at conferences or submitted to respectable journals where the student’s peers and experts in the field have an opportunity to review their work, examine the models, and replicate the work. Without this level of scrutiny and transparency, our fields would be worthless and our findings as respectable as a grocery store tabloid.
I think the question people are asking here is how we can tell if this was a first rate, second rate, or third rate dissertation. All of those classifications of dissertation are, as you call it, “super-peer” reviewed. The typical process would involve publication of the main findings in a refereed journal.
I should also point out that this discussion is not about citing a dissertation or two among many other peer reviewed pieces of research done on the topic: the dissertation is the only piece of research cited to support the conclusion. I don’t know if you have been a journal editor, but would you accept for publication an article that only cited unpublished research?
No one is questioning the scientific value of dissertations but, rather, how to determine the good ones from the bad ones. John Nash’s dissertation was 28 pages long and broke new ground in game theory. But, because his work was so significant, it was published in the most respected journals in the field. That’s what’s so puzzling about this. Something that is so ostensibly devastating to the standardized testing system, and to accountability systems, and so provocative, hasn’t been vetted by the larger scientific community.
Taking this a bit further, Stroup does elaborate on his claims about “instructional sensitivity” here. Many of the graphics are pulled from Pham’s dissertation. This is from 2009. I, again, want to note the puzzle that appears on page 52:
“And they are in submission or are close to submission, in various forms, for publication.”
PMENA_09v_05.pdf
What’s also odd about the discussion of these findings is that the author actually reports an effect from the intervention, indicating that the assessment is sensitive to instruction. But, apparently, he claims, the intervention’s impact doesn’t match the anecdotal claims of teachers (who claim students were ready to “rock” the test) or the impact detected by the benchmarks. This is hardly surprising. I’ve never met a teacher, bless their hearts, who doesn’t signal that their students will “rock” tests. If this is the source of the puzzle, then no wonder it hasn’t been published.
I thought you would appreciate this video of a teacher telling the story about a time that he actually influenced the testing industry. It’s a great profile of a tenacious teacher faced with the absurdity of the testing “machine.”
http://www.youtube.com/user/CurriculumStudyComm
I’m a reporter for the Dallas Morning News working on a piece about Stroup’s work. I’m looking for people who can help me put it into context. Some of y’all who have posted here appear to have the credentials for that. But haven’t exactly made it easy to contact you…1:-{)>
jweiss@dallasnews.com
Stroup posted above. Have you tried clicking on his name? Here’s his faculty webpage: http://www.edb.utexas.edu/education/faculty/view.php?ID_PK=68D6808A-AF96-BF74-7872307E7C91D7DD