I usually agree with Matt Di Carlo. He is one smart guy.
But not always.
That’s okay. Friends can disagree and still be friends (I proved that by blogging with Deborah Meier for five years).
I think that value-added methods of using test scores to rate teacher quality are “junk science.”
Now, granted, I am but a historian, not a social scientist, but I do read lots of social science. I noted that the National Academy of Education and AERA held a briefing on Capitol Hill and issued a joint statement warning about the pitfalls of VAM. Here is a salient point from their report: “With respect to value-added measures of student achievement tied to individual teachers, current research suggests that high-stakes, individual-level decisions, or comparisons across highly dissimilar schools or student populations, should be avoided.”
Edmund Gordon, one of our nation’s most eminent psychologists, recently led a commission to study assessment practices. He concluded that the overemphasis and misuse of standardized testing to hold students, teachers, and schools accountable is not only ineffective but “immoral.”
I would say that “immoral” is an even stronger condemnation than “junk science.”
Campbell’s Law suggests that the use of high-stakes testing degrades education. Threatening to fire teachers if their students’ scores don’t go up does not produce better education. It produces worse education. It promotes narrowing of the curriculum. It promotes cheating. It encourages teachers and schools to avoid the neediest students.
Teaching is so much more than test scores. To think that teachers may be defined significantly by the rise or fall of the test scores of their students requires a belief in the intrinsic value of standardized tests that I do not share. We may learn something from wide assessments with no-stakes, like NAEP. But using these flawed instruments to fire teachers and close schools is–in my judgment–wrong. They were not designed for those purposes. And the first rule of psychometrics is that tests should be used only for the purpose for which they were designed.
All things considered, the term “junk science” seems appropriate, as does Dr. Gordon’s phrase: ineffective and immoral.
One reason parents flee public schools, if they can afford it, is to escape the dead hand of testing that now strangles learning. The sooner we can put testing in its place as a diagnostic tool for teachers to assess what they have taught (not as a Pearson-designed 14-hour ordeal), the sooner we will restore the rightful purposes of education and the dignity of the teaching profession.
You may be interested in the report by some very reputable evaluation researchers, found at b9667271ee6c154195_t9m6iij8k.pdf
“A study designed to test this question used VAM methods to assign effects to teachers after controlling for other factors, but applied the model backwards to see if credible results were obtained. Surprisingly, it found that students’ fifth grade teachers were good predictors of their fourth grade test scores. Inasmuch as a student’s later fifth grade teacher cannot possibly have influenced that student’s fourth grade performance, this curious result can only mean that VAM results are based on factors other than teachers’ actual effectiveness.”
Teachers grade children using their own VAM exams all the time. I know that Duane would be happy to say the results are meaningless, but I don’t believe many teachers would say that the results of the exams they use to grade students are junk science.
Teacher made test/ corporate made test: apples and oranges, my friend.
I think that point has been made here many times.
While I agree that any/all tests have issues, I am able to address them much better if I made the test and went over the answers with my students following the exam. This is also why (at the HS level) the entire course grade is not based on tests; we use many types of assessments to get a more balanced view of what the student can do.
Also, are there any teachers out there who use a VAM model in the classroom?
Ang, are you saying that teachers are better able to psychometrically construct assessments? Also, while “corporations” do sell assessments, they typically employ folks – including educators and researchers – who construct those assessments. Finally, are you saying that if teachers constructed state tests that they would be valid?
Eded,
Yes, teachers are better qualified to assess their students than tests constructed by psychometricians. Teachers know what the children were taught. The psychometricians don’t. Teachers can get immediate feedback about whether their students learned what they were taught.
That is what happens at our best private schools. That is the way Finland does it.
But if assessment is “junk science” isn’t teacher designed assessment simply pinning down the exact time and place of birth in order to cast a “more accurate” horoscope?
TE, it is obvious you don’t know what VAM is. Don’t embarrass yourself.
I am making the same distinction that Di Carlo makes between how a test is employed and the validity of the test methodology.
Do you believe that standardized tests are too imprecise to tease out the impact of teaching on students’ performance, or do you believe standardized tests tell us nothing about students? Given your posts about the impact of poverty on standardized test scores, I had thought your position was the former until you disagreed with the Di Carlo post.
Diane, there is a difference between assessment and test construction. There is no doubt that educators who are immediately familiar with children have a better sense of which measures to give or how to interpret those assessments. However, constructing many of those assessments often requires a level of technical expertise. For example, CBM (e.g., DIBELS, AIMSweb) – the probes are designed by folks outside the classroom, but administered and interpreted inside the classroom.
That being said, teachers can and do design very effective assessments in terms of informal assessments gauging mastery of content throughout the year, such as quizzes.
My main point, though, is that the mere fact that researchers/psychometricians designed an assessment does not render it less useful, because it’s the application and interpretation of that assessment that is ultimately what will be used to measure learning.
Eded, when they use those standardized tests to hold students and teachers accountable at Exeter, Dalton, Lakeside Academy, Harpeth Hall, and Kincaid, let me know.
The rich get teachers, everyone else gets testing.
TE @ 10:38,
The claim is not that “assessment is ‘junk science'” it’s that VAM based on standardized testing is “junk science”. Teachers assessing students is quite different from students taking standardized tests used as an evaluation of the teacher (which is completely UNETHICAL-that is using any standardized test score for anything other than it is designed has always been considered UNETHICAL.)
Duane
Diane, definitely not advocating VAM at any school.
Ang: for the benefit of those who need an English-to-English translation it can never be repeated enough times: “Teacher made test/ corporate made test: apples and oranges, my friend.”
I applaud your politeness and restraint when trying to discuss high-stakes standardized testing and VAManiacal assaults on teaching staff. In my discussions with people who aren’t in education, they almost invariably latch onto the words “test” and “teacher evaluation” and at first berate me for [they assume] being against ‘assessing’ students and holding teaching staff ‘responsible’ for doing their jobs. I must be an ignorant hater! An upholder of failure! Oh, the shame of bigots who have low expectations!
Oh my… 😦
Once we get past the terminological barriers erected by the edubullies, we have much more productive and meaningful discussions.
dianerav: exactly so; the very purveyors of high-stakes standardized tests and VAManiacal schemes send their own children to schools that are the exact opposite of what they are promoting, defending and mandating for OTHER PEOPLE’S CHILDREN.
A little historical context: William J. Reese TESTING WARS IN THE PUBLIC SCHOOLS: A FORGOTTEN HISTORY (2013) and Nicholas Lemann THE BIG TEST: THE SECRET HISTORY OF THE AMERICAN MERITOCRACY (2000) remove any doubt that money, ego, power, hubris and the expectation of gain always play a great role in the use of numbers and stats to ‘prove’ any case.
However, not all sides in the ed debates [in earlier times or now] had or have an equal stake in the ethical and honest use of numbers and stats. To paraphrase KrazyMathLady, not all the numbers/stats persons “use their powers for good.”
So when it comes to high-stakes standardized tests and VAM, I see Arne/Rhee & Co. a 13th percentile and raise them to a 90th. Rheeally!
But for the sake of the kids, let’s try to keep it real…
🙂
Do the children of well-off families take no standardized tests? No SAT, SAT II, ACT, AP, or IB exams?
TE: Those exams are in high school and not before. Do you think that sticking students in front of computers or bubble sheets for days at a time and then evaluating teachers on those scores is a good idea? What about when they’re in kindergarten? Or first grade?
Are standardized exams in high school acceptable? I had not seen that distinction made here before.
The concern here was that the “one percent” were subjecting other people’s children to standardized exams that the children of the “one percent” do not have to take.
I listed a set of standardized exams that the children of the “one percent” do take. There should be no objection to these standardized, corporate created exams because the “one percent” students take them just like other kids, right?
You are correct, TE, but not only do I say that the results are meaningless (as, logically speaking, any conclusions drawn from false/invalid premises by definition have to be “vain and illusory”), but also the concept of “grading” students is an abhorrent educational (mal)practice, as students internalize what the powers that be say about them.
The vast majority of those viewing these words have benefitted from the “upper” (ha-ha) end of these “grading” scales (myself included) and therefore view them as normal and natural. But those students who are constantly referred to as “failing” quite naturally would view grades and grading and education quite differently. No, the “F” word in education is not fuckohImeanfudge but FAIL. Harm is perpetuated on many students in these repugnant unjust malpractices that are grades, standards and standardized testing.
Duane
Duane, I totally agree. I tutor my younger child in math; and at some point early on in elementary school she got the message that she “stinks at math” (probably more because things were ‘explained’ in a way she didn’t get or she wasn’t *ready* for the material as she is a kid who really tended to go at her own pace–stagnate then zoom then stagnate–in school). So, now her first response to *every* problem in math is “I stink at math” before she has even spent any time thinking about whether she can figure things out. She has built a huge wall, and when I have broken through, she has been *extremely* successful; but the next time, there is the wall again.
To all those people who say, “self esteem, shelf of steam–it’s all a bunch of bulloney; life is not fair; kids need to learn that” I say, “you don’t know what you’re talking about; kids are not mini-adults; the whole situation becomes extremely counter-productive–in the true sense of the word–if it is damaged.”
Ann,
You noting that your daughter has to put a lot into learning something but then suddenly “breaking through” and understanding is a description of how learning usually occurs. Whether it’s sports, music, math, Spanish, whatever, I’ve found that one can go along at a certain level, seemingly not making any progress, and then suddenly “one jumps up a level”. Then one continues on that level for a while, struggling at that level, and then with hard work, perseverance, repetition, among many other strategies, one does the “quantum” leap again. Many students and people in general seem to get stuck in learning on a level and never seem to escape the “gravitational pull” of that level. Those who break free, well, have greater understanding and skill levels than those that don’t.
Duane
TE:
“if assessment is ‘junk science’”
Perhaps you are confusing VAM (and large scale, corporate made, never released to the public, high stakes tests in general and especially those used to measure the teacher as opposed to the test taker) with any old classroom assessment?
While I agree that every assessment has its issues, no assessment is perfect for every child, yada, yada, yada,…
Teacher made tests, tailored to the instruction given, that the child and the parent can see and that the teacher can go over with the students after the test, etc. are better for teaching and learning, IMHO.
Grade students using VAM, ha ha. Have all the teachers fill out a 55 question tripod test about each student and see what those look like. I don’t know what tripod stands for, but basically it’s just some thinly veiled random opinion generator. Have a team of so called experts observe a student for 15 minutes a couple of times a year and base their grades on that.
I’m sure that would go over REAL well with parents. It would instantly be seen as the junk science and buffoonery that it is.
I’m in, let’s do this thing!
😉
Finding dignity, morality, and courage requires action. Now is the time to find all three by standing shoulder to shoulder in a line in front of the children to shield them from the abuse of inappropriate or over applied testing. Every single day the kids are at the mercy of a system gone mad. If the same people/professionals who found their voice for themselves when salaries and jobs became clearly threatened, then to do the same for the students is a responsibility. We just might see a correction to this deliberately misguided destruction of schools and teaching. Do No Harm!!!!!
Is this about how to teach all children or some children? Is this about children and teaching or something more insidious? In the minds of the true architects of this nightmare is this about a war of a different design, complexion, and purpose? Is this callous abandonment of the masses for the privileged few wrapped around the information highway? Should the questions continue? Do we act or just keep asking questions and second guessing how we got here? The people who have been duped in this masquerade of deception and deceit should be angry enough by now to collectively find a voice and a will.
This is not a new plan or purpose. It is all in the books and articles/papers written by Douglas D. Noble. Nobody can say they have not been warned because this man risked his life to tell us. The Classroom Arsenal: Military Research, Information Technology and Public Education (London: Falmer Press), The Truth About The Information Highway, so forth and so on. Sometimes a voice from the wilderness can be heard, but only if you listen. This blog has served to enlighten but it can only bring you to the source and resources of understanding, it can not make you act on your own need to intellectually find the truth. Read the above author and come to understand…….
“If the same people/professionals who found their voice for themselves when salaries and jobs became clearly threatened, then to do the same for the students is a responsibility. We just might see a correction to this deliberately misguided destruction of schools and teaching. Do No Harm!!!!!”
Excellent comment Ronee. And thanks for the author reference. Too many for too long have tolerated these educational malpractices and should look intently into the mirror and resolve themselves to resist and cease these malpractices.
“Sometimes a voice from the wilderness can be heard, but only if you listen. This blog has served to enlighten but it can only bring you to the source and resources of understanding, it can not make you act on your own need to intellectually find the truth.” Again, an excellent insight, Ronee!!
Hi Ronee,
I was not familiar with Noble.
Very interesting.
Some may enjoy this article (although a few years old, still applicable, I think.)
http://www.ascd.org/publications/educational-leadership/nov96/vol54/num03/Mad-Rushes-Into-the-Future@-The-Overselling-of-Educational-Technology.aspx
My response on Matt’s blog:
Well considering that the concepts of educational standards and standardized testing are so fraught with error as shown by Noel Wilson that it renders them invalid, and once invalidity is proven the reliability goes out the window, any talk of VAM having any value whatsoever is “vain and illusory” (to quote Wilson).
I invite Matt and Sherman to refute/disprove what Wilson says in his “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700 or his essay review of the testing bible “A Little Less than Valid: An Essay Review” found at: http://www.edrev.info/essays/v10n5.pdf
Thanks,
Duane
Thanks! Very helpful.
Duane, I’ve said this before, but you might consider actually posting some arguments against assessments, rather than continuing to cite external works. Citations can be helpful, but I’m seeing that you’re taking Wilson’s work as an assumption, and I’ve yet to see you demonstrate what you mean by that. It may be helpful to readers to have something in your own words.
Eded,
I am working on that, slowly (very) but surely. The reason I keep referencing Wilson is that I expect everyone that reads this blog to be able to read original works and understand them. The problem is that most see the length of the “Educational Standards and the Problem of Error” and for some reason don’t read it (and it’s a serious problem if educators aren’t willing to read education-related articles/research/opinion, etc. . . ). That is why I post a link for a shorter version on the invalidities involved, “A Little Less than Valid: An Essay Review”. Yes, I expect those of you who believe in educational standards and standardized testing to be able to refute Wilson’s work. If not then it stands and your position is wrong.
I read the vast majority of the links people provide, especially those that deal with defending the status quo (not the band, Joanna B) of educational standards and standardized testing to see if anything new has come out and/or if there is a repudiation of Wilson’s work. Haven’t found any of the latter ever and no one has ever shown me one even though I’ve asked hundreds of times. I would think that those who support “X” position would be willing to defend said position when attacked; no one ever has with Wilson’s work. When the powers that be (those who support the illogical and unethical use of standards and standardized testing in this case) cannot defend their positions, they do their best to ignore and hope the “thorn in the side” will go away. Well, I’m trying not to allow Wilson’s work to be confined to the dust bin of history.
I didn’t get a chance to go through my home email yesterday, but I will resend what I sent a week or two ago as a start to my explaining it to you.
Duane
No rush Duane – just wanted to stay on top of things. I also think it may be helpful for advancing your ideas. The documents you link to are dense, and it’s unlikely many folks are going to be willing or have the background to fully digest them. Simply dropping the name of some guy who thinks the entirety of psychometrics is wrong is probably not going to get you very far. Giving people a bit of reasoning why folks should click on your link might be helpful.
Also, I don’t think we should expect the average teacher to have the educational background to digest and defend psychometrics as a field. Education is, by necessity, interdisciplinary. A school counselor (unless having taught) probably couldn’t fully understand teaching, and vice versa. Here’s another example – teachers use materials every day in their classroom that they can’t fully replicate or explain. A teacher could likely not independently explain textbook design, how reading levels are psychometrically designed, or how their SmartBoard works. This doesn’t mean they don’t know how to integrate those elements into their teaching, but they shouldn’t be expected to give you a detailed description of the technology behind SmartBoard. The same is true with assessments. Teachers should be extremely knowledgeable about how to administer and interpret assessments, and even be knowledgeable about basic concepts that enable them to assess the reliability and validity of an assessment. However, we can’t reasonably expect teachers to provide an in-depth theoretical defense of the constructs of reliability and validity.
That being said, I’m not saying that Wilson should go unanswered. I do think the work deserves a response, but probably more likely from a university professor with the appropriate educational background. As such, Wilson’s work doesn’t “stand as is” because no one has provided a response. If I provided you with an internet link to someone who thought the universe was 27 billion years old rather than 13, that wouldn’t stand simply because I can’t refute it. The burden of proof in empirical/theoretical debates generally rests with the lesser tested of the two. As such, you dropping Wilson’s name and some links doesn’t count as evidence that psychometrics are invalid. You can claim that as your viewpoint, but to say it stands unrefuted and therefore true isn’t valid.
It’s no wonder that so-called education reformers, who infuse their absurdities and attacks with Bizspeak – children are “assets,” teachers are “talent” to be “managed,” and school systems are “portfolios” – would seek to base teacher evaluations on a model premised on children being “products whose value is enhanced before sale to a customer.”
If the wealthy know-nothings who are trying to smash and grab public education were not given credence they have not earned, the values embedded in this worldview should suffice to have it driven from the schools with derisive laughter and contempt.
No doubt, Michael, no doubt!
I still think it is more about the economy than about teachers or education. People resent teachers right now because of pensions. Historically a state has to offer an attractive pension to attract teachers who will commit to a career of being a teacher! And historically teachers sign up because they enjoy children and youth and they like learning and they take it for the team, and the team promises to take care of them. So those who have nobody to “take care of them” in the long run are resentful and they are lashing out. Suddenly choosing teachers who have a different definition of “taking it for the team” are being bred. The question is: who is the team? Do I still work for the state of NC or do I work for an organization(s) whose contracts trump what is best for the state? Are these organizations the equivalent of textbook companies a few decades ago, or is it a far bigger situation?
I did appreciate this website sort of clarifying the process chronologically for getting us where we are now:
http://www.rightsidenews.com/2013050332484/life-and-science/health-and-education/common-core-the-state-led-myth.htm
but because I also see them finding problems with Obamacare, I come back to being confused about whether what is going on in education is inevitable because those I know in the business end of medicine say Obamacare would have been necessary even under Romney because of the rate baby boomers are turning 65. So are the projections of teacher and school employee pensions down the road scary enough to warrant all the shake-up? I have no idea. But I wish I did.
What I do know is that this morning I presented a song I enjoy singing with elementary kids in the spring. I lead into it with a literacy/math/science component by reading “Inch by Inch”, a Caldecott book by Leo Lionni, and then we sing the song “Inch by Inch” (formally called the “Garden Song” by David Mallett and recorded by Pete Seeger). The words are: “Inch by Inch, row by row, gonna make this garden grow. All it takes is a rake and hoe. . .” And no matter the population or the age group, some child is always alarmed by the word “hoe,” because they don’t know what a hoe is. They think of rap music and street lexicon (though they probably don’t know what it means) and think “ho.” So without going into what they might be thinking, I always explain what a “hoe” is. So to me, that is more indicative of where our society is in mindset than what children demonstrate on a test. Not that I would ever want censorship of songs of any style (such as those the children might be hearing in their parents’ cars).
What does it all mean? How does what I see in the classroom from six year olds relate to what is going on in education reform? Does it? Somehow it must.
Joanna,
“What does it all mean? How does what I see in the classroom from six year olds relate to what is going on in education reform? Does it? Somehow it must.”
Yes, it does relate. It’s just that too few have developed the ability to “see/feel/intuit” or even attempt to ascertain the connections. And your connecting songs/lyrics to the discussions serves to broaden the realm of the discussion, thanks!!
The issue, Diane, isn’t with your conclusion that VAM shouldn’t be used, but with how you come to that conclusion. Essentially, you disagree with VAM exactly BECAUSE of the science you are calling junk: you are looking at reliability and validity of the assessments used and have determined that such assessments can’t be used to fully describe a teacher’s contribution to a classroom, and aren’t solely influenced by the teacher. I agree with that, you agree with that, and my sense is that Matt does as well. However, the science used to determine the nature of that statement is NOT junk, but is the very premise on which you based your argument. In other words, calling it “junk science” takes away your very foundation for leveling a complaint against VAM.
No matter how credible a methodology or instrument is when applied correctly, if it is bastardized to become a racist, exploitative, and unfair fiction used to aim, fire, and destroy teachers and learners, then it rapidly becomes junk science.
Di Carlo is probably just being a scientist while writing the post, as far as I can tell. But the main point of his post (that the implementation is the actual culprit) will probably be omitted when his post is quoted by those supporting the usage of VAM in education.
I don’t want to digress to another sensitive topic, but this is the best analogy I can think of: Di Carlo’s argument is very much the same as “guns don’t kill, people kill.”
On the other hand, it is irresponsible, to say the least, to isolate the method from its implementation.
The VAM pilot study in La shows results that are capricious and therefore worthless.
Both researchers who piloted the study have quit LDOE.
VAM is junk:
A couple of points from the article: “Furthermore, what is the case against calling classroom observations “junk science” too? Even when done properly — by well-trained observers observing multiple times throughout the year, observation scores also fluctuate over time and between raters, and they are subject to systematic bias (e.g., poorly-trained or vindictive principals).” I thoroughly agree with the statement, except that these classroom observations have never been considered “scientific” that I know of. So, Matt, you’re raising a strawman argument here that really doesn’t add to the conversation.
And: “You might believe that human judgment is a better way to assess performance than analyzing large-scale test score datasets, and you might be correct, but that’s just an opinion, and it hardly means that all alternative measures are “junk” no matter their policy deployment.” Yes, “alternative measures” are junk because to attempt to “measure” an aesthetic human activity, as is the teaching and learning process, is not logically valid, as quantity (which is what “measure” implies) is a subset of the logical category of “quality” or “aesthetics.” It is logically impossible to use a subset for assessment of the whole set. That fact is certainly not rocket science. And, Matt, by insisting that that fact is an opinion, you (the pot) have just called the kettle black due to the illogical nature of your beliefs in the value of “measuring” human aesthetics.
Eded,
Here is a part of my response to your email. I hope you don’t mind my putting it out here.
For me the basis of the discussion of Wilson starts with a little bit of logical thinking. And that “little bit” is the logical fact that a “quality” cannot be “quantified” as quantity is a subset of the category quality and it is logically impossible for a subset of a given entity to define the whole category. The teaching and learning process (and it’s the process that needs to be kept in focus not the so called “product”) falls in the logical category of a quality interaction between humans. Any conclusions drawn from logically impossible comparisons/assessments therefore must be invalid (other than by sheer chance where, like the blind squirrel finding an occasional acorn, a conclusion might be correct).
A very quick summary of his dissertation:
Wilson identifies four frames of reference that teachers “come from” when assessing a student. Each one has its own epistemological basis and the “mixing” of these ways of assessing can cause errors in the process and conclusions drawn. Each frame has its own suppositions and methods that are not necessarily compatible.
He discusses what a “standard” is and how it is misused in the educational realm.
He shows how the only conclusion drawn from an assessment situation is that it is an assessment of an event-the event being the interaction of the student and the test (whether paper or computer) and that the assessment cannot be “attached” to the student as it is not an assessment of the student but of the interaction itself.
He identifies thirteen sources of error (one of which is the above mentioned attachment error), some of which deal with the incompatibilities of the frames of assessment. Any one error can cause the process and conclusions drawn to be invalid. You state “the author may be using evidence of some invalidity to assert total invalidity”. Yes, that is exactly the case. Once a process has been identified as being invalid all conclusions drawn are invalid. It’s like being dead or alive: either one is alive or one is dead, either a conclusion/process etc. . . is valid or it is invalid. Any source of invalidity makes it invalid.
There is quite a bit more to both of his writings but that is a start.
Good points, one and all, Diane. Except that for most of those 5 years of our blogging together we agreed