This comment was posted yesterday:
I am a former, part time item writer for a private testing company; I wrote for many different state standards under NCLB. I must say that poorly constructed, confusing, or developmentally inappropriate items undermine the validity of standardized scores and subsequent use in teacher evaluation. When standardized tests are properly constructed, such items which might make it to a field test will almost certainly be vetted during what is typically a two year process. Many items on the Pearson math and ELA administered last April here in NY were written, in my opinion, in an intentionally confusing style using obtuse or arcane vocabulary. The ELA test in particular included confusing item stems and distractors that were not clearly wrong. There were far too many items that turned subjective opinions (most likely; best; author’s intent; etc.) into a “one right, three wrong” format. Many teachers were unsure of the correct answers on a number of vague and fuzzy items.
The math test included many items that were ridiculously convoluted. Although there may be other compelling arguments against VAM teacher evaluations, corrupt test writing, norm referencing (instead of criterion referenced scoring), and manipulating cut scores add up to a rather important set of reasons to invalidate the entire process.
I’m sending this info to my elected officials and union leaders. This message should be shouted from every corner of the state (and nation).
What a great idea – I’m doing the same!
The message that should be “shouted from every corner of [every] state (and nation)” is that the processes involved in the making of, giving of, and the dissemination of the results are so rife with error that any conclusions drawn are completely invalid, illogical and, as Wilson states, “vain and illusory”.
The message that should be “shouted from every corner of [every] state (and nation)” is that Noel Wilson’s “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700 contains everything one needs to completely destroy the validity of educational standards (CCSS) and standardized testing (mandated by NCLB and RaTT) with the bonus of showing the harm to the many students who “do not do well” on these tests and therefore are discriminated against by the state in no less a manner than discriminating by race, gender, sexual orientation and/or age. It is state sponsored discrimination and it HAS to be stopped.
Everything else is just window dressing for the unrefuted, never rebutted Wilson study. My Quixotic Quest will come closer to fruition when Diane R. finally admits that these practices cause untold harm to way too many students and completely rejects these nefarious practices, including NAEP, and implores her readers to do the same. Will you do that, Diane?
Speaking of the Quixotic Quest:
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine. (updated 6/24/13 per Wilson email)
1. A quality cannot be quantified. Quantity is a sub-category of quality. It is illogical to judge/assess a whole category by only a part (sub-category) of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as one dimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing we are lacking much information about said interactions.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference,” each with distinct assumptions (epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think college professor who “knows” the students’ capabilities and grades them accordingly), the General Frame (think standardized testing that claims to have a “scientific” basis), the Specific Frame (think of learning by objective, like computer based learning, getting a correct answer before moving on to the next screen), and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error in the process is caused when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words, all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid: errorless, or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the test and results are. He is an advocate for the test taker, not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. A basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”) or, to put it in more mundane terms, crap in, crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words it measures “‘something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self evident consequences. And so the circle is complete.”
In other words students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, the “I’m an “A” student” is almost as harmful as “I’m an ‘F’ student” in hindering students becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
Confusing questions can’t undermine the validity for usage for teacher evaluation. There is already no validity there. At all.
Saying “only part of the evaluation should be based on test scores of students” is equivalent to saying “only part of the evaluation should be based on a coin toss.”
If anyone wants to read more on this, http://epi.3cdn.net/b9667271ee6c154195_t9m6iij8k.pdf.
My favorite observation from that document: “A study designed to test this question used VAM (Value Added Models) methods to assign effects to teachers after controlling for other factors, but applied the model backwards to see if credible results were obtained. Surprisingly, it found that students’ fifth grade teachers were good predictors of their fourth grade test scores. Inasmuch as a student’s later fifth grade teacher cannot possibly have influenced that student’s fourth grade performance, this curious result can only mean that VAM results are based on factors other than teachers’ actual effectiveness.”
“Surprisingly, it found that students’ fifth grade teachers were good predictors of their fourth grade test scores.”
Which leads to two logical conclusions:
VAM is goat feed, or, good teachers have discovered time travel?
I’ll go with the time travel!!
“There is already no validity there. At all.”
As proven by Wilson (and others).
Thank you for validating what so many teachers have been feeling. I do not teach in New York, yet have witnessed what you’re describing many times in my own state as we make the move toward Common Core. Pearson textbooks and computer programs in particular are rife with confusing, vague or outright incorrect material.
A few teachers began expressing this sentiment during the past couple of school years – “Are the folks who write this material really that ignorant of good curriculum and assessment, or are they intentionally setting us (public schools) up to fail?”
At first, even asking this question out loud felt strange – I mean, surely there couldn’t REALLY be anyone setting the schools, teachers, and children up to fail?! Suffice it to say, I’ve seen, heard, and read so much over the past year that I no longer feel any embarrassment over expressing my feelings – I just want more and more people to become aware. This madness MUST stop.
I am fed up with Pearson and its 2013 Scott Foresman Common Core Reading Street series! Also, but I’ll have to go back on to my Facebook page to find it, I read a post today about Pearson acquiring a company regarding ADHD and related testing. I’ll post it when I relocate it.
Here’s the link about Pearson acquiring the ADHD testing company Bio Behavioral Diagnostics.
http://www.prweb.com/releases/2013/8/prweb11061295.htm
Thank you for posting that….I had no idea! Pearson definitely has their finger in every pie, including the health field now. Wow. Correct me if I’m wrong, but doesn’t this mean that Pearson now has a vested interest in ADHD diagnoses?
Judging from what I’ve seen on the PARCC and Smarter Balanced websites, it’s clear that the new tests are going to be complete disasters.
The education business has always had more than its share of people who not only talk but also THINK in important-sounding but vacuous, confused, and woefully unexamined jargon. However, the PARCC and Smarter Balanced folks, in their rationales for their test formats and items, have taken such talk and such thinking to new heights of convolution. What of the test items themselves? Well, have a look at the samples on the PARCC and Smarter Balanced websites. Bear in mind when you examine the sample items posted by these consortia that those samples are supposed to be representative of the items from valid, reliable tests of GENERAL reading, writing, speaking, and listening ability. Also bear in mind that these folks wouldn’t have posted the items they did unless they believed these to be good enough to bear up under intense scrutiny. In other words, these items posted by PARCC and Smarter Balanced are the best they could come up with.
If it weren’t for the obscene costs of this development and of the distortions of our pedagogy and curricula in preparation for these new exams, one would be tempted to laugh. But these tests are no laughing matter. The “reform” movement will come to a crashing halt when these new tests are administered. That’s pretty much inevitable.
You know it’s bad when the people who write this junk are telling us it’s crap.
I am the former, part-time test writer (and 33 year veteran school teacher) in question. I was trained by and wrote for a private company named Measured Progress. I wrote hundreds of MC and CR test items for multiple states over a five year period, including MCAS and NAEP exams. The contracts I wrote for were for items aligned with individual state standards under the former NCLB. I have also written items for the GED and the NYS middle level assessment in addition to scoring and evaluating field tests. As a classroom teacher, the experience was invaluable; and it helped pay the bills as well. The tests we constructed at Measured Progress were the polar opposite of the Pearson tests I administered last April. The item writing supervisors I submitted my work to insisted on clear, direct item stems, only one obviously correct answer and three “plausible” distractors. For every distractor we wrote, we also had to explain why it was a plausible but incorrect response. The company prohibited any MC questions that required the test taker to respond to the negative form (e.g. “Which of the following is NOT . . . .”) because they were too confusing to many students. Correct answers could not include a key term from the item stem and plausible distractors required “parallel” construction. The items I wrote were reasonably challenging for the given contract (grade level and subject), and we were required to make sure that language, vocabulary, and syntax were clear and understandable for the age group. We were corrected with certainty if we inadvertently wrote a confusing or convoluted test item. My item writing supervisor made it clear that we were to always err on the side of the test taker. Measured Progress did not believe in writing cumulative EYO exams that tested trivial facts or unimportant ideas; these exams were geared for success for the average student who.
In my opinion, the majority of the items I saw on the Pearson exams last April would have been immediately rejected by Measured Progress and either discarded or rewritten. Personally, I was appalled at just how shoddy and unfair the test construction was. So, no, I am not a person who “wrote this junk,” just an experienced teacher and a trained test writer who could readily recognize it as such.
NYSED just sent out emails informing school districts across the state that they are officially SUSPENDING THE PARCC EXAMS for 2014-2015 and the future of New York state’s commitment to the PARCC exams is very much in doubt. The email cited reasons for the suspension, including cost to districts and technology logistics. Ding-dong the . . .
Wow! Do you have a link? I could not find anything….
Are they going to send a release about this news?
I had a conversation this morning with a rather despondent third grade teacher who had to administer a new county-wide test to replace an old one not in the “common core” style. She is a lovely, usually upbeat person who adores her students and they adore her. She told me that in all her years of teaching she had never seen math problems so developmentally inappropriate and felt horrible having to make her students take a test that made them so anxious and upset. The children in class were despondent .. one was even crying over this. And when she gave one concrete example of a test item to me, it was very obtusely written so that in effect the children had to decipher complicated language before they could even do the math in this math problem. So, for English language learners or children with difficulty reading it was not about math at all. This nonsense HAS TO STOP. Students really and truly feel like failures when they are constantly taking tests and not doing well on them. The ultimate irony… they cannot even review the test material to learn, so it does not even contribute to learning. Teacher created and administered tests are all about learning. The tests cover material studied in class and the teacher will review the answers in class as a whole group and individually on an as-needed basis because the end goal is NOT THE TEST but LEARNING! I fear for the students when the common core high stakes test is rolled out next year. It is bad enough that the curriculum is common core and way over the Title I students’ heads. This year they will still have the old state test and won’t even be prepared for that because the curriculum is entirely different under common core. What a national disgrace this is…..
I should mention that the test administered this morning is a series of smaller practice versions of high stakes tests administered quarterly to “measure” student growth before taking the big high stakes test.
The children in class were despondent .. one was even crying over this. And when she gave one concrete example of a test item to me, it was very obtusely written so that in effect the children had to decipher complicated language before they could even do the math in this math problem. So, for English language learners or children with difficulty reading it was not about math at all.
This comment hits the nail on the head. Sad but true, for now.
Tell your despondent friend to keep her chin up and be patient. When good teachers give up over this temporary nonsense – the corporate reformers win.
And this sort of junk will be used to calculate up to 50% of my effectiveness ‘rating’. And, the benchmarks that are not meant to be used that way, but will be, even though they are full of mistakes. Not typos, MISTAKES. None of this shows what my kids learned nor what I taught. Kiss it, boys. Go ahead and fire me. I wish you luck finding someone to do what I do every day.
Ultimately we must ask – why? If the tests are purposely written to cause our children to fail – what is the purpose? Is it to get rid of the teachers? Is it to prove that our students are behind those in other countries? Is it to sell curriculum?
The results are: devastated students, outraged parents, and incredulous teachers and administrators. Education is difficult enough to achieve without throwing rocks, no, I mean boulders, in the path. They’ve turned a choppy river into Niagara Falls.
And what is the ultimate goal? It is definitely not with children in mind.
Based on the options you pose in the first paragraph, I’d say the answer to your question is D) All of the above.
The only solution to this mess, especially given how the test corporations are profit driven to reduce costs by getting the cheapest, not the best, tests, is to go back to that old standby: teacher professionalism. It may also end that Reign of Error we’ve faced for the past quarter century or so.
It is difficult to write decent “test items,” although the more practice you get, with input and critique, the better you can become. I know because during my 28 years’ teaching I wrote more than 1,000 tests (and test-like homework worksheets) beginning during the “Ditto Machine” era and continuing until I was fired and blacklisted by Paul Vallas and the Chicago Board of Education for exposing some of the more ridiculous “standardized” tests in history — the Chicago CASE (Chicago Academic Standards Examinations).
During those years, I learned that we can test some things, and other things we can’t.
Working from 1969 on in some of Chicago’s most challenged high schools (and a couple of “upper grade centers”), I was able to utilize my quick typing ability to make up learning materials for the kids I was teaching. Over time, the best procedure was to develop materials based on (a) the work being presented and (b) the “level” of the kids in the classes. I chose to teach lots of books, because many of the students I was teaching (generally, 9th and 10th graders) in Chicago’s massively segregated all-black schools had read few if any books during their lives. Surveys I began using in the mid-1970s (a “Reading Interest Survey”) showed that many of the children had few if any books in the home. Often, the main book was the Bible. Often I got questions like, “Is Playboy a book?”
I would try and provide the kids with a worksheet every day on the reading we were doing, so that the class could discuss the materials without having to simply recap the stuff. Every Friday, the kids got a test.
There was no secret, just long-term relentless consistency. Four worksheets (made by me) led to one test. The kids could get extra credit if they found “mistakes” on the tests I had written. Double the “points.”
During those years, I prepped more than 100 books, most of them fiction. Some years, it depended upon the school. If the school had a certain book, that’s the one that got taught. The only rule was that every child had to have a book to take home EVERY NIGHT. In many inner city schools, Chicago deemed that “class sets” were OK for the poor black and brown kids I taught. That meant there were 30 or 35 books PER ROOM, but not a book (or more) per child. The books were distributed to the class at the beginning of the period and collected at the end.
This ridiculous scheme became “normal” at many Chicago public schools serving the children of the poor. “They just lose them,” was the answer when I asked. “Well, let’s get more then…” was the best answer to that. A kid who doesn’t have books in the home and who can’t take his assigned books home is not going to become a book reader, whether in 1969 (when my students managed to “acquire” their own copies of the paperback edition of “The Autobiography of Malcolm X”) or 1999, my last year before getting blacklisted, when I was teaching “Native Son” for about the sixth or seventh time.
Heaven came when the personal computer (and “Word”) arrived. I was able, slowly, to transfer my preps to a word processing program and revise them after that first year of prepping any book. Same for the tests.
What I learned during those decades was simple.
No “test” is sufficient for all the kids. Each year, there would be changes.
A lurid example.
During the 1970s, I modified a thing called the Psychometrics Reading Interest Inventory to learn quickly and early what my kids knew and had access to. Some of the questions were basic, some tricky. If you asked a kid how many books he had read in the past year and he said “five” you could follow up with “Tell me about one of them.” (Some children like to please, in ways that are not informative).
For many of my 9th graders, the first book they actually read was in high school. Most were able to like “The Outsiders,” and, prepped and used along with the movie, it was successful. Over time, other works were equally possible. But we had a disagreement when Chicago tried to mandate (the Paul Vallas years) that we teach “Romeo and Juliet” during the first semester of 9th grade. It could be taught, but not that early. You have to know your kids before you can launch them into Shakespeare. (And you have to get parental permission if you are going to use the Zeffirelli version, with the “Honeymoon” scene, as I called it).
It is also clear that we shouldn’t be mandating “informational texts” instead of fiction for most of our readings. David Coleman is more than a jerk, as he’s proved, but this is a fatal pedagogy, once again proving he doesn’t know anything about teaching, the real world of schools or kids.
I taught many “informational” texts during my years, including, in an AP English Language class, Stephen Jay Gould’s The Mismeasure of Man. But it was much better to introduce the “informational” stuff in the context of the study of fiction than to mandate it in the abstract.
Finally…
Teachers do assessments all the time. During the years I was writing all that paper so my kids could learn better, one of the things I learned was that you couldn’t test abstractions about “literature.” The “tests” were simply stupid. You can teach and test “narrator” and “protagonist” in the context of teaching To Kill a Mockingbird. But to make up a question asking a child to define “character” doesn’t really get you much further than to know that Jem, Scout and Dill are all characters in “To Kill a Mockingbird.” And to know that the narrator is Scout.
And is the “Protagonist” really Atticus?
Of such realities is reality made, and made the basis for our resistance, which is exploding this year. Opt Out will be a national movement by February, when test season looms. Finally.
And the reason is that the corporate hacks and Broad-vetted freaks currently being placed in dictatorships over our teachers and children are not fit, as the man said in “Gettysburg,” “to pour pee out of a boot with instructions written on the heel.”
Our trouble was that most teachers spend most of our time working with the kids thrown in front of us, trying to make up for all the nonsense that is placed in the way of our educating those children.
Thanks to all of us, it’s much less lonely today fighting absurdity in these things than it was when I first began learning how to teach “Macbeth” to 11th graders at Steinmetz High School in Chicago more than 30 years ago. As many of our friends here say: We are the experts. Now get out of our way because you finally got us angry.
I sat in a room full of hard-working students during last school year’s administration of these exams. Several students cried because of the length and difficulty of the exam. I gave them tissues, patted their backs and urged them to continue to do their best. To hear that the exams may have been “intentionally confusing” makes me want to punch someone.
How dare you do that to my students!