Heather Vogell, a stellar reporter for the Atlanta Journal-Constitution, has done in-depth investigative reporting on the standardized tests that now are used to determine the fate of students, teachers, principals, and schools.
She has found a surprising number of errors, though not surprising to those familiar with the testing industry.
Read this article. How should a student respond to questions where all the answers are wrong?
What does it do to students when they realize the questions or answers are wrong?
Here is an idea for this tireless reporter: investigate how much money the testing industry spends to lobby Congress and the states to maintain their hold over the minds of our students and the very definition of education.
Readers, after you read Heather Vogell’s excellent articles, please read Todd Farley’s eye-popping exposé of the testing industry called “Making the Grades.”
You will never forget his description of how students’ constructed responses are scored and who is doing it (minimum wage temps).
I was part of a Common Core K-5 math training last week. The presentation went well until the sample standardized test questions were shown. Then our collective jaws dropped. Diane is on the mark when she states that the Common Core test is simply the old test on steroids. You would think that a question regarding a visual representation of the fraction 2/5 would be easily answered by a group of teachers, but our presenter told us that we had probably answered the question incorrectly. Even he was not sure what the “official” answer was. What a bunch of malarkey! It’s tragic to think that a student who has actually done the work and mastered the grade level standards will be told by the state that their progress is inadequate.
Another educator gets “common cored” during supposed “professional development”. My condolences!
In CA we must sign that we will not discuss or reveal test questions to anyone. How can we possibly eliminate the biased, inaccurate and just downright stupid questions without discussing them?
Sit down, shut up and do as we say! Remember that you’re being trained to glorify the Fatherland!
I see a double standard: while standards for teacher pay and employment are being tightened according to students’ test performance, the companies hosting these tests and organization systems are not up to par. This very quote from our DPI came today:
“While the rollout of Home Base has met with success overall, the enormity of the project has not been without a hiccup here and there. To keep you posted on the latest issues defined around Home Base, we have created a webpage to report known incidents.”
Who is getting paid more money here? Why is the performance of these companies not up to snuff?
Refresh my feeble mind, DPI???
Department of Public Instruction
Thanks, Joanna, so many acronyms so little time!
Apparently the tests are loaded not only with errors but trivia too. Knowing who Andrew Lloyd Webber is is an important piece of social studies knowledge because…?
I know, right.
On a 6th grade social studies test, yet.
Cultural and SES bias perhaps?
I also wondered at the question but apparently it is in the curriculum. It is difficult, however, to assess bias based on a single item. If the majority of questions reflected the programming on NPR rather than MTV or BET, then you would have more of a case.
Bernie1815,
Test maker bias is just one of the many errors Wilson has identified that make the whole process of educational standards, standardized testing and the grading of students completely invalid. Again I invite you to read and understand why in Wilson’s “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700 and join the hottest new Quixotic Quest Bandwagon to rid the world of these insane educational malpractices.
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine. (updated 6/24/13 per Wilson email)
1. A quality cannot be quantified. Quantity is a sub-category of quality. It is illogical to judge/assess a whole category by only a part (sub-category) of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as one dimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing we are lacking much information about said interactions.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference”, each with distinct assumptions (epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think of a college professor who “knows” the students’ capabilities and grades them accordingly), the General Frame (think of standardized testing that claims to have a “scientific” basis), the Specific Frame (think of learning by objective, like computer-based learning where the student must get a correct answer before moving on to the next screen), and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error in the process is caused when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words, all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid, that is, errorless or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the test and results are. He is an advocate for the test taker, not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. A basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”) or, to put it in more mundane terms, crap in, crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words it measures “‘something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self-evident consequences. And so the circle is complete.”
In other words students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, the “I’m an ‘A’ student” is almost as harmful as “I’m an ‘F’ student” in hindering students becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
bernie,
“If the majority of questions reflected the programming on NPR rather than MTV or BET, then you would have more of a case.”
One question out of 70 that is culturally biased would be enough to skew a student’s score, especially in a particular domain. However, the important thing to note from the investigation is that there was NO CORRECT answer on the test to choose. This is a problem, along with cultural bias in choice of examples and word choice, that we have seen for years.
And regarding ALW being “in the curriculum”…not exactly. According to members of the SS department, Broadway composers are not really “in the curriculum” so to speak. However, ALW is listed in the DOE descriptors as an example:
“Describe major contributions to literature (e.g., Nobel Prize winning authors), art (e.g., Van Gogh,Picasso), and music (e.g., classical, opera, Andrew Lloyd Webber)”
That is one descriptor from a document with 9 pages of 6th grade SS domains and descriptors.
And it still smacks of trivial pursuit, IMHO.
Ang:
Your Trivial Pursuit analogy may well be correct. My understanding is that the item was amended by teachers who created part of the confusion – how many teachers, etc., is not indicated. Your point about a single item is not really the case since any item on a Social Studies exam is likely to put some group or other at an advantage or disadvantage by virtue of their ethnicity or cultural background: Congregationalists may know a lot about the Pilgrims, African Americans may know more about slavery, etc. The issue is the pattern and the general cultural relevance of the information. So, Muslims should know about the Pilgrims and Whites should know about slavery. I am not sure anybody really needs to know about Andrew Lloyd Webber.
My thoughts exactly. When I read this, my first thought was, “How many questions does this test have, that you can include a kid’s knowledge of the pop culture of his grandparents’ generation in his score?”
The AJC article is poor to say the least. The Webber question is a weak straw upon which to build a thesis. Additional actual examples of bad and/or ambiguous questions would have been more persuasive.
This, of course, does not excuse errors on the part of test producers or scorers.
As for quality control, the immediate release of all test questions after a test is the only real way to ensure that errors are corrected. The cost of redoing items should not be material.
“Additional actual examples of bad and/or ambiguous questions would have been more persuasive.”
In the actual article (behind a pay wall on the AJC site) there were links to PDFs containing much more background, evidence, many of the actual questions, etc.
Ang:
Thanks, I missed that. I see no reason for the pay wall for this type of information.
There’s a major difference between a thesis and an example. The Andrew Lloyd Webber question was an example. The thesis is that there are as many as 1 in 5 problematic test questions. You need to take my debate class so that you can see the differences. Or are you being deliberately obtuse?
LP:
Please. It remains a poor example for the argument that is being made.
As to the thesis, the AJC article actually states:
“Overall, the newspaper found potential problems with blocks of questions on nearly one in 10 tests given across the country in recent years.” It repeats this later in the article: “Statistical analyses of more than 1,700 tests given over two years in 42 states and Washington, D.C., suggest a patchwork of problem questions spans coasts.
For almost 9 percent of tests, one in every 10 questions or more showed signs of potential flaws, technical reports examined by the AJC revealed. Most states gave at least one test in recent years with blocks of suspect questions — threatening the tests’ overall quality and raising questions about fairness.”
Of course this latter statement suggests that about 1% of all questions “showed signs of potential flaws.”
But perhaps the AJC editor missed this confusing and contradictory set of statements. The article simply is not very good and is poorly written.
Bernie1815,
You’re right that we shouldn’t have to pay for catching and reporting the items, which results in the companies spending more time making the tests, but then again, we shouldn’t even be paying them anything for a product that is so logically invalid that it is considered an educational malpractice that harms many students. See the never rebutted nor refuted study by Wilson above.
leave out the “considered”. It just is.
“At the same time, cash-strapped states have struggled to hire and retain staff to provide oversight. Too often, they leave contractors to police themselves, some experts say.”
This scares me, because the reformer response to the Atlanta test scores scandal was to call for increased testing security. All that means to me is pouring more money into testing, and probably hiring contractors to do it, which will just mean more lobbyists, more profit, and more resistance to some brave adult in the room eventually bucking the status quo and calling a halt to the ever-increasing lunacy of this testing.
I also don’t want my kid in some horrible testing “security lockdown” for weeks out of the year. They’ve alternately bored kids to death and scared them to death with these tests already. All we need now is “increased testing security”, so their entire school experience can be eclipsed by testing.
“All we need now is “increased testing security”, so their entire school experience can be eclipsed by testing.”
We can all become good East Germans!!
“CTB declined to discuss the matter.”
Why are teachers held accountable at every step of the way, but a testing company can decline to discuss the matter of a test question with no correct answers? Why are testing companies not held accountable also?
Because they pay off the politicians.
In using “Successmaker” math, there were errors. Errors in answer selection. Errors in information on the screen. Difficulty using the interactive tools to solve problems was another issue. If there isn’t sufficient time for the teacher to preview all the various features of the program, some students will become “lost” when trying to use those tools.
One problem I found was that when a student worked on the “scratch paper” feature to find a solution to operations problems, the program required the student to remember what he/she had written and then to type the answer in left to right. When students solve problems step by step not using the scratch paper feature, they generally write answers from right to left.
Another difficulty is found when using the click and drag tool to outline a perimeter or draw a shape or find a measurement. Sometimes it didn’t work well.
In a timed situation, these types of features will slow some students down.
You’re elucidating just some of the many errors involved in the process of educational standards and standardized testing that render the process completely invalid. And once something is invalid it cannot be valid which then means it can’t be reliable.
Such simple logic is ignored by almost all educators. But, But, we must grade students!! All hail the testing gods. Maybe we need to sacrifice a few vestal virgins or run knotted ropes through the leaders genitals a la Maya to appease them. Those two courses of action are more logical to me than the educational malpractices that are educational standards, standardized testing and the grading of students.
leaders’ not leaders
Deb:
Your test format issue is an excellent one and it is certainly a significant validity threat. When we designed simulation exercises for management assessment centers, we constructed a practice exercise so that nobody would be surprised at the format of the exercise.
You also raise an important issue about time for completing a test. Again it seems to me to be an unnecessary and difficult to justify constraint for this type of KSA test and yet another validity threat. Is the test measuring reading speed or comprehension? If it is both how can it have any formative value?
I might add, as an example, that my brother, who read VERY SLOWLY, did poorly on his college entrance exam. He came to the same college as I did, and he couldn’t get into the program he wished because of his score. He was told to take multiple Foundations classes. Even though I was a student, I happened to be working there as the Foundations Teacher to help Freshmen in Mathematics. I talked to the school administration and explained that I knew for a fact that he was very bright, brighter than I was, and that he might need to take a course to improve his reading speed. So, he enrolled in some of the same classes that I was taking, and I was a Junior. Guess what? He and I raced for the 100% on every test in those classes. He eventually graduated at the top of his class, got his Masters Degree and PhD in botanical sciences and worked as project manager for various corporations. Unfortunately, he died suddenly at age 58, a year ago.
Point? His test scores meant nothing except that he couldn’t read quickly. It turns out that once he DID read something, he remembered it, including Latin spellings for scientific classifications. Tests. Prove. NOTHING.
Deb:
That is a great and compelling anecdote.
I have a question and I do not mean it to be rude or intrusive: Why wasn’t your brother’s problem recognized and addressed before he went to college? You saw the problem and identified the solution. How come his elementary and HS teachers could not do the same thing?
Deb,
“Tests. Prove. NOTHING.”
Exacto.
I mean STANDARDIZED tests mean nothing, not all tests. But, even then, tests are subjective because if we make our own, there can be errors or we can teach to the test. There is no perfect test and no perfect evaluation.
Deb:
I disagree with this line of reasoning. It is true that Standardized Tests are flawed and imperfect. It is also true that they have rather a constrained validity. In addition there may well be alternatives to Standardized Tests. However, the fact that they are not perfect does not mean that they are not useful.
Here is a study that pari passu found that LSAT scores were predictors of the passing of Bar Exams. Obviously, I recognize that passing the Bar Exam does not guarantee that the lawyer will be very effective or ethical, but you cannot practice law without passing a Bar Exam.
Click to access NLBPS.pdf
Bernie:
I recognized that there would be objection to that statement. However, I was “in the moment” and just went with it. Of course standardized tests have SOME value, but not “absolute” value. They have always had skewed results.
Because they aren’t absolute, they by no means should be used to determine a person’s entire future. Student or teacher.
As for my brother… he was in the accelerated program all along until 9th grade. He got kicked out because he was fooling around. He didn’t try hard. At that point he was kind of floundering in school. The speed at which someone read was dealt with in our 7th-8th grade classes, but not in high school. His senior year, he decided to get interested in bringing up his grades and going to college. His GPA plus his test scores were not the best. That is why he wound up in that predicament. But, this was 1972 when he graduated. Schools weren’t as concerned about remediation as they are today. Why didn’t my parents do something? Well, they had had three kids and suddenly they had another set of twins who were in first grade when my brothers were in tenth grade. I guess some things slipped through the cracks.
Also, as great as my school was for the accelerated kids, it was not very good for the average and low kids. And, they made mistakes. This was before Open Records. Parents weren’t allowed to see kids’ permanent record files.
Well, an incident occurred that made me recognize that something was wrong with my GPA. It wasn’t accurate. I was always on Dean’s List, but my cumulative average was below a B level. I went to the counselor and asked what could be done to check it. They showed me the front of my folder for a split second. The sticker with all of my 10th grade grades was incorrect. This included test scores such as ACT. It contained the grades of another person with my name, but a different middle initial. She had a “J” not an “L”. So they changed it. This was after all the information went out to colleges for scholarships, etc. It changed my class rank.
I found out 10 years later that they had made an error on my 11th grade grades, too. So, that impacted my college years, too. I just didn’t know until I was married that they had made 2 errors. I will never know my real GPA or my class rank. They said it was too late to change it in the archives.
I am surprised I got into college without any problems, given that mess. I am just saying, things are not always what they seem.
Deb:
The anecdote certainly underscores the importance of transparency and openness. My school was small, with 70 students in each year, and that mistake simply would not have occurred there.
It sounds like your HS was pretty large and somebody in the office did not pay attention to a likely source of error. It would be interesting to find out what happened to your namesake who went through life with your GPA instead of her own. Echoes of Being There.
The people who believe that it is possible to put together a short, simple test that validly measures the GENERAL reading, writing, and language abilities of a student beyond the beginner level are laboring under a delusion. But there’s a problem with trying to explain to them why that’s delusional. These are the sorts of people who like to think that things are simple. You get what you measure. Measure reading and writing and language skills and you will get improvement in reading, writing, and language skills. What could be simpler, more intuitive, more obvious than that?
But, the simple idea turns out to be simple-minded.
A single, short test that validly measures an older student’s general reading, writing, and language abilities turns out to be an impossible construct, like an Escher stairway or a Penrose triangle or a devil’s tuning fork. (If you don’t know what these are, have a look, here: http://en.wikipedia.org/wiki/File:Impossible_objects.svg.) The best that one can hope for is to create a test that is a very, very crude instrument on which no one should rely for making high-stakes decisions. To understand why this is so, one has to know a lot about measurement, about testing, and about the various domains of the English language arts and what knowledge of these domains consists of.
That presents a problem. The people who think that there are simple answers to every problem and that testing is such an answer are also people who are not going to sit through long, complex, technical explanations of why their pet EASE ALL doesn’t work.
So, we have a Catch-22 there.
And, to compound the problem, a lot of the people who buy the CENTRAL DEFORM DOGMA (“Just give a guy a KPI.”) have a lot of investment in and earnings from the standards-and-testing machine. So, that further disinclines them to attend to complicated explanations of why, in reality, their scheme doesn’t work and, in fact, creates precisely the opposite effects from those intended–producing mediocrity instead of merit.
And to compound the problem even further, they have the power to push their ideas through without having to submit to anything as quaint as democratic process or debate.
Sigh
Using the current ELA high-stakes tests to measure general reading, writing, and language ability is sort of like using a sledge hammer and a hacksaw to do brain surgery.
Sorry, that made me laugh …
“. . . on which no one should rely for making high-stakes decisions.”
Or rely on anything other than wasting students’ and teachers’ time and taxpayer money.
KPI + kaolin-pectin inama????
+ should be =
key performance indicator, a key measurement of the success or failure
Sounds like the same thing to me!
BTW, the calamity that is summative testing to the standards should not, I think, blind us to what CAN be learned from applying business principles in education. One can learn a lot, for example, from businesses’ experiences with balanced scorecards and dashboards about how to put together metrics that actually make sense. And one can learn a LOT from the Japanese experience of quality circles, Plan-Do-Check-Act, and other bottom-up quality mechanisms about how, actually, to improve outcomes in U.S. education. But the stupid, crude implementation of the “apply business principles to education” notion that is the current standards-and-testing machine is, I fear, going to put people off the notion that one can learn anything from businesses about how to get real quality improvement.
Robert,
I’m certainly not anti-business practices as I worked outside education till I was 38 and used many of Deming’s ideas in inventory control, production and materials management. Even took a course in JIT (just in time) inventory control and production. Put JIT into practice in purchasing hospital pharmaceuticals and it was vital to have our historical usage data correct and up to date.
That said the teaching and learning process is not inventory control nor production management but a whole different sphere. And while some ideas can roughly transfer realm to realm some others have no place. I think we agree on that. Key performance indicators might better be indicated in the manufacturing realm as key production indicators.
And using production/manufacturing language tends to dehumanize the teaching and learning process and has led to myriad egregious practices being instituted in education when they shouldn’t have been.
I’ve had the fortune of working WITH enlightened management and working FOR not so enlightened management, with the unfortunate thing being that most of my experiences with public school administrators fall in the second category, all the while they were spouting platitudes about involving the staff with “decision making” (as long as decisions coincided with previously decided outcomes).
Duane:
You and I differ on many things here, but I certainly agree with the importance and criticality of strong, knowledgeable and ethical leadership.
At the same time, my mind boggles at the demands placed on somebody charged with leading NYC Public Schools or LAUSD. The sheer size and complexity is potentially paralyzing. Alas political hacks may forever have the inside track for many of these positions.
As much as I excoriate administrators I do understand the complexities involved in those positions and the difficulty in doing them even in a half way decent fashion. In the business sector most educational administrative positions would pay much much more considering the scope of responsibilities and the span of control. But as the saying goes “If you can’t stand the heat get out of the kitchen”.
And, you are quite correct that political hacks as you call them or cronies seem to have the inside track on administrative positions. Kissing ass has never been my idea of being qualified for a position. And I’ve seen way too much of it in the public education realm.
Years ago, I was judging a high-school speech tournament. One of the categories of competition, as you may know, is dramatic interpretation. Well, one young woman had chosen an exploded language poem by e.e. cummings and performed it as a scat song, in the manner of the incomparable Ella Fitzgerald. It was a perfect marriage of content and delivery–a brilliant concept, brilliantly executed. I gave her the highest ranking. ALL the other judges ranked her at the bottom. They had no idea what they were hearing, and they applied their usual criteria of judgment to the measurement and evaluation of what the performer had done. This is just one example of the many, many ways in which measurement can go wrong. There an be a mismatch between the object being measured and the tool being used to measure it with.
cxs: add had before performed; strike with at the end of that paragraph
cx: Add a c before an.
Oh, and mark my post “not proficient” because of the typos. : )
You’re a FAILURE!
Bob: Interesting anecdote that illustrates your point. This probably happens a lot with such competitions, which is why many folks prefer competitions where the criteria are simpler: speed skating versus ice-skating. It does not make it right or wrong, just different.
I would like to see the testing companies make A LOT of money, BTW. I would like to see them making that money with sophisticated, scientifically prepared diagnostics–news that teachers can use.
Bob:
I agree 100%.
Let’s not forget that a virtual charter school in Arizona was discovered outsourcing the grading of essays to India.
If you’re the Gene Glass of the EPAA @ ASU I would like to say thanks for that website. It has one of the most important writings on educational standards, standardized testing and the “grading” of students, Noel Wilson’s “Educational Standards and the Problem of Error”
All here should be checking out that site regularly. One of our principals just today asked us to give her what we regularly liked to read and I responded with EPAA along with this site and a few others like edushyster, etc. . . .
Gene:
Did they grade the essays well or poorly?
Robert D. Shepherd: so the superior performer was severely penalized for being too original and accomplished?!?!?
Wow! How could anyone guess that mismeasurement could make the best look like the worst?
Oops, my bad, Banesh Hoffmann, THE TYRANNY OF TESTING (1964).
As for high-stakes standardized testing in general, I remind viewers of this blog of a very recent posting:
Link: https://dianeravitch.net/2013/09/19/fairtest-pearsons-history-of-test-foul-ups/
I heartily second Diane’s recommendation of Todd Farley’s MAKING THE GRADES: MY MISADVENTURES IN THE STANDARDIZED TESTING INDUSTRY (2009).
For a particularly pointed example from the ongoing horror series called “Testolatry Gone Wild,” google “pineapple, hare, Daniel Pinkwater.” For the less adventuresome, just click on the link below—
Link: http://blogs.wsj.com/metropolis/2012/04/20/daniel-pinkwater-on-pineapple-exam-nonsense-on-top-of-nonsense/
“When the right thing can only be measured poorly, it tends to cause the wrong thing to be measured, only because it can be measured well. And it is often much worse to have a good measurement of the wrong thing—especially when, as is so often the case, the wrong things will in fact be used as an indicator of the right thing—than to have poor measurements of the right thing.” [John Tukey, mathematician, Bell Labs and Princeton University] (from THE MISMEASURE OF EDUCATION, Jim Horn & Denise Wilburn, 2013, p.147)
For those who have difficulty understanding this, I suggest they review their Marxist playbook:
“A child of five would understand this. Send someone to fetch a child of five.”
Ok, you got me.
Groucho.
🙂
: )
Although we are strictly forbidden from doing so, I have often perused the Science section of the Wisconsin Knowledge and Concepts Examination while my sophomore homeroom is taking it. I have often been appalled at the poor quality of the questions. Out of every 50, you are likely to find 2 that are so poorly worded as to be nearly impossible for me to understand and 2 more that seem very much to me to have more than one correct answer or no clearly correct answer.
I have marveled at the fact that, in 20 years of writing all my own formative and summative assessments (we’re talking maybe five thousand questions), I have had only 6 instances of a poorly worded or misleading question making it into the hands of my students. And I often write these on the fly, while eating lunch and tutoring students at the same time. One would think that an entire company whose sole job is to produce high quality questions could get it right.
Of course, the quality of the questions is a completely moot point in my school, because the kids will finish that 30 minute test in less than 5 minutes, because they know it doesn’t matter to them at all. So I can’t wait until my state jumps on the bandwagon and uses test results to evaluate me as a teacher.