Emma Brown of the Washington Post explains that DC test scores in math reached a historic high point because of a decision by DC officials.
The decision was made after D.C. teachers recommended a new grading scale — which would have held students to higher standards on tougher math tests — and after officials reviewed projections that the new scale would result in a significant decline in math proficiency rates.
The choice that D.C. officials faced suggests that proficiency rates — which are used to make employment and pay decisions for teachers and principals and to judge the city’s efforts to improve public education — are as much a product of policymakers’ decisions as they are of student performance.
The lesson in this episode is not that DC officials are not to be trusted, but that the scoring of tests is a matter of human judgment, not science.
The human judgment may be based on political considerations. Or not.
But we must stop believing that standardized tests are a scientific measure, like a barometer or a thermometer.
It is human beings who decide where the passing mark should be.
They may decide to make it easier or harder to pass.
But it is not science. It is human judgment.
Test scores are the result of subjective judgment, not objective measures.
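To make the point concrete, here is a minimal sketch of how the choice of passing mark alone changes the proficiency rate. The raw scores and both cut scores are hypothetical, invented purely for illustration; they are not D.C. data, and `proficiency_rate` is just a helper name chosen for this example.

```python
# Hypothetical raw scores for ten students (illustrative only, not real data).
raw_scores = [12, 18, 21, 24, 27, 29, 31, 33, 36, 40]


def proficiency_rate(scores, cut_score):
    """Return the fraction of scores at or above the cut score."""
    passing = sum(1 for s in scores if s >= cut_score)
    return passing / len(scores)


# Same students, same answers, two different (hypothetical) passing marks.
lenient_rate = proficiency_rate(raw_scores, cut_score=24)
strict_rate = proficiency_rate(raw_scores, cut_score=30)

print(f"cut score 24: {lenient_rate:.0%} proficient")
print(f"cut score 30: {strict_rate:.0%} proficient")
```

Nothing about the students changes between the two lines of output; only the human decision about where to draw the line does.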
I agree that there are numerous ways to manipulate test scores to get the results you want. Take LAUSD: the state test scores went down while LAUSD’s scores went up. No one could explain why, but I have some thoughts. For instance, when you change the program for special ed students so that their test scores are not included in the general scoring, scores go up. Just saying. Many other adjustments can be made to game the test.
There was cheating going on too. There were a lot of eraser parties at my school. I imagine any school where Broadies were installed to drain the grants from SLC -AYP gains was inclined to cheat. Well, the principals. But these schools sank otherwise. Top-down despots on a disruptive tear. Eli Broad admits that he and his pals know nothing about teaching. “But we know about managing.” I beg to differ, unless it is the fascist school of management.
At this point this article is just one more necessary documented statistic on the fallacy of all the “testing miracles”. Walter Haney of Boston College uncovered a few of these scams many years back (remember the Florida and Texas miracles). This is one more “corporate ed” stat which uses their own “ed reform” stats against them, in my opinion!! Testing under punitive conditions just leads to learning how to “better game” the system. This has been shown too with all the hype over cheating. Think of NYC a few years back, when it was realized that major gains were really just the result of testing companies being led to create a few more easy questions on the tests, and this drove up the scores.
As I read Diane Ravitch’s latest book, I had a thought… maybe middle class parents across America are finally growing tired of “ed reform” nonsense too... kids wasting their valuable learning time on taking these high stakes tests. But most middle class parents cannot afford private school and may want their kids to remain with childhood friends in their communities anyhow. What would happen if hordes of middle class kids with “the modicum of right academic credentials” kept private schools very busy by applying to these schools en masse as a form of protest (asking for scholarships, of course)? Elite private admissions would be very busy. And of course any title one students able to apply (many have limited access to computers, have not been able to take requisite coursework because their schools do not offer these courses, etc.) could apply too.
So then I wonder how “corporate ed reformers” would respond to one of their favorite mottos, CHOICE THROUGH VOUCHERS… Would they be “outed” by their own words? Choice for title one students and most middle class students does not mean “choice to attend Dalton, Milton Academy, Sidwell Friends.” And for title one students, IT MEANS CHOICE OF unregulated charter schools that will accept your child (and they in no way compare to the “Daltons”)…
In fact the schools the title one students would be able to “select” would not even be on par with second and third tier private schools that the “elite” would not even consider anyway. Just a pipe-dream that is probably filled with flaws... but this latest Ravitch book should make us all think of ways we can take the “offensive” position against “ed reformers” who are a threat to democracy and our national education system. They have been “leading the way” down a destructive path for TOO LONG.
“. . . maybe middle class parents across America are finally growing tired of “ed reform” nonsense too.. . .”
I’ve been saying since the beginnings of NCLB that this standardized testing regime insanity wouldn’t change until the axe started to fall on the middle to upper class districts. It took about five more years than I anticipated, as some stays of execution took place so that the upper class schools wouldn’t look so bad, but now it’s in full swing. . . and the cries of those that matter the most, the upper class, are coming out. Good, I say to that, but still hypocritical, in that when the axe chopped up the poor urban districts few cared.
Opt out! If parents want this to end they can opt out. They just tell the school they do not want their kid to take no stinking tests. I do it with my son. He loves that!
I’ve said this before on this blog.
Here’s the dirty little secret of summative testing of this high-stakes variety: Tell me what results you want, and I will
a. Design a test to give you those, or
b. Create cut-off scores to give you those, or
c. Create a raw-to-scaled score conversion scale to give you those
All one has to do for a dramatic demonstration of this is to graph the raw and scaled scores used in New York state ever since the passage of NCLB. These are supposed to be in a LINEAR RELATIONSHIP, or would be if one had done a conversion of the raw scores into standard scores. But if you graph the raw and scaled scores from the conversion charts used in New York since NCLB, you will find that the resulting graphs are
a) completely different for each test and grade level;
b) jump all over the place wildly–I mean WIDELY, like gerbils on methamphetamine;
c) place the cut-offs arbitrarily.
In several cases, the scaling turned raw scores barely above what one could get by simply choosing answers at random into scaled scores above the proficiency cutoff.
In other words, the scales were cooked to give the results that people wanted: that steady adequate yearly progress (AYP). This has been so for a long time in New York and has been done elsewhere in the country as well. And the geniuses at the federal Department of Education to whom these scores are reported weren’t able to figure that out, even though any first-year statistics student would have been able to show them the problem.
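The kind of check described above can be sketched in a few lines. This is only an illustration, taking the premise that a straightforward conversion gives every raw point the same scaled value: both conversion tables below are hypothetical (they are not New York’s actual charts), and the helper names `scale_steps` and `is_linear` are invented for this example.

```python
def scale_steps(conversion):
    """Scaled-score gain for each one-point increase in raw score.

    Assumes the table's raw scores are consecutive integers.
    """
    raws = sorted(conversion)
    return [conversion[hi] - conversion[lo] for lo, hi in zip(raws, raws[1:])]


def is_linear(conversion):
    """True if every raw point is worth the same number of scaled points."""
    return len(set(scale_steps(conversion))) <= 1


# Hypothetical conversion tables (raw score -> scaled score), illustrative only.
linear_table = {raw: 100 + 5 * raw for raw in range(6)}
uneven_table = {0: 100, 1: 130, 2: 131, 3: 150, 4: 151, 5: 175}

print(scale_steps(linear_table), is_linear(linear_table))
print(scale_steps(uneven_table), is_linear(uneven_table))
```

In the uneven table some raw points are worth 30 scaled points and others only 1, which is the sort of jumpiness that shows up immediately when a conversion chart is graphed.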
Ah, it’s a wicked world.
cx:
a) completely different for each test and grade level
above, should read
a) completely different for each test and grade level from year to year
’Tis all mental masturbation, Robert, mental masturbation, which unfortunately harms many students and by extension teachers. And as we were warned by the nuns in grade school, it’s made many people blind to the irreality of it all (never thought I’d agree with the nuns on that one!).
So, what we’ve gotten from the states for all our NCLB money is a lot of statistical chicanery, a lot of “trust us, this is mystical math stuff.”
I think it is useful to distinguish between the scores on an exam and how those scores are labeled. This article is primarily about the labeling of scores.
Yes, TE. But it does raise the more general issue of ways in which the raw scores are used/interpreted.
Using and interpreting does require a labeling of sorts, but it seems to me that a number of posts here confuse the measurement of a thing with the thing itself.
I believe, TE, that my post addressed the intentional labeling of scores in such a way as to mislead as to what those scores meant.
TE,
To misquote you: “I think it is USELESS to distinguish between the scores on an exam and how those scores are labeled” except to point out the sheer utter insanity in discussing INVALID scores unless to point out that insanity in order to rid the world of such noxious educational malpractices.
TE, have you read Wilson’s “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700 ?? If so, do you have any rebuttals to or refutations of what he has to say? And if not, how can one logically and validly discuss these supposed “problems” with cut scores, score interpretations, etc.?
And you know what’s coming next!! Yes the Quixotic Quest Bandwagon has arrived!!
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine. (updated 6/24/13 per Wilson email)
1. A quality cannot be quantified. Quantity is a sub-category of quality. It is illogical to judge/assess a whole category by only a part (sub-category) of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as one dimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing we are lacking much information about said interactions.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference”, each with distinct assumptions (epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think of a college professor who “knows” the students’ capabilities and grades them accordingly); the General Frame (think of standardized testing that claims to have a “scientific” basis); the Specific Frame (think of learning by objective, like computer-based learning, getting a correct answer before moving on to the next screen); and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program, where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error in the process is caused when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words, all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid, that is, errorless or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the test and results are. He is an advocate for the test taker, not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. A basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”) or, to put it in more mundane terms, crap in, crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words it measures “‘something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students, as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self evident consequences. And so the circle is complete.”
In other words, students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, the “I’m an ‘A’ student” is almost as harmful as “I’m an ‘F’ student” in hindering students from becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
Duane,
Suppose one fine afternoon you engage in a conversation in Spanish with a gentleman you find by the river fishing. The conversation quickly turns to a discussion of the wels catfish. After a half hour you part company.
This interaction happened on a particular day at a particular time over a particular subject. Could you conclude that the gentleman you spoke with had a native speaker fluency or would you have to say that the gentleman might be fluent on that day, but who knows about tomorrow?
Well, first off I would have learned a bit from the “caballero” about a species I do not recall hearing about but after looking it up I realized that I had indeed heard/read about said species.
So can you say that I was ignorant/did not know of the species at that particular time or is it perhaps that the human mind doesn’t work in a very linear way in the sense of learn something one day and be fluent in it and still be able to remember and converse about it a year or ten later?
And yes, I believe I could determine if the caballero was/is fluent, whatever the defining definition of that is as there are many types of “fluency” in any language. Hell, I could hardly understand some Bostonians’ English for their Baahhhston accent.
And who knows “What Tomorrow Will Bring”, eh.
Well, your point was a long time coming, but, in the end, Duane, you are dead on the mark. A big AMEN! to your final paragraph.
It is pure fantasy to think this was not on purpose, just as driving off the low performers is good for raising test scores when they did not really go up. At LAUSD, 117,000 students do not come to school every day. They are the low performers. Let’s say they come back to school, which also means more than 4,000 teaching jobs and others to support them. LAUSD just bragged about a 3 point API gain. What’s to brag about? That is really terrible. What if those 117,000 came back and were tested? What would the API look like now? Why, much lower, of course. This is the true API.
We at CORE-CA would rather have the students back in school, with lower API scores, than on the street getting in trouble. Students who are on the street end up with no chance to get ahead except through crime or very low-paying jobs with no better prospects, especially since adult education has been cut off almost completely, or they end up in the Sheriff’s hotel: jail. We have discussed this with Sheriff Baca and his top officials many times, and they hope we win and they have less business. The top people in Sheriff Baca’s Dept. are working hard to prevent more crime and returns to jail. They have an education program to keep people from coming back, which includes basic education, job skills and life training such as anger management.
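The arithmetic behind that claim can be sketched with a plain average. Every number below is hypothetical and invented for illustration; the sketch also uses a simple mean rather than the actual API formula, which is an assumption made only to keep the example short.

```python
from statistics import mean

# Hypothetical scores for students who were present and tested.
tested_scores = [780, 760, 800, 740, 770]
# Hypothetical scores the absent, lower-performing students might have earned.
untested_scores = [600, 620, 580]

# Average as reported: only the students who showed up are counted.
reported_average = mean(tested_scores)
# Average if everyone had been in school and tested.
full_average = mean(tested_scores + untested_scores)

print(f"reported average: {reported_average}")
print(f"average with everyone tested: {full_average}")
```

Under these made-up numbers the reported figure is higher simply because the lowest scorers are missing from it, not because anyone learned more.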
Would you be in favor of lower test scores and almost all students in school? We believe that more students in school with at least a high school diploma will result in a much better society with a lot less crime through less stress in life for the citizens.
“but that the scoring of tests is a matter of human judgment, not science. . . . The human judgment may be based on political considerations. Or not.”
BINGO!
“But we must stop believing that standardized tests are a scientific measure, like a barometer or a thermometer. . . . It is human beings who decide where the passing mark should be.”
BANGO!
“They may decide to make it easier or harder to pass. . . . But it is not science. It is human judgment. . . .Test scores are the result of subjective judgment, not objective measures.”
BOINGO!
Please read the book Mismeasure of Education by James Horn. It is a revelation about the complexity of tests, stats and academic achievement, which we would have much more of if we obeyed the Brown vs. Board of Education verdict.
In her book, Diane reflects on what “reformers” always say: that they are defending the “Civil Rights” issue of our time! She rightly questions this absurdity and comments, “It defies reason to believe that MLK would march arm in arm with Wall Street hedge fund managers and members of ALEC to lead a struggle for the privatization of public education, the crippling of unions and the establishment of for-profit schools…” (p. 55) Her words should be repeated over and over, like a mantra, to the “corporate ed reformers”… how could they argue this point, I wonder? They would have to get yet another top PR expert on the payroll, I guess!
Yet, in my experience, we are told that the standardization of tests makes the results more objective. Constant misrepresentation.
Look what this does to students’ abilities to succeed when applying for college. Look what this does to distort the truth. Look at the number of students who drop out of college. Look at the lives ruined by misinformation.
And, of course, it is the fault of the teacher(s).
A dilemma being faced by many schools is the “third grade guarantee” in reading. There are often so many students who need help that districts are overwhelmed by the expense of hiring additional teachers … or they just make class sizes larger.
I am not saying “social promotion” is the answer. However, I would prefer for districts to spend their limited dollars on adding transition grades for those students who are simply not succeeding well than to waste dollars on perpetual “high stakes testing”.
It is all about where the money goes in the end, anyway. That is the reason for this entire blog. Money going to 1) private corporations and 2) testing companies and their subsidiaries. Yet all the while there is a complaint about teacher salaries as being “throwing money at education”.
Sad. Just sad.