Leonie Haimson, founder of Class Size Matters and board member of the Network for Public Education and New York State Allies for Public Education (NYSAPE), warns that the test scores released by New York are not to be trusted. She argues that “we are entering a new era of mass delusion and test score inflation- including cut score manipulation.”
She offers evidence for her assertions.
State Commissioner MaryEllen Elia released the scores and advised readers not to compare the scores of 2016 to those of earlier years, then immediately made the very comparisons she cautioned against. Chancellor Farina celebrated the astonishing growth in the city’s ELA scores.
But Haimson takes a close look and concludes that state officials are manipulating the data. She reminds readers that state scores went up at a dizzying pace from 2002-2009, leading Mayor Bloomberg to boast about a New York City “miracle.” But in 2010, after an independent investigation, the state admitted that the tests had become easier, the passing mark had been lowered, and the dramatic gains had been a hoax. Once the scores were corrected, the gains of the Bloomberg-Klein era disappeared.
Something similar is happening now, writes Haimson.
“There are four ways to artificially boost results on exams:
1. Make the tests shorter
2. Allow more time to take them
3. Make the questions easier
4. Change the cut scores and/or translation from raw scores to performance levels.
“It appears that the state made at least three out of the four changes listed above. We won’t know if the questions were harder or easier until the state releases the P-values and provides other technical details.”
These are serious charges. It is now the responsibility of Commissioner Elia, the Board of Regents, and the State Education Department to demonstrate the validity and integrity of the tests.
This is how Commercial Ed will work.
They need something … anything … positive to say in their commercials.
They will fluff up test scores the same way they fluff up toilet paper.
Will we start seeing “New and Improved!” or “10% More Test Questions Included!” on the outside of test booklets?
You can bet your “bottom dollar” on that …
The fact that these tests are so utterly politicized makes them meaningless. As long as tests are used as political footballs instead of for providing useful feedback for teachers, kids, and parents, there is no justification for giving them. Everything else is just predictable obfuscation that makes it impossible to derive any meaningful information from the entire sordid affair.
Standardized tests are not that useful anyway, not to teachers in any case. With so many opportunities to manipulate scores, they are less credible than ever.
However, the test scores are VERY useful for those who wish to publicly place blame and methodically rid themselves of “old” (read: resistant) teachers.
Few teachers take these tests very seriously for a number of reasons.
Test burn-out comes in many forms and at many levels. For 15 years we’ve been held at gun-point by these tests and by the whims of the NYSED psycho-magicians. After a while you just want to say, either pull the damn trigger or put your gun down.
“The fact that these tests are so utterly politicized makes them meaningless.” See my post below as to what makes the results “meaningless,” in other words, completely invalid. We have known of that complete invalidity for at least 20 years now, since Wilson wrote his dissertation.
“Everything else is just predictable obfuscation that makes it impossible to derive any meaningful information from the entire sordid affair.”
I call that “predictable obfuscation” mental masturbation!
retiredteacher,
“With so many opportunities to manipulate scores, they are less credible than ever.”
They haven’t ever been credible due to all the inherent onto-epistemological errors and falsehoods along with “opportunities to manipulate scores”, i.e., psychometric fudging that Wilson has identified. See below for link.
As Leonie Haimson points out, NYS testing has been a shell game for years. In 2013, NYS rushed to introduce common-core aligned tests before PARCC itself was ready to roll out its own assessment (that would come in the 2014/2015 school year), before teachers had had any meaningful training in the new standards, and before students had had much exposure to them. The decision was a disaster, creating a backlash among parents, principals and educators of all stripes. (And those tests were in addition to the introduction of the MOSL, an entirely separate round of tests, the sole purpose of which was teacher evaluations.) Ever since that hasty 2013 roll out, the “common core” tests have changed substantially every year. In 2013 and 2014 I received copies of the 6th, 7th and 8th grade ELA tests and wrote about them here: https://andreagabor.com/2013/06/03/unwrapping-new-york-states-new-common-core-tests/
And here: https://andreagabor.com/2014/06/11/unwrapping-new-york-states-latest-common-core-tests/
There is no way you can make an apples-to-apples comparison of these tests. Maybe rotten apples to rotten oranges…
Little is said about the psychological and motivational effect the opt-out students have on the opt-in students. When 50% of your fellow students (and some of your friends) are watching movies or reading their favorite book while you are sitting with a Pearson academic death trap in front of you, the same test that you failed the last three consecutive years, maybe, just maybe, your good old positive test-taking attitude gets sidetracked. Don’t forget, it’s no secret that there is a four-year moratorium on the use of the test scores, which makes them nothing more than a mosquito buzzing in your ear for 90 minutes.
Leonie
I would strongly recommend extending your research on test score inflation to the Common Core algebra I and Common Core ELA (grade 11). If you think the 3 to 8 scores are “juiced,” try comparing the pre-high school scores to the high school scores, which are required for graduation. Compare grade 8 CC math to grade 9 CC algebra if you want to see a real test score “miracle”.
Rage,
I second Leonie examining the CC Algebra 1 exam AND the CC ELA 11 exam results. The conversion chart for English was especially lenient. Election year bump?
Nimbus
You know that the politicians could never withstand the blowback if Common Core high school graduation tests had the same failure rate as the 3 to 8 exams. So they pick the youngest students to stigmatize as failures and then leave them wondering how they got so much better by high school. This can’t sit well in many adolescent brains. Kids are pretty jaded these days, so maybe they just file it under “BS”.
Test score “inflation” implies that there is some sort of proper, natural distribution of scores to begin with that teachers, schools, etc. are artificially inflating. The assumption is that there should always be some kids who fail. Maybe we need to re-consider that assumption.
The only things that remain consistent and comparable about the testing program are the self-serving press releases that NYSED and the NYCDOE put out about the results. The only question is how much the level of distortion and disinformation goes up each year.
What is it with these people? In the old days, it was tar and feathers, and a ride out of town. Our children’s lives are too precious to be wasting all of this time and money on what amounts to a bunch of nonsense.
“Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine.
1. A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing the descriptive information about said interactions is inadequate, insufficient and inferior to the point of invalidity and unacceptability.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference,” each with distinct assumptions (epistemological basis) about the assessment process, from which the “assessor” views the interactions of the teaching and learning process: the Judge (think of a college professor who “knows” the student’s capabilities and grades them accordingly), the General Frame (think of standardized testing that claims to have a “scientific” basis), the Specific Frame (think of learning by objective, like computer-based learning, getting a correct answer before moving on to the next screen), and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error in the process is caused when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations attempt to “prove” that these tests (based on standards) are valid-errorless or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the test and results are. He is an advocate for the test taker not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. And a basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”) or, to put it in more mundane terms, crap in, crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words it attempts to measure “‘something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students, as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self-evident consequences. And so the circle is complete.”
In other words, students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, “I’m an ‘A’ student” is almost as harmful as “I’m an ‘F’ student” in hindering students from becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
Agree, we need to make the tests longer and more difficult again. Parents won’t stand for these artificially boosted scores!
NYSED will be moving ELA to March with math in April in order to reduce test taking fatigue. And Questar Assessments will be writing the new and improved version. This of course will make the inevitable year to year comparisons less than meaningless. The tests are now a laughing stock within the profession.
Funny how NYSED keeps the 4th and 8th grade science scores very quiet and under the radar.
The cognitive dissonance produced by the disparity in the scores – science v. ELA/math – would not help their testing credibility very much.
Can anyone at NYSED explain how only 30% of 8th graders pass the Common Core math (pre-algebra), yet 70+% of 9th graders pass the Common Core algebra? Cognitive dissonance anyone?