Leonie Haimson demonstrates the disconnect between the boasting of officials in New York City and State about test scores and the flat NAEP results for the city and state.
To make matters worse, the state says that it is impossible to compare the scores between 2017 and 2018, because the test timing changed. But then the state and the city proceeded to boast about the “gains” between those years.
She adds:
“Here are some additional questions that I would have asked the Commissioner and/or the Mayor if I’d had the chance:
“How can NYSED, the DOE, or the mayor claim progress has been made if, as clearly stated, this year’s scores aren’t comparable to previous years’ because of the change in the tests?
“Why did they so radically change the scoring range, from a maximum of about 428 to about 651 this year?
“Why does the state no longer report scale scores in its summaries, rather than proficiency levels which are notoriously easy to manipulate?
“Where are the NYSED technical reports for 2016, 2017, and 2018 that could back up the reliability of the scoring and the scaling?
“Why was the public release of the scores delayed though schools have had student level scores for a month?
“How were the state vs the city comparisons affected by the fact that opt out rates in the rest of the state averaged more than 18% while they were only about 4% here?
“Finally, how can either the state or the city claim that these tests are reliable or valid, when neither the scoring nor the trends have been matched on the NAEPs, in which NYC scores have NEVER equaled the state in any category and results for the state & city have fallen in 4th grade math and reading since 2013?
“Though the Mayor apparently tempered his tone at this afternoon’s press conference, according to Twitter he apparently claimed that he expects next year’s scores to show significant gains because those 3rd graders will have had the benefit of Universal preK.
“Sorry to say I won’t trust the state test results next year either. We will have to take those scores with several handfuls of salt too, and wait for the 2019 NAEP scores to judge their reliability.”

““Why does the state no longer report scale scores in its summaries, rather than proficiency levels which are notoriously easy to manipulate?”
Those “scale scores” are also “notoriously easy to manipulate”. Standardized test supporters want all to think it’s just a simple math conversion. It is not, as someone has to determine the conversion factor, and yep, that is chock full of subjectivity. The conversion factor/number may be divulged, but the factors that went into determining it are never divulged, at least as far as I know.
My god, so much to comment on in this post. Please excuse the number of my posts but right off the bat let’s look at ““Finally, how can either the state or the city claim that these tests are reliable or valid, when neither the scoring nor the trends have been matched on the NAEPs, in which NYC scores have NEVER equaled the state in any category. . . ”
NAEP suffers from all the inherent onto-epistemological errors and falsehoods, and the psychometric fudgings, that plague all standardized testing and render any use of the results COMPLETELY INVALID.
To rely on NAEP as a supposed gold standard test is a fool’s reliance. Ay ay ay ay ay!
“Where are the NYSED technical reports for 2016, 2017, and 2018 that could back up the reliability of the scoring and the scaling?”
As Wilson has shown, reliability means nothing when the tests are invalid to begin with. Without validity, and there is none, even for NAEP, everything else is irrelevant and it’s just a bunch of mental masturbation, a waste of time and energy to pretend that the scores mean anything at all.
Diane, I didn’t post Fred Cohen’s statement; I forwarded it to you and others. And I didn’t ask him for permission. Perhaps you should?
Leonie Haimson Class Size Matters/ Parent Coalition for Student Privacy http://www.classizematters.org http://www.studentprivacymatters.org leoniehaimson@gmail.com
Follow me at @leoniehaimson
I will delete it. I assumed it was posted on the NYC Parent listserv.
“I must say that I am now totally confused about the meaning of proficiency. The title of the Power Point is ‘Measuring Student Proficiency in Grade 3-8 English Language Arts and Mathematics.’”
The title points to another onto-epistemological falsehood, another misuse and abuse of the English language: the attempt to “measure” what a student learns. I’ve posted this before but it bears repeating:
The most misleading concept/term in education is “measuring student achievement” or “measuring student learning”. The concept has been misleading educators into deluding themselves that the teaching and learning process can be analyzed/assessed using “scientific” methods which are actually pseudo-scientific at best and at worst a complete bastardization of rationo-logical thinking and language usage.
There never has been and never will be any “measuring” of the teaching and learning process and what each individual student learns in their schooling. There is and always has been assessing, evaluating, judging of what students learn but never a true “measuring” of it.
But, but, but, you’re trying to tell me that the supposedly august and venerable APA, AERA and/or the NCME have been wrong for more than the last 50 years, disseminating falsehoods and chimeras??
Who are you to question the authorities in testing???
Yes, they have been wrong and I (and many others, Wilson, Hoffman etc. . . ) question those authorities and challenge them (or any of you other advocates of the malpractices that are standards and testing) to answer to the following onto-epistemological analysis:
The TESTS MEASURE NOTHING, quite literally when you realize what is actually happening with them. Richard Phelps, a staunch standardized test proponent (he has written at least two books defending the standardized testing malpractices) in the introduction to “Correcting Fallacies About Educational and Psychological Testing” unwittingly lets the cat out of the bag with this statement:
“Physical tests, such as those conducted by engineers, can be standardized, of course [why of course of course], but in this volume, we focus on the measurement of latent (i.e., nonobservable) mental, and not physical, traits.” [my addition]
Notice how he is trying to assert by proximity that educational standardized testing and the testing done by engineers are basically the same, in other words a “truly scientific endeavor”. Asserting sameness by proximity is not a good rhetorical/debating technique.
Since there is no agreement on a standard unit of learning, there is no exemplar of that standard unit and there is no measuring device calibrated against said non-existent standard unit, how is it possible to “measure the nonobservable”?
THE TESTS MEASURE NOTHING for how is it possible to “measure” the nonobservable with a non-existing measuring device that is not calibrated against a non-existing standard unit of learning?????
PURE LOGICAL INSANITY!
The basic fallacy here is confusing and conflating metrological measuring (metrology is the scientific study of measurement) with the measuring that connotes assessing, evaluating and judging. The two meanings are not the same, and confusing and conflating them is a very easy way to make it appear that standards and standardized testing are “scientific endeavors”, objective and not subjective like assessing, evaluating and judging. They are not! Those supposedly objective results are used to justify discrimination against many students for their life circumstances and inherent intellectual traits, which is an abomination and should be viewed as the educational malpractice that it is.
““We all can appreciate what SED has done to improve testing in New York State.”
More namby-pamby nonsense. Improving a malpractice only makes it worse. Again I’ve posted this before but it needs repeating:
Doing the Wrong Thing Righter
The proliferation of educational assessments, evaluations and canned programs belongs in the category of what systems theorist Russ Ackoff describes as “doing the wrong thing righter.” “The righter we do the wrong thing,” he explains, “the wronger we become. When we make a mistake doing the wrong thing and correct it, we become wronger. When we make a mistake doing the right thing and correct it, we become righter. Therefore, it is better to do the right thing wrong than the wrong thing right.”
It’s easy to manipulate the test results, especially when the details are kept secret.
I still remember when G. W. Bush was governor of Texas and he bragged about the improvements to the public high school graduation rate there due to the results of a standardized state test that was set at 4th grade. Students had to pass that test to prove they were qualified to graduate from high school and that bar was 4th grade math and reading level.
California set its limit for a high school graduation requirement at 9th grade for reading and math.
Not all the states used these types of tests to determine if a student was ready to graduate from high school but about half did, and I read that California had the highest bar at 9th grade and Texas had the lowest bar at 4th grade.
The average reading level in the US, I’ve read, is 5th grade, so California determined 9th grade was the level necessary for a child to become a literate lifelong learner. Instead of going for better bragging points by setting the bar lower, they set it higher.
But it wasn’t common knowledge at the time what those limits were, except in California.
Meanwhile, in Texas, Bush bragged about what a great job he had done as governor in increasing the number of students who proved they were ready to graduate from high school.
Comparing California to Texas, California was criticized because it wasn’t as successful as Texas, but it was never common knowledge that they were comparing a 4th grade level with a 9th grade level, thanks to the media and the GOP’s usual misleading propaganda.
Apropos of Fred Cohen’s observation, the Grade 6 ELA results for New York City are also very screwy. The percentage of students deemed proficient this year is 48.9%. It was 32.3% last year. That’s a 16.6-point difference, or a shift from nearly one-third to nearly one-half of the (65,000) sixth graders who are “proficient.” Surprisingly, differences of the same magnitude hold for all ethnic groups. In no other grade is the change more than 8.0 points.
[I know we’re not supposed to compare the 2018 results directly with the 2017 results. Still that’s a striking and singular difference since the same publisher produced both tests under a $44 million, 5-year contract.]
And how does this useless testing program serve educators who are judged by such inexplicable data and who must design programs to meet the academic needs of students–based on such shaky (as in meaningless) information???
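For anyone who wants to check the Grade 6 arithmetic quoted above, here is a minimal Python sketch. It assumes only the figures given in the comment (32.3% and 48.9% proficient, and roughly 65,000 sixth graders); the variable names are my own, and the student estimate is only as good as that rounded enrollment figure.

```python
# Figures taken from the comment above; 65,000 is an approximate enrollment.
rate_2017 = 32.3        # percent deemed proficient, Grade 6 ELA, 2017
rate_2018 = 48.9        # percent deemed proficient, Grade 6 ELA, 2018
sixth_graders = 65_000  # rough number of NYC sixth graders (assumption)

# Percentage-point change: the absolute difference between the two rates.
point_change = round(rate_2018 - rate_2017, 1)

# Relative change: how much larger the 2018 rate is than the 2017 rate.
relative_change = round(100 * point_change / rate_2017, 1)

# Rough number of additional students labeled proficient.
extra_students = round(sixth_graders * point_change / 100)

print(point_change, relative_change, extra_students)
```

Under those assumptions, a 16.6-point jump means the proficiency rate grew by about half relative to the 2017 rate, which is why it stands out against changes of 8.0 points or less in the other grades.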
Ka-CHING … high stakes testing.
The entire DEFORM movement is meant to DEFORM Public Education and get back to JIM CROW Laws.
Can’t agree with the “get back to JIM CROW Laws.” The Deform Movement isn’t about race, although it can result in disparate treatment. The Deform Movement is about making money; that is the main goal.
I assume all the technical information is owned by the company that provides the test so parents, students and teachers have no rights to access this information. If the test were produced by the NYSED, I would imagine interested stakeholders could file a “Freedom of Information Act” request with NYSED. This is another reason why students should not be subjected to useless testing since the goal is not to help them or their teachers. The goal is to collect data and sell it to create a revenue stream for the testing company.
Released math and ELA test items from Spring 2018:
https://www.engageny.org/resource/released-2018-3-8-ela-and-mathematics-state-test-questions
Here is one typical item from grade 8 ELA:
What does paragraph 9 mainly reveal about the narrator?
A) She thinks of nature as calming
B) She is attached to familiar things and is close to her family
C) She loves language and has a vivid imagination
D) She pays attention to yearly patterns
Paragraph 9 (Excerpt from: Winter Wheat by Mildred Walker)
“One heavy dark Northern Spring . . . fifty two. The words came so fast they seemed to roll downhill. Nobody ever calls it all that; just spring wheat, but I like the words. They heap up and make a picture of a spring that’s slow to come, when the ground stays frozen late into March and the air is raw, and the skies are sulky and dark. The “Northern” makes me feel how close we are to the Rockies, and how high up on the map, almost to Canada.”
In the 7 items released pertaining to this passage, 6 test items used the terms “best”, “most”, and “mainly” which attest to the subjective nature of what should be purely objective MC test items.
“this passage” meaning the full 21-paragraph excerpt.
So if a 13 year old gets this item wrong, what could it possibly mean to a teacher?
Evaluating teachers on such un-teachable skills is completely preposterous.
The NYTimes missed a teachable moment in their article by Eliza Shapiro. (https://www.nytimes.com/2018/09/26/nyregion/test-scores-new-york.html?emc=edit_th_180927&nl=todaysheadlines&nlid=268252670927) Too bad they don’t ask Duane Swacker or RaceAgainstThe Testocracy to weigh in! Here are the closing paragraphs:
“When the organization releases its next results in the spring, New Yorkers could get a clearer sense of how much to rely on state test scores.
This year’s results will no doubt influence fresh questions about education policy in New York, including the state Board of Regents’ decision about whether to use exam results in teacher evaluations. Those debates will reveal much about how New York, considered one of the country’s education reform capitals just a few years ago, is tacking in an entirely new direction on education policy.”
I don’t know how another year of tests will give anyone a “clearer sense of how much to rely on test scores”…. but from the closing paragraph it looks like the Regents are trying to find some kind of fig leaf to cover their desire to introduce VAM in the same way the Senate is hoping the FBI investigation will enable them to support Brett Kavanaugh…