In this post, teacher Maria Baldassarre-Hopkins describes the process in which she and other educators participated, setting cut scores for the new Common Core tests in New York.

She signed a confidentiality agreement, so she is discreet on many questions and issues.

At the end of the day, Commissioner King could say that educators informed the process, but in reality they made recommendations to him, which he was free to accept, modify, or ignore.

As many teachers have pointed out in blogs and comments, no responsible teacher would create a test with the expectation that 70% of students are sure to fail. Yet it would not be hard to do. You might, for example, give students in fifth grade a test designed for eighth graders. Repeat that in every grade and the failure rate will be high. Or you might test students on material they never studied. Some will succeed because of their background knowledge, but most will fail.

Why would you want most students to fail?

Commissioner King repeatedly warned superintendents, principals, and everyone else to expect the proficiency rates to drop by 30, 35, even 37 percent, and they did.

This is a manufactured crisis. We know who should be held accountable.

It is Commissioner John King and Regents Chancellor Merryl Tisch. They wanted a high failure rate. They got what they wanted.


A response to the post above, by Fred Smith. Fred worked for many years as an assessment expert at the New York City Board of Education. He has now become an invaluable resource for those who are fighting the misuse and abuse of high-stakes testing.

Fred writes:

Kudos to Maria Baldassarre-Hopkins.  This is an extremely important piece–an outline and an articulate account of how the 2013 cut scores were set. We’re finally getting a glimpse inside the testing program’s “black box”–how cut points are/were established.
Three points grab me and support contentions I share with other observers:
First, the cut scores are after-the-fact. “Cut scores were to be decided upon after (emphasis hers) NYS students in grades 3-8 took the tests.”  I believe the standards were set in late June/early July.
Second, the review committee’s work is advisory. Despite the committee’s elaborate review process, the end results are recommendations to the commissioner.
Third –  “(During the review) We were given more data in the form of p-values for each question in the OIB – the percentage of students who answered it correctly on the actual assessment.”  This and the timing of the review strongly suggest that item-level data (item statistics) from the April 2013 operational tests were used to inform the determination of cut scores. That is, data generated by the test population were used–changing the concept of a standards-based test (as in testing aligned with the common core learning standards) to one that depends on the performance of students who took the test. 
This makes the Level 2, 3 and 4 thresholds dependent on how well kids did on the exams–bringing the test score distribution into play and rendering judgments about cut scores and student achievement relative to the composition of the students who took a particular set of items at a particular time: a normative framework instead of a standards-based one. These factors will vary from year to year, and since 2013 was a baseline year with little it could be anchored to, it is even harder to see how SED can justify what was done.
Let’s not forget either that the items on the April 2013 exams were largely generated via the indefensible June 2012 stand-alone field testing, a procedure that could not have yielded reliable or valid information to construct the core-aligned statewide tests–and, as a further consequence, would call the item stats the review committee worked with into question.
SED’s slide-show presentation to the Regents in late July about the cut scores, this week’s news-management spin campaign, and its website PowerPoint barrage on the release of the scores do not address important remaining questions about the quality of the 2013 exams and the cut scores.
There is information the SED obviously has in its possession (and desperately wants to keep hidden), as strikingly noted by Ms. Hopkins. We must demand and obtain:

1. P-values (difficulty levels) for all field test items that were selected for inclusion on the operational April tests: both the field test p-values and the corresponding operational test p-values.

2. Complete item analysis data, showing the percentage of students who chose the correct answer, the percentage who chose each distractor (each incorrect answer choice), and the percentage of omissions (no response to the item).

3. The same information demanded in #1 and #2, broken down by ethnicity and separately by need/resource capacity.
Even if SED refuses to produce all of the 2013 operational items it owns for our scrutiny, there is no justification for refusing to provide the statistical data we are demanding, because none of the data involve exposure of the items and their content. Given the immediate availability and nature of the information we are seeking, SED and Pearson have no legitimate excuse for keeping us in the dark.
The only way forward for all of us who want to have public schools that work is to cry out for sunshine, transparency, and truth-in-testing. Short of that, we can have no faith in anything coming out of Albany about its latest vision of reform. The messengers of bad news are on the run. Blow the trumpets. Get your representatives on board. Don’t let them slip and slide. This is a pivotal year.