Bernie Horn: The Common Core “Results” Are Not Actually Test Scores: MUST READ!

This is a terrific article about the Common Core test results. It explains in layman’s language how the test scores are calculated and converted to scale scores.

When you read the “results” in the newspaper or get the results for your child or your class, you need to understand that the “scores” are not really scores:

The only things that have been released are percentages of students who supposedly meet “proficiency” levels. Those are not test scores—certainly not what parents would understand as scores. They are entirely subjective measurements.

Here’s why. When a child takes a standardized test, his or her results are turned into a “raw score,” that is, the actual number of questions answered correctly, or when an answer is worth more than one point, the actual number of points the child received. That is the only real objective “score,” and yet, Common Core raw scores have not been released.

Raw scores are adjusted—in an ideal world to account for the difficulty of questions from year to year—and converted to “scale scores.” A good way to understand those is to think of the SAT. When we say a college applicant scored a 600 on the math portion of the SAT test, we do not mean he or she got 600 answers right, we mean the raw scores were run through a formula that created a scale score—and that formula may change depending on which version of the SAT was taken. Standardized test administrators rarely publicize scale scores and the Common Core administrators have not.
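To make the raw-to-scale step concrete, here is a minimal sketch in Python. The form names and every number in the tables are invented for illustration, and the interpolation between table entries is an assumption; no real SAT or Common Core conversion table is reproduced here.

```python
# Hypothetical conversion tables: raw points -> scale score, one per test form.
# All numbers are made up for illustration; they are not real test data.
CONVERSION_TABLES = {
    "form_A": {0: 200, 10: 350, 20: 480, 30: 600, 40: 720, 50: 800},
    # Assume form B was slightly easier, so the same raw score earns less.
    "form_B": {0: 200, 10: 340, 20: 460, 30: 580, 40: 700, 50: 800},
}


def to_scale_score(raw, form):
    """Convert a raw score to a scale score using the table for that form.

    Raw scores between table entries are linearly interpolated; raw scores
    outside the table are clamped to the lowest or highest scale score.
    """
    points = sorted(CONVERSION_TABLES[form].items())
    if raw <= points[0][0]:
        return points[0][1]
    if raw >= points[-1][0]:
        return points[-1][1]
    for (r0, s0), (r1, s1) in zip(points, points[1:]):
        if r0 <= raw <= r1:
            return round(s0 + (s1 - s0) * (raw - r0) / (r1 - r0))


# The same 30 raw points yield different scale scores on different forms.
print(to_scale_score(30, "form_A"), to_scale_score(30, "form_B"))
```

The point of the sketch is only that the scale score depends on the table as much as on the answers: change the table and the "score" changes with it.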

Then the test administrators decide on “cut scores,” that is, the numerical levels of scale scores where a student is declared to be basic, proficient or advanced.

The cut scores are the passing marks. They are arbitrary and subjective decisions made by fallible human beings. They can raise the passing mark to create large numbers of “failures,” or they can lower the passing mark to create a “success” story, to celebrate their wonderful policies. In some cases, the cut score is set high, so many students “fail.” The next year, or year after, the cut scores are lowered, and HOORAY! Our Wise Leadership Has Created Success!
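A small sketch shows how much the reported percentage depends on where the cut is placed. The scale scores, the cut values, and the extra "below basic" label are all assumptions made up for this example; nothing here comes from an actual consortium table.

```python
from bisect import bisect_right

# "below basic" is an assumed label for scores under the first cut.
LEVELS = ["below basic", "basic", "proficient", "advanced"]


def level_for(scale_score, cuts):
    """Return the level for a scale score.

    cuts lists, in ascending order, the scale scores at which
    basic, proficient and advanced begin.
    """
    return LEVELS[bisect_right(cuts, scale_score)]


def percent_proficient(scale_scores, cuts):
    """Percentage of students at or above the proficient cut."""
    passing = sum(1 for s in scale_scores if s >= cuts[1])
    return 100 * passing / len(scale_scores)


# Ten hypothetical students; their scale scores never change below.
scores = [410, 450, 480, 500, 520, 540, 560, 590, 620, 650]

high_cuts = [450, 580, 640]  # proficient starts at 580
low_cuts = [450, 500, 640]   # proficient starts at 500

print(percent_proficient(scores, high_cuts))  # 30.0
print(percent_proficient(scores, low_cuts))   # 70.0
```

The same ten children with the same ten scale scores go from 30 percent "proficient" to 70 percent "proficient" when only the cut score moves.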

As Horn writes:

Now, when a news story says that proficiency percentages were “higher than expected,” you should know what was “expected.” The Common Core consortiums gave the strong impression that they would align their levels of “proficiency” with the National Assessment of Educational Progress (NAEP) nationwide standardized test. (That is, by the way, an absurdly high standard. Diane Ravitch explains that on the NAEP, “Proficient is akin to a solid A.”)

Score setting is a subjective decision, implemented by adjusting the scale and/or cut scores. If proficiency percentages are “higher than expected,” it simply means the consortium deliberately set the scores for proficiency to make results look better than the NAEP’s. And that is all it means.

It is no different from what many states did to standardized test results in anticipation of the Common Core exams. New York intentionally lowered and subsequently increased statewide results on its standardized tests. Florida lowered passing scores on its assessment so fewer children and schools would be declared failures. The District of Columbia lowered cut scores so more students would appear to have done well. Other states did the same.

The bottom line is this: The 2015 Common Core tests simply did not and cannot measure if students did better or worse. The “Smarter Balanced” consortium (with its corporate partner McGraw-Hill), the only one to release results so far, decided to make them look better than the NAEP, but worse than prior standardized tests. The PARCC consortium (with corporate partner Pearson) is now likely to do the same. It’s fair to say the results are rigged, or as the Washington Post more gently has put it, “proficiency rates…are as much a product of policymakers’ decisions as they are of student performance.”

You MUST MUST MUST MUST open the link to the cut scores announced by the Smarter Balanced Assessment Consortium, which Horn helpfully supplies. Scroll down to pp. 5-6. You will see that the cut scores predict that most students will “fail” in every grade. Only the top two levels are considered “passing,” that is, proficiency and advanced. In third grade math, 61% are predicted to “fail.” In fifth grade math, 67% are predicted to “fail.” In eighth grade math, 68% are predicted to “fail.”

The ELA predicted failure rates are slightly better, but even there, the majority of students are expected to “fail” because the cut score was so high.

If they chose different cut scores, the proportion passing or failing would be different, higher or lower.

This is not unique to the Common Core tests. This is the way all standardized testing is graded.

You can see how easy it is for political figures to manipulate the passing rates to their advantage.

howardat58 says:

September 13, 2015 at 10:33 am

The Queen of Parccs “Scores mean what I say they mean.”
and
New VAM = VAPM = Value Added Play-Doh Modeling
We have no faith in the Value, no idea what it is Added to, and it can be indefinitely reModeled.

(This is my comment on the latest Deutch29 post. It applies here.)

Duane Swacker says:

September 13, 2015 at 6:27 pm

TAGO dat!

concerned mom says:

September 13, 2015 at 10:44 am

Do they release the raw scores needed for each level? What about details about the level of difficulty of the questions?

In my state, the 3rd grade students have to answer 75% to be on grade level. They claim a group of teachers review the problems and that is how the cut scores are set.

There are two levels above grade level.

I often wonder how they determine a student is above grade level –

Are there questions on the test that are above grade level and that they wouldn’t expect a child who is just on grade level to get correct? If that is the case, these types of questions could make a child nervous and spend too much time answering, which in turn could make a child score lower than if they just had all grade-level questions. Do they just assume that if a child gets more than 75% of grade-level questions correct, they are above grade level? That makes no sense to me.

I believe there could be a child who spent a lot of time figuring out a hard problem and got it correct, but made errors on easier problems or ran out of time. That child who got caught up in solving a hard problem may get a lower score when compared to other students, but actually may have great problem solving skills.

From what I found for my state, it does not seem as though problems are weighted.

The more I try to figure out what these test scores mean, the more confused I get and the more Duane’s Wilson comments on grades in general come to mind. (I admit I never read it all, but it has been posted enough that the general idea has sunk in… :) )

Duane Swacker says:

September 13, 2015 at 6:31 pm

If I may help clue you in IM. It’s all bullshit, any of their blatherings no matter of what kind, it’s all BS. When one starts with falsehood and/or invalidity one ends with falsehoods and invalidities, and that holds true for this whole process of educational standards and the accompanying standardized tests.

It really is that simple.

See below for further explanation.

Betty says:

September 14, 2015 at 12:58 am

Teachers do set cut scores in my state. We don’t take PARCC or Smarter Balanced. To set the cut score with a new test, all the questions are ordered by the percentage of students who get them correct. Then teachers decide at what point the questions represent a passing score and we bookmark it. We do it 3 times, then try to come up with a consensus on the cut scores. Our recommendations are sent to the legislature and state education office, who vote to accept our recommendations, which they always do.
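The bookmarking procedure described in this comment can be sketched roughly as follows. The question ids, the percentages, the three panelists' bookmarks, and the use of a median to stand in for "consensus" are all assumptions for illustration; the comment does not say how consensus is actually reached.

```python
from statistics import median

# Hypothetical data: question id -> percent of students answering correctly.
percent_correct = {"q1": 92, "q2": 45, "q3": 78, "q4": 61, "q5": 30, "q6": 85}


def ordered_item_booklet(percent_correct):
    """Order the questions from easiest (most often correct) to hardest."""
    return sorted(percent_correct, key=percent_correct.get, reverse=True)


def consensus_cut(bookmarks_by_round):
    """Recommend a raw cut from the panel's final-round bookmarks.

    A bookmark of k means "a passing student should get the first k
    questions of the ordered booklet right". The median is an assumed
    stand-in for whatever discussion produces the real consensus.
    """
    return median(bookmarks_by_round[-1])


# Three rounds of bookmarks from three hypothetical teachers.
rounds = [
    [3, 5, 4],
    [4, 5, 4],
    [4, 4, 4],
]

print(ordered_item_booklet(percent_correct))
print(consensus_cut(rounds))
```

Even in this tidy version, the cut is whatever the panel agrees it should be; the ordering of the questions is empirical, but the placement of the bookmark is a judgment.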

- Cap Lee says:
  
  September 14, 2015 at 8:33 pm
  
  This doesn’t make any sense. It doesn’t say anything about what a child can do, has learned or has to learn. Does pass mean the questions they didn’t get correct will be left untaught? Does fail mean they are just called stupid and move on to the next year? Does anyone understand the rhetoric here? Smart or stupid, pass or fail.
  
  Kids blossom at different rates and in different ways. If genius unfolds later, they still fail and never get to show their genius. Let the brainwashing end. Not only unethical, it is immoral!
  
retired teacher says:

September 13, 2015 at 10:46 am

The tests based on the Common Core are a lot like the high jump. The higher the bar, the fewer will make it over. The big determiner is POLITICS. That’s real authentic assessment for you!

In Florida we have the subjective letter ranking of schools thanks to Jeb Bush. When schools saw ratings go from A to D in some cases, the realtors went crazy. They can sell houses more easily with an A or B rating so the state passed a law stating that a school can drop only one letter grade in a year. The “problem” was solved.

Chiara says:

September 13, 2015 at 11:07 am

“The tests based on the Common Core are a lot like the high jump. The higher the bar, the fewer will make it over.”

It just seems like NCLB all over again. They get the testing they want and then there’s no follow-up funding or support.

I can’t imagine what they’re thinking. 32 states have cut public school funding and they put in higher standards? So we actually get LESS support and a higher bar?

If you look at ed reform over the last 15 years the “accountability” people have all the power- they drive everything. The testing is always the first priority and once that’s in they never get around to all the other promises. I think that’s a measure of how little influence the funding and support people have in that “movement”. They never get anything. They lose and lose and lose. They’re either the worst advocates on the face of the planet or they have absolutely no influence within the “movement”.

How many times are they going to get played? We always get the tests but we never get any upside.

- retired teacher says:
  
  September 13, 2015 at 11:24 am
  
  They have to offer something more than blame and punish. More testing does not translate into better outcomes for students. While money alone won’t solve the problems, cutting funding never made anything better.
  
KrazyTA says:

September 13, 2015 at 11:12 am

I have posted the latter part of this before on threads, but it has never been so apt. I include a bit more of the beginning:

[start]

“I don’t know what you mean by ‘glory’,” Alice said.

Humpty Dumpty smiled contemptuously. “Of course you don’t- till I tell you. I meant ‘there’s a nice knock-down argument for you!’”

“But ‘glory’ doesn’t mean ‘a nice knock-down argument’,” Alice objected.

“When I use a word,” Humpty Dumpty said, in rather a scornful tone, “it means just what I choose it to mean- neither more nor less.”

“The question is,” said Alice, “whether you can make words mean so many different things.”

“The question is,” said Humpty Dumpty, “which is to be master-that’s all.”

[end]

(Lewis Carroll, THROUGH THE LOOKING GLASS.)

Test scores. Scales. How they’re presented and explained to the general public.

Or as explained in Rheephormish by the “thought leaders” of the education establishment: “When we use words like test scores and scales, they mean just what we choose them to mean—neither more nor less.”

But isn’t that begging the question, rigging the system, distorting the conversation? Kind of like shooting an arrow into the wall and painting the bullseye around the spot where the arrowhead is sticking and claiming you got it just right?

From their POV, not at all. Rheephormsters have a ready answer for any objection by lesser beings of their disruptively creative innovations in terminology and discourse: “The question is who is to be master—and that be us.”

So that’s what they mean by 3DM [Data-DrivenDecisionMaking]…

😎

George Eller says:

September 13, 2015 at 1:49 pm

KrazyTA,

“They are arbitrary and subjective decisions made by fallible human beings.”

making it 3DM: Data-DrivelDecision Making . . .

Ed Detective says:

September 13, 2015 at 11:29 am

“Error on top of invalidity” strikes again!

Akademos says:

September 13, 2015 at 5:04 pm

Likely a continuous onslaught.

- Duane Swacker says:
  
  September 13, 2015 at 6:39 pm
  
  Ouroboros?
  
  https://search.yahoo.com/yhs/search?p=snake+eating+own+tail&ei=UTF-8&hspart=mozilla&hsimp=yhs-002
  
Chiara says:

September 13, 2015 at 11:31 am

The NYTimes has yet another glowing, fawning piece on charter schools:

The Best and the Brightest worship is just cringe-worthy, including the mandatory private sector resume which is of course much, much better than any public sector work.

Thank God for local newspapers or this would be the ONLY view available, nationally.

Lucia says:

September 13, 2015 at 11:45 am

SBAC has created a score report that includes the scale score. You can find an example of the score report by googling “Smarter Balanced Assessment Consortium Reporting System.”

According to SBAC, “This document provides a static view of the reports, with annotations to explain the different features that will be available when the system is released. Video demonstrations of the system functionality are in production and will be released for viewing in August 2014. The completed system will be provided to the Smarter Balanced Assessment Consortium in September 2014. Member states are able to utilize the system as hosted by the Consortium, utilize the open-source code to adapt the system to their needs, or utilize a different system.”

The “static view” example SBAC provides for an individual student includes the scale score.

fsjenner says:

September 13, 2015 at 11:47 am

You know what teachers, parents, and students really want to see? They want to see exactly what Johnny and Suzy got right and what they got wrong. That is the level of detail needed for full understanding. As a teacher, I can often figure out why a child got #7 wrong by looking at his/her response. Harder to do with multiple choice but not impossible. Simple scores alone are not informative or useful–except to compare students.

I am not alone in seeing the test as a “black box”: I have no confidence that what I am teaching and the way I am teaching it are aligned to the test. I never see the test. I never get the chance to check whether the test format, the presentation of problems, and the areas of focus match what I am doing day after day in the classroom. I am blindly doing my best to “teach to the standards” (nothing new BTW) with the materials I have been provided, based on the training I have received, never knowing whether it matches the structure and content of the test.

concerned mom says:

September 13, 2015 at 12:10 pm

Great comment. I don’t understand why so many people trust that the test is valid without seeing the actual test.

retired teacher says:

September 13, 2015 at 12:11 pm

The only way to make the data even remotely useful is to disaggregate it. For example, reading comprehension questions may offer some information about the types of questions that the student got wrong. The test may show that the student can locate information and understand cause and effect, but has difficulty with drawing conclusions, which is a more complex skill. To tell you the truth, a good teacher probably already knows this, but parents may find this interesting. However, with tests given on a frustration level, I think the information would be even less coherent, since it may just show that the demands were beyond the students. I would think it would show less specific useful information. Also, when the results are available long after the student leaves the grade, the results are even less useful. If you believe these tests provide lots of useful information, I have not found that to be the case after many years of working with standardized tests. They offer a snapshot, nothing more, and there are better, more efficient ways to get that information. I think the harder the tests, the less likely we will find anything useful from the results, since students are less likely to show strengths in various skills categories. Why are we wasting all this time and money on something so meaningless? Politics!

Norm Scott says:

September 13, 2015 at 11:51 am

Great you guys are doing the work our silent teacher unions should be doing.

September 13, 2015 at 12:15 pm

Norm: Au contraire. Education unions not silent on this. My union and I advocate against this kind of stupid testing and scoring and misuse of test results all the time–with PTA, school and district admins, schools boards, department of ed, legislators, governor, public–anyone and everyone who has skin in the game. Others are doing the same.

Just because there has been little real progress does not mean folks aren’t standing up and speaking out.

Betsy Marshall says:

September 13, 2015 at 12:54 pm

Thank you fsjenner… I am equally involved in educating and organizing around these issues, as is my local union. Here in NY state, NYSUT has become more focused on training and building leadership for organizing and political action. After being a “service union” for so many years it takes time to train and build capacity. I would say that the NEA, under the leadership of Lily Eskelsen Garcia, is becoming a more vocal and stronger force as well. On the other hand many teachers have lost faith in the AFT leadership after their early endorsement of Clinton for president. The too-early and undemocratic endorsement was a self-destructive move. Michael Mulgrew and his crew at UFT, in NYC, have also been a destructive force in the union movement from what I can tell.

Marian Cruz says:

September 13, 2015 at 12:16 pm

This is not acceptable.

Lloyd Lofthouse says:

September 13, 2015 at 12:31 pm

Reblogged this on Crazy Normal – the Classroom Exposé and commented:
Why our children should not be measured and judged by standardized tests.

Michael Paul Goldenberg says:

September 13, 2015 at 5:06 pm

It’s a bit of an exaggeration to claim that this is how all standardized tests operate since many do not have any cut scores set. The many well-known ETS exams (SAT, GRE, LSAT, GMAT, etc.) don’t have cut scores and neither does the ACT. They do have conversions from raw to scaled scores, but there are several reasons for that, including at least one you didn’t mention. For example, on the ACT the number of scored mathematics questions (60) differs dramatically from the number of scored reading or science questions (40 each), which all differ dramatically from the number of scored English questions (75). If we are going to compare scores among these tests, a common scale is one way to do so, and that’s what is done. Added to the minor variations in difficulty from one test to another (based on empirical data, not subjective evaluation), it makes more sense to use a flexible scale than the other obvious choice for comparison among unequal “size” subtests: percentages. In my estimation, using scaled scores is perfectly reasonable. And since scaled scores are given along with national and state percentile rankings, it’s possible to compare scores between tests. Indeed, released tests also provide the scale on which conversions are done from the raw scores, so one can see how a particular subtest stacks up against that same subtest on other administrations. Generally, if memory serves, tests taken within a five-year period of one another are considered to have a fair basis for comparison. I know that if too much time passes from the time a test was taken, many schools will require that an applicant retake the test: I couldn’t use my 1973 GRE scores when I applied for graduate programs in mathematics education in 1991, for example (which worked out well for me, as my 1991 scores, particularly in mathematics, were much higher).

Neither the ETS nor the ACT offers any sort of “cut scores” for their tests as far as I know. Post-secondary institutions, professional schools, etc., that choose to use one or more of these tests have their individual ways of making use of the scores. But there is nothing to prevent a state from setting cut scores for any test, so if New York, say, decides to use the SAT instead of one of the testing consortium exams to measure high school students, it can certainly set cut scores and play the usual games with those.

I’m not defending the use of the SAT or ACT, but I think it helps to have a fully accurate understanding of the scoring methods and conventions.
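To illustrate the common-scale point in this comment, here is a rough sketch. The question counts per section are the ones given above, and 1 to 36 is the ACT's reported score range, but the straight-line conversion is purely an assumption for illustration: real conversions use a separate table for each test form, which is how differences in difficulty get absorbed.

```python
# Scored question counts per section, as given in the comment above.
SECTION_QUESTIONS = {"English": 75, "Mathematics": 60, "Reading": 40, "Science": 40}


def naive_scale(raw, section, low=1, high=36):
    """Map a raw score onto a shared low..high scale with a straight line.

    This linear mapping is a hypothetical placeholder, not the real
    conversion, which is published as a table for each test form.
    """
    total = SECTION_QUESTIONS[section]
    if not 0 <= raw <= total:
        raise ValueError(f"raw score must be between 0 and {total}")
    return round(low + (high - low) * raw / total)


# Very different raw counts land on the same point of the shared scale.
print(naive_scale(45, "English"))      # 22
print(naive_scale(36, "Mathematics"))  # 22
print(naive_scale(24, "Reading"))      # 22
```

A raw 45, a raw 36 and a raw 24 are hard to compare at a glance; on a shared scale they read as the same result, which is the convenience the comment describes.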

Cap Lee says:

September 13, 2015 at 6:06 pm

Percentages of kids making artificial proficiency are meaningless. Therefore opt out saves kids a lot of aggravation and makes no difference to their education. The test is a huge and expensive waste of time.

What if assessments were at a local level and were demonstrations of kids’ skills, and they told parents the gains individuals made? Hopefully the Collins Sanders amendment to ESEA will open the window of opportunity and we can dive through it to give kids and their parents meaningful information.

Once we make assessment real, the curriculum becomes real and it takes highly qualified teachers to assure students are taught in the way they learn best. No more muddling through with artificial education by less than qualified teachers.

Public ed has the real deal!

gitapik says:

September 14, 2015 at 9:10 am

“The test is a huge and expensive waste of time.”

Got that right. Add the expense of the technology that the tests are being given on and it goes off the charts.

Duane Swacker says:

September 13, 2015 at 6:27 pm

YEP! Exactly what Wilson described back in 1997 about “psychometric fudges” being one part of the many errors and falsehoods involved in the educational standards and standardized testing educational malpractices that serve to render any results COMPLETELY INVALID! To understand, read and comprehend Wilson’s never refuted nor rebutted dissertation/treatise on those malpractices. See: “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700

Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine.

1. A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing the descriptive information about said interactions is inadequate, insufficient and inferior to the point of invalidity and unacceptability.

2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).

3. Wilson identifies four “frames of reference” each with distinct assumptions (epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think college professor who “knows” the students’ capabilities and grades them accordingly), the General Frame (think standardized testing that claims to have a “scientific” basis), the Specific Frame (think of learning by objective like computer based learning, getting a correct answer before moving on to the next screen), and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error and more error in the process is caused when the assessor confuses and conflates the categories.

4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”

In other words all the logical errors involved in the process render any conclusions invalid.

5. The test makers/psychometricians, through all sorts of mathematical machinations attempt to “prove” that these tests (based on standards) are valid-errorless or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the test and results are. He is an advocate for the test taker not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. And a basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.

6. Having shown the invalidity, and therefore the unreliability, of the whole process Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”) or to put in more mundane terms crap in-crap out.

7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”

In other words it attempts to measure “’something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students as the social rewards for some are not available to others who “don’t make the grade (sic)” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!

One final note with Wilson channeling Foucault and his concept of subjectivization:

“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self-evident consequences. And so the circle is complete.”

In other words students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, the “I’m an ‘A’ student” is almost as harmful as “I’m an ‘F’ student” in hindering students from becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.

Akademos says:

September 14, 2015 at 10:06 am

This is profoundly true. Though we live in reality and deal with the imperfection of quizzes and tests to very roughly gauge where things may lie, we should remember that even if the smartest person in the world were to personally assess by interview each and every student, the outcomes would still only be guesses as to what had truly transpired, where each student now stands and how they had been transformed.

- Duane Swacker says:
  
  September 14, 2015 at 1:38 pm
  
  Exactly!!
  
Akademos says:

September 14, 2015 at 11:55 am

Ohio CC ‘results’ :

http://www.cleveland.com/metro/index.ssf/2015/09/ohios_common_core_test_results.html

Denis Ian says:

April 4, 2016 at 2:44 pm

The results aren’t actually a test score? Well, the Common Core reform isn’t actually a reform either. I am over being stunned.
