Carol Burris here explains the deep, dark secret of standardized testing.
Whoever is in charge decides what the passing mark is. The passing mark is the “cut score.” Those in charge can decide to create a test that everyone passes because the cut score is so low and the questions so simple, or they can create a test that everyone fails. In fact, because of field testing, the test makers know with a high degree of precision how every question will “function,” that is, how hard or easy it is and how many students are likely to get it right or wrong.
As Burris shows, New York’s Commissioner John King aligned the Common Core tests with the SAT, knowing in advance that nearly 70% would not pass. That was his choice. Whatever his motive, he wanted a high failure rate. As King predicted, 69% failed. It was his choice.
Policymakers in Kentucky chose a more reasonable cut score and only about half their students failed.
Are students in Kentucky that much smarter than students in New York? No, but they may have smarter policymakers.
Knowing these shenanigans gives more reason to opt your children out of the state testing. The game is rigged against them.
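To make the arithmetic behind this concrete, here is a minimal sketch in Python with entirely hypothetical scores and cut points (not New York’s or Kentucky’s actual data), showing how the same set of student answers produces wildly different failure rates depending only on where the cut score is placed.

```python
import random

# Hypothetical raw scores for 10,000 students on a 60-point test.
# Illustrative distribution only; not actual state data.
random.seed(0)
scores = [min(60, max(0, int(random.gauss(33, 10)))) for _ in range(10_000)]

def failure_rate(scores, cut_score):
    """Share of students falling below the chosen cut score."""
    return sum(s < cut_score for s in scores) / len(scores)

# Same students, same answers -- only the cut score moves.
for cut in (25, 33, 42):
    print(f"cut score {cut}: {failure_rate(scores, cut):.0%} fail")
```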
“Only” 50% failed?!
If the test is normed on a bell curve, that is what you would expect. Which is one reason why this mandate that everyone has to be “on grade level” is so preposterous. If the test is normed to a bell curve, then 50% will NOT pass. It’s the nature of the beast, and I REALLY do mean a beast in this case.
The test is not normed on a bell curve.
If anyone wants the actual details on how the cut scores were determined, see http://www.scribd.com/doc/155685283/July-22-2013%C2%A0%E2%80%94%C2%A0New-York-s-Standard-Setting-process.
95 educators spent 4-5 days reviewing the materials and determining the cut scores. The commissioner accepted their recommendation without change.
I guess a “detail” like that gets left out because it doesn’t fit the narrative.
I think the “smarter policymaker” chose cut scores based on something real, in this case a proxy for college readiness. Choosing arbitrary, low cut scores in order to pass more kids is not smart policy making, it’s doing what’s popular and easy.
Yes, I know many people will say that SAT scores are not the best indication of college readiness, but it beats the alternative: cut scores that are made up for no reason other than to pass an “acceptable” number of students (whatever that is) regardless of what they actually know.
When Kentucky’s students have to take remediation courses for no credit in college, or drop out of high school because the grading “suddenly” got hard, or are passed up for a job because they don’t know basic skills, their policy makers won’t be there to help them out. Neither will Carol Burris.
“I think the “smarter policymaker” chose cut scores based on something real, in this case a proxy for college readiness.”
Must ALL children be “college ready” (whatever that is) at graduation?
All Special ed students?
All ELL students?
How about students who simply do not want to go to college? Some want to do hair, be plumbers, etc.
Must they be considered failures in high school? Because that is what I see happening now.
PS didn’t we see somewhere that HS grades were better indicators of college success than the SAT?
My point is that arbitrarily high cut scores and arbitrarily low ones are both wrong. Give NY some credit for trying to tie cut scores to something actually meaningful.
No, all children don’t have to be college ready, but students other than special ed and ELL students should have the choice to go to college if that’s what they want.
Haven’t we had enough of the culture that says all students should do “above average” on tests? Where has that gotten us? Sooner or later, these students will run into reality, where they don’t have some pol to change the cut score for them.
Surely there has to be some middle ground between labeling students who don’t have the desire or ability to attend college as failures, and merely moving kids along with tests and grades that aren’t an accurate indicator of college-readiness. For example, 75-80% of the New York City DOE high school graduates who attend a City University of New York community college require intensive remediation in at least one subject area (reading, writing, or math); 20-25% require intensive remediation in all three.
Zakiyah Ansari, the director of the advocacy group Alliance for Quality Education and a member of Bill de Blasio’s transition team, had a daughter who was required to take math remediation to enter a CUNY, and she summed it up perfectly: “There are far too many of our children who are getting out of high school thinking they’re prepared, and they’re not,” she said. “And I am frustrated, as a parent.”
I think the HS grades studies come with some fairly large caveats. High school quality matters a lot: http://goo.gl/IYhFXK. Second, I think the HS GPA studies largely suggest that colleges consider subject exams like AP and IB rather than the SAT/ACT–they are still recommending additional measures beyond HS grades.
A traditional college education is only suitable for those with IQs in excess of 110, about a quarter of the US population. Some type of vocational education is best for the other 75%.
Jim,
Re: “A traditional college education is only suitable for those with IQs in excess of 110, about a quarter of the US population. Some type of vocational education is best for the other 75%.”
Do you have a source for this?
Georgetown’s Center on Education and the Workforce says that 63% of jobs in 2018 will require post-secondary education.
If these are both accurate, 63% of employers will be scrabbling over 25% of job candidates and 75% of job candidates will be competing for 37% of jobs. Not a pretty picture.
Just found this recent article about SATs and college admissions, which includes some great points about the unintended consequences that de-emphasizing the SAT may have: http://www.slate.com/articles/health_and_science/science/2014/04/what_do_sat_and_iq_tests_measure_general_intelligence_predicts_school_and.html
“merely moving kids along with tests and grades that aren’t an accurate indicator of college-readiness”
Straw man?
In my experience, students who require remediation in college KNEW that they were not doing well. The teachers knew it and their parents knew it (if they cared to know it or were able to understand it; think ESOL parents).
The students’ HS grades/course selection reflected it.
In 24 years of teaching I have never encountered a student with high grades in advanced/college prep/AP courses who required remedial college classes.
But I have encountered (and tutored) many HS graduates who took the easiest/lowest-level classes possible, graduated with the bare minimum hours, and had low to middling grades in core classes. These kids then decided to go to college. Good for them; I am a BIG believer in second chances. Some college took them, took their money, and put them in remedial classes. Of course they needed help.
Now, I know nothing of this parent and her daughter. Perhaps she took AP Calc in HS and really thought she was prepared. But that would make her a real outlier in my experience.
“Haven’t we had enough of the culture that says all students should do “above average” on tests?”
Yes, we have.
But if they don’t do above average their teacher will get canned.
Ang,
Many of my students had no access to AP classes because their high schools were too small to offer those classes.
But you’re exaggerating the case. Kentucky chose a passing cut score equivalent to an SAT score of 1000 (eng+math), achieved by 50% of test-takers. That is not a poor score relegating students to remedial courses in college. It’s a mediocre score, like a “C”; if accompanied by B grades showing work capability, the student will probably do well at a less-selective college. NYS chose a cut score equivalent to SAT scores obtained by only 30% of test-takers, who with B grades will probably do well at more selective colleges. The NYS choice is not appropriate as a cut-off for passing.
Why are “cut scores” used? Can’t it be graded and scored like a regular test (points correct over total points on the test)? Then the grading scale is 90-100 = A, 80-89 = B, etc. I just don’t understand the point of standardizing something in 45 states if the test and scoring are not the same everywhere.
With SBAC every test will be different because the test adapts itself with harder or easier questions based on what the test-taker answers, so students will never really know where they stand among peers. I opted my kids out and I’m glad I did, especially after reading this article!
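For readers curious about the mechanics, here is a toy sketch of how a computer-adaptive test can work in general: pick the unused question closest in difficulty to the current ability estimate, then nudge the estimate up or down based on the answer. This is a deliberately simplified illustration with made-up items and an ad hoc update rule, not Smarter Balanced’s actual adaptive engine or item pool.

```python
# Toy adaptive-testing loop (illustration only, not SBAC's actual algorithm).
# Item difficulties range from -2.0 (easy) to 2.0 (hard); hypothetical pool.
item_pool = {f"item{i}": d / 10 for i, d in enumerate(range(-20, 21))}

def run_adaptive_test(answer_fn, n_items=10):
    ability, administered = 0.0, []
    for _ in range(n_items):
        # Choose the unused item whose difficulty is closest to the current estimate.
        item = min((i for i in item_pool if i not in administered),
                   key=lambda i: abs(item_pool[i] - ability))
        administered.append(item)
        correct = answer_fn(item_pool[item])
        ability += 0.4 if correct else -0.4  # crude, ad hoc update rule
    return ability, administered

# Example: a student who answers correctly whenever the difficulty is <= 0.5.
estimate, items_seen = run_adaptive_test(lambda difficulty: difficulty <= 0.5)
print(round(estimate, 2), items_seen)
```

Because each student’s path through the pool depends on their own answers, no two students need see the same questions, which is the point the comment above makes.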
They are norm-referenced, not criterion-referenced. This is a completely different issue. Norm referencing requires a “bell type curve” in which a certain percentage must fail. It is used to stack-rank students, not to demonstrate proficiency. Yet it is a test that is supposed to show proficiency. Only criterion-referenced exams will do that. This is another dirty little secret of standardized testing under CCSS.
NYS Teacher: while this merits more discussion, much said in few words.
I especially like your reference to stack ranking/forced ranking/rank-and-yank/burn-and-churn.
Not just a failed worst management practice from the distant past but strangely [?] it defines the very recent past of the de facto Secretary of Education of the USofA, Bill Gates.
For a small sample of a Vanity Fair article on the same—
[start quote]
At the center of the cultural problems was a management system called “stack ranking.” Every current and former Microsoft employee I interviewed—every one—cited stack ranking as the most destructive process inside of Microsoft, something that drove out untold numbers of employees. The system—also referred to as “the performance model,” “the bell curve,” or just “the employee review”—has, with certain variations over the years, worked like this: every unit was forced to declare a certain percentage of employees as top performers, then good performers, then average, then below average, then poor.
[end quote]
Link: http://www.vanityfair.com/business/2012/08/microsoft-lost-mojo-steve-ballmer
I don’t think it will take us ten more years [thank you, Bill Gates!] to figure out that the self-styled “education reformers” have failed, in large part because they simply keep repeating the same mistakes. They just rebrand them and hope nobody notices.
“Insanity: doing the same thing over and over again and expecting different results.” [Albert Einstein]
Thank you for your comments.
😎
Krazy TA
This is one of the incomprehensible aspects of the CC regime: using tests to show proficiency (college and career readiness, if you will) in math and ELA, but failing to use a scoring methodology that reflects proficiency. The norm referencing used has one purpose: to compare test takers.
This is so nonsensical that most outsiders have trouble making sense of it. Just how far down the rabbit hole have Bill Gates and Co. taken us?
Believe this or not . . .
A norm-referenced test (NRT) is a type of test, assessment, or evaluation which yields an estimate of the position of the tested individual in a predefined population, with respect to the trait being measured. That is, this type of test identifies whether the test taker performed better or worse than other test takers, not whether the test taker knows either more or less material than is necessary for a given purpose.
THIS MUST BE REPEATED:
NRT DOES NOT INDICATE WHETHER A TEST TAKER KNOWS EITHER MORE OR LESS MATERIAL THAN NECESSARY FOR [BEING COLLEGE AND CAREER READY].
CCSS are not CRT.
A criterion-referenced test is one that provides for translating test scores into a statement about their relationship to a specified subject matter. Most tests and quizzes that are written by school teachers can be considered criterion-referenced tests. The objective is simply to see whether the student has learned the material.
THIS BEARS REPEATING.
CRT ARE USED TO SEE WHETHER STUDENTS HAVE LEARNED THE MATERIAL. CCSS ALIGNED TESTS ARE NOT CRT.
This idea defies all logic.
Claiming that the test scores demonstrate proficiency, yet using a scoring method that only stack-ranks and requires “X” percent of failure.
Then they have the gall to rate teachers on the results.
All this begs the questions:
“If PARCC/SBAC/Pearson tests are meant to measure college and career readiness in math, reading, and writing, then why are they scored with a method that can ONLY stack-rank students? Why do they use a scoring method that DOES NOT measure knowledge, mastery, or proficiency of the subject matter?”
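For readers new to the distinction NYS Teacher is drawing, here is a minimal sketch with hypothetical scores of the two scoring philosophies: a norm-referenced report tells you where a student stands relative to other test-takers, while a criterion-referenced report tells you whether the student cleared a fixed mastery threshold.

```python
# Hypothetical percent-correct scores for a small group of students.
scores = [42, 55, 61, 67, 70, 74, 78, 83, 88, 95]

def percentile_rank(score, population):
    """Norm-referenced: where the student stands relative to other test-takers."""
    return sum(s < score for s in population) / len(population)

def meets_criterion(score, mastery_threshold=70):
    """Criterion-referenced: did the student learn enough of the material?"""
    return score >= mastery_threshold

student = 74
print(f"percentile rank: {percentile_rank(student, scores):.0%}")       # compares to peers
print(f"meets the 70% mastery criterion: {meets_criterion(student)}")   # compares to a standard
```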
NYS Teacher,
Can you explain your statement that CCSS tests are norm-referenced? It doesn’t make sense to me that standards imply a particular type of grading.
Also, I don’t agree that NYS tests are norm-referenced. They are criterion referenced, but then cut scores for 1,2,3, and 4 are applied. Obviously that messes with the distribution, but calling them norm referenced isn’t accurate either. They don’t rank students.
The state tests in Ohio used for the last 20 years also have cut scores. The scores vary from year to year, yet they are used to compare the student’s success in attaining a year’s growth. This is then used to determine the success or failure of a teacher. It makes no sense. It is unfair. Yet we plunge ahead into the darkness. Why can’t children just enjoy the process of discovery, the joy of curiosity, or the learning achieved from trial and error? Mistakes teach us. Funny how these reformers don’t learn from their mistakes. Quite possibly, the quest for efficiency may just lead to the boredom of the mundane.
Great piece by Principal Burris!
Of course, there are several ways in which testing companies, testing consortia, and state departments can manipulate the data to get the results that they want.
Here’s one of the biggest of the many dirty little secrets of standardized testing: Give me the result that you want, and I can design a standardized test to achieve that result. All one has to do is field test and then throw out questions that don’t yield the desired result.
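A hedged sketch of the mechanism Bob describes: field testing yields an estimated difficulty for each item (the share of students who answer it correctly, often called its p-value), and a form can then be assembled by keeping only the items that push the test toward the desired overall difficulty. The items, p-values, and greedy selection rule below are hypothetical illustrations, not any vendor’s actual procedure.

```python
# Hypothetical field-test results: item -> fraction answering correctly (p-value).
field_test_p_values = {
    "item_a": 0.82, "item_b": 0.64, "item_c": 0.47, "item_d": 0.31, "item_e": 0.19,
}

def assemble_form(p_values, target_mean_p):
    """Keep the items closest to a desired difficulty; discard the rest.

    A greedy illustration of steering a test's difficulty via item selection."""
    ranked = sorted(p_values, key=lambda item: abs(p_values[item] - target_mean_p))
    return ranked[: len(ranked) // 2 + 1]

print(assemble_form(field_test_p_values, target_mean_p=0.35))  # assembles a hard test
print(assemble_form(field_test_p_values, target_mean_p=0.75))  # assembles an easy test
```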
Here’s another: Manipulate the raw-score to scaled-score conversions. The standard process for translating raw scores into scaled scores is a linear transformation. But graph the raw and scaled scores on the ELA and Math exams in New York for years under NCLB and you will find that the lines jump all over the place, wildly. They are about as predictable as a gerbil on methamphetamine or lines on a Jackson Pollock painting. Clearly, New York created raw-score-to-scaled-score conversions to yield the results it wanted to show.
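To illustrate the contrast in that paragraph: a transparent linear raw-to-scaled conversion versus an arbitrary hand-built lookup table that can push the same raw score above or below a proficiency threshold. All numbers here are hypothetical and are not New York’s actual conversion charts.

```python
def linear_scaled(raw, raw_max=60, scale_min=100, scale_max=300):
    """A transparent, predictable linear raw-to-scaled transformation."""
    return scale_min + (scale_max - scale_min) * raw / raw_max

# Hypothetical hand-built conversion tables: the same raw scores land above or
# below a proficiency threshold (say, scaled 220) depending on the table chosen.
lookup_generous = {30: 231, 35: 244, 40: 258}
lookup_harsh = {30: 198, 35: 209, 40: 221}

for raw in (30, 35, 40):
    print(raw, round(linear_scaled(raw)), lookup_generous[raw], lookup_harsh[raw])
```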
Here’s another: Decide where you want to place the cut scores, as discussed by Principal Burris.
And, of course, if you start with tests that are themselves invalid–that do not measure what they purport to measure–there’s no way your data will mean anything anyway. The latest ELA tests in the Common Core College and Career Ready Assessment Program, or C.C.C.C.R.A.P., superbly illustrate such invalidity. These are supposed to be tests of reading and writing knowledge and ability, but what is tested on these assessments bears almost no relation to authentic, real-world reading and writing. They test what one might call InstaWriting for the Test and InstaReading for the Test but not reading and writing. They are based on standards that almost completely ignore knowledge and that define skills so broadly, so generally, so vaguely, that these are not concrete enough, not operationalized enough to be validly tested. And almost all the standards deal with explicit learning when MUCH of significant learning in ELA is implicit acquisition, and so the outcomes to be measured are improperly characterized and any measurement based on those characterizations will, therefore, be entirely suspect. And, finally, most of what actual readers and writers do when actually reading and writing is not tested and cannot be tested by the means chosen. So, the C.C.C.C.R.A.P. assessments do not test what they purport to be testing, and therefore they cannot be valid. QED.
So, in short, the Ed Deformers’ “data-driven decision making” is a variety of NUMEROLOGY. It is to rational decision making in education what astrology is to astronomy, phrenology to psychology. It’s hocus pocus. It’s a con, and a very lucrative one for a few testing companies and consultants and curriculum providers.
It’s time to tell these people, we have seen what you are proposing, and it is sloppy, uninformed, and unacceptable, in addition to being abusive of children and distorting of curricula and pedagogy.
Bob Shepherd: I think that a small part of your comment bears repeating—
“Here’s one of the biggest of the many dirty little secrets of standardized testing: Give me the result that you want, and I can design a standardized test to achieve that result. All one has to do is field test and then throw out questions that don’t yield the desired result.”
I would only change one part of the first sentence: “Here’s one of the biggest of the many POINTS OF PRIDE of standardized testing” — it’s not a “dirty little secret” except when the adherents of the High Church of Testolatry prey on parents’ misconceptions about standardized tests (“they’re just like the tests your teachers gave you in school every week!”) and lie by omission.
Whether one loves, hates or is indifferent to standardized tests, it is an indisputable fact that they have many decades of use and development behind them. And work by some individuals of true distinction [at least in test making]. Surprises? Like what happened in NY not too long ago with an almost 70% failure rate? The client paid for that result. The only surprise was the sucker punch the rest of us received.
And I’m not a fan of sucker punchers. That’s why I call them edubullies.
Thank you for your comments.
😎
A small part!!! Nay, prime time television and every Internet feed in the country should be interrupted to broadcast this in its entirety! LOL.
Most official statistics are garbage.
Indeed, Jim. The deformers’ data remind me, often, of the stats for production of pig iron and pork bellies continually broadcast by Stalin and by IngSoc in 1984. Politically motivated baloney.
This just appeared on the (legendary) engageNY site, Robert. I would love your observations and the observations of anyone else interested in wading through…well, you know, stinky stuff.
http://www.engageny.org/resource/regents-exams-english-language-arts-webcasts
I will look at this. Very important, of course. I was saddened to find that the presentation on the new Regents begins with an outright lie: “The standards were developed by lay stakeholders in the field, including teachers, school administrators, and content experts.” The new tests appear to be HEAVILY WEIGHTED toward analysis of and response to nonfiction texts. The third part of the test, the “Literary Analysis,” deals with Kennedy’s Inaugural Address. Sigh.
“And, of course, if you start with tests that are themselves invalid–that do not measure what they purport to measure–there’s no way your data will mean anything anyway.”
BINGO!! Exacto!! No Doubt!! TAGO!!
#1 these standardized tests measure nothing as the tests aren’t measuring devices.
#2 These standardized tests are COMPLETELY INVALID as proven by Noel Wilson (thanks, Bob, for an opening through which to drive the QQ Bandwagon). See “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine. (updated 6/24/13 per Wilson email)
1. A quality cannot be quantified. Quantity is a sub-category of quality. It is illogical to judge/assess a whole category by only a part (sub-category) of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as one dimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing we are lacking much information about said interactions.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference” each with distinct assumptions (epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think college professor who “knows” the students capabilities and grades them accordingly), the General Frame-think standardized testing that claims to have a “scientific” basis, the Specific Frame-think of learning by objective like computer based learning, getting a correct answer before moving on to the next screen, and the Responsive Frame-think of an apprenticeship in a trade or a medical residency program where the learner interacts with the “teacher” with constant feedback. Each category has its own sources of error and more error in the process is caused when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words, all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations attempt to “prove” that these tests (based on standards) are valid-errorless or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the test and results are. He is an advocate for the test taker not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. As a basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”) or, to put it in more mundane terms, crap in, crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words it measures “‘something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students, as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self evident consequences. And so the circle is complete.”
In other words students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, the “I’m an “A” student” is almost as harmful as “I’m an ‘F’ student” in hindering students becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
“Knowledge makes a man unfit to be a slave.” [Frederick Douglass]
In that spirit, I offer the following—
MEASURING UP: WHAT EDUCATIONAL TESTING REALLY TELLS US by Daniel Koretz (2009, paperback). From Amazon: “The best explanation of standardized testing is Daniel Koretz’s Measuring Up: What Educational Testing Really Tells Us.–Diane Ravitch (New York Review of Books 2012-03-08)” — yes.
If Koretz is a little bit much to start with, then Phillip Harris, Bruce M. Smith and Joan Harris, THE MYTHS OF STANDARDIZED TESTS: WHY THEY DON’T TELL YOU WHAT YOU THINK THEY DO (2011).
Also very helpful: Todd Farley, MAKING THE GRADES: MY MISADVENTURES IN THE STANDARDIZED TESTING INDUSTRY (2009) [very readable!]; Sharon L. Nichols and David C. Berliner, COLLATERAL DAMAGE: HOW HIGH-STAKES TESTING CORRUPTS AMERICA’S SCHOOLS (2010); and the sections on testing in Gerald Bracey, READING EDUCATIONAL RESEARCH: HOW TO AVOID GETTING STATISTICALLY SNOOKERED (2006).
For more generalized critiques, I heartily recommend Noel Wilson (see postings by Duane Swacker on this blog) and Banesh Hoffman (THE TYRANNY OF TESTING, 2003).
Briefly, standardized tests (especially those of the high-stakes nature) measure little, are inherently imprecise, and have increasingly been put to uses for which they are not only not suited but completely inappropriate.
As for the numbers they generate, IMHO, high-stakes standardized tests are highly accurate in measuring one thing and one thing only: your ability to “achieve” a certain score on your “performance” [literally the terms used by the test-makers] on a specific standardized test on a specific day under specific circumstances.
Period. And as for the general consequences of high-stakes standardized testing…
From Jim Horn and Denise Wilburn, THE MISMEASURE OF EDUCATION (2013), pp. 1, 55, and 147, for the following observations:
“What was once educationally significant, but difficult to measure, has been replaced by what is insignificant and easy to measure. So now we test how well we have taught what we do not value.” — Art Costa, professor emeritus at Cal State-Fullerton
“Initially, we use data as a way to think hard about difficult problems, but then we over rely on data as a way to avoid thinking hard about difficult problems. We surrender our better judgment and leave it to the algorithm.” —Joe Flood, author of THE FIRES
“When the right thing can only be measured poorly, it tends to cause the wrong thing to be measured, only because it can be measured well. And it is often much worse to have a good measurement of the wrong thing—especially when, as is so often the case, the wrong thing will in fact be used as an indicator of the right thing—than to have poor measurements of the right thing.” — John Tukey, mathematician, Bell Labs & Princeton University
And an old dead Greek guy also gives us something to think about:
“A good decision is based on knowledge and not on numbers.” [Plato]
😎
P.S. Please excuse the long comment.
Tukey, the great statistician, NAILS these tests in that quotation.
The most vetted standardized test in history, no doubt, is the SAT. Originally, it was called the Scholastic Aptitude Test because it was supposed to measure aptitude for college. But it did not predict college success. It was invalid for that purpose. And so the name was changed to the Scholastic Assessment Test. But it didn’t accurately measure achievement either. So, they changed the name again to the SAT Reasoning Test, or simply the SAT, because the results of the test are correlated with Spearman’s g (IQ as measured by IQ tests). The SAT is also a SUPERB predictor of how wealthy students’ parents are.
So, that extraordinarily carefully vetted standardized test was completely invalid for the purposes for which it was created. How much more invalid are the sloppily thrown together NCLB state tests and the national C.C.C.C.R.A.P. tests now being foisted on the country.
It’s astonishing to me that anyone takes results from these at all seriously.
Thanks KTA for putting those resources together for all to check out!
TAGO!
To provide further proof of how this is nothing more than a rigged game, go to the PARCC website. Cut scores for the 2015 math and ELA tests will not be established until the summer of 2015, months after the tests have been taken. Even with all the field testing they brag about (3 million field tests or so), they will still wait until all 20 million exam results are in before they set cut scores.
They have to figure out what they can get away with. They will probably set the initial cut scores low in an attempt to sneak these tests by the public and avoid an educational policy supernova.
Larger sample size? HA HA!!
Every school in Kentucky has solar panels, and some are net-zero efficient. NCLB may have been dropped because you cannot determine its mandate that every child be proficient with cut scores.
Yes, John King is a fool, but a dangerous one, implementing a vicious and destructive agenda.
In fairness, manipulating cut scores is hardly the exclusive realm of new reforms. In NY, we regularly see cut scores adjusted upwards when a new State Ed boss comes in and then gradually lowered over time to make the data look better. It’s been happening for years.
As I mentioned above, at least the cut scores are based on something this year (college readiness), and I hope they stay put where they are. Sure, adjustments are needed for Special Ed and ELL, but artificially low cut scores do nothing except make students think they’re doing better than they are. Studies show we’re already the best at that.
jpr, please read my comments, above, about the validity of these tests. thanks.
Correction:
“As I mentioned above, at least the cut scores are based on something this year (a figment of David Coleman’s imagination), and I hope they stay put where they are.”
Has anyone who is criticizing how cut scores were determined in New York read the actual document about how they were determined? If so, do you disagree with the method that was used or do you not believe the documentation? If you haven’t read it, doesn’t it make sense to do that before saying they were arbitrary, political, etc.?
It’s a valid criticism of Common Core Standards that they were done without enough educator input. It looks to me like 70% of the people who worked on the NYS cut scores were teachers and the rest were other educators (BOCES, administrators, higher Ed). Commissioner King didn’t modify the cut scores recommended by this panel at all. Does it bug anyone else that this whole conversation, including the original post, is ignoring this?
It looks to me like they did the right thing; benchmarking against NAEP and other national data to make sure NY’s scores actually mean something.
:crickets:
jpr,
Can you supply us with a link so we can read that to which you refer: “Has anyone who is criticizing how cut scores were determined in New York read the actual document about how they were determined?”
Thanks in advance!
Duane and JPR, try this link for the setting of the cut scores.
https://dianeravitch.net/2014/01/06/why-common-core-tests-cause-scores-to-collapse/
Thanks for that link Diane but my request is more specific, that is, the methodology involved with the NY cut scores for the time frame about which we are talking. I am assuming jpr has those or at least knows where to link.
And jpr’s link seems to indicate that they used NAEP standards inappropriately.
Sorry, Duane. The link was posted further up and got a bit buried. Here it is again: http://www.scribd.com/doc/155685283/July-22-2013%C2%A0%E2%80%94%C2%A0New-York-s-Standard-Setting-process.
I think this points out that educators drove the process of placing the cut scores. I imagine that the goal of having the cut scores reflect college readiness pre-existed this conference, but all of the misinformation above about bell curves and arbitrary failure rates just doesn’t ring true.
I think it’s important for the scores to mean something, and preparedness for college seems an appropriate benchmark for most students. Some students and parents may be satisfied with not being prepared for college, others will think it’s important.
I guess if you think that the majority of students aren’t capable of being prepared for college, this wouldn’t make sense. I don’t believe that is true. I believe K12 education can prepare the majority of students to have the option of college. Frankly, it is doing that for the majority of students from high socioeconomic conditions, and students from low conditions are not inherently less intelligent; they just start out behind and frequently get the worst our education system has to offer instead of the best, as their well off counterparts get. In NY, our property tax-based education funding system is stacked against low SES kids.
Hiding the problem with low cut scores doesn’t make it go away.
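As one hedged illustration of what “benchmarking against NAEP and other national data” could mean in practice (a sketch of the general idea, not the documented New York procedure): choose the raw score that only the externally benchmarked share of students reaches.

```python
# Hypothetical: if an external benchmark (e.g., NAEP) rates 35% of students
# proficient, find the state-test raw score that the top 35% reach.
# Illustration of the general idea only, not New York's documented procedure.

def benchmark_cut_score(state_scores, external_proficient_rate):
    ordered = sorted(state_scores, reverse=True)
    index = int(len(ordered) * external_proficient_rate) - 1
    return ordered[max(index, 0)]

state_scores = [31, 44, 52, 38, 61, 47, 55, 29, 66, 40, 58, 35, 49, 72,
                43, 37, 63, 50, 33, 45]
print(benchmark_cut_score(state_scores, external_proficient_rate=0.35))  # -> 52
```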
I would just love to know 2 things. 1) How are poor students who are good students but not scholarship material supposed to PAY for college? And 2) why would anyone waste their money or take on heavy debt to go to college only to find no decent-paying jobs when they graduate?
Even excellent grades and high degrees don’t get a person an interview, let alone get them a job.
So what is the real point of all this?
Deb,
There is certainly a lot of frustration out there about college and lots of legitimate issues. But, the fact is that most jobs in the U.S. today require the degree, and that unemployment is largely a function of highest level of education obtained. Let me know if you want the data and graphs on this, but it’s definitely the case.
I’m a big fan of the programs that offer tuition-free education at state schools for high performing students from low SES families.
I know that. I am just living the situation. I don’t need graphs. I don’t need more info.
My immediate family is educated, with various degrees. They have great resumes and great grades, were honor students, have skills in technology and people skills, and are known as good citizens, helpful people, and ambitious.
One son just got a new job, not using his degree, but it is a job. He was a good student but didn’t pursue a master’s degree.
The other son has had to take a part-time job just to have a job. He may have a better job next week when the final interview takes place. He has had trouble getting any interviews. He speaks 3 languages, is very socially conscious, has friends of all kinds of ethnicities, has a master’s plus a year and great grades, and he helps other people write cover letters and resumes successfully. Yet he has been overlooked for some reason.
Neither of them has any student debt.
My husband has been a paper-coating chemist, school board member, and vocational school board member, and has worn various other hats in his work. His company shut down. He took a job out of state for 4 years. His entire division was closed down. He found another comparably paying job in our state but quite far from home. They shut down that division.
He cannot find another job. He gets false leads. HR people don’t follow through. There are only sales jobs and truck driving jobs available. He applies for jobs that use his experience. He can’t find a new job.
From personal perspective, I believe I know what is going on out there.
I am not so sure our expense for college degrees has paid off.
If I sound unhappy, I am.
Deb,
Very sorry to hear about it and completely understand. The stock market is going gangbusters, but those of us who have to work for a living aren’t doing so well in what passes for a recovery right now. Wish Washington would incent companies to spend, not invest and save offshore. I hope for some better news for you and yours soon.
My son failed last year’s Math test by getting 23% correct. Except he got a 3.7. Last year was an experiment and NY’s children were guinea pigs. Everyone who allowed that needs to be fired, including Governor Cuomo. I look forward to voting Republican for the first time in my 55 years for the next NYS Governor.
Did you know that you can influence a person’s test scores by asking for a racial/ethnic identification? Or that African Americans will score better if a test is presented as something other than an IQ test? I had a student whose family was not celebrated for their intelligence, and this young teenage girl had internalized the stereotype of her family. I had her in a special education resource program where we spent the majority of our time on language arts support. The task was to convince her that her effort would make a difference. Her mainstream teacher was demanding and quite traditional, but she sent on students who were well prepared for high school classes. I had to convince this girl that hard work would pay off. She took a chance. A portfolio of her writing clearly showed her phenomenal growth. By the end of the year her performance was so strong that the teacher recommended her for an honors English program. And we want to use high stakes tests to determine the future of our children? Stupidity.
I’ve written before that public education in the U.S. is in greater danger than many think.
I’ve also written that unless and until public school educators wean themselves from the notion that the PSAT and SAT and ACT measure anything worthwhile other than family income, then the Common Core will roll full steam ahead. That applies as well to the College Board’s Advanced Placement program.
As Carol Burris points out in her remarks, the state of New York asked the College Board to “align” PSAT and SAT test scores to Common Core “cut” scores. In Kentucky, the Common Core “cut” scores were aligned with ACT scores.
Truly, this practice is inane. But inanity has come to define public education in this country. The recent iteration was the standards and testing regimen tied to No Child Left Behind, despite compelling research that none of it was necessary. The Sandia Report (Journal of Educational Research, May/June, 1993), published in the wake of A Nation at Risk, concluded that:
* “..on nearly every measure we found steady or slightly improving trends.”
* “youth today [the 1980s] are choosing natural science and engineering degrees at a higher rate than their peers of the 1960s.”
* “business leaders surveyed are generally satisfied with the skill levels of their employees, and the problems that do exist do not appear to point to the k-12 education system as a root cause.”
* “The student performance data clearly indicate that today’s youth are achieving levels of education at least as high as any previous generation.”
But the nonsense continued.
Now we’re faced with the Common Core, whose core rationale was that American public education had to ramp up its effectiveness or American economic competitiveness in the global economy was imperiled (the same scare tactic used in A Nation at Risk). But it’s simply not true.
The World Economic Forum ranks nations each year on competitiveness. The U.S. is usually in the top five (if not 1 or 2). When it drops, the WEF doesn’t cite education, but stupid economic decisions and policies.
For example, when the U.S. dropped from 2nd to 4th in 2010-11, four factors were cited by the WEF for the decline: (1) weak corporate auditing and reporting standards, (2) suspect corporate ethics, (3) big deficits (brought on by Wall Street’s financial implosion) and (4) unsustainable levels of debt.
More recently, major factors cited by the WEF are a “business community” and business leaders who are “critical toward public and private institutions,” a lack of trust in politicians and the political process with a lack of transparency in policy-making, and “a lack of macroeconomic stability” caused by decades of fiscal deficits, especially deficits and debt accrued over the last decade that “are likely to weigh heavily on the country’s future growth.”
This year (2013-14) the U.S. moved back to fifth place, with the WEF noting that “the deficit is narrowing for the first time since the onset of the financial crisis.” Guess who has opposed nearly all the policies that led to the reduced deficit? The U.S. Chamber of Commerce, the Business Roundtable, and big banks, corporations and insurance companies that are avid supporters of Common Core.
Remember. The College Board and the ACT were major players in the development of the Common Core. They say they’ve “aligned” their products to it. Notice now how that alignment is being used.
American public education is in deeper trouble than many thought, and reliance on the ACT and the College Board is a significant cause of that problem.
Here is how SBAC intends to set achievement levels as announced today.
************************************************************************************************
At a conference in Minneapolis, MN, state education chiefs met in a public session and approved implementation of the achievement level setting design and process. Following the conclusion of the Field Test in June, Smarter Balanced will analyze the performance of more than 20,000 test items and performance tasks. This information will inform the setting of threshold test scores that separate the four performance levels on the assessments.
Achievement level setting will take place in three phases:
1. An online panel (scheduled for October 6-17) will allow up to 250,000 K-12 educators, higher education faculty, parents, and other interested parties to participate virtually in recommending achievement levels.
2. An in-person workshop (October 12–19) with panels of educators and other stakeholders working in grade-level teams will deliberate and make recommendations for the thresholds of the four achievement levels.
3. The vertical articulation committee, a subset of the in-person workshop, will then examine recommendations across all grades to consider the reasonableness of the system of cut scores.
Governing States will meet this fall to review and endorse achievement level recommendations. Input from the online panel will be provided to both the in-person participants and the vertical articulation committee.
“Our approach to achievement level setting emphasizes collaboration and transparency to establish a consistent means of measuring student progress on the Smarter Balanced interim and summative assessments,” noted Smarter Balanced Executive Director Joe Willhoft, Ph.D. “The online panel and the in-person workshop will provide an unprecedented opportunity to engage thousands of educators and interested stakeholders across member states, raising awareness about the importance and rigor of the assessments.”
States will nominate participants for the in-person workshop this spring, and registration for the online panel is now open to all interested K-12 educators and higher education faculty in Smarter Balanced member states.
Governing States also voted to endorse Career Readiness Frameworks, which provide guidance and information to students, parents, teachers, and counselors as students develop their career and academic goals. The Frameworks were developed by a Career Readiness Task Force that included national experts and representatives from Smarter Balanced member states. There are 16 model frameworks, one for each of the career clusters developed by the National Association of State Directors of Career Technical Education Consortium (NASDCTEc). Many states use these clusters to organize curricula for career and technical education (CTE) programs.
In endorsing the Frameworks, Governing States affirmed their expectation that all students should achieve a level of proficiency in English language arts and math that prepares them for success in a wide range of careers, including those that require postsecondary education or training. Use of the Frameworks is voluntary, and states are encouraged to customize them based on career and technical education curricula, state and local labor markets, the educational offerings of postsecondary institutions, and other local needs.