The following was written by William Mathis, vice-chair of the Vermont State Board of Education and Managing Director of the National Education Policy Center in Boulder, CO.
Education Reforms: Everything Important Cannot be Measured
We’re now in our seventieth year of national crisis. “Society is in peril of imminent collapse unless we do something about education,” is the mantra. It would seem that if we had an “imminent” crisis a lifetime ago, something bad would have happened by now. While doomsayers can go back to the Mayan calendar, we can start in the 1950s, when Admiral Rickover attacked the “myth of American educational superiority” and unfavorably compared the United States to other nations. He proclaimed education “our first line of defense.” This was followed by the 1983 report “A Nation at Risk,” which proclaimed that our schools were besieged “by a rising tide of mediocrity that threatens our very future as a Nation and a people.” Unfavorable test score comparisons and military metaphors remain popular with the reformers. These prognostications failed to come true.
Perhaps, the reformers got it wrong.
A saying often attributed to Einstein holds that “Everything that can be measured is not important and everything important cannot be measured.” In focusing on what is easily measured, rather than what is important, we fail to grasp the real problem. To be sure, tests measure reading and math reasonably well and we need to keep tests for that purpose. But that’s only one part of education. Schools also teach children to get along with others, prepare young people for citizenship, encourage creativity, teach job and human skills, integrate communities, teach tolerance and co-operation, and generally prepare students to be contributing members of society. These things are not so easily measured.
Even if we limit ourselves to test scores, as a society, we misread them. That is, the low scores are strongly affected by circumstances outside the schools. Children coming from violent, economically challenged, and drug-addicted homes, as a group, are not going to do as well as their more fortunate classmates. As the family income gap between children has widened, the achievement gap has also widened.
A Stanford professor compared all the school districts in the nation using six different measures of socio-economic well-being and found that a stunning 70% of test scores could be predicted by these six factors. When the PARCC tests, which are used to test “college and career readiness,” were compared with freshman grade point average, the tests predicted only between 1% and 16% of the variation in GPA. What this means is that the tests do a better job of measuring socio-economic status than measuring schools.
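To see concretely what a claim like “70% of test scores could be predicted by these six factors” means, here is a minimal sketch using entirely synthetic, made-up data (not Reardon’s actual study): it fits an ordinary least-squares regression of district scores on six socio-economic indicators and reports R-squared, the share of score variance the model explains.

```python
# Illustrative sketch only: synthetic data, not the Stanford study's data.
# It shows what "X% of test scores could be predicted" means in practice:
# regress district scores on socio-economic indicators and report R-squared.
import numpy as np

rng = np.random.default_rng(0)
n_districts = 500

# Six made-up socio-economic indicators (income, parental education, etc.)
ses = rng.normal(size=(n_districts, 6))

# Synthetic district test scores driven mostly by SES plus noise.
true_weights = np.array([0.9, 0.6, 0.4, 0.3, 0.2, 0.1])
scores = ses @ true_weights + rng.normal(scale=0.8, size=n_districts)

# Ordinary least squares with an intercept.
X = np.column_stack([np.ones(n_districts), ses])
beta, *_ = np.linalg.lstsq(X, scores, rcond=None)
predicted = X @ beta

# R-squared: 1 minus (unexplained variance / total variance).
ss_res = np.sum((scores - predicted) ** 2)
ss_tot = np.sum((scores - scores.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(f"Share of score variance predicted by the six factors: {r_squared:.0%}")
```

A high R-squared here says only that the socio-economic factors predict the scores, not that the schools produced them, which is exactly the point of the paragraph above.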
This pattern has been solidly and consistently confirmed by a mountain of research since the famous Coleman report in 1966. It pointed to family and social problems rather than schools. So what did we do? We collected more data. We now have “data dashboards.” Countless ads on the web tout this lucrative market and proclaim how people can “drill down,” create interactive charts and visuals to provide “deep learning.” They display all manner of things such as differences by ethnic group, technical education, graduation rate and a myriad of exotic esoterica. By all means, we need to continue to collect this important data. The problem is that we already know what the dashboard tells us. What it doesn’t tell us is the nature of the real problems and how to correct them. First, we must look to those things outside the school that affect school performance. Second, in addition to hard data, we must use on-the-ground observations to see whether we provide legitimate opportunities to all children, whether the school is warm and inviting, and whether the curriculum is up to date and well-delivered.
By concentrating only on the easily measurable, we squeeze the life out of schools. We devalue, deemphasize and defund things that lead to a better life, better schools and a better civilization.
Finally, it misses the most essential point. Parents want their children to grow and lead productive, happy lives and contribute to society. They want their children to practice civic virtue and have loving relationships. But these things are not easily measured by a test. “Everything that can be measured is not important and everything important cannot be measured.”
William J. Mathis is Managing Director of the National Education Policy Center and Vice-chair of the Vermont State Board of Education. The views expressed here are not necessarily those of organizations with which he is affiliated.
[i] https://www2.ed.gov/pubs/NatAtRisk/risk.html
[ii] Haran, W. J. (May 1982). “Admiral Hyman G. Rickover, USN: A Decade of Educational Criticism, 1955-64.” Loyola Dissertation. Retrieved July 3, 2018 from https://ecommons.luc.edu/cgi/viewcontent.cgi?article=3077&context=luc_diss
[iii] https://www.washingtonpost.com/news/answer-sheet/wp/2015/02/12/whats-the-purpose-of-education-in-the-21st-century/?noredirect=on&utm_term=.cead22f07401
[iv] Reardon, S. F. (July 2011). “The Widening Academic Achievement Gap Between the Rich and the Poor: New Evidence and Possible Explanations.” Retrieved July 3, 2018 from https://static1.squarespace.com/static/58b70e09db29d6424bcc74fc/t/59263d05c534a59e6984a5fd/1495678214676/reardon+whither+opportunity+-+chapter+5.pdf
[v] Reardon, S. F. (April 2016). School District Socioeconomic Status, Race and Academic Achievement. https://cepa.stanford.edu/…/school-district-socioeconomic-status-race-and-academic-achievement
[vi] https://www.washingtonpost.com/news/answer-sheet/wp/2016/05/27/alice-in-parccland-does-validity-study-really-prove-the-common-core-test-is-valid/?utm_term=.12cf542ae0cf
TRUE. Thanks, Mathis.
From Ch. 6 “Of Standards and Measurement” in “Infidelity to Truth: Education Malpractice in American Public Education” (pp. 49-52):
“This confusion is compounded by what it means to measure something and the similar misuse of the meaning of the word measure by the proponents of the standards and testing regime. Assessment and evaluation perhaps can be used interchangeably but assessment and evaluation are not the same as measurement. Word usage matters!
The Merriam-Webster dictionary definition of measure includes the following:
1a (1): an adequate or due portion (2): a moderate degree; also: moderation, temperance (3): a fixed or suitable limit: bounds b: the dimensions, capacity or amount of something ascertained by measuring c: an estimate of what is to be expected (as of a person or situation) d: (1): a measured quantity (2): amount, degree
2a: an instrument or utensil for measuring b (1): a standard or unit of measurement—see weight table (2): a system of standard units of measure
3: the act or process of measuring
4a (1): melody, tune (2): dance; especially: a slow and stately dance b: rhythmic structure or movement: cadence: as (1): poetic rhythm measured by temporal quantity or accent; specifically: meter (2): musical time c (1): a grouping of a specified number of musical beats located between two consecutive vertical lines on a staff (2): a metrical unit: foot
5: an exact divisor of a number
6: a basis or standard of comparison <wealth is not a measure of happiness>
7: a step planned or taken as a means to an end; specifically: a proposed legislative act
Measure as commonly used in educational standard and measurement discourse comes under definitions 1d, 2, and 3, the rest not being pertinent other than to be used as an obfuscating meaning to cover for the fact that, indeed, there is no true measuring against a standard whatsoever in the educational standards and standardized testing regimes and even in the grading of students. What we are left with in this bastardization of the English language is a bewildering befuddle of confusion that can only serve to deceive many into buying into intellectually bankrupt schemes that invalidly sort, rate and rank students resulting in blatant discrimination with some students rewarded and others punished by various means such as denying opportunities to advance, to not being able to take courses or enroll in desired programs of study.
The most misleading concept/term in education is “measuring student achievement” or “measuring student learning”. The concept has been misleading educators into deluding themselves that the teaching and learning process can be analyzed/assessed using “scientific” methods which are actually pseudo-scientific at best and at worst a complete bastardization of rationo-logical thinking and language usage.
There never has been and never will be any “measuring” of the teaching and learning process and what each individual student learns in their schooling. There is and always has been assessing, evaluating, judging of what students learn but never a true “measuring” of it.
The TESTS MEASURE NOTHING, quite literally when you realize what is actually happening with them. Richard Phelps, a staunch standardized test proponent (he has written at least two books defending the standardized testing malpractices) in the introduction to “Correcting Fallacies About Educational and Psychological Testing” unwittingly lets the cat out of the bag with this statement:
“Physical tests, such as those conducted by engineers, can be standardized, of course, but in this volume, we focus on the measurement of latent (i.e., nonobservable) mental, and not physical, traits.”
Notice how he is trying to assert by proximity that educational standardized testing and the testing done by engineers are basically the same, in other words a “truly scientific endeavor.” Asserting sameness by proximity is not a good rhetorical/debating technique.
Since there is no agreement on a standard unit of learning, there is no exemplar of that standard unit and there is no measuring device calibrated against said non-existent standard unit, how is it possible to “measure the nonobservable”?
PURE LOGICAL INSANITY!
Finally, what the proponents of the educational standards and standardized testing regime don’t appear to understand is that in many areas of human feelings and interactions there cannot be any measurement. How does one measure the love of one’s spouse, children, parents or friends? How does one measure what is going on in the heart and mind of a distressed person who has just lost a loved one? Why do we even begin to think that we can measure what goes on in the body and brain of the student who is learning any subject matter considering all the various hormonal and endocrinal influences occurring outside the individual’s control, with the hundreds of millions if not billions of neuronal firings going on at any given moment that partially influence what happens in the mind of the student in a teaching and learning situation? How do we believe that the thousands and thousands of environmental influences on each individual could begin to be measured and accounted for? Are proponents of the educational standards and standardized testing “measurement” regime that arrogant, hubristic and presumptuous to believe that they hold the key to measuring the teaching and learning process or more specifically, the learning, aka, student achievement, of an individual student?
Considering the facts of the misuse of language, logic and common sense as outlined above, the only wise course of action is to immediately cease and desist, to abandon those malpractices that harm so many students and contravene the state’s responsibility in providing a public education for all students. The billions of dollars spent by states on the educational standards and standardized testing regime would then be freed up to provide a better education for all students through perhaps such things as smaller class sizes, needed social services, foreign language instruction, arts programs, etc. And the state, by approving and mandating the fake standards and false measuring of student learning that are the malpractices of educational standards and standardized testing, is surely guilty of not promoting “the welfare of the individual so that each person may savor the right to life, liberty, the pursuit of happiness, and the fruits of their own industry.”
Duane, this statement is hilarious: “Physical tests, such as those conducted by engineers, can be standardized, of course, but in this volume, we focus on the measurement of latent (i.e., nonobservable) mental, and not physical, traits.” Measuring the non-observable calls to mind the scamming tailors in “The Emperor’s New Clothes.”
Supposedly measuring the non-observable is EXACTLY what the proponents of standardized testing tell the world is happening in that process . . . and that it is being done objectively and scientifically at that, eh!
Some astronomers claim to have measured the unobservable (dark matter), but they really have not.
What they have really measured are things (e.g., the movement of stars about galactic centers) that might (indirectly) indicate dark matter but might just indicate that their theoretical understanding of gravity is incomplete.
Claims to have “measured the unobservable” should always be taken with a grain (or block) of salt.
Yes, it’s so obvious a truth that we don’t spend enough time on it. There are so many things that make measurement itself absurd. “More Than A Score” is not a bad slogan. It’s emotional, simple, evocative. Yet, the truth of the matter is far more radical than that slogan. The score is very likely of very little value, if any. Other indicators should by far override “scores” (exams, bank accounts, rankings) in terms of importance for senses of worth and direction.
And of course, as always, there’s Wilson.
For a little Wilson on measuring and boundary conditions see below.
“Everything that can be measured is not important and everything important cannot be measured.”
That is a mangling of the statement normally attributed to Einstein and actually has quite a different meaning from
“Not everything that counts can be counted, and not everything that can be counted counts.”
And by all indications, it was not Einstein who said it, but William Bruce Cameron
https://quoteinvestigator.com/2010/05/26/everything-counts-einstein/
I seriously doubt Einstein (or any other scientist) would ever have claimed either that “Everything that can be measured is not important” or that “everything important cannot be measured.”
After all, gravity can be measured and is surely important!
Without it, the Earth would not orbit the sun and we wouldn’t be held to the Earth.
Of course, measurement is often crucial in the world, when it determines what can or can’t be done, or what is it isn’t true, how long things will take, etc.
The thing is, there had grown an obsession around it, especially where it involves things that can only be very indirectly measured if at all, and even things that should not be measured, at all.
And, I know I don’t need to tell you that. This is just a kind of corroboration.
Has grown
I agree that there is an obsession with measurement — which is conveyed by the actual quote — and I applaud William Mathis for pointing that out.
But, unfortunately, the misquote that he attributed to Einstein does not convey the original meaning and is not a claim that any scientist would ever make.
what is or isn’t true
“What is it isn’t true”
Sounds like something Rudy “Truth is not Truth” Giuliani would say
Or Bill “That depends on what the meaning of is is” Clinton
“What is it isn’t true”
What isn’t is true
What is, it is not
For red, it is blue
And cold, it is hot
Just a correction.
“what is it isn’t true”
Should be
“what is or isn’t true”
For those keeping score.
Between Rudy the Mayor of the Country and Trump the King of the World, what is it that is or isn’t true? And why would or wouldn’t it be . . . 🇷🇺?
Actually, even the title of this post does not convey the message correctly
Rather than “Everything important cannot be measured”, it should really be “Not everything important can be measured” because some things that are important (like gravity) actually can be measured, so “everything important cannot be measured” is actually a false statement.
It’s not a matter of being pedantic, either, because not everyone who reads the piece by Mathis and posts like Diane’s will be familiar with the original quote and when they read the misquote, they might actually say “that’s nonsense”. And they would be correct in their assessment.
I am reminded of Robert Fulghum’s little book, “All I Really Need To Know I Learned in Kindergarten,” that teaches us of the immeasurable value of respecting self and others and learning to function cooperatively in a community. Clearly, the right wing zealots and hedge fund managers never read it. They are all about promoting themselves over community and survival of the fittest.
We should beware of the pendulum swinging from extreme, thoughtless testimania back to extreme, thoughtless testhostility. I agree that current tests and testing practice are bad, because they hardly, if at all, test what they pretend to test, but measure only reading speed, test anxiety, and test wiseness. This is, as Peter Sacks rightly argued in 2000 in his book “Standardized Minds,” the actual reason for the strong correlation between test scores and economic status! If you do not say this, you unintentionally reinforce the false belief that these tests are valid. We do have excellent, valid tests of important human traits which Einstein could not have known about. They are psychologically made tests, not statistically fabricated ones. But their construction costs time and money, and they do not provide the test industry with profits as bad tests do. In fact, they are off limits for high-stakes testing of people. They are only made for evaluating teaching methods, lesson plans, and educational policy-making. If we teach all kids well, we do not need to select them and can focus instead on self-selection and cooperation (Deming). Admission to scarce positions would then be made on the basis of a true lottery, not on the basis of a pseudo-scientific lottery.
“We do have excellent, valid tests of important human traits which Einstein could not have known about. They are psychologically made tests, not statistically fabricated ones.”
Georg, please name those tests. And what then distinguishes them from the “statistically fabricated ones”?
Although I think I’ve allowed you enough rope to hang yourself (LOL), I really am interested in your response.
I thought for a minute you meant Myers-Briggs (which, at least in the case of mine & my husband’s, were eerily accurate). Echoing Duane: what are the psychologically-constructed tests which evaluate teaching methods, lesson plans and educational policy-making?
I knew I had read about de Maistre in The Proper Study of Mankind.
He very much factors into what Tolstoy wrote in War and Peace, according to Berlin’s famous essay ‘The Hedgehog and the Fox’. Tolstoy even requested de Maistre’s letters and notes as he wrote the novel. This all has much to do with an obsession with the concrete and verifiable, . . . Measurability.
“To be sure, tests measure reading and math reasonably well and we need to keep tests for that purpose”
I think I should rephrase Mathis here. Try this: when we give tests to try to make observations about reading and math achievement and ability, some of the information might be of value if teachers are aware of it in time to adjust their teaching in reaction to it.
We all know that we do not measure learning. Only the most obdurate believe that this is anything but a metaphor for what we are doing. The literalization of that metaphor has produced the monstrosity of modern testing, packaging the metaphoric in obscure language that shields the public from true understanding of the actual thing that is occurring. When we test, we get an idea of what is going on when a kid reads or does math. It is just an idea.
Of course, that language will never convince a school board, legislator, or superintendent to spend money on it, so the sales people use the metaphor.
That line also caught my eye, Roy.
But I am not sure at all that “we all know that we do not measure learning” as evidenced by Mathis’s statement and the continuing reliance on the false and error-filled concepts of the standards and testing malpractice regime. I wish I didn’t have to say “Say it ain’t so, Joe”, but. . . .
I was going to take issue with that line, too, Roy — at least the reading part (can’t speak for math). And I am not an Eng teacher but just going by tests my kids had in ’90’s/’00’s, & sample Q’s from stdzd ELA tests, I agree w/Georg above that reading tests measure reading speed, test anxiety, & test ‘wiseness’.
If I were a reading teacher, I would try mainly to figure out a way to determine how much my students were reading. (And, of course, devise methods other than tests to get a handle on their reading comprehension.)
Anecdote: as a Mom, I followed (a bit too anxiously) how much my kids were reading, & worked hard to find books they would like. I worried that my eldest barely read for pleasure (other than the occasional perfect find e.g. David Sedaris). At about 12 [late ’90s] he picked up on this & remarked, “you know I really read quite a bit.” How could I have forgotten the computer? The kid actually was – constantly – reading!
Bravo!
From Ch. 9, “Instrumentation,” of Wilson’s “Educational Standards and the Problem of Error”:
Boundary conditions
Another fact of Science often conveniently forgotten is that the precision of the physical sciences – that is, their ability to obtain (almost) identical results in replicated experiments – is directly related to our ability to control the boundary conditions of the experiment: to prevent heat loss, to create a vacuum, to maintain a constant magnetic field, and so on. The precision of physics is specifically related to our ability to create a completely controlled (and hence artificial) environment in which to construct and conduct the experiment. The formulas of dynamics are very accurate in predicting the velocities of objects in free fall in a known gravity field in a vacuum. They are hopeless in predicting such velocities for a skydiver who jumps from a real aircraft in a real atmosphere. She will not reach the ground at the same time as a bunch of feathers or a lead ball thrown out at the same time, nor, luckily for her, at any time predicted by the formulas of simple dynamics. The point to note is that controlling the boundary conditions often produces an artificial environment which makes the data unusable in the ‘uncontrolled’ world.
This excursion into elementary physics is occasioned not only by nostalgia, but by a desire to clarify some of the relationships between instrument precision and measurement precision in that most precise of sciences, and to point out that whilst precision in Physics certainly cannot be greater than that of the measuring instrument, and any calculation based on that measurement is limited by the empirical accuracy of the attendant theory, that in most cases these two variables are not the main limitation on replicable accuracy. It is rather the stability of boundary conditions, the physical scientist’s ability to artificially freeze all other significant variables, that allows such precision, predictability and control in these sciences.
And this is the precise problem we face when we try to measure people. For the boundary condition for stable human behaviour (and all measurement of people, all assessments, all tests, all examinations, must elicit or refer to some form of behaviour), is a stable human mind. But the individual human organism is not a computer. It does not produce a unique response to the same situation, if for no other reason than that the ‘same’ situation never reoccurs. Perception and conception, and hence response, to ‘identical’ situations invariably differ, as the variables that affect such reactions – attention, mood, focus, metabolic rate, tiredness, visualisations, imagination, memory, habit, divergence, growth etc. – come into play.
As Kyburg (1984) describes it:
‘Measurement makes sense only when the standards are reproducible, permanence over time being considered a form of reproducibility. Furthermore, the usefulness of measuring according to this scale depends on some form of reproducibility or permanence among the objects or processes being measured. (p190).’
So the very concept of a ‘true’ measurement resides in the assumption of a stability and permanence in the characteristic being measured, and the boundary conditions of the measurement. Lack of these conditions does not represent so much an error of measurement, as a discrepancy with fundamental assumptions.”
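To make Wilson’s skydiver point concrete, here is a minimal sketch in Python, with made-up mass and drag numbers (so purely illustrative): it compares the vacuum free-fall prediction v = g·t with a crude quadratic-drag model. The drag version levels off near terminal velocity while the vacuum formula keeps climbing, which is exactly the boundary-conditions point.

```python
# A rough sketch of Wilson's skydiver example, with assumed parameters:
# the vacuum formula v = g*t grows without bound, while a simple
# quadratic-drag model levels off at terminal velocity. Same "law",
# different boundary conditions, very different predictions.
G = 9.81        # gravitational acceleration, m/s^2
MASS = 80.0     # skydiver mass, kg (assumed)
DRAG_K = 0.25   # lumped drag coefficient, kg/m (assumed)
DT = 0.01       # time step for the numerical integration, s

def vacuum_velocity(t):
    """Free-fall speed after t seconds with no atmosphere."""
    return G * t

def drag_velocity(t):
    """Free-fall speed after t seconds with quadratic air resistance,
    integrated with a simple Euler step: dv/dt = g - (k/m) * v^2."""
    v = 0.0
    for _ in range(int(t / DT)):
        v += (G - (DRAG_K / MASS) * v * v) * DT
    return v

for t in (5, 15, 30, 60):
    print(f"t = {t:2d} s: vacuum {vacuum_velocity(t):6.1f} m/s, "
          f"with drag {drag_velocity(t):6.1f} m/s")
```

With these assumed numbers the drag model settles near 56 m/s while the vacuum formula predicts hundreds of metres per second after a minute: the formula is not wrong, but it only holds inside the controlled boundary conditions.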
Excellent cite, thank you. I would ask you to spread the word to every economist & datametrician everywhere, but I suspect they are paid for lulling the public into a false sense of predictability.
Reproducibility is a key element of science, and lack of reproducibility is a sure sign that something is not science.
VAM is the poster child for lack of reproducibility. A teacher can be rated effective one year and ineffective the next, or even simultaneously effective and ineffective if they taught two different grade levels in the same year.
Also the reason that folks like Raj Chettypicker are not scientists.
That and his chettypicking.
Nothing important can be validly and reliably measured based on these tests of these “standards” in ELA. Whether an education official can figure out the reasons why is, I think, a quite valid measure of his or her intelligence.
or pundit
OK. I did overstate that a bit. The ELA tests are reliable (though not valid) measures of socioeconomic status and ZIP Code.
And they aren’t “measures.” There is a correlation between SES, i.e., ZIP code, and standardized test scores, and not even that strong a one at that. But a correlation is not a “measure,” and to call it a measure is to misuse the language just as the edudeformers would like us to misuse it to further their agenda.
How to Prevent Another PARCC Mugging: A Public Service Announcement
The Common Core Curriculum Commissariat College and Career Ready Assessment Program (CCCCCCRAP) needs to be scrapped. Here are a few of the reasons why:
1. The CCSS ELA exams are invalid.
First, much of attainment in ELA consists in world knowledge (knowledge of what—the stuff of declarative memories of subject matter). The “standards” being tested cover almost no world knowledge and so the tests based on those standards miss much of what constitutes attainment in this subject. Imagine a test of biology that left out almost all world knowledge about biology and covered only biology “skills” like—I don’t know—slide-staining ability—and you’ll get what I mean here. This has been a problem with all of these summative standardized tests in ELA since their inception.
Second, much of attainment in ELA consists in procedural knowledge (knowledge of how—the stuff of procedural memories of subject matter). The “standards” being tested define skills so vaguely and so generally that they cannot be validly operationalized for testing purposes as written.
Third, nothing that students do on these exams EVEN REMOTELY resembles real reading and writing as it is actually done in the real world. The test consists largely of what I call New Criticism Lite, or New Criticism for Dummies—inane exercises on identification of examples of literary elements that for the most part skip over entirely what is being communicated in the piece of writing. In other words, these are tests of literature that for the most part skip over the literature, tests of the reading of informative texts that for the most part skip over the content of those texts. Since what is done on these tests does not resemble, even remotely, what actual readers and writers do in the real world when they actually read and write, the tests, ipso facto, cannot be valid tests of real reading and writing.
Fourth, standard standardized test development practice requires that the testing instrument be validated. Such validation requires that the test maker show that the test correlates strongly with other accepted measures of what is being tested, both generally and specifically (that is, with regard to specific materials and/or skills being tested). No such validation was done for these tests. NONE. And as they are written, based on the standards they are based upon, none COULD BE done. Where is the independent measure of proficiency in CCSS.Literacy.ELA.11-12.4b against which the items in PARCC that are supposed to measure that standard on this test have been validated? Answer: There is no such measure. None. And PARCC has not been validated against it, obviously LOL. So, the tests fail to meet a minimal standard for a high-stakes standardized assessment—that they have been independently validated.
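As an aside, for anyone unsure what that missing validation step would even look like, here is a minimal, purely hypothetical sketch: correlate scores on a new test with an independent, accepted measure of the same skill and report the resulting validity coefficient. Every number in it is invented; the point above is precisely that no such study exists for these exams.

```python
# Hypothetical sketch of criterion validation: correlate a new test with an
# independent, accepted measure of the same skill. All data below are
# synthetic; nothing here comes from PARCC or any real validation study.
import numpy as np

rng = np.random.default_rng(1)
n_students = 200

# An independent, accepted measure of the skill (e.g., graded portfolios).
criterion = rng.normal(loc=70, scale=10, size=n_students)

# A new test that tracks the criterion only loosely.
new_test = 0.5 * criterion + rng.normal(scale=12, size=n_students)

# Pearson correlation: the usual "validity coefficient" reported in
# test-development technical documentation.
r = np.corrcoef(new_test, criterion)[0, 1]
print(f"Validity coefficient (Pearson r): {r:.2f}")
print(f"Shared variance (r squared): {r * r:.0%}")
```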
2. The test formats are inappropriate.
First, the tests consist largely of objective-format items (multiple-choice and EBSR). These item types are most appropriate for testing very low-level skills (e.g., recall of factual detail). However, on these tests, such item formats are pressed into a kind of service for which they are, generally, not appropriate. They are used to test “higher-order thinking.” The test questions therefore tend to be tricky and convoluted. The test makers, these days, all insist on answer choices all being plausible. Well, what does plausible mean? Well, at a minimum, plausible means “reasonable.” So, the questions are supposed to deal with higher-order thinking, and the wrong answers are all supposed to be plausible, so the test questions end up being extraordinarily complex and confusing and tricky, all because the “experts” who designed these tests didn’t understand the most basic stuff about creating assessments–that objective question formats are generally not great for testing higher-order thinking, for example. For many of the sample released questions, there is, arguably, no answer among the answer choices that is correct or more than one answer that is correct, or the question simply is not, arguably, actually answerable as written.
Second, at the early grades, the tests end up being as much a test of keyboarding skills as of attainment in ELA. The online testing format is entirely inappropriate for most third graders.
3. The tests are diagnostically and instructionally useless.
Many kinds of assessment—diagnostic assessment, formative assessment, performative assessment, some classroom summative assessment—have instructional value. They can be used to inform instruction and/or are themselves instructive. The results of these tests are not broken down in any way that is of diagnostic or instructional use. Teachers and students cannot even see the tests to find out what students got wrong on them and why. So the tests are of no diagnostic or instructional value. None. None whatsoever.
4. The tests have enormous incurred costs and opportunity costs.
First, they steal away valuable instructional time. Administrators at many schools now report that they spend as much as a THIRD of the school year preparing students to take these tests. That time includes the actual time spent taking the tests, the time spent taking pretests and benchmark tests and other practice tests, the time spent on test prep materials, the time spent doing exercises and activities in textbooks and online materials that have been modeled on the test questions in order to prepare kids to answer questions of those kinds, and the time spent on reporting, data analysis, data chats, proctoring, and other test housekeeping.
Second, they have enormous cost in dollars. In 2010-11, the US spent $1.7 billion on state standardized testing alone. Under CCSS, this increases. The PARCC contract by itself is worth over a billion dollars to Pearson in the first three years, and you have to add the cost of SBAC and the other state tests (another billion and a half?) to that. No one, to my knowledge, has accurately estimated the cost of the computer upgrades that will be necessary for online testing of every child, but those costs probably run to $50 or $60 billion. This is money that could be spent on stuff that matters—on making sure that poor kids have eye exams and warm clothes and food in their bellies, on making sure that libraries are open and that schools have nurses on duty to keep kids from dying. How many dead kids is all this testing worth, given that it is, again, of no instructional value? IF THE ANSWER TO THAT IS NOT OBVIOUS TO YOU, YOU SHOULD NOT BE ALLOWED ANYWHERE NEAR A SCHOOL OR AN EDUCATIONAL POLICY-MAKING DESK.
5. The tests distort curricula and pedagogy.
The tests drive how and what people teach, and they drive much of what is created by curriculum developers. This is a vast subject, so I won’t go into it in this brief note. Suffice it to say that the distortions are grave. In U.S. curriculum development today, the tail is wagging the dog.
6. The tests are abusive and demotivating.
Our prime directive as educators is to nurture intrinsic motivation—to create independent, life-long learners. The tests create climates of anxiety and fear. Both science and common sense teach that extrinsic punishment and reward systems like this testing system are highly DEMOTIVATING for cognitive tasks. The summative standardized testing system is a really, really backward extrinsic punishment and reward approach to motivation. It reminds me of the line from the alphabet in the Puritan New England Primer, the first textbook published on these shores:
F
The idle Fool
Is whip’t in school.
7. The tests have shown no positive results.
We have had almost two decades, now, of standards-and-testing-based accountability under NCLB and its successor. We have seen only minuscule increases in outcomes, and those are well within the margin of error of the calculations. Simply from the Hawthorne Effect, we should have seen SOME improvement!!! And that suggests that the testing has actually DECREASED OUTCOMES, which is consistent with what we know about the demotivational effects of extrinsic punishment and reward systems. It’s the height of stupidity to look at a clearly failed approach and to say, “Gee, we should do a lot more of that.”
8. The tests will worsen the achievement and gender gaps.
Both the achievement and gender gaps in educational performance are largely due to motivational issues, and these tests and the curricula and pedagogical strategies tied to them are extremely demotivating. They create new expectations and new hurdles that will widen existing gaps, not close them. Ten percent fewer boys than girls, BTW, received a proficient score on the NY CCSS exams–this in a time when 60 percent of kids in college and 3/5ths of people in MA programs are female. The CCSS exams drive more regimentation and standardization of curricula, which will further turn off kids already turned off by school, causing more to tune out and drop out.
This message not brought to you by
PARCC: Spell that backward
notSmarter, imBalanced
AIRy nonsense
CTB McGraw-SkillDrill
MAP to nowhere
Scholastic Common Core Achievement Test (SCCAT)
The Bill and Melinda Gates Foundation (“All your base are belong to us”)