Writing in Forbes, where he is a columnist, Peter Greene explains why the Big Standardized Tests are a very expensive waste of money and time.
The next “policy priority” for ed reformers will be putting in security features and proctors for remote testing of public school students because obviously remote testing requires a new testing police force.
I feel sorry for US public school students. Only in this nutty country is testing them the first and only priority. They should be able to petition for new management. They’ve been stuck with this same bunch of testing freaks for the last 20 years.
In education as in every other concern there will be no progress so long as political campaigns are “sponsored” by corporate commercial interests.
Peter’s presence as a voice for education at Forbes is important. He does a really great job in this post.
Here is another topic that I wish had greater publicity. It is a study of how “philanthropies” invest money in education with attention to racial equity and justice, and also to opportunity for students who live in poverty. There is so much billionaire posturing about saving children from failing schools. Take a look.
http://schottfoundation.org/justiceisthefoundation?eType=EmailBlastContent&eId=a66d8650-a910-4566-928e-708de34e80e3
An eye-opener at that link.
My formula: fund K12 ed with $billions of new revenue obtained by taxing those holding $billions. Poof: no more waiting for “philanthropic” grants.
Greene’s article makes perfect sense, and most teachers would agree with him. Unfortunately, big money is the puppet master in this country. They will make public schools dance to their tune even though it is not in the best interest of students. I am very disappointed in Biden and Cardona. Biden has a very short memory of the campaign promises he made to public schools. https://www.youtube.com/watch?v=haGLsCBPWKA
Me TOO! I am sick. Did the DFERS get to Biden? If so, DFERS, “GO AWAY.”
DFERS were pushing testing, as well as everyone funded by Gates.
Instead of wringing our hands, maybe it’s time to offer a viable alternative. The big test takes 4 months to get back to teachers, rendering it useless.
What if educators designed an assessment that is meaningful? Include a small pretest given at the school and going directly to teachers for their information.
This will give a “jumping off” spot for student placement. There is absolutely no need for politicians or higher up educators to get their hands on them.
Assessment is only as good as the information gathered and its application to the education of the student. We must insist these assessments are not politicized.
Imagine the surprise when administrators realize student skills are all over the board. This will confirm the argument for small class size.
Just saying!
Students should have privacy rights over their data. It should not be the property of testing companies that sell the information to third party vendors.
Who’s wringing their hands?
DeVos’ tenure was cut short by Trump’s defeat.
Biden’s tenure may be cut short if he continues to listen to CAP’s education policy. The confirmation hearing of Neera Tanden provides a perfect opportunity to expose the billionaires stealing from Main Street.
Standardized tests are not all the same, so talk about “standardized tests” in general tends to commit what linguistic philosophers call a “category error,” a type of logical fallacy. George Lakoff wrote a book about categorization called Women, Fire, and Dangerous Things. He took the title from the classification system for nouns of the indigenous Australian language Dyirbal. One of the noun categories in this language includes words referring to women, things with which one does violence (such as spears), phenomena that can kill (fire), and dangerous animals (such as snakes and scorpions). What makes this category bizarre to our ears is that the things in the category don’t actually share significant, defining characteristics. Women and things associated with them are not all dangerous. Speaking of all things balan (this category in the Dyirbal language) therefore doesn’t make sense. The same is true of the phrase “standardized test.” It lumps together objects that are DIFFERENT FROM one another in profoundly important ways. Imagine a category, “ziblac,” that includes Greyhound buses, a mole on Socrates’s forehead, shoelaces, Pegasus, and the square roots of negative numbers. What could you say that was intelligible about things in the category “ziblac”? Well, nothing. Talking about ziblacs would inevitably involve committing category errors: assuming that things are similar because they share a category name when, in fact, they aren’t. If you say, “You can ride ziblacs” or “Ziblacs are imaginary” or “Ziblacs don’t exist,” you will often be spouting nonsense. Yes, some ziblacs belong to the class of things you can ride (Greyhound buses, Pegasus), but some do not (shoelaces, imaginary numbers), and you can’t actually ride Pegasus because Pegasus exists only in stories. Some are imaginary (Pegasus, imaginary numbers), but they are imaginary in very different senses of the term. And some don’t exist (Pegasus, the mole on Socrates’s forehead), but don’t exist in very different ways (the former because it’s fictional, the latter because Socrates died a long time ago). When we talk of “standardized tests,” we are using such an ill-defined category, and a lot of nonsense follows from that fact.
Please note that there are many VERY DIFFERENT definitions of what “standardized test” means. The usual technical definition from decades ago was “a test that had been standardized, or normalized.” This means that the raw scores on the test had been converted to express them in terms of “standard scores”: their number of standard deviations from the mean. You do this by starting with the raw score on a test, subtracting the population mean from it, and then dividing the difference by the population standard deviation. The result is a Z-score (or a T-score if the Z-score is then rescaled so that the mean is 50 and the standard deviation is 10). People do this kind of “standardizing,” or “normalization,” in order to compare scores across students and subpopulations. Let’s call this “Standardized Test Definition 1.” Many measures converted in such a way yield a so-called “bell curve” because they deal with characteristics that are normally distributed. An IQ test is supposed to be a test of this type. The Stanford 10 is such a Standardized Test, Definition 1.
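For readers who want to see the arithmetic, here is a minimal sketch of that conversion in Python. The function names and the sample scores are made up for illustration; they do not come from any actual test.

```python
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Convert raw scores to Z-scores: (raw - population mean) / population SD."""
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)  # population standard deviation
    if sigma == 0:
        raise ValueError("all scores are identical; Z-scores are undefined")
    return [(x - mu) / sigma for x in raw_scores]


def t_scores(raw_scores):
    """Rescale Z-scores so the mean is 50 and the standard deviation is 10."""
    return [50 + 10 * z for z in z_scores(raw_scores)]


# Hypothetical raw scores for eight students (mean 5, population SD 2).
raw = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_scores(raw))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(t_scores(raw))  # [35.0, 45.0, 45.0, 45.0, 50.0, 50.0, 60.0, 70.0]
```

Note that the sketch treats the eight students as the whole population. A real test publisher would take the mean and standard deviation from a norming sample, not from one classroom.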
Another, much broader definition is “any test that is given in a consistent form, following consistent procedures.” Let’s call this “Standardized Test Definition 2.” To understand how dramatically this definition of “standardized test” differs from the first one, consider the following distinction: A norm-referenced test is one in which student performance is ranked based on comparison with the scores of his or her peers, using normalized, or standardized, scores. One of the reasons for standardized scores as per Definition 1, above, is to do such comparisons to norms. A criterion-referenced test is one in which student performance is evaluated against some absolute criterion: knowledge or mastery of some set of facts or skills. Which kind of scoring one does depends on what one is interested in: how the student compares with other students (norm-referenced) or whether the student has achieved some absolute “standard,” that is, has or has not demonstrated knowledge of some set of facts or some skill (criterion-referenced). So, Standardized Test Type 2 is a much broader category, and includes both norm-referenced tests and criterion-referenced tests. In fact, any test can be looked at in the norm-referenced or criterion-referenced way, but which one does makes a big difference. In the case of criterion-referenced tests, one is interested in whether little Johnny knows that 2 + 2 = 4. In the case of norm-referenced tests, one is interested in whether little Johnny is more or less likely than students in general to know that 2 + 2 = 4. The score for a criterion-referenced test is supposed to measure absolute attainment. The score for a norm-referenced test is supposed to measure relative attainment. When states first started giving mandated state tests, a big argument given for these was that they needed to know whether students were achieving absolute standards, not just how they compared to other students. So, these state tests were supposed to be criterion-referenced tests, in which the reported score was a measure of absolute attainment rather than relative attainment, which brings us to a third definition.
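The difference between the two readings of the same raw score can be sketched in a few lines of Python. Everything here is hypothetical: the function names, the scores, and the cut score of 24 are invented for illustration, and percentile rank is computed by only one of several common conventions (percent of peers scoring strictly below).

```python
def percentile_rank(score, peer_scores):
    """Norm-referenced reading: percent of peers scoring strictly below `score`."""
    below = sum(1 for s in peer_scores if s < score)
    return 100.0 * below / len(peer_scores)


def meets_criterion(score, cut_score):
    """Criterion-referenced reading: did the student reach the fixed cut score?"""
    return score >= cut_score


johnny = 22        # hypothetical raw score
cut_score = 24     # hypothetical mastery threshold

strong_peers = [12, 15, 18, 18, 20, 22, 25, 27, 28, 30]
weak_peers = [5, 8, 10, 12, 14, 15, 16, 18, 20, 21]

# The relative reading depends on who else took the test...
print(percentile_rank(johnny, strong_peers))  # 50.0
print(percentile_rank(johnny, weak_peers))    # 100.0

# ...while the absolute reading does not.
print(meets_criterion(johnny, cut_score))     # False
```

Johnny’s raw score never changes, but his norm-referenced standing swings from the middle of the pack to the top depending on the comparison group, while his criterion-referenced result stays the same.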
Yet another definition of “Standardized Test” is “any test that [supposedly] measures attainment of some standard.” Let’s call this “Standardized Test Definition 3.” This brings us to a MAJOR source of category error in discussions of standardized testing. The “standards” that Standardized Tests, Definition 3 supposedly measure vary enormously because some types of items on standards lists, like the CC$$, are easily assessed both reliably (yielding the same results over repeated administrations or across variant forms) and validly (actually measuring what they purport to measure), and some are not. In general, Math standards, for example, contain a lot more reliably and validly assessable items (the student knows his or her times table for positive integers through 12 x 12) than do ELA standards, which tend to be much more vague and broad (e.g., the student will be able to draw inferences from texts). As a result, the problems with the “standardized” state Math tests tend to be quite different from the problems with the state ELA tests, and when people speak of “standardized tests” in general, they are talking about very different things. Deformers simply assume that if people have paid a dedicated testing company to produce a test, that test will reliably and validly test its state standards. This is demonstrably NOT TRUE of the state tests in ELA for a lot of reasons, many of which I have discussed here: https://bobshepherdonline.wordpress.com/2020/03/19/why-we-need-to-end-high-stakes-standardized-testing-now/ . Basically, the state ELA tests are a scam.
Understanding why the state ELA tests are a scam requires detailed knowledge of the tests themselves, which proponents of the tests either don’t have or have but aren’t going to talk about because such proponents are owned by or work for the testing industry. Education deformers and journalists and politicians tend, in my experience, to be EXTRAORDINARILY NAÏVE about this. Their assumption that the ELA tests validly measure what they purport to measure is disastrously wrong.
Which leads me to a final point: Critiques of the state standardized tests are often dismissed by Ed Deformers as crackpot, fringe stuff, and that’s easy for them to do, alas, because some of the critiques are. For example, I’ve read on this blog comments from some folks to the effect that intellectual capabilities and accomplishments can’t be “measured.” The argument seems to be based on the clear differences between “measurement” as applied to physical quantities like temperature and height and “measurement” as applied to intellectual capabilities and accomplishments. The crackpot idea is that the former is possible, and the latter is not. However, it is OBVIOUSLY possible to measure some intellectual capabilities and accomplishments very precisely. I can find out, for example, very precisely how many Kanji (Japanese logograms) you know, if any, or whether you can name the most famous works by Henry David Thoreau and Mary Shelley and George Eliot and T.S. Eliot. If you choose to disdain the use of the term “measurement” to refer to assessment of such knowledge, that’s simply an argument about semantics, and making such arguments gives opponents of state standardized testing a bad name. Such folks get lumped together, by Ed Deformers, with folks who make such fringe arguments.
I hasten to add that most of the so-called “standards” in the CC$$ cannot, in fact, be “measured” because they are too broad and vague. Now, the way to measure something vague is to operationalize it: to substitute for it something concrete and measurable. So, for example, if you wanted to measure whether your friend Yolanda runs your Book Club democratically, you might substitute for the vague term “democratically” some concrete, observable criteria like “takes votes on where and when meetings are held” and “takes votes on what books will be read.” This is called “operationalizing” because you define the thing to be measured (“democratic-ness”) in terms of concrete operations that you can carry out (observing those things). But such operationalizing can easily go wrong. Suppose that you operationalize “loves me” as “buys me expensive presents.” The operational definition falls short in obvious ways. One way to view “tests” is as operational definitions of whatever is being tested. So, an 11th-grade state standardized test in ELA is an operational definition of “mastery of the Common Core State ELA Standards, Grades 11-12.” And as with the operational definition of love as “buys me expensive presents,” it’s a lousy operational definition. Why? Well, there are hundreds of “standards” and only about 50 questions. So, there simply aren’t enough questions to “measure” all those “standards.” And many of the “standards” are very vague and/or very broad (can write narratives, can make inferences from texts, is familiar with foundational texts in American literature and history), too vague and broad to be adequately (validly) measured by one or two multiple-choice test questions apiece. Obviously.
And, of course, all this invalidity in measurement of specific “standards” can’t add up to overall test validity.
For this reason alone–invalidity–the state standardized tests in ELA should have been laughed off the stage decades ago and would have been if people had stopped to think EVEN A BIT about the entire undertaking. It’s shocking, really, that that hasn’t happened–that we are still having to argue about this with people who claim to be pundits (Gates, Coleman, Petrilli, Ian Rosenblum, etc.) but clearly know nothing or next to nothing about these matters.
None of those you mention was ever a teacher. Gates, Coleman, Petrilli, Rosenblum.
Thank you, Diane, for consistently and continually standing up for teachers and their right to do their own testing of their students!!!! Bless you for this!!! Teachers’ grades have always been the best predictors of college success. Ironically, our testing policies are set by know-nothings.
Thanks, Bob, for your wonderful post on standardized tests. You mentioned George Lakoff’s treatment of “categorization” in a book called Women, Fire, and Dangerous Things. A great title and filled with insights.
I have that book and several others.
I began with Lakoff and Johnson’s little book Metaphors We Live By. I have often given talks about their “folk theory of mind” and used their illustration of various functions of mind, depicted in a side view of a skull with a simplified version of an old phrenology poster. I especially enjoyed how the folk theory of mind depicts imagination as a version of the wild thing that MUST be controlled... a perfect setup for enjoying Maurice Sendak’s Where the Wild Things Are.