Andy Hargreaves recently retired as a professor at Boston College. In this article, which appeared in the Toronto Star as part of a debate, he advises Canada to abandon mandatory testing. Canada tests every student in grades 3 and 6.
If you open the article, you can vote for or against mandatory testing.
Don’t you wish our students were tested only in grades 3 and 6?
He writes:
“Finland uses samples. Israel samples a different subject every year in three-year cycles. Provinces and countries are already compared by samples on national and international assessments. Streamline the work of the Education Quality and Accountability Office (EQAO) around samples and this government will meet its accountability requirements and also save a big chunk of more than $130 million over four years that can go straight into the classroom.
“Bigger data is no substitute for better leadership. Some experts believe sampling makes it hard to pinpoint problems in small sub-groups in a school or board, like equitable achievement for a particular ethnic minority. Statisticians have an answer for this – that you can vary the nature or size of the sample to include and protect these groups. But an even better answer is that when subgroups get very tiny in small schools or boards, we don’t need more data about everybody. We just need better feedback from and relationships with the people right in front of us.
“The side effects outweigh the benefits. If you have an illness and try some drugs to ease it, you don’t want the negative side effects to outweigh the benefits. The negative side effects of testing a whole population in any grade are immense. Test results are known to the media and to real estate agents. Some school board administrators put excessive pressure on their schools and teachers in high-poverty areas to hit the numbers. Principals will then do almost anything to get the scores up. The stakes and stress are incredibly high.”
Those tests are ridiculous.
Maybe the intent is to get more kids on antidepressants. Wouldn’t surprise me. There’s a lot of $$$$$ to be made.
“Statisticians have an answer for this – that you can vary the nature or size of the sample to include and protect these groups. But an even better answer is that when subgroups get very tiny in small schools or boards, we don’t need more data about everybody.”
It occurred to me when I was reading this line that the presence of statistics can be used to justify ignoring a population with more substantial needs. A government that wants to ignore a sub-group can claim that they do not ignore that sub-group because they keep statistics on it. But keeping stats and helping folks out are two different things. Just because we keep stats on one group or geographical area does not mean that we are helping them out with money and other support.
All that said, I am sensing that many voices are rising in opposition to the testing malignancy. Right on!
“Statisticians have an answer for this”
Statisticians have a statistical answer for everything.
But a statistical answer is neither required nor desired in every case.
If you ask a stupid question, you will get a stupid answer. Guaranteed.
Many of the questions that statisticians are asking about education (how can we use test scores to evaluate teachers?) are just dumb and are a direct result of the fact that the statisticians don’t understand the subject they are doing statistics on.
Essential thinking: “A government that wants to ignore a sub-group can claim that they do not ignore that sub-group because they keep statistics on it.”
Exactly: if you tested the subgroup, it was noticed. If you did nothing at all to remedy the circumstances and needs of that subgroup, who cares?
In all of this argument about testing, the BIG ISSUE gets left out. That issue is the following: the tests simply aren’t valid and reliable measures of what they are supposed to be measuring, and this is a systematic problem. The standards in ELA are themselves so poorly constructed–so vague–that they cannot be validly operationalized to create objective measures of them, and those standards don’t even treat a lot of what constitutes attainment in ELA, including a lot of declarative knowledge. So, garbage in, garbage out. And the test questions have formats that aren’t appropriate to measuring what they are supposed to measure and tend to be confused and confusing.
And one can’t explain these matters to people in sound bites, which makes turning this into a political debate almost impossible.
“to measuring what they are supposed to measure”
Because they aren’t measuring anything. As staunch standardized testing proponent Richard Phelps has stated “Physical tests, such as those conducted by engineers, can be standardized, of course, but in this volume, we focus on the measurement of latent (i.e., nonobservable) mental, and not physical, traits.”
Someone please explain to me how one can measure the “nonobservable” mental traits when there is no standard unit of learning of those traits, nor is there a measuring device calibrated against said non-existent standard unit of learning.
So, according to Phelps and the psychometricians we are “measuring” the nonobservable with a non-existent measuring device that is not calibrated against a non-existing standard.
Wow, where can I get some of that good stuff he must be ingesting? I haven’t had a thought like that since doing acid 45 years ago.
How indeed can you observe the nonobservable? This is silly. You must observe something to measure it. Exactly what can you observe to “see” understanding? A person can answer questions, write sentences about something, compose poetry, essays, and maybe jump while doing it. The best you can do is to get a vague idea about understanding, which can be part of a conscious development over time. Over time you can begin to feel you are understanding better. No one can tell but you. And sometimes you really do not understand.
“Don’t you wish our students were tested only in grades 3 and 6?”
NO!
Because those tests would still suffer all the inherent onto-epistemological errors and falsehoods and psychometric fudges identified by Noel Wilson that render the usage of any of the results COMPLETELY INVALID*.
WHY would one want to use completely invalid data from those tests for anything?
Certainly doesn’t make sense to me.
Any fool can do statistics and call himself or herself a “statistician.” There are lots of “plug and chug” software packages out there that allow people to do pretty much any sort of statistics that they might desire.
Unfortunately, knowing when statistics are warranted and whether they mean anything requires a brain.
Oops, forgot to include the *:
“Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine. (updated 6/24/13 per Wilson email)
A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing the descriptive information about said interactions is inadequate, insufficient and inferior to the point of invalidity and unacceptability.
A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
Wilson identifies four “frames of reference,” each with distinct assumptions (epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think college professor who “knows” the student’s capabilities and grades them accordingly), the General Frame (think standardized testing that claims to have a “scientific” basis), the Specific Frame (think of learning by objective, like computer-based learning, getting a correct answer before moving on to the next screen), and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error in the process is caused when the assessor confuses and conflates the categories.
Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words all the logical errors involved in the process render any conclusions invalid.
The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid, meaning errorless or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the test and results are. He is an advocate for the test taker, not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. And a basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”) or, to put it in more mundane terms, crap in, crap out.
And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words it attempts to measure “’something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self-evident consequences. And so the circle is complete.”
In other words students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, the “I’m an ‘A’ student” is almost as harmful as “I’m an ‘F’ student” in hindering students becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
From the beginning of the article (with minor corrections):
“There are NO good reasons for testing.
First, testing INVALIDLY AND FALSELY certifies people’s skill and knowledge, including their fitness for higher education.”
In my career I taught without standardized testing, with standardized testing, and at the beginning of high stakes testing, which is the worst educational climate of all. High stakes testing is a political tool, and it does nothing to inform education, particularly when the tests based on the CCSS are deliberately written above grade level expectations to produce a wide net of failure. They also have a movable cut score that enables the privatizers to manipulate proficiency levels capriciously according to the whims of politicians and their corporate overseers. Not only are the tests poorly written and unfair, they have never been legitimately validated. Maybe the time has come to seek a solution in the courts the way VAM was tested. There are also civil rights concerns associated with testing that is designed to deliver large numbers of minority students over to private entities rather than providing them with an equitable public education. Separate and unequal education should not be tolerated in a country founded on democratic principles.
No amount of testing and data mining is a substitute for education. We need to challenge all the fake accountability lies of high stakes testing and privatization. Parents, teachers and social justice groups must defend our young people’s right to a legitimate public education, and throw a monkey wrench into this politicized educational malpractice.
Laura Chapman posted about the Ohio Educational Attainment Summit on Sept. 10, 2018, in Columbus, Ohio. The audience will be Ohio mayors.
Hoover’s Hanushek will speak. Guess who is funding the summit? Surprise answer: Bill and Melinda Gates. The location: the PUBLIC Ohio State University.
Oh, the hypocrisy. Bill Gates’ privatization is responsible for 1/4 of 101,000 students enrolled in Michigan virtual schools who fail to complete even one class.
Hanushek will endorse school choice and high-stakes accountability. He testifies in court against spending more money on education. He is an advocate of using test scores to evaluate teachers and to fire the bottom 5% every year.