This is a terrific new book with essays showing what a farce the current test-based evaluation of teachers is.
It includes the work of several distinguished scholars who understand that it is farcical to judge teacher “quality” by using the scores on standardized tests.
I was happy to write the introduction.
Read the description and you will want to read the book.
One major finding: No state is using the teacher evaluations to improve instruction, only to punish and reward teachers.
This is a hugely valuable book that will help push back against dumb ideas.
Test score evals like FL’s VAM scores simply ignore the scientific method in the most critical way. The problem and the result are predetermined to be equivalent. That means that good teaching is defined as high test scores, then they look at test scores and declare that teachers of students with high scores (or annual gains) are the best. That is like saying that the best doctors have the healthiest patients or the best lawyers have the least guilty clients. A valid method would be to identify a particular teaching practice and measure scores against that (although it is nearly impossible to control for all of the other factors). Then you could simply ask teachers to do that practice that produces test score gains. It would actually be a school improvement model. However, if you look at Marzano’s iObservation, which basically does that (identifies good practices based on test score data) you will notice that his small scale research that produced some test gains has not replicated itself on a large scale even though teachers have proven that they are using these strategies. Basically, the only way to link test score gains to teaching practice is to change the definition of good teaching to equal high scores. That way, it doesn’t even matter if it is true or not!
“However, if you look at Marzano’s iObservation, which basically does that (identifies good practices based on test score data). . .”
Anyone who proposes to “identify good practices based on test score data” obviously doesn’t know what they are talking about since the teaching and learning process is far more complex than a simple score. To make the teaching and learning process one of raising test scores is to commit educational malpractice and in doing so one’s license should be pulled.
Does anyone know if Marzano has retracted his supposed study of interactive white boards and its supposed effect on teacher effectiveness? If he hasn’t, why would I listen to what he has to say?
You and I are making the same point. Marzano identifies practices that are supposed to be effective, but when you put them into the real world, they are not causing massive improvements. The classroom is not a closed system and no amount of number crunching is going to make students perform better. If a test is designed to measure exactly what has been taught, of course the students will do better. However, state tests are not designed that way. In defense of Marzano himself (and not the legions of fanboys and fangirls that have turned him into an education rock star), his research does not rely exclusively on standardized test data. He also looked at many other factors in arriving at his list of best practices. The list is pretty good and there isn’t much in iObservation that is against best practices. The problems are with rater reliability, the high stakes attached to observations, the volume and frequency of observations, the knowledge and skill of the observer, and the sheer number of things that can be labeled effective (or not) in a 30-minute observation. Marzano has said that his system should not be punitive. However, the reason he is raking in mega millions is because of those who want to use his system as a punitive measure!
Utah’s first school “grades” came out this week. The conservative daily in Salt Lake, the Deseret News, lauded the wonderfulness that is Florida’s grading system and stated that Florida’s NAEP test scores have gone up since the grading began. I am TOTALLY opposed to the grading of schools, but I’m wondering if anyone has the actual numbers so that I can write a rebuttal to this tripe.
Here’s the article: http://www.deseretnews.com/article/865585797/Utah7s-school-grades-are-opportunity-for-real-reform.html
Please tell your legislators and journalists over in Utah that they already discovered the reason for our elevated NAEP scores. Florida automatically retains the lowest 10% of third graders. Of course the NAEP, which is taken by 4th graders, showed improvement, because the lowest 10% don’t go on to 4th grade. That is why it helped boost scores in the 1st year for 4th grade, but subsequent years and 8th grade scores are a flat line (and that line is below the national average!). Jeb is like the Wizard of Oz. All you need to do is pull back the curtain a little to see the farce!
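To make the arithmetic behind that retention effect concrete, here is a minimal sketch. The cohort below is invented for illustration (100 hypothetical scale scores, not NAEP or Florida data), and the function name is my own. It shows only the first-year composition effect: holding back the lowest 10% raises the average of the students who move on, without anyone learning anything more.

```python
from statistics import mean


def mean_after_retention(scores, retain_percent=10):
    """Average score of the students promoted after the lowest
    `retain_percent` percent are held back (hypothetical helper)."""
    ranked = sorted(scores)
    n_retained = len(ranked) * retain_percent // 100
    promoted = ranked[n_retained:]
    return mean(promoted)


# Invented cohort: 100 third graders with scale scores 200..299.
cohort = list(range(200, 300))

print(mean(cohort))                  # average if everyone is promoted
print(mean_after_retention(cohort))  # average of the promoted 90%
```

In this made-up cohort the average of the tested group goes up by five points purely because the lowest scorers are no longer in it. What happens in later years, when the retained students do reach 4th grade, is not modeled here.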
La. Purchase, Florida has good 4th grade NAEP scores. But it has a low graduation rate, lower than Alabama, which has not followed Jeb’s recipe.
Standardized tests were not designed to evaluate teachers and they are not valid instruments for doing so. It is time to use the research to help ourselves.
And to use any test results to draw conclusions about anything other than what the test was designed for is UNETHICAL, plain and simple.
AlwaysLearning & Duane Swacker: ya han dado en el blanco/y’all have hit the target [less literally, “you’ve hit the nail on the head!”].
A spate of books is coming that explain in great detail the national hazing ritual known as “high-stakes standardized testing.”
While I strongly lean towards buying the book Diane has highlighted in this posting, let me give a few choice bits from one I am now halfway through, Jim Horn & Denise Wilburn’s THE MISMEASURE OF EDUCATION (very recently available: July 12, 2013).
Think tying test scores to VAManiacal teacher evals is difficult for teaching staffs, students, and parents to grasp? Well, surely the people in charge know what it all means, right? After all, given that education is the linchpin of national security & prosperity & truth, justice and the American way, nobody in charge would just “leave it to the experts” now would they?
Horn & Wilburn 2013, p. 105, on a money quote from an educrat testifying in 2004 before the TN House Education Committee on the Tennessee Value-Added Assessment System, the ‘Frankenchild’ of Dr. William Sanders: “Vernon Coffey, Director of Grainger County Schools and former Tennessee Commissioner of Education, stated that he believed TVAAS to be as reliable and valid as SAT and ACT, while admitting that ‘I don’t understand all the numbers, but I’m not supposed to.’”
So all it takes to be a Commissioner of Education these days is adherence to the time-honored [?] adages “ignorance is a virtue” and “here’s my nose: lead me where you will”? Wonder how much ignorance it takes to be Secretary of Education of these US of A?
On the following page (106) Dr. William Sanders delivers himself of the most modest pat on the back you could imagine, declaring that his VAManiacal system that undergirds the TVAAS system “has been considered by most reviewers to be the most advanced, the fairest education assessment system has been developed in the country today.”
Perhaps he needed to do some more reading. How about articles on VAM by the contributors to MANY CHILDREN LEFT BEHIND (2004): Deborah Meier, Alfie Kohn, Linda Darling-Hammond, Theodore R. Sizer, George Wood and Monty Neill? Or maybe they didn’t qualify as “reviewers” since they might not have given him an A+?
Then two quotes from the book, each preceding a chapter: “What was once educationally significant, but difficult to measure, has been replaced by what is insignificant and easy to measure. So now we test how well we have taught what we do not value.” (Art Costa, professor emeritus at Cal State-Fullerton, p. 1) and “Initially, we use data as a way to think hard about difficult problems, but then we over rely on data as a way to avoid thinking about difficult problems. We surrender our better judgment and leave it to the algorithm.” (Joe Flood, author of The Fires, p. 55).
The last words of Banesh Hoffmann’s THE TYRANNY OF TESTING (1964), p. 217: “…let us shun overdependence on tests that are blind to dedication and creativity, and biased against depth and subtlety. For that way lies testolatry.”
🙂
Thanks for mentioning TN as an example of bad practices. We’re in reform Hell down here in TN. Last week the Nashville Tennessean reported that Cheats-for-Change Kevin Huffman blew the ratings on 45 school districts because of coding flaws and his convoluted evaluation scheme.
http://www.tennessean.com/article/20130816/NEWS04/308160104/TCAP-coding-flaws-issue-45-school-districts
Two school districts are taking legal action against TN DoEd. Here’s the story:
https://www.facebook.com/RemoveKevinHuffman
“They note that Tennessee’s Department of Education blew the ratings for at least 45 school systems. Bad demographic data was used to score school districts, causing such obvious errors as rating both Franklin and Williamson County–commonly recognized as two of the state’s best school systems–as “needs improvement.” But Huffman allowed only one school district to appeal their rating: Nashville schools, whose “needs improvement” rating was correctly upgraded to “intermediate.”
As a result, Williamson County and the Franklin Special School District in a rare joint session agreed to support legal action against Huffman’s office. Williamson County’s superintendent pointed out that Huffman’s stance against accuracy “calls into question the credibility of the Tennessee Department of Education and its leadership.” Ain’t that the truth.”
KrazyTA, there is a testing revolt happening in TN after Cheats-for-Change Kevin Huffman messed up the school ratings and refused to give waivers to all but one of the cheated school districts. His stubborn refusal to admit his mistakes has angered two of the wealthiest and most influential school districts in TN.
“It is time to use the research to help ourselves.”
And the one most important piece of research that completely destroys the concepts of educational standards, standardized testing and the “grading” of students is Noel Wilson’s “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700
AlwaysLearning, I ask that you live up to your screen name and join in reading and understanding what Wilson has to say, and join me in my Quixotic Quest to rid public education of these nefarious educational malpractices.
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine. (updated 6/24/13 per Wilson email)
1. A quality cannot be quantified. Quantity is a sub-category of quality. It is illogical to judge/assess a whole category by only a part (sub-category) of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as one dimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing we are lacking much information about said interactions.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference,” each with distinct assumptions (epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think of a college professor who “knows” the students’ capabilities and grades them accordingly); the General Frame (think of standardized testing that claims to have a “scientific” basis); the Specific Frame (think of learning by objective, as in computer-based learning, where one must get a correct answer before moving on to the next screen); and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program, where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error is introduced into the process when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words, all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid: errorless, or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the tests and results are. He is an advocate for the test taker, not the test maker. In doing so he identifies thirteen sources of “error,” any one of which renders the test making/giving/disseminating of results invalid. A basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory.” In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”) or, to put it in more mundane terms, crap in, crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words it measures “‘something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students, as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self evident consequences. And so the circle is complete.”
In other words, students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, “I’m an ‘A’ student” is almost as harmful as “I’m an ‘F’ student” in hindering students from becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
“‘I’m an “A” student’ is almost as harmful as ‘I’m an “F” student’…”
I agree. A parent told me a story of her “A student” who went off to college and got a B; the kid was inconsolable. The parent obviously had a lot invested in her daughter’s status as well, as she made excuses to me about the circumstances that resulted in her child receiving the “lower” grade.
Thank you Swacker. I have come to respect your Quixotic Quest.
De nada.
And thank you, Joanna, for bringing in the more “affective” or should that be “effective” aspect of the teaching and learning process, that of using/teaching music (And I’m dirt stupid when it comes to music knowledge, talk about a foreign language, but I sure do love listening to all types).
“Everything that can be counted does not necessarily count; everything that counts cannot necessarily be counted.”
Einstein
He also wrote that the spirit of learning and creative thought were lost in strict rote learning.
What are we doing to our children’s spirits? What treasures are we smothering in them?
Every educator and anyone who has some type of influence on educational policies and practices–whether they should have that influence or not–needs a copy of this book! I will be getting a copy of this book and I will be sharing it.
Plain, simple and obvious. We know which students will pass or not, which makes the test reliable and valid from that perspective. Yet we are forced to torture kids and ourselves for the inevitable consequences of student failure and teacher ineffectiveness. It is ridiculous to have to resort to publishing studies to prove that teacher quality cannot be measured by a standardized test when it is obvious.
People with money and power throughout history have sustained the socio-economic gap because of their ideology of “their” world, with neglect of the consequences. The good news is that there are more of us than of them, so we can overcome. But why is there more debate and transparency on Syria (a foreign country) than there is in our own country about education, environmental issues, etc.?
We got the fabulous reform teacher evaluation this year. It just went up on the school website.
So great that we’re plowing money into this nonsense instead of instruction.
If anyone gets close to National Education Director Bill Gates and Deputy Education Director Michelle Rhee please tell them Ohio parents thank them for running the public schools in a county they’ve never visited and have no clue about. God knows we can barely tie our shoes out here without micro-managing from the billionaire/celebrity set.
This is going to be a game-changer. I can tell. It will be much, much better than the prior SIXTEEN YEARS of “market-based” school reform experiments we’ve endured in Ohio.
The worst part about evaluating teachers based on standardized test scores is that it allows those in power to continue treating children as if they were identical and interchangeable. In my state, teachers will be evaluated based on students’ proficiency levels and growth, which supposedly measures the rate at which they approach proficiency, and expected growth, which is how much growth that student would be expected to make in one year. The problem is that expected growth is calculated using absurdly broad demographic categories: free and reduced lunch or non, English language learner or non, special education or non, and race. This allows people to ignore the fact that not all English language learner, non-special education, Hispanic students are alike. Some of them have enormous support at home, some do not. Some have parents who are highly educated, some do not. Some have health care, some do not.
So just because your average English language learner, non-special education, Hispanic student makes X amount of growth in one year, does not mean that the 25 English language learner, non-special education Hispanic students sitting in my classroom should be expected to make the same growth. In fact, these students were placed in my classroom specifically because they have not performed well on standardized tests and other measures in the past. And we can certainly argue about the ethics of that decision as well as the various causes for poor performance. But the point is that these students will not make “expected” growth, and it will have nothing to do with the quality of my teaching. Now, I knew that going into the job, and I am okay with that. I will measure my success in other ways, within the walls of my classroom. But the punitive aspects of these evaluations (loss of jobs, loss of tenure, “blacklisting” from the district or state) do not go into effect in my state until the next school year.
If I am offered the same position next year, I will not take it. Who would? This group of kids pretty much guarantees a bad evaluation. And I do not know any good teacher in his or her right mind who would take it. Who will be left to teach these students? And what did they do to deserve this?
I’ve shared this in response to an earlier blog post of Diane’s, but it seems worth sharing again as it directly relates to this topic:
At the NJAFPA conference on May 29, 2013, Charlotte Danielson (creator of the Danielson Frameworks for Teaching evaluation system that so many states and districts have adopted) said in her keynote: “Using standardized test scores to assess teachers is indefensible.” Very strong words, considering her audience included members of the NJ Department of Education. Danielson went on to say: “What counts as evidence? How will we use it? People are calling me for information on this; I don’t know; NO ONE KNOWS! Rather than standardized tests, we need to look at classroom/teacher’s learning evidence.”
You want to see a true farce? Check out NY’s “20 point band” system designed to demonstrate student growth. (See Engage NY: SLOs)
In order for a teacher to get credit for student growth, the student must move up at least one band. In this half-baked, mathematically inept scoring system, Student A can improve on their summative exam score by 18 percentage points and show “NO GROWTH” because they were, by chance, stuck in the same 20-point band. Yet Student B improves by 3 percentage points and in fact “DEMONSTRATES GROWTH” because they happen to advance into the next 20-point band.
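Here is a minimal sketch of that band rule, to show how the two students come out. The band boundaries are my assumption (0-19, 20-39, 40-59, 60-79, 80-100 on a 100-point exam), and the specific scores are invented to match the +18 and +3 cases above; the actual NY tables may draw the lines differently.

```python
BAND_WIDTH = 20   # assumed width of each band, in points
MAX_SCORE = 100   # assumed top of the exam scale


def band(score):
    """Index of the 20-point band a score falls in (0 = lowest).
    A perfect score is folded into the top band."""
    top_band = MAX_SCORE // BAND_WIDTH - 1
    return min(int(score // BAND_WIDTH), top_band)


def shows_growth(pre_score, post_score):
    """Under the band rule, growth counts only if the student
    moves up at least one band."""
    return band(post_score) > band(pre_score)


# Student A: 41 -> 59, an 18-point gain, but both scores sit in the 40-59 band.
print(shows_growth(41, 59))
# Student B: 58 -> 61, a 3-point gain that happens to cross the line at 60.
print(shows_growth(58, 61))
```

Under these assumed boundaries the 18-point gain is recorded as no growth and the 3-point gain is recorded as growth, because the rule looks only at which side of a cut line a score lands on and not at how far the student moved.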
All I can say is any teacher in NY that is denied tenure, dismissed otherwise, or has their professional reputation damaged because of this nonsense MUST seek legal help. This garbage will NOT survive a court challenge.
But where would one go for legal help? I do not believe the unions will defend us. A class action lawsuit would be necessary to expose the reform movement for what it is–a moneymaking scheme.