One of my favorite bloggers is Anthony Cody. Anthony is an experienced teacher of science in California. I always learn by reading his blog “Living in Dialogue.” He recently offered his column to a teacher in Florida to explain how his or her evaluation was affected by “value-added modeling” or VAM.
The idea behind VAM is that teachers should be evaluated based on the rise or fall of their students’ test scores. Arne Duncan made VAM a requirement of the Race to the Top program, despite the lack of any studies or research validating this practice and despite ample warnings that it was invalid and would mislabel teachers as effective or ineffective. Nonetheless, many states pushed through legislation requiring that teachers be evaluated in part by their students’ changing scores. If the scores went up, they were a good teacher; if they did not, they were an ineffective teacher.
This idea was embraced most warmly by very conservative Republican governors like Rick Scott in Florida, where VAM accounts for fifty percent of a teacher's evaluation. In the column cited here, the Florida teacher explains how it works and how absurd it is. This teacher teaches social studies to students in the 9th and 10th grades. When the teacher went to get his or her evaluation, it turned out that the administrator had no idea how VAM would work, especially since the Florida test does not cover social studies for 9th and 10th graders. At first, the teacher was told that the evaluation would be based on the whole school's scores, not just the students in his or her classes, but the teacher then convinced the administrator that the evaluation should be based only on the students in his or her particular classes. That took a while to figure out. The teacher got the FCAT scores in May, but it took the district or state three months to prepare the teachers' VAM using those scores.
By the end of the blog, it is obvious that the calculation of VAM is confusing, non-scientific, and inherently unrelated to teacher performance. It will be used to take away teachers’ due process rights and any protection for their freedom of speech. It is a weapon created to harass teachers. As this teacher concludes:
As someone who is not comfortable living life on my knees with duct tape over my mouth (you may have figured this out by now if you have been reading this blog for any length of time), I am not comfortable working on an annual contract. Teachers must be able to voice their concerns about administrative decisions that harm students without fear of losing their jobs. Eliminate continuing contracts and a culture of complacency, sycophants and fear will rule the schools. Senate Bills passed in state after Race to the Top state have included VAMs as a major portion of teacher evaluations all in the name of “Student Success” and “Educational Excellence” when in reality they have been immaculately designed to end the teaching profession as we know it and free state and districts from career teachers with pension aspirations. Some may brush me off as your typical history teacher conspiracy nut, but my daddy didn’t raise no sucker. VAM is a scam.
Diane

It sounds as if the teacher got a score similar to a NY SLO. It is nonsense akin to gambling, based on no true data at all. Although I despise the whole system, he would have been better off taking the whole-school score. In the end, everyone in the school contributes to each student's learning. Isn't that what we want?
AMEN, Diane! This is just getting rolled out in Wisconsin. VAM is a scam, idiotic, demeans the profession, is statistically flawed and unsupported by research, and displays total denial of how children learn.
Diane and fellow blog readers, this is a great YouTube animation that I found a couple of years back called "Race from the Axe." I used to consider it a satirical parody offering some nice cynicism about the direction we were headed at the time. Now it is becoming more apparent than ever that it is more of a sad reality. It has a nice component related to VAM.
enjoy,
Jim
Fantastic, creative video, Jim, that really nails it! "Race from the Axe"!
Mark, thanks, but just to be clear, all I did was post it. I found it on YouTube a few years back. For those of us who follow education policy closely, it's easy to see the bigger picture by reading words alone. This video really brings the insanity to life for someone who might not be as in tune with the constant attack the government and reformers wage on education these days. Glad you enjoyed it; pass it on.
Jim
The pressure for test scores to stay high at the top-performing elementary schools (gifted centers) in Chicago appears to have led to some odd manipulations.
For example, very high-scoring 6th graders who test for admission into the 7th & 8th grade Whitney Young academic center are not admitted because their test score is suddenly, unaccountably low.
Twins who have nearly identical test scores, one point apart, are split and sent to two different elementary schools, and the reason given is that the one-point difference in the second child's scores prevented him from admission into his preferred school.
Under NCLB, these kids’s scores were important. Under VAM they are priceless.
I expect someday soon they may be traded by school administrators the way managers trade baseball players.
Reblogged this on Continuing Change and commented:
Great review of Value-Added Assessment – The Ridiculous We Call Home: Florida
As much as the professionals tried to point out the flaws in using VAM as a teacher evaluation tool, not enough people could see it or admit its downfalls. The outcome: like FL, LA will have 50% of a teacher's evaluation based on VAM. But how to use this in subject areas and grade levels that are not tested is still up in the air! What a mess!
Here are the questions I sent to our DOE last year. I never got a response, of course, partly because the data is never meant to help students improve. It is designed to punish teachers and to infuriate 35% of parents of students from across the board (35% of DNP, 35% of Pass, and 35% of Pass+) by labeling these children as low growth. Here they are:
1. The ISTEP is used to measure the mastery of skills for a specific grade level, so the content for each grade level is specific to that grade level. It is a summative assessment, so it is not designed to show growth. How can comparing two grade levels show growth when the content of the test is completely different from grade level to grade level?
I would think that to accurately show growth, you would need to compare two tests that cover the same content. I teach 4th grade and my standards are very different from 3rd grade standards, so comparing them in a growth model just doesn't make sense to me. Please help me to understand this.
2. Also, I am bothered by the Growth Model always labeling 35% of Indiana's students as low growth. I believe all students can learn; doesn't the IDOE? No matter what the scores are, under the model 35% are always labeled red/low growth (see the sketch after these questions). Wouldn't it be better to set a goal that ALL students in Indiana can achieve to show success? Let me restate this: under the current model, even if the gains that are reportedly needed are made, 35% of students will always be labeled "Low Growth" and marked red on the model. How will this help students to learn?
3. And the target that we are shooting for to increase learning will, under the current growth model, change every year. We won't even know what the target is while we are shooting for it, because it depends year to year on how other students do across the state and will most certainly change. Someone I know remarked that it is like shooting at a moving target, and I answered, yes, but add to it that I feel blindfolded. Please tell me how to use this data. I have been looking at Acuity and trying to compare it to ISTEP, but since the scale scores for ISTEP do not match Acuity, comparing the two is very difficult. Wouldn't it be better to compare Acuity or ISTEP scores to CSI scores, since those show a student's ability to learn while ISTEP shows how much of the standards for that grade level they have learned? Just throwing that out there. Can't wait to hear how other educators are using the data to drive instruction so I can get some ideas on how to practically use the data.
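To make the point in question 2 concrete, here is a minimal sketch in Python of a purely norm-referenced "low growth" label. The cutoff, sample size, and score numbers are all invented for illustration, and this is not Indiana's actual growth model (which conditions on students' prior scores), but the labeling arithmetic is the same: if "low growth" is defined as the bottom 35% of the growth distribution, then 35% of students carry that label no matter how much every student improves.

```python
# Toy norm-referenced growth labeling (illustrative only; not the ISTEP model).
import numpy as np

rng = np.random.default_rng(42)
n_students = 1000

prior = rng.normal(500, 50, n_students)            # made-up scaled scores
gain = rng.normal(40, 10, n_students).clip(min=5)  # every student gains at least 5 points

# Rank each student's gain against everyone else's gain (a growth percentile).
growth_percentile = np.argsort(np.argsort(gain)) / n_students * 100
low_growth = growth_percentile < 35                # bottom 35% labeled "low growth"

print(f"Students who improved:       {(gain > 0).mean():.0%}")   # 100%
print(f"Students labeled low growth: {low_growth.mean():.0%}")   # 35%, by construction
```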
My question’s to Wis. DOE, at 2 different meetings and then sent by e-mail as they requested, that went unanswered were:
1. If a student’s scaled score in the 4th grade reading test is 500, what would the score be a year later to reflect 1-year’s growth?
2. What can a student with a scaled score of 550 in 4th grade reading do, concretely, that the student with a scaled score of 500 cannot?
No answers from DPI or the VARC people who developed VAM for Wisconsin.
The DOE in Indiana was saying that the Growth Model showed a year's worth of growth when they first started touting it. They then changed the propaganda they were sending out so it no longer said that. The growth model, as it is figured in Indiana, cannot show a year's worth of growth. It can only assign a specific number of winners and losers in education.
I emailed, joined the DOE’s learning connections to ask these questions, sent them in at a meet and greet with Tony Bennett – where when they weren’t answered, cornered a person ‘ accompanied Bennett and who was suppose to be in charge of the Growth Model. He laughed at my questions and said he couldn’t explain to my how it showed growth because I wouldn’t understand, but trust him that it does statistically show growth.
Betty –
Ah, patronizing. Nice. Like the equally idiotic NAEP. “How’d we do on the NAEP? Can we see the results?”
“No. Trust us.”
No agenda here.
Mark, by Wis. DOE I assume you mean the DPI, correct? I'm a WI father myself here…we like to reserve the term DOE for the John DOE investigation. Anyway, two things for you. 1. The DPI probably can't answer your question because you didn't give them choices A, B, C, D to fill in the circle for. 2. I'm not sure where you fall on the political spectrum, but you bring up VARC, which you are probably aware is on Gov. Walker's Read to Lead task force. In any case, I did a little digging and found on their website that their two big financial donors are the U.S. Dept. of Ed (no surprise there) and the Joyce Foundation. Obama himself used to sit on the board of this extremely wealthy, powerful special interest group based out of Chicago. You might be interested in this group and the fact that Gov. Walker created a direct pipeline for their money to influence public ed. policy through VARC.
Betty, the answer to your first question is that, more than likely, they use developmental scale scores, so the content of the standardized test is not supposed to be an issue. Although this may be true, I am not sure how they can guarantee that the tests are of the same level of difficulty.
As for your third question, if I am understanding you right, you may be questioning the same thing I have raised with some fairly important people. They dismiss it, but I think it's an issue – namely, as a teacher, how do I fulfill student growth expectations? It seems like a simplistic question, but it isn't. For instance, I understand what it takes to get a student here in NC to be proficient and score an "80" on an EOC. I can tell within a week of beginning to teach him whether a student is going to struggle with passing the EOC. I also know how to implement strategies to get this student up to par so he can pass. But how do I determine whether a kid will show growth? Even if there were a way to determine whether a kid was on the right path or not, how would I go about remediation to try to get him to show growth? This is a macroscopic example of how VAM is mostly out of our hands. There is no way to know what it is and to compensate for it in our teaching practices – it is out of our control.
I am a special ed teacher in Wisconsin…specifically, Milwaukee. I teach students with severe cognitive disabilities. How are they going to measure the growth that my students make? Some of my students know how to write a number or a word one minute, but not the next. I have a student who has been working on writing her name for years, and still has difficulty. The letters look right…they're just all mixed up. She's in 6th grade. After 7 years of teaching and 15 years in the education field, I am finally going to be a "professional educator" (yeah, that's what my license will say now), yet I will always be ineffective according to the data. For the first time in my life, I tried to talk a friend out of becoming a teacher. She asked why.
Here is a very dangerous organization that is now spreading like a virus in my state of Wisconsin: the Value-Added Research Center (VARC). Here is a link to their page that lists their current projects across the nation. I thought you and your readers would like to know about this. Major financial contributors to VARC are the Dept. of Education and the Joyce Foundation (both organizations with a clearly unbiased agenda).
http://varc.wceruw.org/projects.php
Diane, I have many deep-seated concerns about using VAM. I will say that I can appreciate most of the new teacher evaluation systems that have been implemented across the nation, at least in terms of the classroom observation piece. I think the old checklists were outdated and weren't helping anybody unless principals purposely modified them to do things other than what they were intended to do.
However, VAM is complicated, which at the outset invites suspicion. Rarely are employee evaluation systems complicated, but that is because in the business world the outcomes are much simpler – how much money you made for the company or how much money the employees who work for you made for the company. The issue here is that just because teaching is complex doesn't mean that a complex statistical analysis simplifies it. In fact, anyone who understands just how complex the teaching profession is knows right away that there is NO way to design a system that places a numerical value on any part of it. And who cares about "multiple measures"? The fact that VAM makes up 50% of a teacher's evaluation is unsettling, and it surely will not hold up in a court of law. All of the literature plainly states not to use VAM for high-stakes decisions, and placing VAM results at 50% of a personal, professional evaluation is "high-stakes".
Specifically, the reason why VAM fails is that it cannot account for classroom-level "lurking variables". In science (and math), and specifically in linear regression, lurking variables are variables that affect the dependent variable (i.e., the score on the standardized test), are not controlled for, and are not related to the independent variable (the teacher effect). These sorts of variables invalidate linear regression models because they displace points on the graph and throw off the calibration of the regression line. Essentially, they move points up or down, and when they are large-scale in effect, they can really change the slope of the fitted line. For instance, how can VAM control for a class clown, or for a bully who affects the progress and eventually the achievement of a whole class? Such a "lurking variable" is uncontrolled and can affect potentially 30 or more data points. Keep in mind that just having a score or two invalidated, whether because of test nullification or some other factor, is not enough to really throw off the calibration of the regression line. It is the large-scale variables affecting whole classes and whole sets of data that matter. One concern here is that groups of students and parents could gang up on a teacher and purposely nullify a test, which I can see happening.
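To see what an uncontrolled classroom-level variable does to this kind of regression, here is a small simulation sketch in Python. The numbers (a 15-point class-wide disruption, 30 students per class, the noise levels) are invented, and the naive regression below is not any state's actual VAM formula; it only illustrates the mechanism: a shock that hits a whole class gets absorbed into the estimated "teacher effect."

```python
# A minimal simulation (made-up numbers, not a real VAM formula) of a
# class-level "lurking variable": a disruption that lowers every score in one
# class is attributed to that class's teacher by a naive regression.
import numpy as np

rng = np.random.default_rng(0)
n = 30  # students per class

def simulate_class(disruption=0.0):
    prior = rng.normal(500, 50, n)     # last year's scaled scores
    true_teacher_effect = 5.0          # both teachers are equally effective
    noise = rng.normal(0, 20, n)
    post = prior + true_teacher_effect + disruption + noise
    return prior, post

prior_a, post_a = simulate_class(disruption=0.0)    # Teacher A: ordinary year
prior_b, post_b = simulate_class(disruption=-15.0)  # Teacher B: disrupted class

# Naive value-added: regress post-test on prior score and a teacher indicator.
prior = np.concatenate([prior_a, prior_b])
post = np.concatenate([post_a, post_b])
teacher_b = np.concatenate([np.zeros(n), np.ones(n)])
X = np.column_stack([np.ones(2 * n), prior, teacher_b])
coef, *_ = np.linalg.lstsq(X, post, rcond=None)

print(f"Estimated 'effect' of Teacher B relative to A: {coef[2]:.1f} points")
# Prints roughly -15: the disruption, not the teaching, drives the estimate.
```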
The single most important "lurking variable" is the non-random assignment of students. For instance, at my school, we have a modified block schedule where we pull a set of kids out for drama, band, chorus, and other artsy stuff so they can be together all year long for their productions. It's a great idea, until you look at such a process through the lens of VAM. This kills VAM, because what happens is that the students involved in the arts will be some of your top students, and the teachers who teach during that time period are left with the "left-overs." And not only that, they are left with the "left-overs" at the end of the day, which spells disaster. And it does, too – teachers during 4th and 5th block are prone to have more behavior issues and so forth. This kind of thing kills VAM. That can be up to 60 students who are anything but randomly placed in my class at the end of the day.
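Along the same lines, here is a toy sketch of the scheduling problem just described. The "engagement" trait, the arts-block cutoff, and the gain numbers are all made up; the point is only that when the pull-out skims off students who would have grown more anyway, a naive gain comparison makes the equally effective teacher who is left with the remaining roster look worse.

```python
# A toy illustration (invented numbers, not any district's actual model) of
# non-random scheduling: the teacher left without the arts-track students
# looks "ineffective" even though both teachers add exactly the same amount.
import numpy as np

rng = np.random.default_rng(1)
n = 120
engagement = rng.normal(0, 1, n)      # unobserved student trait
in_arts_block = engagement > 0.3      # more engaged students are pulled out
prior = rng.normal(500, 40, n)

# Both "teachers" add exactly 5 points; gains also depend on engagement.
gain = 5.0 + 8.0 * engagement + rng.normal(0, 10, n)
post = prior + gain

teacher_a_gain = gain[in_arts_block].mean()   # teacher with the arts-track kids
teacher_b_gain = gain[~in_arts_block].mean()  # teacher with everyone else

print(f"Teacher A apparent value-added: {teacher_a_gain:.1f}")
print(f"Teacher B apparent value-added: {teacher_b_gain:.1f}")
# The gap reflects who was scheduled into each room, not instructional quality.
```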
I am not sure why so many researchers, like Darling-Hammond and others, are so set on attacking VAM based on reliability issues. VAM could be perfectly reliable (which it's not – 1/3 of teachers move from the highest to the lowest rating, and vice versa) and yet still be invalid. Validity is not necessarily attached to reliability. Validity depends on whether the process can actually capture the effect that a teacher has on a student's test score. I think any judge or jury with half a brain will be able to see the murky mess of "lurking variables" and comprehend the ridiculous science that's involved here. My prediction – class action lawsuit in, say, 5 to 7 years, and VAM fails. That simple.
Daniel – The reason researchers oppose VAM is that the tests that yield the "V" of VAM are unreliable.
Your discussion of scheduling is right on point, but it is just one example of a myriad of sound educational practices that are incompatible with VAM. VAM requires serving accountability instead of the best interests of students.
And, at the heart of the matter, VAM is based on measures that are not only unreliable, but also invalid. They represent, together with VAM, assessment at its fastest, easiest, and most inexpensive, AND its worst.
Finally, they do not measure what kids need to be successful, in any and all senses of the word, in their futures.
I tripped on an article that discredits VAM. A summary follows. I also had an email exchange with the author about VAM assumptions regarding test score scaling. I was correct, as was Gerald Bracey (departed but not forgotten), a psychometrician and speaker of truth to power.
Standardized test scores are not constructed as interval scales (like a ruler or thermometer), where a single increment has the same meaning anywhere on the scale. Item analysis for difficulty does not solve this problem. Neither are the scores on tests vertically aligned so that a "unit of learning" (e.g., a test score gain of three points) has a constant meaning and educational significance regardless of the subject and grade level. That idea is dumb from the get-go. Someone noted that the testing industry is unregulated and can make claims that cannot be independently verified. Tests are, in fact, often not aligned with standards, and standards are rarely stable over the span of time that VAMers need to feed their formulas (Andrew Porter citation available).
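For readers who want to see why the interval-scale point matters, here is a purely hypothetical scoring curve in Python (the 200-800 range and the logistic squashing are invented, not any real test's scaling). When scale scores are a nonlinear transformation of an underlying proficiency estimate, the same three-point scaled-score gain can demand very different amounts of underlying growth at different places on the scale.

```python
# Illustrative, made-up scoring curve: equal scaled-score gains do not
# represent equal amounts of learning when the scale is nonlinear.
import numpy as np

def scaled_score(theta):
    """Hypothetical scaling: squash a proficiency estimate onto a 200-800 scale."""
    return 200 + 600 / (1 + np.exp(-theta))

def proficiency_gain_for(points, start_theta, step=1e-4):
    """Proficiency change needed to raise the scaled score by `points`."""
    target = scaled_score(start_theta) + points
    theta = start_theta
    while scaled_score(theta) < target:
        theta += step
    return theta - start_theta

print(proficiency_gain_for(3, start_theta=0.0))  # 3-point gain near the middle
print(proficiency_gain_for(3, start_theta=2.0))  # 3-point gain near the top
# The same 3-point gain demands far more underlying growth near the top.
```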
Marketers of the VAM scam tap a huge reservoir of statistical models for VAM calculations (about 24) and a bunch of statistical techniques that "work" in formulas even if the calculations make no educational sense.
In a 2009 defense of the SAS proprietary VAM formula used in Ohio, Wm. Sanders (the father of VAM in education) and colleagues, major providers of VAM analyses, say they run "tests" on the state tests to make sure they are OK. Later in the same white paper, "A Response to Criticisms of SAS/EVAAS," they speak of a procedure that will "invisibly predict" missing scores. They justify this magical act as statistically "efficient." The whole white paper is filled with marketing hype that ignores the difference between what is educationally meaningful and significant and what is statistically possible. Sanders et al. also claim that controlling for poverty and ethnicity is unnecessary because these factors are embedded in students' prior test scores. REMEMBER: The whole premise of VAM is that past performance on standardized tests predicts future performance. This concept is known as the self-fulfilling prophecy. It is a form of prejudice that no reputable teacher would act upon. In fact, no responsible CEO publishes a report that says "past performance predicts future performance." But the VAM scammers and supporters think that this is a great principle and methodology for doing triage on teachers.
Following is a contrarian view relevant to ALL champions of VAM.
A Reanalysis of the Effects of Teacher Replacement Using Value-Added Modeling
Objective: This article reviews literature regarding the reliability and validity of VAM, then focuses on an evaluation of a proposal by Chetty, Friedman, and Rockoff to use VAM to identify and replace the lowest-performing 5% of teachers with average teachers. Chetty et al. estimate that implementation of this proposal would increase the achievement and lifetime earnings of students. The results appear likely to accelerate the adoption of VAM by school districts nationwide. The objective of the current article is to evaluate the Chetty et al. proposal and the strategy of raising student achievement by using VAM to identify and replace low-performing teachers.
Method: This article analyzes the assumptions of the Chetty et al. study and the assumptions of similar VAM-based proposals to raise student achievement. This analysis establishes a basis for evaluating the Chetty et al. proposal and, in general, a basis for evaluating all VAM-based policies to raise achievement.
Conclusion: VAM is not reliable or valid, and VAM-based policies are not cost-effective for the purpose of raising student achievement and increasing earnings by terminating large numbers of low-performing teachers.
Teachers College Record
http://www.tcrecord.org ID Number: 16934, Date Accessed: 6/26/2013
The author of this article, Stuart S. Yeh, is Associate Professor and Coordinator of the Evaluation Studies Program at the University of Minnesota. He has conducted numerous cost-effectiveness evaluations and is the author of The Cost-Effectiveness of 22 Approaches for Raising Student Achievement.
If only you were all as skeptical of global warming statistics as of VAM procedures. As long as you support President Obama in his “war against coal” in the interests of claiming to have an impact on global temperature, you deserve to be personally evaluated on the basis of the likely equally bogus basis of VAM. What goes around comes around. Those who live by the scam of global warming will die by the scam of VAM. If you accept and teach to evaluations of STUDENTS by standardized tests, so will you be evaluated by student performance on those standardized tests. Do not do unto others what you do not want done to yourself.