Stuart S. Yeh, a professor at the University of Minnesota, contends in this article in the TC Record that value-added modeling is neither valid nor reliable.
He reviews the existing literature and notes that VAM is now used to hire, fire, promote, and reward teachers, all high-stakes decisions.
He writes:
In principle, value-added modeling (VAM) might be justified if it can be shown to be a more reliable indicator of teacher quality than existing indicators for existing low-stakes decisions that are already being made, such as the award of small merit bonuses. However, a growing number of researchers now advocate the use of VAM to identify and replace large numbers of low-performing teachers. There is a need to evaluate these proposals because the active termination of large numbers of teachers based on VAM requires a much higher standard of reliability and validity. Furthermore, these proposals must be evaluated to determine if they are cost-effective compared to alternative proposals for raising student achievement. While VAM might be justified as a replacement for existing indicators (for existing decisions regarding merit compensation), it might not meet the higher standard of reliability and validity required for large-scale teacher termination, and it may not be the most cost-effective approach for raising student achievement. If society devotes its resources to approaches that are not cost-effective, the increase in achievement per dollar of resources expended will remain low, inhibiting reduction of the achievement gap….
This article reviews literature regarding the reliability and validity of VAM, then focuses on an evaluation of a proposal by Chetty, Friedman, and Rockoff to use VAM to identify and replace the lowest-performing 5% of teachers with average teachers. Chetty et al. estimate that implementation of this proposal would increase the achievement and lifetime earnings of students. The results appear likely to accelerate the adoption of VAM by school districts nationwide. The objective of the current article is to evaluate the Chetty et al. proposal and the strategy of raising student achievement by using VAM to identify and replace low-performing teachers.
Method: This article analyzes the assumptions of the Chetty et al. study and the assumptions of similar VAM-based proposals to raise student achievement. This analysis establishes a basis for evaluating the Chetty et al. proposal and, in general, a basis for evaluating all VAM-based policies to raise achievement.
Conclusion: VAM is not reliable or valid, and VAM-based policies are not cost-effective for the purpose of raising student achievement and increasing earnings by terminating large numbers of low-performing teachers.
This is a video in which Yeh discusses his findings about VAM.
The real agenda behind VAM….
VAM: The Scarlet Letter
Also: as I have pointed out repeatedly,
“A largely ignored problem is that true teacher performance, contrary to the main assumption underlying current VAM models, varies over time (Goldhaber & Hansen, 2012). These models assume that each teacher exhibits an underlying trend… {which is not the case!- gfb} From the perspective of individual teachers, it is inappropriate and invalid to fire a teacher whose performance is low this year but high the next year, and it is inappropriate to retain a teacher whose performance is high this year but low next year. Even if average teacher performance remains stable over time, individual teacher performance may fluctuate wildly from year to year.
While previous studies examined the intertemporal stability of value-added teacher rankings over one-year periods and found that reliability is inadequate for high-stakes decisions, researchers tended to assume that this instability was primarily a function of measurement error and sought ways to reduce this error (Aaronson, Barrow, & Sander, 2007; Ballou, 2005; Koedel & Betts, 2007; McCaffrey, Sass, Lockwood, & Mihaly, 2009). However, this hypothesis was rejected by Goldhaber and Hansen (2012), who investigated the stability of teacher performance in North Carolina using data spanning 10 years and found that much of a teacher’s true performance varies over time due to unobservable factors such as effort, motivation, and class chemistry that are not easily captured through VAM. This invalidates the assumption of stable teacher performance that is embedded in Hanushek’s (2009b) and Gordon et al.’s (2006) VAM-based policy proposals, as well as VAM models specified by McCaffrey et al. (2009) and Staiger and Rockoff (2010) (see Goldhaber & Hansen, 2012, p. 15). The implication is that standard estimates of impact when using VAM to identify and replace low-performing teachers are significantly inflated (see Goldhaber & Hansen, 2012, p. 31).
gfbrandenburg: thank you for this and your blog.
😎
When scientists use the scientific method they are careful to change only one variable in their experiment. People are complex beings, not single chemicals or elements that can be controlled in a laboratory setting. The bottom line about VAM, IMHO, is that there are too many variables in the teaching-learning process to define through a formula how much influence a teacher has on the learning process.
Is this the most expensive formula in the history of the world, or what? Why would they invest so much in one guy’s economic theory?
Good Lord. It’s nuts. You’d think he came up with gravity, the way they all followed him and put this in everywhere. Why are economists running public education anyway?
Some very important variables that VAM does not address (because it can’t) are how well or poorly the school is run, including decisions on how to allocate the budget, where to place teachers, how many pull-out groups there are, how large they are, whether mandates are followed, school culture, degree of instability in the environment, degree of instability in leadership, poor decision-making, inadequate programs, dated programs, hostility, and the list goes on. When you think along these lines you can see why VAM is invalid and unreliable.
Exactly right, Chiara, they can’t even get the economy right. “The dismal science” is a half-correct moniker; just get rid of the “science”. They ignore it.
Virginiasgp, this is for you; ask Dr. Yeh your silly questions.
Too funny! That was going to be my response as well. Cue VirginiaSGP in 3…2…1…
“So-be-it Core”
Common Core is commonism
So-be-it central planning
VAM is like Lysenkoism
A scientific banning
Excellent, SDP.
OK, it’s only 1:30 and I’m already up to post 3 out of 4 today. Oh well, this is the core issue, so read up. Note that this subject is at the heart of stochastic processes, which was part of my concentration in college, so I am rather familiar with the topic, so to speak. And to MathVale and Stile, I’ll answer your questions tonight; Stile, I prepared an initial response but scratched it so I could address the article in full.
1. Mr. Yeh points out that “In other words, the Chetty et al. analysis assumes that a high-quality teacher this year will remain a high-quality teacher next year; a low-quality teacher this year will remain a low-quality teacher next year. Later in the article, however, the authors conclude based on their data that “teacher value-added is not in fact a time-invariant characteristic” (p. 25). While the authors’ analysis assumed that teacher quality is fixed over time, their own data suggest that teacher quality, as measured by teacher value-added, “is not in fact” time-invariant, consistent with the results reported by Goldhaber and Hansen (2012). If this assumption is not valid, the conclusions of the analysis are not likely to be valid.”
All this says is that a teacher may have slightly good years and slightly bad years. An analogy could be a basketball player that is “hot”. Researchers have tried to debunk the “hot streak” theory by looking at made shots over time. They claim that any “streak” is just a result of variation, since even a casino has long runs of “red” or “even” by chance. However, I would contend that basketball players vary the difficulty of their shots based on how well they are shooting. If they are “hot”, they will shoot from anywhere. If not, they are more conservative. But I digress. The question is whether a teacher’s performance can change over time even if he/she has similar kids to teach.
In engineering, we make assumptions all the time. When you deal with orders of magnitude, +/- 10% is often not an issue and is “assumed away”. Unless teacher contributions vary significantly over time, the Chetty study was absolutely valid in making the assumption that teacher value-add is fairly constant. Even if teacher value-add is constant (just like a coin’s bias), there is measurement error associated with the tests of his/her students. Just as a coin can produce 7 out of 10 heads, that doesn’t mean it had a 70% chance of coming up heads during those measurements. Over 1000 flips, it will be very close to 500 heads.
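The coin-flip point can be made concrete with the standard binomial standard error. A minimal sketch (purely illustrative, not drawn from the Chetty study):

```python
import math

def se_of_proportion(p: float, n: int) -> float:
    """Standard error of an observed proportion from n independent trials."""
    return math.sqrt(p * (1 - p) / n)

# For a fair coin (p = 0.5), the spread of the observed heads-rate shrinks
# like 1/sqrt(n): 7-of-10 heads is unremarkable, 700-of-1000 is not.
for n in (10, 100, 1000):
    print(n, round(se_of_proportion(0.5, n), 4))
# → 10 0.1581
#   100 0.05
#   1000 0.0158
```

The same 1/sqrt(n) logic is why averaging more student scores (or more years) shrinks the noise in a teacher’s estimate without changing the underlying quantity being estimated.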
2. He then goes on to say that teachers in the bottom 25% in year one shift to the top 75% in year two. First, if a teacher ranked at the 24th percentile in year one and then at the 26th percentile in year two, he/she would qualify for this “shift”. But make no mistake, those numbers must be accounted for. So let’s look at the footnote. It says they used a minimum of 15 scores. That is definitely too low to account for measurement error. Recall that Virginia gave guidance of at least 40+ scores before using the SGP data. As you lower the number of measurements, the error of the average goes up. If a district uses only 15 scores for anything, that is bad practice, period. But this doesn’t invalidate VAMs/SGPs that use larger samples.
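The 15-versus-40-scores point can be illustrated with a toy simulation in which each teacher’s true effect is held FIXED and only the class-average measurement noise (shrinking as 1/sqrt of the number of scores) varies. All parameters here are invented for illustration; they are not estimates from any of the cited studies:

```python
import random

def simulate_flip_rate(n_scores: int, teachers: int = 50_000,
                       noise_sd: float = 4.0, seed: int = 1) -> float:
    """Fraction of bottom-quartile (year 1) teachers whose year-2 ranking
    lands in the top 75%, even though true quality never changes."""
    rng = random.Random(seed)
    sd = noise_sd / n_scores ** 0.5                # noise in the class average
    true_q = [rng.gauss(0, 1) for _ in range(teachers)]
    y1 = [t + rng.gauss(0, sd) for t in true_q]    # observed VAM, year 1
    y2 = [t + rng.gauss(0, sd) for t in true_q]    # observed VAM, year 2
    cut1 = sorted(y1)[teachers // 4]               # 25th-pctile cutoff, year 1
    cut2 = sorted(y2)[teachers // 4]               # 25th-pctile cutoff, year 2
    bottom = [i for i in range(teachers) if y1[i] < cut1]
    return sum(1 for i in bottom if y2[i] >= cut2) / len(bottom)

# More scores per teacher -> less noise -> fewer "bottom quartile" teachers
# spuriously jumping into the top 75% the next year.
print(simulate_flip_rate(15))
print(simulate_flip_rate(60))
```

Under these made-up noise levels the flip rate with 15 scores is visibly higher than with 60, which is the commenter’s point: apparent instability shrinks as the number of scores grows. (The article’s counter, via Goldhaber and Hansen, is that real year-to-year instability remains even after such noise is accounted for.)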
3. The following statement is just bizarre: “Another problem arises when, for example, a pretest score measures pre-algebra but the posttest score measures geometry skills or when a teacher emphasizes pre-algebra but not geometry. Improvements in learning may not be captured by the assessment”. Who in their right mind would give a test on something other than the subject taught? We give algebra tests to kids that took algebra. The test questions are what should be taught since the test questions are what the public, collectively through the political structure that manages the department of education, wants to be taught. Suggesting VAMs measure geometry skills for a pre-algebra class is ridiculous.
4. Lastly, let’s look at the data I received from VDOE using anonymous teacher IDs. I took the median SGP of the students taught by the teacher in successive years. For math teachers the following was noted:
a. For teachers in the bottom 20% in year one, they were 10x more likely to remain in the bottom 20% in year two than to jump to the top 20%.
b. Teachers in the bottom 20% in year two were more likely to have been in the bottom 20% in year one than in any of the other four quintiles.
c. The same applied to teachers in the top 20%.
d. However, nearly every VAM advocate suggests using multiple years of data. Thus, while I argue that single year rankings are relatively stable, I do NOT recommend any decision off just one year. Moving averages using multiple years of data would be best. And you need much more than 15 scores to calculate any reliable VAM.
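For readers who want to reproduce the quintile bookkeeping behind points (a)-(c), here is a sketch of the computation, assuming anonymized per-year mappings of teacher ID to median SGP (the field names and the data are hypothetical, not the VDOE extract):

```python
def quintiles(scores: dict) -> dict:
    """Map each teacher ID to a quintile (0 = bottom 20%, 4 = top 20%)."""
    ranked = sorted(scores, key=scores.get)
    return {tid: (i * 5) // len(ranked) for i, tid in enumerate(ranked)}

def transition_matrix(year1: dict, year2: dict):
    """5x5 counts: rows = year-1 quintile, cols = year-2 quintile,
    restricted to teachers who appear in both years."""
    q1, q2 = quintiles(year1), quintiles(year2)
    m = [[0] * 5 for _ in range(5)]
    for tid in q1.keys() & q2.keys():
        m[q1[tid]][q2[tid]] += 1
    return m

# Tiny made-up example: ten teachers, similar median SGPs in both years.
y1 = {f"t{i}": s for i, s in enumerate([31, 45, 52, 38, 60, 47, 55, 41, 36, 58])}
y2 = {f"t{i}": s for i, s in enumerate([35, 43, 50, 40, 61, 44, 57, 39, 33, 58])}
print(transition_matrix(y1, y2))
```

A claim like “10x more likely to remain in the bottom 20% than to jump to the top 20%” corresponds to comparing m[0][0] against m[0][4] in such a matrix.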
At least the author talked about the cost-benefit of hiring/firing decisions. That is a much bigger topic than I have room for here (another day’s topic). But in summary:
1. This study did not reject any reasonable use of VAMs. It just provides the common-sense point that you should never hire/fire a teacher based solely on 15 test scores. Who would even do that?
2. It acknowledges that measurement error is a significant issue with the studies and that measurement error underestimates the impact of teachers. This was the argument between SomeDAM Poet and me about whether 1-14% was the correct number. The later studies suggest 40-60% of reliable variability can be attributed to teacher effects.
3. The author apparently did not understand Chetty. Chetty et al. claimed that replacing a “below average” teacher with an average one increases classroom earnings by ~$250K, where “below average” means one std dev below average, or about the 17th percentile. Chetty/Rockoff/Friedman stated that replacing a bottom 5% teacher would yield a $50K increase for EVERY SINGLE CHILD in that class (see point #7). Again, I would suggest some reading comprehension exercises for the anti-VAM crowd. STEM appears to be the least of your worries.
4. The study also tried to confuse readers into thinking there is any airtight cutoff. If we want to push out the bottom 25%, we can’t be so confident as to think we can measure the bottom 25% exactly. Thus, we say we will remove the bottom 5% (using multiple years of data and 40+ scores). In doing so, we are confident that virtually nobody in that bottom 5% removed was actually in the top 75%. Thus, we have a 20% margin of error. As I recall, the chances of mislabeling an average teacher as a bottom 5-8% teacher are less than 6%.
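Claims of this “less than 6%” kind can be sanity-checked under a stylized normal model: true effects ~ N(0, 1), independent measurement noise ~ N(0, noise_sd), observed score equal to their sum. All numbers below are illustrative assumptions, not figures from Chetty et al.:

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def p_bottom5(true_quality: float, noise_sd: float) -> float:
    """P(a teacher's observed score falls below the population's 5th
    percentile), given his/her true quality in true-effect SDs."""
    obs_sd = math.sqrt(1 + noise_sd ** 2)
    cutoff = -1.6449 * obs_sd        # 5th percentile of observed scores
    return norm_cdf((cutoff - true_quality) / noise_sd)

# An exactly-average teacher, with noise as large as the true-effect spread:
print(round(p_bottom5(0.0, 1.0), 4))   # → 0.01
```

Note what such a figure does and does not say: the chance that any one average teacher is mislabeled may be small, but applied across a large workforce the absolute number of mislabeled teachers is not, which is part of the dispute running through this thread.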
So it basically comes down to which perspective you take. You saw those 1500+ students in my ppt slides who were clearly taught by ineffective teachers; think of them when you read these two choices.
1. Do we choose to believe the 94% chance that a teacher is ineffective and give his/her students a shot at a quality education with a replacement teacher?
OR
2. Do we protect an ineffective teacher even if there is only a 6% chance that he/she is actually effective?
Choice 2 is, in a single word, EVIL!
Virginiasgp,
You can’t engineer teachers or children.
And the measures are garbage.
From Diane’s response to you:
“And the measure are garbage.”
Only if one stretches the meaning of “measure” beyond recognition. There is no measurement whatsoever.
Brian,
Please explain the standard involved in these supposed measures. Where is that standard? How did it come about? Is there only one standard? Who determined that standard? Does the process involved in making and using these supposed standards and measurements follow ISO protocols? If so, where may I find that justification?
WITHOUT THE STANDARD(S) (and there isn’t one nor any by the way) THERE CANNOT BE ANY VALID MEASUREMENT. And if there is no valid measurement the whole VAM/SLO/SGP concepts are rendered “VAIN AND ILLUSORY”, in other words COMPLETELY INVALID.
Or as I put it MMoOO-Mental Masturbation or Obligatory Onanism.
The supposed economic gains cannot be proven. You assume they will get a job paying above minimum wage. Nearly half of our STEM graduates are employed in non-STEM areas or are unemployed. The $50K per student per year has also been debunked. It is more like $5.00 per week: http://nepc.colorado.edu/blog/revisiting-chetty-rockoff-friedman-molehill. These measures are garbage; even you can see it. How do we choose those bottom enders? Maybe they just had poorer classes. By the way, SGPs were never even designed to evaluate teachers. Taken individually, VAM has a near 50% chance of error. Flipping a coin is just as accurate. I know, you do not like facts, but… your assumptions are not playing out.
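The linked NEPC “per week” framing comes from simple division: a one-time lifetime-earnings gain for a classroom, spread over each student’s working life. A back-of-the-envelope sketch (the class size and working life are assumptions for illustration, not figures from either side of the dispute):

```python
classroom_gain = 250_000      # claimed lifetime gain for one classroom ($)
class_size = 28               # assumed students per class
working_years = 40            # assumed working life per student

per_student = classroom_gain / class_size
per_week = per_student / (working_years * 52)
print(round(per_student), round(per_week, 2))   # → 8929 4.29
```

That is roughly the “more like $5.00 per week” figure cited above.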
Old Teacher: bazinga!
As you point out, highly questionable assumptions aplenty—and not made explicit. And about those pesky “facts”—
Did you know that the entire Atlanta cheating scandal was built on a foundation of “in-class products” made by teachers? I didn’t know that either, but that’s what you get when you ruthlessly apply a Rheeality Distortion Field to yourself.
Then taking one’s own students [forget that pesky co-teacher!] from the 13th percentile to the 90th percentile is not just possible but a mere byproduct of being that very special kind of person.
¿? You know, the kind of very special person that hallucinates on this blog about VAM but not on, say, VAMboozled (Audrey Amrein-Beardsley) or on the blogs of Bruce Baker or GFBrandenburg or Gary Rubinstein.
As for me, I don’t think we need an old dead Greek guy for this one:
“Everyone is entitled to his own opinion, but not his own facts.” [Daniel Patrick Moynihan]
😎
VirginiaSGP (aka Brian): your claim that a given teacher’s value-added score is pretty stable over time is refuted by the data from New York City, from North Carolina, and many other places, including in Yeh’s paper.
In many cases, a simple scatter plot shows much more than a long paragraph or any equation. Take a look at the graphs I made at
and
Plus, as I have shown for DCPS, and as Yeh’s study also shows, there is only a weak correlation between classroom observation scores and VAM scores.
Yes, there are terrible teachers we should move into other lines of work, and excellent teachers we should keep. Unfortunately, VAM has been shown NOT to be the right way to make those judgments, especially if it’s based on one year of scores (as is the case, despite your protestations to the contrary).
This is junk science that reminds me of the mathematical tailors in the fictional island of Laputa in Gulliver’s Travels who used elaborate geometrical formulas to create clothing that fit nobody at all.
You seem to be much more certain of its utter veracity than anybody else.
The sad truth is this. VAM continues to be used for high-stakes decisions about teachers because it has become a “hardwired” metric in some states. In Ohio, VAM calculations for teacher evaluation are required by the administrative code. SAS is the contractor for data crunching, and the VAM is called EVAAS. EVAAS is the proprietary legacy of Dr. William Sanders and his students, who used a version of this methodology for genetic engineering studies intended to increase the productivity of seeds, sows, and cows. That was before they discovered the thrills of crunching data and making inferential leaps about the productivity of teachers, year-to-year, and then mistaking those gains for all things wonderful about teaching.
Hoping to give credibility to VAM, the Gates Foundation hired economists for the $64 million “Measuring Effective Teaching” (MET) project. In one study, the economists wanted to strengthen the validity of the VAM by getting lots of schools and districts to do random assignments. Random assignments like that are among the many assumptions buried in VAM.
Well, that hoped-for “proof” did not happen. The economists offered schools a perk to comply with the statistical game plan, but all of those real principals, teachers, and students in multiple schools, districts, and states did not stay in their intended slots.
Audrey Beardsley’s website VAMboozled is the go-to place to see all the references that show how this metric has been pushed by economic thinking and the conviction that statistical measures of productivity in economics can tell us which teachers are the best, the “high-quality” teachers. By a process of circular reasoning, a high-quality teacher is one who produces the most bang for the buck, meaning year-to-year gains in scores on statewide tests. VAM survives in part because there is a lot of arrogance about the “objectivity” of test scores.
Now the plot thickens. My copy of Educational Researcher just arrived. In the “Feature Article” two researchers use teacher VAM on statewide assessments as part of a study designed to construct “profiles” of strong and weak instructional practice. The researchers add insult to injury by recycling data on teachers from the failed MET study including VAM, scores on Danielson’s observation framework (used with videos, not live observations), and several other bits and pieces of information about teachers (e.g., experience, scores on licensing tests).
In my opinion, this study should never have gotten through the peer review process, much less be the “Feature Article” complete with generalizations about the potential use of their fledgling methodology to design “programs that are targeted at teachers’ needs and coordinated over domains of practice.”
This study focused on teachers of “middle school literacy”–not anything close to full-spectrum education. The article ends with several paragraphs that indicate the research procedures “may have induced misclassifications (of teacher profiles) due to different levels of measurement.” They note other errors/glitches in the accuracy of the estimates of this and that.
This study was funded by the Spencer and William T. Grant Foundations and with your tax dollars and mine, via the Institute of Education Sciences, where grant reviewers appear to be as dumb or indifferent to VAM as the economists who did the MET project. These two researchers are feeding on that flawed data to characterize teachers.
I am fed up with the pushers of VAM. I have nothing to lose by adding some commentary to amplify the need to stop the use of this methodology for evaluating and profiling teachers. If it matters, I also have a vintage Ph.D. in Educational Theory Construction and Research.
The fact that you have a vintage Ph.D. in Educational Theory Construction and Research gives your comments and beliefs more credibility than a row of economists with an agenda. They may be able to crunch numbers, but they may not be treating significant information with the same care or weight as someone with deep understandings of education would. Some human activities may not lend themselves to a mathematical analysis, and teaching may be one of them. Trying to force education into an economic construct makes as much sense as hiring a plumber to fix your electrical problems, or going to a podiatrist for your heart problem.
VAM is the scarlet letter. I wear it along with many other teachers. Some teachers do not have to wear the scarlet letter. Luckily, they can thumb their noses at VAM because VAM has not taken over their jobs. I do not hold that against these teachers. I am happy that these teachers can still feel good about themselves and what they do in the classroom. I used to be able to do that too. My scarlet letter has lowered my self esteem in the classroom and has made me dread the first week of May when my new scarlet letter is assigned. I am looking forward to the day when my scarlet letter will be thrown in the trash where it belongs. I mourn for a profession I no longer recognize. 😢
Well, Yeh works within the standardized testing discourse of which Wilson has shown the COMPLETE INVALIDITY. He does identify various false assumptions and errors in that discourse promoted by economists and psychometricians (two study areas that need to drop the concept that they are scientific in any way). The boundaries of that econo-psychometric discourse do not allow for questioning its basic assumptions, the main one being that the teaching and learning process can be “standardized” and “measured”.
The various fundamental assumptions (those epistemological and ontological foundations) undergirding the educational standards and standardized testing regime and by extension VAM which relies on those two concepts, have been proven to be full of errors and falsehoods resulting in any resulting conclusions drawn to be “VAIN AND ILLUSORY”, in other words, COMPLETELY INVALID.
Therefore VAM and its brother SLO/SGP methods of evaluating teacher performance can only rationo-logically be deemed USELESS AND A WASTE OF TIME, MONEY, AND RESOURCES that could be better used in actually helping students in the teaching and learning process.
Insanity is as insanity does and these evaluation schemes can only be seen as a form of insanity.
To understand the COMPLETE INSANITY that is VAM & SLO/SGP read and understand Noel Wilson’s never refuted nor rebutted destruction of educational standards and standardized testing (of which VAM & SLO/SGP are the bastard stepchildren) in “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine.
1. A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing the descriptive information about said interactions is inadequate, insufficient and inferior to the point of invalidity and unacceptability.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference” each with distinct assumptions (epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think college professor who “knows” the students capabilities and grades them accordingly), the General Frame-think standardized testing that claims to have a “scientific” basis, the Specific Frame-think of learning by objective like computer based learning, getting a correct answer before moving on to the next screen, and the Responsive Frame-think of an apprenticeship in a trade or a medical residency program where the learner interacts with the “teacher” with constant feedback. Each category has its own sources of error and more error in the process is caused when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other word all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations attempt to “prove” that these tests (based on standards) are valid-errorless or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the test and results are. He is an advocate for the test taker not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. And a basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”) or to put in more mundane terms crap in-crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words it attempts to measure “’something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students as the social rewards for some are not available to others who “don’t make the grade (sic)” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self-evident consequences. And so the circle is complete.”
In other words students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, the “I’m an “A” student” is almost as harmful as “I’m an ‘F’ student” in hindering students becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
Here’s a new word: VAM shame
See how the VAMpire machine (above) is spinning like the Terminator
http://vamboozled.com/its-a-vam-shame/
VAM is an attempt to validate and institutionalize the illegal activity of judging teachers based on exam outcomes, which has gone on long before VAM, and still goes on apart from it.
The exams are faulty, the curricula are faulty, school policies are faulty, our understanding of human psychological and intellectual development is faulty (which accounts for much of the inaccuracy and invalidity noted above), and so much research in the fields involved is biased (some is outright fraud) due to funding and politics and it is almost all far removed from hard science.
One could go on and on about correlations and margins of error, but it is unnecessary. You can say things wash out in averages, but almost everything ultimately does, so that is saying almost nothing. If you can’t have precision for individuals, and you don’t, then you are just tinkering in your own feces (pardon the graphic nature). The use of this garbage is unethical and stupid. Statisticians don’t know how to factor in or out the fraudulent test-prep machines that some teachers have become or have always been, the CTT classes without support, the students with IEPs who don’t really need them but whose parents are overly concerned and insistent upon meeting exceedingly high standards through extra help and extra time on tests, the easily distracted/disruptive yet brilliant students who change remarkably with class dynamics and teacher personalities, and on and on and on and on.
I had a professor nearly 30 years ago who stumped the class, mainly grad students. It led to a breakthrough only a few weeks ago. None of this kind of stuff appears in VAM manure.
You have to be an anti-intellectual to believe in this crap. And not one of those intuitive geniuses who are averse to conventions of expression, but a real major league dunderhead through and through.
Just listened to Yusef Lateef, “Rasheed” (INTO SOMETHING). It inspired me.
So in response to the riff on this thread and others by Johnny One Note off of a classic, a rheephorm favorite called “Strike Up the Bland,” I point viewers of this blog towards—
A posting here of 5-28-2014, “Thankful for John Ewing”—
[start of first four lines of posting]
John Ewing wrote a brilliant article called “Mathematical Intimidation.” If you haven’t read it, please do. It demolishes VAM. He calls on mathematicians to speak out.
[end of first four lines of posting]
Link: https://dianeravitch.net/2014/05/28/thankful-for-john-ewing/
A link to his speech is provided in the posting, and at that time he was “Executive Director of the American Mathematical Society and President of Math for America.”
I urge all viewers of the blog to download and read it.
Just my dos centavitos worth…
😎
Ok, so virtually nobody responded to the points in my post. Nice. You are such an honorable and rigorous bunch. Let me first address a few key points to which you must respond if you ever hope to rebut VAMs.
1. Teacher-observation evaluations are consistent with VAMs. Given this is the case, nobody appears to be disputing the fact that “credentialed educators” see teacher effectiveness in the same relative vein as the VAMs. Thus, there is no disagreement about which teachers are generally better than others. The only disagreement is whether everybody lives in Lake Wobegon (where all teachers are “great”) or whether the bottom 5-25% of teachers are ineffective and destroying kids’ lives.
2. Can none of you read to save your lives?! Chetty/Rockoff/Friedman’s original study talked about an “above average teacher” generating $250K of additional lifetime income. They defined “above average” as 1 std dev above average, or roughly the 84th percentile (16th if you look at below average). This is where the $5/wk/student stuff comes in. But their latest response refined the effects of a top 5% teacher (at least 2 std dev) to be $50K for every single child in the class. Since you all keep repeating the 5% teacher and the $266K number together, it’s pretty obvious you would all fail any reading comprehension test, even at your ripe old ages.
3. Let’s assume for a second that the tests used for these VAMs ask kids who won the Grammy for Hip Hop song last year (or other irrelevant questions). Obviously, one could never generate a reliable trend among teachers from year to year if this were true. However, given Chetty/Rockoff/Friedman’s findings of increased lifetime income for higher VAM teachers, does it even matter? In other words, assume that these teachers aren’t teaching any material to “standard” but merely engaging the students’ curiosity. That results in the students giving more effort on the test and having more ambition after high school. Eventually, they earn more in income. If that were the linkage, retention policies based on VAMs would obviously be worth it! The Chetty study makes all of your other points irrelevant.
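As a side note, the percentile figures in point 2 can be checked against the standard normal distribution. This is a minimal sketch, assuming (as the commenter implicitly does) that VAM scores are roughly normally distributed; it also computes the cutoff for the top 5% for comparison.

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal distribution

# Percentile rank of a teacher whose VAM score is 1 SD above the mean.
pct_1sd = nd.cdf(1.0)      # about 0.84, i.e., roughly the 84th percentile

# How many SDs above the mean marks the top 5% of teachers?
z_top5 = nd.inv_cdf(0.95)  # about 1.64 SD above the mean

print(f"percentile rank of +1 SD: {pct_1sd:.1%}")
print(f"top-5% cutoff: {z_top5:.2f} SD above the mean")
```

Note that the top 5% actually begins at about 1.64 SD under normality, not 2 SD.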
Now, let me address some of your silly points:
A. Old Teacher: In reference to Baker’s blog entry, note that he can’t read either (see #2 above). Furthermore, Baker seems to imply that individual students’ incomes are only separated by $500/yr. But it’s the average of student incomes that is separated by $500/yr. A 1.2% increase in income adds up. Given that the per capita GDP is $50K/yr in the US, that’s ~$600/yr for every single man, woman and child in the country. If Obama could convince Congress to give a tax credit of $2400 per family, you would claim that was a “huge” victory. But if we can generate the same thing via teacher retention policies, that “tax credit” can be generated for FREE!
B. Sad Teacher: so we must assume you have a below average VAM. Since we don’t know your identity, do you mind telling us your score? One point I have always made is that ineffective teachers are highly unlikely to know they are ineffective. Most teachers never get to observe their peers throughout the year. And in any field or sport, most folks think they are at least average. If your fellow teachers were asked to anonymously rank every teacher in the school, obviously some of you would be ranked lower than the “average”. Who do you think those less effective teachers are in your school? Without seeing those subjective ratings, how can you possibly dispute the objective rankings? But let me make this point. Everyone can improve. And even if you happen to be less effective, you should still be thanked for entering education. I am not a good salesperson. I realize that. At least I don’t enjoy performing that role. If I did, it would be harder to accept. If it turns out that you are not very effective in a teaching position, there are other ways to contribute. But ultimately, schools don’t exist to employ (eager) teachers. Schools exist to effectively educate students. That’s just the reality of education.
C. Duane Swacker: working so hard to come up with metaphysical excuses. This is all nonsense but you think it makes you “sound smart”. So let’s pull that string. Let’s assume there is no real standard. All these teachers are just guessing at what the “true standard” is. They try to teach their perceived standards but don’t always stay true to form (some by choice as they divert into preferred subjects while others are not able to stay on topic). The testing companies are even more “confused” as they just make up tests somewhere in the ballpark of the general curriculum topics. Then, students are asked to take “irrelevant” tests that bear little resemblance to the stated standards and even less resemblance to the instructed standards. But amazingly, VAMs produced from these “diverging standards” are remarkably reliable and show tremendous differences in academic growth among teachers over a period of years. How can you possibly make that claim? If the standards were not aligned, then there is NO possible way the data would show such correlation from year to year! I’ve got a suggestion. Go down to the local coffee shop/book store. Get a few stoned college kids and sell them on your ideas. They are about the only ones who will actually believe a word that is emanating from your mouth.
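For what it’s worth, the per-capita arithmetic in point A above can be checked directly. This is a back-of-envelope sketch using the round numbers cited in that point ($50K per-capita GDP, a 1.2% average income gain); they are illustrative figures, not official statistics.

```python
# Back-of-envelope check: a 1.2% average income gain applied to a
# $50K per-capita GDP, then scaled to a family of four.
per_capita_gdp = 50_000
income_gain_rate = 0.012

gain_per_person = per_capita_gdp * income_gain_rate
gain_per_family_of_four = 4 * gain_per_person

print(gain_per_person)          # 600.0 dollars per person per year
print(gain_per_family_of_four)  # 2400.0 dollars per family of four
```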
In the end, you all fight so hard to defend ineffective teachers. Yet every parent and every good teacher will acknowledge there are clearly ineffective teachers who fall through the cracks year after year after year after year. Have any of you proposed any plan to retrain and/or move these ineffective teachers out of core classes? Do you even care about the kids sitting in the classrooms of these teachers? Diane likes to rail that just because other countries don’t use VAMs (even though they have other mechanisms to enforce quality), we shouldn’t use them. But can anybody come up with a reason why the following choice should not be offered:
Parents are allowed to choose whether they want high-VAM teachers or a teacher with fewer students in his/her class. All of you apparently think the VAMs make no difference. If so, why do you care if those of us who believe VAMs are tremendously informative choose the high-VAM teachers? What can you possibly lose?
Virginiasgp, how did you get to be more knowledgeable about teaching than scholars and researchers who have devoted their professional lives to the study of teacher evaluation?
I have to ask again, if VAM is so great, why is there no other nation using it?
Its main accomplishment is to drive teachers out of the profession, to discourage people from wanting to be teachers, to narrow the curriculum only to what is tested, to promote teaching to the test, etc.
Last comment to you: VAM is junk science. It demeans teachers. Nothing you write here changes the reality.
Diane, your whole strategy depends on parents not having access to teachers’ scores and not being able to choose their kids’ teachers. If parents had that access, they could see for themselves that VAMs are “common sense” and measure the same thing that observations see: quality and effective instruction.
But maybe they will be able to see… starting Aug 1… at least in Virginia!
Virginiasgp, it is tiresome to say it again: VAM is junk science. If you want to get rid of good teachers, stay with it.
Your first point is an absolute joke. Of course observations by admin align vaguely with VAM. They are colored by test outcomes and are themselves often superficial and faulty like VAM. You need precision on an order of magnitude you can’t hope to approach, not rough correlation to crap.
Let me suggest that what you are experiencing in your rigorous research is confirmation bias. Don’t expect others to convince themselves of the ghosts, demons and fairies you are seeing in very deliberately selected and hacked-up data and text. You are not truly informing your mind but deforming it.
If my digital thermometer measures subzero temperatures with ±0.0001°F accuracy every day in the middle of a Miami summer, instead of touting the testing device as a great success, I might use common sense and question both the thermometer and why I’m bothering to do this when everybody already knows it is hot.
If a single contradiction is found in a proof, that proof is false and must be reexamined. I find so many contradictions in the application of VAM to teaching, it is a wonder anybody with a smidgen of rational thought still believes it has benefit.
I should start a list….
1. Post-test scores lower than pre-test scores mean I am evaluated as a teacher on knowledge the student lost that I never taught. This makes no sense.
2. Students enter well below grade level; the teacher brings them up several levels, yet they are still not within the test’s measurement range. A highly effective teacher is measured as ineffective.
3. Measuring students who do not participate: if students see little value in the test, what are the tests actually measuring? An Indifference Quotient?
More…?
I understand. Statisticians get all jolly over numbers and models. It can be an adrenaline rush to think you can predict the future by tweaking cells in a spreadsheet. But those of us actually in a classroom know how dissociated from reality VAM is in practice. It becomes a surreal exercise: collecting irrelevant data to feed a secret model to get useless results in an untimely fashion. I feel like I am teaching in a Dalí painting.
“Wheel of (mis)Fortune”
Castles floating in the sky
Pigs and horses floating by
VAM for teachers is surreal
Spinning Wheel-of-Fortune wheel
Reblogged this on David R. Taylor-Thoughts on Texas Education.
My take-away:
“Currently, in the absence of Chetty et al.’s (2011) proposal, some of the existing vacancies across the nation are being filled with novice teachers, some are being filled with experienced teachers who rejoin the teaching force, and at least one vacancy remains (because there is a teacher shortage)—implying that any extra vacancies created by Chetty et al.’s proposal must be filled with novices. There is no other possible source. In the presence of a teacher shortage, it cannot be the case that any of the extra vacancies created by Chetty et al.’s proposal will be filled with experienced teachers. Ultimately, after the type of shuffling described above, all of the extra vacancies must necessarily be filled with novices. Therefore, any policy that involves firing low-performing teachers must acknowledge that the vacant positions will ultimately be filled with novices, not experienced teachers.”
In other words, fire teachers via VAM, fill the vacancies with ‘novices, not experienced teachers.’
What else do we need to know? VAM is a method by which to fire experienced teachers, and replace them with novices.
It should be clear from this study– if you didn’t already get it– that VAM is part & parcel of the ed-reform agenda: to eliminate that social good called ‘public education’ from the US.
Reblogged this on rjknudsen.
VirginiaSGP, it is simply not true that classroom observations are closely correlated with VAM scores. Those correlations have R values of about 0.3, from what I’ve seen.
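To put that figure in perspective, a correlation of r = 0.3 means the two measures share only r² of their variance. A one-line sketch (the 0.3 is the approximate value cited above):

```python
# Shared variance between observation ratings and VAM scores,
# given a correlation of about r = 0.3 (the figure cited above).
r = 0.3
shared_variance = r ** 2
print(f"shared variance: {shared_variance:.0%}")  # prints "shared variance: 9%"
```

In other words, roughly 91% of the variation in one measure is unexplained by the other, which is hardly "consistent."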
Plus, here is an excerpt from Yeh’s article; sorry that the tables are hard to read in my cut-and-paste. Go back and read the original article. You are completely wrong on the facts.
============================
2.1 FIXED TEACHER QUALITY?
A key assumption of the Chetty et al. (2011) analysis is that true teacher quality is fixed over time: “The model for scores . . . . assumes that teacher quality μj is fixed over time . . . . This rules out the possibility that teacher quality fluctuates across years” (p. 7). In other words, the Chetty et al. analysis assumes that a high-quality teacher this year will remain a high-quality teacher next year; a low-quality teacher this year will remain a low-quality teacher next year. Later in the article, however, the authors conclude based on their data that “teacher value-added is not in fact a time-invariant characteristic” (p. 25). While the authors’ analysis assumed that teacher quality is fixed over time, their own data suggest that teacher quality, as measured by teacher value-added “is not in fact” time-invariant, consistent with the results reported by Goldhaber and Hansen (2012). If this assumption is not valid, the conclusions of the analysis are not likely to be valid.
The intertemporal reliability of value-added teacher rankings was investigated by Aaronson et al. (2007), Ballou (2005), Koedel and Betts (2007), and McCaffrey et al. (2009). In each study, VAM was used to rank teacher performance from high to low. In each study, a majority of teachers who ranked in the lowest quartile or lowest quintile shifted out of that quartile (or quintile) the following year (see Tables 1 and 2). Furthermore, a majority of teachers who ranked in the highest quartile or quintile shifted out of that quartile (or quintile) the following year (see Tables 1 and 2).
Table 1. Instability of Value-Added Teacher Rankings in Chicago and Tennessee

| Locale | Bottom 25% in Year t; Top 75% in Year t+1 | Top 25% in Year t; Bottom 75% in Year t+1 |
| --- | --- | --- |
| Chicago, IL | 67% | 59% |
| Tennessee | 60% | 52% |

Notes. Chicago data are from Aaronson et al. (2007, Table 7) for high school math teachers, with controls for student, peer, and neighborhood covariates. Tennessee data are from Ballou (2005, Figure 5b) for math teachers in grades 3–8 in a single large district.
Table 2. Instability of Value-Added Teacher Rankings in San Diego and 5 Florida Counties

| Locale | Bottom 20% in Year t; Top 80% in Year t+1 | Top 20% in Year t; Bottom 80% in Year t+1 |
| --- | --- | --- |
| San Diego, CA | 65% | 71% |
| Dade County, FL | 70% | 67% |
| Duval County, FL | 67% | 61% |
| Hillsborough County, FL | 67% | 67% |
| Orange County, FL | 59% | 65% |
| Palm Beach County, FL | 69% | 68% |

Notes. San Diego data are from Koedel and Betts (2007, Table 9), based on elementary school math teachers, with controls for student and school fixed effects. Data for Florida counties are from McCaffrey et al. (2009, Table 4), based on elementary school math teachers with 15 or more students per year, with controls for student fixed effects.
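Shift rates of this size are exactly what modest year-to-year correlation in VAM scores would produce. The following Monte Carlo sketch is illustrative only (the rho values are assumptions, not estimates from these studies): it treats a teacher's scores in consecutive years as draws from a bivariate normal distribution with correlation rho and asks what fraction of bottom-quartile teachers in year t leave the bottom quartile in year t+1.

```python
import random

Q25 = -0.6745  # 25th percentile of the standard normal distribution

def bottom_quartile_shift_rate(rho: float, n: int = 100_000, seed: int = 0) -> float:
    """Fraction of bottom-quartile teachers in year t who are NOT in the
    bottom quartile in year t+1, under a bivariate-normal model."""
    rng = random.Random(seed)
    stayed = total = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                                      # year t score
        y = rho * x + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)  # year t+1 score
        if x < Q25:              # teacher was in the bottom quartile in year t
            total += 1
            if y < Q25:          # ...and stayed there in year t+1
                stayed += 1
    return 1.0 - stayed / total

for rho in (0.3, 0.5):
    rate = bottom_quartile_shift_rate(rho)
    print(f"rho = {rho}: {rate:.0%} shift out of the bottom quartile")
```

Under these assumptions, correlations in the 0.3–0.5 range yield shift rates broadly comparable to the 52–70% figures in Tables 1 and 2, which is one way to see how noisy single-year rankings are.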