When this statement first appeared in 2014, I said that it should be on the bulletin board of every public school.
The American Statistical Association explains here why the evaluations of individual teachers should not be based on their students’ test scores.
Here is an excerpt. Read the whole statement, which is only 8 pages long:
It is unknown how full implementation of an accountability system incorporating test-based indicators, such as those derived from VAMs, will affect the actions and dispositions of teachers, principals and other educators. Perceptions of transparency, fairness and credibility will be crucial in determining the degree of success of the system as a whole in achieving its goals of improving the quality of teaching. Given the unpredictability of such complex interacting forces, it is difficult to anticipate how the education system as a whole will be affected and how the educator labor market will respond. We know from experience with other quality improvement undertakings that changes in evaluation strategy have unintended consequences. A decision to use VAMs for teacher evaluations might change the way the tests are viewed and lead to changes in the school environment. For example, more classroom time might be spent on test preparation and on specific content from the test at the exclusion of content that may lead to better long-term learning gains or motivation for students. Certain schools may be hard to staff if there is a perception that it is harder for teachers to achieve good VAM scores when working in them. Overreliance on VAM scores may foster a competitive environment, discouraging collaboration and efforts to improve the educational system as a whole.
Research on VAMs has been fairly consistent that aspects of educational effectiveness that are measurable and within teacher control represent a small part of the total variation in student test scores or growth; most estimates in the literature attribute between 1% and 14% of the total variability to teachers. This is not saying that teachers have little effect on students, but that variation among teachers accounts for a small part of the variation in scores. The majority of the variation in test scores is attributable to factors outside of the teacher’s control such as student and family background, poverty, curriculum, and unmeasured influences.
The VAM scores themselves have large standard errors, even when calculated using several years of data. These large standard errors make rankings unstable, even under the best scenarios for modeling. Combining VAMs across multiple years decreases the standard error of VAM scores. Multiple years of data, however, do not help problems caused when a model systematically undervalues teachers who work in specific contexts or with specific types of students, since that systematic undervaluation would be present in every year of data.
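The last two points in the excerpt can be illustrated with a small simulation. This is a minimal sketch, not the ASA's analysis or any state's actual VAM: the single teacher, the noise level (`noise_sd`), and the fixed `bias` standing in for systematic undervaluation are all hypothetical numbers chosen for illustration.

```python
import math
import random
import statistics


def averaged_vam(true_effect, bias, noise_sd, years, rng):
    """Average `years` noisy yearly scores for one hypothetical teacher."""
    yearly = [true_effect + bias + rng.gauss(0.0, noise_sd) for _ in range(years)]
    return statistics.fmean(yearly)


def simulate(years, trials=5000, true_effect=0.0, bias=-2.0, noise_sd=5.0, seed=0):
    """Repeat the averaging many times; return (mean estimate, spread of estimates)."""
    rng = random.Random(seed)
    estimates = [
        averaged_vam(true_effect, bias, noise_sd, years, rng) for _ in range(trials)
    ]
    return statistics.fmean(estimates), statistics.stdev(estimates)


for years in (1, 3, 5):
    mean_estimate, spread = simulate(years)
    theory = 5.0 / math.sqrt(years)  # noise_sd / sqrt(years) for the default noise_sd
    print(f"{years} year(s): mean {mean_estimate:+.2f}, spread {spread:.2f}, theory {theory:.2f}")
```

Under these assumptions the spread of the averaged score shrinks roughly like `noise_sd / sqrt(years)`, while the average estimate stays near `true_effect + bias` however many years are pooled. More data reduces the noise but leaves the systematic undervaluation untouched.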
Despite the warning from ASA, which has no special interest and does not represent teachers or public school administrators, many states continue to use this method (called VAM, or value-added measurement or value-added modeling).
States were coerced into adopting this unproven method by the U.S. Department of Education, which said that states had to adopt it if they wanted to be eligible to compete for nearly $5 billion in federal funds in 2009, as every state was undergoing a budget crisis caused by the economic meltdown of fall 2008.
Many states adopted it, and it has not had positive effects in any state.
In Colorado and New York, among others, VAM scores count for as much as 50% of a teacher’s evaluation.
A state court in New York ruled this method “arbitrary and capricious” when challenged by fourth grade teacher Sheri Lederman and her lawyer-husband Bruce Lederman.
Some states assign VAM scores to teachers based on students they never taught in subjects they don’t teach.
This is an example of federal and state policy that has no basis in evidence and that has harmed the lives of many teachers. It very likely has caused teachers to leave the profession and contributed to teacher shortages.

“Research on VAMs has been fairly consistent that aspects of educational effectiveness that are measurable. . .”
The ASA is caught up in the false discourse regarding the supposed “measuring” of the teaching and learning process. I guess I shouldn’t expect any different. That meme has been floating around since the beginnings of the standardized testing regimes. Of course, the rationo-logical inconsistency of those supporting the testing regime regarding its onto-epistemological underpinnings hardly ever, closer to never, gets questioned. We know that those onto-epistemological foundations are made of concrete that is 99% sand and rock. Why are those underpinnings so weak, actually non-existent??
Richard Phelps, a staunch standardized test proponent (he has written at least two books defending the standardized testing malpractices) in the introduction to “Correcting Fallacies About Educational and Psychological Testing” unwittingly lets the cat out of the bag with this statement (notice how he is trying to assert by proximity that educational standardized testing and the testing done by engineers are basically the same, in other words a “truly scientific endeavor”):
“Physical tests, such as those conducted by engineers, can be standardized, of course, but in this volume, we focus on the measurement of latent (i.e., nonobservable) mental, and not physical, traits.”
Now since there is no agreement on a standard unit of learning, no exemplar of that unit, and there is no measuring device calibrated against said non-existent standard unit exemplar, how is it possible to “measure the nonobservable” which is what all this standardized testing insanity is all about???
So standardized testing attempts to “measure” the “nonobservable” with a non-existent measuring device not calibrated to any non-existent standard unit of measurement of the teaching and learning process.
Let that last statement sink in for a moment!
How absurd and insane is the concept? How lacking in onto-epistemological foundation is the concept of “measuring” student learning? Really? What kind of suckers do the proponents of standardized testing take us for? Obviously very stupid suckers.
So much harm to so many students is caused by the educational malpractices that are standards and testing or, as Phelps puts it, “measuring the nonobservable”.
How insane is this all???
Utterly beyond my comprehension!!!
It could be insane or, based on longstanding patterns of behavior by so-called reformers, it could be malicious.
I choose door number two.
To Duane and KrazyTA: A note from the field of philosophy, which will only stir the pot because it only points to the massive work in foundations that still needs to be done:
“This line of thinking opens out to a whole other field of inquiry regarding methods of inquiry where human data are concerned. That line of thinking we take up in other writings. Briefly, however, B. Lonergan isolates the classical and statistical sciences that, in their specific ways, relate to human studies, e.g., quantification of human data where it is applicable, and medicine. However, human beings are historical and conscious; and so both development (genetic) and dialectical methods also apply in their own data-specific ways. As of this writing, no one that I know of has taken on the application of these methods (or the functional specialties developed in Lonergan’s Method in Theology [1972]) in any systematic way to the human fields. It follows that those fields are still laboring under a kind of slavery to the paradigms of classical and statistical sciences alone with little or no reference or regard to the fact that humans are studying humans; or that this fact brings the comportment of the researcher into view on many levels, and under legitimate scrutiny. Need I say that scrutinizing methods are not yet developed either. An understanding of the functional specialties and the method of consciousness, as critically understood (general empirical method and self-affirmation) by researchers becomes the methodological imperative, but is still overlooked and much neglected.” My emphases for this blog and FWIW. (From my work in progress and reference to B. Lonergan’s Insight: a Study of Human Understanding [2000] and his other works FWIW).
“Utterly beyond my comprehension!!!”
Hey, that guy you shave with figured it out.
“And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.”
Effective marketing-aganda starts where critical thinking ENDS…
The problem is that the data mongering rich and powerful are able to pawn off this nonsense to lay people so easily. Most people think human brains are just empty boxes to be filled with ‘stuff’, quantifiable stuff. They don’t understand child development or the processes of learning. Therefore, the problem is centralized control. The problem is “scaling up” products for sale.
VAM and its kin go by many names.
“Unintended” consequences?
Today’s DILBERT.
Three panels.
First. The pointy-haired boss announces at a staff meeting: “I ranked all of you based on your performance.”
Second. Someone says: “Wally came out on top because he didn’t make any mistakes.”
Third. Dilbert: “He also didn’t do any work.” Wally retorts: “Why does everyone hate winners?”
Indeed…
😎
Holy cow. What a study... It should be case closed on VAMs and the testing craze.
I have very few bookmarks on my computer web-browser. This study is now one of them.
You are so right. Unfortunately, “reform” does not seem to respond to evidence and reason. It responds to what the key billionaire players want.
I prefer this statement about VAM for its brevity and directness. One page is easier to post as well. The author has expertise.
VAMs Are Never “Accurate, Reliable, and Valid”
Citation and source: Klees, S. J. (2016). VAMs Are Never “Accurate, Reliable, and Valid.” Educational Researcher, 45(4), 267. doi: 10.3102/0013189X16651081
http://edr.sagepub.com/content/45/4/267.extract
See also the extended discussion by Klees in an article titled “Inferences from regression analysis: are they valid?”, real-world economics review, issue no. 74 07 April 2016, pp. 85-97, http://www.paecon.net/PAEReview/issue74/Klees74.pdf
Here is an excerpt where Klees takes up the same cause as Leamer, an earlier researcher.
Begin quote
In his now classic article, “Let’s Take the Con Out of Econometrics,” Leamer (1983, p. 36) describes regression analysis in the real world and its consequences:
“The econometric art as it is practiced at the computer … involves fitting many, perhaps thousands, of statistical models….This searching for a model is often well-intentioned, but there can be no doubt that such a specification search invalidates the traditional theories of inference. The concepts of unbiasedness, consistency, efficiency, maximum likelihood estimation, in fact, all the concepts of traditional theory utterly lose their meaning by the time an applied researcher pulls from the bramble of computer output the one thorn of a model he likes best, the one he chooses to portray as a rose.” (p. 91) End quote.
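Leamer's point about specification searches can be made concrete with a toy example. The sketch below is an assumption-laden illustration, not Klees's or Leamer's own analysis: it generates an outcome and 500 candidate predictors that are all pure noise (hypothetical data), then "searches" for the predictor that correlates best with the outcome.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def specification_search(n_obs=30, n_candidates=500, seed=1):
    """Correlate a noise outcome with many noise predictors; return every |r|."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    correlations = []
    for _ in range(n_candidates):
        predictor = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        correlations.append(abs(pearson_r(predictor, outcome)))
    return correlations


# Approximate two-sided 5% cutoff for |r| with 30 observations.
CRITICAL_R = 0.361

correlations = specification_search()
best = max(correlations)
n_significant = sum(r > CRITICAL_R for r in correlations)
print(f"best |r| among noise predictors: {best:.2f}")
print(f"candidates clearing the 5% cutoff by chance: {n_significant} of {len(correlations)}")
```

With noise-only data, about 5% of the candidates would be expected to clear the conventional cutoff by chance. The best of 500 therefore looks like a finding even though nothing is there, which is the "one thorn" portrayed as a rose.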
Educators who are being subjected to variations on VAM (and SLOs) have every right to be angry that this “con in econometric thinking” is being perpetuated by people who knowingly disregard the consequences of the con.
Who are these cons?
Good old Pearson. See —teacher-career-advancement-initiatives-lessons-learned-from-eight-case-studies.pdf
Or try this from the University of Texas, where The Project on Educator Effectiveness and Quality (PEEQ) is underway. http://sites.utexas.edu/peeqlbj/faq/
Or this brief from the National Bureau of Economic Research, with the conclusion that
“Simulations calibrated to the Boston data show that, bias notwithstanding, policy decisions based on conventional VAMs are likely to generate substantial achievement gains. Hybrid estimates that incorporate lotteries yield further gains.”
The paper is behind a paywall. It comes from three MIT economists and one from Berkeley. http://www.nber.org/papers/w21748
Or look at this to see how colleges and universities are being strong-armed into meeting the Gates Foundation’s idea of data for calculating the value of a college degree/certificate and whether postsecondary education institutions/agencies are improving their outcomes.
See especially the color coded list of criteria for judgment and metrics sought to calculate “value.” These are on page iv.
Toward convergence: A technical guide for the Postsecondary Metrics Framework. Washington, DC: Institute for Higher Education Policy. http://www.ihep.org/sites/default/files/uploads/postsecdata/docs/resources/ihep_toward_convergence_low_2b.pdf
I am slow responding to comments because I am in a car driving from Utah to Arizona and wi-fi is spotty.
“The paper is behind a paywall. It comes from three MIT economists and one from Berkeley. http://www.nber.org/papers/w21748”
It is of course ridiculous that one has to pay, or work at a university, to read this not-yet-peer-reviewed article. In any case, we can easily dismiss the paper on the grounds that it relies on standardized test scores in math and English.
Here is a promising sentence from the conclusion of the article:
“The methods developed here may be useful for combining quasi-experimental and non-experimental estimators in other contexts. Candidates for this extension include the quantification of teacher, doctor, hospital, firm, or neighborhood effects.”
I can’t wait to see the VAMming of doctors or my neighbors and their effects on each other’s performance.
Seriously now? Economists are now proposing methods to evaluate education better? What’s their relation to the K-12 teaching profession? The daughter of one of the authors teaches at a charter school.
Laura: “I prefer this statement about VAM for brevity, and directness. One-page is easier to post as well. The author has expertise.”
Which statement are you referring to, Laura? And which author? Is it this reference?
http://edr.sagepub.com/content/45/4/267.extract
That’s also “behind a paywall” 😦
Laura: “Or try this from the University of Texas where The Project on Educator Effectiveness and Quality (PEEQ) is underway. http://sites.utexas.edu/peeqlbj/faq/”
From the FAQ
What is value-added modeling in education?
Value-added modeling (VAM) is a way to measure a teacher’s (or principal’s/school’s) unique contribution to a student’s learning. By measuring a student’s academic history and factors that contribute to a child’s learning, statistical techniques are used to predict students’ performance. The difference between a students’ predicted performance and actual performance can be attributed to their teacher’s (or principal’s/school’s) effectiveness.
[…]
What do you mean by student achievement and student performance?
PEEQ uses the term student performance when referring to measureable outcomes of student learning, such as test scores. Student achievement is used to refer to the broader learning process.
In other words,
1) VAM relies on students’ performance (the broader concept of “achievement” is not taken into account)
and
2) Students’ performance is about measurable outcomes, i.e., test scores.
Since these two points above are not about the quality of education, we can safely dismiss this program.
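The FAQ's definition quoted above reduces to a residual calculation. As a rough sketch only (the records, the single prior-score predictor, and the helper names `fit_line` and `value_added` are all hypothetical; real VAMs use many more covariates and more elaborate models), it might look like this:

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical records: (teacher, prior-year test score, current-year test score).
records = [
    ("A", 60, 66), ("A", 70, 75), ("A", 80, 84),
    ("B", 60, 62), ("B", 70, 71), ("B", 80, 80),
]


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def value_added(rows):
    """Average (actual - predicted) current score per teacher."""
    slope, intercept = fit_line([r[1] for r in rows], [r[2] for r in rows])
    residuals = defaultdict(list)
    for teacher, prior, current in rows:
        predicted = intercept + slope * prior
        residuals[teacher].append(current - predicted)
    return {teacher: fmean(values) for teacher, values in residuals.items()}


scores = value_added(records)
for teacher, score in sorted(scores.items()):
    print(f"teacher {teacher}: {score:+.2f}")
```

Everything the prediction leaves out lands in the residual and is credited to the teacher, and the only input on either side of the calculation is a test score.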
Just reading these notes about VAM reminded me of the moment in medicine when medical doctors thought bleeding people would fix their problems, like a fever; but we know now, of course, that people actually died from the process itself in many cases.
It seems the human fields need a set of theories and methods that take into account the human nature of the data as well as that of the researchers and doctors–as both are historical and have consciousness–and that do not rely ONLY on classical and statistical sciences and their methodologies for their understanding of the data. This is especially so when those theories and methods are not suited to the data, then, or to the active developmental and dialogical relationship between that data and the researcher. It’s quite uncritical and unscientific NOT to take account of those differences in a systematic way, but we haven’t found our way there yet.
Catherine: “It seems the human fields need a set of theories and methods that take into account the human nature of the data as well as that of the researchers and doctors–as both are historical and have consciousness–and that do not rely ONLY on classical and statistical sciences and their methodologies for their understanding of the data.”
If Penrose is correct, not just human fields need such methods. I am not sure, though, that we need any data analysis in evaluating education. Is there a convincing example to the contrary?
To Mate Wierdl: “If Penrose is correct, not just human fields need such methods. I am not sure, though, that we need any data analysis in evaluating education. Is there a convincing example to the contrary?”
Too big for a blog; however, let me say this: The data is human: historical and conscious–unlike natural or physical data that belongs in those fields. A fundamental tenet of science is to pay attention to the data and so to develop and adapt methods and expectations that relate to it (and not to some other kind of data).
And so when we apply ONLY classical and statistical methods to a data field that rightly has “evaluation” as (1) an aspect of the data (from a qualified cognitional theory) and (2) as one of its terms preceding concrete policy and actions, we omit, overlook, or obscure a good part of what is essential in the data and to the researcher. That’s where education is presently, and teachers are at least intuitive about it, because they see its failures every day, and their own successes that seem to have no theoretical or evaluative backdrop. We are not systematic about what most teachers are intuitive about–in education or any of the fields that feed into it, e.g., the social sciences, though much is going really well in some circles, precisely because intuition rocks for many who have recognized our present methodological bankruptcy and have the openness and guts to go forward regardless of the power of statistical fascists. That’s the crisis we are in presently in the fields, however; but lots of light shining through.
The other two methods are, generally, developmental and dialectical. When we are at our best, the four work together in education and any human field–spontaneously, but not yet systematically. And to be consistent with that claim, your question: “do we need any data analysis in evaluating education?” needs to be treated dialectically. That is, YES, we need the other two GENERAL methodologies and several kinds of data analysis in evaluating education. But NO we don’t need it as placed on, then emerging from, the Procrustean bed it’s presently forced onto.
“States were coerced into adopting this unproven method by the U.S. Department of Education, which said that states had to adopt it if they wanted to be eligible to compete for nearly $5 billion in federal funds in 2009, as every state was undergoing a budget crisis caused by the economic meltdown of fall 2008.”
Sounds like bribery from U. S. Dept. of Ed. while states were struggling to maintain schools due to a financial crisis caused by excesses of millionaires/billionaires on Wall Street.
Then, millionaire/billionaire non-educators become education reformers!
Sounds like bullying to me. Where’s the HIB Incident Report?
HIB = ???
What’s going on is actually troubling. Papers like
http://sites.utexas.edu/peeqlbj/2016/02/12/new-paper-teacher-preparation-for-profit-or-prestige-analysis-of-a-diverse-market-for-teacher-preparation/
(from Laura’s link to UofTX) show that now the other side is piling up “research” to support their agendas so that now they really can say that their stuff is research based.
The only way I see to counter this is to say that these papers are not about the quality of education (or even education as we understand it) but about test scores.
So, in my opinion, we really need to stop this at the beginning, at the level of fundamental assumptions, on high ground, or else we’ll be pulled into very messy arguments about randomized likelihood variables of lottery effects and non-parametric least square estimators.
In other words, forget about getting pulled into muddy statistics, where we’ll drown, forget about waiting for some good soul to take the (enormous amount of) time and address the math and statistics these papers contain. There is absolutely no need for this.
Let’s trust that our simple message “test scores have no relationship to good education” will do the job.
Let’s put a clear, rigid distance between test scores and education, and don’t let it shorten.
With such an attitude, we’ll lose our insecurities when yet another paper comes out from the VAM laboratory, and we’ll be able to bravely cut through the mathcrap in search of the sentence about test scores, and once we have found it (and we will find it) we’ll exclaim “irrelevant”, and delete the PDF of the paper from our hard drives. In time, we could even organize little investigative competitions to see who can find the “test score” sentence first in a shiny new VAM paper.
Don’t worry, this is not a brand new situation in science, when papers appear with contents that are irrelevant to their claimed subjects. This happens all the time. There is even a standard terminology for these papers: these are the “Not even wrong” papers, because they are so far removed from addressing a particular problem that we cannot even begin to qualify them as right or wrong.
For example:
In this paper, we quantify the likelihood of Donald “Yuge” Trump becoming president. We find, to our surprise, that if we make the mild assumption that Elvis is alive, the chances of the Donald’s presidency is at least 93%. Our main tool is a dispersified version of the standard law of large number likelihood lottery statistics.
Can you investigate whether the paper’s conclusion about Trump’s presidency is correct? No, because it is drawn from the incorrect assumption that Elvis is alive. The paper is beyond wrong. It makes no sense.
Hi Diane —
Another great blog post about VAM by Cathy O’Neil, Harvard math Ph.D. recipient and author of the recently published “Weapons of Math Destruction.”
Enjoy!
Cheers, Dave Yrueta