The American Educational Research Association issued a warning against the use of value added measures for high-stakes decisions regarding educators and teacher preparation programs. The cardinal rule of assessment is that tests should be used only for the purpose for which they were created. A measure of fourth grade reading measures the student, not the teacher, the principal, or the school.
AERA Issues Statement on the Use of Value-Added Models in Evaluation of Educators and Educator Preparation Programs
WASHINGTON, D.C., November 11, 2015—In a statement released today, the American Educational Research Association (AERA) advises those using or considering use of value-added models (VAM) about the scientific and technical limitations of these measures for evaluating educators and programs that prepare teachers. The statement, approved by AERA Council, cautions against the use of VAM for high-stakes decisions regarding educators.
In recent years, many states and districts have attempted to use VAM to determine the contributions of educators, or the programs in which they were trained, to student learning outcomes, as captured by standardized student tests. The AERA statement speaks to the formidable statistical and methodological issues involved in isolating either the effects of educators or teacher preparation programs from a complex set of factors that shape student performance.
“This statement draws on the leading testing, statistical, and methodological expertise in the field of education research and related sciences, and on the highest standards that guide education research and its applications in policy and practice,” said AERA Executive Director Felice J. Levine.
The statement addresses the challenges facing the validity of inferences from VAM and specifies eight technical requirements that must be met for the use of VAM to be accurate, reliable, and valid. It cautions that these requirements cannot be met in most evaluative contexts.
The statement notes that, while VAM may be superior to some other models of measuring teacher impacts on student learning outcomes, “it does not mean that they are ready for use in educator or program evaluation. There are potentially serious negative consequences in the context of evaluation that can result from the use of VAM based on incomplete or flawed data, as well as from the misinterpretation or misuse of the VAM results.”
The statement also notes that there are promising alternatives to VAM currently in use in the United States that merit attention, including the use of teacher observation data and peer assistance and review models that provide formative and summative assessments of teaching and honor teachers’ due process rights.
The statement concludes: “The value of high-quality, research-based evidence cannot be over-emphasized. Ultimately, only rigorously supported inferences about the quality and effectiveness of teachers, educational leaders, and preparation programs can contribute to improved student learning.” Thus, the statement also calls for substantial investment in research on VAM and on alternative methods and models of educator and educator preparation program evaluation.
Related AERA Resource:
Special Issue of Educational Researcher (March 2015)—
Value Added Meets the Schools: The Effects of Using Test-Based Teacher Evaluation on the Work of Teachers and Leaders

Too late. Ugh. Mixed messages from the American Educational Research Association on so-called value-added measures: the infamous VAM.
AERA’s long delay in taking a position on the abuses of VAM in K-12 education over the last 15 years is suddenly worth attention and a public statement. But why now? Why so little, so late?
And why this hedging?
The press release (Nov. 11, 2015) actually gives credence to VAM and to the whole mission of “measuring teacher IMPACTS on student learning outcomes.”
Is no one at AERA paying attention to this horrible language? Or to the gaping hole left for future abuse of VAM in this message?
“While VAM may be superior to some other models of measuring teacher impacts on student learning outcomes, ‘it does not mean that they are ready for use in educator or program evaluation. There are potentially serious negative consequences in the context of evaluation that can result from the use of VAM based on incomplete or flawed data, as well as from the misinterpretation or misuse of the VAM results.’”
Superior ??
So, because VAM has migrated into higher education, AERA suddenly discovers it needs to say “not in my territory, not in teacher education, not in my program evaluations.”
Dear colleagues in research, the use of VAM for teacher, principal, and school evaluations has been common in K-12 education for fifteen years. Aided and abetted by your collective silence, countless schools have been closed, able and committed principals and teachers fired, and others demeaned by the aggrandizement of test scores and quixotic ratings from VAM and so-called alternative measures of teaching effectiveness, including the notoriously invalid SLOs.
So, suddenly the researchers and scholars in teacher education are “at risk” of being VAMed. Their “value added” will be calculated from the student scores produced by graduates of their programs. Teacher education programs will be stack-ranked by the long reach of Bill Gates’ “teacher quality” initiative and the migration of VAM algorithms from their use as measures of the productivity of seeds, sows, and cows to their use as measures of the productivity of teachers and teacher education programs.
Being VAMed means you are going to be impacted by a flawed policy and metric.
Ask those who have been victimized. And while you are thinking about the sham ratings of teachers, consider an apology to the entire community of workers in K-12 education for your unconscionable silence about the abuse.
And please stop legitimating the idea that IMPACTING is a proper word for describing the work of teachers.
Their warning about VAM is tepid, especially after they spend the time to outline numerous flaws and shortcomings. They have chosen a too-polite response: “there is wide agreement that unreliable or poor-quality data, incorrect attributions, lack of reliability or validity evidence associated with value-added scores, and unsupported claims lead to misuses that harm students and educators.” While they do mention that peer review and traditional evaluation models are superior, they fall short of explaining the dire consequences of the misuse of VAM. VAM is mostly being used to assign blame to teachers for students’ performance on standardized tests. In some cases teachers are being assigned blame for students or subjects that they don’t even teach. This is outrageous! Some states are relying on the results of VAM to reduce teachers’ wages or to circumvent workers’ due process rights. Governors appreciate VAM because it gives them license to violate contracts and dispense with pesky senior staff. If this isn’t bad enough, the magical ax-wielding algorithm is a big secret.
The “superior” remark is in relation to a status model, that is, a model that uses only student scores at the end of a school year. VAM models compare tests over time to measure growth, instead of relying on the status model’s single end-of-year test.
In other words, VAM is superior only to being evaluated by one test at the end of the year, and nothing else.
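For readers unfamiliar with the distinction, here is a minimal sketch of the two scoring ideas. The data and functions are entirely hypothetical, made up for illustration; real VAM systems use far more elaborate (and often proprietary) regression models that attempt to control for student background factors:

```python
# Hypothetical illustration of a status model vs. a naive growth score.
# All names and numbers are invented; this is NOT an actual VAM algorithm.

def status_score(end_of_year_scores):
    """Status model: judge a classroom by the average of one end-of-year test."""
    return sum(end_of_year_scores) / len(end_of_year_scores)

def growth_score(fall_scores, spring_scores):
    """Naive growth model: average gain between two tests for the same students."""
    gains = [spring - fall for fall, spring in zip(fall_scores, spring_scores)]
    return sum(gains) / len(gains)

fall = [45, 50, 55]
spring = [60, 62, 64]
print(status_score(spring))        # 62.0 -- one snapshot, no context
print(growth_score(fall, spring))  # 12.0 -- change over time
```

Even this toy version shows why a growth comparison is “superior” only in the narrow sense the commenter describes: it uses two data points instead of one, but it still attributes every point of gain or loss to the teacher.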
retiredteacher,
How about a not so tepid warning:
Aviso/Danger/Warning!
VAM and SLO/SGP ARE COMPLETELY INVALID AND WOULD BE BEST CONSIGNED TO OBLIVION! ANYONE WHO USES* THE RESULTS OF THE PROCESS IS A CHARLATAN, QUACK, POSER AND CON ARTIST MIXED IN ONE.
(And I’m being nice!)
*Other than to debunk the bovine excrement that are educational standards and standardized testing.
And there’s this from the ‘Southern Tier’ of NY.
http://www.pressconnects.com/story/news/local/2015/11/09/southern-tier-sounds-off-common-core/75471310/
“A measure of fourth grade reading measures the student”
Of course, it doesn’t even do that. Right Duane?
Ed, have you not read Wilson?
Not in a month or so 🙂
From Banesh Hoffman, THE TYRANNY OF TESTING (2003 republication of the 1964 edition of the 1962 original, pp. 29-30):
[start]
Having fun at the expense of tests and testers is not mere idle amusement. It serves an important purpose. And exhibiting and analyzing defective questions serves an even more important purpose, as will be seen later on. The problem of testing is far too serious and far too difficult to be treated wholly in a spirit of fun. Two facts dominate the problem. One is that testing must take place. And the other is that, except in the simplest situations, there is no satisfactory method of testing—nor is there likely to be. Human abilities and potentialities are too complex, too diverse, and too intricately interactive to be measured satisfactorily by present techniques. There is reason to doubt even that they can be meaningfully measured at all in numerical terms. Yet measurement, assessment, estimation, guesswork—call it what you will—can not cease.
[end]
People ignore Wilson and Hoffman at their peril.
Just my dos centavitos worth…
😎
My responses to Laura C. bringing this up in a prior thread:
Laura, speaking of the AERA and testing, have you read this?
“A Little Less than Valid: An Essay Review” by Noel Wilson
American Educational Research Association; American Psychological Association; National Council on Measurement in Education. (2002). “Standards for Educational and Psychological Testing” Washington, DC: American Educational Research Association. ISBN 0-935302-25-5
“As a test maker I worked for the Australian Council for Educational Research for six years. As a result I had always regarded this book in its previous incarnations as a sort of bible, a reference of last resort. So not until I wrote my Ph D thesis on Educational Standards and the Problem of Error did I subject the 1985 version of Standards to a more critical analysis (Wilson, 1997). As that analysis was not overly complimentary, I thought it only fair to look at the 2002 version with similar critical gaze. As before, I focus on validity. Why? Because, as the good book says, “Validity is, therefore, the most fundamental consideration in developing tests” (p. 9). I concur. If the test event is not valid, if indeed the test is invalid, then all else is vain and illusory.”
Citation: Wilson, Noel. (2007, April 26). A little less than valid: An essay review. Education Review, 10(5). Retrieved [date] from http://edrev.asu.edu/essays/v10n5index.html
And my responding to Laura again:
“Is no one at AERA paying attention to this horrible language” or to Noel Wilson’s critique of the testing bible?
From the introduction to the AERA VAM edition:
“While a useful starting point, the validity and reliability of the measures tell us very little about the effects on teaching and learning that come from embedding value added into policies like teacher evaluation, tenure, and compensation.”
How does that square with what the testing bible put out by the AERA, APA and NCME says about testing and by extension the usage of results of said tests: “Validity is, therefore, the most fundamental consideration in developing tests” (p. 9).
Let’s see: validity of the test is “the most fundamental consideration,” but the fact that the tests and conclusions are completely invalid, as proven by Wilson, “tells us very little about the effects on teaching and learning. . . .”
NO, IDIOTAS, it tells us all we need to know, any conclusions are also COMPLETELY INVALID THEREFORE COMPLETELY USELESS WHICH TELLS US ALL (a hell of a lot more than “tells us very little”) WE NEED TO KNOW ABOUT THE PROCESS.
Using the results of standardized tests for anything (other than lining the bird cage or litter box with the paper) is pure conjecture, lying, fraud, falsehoods, deceptions, dishonest, etc. . . .
Why it is almost impossible to disabuse people of the tyranny that are educational standards and standardized testing and to help liberate their thinking and elevate practices beyond sorting, separating and discriminating against students is beyond my comprehension.
Open your minds folks, they’re rotting from within without a little sunshine of rationo-logical thought to disinfect them from these educational malfeasances.
Ed, yep! And KTA, excellent quote; your dos centavitos are worth more than the AERA testing bible (and their current volume).
I have mixed feelings about that Hoffman excerpt, because it presents several different points and a few of them are flawed. One major flaw is the conflation of measurement, assessment, estimation, and guesswork. These are not interchangeable.
Another:
“One [fact] is that testing must take place”
This is not a fact.
The rest of it does seem to jibe with Wilson, who, to my understanding, would argue that just because you “measure” a student doesn’t mean that measurement means anything at all. In other words, your measurement via the test is not really a measurement; it is a fallacious assumption and an arbitrary label.
Ed Detective: the faults lie with me, not Banesh Hoffman.
1) When he refers to “testing must take place,” he is not referring to himself but to the self-styled “professional testers” (and their enablers and enforcers) who insist and mandate that “testing must take place.” In other words, he was writing against the testocrats’ mantra of his day about the “fierce urgency” of standardized testing, especially of the high-stakes variety. Indeed, his last chapter is entitled “Don’t Be Pro-test — Protest” — so the fault, again, lies with me in not providing sufficient context.
2) Again, when you state “One major flaw is the conflation of measurement, assessment, estimation, and guesswork,” the fault lies with me, not Hoffman. He was writing for the general public of his time. If you buy his book (a slim volume) you will realize that he is very careful in word choice. Hence, in my short excerpt I did not make it clear that he is trying to make a general point to a very broad audience.
What I hope comes through loud and clear are these two sentences:
“Human abilities and potentialities are too complex, too diverse, and too intricately interactive to be measured satisfactorily by present techniques. There is reason to doubt even that they can be meaningfully measured at all in numerical terms.”
I think today he would be a leading figure in FairTest and the Opt Out Movement. And I also think if he were alive in 2015 he would amend the second sentence to read “There is no reason to believe that they can be meaningfully measured at all in numerical terms.”
Hence, he and Wilson would stand together, arm in arm, against the onslaught of the bean counters that know the price of everything and the value of nothing.
Just my tres centavitos worth…
*I added one centavito to the first two.
😎
We have a big problem when policy is dictated by politics and prejudice rather than research. Think tanks, foundations, and governors are not research. They are political arms that dictate rules that suit their agendas. Their big idea is to destroy the public schools that have made this country what it is, so that a few string-pulling wealthy people can make a profit from our children.
It is time for the teachers’ union to file a class action suit on behalf of teachers and administrators against the use of VAM. Unless the testing companies say the tests measure teacher effectiveness, they should not be used for that purpose. Time for the membership to pressure the union to take a stand.
What took them soooooo long?
When VAM was first introduced, the authors stated that it should not be used for high-stakes decisions! Then money got involved and nobody listened!
AERA is just repeating what has been said over the last few years, that VAM, SGP, and so on, should not be used for high stakes decisions.
Will anyone listen????
Reblogged this on onewomansjournal and commented:
VAM gage politicians!
“Warning!! VAM causes cancer”
It’s known to California
To cause an edu cancer
The experts tried to warn ya
That VAM is not the answer
Finally, they acknowledge that the VAMpire is a creature that spreads its virus over science.