This superintendent posted a request for help. I will be posting a summary of research on value-added measurement later today. I think it is fair to say that while economists like VAM (they measure productivity), education researchers overwhelmingly oppose VAM because they know that most of the factors affecting test scores are beyond the control of the teacher.
I am a Superintendent in Texas and I’m looking for some insight into a connection I just became aware of. The state of Texas has begun the process of revamping principal and teacher evaluations. Recently (in the last few months) the Commissioner of Education reached a compromise with the USDE about NCLB requirements. Part of the compromise required Texas to include test scores in the teacher evaluation tool.
Now I see, taken from the SEDL website ( http://txcc.sedl.org/our_work/ ), that the state’s work on both the Principal and Teacher Evaluation systems is based on the priorities of the USDE. Unless I’m mistaken, the USDE priorities have been in place for several years. That would make the Commissioner’s “compromise” essentially a lie. He planned all along to implement a system like this. The best remedy to this kind of “in the dark” activity is sunlight.
Can anyone help explain these connections? I realize my explanation is short on details, but I believe the answers could be very enlightening when you consider the following points:
-Texas, especially our governor, has made a point of opposing EVERYTHING Washington
-Texas filed a waiver from NCLB and then pretended the result was the best it could do
-Educators are about to have an evaluation system imposed on them that will, for all practical purposes, reestablish High Stakes Testing as a priority in this state by requiring that student test scores be a SIGNIFICANT (emphasis TEA) portion of their evaluation
This stuff is not a coincidence; just look at the pattern of reform initiatives in other states. It’s only just begun here in Texas.
My email is bendeancarson@gmail.com
THE BELOW INFORMATION IS FROM THE SEDL WEBSITE REFERENCED ABOVE
This project relates to the following USDE Priorities:
Identifying, recruiting, developing, and retaining highly effective teachers and leaders
Identifying and scaling up innovative approaches to teaching and learning that significantly improve student outcomes
“education researchers overwhelmingly oppose VAM because they know that most of the factors affecting test scores are beyond the control of the teacher” Yes, this statement is sadly true. But this is why Sanders proclaims that his use of randomly assigned students to classrooms (which we know does not happen in real schools) controls for the impact of these outside factors: his model accounts for them, so we can ignore them.

The flaw of the VAM system marketed by SAS and Sanders is that it contains no independent measures of a teacher’s quality. Quality indicators such as scores on licensing exams, grades from the ed prep program, a major or minor in the subject taught, years of experience, and more could be used to identify and rate teachers and assign a value related to teaching skill, then use that in the model. But Sanders uses the test data to create an array of potential test scores from modeled students to assign a teacher rating, then plugs that rating back into the model to determine whether a low- or high-rated teacher makes an impact on student test scores: a circular system, in which the teacher rating is based on the teacher rating.

Also, in assuming that outside factors, class size, mix of students, etc. can be ignored because of that gold standard thingy and other statistical assumptions, Sanders just plain ignores the impact such factors have on the predictive power of his model. Thus the outcome of a teacher getting rated high one year and poor the next. No validity, no reliability. The models do not seem to measure what they say they measure, and they are not consistent across years. This could be fixed, but the model would still be silent on ratings of teachers who do not teach the tested subjects. How the hell do they think such test-based ratings apply to teachers of other subjects?
AKLA, I will soon post a strong critique of Sanders’ model of VAM
Others with expertise have already done so, quite some time ago. But a good summary would be useful, although a bit late. The boat has sailed: VAM has been adopted by many states, marketed by SAS Institute, etc. About ten years ago, the North Central Regional Education Lab put out a nice series of papers on VAM; while the lab has since changed names, restructured, and perhaps gone private, the papers should still be online.

Once again, it matters not that VAM was developed by Sanders, an ag-econ, not an educationalist; nor that he has failed repeatedly to explain how the model works; nor that the model does not speak to outside factors that teachers cannot impact (his assumption is that random assignment and large sample size negate SES, so all he is measuring is the impact of the teacher in the classroom, a bit naive to say the least). It is that he uses no independent measures of teacher quality to rate the teachers, and then he loops his rating back into the model to come up with the final rating and its impact on student achievement, which is what was used to derive the teacher rating in the first place. Circular. This is fine for some models, but in the teacher-ratings game it is not appropriate.

When I explain this system to my students, the first thing they come up with is “so how much are you paying us to try hard to score well?” Looking forward to your summary of the research on VAM modeling.
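The instability point above (a teacher rated high one year and poor the next) can be illustrated with a toy simulation. This is my own sketch, not Sanders’ actual model; the teacher-effect and noise spreads are hypothetical numbers chosen only to reflect the premise that most score variance lies outside the teacher’s control. Under that premise, a naive rating based on one class’s mean score gain barely correlates with the same teacher’s rating the following year:

```python
import numpy as np

rng = np.random.default_rng(0)

N_TEACHERS = 200
CLASS_SIZE = 25
TEACHER_SD = 1.0   # hypothetical spread of true teacher effects (small)
NOISE_SD = 8.0     # hypothetical spread of everything outside the teacher's control

# each teacher has a fixed, unobserved "true" effect on score gains
true_effect = rng.normal(0.0, TEACHER_SD, N_TEACHERS)

def estimated_vam():
    # naive "value-added" rating: mean score gain of one class of students
    gains = true_effect[:, None] + rng.normal(0.0, NOISE_SD, (N_TEACHERS, CLASS_SIZE))
    return gains.mean(axis=1)

year1 = estimated_vam()
year2 = estimated_vam()

# how stable are the ratings from one year to the next?
r = np.corrcoef(year1, year2)[0, 1]

# of the 40 teachers rated in the top quintile in year 1, how many repeat in year 2?
top1 = set(np.argsort(year1)[-40:].tolist())
top2 = set(np.argsort(year2)[-40:].tolist())
repeats = len(top1 & top2)

print(f"year-to-year correlation of ratings: {r:.2f}")
print(f"top-quintile teachers who stay on top: {repeats} of 40")
```

With these assumed numbers the year-to-year correlation comes out well under 0.5, and a large share of “top” teachers fall out of the top quintile the next year, even though every teacher’s true effect never changed. The instability is entirely an artifact of measuring a small signal through a lot of noise.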
THE critique of VAM (and all educational malpractices based on standardized tests) that goes to the very fundamental epistemological and ontological false bases of it is Wilson’s 1997 work (see below or most any other post on this blog-ha ha!).
Everything else is secondary to tertiary at best (which doesn’t negate the importance either).
Reblogged this on David R. Taylor-Thoughts on Texas Education.
Dear Diane,
I am writing to respectfully challenge the assertion that “education researchers overwhelmingly oppose VAM.”
At the AERA meeting in Philadelphia there were two dozen sessions that included value-added modeling as a topic. Not all were favorable, to be sure, but many papers were presented by well-meaning social scientists engaged in non-ideologically driven inquiry into VAM methods.
In addition, while it is generally understood that “most of the factors affecting test scores are beyond the control of the teacher” this does not preclude analyses, such as VAM, that attempt to isolate and model the teacher’s effect independent of all other factors.
How those effects translate into high-stakes evaluations is another matter, and we should all be able to agree that educating our leaders and ourselves on the appropriate uses and limitations of VAM must be a priority for proponents and critics alike.
David H. Cooper, Ph.D.
Professor of Education
Elon University
Far too many papers on looking at the models, trying new models, testing assumptions, and arguing over whether we should say “statistically significant” or some other education jargon when the statistical models fail to provide statistical evidence that a program works or is better than another. And just how many decimal places are needed? 🙂 AERA: much ado about nothing for the most part. IMHO.

AERA reminded me of the NHTSA conferences and the work on traffic safety issues of drunk driving, deterrence theory, speeding, etc. Much discussion, much research, much bickering about appropriate models. Or the heads of the big ed prep schools pulled together by the feds to discuss why so much education research and publication was not relevant to teachers in the classroom, and why not use teachers as partners in the research to help identify real issues related to teaching? Their response: teachers do not have the background to understand research or to identify education issues of importance. Whitehurst almost could not believe the panel of ed leaders.

Anyway, keep plugging away, and one day your group will find out how many angels can dance on the head of a pencil, but the relevance to classroom instruction and student learning will be scant.
Please re-read this posting. I also suggest looking at a few other postings by the owner of this blog on VAM.
IMHO, “education researchers overwhelmingly oppose VAM” by itself does not adequately describe Diane Ravitch’s opinions on VAM. That is why there are other words following the truncated quotation.
And let me add: as a quick shorthand, yes, I might write the same, but it would be an incomplete statement/thought.
😎
VAM is inherently ideological, containing in its language the ideology of transplanting business methods and goals to education. After all, it’s intended “to increase the value of a product or service before it is sold to a customer.”
How is it not ideological to refer to students as “products” being prepared for sale to “customers” (i.e. employers, Pearson, Amplify, etc.)?
I think most teachers enter the field with very different goals, and perceive their students very differently. Likewise, parents do not view their children as products to be sold.
Thank God for that, at least.
It baffles me why people who are so enamored of business methods and goals go into public service in the first place.
If you want to call yourself a “CEO” go BE one in a private sector company. It’s a free country! Go start a business!
Don’t pretend to be a CEO of a private sector company and overlay that metric on a public entity like a public school.
If they want to run something “like a business”, we have a sector for that!
Go run a BUSINESS like a business. Let someone else who actually understands and values “public servant” and “public sector” run public schools. It’ll work out great.
Chiara, those $600-billion-a-year budget honeypots are too sweet to resist, especially since the US business class seems incapable of producing anything in this country besides data-mining social media bubbles.
David,
“I am writing to respectfully challenge the assertion that “education researchers overwhelmingly oppose VAM.”
They may not “overwhelmingly oppose VAM”, but that doesn’t mean they are correct in accepting it as a VALID and ETHICAL practice. It’s neither valid nor ethical. How many times in human history has the dominant ideology (in this case idiology, purposely misspelled) been proven to be wrong, as VAM is “wrong”?
“. . . we should all be able to agree that educating our leaders and ourselves on the appropriate uses and limitations of VAM must be a priority for proponents and critics alike.”
What we should agree upon is the complete epistemological and ontological invalidity of the educational malpractices upon which VAM is based: educational standards and standardized testing. That is the only rational, logical option, since those processes contain so many errors and fallacies that they are rendered completely invalid. To understand why, read Noel Wilson’s complete logical and rational destruction of those concepts in his never refuted nor rebutted “Educational Standards and the Problem of Error”, found at:
http://epaa.asu.edu/ojs/article/view/577/700
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine. (updated 6/24/13 per Wilson email)
1. A quality cannot be quantified. Quantity is a sub-category of quality. It is illogical to judge/assess a whole category by only a part (sub-category) of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as one dimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing we are lacking much information about said interactions.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student not only onto the student but also, by extension, onto the teacher, school, and district. Any description of a testing event is only a description of an interaction: that of the student and the testing device at a given time and place. The only logically correct thing we can attempt to do is describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student, as it is a description not of the student but of the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference”, each with distinct assumptions (epistemological bases) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think of a college professor who “knows” the students’ capabilities and grades them accordingly), the General Frame (think of standardized testing that claims to have a “scientific” basis), the Specific Frame (think of learning by objective, as in computer-based learning where a correct answer is required before moving on to the next screen), and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program, where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error is introduced when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words, all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid: errorless, or at least with minimal error (they aren’t). Wilson turns the concept of validity on its head and focuses on just how invalid the machinations, the tests, and the results are. He is an advocate for the test taker, not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. A basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like the blind and anosmic squirrel that finds the occasional acorn, a result may be “true”), or, to put it in more mundane terms, crap in, crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words, it measures “‘something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students, as the social rewards available to some are not available to others who “don’t make the grade (sic)”. Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self-evident consequences. And so the circle is complete.”
In other words, students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, “I’m an ‘A’ student” is almost as harmful as “I’m an ‘F’ student” in hindering students from becoming independent, critical, and free thinkers. And having independent, critical, and free thinkers is a threat to the current socio-economic structure of society.
There appear to be numerous state commissioners/superintendents misleading policy makers on the provisions of NCLB and waivers addressing educator evaluation guidelines. But states do not have to specifically include standardized testing in evaluation guidelines. Here is some background that may be helpful. It seeks to answer the following question:
“Would revisions to state teacher evaluation systems’ use of statewide testing jeopardize federal ESEA waivers?”
The short answer is that many states received waivers without indicating any percentage at all. What many states submitted was far above and beyond what was necessary; some at the urging of state agency commissioners or superintendents who purposely misled policy makers.
There is no valid reason for the US DoE to reject/invalidate waivers simply because a state excludes test scores, changes percentages, or for that matter alters the frequency of evaluations. The evidence is provided below (and there are references to a few state plans that do not include test scores at all – see examples at end).
First, federal waiver compliance requires evaluation systems to “take into account multiple valid measures, including as a significant factor data on student growth”. Upon review of the ESEA/NCLB law, the ESEA Waiver Application, and the ESEA Waiver FAQ, none makes any reference to standardized tests relating to educator evaluation (see other references to federal testing requirements below). The phrase most often used is “multiple valid measures”. There are numerous references in the federal ESEA Waiver FAQ (the full document and other resources can be found here: http://www2.ed.gov/policy/elsec/guid/esea-flexibility/index.html ) that support this. The FAQ also notes a preference for flexibility around annual evaluations and mentions that certain teachers may be evaluated once every three years, not annually as many commissioners have pushed.
Second, the situation in Maryland and Connecticut may be a good example relevant to other states’ struggles. In Maryland the power to determine the percentage of student performance that accounts for a teacher’s rating is granted to locals, who must meet the federal standard of “significant part” or “significant factor”. Like Connecticut, the materials that Maryland’s Department of Education sent to the feds indicate that student performance accounts for about 50% of a teacher’s evaluation (we are roughly 45%). But unlike Connecticut, Maryland’s percentages are not determined with any statutory authority (conversely, Connecticut statute authorized the state agency to implement guidelines based on a certain council’s recommendations, and so deferred authority to the agency). In short, Maryland’s plan is simply a state department model plan that they’ve disingenuously represented as the statewide requirement (similar to, but more brazen than, the Connecticut state agency’s attempt to present a “model” evaluation plan as the state’s only permissible model). In these states and elsewhere, legislatures may be trying to reclaim authority over teacher evaluation requirements that has been somewhat usurped by the state education agencies.
Third, there are multiple states that received waivers without using any percentage. Here are some specific examples of states that have received waivers without a reference to percentages, some of which used the term “significant part/factor”. Links to their federal waiver application are included:
Maryland – State law established that changes in student growth are to be a “significant factor” in the evaluation of teachers and principals. The state department recommends 50%, but this is not binding. http://www2.ed.gov/policy/eseaflex/approved-requests/mdrequestp3approvalamended010913.doc (see P. 152 regarding state law and P. 166 regarding state department – note the permissive language: “The State Board of Education specified that student-learning gains should comprise 50 percent of the evaluation.”).
Massachusetts – Does not reference any particular percentage. It defers to local districts to identify and use measures of student learning, growth and achievement, and to determine ratings of high, moderate, or low for educator impact on student learning. Growth on the statewide assessment is used where available. http://www2.ed.gov/policy/eseaflex/approved-requests/ma.pdf (see page 89)
Missouri – Received approval using no specific percentage. It uses the term “significant part”. http://www2.ed.gov/policy/eseaflex/approved-requests/mo.pdf (see pages 106-107 and 112-114)
Michigan – Received approval using no specific percentage in 2011-12 and 2012-13, also using the term “significant part”. Michigan’s plan then changes from “significant part”, increasing in later years as follows: 25% in 2013-14, 40% in 2014-15, 50% in 2015-16. This is evidence that the US DoE approved a plan with a significant part arguably less than 25%. https://www2.ed.gov/policy/eseaflex/approved-requests/miamendedreq080213.pdf (this is an amended version approved late this summer; see page 175)
Kentucky – Received approval using no specific percentage. It says “…multiple measures of effectiveness including use of student growth data (both state standardized tests and formative growth measures that are rigorous and comparable across schools in a district) as a significant factor.” https://www2.ed.gov/policy/eseaflex/approved-requests/ky_request_amend_092812.doc (p. 93)
A note on federal testing requirements, evaluations, and opt-out:
Some commissioners conflate the use of standardized testing in evaluations with requirements for testing in NCLB. This is disingenuous at best.
In order to remain compliant with NCLB, states have to administer exams, and such exams must be made available to all students (i.e. regardless of disability, limited language proficiency, etc.). Federal law does not place a mandate on each child to take a test; it does not require any student to take an exam. But NCLB threatens states with the loss of Title 1 funds for not meeting participation rates.
It should also be noted that the section of federal code cited by agencies (20 USC 6311 (b)(3)(c)) also requires such tests to “be used for purposes for which such assessments are valid and reliable” and “that the assessments used are of adequate technical quality for each purpose required…” SBAC and PARCC in field-test forms do not meet these criteria. Furthermore, it is not apparent that the results of these tests, or others, will be used for their purported and valid purpose if applied to educator evaluations.
Thank you,
Ray
Ray Rossomando
Research and Policy Development Specialist
Washington State was told BY ARNE that in order to get our NCLB waiver & $$, we needed to change our legislation to mandate student standardized test scores as part of teacher evaluations. Currently some districts include them and some do not; it is a locally bargained issue. Our Democratic gov met privately with Arne, then came back and ordered state legislators to craft legislation to do so. They did, but it did not pass, for a variety of reasons: even had we changed the legislation, the feds had made no guarantee we would get our waiver or our $$; we had already implemented a brand-new evaluation system that hasn’t had a chance to work yet; and many teachers went down to Olympia and explained to lawmakers how this would impact teaching, or rather teaching to the test, in Washington. Thus Arne did not get his way, and we are waiting for fallout, if there is any.
Even though you indicate above that there is no reason for USDOE to be forcing student standardized test scores into teacher evals, they are doing their level best to make sure it is done.
I’ve been looking at the research, and the researchers all warn against using VAM for anything high stakes (such as evaluations, bonuses) because it is so unstable and because there are too many variables.
If I may correct one sentence: “I think it is fair to say that while economists like VAM (they SUPPOSEDLY PURPORT TO measure productivity):”
From “The Neoliberal’s Dictionary” (formerly known as “The Devil’s Dictionary”):
Productivity (n): the process of paying fewer people less money to do more work.
The opposite of a charter CEO who gets paid a whole lot for doing very little.
You might find today’s statement on VAM from the American Statistical Association relevant: http://t.co/3NPSLekHK6
Incidentally, I brought up the forthcoming changes (TX evals being tied to test scores) in a meeting this morning with two state reps and one state senator, and they didn’t have a clue. I reminded them that TAMSA and others were most concerned about the overemphasis on testing (not just the numbers of tests) and that this policy–while required by the feds for the waiver–would re-emphasize testing and cause increased teaching to the test. The legislators fumbled for an answer, honestly appearing to not know this was going on. One finally said something to the effect that regulatory agencies (like the TEA) sometimes make rules without the consent or knowledge of the lege.
I felt a little better when a fellow supt said he understood that statute forbids the Commissioner from requiring local districts to adopt any certain appraisal tool and specifically permits locally-designed appraisal systems that meet statutory requirements–and tests tied to evals is not listed among the requirements in state law.
Hope this helps.
JK
But I bet those two highly qualified elected officials could have told you the latest recruiting news from the college and high school sports pages!!! They have no need to know the details; their leadership tells them how to vote and the lobbyists kick in money and perks.