Adam Urbanski, president of the Rochester, New York, teachers’ union, is struggling to make sense of the state’s teacher and principal evaluation system, which varies wildly from district to district. Scarsdale, perhaps the most affluent and high-scoring district in the state, had no “highly effective” teachers. But Rochester, one of the districts with high poverty and low scores, had many. The reality is that none of the formulas for reducing teaching to a number make any sense. Teaching is an art, a craft, and a bit of science. A great teacher may be great one year, not the next, or great with this class but not another. (APPR in New York is the Annual Professional Performance Review.)
The ratings in New York are referred to as HEDI: Highly Effective, Effective, Developing, Ineffective. A commenter on the blog recently said that “Developing” is considered a low grade, but she hoped that she was “developing” every day as a teacher.
This is what Adam wrote to his members:
“The Rochester Miracle?”
“Each year, we re-negotiate our APPR agreement with the District to do all we can to make it less damaging to our students and more fair to teachers.
“We are making progress in reducing the number of Rochester teachers (be)rated as Developing or Ineffective (40% in 2012-2013 but 11% in 2013-2014) and increasing the number rated as Effective or Highly Effective (60% in 2012-2013 but 89% in 2013-2014). Just one year ago, only 2% of Rochester teachers were rated as Highly Effective. This year, that number increased to 46%.
“Why such a huge fluctuation? Maybe it’s because we re-negotiated the agreement; or because teachers set more realistic SLO targets; or because the NYS Education Department adjusted the cut scores in ELA and Math; or because huge fluctuations are typical of invalid and unreliable evaluation schemes. Who knows? In any event, we continue to press for the total abolishment of APPR.
“Meanwhile, we are negotiating a successor agreement that would further diminish excessive testing of students and wrongful rating of teachers.”
“Each year, we re-negotiate our APPR agreement with the District to do all we can to make it less damaging to our students and more fair to teachers. . . Meanwhile, we are negotiating a successor agreement that would further diminish excessive testing of students and wrongful rating of teachers.”
The only agreement that would do justice to the teaching and learning process would be one that totally scraps any notion of standards and standardized testing, as those have been proven to be COMPLETELY INVALID due to the myriad errors in the process of making and using said malpractices. Start with invalidities, end with invalidities. Or in more common terms, “crap in, crap out.”
When one negotiates with those who use COMPLETE INVALIDITY as the starting point the only ethical response is to point out those invalidities and declare that one’s starting point is after those invalidities have been eliminated. Until then, no negotiations, as otherwise one is just giving in to the lies and falsehoods being perpetuated upon the students and teachers.
My own pessimism leads me to believe that the vast majority of educators are of the GAGA mold (or is that of moldy Gaganess?). Grow some cojones, public school educators, or continue to have that heel of INVALIDITY on your throats.
Diane, thank you for publicizing this. Despite the false wedge that the so-called reformers have created between unions and parents, it is leadership like Urbanski’s that makes it clear that teachers unions are working toward many of the same goals parents are. As I’ve stated before, if it takes a union thug to get students what we parents know they need, I’ll hold a sign that reads “Moms for Union Thugs”.
NM has a group of parents who have created a monstrous wedge between parents and unions, making any corporate “reformer” very pleased. In fact, this group is preparing to “expose” Urbanski and the president of my local, who happen to work closely together in our national union and in a progressive organization, the Teacher Union Reform Network (TURN), because apparently, according to the anti-union parent group, these union leaders are shills for the corporate guys. When TURN was started in 1996, it did indeed take seed money from the Broad Foundation. But anyone who witnessed Urbanski taking down the infamous Broadie Jean-Claude Brizard would know that Adam’s leadership is about the students, teachers, and public education.
Be careful, Karen. Urbanski is behind TURN, an organization that is dedicated to union reforms. While no one sees more reason for union reforms than I do, I certainly do not approve of his efforts to deprofessionalize teachers or his association with Broad and Gates. Like Randi, he is trying to play it both ways. This is, ultimately, the reason we teachers are getting the shaft.
Greed and ambition usurp the unions’ mission because leaders are self-centered, deceitful, and willing to do whatever it takes to realize their goals.
And where can one find the COMPLETE INVALIDITY of EDUCATIONAL STANDARDS and the accompanying STANDARDIZED TESTS and their BASTARD OFFSPRING, VAM, SGPs, etc.?
Why of course in Noel Wilson’s never refuted nor rebutted classic takedown of those educational malpractices “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine. (updated 6/24/13 per Wilson email)
1. A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing the descriptive information about said interactions is inadequate, insufficient and inferior to the point of invalidity and unacceptability.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference,” each with distinct assumptions (epistemological bases) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think of a college professor who “knows” the students’ capabilities and grades them accordingly); the General Frame (think of standardized testing that claims to have a “scientific” basis); the Specific Frame (think of learning by objectives, as in computer-based learning, where a correct answer is required before moving on to the next screen); and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program, where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error enters the process when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words, all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid, that is, errorless or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations, the test, and the results are. He is an advocate for the test taker, not the test maker. In doing so he identifies thirteen sources of “error,” any one of which renders the test making/giving/disseminating of results invalid. And a basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory.” In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”), or to put it in more mundane terms, crap in, crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words, it attempts to measure “‘something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students, as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self evident consequences. And so the circle is complete.”
In other words, students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, “I’m an ‘A’ student” is almost as harmful as “I’m an ‘F’ student” in hindering students from becoming independent, critical, and free thinkers. And having independent, critical, and free thinkers is a threat to the current socio-economic structure of society.
By Duane E. Swacker
Once one surrenders to the chimerical notion that the forced ranking imposed by the [inherently imprecise] scores generated by standardized tests is useful and accurate and trustworthy, a lot of other things fall into place.
And not in a good way.
As I sometimes do, I offer a healthy serving of Noel Wilson topped off with a little Banesh Hoffman.
[start quote]
The most important thing to understand about reliance on statistics in a field such as testing is that such reliance warps perspective. The person who holds that subjective judgment and opinion are suspect and decides that only statistics can provide the objectivity and relative certainty that he seeks, begins by unconsciously ignoring, and ends by consciously deriding, whatever can not be given a numerical measure or label. His sense of values becomes distorted. He comes to believe that whatever is non-numerical is inconsequential. He can not serve two masters. If he worships statistics he will simplify, fractionalize, distort, and cheapen in order to force things into a numerical mold.
The multiple-choice tester who meets criticisms by merely citing test statistics shows either his contempt for the intelligence of his readers or else his personal lack of concern for the non-numerical aspects of testing, importantly among them the deleterious effects his test procedures have on education.
[end quote]
[Banesh Hoffman, THE TYRANNY OF TESTING, from the 2003 edition of the 1964 publication of the 1962 original, pp. 143-144]
And we’ve been cautioned about this for many, many years:
“Statistics are no substitute for judgment.” [Henry Clay]
And as for Peter Drucker’s sad excuse for management theory, MBO [see comment elsewhere in this thread]: not just W. Edwards Deming pointed out its flaws many years ago. Note also the following comment by British economist Charles Goodhart:
“When a measure becomes a target, it ceases to be a good measure.”
And don’t get me started on Campbell’s Law (1975)…
Just my dos centavitos worth…
😎
Great quotes KTA!!
Thanks!!
Peer-reviewed research on the reliability and validity of the SLO process is zero. None. Nada. I know this from four recently published reports from USDE… yep, USDE, Institute of Education Sciences. The SLO process is a version of Peter Drucker’s 1954 MBO, management-by-objectives, on steroids. The process is being used in at least 26 states. Some states, most recently Maryland, are requiring all teachers to write up their “rigorous” SLOs. In federal definitions for RttT, “rigor” means statistically rigorous… a little-known fact.
This pseudo-scientific process asks teachers to write up what amounts to a one-group pre-post test experiment. Teachers are evaluated on: (a) skill in writing the SLO and (b) skill in calculating the odds their students will meet the learning target(s) specified in the SLO at the start of the “interval of instruction” (e.g., 90% of the students who attend the class at least 85% of the time will score at or above 95 on the 100-item district-approved 6th grade end-of-course music test).
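For what it’s worth, the pre-post bookkeeping just described fits in a few lines of code. This is purely an illustrative sketch: the function name, the attendance filter, and the thresholds are invented to mirror the hypothetical 6th grade music example above, not any state’s actual rubric.

```python
# Illustrative sketch of the one-group SLO target check described above.
# All names and thresholds are hypothetical, mirroring the example target:
# "90% of students who attend at least 85% of the time will score >= 95".

def slo_target_met(scores, attendance, min_attendance=0.85,
                   passing_score=95, required_fraction=0.90):
    """True if enough attendance-eligible students hit the target score."""
    # Only students meeting the attendance threshold count toward the SLO.
    eligible = [s for s, a in zip(scores, attendance) if a >= min_attendance]
    if not eligible:
        return False  # no eligible students: treat the target as unmet
    passing = sum(1 for s in eligible if s >= passing_score)
    return passing / len(eligible) >= required_fraction
```

With ten eligible students, nine scoring 95 or better meets the 90% target; eight does not, which is exactly the kind of knife-edge cutoff a single absent or struggling student can flip.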
NY state’s SLO template calls for the following information, all based on a template developed in 1999 for a Denver pilot of pay-for-performance. The SLO process is being pushed by USDE for the estimated 70% of teachers who have job assignments for which there are not (yet) state-wide tests or comparable tests that can produce the data for the great VAM SCAM. In effect, the SLO is used as a proxy for VAM in addition to micro-managing the work of teachers. The 8 components in the SLO process are:
1. Student Population: which students are being addressed? Each SLO will address all students in the teacher’s course (or across multiple course sections) who take the same final assessment.
2. Learning Content: what is being taught? CCSS/national/State standards? Will specific standards be focused on in this goal or all standards applicable to the course?
3. Interval of Instructional Time: what is the instructional period covered (if not a year, rationale for semester/quarter/etc.)?
4. Evidence: what assessment(s) or student work product(s) will be used to measure this goal?
5. Baseline: what is the starting level of learning for students in the class?
6. Target and HEDI Criteria: what is the expected outcome (target) by the end of the instructional period?
7. HEDI Criteria: how will evaluators determine what range of student performance “meets” the goal (effective) versus “well-below” (ineffective), “below” (developing), and “well-above” (highly effective)? These ranges translate into HEDI categories to determine teachers’ final rating for the growth subcomponent of evaluations. Districts and BOCES must set their expectations for the HEDI ratings and scoring.
8. Rationale: why choose this learning content, evidence and target?
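To make the HEDI translation in item 7 concrete, here is a minimal sketch of how a growth result might be banded into the four labels. The cut-points below are invented for illustration only; under APPR, each district or BOCES negotiates its own ranges.

```python
# Hypothetical mapping from "percent of students meeting the SLO target"
# to the four HEDI labels. Band cut-points are invented, not official.

def hedi_rating(percent_meeting_target):
    if percent_meeting_target >= 90:
        return "Highly Effective"   # "well-above" the goal
    if percent_meeting_target >= 75:
        return "Effective"          # "meets" the goal
    if percent_meeting_target >= 50:
        return "Developing"         # "below" the goal
    return "Ineffective"            # "well-below" the goal
```

Note that a teacher at 89% and a teacher at 90% land in different categories, which is one reason small roster changes, or re-negotiated cut-points, can swing a rating wildly from year to year.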
A recently patented software program digests the on-line information in a teacher’s SLO and spits out a HEDI rating or a similar one. One of the patent holders has an out-of school LLC and simultaneously serves as a district superintendent in Ohio.
Laura,
What does HEDI stand for?
Thanks for your valuable writing and sharing your insights with all!!
Hellish Educational Directive Insanity????
Highly effective
Effective
Developing
Ineffective
Always makes me think, by way of complete contrast, of Heidi’s education on the mountain with Grandfather.
Thanks, Christine!
There is no cure for AI* and ol’ Billy won’t fund research for a cure.
*”Acronym Impaired” disorder
LAUGHABLE: Districts are all scrambling to adjust scoring policies so their test scores don’t hurt the schools (and property values). Struggling inner city schools have seen good teachers flee for years knowing the APPR would punish them for teaching high needs students. But now, as 70% of kids statewide fail the CC aligned exams, we see administrators adjusting subjective measures to compensate for the arbitrary measures based on the latest version of the state tests.
The whole idea of rating student growth through testing is doomed to inaccuracy and misinterpretation because college readiness is years away from these grades – kids develop at different paces.
We see above how Rochester altered last year’s policy. Scarsdale’s new superintendent has rejected APPR out of hand and, in protest, did not share individual teachers’ scores with them.
The federal mandate for most kids to be compared to top performers and labeled not proficient was a horrible experiment with predictable results that enriched contractors and plunged schools into chaos.
The sooner these problems are laid at Obama’s doorstep the better. He tricked me into voting for him by flaunting Linda Darling-Hammond on the campaign trail but, after the election, chose charter school champion Arne Duncan. Obama has never had my trust since.
My district has rolled out a proposed accountability system. Orchestra teachers will be awarded 4 points for a division 1 rating in both concert and sight reading, 3 points for a division 1 or 2 rating, and so on. Coaches will be rated on the number of students in the “healthy fitness zone”. The nurse will be rated by the number of students who return to class after visiting the clinic. The school will be rated by counting parental involvement. Gifted and talented teachers will be rated by how many of their students conduct independent research.
I can hardly wait for the teacher evaluations.
We are well on our way to building a house of cards.
“The nurse will be rated by the number of students who return to class after visiting the clinic.”
Sometimes reality is stranger than fiction. Or as comedian L Black states “I took acid when I was younger to prepare myself for these times.”
Could we come up with an equivalent rating system for Texas politicians, particularly those who wish to be president?
The NYS School Boards Association has a resolution of support for the NY APPR System set for a vote at the late October convention. Encourage your school board delegate to vote NO on Resolution #10. The West Seneca and Springville-GI School Boards have passed rebuttals to the NYSSBA Board of Directors resolution supporting APPR. http://www.ecasb.org/images/articleimages/appreb2014-wsen.pdf
“Scarsdale, perhaps the most affluent and high-scoring district in the state, had no “highly effective” teachers. But Rochester, one of the districts with high poverty and low scores, had many.”
I’m suggesting a theory that might explain this: teachers who are highly skilled in the art/craft of teaching and who start out in schools with low scores stay in those high-poverty schools, while teachers with low teaching skills transfer out and up, to escape and to work with kids who are easier to teach.
It makes sense that there would be a high attrition rate in schools with high poverty rates because of the challenges faced, but in time a core of veteran teachers would develop who would stay and create an atmosphere of stability in the classroom.
For instance, I taught in schools with 70 percent or higher poverty rates among children, and the high school where I spent the last 16 of my 30 years had a fairly stable teacher population. It had been that way for decades, with some teachers staying more than thirty years. But among new teachers fresh out of college, the attrition rate was high. The newbies who survived usually formed a support network with the veteran teachers.
I think that the promoters of growth measures (a euphemism for gains in test scores, pre to post test or year to year) would say that Scarsdale is not setting the bar high enough for the kids. The rate of improvement is not fast enough. It is a no-win for teachers or schools where a preponderance of students are high achievers. In Ohio, the “solution” is for teachers to pile on some extra assignments for the high fliers in the hope that the extra work will make it possible to say more growth was produced.
“VAMs are Craps”
To judge a teacher, roll the dice
The VAMs are craps, and job the price
To keep a teacher, roll a seven
When coming out, or roll eleven
A 2 or 12 will crap them out
And also 3, the lousy lout
When point’s established, a 7 roll
Sends the teacher down the hole
But point repeated ‘fore a 7
Keeps them in the Seventh Heaven
The VAMs are logical as can be
And oh so fair, as you can see
“VAM: The Scarlet Letter.” A talk given by Andy Goldstein and Ellen Baker to the School Board of Palm Beach County, FL.
VAM is Great, VAM is Good. We thank it for our firing. Amen.

We had almost the same thing in Delaware. One of our charter schools, which has been ranked as 10th best in the country, was given a failing grade by our DOE because it failed to show the state standard of growth. The state target was 5% growth, and with previous scores of 98% (never mind that 103% was theoretically impossible), it failed. Since failing to meet its goals was the public reason our state is sending its first 6 failing schools over to charter, it had to hold tight to that standard for all schools in its reach. Therefore one of the top ten schools in the country received a notice of failure from our Secretary of Education… (who is also now a Chief for Change).
The same may have been true with the evaluations in Scarsdale versus Rochester. In affluent areas where scores are already high, it is close to impossible to raise them. In a poverty area where scores are very low, just a little effort can raise them a lot. Applied to driving, the concept would go like this: if you were already driving 100 miles an hour, your 5% goal would be 105 mph. If you were driving 5 miles an hour, your goal would be 5.25 mph. A lot can go wrong at 105 mph. An extra 0.25 mph is a piece of cake.
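The arithmetic of that analogy, including the ceiling problem in the Delaware case, fits in a few lines. The flat 5% rate is taken from the comment above; the function and its numbers are otherwise illustrative.

```python
# Sketch of a flat-percentage growth target applied to different baselines.
# The 5% rate comes from the comment above; everything else is illustrative.

def growth_target(baseline, growth_rate=0.05):
    """Score (or speed) a school must reach under a flat growth rate."""
    return baseline * (1 + growth_rate)

low_baseline  = growth_target(5)    # driving 5 mph: target 5.25 mph
high_baseline = growth_target(100)  # driving 100 mph: target 105 mph
impossible    = growth_target(98)   # scoring 98%: target exceeds 100%
assert impossible > 100  # a top-scoring school cannot help but "fail"
```

The same fixed rate that is trivial at a low baseline becomes literally unattainable once the baseline approaches the ceiling of the scale.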
What is the name of that charter school?
The growth measure really is a variant of time/speed/distance traveled. Time is the interval of instruction, distance is the course content to be covered, and speed is the rate of learning. If you start behind others on the pretest, you have to cover more ground, and faster, to reach the learning target on time.
You can actually graph the problem in a manner that shows the steep learning curve required of students who score at the lowest level on the pretest versus those who score high on the pretest. Mark those points and draw the line to the “target” for learning on the posttest. There you have the essence of Race to the Top.
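That distance/rate/time framing can be written down directly: with a fixed interval of instruction, the required learning rate is the target minus the pretest score, divided by the time available, so the lower the pretest score, the steeper the line to the target. A sketch, with invented numbers:

```python
# Sketch of the time/speed/distance view of growth targets described above.
# Pretest scores, target, and interval length are invented for illustration.

def required_rate(pretest, target, weeks):
    """Points per week a student must gain to reach the target on time."""
    return (target - pretest) / weeks

low_start  = required_rate(pretest=20, target=80, weeks=30)  # 2.0 pts/week
high_start = required_rate(pretest=70, target=80, weeks=30)  # ~0.33 pts/week
assert low_start > high_start  # lowest scorers need the steepest slope
```

Same target, same interval, very different slopes: the students furthest behind are the ones required to learn the fastest.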
A friend of mine is in her fifth year as a teacher in NYC. She is still on probation and has not received tenure. She outscored me on the evaluation. I, who got by teaching with a collaborative team teacher, using her lessons and curriculum for the 2013-2014 year, outscored a teacher who put in ten times the amount of work I did for the school year. So, to repeat: a teacher the city doesn’t want to give tenure to outscored me, who did absolutely no lesson planning, and I in turn outscored a teacher who teaches AP History and is widely considered the most popular teacher at my school. You can’t make this stuff up.
It is incredible how much time is WASTED on the infamous SLO. The worst part of this is that teachers must focus on whatever they put on their SLO at the start of the year and keep that focus all year long. If a career depends on showing “growth” on whatever was put down for the SLO, you can bet that there is a focus on this throughout the year. And you can bet that whichever students are a part of that SLO have an edge in the “teacher pays attention to me” department. Careers depend on this selected group and their GROWTH. My question would be this: can parents fight this by putting forth a “right to know” clause? Each parent whose child is or is not part of the SLO should know this, shouldn’t they? I do hope that the SLO goes by the wayside quickly. An awful lot of teachers are being forced to become statisticians of junk science. A good number of principals are forced to waste valuable time on adhering to this junk science.
“A good number of principals are forced to waste valuable time on adhering to this junk science.”
Yep, a good number of those principals are GAGAers and lack the cojones (well, maybe it’s just that they like the higher salaries) to stand up, to show some backbone, and say no to these educational malpractices. Weak-willed, namby-pamby toadies, all of them.
Principals cannot be inspirational leaders; they are now auditors of data. The templates and criteria for SLOs have a bunch of unstated assumptions about teaching assignments. A local art teacher only teaches fourth graders, but has a class roster of 400 students. The interval of instruction is a farce because she has no control over the pull-out programs, interventions, and the like. The computer template will probably flag her for non-compliance. The assessments for the SLOs must be comparable across classrooms. That means one-size-fits-all teaching district-wide for grade 4 students; no teacher with special strengths will be permitted to put those into play. In a recent review of the research literature on SLOs, not one study offered any evidence of reliability, validity, or efficacy other than getting teachers to comply with the process, with several years of training needed to secure that.
When I’m stressed my guitar is where I go for relief, but sometimes more direct action is needed. This whole APPR, Student Learning Objectives, targets, pre-tests, etc. is overwhelming. Rather than just complain or dissolve the lining of my stomach, I used my frustration with it as a muse…and this is what resulted.
For those of you outside NYS, or perhaps retired and thanking the stars you don’t have to deal with it, it may not have much meaning, but for us in the trenches in New York, well, you’ll hear some familiar jargon.
“Ya gotta pay your dues to sing the blues” (as they say) and we collectively are doing just that.
My take on it: “S.L.O. Man”, sung to “Soul Man”
[audio src="http://www.nscsd.org/uploads/GFLICK/slo%20man%20draft%202.mp3" /]