Bruce Lederman is representing his wife, Sheri Lederman, a fourth grade teacher in Great Neck, New York, in a legal challenge to New York State’s teacher evaluation system. Several readers asked to see the court papers, and I will post some of the affidavits from nationally recognized experts in a day or two. For now, here is Bruce Lederman’s explanation of the theory behind the legal claim on behalf of Sheri Lederman. The New York State Education Department sought to have the case dismissed without a hearing, but the state Supreme Court accepted the case. There will be oral arguments on August 12 at 10 a.m. in Albany in the court of Judge McDonough, 10 Eagle Street. If you are interested, please attend.
Bruce Lederman writes:
Diane:
Several of your readers have asked for an explanation of the legal theories behind the Lederman v. King lawsuit. I am attaching the reply memorandum of law, which explains in detail the evidence and expert opinions in the case, as well as the legal arguments at issue. I also attached reply expert and fact affidavits from Aaron Pallas (Columbia), Linda Darling-Hammond (Stanford), Audrey Amrein-Beardsley (ASU), Sean Patrick Corcoran (NYU), Jesse Rothstein (Berkeley), Carol Burris, Sharon Fougner, and myself (my affidavit includes an important email exchange with Professor John Friedman, co-author of the widely cited Chetty, Friedman & Rockoff studies).
To summarize the legal theories for your readers, we are proceeding on three theories. First, we seek to have Sheri’s Growth Score Rating of 1 out of 20 points declared null and void under New York law on the grounds that it is “arbitrary and capricious.” Under New York law, any action by a State Agency (in this case the Dept. of Education) can be challenged as “arbitrary and capricious,” which is generally defined by the Courts as irrational and unreasonable based upon the facts. Second, we are asserting that the New York Growth Model (a VAM program) actually violates New York law because it does not measure growth as defined in Education Law §3012-c(2)(i), is not transparent and available to teachers before the beginning of the school year as required by Education Law § 3012-c(2)(j)(1), and does not allow all teachers to get all points as required by Education Law § 3012-c(2)(j)(2). Third, we argue that if Sheri is not allowed to have the individual facts of her case reviewed and is rated by a computer program whose results are not reviewable by a human being based upon real-life facts, then she has been denied due process of law in violation of the Constitution. We ask, rhetorically, is this 2001: A Space Odyssey, where the computer is always right and common sense has gone out the window?
One specific thing we are challenging is that she got a growth score of 14 out of 20 in 2012/13 and a growth score of 1 out of 20 in 2013/14, even though the proficiency of her students (i.e., students whose scores meet or exceed state standards) was virtually identical, and there is no rational explanation for such wild swings in scores from year to year. Another thing her case illustrates is the problem of the ceiling effect when teaching high-performing students. For one student, she got a failing student growth percentile (SGP) of 27 out of 100 because the student got 60 out of 60 questions right on a 3rd grade test, and got 64 out of 66 questions right on his 4th grade test while in Sheri’s class. Even though the student was in the 98th percentile, the teacher was rated in the 27th percentile because a child got 2 questions wrong. Is that rational?
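For readers who want to see the mechanics of the ceiling effect, here is a minimal sketch of how an SGP assignment can punish a top student's teacher. The peer distribution below is entirely invented for illustration (the actual state peer groups and score tables are not public in this form); the point is only the structure: a student is ranked solely against peers who had the same prior-year score, so a student who was perfect last year is compared against a group packed at the ceiling.

```python
# Hypothetical sketch of SGP assignment: a student's current-year score is
# ranked only against peers with the SAME prior-year score. All numbers
# below are invented for illustration, not actual New York State data.

def student_growth_percentile(current_score, peer_scores):
    """Percent of same-prior-score peers scoring strictly below this student."""
    below = sum(1 for s in peer_scores if s < current_score)
    return round(100 * below / len(peer_scores))

# 100 hypothetical peers who, like this student, aced last year's test.
# Because the comparison group is bunched at the ceiling, missing even a
# couple of questions can land an excellent student in a low percentile.
peers = [66] * 70 + [65] * 10 + [64] * 10 + [63] * 10

print(student_growth_percentile(64, peers))  # only the 63s rank below -> 10
```

Under these assumed numbers, a near-perfect 64 out of 66 yields an SGP of 10, even though the student is among the strongest in the state overall.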
The issue of why New York’s Growth Model does not comply with the law is that the law tells the Department of Education to measure change in student achievement between two points in time. New York’s Growth Model does not do this; instead of measuring growth, it creates what we are calling a “survivor-type” competition in which the computer predicts what children should do and evaluates teachers on a bell curve according to whose students met the computer’s predictions. There are many problems with this, most notably that the computer is comparing apples and oranges. The fact that a child got a score of 300 on a 3rd grade math test and a score of 295 on a 4th grade math test does not prove that the child did not learn substantial amounts in 4th grade. This is explained very well by Professor Aaron Pallas in his reply affidavit, which I highly recommend reading. Sheri and I believe that all our experts provide important information, and I encourage people to read their affidavits.
Another significant fact is a series of statistics located by Dr. Carol Burris. Dr. Burris found that there were wild swings in teacher ratings between 2012/13 and 2013/14 which made absolutely no sense. For example, Scarsdale, which is generally highly regarded, went from having 0% ineffective teachers and 13% highly effective teachers to 19% ineffective teachers and 0% highly effective teachers in one year. Something is obviously wrong. There are additional examples in Dr. Burris’ reply affidavit which your readers may find interesting.
Finally, a very important issue presented is New York State’s defense, which claims that there are academic studies recommending the use of VAM-type programs for these types of high-stakes teacher evaluations. All of our experts do a great job of explaining that there are no studies suggesting that VAM-type programs can accurately rate teachers in individual cases. Professor Sean Patrick Corcoran from NYU explains that studies have found that VAM is unbiased, not that it is accurate, and that New York’s Education Department misunderstands that difference in its position in our case. Professor Corcoran provides a simple example: if you throw darts at a dartboard and always miss, but miss as much to the left as to the right, and as much to the top as to the bottom, you are not biased, but you are also neither precise nor accurate. Also, I had an interesting email exchange with John Friedman, co-author of the widely discussed Chetty, Friedman & Rockoff studies, in which he readily acknowledged that his studies were only saying that VAM-type scores tend to be accurate “on average,” which he explained means over the lifetime of a teacher. He suggested considering VAM scores like a type of lifetime batting average in baseball. Professor Friedman specifically said that VAM scores can be too high or too low in any year, and that they may be wrong because a particular student had a bad day when the test was taken. Following this logic (which comes from one of the leading VAM researchers), rating teachers based upon VAM-generated scores is like rating a baseball player based upon a single randomly chosen at bat.
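Professor Corcoran's dartboard point can be sketched in a few lines of simulation. The numbers here are assumptions chosen purely for illustration (a true effect of zero, a made-up noise level on a 0-20 scale), not estimates from any actual study: an estimator can be unbiased, with errors averaging out to zero over many draws, while any single year's score for a single teacher is far off.

```python
# Sketch of "unbiased but not accurate": single-year scores for a teacher
# whose true effect is exactly average. The noise level is an assumption
# for illustration only, not an empirical estimate.
import random

random.seed(1)
true_effect = 0.0   # assumed: this teacher's true effect is exactly average
noise_sd = 8.0      # assumed year-to-year noise on a 0-20 point scale

single_year_scores = [random.gauss(true_effect, noise_sd) for _ in range(10_000)]

mean_error = sum(single_year_scores) / len(single_year_scores)
big_misses = sum(1 for s in single_year_scores if abs(s) > 5) / len(single_year_scores)

print(f"average error over many years: {mean_error:.2f}")   # near zero: unbiased
print(f"share of years off by more than 5 points: {big_misses:.0%}")  # large: inaccurate
```

The darts cluster around the bullseye on average, yet most individual throws miss badly, which is exactly the gap between "unbiased on average" and "accurate in an individual case."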
We are scheduled to have an oral argument on August 12, 2015, and are optimistic that the Judge will recognize that something is terribly wrong with New York’s Growth Model and the rating of 1 out of 20 points given to Sheri. We believe we have established that New York’s Growth Model (which it paid a contractor $3.48 million to develop) is a statistical black box which no rational person could find fair or accurate.
We thank all those who have supported us.
Great post, Diane. OMG …
I took statistics undergrad & grad (MSW). I’m Cajun; I know the reform-corporatizers think statistics are magic, but there are actual procedures that must be followed, that make sense! The moving target of standardized tests (API) and the arbitrary VAM approaches are not statistically valid and not even good gris-gris (voodoo magic).
Good luck to them, and may justice prevail!
The million dollar question is what happens if you win this case?
Then her husband, the lawyer, will become inundated with requests from teachers to represent them in state court seeking to overturn unfavorable ratings they received despite doing their best in the classroom under trying conditions. Go, Mr. and Mrs. Lederman, give ’em hell!
When I have received a million dollar certified bank draft made out to Duane E. Swacker and it has cleared, I’ll give you an answer.
“For one student, she got a failing student growth percentile (SGP) of 27 out of 100 because the student got 60 out of 60 questions right on a 3rd grade test, and got 64 out of 66 questions right on his 4th grade test while in Sheri’s class. Even though the student was in the 98th percentile, the teacher was rated in the 27th percentile because a child got 2 questions wrong. Is that rational?”
That’s what I was trying to explain to FLERP.
No, it’s not rational. It’s insane.
But I think this is where there’s some confusion. I’m not convinced it really works that way with a direct comparison of each individual student’s scores from their previous grade to their current scores for their current grade. That would be a nightmare considering how often kids change schools, districts, states and even countries. And what do you do about students who have no score from the previous year? How would their “growth” be calculated? If anyone has more insight into how this really works, I’d appreciate it.
I would recommend downloading the teacher’s guide to interpreting scores PDF found at this link: https://www.engageny.org/resource/teacher’s-guide-interpreting-state-provided-growth-scores-grades-4-8-2013-14
I take your point. But one thing I doubt is that any of these crap models substantially take into account what’s likely to happen after a student has hit 100%. How likely is that going to be maintained? Or numbers like it. What if indicators place a student in position to be compared to 100% scores? And how do the models react when 100’s or near 100’s are not maintained or matched?
Thanks for the link. Perhaps I’m being dense, but I still don’t get it. From the section on how “growth” is calculated: “This method is illustrated in Figure 1, following, which shows Student A with an ELA score of 320 in 2013. Compared to other students who also had a score of 320 in 2013, Student A’s 2014 ELA test score hovers in the middle range. We can describe Student A’s growth in relative terms as a “student growth percentile” (SGP).”
Doesn’t that mean that in order to calculate the SGP for any given student, that student would have to have a score in both 2013 and 2014? What if that student opted out or lived in Mexico (or even Vermont) or whatever in 2013? How would their SGP be calculated?
Dienne, I think that’s correct. If a student doesn’t have a prior-year score, you can’t calculate that student’s SGP. Presumably that’s why students without a prior-year score are excluded from the calculation of teacher ratings (see Tim’s response below).
Thanks Tim, but I suspect this is a huge problem across growth and comparison and mixtures of growth and comparison models.
From page 5:
“Teacher MGPs are based only on students who had test scores from the current and immediate prior school year and who met the State’s minimum enrollment requirement (enrolled for at least 60 percent of the course duration) in the current school year. Also, an MGP is only reported if it is based on at least 16 SGPs.”
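The eligibility rules quoted above can be sketched directly in code. Everything here is a simplified illustration under assumptions: the field names and roster data are hypothetical, and the actual state model applies further adjustments, but the filtering logic follows the quoted guide — only students with both a current-year and a prior-year score who met the 60% enrollment threshold count, and a teacher's MGP is reported only when at least 16 SGPs remain.

```python
# Sketch of the quoted MGP eligibility rules. Field names and data are
# hypothetical; the real state system is far more elaborate.

def teacher_mgp(students):
    """Median growth percentile over eligible students, or None if fewer than 16."""
    sgps = sorted(
        s["sgp"] for s in students
        if s["has_prior_score"] and s["has_current_score"]
        and s["enrollment_fraction"] >= 0.60   # at least 60% of course duration
    )
    if len(sgps) < 16:
        return None  # fewer than 16 SGPs: no MGP is reported
    mid = len(sgps) // 2
    return sgps[mid] if len(sgps) % 2 else (sgps[mid - 1] + sgps[mid]) / 2

# A student with no prior-year score (e.g., an opt-out or a new arrival)
# is simply excluded from the teacher's MGP:
roster = [{"sgp": 50, "has_prior_score": True, "has_current_score": True,
           "enrollment_fraction": 1.0}] * 20
roster.append({"sgp": 99, "has_prior_score": False, "has_current_score": True,
               "enrollment_fraction": 1.0})

print(teacher_mgp(roster))  # the no-prior-score student does not count
```

So, to Dienne's question: under these rules a student without a 2013 score gets no SGP at all and silently drops out of the teacher's rating, rather than having growth calculated some other way.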
I think the missing data problem is “solved” by inventing a score for the student. The process of doing so is called imputation. It is part of the magical thinking that assumes there is no difference in the content and importance of a student’s test score in reading in grade 3 and reading in grade 4, plus assumptions about the mathematical properties of the scores on the year-to-year tests. Exactly how corrupted and incomplete data files are treated is usually buried in fine print or in a formula, or shoved to the side and not answered at all if the VAM is like EVAAS in Ohio, where the details are proprietary and the company SAS sidesteps responsibility for the integrity of the test scores altogether–that is not the responsibility of SAS.
VAM calculations are based on many assumptions that are not taken seriously by the promoters, or the assumptions are addressed in thick statistical language. Go back, for example, to test construction and what the scores mean. Are scores like tick marks on a ruler? The answer is no. A ruler is an interval scale. An inch is an inch. You can add up inches and subtract them. Unless I am mistaken, VAM calculations assume that test scores from different grade levels are designed from the get-go to be like interval scales or that they can be converted to interval scales with a dab of this or that kind of transformation. But those mathematical inventions hide the truths of the matter.
The following comment from Bruce Lederman’s post illustrates the stupid conclusions that can be reached by not understanding all of the largely invisible assumptions made in VAM calculations based on test scores.
“There are many problems with this, most notably that the computer is comparing apples and oranges. The fact that a child got a score of 300 on a 3rd grade math test and a score of 295 on a 4th grade math test does not prove that the child did not learn substantial amounts in 4th grade.”
I appreciate this briefing. Sheri has a well-informed lawyer and powerhouse experts who, like Bruce, are good “explainers.” I hope this works for her and offers a template for more cases, including class action lawsuits.
Only advice: Look out for SAS experts who have a big interest in protecting their intellectual property and others whose reputations have been built on marketing this statistical farce as if it is objective, trustworthy, valid, and predictive–the most reliable and an essential predicate for judging “teacher quality.”
Laura,
Over the weekend, I will post all the affidavits from the national experts. Fascinating reading for those who are interested in this issue.
Laura — NYSED does not appear to disagree with Mr. Lederman’s point about the student who scores a 300 one year and scores a 295 the next year. NYSED’s expert’s affidavit says: “Note also that New York State assessment scores are not directly comparable from grade to grade (that is, a score of 300 in one grade and 295 in the succeeding grade does not mean a student did not grow academically).”
Dienne, that’s simple. Here in Florida they simply rated me based upon the test scores of students I’d never taught since my primary students are not in a testing grade.
The state left the decision of how to rate each teacher via VAM to each county and mine changed the goal posts literally right up to the week the students took the test.
The first year they used the SAT 10, despite the fact that it is over 10 years old and is not aligned whatsoever with the current standards. No scores from previous year. Didn’t matter.
Last year we used this new online ‘teaching and assessment’ program. No scores from previous year because we just bought it. Didn’t matter.
My local union (and many around our Right to Work state) convinced our school board to sign an agreement that they would not take harmful actions against teachers, like demoting them, reducing their salaries, or firing them, until the state DOE provides some kind of proof that the tests and the methodology are valid, reliable, and aligned to what we are required to teach.
So far it’s holding up because the state legislature passed a law this past spring that held students harmless until the new tests are validated. They purposely would NOT include teachers but they haven’t gone after us either. Yet.
Nightmare? Yes. Fair? No. Make sense? No. Does any of that matter? No.
The sooner we can all come to the point that we understand that fairness, legality, logic, reason, and precedent no longer carry any weight whatsoever in the realm of education reform the better we will be in fighting the movement.
We can’t keep saying “But that doesn’t make sense!” or “But that’s not fair!” because our opponents DON’T CARE. They don’t play by the old rules anymore. Anything goes in service of their ideology. ANYTHING.
If they do get smacked down by a judge or court somewhere they simply reword and rework and reinstate their heinous policies.
It’s really that bad and it’s time for us to know that.
First of all, anyone who has taken a course or two in evaluation and measurement knows that there is no way you can compare the performance of a student on tests of different grade levels. The content a student should know increases with each grade level, so it is simply irresponsible to compare a student’s performance on a third grade test with his or her performance on a fourth grade test. For example, an actuary can do very well on his first or second exam and tank the third. What does this mean? Was he not putting forth the required effort in his studies? Did he work with an inefficient tutor or mentor? Or was it that the difficulty of the exam simply increased too much too quickly? I see this nonsense all too frequently as a new med student. There are professors who teach identical courses yet give exams of vastly different levels of difficulty. Yet all the students are compared accordingly, regardless of whether they achieved their grades via a professor whose exams are cake versus a professor who has increased the difficulty of what is academically expected tenfold.
You’re right. Content and the skills needed to master it change from one grade level to the next. I personally thought calculus was easier than algebra, for example.
I’d also like to add that the human brain develops in growth spurts that are different for each individual student. It seems impossible to me that any computer model could even come close to accurately predicting how and when growth spurts (and plateaus) for any individual human being will occur based on comparisons to peers on standardized tests. As Diane said, junk science! The development and functioning of the individual human brain will always be more complex than Big Data can handle.
Not that that’s a good legal argument; it just always bothered me. I think the legal arguments being presented in New York are solid. Thank all for your work.
Just the admission by Friedman drastically undermines, if not eviscerates, every pro-VAManiacal statement ever made on this blog.
Of course, don’t be surprised if someone writes that Friedman “wasn’t really on board with VAM” and “doesn’t understand math” and “is being paid by unions/being intimidated by teacher union thugs” into making contradictory and false statements.
One of the best things that will come out of the lawsuit, IMHO, is that the creators and designers and producers of VAM will be flushed out. Statements will be made—publicly. Testimony will come forth—publicly. Transparency and open direct discussion of VAM is like sunlight and garlic to the VAMpirical branch of the High Holy Church of Testolatry.
And that’s just where it belongs: in the movies, in the land of make believe.
😎
All USDA subsidies will now be based on comparative crop production from White House employees playing FarmVille.
This case is extremely important because it exposes so much of the lunacy of the “accountability” approach adopted by SED and reformers generally around the country.
It is quite astonishing to see in detail the utter nonsense put forward by political elites as wisdom, and as something good for kids.
This case might even allow one to say that “junk science” is too good a phrase to use to describe this stuff.
Truly amazing.
Gives new meaning to the old notion that “money talks, nobody walks.”
One wonders though, if nearly all methods of teacher evaluation have a breaking point of bias, accuracy, validity, and reliability (some better than others), at what point must we settle for an imperfect method? In the aggregate, are we letting the “great get in the way of the good” when it comes to the vetting of teacher evaluation methods? What levels of accuracy and reliability are acceptable among teacher evaluation instruments before they should be used to make staffing decisions with real teachers? Is it better for students to use a potentially flawed method of teacher evaluation or limp toothless method? The science isn’t there to answer these questions just yet. What do we do in the meantime?
Well,
Admin observations, peer observations and perhaps a look at data as an aside and/or with a huge grain of salt?
There are plenty of ways to go, but we have to put this garbage out because Cuomo and the deformers love to chew on cow dung.
Are you assuming that there has to be one universal system?
Steve said–“a potentially flawed method of teacher evaluation or limp toothless method”
Steve, This is a false comparison. A teacher of art in Kindergarten to grade 3 can be evaluated on many qualities that are not in any way connected to scoring the students on a standardized test, or even scoring the artifacts they create.
If there were such a test, there would be no reasonable way to attribute the performance of the student to the teacher, unless, for example, you reduced the program of instruction to learning some art vocabulary (a practice becoming more common) or transformed the creative-imaginative dimensions of learning in art into something like 19th-century exercises.
Example: Draw a series of straight vertical lines with a pencil, placed at equal distances from each other (parallel) and of uniform length and width and density (lightness or darkness) without a ruler. Then do the horizontal variation, the diagonal variation, and then the variation with right-angle intersections to create a grid with 25 squares each 1″ by 1″.
There are no standardized tests for these grades in this subject. The current commitment to VAM overrides all reason. Ohio teachers of art are likely to be given a “shared attribution” VAM score from school-wide reading or math tests, and probably at grade 3 only, and only a reading VAM, because the legislators have said we have a grade 3 reading guarantee.
The point is that one-size does not fit all–and the corruption of teaching to preserve the illusion that test scores are essential measures of teacher quality is not just illegal but at many levels it is an unethical practice.
It actually begins with the unwarranted assumption that a bunch of teachers are incompetent and probably need to be fired because they are not producing increments in test scores at a rate that “meets or exceeds” expectations and we can prove that by the numbers. OZ sets the expectations.
Attributing variance in student outcomes to teachers with shared students, and to teachers in content areas outside math and literacy, is indeed a problem. This is particularly true for teachers in special education. How much of a student’s achievement is attributable to which teacher? All? Some? None? There are no easy answers here, and the statistics required to control for shared variance are not likely to be a satisfactory solution to anyone outside the field of psychometrics.
Maybe we can agree, though, that student outcomes should be a point of consideration, provided that the assessments are actually a measure of the skills and knowledge of the course(s) to which the teacher was assigned. It is possible to create observable, measurable outcomes aligned to art, music, and such. Unfortunately, many state laws dictate that all teachers must be evaluated on their contributions to a student’s state achievement tests, which typically measure only reading, writing, and math.
Somehow we’ve managed for well over 100 years to maintain a national public school system that has produced scholars, inventors, politicians, writers, scientists, and citizens that maintained our supremacy in medicine, space travel, technology, innovation, business, economics, and ever-improving social equality (not there yet but major strides forward have been achieved).
There is no need for a new teacher accountability system; that is an artificial construct of the reform movement produced by the Friedman disciples who have been planning and implementing this destruction from its inception.
We can always think of ways to improve our public schools. Fully funding them would be a huge improvement so that not only schools in Great Neck, NY have all the highly-educated teachers, fully stocked labs, variety of challenging classes, meaningful electives, a plethora of books, up-to-date technology, good sports/art/music equipment, and extracurricular activities that are appealing and beneficial for the students but high schools in the South Bronx, NY and rural WV or TN also have those things. We aren’t even close to equity there.
We never fully-funded IDEA and left special education teachers and students hanging in the wind with schools crushed by unfunded mandates and lack of qualified personnel due to budget constraints.
Many, if not most, states now shift major portions of their budgets towards testing, preparation materials for testing, and technology required for testing instead of buying supplies, hiring more teachers or support people, and improving buildings, playgrounds, and extracurricular activity venues.
Principals have always had the power to remove ineffective teachers if they followed the rules. Some choose not to do their work in this area but there is no ineffective teacher crisis and there never has been.
No matter what methodology they throw at American teachers they rise to the top. Kentucky just released their new evaluation results and 93% of their faculties merited scores in the 2 top categories.
California, Florida, Ohio, New York, and more states pushed VAM down on teachers and in every state the majority of teachers were deemed effective or better. The reformers then claim that the VAM process was flawed instead of acknowledging that most teachers are competent, caring, and effective at their jobs.
The reformers won’t accept that reality so they continue working on impossible, ignorant, and inappropriate measures of ‘accountability’ and ‘effectiveness’. Nonsense that the business world is rightly abandoning due to its ineffectiveness and negative consequences.
We don’t need to reinvent anything or invent something new. We were doing it right before.
We need to stop searching for a computer formula to do what has always been the job of local administrators. The old fashioned observation and narrative format has worked for decades. While it may not be perfect, it is a better system than allowing states to play with numbers to justify their actions against teachers. The term that comes to mind is “railroading.” The old system is being replaced now because states are looking to fire teachers and bust unions. It is so much easier to let a computer wield the ax.
The need for valid and reliable teacher evaluation is clear. Linking evaluations to student outcomes in a scientifically and socially responsible way is a reasonable goal.
The argument that “old fashioned observation and narrative format has worked for decades” is simply a fallacy. That type of system is even more capricious, arbitrary, and subject to bias than any of the currently proposed evaluation models. The reason for its longevity is that it handily serves the status quo.
“The need for valid and reliable teacher evaluation is clear. Linking evaluations to student outcomes in a scientifically and socially responsible way is a reasonable goal.”
No, it is not clear at all. The only ‘need’ for such an ‘evaluation’ is if you choose to blame teachers for all the ills of society, such as poverty, child hunger, lack of medical care, underfunded schools, etc. in order to avoid having to come up with expensive solutions for these social ills at the expense of the richest of the rich.
This fallacy was created by rightwing economists and embraced by neoliberal democrats. It has no basis in reality. Diane herself has published proof that student outcomes were actually increasing at a higher rate and minority scores were rising faster before all this reform nonsense and ‘need for valid and reliable evaluation’ arose from the swamp pits of greed and racism.
I am a teacher. I am a professional, active in my profession for over 2 decades. There were no teachers seeing this so-called ‘need’. There were few, if any, parents or students, outside of a very small number, calling for the heads of teachers. This arose from whole cloth manufactured by the likes of Milton Friedman, Jeb Bush, Bill Clinton, and the Walton, Broad, Gates, and other billionaire criminals.
There was no national movement for evaluation, for CCSS, for NCLB, for RTTT, or any other asinine reforms. It was all part of a plot to destroy public education and to profit from American children.
Steve: “The argument that ‘old fashioned observation and narrative format has worked for decades’ is simply a fallacy. That type of system is even more capricious, arbitrary, and subject to bias than any of the currently proposed evaluation models.” What experience do you have with this narrative system? Have you even been evaluated by such a system? The narrative system listed several key elements of consideration, such as command of subject, teaching strategies, evaluation, and classroom management. For each of the categories there was a rubric to guide the trained professional in the evaluation process. This is a more comprehensive and valid way to evaluate teaching and student-teacher interaction. The person doing the evaluation is trained in the process and credentialed in the field of education. I am not saying that versions of this system are perfect, but it beats VAM, a bunch of invalid nonsense.
Whatever system is used, it should be one that teachers and administrators are capable of understanding and discussing without having to retain experts.
Can anyone imagine having a PL (prof. learning) on VAM itself, the formula, its derivation and application?
FLERP!: much said in few words.
Thank you for your comment.
😎
I am so glad this ludicrous “evaluation” system is finally being challenged. Thank you to this teacher and her lawyer husband.
The “vammers” were hoping teachers would not have the resources to fight back.
This post includes outstanding information about the fraud that is VAM.
Thank you.
Bookmarked.
Thank you to the Ledermans for using your own time, money, and mental and emotional energy to fight this outrageous, unfair, ridiculous, vindictive and abusive rating system.
It seems to me the Ledermans have compelling arguments. The fact that Mr. Lederman can link what has happened to his wife to violations of New York State law, I think, strengthens his case. Let’s hope the judge agrees, and forces the state to stop misusing data to corner teachers and damage their careers.
Quick, someone build an Enigma machine to break VAM, assuming that there is even a rational model to be deciphered, and it’s not just gibberish.
Interesting arguments.
The big problem is twofold, as I see it.
If this isn’t arbitrary and capricious, and courts do not demand human review of ratings (which I am not sure the law does), what level of review is acceptable? As another poster put it, what replaces this system?
There is another issue here though that I don’t know the legal resolution to.
What happens if the law requires something that is impossible? The teacher review system and scores can be manipulated, via cut scores or test difficulty, to produce unrealistic results (though “realistic” is a less than precise legal term).
What if we can measure students at 2 points in time but can’t relate it to teacher effectiveness fairly? What if the model assumes a steady progress of growth when we have proof of backsliding?
Do the arguments leave room for the state to simply produce better literature explaining the score and tweaking the formula?
It looks like the affidavits are in the link provided by Mr. Lederman.
At least 6 years ago as all of this was beginning I said in our faculty room, “just wait until the lawyers get their hands on this”. I pray that Sheri and her husband prevail. How could they not? Looking forward to reading the affidavits. VAM should go away. It is not helping.
I thought the same thing but then this happened here in Florida:
“In his ruling, Judge Walker indicated there were other problems.
“To make matters worse, the Legislature has mandated that teacher ratings be used to make important employment decisions such as pay, promotion, assignment, and retention,” he wrote. “Ratings affect a teacher’s professional reputation as well because they are made public — they have even been printed in the newspaper. Needless to say, this Court would be hard-pressed to find anyone who would find this evaluation system fair to non-FCAT teachers, let alone be willing to submit to a similar evaluation system.”
“This case, however, is not about the fairness of the evaluation system,” Walker wrote. “The standard of review is not whether the evaluation policies are good or bad, wise or unwise; but whether the evaluation policies are rational within the meaning of the law. The legal standard for invalidating legislative acts on substantive due process and equal protection grounds looks only to whether there is a conceivable rational basis to support them,” even though this basis might be “unsupported by evidence or empirical data.”
Reread that last line. I only hope that the New York Supreme Court judge assigned to this case has a different interpretation of the law and VAM.
Other cases regarding VAM have upheld the right of the state to continue to use it simply because it is state law. It seems to me that Lederman is using state laws regarding teacher evaluations to combat the “arbitrary and capricious” nature of VAM, since it violates state law stating that teacher evaluations should not be arbitrary and capricious. He is also challenging the lack of transparency of VAM, because the state laws require that there be transparency with regard to evaluations. Nobody knows the secret VAM formula. He is also questioning the statistical impossibilities of his wife’s VAM score. Even one of the architects of VAM appears to be questioning its use. We will have to wait and see if they prevail, but I think people in other states will have to comb through state laws to see if they can find any standing for a challenge on similar grounds.
Chris in Florida: I’ve actually carefully studied the Florida case, and have spoken with the lawyers for both the plaintiffs and the amicus (professors). There is (hopefully) a significant difference: the Florida case was a direct challenge to the overall constitutionality of the law on behalf of all teachers. Sheri’s case is a challenge by a single teacher to (a) the rationality of the rating solely as applied to the facts of her case, (b) whether the Education Department model follows the particular requirements of state law, and (c) the constitutionality of not giving Sheri a chance to have some due process review of her particular situation. Per the lawyers who handled the case in Florida, the issues I am raising were not raised in their case. I hope we will be more successful.
Bruce
Bruce: I wish you and Sheri well and you are both in my prayers. Good luck!
I also look forward to seeing them and thank Bruce and Diane for making them available.
The Link to the Reply Memo brings you to a site where all the affidavits are available.
It seems to me that people terrified of the messiness of the human experience adopt beliefs that deliver them from what for them is chaos. Belief in VAM makes them feel secure.
It makes lots of dedicated teachers that work in poor districts or with students that don’t fit the mold feel very insecure and vulnerable.
I’m wondering whether NYSUT is bringing similar lawsuits on behalf of other exemplary teachers with irrational ratings.
As long as we continue to believe that computers, statistics, data, and “metrics” can predict things like “student growth” and “teacher effectiveness,” we will continue to have these problems. You can use whatever formula you like and call it “scientific,” but these things are as measurable as the beauty of a sunset or how much I love my cats.
The fact that computer models can’t predict the weather with much precision a month out should be a cautionary tale for putting too much stock in VAM.
Someone on this blog keeps saying what a lot of people keep saying.
“The need for valid and reliable teacher evaluation is clear. Linking evaluations to student outcomes in a scientifically and socially responsible way is a reasonable goal.”
Here are some things to consider.
Student outcomes are not significant just because they are test scores that can be churned by an algorithm.
Tests are not educationally significant just because they produce letter grades or scores.
There is not much scientific at all in the current methods of student and teacher evaluation and there are good reasons to think that human judgments are far more socially responsible than the current tests and “junk science” marketed as if essential, accurate, and perfectly reasonable.
What is really “clear” about the need for teacher evaluations? Why must students be subjected to standardized testing, in every subject, every year? Why must every teacher be subjected to evaluation almost every year, in every subject, and in every grade and with the same system–some combination of student test scores, student and/or parent surveys, observation protocols?
Why these checklists and rubrics from Danielson and Marzano that are designed to micromanage teachers and rate them as if one-size-fits-all teaching is great and a settled matter–no need to question these schemes, ever?
What is reasonable or even scientific about this presumed authority to evaluate from afar, through surrogates like Danielson and Marzano rather than face-to-face discussions with peers in collegial groups?
Where has “good” science in education gone–science that thrives on skepticism, attention to nuance, reasonable doubt, not jumping to conclusions, modest about using the terms “accurate” and “objective,” not making statistical leaps through thin air? It has certainly vanished from teacher evaluation and from the pundits who push pseudo-science into legislation.
Current evaluations are being done on the cheap, and it shows. The widely used systems look only at outcomes on a limited array of indicators. The information must fit into pre-determined categories, easy to code as data–no room for input from a teacher whose job assignment is to teach art to 400 fourth graders per year–a real job assignment, and there are many others not easy to “input” without a serious loss of information needed to make a sound and fair judgment.
Where do we find a judicious consideration of the resources and support systems for teachers and students and their administrators? Current evaluations make it perfectly legitimate to have a total disconnect between the means and the ends in education. That is always dangerous. It invites shortcuts. It says, in effect, succeed by any means possible. That ethic invites cheating and competition stripped of any virtue. Think of Enron. Think of the crooks who tanked the economy.
“Outcomes only” thinking has dominated policy and shaped practices for too long. It has allowed people who are not even remotely connected to life in classrooms to damage the culture of schooling and blame the breakdown on teachers and students, and even on parents.
This outcomes only focus has placed a premium on the production of increments in test scores from year to year. The outcomes must meet or exceed the norm—as if there is nothing else that matters quite so much as those increments–the myth of continuous improvement, always growth as if one year in school must be exactly like every other in a handful of performance indicators.
These few indicators, all reduced to alphanumeric codes and numbers, are thin excuses for evaluation in education. They were ushered into the house of education through the back door by conflating bookkeeping with accounting, thinking that accounting procedures were sufficient for accountability, and then treating educational accountability as if it were equivalent to calculating year-to-year returns on monetary investments.
The end-game of this reasoning is “monetizing” the value of students and every teacher as worthy of investment or not. In fact, vulture capitalists, aided by economists, are now speaking openly about “payout children” meaning those worthy of investment and valuable because they will produce real dollars and cents profits. That is not much different from the thinking of pimps.
Laura: Well stated rebuttal to the current accountability obsessed politicians that seek to destroy public education so a few can profit at the expense of many. This preoccupation with testing has nothing to do with improving outcomes for students. If only educators could make decisions about education! I love the pimp analogy.
The brilliant computer algorithm at the supermarket tells me it is time to buy mouthwash and dog food. Guess what? It’s wrong. Not everything in reality can be modeled.
There is just one thought racing through my mind after fighting this for the last TEN YEARS as a teacher… The fact that the only person to challenge this in court is a teacher’s husband is the most clear evidence I have found thus far that our union, NYSUT, is in bed with NYSED. Last time I checked, NYSUT has attorneys (my union dues help pay for them), and those attorneys would be equally capable of challenging fraudulent teacher evaluations. Their silence is deafening. Every single union member should be outraged that a teacher would need her own husband to defend her as a professional from an evaluation system that violates the law (and common sense) on innumerable levels.
They are too busy collecting your dues and paying themselves six-figure salaries to worry about going to bat for a peon teacher. Yet the dummies continue to give up their hard-earned money without any objection whatsoever. Teachers must form their own unions, plain and simple! Until then, they will continue to lack a true voice at the table willing to do what is absolutely necessary to improve the dire state of education in its present form.
You know, folks, anything based on standardized test scores is COMPLETELY INVALID, because when one starts with invalidity only invalidity can result. Wilson has proven the COMPLETE INVALIDITY of the EDUCATIONAL STANDARDS and STANDARDIZED TESTING MALPRACTICES in his never refuted nor rebutted dissertation. Read and comprehend why in his “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine.
1. A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing the descriptive information about said interactions is inadequate, insufficient and inferior to the point of invalidity and unacceptability.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference,” each with distinct assumptions (epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge Frame–think of a college professor who “knows” the students’ capabilities and grades them accordingly; the General Frame–think of standardized testing that claims to have a “scientific” basis; the Specific Frame–think of learning by objective, like computer-based learning, getting a correct answer before moving on to the next screen; and the Responsive Frame–think of an apprenticeship in a trade or a medical residency program where the learner interacts with the “teacher” with constant feedback. Each category has its own sources of error, and more error in the process is caused when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words, all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid–errorless, or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the tests and results are. He is an advocate for the test taker, not the test maker. In doing so he identifies thirteen sources of “error,” any one of which renders the test making/giving/disseminating of results invalid. And a basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory.” In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”), or to put it in more mundane terms: crap in, crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words, it attempts to measure “‘something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students, as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self-evident consequences. And so the circle is complete.”
In other words, students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, “I’m an ‘A’ student” is almost as harmful as “I’m an ‘F’ student” in hindering students from becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
Any decision yet on this case? It has been more than three months, and the judge was supposed to have made that decision by now.
Still waiting. The state wanted to settle, which we turned down. The judge wants briefing on the impact on the case of the new Regents regulation, which is due Feb. 29.