Peter Greene here evaluates a report by two analysts at Bellwether Education, a DC think tank, about how teachers should be evaluated. His post is a model of how to tear apart and utterly demolish the musings of people far removed from the classroom about how things ought to work.
He begins by situating its sponsor:
“I am fascinated by the concept of think tank papers, because they are so fancy in presentation, but so fact-less in content. I mean, heck– all I need to do is give myself a slick name and put any one of these blog posts into a fancy pdf format with some professional looking graphic swoops, and I would be releasing a paper every day.
“Bellwether Education, a thinky tank with connections to the standards-loving side of the conservative reformster world, has just released a paper on the state of teacher evaluation in the US. “Teacher Evaluation in an Era of Rapid Change: From ‘Unsatisfactory’ to ‘Needs Improvement.'” (Ha! I see what you did there.) Will you be surprised to discover that the research was funded by the Bill and Melinda Gates Foundation?”
He reviews what they describe as current trends and pulls each one apart.
Here is an example of a current trend and Greene’s response:
“3) Districts still don’t factor student growth into teacher evals
“Here we find the technocrat blind faith in data rearing its eyeless head again.”
The authors say: “While raw student achievement metrics are biased—in favor of students from privileged backgrounds with more educational resources—student growth measures adjust for these incoming characteristics by focusing only on knowledge acquired over the course of a school year.”
“This is a nice, and inaccurate, way to describe VAM, a statistical tool that has now been discredited more times than Donald Trump’s political acumen. But some folks still insist that if we take very narrow standardized test results and run them through some incoherent number-crunching, the numbers we end up with represent useful, objective data. They don’t. We start with standardized tests, which are not objective, and run them through various inaccurate variable-adjusting programs (which are not objective), and come up with a number that is crap. The authors note that there are three types of pushback against using said crap.
“Refuse. California has been requiring some version of this for decades, and many districts, including some of the biggest, simply refuse to do it.
“Delay. A time-honored technique in education, known as Wait This New Foolishness Out Until It Is Replaced By The Next Silly Thing. It persists because it works so often.
“Obscure. Many districts are using loopholes and slack to find ways to substitute administrative judgment for the Rule of Data. They present Delaware as an example of how futzing around has polluted the process and buttress that with a chart that shows statewide math score growth dropping while teacher eval scores remain the same.
“Uniformly high ratings on classroom observations, regardless of how much students learn, suggest a continued disconnect between how much students grow and the effectiveness of their teachers.
“Maybe. Or maybe it shows that the data about student growth is not valid.
“They also present Florida as an example of similar futzing. This time they note that neighboring districts have different distributions of ratings. This somehow leads them to conclude that administrators aren’t properly incorporating student data into evaluations.
“In neither state’s case do they address the correct way to use math scores to evaluate history and music teachers.”
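For anyone curious what the “variable-adjusting” Greene mocks looks like mechanically, here is a toy sketch in Python of the simplest possible value-added scheme: regress this year’s scores on last year’s, then call each teacher’s average residual her “effect.” Everything below is invented, and no state’s actual model is this simple; the point is only to show how thin the arithmetic underneath the jargon can be.

```python
# Toy illustration of the "variable-adjusting" machinery Greene describes.
# This is NOT any state's actual VAM; real models add many covariates,
# shrinkage, and proprietary tweaks. All numbers below are invented.
import numpy as np

rng = np.random.default_rng(0)

# Invented data: prior- and current-year scores for 200 students,
# each assigned to one of 10 teachers.
n_students, n_teachers = 200, 10
teacher = rng.integers(0, n_teachers, n_students)
prior = rng.normal(500, 50, n_students)
current = prior + rng.normal(20, 30, n_students)  # purely random "growth"

# Step 1: regress current scores on prior scores (the "adjustment").
slope, intercept = np.polyfit(prior, current, 1)
predicted = intercept + slope * prior

# Step 2: call each teacher's mean residual her "value-added."
residual = current - predicted
value_added = [round(residual[teacher == t].mean(), 1) for t in range(n_teachers)]

# Even with growth that is random by construction, every teacher gets a
# nonzero "effect": noise dressed up as an objective measurement.
print(value_added)
```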
After the report has been carefully pulled apart, here are the conclusions, theirs and his:
Greene reviews their recommendations:
“It’s not a fancy-pants thinky tank paper until you tell people what you think they should do. So Aldeman and Chuong have some ideas for policymakers.
“Track data on various parts of new systems. Because the only thing better than bad data is really large collections of bad data. And nothing says Big Brother like a large centralized data bank.
“Investigate with local districts the source of evaluation disparities. Find out if there are real functional differences, or whether the data just reflect philosophical differences. Then wipe those differences out. “Introducing smart timelines for action, multiple evaluation measures including student growth, requirements for data quality, and a policy to use confidence intervals in the case of student growth measures could all protect districts and educators that set ambitious goals.”
“Don’t quit before the medicine has a chance to work. Aldeman and Chuong are, for instance, cheesed that the USED postponed the use of evaluation data on teachers until 2018, because those evaluations were going to totally work, eventually, somehow.
“Don’t be afraid to do lots of reformy things at once. It’ll be swell.
“Their conclusion
“Stay the course. Hang tough. Use data to make teacher decisions. Reform fatigue is setting in, but don’t be wimps.
“My conclusion
“I have never doubted for a moment that the teacher evaluation system can be improved. But this nifty paper sidesteps two huge issues.
“First, no evaluation system will ever be administrator-proof. Attempting to provide more oversight will actually reduce effectiveness, because more oversight = more paperwork, and more paperwork means that the task shifts from ‘do the job well’ to ‘fill out the paperwork the right way,’ which is easy to fake.
“Second, the evaluation system only works if the evaluation system actually measures what it purports to measure. The current “new” systems in place across the country do not do that. Linkage to student data is spectacularly weak. We start with tests that claim to measure the full breadth and quality of students’ education; they do not. Then we attempt to create a link between those test results and teacher effectiveness, and that simply hasn’t happened yet. VAM attempted to hide that problem behind a heavy fog bank, but the smoke is clearing and it is clear that VAM is hugely invalid.
“So, having an argument about how to best make use of teacher evaluation data based on student achievement is like trying to decide which Chicago restaurant to eat supper at when you are still stranded in Tallahassee in a car with no wheels. This is not the cart before the horse. This is the cart before the horse has even been born.”
“I mean, heck– all I need to do is give myself a slick name and put any one of these blog posts into a fancy pdf format with some professional looking graphic swoops”
I think this also applies to economists.
“Econo Mist*”
The fog-bank from economists
Is obfuscating conman trysts
Obliterating common sense
And sabotaging recompense
*The German meaning also works
How the hell do you just whip these off?? I have nothing to say other than, you’re just freaky brilliant. Keep it coming.
in the case of economists, they whip themselves (self flagellation)
Don’t need any help from me. 🙂
🙂
The verdict is in.
As put so eloquently by a very old and very dead and very Greek guy:
“As a vessel is known by the sound, whether it be cracked or not; so men are proved, by their speeches, whether they be wise or foolish.” [Demosthenes]
And as amply demonstrated, SomeDAM Poet is not just wise.
“Against the assault of laughter nothing can stand.” [Mark Twain]
Wisdom. Humor. Reminds me of a few lines from that Tennessee Ernie Ford song, “Sixteen Tons”:
“One fist of iron, the other of steel
If the right one don’t a-get you
Then the left one will”
😎
You are like the Eminem of education
Where is TE? He is oddly silent.
TE phone home
I am here wondering why a paper written by a man with a master’s in public policy and a woman with a degree in community health would lead to a discussion ridiculing the work of people like James Heckman of the University of Chicago.
Long story short, we now have the Donald Trumpification of American public education. A disaster in the making.
What about THOUGHT LEADERS? OY! Same-o, same-o. Insanity.
This “think tank” is about “edujobs” which are basically jobs in the education “reform” sector that do not require direct service to children (i.e. teachers need not apply). It is about siphoning school tax money into private pockets. Fortunately the media and the public are catching on.
Bill Gates could have done so much for education just by familiarizing himself with 50 years of educational research. There is a lot that we know (Hint: What’s good for the privileged child is usually good for the underprivileged). I like the man and still hope that he changes course.
You like Bill Gates? I would suggest that your affections are misplaced.
It has always bothered me that there are people who influence the classroom who have spent their careers trying to figure out how to avoid the classroom.
…and reality, too
“Divorced from Reality”
I never married reality
So can not be divorced
Reality is not for me
And sure can’t be enforced
You are the total BOMB, DAM poet! Thank you!
You are Dylan-like – economy of words – either Dylan…
I like this because it is all encompassing: it can be applied to Michelangelo and Bill Gates.
There’s nothing like the “scientific method”: you’re supposed to look at data to see if it supports or refutes your hypothesis. The problem with faux think tanks is that they start with a conclusion and then make everything else fit their biased model. Drug companies do this all the time! They want to pin the academic stumbling of some American children on teachers. If they can discredit and demoralize teachers, and attack their due process and pensions, they can swoop in like the vultures they are and pick the bones of American education through privatization. The American Statistical Association has already come up with the unbiased answer: teachers account for only about 1-14% of the variability in student test scores. There are too many variables, and the teacher’s share is not large enough to hang a paycheck on.
The test VAM monster lurks in the fog
Public funds to eat it cultivates
It must be chopped down like a log
Then once more we’ll educate
From the article:
“Here we find the technocrat blind faith in data rearing its eyeless head again.”
But Peter, I’m sure that there are at least a few light-sensing cells on that head!
How is it possible to make rational arguments based on false / flawed hypotheses / assumptions?
An economic concept of growth as a “measurable gain” has migrated into federal policies for education. The policy impulse is to simplify the multifaceted character of education and treat the enterprise of teaching and learning as a business in need of proper management to get results. The desired results are defined by forms of learning that can be measured and with a calculation of the rate of learning within a year and year-to-year, comparable to knowing whether profits are increasing—on a trajectory of growth or not.
This economic concept of growth as a “rate of increase” now overrides the educational meanings of human growth and learning—as a multifaceted, dynamic, and interactive process with daily surprises and influences from many sources.
Federal policies treat the economic meaning of growth as a virtue and as an imperative for accountability. This “accountability imperative” is evident in key definitions within RttT regulations and other grant programs. Federal Register (2009, November 18). Rules and Regulations, Department of Education: Final Definitions, 74(221), 59751-52.
“Student achievement means (a) For tested grades and subjects: (1) A student’s score on the State’s assessments under the ESEA; and, as appropriate, (2) other measures of student learning, such as those described in paragraph (b) of this definition, provided they are rigorous and comparable across classrooms. (b) For non-tested grades and subjects: Alternative measures of student learning and performance such as student scores on pre-tests and end-of-course tests; student performance on English language proficiency assessments; and other measures of student achievement that are rigorous and comparable across classrooms.”
“Student growth means the change in student achievement for an individual student between two or more points in time.”
“Rigorous” means “statistically rigorous.” Federal Register (2009, July 29). Notices, 74(144), 37803-37. Retrieved from the Federal Register Online via GPO Access [wais.access.gpo.gov] [DOCID:fr29jy09-148]
The federal definition of an “effective” teacher requires attention to the rates at which students’ scores increase.
“Effective teacher means a teacher whose students achieve acceptable rates (e.g., at least one grade level in an academic year) of student growth (as defined in this notice).
“Highly effective teacher means a teacher whose students achieve high rates (e.g., one and one-half grade levels in an academic year) of student growth (as defined in this notice).”
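Taken at face value, those definitions reduce to a simple threshold comparison. Here is a minimal sketch, assuming (what the notice never actually specifies) that a teacher is rated on her students’ average gain expressed in “grade levels”; every number is invented for illustration.

```python
# A sketch of what the RttT definitions reduce to in practice. Note that
# the "grade level" unit is itself never defined anywhere in the notice.

def rate_teacher(grade_level_gains):
    """Classify a teacher from her students' gains, per the thresholds."""
    avg_gain = sum(grade_level_gains) / len(grade_level_gains)
    if avg_gain >= 1.5:
        return "highly effective"
    if avg_gain >= 1.0:
        return "effective"
    return "needs improvement"

# One invented class: five students' gains expressed in "grade levels."
print(rate_teacher([0.8, 1.2, 1.5, 0.9, 1.1]))  # -> effective
```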
It should be obvious that calculations to determine “rates” of growth depend on a data system that matches the test scores of individual students and the “teacher of record” for a given student and test. Gates and USDE have poured millions into getting data systems linked and free of crud that will compromise the metrics for accounting.
The data in these records serve as “baselines” for estimates of the “value-added” by a teacher to the scores of their students and various sub-groups. VAMs produce these estimates. SLOs (student learning objectives) are a proxy for VAM until statewide tests for non-tested subjects are developed.
Federal definitions mandate “comparable” ratings of teachers regardless of the grade or subject. Learning a foreign language, or math, or learning in dance must be made to look comparable. The bean counters, and bookkeepers, and accountants, and statisticians can’t deal with qualitative differences.
Federal policy makers have sought to “normalize” the idea that economic growth is the same as “student growth” and just an extension of the longstanding metaphor of teaching as nurture, cultivation, gardening (kindergarten)—a child’s garden.
Today, almost every teacher who uses the phrase “student growth” in connection with evaluation has been infected with the federal definition.
Some value-added experts love this easy conflating of the meanings of growth because it makes the convoluted metrics for VAM and SLOs easier to sell… And the silly oak tree analogy is one means of doing so. See http://www.varc.wceruw.org/tutorials/Oak/index.htm
“Highly effective teacher means a teacher whose students achieve high rates (e.g., one and one-half grade levels in an academic year) of student growth (as defined in this notice).”
How completely absurd is that definition, considering there is no solid, agreed-upon definition of a “grade level,” nor of what exactly “student growth” is, except for the circular definition implied in that statement!?!?
Otros pensamientos brillantes. (“Other brilliant thoughts.”)
OPB!
Part and parcel of this problem is that economic growth measures have all been skewed toward the short term– the quarterly and annual fiscal reports– through changes in accounting practices underway since the late ’70s. Mergers & acquisitions, leveraged buyouts, quick turnaround, emergency takeover– get in, get your $ & get out– the sort of predatory financing that set in when long-term financial stability began to leave our shores– the picking of the carcass if you will– characterizes the prevailing ‘economic concept of growth as a measurable gain’. Long-term planning characterizes proper management of education, transportation, infrastructure.
I cannot help but think of the state of our world, with wars in Ukraine/Russia and Israel/Gaza-Palestine, and the lunatic savages in Syria/Iraq – and these idiot think tanks backed by millionaires have nothing better to do than create bogus one-sided reports, manipulate statistics, and spend billions on elections, all to undermine US education, break unions, eliminate teachers and instill a curriculum to create dummies. It’s shameful. There are more important things going on in the world than to waltz in edu-boob circles.
Can I quote you Donna? Brilliant!
The idea that you can isolate or accurately measure student growth is like a bad rumor in high school that you can’t seem to shake.
If a student scores lower on a post-test, that means they somehow lost knowledge that you, as a teacher, never taught them.
And what of students several grade levels below the threshold where the test theoretically begins measuring growth? They do not even register, even if they improve several grade levels.
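That floor effect is easy to show in miniature; the scale and scores below are invented for illustration.

```python
# Floor effect in miniature: a test cannot register growth that happens
# below its reporting floor. Scale and scores invented for illustration.
FLOOR = 400   # lowest score the test can report

def reported(true_score):
    return max(true_score, FLOOR)

# A student who truly improves from 310 to 380 shows zero measured growth:
print(reported(380) - reported(310))   # -> 0
```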
More of the absurd definitions and econometric nonsense of the day.
Students are performing on grade level if their scores on a standardized test are at or above the median on a percentile scale (1-99). On a large-scale test, a score at or near the 50th percentile (the median) will usually classify a student as proficient in the skills and subject matter on the test.
Expected growth means that gain-scores of students (on tests in a single subject, such as math or art) are staying in about the same location in a distribution from year to year—below average, average, or above average. For a large number of students, the distribution is likely to resemble a bell or normal curve.
Predicted growth is an inference about a student’s future gain-score, derived from a linear regression analysis of two or more years of that student’s gain-scores. This analysis assumes that past performance will predict future performance. Perhaps, but in education, this is a dismal assumption. It can become a self-fulfilling prophecy. The assumption is so risky that almost every corporate report begins with this caveat: Past performance does not predict future performance.
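A minimal sketch of that inference, with invented gain-scores; the “predicted growth” is just a fitted line extended one more year.

```python
import numpy as np

# Sketch of "predicted growth" as described above: fit a line to one
# student's past gain-scores and extrapolate. All numbers are invented.
years = np.array([1, 2, 3])
gains = np.array([14, 11, 9])             # gain-scores for years 1-3

slope, intercept = np.polyfit(years, gains, 1)
predicted_year4 = intercept + slope * 4   # the "trajectory," extended
print(round(predicted_year4, 1))          # -> 6.3
```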
A student is said to have achieved a year’s worth of growth if his or her gain-score on a test of proficiency is equal to, or greater than, the gain-score made by a 50th percentile student. The same measure is applied to teachers. Teachers in some districts are rated highly effective only if all or most of their students have gain-scores of more than a year’s worth of growth.
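Because the yardstick is itself the median gain, the standard is relative by construction: roughly half of all students must fall short of “a year’s worth of growth” no matter how much anyone learns. In miniature, with invented gain-scores:

```python
import numpy as np

# Sketch of the "year's worth of growth" yardstick, with invented
# gain-scores. The benchmark is simply the median gain.
gains = np.array([3, 7, 10, 12, 12, 14, 18, 22, 25, 31])

years_worth = np.median(gains)            # the 50th-percentile gain
made_it = gains >= years_worth
print(years_worth, int(made_it.sum()), "of", gains.size)  # 13.0 5 of 10
```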
References to a year’s worth of growth are fundamentally misleading because the common mental picture of a calendar year is different from a school year (typically 180 days); an instructional year (typically 172 days); and a typical accountability year (130 days from pre-test to post-test).
Academic peers are students whose test scores in a given year are the same or nearly the same. This concept permits comparisons of their gain-scores from the prior year to the current year. Students who make greater gains than their academic peers have an accelerated growth trajectory. Students who fall behind their academic peers need remedial work to keep up. The average of the gain-scores for academic peers in a teacher’s classes is typically used as a measure of the teacher’s productivity and effectiveness. This use requires a studied indifference to other influences on test scores.
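A minimal sketch of the peer comparison, assuming a simple lookup of statewide average gains by prior-year score; all values are invented.

```python
import numpy as np

# Sketch of the "academic peers" comparison, all numbers invented. Each
# student's gain is compared with the statewide average gain of students
# who had the same prior-year score; the class mean of those differences
# is then treated as the teacher's "productivity."
statewide_avg_gain = {410: 11.0, 450: 14.0, 500: 10.0}  # by prior score

prior = np.array([410, 410, 450, 450, 450, 500, 500])   # this class
gain = np.array([12, 18, 9, 15, 21, 6, 14])

peer_avg = np.array([statewide_avg_gain[p] for p in prior])
print((gain - peer_avg).mean())   # ~1.57 "units" of teacher productivity
```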
A growth trajectory needs a target. Targets for learning need to be set using baseline data so the instruction offered to each student, during a known interval of time, is efficient and has a measurable impact on student learning. Meeting targets for learning is analogous to meeting a sales target or a production quota by a date certain.
Teachers and others who say they are “impacting the growth of their students” are not thinking about the meaning of words. They are parroting econometric jargon.
Experts associated with MetaMetrics hope to set growth velocity standards. They describe their theoretical mapping of “aspirational trajectories toward graduation targets” in reading skills as analogous to “modifying the height, velocity, or acceleration respectively of a projectile launched in the physical world.” They seek greater precision in setting targets and cut scores for grade-to-grade progress in meeting the CCSS. (Williamson, G. L., Fitzgerald, J., & Stenner, A. J. (2013). The Common Core State Standards’ quantitative text complexity trajectory: Figuring out how much complexity is enough. Educational Researcher, 42(2), 59-69.)
Calibration refers to the quest for precision and consistency in measurement in the context of just-in-time delivery of a result, especially in manufacturing. In education, the term means that evaluators and other monitors have followed specifications in rating performances, presentations, processes, and products. Calibration events are training sessions intended to standardize how raters use or interpret language and to verify that rules for making judgments have been followed with fidelity. Such events are also called trainings or calibrations.
Audits are conducted to verify that calibrations are not needed, that rules have been followed, that data are free of ambiguity, and that low-inference definitions of performances and metrics are used consistently. Questions about the validity of the metrics may be ignored.
Bring to scale means that an educational policy, practice, or product is believed to merit replication in multiple locations, as in manufacturing and franchise systems for a mass market.
“Questions about the validity of the metrics may be ignored.”
YEP!
All the edu-babble from the edudeformers, devout devotees of edumetrics, who love their mental masturbation with edumetric memes.
Hey, you have a poem there.
All it needs is a little rearrangement and a title, if I may
“Edu Scrabble”
Edubabble from edudeformers,
Devotees of edunormers
Apply their edumathturbation
To edumetric memes
Resulting in disasturbation
Of education themes
I sit in awe of you SDP!
My brain doesn’t work very well in rhyme.
I hope you don’t mind if I save and use that one!
Laura,
I am curious about how you determine grades for students in your classes. Does it have nothing to do with comparing how much a student should learn from the class to how much each individual student has learned from the class (measured by examining teacher-constructed exams and assignments)?
“more paperwork means that the task shifts from ‘do the job well’ to ‘fill out the paperwork the right way,’ which is easy to fake.” I wish that were true. I’m in trouble not because of anything missing during the actual observations of my teaching but because I didn’t do a good enough binder of paperwork documenting that I did a good job of teaching. I need help with how to fake it.