Archives for category: VAM (value-added modeling)

Gary Rubinstein is well known to readers of this blog, as I have posted almost all of his blogs. He is a career high school math teacher in the New York City public schools. I met Gary about ten years ago, when I had made a complete turnaround in my views about testing and choice. I was working on an article about “miracle schools” that fudged their data and discovered that Gary was an expert on reviewing school-level data and exposing frauds. He helped me write an article (“Waiting for a Miracle School”) that appeared in the New York Times in 2011, and he has continued to be a friend ever since. Gary’s analytical skills have been invaluable in fighting off idiotic “reforms,” like evaluating teachers by their students’ test scores (known as VAM). In his multiple posts on that subject, he showed its many flaws. For example, an elementary teacher might get a high score in reading and a low score in math, posing the dilemma of whether the district could fire her in one subject while giving her a bonus in the other. I confess that I am a person of The Word, and I have never taken the time to learn how to put graphics into my posts. I can’t even reproduce charts. I only do words. So when I need to post a pdf or a graphic or anything else that is not words, I turn to Gary for help and he is always there for me. In addition to being a math and computer whiz, Gary is an author. As most of you know, Gary began his career working for Teach for America. As he explains below, he became disillusioned with the “reform” spin just as I became disillusioned with the propaganda about testing and choice. Gary writes about how strange it is to be frequently attacked on Twitter and other social media by “reformers.” My admiration for him is boundless.

Gary writes:

I got into blogging almost exactly ten years ago, just after the Teach For America 20th anniversary alumni summit.  Until that time, I was unaware of the politics of education and the emerging education reform movement.  I had seen ‘Waiting For Superman’ and knew it was propaganda, but I didn’t quite understand who was benefiting from it or what the possible negative side effects of it could be.

But at that conference, during a ‘Waiting For Superman’ reunion panel discussion, it became very clear to me what was going on.  I watched Michelle Rhee, whom I had known from years earlier when we worked together at the Teach For America training institute, and Dave Levin, whom I had known for many years from when we were teaching in Houston around the same time.  At the end of the conference, Arne Duncan made an odd speech about how great it was that he shut down a school and fired all the teachers and now it is a charter school in which every student supposedly graduated and got into college.

It sounded fishy to me.  Having worked, by that time, at three different schools that had low standardized test scores, I knew that a school can have good teachers but still have low test scores.  I suspected that there was more to the story than Arne Duncan was saying so I did my first investigation.  Little did I know that it would lead to a ten-year adventure that would give me the opportunity to be an investigative journalist and help save the world.  As an added bonus, I made a lot of friends, got a following to read my writing, appeared on NPR and also on a TV show called ‘Adam Ruins Everything.’  But there was a downside to this attention because I also became a target of various known and unknown internet personalities who have attacked, ridiculed, and slandered me.  I think that on balance the good outweighed the bad, but it is sad to me that there have been blog posts about what an awful person I am and podcasts about how I don’t believe in the potential of all children.  Students of mine have googled me and located some of these smears and asked me about them.  It’s hard to explain to them that I’m embroiled in a strange war where the Fox News of education wants to vilify me for telling the truth.

Here is a recent example where Chris ‘Citizen’ Stewart, the CEO of the Education Post website, compares my views with those of Charles Murray of ‘The Bell Curve’ fame.

I suppose my story is that I was the right person at the right time and in the right place.  The small group of resisters to the misguided bipartisan teacher-bashing agenda needed someone like me.  I was a Teach For America alum so I had that whole ‘war veteran against the war’ kind of credibility.  I was very patient and able to comb through state data.  I was a math major in college so I was able to do some basic statistics and make the scatter plots that helped the cause so much.  You may or may not know that I have slowed down a lot on my blogging.  After about 7 years of intense blogging, I started to burn out.  Fortunately other bloggers came on the scene and took up the cause and have been great.  I do try to blog from time to time still, but I have also been doing other projects, like my recent effort to explain all the essentials of elementary school, middle school, and high school math in one ten-hour YouTube playlist.  These efforts come from the same source: the desire to help students learn.  Whether it is by fighting off a destructive element or by providing a free resource that anyone in the world can access, I am very proud of what I’ve accomplished in the last ten years.

I want to thank the great Diane Ravitch for taking me under her wing and for being a great mentor and friend.  I wish for her a speedy recovery from her surgery.

Here is a presentation I did at Tufts University describing my journey from teacher to crusader:

 

Audrey Amrein-Beardsley writes here about Houston’s experience with value-added evaluation of its teachers.

The Houston Independent School District (HISD) contracted with William Sanders’ SAS to provide a model to calculate the “value-added” of its teachers from 2007-2017.

Teachers objected that the method of calculating their scores was opaque. They couldn’t learn how to improve their practice because Sanders’ methodology was proprietary and secret.

Teachers were fired based on their VAM scores.

The Houston Federation of Teachers sued to stop the use of the “black box” method.

In 2017, a judge agreed and enjoined the use of VAM.

Thus by now, after a decade of VAMMING teachers, Houston should have identified and removed all the “bad” teachers and should employ only “effective” or “highly effective” teachers.

But the state threatened to take over the entire district because one high school, Wheatley, has low test scores. Wheatley High School, which has a disproportionately large share of students who are poor and have special needs, has low scores even though all of its teachers, like all of Houston’s teachers, were VAMMED for a decade.

If VAM were effective, HISD should be the best urban district in the nation.

All achievement gaps should have closed by now.

Why is the state–which has no expertise in running a large urban district–taking control away from the elected board?


Chalkbeat reports that the privatizers at “Democrats” for Education Reform have identified their candidates for Biden’s Secretary of Education. They are three big-city superintendents who have worked harmoniously with charter schools.

DFER is an organization of hedge fund managers and financiers who are supporters of charter schools, merit pay, high-stakes testing, and value-added evaluation of teachers. In 2008, DFER successfully advocated for the appointment of Arne Duncan, a supporter of their goals.

Democrats for Education Reform is coordinating a behind-the-scenes push for Chicago schools chief Janice Jackson, the head of Baltimore schools Sonja Brookins Santelises, or Philadelphia superintendent William Hite, according to an email sent to supporters Monday by the group’s president Shavar Jeffries and obtained by Chalkbeat. All three, Jeffries wrote, would represent a “‘big tent’ approach to education policy making….”

DFER was an influential actor in policy during the Obama administration, but those policies have mostly proved ineffective, been rejected by teachers, or both. In light of Betsy DeVos’ fierce advocacy for charter schools, DFER’s agenda is out of step with the Democratic Party.

In general, though, DFER has found some of its favored policies moving further from the Democratic Party’s mainstream. As a presidential candidate, Biden has proposed a slew of new federal restrictions on charter schools and been critical of standardized testing — a clear shift from the Obama administration, which promoted the growth of charter schools and teacher evaluations linked to test scores. 

“It is certainly the Biden plan,” the campaign’s policy director Stef Feldman said at a recent event, describing the candidate’s agenda for schools. “The vice president is pretty committed to the concept that we need to be investing in our public neighborhood schools and we can’t be diverting funding away from them.”

A number of factors have driven the shift within the Democratic party — including disillusionment with Obama-era reforms, the increased political strength of teachers and their unions, and Education Secretary Betsy DeVos, who is highly unpopular among Democrats and became a figurehead for school choice.

This shifting ground is reflected in DFER’s recent policy agenda, which was signed onto by a few civil rights groups; the Center for American Progress, a progressive think tank; and major charter school organizations, including the National Alliance for Public Charter Schools. The document emphasizes areas of likely agreement with a Biden administration, including expanding access to early childhood education, increasing federal funding for low-income students and students with disabilities, and raising teacher pay. Charter schools get only a brief mention in a section about “choices in quality public schools.”

The Center for American Progress is not a “progressive” think tank. It has long advocated the Obama-era education policies that align with DFER.


Our wonderful reader Laura Chapman reports here on the origins of the laws that purport to measure teacher quality by the test scores of their students. The founding father of this methodology was the late William Sanders, an agricultural statistician who believed that the same methods used to measure the productivity of cows could be used to measure teachers. His ideas were adopted and promoted by Arne Duncan’s Race to the Top, which required states to adopt “value-added methodology” if they wanted to compete for a share of billions of federal dollars. The Gates Foundation also embraced test-based accountability. These methods proved to be ineffective at measuring teacher quality; they are inaccurate and demoralizing.


According to a 2019 report coauthored by Audrey Amrein-Beardsley, 15 states are still inflicting teacher evaluations by VAM (value added measures) and 28 are using the equally invalid process of writing up Student Learning Objectives (SLOs). SLOs require you to predict the end-of-year (or end of unit) achievements of students, among other ridicule-worthy feats. https://kappanonline.org/mapping-teacher-evaluation-plans-essa-close-amrein-beardsley-collins/

Vamboozled, the website of Audrey Amrein-Beardsley, is a great resource for anyone who is still a victim of this method of estimating the “value you have added” to the test scores of your students.

But there is also a deeper and little known origin story for VAMs. That story was exposed to view in April, 2020, by Gene V. Glass, a Senior Researcher at the National Education Policy Center and a Regents’ Professor Emeritus from Arizona State University. Glass released a treasure trove of correspondence about VAM (value added measures), first used in education by the late William Sanders, an agricultural statistician. http://ed2worlds.blogspot.com/2020/04/an-archaeological-dig-for-vam.html

In his blog post “An Archaeological Dig for VAM,” Glass reveals how William Sanders borrowed statistical methods for calculating VAM, then began using those calculations to judge teacher productivity/quality, based on the test scores of their students, specifically in the Tennessee Value-Added Assessment System (TVAAS).

It turns out that Sanders’ TVAAS process (VAM) was “built on the formulation of the late C.R. Henderson, a Cornell statistician, a fellow in the American Statistical Association, known for his pioneering work in breeding animals, specifically herds of dairy cows. Henderson’s statistical methods of producing a “genetic evaluation of livestock have been widely accepted, utilized, and enhanced by animal breeders and statisticians.”

Until Henderson’s 1953 publication of “Estimation of variance and covariance components” in Biometrics, no one had tackled the difficult problem of “estimating variance components from unbalanced data of cross-classified models, e.g., of milk production records of daughters of A.I. (Artificially Inseminated) sires in different herds – where sires are crossed with herds, and, for a large group of herds, each sire has daughters in many herds and each herd has daughters of many sires.” https://ecommons.cornell.edu/bitstream/handle/1813/31657/BU-1085-MA.pdf;sequence=1

If you have a background in statistics (mine is minimal and vintage), you may enjoy reading the extended “defense” of VAM/TVAAS by the late William Sanders, who cites his debt to Henderson’s work. Sanders’ defense of using VAM with teachers and the test scores of their students is revealed in his answers to numerous questions from William Robert Saffold, Vanderbilt Institute for Public Policy Studies, who is well-informed about the results in TVAAS in Tennessee and wanted more information to interpret the results of TVAAS for educators. The extended discussion reveals the many unwarranted assumptions Sanders made in constructing TVAAS.

I think the hoopla over the specifics of VAM (and SLOs) is too often disconnected from the fact-based origin story on “how to cull herds of dairy cows to maximize their productivity.” VAMs and SLOs are designed to cull teachers based on their productivity in raising the test scores of their students.

Almost all of the accountability structures in education based on standardized test scores are designed to cull–select and discard–teachers who are not producing gains in test scores. In VAMs, test scores of students are not much different from measures of milk production, whether of individual teachers or the whole herd (school).

Some supporters of VAMs are acting as if they were education geneticists. They seem to think that some teachers are destined to be more productive than others. They insist, for example, that Teach For America graduates with high GPAs from selective colleges are good breeders of above average test scores in their students. Moreover, all these potentially good breeders need is a brief course in summer before they are ready to produce students who are high scorers on tests. That brief summer dose of instruction is analogous to providing artificial insemination in breeding females… or for males, perhaps a dose of Viagra.

VAMs and SLOs are flawed measures pushed by the Obama/Duncan administration’s Race to the Top. These measures are still present in many state ESSA plans. That may explain why Race to the Top testing resources are still available, even if developed under contract for Race to the Top by members of a “Reform Support Network.”

The Reform Support Network was nothing more than a huge marketing campaign for these flawed measures. Here, for example, is how they marketed SLOs as a substitute for subjects and grade levels for which there were no statewide standardized test scores for calculating VAM. One is the infamous collective measure where, as Diane notes, teachers “are assigned ratings for students they never taught in subjects they never taught.” https://www.engageny.org/sites/default/files/resource/attachments/rsn-slo-toolkit.pdf

A decade ago, Richard Phelps was assessment director of the District of Columbia Public Schools. His time in that position coincided with the last ten months of Michelle Rhee’s tenure in office. When her patron Adrian Fenty lost the election for Mayor, Rhee left and so did Phelps.

Phelps writes here about what he learned while trying to improve the assessment practices of the DC Public Schools. He posts his overview in two parts, and this is part 1. The second part will appear in the next post.

Rhee asked Phelps to expand the VAM program–the use of test scores to evaluate teachers and to terminate or reward them based on student scores.

Phelps described his visits to schools to meet with teachers. He gathered useful ideas about how to make the assessments more useful to teachers and students.

Soon enough, he learned that the Central Office staff, including Rhee, rejected all the ideas he collected from teachers and imposed their own ideas instead.

He writes:

In all, I had polled over 500 DCPS school staff. Not only were all of their suggestions reasonable, some were essential in order to comply with professional assessment standards and ethics.

Nonetheless, back at DCPS’ Central Office, each suggestion was rejected without, to my observation, any serious consideration. The rejecters included Chancellor Rhee, the head of the office of Data and Accountability—the self-titled “Data Lady,” Erin McGoldrick—and the head of the curriculum and instruction division, Carey Wright, and her chief deputy, Dan Gordon.

Four central office staff outvoted several-hundred school staff (and my recommendations as assessment director). In each case, the changes recommended would have meant some additional work on their parts, but in return for substantial improvements in the testing program. Their rhetoric was all about helping teachers and students; but the facts were that the testing program wasn’t structured to help them.

What was the purpose of my several weeks of school visits and staff polling? To solicit “buy in” from school level staff, not feedback.

Ultimately, the new testing program proposal would incorporate all the new features requested by senior Central Office staff, no matter how burdensome, and not a single feature requested by several hundred supportive school-level staff, no matter how helpful. Like many others, I had hoped that the education reform intention of the Rhee-Henderson years was genuine. DCPS could certainly have benefitted from some genuine reform.

Alas, much of the activity labelled “reform” was just for show, and for padding resumes. Numerous central office managers would later work for the Bill and Melinda Gates Foundation. Numerous others would work for entities supported by the Gates or aligned foundations, or in jurisdictions such as Louisiana, where ed reformers held political power. Most would be well paid.

Their genuine accomplishments, or lack thereof, while at DCPS seemed to matter little. What mattered was the appearance of accomplishment and, above all, loyalty to the group. That loyalty required going along to get along: complicity in maintaining the façade of success while withholding any public criticism of or disagreement with other in-group members.

The Central Office “reformers” boasted of their accomplishments and went on to lucrative careers.

It was all for show, financed by Bill Gates, Eli Broad, the Waltons, and other philanthropists who believed in the empty promises of “reform.” It was a giant hoax.

Jan Resseger read Valerie Strauss’s hopeful column about a possible end to America’s obsession with standardized testing, and wrote about how this testing has warped American education into a punishing regime, rather than an environment of nurturing, growth, caring, compassion, and human development.

Jan reviews some of the most important ways in which test scores have been used to punish students, teachers, principals, and schools, even school districts.

Enough is enough.

The Wall Street Journal, which has a teacher-bashing, union-hating, pro-privatization editorial board, published an editorial warning about the dangers of policing the police.

The editorial included these sentences.

“There’s a case for police reforms, in particular more public transparency about offenses by individual officers. Union rules negotiated under collective bargaining make it hard to punish offending officers, much as unions do for bad public school teachers. By all means let’s debate other policies and accountability in using force.”

So, police brutality is the union’s fault. And killer cops are just like those “bad teachers” whose students don’t get high test scores. Clearly, the WSJ didn’t get the memo about the consistent failure of test-based accountability as a means of evaluating teachers.

Do I detect a false equivalency?

Other nations have police unions and a minuscule number of police killings, compared to the U.S. And since when did low test scores become comparable to a brazen act of crushing a man’s windpipe?

Say this for Eric Hanushek: He never gives up on his obsession with paying teachers more if their students get higher test scores. Arne Duncan built this concept into the requirements of his disastrous “Race to the Top” program, which caused almost every state to adopt a teacher evaluation plan in which student test scores played a significant role. Harvard economist Raj Chetty wrote a highly publicized paper with two colleagues, claiming that one good teacher (who raised test scores in the early grades) would raise lifetime incomes (by about $5 a week), reduce pregnancies, and be a life-changer. President Obama cited Chetty in his 2012 State of the Union address, but efforts to turn the theory into reality fell flat. (Read more about this catastrophe in SLAYING GOLIATH.) In fact, every state that imposed value-added measurement learned that it discouraged teachers from teaching in high-needs schools, where their chance of getting a big test score gain was reduced. It did not produce any of the promised benefits.

But forget about reality! Let’s stand by the theory. Hanushek’s new venture at the conservative Hoover Institution is joined by Christopher Ruszkowski, who served as Commissioner of Education in New Mexico after the resignation of Hanna Skandera (who previously worked for the Hoover Institution, Jeb Bush and Arnold Schwarzenegger). After eight years of “reform” leadership, New Mexico remained mired at the bottom of NAEP. The state had a harsh, test-based teacher evaluation plan, but the union fought it in court, it was enjoined by a judge, and the new Democratic governor scrapped it as one of her first executive actions. New Mexico has one of the highest proportions of students living in poverty, but Republican state leaders ignored that inconvenient fact. After a decade of consistent failure, we can safely put test-based teacher evaluation into the category of a Zombie idea. Dead but still stalking the land.

 


FOR IMMEDIATE RELEASE

CONTACT:

Hoover Institution, Jeff Marschner, (202) 760-3200

NEWLY FORMED HOOVER EDUCATION SUCCESS INITIATIVE RELEASES PAPER ON TRANSFORMING TEACHER COMPENSATION

Four education policy papers to be released in 2020—addressing how states should consider transforming education in the decade ahead.

STANFORD, CA. (January 30th) – As state legislative sessions begin around the country, the Hoover Education Success Initiative (HESI), a new research program at Stanford University’s Hoover Institution, has released “The Unavoidable: Tomorrow’s Teacher Compensation”—a policy briefing on the important connections between teacher compensation systems and student achievement outcomes. The research-based policy paper includes both a summary of findings and practical recommendations for policymakers.

The paper highlights often overlooked areas for attention including shifting overall compensation from retirement into salaries, ending the practice of paying for advanced degrees that do not yield changes in student outcomes, addressing teacher shortages in a targeted fashion instead of generally, and paying teachers more when they are effective in higher-need schools.  The paper concludes that teachers’ salaries should be significantly increased, but that students will not make achievement gains unless salaries are also linked to teacher quality.

“We need to pay teachers competitively, which we are not doing now,” said Dr. Eric Hanushek, author of the policy synthesis. “But just increasing compensation without recognizing teacher effectiveness is unlikely to lead to improved student outcomes. We should bundle together better pay with a serious recognition of just how important effective teachers are when it comes to influencing student achievement.”

“While we have spent much of the last year reviewing and synthesizing the research, the next phase of our work turns to helping states implement the policy ideas,” said Christopher N. Ruszkowski, executive director of HESI. “There is overwhelming evidence that nothing matters more than teacher quality, and state legislatures and governors should take strong action. Neglecting this responsibility causes harm to our students that may not be immediately visible today but will certainly be reflected in our students’ lives and in our economy tomorrow.  It’s a tough issue and it may feel like something we can avoid, but it will catch up with us.”

Click here to read the policy analysis brief.

About the Hoover Education Success Initiative

With passage in 2015 of the Every Student Succeeds Act (ESSA), states are again in charge of American education policy. To support them in this undertaking, the Hoover Education Success Initiative (HESI), launched in 2019, seeks to provide state education leaders with policy recommendations that are based upon sound research and analysis.  HESI hosts workshops and policy symposia on high-impact areas related to the improvement and reinvention of the US education system. The findings and recommendations in each area are outlined in concise topical papers.

The leadership team at HESI engages with its Practitioner Council, formed of national policy leaders, and with interested state government leaders. HESI’s ultimate goal is to spark innovation and contribute to the ongoing transformation of the nation’s K-12 education landscape, thus improving outcomes for our nation’s children.

###

Jeff Marschner
Director of Media Relations

Audrey Amrein-Beardsley is one of the leading experts in the nation in the field of value-added assessment and also one of the nation’s leading skeptics of the claim that teacher “effectiveness” can be measured by the test scores of their students.

Recently, a study was published by economists that purported to measure the effect of teachers on their students’ height. The study was a blatant lampoon of VAM (value-added modeling or measurement).

It turns out that I was one of about 25 people who promptly forwarded it to Amrein-Beardsley.

She reviewed the study here. 

Amrein-Beardsley reminds us of a paper written by economist Jesse Rothstein nearly a decade ago in which he lacerated VAM by showing that it could be twisted to show the effect of teachers on students’ past achievement, a feat that is clearly absurd.

When a policy idea like VAM becomes the target of satire, you know that it is well and truly dead. Now, if only someone would tell the state legislatures that.

A group of scholars collaborated to write a paper published by the National Bureau of Economic Research that studies how teachers affect student height. It is a wonderful and humorous takedown of the Raj Chetty et al. thesis that the effects of a single teacher in the early grades may determine a student’s future lifetime earnings, her likelihood of graduating from college, living in a higher-SES neighborhood, and avoiding teen pregnancy.

When the Chetty study was announced in 2011, a front-page article in the New York Times said:

WASHINGTON — Elementary- and middle-school teachers who help raise their students’ standardized-test scores seem to have a wide-ranging, lasting positive effect on those students’ lives beyond academics, including lower teenage-pregnancy rates and greater college matriculation and adult earnings, according to a new study that tracked 2.5 million students over 20 years.

The paper, by Raj Chetty and John N. Friedman of Harvard and Jonah E. Rockoff of Columbia, all economists, examines a larger number of students over a longer period of time with more in-depth data than many earlier studies, allowing for a deeper look at how much the quality of individual teachers matters over the long term.

“That test scores help you get more education, and that more education has an earnings effect — that makes sense to a lot of people,” said Robert H. Meyer, director of the Value-Added Research Center at the University of Wisconsin-Madison, which studies teacher measurement but was not involved in this study. “This study skips the stages, and shows differences in teachers mean differences in earnings.”

The study, which the economics professors have presented to colleagues in more than a dozen seminars over the past year and plan to submit to a journal, is the largest look yet at the controversial “value-added ratings,” which measure the impact individual teachers have on student test scores. It is likely to influence the roiling national debates about the importance of quality teachers and how best to measure that quality.

Many school districts, including those in Washington and Houston, have begun to use value-added metrics to influence decisions on hiring, pay and even firing….

Replacing a poor teacher with an average one would raise a single classroom’s lifetime earnings by about $266,000, the economists estimate. Multiply that by a career’s worth of classrooms.

“If you leave a low value-added teacher in your school for 10 years, rather than replacing him with an average teacher, you are hypothetically talking about $2.5 million in lost income,” said Professor Friedman, one of the coauthors…

The authors argue that school districts should use value-added measures in evaluations, and to remove the lowest performers, despite the disruption and uncertainty involved.

“The message is to fire people sooner rather than later,” Professor Friedman said.

Professor Chetty acknowledged, “Of course there are going to be mistakes — teachers who get fired who do not deserve to get fired.” But he said that using value-added scores would lead to fewer mistakes, not more.
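The dollar figures quoted above can be checked with simple arithmetic. The sketch below is only a back-of-the-envelope illustration: the $266,000 per classroom comes from the article, but the class size, the length of a working life, and the one-classroom-per-year assumption are hypothetical values chosen for illustration, not numbers stated in the article.

```python
# Figure quoted in the article: added lifetime earnings for one classroom.
GAIN_PER_CLASSROOM = 266_000

# Hypothetical assumptions, for illustration only (not from the article):
CLASSROOMS_PER_YEAR = 1   # a teacher has one classroom per year
STUDENTS_PER_CLASS = 28   # assumed class size
WORKING_YEARS = 40        # assumed length of a student's working life
WEEKS_PER_YEAR = 52


def ten_year_total(gain=GAIN_PER_CLASSROOM, years=10):
    """Total classroom-level gain over `years` years of classrooms."""
    return gain * CLASSROOMS_PER_YEAR * years


def weekly_gain_per_student(gain=GAIN_PER_CLASSROOM):
    """Spread one classroom's gain over each student's working life."""
    return gain / STUDENTS_PER_CLASS / WORKING_YEARS / WEEKS_PER_YEAR


print(f"Ten years of classrooms: ${ten_year_total():,}")       # $2,660,000
print(f"Per student, per week: ${weekly_gain_per_student():.2f}")  # $4.57
```

Under these assumptions, ten years of classrooms comes to about $2.66 million, in the neighborhood of the "$2.5 million" quoted above, while the same gain works out to less than $5 a week for each student.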

President Obama hailed the  Chetty study in his 2012 State of the Union address.

Value-added teacher evaluation, that is, basing the evaluation of teachers on the rise or fall of their students’ test scores, was a central feature of Arne Duncan’s Race to the Top when it was unveiled in 2010. States had to agree to adopt it if they wanted to be eligible for Race to the Top funding.

When the Los Angeles Times published a value-added ranking of thousands of teachers, teachers said the rankings were filled with error, but Duncan said those who complained were afraid to learn the truth. In Florida, teacher evaluations may be based on the rise or fall of the scores of students that the teachers had never taught, in subjects they had never taught. (About 70% of teachers do not teach subjects that are tested annually to provide fodder for these ratings.) When this nutty process was challenged in court by Florida teachers, the judge ruled that the practice might be unfair but it was not unconstitutional.

The fundamental claim of VAM (value-added modeling or measurement) has been repeatedly challenged, most notably by economist Moshe Adler. When put into law, as it was in most states, it was found to be useless, because only tiny percentages of teachers were identified as ineffective, and even the validity of the ratings of that 1-3% was dubious. The use of VAM was frozen by a judge in New Mexico, then tossed out earlier this year by a new Democratic governor. It was banned by a judge in Houston.  A large experiment funded by the Gates Foundation intended to demonstrate the value of VAM produced negative results.

Now comes economic research to test the validity of linking teacher evaluation and student height.

 

Marianne Bitler, Sean Corcoran, Thurston Domina, and Emily Penner wrote:

NBER Working Paper No. 26480
Issued in November 2019
NBER Program(s): Program on Children, Economics of Education Program

Estimates of teacher “value-added” suggest teachers vary substantially in their ability to promote student learning. Prompted by this finding, many states and school districts have adopted value-added measures as indicators of teacher job performance. In this paper, we conduct a new test of the validity of value-added models. Using administrative student data from New York City, we apply commonly estimated value-added models to an outcome teachers cannot plausibly affect: student height. We find the standard deviation of teacher effects on height is nearly as large as that for math and reading achievement, raising obvious questions about validity. Subsequent analysis finds these “effects” are largely spurious variation (noise), rather than bias resulting from sorting on unobserved factors related to achievement. Given the difficulty of differentiating signal from noise in real-world teacher effect estimates, this paper serves as a cautionary tale for their use in practice.
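To see how a value-added calculation can produce teacher "effects" on an outcome teachers cannot influence, here is a minimal simulation sketch. It is not the authors' model or data: it uses made-up numbers, a deliberately simplified value-added estimate (the average classroom residual after adjusting for a prior-year measure), and hypothetical parameter names. In the simulation the true teacher effect is exactly zero.

```python
import numpy as np


def placebo_value_added(n_teachers=200, class_size=25, noise_sd=1.0, seed=0):
    """Estimate 'teacher effects' on a simulated outcome with NO teacher effect.

    All parameters are hypothetical. The outcome depends only on a
    prior-year measure plus random student-level noise.
    """
    rng = np.random.default_rng(seed)
    prior = rng.normal(0.0, 1.0, size=(n_teachers, class_size))
    noise = rng.normal(0.0, noise_sd, size=(n_teachers, class_size))
    current = 0.8 * prior + noise  # no teacher term anywhere

    # Step 1: adjust for the prior-year measure with a simple linear fit.
    slope, intercept = np.polyfit(prior.ravel(), current.ravel(), 1)
    residuals = current - (intercept + slope * prior)

    # Step 2: a teacher's "value-added" is the mean residual of the class.
    return residuals.mean(axis=1), residuals


def split_half_correlation(residuals):
    """Correlate each teacher's estimate from two halves of the same class.

    A real, persistent teacher effect would show up in both halves;
    pure noise should give a correlation near zero.
    """
    half = residuals.shape[1] // 2
    first = residuals[:, :half].mean(axis=1)
    second = residuals[:, half:].mean(axis=1)
    return np.corrcoef(first, second)[0, 1]


effects, residuals = placebo_value_added()
print("SD of estimated teacher 'effects':", round(effects.std(ddof=1), 3))
print("Expected from noise alone:", round(1.0 / np.sqrt(25), 3))
print("Split-half correlation:", round(split_half_correlation(residuals), 3))
```

Even though no teacher matters in this simulation, the estimated "effects" vary from teacher to teacher by roughly the noise standard deviation divided by the square root of the class size. The split-half check is one simple way to tell such noise apart from a persistent effect; it stands in for, and is much cruder than, the analysis the abstract describes.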