A group of scholars collaborated on a paper, published by the National Bureau of Economic Research, that studies how teachers affect student height. It is a wonderful and humorous takedown of the thesis of Raj Chetty et al. that the effects of a single teacher in the early grades may determine a student’s future lifetime earnings, her likelihood of graduating from college and of living in a higher-SES neighborhood, and her chances of avoiding teen pregnancy.

When the Chetty study was announced in 2011, a front-page article in the New York Times said:

WASHINGTON — Elementary- and middle-school teachers who help raise their students’ standardized-test scores seem to have a wide-ranging, lasting positive effect on those students’ lives beyond academics, including lower teenage-pregnancy rates and greater college matriculation and adult earnings, according to a new study that tracked 2.5 million students over 20 years.

The paper, by Raj Chetty and John N. Friedman of Harvard and Jonah E. Rockoff of Columbia, all economists, examines a larger number of students over a longer period of time with more in-depth data than many earlier studies, allowing for a deeper look at how much the quality of individual teachers matters over the long term.

“That test scores help you get more education, and that more education has an earnings effect — that makes sense to a lot of people,” said Robert H. Meyer, director of the Value-Added Research Center at the University of Wisconsin-Madison, which studies teacher measurement but was not involved in this study. “This study skips the stages, and shows differences in teachers mean differences in earnings.”

The study, which the economics professors have presented to colleagues in more than a dozen seminars over the past year and plan to submit to a journal, is the largest look yet at the controversial “value-added ratings,” which measure the impact individual teachers have on student test scores. It is likely to influence the roiling national debates about the importance of quality teachers and how best to measure that quality.

Many school districts, including those in Washington and Houston, have begun to use value-added metrics to influence decisions on hiring, pay and even firing….

Replacing a poor teacher with an average one would raise a single classroom’s lifetime earnings by about $266,000, the economists estimate. Multiply that by a career’s worth of classrooms.

“If you leave a low value-added teacher in your school for 10 years, rather than replacing him with an average teacher, you are hypothetically talking about $2.5 million in lost income,” said Professor Friedman, one of the coauthors…

The authors argue that school districts should use value-added measures in evaluations, and to remove the lowest performers, despite the disruption and uncertainty involved.

“The message is to fire people sooner rather than later,” Professor Friedman said.

Professor Chetty acknowledged, “Of course there are going to be mistakes — teachers who get fired who do not deserve to get fired.” But he said that using value-added scores would lead to fewer mistakes, not more.

President Obama hailed the Chetty study in his 2012 State of the Union address.

Value-added teacher evaluation, that is, basing the evaluation of teachers on the rise or fall of their students’ test scores, was a central feature of Arne Duncan’s Race to the Top when it was unveiled in 2010. States had to agree to adopt it if they wanted to be eligible for Race to the Top funding.

When the Los Angeles Times published a value-added ranking of thousands of teachers, teachers said the rankings were filled with error, but Duncan said those who complained were afraid to learn the truth. In Florida, teacher evaluations may be based on the rise or fall of the scores of students whom the teachers had never taught, in subjects they had never taught. (About 70% of teachers do not teach subjects that are tested annually to provide fodder for these ratings.) When this nutty process was challenged in court by Florida teachers, the judge ruled that the practice might be unfair but that it was not unconstitutional.

The fundamental claim of VAM (value-added modeling or measurement) has been repeatedly challenged, most notably by economist Moshe Adler. When put into law, as it was in most states, it was found to be useless, because only tiny percentages of teachers were identified as ineffective, and even the validity of the ratings of that 1-3% was dubious. The use of VAM was frozen by a judge in New Mexico, then tossed out earlier this year by a new Democratic governor. It was banned by a judge in Houston. A large experiment funded by the Gates Foundation, intended to demonstrate the value of VAM, produced negative results.

Now comes economic research that tests the validity of value-added measures by applying them to something teachers cannot plausibly affect: student height.

Marianne Bitler, Sean Corcoran, Thurston Domina, and Emily Penner wrote:

NBER Working Paper No. 26480
Issued in November 2019
NBER Program(s): Program on Children, Economics of Education Program

Estimates of teacher “value-added” suggest teachers vary substantially in their ability to promote student learning. Prompted by this finding, many states and school districts have adopted value-added measures as indicators of teacher job performance. In this paper, we conduct a new test of the validity of value-added models. Using administrative student data from New York City, we apply commonly estimated value-added models to an outcome teachers cannot plausibly affect: student height. We find the standard deviation of teacher effects on height is nearly as large as that for math and reading achievement, raising obvious questions about validity. Subsequent analysis finds these “effects” are largely spurious variation (noise), rather than bias resulting from sorting on unobserved factors related to achievement. Given the difficulty of differentiating signal from noise in real-world teacher effect estimates, this paper serves as a cautionary tale for their use in practice.
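To see why noise alone can produce apparent teacher "effects" on height, here is a toy simulation. It is only a rough sketch, not the authors' model: the paper applies regression-based value-added models to New York City administrative data, while this example simply compares each classroom's average height to the overall average. The function name, the number of teachers (200), the class size (25), and the spread of student heights (7 cm) are all assumptions chosen for illustration. Every student's height is drawn from the same distribution no matter who the teacher is, so the true teacher effect is exactly zero.

```python
import random
import statistics


def simulate_placebo_teacher_effects(n_teachers=200, class_size=25,
                                     height_sd=7.0, seed=0):
    """Return naive per-teacher 'effects' on height when the true effect is zero.

    All parameter values are illustrative assumptions, not figures from the paper.
    """
    rng = random.Random(seed)
    # Heights are deviations (in cm) from the average for the grade.
    # The draw does not depend on the teacher in any way.
    classes = [
        [rng.gauss(0.0, height_sd) for _ in range(class_size)]
        for _ in range(n_teachers)
    ]
    all_heights = [h for classroom in classes for h in classroom]
    grand_mean = statistics.fmean(all_heights)
    # Naive "value-added": classroom average minus the overall average.
    return [statistics.fmean(classroom) - grand_mean for classroom in classes]


effects = simulate_placebo_teacher_effects()
estimated_sd = statistics.pstdev(effects)

# Sampling noise alone predicts a spread of about height_sd / sqrt(class_size).
expected_noise_sd = 7.0 / 25 ** 0.5

print(f"SD of estimated teacher 'effects' on height: {estimated_sd:.2f} cm")
print(f"SD predicted by sampling noise alone:        {expected_noise_sd:.2f} cm")
```

Under these assumptions, the spread of the estimated "effects" should land close to the spread that sampling noise predicts, even though no teacher in the simulation changes anyone's height. That is the kind of spurious variation the abstract describes.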