According to the text of the Vergara decision, two expert witnesses for the plaintiffs were Professor Raj Chetty of Harvard and Professor Tom Kane of Harvard.

Professor Chetty, the judge said, testified that “a single year in a classroom with a grossly ineffective teacher costs students $1.4 million in lifetime earnings per classroom.” Dr. Kane testified that students in LAUSD taught by a teacher in the bottom 5% of competence lose 9.54 months of learning in a single year compared to students taught by an average teacher.

Chetty, you may recall, is the nation’s leading proponent of VAM. Kane directs the Gates Foundation’s MET (Measures of Effective Teaching) Project.

The judge accepted these statements as fact, not knowing they are strongly disputed by other scholars.

I just watched the video of the Finnish prof talking about Finnish education and now I’m either inspired or totally depressed by the comparisons between theirs and ours. I haven’t decided. Teachers have many fewer classroom hours per week, kids have 2 years less schooling than here, and, amazingly, teachers are able to teach and assess students very well.

Loaded dice?

Chetty’s evidence on which this extraordinary ruling is based hasn’t even appeared in any peer-reviewed journal – it was published as an NBER working paper. The charts he showed to the judge violate basic rules of scientific data presentation: the y-axes are all manipulated, the error bars and R-square values are missing. That a judge would accept such weak evidence to strike down statutes is disturbing and highlights the lack of scientific literacy of the judiciary. Maybe that too can be blamed on the teachers.

Actually the work is forthcoming as two papers in the American Economic Review, generally considered the most important and prestigious peer reviewed economic publication in the world.

Thanks. I was not making a factual error however. As of this writing, there is only an unreviewed working paper. The AER version hasn’t been published yet and none of us know what it will look like. I would still be interested in your opinion whether it is permissible to show these data as points without error bars, to show the regression line without r-square value, and to rescale the y-axis to exaggerate the purported effect 20-fold.

As I said, both papers have been reviewed and accepted. I think this is probably the second round. In the first it looks like the referees recommended taking the longer single paper and dividing it up into two papers.

As I said, publication lags are long, so it might be a while coming out.

Are they publishing the charts without error bars and r-square values and with manipulated y-axes?

I do not know what the final format of the paper will be like. It will probably be out in a year or so; publication lags are long.

Are you one of the authors of the paper? I cannot comment on a paper that hasn’t been published yet. I am commenting on the charts that HAVE been published and those violate scientific standards. I also point out that even if AER reviewed and accepted a version of that paper, that doesn’t make it “settled science” by a huge stretch. This evidence should be debated, critiqued and replicated among experts, not used as ideological fodder. To base a court decision striking down a half dozen state laws on a single “forthcoming” paper is grotesque. A conscientious scientist must also, if called to testify in a court room, be honest about the limitations of their findings. That too is a duty of a scientist: admitting that one could be proven wrong by further research.

I am not one of the authors of the paper. I was pointing out a factual error in your post that this work was simply a working paper. The authors seem to me to be very open, allowing anyone in the world access to their data set and SAS program used to do the calculations. If you can find the data manipulation I am sure that AER would be very interested in publishing your results.

Bruce Baker (http://schoolfinance101.wordpress.com/2012/01/07/fire-first-ask-questions-later-comments-on-recent-teacher-effectiveness-studies/) also noticed the “super-elastic, super-extra-stretchy Y axis”: “the NYT graph shows an increase of annual income from about $20,750 to $21,000. But, they do the usual news reporting strategy of having the Y axis go only from $20,250 to $21,250… so the $250 increase looks like a big jump upward. That said, the author’s own Figure 6 in the working paper does much the same!”

I guess he would be as surprised as I am that AER would actually publish the “super-extra-stretchy” chart.
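The axis arithmetic in Baker's example can be checked with a few lines. This is only a sketch using the dollar figures quoted above; the function name `visual_fraction` and the zero-based comparison axis are illustrative assumptions, not anything taken from the NYT graph or the working paper.

```python
def visual_fraction(change, axis_min, axis_max):
    """Fraction of the plotted axis height taken up by a change in value."""
    return change / (axis_max - axis_min)


# Figures quoted from Baker's description of the NYT graph.
low, high = 20_750, 21_000
change = high - low

# Axis range as Baker describes it.
truncated = visual_fraction(change, 20_250, 21_250)
# Hypothetical comparison: the same top value on an axis starting at zero.
zero_based = visual_fraction(change, 0, 21_250)
exaggeration = truncated / zero_based

print(f"truncated axis:  {truncated:.1%} of axis height")
print(f"zero-based axis: {zero_based:.1%} of axis height")
print(f"visual exaggeration: about {exaggeration:.0f}x")
```

On these numbers the $250 increase fills a quarter of the truncated axis but only about 1.2% of a zero-based one, a visual exaggeration of roughly 21 times.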

I dug a bit deeper about the Chetty et al. study. Here’s what I found:

The study was first published as an NBER working paper at http://obs.rc.fas.harvard.edu/chetty/value_added.pdf in 2011. It was submitted to the American Economic Review and accepted after being split into two papers. Preprints can be found at http://obs.rc.fas.harvard.edu/chetty/w19424.pdf and http://obs.rc.fas.harvard.edu/chetty/w19423.pdf.

All the charts in all versions of the papers violate scientific data presentation standards:

– all Y-axes are “super-extra-stretchy” (Bruce Baker), exaggerating the purported statistical effect by orders of magnitude;

– none of the charts have error bars or confidence intervals; if they were shown, the error bars would certainly exceed the purported statistical effect by far;

– no r-square values or correlation coefficients are given; the r-squares are likely to be very low;

– the sample sizes on which the regressions are based are not shown; although the study drew on millions of student records, most of the records were incomplete and the actual regressions are based on only a fraction of the records.

Criticisms of the working paper by Bruce Baker (http://schoolfinance101.wordpress.com/2012/01/07/fire-first-ask-questions-later-comments-on-recent-teacher-effectiveness-studies/) and Moshe Adler (epaa.asu.edu/ojs/article/download/1264/1033) are worth reading. Adler points out that Chetty et al. report a small (albeit visually wildly exaggerated) but statistically significant effect of teacher VAM on student earnings at age 28 *but they found no effect at all on earnings at age 30*. Explains Adler:

“However, although the result they found for 30 year olds is not statistically significant, the words “not statistically significant” are nowhere to be found in their study. Instead the authors write, “The 95% confidence interval for the estimate is very wide. We therefore focus on earnings impacts up to age 28 for the remainder of our analysis.” … Furthermore, they didn’t just “focus” on earning impacts up to age 28; instead they proceeded as if the result for 28 year olds, an increase in income of .09%, was also the result that they found for 30 year olds, and made the assumption that this would have also been the result for any subsequent age group. Based on this assumption, which is contradicted by their own evidence, they calculated a life-time benefit of $25,000 from an increase of one standard deviation in teacher value-added. … After conducting a study its authors cannot ignore the results; science does not permit cherry picking. The result that teacher value-added does not have a statistically significant impact on earnings at age 30 must be part of any conclusion drawn from the Chetty et al. study.”

Damningly, the AER version of the paper omits even mentioning the negative finding for earnings at 30. The language from the working paper has simply been removed from the AER version! The editors and reviewers of AER must answer some hard questions about their scientific standards.

Thanks, I tried to read the Chetty study on income mobility across generations, but my mind was spinning in circles pretty quick.

Basically, I would agree that good teaching increases future income. Trying to measure teachers against each other, or to measure future income, is fraught with peril, I believe, due to the fact that there are far too many variables to isolate to ever claim any reasonable amount of precision.

Plus, that whole zoom-in-on-the-graph thing, boo. I think that might have happened on a graph in the movie “An Inconvenient Truth”, plus I’ve seen it a few other times. Distorting data is bad form at minimum.

Please note that figures 1c, 2b, 4, and 7 all include confidence intervals drawn in as dotted lines. Standard errors of all coefficient estimates are reported in the tables in parentheses below the coefficients. The response to Adler can be found here: http://obs.rc.fas.harvard.edu/chetty/Adler_response.pdf

I’m sorry but the confidence intervals are missing from most charts *and in particular from the ones that were shown to the judge* (which are easiest to view at http://www.washingtonpost.com/blogs/wonkblog/wp/2014/06/10/a-california-judge-just-ruled-that-teacher-tenure-is-bad-for-students/). These charts represent binned regressions, that is, for each 5-percentile-band, a whole set of thousands of data points was reduced to one single point, and then a regression line was fitted through these 20 points. Each of these 20 data points needs to be shown with a box-and-whisker plot or at least error bars – omitting them is a serious fault and I reiterate that no real scholarly journal with editors and reviewers that actually understand statistics would accept that. Show me a single chart like that in Nature or Science and I’ll buy you a beer. (Note also that the binned data points actually don’t fit the regression line very well – it seems that the slightly above-average teachers produce better outcomes than the “superstars”).
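To make the point about binned regressions concrete, here is a minimal sketch on purely synthetic data. Nothing in it comes from the Chetty et al. data set; the sample size, slope, noise level and variable names are all assumptions chosen for illustration. It collapses the data into 20 five-percentile bins, fits a line through the bin means, and computes the per-bin standard errors and R-square values that a reader would want to see alongside such a chart.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data only: a weak linear effect buried in large noise.
n = 20_000
x = rng.uniform(0, 100, n)            # hypothetical teacher percentile
y = 0.02 * x + rng.normal(0, 10, n)   # hypothetical outcome

# Reduce the raw data to 20 five-percentile bins.
edges = np.linspace(0, 100, 21)
bin_idx = np.clip(np.digitize(x, edges) - 1, 0, 19)

bin_x = np.array([x[bin_idx == b].mean() for b in range(20)])
bin_y = np.array([y[bin_idx == b].mean() for b in range(20)])

# Standard error of each bin mean: the error bar for each plotted point.
bin_se = np.array([
    y[bin_idx == b].std(ddof=1) / np.sqrt((bin_idx == b).sum())
    for b in range(20)
])

# Line fitted through the 20 bin means.
slope, intercept = np.polyfit(bin_x, bin_y, 1)

# R-square on the raw records versus on the 20 bin means.
r2_raw = np.corrcoef(x, y)[0, 1] ** 2
r2_binned = np.corrcoef(bin_x, bin_y)[0, 1] ** 2

print(f"slope through bin means: {slope:.4f}")
print(f"R^2 on raw data:  {r2_raw:.4f}")
print(f"R^2 on bin means: {r2_binned:.4f}")
print(f"typical bin standard error: {bin_se.mean():.3f}")
```

In this toy setup, averaging within bins removes most of the scatter, so the 20 points can line up far more neatly than the underlying records do. That is why the error bars and the raw-data R-square matter when reading a chart of this kind.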

Moshe Adler’s extensive critique can be found here: http://nepc.colorado.edu/thinktank/review-measuring-impact-of-teachers with a response from Chetty et al. Part of the response is cringe-worthy: they admit that there is no statistically significant effect on income at age 30 but claim that “this does not mean there is no effect at age 30; rather, it means that one has insufficient data to measure earnings impacts accurately at age 30.” But the data are not insufficient – they are what they are. The method chosen by the authors comes up with the result that there is no significant effect, hence the null hypothesis cannot be rejected, hence there is no sufficient scientific evidence in favor of the alternative hypothesis (that there is an effect). It is exactly as I thought: the statistical evidence is so thin that the authors deliberately left the error bars/confidence intervals *out of all their charts* and stretched the Y-axes to create the impression that there are clear effects. This is a clear no-no and borders on intentional fraud.