This is an important article, which criticizes and deconstructs the notorious VAM study by Chetty et al. I call it notorious because it was reported on the front page of the New York Times before it was peer-reviewed; it was immediately presented on the PBS Newshour; and President Obama referred to its findings in his State of the Union address only weeks after it first appeared.
These miraculous events do not happen by accident. The study made grand claims for the importance of value-added measures of teacher quality, a keystone of Obama’s education policy. One of the authors told the New York Times that the lesson of the study was to fire teachers sooner rather than later. A few months ago, the American Statistical Association reacted to the study, not harshly, but it made clear that the study’s claims were overstated: teachers account for only 1% to 14% of the variability in test scores, and changes to the system as a whole would likely do more for students’ academic outcomes than attaching students’ scores to individual teachers.
I have said it before, and I will say it again: VAM is Junk Science. Looking at children as machine-made widgets and looking at learning solely as standardized test scores may thrill some econometricians, but it has nothing to do with the real world of children, learning, and teaching. It is a grand theory that might net its authors a Nobel Prize for its grandiosity, but it is both meaningless in relation to any genuine concept of education and harmful in its mechanistic and reductive view of humanity.
CHETTY, ET AL, ON THE AMERICAN STATISTICAL ASSOCIATION’S RECENT POSITION STATEMENT ON VALUE-ADDED MODELS (VAMs): FIVE POINTS OF CONTENTION
by Margarita Pivovarova, Jennifer Broatch & Audrey Amrein-Beardsley — August 01, 2014
Over the last decade, teacher evaluation based on value-added models (VAMs) has become central to the public debate over education policy. In this commentary, we critique and deconstruct the arguments proposed by the authors of a highly publicized study that linked teacher value-added models to students’ long-run outcomes, Chetty et al. (2014, forthcoming), in their response to the American Statistical Association statement on VAMs. We draw on recent academic literature to support our counter-arguments along the main points of contention: the causality of VAM estimates, the transparency of VAMs, the effect of non-random sorting of students on VAM estimates, and the sensitivity of VAMs to model specification.
INTRODUCTION
Recently, the authors of a highly publicized and cited study that linked teacher value-added estimates to the long-run outcomes of their students (Chetty, Friedman, & Rockoff, 2011; see also Chetty, et al., in press I, in press II) published a “point-by-point” discussion of the “Statement on Using Value-Added Models for Educational Assessment” released by the American Statistical Association (ASA, 2014). This once again brought the value-added model (VAM) and its use for increased teacher and school accountability to the forefront of heated policy debate.
In this commentary we elaborate on some of the statements made by Chetty, et al. (2014). We position both the ASA’s statement and Chetty, et al.’s (2014) response within the current academic literature. We also deconstruct the critiques and assertions advanced by Chetty, et al. (2014), providing counter-arguments and supporting them with the scholarly research on this topic.
In doing so, we rely on the research literature actually produced on this subject over the past ten years. This more representative literature was completely overlooked by Chetty, et al. (2014), even though, paradoxically, they criticize the ASA for not citing the “recent” literature appropriately themselves (p. 1). With this as our first point of contention, we also discuss four additional points of dispute within the commentary.
POINT 1: MISSING LITERATURES
In their critique of the ASA statement, posted on a university-sponsored website, Chetty, et al. (2014) marginalize the current literature published in scholarly journals on the issues surrounding VAMs and their uses for measuring teacher effectiveness. Instead, Chetty et al. cite only works by fellow econometricians, apparently in support of their a priori arguments and ideas. Hence, it is important to make explicit the rather odd and extremely selective set of references Chetty, et al. included in their critique, on which they relied “to prove” some of the ASA’s statements incorrect. The whole body of peer-reviewed articles that counters Chetty, et al.’s arguments and ideas is left out of their discussion entirely.
A search on the Educational Resources Information Center (ERIC) with “value-added” as the key words for the last five years yields 406 entries, and a similar search in JSTOR (a shared digital library) returns 495. Chetty, et al., however, cite only 13 references in their critique of the ASA’s statement, one of which is the statement itself, leaving 12 external citations in total. Of these 12, three are references to their own two forthcoming studies and a replication of those studies’ methods; three have thus far been published in peer-reviewed academic journals; six were written by their colleagues at Harvard University; and 11 were written by teams of scholars with economics professors/econometricians as lead authors.
POINT 2: CORRELATION VERSUS CAUSATION
The second point of contention concerns whether users of VAMs should be aware that VAMs typically measure correlation, not causation. According to the ASA, as quoted by Chetty, et al. (2014), effects “positive or negative—attributed to a teacher may actually be caused by other factors that are not captured in the model” (p. 2). This is an important point with major policy implications. Seminal publications on the topic by Rubin, Stuart, and Zanutto (2004) and Wainer (2004), positioned within the Rubin Causal Model framework (Rubin, 1978; Rosenbaum & Rubin, 1983; Holland, 1986), clearly communicated, and evidenced, that value-added estimates cannot be considered causal unless a set of “heroic assumptions” is agreed to and imposed. Moreover, “anyone familiar with education will realize that this [is]…fairly unrealistic” (Rubin, et al., 2004, p. 108). Instead, Rubin, et al. suggested that, given these issues with confounded causation, we should switch gears and evaluate interventions and reward incentives based on the descriptive qualities of the indicators and estimates derived via VAMs. This point has since gained increased consensus among other scholars conducting research in these areas (Amrein-Beardsley, 2008; Baker, et al., 2010; Betebenner, 2009; Braun, 2008; Briggs & Domingue, 2011; Harris, 2011; Reardon & Raudenbush, 2009; Scherrer, 2011).
POINT 3: THE NON-RANDOM ASSIGNMENT OF STUDENTS INTO CLASSROOMS
The third point of contention pertains to Chetty, et al.’s claim that recent experimental and quasi-experimental studies have already solved the “causation versus correlation” issue. This claim is made despite substantive research evidencing how the non-random assignment of students constrains VAM users’ capacities to make causal claims.
The authors of the Measures of Effective Teaching (MET) study cited by Chetty, et al. in their critique clearly state, “we cannot say whether the measures perform as well when comparing the average effectiveness of teachers in different schools…given the obvious difficulties in randomly assigning teachers or students to different schools” (Kane, McCaffrey, Miller, & Staiger, 2013, p. 38). VAM estimates have also been found to be biased for teachers who taught relatively homogeneous sets of students with lower levels of prior achievement, despite the sophistication of the statistical controls used (Hermann, Walsh, Isenberg, & Resch, 2013; see also Ehlert, Koedel, Parsons, & Podgursky, 2014; Guarino et al., 2012).
Researchers repeatedly demonstrated that non-random assignment confounds value-added estimates independent of how many sophisticated controls are added to the model (Corcoran, 2010; Goldhaber, Walch, & Gabele, 2012; Guarino, Maxfield, Reckase, Thompson, & Wooldridge, 2012; Newton, Darling-Hammond, Haertel, & Thomas, 2010; Paufler & Amrein-Beardsley, 2014; Rothstein, 2009, 2010).
Even in experimental settings, it is still not possible to distinguish between the effects of school practice, which is what interests policy-makers, and the effects of school and home context. Many factors at the student, classroom, school, home, and neighborhood levels lie beyond researchers’ control and would confound causal estimates. Thus, the four experimental studies cited by Chetty, et al. (2014) do not provide ample evidence to refute the ASA on this point.
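The sorting problem can be made concrete with a toy simulation (a sketch with invented numbers, not a model from any of the studies cited): twenty equally effective teachers receive widely varying “effect” estimates when students are sorted into classrooms by an unobserved trait that also drives score gains, while random assignment collapses that spread.

```python
import random
import statistics

random.seed(0)

def classroom_gain_estimates(sorted_assignment):
    """Naive 'value-added' estimate: mean score gain per classroom."""
    estimates = []
    for teacher in range(20):          # 20 teachers, all equally effective
        gains = []
        for _ in range(30):            # 30 students per classroom
            if sorted_assignment:
                # Students sorted into classrooms by an unobserved trait
                # (e.g., home support) that also raises score gains.
                trait = teacher / 20 + random.gauss(0, 0.1)
            else:
                # Random assignment: trait unrelated to the classroom.
                trait = random.gauss(0.5, 0.3)
            true_teacher_effect = 0.0  # every teacher contributes the same
            gains.append(true_teacher_effect + trait + random.gauss(0, 0.2))
        estimates.append(statistics.mean(gains))
    return estimates

spread_sorted = statistics.pstdev(classroom_gain_estimates(True))
spread_random = statistics.pstdev(classroom_gain_estimates(False))
print(f"spread of 'teacher effects' under sorting:    {spread_sorted:.2f}")
print(f"spread of 'teacher effects' under randomness: {spread_random:.2f}")
```

Adding more statistical controls cannot repair the sorted case unless the sorting trait itself is observed, which is precisely the non-random-assignment problem described above.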
POINT 4: ISSUES WITH LARGE-SCALE STANDARDIZED TEST SCORES
In their position statement, the ASA’s authors (2014) rightfully state that the standardized test scores used in VAMs should not be the only outcomes of interest to policy makers and stakeholders. Indeed, the current consensus is that test scores might not even be among the most important outcomes capturing a student’s educated self. Moreover, if value-added estimates from standardized test scores cannot be interpreted as causal, then the effect of “high value-added” teachers on college attendance, earnings, and reduced teenage birth rates cannot be considered causal either, contrary to what is implied by Chetty, et al. (2011; see also Chetty, et al., in press I, in press II).
Ironically, Chetty, et al. (2014) cite Jackson’s (2012) study to confirm their point that high value-added teachers also improve the long-run outcomes of their students. Jackson (2012), however, actually found that teachers who are good at boosting test scores are not always the same teachers who have positive and long-lasting effects on non-cognitive skills. Moreover, value-added estimates based on test scores and those based on non-cognitive outcomes for the same teachers were then, and have since been, shown to be only weakly correlated with one another.
POINT 5: MODEL SPECIFICITY
Lastly, the ASA (2014) expressed concerns about the sensitivity of value-added estimates to model specification. Researchers have recently found that value-added estimates are highly sensitive to the tests being used, even within the same subject areas (Papay, 2011), and to the different subject areas taught by the same teachers given different student compositions (Loeb & Candelaria, 2012; Newton, et al., 2010; Rothstein, 2009, 2010). While Chetty, et al. rightfully noted that different VAMs typically yield correlations around r = 0.9, this is typical of most “garbage in, garbage out” models: they are too often used, too often without question, to process questionable input and produce questionable output (Banchero & Kesmodel, 2011; Gabriel & Lester, 2012, 2013; Harris, 2011).
What Chetty, et al. overlooked, though, are the repeatedly demonstrated weak correlations between value-added estimates and other indicators of teacher quality, on average between r = 0.3 and 0.5 (see Corcoran, 2010; Goldhaber et al., 2012; McCaffrey, Sass, Lockwood, & Mihaly, 2009; Broatch & Lohr, 2012; Mihaly, McCaffrey, Staiger, & Lockwood, 2013).
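This pattern, high agreement between model specifications alongside weak agreement with anything external, can fall out of shared inputs alone. A toy sketch (all quantities invented): two “models” that lightly transform the same noisy test-score signal correlate near r = 0.9 with each other, yet each tracks the underlying quality only weakly.

```python
import random
import statistics

random.seed(1)

def corr(x, y):
    """Pearson correlation coefficient."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (statistics.pstdev(x) * statistics.pstdev(y))

n = 500
true_quality = [random.gauss(0, 1) for _ in range(n)]
# Shared input: a test-score signal dominated by non-teacher factors.
noisy_score = [q + random.gauss(0, 2) for q in true_quality]
# Two "specifications": small independent tweaks to the same input.
model_a = [s + random.gauss(0, 0.5) for s in noisy_score]
model_b = [1.1 * s + random.gauss(0, 0.5) for s in noisy_score]

print(f"model A vs model B:      r = {corr(model_a, model_b):.2f}")      # high
print(f"model A vs true quality: r = {corr(model_a, true_quality):.2f}")  # weak
```

Agreement between two specifications, in other words, says nothing by itself about how well either one measures teacher quality; it may only reflect that both consume the same noisy input.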
CONCLUSION
In sum, these are only a few of the “points” from this “point-by-point discussion” that would strike anyone even passingly familiar with the debate over the use and abuse of VAMs. They are especially striking given the impact Chetty, et al.’s original (2011) study and now forthcoming studies (Chetty, et al., in press I, in press II) have already had on actual policy and the policy debates surrounding VAMs. Chetty, et al.’s (2014) discussion of the ASA statement, however, should give others pause as to whether Chetty, et al. are indeed experts in this field. What has certainly become evident is that they have not wrapped their minds around the extensive literature on this topic. If they had, they might not have come across as so selective, and so biased, citing only those representing certain disciplines and certain studies to support certain assumptions and “facts” upon which their criticisms of the ASA statement were based.
References
American Statistical Association. (2014). ASA Statement on using value-added models for educational assessment. Retrieved from http://www.amstat.org/policy/pdfs/ASA_VAM_Statement.pdf
Amrein-Beardsley, A. (2008). Methodological concerns about the Education Value-Added Assessment System (EVAAS). Educational Researcher, 37(2), 65–75. doi: 10.3102/0013189X08316420
Baker, E. L., Barton, P. E., Darling-Hammond, L., Haertel, E., Ladd, H. F., Linn, R. L., Ravitch, D., Rothstein, R., Shavelson, R. J., & Shepard, L. A. (2010). Problems with the use of student test scores to evaluate teachers. Washington, D.C.: Economic Policy Institute. Retrieved from http://www.epi.org/publications/entry/bp278
Banchero, S. & Kesmodel, D. (2011, September 13). Teachers are put to the test: More states tie tenure, bonuses to new formulas for measuring test scores. The Wall Street Journal. Retrieved from http://online.wsj.com/article/SB10001424053111903895904576544523666669018.html
Betebenner, D. W. (2009). Norm- and criterion-referenced student growth. Educational Measurement: Issues and Practice, 28(4), 42–51. doi:10.1111/j.1745-3992.2009.00161.x
Braun, H. I. (2008). Vicissitudes of the validators. Presentation made at the 2008 Reidy Interactive Lecture Series, Portsmouth, NH. Retrieved from http://www.cde.state.co.us/cdedocs/OPP/HenryBraunLectureReidy2008.ppt
Briggs, D. & Domingue, B. (2011, February). Due diligence and the evaluation of teachers: A review of the value-added analysis underlying the effectiveness rankings of Los Angeles Unified School District Teachers by the Los Angeles Times. Boulder, CO: National Education Policy Center. Retrieved from nepc.colorado.edu/publication/due-diligence
Broatch, J., & Lohr, S. (2012). Multidimensional assessment of value added by teachers to real-world outcomes. Journal of Educational and Behavioral Statistics, 37(2), 256–277.
Chetty, R., Friedman, J. N., & Rockoff, J. E. (2011). The long-term impacts of teachers: Teacher value-added and student outcomes in adulthood. Cambridge, MA: National Bureau of Economic Research (NBER), Working Paper No. 17699. Retrieved from http://www.nber.org/papers/w17699
Chetty, R., Friedman, J. N., & Rockoff, J. (2014). Discussion of the American Statistical Association’s Statement (2014) on using value-added models for educational assessment. Retrieved from http://obs.rc.fas.harvard.edu/chetty/ASA_discussion.pdf
Chetty, R., Friedman, J. N., & Rockoff, J. E. (in press I). Measuring the impact of teachers I: Teacher value-added and student outcomes in adulthood. American Economic Review.
Chetty, R., Friedman, J. N., & Rockoff, J. E. (in press II). Measuring the impact of teachers II: Evaluating bias in teacher value-added estimates. American Economic Review.
Corcoran, S. (2010). Can teachers be evaluated by their students’ test scores? Should they be? The use of value added measures of teacher effectiveness in policy and practice. Educational Policy for Action Series. Retrieved from: http://files.eric.ed.gov/fulltext/ED522163.pdf
Ehlert, M., Koedel, C., Parsons, E., & Podgursky, M. J. (2014). The sensitivity of value-added estimates to specification adjustments: Evidence from school- and teacher-level models in Missouri. Statistics and Public Policy. 1(1), 19–27.
Gabriel, R., & Lester, J. (2012). Constructions of value-added measurement and teacher effectiveness in the Los Angeles Times: A discourse analysis of the talk surrounding measures of teacher effectiveness. Paper presented at the Annual Conference of the American Educational Research Association (AERA), Vancouver, Canada.
Gabriel, R. & Lester, J. N. (2013). Sentinels guarding the grail: Value-added measurement and the quest for education reform. Education Policy Analysis Archives, 21(9), 1–30. Retrieved from http://epaa.asu.edu/ojs/article/view/1165
Goldhaber, D., & Hansen, M. (2013). Is it just a bad class? Assessing the long-term stability of estimated teacher performance. Economica, 80, 589–612.
Goldhaber, D., Walch, J., & Gabele, B. (2012). Does the model matter? Exploring the relationships between different student achievement-based teacher assessments. Statistics and Public Policy, 1(1), 28–39.
Guarino, C. M., Maxfield, M., Reckase, M. D., Thompson, P., & Wooldridge, J.M. (2012, March 1). An evaluation of Empirical Bayes’ estimation of value-added teacher performance measures. East Lansing, MI: Education Policy Center at Michigan State University. Retrieved from http://www.aefpweb.org/sites/default/files/webform/empirical_bayes_20120301_AEFP.pdf
Harris, D. N. (2011). Value-added measures in education: What every educator needs to know. Cambridge, MA: Harvard Education Press.
Hermann, M., Walsh, E., Isenberg, E., & Resch, A. (2013). Shrinkage of value-added estimates and characteristics of students with hard-to-predict achievement levels. Princeton, NJ: Mathematica Policy Research. Retrieved from http://www.mathematica-mpr.com/publications/PDFs/education/value-added_shrinkage_wp.pdf
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945–960.
Jackson, K. C. (2012). Non-cognitive ability, test scores, and teacher quality: Evidence from 9th grade teachers in North Carolina. Cambridge, MA: National Bureau of Economic Research (NBER), Working Paper No. 18624. Retrieved from http://www.nber.org/papers/w18624
Kane, T., McCaffrey, D., Miller, T. & Staiger, D. (2013). Have we identified effective teachers? Validating measures of effective teaching using random assignment. Bill and Melinda Gates Foundation. Retrieved from http://www.metproject.org/downloads/MET_Validating_Using_Random_Assignment_Research_Paper.pdf
Loeb, S., & Candelaria, C. (2013). How stable are value-added estimates across years, subjects and student groups? Carnegie Knowledge Network. Retrieved from http://carnegieknowledgenetwork.org/briefs/value‐added/value‐added‐stability
McCaffrey, D. F., Sass, T. R., Lockwood, J. R., & Mihaly, K. (2009). The intertemporal variability of teacher effect estimates. Education Finance and Policy, 4, 572–606.
Mihaly, K., McCaffrey, D., Staiger, D. O., & Lockwood, J. R. (2013). A composite estimator of effective teaching. Seattle, WA: Bill and Melinda Gates Foundation. Retrieved from http://www.metproject.org/downloads/MET_Composite_Estimator_of_Effective_Teaching_Research_Paper.pdf
Newton, X. A., Darling-Hammond, L., Haertel, E., & Thomas, E. (2010). Value added modeling of teacher effectiveness: An exploration of stability across models and contexts. Education Policy Analysis Archives, 18(23). Retrieved from epaa.asu.edu/ojs/article/view/810
Papay, J. P. (2011). Different tests, different answers: The stability of teacher value-added estimates across outcome measures. American Educational Research Journal, 48(1), 163–193.
Paufler, N. A., & Amrein-Beardsley, A. (2014). The random assignment of students into elementary classrooms: Implications for value-added analyses and interpretations. American Educational Research Journal.
Reardon, S. F., & Raudenbush, S. W. (2009). Assumptions of value-added models for estimating school effects. Education Finance and Policy, 4(4), 492–519. doi:10.1162/edfp.2009.4.4.492
Rosenbaum, P., & Rubin, D. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55.
Rothstein, J. (2009). Student sorting and bias in value-added estimation: Selection on observables and unobservables. Education Finance and Policy, (4)4, 537–571. doi:http://dx.doi.org/10.1162/edfp.2009.4.4.537
Rothstein, J. (2010). Teacher quality in educational production: Tracking, decay, and student achievement. Quarterly Journal of Economics, 125(1), 175–214. doi:10.1162/qjec.2010.125.1.175
Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. The Annals of Statistics, 6, 34–58.
Rubin, D. B., Stuart, E. A., & Zanutto, E. L. (2004). A potential outcomes view of value-added assessment in education. Journal of Educational and Behavioral Statistics, 29(1), 103–116.
Scherrer, J. (2011). Measuring teaching using value-added modeling: The imperfect panacea. NASSP Bulletin, 95(2), 122–140. doi:10.1177/0192636511410052
Wainer, H. (2004). Introduction to a special issue of the Journal of Educational and Behavioral Statistics on value-added assessment. Journal of Educational and Behavioral Statistics, 29(1), 1–3. doi:10.3102/10769986029001001
Cite This Article as: Teachers College Record, Date Published: August 01, 2014
http://www.tcrecord.org ID Number: 17633, Date Accessed: 8/10/2014 8:23:06 AM
Are VAM supporters lying, or are they guilty of bad, biased science? It appears that the battles over what counts as better for education in the United States will be decided not by the relative strength of evidentiary arguments, but by who most successfully claims the moral high ground. Public acceptance of policy prescriptions turns not on technical determinations, but on values identification and moral judgments. As I argued here (http://www.arthurcamins.com/wp-content/uploads/2014/06/Resistance-to-Attacks-on-Public-Education-is-Not-Enough.pdf), we need both to expose evidentiary lapses and failures and to mount a values-based campaign for equitable democratic education.
Sorry wrong link: http://www.washingtonpost.com/blogs/answer-sheet/wp/2014/08/04/its-innovative-but-is-it-really-better/
“It is a grand theory that might net its authors a Nobel Prize for its grandiosity”
And that just about says it all…
Under what conditions can valid causal inferences be drawn from data? Are there any?
Define “valid”.
By valid, I mean “the kind of causal inference that won’t get you accused of confusing correlation for causation.”
LOL! I’d love to hear the PhDs in stats weigh in here, as the ASA already has. If we are looking for absolutes in observational studies, even the sun may not rise tomorrow; no P(event) = 1.0 in our world. Even a pure double-blind experiment could, and should, be questioned, peer reviewed, or “accused” as you suggest. And researchers can’t set conditions to draw out causation; causation must already exist.
Too much of this reformy-wormy wisdom seems somewhat self-referential. The Reformers do studies using test scores to judge teaching, but then insist we only improve teaching by raising test scores. Something is wrong with that.
Chetty, though, is such an obvious misapplication of the math that it is laughable. To those of us that LIVE these silly VAM models, it becomes pretty obvious there is a major disconnect between Chetty’s conclusions and reality. With even a college level working knowledge of stats and logic, Chetty’s basis for his data is suspect, his models terribly incomplete, and his attempt at drawing causation is a stretch. I mean, c’mon. I always thought teen pregnancy was caused by something else, not bad teachers and low test scores. But, hey, what do I know. I didn’t go to Harvard.
Causality can be inferred when test subjects are randomly assigned to their groups. Results can be extended from the sample to the population when members are drawn from the population randomly. Those two conditions are often very difficult to achieve ethically when dealing with human subjects.
Assume, for argument’s sake, sampling isn’t used, or, rather, 100% of the population is represented in the sample. Under those conditions, when (if ever) is it possible to make valid causal inferences based on the data I have about that population?
And the effects of hidden or confounding variables also must be accounted for. Hence, double-blind controlled randomized trials when dealing with human subjects and human observers. I have no idea how you would accomplish that in education, let alone ethically. You’d have to have identifiable “good teachers” and “bad teachers”, be willing to randomly assign children to the known bad teachers, and neither the children nor the observers nor even the teachers themselves could know which were the good and which were the bad.
Flerp!,
This is the typical situation for economists. Studies linking poverty to poor school performance are not the result of randomly assigning some students to be poor and others to be wealthy.
I suppose there is no evidence of that link either.
Random assignment is one, but not the only, requirement for establishing experimental control in human studies. What is critical to understand about VAM studies is that they are not designed to identify causal effects of one variable on another.
VAM studies are correlational analyses that attempt to see whether two variables change in concert. A strong correlation simply means there is a relationship; other types of analyses are needed to decide why that might happen. Cause cannot be inferred. For example, look at these strongly correlated data from variables that are totally unrelated:
http://twentytwowords.com/funny-graphs-show-correlation-between-completely-unrelated-stats-9-pictures/
Causal outcomes are tested when one or more independent variables are manipulated under controlled conditions and produce changes in the dependent variable. An experimental variable is given to one group but not the other.
In both correlational and causal experiments, control for error is critical. According to this analysis, Chetty’s study violated several controls for error, which means the results are invalid.
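The linked graphs make the spurious-correlation point visually; a few lines of code make it numerically (the series names and numbers below are invented for illustration): any two series that merely trend over time will correlate strongly, with no causal connection between them.

```python
import random
import statistics

random.seed(2)

# Two invented, causally unrelated series that both happen to trend upward.
years = range(30)
cheese_consumption = [10 + 0.5 * t + random.gauss(0, 1) for t in years]
engineering_phds = [200 + 8.0 * t + random.gauss(0, 15) for t in years]

# Pearson correlation between the two series.
mx = statistics.mean(cheese_consumption)
my = statistics.mean(engineering_phds)
cov = sum((a - mx) * (b - my)
          for a, b in zip(cheese_consumption, engineering_phds)) / 30
r = cov / (statistics.pstdev(cheese_consumption) *
           statistics.pstdev(engineering_phds))
print(f"r = {r:.2f}")  # strong correlation, no causal link
```

The shared time trend does all the work here; nothing about cheese causes PhDs or vice versa, which is exactly why a strong correlation alone licenses no causal claim.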
It would be embarrassing if Chetty won a Nobel for such low quality work.
Jcgrim,
It is usually a body of work that wins a Nobel prize. I believe his MacArthur “genius” award was based on his work on tax incidence, though I suppose his work showing the importance of small class size and teacher experience in kindergarten for lifetime outcomes played a role.
Sometimes we find natural experiments in economics, most of the time we don’t. That is why we have developed specialized statistical techniques to deal with it.
TE, yes. Throw out these useless metrics and let teachers teach. Next thing you know, esteemed Harvard economists will insist correlations and observational studies show 90%+ debt-to-GDP ratios are a serious hindrance to economic growth, therefore we must cut schools to the bone. But, of course, that will never happen and no one is silly enough to fall for that, let alone base fiscal policy on such flawed models lacking peer review. Are they?
We don’t take the pilot out of the cockpit regardless what the instruments say. Metrics can help as diagnostic tools well trained teachers can analyze and review. But stats are not the ends.
MathVale,
What should we do with useful metrics?
Diagnostics. I question any metric accepted blindly as “useful”. I know my students not as functional representations of some behavioral theory with varying parameters, but as complex, unpredictable humans.
Stats are fine, but not an end goal in themselves. Give me stats, give me predictive models, give me analysis, but don’t take away my ability to use them as I see fit based on my own judgement.
We are far from that, instead limiting and punishing teachers with “useful metrics”. Useful to whom?
But will the people who need to read this do so? Will they understand its message, implications, and importance, or will they cling to their unfounded beliefs? One does not have to look far to find examples where zealots have promoted ideas unsupported by scientific findings. At least this commentary gives those who warn that VAM is junk science, really not science at all, support to continue making their case about how VAMs are being misused.
It is odd to me that the current deform movement touts “data driven” on a rather selective basis, only when the data support what they are promoting.
Please see this link as a follow-up to this and many amazing pieces on Diane’s blog.
https://theconversation.com/five-trends-that-jeopardise-public-education-around-the-world-28969
cross posted at http://www.opednews.com/Quicklink/The-Holes-in-the-Chetty-et-in-Best_Web_OpEds-Education_Important_Influence_Policy-140811-744.html#comment506049
with this comment at the end:
“I hope by now, IF you are interested in the truth about education, OR how the conspiracy to end democracy and the road to opportunity begins with the purposeful destruction of our INSTITUTION OF PUBLIC EDUCATION, that you go to the Ravitch Blog to keep abreast of the conversation among the real educators and to read about the shenanigans of the oligarchs as they continue their assault on THE CLASSROOM TEACHER, who is the enabler and facilitator of learning… not some provider of a service, but a full professional who needs to be autonomous when meeting genuine OBJECTIVES for her students.”
How is that for a run-on sentence from an “English Teacher?”
I don’t get it… Chetty’s findings were publicized and promoted PRIOR to peer review… Why? I’ve always admired researchers who are open about the limitations and delimitations of their studies. Thanks for sharing this.
They were publicized because the “study” said what those with the power and money wanted it to say. That’s all.
NBER generally releases working papers to the academic community. Working papers have often been praised here in the past, if the results conform to the dominant narrative. See this for example: https://dianeravitch.net/2013/07/19/the-negative-effects-of-holding-kids-back-in-third-grade/
Nice observation and example, TE. However, the New York Times and PBS go beyond what I’d consider the academic community. Was it ever disclosed to those media outlets that it was a working paper and had yet to be peer reviewed?
Joe,
The Times also reported on his study about intergenerational income mobility. I have no doubt that the reporters of the NYT are competent enough to actually look at the papers they are reporting on.
I don’t know, TE. In my experience, journalists can be selective and get things wrong (even at the NY Times). Even if journalists are competent enough to look at the papers, that doesn’t mean they do.
Joe,
It is a simple matter to actually look at the New York Times article, the link provided by Dr. Ravitch. Here is the relevant section of the NYT article:
“The study, which the economics professors have presented to colleagues in more than a dozen seminars over the past year and plan to submit to a journal,…”
Sounds like they know a working paper when they report on one.
Indeed. So why is the research being reported on in the popular media prior to being submitted to a journal and being subject to peer review? I don’t get it.
Joe,
One of the concerns is probably publication delay. These papers will likely not be published in AER until late 2015 or 2016.
Well TE, the 2012 NY Times article says that they planned to submit it to journals. Maybe not yet having submitted it could be one reason for the publication delay (beyond the typical delay after an article is accepted). I’m just not thrilled with the idea of publicizing findings in the popular media prior to peer review and publication. Btw, after reading a lot of your comments on Chetty et al. on this blog, I’m starting to wonder if you’re Chetty himself (or perhaps one of the et al.). 🙂
Joe,
I am not Chetty, nor one of his coauthors. We have never met.
You may be unfamiliar with the academic publishing cycle. My guess is that the first working paper was submitted to the AER (usually regarded as the most prestigious economics journal and the most difficult to publish in) soon after appearing as an NBER working paper. The AER editor in charge of the paper would next send it out to three referees to read, write up any concerns with the article, and make a recommendation on publishing it in the journal. This could easily take at least six months, especially with a long and complicated paper. Apparently the referees recommended that the authors “revise and resubmit” the article; most articles that are eventually published get this type of referee report on initial submission. In this case it seems the referees had two major suggestions: re-estimate the model to allow for drift in teacher quality over time, and split the single paper into two separate papers, one looking at the stability of VAM measures of teacher quality (where the endogeneity of student assignment to teachers, a concern voiced by posters here, is considered) and a second linking teacher quality to life outcomes like having children while a teenager, probability of attending college, and earnings. After the rewrite, the papers were resubmitted and the reviewers read the new version (which should take less time than the original submission). At that point the papers were accepted for publication and entered the queue. Because the pair of papers will use up a large number of journal pages, they will likely be placed at the back of the queue.
That publishing process sounds about right TE. I actually do blind reviews for an education research journal myself. We make revision suggestions as well, unless the article is rejected to begin with. Sounds like AER has a high rejection rate.
I think organizations and individuals that support typical uses of VAM just haven’t thought deeply about why their proposed uses have such negative consequences. I recently began a debate with StudentsFirst VP Eric Lerum about why his organization’s approach to teacher evaluation is counterproductive (in addition to being unfair). I think our conversation can begin to shed some light on where the misunderstandings come from (and can hopefully clarify, for reformers, why they need to change their approach).
Here was part 1 of the conversation: http://34justice.com/2014/08/04/studentsfirst-vice-president-eric-lerum-and-i-debate-accountability-measures-part-1/
Here was part 2 (which also touched on the problems with the way reformers talk about poverty): http://34justice.com/2014/08/11/eric-lerum-and-i-debate-teacher-evaluation-and-the-role-of-anti-poverty-work-part-2/
The proposed uses are a different issue from the validity of the research. The orthodox opinion here is that the Chetty study presents no evidence that high quality teaching results in long term positive consequences for students.
Not high quality teaching, high standardized test scores.
Akademos,
Changes in test scores, especially when they are unconnected to teacher evaluation as they were in the Chetty study time period, seem likely to be correlated with good teaching.
I agree that those are different issues, though they’re often intimately tied to one another. I think the key point about VAM is not that it’s a bad idea – VAM data is worthwhile to study and I have no problem with people researching it further – but that nothing in the research supports the conclusions that reformers draw from it (even the conclusions drawn by the researchers themselves often represent huge logical leaps from the results they obtained).
I also think it’s important to note that nearly everyone who reads Diane’s blog (hopefully) thinks that teachers can make a long-term impact on their students. We just contest the way reformers talk about it, and the way they misconstrue results like Chetty’s, because they wildly overstate teacher impact and then undermine more important reforms by aligning themselves with politicians and foundations that actively work against the poor.
I’d just highlight that it is accurate and intellectually consistent to both acknowledge that teachers make an impact while maintaining that most reform efforts are misguided.
Ben,
The distinction between the validity of the research and the proposed uses of VAM is never made here, so I am not sure it is obvious to everyone.
Believing that strong teachers have lasting impacts on students is one thing; having evidence that can convince others is something else.
You make two good points. I don’t think they contradict what I wrote, but I agree that both those things are important to keep in mind.
Ben,
I don’t think they contradict what you posted.
Your statement that “most reform efforts are misguided” leads to the obvious question: which reform efforts do you feel are not misguided?
Some reformers want to develop new teacher leadership pathways, improve teacher professional development and support and administrator training, and incentivize teachers to work in low-income communities. Many also want to increase opportunities for teachers and administrators to collaborate, both within and across schools. This is not meant to be an exhaustive list, and the devil is often in the implementation details, but I think these are all good ideas.
Ben,
That seems like a good list. I also see value in moving away from geographically defined school admission policies in order to allow schools to differentiate themselves from one another so they can provide students with a variety of approaches to education. The relatively rich in my town, for example, can choose a Montessori education or a progressive education or a Waldorf education for their children, but the relatively poor, who cannot afford the tuition of the private schools, are not given those options.
Also a good point, though I think that unless all schools are of comparable quality and the “choice” is about style/philosophy instead of quality, the concept of school choice itself should deeply concern those people interested in educational equity.
Ben,
I think it might be difficult to draw a sharp distinction between the quality of a school and the style/philosophy of the school without thinking about how students come to attend the school. A high quality school for one kind of student may be a low quality school for another kind of student depending on the approach taken. My middle son, for example, learned best when the faculty got out of the way and let him proceed on his own. That would have been a very poor educational environment for my foster son.
That makes sense. To clarify, I meant “comparable quality” to mean that each school would serve its population’s needs comparably well.
Ben,
I certainly like the goal, but I think it will be difficult to achieve if you have a central auditor trying to regulate the schools into serving their individual populations equally well. Ensuring uniformity across the schools is so much easier that I worry any centralized auditing system will devolve to that standard.
I have more hope that allowing parents to choose between schools (where student density is high enough to allow multiple similar schools to exist) could serve as a check on school quality over time without imposing uniformity across schools.
I don’t disagree that it would be difficult to adhere to the ideal I mention; however, I think education is a right. No matter how hard it is to implement, I think it is our responsibility to strive for the condition in which everyone has equivalent access to that right.
I thought Chetty’s data said that earnings increased for 28 year olds, but not 30 year olds. The influence of a good teacher wears off after a decade.
AKA, the Chetty grossly effective teacher shelf life study.
Actually not. The loss of data meant that there was more than a 5% chance that the observed increase in earnings for 30 year olds (there was an observed increase, it just wasn’t statistically significant) might have occurred by chance.
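TE’s point about statistical significance and lost data can be sketched with a quick simulation. All of the figures below are made up for illustration (a $1,000 true bump on a $40,000 mean with a $15,000 standard deviation); none of them come from Chetty et al. The point is only that the same true effect can clear the conventional 1.96 cutoff with a large sample and miss it with a small one.

```python
import math
import random
from statistics import mean, variance

random.seed(42)

def z_for_mean_difference(treated, control):
    """Large-sample two-sample z statistic for a difference in means."""
    diff = mean(treated) - mean(control)
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return diff / se

# Hypothetical earnings distribution: mean mu, $15,000 standard deviation.
def sample_earnings(n, mu):
    return [random.gauss(mu, 15_000) for _ in range(n)]

# Plenty of data: a true $1,000 effect easily clears the 1.96 cutoff.
z_big = z_for_mean_difference(sample_earnings(20_000, 41_000),
                              sample_earnings(20_000, 40_000))

# Same true effect, far fewer observations: it may well fail to clear 1.96.
z_small = z_for_mean_difference(sample_earnings(300, 41_000),
                                sample_earnings(300, 40_000))

print(round(z_big, 1), round(z_small, 1))
```

With twenty thousand observations per group the standard error is about $150, so a $1,000 difference is unmistakable; with three hundred per group the standard error balloons to over $1,200, and the very same effect can look like noise.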
Fair enough. What was the nugget of the Chetty study? That a “good teacher” (whatever the technical definition for that is) produces 50K in increased earnings per working life, per student, compared to an average teacher?
TC,
The nugget of the study is that good teaching (roughly measured by increases in student scores while in a teacher’s class) results in good outcomes decades later (roughly measured by lower teenage pregnancy rates, higher income, more stable relationships, etc.). I think it is remarkable that we can hear the echo of good teaching in the cacophony of things that impact a student’s life.
TE, as has been said many times on this blog, “good teaching” defined by test scores is the issue, not that “good teaching” is an undesirable activity. The reformy logic goes like this: good teaching is measured by test scores, so to improve “good teaching” we must improve test scores. A self-referential, self-limiting scenario. That is not a good thing.
I disagree teachers impact only 5-20% of student learning. I suspect there are necessary, but not sufficient, conditions that are first met, then teacher impact is much higher IN THE CLASSROOM.
MathVale,
Good teaching is not defined by changes in test scores in this study; changes in test scores are the tracks that you see when a strong teacher has passed through a student’s life.
How do you then identify a “strong teacher”? By correlating those “tracks” with lifetime metrics? Very shaky ground.
Yup, that is the way you do it. What impact do teachers have on the lives of their students? Is there any other meaningful measure?
So, I guess Danielson wasted her time with all of that organizational mania.
But seriously folks, there are so many vicissitudes in life and specifically countercurrents as people rise in levels of awareness, echelons of so-called excellence or strata and responsibility, etc. One’s conscience and consciousness can get in the way of all sorts of gains, in some fields you can rise indefinitely without a pay raise, unless you step out and go another avenue, an avenue that simply may not suit you in any way. And so many things can be deep learning experiences that resonate throughout your life strongly but that do not necessarily lead to material gain or even happiness in the common sense, though you may be grateful for the unending insight and have no regrets.
Life, lives aren’t simple things. There is so much enrichment that you will probably never see on any sort of a metric, let alone a paycheck. And not all positive enrichment and profound resonances lead to better or even good things.
Amazing. Stats 101 – do not confuse correlation with causation, careful of confounding variables, know your models and data.
Right?! Using this theory, I could just as easily say that low test scores “cause” bad teachers. After all, when a school of children continues to score poorly, regardless of who’s in front of them, there is usually a string of “bad teachers” fired who at one point had been good teachers when they’d worked in more affluent districts. There’s a reason that many teachers “burn out” in high poverty districts… and it’s often NOT the students… it’s the constant barrage of teacher shaming when kids don’t score well. I’d venture to say more lousy test scores cause “ineffective teachers” than the other way around!!!!
Or you could say that poor school performance causes poverty.
Or lousy snacks in the cafeteria caused Iggy Azalea. Ditch metrics but for diagnostics. Let teachers teach.
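The “Stats 101” point above is easy to demonstrate with a toy simulation. In this deliberately artificial world (all numbers invented, nothing estimated from real data), family background drives both childhood test scores and adult earnings, and teachers contribute nothing at all. Test scores still come out strongly correlated with earnings.

```python
import math
import random
from statistics import mean

random.seed(0)

# Toy world, purely hypothetical numbers: one confounder ("background")
# drives both test scores and earnings. The teacher effect is exactly zero.
n = 5_000
background = [random.gauss(0, 1) for _ in range(n)]
test_scores = [b + random.gauss(0, 1) for b in background]            # no teacher term
earnings = [40_000 + 5_000 * b + random.gauss(0, 5_000) for b in background]

def correlation(xs, ys):
    """Pearson correlation, written out to stay dependency-free."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs)
                           * sum((y - my) ** 2 for y in ys))

r = correlation(test_scores, earnings)
print(round(r, 2))  # strongly positive (about 0.5 here), though no teacher did anything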
MathVale: even I, a charter* member of the innumerati, get what you’re saying.
*For those enamored of CCSS ‘closed-minded’ reading a la Chetty & Co. research, I am indulging in an archaic use of language called “word play.”*
Howzabout we include the role that definitions—and the way they influence what and how we measure something—into the mix? And with definitions come assumptions, assumptions that —Gloryosky!—can reveal what we value and what we do not value.
And to top it off, with such a self-limited “literature” at their disposal (aka echo chamber), for the accountabully underlings of the charterite/privatizer movement there’s no need to trouble themselves with doubt and self-correction and apologies for past mistakes and such tomfoolery. That’s for the little people, not the “Michael Jordans” like themselves.
They constitute the priesthood of the High Holy Church of Testolatry. They are complete and whole unto themselves. They may argue about the number of data points that can dance on the point of a pin, but genuine learning and teaching—don’t confuse them with facts and logic and reasoning! You’re not playing fair!
Peer review—what’s that?
“A man is his own easiest dupe, for what he wishes to be true he generally believes to be true.” [Demosthenes]
😎
Meanwhile, companies aren’t giving great raises, or raises at all, and are cutting staff and benefits, and salaries are decreasing, except for the higher ups, CEOs, etc. Staff will be held monetarily “under thumb.” THAT is the result of having poor teachers? I think not.
Did they locate these people who are making more/making less and survey whether or not they had access to “good” teachers or “bad” teachers? How can any of this nonsense be validated?
Many current college graduates are working at bartenders, receptionists, clerks, because they can’t find jobs in their fields. Are teachers responsible, too, for the bad economy according to Chetty, et al?
Many in reform positions (principals, supers, administrators) could be called $uccessful, based on the monies they have thieved or been granted in awards of excellence from Broad, et al., or stolen from hedge-fund type projects, or the real estate bubble, etc. Did they have super teachers who taught them how to be thieving dirtbags?
Cause/effect. I’d like to know how they came to their theories, and why anyone would believe them, other than as a tool to dupe us into getting rid of teachers. Period. Then, the TFA clerks can take over, as well as computer screen teaching, via Rocketship.
Meanwhile, again, I post that Michelle Rhee is now going by Michelle Johnson, and has a new gig on the Board of Scott (Miracle Gro). I guess she will do her reformy magic there as well. http://www.bizjournals.com/columbus/news/2014/08/11/former-d-c-schools-chief-joins-scotts-board.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+bizj_columbus+(Business+First+of+Columbus)
Gee, how I can’t stand her.
Donna – Don’t you find it ironic that she has moved from spreading her form of educational fertilizer around the public school farm and now is engaged in the same activity with Scott and Miracle Gro? Given her experience in DC, she should do well with Scott!
Did the American Statistical Association ever hold up the “virtual twin” methodology CREDO uses in its studies to this degree of scrutiny?
“The whole set of peer-reviewed articles that counter Chetty, et al.’s arguments and ideas are completely left out of their discussion.”
Of course. Reputations are at stake.
Cherry picked sources are rampant in other studies as well. It is more than sad that the American Statistical Association did not nip the VAM scam in the bud. It has been hyped and marketed and functions as the centerpiece of the awful econometric turn in education. This turn honors circular reasoning, bizarre assumptions, and inferential leaps into fantasy worlds where teachers and kids are portrayed as if they were responsible for the fate of the nation’s economy. It this reasonable? Doesn’t Congress have a role in shaping the economy, and Wall Street, and international events,…and reports from economists and statisticians whose forecasts are wrong or just lies?
Of course all of the hoopla about VAM ignores the fact that about 70% of teachers have job assignments for which there are not statewide tests. In this respect, the federal architecture for policy in education, and most of the flawed work of Chetty et.al. has been constructed with limited knowledge of, or concern for, anything except easy-to-score statewide tests in reading, math, and science. Such tests typically sample knowledge and skill with 50-60 fill in the bubble answers (and 10% of those items not really counted but included by publishers who want field tests).
The arrogance of statisticians in allowing this truncated view of education to flourish and be honored in federal policies is matched by the willingness of too many policy makers to be seduced into the belief that numbers are “objective” and authoritative–as in “numbers don’t lie.” But they do, not only in what gets counted, but what is not counted. The slim little book called “How to Lie with Statistics,” is a classic for good reason.
The accountability movement in which VAM has played a major role has been permitted to flourish by a combination of silence among qualified critics and a batch of full monty marketing campaigns on behalf of this or that study. In fact, the federally funded gatekeeper on “sound empirical research” musters people to review studies based on the “media attention” the study has attracted even it the study has not been peer reviewed. That is a screwy criterion, but it illustrates how the machinery of PR is embedded in and can influence policies.
All of those dubious data points that have led to the reification of VAM have seriously misrepresented what students have been are learning and should learn beyond participating in standardized tests.
Now of course, many states will have scores from the forthcoming tests of the CCSS. Kindergarten students, class of 2015, are supposed to be ready for college and entry into a career in 2028.
I wonder how statisticians and economists will retrofit all of the test data from NCLB tests to accommodate the new tests of the CCSS and predict the earning potential of each cohort of students who are at different thresholds of being taught to reach those 1.620 CCSS standards–13 cohorts of students in all.
When was the last time economists and statisticians were charged with gross negligence and misrepresentation of anything? Or federal officials and billionaires who who have reified VAM?
The obvious conclusion to the Chetty study, is that the greatest economic stimulus to the U.S. would be to double all teachers salaries, thus attracting more grossly effective teachers and leveraging the incredible gains in future earnings of students.
Probably not.
Agree to disagree.
Probably not. Triple them,
Hello Diane and everyone, an additional Chetty absurdity occurred to me late last night and I’d like to share it with you. Chetty’s allegations on the increase in earnings due to having a “better” teacher or teachers is pure aspirational fantasy. I really do mean that literally as these phantom increases in pay take place in an idealized and over simplified employment environment that simply does not exist. The numbers he cites are an artifact of big data, not a description of reality. He claimed an annual increase in SALARY of $1,000 would result from having a great vs a good kindergarten teacher. If we assume a 50 week work year, that’s $80 per month. The mechanism by which this occurs is not clear. Does the salaried worker negotiate a better deal with his employer as a result of having had a better teacher? So we are to believe that the kindergarten teachers effects reach all the way to an employer she never taught? How does a better teacher improve an employees odds in an arms length salary negotiation that is based on unpredictable market forces and conditions occurring over a decade out? Obviously Chetty didn’t factor in the steep rise in the cost of college that will more than consume the pathetic earnings increase as student loans are repaid. You may have noticed that I capitalized my first use of “salary” above. It should be acutely obvious that there is no benefit to hourly workers since there’s no way that having had a better teacher ( better by what measure now Mr. Chetty?) will have any impact on raising the minimum wage, a current political goal of the working class being strongly resisted by the business community. Sorry all you STEM and other students that didn’t find work in your field and had to settle for a minimum wage job to survive, no increased earnings for you. Just for fun, lets apply this to lawyers. 
Since they are salaried if employed by a firm, one would assume that it’s their productivity that would lead to a pay raise which, for their employer means billable hours. Yes, you can see it coming! By some magical method known only to Mr. Chetty, that great kindergarten teacher has attracted more clients to the law firm where the lawyer works. We could also talk about the stagnation of wages and the massive upward redistribution of wealth that we’ve been living under the duress of in todays economy, but why kill a fly with a sledge hammer? Because this absurd idea needs to die!
“VAM is Junk Science.” (Diane) Enough said.
Five Trends that Jeopardize Public Education Around the World
https://theconversation.com/five-trends-that-jeopardise-public-education-around-the-world-28969
I just think there’s a POLITICAL bias toward this study and toward this whole theory. The laser-like focus on teacher quality is obviously attractive to a whole group of people including politicians, parents and people who are opposed to paying taxes for public schools because it lets all of us off the hook EXCEPT teachers.
No wonder everyone loves it. It’s the best theory ever, unless you’re a teacher.
As long as we’re all scolding teachers we don’t have to take any responsibility for our schools.
I don’t even blame the author. It’s not his fault that every politician and policymaker seized on this. It was the most politically saleable theory at the right time.
It reminds me of when they all jumped onboard the austerity train and were flogging that study that was later discredited. They wanted to put in austerity policies and that study told them what they wanted to hear.
Quite. Especially an elite status in academic clique can dazzle a lot of politicians and reformers.
Chetty et al was (were?) used to support the Vergara decision, right? How does this timely criticism of the validity of Chetty et al affect a possible challenge to Vergara?
I have not read the original article. Is there a link to it somewhere?
Anyway, it *seems* that the authors identified good “high-value added” teachers by them having students who scored well on a test. They then determine that students of those teachers do better in terms of earning later in life. That is, students who did well in at least one aspect of school did better, generally, in one aspect of their adult life. Gee, it’s almost like you wouldn’t need to even look at the teacher to guess that conclusion. And, gosh, there certainly wouldn’t be any lurking variables beyond the teacher linking those to items.
Google and you will find the study. Here is the NY Times’ story, which links to it: http://www.nytimes.com/2012/01/06/education/big-study-links-good-teachers-to-lasting-gain.html?pagewanted=all
We Must,
Just use the search terms “Chetty teachers” and look for the Harvard link.
We must,
You might want to read the articles before deciding that they did not control for other things that would influence the outcomes.
I just read the results. I was correct. They identify high-value teachers based on the performances of the students, track those students, and then claim that the later performances of those students was due to the teachers they were associated with. They really have only shown that higher performing students end up as higher performing adults. From what they have published so far, they have *no* reason to claim causality.
I think very few people would reject the notion that people who are successful in their student years tend to be successful in their adult years. The quarrel is whether that can be causally traced to particular teachers, even in part and thus if using a rather indirect measure of student performance should then be used as a stand in for teacher quality.
We Must,
Most of the folks here think there is a causal relationship between high quality teaching and long term benefits for students. This study provided some empirical evidence to support this. Through out the evidence, and what you are left with is truthiness.
Never mind. After reading Adler’s criticism, I have to agree that Chetty et al. haven’t actually shown anything, even that students with higher test scores earned more later.
Two limitations of the Chetty study are that they do not control for a complex set of socioeconomic variables and also use income at age 28 as the outcome. For the convenience of the study, I understand why they would do this for practical reasons.
However, for a study of an older generation there is reason to believe that there are important variables that Chetty et al did not control for and that income at age 28 is a premature point to draw conclusions about lifetime earnings.
For those who are interested in lifetime attainment and the factors that contribute to it, you may find the Wisconsin Longitudinal Study of interest. The WLS is a long-term study drawing on a data set that began with a statewide survey of 1957 public high school graduates in Wisconsin and continues to this day. The research literature based on the WLS is vast. Here is the link for just the education studies. http://www.ssc.wisc.edu/wlsresearch/publications/pubs.php?topic=education
The WLS is a significant research project, although much has changed in 50+ years so the question of generalizability is there.
Stiles,
Actually the Chetty study does control for a rich set of student characteristics including prior exam scores by the students and parental characteristics.
Chetty uses income at 28 as the measure of earnings primarily because of data limitations. If we wait another 20 years, we can look at earnings at 50 if you like. There are, of course, other outcomes that the paper looks at. A female student who has a strong teacher is less likely to become pregnant as a teen. Students who have had a strong teacher in 3-8 grade are more likely to go to college, more likely to live in a community where many have gone to college, and more likely to participate in a retirement savings program.
TE, unless I’m missing something I disagree that Chetty et al controlled for a rich set of parental characteristics, although they controlled for some. I believe they controlled for adjusted gross income, home ownership, 401K, marital status, and mother’s age at childbirth. However, they did not control for mother’s education, father’s education, and occupational status. When controlling for family background they controlled more for economic factors that social factors, yet we know that parental education and occupational status can have equal effects to income in contributing to educational attainment. While parental income does tend to have a greater effect on the child’s income, parental education and occupational status would have an influence on the student achievement gains that is unaccounted for in the model.
Regarding the limitations of age 28, your suggest that there is a trade-off in waiting another twenty years. That’s true, but it is still the case that at age 28 individuals in the study are early in their careers and that there is probably a negative correlation between post-secondary education and job experience.
I’m not disparaging the research design, but pointing out the limitations. The more significant the claim, the stronger the evidence that should be marshaled. Major policy decisions are using the Chetty, Friedman, and Rockoff study. To me, it’s a pretty clear case of outrunning the headlights.
Stiles,
They controlled for what they could. Access to tax records allowed them to easily take into account family income, but they did not have data on parental educational attainment with that data set.
All research is limited. The question is if the limitations so distorted the results as to make them invalid. The concern that VAM is being used illegitimately as a way to evaluate current teachers is only tangentially connected to the validity of this research.
Larry Cuban has a post this morning in his blog
linked to an article that recently appeared in The Atlantic
http://www.theatlantic.com/education/archive/2014/07/how-to-read-education-data-without-jumping-to-conclusions/374045/?single_page=true
that addresses these same concerns.
VAMification supports might want to hem and haw to their whim but no matter how you slice it, the study doesn’t add up to the valid assessment. These VAMifiers don’t know, don’t care, and don’t even bother to do research on quality of standardized tests. They’re nixed. Bye.
Supports/supporters
Although the validity of VAM is a constant topic here, and although I understand why that’s so, I don’t think I would support the mechanical use of VAM in teacher employment decisions whether it were valid or not. Even if were accurate and valid, it’s too opaque. Teachers don’t understand it. Administrators don’t understand it. When the people being evaluated and the people who are evaluating them don’t understand the evaluation, there will be no confidence in the system. There’ll just be fear, confusion, paranoia, anger, resentment.
Flerp!,
You are wise beyond your posts!
There are profound reasons we cannot and should not attempt to be a strict meritocracy. It would actually be an idiocracy.
But allow me to plagiarize myself below, a response to te claiming the only way to measure good teaching is to measure its demonstrable impact on students’ lives.
Akademos on August 12, 2014 at 9:14 am
So, I guess Danielson wasted her time with all of that organizational mania.
But seriously folks, there are so many vicissitudes in life and specifically countercurrents as people rise in levels of awareness, echelons of so-called excellence or strata and responsibility, etc. One’s conscience and consciousness can get in the way of all sorts of gains, in some fields you can rise indefinitely without a pay raise, unless you step out and go another avenue, an avenue that simply may not suit you in any way. And so many things can be deep learning experiences that resonate throughout your life strongly but that do not necessarily lead to material gain or even happiness in the common sense, though you may be grateful for the unending insight and have no regrets.
Life, lives aren’t simple things. There is so much enrichment that you will probably never see on any sort of a metric, let alone a paycheck. And not all positive enrichment and profound resonances lead to better or even good things.
Wondering about whether other variables have been considered:
1) In school tracking and parent influence on how students are matched with teachers may mean that students who already are “more likely to succeed” are assigned to certain teachers making it appear that teachers are the cause rather that the student characteristics.
2) Non-random assignment of students often means that students with greater challenges are clustered in particular classes thereby influencing the expectations and achievement of students in those classes. As a result, students with similar demographics and prior achievement appear comparable, but are educated in very different contexts.
Arthur,
That is the topic of the first paper, “Measuring the Impacts of Teachers I: Evaluating Bias in Teacher Value-Added Estimates”.
Here is a link if you want to understand how these issues are addressed: http://obs.rc.fas.harvard.edu/chetty/w19423.pdf
Reblogged this on peakmemory and commented:
More on value added measures of teaching
The problem with saying that value-added modeling is junk science is that it places all value-added modeling in the same box. Value-added modeling of student achievement is better than what happens in most states, which is that schools are regarded as good if their students perform well in standardized tests without taking into account their prior achievement. Value-added modeling attempts to control for some of the factors that influence student achievement (prior achievement, student characteristics, etc.) but not others, and some authors do it more carefully than others. But here’s the real danger. If we vilify value-added modeling, we give support to those who want to go back to ranking schools, districts, and teachers on the raw achievement of their students (i.e., without taking other factors into account). And that would be much worse. Value-added modeling is, as the ASA study authors note, almost useless for evaluating individual teachers, but can be used meaningfully at the school level.
Dylan William,
I see no purpose to ranking and rating schools. I also believe, based on extensive research, most recently the report of the American Statistical Association, that VAM is junk science. Standardized tests are being overused and misused. They are not the measure of students or teachers. Apparently the public and most parents agree. What you measure with such a test is usually family income.