Yesterday, Secretary of Education Arne Duncan had a story in the Huffington Post extolling his work in building respect for the teaching profession.

He has accomplished this, he says, by insisting that teachers be evaluated based on the test scores of their students.

Exhibit A of his success, he says, is Tennessee. Mr. Duncan relies on a report by Kevin Huffman, the state commissioner of education (former PR director for TFA, now employed by one of the nation’s most conservative governors).

The report says that since Tennessee won Race to the Top funding in 2010, it has seen remarkable results because it is now using test scores as 50% of teachers’ evaluations.

Leave aside for the moment the fact that leading researchers, including Linda Darling-Hammond of Stanford University, as well as the National Academy of Education and the American Educational Research Association, say that these value-added measures are inaccurate, unreliable, and unstable.

It is simply bizarre to boast about a one-year change in state test scores. It has long been obvious that state test scores are less reliable than NAEP and that one year of data is not enough to show any real change.

According to NAEP, the scores for Tennessee in both reading and math were flat from 2009 to 2011. Perhaps Secretary Duncan should wait for the release of the 2013 NAEP before boasting about the dramatic gains in Tennessee.

In the meantime, I urge Secretary Duncan and his staff, and Commissioner Huffman, to read the joint statement of the National Academy of Education and the American Educational Research Association on value-added testing and its misuse in evaluating teachers. It is called “Getting Teacher Evaluation Right.” I am sure that the Secretary agrees that policy should be informed by research.

Here is the executive summary:

Consensus that current teacher evaluation systems often do little to help teachers improve or to support personnel decision making has led to a range of new approaches to teacher evaluation. This brief looks at the available research about teacher evaluation strategies and their impacts on teaching and learning.

Prominent among these new approaches are value-added models (VAM) for examining changes in student test scores over time. These models control for prior scores and some student characteristics known to be related to achievement when looking at score gains. When linked to individual teachers, they are sometimes promoted as measuring teacher “effectiveness.”

Drawing this conclusion, however, assumes that student learning is measured well by a given test, is influenced by the teacher alone, and is independent of other aspects of the classroom context. Because these assumptions are problematic, researchers have documented problems with value-added models as measures of teachers’ effectiveness. These include the facts that:

1. Value-Added Models of Teacher Effectiveness Are Highly Unstable: Teachers’ ratings differ substantially from class to class and from year to year, as well as from one test to the next.

2. Teachers’ Value-Added Ratings Are Significantly Affected by Differences in the Students Who Are Assigned to Them: Even when models try to control for prior achievement and student demographic variables, teachers are advantaged or disadvantaged based on the students they teach. In particular, teachers with large numbers of new English learners and others with special needs have been found to show lower gains than the same teachers when they are teaching other students.

3. Value-Added Ratings Cannot Disentangle the Many Influences on Student Progress: Many other home, school, and student factors influence student learning gains, and these matter more than the individual teacher in explaining changes in scores.

Other tools have been found to be more stable. Some have been found both to predict teacher effectiveness and to help improve teachers’ practice. These include:

  • Performance assessments for licensure and advanced certification that are based on professional teaching standards, such as National Board Certification and beginning teacher performance assessments in states like California and Connecticut.
  • On-the-job evaluation tools that include structured observations, classroom artifacts, analysis of student learning, and frequent feedback based on professional standards.

In addition to the use of well-grounded instruments, research has found benefits of systems that recognize teacher collaboration, which supports greater student learning.

Finally, systems are found to be more effective when they ensure that evaluators are well-trained, evaluation and feedback are frequent, mentoring and coaching are available, and processes, such as Peer Assistance and Review systems, are in place to support due process and timely decision making by an appropriate body.

And here is a short summary of the report by Linda Darling-Hammond.