Archives for category: Teacher Evaluation

Governor Andrew Cuomo doesn’t understand why students should opt out of state testing, because the tests won’t count against students. Instead, they will be used to rank and evaluate teachers. So, he wonders, why should students opt out?


But the governor does not recognize the implications of his own statement. As one blogger asked, why should students take the tests if they are meaningless?


That is a good reason to opt out. Why should students waste their time on tests that are meaningless?


But more important, if the students are not motivated to do their best, if they know the tests don’t count, why should teachers be evaluated by their students’ lack of effort? Taking a test is not like stepping on a scale. The scores vary depending on many factors, not least of which is motivation. If students go into the tests knowing they don’t matter, why should they try?


How will Governor Cuomo feel if his plan fails most teachers? Maybe happy. But will he be held responsible for harming the state’s entire public education system? He should be.

The editorial board of the “Journal-News” in the Lower Hudson Valley calls out the absurdity of Governor Cuomo’s teacher evaluation plan. The deadline for a new plan is June 30, an impossible timeline.

They understand why parents are angry at the testing system and the governor:

“Declining morale in neighborhood schools is one big reason that many parents boycotted the state tests. How can Cuomo not see the connection?

“Now our leaders are racing to fix the system, but are likely to make it worse. Cuomo and legislative leaders, as part of their budget agreement, gave the state Board of Regents until June 30 to re-create the evaluation system, setting strict rules that tie the Regents’ hands.

“Stop it. It’s time for the Board of Regents to take a stand – and stand up to Cuomo. The board should declare that it can’t slap together a viable evaluation system. New York should keep its current system in place and use at least the rest of 2015 to design a system that would promote classroom instruction and hold teachers accountable.

“Judith Johnson, the Lower Hudson Valley’s new representative on the Board of Regents, has the right idea. ‘What the governor has put in place makes no sense,’ she said. ‘If you want a scholarly system, you can’t throw it together in 30 to 60 days. If we ignore the science behind teacher evaluations, it’s just a political decision.’”

Does the Board of Regents have the backbone to tell the governor and the legislature that they are wrong? Will they stick to science and turn their backs on Cuomo’s vindictive agenda?

Merryl Tisch, chancellor of the New York Board of Regents, has delayed implementation of Governor Andrew Cuomo’s draconian and misguided plan to evaluate teachers by test scores.

When New York sought Race to the Top money, it promised that test scores would count for 20% of teacher evaluations. Under pressure from Governor Cuomo, the proportion rose to 40%. Cuomo was angry when almost every teacher was rated effective or highly effective. He wanted to fire teachers. Tisch wrote a letter to Cuomo agreeing with his demand to raise the testing proportion to 50%.

The legislature caved during budget negotiations and passed a “matrix” that implies a 50% weight for test scores but left the final determination to the Regents. Tisch decided more time was necessary and extended the deadline.

The sad part of this drama is that no one ever refers to research. Numerous studies and reports have refuted the validity of test scores for measuring teacher quality. Start with the American Statistical Association’s statement on value-added models (VAM). Too many variables that the teacher does not control influence test scores.

The current dispute seems to be about whether to misjudge teacher quality sooner or later.

In the midst of a story about a teacher who walked 150 miles to deliver a letter to Governor Cuomo, the reporter mentioned a statement from the State Education Department (SED) about the opt-outs.

Basically, the SED said that the opt-outs will not derail its determination to rate teachers based on test scores.

The State Education Department released a statement saying, “We are confident the Department will be able to generate a representative sample of students who took the test, generate valid scores for anyone who took the test, and calculate valid State-provided growth scores to be used in teacher evaluations.”

The SED did not say how it will generate valid ratings for teachers whose students opted out, especially in districts where the majority of students did so; nor did it say how it would generate valid ratings for the 70% of teachers who don’t teach the tested subjects. Even if only 10% opted out, how will the SED know whether they were high-scoring or low-scoring students? The SED will succeed in making a process of dubious value even less valid. It is determined to do the wrong thing, with or without adequate data.


Since 2009, when Race to the Top was launched, Arne Duncan has been an avid proponent of evaluating teachers by test scores. To be eligible for Race to the Top money, states had to agree to evaluate teachers by test scores. To get a waiver from the impossible mandates of NCLB, states had to agree to do the same. As a result, some states evaluate teachers by the scores of students they never taught or in subjects they don’t teach.

When Duncan testified, Congresswoman DeLauro asked if he was willing to rethink VAM. He responded that while the federal government doesn’t require VAM, it does require evidence of growth in learning.

Sounds like VAM. Can anyone make sense of this?

*The original post had several spelling errors because I composed it on my cellphone during a bumpy car ride. I have fixed them.

Valerie Strauss analyzes the debate between Chancellor Merryl Tisch and me on MSNBC’s “All In With Chris Hayes.”

She includes the transcript.

What she found odd was Tisch’s response right after I explained that teachers are not allowed to see how individual students answered questions, so the tests have no diagnostic value. All that teachers see is the students’ scores and how they compare to others. There is no item analysis, no description of students’ weaknesses or strengths.

Tisch answered:

“TISCH: Well, I would say that the tests are really a diagnostic tool that is used to inform instruction and curriculum development throughout the state. New York State spends $54 billion a year on educating 3.2 million schoolchildren. For $54 billion a year I think New Yorkers deserve a snapshot of how our kids are doing, how our schools are doing, how our systems are doing. There is a really important data point.”

She began by saying that the Common Core standards and tests would close the achievement gap, although there is no evidence for that claim. Then she said the tests are a valuable diagnostic tool, but they don’t provide enough information to perform that function. Then she said the tests would show how our schools were doing, which I disagree with, because the passing mark was set artificially high, guaranteeing that most children would fail.

Unfortunately I had no opportunity to respond.

Miriam Kurtzig Freedman, an attorney who represents public schools in education matters, including testing and special education—and is currently working to reform special education—posted this comment. Her website is


Can we really use student tests to measure teacher effectiveness?


Miriam Kurtzig Freedman, M.A., J.D.


This is the year! Tests related to the Common Core State Standards (CCSS) are launching across our country. They are designed to measure how well students are learning the CCSS. Meanwhile, some states, with federal encouragement, plan to use them also to measure teacher effectiveness. Is this use valid?


There is no shortage of controversy about educational testing and, unfortunately, this controversy includes the opportunity to file lawsuits. The use of student achievement data to also evaluate teacher effectiveness is certainly controversial. Notably, Arne Duncan, the Secretary of Education, gave states a year’s reprieve on implementing this practice. Across the country, teacher unions have called it unfair. My concern is far more basic. It’s about validity.


As an attorney who has represented public schools for more than 30 years, I am concerned about this multipurpose use. It may not get us what we need: a valid, reliable, fair, trusted, and transparent accountability system. The tests at issue include those developed by PARCC and SBAC, two multi-state consortia that are funded by the U.S. Department of Education and private funders. They were charged with developing an assessment system aligned to the CCSS by the 2014-15 school year.


At last count, these consortia have 27 states and the District of Columbia signed up, affecting 42% of U.S. students, according to Education Week.


The media remind us constantly that our ‘failing’ schools need fixing; that, to do so, we should assess student skills and knowledge to help teachers improve instruction; that we also need to evaluate and rate teachers and weed out poor performers. And we are told that these tests can be multipurposed to do all of the above!


Sounds good? Actually, it sounds too good to be true. Does this multipurpose use to evaluate teacher effectiveness clear a key psychometric hurdle: test validity?


What is test validity?


At its core, it is the basic, bedrock requirement that a test measure what it is designed to measure. Thus, if a test is designed to measure how well 3rd graders decode, we judge the test according to how well it does that: can students decode? If it is designed to be predictive, say, to measure whether students are ‘on track’ or progressing toward college or career readiness, we judge it accordingly. Either way, we must ask whether a test whose purpose is to measure what students learn, or whether they are ‘on track,’ can also be used to measure something else, such as how well teachers teach.


So what are these tests’ purposes? For answers, let’s review the PARCC and SBAC websites. First PARCC, the Partnership for Assessment of Readiness for College and Careers:


PARCC is a group of states working together to develop a set of assessments that measure whether students are on track to be successful in college and their careers. These high quality, computer-based K–12 assessments in Mathematics and English Language Arts/Literacy give teachers, schools, students, and parents better information whether students are on track in their learning and for success after high school, and tools to help teachers customize learning to meet student needs.


PARCC is based on the core belief that assessment should work as a tool for enhancing teaching and learning. Because the assessments are aligned with the new, more rigorous Common Core State Standards, they ensure that every child is on a path to college and career readiness by measuring what students should know at each grade level. They will also provide parents and teachers with timely information to identify students who may be falling behind and need extra help. [Emphasis added]


Second, the SBAC, Smarter Balanced Assessment Consortium:


The [SBAC] is a state-led consortium working to develop next-generation assessments that accurately measure student progress toward college- and career-readiness. Smarter Balanced is one of two multistate consortia awarded funding from the U.S. Department of Education in 2010 to develop an assessment system aligned to the Common Core State Standards (CCSS) by the 2014-15 school year.


The work of Smarter Balanced is guided by the belief that a high-quality assessment system can provide information and tools for teachers and schools to improve instruction and help students succeed – regardless of disability, language or subgroup.


Smarter Balanced involves experienced educators, researchers, state and local policymakers and community groups working together in a transparent and consensus-driven process. [Emphasis added]


Clearly, these tests’ purpose is to (a) measure student progress on the Common Core State Standards (CCSS) and college or career readiness, (b) give teachers and parents better information about students, and (c) help improve instruction. No mention is made of gauging teacher effectiveness.


Yet, questions about the validity of using these tests in this multipurpose way seem to be missing from national discussions, even as other validity issues are raised. For example, questions are raised about score validity when tests are administered in different ways (on a computer or with paper and pencil) and at different times of the year.


Also discussed are questions about whether these tests are aligned to the CCSS. The media report battles among states, unions, and others about how to measure teacher effectiveness through these tests, e.g., through value-added models, student growth percentiles, or other approaches. But the basic question of whether these tests are valid for this additional purpose in the first place is not part of today’s public discourse.


They should be.


If we continue on this track of creating high stakes for teachers with tests designed for a different purpose, we may well end up with unintended consequences, including distrust of the system, questionable accountability, and lawsuits.


My suggestion? Given the reprieve for states and growing concern among the public about these tests and the CCSS themselves, test consortia and our federal and state governments should take a deep breath and do two things.


First, the consortia should remind the public that the purpose of these tests is to measure student achievement on the new CCSS and career and college readiness, provide better information to teachers and parents, and improve instruction.


Second, the states (with federal approval and encouragement) that intend to use these results also to evaluate teacher effectiveness must inform the public explicitly about how they intend to validate the tests for this new purpose. They need to provide solid proof that their proposed use, which differs from the stated purpose of these tests, is valid, reliable, and fair. The current silence is worrisome, not transparent, and unwise.


This test validity issue needs to be fully aired and resolved satisfactorily before we can begin to tackle the larger issues about the multiple uses of testing. Otherwise, in our litigious land of opportunity, the ensuing battles may be costly and not pretty. Let’s not go there.

The following was posted as a comment on the blog:


Dear Dr. Ravitch,


I have spent the last week and a half reeling from the shot across the bow that public education took on March 31st when the New York State Legislature ostensibly signed off on its destruction with the passing of the New York State Budget, and its attached legislation, S2006B-2015. As a teacher who is passionate about what she does, with two years of failing State Growth Scores, I know my days as a teacher are numbered. I am left with only one choice, to continue to act out of love for my students until the day comes when my district will be forced to remove me from the classroom and students I graciously serve.


My first act of love for my students, since the passing of this legislation and the absolute betrayal of my own elected officials, is the following letter I sent to the Board of Regents this afternoon.


Dear New York State Board of Regents:


This letter is in response to New York State Law S2006B-2015, dated March 31, 2015. I write to you as a teacher of thirteen years who loves her profession and her students more than words could possibly capture. There has not been one day in the classroom that I wished away. Not one paycheck that I did not regard with awe over the fact that I could be paid to do a job I loved so deeply. Not one August that I did not greet with excitement in anticipation of new students, new challenges and new victories. Not one end of a school year that I did not confront with sadness over the end of a ten-month partnership with my students filled with reading and writing and thinking and questioning.


Teaching is my passion. Every single day I ask myself: What went wrong? Who did I not reach? What can I do tomorrow to push harder and support the growth of my students? I sincerely love teaching because after thirteen years, I am clear on only one thing – I will never have all of the answers. And I like that challenge. Each year brings new students, new families, new strengths and new areas of opportunity into my classroom. My voracious appetite for meeting their respective needs is confronted by the infinite possibilities that education offers.


This year, we had an interesting scenario. Reading comprehension assessments made it very clear that students understood what they were reading. Yet of the twenty-seven students in my class, fifteen were receiving Academic Intervention Services (AIS) for reading, and eight of those fifteen consistently earned failing scores on weekly assessments. We asked ourselves: Is it the vocabulary in the questions? No. Is it the vocabulary in the choices? No. We realized that students could not see the correct answers among the choices because they lacked the transferal skills to get from what they knew the answers were to the choices given. We started giving the students the questions without choices and having them write their own answers. Then we gave them the choices, and they had to select the choice that most closely resembled their answer. The number of failing students dropped substantially, from eight to one or two. This is what teaching is. Every single day we must go in, assess what our students need from us, and devise ways to meet those needs.


I often tell people that a teacher’s job is never, ever done. I could work around the clock, twenty-four hours a day, seven days a week, and still have things I want to accomplish in the classroom. As teachers, we have to eke out as much time as we can before school, during school and after school, and spend that time on the work we determine offers our students the greatest return on investment. This is why grading the assessments we give is so important to us. Students and teachers require continual assessment feedback so instructional time can best serve students’ needs.


Where is all of this going? It boils down to assessment. Your board has been asked to craft an Annual Professional Performance Review (APPR) plan that bases 50% of a teacher’s APPR on assessments you deem appropriate for this purpose. Much of what I am about to discuss pertains solely to the current grades three through eight state testing program, but please keep in mind that these thoughts relate to any assessment we deem appropriate for removing a child’s teacher from his or her classroom.


Any assessment we use for the state’s 50% of the APPR must:


1. Include reliability and validity testing that demonstrates the instrument’s ability to measure what we are asking it to measure. Assessment in New York State public school classrooms must measure a student’s progress toward New York State Standards.


2. Be created by an entity that does not also sell curricular materials to school districts. The 2013 New York State 6th grade ELA exam included proprietary material that Pearson had also included in its series, Reading Street, which it sells to districts. This is a serious conflict of interest.


3. Have the ability to measure all growth a student experiences during a school year. The current methodology provides simple scores of one, two, three and four, limiting its ability to show us where growth has or has not transpired, for a variety of reasons.


4. Tell teachers and parents something neither of them already knows. We know who has difficulty reading and who does not. We must use an assessment that offers rich details about where our students struggle, as well as what they are doing well.


If we continue on our current path, teachers like me, who love what we do and have an innate desire to be the best teachers we can be for our students, will be gone. For the last two years, I have received State Growth Scores of one and two, respectively. If you proceed with the State Legislature’s plan and your current method of assessment, you will be taking good teachers away from the students who need them, using fraudulent instruments. With your June 30th deadline looming, I beg you to contemplate the gravity of this system and, as the law prescribes, use the next few months to speak with teachers and parents who are invested in this system, to craft a plan that places children first.


In all earnestness, I am willing to meet with you anytime to discuss the frailties of our current system and measures we can take to meet the law’s deadline in a way that best serves public school children. They are what matter most.


Warm wishes,


Melissa K. McMullan
6th Grade Teacher
Comsewogue School District
Port Jefferson Station, NY

Members of the Tennessee House of Representatives voted unanimously to reduce the role of test scores in teacher evaluations, at least temporarily. Controversy continues about whether teachers and other school staff should be evaluated by the scores of students they don’t teach. (Note: readers, please tell Andrew Cuomo that other states are reducing the role of test scores, not increasing it.)


A bill that temporarily would alter the amount that student test score growth impacts teacher evaluations in Tennessee passed unanimously in the House Thursday. But first, lawmakers debated the merits of a system that grades teachers based on scores in subjects they don’t teach.
The proposal, brought to the legislature by Gov. Bill Haslam’s administration, now awaits consideration by the full Senate.
The bill proposes to phase in the weight of test scores as the state transitions to its new assessment, called TNReady, which will be rolled out during the 2015-2016 school year. Under the proposal, scores from the new test would account only for 10 percent of the teacher evaluation score in 2015-16 and 25 percent in 2016-17, before returning to the current 35 percent in the 2017-2018 school year.
The policy also addresses concerns that teachers of non-tested subjects — such as art and physical education, as well as school counselors — can be penalized for test scores they don’t directly impact. The bill proposes that student growth for those positions count for 10 percent in 2015-16, down from 25 percent, and move to 15 percent in subsequent school years.
Some legislators said that provision is inadequate, however. House Minority Leader Craig Fitzhugh (D-Ripley) offered an amendment that would prohibit test scores from impacting evaluations of non-testing teachers at all. He said allowing educators to be graded based on the scores of other teachers is akin to grading students based on the scores of their peers.
“Parents would be outraged,” Fitzhugh said.
Rep. Mark White maintained the bill is fair without the amendment, however, because no teacher works in isolation. “Does the librarian not have an effect on student reading?” he asked. “Can a guidance counselor not play a role in affecting student performance?”

Merryl Tisch is the Chancellor of the New York Board of Regents. She has been a Regent for 20 years. She is a strong supporter of high-stakes testing. In this article, she criticizes those who opt out and who encourage others to opt out. She says they are hurting the kids who need help the most. She thinks the schools would neglect the neediest children if they were not tested every year. Since no high-performing nation tests every child every year, they must be overlooking their neediest children.


She writes:


“It used to be easy to ignore the most vulnerable students. Without assessments, it was easy to ignore the achievement gap for African-American and Latino students. Without an objective measure of their progress, it was easy to deny special education students and English Language Learners the extra resources they need. Obviously we still need to do more for those students, but now is not the time to put blinders back on.


“Without a comparable measure of student achievement, we risk losing track of the progress of all of our students in all of our schools. This risk applies not only to students of color, urban and rural students, and students with special learning needs. Many students from affluent districts do not make the year-to-year progress necessary in today’s world and need early support to get back on track. It’s far better to find that out while they’re still in the classroom than wait until they’re out of school and faced with real world challenges in college or the work place without the skills they need to overcome those challenges.”


After a dozen years of high-stakes testing, one would expect evidence that the children she names have benefited, or that poverty has decreased, but she cites no evidence of any benefit from high-stakes testing.


Celia Oyler, a faculty member at Teachers College, Columbia University, read Chancellor Tisch’s letter and drew different conclusions. She wrote the following comment to The Hechinger Report, where Tisch’s article appeared:


Professor Celia Oyler wrote:


“Very few parents would be refusing the New York State Pearson tests if they were decent measures of learning. And if they were decent measures of learning from year to year there would be no Teachers of Conscience movement of teachers who are refusing to administer the high stakes tests. There are so many flaws with what Chancellor Tisch and Commissioner King have done:


“(1) These tests are not measures of what an individual student has learned from year to year: they are not vertically aligned. State Ed has created what they call growth scores, but calling something by a name does not make it real. In fact, these scores do not measure growth from year to year, but measure the score on the test one year and the score on a different test the next year.


“(2) The NYS tests are too blunt to measure learning of the students Chancellor Tisch proclaims to care most about: the children who do not do well on standardized measures (whether due to horrible stresses that often accompany poverty and affect learning, or from a print or language or intellectual disability, or because they are learning English as an additional language). And we also know from numerous adequately designed studies that a teacher accounts for only about 10-15% of test score variance on any child: to hold one teacher 50% responsible for a single test score is scientifically unjustifiable. And doing so damages the chances for such children to receive the education they need. Children who struggle with school tasks do not need more test prep curriculum (which is what they are mostly getting — get out to schools more, Chancellor Tisch!), they need more rich, integrated, experiential, three-dimensional learning that is organized around meaning and not memorization. Punishing children, their schools, and their teachers for poor scores on poor tests is not the way to promote the rich learning environments they desperately need.


“(3) The misuse of so called Value Added Models or Measures takes lousy tests and then puts them through a formula not even designed to measure one teacher’s influence on the score from year to year: VAMs have greatest reliability when used on groups of teachers across multiple years. To make matters worse, most all researchers continually agree that a teacher accounts for about 10-15% of any standardized score variance. So teachers in NYS are punished by giving them a score that was not even designed to measure what Chancellor Tisch has made it measure. Study after study after study demonstrates that VAM has confidence intervals of as much as 60%! This is utterly insane and has enraged educators who understand what is being done to them.


“(4) Chancellor Tisch has just announced that some districts and schools should be exempt from this high stakes bad math folly that she and her cronies have wrought upon the children and teachers of New York State. This is an abomination. We have decades of research demonstrating the link between wealth and standardized test scores. Yes, there are exceptions: we have schools where children from low-income schools have learned to do well on a high stakes test. We need to learn more from these anomalies. But even within the anomalies researchers continually find that doing well on one high stakes test does not transfer to other high stakes tests. This means that students can be taught how to do well on a high stakes test. It does not mean they are learning content, concepts, and skills of value, that transfer. This raises the question: Do we want learning, or do we want achievement test scores?


“It is apparent to many parents who are refusing the tests, and to many teachers who are taking up activism against these brutal educational “reforms,” that Chancellor Tisch and her ilk care way more about a reductive number on a spreadsheet than they care about real learning and about actually improving the possibilities for the most marginalized children in our society. New York State teachers and children deserve support and assistance, particularly in economically distressed communities. Tisch and her millionaire friends can do much better than punish us all with their willful ignorance.”


Celia Oyler, PhD
Box 31 Teachers College
525 W. 120th Street, NY, NY, 10027
office phone: 212.678.3696
office location: 312 Zankel Hall

