Jack Schneider and Pat Jehlen: Test Scores are a Lousy Way to Measure School Quality, But…

Jack Schneider of the College of the Holy Cross and Pat Jehlen of the Massachusetts Senate Committee on Education have a sensible article about the use and misuse of test scores to measure school quality.

As they point out, test scores are highly correlated with socioeconomic status, so schools that serve low-income students look like “bad” schools even when they are doing a good job of educating the students.

The matter of how a state identifies the “lowest performing” schools is a high-stakes enterprise. After all, labeling a district as such can lead to the flight of quality-conscious parents—weakening the district’s capacity and increasing segregation by class. And additionally, schools identified among the lowest ten percent of performers face the charter “cap lift” provision, allowing the state to send up to 18 percent of the district’s funds to charter schools — twice as much as in other districts. When the cap is raised and more charters are granted, students leave, budgets are cut, and schools can close. While some families now have a new choice, others find their chosen school closed.

To be clear: some districts really are ineffective. And no one wants children to be trapped in failing schools. Yet the simple truth is that current approaches for measuring effectiveness are methodologically weak and ethically dubious.

Standardized test scores, which constitute the lion’s share of how we evaluate school effectiveness, are highly problematic. Standardized tests capture a narrow slice of life in schools and reflect only a fraction of what the public values. They are designed to be time efficient and cost-effective rather than to align with what we know about cognitive development. And they are subject to gaming.

Although the authors recognize that standardized tests capture only a very small part of what schools and teachers do, they think the tests could be used more wisely.

Instead of judging schools solely by test scores, they might be judged–at least in part–by student growth.

This is certainly wiser than what Massachusetts and most other states do now.

But if you bear in mind that standardized testing itself is highly contested, even student growth scores will contain defects–just different defects.

Imagine living in a world without standardized testing. That might be bad for Pearson, but just imagine it.

Such a world exists.

It is found in almost every private school in America, where teachers make judgments about their students’ progress and their needs.

It is found in Finland, where teachers teach as if they were in an American private school.

Why are we so wedded to those standardized tests, which originated as IQ tests, filled with racial and ethnic and class bias?

Which state or district will be first to try a new way of assessing school quality, for example, with an inspectorate of expert educators?

Ohio Algebra II teacher says:

May 13, 2014 at 9:13 am

Somewhat tangential point, but I’m getting as annoyed about the term “high-stakes testing” as I am with “education reformer.” It just doesn’t seem apt to me. How do you describe something that takes the infinitude of what goes on in a school year and reduces it down to a single number for each kid? “High-stakes” isn’t the answer for me.

Duane Swacker says:

May 13, 2014 at 9:34 am

I call it excrement of bovine origin.

Harold says:

May 13, 2014 at 11:17 am

The term designates a test that will be pivotal in determining someone’s future, as opposed to an evaluative test that is simply a snapshot of how the test-taker is doing.

Labor Lawyer says:

May 13, 2014 at 9:43 am

I concede that, as a retired labor attorney, I have little/no special expertise re what goes on in the nation’s public schools. However, common sense suggests that the instructional program (who the teachers are, what they teach, how they teach it, and perhaps class size) are pretty much the same everywhere in the US. This is more true today than it was 50 years ago as per-student spending in most of the low-SES/inner-city schools has largely caught up (or even passed) per-student spending in the high-SES/suburban schools. Certainly, there are more AP history courses in the high-SES suburban schools and more remedial English courses in the low-SES inner-city schools — but, this mostly reflects differences in what the students need not differences in the instructional programs (that is, if the low-SES school has an AP history course or the high-SES school has a remedial English course, those courses will be very similar to the courses being taught elsewhere).

From this, it follows that the differences between schools in instructional-program quality are relatively small. The big differences between schools are in student characteristics, not instructional-program quality. If we took all the administrators/teachers from a high-SES suburban school and switched them with all the administrators/teachers from a low-SES inner-city school, the high-SES suburban school would continue to have high test scores and the low-SES inner-city school would continue to have low test scores.

If this is true, then any discussion of “school quality” or “school reform” that focuses on the quality of the instructional programs will be largely a waste of time. The low-SES schools will have low test scores; the high-SES schools will have high test scores. The keys to raising the test scores in the low-SES schools will not involve changing teachers/principals or closing the school and sending the same students to another school. Instead, the keys will involve identifying the “something(s)” that predisposes the low-SES students to academic failure while predisposing the high-SES students to academic success and then implementing reforms specifically targeting that “something” so as to give the low-SES student a better chance to achieve academic success. One “something” almost certainly involves the quality/quantity of verbal interaction with adults from birth to kindergarten. Another “something” might involve the predisposition of low-SES students to minor classroom misbehavior that constantly disrupts instruction and creates anti-academic-achievement peer pressure.

In any event, it’s student characteristics, not “school quality”, that is causing those low test scores in the low-SES schools.

Dienne says:

May 13, 2014 at 10:40 am

Stop talking sense.

I’ve been saying very similar things since “failing school” became the buzzword of the day (year, decade). It’s not the school that takes the tests – it’s the kids in the school. So what is it about kids in high SES schools that makes them “successful”, while kids in low SES schools tend to “fail”? Gee, I can’t think of a thing….

FLERP! says:

May 13, 2014 at 10:42 am

“Another “something” might involve the predisposition of low-SES students to minor classroom misbehavior that constantly disrupts instruction and creates anti-academic-achievement peer pressure.”

I think you may be underestimating the variations in teacher quality among schools and the impact it has on students (although certainly I can’t prove that). But the point above seems like it’s the elephant in the room in every discussion about why “low-performing” schools are low-performing.

- Labor Lawyer says:
  
  May 14, 2014 at 9:36 am
  
  I used to think that the minor-but-endemic-misbehavior in the low-SES schools was the major cause of poor academic achievement. Having spoken to many teachers, my thinking has evolved somewhat. Although I still think that the misbehavior is a huge problem (and an unaddressed elephant in the school-reform room), I now think that the misbehavior is itself the result of many low-SES students starting school with inferior verbal/vocabulary/abstract thinking skills (caused by inferior adult/child verbal interactions from birth through kindergarten). This predisposes those students to academic failure which, in turn, causes the students to view school as difficult/frustrating. The students then respond to that difficulty/frustration by minor misbehavior. When there are many such students in a class (as in the low-SES schools), the misbehavior becomes endemic and self-reinforcing. (My thinking is particularly influenced by the observation of many teachers that the low-SES-school students behave relatively well in kindergarten and first grade, with behavior getting progressively worse each year. If the cause of the misbehavior were parental attitudes towards rules/authority, the misbehavior would be worse in the early elementary grades and gradually improve as school influences gradually replaced parental influences.)
- FLERP! says:
  
  May 14, 2014 at 9:58 am
  
  I don’t know whether minor misbehavior/disruption/etc. is the “major cause” or just “a cause,” and I certainly don’t know what the major cause of the cause is. I do know that I’ve found it interesting to look at the teacher portions of the learning environment surveys for both low-SES and higher-SES schools and see how they respond to questions about whether a “lack of discipline” is a problem, whether bullying is a problem, whether students and parents treat teachers with respect, and whether they agree with the statement “I am safe at school.”
- Ang says:
  
  May 14, 2014 at 9:15 pm
  
  Don’t put too much stock in workplace surveys.
  In my experience the people most likely to respond at all are those with an ax to grind.
  I have known some teachers to be angry with this or that issue and give a very angry review of everything as a means to “get back at” an administrator.
  
  Any possibly similar situation at your workplace that just does not get published?
- FLERP! says:
  
  May 14, 2014 at 9:30 pm
  
  Ang — I hear what you’re saying. I think there are ways to account for what you’re talking about, though. With small schools, any one survey is more difficult to read. But trends are informative, and trend comparisons with similarly sized schools are informative. Same with similar SES schools. And generally, the teacher response rates are pretty high, much higher than parents. So I get what you’re saying, but there’s definitely interesting and possibly informative information in the teacher surveys.

Harold says:

May 13, 2014 at 11:18 am

Labor lawyer, you are confusing the issue with facts.

Ang says:

May 13, 2014 at 12:52 pm

“If we took all the administrators/teachers from a high-SES suburban school and switched them with all the administrators/teachers from a low-SES inner-city school, the high-SES suburban school would continue to have high test scores and the low-SES inner-city school would continue to have low test scores.”

Correct.
Having taught in a VERY high performing, posh public high school populated mainly by the children of tenured college professors, lawyers, accountants etc. and a low SES, high ESOL public school populated mainly by children of undocumented day laborers, I can assure you, the faculties were not different at all.

Chiara Duggan says:

May 13, 2014 at 9:50 am

“reflect only a fraction of what the public values.”

Ed reformers should know this better than anyone. Most of the charter schools in Ohio don’t do any better on standardized tests than the “sending” district, many do worse, yet we open more and more charter schools every year.

Obviously, parents are “choosing” based on something other than test scores, and the people opening these schools are using something other than test scores to justify their starting a new school.

The same is true for vouchers for private schools in Milwaukee and Cleveland. Taken as a whole, they don’t do any better on tests than public schools.

Their own charter/voucher experiment discredits the idea that parents use test scores as the Holy Grail measure of what is valuable in a school, yet they continue to insist that public schools should be valued exclusively on test scores. It doesn’t play out that way in their own charter and private schools. Why would it be true of public school parents?

KrazyTA says:

May 13, 2014 at 10:09 am

To the viewers of this blog: please excuse my presenting this again.

From Jim Horn and Denise Wilburn, THE MISMEASURE OF EDUCATION (2013), pp. 1, 55, and 147:

“What was once educationally significant, but difficult to measure, has been replaced by what is insignificant and easy to measure. So now we test how well we have taught what we do not value.” — Art Costa, professor emeritus at Cal State-Fullerton

“Initially, we use data as a way to think hard about difficult problems, but then we over rely on data as a way to avoid thinking hard about difficult problems. We surrender our better judgment and leave it to the algorithm.” —Joe Flood, author of THE FIRES

“When the right thing can only be measured poorly, it tends to cause the wrong thing to be measured, only because it can be measured well. And it is often much worse to have a good measurement of the wrong thing—especially when, as is so often the case, the wrong thing will in fact be used as an indicator of the right thing—than to have poor measurements of the right thing.” — John Tukey, mathematician, Bell Labs & Princeton University

In a more perfect world, standardized test scores would play a very minor supporting role in assessing teaching and learning. In our world, they are given an outsized and toxic importance because they are furiously pushed by free-market fundamentalists who worship at the feet of ROI aka $tudent $ucce$$.

With the proviso that self-styled “education reformers” are openly promoting a two-tiered education system: one that ensures genuine teaching and learning for THEIR OWN CHILDREN [aka the advantaged few] and another that mandates low-level skills training and docility for OTHER PEOPLE’S CHILDREN [aka the vast majority].

The former severely deemphasizes standardized testing, the latter does the opposite.

Let’s emulate the former. What’s good for the goose is good for the gander.

😎

Duane Swacker says:

May 13, 2014 at 10:36 am

Quit beating around the bush and tell it like it really is!

“Yet the simple truth is that current approaches for measuring effectiveness are methodologically weak and ethically dubious.”

NO! They aren’t “methodologically weak and ethically dubious”. They are COMPLETELY INVALID (as proven by Noel Wilson) and their usage is COMPLETELY UNETHICAL.

“Standardized test scores, which constitute the lion’s share of how we evaluate school effectiveness, are highly problematic.”

NO! They are not “highly problematic”. They contain so many epistemological and ontological errors that any results are COMPLETELY INVALID AND HARM INNOCENT STUDENTS.

“Instead of judging schools solely by test scores, they might be judged–at least in part–by student growth.”

NO! Student growth percentiles suffer the same INHERENT PROBLEMS that render any results “VAIN AND ILLUSORY.”

“Which state or district will be first to try a new way of assessing school quality, for example, with an inspectorate of expert educators?”

That has already been done in Missouri, but the process was determined to be too time consuming and finding those “expert educators” was difficult and costly so MODESE went the cheap route, all metrics all the time.

Laura H. Chapman says:

May 13, 2014 at 10:52 am

“Instead of judging schools solely by test scores, they might be judged–at least in part–by student growth.”
This is not an improvement of any kind, but the precise language from Race to the Top Legislation (see reference below).

In federal and state policy “student growth” is just a euphemism for a gain score from pre-test to post-test, or year-to-year. In other words, the term “growth” has been thoroughly corrupted to mean just another score, and preferably a score with properties that can be processed to produce a VAM–value added score. (See reference below on the new grammar…)

Do not be misled. The marketeers of “growth” as if this is some gold standard or “fair” measure for judging educational activity are engaging in a propaganda campaign. Participants include USDE and its hired hands who know that this term “growth” has a rich and elaborate semantic reach in education. They are cynically trying to cut away understandings of growth and development as teachers understand it for individual students–multifaceted and asynchronous (e.g. bright but socially awkward; coordinated dancer, but not an athlete; enchanted with calligraphy but has terrible handwriting). To be sure, there are normative patterns for a large number of students, but so-called “developmental levels” also mask all of the wondrous variability in students. Forget all that, the new meaning of “growth” is a gain or increase in a metric derived from a test.

A perfect example of the marketing effort on behalf of redefining “human growth” (as a difference in metrics) is the infamous “Oak Tree Analogy” (see reference below)–that conveniently ignores that fact that students, unlike trees, have minds of their own.

I call this a cynical move because the oak tree analogy is framed to place teachers in the role of workers in a nursery in charge of providing the “nutrients” that are needed for trees to thrive. This frame, as Lakoff and Johnson remind us, taps a “nurturing parent” metaphor for teachers, and also the traditional role for women. The campaign to portray teachers as bad nurturers, lay, soft, uncaring is nowhere more evident than in the excessive use of “rigor” and “rigorous” as obligatory adjectives for almost everything bearing on “improvements” in education. See Lakoff, G., & Johnson, M. (2008). Metaphors we live by. University of Chicago Press.

Repeat. Federal and state policy documents define “growth” as a gain in pre-test to post test scores, and a gain in year-to-year scores. Such scores are used to radically simplify judgments about districts, schools, teachers, and students. The distorted views of education produced by aggrandizing tests and “metrics” as if these refer to the actual complexities of human growth and development–perceptual, intellectual, social, physical, creative, aesthetic–is a fraud.

For federal language for “growth” see: Final Definitions 559751-52 Federal Register / Vol. 74, No. 221 / Wednesday, November 18, 2009 / Rules and Regulations DEPARTMENT OF EDUCATION 34 CFR Subtitle B, Chapter II [Docket ID ED–2009–OESE–0006]
RIN 1810–AB07 Race to the Top Fund AGENCY: Department of Education. Retrieved from http://www.gpo.gov/fdsys/pkg/FR-2009-11-18/pdf/E9-27426.pdf

For the false comparison of human development and oak tree “growth” see:
Value-Added Research Center. (2012). Teacher effectiveness initiative value-added training oak tree analogy. Madison: University of Wisconsin. Retrieved from http://varc.wceruw.org/tutorials/oak/index.htm

For the cynical promotion of a preferred “grammar” for education see:
Reform Support Network. (2012, December). Engaging educators, Toward a New grammar and framework for educator engagement. Author. Retrieved from http://www2.ed.gov/about/inits/ed/implementation-support-unit/tech-assist/engaging-educators.pdf

May 13, 2014 at 10:57 am

lay is a typo, should be ” lazy”

Hilary Appelman says:

May 13, 2014 at 11:03 am

Diane, could you please respond to this as well? http://educationnext.org/us-students-educated-families-lag-international-tests/

FLERP! says:

May 13, 2014 at 2:53 pm

“Why are we so wedded to those standardized tests, which originated as IQ tests, filled with racial and ethnic and class bias?”

Because public school systems are enormous, unfathomable bureaucratic beasts that must be managed from afar, and because absentee managers like standardization. That’s my working theory.

wgersen says:

May 13, 2014 at 3:05 pm

You ask: “Which state or district will be first to try a new way of assessing school quality, for example, with an inspectorate of expert educators?”… MAYBE Washington State. After all, they aren’t bound by RTTT any more!

Ben Carson says:

May 13, 2014 at 7:02 pm

But wait, Pearson will help your kids get higher scores, at least in Texas. Here Pearson offers help for kids needing qualifying scores on the TSI, a test high school students have to take if other tests don’t qualify them for on-level freshman college courses. Imagine that, a test that identifies what another test couldn’t. And Pearson keeps making money from every angle.
http://www.pearsonschool.com/live/customer_central/email4/501-k12-marketing/501a048-tx-myfoundations/index.html?WT.mc_id=701d000000173O0AAI&WT.dcsvid=617280223&utm_campaign=701d000000173O0AAI&cmpid=701d000000173O0AAI
