A Reading List on Testing

Bob Shepherd posted this reading list on testing.

The list was compiled by Alfie Kohn.

I have a few additions:

Todd Farley, Making the Grades

Banesh Hoffmann, The Tyranny of Testing

Phillip Harris et al., The Myths of Standardized Tests

Jim Horn and Denise Wilburn, The Mismeasure of Education

Daniel Koretz, Measuring Up

Diane Ravitch, The Death and Life of the Great American School System: How Testing and Choice Are Undermining Education

Richard Rothstein, Grading Education: Getting Accountability Right

For a short, very powerful online version of the Rothstein critique, read “Holding Accountability to Account.”

Here is Alfie Kohn’s list:

The “five fatal flaws” of the Tougher Standards movement are adapted from Alfie Kohn’s book THE SCHOOLS OUR CHILDREN DESERVE, from which a shorter book called THE CASE AGAINST STANDARDIZED TESTING has been spun off.

You may also be interested in a list of essays about standards and testing available on this website.

Other resources:

Two books on standards: WILL STANDARDS SAVE PUBLIC EDUCATION?, a short essay by Deborah Meier followed by comments from other thinkers, published by Beacon Press;

ONE SIZE FITS FEW: The Folly of Educational Standards, by Susan Ohanian, published by Heinemann.

A collection of essays about the destructive effects of (and dubious intentions behind) NCLB: MANY CHILDREN LEFT BEHIND (Beacon Press), with contributions by Meier and Kohn as well as Ted Sizer, Linda Darling-Hammond, George Wood, Stan Karp, and Monty Neill of FairTest.

Also on NCLB: WHEN SCHOOL REFORM GOES WRONG by Nel Noddings (Teachers College Press); and ENGLISH LEARNERS LEFT BEHIND: Standardized Testing as Language Policy by Kate Menken (Multilingual Matters).

Also see NoChildLeft.com and this excellent summary of the law and its effects.

Other books about testing:

– Phillip Harris et al., The Myths of Standardized Tests (Rowman & Littlefield, 2011)

– Sharon L. Nichols & David C. Berliner, Collateral Damage: How High-Stakes Testing Corrupts America’s Schools (Harvard Education Press)

– Sherman Dorn, Accountability Frankenstein: Understanding & Taming the Monster (Information Age, 2007)

– M. Gail Jones et al., The Unintended Consequences of High-Stakes Testing (Rowman & Littlefield, 2003)

– Linda McNeil, Contradictions of School Reform: Educational Costs of Standardized Testing (Routledge, 2000)

– Marita Moll, ed., Passing the Test: The False Promises of Standardized Testing (Canadian Centre for Policy Alternatives, 2004)

– Kathy Swope and Barbara Miner, eds., Failing Our Kids: Why the Testing Craze Won’t Fix Our Schools (Rethinking Schools, 2000)

– Gary Orfield and Mindy L. Kornhaber, eds., Raising Standards or Raising Barriers?: Inequality and High-Stakes Testing in Public Education (Century Foundation Press, 2001)

– Peter Sacks, Standardized Minds (Perseus, 1999)

– W. James Popham, Testing! Testing!: What Every Parent Should Know About School Tests (Allyn and Bacon, 2000)

– Gerald Bracey, Put to the Test: An Educator’s and Consumer’s Guide to Standardized Testing (Phi Delta Kappa, 1998).

A book about Nebraska’s recently aborted attempt to build assessment from the classroom up, thereby challenging the top-down premise not only of NCLB but of the whole “accountability” movement of which it’s a part: Chris W. Gallagher, Reclaiming Assessment: A Better Alternative to the Accountability Agenda (Heinemann, 2007)

Information from and about FairTest, the leading national organization offering a critical perspective on standardized testing. Its website, http://www.fairtest.org, includes an evaluation of every state’s testing policy and links to a listserv called the Assessment Reform Network. A related group, the Coalition for Authentic Reform in Education (CARE), which opposes the new testing program in Massachusetts, has drafted an alternative assessment proposal — a very useful document for anyone who wonders (or is asked), “If not standardized tests, then what?” For a more recent answer to that question, see Ken Jones’s article “A Balanced School Accountability Model: An Alternative to High-Stakes Testing” in the April 2004 issue of Phi Delta Kappan.

A remarkable collection of examples of, and essays about, the destructive effects of standardized testing and related policies at http://www.susanohanian.org.

A list of state and national websites devoted to challenging the tests can be found about halfway down the page devoted to practical strategies. Note in particular a new (2011) group called “United Opt Out National,” with a website and Facebook page, devoted to organizing people to refuse to take the tests.

Audio- and videotapes of presentations by Alfie Kohn on these topics: http://www.alfiekohn.org/tapesdvd.htm

A powerful study that finds no evidence of improvement on national exams (such as the NAEP and the SAT) for states that use high-stakes testing. Rising scores on state tests appear to reflect only training to do well on those particular tests; indeed, by some measures, students in high-stakes states actually fare worse on independent measures of achievement.

Beardsley and Berliner on “High-Stakes Testing, Uncertainty, and Student Learning” http://epaa.asu.edu/ojs/article/view/297/423

A devastating analysis, based on the high-stakes TAAS test in Texas, of how efforts to raise scores effectively undermine the quality of teaching and learning — and how this effect is most pronounced in schools that serve poor and minority students. This chapter, by Linda McNeil and Angela Valenzuela, is included in the book mentioned above, Raising Standards or Raising Barriers?. For the most comprehensive analysis of the effects of testing in Texas, click here to be linked to a lengthy article by Walt Haney.

Research demonstrating that when teachers are held accountable for raising standards and test scores, they tend to become so controlling in their teaching style that the quality of students’ performance actually declines:
Flink et al., Journal of Personality and Social Psychology, vol. 59, 1990: 916-24.
Deci et al., Journal of Educational Psychology, vol. 74, 1982: 852-59.
Pelletier et al., Journal of Educational Psychology, vol. 94, 2002: 186-96.

Copyright © 2007 by Alfie Kohn. This article may be downloaded, reproduced, and distributed without permission as long as each copy includes this notice along with the author’s name. Permission must be obtained in order to reprint this article in a published work or in order to offer it for sale in any form. Please write to the address indicated on the Contact page at http://www.alfiekohn.org.

dollygirl16 says:

July 4, 2014 at 11:52 am

Could you please tell me where this is originally posted? I had followed Mr. Shepherd on a WordPress blog a few months ago, but it appears that it has been abandoned. Thank you.

Bob Shepherd says:

July 4, 2014 at 1:34 pm

This list is from Alfie Kohn’s website. http://www.alfiekohn.org/index.php

There is much there worth reading.

KrazyTA says:

July 4, 2014 at 12:40 pm

Excellent suggestions. I urge everyone to follow whatever path they feel best serves their purposes.

I offer a few suggestions in order of reading [mostly repeating the above] so one can be prepared to handle a critical recent work by Audrey Amrein-Beardsley, RETHINKING VALUE-ADDED MODELS IN EDUCATION: CRITICAL PERSPECTIVES ON TESTS AND ASSESSMENT-BASED ACCOUNTABILITY (2014).

1) Start with Darrell Huff’s HOW TO LIE WITH STATISTICS (1954, reprinted many times). This very slim and very accessible intro to numbers & stats will prepare you for the accountabully tricks you will encounter in later readings.

2) Todd Farley, MAKING THE GRADES (2009). I have read it several times and come away with more each time. Again, a very accessible book that strips the magic away from the standardized testing industry.

3) Banesh Hoffmann, THE TYRANNY OF TESTING (original 1962, latest incarnation 2003). Not only a good and cogent read, and again a short one, but it makes it strikingly plain that there is no “we’re working on fixing the problems and tweaking and refining” high-stakes standardized tests. It has been over 50 years since his book appeared and they still haven’t finished tweaking and fixing and refining, since the problems are inherent to the product.

4) I refer to a reviewer on Amazon: “The best explanation of standardized testing is Daniel Koretz’s Measuring Up: What Educational Testing Really Tells Us. (Diane Ravitch New York Review of Books 2012-03-08).”

While there are other fine works, e.g., Jim Horn and Denise Wilburn, THE MISMEASURE OF EDUCATION (2013), Alfie Kohn, and others, you are now ready for a preliminary go-through of Audrey Amrein-Beardsley’s book.

“Knowledge makes a man unfit to be a slave.” [Frederick Douglass]

Let’s all get unfit.

😎

Ang says:

July 4, 2014 at 1:32 pm

KTA,
just wanted to take this opportunity to tell you how much I enjoy your posts.
Appreciate you.
Happy 4th.
And thanks for my new motto:
Let’s all get unfit!

- KrazyTA says:
  
  July 4, 2014 at 11:11 pm
  
  Ang: right back at ya’.
  
  😃
  
  And my treat next time at Pink Slip Bar & Grille.
  
  Although beware of Greeks bearing gifts. If Socrates offers to treat, find out whether he’s in a, er, good mood or not so good mood. When he hears us talking about FairTest and the growing opposition to high-stakes standardized testing, he brings out the good ouzo. If it’s about Bill Gates expounding on “no connection” between Microsoft and Pearson and aligning to Kommoners Kore for $tudent $ucce$$, well, then he hints darkly about the “hemlock special.”
  
  I don’t know what that is, exactly, but I’m pretty sure I don’t want to know either. And it’s something in a vial he usually keeps hidden under his ‘not party’ toga…
  
  In any case, this time it’s on me. Especially for all your comments here on a blog dedicated to a “better education for all.”
  
  😎
  
  P.S. And since it’s on my dime this time, order as much as you like. Let’s all get unfit together!
  
  😏
  
Sandra says:

July 4, 2014 at 2:44 pm

I concur with adding Audrey Amrein-Beardsley’s book. It must be on any list: it is well-researched and logical, and it explains statistical fallacies so that everyone can easily understand. One of the best!

Laura H. Chapman says:

July 4, 2014 at 1:22 pm

Here are some additional citations bearing on absurd policies that treat the production of students’ test scores, and gains in these, as the critical attribute of an “effective” teacher. The attention given to VAM in teacher evaluation (see resources at http://vamboozled.com/) has overshadowed the fact that about 70 percent of teachers have job assignments for which statewide tests are not available. This fact has not prevented policymakers from continuing to promote “student learning objectives” (SLOs), a comparable system for stack rating teachers.

Student learning objectives (SLOs) are a version of the business practice known as management-by-objectives with some elaborations for education based on “behavioral objectives” for programmed instruction and training programs.
Citations:
Drucker, P. (1954). The practice of management. NY: Harper.
Mager, R. (1962). Preparing objectives for programmed instruction. Palo Alto, CA: Fearon-Pitman Publishers.

In brief, lower-level managers identify measurable goals and specific performance “targets” once or twice a year. A manager of higher rank approves the goals, targets, and measures. Lower-level managers can earn a bonus if they attain or exceed these pre-determined targets. States and districts are using versions of this process to comply with RttT or similar teacher evaluation mandates.

SLOs are sometimes called “student learning targets,” “student learning goals,” “student growth targets (SGOs),” or “SMART goals”— Specific, Measurable, Achievable, Results-oriented and Relevant, and Time-bound. Although definitions and acronyms vary, all SLOs are a strategy for managing teachers so they focus on specific goals for instruction and students’ test scores as indicators of learning. Routinely, gains in these scores, pretest to posttest, are also treated as essential measures of the teacher’s own effectiveness.

Citation: For one of the most widely copied templates with criteria for an approved SLO see the checklist for Ohio https://education.ohio.gov/getattachment/Topics/Academic-Content-Standards/New-Learning-Standards/Student-Learning-Objective-Examples/080612_2497_SLO_Checklist_7_24_12-1.pdf.aspx

On June 27, 2014, six major education agencies in the state of Maryland, including the Baltimore Teachers Union, signed a memorandum of understanding (MOU) endorsing the use of SLOs statewide. Claiming that “the SLO process is an important component of effective instruction” (para. 1), this MOU sets out an ambitious plan for training teachers and principals to develop “rigorous and measurable, but obtainable SLOs” (para. 2) and to assist teachers in “fully understanding, utilizing, and embracing SLOs” (para. 2).

There is no mention of research in support of this initiative, perhaps because there is so little, and none that speaks to reliability and validity. There is no mention that teachers and principals are being asked to “embrace” management-by-objectives with pay-for-performance as the statewide pedagogical model for education. Teachers are expected to embrace this ideology (or pretend to) if they teach in Maryland.

Citation: Maryland Public Schools (2014, June 27). Student Learning Objectives Memorandum of Understanding. Retrieved from http://marylandpublicschools.org/press/06_27_2014.html

Teachers in many states are being subjected to the SLO process with its pseudo-scientific language, eight essential descriptors, and the pretense that this mandated process is “collaborative.” All of these teachers should know this approach to their evaluation is not evidence-based. Here are two recent citations, both from USDE-funded agencies.

The authors of a 2013 literature review found only two peer-reviewed studies on SLOs, both dealing with pay-for-performance plans. “…No studies of SLOs have looked at reliability” (p.ii). Not one. Nor has the validity of this process for teacher evaluation been established. Only three studies considered validity and only by seeking correlations with VAM for standardized tests (p.ii). Given that VAM ratings are unstable and are not even designed for content and standards in “untested” subjects, it is clearly a mistake to think that correlations of VAM and SLOs are meaningful indicators of the validity of each other.

Citation: Gill, B., Bruch, J., & Booker, K. (2013). Using alternative student growth measures for evaluating teacher performance: What the literature says. (REL 2013–002). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Mid-Atlantic. Retrieved from http://ies.ed.gov/pubsearch/pubsinfo.asp?pubid=REL2013002

If a test is unreliable, it cannot be valid.
For a test to be valid, it must be reliable.
However, just because a test is reliable does not mean it will be valid.
Reliability is a necessary but not sufficient condition for validity.

A related 2014 study found 26 state education websites with policies that required teachers to prepare SLOs. Only three states stipulated that the pretests and posttests required in SLOs had to be reliable and valid.

Citation: Lacireno-Paquet, N., Morgan, C., & Mello, D. (2014). How states use student learning objectives in teacher evaluation systems: a review of state websites (REL 2014-013). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Northeast & Islands. Retrieved from http://ies.ed.gov/ncee/edlabs

Joe Nashville says:

July 4, 2014 at 3:07 pm

Wow, I knew there were a lot of publications regarding testing, but what a list! I definitely concur with Koretz’s “Measuring Up” as a fantastic book.

In addition to Popham’s book listed above, another good one from Popham is “The Truth About Testing: An Educator’s Call to Action” (ASCD, 2001).

John Wund says:

July 4, 2014 at 11:05 pm

I just wanted to add (as a starting point for an exploration of the testing craze) the late Stephen J. Gould’s masterpiece, The Mismeasure of Man. At first glance, you will think it’s only about the misguided measurement of “intelligence”; however, Gould brilliantly exposes the fallacious basis of so many conclusions drawn from ‘multivariate analysis’, as well as the conservative social purpose behind many ‘tests’ (including the SAT).

In my classes, I often produced tests that I thought would prepare students for what they would face in the future. I have been honored by students who went on to study physical science and engineering at elite colleges, who returned a few years later to express their gratitude for having been well-prepared. However, I sometimes wonder if I did the right thing. Gould has made me think. Was I supporting a flawed model?

Most students don’t need that sort of preparation. Instead, they need to develop an awareness of and interest in the natural world they experience. They also need to understand the importance (indeed, the superiority) of inductive logic as a means of our survival on our planet. Fortunately, I had the freedom to develop and teach a course aimed at ‘artists, English majors, and such’ at several schools. I, of course, developed tests that sought to stimulate, rather than punish. Thus, I could prepare informed (and questioning/skeptical) citizens, not just engineers or physicists.

I could, however, only do this in schools that gave me a free hand in developing both curriculum and the accompanying testing. ‘CCSS’ would have, of course, prevented this.

Here, in Tennessee (where I have never taught), the results of our TCAP were delivered weeks behind schedule. After reading Gould, I’m pretty sure I know why. The results didn’t conform to expectations, and needed to be ‘massaged’ before release. The tail wags the dog.

Duane Swacker says:

July 7, 2014 at 6:55 pm

“Was I supporting a flawed model?”

Yes, much more likely than not. See below for further comment.

kellyflynn1 says:

July 5, 2014 at 7:29 am

To this list I would like to add The Teachers’ Lounge (Uncensored): A Funny, Edgy, Poignant Look at Life in the Classroom. This book uses real-life classroom anecdotes to show the reader exactly why standardized testing — and most other corporate reforms — fail both students and teachers. Foreword by Nancy Carlsson-Paige.

Rene Diedrich says:

July 5, 2014 at 6:11 pm

If you can only read one, Mismeasure of Education is a revelation. It is comprehensive and well written. It is so engaging I was stunned, because I dread these things even when they are about what most interests me. I can read Diane, Weiner, and the lady whose name I can’t remember, the one who’s out there in the same circles, but most of the stuff gives me a headache. A lot of it is the language and lack of human elements. Some of it is cold and free of viable connotations, so it makes sense that teachers never see this assault coming. It reminds me of the standards I had to learn as a credentialed teacher. Redundant, logically didactic and without style.
Horn writes in such an easy, affable style; it is a tribute to his intelligence that he can take admittedly dreary topics and ignite one’s interest by actually providing a concrete sensibility to the events and practices inside. He knows his audience and his topic. This is especially so in his explication of the history of standardized testing. Ironically, it grounds you in what you know as a student and as a teacher or a parent. Abiding the polite academic tone we expect, Horn proceeds to let the devil in the details move readers.
When you discover how the earliest standardized tests were written and analyzed, I guarantee you will have an epiphany about the value of objective data and appreciate how passionately the statistics are divined in what is a very practical yet time-consuming effort. This is something that is less about proving anyone’s hypothesis. Testing then is about discovery. This mid-20th-century inquiry also bears out what we know about student performance repeatedly with many configurations. Poverty is the biggest threat to proficiency, and diversity is the best thing for all students and the future of society. That we fail to abide Brown in this day and age is an affront to our national philosophy.
You arm yourself with this book when you go back in Fall. You tell the principal why the test scores at your high-need urban HS are bunk. This book can and should empower all of us. It will eventually become evidence in court.

Duane Swacker says:

July 7, 2014 at 7:03 pm

“If you can only read one. . . ”

. . . make it “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700

A major faux pas for Diane and/or Bob for leaving out THE MOST IMPORTANT AND NEVER REFUTED NOR REBUTTED STUDY that completely logically destroys educational standards and the accompanying standardized tests. (But thanks, Diane, for providing another opportunity to get the word out about Wilson’s work.)

There is no excuse for ALL EDUCATORS to not read and to not understand THIS MOST IMPORTANT OF ALL STUDIES.

Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine. (updated 6/24/13 per Wilson email)

1. A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing the descriptive information about said interactions is inadequate, insufficient and inferior to the point of invalidity and unacceptability.

2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).

3. Wilson identifies four “frames of reference,” each with distinct assumptions (epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think of a college professor who “knows” the students’ capabilities and grades them accordingly); the General Frame (think of standardized testing that claims to have a “scientific” basis); the Specific Frame (think of learning by objective, like computer-based learning, getting a correct answer before moving on to the next screen); and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error in the process is caused when the assessor confuses and conflates the categories.

4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”

In other words, all the logical errors involved in the process render any conclusions invalid.

5. The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid, that is, errorless or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the test and results are. He is an advocate for the test taker, not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. A basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.

6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”) or, to put it in more mundane terms, crap in, crap out.

7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”

In other words, it attempts to measure “’something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students, as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?

My answer is NO!!!!!

One final note with Wilson channeling Foucault and his concept of subjectivization:

“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self evident consequences. And so the circle is complete.”

In other words, students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, the “I’m an ‘A’ student” story is almost as harmful as “I’m an ‘F’ student” in hindering students from becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
