Atlanta Journal-Constitution reporter Heather Vogell conducted a two-year investigation of standardized testing and discovered many errors in the tests.
Yet despite all these errors, test scores are being used and misused to make life-changing decisions about students, teachers, principals, and schools.
Standardized tests are a weak reed on which to base a decision for firing staff and closing schools.
Vogell and the AJC were key in exposing the widespread cheating scandal in Atlanta. Now the AJC reports:
Miscalculated scores, flawed questions and other errors on standardized tests have become near commonplace in public schools across the country, according to a new investigation by The Atlanta Journal-Constitution.
Repeated failures in quality-control measures have allowed mistakes to keep happening even as testing took on a more crucial role for students and teachers, the newspaper found. In some cases, students have been initially denied diplomas or entry into special academic programs because of incorrect scores.
The findings expose significant problems in the execution of the landmark No Child Left Behind Act of 2001, which sought to use test scores to hold schools publicly accountable for students’ academic performance. The newspaper has previously reported other problems with standardized testing, exposing widespread cheating in Atlanta Public Schools. A follow-up investigation in 2012 revealed nearly 200 school districts nationwide had high concentrations of suspect scores.
In the current year-long investigation, the AJC’s Heather Vogell studied test design, delivery and scoring and reviewed statistical reports on the quality of more than 92,000 test questions given over two years to students in 42 states and Washington, D.C.
The investigation revealed that almost one in 10 tests nationwide contained significant blocks of questions that were likely flawed. Such questions made up 10 percent or more of those tests — threatening their overall quality and raising questions about fairness.
Anyone wanting to read about the built-in flaws of the testing industry should read Todd Farley’s book Making the Grades, which exposes the scandalous lack of qualifications of those hired to score test questions, their poor training and supervision, and the superficial attention they give to the answers students write.
There is no question that any testing with an objectively scored performance component will have scoring errors that affect its validity. It could be argued that such performance items are the only ones capable of revealing reasoning in mathematics, and are therefore the most significant. When you factor in the demands of language competence, both in interpreting such questions and in effectively expressing reasoning, the validity of such measures drops through the floor, despite extensive and costly attempts to standardize the items and score them consistently. This is why relying on a single measurement tool is ineffective at every level, and one of the reasons a high-stakes test is poised to fail students by not giving their teachers useful data about their reasoning. The definition of assessment is changing, and will ultimately need to include responsiveness to in-the-moment informal measures, teacher planning with informal assessment in mind, and the collection and discussion of student work, including documentation of non-written performances in class. This demands a high level of content and pedagogical knowledge from teachers and administrators, and suggests that the best tool for moving children forward is teacher professional development in responsive instruction, informal assessment, and math content.
Please don’t forget the other big problem, cultural and economic bias in test questions. It’s heartbreaking to me when I’m watching these kids take the test. They’re smart kids, good kids, but they often mark the “wrong” answer just due to a difference in perspective.
One example I can think of was this: “Which of the following is the safest place to put your money? A. Piggy bank B. Mattress C. The bank D. In your wallet.” Every one of my kids chose mattress or piggy bank. Many of them come from families that do not have bank accounts and do not trust banks at all, and I do not think their curriculum covered the FDIC, so they just answered with what they know from home. Grandma says to put your money in your piggy bank. Dad keeps his money under the mattress. Etc.
Another one was this: Choose the word that best fits the blank. “Carol started a new school on Monday. She didn’t know anyone. All morning, on her first day, she was sad. She felt lonely, and worried she would never find any friends. When the class went to lunch, a girl came up to her and said, ‘Hi, my name is Sarah. If you want, I could meet you after school and show you around.’ Carol was _____. She had never felt so _____ in her whole life.” The options were A. Bored B. Happy C. Scared D. Tired.
Every child chose C. Scared. Every. Single. One. In their world, that of a pretty bad neighborhood where they don’t know what’s going to happen to them from day to day, someone you don’t know wanting to meet you after school is a very bad and scary thing.
I’m tired of hearing that my students aren’t smart, or that they need “interventions”. I’m sick of them always being put down, and us teachers being scolded as “bad teachers”. Every judgement made against them is based on a stupid multiple-choice test, written by the privileged class. Who are they to judge my babies? Why can’t we be trusted to grade our students’ performance accurately? I’m so tired of it. I just want to quit. But then I’d be quitting those kids I care for so much…. Vicious cycle.
While I welcome the attention this article brings, in fact the problem has been well-documented for more than a decade. There have been multiple news stories about such problems in major media, and some of us have been writing about such problems for close to two decades, and yet we have had trouble getting traction with policy makers.
I remember going through ONE year’s High School Assessment in Government with a panel in Prince George’s County that was worried about the impact of test scores – a panel that included representatives from the County Board, the School Board, the Principals’ Association, the community, and the school administration. As it happens, then-Superintendent John Deasy found a reason to excuse himself. In a test with 58 multiple-choice questions, if I recall, 9 had either no correct answer as phrased or more than 1 correct answer. More than 20% of the test was on economics-related issues, even though that was well above the range for that content, and in two cases (and thus four questions) there were effectively two versions of the same question.
I can also remember that when the Maryland Department of Education released sample questions for the HSAs about 15 years ago, the sample question for government had two correct answers as phrased. I can remember questions that required knowledge of other content to answer correctly – biology, for example, which then was often not taken until 10th grade, even though government was then a 9th-grade test.
And even those questions that did not have multiple or no correct answers were often poorly phrased and low level. Some unfortunately displayed prejudice.
This problem has been around a long time.
Testing companies and curriculum/book companies (which too often are the same, especially in the case of Pearson) have continued to expand, despite a history of errors, of having to pay fines, and of demonstrated low quality. They effectively “bribe” policy makers through trips and conferences paid for by their “foundations”, and they seem to consider the fines a cost of doing business, rather than spending the money up front for appropriate quality control.
And thus students suffer inappropriately, teachers are punished, schools are demeaned, restructured and closed, and the profits keep flowing to companies like Pearson.
How does this improve our education?
How does this help our students?
How does this benefit American society as a whole?
This is an old story.
We can only hope that this time more people will pay attention.
“. . . a panel that included representatives from the County Board, the School Board, the Principals’ Association, the community, and the school administration. . .”
What percent of the panel was made up of government/social studies teachers?
None – in fact, I was the only teacher asked to testify to the panel, and that was only because I was also known to be knowledgeable about testing. The panel was set up by the County Board, which worried that if the percentage of students failing the state tests, and thus not graduating on time, were to skyrocket, it would affect property values and make it harder to recruit new businesses to the county.
Points to the obvious problem, eh!
I guess I need to get on a board that oversees surgeries so that we can eliminate all the problems associated with surgery!!
I note the similarity of your snarky response to this famous blog post by former National Teacher of the Year Anthony Mullen.
Except in this case it is a bit misplaced.
Among the others who testified were school testing coordinators, principals, university professors, etc.
The state department of education sent a representative from its testing office to sit on the panel, only he lost it and acted very unprofessionally while a university professor and I were testifying. As a result, I was asked to write up what was wrong with what he said, and I believe he was reprimanded by the head of his office, for whom he was sitting in that day.
The purpose of the panel was to determine what the community should do, first to understand the crisis and then to respond to it. That response could include working through the state legislative delegation, providing more resources, etc. It was NOT involved in setting policy; had it been, the kind of comments you and Mullen offer would be totally appropriate.
And unlike Tony’s experience, I was invited specifically to offer my observations, experience and expertise, and I was taken quite seriously by the panel.
Ken,
Thanks for that link. I guess great minds think alike, or maybe they don’t and it just seems that way-ha ha!
I really wish that when people write columns like that they would name names. Without the names the story lacks punch. Call these folks (and I’m playing nice using that term) out, allow all to see the idiocies they are promoting. Put names to these faceless edudeforming politicos.
How do you stop it?
By “people” I suppose you mean Obama voters.
“. . . in some cases, students have been initially denied diplomas or entry into special academic programs because of incorrect scores.”
Should public schools be using sorting and separating devices that cause as much harm as described above? This was my response to a statement by Diane in a post from long ago about the purpose of public education. The first quote is hers; my response follows.
“What is the primary goal of education? To assure that the younger generation is prepared in mind, character and body to assume the responsibilities of citizenship in our society.” Not quite. There are thousands of mission statements out there, probably at least one for every district and school, and more likely than not those are “secondary” statements. What is the primary goal of public education? And where can it be found?
To answer the second question first: in each state’s constitution, in the article that authorizes public education. So in essence there are 50 different goals/purposes, although I suspect that they are similar in nature to what Missouri’s constitution has to say: Article IX, subsection 1a: “A general diffusion of knowledge and intelligence being essential to the preservation of the rights and liberties of the people, the general assembly shall establish and maintain free public schools for the gratuitous instruction of all persons in this state within ages not in excess of twenty-one years as prescribed by law.”
I’ll let you decide what “A general diffusion of knowledge and intelligence being essential to the preservation of the rights and liberties of the people. . .” means. But I do not see anything about “preparing students to assume the responsibilities of citizenship”, whatever those “responsibilities” may be. We have assumed a purpose that may or may not be in concert with what the constitution says, so I have concerns with mission statements that go beyond the basic purpose as delineated in the constitution.
Now the “prescribed by law” part can be a problem in that some laws may be unconstitutional, e.g., those establishing segregated schools. And I believe that when we sort and separate students using grades and standardized tests (to name a couple of nefarious practices), rewarding some with state-funded scholarships, special treatment, awards, etc., and sanctioning others by denying them scholarships, holding them back, or giving them a certificate of attendance instead of a diploma, etc., then we, the public schools, are discriminating against a certain class of student: those who, through no fault of their own (in essence, like skin color), don’t “live up to the standards”. And in doing so we are contravening the fundamental purpose of education and causing harm to some students.
And “The findings expose significant problems in the execution of the landmark No Child Left Behind Act of 2001. . . ”
Those problems were exposed well before NCLB came into being, most devastatingly in Noel Wilson’s 1997 “Educational Standards and the Problem of Error”, found at: http://epaa.asu.edu/ojs/article/view/577/700
So read and learn, come and join my Quixotic Quest to rid the schools of these abominable, nefarious, damaging and damning educational malpractices that are the sorting and separating of students through educational standards, standardized testing and the “grading” (as if they were a product to be consumed-maybe they are to be consumed by those in power) of students.
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine. (updated 6/24/13 per Wilson email)
1. A quality cannot be quantified. Quantity is a sub-category of quality. It is illogical to judge/assess a whole category by only a part (sub-category) of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as one dimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing we are lacking much information about said interactions.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference”, each with distinct assumptions (epistemological basis) about the assessment process, from which the “assessor” views the interactions of the teaching and learning process: the Judge (think of a college professor who “knows” the students’ capabilities and grades them accordingly); the General Frame (think of standardized testing that claims to have a “scientific” basis); the Specific Frame (think of learning by objective, like computer-based learning where you must get a correct answer before moving on to the next screen); and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program, where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error in the process is caused when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words, all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid, that is, errorless or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations, the tests and the results are. He is an advocate for the test taker, not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. A basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words, start with an invalidity, end with an invalidity (except that by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”), or, to put it in more mundane terms, crap in, crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words, it measures “’something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students, as the social rewards given to some are not available to others who “don’t make the grade (sic)”. Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self evident consequences. And so the circle is complete.”
In other words, students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of students have not developed the mental skills to counteract what the “authorities” say, they accept that “story/description” of themselves as “natural and normal”. Although paradoxical in a sense, “I’m an ‘A’ student” is almost as harmful as “I’m an ‘F’ student” in hindering students from becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
With all due respect to the owner of this blog, I would amend the last part of her last paragraph from “and the superficial attention they give to the answers students write” to “and the superficial attention they are required to give to the answers students write (if the scorers do otherwise they are penalized).”
This is not a small change. Even with the best intentions, adequate and timely training, and the requisite educational background, scorers do not set the terms of their employment. The companies hiring them mandate the conditions of their work, such that the hurried superficial thinking required of the students taking the test translates into the hurried superficial scoring that occurs afterwards.
Most importantly, though, I commend the mention of Todd Farley’s MAKING THE GRADES: MY MISADVENTURES IN THE STANDARDIZED TESTING INDUSTRY (2009). Revealing, engaging, short, inexpensive.
Reading his book you might experience that most rare of treats nowadays, “the joy of learning.”
If so, you will hardly notice whether or not you saw “author effectiveness” on display or “reader performance” rise or the USA win the “global readership competition.”
🙂
I also commend the Farley book, as I did when I reviewed it. However, as I noted in that review, scoring of constructed answers does NOT have to be done by untrained people under the kinds of pressures he describes, and as an alternative I offer my experience as a Reader (grader) of the Free Response Questions on the AP US Government and Politics exam (although I would note I have some problems with the way FRQs are graded, at least they are graded by people who are competent in their background, and I have seen Readers dismissed because the quality of their work was unacceptable).
teacherken: I appreciate and welcome your comment.
Wouldn’t it be wonderful if even one of the leading charterites/privatizers could approach “education reform” with even a smidgeon of the nuanced thinking that occurs on this blog alone?
Keep posting. I’ll keep reading.
🙂
Teacherken,
I second your endorsement of the “AP scoring method”. They are hand graded by competent people with many checks and balances along the way.
May I offer a hypothesis as to why we will not see the method replicated for the ever expanding array of everyone must take ’em mandatory testing?
Perhaps not enough profit in it that way.
With all the huge money going to testing and testing companies, the public has a right to expect near-perfection with tests, testing, and scoring. “Accountability” is the new reality for teachers and schools, so why does it not apply to the testing companies, the ones who actually GET the big bucks for their work? Time for accountability in that arena. School districts and states should DEMAND it!
Checker Finn doesn’t think we work in the real world (his “twittery” quote on the Fordham Institute site).
My comment (among 3 I left at the Fordham Institute site).
Does C. Finn think we do not work in the real world? My training at B.U. in the 1960s started with the measles epidemic, when there were blind and deaf children in cribs with no intellectual stimulation and insufficient daily care or contact with other human beings. It was no better than a Russian orphanage. MA was among the first to get the children out of the institutions (Chapter 766, which became P.L. 94-142), among the first to tie new tests (with reliable and valid approaches and Technical Manuals) to the curriculum, and also among the first to mandate health care. When a child moves into the homeless shelter where I volunteer, we can sign them up for health care the very same day. Fordham, just stay out of MA and stop using MA in your opening paragraphs to tell other states that your policies deserve credit for success in MA, because that is an out-and-out lie (Romney didn’t do it either, but Romney claims he got the Bulger curve out of the state payroll$$$$$$)… another lie.
Just leave us alone and stop citing evidence from MA when you have absolutely no understanding of what went on here in the 1960s and the 1970s (when Greg Anrig was Commissioner, who later went to ETS). Please see the additional references I have posted on the standard-setting work that ETS started here back in the 70s with the MEAP test, before MCAS. And just stop beating up on all teachers everywhere, because they don’t deserve it. You are making tried and true professionals like myself very angry.
From the article:
“Chris Domaleski had a problem and its name was Andrew Lloyd Webber.
Question 42 on Georgia’s sixth-grade social studies test had asked whether Webber was a playwright, painter, sculptor or athlete.
The famous composer of Broadway musicals, however, was none of those things. But what should Domaleski, the state’s testing director, do?”
Andrew Lloyd Webber on a 6th-grade social studies test?
Wow. Just wow.
More than a year ago, the Chicago Teachers Union proposed a resolution to the AFT convention that would make every high-stakes test public once it has been administered. Presently, critics of the testing mess are thwarted by “test secrecy” laws (which were cited against me by Judge Richard Posner in his Seventh Circuit opinion in Schmidt v. Chicago Board of Education) which prevent the public from actually going over the tests themselves, and the “rubrics” used to score them.
The only solution is going to be a return to democracy state by state through the full public disclosure of every test and every scoring. This is not impossible. More than 15 years ago, both Texas (that’s right, when George W. Bush was governor) and Massachusetts released both their tests and their scoring methods. That was in the days of the TAAS and MCAS, and they made fascinating reading.
Sadly, by the time we published the infamous CASE (Chicago Academic Standards Examinations) after they were administered in the January 1999 edition of Substance, test secrecy had become the norm, and with Judge Posner’s decision, test secrecy (and “copyright”) thwarted all First Amendment rights to review how public dollars were being spent. More than $1 million on the CASE tests alone here in Chicago.
When every state gets to the point where every high-stakes test is public, we can have a sane discussion. Meanwhile, we are living and working in a Through The Looking Glass, Orwellian, Kafkaesque world where we are told to trust what comes out of the Black Box (the high-stakes test) without knowing anything about what’s in the Black Box.
Absurd? Yes.
True today? Also, Yes.
It will take a couple of years, once all the tests are fully public, for democracy to return. At that point, what we will see is that “we” have spent a couple of decades enslaved by delusions and frauds and hoaxes. Those tests that remain will be put in their places, which are very narrow, and everyone can look back in history, from years hence, and ask how more than a generation of Americans could have been enslaved in their minds by such nonsense.
Meanwhile, the Race To The Top continues, and the BS that spews from Arne Duncan expands with the Common Core “assessments” that are now spreading across the country like those nitrogen compounds used to goose crop yields (and lead to the spreading of slime across vast expanses of oceans…)…
George’s point on disclosure is important. It is one way errors can be caught. It demonstrates when there are problems in a test.
As it happens, I grew up in NY State, which in those days had Regents (state board of education) tests in so many subjects. As a side note, they were scored in the building and the scores counted as part of our final grades. Anyhow, NY State released them every year, so there was a pool of released tests with which students could practice. Oh, and the fact that they released so many tests gives the lie to the argument that if states released tests they would not have enough items (questions) to use.
New York State for a long time, if I recall correctly, had a law requiring public release of all tests used for admissions purposes. As a result, for a while, the GMAT for business school was not given in the state. I think that’s correct.
I’m not sure how they handled APs. I think I remember there being released AP US History and English tests in my day – graduating from high school in 1963 (yes, I am that old, I attended my 50th reunion in June).
Of course, corporate influence over educational decisions is far greater now. And as we see concentration with ever fewer companies in the field (in the case of textbooks and curriculum) that concentration leads to even greater influence.
It is very sad to think that a British company (Pearson) and technology companies (Apple and Microsoft) now have more influence over educational policy (including testing) and curriculum than do teachers and their professional organizations.
Did no one then, in NYS, take the SAT or the ACT?
Stop being dense, Harlan. One reason the SATs used to be released nationwide was the NY State law. In NY there were no ACTs – that was the middle of the country.
Thanks, Ken. I had not remembered that the SATs were released nationally, but now that you mention it, I do recall discussions of the questions and one case in which a kid found an error in a question. Duh!
Just think of the ramifications for AYP for the teachers who follow the group whose tests were misgraded. Those errors will affect the students as well as all their future teachers. Those students could be incorrectly scored as not meeting AYP when they actually did, or vice versa. So, what can be done? Again, with the low-paid, low-skilled, low-concern people who are hired to grade these tests, we again see that the priority of the testing companies is to show a profit, no matter who is hurt in the process. Therein lies the problem, the root, of this testing mania. Yet teachers get the bum rap for being “out for themselves”. I think they/we are searching for a way to just state the truth. Can we find it?
Is there then, a level of “legitimate” profit for a business, ever?