Florida had widespread problems with its FCAT, delivered (or not) by Pearson. Pam Stewart promised to seek damages from Pearson. Remember the bad old days when teachers tested students, graded the tests, and students got immediate feedback? Now state officials trust Pearson more than teachers. Who peddled the idea that all testing should be done online?
Here is a report from FairTest:
FairTest
National Center for Fair & Open Testing
for further information:
Bob Schaeffer (239) 395-6773
cell (239) 699-0468
for immediate release, Tuesday, April 22, 2014
FLORIDA COMPUTER TEST PROBLEMS NOT UNIQUE;
OTHER STATES EXPERIENCE SIMILAR SYSTEM FAILURES;
NEW POLITICALLY-DRIVEN EXAMS “NOT READY FOR PRIME TIME”
Today’s technical problems, which disrupted computerized testing in many Florida districts, are far from unusual. Many other states have experienced similar failures, according to the National Center for Fair & Open Testing (FairTest), which monitors standardized exams across the country.
Earlier this month, the statewide testing systems in Kansas and Oklahoma both crashed. Last year, technical problems disrupted computerized exams in Indiana, Kentucky, Minnesota, Ohio and Oklahoma. In the recent past, new, automated testing programs collapsed in Oregon and Wyoming, requiring administration of replacement, pencil-and-paper versions.
After root-cause investigations, both Wyoming and Oklahoma levied multi-million-dollar fines against Pearson, the same testing vendor Florida uses. Wyoming declared the company in “complete default of the contract” and replaced it. Oklahoma let its contract with Pearson expire.
American Institutes for Research, the company that takes over testing in Florida next year, was responsible for computer exam problems in Minnesota in 2013. The firm’s contract was not renewed.
“The reason for so many screw-ups is simple,” explained FairTest Public Education Director Bob Schaeffer. “The technology supporting statewide computerized testing is not ready for prime time.”
Schaeffer continued, “Like many other testing policies, politicians imposed new requirements before systems had been thoroughly developed and beta-tested. There are at least three separate problems. Many schools lack the up-to-date computer equipment and other infrastructure needed to mass administer tests. Large numbers of districts do not have the internet bandwidth to handle the volume. Some testing company servers do not have the capacity to meet the surge of demand from multiple locations logging on simultaneously.”
FairTest supports Florida school superintendents and communities seeking a multi-year moratorium on attaching consequences to the state’s new tests. Schaeffer has lived full-time in southwest Florida for almost 15 years.
– – 3 0 – –
– links to clips documenting computer-testing problems in other states and a detailed chronology of Pearson’s history of testing errors are available on request.
Why the rush? Because the few oligarchs behind Common Core knew this program and all of its draconian elements wouldn’t be popular once people (outside the profit river) learned what Bill Gates and his flock of billionaire supporters were up to. That’s why there’s so much secrecy. If there were nothing to fear, the program would have been transparent and would have been beta-tested the way programs have almost always been implemented in public education.
Computer testing puts children with no computer access outside of school at a great disadvantage.
I’d actually say that taking standardized tests on computers puts many students at a disadvantage: visual fatigue, feeling “removed” from the testing materials, and struggling with the interface are all problems, especially in the lower grades. Children who are used to reading with their fingers beneath the lines don’t have the opportunity to do that or to make marks on the test booklet; it sounds like a small thing, but it’s not. The tests might be cheaper to score, but they tie up computer labs for the whole school, and thus have associated opportunity costs, since the labs can’t be put to other uses (never mind the time spent practicing for the tests). Students who can’t type would be terribly frustrated composing essays…and the list goes on….
Here is a thought experiment. Redefine an effective teacher as one who engages students in conversation about what they have learned, how well, why that matters, when, and to whom. Forbid scores, tests, grades, and metrics as substitutes for this central feature of the human, and humane, teaching-learning process. Reserve time for students to propose topics worth learning about and put the supports in place for them to initiate and follow up on lines of inquiry. Sounds a bit like the best of John Dewey’s lab school days.
Why on-line tests? In theory, the answers to questions would “automatically” be part of the system that produces the summary metrics that are used to label kids and teachers. You can skip some steps in cleaning up the data (e.g. verifying a bunch of inputs and codes, dealing with missing data). Cost savings.
You can have curb appeal. Questions can be built around illustrative material with color, animations, and “if-then” sub-routines, none of which is possible with paper-and-pencil tests. This generation likely has the expectation that an on-line test will be “interesting.”
Tests and training can be united. The tracking systems developed for various gaming environments are latent or explicit models for next-generation tests. You combine vintage “programmed instruction” training modules with the idea of an all-knowing teacher, described as a “recommendation system.” The system moves students toward a predetermined goal. Amazon and Netflix are examples of recommendation systems. Testing thus becomes the impetus for, and integral to, training. Training is not the same as education, but who cares about such distinctions? Compliant, well-trained students are the intended output of the current test-driven regime.
With digital imagery, examples of graphics from textbooks and ancillary videos can be migrated into the online test environment with little additional cost.
You just re-purpose the huge reservoir of existing print and audio/visual resources, especially those acquired or developed for conventional content (the focus of most tests) and owned in-house. You can bypass the ritual of seeking permissions and the inevitable delays. Cost savings are huge.
By definition, most “academic” tests are dealing with well-worn content, tweaked from one year to another. About 10% of the items on a typical test are there just for field testing, not counted in a student’s score but candidates for inclusion in an “item bank” for future tests. Some of the examples in the Common Core State Standards are thinly disguised test items, just moved down from college to grade 9 or 10, then shoved further down to Kindergarten.
Here is an example of recycled content from the American Diploma Project (ADP) in one of the Common Core State Standards, specifically ELA Standard RL.9-10.7, for students in grades 9-10: “Analyze the representation of a subject or a key scene in two different artistic mediums, including what is emphasized or absent in each treatment (e.g., Auden’s ‘Musée des Beaux Arts’ and Breughel’s ‘Landscape with the Fall of Icarus’).” This standard is identical to a benchmark assignment in the ADP project, which came from an Introductory English Survey Course at Sam Houston University, Huntsville, TX. This exact example appears on pages 98-99 in Achieve (2004), American Diploma Project (ADP), Ready or Not: Creating a High School Diploma That Counts, http://www.achieve.org/readyornot (see pages 105-106). This example also illustrates one meaning of “rigor,” namely, making 9th or 10th grade assignments the same as collegiate studies.
Of course, the whole architecture of on-line tests depends on the “plumbing.” It is not in place for testing on a large scale in a short window of time. The security issues are also huge.
The more fundamental flaw is the assumption that results from these large-scale standardized tests can and should drive instruction. It is clear that they are high profile sorting devices, one-size-fits-all creators of league tables.
Laura, I put that very question into a McDougal, Littell textbook back in the 1980s when I was a baby editor working in house for them on a literature program. That very question. Of course, when I wrote that question for that program, I was imagining class discussion, students taking their time with the pieces. Great stuff for a class. Terrible for a test. In the final product, I recall, we had an overhead transparency for the teachers to use of the Breughel. I think that this was in a 12th-grade, Brit lit text. It’s been a long time.
I see a lot of this recycled material being, now, pushed down in level. The same now-stock questions and selections. A teacher friend recently showed me some test prep lessons she was using in class because she thought they were pretty decent for test prep. Imagine my surprise when I found that there were three folktales in them that I had written for an entirely different program about 15 years ago. The wording of the texts was exactly the same. The use was entirely different. They were in the publisher’s database of stuff to be recycled. And increasingly, that’s exactly what the educational publishers do. They have invested heavily in technologies for churning old material into new.
I was horrified recently to hear that Denise Levertov’s amazing poem “A Tree Telling of Orpheus” was being used in one of these tests. I thought that a terrible blasphemy, for it’s a sacred work, in my book. I put that poem into McDougal’s program years ago when it wasn’t, yet, in any of the other basal lit anthologies. It’s now in all of the American lit texts for use in the junior year. I picked it up from a great class on American Women Poets at Indiana University. In my book, it’s one of the finest poems ever written by an American. It’s sickening to think of that holy work being in a standardized test; thinking of a student being introduced to it in that context makes me ill, especially given the subject of the poem: an encounter with art that turns out to be terribly wrenching and utterly transforming.
“Here is a thought experiment. Redefine an effective teacher as one who engages students in conversation about what they have learned, how well, why that matters, when, and to whom. Forbid scores, tests, grades and metrics as substitutes for this central feature of the human, and humane, teaching-learning process. Reserve time for students to propose topics worth learning about and put the the supports in place for them to initiate and follow-up on lines of inquiry.” My, Laura, that is beautifully said!
Thanks. Gates would not be pleased.
Still trying to fathom how easily the definition of “effective teacher” was hijacked and reduced to “producing a gain in test scores,” as if this were the gold standard for excellence in education.
FACT= Florida Cannot Assist(or Assess) Test
>>FACT= Florida Cannot Assist(or Assess) Test
Meant, fact about FCAT. 😛