A reader said it would be expensive to release all test items as the test publishers would have to spend lots more money creating new ones.
Actually there is another way to think about all this. Release all the items as a public bank of thousands of items. Each year’s tests can be reviewed and howlers would be omitted from future use. No student could possibly take or memorize every item. The bank of tens of thousands of items would make good fodder for test prep.
Let’s face it. The items are recycled now. It is a time-honored practice in NY state for teachers of Regents exams to use old versions of the Regents. Sometimes the old questions appear again–either in exactly the same form or so slightly modified that it doesn’t matter.
Why did Pearson charge NY $32 million for a lot of items that had been pulled out of its old testing bank, recycled from other states? The Pineapple story had been used and ridiculed in other states but Pearson didn’t retire it. Now they have pineapple juice on their corporate face. Why did Pearson charge Florida $250 million for a test that students took in two shifts, with no change in the items? Now Pearson will up their fee to pay themselves for test security.
At some point, the public will see all this as a great farce that takes money away from instruction and plows it into redundant and error-prone testing.
Diane
The latest numbers I saw said that less than 20% of teacher prep programs provide advanced course work in assessment design. That’s not to say that teachers don’t understand quality items or teachers can’t distinguish good items from bad. Assessment illiteracy – to borrow a phrase from Popham – is an issue that harms our profession. A great many teachers are amazing artists at day-to-day assessment and authentic feedback cycles. Our field, though, isn’t doing a sufficient job preparing educators to understand the science of assessment.
That said, I’ve seen teachers ID an item as challenging when the released item analysis data shows that a high percentage of students got the question correct and the technical report doesn’t show any flags for any subgroups across the state. So what our adult eyes read as hard or no good, students have, on many items, answered overwhelmingly correctly. While I’m an advocate for transparent systems, I fear that looking through thousands of items will turn people into armchair psychometricians.
I agree it is a farce and this current system is untenable. I wonder, though, if the solution lies somewhere in the middle. Perhaps a subset of secure items that are used to anchor the tests for vertical alignment and a larger public pool that items are pulled from?
@DianeRavitch,
I’m afraid you are ducking the question.
Creation of such a pool of high quality items would cost a great deal of money. Even if we could wave a magic wand so that the industry was capable of producing so many items so quickly — which it clearly is not — it would cost a great deal of money.
Would you advocate for the necessary money to spend to develop such massive pools of items for every tested content area and tested grade combination?
Even if we ignore all of the complications of such a system, and even if we could wave our magic wand to remove all of the barriers to such a solution (other than money, of course), would you be willing to devote the money necessary to do so?
But there are also practical concerns with being able to do such a thing.
It seems as though you want all the items created up front and then used for years and years.
Where do we get the capacity to develop those items so quickly? Let’s say that the industry was able to ramp up. (It’s not even up to doing Common Core once, but let’s just say it could.) What happens to the industry the next year? Where do all of those people go? All those people who were working hard for one year are out of jobs the next year?
We are presuming these are high quality items, right? I mean, not only reliable, but actually — gasp! — VALID items. That would take really smart and educated and well trained people. It would be an amazing effort. No learning curve or anything, because it would all be done at once.
What happens in year 2?
Which of the really smart and capable people we need to set this system up would take the job, knowing it has such a low chance of even existing in a year? It’s just a temp job.
Even if the industry kept adding to the pool each year, the first year would take a HUGE number of items to seed the pool, so there would be a TON more work in year one.
Don’t say that we can use all the historical items to set it up. We are going to Common Core, which is QUITE different than what has traditionally been tested. A huge bank of Common Core aligned questions — and Common Core compatible questions in science and social studies — would have to be developed from scratch.
How does this happen?
You are thinking inside the box. I oppose high-stakes testing, so I would not be sorry to see the industry fold its tent and fade away. I went to public schools with no high-stakes tests, in fact, with no standardized testing other than a single administration of the SAT and occasional idiotic tests to predict our future careers. My children went to schools with no standardized tests. I think our society has gone mad for testing, and ranking and rating and grading everyone. The testing has gotten out of control. I wish we could have five years with only no-stakes tests and see how we do. When people’s lives and careers hinge on a score on a test that is fundamentally tied to socioeconomic status, race, and class, then I think we have gone off the deep end.
Diane
High-stakes testing is nothing more than a means to privatize public education. It’s a form of data mining to be able to state that public schools are failing. Create flawed tests that purposely deflate scores and you get the data you need to fit the agenda.
Last time I checked, the US’s economy drives the world. Pretty good despite a “failed public education system”.
Pearson getting $35 million to recycle poor tests state to state while claiming secrecy is needed for test security is evidence that it’s all about the money. Billions are up for grabs with this latest scam.
It’s almost like Blackwater is now in the ed business.
I don’t disagree with you on any of that. I really don’t.
My question is how we get there from here. I don’t see that as a possibility — at least no time soon. We are stuck with big tests, and even if we can lower the stakes and/or make them a smaller piece of the decision, we still have big tests.
So, what do we do about these tests? How can we make the tests better, while also reducing their role? How do we accomplish those goals? And what do we do in the meanwhile to improve the situation?
To adopt what you propose would take a massive infusion of money for item development. Would you be willing to support that?
I don’t want the perfect to be the enemy of the good. I want to make progress and improve the situation. (Yeah, I have my pipe dreams, too. But most of the time I try not to rant and try to figure out how to make it better sooner than the time frame in which my dreams can be accomplished.)
Reblogged this on School Refresh and commented:
Just think what we could do with all of the billions of dollars we spend on standardized testing each year.
A modest proposal about “how to get there from here”: Maybe if we (a collective we-measurement/education/policymaker community) put our heads together about the most useful/realistic PURPOSES of large-scale test data, a much more sane (and limited) system could be developed. I don’t love the NAEP test, but if we need a point in time, nationally comparable measure of student achievement, their sampling procedure makes sense (I sure wouldn’t claim that NAEP measures all kinds of student achievement, but that’s not the purpose of those data?) Then if we apply the same logic to other areas, the task may become more clear:
Want to evaluate teachers? Don’t use a student achievement test not built for that purpose – come up with multiple measures designed specifically to evaluate teachers.
Want to evaluate schools? Don’t solely rely on a large-scale statewide student achievement measure that students aren’t invested in and ends up narrowing the curriculum – design multiple measures for schools that are worthy outcome measures for worthy goals (we could look at good accreditation programs!)
Focusing on the purpose of the data could reduce threats to validity (since validity is a score USE issue) and reduce many of the negative impacts of large scale testing.
Are you aware that the PARCC and Smarter Balanced folks are doing just that? They are trying to develop these Common Core tests with huge amounts of federal money and they are trying to figure out what the states want the tests for.
The states (governors and chief state school officers) want to be able to examine students’ levels and their growth. They want these tests to be used to evaluate teachers. They want these tests to be able to compare states, schools and districts. They want THESE tests to be used for all that and more.
These are hoped to be the next generation of state tests. Every grade. Massive money. The best standards. That’s what folks think.
They want to be able to compare a student in one state with a student in another state, across consortia.
Those of us in schools, who think about pedagogy and curricula and students, and those of us in assessment who think about the development of and valid uses of assessments are not making these decisions. It’s the politicians.
Bloomberg ran, in 2001, saying he wanted control of the schools and he wanted to be judged by the performance of the schools. Look at what he has done with the schools in NYC. Look at what he has done with assessment. Well, he’s been reelected twice.
Democratic control of the schools is a good thing. I really think it is. But the cost of that is that there are folks — and they keep winning elections — who want to use tests in ways that are not valid. It’s not a technical question, because we already know the technical answer. These are political questions subject to democratic oversight.
That’s where the battle is.