In this post, which arrived a few days ago as a comment, Ron Lapekas, a retired teacher, explains why standardized tests have no value or validity for many students:
“I am a retired teacher. I always thought the SBTs (Stupid Bubble Tests) had little value for my East Los Angeles 99% Latino students, for several reasons.
“First, the vocabulary necessary to understand both the questions and the answer choices made any test results meaningless, even in math. If you don’t understand the question, how can you evaluate the correctness of the answer?
“Second, we didn’t get the results until the end of summer. I never gave SBTs to my students because, as I told them, I grade work, not answers. If a student doesn’t know which answers were incorrect, if there is no way to review how the answers were selected, and if there is no way to give feedback to the students, SBTs are not educational tools at all.
“Third, SBTs are so standardized that they are useless for our most challenged and disadvantaged students. Unlike the business models used by the Broad-Gates advocates, you can’t order students to learn, and you can’t demand that they all learn at the same rate in the same way. That is like pushing rope.
“Fourth, evaluating teachers by student test results is like comparing the driving skills of drivers driving different models of cars built in different years. My students arrived with different levels of knowledge, so I taught from the lowest level. This bored some of the better-prepared students — but they were “better” because of test scores, not because they understood how they got those scores. By the end of the year, 70% of my students were at grade level and the other 30% had significantly improved their understanding. (As one teacher told me, he would rather have my “F” students than some teachers’ “A” students.) But I could do this because I had tenure.
“Fifth, the administrators have lost touch with the classroom. If they have been out of the classroom for more than five years, they have no clue how to teach the “new” standards. Therefore, they abdicate their duty to evaluate teachers’ teaching skills and instead use arbitrary test scores and “measurable” or observable factors such as disciplinary records, pretty bulletin boards, and classroom organization. For example, I was once down-checked because I re-taught a topic my students didn’t understand and, therefore, did not follow my scripted lesson.
“Lastly, SBT scores are used more as “evidence” to dismiss teachers than to identify areas in which to focus attention to improve teaching. As noted above, few administrators have a clue about how to teach subjects according to the “new” standards, so they use checklists with ambiguous and arbitrary descriptions.
“As most readers of this blog will agree, until all students enter a classroom with uniform background knowledge and skills, proper nutrition, and enough time to learn, evaluating teachers by the scores of their students will create false data for the Broad-Gates “data driven” models.”
“until all students enter a classroom with uniform background knowledge and skills, proper nutrition, and enough time to learn, evaluating teachers by the scores of their students will create false data for the Broad-Gates ‘data driven’ models.”
Even if all students somehow entered class with uniform background knowledge and skills, it *still* wouldn’t be fair to evaluate teachers by test scores because they have no control over what the student does. Teaching is an individual relationship and no teacher is going to be a great teacher for every student. My brother and I (both honors students) had many of the same teachers and disagreed about half the time about how “good” such teachers were. We weren’t the kind of students who would let our opinion of a teacher influence how we took a test (but then again, we didn’t grow up in the standardized test era either), but many kids would. Why should students have that kind of power over teachers, when all sorts of personality and motivation factors come into play?
“Teaching is an individual relationship and no teacher is going to be a great teacher for every student.”
Completely agree.
I have noticed over the years that often (but not always) certain teachers really shine with specific types of kids. Every time a kid says Mr. So-and-So can’t teach (or something like that), some other kid (and usually, in my experience of these young people, a different kind of kid, with vastly different interests, attitudes, and aptitudes from the first speaker) will pop up and extol the virtues of Mr. So-and-So.
This is why I always contend that there are relatively few “bad” teachers out there. Mostly just different people who connect best with different students.
PS: To head off any of our local trolls or members of the clueless brigade: yes, there are those who have no business teaching. They are, in my experience, very easy to spot, and well known to all.
The ESL teachers at our school just gave a wonderful presentation, first to our leadership team and then to all of our faculty, about SIOP, which is based on 10 years of observation of what works in helping students understand instructional vocabulary. Many times on these tests the students have so many holes in their vocabulary that they cannot even be certain what the question is asking.
Click to access 4_Overview_of_SIOP_model_Thrower.pdf
We have used the SIOP model for years. Very helpful for all, IMHO, but especially ELLs.
Regarding ELLs and testing… True story: several years ago, one of our ELLs was taking a high-stakes math SBT. The question asked about how much a percent rose. The student was very confused as to how a number would become a flower.
Sigh
I agree with everything Ron Lapekas says.
In fact, what he says about SBTs is the reason why I gave only one bubble test a year (not counting the standardized tests we were required to give). That test was an open-book, open-notes final exam, with a study guide handed out two weeks beforehand telling the students what stories to review. It took place the last week of each semester and was worth only 10% of the semester grade, graded on a curve set by the highest (or next-highest) score, as long as that score wasn’t too far above the rest of the top ten.
Those final exams also came with an extra-credit essay designed to appeal to the higher performing students and anyone else who wanted to give the essay a chance.
By the time of the final exam, students—who did the work—had written a lot of essays and I’d spent hundreds of hours grading them.
Instead of SBTs to analyze understanding, I focused on essays: for instance, one for each act of “Romeo and Juliet”; one for each chapter in Steinbeck’s “Of Mice and Men”, and one essay for each short story and poem we read.
The essay prompts—written by me—linked the theme/plot/characterization in each story or poem to the environment the kids I taught lived in—mostly a brutal barrio dominated by violent Latino street gangs—and required my students to make links between the literature and that real world.
There were always lively discussions before writing an essay and then students analyzed the rough drafts in small teams using an easy to understand rubric designed for students to use instead of teachers—I wrote the rubric and taught the kids how to use it.
And I’m sure that if I had no job protection, I would’ve been fired before I had taught five years, because I was an outspoken critic of administration—a loud thorn in their side for thirty years, even as an intern and substitute—who used written words as his weapons.
Did my students perform well on those standardized tests? Yes. In fact, I was told by a friendly vice principal (one of the good ones) that my students showed steady annual gains, even the ones who failed my class, all the way back to my first year of teaching under a full-time contract.
But even that wouldn’t have saved my job without the protection the law offered. The tyrants we often find in district administrators and a few of the principals would have sacked me in a moment without that job protection—to get rid of my criticisms of their ignorant and stupid decisions that often were designed to support an SBT environment.
Teachers were seldom if ever part of the decision making process at the district office of how and what we were supposed to teach.
Reblogged this on Crazy Normal – the Classroom Exposé.
Yes, those bubble tests are worthless! It would be great if many more teachers, administrators and parents addressed this issue and perhaps, change would happen.
Yes! Well put! The secondary school from which I retired was in Program Improvement in the early years of NCLB. When new teachers who had taught at a non-PI school (read: middle-class, with tiny subgroups of ESL and Special Ed students) arrived on campus, they found a noticeable difference. There were so many academically challenged students, but lessons and their delivery had to be rigidly controlled. (Note: personally, I find big DI to work, but I am writing of little di.) So, yes, teachers were not allowed to reteach. One consultant, mind you, stated in a workshop that she could “teach Calculus to a Kindergartener”.
Actually, it was an embarrassment for me when my Below Average classes of students did so much better on an important test question than other teachers’ Above Average students. (Yes, contrary to what we were told at the beginning of NCLB, teachers’ test scores were used to bring “competitiveness” to education.) I knew that they did not understand the math, that I had merely trained them with a trick to get the correct answer. But bubbling in the correct answer was all that mattered. It made everyone believe that mastery had occurred. Points, points, points. That’s all that teaching boiled down to.
And pushing a rope? Indeed! My former district had a public charter magnet STEM school. One family sent two brothers there. One returned to my campus, not for lack of ability, but lack of effort in STEM classes. I pity the teacher who got credit for HIS scores, as he hated the testing. As the push for STEM students increased, I noticed more bright students being labeled “lazy”, because they had no interest in certain classes. They would be forced to take Algebra 2, for example, and would settle for a “D”, no matter what their teacher could do. The “D” didn’t matter to the student, but teachers felt demoralized. In PI schools teachers and students became used to being beaten with either carrots or sticks.
In my undergraduate career, we read an enormous amount of research on standardized testing and the implications of its scoring. As a pre-service teacher, the thought of being graded by my students’ assessment scores was daunting because of the field I was going into. Students with special needs are required to take the state standardized assessment for their grade level, regardless of disability. Now, in the county, a percentage of exceptionally low-performing students are exempt from these assessments, but that is usually only 2 or 3 percent of the school’s special-needs population. The rest are required to progress at the same rate as their general-education counterparts, which is an absurd expectation in and of itself. If these students were assessed and diagnosed with disabilities that adversely affect their academic progress, why do we continue to expect them to succeed at a level that might not be achievable for them? I have witnessed firsthand the county in which I currently reside fail to meet AYP (adequate yearly progress) because the students with disabilities did not progress at the level required of them. To me, the answer is clear: we need to revisit the standards and look for more appropriate ways to assess these students’ academic success.
SBTs = MGTs
Sorry, but MGTs = ???
Multiple Guess Tests is what I call them!
Very good Duane. You are understanding acronyms and making up some of your own. It’s great to see your growth on the blog. Keep up the good work.
TAGO
🙂
Several things about the original post and especially the general agreement in the comments strike me as very revealing.
First, the poster said the “stupid bubble tests” were “useless for our most challenged and disadvantaged students”. This suggests that Ron Lapekas thought the tests were useful for the least challenged and disadvantaged students. What weight should we give the utility of testing for the least challenged and disadvantaged students when thinking about testing policy?
Second, faced with a diversity of student preparation in the class, Ron Lapekas “…taught from the lowest level. This bored some of the better prepared students…”. Perhaps this suggests that increased tracking might be useful to keep students engaged in the work.
Finally, the testimony of downstream teachers about the poster’s former students is more evidence that peer evaluation of teachers would be a valuable approach to teacher evaluation.
“peer evaluation of teachers would be a valuable approach to teacher evaluation”
definitely
However, it would be important for such a system (a) to be part of an ongoing program of shared reflection on specific classroom practice—what worked and didn’t work last week—like that done in Japanese Lesson Study (time and techniques for doing this have to be built into teachers’ schedules); and (b) to draw input from a number of experienced teachers, so that it doesn’t become a vehicle for individual teachers’ petty vendettas against particular colleagues.
We have peer evaluations of teachers in NC. A peer evaluation is one of the four each year during the three years a teacher works up to Career Status.
At least that is how it has been.
When are people going to figure out that multiple-choice tests are generally TERRIBLE vehicles for testing vaguely formulated, abstract skills (like those in the CC$$ for ELA) or for any purpose beyond testing specific, concrete content or procedural knowledge (recall tasks)? Yes, it is POSSIBLE to test higher-level thinking skills via multiple-choice questions, but it’s fiendishly difficult to do that at all well, as our current crop of high-stakes tests abundantly demonstrates by its failure to do so.
I’ve reviewed hundreds and hundreds of these high-stakes tests over the years. In most cases, hand me the state high-stakes test, and I will be able to show you many examples of multiple-choice questions in which the stem and/or choices were so poorly written that
1. none of the choices is actually correct;
2. a choice intended by the test designer to be incorrect is actually correct, and the choice intended by the test designer to be correct is actually incorrect;
3. more than one answer is actually correct;
4. no proposition that can be assigned a truth value (correct or incorrect) can be legitimately formed from the stem plus any of the choices. (In other words, any combination would result in a statement that is neither true nor false but absurd or meaningless.)
As test creators try to ramp up their game and make the distracters (or distractors—both spellings are widely used) more plausible (and so make their tests more rigorous, to use the Rheformish word translatable into standard English as “unnecessarily difficult and confusing”), they end up producing increasing numbers of questions that fall into one of the above-mentioned categories. It then becomes an ever-more amusing game to go through their tests and point out where the test maker, not the student, failed—even after (in the case of PARCC and SBAC) spending many millions of taxpayer dollars and ungodly amounts of editorial time on development. I say that pointing out the problems in the tests can be an amusing diversion. However, when one remembers that high-stakes decisions will be made on the basis of these tests, one’s amusement soon turns to annoyance and then to rage.
Rheformish—the dialect spoken by those attempting the current education Rheformation—is remarkable for its abstraction and vagueness. All Rheformish commentary about the Rheforms is done from the 50,000-foot perspective: Schools are failing! No excuses! Higher standards! You get what you measure! Blah blah blah blah. But the devil is in the details. And the Rheformers have intentionally created NO MECHANISMS WHATSOEVER for exorcising those devils that so numerously infect specific standards, specific test questions, specific teacher evaluation criteria, etc.
NB: There seems to be debate among field linguists with regard to the proper term for referring to this latest dialect of Goblish. Should we speak of Reformish, Rheformish, Rheeformish, Deformish, Dephormish, Rephormish, or Rhephormish? My choice of Rheformish rests on an aesthetic rather than a philological principle. “Rheformish” just strikes me as pleasingly appropriate because it is almost standard and thus close to reasonable but actually completely wacked, as the Gospel of the Education Rheformation–the Revelation to Achieve–itself is. Many thanks to those brave linguists who have subjected themselves to unspeakable suffering in data chats and other Rheformist venues to collect data for the ongoing Rheformish Lexicon Project! Mastery of the tongue is, of course, essential to the ongoing effort to repel the invasion of our schools by the Rheformers.
Long live the Counter-Rheformation!
NY Board of Regents just voted to make Pearson/CCSS assessments open to public viewing/scrutiny. If and when (and to whatever degree) this actually comes to be, experts on test writing will have a field day skewering these abominations. Writing objective MC items to test for subjective skills, abstract thinking, and mind-reading abilities will prove disastrous for CCSS supporters.