Jason Stanford says that so long as there are high stakes attached to testing, there will be cheating.
Arne Duncan says districts need more test security.
A new report by the federal GAO documents instances of cheating in 33 states.
When Duncan was asked about a moratorium on high stakes, he couldn’t give a straight answer.
Stanford says:
“Removing the high stakes from standardized tests would take away the incentives to cheat and return testing to its original, intended purposes—to diagnose where schools and students need improvement. Sec. Duncan can do better than holding a meeting, issuing a report, and calling it a day, but until he addresses the root causes—to paraphrase the Japanese submarine commander’s famous phrase—the cheating will continue until morale improves.”

Not a good argument. That some people cheat is not a valid reason to eliminate high-stakes testing. Why let people drive? After all, some might drink. Why let people out on parole? After all, many will re-offend. There are better arguments against standardized testing (and by the way, assessment is a good thing; it’s the massive overload of it, the manner in which we do it, and the way in which we deify the data that are the problem).
Oops, I meant to finish my post. What I was going to say was… “There are better arguments against standardized testing… so we should be actively making THOSE (whatever those be).”
The best logical argument against educational standards and standardized testing is to be found in Noel Wilson’s “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700 . Will summarize in another post so as to not make this a string bean thread.
Duane — please do summarize it. It is truly incomprehensible.
I think the fact that cheating has become a prime byproduct of standardized testing is a pretty powerful argument.
I apologize to all that my summary is taking so much space, but I will at the end summarize my summary-ha ha!
Thanks for bearing with my posts!
Arguing with morons* demands a special brand of rhetoric.
They are too profoundly ignorant of all the factors that it takes to make measures meaningful in the contexts where they can be meaningful at all, and they are even more profoundly ignorant of the fact that metrics do not exist for the things that matter most in life.
So we are reduced to making arguments on the grounds of human nature, in technical terms, the Laws of Campbell and Goodhart.
* Maybe “cargo cultists” would be a more polite term.
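The Campbell/Goodhart dynamic invoked above can be made concrete with a toy simulation. This is only a sketch under illustrative assumptions (the noise levels and the 3x gaming weight are made up, not estimates from any real testing program): once stakes reward the proxy score, effort shifts toward gaming it, and the score stops tracking the learning it was meant to measure.

```python
import random

random.seed(0)

def proxy_quality_correlation(stakes_attached, n=1000):
    """Toy model of Campbell's/Goodhart's Laws: when stakes are attached
    to a proxy measure, effort shifts to gaming the proxy, and the proxy's
    correlation with the underlying quality collapses. All parameters
    here are illustrative assumptions."""
    pairs = []
    for _ in range(n):
        learning = random.gauss(0, 1)                      # the thing we care about
        gaming = random.gauss(0, 1) if stakes_attached else 0.0
        score = learning + 3 * gaming                      # gaming swamps the signal
        pairs.append((learning, score))
    # Pearson correlation between true learning and the reported score
    mx = sum(l for l, _ in pairs) / n
    my = sum(s for _, s in pairs) / n
    cov = sum((l - mx) * (s - my) for l, s in pairs)
    var_l = sum((l - mx) ** 2 for l, _ in pairs)
    var_s = sum((s - my) ** 2 for _, s in pairs)
    return cov / (var_l * var_s) ** 0.5

r_low = proxy_quality_correlation(stakes_attached=False)   # score IS learning here
r_high = proxy_quality_correlation(stakes_attached=True)   # correlation drops sharply
```

With no stakes the score and the learning coincide (r = 1.0); with stakes attached, most of the score's variance comes from gaming, so the score tells you much less about learning.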
You seem to be conflating assessment with high stakes testing.
It’s a cost-benefit analysis, Andrew. Some policies carry external costs that can outweigh the policies’ benefits. The benefits of automobile transportation are greater than the cost of deaths from drunk driving, so we don’t ban driving to eliminate drunk driving. Instead, we regulate drunk driving and try to enforce those regulations. If it can be regulated effectively and it doesn’t cost too much to do so, then the policy is worthwhile.
(A more difficult example of your counter-argument: “Just because some people commit mass murder doesn’t mean that people should not be allowed to buy high-capacity, semi-automatic weapons.” Or: “Just because some people are purchasing alcohol illegally doesn’t mean that Prohibition should be repealed.”)
If cheating is rampant or undermines the reliability of the accountability system that high-stakes tests are supposed to create, then cheating creates very high external costs that arguably outweigh the benefit of high-stakes tests. And if cheating cannot be effectively regulated, that’s a legitimate argument against high-stakes testing.
Or, as you say, you could argue that the benefits of high-stakes testing are not high enough to justify the costs. Either way, it’s just two sides of the same analysis.
Waste To The Top …
Obviously, what state school districts need is enhanced test security. I’m sure there are edupreneurs who can help with that — for a modest fee, of course!
“I’m sure there are edupreneurs who can help with that — for a modest fee, of course!”
Oh, yes, you bet!
And, in addition, let’s not forget the cost of all the “man” hours (of the teachers and administrators) providing this security . . . really . . . guarding boxes, monitoring rooms, double-proctoring tests, delivering boxes of materials to and fro, and on and on.
I thought we were employed to teach the children!
Inanities abound when Duncan opens his pie hole.
Off topic, but I’m excited to read about two NY legislators who are proposing a bill to make state tests public information in NY. Maybe when the public sees the tests, the testing mania will come to a stop.
Cheating has been part of the accountability program from the very beginning. Remember Rod Paige, Secretary of Education, Mr. “No child left behind”? http://www.cbsnews.com/2100-500164_162-591676.html
From Wilson’s “Educational Standards and the Problem of Error”:
“So in this study I will substantiate the contention that some of the explicit and implicit “truths” embedded in assessment practices are falsifiable; that empirical data constructed from their own assumptions denies the accuracy they assume; that this data is not only adequately detailed in the literature, but further, that the notion of error is the epistemological basis of much of that literature. All of which makes the public silence about the presence of error even more puzzling.”
Plain English: ‘I will prove that the claims, both spoken and unspoken, that standardized test makers make (such as that the scores are accurate, that the rankings are accurate, that labeling a student with a score is accurate, etc.) are not true, by using their own words of justification against them; and that the concept of “error” is central to proving the logical fallacies that the test makers attempt to hide or gloss over.’
Yes, Wilson is “deconstructing” (oh no, cry the empiricists, not another post-modern diatribe), that is, tearing down, ripping apart the whole apparatus of educational standards and standardized testing.
To be continued. . . .
It’s a shame Wilson didn’t have you as an editor.
FLERP!
Thanks for the compliment. I have read the study well over a dozen times and still get something new out of it when I go back over it. One has to realize that the study is a doctoral dissertation, so that, yes, it should be “dense”.
I do have his permission to “spread the word”.
Continuing from Wilson:
“I shall show that the epistemological* and ontological** grounds for the whole field of assessment of individual persons are enormously shaky. I shall also explain how the literature about the very notion of validity is founded on a biased position, so that the sources of invalidity are much deeper and wider than is admitted in practice, even though clearly implied in theory and its attendant discourse.”
Plain English: I will show that the theoretical foundation of assessing someone has so many cracks (errors) in it as to render almost all assessments invalid, and that the test makers do not acknowledge, or at best ignore, those “cracks”.
*Epistemology (from Greek ἐπιστήμη – epistēmē, meaning “knowledge, understanding”, and λόγος – logos, meaning “study of”) is the branch of philosophy concerned with the nature and scope of knowledge[1][2] and is also referred to as “theory of knowledge”. It questions what knowledge is and how it can be acquired, and the extent to which any given subject or entity can be known. (from Wiki)
** Ontology is the philosophical study of the nature of being, becoming, existence, or reality, as well as the basic categories of being and their relations. Traditionally listed as a part of the major branch of philosophy known as metaphysics, ontology deals with questions concerning what entities exist or can be said to exist, and how such entities can be grouped, related within a hierarchy, and subdivided according to similarities and differences. (from Wiki)
To be continued. . . .
Standardized tests have proven to be a reasonably good way to assess a student’s family’s socioeconomic status. But we could get more accurate data by asking parents to submit their tax returns and high school/college/grad school transcripts, and that would take much less class time.
If standardized tests have proven to measure anything else accurately, I am not sure what it is.
But they don’t “measure”/assess SES. That’s just a correlation.
I do not speak of the things I’ve seen in the world of standardized testing. I’ve probably seen enough to get at least two colleagues fired, both at former schools. Even if I had brought the issues up to people at the county, it would have been tough to prove they cheated.
I trust NO standardized test results that have high-stakes attached, mostly because of what I’ve seen.
Technically, teaching to the test IS cheating. Teaching to the test invalidates the results of the test.
End of discussion. We are all cheaters.
ME,
“We are all cheaters” UMMM, NO! WE are not all cheaters. I reject that characterization.
Duane
Continuing Wilson:
“I shall indicate the complexity of the notion of invalidity, with its practical face of error. Error includes all those differences in rank ordering and placement in different assessments at different times by different experts; all the confusions and varieties of meaning attached to the “construct” being assessed; and all those variabilities arising out of logical type errors, issues of context, faulty labeling, and problems associated with prediction. To further complicate the matter, error has a different meaning depending on the assessment frame of reference. And I will show that estimates of the extent of the confusion along many of these dimensions may be easily estimated. This is a critical study.”
Plain English:
The concept of validity is very complex and when a process (educational standards and standardized testing) has “errors” (or falsehoods) then logically it is invalid . And since it is invalid, the “meanings” (conclusions) derived from said process are false. Here Wilson is starting to list where the various errors in the process occur: From confusing and conflating the “frames of reference”* of the assessors to the “issues” of context, misplacing a student on the ranking scale to the “problem” of the supposed predictive capability of standardized tests (such as using ACT scores to determine who may or may not succeed at the post secondary level).
To be continued. . . .
*Wilson’s four frames of reference to be discussed next.
Continuing from Wilson
In Chapter 7 four different frames of reference are defined: four different and largely incompatible sets of assumptions that underlie educational assessment processes as currently practiced. First is the Judges frame, recognized by its assumption of absolute truth, its hierarchical incorporation of infallibility; second is the General frame, embedded in the notion of error, and dedicated to the pursuit of the true score; third is the Specific frame, which assumes that all educational outcomes can be described in terms of specific overt behaviors with identifiable conditions of adequacy; fourth is the Responsive frame, in which the essential subjectivity of all assessment processes is recognized, as is their relatedness to context. Because of their contradictory assumptions, slides between frames result in confusion and compound invalidity.
Plain English:
Judges Frame-think of the classical college professor who “knows” a student’s “worth”, i.e., grade.
General Frame-think of standardized testing that claims to have a “scientific” basis.
Specific Frame-think of learning by objectives, like computer-based learning: getting a correct answer before moving on to the next screen.
Responsive Frame-think of an apprenticeship in a trade or a medical residency program, where the learner interacts with the “teacher” with constant feedback.
Each frame has its own assumptions as to what a “true” assessment of the individual is, and those assumptions are not compatible. When one mixes up, confuses, or conflates the frames, one introduces “error” into the assessing process. The fundamental purpose of one frame can’t be served by another frame. It’s like using cake batter to make a pie crust: it doesn’t work!
Next: A look at “standards”
Continuing Wilson’s thoughts:
“I focus on the cultural meanings that attach themselves to the notion of the standard, and assign the idea of the human standard to the mythological sphere, a place apart from critical thought. I examine the emotional intensity of discourse about the standard, its significance as an article of faith, and how this is related to the maintenance of control and good order. . . . I examine the crucial part that the standard plays in the whole mechanism of defining cut-offs for abnormality and non-acceptance, and how important it is that these standards be seen as accurate if current societal structures are to be maintained.”
Plain English:
To question standards is to have one’s own sanity questioned. How can one question the “standard”, the “golden criterion”? For to be less than at the apex of the touchstone is to be substandard, or perhaps the “F” word: a FAILURE. Why, we might lose our status as the supposed top dog nation of the world if we don’t live up to THE STANDARD.
Yeah, all hail the almighty and great golden STANDARD!
What then is a standard? Is it the sought after grail or is it the search itself?
To follow: More or less. . . .
The concepts of more/less and better/worse come into play in trying to figure out what an educational standard is. Wilson writes:
“The fundamental distinction between more and less, and better and worse, is first elucidated, and this is linked with ideas of uni- and multi-dimensionality and notions of doing or having. This analysis is then applied to ideas of traits, abilities, and skills, and their supposed measurement in tests and examinations. Some fundamental confusions are exposed.”
Plain English:
When invoking a “standard” one is attempting to make a distinction between the categories of more/less, i.e., quantification, and of better/worse, i.e., qualification, a distinction that can’t logically be made. Standardizing by definition involves more/less distinctions, but its proponents purport it to be able to distinguish better/worse. Standardization attempts to simplify the complex nature of the teaching and learning process into one of yes/no, more/less categories (think of cut-off points between beginning, intermediate, and/or advanced designations on a test, or of cut-off points between A, B, C, D, and F: why isn’t there an “E”?).
To follow: Notions of error
“. . . the meaning of error in each frame of reference for interpreting assessments is considered. As the meaning of error changes with assessment mode, so do the methods designed to reduce such error. Procedures to reduce error in one frame are seen to increase it in another. From a perspective of oversight of the whole assessment field, this is another source of confusion and invalidity, particularly as it is rare for any practical assessment event to remain consistently within one frame of reference.”
Plain English:
What counts as the fundamental assumption of error in one assessment frame (Judge, General, Specific, Responsive) can be the fundamental basis of validity in another frame. In other words, the judge assumes infallibility in his/her ratings of students, but this has been shown to be invalid: think of different teachers grading the same essay and coming up with different scores, while a computer (Specific frame) grading the same essay may come up with yet another score. What increases validity in one frame is seen as incompatible with the other frames’ bases for validity. Standardized test proponents (General frame) contend that their scoring is objective when, in reality, it rests on human judgment to begin with, especially when rubrics are used (Judge frame). Confusion abounds!!
To follow: Wilson’s 13 sources of error in educational standards and standardized testing.
Test Security:
Having experienced the ever-tightening procedures for test security, I can say it is almost easier to break out of prison than for teachers to cheat . . . in some settings. Every signature, test pick-up, sign-in and sign-out, restroom procedures, break procedures, taping answer sheets with special tape, initialing every strip of tape and every taped envelope, having witnesses count and recount tests and test booklets. The only thing left is to do a DNA test before checking out testing materials.
The missing link in test cheating is if the school has several staff, sworn to secrecy no doubt, who network and have access after hours. I think that was the case in Atlanta & DC.
That said, the expected increases in test scores were over the top. Students with disabilities and 69 IQs were required to meet or exceed grade-level scores. Really? Many urban schools have high concentrations of SWD. Those were also the schools not making AYP. In the end, it did not matter to Duncan/Obama/Gates/Rhee & Co. that those kids were, by sheer definition and eligibility, not supposed to perform at grade level. Many folks ended up cheating to preserve their jobs. What a shame! The bullies who just refuse to back off are still making the same insane rules.
Not supporting cheating! But no one is listening to educators. It’s all about the $$$.
Some day we will look back and view this entire public education fiasco with collective shame and outrage. First, we need to survive today.
Duane — help me understand these basic questions, as a way to orient myself.
First, let’s say you’re teaching math, and specifically multiplication. You spend a semester teaching that. You then write a test covering that material, in multiple-choice format. 25 questions, one class of students, all multiplication problems, all material you’ve covered, all multiple choice. Is that exam invalid? Is it useful in any way?
Second, let’s imagine you’re running a school district with several schools. The schools all have the same math curriculum. You write the same multiplication test as above, and teachers give that test to their students. Is that exam invalid? Is it useful in any way?
FLERP!,
Depending upon how the test was constructed, how it was graded, and how the results are disseminated, yes, more likely than not it is invalid, in the sense that it still tests only one kind of knowledge of math: being able to get a “correct” answer to the question at hand. Most would agree that “doing” math is much more than getting answers correct in a multiple-guess (oops, I meant choice) format. So in one sense it assesses only a fraction of what “doing” or “learning” math is about.
In another sense it is invalid due to the limited number of questions. Are they constructed to assess every aspect of what learning math entails? My answer is no, that’s impossible to do. So, it is only a small, incomplete sample of what the student may actually know and be able to do in math.
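The small-sample point above can be sketched with a quick simulation (the 80% mastery figure, the 25-item length, and the independence of items are all illustrative assumptions): the very same student, retested on different 25-item samples of the same large domain, lands anywhere across a wide range of scores.

```python
import random

random.seed(1)

TRUE_MASTERY = 0.80    # hypothetical student who knows 80% of the item domain
TEST_LENGTH = 25       # the test samples only 25 items from that domain

def observed_percent():
    """Percent correct on one random draw of TEST_LENGTH items,
    treating each item as an independent sample of the domain."""
    correct = sum(1 for _ in range(TEST_LENGTH) if random.random() < TRUE_MASTERY)
    return 100.0 * correct / TEST_LENGTH

# Re-administer the "same" test many times with fresh item samples.
scores = [observed_percent() for _ in range(10_000)]
lowest, highest = min(scores), max(scores)
spread = highest - lowest   # how far the same student's score can swing
```

Nothing about the student changes between administrations; only the 25-item sample does, yet the observed score swings by dozens of points around the 80% "truth".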
It is invalid depending on the dissemination of the results and how they are used. Do the students get a chance at looking at the questions that are graded and have a chance at ensuring that the questions are without mistakes or that the assessment (grade) given is correct?
Are the results used to “judge” a teacher’s effectiveness? If so, then the results are definitely being used invalidly, as the test is supposedly a test of students’ math capabilities and not a test of teacher effectiveness. Not only that, but to use the results in that fashion is UNETHICAL: using the results of any test for something other than what the test was designed for is wrong.
“Is it useful in any way?” Yes, if used as an instructional device for the learning process; that is, if the student, in conjunction with the teacher, reviews each item so as to learn from the process. Then, and only then, yes.
For your second example see above. Same concerns/problems, etc. . . .
For me, Wilson’s “Responsive Frame” is the most valid frame for assessing, in conjunction with the student, a student’s work.
Thanks.
As the Director of Policy for the Congress of Racial Equality of California (CORE-CA), I can say we believe in honest government. This means no cheating. Cheating is unacceptable in many ways, and one of them is the message we send to our youth when they see adults and their leaders accepting illegal behavior. It says to them: go do it yourself, no one cares. Then what do we have?
There are some bad people in education, but not as many as they would have you believe. In what large operation are there no bad people? We need to start to once again show our youth that they really count and that we will do what it takes to help them. This does not mean 100% success, as you are always looking for the “Best Outcome.”
First, you have to start with financial controls, as without cash there are no programs. The more carefully you spend your cash, the more you can do. It is amazing what we find in school budgets through time. The answers to the educational failures are usually found in a close look at the budgets through time. That is where the big cheating goes on, which leads to the lack in the classroom and on the ground. We need ACCOUNTABILITY.
First, I will not explain Wilson’s thoughts on error at length, as I believe they are comprehensible to the average reader of this blog. They are important to understand, though. I will give examples of each category of error, any one of which destroys validity and, by extension, reliability. From Wilson:
“Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field.
All assessment statements about a person are statements about that person engaged in an event, or a potential event. They are descriptions or indicators or inferences about the person’s performance in that event. As such they involve at the very least an event in which the person being assessed is an element, and an event in which the assessor engages directly in the first event, or with a product (element) of it.
Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.
Thirteen (overlapping) sources of error are examined, all contributing to the essential invalidity of categorizations of persons:”
1. Temporal errors-think of differences among pretest, formative and summative test scores. According to proponents the same test should yield the same results otherwise it would be seen as unreliable. From Wilson: “Temporal errors are indicated by the differences in assessment description when the assessment occurs at different times.”
2. Contextual errors-think assessing an art student via rubric of work, via verbal description, via votes on best work, etc. . ., the myriad ways one can assess an individual’s performance. Which one is the “true” assessment? From Wilson: “. . . contextual errors include all those differences in performance and its assessment that occur when the context of the assessment event changes.”
3. Construction errors-think how different Spanish teachers choose to assess students’ learning of Spanish: some emphasize speaking, others reading, still others writing, etc. Again, which gives us an error-free assessment? From Wilson: “. . . construction errors are indicated by all those differences in assessment description when the same construct is assessed independently by different people in different ways.”
4. Labeling errors-Is the test an “achievement” test? Or is it an “intelligence” test? Who gets to label the test and what does that label mean? From Wilson: “. . . labelling errors are indicated by the range of meanings given to the label by all those who use it before, during or after the assessment event.”
5. Attachment errors-think of “attaching” the A, B, C label to a student: “Oh, my Johnny is an A student.” From Wilson: “Attachment errors are the ontological slides that occur when a description of a relational event is attached to one of the elements of that event; specifically, when a complex relational event involving the construction of a test, an interaction of the test with a person, and a judgment of an assessor, is described as a property of the assessed person, this is an error in attachment.”
6. Frame of reference errors-think of a computer (Specific frame) grading an essay that, when assessed by humans, is in the Judge frame. Which is more true and “errorless”? From Wilson: “Frame of reference errors are indicated by specifying the frame in which the assessment is supposedly based, and indicating any slides or confusions that occur during the assessment events.”
Error categories 7-13 will follow.
7. Instrument errors
8. Categorization errors
9. Comparability errors
10. Prediction errors
11. Logical type errors
12. Value errors
13. Consequential errors
Continuing Wilson:
7. Instrument errors-think in terms of margins of error as done in polling/phone surveys as a test is only a small sample of what the teaching and learning process is. From Wilson: “The practical indicator of instrumental error [are] those errors implicit in the construction of the measuring instrument itself; what is conventionally called standard error of the estimate.”
8. Categorization errors-think in terms of how a student may be categorized as Proficient when his score was within the margin of error of being judged Advanced. Or in terms of a student with an 89.4% vs. an 89.5%: the first a “B”, the second an “A”. Is there any logical basis for that distinction (considering all tests and assessments in schools have margins of error that could compound each other to make that distinction a non-distinction)? From Wilson: “Categorisation errors are all those differences in assessment description that occur when particular data is compared with a particular standard to produce a categorisation of the assessed person.”
9. Comparability errors-think in terms of GPA’s and how the GPA is a mishmash of different grades from different subjects all derived differently. From Wilson: “comparability errors are indicated by constructing different aggregates according to the competing models. The differences that these produce indicate the comparability error.”
10. Prediction errors-think in terms of using GPAs or ACT scores as a predictor of success at the university level (they’re not very good predictors). From Wilson: “prediction error is indicated by the differences between what is predicted by the assessment data, and what is later assessed as the case in the predicted event.”
11. Logical type errors-Many confuse and conflate a test score to mean more than it does. A 59% on a vocabulary quiz is interpreted as meaning that the student failed, whereas it may just be that the student didn’t happen to know the part of the vocabulary that was tested. From Wilson: “Test scores are often interpreted as giving specific information about what a student can or cannot do. For example, a score of 90 per cent on a spelling test gives no information about whether any individual item on the test was actually spelt correctly by a particular student. Any assumption to the contrary is a logical type error. Similarly, a score of 80 per cent on a mastery test gives no information about what information or skill has been mastered. Common inferences made from test scores are riddled with such logical type errors.”
12. Value errors-What is “valued” in an assessment? In a writing test, is writing complete sentences valued, or is it the thought that counts? Does spelling count, etc.? From Wilson: “Value errors are indicated by making explicit the value positions explicit or implicit in the various phases of the assessment event, including its consequences, and specifying any contradiction or confusion (difference) that is evident.”
13. Consequential errors-Think in terms of time better spent than on test prep, the loss of recess, the loss of electives, etc. From Wilson: “Consequential errors involve all those negative effects on a student’s learning and a teacher’s teaching that are attributable to the assessment event.”
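Categorization error is easy to demonstrate with classical test theory's own machinery. A minimal sketch under hypothetical numbers (a true score of 84, a standard error of measurement of 3 points, and an Advanced cut score of 85, all made up for illustration): the same student flips between labels on retest purely from measurement noise.

```python
import random

random.seed(2)

TRUE_SCORE = 84.0   # hypothetical student's true score, in percent
SEM = 3.0           # hypothetical standard error of measurement
CUT = 85.0          # hypothetical Proficient/Advanced cut score

def one_administration():
    """Classical test theory: observed score = true score + random error."""
    return TRUE_SCORE + random.gauss(0, SEM)

# Label the same student on 1000 hypothetical retests.
labels = [("Advanced" if one_administration() >= CUT else "Proficient")
          for _ in range(1000)]
advanced_share = labels.count("Advanced") / len(labels)
# A student whose true score sits just below the cut is labeled
# "Advanced" on a substantial share of retests, from error alone.
```

The cut score turns a continuous, error-laden estimate into a hard category, so students near the cut are categorized by chance as much as by ability.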
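The logical type error can likewise be shown in a few lines. This sketch uses two hypothetical students and a hypothetical 10-item test: identical scores, non-overlapping gaps, so the score alone says nothing about any particular item.

```python
# Two hypothetical students with identical 80% scores but different knowledge.
ITEMS = {"item%d" % i for i in range(10)}

student_a_correct = ITEMS - {"item0", "item1"}   # A missed items 0 and 1
student_b_correct = ITEMS - {"item8", "item9"}   # B missed items 8 and 9

score_a = 100 * len(student_a_correct) // len(ITEMS)   # 80
score_b = 100 * len(student_b_correct) // len(ITEMS)   # 80

# The scores are identical, yet they give opposite answers about item0:
a_knows_item0 = "item0" in student_a_correct           # False
b_knows_item0 = "item0" in student_b_correct           # True
```

Inferring "this student can do item0" from an 80% score is exactly the logical type error Wilson describes: the aggregate carries no information about the individual item.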
Next: Psychometric “fudging”.
Continuing on (almost done for now):
“. . . details some of the ways in which psychometricians fudge; by reducing criteria to those that can be tested; by prejudging validity by prior labelling; by appropriating definitions to statistical models; and by hiding error in individual marks and grades by displaced statistical data, and implying that estimates are true scores.”
Plain English:
The last statement is the most important: “implying that estimates are true scores.” Any and all tests assess only a fraction of what a student can or cannot do in a particular subject, so they are only an estimate of what a student may know. Proponents of standardized testing would like us to believe that the scores are the student, the scores are the teacher, the scores are the school, etc., that they are the “end all, be all” of educational practices (they’re actually educational malpractices). For a shorter version on the validity and reliability aspects (which should actually read invalidity and unreliability) of standardized tests, see Wilson’s “A Little Less than Valid: An Essay Review” found at: http://www.edrev.info/essays/v10n5.pdf .
In conclusion from Wilson:
So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.
The person who does the test has already accepted the name of the test and the measure that the test makes by the very act of doing the test; when you enter the raffle you agree to abide by the conditions of the raffle.
So the mark becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self evident consequences. And so the circle is complete.
All standardized tests are based on abstract concepts that are operationally defined to fit into a statistical model based on the assumptions of the test construction personnel. All statistically based tests have Type I and Type II errors that cannot be filtered out. There also exists a wide variety of error and variance that cannot be statistically accounted for, due to the erroneous assumptions inherent in the test construction. This has plagued research in psychology for the entire existence of the field in academia. Phrenology was a failed attempt at this; some things cannot be completely and accurately measured.
And to the misinformed: normed tests are designed to yield a bell curve. They are altered until they do; that is an inherent assumption in their design. Everyone cannot score proficient on such tests. It puts all of us in a rat race, pitting us and our students against everyone else.
I would point the interested or the doubting to Dr. Bruce Baker’s work (see School Finance 101) regarding Student Growth Percentiles. All of this nonsense is inherently flawed. We teachers do not demand perfection, just fairness. We are not perfect; neither are these testing schemes.
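The zero-sum nature of norm-referenced scoring described above can be sketched directly (the cohort size and the uniform 10-point gain are arbitrary assumptions for illustration): even when every raw score rises, percentile ranks do not move, and the same number of students sit below the median.

```python
cohort = [60 + i for i in range(30)]        # hypothetical raw scores, 60..89
improved = [s + 10 for s in cohort]         # every single student gains 10 points

def percentile_rank(score, scores):
    """Share of the cohort scoring strictly below `score`."""
    return sum(1 for s in scores if s < score) / len(scores)

before = [percentile_rank(s, cohort) for s in cohort]
after = [percentile_rank(s, improved) for s in improved]

ranks_unchanged = before == after           # percentiles are purely relative
below_median_before = sum(1 for p in before if p < 0.5)
below_median_after = sum(1 for p in after if p < 0.5)
# Half the cohort sits below the median before and after, no matter how
# much everyone improves: norm-referenced ranking is zero-sum by design.
```

No amount of genuine, across-the-board improvement changes anyone's rank; a norm-referenced report can therefore show "failure" in a cohort where every student learned more.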
I had a similar observation in my own blog post based on a “Freakonomics Radio” program: http://accomplishedcaliforniateachers.wordpress.com/2011/10/18/freakonomics-cheating/