The San Antonio Express-News published a blistering editorial calling for a halt to state testing until all the errors and computer glitches were resolved. This may mean forever, given the track record of testing companies that produce online assessments.
Fifty superintendents from the Houston area wrote a letter to the new Texas education commissioner, Mike Morath, outlining the problems their students and teachers had encountered.
As the editorial states:
There are inherent problems in any massive project, but this is no simple undertaking. The STAAR test — the State of Texas Assessment of Academic Readiness — is high stakes. The scores impact schools, teachers and students. Failing grades can cause students in the fifth and eighth grades to be held back, and high school students who don’t pass three of the five end-of-course exams will not get a diploma. Teachers’ evaluations will be based in part on how well students perform on the STAAR test.
Until all the problems are resolved, school administrators are asking the Texas Education Agency to delay use of scores for the alternative test for students needing special accommodations due to a learning disability. They make a valid point.
It appears that the state’s new testing vendor, New Jersey-based Educational Testing Service, commonly referred to as ETS, was ill-equipped to take on the four-year, $280 million contract. There is no excuse for the company asking test takers not to answer a question because there was no correct answer, or for having to scramble at the last minute to certify personnel to grade the test.
School districts can ask that tests be re-evaluated, but that action comes at their own expense. Lewisville ISD appropriated $50,000 to have thousands of English tests retaken by their high school students after many high performers scored a zero on that portion of the test. School districts should not be forced to pay that expense because the state made a bad call when it awarded the testing contract.
There is something terribly amiss here, and it needs to be fully resolved before the test scores can be given much weight. Morath has said ETS will be held financially liable for the problems and could lose the state’s business if the issues are not adequately resolved. That is good news for Texas taxpayers but does not adequately resolve all the issues.
Too much is at stake to merely assure everyone it will be done better next time. The state should not go forward with a testing system that few are confident is working properly. There is no do-over for the students who get held back, the high school seniors who won’t walk the graduation stage or the teachers whose careers are damaged.

Well, considering that it has been known for almost two decades that the educational standards and standardized testing regime is so chock full of errors, falsehoods and psychometric fudges that any results gleaned from the process are COMPLETELY INVALID, why the hell would we continue using such educational malpractices???
For those reading who haven’t heard of Noel Wilson, I give you my summary of his never refuted nor rebutted “Educational Standards and the Problem of Error” (hell, you’d think the people in the APA, AERA and NCME, the organizations that produce the testing bible “Standards for Educational and Psychological Testing,” would address his issues in the latest version, but no), found at: http://epaa.asu.edu/ojs/article/view/577/700
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine.
1. A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. When educational standards and standardized testing attempt to quantify that process, the descriptive information about said interactions is inadequate, insufficient and inferior to the point of invalidity and unacceptability.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, onto the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only logically correct thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student, as it is not a description of the student but of the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference,” each with distinct assumptions (epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think of the college professor who “knows” the students’ capabilities and grades them accordingly), the General Frame (think of standardized testing that claims to have a “scientific” basis), the Specific Frame (think of learning by objectives, as in computer-based learning where a correct answer is required before moving on to the next screen), and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program, where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error enters the process when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid, that is, error-free or at least containing only minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations, the tests and the results are. He is an advocate for the test taker, not the test maker. In doing so he identifies thirteen sources of “error,” any one of which renders the test making/giving/disseminating of results invalid. And a basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory.” In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel that finds the occasional acorn, a result may be “true”), or, to put it in more mundane terms, crap in, crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words, it attempts to measure “something,” and we can specify some of the “errors” in that “something,” but we still don’t know [precisely] what the “something” is. The whole process harms many students, as the social rewards available to some are denied to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self-evident consequences. And so the circle is complete.”
In other words, students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, “I’m an A student” is almost as harmful as “I’m an F student” in hindering students from becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
Mike Morath is a charter man, and his appointed board are all of the same ilk. Not going to hold my breath.
You cannot get an accurate assessment of student ability or teacher competence from 30+ standardized questions, usually of poor quality and written by some faceless, far-off being. We’re going through this in Ohio with Battelle for Kids off in the weeds, focused on data collection, numbers, and models. What is lost in the process is teaching and learning.
Off in the Weeds. Yes.
Battelle for Kids should be considered a subsidiary of the Gates Foundation and SAS, the source of the value-added scores still used to rate teachers in Ohio. Battelle for Kids has received over $8.2 million for Gates initiatives. Here are descriptions (a quick tally follows the list).
Purpose: to expand value-added analysis at the high-school level, provide support to accurately and effectively produce, share and use value-added data, and manage a data warehouse that will be used for research: $4,989,262
Purpose: to support the scaling of effective teaching practices among members of the Ohio Appalachian Collaborative (so-called personalized learning): $1,307,738
Purpose: to develop a web-based, free version of Battelle For Kids’ Link Community Edition commercial linkage and verification solution, which allows educators to correct errors or omissions and verify their instructional schedules and students taught: $855,766
Purpose: to build the infrastructure for a longitudinal value-added database at the high school level to measure contributing factors to students’ academic progress: $637,532
Purpose: to support the implementation of the Common Core State Standards: $249,808
Purpose: to improve understanding around the use of value-added analysis for educational improvement: $124,757
Purpose: to support a national conference on existing efforts to create differentiated compensation systems for teachers based on performance: $50,000
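For anyone who wants to check that “over $8.2 million” figure, here is a quick tally of the seven grant amounts quoted above; it is nothing more than their sum:

# Sum of the seven Gates Foundation grants to Battelle for Kids listed above.
grants = [4_989_262, 1_307_738, 855_766, 637_532, 249_808, 124_757, 50_000]
print(f"${sum(grants):,}")  # prints $8,214,863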
Battelle for Kids also receives funds from the Columbus (OH) Foundation and Vanderbilt University’s Center for Performance Incentives (merit pay). Additional funds come from fees charged for varied services, including the in-house data warehouse services.
Fees for state-mandated services include $5.50 per student for intake of school accountability data, which is then forwarded to SAS for use in calculating value-added scores for teachers.
This is a contract that shows this whole process is a “permitted by-pass” of FERPA regulations (parental consent for commercial use of test scores)… The district provides the data to Battelle for Kids. SAS provides Ohio’s EVAAS calculations of teacher “value added.” The formulas are never disclosed. They are proprietary.
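Because the EVAAS formulas are proprietary, nobody outside SAS can reproduce the actual calculation. Below is only a toy sketch of the generic “value-added” idea (predict each student’s score from a prior score, then average the residuals by teacher), written in Python with made-up teachers and scores, so readers can see what kind of number a “value-added score” is. It is emphatically not SAS’s model.

# Toy illustration of the generic value-added idea only.
# SAS's EVAAS formulas are proprietary, so this is NOT their model;
# the teachers and scores below are made up.
from statistics import mean

# (teacher, prior_score, current_score) -- hypothetical records
records = [
    ("Smith", 62, 70), ("Smith", 75, 74), ("Smith", 80, 88),
    ("Jones", 55, 54), ("Jones", 90, 86), ("Jones", 70, 71),
]

# Predicted current score = prior score + the cohort's average gain.
# (Real models fit regressions with many covariates.)
avg_gain = mean(cur - prior for _, prior, cur in records)

# A teacher's "value added" = average residual (actual - predicted)
# across that teacher's students.
residuals = {}
for teacher, prior, cur in records:
    residuals.setdefault(teacher, []).append(cur - (prior + avg_gain))

for teacher, r in residuals.items():
    print(teacher, round(mean(r), 2))  # Smith 3.17, Jones -3.17

Even in this cartoon version, with only a handful of students per teacher, one or two off days swing the average residual substantially, which is exactly the complaint about attaching careers to such numbers. The fee side also adds up: at $5.50 per student, a hypothetical district of 10,000 students pays $55,000 a year just for the data intake.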
Here is a recent sign-off by a school district. Notice the range of test scores scooped up in the “Agreement.”
[Linked PDF: Dec 15 2015 Exhibit D.pdf]
Thank you for the excellent breakdown. Battelle for Kids should be considered biased and incapable of objectively participating in any teacher performance system. The ongoing VAM-based data collection efforts are clearly meant to root out the suspected bad teachers like a high-tech inquisition. None of Ohio’s VAM efforts can tell a teacher how to improve or demonstrate a link between teaching methods and learning. The Ohio Teacher Evaluation System is a political kabuki dance where the outcome is predetermined: “teachers are terrible and we must reprogram them all.”
The tweaks, fixes and cure-alls of high-stakes standardized tests are always in one tense: the future.
And after many decades of use and misuse, the problems just multiply and get more difficult to resolve.
Google “pineapple” and “hare” and “Daniel Pinkwater.”
From the first hit:
[start]
Eighth-graders who thought a passage about a pineapple and hare on New York state tests this week made no sense, take heart: The author thinks it’s absurd too.
“It’s hilarious on the face of it that anybody creating a test would use a passage of mine, because I’m an advocate of nonsense,” Daniel Pinkwater, the renowned children’s author and accidental exam writer, said in an interview. “I believe that things mean things, but they don’t have assigned meanings.”
Pinkwater, who wrote the original story on which the test question was based, has been deluged with comments from puzzled students — and not for the first time. The passage seems to have been recycled from English tests in other states, bringing him new batches of befuddled students each time it’s used.
The original story, which Pinkwater calls a “fractured fable,” was about a race between a rabbit and an eggplant. By the time it got onto standardized tests, however, it had doubled in length and become a race between a hare and a talking pineapple, with various other animals involved. In the end, the animals eat the pineapple.
The tests can be used to determine whether a student is promoted to the next grade. Once new teacher evaluations are put in place, the tests will also affect teachers’ careers.
Pearson PLC, which created the test as part of a five-year, $32 million state contract, referred questions to the New York State Education Department. The department hasn’t returned requests for comment since Wednesday.
[end]
*Since this is the WSJ, no rheephormer can dispute the veracity and accuracy of the piece.
Link: http://blogs.wsj.com/metropolis/2012/04/20/daniel-pinkwater-on-pineapple-exam-nonsense-on-top-of-nonsense/
So let’s do some real, not rheeal, world math. 1962 saw the first publication of Banesh Hoffman’s THE TYRANNY OF TESTING, in which he points out, among many other things, the absurdity of some standardized test questions. 2012 gives us a rabbit and a hare and shirt sleeves.
50 years!
😱
And they go from bad to worse.
😎
It is absurd to deny students a diploma due to their performance on one flawed test. Some students just have a bad day; they may be poor test takers or highly neurotic. My guess is that a high proportion of these students are poor or minority students with limited access to technology. This is patently unfair!
I just read an editorial in the Pittsburgh Post-Gazette excoriating as a distraction a bill in PA that would allow schools to display “In God We Trust” in their halls… and it struck me that bills like the “In God We Trust” legislation and standardized testing have the same impact: they eat up the time and energy of school boards and dominate media coverage of public education, which diverts the public’s attention from the real problems facing public schools.
Ah, the eternal religious question: Whose god(s)??