A reader, Charlene Williams, who holds a Ph.D. in clinical psychology, sent the following comment in response to this post about the vocabulary used on one of the Common Core tests:
This speaks to one of the essential issues in the current high-stakes testing debacle: why the Pearson, PARCC, and Smarter Balanced testing is unscientific and unethical. I am a psychologist, a faculty member at UCLA, and a mother in California. I hadn’t heard about these concerns with the current high-stakes testing until I became very concerned about the developmental level of the Smarter Balanced practice items while dutifully helping my daughter prepare for the tests.
The 6th grade ELA practice performance task for the Smarter Balanced test was completely inappropriate for 11-12 year olds, requiring them to toggle between several screens (on small iPad screens), choose multiple pieces of evidence to evaluate, select, paraphrase, and compare and contrast, and write a multiparagraph essay. Never mind that while practicing, toggling back to the articles caused the students’ written work on the essay to be erased (lost).
Why the current high stakes testing is unscientific:
1) There is no proven construct validity (does your test measure what you think it measures?).
2) Cut scores are determined by an unknown (arbitrary) process. Labeling children as proficient or failing appears not to be based on any scientific process. It is not scientific to arbitrarily decide what levels of your test scores actually mean in the real world. Scientific measurement requires cross-validation with external measures that provide evidence for your claims (like grades, or independent in-depth measures of children’s educational achievement in a smaller sample with highly experienced evaluators).
3) Computer-adaptive tests: many concerns have been raised about how item difficulty has been decided. Children continue to progress on these tests if they get a certain number of the most recent answers correct. Educational measurement specialists (true academically trained professionals), as well as parents and children, have observed that very often the items following very difficult questions are significantly easier. This raises concerns that children’s scores are artificially deflated by unscientific item-difficulty determinations.
4) Inter-rater reliability: no checks exist to independently determine whether the scoring administered by these testing companies produces truly reliable and valid measurements of children’s answers (see Todd Farley, http://www.bkconnection.com/static/Making_the_Grades_EXCERPT.pdf).
Most importantly, the Pearson, PARCC, and Smarter Balanced testing is unscientific because it violates the basic rule of science. The assessments are not verifiable, because they are not permitted to be subject to independent scientific evaluation. Their validity can be neither proven nor disproven. Under the guise of “test security,” the companies use copyright restrictions so extreme that they prevent true scientific evaluation of the validity of these tests by scientists with expertise in education, psychology, and related fields.
So I am deeply concerned that the profit-driven testing business is using unscientific (and expensive) testing that is portrayed to the public as if it were truth, with high-stakes ramifications for children, teachers, and our public education system. As stakeholders and parents, we need to demand accountability, real science, and an ethical separation between profit-driven educational businesses and truly scientifically based education and measurement, for the sake of our children, our teachers, and our educational system, which is truly one of the foundations of our democratic country.
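Point 3 above, about adaptive item selection, can be pictured with a toy staircase rule. To be clear, this is a hypothetical sketch for illustration only: the actual SBAC engine is a proprietary IRT-based algorithm, and the function, difficulty scale, and step size below are all invented.

```python
# Toy staircase rule for a computer-adaptive test (hypothetical sketch;
# NOT the actual SBAC item-selection algorithm, which is proprietary).
# Difficulty steps up after a correct answer and down after a miss,
# on an invented -3..+3 scale.

def next_difficulty(current: float, was_correct: bool, step: float = 0.5) -> float:
    """Return the difficulty of the next item to serve."""
    if was_correct:
        return min(current + step, 3.0)
    return max(current - step, -3.0)

# The pattern the commenter describes: a student who misses a very hard
# item is immediately handed a noticeably easier one.
d = 2.5                            # difficulty reached after a run of correct answers
d = next_difficulty(d, was_correct=False)
print(d)                           # 2.0
```

Under a rule like this, whether the easier follow-up items “artificially deflate” a score depends entirely on how the item difficulties were calibrated in the first place, which is exactly the commenter’s complaint.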
There is expert opinion and then there is hearsay.
Common Corporate tests are that second thing.
All this testing craze is really, really destructive for many, many reasons. One = THEY ARE PLAIN STUPID.
I knew the sleeping giant–parents in California–would awaken to this mess. What will happen when the “results” (artificially set cut scores) reach them over the summer?
Alert!!!! Teachers are given access to test questions similar to those on the test in order to prepare their students. I am looking at a few examples that my child brought home. Oh my ______! It is absolute trickery! These are not fair questions. They are meant to fool and trick. How is this educational? David Coleman and Pearson need to explain themselves. We need to ask them who exactly is writing these test questions. Clearly they have no idea how the mind of a child works. They will break the spirits of children if we do not begin holding the writers of these tests accountable.
Roxanne,
Your observation:
“Clearly they have no idea how the mind of a child works. They will break the spirits of children if we do not begin holding the writers of these tests accountable.”
is perceptive and why this type of testing is, in my opinion, a form of child abuse. What would be the reaction were something similar initiated by either a parent or an individual teacher?
I warned my elementary age daughter that some answers to the test questions would purposely try to trick her. Her response? That’s not very fair. Why would they want to do that to us? My response: I don’t know, but I agree…it’s not fair.
Roxanne,
I would be very interested in seeing the questions from the practice test your child brought home. Can you post some of them here?
Not only are the cut scores on these exams not scientifically determined, but they are in fact grossly political, intended to “prove” that students (and by implication their teachers and public schools) are failing, and thus justifying school closings and charter expansion.
It’s the hostile takeover of a public good by a small group of immensely wealthy, venal, faux philanthropists, justified by spurious, politically-motivated pseudo-science.
And the cut scores are often changed from year to year, preventing any kind of “comparison” anyway.
We need many, many, many psychologists, educational psychologists, evaluators, researchers in the area of tests and measurement, school psychologists, and professional associations on testing, all to step forward and address the standardized testing issues publicly. We know so much more about testing children than what the ne’er-do-wells try to shove down our VAM-threatened professional throats.
Good start today!
We need to get the writers of the exam to come forward and reveal themselves. We need to ask them what rationale they used to write these tests. Specifically, we need to get David Coleman, Jason Zimba, and Phil Daro to the table. How do we do this?
Experts in child development need to analyze the demands placed on children by the K-2 testing. Requiring young children to work on a computer for extended periods is wrong and may even be harmful. Expecting young children who can barely visually track to work from a split screen is absurd, and it may even be abuse.
Diane, the leader behind the tests is David Coleman. How can we get these leaders to respond to the public backlash? I would like to speak with Bill Gates, Phil Daro, Jason Zimba, and David Coleman to ask them more about their thought processes. How in the world do they think these tests are fair? Please let us know how we can get in touch with the writers so that we can give them our parental expert opinion.
Roxanne,
You could look up the address for the College Board in New York City. David Coleman is its CEO. Write to him.
An SBAC ELA passage on the 4th grade test had a Lexile of 1140, a grade equivalent of 8.3. Fair?
But if we make the tests harder, our children will learn more. Ass-backward thinking if ever I heard it.
I agree. Or BassAckward, either way, same dif.
Remember the soft bigotry of low expectations? There is SOME truth in that if we expect our kids to do nothing then they will learn nothing – but that doesn’t justify setting the bar so astronomically high that only the best of the best can succeed.
Somewhere in our system of differentiation we became schizophrenic as to what we want to do with our tests.
On one hand, we want them to measure a minimum level of competency, so that we have some idea as to where in the system a student’s education breaks down, if it does, and whether they are indeed prepared for the next level of material. Enter NCLB.
Then, when we started teaching towards the basic minimum because that’s what the tests demanded, we had low expectations. Enter tests that expect way too much.
Now our kids are failing, but the bar is so high that we can’t tell whether they are still meeting a “minimum” level of competency, because it’s so far above most students’ natural developmental abilities.
So we expect testing to verify basic minimum competency, while setting a high standard so we don’t dumb down teaching and teach towards the minimum, and to evaluate teachers based on this extremely high minimum level.
We expect these tests to do an awful lot, don’t we?
In 10 years of administering these tests, I have not learned ANYTHING about my students that I didn’t already know through my daily interactions with them.
Nicely put! We pretend that these end-of-year high-stakes tests actually measure individual student learning. They don’t. We pretend that these tests tell us how effectively individual teachers performed in the past 9 months. They don’t. These tests allow us to rank schools… and oddly enough, when we rank schools by test scores, we find that this corresponds to a ranking of zip codes. Let’s abolish all this pretending and get back to real teaching and real learning. We have a lot of really, really smart people out there – can’t someone find a fair way to measure student learning and determine teacher effectiveness authentically???!!!
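The zip-code observation is, in effect, a claim about rank correlation: ranking schools by mean score reproduces a ranking by neighborhood wealth. A minimal sketch of how one would check that, with invented numbers (none of these figures come from any real district):

```python
# Spearman rank correlation between neighborhood income and school mean
# test score. ALL numbers below are invented for illustration.

def rank(values):
    """1-based rank positions, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman(x, y):
    """Spearman's rho via the no-ties formula 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank(x), rank(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

median_income = [28, 35, 41, 55, 62, 78, 90, 120]                # $k, by attendance zone
mean_score    = [2310, 2344, 2380, 2401, 2455, 2480, 2509, 2561]  # school mean scores

print(spearman(median_income, mean_score))  # 1.0: the two rankings coincide exactly
```

A rho near 1 would mean the test is largely re-measuring demographics; published analyses of score-by-district data commonly report strong, though not perfect, correlations of this kind.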
Yes. The direct costs of these tests are enormous ($1.6 billion a year in state testing contract costs alone), but the opportunity costs are even greater. Those are incalculable.
Re raising the bar . . .
Most of us cannot slam dunk a basketball. But if the height of the basket in basketball were to be raised, then more of us could slam dunk the ball, IF only the coaches were more “Effective”.
I love this basketball analogy. Do I have your permission to use it? TIA
ROFLMAO!
The writers are holding our children to the standards set in Asia. This is unfair and unrealistic. The punishment for not toeing the line in Asian culture is strict and sometimes severe. You can force a child to learn anything if you “force” them. However, we know that too much control and corporal punishment in the early years does not lead to self-actualization or personal happiness. Our mostly white reformers look at the scores of these children and assume we need to match their performance. Little do they know about the oppressive psychology and motivators that help crank out the scores. I can say this confidently because I am part Asian. Superficially, they recognize only the surface of the scoring. I’m not surprised, because while they believe themselves to be deep thinkers, I find their understanding of children to be “SUPERFICIAL”.
“The writers are holding our children to the standards set in Asia.”
NO!, They are not doing that. Where did you come up with that idea (which is absurd)? Do you have a source for that statement? If so, please share it!
The 4th grade SBAC ELA test was way too difficult for the majority of the kids in my class, most of whom live in poverty; but the Common Core pushers assure us that they’ll all get there. Sickening.
Here’s a study which analyzes their assumptions:
“we may be hastily attempting to solve a problem that does not exist and elevating text complexity in a way that is ultimately harmful to students.”
From: Challenging the Research Base of the Common Core State Standards: A Historical Reanalysis of Text Complexity
The conclusions of the study were as follows:
1) The condemnation by the Common Core State Standards (CCSS) authors that school reading texts have become less difficult is inaccurate.
2) The effort to ratchet up text complexity seemed unnecessarily rushed. “According to a wide range of studies, the much more significant challenge is that many third and fourth graders are not reading proficiently at current complexity levels.”
3) The overemphasis on text complexity distracts us from educational problems that are arguably much more pressing, i.e., instructional quality, reversal of the decline in school funding, educational inequalities, and the socio-economic conditions that have consistently been demonstrated to be the major factor in the achievement gap.
http://www.academia.edu/4652539/Challenging_the_Research_Base_of_the_Common_Core_State_Standards_A_Historical_Reanalysis_of_Text_Complexity
MythBusters is a TV program with high entertainment value, and it is rich in scientific thinking… the forwarding of hypotheses, ingenious schemes for testing them, and wowzy proofs or disproofs of outcomes, or still-puzzling results.
Some of the best demonstrations of the absurdity of these tests are the ingenious satires contributed to this and other blogs, and especially the detailed descriptions by parents who struggle to make sense of the testing interface, content, and rationales for the test items. This is to say there are limits to the power of “scientific” or logical reasoning alone in making a case for any cause.
There are peer-review rating systems for standardized tests. These reviews supplement stats on norms, measures of reliability, and claims for validity (construct, content, correlations, references, and so on). Those should soon be available from PARCC and SBAC… if those consortia survive, and that is in doubt. The Buros Mental Measurements volumes were once a go-to source for independent reviews of tests, but those reviews fell far short of the assumed rigor of a peer-review process.
Now, as it happens, edTPA is in the process of being revised. This is a test that makes auspicious claims for reliability and validity in judging teacher performances from video snippets prepared by student teachers. The snippets show them teaching in a classroom they have commandeered for the sake of teaching for this test and becoming eligible for certification.
The field trials and claims of validity/reliability for edTPA are based on as few as 11 student teachers in some subjects. There were 46 teachers in the field trials for the visual arts, with no information on how these candidates were selected, from what institutions, in what grade levels, and so on. They were evaluated on lesson compliance with four broad domains of learning identified in visual arts standards written in 1994.
Now for the visual arts revision, the scholars at Stanford who own the edTPA tests distributed by Pearson are seeking suggestions for new categories of standards-based judgments, derived from brand new visual art standards, published in 2014, so new they are not integrated into teacher education and not yet incorporated in school programs.
The scholarly method for determining how student teachers will be evaluated (based on expert consensus on which new standards are in common use) is a survey monkey distributed to a higher-education listserv for art educators.
The monkey asks for judgments which no thoughtful person can make about the current normative practices and themes in visual art education, especially irrespective of grade spans and the fact that the new standards are dense, with references to 21st Century Skills and the Common Core in addition to art-specific content.
The monkey does not ask responders to identify their own role in teacher education or disclose anything about their qualifications to “validate” typical or normative practice for a high stakes test. The monkey does not indicate why or how the proposed modifications for edTPA have been placed into the survey. They have simply appeared from an undisclosed process.
I have communicated my concerns about this sham method of establishing “validity” to the lead person at Stanford. And as a cross-check, I completed the survey twice… receiving a monkey thank-you at the conclusion of the last entry.
I am not currently in teacher education, and I do not approve of this test or of outsourcing its administration to Pearson. I also think the field tests met only the bare-bones minimums for stats on reliability.
There are normative policies and practices in visual arts education. Among these are documented cuts to programs that are not picked up by the fast-response surveys issued by the National Center for Education Statistics, the inexplicable classification of the arts as a “technical” subject in the Common Core, the use of SLOs for rating teachers of untested subjects, and the rating of art teachers by school-wide scores in reading.
Even if these tests were “valid” and “reliable” (for what?), that would not mean they are a good idea.
“Of Bugs and Rugs”
DDT can kill the bug
Which doesn’t mean we ought
Valid science, sometimes rug
To cover up the rot
Like.
Even if these tests were valid and reliable (they are neither), they would a) have no pedagogical value whatsoever, b) lead to a narrowing of curricula and pedagogy, and c) stop innovation in both curricula and pedagogy cold (because everyone would be, as they are, teaching narrowly to the test and doing nothing new).
Aie yie yie! Diane, this deserves a separate post!!!!
You all should take a look at the exam questions if you can get access to them. It will blow your mind. Challenge is good. Trickery is a crime. It will kill the confidence in our children. Stop them now!
The sample release questions are abominable, and yet, of course, they pull the best questions for these in order to put their best foot forward for marketing purposes. Imagine, given how dreadful the released questions are, what the ones we can’t see look like!!! My students would always come back from their testing MAD–furious and frustrated. They all sign statements saying that they won’t talk about the questions, but they are usually so furious that they just can’t contain themselves. Having reviewed a lot of the sample release questions, I know just what they mean. Here in Florida, selling swampland to Yankees used to be a big business. Well, these testing companies are selling the equivalent. And you don’t even get to look at the test so you can see for yourself what a scam it is.
The membership of the SBAC technical advisory committee is available here: http://www.smarterbalanced.org/wordpress/wp-content/uploads/2014/08/Biographical-statements-for-Technical-Advisory-Committee-members.pdf
A rogues’ gallery of Vichy collaborators with Education Deform.
For years, we have been shouting about the flaws in the Common Core and about the flaws in using defective statistical measures to evaluate teachers. Professors, teachers, parents, and students all agree about the problems created by this Common Core, but no one pushing this core seems to be listening!
We have tried to get their attention by “opting out.” The “reformers” have countered by threatening to withhold state and federal funding. If they follow through with their threats, we will have to step it up a notch until our voices are heard!
I wish that every kid in the country, next year, would write on his or her standardized, high-stakes ELA test:
“My mind is not standardized enough to formulate the requested responses. Please ask one of Mr. Gates’s computers to do this.”
Thank you, Charlene Williams, for this debunking.
Frankly, it’s a shame that this debunking is even necessary. The lack of “construct validity” should be obvious.
The pretense of science in the standards and the tests is pathetically thin.
What are these tests supposed to be measuring? (crickets)
Aaron, to answer your last question:
They don’t measure anything. They are not measuring devices, they are supposedly an assessment device (and a piss poor one at that).
Yup. It’s pseudoscience. It’s numerology. And that’s so darkly, darkly ironic, isn’t it?
But when a long train of abuses and usurpations, pursuing invariably the same Object evinces a design to reduce them under absolute Despotism, it is their right, it is their duty, to throw off such Government, and to provide new Guards for their future security.
Jefferson knew!!!
Reblogged this on edukait and commented:
What are we testing? Our students? or Pearson’s ability to come up with a valid and reliable test?
“Pearson’s ability to come up with a valid and reliable test?”
No one has the ability to come up with a valid and reliable standardized test. Even teachers’ classroom tests are suspect in those regards.
But teachers’ grades have more predictive validity (with regard to college grades) than does the SCCAT (the Scholastic Common Core Achievement Test) or whatever Lord Coleman is calling the SAT these days.
For all the regular folks here, you knew this had to be coming!!
“Why the current high stakes testing is unscientific.”
It doesn’t matter whether the tests are current, past, or future, nor does it matter whether “high stakes testing is unscientific,” because Noel Wilson has already proven the COMPLETE INVALIDITY of the process of developing educational standards and standardized tests in his never refuted nor rebutted treatise, “Educational Standards and the Problem of Error,” found at: http://epaa.asu.edu/ojs/article/view/577/700
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine. (updated 6/24/13 per Wilson email)
1. A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing the descriptive information about said interactions is inadequate, insufficient and inferior to the point of invalidity and unacceptability.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference” each with distinct assumptions (epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think college professor who “knows” the students capabilities and grades them accordingly), the General Frame-think standardized testing that claims to have a “scientific” basis, the Specific Frame-think of learning by objective like computer based learning, getting a correct answer before moving on to the next screen, and the Responsive Frame-think of an apprenticeship in a trade or a medical residency program where the learner interacts with the “teacher” with constant feedback. Each category has its own sources of error and more error in the process is caused when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words, all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid, i.e., errorless or at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations, the test, and the results are. He is an advocate for the test taker, not the test maker. In doing so he identifies thirteen sources of “error,” any one of which renders the test making/giving/disseminating of results invalid. And a basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory.” In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”), or, to put it in more mundane terms, crap in, crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words it attempts to measure “’something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.”
The whole process harms many students, as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices (educational standards and standardized testing) are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self evident consequences. And so the circle is complete.”
In other words students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, the “I’m an “A” student” is almost as harmful as “I’m an ‘F’ student” in hindering students becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
There is hardly a ‘slice of life’ left untested by Pearson.
Just viewed the product lists and lists of Pearson assessments.
I found out that if one stays long enough in our field, many of our colleagues along the way end up publishing tests and selling them through Pearson. My former professors and colleagues know their ‘stuff’ and are highly respected. It is a shame that there is only one major testing company in town. A monopoly on human data!
Not to alarm the retired folks among us, but once we turn 65, every primary care physician or specialist whips out portions of elder functioning assessments from… you guessed it, Pearson, and goes to town on you. Not to be too oppositional, I now refuse ANY such questions or tests. I tell them that any cognitive and/or executive functioning test done on me will be requested ONLY by me, and by the professional of my choice. The casual treatment and lack of professional request for permission is a major concern… especially as I age. Remember, Rahm’s brother is out there talking about pulling plugs and stopping medical intervention at 75. I am sure we have many volunteers who would pull his plug.
Heads up to all Elder BATS.
Miles to walk before we sleep…
You are spot on! As a teacher, I have participated in the administration and scoring of NCLB and CCSS exams for 10 years. I have routinely questioned the validity and reliability of these tests. In confusing multi-step questions, we do not know which standard was the focus, because the test questions are not released. (I bet the standard isn’t the one where students make their errors.) In my experience, there is limited, if any, reliability in the scoring. We complete practice questions during training, with annotations as to why they are scored the way they are (sometimes with faulty and contradictory rationales), and “quality assurance questions,” but NO inter-rater reliability data has ever been collected. Sure, we have “read-behinds,” but even this isn’t scientifically done. When confused as to how to score a paper (because kids will write stuff that’s not in the training material), we consult one another. And by the end of a long day of scoring, our minds are mush, and it’s hard to know if the first and last papers have been scored similarly. There are so many reasons why these tests are bad…
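The inter-rater reliability figure this comment says is never collected is not hard to compute; Cohen’s kappa is the textbook statistic for two raters. A sketch with invented scores (the rubric scale and the data are made up, not from any real scoring session):

```python
# Cohen's kappa: chance-corrected agreement between two raters scoring
# the same essays on a 0-4 rubric. All scores below are invented.

from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    # Proportion of essays on which the raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal distribution.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = [3, 2, 4, 1, 3, 2, 0, 3, 4, 2]   # rater A's scores on ten essays
b = [3, 2, 3, 1, 3, 1, 0, 3, 4, 3]   # rater B's scores on the same essays

print(round(cohens_kappa(a, b), 2))  # 0.61
```

Collecting this during a real scoring session would only require double-scoring a sample of papers; the complaint is that no such figure is ever published.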
To understand all the errors, falsehoods, and problems with every aspect of standardized testing, from conception to the grading and dissemination of results, read Wilson’s work referenced above. There is no “quality control” whatsoever in the process, as the process itself, due to its many epistemological and ontological errors, is conceptually bankrupt.
The afflictions of the “system” will not be cured with a heavier dose
of the system.
Love this point: “There is no proven Construct Validity (does your test measure what you think it measures)” The test-makers CLAIM (and everyone seems to take them at their word) that these are “smarter”, “better” tests that don’t measure bad, old factual knowledge but instead measure higher-order thinking skills. Where’s the proof?
Thanks Diane for posting my comment (or re-posting it in your blog). I appreciate all that you’re doing to help us save education from the profiteers.
Another aspect of validity is consequential validity, meaning what are the consequences of your testing. Particularly what is the potential benefit vs. harm of your testing. I think the benefit is to the profit-driven “reformers” who see big bucks in taking over education. The harm is to children, teachers, schools and our democratically-based educational system.
I think it’s no coincidence that in one of the Smarter Balanced ELA 6th grade practice tests, the three words in footnotes at the end of one of the source articles were “lobbyist” “venture capitalist” and “legal limbo.” Oh- all of which were portrayed in glowing ways in the article!
Thank you, Dr. Williams, for your superb post.
One could make quite a long list of the ways in which these tests fail with regard to validity. A couple of these ways:
The ELA tests fail with regard to prima facie construct validity
a) because the standards [sic] that they supposedly measure are too vague and abstract to be operationalized rationally and thus validly (or reliably) tested;
b) because the number of items per standard (one or two) is insufficient to test for meeting a standard that is very broad or general; and
c) because the items are typically so badly written that they don’t test for what the test authors thought they were testing for. (In the ELA tests, their makers wanted to use multiple-choice questions to test for “higher-order thinking skills,” because those are easily scored by machine, and in order to do this, they hit upon having the wrong answer choices be “plausible.” Well, what does “plausible” mean? It means “reasonable.” So, reasonable answers are judged to be incorrect. I kid you not. THIS INSANITY IS NOW THE STANDARD PRACTICE IN THE ELA TESTS! In practice, the questions are so sloppily written that no answer is actually correct, or more than one answer is, arguably, the correct answer. And, of course, the subjective nature of literary response plays into this.)
The traditional way to determine validity of standardized tests is to correlate results on them to results from some other accepted measure of the standard. So, where are the studies that the test publishers have done to show that the two items on the test that measure proficiency with regard to standard y correlate with some other accepted measure of that proficiency? Well, there are none. So, the tests don’t demonstrate convergent validity either.
I could go on and on explaining other ways in which these tests fail to be valid, but I’ll stop there.
These tests are a scam. They are numerology.
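The convergent-validity study described above is not exotic; at its simplest it is a correlation between the test’s per-standard score and an independent measure of the same proficiency. A sketch with invented data (both the item scores and the external teacher ratings are hypothetical):

```python
# Pearson correlation between a two-item per-standard test score and an
# independent rating of the same proficiency. ALL data are invented.

from statistics import mean, stdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    # Sample covariance over sample standard deviations (both use n - 1).
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

item_score   = [1, 2, 0, 2, 1, 2, 0, 1]   # points on the 2 items for "standard Y"
teacher_eval = [2, 4, 1, 4, 2, 3, 1, 3]   # independent rating of the same standard

print(round(pearson_r(item_score, teacher_eval), 2))  # 0.93
```

Publishing a table of such correlations, standard by standard, is precisely the evidence the comment says is missing; without it, “these two items measure standard Y” is an unverified claim.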
I just tried to quote the last paragraph of Charlene’s comment, with the citation to this blog post, to fb, and I was blocked! “This message contains content that has been blocked by our security systems. If you think you’re seeing this by mistake, please let us know.” I did, but now what? I’ve never seen this before.
OMG, Sheila!!! That’s really awful. We should do a test of this.
What was the exact content that you tried to post on FB? And why are they censoring this?
Now fb says:
“You can’t post this because it has a blocked link
The content you’re trying to share includes a link that our security systems detected to be unsafe:
https://dianeravitch.net/2015/04/29
Please remove this link to continue.
If you think you’re seeing this by mistake, please let us know.”
Sheila, it worked for me.
That’s good! I just was able to post it, too. I don’t know what that snafu was about.
“Computer adaptive tests- there have been many concerns raised about how item difficulty has been decided.”
I did a presentation in Guatemala City to the Guatemalan Reading Association showing that there was no correlation between these tests and the NY City and State standardized tests.