Miriam Kurtzig Freedman, an attorney who represents public schools in education matters, including testing and special education—and is currently working to reform special education—posted this comment. Her website is http://www.schoollawpro.com.
Can we really use student tests to measure teacher effectiveness?
Miriam Kurtzig Freedman, M.A., J.D.
This is the year! Tests related to the Common Core State Standards (CCSS) are launching across our country. They are designed to measure how well students are learning the CCSS. Meanwhile, some states, with federal encouragement, plan to use them also to measure teacher effectiveness. Is this use valid?
There is no shortage of controversy about educational testing and, unfortunately, this controversy includes the opportunity to file lawsuits. The use of student achievement data to also evaluate teacher effectiveness is certainly controversial. Notably, Arne Duncan, the Secretary of Education, gave states a year’s reprieve on implementing this practice. Across the country, teacher unions have called it unfair. My concern is far more basic. It’s about validity.
As an attorney who has represented public schools for more than 30 years, I am concerned about this multipurpose use. It may not get us what we need: a valid, reliable, fair, trusted, and transparent accountability system. The tests at issue include those developed by PARCC and SBAC, two multi-state consortia that are funded by the U.S. Department of Education and private funders. They were charged with developing an assessment system aligned to the CCSS by the 2014-15 school year.
At last count, these consortia have 27 states and the District of Columbia signed up, affecting 42% of U.S. students, according to Education Week.
The media remind us constantly that our ‘failing’ schools need fixing; that, to do so, we should assess student skills and knowledge to help teachers improve instruction; that we also need to evaluate and rate teachers and weed out poor performers. And we are told that these tests can be multipurposed to do all of the above!
Sounds good? Actually, it sounds too good to be true. Does this multipurpose use to evaluate teacher effectiveness clear a key psychometric hurdle: test validity?
What is test validity?
At its core, it is the basic, bedrock requirement that a test measure what it is designed to measure. Thus, if a test is designed to measure how well 3rd graders decode, we judge the test according to how well it does that. Can students decode? If it is designed to be predictive, say, to measure whether students are ‘on track’ or progressing toward college or career-readiness, we judge it accordingly. Either way, we must ask whether a test whose purpose is to measure what students learn or whether they are ‘on track’ can also be used to measure something else, such as how well teachers teach.
So what are these tests’ purposes? For answers, let’s review the PARCC and SBAC websites. First PARCC, the Partnership for Assessment of Readiness for College and Careers:
PARCC is a group of states working together to develop a set of assessments that measure whether students are on track to be successful in college and their careers. These high quality, computer-based K–12 assessments in Mathematics and English Language Arts/Literacy give teachers, schools, students, and parents better information whether students are on track in their learning and for success after high school, and tools to help teachers customize learning to meet student needs.
PARCC is based on the core belief that assessment should work as a tool for enhancing teaching and learning. Because the assessments are aligned with the new, more rigorous Common Core State Standards, they ensure that every child is on a path to college and career readiness by measuring what students should know at each grade level. They will also provide parents and teachers with timely information to identify students who may be falling behind and need extra help. [Emphasis added]
Second, the SBAC, Smarter Balanced Assessment Consortium:
The [SBAC] is a state-led consortium working to develop next-generation assessments that accurately measure student progress toward college- and career-readiness. Smarter Balanced is one of two multistate consortia awarded funding from the U.S. Department of Education in 2010 to develop an assessment system aligned to the Common Core State Standards (CCSS) by the 2014-15 school year.
The work of Smarter Balanced is guided by the belief that a high-quality assessment system can provide information and tools for teachers and schools to improve instruction and help students succeed – regardless of disability, language or subgroup.
Smarter Balanced involves experienced educators, researchers, state and local policymakers and community groups working together in a transparent and consensus-driven process. [Emphasis added]
Clearly, these tests’ purpose is to (a) measure student progress on the Common Core State Standards (CCSS) and college or career readiness, (b) give teachers and parents better information about students, and (c) help improve instruction. No mention is made of gauging teacher effectiveness.
Yet, questions about the validity of using these tests in this multipurpose way seem to be missing from national discussions, even as other validity issues are raised. For example, questions are raised about score validity when tests are administered in different ways (on a computer or with paper and pencil) and at different times of the year.
Also discussed are questions about whether these tests are aligned to the CCSS. The media report battles among states, unions, and others about how to measure teacher effectiveness through these tests, e.g., through value-added models, student growth percentages, or other approaches. But questions of basic test validity from the get-go about this multipurpose use of these tests are not part of today’s public discourse.
They should be.
If we continue on this track of creating high stakes for teachers with tests designed for a different purpose, we may well end up with unintended consequences, including distrust of the system, questionable accountability, and lawsuits.
My suggestion? Given the reprieve for states and growing concern among the public about these tests and the CCSS themselves, test consortia and our federal and state governments should take a deep breath and do two things.
First, the consortia should remind the public that the purpose of these tests is to measure student achievement on the new CCSS and career and college readiness, provide better information to teachers and parents, and improve instruction.
Second, the states (with federal approval and encouragement) that intend to use these results also to evaluate teacher effectiveness must inform the public explicitly about how they intend to validate the tests for this new purpose. They need to provide solid proof that their proposed use, which differs from the stated purpose of these tests, is valid, reliable, and fair. The current silence is worrisome, not transparent, and unwise.
This test validity issue needs to be fully aired and resolved satisfactorily before we can begin to tackle the larger issues about the multiple uses of testing. Otherwise, in our litigious land of opportunity, the ensuing battles may be costly and not pretty. Let’s not go there.
Reblogged this on Exceptional Delaware and commented:
I can answer this with a resounding NO!
I am not in education but I am following this “situation” as closely as I am able to with great interest. It has caught the attention of many no doubt. So I have a question emanating from Chancellor Tisch’s attempts to explain why this testing is important and now that I have heard her explanation a number of times in different settings, as she changes her emphasis from sound bite to sound bite, I’m starting to understand some of the underlying issues which she appears to be obfuscating.
If I go to the doctor and he/she orders a battery of blood tests to help diagnose what my health issues are (and they are many), the doctor receives about two or three pages worth of test results outlining what the lab was testing for (e.g., sodium, potassium, creatinine, BUN, HDL, LDL, etc.), what the normal reference range for each item is, what items are in range, what items are out of range, etc.
I get a copy of the report and the doctor gets a copy within a day or two. The doctor reviews all of the “out of range” results with me and we set up a strategy to deal with the problems such as taking a Vitamin D supplement, changing my diet, going for more in-depth testing on some of the items, taking a CT scan of a kidney, etc. etc. etc.
My doctor doesn’t get “written up” if some of my test results are worse than they were the last time I was tested. In fact, he/she gets a little more involved with my healthcare now that he/she knows much more about me. Therefore, the way I see it, the blood test is a diagnostic tool for me and my physician to plan a strategy to move me toward a healthier life.
If the both of us had received a postcard with a number 2 on it about six months after I had taken the blood test, I could be dead.
So why is Chancellor Tisch saying that these tests are important diagnostic tools if the child, the parent, and the teacher never get to see the specific results? And, if the tests are, as she seems to be touting, valid and reliable diagnostic tools, why shouldn’t ALL schools want to participate, including those public schools in the wealthier districts and especially the private schools where I suppose the children in her family attend? As I seem to be realizing, Obama’s children, Arne Duncan’s children, Cuomo’s children, Tisch’s children, Christie’s children don’t attend schools subject to this testing regimen?
Am I missing something?
Tisch is clueless. I doubt she has even seen one of the Pearson math and ELA tests that she is touting. She is regurgitating shopworn talking points that have long lost their credibility with educators and parents. She (and Cuomo) own the mess they created and she is desperate and confused in the face of a parent revolt that is now reaching epic proportions.
It’s too late. The reformers are using the courts to push their agenda and the resistance must go to court too. We can’t sit by waiting while someone the public never hears debates these issues in the dark to determine if these tests are valid and useful.
I say, flood the legal system and file as many court cases against the corporate education reformers as possible. We can start with David Coleman (treason, racketeering and conspiracy), Arne Duncan (treason and conspiracy), Bill Gates (treason and conspiracy) and the Waltons (treason, racketeering and conspiracy).
If teachers can be sent to prison for changing answers on bubble tests, what should prison sentences be for subverting democracy with bribery at every level of government and ignoring the Constitution and Bill of Rights?
18 U.S. Code § 2381 – Treason
Whoever, owing allegiance to the United States, levies war against them or adheres to their enemies, giving them aid and comfort within the United States or elsewhere, is guilty of treason and shall suffer death, or shall be imprisoned not less than five years and fined under this title but not less than $10,000; and shall be incapable of holding any office under the United States.
The illegal act is the conspiracy’s “target offense.” Conspiracy applies to both civil and criminal offenses. For example, you may conspire to commit murder, or conspire to commit fraud. Conspiracy generally carries a penalty on its own. In addition, conspiracies allow for derivative liability where conspirators can also be punished for the illegal acts carried out by other members, even if they were not directly involved. Thus, where one or more members of the conspiracy committed illegal acts to further the conspiracy’s goals, all members of the conspiracy may be held accountable for those acts.
Think of all the money that Gates, Eli Broad, the Waltons and those hedge fund billionaires have spent to influence the actions of public officials. For instance, Governor Cuomo, who is guilty as the devil of crimes against God.
Bribery is the practice of offering, giving, receiving, or soliciting something of value for the purpose of influencing the action of an official in discharge of his/ her public or legal duties. Bribery is a gain to an illicit advantage. Federal statutes refer to two classes of offenses: graft and bribery.
http://bribery.uslegal.com/federal-laws-on-bribery/
Exactly correct. Teachers NEED some legal groundwork to slow this test & punish fraud. Validity is the foundation of accurate assessment and if someone can’t make the case that these tests are invalid they haven’t been looking.
While the reformers are establishing a legal framework in courts, the national teachers’ unions are making campaign calls for Democrats & debating the Common Core.
The whole premise of this argument is flawed in that it assumes a fair or meaningful or trustworthy motive for teacher evaluation practices today. Gerald Bracey and Susan Ohanian proved this was false years ago. The motives are nefarious and are meant to destroy teachers and public schools.
Neither set of tests has yet, as far as I can tell, been properly evaluated for reliability. That is, do the tests provide CONSISTENT results for the same testees? A test that is not reliable psychometrically cannot have validity for any purpose.
Reliability does not mean validity. I can get on a scale that consistently records my weight as 170 pounds. If I pick up my oldest cat, it will record our joint weight as 176 pounds. But actually I weigh 190 and he weighs 12, for a total of 202. Thus although the scale measures consistently, and thus is reliable, any attempt to draw valid conclusions from the numbers it gives will fail – it is not valid.
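The scale example above can be sketched in a few lines of Python, purely as an illustration. The fixed 20-pound offset and every name below are assumptions made for the sketch, not anything the commenter specified: repeated readings have zero spread (the scale is reliable), yet every reading is 20 pounds off (it is not valid).

```python
from statistics import mean, pstdev

def biased_scale(true_weight, offset=-20.0):
    """Hypothetical scale that always reads a fixed amount low (assumed offset)."""
    return true_weight + offset

TRUE_WEIGHT = 190.0  # the commenter's actual weight, in pounds

# Step on the scale five times.
readings = [biased_scale(TRUE_WEIGHT) for _ in range(5)]

# Reliability: how much do repeated readings disagree with each other?
spread = pstdev(readings)

# Validity: how far is the typical reading from the true value?
bias = mean(readings) - TRUE_WEIGHT

print(f"readings: {readings}")
print(f"spread across readings: {spread} lb (consistent, so reliable)")
print(f"bias against true weight: {bias} lb (wrong, so not valid)")
```

A spread of zero says nothing about whether the number is right; only a comparison against an independent measure of the true quantity can show that.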
How about we determine whether or not these tests are even reliable before we start arguing about whether or not they are valid even for their intended original purpose?
Because, if a test isn’t valid it doesn’t matter if it is reliable. Tests can be reliable yet still be an invalid measure.
Exactly! Reliability and validity are intimately entwined. See Wilson’s essay review of the testing “bible” to help understand why: “A Little Less than Valid: An Essay Review” found at: http://www.edrev.info/essays/v10n5.pdf
I’ve always said, as soon as teachers start suing over their evaluations this whole thing will go down the drain as too costly. Just like the tests will go away as soon as parents stop allowing their kids to take them, the evaluations will go away as soon as teachers start suing over their obvious misuse and misinterpretation.
Teachers here in Florida sued and lost. The judge acknowledged that VAM was crazy and unfair but he declared it constitutionally legal.
The seven teachers filed their suit based on constitutionality issues. They were evaluated based on the scores of students that they didn’t teach and questioned the validity of such an arbitrary, irrational and unfair evaluation system.
A lawsuit based on “harm” (dismissal, denial of tenure) would be subject to an entirely different set of legal benchmarks and would probably be much more winnable.
https://feaweb.org/teachers-file-federal-736-lawsuit
That may be but the FL case was under the careful control of the FEA and its highly paid labor attorneys. Why they thought the constitutional angle was the way to go is beyond my knowledge and understanding.
I know of several teachers who have been fired and a few who will be losing their teaching credential this year due to VAM. I am hopeful that more lawsuits will follow.
My local says they are in the pipeline and it will take years to litigate. In the meantime there are bills to be paid and families to feed and support and no one that I know of provides support for those in the process of litigation so I’m not placing my hopes on that particular solution in the short term.
The questions asked in this article are legitimate and beg to be answered. My experience as an ESL teacher allowed me to participate in the validation process of a standardized ESL test. It was a thorough process that took almost five years of administering the test to different groups at different times to establish norms. From what I understand the CCSS related tests are not standardized. My question is how did they determine what are appropriate grade level expectations? It has been stated that the reading passages are on a frustration level for most students, and this results in about 2/3 of the students failing the test. It seems to me the tests are failing the students, not the other way around. Likewise, there are problems with ambiguity, format and content with the math tests as well. How do they support the claim that the failing students are not ready for college and career? Not all careers have the same expectations. I have seen struggling readers become successful mechanics, even though I know a student I taught would have failed this test. What is the point of sorting people into winners and losers? How does this improve instruction?
As far as the CCSS related tests being useful to teachers, these are more questionable claims. The teachers receive nothing back from the tests to make the tests useful or diagnostic. By the time the teachers receive the results, they are no longer teaching the same cohort. It is even more absurd to assume these tests can evaluate teachers’ performance. These claims are false and have been discredited by statisticians that were only willing to concede a range of 1-14% reliability. The powers trying to hold public schools hostage need a serious dose of reality, and we may have to go to court to get it.
RT,
“My experience as an ESL teacher allowed me to participate in the validation process of a standardized ESL test. It was a thorough process that took almost five years of administering the test to different groups at different times to establish norms.”
But the test still suffers from all the epistemological and ontological errors and falsehoods (fudges as Wilson calls them) that render any conclusions COMPLETELY INVALID. He addresses the processes. Those psychometric fudges in that process appear to legitimate the test but they can’t change the fact that at the conceptual basis (epistemological and ontological) they are invalid. That is what Wilson proves. Notice that no one, not a single psychometrician or any other statistician, test maker, researcher, etc., has ever rebutted or refuted his work.
There is no validity of the process.
“Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine.
1. A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing the descriptive information about said interactions is inadequate, insufficient and inferior to the point of invalidity and unacceptability.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference,” each with distinct assumptions (epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think college professor who “knows” the students’ capabilities and grades them accordingly), the General Frame (think standardized testing that claims to have a “scientific” basis), the Specific Frame (think of learning by objective, like computer-based learning, getting a correct answer before moving on to the next screen), and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error in the process is caused when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words, all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations attempt to “prove” that these tests (based on standards) are valid-errorless or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the test and results are. He is an advocate for the test taker not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. And a basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”) or to put in more mundane terms crap in-crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words it attempts to measure “’something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.”
The whole process harms many students as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self evident consequences. And so the circle is complete.”
In other words students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, the “I’m an “A” student” is almost as harmful as “I’m an ‘F’ student” in hindering students becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
“The use of student achievement data to also evaluate teacher effectiveness is certainly controversial.”
“Student achievement data”?
How about calling the “student achievement data” what they actually are: test scores.
Then people can read what actual experts (e.g., at ASA) have said and decide for themselves whether the use of such scores for that purpose is “controversial”.
“Teach the Controversy”
The controversy’s contrived
By those who push the VAM
It’s incontrovertibly hived
By busy bees for jam
I hope the teacher unions get ahold of this and start shaking it like a terrier shakes a rat.
As a teacher, taking legal action is very risky. You see half of my evaluation is based on student test scores, the other half is based on administrator observation. If I’m seen as a trouble maker for speaking out against the system it could very easily cost me my career. All they need to do is rate me as ineffective. I feel I need to keep my mouth shut and play the game. After all, I’ve got kids to teach. Let the self-appointed experts make a mockery of what I do every day and I’ll keep teaching my kids the right way.
If the tests are designed to measure college and career readiness, then how can we know if a 3rd grader is ready for college until we find out 9 years later if he was accepted, and how he performed? And what happens if he did poorly on the Common Core Tests in 3rd grade but gets into Harvard? Who gives that teacher the credit then?
So true. I had a Russian student in ESL K-3. He was very immature and presented like a slow learner. He struggled in English reading, but he also went to Russian school on the weekends and was learning to read in Russian at the same time. This is a bad practice, but we had our marching orders so we forged on. He was in compensatory reading for a couple of years after he finished ESL. I lost track of him, but met his mother several years later. She informed me he was finishing his Ph.D. in biochemistry from Columbia. The lesson here is not to make assumptions about people’s potential. Humans are dynamic, not static.
The tests may say they measure college & career readiness but that doesn’t mean they do. Buyer beware. Every standardized test should be peer reviewed by assessment experts so school personnel, psychologists, physicians, etc can make informed decisions before buying. Quality assessments are repeatedly updated, piloted, normed & re-normed. Has this been done with the CC?
You should find the technical data including validity & reliability for all published tests here http://buros.org/mental-measurements-yearbook
jcgrim,
Thank you for that site!
The use of national tests is basically a political one. Of course the tests are technically valid and reliable. The terrain upon which the anti public school groups have chosen for their war is political and that is the battle field pro public school forces should use if they have any hope of winning this war.
“Of course the tests are technically valid and reliable.”
Unless you’re being facetious, NO they are not “technically valid and reliable.”
Is this you, NY teacher? It’s brilliant:
Perdido Street School: NYSED Can’t Come Up With The Data To Show Why Sheri Lederman Is “Ineffective” On The APPR Test Component
Comment from NY teacher:
“NYSED has a legal vulnerability regarding testing that could easily bring down the use of scores for teacher evaluation.
Two simple and related questions to ask the Pearson test developers:
1) What evidence can you present that the Pearson math and ELA tests were developed for the purpose of accurately and reliably measuring “teacher effectiveness”?
2) Please explain how the Pearson math and ELA tests feature “instructional sensitivity”. Provide evidence to show that all outside influences (such as prior knowledge or pre-requisite skill deficiencies) were nullified through proper item development, so as to insure each and every test item related to the specifics of individual teacher/grade level instruction.
The bottom line is that these tests were never intended to measure teacher effectiveness and Pearson has zero evidence to present in such a defense. In fact, Pearson would have to admit that NYSED did not request ‘instructionally sensitive’ tests.”
http://perdidostreetschool.blogspot.com/2015/02/nysed-cant-come-up-with-data-to-show.html
And I would add one more question for Pearson:
3) What evidence do you have that supports the use of scores from your math and ELA tests to accurately measure the “effectiveness” of anyone who teaches a subject other than Common Core math and ELA?
Note:
Any so-called brilliance detected in my post at Perdido Street School is directly attributed to the professional and relentless efforts of frequent commenter, Laura Chapman.
This is weird. I went to Ms. Freedman’s web site and could not find the original post as I wanted to reference it for a friend.
Is it possible that Dr. Ravitch got a preview of an upcoming post? If so, that would explain it as “the google” can’t find it either. Or was this sent to her only?
If the post has a direct link, could Dr. Ravitch provide it?
Thank you in advance.
Education by SOFTWARE doesn’t work. DUH….learning is more complex than any software can muster. There are LIMITS to software. Gates will never say this out loud. He profits by the lie.
The PARCC and SBAC tests cannot be “validated” as suitable for evaluating teachers. They are not, to use the language of testing, “instructionally sensitive.”
This means that even with the fanciest of statistical maneuvers, there is no unequivocal way to claim that any changes in student scores (and these are always expected to be gains) can be attributed to a single teacher.
The most common methods of attribution, so-called value-added models/ metrics–VAM–are known to violate many assumptions that are required for the statistical estimates that are used. You can visit the website VAMBOOZLED to learn more about the fraudulent use of student scores to evaluate teachers.
About 70% of teachers do not have scores of the kind produced by the large-scale assessments–whether state-wide or multi-state like PARCC and SBAC. A few of those teachers are being evaluated on other student tests.
A very large number of teachers have classes that do not yet have tests of the kind that statisticians want for VAM estimates of the “value-added” by teachers to the scores of individual students. Many of these teachers are assigned the school-wide score on PARCC/SBAC tests, usually the ELA test since literacy is supposed to be the responsibility of ALL teachers.
Many teachers who do not have teaching assignments that generate state-wide or regional test scores are being subjected to a process of meeting “targets” for learning set at the district level and on district-wide tests. These tests are not routinely reviewed for reliability and they are usually constructed around the easy-to-test parts of instruction. For example, teachers of visual art in a district may have settled on a single vocabulary test for each grade level. That is easy to score.
The real deal is that Pearson tests of students are NOT instructionally sensitive. Who says so? Pearson!!! Pearson does not know how to make tests instructionally sensitive. NONE of the major test-makers will verify that their tests are instructionally sensitive. Many, like Pearson, are sort of committed to doing some research. A Pearson expert says all of this in a document available at http://www.parcconline.org/sites/parcc/files/Instructional_sensitivity_memo_final(08 15 14).pdf This PDF includes a diagram that is a perfect example of circular reasoning that makes absolutely no sense.
Anyone who says “tests are objective” has offered proof-positive that he or she knows nothing about the process of constructing a test.
Here are a few of the conditions that must be met for any claim that standardized tests are “instructionally sensitive.”
1. The students enter school as a blank slate and they do so every year, slate wiped clean of prior learning. Every test score is based on one “interval” of instruction and learning by one teacher, and only one teacher. Otherwise, the test score may be influenced by the work of other teachers.
2. The teacher instructs each student in content that has absolutely no connection to anything that the student has ever encountered in any form, at any prior time. This means that the tested content/skills must be free of any and all influences from learning at home, from peers, from members of the community, from the mass media, from travel and so on. Statisticians try their best to eliminate all of those possible sources of learning as influences on the test score of each student. They can’t. And that is one reason why variations in test scores are so strongly related to the education and income of parents and the “value-added” by those resources. At most, the variation in scores that can be attributed to specific teachers is from 1% to 14%.
3. Another way to talk about the problem of tests is that they are more reliable when all the content is “de-contextualized,” meaning free of biases. That means that the tests must favor literal over connotative meanings, and mostly conventional knowledge and procedures–strictly academic–not open to multiple interpretations based on personal experience, or learning beyond the cocoon of the classroom. Any “other instruction” that may amplify learning, but is not intended by the teacher, becomes a good reason to question the validity of the test.
I will let others elaborate on the idiotic assumptions that govern the testing scene at the dawn of the 21st century. Every state that uses test scores from students to evaluate teachers should produce peer-reviewed evidence that those tests are “instructionally sensitive.”
A memo from PARCC addressing the instructional sensitivity of their tests:
Instructional_sensitivity_memo_final%2808%2015%2014%29.pdf
I have already attempted to recruit attorneys for just this purpose. I was unsuccessful. No attorney in CT seems willing to take this on. Are there any attorneys out there willing? If so, please respond!
Bill Morrison,
Try Wendy Lecker. Or a law clinic at Yale.
Bill (Fixin’ to Test Rag) Morrison,
Attorneys will have little legal muscle (or interest) unless the tests have produced substantial harm to yourself and/or other teachers. Suing on a matter of principle is not how it works. The law and justice are two different concepts.
The tests are perfectly valid for their actual purpose: destroying public education. Very reliable too.
But is it a valid way to destroy it?…hmmm
A simple “no” will do. It seems that no one is talking about the fact that the kids who are taking the 3-8 tests don’t care. How can you evaluate teachers on a test without value to the test taker? It would be like firing the basketball coach based on the results of a pickup game on the playground.
The past 3 days I proctored the 7th grade NYS ELA Common Core test. Yesterday I had one little girl finish in a half hour. Today she wrote a whole essay on what she does every day. She even drew nice pictures to go with her essay. She told me she didn’t know how to answer the question so this is what she wrote. Some kids wrote 2-3 sentence answers. I feel sorry for the teachers who are going to be “graded” on kids like this and possibly lose their jobs. Can you imagine going to school and becoming a teacher to be judged in this crazy system? Many times parents can’t even get control of their own kids, yet teachers are going to be evaluated on how they do on tests. Maybe they’ll even be evaluated on students they don’t even teach. It is a sick and toxic system. When are we going to realize that we have no control over other people and least of all what little kids do on tests? And, by the way, the rest of the day was totally shot. The kids were talkative, unruly and tired. I’ve even had parents ask if tests and quizzes in other subjects could be postponed because their kids are so stressed out from these tests. All we do now is test.
One day, those tests will get subpoenaed, and read to a judge or jury.
I think we need to “go there” to the land of litigation.
Who writes the test items that make whole schools “fail”?
Test Development Job Opportunities
Item Writers
Item writers construct passages and/or develop test questions for educational assessments. A bachelor’s degree is preferred with experience in item writing, teaching, or developing state standards, curriculum, or tests or test-preparation activities. Opportunities exist in the following content areas:
English Language Arts (Reading, Writing)
Mathematics
Science
Social Studies
http://www.pearsonassessments.com/careers/test-development-job-opportunities.html
Yep, we knew the whole thing rested on a house of cards, but who’d have thought the cards were fake, too?
The students are on to the fact that these tests mean nothing in terms of grades, pass/fail, or whether or not they receive academic support. Half of the 7th grade group I was proctoring left for a separate opt-out room, two refused the test so sat and read, and the students who remained began asking how they could opt out. At this point they realized not everyone was being made to take the test, so imagine how that affected their motivation. Three students closed their books and put their heads down after 20 minutes. I encouraged them to go back over the test, to no avail. They felt they had done well enough. Of the students who completed the test, two worked hard and the rest put down 3-4 sentence answers. Yesterday, on day 2 of the test, 3 came with opt-out notes. This means they will now receive a score in the 1 range because they took part 1 of the test, but will not complete the rest. As students turned in their work I noticed one of them (who was mad her dad would not opt her out) had scribbled in all her responses! How can these test scores possibly be used to evaluate teachers? This is truly insane.
Your story is probably typical of how the opt out/refusal movement has corrupted the scores across the state, especially at the 7th and 8th grade levels, ages where students are not so compliant. By next week (math) it will get even worse as the opt out mentality goes viral.
One day, those tests will get subpoenaed, and read to a judge or jury
To all parents and teachers:
Please read the conclusion of Dr. Miriam Kurtzig Freedman, M.A., J.D., as follows:
“This test validity issue needs to be fully aired and resolved satisfactorily before we can begin to tackle the larger issues about the multiple uses of testing. Otherwise, in our litigious land of opportunity, the ensuing battles may be costly and not pretty. Let’s not go there.”
Dr. Kurtzig confirms that in our litigious land of “OPPORTUNITY,” the ensuing battle may be “COSTLY.” Therefore, all of us parents and teachers need to control our emotions and use our logical minds, hiring ONLY “THE” lawyer who is confident and who volunteers to fight for a fee contingent on winning the case, without any “front load” cost to parents and teachers.
I am just saying that we should be aware that the two words “OPPORTUNITY” and “COSTLY” can work two ways, “for” and “against” us. As NY Teacher points out, “law” and “justice” are two different concepts in court, because money can create all kinds of loopholes to work around the law, and money can sway the outcome in favor of the party with money. It is sad, but there is not much we can do about it! Back2basic