Mr. Anonymous, an education policy analyst who is working towards his doctorate, wrote the following cautionary story about the use and misuse of statistics for political purposes. He requires anonymity for the usual reasons, mostly fear of retaliation for speaking up.
He writes:
The Common Core and Departments of Education: Lies, Darn Lies, Statistics and Education Statistics
Numbers have taken center stage in the discussion of education policy in the United States. Test score metrics have become a particularly critical set of numbers. They are seen as objective measuring devices, comparable across years, that provide a reliable evaluation of how students, teachers, schools, districts, and the United States as a whole are doing. But are they really objective?
The push for implementation of Common Core exams has caught the attention of the public. In New York State, as in many other states across the nation, questions have been raised about the motivations of those pushing for the roll-out of these exams and their use in high-stakes evaluations. As we will see below, such concerns are legitimate given the history of the New York State Department of Education and the Board of Regents in setting cut scores and changing exams in ways that serve political and other ends.
Let’s start with Biology, a standard course that almost every high school freshman takes. Remember dissecting that frog? In 2001 the New York State Department of Education changed the Biology Regents to a re-named “Living Environment.” A rather remarkable aspect of the change was the dramatic lowering of the passing score. In the Biology exam a student needed at least 59 points (out of a total of 85 possible points) to earn a passing grade of 65. On the new Living Environment Regents students need only 40 points (out of a total of 85 possible points) to earn a passing grade of 65. In some years (e.g. 2004) a student needed only 38 out of 85 points to earn a passing grade of 65.
The story repeats itself in mathematics. Until 2002 the New York State Department of Education required students to take a “Sequential Mathematics I” exam. That test had a total point value of 100 points. The conversion was simple enough: each raw point was worth one scaled point, and a student needed 65 points to pass. Then, in 2002, the math exam was switched to a “Mathematics A” exam.
On this test students needed to score 35 out of a possible 84 points to earn a 65 and pass. Earning 42% of the possible points led to a 65. Then, in 2008, the math exam was switched again, this time to an “Integrated Algebra” exam. On this test students needed to earn 30 out of a possible 87 points to earn a 65 and pass. Earning 34% of the possible points now led to a 65.
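The raw-score figures quoted above can be put side by side with a few lines of arithmetic. The following is only a minimal sketch: the exam names and point values are the ones given in this post, while the function name `percent_needed` and the table layout are illustrative assumptions.

```python
# Raw-score cut points as quoted in the post:
# (exam, raw points needed for a scaled 65, raw points possible)
CUT_SCORES = [
    ("Biology", 59, 85),
    ("Living Environment", 40, 85),
    ("Sequential Mathematics I", 65, 100),
    ("Mathematics A", 35, 84),
    ("Integrated Algebra", 30, 87),
]


def percent_needed(points_needed, points_possible):
    """Share of the available raw points required to reach a scaled 65."""
    return 100.0 * points_needed / points_possible


for exam, needed, possible in CUT_SCORES:
    share = percent_needed(needed, possible)
    print(f"{exam}: {needed}/{possible} = {share:.1f}% of raw points")
```

A lower percentage here only says that fewer raw points were required. Whether the exam became easier also depends on how hard the individual questions were, which these numbers cannot show.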
The United States and Global History exams underwent similar changes at the turn of the millennium. Before the changes students were required to write 3 essays accounting for 45% of their final score. After the changes students were required to write only two essays accounting for only 35% of their final score. On one of the essays students are provided with extensive information they can use in their writing.
A couple of years later the exact same process occurred with the English Regents. In 2011 the New York State Department of Education changed the exam from a two part six hour test with two essays to a single part three hour test with only one essay. Again the cut scores were dramatically lowered. The scales on these two exams are very different making comparison difficult. One way to measure the change is to look at the grade a student would receive if s/he got exactly half the multiple choice questions correct and earned exactly half of the possible points on the essay(s). On the old English exam that student would have received a grade of 43. On the new English exam a grade of 50.
A year ago the New York State Department of Education changed things yet again. But this time they did not change the exam. They just changed the cut scores. From 2011 until 2013 out of 286 possible point combinations on the exam an average of 74 resulted in a passing grade. Then, in June of 2013, the number of point combinations leading to a passing grade was dramatically lowered by 23%. Since then an average of 63 point combinations out of 286 leads to a passing grade.
It is disturbing that this change occurred at the very moment when the test results would first be used to evaluate teachers. The research base shows that such value-added metrics are unreliable. For example a RAND report concluded “the research base is currently insufficient for us to recommend the use of VAM for high-stakes decisions.” A report out of Brown University concluded “the promise that value-added systems can provide such a precise, meaningful, and comprehensive picture is not supported by the data.” Nonetheless New York State passed laws requiring school districts to use test scores in teacher evaluations. Why, at the same time, did the Department of Education quietly change the cut scores on the English Regents? Is it an attempt to ensure that more teachers are rated ineffective? This would allow certain interest groups to declare the law a success and claim that “bad teachers” are now being identified and should be fired. Is it an attempt to create evidence that there is an epidemic of failing students in New York State? This would allow certain interest groups to proclaim that the crisis can only be solved if the new Common Core Standards are implemented without delay.
Advocates of the Common Core are either ignorant of or deliberately ignore this history. A decade ago New York State Department of Education decided that the high school graduation rate was too low. They therefore changed exams and cut scores to make them easier. The graduation rate went up. Now it seems that some powerful interests have decided that it is too easy to graduate. So they want the exams made harder and the passing cut scores raised. It is evident from the history reviewed above that playing with cut scores is not the way to improve education. After all that just leaves us in the very place we are in today. Yet we seem to be condemned to repeat this cycle all over again. We seem to be enamored of easy solutions. Make exams harder (or easier). Raise cut scores (or lower them). What we do not seem to be willing to do as a nation is roll up our sleeves and do the really, really hard work of ensuring that every student receives a quality education.
When I took my first statistics class in the early 80’s, the professor warned about the misuse of statistics. He continuously warned us about the dangers of using statistics to ‘prove’ any point we wanted. At the time, only liars and other ‘stretchers’ of the truth used statistics for their gain. Now everyone does it. The science of statistics has been misused to the point that when I see an article using them, I am automatically dubious of its claims.
Is it any wonder that academic honesty is becoming a thing of the past?
The problem is that a change in the passing score on a well-crafted exam does not indicate a change in expectations. You have to know about the difficulty of the items.
Too many of us are locked into the idea of a 100 point scale with 60 or 70 as passing.
AP Calculus is a pretty challenging course. When I last taught it, the exam had 108 points. About 40 could get you a 3 (passing) and about 70 got a 5 (the highest score).
Statistics (data, really) does get abused, sometimes by not understanding the context of the numbers. Changing the requirement for an A from 93 to 90 does not necessarily mean standards were lowered or even that there will be more As.
Statistics can say anything the author wants them to say. The only time I trusted statistics was in electronic design. That was because there were consequences to skewing the data. The circuit didn’t work.
Minor correction needed: the pre-2011 NYS English Regents exam was given over two days, correctly noted by the author, but required students to write four essays, not two.
The current exam, which is being phased out in favor of a Common Core-based exam that is designed to fail large numbers of students and teachers, contains one essay and two paragraph-writing tasks.
60 was a good round number. But to race the world the ante was upped to 65/100, 13/20, 6.5/10. Then when the test scores went into the 30’s and 40’s, they just cheat and readjust the score to 65. Makes perfect sense.
By any other name—
Read a blog posting that appeared here on 5-28-2014. The first four lines: “John Ewing wrote a brilliant article called “Mathematical Intimidation.” If you haven’t read it, please do. It demolishes VAM. He calls on mathematicians to speak out.”
Link: https://dianeravitch.net/2014/05/28/thankful-for-john-ewing/
Download the linked pdf of the article. It’s well worth the read.
😎
Our speakers go to 11. 65 is passing, not 60. When I see the number 65, I see the mark of endemic nonsense, and I know whomever devised these grading scales, fools abound.
I was not paying attention to the education reform movement during the time period when Bill Gates was funding and promoting his “small schools” initiative (which he has subsequently decided was ineffective, though it is my understanding that no apology for the damage he caused was forthcoming). I recently read about this (Joanne Barkan in Dissent) — her belief, based on the information she acquired as a journalist, is that our Bill misread the data — that is, Gates saw that small schools were disproportionately among the high performing high schools and concluded that largeness was a problem he would try to fix with his money. So he spent money breaking up large, heretofore unsuccessful (high-poverty, I’d wager) public high schools and forming multiple small high schools out of them. It didn’t work.
What Ms. Barkan did not mention in her article (at least not the one that I read) is the mathematical reason for this. OF COURSE the small schools were disproportionately represented among those schools with high rates of success. The mathematical “law of large numbers” explains this perfectly. For those who understand data and the “law of large numbers”, it is ironic that “lots of big public high schools endured lots of disruption all because our Bill can’t, actually, as it turns out, do some of the more important mathematics such as data analysis”. I am familiar with the biographical information on our Bill that he was something of a precocious math student. Still, the law of large numbers is something almost anyone could understand if they tried . . . . basically, in this context, it says that the smaller groups will always be disproportionately represented at the extremes . . . . due to randomness, chance (this is before you even talk about the fact that poverty is not actually random and poverty really IS the independent variable in everything we are doing with public education). It’s mathematics, Bill — please stop meddling in our schools until you get a little better at it!
I am a math-o-phobe, and I hadn’t heard of the “law of large numbers,” but even I can figure this out, even before knowing of the law. It’s actually pretty common sense. So WHY don’t our fearless leaders know this???? (psst: the answer is money and power. That’s always the answer!)
To expand somewhat on your remarks, the Law of Large Numbers says that the sample mean tends to the population mean for large sample sizes, in the sense that with a large sample the sample mean is very unlikely to differ much from the population mean.
So if one takes 1000 fish out of a lake and weighs them, then the average weight of the fish in the sample is probably pretty close to the average of all fish in the lake. But if one takes a sample of 5 fish, then there is a substantial probability that the average weight could be either considerably higher or considerably lower than the average of all the fish in the lake.
So small schools will show a lot more variation in average academic achievement of their students due to random fluctuations in the average ability of their students. The average academic ability of students in a large school is not likely to vary much from year to year.
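The small-school effect can be illustrated with a quick simulation. This is only a sketch under loud assumptions: every student is drawn from the same made-up score distribution (mean 500, standard deviation 100), the school sizes of 25 and 1000 are arbitrary, and the function name `school_averages` is hypothetical. No real school data is involved.

```python
import random
import statistics


def school_averages(n_schools, students_per_school, rng):
    """Average score of each simulated school.

    Every student comes from the same distribution, so any difference
    between schools is due to chance alone.
    """
    averages = []
    for _ in range(n_schools):
        scores = [rng.gauss(500, 100) for _ in range(students_per_school)]
        averages.append(statistics.fmean(scores))
    return averages


rng = random.Random(0)  # fixed seed so the run is repeatable
small = school_averages(200, 25, rng)
large = school_averages(200, 1000, rng)

# How much do school averages vary from one school to the next?
spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)

# Rank all 400 schools together and count small schools among the top 20.
labelled = [(avg, "small") for avg in small] + [(avg, "large") for avg in large]
top = sorted(labelled, reverse=True)[:20]
small_in_top = sum(1 for _, kind in top if kind == "small")

print(f"spread of small-school averages: {spread_small:.1f}")
print(f"spread of large-school averages: {spread_large:.1f}")
print(f"small schools among the top 20: {small_in_top}")
```

Sorting in ascending order instead shows the same thing at the bottom of the ranking: small schools crowd both extremes even though no school in this simulation is actually better or worse than any other.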
By the way, Bill Gates apparently was very good at mathematics as a student. But probability/statistics is a field with a lot of unintuitive results. As an example, consider flipping a fair coin. If an excess of, say, heads over tails has developed, people tend to expect that it will “self-correct” fairly soon. In fact, once such an excess has built up, it will generally persist for a long time. A lot of gambling systems are based on a fallacious belief in some kind of “restoring force”.
Or consider the famous hat-check girl problem. The patrons at an opera hand their hats to the hat-check girl. She misplaces the tickets and hands the hats back to the patrons at random. What is the probability that no one will get their own hat back?
Caveat to the Law of Large Numbers. The samples must be true random samples; otherwise sample size is irrelevant.
That’s the rub with most polling data.
I mention the hat-check girl problem as an example of a simple probability problem with a rather surprising and unintuitive solution.
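For anyone who wants to check the answer, here is a minimal sketch that computes it exactly. It uses the standard inclusion-exclusion result for this problem: with n patrons, the probability that nobody gets their own hat is the sum of (-1)^k / k! for k from 0 to n. The function name `prob_no_match` is made up for illustration.

```python
import math
from fractions import Fraction


def prob_no_match(n):
    """Exact probability that none of n patrons gets their own hat back.

    Inclusion-exclusion gives sum_{k=0}^{n} (-1)^k / k!.
    """
    return sum(Fraction((-1) ** k, math.factorial(k)) for k in range(n + 1))


for n in (2, 3, 5, 10, 100):
    print(n, float(prob_no_match(n)))

print("1/e =", math.exp(-1))
```

The surprising part is that the answer hardly depends on the size of the crowd: the values settle very quickly near 1/e, so the chance is roughly the same for ten patrons as for a full opera house.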
Folks and “fellows” (paid for with Gates and Tisch funding) are both ignorant of history and ignoring the history of the tests. Many of them did not know the history of teachers writing Regents exams, for example, because they themselves did not attend or teach in public schools or even bother to understand how NYSED was run (due to budget cuts, these “fellows” now run the department). The cut scores and conversion charts change constantly; some scores are not possible on a given exam (there was no 99, 98, 96, 94, or 90, to name a few, on June’s English Regents exam). At the same time, the state dropped the variance of 55 for students with disabilities. These students, unless they “pass” with a 65, will no longer earn a diploma, but will instead earn a “credential.” So in addition to the cleverly manipulated cut scores, some students will no longer get a “non-regents” diploma, which became the new “local” diploma with NCLB. They will only get a certificate of completion. Numbers matter. Words matter. Ideas matter. But the education malanthropists don’t care about anything but distorting the truth with words and numbers and making money on other people’s misery.
“We seem to be enamored of [easy] random solutions. Make exams harder (or easier). Raise cut scores (or lower them).”
“Do this don’t do that, can’t you read the sign?”
It certainly appears that the best model for education reform is the “random walk”, also referred to as the “drunkard’s walk”.
The public’s understanding of anything quantitative is very poor. I read of a study once showing that telling people of a supposed million dollar saving in a government program made just as much impression as telling them of a supposed billion dollar saving.
Despite the public’s ignorance the sciences of probability and statistics contain an enormous amount of valid knowledge.
“The public’s understanding of anything quantitative is very poor.”
Says the guy who believes that IQ is a valid indicator of anything other than SES.
Dienne: what you said.
😎
Psychometric testing goes back more than 100 years and there is an enormous body of statistical evidence supporting it. The results of IQ testing not only predict academic achievement better than anything else, they are also correlated with all kinds of social outcomes.
Comparison of IQ scores with evaluations by peers show strong correlation between high IQ tests and being described as “smart”. IQ scores of chess players were found to be correlated with their FIDE rating. IQ scores are correlated with decision times in experiments where subjects have to push different buttons depending on which of certain lights come on. Although virtually all people can do this very quickly measurements show that the decision time is less the higher a person’s IQ.
No part of psychology is anywhere near as well-developed as psychometrics.
Jim, you’re right. But you have to choose between being right and politically correct.
Maybe SES is a predictor of iq.
SES may predict IQ but that may be because IQ is more the causal factor and SES the effect.
It’s not just the public that is ignorant of statistics; it’s educators and supposed researchers and especially the media.
Which is another argument for more stats and probability in K-12.
I would LOVE it if we could teach more data analysis / probability / statistics in high school math class. This is one very compelling reason to be against common core. While it does include some very elementary data analysis and probability, it’s very little — and by the time kids get through with the mind-numbing dullness of their algebra classes, who wants to take statistics ? It is truly a shame, because this is the #1 skill the public needs. It would be so much better if we could do data instead of “algebra 2” (which is truly unnecessary for all save those who pursue science / engineering or economics at a high level).
Jim,
Please define IQ other than being “intelligence quotient”. What are the various factors that go into making up IQ? How many factors are there? Are those various factors agreed upon by all who insist that IQ is a logically meaningful concept? Has the definition of IQ changed over those “100 years”? (reminder, an apparent “long time” for a social meme doesn’t necessarily mean that the meme is logically correct.) In what way does IQ “predict” “academic achievement”? What does “predict” and “academic achievement” mean in your statement and are those definitions completely agreed upon by all who use the terms? (In the sense that 1 + 1 = 2).
And for a final comment: Psychometrics = Phrenology = Eugenics = Geocentrism = Religious “revealed truths” = Heaven = Hell, etc. . . .
Duane – There is an extensive technical literature on psychometrics where you will find answers to most of your questions.
Thanks for the non-reply!
Duane – It’s up to you to acquire knowledge of psychometrics to the extent that you wish to do so. There is a large body of publications available.
Again, thanks for the non-response!
You could probably start with Jensen’s book on the g-factor. He has an extensive bibliography.
Again, thanks for the non response. Define it so that we can see what you are talking about.
The cultural achievements of different racial/ethnic groups over long periods of history correlate well with average IQ’s. Cultural achievements of Sub-Saharan African cultures have lagged substantially behind Eurasian cultures since the Mousterian 50,000 years ago. The average IQ of Sub-Saharan Africans is about 70. The average IQ of Amerindians is about 85. Although Amerindians have only been in the New World for about 15,000 years they created cultures much more advanced than those in Sub-Saharan Africa but much less advanced than in Europe or East Asia where typical IQ levels of 95-105 are found.
Ashkenazi Jews whose measured average IQ is about 110-115 have extraordinary records of high accomplishment in intellectually demanding fields as shown by the number of Nobel prizes in the hard sciences, Fields Medals in mathematics and chess grandmasters.
“Although Amerindians have only been in the New World for about 15,000 years they created cultures much more advanced than those in Sub-Saharan Africa but much less advanced than in Europe or East Asia where typical IQ levels of 95-105 are found.”
Utter tripe! Naw, that’s being too kind. Pure bullshit.
Shows a complete lack of understanding of the accomplishments of the various cultures, not just in the Americas but the rest of the world.
There are higher levels of cultural achievement in the American Southwest such as at Chaco Canyon, a place extremely peripheral to the main centers of Meso-American civilizations, than in almost all of Sub-Saharan Africa. The Jalisco culture on the West Coast of Mexico was a rather minor part of the Meso-American cultural area but much above the levels of culture found in Sub-Saharan Africa.
All the domesticated animals and plants of Africa with the exception of the banana were first domesticated outside Africa and then introduced into Africa. Amerindians in South America domesticated 7 of the 9 ungulates present. The two species not domesticated live only at very high altitudes in the Andes. In Scandinavia there are two ungulates, the elk and the caribou, of which one, the caribou, was domesticated. In Tibet the single ungulate, the yak, was domesticated. Sub-Saharan Africa has lots of ungulate species, none of which was domesticated. The genetics of Masai cattle for example show that they came from Asia.
Even African domestic dogs are derived, like dogs all over the world, from the Eurasian wolf. This despite the very large number of wild canids in Africa.
Where do you get such excellently absurd talking points?
Do you have a link to this?
By the way Duane – do you actually know anything about Meso-American cultures?
What’s the difference between Tenochtitlan and Teotihuacan?
I doubt that you know Aztec from Zapotec.
Should I presume that by Aztec you mean Mexica?
And I do know my Azz from my zapatos!
Teotihuacan was destroyed long before the ancestors of the Aztecs arrived in Northern Mexico in their trek from the Great American Basin. So there’s no particular connection between the two although the Aztecs knew of the site.
Casi bien, hijito.
Although a few suggest that the Mexica may have come from the desert southwest of what is now the US, the most agreed upon location is what is present day North Central Mexico.
The Aztecs considered Teotihuacan to be a “holy, spiritual site”.
Hate to disappoint you but I’ve had quite an avid interest in Meso and South American archaeology since I first (actually the second but I was too young to remember the first although we do have pictures of my family at the pyramids) visited Teotihuacan and the National Museum of Anthropology in Mexico City in 1972 and after having lived in Peru in 1973 being able to see many “huacas” from the house in which I stayed and having visited a number of archaeological sites both in Northern Peru and in the Cuzco region. I’ve taken an upper level course “History of Mexico” (although that mainly focused on the period after the Spaniards arrived) and a 400 level “Archaeology of South America” course with approval from the instructor since I hadn’t taken “Archaeology 101”. Over the years I have been an on and off member of the American Archaeological Society and have read any number of books on these areas when I could get my hands on them including in Spanish. So rest assured I know a little bit about the subject
The Aztecs certainly came from the north. Their own myths describe their wanderings through the desert. Nahuatl is a Uto-Aztecan language and the distribution of such languages suggests that the proto-language from which they originated was probably spoken in the Great American Basin.
Within the Uto-Aztecan family, Nahuatl belongs to the Aztec-Tanoan sub-family. The other languages of this sub-family are the languages of the Taos Pueblos and Kiowa. The Kiowa were a tribe from the Taos Pueblos who became Plains Indians after horses became available following the Pueblo Revolt of 1680.
So ultimately the Aztecs or their ancestors probably came from the pretty far north.
The site of Teotihuacan was well-known to the Aztecs and other people in Mesoamerica. Very little is known about the people who built Teotihuacan but there is no reason whatever to think that they had any connection with the Aztecs.
Duane, you do know Aztec from Zapotec. I did misjudge you on that point.
That East Asians are smarter than whites. Jews smarter than gentiles. Blacks smarter than whites. All t his is patently obvious.
Jim,
What is “smarter”? (and not meaning the supposed metric of IQ) What constitutes being “smarter”?
If Jews are so much smarter than gentiles why were Hitler and his minions able to obliterate so many million Jews? (and others?)
Me thinks, also, that you have a typo in your response (and certainly not the ‘t his’). Re read it!.