The Policy Consortium in the UK has a good overview of the British response to the PISA scores.
Each political party is pointing fingers at the other for the scores not being as high as they would wish.
The Conservatives say it is Labour’s fault.
The Labour party says it is the fault of the Conservatives.
But here are some good takeaways.
“… it is important to attract the most talented teachers to the most challenging classrooms and teacher shortage and disciplinary climate are inter-related.
Moreover, despite what Michael Gove asserts, a qualified teaching force is a key driver of quality and performance, the PISA data shows. Two findings are key here: first, the quality of a school (or college) cannot exceed the quality of its teachers. Second, principals of disadvantaged schools have difficulty attracting well qualified teachers, so students suffer doubly.”
And more:
Much has been said in the media about the influx of migrant groups such as the Roma undermining provision for the indigenous population (all part of the attack on the EU by right-wing media). But, again, the PISA report shows this is first and foremost an issue of resources. “The concentration of immigrants in a school is not associated, in itself, with poor performance”, it says.
Nor is it a question of fairness. High-performing school systems tend to allocate resource more equitably across advantaged/ disadvantaged schools. Also, combining high performance with a high degree of equity is possible – it happens in some countries.
Another observation, with which all political parties would claim to be in tune, is that “schools with more autonomy over curricula and assessments tend to perform better when they are part of a school system with greater collaboration between principals and teachers”.
So, rather than sniping across the political garden fence, politicians should try to build a consensus around policy options that improve performance and equity. Four such actions which the PISA report clearly identifies are:
- targeting low performance regardless of socio economic status
- targeting disadvantaged pupils through additional resources or finance
- improving the quality of teaching staff, focusing on time for teachers themselves to learn
- including marginalised students in mainstream education
And here is another important finding:
From the outset, the need for a good start is clearly identified. For example, the report shows, one year of pre-school improves performance in maths by one year of schooling.
Better staff-student relations are associated with greater student engagement. “Too many students do not make the most of the learning opportunities available to them because they are not engaged with school and learning. Drive, motivation and confidence in oneself are essential if students are to fulfil their potential”.

Maybe if the deformers keep pointing their fingers rather than pointing to themselves (they know they are the problem: it’s about money), their fingers will fall off.
I don’t buy the teacher quality argument. By the above, if you took the teachers from, say, Scarsdale and put them in schools in a poor inner city district, things would be fine.
It’s the quality of teaching, which is affected by many factors including community support of education, the community and family environments, school support of teaching, and, yes, teacher abilities and beliefs.
Correlation does not imply causation.
What these countries do show is higher quality learning. The role of teachers in that is not so clear. Was what we know best taught to us by the highest quality teachers?
Good point. Teacher quality is relative, context specific and largely unmeasurable.
Dienne:
Teacher quality or teaching quality may be difficult to measure, but that hardly makes them non-measurable. It may well be that you cannot reduce it to a single metric or dimension, but that simply means you need to consider multiple dimensions.
Bernie,
Again not everything in this world is “measurable”. The teaching and learning process is one that absolutely cannot be “measured” from a logical epistemological ontological perspective. The very heart of the teaching and learning process belongs in the logical realm of aesthetics not measurement so from the very start the concept of measuring it is a falsehood.
Can you measure the love you have for your children? Do you sort and separate and rank them according to some metric(s)? What do you then say to the lower/est ranked one?
So ingrained is this sorting and separating episteme that most cannot even recognize when they are doing so.
After I read the Scientific American article about paper being a better medium for the brain to digest/deal with written information the librarian asked: “Well, what was the best part?” And I said, “There isn’t any best part, as the. . .” and she cut me off and said, “Come on, Duane, there has to be a best part.” I kindly explained to her that I don’t think in those terms and that there were quite a number of thoughts/ideas that synergistically came together to present a powerful case and that it would be best if she would read it herself. She looked at me like I was crazy. (I didn’t necessarily need her confirmation).
Duane:
I really do not want to go through this again. Your POV and that of Wilson are counsels of perfection – as I have already said. As for your question about measuring the love I have for my children – well that depends on the context and what issue we are trying to work on. More to the point – because I am sure the next issue is can we measure beauty – read Heather Hill’s paper cited in Haertel and then we can talk about the strengths and limitations of her approach rather than argue philosophically about whether anything can ever be measured.
No, Bernie our position is not one of “perfection”.
It is one of epistemological and ontological validity of the discourse about how to assess not only students but teachers, administrators and all involved. What Wilson points out is that the best (notice not perfect) assessment framework is that of description, not measurement, and that the description process should include not only the assessor but also the assessed. Yes, it is a “philosophical” argument; I know of no other way to hash it out. Attempts at “measuring” the very human process of teaching and learning (kind of like the very human, though not necessarily exclusive to humans, process of “measuring” the love of one’s children) fail because that process is far too complex to reduce to mere numbers. I truly don’t understand how people can’t or refuse to see/comprehend that simple fact, other than that the cultural habitus and/or ideology are overriding the logical thought process. Hopefully it is the first; it’s more understandable.
“well that depends on the context and what issue we are trying to work on.”
That love depends on context and issue? Perhaps I am misreading what you have written. Even if it does, do you measure that love? (But I do agree that all assessments are context, time and space delimited; one constantly makes them, whether one realizes it or not, and the vast majority of those assessments have nothing to do with “measurement”.)
I’ll have to look up the Hill paper, thanks for the lead!
And Wilson and I both agree that any assessment process is less than “perfect” if one assumes that there is such a thing, and I don’t. Acknowledge the imperfections in the assessment process; do not try to hide them, which is what psychometrics does with standardized tests. And no, the vast majority of people have no clue about construct validity, item response theory etc. . . .
There does seem to be conflicting evidence about this point in recent posts on this blog.
The post based on Dr. Ladd’s article discusses the importance of teacher experience on student learning as measured by test scores, yet the post about Dr. Haertel’s talk cites him as saying “that social scientists generally agree that “teacher differences account for about 10% of the variance in student test score gains in a single year.” Out-of-school factors account for about 60% of the variance; many other influences are unexplained variables.”
If all differences between teachers account for 10% of the variance in scores, it is difficult to see how experience, one of the possible differences between teachers, has a large impact on measured student learning.
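To make the arithmetic behind this point concrete, here is a minimal Python sketch. It uses the shares quoted from Dr. Haertel (about 10% of variance for teacher differences, about 60% for out-of-school factors) and treats experience as responsible for a purely hypothetical half of the teacher share. That 0.5 figure and the function names are assumptions for illustration, not numbers from either post. The sketch also converts a variance share to standard-deviation units, assuming additive, uncorrelated components, since the same share can look different on that scale.

```python
import math

# Shares of variance in one-year test score gains, as quoted in the comment above.
TEACHER_SHARE = 0.10
OUT_OF_SCHOOL_SHARE = 0.60

# Hypothetical: fraction of the teacher share attributable to experience.
# This value is an assumption for illustration only.
EXPERIENCE_FRACTION_OF_TEACHER_SHARE = 0.5


def share_to_sd_units(share):
    """Convert a share of total variance into standard-deviation units.

    Assumes additive, uncorrelated components, so a component's SD is
    sqrt(share) times the total SD.
    """
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return math.sqrt(share)


def trait_share(teacher_share, fraction_of_teacher_share):
    """Variance share of one teacher trait; it can never exceed teacher_share."""
    if not 0.0 <= fraction_of_teacher_share <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return teacher_share * fraction_of_teacher_share


if __name__ == "__main__":
    experience_share = trait_share(TEACHER_SHARE, EXPERIENCE_FRACTION_OF_TEACHER_SHARE)
    print("unexplained share:", round(1 - TEACHER_SHARE - OUT_OF_SCHOOL_SHARE, 2))
    print("experience share of variance:", round(experience_share, 3))
    print("all teacher differences, in SD units:", round(share_to_sd_units(TEACHER_SHARE), 3))
    print("experience alone, in SD units:", round(share_to_sd_units(experience_share), 3))
```

Under these assumptions experience can never account for more than the full 10% attributed to teachers, which is the point being raised here; whether that counts as a "large impact" depends on the scale one reads it on.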
I haven’t looked at the study but perhaps years of teacher experience is the predictor variable being used for “teacher differences”? I am hard pressed to think of another easily obtained measure that might be used except perhaps “advanced degree” or “national certification status” expressed dichotomously?
Taking a step back, it is easy to see how teacher experience could have a beneficial impact on learning outcomes that are not well-measured by standardized test scores. Reading for pleasure was mentioned (and could be measured, just not on a test). Other studies have looked at students’ plans for the future. Certainly an experienced teacher could have a beneficial impact on that as well.
I would think that graduate degrees and/or graduate hours could be used. With a large enough sample size you might be able to differentiate between individual teachers. I imagine that it would require a very large sample.
Emmy:
You should read Haertel’s speech/article then find the Hill et al 2010 article.
Much of the confusion comes when using multiple regression techniques with underspecified models. The assumption is that the missing variables will lead to a larger error term, BUT there can be huge changes in the actual importance of factors when new, previously unconsidered variables are added. You need to think about how the two examples cited by Haertel might show up in a regression model.
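A small simulation can illustrate the point about underspecified models. The sketch below is not one of Haertel's examples: the variables, the coefficients (1.0, 2.0, 0.8) and the function names are all invented for illustration. It generates an outcome that depends on a predictor `x` and on a second factor `z` that is correlated with `x`, then compares the estimated importance of `x` with and without `z` in the model.

```python
import random


def simulate(n=5000, seed=0):
    """Generate hypothetical data: y depends on x and on a factor z correlated with x."""
    rng = random.Random(seed)
    xs, zs, ys = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        z = 0.8 * x + rng.gauss(0, 0.6)          # z is correlated with x
        y = 1.0 * x + 2.0 * z + rng.gauss(0, 1)  # true effect of x is 1.0
        xs.append(x)
        zs.append(z)
        ys.append(y)
    return xs, zs, ys


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)


def slope_one_predictor(x, y):
    """Least-squares slope of y on x alone (z left out of the model)."""
    return cov(x, y) / cov(x, x)


def slopes_two_predictors(x, z, y):
    """Least-squares slopes of y on x and z, solving the 2x2 normal equations."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    det = sxx * szz - sxz ** 2
    bx = (sxy * szz - szy * sxz) / det
    bz = (szy * sxx - sxy * sxz) / det
    return bx, bz


if __name__ == "__main__":
    xs, zs, ys = simulate()
    print("slope on x, z omitted:", round(slope_one_predictor(xs, ys), 2))
    bx, bz = slopes_two_predictors(xs, zs, ys)
    print("slope on x, z included:", round(bx, 2))
    print("slope on z:", round(bz, 2))
```

With `z` left out, the slope on `x` absorbs the effect of the omitted factor (in expectation 1.0 + 2.0 × 0.8 = 2.6 under these made-up settings) rather than simply inflating the error term; with `z` included it falls back toward 1.0.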
Emmy,
Read and understand Wilson’s work and you’ll realize that there is absolutely no need to read all the edumetricians’ machinations about standardized testing, multi-variate regression models, or any other aspect. Anything from the edumetricians has all the same logical epistemological and ontological errors that expose the complete invalidities of the process as shown by Wilson.
Read and understand that and help break the current national addiction to “metrics”, sorting and separating and ranking and “grading” students. Blind addiction, sad!
No one has refuted nor rebutted Wilson’s work. I’ve not seen any come near it. It’s almost as if it’s radioactive, that’s how damning it is to the current cultural habitus of education.
I’m beginning to believe that the vast majority of educators absolutely cannot bring themselves to confront themselves and those atrocious academic malpractices because they have been a part of them so long (since they were five) and cannot face themselves realizing that they have caused many innocents harm. Truly there would be no place in hell for those who Go Along to Get Along.
Duane,
The problem is that Wilson’s work just does not make very much sense. Perhaps it would be helpful for you if we worked through your ten point summary to show that temperature is as flawed a measure of hotness and coldness from Wilson’s viewpoint as exam results are for education.
TE,
Wilson’s work wouldn’t be considered an easy read, I guess, if you put it in one of those reading level diagnostics. But I would think that you would be able to understand it with enough hard work and effort. Reread it! The more you reread it the more you will understand it. Take your time, read each section and sub section carefully, write a brief summary in your own words of each section.
Actually Wilson does use the temperature example to show that measuring devices do indeed “interfere”/”interact” with the item measured. I explained that in a post about a month ago or so when you first brought the temperature aspect up.
Temperature can be measured accurately enough for general usage by people as the degree of accuracy doesn’t affect our dealings with temperature. Temperature is one simple physical aspect that can be adequately described with a number. Does it have to be? No, mankind got along for a long time without temperature measuring devices.
The teaching and learning process is vastly more complex and therefore is not amenable to being described through simple means such as number of correct answers on a test. We may do it but it is so inaccurate as a means of describing a complex problem that it amazes me that we do it other than humans being lazy and looking for simple solutions to complex undertakings.
Duane,
Give me your thoughts on this, perhaps a point at a time.
Brief outline of Wilson’s “Temperature Standards and the Problem of Error” and some comments of mine. (updated 12/4/13 by an economist)
1. A quality cannot be quantified. Quantity is a sub-category of quality. It is illogical to judge/assess a whole category by only a part (sub-category) of the whole. The temperature is, by definition, lacking in the sense that “hotness and coldness are always of multidimensional qualities. To quantify them as one dimensional quantities (degrees) is to perpetuate a fundamental logical error” (per Wilson). The perception of hotness and coldness falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify hotness and coldness we are lacking much information about said interactions.
2. A major epistemological mistake is that we attach, with great importance, the “temperature” of the backyard, not only onto the backyard but also, by extension, the neighborhood, town and county. Any description of temperature is only a description of an interaction, that of the air and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the air as it cannot be a description of the air but the interaction. And this error is probably one of the most egregious “errors” that occur with weather forecasts (and even the backyard thermometer readings by individuals).
3. Wilson identifies four “frames of reference” each with distinct assumptions (epistemological basis) about the temperature assignment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think college professor who “knows” the day will be “hot” and dresses accordingly), the General Frame-think standardized testing that claims to have a “scientific” basis, the Specific Frame-think of learning by objective like seeking to program a thermostat to comfortable daytime temperature, getting a correct answer before moving on to the next screen, and the Responsive Frame-think of an apprenticeship in a trade or a bakery program where the learner interacts with the oven with constant feedback. Each category has its own sources of error and more error in the process is caused when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.” (no modification needed for this paragraph as it concerns all error in tests, temperatures, time measurements, distances, etc. )
In other words all the logical errors involved in the process render any conclusions invalid.
5. The meteorologists, through all sorts of mathematical machinations, attempt to “prove” that these temperatures (based on degree scales) are valid-errorless or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the temperature results are. He is an advocate for the clothes wearer not the warm clothing maker. In doing so he identifies thirteen sources of “error”, any one of which renders the temperature measuring/disseminating of results invalid. A basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the meteorologists can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”) or to put in more mundane terms crap in-crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does temperature measure in our world? It measures what the person with the power to pay for the thermometer says it measures. And the person who sets the thermometer will name the temperature scale what the person who pays for the thermometer wants the temperature scale to be named.”
In other words it measures “’something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many clothing manufacturers as the social rewards for some are not available to others who “don’t make the warm clothing(sic)” Should temperature have the function of sorting and separating days so that on some heavy clothing is worn yet on others light clothing will be chosen, especially considering that the sorting and separating devices, thermometers, are so flawed not only in concept but in execution?
TE,
We can be quite accurate in measuring temperature while at the same time having to come up with new descriptions of that accuracy and how to get that accuracy. See: http://www.sciencedaily.com/releases/2013/07/130711084858.htm
Attempting to use temperature measurement in the context of Wilson’s work seems to me to be akin to using lasers to measure the distance between the constellations of the zodiac (choose whatever zodiac). In other words not a good choice.
And no, using weather temperature for determining what one wears is nowhere near the same as using a standardized test score to describe a student’s abilities, skills, capabilities. Again the comparison is ludicrous.
In essence it is a Red Herring:
This may seem like an odd name for a common logical fallacy. The term comes from an archaic practice of using strong-smelling fish to distract hounds from a fainter scent during their training. A red herring fallacy is simply any unrelated topic brought into an argument to divert attention from the subject at hand. A person on the defensive end of an argument changes the subject from one that he or she feels uncomfortable with to one he or she knows more about. A red herring fallacy looks like this:
There is discussion of issue A.
There is an introduction of issue B (irrelevant to issue A, but pretending to be relevant).
Issue A is forgotten and issue B becomes the focal point.
From: http://www.education.com/study-help/article/distracting-techniques/#heading2
Hard to catch red herring here in the Show Me State!
Duane,
Let us do this one step at a time. Is “temperature” an attempt to quantify a quality? Indeed, might a temperature reading judged warm by a person in February be judged cool in July despite having exactly the same measurement in degrees Celsius or cent
Let me finish, or centigrade.
@Duane. I will put Wilson on my reading list. From your summary points I think I get the gist. He sounds like Bruno Latour. In general, I am very sympathetic to this viewpoint. However, I think it is unlikely that we can replace our current state of affairs with philosophy. Most people don’t get it…and of those that do, they’d rather be playing around with HLM. Or at least that is where their bread is buttered.
I am more interested in what mixed methods can do to answer real questions about learning. If you can put a pragmatist hat on for a few minutes, what kind of research would you like to see?
Emmy,
What is HML? Thanks in advance!!
“I am more interested in what mixed methods can do to answer real questions about learning. If you can put a pragmatist hat on for a few minutes, what kind of research would you like to see?”
I’m not quite sure what you mean by “mixed methods can do to answer real questions about learning” Please expound as I can’t figure out what you are asking. Again, thanks in advance!!
Duane
TE,
Again, the red herring of temperature takes away from the discussion at hand, that of the validity or should we say invalidity of educational standards, standardized testing and the usage of the results to compare, rank and sort different countries’ educational systems. As Wilson says, “Why this enormous urge to represent uni dimensionally a variety of human performances which are obviously multi dimensional? Why this obsession with numbers, this illusion of numerical accuracy, this delusion of descriptive adequacy?”
Duane:
As I have said before, it depends on the issue or question you are trying to answer. Take for example the selection of students at MIT. Somebody else had raised the question and so I looked at the link to the MIT admissions process. It turns out, not surprisingly, that MIT had almost 19000 applicants and admitted 1548 or 8.2%. Now there are undoubtedly many criteria that MIT also used in selecting these students – especially since there are some non-STEM majors at MIT. That said, the admissions committee has a tough challenge: eliminating 92% of the applicants. I do not know specifically, but I would contend that they would use results on various standardized tests as confirmatory and/or level-setting mechanisms. 96% of incoming Freshmen in 2013 had SAT Math scores of 700 or higher. It also looks like a combined score on the 3 components of the SATs of greater than 2300 doubled your chances of being admitted.
http://mitadmissions.org/apply/process/stats
How would you address the challenge of filling 1548 places from almost 19000 applicants?
Bernie,
Don’t know how I would go about choosing those incoming students, never gave it a thought. Since Meremac In Town is a public community college I would do a lottery.
Duane;
Well, your approach is guaranteed to destroy the reputation of MIT in 1 or 2 years tops. Whether or not you answer the question, the question remains. The members of the admissions committee will endeavor to determine who gets in and who doesn’t in as reasonable a way as possible and it will involve the use of standardized test results in some shape or form.
“What these countries do show is higher quality learning.”
Prove it! And I’m not talking PISA or any other standardized test scores. What is “higher quality learning”? Hint: It has nothing to do with higher standards and standardized testing.
Enjoyed reading your post, Emmy. I do have a different perspective on one of your points. You state, “… the quality of a school (or college) cannot exceed the quality of its teachers.” My view is that the quality of a school – or any institution – cannot long rise above the level of its leadership. Great athletes do not coalesce into a great team and sustain that level for any length of time without a great coach. In the absence of competent leadership at the principal and superintendent level, great teachers typically end up working in isolation – frustrated, on a path toward burn-out, often gradually lowering their own standards along the way. Conversely, it is often the case that good leadership can foster growth in mediocre teachers. Here in America, too, much (rhetorical) emphasis has been placed on improving the quality of teachers. No one’s actually taking steps to bring this about, but even if policy did change, I believe that putting better teachers in classrooms will be to little avail if, first, the abysmal quality of school leadership is not improved. Perhaps in the U.K. you have better administrators than we do. In any event, I did quite enjoy your post. Thanks.
Strongly agree that great leadership is very important. Karen Louis of the Univ of Mn and others have done a lot of research that affirms this point. “And…the impact of leadership tends to be greatest in schools where the learning needs of students are most acute.”
Click to access ReviewofResearch-LearningFromLeadership.pdf
Leaders, schmeaders. We don’t need no stinkin leaders!!!!
Or, as the Wobblies (Industrial Workers of the World) would answer when confronted by a deputy sheriff demanding to know who their leader was, “We’re all leaders!”
My apologies. I meant to address the above to you, Diane!
And yes, I realize you are not British!