James Harvey, executive director of the National Superintendents Roundtable, regularly sends out news bulletins about education. His group might be thought of as the antithesis of the Broad Academy; its members are seasoned educators, not tyros looking to move up quickly with minimal experience. Harvey has wisely inveighed against the common misreading of NAEP’s “proficient” level, which advocates of the Common Core and the CC-aligned tests (PARCC and Smarter Balanced) treated as if it were “grade level.” It is not.
Harvey writes here about the National Assessment of Educational Progress, which is often called “the nation’s report card.”
Here’s a good summary of what ails NAEP and how its results are reported:
What’s the actual lesson to be learned from NAEP scores? According to Forbes contributor Peter Greene, nothing much. Greene argues that despite the hope among many that NAEP data would help us evaluate the effectiveness of different education policies, “In education, it’s fruitless to imagine that data will settle our issues.” He also points out that “The three NAEP levels (basic, proficient, and advanced) do not necessarily mean what folks think they mean . . . NAEP’s ‘proficient’ is set considerably higher than grade level,” as noted on the NAEP site. The Roundtable has taken strong exception to NAEP’s definition of proficiency. The Roundtable’s 2018 report, “How High the Bar?,” concluded that not even 40% of fourth-graders in Finland and Singapore (nations typically thought to be world-class in terms of student achievement) can be deemed proficient in reading by the NAEP standard. The fact that uninformed policymakers and advocates conflate “proficiency” with grade-level performance is one of the absurdities of the current national conversation about schools.
There is a tremendous difference between grade-level and proficiency scores. A grade-level determination is derived from a mathematical formula that shows how a student performs on a test relative to the other students in the same grade. Proficiency is a subjective determination that may change from test to test and year to year; the thresholds that define it are also known as “cut scores,” and cut scores can change from year to year. Basically, proficiency changes when those in power want to raise or lower the bar. If we want many students to fail, we simply raise the cut scores. If we want many students to pass, we lower the cut score. The designation of “proficient” is subjective.
The NAEP is an example of a test whose so-called proficiency level is set higher than the grade-level equivalent. In other words, NAEP awards proficiency status only to students who are performing above grade level.
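Purely as a toy illustration of the cut-score mechanics described above (the scores and cut points below are invented for the example, not real NAEP scale scores), the same set of student results yields very different “proficiency” rates depending on where the bar is set:

```python
# Hypothetical student scale scores for one grade (made-up numbers).
scores = [210, 225, 238, 244, 251, 260, 268, 275, 283, 296]

def percent_proficient(scores, cut_score):
    """Share of students scoring at or above the cut score."""
    passing = sum(1 for s in scores if s >= cut_score)
    return 100 * passing / len(scores)

lenient_cut = 240  # hypothetical "grade level"-style cut
strict_cut = 270   # hypothetical NAEP-style "proficient" cut

print(percent_proficient(scores, lenient_cut))  # 70.0
print(percent_proficient(scores, strict_cut))   # 30.0
```

Nothing about the students changed between the two lines; only the cut score moved, and the share labeled “proficient” fell from 70% to 30%.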
I hope this helps clarify the difference between these two concepts.
Thank you, retired teacher.
You should be Secretary of Education, as I have stated before.
You make sense.
Thank you.
You know making sense is against the rules…lol
In the case of NAEP scores in the visual arts and music, they tell us little because they are now given at best every ten years, and only in grade 8, when many students are not enrolled in art or music. Tests in theater and drama were dropped for reasons of cost and insufficient enrollments for sampling.
Of greater interest in NAEP (for the visual arts) are all of the background questions about the character of in-school instruction and out-of-school participation in specific activities, such as visiting an art museum or an artist’s studio.
The National Endowment for the Arts has very little school-specific information because “participation in the arts” is the larger category into which almost everything is pushed, regardless of any explicit educational aim.
In other words, the overall program of NAEP testing tells more about what the nation values in education than do the test scores taken as measures of “progress.”
When I see the lack of integrity, the acceptance of blatant, outrageous lies by so very many people who follow a sociopathic demagogue like Trump, and yes, the Republican party that will not stand up to his agenda, I think the goals and objectives of education have indeed been lost.
Know thyself; the “who are we as HUMAN beings, why are we here, and where are we going” kinds of questions have been lost now that moneyed interests have made children mere cogs in the wheels of industry.
My view: we have indeed lost our north star and the results are tragically showing. PBS has been running a series on the rise of Nazism. I suggest looking at it carefully.
Right now an unhinged Rudy is trying to cast doubt on the legitimacy of the election. Extremist Republicans are attempting to convince their base that Democrats stole the election in order to delegitimize a Democratic White House. Pathetic.
About a third of the electorate thinks that Biden “stole” the election.
Discussing the results of NAEP scores as if they were valid is just another faith-based belief grounded in the ever shifting sands of onto-epistemological falsehoods, errors and misconceptions. One has to have faith in the validity of the NAEP process and the resulting scores to continue to use it for anything. Noel Wilson has proven the invalidity of the process and the fallacy of using the results for anything. But people insist that the NAEP process is valid when it is not. That is faith-belief: to continue believing something after you’ve been shown that it is logically impossible and wrong.
There has been no rebuttal/refutation of Wilson’s work that I know of, and I’ve been looking for over twenty years now. See: https://epaa.asu.edu/ojs/article/viewFile/577/700
For you RBMTK!
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine. (updated 6/24/13 per Wilson email)
1. A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing, the descriptive information about said interactions is inadequate, insufficient and inferior to the point of invalidity and unacceptability.
A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only logically correct thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student, as it is a description not of the student but of the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
Wilson identifies four “frames of reference,” each with distinct assumptions (epistemological bases) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think of a college professor who “knows” the students’ capabilities and grades them accordingly), the General Frame (think of standardized testing that claims to have a “scientific” basis), the Specific Frame (think of learning by objective, as in computer-based learning, where one gets a correct answer before moving on to the next screen), and the Responsive Frame (think of an apprenticeship in a trade, or a medical residency program, where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error is introduced when the assessor confuses and conflates the categories.
Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words all the logical errors involved in the process render any conclusions invalid.
The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid (errorless), or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations, the tests, and the results are. He is an advocate for the test taker, not the test maker. In doing so he identifies thirteen sources of “error,” any one of which renders the test making/giving/disseminating of results invalid. And a basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory.” In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel that finds the occasional acorn, a result may be “true”), or, to put it in more mundane terms, crap in, crap out.
And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words it attempts to measure “’something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students, as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self-evident consequences. And so the circle is complete.”
In other words students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, “I’m an ‘A’ student” is almost as harmful as “I’m an ‘F’ student” in hindering students from becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
(Sorry about the lack of line breaks I can never get WordPress to do the breaks that are in the writing.)
You had me at “faith-based.”
Not sure, actually have no idea, what you mean by that statement. Please elaborate. Thanks.
For a shorter read on the invalidities involved in the standardized testing malpractices see Wilson’s A Little Less than Valid: An Essay Review at:
http://edrev.asu.edu/index.php/ER/article/view/1372/43
I’d thought maybe the NAEP was valuable because its standards are consistent. Now, as when it started over 40 years ago, NAEP-proficient still means B+/A-. True, no state should ever have aligned “proficient” to indicate meeting grade-level requirements. But at least it provides a measuring stick against multiple states’ cut-score-manipulated standardized test scores, no?
If those “standards” are invalid, made up, nonsensical, it doesn’t matter whether they are “consistent,” which in the terminology of standardized test making is the concept of reliability. Reliability, as shown by Wilson, is one offshoot of validity, and validity concerns rightly outweigh reliability concerns: something can be reliably wrong, can be consistently invalid, can be dependably unethical, can be a stable malpractice that predictably harms all students (yes, even those who don’t take the NAEP, due to the unwarranted influence those scores have on policy discussions and then on implemented malpractices).
And, no, it doesn’t “provide a measuring stick.” Not at all. Those tests measure nothing, and using the concept of “measuring” in regard to standardized testing is not only false but wrong, both ethically and in terms of justice.
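The “reliably wrong” point can be illustrated with a toy measurement analogy (all numbers below are invented for the example): an instrument can produce highly consistent readings, which is all that “reliability” certifies, while every reading is systematically wrong, i.e., without validity:

```python
import statistics

# Hypothetical instrument: the true value is 100, but the device
# reads about 20 units high every single time.
true_value = 100
readings = [119.8, 120.1, 120.0, 119.9, 120.2]

spread = statistics.stdev(readings)            # tiny spread -> highly "reliable"
bias = statistics.mean(readings) - true_value  # large systematic error -> invalid

print(round(spread, 2))  # 0.16
print(round(bias, 2))    # 20.0
```

The readings agree with one another almost perfectly, yet none of them is anywhere near the true value, which is why consistency alone cannot rescue an invalid process.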
The most misleading concept/term in education is “measuring student achievement” or “measuring student learning”. The concept has been misleading educators into deluding themselves that the teaching and learning process can be analyzed/assessed using “scientific” methods which are actually pseudo-scientific at best and at worst a complete bastardization of rationo-logical thinking and language usage.
There never has been and never will be any “measuring” of the teaching and learning process and what each individual student learns in their schooling. There is and always has been assessing, evaluating, judging of what students learn but never a true “measuring” of it.
But, but, but, you’re trying to tell me that the supposedly august and venerable APA, AERA and/or the NCME have been wrong for more than the last 50 years, disseminating falsehoods and chimeras??
Who are you to question the authorities in testing???
Yes, they have been wrong and I (and many others, Wilson, Hoffman etc. . . ) question those authorities and challenge them (or any of you other advocates of the malpractices that are standards and testing) to answer to the following onto-epistemological analysis:
The TESTS MEASURE NOTHING, quite literally when you realize what is actually happening with them. Richard Phelps, a staunch standardized test proponent (he has written at least two books defending the standardized testing malpractices) in the introduction to “Correcting Fallacies About Educational and Psychological Testing” unwittingly lets the cat out of the bag with this statement:
“Physical tests, such as those conducted by engineers, can be standardized, of course [why of course of course], but in this volume, we focus on the measurement of latent (i.e., nonobservable) mental, and not physical, traits.” [my addition]
Notice how he is trying to assert by proximity that educational standardized testing and the testing done by engineers are basically the same, in other words a “truly scientific endeavor.” Asserting sameness by proximity is not a good rhetorical/debating technique.
Since there is no agreement on a standard unit of learning, there is no exemplar of that standard unit and there is no measuring device calibrated against said non-existent standard unit, how is it possible to “measure the nonobservable”?
THE TESTS MEASURE NOTHING for how is it possible to “measure” the nonobservable with a non-existing measuring device that is not calibrated against a non-existing standard unit of learning?????
PURE LOGICAL INSANITY!
The basic fallacy here is the confusing and conflating of metrological measuring (metrology is the scientific study of measurement) with measuring that connotes assessing, evaluating and judging. The two meanings are not the same, and confusing and conflating them is a very easy way to make it appear that standards and standardized testing are “scientific endeavors”: objective, and not subjective like assessing, evaluating and judging.
Those supposedly objective results are then used to justify discrimination against many students for their life circumstances and inherent intellectual traits.
C’mon test supporters, have at the analysis, poke holes in it, tell me where I’m wrong!
I’m expecting that I’ll still be hearing the crickets and cicadas of tinnitus instead of reading any rebuttal or refutation.
I cannot entirely agree with Mr. Greene and the National Superintendents Roundtable.
While I agree that the NAGB’s NAEP proficiency levels are not mathematically what they want the nation to believe, I have found the results to be reasonably consistent with PIRLS, SAT, and ACT results.
I just pulled the PIRLS results and looked at the 2011 4th-grade average reading scale score, and I recently looked at the class of 2019’s 2011 NAEP 4th-grade average reading results, and there is only a minor difference.
The difference between myself and apparently everyone else: I translate the tests first, so I can do an apples-to-apples look at the results, like translating multiple languages into American Standard English. And yes, I know, everything possibly wrong with an assessment can be described in two words: “human error.” I no longer need my list. And my best guess: there is a reasonable probability of human error in every evaluation.
And I don’t believe it is my fault that smart people in the last 70 years have not figured out how to translate the tests. And then I came along with my piddling middle school math skills and figured it out. Frankly, I am not the guy who is supposed to be able to do this. Crikey, I married my math tutor.
And I have stated all I am going to say here, about this.
I hope everyone stays safe and healthy.
To be clear, I am not opposed to NAEP. I served on the NAEP board for seven years.
The point here is that many, many people wrongly believe that NAEP Proficient is grade level. They think that students who don’t achieve that level are “failing.” That’s nonsense. NAEP Proficient is a high bar. At best 35-40% will get there. That’s the way the test is designed.
You should be opposed. 😉
I don’t believe I’ve ever seen a rebuttal or refutation by you or anyone that negates the many invalidities involved in the standards and testing malpractice regime as proven by Wilson, Hoffman, myself and others.
How can you support an invalid process that is educational malpractice and harms many students?
“And I don’t believe it is my fault smart people in the last 70-years have not figured out how to translate the tests.”
By translate do you mean interpret?
Because of the inherent problems of accurately translating test questions from one language to another, translation adds another invalidity component to the whole standards and testing malpractice regime.
Thanx, Duane! I’ll never tire of your Wilson rant, & always welcome, as there are many new people who come on the blog & need your wisdom!
See above for the other main onto-epistemological problem with standards and testing malpractices. That being that we are literally attempting to “measure the unmeasurable, the unseen”.