The consortia that created the new Common Core tests, funded by the federal government, agreed to adopt the standard of “proficiency” used by the National Assessment of Educational Progress (NAEP). Students who are not “proficient” are deemed to have “failed” to meet the standards. They are described as “not proficient,” which is a very bad thing indeed.
But what does NAEP proficiency mean?
I served on the NAEP governing board for seven years, and I understood that “proficient” was a very high standard. There are four NAEP achievement levels: Advanced (typically reached by 5-8% of students); Proficient (typically reached by about 35-40% of students); Basic (typically reached by about 75% of students); and Below Basic (very poor performance, about 20-25% of students). Thus, by aligning their “pass” mark with NAEP proficient, PARCC and SBAC (the two testing consortia) chose a level that most students will not reach. Only in Massachusetts have as many as 50% of students reached NAEP proficient; even there, nearly half have not.
As Catherine Gewertz wrote in Education Week, “The two common-assessment consortia are taking early steps to align the ‘college readiness’ achievement levels on their tests with the rigorous proficiency standard of the National Assessment of Educational Progress, a move that is expected to set many states up for a steep drop in scores.
After all, fewer than four in 10 children reached the ‘proficient’ level on the 2013 NAEP in reading and math.”
So, if these consortia intend to align with the very rigorous standards of NAEP, most students will fail the tests. They will fail them every year. Will the test results be used for promotion and/or graduation? If so, we can expect a majority of the current generation of students not to be promoted or to graduate from high school. What will we do with them?
It is time to ask whether NAEP proficient is the right “cut score” (passing mark). I think it is not. To me, given my knowledge of NAEP achievement levels, proficient represents solid academic performance, a high level of achievement. I think of it as an A. Advanced, to me, is A+. Anyone who expects the majority of students to score an A on their state exams is being, I think, wildly unrealistic. Please remember that NAEP proficient represents a high level of achievement, not a grade-level mark or a pass-fail mark. NAEP basic, not NAEP proficient, would be the proper benchmark for a passing grade.
Furthermore, the NAEP achievement levels have been controversial ever since they were first promulgated in the early 1990s, when Checker Finn was chairman of the NAEP governing board. Checker was subsequently president of the Thomas B. Fordham Foundation/Institute, and he has long believed that American students are slackers and need rigorous standards (as a member of his board for many years, I agreed with him then; I no longer do). He believed that the NAEP scale scores (0-500) did not show the public how American students were doing, and he was a strong proponent of the achievement levels, which were set very high.
James Harvey, a former superintendent who runs the National Superintendents’ Roundtable, wrote an article in 2011 that explains just how controversial the NAEP achievement levels are.
He wrote then:
Since definition is crucial in any discussion of standards, let’s define the terms of the discussion. The No Child Left Behind Act, passed by Congress in 2001 as the latest reauthorization of the Elementary and Secondary Education Act of 1965, permitted states to develop their own assessments and set their own proficiency standards to measure student achievement. Most states, for their purposes, quite sensibly defined proficiency as performance at grade level.
What about NAEP? Oddly, NAEP’s proficient standard has little to do with grade-level performance or even proficiency as most people understand the term. NAEP officials like to think of the assessment standard as “aspirational.” In 2001, long before the current contretemps around state assessments, two experts associated with the National Assessment Governing Board—Mary Lynne Bourque, staff member to the governing board, and Susan Loomis, a member of the board—made it clear that “the proficient achievement level does not refer to ‘at grade’ performance. Nor is performance at the proficient level synonymous with ‘proficiency’ in the subject. That is, students who may be considered proficient in a subject, given the common usage of the term, might not satisfy the requirements for performance at the NAEP achievement level.”
It is hardly surprising, then, that most state assessments aimed at establishing proficiency as “at grade” produce results different from a NAEP standard in which proficiency does not refer to “at grade” performance or even describe students that most would think of as proficient. Far from supporting the NAEP proficient level as an appropriate benchmark for state assessments, many analysts endorse the NAEP basic level as the more appropriate standard because NAEP’s current standard sets an unreasonably high bar.
What is striking in reviewing the history of NAEP is how easily its governing board has shrugged off criticisms about the board’s standards-setting processes.
In 1993, the National Academy of Education argued that NAEP’s achievement-setting processes were “fundamentally flawed” and “indefensible.” That same year, the General Accounting Office concluded that “the standard-setting approach was procedurally flawed, and that the interpretations of the resulting NAEP scores were of doubtful validity.” The National Assessment Governing Board, or NAGB, which oversees NAEP, was so incensed by an unfavorable report it received from Western Michigan University in 1991 that it looked into firing the contractor before hiring other experts to take issue with the university researchers’ conclusions, which counseled against releasing NAEP scores without warning about NAEP’s “conceptual and technical shortcomings.”
In addition, NAGB absorbed savage criticism from the National Academy of Sciences, which concluded in 1999 that “NAEP’s current achievement-level-setting procedures remain fundamentally flawed. The judgment tasks are difficult and confusing; raters’ judgments of different item types are internally inconsistent; appropriate validity evidence for the cut scores is lacking; and the process has produced unreasonable results. … The results are not believable.”
For the most part, such pointed criticism has rolled off the governing board like so much water off a duck’s back.
As recently as 2009, the U.S. Department of Education received a report on NAEP from the University of Nebraska’s Buros Institute. This latest document expressed worries about NAEP’s “validity framework” and asked for a “transparent, organized validity framework, beginning with a clear definition of the intended and unintended uses of the NAEP assessment scores. We recommend that NAGB continue to explore achievement-level methodologies.” In short, for the last 20 years, it has been hard to find any expert not on the Education Department’s payroll who will accept the NAEP benchmarks uncritically.
Those benchmarks might be more convincing if most students outside the United States could meet them. That’s a hard case to make, judging by a 2007 analysis from Gary Phillips, a former acting commissioner of the National Center for Education Statistics. Phillips set out to map NAEP benchmarks onto international assessments in science and mathematics and found that only Taipei (or Taiwan) and Singapore have a significantly higher percentage of proficient students in 8th grade science than the United States does. In math, the average performance of 8th grade students in six jurisdictions could be classified as proficient: Singapore, South Korea, Taipei, Hong Kong, Japan, and Flemish Belgium. Judging by Phillips’ results, it seems that when average results, by jurisdiction, place typical students at the NAEP proficient level, the jurisdictions involved are typically wealthy—many with “tiger mothers” or histories of excluding low-income students or those with disabilities.
None of this is to say that the method of determining the NAEP achievement levels is entirely indefensible. Like other large-scale assessments—the Trends in International Mathematics and Science Study, the Progress in International Reading Literacy Study, and the Program for International Student Assessment—NAEP is an extremely complex endeavor, depending on procedures in which experts make judgments about what students should know and construct assessment items to distinguish between student responses. Panels then make judgments about specific items, and trained scorers, in turn, bring judgment to bear on constructed-response items, which typically make up about 40 percent of the assessment.
That said, it is hard to avoid some obvious conclusions. First, NAEP’s achievement levels, far from being engraved on stone tablets, are administered, as Congress has insisted, on a “trial basis.” Second, NAEP achievement levels are based on judgment and educated guesses, not science. Third, the proficiency benchmark seems reachable by most students in only a handful of wealthy or Asian jurisdictions.
It is important to know this history when looking at the results of the Common Core tests (PARCC and SBAC). The fact that they have chosen NAEP proficient as their cut score guarantees that most students will “fail” and will continue to “fail.” Exactly what is the point? It is a good thing to have high standards, but they should be reasonable and attainable. NAEP proficient is not attainable by most students, not because they are dumb, but because it is the wrong cut score for a state examination. It is “aspirational,” like running a four-minute mile. Some runners will be able to run a four-minute mile, but most cannot and never will. Virtually every major league pitcher aspires to pitch a no-hitter, but very few will do it. The rest will not, and they are not failures.
What parents and teachers need to know is that the testing consortia have chosen a passing mark that is inappropriate, that is not objective, and that is certain to fail most students. That’s not right, and that’s not fair.
An invaluable posting.
Like all rheephorm schemes, the use of NAEP “proficient” is designed to punish-by-measurement.
The heavyweights and thought leaders of the self-proclaimed “education reform” movement love catchy and vacuous slogans that sneer and jeer at, and smear, public education.
Ok. I’ll skip “hazing ritual” and “sucker punch” and “rigged” re high-stakes standardized testing and get right to one of the classics of rheephormist ideology—
“The soft bigotry of low expectations.” Except remember that no matter how well teachers and students “perform” or “achieve” [using the preferred psychometric terms] nothing but failure satisfies the rheephormistas. That is, any result that shows public school failure is believable (and the more massive, the more believable) and any result that shows public school success (however modest) is to be rejected out of hand.
In other words, the default position of the charterite/voucherite/privatizer crowd is that public schools suck. There’s a word for that: prejudice—“preconceived opinion that is not based on reason or actual experience.”
And how do you get the hard data points known as standardized test scores to confirm the “soft bigotry of low expectations”?
You employ the “hard bigotry of mandated failure.” But this is not even sleight-of-hand. The cart is publicly and openly being put before the horse: ensure you get the results you want, i.e., failure (preferably massive), and then use those entirely predictable results to justify your “conclusion” that charters and vouchers and privatization are the solutions to what ails public schools.
The “not best answer” on a standardized test goes by various monikers: decoy, mislead, distractor. The “soft” and “hard” bigotry of the self-serving edupreneurs and their enablers and enforcers have as one of their most important objectives:
To try to make people not notice that one of the biggest obstacles to a “better education for all” is self-styled “education reform.”
Yes. Decoy. Mislead. Distractor. Sums up the rheephorm playbook quite nicely.
Or, as they proudly admit when they get together to discuss $tudent $ucce$$ and forget that others might be listening in:
“The secret of life is honesty and fair dealing. If you can fake that, you’ve got it made.”
Marxism—today, tomorrow, forever. Clutching their Groucho to their chest and never letting him go.
😎
Peter Greene had an interesting musing on this topic: http://curmudgucation.blogspot.com/2015/05/proficient.html#comment-form
“There are four NAEP achievement levels: Advanced (typically reached by 5-8% of students); Proficient (typically reached by about 35-40% of students); Basic (typically reached by about 75% of students); and Below Basic (very poor performance, about 20-25% of students). Thus, by aligning their ‘pass’ mark with NAEP proficient, PARCC and SBAC (the two testing consortia) chose a level that most students will not reach.”
Issues with NAEP need to be considered in relation to the new concept of “college and career readiness” put forward by enthusiasts for the Common Core, and in relation to the fact that SBAC and PARCC scores are supposed to have been constructed to assess grade-level achievement of those standards and of specific methods of teaching attached to some of them, with “close reading” an example. These tests have been introduced early in the “roll-out” process, meaning that no students have experienced all of the instruction called for as prerequisites for the grade 11 tests, or the grade 3 to 8 tests, or whatever. So these tests have no integrity as measures of achievement or proficiency relative to the Common Core standards. NONE. The first cohort of students who might have been taught in ways that comport with the grade-by-grade assumption in the Common Core and college/career readiness criteria will enter grade 11 in 2025.
That said, and damn the torpedoes, full speed ahead: SBAC has set cut scores based on field trials in 21 states and a four-category reporting scheme that echoes that of NAEP. PARCC tests are supposed to be comparable (per the original grants from USDE).
Producers of the SBAC tests have set their “cut scores” to report on four levels of performance.
Level 1 signals failure. Level 2 implies “at risk of failure.” Level 3 implies “safe harbor, doing well.” Level 4 means “proficient.”
For students in grade 11, only Level 4 indicates readiness for entry-level, credit-bearing courses in college. The SBAC website shows the cut scores and estimated percentage of students whose scores will be reduced to one of those four categories.
I think that any questions about the relevance of NAEP cut scores to SBAC and PARCC scores–or other tests supposedly aligned to the Common Core–are missing the point.
Unless the frameworks that structure the NAEP tests have some verified comparability with the grade-by-grade Common Core standards and with the Common Core’s definition of “proficiency,” which is firmly tied to “college and career readiness,” the discussion of cut scores is a lot of hot air.
NAEP tests are a work in progress, and the process and results are subject to a lot of peer review. I worked on the NAEP tests in the visual arts in the 1970s, when music was also tested for the first time.
I think there is ample evidence that the SBAC and PARCC tests, like the whole Common Core initiative, have been foisted on educators in a way that has never been true for NAEP tests.
Excellent article. In California, the State Board of Education and its president Mike Kirst and the State Superintendent, Tom Torlakson, with the advice of Linda Darling-Hammond and others, have studiously avoided using the term “proficiency.” Instead, they are tentatively treating levels 3 and 4 on SBAC at 11th grade as comparable to being prepared for a four-year college curriculum (which they realize needs to be further validated). Or, as some of the commentators and NAEP research studies have argued, levels three and four could be interpreted as “ready for credit-bearing courses in a four-year college.” (SBAC essentially used NAEP’s levels three and four as its guide in setting the levels, with the understanding that they were measuring the four-year college bound. Whether the levels are accurate needs to be researched and, if necessary, adjusted.)
At present a little over a third of our students reach this level, and it seems to me a legitimate goal to try to increase that pool. But that still leaves a huge number of students who might be prepared for tech-prep or other career pathways being labeled as failing, which is unfair to most of them and to their schools and teachers. Many may be well prepared for rigorous alternative pathways, just not at the same level or with the same courses (Algebra 2) as the four-year college bound. Thus, schools should not just be held accountable for preparing students for four-year colleges; they should also be attempting to prepare large numbers of the remaining students for these alternative, demanding career/tech pathways. The state is grappling with a valid measure of success for this broader group.
We also are emphasizing that SBAC test results are to be used primarily for feeding information back to school and district efforts to improve performance, not for purposes of high-stakes accountability. We understand that annual standardized test results are probably one of the weakest sources of useful information and need to be embedded in a much richer array of measures, including professional judgments, performances, projects, etc., a combination of state-collected and non-standardized, locally generated information. There is a strong effort to communicate these ideas to parents and the public.
Finally, the State Board, which has responsibility for recommending an accountability system next year, has scheduled several all-day meetings on the subject (Linda Darling-Hammond and David Conley presented at the first one, making the point, among others, that accountability should be viewed as one part of a broader continuous-improvement strategy for the state). The tenor of the first meeting was to go slow on any statewide standards or targets and to rely on local districts for much of the accountability system. Board members stressed the need for more reliable data on what these SBAC results actually mean, especially in grades three through eight; what legitimate goals are (they are usually pulled out of a hat) for state tests, graduation rates, engagement measures, honors courses, course preparation for college or career strands, etc.; and how reporting data fits in with a broader improvement strategy. The Board seemed wary of setting goals prematurely without significant exploration of what targets of performance or growth would be realistic, fair, and valid.
Thank you so much for posting this information. Our superintendent recently tweeted a Columbus Dispatch article indicating “Ohio students are less prepared for college, career, report shows.” He indicated that we will rise to the challenge. Of course, this report cited by the Dispatch was written by Achieve Inc. and supported by Fordham, both major recipients of Gates money. Our superintendent also said this in February 2012 to justify Common Core in our schools:
“I believe the splintered, local control curriculum and assessment approach that the US has had is what has led to our International demise. The variability in performance expectations varies across different states and honestly Ohio has set its passing score on its achievement tests at a drastically low level. The result has been “fooling” people into believing that a student might be Proficient in reading at a local (in this case a state level), when in fact the student isn’t proficient at a National or International level. The Common Core movement started at a grass roots level and was conducted by Superintendents of Public Instruction from the individual states. ….As it relates to teaching and learning, we know more now than we ever have and to give complete curriculum, instruction and assessment control to the individual states has led to disastrous consequences as demonstrated by our International performances on the PISA and TIMSS.”
Jenny,
Your superintendent has it backwards. The variety and diversity of America’s standards and assessments have led to our pre-eminence in the world.
Thank you! We will continue to fight this in our little school district in Ohio, and we have a pretty high rate of opt-outs considering it’s a small town. Our school district also boasts a large number of Distinguished Alumni: teachers, a television producer, an Academy Award winner, a College Jeopardy winner, a world-renowned jewelry maker, and business, community, and military leaders. We are a special school district that already had high standards taught by our wonderful teachers. Now we have CPM math, and the administration requires our highly qualified and experienced math teacher to have CPM coaches to learn a new way of teaching math, as a “facilitator.” It is condescending! A high school junior even created a petition to return to the traditional way of teaching math and received over 600 signatures. The petition was ignored by the superintendent and school board. They forge ahead. The actual record of successful alumni, and of students not learning math and being dissatisfied with the new math, is still not enough to change course back to our traditionally high standards. It is very frustrating, especially since our Ohio governor insists we have local control. We don’t.