How should we define “readiness for college,” and what should we look for when reviewing applicants for admission? Over most of the last century, there have been two schools of thought on these questions.
The traditional view has emphasized achievement, as demonstrated by students’ high-school grades in academic subjects. This view reflects a reward-and-incentive philosophy in which admission to college is the reward for hard work and achievement in high school. In this view, rewarding performance not only assures colleges of a high-quality student body but also has an incentive effect on K-12 education. It encourages schools to offer, and students to take, a rigorous college-preparatory curriculum.
The alternative view is that students should be judged not simply on what they have learned but on their ability to learn. The idea that students can and should be judged on ability is closely associated with the SAT, and it has captivated American college admissions since the test was first introduced.
Whereas the older “College Boards” had tested knowledge of college-preparatory subjects, the “Scholastic Aptitude Test,” introduced in 1926, purported to measure a student’s capacity for learning. This idea dovetailed perfectly with the meritocratic ethos of American college admissions. If aptitude for learning could be reliably measured, the SAT could provide a vehicle for social mobility: Colleges could identify promising students from disadvantaged backgrounds who, despite inferior instruction and academic performance, were nevertheless deserving of admission. Insofar as the SAT was standardized, it offered a more uniform measure of college readiness than high-school grades, since grading standards vary by school. Above all, the SAT provided a predictive tool, giving admissions officers a means to distinguish between applicants who were likely to perform well or poorly in college.
All of these claims—equity, uniformity, technical reliability, and prediction—resonating as they did with the meritocratic values of leading universities, help explain the SAT’s widespread acceptance and growth over the past century. Though both the test and the terminology describing what it is intended to measure have evolved over time—from “aptitude” to “generalized reasoning” and most recently “critical thinking”—the one constant has been the SAT’s claim to gauge students’ general analytic ability, as distinct from their mastery of specific subject matter.
By the turn of the 21st century, however, the increasing selectivity of college admissions and the intensifying debate over educational equity cast a harsh light on the SAT—and indeed on the very notion that ability should trump achievement in assessing readiness for college. And on closer scrutiny, many of the claimed advantages of the SAT over traditional measures of academic achievement have been found to be illusory:
• The SAT is a relatively poor predictor of student performance. Admissions criteria that tap mastery of curriculum content, such as high-school grades and achievement tests, are more valid indicators of how students are likely to perform in college.
• As an admissions criterion, the SAT has a more adverse impact on poor and minority applicants than do high-school grades, class rank, and other measures of academic achievement. Admissions criteria that emphasize demonstrated achievement over potential ability are better aligned with the needs of disadvantaged students and schools.
The University of California and the SAT
These are the conclusions of a series of research studies begun over ten years ago at the University of California (UC). After Californians voted to end affirmative action in 1996, the UC system undertook a sweeping review of its admissions policies in an effort to reverse plummeting Latino and African-American enrollments. The aim was not to find a proxy for race (none exists) but to find admissions criteria that were valid indicators of later performance in college, yet had less adverse impact on low-income and minority applicants. The results were surprising. In studies of almost 125,000 students entering UC between 1996 and 2001, my colleagues and I found that high-school grades in college-preparatory courses had the least adverse effect on the admission of poor and minority applicants.
High-school grades were also the strongest predictor of college success (Geiser with Studley, 2003; Geiser and Santelices, 2007). Irrespective of the quality or type of school attended, high-school GPA proved the best predictor not only of freshman grades in college, as many studies have shown, but also of long-term college outcomes such as cumulative grade-point average and four-year graduation (Geiser and Santelices, 2007). The predictive superiority of high-school grades was consistently evident across all entering classes, academic disciplines, and campuses in the UC system.
Confident in the value of grades as a selection criterion, UC introduced a new policy in 2001 extending eligibility for admission to the top four percent of graduates from each California high school—based on their high-school GPA.
Grades are sometimes viewed as a less reliable indicator than the SAT because schools differ in grading practices. But SAT scores are based on a single sitting of three to four hours, whereas high-school GPA is based on repeated sampling of student performance over several years. And college-prep classes present many of the same kinds of academic challenges that students face in college—term papers, quizzes, labs, and final exams—so it makes sense that prior performance in such activities would be indicative of later performance.
Advanced Placement (AP) courses, on the other hand, were poor predictors of student success in college, while hurting the chances of admission for applicants from schools with limited AP offerings. A growing number of students now enroll in AP because they earn “bonus points” simply for attending class, without taking or passing the AP exams. This boosts GPAs and improves admissions profiles. But while AP exam scores were a good indicator of students’ later performance at UC, mere enrollment in AP classes bore no relationship to that performance (Geiser and Santelices, 2006). These findings call into question the widespread practice of inflating students’ grades for AP classes.
Most surprising was the relatively poor predictive power of the SAT compared to achievement tests, such as the SAT II subject tests or AP exams, which measure mastery of specific subjects like biology or U.S. history. The SAT’s claim to assess general reasoning abilities, independent of curriculum content, was long thought to give it an advantage in predicting how students will perform in college.
UC has required applicants to take both the SAT I and the SAT II achievement tests since 1968 and so had an extensive database to evaluate that claim. (UC is also a good laboratory for admissions research because it includes a mix of highly selective and less selective campuses.) Our data showed that achievement tests were consistently superior to the SAT in predicting college outcomes, including those for poor and minority students (Geiser with Studley, 2003). After taking account of students’ grades and achievement-test scores, SAT scores added almost nothing to the prediction.
The SAT II writing exam was a relatively strong predictor of success in college, a testament to the importance of writing in almost all college majors. But other subject tests, such as the AP exams, proved to be even better (Geiser and Santelices, 2006). As with grades in college-preparatory courses, it makes sense that tested mastery of foundational subjects such as math, science, and history would be predictive of college performance. These findings helped then-UC president Richard Atkinson persuade the College Board to revise the SAT by adding a writing sample, dropping verbal analogies, and phasing in more advanced math, although it is an open question whether the “New SAT” introduced in 2005 has moved far enough in the direction of an achievement test (Atkinson, 2001).
In sum, the UC findings challenge the widespread belief in the SAT’s capacity to measure student ability and thus predict success in college. While students do differ in their abilities, it is questionable whether a three- or four-hour test can measure such differences with sufficient precision to predict college performance reliably. Traditional measures of academic achievement such as high-school grades and curriculum-based tests are more valid indicators of how students are likely to perform in college.
Achievement Criteria and Minority Admissions
The UC findings also challenge the long-held belief in the SAT’s capacity to identify high-ability students from disadvantaged backgrounds and promote greater equity in college admissions. This belief is rooted in the progressive narrative of American higher education and has proven remarkably enduring.
Yet our data showed that the SAT had a more adverse impact on poor and minority applicants than traditional measures of academic achievement. SAT scores were much more closely correlated than high-school GPAs with students’ socioeconomic characteristics. As a result, the SAT lowered the chances of admission for underrepresented minority applicants, who come disproportionately from disadvantaged backgrounds. When UC applicants were rank-ordered by SAT scores, roughly half as many Latino, African-American, and American-Indian students appeared in the top third of the applicant pool as when the same students were ranked by high-school grades (Geiser and Santelices, 2007).
Nor was the SAT useful for identifying high-ability students from disadvantaged backgrounds, since traditional measures of academic achievement proved to be more effective for that purpose as well. High-school grades and subject tests were the strongest predictors of success at UC even for students from the most disadvantaged schools (Geiser with Studley, 2003). Looking at students with “discrepant scores”—those who scored well on the SAT but poorly on the SAT II subject tests—we found that members of this group came disproportionately from families with higher incomes, performed less well at UC, and were more likely to be white (Geiser and Studley, 2002).
Such differences in the demographic footprint of the SAT and traditional measures of academic achievement are especially problematic where, as in California, affirmative action can no longer be used to offset the SAT’s disparate impact. Given that impact—and given also the SAT’s limited predictive power—it was a straightforward decision to de-emphasize SAT scores in favor of high-school grades and achievement tests as requirements for admission to the UC system, a decision UC made shortly after affirmative action was phased out in 1998.
Achievement Criteria and University Outreach
The advantages of achievement indicators and the limitations of the SAT were evident not only in university admissions but also in UC’s outreach programs to California public schools. After affirmative action was dismantled, UC recognized early on that simply revising its admissions criteria would not be enough and that the university would need to provide significant assistance to the state’s deteriorating K-12 school system in order to restore minority enrollments at UC over the long term. To that end, UC massively expanded its outreach programs to disadvantaged students and schools. These programs included not only conventional one-on-one tutoring and mentoring but also “whole-school” improvement efforts involving teachers, principals, and curriculum development. At their height, before state budget cuts, UC outreach programs were serving over 300,000 students and 70,000 teachers and principals, and UC campuses had established school-university partnerships with 300 of the lowest-performing schools in the state.
College admissions criteria can have a profound influence—a “signaling effect,” as Michael Kirst has called it—on such schools. After UC introduced its policy extending eligibility for admission to the top four percent of graduates from each high school, many California schools in poorer districts found themselves pressured by parents to expand their offerings of UC-required courses so that their children could qualify for the program.
UC’s experience in low-performing schools showed that curriculum-based achievement tests have significant advantages over the SAT in facilitating educational improvement and school reform:
• Achievement tests help reinforce a more rigorous academic curriculum. Unlike the SAT, achievement tests assess students on materials that they have studied in the classroom, so that teaching, learning, and assessment are more closely aligned. The reinforcement that achievement tests provide for the college-preparatory curriculum may be even more important in low-performing schools than in others. Experience in implementing state curriculum standards (as distinct from No Child Left Behind) suggests that a strategy of setting clear content standards, teaching to the standards, and assessing students against those standards may produce the greatest benefits within the most disadvantaged schools.
• Achievement tests serve an important diagnostic function. Unlike SAT scores, which tell students only how well they have performed relative to others, achievement-test scores provide feedback on the specific areas of the curriculum where students are strongest and weakest. Achievement tests provide a better foundation for self-assessment, for both students and their teachers and schools.
• Most important is the message that achievement tests convey to students. A low SAT score sends the message that their performance reflects a lack of ability rather than factors such as unequal access to good schools and well-trained teachers. Especially for poor and minority students, SAT scores can be damaging to self-esteem and academic aspiration.
Achievement tests send a much different message. A low score on an achievement test means simply that the student has not mastered the specified content. This may be due to any number of factors, including inadequate instructional resources and inferior teaching—or lack of hard work on the part of the student. Achievement tests focus attention on determinants of performance that are alterable, at least in principle, and are thus better suited to stimulate educational improvement and reform.
Persuaded by the advantages that achievement tests offer both for college admissions and for K-12 schools, in 2002 the UC faculty crafted what may be the first comprehensive policy adopted by any major U.S. university on the selection and use of admissions tests. The policy concluded that “achievement-oriented tests are both useful to the University in identifying high-achieving students and philosophically preferable to tests that purport to measure aptitude” (University of California, 2002).
Restoring the Primacy of Academic Achievement
The UC studies show that cumulative record in high school is both more valid than the SAT as a predictor of college outcomes and more equitable as an admissions criterion. Fittingly, we speak of those who perform well on this criterion as having “earned” high grades or class rank. Although raw intellectual ability is important, other student qualities such as motivation, personal discipline, and perseverance are also key to achieving and maintaining a strong academic record over the four years of high school. High-school record and related indicators of academic achievement remind us of an older and alternative meaning of “merit,” although one that remains vital for college admissions today: “to earn or deserve; to be entitled to reward or honor.”
Determination of high-school GPA or class rank should be straightforward, without statistical weighting for enrollment in Advanced Placement classes. Not only is the latter practice unfair to students with limited access to AP offerings, but it is also invalid, according to the UC data, since mere enrollment in AP classes is unrelated to later performance in college. “Bonus points” for AP classes are justified only where students demonstrate actual mastery of the subject matter by passing the relevant AP exams. The key, again, is demonstrated achievement.
Limitations of the SAT and ACT as Achievement Tests
Proponents of the SAT often argue that even if high-school grades are the primary basis for admission to college, reasoning tests provide additional information about applicants that can help admissions officers make better decisions. As a supplement to the high-school record, they argue, SAT scores improve predictions of college performance by a statistically significant increment. SAT scores are also useful as a check on grade inflation, they claim, helping to restrain the pressure on grading practices that would result if admissions officers relied on the high-school record alone.
Yet curriculum-based achievement tests are a better supplement to the high-school record than the SAT. Subject-specific tests such as the AP exams, SAT II achievement tests, and the New SAT writing test (formerly the SAT II writing exam) add more incremental prediction, beyond what is provided by high-school record alone, than the verbal and mathematical reasoning components of the SAT. And unlike the SAT, subject tests can function as end-of-course exams, since they are more closely aligned with what is taught in school, thus helping to reinforce a rigorous college-preparatory curriculum.
Can the New SAT be considered an achievement test? Many of the changes in the test, such as the incorporation of the SAT II writing exam and the addition of higher-level math, are evidently intended to move the SAT in that direction. But the SAT’s provenance as a reasoning test remains evident as well. Most of the New SAT’s content is less closely linked to what students actually study in school than the ACT, whose content is derived directly from surveys of high-school curricula. It would be difficult to imagine using the SAT as a high-school exit exam in the way that the ACT is used in some states.
Nor does the New SAT exhibit other characteristics of an achievement test. It remains “norm referenced,” designed primarily to compare students against one another, rather than “criterion referenced,” intended to measure students’ mastery of curriculum content. It has little diagnostic value in providing feedback to students on specific areas of strength and weakness. And in the first nationwide study of the new test, College Board researchers found that while the writing exam, as expected, was the most predictive of the three individual SAT sections, overall the New SAT was no better at predicting college outcomes than the old SAT (Kobrin, et al., 2008).
The New SAT, in short, is a test at war with itself. Although it has added elements associated with achievement testing, the College Board has been at pains to demonstrate the psychometric continuity between the old and new versions of the test. The New SAT’s heritage as a test of general reasoning ability still predominates, and it remains to be seen whether future iterations of the test will evolve more fully into a curriculum-based assessment.
The ACT, on the other hand, comes nearer to what one would expect of an achievement test. Test content is based on periodic national curriculum surveys as well as review of state standards for K-12 instruction. The test is divided into five subject areas (now including an optional writing test) corresponding to the college-preparatory curriculum. The ACT appears less coachable than the SAT, and the consensus among test-prep providers is that the ACT places less of a premium on test-taking skills and more on content mastery. The ACT also has a useful diagnostic component to assist students as early as the eighth grade to get and stay on track for college.
Like the SAT, however, the ACT remains a norm-referenced test and is used by colleges and universities primarily to compare students against one another rather than to assess curriculum mastery. The ACT is scored in a manner that reproduces almost the same bell-curve distribution as the SAT.
The ACT is also hampered in its aspiration to serve as the nation’s achievement test by the lack of national curriculum standards in the U.S. The ACT has tried to overcome this problem through its curriculum surveys, but the “average” curriculum does not necessarily reflect what students are expected to learn in any given state, district, or school. Nor does the ACT cover each of its subject areas in the same depth as do the AP exams or SAT II subject tests. A single national achievement test may be impossible in the absence of a national curriculum.
An Expanded Role for Subject Tests
Of all nationally administered tests, subject-specific assessments such as the SAT II and AP exams are the best available exemplars of achievement tests. The SAT IIs, now officially renamed the SAT Subject Tests, are offered in 18 subject areas and the AP exams in 33. The SAT Subject Tests are hour-long, multiple-choice exams, while the AP exams take two to three hours and include a combination of multiple-choice, free-answer, and essay questions. Test-prep services such as the Princeton Review advise students that the most effective way to prepare for subject exams is through coursework, and in a telling departure from its usual services, the Review offers content-intensive courses in mathematics, biology, chemistry, physics, and U.S. history to help students prepare for these tests.
The SAT Subject Tests and AP exams do have limitations. Scoring on both is norm referenced, despite the fact that colleges often treat them as proficiency tests (especially the AP exams, which are used for college placement as well as admissions). And the AP program has come under fire from some educators who charge that, by “teaching to the test,” AP classes too often restrict the high-school curriculum and prevent students from exploring the material in depth. A number of elite, college-prep academies have dropped AP for this reason.
Nevertheless, subject tests proved most effective of all nationally available tests in predicting student performance at UC. The AP exams, in particular, were remarkably strong indicators, second only to high-school GPA in predicting college outcomes. Taken together, subject tests performed significantly better than either the SAT reasoning tests or the ACT.
Subject tests also have advantages for students. They provide them with an opportunity to demonstrate knowledge of subjects in which they excel and can assist them in gaining admission to particular college majors. UC has long allowed students to choose one of the subject tests they take for admission, and the elective test had the lowest correlation of any exam with students’ socioeconomic backgrounds while remaining a relatively strong indicator of their performance at UC (Geiser with Studley, 2003).
That students should be able to choose the tests they take for admission may seem anomalous to those accustomed to viewing the SAT or ACT as national “yardsticks” for assessing student readiness for college. But the real anomaly may be the idea that all students should take one test, or that one test is suitable for all students. In the final analysis, admissions tests must be judged on results. If readiness for college is operationally defined by pre-admissions measures that are most directly related to performance in college, then a selection of subject tests—including some selected by students—is superior to either of the generic national assessments.
Back to the Basics
Emphasis on curriculum mastery can help restore a measure of rationality to the overheated world of “high-stakes” college admissions. Norm-referenced admissions tests, such as the SAT, which are designed primarily to compare students against one another, only add fuel to the fire. Such tests also do a disservice to poor and minority applicants. Even where these students achieve real gains in academic preparation, as measured on standards-based assessments, they may fail to improve their relative standing on admissions tests, since the bar keeps rising and the competition for college pushes test scores for all applicants ever higher. Small differences in test scores too often lead to denial of admission for students from less-privileged circumstances, when in fact such differences have little, if any, validity as indicators of how they will perform once in college.
College admission may never be a perfectly fair and rational process, but it can be fairer and more rational than it is today if we judge students on what really matters—demonstrated achievement, as reflected in the high-school record and performance on subject tests. Our first consideration should not be how an applicant compares with others but whether he or she demonstrates sufficient mastery of college-preparatory subjects to benefit from and succeed in college. When we apply that standard, as admissions officers well know, we will find that we have many more qualified candidates than places available, and our candidate pool will be more diverse.
Then begins the true work of admissions in applying institutional selection criteria—special talents and abilities, leadership and community service, opportunity to learn, socioeconomic disadvantage, and, where permissible, race—to build an entering class that reflects our institutional values and commitments.
A Note on Prediction Methods
The UC studies show the SAT to be a relatively poor predictor of student performance in college, yet the College Board claims the test is a “powerful” indicator. Why the difference?
Much of the difference results from differing methods. Researchers affiliated with the College Board typically employ simple bivariate correlation methods in their prediction studies. That is, they examine the correlation between SAT scores and an outcome variable such as freshman grades, and the size of the correlation is taken to represent the predictive power of the SAT. At most, College Board researchers report multiple correlations involving only two or three variables, as when they examine the joint effect of SAT scores and high-school grades in predicting college outcomes (see, for example, Kobrin, et al., 2008).
But correlations of this kind are misleading, since they mask the contribution of socioeconomic and other factors to the prediction. Family income and parents’ education, for example, are highly correlated both with SAT scores and with performance in college, so that much of the apparent predictive power of the SAT actually reflects the proxy effects of socioeconomic status. Princeton economist Jesse Rothstein conservatively estimates that simple correlation studies that omit socioeconomic factors overstate the predictive power of the SAT by 150 percent (Rothstein, 2004).
The UC studies, in contrast, used multiple regressions involving many additional admissions factors—family income, parents’ education, high school quality, subject-test scores, and many others—in order to isolate the predictive power of SAT scores when other factors were taken into account. This method generates “standardized coefficients,” or predictive weights, that permit direct comparison of the relative contribution of individual admissions factors in predicting college outcomes, other factors being equal.
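The contrast between the two methods can be illustrated with a small sketch. It uses made-up data, not the UC data or the UC models: a hypothetical socioeconomic index (`ses`) drives both a simulated test score (`sat`) and a simulated college GPA (`gpa`), and the test score itself is assumed to have only a small direct effect. All variable names, effect sizes, and the sample size are assumptions chosen for illustration. The simple correlation credits the test with the socioeconomic effect it stands in for, while the standardized coefficient from a two-predictor regression does not.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardized_betas(y, x1, x2):
    """Standardized coefficients for a regression of y on x1 and x2.

    Uses the closed-form solution for exactly two predictors,
    expressed in terms of the three pairwise correlations.
    """
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    denom = 1.0 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom


# Synthetic data only; the effect sizes below are illustrative assumptions.
random.seed(0)
n_students = 5000
ses = [random.gauss(0, 1) for _ in range(n_students)]
# Simulated test score: partly a reflection of socioeconomic status.
sat = [0.6 * s + random.gauss(0, 0.8) for s in ses]
# Simulated college GPA: mostly socioeconomic status, small direct test effect.
gpa = [0.5 * s + 0.1 * t + random.gauss(0, 0.8) for s, t in zip(ses, sat)]

# Bivariate method: correlate the test score with the outcome.
r_sat_gpa = pearson(sat, gpa)

# Regression method: weight of the test score with SES held constant.
beta_sat, beta_ses = standardized_betas(gpa, sat, ses)

print(f"simple correlation, test score vs. GPA: {r_sat_gpa:.2f}")
print(f"standardized weight, test score:        {beta_sat:.2f}")
print(f"standardized weight, SES index:         {beta_ses:.2f}")
```

In this toy setup the simple correlation is several times larger than the test score's standardized weight, because most of the correlation is inherited from the socioeconomic index. The actual UC regressions included many more factors, which requires a general least-squares routine rather than the two-predictor formula used here.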
Saul Geiser is a research associate at the Center for Studies in Higher Education at the University of California, Berkeley and formerly director of admissions research for the UC system. His work has shaped numerous admissions initiatives undertaken by UC after Californians voted to end affirmative action in 1996. His research on the predictive validity of achievement tests was influential in persuading the College Board to revise the SAT in 2005.