It is April 1, 1972. I am awaiting the letters that will tell me into which colleges, if any, I have been admitted. I want to go to a university in or near the Boston area, to be close to my girlfriend. The letters come. The envelope from my preferred institution is small. Uh oh. I open it. I'm waitlisted. I wait. I am finally admitted.
My second year as an undergraduate, I start working part-time in the admissions office that waitlisted me, and after graduation I take a position as special assistant to the dean of admissions. Another admissions officer whispers to me that, if I wanted to, I could look up my admissions record. One day, unable to resist temptation, I hang around the office after others have left and look up my file. I'm too ashamed to read the whole thing. I don't have to. I look at the report of the on-campus interviewer. He reports that I have “a flakey personality.” I didn't much like him either.
I eventually leave the job to go to graduate school to study psychology. Although I maintain my interest in admissions, I am now out of the field. But I don't forget my waiting-list file—or that I was lucky to get into college at all. As a child, I had terrible test anxiety and did very poorly on standardized tests. So I've now got two problems to study—admissions and standardized tests.
Thirty years after the waiting-list ordeal, I was rather well-known in the field of human abilities. I was teaching at Yale, where I had an endowed professorship. I was convinced that the way college admission was being done left many very bright and able students with the short end of the stick. Why? Because standardized admissions tests such as the SAT and the ACT measure only a narrow segment of the skills needed to become an active citizen and possibly a leader who makes a positive, meaningful, and enduring difference to the world. And I'm convinced that the purpose of college education is to produce the next generation of such citizens and leaders.
The problem with standardized admissions tests is that they promised, on what have proven to be shaky premises, a new social order, but they have ended up perpetuating the old one. Prior to their development, college admission, at least to schools of high prestige, was determined largely by socioeconomic status (SES). Standardized tests were designed to replace this fairly rigid social-class system with a meritocratic one. The founders of the testing movement, such as James Conant at Harvard and Henry Chauncey at the Educational Testing Service, had the best of intentions. But there was a fact they could not yet know: Scores on the standardized tests they promoted would end up correlating highly with SES.
In retrospect, this is no great mystery: Children who are given opportunities for more and better education tend to do better on the standardized tests that measure the learning that such education produces. Tests such as the SAT were originally intended to be “aptitude” tests (indeed, SAT originally stood for “Scholastic Aptitude Test”), but they largely measure students' knowledge base and skills in analyzing that knowledge—as do so-called “intelligence” tests, with which exams such as the SAT and the ACT are highly correlated. And students' scores reflect their opportunities to achieve that knowledge and those analytic skills, opportunities that vary radically in homes and at schools across the United States (and elsewhere).
So society has ended up with the appearance of a meritocracy rather than with the real thing. If it does not provide equality of opportunity—and it never fully has—the results of this kind of testing will not be equitable.
Moreover, success in college and in the world at large demands skills beyond knowledge and analytical reasoning. I have proposed a “theory of successful intelligence,” which takes account of this fact.
“Successful intelligence” is the setting and attainment of personally meaningful goals. It involves creative skills in coming up with new ideas, analytical skills in determining whether the ideas are good ones, and practical skills in implementing those ideas and convincing others of their value. Successfully intelligent people capitalize on their strengths with regard to these skills and correct or compensate for their weaknesses.
Standardized college admissions tests assess only analytical skills, as well as the knowledge base on which they act, and completely ignore creative and practical skills. Yet who can keep pace in a rapidly changing world without the ability to adapt to change creatively and flexibly, and who can adapt without the ability to translate ideas into action?
The Rainbow Project
As described in my book, College Admissions for the 21st Century (Harvard University Press, 2010), in 2001, I, along with others dubbed the Rainbow Project Collaborators, initiated a project called Rainbow to test a new approach to admissions. Although the project was planned at Yale, it involved roughly 1,000 high school seniors and freshmen at colleges ranging from totally non-selective (community colleges) to somewhat selective to highly selective. There was good geographic, gender, and ethnic balance. We collaborated with faculty and admissions officers throughout the country in trying out this new approach.
We measured analytical skills through the kinds of problems found on typical tests of “intelligence,” using both multiple-choice and performance-based items. I will not describe the multiple-choice problems for reasons that will be made clear below.
For the creative tests, we had, in addition to the multiple-choice items, performance items of three kinds. In one, participants were asked to write creative short stories using two among about a dozen choices of titles that we gave them, such as “The Octopus's Sneakers,” “3421,” or “The Beginning of Time.” In another kind of creative test, we showed several pictorial collages, and participants told a story about one of them by speaking into a microphone attached to a computer. In a third kind, we presented cartoons that participants had to caption, like those found in The New Yorker.
For practical skills, we also had three kinds of items beyond the multiple-choice ones. In one, participants were presented with a scenario similar to one they might have encountered or would encounter in college—for example, working hard in a course to achieve an A. They were asked to evaluate the effectiveness of various options in helping them achieve the A.
In a second kind of item, participants were presented with work-related scenarios, such as working on a project with a colleague whom they really disliked. They were asked to evaluate various options for dealing with the unpleasant colleague.
In the third kind, participants were presented with a short movie clip presenting a situation that they might have encountered or would encounter in college. For example, one clip had a student walk into the office of a professor to ask for a letter of recommendation; as the professor looked up from her desk with a blank stare, the student realized that she did not know who he was.
Four major results emerged out of the study.
First, it was possible to separate out, statistically, creative and practical skills based on the performance tests. The theory of successful intelligence predicts this—contrary to traditional theories of intelligence, which argue that there is just one general overriding intellectual ability, typically called g (general intelligence, similar in conception to IQ). But we also found something unexpected: All of the multiple-choice test questions, regardless of whether they were supposed to measure analytical, creative, or practical skills, loaded on a single factor of underlying general mental skill. That is, all multiple-choice items measured g, or general intelligence.
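The idea behind "loading on a single factor" can be illustrated with a small simulation. The data below are entirely synthetic and hypothetical, not the Rainbow results: when several test scores all draw on one shared latent ability, the first factor of their correlation matrix absorbs most of the variance.

```python
import numpy as np

# Hypothetical illustration (synthetic data, not the Rainbow study): simulate
# scores on three multiple-choice tests that all tap a single latent ability g.
rng = np.random.default_rng(0)
n = 1000
g = rng.normal(size=n)                      # latent general ability
noise = rng.normal(scale=0.5, size=(n, 3))  # test-specific noise
scores = g[:, None] + noise                 # each test score = g + noise

# If one factor underlies all three tests, the largest eigenvalue of the
# correlation matrix should account for most of the total variance.
corr = np.corrcoef(scores, rowvar=False)
eigvals = np.linalg.eigvalsh(corr)[::-1]    # eigenvalues, largest first
share = eigvals[0] / eigvals.sum()
print(f"variance explained by the first factor: {share:.2f}")
```

With these assumed noise levels, the first factor accounts for the large majority of the variance, which is the pattern we observed across all the multiple-choice items.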
Second, our Rainbow assessment doubled the accuracy with which freshman GPA could be predicted, relative to the SAT alone. Even when we added high school GPA to the equation, we still improved the prediction of first-year college academic success by 50 percent.
Third, our tests reduced ethnic-group differences by a substantial amount—those differences were considerably less than they are on the SAT. Thus we increased prediction at the same time that we decreased differences due to ethnicity. This is not a common result.
Finally, students reported that they enjoyed taking the assessments much more than they did more conventional standardized tests. They felt that our assessments enabled them to show themselves more as they were and in broader terms than conventional testing did.
Our article reporting on these results was published as the lead piece in a major journal in the field, and the results were covered fairly widely by the popular press. We were very proud of our results and hopeful that the testing company that had been funding our research would now do so at a higher level.
That didn't happen. Instead, they terminated the funding, arguing that the kinds of assessments we were using could not be “upscaled” and hence were useless to them. We were grateful for the funding we had had, but we had expected more to come.
The Advanced Placement Project
This was not the only project we were doing for the testing company. We also had done two others, the results of which were published in high-quality refereed journals, examining the Advanced Placement (AP) examinations in psychology, statistics, and physics. Based on our Rainbow results, we predicted that if we inserted items assessing creative and practical thinking into the AP tests in these areas, we would increase the “construct validity” of the tests—the extent to which they measured what they are supposed to measure—while reducing ethnic-group differences. After all, the AP tests are, in a sense, simply more difficult and knowledge-steeped versions of the SAT.
And this is exactly what we found. We were able to increase construct validity and reduce ethnic-group differences. When we got these promising results, that funding ended as well.
In 2005, I was at a crucial juncture in my career. I had been a professor at Yale for 30 years and viewed the Rainbow Project as, in a sense, the culmination of my work there. But I was left without funding for what had come to be my most important research project (among many). And I had become convinced that as an administrator I might be able to get something done rather than just exhorting others to do things. So I decided upon a change of career and left Yale.
The Kaleidoscope Project
In mid-2005, I arrived at Tufts University as dean of the School of Arts and Sciences. I spent the first year promoting the idea of an upscaled version of the Rainbow Project, to be called Kaleidoscope. The idea was to measure the creative, analytical, and practical skills that Rainbow had assessed. But my theory of successful intelligence had been augmented; it now included wisdom as well. One needs not only creative skills to generate new ideas, analytical ones to assess them, and practical skills to make them work, but also the wisdom to ensure that they help achieve a common good, over the long as well as the short term, through the infusion of positive ethical values.
Tufts had (and still has) an outstanding dean of undergraduate admissions, Lee Coffin, who was not only interested in the project but became a full collaborator in it. We now knew that multiple-choice questions were a dead end in terms of measuring broader skills. So together with his admissions staff, Coffin created essay questions to measure creative, analytical, practical, and wisdom-based skills.
In reality, no question is a pure measure of a single skill. But a prompt designed to assess analytic skills might ask applicants to describe their favorite book and to say why it was their favorite, whereas a creative question might ask what things would be like today if some event in history had turned out differently—for instance, if the Nazis had won World War II. Or it might ask them to write a short story with a title such as “The End of MTV” or “Confessions of a Middle-School Bully” or ask them to draw something. A practical question might ask applicants how they had convinced someone of something that the person had not initially believed. And a wisdom-based question might ask applicants how some passion they had in high school might be turned toward a common good. (The complete set of questions used in the first five years can be found in the appendix of College Admissions for the 21st Century, mentioned above.)
We created rubrics to rate the answers. We would evaluate creative answers for their novelty, quality, and appropriateness to the task and analytical answers based on their analytical strength, organization, logic, and balance. Practical answers would be scored for their feasibility with respect to time, place, and human and material resources, as well as their persuasiveness. And we would assess wisdom-based answers based on the extent to which they reflected a desire to reach a common good by balancing the student's own interests with larger ones.
But scoring responses to essay questions requires additional resources on the part of an admissions office. How would we acquire those resources? In fact, as a dean I found nothing easier to raise money for than the Kaleidoscope Project. Many of Tufts' most successful alumni were people who had not excelled on standardized tests but who had nevertheless achieved great things in their lives, and some of them were happy to help Kaleidoscope become a reality.
Kaleidoscope differed from Rainbow in one fundamental respect. Rainbow was a study whose results were not actually used in making admissions decisions, whereas Kaleidoscope was done as action research: The results were used in making those decisions. Roughly two-thirds of the 15,000 to 16,000 applicants per year responded to the Kaleidoscope questions, so we built up a very substantial database during the five years I was dean of arts and sciences at Tufts.
We made four major decisions early on. The first was that participation in Kaleidoscope would be voluntary. At that point, we did not want to risk putting such questions on the application and then finding that our number of applications had plummeted—a result that would have been undesirable both for the university and for my longevity as dean.
Second, we decided that students would be asked to write just one essay. We did not want to burden applicants unduly, given all the work they already had to do for college admissions.
Third, we decided that we would score not just the essays but the whole application for creative, analytical, practical, and wisdom-based skills. There were many ways applicants could demonstrate those skills. For example, creative skills might be revealed in experiences such as writing poetry, composing music, doing a major science project, or inventing something. Practical skills might be evident if the applicant had started a successful business, written for or edited a student newspaper, or engaged in high-level debating. Raters would score the application with a quality grade or simply indicate that there was insufficient information to rate the application for a particular dimension.
Fourth, we decided that Kaleidoscope scores would be used only in a positive way. That is, we would not penalize applicants for an uncreative or impractical essay; we would view as additionally meritorious only those applicants who persuaded us of their unusual creative, analytical, practical, and/or wisdom-based skills.
In general, the assessment tended to be most useful for applicants in the middle of the distribution of those judged to be acceptable. Applicants who stood out in terms of their traditional credentials tended to be admitted in any case. Applicants who clearly could not do the academic work that Tufts demanded were usually rejected, regardless of their Kaleidoscope ratings. But there were many applicants in the middle who were neither clear admits nor clear rejects on the basis of traditional credentials, and it was for those applicants that Kaleidoscope was most helpful.
Six major results emerged from the Kaleidoscope Project.
First, as was the case for Rainbow, Kaleidoscope predicted academic success at Tufts beyond the prediction obtained from the SAT and high school GPA. The gain was smaller than with Rainbow (which involved a wider range of student abilities and thus produced stronger results), but it was significant.
Second, Kaleidoscope predicted meaningful participation in extracurricular and leadership activities. In other words, its predictive power was not limited to the academic.
Third, scores on Kaleidoscope showed no ethnic-group differences, a result stronger than we had obtained with Rainbow. However, the integrity of this result is compromised by the fact that whereas Rainbow scorers had not known the ethnicity of the participants whose information they were scoring, Kaleidoscope scorers did. It is impossible to say whether this knowledge biased their responses.
Fourth, during the years that Kaleidoscope was used, mean SAT scores and GPAs increased every year. Obviously, we cannot say that Kaleidoscope caused these increases; many other factors may have been responsible as well. But some people had the misconception that scores on a broader admissions assessment would be negatively related to scores on traditional tests. This was not true. Rather, scores on the two kinds of assessments were positively related, although very weakly.
Fifth, during those years, applications from, and the admission of, underrepresented minority-group members tended to increase. Again, many factors may have been responsible, but it was good to know that, for whatever reason, the figures were going in the direction we wished them to go.
Finally, applicants and their parents and school counselors liked the expanded application. They felt it conveyed information about the applicant that could not be transmitted through the Common Application or other conventional admissions assessments.
In an ideal world, consideration of augmented skills would not be just a matter for the admissions office. If a college admitted, say, students who tended to be creative or practical thinkers, then, once they arrived, its teachers would teach and assess those students in ways that matched their patterns of skills.
At Tufts, we started the Center for the Enhancement of Learning and Teaching (or CELT, an appropriate name for a unit in a school located where the Boston Celtics have a large following), initially directed by Linda Jarvin. The purpose of the Center was to teach faculty how to teach to students with diverse learning and thinking styles.
In my research over the years, I had found that students learn in different ways; teaching should reflect that fact. Of course, life is not arranged for tasks to be presented to us in ways that always allow us to work from our strengths. So we need to teach in all ways to all students, so that at any given time, some students are capitalizing on strengths and others are correcting or compensating for weaknesses.
My own experiences as a student reinforced my belief that it is extremely important to teach to the way students learn. As a freshman, I was determined to major in psychology because, as a child, I had done poorly on IQ tests. Unfortunately, I did poorly in my introductory-psychology course as well, pulling a grade of C. When I received my first test score, my professor commented that there was a famous Sternberg in psychology (he was referring to Saul Sternberg, no relation), and that it was obvious there would not be another one.
I was crestfallen and decided to major in math. Suffice it to say that I did worse in the introductory real-analysis course for math majors than I did in introductory psychology, and my professor suggested that I drop the course. I did, because at that point, a C was looking pretty good.
Thirty-five years later, when I was president of the American Psychological Association, I commented to the psychologist who had been president the year before on the irony that the president of the largest association of psychologists in the world had received a C in introductory psychology. He commented that he had received a C as well.
The truth is, in my whole career in psychology, I was never asked to memorize facts from a book or a series of lectures. How many multiple-choice tests have you taken lately? We need to teach in ways that elicit the skills that actually will be needed for success later on, not just for success in the artificial environments of many of our classrooms.
After five years at Tufts, I was looking for a new challenge, and one presented itself.
The Panorama Project
In 2010, I moved to Oklahoma State University as provost and senior vice president. I had long believed that the goal of college education is to produce the active citizens and positive leaders of tomorrow, and this notion was deeply embedded in the land-grant mission of Oklahoma State. Also, neither of my parents had graduated from college or even high school, and I felt especially attracted to my new institution because so many of the students, like myself, were first-generation college students.
In some respects, the challenge at Oklahoma State was even greater than at Tufts. Tufts had been using holistic assessment of admissions applications for many years. Oklahoma State, in contrast, had (and still has) four doors to admission: ACT score, high school GPA, a combination of ACT score and high school GPA, and a holistic assessment. The large majority of students were admitted through the first three doors.
But I was fortunate in that Oklahoma State had a wonderful vice president for enrollment management, Kyle Wray. Wray and I were totally on the same page. We decided that beginning in mid-2012, we would use Panorama for admission through the holistic door and for scholarship consideration (with the hope of expanding its use in future years).
Currently, the university uses a variant of Panorama for these routes to university admission and financial aid, as well as for admission to the Honors College. As in Kaleidoscope, the essays are optional. But students who wish to be considered for scholarships, regardless of the door to admission they choose, are well advised to answer the questions.
Applicants participating in Panorama are encouraged to do three essays from among the many options they are given. We are now piloting essays to ensure that we use only the best ones. We had learned at Tufts that we could not easily predict which essay questions would be effective just from reading them; the only way to know was to try them out.
At Oklahoma State, as at Tufts, we are following up on our admissions procedure with enhanced methods of instruction and assessment. The number one academic goal of our president, Burns Hargis, is to increase first-year retention, currently hovering at about 80 percent. So we are starting to teach in the varied ways in which students learn, such as analytically, creatively, or practically—or visually, orally, or kinesthetically.
Our new Learning and Student Success Opportunity (LASSO) Center (as aptly named as CELT, given that our athletic teams are referred to as the Cowboys and Cowgirls), directed by Cheryl DeVuyst, devotes substantial resources to helping each student optimize his or her learning potential. The Center is available to all students but focuses on those in their first year: We do not want any student to drop out for lack of preparedness to face the challenges of life and work at Oklahoma State. And our newly reorganized Institute for Teaching and Learning Excellence, headed by Christine Ormsbee, teaches faculty how to teach to students' diverse learning styles.
If medical tests today were the same as they were a century ago, our society would be in very bad shape. Imagine going to a doctor with an illness and watching her pull out leeches or a mercury compound. Medical testing has advanced greatly in the last hundred years. But educational testing has advanced little since the first intelligence test, created by Alfred Binet and Theodore Simon in 1905, or the first standardized college admissions tests of the early 1900s.
Certainly there have been cosmetic changes, but even those have been mostly in response to great pressure on the testing organizations, such as when Richard Atkinson, president of the University of California, threatened to pull the university system out of the College Board testing system unless changes were made in the test. A writing section was added, but unfortunately, the increased length added to the burden of the test-taker—both in time and money—without increasing the predictive validity of the test. Essentially, it has been a lot of fanfare with little benefit.
I have been hoping that testing organizations would change, but my prediction is that, like other businesses, they will do so only in response to market pressure. Not everyone can apply the kind of pressure that Richard Atkinson did, but why don't admissions offices (and others) demand change, especially when they know that the current tests are less than fully adequate?
There are several reasons, I believe.
First is what I call the pseudo-quantitative precision heuristic. People like the fact that tests give them numbers and make everything seem easily quantifiable, even if the numbers in themselves are not very adequate representations of a student's abilities and full range of relevant achievements. Psychologists in particular tend to be drawn to things that are easily countable rather than to things that count.
Second is what I refer to as the similarity heuristic. Psychologists know that people tend to be attracted to others like themselves. Most of the people making decisions about college admissions did reasonably well on standardized tests—otherwise, they would not have been admitted to college themselves. So they look for others like them.
Third is what I have named the pseudo-accountability heuristic. Using test scores gives the appearance that colleges and universities have impartially taken into account the skills of the applicants. If a student does not do well in college but did well on the standardized tests, at least the institution can blame the tests rather than itself.
Fourth is the publication heuristic. Magazines and other sources of ratings use standardized test scores to evaluate colleges and universities. So institutions, in trying to raise their profiles, become ever more enmeshed in the testing system instead of recognizing they should not accept it.
Fifth is what I refer to as the convenience heuristic. Students generally pay for the tests, not the colleges and universities—which end up being freeloaders off this system, getting information and not paying for it.
Sixth is plain superstition. People typically do not seek to disconfirm their presuppositions, in this case that the tests work. So they continue to believe in their usefulness without empirically verifying it. For some, testing is almost a religion: They have faith!
At Oklahoma State University we have relied heavily on standardized tests with no data to support this reliance. But when we did the statistics, we found that the increment in prediction of either first-year retention or six-year graduation over and beyond what we learn from high school record is trivial in practical terms. (In statistical terms, the increment in R-squared, or percentage of variance accounted for, was .0024 for first-year retention and .0037 for six-year graduation.) We are thus expanding our admissions with Panorama, as described above. We also have found that a risk factor for dropout after one year is high ACT score coupled with low high-school grades.
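The "increment in R-squared" we computed can be sketched in a few lines of code. The data below are synthetic and purely illustrative, not Oklahoma State's records: fit a regression on high school record alone, fit it again with the test score added, and subtract the two values of variance explained.

```python
import numpy as np

def r_squared(X, y):
    """Proportion of variance in y explained by an OLS fit on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])     # add intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # ordinary least squares
    resid = y - X1 @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

# Synthetic, hypothetical data: a test score correlated with high school GPA
# but contributing almost nothing unique to the outcome being predicted.
rng = np.random.default_rng(1)
n = 5000
hs_gpa = rng.normal(size=n)
act = 0.6 * hs_gpa + 0.8 * rng.normal(size=n)
outcome = 0.5 * hs_gpa + 0.02 * act + rng.normal(size=n)

r2_base = r_squared(hs_gpa[:, None], outcome)               # high school record alone
r2_full = r_squared(np.column_stack([hs_gpa, act]), outcome)  # record plus test score
print(f"increment in R-squared from adding the test score: {r2_full - r2_base:.4f}")
```

When the test score shares most of its predictive information with the high school record, as in this sketch, the increment comes out tiny, which is the pattern we found in our own data.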
The standardized college admissions tests originally seemed likely to serve a noble purpose—moving our society away from one in which privileges were doled out on the basis of parental wealth and social status and toward a system based on merit. When almost everyone taking the test was white, male, and upper class (or at least upper middle class), perhaps there was a certain logic to this: At that time, most of the variation in test scores would have been a result of differential academic skills (although the tests, even then, would not have tapped much into leadership skills).
But today, a much broader population of students takes the standardized tests, and much of the variation in their performance reflects differing levels of opportunity. This is not fair or good in a society that is increasingly polarized socioeconomically. But unless there is economic (or legal) pressure on the testing companies, they are unlikely to change when, financially, they are playing a winning game.
Projects like Rainbow, Kaleidoscope, and Panorama point to a new direction for assessment in college admissions. They recognize that colleges are not there just to cater to the socioeconomically privileged or to those who are the top scorers in the academic game. Moreover, successful donors like those who supported Kaleidoscope often did not do particularly well on standardized tests and may not have been at the top of the GPA distribution either. It is ironic that they are viewed as inferior in our current system of assessment, because they will remember how they were viewed when we ask them for money. If they are treated shabbily, we who raise money can expect only the same treatment in return.
In selecting those whom we will educate to be the future movers and shakers of our society—leaders at the family, community, state, national, and international levels—we need to determine whether as applicants they display the germs of leadership: the capacities to create a vision, analyze whether it is a good vision, implement that vision, persuade others of its value, and help achieve a common good. We need broader assessments not because they are new but because they will help us achieve the kind of society in which we all wish to live.
Acknowledgments: I would like to thank all my collaborators (especially the Rainbow Project Collaborators), Dean Lee Coffin of Tufts University, and Vice President Kyle Wray of Oklahoma State University, without whom none of this work would have been possible. The Rainbow Project was funded by the College Board, and the Advanced Placement Project by the College Board and the Educational Testing Service. Private donations to Tufts University made the Kaleidoscope Project possible, and Oklahoma State University provides the resources for the Panorama Project.