The Test Generation

On exam day in Sabina Trombetta's Colorado Springs first-grade art class, the 6-year-olds were shown a slide of Picasso's "Weeping Woman," a 1937 cubist portrait of the artist's lover, Dora Maar, with tears streaming down her face. It is painted in vibrant -- almost neon -- greens, bluish purples, and yellows. Explaining the painting, Picasso once said, "Women are suffering machines."

The test asked the first-graders to look at "Weeping Woman" and "write three colors Picasso used to show feeling or emotion." (Acceptable answers: blue, green, purple, and yellow.) Another question asked, "In each box below, draw three different shapes that Picasso used to show feeling or emotion." (Acceptable drawings: triangles, ovals, and rectangles.) A separate section of the exam asked students to write a full paragraph about a Matisse painting.

Trombetta, 38, a 10-year teaching veteran and winner of distinguished teaching awards from both her school district, Harrison District 2, and Pikes Peak County, would have rather been handing out glue sticks and finger paints. The kids would have preferred that, too. But the test wasn't really about them. It was about their teacher.

Trombetta and her students, 87 percent of whom come from poor families, are part of one of the most aggressive education-reform experiments in the country: a soon-to-be state-mandated attempt to evaluate all teachers -- even those in art, music, and physical education -- according to how much they "grow" student achievement. In order to assess Trombetta, the district will require her Chamberlin Elementary School first-graders to sit for seven pencil-and-paper tests in art this school year. To prepare them for those exams, Trombetta lectures her students on art elements such as color, line, and shape -- bullet points on Colorado's new fine-art curriculum standards.

All of this left Trombetta pretty frustrated, and on a November afternoon, she really wanted to talk. As she ate lunch (a frozen TV dinner) in her cheery, deserted classroom plastered with bright posters, she recounted the events of the past week. She liked the idea of exposing her young students, many of whom had never visited a museum, to great works of art. But, Trombetta complained, preparing the children for the exam meant teaching them reductive half-truths about art -- that dark colors signify sadness and bright colors happiness, for example. "To bombard these kids with words and concepts instead of the experience of art? I really struggle with that," she said. "It's kind of hard when they come to me and say, 'What are we going to make today?' and I have to say, 'Well, we're going to write about art.'"

Harrison District 2 spent about six months creating a test that turned out to be far too difficult for most first-graders, who are just learning to read full paragraphs, let alone write them. Yet the children's art-exam scores, along with results from classroom observations, will determine Trombetta's professional evaluation score and, consequently, her salary. If she "grows" her students' test scores over the course of the year, she could earn up to $90,000 -- more than double the average for a Colorado teacher. But if her students score poorly two years in a row, her salary could drop by as much as $20,000, and she could eventually lose tenure.

Like many Harrison teachers, Trombetta isn't sure whether she wants to continue working in the district, despite the possibility of significant salary gains. She loves her school and its principal, and she supports evaluating teachers based, at least in part, on their ability to advance student achievement. But she's torn on the value of test prep: "I want to maintain a sense of integrity and be faithful to my values about art."

In the social sciences, there is an oft-repeated aphorism called Campbell's Law, named after Donald Campbell, the psychologist who pioneered the study of human creativity: "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor." In short, incentives corrupt. Daniel Koretz, the Harvard education professor recognized as the country's leading expert on academic testing, writes in his book Measuring Up that Campbell's Law is especially applicable to education; there is a preponderance of evidence showing that high-stakes tests lead to a narrowed curriculum, score inflation, and even outright cheating among those tasked with scoring exams.

A number of state- and city-level studies from the No Child Left Behind era found that swiftly rising scores on high-stakes state tests were accompanied by appalling stagnation in students' actual knowledge as measured by the National Assessment of Educational Progress, the gold-standard exam administered to a sample set of students each year by the federal Department of Education. In 2005, for example, Alabama reported that 83 percent of its fourth-graders were proficient in reading, even though the NAEP found that only 22 percent of these children were proficient readers. The harsh punishments associated with NCLB had encouraged Alabama and most other states to dumb down their tests and then teach directly to them.

"The kind of motivation that results from pressure can get you certain kinds of test scores, but what happens is that the motivation and the learning don't persist over time," says Edward Deci, a social psychologist and expert on motivation. Deci has studied the effects of testing on teaching and learning since the early 1970s, and he is a firm opponent of tying teacher evaluation and pay to student test scores. "The kind of learning associated with pressure is rote learning, rather than conceptual learning," he says.

Despite these warnings from social science and the patent absurdity of first-graders writing critiques of Matisse, Harrison's culture of high-pressure testing could represent the wave of the future in Colorado -- and across the country. In May, the Colorado Legislature narrowly passed Senate Bill 191, or "The Great Teachers and Leaders Bill." Taking cues from the Obama administration's education-reform agenda, a narrow bipartisan majority voted to overhaul the way Colorado's teachers are evaluated and granted tenure. Beginning in 2013, 51 percent of every teacher's annual professional evaluation score must be based on student-achievement data. Once the law goes into full effect, any teacher in the state can lose tenure if he earns unsatisfactory performance evaluations two years in a row. If he then fails remediation efforts and loses his job, he won't be guaranteed a new one; after one year without a classroom assignment, he will be let go.

An expert panel appointed by Colorado's previous Democratic governor, Bill Ritter, is working to develop tools for assessing student growth and incorporating the data into teacher-evaluation scores. The panel could push for mandatory testing across every grade and subject area, as Harrison does. (The district's students even sit for pencil-and-paper tests in gym class.) It could also ask the state to measure student growth using a combination of test scores, portfolios of student work, and in-class presentations, as union advocates and the Obama administration would prefer. The federal Department of Education is "developing guidance for states, so they appreciate it doesn't have to be a paper-and-pencil test," says Carmel Martin, the assistant secretary for planning, evaluation, and policy development. "In things like music and physical education, there are other ways."

The downside of more holistic evaluation systems is their subjectivity compared to test results, especially when livelihoods and reputations are at stake. Creating and implementing more complex evaluation tools would also be more expensive than simply writing and grading additional tests.

As New York, Louisiana, and other states revamp their own teacher-evaluation systems to incorporate student-achievement data, they are paying attention to how Colorado implements SB 191. New York City Mayor Michael Bloomberg visited the state in October and praised the legislation in several speeches. Meanwhile, state Sen. Mike Johnston, the former principal and Teach for America teacher who is the driving force behind the bill, travels the country promoting his efforts to lawmakers and education philanthropists. In December, he was enthusiastically received by the New Jersey Legislature, which is considering a similar reform agenda; in February he spoke on the keynote panel at the Washington, D.C., conference celebrating Teach for America's 20th anniversary.

The Colorado Education Association, the state's largest teachers' union, campaigned hard against SB 191, but the fact that the state's small American Federation of Teachers affiliate endorsed the bill is a marker of just how radically the education-policy debate has shifted in recent years, with Democratic Party moderates from Barack Obama on down embracing the standards and accountability reform agenda once associated primarily with President George W. Bush. Indeed, Sabina Trombetta could be the archetypal Obama-era teacher, squeezed between her own belief in the importance of raising student achievement -- especially for disadvantaged kids -- and her desire for professional autonomy.

Harrison's focus on testing does seem to be improving its students' standardized test scores, though there is no data yet on whether the district's performance-pay system attracts or repels highly skilled teachers in the long term or helps increase high school-graduation and college-attendance rates for students. What's clear is that the system is resulting in more standardized testing, which most American parents heartily oppose. Harrison administrators say they hear almost no complaints from parents -- but it remains to be seen whether reforms like Harrison's would go over so well in cities like Aspen or Boulder, as more affluent communities have a history of protesting test-focused instruction.

All the high-stress test days -- approximately 25 per school year for the typical Harrison child -- might be worth it if more low-income kids graduate high school having been exposed to great art and music and the best instruction in science and civics. The testing might not be worth it if it causes frustrated veteran teachers to leave the district or even the profession, exposing children to a steady stream of inexperienced, often underprepared newbies. Colorado is about to find out.

"What was the urgency?" Trombetta asked of Harrison's new pencil-and-paper testing in art, music, and gym. "So we can be the first in the nation? At what cost?"

***

Harrison School District's groundbreaking teacher "effectiveness and results" plan, known as E&R, was launched in 2007 by Superintendent Michael Miles, a charismatic U.S. military and State Department veteran who traveled the world as a Sovietologist before returning to his home state of Colorado in 1995 and entering public education. After working as a social studies teacher, principal, and assistant superintendent, he ran in the 2004 Democratic Senate primary against Ken Salazar and lost. When Salazar ascended to President Obama's Cabinet in 2009, Miles again expressed interest in the Senate seat, which eventually went to another nationally recognized education reformer, Michael Bennet, a former Denver schools chief.

As superintendent in Harrison since 2006, Miles enjoys wider latitude over teachers than most district leaders. Harrison lacks collective bargaining, so Miles has been able to completely do away with the traditional teaching career ladder; within the district, teachers can only earn raises by demonstrating student-achievement growth, never from accruing additional degrees or spending more years on the job. In September 2011, E&R will expand to encompass school guidance counselors and social workers. Those evaluation systems are still under development but will likely measure professional "effectiveness" by tracing data on student attendance and behavioral referrals back to staff who counsel each child.

Michelle Frank, a counselor at Harrison's Wildflower Elementary School, welcomes the new system. "I know I play a huge role in attendance," she says. "I'm tracking kids who are absent, and sitting down with their parents to explain why it's so important to be at school." I ask Frank if she ever meets a student whose family situation is so dire -- abuse, homelessness, illness -- that she feels it would be unfair to base her evaluation and pay on how well that child functions at school. "No," she replies. "I grew up in this neighborhood, so I have a lot invested in it. It's been way too long that we've been sitting on tenure."

An August 2010 district poll found that about 55 percent of Harrison teachers support E&R; in conversation, the typical Harrison educator expresses mixed feelings about the system. During an after-school meeting last November between Miles and teacher representatives from each school in the district, one young woman leaned over and whispered to me, "There's a lot of unhappiness. A lot of emotions are being ignored." Later in the meeting, a middle-aged teacher raised her hand. "It seems we don't have time to teach," she told Miles, "because every time we turn around, we're testing."

Miles admits that some district assessments -- like the first-grade art exam -- need improvement. "The prompt was probably too hard for first-graders," he said. "Next year it will be easier." But he is unapologetic about the anxieties caused by the district's obsession with test scores; he regards less quantitative educational philosophies as lacking in rigor. "For the first time, you have art teachers saying, 'I'm going to have to teach to the standards, not just do coloring,'" he said in an interview.

Miles showed me a memo titled "Looking for Heroes: A note to potential Harrison teachers and educators," which he plans to give every job applicant who interviews in Harrison. In it, he writes, "If you prefer routine and the status quo, you might want to consider a different district. If you think 'tenure' gives you more job security than teaching effectively, do not come to the Harrison School District. If you are given to excuse-making or blaming our parents or students, do not come to the Harrison School District."

The letter is a thinly veiled attack on teachers' unions and the job security for which they fight. Mike Stahl, former executive director of the Pikes Peak Education Association, says union membership in Harrison has decreased by half under Miles' leadership, and that teacher turnover, at about 25 percent from year to year, "is the highest in the state among like-sized or larger districts." According to Stahl, Miles "is very anti-union and very prone to retaliation for speaking in opposition to district or superintendent plans. ... There was no collaboration with staff or union in the development of this plan. As a result, district teacher morale is extremely low."

According to district data, 74 of Harrison's 820 teachers performed at the "distinguished" level during the 2009-2010 school year; 89 percent of that elite group returned to their jobs. A total of 103 tenured teachers received either an "unsatisfactory" or "progressing" evaluation score, a third of whom resigned or were dismissed.

Since Miles became superintendent, Harrison's scores on state exams in math, reading, and writing have steadily increased. In reading, for example, 54 percent of Harrison students were proficient in 2005, compared to 61 percent in 2010. Critics who chalk those gains up to "drill and kill" teaching might find at least one thing to love about Harrison District 2: Its test score-based teacher-evaluation system is matched by intense professional-development efforts of the sort promoted by education experts from across the political spectrum. Harrison teachers are told to expect up to 16 on-the-spot classroom observations per semester from administrators and instructional consultants; after these visits, teachers receive feedback on everything from classroom layout to lesson plans to whether they are spending too much or too little time explaining assignments to students before letting them try a hands-on activity.

"This is a high-anxiety district to work in," said one elementary school writing instructor in his second year as a Teach for America recruit in the district and unsure if he'll return for a third. When I met him in November, the teacher agreed to be interviewed on the record; by March, as this article was being prepared for publication, he had changed his mind. He noted that Harrison teachers feel constantly watched -- not only are they subject to all those surprise observations, but they must keep their classroom doors open at all times. But "really systemic, momentous things are happening right now, and I am at the ideological epicenter of that change," he added. "If nothing else, it's really interesting."

***

The public debate around SB 191, the bill that requires every school district in Colorado to evaluate teachers based on student-achievement data, made many educators in the state uneasy. The local media framed the issue as a political showdown between bipartisan reformers and the Colorado Education Association over outdated teacher-tenure protections, with the reformers representing the interests of children. A May 14 Denver Post editorial on the bill accused its critics of airing "hysterical assertions": "Every legislator who voted for this measure, particularly Democrats who resisted the full court press of the CEA, should be both proud and concerned. We say concerned because it's not over yet. ... Education reformers must remain focused and watchful to ensure the creation of a detailed and fair evaluation system isn't sabotaged by opponents."

Zach Rupp, a Denver elementary school music teacher, remembers the SB 191 debate as a time of "a lot of negativity" about teaching. One teacher in suburban Douglas County, midway between Denver and Colorado Springs, told me that the months of legislative argument over the bill made her feel like "I've chosen a profession that, in the public eye, is worse than prostitution."

In framing the debate as one about the fate of bad teachers with tenure, the local media ignored the fact that in order to collect "growth" data on every single Colorado public school teacher, the state would likely need to administer more standardized tests to kids. Many smart and motivated potential teachers are turned off by systems that over-rely on standardized lessons, worksheets, and tests, says Christina Jean, a former Colorado public school teacher who now works as a field coordinator for the Denver Teacher Residency program, an alternative licensing pathway for career changers. "That's not challenging and stimulating for adults," she says. "One of the things I think is really critical is re-professionalizing and re-intellectualizing teaching. I am an intelligent person who has this love and passion for educating kids, so let me use what I know about my content area, about pedagogy, about child psychology, to create an experience for my students that reflects my expertise. Let me create my own assessments that are in line with state standards."

But are the results of such standardized tests, many of them new and experimental, a fair way to measure the effectiveness of teachers and schools?

There are two popular, relatively new ways to measure teacher effectiveness using standardized tests. The first, called the "growth model," uses Jimmy's score on last year's reading exam -- let's say 72 percent -- to predict his score on this year's exam. From years of data, the state knows the average third-grader who scores 72 becomes a fourth-grader who scores 76. If Jimmy instead scores an 80 in fourth grade, the growth model would attribute his improvement relative to his peers to the skill of his fourth-grade classroom teacher and reward her with a better professional evaluation score.
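The arithmetic behind Jimmy's example can be sketched in a few lines of Python. This is only an illustration, not the formula any state actually uses: the lookup table and the function names are hypothetical, and the only figures taken from the example above are the 72, 76, and 80.

```python
# Hypothetical table: last year's score -> average score one year later.
# Only the 72 -> 76 pair comes from the example in the text.
EXPECTED_NEXT_SCORE = {72: 76}


def growth_residual(last_year, this_year):
    """Points a student scored above (+) or below (-) the expected score."""
    return this_year - EXPECTED_NEXT_SCORE[last_year]


def teacher_growth(students):
    """Average residual over a teacher's students, given (last, this) pairs."""
    residuals = [growth_residual(last, this) for last, this in students]
    return sum(residuals) / len(residuals)


# Jimmy scored 72 last year and 80 this year.
jimmy = growth_residual(72, 80)
```

Here Jimmy's residual is 4 points, the gap between his 80 and the predicted 76, and a teacher's rating would be built from the average of such gaps across her class.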

Research by testing experts has shown, however, that this model stacks the deck against teachers who work with high-poverty, academically struggling students. It turns out that kids who begin the school year with decent standardized test scores are the same ones who will "grow" fastest academically; all the advantages such students bring to school -- supportive parents, highly verbal home environments, better nutrition -- also help them continuously learn more, improving their performance on tests over time. Meanwhile, poor students are more likely to experience disruptions such as moves, divorces, and homelessness that make their academic performance fluctuate from year to year and even day to day.

To address this shortcoming, economists who study education have created a complex statistical tool called value-added measurement, which attempts to quantify the impact teachers have on their students' academic growth while controlling for factors such as poverty, race, class size, and even how many years the teacher has been on the job. The Obama administration has asked states to incorporate value-added measurement into all teacher evaluations, and according to Johnston, the state senator, Colorado's new evaluation system will use value-added to calculate at least part of the 51 percent of a teacher's evaluation score that will be based on student growth. Cities including New York, Washington, D.C., Chicago, and Los Angeles are already using the technique.
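The core idea of "controlling for" a factor can be shown with a toy calculation. The sketch below is an assumption-laden simplification, not the model any district uses: it adjusts for a single yes/no factor (low income), the records are made up, and the function name is hypothetical. Real value-added formulas use regression with many controls at once.

```python
from collections import defaultdict
from statistics import mean

# Made-up records for illustration: (teacher, low_income, score_gain).
records = [
    ("A", True, 3), ("A", True, 5),
    ("B", False, 6), ("B", False, 8),
    ("C", True, 1), ("C", False, 10),
]


def value_added(records):
    # Step 1: average gain among students who share the same background.
    by_group = defaultdict(list)
    for _, low_income, gain in records:
        by_group[low_income].append(gain)
    expected = {group: mean(gains) for group, gains in by_group.items()}

    # Step 2: compare each student's gain with that of similar students.
    by_teacher = defaultdict(list)
    for teacher, low_income, gain in records:
        by_teacher[teacher].append(gain - expected[low_income])

    # Step 3: a teacher's score is her students' average adjusted gain.
    return {teacher: mean(adj) for teacher, adj in by_teacher.items()}


scores = value_added(records)
```

With these invented numbers, teacher B has the highest raw average gain (7 points) but the lowest adjusted score (-1), while teacher A, whose students are all low income, moves from the lowest raw gain (4 points) to the highest adjusted score (+1). That reversal is the kind of correction the technique is meant to provide.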

Value-added could be called the most controversial math equation in the United States -- or really, group of math equations, because there are as many ways to compute a teacher's value-added score as there are districts experimenting with the technique. In August, the Los Angeles Times created a searchable online database of the value-added scores of 6,000 LA public school teachers. Secretary of Education Arne Duncan said he supported the move, but union officials slammed the paper -- even American Federation of Teachers President Randi Weingarten, who has cautiously supported experiments with value-added. In New York City, the teachers' union is fighting the Bloomberg administration's attempt to release the value-added scores of 12,000 teachers to local media, including The New York Times.

Rival groups of education researchers interpret the reliability of value-added differently, but even the technique's defenders have urged caution, as have the Educational Testing Service and the Department of Education's own Institute of Education Sciences. Experts raise a number of powerful objections: that value-added measurements are often based on poorly designed, unsophisticated standardized tests; that the ratings are particularly volatile (a teacher who scores very well or very poorly using value-added has only a one-third chance of getting a similar score the following year, and it takes about 10 years of data to reduce the value-added error rate to 12 percent for any individual teacher); and that the technique gives the impression that the teacher is the only factor in student achievement, ignoring parental involvement, after-school tutoring, and other "inputs" that research shows account for up to 80 percent of a student's achievement outcomes.

Of course, any evaluation system will have its flaws. Given the perfunctory state of teacher evaluation in the United States -- according to one widely cited study, less than 1 percent of American teachers receive an unsatisfactory evaluation in any given year, and three out of four teachers receive no specific feedback on improving -- wouldn't value-added, which is at least based on real data, be a major improvement?

Educators often claim everyone in a school building "just knows" who is an effective teacher and who isn't, and it turns out they aren't lying: Value-added ratings do tend to match principals' subjective evaluations of the teachers who work for them. The ratings may still prove useful, though, as a data-based "backup" for subjective classroom observations. But despite the Obama administration's faith in the technique, it doesn't seem to be the silver bullet that, alone, will transform teacher evaluation by allowing administrators to sniff out poor performers in a way they aren't already able to.

Johnston says his goal is for Colorado administrators to use the new evaluation tools mandated by SB 191, including value-added, to remove the bottom-performing 5 percent to 10 percent of the state's teaching force. The political and media focus on the profession's lowest performers is discomfiting to teachers, sure, but also to some principals, who will be the ones scheduling test days, observing classrooms, and firing teachers. "What troubled me so much about the SB 191 debate was that it became about bad teachers," said Bruce Caughey, deputy director of the Colorado Association of School Executives, at a Nov. 16 Denver panel discussion on the education-reform documentary Waiting for Superman. "I see that as a really small part of the problem. You really need to look at best practices in professional development" -- in other words, how to transform mediocre teachers into good ones.

A consensus is emerging on what those best practices are, and they have little to do with test-driven instruction. Research by Linda Darling-Hammond, a Stanford University teaching expert and former Obama adviser, has found that in Finland, South Korea, and other high-performing nations, teachers spend just 50 percent of their workday in the classroom with students, compared to about 80 percent for American teachers. During the rest of their day, Finnish and South Korean teachers work with other adults to plan lessons, observe one another's classrooms, and evaluate student work. This balance is especially important for beginning teachers; powerful evidence suggests that the single most helpful teacher-training exercise is to spend time inside a master teacher's classroom and to get feedback from that master teacher on one's own practice.

In the United States, teaching has traditionally been a "flat" profession, with few opportunities for leadership or advancement. When teachers spend more of their workday in partnership with other professionals, that changes -- there is more space and time to define formal roles for mentor teachers in observing and guiding the instruction of novices.

Politically, such reforms could be more complicated (and expensive) to enact than cracking down on tenure. During a recession with an unemployment rate near 10 percent, the job security enjoyed by teachers has become an increasing source of public resentment. And requiring teachers to spend more time in the classroom allows for smaller class sizes, a reform popular with parents, politicians, and teachers' unions. Yet small class sizes are not a characteristic of every high-performing nation; the average primary school class in South Korea has more than 30 students, compared to 23 students in the U.S.

Colorado politicians don't need to travel as far as Seoul, however, to get a look at education reform that prioritizes good teaching without over-relying on standardized testing or punitive performance-pay schemes. In 2009, in the southwest Denver neighborhood of Athmar Park -- a Latino area studded with auto-body repair shops, tattoo parlors, and check-cashing joints -- a group of union teachers opened the Math and Sciences Leadership Academy (MSLA), the only public school in Colorado built around peer evaluation. The elementary school borrows some of the cooperative professional development tools used in other countries: Every teacher is on a three-person "peer-review team" that spends an entire year observing one another's classrooms and providing feedback. The teachers are grouped to maximize the sharing of best practices; one team includes a second-year teacher struggling with classroom management, a veteran teacher who is excellent at discipline but behind the curve on technology, and a third teacher who is an innovator on using technology in the classroom.

Each teacher in the group will spend about five hours per semester observing her peer's teaching and helping him differentiate his instruction to reach each student. (MSLA is 92 percent Latino, and more than 97 percent of its students receive free or reduced-price lunch. Sixty percent of the student population is still learning basic English.) "It's kind of like medical rounds," explains Kim Ursetta, a kindergarten and first-grade English and Spanish literacy instructor who, as former president of the Denver Classroom Teachers Association, founded MSLA. "What's the best treatment for this patient?"

Peer review accounts for a significant portion of each MSLA teacher's evaluation score; the remainder is drawn from student-achievement data, including standardized test scores, portfolios of student work, and district and classroom-level benchmark assessments. MSLA is a new school, so the state has not yet released its test-score data, but it is widely considered one of the most exciting reform initiatives in Denver, a city that has seen wave after wave of education upheaval, mostly driven by philanthropists and politicians, not teachers. Alexander Ooms, an education philanthropist and blogger at the website Education News Colorado, has written that MSLA "has more potential to change urban education in Denver than any other single effort."

When I visited MSLA in November, the halls were bright and orderly, the students warm and polite, and the teachers enthusiastic -- in other words, MSLA has many of the characteristics of high-performing schools around the world. What sets MSLA apart is its commitment to teaching as a shared endeavor to raise student achievement -- not a competition. During the 2009-2010 school year, all of the school's teachers together pursued the National Board for Professional Teaching Standards' Take One! program, which focuses on using curriculum standards to improve teaching and evaluate student outcomes. This year, the staff-wide initiative is to include literacy skills-building in each and every lesson, whether the subject area is science, art, or social studies.

Ruth Ocon Neri, who last year taught at a traditional Denver elementary school, says she is much happier -- and more driven to succeed -- at MSLA. "I would never want to be a principal! But here we have different kinds of leadership opportunities," she says.

Though MSLA's peer-review system seems to be working well for its teachers, students, and neighborhood, in the coming years, the school will have to follow the dictates of SB 191, like every other public school in Colorado. That means MSLA may have to weigh student standardized test scores more heavily in its teacher evaluations. "There are aspects of the bill that are more than a little disconcerting," says Lori Nazareno, MSLA's "school leader" (there is no "principal" here) and a veteran science teacher. "Are we just going to be slicing and dicing CSAP scores?" she asks, referring to Colorado's state exams in science, math, and reading.

As Nazareno walked me through MSLA's hallways, introducing me to kids and teachers, she reflected on how her profession is changing. "I'm not afraid of being held accountable. I haven't dedicated a career to have kids unable to read or do science," she said. "But people need to understand that teaching and learning are very complex processes, and any time you try to measure anything that's highly complex, you can miss the nuances."

Nazareno paused outside a classroom door and lowered her voice. "We had a girl in the second grade whose mother died. At the school next door, a girl was brutally murdered. That's all they've been talking about there for two weeks; they lost a lot of instruction time."

She raised her eyebrows. "How do you factor that into value-added?"
