Opinion

MARGINS OF ERROR

Nate Silver, 30, is the managing partner of Baseball Prospectus, which compiles and interprets baseball statistics. A graduate of the University of Chicago with a degree in economics, Silver invented the PECOTA player-forecasting system (Player Empirical Comparison and Optimization Test Algorithm), praised by baseball managers and fantasy players for its accuracy in predicting future performance. In March, he anonymously started FiveThirtyEight.com, where he applies his knowledge of statistics to polling in the presidential election.

It was a Sunday evening, two days before the 2004 presidential election, and Vice President Dick Cheney was aboard Air Force Two, on his way to Hawaii.

What was Cheney doing 30,000 feet above the Pacific Ocean instead of attending to some swing state like Ohio or Florida? Days before, a poll had come out in the Honolulu Advertiser showing Bush-Cheney tied with Kerry-Edwards in the state, at 43 percent of the vote each. This was considered a surprise; Al Gore had carried Hawaii by 18 points. But it was the first Hawaii poll to come out in months, and Cheney decided to gamble some of his limited time on the state’s four electoral votes.

It proved not to be worth the jet fuel – Kerry won Hawaii by a 9-point margin. But it was nevertheless a watershed moment in the history of polling. Instead of the campaign dictating polling, the polls were now dictating campaign strategy.

Bush, of course, won that election by a good margin. But imagine if Kerry had defeated Bush by a few hundred votes in Ohio; Cheney’s luau in Hawaii might have been remembered as a strategic disaster, all because of the false hopes of a survey.

Or consider the case of New Hampshire. For the several days preceding that state’s primary this year, the 9,351 square miles of New Hampshire were perhaps the most densely polled region in the history of the known universe. Thirteen separate polling firms released surveys, collectively having conducted more than 14,000 interviews. All 13 showed Barack Obama winning the state. And then Election Day came, and Hillary Clinton won New Hampshire by three points.

*

It’s incidents like these that made me realize that polls cannot always be trusted.

My full-time occupation has been as a writer and analyst for a sports media company called Baseball Prospectus. In baseball, statistics are meaningless without context; hitting 30 home runs in the 1930s is a lot different from hitting 30 today. There is a whole industry in baseball dedicated to the proper understanding and interpretation of statistics.

In polling and politics, there is nearly as much data as there is for first basemen. In this year’s Democratic primaries, there were statistics for every gender, race, age, occupation and geography – reasons why Clinton won older women, or Obama took college students.

But the understanding has lagged behind. Polls are cherry-picked based on their brand name or shock value rather than their track record of accuracy. Demographic variables are misrepresented or misunderstood. (Barack Obama, for instance, is reputed to have problems with white working-class voters, when in fact these issues appear to be dictated more by geography – he has major problems among these voters in Kentucky and West Virginia, but did just fine with them in Wisconsin and Oregon.)

And so this March, I created a Web site called FiveThirtyEight.com (named after the number of votes in the Electoral College) to try to apply the same scientific spirit that we’ve used in baseball to the political world.

The first lesson: polls are nowhere near as accurate as they claim. A typical poll reports a margin of error of somewhere between 3 and 5 percentage points. But thus far in the Democratic primaries, polls released within 10 days of the election have missed the final Obama-Clinton margin by an average of 7.4 points.
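For a sense of where a reported margin of error comes from, here is a minimal sketch in Python. It assumes a simple random sample and the standard 95 percent formula for a single candidate’s share; the function name and the sample sizes are illustrative and are not taken from any pollster’s methodology.

```python
import math

def margin_of_error(sample_size, share=0.5, z=1.96):
    """Sampling margin of error, in percentage points, for one candidate's share.

    Assumes a simple random sample; z=1.96 corresponds to 95 percent confidence.
    share=0.5 is the worst case and is what most published margins assume.
    """
    return 100 * z * math.sqrt(share * (1 - share) / sample_size)

# Illustrative sample sizes only.
for n in (400, 600, 1000):
    print(n, round(margin_of_error(n), 1))
# prints:
# 400 4.9
# 600 4.0
# 1000 3.1
```

Note that this figure covers sampling error only. It says nothing about who was reached, how likely voters were screened, or how the results were weighted, which is one reason actual misses can be larger than the stated margin.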

The good news is that general election polling tends to be much, much more accurate than polling conducted in the primaries. There are fewer swing voters in the general election, since the large majority of voters simply vote with their party. Moreover, voters tend to make up their minds earlier, as they’ve had plenty of time to gather information on the candidates.

Nevertheless, it is best to treat any poll with a skeptical eye. Here are a few strategies that will help you to read the polls like a pro:

• Be picky. Some polling firms are better than others, and the best polls aren’t necessarily the ones that show up most often in the newspapers. At FiveThirtyEight.com, we have built a database of more than 900 polls from 32 different polling firms going back to the 2000 election. Each poll is assigned a weight based on the past performance of the pollster, as well as its sample size and how recently it was conducted. The best pollster in the country is Selzer & Company, a boutique firm located in Des Moines, Iowa, which correctly predicted a large win for Barack Obama in Iowa when most other firms had the state as a toss-up. After that are SurveyUSA of New Jersey and Rasmussen Reports, two companies that rely exclusively on automated calling scripts (so-called “robopolls”), which have proven to be every bit as accurate as their live-interview counterparts. Meanwhile, brand-name pollsters Zogby and Gallup rank just 17th and 25th on the list.

• Size matters – but only to an extent. While larger sample sizes are desirable, they aren’t a cure-all. A poll of the Democratic primary in Virginia by a firm called Constituent Dynamics included more than 6,000 respondents. It still miscalled the election by 11 points. A small sample size can undermine an otherwise good poll, but a large one can’t make up for poor methodology.

• Embrace the uncertainty. Some Web sites are constantly “calling” states for one candidate or another based on the most recent polling result. My philosophy is that the only time a state should be called is on Election Day. Instead, we treat each state probabilistically, simulating the election 10,000 times based on our current polling averages. We estimate, for example, that Barack Obama has a 66 percent chance to beat John McCain in Pennsylvania, but just a 38 percent chance in Virginia. As we move toward Election Day, the estimates will become bolder, but there is always some potential for a New Hampshire type of surprise.

• Polling is no substitute for common sense. Common sense would have dictated that Bush-Cheney didn’t have much of a chance in Hawaii, a state where Democrats outnumber Republicans almost two to one. The FiveThirtyEight.com version of common sense is to evaluate more than a dozen demographic variables as a complement to the polling. For example, some polling has shown Barack Obama competitive with John McCain in North Dakota, while trailing him significantly in South Dakota. This doesn’t make much sense, since the states are essentially identical demographically. Thus, we make demographic inferences from some states to help us interpret the results in other states. This proved to be a highly effective tactic in the Democratic primaries. Our demographic model, for instance, called North Carolina for Barack Obama by 17 points and Indiana for Hillary Clinton by 2 points. In each case, that prediction proved to be considerably more accurate than the polling averages, which had shown a smaller lead for Obama in North Carolina and a larger one for Clinton in Indiana.
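The weighting and simulation ideas in the strategies above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the FiveThirtyEight.com model: the weight formula, the 14-day half-life, the normal error distribution, the 5-point error size and every poll number below are hypothetical, and the sketch covers a single state rather than the whole Electoral College.

```python
import random

def poll_weight(pollster_rating, sample_size, days_old, half_life_days=14.0):
    """Hypothetical weight: a better-rated pollster, a larger sample and a
    fresher poll all count for more. The exact formula is an assumption."""
    recency = 0.5 ** (days_old / half_life_days)  # weight halves every 14 days
    return pollster_rating * (sample_size ** 0.5) * recency

def weighted_margin(polls):
    """Weighted average margin. Each poll is a tuple of
    (margin in points, pollster_rating, sample_size, days_old)."""
    weights = [poll_weight(r, n, d) for _, r, n, d in polls]
    total = sum(weights)
    return sum(w * p[0] for w, p in zip(weights, polls)) / total

def win_probability(mean_margin, error_sd, trials=10_000, seed=538):
    """Simulate one state `trials` times, drawing the final margin from a
    normal distribution (an assumption), and return the share of wins."""
    rng = random.Random(seed)
    wins = sum(1 for _ in range(trials) if rng.gauss(mean_margin, error_sd) > 0)
    return wins / trials

# Made-up polls for one state: (margin, rating, sample size, days old).
example_polls = [(4.0, 1.0, 600, 2), (-1.0, 0.6, 1200, 9), (2.0, 0.8, 500, 5)]
average = weighted_margin(example_polls)
print(round(average, 1), win_probability(average, error_sd=5.0))
```

The point of the design is that a modest polling lead combined with a realistic error size yields a probability well short of certainty, which is why a state is never “called” outright before Election Day.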

*

So what do the polls tell us now about the national election?

Rasmussen has McCain winning by three points; Reuters/Zogby has Obama up by eight. Liberal blogs say confidence in Republicans is low, and Obama will win in a landslide. Clinton stays in the race, arguing that McCain would destroy Obama.

I think they’re all wrong – the 2008 election, like 2000, will be a nailbiter.

As of Friday morning, our model showed Barack Obama with a 51.8 percent chance to win the election and John McCain with a 48.2 percent chance. The only state that looks nearly certain to switch sides from 2004 is Iowa, which John McCain ignored during the primaries and where his opposition to the farm bill and ethanol subsidies will hurt him.

Otherwise, the election will be fought along three focal points: in the Upper Midwest, where Obama will try to steal Ohio and Indiana while defending Michigan, Wisconsin and Pennsylvania; in the Mountain West, where Colorado, Nevada and New Mexico all appear to be toss-ups; and in the Chesapeake region, where Obama hopes to add Virginia and North Carolina to the Democratic column.

Obama’s biggest decision will be whether or not to seriously contest Florida. Obama has trailed McCain in all 12 polls that have come out in Florida since the first of the year. While it’s possible that this is an artifact of his decision not to campaign there in the primaries, the state’s older demographics are not especially favorable to him, and McCain has made a strong play for Jewish voters. If I were Obama, I would concede the state, instead conserving my resources for the Midwest and Mountain West battlegrounds. Likewise, if I were McCain, I would not contest California, Oregon or Washington, where the political DNA seems to be especially favorable to Obama. Instead, I would try to take Michigan and Wisconsin from Obama, which could put his electoral equation in serious jeopardy.

And this goes without saying: if either candidate is visiting Hawaii, it had better mean they are on vacation.