In Tournament of Upsets, V.C.U. Has Overcome Longest Odds

Somehow, my N.C.A.A. tournament bracket still ranks in the 76th percentile nationwide, a result which it owes to having performed very strongly in the first couple of rounds.

It’s all downhill from here, however, since the bracket had all four No. 1 seeds advancing to the Final Four in Houston, something that, after Kansas’ convincing loss to Virginia Commonwealth on Sunday, none of them will do.

But if my bracket had to be so thoroughly destroyed, better by rags-to-riches stories like Virginia Commonwealth and Butler than by a bunch of upwardly mobile No. 2 seeds.

Virginia Commonwealth, in fact, might be the basketball equivalent of Susan Boyle. In a competition famous for its upsets, the Rams having made it to Houston may be the most unlikely occurrence in the history of the tournament.

Before the tournament began, we had Virginia Commonwealth with just a 12-in-10,000 chance of reaching the Final Four, making the Rams 820-to-1 underdogs. The reasons for the extremely long odds were twofold.

First, Virginia Commonwealth was one of eight teams that had to play in the N.C.A.A.’s newly expanded opening round, which means it had to win five times rather than four to reach Houston. (After defeating U.S.C. in its opening game, Virginia Commonwealth’s odds shortened to “only” 335-to-1 against, according to our formula.)
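The two figures in that parenthetical imply a rough win probability for the opening game, because the chance of reaching the Final Four factors into the chance of winning the opener times the chance of reaching the Final Four given that win. Here is a minimal sketch of that arithmetic in Python. The function and variable names are illustrative, the result is an inference from the rounded odds quoted above rather than a number the model reported, and it should be read as approximate.

```python
def odds_against_to_prob(odds_against: float) -> float:
    """Convert 'N-to-1 against' odds into a probability: 1 / (N + 1)."""
    return 1.0 / (odds_against + 1.0)

# Figures quoted in the article (rounded as published).
p_final_four_before = odds_against_to_prob(820)  # before the opening game
p_final_four_after = odds_against_to_prob(335)   # after beating U.S.C.

# P(Final Four) = P(win opener) * P(Final Four | win opener),
# so the implied opener win probability is the ratio of the two.
implied_p_win_opener = p_final_four_before / p_final_four_after

print(f"Implied chance of winning the opener: {implied_p_win_opener:.1%}")
```

On these rounded inputs the implied figure comes out at roughly 40 percent, which fits a team that was a modest underdog in its first game.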

But second, while there were some dangerous-looking lower seeds in this year’s field, like Richmond, Gonzaga and Utah State, Virginia Commonwealth was not among them.

Instead, Virginia Commonwealth rated as the 84th-best team in the country heading into the tournament, according to Ken Pomeroy’s ratings. Our projections, which were based on a combination of Mr. Pomeroy’s ratings and three other systems, had the Rams as the 51st-best team in the 68-team field, ahead of only Memphis and a dozen or so automatic qualifiers from smaller conferences.

Computer ratings, to be sure, are imperfect instruments, but Virginia Commonwealth had looked no more impressive according to the “eye test.” V.C.U.’s résumé had included losses to undistinguished programs like Northeastern, Georgia State and South Florida. Many commentators, from stat-heads to traditionalists, had objected to V.C.U. even being in the tournament field.

But Virginia Commonwealth, obviously, has made the most of its opportunity. Its run to the Final Four is an order of magnitude less likely than any other example since the tournament expanded to 64 teams in 1985 — although there is one other case that is comparable to it if you go back slightly further. Here is a rundown of the competition.

George Mason, 2006. The most natural comparison for Virginia Commonwealth is George Mason, its companion in the Colonial Athletic Association, which reached the Final Four in 2006. Like Virginia Commonwealth, George Mason was a No. 11 seed.

George Mason, however, was a better team than Virginia Commonwealth — or at least looked that way heading into the tournament. Mr. Pomeroy’s ratings, at the time, had it 28th in the country, while Jeff Sagarin’s ratings had it 38th. Other systems had George Mason a bit lower, but it entered the tournament with seven losses to Virginia Commonwealth’s 11, despite the indignity of having lost twice to Hofstra late in the regular season.

What would our projection system have said about George Mason if it were setting the odds in 2006?

I developed a pared-down version of the model based on one relatively simple computer rating called (appropriately) Simple Rating System or S.R.S. The advantage of this particular rating system is that it is built in a relatively straightforward way from game-by-game data, so I can calculate with a reasonable degree of precision what a team’s rating would have been before tournament play had begun.
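For readers curious how a rating of this kind can be built from game-by-game data, here is a minimal sketch in Python. It assumes a common formulation in which each team's rating equals its average point margin plus the average rating of its opponents; whether the pared-down model uses exactly this form is an assumption. The function name `simple_ratings`, the damped iteration used to solve the equations, and the three-team schedule are all hypothetical, invented for illustration.

```python
from collections import defaultdict

def simple_ratings(games, iterations=200):
    """Solve rating = average margin + average opponent rating by damped iteration.

    `games` is a list of (team, opponent, margin) tuples, where margin is the
    team's points minus the opponent's points.
    """
    margins = defaultdict(list)    # team -> list of point margins
    opponents = defaultdict(list)  # team -> list of opponents faced
    for team, opp, margin in games:
        margins[team].append(margin)
        opponents[team].append(opp)
        margins[opp].append(-margin)
        opponents[opp].append(team)

    avg_margin = {t: sum(m) / len(m) for t, m in margins.items()}
    ratings = dict(avg_margin)  # start from the raw average margin

    for _ in range(iterations):
        updated = {}
        for team in ratings:
            opp_avg = sum(ratings[o] for o in opponents[team]) / len(opponents[team])
            target = avg_margin[team] + opp_avg
            # Damping (averaging old and new values) keeps the iteration stable.
            updated[team] = 0.5 * ratings[team] + 0.5 * target
        # Re-center so that the average team is rated 0.
        mean = sum(updated.values()) / len(updated)
        ratings = {t: r - mean for t, r in updated.items()}
    return ratings

# Hypothetical three-team round robin, invented purely for illustration.
toy_games = [("A", "B", 10), ("B", "C", 5), ("A", "C", 3)]
toy_ratings = simple_ratings(toy_games)
for team, rating in sorted(toy_ratings.items()):
    print(f"{team}: {rating:+.2f}")
```

Because the ratings depend only on scores and schedules, they can be recomputed using only the games played before a given date, which is what makes a pre-tournament rating recoverable after the fact.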

Some of the other bells and whistles of our model, like accounting for injuries and the geographic location of the game, are not included in this version. But it should give a pretty good approximation — and it can account, for example, for the difficulty of the draw that each team faced.

In George Mason’s case, the model would have had it with a 1.42 percent chance of reaching the Final Four heading into the tournament, giving it 70-to-1 odds against. Those are pretty impressive numbers to overcome, but they pale in comparison to Virginia Commonwealth’s 820-to-1 odds.
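The conversion between a percentage chance and "N-to-1 against" odds is simple arithmetic: N is the chance of failure divided by the chance of success. A small Python sketch follows; the function name is illustrative, and because the published percentages are themselves rounded, the output will match the quoted odds only approximately.

```python
def prob_to_odds_against(p: float) -> float:
    """Convert a probability into 'N-to-1 against' odds: N = (1 - p) / p."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return (1.0 - p) / p

george_mason_odds = prob_to_odds_against(0.0142)  # 1.42 percent, as quoted
print(f"George Mason: about {george_mason_odds:.1f}-to-1 against")
```

The same function can be applied to any of the other percentages in this article to check them against the odds given alongside.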

Louisiana State, 1986. The only other double-digit seed to have reached the Final Four was Louisiana State, in 1986. L.S.U. did this by knocking off a No. 6 seed (Purdue), a No. 3 (Memphis), a No. 2 (Georgia Tech) and a No. 1 (Kentucky) in consecutive rounds. Each of those teams was appropriately seeded (in fact, Georgia Tech was unusually strong for a No. 2 and Memphis quite good for a No. 3), so the accomplishment is just as impressive as it sounds.

We’re not trying to figure out, however, which team defeated the best opponents en route to the Final Four (it would be difficult, both in practice and in theory, to top Louisiana State in that department). Instead, we’re trying to determine which team did so despite having the longest odds against it.

Louisiana State was a very strong No. 11 seed. Based on its S.R.S. rating heading into the tournament, a No. 6 seed or even a No. 5 would have been more appropriate. But L.S.U. had taken a number of tough losses, losing four games by 3 points or fewer, and the seeding committee punished L.S.U. for having gone 9-9 in conference play after having started the year 10-0.

There was plenty of talent on the team, though, including three players who would later play in the N.B.A. The model would have given the Tigers a 1.45 percent chance of reaching the Final Four, or 69-to-1 against, about the same odds as George Mason’s but much shorter than V.C.U.’s.

Villanova, 1985. Since the tournament expanded to 64 teams in 1985, three No. 8 seeds have reached the Final Four. The weakest one by some margin, by its S.R.S. rating before the tournament, was Villanova in 1985, which was somewhere around the 35th or 40th best team in the country heading into tournament play and may have received a break on seeding.

Villanova also got a pretty good draw; the No. 1 seed (Michigan), No. 2 seed (North Carolina) and No. 3 seed (Kansas) in its region were all relatively weak by historical standards. Our model would have given Villanova almost exactly a 2 percent chance, or 50-to-1 against, of advancing to the Final Four.

Note, however, that Villanova not only reached the Final Four but actually won the tournament; I don’t have an exact version of this calculation, but Villanova would have been something on the order of 800-to-1 against doing that. So Villanova winning the tournament was very unlikely — but only about as unlikely as Virginia Commonwealth merely reaching the Final Four.

Pennsylvania, 1979. Although the tournament did not reach 64 teams until 1985, it began to seed them in 1979. That year, the Pennsylvania Quakers entered the tournament as a No. 9 seed and reached the Final Four.

This is more impressive than it sounds because the tournament consisted of only 40 teams that year, 10 in every region, so a No. 9 seed then might have been comparable to a No. 14 or No. 15 today.

Pennsylvania was probably the worst team ever to reach the Final Four. The Quakers played a reasonably formidable out-of-conference schedule, including some tournament teams like Temple and Georgetown. Yet while Penn took only five losses in the regular season, the Ivy League was still the Ivy League, and Penn’s average margin of victory during the regular season was only 6.5 points.

S.R.S. is not available for 1979 or years prior to it, but I inferred a rating for Pennsylvania based on its margin of victory, applying a strength-of-schedule adjustment based on the teams that Pennsylvania played in the years from 1980 to 1984, after the S.R.S. ratings came online. That calculation suggests that Pennsylvania was perhaps 2 or 3 points weaker during the regular season than this year’s Virginia Commonwealth team, and indeed was about at the level of a No. 14 seed.

Pennsylvania had the good fortune, however, of playing in an era of tremendous parity in college basketball (consider, for instance, that the 1981 Indiana Hoosiers, one of the best teams of the period, nevertheless endured nine losses during the regular season).

Inferring S.R.S. ratings for other teams in its region based on standing in the Associated Press poll and won-loss record heading into the tournament, I show Pennsylvania as having had only a 0.19 percent chance of reaching the Final Four, or 537-to-1 against. Still, those are slightly better odds than the 820-to-1 that I have for this year’s Virginia Commonwealth team.

Because of the extra assumptions involved here, this one is a debatable case; Pennsylvania may well have been the longer long shot. Or alternatively I could be underrating the team somewhat.

Virginia Commonwealth could put the case to rest, however, by winning the national semifinal against Butler, the fellow underdogs who now unexpectedly find themselves as the Las Vegas favorites for the first time in the tournament. (Butler’s road to the Final Four was not quite as steep as Virginia Commonwealth’s; we had it as a relatively pedestrian 37-to-1 long shot.) If V.C.U. did that, it would unambiguously vault past both Pennsylvania’s 1979 Final Four appearance and Villanova’s 1985 championship as the most statistically unlikely success story.

And if V.C.U. wins the tournament, it would accomplish something that our model had given it a 1-in-17,611 chance of doing, which would rank with any achievement in sports.

*-*

But one postscript here, and one that may actually be of interest to the non-basketball fans who have managed to read this far.

Whenever you come across a statistical model which suggests that something extremely unlikely has occurred, you ought to be in the habit of questioning whether the event really was that unusual, or instead whether the model was designed with faulty assumptions.

For instance, before the tournament, our model gave a No. 16 seed, Alabama State, odds of about 3.6 trillion-to-one against winning everything. If that were to occur, it would be naive to suggest that the model had merely been “unlucky”.

Usually, this sort of thing doesn’t get you into trouble. If the chance of an event occurring is really 100,000-to-1 against, and your model instead has it at 100,000,000-to-1 against, in some sense that’s a gigantic error: you underestimated the likelihood of the event by a factor of 1,000! In the real world, however, a 100,000-to-1 bet is unlikely enough to come through that you won’t normally have to live with the consequences.

Nevertheless, this is the sort of thing I think about quite a lot, and which I had thought about specifically in the context of this year’s tournament. Believe it or not, although having Virginia Commonwealth at 820-to-1 against making the Final Four sounds like extremely long odds, other smart folks had them a bit longer still; Mr. Pomeroy’s model, for instance, had Virginia Commonwealth as about 3,300-to-1 against making the Final Four. And it had them as about 200,000-to-1 against winning the tournament, rather than 17,000-to-1.

The reason for the difference is not that we began with any more favorable an impression of Virginia Commonwealth. Rather, it’s that our model is somewhat Bayesian and assumes that conditional upon having been wrong once about Virginia Commonwealth, it’s more likely to be wrong about them again. At this point, Virginia Commonwealth has won five games, all as underdogs (although our model has recommended a bet on Virginia Commonwealth against the Las Vegas point spread on several occasions), so it has revised its expectations of Virginia Commonwealth pretty significantly and now sees them as tantamount to a No. 5 or No. 6 seed. The process is explained at more length here. This principle also has applications in a number of other fields ranging from epidemiology to finance.
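To make the idea of revising a rating after each surprise concrete, here is a minimal Python sketch of a standard textbook update, in which a prior belief about a team's strength is blended with its observed game performances in proportion to how certain each is. This is not our model's actual procedure, which is only described above as "somewhat Bayesian." The function name `update_rating` and every number below are hypothetical, chosen only to show the mechanics.

```python
def update_rating(prior_mean, prior_var, performances, game_var):
    """Conjugate normal update of a team's rating from observed game performances.

    Each performance is treated as a noisy reading of the team's true strength,
    for example its point margin plus the opponent's rating.
    """
    n = len(performances)
    posterior_precision = 1.0 / prior_var + n / game_var
    posterior_mean = (
        prior_mean / prior_var + sum(performances) / game_var
    ) / posterior_precision
    return posterior_mean, 1.0 / posterior_precision

# All numbers below are hypothetical, chosen only to show the mechanics.
prior_mean, prior_var = 5.0, 16.0   # pre-tournament rating and its uncertainty
game_var = 121.0                    # single-game noise (std. dev. of 11 points)
performances = [18.0, 20.0, 15.0, 22.0, 17.0]  # five games well above expectation

post_mean, post_var = update_rating(prior_mean, prior_var, performances, game_var)
print(f"Rating moves from {prior_mean:.1f} to {post_mean:.1f}")
```

The wider the prior uncertainty relative to single-game noise, the faster the rating moves toward what the team has actually done on the court. That ratio is the kind of parameter a modeler might tune.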

In a basketball sense, the implication of this is that the deeper you get into the tournament, the more willing you should be to dispense with your initial impressions of the teams. This year’s tournament has been very unusual in that favorites did quite well during the first two rounds, but since then there’s been total chaos. It may be that we need to tune up this parameter even further, which would mean that while the system would make some fairly confident bets during the first round or two, it would narrow the odds thereafter, assuming that once a team had advanced to a certain point in the tournament, its results would speak for themselves and it would be foolish to continue to bet too heavily against them.

For the time being, I think we’re O.K. Keep in mind that, although the odds of any one team reaching the Final Four as an 800-to-1 long shot are of course very long, there are many such opportunities every year; in every tournament, perhaps there are 10 or 15 teams that fall somewhere in the vicinity of Virginia Commonwealth. There have now been 33 tournaments played since the field began to be seeded in 1979, so it wouldn’t be unusual for one or two of those bets to come through.
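A back-of-the-envelope version of that argument can be sketched in Python. It makes two loud simplifications that are assumptions rather than anything the model does: every long shot is treated as exactly 800-to-1, and the bets are treated as independent (in reality, teams in the same region cannot both reach the Final Four). The function name is illustrative.

```python
def chance_of_at_least_one(odds_against, teams_per_year, tournaments):
    """Expected hits, and probability of at least one, among independent long shots."""
    p = 1.0 / (odds_against + 1.0)          # chance that any single bet comes through
    trials = teams_per_year * tournaments   # total number of long-shot opportunities
    expected_hits = trials * p
    p_at_least_one = 1.0 - (1.0 - p) ** trials
    return expected_hits, p_at_least_one

for teams in (10, 15):
    expected, p_any = chance_of_at_least_one(800, teams, 33)
    print(f"{teams} teams/year: expected hits {expected:.2f}, "
          f"P(at least one) {p_any:.0%}")
```

Under these assumptions the chance of seeing at least one such run over 33 tournaments is somewhere between about one in three and a little under one in two, so a single occurrence is hardly grounds for discarding the model.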

That doesn’t make what Virginia Commonwealth has accomplished any less remarkable, though. Or my bracket any less busted.

Correction: April 2, 2011
The FiveThirtyEight column on Tuesday, comparing the chances of Virginia Commonwealth University’s reaching the men’s Final Four with those of previous long shots in N.C.A.A. tournament history, miscalculated the odds faced by the University of Pennsylvania in 1979. The odds, which should have been based in part on an Associated Press poll released three days before the tournament, were 537-1, not 420-1. The incorrect figure was based on that season’s final A.P. poll, which was released four days after the tournament started. (A corrected chart appears at nytimes.com/FiveThirtyEight.)