Thursday, January 20, 2011
Sabermetric basketball statistics are too flawed to work
You know all those player evaluation statistics in basketball, like "Wins Produced," "Player Efficiency Rating," and so forth? I don't think they work. I've been thinking about it, and I don't think I trust any of them enough to put much faith in their results.
That's the opposite of how I feel about baseball. For baseball, if the sportswriter consensus is that player A is an excellent offensive player, but it turns out his OPS is a mediocre .700, I'm going to trust OPS. But, for basketball, if the sportswriters say a guy's good, but his "Wins Produced" is just average, I might be inclined to trust the sportswriters.
I don't think the stats work well enough to be useful.
I'm willing to be proven wrong. A lot of basketball analysts, all of whom know a lot more about basketball than I do (and many of whom are a lot smarter than I am), will disagree. I know they'll disagree because they do, in fact, use the stats. So, there are probably arguments I haven't considered. Let me know what those are, and let me know if you think my own logic is flawed.
------
The most obvious problem is rebounds, which I've posted about many times (including these posts over the last couple of weeks). The problem is that a large proportion of rebounds are "taken" from teammates, in the sense that if the player credited with the rebound hadn't got it, another teammate would have.
We don't know the exact numbers, but maybe 70% of defensive and 50% of offensive rebounds are taken from a teammate's total.
More importantly, it's not random, and it's not the same for all players. Some rebounders will cover much more of other players' territory than others. So when player X has a huge rebounding total, we don't know whether he's just good at rebounding, whether he's just taking rebounds from teammates, or whether it's some combination of the two.
So, even if we decide to take 70% of every defensive rebound, and assign it to teammates, we don't know that's the right number for the particular team and rebounder. This would lead to potentially large errors in player evaluations.
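To see why the flat adjustment doesn't rescue us, here's a toy sketch of what it would look like in code. The 70% rate, the equal-share rule, and the function itself are all my own inventions for illustration, not an established adjustment:

```python
# Toy sketch: reallocate defensive rebound credit assuming a flat 70%
# of every board would otherwise have been grabbed by a teammate.
# The 0.70 rate and the equal-share rule are assumptions, not estimates.

TAKEN_FROM_TEAMMATES = 0.70  # assumed flat rate; in reality it varies

def adjusted_dreb_credit(player_drebs, teammate_drebs):
    """Credit a player with 30% of his own boards, plus an equal share
    of the 'taken' 70% pooled across the whole lineup."""
    pool = TAKEN_FROM_TEAMMATES * (player_drebs + sum(teammate_drebs))
    own = (1 - TAKEN_FROM_TEAMMATES) * player_drebs
    return own + pool / (1 + len(teammate_drebs))

# A big man with 900 boards next to four teammates with 300 each
# gets credited with about 564, not 900.
print(adjusted_dreb_credit(900, [300, 300, 300, 300]))
```

But if the true rate for this particular team is 60%, or 80%, the credit swings by dozens of rebounds -- and we have no way of knowing the true rate.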
The bottom line: we know exactly what a rebound is worth for a team, but we don't know which players are responsible, in what proportion, for the team's overall performance.
------
Now, that's just rebounds. If that were all there were, we could just leave that out of the statistic, and go with what we have. But there's a similar problem with shooting accuracy.
I ran the same test for shooting that I ran for rebounds. For the 2008-09 season, I ran a regression for each of the five positions. Each row of the regression was a single team for that year, and I checked how each position's shooting (measured by eFG%) affected the average of the other four positions (the simple average, not weighted by attempts). A code sketch of the setup follows the numbers below.
It turns out that there is a strong positive correlation in shooting percentage among teammates. If one teammate shoots accurately, the rest of the team gets carried along.
Here are the numbers (updated, see end of post):
PG: slope 0.30, correlation 0.63
SG: slope 0.40, correlation 0.62
SF: slope 0.26, correlation 0.27
PF: slope 0.28, correlation 0.27
-C: slope 0.27, correlation 0.43
To read one line off the chart: for every one percentage point increase in shooting percentage by the SF (say, from 47% to 48%), each of his teammates saw an increase of 0.26 percentage points (say, from 47% to 47.26%).
The coefficients are a lot more important than they look at first glance, because they represent a change in the average of all four teammates. Suppose all five players took the same number of shots (which they don't, but never mind right now). Then, when the SF makes one extra field goal, each of his four teammates also makes an extra 0.26, for a total of 1.04 extra field goals from the rest of the team.
That's a huge effect.
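For anyone who wants to check the setup, here's a minimal sketch of that regression in Python. The file and column names are hypothetical; the real data was 2008-09 team eFG% broken out by position:

```python
# Sketch of the position-by-position regression described above:
# one row per team, one eFG% column per position; for each position,
# regress the simple average of the other four positions' eFG% on it.
import pandas as pd
from scipy.stats import linregress

teams = pd.read_csv("team_efg_by_position_2008_09.csv")  # hypothetical file
positions = ["PG", "SG", "SF", "PF", "C"]

for pos in positions:
    others = [p for p in positions if p != pos]
    y = teams[others].mean(axis=1)  # simple average, not weighted by attempts
    fit = linregress(teams[pos], y)
    print(f"{pos}: slope {fit.slope:.2f}, correlation {fit.rvalue:.2f}")
```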
And, it makes sense, if my logic is right (correct me if I'm wrong). Suppose you have a team where everyone has a talent of .450, but then you get a new guy on the team (player X) with a talent of .550. You're going to want him to shoot more often than the other players. For instance, if X and another guy (call him Y) are equally open for a roughly equal shot, you're going to want to give the ball to X. Even if Y is a little more open than X, you'll figure that X will still outshoot Y -- maybe not .550 to .450, but, in this situation, maybe .500 to .450. So X gets the ball more often.
But, then, the defense will concentrate a little more on X, and a little less on the .450 guys. That means X might see his percentage drop from .550 to .500, say. But the extra attention to X creates more open shots for the .450 guys, and they improve to (say) .480 each.
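Here's that story as quick arithmetic -- every shot count and percentage below is invented just to trace the logic:

```python
# Toy check of the argument above: X's raw percentage understates his
# value, because his presence lifts his teammates' percentages.

def team_pct(shots, pcts):
    """Team shooting percentage from per-player attempts and percentages."""
    return sum(s * p for s, p in zip(shots, pcts)) / sum(shots)

# Without X: five equal players at .450
print(team_pct([17] * 5, [0.450] * 5))               # 0.450

# With X: he shoots more often at .500 (down from his .550 talent),
# while the four .450 guys get easier looks and rise to .480
print(team_pct([25, 15, 15, 15, 15],
               [0.500, 0.480, 0.480, 0.480, 0.480]))  # about 0.486
```

The team improves by some 36 points, but X's own line reads only .500, and his teammates' lines absorb the rest. That's the attribution problem in a nutshell.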
Most of the new statistics simply treat FG% as if it's solely the achievement of the player taking the shot, when, it seems, it is very significantly influenced by his teammates.
------
Some of that, of course, might be that teams with good players tend to have other good players; that is, it's all correlation, and not causation. But there's evidence that's not the case, as illustrated by a recent debate on the value of Carmelo Anthony.
Last week, Nate Silver showed that if you looked at Carmelo Anthony's teammates' performance, and then looked at their performance when Anthony wasn't on their team, you saw a difference of .038 in shooting percentage. That's huge -- about 15 wins a season.
Dave Berri responded with three criticisms. First, that Silver weighted by player instead of by game; second, that Silver hadn't considered the age of the teammates (since very young players improve anyway as they get older); and, third, that if you control for age and a bunch of other things, the results aren't significantly different from zero. (However, Berri didn't post the full regression results, and did not claim that his estimate was different from .038.)
Finally, over at Basketball Prospectus, Kevin Pelton ran a similar analysis, but within games instead of between seasons (which eliminates the age problem, and a bunch of other possible confounding variables). He found a difference of .028. Not quite as high as Silver, but still pretty impressive. Furthermore, a similar analysis of all of Anthony's career shows similar improvements in team performance, which suggests the effect is real.
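The mechanics of a with/without comparison are simple. Here's a minimal sketch, assuming shot-level data for the teammates with a flag for whether Anthony was on the floor -- the column names are hypothetical, and this is the general idea, not Silver's or Pelton's exact method:

```python
# Sketch of a with/without split for one player's teammates.
# Assumes a DataFrame of teammate shots with hypothetical columns
# 'fgm', 'fg3m', 'fga', and a boolean 'on_floor' for the player.
import pandas as pd

def efg(df: pd.DataFrame) -> float:
    """Effective field goal percentage: (FGM + 0.5 * 3PM) / FGA."""
    return (df["fgm"] + 0.5 * df["fg3m"]).sum() / df["fga"].sum()

def with_without(shots: pd.DataFrame) -> float:
    """Teammates' eFG% with the player on the floor, minus without."""
    return efg(shots[shots["on_floor"]]) - efg(shots[~shots["on_floor"]])
```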
To be clear, this kind of analysis is the kind that, I'd argue, works great -- comparing the team's performance with the player and without him. What I think *doesn't* work is just using the raw shooting percentages. Because how do you know what those percentages mean? Suppose every player on one team shoots .460, and every player on another shoots .490. The .490 team presumably has more above-average players than below-average ones. But the above-average players are lifting the percentages of the below-average players, and the below-average players are reducing the percentages of the above-average players. Which are which? We have no way of telling.
Here's a hockey example. Of Luc Robitaille's eight highest-scoring NHL seasons, six came while he was a teammate of Wayne Gretzky. In 1990-91, Robitaille finished with 101 points. How much of the credit for those points do you give to Robitaille, and how much do you give to Gretzky? There's no way to tell from the single-season raw totals, is there? You have to know something about Robitaille, and Gretzky, and the rest of their careers, before you can give a decent estimate. And your estimate will be that Gretzky should get some of the credit for Robitaille's performance.
Similarly, when Carmelo Anthony increases all his teammates' shooting percentages by 30 points, *and it's the teammates that get most of that credit* ... that's a serious problem with the stat, isn't it?
------
So far, we've only found problems with two components of player performance -- rebounds and shooting percentage. However, those are the two biggest factors that go into a player's evaluation. And you could argue that the same thing applies to some of the other stats.
For instance, blocked shots: those are primarily a function of opportunity, aren't they? Some players take a lot more shots than others, so the guy who defends against Allen Iverson is going to block a lot more shots than his teammates, all else being equal.
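A toy illustration of the opportunity problem -- the numbers, and the "shots faced" data itself, are invented, since a standard box score doesn't record them:

```python
# Two equally skilled shot-blockers; only their opportunities differ.
# Raw block totals diverge even though the underlying skill is identical.
block_rate = 0.04  # both block 4% of the shots they defend (assumed)

shots_faced = {"guards_iverson": 1200, "guards_the_bench": 600}
for player, faced in shots_faced.items():
    print(player, "raw blocks:", block_rate * faced)  # 48.0 vs 24.0
```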
------
Still, it's possible that the problems aren't that big, and that, while the new statistics aren't perfect, they're still better than existing statistics. That's quite reasonable. However, I think that, given the obvious problems, the burden of proof shifts to those who maintain that the stats still work.
The one piece of evidence that I know of, with regard to that issue, is the famous study from David Lewin and Dan Rosenbaum. It's called "The Pot Calling the Kettle Black – Are NBA Statistical Models More Irrational than 'Irrational' Decision Makers?" (I wrote about it here; you can find it online here; and you can read a David Berri critique of it here.)
What Lewin and Rosenbaum did was try to predict how teams would perform in a given season, based on their players' statistics from the previous season. If the new sabermetric statistics were better evaluators of talent than, say, just points per game, they should predict better.
They didn't. Here are the authors' correlations:
0.823 -- Minutes per game
0.817 -- Points per game
0.820 -- NBA Efficiency
0.805 -- Player Efficiency Rating
0.803 -- Wins Produced
0.829 -- Alternate Win Score
As you can see, "minutes per game" -- which is probably the closest representation you can get to what the coach thinks of a player's skill -- was the second highest of all the measures. And the new stats were nothing special, although "Alternate Win Score" did come out on top. Notably, even "points per game," widely derided by most analysts, finished better than PER and Berri's "Wins Produced."
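For concreteness, the mechanics of that kind of test look something like the sketch below. This is the general approach, not Lewin and Rosenbaum's exact specification, and the files and column names are hypothetical:

```python
# Sketch of a Lewin/Rosenbaum-style test: roll players' prior-year
# metric up to a team-level projection (weighted by minutes), then
# correlate the projection with current-year team results.
import pandas as pd

players = pd.read_csv("player_metric_prior_year.csv")  # player, team, metric, minutes
teams = pd.read_csv("team_wins_current_year.csv")      # team, wins

players["weighted"] = players["metric"] * players["minutes"]
grouped = players.groupby("team")
projection = grouped["weighted"].sum() / grouped["minutes"].sum()

merged = teams.set_index("team").join(projection.rename("projection"))
print(merged["wins"].corr(merged["projection"]))
```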
When this study came out, I thought part of the problem was that the new statistics don't measure defense, but "minutes per game" does, in a roundabout way (good defensive players will be given more minutes by their coach). I still think that. But, now, I think part of the problem is that the new statistics don't properly measure offense, either. They just aren't able to do a good job of judging how much of the team's offensive performance to allocate to the individual players.
Now that I think I understand why Lewin and Rosenbaum got the results they did, I have come to agree with their conclusions. Correct me if I'm wrong, but logic and evidence seem to say that sabermetric basketball statistics simply do not work very well for players.
------
UPDATE: some commenters in the blogosphere are assuming that I mean that sabermetric research can't work for basketball at all. That's not what I mean. I'm referring here only to the "formula" type stats.
I think the "plus-minus"-type approaches, like those in the Carmelo Anthony section of the post above, are quite valid, if you have a big enough sample to be meaningful.
But, just picking up a box score or looking up standard player stats online, and trying to figure out from that which players are better than others, and by how much (the approach that "Wins Produced" and other stats take) ... well, I don't think you're ever going to be able to make that work.
UPDATE: I found a slight problem with the data: one team was missing and one team I entered twice. I've updated the post. The conclusions don't change.
For the record, the wrong slopes were .30/.39/.31/.25/.24. The corrected slopes, as above, are .30/.40/.26/.28/.27.
The wrong correlations were .59/.58/.37/.26/.40. The corrected correlations are .63/.62/.27/.27/.43.