Removing Outliers - A New Look at Kobe Bryant and Advanced Stats
Recently, I went back and looked at game logs from Kobe's prime years, namely 2000-2003 and 2005-2010. It was amazing how consistently good Kobe was on a night-to-night basis. He is often criticized for things such as never shooting above 47% from the field in a season or being inefficient. This was hard to believe after eyeballing his game logs, and then I came across the culprit: the outliers. I calculated that Kobe was prone to roughly 10 "bad" games per season in which he shot the ball very poorly, swaying his season averages quite a bit. This should be no surprise, as 82 games (fewer in his case, as he missed a few games here and there) is a small sample size and outliers do make a significant difference. Now, it's no secret that Kobe is more prone to these types of games than, say, LeBron, because of his style of play. However, these outliers have perhaps painted Kobe's image in a negative light, which is rather unfair to him.
Therefore, I went through each of his prime seasons and removed 10 outliers from every NBA player's game log to recalculate each player's season averages. Over 85% of the data is still retained in these situations (I didn't just go and remove all of the games in which Kobe shot less than 50%). I hypothesized that this is likely a better method of finding out just how good someone is, and that it shows a new perspective on Kobe in his specific case. For the outlier removal, I used a One-Class Support Vector Machine (SVM) (reference: http://scikit-learn.org/stable/modules/outlier_detection.html). The SVM was trained on every player's game logs containing box score data. Since this is an automated machine learning method, the software picked up on different "types" of outliers for each player. For example, an outlier for Kobe may be a game with a low FG%, while an outlier for a big man may be a game with few rebounds. Here are Kobe's updated season averages from 2000-2010 (with 2004 left out, as he missed too many games due to his rape case and such):
As you can see, it has a dramatic impact on Kobe's statistics. His averages with 10 outliers removed are absolutely remarkable. Bryant's FG% averages around 49% over this time period, and he maintains a beyond-elite TS% in every season as well. One season that stands out to me is 2007, in which he averaged 33.31 PPG on a 61.0% TS%; even after outlier removal, the only other player in NBA history to score at this volume and efficiency was Michael Jordan. Furthermore, after league-wide outlier removal, Kobe led the league in both PER and WS/48 in 2006 and 2007 (a feat accomplished by only a handful of players in NBA history). These results carry over to the playoffs as well. For example, Kobe's approximate PER of 30.88 in the 2009 playoffs, after outlier removal (6 games removed for each player), is the 5th highest ever in a championship run (at least 16 games).
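For readers who want to try the outlier-removal step themselves, here is a minimal sketch of how it could look with scikit-learn's OneClassSVM. It is not my exact pipeline. The assumptions: one model is fitted per player-season, the box score columns are standardized first, the `nu` parameter is set to the fraction of games being removed, and the games with the lowest decision scores are the ones dropped. The function name `drop_outlier_games` and the 80-game log are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def drop_outlier_games(box_scores, n_remove=10):
    """Drop the n_remove games a One-Class SVM scores as least typical.

    box_scores: one row per game, one column per box score stat.
    Returns (kept_rows, removed_row_indices).
    """
    X = np.asarray(box_scores, dtype=float)
    if not 0 < n_remove < len(X):
        raise ValueError("n_remove must be between 1 and len(box_scores) - 1")

    # Standardize so points, rebounds, and FG% sit on comparable scales.
    X_scaled = StandardScaler().fit_transform(X)

    # Assumption: nu (an upper bound on the share of training errors)
    # is set to the fraction of games we intend to remove.
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=n_remove / len(X))
    model.fit(X_scaled)

    # Lower decision_function values mean a game looks less typical.
    scores = model.decision_function(X_scaled)
    removed = np.sort(np.argsort(scores)[:n_remove])

    keep_mask = np.ones(len(X), dtype=bool)
    keep_mask[removed] = False
    return X[keep_mask], removed


# Made-up 80-game log; columns are PTS, REB, AST, FG%.
rng = np.random.default_rng(0)
games = np.column_stack([
    rng.normal(30.0, 6.0, 80),
    rng.normal(5.5, 2.0, 80),
    rng.normal(5.0, 2.0, 80),
    rng.normal(0.46, 0.08, 80),
])

kept, removed = drop_outlier_games(games, n_remove=10)
print("games kept:", len(kept))
print("averages before:", games.mean(axis=0).round(3))
print("averages after: ", kept.mean(axis=0).round(3))
```

Ranking by decision score and cutting a fixed number of games keeps the removal count identical for every player, which is what lets the comparison stay league-wide rather than Kobe-specific.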
Next, I wanted to see if my hypothesis was correct and if these updated advanced metrics were actually an improvement over the originals (that is, whether they correlated better with team win %). The method is to compute the minute-weighted average of WS/48 and PER for every team and regress this against winning %. This is described by Neil Paine here. He found that on a 1-year basis, WS/48's correlation with wins was 0.694 and PER's correlation with wins was 0.638. I ran the correlation test again with the outlier-removed WS/48 and PER and obtained correlations of 0.726 and 0.654, respectively. This was very exciting because the correlation rose for both statistics (by 0.032 for WS/48 and 0.016 for PER), which supports my hypothesis. It suggests that the outlier removal is not simply artificially inflating stats to make players like Kobe look better, but is doing a better job of explaining wins (and therefore player value).
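To make the correlation test concrete, here is a rough sketch of the calculation in plain Python. It assumes the test boils down to a Pearson correlation between each team's minute-weighted metric and its winning %, which is what a one-variable regression measures. The team names, minutes, WS/48 values, and win percentages are invented purely to show the mechanics, and the helper names are hypothetical.

```python
from math import sqrt


def minute_weighted_average(players):
    """players: list of (minutes, metric) pairs for one team."""
    total_minutes = sum(minutes for minutes, _ in players)
    return sum(minutes * value for minutes, value in players) / total_minutes


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented data: (minutes played, WS/48) per player, plus team win %.
teams = {
    "Team A": {"win_pct": 0.700,
               "players": [(3000, 0.210), (2500, 0.150), (1800, 0.090)]},
    "Team B": {"win_pct": 0.550,
               "players": [(2900, 0.160), (2600, 0.110), (2000, 0.080)]},
    "Team C": {"win_pct": 0.400,
               "players": [(3100, 0.120), (2400, 0.070), (1900, 0.060)]},
    "Team D": {"win_pct": 0.300,
               "players": [(2800, 0.090), (2700, 0.050), (2100, 0.040)]},
}

team_metric = [minute_weighted_average(t["players"]) for t in teams.values()]
win_pct = [t["win_pct"] for t in teams.values()]

r = pearson_r(team_metric, win_pct)
print("correlation between team WS/48 and win %:", round(r, 3))
```

Running this once with the original metrics and once with the outlier-removed versions gives the two correlations to compare.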
To conclude, advanced stats do not "hate" Kobe Bryant. When popular advanced metrics such as WS/48 and PER are improved through outlier removal, they view him in a very favorable light. In fact, after this, he is one of the very few players to lead the league in both WS/48 and PER (Kareem, Moses Malone, Larry Bird, Jordan, Shaq, T-Mac, KG, Dirk, LeBron, and KD are the only others), and he did it twice (joining only Kareem, Bird, Jordan, Shaq, and LeBron as players to do it multiple times). Further experiments would include seeing how the correlations of WS/48 and PER with wins change with respect to the number of outliers removed.
Feel free to leave your thoughts and suggestions in the comments below.