The Patterns of Success
On the night of the WNBA Draft, the usual suspects could be found in the Atlanta Dream’s War Room at the Marriott Marquis. There were coaches, Atlanta Dream staff, and media members – the sorts of people you’d expect to see.
However, there was a special set of visitors on Draft night. Some of them were members of the women’s basketball team from Emory University in Atlanta. Accompanying the players were two Emory professors from the Goizueta Business School.
Dr. Michael Lewis is an Associate Professor of Marketing and Dr. Manish Tripathi is an Assistant Professor in the Practice of Marketing. The professors were here to see the culmination of a special project – an attempt to predict the success of WNBA Draft prospects with number crunching of the highest order.
The marriage of math and sports isn’t new – Moneyball was a best-selling book and the movie based on the book made $110.2 million. Could these new predictive techniques be applied to women’s basketball? Dr. Lewis answered some of our questions, sharing his insights on the project.
Whose idea was it initially to start the project? Was it the idea of someone from Emory, or was it the idea of the Atlanta Dream?
Dr. Michael Lewis: I think this was more of an incremental / organic project than something that came from one side or another. We teach a predictive sports analytics course and we had the Dream come in to speak to the class. Following that [we had] two students (MBAs) propose doing course projects related to WNBA free agency. Phase two of bringing in some athletes to work with the Dream on the WNBA Draft came from some conversations we had with the Emory Athletic Director Tim Downs. Basically we have been discussing creating more opportunities for students to do this type of work. We reached back out to the Dream and ran with the project.
Did anyone on the project have any previous experience with statistical sports analysis?
ML: Manish and I run a blog that focuses on sports business issues (at https://scholarblogs.emory.edu/esma). We have fairly broad coverage of the business side and occasionally delve into the on-field side. Almost everything we do, from brand equity analyses to player performance predictions, is based on data and statistical analyses.
You arrived at the WNBA Draft with a group of Emory students. Were the Emory students who attended the draft members of the women's basketball team, were they students in a marketing/stats class, or were they both?
ML: They are all members of the basketball team. We hope to have them in future classes.
Who actually "did the math" on the project? Was that the students, or was that the professors?
ML: I spearheaded the statistical analyses and the students focused on using the models for prediction. The students also were instrumental in guiding the development of models. The goal, of course, is to continue to ramp up the students. We consider this phase one. We would have liked to do more work on the stats side but we ended up having to devote a lot of time to building data.
My understanding is that the art of factor analysis is used for determining the underlying structure behind a set of data. How does that differ from, say, linear regression or other types of analysis that try to find a relationship between variables?
ML: As we explored this project we found that missing data was a significant issue. In particular, the NCAA seems to do a poor job of tracking turnovers. The other issue is that, even though there is significant missing data, there is too much data as well. At a minimum there are four years of data on points, rebounds, three pointers, free throws, assists, etc. The WNBA is a relatively new league and there are only a limited number of teams – this means that the data on drafted player performance encompasses only about 120 players.
Factor analysis is a nice solution for this type of environment, as we are able to reduce the number of explanatory variables down to a manageable number. The big innovation (or assumption) is that the “factor” scores are correlated with the non-observable data.
We then used the player factor scores to predict WNBA performance. We are therefore using factor analysis as a data transformation tool but we are also using standard tools like linear regression and logistic regression for prediction.
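The two-step approach described above (reduce many college statistics to a few factor scores, then regress a pro outcome on those scores) can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic, the function name `factor_scores` is hypothetical, and the extraction uses the principal-component method on the correlation matrix with NumPy rather than the SAS procedures the project actually used. Only the linear regression step is shown; the logistic regression step is omitted.

```python
import numpy as np

def factor_scores(stats, n_factors):
    """Principal-component factor extraction (an assumed stand-in method).

    stats: array of shape (n_players, n_stats).
    Returns (scores, loadings): unit-variance factor scores and the
    loadings of each statistic on each factor.
    """
    z = (stats - stats.mean(axis=0)) / stats.std(axis=0)  # standardize columns
    corr = np.corrcoef(z, rowvar=False)                   # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)               # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_factors]         # keep the largest
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    scores = z @ eigvecs[:, order] / np.sqrt(eigvals[order])
    return scores, loadings

# Synthetic stand-in data: 120 players, 8 college statistics driven by
# 2 hidden traits. None of these numbers come from the actual project.
rng = np.random.default_rng(0)
n_players = 120
latent = rng.normal(size=(n_players, 2))
mixing = rng.normal(size=(2, 8))
college = latent @ mixing + 0.5 * rng.normal(size=(n_players, 8))
pro_metric = latent[:, 0] + 0.5 * latent[:, 1] + rng.normal(scale=0.5, size=n_players)

# Step 1: reduce eight statistics to two factor scores per player.
scores, loadings = factor_scores(college, n_factors=2)

# Step 2: ordinary least squares of the pro outcome on the factor scores.
X = np.column_stack([np.ones(n_players), scores])
coef, *_ = np.linalg.lstsq(X, pro_metric, rcond=None)
predicted = X @ coef
ss_res = ((pro_metric - predicted) ** 2).sum()
ss_tot = ((pro_metric - pro_metric.mean()) ** 2).sum()
r_squared = 1 - ss_res / ss_tot
print(f"In-sample R^2 on synthetic data: {r_squared:.2f}")
```

The appeal of this design with only about 120 drafted players is that the regression estimates a handful of coefficients rather than one per raw statistic.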
How difficult is this sort of analysis to do? Is this the kind of mathematics that can be done on an Excel spreadsheet? Or in a software application like R? What kinds of tools are used for the number-crunching part of it?
ML: The analyses were a bit beyond Excel (though there are add-in packages that can extend Excel). We used a combination of Excel for data processing and SAS for access to statistical procedures such as factor analysis and logistic regression.
How much data was obtained for the analysis?
ML: Four seasons of college data for each player. Looking at development was very important. In particular it was important to see growth in performance metrics for front court players over the four years of college.
We looked at rookie season and career performance for the professional (dependent) variables. Extending the pro data to season by season could be a phase two element as we could start to look at player development trajectories.
Is there a way to account in the data for the different circumstances of prospects, say, one player playing in a stronger conference than another, or against stronger competition? Or did some assumptions have to be simplified in working with the data?
ML: We used strength of schedule (SOS) as one of the inputs to the factor analysis. Alternatively we could have done some adjustment to stats – points, rebounds, etc. – based on SOS. I prefer to use it as an input to factor analysis rather than to make some (arbitrary) assumption about the appropriate adjustment factor.
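One simple way to quantify the growth in a college metric over four seasons, as mentioned above, is a per-player least-squares slope. The sketch below is an assumption about how such a growth measure might be built, not a description of the project's actual variables; the function name `growth_slopes` and the rebounding numbers are made up for illustration.

```python
import numpy as np

def growth_slopes(season_values):
    """Per-player least-squares slope of a metric across college seasons.

    season_values: array of shape (n_players, n_seasons), one row per player.
    Returns an array of n_players slopes (change in the metric per season).
    """
    seasons = np.arange(season_values.shape[1])
    # np.polyfit fits every column of a 2-D y at once, so pass seasons-by-players.
    slope, _intercept = np.polyfit(seasons, season_values.T, deg=1)
    return slope

# Hypothetical rebounds per game for two players over four seasons.
rebounds_per_game = np.array([
    [4.0, 5.0, 6.0, 7.0],  # steady improvement
    [6.0, 6.0, 6.0, 6.0],  # no change
])
slopes = growth_slopes(rebounds_per_game)
print(slopes)
```

A slope like this could then sit alongside the raw season averages as one more input to the analysis.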
I understand that John Hollinger's PER (Player Efficiency Rating) was used as a foundation point for the analysis. Given that there are all sorts of basketball evaluation metrics out there, what was the appeal of using PER?
ML: This seemed to be a useful, common and highly critiqued metric. The selection to begin with PER was fairly arbitrary. Given the lack of consensus about the best metric or even transference to the women’s game, our strategy was to rely on multiple metrics.
Was there a comprehensive attempt to rank all of the draft prospects? Could one rank all of the draft prospects from the results of this kind of analysis?
ML: The Dream gave us a list of about thirty prospects. We ranked them all. The prospect scoring is easy. We did a second analysis that was based on comparables. In this analysis we identified the most similar prospect to established WNBA players. We used a slightly different factor analysis model and compared prospects to the college data for the established players.
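A comparables analysis of this kind can be sketched as a nearest-neighbor search in factor-score space. The sketch assumes Euclidean distance as the similarity measure, which the interview does not specify; the function name `most_similar`, the player labels and the scores are all hypothetical.

```python
import numpy as np

def most_similar(prospect_scores, veteran_scores, veteran_names):
    """For each prospect, find the closest established player.

    Both score arrays have shape (n_players, n_factors) and are assumed to
    come from the same factor model fitted on college data.
    Returns a list of (veteran_name, distance) pairs, one per prospect.
    """
    # Pairwise differences, shape (n_prospects, n_veterans, n_factors).
    diffs = prospect_scores[:, None, :] - veteran_scores[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    nearest = dists.argmin(axis=1)
    return [(veteran_names[j], float(dists[i, j])) for i, j in enumerate(nearest)]

# Made-up factor scores on two factors (for example "power" and "skill").
veteran_names = ["Veteran A", "Veteran B", "Veteran C"]
veteran_scores = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
prospect_scores = np.array([[0.9, 0.1], [-0.8, -1.2]])

matches = most_similar(prospect_scores, veteran_scores, veteran_names)
for i, (name, dist) in enumerate(matches):
    print(f"Prospect {i + 1}: closest comparable is {name} (distance {dist:.3f})")
```

Because factor scores are standardized, no single statistic dominates the distance, though a different metric or weighting could be just as defensible.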
My understanding is that the results conclude that the rebounding ability of guards is undervalued. Are there any other interesting conclusions to be drawn from the data?
ML: We did find that guards who rebound better were more successful pros. It’s probably more accurate to say that backcourt players who have more of a “power factor” – blocks, rebounds, etc. – in their game translate better to the next level. What I would speculate is that they are stronger players who are more able to adapt to the bigger, stronger next level.
We did find evidence that front court players who score better on the “skill factor”, made up of assists, passing and similar elements, translate better to the pros.
The story is really the same for both the front court and back court. It’s having that something extra compared to their college peers. Most guards are skilled so strength is the predictor. Most front court players are physically strong so it’s the skills that differentiate.
Is this work proprietary? Does the Dream own it, or can anyone use it?
ML: No. Not proprietary. I’m thinking about writing it up as an academic paper. This was very much a rush job so we need some time to reflect, extend and clean it up.
Will there be an attempt to expand on this work in the future, or was this a one-time project?
ML: My personal feeling is that this could be a great ongoing partnership for the Dream and Emory. It’s really up to the students and the Dream. I’m happy to facilitate.
I truly hope it does continue. I think this was a special experience for the students. I also think that the Dream can get a great deal of future value. The thing about these projects is that there is a lot of upfront work related to data gathering. We’ve done the hard work, so it gets easy from here.