My Model Monday: NBA Draft Scouting Text Analysis

With March Madness just wrapping up, the natural next step in the basketball calendar is to turn to the NBA Draft. At Model 284, we have created a number of different models projecting college basketball players in the NBA. These include our Peak NBA Statline Projection (PNSP), which attempts to predict overall NBA ability on a scale of 0-100; Similarity Scores, which capture style of play; and a Role Probability Model, which assigns a probability that each player ends up in a certain role in the NBA (All-Star, Starter/Sixth Man, Bench, or out of the league). One idea that has come up repeatedly in our draft analysis is incorporating scouting or subjective analysis into our draft models. Rather than using a rubric of someone’s scouting grades, I wanted to do a text analysis of scouting reports already written. Since DraftExpress recently folded and moved to the dark side with ESPN Insider (probably a smart business decision), I used

Using scouting reports, I did a text analysis on NBA Draft prospects. What is nice about the website and its scouting articles is that the reports stay fairly consistent from player to player, with a paragraph or two each on strengths and weaknesses (as seen below).
For this My Model Monday, I started small, using the 2012 NBA Draft class. Why the 2012 NBA Draft class? Well, no reason other than that I chose a random number from the vector c(2008:2013) and 2012 came out. Eventually, I’d like to explore more draft classes, but I kept it to one class for the initial analysis. Sticking to one draft class is beneficial because scouting reports change over the years, and because a subset can be a good starting point in any statistical analysis.

Alrighty, let’s roll! First, I started by looking at the most common words in the 2012 NBA Draft player profiles as a whole. Below is a word cloud of the top 100 most frequently appearing words. Size, location, and color are based on frequency (i.e., the more frequent words are bigger, closer to the center, and colored gray/blue). Thus, in the plot below, the five most frequent words in the 2012 NBA Draft class scouting reports are: GOOD (in honor of Jocko Willink), can, ball, ability, and will.
If you dislike word clouds, then you are probably no fun, but since some people do, here is a bar graph of the top 10 most frequently occurring words across the roughly 60 scouting reports for the 2012 NBA Draft.
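For readers who want to reproduce the word counts behind a plot like this, here is a minimal Python sketch. It is not the code used for this post; the stop-word list, the `top_words` helper, and the sample snippets are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is",
              "he", "his", "with", "as", "at", "for", "on"}

def top_words(reports, n=10):
    """Return the n most frequent non-stop words across all reports."""
    counts = Counter()
    for text in reports:
        # Lowercase the text and keep alphabetic tokens (apostrophes allowed).
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)

# Made-up snippets standing in for real scouting reports.
sample_reports = [
    "Good length and good ability to handle the ball.",
    "He can shoot the ball and will defend.",
]

for word, count in top_words(sample_reports, n=3):
    print(word, count)
```

The resulting (word, count) pairs are what a word cloud or bar graph would then visualize.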

Breaking these profiles down to a single prospect, below is top pick Anthony Davis’s word cloud, which looks very lengthy.
“Length” (did you catch my pun?), “athleticism”, “body”, and “agility” are all among the top 10 most frequent words in Anthony Davis’s scouting report. This makes sense given Davis’s freakish physical tools, evident in his pairing of a 7’6″ wingspan with a guard-like body. “Defensive” is another frequently occurring word, and defense is another trait Davis is known for, given his record-setting freshman block numbers.

Interestingly, some prospects have words show up that generally carry a negative connotation, such as “doesn’t” or “can’t.” This brings us to the next part of our text analysis: sentiment analysis. Sentiment analysis is the task of detecting the writer’s feeling from the text written. Using it, we can determine whether a scouting report is generally more positive or negative, as well as the degree of positivity or negativity. University of Illinois at Chicago professor Bing Liu has created a bank of words considered positive and, similarly, a bank of words considered negative. Below is an example of some of the words associated with each.

I started by applying our sentiment analysis to superstar Anthony Davis’s scouting report. The sentiment score is equal to the number of positive sentiment words less the number of negative sentiment words.

Interestingly, Anthony Davis grades out negatively by the Bing sentiment analysis. How does the rest of the 2012 NBA Draft class look? The table below shows the top 10 sentiment scores for the 2012 NBA Draft class:
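As a rough illustration of the scoring rule (positive words minus negative words), here is a small Python sketch. The two word sets are hand-picked placeholders, NOT the actual Bing Liu lexicon, and `sentiment_score` is a hypothetical helper rather than the code used for this post.

```python
import re

# Placeholder word lists for illustration only; the real lexicon is far larger.
POSITIVE = {"good", "great", "excellent", "strong", "elite"}
NEGATIVE = {"poor", "weak", "struggles", "lacks", "raw"}

def sentiment_score(text):
    """Count of positive words minus count of negative words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(t in POSITIVE for t in tokens)
    negatives = sum(t in NEGATIVE for t in tokens)
    return positives - negatives

# Made-up sentence, not taken from a real scouting report.
report = "Great length and strong instincts, but lacks a post game."
print(sentiment_score(report))  # 2 positive words - 1 negative word = 1
```

Note that a word-by-word count like this ignores context, so a phrase such as "not good" would still register as positive.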
At first glance, there does not appear to be a relationship between sentiment score and draft pick number, but what about NBA performance? Is more positive writing in prospect reports predictive of NBA success, or vice versa? To explore this, I looked at the correlation between sentiment score and a couple of NBA statistics, namely Games Played and Box Plus Minus (plotted below).
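For anyone following along, a correlation like this can be computed with a few lines of Python. This sketch assumes the Pearson correlation coefficient (the post does not say which measure was used), and the numbers are invented placeholders, not real sentiment scores or NBA statistics.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Invented values for illustration only.
sentiment_scores = [4, -2, 7, 0, 3]
games_played = [310, 420, 150, 280, 390]
print(round(pearson(sentiment_scores, games_played), 2))
```

Values near 0 indicate little linear relationship, while values near 1 or -1 indicate a strong one.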

There does not appear to be much of a relationship between Sentiment Score and either Box Plus Minus (correlation = -0.19) or Games Played (correlation = -0.02). As you may imagine, the number of words differs from one scouting report to the next, so looking at the sentiment score rate rather than the total sentiment score might change the correlation.
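One plausible way to define a sentiment score rate is the sentiment score divided by the report's total word count. That definition is an assumption made for this sketch, and `sentiment_rate` is a hypothetical helper.

```python
import re

def sentiment_rate(score, text):
    """Sentiment score per word; returns 0.0 for an empty report."""
    n_words = len(re.findall(r"[a-z']+", text.lower()))
    return score / n_words if n_words else 0.0

# Made-up example: a score of 2 on a four-word "report".
print(sentiment_rate(2, "one two three four"))
```

Normalizing this way keeps a long report from scoring higher (or lower) simply because it contains more words.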

Unfortunately, there still does not appear to be much of a relationship between our measurement of sentiment score rate and Games Played (correlation = -0.11). Sentiment score rate and career Box Plus Minus are not strongly correlated either (correlation = -0.14). Does this mean the scouting reports are useless? Absolutely not! In fact, there are still many more areas to explore in this text analysis.

For example, first we probably want to extend our sample of scouting reports. Next, our Bing measurements of sentiment score might not be optimal for NBA scouting reports, so we could build our own lexicon of positive and negative words. Furthermore, associations or pairings of words might hold some interesting information, because looking at single words obviously doesn’t paint the whole picture. Lastly, as noted with the specific words appearing in Anthony Davis’s report, we could explore the correlation between certain types of words that might help explain traits.

One example would be defense, which is something we believe our PNSP model is not fully capturing. Our PNSP model rated Anthony Davis as only a good prospect rather than an elite one, likely because we were not fully capturing his physical tools. We saw that Anthony Davis’s scouting report had a number of words associated with athleticism and defensive ability. Does having a high number of those words predict NBA defensive performance? Maybe, but at any rate, we could go on forever with these questions. While the initial analysis leaves us a bit empty-handed, there is still room for more exploration through text analysis, so be sure to stay up-to-date on everything Model 284 so you don’t miss out!
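As a small taste of the word-pairing idea mentioned above, here is a minimal Python sketch that counts adjacent word pairs (bigrams). The `bigram_counts` helper and the sample sentence are made up for illustration; this is one possible starting point, not a finished method.

```python
import re
from collections import Counter

def bigram_counts(text):
    """Count each pair of adjacent words in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(tokens, tokens[1:]))

# Made-up sentence, not taken from a real scouting report.
sample = "Not a good shooter, not a good passer."
for pair, count in bigram_counts(sample).most_common(2):
    print(pair, count)
```

Pairs like these would let a follow-up analysis treat "not good" differently from "good" on its own.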