In sports, people love to categorize players by their playing style. In hockey, for example, defensemen are usually labeled offensive or defensive, with the occasional all-around defenseman. In this week’s installment of My Model Monday, I look to create mathematical groupings of NHL defensemen using 2017-2018 NHL data.

### Data

All data was pulled from Manny Perry’s website, Corsica.hockey. The following statistics for NHL defensemen from the 2017-2018 NHL season were used in this clustering analysis:

- Goals Scored (5 on 5)
- Primary Assists (5 on 5)
- Secondary Assists (5 on 5)
- PowerPlay Goals
- PowerPlay Primary Assists
- PowerPlay Secondary Assists
- Hits Per 60
- Individual Penalties Taken per 60
- Individual Penalties Drawn per 60
- Blocked Shots per 60
- Takeaways per 60
- Giveaways per 60
- Corsi %

### Clustering Analysis

A good first step in a clustering analysis is to simply plot the data on a 2-D grid. There are a few ways to do this; the one used here is Principal Component Analysis (PCA). In short, PCA boils the above list of statistics down into two “Principal Components” that can be viewed on a 2-D grid. The resulting grid is shown below:

Right away, we see a couple of interesting outliers. In the top right corner is Mark Borowiecki of the Ottawa Senators. According to Corsica.hockey, his Hits For per 60 is 17.24, which places him first among defensemen by a wide margin (2nd is Alex Biega at 12.69). Brendan Smith and Kurtis MacDermid (top mid-right) are both in the top ten for Penalties Taken per 60 and Giveaways per 60 (neither of which you probably want to be in the top ten for). The left corner has some names that come up in the Norris Trophy conversation each year, such as Erik Karlsson, Brent Burns, and John Klingberg.
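For readers who want to reproduce this kind of projection, here is a minimal sketch of PCA down to two components using NumPy. The post does not say which tool was used or whether the statistics were standardized first, so the scaling step, the function name `pca_2d`, and the random placeholder data are all assumptions for illustration, not the original workflow.

```python
import numpy as np

def pca_2d(stats):
    """Project the rows of `stats` onto their first two principal components."""
    # Assumption: standardize each column so that raw counts, per-60 rates,
    # and percentages contribute on a comparable scale.
    z = (stats - stats.mean(axis=0)) / stats.std(axis=0)
    # SVD of the standardized matrix; the rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt[:2].T                    # coordinates on PC1 and PC2
    explained = (s ** 2) / np.sum(s ** 2)    # share of variance per component
    return scores, explained[:2]

# Placeholder data, NOT real NHL numbers: 200 players x 13 statistics.
rng = np.random.default_rng(0)
stats = rng.normal(size=(200, 13))

scores, explained = pca_2d(stats)
print(scores.shape)  # (200, 2): one (PC1, PC2) point per player
```

Each row of `scores` is one player's position on the 2-D grid, and `explained` indicates how much of the total variation the two axes retain.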

Now that we’ve gotten a feel for the data, let’s start clustering. For this analysis, I used K-means clustering. K-means (not to be confused with K Nearest Neighbors, which is a supervised method) partitions all defensemen into k clusters, assigning each player to the cluster whose mean is nearest. Applying this to the NHL defensemen data, I settled on four clusters. Here are some players that fell into each category:

Below I have colored our principal component plot from above by the four clustering groups.
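To make the K-means step concrete, here is a small from-scratch sketch of the standard assign-then-update loop (Lloyd's algorithm) in NumPy. It is not the code behind this post: the helper names, the random initialization, and the placeholder points are assumptions for illustration, and in practice a library routine such as scikit-learn's `KMeans` would be the usual choice.

```python
import numpy as np

def nearest_centroid(points, centroids):
    """Index of the closest centroid (Euclidean distance) for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct observations chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_centroid(points, centroids)
        # Move each centroid to the mean of its members; an empty cluster
        # keeps its previous centroid.
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return nearest_centroid(points, centroids), centroids

# Placeholder 2-D points, NOT real NHL data: four loose groups of 50 "players".
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

labels, centroids = kmeans(points, k=4)
print(np.bincount(labels, minlength=4))  # number of players in each cluster
```

The `labels` array is what would be used to color each player's point on the principal component plot.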

We appear to have four nicely separated groups. Adding more clusters made each cluster of defensemen harder to interpret, so despite some small overlap, four clusters separate the groups adequately. We can dig deeper into these clusters in the table below, which shows the average of each statistic by cluster.

*For interpretation, blue represents a higher value and red a lower value for a given statistic. Higher does not necessarily mean better: Penalties Taken could be read either way, while for Giveaways a lower number is generally better.*
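A per-cluster table of averages like this one can be computed by grouping the rows by cluster label and taking the mean of each column. The sketch below uses a tiny made-up example with two hypothetical columns; the function name `cluster_means` and the numbers are illustrative assumptions, not values from the actual table.

```python
import numpy as np

def cluster_means(stats, labels):
    """Return {cluster_id: per-column mean} for each cluster present in `labels`."""
    return {int(c): stats[labels == c].mean(axis=0) for c in np.unique(labels)}

# Made-up example, NOT real NHL data: columns are hits/60 and blocks/60.
stats = np.array([
    [2.0, 5.0],
    [4.0, 7.0],
    [10.0, 1.0],
    [12.0, 3.0],
])
labels = np.array([0, 0, 1, 1])

means = cluster_means(stats, labels)
# Cluster 0 averages 3.0 hits/60 and 6.0 blocks/60;
# cluster 1 averages 11.0 hits/60 and 2.0 blocks/60.
for cluster_id, row in means.items():
    print(cluster_id, row)
```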

Cluster 2 is at the top of the list in nearly every statistical category, making this group our **elite defensemen** category. While Cluster 3 has Power Play point production similar to Cluster 2, its ratios of Giveaways to Takeaways, Assists to Goals, and Power Play Points to 5v5 Points make this group more of a **playmaking defensemen** category than a full-blown elite, all-around group. It should be noted that these two clusters appear to have the most overlap on the PC plot. Next, Cluster 4 is pretty clearly the **enforcer group**, with significantly higher average Penalties Taken and Hits. Lastly, Cluster 1 appears to be the steady, stay-at-home **defensive defensemen**, with limited point production, low penalties taken, and minimal giveaways, along with the highest blocks per 60 on average.

Below is an updated table with our new assigned category names:

### Future Work

Now that we have created a mathematical way to bucket defensemen, the next questions I ask myself are: can we break down some of these clusters even further? And what combinations of these defensemen are optimal for winning games? Do you want all Cluster 1 players, if possible, or is there value in other categories beyond saving money? Do different combinations of types of defensemen play better together?