All posts by Jack Werner

My Model Monday: Curling Win Probability Model

It seems like every four years when the Winter Olympics come around, curling has a moment. This year’s Pyeongchang games are no different. Curling gets a ton of love online, for reasons both ironic (its shuffleboard-on-ice aesthetic and inherent meme-ability) and non-ironic (its simple-enough rules and interesting strategy). Sure, it’s a little goofy, but if you’ve found yourself getting into the sport within the past week or so, you’re not alone.

My Model Monday: Baseball Names from Z to A

When I heard I was scheduled for the second-ever installment of My Model Monday, I felt a heavy responsibility on my shoulders. I’m very excited about the opportunity to bring regular shorter-form analysis to Model 284, but I also knew that in week two, this series would still be finding its footing. I needed to choose my topic with care. I needed a subject that was interesting, important, and relevant. Something worthy of the short but solid history of analysis we have here at Model 284.

So I chose to write about Engelb Vielma.

What’s that? You haven’t heard of my favorite minor-league shortstop? The former Twins prospect (until he was waived and claimed by the Giants last September)? Alright, let me explain.

Any old prospect can wow you with unique skills, unicorn-like physique, or all-star potential. Sure, a guy may hit moonshots or mow down a ton of opposing hitters. Nick Gordon? Fernando Romero? A dime a dozen. But rare is the prospect who introduces you to a new hobby. Engelb Vielma did this for me; ever since I heard about him, I haven’t been able to stop thinking about name reversibility.

Here’s what I mean: say you’d never heard of Vielma, and I told you the Twins had a new player whose name was either “Engelb” or “Blegne” (Engelb backwards). Which name would you sooner believe? It’s at best a toss-up, right? I think I’d even be tempted to pick Blegne. After all, it could be a creative spelling of “Blaine”, like the male equivalent of “Aimee” or “Khloe.”

Reversible names. Exhilarating stuff!

But I couldn’t just stop at Engelb. Were there other baseball players out there with reversible names who I’d either never heard of or overlooked? This became very important to me—though not quite important enough to check every name by hand. No, instead it was time for some of the hard-hitting analysis we love here on this site.

Ygolodohtem

I came up with a way to give every name a score: higher for a name that seems more natural forward than backward, and lower for the opposite. Here’s the intuition behind it. Why does “Blegne” look more natural than “Engelb”? For starters, you don’t see many names that end with the letter B, but a bunch of names start with it. In the same vein, “Bl” seems like a natural combination, but “lb,” not so much. If I could look at each letter-pair and compare how common it is forward versus backward, I could get a good sense of a name’s reversibility from these frequencies. In the case of Engelb, the score would gauge how common “ge” is compared to “eg,” and how often names end with “b” rather than start with it. The overall score would be the average of these individual scores.

Now for the nitty-gritty details (feel free to jump to the next section if you want – I’d be offended, but just don’t tell me). To gauge letter-pair frequencies, I downloaded the Social Security Administration’s baby name data. Using the names from 1950 to 2016, I calculated the frequency of each letter pair (each time it appeared in a name, times the number of times that name was used). I also calculated the frequency of each pair’s reverse. A letter-pair’s score was roughly the frequency of that letter pair divided by the total number of times those letters appeared together in either order. I also included a normalizing factor and scaled it from -1 to 1. Here’s the formula:

\displaystyle Score = 2 \ast \frac{freq_{pair} + 5}{freq_{pair} + freq_{reverse} + 10} - 1

A quick example: the pair “mi” has appeared 11,088,958 times in baby names between 1950 and today. (Almost four million of those were from the name Michael alone!). The pair “im,” on the other hand, appeared 3,234,352 times. The score for “mi” is:

\displaystyle Score = 2 \ast \frac{11,088,958 + 5}{11,088,958 + 3,234,352 + 10} - 1 = 2 \ast \frac{11,088,963}{14,323,320} - 1 \approx 0.55
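
If you’d rather check the arithmetic in code, here is a minimal Python sketch of the pair formula above, applied to the “mi” and “im” counts. The function name `pair_score` and the `smoothing` parameter are my own labels for illustration, not anything from the original analysis; the 5 and 10 in the formula are written as `smoothing` and `2 * smoothing`.

```python
def pair_score(freq_pair: int, freq_reverse: int, smoothing: int = 5) -> float:
    """Score a letter pair on a -1 to 1 scale.

    Close to 1: the pair is far more common than its reverse.
    Close to -1: the reverse is far more common.
    The smoothing term (5 on top, 10 on the bottom in the formula)
    pulls rarely seen pairs toward 0.
    """
    return 2 * (freq_pair + smoothing) / (freq_pair + freq_reverse + 2 * smoothing) - 1


# Counts quoted in the text for "mi" and "im" (1950 onward).
mi = pair_score(11_088_958, 3_234_352)
im = pair_score(3_234_352, 11_088_958)

print(round(mi, 2))  # 0.55
print(round(im, 2))  # -0.55
```

One nice property of the formula: a pair and its reverse always get scores that are exact opposites, and a pair never seen in either direction scores 0.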

The main thing you need to know: right-side forward pairs should be close to one, backward pairs should be close to -1, and toss-ups are around 0. Average all the pair scores, and you get the complete name reversibility score.
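
For the curious, the whole pipeline can be sketched in a few lines of Python. This is a rough reconstruction under stated assumptions, not the exact code behind the results: the `_` boundary marker (used so that “starts with b” and “ends with b” become the pairs `_b` and `b_`), the function names, and the tiny `toy_counts` table are all made up for illustration. The real analysis used the full Social Security Administration counts.

```python
from collections import Counter

BOUNDARY = "_"  # hypothetical marker for the start and end of a name


def letter_pairs(name):
    """Split a name into adjacent letter pairs, including the boundaries."""
    padded = BOUNDARY + name.lower() + BOUNDARY
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def pair_frequencies(name_counts):
    """Count each pair once per appearance, weighted by how often the name was used."""
    freqs = Counter()
    for name, count in name_counts.items():
        for pair in letter_pairs(name):
            freqs[pair] += count
    return freqs


def pair_score(freq_pair, freq_reverse):
    """The pair formula from the text, scaled from -1 to 1."""
    return 2 * (freq_pair + 5) / (freq_pair + freq_reverse + 10) - 1


def name_score(name, freqs):
    """Average the pair scores across the whole name."""
    scores = [pair_score(freqs[p], freqs[p[::-1]]) for p in letter_pairs(name)]
    return sum(scores) / len(scores)


# Made-up counts, only to show the mechanics (not real SSA data).
toy_counts = {"Blaine": 100, "Blake": 300, "Leon": 200, "Noel": 50}
freqs = pair_frequencies(toy_counts)

for name in ["Leon", "Noel", "Engelb", "Blegne"]:
    print(name, round(name_score(name, freqs), 2))
```

With these toy counts, “Leon” comes out positive and “Noel” negative simply because the made-up table has more Leons than Noels. A name and its reverse always get opposite scores, since every pair in one is the mirror image of a pair in the other.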

Stluser

I used this technique to score the first name of every single player on an MLB 40-man roster as of January 13. Any name with a score around zero is a candidate for reversibility, and any with a negative score should look better backwards than forwards, at least in theory. Out of 530 unique names, 64 were negative. Here are some highlights.

  • AJ (-.76, 1st most reversible): The lowest score, but initials kind of feel like cheating.
  • Nik (-.48, 2nd): I’m sure Nik Turley has had his C-less name spelled wrong too often to count, but it’s all worth it now that he’s gotten the second spot on our list.
  • Noel (-.28, 7th): An encouraging test case for our score, since Leon is itself a name.
  • Nomar (-.24, 9th): Ditto.
  • Boog (-.17, 17th): MLB has counted two unrelated Boog Powells among its ranks, which is fitting for such a perfect baseball name. But if you ask me, “Goob” would be even better.
  • Socrates (-.08, 36th): If our scoring system is confused by this name, it clearly hasn’t been reading its Western canon. But Setarcos does roll off the tongue…
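
Picking out the candidates is just a filter and a sort. Here is a small Python sketch that uses only the six scores quoted in the list above (the full set of 530 names isn’t reproduced here); `reverse_name` is a helper name I made up for illustration.

```python
# Scores copied from the highlights above; the other names are not included.
highlights = {
    "AJ": -0.76,
    "Nik": -0.48,
    "Noel": -0.28,
    "Nomar": -0.24,
    "Boog": -0.17,
    "Socrates": -0.08,
}


def reverse_name(name):
    """Spell a name backwards and fix the capitalization."""
    return name[::-1].capitalize()


# Keep names with negative scores, most reversible (lowest score) first.
candidates = sorted(
    (name for name, score in highlights.items() if score < 0),
    key=highlights.get,
)

for name in candidates:
    print(f"{name:>8} -> {reverse_name(name)} ({highlights[name]:+.2f})")
```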

And here’s my reversibility dream team. All of these players have names that I think look great backwards. All have scores under 0.15. All except Keuchel, Alonso, Suarez, and Herrera have negative scores.

  • P: Sallad (Keuchel)
  • C: Reiday (Molina)
  • 1B: Rednoy (Alonso)
  • 2B: Denguor (Odor)
  • 3B: Oinegue (Suarez)
  • SS: Laburdsa (Cabrera)
  • LF: Sineoy (Cespedes)
  • CF: Lebudo (Herrera)
  • RF: Leisay (Puig)

Noisulcnoc

What can we take away from this very important investigation? Three things. First, apparently MLB teams don’t value name reversibility like I do, because as I was writing this, the Pirates designated our dear Engelb for assignment. Second, apparently no problem is too insignificant or stupid for me to waste too many hours tackling on a Saturday. And finally, we need a major leaguer named Goob. At least make it someone’s nickname. I need this!

NBA Lineup Evaluator: Spacing (2016-2017)

Below is a table of our NBA lineup spacing metric applied to all NBA lineups that played more than 50 minutes together in the 2016-2017 season. The spacing metric seeks to quantify a lineup’s ability to generate and score from efficient shots (i.e., at the rim and from the three-point line). For complete methodology behind the calculation, see here.


NBA Lineup Evaluator: Spacing

Preface

First and foremost, I regret to inform you that this analysis is NOT done with player tracking data.

Second, although this lineup metric is called spacing, it is not really a measure of spacing; it is more a measure of how capable a lineup is of producing efficient shots. So why call it spacing? Partly because spacing is catchy and trendy, but also because we believe that when the average fan hears the word spacing, they are generally thinking about maximizing the most efficient shots in basketball: three-pointers and shots at the rim.


NBA Lineup Evaluator: Diversity

In most sports and at most skill levels, if you are unpredictable in your movements and actions you will have a better chance at being successful; you’ll have a better chance of beating your defender if he doesn’t know what you’re going to do. Granted, at the end of the day, high-performance level always wins out, but one can give themselves a better chance of winning a battle by being unpredictable or diverse.


Components Methodology: NBA Lineup Evaluator

This article details the methodology and calculations of the components found on our NBA Lineup Evaluator. Each component represents a different skill or ability an NBA lineup could have. We can use these to assess the strengths and weaknesses of NBA lineups that have yet to play together, or that haven’t played enough minutes to accurately evaluate their performance. The training data comes from NBA lineups from 2015-2017 that played at least 50 minutes together. All data comes from either NBA.com or Basketball-Reference.com.


Wait, why does PML favor the Astros?

After four games, this World Series is tied at two games apiece and shaping up for an exciting finish! According to our PML model, Houston has a 53% chance of winning it all. This might come as a surprise; the majority of other prediction models, like FiveThirtyEight’s Elo ratings, give Los Angeles a slight 56% edge—not surprising, given the Dodgers play two of the three games at home. In contrast, PML favors the away team in every game from here on out. Ultimately, the difference between the 44% chance FiveThirtyEight gives the Astros and our 53% is not huge. Whichever you prefer, the Series is a toss-up. But it’s useful to dig into why we’re a bit higher on the Astros, and in the process, get to know our PML model a little better.
