Besting the Bookie: Predicting NFL Games to Bet On

I hope if you’re reading this you are aware of the explosion of the sports gambling industry. If not, just turn on ESPN or FoxSports1 and you can see whole shows dedicated to gambling, a wide variety of podcasts discussing the topic, and a seemingly endless number of Twitter accounts handing out picks. In 2018, the Supreme Court ruled that a 1992 federal law, the Professional and Amateur Sports Protection Act, was unconstitutional because it violated the 10th Amendment, which opened the door for states to legalize and license sports betting. Since then a handful of states have legalized sports gambling and have reaped the rewards, especially in the age of the pandemic. Once sports resumed in mid to late summer, demand surged and helped keep the industry leaders afloat until regular brick-and-mortar operations could resume. In January of this year, sports-related betting (yes, e-gaming is widely bet on) was up almost 3x, showing the robustness of the industry and how much money there is to be made in it.

The NFL in particular is the driver of this growth. It is by far the most bet-on sport across the world and in the US, bringing in $100B and $4.3B respectively. So how can we get in on the action and make some money watching the NFL? The answer is an Elo rating system. If you are unfamiliar with how the system works, I highly recommend heading over to FiveThirtyEight and checking out their article describing their development cycle. It was a foundational piece for developing my own rating system, and they do a great job explaining each component. Once you’ve done that, the process shown below will make more sense.

There are really 2 key areas where our creativity, intuition, and domain knowledge will come into play and have the biggest impact on our model’s performance: the adjustments we choose to make to the ratings, and how we decide when to place a wager. The first involves using our football knowledge to make informed decisions about what information to include, limiting ourselves to what is known before kickoff so that we avoid data leakage and don’t overestimate our performance. The second is much more subjective. We can either assume we bet on everything and tune our ratings to maximize returns, or we can leave the ratings alone and optimize the selection process that decides which of the model’s predictions are worth a wager. Since volume betting is a notorious trap, we will go with the latter. This allows us to focus on how well the model predicts outcomes, without fine-tuning or eliminating features just because we are losing “at the window”.

FiveThirtyEight’s model explanation is accompanied by code that lets you play in their yearly Elo ratings game to see how well you think you know the NFL when it comes to picking games. It also gives a baseline model to build from that includes initial ratings (think of them as seeds from which our own ratings will grow) for each team and three adjustments to play around with:

· Home field advantage: a constant 55 Elo points added for the home team. We will work to incorporate travel and adjust home field for low attendance, less enthused fan bases, and cities with many transplants from more passionate fan bases (e.g., Los Angeles)

· Scaling factor K: a constant 20. To avoid the plague of autocorrelation, we will adjust this factor using margin of victory and how close our predicted outcome was to the actual result, the latter being one of our evaluation metrics

· New season regression ratio: every new season, in order to account for drafts, free agent signings, retirements, and coaching changes, FiveThirtyEight reverts each team’s final rating 1/3 of the way to the mean, so a team keeps 2/3 of its distance from average
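To make the three baseline knobs concrete, here is a minimal sketch of a plain Elo loop. It is not FiveThirtyEight’s code: the function names, the 400-point logistic scale (the conventional Elo choice), and the league-average rating of 1505 are my assumptions, and the reversion fraction is left as an argument so you can plug in whichever ratio you use. The margin-of-victory adjustment to K is deliberately left out.

```python
HFA = 55.0     # home-field advantage in Elo points (value quoted above)
K = 20.0       # scaling factor (value quoted above)
MEAN = 1505.0  # assumed league-average rating, for illustration only


def win_probability(home_elo, away_elo, hfa=HFA):
    """Probability that the home team wins, using the standard Elo logistic curve."""
    diff = home_elo + hfa - away_elo
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def update(home_elo, away_elo, home_won, k=K, hfa=HFA):
    """Return both teams' post-game ratings; home_won is 1 for a win, 0 for a loss."""
    p_home = win_probability(home_elo, away_elo, hfa)
    shift = k * (home_won - p_home)  # bigger surprise -> bigger rating change
    return home_elo + shift, away_elo - shift


def new_season(elo, revert, mean=MEAN):
    """Move a rating the fraction `revert` of the way back toward the mean."""
    return elo + revert * (mean - elo)


if __name__ == "__main__":
    home, away = update(1600.0, 1500.0, home_won=0)  # hypothetical upset
    print(round(home, 1), round(away, 1))
    print(new_season(home, revert=1 / 3))
```

The update is zero-sum: whatever the winner gains, the loser gives up, which is why the league average stays put between seasons.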

In order to produce new ratings we will need to gather relevant data. I scraped every NFL game from and put this data into several SQL databases so I could query and aggregate the relevant situational and secondary stats that can be used as features in our model. Our ratings will incorporate the following adjustments:

· Rest: is a team coming off of a bye week? These teams have a chance to get more players back from injury and prepare more for the next opponent, whereas a team that is not has less than a week to prepare and potentially travel (another negative factor)

· QB value: quarterback is the most important position on the field, and even Vegas admits (through line changes) that it has the biggest impact on the outcome of a game. We will heavily adjust our ratings based on a QB’s value, which will depend on stats such as completions, yards, third down efficiency, and QB rating against similar defenses

· Primetime games: we’ve all heard it before that some people shrink and some shine when the lights are brightest, so we will adjust ratings based on previous performances in primetime games, which include Thursday night, Sunday night, and Monday night football since they are all stand-alone games that are nationally televised

· Divisional games: these are the opponents a team is guaranteed to see multiple times every year and will most directly impact their chances of making the playoffs since every division winner is guaranteed a playoff spot. These games are also shown to be closer and more competitive as we should expect. The more you see and learn from your opponent the more you can prepare to slow them down or attack their weak spots

· Vegas spread: because 25 Elo points is equivalent to 1 point in the NFL, we can convert the Elo difference between any 2 teams into an expected spread. Comparing this result with the Vegas line can show us market inefficiencies as well as inefficiencies in our model. If our expected spread is closer to the actual outcome, we can be confident in our ratings and consider placing a wager on that particular game. If the Vegas spread is much closer, then we can use it to adjust our ratings by back-calculating the difference

· Early down success rate: Warren Sharp defines this as the share of plays on first and second down that gain at least 6 yards for pass plays and 4 yards for run plays. He has argued loudly online that this is the factor most correlated with wins and losses in the NFL, and that getting the lead quickly strongly shapes the end result
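Two of the adjustments above reduce to simple arithmetic, so here is a rough sketch of both. The function names, the play tuple layout `(down, play_type, yards_gained)`, and the sign convention (a home line of -3.5 means the home team is favored by 3.5) are assumptions made for illustration; only the 25-Elo-per-point conversion and the 6/4 yard thresholds come from the text.

```python
ELO_PER_POINT = 25.0  # 25 Elo points ~ 1 point on the scoreboard (from the text)


def expected_margin(home_elo, away_elo, hfa=55.0):
    """Expected home margin of victory in points (positive = home favored)."""
    return (home_elo + hfa - away_elo) / ELO_PER_POINT


def gap_vs_vegas(model_margin, vegas_home_line):
    """How many points more the model likes the home team than Vegas does.

    Assumed convention: a home line of -3.5 means Vegas favors the home
    team by 3.5, so the implied home margin is the negated line.
    """
    vegas_margin = -vegas_home_line
    return model_margin - vegas_margin


def early_down_success_rate(plays):
    """Share of 1st/2nd down plays gaining >= 6 yards (pass) or >= 4 yards (run).

    `plays` is an iterable of (down, play_type, yards_gained) tuples,
    where play_type is "pass" or "run" (a hypothetical layout).
    """
    thresholds = {"pass": 6, "run": 4}
    early = [p for p in plays if p[0] in (1, 2)]
    if not early:
        return 0.0
    successes = sum(1 for _, kind, yards in early if yards >= thresholds[kind])
    return successes / len(early)


if __name__ == "__main__":
    margin = expected_margin(1600.0, 1530.0)  # hypothetical ratings
    print(margin, gap_vs_vegas(margin, vegas_home_line=-3.5))
```

A positive gap means the model is higher on the home team than the market is; a negative gap means it leans toward the road team.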

Now that we have our adjustments, let’s look at how we will use these ratings and the subsequent matchup calculations (win probability & expected spread) to make a more informed decision about which games to place a wager on. Because our model is now set in stone, we can find out what kind of threshold we need to set to get some kind of return. Within this framework, placing a wager came down to 2 things: the type of wager and the criteria for placing that kind of wager. Our 2 types of wagers are moneyline bets and bets against the spread. A moneyline wager simply says this team will win (or conversely this team will lose), and the risk required for the same reward varies with the matchup. For example, placing a wager on the best team in the league to beat the worst and risking one unit will yield very little return. Another way to think about it is that you will have to risk a lot in order to win 1 unit. With a bet against the spread you are saying that a team will win or lose by some amount. The sides and their implications get a little more complicated to think about, but the simple part is this: your return is standardized (usually 0.91 for every 1 unit risked), as this is the most common bet made. For our purposes, a bet of either category is placed when its one criterion is met. For moneyline bets, our win probability must be greater than 60%, and for a bet against the spread, our expected spread must differ from the Vegas line by more than 2.5 points in either direction.
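The two selection rules can be sketched in a few lines. The names below are hypothetical, margins are expressed from the home team’s point of view, and I am assuming the 60% rule applies to whichever side the model favors. The payout helper uses American odds (for example -110), which the text does not mention; it is only there to show why a heavy favorite pays so little per unit risked.

```python
def moneyline_profit(american_odds, stake=1.0):
    """Profit on a winning moneyline bet quoted in American odds (assumed format)."""
    if american_odds < 0:
        return stake * 100.0 / -american_odds  # favorite: risk more to win less
    return stake * american_odds / 100.0       # underdog: risk less to win more


def pick_bets(home_win_prob, model_margin, vegas_margin,
              ml_threshold=0.60, spread_threshold=2.5):
    """Return the wagers that clear the thresholds described in the text.

    model_margin and vegas_margin are expected home margins of victory
    in points (positive = home favored).
    """
    bets = []
    if home_win_prob > ml_threshold:
        bets.append("moneyline: home")
    elif 1.0 - home_win_prob > ml_threshold:
        bets.append("moneyline: away")

    gap = model_margin - vegas_margin
    if gap > spread_threshold:
        bets.append("spread: home")
    elif gap < -spread_threshold:
        bets.append("spread: away")
    return bets


if __name__ == "__main__":
    print(pick_bets(0.65, model_margin=6.0, vegas_margin=3.0))
    print(moneyline_profit(-400), moneyline_profit(300))
```

Note that -110 odds return about 0.91 units per unit risked, which matches the standardized against-the-spread payout mentioned above.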

With that in mind we can finally evaluate how well our model does in “besting” the bookie. With all of our adjustments, the lowest Brier score (the mean squared difference between our win probability and the binary outcome) was 0.172, which is almost 20% better than the baseline model. Most importantly, our wager win percentage was 56.2%! That’s pretty good. Anything above 52.5% will return some kind of profit, and professional gamblers usually boast about any success around 58–60%, so we are in the money, but still have some exciting room for growth. Diving into the results a little further, we can see some highlights and some shortcomings. Our model struggled most to adjust for 2 things: injuries (especially at the QB position) and QBs on their rookie contracts who hit the ground running and vastly overperformed their value rating (think Lamar Jackson & Patrick Mahomes). We can improve on the latter in the future by developing a college Elo system so that we can better predict rookie performance, whereas predicting injuries is much more difficult (at least with the data we have currently). Where the model excelled was in showing little autocorrelation, which was most evident in our wagers. If our model were heavily autocorrelated we would have almost exclusively bet on heavy favorites, but that wasn’t the case, as 53% of our against-the-spread bets were on the underdog. Our QB value system also seems to be pretty reasonable. When you look exclusively at QB value, all the league MVPs who were QBs were in the top 3 for season-long QB value.
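For readers who want to check the arithmetic behind these numbers, here is a small sketch of the two metrics. The function names are mine, the 100-bet sample is hypothetical, and pushes (ties against the spread) are ignored for simplicity.

```python
def brier_score(win_probs, outcomes):
    """Mean squared error between predicted win probabilities and 0/1 results."""
    return sum((p - o) ** 2 for p, o in zip(win_probs, outcomes)) / len(win_probs)


def breakeven_win_rate(payout_per_unit=0.91):
    """Win rate at which wins * payout exactly offsets the units lost."""
    return 1.0 / (1.0 + payout_per_unit)


def units_won(win_rate, n_bets, payout_per_unit=0.91):
    """Net units after n_bets one-unit wagers at a given win rate (no pushes)."""
    wins = win_rate * n_bets
    losses = n_bets - wins
    return wins * payout_per_unit - losses


if __name__ == "__main__":
    print(brier_score([0.5, 0.5], [1, 0]))   # a coin-flip forecaster scores 0.25
    print(round(breakeven_win_rate(), 4))    # just under the 52.5% quoted above
    print(round(units_won(0.562, 100), 2))   # hypothetical 100-bet sample
```

Lower is better for the Brier score, and a forecaster who always says 50% scores 0.25, which gives some context for a value of 0.172.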

This was a pretty good first pass at building out an Elo rating system, but there’s some future work that could be done to make it more effective. We’ve all heard “defense wins championships”, so incorporating a defense value similar to the QB one could prove helpful. Simulating the upcoming season could also show how robust the ratings are, and how they adjust in adverse circumstances or edge cases. More advanced data, gleaned with machine learning techniques such as computer vision or neural networks, could also provide new input features that improve the model.