Learn how to use Python to evaluate team strength using nflfastR's EPA model.
In this post, we're going to do something that's more general NFL-analytics than straight Fantasy Football analysis.
We're going to be using nflfastR's EPA (Expected Points Added) model, exposed through the Python package nflfastpy, to visualize the best offenses and defenses in the league.
nflfastpy's play-by-play data comes with an EPA value for each play. EPA estimates how many expected points a play added, based on the game situation before and after the play: field position, down, and distance to go.
Each play has an EPA, and we're going to be finding each team's EPA per play on offense and defense. For offense, it's straightforward. If a play has an EPA of 1.2 on offense, that means the offense moved the ball such that they added an expected 1.2 points to their score. For defense, it's going to be the opposite. If a team is on defense, and the EPA for the play is 1.2, then we'll say the defense gave up or allowed an estimated 1.2 points on the play. Team defenses with more negative EPAs are better defenses, while team defenses with more positive EPAs are worse defenses.
This analysis will be helpful for fantasy purposes since having players on good offensive teams facing poor defensive teams is a recipe for success. This can also be useful for streaming defenses. In the next post we will take this even further to look at strength of schedule. This will be even more helpful for your fantasy team since we can focus in on players that will have an easier defensive schedule in the second half of the season.
First things first, load up your Google Colab or Jupyter notebook and import the libraries we'll need for this post.
Next, we'll load in 2021 play-by-play data via nflfastpy. We've used this data quite a bit. As a reminder, it is an extensive database detailing every snap that has taken place so far this year.
| | play_id | game_id | old_game_id | home_team | away_team | season_type | week | posteam | posteam_type | defteam | ... | out_of_bounds | home_opening_kickoff | qb_epa | xyac_epa | xyac_mean_yardage | xyac_median_yardage | xyac_success | xyac_fd | xpass | pass_oe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2021_01_ARI_TEN | 2021091207 | TEN | ARI | REG | 1 | NaN | NaN | NaN | ... | 0 | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 40 | 2021_01_ARI_TEN | 2021091207 | TEN | ARI | REG | 1 | TEN | home | ARI | ... | 0 | 1 | 0.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 55 | 2021_01_ARI_TEN | 2021091207 | TEN | ARI | REG | 1 | TEN | home | ARI | ... | 0 | 1 | -1.399805 | NaN | NaN | NaN | NaN | NaN | 0.491433 | -49.143299 |
| 3 | 76 | 2021_01_ARI_TEN | 2021091207 | TEN | ARI | REG | 1 | TEN | home | ARI | ... | 0 | 1 | 0.032412 | 1.165133 | 5.803177 | 4.0 | 0.896654 | 0.125098 | 0.697346 | 30.265415 |
| 4 | 100 | 2021_01_ARI_TEN | 2021091207 | TEN | ARI | REG | 1 | TEN | home | ARI | ... | 0 | 1 | -1.532898 | 0.256036 | 4.147637 | 2.0 | 0.965009 | 0.965009 | 0.978253 | 2.174652 |

5 rows × 372 columns
Here, we're making a DataFrame called epa_df, which sums each team's EPA across all of its offensive plays, counts the number of plays, and totals the yards gained. Dividing total EPA by the number of plays gives us EPA per play. In a moment, we'll also visualize the relationship between team offensive yardage and team EPA per play.
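As a rough sketch of this step, here's how the aggregation could look on a tiny made-up DataFrame. It assumes the nflfastR column names `epa` and `yards_gained` (alongside `posteam`, which shows up in the head above), and it counts a play as any row with a non-missing EPA. Both are assumptions for illustration, and the numbers are invented.

```python
import pandas as pd

# Toy stand-in for the play-by-play DataFrame (made-up values).
# With real data, df would be the frame loaded earlier via nflfastpy.
df = pd.DataFrame({
    "posteam":      ["ARI", "ARI", "TEN", "TEN", "TEN", None],
    "defteam":      ["TEN", "TEN", "ARI", "ARI", "ARI", None],
    "epa":          [1.2, -0.4, 0.3, -1.5, 0.6, None],
    "yards_gained": [15.0, 2.0, 6.0, -3.0, 9.0, None],
})

# groupby skips rows whose key is missing, so the row with no posteam
# (like row 0 in the head above) never reaches the totals.
offense = df.groupby("posteam")

epa_df = pd.DataFrame({
    "offense_epa": offense["epa"].sum(),          # total EPA on offense
    "offense_plays": offense["epa"].count(),      # plays with an EPA value
    "offense_yards": offense["yards_gained"].sum(),
})
epa_df["offense_epa/play"] = epa_df["offense_epa"] / epa_df["offense_plays"]

# Best offenses first.
epa_df = epa_df.sort_values("offense_epa/play", ascending=False)
print(epa_df)
```

Run against the full 2021 data, the same pattern should give one row per team, with the best offenses at the top.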
| | offense_epa | offense_plays | offense_yards | offense_epa/play |
|---|---|---|---|---|
| ARI | 71.112288 | 752 | 3586.0 | 0.094564 |
| TB | 56.053889 | 698 | 3385.0 | 0.080306 |
| LA | 57.654229 | 728 | 3596.0 | 0.079195 |
| IND | 57.924929 | 747 | 3341.0 | 0.077543 |
| KC | 56.133836 | 780 | 3540.0 | 0.071966 |
| GB | 41.429580 | 701 | 3001.0 | 0.059101 |
| DAL | 37.671326 | 715 | 3478.0 | 0.052687 |
| BUF | 33.788242 | 694 | 3123.0 | 0.048686 |
| TEN | 35.558411 | 786 | 3215.0 | 0.045240 |
| CLE | 30.852390 | 720 | 3427.0 | 0.042851 |
Not many surprises on this list. Arizona and Tampa Bay have clearly been the best offenses this year, so it checks out to see them with the highest offensive EPA per play.
Let's move on to visualizing the relationship between yardage and EPA per play. We'll also use the scipy.stats package to find the R-squared and place it in the plot title.
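For a simple straight-line fit, R-squared is just the square of the Pearson correlation between the two columns. Here's a minimal sketch of that calculation using numpy's `corrcoef`; squaring the `rvalue` returned by `scipy.stats.linregress` gives the same number. The inputs are only the ten rows from the offense table above, so treat the result as illustrative rather than the league-wide figure, and note that the title string is a made-up example.

```python
import numpy as np

# Values copied from the ten-row offense table above (ARI through CLE).
offense_yards = np.array([
    3586.0, 3385.0, 3596.0, 3341.0, 3540.0,
    3001.0, 3478.0, 3123.0, 3215.0, 3427.0,
])
offense_epa_per_play = np.array([
    0.094564, 0.080306, 0.079195, 0.077543, 0.071966,
    0.059101, 0.052687, 0.048686, 0.045240, 0.042851,
])

# Pearson correlation between yards and EPA per play.
r = np.corrcoef(offense_yards, offense_epa_per_play)[0, 1]

# For a one-variable linear fit, R-squared equals r squared.
r_squared = r ** 2

# A string like this could be passed to the plot title.
title = f"Offensive yards vs. EPA/play (R-squared = {r_squared:.2f})"
print(title)
```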
We can see there is a decent correlation between yardage and offensive EPA per play. The correlation is actually stronger when you look at team touchdowns. We'll move on, though, to finding defensive EPA per play. Since the DataFrame is already instantiated, let's just add the defense columns via assignment.
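Here's a sketch of what "via assignment" could look like, again on made-up plays. The only real change from the offense step is grouping by `defteam` instead of `posteam`; pandas lines the new columns up with the existing rows by team abbreviation. The `epa` and `yards_gained` column names are assumptions about the play-by-play data.

```python
import pandas as pd

# Same kind of toy play-by-play frame as before (made-up values).
df = pd.DataFrame({
    "posteam":      ["ARI", "ARI", "TEN", "TEN", "TEN"],
    "defteam":      ["TEN", "TEN", "ARI", "ARI", "ARI"],
    "epa":          [1.2, -0.4, 0.3, -1.5, 0.6],
    "yards_gained": [15.0, 2.0, 6.0, -3.0, 9.0],
})

# Existing offense-side frame, indexed by team abbreviation.
offense = df.groupby("posteam")
epa_df = pd.DataFrame({
    "offense_epa": offense["epa"].sum(),
    "offense_plays": offense["epa"].count(),
})

# Defense columns: the EPA a team allowed is the EPA of the plays
# where it was the defteam. Assignment aligns on the team index.
defense = df.groupby("defteam")
epa_df["defense_epa"] = defense["epa"].sum()
epa_df["defense_plays"] = defense["epa"].count()
epa_df["defense_epa/play"] = epa_df["defense_epa"] / epa_df["defense_plays"]
epa_df["defense_yards_given_up"] = defense["yards_gained"].sum()

# Worst defenses first: a more positive EPA allowed per play is worse.
worst = epa_df.sort_values("defense_epa/play", ascending=False).head()
print(worst)
```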
| | offense_epa | offense_plays | offense_yards | offense_epa/play | defense_epa | defense_plays | defense_epa/play | defense_yards_given_up |
|---|---|---|---|---|---|---|---|---|
| DET | -59.581578 | 671 | 2577.0 | -0.088795 | 73.452331 | 622 | 0.118091 | 3033.0 |
| NYJ | -50.516508 | 659 | 2637.0 | -0.076656 | 75.569352 | 697 | 0.108421 | 3267.0 |
| WAS | -22.375791 | 652 | 2789.0 | -0.034319 | 58.822950 | 680 | 0.086504 | 3115.0 |
| KC | 56.133836 | 780 | 3540.0 | 0.071966 | 61.340686 | 719 | 0.085314 | 3437.0 |
| JAX | -52.687387 | 668 | 2663.0 | -0.078873 | 46.280072 | 657 | 0.070442 | 3003.0 |
These are the five worst defenses in the league by EPA per play. Remember, more positive EPAs per play on the defensive side are bad. A higher number means the defense is allowing more (estimated) points per play.
Let's now visualize the relationship between defensive yards given up by a team and defensive EPA.
We can see here that the correlation between defensive EPA and defensive yardage given up differs in strength from the correlation between offensive EPA and offensive yardage. This may change slightly from year to year.
Let's tie everything together and scatter plot defensive EPA on the y-axis and offensive EPA on the x-axis. This will more clearly demonstrate which teams have a good defense and good offense, bad defense and good offense, bad defense and bad offense, and good defense and bad offense.
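To make the four groups concrete, here's a small sketch that labels teams by quadrant using the per-play numbers from the worst-defenses table above. The `label_quadrant` helper and the cut line at 0 are hypothetical choices for illustration; a league-average cut would work just as well.

```python
import pandas as pd

def label_quadrant(offense_epa_per_play, defense_epa_per_play, cut=0.0):
    """Label a team by which side of the cut line each unit falls on."""
    # On offense, a higher EPA per play is better.
    offense = "good offense" if offense_epa_per_play > cut else "bad offense"
    # On defense, a lower (more negative) EPA per play allowed is better.
    defense = "good defense" if defense_epa_per_play < cut else "bad defense"
    return f"{defense}, {offense}"

# Per-play values copied from the five-row defense table above.
teams = pd.DataFrame(
    {
        "offense_epa/play": [-0.088795, -0.076656, -0.034319, 0.071966, -0.078873],
        "defense_epa/play": [0.118091, 0.108421, 0.086504, 0.085314, 0.070442],
    },
    index=["DET", "NYJ", "WAS", "KC", "JAX"],
)

teams["quadrant"] = [
    label_quadrant(off, dfn)
    for off, dfn in zip(teams["offense_epa/play"], teams["defense_epa/play"])
]
print(teams["quadrant"])
```

On the scatter plot itself, the same cut lines could be drawn with matplotlib's `axvline` and `axhline` to split the chart into these four groups.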
And that's it! The visualization is pretty self-explanatory, and some of the results make a lot of sense if you've been following the NFL this season. The Arizona Cardinals, Buffalo Bills, LA Rams, and Bucs, all of the best teams in the league, sit in the good offense, good defense corner (high offensive EPA per play, low defensive EPA per play), while the Texans, Jags, Jets, Dolphins, and Bears, all of the worst teams in the league, sit in the opposite corner with bad offense and bad defense.
As I mentioned, next week we will dive a little deeper and start applying this to strength of schedule for fantasy purposes.
Thanks for reading!