Heading into the 2021 draft, Ja’Marr Chase was lauded by many as one of the best wide receiver prospects to enter the NFL in a decade. He was the highest-ranked prospect in our Rookie Guide that season and sat inside our top-20 dynasty rankings before the NFL Draft. Why were we, and many others, so optimistic about his future? It was pretty simple. He projected as an early Round 1 pick, made an impact at an early age in college, and had one of the most productive WR seasons in the history of college football.
When considering prospects, we often ask whether they “check all of the boxes.” It’s great when players like Chase accomplish this, but we know that there are plenty of successful players who only checked a handful. Some boxes are more important than others and in some cases, the combination of checks is more important than the total.
Regression trees are a great way to help us reframe this idea. Throughout the site’s history, writers such as Kevin Cole and Anthony Amico used regression trees to consider the importance of production and age in predicting NFL outcomes. Regression trees help us understand the mix of attributes that tends to drive NFL performance and provide a visual way to see how those attributes interact. Last draft season, I revisited the use of regression trees to better understand the 2023 class of rookie WRs. You can check out the results of that process here. In what follows, we’ll review regression trees in more detail, consider the results of the trees built last year, and then apply them to the 2024 WR class.
What Is A Regression Tree?
So what in the world is a regression tree? For this article, think of a regression tree as a series of questions. The response to each question leads to another question, which leads to another question. This process is repeated until all questions are answered and an estimate of the points per game a prospect is expected to score in the first three seasons of his NFL career is reached.
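That “series of questions” can be sketched in a few lines of code. Everything here is illustrative: the metrics, thresholds, and leaf values are hypothetical stand-ins, not the actual splits from the trees built for this article.

```python
# A regression tree, read as a series of yes/no questions.
# All metrics, thresholds, and leaf values below are hypothetical,
# chosen only to illustrate how a fitted tree is traversed.
def predict_ppg(draft_pick: int, breakout_age: float, dominator: float) -> float:
    """Return an estimated PPG over a prospect's first three NFL seasons."""
    if draft_pick <= 32:                 # Question 1: Round 1 pick?
        if breakout_age <= 19.5:         # Question 2: early college producer?
            return 12.4                  # leaf: elite profile
        return 9.1                       # leaf: early pick, later breakout
    if dominator >= 0.35:                # Question 2: big share of his offense?
        return 7.8                       # leaf: productive Day 2/3 pick
    return 4.2                           # leaf: limited profile

print(predict_ppg(5, 19.0, 0.40))   # → 12.4
```

Each path from the first question down to a leaf is one “profile,” and the leaf value is the average outcome of the historical players who fit that profile.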
To create the regression trees used in this article, I gathered collegiate production, age, athletic measurables, and draft data for every WR included in the Prospect Box Score Scout. I filtered this listing to only include players who had logged two or more NFL seasons. With some help from an algorithm, I then worked through different combinations of measures until I arrived at a mixture that was well tied to NFL fantasy points, easy to visually follow, and included intuitive inputs.
The specifics of how the tree was built fall outside the scope of this article. But for a little more background, I split my listing of players into training sets and test sets. I then fed the training sets into an algorithm that ran through the process of building the tree by continually separating players based on thresholds. Once the tree was built, I fed the test sets into the model and compared each player’s predicted result (as calculated by the regression tree) to his actual result. I repeated this process, workshopping different mixtures of statistics until the differences between predicted and actual results were limited while also producing a tree that was small enough to be easily interpreted.
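For readers curious about the mechanics, the fit-and-evaluate loop above looks roughly like the sketch below. The data here are synthetic stand-ins (random features and a fake PPG outcome), not the Prospect Box Score Scout dataset, and the feature names are assumptions for illustration.

```python
# Sketch of the train/test workflow described above, using scikit-learn.
# Features and outcomes are synthetic; they only mimic the shape of the
# real inputs (production, age, measurables, draft capital -> NFL PPG).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.random((300, 3))                       # e.g. draft capital, age, dominator
y = 4 + 8 * X[:, 0] + rng.normal(0, 1, 300)    # fake first-three-seasons PPG

# Split players into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_depth keeps the tree small enough to read; this is one of the knobs
# tuned (alongside the feature mix) when workshopping different versions.
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# Compare predicted vs. actual results on the held-out players.
mae = mean_absolute_error(y_test, tree.predict(X_test))
print(f"mean absolute error on test set: {mae:.2f} PPG")
```

The “workshopping” step is just repeating this loop with different input mixtures and tree sizes until the test-set error is acceptable and the tree stays readable.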
This exercise wasn’t so much about building the “best” possible model as it was about giving us another tool for understanding how WR metrics interact. It also gives us another input for building an expected range of outcomes for incoming rookies. One challenge with regression trees is that as you move into the lower branches, you’ll often find a couple that seem counterintuitive. This can happen for a variety of reasons, but more often than not the culprit is overfitting. (Overfitting occurs when the model aligns too closely with the specific training data, preventing it from being usefully applied to new data.) That said, some counterintuitive branches reflect real relationships: specific profiles that genuinely violate the general lessons we have learned.
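Overfitting is easy to demonstrate with trees. An unconstrained tree will keep splitting until it memorizes the training players (near-zero training error), while its error on held-out players stays much higher. The data below are synthetic, purely for illustration.

```python
# Minimal illustration of tree overfitting: an unlimited-depth tree
# memorizes its training data but does not generalize accordingly.
# Data are synthetic, for illustration only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
X = rng.random((300, 3))
y = 4 + 8 * X[:, 0] + rng.normal(0, 1.5, 300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

deep = DecisionTreeRegressor(random_state=1).fit(X_tr, y_tr)  # no depth limit

# Training error is ~0 (every training player memorized)...
print("deep tree train MAE:", mean_absolute_error(y_tr, deep.predict(X_tr)))
# ...while test error is far larger, because the lower branches are
# fitting noise rather than real relationships.
print("deep tree test MAE: ", mean_absolute_error(y_te, deep.predict(X_te)))
```

This is why the trees in this article were kept intentionally shallow: the deeper branches are exactly where memorized noise starts masquerading as insight.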
The Results
Before we look at some of the regression trees that came out of this process, there are a couple of key qualifiers worth mentioning. First,