Predicting Second Year RB Performance
I know you’re not supposed to engage internet trolls, but sometimes it’s hard not to bicker with them. I was in a mock draft with a troll last night – it was an auction, and he decided to spend half his budget on Brent Celek and Clay Harbor (the ol’ Eagles TE strategy). After totally skewing the auction values, he went on to claim that 2nd year RBs are never as good as they were during their rookie years, and anyone who picked Doug Martin or Bernard Pierce was an idiot.
This seemed patently absurd to me. Just from a common sense perspective, on average, guys with more experience ought to get more playing time, and guys who get more playing time ought to have better fantasy seasons. So I told the troll what I thought, which of course led to more trolling blah-blah-blah, but in the end I realized I didn’t have any good evidence to back up my intuitions. The Player Season Finder at pro-football-reference.com makes it pretty easy to investigate that sort of thing, though, so I did, and now I’m sharing it with you.
I found all the rookie RB seasons from 2001-2011 and all the 2nd year RB seasons from 2002-2012, then cleaned the data to make sure that I only included guys who were rostered for both of their first two seasons.
Here’s a summary of the data:
Y1=year 1, Y2=year 2; Att=rushing attempt; Y/A=yards per attempt
There were 353 RBs who played in the NFL for 2 consecutive seasons in the span of 2001-2012, and whether you want to use the average or the median, it looks like Y2 RBs got more carries, gained more yards, and had a slightly better Y/A than Y1 RBs. Eyeballing some summary statistics clearly shows that there are general improvements in RB performance from Y1 to Y2.
For my stat-nerd brethren out there, the differences in attempts and yards were both statistically significant (paired samples t-tests, p’s <.05); however, the increase in Y/A did not reach significance (p=.840). To further clarify my methodology, I log-transformed the data prior to analysis – specifically, the natural log of (value + 1) – to normalize the distributions. I’ll be reporting the raw numbers for everything in tables, for the sake of clarity.
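For anyone who wants to try this kind of test themselves, here is a minimal sketch of a paired-samples t statistic on ln(value + 1) data. The carry counts are made up for illustration (they are not the data behind this post), and the function names are hypothetical.

```python
import math
import statistics


def log1p_transform(values):
    """Apply the ln(x + 1) transform, which is defined even when x is 0."""
    return [math.log(v + 1) for v in values]


def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for a paired-samples t-test of second vs. first."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_err, n - 1


# Made-up rushing attempts for five hypothetical RBs (NOT the article's data).
y1_att = [0, 45, 120, 210, 15]
y2_att = [12, 80, 150, 190, 60]

t_stat, dof = paired_t_statistic(log1p_transform(y1_att), log1p_transform(y2_att))
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

The p-value then comes from comparing t against a t distribution with those degrees of freedom, which any stats package or spreadsheet can do.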
I could stop here, and declare victory over the troll, but I don’t think that my RB sample was the most representative of the RBs that are going to be on the radars of fantasy football owners. For instance, the RBs included in the first dataset included a bunch of guys who were rostered but never touched the ball on offense, or who were only used very sparingly. That may skew the data and give us some weird results; plus fantasy football players typically aren’t spending a lot of time looking at guys who are buried at the bottom of depth charts.
100 Total Attempts
My next set of analyses will look at different subsets of RBs from the original dataset. First, I looked at the RBs who had at least 100 total attempts across their first two seasons, on the assumption that guys who get on average at least 50 carries/year are a lot more likely to be drafted by fantasy football owners than guys who get fewer carries:
Same pattern emerges here, but with less statistical power (n=119). RBs increased their carries and yards in Y2; however, neither of these improvements was statistically significant (Att p=.093, Yds p=.170). The water is just plain muddy when it comes to Y/A (p=.532), suggesting no reliable changes one way or the other. On the bright side, TDs increased significantly in Y2 (p<.001), so there’s some additional evidence that second-year RBs don’t always hit a wall. Still, TDs are pretty hard to predict from year to year, so this isn’t much of a case.
100 Attempts in Rookie Year
But we’re not done yet! Some RBs are drafted to be immediate contributors, and teams give them a lot of opportunities right out of the gate. Just last year, Tampa Bay and Cleveland drafted Doug Martin and Trent Richardson to be their respective starters ASAP. Here’s the data for RBs who got at least 100 carries in Y1 (irrespective of how many carries they got in Y2):
Ah, so here’s where the troll maybe has a point. If a rookie RB gets a lot of carries, these data suggest that he’ll slide back a little bit in Y2 attempts (p<.01) and yards (p<.01). The dropoffs weren’t that big, but the significant results suggest that the differences can’t be attributed to chance alone. There were no significant differences for Y/A (p=.119) or TDs (p=.437). For the RBs who get the most carries as rookies, it seems like there may be a bit of a wall.
100 Attempts in Year 2
The troll got me there. But, that list excluded a lot of top-tier RBs. Guys like Arian Foster, CJ Spiller, and Stevan Ridley were all given fewer than 100 carries as rookies. So, I looked next at guys who had more than 100 carries in Y2 (irrespective of Y1 carries):
We’re back to across the board improvements! And these are much bigger than the gains for the other datasets. More reliable as well – the improvements in attempts, yards, and TDs were all statistically significant at p<.001. But once again, the change in Y/A was not significant (p=.876). This pattern suggests that RBs who show enough talent in Y1 to earn carries in Y2 will generally continue to be productive, even though they don’t necessarily do “more” with their carries than they did in Y1 (as evidenced by the lack of significance in Y/A difference).
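The three filtered subsets above are easy to reproduce on any table of Y1 and Y2 carries. Here is a minimal sketch; the players and numbers are made up, the `RBSeasons` type is hypothetical, and using "at least 100" for all three cutoffs is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RBSeasons:
    """Rushing attempts for one RB's first two seasons (hypothetical record type)."""
    name: str
    y1_att: int
    y2_att: int


# Made-up players and carry counts, for illustration only.
backs = [
    RBSeasons("Back A", 250, 220),
    RBSeasons("Back B", 40, 180),
    RBSeasons("Back C", 60, 45),
    RBSeasons("Back D", 10, 5),
]

# At least 100 total attempts across Y1 and Y2.
total_100 = [rb for rb in backs if rb.y1_att + rb.y2_att >= 100]
# At least 100 attempts as a rookie, irrespective of Y2.
rookie_100 = [rb for rb in backs if rb.y1_att >= 100]
# At least 100 attempts in Y2, irrespective of Y1.
year2_100 = [rb for rb in backs if rb.y2_att >= 100]

for label, subset in [("Total", total_100), ("Rookie", rookie_100), ("Year 2", year2_100)]:
    print(label, [rb.name for rb in subset])
```

Note that the subsets overlap: a workhorse like Back A lands in all three, which is worth remembering when comparing the tables.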
There are probably a million interesting ways to carve up my original dataset to look at RB performance from Y1 to Y2, but I don’t think there’s a lot more to learn from that kind of analysis, beyond what we’ve already seen: RBs generally get a little bit more productive from Y1 to Y2, especially the guys who end up with a lot of carries in Y2. The exception to that rule is that RBs who get a lot of work as rookies tend to have a small dropoff in Y2 attempts and yards (the change in TDs wasn’t significant). Nothing groundbreaking, though it does clarify why the troll thought that RBs hit a wall in Y2.
Here come the caveats:
First, these data don’t include anything related to receiving or special teams. These are obviously important facets of the game, and a better analysis of RB performance ought to at least be based on total opportunities (carries + targets) rather than carries alone.
Second, the cutoffs I used for the filtered analyses were completely arbitrary: there’s nothing special about getting 100 total carries in the first 2 years, or 100 carries per season. A future analysis might try to objectively determine better cutoffs (by looking at average opportunities per team, or something like that).
Third, there’s a LOT of variability in these data. Anytime your standard deviation is approaching your summary statistic, you should make sure to be especially skeptical of the results. I think there’s a big enough sample size here that we can put some faith in the numbers, but I think it’s still an important thing to recognize.
This isn’t the end, though. You haven’t won a battle with an internet troll by simply showing that you might be right – trolls don’t give up that easily. A nice model predicting Y2 RB performance, though, could provide a nice bit of converging evidence to support my position. Get ready for Part 2…
Part 2 – The Model & Bryce Brown
In Part 1, I presented a bunch of tables showing that RBs usually don’t hit a second-year wall. Since we’ve come this far, though, why not see if we can predict Y2 performance based on how we think a RB’s attempts will change from Y1 to Y2? We’ll call this statistic deltaATT, and it is simply the difference in attempts between Y2 and Y1. A positive deltaATT means a guy got more carries in Y2 than Y1, and a negative deltaATT means fewer carries in Y2 than Y1. We really can’t know in advance whether a RB is going to get more or fewer carries from season N to N+1, but I think we can make reasonable approximations based on preseason data, coaching tendencies, historical trends, etc.
I would think that a higher deltaATT should lead to an increase in Y2 yards and TDs (it necessarily means an increase in Y2 attempts), and maybe an increased Y2 Y/A. I’m only going to look at RBs who are likely to have some relevance to fantasy football owners, and I think the broadest definition of that is the set of RBs who had at least 100 carries total across Y1 and Y2. Looking over the list, there weren’t many relevant RBs who missed that cut. Here’s what the correlations looked like:
|Y2 Measures|Correlation coefficient w/ deltaATT (p’s <.01)|
The strong positive correlations between deltaATT and Y2 performance measures suggest that I’m on the right track. On average, RBs who got (relatively) more carries in Y2 than Y1 had (relatively) more yards, TDs, and Y/A than RBs who got (relatively) fewer carries in Y2.
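To make the calculation concrete, here is a minimal sketch of computing deltaATT and its Pearson correlation with a Y2 measure. The numbers are made up, and whether the correlations were run on raw or log-transformed values isn't stated above, so this sketch simply uses raw values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up stats for four hypothetical RBs (NOT the article's data).
y1_att = [250, 40, 60, 150]
y2_att = [220, 180, 45, 200]
y2_yds = [900, 800, 150, 950]

# deltaATT is Y2 attempts minus Y1 attempts.
delta_att = [y2 - y1 for y1, y2 in zip(y1_att, y2_att)]
r = pearson_r(delta_att, y2_yds)
print(delta_att, round(r, 3))
```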
But correlation is not causation, and regression is a better tool for predicting future performance. Let’s take a look at how well we can predict Y2 measurables (Yards, TDs, and Y/A) based on Y1 measurables. I’m going to do this in a stepwise fashion: first, I’m going to see how well we can predict Y2 measures based solely on Y1 measures; then, I’m going to add deltaATT to the model and see how it improves the model’s accuracy.
Predicting Y2 Yards
First, I looked at Y2 yards, as a function of Y1 attempts, yards, Y/A, and TDs. The model wasn’t significant – those Y1 factors couldn’t predict Y2 yards any better than chance. Adding deltaATT to the model changed the picture significantly, however (ANOVA of models yielded p<.001); the model including deltaATT accounted for an additional 52.4% of the variance in Y2 yards. That’s a HUGE jump in forecasting accuracy. Here’s what the equation looks like (all predictors have Beta values with p’s <.05, with the exception of the constant, which is marginally significant at p=.079):
lnY2Yds = (lnY1Att*7.251) + (lnY1Yds*-7.119)+(lnY1YA*1.883)+(deltaATT*.885)-5.974
(Keep in mind that to generate a usable prediction, you’ll need to transform your stats before you put them into the model, i.e. create an Excel column with the formula =LN(value+1); when you get the model output, you’ll need to back-transform it, i.e. create an Excel column with the formula =EXP(value)-1.)
Let’s use this model to see what Bryce Brown might do in 2013. As a rookie, Brown had 115 attempts, 564 yards, a Y/A of 4.904, and scored 4 TDs. If he gets 50 more carries next year, the model predicts a minor increase in yardage, to 655 yards; if he gets 100 more carries, the model expects Brown to rush for 1194 yards. Depending on what you expect out of Brown in Chip Kelly’s offense, he could be a very high upside play.
One interesting tidbit to take from that model is that Y1TDs don’t actually increase the accuracy of the prediction (p=.998). Another note is that lnY1Yds has an inverse relationship to lnY2Yds – the more yards accrued by a rookie, the more likely that he’ll fall back a bit in his second season. And a final note is that it doesn’t work very well for deltaATT >150 or so (it’s a linear model, so it really over-projects those guys). I’m not especially concerned about that last point, however: if you already know a RB is going to take on a significantly bigger role in Y2, then you can absolutely infer he’s going to have more fantasy production in Y2, based on opportunity alone.
Predicting Y2 TDs
We’re not stopping with second season yards, though. Next, let’s take a look at Y2TDs. Once again, adding deltaATT to the model beyond the Y1 stats improved the model, this time by 52.8%. Here’s the model (all p’s <.05):
lnY2TD = (lnY1Att*5.797)+(lnY1Yds*-5.216)+(lnY1YA*6.273)+(lnY1TD*.192)+(deltaATT*.008)-4.603
Looking back at Bryce Brown again (Y1Att=115, Y1Yds=564, Y1YA=4.904, Y1TD=4): if we project a deltaATT of 50, the model predicts 4.7 TDs in 2013 – not much of a change. If we input a deltaATT of 100, however, the TDs jump to 7.5, which looks great along with the 1194 predicted rushing yards. I think that’s pretty reasonable. TDs are notoriously difficult to predict from year to year (though I suspect it would be a lot easier with coaching tendencies accounted for in the models, but I don’t know where to get that data easily), but this isn’t a horrible first step.
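For anyone who would rather script the TD equation than build it in Excel, here is a minimal sketch in Python. The function name is hypothetical, and entering deltaATT untransformed is an assumption (the term has no "ln" prefix in the equation); the Y1 stats get the ln(value + 1) treatment and the output is back-transformed with exp(value) - 1.

```python
import math


def ln_plus_one(value):
    """Same as the Excel formula =LN(value+1)."""
    return math.log(value + 1)


def predict_y2_tds(y1_att, y1_yds, y1_ypa, y1_td, delta_att):
    """Evaluate the Y2 TD equation above and back-transform to raw TDs.

    Assumption: delta_att is entered as a raw carry difference, not log-transformed.
    """
    ln_y2_td = (
        ln_plus_one(y1_att) * 5.797
        + ln_plus_one(y1_yds) * -5.216
        + ln_plus_one(y1_ypa) * 6.273
        + ln_plus_one(y1_td) * 0.192
        + delta_att * 0.008
        - 4.603
    )
    # Same as the Excel formula =EXP(value)-1.
    return math.exp(ln_y2_td) - 1


# Bryce Brown's rookie line from the paragraph above.
brown = dict(y1_att=115, y1_yds=564, y1_ypa=4.904, y1_td=4)

tds_plus_50 = predict_y2_tds(**brown, delta_att=50)
tds_plus_100 = predict_y2_tds(**brown, delta_att=100)
print(f"+50 carries: {tds_plus_50:.1f} TDs, +100 carries: {tds_plus_100:.1f} TDs")
```

Under that assumption the outputs should land close to the 4.7 and 7.5 TDs quoted above. Because the model is linear on the log scale, every extra 50 carries multiplies (TDs + 1) by exp(0.4), or roughly 1.49, which is why projections balloon for very large deltaATT values.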
Finally, I took a look at Y/A in Y2. In this case, the model doesn’t do a very good job, with or without deltaATT (only a 16% improvement, but overall no better than chance). The only significant predictor, deltaATT, had a weight of just .001, meaning that each additional carry moves the model’s output by only a thousandth of a unit (the constant was nonsignificant at p=.086, all other p’s >.5). This is disappointing – it’d be cool to know which RBs are more likely to bite off bigger chunks of yardage in Y2. But on the bright side, it doesn’t really matter for fantasy football owners, unless you’re in a league that gives points for rate statistics.
The same caveats that I noted in Part 1 apply here as well. In addition, there should be some kind of error term to account for the uncertainty in preseason deltaATT. For instance, we can guess that Doug Martin will receive a similar number of carries in Y2 as he did in Y1, or that Daryl Richardson will get a lot more carries this season than last, but we have no way of knowing whether those things are going to happen. But like I mentioned earlier, I think we can probably estimate the true value of deltaATT a lot better than we can estimate the value of deltaYds or deltaTD. That’s an empirical question, but one for a later day.
In fact, if you want to play around with the raw RB data, you can download them in Excel and tab-delimited text. I also made an Excel calculator for Y2Yds and Y2TDs. Have fun! Tweet me if you’ve got any questions or comments. Frankly, these models aren’t as good as they can get. I suspect that there’s a nonlinear relationship between Y1 and Y2 RB measures, but I’m not necessarily stats savvy enough to figure that out yet. By all means, play around with the equations and see if you can come up with something better!
So in the end, I guess trolls serve a small (and incredibly irritating) purpose in the fantasy football ecosystem: making sweeping generalizations that require some legwork to refute. I’m sure a reasonable person is capable of raising some of the same points as the trolls, but rarely does a reasonable person’s argument get me worked up enough to actually do the research. Stupid trolls.