On Monday, the Ravens announced a new position in the front office, introducing Sandy Weil as the club’s Director of Football Analytics. In a nod to the statistical analyses made famous to business professionals in the 2003 book by Michael Lewis (and to women in the 2011 film starring Brad Pitt…), the Ravens will ask Mr. Weil to analyze thousands of statistics to identify “market inefficiencies.” This essentially means that the Ravens’ new Prognostication Czar will attempt to sift, combine, shuffle, and re-combine arcane data sets into meaningful decision-making tools for front office personnel and coaches. Contrary to most of the comments I’ve read in the blogosphere on the topic, most of these data will be used to inform draft decisions and game plan formation rather than individual play calling.
I welcome Sandy to the team and wish him well, but admit I’m skeptical about the utility of applying Moneyball-type analytics to football. It’s not that I’m afraid of math. Despite my Alabama public school roots, I’ve always had a knack for the subject and concentrated in the area in college and beyond.
My initial misgivings about football analytics were based on the problem of sample size. Specifically, I viewed the 16-game NFL season as too small to yield meaningful data compared to a 162-game MLB season. As I dug deeper, however, I realized that this was not as problematic as I first thought. Yes, a typical MLB season has roughly 10 times as many contests from which to gather data, but when one considers the number of opportunities throughout a season for some players to influence key statistics, the problem fades somewhat.
As examples, consider the king of offensive baseball statistics – on base percentage – versus an NFL running back’s measure of merit – average yards per carry. Assuming an MLB player has 4 at bats per game and plays all 162 games, the statistician has 648 data points. A more realistic view of 3.5 at bats over 150 games (we’ll give the poor guy a few days off…) yields 525 bits of information. An NFL running back with 20 carries per game over a 16-game season generates 320 data points, a smaller number to be sure, but not a trivially small one. Even a more realistic 15 carries per week over 15 games still creates a decent sample size of 225.
So for skill positions that have objectively quantifiable statistics that can be measured around 10 times per game, the small sample size issue is probably not a compelling reason to discount football analytics.
For positions other than quarterbacks and running backs, however, I’m still concerned that sample size can be an issue. For instance, some of the statistics now used to quantify a receiver’s worth, such as yards gained after catch, could be tricky. Let’s say a receiver is thrown to 5 times per game and makes 3 catches. Aggregated over a 16-game season, this is only 48 data points (I even did this one without a calculator…). While not completely insignificant, such a small sample size makes the data set more prone to being skewed by anomalies. This problem gets worse when the limited data points are further divided into other categories, say yards after catch at home vs. away, or during prime-time games vs. 1:00 pm games, or against the Steelers. In some of these categories there might only be 3 samples.
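For the numerically inclined, here is a rough sketch of why those shrinking sample sizes matter. It rests on a textbook simplification that is my assumption, not anything the Ravens have described: if each at bat, carry, or catch is treated as an independent draw, the noise in a season average shrinks in proportion to 1 divided by the square root of the number of samples. The labels and the helper name `relative_noise` are made up for illustration.

```python
import math

# Sample sizes from the examples in this post (events per game x games played).
samples = {
    "MLB hitter, 4 at bats x 162 games": 4 * 162,
    "MLB hitter, 3.5 at bats x 150 games": int(3.5 * 150),
    "NFL back, 20 carries x 16 games": 20 * 16,
    "NFL back, 15 carries x 15 games": 15 * 15,
    "NFL receiver, 3 catches x 16 games": 3 * 16,
    "Receiver split with only 3 catches": 3,
}


def relative_noise(n):
    """Standard error of an average as a fraction of the per-event spread.

    Assumes independent, identically distributed events, which real
    football plays almost certainly are not.
    """
    return 1 / math.sqrt(n)


for label, n in samples.items():
    print(f"{label}: n={n}, noise is {relative_noise(n):.1%} of per-event spread")
```

Under that assumption, the running back's average is only modestly noisier than the hitter's, while the receiver's 48 catches are noticeably worse and a 3-catch split is barely better than a single play.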
The other primary problem I have with the growing football analytics craze has nothing to do with sample size, but is related to the 16-game NFL season. More specifically, when a club has only 16 opportunities to make the playoffs, each contest gains much more meaning, which decreases the predictive value of statistics and increases the significance of those elusive intangible factors.
I’m sure that Sandy is well aware of these issues and is using data sets that I’ve never considered and in ways that none of us have thought of. I hope this is the case as our cap space needs some serious relief.
Again, welcome to the Ravens, Sandy. On behalf of the Ravens Nation, I wish you great success. I’m reminded, however, that as well as Billy Beane’s Oakland A’s performed (especially considering their payroll), they didn’t win the World Series…