# Is ‘home field advantage’ worth taking down a banner for?

Do you think if you flipped a coin in a mint, it would show heads more than tails? Imagine if we set up a small coin-stadium in or adjacent to the mint where the coin was made, where other coins would sit around watching the coin get flipped. Say we flipped the coin outside of the stadium first a bunch of times and showed that it was relatively 50/50 whether it was going to be heads or tails, but then we went back to this mint-stadium and flipped the coin 3,879 times, and it turned up heads 2,219 times. With a simple statistical test, you can show that the probability of a 50/50 coin giving this result in the stadium is 0.000000000256%.
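As a rough check, here is how many standard deviations that result sits from a fair coin, using the normal approximation to the binomial. This is an assumed method, since the exact test behind the percentage above isn't stated. For $n = 3879$ flips of a fair coin, the expected number of heads is $n/2$ with standard deviation $\sqrt{n}/2$:

$$z = \frac{2219 - 3879/2}{\sqrt{3879}/2} = \frac{279.5}{31.14} \approx 8.98$$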

Football is not a coin. However, every team – no matter how good or bad – plays 16 games in the regular season: 8 of those at their own stadium and 8 of those at an opponent’s stadium, so a good team will play at home as much as a bad team will. Yet when you run through the stats, the ‘home field advantage’, i.e. that the home team is more likely to win than the away team, is more statistically significant ($8\sigma$) than the detection of the Higgs boson ($5\sigma$).

What I’ve got: 14 years of regular season NFL data (2000-2014)  – a few thousand games, half a million plays.

What I’m going to do with it: Try and find which bits of a football game are affected by ‘home field advantage’ in a (fairly) rigorous manner.

To prove that home field advantage is a big deal in the NFL, let’s briefly go over one of the funnier stories of this week. For those who are sick of bannergate, skip to the next horizontal line.

Mike Dobs, family man living in North Carolina, woke up this Sunday knowing that by the end of the day his dream would become a reality. Mike is originally from Wisconsin, where he grew up supporting one of the dominant NFL teams in recent history, the Green Bay Packers. Mike also managed to land front-row tickets to what was poised to be one of the best games of the season: his hometown Packers against the Carolina Panthers at Bank of America Stadium, the Panthers’ home stadium. Mike wasn’t happy just going to the game wearing a Packers shirt and a cheese hat (Packers fans like to be known as cheeseheads). He had to find a way to really show he cared about his hometown team – there was only one thing for it.


Cam Newton, quarterback for the Carolina Panthers, woke up this Sunday hoping that by the end of the day he would still be an MVP candidate. Cam has been having an extraordinary season, winning all of his first 7 games – mostly on his own back – through his intelligence, physical prowess and just a real desire to win. Cam knew that this game might end the Panthers’ winning streak, as the Green Bay Packers were also dominating, having lost only one game this season – today was not going to be easy. Cam took some solace from the fact that the game was going to be in Carolina and so he would have the crowd on his side.

This is where our stories cross paths:


Here is quarterback Cam Newton walking across the field in North Carolina. The white bundle under his arm is Wisconsin-ite Mike Dobs’ cheesehead banner, which he has just taken down.

Cam Newton cares about home field advantage enough to worry about one banner in a stadium with 75,000 seats. Cam and the Panthers took the game to go 8-0 on the season. Coincidence? Well, yes, probably, but let’s keep going anyway.

## False Starts

If you switch on to an NFL game and want to know whether the home team or the visiting team is on offense, listen to the crowd. While the visitors are setting up  on offense to run a play, any good home crowd will be going absolutely wild – screaming, shouting and clapping to try and ruin any possible communication between the offense before the play starts. When the home team is on offense, the stadium will be as quiet as a mouse.

The current record for how loud the crowd can get is held by the Kansas City Chiefs, who reached 142.2dB against the Patriots in 2014 – 140dB is how loud a jet engine would be if it travelled 100ft above you.

The classic sign of miscommunication in football is a false start. A false start happens when a player on the offense moves before the play is started by the center giving the ball to the quarterback.

If the game is quiet, the quarterback can simply tell the offense when the play is about to start: they shout something indiscernible like “hut”, “hike” or “go” and everything starts from there. However, if the game is loud then this won’t work, and you would expect the visiting team to get more false starts. And yes, they do:

So you have the effect of a jet flying 100ft above you, you miss the cue and the offensive line jumps and causes a false start. It’s easy to see why the crowds take such pride in causing these.


You could argue that the crowd is most pumped in the 1st and 4th quarters and that is why these show the largest deviations, whereas in the 3rd quarter people are still at the concessions stand buying potato waffles. Unfortunately, potato waffle data isn’t available to me until I get signed to a sports website, so this is just speculation.

So far, so predictable. What you might be surprised by is how small the deviation is. In practice it means that for every 10 false start penalties a home team commits, the visiting team will commit about 11 (5,492 visitor false starts against 5,019 for home teams in this data set).

This is because, as a result of the crowd noise, the offense changes how it starts the play. There are many ways that an offensive player can know when to start a play; the two most common are using verbal communication, as described before, or simply watching the ball and only moving when it does. That is why, when you see a receiver lining up, they often look towards the ball. For a team who cannot hear their quarterback, this will be how they snap the ball.

This is disadvantageous to the offense, as they don’t get a head start on the play and have to be reactive instead of proactive. However, it will significantly cut down on the number of false starts called. Perhaps not having to be conservative is part of the home field advantage.

## Encroachment

Luckily, there’s an easy way of testing whether the visitors are doing more conservative snaps than the home team. A defensive line who thinks they know when the play will start will not wait until they can see the ball moving but instead will simply start charging at the quarterback at the time they’ve guessed the ball’s going to move.

This works magically when done right, like so:


However, when the defense is wrong and jumps too early, this is a foul. If you are a quarterback calling snaps non-conservatively, you will be able to keep the defense confused by changing when and how you snap the ball, and the defense has to choose between getting a bunch of penalties (which is good for your team) or themselves playing conservatively (which is good for your health).

If you agree with my logic that an offense with a conservative snap count will draw fewer penalties, then the visitors getting more defensive line penalties is to be expected. The home team is taking a more aggressive style of snap, and therefore causing the visitors more penalties.

## Third Down Conversion

One of the most important qualities in an offense is their ability to convert third downs into 1st downs. A third down, if converted, will allow an offense to stay on the field and try to score some points. If it is not converted then in most cases the ball is given to the other team’s offense and they try and score.

Like the two above, this is a particularly useful number for judging home field advantage, as it is only weakly biased by any tactical decision of the offense. It is beneficial to convert 3rd down whether you’re down by 10 points, tied or up by 17. This is in contrast to statistics such as interceptions: a team who are losing will generally try more outlandish throws, which are more likely to get intercepted, whereas a winning team will go for easy completions for security and to run out the clock.

So third down conversion is – to a certain extent – an unbiased judge of how well an offense is performing. Let’s compare home and away.

What’s cool about this one is that it seems like there is a true advantage for the home team in the first quarter, then as the game progresses that same pattern holds but gets less significant. Maybe the difference to start with is due to nerves, and during the game players start to get over it and start playing football.

## All the others (and some kind of substance)

Right, you don’t need me going on and on anymore, but I have a confession: I am a statistics tutor and if any of my students see the unsubstantiated claims above and I don’t clarify myself, then why would they listen to me in the future?

So let me put some substance to my claims, but first let me explain how I’m doing so.

Imagine you were to flip a coin 10 times, and got 6 heads. It’d be easy to believe that the coin was still fair and has a 50% chance of showing heads, just that you were a little unlucky when flipping. If you flip the coin 10,000,000 times, however, and get 6,000,000 heads, you would be pretty certain that the coin wasn’t 50/50, even though the same proportion (60%) of flips turned up heads. This is because in the first case we don’t have much data and we can conceive of being unlucky enough to get 6 heads, whereas it is tremendously unlikely that a fair coin would show heads 6,000,000 times when flipped 10,000,000 times. The concept we’re touching on here is that there is an inherent uncertainty whenever we don’t have infinite data, and the less data we have, the higher the uncertainty is.
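To put rough numbers on the two coins, here is a minimal sketch. The function names are made up for illustration, and the z-score uses the standard binomial mean and standard deviation for a fair coin; it is not necessarily the test used elsewhere in this post.

```python
import math

def heads_z_score(heads: int, flips: int, p: float = 0.5) -> float:
    """Standard deviations between the observed heads count and the expected count."""
    expected = flips * p
    std_dev = math.sqrt(flips * p * (1 - p))
    return (heads - expected) / std_dev

def prob_at_least(heads: int, flips: int) -> float:
    """Exact chance that a fair coin shows at least `heads` heads in `flips` flips."""
    favourable = sum(math.comb(flips, k) for k in range(heads, flips + 1))
    return favourable / 2 ** flips

print(round(heads_z_score(6, 10), 2))                  # 0.63
print(round(heads_z_score(6_000_000, 10_000_000), 1))  # 632.5
print(round(prob_at_least(6, 10), 3))                  # 0.377
```

Six heads out of ten is well under one standard deviation from fair and happens more than a third of the time, while the same 60% over ten million flips is hundreds of standard deviations out.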

That means every time above when I have said that one quantity is higher than the other, I have neglected to mention how significant that deviation is. And how significant the deviation is can be very important. To put it in a way you’ll appreciate: Todd Gurley’s average fantasy points per game played (5 games) is 19.56 and Jeremy Langford is at 22.20 (1 game). Now if you have Todd Gurley and your friend offers you Jeremy Langford, would you take him? No, because we don’t know whether Langford just had a good game or was operating at normal capacity – we need more information, more games.

So listed below are the home/visitor quantities I looked at, their uncertainty and its significance. The way to read this is that in the significance column the percentage is the probability that this difference could come about purely by chance – so the smaller the percentage the more likely that this is a real fundamental difference.

| Quantity | Home | Visitor | Uncertainty ($\sigma$) | Significance |
| --- | --- | --- | --- | --- |
| Pass interception % | 2.816% | 2.999% | 0.05% | 3.68$\sigma$ (0.05%) |
| Sack per pass play | 6.326% | 6.494% | 0.0795% | 2.1$\sigma$ (5%) |
| Yards per play | 2.856 | 2.9233 | 0.00306 | 21$\sigma$ (≈0%) |
| Third down conversion % | 39.31% | 37.75% | 0.439% | 3.55$\sigma$ (0.05%) |
| Completion % | 56.41% | 55.25% | 0.161% | 7.5$\sigma$ (0.000000000003%) |
| False starts | 5019 | 5492 | 104.8 | 4.51$\sigma$ (0.0007%) |
| Defensive line penalties | 3111 | 3363 | 80.46 | 3.13$\sigma$ (0.2%) |
| Games won | 57.21% | 42.79% | 0.80% | 8.97$\sigma$ (0.0000000003%) |
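For a few of these rows, the number of $\sigma$ can be reproduced by dividing the home/visitor gap by the quoted uncertainty. The sketch below does that for three rows. The helper names are hypothetical, and turning $\sigma$ into a percentage with a two-sided normal tail is my assumption about how the last column could be read, not a confirmed description of how it was produced.

```python
import math

# (quantity, home, visitor, uncertainty) copied from three rows of the table
ROWS = [
    ("Third down conversion %", 39.31, 37.75, 0.439),
    ("False starts", 5019, 5492, 104.8),
    ("Defensive line penalties", 3111, 3363, 80.46),
]

def n_sigma(home: float, visitor: float, uncertainty: float) -> float:
    """Size of the home/visitor gap measured in units of the uncertainty."""
    return abs(home - visitor) / uncertainty

def chance_probability(z: float) -> float:
    """ASSUMPTION: two-sided normal tail, i.e. P(|Z| > z) for a standard normal Z."""
    return math.erfc(z / math.sqrt(2))

for name, home, visitor, unc in ROWS:
    z = n_sigma(home, visitor, unc)
    print(f"{name}: {z:.2f} sigma, chance probability {chance_probability(z):.2g}")
```

The three $\sigma$ values come out as 3.55, 4.51 and 3.13, matching the table.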

There is a way of comparing these all together on the same graph, and it comes with the disclaimer that this is the most conceptually difficult graph that I will ever put on this site, and that’s a guarantee. However, if you get your head around it, it’ll put that table up there in a better context. If you fancy just ignoring it, skip to the next horizontal line and you won’t have missed too much.

Firstly, what we want to do is define a quantity like so. Let’s take yards per play (YPP) as an example:

$\mathrm{diff} = \frac{YPP_{home} - YPP_{away}}{YPP_{all}}$

The top of this fraction is just the difference between $YPP_{home}$, the YPP for home teams, and $YPP_{away}$, the YPP for away teams. That difference is in units of YPP, but we want to know how big it is relative to a typical value of YPP, so we divide it by the average YPP over every play, $YPP_{all}$.

Don’t let that sink in too much, because in essence it doesn’t matter. The long and the short of it is that the bigger diff is for a quantity, the bigger the effect of being at home is on that quantity.

So on this graph, on the upright axis I will be plotting something like this ‘diff’, except I will make sure that if diff is positive, then it means that it benefits the home team. So for example the visiting teams have more interceptions, and so diff of interceptions should be positive. Also, for each point, I will be showing how big the uncertainty on the point is. This is to show you how significant the deviation is. If the uncertainty is small compared to the distance from the ‘zero’ line, then this is a deviation you can believe in; however, if the uncertainty line and the zero line cross, then the deviation isn’t particularly believable.
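A minimal sketch of that sign-adjusted ‘diff’ and its error bar follows. The overall average isn’t listed in this post, so the code stands in the mean of the home and visitor values for it. That stand-in is an assumption, as are the function names and the choice to scale the quoted uncertainty by the same average.

```python
def home_benefit(home: float, visitor: float, uncertainty: float,
                 higher_is_better: bool) -> tuple[float, float]:
    """Return (diff, error_bar), signed so that positive favours the home team."""
    # ASSUMPTION: mean of home and visitor stands in for the all-plays average
    overall = (home + visitor) / 2
    diff = (home - visitor) / overall
    if not higher_is_better:
        diff = -diff  # e.g. fewer interceptions is good for the home team
    return diff, uncertainty / overall

def is_believable(diff: float, error_bar: float) -> bool:
    """True when the error bar does not reach the zero line."""
    return abs(diff) > error_bar

# Pass interception %: lower is better for the team throwing
print(home_benefit(2.816, 2.999, 0.05, higher_is_better=False))
# Yards per play: higher is better
print(home_benefit(2.856, 2.9233, 0.00306, higher_is_better=True))
```

With these inputs the interception diff comes out positive (about 0.063) and the yards per play diff comes out negative (about -0.023), and in both cases the error bar stays clear of zero.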

Anyway, here is that graph:

So we see that home field advantage does actually produce a statistically significant difference in how well a team plays. The largest deviations are in the quantities you’d expect: false starts and defensive line penalties. I think this is interesting, as false starts are kind of seen as the “fan’s penalties”; however, it’s never properly emphasised that the fans protect the defensive line from penalties too, and this effect is just about as large. Another interesting point is that the only quantity in which the home team is disadvantaged is yards per play. I’d argue that this is because this statistic is very strongly tied to how well a team are doing: if a team is in the lead then they’re more likely to run low yardage plays such as high completion passes and inside handoff runs. So since we expect the home side to be in the lead more, we would expect the visitors to be running high yardage plays more.

The largest deviation of all is in ‘wins’ though, which goes to show that as much as I can go through the data looking for home-visitor correlations, it won’t ever match hitting a go route with a perfectly thrown 35 yard touchdown pass in overtime after a 12 point, 2 minute comeback in the NFC Championship. That’s just clouded in uncertainty.