The World Cup of pointless statistics
In an attempt to create a neat tie-in with the ongoing World Cup, I set out to write a blog about useless statistics in both football and cybersecurity. Partly because it’s an interesting comparison and partly to give my wife some relief from hearing me talk about football. The blog has largely turned out to be a rant about the use of xG in football, with a tenuous link to cybersecurity tacked on at the end.
Football, like most sports, brings with it a plethora of statistics. Statistics in modern sport have two primary purposes:
Encouraging people to gamble on increasingly obscure in-game events.
Encouraging people to argue on Twitter.
From a sporting perspective, most statistics about football are largely meaningless. In each game, the only statistic that really matters is how many goals each team scores. Details like percentage possession, the number of passes completed, the number of fouls and the number of shots on target can help to provide context about how the game was played, but one moment of genius that leads to a goal is infinitely more valuable than all of them.
FIFA have introduced a number of new statistics for the 2022 World Cup, most of which appear to be both pointless and impossible to understand. The new statistics include the number of times a player receives the ball between midfield and defensive lines and the number of “midfield line breaks”. I’m going to make a bold prediction now and say that there will be close to zero correlation between the teams that perform well in these metrics and the teams that go far in the tournament.
The most useless of all football statistics is a relatively new innovation - Expected Goals (xG). xG attempts to calculate the number of goals each team would be expected to score from their attempts on goal based on various factors (such as the distance to goal, angle of the shot and positioning of defenders). Slightly dubiously, this statistic is then used to suggest whether a team should have scored more or fewer goals than they actually did. Very dubiously, it’s used to suggest which team played better.
Whilst the calculations may well be statistically valid, the use of xG as evidence of virtually anything renders it useless. It fails to appreciate many of the factors that make football interesting - the different styles of play (several wasteful, speculative shots from distance are likely to generate more xG than patient build-up play resulting in one clear chance), the pressure on the players at any given moment of the game, the decision making leading up to a chance (was the ball passed to the player in the best position to score?), the quality of the goalkeeper (attackers may attempt a more difficult finish against a better goalkeeper), injuries to players, plus a thousand other factors that could influence human ability to make decisions and execute a technical skill - from the volume of the crowd to the colour of the goalkeeper’s shirt.
In vague defence of xG, over the course of a full season it may be possible to use xG to identify whether an individual player has performed well or badly in front of goal. We can assume a player with more actual goals than xG is a good finisher, while a player with fewer actual goals than xG is wasteful. Even so, this can be misleading for many of the reasons listed above.
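For the spreadsheet-inclined, the arithmetic can be sketched in a few lines of Python. This is a minimal illustration, assuming some model has already assigned each shot a probability of being scored. The function names and every number below are made up for the example and are not real match data.

```python
def total_xg(shot_probabilities):
    """Sum the per-shot scoring probabilities to get expected goals."""
    return sum(shot_probabilities)


def finishing_delta(goals, shot_probabilities):
    """Actual goals minus xG: positive hints at good finishing, negative at waste."""
    return goals - total_xg(shot_probabilities)


# Hypothetical values: ten speculative long-range efforts vs one clear chance.
speculative = [0.03] * 10
clear_chance = [0.25]

print(round(total_xg(speculative), 2))
print(round(total_xg(clear_chance), 2))

# A player who scores from that single clear chance "overperforms" their xG.
print(round(finishing_delta(1, clear_chance), 2))
```

With these invented numbers, the ten hopeful punts add up to more xG than the one good chance, which is exactly the distortion described above.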
In the recent World Cup game between Argentina and Saudi Arabia, the xG was Argentina 2.29 - Saudi Arabia 0.15. All of Argentina’s other metrics were positive - more possession, more passes, better passing accuracy, more shots, more shots on target, fewer fouls committed, fewer yellow cards. Saudi Arabia won 2-1. It was exciting because one of the tournament favourites - and the best player in history - were beaten by a massive underdog, not because the statistics were weird.
Regardless of its limitations, my primary issue with xG is philosophical. xG seeks to reduce football to an exercise in statistics. Football derives its joy from the fact that it’s a game played by humans who bring with them both mistakes and moments of genius. I have no idea whether Martyn Woolford underperformed or overperformed vs xG over the course of his career, but I do remember how I felt after his goal in the 85th minute of the League One Play-off Final at Wembley in 2009. Nobody cried tears of joy that day because of the statistics.
Football is a game played on a football pitch, not in spreadsheets. Brian Clough famously said that “if God wanted us to play football in the clouds, he’d have put grass up there.” Similarly, Microsoft chose not to put grass in Excel.
We can learn a lesson about cybersecurity metrics here. Primarily this: cybersecurity happens in real life, not in spreadsheets or slide decks. It’s impacted by real humans who are unpredictable. There are thousands of factors outside of our control. We must not allow metrics to cloud the ultimate question: did we get breached or not?
To the best of my knowledge, no metric has ever directly prevented a breach. Yet we often persevere with the time-consuming collection of dozens of metrics, which we then spend hours arguing about. Very rarely do we use metrics to help us make decisions. When was the last time a metric drove a genuine reduction of risk in your organisation? How often do those metrics tell us something useful? And, most worryingly, do those metrics plaster over gaps in our defences?
Just like you can’t win a football game with statistics, you can’t succeed in cybersecurity based on metrics. However, they can be useful if we use them effectively. The following principles are important:
Ultimately, the number of breaches is the only metric that matters. Not being able to detect breaches is like a football team playing without knowing how many goals the opposition have scored.
Only use metrics that genuinely tell you something about how well you are preventing breaches. If you can’t explain in one sentence how the thing being measured prevents breaches, the metric is likely to be pointless.
Don’t allow metrics to generate a false sense of security. A football team can win every tackle and face no shots for 89 minutes, but will still concede a last-minute goal if the defenders stop concentrating.
Above all, we should remember that statistics can never give us the full picture. Use metrics as evidence, allow them to illustrate your argument, but never let statistics become the story.