Unai Emery: the xG whisperer

Tags: football, statistics, Unai Emery, Aston Villa, Premier League
Part one: Why should anyone care about xG, and which managers consistently outperform xG?
Author: John Knight

Published: 5 January 2026

Aston Villa are having an incredible season. Their first four Premier League games resulted in two points and no goals. They followed that up with a stinker at Sunderland where they struggled to a draw against a team that had 10 men for an hour. At that stage, a relegation battle looked more likely than a European spot - let alone a title challenge.

And then they embarked on a truly absurd run of results: 17 wins from their next 20 games in all competitions, including a sequence of nine consecutive one-goal wins. Villa’s title hopes took a huge blow with the recent 4-1 thrashing at Arsenal, but the fact they were even in the title conversation was a ridiculous turn of events.

Now, anyone who has taken the slightest glance at the underlying statistics of this year’s Premier League will be aware that Villa have been sustaining a monumental overperformance relative to expected goals (xG). After 20 games, they have the joint second-most points, level with Man City, but only the 14th-best xG difference.

Premier League 2025-2026, as of 5 Jan, 2026

| Team | Pld | Pts | xG Diff | GD |
|---|---|---|---|---|
| Arsenal | 20 | 48 | 21.5 | 26 |
| Manchester City | 20 | 42 | 15.8 | 26 |
| Aston Villa | 20 | 42 | −5.3 | 9 |
| Liverpool | 20 | 34 | 8.6 | 4 |
| Chelsea | 20 | 31 | 6.4 | 11 |
| Manchester United | 20 | 31 | 9.0 | 4 |
| Brentford | 20 | 30 | 4.1 | 4 |
| Sunderland | 20 | 30 | −11.4 | 2 |
| Newcastle United | 20 | 29 | 6.9 | 4 |
| Brighton & Hove Albion | 20 | 28 | 4.1 | 3 |
| Fulham | 20 | 28 | −3.3 | −1 |
| Everton | 20 | 28 | −6.9 | −2 |
| Tottenham Hotspur | 20 | 27 | −6.3 | 4 |
| Crystal Palace | 20 | 27 | 5.5 | −1 |
| AFC Bournemouth | 20 | 23 | 1.3 | −7 |
| Leeds United | 20 | 22 | 0.1 | −7 |
| Nottingham Forest | 20 | 18 | −5.2 | −14 |
| West Ham United | 20 | 14 | −14.0 | −20 |
| Burnley | 20 | 12 | −21.2 | −19 |
| Wolverhampton Wanderers | 20 | 6 | −9.9 | −26 |
xG data from FotMob.
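
The gap between league position and underlying numbers can be checked directly from the table. The snippet below is a small sketch, not part of the original analysis: it assumes "xG Diff" means expected goals for minus expected goals against, and it breaks ties by the order in which teams are listed.

```python
# (team, points, xG difference, goal difference), copied from the table above
table = [
    ("Arsenal", 48, 21.5, 26), ("Manchester City", 42, 15.8, 26),
    ("Aston Villa", 42, -5.3, 9), ("Liverpool", 34, 8.6, 4),
    ("Chelsea", 31, 6.4, 11), ("Manchester United", 31, 9.0, 4),
    ("Brentford", 30, 4.1, 4), ("Sunderland", 30, -11.4, 2),
    ("Newcastle United", 29, 6.9, 4), ("Brighton & Hove Albion", 28, 4.1, 3),
    ("Fulham", 28, -3.3, -1), ("Everton", 28, -6.9, -2),
    ("Tottenham Hotspur", 27, -6.3, 4), ("Crystal Palace", 27, 5.5, -1),
    ("AFC Bournemouth", 23, 1.3, -7), ("Leeds United", 22, 0.1, -7),
    ("Nottingham Forest", 18, -5.2, -14), ("West Ham United", 14, -14.0, -20),
    ("Burnley", 12, -21.2, -19), ("Wolverhampton Wanderers", 6, -9.9, -26),
]


def rank_by(rows, column):
    """Return {team: rank}, where rank 1 is the highest value in that column.

    Ties keep the table's listing order (an assumption, not an official rule).
    """
    ordered = sorted(rows, key=lambda row: row[column], reverse=True)
    return {row[0]: position for position, row in enumerate(ordered, start=1)}


points_rank = rank_by(table, 1)
xgd_rank = rank_by(table, 2)

# Goal difference minus xG difference: positive means results ahead of chances
gd_minus_xgd = {team: round(gd - xgd, 1) for team, _, xgd, gd in table}
biggest_overperformer = max(gd_minus_xgd, key=gd_minus_xgd.get)

print(points_rank["Aston Villa"], xgd_rank["Aston Villa"])
print(biggest_overperformer, gd_minus_xgd[biggest_overperformer])
```

On these figures Villa's goal difference runs further ahead of their xG difference than any other team's, which is the overperformance the rest of this article tries to explain.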

Depending on your slant, this may be because Unai Emery is a tactical savant who defies the laws of xG, or because Aston Villa have experienced some outrageously good fortune and are due for a big regression to the mean. The truth probably lies somewhere between those two poles, but why should we even care about xG in the first place?

Despite having been around for more than a decade now, xG continues to suffer from a serious PR problem. In statistical circles you will often hear George Box’s maxim, “all models are wrong, but some are useful”. And this really is an important concept to digest.

Just because an xG model says a chance was worth 0.4 goals, or that a team had 1.3 goals’ worth of chances during a game, does not make that some sort of ground truth. Probability is an abstract concept. Anyone can watch the game with their eyes and judge that a chance is better, or worse, than the xG model rates it.
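
To make the arithmetic concrete, here is a minimal sketch with made-up shot values (not real match data). A team's xG is simply the sum of its shot probabilities, and if we additionally assume the shots are independent (a simplification), we can work out how likely each scoreline is.

```python
from math import prod

# Hypothetical shot probabilities for one team in one match (not real data)
shot_xg = [0.4, 0.3, 0.25, 0.2, 0.1, 0.05]

# Team xG is the sum of the individual chance values
team_xg = sum(shot_xg)

# Chance of scoring nothing at all, assuming shots are independent
p_no_goal = prod(1 - p for p in shot_xg)


def goal_distribution(probabilities):
    """Probability of scoring exactly 0, 1, 2, ... goals from independent shots."""
    dist = [1.0]  # before any shot, zero goals is certain
    for p in probabilities:
        updated = [0.0] * (len(dist) + 1)
        for goals, mass in enumerate(dist):
            updated[goals] += mass * (1 - p)   # shot missed
            updated[goals + 1] += mass * p     # shot scored
        dist = updated
    return dist


print(round(team_xg, 2), round(p_no_goal, 3))
print([round(p, 3) for p in goal_distribution(shot_xg)])
```

With these invented numbers the team has 1.3 xG yet still fails to score in roughly one match in five, which is why a single game's xG should be read as a spread of plausible outcomes rather than a verdict.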

But is xG useful? That’s really up to you, and whether you find a use for it. One way to judge xG’s usefulness is to look at whether it is predictive.

Assessing xG’s predictive power

My dataset for this study used FBRef data from the ‘big five’ European leagues (England, Spain, Italy, Germany and France) in the seasons 2017-2018 to 2024-2025. FBRef uses Opta’s xG model; there are others to choose from, and Python Football Review made a nice comparison. Overall, I don’t sense that the choice of xG model is going to make or break this study.

To assess predictive power, I used a snapshot of each team’s points, goal difference (GD), and expected goal difference (xGD) after each game of each season, and predicted their points-per-game (PPG) over the remainder of the season.
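
The snapshot idea can be sketched as follows. This is an illustrative reconstruction rather than the code used for the study: the function name, the input layout, and the four-match "season" are all hypothetical.

```python
def snapshots(matches):
    """Build one row per matchweek for a single team's season.

    matches: chronological list of (goals_for, goals_against, xg_for, xg_against).
    Each row holds the running points, GD and xGD, plus the points-per-game
    the team went on to earn over the REMAINING matches.
    """
    match_points = [3 if gf > ga else 1 if gf == ga else 0
                    for gf, ga, _, _ in matches]
    season_points = sum(match_points)

    rows = []
    points, gd, xgd = 0, 0, 0.0
    for played, (gf, ga, xgf, xga) in enumerate(matches, start=1):
        points += match_points[played - 1]
        gd += gf - ga
        xgd += xgf - xga
        remaining = len(matches) - played
        if remaining == 0:
            break  # nothing left to predict after the final match
        rows.append({
            "played": played,
            "points": points,
            "gd": gd,
            "xgd": round(xgd, 2),
            "future_ppg": (season_points - points) / remaining,
        })
    return rows


# Hypothetical four-match season, purely for illustration
season = [(2, 0, 1.5, 0.5), (1, 1, 0.8, 1.2), (0, 1, 1.0, 0.9), (3, 1, 2.0, 1.0)]
for row in snapshots(season):
    print(row)
```

Stacking these rows across every team and season gives one pooled table per value of `played`, which is the shape the regressions below need.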

First, I fit three simple linear regressions - one using only points, one using only GD, and one using only xGD - and plotted the \(R^2\) for each model as the number of games played increases. \(R^2\) tells us the proportion of the variance in the dependent variable (in this case, PPG over the rest of the season) explained by each model.

SLR = simple linear regression. Big 5 European leagues 2017 to 2025 (2019-2020 Ligue 1 excluded due to COVID-shortened season).

The lines follow a similar shape: as expected, the predictive power increases as the sample size increases, stabilising after about 10 games. It then declines after around 23 games because, with fewer games remaining, future PPG naturally becomes noisier.

You will notice that xGD is consistently above GD, which in turn is above points. If you knew nothing else about each team, the three measures would give a similar approximation for predicting future points, but xGD is the best.
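
For readers who want to reproduce this kind of curve, a single-predictor fit and its \(R^2\) need nothing more than the textbook least-squares formulas. The sketch below uses pure Python; the helper names and the tiny dataset are hypothetical and are not taken from the study.

```python
def r_squared(x, y):
    """R^2 of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


def r_squared_by_matchweek(rows, predictor):
    """Fit one regression per value of 'played' and return {played: R^2}.

    rows: dicts with 'played', the predictor key, and 'future_ppg',
    pooled across teams and seasons.
    """
    by_week = {}
    for row in rows:
        by_week.setdefault(row["played"], []).append(row)
    return {
        week: r_squared([r[predictor] for r in group],
                        [r["future_ppg"] for r in group])
        for week, group in sorted(by_week.items())
    }


# Hypothetical pooled snapshots after 10 games (not real data)
rows = [
    {"played": 10, "xgd": 6.0, "future_ppg": 2.0},
    {"played": 10, "xgd": 1.5, "future_ppg": 1.6},
    {"played": 10, "xgd": -2.0, "future_ppg": 1.1},
    {"played": 10, "xgd": -5.5, "future_ppg": 1.0},
]
print(r_squared_by_matchweek(rows, "xgd"))
```

Swapping `"xgd"` for a `"gd"` or `"points"` key would give the other two lines of the plot.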

What about when we use all three together? I fit a multiple regression model with all three variables, and plotted each variable’s coefficient, standardized to account for the different units of measurement. The results can be seen below. The second plot shows the marginal \(R^2\) for each variable in a multiple regression setting - in other words, how much extra information do we get by adding this variable to a simpler model involving only the other two?

MLR = multiple linear regression. Big 5 European leagues 2017 to 2025 (2019-2020 Ligue 1 excluded due to COVID-shortened season).

Marginal R² measures the additional proportion of variance explained when the variable is added to the reduced model.

Now we see a more pronounced difference. When the three variables are used in concert, xGD is by far the most useful of the three. These results suggest that xG offers some information that the other two variables do not convey.
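
The two quantities plotted above can be sketched in a few dozen lines. This is an illustration under stated assumptions rather than the study's code: it standardizes both the predictors and the target (one common convention for standardized coefficients), solves the normal equations with a hand-rolled Gaussian elimination, and uses invented data. All function names are hypothetical.

```python
def zscores(values):
    """Centre to mean 0 and scale to standard deviation 1."""
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / sd for v in values]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(columns, y):
    """OLS on standardized data; returns (standardized coefficients, R^2)."""
    zx = [zscores(c) for c in columns]
    zy = zscores(y)
    k, n = len(zx), len(zy)
    # Everything is centred, so no intercept term is needed
    xtx = [[sum(p * q for p, q in zip(zx[i], zx[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(zx[i], zy)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[j] * zx[j][i] for j in range(k)) for i in range(n)]
    ss_res = sum((zy[i] - fitted[i]) ** 2 for i in range(n))
    ss_tot = sum(v * v for v in zy)
    return beta, 1 - ss_res / ss_tot


def marginal_r2(predictors, y):
    """Full-model R^2 minus the R^2 of the model without each variable."""
    _, full = fit(list(predictors.values()), y)
    gains = {}
    for name in predictors:
        others = [col for key, col in predictors.items() if key != name]
        _, reduced = fit(others, y)
        gains[name] = full - reduced
    return gains


# Hypothetical snapshot of eight teams at one matchweek (not real data)
predictors = {
    "points": [20, 15, 14, 12, 10, 9, 7, 4],
    "gd": [12, 3, 5, -1, 0, -4, -6, -9],
    "xgd": [8.0, 6.5, 1.0, 2.0, -1.5, -2.0, -7.0, -7.0],
}
future_ppg = [2.1, 1.9, 1.3, 1.5, 1.1, 1.2, 0.8, 0.7]

coefficients, full_r2 = fit(list(predictors.values()), future_ppg)
print(dict(zip(predictors, coefficients)), full_r2)
print(marginal_r2(predictors, future_ppg))
```

Because points, GD and xGD are strongly correlated with one another, the individual coefficients can swing noticeably between samples; the marginal \(R^2\) is usually the steadier of the two summaries.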

These are simple models of course, and people can (and do!) create far more complex models to assess team strength and predict future performance. But added complexity generally involves diminishing returns, and it is handy to have a simple statistic that can give you a reasonable indication of how good a team is. This is why people tend to use xG as a quick & dirty indicator of which teams have been lucky or unlucky so far each season.

And so we return to Aston Villa. What can we expect from them in the second half of this season? Based on the above, it would be fair to conclude that Villa’s xG difference (14th place) is a better reflection of how good they are than their league position (3rd place). But what about the Unai Emery factor? Is there some sort of magic in Emery’s tactics that enables his teams to defy the xG gods?

Which managers have historically outperformed xG?

I used the same dataset (‘Big 5’ leagues, 2017-2018 to 2024-2025), this time also including UEFA competitions, and compared the cumulative xG for each manager with their actual goals, for and against. The following scatterplot compares each manager’s attacking overperformance (x axis) against defensive overperformance (y axis). So the top right quadrant of this plot is the place to be. Bottom left…not so good.

Big 5 European leagues plus UEFA competitions, 2017 to 2025. Overperformance = difference between actual goals and expected goals.

The names in the desirable upper-right quadrant are certainly some of the names you would identify as elite managers. Interestingly, Pep Guardiola exceeds xG entirely on the attacking end, while Diego Simeone and Carlo Ancelotti tend to excel defensively. To reiterate, this is a measure of goals relative to expected goals; it does not measure whether a team has high or low xG in the first place, although it is obvious that managers in charge of wealthy clubs like Man City or Real Madrid are going to be above average in xG and other measures.
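
The bookkeeping behind the scatterplot is simple to sketch. The version below is an assumption-laden illustration, not the study's code: it signs defensive overperformance as expected goals against minus actual goals against, so that positive is good on both axes, and the manager names and match values are invented.

```python
from collections import defaultdict


def manager_overperformance(matches):
    """Accumulate attacking and defensive overperformance per manager.

    matches: iterable of dicts with keys 'manager', 'goals_for',
    'goals_against', 'xg_for' and 'xg_against'.
    """
    totals = defaultdict(lambda: {"matches": 0, "attack": 0.0, "defence": 0.0})
    for match in matches:
        entry = totals[match["manager"]]
        entry["matches"] += 1
        # Attack: scored more than the chances created were worth
        entry["attack"] += match["goals_for"] - match["xg_for"]
        # Defence: conceded fewer than the chances allowed were worth
        entry["defence"] += match["xg_against"] - match["goals_against"]
    return dict(totals)


def quadrant(entry):
    """Name the scatterplot quadrant for one manager's totals."""
    vertical = "upper" if entry["defence"] >= 0 else "lower"
    horizontal = "right" if entry["attack"] >= 0 else "left"
    return f"{vertical}-{horizontal}"


# Hypothetical matches for two made-up managers (not real data)
matches = [
    {"manager": "Manager A", "goals_for": 2, "goals_against": 0,
     "xg_for": 1.2, "xg_against": 0.9},
    {"manager": "Manager A", "goals_for": 1, "goals_against": 1,
     "xg_for": 0.7, "xg_against": 1.4},
    {"manager": "Manager B", "goals_for": 0, "goals_against": 2,
     "xg_for": 1.6, "xg_against": 1.1},
]
for name, entry in manager_overperformance(matches).items():
    print(name, entry, quadrant(entry))
```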

One problem with using cumulative scores is that they may reward quantity rather than quality. This is why the upper-right quadrant has more outliers, whereas the bottom-left is more bunched: if you are consistently underperforming your xG then it’s unlikely you will continue being employed for a prolonged period of time. To address this, the following plot shows the same measures but standardized to account for the number of games:

Big 5 European leagues plus UEFA competitions, 2017 to 2025. Overperformance = difference between actual goals and expected goals.

The standardized plot reveals some interesting names in each quadrant - maybe I’ll come back to them in a future article. But this study is about Unai Emery, and you will have noticed that Emery occupies a prominent position in the upper-right quadrant, exceeding his xG in both attack and defence. In terms of total overperformance relative to xG during the study period, Emery sits third:

Top 10 managers by total xG overperformance
Big 5 leagues plus UEFA competitions, 2017-2025

| Manager | Matches | Att | Def | Total | per Match |
|---|---|---|---|---|---|
| Pep Guardiola | 390 | 109.2 | 0.7 | 109.9 | 0.28 |
| Diego Simeone | 378 | 39.0 | 68.8 | 107.8 | 0.29 |
| Unai Emery | 359 | 59.8 | 45.5 | 105.3 | 0.29 |
| Carlo Ancelotti | 341 | 29.8 | 61.7 | 91.5 | 0.27 |
| Ernesto Valverde | 251 | 41.0 | 45.0 | 86.0 | 0.34 |
| Jürgen Klopp | 341 | 41.6 | 34.6 | 76.2 | 0.22 |
| Bruno Génésio | 256 | 48.1 | 26.5 | 74.6 | 0.29 |
| Hansi Flick | 128 | 49.4 | 14.8 | 64.2 | 0.50 |
| Mauricio Pochettino | 224 | 44.5 | 17.8 | 62.3 | 0.28 |
| Mikel Arteta | 258 | 42.8 | 19.4 | 62.2 | 0.24 |
Overperformance = difference between actual goals and expected goals. Own goals not included.
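
The per-match column is just the total divided by the number of matches, and re-sorting on it shows how much the cumulative ranking depends on longevity. The snippet below recomputes it from the table's own figures; it only re-ranks these ten managers, not the full dataset.

```python
# (manager, matches, attack, defence, total), copied from the table above
top10 = [
    ("Pep Guardiola", 390, 109.2, 0.7, 109.9),
    ("Diego Simeone", 378, 39.0, 68.8, 107.8),
    ("Unai Emery", 359, 59.8, 45.5, 105.3),
    ("Carlo Ancelotti", 341, 29.8, 61.7, 91.5),
    ("Ernesto Valverde", 251, 41.0, 45.0, 86.0),
    ("Jürgen Klopp", 341, 41.6, 34.6, 76.2),
    ("Bruno Génésio", 256, 48.1, 26.5, 74.6),
    ("Hansi Flick", 128, 49.4, 14.8, 64.2),
    ("Mauricio Pochettino", 224, 44.5, 17.8, 62.3),
    ("Mikel Arteta", 258, 42.8, 19.4, 62.2),
]

# Unrounded rate for sorting, rounded rate for display
rate = {name: total / matches for name, matches, _, _, total in top10}
per_match = {name: round(value, 2) for name, value in rate.items()}

# Re-rank the same ten managers by overperformance per match
by_rate = sorted(rate, key=rate.get, reverse=True)

for name in by_rate:
    print(f"{name:<22}{per_match[name]:.2f}")
```

Among these ten, Hansi Flick and Ernesto Valverde jump to the top on a per-match basis, while Emery stays third on both measures.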

It is worth clarifying: none of this is necessarily causal evidence of being a good or bad manager. The big names generally manage the richest teams with the best players, and it could end up as a self-fulfilling prophecy if we anoint them as great managers because those players provide an edge at both ends of the pitch.

Nonetheless, it is always reassuring when the consensus best managers such as Guardiola show up highly on a metric, and it is worth adding that while Unai Emery has managed some big clubs (PSG, Arsenal) during the study period, a large chunk of his games have been with Villarreal and Aston Villa.

To recap part one, we have shown that xG is a good predictor of future performance, and we have seen that Unai Emery has outperformed his overall xG in previous seasons. In the next part, I will break down Emery’s previous stints by club, and drill down a little bit on where this xG overperformance comes from, including which players have been the hottest finishers under Emery, whether goalkeepers have played a big role, and how Emery teams vary according to game state.

© 2025 John Knight. All rights reserved.