Introducing contexG

Tags: football, statistics
A composite, context-adjusted rating to summarise each team’s performance in a football match.
Author: John Knight

Published: 28 February 2026

I have created a new football metric called contexG. What is it, and why did I build it?

When discussing expected goals (xG) among football fans, it is not uncommon to encounter a degree of scepticism. Some common complaints might be along the following lines:

“There’s no way that chance was only 0.18 xG.”

“xG just doesn’t reflect the great situations we were in.”

“But we had 10 men and we were defending the lead for an hour.”

And these are perfectly valid criticisms! xG is just a model, which people may or may not find a use for, and that’s ok. I have always advocated looking at xG alongside other statistics such as attacking possession, total shots, corners, and importantly, game state: Who was leading? Were there any red cards?

This was the motivation for contexG. I wanted an xG-like, single score for each match that accounted for more than just the quality of the shots taken.

xG models are trained by analysing a large dataset of shots and predicting whether a given shot will result in a goal, based on the location and other factors. Indirectly, football analysts have found xG useful as a proxy for team strength, because it removes the noise of finishing luck.

An important difference with contexG is that I trained it on future team performance. Team A may have created more xG in a match than Team B, but sometimes it feels like Team B’s xG contains a greater signal of being a strong team (obvious example: many people ignore xG from penalty kicks). The task was to create a model that appropriately weights match data to reflect those underlying signals based on historical data.

Building the model

There are different ways to approach this, but the method I chose was to create a walk-forward model that uses Poisson likelihood to model home and away goals, where expected goals are determined by a baseline scoring rate, home advantage, and evolving attack and defence ratings for each team.
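As a rough sketch, the structure described above might look something like this in Python. The baseline rate, home advantage, and learning rate are illustrative placeholder values, and the function names are my own; the real model fits these parameters from data.

```python
import math

# Minimal sketch of the walk-forward Poisson structure described above.
# All parameter values are illustrative placeholders, not the fitted ones.
BASE_RATE = 1.35   # assumed league-average goals per team per match
HOME_ADV = 0.25    # assumed home advantage on the log scale

def expected_goals(att, opp_def, home):
    """Expected goals for one team: baseline, home advantage, and ratings."""
    return math.exp(math.log(BASE_RATE) + (HOME_ADV if home else 0.0) + att - opp_def)

def poisson_log_likelihood(goals, lam):
    """Poisson log-likelihood of an observed goal count given expectation lam."""
    return goals * math.log(lam) - lam - math.lgamma(goals + 1)

def update_rating(rating, performance_signal, lr=0.1):
    """Walk-forward update: nudge a rating toward the match's performance signal."""
    return rating + lr * (performance_signal - rating)

# A neutral fixture: both teams at league-average ratings (0.0).
lam_home = expected_goals(att=0.0, opp_def=0.0, home=True)
lam_away = expected_goals(att=0.0, opp_def=0.0, home=False)
```

The key point is that the performance signal fed into `update_rating` is learned from match-level features, rather than being the goals scored in that match.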

The ratings update after each game using a learned performance signal constructed from match-level features. Originally this used the Fotmob match variables, but since Fotmob recently decided to defend their API aggressively with Cloudflare (no complaints - it’s their right to do that), I have switched my data source to Understat. It’s a smaller set of variables, but it’s questionable how much incremental gain you get from adding more and more highly collinear features, and I’m fairly happy with the model’s output.

So, what do I mean by performance? The main elements that went into the model can be categorised into three buckets:

Shots

This is the xG component of the model, since every shot has an attached xG (whose value comes from Understat’s xG model). My hypothesis was that high-xG chances can be quite noisy, and there’s more luck involved in creating a single 0.8 xG chance than, say, eight 0.1 xG chances - think of a goalkeeper spilling a long-range shot and leaving an open goal.

Using total shots would be an option, and I actually think that can be a useful metric - it’s what we used for gambling models way back when xG wasn’t public - but then that shrinks all shots to the same value, even hopeless potshots from distance. Perhaps there is a nice compromise to be found by shrinking or truncating the value of each shot?

Dominance

This is where Understat’s data is a bit less rich than Fotmob’s. Nonetheless, they have Deep Completions which, if you could choose just one stat, wouldn’t be a bad one to have. They also report PPDA (passes allowed per defensive action), which is arguably more of a stylistic indicator than something that obviously reflects good attacking or defensive play, but it still seemed worthy of inclusion.

Game State

The important factor that raw xG always misses. Given a set of match stats, if you are then told that one team led from the 5th minute onwards, or that one team had 10 men for an hour, would it change your opinion of how the game went? Of course it would! So I calculated the game state in every minute of every match and included that in the model. Remember, this is when training the model how to update each team’s rating for predicting future games, NOT training it on the result of the same game.
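A minute-by-minute game state calculation can be sketched as follows. The event format and state encoding here are my own illustration, not the model’s exact representation.

```python
# Illustrative per-minute game state, assuming simple (minute, side) event lists.
# The encoding (score difference, man advantage) is my own sketch.

def game_state_by_minute(goals, red_cards, total_minutes=90):
    """Return (score_diff, man_advantage) for the home team in each minute.

    goals: list of (minute, 'home'|'away'); red_cards: same format.
    """
    states = []
    score, men = 0, 0
    for minute in range(1, total_minutes + 1):
        for m, side in goals:
            if m == minute:
                score += 1 if side == 'home' else -1
        for m, side in red_cards:
            if m == minute:
                men += -1 if side == 'home' else 1
        states.append((score, men))
    return states

# Home team leads from the 5th minute; away team loses a man in the 60th.
states = game_state_by_minute(goals=[(5, 'home')], red_cards=[(60, 'away')])
```

The model then sees, for every minute, who was ahead and whether either side was a man down, rather than only the final score line.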

What did the model say about the relative importance of each stat?

The model was trained by first tuning hyperparameters, applying an appropriate level of regularisation to prevent overfitting, and preserving time order to prevent leakage. The final parameters were interesting but fairly intuitive.
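The leakage-free, time-ordered evaluation can be sketched like this. The `fit` and `score` callables stand in for the real model; the data here is a toy ordered list.

```python
# Hedged sketch of a walk-forward evaluation: each match is predicted using
# only matches that came strictly before it, so time order prevents leakage.

def walk_forward_eval(matches, fit, score, min_train=100):
    """Average error over matches, training only on strictly earlier matches."""
    errors = []
    for i in range(min_train, len(matches)):
        model = fit(matches[:i])            # past matches only: no leakage
        errors.append(score(model, matches[i]))
    return sum(errors) / len(errors)

# Toy example: the "model" is just the running mean of past home goals.
matches = [{'home_goals': g} for g in [1, 2, 0, 3, 1, 2, 1, 0, 2, 1, 4, 0]]
fit = lambda past: sum(m['home_goals'] for m in past) / len(past)
score = lambda mean, match: abs(match['home_goals'] - mean)
mae = walk_forward_eval(matches, fit, score, min_train=5)
```

Hyperparameter tuning then wraps this loop: each candidate configuration is scored on its walk-forward error, never on matches it has already seen.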

I tried various methods of transforming xG, including log, square root, or simply capping xG at a certain threshold. It was the capped xG that won out - perhaps surprisingly, the optimal cap was found to be just 0.15 xG! In other words, every shot of 0.15 xG or higher was counted as 0.15. I’m comfortable with this result: shot volume is the most important factor, since high-xG chances are quite noisy, but you also can’t count any old junk the same. This method means the number of at least fairly decent shots a team can create is probably most indicative of its attacking quality. And vice versa for defence, of course.

In terms of shot characteristics, the results are what you might expect: open-play chances are more valuable than set-piece chances, which are more valuable than penalties (remember, this is after chances have already been truncated to a 0.15 xG maximum).

Off-target shots are more valuable than saved shots, which are more valuable than blocked shots. I’m not sure that is the order I would have guessed, but recall that this is when comparing shots that have the same xG: perhaps a shot that gets saved betrays some information about the quality of the chance (i.e. easier to save), more so than when it goes wide or over the bar? Without knowing the full nuts & bolts of the Understat xG model, it is hard to speculate.
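Putting the shot findings together, each shot’s contribution might be computed roughly like this. The 0.15 cap comes from the fitted model described above; the situation and outcome multipliers below are placeholder numbers that only respect the stated orderings (open play > set piece > penalty; off-target > saved > blocked), not the actual fitted coefficients.

```python
# Illustrative shot valuation: cap the xG, then weight by situation and outcome.
XG_CAP = 0.15  # from the fitted model; weights below are placeholders

SITUATION_WEIGHT = {'open_play': 1.0, 'set_piece': 0.7, 'penalty': 0.4}
OUTCOME_WEIGHT = {'goal': 1.0, 'off_target': 0.9, 'saved': 0.8, 'blocked': 0.6}

def shot_value(xg, situation, outcome):
    """Capped xG scaled by situation and outcome multipliers."""
    return min(xg, XG_CAP) * SITUATION_WEIGHT[situation] * OUTCOME_WEIGHT[outcome]

# A 0.44 xG open-play shot that was saved counts the same as any other
# saved open-play chance at or above the cap.
v = shot_value(0.44, 'open_play', 'saved')
```

Under this scheme, a 0.8 xG sitter and a 0.2 xG half-chance contribute identically if they share a situation and outcome, which is exactly the truncation effect described above.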

Deep completions were a very strong indicator of team quality. Nobody should be surprised about this: good teams get the ball into threatening areas, and this is less noisy than whether they turn that into a big chance or a goal.

PPDA was an interesting one, because I wasn’t sure if this would associate more with a team’s attacking rating or defensive rating. Turns out, it’s almost exactly a 50:50 split! But PPDA is definitely an indicator of a good team (low = good; you are stopping the opposition from stringing passes together).

The game state coefficients are as one would expect. The biggest boosts are given to teams who are down to 10 men while leading or drawing, and the biggest reductions go the other way: teams who are trailing or drawing while the other team has 10 men. You are expected to dominate the game in these situations.

Some real contexG examples

Let’s look at some actual matches. There’s no better place to start than Friday night’s Wolves v Aston Villa match. Understat gave the final xG score as Wolves 1.51 - 1.31 Aston Villa. What does contexG say?

contexG, by contrast, comes out in Villa’s favour. So, why the discrepancy? In terms of shots, Villa had 14 to Wolves’ 9. Three of Wolves’ shots were high-xG (Gomes 0.44, Mosquera 0.39, Gomes 0.33) and were truncated at 0.15. Villa only had two high-xG shots: Onana 0.41 and Luiz 0.3. Adding in that Mosquera’s shot was from a set piece, Wolves lose a lot more of their raw xG than Villa do.

Deep completions were 7-1 in Villa’s favour. This gives Villa’s contexG a huge boost. They also had 5.33 PPDA compared to 17.1 for Wolves.

At one point during the match the commentator said “if you watched this match without seeing the league table, you would never guess that it’s Wolves who are down at the bottom”. And maybe you wouldn’t guess that they were 41 points behind Villa going into the match, but in spite of the score, I think on the balance of play you would have guessed that Wolves were the inferior team. Aston Villa, away from home, had the lion’s share of the play.

For an extreme example of contexG’s contextual adjustments, consider Man City 1-1 Chelsea from the 4th of January. The xG in that game was 1.03-2.04 in Chelsea’s favour, but contexG scores it as 1.71-0.91 in City’s favour.

Why? Firstly, City led the game from the 41st minute, right up until Enzo Fernandez’s last-minute equaliser. This means Chelsea are expected to be chasing the game and creating more attacking output.

Deep completions were 7-2 for City. PPDA was 8.2-13.2. Three of Chelsea’s shots were very high xG: Fernandez 0.95 xG, Fernandez 0.67 xG*, and Neto 0.64 xG. I watched that game, and there was no doubt City threw away two points in a game where they were the better team. This is a good example of contexG’s merits.

*Note that Understat and other sites make an adjustment for two chances in quick succession when calculating match-level xG. I haven’t done that with contexG, but the truncation of individual shots’ xG reduces the associated effect.

Limitations

I’m not claiming to have reinvented the wheel with contexG. It’s just a handy way of summarising what you can already see by browsing the match stats, or even with the eye test by watching a game of football.

I should emphasise that contexG does not account for home advantage or opposition strength. It just reflects what happened on the pitch, much like xG does.

Set pieces are a tricky area. While set pieces are generally considered more noisy, and therefore carry lower signal, there is a danger of underestimating teams that are consistently strong at set pieces when all teams are aggregated in a single model. Could set-piece skill be rated separately, with a different rate of decay compared to open-play ratings? That’s something I will continue to think about, but I am always wary of overfitting.

Finally, but importantly, contexG can only capture what’s in the Understat data. I hope to develop the model in future iterations to incorporate more stats and more competitions, if I can add more data sources. But it’s a rough time for football analysts, and I am grateful that we at least have access to data from the likes of Understat.

Season-long contexG ratings

Having a composite rating for each match allows us to summarise team performance over a season. These are broadly in line with the average xG for each team, but it’s interesting to see where the differences lie.
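For concreteness, a season-long differential can be computed as a simple average of per-match contexG differentials. The function name and data shapes here are my own sketch, not the exact aggregation used.

```python
# Sketch of season-long aggregation, assuming one (for, against) contexG
# pair per match for a given team.

def season_contexgd(match_scores):
    """Average per-match contexG differential (scored minus conceded)."""
    diffs = [scored - conceded for scored, conceded in match_scores]
    return sum(diffs) / len(diffs)

# Three illustrative matches for one team (contexG for, contexG against):
rating = season_contexgd([(1.71, 0.91), (1.2, 1.4), (0.9, 0.6)])
```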

Premier League 2025-26 contexG Ratings
28 Feb, 2026

| Team | Matches | contexGD | xGD |
| --- | ---: | ---: | ---: |
| Arsenal | 28 | 1.12 | 1.25 |
| Manchester City | 28 | 0.94 | 0.80 |
| Chelsea | 27 | 0.66 | 0.62 |
| Liverpool | 28 | 0.57 | 0.55 |
| Manchester United | 27 | 0.39 | 0.62 |
| Brentford | 28 | 0.20 | 0.36 |
| Newcastle United | 28 | 0.15 | 0.27 |
| Crystal Palace | 27 | 0.04 | 0.23 |
| Aston Villa | 28 | 0.03 | −0.13 |
| Bournemouth | 28 | −0.01 | 0.21 |
| Brighton | 27 | −0.05 | −0.05 |
| Fulham | 27 | −0.09 | −0.37 |
| Leeds | 28 | −0.15 | −0.03 |
| Tottenham | 27 | −0.25 | −0.33 |
| Everton | 28 | −0.30 | −0.23 |
| Nottingham Forest | 27 | −0.33 | −0.49 |
| Sunderland | 28 | −0.55 | −0.59 |
| West Ham | 28 | −0.58 | −0.60 |
| Wolves | 29 | −0.75 | −0.79 |
| Burnley | 28 | −1.01 | −1.26 |

Relative to xG, contexG likes Man City, Villa, Fulham, Forest and Burnley. Meanwhile, relative to xG, contexG dislikes Arsenal, Man United, Brentford, Newcastle, Palace, Bournemouth and Leeds.

It’s hard to say exactly why this is the case without digging into all the games. Some of it may be set-piece driven (see Limitations above). Overall though, it tallies quite well with my sense of the quality of the teams.

We can also take a peek at the top teams in the other four major European leagues: France, Germany, Italy and Spain.

Ligue 1 2025-26 contexG Ratings
28 Feb, 2026

| Team | Matches | contexGD | xGD |
| --- | ---: | ---: | ---: |
| Paris Saint Germain | 24 | 1.67 | 1.33 |
| Lille | 23 | 0.73 | 0.47 |
| Lens | 24 | 0.63 | 0.91 |
| Marseille | 23 | 0.57 | 0.57 |
| Lyon | 23 | 0.56 | 0.41 |

Bundesliga 2025-26 contexG Ratings
28 Feb, 2026

| Team | Matches | contexGD | xGD |
| --- | ---: | ---: | ---: |
| Bayern Munich | 24 | 1.81 | 1.95 |
| Borussia Dortmund | 24 | 0.75 | 0.53 |
| Hoffenheim | 24 | 0.69 | 0.25 |
| RB Leipzig | 23 | 0.60 | 0.75 |
| VfB Stuttgart | 23 | 0.49 | 0.48 |

Serie A 2025-26 contexG Ratings
28 Feb, 2026

| Team | Matches | contexGD | xGD |
| --- | ---: | ---: | ---: |
| Inter | 27 | 1.65 | 1.68 |
| Juventus | 26 | 0.90 | 0.92 |
| Roma | 26 | 0.85 | 0.51 |
| Como | 27 | 0.70 | 0.67 |
| AC Milan | 26 | 0.50 | 0.75 |

La Liga 2025-26 contexG Ratings
28 Feb, 2026

| Team | Matches | contexGD | xGD |
| --- | ---: | ---: | ---: |
| Barcelona | 26 | 1.78 | 1.54 |
| Real Madrid | 25 | 1.12 | 1.21 |
| Atletico Madrid | 26 | 0.91 | 0.54 |
| Athletic Club | 26 | 0.30 | 0.35 |
| Real Betis | 25 | 0.26 | 0.37 |

We see some large disparities between contexG and xG, which I am not going to dig into here, but I would be interested if anyone has any thoughts. One thing that is evident is the gulf between the top side in each league - PSG, Bayern, Inter and Barcelona - and the rest.

Moving forward, I plan to write a weekly review of each set of Premier League matches where contexG will help to give a simple numerical overview of each game. I hope you will enjoy reading them.

© 2026 John Knight. All rights reserved.