Passes make up the bulk of recordable events in a football match, usually
numbering in the hundreds, whereas goals and shots rarely surpass 7 and 30
respectively. We football fans have become accustomed to seeing so-called
“passing statistics”, such as Xabi Alonso completing a record number of passes
with Bayern Munich or Sergio Busquets having 99% passing accuracy in a match.
When Arsenal recently signed Granit Xhaka, the media’s coverage was dominated
by figures showing that he was amongst the top 5 in the Bundesliga’s
“Completed Passes” tables. That’s all very well, but with the richness of
information available, are we truly limited to such (no offense) obvious
conclusions and interpretations?
The recorded information on passes is now highly detailed: not only the
“passer” and “recipient”, but also the time at which the pass occurred, its
start and end coordinates, and its type (long, short, aerial, through ball,
etc.). Surely there is more information to be uncovered.
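Gyarmati, Kwak and Rodriguez (2014) have a creative take on the problem. They
organise the pass data into what they call 3-pass-long motifs: sequences of 3
consecutive passes, classified by the pattern of players involved regardless of
who those players are. Xavi → Iniesta → Xavi → Messi and Busquets → Xavi →
Busquets → Messi, for example, both count as ABAC. Since a player cannot pass
to himself, consecutive letters always differ, which leaves 5 motifs in total: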
- ABCD
- ABCB
- ABCA
- ABAC
- ABAB
Take a moment to make sure you understand exactly how the information is being
processed. In the end, the authors are left with a tally for each match of how
many times each of the 5 motifs occurred.
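The tallying step can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names `motif_of` and `count_motifs` are made up here, and it assumes each possession arrives as an ordered list of the players who touched the ball, with every overlapping window of 3 passes counted once. The paper may segment possessions differently.

```python
from collections import Counter

def motif_of(players):
    """Label 4 consecutive players (3 passes) by order of first appearance."""
    labels = {}
    out = []
    for p in players:
        if p not in labels:
            # First new player becomes "A", the next "B", and so on.
            labels[p] = "ABCD"[len(labels)]
        out.append(labels[p])
    return "".join(out)

def count_motifs(possession):
    """Tally motifs over every window of 3 consecutive passes (assumed rule)."""
    counts = Counter()
    for i in range(len(possession) - 3):
        counts[motif_of(possession[i:i + 4])] += 1
    return counts

# Hypothetical possession: 4 passes, so 2 overlapping 3-pass windows.
example = ["Xavi", "Iniesta", "Xavi", "Messi", "Busquets"]
print(dict(count_motifs(example)))  # {'ABAC': 1, 'ABCD': 1}
```

Summing these counters over all possessions in a match gives the per-match tally described above.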
The authors’ reasoning is that the distribution of motifs for different teams
will reveal inherent information about each team’s playing style. It seems like
a reasonable intuition if we consider, for example, that ABCD is a direct
build-up sequence involving 4 different players, while ABAB most likely reveals
a patient build-up where 2 players pass the ball back and forth in the style we
usually attribute to Barcelona or Bayern Munich.
Indeed,
Barcelona seem to make extensive use of “patient” sequences like ABAC, ABCB and
ABAB when compared to other teams; and significantly less use of the “direct”
motif ABCD.
NOTE: Remember that when the data are in 2 dimensions we can visualise them and
observe this kind of result directly, but the true value of these methods lies
in analysing data in more than 3 dimensions.
Viewing each team as a vector in 5 dimensions, where each entry corresponds to
the z-score of one of the five motifs, the authors performed a cluster analysis,
which yielded 4 natural groups of teams:
NOTE: The final league standings are in parentheses for context.
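To make the "team as a 5-dimensional vector" idea concrete, here is a rough sketch of the standardisation step. How the authors compute their z-scores is not spelled out here, so this assumes the simplest reading: each motif count is standardised against the mean and standard deviation of that motif across all teams. The function name `zscore_vectors` and the team labels and counts are hypothetical.

```python
from statistics import mean, pstdev

MOTIFS = ("ABAB", "ABAC", "ABCA", "ABCB", "ABCD")

def zscore_vectors(team_counts):
    """Turn per-team motif counts into 5-dimensional z-score vectors.

    Assumption: z-scores are taken across teams, motif by motif.
    """
    teams = list(team_counts)
    columns = []
    for motif in MOTIFS:
        values = [team_counts[t][motif] for t in teams]
        mu, sigma = mean(values), pstdev(values)
        # A motif with no spread across teams carries no signal: use 0.0.
        columns.append([(v - mu) / sigma if sigma else 0.0 for v in values])
    return {t: [col[i] for col in columns] for i, t in enumerate(teams)}

# Made-up counts for two placeholder teams, purely for illustration.
counts = {
    "team_x": {"ABAB": 10, "ABAC": 30, "ABCA": 20, "ABCB": 25, "ABCD": 60},
    "team_y": {"ABAB": 20, "ABAC": 50, "ABCA": 20, "ABCB": 35, "ABCD": 40},
}
vectors = zscore_vectors(counts)
```

The resulting vectors can then be handed to any standard clustering routine; which one the authors used is not assumed here.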
Lopez and Sanchez (2015) build on the passing motifs approach in their article
“Who can replace Xavi?” and turn their attention to individual players, looking
at which role each player fulfils within the team’s passing sequences. A player
can be the “A” in an ABAB sequence (Xavi → Iniesta → Xavi → Iniesta), or he can
be the “B” (Iniesta → Xavi → Iniesta → Xavi). Similarly, he can be the “A” in an
ABAC sequence (Xavi → Busquets → Xavi → Messi), or the “B” (Busquets → Xavi →
Busquets → Messi), and so on. Counting one role per distinct letter in each
motif (2 for ABAB, 3 each for ABAC, ABCA and ABCB, and 4 for ABCD) gives 15
possible roles in total.
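The role bookkeeping can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: the names `ROLES`, `role_counts` and `role_vector` are invented for this example, possessions are assumed to be ordered lists of player names, and raw counts are used without any normalisation the authors may apply.

```python
from collections import Counter

MOTIFS = ("ABAB", "ABAC", "ABCA", "ABCB", "ABCD")

# One role per distinct letter in each motif: 2 + 3 + 3 + 3 + 4 = 15.
ROLES = [(m, letter) for m in MOTIFS for letter in sorted(set(m))]

def motif_of(players):
    """Label 4 consecutive players (3 passes) by order of first appearance."""
    labels = {}
    for p in players:
        labels.setdefault(p, "ABCD"[len(labels)])
    return "".join(labels[p] for p in players)

def role_counts(possessions):
    """Map each player to a Counter of the (motif, letter) roles he played."""
    roles = {}
    for possession in possessions:
        for i in range(len(possession) - 3):
            window = possession[i:i + 4]
            motif = motif_of(window)
            # Each distinct player in the window holds exactly one letter.
            for player, letter in set(zip(window, motif)):
                roles.setdefault(player, Counter())[(motif, letter)] += 1
    return roles

def role_vector(counter):
    """Flatten a player's role counts into a 15-entry list ordered as ROLES."""
    return [counter[role] for role in ROLES]

# Hypothetical data: a single ABAB exchange.
roles = role_counts([["Xavi", "Iniesta", "Xavi", "Iniesta"]])
xavi = role_vector(roles["Xavi"])
```

Here Xavi's vector has a 1 in the ("ABAB", "A") slot and Iniesta's has a 1 in the ("ABAB", "B") slot, with zeros everywhere else.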
This interpretation allows each player to be viewed as a vector in a
15-dimensional space, and the geometry of this space (the distances between
players) allows plenty of questions to be answered. A cluster analysis can
again be performed, showing which players can fulfil similar roles within a
team’s passing combinations. Think of the applications this has for
recruitment!
In answering their question of who can replace Xavi within Barcelona’s passing
combinations, Lopez and Sanchez draw up a list of the 20 players geometrically
closest to Xavi’s 15-dimensional passing motif feature vector (using data from
the previous 3 and 5 seasons of La Liga and the Premier League respectively):
Image taken directly from Lopez
and Sanchez (2015)
I really enjoyed the two articles I presented here, and not because I believe
their methodology to be the “be all and end all” of investigating playing style
or player recruitment, even though they do provide some valuable insight. The
real reason I like them is that they are the perfect example of how applying
math to football problems is not about the brute computing force of computers
and algorithms, as some might think. Rather, it requires skill, creativity and
even good old-fashioned “football” sense to quantify and aggregate raw data
into useful and manageable forms, and then to apply methodologies whose
outcomes can be tangibly interpreted in the football context.
I think there is more to come from these recent approaches. Stay tuned.
REFERENCES:
- López Peña, J. and Sánchez Navarro, R., 2015. Who can replace Xavi? A passing motif analysis of football players. arXiv preprint arXiv:1506.07768.
- Gyarmati, L., Kwak, H. and Rodriguez, P., 2014. Searching for a unique style in soccer. arXiv preprint arXiv:1409.0308.