Review of Graph Theory and Football Prediction Part 1
- Apr 27, 2016
Last week, I chanced upon an interesting slideshow by Prachi Singhal and Umang Aggarwal of Lady Shri Ram College, which analyzes the relationship between graph theory and football prediction.
Here's the link if you wish to explore more:
http://www.slideshare.net/umangaggarwal/football-and-graph-theory
This review is split into 3 parts. The first discusses the existing fundamentals of graph theory. The second reviews 3 factors in prediction identified by Singhal and Aggarwal: Degree Centrality, Betweenness Centrality and Centre of a Graph. The last offers possible suggestions for future studies in football.
However, to present this review in a bite-sized manner and make it easier to follow, this post covers the first part only.
Existing fundamentals of graph theory
For those who are not familiar with graph theory, the following pictures from a mathematics textbook provide the foundations (Goodaire and Parmenter 2002; Singhal and Aggarwal 2014).


Although these concepts apply to the game of football, it is vital to expand on and challenge their basic assumptions.
Graph and Directed Graph:
In the context of football, a graph can expand only as far as the football field allows. While this suggests an obvious positive correlation between the size of the graph and the size of the field, a graph can still be strictly smaller than the field (think of a tight cluster of teammates).
That said, any given team forms only 1 graph, since the number of arcs in the graph is fixed (it depends on the number of players on the pitch, which in turn influences the size of the graph). To put this theory into play: when a team in possession of the ball opens up in an attempt to score, the non-static graph expands and morphs into a different shape. The shape of any single graph can be static or fluid, depending on the characteristics of the team's play.
More importantly, the directed graph has several implications. For one, it influences a team's dominant strategy. For instance, teams which favor long balls may have to adjust the trajectories of the ball on match day, as each stadium has its own pitch dimensions. For possession-addicted teams like Arsenal and Bournemouth, however, it may seem that the size of the graph (the pitch) does not affect their style of play, since they play off a smaller graph (a close cluster of passes). Still, some modification and adaptation is required for the graph to reach its end point (scoring a goal). Hence, adaptability is vital for teams. This adaptability can be influenced by the formation and tactics of the opponent, which may block off certain areas of the pitch to limit the number or general trajectory of the paths.
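To make the idea of a directed graph concrete, here is a minimal sketch in Python. It assumes the standard convention in which players are the nodes and each pass is a directed arc from passer to receiver. The pass log and the position labels (`GK`, `CB`, `CM`, `ST`) are made up for illustration and do not come from Singhal and Aggarwal's slides.

```python
from collections import defaultdict

# Hypothetical pass log: (passer, receiver) pairs, in order of play.
passes = [
    ("GK", "CB"),
    ("CB", "CM"),
    ("CM", "CB"),
    ("CB", "CM"),
    ("CM", "ST"),
]

def build_pass_graph(pass_log):
    """Return {passer: {receiver: number_of_passes}} for a directed graph."""
    graph = defaultdict(lambda: defaultdict(int))
    for passer, receiver in pass_log:
        graph[passer][receiver] += 1
    # Convert to plain dicts so the result prints and compares cleanly.
    return {p: dict(r) for p, r in graph.items()}

pass_graph = build_pass_graph(passes)
print(pass_graph)
```

Because the graph is directed, a pass from CB to CM is stored separately from a pass from CM to CB, which is what lets the graph capture the direction of a team's play.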
Arcs:
To put it simply, a football player in possession of the ball at the start or restart of play (the first arc) is supposed to hit a pass to another arc (a player without the ball) in order to establish a vertex (a point where two or more lines interconnect). Note that standard graph theory labels these the other way round: the players would be the vertices (nodes) and the passes between them the arcs. This post keeps the looser usage.
In theoretical terms, each arc is studied in isolation and assumed to be static. This is not always ideal, since a team like Barcelona is well-versed and fluid in off-the-ball movement. It also undermines the assumption that the first node or arc is static, since a player on the ball can, and most of the time does, move with it. To expand on the given concepts, I believe that movement by the player with the ball should not count as a second arc, since he is not giving up possession of the ball despite the risk of losing it. One possibility is to draw another kind of line (e.g. a zig-zag line) to indicate movement of the player with the ball, distinguishing the two arcs that mark the starting and ending points of the movement.
Also, there is a limit to the number of arcs within the graph (a football field). Ideally, the initial arc (the player with the ball) should be able to keep the ball and pass to any of the 10 other arcs who are his teammates; however, the opposition's formation (the art of trapping, containing, high pressing and divergence) limits his passing options. That said, with proper analysis of an opposition's tactics and formation, it is in fact possible to control the opposition's general passing trajectory and expose its predictability. In most football matches, the right kind of pressing, applied adequately, forces teams to play a back pass to the goalkeeper. Forcing the opponent to play like this creates a form of predictability which must be exploited. For instance, a striker may linger behind the last man, waiting to exploit the back pass. Whatever the case, the intelligence to read the game, as opposed to winning possession by brute physical force, is key in both football and graph theory. Indeed, much of OptaJoe's statistical work already does this.
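The effect of pressing on passing options can be sketched as removing lanes from the graph. The snippet below is only an illustration under my own assumptions: the option set for the centre-back and the lanes closed by the press are hypothetical, and `open_options` is a helper name I made up.

```python
# Hypothetical passing options for the player on the ball.
options = {
    "CB": {"GK", "LB", "RB", "CM"},
}

# Hypothetical passing lanes (passer, receiver) closed by the press.
pressed_lanes = {("CB", "LB"), ("CB", "RB"), ("CB", "CM")}

def open_options(options, blocked, player):
    """Return the receivers still reachable from `player` once blocked lanes are removed."""
    return {r for r in options.get(player, set()) if (player, r) not in blocked}

remaining = open_options(options, pressed_lanes, "CB")
print(sorted(remaining))
```

With three of the four lanes closed, the only option left in this toy example is the back pass to the goalkeeper, which is exactly the kind of predictability a lurking striker could exploit.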
Path and Directed Network:
Most studies and extended coding systems predicated on graph theory seek to find the quickest path to a destination. In football, the end point is to score a goal, so the opposition's goal becomes an arc itself: the final destination. This line of thought breeds the notion of the long ball system, and explains why throw-ins are thrown far towards the opposition's side.
However, as much as scoring a goal is the key objective, there are de facto obstacles to scoring. For instance, getting past a five-man midfield or defense is not easy and requires re-routing the ball (across different sides of the pitch). Hence, analyzing the paths of the ball and the players is imperative to understand how a team plays, e.g. when the players release a line-breaking pass (a pass which bypasses lines of players) and how they do it (e.g. with long or short passes).
That said, there can be an infinite number of paths to a goal. Strictly defining an ideal number of paths to the creation of a goal is wrong, as the number depends on the tactics and comparative advantages of the players (technical abilities, physical attributes, etc.). While a hypothesis correlating the number of paths with the occurrence of a goal is useful, it may not describe the characteristics of all teams in general. For instance, Barcelona loves to wait and lure opposition players out of their allocated marking zones before making a pass, or even setting up the vital arcs (stationary or not) to play a 1-2 prior to a goal. Still, drawing and studying the range of the average number of paths to goal is one way to define a team's dominant strategy (think of the contrast between West Ham and Arsenal!).
With regard to the directed network, the integer weight attached to a path can be defined in many ways: the weight of the pass, the timing of the pass, the category of the pass (first-option pass, second-option pass, ..., last-option pass), and the order of the path in the sequence.
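The "quickest path" idea and the need to re-route around obstacles can both be sketched with a breadth-first search, which finds the route with the fewest passes. This is my own illustration, not the slides' method: the lane layout is hypothetical, `GOAL` is treated as the final node, and a blocked lane stands in for a packed midfield.

```python
from collections import deque

# Hypothetical passing lanes; "GOAL" is treated as the final node.
lanes = {
    "GK": ["CB"],
    "CB": ["CM", "LB"],
    "CM": ["ST"],
    "LB": ["LW"],
    "LW": ["ST"],
    "ST": ["GOAL"],
}

def shortest_path(graph, start, goal, blocked=frozenset()):
    """Breadth-first search: fewest arcs from start to goal, skipping blocked lanes."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in visited and (node, nxt) not in blocked:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # no route to goal

direct = shortest_path(lanes, "GK", "GOAL")
rerouted = shortest_path(lanes, "GK", "GOAL", blocked={("CB", "CM")})
print(direct)
print(rerouted)
```

In this toy layout, closing the lane from CB to CM pushes the ball out to the left flank, so the route to goal gets one arc longer.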
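One way to put a number on "paths to goal" is to list every route that never revisits a player and then average their lengths. This is a rough sketch under my own assumptions: the lanes are hypothetical, and restricting the search to non-repeating routes is my simplification, since a graph with back passes allows endlessly many routes once players can be revisited.

```python
# Hypothetical passing lanes, including a back pass from CM to CB.
lanes = {
    "GK": ["CB"],
    "CB": ["CM", "LB"],
    "CM": ["ST", "CB"],
    "LB": ["LW"],
    "LW": ["ST"],
    "ST": ["GOAL"],
}

def simple_paths(graph, start, goal, path=None):
    """Yield every path from start to goal that visits no node twice."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in graph.get(start, []):
        if nxt not in path:
            yield from simple_paths(graph, nxt, goal, path)

paths = list(simple_paths(lanes, "GK", "GOAL"))
# Arcs per path; the final arc is the shot into the goal.
arc_counts = [len(p) - 1 for p in paths]
mean_arcs = sum(arc_counts) / len(arc_counts)
print(paths, mean_arcs)
```

Comparing this average across teams is one hedged way to contrast a direct side with a patient, short-passing one.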
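To show how integer weights change the preferred route, here is a small sketch using Dijkstra's algorithm on a weighted directed network. The weights are invented for illustration, with a lower number standing for an easier or higher-priority pass. How to derive real weights from the factors above is left open.

```python
import heapq

# Hypothetical integer weights: lower means an easier / higher-priority pass.
weighted_lanes = {
    "GK": {"CB": 1, "ST": 9},  # the long ball to the striker is costly
    "CB": {"CM": 2, "LB": 1},
    "LB": {"CM": 2},
    "CM": {"ST": 2},
    "ST": {"GOAL": 3},
}

def cheapest_path(graph, start, goal):
    """Dijkstra's algorithm: return (total_weight, path), or None if unreachable."""
    heap = [(0, [start])]
    settled = set()
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in settled:
                heapq.heappush(heap, (cost + weight, path + [nxt]))
    return None

total, route = cheapest_path(weighted_lanes, "GK", "GOAL")
print(total, route)
```

In this example the long ball reaches the goal in fewer arcs (total weight 12), but the short-passing route through CB and CM has the lower total weight (8), so the weighting scheme decides which style the model favors.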
In all, I hope that these concepts are easy to understand and applicable to the game of football. I will continue with the other parts next week, and hopefully they will give us a wider perspective on how we look at football.
References
Singhal, Prachi, and Aggarwal, Umang. 2014. Graph Theory in Football. Retrieved from http://www.slideshare.net/umangaggarwal/football-and-graph-theory
Goodaire, Edgar G., and Parmenter, Michael M. 2002. Discrete Mathematics with Graph Theory, 3rd Edition.
