Applying Network Analysis to Analyzing Box Lacrosse

The following is taken from Eddy Tabone’s senior research paper, written during his undergraduate studies at St. John Fisher College in the fall of 2018.

Networks and Their Application in Sports

While typically thought of in their most common form, a plot modeling the relationship between two variables, graphs can be defined more broadly as mathematical objects that show the relationships among a set of objects. At their most basic, graphs consist of nodes and edges: nodes are the individual points of the graph, and edges are the links from one node to another that represent the relationship between them. Graphs can be directed or undirected. In a directed graph, each edge is an ordered pair of nodes and is drawn as an arrow pointing from one node to the other.

A network, then, describes when there is more than one relation on the same set of nodes. Any sequence of adjacent nodes, without restriction, is a walk. If the sequence may revisit nodes but never revisits an edge, it is referred to as a trail, and if it never revisits a node, it is a path. The length of each of these sequences is defined by the number of edges it contains. Edges themselves can have values associated with them to represent the strength of a tie, the frequency of an interaction, or a probability. Aside from the graphical form, connections between nodes can be displayed in an adjacency matrix, where each entry represents a tie between the nodes of the row and the column in which it is located.
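As a minimal sketch of the adjacency matrix described above, the following Python builds a weighted, directed matrix from a list of edges. The node labels and the edge list are hypothetical, and the row-as-source, column-as-target orientation is an assumption of this example.

```python
from collections import Counter

def adjacency_matrix(edges):
    """Build a weighted, directed adjacency matrix from (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for (src, dst), count in Counter(edges).items():
        # Assumed orientation: row = source node, column = target node.
        matrix[index[src]][index[dst]] = count
    return nodes, matrix

# Hypothetical edge list; repeated pairs become heavier weights.
edges = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A"), ("B", "A")]
nodes, matrix = adjacency_matrix(edges)
for node, row in zip(nodes, matrix):
    print(node, row)
```

Each entry counts how often the row node links to the column node, so an edge that occurs twice shows up as a weight of 2.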

The nodes and edges in a graph have statistics associated with them, primarily measures of centrality. Centrality has many different interpretations depending on the context of the graph, but essentially it shows which nodes are the most influential to other nodes in the network. A node’s degree is the number of edges connected to it. If the graph is directed, in-degree and out-degree can also be calculated as the number of edges pointing to and away from the node, respectively. Betweenness centrality is “The measure of how often a given node falls along the shortest path between two other nodes…typically interpreted in terms of the potential for controlling flows through the network” (Borgatti et al 2013). Eigenvector centrality counts the nodes adjacent to a given node, weighting each adjacent node by its own centrality. Each of these numbers can be rescaled to fall between 0 and 1 for better comparison between networks and between the statistics themselves. Rescaling degree centrality gives the frequency with which each node appears in the graph’s edges, which can be used to simulate random graphs and compare their statistics with those of the original graph.
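The degree measures above can be sketched in a few lines of Python. This is an illustration, not the paper's code: the function name and edge list are hypothetical, and the rescaling shown (a node's total degree divided by the sum of all degrees) is one assumed way to put degree on a 0-to-1 scale.

```python
from collections import Counter

def degree_table(edges):
    """Return {node: (in_degree, out_degree, total_degree, share)} for directed edges."""
    if not edges:
        return {}
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    total_endpoints = 2 * len(edges)  # every edge contributes two endpoints
    table = {}
    for node in set(in_deg) | set(out_deg):
        total = in_deg[node] + out_deg[node]
        # Assumed rescaling: share of all edge endpoints that belong to this node.
        table[node] = (in_deg[node], out_deg[node], total, total / total_endpoints)
    return table

# Hypothetical directed edges.
edges = [("A", "B"), ("A", "C"), ("B", "A")]
table = degree_table(edges)
for node in sorted(table):
    print(node, table[node])
```

Because the shares sum to 1, they can double as sampling frequencies when simulating random graphs.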

There are also statistics that characterize the behavior of the whole network. The diameter of a graph is the greatest distance between any two nodes, where distance is the number of edges in the shortest path between them. A graph with a smaller diameter is one in which nodes are more directly connected to one another. Path lengths also lead to a mean distance for the network, which is the average number of edges in the shortest path between each pair of nodes. Related to diameter is the notion of a graph’s density, which is the ratio of the number of edges present in the graph to the number of edges that could exist if every node were connected to every other node. The denominator of this ratio is calculated differently for directed and undirected graphs because the edges are counted differently (for example, Node A connecting to Node B and Node B connecting to Node A are distinct edges in a directed graph but the same edge in an undirected graph). When a graph is directed, reciprocity measures the proportion of edges that are symmetric: continuing the example, Node A has an edge out to Node B and also an edge coming back in from Node B. Global transitivity is the probability that the nodes adjacent to a given node are themselves connected to each other. It is sometimes used as a clustering coefficient to characterize local neighborhoods in a graph, where a node’s neighborhood consists of the edges that connect to it.
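The whole-network statistics above can be sketched as follows for a simple directed graph. The edge list is hypothetical, and two assumptions are built in: repeated edges are dropped before counting, and diameter and mean distance consider only pairs of nodes that can actually reach each other.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """Breadth-first search: number of edges from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def network_summary(edges):
    """Density, reciprocity, diameter and mean distance of a simple directed graph."""
    edges = set(edges)  # assumption: repeated edges are collapsed
    nodes = {node for edge in edges for node in edge}
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, set()).add(dst)
    n = len(nodes)
    density = len(edges) / (n * (n - 1))  # n*(n-1) possible directed edges
    reciprocity = sum((dst, src) in edges for src, dst in edges) / len(edges)
    lengths = [d for node in nodes
               for other, d in shortest_path_lengths(adj, node).items()
               if other != node]
    return {"density": density, "reciprocity": reciprocity,
            "diameter": max(lengths), "mean_distance": sum(lengths) / len(lengths)}

# Hypothetical directed edges.
summary = network_summary([("A", "B"), ("B", "A"), ("B", "C"), ("C", "A")])
print(summary)
```

For an undirected graph the density denominator would be n*(n-1)/2 instead, which is the difference in counting that the paragraph describes.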

A social network is a common type of network in which the nodes are people and the edges represent a contextual relationship between them, such as knowing one another, starting a conversation, or one person calling or texting another. Players sharing the floor in the same possession can form such a network, with the players as the nodes and the edges representing being on the floor in the same possession as the other player. The network can be expanded to categorize the players by the team they play for in order to seek out potential player-on-player matchups that may stretch throughout a game or a series of games against the same opponent. Adding scoring success as another attribute of the network could show which players play better with each other and which matchups between players on the two teams are the best to exploit to optimize scoring chances on offense and minimize them on defense.
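A shared-floor network of this kind could be assembled roughly as below. This is a sketch under stated assumptions: players are represented as hypothetical (team, jersey number) pairs, each possession is a list of everyone on the floor, and the team codes and numbers are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def shared_floor_edges(possessions):
    """Count possessions shared by each pair of players, labeled by relationship."""
    weights = Counter()
    for on_floor in possessions:
        for a, b in combinations(sorted(set(on_floor)), 2):
            # Same team code means teammates; different codes mean a matchup.
            kind = "teammates" if a[0] == b[0] else "matchup"
            weights[(a, b, kind)] += 1
    return weights

# Hypothetical lineups: (team code, jersey number) for each player on the floor.
possessions = [
    [("ROC", 9), ("ROC", 21), ("SSK", 7)],
    [("ROC", 9), ("SSK", 7)],
]
weights = shared_floor_edges(possessions)
for edge, count in weights.most_common():
    print(edge, count)
```

Heavier "matchup" edges would point to pairs of opponents who repeatedly share the floor, which is where a scoring-success attribute could then be attached.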

Networks have seen several applications in other sports. Since box lacrosse shares characteristics with those sports, and since there is little to no history of lacrosse analytics, they can serve as a guide for driving lacrosse analytics forward. From the hockey perspective, assists, both regular assists and shot assists (the pass before a shot), can be modeled with directed networks. Steve Burtch from Sportsnet presented on passing networks in hockey during the RIT Analytics Conference in 2015, showing centrality measurements on passing data from games played during the 2014-15 season by the Toronto Maple Leafs and New York Islanders. In a slide on betweenness centrality, Burtch suggests that removing players with high betweenness scores could present a danger to the network, so opposing teams could use this information to decide which players to focus their defensive strategy on in a game or a playoff series. The PageRank statistic contextually represents the probability of a player being involved in a pass (or shot, in the Burtch data) in a network, which can reveal which players have more of a role in the offense throughout games and seasons (Burtch 2015). Evan Oppenheimer, a Hamilton College graduate, further explains what betweenness centrality can mean in hockey when nodes represent players and directed edges represent which player primarily assisted on another player’s goal. Oppenheimer writes, “A player will have high betweenness centrality score if: The player scores and assists on a lot of goals. The player scores or assists on many distinct players’ passes or goals. A player’s teammates rely on the player for their scoring such that the teammates’ goals and assists tend to only occur with that player directly involved.” (Oppenheimer 2018) While a statistic like traditional plus-minus can say little about a player’s value to an offense, since it credits a player simply for being on the ice when a goal is scored, betweenness in a network can better show which players are more closely involved in scoring goals.
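To make the betweenness idea concrete, here is a minimal sketch of Brandes' algorithm for an unweighted, directed graph. It is not the code behind the Burtch or Oppenheimer results; the adjacency dictionary is hypothetical, and the sketch assumes a simple graph with at most one edge per ordered pair of nodes.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm; adj maps each node to a list of its successors."""
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Hypothetical assist chain: A feeds B, and B feeds C.
scores = betweenness({"A": ["B"], "B": ["C"]})
print(scores)
```

In this toy chain only B sits on a shortest path between two other nodes, so removing B would disconnect A from C, which is the kind of vulnerability the Burtch slide points to.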

Figures 1 and 2: Page Rank vs Betweenness for the 2014-15 New York Islanders

Jennifer Fewell’s 2012 article, Basketball Teams as Strategic Networks, describes basketball as “based on a series of interactions, involving a tension between specialization and flexibility…[that] is dependent on a connected team network” (Fewell et al 2012). For their scope, the authors used ball movement between nodes as edges, with the nodes being players represented by their positions in the lineup and the outcomes of the play. They tested whether they could model a basketball game “through a transition network representing the mean flow of the ball through these sequences of play (a stochastic matrix) and secondly whether individual teams have specific network signatures” (Fewell et al 2012). Some of the primary metrics that they analyzed were path length, which is simply the number of passes in a possession, path flow rate, which is the number of edges per second in the possession, and centrality measures such as degree centrality, which counts how frequently a player was involved in a ball movement during a game. The graph below from the Fewell paper visualizes all of the ball transitions recorded throughout the 16 games that they tracked during the Conference Quarterfinals in the 2010 NBA Playoffs, with the three possession-starting outcomes on the left, the five player positions in the middle, and the many different possession-ending outcomes on the right. According to the figure description in the original paper, “Red edges represent transition probabilities summing to the 60th percentile” (Fewell et al 2012), which marks them as the most frequently seen edges in the matrix. In addition, edges with greater frequency are drawn with greater weight. From “eye-test” observations of basketball, it makes sense that most possessions start with an inbound to the point guard, who also makes the most passes, that rebounds are most frequently grabbed by guards and forwards, and that the most frequent shot events are two-point field goal attempts.
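The transition network and path flow rate described above can be sketched as follows. The possession sequences, node labels, and possession duration are hypothetical, and counting the start and end events as nodes in each sequence is an assumption of this example, not a detail taken from the Fewell paper.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequences):
    """Row-normalize observed transitions so each node's outgoing probabilities sum to 1."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1
    return {src: {dst: c / sum(row.values()) for dst, c in row.items()}
            for src, row in counts.items()}

def path_flow_rate(seq, seconds):
    """Edges (ball movements) per second in one possession."""
    return (len(seq) - 1) / seconds

# Hypothetical possessions: start event, positions touching the ball, end event.
sequences = [
    ["inbound", "PG", "SG", "2pt_attempt"],
    ["inbound", "PG", "2pt_attempt"],
]
probs = transition_probabilities(sequences)
rate = path_flow_rate(sequences[0], seconds=12)
print(probs["PG"], rate)
```

Each row of the resulting table is a probability distribution over where the ball goes next, which is what makes the matrix stochastic.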

Figure 3: Event Network of Ball Movement Types from Basketball Teams as Strategic Networks


The scope of this research is the first two games of the 2018 NLL Finals between the Rochester Knighthawks and the Saskatchewan Rush, which took place on May 26 and June 2, with the first game at SaskTel Centre in Saskatoon and the second at Blue Cross Arena in Rochester. Saskatchewan dominated the first game, winning 16-9, including a 7-0 run between the end of the second quarter and the start of the third. For the game, they out-shot Rochester 84-51 (64-44 in shots on goal) and collected 80 loose balls to Rochester’s 55, among other stats. In game 2, Rochester won 13-8, out-shooting the Rush 80-72, although trailing 53-55 in shots on goal, and collecting 90 loose balls to Saskatchewan’s 81. The two games appeared substantially different by the conventional “eye test”, and the differences in the box score statistics above back this up. That makes these two games a good platform for testing whether there were statistically significant differences between them that reflect those substantial differences.

The tracking process focused on all significant movements of the ball throughout the game. Because the context is how the offense develops and interacts, these significant ball movements take the form of a shot, a pass, or a player physically carrying the ball to get a better look at a shot or pass or to avoid a defender. Ball movement caused by things such as faceoffs and long loose-ball battles after deflections or rebounds was not tracked because it would have no bearing on the interactions between the players in the game. The locations on the turf and of shots at the net weren’t exact. Instead, they were based on a location map provided by the team, which generalizes locations to make broader tendencies easier to observe. The map is shown in the image below with the labels for each of the grids (the defensive side of the floor is the same, except with the added designation of “D” for defense). For the change from transition to set offense in a set possession, the assumption had to be made that the transition ended once the last defensive player touched the ball.

For the event data, the attributes marked for each event were the on-ball player, the event they performed, where the ball started and finished during the event, whether or not the event was successful, whether the event was part of the set play or the transition from defense to offense, and a time stamp of the shot and game clocks, accompanied by the lineup strength of each team and the possession number in the game. For passes, the pass recipient was tracked. For any play resulting in a loose ball, the interferer, the team that collected the loose ball, and the player who possessed the loose ball once the scrum for it was over were tracked. For shots, if the shot was blocked, the blocker was recorded as the interferer, and rebounds were recorded with the location where the ball was touched and the team and player who recovered it. The game score for each team and the goal differential, the number of each event within its possession, the number of each pass and shot within its possession, whether a rebound was recovered by the team on offense, and the line strength at the moment of each play were each calculated automatically as part of the tracking process. The variables for the tracking process are featured in the table below.
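One way to picture a single tracked event is as a record like the one below. This is only a sketch: every field name, the location labels, and the example values are hypothetical stand-ins for the attributes listed above, not the variable names used in the original tracking sheet.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BallEvent:
    possession: int            # possession number in the game
    event_number: int          # order of the event within the possession
    player: int                # on-ball player (jersey number)
    event_type: str            # "pass", "shot", or "carry"
    start_location: str        # label from the location map
    end_location: str
    successful: bool
    phase: str                 # "set" or "transition"
    game_clock: str
    shot_clock: int
    recipient: Optional[int] = None    # passes only
    interferer: Optional[int] = None   # blocker or player forcing a loose ball

# Hypothetical pass from player 9 to player 21 in a set possession.
event = BallEvent(possession=12, event_number=3, player=9, event_type="pass",
                  start_location="zone_a", end_location="zone_b", successful=True,
                  phase="set", game_clock="07:41", shot_clock=18, recipient=21)
print(event)
```

Keeping the recipient and interferer optional reflects that they apply only to some event types.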

Figure 4: Map of Net (Left) and Turf (Right) Locations

There were limitations that presented themselves through the tracking process, with the biggest one being the reliance on the TV broadcast for the game film. The tracking process was harmed by things such as changes in camera angle mid-play, sudden switches to instant replays that removed the game and shot clocks from the screen, returns from commercial that missed the beginning of some possessions, random cuts from the game to the in-arena video board, and video lagging or the screen going black in the middle of a play. For the first game, there was the added wrinkle that the broadcast game and shot clocks weren’t in sync with those in the arena, so those values were subject to inaccuracy whenever the shot clock on the field wasn’t in view of the broadcast camera. Fortunately, between the two games, there was only one possession that was so inconclusive that it had to be omitted from the dataset. Saskatchewan’s defensive lineups were not tracked because of a lack of familiarity with their roster and a lack of camera angles sufficient to identify their players on most possessions.

Results and Discussion

The initial findings from the game tracking are shown in the table below. The first and most telling difference between the two teams and between the two games is that the team that won each game had more possessions, more passes, and more shots. While these findings, especially the shot differential in game 2 and the possession differential in game 1, don’t represent a substantial difference between the two teams, it’s telling that from game 1 to game 2 the shot difference was -15 for the Rush and +13 for the Knighthawks, while the pass difference was -109 for the Rush and +81 for the Knighthawks. This stands in favor of a game plan that encourages more shot attempts and ball movement through passing.

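| | Game 1 (Rush) | Game 1 (Knighthawks) | Game 2 (Rush) | Game 2 (Knighthawks) |
|---|---|---|---|---|
| Set Possessions | 56 | 61 | 55 | 68 |
| Transition Possessions | 23 | 16 | 24 | 19 |
| Total Possessions | 79 | 77 | 79 | 87 |

Table 2: Initial Recap Information from the Two Games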

Passing serves as the event that defines an edge between nodes, which in this case can be either players or locations on the field. Starting with all of the passes in the game, the graphs are shown below, with darker arrows representing more heavily weighted edges and therefore the passes most commonly made between two players. What stands out at a glance is that the heavily weighted edges mostly stay the same for each team across the two games, so the visual suggests that there wasn’t a noticeable difference in which passes were made during the two games.

Figures 5-8: Weighted Graphs of All Passes for Both Teams in Both Games

The next wrinkle, given that at a glance the networks for the two games seem to have similar weights in how the ball is moving from player to player, is to drill down and see whether the passes that set up shots differed even though the overall passing tendencies were the same. The graphs below show shot assists, which are the passes made directly before a shot is taken. The assist is defined here as it is in the NBA, meaning it is the pass that set up the shot, as opposed to a player receiving a pass, moving with the ball, and then shooting, which can still count as a box score assist in the NHL. Unlike the graphs of all passes by each team in each game, the edges are not collapsed into weights, so each arrow represents a single shot assist. The graphs for the Knighthawks show the increase in shot assists from 32 in game 1 to 47 in game 2, while the graphs for the Rush show the decrease from 51 in game 1 to 42 in game 2. While shot assists aren’t the be-all and end-all of generating a quality shot, they move the ball quickly and leave the defense or goalie only a short window to adjust to the new ball handler. That makes them an ideal play type for creating quality shot looks, especially in a set offense, where cutting to the net for a shot isn’t as easy.
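The shot assist definition above can be sketched as a filter over an ordered event log. The dictionary keys and the sample events are hypothetical, and the rule coded here (a completed pass followed immediately by a shot from the recipient in the same possession) is one reading of the definition, not the paper's actual implementation.

```python
def shot_assists(events):
    """Return (passer, shooter) pairs for passes immediately followed by the recipient's shot."""
    assists = []
    for prev, cur in zip(events, events[1:]):
        if (prev["type"] == "pass" and prev["successful"]
                and cur["type"] == "shot"
                and cur["possession"] == prev["possession"]
                and cur["player"] == prev["recipient"]):
            assists.append((prev["player"], cur["player"]))
    return assists

# Hypothetical event log for two possessions.
events = [
    {"possession": 1, "player": 9, "type": "pass", "successful": True, "recipient": 21},
    {"possession": 1, "player": 21, "type": "shot", "successful": False},
    {"possession": 2, "player": 21, "type": "pass", "successful": True, "recipient": 9},
    {"possession": 2, "player": 9, "type": "carry", "successful": True},
    {"possession": 2, "player": 9, "type": "shot", "successful": True},
]
assist_edges = shot_assists(events)
print(assist_edges)
```

The second possession produces no shot assist because the shooter carried the ball between the pass and the shot, which is the distinction from an NHL-style box score assist.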

Figures 9-12: Graphs of All Shot Assists for Both Teams in Both Games

The next table of interest, found below, contains the full network statistics associated with each of these graphs. The first thing that stands out is that density differs substantially between the graphs of all passes and those of shot assists, with the shot assist graphs being lower. The same holds for reciprocity: two nodes are much less likely to reciprocate each other in the shot assist graph, which stands to reason since defensive players are involved in far fewer offensive plays than forwards. The decrease from 2.019 to 1.762 in mean distance for the Knighthawks in terms of all passes, mirrored by the decrease from 2.329 to 1.946 for shot assists, can be interpreted as more players passing to more teammates, both in general and in setting up shots.

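| | All Passes: Game 1 (Rush) | All Passes: Game 1 (Knighthawks) | All Passes: Game 2 (Rush) | All Passes: Game 2 (Knighthawks) | Shot Assists: Game 1 (Rush) | Shot Assists: Game 1 (Knighthawks) | Shot Assists: Game 2 (Rush) | Shot Assists: Game 2 (Knighthawks) |
|---|---|---|---|---|---|---|---|---|
| Edge Density | 1.372 | 1.033 | 1.016 | 1.300 | 0.167 | 0.105 | 0.137 | 0.154 |
| Mean Distance | 1.879 | 2.019 | 1.835 | 1.762 | 1.742 | 2.329 | 1.620 | 1.946 |

Table 3: Network Statistics for Each Team, Each Game, and Both Graph Types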
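Edge density values above 1 cannot occur when each ordered pair of nodes is counted at most once. One possible reading, which is an assumption here and not something the paper states, is that every individual pass was counted as its own edge when density was computed. The sketch below uses a hypothetical pass list to show how that counting choice changes the ratio.

```python
def density(edges, count_repeats=True):
    """Edges present divided by the n*(n-1) ordered pairs of distinct nodes."""
    nodes = {node for edge in edges for node in edge}
    n = len(nodes)
    present = len(edges) if count_repeats else len(set(edges))
    return present / (n * (n - 1))

# Hypothetical passes among three players; repeated pairs are separate passes.
passes = [("A", "B")] * 4 + [("B", "A")] * 2 + [("B", "C"), ("C", "A")]
with_repeats = density(passes)                         # 8 passes over 6 possible pairs
without_repeats = density(passes, count_repeats=False)  # 4 distinct pairs over 6
print(with_repeats, without_repeats)
```

Under this reading, a density above 1 simply means a team completed more passes than there are ordered pairs of players in its network.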

Statistics on the vertices of the network are shown in the table below, ranked by degree percentage. For organization purposes, these cover the shot assist networks only. The most central nodes in the network are the players most frequently involved in generating shots. Showing the top five by degree percentage reveals which players are involved in the most shot assists. Ideally these are already the most frequently used players if the staff is looking to optimize offensive production, and otherwise the table points to which players should be in the most frequently used lineup. Below that is the same table for the five most frequent locations in the shot assist network. A high in-degree marks a location where shots are most frequently taken following a pass, and a high out-degree marks the most frequent origin of passes that set up shots, which could help a defense predict when an offense is planning to set up a shot. The most interesting result in the player tables is that the same five players appeared in the table for both games.

Game 1: Rochester
| Player Number | In Degree | Out Degree | Total Degree | Total Degree Percentage | Betweenness | Eigenvector Centrality |
Game 2: Rochester
| Player Number | In Degree | Out Degree | Total Degree | Total Degree Percentage | Betweenness | Eigenvector Centrality |
Game 1: Saskatchewan
| Player Number | In Degree | Out Degree | Total Degree | Total Degree Percentage | Betweenness | Eigenvector Centrality |
Game 2: Saskatchewan
| Player Number | In Degree | Out Degree | Total Degree | Total Degree Percentage | Betweenness | Eigenvector Centrality |
Table 4: Vertex Centrality Measures for Shot Assist Networks
Game 1: Rochester
| Location ID | In Degree | Out Degree | Total Degree | Total Degree Percentage | Betweenness | Eigenvector Centrality |
Game 2: Rochester
| Location ID | In Degree | Out Degree | Total Degree | Total Degree Percentage | Betweenness | Eigenvector Centrality |
Game 1: Saskatchewan
| Location ID | In Degree | Out Degree | Total Degree | Total Degree Percentage | Betweenness | Eigenvector Centrality |
Game 2: Saskatchewan
| Location ID | In Degree | Out Degree | Total Degree | Total Degree Percentage | Betweenness | Eigenvector Centrality |
Table 5: Vertex Centrality Measures for Shot Assist Networks – By Turf Location

Realistically, the differences found between these two games and these two teams do not carry any statistical significance. With a sample size of two games, this was to be expected, but the fact that even two games could reveal differences in a statistical measure could be the sign of a much bigger picture if data were available for an entire season of games in the league. Questions about the play style from one team to another, similarities in how teams approach their offense or defense against a specific opponent, and which teams play a style that is sustainable through a whole season could all be answered with more data.

In terms of next steps for this research, outside of acquiring more data, the biggest would be to run simulations and compare network statistics to different null distributions. This is possible for a single game or a single network, but the significance it revealed would be irrelevant. Any null distribution in which each node is equally likely to pass to any other would differ from the population in a given game because of the disparity in how often defensive players are part of an offensive possession. A test of that kind could be ruled significant, but it wouldn’t add to the story that can be told from an analytical perspective in sports. It would take multiple games from multiple teams before it becomes possible to analyze a single game and compare it to what would be expected. The second next step would be a network where the nodes are still players but the undirected edges connect two players who are on the field in the same possession, and then creating a bipartite network from that to join the two teams together, so that players connect with the teammates they’re playing with as well as the opposing players they are defending or trying to score against. Heavily weighted edges across the graph would show whether a player is assigned to one opponent in a shutdown or exploitation role. If this was a coach’s decision as part of a game plan and it revealed itself in the data, it would reassure the staff that their analytics department is looking at the right things.
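A simulation along these lines might look like the sketch below. It is an illustration under stated assumptions, not the planned analysis: the null model redraws each edge's passer and receiver from the observed out-degree and in-degree frequencies (so defenders stay rare), reciprocity is an arbitrary choice of test statistic, and the edge list is hypothetical.

```python
import random
from collections import Counter

def reciprocity(edges):
    """Share of distinct directed edges whose reverse edge is also present."""
    present = set(edges)
    return sum((dst, src) in present for src, dst in present) / len(present)

def simulate_null(edges, n_sims=1000, seed=0):
    """Redraw every edge from the observed out-degree and in-degree frequencies."""
    rng = random.Random(seed)
    out_counts = Counter(src for src, _ in edges)
    in_counts = Counter(dst for _, dst in edges)
    sources, src_weights = zip(*out_counts.items())
    targets, dst_weights = zip(*in_counts.items())
    stats = []
    for _ in range(n_sims):
        sim = []
        while len(sim) < len(edges):
            src = rng.choices(sources, weights=src_weights)[0]
            dst = rng.choices(targets, weights=dst_weights)[0]
            if src != dst:  # a player cannot pass to themselves
                sim.append((src, dst))
        stats.append(reciprocity(sim))
    return stats

def empirical_p_value(observed, simulated):
    """Share of simulated statistics at least as large as the observed one."""
    return sum(s >= observed for s in simulated) / len(simulated)

# Hypothetical shot assist edges (passer, shooter).
edges = [(9, 21), (21, 9), (9, 21), (14, 21), (21, 14), (9, 14), (5, 9)]
null_stats = simulate_null(edges, n_sims=200, seed=1)
p_value = empirical_p_value(reciprocity(edges), null_stats)
print(p_value)
```

Weighting the draws by observed involvement, instead of treating every player as equally likely to pass, is one way to avoid the uninformative significance that a uniform null would produce.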

Reference list

Borgatti SP, Everett MG, Johnson JC. 2013. Analyzing social networks. London: Sage.

Burtch S. 2015. Passing networks in hockey. Presented at: Rochester Institute of Technology Hockey Analytics Conference: Rochester, NY.

Fewell JH, Armbruster D, Ingraham J, Petersen A, Waters JS. 2012. Basketball teams as strategic networks. PLoS ONE. 7(11): e47445.

Oppenheimer E. 2018 Jun 14. Whose point is it anyway? Using network analysis to estimate teammate influence in hockey scoring. Towards Data Science. [accessed 2018 Dec 8].
