ELEKTROTEHNI ˇ SKI VESTNIK 91(4): 157–183, 2024
ORIGINAL SCIENTIFIC PAPER
A quantitative analysis of home advantage in top
football and basketball leagues
Timur Kulenovi´ c
1
, Jure Demˇ sar
1,2,† 1
Faculty of Computer and Information Science, University of Ljubljana, Veˇ cna pot 113, 1000 Ljubljana
2
Faculty of Arts, Department of Psychology, MBLab, University of Ljubljana, Aˇ skerˇ ceva cesta 2, 1000
Ljubljana
† E-mail: jure.demsar@fri.uni-lj.si
Abstract. Home advantage is a phenomenon in sports where teams tend to perform better when playing at
home. In this work, we explore home advantage in professional basketball and football leagues. Our results
suggest that in basketball all leagues (NBA, Euroleague, Eurocup, ABA, and Slovenian league) seem to exhibit a
significant degree of home advantage, it seems to be the highest in the ABA league and the lowest in the NBA.
In football, four of the leagues (Premier League, Serie A, Bundesliga, and Ligue 1) showed moderate presence
of home advantage, whereas La Liga showed significantly higher levels. When analysing seasonal trends, we
usually observed lower home advantage in the seasons affected by the COVID-19 pandemic. We proposed a
novel metric to measure home advantage at the individual game level, this metric allowed us to investigate
connections between potential factors (crowd impact, referee bias, and travel fatigue) and home advantage.
The crowd attendance seems to have a positive correlation with home advantage, while counter-intuitively the
opposite seems to often hold for the referee bias. A side product of our work is also an extensive and carefully
curated dataset, which we made publicly available for the whole research community.
Keywords: home advantage, football; basketball, Bayesian statistics; data science
Kvantitativna analiza prednosti domaˇ cega terena v
najmoˇ cnejˇ sih nogometnih in koˇ sarkarskih ligah
Prednost domaˇ cega igriˇ sˇ ca v ˇ sportu je pojav, ko ekipe dosegajo
boljˇ se rezultate na domaˇ cem prizoriˇ sˇ cu v primerjavi z rezultati,
ki jih dosegajo na gostujoˇ cem prizoriˇ sˇ cu. V delu obravnavamo
prednost domaˇ cega igriˇ sˇ ca v koˇ sarki in nogometu, pri ˇ cemer
smo se osredotoˇ cili na pet profesionalnih lig v vsakem ˇ sportu.
Naˇ s cilj je bil kvantificirati prednost domaˇ cega igriˇ sˇ ca in jo
primerjati med ligami.
Zbrali in obdelali smo mnoˇ zico podatkov ter ustvarili po-
datkovno zbirko, ki je javno dostopna ˇ sirˇ si uporabi. Za
potrebe analize smo predstavili novo metriko, ki meri prednost
domaˇ cega igriˇ sˇ ca na ravni tekme, kar jo razlikuje od obstojeˇ cih
metrik, ki prednost domaˇ cega igriˇ sˇ ca merijo na ravni skupine
tekem oz. sezone.
Za kvantifikacijo prednosti domaˇ cega igriˇ sˇ ca in oceno nego-
tovosti smo uporabili Bayesovske hierarhiˇ cne modele. Rezul-
tati kaˇ zejo, da je med izbranimi koˇ sarkarskimi ligami prednost
domaˇ cega igriˇ sˇ ca najveˇ cja v ligi ABA, najniˇ zja pa v ligi
NBA. Pri nogometu smo ugotovili, da z veˇ cjo prednostjo
domaˇ cega igriˇ sˇ ca izstopa ˇ spanska liga, v preostalih ˇ stiri ligah
pa so rezultati podobni. Raziskali smo tudi sezonske trende,
pri ˇ cemer smo pri nekaterih ligah opazili manjˇ so prednost
domaˇ cega igriˇ sˇ ca v ˇ casu epidemije covida 19.
Prouˇ cili smo tudi povezavo med potencialnimi dejavniki in
prednostjo domaˇ cega igriˇ sˇ ca, kjer smo uporabili novoustvar-
jeno metriko. Tu se rezultati nekoliko razlikujejo med ligami,
v glavnem pa je vpliv gledalcev pozitivno koreliran, sodniˇ ska
prisotnost pa negativno korelirana s prednostjo domaˇ cega
Received 13 May 2024
Accepted 1 July 2024
igriˇ sˇ ca. Kljub temu izpostavljamo obstoj dejavnikov, ki jih v
raziskavo nismo vkljuˇ cili, vendar verjetno pomembno prispe-
vajo k prednosti domaˇ cega igriˇ sˇ ca.
Kljuˇ cne besede: prednost domaˇ cega igriˇ sˇ ca, nogomet,
koˇ sarka, Bayesova statistika, podatkovna znanost
1 INTRODUCTION
Sports analytics is a scientific field where state-of-the-
art data science methodology is applied to the domain
of sports. This is particularly interesting to managers
and owners of sports teams since they can facilitate
the insights gained through data analysis for informed
data-driven decision-making where each team manager
strives to make optimal decisions in every aspect of
their work to gain a competitive advantage. A better
understanding of the home advantage would allow teams
to exploit it more efficiently.
Home advantage is a phenomenon in sports that
describes the benefit that the hosting team has over the
visiting team. It appears to be present since the start of
organised football [1] and it occurs in most team sports.
The reasons for this occurrence have been attributed to
several aspects, such as referee bias, crowd effects, travel
effects, familiarity with the playing field, territoriality,
specific tactics, rule factors, and psychological factors
[1]. A lot of existing research focused on statistical
analyses of the development of the phenomenon through
158 TIMUR KULENOVI
´
C, JURE DEM
ˇ
SAR
time and on differences in the magnitude of home
advantage between different sports. However, to the
best of our knowledge, it is still not exactly clear how
different previously mentioned factors affect the home
advantage.
A lot of research on the phenomenon of home advan-
tage has been already conducted, particularly in football.
Pollard [2] has introduced the home advantage ratio in
football as the ratio between the number of points won
by the home team and the number of points won in total.
When we can claim with high certainty that the ratio is
higher than 50%, then we say that home advantage is
present. In the 2015/2016 season, the average value of
the ratio for the top 10 European leagues was estimated
to be 58.25± 2.95%. Results from older seasons yield
a higher ratio, meaning that home advantage seems to
be in decline [3]. Furthermore, a substantial decline
was also found in English football from 1974 to 2018.
Interestingly, an approximately equal decline seems to
be present across different divisions. Even though the
advantage seems to be in decline, it is still present in
the last couple of years [4].
When analysing 19 European football leagues be-
tween the 2007/2008 season and the 2016/2017 season,
the Greek league had the highest, while the English
Football League Two (fourth rank) had the lowest home
advantage. Based on the results, a hypothesis that home
advantage is reduced in the lower-level leagues can
be stated and connected to the crowd effect [5]. By
analysing football matches of national teams Pollard
and Armatas [6] found out that, besides crowd size, the
altitude and the number of time zones crossed by the
visiting team were significantly related to the number
of points won by home teams. Additionally, every 100
meters of altitude difference is connected with the jump
of expected probability for the home team winning the
match by 1.1 percentage points [7]. Pollard and Armatas
[6] also found out that significantly more red cards were
issued, and more penalties were awarded against the
away team.
To isolate some potential factors of home advantage,
such as players’ familiarity with the stadium and travel
fatigue, Ponzo and Scoppa analysed same-stadium der-
bies [8]. They concluded that there is crowd support’s
effect on the home advantage generated through the
encouragement of players’ performance. Furthremore,
the crowd tends to affect the referee’s decision in favour
of the home team. In the football Champions League
and Europa League, referees issued 25% and 10%
more yellow cards to away teams than to home teams.
The higher level of home team bias in the Champions
League appeared mainly due to higher crowd densities
[9]. Referee bias has been more thoroughly researched
on English Premier League matches by Boyko et al.
[10]. They found out that the referee bias is not omni-
present and varies between referees. Similar to [8], in
order to eliminate some factors, Boudreaux et al. [11]
performed an analysis on the case of matches between
the basketball teams of Los Angeles Lakers and Los
Angeles Clippers, who both play their home matches in
the Staples Center in Los Angeles. Due to crowd effects,
it was estimated that the increase in the likelihood of
home team victory is between 21 and 22.8 percentage
points. Additionally, Sors et al. [12] have also consid-
ered the level of competitive anxiety of referees. They
concluded that the crowd noise does not seem to affect
the referees’ decisions unless we consider the anxiety
because external factors might more easily influence the
decisions of referees with high anxiety. Neural networks
have also been used to analyze home advantage in the
NBA.The conclusions were that attendance, altitude, and
market size were not connected to home advantage.
However, the style of play seems to be connected as
teams that made more two-point and free-throw shots
saw larger advantages at home. Another interesting study
was conducted by Gomez and Pollard [13], they state
that in some European countries basketball teams from
capital cities have a significantly lower home advantage
than teams from other cities. Pollard et al. [14] also com-
pleted a comprehensive analysis of home advantage in
different sports and countries. They state that basketball
and handball have the highest home advantage, which
was the most prominent in Bosnia and Herzegovina,
with other Balkan nations also well above average.
The fact that a large number of sports matches from
2019 to 2021 have been played without spectators, due
to COVID-19, gives us a unique opportunity for analysis,
allowing us to isolate certain factors of home advantage.
For example, many matches were forcefully played
without attendance, which removes the home crowd
effect. Indeed, some research on this matter has already
been conducted, with the results showing that the home
advantage has dropped significantly in games without
spectators [15], [16], [17], [18]. A significant drop in
home advantage ratio was observed in the German
Bundesliga (a 10% drop), turning home advantage into
home disadvantage [15]. In contrast, no change was
observed in the second and third divisions of German
football [18]. Sors et al. [17] also revealed a reduced
home advantage and the absence of referee bias in ghost
games (games without spectators). These results support
the claim that, among all the factors contributing to
home advantage and referee bias, crowd attendance has
a relevant role. Thus, it seems like fans can significantly
affect the outcomes of football matches [16], [17].
The main objective of this study was to conduct
a thorough statistical analysis of home advantage in
basketball and football. We selected five football com-
petitions and five basketball competitions, collected the
data, and then calculated and evaluated the level of the
A QUANTITATIVE ANALYSIS OF HOME ADV ANTAGE IN TOP FOOTBALL AND BASKETBALL LEAGUES 159
home advantage in each league using different metrics
and Bayesian models. We compared the amount of home
advantage between the leagues and also analysed how
it changes through time. Next, we chose three potential
factors that could impact the degree of home advantage.
Since these factors are not directly measurable, we
had to come up with proxy variables. We quantified
the connection between the proxy variables and home
advantage in each league. In the process of searching for
necessary data to conduct the complete analysis of this
work, we found out that there are some limitations in the
granularity and availability of the data. Consequently,
we collected, cleaned, and published the basketball and
football data of high granularity. We appraise publishing
this data as a valuable contribution not only to this work,
but also for potential future research in sports analytics.
2 METHODS
In this section we first describe how we acquired the
data that was later used in our statistical analyses. Next,
we introduced the metrics that we used for quantifying
home advantage. In the final part, we describe the mod-
els behind the Bayesian statistical analyses supporting
our results.
2.1 Data acquisition
To conduct a thorough analysis of home advantage in
basketball and football, as outlined in the Introduction
section, we had to acquire an extensive dataset con-
taining very granular match data. Some sport-related
datasets are publicly available. However, the objective
of this analysis required relatively specific information
(such as the number of spectators or number of fouls in
the match) that was not present in these datasets.
Furthermore, for most of the leagues that we wanted
to include, free official APIs were not available. There-
fore, we had to use the web scraping technique (web
harvesting or web data extraction) to obtain the nec-
essary data. We did this time-consuming process with
Python libraries Selenium [19], Requests [20] and Beau-
tifulSoup [21].
The collected datasets exceed the requirements of this
thesis in terms of the granularity of the data and the
purpose of obtaining the data was not only to get the
required data, but our objective was also to create an
extensive dataset of sports data that would be publicly
available. The collected datasets should ease the process
of obtaining the data for other researchers working on
basketball or football-related projects.
First, we scraped an extensive amount of data for the
following basketball competitions:
• League of Adriatic Basketball Association (ABA),
• Eurocup (EC),
• Euroleague (EL),
• Slovenian Basketball League (SLO) and
• National Basketball Association league (NBA).
A high-level description of the datasets is presented
in Table 1. The dataset, along with the source code, is
available in an open GitHub repository https://github.
com/timurkulenovic/basketball-dataset [22].
For the ABA and NBA leagues, data is available
before the season of 2007/2008. However, to use the
same time-frame for all leagues, we did not use any
data before the season of 2007/2008. Table 2 shows the
number of matches we included in our work.
Second, we obtained the data for the following foot-
ball leagues:
• Premier League (English league),
• Bundesliga (German league),
• Ligue 1 (French league),
• Serie A (Italian league) and
• La Liga (Spanish league).
These football datasets with the source code behind
their acquisition are also available at our GitHub repos-
itory https://github.com/timurkulenovic/football-dataset
[23]. The datasets are not as granular as the basketball
datasets. However, they still contain much information
for each game. The first available season is 2009/2010
(there is some data available before, but it is limited),
and the last available season is 2022/2023. We used all
of the seasons for the analysis. Table 3 presents summary
data about each used league.
2.2 Metrics
In this analysis, we use two home advantage metrics,
the first one quite common in related work. We call this
metrics HA
SEASON
and it denotes the percentage of
the team’s points won at home throughout the whole
season. This metrics can be used to home advantage at
either the season, the league, or the sport level. However,
to conduct our analysis in the desired way, we needed
another metric that measures the home advantage on the
level of a single game. Consequently, we propose the
HA
GAME
metric, which serves as a measure of the
home advantage on a game level and, therefore, enables
us to quantify the home advantage for every game.
2.2.1 Home advantage on level of the whole season:
As mentioned, HA
SEASON
metric measures home ad-
vantage on a team level in the scope of one season.
It represents the amount of the team’s points gained at
home divided by the team’s total points:
HA
SEASON
=
Points
H
Points
H
+Points
A
, (1)
wherePoints
H
denotes the points that the team gained
at home and Points
A
denotes the points that the team
gained away in the scope of a season. In football, the
team gets 3 points for a win, 1 point for a draw, and 0
points for a loss. In basketball, the winner gets 2 points
and the loser gets 0 points.
160 TIMUR KULENOVI
´
C, JURE DEM
ˇ
SAR
ABA Eurocup Euroleague SLO NBA
First season available 2001/2002 2007/2008 2007/2008 2007/2008 2000/2001
Main Info ✓ ✓ ✓ ✓ ✓ Box Score ✓ ✓ ✓ ✓ ✓ Play by play ✓ ✓ ✓ ✓ Score evolution ✓ ✓ ✓ ✓ Shots ✓ ✓ ✓ Team comparison ✓ ✓ ✓ Venues ✓ ✓ ✓ ✓ ✓ Table 1.: Description of the basketball dataset. Cells with the checkmarks denote that data is available for the
corresponding league. The last season available is 2022/2023 for all competitions.
ABA Eurocup Euroleague SLO NBA
2007/2008 196 326 227 217 1316
2008/2009 185 150 184 222 1315
2008/2010 185 156 184 218 1312
2010/2011 185 155 185 143 1311
2011/2012 184 156 184 158 1074
2012/2013 185 156 249 158 1314
2013/2014 184 362 249 179 1319
2014/2015 194 306 247 189 1311
2015/2016 189 306 246 159 1316
2016/2017 191 146 255 201 1309
2017/2018 142 184 256 185 1312
2018/2019 143 186 256 154 1312
2019/2020 125 168 252 83 1142
2020/2021 166 185 324 138 1165
2021/2022 197 189 295 157 1317
2022/2023 202 195 324 153 1314
Total 2853 3326 3917 2714 20459
Table 2.: Number of basketball matches (by season and league) used in our work. We excluded games played
at neutral locations. Note that a smaller amount of games in 2019/2020 is due to the COVID-19 outbreak.
Premier League La Liga Ligue 1 Serie A Bundesliga
Games per season 380 380 380 380 306
Total games 5320 5320 5218 5319 4284
Table 3.: Number of football matches used in our work. Data for each league consists of 14 seasons – from
2009/2010 to 2022/2023. Season 2019/2020 in Ligue 1 ended without all the matches being played. One Serie A
match in season 2012/2013 was not played.
2.2.2 Home advantage on level of an individual
game: One of the goals of this work is to quantify
the effect of different factors on home advantage. To
fulfil this goal, we proceeded with quantifying the home
advantage for each game. This requires defining a metric
HA
GAME
that quantifies home advantage on a game
level. The gist of the metric is to compare the expected
point difference δ EXP
between two teams and the
observed point (or goal) difference δ OBS
between two
teams. The main idea is that the difference between
δ OBS
and δ EXP
yields the quantified home advantage
of the game. Obtaining δ OBS
is straightforward. We
simply calculate the difference between points scored
by the home team and points scored by the away team.
Next, we need a value that measures the expected point
difference δ EXP
. To obtain it, we use the averages of
score difference in a season for home and away teams.
The expected differenceδ EXP
is the difference between
the home team’s average and the away team’s average.
Rewriting the idea in a bit more concise manner,
HA
GAME
measures home advantage on a game level
between the home teamA and the away teamB in sea-
soni.S(T,i) denotes teamT ’s average score difference
in season i:
δ EXP
=S(A,i)− S(B,i),
δ OBS
=Points
A
− Points
B
,
HA
GAME
=δ OBS
− δ EXP
.
(2)
Let’s take an example from the 2022/2023
Premier League season, when Arsenal scored
88 goals and conceded 43 goals on 38 matches,
S(Arsenal,2022/2023) =
88− 43
38
= 1.184, whereas
Tottenham scored 70 goals and conceded 63 goals,
S(Tottenham,2022/2023) =
70− 63
38
= 0.184. The
match between Arsenal and Tottenham game ended 3-1.
The expected difference δ EXP
was 1.184− 0.184 = 1,
while the observed difference δ OBS
was 3 - 1 = 2.
A QUANTITATIVE ANALYSIS OF HOME ADV ANTAGE IN TOP FOOTBALL AND BASKETBALL LEAGUES 161
Finally, HA
GAME
is 2− 1 = 1.
2.3 Factors influencing the home advantage
One of the goals of our study was to quantify the
factors that are considered to have some effect on the
home advantage and analyze the level of their correlation
with the home advantage. Based on the availability of
the data, we selected three factors: referee bias, crowd
impact and travel fatigue. None of these factors is
directly measurable, so we had to introduce proxy vari-
ables that try to serve in place of them. Furthermore, we
must be aware that there are several other confounding
factors that are not included in our data and thus not
included in the analysis. Because of this, we must be
very cautious when making any kind of claims about
the direct influence (causation) of a certain factor on the
home advantage. For example, it is known that teams
that are playing well and achieving good results have
a higher attendance, so higher attendance might be the
effect of a team doing well and not vice-versa. The goal
is not to build a high-performance model that would
predict the home advantage with high accuracy but to
check the connection between the chosen factors and the
home advantage.
2.3.1 Referee bias: When discussing the home ad-
vantage in football, basketball or other similar sports,
a commonly mentioned factor is the referee bias. It is
widely believed that the referees sometimes, intention-
ally or not, help certain teams with unfair decisions. One
of the reasons for this is the pressure from the crowd.
Therefore, we expect the referee bias to be more in
favour of the home teams rather than the away teams. It
is, however, not straightforward to quantify the level of
referee bias, as there is no objective variable that would
measure the referee bias. Hence, we created a metric
calledRBIAS that quantifies the referee bias based on
the called fouls. The idea of this variable is similar to
the concept of HA
GAME
. We compare the observed
difference of committed fouls F
OBS
with the expected
differenceF
EXP
, which is based on the teams’ averages
in a season. As such,RBIAS measures the referee bias
on a game level between the home teamA and the away
teamB in the seasoni. The notationC(T,i) denotes the
team T ’s average of the foul difference in season i and
is calculated as the difference between the team’s total
drawn fouls and the team’s total committed fouls. The
given difference is then divided by the number of the
games that the team played in seasoni. WithDFouls
A
andDFouls
B
we denote the observed number of drawn
fouls in the game for teams A and B, respectively:
F
EXP
=C(A,i)− C(B,i),
F
OBS
=DFouls
A
− DFouls
B
,
RBIAS =F
OBS
− F
EXP
.
(3)
To illustrate the idea, let’s take an example from
the 2007/2008 Euroleague game between Tau Ceramica
(home team) and Union Olimpija (away team). In the
season 2007/2008 Tau Ceramica drew 480 fouls and
committed 448 fouls in 23 games, while Union Olimpija
drew 253 fouls and committed 345 fouls in 14 games:
C(Tau Ceramica,2007/2008) =
480− 448
23
= 1.39,
C(Union Olimpija,2007/2008) =
253− 345
14
=− 6.57.
(4)
The expected differenceF
EXP
was1.39− (− 6.57) =
7.96. In the match, Tau Ceramica drew 26 fouls and
Union Olimpija drew 16. The observed differenceδ OBS
was 26 - 16 = 10. Finally,RBIAS = 10− 7.96 = 2.04.
We must be aware that this newly introduced variable
measures the number of fouls teams draw and commit
compared to the expected number of fouls. We assume
that the average of the variable being above 0 originates
from the referee bias. However, it could also be that
teams draw more fouls at home, because they play
in such a way at home due to some other factors.
Furthermore, for the assessment we only used fouls
(fouls in football and personal fouls in basketball). We
could expand this variable to consider personal fouls,
unsportsmanlike fouls, technical fouls (in basketball),
yellow cards, red cards, and penalty kicks (in football).
However, we wanted it to quantify something similar in
both sports, so we kept the simple definition that only
includes standard fouls.
2.3.2 Crowd impact: Another factor that is perceived
to have an important impact on the home advantage is
the crowd effect. It is believed that loud and supportive
chants motivate the home team to play better. Again, it
is difficult to quantify the crowd effect directly, so we
used the attendance number information. We introduce
theATT metric, which represents the ratio between the
number of spectators in the game and the maximum
number of spectators ever recorded in the arena where
the game was played. ATT is represented as a number
on [0,1] interval that describes how full the venue is:
ATT =
n
spectators
max
gamesinarena
(n
spectators
)
.
(5)
2.3.3 Travel fatigue: The third factor included in our
analysis is the travel fatigue that players experience
when they have to travel to an away venue. Similar to
the other factors, there is no simple way of quantifying
travel fatigue. Instead, we used the air distance the away
team had to travel from their hometown to the away
arena. To have the variable in the same range for all
the leagues, we normalise the variable by dividing the
distance by the league’s median travel distance:
162 TIMUR KULENOVI
´
C, JURE DEM
ˇ
SAR
DIST =
air distance
median over air distances in the league
.
(6)
This way, we obtain a value that tries to quantify
how much fatigue the travel caused normalised by the
league’s standards. There are some shortcomings of this
approach. We assume that teams always travel to the
away arena from their home location, which is not
always true because teams on tight schedules often
travel from the location of their previous away game.
Furthermore, teams use different means of transportation
(for example, a bus instead of a plane) for the same
distance. Consequently, the same air distance can cause
different levels of travel fatigue.
2.4 Statistical modelling
We used Bayesian statistics to analyse the results. All
analyses were conducted using Stan – a state-of-the-
art platform for executing modern Bayesian statistical
analyses [24]. We used Stan’s default (non-informative)
priors in all analyses.
To distinguish reported Bayesian probabilities from
frequentist p-values we denote them with a capital
P. Unlike p-values, the reported probabilities directly
describe the probability by which we can claim that
our hypotheses are true or not. The probability that
the opposite of our claim is true can be calculated as
1− P . We used Monte Carlo Standard Error (MCSE) to
estimate uncertainty in our quantifications. Since MCSE
was in all cases lower than 1%, we decided to omit it
for the sake of brevity.
When comparing leagues between each other through
the HA
SEASON
and HA
GAME
metrics, we used the
following Bayesian hierarchical normal model:
HA∼ N(µ HAseason
i
,σ 2
HAseason
i
),
µ HAseason
i
∼ N(µ HA
league
,σ 2
HA
league
),
(7)
where HA is either HA
SEASON
or HA
GAME
. With
this model, we obtain the µ HA
league
posterior distribu-
tion and n
season
posterior distributions of µ HAseason
i
for each of the five leagues. We first used theHA values
to get the season-level parameters. These were then used
to obtain the league-level parameters.
To analyzeRBIAS, we use the two-level hierarchical
normal model, with the leagues on the first level and the
seasons on the second level:
RBIAS∼ N(µ RBIASseason
i
,σ 2
RBIASseason
i
)
µ RBIASseason
i
∼ N(µ RBIAS
league
,σ 2
RBIAS
league
).
(8)
We analyze the impact ofRBIAS,ATT , andDIST
on HA
GAME
by using Bayesian linear regression,
which can be formalized as:
HA
GAME
∼ N(α +β RBIAS
RBIAS+
β ATT
ATT+β DIST
DIST,σ 2
).
(9)
All three independent variables, as well as the depen-
dent variable, are calculated at the game level. Therefore,
each game corresponds to one data point when fitting the
linear regression model. Since the variable DIST was
created using the league-level information, we modelled
each league separately. We have five separate linear
regression fits with one fit for each league. The intercept
α is included because it can be interpreted as the value
ofHA
GAME
if all the included home advantage factors
were not present (i.e., they are all 0). By observing how
close to 0 it is, the value of the intercept gives insight
into how much of the HA
GAME
can be attributed to
the three included factors.
3 RESULTS
In the first part of our results section, we describe
our findings when comparing the HA
SEASON
and
HA
GAME
metrics across leagues and across seasons
(years) for both basketball and football. In the second
part, we investigate how various factors (crowd impact,
referee bias, and travel fatigue) might influence home
advantage.
3.1 Comparing leagues and seasons in basketball
The results of our analysis for the HA
season
metric
in basketball are visualised in Figure 1. Our analysis
shows that teams achieved 59%-67% of their wins at
home. In the large majority of the cases, teams achieved
more than half of their wins at home, but there are cases
in each league where teams achieve more wins away
than at home, i.e., HA
SEASON
< 0.5. In the Eurocup
and Euroleague, we observe a bit higher variance of
HA
SEASON
. It could be because teams played fewer
games in one season of these competitions, the compe-
tition was organized into groups of 6 teams, because of
this each team played only ten games.
The differences between the leagues might not be
immediately clear from the histograms but are more
evident in the posterior distributions of µ HA
SEASON
,
displayed on the left side of Figure 1. NBA and the
Slovenian league semm to have the smallest amount of
home advantage, while ABA again has the highest level
of home advantage. The distributions for the Eurocup
and the Euroleague seem to be quite aligned, which
is confirmed by quantified comparisons between the
leagues in Table 4, where P(µ HA
SEASON
Eurocup
>
µ HA
SEASON
Euroleauge
) = 0.586 ± 0.011. Further-
more, from the probabilities in the table, we can
conclude that we have one group of leagues (NBA
and the Slovenian league) that very likely has lower
µ HA
SEASON
than another group (ABA, Eurocup and
A QUANTITATIVE ANALYSIS OF HOME ADV ANTAGE IN TOP FOOTBALL AND BASKETBALL LEAGUES 163
Figure 1.: Histograms for the HA
SEASON
metric (left) and posterior distributions calculated with our
Bayesian anaysis of theµ HA
SEASON
parameter (right) for basketball leagues. The black dashed line represents
the distribution mean. The red line represents the value around which the distribution would be centered (0.5) if
there was no home advantage. In the left part of the figure, we see that there are cases in all the leagues when
teams achieve more points in away games than in home games. However, the right part shows that the means are
certainly above 0.5. ABA is likely to have the highest value of the metrics, followed by Euroleague and Eurocup,
while NBA and the Slovenian league seem to have the lowest µ HA
SEASON
value.
SLO ABA EC EL NBA
SLO - ≈ 0 0.005± 0.001 0.006± 0.002 0.857± 0.006
ABA ≈ 1 - 0.865± 0.007 0.913± 0.005 ≈ 1
EC 0.995± 0.001 0.135± 0.007 - 0.586± 0.011 ≈ 1
EL 0.994± 0.002 0.087± 0.005 0.414± 0.011 - ≈ 1
NBA 0.144± 0.006 ≈ 0 ≈ 0 ≈ 0 -
Table 4.: Comparison of µ HA
SEASON
for basketball leagues. Each cell represents P(µ HA
SEASON
i
>
µ HA
SEASON
j
) ± MCSE. ABA, Eurocup and Euroleague surely have higher µ HA
SEASON
than NBA and the
Slovenian league. Furthermore, ABA is likely to have a higher value than Eurocup and Euroleague, but the
probability is not very close to 1.
164 TIMUR KULENOVI
´
C, JURE DEM
ˇ
SAR
Euroleague) – P(µ HA
SEASON
ABA,Eurocup,Euroleague
>
µ HA
SEASON
NBA,SLO
) = 0.9981± 0.0005. Strong claims
about intra-group comparisons cannot be made.
NBA and the Slovenian league consistently have
the lowest home advantage and ABA has the highest
µ HA
SEASON
in almost every season, while the Eurocup
and Euroleague consistently seem to be somewhere in
between. The drop during the COVID-19 seasons is
visible for NBA, ABA and Euroleague, but it is not very
significant for the Slovenian league and Eurocup.
The second metric, HA
GAME
, is distributed nor-
mally with mean values ranging from 2.7 to 4.4, as
shown in the histograms in the left part of Figure 3.
This means that home teams gain a 2.7–4.4 point
higher score difference on average compared to the
expected score difference. Note that the variances of
these distributions are relatively high, i.e., it is not
rare that HA
GAME
is negative. However, the posterior
distributions of µ HA
GAME
parameters in the right part
of the figure suggest that we are extremely confident
that the mean is positive. Once again, ABA has the
highest amount of home advantage, followed by the
Eurocup and Euroleague, while NBA and the Slovenian
league have the least amount of it. This consistency in
conclusions between the establishedHA
SEASON
metric
and the newly introduced HA
GAME
metrics gives as
an assurance that our new metric is a viable descriptor
of home advantage. This is important for more detailed
analyses that follow in the second part of this work.
By observing the results in Table 5, we can claim
that ABA has higherµ HA
GAME
value than the Eurocup
and the Euroleague with probabilities 0.962 and 0.935,
respectively. The Eurocup and the Euroleague seem to
have quite equal amounts of home advantage according
to this metric. The probabilities of these two competi-
tions having higherµ HA
GAME
than the Slovenian league
is fairly high (0.881 for the Eurocup and 0.938 for the
Euroleague). Nevertheless, we can be confident with a
0.953 probability, that even the Slovenian league has a
higher value than NBA. We are confident that ABA is
the league with the highest µ HA
GAME
and that NBA is
the one with the lowest value.
We show the visualisation of µ HA
GAME
over the
seasons in Figure 4. Overall, the posterior distributions
are very similar to those in Figure 2. ABA seems to be
consistently the league with the highest home advantage,
while NBA seems to have the lowest home advantage.
One visible discrepancy of µ HA
GAME
compared to the
other metric is that ABA has a lower value than the
Euroleague and Eurocup in 2007/2008, as this is not the
case with µ HA
SEASON
.
To sum up, both metrics yield similar results. ABA
very likely has the highest amount of home advan-
tage, the Eurocup and Euroleague follow, while the
Slovenian league and NBA have the least amount of
home advantage. The results also show that this order
stays consistent over the seasons without any major
discrepancies across the metrics.
3.2 Comparing leagues and seasons in football
In the left part of Figure 5, we observe that the
HA
SEASON
values are approximately normally dis-
tributed with a mean around 0.59 except for La Liga,
which has a mean above 0.61. In the right part of
the Figure 5 we visualise the posterior distributions of
µ HA
SEASON
. Once more, La Liga has the most promi-
nent home advantage, whereas the other four leagues
seem to be more or less similar to each other in terms
of their µ HA
SEASON
posterior distributions.
We show the probabilities of comparisons between
the leagues in Table 6. The second row consists of the
values estimating the probability that La Liga has higher
µ HA
SEASON
than the other leagues. The lowest proba-
bility in this row is 0.938 (for Ligue 1), which confirms
that La Liga very likely has the highest µ HA
SEASON
.
The probabilities that estimate the relationships between
the other leagues are too small to be able to claim
anything with high confidence.
We visualise the µ HA
SEASON
posterior distributions
over the seasons in Figure 6. Spanish La Liga is quite
consistent in having the highest home advantage over
the seasons, which also explains the higher probabilities
in Table 6. Bundesliga seems to have lower values in
the first four seasons but was consistently somewhat
in the middle for the rest of the seasons. We also
observe a drop in the COVID-19 seasons 2019/2020
and 2020/2021. P(µ HA
SEASON
) drops below 0.5 for
some of the leagues in 2020/2021. Except for these two
seasons, the distributions are positioned approximately
around the same values over the seasons.
As shown in Figure 7,HA
GAME
in football approx-
imately follows the normal distribution with a mean
ranging from 0.315 to 0.429 for different leagues. The
interpretation of this is that home teams in football
gain a bit less than half of a goal of the advantage
on average. It happens quite often that HA
GAME
is
below zero. Nevertheless, based on the visualisation
in the right column of Figure 7, we can be confident
that µ HA
GAME
(the expected mean) is positive for all
football leagues. Again, as with theHA
SEASON
metric,
µ HA
GAME
for La Liga stands out a bit when comparing
leagues’ posterior distributions for the parameter.
Looking at the second row in Table 7, we see that
the probabilities of La Liga having higher µ HA
GAME
than the other leagues are above 0.9, where the lowest
out of these four is the probability that La Liga has a
higher value than the Bundesliga (0.926). Furthermore,
there is 0.897 chance that the Bundesliga has more home
advantage than Serie A, which seems to have the lowest
µ HA
GAME
.
A QUANTITATIVE ANALYSIS OF HOME ADV ANTAGE IN TOP FOOTBALL AND BASKETBALL LEAGUES 165
Figure 2.:µ HA
SEASON
posterior distributions for basketball leagues over the seasons. The order of the leagues
seems to be fairly consistent over the seasons, with ABA having the highest value and NBA and the Slovenian
league with the lowest value.
166 TIMUR KULENOVI
´
C, JURE DEM
ˇ
SAR
Figure 3.: Histograms forHA
GAME
and posterior distributions ofµ HA
GAME
for basketball leagues. The black
dashed line represents the distribution mean. The red line represents the value around which the distribution would
be centered (0) if there was no home advantage. HA
GAME
is distributed normally with positive means but with
frequent negative values. On the right side, we see that it is certain that the mean parameters of the distributions
are above 0.
SLO ABA EC EL NBA
SLO - 0.005± 0.001 0.119± 0.005 0.062± 0.004 0.953± 0.003
ABA 0.995± 0.001 - 0.962± 0.003 0.935± 0.004 ≈ 1
EC 0.881± 0.005 0.038± 0.003 - 0.352± 0.008 0.999± 0.001
EL 0.938± 0.004 0.065± 0.004 0.648± 0.008 - ≈ 1
NBA 0.047± 0.003 ≈ 0 0.001± 0.001 ≈ 0 -
Table 5.: Comparison of µ HA
GAME
for basketball leagues. Each cell represents P(µ HA
GAME
i
> µ HA
GAME
j
)
± MCSE. High values in the NBA column show that all other leagues very likely have higher µ HA
GAME
, while
high values in the ABA row indicate that this league has the most prominent home advantage when concerning
this metric.
The fact that La Liga has the highest µ HA
GAME
is
mostly due to the high values in seasons from 2010/2011
until 2015/2016 as seen in Figure 8. After 2015/2016
La Liga does not have a prominent home advantage
A QUANTITATIVE ANALYSIS OF HOME ADV ANTAGE IN TOP FOOTBALL AND BASKETBALL LEAGUES 167
Figure 4.: Posterior distributions of µ HA
GAME
for basketball leagues over the seasons. Most values of the
distributions lie between 2 and 5, with some exceptions, most notably with ABA having the majority of the
distribution values over 5 in the seasons 2008/2009, 2009/2010, and 2010/2011. Otherwise, the changes do not
seem to correlate with time, so there are no clearly visible trends. The league order mostly stays consistent over
the seasons, just like with the µ HA
SEASON
metric.
168 TIMUR KULENOVI
´
C, JURE DEM
ˇ
SAR
Figure 5.: Histograms for HA
SEASON
and posterior distributions of µ HA
SEASON
for football leagues. The
left column shows distributions of HA
SEASON
, which seem normally distributed around a similar mean. In the
right column, we show the posterior distributions of µ HA
SEASON
parameter, where we observe that La Liga very
likely has a higher mean value than the other four leagues.
ENG SPA ITA GER FRA
ENG - 0.039± 0.004 0.618± 0.008 0.602± 0.008 0.375± 0.008
SPA 0.961± 0.004 - 0.979± 0.002 0.985± 0.002 0.938± 0.004
ITA 0.382± 0.008 0.021± 0.002 - 0.465± 0.008 0.250± 0.007
GER 0.398± 0.008 0.015± 0.002 0.535± 0.008 - 0.272± 0.008
FRA 0.625± 0.008 0.062± 0.004 0.750± 0.007 0.728± 0.008 -
Table 6.: Comparison ofµ HA
SEASON
for football leagues. Each cell representsP(µ HA
SEASON
i
>µ HA
SEASON
j
)
± MCSE. High values in the second row confirm, with a high probability, that La Liga seems to be the league
with the highest overall µ HA
SEASON
.
compared to the other leagues. We stated that the
Bundesliga is likely the league with the second most
home advantage and from these distributions, we can,
in large part, attribute this to three most recent seasons,
when the Bundesliga had the highest µ HA
GAME
value.
Also, we observe a drop in the values in 2020/2021.
Not only is the shift in 2020/2021 very clear, but also
the probabilities of µ HA
GAME
for Ligue 1 and Premier
A QUANTITATIVE ANALYSIS OF HOME ADV ANTAGE IN TOP FOOTBALL AND BASKETBALL LEAGUES 169
Figure 6.: Histograms forHA
SEASON
and posterior distributions ofµ HA
SEASON
for football leagues over the
seasons. La Liga distributions are consistently distributed over the higher values. We can be sure thatµ HA
SEASON
values are over 0.5, except for Premier League and Ligue 1 in 2020/2021.
170 TIMUR KULENOVI
´
C, JURE DEM
ˇ
SAR
Figure 7.: Histograms for HA
GAME
and posterior distributions of µ HA
GAME
for football leagues. La Liga
is very likely to have the highest home advantage according to µ HA
GAME
as well. Serie A is likely to have the
lowest value, however, the confidence in this claim is not very high.
ENG SPA ITA GER FRA
ENG - 0.052± 0.004 0.772± 0.007 0.341± 0.007 0.536± 0.007
SPA 0.948± 0.004 - 0.997± 0.001 0.926± 0.004 0.969± 0.003
ITA 0.228± 0.007 0.004± 0.001 - 0.103± 0.005 0.243± 0.007
GER 0.659± 0.007 0.074± 0.004 0.897± 0.005 - 0.699± 0.007
FRA 0.464± 0.007 0.031± 0.003 0.757± 0.007 0.302± 0.007 -
Table 7.: Comparison of µ HA
GAME
for football leagues. Each cell represents P(µ HA
GAME
i
> µ HA
GAME
j
) ± MCSE. Spanish La Liga is very likely the league with the highestHA
GAME
value (second row). There also seems
to be an important difference between Bundesliga and Serie A, as P(µ HA
GAME
GER
>µ HA
GAME
ITA
) = 0.897.
League drop below 1, similar to the drop inµ HA
SEASON
.
We conclude that Spanish La Liga is the league
with the highest home advantage based on the values
from both metrics. However, this is mostly due to La
Liga’s higher home advantage in certain seasons only.
We cannot claim that this league consistently has the
highest values over the seasons. Furthermore, we see that
the differences between basketball leagues are of lower
degree compared to the differences between football
leagues. The order of football leagues over the seasons
A QUANTITATIVE ANALYSIS OF HOME ADV ANTAGE IN TOP FOOTBALL AND BASKETBALL LEAGUES 171
Figure 8.: Histograms for HA
GAME
and posterior distributions of µ HA
GAME
for football leagues over the
seasons. La Liga likely had the highest µ HA
GAMEseason
from 2010/2011 to 2015/2016, while Serie A very often
had the lowest value.
172 TIMUR KULENOVI
´
C, JURE DEM
ˇ
SAR
is very inconsistent, whereas in basketball, the order
did not change much over time. Again, we can see
consistencies between conclusions reached when using
HA
SEASON
andHA
GAME
which gives validity to our
newly introduced metric.
3.3 Correlation between metrics
In sections that follow, we will present the anal-
ysis of the home advantage factors, where we used
the HA
GAME
metric. As we stated, this is a newly
proposed metric that tries to measure home advantage
on a game level, previous sections already showed
that conclusions reached through both metrics are con-
sistent. To validate it even further and gain addi-
tional insights, we checked its correlation with the
widely used HA
SEASON
metric that measures home
advantage on the season level. Since HA
GAME
and
HA
SEASON
have different scopes of definition, we
grouped HA
GAME
by season and (home) team and
calculated the team’s average in a single season –
HA
GAME
. Next, we calculated Pearson’s correlation
coefficient between HA
SEASON
and HA
GAME
. We
obtained the coefficient for each league separately. The
results are presented in Table 8. Coefficients for football
leagues range from 0.74 to 0.83. Hence, this suggests a
strong correlation between the metrics. The coefficients
for basketball leagues are a bit lower, but still confidently
positive. For four leagues, the values are above 0.55,
which still confirms a relatively high correlation. The
correlation for the Slovenian league is lower though
(0.287).
3.4 Influence of various factors on home advantage
in basketball
In Figure 9, we display the distribution of theRBIAS
over basketball leagues. In the left column of the fig-
ure we observe that the RBIAS values are normally
distributed with means close to 0, however posterior
distrubutions obtained through our Bayesian analysis
(displayed in the right column of the figure) show that
we are very confident that the means are above 0.
This means that teams seem to be getting more foul
calls to their benefit when they are playing at home
court, but as we mentioned, we cannot claim that this
exclusively happens due to the bias of the referees.
Home teams in NBA and Euroleague are getting more
than 0.6 additional calls per game. While ABA, Eu-
rocup, and Slovenian League follow with averages of
0.57, 0.44 and 0.2, respectively. The probability that
µ RBIAS
of Slovenian league is above 0 is still high
– P(µ RBIAS
SLO
> 0) = 0.978± 0.003.
In Figure 10, we visualise theµ RBIASseason
posterior
distributions over time to have better insight if there
are any changes through the seasons. The order of the
leagues matches the order in Figure 9 and stays more
or less the same over the seasons. We also notice that
µ RBIASseason
values for NBA are somewhat declining
over the seasons. A drop in the Euroleague occurred
between the season 2020/2021.
A QUANTITATIVE ANALYSIS OF HOME ADV ANTAGE IN TOP FOOTBALL AND BASKETBALL LEAGUES 173
Sport League Coefficient
Basketball
Slovenian league 0.287
ABA 0.549
Eurocup 0.602
Euroleague 0.552
NBA 0.677
Football
Premier League 0.807
La Liga 0.742
Serie A 0.822
Bundesliga 0.828
Ligue 1 0.804
Table 8.: Pearson’s correlation coefficients betweenHA
SEASON
andHA
GAME
. Coefficients for football leagues
confirm a strong correlation between the metrics. The coefficients for the basketball are somewhat lower. However,
the correlation is still clearly there.
Figure 9.: Histograms for RBIAS and posterior distributions of µ RBIAS
for basketball leagues over the
seasons. The Slovenian league is very likely the one with the lowest µ RBIAS
, while Euroleague and NBA seem
to have the highest values of referee bias.
174 TIMUR KULENOVI
´
C, JURE DEM
ˇ
SAR
Figure 10.: Posterior distributions of µ RBIASseason
for basketball leagues over the seasons. The order of teams
by µ RBIAS
is consistent over the seasons and matches the order in Figure 9. The Slovenian league always had
the lowest value, the Euroleague and NBA most often had the highest values, with NBA values declining in recent
seasons, while the values for ABA and Eurocup are somewhere in between.
A QUANTITATIVE ANALYSIS OF HOME ADV ANTAGE IN TOP FOOTBALL AND BASKETBALL LEAGUES 175
The purpose of Figure 11 is to have an insight into the
distributions ofATT andDIST , which help us interpret
the linear regression coefficients. In the left column, we
observe that NBA games have an average the attendance
of 84% of the capacity, while the games in Slovenian
league have an attendance below 28% of the capacity.
In the right column, we check if any league has a
DIST distribution with a long tail. To some extent, this
occurs in ABA because Maccabi Tel Aviv from Israel
also competed for one season, while the other venues in
ABA are mostly in the region consisting of West Balkan
countries, Croatia and Slovenia. We cannot compare this
variable between the leagues because it is normalised by
the league median. Its purpose is to have a variable on
the same scale across the leagues and not to directly
compare the distances between the leagues.
We show β and intercept posterior distributions ob-
tained by Model 9 in Figure 12 accompanied by prob-
abilities of β being larger than zero in Table 9. The
results differ from league to league. For none of the three
variables we can say that the impact is the same in every
league. In the Slovenian league, β RBIAS
is distributed
around 0, therefore RBIAS seems to neither have a
negative nor a positive impact on HA
GAME
. Interest-
ingly, ATT has a negative correlation with HA
GAME
,
withβ ATT
average of around -3, which means that on a
typical Slovenian league game with ATT = 0.275 this
factor adds -0.825 of HA
GAME
, i.e. almost an extra
point for the away team. This is something that we did
not expect and the explanation for this is not straightfor-
ward. Considering that β ATT
is negative only for this
league, the reason might lie within certain specifics of
the league.DIST variable seems to have positive corre-
lation withHA
GAME
in the Slovenian league, but this is
not significant –P(β DIST
> 0) = 0.787± 0.009. Due
to the negative β ATT
, the intercept for the Slovenian
league (3.865) is higher than the average of HA
GAME
for this league (3.335).
The results for the ABA, Eurocup and Euroleague are
somewhat similar. Negative values ofβ RBIAS
show that
RBIAS seems to have a negative impact – probabilities
P(β RBIAS
)> 0 are 0, 0.012 and 0.006, respectively for
ABA, the Eurocup and the Euroleague. On the contrary,
the probabilities thatATT has a positive effect are very
high, especially for ABA and the Eurocup, and less so
for the Euroleague – probabilities P(β RBIAS
> 0) are
0.961, 0.986 and 0.826, respectively for ABA, the Eu-
rocup and the Euroleague.β DIST
for ABA is distributed
around 0 –P(β DIST
> 0 = 0.604), while the values are
higher for the Eurocup and the Euroleague, henceDIST
might have a positive effect but we cannot claim this
with very high confidence. We compare the intercepts
with the averages of HA
GAME
. The average of ABA
is 4.3, while the intercept is 4.15. Hence, the difference
is not remarkable. The negative impact of RBIAS and
positive impact ofATT in ABA cancel out, implicating
the existence of some other impactful factors that we did
not include in our analysis. Similar findings hold for the
Eurocup and the Euroleague, with HA
GAME
averages
of 3.784 and 3.915. Their averages of the intercepts
are 2.651 and 3.102, respectively. If we compare the
averages for the Eurocup, the positive impact of ATT
and DIST (and negative impact of RBIAS) can be
accounted for more than one point of HA
GAME
on
average (1.1), while in the Euroleague, the impact is
a bit less than one point on average (0.8).
In the NBA, the impact of RBIAS is low, but it
is very likely positive – P(β RBIAS
> 0 = 0.964).
Compared to the other leagues, the value of β ATT
is
quite high, which indicates a strong correlation between
ATT and HA
GAME
in the NBA. The DIST factor
does not seem to be of great importance in the NBA.
With P(β DIST
> 0) = 0.353, it does not seem to have
either a positive or a negative effect. Comparing the
NBA average of HA
GAME
(2.711) with the intercept
(0.864), we infer that the included factors account (ATT
mostly) for almost 2 points of HA
GAME
, which is
noticeably more than for the other leagues. Still, there is
0.864 of a point of HA
GAME
that cannot be explained
with included variables and is caused by some other
factors.
We saw that for three leagues P(β RBIAS
> 0) is
close to 0. Hence, the results suggest RBIAS has a
negative correlation with HA
GAME
. This is something
that we did not expect and should be investigated more
in-depth. However, one explanation could be that the
referees are helping teams when they are having a bad
game with negative HA
GAME
.
3.5 Influence of various factors on home advantage
in football
Before analysing the factors in football, we removed
the data for some leagues before 2012/2013 since they
were missing the attendance information. In Figure 13,
we display the distribution of RBIAS, from the his-
tograms, we see that RBIAS is distributed normally
around 0. On the right side, we observe the posterior dis-
tributions ofµ RBIAS
obtained by applying the Model 8
on the football data.
We notice that the probability of La Liga having a
negative µ RBIAS
is very high, while for the other four
leagues, we can be confident that µ RBIAS
is positive.
In these leagues, the referees seem biased in favor of
the home team. Bundesliga is likely the league with
the highest µ RBIAS
, where, on average, the referees
seem to call 0.785 additional foul in favor of the home
team. The distributions for Ligue 1 and Premier League
have higher variance than the other three leagues. Since
µ RBIAS
was modelled using the hierarchical model
with the seasons on the second level, we visualize the
µ RBIAS
over the seasons in Figure 14.
176 TIMUR KULENOVI
´
C, JURE DEM
ˇ
SAR
Figure 11.: Histograms for ATT and DIST for basketball leagues. NBA seems to have the highest level of
arena fullness, while the arenas in the Slovenian league have an average attendance below 28% of the arenas’
capacities.
P(β RBIAS
> 0) P(β ATT
> 0) P(β DIST
> 0) α P (α> 0)
SLO 0.624± 0.009 0.005± 0.002 0.787± 0.009 3.872± 0.015 ≈ 1
ABA ≈ 0 0.961± 0.004 0.604± 0.008 4.149± 0.01 ≈ 1
EC 0.012± 0.002 0.986± 0.003 0.804± 0.008 2.651± 0.012 ≈ 1
EL 0.006± 0.002 0.826± 0.008 0.886± 0.006 3.102± 0.014 ≈ 1
NBA 0.964± 0.004 ≈ 1 0.353± 0.009 0.864± 0.009 0.995± 0.002
Table 9.: Probabilities of linear regression coefficients being positive for basketball leagues. We also include the
mean and MCSE forα to compare it toµ HA
GAME
. The results suggest that the probabilities ofRBIAS andATT
having positive correlation withHA
GAME
in NBA are high, while theRBIAS in ABA, Eurocup, and Euroleague
is likely to have a negative effect. It is also very likely that ATT has positive correlation with HA
GAME
in ABA
and Eurocup.
Indeed, the distributions over the seasons for Ligue
1 and Premier League are inconsistent, which explains
the higher variance of µ RBIAS
in Figure 13. Similarly,
the low variance in Serie A, La Liga, and Bundesliga
results from the high consistency of their seasonal distri-
butions. The distributions for Bundesliga are relatively
A QUANTITATIVE ANALYSIS OF HOME ADV ANTAGE IN TOP FOOTBALL AND BASKETBALL LEAGUES 177
Figure 12.: Posterior distributions of the β coefficients and the intercept for basketball leagues. The red
dashed line at 0 visualises whether a certain factor has a positive or a negative impact on HA
GAME
. The results
are inconsistent between the leagues, but we can make some general observations. Overall, ATT has a positive
correlation with HA
GAME
, while RBIAS seems to have a negative correlation in three leagues.
consistent, but we observe a change when comparing
the first few seasons (µ RBIAS
around 1) and the last
few seasons (µ RBIAS
around 0.5). In 2020/2021, where
most of the stadiums had no spectators, Ligue 1 and
Premier League (along with the Bundesliga) very likely
had negative µ RBIAS
, which indicates that spectators
might influence referee bias. Overall, there are extra foul
calls for the home team on average (with the exception
of La Liga). However, this value is lower than 1.
178 TIMUR KULENOVI
´
C, JURE DEM
ˇ
SAR
Figure 13.: Histograms for RBIAS and posterior distributions of µ RBIAS
for football leagues over the
seasons. In the histograms, we observe that RBIAS is distributed approximately around 0. From the posterior
distributions we see that µ RBIAS
is very likely negative for La Liga and positive for the other four leagues.
A QUANTITATIVE ANALYSIS OF HOME ADV ANTAGE IN TOP FOOTBALL AND BASKETBALL LEAGUES 179
Figure 14.: Posterior distributions of µ RBIASseason
for football leagues over the seasons. The distributions for
La Liga and Serie A are quite consistent over the seasons, while those for Ligue 1 and Premier League are less
consistent. In the 2020/2021 (the season with the lowest number of spectators) we see that three leagues were
highly likely to have negative µ RBIAS
. Data for some leagues is missing in the first seasons.
180 TIMUR KULENOVI
´
C, JURE DEM
ˇ
SAR
Like in basketball, we show the distributions forATT
andDIST for football leagues in Figure 15. We see that
in all the leagues, there were approximately 500 games
without attendance. The stadiums are often fully packed
(ATT = 1) in the Bundesliga and the Premier League.
These two leagues also have the highest ATT average.
In the DIST histograms, we notice some outliers in
Spanish La Liga, which are due to 2 clubs from the
Canary Islands (Las Palmas and Tenerife), that are quite
far from Mainland Spain.
We apply a Bayesian linear regression Model 9 to the
football data. The obtained posterior distributions of the
coefficients are displayed in Figure 16, the probabilities
of coefficients being larger than 0 are presented in
Table 10. Similar to basketball, the distributions for
β RBIAS
are heavily on the negative side, which means
it is quite probable thatRBIAS has a negative effect on
HA
GAME
. This holds for all the leagues, but it is the
most prominent in La Liga, which also has the lowest
average ofµ RBIAS
. Once again, a negative influence of
RBIAS is something we did not expect and should be
investigated in the future.
We can be quite confident thatβ ATT
is positive and
thatATT has a positive correlation withHA
GAME
. In
Serie A the probability of β ATT
being larger than 0 is
the lowest – P(β ATT
> 0) = 0.862± 0.008, while in
other leagues, this probability is above 0.97.
There is no clear answer when deciding whether
DIST has a positive or negative effect on HA
GAME
.
The distributions ofβ DIST
for Bundesliga and La Liga
are distributed around 0. Their respective probabilities
P(β DIST
> 0) are 0.345 and 0.314, suggesting that
DIST has likely no effect in these two leagues. In Ligue
1 the effect of DIST is very unlikely to be positive –
P(β DIST
> 0) = 0.081. In the Premier League and
Serie A, DIST seems to have a positive correlation
with HA
GAME
(with probabilities 0.93 and 0.992).
Finally, let’s look at the posteriors of the intercepts
and check how well the factors predict the HA
GAME
.
The results for the Premier League seem to be the
most promising. The mean of intercept distribution is
0.046, suggesting that when setting the factors to 0, the
predicted average of HA
GAME
is 0.046. In Table 10,
we also observe that P(α > 0) for Premier League is
not significant – 0.722. Hence, only a small amount of
home advantage in the Premier League can be attributed
to the factors we did not consider in the analysis. The
difference between the average of HA
GAME
and the
average of the intercept (0.346 vs. 0.054) is prominent in
the Premier League. However, it is less prominent in the
other leagues – Ligue 1 (0.344 vs. 0.241), Bundesliga
(0.355 vs. 0.248), Serie A (0.311 vs. 0.153), La Liga
(0.429 vs. 0.327). In these leagues, the models explain
some part of the home advantage with the used factors,
but a large part of it happens due to the other factors
that were not included in our research.
4 DISCUSSION
In this work, we analysed the home advantage in basket-
ball and football by picking five different professional
leagues in each sport. Since we did not find any com-
prehensive datasets, we first had to collect the required
data, which was done by using different web scraping
libraries. In the process, we were unsure as to what
information would be needed for further analysis. As a
result, a large amount of obtained data ended up unused.
The cleaned and consistently preprocessed datasets are
now published in a public repository. As such, they are
available to anyone who wants to use them for their
research.
Once we collected all the necessary data, we analysed
the level of the home advantage. The analysis was done
with hierarchical Bayesian models that helped us to infer
the parameters describing the distributions of used home
advantage metrics. In both sports, we used two different
metrics. The first one is often used in the related work,
while the second one was proposed in this work. The
newly proposed metric HA
GAME
aims to quantify the
home advantage on the level of a single game, while the
other metric can only yield a score based on a group of
games. The interpretation of the HA
GAME
metric is
such that it quantifies the number of goals (in football)
or points (in basketball) that the home team scored above
their expected number.
The results for basketball show that the ABA league
has the highest average of µ HA
GAME
with 4.4, while
the NBA league has the lowest average with 2.7. We
also analysed the values ofµ HA
GAME
over the seasons,
where we found out that they are fairly consistent over
time with a slight decrease in the 2020/2021 season,
which was the season that was highly affected by the
COVID-19 outbreak.
The football results show that Serie A has the lowest
average µ HA
GAME
(0.311), not far behind are Ligue
1, Bundesliga, and Premier League, while La Liga has
a significantly larger average of 0.429. Compared to
basketball, the football distributions over the seasons
are much more overlapping, which suggests that the
differences between the leagues are lower. The decrease
of µ HA
GAME
in the 2020/2021 season is noticeable in
football as well.
In the second part of our analysis, we chose three
factors (referee bias, crowd impact, and travel fatigue),
which we hypothesised might have an impact on the
home advantage. Since we could not measure the fac-
tors directly, we created proxy variables to serve in
their place. For referee bias, we proposed the variable
RBIAS that uses the number of foul calls, for crowd
impact, we used ATT , which is a number that tells
how packed the venue was, and for travel fatigue we
A QUANTITATIVE ANALYSIS OF HOME ADV ANTAGE IN TOP FOOTBALL AND BASKETBALL LEAGUES 181
Figure 15.: Histograms for ATT and DIST for football leagues. The black dashed lines represent the mean of
the distributions. The Bundesliga and Premier League have the highest ATT values. In the right part, we notice
the outliers in the DIST histogram for La Liga, which are due to the clubs from the Canary Islands.
P(β RBIAS
> 0) P(β ATT
> 0) P(β DIST
> 0) α P (α> 0)
FRA 0.086± 0.006 0.999± 0.001 0.081± 0.005 0.241± 0.002 0.999± 0.001
GER 0.149± 0.007 0.972± 0.004 0.345± 0.01 0.248± 0.002 0.998± 0.001
ITA 0.018± 0.003 0.862± 0.008 0.992± 0.002 0.153± 0.001 0.994± 0.002
SPA ≈ 0 0.978± 0.003 0.314± 0.009 0.327± 0.002 ≈ 1
ENG 0.01± 0.002 ≈ 1 0.93± 0.005 0.046± 0.002 0.722± 0.011
Table 10.: Probabilities that linear regression coefficients are positive for football leagues. Low probabilities
forP(β RBIAS
> 0) suggest thatRBIAS is likely to have a negative effect onHA
GAME
, while high probabilities
for P(β ATT
> 0) suggest a positive effect of ATT .
used DIST , the air distance between the home towns
of the teams. To quantify the effect of these variables
on HA
GAME
, we used a Bayesian linear regression.
The results for basketball suggest thatATT has a pos-
itive effect on the home advantage in general, whereas
for DIST we cannot claim with high probability that
it has a positive or a negative effect. In three basketball
leagues, RBIAS is likely to have a negative effect on
home advantage, which is a bit surprising and motivates
us to investigate this in future research. While the
182 TIMUR KULENOVI
´
C, JURE DEM
ˇ
SAR
Figure 16.: Posterior distributions for the β coefficients and the intercept for football leagues The red dashed
line is at 0 to see which variables have a positive or a negative impact on HA
GAME
. In general RBIAS has
somewhat negative impact on HA
GAME
, ATT has positive impact, while the DIST results are mixed across the
leagues.
variables have some correlation with home advantage,
there certainly are other impactful factors that were not
taken into account.
For football, we reached similar conclusions. We are
fairly confident that ATT has a positive correlation
with home advantage, whereas the results suggest that
RBIAS has a negative correlation. The effects of
DIST depend on the league. For the home advantage
in the Premier League, the used factors seem fairly vital
because they can explain most of HA
GAME
. This is,
however, not the case with the other four leagues, where
some part of the home advantage seems to be influenced
by other factors.
We found out that the chosen factors had some
correlation with the home advantage but also that there
exist other factors that seem to have a substaintail impact
on the home advantage. Therefore, one thing that could
be improved in future work would be to include other
potential factors. We observed that RBIAS often had
a negative correlation with home advantage. In future
work, we should investigate why this occurs and pos-
sibly update the methodology of obtaining the variable
that serves in place of the referee bias.
A QUANTITATIVE ANALYSIS OF HOME ADV ANTAGE IN TOP FOOTBALL AND BASKETBALL LEAGUES 183
FUNDING
This work was partially funded by the Cognitive control
beyond executive functions research project (Slovenian
Research and Innovation Agency, J5-4590) and the
Physiological mechanisms of neurological disorers and
diseases research programme (Slovenian Research and
Innovation Agency, P3-0338).
REFERENCES
[1] R. Pollard, “Home advantage in football: A current review of
an unsolved puzzle,” The open sports sciences journal, vol. 1,
no. 1, 2008.
[2] R. Pollard, “Home advantage in soccer: A retrospective analysis,”
Journal of sports sciences, vol. 4, no. 3, pp. 237–248, 1986.
[3] W. S. Leite, “Home advantage: Comparison between the major
european football leagues,” Athens Journal of Sports, vol. 4,
no. 1, pp. 65–74, 2017.
[4] T. Peeters and J. C. van Ours, “Seasonal home advantage
in english professional football; 1974–2018,” De Economist,
vol. 169, no. 1, pp. 107–126, 2021.
[5] P. Marek and F. V´ avra, “Comparison of home advantage in
european football leagues,” Risks, vol. 8, no. 3, p. 87, 2020.
[6] R. Pollard and V . Armatas, “Factors affecting home advantage
in football world cup qualification,” International Journal of
Performance Analysis in Sport, vol. 17, no. 1-2, pp. 121–135,
2017.
[7] N. Van Damme and S. Baert, “Home advantage in european
international soccer: Which dimension of distance matters?,”
Economics, vol. 13, no. 1, 2019.
[8] M. Ponzo and V . Scoppa, “Does the home advantage depend on
crowd support? evidence from same-stadium derbies,” Journal
of Sports Economics, vol. 19, no. 4, pp. 562–582, 2018.
[9] C. Goumas, “Home advantage and referee bias in european
football,” European journal of sport science, vol. 14, no. sup1,
pp. S243–S249, 2014.
[10] R. H. Boyko, A. R. Boyko, and M. G. Boyko, “Referee bias
contributes to home advantage in english premiership football,”
Journal of sports sciences, vol. 25, no. 11, pp. 1185–1194, 2007.
[11] C. J. Boudreaux, S. D. Sanders, and B. Walia, “A natural
experiment to determine the crowd effect upon home court
advantage,” Journal of Sports Economics, vol. 18, no. 7, pp. 737–
749, 2017.
[12] F. Sors, D. Tom´ e Lourido, V . Parisi, I. Santoro, A. Galmonte,
T. Agostini, and M. Murgia, “Pressing crowd noise impairs
the ability of anxious basketball referees to discriminate fouls,”
Frontiers in psychology, vol. 10, p. 2380, 2019.
[13] M. A. G´ omez and R. Pollard, “Reduced home advantage for
basketball teams from capital cities in europe,” european Journal
of sport science, vol. 11, no. 2, pp. 143–148, 2011.
[14] R. Pollard, J. Prieto, and M.-
´
A. G´ omez, “Global differences in
home advantage by country, sport and sex,” International Journal
of Performance Analysis in Sport, vol. 17, no. 4, pp. 586–599,
2017.
[15] M. Tilp and S. Thaller, “Covid-19 has turned home-advantage
into home-disadvantage in the german soccer bundesliga,” Fron-
tiers in sports and active living, vol. 2, p. 165, 2020.
[16] M. C. Leitner and F. Richlan, “No fans–no pressure: Referees in
professional football during the covid-19 pandemic,” Frontiers
in Sports and Active Living, p. 221, 2021.
[17] F. Sors, M. Grassi, T. Agostini, and M. Murgia, “The sound
of silence in association football: Home advantage and referee
bias decrease in matches played without spectators,” European
journal of sport science, pp. 1–9, 2020.
[18] K. Fischer and J. Haucap, “Does crowd support drive the
home advantage in professional soccer? evidence from german
ghost games during the covid-19 pandemic,” Journal of Sports
Economics, 2020.
[19] “Selenium.” https://selenium-python.readthedocs.io/index.html,
2023.
[20] “Requests.” https://requests.readthedocs.io/en/latest/, 2023.
[21] L. Richardson, “Beautiful soup documentation,” April, 2007.
[22] “Basketball dataset.” https://github.com/timurkulenovic/
basketball-dataset, 2023.
[23] “Football dataset.” https://github.com/timurkulenovic/
football-dataset, 2023.
[24] B. Carpenter, A. Gelman, M. Hoffman, D. Lee, B. Goodrich,
M. Betancourt, M. Brubaker, J. Guo, P. Li, and A. Riddell, “Stan
: A probabilistic programming language,” Journal of Statistical
Software, vol. 76, 01 2017.
Jure Demˇ sar received his PhD at the Faculty of Computer and
Information Science, University of Ljubljana in 2017. He is currently
an assistant professor at the same faculty. He is also partially employed
at the Department of Psychology at the University of Ljubljana, where
he serves as a senior researcher on neuroscience oriented projects. His
research interests lie in neuroscience, Bayesian statistics and machine
learning.
Timur Kulenovi´ c graduated from the University of Ljubljana, Faculty
of Computer and Information Science and is currently a data scientist
at Sportradar. His research interests lie in analyses of data in sport.