Advances in Methodology and Statistics / Metodološki zvezki, Vol. 18, No. 2, 2021, 53–71
https://doi.org/10.51936/ltgt2135
Power comparison of ANOVA and Kruskal–Wallis tests when error assumptions are violated
Felix N. Nwobi∗, Felix C. Akanno
Imo State University, Department of Statistics, Owerri, Nigeria
Abstract
The effects of violations of the normality and homogeneity of variances assumptions on the power of the one-way ANOVA F-test are studied in this paper. Simulation experiments were conducted to compare the power of the parametric F-test with the non-parametric Kruskal–Wallis (KW) test in normal/non-normal, equal/unequal variances scenarios and equal/unequal sample group means. Each of these 184 simulation experiments was replicated N = 1000 times and power obtained for both the F and KW tests. The Shapiro–Wilk test for normality and Bartlett’s/Levene’s tests for homogeneity of variances were conducted in each experiment. Results show that the KW test outperformed the F-test in power in 85 of the 92 non-normal cases (85/92). Although the power of the F-test is higher than that of the KW test in 67 out of the 92 experiments under normality assumptions, these differences, in all cases in this study, are not significant (p > 0.05) using both t and sign tests. Based on these results, this study favours the KW test as a more robust test and safer to use than the F-test, especially when the distributional assumptions of data sets are in doubt.
Keywords: ANOVA, Homogeneity test, Kruskal–Wallis test, Normality test, Power comparison
1. Introduction
The one-way analysis of variance (ANOVA) has been one of the most popular statistical methods for the comparison of treatment means from completely randomized design (CRD) experiments since its introduction into the statistics literature in the 1920s by R. A. Fisher. As a classical statistical method, two of the major requirements of the method to produce optimal results are: (i) the data set be normally distributed, and (ii) group variances be homogeneous. In real-life situations, data sets are often not normally distributed and group variances are often unequal, so these assumptions are frequently unattainable. However, many applied researchers, such as those in the fields of business, economics, the social sciences, etc., go ahead to apply the method because they are either not aware of these restrictions or ignorant of the seriousness of their violations.
∗ Corresponding author
Email addresses: fnnwobi@imsu.edu.ng (Felix N. Nwobi), felixtemple@yahoo.com (Felix C. Akanno)
To circumvent the effects of these assumptions, Kruskal and Wallis (1952) introduced a non-parametric version of the analysis of means by ranks as an alternative to ANOVA’s F-test. The remarkable thing about the Kruskal–Wallis (KW) test is that its assumptions are milder than those of the parametric F-test. Effects of violations of homogeneity of variances, non-normality of group means or both on Type I error rate are available (see, for example, Kutner et al., 2005; Legendre & Borcard, 2008; Marcinko, 2014; Moder, 2007, 2010). A good review of the effect of non-normality on the robustness of the F-test can be found in Blanca Mena et al. (2017), especially when degrees of skewness and kurtosis ranging from −1 to 1 are considered.
To compare these methods, Hecke (2012), whose simulation experiments were based on permutation to determine the power of both tests, observed a higher power in the KW test as compared to the classical F-test in the case of non-symmetrical distributions. Lachenbruch and Clements (1991) had demonstrated that the KW test may have greater power than the F-test when the population distributions are not normal. They further argued that, in comparison with the F-test, the KW test is more robust against departures from the assumption of equality of variance. The research carried out by Glass et al. (1972) focused on the powers of the F-test and KW test when the population of interest is skewed. They observed that non-normality has some effect on the Type I error, but that the effect is minimal when the variances are equal. For a completely randomized fixed-effect model of data with binomial errors, the F-test behaved, in general, better than the KW test, controlling the nominal level of significance and presenting higher power (Ferreira et al., 2012).
From the foregoing, opinions of researchers are divided on the robustness of the classical F-test in the analysis of data sets from completely randomized experiments. In this study, we use Monte Carlo simulations to investigate the effects of violations of these assumptions on the power of the F-test and KW test under various scenarios, e.g., unequal means and sample sizes.
2. Two competing tests
2.1. The ANOVA F-test
The ANOVA test is a powerful statistical tool for tests of equality of group means. Using Fisher notation, a one-way ANOVA model may be represented mathematically as
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$
where $i = 1, \ldots, k$, $j = 1, \ldots, n_i$, $Y_{ij}$ is the yield from the $j$-th observation at the $i$-th treatment, and $\mu$ is the general mean effect given by
$$\mu = \sum_{i=1}^{k} n_i \mu_i / n$$
and $\alpha_i$ is the fixed/random effect due to the $i$-th treatment. This means that if there were no treatment differences and no chance causes, then the yield of each observation would be $\mu$. The $\alpha_i$, which is the effect of the $i$-th treatment, is given by
$$\alpha_i = \mu_i - \mu.$$
Therefore the $i$-th treatment increases or decreases the yield by an amount $\alpha_i$. The two basic assumptions of this model are (i) that the data set is normally distributed, $Y \sim N(\mu, \sigma^2)$ and $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$, and (ii) that group variances are equal. The test hypothesis is therefore stated as
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k,$$
with the test statistic
$$F = \frac{MS_{tr}}{MS_e} \qquad (2.1)$$
where $F$ is the test statistic, and $MS_{tr}$ and $MS_e$ are the treatment mean square and error mean square respectively. Based on Equation (2.1), $H_0$ is rejected for a given $\alpha$ if $F > F_{k-1,\,k(n-1);\,\alpha}$.
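As a minimal sketch of Equation (2.1), the Python function below computes the treatment and error mean squares and their ratio from raw group data. It is illustrative only: the study itself used R, and the function name `anova_f` and the toy data are assumptions introduced here. The error degrees of freedom are computed as the total sample size minus $k$, which equals $k(n-1)$ when every group has the same size $n$.

```python
from statistics import fmean


def anova_f(groups):
    """Return (F, df_treatment, df_error) for a one-way layout.

    `groups` is a list of lists, one list of observations per treatment.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    group_means = [fmean(g) for g in groups]

    # Treatment (between-group) sum of squares.
    ss_tr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Error (within-group) sum of squares.
    ss_e = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_tr, df_e = k - 1, n_total - k
    return (ss_tr / df_tr) / (ss_e / df_e), df_tr, df_e


# Hypothetical toy data: three treatments with three observations each.
f_stat, df_tr, df_e = anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df_tr, df_e)
```

For these toy data the treatment sum of squares is 26 on 2 degrees of freedom and the error sum of squares is 6 on 6 degrees of freedom, so the ratio is 13 up to floating-point rounding.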
2.2. The Kruskal–Wallis test
The KW test, as a non-parametric alternative to the one-way ANOVA, assumes that observations in each group come from a population with the same shape of distribution. A problem arises when the observations do not in fact come from populations with the same shape. The null hypothesis associated with this test is given by
$$H_0: \eta_1 = \eta_2 = \cdots = \eta_k$$
where $\eta_i$ is the median of the $i$-th group. This is equivalent to $H_0$: Samples are from identical populations. Let $n$ denote the total number of observations, $n = \sum_{i=1}^{k} n_i$, where $n_i$ is the size of the $i$-th sample, $i = 1, 2, \ldots, k$, and $k$ is the number of groups. Rank the $n$ observations in either ascending or descending order of magnitude and use average ranks when there are ties. Let $R(X_{ij})$ represent the rank assigned to the $j$-th observation from the $i$-th group, $X_{ij}$, and let $R_i$ represent the sum of the ranks assigned to the $i$-th group, $R_i = \sum_{j=1}^{n_i} R(X_{ij})$, $i = 1, 2, \ldots, k$.
Define the test statistic $T$ as
$$T = \frac{1}{S^2}\left(\sum_{i=1}^{k}\frac{R_i^2}{n_i} - \frac{n(n+1)^2}{4}\right)$$
where
$$S^2 = \frac{1}{n-1}\left(\sum_{\text{all ranks}} R(X_{ij})^2 - \frac{n(n+1)^2}{4}\right).$$
If there are no ties, $S^2$ simplifies to $n(n+1)/12$ and the test statistic reduces to
$$T = \frac{12}{n(n+1)}\sum_{i=1}^{k}\frac{R_i^2}{n_i} - 3(n+1).$$
Under the null hypothesis $H_0$ (Lehmann, 2006), $T$ is asymptotically chi-square distributed with $k-1$ degrees of freedom, i.e., $T \sim \chi^2_{k-1}$.
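The ranking and the tie-corrected statistic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' R code: the helper names `average_ranks` and `kruskal_wallis_t` and the small data set are hypothetical. The example also checks numerically that, without ties, the general form of $T$ agrees with the simplified no-ties form.

```python
def average_ranks(values):
    """Rank values in ascending order, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of the 1-based positions start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks


def kruskal_wallis_t(groups):
    """Tie-corrected Kruskal-Wallis statistic T and its degrees of freedom."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of R_i^2 / n_i over groups, walking through the pooled ranks.
    total, offset = 0.0, 0
    for g in groups:
        r_i = sum(ranks[offset:offset + len(g)])
        total += r_i ** 2 / len(g)
        offset += len(g)

    correction = n * (n + 1) ** 2 / 4
    s2 = (sum(r ** 2 for r in ranks) - correction) / (n - 1)
    return (total - correction) / s2, len(groups) - 1


# Hypothetical data without ties; the rank sums are 7, 14 and 24.
groups = [[1.1, 2.2, 3.3], [2.5, 3.9, 4.1], [5.0, 6.0, 7.0]]
t_stat, df = kruskal_wallis_t(groups)
n = 9
t_no_ties = 12 / (n * (n + 1)) * sum(r ** 2 / 3 for r in (7, 14, 24)) - 3 * (n + 1)
print(t_stat, t_no_ties, df)
```

Both forms give approximately 6.489 on 2 degrees of freedom for these data; with ties only the general form applies.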
3. Methodology
3.1. The power of a test
In testing the equality of group means of a data set by whatever method, the researcher will be interested in the correctness or otherwise of the outcome of the test. The outcome is interpreted using a p-value, which is the probability, given that $H_0$ is true, of observing a result at least as extreme as the one obtained; it is compared with a specified level of significance, $\alpha$. When an experimenter erroneously accepts $H_0$ when $H_1$ is true, that experimenter has committed a Type II error, whose probability is stated as $P(\text{Accept } H_0 \mid H_1 \text{ is true}) = \beta$. The statistical power of a test is then defined as the probability that the test correctly rejects $H_0$ when a specified alternative is true, i.e., Power $= P(\text{Reject } H_0 \mid H_1 \text{ is true}) = 1 - \beta$.
Power is influenced mainly by the chosen significance level of the test and the sample size. Since power is stated in terms of probability, its value is within the range $0 \le P \le 1$; therefore, with two methods testing a given data set, the method with the higher $P$ will be considered the better method. Equivalently, a method with a lower power has a higher risk of committing Type II errors (of accepting a null hypothesis when indeed the alternative is true).
3.2. Tests on assumptions
Tests on assumptions will be conducted on each simulation experiment. For the normality assumption, the Shapiro–Wilk $W$ test will be implemented. The hypothesis of interest is $H_0$: Data are normally distributed, versus the alternative, $H_1$: Data are not normally distributed. For the homogeneity of group variances assumption, Bartlett’s $K^2$ test will be applied when data are assumed normal or near-normal, while Levene’s $L$ test will be used when the data set is skewed or otherwise non-normal. Levene’s test, unlike Bartlett’s, is known to be less sensitive to departures from normality. In these two cases, the hypotheses are given by
$$H_0: \sigma_i^2 = \sigma_j^2 \quad \text{vs.} \quad H_1: \sigma_i^2 \ne \sigma_j^2 \text{ for at least one pair } i \ne j.$$
For the three tests in this subsection, $H_0$ will be rejected if $p < \alpha$; if $p > \alpha$, then $H_0$ cannot be rejected, and we conclude that the data are normally distributed or that the group variances are equal, as the case may be. The reader is referred to Sahai and Ageel (2000, pp. 93–107) for the Shapiro–Wilk, Bartlett and Levene tests.
3.3. Tests for differences
3.3.1. The t-test
Each data set generated will be analyzed using two methods, the F-test and the KW test, each of them independently reporting the power of the test. To compare the powers from these methods, a paired t-test is considered to determine whether the difference between their powers is significantly different from zero.
Let $A_i$ and $B_i$, $i = 1, 2, \ldots, n$, denote the power of the F-test and the KW test on the $i$-th experiment respectively. Further, let the differences $d_i = A_i - B_i$ be assumed to be identically distributed, all with the same expected population mean $\mu_d$ and variance $\sigma_d^2$. The hypotheses for this test are given as follows
$$H_0: \mu_d = 0 \quad \text{vs.} \quad H_1: \mu_d \ne 0.$$
The t-statistic for this test is
$$T_n = \frac{\bar{d}}{s/\sqrt{n}} \sim t_{n-1}$$
where $\bar{d}$ and $s$ are the mean and standard deviation respectively of the $d_i$, and $t_{n-1}$ denotes the t distribution with $n-1$ degrees of freedom. Rejection of this null hypothesis at the $\alpha$ level of significance will lead to the conclusion that the power of the F-test is significantly different from the power of the KW test. t-tests will be performed for all the experiments.
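The statistic $T_n$ can be computed directly from two vectors of power values. The following is a small Python sketch; the function name `paired_t` and the four pairs of power values are hypothetical and are not taken from the tables. The p-value would then be read from a t table with $n-1$ degrees of freedom, a step left out here to keep the example dependency-free.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t(power_f, power_kw):
    """Return (T_n, degrees of freedom) for the paired differences d_i = A_i - B_i."""
    d = [a - b for a, b in zip(power_f, power_kw)]
    n = len(d)
    # stdev() uses the n - 1 denominator, matching the sample standard deviation s.
    return fmean(d) / (stdev(d) / sqrt(n)), n - 1


# Hypothetical power values for four experiments.
t_n, df = paired_t([0.10, 0.20, 0.30, 0.40], [0.08, 0.21, 0.27, 0.36])
print(round(t_n, 4), df)
```

Here the differences are 0.02, −0.01, 0.03 and 0.04, giving a mean of 0.02 and $T_n \approx 1.85$ on 3 degrees of freedom.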
3.3.2. The sign test S
A non-parametric sign test will then be used to determine whether the number of positive power differences (F-test more powerful) is significantly smaller than the number of negative power differences (KW test more powerful).
Taking $S^+$ to be the number of positive differences (+) in favour of the F-test out of $m$ pairs, and $S^-$ the number of negative differences, the hypotheses of interest are
$$H_0: P(S^+) = P(S^-) = 0.5 \quad \text{vs.} \quad H_1: P(S^+) < P(S^-),$$
with
$$P(S \le q \mid p = 0.5) = \alpha,$$
where $S$ is the number of experiments in which the F-test has a positive difference ($S = S^+/m^*$, where $m^* = S^+ + S^-$), $q$ is the critical value and $\alpha$ the level of significance. If $S \le q$, then $H_0$ will be rejected, and we conclude that the number of positive differences in favour of the F-test is significantly less than the number of negative differences in favour of the KW test.
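A lower-tail binomial computation of this kind can be sketched in Python as follows. Two points are assumptions made for this illustration rather than statements from the text: tied pairs are dropped before counting, and the p-value is taken as $P(S \le S^+ \mid m^*, 0.5)$. The function name `sign_test_lower_p` is hypothetical. The input vectors are the F and KW power columns of Table 2.

```python
from math import comb


def sign_test_lower_p(power_f, power_kw):
    """Return (S_plus, m_star, p) with p = P(S <= S_plus | m_star, 0.5).

    Tied pairs are dropped, so m_star = S_plus + S_minus.
    """
    s_plus = sum(a > b for a, b in zip(power_f, power_kw))
    s_minus = sum(a < b for a, b in zip(power_f, power_kw))
    m_star = s_plus + s_minus
    p = sum(comb(m_star, j) for j in range(s_plus + 1)) / 2 ** m_star
    return s_plus, m_star, p


# Power columns of Table 2 (normal data, equal n, equal means and variances).
f_power = [0.045, 0.058, 0.054, 0.048, 0.049, 0.049,
           0.056, 0.038, 0.048, 0.053, 0.047, 0.041]
kw_power = [0.042, 0.054, 0.032, 0.047, 0.054, 0.046,
            0.062, 0.042, 0.046, 0.057, 0.041, 0.036]
s_plus, m_star, p_value = sign_test_lower_p(f_power, kw_power)
print(s_plus, m_star, round(p_value, 3))
```

With these inputs the F-test is ahead in 8 of 12 pairs and the lower-tail probability is 3797/4096, about 0.927, so the null hypothesis is not rejected.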
4. Simulation studies
4.1. The criteria
A Monte Carlo simulation is implemented to assess the performance of both the one-way ANOVA and KW methods for the scenarios listed in Table 1. The following setup and conditions have been defined for purposes of clarity (eight scenarios for normal and eight scenarios for non-normal distributions) and reproducibility. All simulations were carried out using R software and plots with MATLAB. Each of the 184 experiments was replicated N = 1000 times.
Table 1: The 16 scenarios for both normal and non-normal distributions
S/N n µ σ
1 Equal Equal Equal
2 Equal Equal Unequal
3 Equal Unequal Equal
4 Equal Unequal Unequal
5 Unequal Equal Equal
6 Unequal Equal Unequal
7 Unequal Unequal Equal
8 Unequal Unequal Unequal
Note: S/N = simulation scenario, n = sample size, µ = mean, σ = standard deviation
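The eight rows of Table 1 are the full crossing of three two-level factors, which a short Python sketch can enumerate. The dictionary layout and the names `MEANS`, `SDS` and `scenarios` are assumptions introduced for illustration; the numerical values are the group means and standard deviations stated in Sections 4.1.2 and 4.1.3.

```python
from itertools import product

# Group means and standard deviations as stated in Sections 4.1.2 and 4.1.3.
MEANS = {"Equal": (8, 8, 8), "Unequal": (8, 9, 11)}
SDS = {"Equal": (5, 5, 5), "Unequal": (5.3, 8.5, 11.3)}

# product() varies the last factor fastest, which reproduces the row order of Table 1.
scenarios = [
    {"scenario": s, "n": n, "mu": MEANS[mu], "sigma": SDS[sd]}
    for s, (n, mu, sd) in enumerate(product(("Equal", "Unequal"), repeat=3), start=1)
]
for row in scenarios:
    print(row)
```

Applying the same eight rows once to normal and once to non-normal data gives the 16 scenarios referred to in the table title.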
4.1.1. Balanced/unbalanced design
In this study $n_i$ ($i = 1, 2, 3$) was decided upon for convenience without loss of generality. By definition, a completely randomized design is said to be balanced if the group sizes are equal, i.e., $n = n_1 = n_2 = n_3$. In this work balanced data is taken to mean $n = n_1 = n_2 = n_3 = 5, 10, 15, \ldots, 60$. In the unbalanced cases, the total sample size is given by the sum of all group sizes, i.e., $n = n_1 + n_2 + n_3$. For example, $n_1 = 3$, $n_2 = 3$, $n_3 = 4$, so $n = 10$. Or, if $n_1 = 13$, $n_2 = 23$, $n_3 = 24$, then $n = 60$.
4.1.2. Equal/unequal group means
In the simulated data sets, equal means are taken to be $\mu = \mu_1 = \mu_2 = \mu_3 = 8$ and unequal means are understood to mean the vector $\mu = (\mu_1, \mu_2, \mu_3) = (8, 9, 11)$.
4.1.3. Homogeneity/heterogeneity of variances
For homogeneity of group variances, each group standard deviation (SD) $\sigma$ is taken to be $\sigma = \sigma_1 = \sigma_2 = \sigma_3 = 5$. However, in the heterogeneity scenario, the vector $\sigma = (\sigma_1, \sigma_2, \sigma_3) = (5.3, 8.5, 11.3)$.
4.1.4. Normal/non-normal
In all simulation experiments, equal or unequal sample sizes, equal or unequal group means and equal or unequal group variances were considered. The eight simulation studies carried out were based on the assumption of normality of data sets so generated. For the non-normal simulations, two scenarios were considered:
1. For equal n, the multivariate non-normal distributed data sets were generated with the mvrnonnorm() function in the semTools and MASS packages of the R software, based on the Vale and Maurelli (1983) method. The parameters of this function include, among others, the variance-covariance matrix $V = (v_{11}, v_{22}, v_{33}, v_{21}, v_{31}, v_{32})$, skewness $Sk = (s_1 = s_2 = s_3 = 1.5)$, and kurtosis $K = (k_1 = k_2 = k_3 = 3.5)$ vectors. While the skewness and kurtosis parameters were kept constant, $V$ varied according as equal or unequal variances: $V = (v_{11} = 5.3, v_{22} = 8.5, v_{33} = 11.3, v_{21} = 2, v_{31} = 1, v_{32} = 2)$ for unequal and $\mathrm{diag}(V) = (v_{11} = v_{22} = v_{33} = 8)$ for equal variances.
2. For unequal n, the log-normal distribution data sets were also generated for equal or unequal means: meanlog $= \mu_1 = \mu_2 = \mu_3 = 8$ or meanlog $= (8, 9, 11)$, and standard deviation sdlog $= (5, 5, 5)$ or $(5.3, 8.5, 11.3)$ respectively.
4.2. The algorithm
The algorithm of the simulation experiment can be depicted in the following steps:
1. Generate three random samples in accordance with Sections 4.1.1–4.1.4.
2. Run both the ANOVA and KW tests on the independent groups simulated in step 1 at the $\alpha = 0.05$ level of significance.
3. Calculate the p-values from the tests.
4. Repeat steps 1–3 1000 times.
5. Calculate the probability of rejecting the null hypothesis when it is true (i.e., Type I error).
6. Compute power by obtaining the proportion of simulation runs that rejected $H_0$.
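The steps above can be sketched as a short Monte Carlo loop. The version below is a Python illustration under several assumptions and is not the authors' R code: it covers the normal scenarios only, it is specialised to three groups so that both p-values have closed forms (the $F_{2,\nu}$ upper tail is $(1 + 2F/\nu)^{-\nu/2}$ and the $\chi^2_2$ upper tail is $e^{-T/2}$), it uses the no-ties form of the KW statistic because continuous data almost surely have no ties, and all function names and the seed are hypothetical.

```python
import math
import random
from statistics import fmean


def f_pvalue_3groups(groups):
    """One-way ANOVA p-value; the closed form is valid only for k = 3 (df1 = 2)."""
    n = sum(len(g) for g in groups)
    grand = fmean(x for g in groups for x in g)
    means = [fmean(g) for g in groups]
    ms_tr = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means)) / 2
    df_e = n - 3
    ms_e = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / df_e
    return (1 + 2 * (ms_tr / ms_e) / df_e) ** (-df_e / 2)  # P(F_{2,df_e} > f)


def kw_pvalue_3groups(groups):
    """Kruskal-Wallis p-value (no-ties formula, chi-square with 2 df)."""
    pooled = sorted((x, i) for i, g in enumerate(groups) for x in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    for rank, (_, i) in enumerate(pooled, start=1):
        rank_sums[i] += rank
    weighted = sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups))
    t = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    return math.exp(-t / 2)  # P(chi2_2 > t)


def estimate_power(sizes, means, sds, reps=1000, alpha=0.05, seed=1):
    """Proportion of replications in which each test rejects H0 (steps 1-6)."""
    rng = random.Random(seed)
    reject_f = reject_kw = 0
    for _ in range(reps):
        groups = [[rng.gauss(m, s) for _ in range(n)]
                  for n, m, s in zip(sizes, means, sds)]
        reject_f += f_pvalue_3groups(groups) < alpha
        reject_kw += kw_pvalue_3groups(groups) < alpha
    return reject_f / reps, reject_kw / reps


# Balanced design, n = 30 per group, unequal means, equal standard deviations.
power_f, power_kw = estimate_power((30, 30, 30), (8, 9, 11), (5, 5, 5))
print(power_f, power_kw)
```

When the group means are equal, the same proportion estimates the Type I error rate of step 5 rather than power. The call shown uses the settings of the n = 30 row of Table 4, so its output can be compared with that row, allowing for Monte Carlo error and a different random number generator.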
5. Results and discussions
Results of the simulation studies carried out in Section 4 above are presented in Tables 2–17. Each of these tables consists of nine columns, the first of which is the sample size, n, an integer for equal sample sizes and a vector with three elements in the case of unbalanced design. The computed power values corresponding to the respective sample sizes for the F-test and KW test are given in columns two and three. The test statistic $W$ for the Shapiro–Wilk test for normality, its corresponding p-value and the decision on $H_0$ are in columns 4–6. Similarly, values displayed in columns 7–9 are results for the $K^2$ statistic from Bartlett’s test for homogeneity of variances when the data set under consideration is assumed normal; otherwise Levene’s $L$ test was used. The $H_0$ columns record whether the hypothesis was retained as true (T) or rejected as false (F), as reflected in the p-value.
For Table 2, 12 experiments were performed and each replicated 1000 times. At the end of each experiment, the F-test and KW test were conducted and their respective powers obtained. Similarly, the Shapiro–Wilk $W$ statistic for the normality test and Bartlett’s $K^2$ statistic and Levene’s $L$ statistic for tests for equality of variances were extracted. Following the Shapiro–Wilk test on the data set when n = 5, µ = 8, σ = 5 in Table 2, we assume that the data set is normally distributed (W = 0.978, p = 0.955 > 0.05). Similarly, Bartlett’s test on the same data set confirms the homogeneity of group variances of the data set ($K^2$ = 0.307, p = 0.858 > 0.05).
For equal means, variances and sample sizes, the normality and equality of variances assumptions were maintained (or nearly so) in all 12 experiments displayed in Table 2. It was observed that the power of the F-test was higher than that of the KW test in eight out of 12 experiments (Table 2 and Figure 1(a)). Similarly, results of power analyses displayed in Table 3 show that the F-test was slightly better than the KW test in only five experiments, while the KW test performed better in six experiments and both tests tied in one of the 12 experiments. This is displayed in Figure 1(b). This poor performance of the F-test may have been due to the violation of the equal variance assumption, where σ = (σ1, σ2, σ3) = (5.3, 8.5, 11.3). The F-test performed better in all 12 experiments in Table 4, as the group sample sizes were equal and the equality of variances assumption was respected. In situations of unequal group means, the power of both tests showed a positive trend with increasing sample sizes (panels (c) and (d) of Figures 1–4). Increasing sample size, however, did not influence the power of the F-test in the log-normal situations. Regardless of group sizes and whether or not the variances were equal, the two tests under comparison demonstrated nearly identical power except in the log-normal scenarios, where the KW test maintained dominance (Figure 4). Similar observations apply to the results presented in Tables 6–9, displayed graphically in Figures 1 and 2, where normal distributions were assumed.
The situation in non-normal scenarios is of interest in this work. Except for results in Table 10, the KW test performed better than the F-test (Tables 11–14), but in multivariate non-normal cases, both tests showed indications of asymptotic convergence when sample sizes are equal and the means are unequal (Tables 12 and 13, and Figure 3 in panels (c) and (d)). The superior performance of the power of the KW test over the F-test in a non-normal scenario is demonstrated under the lognormal distribution (Tables 14–17, and Figure 4).
These results are in tandem with the trends in research involving parametric and non-parametric statistics (Blanca Mena et al., 2017; Sawilowsky et al., 1989), where non-parametric tests are less powerful than parametric tests but the power gap is small. On the other hand, these authors observed that the power advantage of the non-parametric tests under conditions of non-normality can be dramatic.
Further analysis of the power values of the F-test and the KW test was performed using data in Tables 2–17 and is displayed in Table 18. Negative values in the t-statistic column indicate scenarios where the power of the KW test is higher than that of the F-test, and positive values otherwise. The corresponding p-values show the significance of the differences (p < 0.05); the hypothesis of no difference in power is not rejected in 11 out of the 16 tests, especially when data sets were assumed to be normally distributed.
The result of the t-test that was carried out to see if the power of the F-test is indeed higher than that of the KW test, using values in columns 3 and 4 of Table 18, showed that, with 11 degrees of freedom for Tables 2–5 and Tables 10–13 and 10 degrees of freedom for those with 11 experiments, the t-tests rejected the hypothesis of no difference (p < 0.05) in only 5 out of 16 tests. The KW test performed about three times better than the conventional F-test. Since the p-values are very small (p < 0.001) in the five cases, there is a very small probability of these results occurring by chance.
Similarly, the non-parametric sign test results are displayed in Table 18, where the statistic S is the number of positive differences, i.e., experiments where the F-test performed better than the KW test. For instance, 8/12 is understood to mean that the F-test is higher in power than the KW test in 8 out of 12 experiments in Table 2, and that the difference is not statistically significant (p > 0.05). The values of S in this table also show that the KW test outperformed the F-test (p < 0.05) in most of the non-normal scenarios. The overall result shows that the F-test is better in only 74 out of 184 experiments.
Table 2: Normal: Equal n, µ = 8, σ = 5, 12 experiments
Power Normality Homogeneity
n F KW W p $H_0$ $K^2$ p $H_0$
5 0.045 0.042 0.978 0.955 T 0.307 0.858 T
10 0.058 0.054 0.970 0.532 T 2.403 0.297 T
15 0.054 0.032 0.977 0.519 T 1.807 0.405 T
20 0.048 0.047 0.981 0.477 T 1.519 0.468 T
25 0.049 0.054 0.978 0.229 T 0.366 0.833 T
30 0.049 0.046 0.990 0.697 T 2.191 0.334 T
35 0.056 0.062 0.983 0.192 T 1.310 0.520 T
40 0.038 0.042 0.982 0.109 T 5.325 0.070 T
45 0.048 0.046 0.988 0.312 T 4.169 0.124 T
50 0.053 0.057 0.996 0.941 T 0.537 0.765 T
55 0.047 0.041 0.997 0.969 T 0.018 0.991 T
60 0.041 0.036 0.987 0.101 T 0.221 0.895 T
Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p =
p-value
Table 3: Normal: Equal n, µ = 8, σ = (5.3, 8.5, 11.3), 12 experiments
Power Normality Homogeneity
n F KW W p $H_0$ $K^2$ p $H_0$
5 0.048 0.041 0.959 0.678 T 2.499 0.287 T
10 0.051 0.052 0.982 0.865 T 0.118 0.943 T
15 0.044 0.046 0.978 0.540 T 2.436 0.296 T
20 0.058 0.044 0.962 0.059 T 12.998 0.002 F
25 0.057 0.058 0.964 0.032 F 8.874 0.012 F
30 0.047 0.043 0.979 0.154 T 15.743 0.000 F
35 0.046 0.046 0.991 0.704 T 3.976 0.137 T
40 0.051 0.058 0.961 0.002 F 18.580 0.000 F
45 0.058 0.052 0.979 0.034 F 26.686 0.000 F
50 0.045 0.050 0.991 0.414 T 28.913 0.000 F
55 0.065 0.060 0.987 0.123 T 24.962 0.000 F
60 0.058 0.061 0.988 0.114 T 24.053 0.000 F
Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-
value
Table 4: Normal: Equal n, µ = (8, 9, 11), σ = 5, 12 experiments
Power Normality Homogeneity
n F KW W p $H_0$ $K^2$ p $H_0$
5 0.103 0.089 0.958 0.658 T 0.444 0.801 T
10 0.189 0.165 0.948 0.150 T 6.057 0.048 F
15 0.283 0.271 0.952 0.063 T 2.753 0.252 T
20 0.382 0.360 0.986 0.696 T 0.204 0.903 T
25 0.474 0.435 0.994 0.982 T 3.725 0.155 T
30 0.559 0.535 0.979 0.165 T 2.832 0.243 T
35 0.615 0.585 0.994 0.919 T 2.434 0.296 T
40 0.680 0.665 0.988 0.368 T 1.203 0.548 T
45 0.704 0.688 0.991 0.495 T 0.486 0.784 T
50 0.767 0.750 0.997 0.981 T 1.577 0.455 T
55 0.818 0.796 0.988 0.187 T 1.529 0.466 T
60 0.861 0.846 0.991 0.323 T 0.091 0.956 T
Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p =
p-value
Table 5: Normal: Equal n, µ = (8, 9, 11), σ = (5.3, 8.5, 11.3), 12 experiments
Power Normality Homogeneity
n F KW W p $H_0$ $K^2$ p $H_0$
5 0.072 0.067 0.941 0.390 T 1.262 0.532 T
10 0.096 0.082 0.985 0.934 T 2.672 0.263 T
15 0.128 0.111 0.982 0.710 T 4.990 0.082 T
20 0.145 0.130 0.983 0.551 T 8.523 0.014 F
25 0.169 0.162 0.971 0.081 T 28.763 0.000 F
30 0.200 0.196 0.963 0.012 F 12.461 0.002 F
35 0.239 0.212 0.988 0.448 T 3.879 0.144 T
40 0.283 0.253 0.967 0.035 F 24.372 0.000 F
45 0.269 0.257 0.991 0.557 T 10.809 0.004 F
50 0.324 0.306 0.990 0.354 T 20.851 0.000 F
55 0.350 0.328 0.991 0.367 T 30.040 0.000 F
60 0.344 0.323 0.989 0.161 T 28.190 0.000 F
Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p =
p-value
Table 6: Normal: Unequal n, µ = 8, σ = 5, 11 experiments
Power Normality Homogeneity
($n_1$, $n_2$, $n_3$) F KW W p $H_0$ $K^2$ p $H_0$
3,3,4 0.052 0.026 0.896 0.198 T 2.198 0.333 T
4,5,6 0.043 0.037 0.923 0.211 T 6.214 0.045 F
5,7,8 0.043 0.043 0.954 0.432 T 1.940 0.379 T
6,9,10 0.035 0.029 0.964 0.509 T 0.529 0.767 T
7,11,12 0.060 0.056 0.940 0.092 T 1.295 0.523 T
8,13,14 0.053 0.045 0.954 0.152 T 0.021 0.990 T
9,15,16 0.054 0.046 0.961 0.177 T 0.015 0.992 T
10,17,18 0.054 0.051 0.968 0.249 T 1.105 0.576 T
11,19,20 0.042 0.048 0.977 0.421 T 1.199 0.549 T
12,21,22 0.053 0.051 0.985 0.743 T 3.158 0.206 T
13,23,24 0.050 0.035 0.974 0.233 T 2.884 0.237 T
Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value
Table 7: Normal: Unequal n, µ = 8, σ = (5.3, 8.5, 11.3), 11 experiments
Power Normality Homogeneity
($n_1$, $n_2$, $n_3$) F KW W p $H_0$ $K^2$ p $H_0$
(3,3,4) 0.042 0.040 0.966 0.854 T 3.712 0.156 T
(4,5,6) 0.044 0.035 0.880 0.047 F 5.232 0.073 T
(5,7,8) 0.037 0.038 0.976 0.874 T 3.958 0.138 T
(6,9,10) 0.027 0.026 0.995 0.761 T 5.334 0.069 T
(7,11,12) 0.035 0.039 0.951 0.185 T 8.036 0.018 F
(8,13,14) 0.042 0.040 0.968 0.379 T 5.198 0.074 T
(9,15,16) 0.028 0.030 0.934 0.022 F 5.910 0.052 T
(10,17,18) 0.041 0.043 0.977 0.515 T 8.023 0.018 F
(11,19,20) 0.042 0.049 0.985 0.765 T 13.503 0.001 F
(12,21,22) 0.028 0.030 0.971 0.208 T 5.704 0.058 T
(13,23,24) 0.037 0.037 0.958 0.036 F 9.948 0.007 F
Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value
Table 8: Normal: Unequal n, µ = (8, 9, 11), σ = 5, 11 experiments
Power Normality Homogeneity
($n_1$, $n_2$, $n_3$) F KW W p $H_0$ $K^2$ p $H_0$
(3,3,4) 0.076 0.053 0.949 0.655 T 0.739 0.691 T
(4,5,6) 0.105 0.095 0.978 0.953 T 2.133 0.344 T
(5,7,8) 0.149 0.138 0.950 0.373 T 0.468 0.791 T
(6,9,10) 0.157 0.141 0.969 0.613 T 0.819 0.664 T
(7,11,12) 0.175 0.157 0.980 0.814 T 1.066 0.587 T
(8,13,14) 0.188 0.186 0.972 0.497 T 2.821 0.244 T
(9,15,16) 0.241 0.226 0.963 0.218 T 3.338 0.188 T
(10,17,18) 0.294 0.267 0.983 0.749 T 3.265 0.196 T
(11,19,20) 0.286 0.273 0.996 0.999 T 0.715 0.699 T
(12,21,22) 0.344 0.318 0.968 0.145 T 0.275 0.871 T
(13,23,24) 0.350 0.335 0.960 0.047 F 0.017 0.991 T
Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value
Table 9: Normal: Unequal n, µ = (8, 9, 11), σ = (5.3, 8.5, 11.3), 11 experiments
Power Normality Homogeneity
($n_1$, $n_2$, $n_3$) F KW W p $H_0$ $K^2$ p $H_0$
(3,3,4) 0.058 0.041 0.878 0.125 T 3.490 0.175 T
(4,5,6) 0.057 0.053 0.957 0.648 T 3.268 0.195 T
(5,7,8) 0.055 0.048 0.955 0.448 T 3.311 0.191 T
(6,9,10) 0.060 0.064 0.882 0.008 F 8.301 0.016 F
(7,11,12) 0.068 0.076 0.944 0.117 T 2.419 0.298 T
(8,13,14) 0.073 0.073 0.962 0.260 T 6.517 0.038 F
(9,15,16) 0.088 0.090 0.987 0.931 T 3.860 0.145 T
(10,17,18) 0.085 0.078 0.991 0.972 T 5.523 0.063 T
(11,19,20) 0.096 0.099 0.970 0.223 T 18.740 0.000 F
(12,21,22) 0.095 0.094 0.980 0.492 T 9.512 0.009 F
(13,23,24) 0.112 0.101 0.963 0.068 T 18.660 0.000 F
Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value
Table 10: Multivariate Non-normal: Equal n, µ = 8, σ = 5, 12 experiments
Power Normality Homogeneity
n F KW W p $H_0$ L p $H_0$
5 0.025 0.023 0.900 0.095 T 1.904 0.386 T
10 0.019 0.017 0.800 0.000 F 3.544 0.170 T
15 0.023 0.020 0.850 0.000 F 1.443 0.486 T
20 0.016 0.022 0.848 0.000 F 7.634 0.022 F
25 0.015 0.019 0.797 0.000 F 0.852 0.653 T
30 0.014 0.013 0.793 0.000 F 5.715 0.057 T
35 0.021 0.021 0.879 0.000 F 1.980 0.371 T
40 0.015 0.014 0.874 0.000 F 6.144 0.046 F
45 0.019 0.022 0.940 0.000 F 2.355 0.308 T
50 0.011 0.014 0.925 0.000 F 3.346 0.188 T
55 0.014 0.017 0.873 0.000 F 7.189 0.028 F
60 0.017 0.019 0.911 0.000 F 3.670 0.160 T
Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p =
p-value
Table 11: Multivariate Non-normal: Equal n, µ = 8, σ = (5.3, 8.5, 11.3), 12 experiments
Power Normality Homogeneity
n F KW W p $H_0$ L p $H_0$
5 0.030 0.035 0.864 0.028 F 4.093 0.129 T
10 0.023 0.031 0.968 0.494 T 1.377 0.502 T
15 0.022 0.029 0.768 0.000 F 8.910 0.012 F
20 0.035 0.044 0.811 0.000 F 16.417 0.000 F
25 0.023 0.042 0.910 0.000 F 10.169 0.006 F
30 0.024 0.043 0.891 0.000 F 2.911 0.233 T
35 0.023 0.049 0.879 0.000 F 0.562 0.755 T
40 0.022 0.053 0.874 0.000 F 18.222 0.000 F
45 0.028 0.046 0.923 0.000 F 1.142 0.565 T
50 0.023 0.066 0.888 0.000 F 18.459 0.000 F
55 0.021 0.063 0.901 0.000 F 7.396 0.025 F
60 0.024 0.081 0.850 0.000 F 18.287 0.000 F
Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-
value
Table 12: Multivariate Non-normal: Equal n, µ = (8, 9, 11), σ = 5, 12 experiments
Power Normality Homogeneity
n F KW W p $H_0$ L p $H_0$
5 0.434 0.425 0.913 0.148 T 2.730 0.255 T
10 0.734 0.824 0.944 0.114 T 1.230 0.541 T
15 0.907 0.962 0.947 0.040 F 3.762 0.153 T
20 0.973 0.993 0.976 0.275 T 0.437 0.804 T
25 0.994 0.999 0.943 0.002 F 3.370 0.186 T
30 0.998 1.000 0.933 0.000 F 3.578 0.167 T
35 1.000 1.000 0.920 0.000 F 5.186 0.075 T
40 1.000 1.000 0.936 0.000 F 0.199 0.905 T
45 1.000 1.000 0.952 0.000 F 2.228 0.328 T
50 1.000 1.000 0.948 0.000 F 1.752 0.417 T
55 1.000 1.000 0.976 0.005 F 3.326 0.190 T
60 1.000 1.000 0.980 0.012 F 5.754 0.056 T
Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p =
p-value
Table 13: Multivariate Non-normal: Equal n, µ = (8, 9, 11), σ = (5.3, 8.5, 11.3), 12 experiments
Power Normality Homogeneity
n F KW W p $H_0$ L p $H_0$
5 0.501 0.538 0.890 0.005 F 5.792 0.055 T
10 0.743 0.813 0.840 0.000 F 4.099 0.129 T
15 0.225 0.240 0.787 0.003 F 5.704 0.058 T
20 0.865 0.914 0.923 0.001 F 2.016 0.365 T
25 0.936 0.974 0.945 0.003 F 0.757 0.685 T
30 0.976 0.996 0.833 0.000 F 15.273 0.001 F
35 0.988 0.996 0.928 0.000 F 1.549 0.461 T
40 0.998 1.000 0.874 0.000 F 26.109 0.000 F
45 0.998 1.000 0.957 0.000 F 0.475 0.789 T
50 0.999 1.000 0.890 0.000 F 9.985 0.007 F
55 1.000 1.000 0.870 0.000 F 1.805 0.406 T
60 1.000 1.000 0.913 0.000 F 5.562 0.062 T
Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-
value
Table 14: Lognormal: Unequal n, µ = 8, σ = 5, 11 experiments
Power Normality Homogeneity
($n_1$, $n_2$, $n_3$) F KW W p $H_0$ $K^2$ p $H_0$
(3,3,4) 0.042 0.029 0.962 0.811 T 0.505 0.624 T
(4,5,6) 0.051 0.041 0.909 0.131 T 0.148 0.864 T
(5,7,8) 0.053 0.040 0.945 0.297 T 1.806 0.194 T
(6,9,10) 0.043 0.043 0.956 0.333 T 0.044 0.957 T
(7,11,12) 0.059 0.047 0.938 0.080 T 2.604 0.092 T
(8,13,14) 0.052 0.050 0.912 0.008 F 1.657 0.207 T
(9,15,16) 0.052 0.056 0.969 0.340 T 0.000 1.000 T
(10,17,18) 0.054 0.052 0.923 0.005 F 1.508 0.233 T
(11,19,20) 0.054 0.055 0.923 0.005 F 1.508 0.233 T
(12,21,22) 0.056 0.048 0.976 0.323 T 1.192 0.312 T
(13,23,24) 0.053 0.050 0.980 0.437 T 0.229 0.796 T
Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value
Table 15: Lognormal: Unequal n, µ = 8, σ = (5.3, 8.5, 11.3), 11 experiments
Power Normality Homogeneity
($n_1$, $n_2$, $n_3$) F KW W p $H_0$ $K^2$ p $H_0$
(3,3,4) 0.042 0.029 0.962 0.811 T 0.505 0.624 T
(4,5,6) 0.051 0.041 0.909 0.131 T 0.148 0.864 T
(5,7,8) 0.053 0.040 0.945 0.297 T 1.806 0.194 T
(6,9,10) 0.043 0.043 0.956 0.333 T 0.044 0.957 T
(7,11,12) 0.059 0.047 0.938 0.080 T 2.604 0.092 T
(8,13,14) 0.052 0.050 0.912 0.008 F 1.657 0.207 T
(9,15,16) 0.052 0.056 0.969 0.340 T 0.000 1.000 T
(10,17,18) 0.054 0.052 0.923 0.005 F 1.508 0.233 T
(11,19,20) 0.054 0.055 0.923 0.005 F 1.508 0.233 T
(12,21,22) 0.056 0.048 0.976 0.323 T 1.192 0.312 T
(13,23,24) 0.053 0.050 0.980 0.437 T 0.229 0.796 T
Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value
Table 16: Lognormal: Unequal n, µ = (8, 9, 11), σ = 5, 11 experiments
Power Normality Homogeneity
($n_1$, $n_2$, $n_3$) F KW W p $H_0$ $K^2$ p $H_0$
(3,3,4) 0.006 0.046 0.560 0.000 F 0.995 0.417 T
(4,5,6) 0.002 0.030 0.285 0.000 F 0.723 0.506 T
(5,7,8) 0.002 0.035 0.289 0.000 F 0.997 0.390 T
(6,9,10) 0.003 0.044 0.203 0.000 F 0.733 0.492 T
(7,11,12) 0.001 0.041 0.232 0.000 F 0.988 0.355 T
(8,13,14) 0.001 0.033 0.232 0.000 F 0.777 0.468 T
(9,15,16) 0.001 0.042 0.147 0.000 F 0.740 0.484 T
(10,17,18) 0.001 0.036 0.136 0.000 F 0.819 0.448 T
(11,19,20) 0.001 0.044 0.125 0.000 F 0.742 0.482 T
(12,21,22) 0.000 0.026 0.128 0.000 F 0.828 0.443 T
(13,23,24) 0.000 0.030 0.208 0.000 F 0.000 0.145 T
Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value
Table 17: Lognormal: Unequal n, µ = (8, 9, 11), σ = (5.3, 8.5, 11.3), 11 experiments
Power Normality Homogeneity
($n_1$, $n_2$, $n_3$) F KW W p $H_0$ $K^2$ p $H_0$
(3,3,4) 0.008 0.059 0.433 0.000 F 0.673 0.540 T
(4,5,6) 0.004 0.089 0.348 0.000 F 0.817 0.465 T
(5,7,8) 0.007 0.118 0.282 0.000 F 0.830 0.453 T
(6,9,10) 0.007 0.159 0.346 0.000 F 0.946 0.404 T
(7,11,12) 0.004 0.161 0.187 0.000 F 0.840 0.443 T
(8,13,14) 0.006 0.206 0.399 0.000 F 2.279 0.119 T
(9,15,16) 0.009 0.235 0.286 0.000 F 1.913 0.162 T
(10,17,18) 0.008 0.244 0.197 0.000 F 0.511 0.604 T
(11,19,20) 0.009 0.304 0.270 0.000 F 0.725 0.490 T
(12,21,22) 0.004 0.298 0.120 0.000 F 0.735 0.485 T
(13,23,24) 0.011 0.332 0.116 0.000 F 1.828 0.170 T
Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value
Table 18: Summary of performances and tests for differences
t-test Sign test
Distribution Table # t p S p
Normal 2 0.5941 0.559 8/12 0.927
Normal 3 0.5053 0.618 5/12 0.500
Normal 4 0.2031 0.841 12/12 1.000
Normal 5 0.4121 0.684 12/12 1.000
Normal 6 1.8049 0.087 9/11 0.999
Normal 7 −0.1323 0.896 4/11 0.377
Normal 8 0.4035 0.691 11/11 1.000
Normal 9 0.3186 0.753 6/11 0.828
Non-normal 10 −0.6485 0.524 5/12 0.500
Non-normal 11 −5.1883 0.000 0/12 0.000
Non-normal 12 −0.1960 0.846 1/12 0.109
Non-normal 13 −0.4068 0.688 0/12 0.000
Non-normal 14 −13.6000 0.000 0/11 0.000
Non-normal 15 −8.2373 0.000 1/11 0.005
Non-normal 16 −7.0466 0.000 0/11 0.000
Non-normal 17 −11.6210 0.000 0/11 0.000
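The sign-test columns of Table 18 can be illustrated with a short sketch. It assumes a one-sided (lower-tail) sign test in which S counts the experiments where the F-test power exceeds the KW power and tied pairs are dropped; this section does not restate the exact variant, so the function name `sign_test_lower` and that choice are illustrative assumptions rather than the authors' procedure. The power values are transcribed from Table 17.

```python
from math import comb

# Power columns transcribed from Table 17 (lognormal, unequal n, unequal variances).
power_f = [0.008, 0.004, 0.007, 0.007, 0.004, 0.006, 0.009, 0.008, 0.009, 0.004, 0.011]
power_kw = [0.059, 0.089, 0.118, 0.159, 0.161, 0.206, 0.235, 0.244, 0.304, 0.298, 0.332]


def sign_test_lower(x, y):
    """Return (S, n, p) for a lower-tail sign test (hypothetical helper).

    S is the number of pairs with x > y, n the number of non-tied pairs,
    and p = P(X <= S) for X ~ Binomial(n, 1/2).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop tied pairs
    s = sum(d > 0 for d in diffs)
    n = len(diffs)
    p = sum(comb(n, k) for k in range(s + 1)) / 2 ** n
    return s, n, p


s, n, p = sign_test_lower(power_f, power_kw)
print(f"S = {s}/{n}, p = {p:.4f}")
```

With these inputs the F-test power never exceeds the KW power, so S = 0 out of 11 and the lower-tail probability is 1/2048, which rounds to 0.000.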
Figure 1: Normal: (a) Equal n, µ = 8, σ = 5; (b) Equal n, µ = 8,
σ = (5.3, 8.5, 11.3); (c) Equal n, µ = (8, 9, 11), σ = 5; (d) Equal n,
µ = (8, 9, 11), σ = (5.3, 8.5, 11.3)
Figure 2: Normal: (a) Unequal n, µ = 8, σ = 5; (b) Unequal n, µ = 8,
σ = (5.3, 8.5, 11.3); (c) Unequal n, µ = (8, 9, 11), σ = 5; (d) Unequal n,
µ = (8, 9, 11), σ = (5.3, 8.5, 11.3)
Figure 3: Multivariate Non-normal: (a) Equal n, µ = 8, σ = 5; (b) Equal
n, µ = 8, σ = (5.3, 8.5, 11.3); (c) Equal n, µ = (8, 9, 11), σ = 5; (d) Equal
n, µ = (8, 9, 11), σ = (5.3, 8.5, 11.3)
Figure 4: Lognormal: (a) Unequal n, µ = 8, σ = 5; (b) Unequal n, µ = 8,
σ = (5.3, 8.5, 11.3); (c) Unequal n, µ = (8, 9, 11), σ = 5; (d) Unequal n,
µ = (8, 9, 11), σ = (5.3, 8.5, 11.3)
6. Conclusion
The purpose of this study was to compare the power of the parametric ANOVA F-test
and its non-parametric alternative, the Kruskal–Wallis (KW) test, when the assumptions of
normality and homogeneity of variances are violated. The power of both tests showed a
particular pattern in the case of equal means for normal and non-normal situations. In unequal
group mean scenarios, they showed positive trends with increasing sample sizes for balanced
or unbalanced designs, regardless of the distribution of the dataset.
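The kind of power comparison summarised above can be sketched as a small Monte Carlo experiment. This is a minimal illustration under stated assumptions, not the authors' simulation code: it handles exactly three groups (so the F-test p-value has a closed form for 2 numerator degrees of freedom), uses the chi-square approximation with 2 degrees of freedom for the KW statistic without a tie correction, and draws lognormal data with hypothetical log-scale parameters. The function names `f_pvalue`, `kw_pvalue` and `estimate_power` are illustrative.

```python
import math
import random


def f_pvalue(groups):
    """One-way ANOVA p-value for exactly three groups (closed form, df1 = 2)."""
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_w = n_total - 3
    f_stat = (ssb / 2) / (ssw / df_w)
    # Survival function of F(2, df_w): (1 + 2F/df_w)^(-df_w/2)
    return (1 + 2 * f_stat / df_w) ** (-df_w / 2)


def kw_pvalue(groups):
    """Kruskal-Wallis p-value for three groups, chi-square(2) approximation.

    Assumes continuous data (no ties), so no tie correction is applied.
    """
    pooled = sorted((x, i) for i, g in enumerate(groups) for x in g)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    for rank, (_, i) in enumerate(pooled, start=1):
        rank_sums[i] += rank
    h = 12 / (n_total * (n_total + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)
    # Survival function of chi-square with 2 df: exp(-H/2)
    return math.exp(-h / 2)


def estimate_power(sizes, log_means, log_sd, reps=1000, alpha=0.05, seed=1):
    """Estimate rejection rates of both tests over `reps` simulated datasets."""
    rng = random.Random(seed)
    hits_f = hits_kw = 0
    for _ in range(reps):
        groups = [
            [rng.lognormvariate(m, log_sd) for _ in range(n)]
            for n, m in zip(sizes, log_means)
        ]
        hits_f += f_pvalue(groups) < alpha
        hits_kw += kw_pvalue(groups) < alpha
    return hits_f / reps, hits_kw / reps


# Hypothetical log-scale parameters; these are not the paper's settings.
power_f, power_kw = estimate_power(
    (13, 23, 24), (math.log(8), math.log(9), math.log(11)), 1.0
)
print(f"F-test power: {power_f:.3f}, KW power: {power_kw:.3f}")
```

Each replicate draws a fresh dataset, applies both tests at the 5% level, and the proportion of rejections over the replicates serves as the power estimate (or the empirical Type I error rate when the group means are equal).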
This study has shown that in the instances when the F-test was more powerful than the KW
test, the difference is often very difficult to distinguish. However, when the KW test was
demonstrated to be more powerful, especially in non-normal scenarios, it came with a
significant difference (p < 0.05). These results in general imply that the F-test has a higher
risk of accepting the hypothesis of equality of group means when, indeed, they are not equal.
Specifically, the risk of using the F-test in the analysis of non-normal data is very high.
Since perfect normality is rare, if it occurs at all, this study has provided more evidence
that there is quite literally little to lose in using the Kruskal–Wallis test as a
non-parametric alternative to the parametric analysis of variance F-test.
Acknowledgements
The authors gratefully acknowledge the anonymous reviewers and the Editors for their
time, constructive comments, and suggestions that led to the significant improvement of this
paper.
References
Blanca Mena, M. J., Alarcón Postigo, R., Arnau Gras, J., Bono Cabré, R., Bendayan, R., et al.
(2017). Non-normal data: Is ANOVA still a valid option? Psicothema, 29(4), 552–557.
https://doi.org/10.7334/psicothema2016.383
Ferreira, E. B., Rocha, M. C., & Mequelino, D. B. (2012). Monte Carlo evaluation of the
ANOVA’s F and Kruskal–Wallis tests under binomial distribution. Sigmae, 1(1), 126–139.
Glass, G. V., Peckham, P. D., & Sanders, J. R. (1972). Consequences of failure to meet
assumptions underlying the fixed effects analyses of variance and covariance. Review of
Educational Research, 42(3), 237–288. https://doi.org/10.3102/00346543042003237
Hecke, T. V. (2012). Power study of ANOVA versus Kruskal–Wallis test. Journal of Statistics
and Management Systems, 15(2-3), 241–247. https://doi.org/10.1080/09720510.2012.10701623
Kruskal, W. H., & Wallis, W. A. (1952). Use of ranks in one-criterion variance analysis.
Journal of the American Statistical Association, 47(260), 583–621.
Kutner, M., Nachtsheim, C., Neter, J., & Li, W. (2005). Applied linear statistical models
(5th ed.). McGraw-Hill.
Lachenbruch, P. A., & Clements, P. J. (1991). ANOVA, Kruskal–Wallis, normal scores and
unequal variance. Communications in Statistics - Theory and Methods, 20(1), 107–126.
Legendre, P., & Borcard, D. (2008). Statistical comparison of univariate tests of
homogeneity of variances [Unpublished manuscript]. Département de sciences biologiques,
Université de Montréal.
Lehmann, E. L. (2006). Nonparametrics: Statistical methods based on ranks. Springer.
Marcinko, T. (2014). Consequences of assumption violations regarding one-way ANOVA.
Proceedings of The 8th International Days of Statistics and Economics, 116(47),
974–985.
Moder, K. (2007). How to keep the Type I error rate in ANOVA if variances are
heteroscedastic. Austrian Journal of Statistics, 36(3), 179–188.
https://doi.org/10.17713/ajs.v36i3.329
Moder, K. (2010). Alternatives to F-test in one way ANOVA in case of heterogeneity of
variances (a simulation study). Psychological Test and Assessment Modeling, 52(4),
343–353.
Sahai, H., & Ageel, M. I. (2000). The analysis of variance: Fixed, random and mixed models.
Springer.
Sawilowsky, S. S., Blair, R. C., & Higgins, J. J. (1989). An investigation of the Type I error
and power properties of the rank transform procedure in factorial ANOVA. Journal
of Educational and Behavioral Statistics, 14(3), 255–267.
https://doi.org/10.3102/10769986014003255
Vale, C. D., & Maurelli, V. A. (1983). Simulating multivariate nonnormal distributions.
Psychometrika, 48(3), 465–471. https://doi.org/10.1007/BF02293687