Advances in Methodology and Statistics / Metodološki zvezki, Vol. 18, No. 2, 2021, 53–71
https://doi.org/10.51936/ltgt2135

Power comparison of ANOVA and Kruskal–Wallis tests when error assumptions are violated

Felix N. Nwobi ∗, Felix C. Akanno
Imo State University, Department of Statistics, Owerri, Nigeria

∗ Corresponding author
Email addresses: fnnwobi@imsu.edu.ng (Felix N. Nwobi), felixtemple@yahoo.com (Felix C. Akanno)

Abstract

The effects of violations of the normality and homogeneity of variances assumptions on the power of the one-way ANOVA F-test are studied in this paper. Simulation experiments were conducted to compare the power of the parametric F-test with that of the non-parametric Kruskal–Wallis (KW) test in normal/non-normal and equal/unequal variances scenarios, with equal/unequal group means. Each of these 184 simulation experiments was replicated N = 1000 times and the power was obtained for both the F-test and the KW test. The Shapiro–Wilk test for normality and Bartlett's/Levene's tests for homogeneity of variances were conducted in each experiment. Results show that the power of the KW test exceeded that of the F-test in 85 of the 92 non-normal cases (85/92). Although the power of the F-test is higher than that of the KW test in 85 out of the 92 experiments under normality assumptions, these differences are, in all cases in this study, not significant (p > 0.05) using both t and sign tests. Based on these results, this study favours the KW test as more robust and safer to use than the F-test, especially when the distributional assumptions of data sets are in doubt.

Keywords: ANOVA, Homogeneity test, Kruskal–Wallis test, Normality test, Power comparison

1. Introduction

The one-way analysis of variance (ANOVA) has been one of the most popular statistical methods for comparing treatment means from completely randomized design (CRD) experiments since R. A. Fisher introduced it into the statistics literature in the 1920s. As a classical statistical method, two of its major requirements for producing optimal results are: (i) the data set is normally distributed, and (ii) the group variances are homogeneous. In real-life situations, data sets are often not normally distributed and group variances are often unequal, so these assumptions are frequently not met.
However, many applied researchers, such as those in the fields of business, economics and the social sciences, go ahead and apply the method because they are either unaware of these restrictions or do not appreciate the seriousness of their violation.

To circumvent the effects of these assumptions, Kruskal and Wallis (1952) introduced a non-parametric, rank-based analysis as an alternative to ANOVA's F-test. The remarkable thing about the Kruskal–Wallis (KW) test is that its assumptions are milder than those of the parametric F-test. Studies of the effects on the Type I error rate of violating homogeneity of variances, normality, or both are available (see, for example, Kutner et al., 2005; Legendre & Borcard, 2008; Marcinko, 2014; Moder, 2007, 2010). A good review of the effect of non-normality on the robustness of the F-test can be found in Blanca Mena et al. (2017), especially for degrees of skewness and kurtosis ranging from −1 to 1.

To compare these methods, Hecke (2012), whose simulation experiments used permutation to determine the power of both tests, observed a higher power for the KW test than for the classical F-test in the case of non-symmetrical distributions. Lachenbruch and Clements (1991) had demonstrated that the KW test may have greater power than the F-test when the population distributions are not normal. They further argued that, in comparison with the F-test, the KW test is more robust against departures from the assumption of equality of variances. The research carried out by Glass et al. (1972) focused on the power of the F-test and the KW test when the population of interest is skewed. They observed that non-normality has some effect on the Type I error, but that the effect is minimal when the variances are equal.
For a completely randomized fixed-effect model of data with binomial errors, the F-test behaved, in general, better than the KW test, controlling the nominal level of significance and presenting higher power (Ferreira et al., 2012).

From the foregoing, the opinions of researchers are divided on the robustness of the classical F-test in the analysis of data sets from completely randomized experiments. In this study, we use Monte Carlo simulations to investigate the effects of violations of these assumptions on the power of the F-test and the KW test under various scenarios, e.g., unequal means and sample sizes.

2. Two competing tests

2.1. The ANOVA F-test

The ANOVA test is a powerful statistical tool for testing the equality of group means. Using Fisher's notation, a one-way ANOVA model may be represented mathematically as

$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij},$$

where $i = 1, \dots, k$, $j = 1, \dots, n_i$, $Y_{ij}$ is the yield from the $j$-th observation at the $i$-th treatment, $\mu$ is the general mean effect given by

$$\mu = \frac{1}{n} \sum_{i=1}^{k} n_i \mu_i, \qquad n = \sum_{i=1}^{k} n_i,$$

and $\alpha_i$ is the fixed/random effect due to the $i$-th treatment. This means that if there were no treatment differences and no chance causes, the yield of each observation would be $\mu$. The effect of the $i$-th treatment, $\alpha_i$, is given by

$$\alpha_i = \mu_i - \mu.$$

Therefore the $i$-th treatment increases or decreases the yield by an amount $\alpha_i$. The two basic assumptions of this model are (i) that the data set is normally distributed, $Y \sim N(\mu, \sigma^2)$ and $\varepsilon_{ij} \sim N(0, \sigma^2_{\varepsilon})$, and (ii) that the group variances are equal. The test hypothesis is therefore stated as

$$H_0 : \mu_1 = \mu_2 = \dots = \mu_k,$$

for

$$F = \frac{MS_{tr}}{MS_{e}} \qquad (2.1)$$

where $F$ is the test statistic, and $MS_{tr}$ and $MS_{e}$ are the treatment mean square and the error mean square respectively. Based on Equation (2.1), $H_0$ is rejected for a given significance level $\alpha$ if $F > F_{k-1,\,k(n-1);\,\alpha}$, where, in this critical value, $n$ denotes the common group size of a balanced design.

2.2. The Kruskal–Wallis test

The KW test, as a non-parametric alternative to the one-way ANOVA, assumes that the observations in each group come from populations whose distributions have the same shape. A problem arises when the observations do not, in fact, come from populations with the same shape.
The null hypothesis associated with this test is given by

$$H_0 : \eta_1 = \eta_2 = \dots = \eta_k,$$

where $\eta_i$ is the median of the $i$-th group. This is equivalent to $H_0$: Samples are from identical populations. Let $n$ denote the total number of observations, $n = \sum_{i=1}^{k} n_i$, where $n_i$ is the size of the $i$-th sample, $i = 1, 2, \dots, k$, and $k$ is the number of groups. Rank the $n$ observations in either ascending or descending order of magnitude and use average ranks when there are ties. Let $R(X_{ij})$ represent the rank assigned to $X_{ij}$, the $j$-th observation from the $i$-th group, and let $R_i$ represent the sum of the ranks assigned to the $i$-th group, $R_i = \sum_{j=1}^{n_i} R(X_{ij})$, $i = 1, 2, \dots, k$. Define the test statistic $T$ as

$$T = \frac{1}{S^2} \left( \sum_{i=1}^{k} \frac{R_i^2}{n_i} - \frac{n(n+1)^2}{4} \right)$$

where

$$S^2 = \frac{1}{n-1} \left( \sum_{\text{all ranks}} R(X_{ij})^2 - \frac{n(n+1)^2}{4} \right).$$

If there are no ties, $S^2$ simplifies to $n(n+1)/12$ and the test statistic reduces to

$$T = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1).$$

Under the null hypothesis $H_0$ (Lehmann, 2006), $T$ is asymptotically chi-square distributed with $k-1$ degrees of freedom, i.e., $T \sim \chi^2_{k-1}$.

3. Methodology

3.1. The power of a test

In testing the equality of group means of a data set by whatever method, the researcher will be interested in the correctness or otherwise of the outcome of the test. The outcome is interpreted using a p-value, which is the probability, given that $H_0$ is true, of observing a result at least as extreme as the one obtained; it is compared with a specified level of significance, $\alpha$. When an experimenter erroneously accepts $H_0$ when $H_1$ is true, the experimenter has committed a Type II error, whose probability is stated as $P(\text{Accept } H_0 \mid H_1 \text{ is true}) = \beta$. The statistical power of a test is then defined as the probability that the test correctly rejects $H_0$ when a specified alternative is true, i.e., $\text{Power} = P(\text{Reject } H_0 \mid H_1 \text{ is true}) = 1 - \beta$. Power is influenced mainly by the chosen significance level of the test and the sample size. Since power is a probability, its value $P$ lies within the range $0 \le P \le 1$; therefore, when two methods are applied to a given data set, the method with the higher $P$ is considered the better method. Conversely, a method with lower power has a higher risk of committing Type II errors (of accepting a null hypothesis when the alternative is in fact true).
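The statistics of Section 2 and the definition of power above can be tied together in a small Monte Carlo sketch. It is illustrative only and is not the simulation code used in this study. It assumes exactly three groups with normal errors, for which both p-values have closed forms: the upper tail of $F_{2,m}$ is $(1 + 2F/m)^{-m/2}$ and the upper tail of $\chi^2_2$ is $e^{-T/2}$. The function names, the seed, and the default $\alpha = 0.05$ are assumptions made for this example; for other numbers of groups a statistics library would be needed for the tail probabilities.

```python
import math
import random


def f_statistic(groups):
    """One-way ANOVA F = MS_tr / MS_e, as in Equation (2.1)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_tr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_e = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_tr / (k - 1)) / (ss_e / (n - k))


def kw_statistic(groups):
    """Kruskal-Wallis T using average ranks for ties (general S^2 form)."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1, ..., j
        i = j
    c = n * (n + 1) ** 2 / 4
    s2 = (sum(rank_of[x] ** 2 for g in groups for x in g) - c) / (n - 1)
    r_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return (r_term - c) / s2  # undefined if every observation is identical


def p_values_three_groups(groups):
    """Closed-form p-values; valid only for k = 3 groups (assumption)."""
    m = sum(len(g) for g in groups) - 3          # error degrees of freedom
    p_f = (1 + 2 * f_statistic(groups) / m) ** (-m / 2)  # F(2, m) upper tail
    p_kw = math.exp(-kw_statistic(groups) / 2)   # chi-square(2) upper tail
    return p_f, p_kw


def estimate_power(means, sds, sizes, reps=1000, alpha=0.05, seed=1):
    """Share of replicates in which each test rejects H0 at level alpha."""
    rng = random.Random(seed)
    reject_f = reject_kw = 0
    for _ in range(reps):
        groups = [[rng.gauss(mu, sd) for _ in range(size)]
                  for mu, sd, size in zip(means, sds, sizes)]
        p_f, p_kw = p_values_three_groups(groups)
        reject_f += p_f < alpha
        reject_kw += p_kw < alpha
    return reject_f / reps, reject_kw / reps


if __name__ == "__main__":
    # Hypothetical scenario: unequal means, equal variances, 30 per group.
    print(estimate_power((8, 9, 11), (5, 5, 5), (30, 30, 30)))
```

When the group means passed to `estimate_power` are equal, the returned rejection rates estimate the Type I error rate rather than power. Non-normal scenarios would require replacing `rng.gauss` with another generator.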
3.2. Tests on assumptions

Tests on assumptions will be conducted on each simulation experiment. For the normality assumption, the Shapiro–Wilk $W$ test will be implemented. The hypothesis of interest is $H_0$: Data are normally distributed, versus the alternative, $H_1$: Data are not normally distributed. For the homogeneity of group variances assumption, Bartlett's $K^2$ test will be applied when data are assumed normal or near-normal, while Levene's $L$ test will be used when the data set is skewed or otherwise non-normal. Levene's test, unlike Bartlett's, is known to be less sensitive to departures from normality. In these two cases, the hypotheses are given by

$$H_0 : \sigma_i^2 = \sigma_j^2 \quad \text{vs.} \quad H_1 : \sigma_i^2 \ne \sigma_j^2.$$

For the three tests in this subsection, $H_0$ will be rejected if $p < \alpha$; if $p > \alpha$, $H_0$ cannot be rejected, and we conclude that the data are normally distributed (Shapiro–Wilk test) or that the group variances are equal (Bartlett's and Levene's tests). The reader is referred to Sahai and Ageel (2000, pp. 93–107) for the Shapiro–Wilk, Bartlett's and Levene's tests.

3.3. Tests for differences

3.3.1. The t-test

Each data set generated will be analyzed using two methods, the F-test and the KW test, each of them independently reporting the power of the test. To compare the powers from these methods, a paired t-test is used to determine whether the difference between their powers is significantly different from zero.

Let $A_i$ and $B_i$, $i = 1, 2, \dots, n$, denote the power of the F-test and the KW test on the $i$-th experiment respectively. Further, let the differences $d_i = A_i - B_i$ be assumed to be identically distributed, all with the same population mean $\mu_d$ and variance $\sigma_d^2$. The hypotheses for this test are given as follows:

$$H_0 : \mu_d = 0 \quad \text{vs.} \quad H_1 : \mu_d \ne 0.$$

The t-statistic for this test is

$$T_n = \frac{\bar{d}}{s / \sqrt{n}} \sim t_{n-1}$$

where $\bar{d}$ and $s$ are the mean and standard deviation respectively of the $d_i$. Rejection of this null hypothesis at the $\alpha$ level of significance will lead to the conclusion that the power of the F-test is significantly different from the power of the KW test. t-tests will be performed for all the experiments.

3.3.2. The sign test S

A non-parametric sign test will then be used to determine whether the number of experiments with positive power differences (F-test higher) is significantly greater than the number with negative power differences (KW test higher).

Taking $S^+$ to be the number of positive differences (+) in favour of the F-test out of $m$ pairs, the null hypothesis of interest is

$H_0 : P(S^+) = P(S^-) = 0.5$
$H_1 : P(S^+)$
0.05 (0.955)). Similarly, Bartlett's test on the same data set confirms the homogeneity ($K^2 = 0.307$, $p > 0.05$ (0.858)) of the group variances of the data set.

For equal means, variances and sample sizes, the normality and equality of variances assumptions were maintained (or nearly so) in all 12 experiments displayed in Table 2. The power of the F-test was higher than that of the KW test in eight of the 12 experiments (Table 2 and Figure 1(a)). Similarly, the results of the power analyses displayed in Table 3 show that the F-test was slightly better than the KW test in only five experiments, while the KW test performed better in five experiments and both tests tied in two of the 12 experiments. This is displayed in Figure 1(b). This poor performance of the F-test may have been due to the violation of the equal variance assumption, where $(\sigma_1, \sigma_2, \sigma_3) = (5.3, 8.5, 11.3)$. The F-test performed better in all 12 experiments in Table 4, as the group sample sizes were equal and the equality of variances assumption was respected. In situations of unequal group means, the power of both tests showed a positive trend with increasing sample sizes (panels (c) and (d) of Figures 1–4). Increasing sample size, however, did not influence the power of the F-test in the log-normal situations. Regardless of group sizes and whether or not the variances were equal, the two tests under comparison demonstrated nearly identical power, except in the log-normal scenarios, where the KW test maintained dominance (Figure 4). Similar observations apply to the results presented in Tables 6–9, and displayed graphically in Figures 1 and 2, where normal distributions were assumed.

The situation in non-normal scenarios is of interest in this work. Except for the results in Table 10, the KW test performed better than the F-test (Tables 11–14), but in multivariate non-normal cases both tests showed indications of asymptotic convergence when sample sizes are equal and the means are unequal (Tables 12 and 13, and Figure 3, panels (c) and (d)).
The superior power of the KW test over the F-test in a non-normal scenario is demonstrated under the lognormal distribution (Tables 14–17, and Figure 4).

These results are in tandem with the trends in research involving parametric and non-parametric statistics (Blanca Mena et al., 2017; Sawilowsky et al., 1989), where non-parametric tests are less powerful than parametric tests under normality but the power gap is small. On the other hand, these authors observed that the power advantage of the non-parametric tests under conditions of non-normality can be dramatic.

Further analyses of the power values of the F-test and the KW test were performed using the data in Tables 2–17 and are displayed in Table 18. Negative values in the t-statistic column indicate scenarios where the power of the KW test is higher than that of the F-test, and positive values the reverse. The corresponding p-values show the significance of the differences (p < 0.05); the hypothesis of no difference in power is not rejected in 11 out of the 16 tests, especially when data sets were assumed to be normally distributed.

The t-test carried out to see whether the power of the F-test is indeed higher than that of the KW test, using the values in columns 3 and 4 of Table 18, showed that, with 11 degrees of freedom for Tables 2–5 and Tables 10–13 and 10 degrees of freedom for those with 11 experiments, the hypothesis of no difference was rejected (p < 0.05) in only 5 out of 16 tests. The KW test performed about three times better than the conventional F-test. Since the p-values are very small (p < 0.001) in the five cases, there is a very small probability of these results occurring by chance.

Similarly, the non-parametric sign test results are displayed in Table 18, where the statistic S is the number of positive differences, that is, the number of experiments in which the F-test performed better than the KW test. For instance, 8/12 means that the F-test is higher in power than the KW test in 8 out of 12 experiments in Table 2, and that the difference is not statistically significant (p > 0.05). The values of S in this table also show that the KW test outperformed the F-test (p < 0.05) in most of the non-normal scenarios. The overall result shows that the F-test is better in only 74 out of 184 experiments.
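The two tests for differences described in Section 3.3 can be sketched on the power values of Table 2. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, ties are dropped in the sign test, and the reported tail is assumed to be the lower tail $P(S \le S^+)$ under a Binomial($m$, 0.5) null, a convention inferred here because it gives 0.927 for 8 positive differences out of 12, the value shown for Table 2 in Table 18. The paired t-statistic follows the formula of Section 3.3.1 and need not coincide with the t column of Table 18.

```python
import math
from statistics import mean, stdev

# Power values transcribed from Table 2 (n = 5, 10, ..., 60).
power_f = [0.045, 0.058, 0.054, 0.048, 0.049, 0.049,
           0.056, 0.038, 0.048, 0.053, 0.047, 0.041]
power_kw = [0.042, 0.054, 0.032, 0.047, 0.054, 0.046,
            0.062, 0.042, 0.046, 0.057, 0.041, 0.036]


def paired_t(a, b):
    """Return (T_n, degrees of freedom) for the differences d_i = a_i - b_i."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d))), len(d) - 1


def sign_test(a, b):
    """Return (S+, m, P(S <= S+)) with S ~ Binomial(m, 0.5); ties are dropped."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    m = len(diffs)
    s_plus = sum(d > 0 for d in diffs)
    lower_tail = sum(math.comb(m, i) for i in range(s_plus + 1)) / 2 ** m
    return s_plus, m, lower_tail


if __name__ == "__main__":
    print(paired_t(power_f, power_kw))
    print(sign_test(power_f, power_kw))
```

A large lower-tail probability, as here, gives no evidence that the KW test wins more often than the F-test; a value near zero indicates that positive differences are rarer than chance would suggest.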
Table 2: Normal: Equal n, µ = 8, σ = 5, 12 experiments

            Power         Normality         Homogeneity
n           F      KW     W      p      H0      K²  p      H0
5           0.045  0.042  0.978  0.955  T    0.307  0.858  T
10          0.058  0.054  0.970  0.532  T    2.403  0.297  T
15          0.054  0.032  0.977  0.519  T    1.807  0.405  T
20          0.048  0.047  0.981  0.477  T    1.519  0.468  T
25          0.049  0.054  0.978  0.229  T    0.366  0.833  T
30          0.049  0.046  0.990  0.697  T    2.191  0.334  T
35          0.056  0.062  0.983  0.192  T    1.310  0.520  T
40          0.038  0.042  0.982  0.109  T    5.325  0.070  T
45          0.048  0.046  0.988  0.312  T    4.169  0.124  T
50          0.053  0.057  0.996  0.941  T    0.537  0.765  T
55          0.047  0.041  0.997  0.969  T    0.018  0.991  T
60          0.041  0.036  0.987  0.101  T    0.221  0.895  T

Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value

Table 3: Normal: Equal n, µ = 8, σ = (5.3, 8.5, 11.3), 12 experiments

            Power         Normality         Homogeneity
n           F      KW     W      p      H0      K²  p      H0
5           0.048  0.041  0.959  0.678  T    2.499  0.287  T
10          0.051  0.052  0.982  0.865  T    0.118  0.943  T
15          0.044  0.046  0.978  0.540  T    2.436  0.296  T
20          0.058  0.044  0.962  0.059  T   12.998  0.002  F
25          0.057  0.058  0.964  0.032  F    8.874  0.012  F
30          0.047  0.043  0.979  0.154  T   15.743  0.000  F
35          0.046  0.046  0.991  0.704  T    3.976  0.137  T
40          0.051  0.058  0.961  0.002  F   18.580  0.000  F
45          0.058  0.052  0.979  0.034  F   26.686  0.000  F
50          0.045  0.050  0.991  0.414  T   28.913  0.000  F
55          0.065  0.060  0.987  0.123  T   24.962  0.000  F
60          0.058  0.061  0.988  0.114  T   24.053  0.000  F

Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value

Table 4: Normal: Equal n, µ = (8, 9, 11), σ = 5, 12 experiments

            Power         Normality         Homogeneity
n           F      KW     W      p      H0      K²  p      H0
5           0.103  0.089  0.958  0.658  T    0.444  0.801  T
10          0.189  0.165  0.948  0.150  T    6.057  0.048  F
15          0.283  0.271  0.952  0.063  T    2.753  0.252  T
20          0.382  0.360  0.986  0.696  T    0.204  0.903  T
25          0.474  0.435  0.994  0.982  T    3.725  0.155  T
30          0.559  0.535  0.979  0.165  T    2.832  0.243  T
35          0.615  0.585  0.994  0.919  T    2.434  0.296  T
40          0.680  0.665  0.988  0.368  T    1.203  0.548  T
45          0.704  0.688  0.991  0.495  T    0.486  0.784  T
50          0.767  0.750  0.997  0.981  T    1.577  0.455  T
55          0.818  0.796  0.988  0.187  T    1.529  0.466  T
60          0.861  0.846  0.991  0.323  T    0.091  0.956  T

Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value

Table 5: Normal: Equal n, µ = (8, 9, 11), σ = (5.3, 8.5, 11.3), 12 experiments

            Power         Normality         Homogeneity
n           F      KW     W      p      H0      K²  p      H0
5           0.072  0.067  0.941  0.390  T    1.262  0.532  T
10          0.096  0.082  0.985  0.934  T    2.672  0.263  T
15          0.128  0.111  0.982  0.710  T    4.990  0.082  T
20          0.145  0.130  0.983  0.551  T    8.523  0.014  F
25          0.169  0.162  0.971  0.081  T   28.763  0.000  F
30          0.200  0.196  0.963  0.012  F   12.461  0.002  F
35          0.239  0.212  0.988  0.448  T    3.879  0.144  T
40          0.283  0.253  0.967  0.035  F   24.372  0.000  F
45          0.269  0.257  0.991  0.557  T   10.809  0.004  F
50          0.324  0.306  0.990  0.354  T   20.851  0.000  F
55          0.350  0.328  0.991  0.367  T   30.040  0.000  F
60          0.344  0.323  0.989  0.161  T   28.190  0.000  F

Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value

Table 6: Normal: Unequal n, µ = 8, σ = 5, 11 experiments

            Power         Normality         Homogeneity
(n1,n2,n3)  F      KW     W      p      H0      K²  p      H0
(3,3,4)     0.052  0.026  0.896  0.198  T    2.198  0.333  T
(4,5,6)     0.043  0.037  0.923  0.211  T    6.214  0.045  F
(5,7,8)     0.043  0.043  0.954  0.432  T    1.940  0.379  T
(6,9,10)    0.035  0.029  0.964  0.509  T    0.529  0.767  T
(7,11,12)   0.060  0.056  0.940  0.092  T    1.295  0.523  T
(8,13,14)   0.053  0.045  0.954  0.152  T    0.021  0.990  T
(9,15,16)   0.054  0.046  0.961  0.177  T    0.015  0.992  T
(10,17,18)  0.054  0.051  0.968  0.249  T    1.105  0.576  T
(11,19,20)  0.042  0.048  0.977  0.421  T    1.199  0.549  T
(12,21,22)  0.053  0.051  0.985  0.743  T    3.158  0.206  T
(13,23,24)  0.050  0.035  0.974  0.233  T    2.884  0.237  T

Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value

Table 7: Normal: Unequal n, µ = 8, σ = (5.3, 8.5, 11.3), 11 experiments

            Power         Normality         Homogeneity
(n1,n2,n3)  F      KW     W      p      H0      K²  p      H0
(3,3,4)     0.042  0.040  0.966  0.854  T    3.712  0.156  T
(4,5,6)     0.044  0.035  0.880  0.047  F    5.232  0.073  T
(5,7,8)     0.037  0.038  0.976  0.874  T    3.958  0.138  T
(6,9,10)    0.027  0.026  0.995  0.761  T    5.334  0.069  T
(7,11,12)   0.035  0.039  0.951  0.185  T    8.036  0.018  F
(8,13,14)   0.042  0.040  0.968  0.379  T    5.198  0.074  T
(9,15,16)   0.028  0.030  0.934  0.022  T    5.910  0.052  T
(10,17,18)  0.041  0.043  0.977  0.515  T    8.023  0.018  F
(11,19,20)  0.042  0.049  0.985  0.765  T   13.503  0.001  F
(12,21,22)  0.028  0.030  0.971  0.208  T    5.704  0.058  T
(13,23,24)  0.037  0.037  0.958  0.036  F    9.948  0.007  F

Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value

Table 8: Normal: Unequal n, µ = (8, 9, 11), σ = 5, 11 experiments

            Power         Normality         Homogeneity
(n1,n2,n3)  F      KW     W      p      H0      K²  p      H0
(3,3,4)     0.076  0.053  0.949  0.655  T    0.739  0.691  T
(4,5,6)     0.105  0.095  0.978  0.953  T    2.133  0.344  T
(5,7,8)     0.149  0.138  0.950  0.373  T    0.468  0.791  T
(6,9,10)    0.157  0.141  0.969  0.613  T    0.819  0.664  T
(7,11,12)   0.175  0.157  0.980  0.814  T    1.066  0.587  T
(8,13,14)   0.188  0.186  0.972  0.497  T    2.821  0.244  T
(9,15,16)   0.241  0.226  0.963  0.218  T    3.338  0.188  T
(10,17,18)  0.294  0.267  0.983  0.749  T    3.265  0.196  T
(11,19,20)  0.286  0.273  0.996  0.999  T    0.715  0.699  T
(12,21,22)  0.344  0.318  0.968  0.145  T    0.275  0.871  T
(13,23,24)  0.350  0.335  0.960  0.047  F    0.017  0.991  T

Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value

Table 9: Normal: Unequal n, µ = (8, 9, 11), σ = (5.3, 8.5, 11.3), 11 experiments

            Power         Normality         Homogeneity
(n1,n2,n3)  F      KW     W      p      H0      K²  p      H0
(3,3,4)     0.058  0.041  0.878  0.125  T    3.490  0.175  T
(4,5,6)     0.057  0.053  0.957  0.648  T    3.268  0.195  T
(5,7,8)     0.055  0.048  0.955  0.448  T    3.311  0.191  T
(6,9,10)    0.060  0.064  0.882  0.008  F    8.301  0.016  F
(7,11,12)   0.068  0.076  0.944  0.117  T    2.419  0.298  T
(8,13,14)   0.073  0.073  0.962  0.260  T    6.517  0.038  F
(9,15,16)   0.088  0.090  0.987  0.931  T    3.860  0.145  T
(10,17,18)  0.085  0.078  0.991  0.972  T    5.523  0.063  T
(11,19,20)  0.096  0.099  0.970  0.223  T   18.740  0.000  F
(12,21,22)  0.095  0.094  0.980  0.492  T    9.512  0.009  F
(13,23,24)  0.112  0.101  0.963  0.068  T   18.660  0.000  F

Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value

Table 10: Multivariate Non-normal: Equal n, µ = 8, σ = 5, 12 experiments

            Power         Normality         Homogeneity
n           F      KW     W      p      H0       L  p      H0
5           0.025  0.023  0.900  0.095  T    1.904  0.386  T
10          0.019  0.017  0.800  0.000  F    3.544  0.170  T
15          0.023  0.020  0.850  0.000  F    1.443  0.486  T
20          0.016  0.022  0.848  0.000  F    7.634  0.022  F
25          0.015  0.019  0.797  0.000  F    0.852  0.653  T
30          0.014  0.013  0.793  0.000  F    5.715  0.057  T
35          0.021  0.021  0.879  0.000  F    1.980  0.371  T
40          0.015  0.014  0.874  0.000  F    6.144  0.046  F
45          0.019  0.022  0.940  0.000  F    2.355  0.308  T
50          0.011  0.014  0.925  0.000  F    3.346  0.188  T
55          0.014  0.017  0.873  0.000  F    7.189  0.028  F
60          0.017  0.019  0.911  0.000  F    3.670  0.160  T

Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value

Table 11: Multivariate Non-normal: Equal n, µ = 8, σ = (5.3, 8.5, 11.3), 12 experiments

            Power         Normality         Homogeneity
n           F      KW     W      p      H0       L  p      H0
5           0.030  0.035  0.864  0.028  F    4.093  0.129  T
10          0.023  0.031  0.968  0.494  T    1.377  0.502  T
15          0.022  0.029  0.768  0.000  F    8.910  0.012  F
20          0.035  0.044  0.811  0.000  F   16.417  0.000  F
25          0.023  0.042  0.910  0.000  F   10.169  0.006  F
30          0.024  0.043  0.891  0.000  F    2.911  0.233  T
35          0.023  0.049  0.879  0.000  F    0.562  0.755  T
40          0.022  0.053  0.874  0.000  F   18.222  0.000  F
45          0.028  0.046  0.923  0.000  F    1.142  0.565  T
50          0.023  0.066  0.888  0.000  F   18.459  0.000  F
55          0.021  0.063  0.901  0.000  F    7.396  0.025  F
60          0.024  0.081  0.850  0.000  F   18.287  0.000  F

Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value
Table 12: Multivariate Non-normal: Equal n, µ = (8, 9, 11), σ = 5, 12 experiments

            Power         Normality         Homogeneity
n           F      KW     W      p      H0       L  p      H0
5           0.434  0.425  0.913  0.148  T    2.730  0.255  T
10          0.734  0.824  0.944  0.114  T    1.230  0.541  T
15          0.907  0.962  0.947  0.040  F    3.762  0.153  T
20          0.973  0.993  0.976  0.275  T    0.437  0.804  T
25          0.994  0.999  0.943  0.002  F    3.370  0.186  T
30          0.998  1.000  0.933  0.000  F    3.578  0.167  T
35          1.000  1.000  0.920  0.000  F    5.186  0.075  T
40          1.000  1.000  0.936  0.000  F    0.199  0.905  T
45          1.000  1.000  0.952  0.000  F    2.228  0.328  T
50          1.000  1.000  0.948  0.000  F    1.752  0.417  T
55          1.000  1.000  0.976  0.005  F    3.326  0.190  T
60          1.000  1.000  0.980  0.012  F    5.754  0.056  T

Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value

Table 13: Multivariate Non-normal: Equal n, µ = (8, 9, 11), σ = (5.3, 8.5, 11.3), 12 experiments

            Power         Normality         Homogeneity
n           F      KW     W      p      H0       L  p      H0
5           0.501  0.538  0.890  0.005  F    5.792  0.055  T
10          0.743  0.813  0.840  0.000  F    4.099  0.129  T
15          0.225  0.240  0.787  0.003  F    5.704  0.058  T
20          0.865  0.914  0.923  0.001  F    2.016  0.365  T
25          0.936  0.974  0.945  0.003  F    0.757  0.685  T
30          0.976  0.996  0.833  0.000  F   15.273  0.001  F
35          0.988  0.996  0.928  0.000  F    1.549  0.461  T
40          0.998  1.000  0.874  0.000  F   26.109  0.000  F
45          0.998  1.000  0.957  0.000  F    0.475  3.789  T
50          0.999  1.000  0.890  0.000  F    9.985  0.007  F
55          1.000  1.000  0.870  0.000  F    1.805  0.406  T
60          1.000  1.000  0.913  0.000  F    5.562  0.062  T

Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value

Table 14: Lognormal: Unequal n, µ = 8, σ = 5, 11 experiments

            Power         Normality         Homogeneity
(n1,n2,n3)  F      KW     W      p      H0      K²  p      H0
(3,3,4)     0.042  0.029  0.962  0.811  T    0.505  0.624  T
(4,5,6)     0.051  0.041  0.909  0.131  T    0.148  0.864  T
(5,7,8)     0.053  0.040  0.945  0.297  T    1.806  0.194  T
(6,9,10)    0.043  0.043  0.956  0.333  T    0.044  0.957  T
(7,11,12)   0.059  0.047  0.938  0.080  T    2.604  0.092  T
(8,13,14)   0.052  0.050  0.912  0.008  F    1.657  0.207  T
(9,15,16)   0.052  0.056  0.969  0.340  T    0.000  1.000  T
(10,17,18)  0.054  0.052  0.923  0.005  F    1.508  0.233  T
(11,19,20)  0.054  0.055  0.923  0.005  F    1.508  0.233  T
(12,21,22)  0.056  0.048  0.976  0.323  T    1.192  0.312  T
(13,23,24)  0.053  0.050  0.980  0.437  T    0.229  0.796  T

Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value

Table 15: Lognormal: Unequal n, µ = 8, σ = (5.3, 8.5, 11.3), 11 experiments

            Power         Normality         Homogeneity
(n1,n2,n3)  F      KW     W      p      H0      K²  p      H0
(3,3,4)     0.042  0.029  0.962  0.811  T    0.505  0.624  T
(4,5,6)     0.051  0.041  0.909  0.131  T    0.148  0.864  T
(5,7,8)     0.053  0.040  0.945  0.297  T    1.806  0.194  T
(6,9,10)    0.043  0.043  0.956  0.333  T    0.044  0.957  T
(7,11,12)   0.059  0.047  0.938  0.080  T    2.604  0.092  T
(8,13,14)   0.052  0.050  0.912  0.008  F    1.657  0.207  T
(9,15,16)   0.052  0.056  0.969  0.340  T    0.000  1.000  T
(10,17,18)  0.054  0.052  0.923  0.005  F    1.508  0.233  T
(11,19,20)  0.054  0.055  0.923  0.005  F    1.508  0.233  T
(12,21,22)  0.056  0.048  0.976  0.323  T    1.192  0.312  T
(13,23,24)  0.053  0.050  0.980  0.437  T    0.229  0.796  T

Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value

Table 16: Lognormal: Unequal n, µ = (8, 9, 11), σ = 5, 11 experiments

            Power         Normality         Homogeneity
(n1,n2,n3)  F      KW     W      p      H0      K²  p      H0
(3,3,4)     0.006  0.046  0.560  0.000  F    0.995  0.417  T
(4,5,6)     0.002  0.030  0.285  0.000  F    0.723  0.506  T
(5,7,8)     0.002  0.035  0.289  0.000  F    0.997  0.390  T
(6,9,10)    0.003  0.044  0.203  0.000  F    0.733  0.492  T
(7,11,12)   0.001  0.041  0.232  0.000  F    0.988  0.355  T
(8,13,14)   0.001  0.033  0.232  0.000  F    0.777  0.468  T
(9,15,16)   0.001  0.042  0.147  0.000  F    0.740  0.484  T
(10,17,18)  0.001  0.036  0.136  0.000  F    0.819  0.448  T
(11,19,20)  0.001  0.044  0.125  0.000  F    0.742  0.482  T
(12,21,22)  0.000  0.026  0.128  0.000  F    0.828  0.443  T
(13,23,24)  0.000  0.030  0.208  0.000  F    0.000  0.145  T

Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value

Table 17: Lognormal: Unequal n, µ = (8, 9, 11), σ = (5.3, 8.5, 11.3), 11 experiments

            Power         Normality         Homogeneity
(n1,n2,n3)  F      KW     W      p      H0      K²  p      H0
(3,3,4)     0.008  0.059  0.433  0.000  F    0.673  0.540  T
(4,5,6)     0.004  0.089  0.348  0.000  F    0.817  0.465  T
(5,7,8)     0.007  0.118  0.282  0.000  F    0.830  0.453  T
(6,9,10)    0.007  0.159  0.346  0.000  F    0.946  0.404  T
(7,11,12)   0.004  0.161  0.187  0.000  F    0.840  0.443  T
(8,13,14)   0.006  0.206  0.399  0.000  F    2.279  0.119  T
(9,15,16)   0.009  0.235  0.286  0.000  F    1.913  0.162  T
(10,17,18)  0.008  0.244  0.197  0.000  F    0.511  0.604  T
(11,19,20)  0.009  0.304  0.270  0.000  F    0.725  0.490  T
(12,21,22)  0.004  0.298  0.120  0.000  F    0.735  0.485  T
(13,23,24)  0.011  0.332  0.116  0.000  F    1.828  0.170  T

Note: n = sample size, F = F-test, KW = Kruskal–Wallis test, p = p-value

Table 18: Summary of performances and tests for differences

                        t-test              Sign test
Distribution  Table #   t         p         S       p
Normal        2           0.5941  0.559     8/12    0.927
Normal        3           0.5053  0.618     5/12    0.500
Normal        4           0.2031  0.841     12/12   1.000
Normal        5           0.4121  0.684     12/12   1.000
Normal        6           1.8049  0.087     9/11    0.999
Normal        7          −0.1323  0.896     4/11    0.377
Normal        8           0.4035  0.691     11/11   1.000
Normal        9           0.3186  0.753     6/11    0.828
Non-normal    10         −0.6485  0.524     5/12    0.500
Non-normal    11         −5.1883  0.000     0/12    0.000
Non-normal    12         −0.1960  0.846     1/12    0.109
Non-normal    13         −0.4068  0.688     0/12    0.000
Non-normal    14        −13.6000  0.000     0/11    0.000
Non-normal    15         −8.2373  0.000     1/11    0.005
Non-normal    16         −7.0466  0.000     0/11    0.000
Non-normal    17        −11.6210  0.000     0/11    0.000

Figure 1: Normal: (a) Equal n, µ = 8, σ = 5; (b) Equal n, µ = 8, σ = (5.3, 8.5, 11.3); (c) Equal n, µ = (8, 9, 11), σ = 5; (d) Equal n, µ = (8, 9, 11), σ = (5.3, 8.5, 11.3)

Figure 2: Normal: (a) Unequal n, µ = 8, σ = 5; (b) Unequal n, µ = 8, σ = (5.3, 8.5, 11.3); (c) Unequal n, µ = (8, 9, 11), σ = 5; (d) Unequal n, µ = (8, 9, 11), σ = (5.3, 8.5, 11.3)

Figure 3: Multivariate Non-normal: (a) Equal n, µ = 8, σ = 5; (b) Equal n, µ = 8, σ = (5.3, 8.5, 11.3); (c) Equal n, µ = (8, 9, 11), σ = 5; (d) Equal n, µ = (8, 9, 11), σ = (5.3, 8.5, 11.3)

Figure 4: Lognormal: (a) Unequal n, µ = 8, σ = 5; (b) Unequal n, µ = 8, σ = (5.3, 8.5, 11.3); (c) Unequal n, µ = (8, 9, 11), σ = 5; (d) Unequal n, µ = (8, 9, 11), σ = (5.3, 8.5, 11.3)

6. Conclusion

The purpose of this study was to compare the power of the parametric ANOVA F-test and its non-parametric alternative, the Kruskal–Wallis (KW) test, when the assumptions of normality and homogeneity of variances are violated. The power of both tests showed a particular pattern in the case of equal means for normal and non-normal situations. In unequal group mean scenarios, both showed positive trends with increasing sample sizes for balanced or unbalanced designs, the distribution of the data set notwithstanding.

This study has shown that, in the instances when the F-test was more powerful than the KW test, the difference is often very difficult to distinguish. However, when the KW test was demonstrated to be more powerful, especially in non-normal scenarios, the difference was significant (p < 0.05). These results in general imply that the F-test has a higher risk of accepting the hypothesis of equality of group means when, indeed, they are not equal. Specifically, the risk of using the F-test in the analysis of non-normal data is very high. Since perfect normality is rare, if it occurs at all, this study has provided more evidence that there is little to lose in using the Kruskal–Wallis test as a non-parametric alternative to the parametric analysis of variance F-test.

Acknowledgements

The authors gratefully acknowledge the anonymous reviewers and the Editors for their time, constructive comments, and suggestions that led to the significant improvement of this paper.

References

Blanca Mena, M. J., Alarcón Postigo, R., Arnau Gras, J., Bono Cabré, R., Bendayan, R., et al. (2017). Non-normal data: Is ANOVA still a valid option? Psicothema, 29(4), 552–557. https://doi.org/10.7334/psicothema2016.383

Ferreira, E. B., Rocha, M. C., & Mequelino, D. B. (2012). Monte Carlo evaluation of the ANOVA's F and Kruskal–Wallis tests under binomial distribution. Sigmae, 1(1), 126–139.

Glass, G. V., Peckham, P. D., & Sanders, J. R. (1972). Consequences of failure to meet assumptions underlying the fixed effects analyses of variance and covariance. Review of Educational Research, 42(3), 237–288. https://doi.org/10.3102/00346543042003237

Hecke, T. V. (2012). Power study of ANOVA versus Kruskal–Wallis test. Journal of Statistics and Management Systems, 15(2-3), 241–247. https://doi.org/10.1080/09720510.2012.10701623

Kruskal, W. H., & Wallis, W. A. (1952). Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47(260), 583–621.

Kutner, M., Nachtsheim, C., Neter, J., & Li, W. (2005). Applied linear statistical models (5th ed.). McGraw-Hill.

Lachenbruch, P. A., & Clements, P. J. (1991). ANOVA, Kruskal–Wallis, normal scores and unequal variance. Communications in Statistics - Theory and Methods, 20(1), 107–126.

Legendre, P., & Borcard, D. (2008). Statistical comparison of univariate tests of homogeneity of variances [Unpublished manuscript]. Département de sciences biologiques, Université de Montréal.

Lehmann, E. L. (2006). Nonparametrics: Statistical methods based on ranks. Springer.

Marcinko, T. (2014). Consequences of assumption violations regarding one-way ANOVA. Proceedings of The 8th International Days of Statistics and Economics, 116(47), 974–985.

Moder, K. (2007). How to keep the Type I error rate in ANOVA if variances are heteroscedastic. Austrian Journal of Statistics, 36(3), 179–188. https://doi.org/10.17713/ajs.v36i3.329

Moder, K. (2010). Alternatives to F-test in one way ANOVA in case of heterogeneity of variances (a simulation study). Psychological Test and Assessment Modeling, 52(4), 343–353.

Sahai, H., & Ageel, M. I. (2000). The analysis of variance: Fixed, random and mixed models. Springer.

Sawilowsky, S. S., Blair, R. C., & Higgins, J. J. (1989). An investigation of the Type I error and power properties of the rank transform procedure in factorial ANOVA. Journal of Educational and Behavioral Statistics, 14(3), 255–267. https://doi.org/10.3102/10769986014003255

Vale, C. D., & Maurelli, V. A. (1983). Simulating multivariate nonnormal distributions. Psychometrika, 48(3), 465–471. https://doi.org/10.1007/BF02293687