Dallas G., Kirialani P. JUDGES' EVALUATION OF ROUTINES IN
Vol. 2 Issue 2: 49-57
JUDGES' EVALUATION OF ROUTINES IN MEN ARTISTIC
GYMNASTICS
George Dallas1, Paschalis Kirialanis2
1Kapostriakon University of Athens, Department of Physical Education and Sport Science, Greece
2Dimokrition University of Thrake, Department of Physical Education and Sport Science, Greece
Original research article
Abstract
For competition judging, the practice of assigning gymnastics judges into one of two groups (D-Jury and E-Jury) is internationally accepted. International judges (the highest level) are placed in the D-Jury and national judges are allocated to the E-Jury. Performance evaluations are the jurisdiction of the E-Judges, who record the deductions in the exercises, determining the final score of the exercise. The purpose of this study was to examine whether there were significant differences between D-Jury and E-Jury judges (international vs. national), based on their evaluations of gymnastics performances, allowing for an assessment of the necessity to split judges into these two groups. Twenty experienced judges, who volunteered to participate in the study, were divided into two groups (National vs International). The judges evaluated, via videotape, nine gymnastics routines performed on the rings. Points were deducted (in tenths of units) based on the severity of errors in the routines. The effect of judge level approached significance, and significant differences were found across the nine separate programs. The observed differences raise questions concerning the existing placement system of judges (international vs. national) in Greece.
Keywords: artistic gymnastics, judges, evaluation, level of judging.
INTRODUCTION
In various events in artistic gymnastics (floor exercises, side horse, rings, etc.), it is at the level of the judges' knowledge and experience that a "winner" is decided. To make that decision, the judges engage in extensive processing of information about the movement patterns they observe (Ste-Marie, 1999). To this end, they record the difficulty values of the elements that are performed (according to the Code of Points that is valid for every Olympic cycle), the connections of these elements (D-Jury) and the technical aspects of these elements (performance, composition) (FIG, 2009). In international competitions, all members of the Juries (D- and E-Panels, Assistants and Secretaries) must possess exact, applicable and thorough knowledge of the F.I.G. Code of Points for men and the F.I.G. rules for judges. They must have successfully participated in an international or intercontinental judges' course and possess the corresponding FIG category. Prior to the competition, they participate in the Judges' Review Session (instruction) and the final draw of the judges to their functions.
Literature in this area states that the cognitive and perceptual differences that exist between expert and novice athletes can also be applied to judges, because they can also be classed as "performers", since they evaluate gymnasts' performances (Abernethy and Russell, 1984; Allard,
Graham, and Paarsalu, 1980; Allard and Starkes, 1980; Bard and Fleury, 1981). Perceptual differences relate to the elements of the display that are selectively attended to (Allard and Starkes, 1980), the way and the speed at which the visual display is searched (Bard and Fleury, 1981) and how quickly the important information is extracted from the visual display prior to movement (Abernethy and Russell, 1984). In contrast, the cognitive differences between expert and novice athletes refer to the interpretation and organisation of the skill-related information in memory, so as to facilitate superior recall of that knowledge (Allard, Graham, and Paarsalu, 1980).
Previous studies state that expert judges (more than 10 years' experience) are superior to novice judges (up to 3 years' experience) because they are more effective at interpreting biomechanical information available from the gymnast's body (Abernethy, 1997), they have greater breadth and depth of knowledge (Ste-Marie, 1999) and they can focus on different areas of the body better than novice judges (Bard et al., 1980). In addition, expert judges are more accurate when recognising form errors (correct body positions) than novice judges (Ste-Marie and Lee, 1991). This is because they are better able to predict which elements will follow during the performance of one or more combinations of elements (Ste-Marie and Lee, 1991) and can better keep up with the speed of performances on the various apparatus (Salmela, 1978).
A gymnast's final score is calculated as follows: D-Score (from the D-Jury) + E-Score (from the E-Jury) = final score for each apparatus.
The D-Score is concerned with difficulty, element groups and connection values, while the E-Score is concerned with execution and composition. The E-Score is calculated by averaging the middle two of four (or four of six) scores (deductions).
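The scoring rule described above can be illustrated with a minimal sketch. It assumes that the highest and lowest E-Jury deductions are discarded before averaging, and that the E-Score is obtained by subtracting the averaged deduction from a base value of 10.0; the base value, the function names and the example panel are illustrative assumptions that are not stated in the text.

```python
def mean_middle_deductions(deductions):
    """Average the E-Jury deductions after dropping the highest and the lowest."""
    if len(deductions) not in (4, 6):
        raise ValueError("expected four or six E-Jury deductions")
    middle = sorted(deductions)[1:-1]  # middle two of four, or middle four of six
    return sum(middle) / len(middle)


def final_score(d_score, deductions, base=10.0):
    """Final score = D-Score + E-Score.

    ASSUMPTION: the E-Score is `base` minus the averaged deduction;
    base=10.0 is not given in the article and is used for illustration only.
    """
    e_score = base - mean_middle_deductions(deductions)
    return d_score + e_score


# Hypothetical panel of four E-Jury deductions (in points) and a D-Score of 5.8.
panel = [1.2, 1.5, 1.4, 1.9]
print(round(mean_middle_deductions(panel), 2))
print(round(final_score(5.8, panel), 2))
```

Dropping the extreme values before averaging limits the influence of a single unusually strict or lenient judge on the final score.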
Internationally and nationally, the level of the athlete's performance is evaluated by the judges and there is a common agreement about the final score that the gymnast receives. However, it is often unclear whether the final sum of deductions comes from the same number and kind of faults that receive deductions (small, medium, large, very large).
In Greece, judges are divided into three categories (novice, national and international). International judges have successfully participated in an international or intercontinental judges course. National and novice judges have only participated in national judges' courses. For them, the results of the examination of these courses serve as the main criteria for further categorisation (i.e. from novice to national, from national to international). However, it is the opinion of specialists that experience is of greater value than judging courses.
Although there are no differences in the total number of deductions (sum of deductions) that judges give whilst evaluating athletes' routines, it is unclear whether the sum of deductions comes from the same number of faults or the same technical error. This is even more evident in routines of lower technical level than in routines executed by elite athletes. It is therefore questionable whether differences in scores between experienced judges result from the judges' different category (national, international); whether differences in the final score result from the same technical faults in the same elements; or if they have come from different elements. It is possible that result accuracy would improve in national competitions if international level judges also judged in the E-Jury, allowing for more accurate and objective evaluations. The purpose of this study was to examine if there are significant differences between national and international judges in: a) the total amount of deductions in all the routines performed, b) the total amount of deductions in each routine, c) the deductions for every element separately and d) the deductions between competition and video evaluation.
METHODS
Participants
Twenty experienced national and international judges from the Hellenic Federation Gymnastics volunteered to participate in the study. They were divided into two groups: a) international judges (n=8), who had 14.47 ± 4.35 years of judging experience and had judged 80.43 ± 28.43 competitions, and b) national judges (n=12), who had 6.25 ± 1.55 years of judging experience and had judged 18.50 ± 6.54 competitions. The differences between the groups in these two parameters (years of judging and number of competitions) were statistically significant (p < .05).
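The group comparison reported above can be checked from the summary statistics alone. The sketch below assumes a pooled-variance (Student's) independent-samples t-test; the text does not state which test was applied to the participant characteristics, so both the choice of test and the function name are assumptions.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic for two independent groups, from means, SDs and group sizes."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df


# International (n=8) vs national (n=12) judges, values as reported in the text.
t_years, df_years = pooled_t(14.47, 4.35, 8, 6.25, 1.55, 12)
t_comps, df_comps = pooled_t(80.43, 28.43, 8, 18.50, 6.54, 12)

print(f"years of judging:    t({df_years}) = {t_years:.2f}")
print(f"competitions judged: t({df_comps}) = {t_comps:.2f}")
```

With 18 degrees of freedom the two-tailed critical value at the .05 level is about 2.10, so a t statistic near 6 for years of judging is consistent with the reported p < .05. Because the standard deviations of the two groups differ considerably, a Welch correction might be a more cautious choice.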
Instruments
Competition routines were recorded using a video camera (JVC GR-Ax2) during an international meeting of artistic gymnastics. The video camera was placed so that its optical axis was perpendicular to the transverse axis of the performance of the routines on the rings. The distance of the camera from the nearest ring was 3.00 ± 0.20m and the camera's height from the floor was 1.00 ± 0.12m. This placement of the camera is identical to the corresponding position of the judges (E-jury) who evaluate the technical execution according to the Code of Points.
Procedure
To evaluate the gymnastics routines, the judges watched them via a video link on a monitor. Judges sat one meter from the monitor. Judges independently evaluated the same nine rings routines; each routine contained ten elements, resulting in a total of ninety elements. The total deduction for a routine was the sum of the deductions for each of its elements. After the end of each performance element, a black screen appeared for 5 seconds on the monitor, allowing the judges enough time to record the deductions on a record sheet and to prepare for the next performance.
Two expert international judges also evaluated all routines to provide a more objective evaluation and reference point (gold standard) for comparison. The evaluated routines in the present study represented a broad range of technical gymnastics abilities, thus providing routines with many errors, as well as routines with few errors. The dependent variable, which was the score of each gymnast in the nine routines (as well as in each routine separately), was used for statistical analysis (Student's t-test). Statistical significance was set at the 0.05 level.
RESULTS
The scores of the National and International judges across the 9 separate programs and 10 separate exercises are presented in tables 1 and 2 respectively.
Multivariate Analysis of Variance (MANOVA) was used to examine the differences between National and International judges in the deductions across the separate exercises. The multivariate and univariate post hoc results were not significant (Wilks' Λ = .926, F = 1.349, p = .208, η² = .074), indicating that the two groups of national and international judges did not differ significantly when evaluating the deductions across the 10 separate exercises. The overall univariate post hoc findings are presented in table 3.
We examined the interaction between judge level (International vs National) and programs (9 separate programs), with respect to the judges' evaluation score. The interaction effect of the 2 × 9 independent groups ANOVA was not significant (F = .588, p = .786, η² = .028). Accordingly, we examined the main effects for judge level and programs. For the judge level effect, the results approached significance (F = 3.881, p = .051, η² = .023), and significant differences were found across the 9 separate programs (F = 11.633, p < .001, η² = .365). The post hoc LSD test was used to detect the sources of significance across the 9 separate programs. The overall findings are presented in table 4.
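The multivariate statistics reported above can be cross-checked with a short sketch. It assumes that the first reported value (.926) is Wilks' lambda, that the effect size is computed as 1 minus lambda (valid for a two-group comparison), and that each exercise was analysed over 180 observations, as the degrees of freedom in table 3 suggest (1 + 178 + 1). The function name is hypothetical.

```python
def wilks_two_group(wilks_lambda, n_total, n_dvs):
    """Effect size and exact F for a two-group MANOVA, derived from Wilks' lambda.

    For two groups: eta^2 = 1 - lambda, and
    F = ((1 - lambda) / lambda) * ((N - p - 1) / p) with df = (p, N - p - 1),
    where p is the number of dependent variables and N the number of observations.
    """
    eta_sq = 1.0 - wilks_lambda
    df_hyp = n_dvs
    df_err = n_total - n_dvs - 1
    f_value = (eta_sq / wilks_lambda) * (df_err / df_hyp)
    return eta_sq, f_value, df_hyp, df_err


# ASSUMPTION: N = 180 observations and 10 dependent variables (the 10 exercises).
eta_sq, f_value, df_hyp, df_err = wilks_two_group(0.926, n_total=180, n_dvs=10)
print(f"eta^2 = {eta_sq:.3f}, F({df_hyp}, {df_err}) = {f_value:.2f}")
```

Under these assumptions the sketch gives an effect size of .074 and an F value of about 1.35, which agrees with the reported figures to within the rounding of lambda.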
Table 1. Mean scores (deductions) of judges, across separate programs.
Variable	M	SD	N
Program 1			
National	178.33	57.18	12
International	183.75	65.88	8
Program 2			
National	266.67	72.15	12
International	237.50	69.02	8
Program 3			
National	127.50	61.07	12
International	118.75	64.01	8
Program 4			
National	166.67	78.66	12
International	143.75	79.45	8
Program 5			
National	189.17	62.88	12
International	131.25	57.18	8
Program 6			
National	135.00	48.71	12
International	108.75	24.74	8
Program 7			
National	156.67	41.63	12
International	140.00	27.25	8
Program 8			
National	92.50	41.14	12
International	102.50	62.28	8
Program 9			
National	132.50	45.15	12
International	122.50	46.83	8
Table 2. Mean scores (deductions) of judges, across separate exercises.
Variable	M	SD	N
Exercise 1			
National	1.62	1.44	108
International	1.36	1.20	72
Exercise 2			
National	2.07	1.14	108
International	1.90	1.22	72
Exercise 3			
National	1.48	1.27	108
International	1.55	1.34	72
Exercise 4			
National	2.75	1.98	108
International	2.29	1.75	72
Exercise 5			
National	1.38	1.16	108
International	1.33	1.09	72
Exercise 6			
National	1.44	1.31	108
International	1.14	1.06	72
Exercise 7			
National	1.64	1.14	108
International	1.33	0.98	72
Exercise 8			
National	1.15	1.12	108
International	1.35	1.43	72
Exercise 9			
National	0.68	0.99	108
International	0.79	1.10	72
Exercise 10			
National	1.79	1.97	108
International	1.26	1.92	72
Table 3. ANOVA table, examining the differences between international vs national level judges, across the 10 separate exercises.
Effect	SS	df	MS	F	p	η²
Exercise 1						
BG	2.904	1	2.904	1.595	.208	.009
WG	324.046	178	1.820			
Exercise 2						
BG	1.268	1	1.268	.918	.339	.005
WG	245.727	178	1.380			
Exercise 3						
BG	.237	1	.237	.140	.708	.001
WG	300.741	178	1.690			
Exercise 4						
BG	9.445	1	9.445	2.649	.105	.015
WG	634.616	178	3.565			
Exercise 5						
BG	.093	1	.093	.072	.788	.000
WG	227.435	178	1.278			
Exercise 6						
BG	4.033	1	4.033	2.727	.100	.015
WG	263.278	178	1.479			
Exercise 7						
BG	4.033	1	4.033	3.470	.064	.019
WG	206.917	178	1.162			
Exercise 8						
BG	1.712	1	1.712	1.081	.300	.006
WG	281.949	178	1.584			
Exercise 9						
BG	.490	1	.490	.471	.493	.003
WG	185.171	178	1.040			
Exercise 10						
BG	12.245	1	12.245	3.208	.075	.018
WG	679.505	178	3.817			
BG: between groups; WG: within groups.
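The F ratios and effect sizes in table 3 follow directly from the sums of squares and degrees of freedom. A minimal sketch, assuming the effect size is computed as SS between divided by the total SS (the function name is hypothetical, and the three rows are copied from table 3):

```python
def anova_row(ss_between, ss_within, df_between=1, df_within=178):
    """Return (F, eta squared) for a one-way ANOVA from its sums of squares."""
    ms_between = ss_between / df_between  # mean square between groups
    ms_within = ss_within / df_within     # mean square within groups
    f_value = ms_between / ms_within
    eta_sq = ss_between / (ss_between + ss_within)
    return f_value, eta_sq


# (SS between, SS within) for three of the exercises in table 3.
table3_rows = {
    "Exercise 1": (2.904, 324.046),
    "Exercise 4": (9.445, 634.616),
    "Exercise 10": (12.245, 679.505),
}

for label, (ss_bg, ss_wg) in table3_rows.items():
    f_value, eta_sq = anova_row(ss_bg, ss_wg)
    print(f"{label}: F = {f_value:.3f}, eta^2 = {eta_sq:.3f}")
```

For these three rows the recomputed values match the tabulated F and effect size to three decimal places.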
Table 4. Post hoc LSD test examining significance across the 9 separate programs.
Mean difference (row program minus column program)
Program	2	3	4	5	6	7	8	9
1	-74.50*	56.50*	23.00	14.50	56.00*	30.50	84.00*	52.00*
2		131.00*	97.50*	89.00*	130.50*	105.00*	158.50*	126.50*
3			-33.50	-42.00*	-0.50	-26.00	27.50	-4.50
4				-8.50	33.00	7.50	61.00*	29.00
5					41.50*	16.00	69.50*	37.50*
6						-25.50	28.00	-4.00
7							53.50*	21.50
8								-32.00
*: p < .05
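The mean differences in table 4 can be reproduced from the group means in table 1. The sketch below assumes that each program mean is the average over all 20 judges, that is, the group means weighted by group size (12 national, 8 international), and that each cell is the row program minus the column program. The function names are hypothetical; the significance flags are not recomputed, since they require the raw scores.

```python
N_NATIONAL, N_INTERNATIONAL = 12, 8

# (national mean, international mean) for each program, copied from table 1.
table1_means = {
    1: (178.33, 183.75), 2: (266.67, 237.50), 3: (127.50, 118.75),
    4: (166.67, 143.75), 5: (189.17, 131.25), 6: (135.00, 108.75),
    7: (156.67, 140.00), 8: (92.50, 102.50), 9: (132.50, 122.50),
}


def overall_mean(national, international):
    """Mean over all judges: the two group means weighted by group size."""
    total = N_NATIONAL * national + N_INTERNATIONAL * international
    return total / (N_NATIONAL + N_INTERNATIONAL)


program_means = {p: overall_mean(*means) for p, means in table1_means.items()}


def mean_difference(row, col):
    """Difference between two program means (row program minus column program)."""
    return program_means[row] - program_means[col]


# Print the upper triangle of pairwise differences, laid out as in table 4.
for row in range(1, 9):
    cells = [f"{mean_difference(row, col):8.2f}" for col in range(row + 1, 10)]
    print(row, " ".join(cells))
```

Under these assumptions the recomputed differences agree with the tabulated values to two decimals, for example -74.50 for programs 1 and 2.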
DISCUSSION
It should be noted that the interaction effect between judge category and programs in the present study, with respect to the evaluation score, was not significant. Although international judges have considerably more years of judging experience and have judged more competitions, the experience of national judges provides sufficient knowledge to identify errors in gymnastics routines. However, these results should be interpreted with caution, because the sums of deductions of these two categories of judges did not come from the same errors or the same severity (degree of error). This means that judges in these two categories may differ in declarative knowledge (Ste-Marie, 1999), meaning they "record" errors in a different way. An attempt has clearly been made internationally to minimise subjectivity in the judging process, and although judges aim to evaluate objectively, the judging procedure is based on the judges' own perceptions of what constitutes the 'perfect performance'.
Based on the findings from this study, judges assigned different deductions for errors in separate programs. These findings are similar to findings from previous studies in this area (Ste-Marie, 1999; Ste-Marie and Lee, 1991). These statistically significant differences in the deductions in each element are possibly a result of insufficient comparison between the technique of the element as performed and the perfect technique. Bard et al. (1980) reported that novice and experienced judges focus their attention on "different areas" of the athlete's body, in agreement with the results of Tenenbaum and his colleagues (1996), who also reported that judges gain experience and become better at their work with every competition.
According to Ste-Marie (1999) and Thomas (1994), the amount of declarative and procedural knowledge differs between national and international judges. Knowledge of factual information based on specific rules, and of decisions about a movement (exercise), possibly differs between national and international judges. The present findings conflict with Ste-Marie and Thomas, since no significant differences were evident between national and international level judges. The results, however, approached significance, and a replication study may be necessary in the future to confirm the present findings. Further, the nine separate programs were associated with athletes' high-speed performance, a factor that according to Salmela (1978) is associated with judging
errors and is a decisive factor for the accuracy of evaluation. We suggest with caution that the judges' previous athletic experience (whether or not they have been gymnasts themselves) may also affect their evaluations. It could also be argued that the complexity of the "multi-joint" system of the human body affects the judges' evaluation.
Finally, the differences that were revealed between real marks given during competition and those that were given through video evaluation agree with the theory of Puhl (1980), who stated that isolated presentation of the elements through video gives the judges the possibility to evaluate with more precision. It is possible that the speed of performance and the different connections of the elements during competition also influence the accuracy of the evaluation. This is supported by the results of previous studies (Salmela, 1978).
The results of the present study agree with other findings (Abernethy, 1997; Ste-Marie, 1999) which indicate that experienced judges better interpret the biomechanical information coming from the athlete's body and need to focus their attention less on the performance, allowing them to concentrate more on the analysis of the element. Additionally, as already observed in the present research, the differences in the deductions in isolated routines indicate a difference in the capacity for "anticipation through perception" between national and international judges. This is in agreement with findings of previous research (Ste-Marie and Lee, 1991; Tenenbaum et al., 1996). In conclusion, we can say that the accuracy of judging of national and international judges is satisfactory, based on the very small percentage of statistically significant differences in the total amount of deductions in all the routines.
Though all judges that participated in this study have sufficiently long practical experience, there are some differences in the evaluation between national and international judges. These differences probably result from different opinions and knowledge about the performance of the elements, from personal experience, or from differing ability to recognise the nature of mistakes. To eliminate these differences, the presence of international judges in the E-panel is recommended. Special judges' courses that present and analyse the mistakes in the performance of the elements and the resulting deductions will contribute to a fair result.
REFERENCES
Abernethy, B. (1997) Movement expertise: A juncture between psychology, theory, and practice. Paper presented at the meeting of the Association for the Advancement of Applied Sport Psychology, San Diego, CA, June.
Abernethy, B. and Russell, D.G. (1984) Advance cue utilization by skilled cricket batsmen. The Australian Journal of Science and Medicine in Sport, 16, 2, 2-10.
Allard, F., Graham, S. and Paarsalu, M.E. (1980) Perception in sport: Basketball. Journal of Sport Psychology, 2, 14-21.
Allard, F. and Starkes, J.L. (1980) Perception in sport: Volleyball. Journal of Sport Psychology, 2, 22-33.
Bard, C. and Fleury, M. (1981) Considering eye movement as a prediction of attainment. In I.M. Cockerill and W.W. MacGillivray (eds.), Vision and Sport (pp. 28-41). Cheltenham, UK: Stanley Thornes Publishers.
Bard, C., Fleury, M., Carriere, L. and Halle, M. (1980) Analysis of gymnastic judges' visual search. Research Quarterly for Exercise and Sport, 51, 267-273.
Fédération Internationale de Gymnastique (2009) The Code of Points. Lucerne, Switzerland: Raeber.
Puhl, J. (1980) Use of video replay in judging gymnastics' vaults. Perceptual and Motor Skills, 51, 51-54.
Salmela, J.H. (1978) Gymnastic judging: A complex information processing task, or (Who's putting one over on who?). International Gymnast, 20, 54-57, 62-63.
Ste-Marie, D.M. and Lee, T.D. (1991) Prior processing effects on gymnastic judging, Journal of Experimental Psychology, 17, (1), 126-136.
Ste-Marie, D.M. (1999) Expert-Novice Differences in Gymnastic Judging: An Information-processing Perspective. Applied Cognitive Psychology, 13: 269-281.
Tenenbaum, G., Levy-Kolker, N., Sade, S., Lieberman, D.G. and Lidor, R. (1996) Anticipation and confidence of decisions related to skill performance. International Journal of Sport Psychology, 27, 293-307.
Thomas, K.T. (1994) The development of sport expertise: From Leeds to MVP legend. Quest, 46, 199-210.