DOI: 10.20419/2024.33.600
Psihološka obzorja / Horizons of Psychology, 33, 201-223 (2024)
CC: 2340, 2227
© Avtorice / Authors, ISSN 2350-5141
UDK: 159.95.072-057.874
Znanstveni raziskovalnoempirični članek / Scientific empirical article

The first steps in developing a number sense screening test for young primary school children

Katja Depolli Steiner*, Cirila Peklaj, and Anja Podlesek
Department of Psychology, Faculty of Arts, University of Ljubljana, Slovenia

Abstract: Number sense refers to a set of numerical processing skills that develop before entering primary school and evolve with age and experience. Research has shown the importance of these skills for mathematical achievement. Therefore, early identification of students who have difficulty in numerical processing is the key to early intervention to reduce these deficits. This study's purpose was to design a group-administered pencil-and-paper instrument measuring numerical magnitude estimation and numerical magnitude comparison that can be used in the first three grades of primary school as a quick screening tool for number sense deficits. Three quick and easy-to-use tasks measuring non-symbolic and symbolic number sense (the number line estimation task, the area comparison task, the number comparison task) were developed and administered to a group of 316 students in the first three grades of Slovenian primary schools. The results show that these tests provide a good basis for further development of a screening test.
Keywords: number sense, numerical processing, symbolic and non-symbolic numerical magnitude, measurement, primary school students

Prvi koraki v razvoju presejalnega testa za merjenje občutka za števila pri mlajših osnovnošolcih

Katja Depolli Steiner*, Cirila Peklaj in Anja Podlesek
Oddelek za psihologijo, Filozofska fakulteta, Univerza v Ljubljani, Slovenija

Izvleček: Občutek za števila se nanaša na niz veščin numeričnega procesiranja, ki se razvije še pred vstopom v osnovno šolo in se nadalje razvija s starostjo in izkušnjami. Raziskave so potrdile pomembnost teh veščin za matematične dosežke učencev. Zgodnje odkrivanje učencev, ki imajo težave z numeričnim procesiranjem, omogoča zgodnje intervencije in je zato ključnega pomena za zmanjševanje primanjkljajev učencev na tem področju. Namen naše raziskave je bil razviti pripomoček tipa papir-svinčnik za merjenje veščine ocenjevanja in primerjave količin, zasnovan za skupinsko izvedbo v razredu, ki ga lahko uporabimo v prvih treh razredih osnovne šole kot hiter presejalni test za odkrivanje primanjkljajev v občutku za števila. Razvili smo tri naloge za merjenje nesimbolnega in simbolnega občutka za števila (test ocenjevanja na številskih daljicah, test primerjave površin, test primerjave števil); uporaba teh nalog je hitra in enostavna. Teste smo preizkusili na skupini 316 učencev prvih treh razredov slovenske osnovne šole. Rezultati so pokazali, da razviti testi predstavljajo dobro izhodišče za nadaljnji razvoj presejalnega testa.

Ključne besede: občutek za števila, numerično procesiranje, simbolna in nesimbolna količina, merjenje, osnovnošolci

*Naslov/Address: dr. Katja Depolli Steiner, Department of Psychology, Faculty of Arts, University of Ljubljana, Aškerčeva 2, 1000 Ljubljana, e-mail: katja.depolli-steiner@ffum-lj.si

Članek je licenciran pod pogoji Creative Commons Priznanje avtorstva-Deljenje pod enakimi pogoji 4.0 Mednarodna licenca (CC BY-SA licenca).
The article is licensed under a Creative Commons Attribution-Share Alike 4.0 International License (CC BY-SA license).

In today's increasingly complex and rapidly changing world, the acquisition of basic numerical skills and knowledge is essential for students' successful transition to further education levels and their success in everyday life, e.g., managing their monthly budget or planning significant purchases. However, the data indicate that many students struggle with mathematics in school. Studies of learning difficulties in mathematics show a prevalence of 4 to 6 or 7% among primary school students (Andersson & Ostergren, 2012; Kavkler, 2002).

In Slovenia, mathematics is one of the most important subjects in primary school. Students must attend at least 4 hours of mathematics per week from grade 1 to grade 9. Beginning in 3rd grade, their achievement is assessed on a 5-point scale from 1 (unsatisfactory) to 5 (excellent). In 2018, 2% of Slovenian 8th grade students received grade 1 and 16% received grade 2 in mathematics (Japelj Pavesic & Cankar, 2018), which means that their mathematical skills were very poor according to the national knowledge standards prescribed in the curriculum.

In Slovenia, a slightly adapted Response to Intervention (RTI) model addresses children with learning difficulties. The original RTI model (Fletcher & Vaughn, 2009) is designed for early identification and support of students with learning difficulties. It comprises three levels (tiers) that allow the intensity of support to be increased according to students' difficulties, is based on a continuous assessment of these difficulties, and involves teachers, school counselors, and parents. Teachers play an essential role in identifying students' difficulties and designing interventions; they can quickly identify and analyze students' difficulties, design an intervention plan, and evaluate it.
Teachers can help most students in the classroom by implementing good pedagogical practice (tier 1). For some students (15-20%), this help is not sufficient, and they need additional professional help from counselors (tier 2). However, for about 3-6% of the students, this additional help is still not sufficient, so they need even more intensive special educational and psychological help in the form of intensive individual interventions (tier 3; Fletcher & Vaughn, 2009). In contrast to the original three-tier model, the five-tier RTI model applied in Slovenia (Magajna et al., 2008) includes additional assessment and counseling by school counselors at tier 2, followed by group support by qualified teachers or school counselors (tier 3) and additional diagnostics in a specialized external institution (tier 4). At the last level (tier 5), the adapted RTI model also includes targeted individual interventions. The key to providing effective help is the early screening of all students who need it. For early identification of students at risk in the general population, we need instruments that enable a reliable and valid identification of students who have problems related to basic numerical skills or representations and help determine which problems are involved. Therefore, in this study, we focused on developing and validating instruments that could be used to identify problems with numerical skills in students in the first three grades of Slovenian primary school. Early identification of students having issues in learning mathematics is important, since early interventions can help prevent such students from lagging behind their peers (Duncan et al., 2007). Aunio and Rasanen (2016) suggested four main factors for developing core numerical skills: symbolic and non-symbolic number sense, understanding of mathematical relations, counting skills, and basic arithmetic skills. 
Since counting skills and basic arithmetic skills are addressed heavily in the Slovenian mathematics curriculum for Grades 1-3 and are monitored regularly by teachers, we focused on developing an instrument for assessing students' non-symbolic and symbolic number sense, which receive less attention in the educational process but are equally important components of mathematical skills.

Number sense is crucial to mathematical achievement (Andrews & Sayers, 2015; Aunio & Niemivirta, 2010; Aunola et al., 2004; Yang & Li, 2008). It refers to the individual's comprehensive understanding of numbers, operations, their relationships, and the ability to deal with situations in daily life where numbers play a role (Yang & Lin, 2015). These skills develop before entering primary school (Aunio & Rasanen, 2016). Longitudinal studies have shown that number sense measured in preschool at age 4 predicts mathematics achievement 2 years later (Mazzocco et al., 2011). Number sense measured in preschool also predicts mathematics achievement in 3rd grade (Jordan et al., 2009, 2010).

Siegler (2016) considers that the ability to distinguish between numerical magnitudes for different numbers (smaller and larger) is crucial for developing number sense and that the linearity of numerical magnitude representations is also essential. In his integrated theory of numerical development, the knowledge of numerical magnitude, which is reflected in the generation of increasingly precise magnitude representations for an increasingly broad range of numbers, is considered essential for success in the field of mathematics (Siegler, 2016). Magnitude representations first develop on the non-symbolic level (stage 1). The non-symbolic representations are then linked to the symbolic representations for whole numbers (stage 2), followed by an increase in the range of whole numbers whose size an individual can accurately represent (stage 3).
The development ends with the extension of these exact representations to rational numbers (Siegler, 2016). For the first three grades of the Slovenian primary school, which are attended by children aged 5-6 to 8-9 years, the first three developmental stages are of crucial importance.

The understanding of number magnitude is based on two inherent cognitive mechanisms. A quantity approximation system (approximate number system, ANS; Dehaene, 2011) is used to represent the approximate value of larger numbers, while a system for determining the exact number (object tracking system, OTS; Andersson & Ostergren, 2012; Feigenson et al., 2004) is used to represent small numbers from 1 to 4. The ANS allows us to represent the magnitude of numbers by analogy with a number line, on which values increase from left to right. At first, these values are represented logarithmically: children imagine small numbers as being further apart than large numbers. However, with increasing age and experience, performance on the number line becomes more linear (Praet & Desoete, 2014). The OTS develops very early, even before the age of 1 year (Piazza, 2010); we use it to represent a small number of objects as separate units through a 1-to-1 correspondence between each object and its mental representation (Andersson & Ostergren, 2012; Feigenson et al., 2004). This process is called subitization (Clements, 1999).

Researchers use various instruments to determine an individual's knowledge or representations of numerical quantity and comparisons between numbers. One of the most commonly used instruments to identify symbolic and non-symbolic representations of numerical quantities is the number line estimation task developed by Siegler and his colleagues (Siegler & Booth, 2004; Siegler & Opfer, 2003).
The participants are presented with a number line with 0 at one end and 10, 100, or 1000 at the other end, depending on the individual's age (Siegler, 2016). The participants are then shown a number and are asked to indicate where this number is located on the number line. In a complementary task, the participants are shown a position on a number line and are asked to estimate the corresponding number. Research has shown that representations of whole number magnitudes progress from a consistently logarithmic distribution through a mixture of logarithmic and linear to a primarily linear distribution (Siegler & Booth, 2004; Siegler & Opfer, 2003). Children's representations of small numbers from 0 to 10 pass through this transition between 3 and 6 years of age. This developmental sequence is later repeated for larger numbers, i.e., for numbers from 0 to 100 between 5 and 8 years, for numbers from 0 to 1,000 between 7 and 10 years, and for numbers from 0 to 10,000 between 9 and 12 years (Siegler, 2016). In students with mathematical difficulties or dyscalculia, linear estimation develops later (Geary et al., 2008; Landerl et al., 2009) and is less accurate than that of students without difficulties (Geary et al., 2008; Landerl, 2013). Studies of representations of number magnitude using number lines have shown that the accuracy of estimating number magnitude on a number line is related to mathematics achievement, i.e., to the accuracy of calculations at the age of 8 years 10 months and 9 years 10 months (LeFevre et al., 2013), to solving word problems at the age of 8 years (Gunderson et al., 2012), and to performance on a standardized math test for students aged 6 to 8 years (Sasanguie et al., 2013).
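
The logarithmic-to-linear shift described above is often examined by fitting both a linear and a logarithmic function to a child's estimates and comparing how well each fits. The following is a minimal sketch of that idea in Python, not the procedure of any cited study; the function names are hypothetical, and comparing the two models by their R² values is an assumption made for illustration.

```python
import math


def r_squared(xs, ys):
    """R^2 of the least-squares straight line predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


def best_fitting_model(targets, estimates):
    """Compare a linear and a logarithmic account of one child's estimates.

    targets:   the numbers presented (all must be greater than 0)
    estimates: the positions the child marked, on the same 0-100 scale
    Returns (label, r2_linear, r2_log). Illustrative only.
    """
    r2_linear = r_squared(targets, estimates)
    # A logarithmic representation is linear in log(target).
    r2_log = r_squared([math.log(t) for t in targets], estimates)
    label = "linear" if max(r2_linear, r2_log) == r2_linear else "logarithmic"
    return label, r2_linear, r2_log
```

A child whose marks are spread evenly along the line would be classified as "linear", while a child who spaces small numbers widely and compresses large ones would be classified as "logarithmic".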
For the comparison of non-symbolic magnitudes, the task most often used is that of two large squares in which smaller elements (in the form of different objects or signs) are drawn, and children are asked to estimate in which half there are more/fewer elements (Halberda & Feigenson, 2008; Laski & Siegler, 2007). The task difficulty is altered by varying the objects' ratio or the distance between the elements in the sequence. Tasks can be presented on paper or a computer screen. These different tasks are not standardized, and the accuracy of the estimates may also be affected by differences in the visual characteristics of the stimuli (e.g., brightness, distance, size, the area covered by the stimuli), which are difficult to control (De Smedt et al., 2013; Gebuis & Reynvoet, 2012). To avoid such problems, we decided to develop an instrument for non-symbolic magnitude comparison based on comparing area size rather than the numerosity of elements, since Lourenco and her colleagues (Lourenco & Bonny, 2017; Lourenco et al., 2012) found that non-symbolic cumulative area representations also predict mathematical achievement.

To assess an individual's comparison of symbolic magnitudes, researchers usually use tasks in which two numbers written in Arabic numerals are shown and participants are asked to judge which of the two is larger/smaller in magnitude. The numbers used in these tasks can be small, e.g., between 1 and 9 (De Smedt et al., 2009), or can be chosen from a broader range, e.g., between 1 and 100, if the participants are older students (Kolkman et al., 2013). The numbers can be displayed on a computer screen (Sasanguie et al., 2012) or read aloud by the experimenter if the task is administered to younger children (Laski & Siegler, 2007). Accuracy, response time, or the numerical distance/ratio effect can be recorded (De Smedt et al., 2013).
Studies of students from the normative population have shown that mathematical achievement is related to response accuracy (De Smedt et al., 2009; Kolkman et al., 2013), reaction time (De Smedt et al., 2009; Sasanguie et al., 2013), the adjusted score of response speed and accuracy (Sasanguie et al., 2012, 2013), the distance effect (De Smedt et al., 2009), and the ratio effect (Sasanguie et al., 2013). Research has also shown that students with dyscalculia aged 6 to 10 years attain poorer magnitude estimation results than the normative population. These poorer results are reflected in response accuracy (Rousselle & Noël, 2007) and reaction times (De Smedt & Gilmore, 2011; Landerl & Kolle, 2009; Landerl et al., 2004). Since the research results show a significant correlation between students' ability to compare number magnitudes and their further achievement in mathematics, we decided to include in our instrument a task of comparing the magnitude of two numbers.

The main purpose of our study was to develop a convenient instrument for screening students with problems in number sense development at the beginning of primary school. We focused on two components of number sense, i.e., numerical magnitude estimation and magnitude comparison, both on the non-symbolic and symbolic levels. Studies (e.g., Booth & Siegler, 2008; Geary, 2011; Schleepen et al., 2016; Xenidou-Dervou et al., 2017) have shown that these two skills are the key numerical competencies related to students' continued mathematics achievement. They have also shown that students with learning difficulties in mathematics and dyscalculia have difficulties in developing these skills (e.g., De Smedt & Gilmore, 2011; Landerl & Kolle, 2009; Landerl et al., 2004; Rousselle & Noël, 2007). Early identification of students who have difficulty in numerical processing is key to early intervention to reduce these deficits (e.g., Gersten et al., 2005).
We aimed to develop an instrument that can be used by school psychologists as a screening instrument. To make the use of such an instrument as convenient and cost-effective as possible, we decided to design a set of paper-and-pencil tasks that are short and can be applied in groups in the classroom. In this article we present the first steps in the development of this instrument.

Method

Participants

A total of 316 students (166 boys and 150 girls) from five Slovenian primary schools participated in the study. Of these students, 117 were in the first grade (62 boys), 115 in the second grade (63 boys), and 84 in the third grade (41 boys). Their average age was about seven years (M = 83.4 months, SD = 3.78), eight years (M = 95.8 months, SD = 4.20), and nine years (M = 107.3 months, SD = 3.75), respectively.

Procedure

We obtained parental consent for all students who participated in the study. The testing took place in school classrooms in small groups (approx. 10 to 15 students). Each group was tested in one session during regular school hours. The testing as a whole typically lasted 45 minutes, including material distribution and collection, and short breaks (as needed by specific groups of students). The testing was conducted by an experimenter (one of the researchers) and her assistant (a 5th-year psychology student). For all participating students, we also obtained teachers' assessment of their mathematical skills.

Figure 1. Sample Item for Set A of the Number Line Estimation Task

Figure 2. Sample Item for Set B of the Number Line Estimation Task

Figure 3. One of the Items in the Area Comparison Task, Ratio 11:10 (The Dark Gray Part Is Larger)

Instruments

The number sense test. The developed number sense test was composed of three tasks: the number line estimation, area comparison, and number comparison task.
In the Number line estimation task, two 10-item sets, i.e., the number-to-position (NP) and the position-to-number (PN) item set, were adapted from Siegler and Opfer (2003). Each item consisted of a 10 cm long line with the left end labeled "0" and the right end labeled "100". One randomly selected number from each interval of ten was used (0-10, 10-20, and so forth to 90-100). The number lines were printed in a scrambled arrangement on a landscape page in A3 format (five items on the left side of the page, five items on the right side of the page). The test pages were preceded by a separate page with two sample items to ensure that the students understood the task and knew the interval size. After the sample items were presented, the students went through the task at their own pace, with the time for the whole set limited to two minutes. A time limit was set to keep testing time within reasonable limits. In Set A, consisting of the NP items, the non-symbolic numerical magnitude estimation was measured by asking students to locate a given number on a 0-100 number line (see Figure 1). The experimenter's instructions were the following: "What I am going to ask you to do is to mark the position of some numbers on the number line. Our number line goes from 0 at the left end to 100 at the right end. Where would you put 5? Mark it on a line like this [making a vertical hatch mark]". Set B, consisting of the PN items, measured the symbolic numerical magnitude estimation. Students were asked to estimate the number corresponding to a marked position on a 0-100 number line (see Figure 2). The experimenter's instructions were: "Now I am going to ask you to decide which numbers are already marked on the number line. Our number line goes from 0 at the left end to 100 at the right end. What is this number [pointing to the hatch mark]? Write it here [pointing to the short line above the number line]." The content of the individual items is listed in Table A1 in the appendix.
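
The item selection rule for the number line sets (one randomly chosen number from each interval of ten, printed in a scrambled order) can be sketched as follows. This is only an illustration: the function name `sample_targets` is hypothetical, and excluding the interval boundaries (0, 10, 20, ..., 100) is an assumption, since the text does not say how boundary values were handled.

```python
import random


def sample_targets(seed=None, low=0, high=100, width=10):
    """Draw one target number from each interval of `width` on a low-high line.

    Assumption (not stated in the article): boundary values such as 0, 10, 20
    are excluded so that the intervals do not overlap.
    """
    rng = random.Random(seed)
    targets = []
    for start in range(low, high, width):
        # randint is inclusive on both ends: for start=0 this draws from 1..9.
        targets.append(rng.randint(start + 1, start + width - 1))
    # Scramble the order in which the items would appear on the page.
    rng.shuffle(targets)
    return targets
```

Calling `sample_targets(seed=1)` returns ten numbers, one per decade, in scrambled order; a fixed seed makes the item set reproducible.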
The Area comparison task measured the non-symbolic numerical magnitude comparison. As there was no suitable precedent in the literature, this task was created for the purposes of this study. The items were in the form of a square divided into two areas, one colored light gray and the other dark gray. Students had to estimate which of the two areas was larger (see Figure 3). The experimenter's instructions were the following: "Look at the square. It consists of two parts, one dark gray and one light gray. Which part is bigger? When you decide, I want you to put a checkmark in the box of the same color under the square". Three sets of items (sets A, B, and C) were designed; each set covered the ratios 6:5, 7:6, 8:7, 9:8, 10:9, and 11:10, with four items for each ratio, so that all possible combinations of color (light gray, dark gray) and position (left, right) of the larger area were included (see Table A2 in the appendix). Within each set, the items were randomly ordered and presented in three rows of three items each on eight A4 pages, starting with set A, followed by set B and set C. The test pages were preceded by a separate page with three sample items to ensure that the students understood the task. After the sample items were presented, the students went through the task at their own pace, and the total time was limited to five minutes. A time limit was set to keep testing time within reasonable limits.

The Number comparison task, which is modelled on one of the tasks in the Number Knowledge Test (McGraw-Hill Education, n.d.), measured the symbolic numerical magnitude comparison. It consisted of 24 increasingly difficult items (Figure 4) presented in three rows of two on four A4 pages.
The students had to decide quickly, without calculating, which of the two numbers in the lower corners of an equilateral triangle (i.e., the comparison numbers) was closer in magnitude to the number in the upper corner (i.e., the reference number). Comparison numbers were chosen so that their distances from the reference number differed either by 1 (in 12 items) or by 2 (in the other 12 items). For example, in the triangle with the reference number 3 where the two comparison numbers were 1 and 7, the distances (i.e., the absolute differences) between the reference and comparison numbers were 2 and 4, so the difference between the distances was 2. In items 1-6, the reference numbers had values between 3 and 8, and if the comparison with the other two numbers were done by calculation, the arithmetic would require no carrying over. In items 7-12, the reference numbers were larger (13-18) and no carrying over would be required when making comparisons. In items 13-18, the reference and comparison numbers had values up to 20 and one of the comparisons would require carrying over or borrowing. In items 19-24, the reference numbers were larger than 20 and carrying over or borrowing would be needed for one of the comparisons. Items are shown in Table A3 in the appendix. The experimenter's instructions were: "Now I am going to ask you to compare some numbers. Here is a triangle. First, look at the number in the upper corner. Now, look at the numbers in the two lower corners. Which one of the numbers in the lower corners is closer in magnitude to the number in the upper corner? Do not make any calculations; choose the number you think is right and circle it." To prevent students from calculating, the experimenter guided the pace by reading the numbers in the upper corners at intervals of three seconds. Students were also given instructions on what to do if they got lost due to temporary inattention: "If you miss a triangle, just wait for the next one."
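
The item logic described above (two distances from the reference number, and the difference between those distances) can be made concrete with a short sketch. The function names are hypothetical and serve only to reproduce the worked example from the text (reference 3, comparison numbers 1 and 7).

```python
def distance_difference(reference, first, second):
    """Difference between the two comparison numbers' distances to the reference."""
    d_first = abs(reference - first)
    d_second = abs(reference - second)
    return abs(d_first - d_second)


def closer_number(reference, first, second):
    """Return the comparison number that is closer in magnitude to the reference."""
    if abs(reference - first) == abs(reference - second):
        # The described items always differ by 1 or 2, so ties should not occur.
        raise ValueError("comparison numbers are equally distant from the reference")
    return min((first, second), key=lambda n: abs(reference - n))


# Worked example from the text: distances are 2 and 4, so the difference is 2.
print(distance_difference(3, 1, 7))  # 2
print(closer_number(3, 1, 7))        # 1
```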
The test pages were preceded by a separate page with three sample items and a practice page with six items to ensure that all students understood both the task and the guided solving. At the end of each test page, the experimenter stopped and made sure that all students had managed to turn the page and were ready before continuing with the task.

Figure 4. Sample Item for the Number Comparison Task

Other measures. For the validation of our newly designed instrument, we used a scale for teachers' assessment of students' numerical skills and a test of writing two- and three-digit numbers. The students' numerical skills scale consists of 11 items assessing the following skills of the students on a 7-point scale (from 1 = very poor to 7 = excellent): (1) speed in solving tasks, (2) correctness of answers, (3) speed in retrieving mathematical facts (e.g., addition/subtraction up to 10, times-tables), (4) use of adequate mathematical procedures (e.g., addition/subtraction, subtraction of two-digit numbers with borrowing), (5) use of age-appropriate calculation strategies (e.g., counting with fingers), (6) autonomy in solving mathematical problems, (7) persistence in solving mathematical problems, (8) interest in mathematics, (9) mathematical knowledge, (10) reading skill, and (11) writing skill. Most items relate to core numerical skills, except for the last two items, which relate to reading and writing; these skills are also crucial for solving mathematical tasks. All 11 items were included in a principal component analysis. A scree plot indicated a one-component solution. The first component explained 79.45% of the total variance. For each student, the teacher's answers to the 11 items were averaged to obtain the scale score. The scale showed strong internal consistency (α = .97). In the writing numbers test, the experimenter read numbers aloud and students wrote them down on a response sheet. Part 1 comprised 10 two-digit numbers and part 2 comprised 10 three-digit numbers.
Students in the second and third grades completed both parts, while students in the first grade completed only part 1, as they were not yet familiar with three-digit numbers. For each student, the correct answers in each part were counted to obtain the test score. Both parts showed strong internal consistency (α = .92 for part 1 and .97 for part 2).

Results

For each item in the number line estimation tasks, the absolute error (the deviation from the correct answer) was used as the student's score. The mean scores for each item, item-total correlations, and Cronbach's alpha if the item was deleted are shown in the appendix (Table A1). The student's score for the task was calculated as the mean absolute error across all items included; a higher value indicates poorer performance. For the area comparison task and the number comparison task, all test items were scored as 1 for a correct answer and 0 for an incorrect answer or an unanswered item. The percentage of students who answered each item correctly, item-total correlations, and Cronbach's alpha if the item was deleted are shown in the appendix (Tables A2 and A3). The student's score for the task was calculated as the sum of the correct answers across all items included in the task; a higher value indicates better performance.

Number line estimation tasks. Overall, both parts of the number line estimation task were easy for the third grade students, reasonably easy for the second grade students, but quite difficult for the first grade students. In the NP task, the mean absolute error was 7.17 in the third grade, 10.10 in the second grade, and 17.93 in the first grade. In the PN task, the mean absolute errors were 6.71, 9.36, and 19.35, respectively. For all items, the average absolute error decreased from the first to the third grade, which shows that the students' performance improved with grade.
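
The two scoring rules set out at the start of the Results section can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and it assumes that every number line item has a response, because the text does not say how unanswered number line items were treated.

```python
def number_line_score(targets, responses):
    """Mean absolute error across items; a higher value means poorer performance.

    Assumes every item was answered (handling of missing responses is not
    described in the article).
    """
    errors = [abs(response - target) for target, response in zip(targets, responses)]
    return sum(errors) / len(errors)


def comparison_score(answer_key, responses):
    """Number of correct answers; unanswered items (None) are scored as 0."""
    return sum(
        1
        for key, response in zip(answer_key, responses)
        if response is not None and response == key
    )


# Two number line items: errors of 5 and 10 give a mean absolute error of 7.5.
print(number_line_score([10, 50], [15, 40]))  # 7.5
# Three comparison items: one correct, one skipped, one wrong.
print(comparison_score(["dark", "light", "dark"], ["dark", None, "light"]))  # 1
```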
In the NP task, the average absolute error for the individual items ranged from 12.66 to 3.48 in the third grade, 15.25 to 5.39 in the second grade, and 13.23 to 2.42 in the first grade. In the PN task, the average absolute error for the individual items ranged from 10.90 to 0.83 in the third grade, 24.88 to 11.73 in the second grade, and 29.44 to 6.63 in the first grade. A closer examination of the average absolute errors for individual items showed that the two items with numbers closest to 0 (numbers 3 and 6) were the easiest in both tasks. The PN item with the number 98 was also answered with high accuracy. Only one NP item (number 86) and two PN items (numbers 74 and 67) had an average absolute error of more than 10 in all three grades. These results suggest that estimates close to the reference points 0 and 100 are easier to make than those further away from such points. Both parts of the number line estimation task showed acceptable internal consistency in all three grades (α = .79, .81, and .68 for the NP task and .81, .80, and .73 for the PN task). Detailed psychometric information for individual items of both number line estimation tasks is listed in Table A1 in the appendix. In both the NP and PN tasks, all items were retained.

Area comparison task. During this task, the experimenters noticed that students' attention and motivation quickly decreased. In some cases, students did not reach the last item because of the time limit. After testing, it was therefore decided to reduce the length of the test from three to two sets of items for analysis and future use. The 48 items with the highest accuracy were chosen from the three original sets, and 24 items were removed. The items from sets A and B were preferred because they were reached and solved (either correctly or incorrectly) by a larger number of participants. Therefore, only four items from set C were retained, as a comparable substitute for three items from set A and one item from set B.
The four selected items from set C appeared to perform better than the replaced items in sets A and B. The analysis of the retained items is shown in Table A2 in the appendix. The final version of the task showed acceptable internal consistency in all three grades (α = .80, .72, and .68). Overall, the task was relatively easy: in the third grade, 37 retained items were answered correctly by more than 70% of the students (33 items in the second grade, 31 items in the first grade), 11 items were answered correctly by 50 to 70% (14 items in the second grade, 12 items in the first grade), and none by less than 50% (one item in the second grade, five items in the first grade). A closer examination of response accuracy showed that it was highest in items with the 6:5 ratio (82.9, 85.8, and 85.1%) and decreased as the two areas became more similar in size (i.e., toward the 11:10 ratio). Based on these results we estimate that for children in the first three grades of primary school, the just noticeable difference in surface area is roughly at the ratio of 9:8 (at this ratio the probability of a correct response exceeded the chance level by approx. 50%).

Number comparison task. This task was relatively easy, with 19 of the 24 items answered correctly by more than 70% of the third grade students (16 items in the second grade, six items in the first grade), and another two items answered correctly by 50 to 70% of the third grade students (five items in the second grade, 12 items in the first grade). Only three items, items 19 to 21, proved to be hard, as they were answered correctly by only approx. one-third of the students. All 18 items with numbers under 20 were answered correctly by more than 50% of the students in the second and third grades. The easiest items were those involving numbers under 10. The most difficult were those involving larger numbers (e.g., Item 21, see Table A3 in the appendix).
Overall, the items with a difference of 2 between the distances of the comparison numbers to the reference number were easier than those with a difference of 1. All items in this task were retained. The task showed acceptable internal consistency in all three grades (α = .63, .62, and .77).

Differences between grades. The next step was to compare the performance of students from different grades. Four one-way independent ANOVAs were conducted to compare group mean scores on the two number line estimation tasks (NP and PN), the area comparison task, and the number comparison task. Table 1 shows descriptive statistics for the overall sample and separately for the first, second, and third grade students. The ANOVA results are listed in the last column of the table. The differences in mean scores were statistically significant in all tasks, with medium to large effect sizes. The mean scores of the three groups consistently showed that students' performance improved with grade. The largest increase in mean scores from the first to the third grade, with effect sizes (