ITADICT Project and Japanese Language Learning

Marcella MARIOTTI
Ca' Foscari University of Venice
mariotti@unive.it

Alessandro MANTELLI
Ca' Foscari University of Venice
mantrex@gmail.com

Abstract
This article aims to show how the nuclear disaster in Fukushima (11 March 2011) affected Japanese language teaching and learning in Italy, focusing on the ITADICT Project (Marcella Mariotti, project leader; Clemente Beghi, research fellow; and Alessandro Mantelli, programmer). The project intends to develop the first Japanese-Italian online database and involves more than 60 students of Japanese interested in lexicographic research and in online learning strategies and tools. A secondary undertaking of ITADICT is the Latin-alphabet transliteration of Japanese words using the Hepburn system of romanization. ITADICT is inspired by the EDICT Japanese-English database developed by the Electronic Dictionary Research and Development Group, established in 2000 within the Faculty of Information Technology at Monash University. The Japanese-Italian database is evolving within the Department of Asian and North African Studies at Ca' Foscari University of Venice, the largest in the country and one of the main teaching centres of Japanese in Europe in terms of the number of students (more than 1800) and the number of Japanese language teaching hours (1002 h at B.A. level and 387 h at M.A. level). In this paper we describe how and why the project has been carried out and what the expectations are for its future development.

Keywords
ITADICT; Japanese-Italian database; lexicography; Japanese language; online database; collaborative editing; Japanese language learning

Izvleček
This article presents the ITADICT project (project leader Marcella Mariotti, research fellow Clemente Beghi, programmer Alessandro Mantelli) and the impact of the nuclear disaster in Fukushima of 11 March 2011 on the learning of Japanese in Italy. The aim of the project is to develop the first online Japanese-Italian database; it involves more than 60 students of Japanese who are interested in lexicography and in online learning strategies and tools. A second aim of the ITADICT project is the transliteration of Japanese words into the Latin alphabet according to the Hepburn system. The project is modelled on the Japanese-English database EDICT, developed by the Electronic Dictionary Research and Development Group, which was established in 2000 at the Faculty of Information Technology at Monash University. The Japanese-Italian database is being developed at the Department of Asian and North African Studies at Ca' Foscari University of Venice, one of the main centres for learning Japanese in Europe and the largest in Italy in terms of the number of students (1800) and the number of Japanese teaching hours (1002 at the first level and 387 at the second level of study). The article presents the background of the project, the way it was carried out, and the plans for its future development.

Ključne besede
ITADICT; Japanese-Italian database; lexicography; Japanese language; online database; collaborative editing; Japanese language learning

Acta Linguistica Asiatica, Vol. 2, No. 2, 2012. ISSN: 2232-3317 http://revije.ff.uni-lj.si/ala/

1. ITADICT Project and Japanese Language Learning

The ITADICT Project (http://virgo.unive.it/itadict/eng/about) aims to create a freely accessible Japanese-Italian database and is expressly inspired by Jim Breen's JMdict/EDICT Project, which was initiated in 1991 at Monash University. The database was started separately by Marcella Mariotti (Ca' Foscari University of Venice) and Clemente Beghi (Ca' Foscari University of Venice) between 2007 and 2008 as part of their research. At the time Beghi was a Ph.D. student at Cambridge University doing research on Esoteric Buddhist iconography, so he edited Buddhist and, for other reasons, floral terms.
In the meantime, Mariotti was a JSPS post-doctoral researcher at International Christian University (Tokyo), where she needed an Italian translation and a Latin-alphabet transliteration of all the words in her Hypermedia Dictionary of Japanese Grammar BunpoHyDict (Mariotti 2008); this became her starting point for editing more than 3000 words in the database. They are both grateful to Jim Breen (Monash University), who brought their research interests together. ITADICT became one unified voluntary project, coordinated by Marcella Mariotti, at the end of 2010, when Beghi and Mariotti were both teaching Japanese language at the Department of East Asian Studies (now Department of Asian and North African Studies) at Ca' Foscari University of Venice. In one and a half years they involved more than 60 of their students, who became an integral part of the project, actively translating terms from Japanese into Italian and entering them in the ITADICT EDITOR later developed by Alessandro Mantelli.

2. Translating students: Why involve them?

The strategic role of pleasure in the long-term acquisition of a foreign language has been stressed in neurolinguistics and by researchers such as Danesi (2003), Schumann (2006) and Balboni (2002). Moreover, the more social networks, eLearning and mLearning sites and applications spread across PCs and smartphones, the more students are fascinated by, aware of and concerned with the object of their studies, which supports the view that "learning is like a utility - like water or electricity - that flows in a network or a grid that we tap into when we want".
(Downes 2007)

According to a survey conducted in 2009 (Ferrari 2010), students at Ca' Foscari often approach Japanese language studies not intentionally, as a conscious part of a wider life plan, but rather as a way to satisfy their curiosity and to feel closer to "the real thing": the original language of beloved novels, films, animated films, manga, dorama, inspired sutras or martial arts. Perhaps owing to the disenchantment felt by students of Japanese in Italy in the new century, following Japan's economic crisis, this emotional motivation is quite specific to learners of the net generation (Mariotti, forthcoming; Miyake 2012) and far removed from the motivations of learners in the Nineties.

The above may explain why, as soon as Mariotti introduced the ITADICT project to her students as part of a presentation about fansubbing and language learning (Mariotti 2011), many of them were willing to participate. Their motivations were diverse: mainly, they were interested in creating a tool for translating Japanese into Italian with a mouse-over dictionary¹, and in learning strategies for using the online research tools on Japanese-language sites to conduct lexicographical research (Mariotti 2012)².

A secondary motivation followed. As of 2012, Ca' Foscari students need to complete a period of internship before they can graduate. In 2011 this internship was "warmly suggested", and students received 5 or 6 University Credits for it. Particularly because of the nuclear incident in Fukushima following the tragic 3.11 earthquake and tsunami in north-eastern Japan, Ca' Foscari students were discouraged from applying for an internship in Japan as they had usually done. As a result, 42 of the 64 students collaborating on ITADICT were 2011 prospective graduates who did not know where to complete their internship and, intrigued by the ITADICT project, chose to do so by taking part in it.

3. What is ITADICT?
As mentioned, ITADICT was born out of different needs, largely mirroring those of the original Japanese-English EDICT project, which started in 1991 from MOKE (Mark's Own Kanji Editor), a word processor with an integrated Japanese-English dictionary (Breen 2010). The ITADICT project focused on creating a file for a Japanese-Italian database that could be used by third-party software to easily read and translate Japanese texts, e.g. Rikaichan (a popup dictionary tool for the Firefox browser), Japan Goggles (an iPhone app that translates words from the live camera), Kotoba/Imiwa? (an iPhone dictionary app that manages databases from Japanese into other languages), and more.

¹ E.g. http://www.polarcloud.com/rikaichan/
² Online anonymous survey "Why ITADICT?", addressed to the 66 students and collaborators who worked on ITADICT (Sept. 2012).

In 1999 the EDICT project, which had been limited by a very simple dictionary structure, evolved into the more complex JMDict (Japanese-Multilingual Dictionary) Project, managed by the Electronic Dictionary Research and Development Group (EDRDG) (Breen 2004). JMDict employs an XML structure to support a much richer dictionary entry format, including multiple kanji surface forms and readings. The original EDICT format is generated from this project as a legacy format, mainly for older software packages. An expanded "EDICT2" format, which follows the XML content more closely, is also generated. For our purposes we started from the simpler "traditional" EDICT file, a plain-text file in which each entry (one per line) has only one kanji form and one reading, with less marking of different semantic fields than in the newer EDICT2:

KANJI [KANA] /(PoS tag) gloss/gloss/...

The file had about 160,000 entries (one line per entry), and the most common entries carried a (P) mark for "priority" at the end of the line.
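As an illustration of the line format just described, the following is a minimal Python sketch of a parser for one "traditional" EDICT line. It is not the code used in the project: the names `EdictEntry` and `parse_edict_line` are hypothetical, the sample line is made up for the example, and the sketch assumes that the first gloss field opens with the PoS tag in parentheses.

```python
import re
from typing import List, NamedTuple, Optional


class EdictEntry(NamedTuple):
    kanji: str             # headword (kanji form, or kana-only form)
    kana: Optional[str]    # reading given in square brackets, if any
    pos: Optional[str]     # part-of-speech tag(s), e.g. "n" or "v5k,vi"
    glosses: List[str]     # translations, in file order
    priority: bool         # True if the line carries the (P) marker


# HEADWORD, optional [READING], then slash-delimited fields up to the final slash.
_LINE = re.compile(
    r"^(?P<kanji>\S+)\s+(?:\[(?P<kana>[^\]]*)\]\s+)?/(?P<body>.*)/\s*$"
)
# Assumption: the first field starts with "(PoS tag)" followed by the first gloss.
_POS = re.compile(r"^\((?P<pos>[^)]*)\)\s*(?P<rest>.*)$")


def parse_edict_line(line: str) -> EdictEntry:
    """Parse one line of the form: KANJI [KANA] /(PoS tag) gloss/gloss/.../"""
    match = _LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a traditional EDICT line: {line!r}")
    fields = [f.strip() for f in match.group("body").split("/") if f.strip()]
    priority = "(P)" in fields
    fields = [f for f in fields if f != "(P)"]
    pos = None
    if fields:
        pos_match = _POS.match(fields[0])
        if pos_match:
            pos = pos_match.group("pos")
            fields[0] = pos_match.group("rest")
    glosses = [f for f in fields if f]
    return EdictEntry(match.group("kanji"), match.group("kana"), pos, glosses, priority)


# Illustrative line (not copied from the project files).
entry = parse_edict_line("月食 [げっしょく] /(n) lunar eclipse/(P)/")
print(entry)
```

A gloss that itself begins with a parenthesised note would be misread as a PoS tag by this sketch, so a real tool would need a list of known tags.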
Breen's online pages describe the process used to determine the priority of a term, which is mainly based on a) Alexandre Girardi's (NAIST-MULTITEL) match analysis between EDICT entries and the 1994-1998 corpus of Mainichi Shinbun, and b) the 10,000 common words in the collection Ichimango goi bunruishū (Senmon kyouiku Publishing 1998). However, as Breen underlines:

While the priority markings accurately reflect the status of entries with regard to the various sources, they must be seen as only providing a crude indication of how common a word or expression actually is in Japanese. The "(P)" markings in the EDICT and EDICT2 files appear to identify a useful subset of "common" words, but there are clearly some marked entries which are not very common, and there are clearly unmarked entries which are in common use, particularly in the spoken language. (Breen 2010)

3.1 Latin alphabet transliteration according to the Hepburn system

Our purpose was to allow as many people as possible to approach the Japanese language and to enable, let us say, a primary school teacher to say some words to a Japanese-native-speaker child at school even without knowing kanji or kana. We therefore added a new feature to Breen's EDICT format: the Latin alphabet transliteration of the hiragana reading, placed inside the brackets. An ITADICT line then appeared as follows:

KANJI [KANA, latin] /(PoS tag) gloss/gloss/...
追いつく [おいつく, oitsuku] /(v5k,vi) raggiungere/uguagliare/arrivare al livello di/(P)/

This decision was followed by a heated debate. Since the database was developed within the university, and the whole process was itself part of a teaching/learning project, we adopted the Hepburn system of transliteration and chose to transcribe each entry manually rather than rely on automatic transliteration tools (e.g. Romaji Translator).
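The extended line format can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `format_itadict_line` and `split_reading` are hypothetical, the headword of the example is written here as 追いつく, and the Latin transliteration is passed in by hand, as in the project, rather than generated automatically.

```python
from typing import List, Optional, Tuple


def format_itadict_line(
    kanji: str,
    kana: str,
    latin: str,
    pos: str,
    glosses: List[str],
    priority: bool = False,
) -> str:
    """Build a line of the form: KANJI [KANA, latin] /(PoS tag) gloss/gloss/.../"""
    if not glosses:
        raise ValueError("at least one gloss is required")
    # The PoS tag is attached to the first gloss, as in the EDICT layout.
    fields = [f"({pos}) {glosses[0]}", *glosses[1:]]
    if priority:
        fields.append("(P)")
    return f"{kanji} [{kana}, {latin}] /" + "/".join(fields) + "/"


def split_reading(bracket_content: str) -> Tuple[str, Optional[str]]:
    """Split 'KANA, latin' into its two parts; latin is None if absent."""
    kana, _, latin = bracket_content.partition(",")
    return kana.strip(), (latin.strip() or None)


line = format_itadict_line(
    "追いつく",            # assumed headword form for the example
    "おいつく",
    "oitsuku",             # manually transcribed Hepburn reading
    "v5k,vi",
    ["raggiungere", "uguagliare", "arrivare al livello di"],
    priority=True,
)
print(line)
print(split_reading("おいつく, oitsuku"))
```

Keeping the transliteration inside the existing brackets means that a reader written for the plain EDICT layout only needs to split the bracket content on the comma to recover the original reading.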
3.2 Use of monolingual JA-JA dictionary

With the intent not only to produce an accurate Japanese-Italian database for the general user, but also to offer our students a professionalizing experience and autonomous learning strategies, we encouraged them to refer to online and offline monolingual Japanese-Japanese dictionaries and discussion groups (e.g. Yahoo's Chiebukuro or kotoba.ne.jp), rather than translating only from the English glosses. This was also intended to avoid "false friends", which are quite numerous between English and Italian: shunbun, for example, is translated into English as "vernal equinox" and was mistakenly rendered in Italian as "equinozio d'inverno" (winter equinox) instead of "equinozio di primavera" (spring equinox). Further explanations about checking entered translations are given in section 5 of this article, which is dedicated to the ITADICT Editor.

4. How was the project organized?

4.1 Repartition of the "traditional" EDICT file

The work with our students started with the extraction of 18626 priority words (P) from the "traditional" EDICT file, resulting in a 2.8 MB .txt file that was split into 18 smaller files of 1000 entries each. Volunteer students were assigned 250 entries each, while internship students were assigned 1000 entries each. The former had a very flexible deadline, while the latter had to complete the internship in 3 months. Assigned entries were sent to the students as an .rtf e-mail attachment, with lines numbered from 1 to 1000 for each "priority" file (Figure 1).

Figure 1: Partitioned .rtf file of (P)riority words with numbered lines

The overall exchange of files was managed on a shared online Google spreadsheet called Ripartizione ITADICT, created by Mariotti on November 3, 2010 (Figure 2). The spreadsheet included the following information:
• student's name and surname,
• assigned file or portion of file,
• deadline of the work,
• first/last line to translate,
• delivered date,
• reviewed status,
• supervisor of the (later) import into the new online EDITOR developed by Mantelli,
• private e-mail (upon written agreement, so as to be able to contact the student-translator even in the future),
• grade (undergraduate or graduate),
• supervisor of the delivered entries.

[Screenshot of the Ripartizione ITADICT spreadsheet: rows list the assigned files (e.g. itadictP1001, itadictP2001), the line ranges (e.g. 1-250, 251-500) and the delivery status (consegnato, OK).]
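The partitioning step described in section 4.1 (extracting the (P) entries, splitting them into files of 1000 lines, and handing volunteers batches of 250) can be sketched as follows. This is a hedged illustration, not the procedure actually used: all function names are hypothetical, the input lines are synthetic, and the text does not say how entries beyond the last full file were handled, so here they simply end up in a final, shorter chunk.

```python
from typing import Iterable, List


def extract_priority(lines: Iterable[str]) -> List[str]:
    """Keep only the lines that end with the (P) priority marker."""
    return [ln.rstrip("\n") for ln in lines if ln.rstrip().endswith("/(P)/")]


def split_into_chunks(entries: List[str], size: int) -> List[List[str]]:
    """Split entries into consecutive chunks of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [entries[i:i + size] for i in range(0, len(entries), size)]


def number_lines(entries: List[str], start: int = 1) -> List[str]:
    """Prefix each entry with a running line number, as in the .rtf files."""
    return [f"{n}. {entry}" for n, entry in enumerate(entries, start)]


# Synthetic stand-in for the EDICT file: 2500 priority lines plus one other line.
lines = [f"語{i} [ご] /(n) gloss {i}/(P)/" for i in range(2500)]
lines.append("語x [ご] /(n) not a priority entry/")

priority = extract_priority(lines)
files = split_into_chunks(priority, 1000)           # one chunk per "priority" file
volunteer_batches = split_into_chunks(files[0], 250)  # 250 entries per volunteer
numbered = number_lines(files[0])

print(len(priority), [len(f) for f in files], len(volunteer_batches))
print(numbered[0])
```

Each row of the shared spreadsheet then corresponds to one batch: the file name plus the first and last line numbers of the range assigned to a student.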