Proceedings of the 23rd International Multiconference INFORMATION SOCIETY – IS 2020, Volume H
(Zbornik 23. mednarodne multikonference INFORMACIJSKA DRUŽBA – IS 2020, Zvezek H)

Human-Computer Interaction in Information Society
(Interakcija človek-računalnik v informacijski družbi)

Edited by Veljko Pejović, Matjaž Kljun, Vida Groznik, Domen Šoberl, Klen Čopič Pucihar, Bojan Blažica, Jure Žabkar, Matevž Pesek, Jože Guna, Simon Kolmanič

http://is.ijs.si
7 October 2020, Ljubljana, Slovenia

Editors:
Veljko Pejović, Univerza v Ljubljani, FRI, ACM SIGCHI Chapter Bled Slovenia
Matjaž Kljun, Univerza na Primorskem, FAMNIT
Vida Groznik, Univerza na Primorskem, FAMNIT
Domen Šoberl, Univerza na Primorskem, FAMNIT
Klen Čopič Pucihar, Univerza na Primorskem, FAMNIT
Bojan Blažica, Inštitut Jožef Stefan
Jure Žabkar, Univerza v Ljubljani, FRI
Matevž Pesek, Univerza v Ljubljani, FRI
Jože Guna, Univerza v Ljubljani, FE
Simon Kolmanič, Univerza v Mariboru, FERI

Publisher: Institut »Jožef Stefan«, Ljubljana
Proceedings prepared by: Mitja Lasič, Vesna Lasič, Lana Zemljak
Cover design: Vesna Lasič
Access to the e-publication: http://library.ijs.si/Stacks/Proceedings/InformationSociety
Ljubljana, October 2020
Informacijska družba, ISSN 2630-371X
Cataloguing-in-publication (CIP) record prepared by the National and University Library in Ljubljana
COBISS.SI-ID=33206787
ISBN 978-961-264-200-6 (epub)
ISBN 978-961-264-201-3 (pdf)

PREFACE TO THE INFORMATION SOCIETY 2020 MULTICONFERENCE

The twenty-third Information Society multiconference (http://is.ijs.si) was cut by half because of the coronavirus. Thanks for its survival go to those conference chairs who, despite the first pandemic of the modern world, bravely decided to hold the conference in their field. The coronavirus did almost nothing to limit the remarkable growth of ICT, the information society, artificial intelligence and science in general. On the contrary: suddenly most activities had to be carried out electronically, and ICT proved that electronic is often even better than physical. On the other hand, the decay of social values and of trust in science and progress has accelerated. Even the Flynn effect (the measurement of IQ in the world population) shows that people are no longer becoming smarter. On the contrary, more and more people believe that the Earth is flat, that the coronavirus vaccine will be harmful, or that the coronavirus is only as harmful as ordinary flu (in reality it is ten times more harmful). The gap between growing knowledge and superstition is widening.

This year we have joined eight excellent independent conferences into the multiconference. It comprises around 160 mostly online presentations, abstracts and papers within the independent conferences and workshops, and 300 attendees. The event will be accompanied by round tables and discussions as well as special events such as the award ceremony, mostly online of course. Selected papers will also be published in a special issue of the journal Informatica (http://www.informatica.si/), which boasts a 44-year tradition as an excellent scientific journal.
The Information Society 2020 multiconference consists of the following independent conferences:
• Professional Ethics
• Human-Computer Interaction in Information Society
• Data Mining and Data Warehouses
• Cognitive Science
• People and Environment
• International Technology Transfer Conference
• Slovenian Conference on Artificial Intelligence
• Education in Information Society

The co-organizers and supporters of the conference are various research institutions and associations, among them ACM Slovenija, SLAIS, DKZ and the second Slovenian national academy, the Slovenian Engineering Academy (Inženirska akademija Slovenije, IAS). On behalf of the conference organizers, we thank the associations and institutions, and especially the participants for their valuable contributions and for the opportunity to share with us their experience of the information society. We also thank the reviewers for their help with reviewing.

In 2020 we will present, for the fifteenth time, the award for lifetime achievement in honour of Donald Michie and Alan Turing. The Michie-Turing award for an outstanding lifetime contribution to the development and promotion of the information society was received by Prof. Dr. Lidija Zadnik Stirn. The recognition for the achievement of the year goes to the Program Council of the ACM Bober competition. We are also presenting the "information lemon" and "information strawberry" awards for the most (un)successful moves related to the information society. The lemon went to "Unresponsiveness in the development of the electronic health record", and the strawberry to the Bioinformatics Laboratory, Faculty of Computer and Information Science, University of Ljubljana. Congratulations to the winners!

Mojca Ciglarič, Programme Committee Chair
Matjaž Gams, Organizing Committee Chair

FOREWORD INFORMATION SOCIETY 2020

The 23rd Information Society Multiconference (http://is.ijs.si) was halved due to COVID-19. The multiconference survived thanks to the conference chairs who bravely decided to continue with their conferences despite the first pandemic of the modern era.
The COVID-19 pandemic did not slow the growth of ICT, the information society, artificial intelligence and science overall. On the contrary, most activities suddenly had to be carried out through ICT, and this often proved more efficient than the old physical way. But COVID-19 did accelerate the decline of societal norms and of trust in science and progress. Even the Flynn effect (measuring IQ all over the world) indicates that the average Earthling is becoming less smart and knowledgeable. Contrary to the general belief of scientists, the number of people believing that the Earth is flat is growing. A large number of people are wary of the COVID-19 vaccine and consider the consequences of COVID-19 to be similar to those of a common flu, despite their being empirically observed to be ten times worse.

The Multiconference is running parallel sessions with around 160 presentations of scientific papers at eight conferences, many round tables, workshops and award ceremonies, and 300 attendees. Selected papers will be published in the Informatica journal with its 44-year tradition of excellent research publishing.

The Information Society 2020 Multiconference consists of the following conferences:
• Cognitive Science
• Data Mining and Data Warehouses
• Education in Information Society
• Human-Computer Interaction in Information Society
• International Technology Transfer Conference
• People and Environment
• Professional Ethics
• Slovenian Conference on Artificial Intelligence

The Multiconference is co-organized and supported by several major research institutions and societies, among them ACM Slovenia (the Slovenian chapter of the ACM), SLAIS, DKZ and the second national engineering academy, the Slovenian Engineering Academy. On behalf of the conference organizers, we thank all the societies and institutions, and particularly all the participants for their valuable contributions and their interest in this event, and the reviewers for their thorough reviews.
For the fifteenth year, the award for life-long outstanding contributions will be presented in memory of Donald Michie and Alan Turing. The Michie-Turing award was given to Prof. Dr. Lidija Zadnik Stirn for her life-long outstanding contribution to the development and promotion of information society in our country. In addition, a recognition for current achievements was awarded to the Program Council of the competition ACM Bober. The information lemon goes to the "Unresponsiveness in the development of the electronic health record", and the information strawberry to the Bioinformatics Laboratory, Faculty of Computer and Information Science, University of Ljubljana. Congratulations!

Mojca Ciglarič, Programme Committee Chair
Matjaž Gams, Organizing Committee Chair

CONFERENCE COMMITTEES

International Programme Committee
Vladimir Bajic, South Africa
Heiner Benking, Germany
Se Woo Cheon, South Korea
Howie Firth, UK
Olga Fomichova, Russia
Vladimir Fomichov, Russia
Vesna Hljuz Dobric, Croatia
Alfred Inselberg, Israel
Jay Liebowitz, USA
Huan Liu, Singapore
Henz Martin, Germany
Marcin Paprzycki, USA
Claude Sammut, Australia
Jiri Wiedermann, Czech Republic
Xindong Wu, USA
Yiming Ye, USA
Ning Zhong, USA
Wray Buntine, Australia
Bezalel Gavish, USA
Gal A. Kaminka, Israel
Mike Bain, Australia
Michela Milano, Italy
Derong Liu, Chicago, USA
prof. Toby Walsh, Australia

Organizing Committee
Matjaž Gams, chair
Mitja Luštrek
Lana Zemljak
Vesna Koricki
Marjetka Šprah
Mitja Lasič
Blaž Mahnič
Jani Bizjak
Tine Kolenik

Programme Committee
Mojca Ciglarič, chair
Bojan Orel, co-chair
Franc Solina, Viljan Mahnič, Cene Bavec, Tomaž Kalin, Jozsef Györkös, Tadej Bajd, Jaroslav Berce, Mojca Bernik, Marko Bohanec, Ivan Bratko, Andrej Brodnik, Dušan Caf, Saša Divjak, Tomaž Erjavec, Bogdan Filipič, Andrej Gams, Matjaž Gams, Mitja Luštrek, Marko Grobelnik, Nikola Guid, Marjan Heričko, Borka Jerman Blažič Džonova, Gorazd Kandus, Urban Kordeš, Marjan Krisper, Andrej Kuščer, Jadran Lenarčič, Borut Likar, Janez Malačič, Olga Markič, Dunja Mladenič, Franc Novak, Vladislav Rajkovič, Grega Repovš, Ivan Rozman, Niko Schlamberger, Špela Stres, Stanko Strmčnik, Jurij Šilc, Jurij Tasič, Denis Trček, Andrej Ule, Tanja Urbančič, Boštjan Vilfan, Baldomir Zajc, Blaž Zupan, Boris Žemva, Leon Žlajpah

TABLE OF CONTENTS

Human-Computer Interaction in Information Society (Interakcija človek-računalnik v informacijski družbi)
FOREWORD
PROGRAMME COMMITTEES
Investigating the Role of Context and Personality in Mobile Advertising / Martinovic Andrej, Pejović Veljko
Interaktivna vizualizacija proračuna Republike Slovenije s Sankeyevim diagramom / Tušar Tea
MightyFields Voice: Voice-based Mobile Application Interaction / Zupančič Jernej, Štravs Miha, Mlakar Miha
eBralec 4: hibridni sintetizator slovenskega govora / Žganec Gros Jerneja, Romih Miro, Šef Tomaž
Sound 2121: The Future of Music is Natural / Deja Jordan Aiko, Attygale Nuwan, Čopič Pucihar Klen, Kljun Matjaž
Ohranjanje kulturne dediščine s pomočjo navidezne in obogatene resničnosti / Plankelj Marko, Lukač Niko, Rizvić Selma, Kolmanič Simon
Predmetnik: oprijemljiv uporabniški vmesnik za informiranje turistov / Sotlar Gregor, Roglej Peter, Čopič Pucihar Klen, Kljun Matjaž
Razvoj in Ocenjevanje Prototipa Mobilne Aplikacije z Elementi Igrifikacije in Mešane Resničnosti / Zorko Monika, Debevc Matjaž, Kožuh Ines
StreetGamez: detection of feet movements on the projected gaming surface on the floor / Škrlj Peter, Lochrie Mark, Kljun Matjaž, Čopič Pucihar Klen
Anamorfična projekcija na poljubno neravno površino / Cej Rok, Solina Franc
Učinkovita predstavitev slovarskih jezikovnih virov pri govornih tehnologijah / Žganec Gros Jerneja, Golob Žiga, Dobrišek Simon
The Fundamentals of Sound Field Reproduction Using a Higher Order Ambisonics System / Prislan Rok
The use of eCare services among informal carers of older people and psychological outcomes of their use / Smole Orehek Kaja, Dolnicar Vesna, Hvalič Touzery Simona
Author index

Human-Computer Interaction in Information Society (Interakcija človek-računalnik v informacijski družbi)

PREFACE

Human-Computer Interaction in Information Society is a conference organized by the Slovenian community for the study of human-computer interaction. The purpose of the conference is to bring together researchers, practitioners and students in the field and to offer an opportunity to exchange experiences and research results, as well as to make contacts for future collaboration. This year's fifth edition of the conference is taking place for the second time under the auspices of the ACM SIGCHI Chapter Bled, which was itself established as a result of previous conferences. The growth of the HCI community in the region is also shown by the increasing number of contributions, which come from all major higher education institutions in Slovenia. The topics covered by the conference range from more established ones, such as visualization, the design of graphical and speech-based user interfaces, and personalization and adaptation of interaction to users, to virtual and augmented reality and user interfaces in tourism, the arts and e-learning.

FOREWORD

Human-Computer Interaction in Information Society is a conference organized by the Slovenian HCI community. The purpose of the conference is to gather researchers, practitioners and students in the field and offer the opportunity to exchange experiences and research results, as well as to establish contacts for future cooperation.
This year's fifth reincarnation of the conference is, for the second time, organized by the SIGCHI Chapter ACM Chapter Bled, which has been established also as a result of previous conferences. The growth of the HCI community in the region is witnessed by the doubled number of contributions coming from all major higher education institutions in Slovenia. The topics covered by the conference range from the more established ones, such as visualization and design of graphical and audio user interfaces, personalisation and interaction adaptation, to virtual and augmented reality, and the ap plication of user interfaces in tourism, arts, and e-learning. 3 PROGRAMSKI ODBOR / PROGRAMME COMMITTEE Nuwan T Attygalle (Univerza na Primorskem) Bojan Blažica (Inštitut Jožef Stefan) Klen Čopič Pucihar (Univerza na Primorskem) Jordan Deja (Univerza na Primorskem) Vida Groznik (Univerza na Primorskem) Jože Guna (Univerza v Ljubljani) Matjaž Kljun (Univerza na Primorskem) Simon Kolmanič (Univerza v Mariboru) Ines Kožuh (Univerza v Mariboru) Elham Motamedi (Univerza na Primorskem) Marko Tkalčič (Univerza na Primorskem) Domen Šoberl (Univerza na Primorskem) Veljko Pejović (Univerza v Ljubljani) Jure Žabkar (Univerza v Ljubljani) 4 Investigating the Role of Context and Personality in Mobile Advertising Andrej Martinovič Veljko Pejović Faculty of Computer and Information Science, University Faculty of Computer and Information Science, University of Ljubljana, Slovenia of Ljubljana, Slovenia am6694@student.uni-lj.si Veljko.Pejovic@fri.uni-lj.si ABSTRACT Machine learning and recommender systems are at the core More than three billion smartphones carried by their users at of modern advertising solutions [9]. The selection of the ad virtually all times, represent an unprecedented platform for to be show to the user benefits from the history of purchases, in-situ advertisement delivery. 
While recent efforts in data information on the similarity among users, but also on the analysis and machine learning led to significant advances in information about a user’s personality [6]. the way relevant content is selected to be shown to a user, Moving to the mobile domain, contextual information, thorough investigation on how the content should be dis- such as location may impact the relevance of an ad [2]. The played to a mobile user is yet to be conducted. In this work we context, that can be sensed by a smartphone, such as a user’s present our preliminary research on the role of the context location, his physical activity, time of day, and other factors, in which an advertisement is consumed and the personality can also be used to determine the suitability of a moment of a user consuming it on the perception of the ad content. for information delivery [7]. We conduct a 7-week study with 14 mobile users who were While the previous work focuses on the content or the exposed to both video and picture ads. Through mobile sens- timing of the ad delivery, the type of the ad, to the best of our ing and experience sampling we capture the information on knowledge, has not been examined in the mobile domain. the context in which the ad was seen, the user’s attitude Nevertheless, the type of the ad, whether it is a picture, a towards the ad, as well as the user’s personality traits. Statis- short or a long video, or perhaps an interactive content (e.g. tical analysis based on mixed-effect modelling demonstrates a short game) is an important parameter that influences the that personality traits play an important role in ad percep- overall design of an ad, the platforms at which the ad can tion, as does the ad type, with picture ads being preferred to be shown, advertisement budget, etc. In this paper we focus video ads, while the effect of the context on ad perception on the perception of an ad type in mobile computing and appears to be negligible. 
pose the following research question: Can the contextual information collected by the mobile phone sensors and the CCS CONCEPTS information on a user’s personality predict a user’s perception • Human-centered computing → Interaction techniques; of different types of mobile ads? Ubiquitous and mobile devices; Empirical studies in ubiq- uitous and mobile computing. 2 METHODOLOGY KEYWORDS To obtain ecologically valid data on mobile ad perception in different contexts we developed a data collection mobile mobile advertising, multilevel models, ubiquitous computing application that serves ads, captures a user’s attitudes to- 1 INTRODUCTION AND BACKGROUND wards the displayed ads, and collects sensor data pertaining to the context of use. In the rest of the section we present Tremendous amounts of digital traces, just-in-time sensor the details of our app. information, and the advances in data processing have re- sulted in major shifts in how the advertising is performed. Mobile Application Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are We implemented a full-fledged mobile app that caters to the not made or distributed for profit or commercial advantage and that copies need of our target users – students at our University. The bear this notice and the full citation on the first page. Copyrights for third-application was built for the Android platform and serves party components of this work must be honored. For all other uses, contact as a utility tool allowing its users to: obtain information the owner/author(s). Information Society 2020, 5–9 October, 2020, Ljubljana, Slovenia on nearby restaurants providing subsidised student meals, © 2020 Copyright held by the owner/author(s). 
get real-time public transport timetables, record or share important student notes, retrieve latest student related news 5 Figure 1: Data collection app: one of the functionalities (left), advertisement (center) and an ESM questionnaire (right). feeds, save and access their most needed school gadgets, and was engaged with the ad. The last question focused on the ap- organise their class schedules (Figure 1 left). propriateness of the displayed ad. The answers are recorded with five-level Likert scales. Figure 1 represents the data col-Mobile ads. Mobile ads come in different flavours ranging lecting workflow, where a user made an action, which led to from simple picture-based ads, over video ads, to more inter- the ad being displayed, followed by the ESM questionnaire. active game-like ads. We opted to investigate the two most frequent types of ads in our study – pictures and videos. We Personality test. Previous research demonstrates that per- further divide the video ads into two groups – short videos, sonality traits have a moderate effect on a user’s attitude with the length of 30 seconds or less, and long videos with towards advertisement [1]. Therefore, we included the BFI-the length between 30 and 80 seconds. From each of the 10 personality test [8] as a part of our app. The test includes three groups – pictures, short videos, and long videos – we ten questions about a user’s traits answered on a seven-point gathered 31 different publicly available ads and pre-loaded Likert scale. The processed BFI-10 data, assessing a user’s them on our server. After five actions that a user makes personality along the five dimensions (extraversion, agree- within our app, a request is made to our back-end system ableness, openness, conscientiousness, and neuroticism), was which responds with a random ad of a randomly chosen cat- further compared to the statistics calculated on a larger pop- egory. 
Simultaneously, we activate mobile phone’s sensors ulation set in order to extract the percentiles to which the and capture the user’s context, including the physical activity participants personality trait scores belong [8]. (through Android’s Google Activity Recognition function- ality), location (clustered as work, home, or other, according Data collection campaign to the method described in [7]), screen brightness, battery Our data collection campaign lasted for seven weeks in level, time of day, and the Internet connectivity type. spring 2020 and included 14 participants who in total viewed Experience sampling method (ESM) questionnaires. ESM is 994 ads, out of which 501 they labeled, i.e. an ESM question- commonly used to gather the participants own thoughts, naire was completed immediately after the ad was viewed. emotions, behaviour, etc [3]. In our case it provided us with The distribution of labeled and unlabeled ad types is roughly feedback regarding the participants assessment of overall ad even. The viewing was reasonably evenly distributed among suitability. With the included questionnaire we also wanted users, with the least active participant contributing 2.4% and to measure the interaction level between the user and the the most active participant contribution 12.6% of the data. displayed ad. Thus, the questionnaire consisted of following In our study we included 12 picture ads, 9 short video ads, questions: what was shown on the ad, which brand/trademark and 10 long video ads. The ads were randomly shown both was advertised, and was the ad shown in an appropriate form. within and among users, i.e. each two users saw different ads The first two questions were used to assess whether the user where a participant shown a picture ad from a specific brand 6 need not have seen a video ad from the same trademark. 
model comparison remains above the 0.05 threshold, again The majority of viewed ads, that are labeled, were pictures indicating the superiority of the basic model. (40.5%), followed by short videos (34.7%). The least amount Since the context is shown to be irrelevant, we focus on the of user feedback was from long videos (24.8%). The average content and the type-based models. With the inclusion of ad score (questionnaire answers ranging from "Strongly dis- type, as a part of fixed effects, we were able to build a model agree" to "Strongly agree" were transformed to the integer that preforms better then the basic one. We suspect that [-2, 2] scale) over all ads was 0.377, yet it differs across the ad different users score different ad types in different manners, types. Labeled pictures had an average score of 0.695, short thus we included the type parameter as a random slope. videos 0.253, and long videos 0.032. Metrics AIC, BIC show a significant decrease, indicating that the new model preforms better than the previous one. 3 MOBILE AD PERCEPTION MODELLING The analysis of the model reveals that picture ads receive a Our data collection study elaborated in Section 2 has resulted predominantly positive score, short videos neutral-negative, in a heterogeneous dataset with an uneven number of dat- and long videos very negative score. Slope coefficients for apoints across users, across contextual characteristics, and ad type were also found to be varying within users. We ad types. The natural organisation of our data into groups further experiment with content-based models, where the makes multilevel modelling-based analysis particularly ap- each particular ad is encoded as its own content category.The propriate. Such models generalise the linear regression in a AIC, BIC, and chi-square-based comparison indicate that the manner that allows that the effect of a group (e.g. a particular content has a statistically significant impact on ad scoring. 
user, a personality type, etc.) is disentangled from the effect With both content and ad type being relevant we further of predictors, such as contextual variables [4] [5]. investigate whether it is possible to combine both models With hierarchical modeling we gradually increase the and also include the ad viewing duration as a parameter. model complexity by including different parameters as a Indeed, our best preforming model includes the duration of part of fixed or random effects. At each step we need to com- ad watching, and cross-level interaction of ad content and ad pare our new model to the previous one. This is done by type as fixed effects, and ad type as the random effect. The preforming a chi-squared test checking if the residual sum of conditional 2 𝑅 metric of such a model is 0.455 whilst the squares of the new model is statistically significantly smaller marginal 2 𝑅 is 0.204 indicating a reasonably good fit. than that of the old model. To further verify which model is better we calculated the AIC (Akaike information criterion) Personality-based model and BIC (Bayesian Information Criterion) metrics, where The above user ID-based model demonstrates the impact smaller values indicate a better model, since the relative of individual traits on the ad perception. Nevertheless, the amount of information lost is lower. model is not suitable for real-world use, as it requires that an In this section we present the results of multilevel mod- individual’s data is available before predictions can be made. elling with two models constructed on the labeled data in Therefore, we now design a model that, instead of data from a order to investigate the impact of different parameters on particular user, is based on the information about personality the ad perception – a model where the user ID is the group- traits of a user. Such information can be obtained quickly ing variable and a model where the user’s personality is the through a personality test. 
grouping variable. We then use both labeled and unlabeled data in a semi-supervised learning fashion to construct our final predictive model rooted in users' personalities.

User ID-based model

The basic user model includes only the participants' IDs as the grouping variable. From there we gradually increase the model complexity by separately adding context-based parameters. We experiment with including physical activity, location, screen brightness, battery level, time of day, and Internet connectivity type in our model, and find that none of the contextual variables has a statistically significant influence on whether a user marks an ad as appropriate or not. In addition, the comparison of the basic model with the context-based ones reveals that the AIC and BIC metrics increase, and the p-value of the chi-squared

The basic personality-based model only includes a grouping variable based on personality traits, without any fixed effects or random slopes. As before, we find that the inclusion of context parameters does not improve the basic model, so we focus on the ad content and ad type as the next modeling level. Gradually increasing the complexity of our model, we come to similar conclusions as in the previous section. The fixed effects include a cross-level interaction of ad content and ad type, whereas the random effects include ad type only. The final personality-based model demonstrates that ad types are marked differently within different personality groups. One particular group, consisting of extrovert, non-conflicting, non-conscious, and emotionally stable users, is found to stand out. In this group pictures had an average score of -0.4, short videos 0.636, and long videos -0.75. To see if the scores were indeed significantly different, we perform a Welch's t-test between this outlying group and all other personality groups (Table 1). We find that the difference in short video scoring between the compared groups is not statistically significant, whilst the difference in picture scoring is.

Metrics               Pictures            Short videos       Long videos
t-test                -4.087              1.026              -1.545
p-value               0.001               0.326              0.162
95% conf. interval    [-1.771, -0.565]    [-0.467, 1.286]    [-2.089, 0.416]
Outlying group avg.   -0.4                0.636              -0.75
Other groups avg.     0.768               0.227              0.086

Table 1: Welch's t-test between the outlying personality group (extrovert, non-conflicting, non-conscious, and emotionally stable) and other personality groups.

Even though we built a personality-based model with the intent to make it more general, we found that not all personality combinations are included, as our sample size is not large enough. With 14 participants, only 7 of the 16 possible personality groups (openness omitted) are covered. The final model's conditional R² is 0.377 and its marginal R² is 0.198.

Predictive personality-based model

The user ID-based model demonstrates that who is watching the ad is more important than the situation in which the ad is watched. Predictions of an attitude towards an ad could be used to decide whether to show an ad of a certain type, or whether to show an ad at all. Yet, personalised user-based models would require labeled data for each user, making their construction impractical. The analysis of the personality-based multilevel models demonstrates that general personality traits, obtainable through a simple 10-item questionnaire, can be used to build an informative model. Here we examine the predictive potential of a fully generalisable model based on personality trait information.

With semi-supervised learning, we first label the unlabeled data: using the previously constructed user ID-based model, we predict the labels for the 493 unlabeled points. We then proceed with constructing a new personality-based model. Repeating the gradual increase of complexity procedure, we find that the following context variables significantly impact the fit: screen brightness, battery level, and Internet connection type. Nevertheless, these variables do not feature highly in the final model, as ad content and ad type prove to be much more impactful on the final ad scoring. Our final generalised personality-based model, constructed on all gathered data, includes a cross-level interaction of ad content and ad type as fixed effects and ad type as a random effect.

To assess the potential of the model to correctly predict the score a previously unseen user will give to an ad in a certain situation, we perform a leave-one-person-out evaluation and in each step calculate the (root) mean square error ((R)MSE) and mean absolute error (MAE) of our model and of the baseline model that predicts the mean score across the dataset. Average RMSE, MSE, and MAE for the personality-based model are 0.967, 1.014, and 0.785, whereas the baseline results in 1.117, 1.347, and 0.865, respectively, indicating that the personality-based predictive model fits the data better than the mean-score baseline. The model's conditional R² is 0.488 and its marginal R² is 0.308.

4 DISCUSSION AND CONCLUSION

In this paper we examined the role of context and a user's personality in ad perception. While our initial assumption was that users would prefer either picture or video ads depending on the context of viewing, we discovered that picture ads are almost universally better accepted. This surprising finding might stem from our data collection limitations: conducted during the COVID-19 pandemic, the data fails to capture the full range of locations and activities we would expect to see during regular times. A prominent role of a user's personality in the perception of an ad is another interesting finding. We discover that certain personalities actually prefer short videos over picture ads. Our general predictive model takes personalities into account and is able to predict the attitude that a previously unobserved user will have towards an ad better than the baseline model. The initial analysis also demonstrates that the content of the ad, a property that was outside the scope of our study, may significantly impact the perception and should be further examined.

REFERENCES
[1] Aliosha Alexandrov, Susan Mayers, and Sandipan Sen. 2010. The moderating effect of personality traits on attitudes toward advertisements: A contingency framework. Management Marketing 5 (01 2010).
[2] Christine Bauer and Christine Strauss. 2016. Location-based advertising on mobile devices: A literature review and analysis. Management Review Quarterly 66 (01 2016), 159–194.
[3] Niall Bolger. 2013. Intensive longitudinal methods: An introduction to diary and experience sampling research. Guilford Press.
[4] Andrea Feldstain, Heather Woltman, Jennifer MacKay, and Meredith Rocci. 2012. Introduction to hierarchical linear modeling. Tutorials in Quantitative Methods for Psychology 8 (02 2012), 62–69.
[5] Andrew Gelman and Jennifer Hill. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
[6] Haris Kriještorac, Rajiv Garg, and Maytal Saar-Tsechansky. 2019. Personality-Based Content Engineering for Rich Digital Media. In AIS BLED. Bled, Slovenia.
[7] Veljko Pejovic and Mirco Musolesi. 2014. InterruptMe: Designing Intelligent Prompting Mechanisms for Pervasive Applications. In ACM UbiComp. Seattle, WA, USA.
[8] Beatrice Rammstedt and Oliver P John. 2007. Measuring personality in one minute or less: A 10-item short version of the Big Five Inventory in English and German. Journal of Research in Personality 41, 1 (2007), 203–212.
[9] Ayush Singhal, Pradeep Sinha, and Rakesh Pant. 2017. Use of Deep Learning in Modern Recommendation System: A Summary of Recent Works. Intl. Journal of Computer Applications 180, 7 (Dec 2017), 17–22.
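The leave-one-person-out evaluation described under "Predictive personality-based model" can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `error_metrics` and `leave_one_person_out`, the `fit`/`predict` callables, and the toy scores are all hypothetical, and only the mean-score baseline is shown because the multilevel model itself is not specified in enough detail to reproduce here.

```python
from math import sqrt
from statistics import mean


def error_metrics(y_true, y_pred):
    """Return RMSE, MSE and MAE for paired true and predicted scores."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = mean(e * e for e in errors)
    mae = mean(abs(e) for e in errors)
    return {"rmse": sqrt(mse), "mse": mse, "mae": mae}


def leave_one_person_out(scores_by_user, fit, predict):
    """Hold out each user in turn, fit on the rest, average the fold metrics."""
    folds = []
    for held_out, test_scores in scores_by_user.items():
        # Training data: every score from every user except the held-out one.
        train = [s for user, scores in scores_by_user.items()
                 if user != held_out for s in scores]
        model = fit(train)
        predictions = [predict(model) for _ in test_scores]
        folds.append(error_metrics(test_scores, predictions))
    return {name: mean(fold[name] for fold in folds)
            for name in ("rmse", "mse", "mae")}


# Hypothetical toy scores, not the study data.
toy_scores = {"u1": [1, 0, -1], "u2": [2, 1], "u3": [0, 0, 1]}

# Baseline from the text: always predict the mean of the training scores.
baseline = leave_one_person_out(toy_scores, fit=mean, predict=lambda m: m)
print(baseline)
```

Averaging the metrics per fold, as in this sketch, would be consistent with the reported average RMSE (0.967) not being the square root of the reported average MSE (1.014), since the mean of square roots is at most the square root of the mean.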
Interaktivna vizualizacija proračuna Republike Slovenije s Sankeyevim diagramom
Interactive Visualization of the Slovenian Budget with the Sankey Diagram

Tea Tušar
tea.tusar@ijs.si
Jožef Stefan Institute
Jamova cesta 39
Ljubljana, Slovenia

Figure 1: Sankey diagram of the general part of the budget for 2020

ABSTRACT
We present a web application with interactive visualizations of the Slovenian budget. With two Sankey diagrams that show different budget categories and the cash flows between them, we visualize both the general and the specific part of the budget. Interaction makes it possible to change views so that more details can be shown. The application does not offer pre-selected aspects of the budget, but is intended for free exploration of its data and as such represents an alternative to existing budget visualizations. It is available at http://proracun.herokuapp.com.

KEYWORDS
state budget, interactive visualization, Sankey diagram

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2020, 5–9 October, 2020, Ljubljana, Slovenia
© 2020 Copyright held by the owner/author(s).

1 INTRODUCTION
We live in an age of big data, social networks, and instant communication, which offer us enormous amounts of information at every moment. This information overload makes it harder to examine the information in depth and to verify it. We therefore often rely on other people's interpretations and unintentionally find ourselves in the passive role of a recipient of information that may be incomplete or (deliberately) wrong.

This can be avoided by verifying the data ourselves, which can be a very demanding task. We need access to the data, the ability to process large amounts of data, visualization methods, and the knowledge needed to place the data in a wider context and interpret it correctly. Tools that acquire and process the data for us can help here.

In this paper we present a new tool of this kind (Figure 1), available at http://proracun.herokuapp.com, which uses interactive visualization with the so-called Sankey diagram to help the user understand the budget of the Republic of Slovenia and find information in it. This is relevant because the state budget is the largest of the four public finance funds and represents slightly less than half of all public finance expenditure [10]. It reveals the fiscal policy, development goals, priority areas, and the political and strategic priorities of the government.

As with all visualizations of complex data, interactivity is essential here as well. With large datasets, the limitations of people on the one hand and of computer visualizations on the other make it impossible to show all the data at once. Interaction, in which the user's actions trigger changes of view, works better. Unlike a static view, which can show only one aspect of the data at a time, interaction supports many queries. It is especially useful for exploration at multiple levels of detail, where it lets us move (gradually) from an overview at the highest level through intermediate overviews to the most detailed view, which may show only a small part of all the data [5].

The task addressed by the new tool is not to present or explain pre-selected aspects of the budget, but to support free exploration of its data, helping users find their own insights. As such, it complements existing budget visualizations, such as the explanatory infographics and other visualizations prepared by the Ministry of Finance of the Republic of Slovenia (see Section 2.3). The tool is intended for ordinary citizens as well as journalists and others who are affected by the budget in one way or another and wish to explore it and thus understand it better.

In the following, we first briefly present the state budget, its structure, the availability of data, and existing visualizations. We then turn to the new budget visualization tool. After describing the Sankey diagram, we explain how it can be enriched with interaction. We also present the details of how the visualization was built and discuss its properties. We conclude with a summary and ideas for upgrading the tool.

2 STATE BUDGET
The state budget of the Republic of Slovenia is an economic and political act that includes the planned revenue and expenditure of the state for one year. It is adopted by the National Assembly according to a prescribed procedure. When the actual revenue is lower than planned, or new obligations arise that were not foreseen in the budget, the government may propose a budget revision (rebalans; see footnote 1), which aligns the budget with the changed circumstances.

Footnote 1: At the time of writing, a revision of the 2020 budget is being prepared [13]. It was prompted by the marked drop in budget revenue during the COVID-19 epidemic and the simultaneous growth in expenditure due to the government measures adopted to mitigate the consequences of the crisis and preserve economic activity.

The state budget is one of the four public finance funds. The other three are
• the pension fund, from which mainly pensions and disability benefits are paid,
• the health fund, which mainly covers the operating costs of health centres and hospitals and the cost of medicines, and
• the municipal budgets, which comprise the revenue and expenditure of all 212 municipalities.
The largest fund is the state budget, which represents 48.4 % of all public finance expenditure. It is followed by the pension fund with 27.1 % of expenditure, the health fund with 14.2 %, and the municipal budgets with 10.3 % [10].

2.1 Budget structure
The state budget consists of three parts.
Part I: The general part of the budget includes the balance of revenue and expenditure, the account of financial claims and investments, and the financing account. It is presented according to the economic classification (account group, account subgroup, and account).
Part II: The specific part of the budget shows the use of public finance funds by individual budget users through the institutional classification (budget user supergroup, budget user group, and budget user) and includes expenditure and other operating outlays presented according to the programme classification (policy, programme, and subprogramme).
Part III: The plan of development programmes presents planned expenditure by subprogramme, measure, project group, project, and source of financing by year for the entire period of implementation of the projects and measures.

Additional explanations are available for both the general and the specific part of the budget. In the rest of the paper we focus only on these two parts.

2.2 Data availability
The government website (https://www.gov.si/), under the auspices of the Ministry of Finance, provides a wealth of information on the state budget [8]. This includes freely accessible data on the adopted budgets for all years between 2004 and 2021. The data is available in tabular form as PDF files for all three parts of the budget. It is therefore intended mainly for viewing and is not suitable for further computer processing.

Data in CSV format on the Open Data Portal of Slovenia (Odprti podatki Slovenije, OPSI, https://podatki.gov.si/), by contrast, is intended precisely for further processing. The portal offers the general and specific parts of all budgets between 2014 and 2021, and from 2019 onwards also the plan of development programmes [11]. All data used in this paper comes from the OPSI portal.

2.3 Existing visualizations
Since 2017, in addition to the raw budget data, the Ministry of Finance has been publishing infographics with key budget data, with which it aims to bring the budget closer to a wider circle of citizens. An example of such an infographic is shown in Figure 2 [9]. The infographic highlights certain aspects of the specific part of the budget, in this case 16 policies, some of which are merged, since the budget contains 24 original policies. A smaller number of policies is easier to understand but inevitably hides some details.

In addition to the infographics, three kinds of (interactive) budget visualizations have been available since the beginning of 2020 [12]. The first gives insight into the current state of budget revenue and expenditure and is refreshed daily. It shows whether the budget is being implemented as expected. The second visualization is interactive and, for all 24 budget policies, offers a more detailed view of spending in a separate window, in which expenditure is further broken down by programme and account. The expenditure of each policy is also shown for previous years (from 2009 onwards). The third visualization gives insight into individual projects, where interactivity allows projects to be searched by various criteria, including the region and municipality in which they are carried out.

Figure 2: Infographic of the balance of expenditure for the specific part of the budget for 2020 (source: Ministry of Finance [9])

3 INTERACTIVE VISUALIZATION WITH THE SANKEY DIAGRAM
To complement the existing graphical displays, we propose visualizing the budget with two Sankey diagrams, one for the general and one for the specific part of the budget.

3.1 Sankey diagram
The Sankey diagram (also known as the alluvial diagram) shows categories and the quantitative relationships between them [4]. Categories are visualized as rectangles (coloured grey in Figure 1) and the relationships between them as flows (in various colours in Figure 1). The width of a flow is proportional to the quantity connecting the two categories.

Although the Sankey diagram is named after Matthew Sankey's 1898 diagrams of steam engine energy efficiency [3], it was in use earlier. One of the best-known Sankey diagrams is Napoleon's Russian campaign, created by Charles Minard in 1869 [7].

The Sankey diagram seems tailor-made for visualizing budget data, as a single image can show many diverse revenues and expenditures and any difference between their sums. In Figure 1, the balances are marked in dark grey and placed in the middle of the display. The categories on the left show the revenue flowing into the budget, and the categories on the right its expenditure. The Sankey diagram also shows well how a category breaks down into subcategories and what the proportions between them are. In the figure this can be seen for the hierarchy balance, account group, account subgroup, account (on both the revenue and the expenditure side).

Due to limited space, the Sankey diagram for the specific part of the budget is not shown in full in this paper (some of its parts appear below).

3.2 Use of interaction
The expressive power of the Sankey diagram can be increased with interaction. The proposed tool supports the following interactions:

• Showing more data. Because the amounts in the state budget differ considerably, some categories and flows can be very thick and others barely visible. In addition, the number of categories and flows at some levels is considerable. This means that we cannot print the names of all categories and limit ourselves to the largest ones. Interaction can be used so that category names (including the smallest ones) are printed in full only when the mouse hovers over them (see Figure 3, top). Interaction is used similarly when the mouse moves over flows, where more information about the flow is shown (its source, target, and amount; Figure 3, bottom).
• Changing the view. Clicking a category changes the view by zooming in on the selected category and all its subordinate categories (and, in the specific part of the budget, also its first superordinate categories). In this way we can show categories and flows that are too small or too crowded to be seen well in the original view. An example of such a change of view on the diagram of the specific part of the budget is illustrated in Figure 4. Here the category Education and sport (Izobraževanje in šport) is enlarged across the whole screen, which lets us see its subcategories and their proportions in detail. At the same time we also see which ministries are responsible for this policy. In such a view we can continue inspecting other categories (by clicking them) or return to the original view by clicking any flow.
• Selecting data. A tab (not visible in the figures) lets us select the budget year of interest. Data is currently available for the 2019, 2020, and 2021 budgets. When the year changes, two new Sankey diagrams are drawn (for the general and the specific part of the budget) containing the data for the selected year.

Figure 3: Additional data shown on interaction with a category (top) and a flow (bottom) of the specific part of the budget for 2020

Figure 4: Detailed view of the category Education and sport (Izobraževanje in šport) of the specific part of the budget for 2020

3.3 Building the visualization
3.3.1 Data preparation. As already mentioned, all data used in this tool was obtained from the OPSI portal [11]. The data is well prepared, and we had no difficulty handling it. Before use, we processed the data further. First, we removed all links between categories with amounts below 1000 EUR. The aim was to leave out data that is relatively small and, in the context of the state budget, practically insignificant. This also reduced the size of the dataset and slightly improved the responsiveness of the tool, which decreases with a large number of categories and flows.

We also computed all totals by category. We then prepared a user-friendly representation of amounts that rounds the numbers and uses abbreviations for million and billion. Finally, we transformed the data into the form required by the library for drawing Sankey diagrams (more on this below). The prepared data was stored for later use (the described processing is performed only once; the tool then works on the already processed data).

3.3.2 Technical implementation. To implement the Sankey diagrams we used the Python library Plotly [6], which offers numerous interactive graphical displays and makes working with them considerably simpler. Plotly requires data on the categories and the flows between them and automatically builds a Sankey diagram from it.

The web application was built with the Dash framework [1] and published on the Heroku platform [2]. Version 0.3 is currently available at http://proracun.herokuapp.com.

3.3.3 Design decisions. When designing the diagrams, we had to make several decisions that affected their use and appearance. First, we decided on the functionality of the interactions (see Section 3.2). When the view changes in the specific part of the budget, superordinate categories are also shown, because this provides more context, which is less important in the general part of the budget.

A category name is shown if the category's amount is at least 5 % of the sum of all categories in the same column. Similarly, we show only the first 30 characters of a name and the full name only on interaction. Both limits (5 % and 30 characters) were determined empirically.

All categories are coloured the same (light grey), except the balances, which are darker so that they stand out. The flows have different colours, chosen so that categories with the same names are always coloured the same. This eases understanding and comparison between different budget years. The labels at the bottom of the display, which explain the classification, add context that helps with orientation while changing views.

3.4 Discussion
After initial testing, which does not yet include a proper user study, we can say that the Sankey diagram is a good way to explore the budget. One of the main insights from using the tool was that servicing the public debt is a larger item than expected (because it appears in the financing account as well as in the balance of expenditure, it does not stand out as much in the other visualizations).

Interaction enables "walking" through the diagram at various levels of detail and awakens in the user a desire for additional information that is currently not included in the visualization. This information is available only in the budget explanations in PDF format, which makes adding it to the application difficult.

Probably the greatest advantage of such a display is the comparison between individual categories and flows, which is considerably more intuitive than in existing budget visualizations. A weakness is the responsiveness, which we would like to be better. Unfortunately, this property cannot be predicted well enough and only becomes apparent in the final stages of implementing such an application.

Using the Plotly library greatly eased the work and reduced the time needed to develop such an application. However, this ease of use results in (too) little control over the final appearance, which we would like to adjust further but cannot. The occasional overlapping of category names (see bottom right in Figure 1), which is hard to avoid in interactive visualizations, is also distracting.

4 CONCLUSIONS
We presented a new visualization of the budget of the Republic of Slovenia with the Sankey diagram, which supports interactivity and thus enables in-depth exploration of budget categories and cash flows. In this way we visualize both the general and the specific part of the budget.

In the future, we would like to try visualizing the differences between two budgets in the same way. This would allow us to compare the budgets of two different years, or the original budget with its revision.

ACKNOWLEDGEMENT
This work was carried out within research programme No. P2-0209, co-financed by the Slovenian Research Agency (Javna agencija za raziskovalno dejavnost Republike Slovenije) from the state budget.

REFERENCES
[1] Dash. 2020. Dash user guide. Accessed 1 September 2020. https://dash.plotly.com.
[2] Heroku. 2020. Heroku. Accessed 1 September 2020. https://www.heroku.com/home.
[3] Alex B. W. Kennedy and H. Riall Sankey. 1898. The thermal efficiency of steam engines. Minutes of the Proceedings of the Institution of Civil Engineers, 134, 278–312.
[4] Andy Kirk. 2016. Data Visualization: A Handbook for Data Driven Design. SAGE.
[5] Tamara Munzner. 2015. Visualization Analysis and Design. AK Peters Visualization Series. CRC Press.
[6] Plotly. 2020. Plotly Python open source graphing library. Accessed 1 September 2020. https://plotly.com/python/.
[7] Edward R. Tufte. 2001. The Visual Display of Quantitative Information. Graphics Press.
[8] Ministrstvo za finance Republike Slovenije. 2020. Državni proračun. Accessed 1 September 2020. https://www.gov.si/podrocja/finance-in-davki/drzavni-proracun/.
[9] Ministrstvo za finance Republike Slovenije. 2020. Državni proračun 2020, Infografika. Accessed 1 September 2020. https://www.gov.si/assets/ministrstva/MF/Proracun-direktorat/Drzavni-proracun/Sprejeti-proracun/Sprejeti-2020/Infografika_PRORACUN_2020.pdf.
[10] Ministrstvo za finance Republike Slovenije. 2020. Fiskalna in javnofinančna politika. Accessed 1 September 2020. https://www.gov.si/teme/fiskalna-in-javnofinancna-politika/.
[11] Ministrstvo za finance Republike Slovenije. 2020. Proračun Republike Slovenije. Accessed 1 September 2020. https://podatki.gov.si/dataset/proracun-republike-slovenije.
[12] Ministrstvo za finance Republike Slovenije. 2020. Proračun Republike Slovenije, Aplikacija APPrA. Accessed 1 September 2020. https://proracun.gov.si/.
[13] Ministrstvo za finance Republike Slovenije. 2020. Vlada potrdila predlog rebalansa letošnjega državnega proračuna. Accessed 1 September 2020. https://www.gov.si/novice/2020-08-30-vlada-potrdila-predlog-rebalansa-letosnjega-drzavnega-proracuna/.

MightyFields Voice: Voice-based Mobile Application Interaction

Jernej Zupančič
Jožef Stefan Institute, Ljubljana, Slovenia
Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
jernej.zupancic@ijs.si

Miha Štravs
Faculty of Mathematics and Physics, Ljubljana, Slovenia
Faculty of Computer and Information Science, Ljubljana, Slovenia
miha.stravs996@gmail.com

Miha Mlakar
Jožef Stefan Institute
Jamova cesta 39, Ljubljana, Slovenia
miha.mlakar@ijs.si

ABSTRACT
We present MightyFields Voice (MFVoice), a service and an extension of the MightyFields application that enables voice interaction with a mobile application. The user can issue voice commands for transitioning between application views and filling out forms. The Google speech-to-text engine is used to obtain text, which is then fed into the developed MFVoice service together with the structured application view representation. The MFVoice service then returns the appropriate action to take, which is executed by the MightyFields application extension. The MFVoice natural language understanding service was tested in real-life use cases, achieving 93% intent recognition accuracy and 88% entity recognition success when the system was used as intended. When no training was provided to the user, intent and entity recognition achieved 68% and 52% accuracy, respectively. Note that when no training was provided, the users assumed general knowledge of the language semantics, which is out of scope for the current state-of-the-art research in natural language.

KEYWORDS
voice assistant, voice interaction, natural language understanding

Information society '20, October 5–9, 2020, Ljubljana, Slovenia
© 2020 Copyright held by the owner/author(s).

1 INTRODUCTION
Interaction with devices by voice has become quite common in recent times. Well-known examples of applications allowing voice commands are voice assistants like Cortana [4] and Siri [1]. Voice interaction is attractive to users as it offers hands-free application interaction and is therefore a desired feature in many applications. This feature is useful for people with spelling difficulties. It can also help those with physical disabilities who often find typing difficult. The proposed service is not used for two-way conversation, as in platforms such as the one from Rasa [3]. However, the part of the service used to recognize the user's command is very similar to the ones in other virtual assistants. The modifications applied take into account the specifics of the task at hand.

In this paper we focus on the task of filling out custom forms through voice interaction. Here, a custom form is a small information gathering application made for a specific purpose, e.g., an electric grid inspection form or a police report regarding an incident. Since the domain is open ended, i.e., each individual can make his or her own custom forms, the voice understanding feature cannot be specialized and has to work satisfactorily in a general setting. Three steps are performed to enable voice interaction. First, speech is transformed into text by using the Google speech-to-text (STT) engine [2]. Second, the approach from [5] is utilized to extract intent keywords. The full intended command is then inferred based on what the user is currently seeing on the screen and from the rest of the spoken words. Third, the recognized action is performed within the application itself.

In Section 2, the architecture of our service is presented. In Section 3, we present our MFVoice natural language understanding (NLU) service and show its implementation. We then explain the tests conducted on the service and their results in Section 4 and discuss them in Section 5. We conclude the paper with a summary in Section 6.

2 MFVOICE ARCHITECTURE
MFVoice comprises several parts (Figure 1) that enable voice interaction:
(1) MF application itself: this is the main MightyFields application.
(2) MF agent: the program that enables programmatic access to the application view (reading and interacting).
(3) STT: a service that transforms spoken commands into text.
(4) MFVoice NLU service: the service that parses free text and returns structured information about the recognized intent and entities.

Figure 1: The MFVoice architecture overview

3 THE MFVOICE NLU SERVICE
The MFVoice NLU comprises the following steps (Figure 2):
(1) Application view context processing
(2) Intent recognition
(3) Entity recognition
When the application context and the transcription of the voice command are provided to the NLU application programming interface (API), the service first identifies possible actions to take, given the context, then it processes the context content, which in turn enables recognition of the intent and, finally, the entities. The so-obtained structured action data is then forwarded back to the MF agent, which can execute the appropriate actions. In this section we describe each of the MFVoice NLU parts in more detail.

Figure 2: The MFVoice service processing pipeline (inputs: voice command (text) and application context; steps: preprocess context, recognize intents, recognize entities; output: structured action data)

3.1 Application View Context Processing
The application view context provides structured data on the elements that are visible on the screen. This includes field labels, field IDs, possible values of fields (where applicable), interaction options, and available tabs for multi-page forms. Upon the API call, the application context is pre-processed. The text visible to the user is normalized and transformed into a search-friendly form that is used in intent and entity recognition.

The transformation is cached to speed up its use in subsequent steps. A least recently used cache is used, since the user keeps interacting with the same application view over a short time period until the form is filled out.

3.2 Intent Recognition
Intent defines what the user wants the application to do. In the case of form filling, the following intents were identified:
(1) "Choose" an element from a list of elements. This is often used to pick an element from dropdown or checklist elements.
(2) "Write" some text into a field. This is used for input or text-area elements.
(3) "Clear" the value of a field to delete a wrong value entry for any kind of element.
(4) "Tap" the field or graphical element. This is used to interact with buttons and navigate the application.

Due to non-existent training data, keyword-based intent recognition was utilized. For each intent a set of intent "key-phrases" was defined:
(1) "Choose": choose, pick, select, is
(2) "Write": is, write, input
(3) "Clear": clear, delete, remove
(4) "Tap": choose, go to, tab, click, pick

In the intent inference step, the score for each of the "key-phrases" is computed. If the highest score exceeds the predefined threshold, the corresponding intent is chosen [5]. To simplify the NLU pipeline, only one intent and one field per utterance are allowed. This is especially problematic in voice assistants, since users naturally communicate differently when talking than when writing. To resolve the ambiguity of one "key-phrase" being associated with multiple intents, the following order of intents is taken into account: "Choose", "Write", "Clear", "Tap". Further, since not every intent is possible with every field, the intent list is first filtered and only the intents that make sense in the current context are kept and iterated over. For instance, if the context comprises only input fields and navigation tabs, "Choose" does not make sense.

The "key-phrases" used for inferring the intent are tagged with the intent tag (IT), which is later used in the entity recognition step.

3.3 Entity Recognition
There are two types of entities present in our use cases: "label entities" (e.g., name in "The name is John.") and "value entities" (e.g., John in "The name is John.").

Label entities are the labels of the fields in the user-generated form. Since they are user generated, their value is not restricted. Their semantic meaning is sometimes harder to grasp automatically, since not much additional information (for instance a description) is usually provided. In general, the recognition of label entities cannot be learning-based, since the users are not expected to provide examples.

Value entities can further be divided into known value entities (button labels, the items in drop-down lists or checkbox lists, and similar) and unknown value entities (text input fields). The reasoning for known value entities is the same as for the label entities: they are user generated and cannot be learned in general. For the unknown value entities, the value corresponding to a certain label should be recognized from the free text generated by the STT service. The unknown value entities can comprise one or several words.

A score related to the probability of the entity appearing in the text is used to recognize the label entities and known value entities, while a heuristic is used to recognize the unknown value entities.

3.3.1 Label and Known Value Entities Recognition. These entities are the ones set by the user in the form creation phase. To recognize the entities from free text, text similarity scores are applied and evaluated for each possible label or known value entity [5]. Only the entities with scores that exceed the threshold are recognized and can be used in the next pipeline steps.

In some instances, the MFVoice NLU has to modify the text received from the STT engine. Common examples of this are:
(1) Letter-by-letter dictation transformation, e.g., in the transcribed command "the form ID is 1 2 3 4 5 a" the empty spaces in the form ID have to be deleted so that "12345a" is obtained.
(2) Zero-padded numbers transformation, e.g., for the transcribed command "the house number is 23", sometimes the drop-down values include only the known value entity "023". Therefore, the preceding zeros have to be dropped when computing the text similarity.
(3) Numbers in text transformation, e.g., in the transcribed command "pick option three", the number "3" is transcribed as "three". In those cases, the textual representation has to be transformed into a number.

The words that correspond to the highly scored entities are tagged with the label entity (LE) or known value entity (KVE) tag, to be used later in the unknown value entity recognition.

Some examples of label entity and known value entity recognition are:
(1)
    form number   123456
    LE            KVE
(2)
    female
    KVE

Note that when the user speaks the command "female", the MFVoice NLU service recognizes the known value entity that belongs to the field with the label "sex". Additionally, even without the intent keyword being specified, the application logic infers that the user wants to pick the option "female" in the field "sex".

(2) If an IT tag is present in the text, then begin tagging words to the left or to the right of the LE-tagged word with the OTHR tag. Stop if the text end or an IT-tagged word is reached. Check if there is any remaining word:
(a) If there are remaining words, tag those with the UVE tag.

    his    name   really   is   John Doe
    OTHR   LE     OTHR     IT   UVE

(b) If there are no remaining words, re-tag all the words to the right of the LE tag with UVE tags.

    insert   the    name   John Doe
    IT       OTHR   LE     UVE

The previous steps capture the majority of the unknown value entity recognition cases. However, there are still commands that would not be understood by the MFVoice NLU service:
(1) John Doe the name is, tagged OTHR LE IT ≠ John Doe the name is, tagged UVE OTHR LE IT
(2) John Doe really is his name, tagged UVE IT OTHR LE ≠ John Doe really is his name, tagged UVE OTHR IT OTHR LE
4 TESTING Since the text similarity metrics are used for scoring the labeled and known value entities, in some cases the entities are not The MFVoice NLU service was tested in two ways: laboratory recognized correctly: testing and real-world testing. For laboratory testing, the text was entered into the service directly, bypassing the STT ser- (1) Multi-word synonyms are not recognized, e.g. “city” ∼ vice. This way, the STT performance issues were ignored and “place of living” only the recognition capability of the MFVoice NLU service was clear the place of living tested. The examples, however, were still obtained from the final MFVoice users. The test user was presented with an application IT OTHR screen and told to fill the form using only his or hers voice. While the systems supports synonyms, they have to be For the real-world testing, the users were given written in- manually entered by the form creator and are therefore structions on how to use the app, however, no instruction on less practical. how to actually voice commands were given. First, the form was (2) Multiple occurrences of the same or very similar label filled out using screen and keyboard interactions. Second, the entities or known value entities cannot be properly dis- field that a user wants for fill with a voice command was marked. ambiguated. Consider, for example, a form that comprises Third, voice interaction was activated and the command was house number field with possible value “4”, and household spoken. Fourth, the transcribed voice command, the context, and size field also with possible value “4”. User usually fills the the marked item were stored for future analysis. We did not pro- form in a linear way, top to bottom. When the user encoun- vide any examples on how to use MFVoice. This allowed us to ters the first of the mentioned field, he or she may voice research what the users actually expect from the system. command “four”. 
In this case, NLU service will provide The forms used in testing included six free-text input field wid- two possible actions: “house number is 4” and “household gets (name, surname, age, settlement, street, house number), one size is 4”. radio widget with two options (gender: male, female), one check- box field with five options (language: Slovene, Slovak, Spanish, 3.3.2 Unknown Value Entity Recognition. The unknown value entity recognition is computed only when the intent “Write” is Swedish, Sumerian), and four dropdown fields (country, settle- considered, since this is the only type of the“open” form field ment, street, and house number). that allows for unknown values. The following heuristic is used to tag the unknown value entity (UVE): 4.1 Laboratory Testing Set-up and Results (1) If IT tag is not present in the text, every word not tagged We have gathered 70 and 69 commands for application interaction with LE is tagged with UVE. in Slovenian and English languages, respectively. Laboratory testing is performed upon each git push to the code repository and age 31 is run within the continuous integration pipeline. This enables LE UVE us to track the performance of the MFVoice NLU pipeline. 15 Information society ’20, October 5–9, 2020, Ljubljana, Slovenia Novak, et al. Table 1: Intent confusion matrix for commands in Slove- 5 DISCUSSION nian According to the results, the intent recognition process performs very good, despite the fact that it is only based on keyword write choose clear tap missing recognition and the context processing. We do not think that any write 22 0 0 0 2 additional work would benefit the performance in this regard, choose 0 21 0 0 0 with the exception of adding additional intent keywords, which clear 0 0 2 0 0 will be obtained during the application usage. tap 0 0 0 20 3 After the user familiarizes with the way the MFVoice appli- cation works, also the named entity recognition performs well. 
Most of the errors were actually a result of a user expecting the system to be too advanced. All 43 incorrectly recognized entities 4.1.1 Intent Recognition. After each continuous integration were the result of MFVoice not being able to reason that, for pipeline run, the intent confusion matrix is computed. Table 1 is instance, “John” is a person name. While this could be done for an example of the intent confusion matrix for voice commands certain special cases, e.g. person names and geographic names, in Slovenian for the last version of MFVoice. According to the at the moment this cannot be solved in general. This is a result matrix, the accuracy of intent recognition is above 90%. The only of letting the users to create their own forms, which are often errors were the ones, where the system was not able to deter- very domain specific. In the future we will perform the testing mine the item to be interacted with, which was labeled with the of the system after users are given some basic training on how “missing” classification label. to use MFVoice. This should greatly improve the percentage of 4.1.2 Entity Recognition. For each command also the field labels properly labeled instances and also help us uncover additional and values recognized by the NLU service and the ground truth edge cases to be addressed by the entity recognition pipeline. labels and values are compared. Examples where the NLU fails The MFVoice NLU was designed in a way to easily support to recognize the label or value correctly are: multiple languages. In the current form, to support a new lan- guage, the translations of the intent keywords and language (1) “Age 26 years.” Expected value: “26”, got “26 years” word vectors have to added. For certain languages the module (2) “She is 26 years old.” Expected label: “age”, got nothing. for unknown value entity has to be adjusted, since the sentence (3) “She lives in Ljubljana.” Expected label: “Place”, got noth- syntax can be different. 
This enabled us to quickly add support ing. for English, after Slovenian voice interaction performed well. 4.2 Real-life Testing Set-up and Results 6 CONCLUSION We have gathered 172 spoken voice commands in the real-life In this paper we presented our service that is used for filling forms setting in Slovenian. Unfortunately, there were only 86 commands with voice commands in a mobile application. While some oper- that were labeled correctly by the test users and STT performed ating system do include voice interaction, e.g., Cortana [4] and well there. STT issues occurred in 42 out of 172 cases (24%). These Siri [1]), their use in a dedicated application is limited. MFVoice could either be result of too much background noise, command enables more advanced voice interaction. MFVoice application not being recorded properly, or just the problem with the STT first gets the text which was converted from speech by using service used for the Slovenian language. Incorrect user labeling the Google STT engine [2]. Then, the MFVoice NLU service occurred in 42 out of 172 cases (24%). The most common mistakes uses keyword recognition and context preprocessing to infer the in those cases were: the user forgot to set the ground truth either command the user intended. Because of the simplicity of the by entering the value or choosing the item, the user obviously implementation, the service is less accurate when commands picked the wrong item (e.g., for command “the name is John” an are voiced in the form of long and complex sentence. However, item with label age was selected). this simplicity does make the service more robust and accurate Out of 86 valid commands, 45 were recognized correctly and 41 with commands voiced in concise form. We believe that users incorrectly. 
For 23 cases the label value was completely missing should have a comfortable user experience, after they get used and could not be inferred from the surrounding text (e.g., “John”, to forming commands in a more concise manner. “45”, “Ljubljana”). For 18 cases the label value could be inferred from the surrounding text (e.g., “he is 23 years old”, “she lives in ACKNOWLEDGMENTS Ljubljana”). In some cases (12) this would require some general Comland d.o.o. funded the research presented in this paper. reasoning about the words and their relations and in other the unknown value entity included additional text, e.g. “his name REFERENCES miki”, was not recognized because of minor STT-engine mistakes (1), or the known value entity score was not high enough to be [1] Apple. 2020. Siri. https://www.apple.com/siri/. (2020). included (5). This results in 88% accuracy for entity recognition [2] Google. 2020. Speech-to-text: automatic speech recognition. when the system was used as intended, 72% when the synonyms https://cloud.google.com/speech-to-text/. (2020). were assumed, and 52% when general knowledge of the language [3] Rasa Technologies Inc. 2020. Rasa. https://rasa.com/. (2020). semantics was assumed. [4] Microsoft. 2020. Cortana - your personal productivity as- Note that the testing was performed without some planned sistant. https://www.microsoft.com/en-us/cortana/. (2020). features implemented. The Zero padded numbers transformation [5] Miha Štravs and Jernej Zupančič. 2019. Named entity recog- and Numbers in text transformation steps were missing. The accu- nition using gazetteer of hierarchical entities. In Interna- racy percentages should improve to 94%, 76%, and 56% for uses as tional Conference on Industrial, Engineering and Other Ap- intended, assuming synonyms, and assuming general knowledge plications of Applied Intelligent Systems. Springer, 768–776. of language semantics, respectively. 
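As an illustration of Sections 3.2 and 3.3.2, the following is a minimal Python sketch of keyword-based intent recognition (with context filtering and the fixed intent order) and of the UVE tagging heuristic. It is not the MFVoice implementation. The function names, the threshold value of 0.5, and the exact-match scoring are assumptions made for this example; the paper scores key-phrases with the text similarity method of [5]. The sketch also assumes that each word already carries an "LE" or "IT" tag or no tag (None), and that the command contains a single label entity.

```python
# Key-phrases and intent priority order as listed in Section 3.2.
KEY_PHRASES = {
    "Choose": ["choose", "pick", "select", "is"],
    "Write": ["is", "write", "input"],
    "Clear": ["clear", "delete", "remove"],
    "Tap": ["choose", "go to", "tab", "click", "pick"],
}
INTENT_ORDER = ["Choose", "Write", "Clear", "Tap"]


def phrase_score(words, phrase):
    """Hypothetical score: 1.0 if the phrase occurs as a word sequence, else 0.0.

    Stand-in for the text similarity scoring of [5], used only to keep the
    example self-contained.
    """
    target = phrase.split()
    n = len(target)
    hits = (words[i:i + n] == target for i in range(len(words) - n + 1))
    return 1.0 if any(hits) else 0.0


def recognize_intent(text, allowed_intents, threshold=0.5):
    """Return (intent, key_phrase), or (None, None) if no score exceeds threshold."""
    words = text.lower().split()
    best_intent, best_phrase, best_score = None, None, threshold
    for intent in INTENT_ORDER:              # fixed order resolves shared key-phrases
        if intent not in allowed_intents:    # keep only intents valid in this context
            continue
        for phrase in KEY_PHRASES[intent]:
            score = phrase_score(words, phrase)
            if score > best_score:           # strict '>' keeps the earlier intent on ties
                best_intent, best_phrase, best_score = intent, phrase, score
    return best_intent, best_phrase


def tag_unknown_values(tags):
    """Apply the UVE heuristic to per-word tags ("LE", "IT" or None)."""
    tags = list(tags)
    if "IT" not in tags:
        # Step (1): without an intent tag, every non-LE word is a UVE.
        return [t if t == "LE" else "UVE" for t in tags]
    le = [i for i, t in enumerate(tags) if t == "LE"]
    if not le:
        return tags                          # no label entity to anchor on
    # Step (2): spread OTHR left and right of the LE span until IT or text end.
    i = le[0] - 1
    while i >= 0 and tags[i] != "IT":
        tags[i] = "OTHR"
        i -= 1
    i = le[-1] + 1
    while i < len(tags) and tags[i] != "IT":
        tags[i] = "OTHR"
        i += 1
    remaining = [i for i, t in enumerate(tags) if t is None]
    if remaining:
        # Step (2a): all words that are still untagged become UVE.
        for i in remaining:
            tags[i] = "UVE"
    else:
        # Step (2b): nothing remains, so OTHR words right of LE become UVE.
        for i in range(le[-1] + 1, len(tags)):
            if tags[i] == "OTHR":
                tags[i] = "UVE"
    return tags


if __name__ == "__main__":
    # Context with input fields and tabs only, so "Choose" is filtered out.
    print(recognize_intent("the name is John", {"Write", "Clear", "Tap"}))
    # "insert the name John Doe" with insert=IT and name=LE.
    print(tag_unknown_values(["IT", None, "LE", None, None]))
```

Under these assumptions the sketch reproduces the tagging examples of Section 3.3.2, including the two commands that the heuristic does not handle as the user would expect.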
eBralec 4: A Hybrid Slovenian Speech Synthesizer

Jerneja Žganec Gros
Alpineon d.o.o.
Ulica Iga Grudna 15
1000 Ljubljana, Slovenia
jerneja.gros@alpineon.si

Miro Romih
Amebis d.o.o.
Bakovnik 3
1241 Kamnik, Slovenia
miro.romih@amebis.si

Tomaž Šef
Institut “Jožef Stefan”
Jamova cesta 39
1000 Ljubljana, Slovenia
tomaz.sef@ijs.si

ABSTRACT

This paper presents eBralec4 (https://ebralec.si/), a new Slovenian speech synthesizer. A completely new female voice, »Nadja eBralec«, was developed; it is more intelligible and sounds more natural than the previous female voice. We describe the structure of the speech synthesizer, its modules, the language resources used in its development, and the construction of the speech corpus for the new female voice.

KEYWORDS

Slovenian speech synthesis, speech corpus, Slovenian speech synthesis procedure

Information Society 2020, 5–9 October 2020, Ljubljana, Slovenia
© 2020 Copyright held by the owner/author(s).

1 Introduction

This paper presents eBralec4 (https://ebralec.si/), a new Slovenian speech synthesizer. Its development built on existing technology, the Slovenian speech synthesizer eBralec [1], which was developed within the project Knjižnica slepih in slabovidnih (Library for the Blind and Partially Sighted) and is primarily intended for blind and partially sighted users and for people with reading disorders.

Within the CityVOICE project we identified several opportunities to improve and upgrade eBralec, in terms of both naturalness and intelligibility. Together with a group of end users, we reviewed and investigated the shortcomings of the existing speech synthesizer and collected additional end-user requests for improvements, which resulted in the new product eBralec4.

The main shortcoming proved to be the original speech corpus on which the baseline synthesizer relies. It was not designed consistently enough (partly inadequate and variable recording conditions) or robustly enough in terms of covering the variety of the target vocabulary, which leads to poorer intelligibility of synthetic speech during spelling and when pronouncing special symbols.

We also identified variable quality of the synthesized speech, which stems from session variability during the recording of the source speech corpus. Because of an inadequate dynamic range during recording, the corpus is less suitable for use in mobile applications with a noisy acoustic background.

We therefore devoted a large part of the analysis to possible improvements in building a new speech corpus that enables better performance of the acoustic module. This holds especially for the female voice, which, due to its physical nature, is in any case harder to synthesize with high quality.

The new speech corpus for the »Nadja eBralec« voice was recorded as read speech. This matches the most common uses of speech synthesizers; the transcription is easier to produce, and the recording is more controlled and predictable. With spontaneous speech, it is difficult to balance a speech corpus phonetically and prosodically.

Based on an analysis of the baseline language module, we improved the semantic analysis of sentences and newly developed automatic determination of the sentence structure type, which also carries particular weight in the construction of the speech corpus.

Problems of eBralec in synthesizing short text segments and individual symbols were also highlighted. These are most disturbing in spelling, which was poorly intelligible in places. We solved the problem by introducing a hybrid approach to acoustic modelling of the speech signal: short segments are synthesized with highly intelligible diphone concatenation of speech segments, while longer segments are synthesized with natural-sounding parametric representations of the speech signal using hidden Markov models.

The existing eBralec voices, the male voice »Renato eBralec« and the female voice »Maja eBralec«, are joined in the new product eBralec4 by the new and noticeably more natural-sounding female voice »Nadja eBralec«.

In this paper we describe the structure of the speech synthesizer, its modules, the language resources used in its development, the construction of the new speech corpus for the female voice »Nadja eBralec«, and the hybrid acoustic model for generating the speech signal. We also describe improvements to the linguistic analysis of the input text.

2 Structure of the Synthesizer

The task of the core of the eBralec speech synthesizer, i.e., the connecting pipeline, is to link the constituent modules of the synthesizer into a single process. The core coordinates the work of the individual parts of the synthesizer by invoking the synthesizer modules in the appropriate order. To speed up processing and increase parallelization, the individual conversion modules can run concurrently in separate threads.

The design of the eBralec synthesizer core is shown in Figure 1. The modules included in the core are: the linguistic analyzer, the besednik module, the grapheme-to-phoneme conversion module, and the speech signal synthesis module [1]. At its input and output, the core can be connected to a suitable interface, e.g., SAPI 5, through which the input text, with optional additional commands, is converted into the corresponding speech signal.

The input text is first processed by the linguistic analyzer, which takes care of appropriate preprocessing of the input text and of the disambiguation of pronunciation variants. The result of the module is a record containing all the necessary information on the pronunciation of words with respect to their position and meaning in the input clause or sentence.

Depending on the input settings, the besednik module converts symbols and numbers into words. These elements are a very frequent part of texts, so their correct pronunciation is important for speech intelligibility. The »grapheme-to-phoneme conversion« module converts the text into a phonemic transcription.

The »speech signal synthesis« module is responsible for shaping the prosody and generating the output speech signal.

Figure 1: Diagram of the speech synthesizer core.

3 Linguistic Analysis of the Text

Linguistic analysis uses data from the Amebis language database Ases [2]. For Slovenian, it currently contains more than 257,000 lemmas comprising 8.1 million word forms, of which 5.7 million are additionally equipped with pronunciation data. In addition, the database contains 36,000 multi-word units and 8,000 verb templates for Slovenian. The verb templates provide information on verb valency.

The linguistic analyzer has to segment the text into sentences, clauses, and words, and then determine the appropriate lemma and morphosyntactic tag for each word. Ases distinguishes lemmas that are pronounced differently, e.g., »téma« and »temà« are two separate lemmas. The linguistic analyzer works on the basis of rules and data from the Ases language database, with the verb templates as the foundation.

Language processing can improve the performance of the speech synthesizer mainly through even better text analysis, which can be used both in the construction of the speech corpus and in the analysis of text during speech synthesis.

Based on the identified problems of the language module of the existing synthesizer, we paid considerable attention to possible improvements of the linguistic analyzer. Particularly important among them are improved semantic analysis of sentences and determination of the sentence structure type, which also carries particular weight in the construction of the speech corpus (see Section 4).

We also investigated the possibility of reducing the response time, i.e., the latency, of the linguistic analyzer by means of procedures for efficient computer representation of lexical language resources, which we are developing within the OptiLEX project. Here we address a range of problems, such as the requirement for real-time operation, the requirement for a compact representation of language resources, and the requirement for a small footprint of the language resources in working memory [4].

3.1 Automatic Determination of the Sentence Structure Type

When selecting optimal, phonetically and otherwise balanced text prompts for the speech corpus, we focused on improved sentence annotation, above all on annotating and determining the sentence structure type. In addition to the classical rule-based method, we also analyzed the possibility of determining the sentence structure type with various machine learning methods. The developed procedure was also used in the improved version of the linguistic analyzer.

For determining the sentence structure type, especially of multi-clause sentences, we defined suitable notations. The basic notation, adapted to the current notation of sentence analysis, is complex. The sentence is written in a special meta-language that contains all the information that can be extracted from the sentence by automatic clause analysis.

Such a notation allows a unified representation of multi-clause sentences that also includes all clause dependencies. Clauses in a sentence can be linked by coordination, juxtaposition, or subordination. In the case of subordination, information on the kind of dependency is also present, i.e., whether the clause is an attributive, subject, object, adverbial, or other subordinate clause.

In addition to this longer notation, we also defined a shortened, simplified notation that conveys the sentence structure type; we used it in selecting the final set of sentences.

Because the improved clause analyzer writes its analysis in the longer notation, we developed a converter from this notation to the simplified notation, which retains only the most important information about the type of sentence structure. An example of this notation for the sentence "Miha, ki je bil lačen, je pojedel malico." ("Miha, who was hungry, ate his snack.") is "[[-gp|[[-r-|]]]]". Using the converter, we added sentence structure type information to the input set of sentences; this information then served as one of the parameters in selecting and balancing the target number of sentences.

The analyzer that writes the sentence analysis, including the sentence structure type, in the longer notation was implemented with rules and with machine learning methods. We built into the analyzer the solution that gave the best results in the evaluation [5].

4 The CITYVOICE Speech Corpus

In the central part of the analysis we started mainly from the perceived shortcomings of the existing speech synthesizer, eBralec, that could be improved, and from the collected user requirements, and identified opportunities for developing an improved product. These certainly include problems caused by the suboptimal speech corpus, so we devoted a large part of the analysis to possible improvements in building a new speech corpus, which has an important effect on the performance of the acoustic module of the synthesizer.

The choice of the size of the speech corpus is a compromise between the desired number of sound variations, i.e., their coverage, on the one hand, and the time and costs of development on the other. We also took into account the time needed for later searching of the corpus and the space required for storing it. The most important remaining factors considered in designing the new speech corpus for speech synthesis are: the selection of the content of the recordings, the selection of speakers, the recording procedure, and the annotation of the recordings.

The selection of sentences for the speech corpus is based on a large number of criteria, including coverage of the basic speech units, balance of sentence lengths, sentence types, and sentence structure types, correct phonetic transcription, etc. [6]. Among these, the sentence structure type deserves special mention. It enables more accurate modelling of prosody, which is important for the naturalness of synthetic speech. One of the main problems of the old procedure was that we had no tool for automatic determination of the sentence structure type, which could considerably speed up and improve the selection of sentences.

In selecting optimal, phonetically and otherwise balanced content (sentences), we therefore focused largely on improved sentence annotation, above all on annotating and determining the sentence structure type. In addition to the classical rule-based method, we also analyzed determining it with various machine learning methods, as described in Section 3.1.

Unlike automatic determination of the sentence type (declarative, interrogative, imperative), which in most cases depends on the final punctuation mark, determining the sentence structure type is considerably more complicated. If the sentence consists of a single clause, there are no difficulties. If the sentence has multiple clauses, however, a demanding analysis of the sentence and all its clauses is needed to determine their dependencies. The clauses can be linked by coordination, juxtaposition, or subordination.

The speech material was recorded in the presence of an experienced recording operator in order to prevent inappropriate pronunciation of the text prompts and recording errors (see Figure 2).

Figure 2: The speaker recording, i.e., reading the prepared material, in the sound studio. In the background is the sound engineer, who during recording monitors both the laryngograph signal Lx and the microphone signal Sp on the screen.

Before the recording sessions, the speaker was given appropriate instructions and asked to read the sentences clearly and at an even pace. While reading the text, the speakers wore laryngograph electrodes, with which we monitored the vibration of their vocal folds in order to ease the later marking of the pitch periods of the speech signal (see Figures 2 and 3). We used three levels of annotation, i.e., transcriptions of the spoken text: a graphemic transcription, a phonetic transcription, and prosodic labels (Figure 3).

Figure 3: Example of a speech signal with marked pitch periods. At the top is the speech signal recorded with the microphone, followed by the laryngograph signal Lx and spectral displays of both signals. The vertical lines represent the pitch marks of the speech signal.

5 Acoustic Modelling of Speech with a Hybrid Procedure

The end-user specifications dictated fast responsiveness of the speech synthesizer and a compact memory footprint for installing and running it.

This again dictated an implementation of the final speech synthesizer that, as with eBralec, is based on a parametric representation of the regularities of speech in the Slovenian language [1] using hidden Markov models (HMM; PMM in Slovenian) [7,8]. The synthesizer learns these regularities automatically from an extensive training speech corpus that was recorded specifically for this purpose and that includes the relevant acoustic and prosodic phenomena characteristic of spoken Slovenian.

Compared with more classical speech generation procedures, in which speech is produced by "gluing" together shorter or longer speech fragments, HMM-based speech synthesis has several attractive advantages. A relatively small speech corpus suffices for satisfactory speech quality (an hour or more of recorded speech is enough). It further enables unified, high-quality, and simultaneous modelling of the acoustic and prosodic characteristics of speech. It also allows a compact representation of the acoustic and prosodic model of speech, since the entire source speech corpus does not have to be stored in order to generate speech.

On the other hand, HMM systems also have some weaknesses. The speech can at times be somewhat less intelligible. In places it can have a characteristic "robotic" timbre, which is a consequence of the parametrization of the speech signal.

A detailed analysis of the user experience of blind and partially sighted eBralec users showed that the synthesis of shorter word units is especially poorly intelligible, for instance spelling, which this group of end users uses very often. To use a computer successfully, blind and partially sighted users rely on screen readers, programs that use a speech synthesizer to tell the user what is displayed on the screen.

To get an exact picture of the screen, they very often use reading in spelling mode, which reads words letter by letter, i.e., character by character. This requires a high reading (pronunciation) speed, for which the HMM method is not well suited, because it is not responsive enough and results in less intelligible short isolated segments.

Because HMM-based synthesis is less suitable for successful synthesis of short speech segments, we decided to develop a unique hybrid acoustic model. It enables high-quality synthesis of short speech segments by means of a diphone speech synthesizer that concatenates basic speech segments with the TD-PSOLA method [9,10], while longer speech segments are generated with the approach based on hidden Markov models.

6 Conclusion

In this paper we presented the design and implementation of eBralec4, a new high-quality speech synthesizer for the Slovenian language. For automatic speech generation we used an optimized procedure for acquiring spoken language resources in combination with an advanced parametric representation of speech, modelling speech with hidden Markov models, and diphone concatenative speech synthesis for shorter speech segments, especially in spelling.

We built a speech corpus for the new female voice, »Nadja eBralec«. In building the corpus we paid special attention to determining the optimal recording conditions and to determining optimal, phonetically and otherwise balanced text content, where we added variety of sentences according to the newly developed procedure for automatic determination of the sentence structure type.

Acknowledgment

The research and development work was partly funded within the CityVOICE project by the Republic of Slovenia and the European Union from the European Regional Development Fund, within the »Operational Programme for the Implementation of the European Cohesion Policy in the Period 2014-2020«. Research on the efficient representation of language resources was partly co-funded by the Slovenian Research Agency (Javna agencija za raziskovalno dejavnost Republike Slovenije) within the applied research project OptiLEX (L7-9406).

REFERENCES

[1] Jerneja Žganec Gros, Boštjan Vesnicer, Simon Rozman, Peter Holozan, Tomaž Šef. 2016. Sintetizator govora za slovenščino eBralec. Konferenca Jezikovne tehnologije in digitalna humanistika, Ljubljana.
[2] Špela Arhar and Peter Holozan. 2009. ASES – leksikalna podatkovna zbirka za razvoj slovenskih jezikovnih tehnologij. In Mikolič (ed.), Jezikovni korpusi v medkulturni komunikaciji. Koper: Založba Annales.
[3] Peter Holozan. 2004. Uporaba glagolskih predlog pri strojnem prevajanju. In: Zbornik konference JEZIKOVNE TEHNOLOGIJE 2004, edited by T. Erjavec and J. Žganec Gros, p. 128. Ljubljana.
[4] Žiga Golob, Jerneja Žganec Gros, Mario Žganec, Boštjan Vesnicer. 2011. FST-based pronunciation lexicon compression for speech engines. International Journal of Advanced Robotic Systems, vol. 9.
[5] P. Holozan, M. Romih, S. Rozman. 2019. R2.2 – Zasnova govorne zbirke, project report, CityVoice – govorne tehnologije z naprednimi jezikovnimi viri.
[6] Tomaž Šef, Miro Romih, Jerneja Žganec Gros. 2019. Izdelava govorne zbirke za sintezo slovenskega govora. Informacijska družba IS 2019.
[7] T. Toda and K. Tokuda. 2007. A speech parameter generation algorithm considering global variance for HMM-based speech synthesis. IEICE Transactions Inf. Syst.
[8] H. Zen and H. Sak. 2015. Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis. In: Proceedings of the ICASSP, pp. 4470–4474.
[9] Jerneja Žganec Gros and Mario Žganec. 2008. An efficient unit-selection method for concatenative text-to-speech synthesis systems. CIT, vol. 16, no. 1, pp. 69–78.
[10] C. Hamon, E. Moulines, F. Charpentier. 1989. A Diphone System Based on Time-Domain Prosodic Modifications of Speech. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing ICASSP 89, S5.7, pp. 238–241.

Sound 2121: The Future of Music

Jordan Aiko Deja
jordan.deja@famnit.upr.si
University of Primorska, UP FAMNIT
Koper, Slovenia

Nuwan Attygale
nuwan.attygalle@upr.si
University of Primorska, UP FAMNIT
Koper, Slovenia

Klen Čopič Pucihar
klen.copic@famnit.upr.si
University of Primorska, UP FAMNIT
Koper, Slovenia
Fakulteta za Informacijske Študije
Novo mesto, Slovenia

Matjaž Kljun
matjaz.kljun@famnit.upr.si
University of Primorska, UP FAMNIT
Koper, Slovenia
Fakulteta za Informacijske Študije
Novo mesto, Slovenia

Figure 1: Concept: We see a future where we no longer need tangible interfaces. Rather, humans would let go of these interfaces to give way to a more seamless music interface.

ABSTRACT

Music has been an integral part of our society since prehistoric times. For the past five centuries, musical instruments have been perfected, and the industry is nowadays worth billions of dollars. With recent innovations in computer interfaces, music information retrieval and artificial intelligence, playing music is no longer the sole domain of humans. Thus we are faced with the questions: “What really is music? What is the future of music? How will we consume music a hundred years from now?” In this paper, we briefly present how music has been consumed throughout history and how we imagine it a century from now. We make a wild speculation about the future of music and its interface, while encouraging discussion regarding these visions.

KEYWORDS

music, interface, interaction design, sound, future

Information Society 2020, 5–9 October 2020, Ljubljana, Slovenia
© 2020 Copyright held by the owner/author(s).

1 INTRODUCTION

Music is considered to be culturally universal [2, 17] and present across all parts of the globe, reshaping the ways humans live, express themselves and convey emotions [9, 13]. Humans have been expressing themselves through music for a very long time. It is believed that music originated from naturally occurring sounds and rhythms that humans echoed by merging them in patterns, making repetitions while changing tonality using their voice, clapping their hands [11], and smacking stones, sticks and other objects around them [12]. For example, one such ambient sound is rain, which has a calming and relaxing effect even now, since early humans felt safe during the rain, when predators do not hunt [1]. Music has also helped humans in terms of survival, forging a sense of group identity and mutual trust [4].

consumption might look like. Lastly, we present questions and challenges that provoke discussions involving usability, security, intellectual property and many other relevant key topics on music.

2 RE-IMAGINING MUSIC AND THE MUSIC INTERFACE

Humans create and consume music for four different purposes: (1) dancing as a social exercise, (2) providing a com-
mon form of personal or community entertainment, (3) com- The voice box, which allowed humans to sing, first emerged municating ideas and emotions and (4) having and celebrat- about a million years [13] ago and they learned how to use ing rituals and other activities [13]. While these purposes it around 530,000 years ago [14]. The voice box is considered come in handy for a variety of music activities, this position the first music instrument. Besides voice and hands, the ear- paper is focusing on music listening only. This is present in lier instruments were the objects found in the environment (1) where listening is a shared experience, as well as in (2) such as sticks and stones. Some authors argue that since the and (4) where listening can be a shared and also a personal oldest instruments found are so sophisticated (such as over experience. 40 thousand year-old bone flutes [4, 8, 19]) there must have When looking at how music listening has evolved in his-been less sophisticated instruments used by humans before tory, we can envision human tribes gathered around the fire [4, 14]. Nevertheless, the instruments the humans made and where one or several members performed a music piece. This used have rapidly evolved together with the complexity of type of music listening has been present for a long period of music compositions in the last couple of centuries. In this time and even nowadays people gather in live concerts to lis- period a variety of string, brass, percussion and woodwind ten to music. With recordings, music has moved to people’s instruments have evolved from earlier less sophisticated ones homes and the group has been reduced to family members [3]. and friends and listening become more personal. The head- As newer technologies are introduced, other ways of cre- phones enabled users to experience music individually and ating, producing, interacting with and even sharing music the Walkman enabled us to do it on-the-go. 
Smartphones and [18] are also taking place. MIDI interfaces, electric guitars internet have expanded the instant availability of music but and synthesizers are just some of the devices made of cir- the consumption remained mainly personal. The advances cuits that imitate traditional music instruments, and can be in virtually reality (VR) and augmented reality (AR) have connected to the computer. Novel algorithms, music infor- made personal music listening an immersive experience with mation retrieval (MIR) [5, 6] and artificial intelligence (AI) lights and visualisations augmenting music. Looking at how techniques allow us to work with and create new music con- listening to music has moved closer and closer to us with tent. With the advent of social platforms, sharing music on in-ear headphones that we even try to insert as close as pos- a grand scale has become a norm. sible to our body, it is not far-fetched if our vision is that Throughout this evolution one of the main components listening to music will move inside our heads. of music is expressing and generating emotions. Changes in Music is not just about sounds as it is also about rhythmic vocal parameters occurring during speech as well as singing vibrations; for example, it has been noted that the part of the have been shown to effect our state of emotions [10]. It brain responsible for hearing, works perfectly in deaf people has been also confirmed that sadness, happiness and other as well [16]. In order to feel music, we do not need to hear emotions can be communicated to listeners by music com-it but rather receive the vibrations to the hearing region of posers. As such, music is considered as a popular and easily- the brain. Because of this, we envision a future where we do applicable means for triggering emotions [10] and is globally not need external devices (such as headphones or speakers) consumed by everyone. We listen to music in order to make to be able to hear music, enjoy concerts, etc. 
Rather we will us happy, sad, to reminisce or reflect on our emotions. be listening to music within our brain in a seamless way. This paper attempts to share the authors’ visions on how Currently, researchers are already experimenting with micro- humans will consume and interact with music in the fu- controllers plugged into the brain and we envision having ture. We present our position based on the trends in how similar devices plugged into the hearing part of our brain. music instruments and music consumption have evolved Sound signals will be delivered straight into our auditory throughout the history. These visions have also emerged cortex. People will no longer have to depend on their ears from shared ideas in our small crowd-sourcing study we con- to listen and hear things. As such, people could enjoy music ducted online. We present two scenarios of how future music even while spacewalking, diving, skiing or surfing. 22 Sound 2121: The Future of Music Information Society 2020, 5–9 October, 2020, Ljubljana, Slovenia believe that through this interface, humans will be able to lis- ten and consume music, various melodies and sounds while being able to perform their daily tasks at the same time. As humans are connected to the global highway of infor- mation, unobtrusive sensors no longer need to detect and distinguish the current affect that they feel. Because of their neurological connection to the world, their emotions are eas- ily read and “felt” by the objects around them. Similar to how empathetic spaces that are context and emotion aware, ob- jects nearby will be act as local producers of music to either amplify or address the emotions that humans are feeling. 3 DESIGN SCENARIOS To better explain how we re-imagined this music interface, we describe two design scenarios with our vision in different contexts. Figure 2: Concept: Humans do not need tablets, mobile de- vices, or digital walls. 
Ideas and concepts are conceived in and are played into our brains from the surrounding objects. At the same time, we envision a future where biological and artificial objects around us will be connected to the cloud [15] where they will have access to a superb computing power. These objects will be equipped with for example nano-chips that will allow them to be part of the global link of information and capable of moving it depending on the needs. This is partly also a vision of IoT [20], which we are expanding to music listening. These technologies will allow Figure 3: Concept: Listening to music while surfing in the humans and objects to telepathically communicate. wide ocean will no longer require waterproof music gear. In the future humans will be able to amplify their emotions Rather, natural elements that are interlinked together cre- by the music naturally produced by the objects surrounding ate vibrations that humans can hear. Humans can finally them (see Fig 2. Traditionally, there are two ways on how achieve a non-obtrusive way of listening music while enjoy-music becomes a gateway for our emotions. If we feel sad, ing their wet hobbies. we wish to hear music so we can reflect, dive deeper and understand the sadness that we feel [emotions going in]. This Surfing. It is a lovely sunny and windy day. Cuauthli de- experience gives us lessons on how to manage our emotions, cides that these are perfect conditions to go surfing (as seen and how to grow stronger. At times, we may feel sad so we in Fig 3). While doing it he likes to feel the adrenaline rush wish to hear music in order to improve our mood [emotions with the sound of rock music. In 2020 Cuauthli would have to going out] and spend the better part of our days. In our wear water-proof in-ear headphones tightly plugged into his envisioned interface, humans can create gateways for their ears to prevent them from falling off. This would prevent him emotions with music. to hear the surroundings. 
However, when surfing he also has Algorithms will design and produce rhythms in on-the-fly to hear the surroundings for his safety. In order to do this, and have them played via vibration by these nearby objects he would have to balance spatial awareness and enjoy at the (moving on their own). Humans will simply need to think of same time, which takes a lot of effort [7]. In 2121 he will not their emotions and sounds, and the objects near them will have this problem anymore since, not only can he hear his seamlessly produce the vibrations recreating these sounds. preferred rock music, but the music blends with the sound of Objects around us, will produce a unique rhythm, providing the environment around him. In addition, if Cuauthli wants a new definition of audio augmented reality. Humans will to listen to the environment, the algorithm understands this get to enjoy their favorite sounds and rhythms through this and can mute the music just through his thoughts. This can seamless interface and played directly in their minds. We be done using two approaches. First, Cuauthli can listen to 23 Information Society 2020, 5–9 October, 2020, Ljubljana, Slovenia Jordan Aiko Deja, Nuwan Attygale, Klen Čopič Pucihar, and Matjaž Kljun his favorite track in full volume and when he wants to listen [3] Nicholas J Conard. 2009. A female figurine from the basal Aurignacian to the background noise, a chip inside his head can under-of Hohle Fels Cave in southwestern Germany. Nature 459, 7244 (2009), stand this and allows the environment sound to be heard. 248–252. Second, the noise of the environment around him can be [4] Nicholas J Conard, Maria Malina, and Susanne C Münzel. 2009. New flutes document the earliest musical tradition in southwestern Ger- used as an input and then be used to create a new sound that many. Nature 460, 7256 (2009), 737–740. blends with his taste and with the noise around him. These [5] J Stephen Downie. 2003. Music information retrieval. 
Annual review can be based on Cuauthli’s favorite tunes and algorithms of information science and technology 37, 1 (2003), 295–340. produce a specific tune that fits his current preferences and [6] J Stephen Downie. 2008. The music information retrieval evaluation allows him to not lose the connection with the environment exchange (2005–2007): A window into music information retrieval research. Acoustical Science and Technology 29, 4 (2008), 247–255. around him. [7] Dominik Fuchs. 2018. Dancing with Gravity—Why the Sense of Balance Amplifying emotions at the blink of an eye. It is a rainy Is (the) Fundamental. Behavioral Sciences 8, 1 (2018), 7. day and Cuauthli is sitting by the window, thinking about [8] Thomas Higham, Laura Basell, Roger Jacobi, Rachel Wood, Christo- his loved one. Since he is in Germany on a research visit, pher Bronk Ramsey, and Nicholas J Conard. 2012. Testing models for he misses her dearly. Cuauthli would love to get lost in his the beginnings of the Aurignacian and the advent of figurative art and music: The radiocarbon chronology of Geißenklösterle. Journal thinking about her. He then decides to listen to a song, which of human evolution 62, 6 (2012), 664–676. helps him reflect on his feelings for her and on his current [9] Patrik N Juslin and John A Sloboda. 2001. Music and emotion: Theory mood. The music helps him bringing back the the memories. and research. Oxford University Press. This is done by reducing other background noise as inputs [10] Mattes B Kappert, Alexandra Wuttke-Linnemann, Wolff Schlotz, and and allows him to focus on the memories that are in his brain. Urs M Nater. 2019. The Aim Justifies the Means—Differences Among Musical and Nonmusical Means of Relaxation or Activation Induction After a while, the rain stops falling and Cuauthli needs to go in Daily Life. Frontiers in Human Neuroscience 13 (2019), 36. back to work, but is feeling somewhat depressed. He thinks [11] Jamie C Kassler. 1987. 
The dancing chimpanzee: A study of the origin of a happy and exciting song, which starts playing and helps of music in relation to the vocalising and rhythmic action of apes. him to focus on his work as well as changes his mood. The Musicology Australia 10, 1 (1987), 79–81. algorithms and his neurological link take care of processing [12] Jeremy Montagu. 2014. Horns and Trumpets of the World: An Illustrated Guide. Rowman & Littlefield. his thoughts and produce the sounds that he needs to hear. [13] Jeremy Montagu. 2017. How music and instruments began: a brief overview of the origin and entire development of music, from its 4 CONCLUSION earliest stages. Frontiers in Sociology 2 (2017), 8. The visions and scenarios we presented come with their [14] Iain Morley. 2013. The prehistory of music: human evolution, archaeol-respective issues and challenges in implementation and in ogy, and the origins of musicality. Oxford University Press. [15] Elon Musk et al. 2019. An integrated brain-machine interface platform policy design. If we imagine a natural and seamless inter-with thousands of channels. Journal of medical Internet research 21, 10 face, evaluating its usability will introduce a new paradigm (2019), e16194. for HCI researchers. Will existing models such as Fitts’ Law [16] ABC Online. 2001. Music - good vibrations for deaf. http://www.abc. (which has always worked on any interface developed - me- net.au/science/articles/2001/11/28/426276.htm. Accessed: 2020-09-10. chanical, digital, virtual) still work in neurological links man- [17] Charles Seeger. 1971. Reflections upon a given topic: Music in universal perspective. Ethnomusicology 15, 3 (1971), 385–398. aged by our seamless thoughts? The intangible interaction [18] Amy Voida, Rebecca E Grinter, Nicolas Ducheneaut, W Keith Edwards, provided by this “online network” could potentially blur con-and Mark W Newman. 2005. Listening in: practices surrounding iTunes cepts such as piracy and intellectual property. 
As music is music sharing. In Proceedings of the SIGCHI conference on Human factors composed by ubiquitous algorithms connected to our per-in computing systems. 191–200. sonal thoughts and feelings, are all our emotions and the [19] Nils Lennart Wallin, Björn Merker, and Steven Brown. 2001. The origins of music. MIT press. music that are generated by them considered unique and [20] Robert Weisman. 2004. The Internet of things: Start-ups jump into shareable? These, among many others, are interesting ques- next big thing: tiny networked chips. The Boston Globe (2004). tions that we leave to our readers as we re-imagined a music interface. While these visions appear to be very far from reality, we are only left with our own thoughts to begin with and maybe hopefully in the not so near future too. REFERENCES [1] Mitsuko Aramaki, Richard Kronland-Martinet, and Sølvi Ystad. 2017. Bridging People and Sound: 12th International Symposium, CMMR 2016, São Paulo, Brazil, July 5–8, 2016, Revised Selected Papers. Vol. 10525. Springer. [2] Patricia Shehan Campbell. 1997. Music, the universal language: Fact or fallacy? International Journal of Music Education 1 (1997), 32–39. 24 Ohranjanje kulturne dediščine s pomočjo navidezne in obogatene resničnosti Cultural heritage preservation through virtual and augmented reality Marko Plankelj Niko Lukač Selma Rizvić Univerza v Mariboru, Univerza v Mariboru, Univerza v Sarajevu, Fakulteta za elektrotehniko, Fakulteta za elektrotehniko, Fakulteta za elektrotehniko računalništvo in informatiko računalništvo in informatiko Sarajevo, Zmaja od Bosne, bb., Maribor, Slovenija Maribor, Slovenija Bosna in Hercegovina marko.plankelj@student.um.si niko.lukac@um.si selma.rizvic@etf.unsa.ba Simon Kolmanič Univerza v Mariboru, Fakulteta za elektrotehniko, računalništvo in informatiko Maribor, Slovenija simon.kolmanic@um.si of Microsoft HoloLens glasses and Unity game engine. 
Abstract
Cultural heritage is disappearing due to various factors, and in recent years its preservation has increasingly relied on modern technologies that enable its digitalization. As an example of good practice, we present the use of virtual and augmented reality. Mixed reality, which combines virtual objects with a real environment, is also used more and more often. In this article, we present the possibility of a virtual exhibition of museum artefacts with the help of Microsoft HoloLens glasses and the Unity game engine. For that purpose, as part of a diploma thesis, we created an application that enables interaction with six artefacts from the Roman era, found at four different sites in the Balkans. We present the advantages and disadvantages that such a presentation offers the user and how it can be used for cultural heritage preservation.

Keywords
Cultural heritage, mixed reality, virtual reality, augmented reality, Microsoft HoloLens

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
CHI 2020 Extended Abstracts, April 25–30, 2020, Honolulu, HI, USA.
© 2020 Copyright is held by the owner/author(s).
ACM ISBN 978-1-4503-6819-3/20/04.
DOI: https://doi.org/10.1145/3334480.23

1 Introduction

Cultural heritage is an inherited legacy preserved in the present. It is fundamentally divided into tangible and intangible heritage. Its preservation is the concern of UNESCO, an agency of the United Nations, which has placed 1121 sites on its heritage list [1]. Of these, as many as 53 are endangered by natural disasters, climatic changes, wars and human negligence. With the development and popularity of modern technologies, their value has also been recognised in cultural heritage preservation. The millions of tourists who visit sights around the world every year could, instead of long journeys and high costs, have a cheaper but still satisfying experience from the comfort of their own armchair. This would also relieve the pressure on these sights and so help preserve cultural heritage for future generations.

We wanted to examine in practice both the problem of disappearing cultural heritage and the opportunity to preserve it for future generations with modern technologies. We built an application for displaying artefacts from Roman times that runs on Microsoft HoloLens, "smart" glasses for displaying mixed reality. It gives the user basic interaction with the artefacts and an unobstructed view of them from all directions.

The article consists of five sections. The second section presents existing augmented and mixed reality applications used in cultural heritage preservation. The third section describes the design and development of the application. The fourth section presents how the application works. The final, fifth section presents the results achieved.

2 Overview of the field

Although modern technologies are often portrayed in public as a threat that can lead to addiction and social isolation and reduce creativity [2], these very innovations are often key to preserving cultural heritage for future generations. In recent years, virtual, augmented and mixed reality have been among the most popular technologies for cultural heritage preservation.

From the standpoint of computer science, virtual reality denotes a field whose goal is to create a virtual world that allows interaction with the user, using special devices that simulate the environment and make the experience as realistic as possible. Augmented reality, by contrast, focuses on complementing the real world by adding layers of virtual objects or additional information to the real environment.

The popularity of augmented reality has been growing exponentially over the years. However, a study of cultural heritage preservation in Europe [3] found that there are currently very few applications, and that these are mostly developed by museums or heritage preservation institutions [4]. The applications are mostly activated by a trigger (e.g. a symbol, a marker, an object, the location of the device) and, to a lesser extent, by gaze.

The mobile application England Originals¹ and the Pocket Gallery² feature within the Google Arts&Culture³ application work in a similar way: when a flat surface is detected, they display a three-dimensional model in the real environment. The model can be manipulated by moving the phone and through the touchscreen. The mobile application Civilisations AR⁴ works the same way, except that a virtual model of our planet floats in the air with artefact sites marked on it. Tapping the screen of the mobile device lets us view the artefacts in more detail and apply basic geometric transformations to them.

Applications for preserving cultural heritage with augmented reality are also being included very successfully in the tourist offer in Slovenia, where three applications stand out. Travel AR Slovenia⁵ triggers an augmentation of the surroundings on a portable device when it detects an augmented reality marker and, together with audio guidance, lets us view three-dimensional reconstructions of the cultural heritage of Kočevje and Črnomelj. The application AR Kranj⁶ works in a similar way and introduces the town of Kranj and its history. Cultural heritage can also be explored with augmented reality in the former monastery Žička Kartuzija, which is in ruins today, where smart glasses and audio guides use three-dimensional models to show how the monastery looked in all its grandeur.

The virtuality continuum, introduced by Paul Milgram and Fumio Kishino, centres on the definition of mixed reality. By their definition, mixed reality comprises augmented reality, where virtual elements enrich the real world, and augmented virtuality, where elements of the real world enrich the virtual world. Mixed reality can thus also be presented as a superset of augmented and virtual reality.

At the time of writing, mixed reality applications are used for cultural heritage preservation only to a small extent [5]. Two possible reasons are the high price and the relative novelty of the technology.

HoloTour⁷ is a Microsoft product that lets us explore the history of Rome and the secrets of Machu Picchu on Microsoft HoloLens, a device for displaying mixed reality. The application is controlled with gestures and voice commands.

Mixed reality is also increasingly being introduced into museums. Two applications stand out here, HoloMuse⁸ and Holomuseum⁹ (both intended for Microsoft HoloLens). They let the user explore archaeological collections of artefacts that can be manipulated freely, which is not possible in a real museum.

¹ http://www.heritagecities.com/stories/explore [09. 09. 2020].
² https://artsandculture.google.com/story/5QWhvYU1kBJfgw [09. 09. 2020].
³ https://about.artsandculture.google.com/ [09. 09. 2020].
⁴ https://www.bbc.com/news/technology-42966371 [09. 09. 2020].
⁵ http://www.travel-ar.si/sl/ [09. 09. 2020].
⁶ https://www.visitkranj.com/sl/obogatena-resnicnost-v-kranju [09. 09. 2020].
⁷ https://docs.microsoft.com/en-us/windows/mixed-reality/case-study-capturing-and-creating-content-for-holotour [09. 09. 2020].
⁸ https://www.researchgate.net/publication/315472858_HoloMuse_Enhancing_Engagement_with_Archaeological_Artifacts_through_Gesture-Based_Interaction_with_Holograms [09. 09. 2020].
⁹ https://www.researchgate.net/publication/326713622_HOLOMUSEUM_A_HOLOLENS_APPLICATION_FOR_CREATING_EXTENSIBLE_AND_CUSTOMIZABLE_HOLOGRAPHIC_EXHIBITIONS [09. 09. 2020].

3 Design and development of the application

In designing the application, which was created as part of a diploma thesis, we followed the example of existing mixed reality applications. We faced the difficulty that HoloLens glasses are a fairly new device that is not easily affordable. It should also be stressed that libraries with new functionality are still being developed, so suitable literature is rather scarce, or it describes older library versions that are no longer in use. We therefore conceived a simple but still appealing application that showcases various ways of using mixed reality with HoloLens glasses.

The application presented here falls under augmented reality, but it could become a mixed reality application, should the need arise, by giving the objects used an awareness of their surroundings. The application runs on Microsoft HoloLens, a device for displaying mixed reality, which in principle makes this possible.

We developed the application for preserving cultural heritage with virtual and augmented reality in several stages that together form a whole: a working application on the Microsoft HoloLens mixed reality device through which users can get to know artefacts from Roman times. We divided the application into different scenes, as shown in Figure 1. The user moves between them using the basic modes of interaction with HoloLens, as shown in Figure 2.

Figure 1: Schematic of how the application works

Figure 2: The basic modes of interaction with HoloLens used in the application (gestures and voice control)

3.1 Building the application

The first step in building an application for exhibiting museum pieces in mixed reality is preparing the objects. The centrepiece of the exhibition consists of exhibits from four former Roman settlements: Viminacium and Municipium in Serbia, Aquae in Bosnia and Herzegovina, and Dyrrachium in Albania. We received them in electronic form, but they had to be processed before use in our application. For this we used the Blender animation package, in which we reduced the number of vertices and polygons, as shown in Table 1. The reduction was necessary because of the limited processing power and memory size of the HoloLens device.

Table 1: Number of polygons before and after decimation

Object       Polygons before decimation   Polygons after decimation
Artefact 1   3,268,685                    490,301
Artefact 2   448,881                      44,877
Artefact 3   422,978                      42,282
Artefact 4   124,764                      43,666
Artefact 5   146,228                      29,244
Artefact 6   97,762                       9,437

To make the application more attractive, we additionally modelled a pillar in the style of Roman architecture, on which the artefacts are placed in the presentation. All objects were then exported to the Unity game engine.

In Unity we created a new project and installed and configured the Mixed Reality Toolkit (MRTK), a tool for developing mixed and augmented reality applications. We divided the application into separate parts called scenes, which we used as independent units. To each scene we added the selected objects together with interactive components that enable interaction with the artefacts. Interaction with HoloLens is possible in three basic ways: gaze, gesture and voice control. We implemented all three in our application.

4 Presentation of results

When the application is launched, the main scene appears, showing six pillars, as in Figure 3. One artefact from the Roman period is placed on each pillar. To enhance the experience of a trip into the past, music from the time of the Roman Empire, as musicologists imagine it today, plays in the background.

Figure 3: The main scene with six artefacts placed on pillars of different heights

The artefacts can be viewed not only from a distance but also in more detail. To open the scene of an individual artefact, we direct our gaze at the artefact and, with a gesture that represents a click, move to a new scene, as shown in Figure 4.

Figure 4: The scene of one individual artefact

In this scene we can apply basic geometric transformations to the artefact (scaling, rotation and translation) by grabbing the handles at its sides. In this way the artefact can be viewed from all directions, which is not possible in a real museum. Two panels with interesting facts about the life of the Romans are displayed beside the artefact. If we wish, we can close them by directing our gaze at them and making the click gesture, and so focus all our attention on the artefact.

In the scene that shows an individual artefact, a control panel appears above it. The panel can stop the music or return us to the main scene, and it can also be closed. All of these commands are carried out by directing the gaze at the chosen action (button) and making the click gesture. When we look at a button, the keyword with which the same command can be issued by voice is displayed below it.

The scene that shows an individual artefact can also be controlled with four voice commands. The first command, with the keyword "Menu", is used when the control panel has previously been closed; when the command is detected, the control panel is displayed again. The second command, with the keyword "Sound", is used when the music has previously been stopped; when the command is detected, the music resumes. The third command, with the keyword "Close", replaces a click on the Close button and closes the control panel. The fourth and last command, with the keyword "Back", replaces a click on the Back button and displays the main scene again.

We also wanted to test the application with test users and collect their feedback, but this was not possible because of the epidemiological situation related to COVID-19. We expect that the results would be similar to those presented in [6], where the authors describe, among other things, user experience with an application similar to ours.

5 Conclusion

[…] Roman times with the help of the Microsoft HoloLens mixed reality device. Although we successfully developed the application as we had conceived it, we believe that both the application and mixed and augmented reality technology leave considerable room for future improvement. The greatest limitation in development at present is the HoloLens device and its technical capabilities, such as the lower resolution and the small field of view of the visor. The second generation of Microsoft's device is expected to address the weaknesses of the previous generation and enable more natural interaction with holograms.

If in the past we thought about how to adapt buildings so that they could receive more visitors, in the future we will have to pay more attention to the use of modern technologies in various fields, including cultural heritage preservation. Augmented and mixed reality are certainly technologies that can be used in any field. All we need are three-dimensional models and a story that will attract users, a story that is an integral part of our past, a past we want to preserve for future generations.

Acknowledgement

The authors declare that the research was financially supported by the Slovenian Research Agency (Javna agencija za raziskovalno dejavnost Republike Slovenije) within the project BI-BA/19-20-003.

6 References and literature

[1] Unesco. World Heritage List. Available at: https://whc.unesco.org/en/list/ [26.2.2020].
[2] Al-Zoubi, S., Younes, M. A. B. The Impact of Technologies on Society: A Review. International Organization of Scientific Research Journal of Humanities and Social Science, 20, (2015), 2(5), pp. 82-86.
[3] Luna, U., Rivero, P., Vicent, N. Augmented Reality in Heritage Apps: Current Trends in Europe. Applied Sciences, 9, (2019), 13, 2756.
[4] Hammady, R., Ma, M., Strathearn, C. User experience design for mixed reality: a case study of HoloLens in museum. International Journal of Technology Marketing, 13, (2019), 3/4, pp. 354-375.
[5] Kassahun-Bekele M., Pierdicca, R., Frontoni, E., Malinverni, E. S., Gain, J. A Survey of Augmented, Virtual, and Mixed Reality for Cultural Heritage. Association for Computing Machinery Journal on Computing and Cultural Heritage, 11, (2018), 2, pp. 1-36.
[6] Kolmanič, S., Marksel, M., Mongus, D., Žalik, B. Tehnologije navidezne in obogatene resničnosti, kot orodje za predstavitev novih idej in produktov na sejmih : primer Mahepa. Uporabna informatika. [Tiskana izd.]. 2020, letn.
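As a small worked example of the reduction reported in Table 1, the sketch below computes, for each artefact, the share of polygons kept after decimation. It is only illustrative arithmetic on the published counts: the names `POLYGON_COUNTS`, `kept_ratio` and `summarize` are hypothetical and not part of the application, and the paper does not state which decimation settings were used in Blender.

```python
# Polygon counts taken from Table 1: (before decimation, after decimation).
# All names here are illustrative; they do not come from the application.
POLYGON_COUNTS = {
    "Artefact 1": (3_268_685, 490_301),
    "Artefact 2": (448_881, 44_877),
    "Artefact 3": (422_978, 42_282),
    "Artefact 4": (124_764, 43_666),
    "Artefact 5": (146_228, 29_244),
    "Artefact 6": (97_762, 9_437),
}


def kept_ratio(before: int, after: int) -> float:
    """Return the fraction of polygons that remain after decimation."""
    if before <= 0:
        raise ValueError("polygon count before decimation must be positive")
    return after / before


def summarize(counts: dict) -> dict:
    """Map each object name to the fraction of polygons it kept."""
    return {name: kept_ratio(b, a) for name, (b, a) in counts.items()}


if __name__ == "__main__":
    for name, ratio in summarize(POLYGON_COUNTS).items():
        print(f"{name}: {ratio:.1%} of polygons kept")

    # Aggregate over all six artefacts.
    total_before = sum(b for b, _ in POLYGON_COUNTS.values())
    total_after = sum(a for _, a in POLYGON_COUNTS.values())
    print(f"All artefacts: {kept_ratio(total_before, total_after):.1%} kept")
```

On these figures the kept share ranges from roughly 10% to 35% per artefact, so the meshes appear to have been reduced by different amounts rather than by one fixed factor.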
28, V tem projektu smo izpostavili problem izginjanja kulturne št. 2, str. 85-93. ISSN 1318-1882. dediščine, ki je posledica različnih dejavnikov (naravnih in človeških) ter želeli preveriti možnosti ohranjanja naše preteklosti s pomočjo sodobne tehnologije, mešane resničnosti. Razvili smo aplikacijo, ki nam omogoča ogled artefaktov iz 28 Predmetnik: oprijemljiv uporabniški vmesnik za informiranje turistov Gregor Sotlar Peter Roglej 89172027@student.upr.si peter.rogelj@upr.si Univerza na Primorskem, UP FAMNIT Univerza na Primorskem, UP FAMNIT Koper, Slovenija Koper, Slovenija Klen Čopič Pucihar Matjaž Kljun klen.copic@famnit.upr.si matjaz.kljun@famnit.upr.si Univerza na Primorskem, UP FAMNIT Univerza na Primorskem, UP FAMNIT Koper, Slovenija Koper, Slovenija Fakulteta za Informacijske Študije Fakulteta za Informacijske Študije Novo mesto, Slovenija Novo mesto, Slovenija Slika 1: Vmesnik Predmetnika. POVZETEK ponudbo. Ob dvigu predmeta uporabnik sproži prikaz vse- Namen dela je zasnovati in raziskati možne vloge oprijemlji- bine na zaslonu. Opravljena uporabniška študija je pokazala, vega uporabniškega vmesnika za informiranje turistov, ki da je lahko Predmetnik prva točka informiranja v turistično bi dopolnjeval obstoječe oblike informiranja v turistično in- informacijskih centrih, njegova prednost pa je v tem, da na formacijskih centrih, nadomestil njihove pomanjkljivosti, a enostaven in preprost način v kratkem času podaja informa- hkrati uporabil njihove prednosti. Vmesnik smo zasnovali cije o doživetju posamezne turistične ponudbe. na podlagi predhodnih raziskav in lastnih izkušenj ter ga poimenovali Predmetnik. 
Uporabniški vmesnik vsebuje posa- KEYWORDS mezne enote - predmete, ki predstavljajo določeno turistično oprijemljivi uporabniški vmesnik, turizem, informiranje tu- ristov, turistični informacijski center, TIC Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies 1 UVOD bear this notice and the full citation on the first page. Copyrights for third-Informiranje turistov v turistično informacijskih centrih party components of this work must be honored. For all other uses, contact the owner/author(s). (TIC) je omejeno na nekaj medijev ali virov, kot so turistični informator, tiskovine (letaki, prospekti, zemljevidi, brošure Information Society 2020, 5–9 October, 2020, Ljubljana, Slovenia © 2020 Copyright held by the owner/author(s). in zgibanke), zasloni, ki predvajajo videoposnetke, in raču- nalniki (npr. zasloni na dotik). Težave tiskanih medijev so 29 Information Society 2020, 5–9 October, 2020, Ljubljana, Slovenia Gregor Sotlar, Peter Roglej, Klen Čopič Pucihar, and Matjaž Kljun enosmerna komunikacija, težko posodabljanje, različna gra- na po mestu postavljenih kiosk računalnikih, ki so žetone fična razporeditev, poleg tega težko omogočajo nadaljnje prepoznali, vodil do želenih ciljev. Tangible user Interface iskanje informacij (npr. preko URL naslovov, ki jih je po- within Projector-based Mixed Reality je s sledenjem figure, ki trebno prepisati v brskalnik). Tiskovine imajo tudi prednosti, jo je uporabnik premikal po zemljevidu, na zaslonu prika- saj ne potrebujejo dodatne energije in so zelo prenosljive. zoval pripadajoči 3D prizor [27]. Pri drugi izvedbi je fizično Težave zaslonov na dotik so zahtevano znanje za uporabo, figuro zamenjal uporabnik sam, ki je s premikanjem po na čas, ki ga porabimo pri iskanju informacij, potrebno napa- tleh projicirani maketi parka, upravljal prikaz lokacije. 
Opri- janje z energijo in nezmožnost odnesti informacije s seboj. jemljivi vmesniki so pogosto dostopni v muzejih. Na primer Podobno kot tiskani mediji tudi neinteraktivni zasloni (angl. v [25] je oprijemljiv vmesnik za pridobitev informacij o do-public displays) omogočajo povečini enosmerno komunika- ločenem geološkem vzorcu kar vzorec sam, ki ob rokovanju cijo s predvajanjem videoposnetkov turistične ponudbe. Ti poda zvočne in vizualne informacije o njem na projekciji. so zelo dobrodošli za prvi vtis o ponudbi, vendar teh infor- Vzorec je hkrati tudi bogat vir informacij o barvi, teži, trdoti macij obiskovalec ne more upravljati in lahko le čaka, da se in teksturi. vse informacije predvajajo. Dosedanje raziskave so se osredotočale na prikaz poti do Za raziskavo informiranja o turistični ponudbi smo zasno- želene turistične znamenitosti, prikaz pokrajine glede na vali in izdelali oprijemljiv uporabniški vmesnik, imenovan položaj in rokovanje s predmeti za prikaz podrobnejših (vi- Predmetnik. Sistem odpravlja težavno interakcijo z zasloni na deo) vsebin. Nobena nam znana raziskava se ni ukvarjala s dotik in nezmožnost interakcije z javnimi zasloni. Vmesnik je podajanjem predstave o turističnih doživetjih preko različ- sestavljen iz predmetov, s katerimi lahko uporabnik rokuje in nih predmetov kot so spominki, lokalni pridelki ali izdelki preko tega upravlja večpredstavnostne vsebine o določenem in predmeti, ki predstavljajo turistične aktivnosti (pohodi, turističnem cilju, aktivnosti ali ponudbi na povezanem za- kolesarjenje, ipd.) Osnovna ideja je tako omogočiti upravlja- slonu. Za predmete našega vmesnika lahko izberemo lokalne nje in izbiro aktivnosti ali ciljev potovanja iz predhodnega izdelke, pridelke, spominke in različne predmete, ki so pove- nabora fizičnih predmetov, ki predstavljajo asociacijo na do- zani z aktivnostmi. 
Predmeti, s katerimi uporabnik rokuje, ločeno turistično izkušnjo, s tem pa omogočiti pridobitev preko zaslona predstavljajo določeno zgodbo, ki obogati turi- možnega doživetja turistične ponudbe preko večpredstavno- stično izkušnjo. Predmeti s tem postanejo “vstopna točka” in stnih vsebin, ki so drugače dosegljive le na spletu in javnih preko podanih zgodb spodbudijo željo po iskanju nadaljnjih zaslonih. Naše raziskovalno vprašanje se tako glasi: Ali je informacij, ki so na voljo v TIC-u, kot na primer, kako priti fizični vmesnik Predmetnik primeren kot vstopna točka za do želene posamezne ponudbe ali cilja, zgodovino, ipd. Vse informiranje o turističnih doživetjih oziroma o ponudbi v dodatne informacije so torej dosegljive preko tiskovin, spleta turistično informacijskih centrih? in turističnih informatirjev, ki so že na voljo v TIC-u. 3 OPIS SISTEMA 2 PREGLED PODROČJA Predmetnik je sestavljen iz treh delov (Slika 2): uporabni-Skupnost je že pred tremi desetletji izpostavila, da računal- škega vmesnika, mikroračunalnika (Raspberry Pi 3B) kot niki preprečujejo stik z okoljem [26], kar je spodbudilo ideje računske enote in naprave za predvajanje zvočnih in video za uporavljanje digitalnih vsebin s pomočjo fizičnih predme- vsebin (zaslon ali projektor). Komunikacija poteka v smeri tov [7]. Nekateri raziskovalci so šli še dlje in predstavili vizijo od oprijemljivega vmesnika preko senzorjev do mikrora-uporabe fizičnega sveta kot vmesnika za povezovanje objek- čunalnika, ki prejme informacijo o dvignjenem predmetu, tov in površin z digitalnimi vsebinami [12]. 
Na osnovi teh predvaja temu primerno vsebino in preko svetlobnih signa-del so oprijemljivi uporabniški vmesniki postali nova oblika lov podaja informacije o aktivnih in neaktivnih predmetih interakcije [18], ki se uporablja na vse več področijih in za ra-vmesnika (če so vsi predmeti odloženi, pri vseh gori lučka; znovrstne naloge [21], kot so: (i) shranjevanje, pridobivanje če je pa posamezen predmet dvignjen, gori lučka le pri tem). in rokovanje s podatki [1, 5, 19, 22], (ii) vizualizacija informa-Polica predmetnika ima za posamezen predmet izrezan relief cij preko oprijemljivih uporabniških vmesnikov [10, 23, 24], v obliki predmeta, kar omogoča (poleg svetlobnega signala) (iii) modeliranje in simulacije [2–4, 8, 11], (iv) upravljanje lažje odlaganje. Če noben predmet ni aktiven (se z njim ne sistemov, kontrola in konfiguracija [3, 5, 14, 15, 22] in (v) rokuje), se na zaslonu predvaja kratek video posnetek, ki izobraževanje, zabava in programski sistemi [3, 9, 13, 16, 17]. prikazuje rokovanje s Predmetnikom in vabi uporabnika, da Tudi na področju turizma so že bile narejene raziskave, ki dvigne enega od njih. Podrobnejši opis uporabljene strojne so za informiranje turistov uporabile koncepte oprijemljivih in programke opreme je na voljo v [20]. uporabniških vmesnikov. Sistem Mementos [6] je uporabnike Sistem zbira naslednje podatke: število rokovanj z določe-preko žetonov (spominkov), ki so predtsavljali turistične zna- nim predmetom, čas rokovanja s posameznim predmetom menitosti ali infrastrukturo (restavracije, javni prevoz, ipd.), in čas predvajanja posameznega posnetka s čimer beležimo, 30 Predmetnik: oprijemljiv uporabniški vmesnik za informiranje turistov Information Society 2020, 5–9 October, 2020, Ljubljana, Slovenia sodelovalo devet udeležencev v razponu od devet do 66 let, od tega je bilo sedem žensk in dva moška. 
Po privolitvi smo jim predstavili potek raziskave: (i) iz- polnjevanje pred-vprašalnika, (ii) opravljanje dveh nalog (iskanje informacij o kolesarjenju in pregled ostalih možnih doživetij) brez časovnih omejitev, kjer so imeli na voljo ti- skovine, zaslon na dotik in Predmetnik ter (iii) izpolnjevanje Slika 2: Prikaz sheme sistema: a) uporabnik, b) uporabniški drugega dela vprašalnika. Vrstni red opravljanja nalog ni bil vmesnik s predmeti, c) računska enota in d) zaslon, ki pred- naključen, saj smo najprej želeli videti rokovanje za točno vaja večpredstavnostne vsebine. določen namen (najti informacije o kolesarjenju) in nato opazovati splošno rokovanje (pregled možnih doživetij). koliko je bila posamezna ponudba zanimiva in koliko je bil Med raziskovanjem smo opazovali interakcijo z vsako določen predmet vmesnika informativen ali zanimiv za inte- od oblik informiranja (obračanje, tipanje ...) Opazovali smo rakcijo. tudi vrstni red interakcije med raznimi oblikami oziroma predmeti informiranja, porabljen čas na posameznem viru 4 RAZISKAVA interakcije, čas gledanja pri video predstavitvah in skupen V raziskavi smo izvedli nadzorovano uporabniško študijo čas iskanja informacij. (angl. controlled user study). Opazovalne študije (angl. ob- 5 REZULTATI IN RAZPRAVA servational study) nismo mogli izvesti zaradi epidemije. Na TIC-u v Izoli so vse predmete odmaknili, saj jih turisti za- V povprečju so udeleženci prvo nalogo reševali osem minut radi možnosti okužbe ne smejo prijemati; lahko jih dobijo le, in 16 sekund, drugo pa pet minut in 31 sekund, čeprav je če vprašajo zaposlenega informatorja. Zaradi tega tudi niso slednja od njih zahtevala pregled več informacij. Tiskovinam dovolili postaviti Predmetnik v njihove prostore. so pri drugi nalogi v povprečju namenili dobro minuto in pol manj časa v primerjavi s prvo nalogo, posamezna tiskovina pa se je v povprečju gledala enako dolgo, čeprav niso vsi iz prve naloge po tiskovinah posegli tudi v drugi. 
Pri tablici se je število uporabnikov pri drugi nalogi v primerjavi s prvo zmanjšalo, predvsem zaradi začetne slabe izkušnje z njenim rokovanjem. Ravno tako se je zmanjšal čas povprečne porabe za skoraj dve minuti. Le pri Predmetniku se je čas uporabe podaljšal iz 32 sekund na 53 sekund. To je bilo za pričakovati, saj so imeli možnost pogledati še tri preostale posnetke. Čas se je podaljašal tudi za ogled posameznega videoposnetka v povprečju za tri sekunde, vendar k temu prispeva tudi dejstvo, da so drugi trije videi za nekaj sekund daljši od videa, predvidenega za prvo nalogo. Povečalo se je zanimanje za Slika 3: Simulacija TIC-a: zaslon na dotik, Predmetnik in ti- Predmetnik, saj ga je po tem, ko so ga v prvi nalogi šele skovine. spoznali, v drugi nalogi kot prvo obliko informiranja izbralo več uporabnikov. Vrstni red nalog je deloma tudi vplival na Študijo smo izvedli v simuliranem TIC-u (Slika 3). Na vo- čas opravljanja: slabe izkušnje iz prve naloge so vplivale ljo so bile tri oblike informacij, ki so predstavljale turistično na neuporabo tablice, domačnost s Predmetnikom pa na ponudbo slovenske Istre: tiskovine, zaslon na dotik (tablica opustitev uvodnega videa. s spletno stranjo I Feel Slovenia: Mediterranean & Karst Slo- Poleg tega se je v drugi nalogi v primerjavi s prvo povečalo venia1) in Predmetnik (s štirimi predmeti – pedal, vponka, število uporabnikov, ki so si začeli ogledovati predmete ali sol in kamen s pohodniško markacijo), ki je prikazoval video se z njimi igrati, vendar je bilo število manjše od pričakova- vsebine s spletnih strani turističnih zavodov Kopra, Izole in nega. Previdevamo, da imajo tiskovine in tablica prednost, Pirana. Video vsebine so bile dolge med 12 in 19 sekund. ker so jih udeleženci že poznali ali vsaj vedeli, kakšna je nji- Udeležence smo pridobili s priročnim vzorčenjem (angl. hova funkcionalnost. Prišlo je tudi do določenih sprememb convenience sampling). 
V danem trenutku in položaju je bila pri izvajanju obeh nalog, kot je povečanje prehajanja med to edina možnost pridobitve uporabnikov. Pri testiranju je oblikami informiranja oziroma vračanja k že obiskani obliki 1https://www.slovenia.info/en/places-to-go/regions/mediterranean-karst- znotraj iste naloge. Še posebno pri prvi nalogi smo opazili, da slovenia so udeleženci najprej uporabili tisto obliko informiranja, ki 31 Information Society 2020, 5–9 October, 2020, Ljubljana, Slovenia Gregor Sotlar, Peter Roglej, Klen Čopič Pucihar, and Matjaž Kljun jim je bila fizično najbližje. Tako so pri prvi nalogi večinoma [5] Jonathan Cohen, Meg Withgott, and Philippe Piernot. 1999. Logjam: a začeli z uporabo tiskovin, ki so bile prva oblika informiranja tangible multi-person interface for video logging. , 128–135 pages. glede na smer prihoda v prostor. Pri drugi nalogi pa je to že [6] Augusto Esteves and Ian Oakley. 2010. Mementos: A Tangible Interface bil Predmetnik. Supporting Travel. , 4 pages. https://doi.org/10.1145/1868914.1868994 [7] George W Fitzmaurice, Hiroshi Ishii, and William AS Buxton. 1995. Predmetnik, ne nadomešča drugih virov saj sta namen in Bricks: laying the foundations for graspable user interfaces. , 442– učinek različnih načinov pridobivanja informacij različen. 449 pages. (i) Predmetnik je privlačen in nudi maloštevilno izbiro in- [8] John Frazer. 1995. An evolutionary architecture. (1995). formacij brez podrobnejšega usmerjanja znotraj področja [9] Matthew G Gorbet, Maggie Orth, and Hiroshi Ishii. 1998. Triangles: zanimanja, (ii) tablica (zasloni na dotik) je manj privlačna a tangible interface for manipulation and exploration of digital informa- tion topography. , 49–56 pages. omogoča bolj podrobno usmerjanje k želenim informacijam [10] Ken Hinckley, Randy Pausch, John C Goble, and Neal F Kassell. 1994. z več nivojsko izbiro, (iii) tiskovine pa so edine izmed treh, ki Passive real-world interface props for neurosurgical visualization. 
, jih obiskovalci lahko vzamejo s seboj in informacije iz njih 452–458 pages. prebirajo tudi kasneje, a ne nudijo večpredstavnostnih vse- [11] Hiroshi Ishii, Carlo Ratti, Ben Piper, Yao Wang, Assaf Biderman, bin. Rezultate bi težko primerjali z drugimi raziskavami saj and Eran Ben-Joseph. 2004. Bringing clay and sand into digital de- sign—continuous tangible user interfaces. BT technology journal 22, 4 niso dovolj sorodne. Rezultati vprašalnikov so predstavljeni (2004), 287–299. v [20]. [12] Hiroshi Ishii and Brygg Ullmer. 1997. Tangible bits: towards seamless interfaces between people, bits and atoms. , 234–241 pages. 6 ZAKLJUČEK [13] Sergi Jordà, Günter Geiger, Marcos Alonso, and Martin Kaltenbrunner. V članku je predstavljen oprijemljivi uporabniški vmesnik 2007. The ReacTable: Exploring the Synergy between Live Music Performance and Tabletop Tangible Interfaces. , 8 pages. https: Predmetnik, ki bi turistom v TIC na hiter in enostaven način //doi.org/10.1145/1226969.1226998 prikazal doživljajske informacije kot osnovo za raziskovanje [14] Jeevan James Kalanithi and V Michael Bove Jr. 2015. Tangible social turistične ponudbe. Vmesnik smo izdelali kot nadgradnjo network. US Patent 9,002,752. obstoječih zaslonov na dotik in javnih zaslonov (angl. public [15] Martin Kaltenbrunner, Till Bovermann, Ross Bencina, Enrico Costanza, et al. 2005. TUIO: A protocol for table-top tangible user interfaces. , displays), ki le prikazujejo videoposnetke. Namen vmesnika 5 pages. je tako postati vstopna točka informiranja za turiste, ki bi [16] Radia Perlman. 1976. Using computer technology to provide a creative lahko preko tiskovin, turističnega informatorja in spleta nato learning environment for preschool children. (1976). naprej raziskovali turistično ponudbo, ki bi jih na Predme- [17] Hayes Solos Raffle, Amanda J Parkes, and Hiroshi Ishii. 2004. Topobo: tniku pritegnila. a constructive assembly system with kinetic memory. , 647–654 pages. 
S Predmetnikom smo izvedli nadzorovano uporabniško [18] O. Shaer and E. Hornecker. 2010. Tangible User Interfaces: Past, Present, and Future Directions. NOW. študijo. Povečan čas uporabe in večkratnega rokovanja s [19] Andrew Singer, Debby Hindus, Lisa Stifelman, and Sean White. 1999. predmeti pri drugi nalogi deloma odgovori na zastavljeno Tangible progress: less is more in Somewire audio spaces. , 104– vprašanje o vlogi Predmetnika, ki lahko predstavlja vstopno 111 pages. točko pri informiranju, vendar bi bilo v prihodnosti treba iz- [20] Gregor Sotlar. 2020. Oprijemljiv uporabniški vmesnik za informiranje vesti obširnejšo študijo v realnem okolju TIC-a s turisti, kar v turistov. Master’s thesis. Univerza na Primorskem, Titov trg 4, 6000 Koper. danem trenutku zaradi pandemije ni bilo mogoče. Poleg tega [21] Brygg Ullmer and Hiroshi Ishii. 2000. Emerging frameworks for tangi-bi bilo potrebno razširiti ponudbo Predmetnika ter dodati ble user interfaces. IBM systems journal 39, 3.4 (2000), 915–931. druge funcionalnosti (ponujanje nadaljnjega raziskovanja [22] Brygg Ullmer, Hiroshi Ishii, and Dylan Glas. 1998. mediaBlocks: physi- (QR kode, povezovanje z drugimi viri v enotno izkušnjo), cal containers, transports, and controls for online media. , 379–386 pa-sledenje pogledu, priporočilni sistem, sledenje predmetom ges. [23] Brygg Ullmer, Hiroshi Ishii, and Robert JK Jacob. 2003. Tangible query ...). Bolj podrobno je vse opisano v [20]. interfaces: Physically constrained tokens for manipulating database queries. , 279–286 pages. LITERATURA [24] John Underkoffler and Hiroshi Ishii. 1999. Urp: a luminous-tangible [1] Rachel Abrams. 1999. Adventures in tangible computing: The work of workbench for urban planning and design. , 386–393 pages. interaction designer ‘Durrell Bishop’in context. Master’s thesis, Royal [25] Roberto Ivo Fernandes Vaz, Paula Odete Fernandes, and Ana Cecília Ro-College of Art, London (1999). cha Veiga. 2016. 
Proposal of a tangible user interface to enhance [2] Robert Aish. 1979. 3D input for CAAD systems. Computer-Aided accessibility in geological exhibitions and the experience of museum Design 11, 2 (1979), 66–70. visitors. Procedia Computer Science 100 (2016), 832–839. [3] George Anagnostou, Daniel Dewey, and Anthony T. Patera. 1989. [26] Pierre Wellner, Wendy Mackay, and Rich Gold. 1993. Back to the Real Geometry-defining processors for engineering design and analysis. World. Commun. ACM 36, 7 (July 1993), 24–26. https://doi.org/10. The Visual Computer 5, 5 (1989), 304–315. 1145/159544.159555 [4] David Anderson, James L Frankel, Joe Marks, Aseem Agarwala, Paul [27] Yuan Yuan, Xubo Yang, and Xiao Shuangjiu. 2007. A Framework Beardsley, Jessica Hodgins, Darren Leigh, Kathy Ryall, Eddie Sulli- for Tangible User Interfaces within Projector-based Mixed Reality. , van, and Jonathan S Yedidia. 2000. Tangible interaction+ graphical 283-284 pages. interpretation: a new approach to 3D modeling. , 393–402 pages. 32 Razvoj in Ocenjevanje Prototipa Mobilne Aplikacije z Elementi Igrifikacije in Mešane Resničnosti Development and Assessment of the Mobile Application Prototype with Elements of Gamification and Mixed Reality Monika Zorko† Matjaž Debevc Ines Kožuh Fakulteta za elektrotehniko, Fakulteta za elektrotehniko, Fakulteta za elektrotehniko, računalništvo in informatiko računalništvo in informatiko računalništvo in informatiko Univerza v Mariboru Univerza v Mariboru Univerza v Mariboru Slovenija Slovenija Slovenija monika.zorko1@student.um.si matjaz.debevc@um.si ines.kozuh@um.si ABSTRACT / POVZETEK prototype and the usability of the user interface. We used the SUS and UEQ method. 80 people were included in the survey by Učinkovito oglaševanje je eden od ključnih ciljev snovalcev random sampling. Statistical analysis revealed three key findings. oglasov in njihovih naročnikov. 
Prav zato si prizadevajo, da An ad that contains mixed reality and gamification stands out njihovi oglasi izstopajo od konkurence, pogosto pa je pri tem slightly from the rest of the advertising method. This type of ad spregledan vidik uporabnika. V raziskovalni študiji smo tako can also increase the level of intent to purchase the advertised izdelali prototip aplikacije, ki vključuje elemente igrifikacije in product. Lastly, the analysis revealed that there is no association mešane resničnosti. Zaradi omejitev osebnih stikov v času between users' age and the understanding of the application. Our pandemije COVID-19 smo izdelali video posnetke, ki so results can serve both advertisers and researchers in the use of modern technologies and advertising. prikazovali uporabo prototipa. Nato smo ocenjevali uporabniško izkušnjo prototipa in uporabnost uporabniškega vmesnika. OPTIONAL: KEYWORDS Uporabili smo SUS in UEQ metodo. S priložnostnim vzorčenjem smo v raziskavo vključili 80 oseb. Statistična analiza je razkrila User experience, usability, advertising, gamification, mixed tri ključne ugotovitve. Oglas, ki vsebuje mešano resničnost in reality igrifikacijo, nekoliko izstopa od ostalega načina oglaševanja. Prav tako lahko taka vrsta oglasa poveča stopnjo namena nakupa 1 UVOD oglaševanega izdelka. Kot zadnje se je pokazalo, da ni povezave med starostjo uporabnika in razumevanjem aplikacije. Naši Vsakodnevno smo izpostavljeni številnim oglasom, kar vodi rezultati lahko služijo tako oglaševalcem, kot tudi raziskovalcem oglaševalce v vse večja vlaganja v zagotavljanje učinkovitosti na področju uporabe sodobnih tehnologij in oglaševanja. oglaševanja in razlikovanja od konkurence. Sodobna tehnologija daje oglaševalcem številne možnosti za inovativne pristope v komuniciranju s ciljnimi javnostmi. Primera takih pristopov sta vpeljava igrifikacije in mešane resničnosti v oglaševanje. 
Oboje KEYWORDS / KLJUČNE BESEDE se je izkazalo kot pozitiven dejavnik v priklicu blagovne znamke uporabniška izkušnja, uporabnost, oglaševanje, igrifikacija, s strani potrošnika [1]. mešana resničnost Namen pričujoče študije je tako raziskati neizkoriščen potencial, ki ga prinaša oglaševanje s pomočjo kombinacije OPTIONAL: ABSTRACT igrifikacije in mešane resničnosti. Natančneje, zanima nas zaznana stopnja vidljivosti oglasa, ki vpeljuje igrifikacijo in Effective advertising is one of the key goals of ad creators and their target groups. This is why they strive to make their ads stand mešano resničnost v zgodbo komuniciranja s potrošnikom. Prav out from the competition, while the user aspect is regularly tako raziskujemo vplive na odločitve za nakup s tovrstnimi oglasi overlooked. In the current study, we thus produced a prototype oglaševanih izdelkov. Ker se v procesu oblikovanja tovrstnih application that includes elements of gamification and mixed oglasov pojavljajo tudi izzivi v smislu zagotavljanja ustrezne reality. Due to the limitations of personal contact during the uporabniške izkušnje in uporabnosti uporabniškega vmesnika, je COVID-19 pandemic, we produced videos showing the use of predmet te študije raziskati tudi to. the prototype. We then evaluated the user experience of the ∗Article Title Footnote needs to be captured as Title Note †Author Footnote to be captured as Author Note 2 IGRIFIKACIJA, RAZŠIRJENA Permission to make digital or hard copies of part or all of this work for personal or RESNIČNOST IN OGLAŠEVANJE classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full Pri igrifikaciji gre za uporabo izkušnje zabave, ki »z notranjo citation on the first page. Copyrights for third-party components of this work must motivacijo in sistemom nagrajevanja uporabnike privlači in jih be honored. For all other uses, contact the owner/author(s). 
vključi v različne aktivnosti« [2]. Typical gamification elements are points, badges, leaderboards, performance graphs, meaningful stories, avatars and teammates [4].

Mixed reality is part of extended reality, where it is classified together with virtual reality and augmented reality. Mixed reality can be described as a technology that lies between a completely real environment and a completely virtual one [5]. It is an environment in which the virtual and the real world are shown on a single display. It requires sufficiently capable hardware, namely a suitable sensor, processor and display.

3 RELATED WORK

Existing research deals intensively with the effectiveness of introducing gamification as an innovative tool in advertising [6]-[8]. Nobre and Ferreira [6] find that gamification makes it possible to co-create a brand in an innovative way and to influence user involvement and the feeling of connection with the brand. Teotónio and Reis [7] find that consumers look for fun, rewards, rivalry and social inclusion, all of which gamification offers. Many researchers [9][10] also study elements of augmented reality in advertising. They find that using such an application increases the interaction between buyer and seller, raises the reputation of the company and, not least, increases product sales [3],[10].

4 METHODOLOGY

4.1 Research questions

We formulated the research questions on the basis of a review of the existing literature.

RQ1: What is the perceived level of ad visibility when gamification and mixed reality are introduced together into a mobile advertising application?
We posed the first research question because introducing an interactive game into advertising has proved to be an effective advertising method that users find interesting [8].

RQ2: What is the perceived level of the decision to purchase a product when gamification and mixed reality are introduced together into a mobile advertising application?
We posed the second research question because past research has shown that mixed reality can increase the rate of information transfer and collaboration and speed up decision making [9]. Research also shows that using an application with augmented reality increases the interaction between buyer and seller, raises the reputation of the company and increases product sales [10].

RQ3: Does the age of the user affect the understanding of how to use a mobile advertising application that includes gamification and mixed reality?
We posed the last research question because one study [11] found a difference in the understanding of augmented reality applications between pupils in higher grades and university students. The main difference appeared in the way of thinking, in experience and in the approach to problem solving.

4.2 Development of the application prototype and demonstration videos

We built the prototype with Adobe XD. Based on the prototype we created three videos, each showing one usage scenario of the application. The first video showed the basic operation of the application and the first use case: shopping in a shopping centre and catching a desired discount. The second video showed the push notification that the user receives when a new discount becomes available for a product they want. The third video showed a time-limited discount, which the user must catch within a given period, store among their discounts and redeem on the next visit to the selected store. Figure 1 shows a screenshot from the demonstration video of this last case.

Figure 1: Screenshot from the demonstration video.

4.3 Sampling and study participants

The conditions for including participants in the study were as follows:
- persons older than 18,
- persons who shop (at least once a month) in at least one retail chain,
- persons who own a smartphone.

We used convenience sampling to recruit potential participants. 33 participants took part in the study.

4.4 Measurement instrument

The measurement instrument was an online questionnaire consisting of three parts. The first part was more general: it contained questions about the application itself and one question related to each of the research questions. For RQ3 we linked the related question to a demographic question about the age of the user. The second part measured user experience with the User Experience Questionnaire (UEQ) [12]. The third part measured the usability of the user interface with the System Usability Scale (SUS) [13].

4.5 Study procedure

In planning and conducting the study we followed the European Code of Conduct for Research Integrity, thereby committing ourselves to the principle of respect for study participants [14]. We also observed the principles of the Code of Ethics and Integrity for Researchers at the University of Maribor (Univerza v Mariboru, 2014 - 2020), the code of the American Psychological Association and the code of the Association of Internet Researchers. We also complied with the Personal Data Protection Act [15].

Because of the restrictions on personal contact during the COVID-19 pandemic, the prototype was tested remotely. We sent participants an e-mail with instructions for the test. Participants watched the three videos in a predetermined order and then completed the three online questionnaires.

4.6 Statistical analysis

We used descriptive statistics to analyse the data collected about the participants, and both descriptive and inferential statistics to analyse the data used to answer the research questions. More precisely, we answered the first two research questions with descriptive statistics and the last one with the non-parametric Kruskal-Wallis H test. The data were analysed with IBM SPSS Statistics.

4.7 Results

The first research question asked what the perceived level of ad visibility is when gamification and mixed reality are introduced together into a mobile advertising application. Descriptive statistics showed that 51.5 % of all participants think the ad would stand out somewhat, and 27.3 % think it would stand out a lot. More than half of the participants thus think that an ad prepared in the way shown in the videos would stand out somewhat from other forms of advertising.

The second research question asked what the perceived level of the decision to purchase a product is when gamification and mixed reality are introduced together into a mobile advertising application. We asked participants to what extent the presented application would motivate them to buy a particular advertised product [3]. Participants could choose among five answers (1: using it would strongly motivate me to buy, 5: using it would not motivate me at all). 57.6 % of participants think that using the application would motivate them somewhat to buy, and 30.3 % say it would motivate them neither more nor less [3]. More than half of the participants thus think that using the application would motivate them somewhat to buy.

The third research question asked whether the age of the user affects the understanding of how to use a mobile advertising application that includes gamification and mixed reality. We divided users into four age groups: 1: 18 to 29 years (19 users), 2: 30 to 49 years (8 users), 3: 50 to 64 years (6 users), 4: over 65 years (1 user). Given the representation in each age group, we included the first three groups in the analysis. The Kruskal-Wallis H test gave a statistically non-significant result, p > .05. We can therefore conclude that the age of the user does not affect the understanding of an application that contains gamification and mixed reality [3].

We also evaluated the user experience of the prototype with the UEQ questionnaire, in which participants rated their experience on 26 pairs of opposing attributes. Table 1 shows the results on the UEQ scales. The scale ranges from -3 to 3, where 3 denotes an extremely good application.

Table 1: Results on the UEQ scales

Scale            Mean    Variance
Attractiveness   1.482   0.93
Perspicuity      1.508   1.18
Efficiency       1.076   0.78
Dependability    0.765   0.65
Stimulation      0.886   0.84
Novelty          1.326   0.86

Finally, we evaluated the usability of the user interface with the SUS method. The SUS score is expressed on a scale from 0 to 100. In our case the mean SUS score was 71.52. Participants thus rated the user interface of the prototype as good. Figure 2 shows the results of the SUS evaluation.

Figure 2: Results of the SUS evaluation (x axis: participant number; y axis: SUS score, 0 to 100).

5 DISCUSSION AND CONCLUSION

The findings of this study agree with those of previous research. For example, our results support the results of a previous study [8], whose authors find that this kind of promotion is interesting for the user. Our results also support the results of other studies [9][10], which find that the use of mixed or augmented reality raises the user's interest in buying.

The limitations of this study lie in how the prototype was tested: it was not tested at a single location with multiple participants.

ACKNOWLEDGMENTS

We thank all participants, without whom the study could not have been carried out.

REFERENCES

[1] Javornik A. Classifications of Augmented Reality Uses in Marketing. IEEE International Symposium on Mixed and Augmented Reality 2014 Media, Art, Social Science, Humanities and Design Proceedings. 10 - 12 September 2014, Munich, Germany, (2014).
[2] Deterding S., Sicart M., Nacke L., O’Hara K., Dixon D. Gamification. Using game-design elements in non-gaming contexts. International Conference on Human Factors in Computing Systems, CHI 2011, Extended Abstracts Volume, Vancouver, BC, Canada, May 7-12, 2011. (2011), 2425–2428.
[3] Zorko, M. (2020). Združevanje igrifikacije in mešane resničnosti v sodobnem oglaševanju. Fakulteta za elektrotehniko, računalništvo in informatiko. Univerza v Mariboru.
[4] Sailer, M., Hense, J.U., Mayr, S.K., Mandl, H. How gamification motivates: an experimental study of the effects of specific game design elements on psychological need satisfaction. Comput. Hum. Behav. 69, (2017), 371–380.
[5] Moser, T., Hohlagschwandtner, M., Kormann-Hainzl, G., Pölzlbauer, S., Wolfartsberger, J. Mixed Reality Applications in Industry: Challenges and Research Areas. SWQD 2019, LNBIP 338, (2019), 95–105.
[6] Nobre, H., & Ferreira, A. Gamification as a platform for brand co-creation experiences. Journal of Brand Management, 24(4), (2017), 349–361.
[7] Teotónio, N., & Reis, J. L. The Gamification Systems Application Elements in the Marketing Perspective. Trends and Advances in Information Systems and Technologies, (2018), 77–87.
[8] Järvinen, S., Peltola, J., & Kemppi, P. Sensor Ball Raffle – Gamification of Billboard Advertising: How to Engage the Audience? Lecture Notes in Computer Science, (2018), 164–174.
[9] Whiskard, H., Jones, D., Voller, S., Snider, C., Gopsill, J., Hicks, B. Mixed Reality Tools as an Enabler for Improving Operation and Maintenance in Small and Medium Enterprises. IFIP Advances in Information and Communication Technology, (2018), 3–14.
[10] Gallardo, C., Rodríguez, S. P., Chango, I. E., Quevedo, W. X., Santana, J., Acosta, A. G., Andaluz, V. H. Augmented Reality as a New Marketing Strategy. Augmented Reality, Virtual Reality, and Computer Graphics, (2018), 351–362.
[11] Klautke H., Bell J., Freer D., Cheng C., Cain W. Bridging the Gulfs: Modifying an Educational Augmented Reality App to Account for Target Users’ Age Differences. Springer Nature 2018, (2018), 185–195.
[12] Hinderks, A., Schrepp, M., & Thomaschewski, J. (2018). UEQ-User Experience Questionnaire. Retrieved September, 12, 2018.
[13] Blažica, B., & Lewis, J. R. A Slovene Translation of the System Usability Scale: The SUS-SI. International Journal of Human-Computer Interaction, 31(2), (2015), 112–117.
[14] Evropska znanstvena fundacija. 2011. Evropski kodeks ravnanja za ohranjanje raziskovalne poštenosti. Dostopno na: https://www.arrs.si/sl/analize/publ/inc/Evropski_kodeks_raziskovalne_postenosti.pdf
[15] ZVOP-1. Uradni list RS, št. 94/2007.

Information Society 2020, 5–9 October 2020, Ljubljana, Slovenia. © 2020 Copyright held by the owner/author(s).
StreetGamez: detection of feet movements on the projected gaming surface on the floor

Peter Škrlj, peter.skrlj@student.upr.si, Univerza na Primorskem, UP FAMNIT, Koper, Slovenija
Mark Lochrie, mlochrie@uclan.ac.uk, Media Innovation Studio, University of Central Lancashire, Preston, UK
Matjaž Kljun, matjaz.kljun@famnit.upr.si, Univerza na Primorskem, UP FAMNIT, Koper, Slovenija; Fakulteta za Informacijske Študije, Novo mesto, Slovenija
Klen Čopič Pucihar, klen.copic@famnit.upr.si, Univerza na Primorskem, UP FAMNIT, Koper, Slovenija; Fakulteta za Informacijske Študije, Novo mesto, Slovenija

ABSTRACT

We implemented a software solution for a video game platform that is capable of detecting the movement of players' feet on the floor. The solution is part of a wider project that uses a drone as a platform to project the game board on the floor and to track the movements and scores of different players. The whole system is composed of a drone, a mini projector, a depth camera and a computational device for running the software. For the latter two we used Google Tango to run spatial recognition, detect 3D shapes and obtain the device's orientation in space. The system was implemented to the point where it can detect the players' feet, transform the detected feet to a gaming surface and correct the projection distortion.

KEYWORDS

exergaming, human-drone interaction, drones, pervasive computing

1 INTRODUCTION

Exercise games, or exergames, can be divided into three categories: location-based games (e.g. [5]), games with motion tracking (e.g. [6]) and projection-based games (e.g. [3]). In [4] we proposed a new gaming concept that combines projection-based games with drones and user tracking, creating a novel gaming platform that (i) is independent of location and (ii) offers new gaming abilities that can facilitate various types of novel street and chalk games. In this paper we present the software solution for the proposed game platform, which is capable of detecting the movement of players' feet on the floor.

Figure 1: Complete detection system with projection.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2020, 5–9 October, 2020, Ljubljana, Slovenia
© 2020 Copyright held by the owner/author(s).

2 SYSTEM DESIGN

The minimal set of functional requirements was:
(i) Track players' feet on a projected grid where each grid unit measures 30x30 cm, to support games such as "whack-a-mole". A particular unit activates when a player steps on it.
(ii) Provide quick feedback whilst correctly detecting fast movements, which is vital for an exergaming platform.
(iii) The projected surface should always be mapped as a rectangle. To avoid accidents, the drone should hover at the side of the projected surface; the projection would in this case be distorted and should be corrected.
(iv) The platform should support multiple players to increase motivation, an important element of exergaming.

We decided to use the Google Tango device, which is capable of detecting players' movement and controlling the projection. A variety of other devices could be used to achieve the same. However, at the time when the implementation began (2016), this was one of the rare devices with such functionalities that was also light enough to be carried by a drone.
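Requirement (i) can be illustrated with a minimal Python sketch. It is not taken from the Unity implementation described in this paper: it assumes that a foot position is already available as floor coordinates in metres and that the grid is axis-aligned and centred on a known point. The function and parameter names are hypothetical, and the 6 x 6 default is the grid size mentioned later in the paper.

```python
import math


def cell_for_position(x, z, center=(0.0, 0.0), cell_size=0.30, cols=6, rows=6):
    """Return (row, col) of the grid unit under floor point (x, z), or None.

    Assumption: x and z are floor-plane coordinates in metres and the grid
    is axis-aligned and centred on `center`.
    """
    left = center[0] - cols * cell_size / 2.0
    near = center[1] - rows * cell_size / 2.0
    col = math.floor((x - left) / cell_size)
    row = math.floor((z - near) / cell_size)
    if 0 <= col < cols and 0 <= row < rows:
        return row, col
    return None  # the point lies outside the projected grid


def active_cells(feet, **grid):
    """Set of grid units that currently have a foot on them."""
    cells = (cell_for_position(x, z, **grid) for x, z in feet)
    return {cell for cell in cells if cell is not None}
```

A game loop could call `active_cells` once per detection frame and compare the result with the previous frame to find newly activated units.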
For implementation and testing of the software solution we planned to use the device in a static environment, placed 2 m above the floor and facing the playing field at an angle of 45-70 degrees.

Google Tango integrates three main functionalities. Motion tracking follows the device by using visual features of its surroundings in combination with the accelerometer and gyroscope. Area learning records the visual features and measures distances. Depth perception scans the room and builds a point cloud image of it. From this point cloud, room meshes can be made and used as 3D models for further processing. This feature is of particular interest to us, as we planned to utilise the depth cloud to detect players' movement over the ground plane.

To appeal to a wider community of game developers, we decided to use Unity together with Google Tango's SDK to obtain the callbacks and events from the C library used for processing signals in the Tango device itself.

For projection we used the 200 lumen ASUS S3, connected to Google Tango via a mini HDMI port. The S3 has a wide projection angle and can project a large playing area from a relatively short distance. The projected image has a trapezoidal distortion, called keystone distortion, caused by the projector projecting at an angle to the projection surface.

3 PROTOTYPE IMPLEMENTATION

The software is built of four components: (i) floor plane detection: detecting the ground plane and initialisation; (ii) point cloud processing and player detection: searching for the position of players' feet using information from the depth camera; (iii) RGB optimization: player identification and optimisation of tracking performance; and (iv) rendering with projector alignment correction: removing perspective distortions from the projection.

Floor plane detection

There are three common methods for generating depth information: the stereo method using two cameras, time of flight (casting rays into the space and timing the bounces) and structured light. Tango uses the latter: an IR projector beams a grid pattern of dots in which each sample group of dots is uniquely identifiable. This way the IR projector and IR camera are able to determine the exact position of the detected point group.

The first step of tracking players is to estimate where the ground plane lies. This is done by the floor plane detection algorithm. After obtaining point cloud data, we start by mapping points into buckets, so that each bucket holds points whose Y values deviate only slightly from one another. At each new point cloud frame the points are added to the buckets, and once the threshold is reached, the algorithm marks that Y coordinate as the ground plane. Since the Tango device can localise itself in space, the ground plane needs to be detected only once, at the initialisation stage.

Point cloud processing

Once we know the position of the ground plane, we move to point cloud processing, which starts by obtaining point cloud data. A simple min-max filter on the Y axis is then applied to isolate 3D points that are likely to be feet. We set the filtering threshold to a distance of 20 cm from the ground plane. Points that fall outside this threshold are discarded. The results of the filtering can be seen in Figure 2.

Figure 2: Left: green points are on the floor level, orange points are objects within the 20 cm threshold; Right: filtered image after min-max filtering.

After filtering, the remaining points are grouped into spherical geometric shapes. This is done by processing every point and trying to fit it into a nearby sphere. The size of the sphere was manually set to a default diameter of 30 cm. To simplify the grouping process, we ignore the Y coordinate of the feedback points, which places all of them on a single plane. On this 2D image we can generate distinct groups with a simple grouping algorithm. Our algorithm starts by randomly selecting a feedback point. Then we check whether any group is defined within the threshold proximity of this point. If not, we create a new point group, set its rank to 1 and set the location of the group to this point. If the point lies within the diameter of an existing group, it is added to the nearest one. The group position is then updated by a weighted average:

\[
\mathrm{GroupPos} \leftarrow \frac{N_{items} \times \mathrm{GroupPos} + \mathrm{FeedbackPosition}}{N_{items} + 1}
\]

After all the points are processed, the 2D group coordinates are transformed back to 3D coordinates by averaging the Y coordinate of the group's points. At this stage we remove groups that consist of an insufficient number of detected points. This value can be changed through the game engine configuration. The result of this step is an averaged group of strong feedbacks (Figure 3).
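The floor detection, height filtering and grouping steps described under "Floor plane detection" and "Point cloud processing" can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the Unity/Tango code of the prototype: points are `(x, y, z)` tuples in metres with Y pointing up, the most populated height bucket is taken to be the floor, a group accepts points within half the sphere diameter of its centre, and points are visited in list order rather than randomly. All names, the bucket size, the hit threshold and the minimum height are hypothetical.

```python
import math
from collections import Counter


class FloorDetector:
    """Accumulate point heights in buckets until one bucket reaches a threshold."""

    def __init__(self, bucket_size=0.02, min_hits=500):
        # bucket_size and min_hits are illustrative values, not from the paper.
        self.bucket_size = bucket_size
        self.min_hits = min_hits
        self.hits = Counter()

    def add_frame(self, points):
        """Add one point cloud frame; return the floor height once known, else None."""
        for _, y, _ in points:
            self.hits[math.floor(y / self.bucket_size)] += 1
        if not self.hits:
            return None
        bucket, count = self.hits.most_common(1)[0]
        if count < self.min_hits:
            return None
        return (bucket + 0.5) * self.bucket_size  # centre of the winning bucket


def filter_feet(points, floor_y, min_height=0.02, max_height=0.20):
    """Min-max filter on Y: keep points at most 20 cm above the floor.

    min_height (an assumption) drops points that belong to the floor itself.
    """
    return [p for p in points if min_height <= p[1] - floor_y <= max_height]


def group_points(points, diameter=0.30, min_points=3):
    """Greedy grouping on the floor plane with a running-average group position."""
    radius = diameter / 2.0
    groups = []  # each group: x, z (centre), y_sum, n (number of points)
    for x, y, z in points:
        nearest, best = None, radius
        for g in groups:
            d = math.hypot(x - g["x"], z - g["z"])
            if d <= best:
                nearest, best = g, d
        if nearest is None:
            groups.append({"x": x, "z": z, "y_sum": y, "n": 1})
        else:
            n = nearest["n"]
            # GroupPos <- (N * GroupPos + FeedbackPosition) / (N + 1)
            nearest["x"] = (n * nearest["x"] + x) / (n + 1)
            nearest["z"] = (n * nearest["z"] + z) / (n + 1)
            nearest["y_sum"] += y
            nearest["n"] = n + 1
    # Back to 3D: the group's Y is the average Y of its points; weak groups are dropped.
    return [(g["x"], g["y_sum"] / g["n"], g["z"]) for g in groups if g["n"] >= min_points]
```

In use, `FloorDetector.add_frame` would be called on incoming frames until it returns a height, after which each frame passes through `filter_feet` and `group_points` to yield one 3D position per detected foot.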
Figure 3: Groups are rendered back to the scene in the form of spheres that cover a certain area in the virtual world.

RGB optimization

Since the Tango depth camera has a relatively low refresh rate of 10 Hz, we planned fine-grained tracking by analysing images captured by the RGB camera. The SDK callbacks for obtaining data from the camera varied across different versions of Tango Core. In Ikariotikos (Version 1.54, June 2017), an event needs to be registered that signals when a new camera image has been rendered to the buffer and is available for reading. Unfortunately, we were not able to obtain the RGB stream whilst the depth camera was in operation. The reason for this is still not fully understood, and the lack of documentation made it impossible to find a solution within the timeframe of this project. Nevertheless, we present the intended approach for optimising player tracking using colour detection.

Figure 4: Pinhole camera model showing how a 3D point is transformed to a 2D image.

We planned to perform the colour-based tracking within the regions detected by the depth camera whilst waiting for its next frame. The first step was to transform the detected 3D points from the depth cloud to the screen coordinate system, in a step we call "Transformation of points to view port frame". This would allow us to create a mask with regions of interest. Such a transformation can be done using a pinhole camera model (Figure 4) [2]. The projection of the 3D point cloud to the screen can therefore be formulated as

\[
s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}_{pixel} = K \times \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_{3DPoint},
\qquad
K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\]

where (i) x, y is the location of the point in the image coordinate system, obtained after dividing by the scale factor s, (ii) X, Y, Z is the location of the point in the world coordinate system in which the data is provided, and (iii) K is the matrix of intrinsic camera parameters, written here in its standard form with focal lengths f_x, f_y and principal point (c_x, c_y).

To obtain the pixel coordinates within the view port frame, we need to multiply the 3D point with the intrinsic camera parameters. These are defined through the camera calibration process done by the developer of the device, which results in the coordinate system of the depth camera being almost perfectly aligned with the device screen. As such, we can ignore extrinsic camera parameters. After we receive the 2D points, we clip the point groups into detection masks for the next step, which we call "Use the circles as masks to fine track Colour image". In practice we map a 3D vector to the camera view port with the Unity WorldToViewPortPoint method call. To proceed, we need to map the colour image to the mask by scaling the 2D point so that it corresponds with the captured image. We are then able to cut the detection area from the colour image. To allow the performance of the detection to be adjusted, the size of the detection square can be changed via the GUI.

Figure 5: Concept sketch of detecting players' feet in an RGB image with a higher refresh rate.

Using the mask, the segments of the detection image are cropped out (Figure 5) and colour group detection is run over them using the OpenCV contour finding method, allowing us to filter the colour groups and detect their centre and radius [1]. After detecting 2D point groups, we apply Unity methods to transform the 2D location on the image to world coordinates. Since our 3D point detection also detects the playing plane, it is possible to calculate the correct point of contact. Mapping the detected centres of the feedback back to the detected floor is an easy task. The points are transformed with the inverse of the mapping of a 3D point to the 2D screen space. We simply raycast the screen coordinate of the point to the floor and obtain the group position.

Rendering

Projector alignment correction. Projecting an image onto a surface that is not perpendicular to the light source produces a distorted image, commonly called the keystone effect. This distortion can be approximated by cos(ε − α/2)/cos(ε + α/2), where ε is the angle of the surface being projected on and α is the width of the focus. Because the projector is mounted in the same space as the Tango device, and the device is spatially aware of its orientation with respect to the ground plane, we can calculate the required adjustments to the projected image and project a proper square onto the floor. Two rotations that cause the keystone effect are the rotations Rz and Rx.
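The keystone approximation quoted above, cos(ε − α/2)/cos(ε + α/2), is easy to evaluate numerically. The short Python sketch below is only an illustration of that formula; the function name and the choice of degrees as the input unit are assumptions, and it says nothing about how the prototype itself computes the correction.

```python
import math


def keystone_ratio(epsilon_deg, alpha_deg):
    """Evaluate cos(eps - alpha/2) / cos(eps + alpha/2) for angles in degrees.

    epsilon_deg: tilt of the projection surface; alpha_deg: width of the focus.
    """
    eps = math.radians(epsilon_deg)
    half = math.radians(alpha_deg) / 2.0
    denom = math.cos(eps + half)
    if denom <= 0.0:
        # Beyond this point the far edge of the beam no longer hits the surface.
        raise ValueError("epsilon + alpha/2 must stay below 90 degrees")
    return math.cos(eps - half) / denom
```

The ratio is 1 when ε = 0 (no tilt, no distortion) and grows as the surface is tilted, which matches the intuition that the hovering position at the side of the playing field makes correction necessary.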
at the players, the drone will need to be equipped with a gimbal maintaining the rotation Rx. To successfully correct 4 PROTOTYPE GAME the distortion caused by Rz we need to know the parameters Implementing the above, a simple game was created. The of the projector’s field of view (FOV), lense parameters and detection runs at 10-15 FPS with some lag spikes that addi- Rz in relation to the ground surface. tionally occur because of point cloud detection instability. We created a virtual scene with a single texture we call a Once the system is initialised (the ground plane is recognised) Render Texture and a virtual camera to which we assign FOV it starts tracking feet. Where these are tracked green spheres and aspect ratio that matches our projector. We place the are rendered and segments of the checker box pattern that virtual camera at a fixed distance and rotate the plane around intersect with the spares are coloured in green (see Figure 1). the z-axis in the opposite direction of tilt detected by the The playing area in the figure is of size 1,7x1,7 m, meaning Tango tracking system. In this way we render graphics where that the projected squares were approximately 27x27 cm. perspective distortions from rotation around the z-axis are The area could be increased by putting the projector and removed as seen in Figure 6. Tango further away. Because the playing field was relatively small, only 2 players could be on it at the same time. The player tracking would fail if there would be more players because of the excessive density of the detected points. 5 CONCLUSIONS It is important to note that the system is currently limited to projections on horizontal planar surfaces. The optimi- sation utilising colour tracking of players feet needs to be implemented. Thus, in order to support multiplayer games a unique footwear colour is required for each player. 
Despite these limitations and the fact that the Tango platform has Figure 6: Example of perspective mapping of square onto a flat surface. The internal camera mimics projectors field of been deprecated and integrated into Google ARCore, the view and inverts the projection angle. concepts presented can be utilised for a solution using an- other platform. More information on the system is available in [7]. This solution only corrects for one rotation, but as Tango device is capable of 6 DOF camera pose tracking, the rotation REFERENCES around x could be accounted for. We could also use tracking [1] Pablo Arbelaez, Michael Maire, Charless Fowlkes, and Jitendra Malik. information to fix the playing field onto a position in the real 2010. Contour detection and hierarchical image segmentation. IEEE world. The playing field would thus stay at the same place transactions on pattern analysis and machine intelligence 33, 5 (2010), regardless of the position and orientation of the drone. A 898–916. more advanced solution would be to use inverse transforma- [2] A Kaehler and G Bradsk. 2013. Computer Vision in C++ with the OpenCV Library. O’Relly (2013). tion using game shaders or other transformations possible [3] Raine Kajastila, Leo Holsti, and Perttu Hämäläinen. 2016. The aug- in Unity. A possible approach would be to apply the correct mented climbing wall: High-exertion proximity interaction on a wall- inverse trapezoid transformation to the image received from sized interactive surface. In CHI ’16. 758–769. Unity. [4] Matjaž Kljun, Klen Čopič Pucihar, Mark Lochrie, and Paul Egglestone. 2015. Streetgamez: A moving projector platform for projected street Mapping feedbacks to a 6x6 playing plane. In the initialization games. In CHI in Play ’17. 589–594. step, a ray is casted from the centre of the camera to the [5] Kate Lund, Paul Coulton, and Andrew Wilson. 2011. Free All Monsters! detected floor plane. 
The intersection of the ray and the a context-aware location based game. In MobileHCI ’13. 675–678. [6] Emily CS Murphy, Linda Carson, William Neal, Christine Baylis, David plane represents the centre of the detection matrix. Its centre Donley, and Rachel Yeater. 2009. Effects of an exercise intervention point is used for syncing the display grid with the detection using Dance Dance Revolution on endothelial function and other risk grid. The latter is defined in the engine with default values factors in overweight children. Internat. Jour. of Pediatric Obesity 4, 4 of 6 columns and 6 rows. This setting can be additionally (2009), 205–214. adjusted to allow more precise feedbacks. However, this may [7] Peter Škrlj. 2017. StreetGamez, a moving projector platform for games. Master’s thesis. University of Primorska, 6000 Koper. cause performance issues. After the grid is initialised, its fields are updated according to the sphere positions that 40 Anamorfična projekcija na poljubno neravno površino Anamorphic projection on an arbitrary uneven surface Rok Cej Franc Solina rokcej1997@gmail.com franc.solina@fri.uni- lj.si Laboratorij za računalniški vid Fakulteta za računalništvo in informatiko, Univerza v Ljubljani, Večna pot 113 1000 Ljubljana, Slovenia POVZETEK anamorfoze, je ta smer pogleda lahko bolj ali manj natančno določena. Razvili smo metodo, ki omogoča anamorfično projekcijo na ne- ravno, razbrazdano površino. Sliko, ki jo projeciramo v tem pri- meru, ni dovolj le v celoti perspektivno deformirati. Neravna po- 1.1 Vrste anamorfoz vršina je namreč sestavljena iz velikega števila majhnih ploskev Anamorfozo so odkrili v času renesanse, ko so umetniki in znan- različnih orientacij in za vsako od teh ploskev bi morali izraču- stveniki odkrivali zakone perspektive [2, 3]. Prva vrsta anamor-nati ustrezno perspektivno deformacijo. To najlažje storimo tako, foze, ki so jo uporabljali, je bila perspektivna anamorfoza. 
Per- da za vsak slikovni element projecirane slike izračunamo ustre- spektivno deformirana podoba je naslikana na ravno ploskev. Da zno deformacijo. To pa zahteva, da imamo 3D model površine, bi se ta anamorfična podoba razkrila, jo je potrebno pogledati z na katero se slika projecira, kar pridobimo s pomočjo senzorja določenega zornega kota, običajno je to dokaj oster kot glede na “Kinect”. ravnino, ki nosi deformirano podobo (Slika 1). Katoprične ali zrcalne anamorfoze za razkritje prave po- KLJUČNE BESEDE dobe potrebujejo ogledalo, običajno cilindrične ali konične oblike. Anamorfoza, Kinect, globinski senzor, optična iluzija Če tako ogledalo postavimo na pravo mesto, se deformirana po- doba razkrije kot odsev v ogledalu (Slika 2). ABSTRACT Med anamorfične upodobitve štejemo tudi iluzionistično sli- karstvo, kjer lahko na predvidenem mestu opazovanja prido- This report describes the creation of a distorted image or video bimo izrazit občutek prostorske dimenzije. V umetnostni zgodo- that looks perfect when projected onto a given uneven surface vini so znane predvsem poslikave stropov, kjer se nam dozdeva, and viewed from a predetermined angle. It utilizes the depth da se prostor odpira proti nebu (Slika 3), danes pa podoben pro-sensor Kinect and a projector. The program is written in C++ storski učinek uporabljajo potujoči umetniki, ki s kredo rišejo and it starts off by recreating the projection surface in 3D. It then podobe na ulicah (Slika 4). uses the surface model to create an anamorphic projection. If Sodobni umetniki, kot je npr. švicarski slikar Felice Varini [11], the Kinect and the projector are properly aligned, the projected anamorfozo uporabljajo pri poslikavi notranjih prostorov ali celih image or video creates an anamorphic illusion in real life. urbanih scen tako, da se z določenega zornega kota razkrije nek KEYWORDS pravilen geometrijski vzorec, kot da bi lebdel v prostoru (Slika 5). 
Anamorfični princip se uporablja tudi pri slikanju prometnih Anamorphosis, Kinect, depth sensor, optical illusion označb na cestišča, da bi bila bolj jasno berljiva in razločna pod ostrim kotom opazovanja, kot ga imajo vozniki in drugi ude- 1 UVOD leženci v prometu. Tudi razni reklamni napisi, ki jih pravilno vidimo v zrcalih ali pod določenim kotom opazovanja sodijo v Ljudje lahko dokaj zanesljivo interpretiramo slike, ki jih ne gle- kategorijo anamorfičnih poslikav. damo frontalno, ampak pod določenim kotom, saj zna naš za- S pojavom multimedijske tehnologije se je pojavila možnost, znavni sistem podzavestno razstaviti informacijo na vsebino slike da za prikaz anamorfičnih upodobitev uporabimo video projek- in na njeno perspektivno deformacijo. Še posebej dobro ta princip cijo. Na primer, reklamne napise je možno perspektivno deformi- deluje, če lahko zanesljivo zaznamo, kako je slikovna ploskev rati, tako da njihova projekcija iz notranjosti trgovin na pločnik orientirana v prostoru. Pri tem igra pomembno vlogo tudi kohe- pred trgovino ni deformirana in je zato lažje berljiva. renca med premikanjem opazovalca in perspektivno deforma- V Laboratoriju za računalniški vid smo celo razvili princip di- cijo. Majhen premik opazovalca povzroči le majhno spremembo namične anamorfoze, ki perspektivno deformacijo projecirane perspektivne deformacije. Pri anamorfičnih slikah pa ta kohe- slike stalno stalno prilagajajo poziciji opazovalca, tako da je z renca ne obstaja. Anamorfična podoba se tipično razkrije le iz opazovalčevega zornega kota slika stalno izgleda nedeformirana točno določene smeri opazovalčevega pogleda. Odvisno od vrste oziroma tako, kot če bi jo gledali frontalno [8]. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or 2 MOTIVACIJA distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. 
If we use a video projector, the projected image is free of any perspective deformation only when it is viewed exactly from the point of projection. Because the projector has physical dimensions, this is of course not possible in practice, so the projected image that we view is always slightly deformed. As explained in the introduction, this is usually not a problem, since human perception easily separates the information in an image from a moderate perspective deformation of that image. If, however, the angle between the projection axis and our viewing direction is very large, the image can become difficult to interpret. In anamorphosis this is exactly how we want to hide the true meaning of an image, or at least of part of it. Interpretation becomes even harder when the projection surface is not flat. Our research question is therefore: can the projected image be deformed in advance so that it appears undeformed from a predetermined viewpoint, regardless of the shape of the surface onto which it is projected? In other words, how can we compute the inverse anamorphic deformation of an image so that it looks correct on an arbitrary uneven surface?

Even for ordinary perspective anamorphosis we must know how the image plane is oriented in space. To project an image onto an arbitrary uneven surface, we need a 3D model of that surface. Modern technology offers many ways of capturing 3D shapes. The Microsoft Kinect sensor is affordable and adequate for our needs. In our laboratory we have already used Kinect to capture a 3D surface in the related project Light Fountain (Svetlobni vodnjak) [9], where we added a virtual dimension to a classical stone sculpture in the form of trickling water drops, projected with a video projector as points of light [10].

Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). Information Society 2020, 5–9 October 2020, Ljubljana, Slovenia. © 2020 Copyright held by the owner/author(s).

Figure 1: One of the best-known paintings in art history that depicts perspective anamorphosis is The Ambassadors by Hans Holbein from 1533. The skull, which in the frontal view (left) appears as an elliptical smear at the bottom centre of the painting, is revealed as a skull (right) when viewed from the upper right (centre). Artists typically used such extreme distortion to hide controversial elements in a painting (source: Wikimedia Commons).

Figure 2: Mirror anamorphosis: a distorted 3D sculpture is revealed as a frog in the reflection of a cylindrical mirror (author: Jonty Hurwitz, source: Wikimedia Commons).

Figure 3: An example of illusionistic ceiling painting in the Stara grofija (Old Counts' Mansion) in Celje, dated to the transition from the Renaissance to the early Baroque (source: Wikimedia Commons).

Figure 4: Perspective anamorphosis in street painting (author: Julian Beever, 1990s). On the left a pronounced spatial effect is visible; viewed from the opposite side, one sees how distorted the actual image on the pavement is, most markedly the bather's leg, which in the 3D illusion extends furthest out of the image plane (source: Wikimedia Commons).

Figure 5: A planar graphic superimposed on a varied urban scene is fully revealed only from one precisely determined viewpoint: Felice Varini, Port de St-Nazaire, France, for the exhibition “Estuaire 2007” (source: Wikimedia Commons).

3 RELATED WORK

At first glance our goal most closely resembles techniques that use video projection onto 3D objects (projection mapping [12]) to create augmented reality and thus enable an entirely new, additional dimension of perception, even for moving objects, e.g. [4]. Our problem, however, differs from the above in two essential respects:

(1) We do not need to align the video projection with a predetermined 3D shape or object. A complex geometric calibration between the video projection and the 3D surface onto which we project, and whose shape is captured by the depth sensor, is therefore not required [5].
(2) In most video-based augmented reality systems the user's viewing direction is roughly aligned with the direction of projection, so the need for, or the occurrence of, perspective anamorphosis does not even arise, although some systems track the user's position and correct the perspective deformation of the projection accordingly [6].

Each depth value represents a distance expressed in millimetres. If we map these values to a grayscale image, we obtain a depth image in which, in our case, brighter points are farther from the sensor. Points where Kinect could not capture depth are black.

Approximation of missing depth data. Kinect cannot capture depth at every point, either because the point is too far away, because the infrared light that Kinect uses is reflected away by the surface, or because of noise. We determine the missing values by approximation from neighbouring points.

Conversion of the depth image into a 3D point cloud. The values of individual pixels in the depth image are converted into coordinates of 3D points with the following equation:

$$ \vec{position} = depth \cdot \begin{bmatrix} \left( \dfrac{2x}{width - 1} - 1 \right) \tan\left( \dfrac{fov_x}{2} \right) \\ \left( \dfrac{2y}{height - 1} - 1 \right) \tan\left( \dfrac{fov_y}{2} \right) \\ 1 \end{bmatrix} \qquad (1) $$

where:
depth = depth
x, y = index of the point in the depth image
width, height = sensor resolution in the horizontal and vertical directions
fov_x, fov_y = field of view of the Kinect in the horizontal and vertical directions, in radians

Because Kinect also has a colour camera, it can associate the depth points with the corresponding colour values from the colour camera, so these colours can also be assigned to the 3D points.
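The conversion in equation (1) can be sketched in a few lines. This is only an illustration in Python (the authors' software is written in C++); the function names are hypothetical, a depth value of zero is assumed to mark a pixel with no measurement, and the field-of-view values in the usage lines (70.6° × 60°) are figures commonly quoted for the Kinect v2 depth camera rather than values taken from the text.

```python
import math


def depth_to_point(x, y, depth_mm, width, height, fov_x, fov_y):
    """Convert one depth pixel to a 3D point, following equation (1).

    x, y are pixel indices, depth_mm is the measured depth in millimetres,
    fov_x and fov_y are the fields of view in radians.
    """
    # Normalise the pixel index to [-1, 1] across the sensor.
    nx = 2.0 * x / (width - 1) - 1.0
    ny = 2.0 * y / (height - 1) - 1.0
    return (depth_mm * nx * math.tan(fov_x / 2.0),
            depth_mm * ny * math.tan(fov_y / 2.0),
            float(depth_mm))


def depth_image_to_cloud(depth, fov_x, fov_y):
    """Convert a depth image (list of rows, at least 2 x 2) to a list of 3D points."""
    height, width = len(depth), len(depth[0])
    cloud = []
    for y, row in enumerate(depth):
        for x, d in enumerate(row):
            if d == 0:
                continue  # assumption: 0 means "no depth measured"
            cloud.append(depth_to_point(x, y, d, width, height, fov_x, fov_y))
    return cloud


# Usage with assumed Kinect v2 field-of-view values (not from the paper).
FOV_X = math.radians(70.6)
FOV_Y = math.radians(60.0)
# The central pixel of a 3 x 3 image lies on the optical axis.
print(depth_to_point(1, 1, 1000, 3, 3, FOV_X, FOV_Y))  # (0.0, 0.0, 1000.0)
```

Pixels that were filled in by the approximation step would simply carry a non-zero depth and pass through the same conversion.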
Figure 6 shows the coloured 3D point cloud from different viewpoints.

Figure 6: View of the point cloud from different viewpoints.

In commercial systems for video augmented reality, e.g. [7], depth sensors are also integrated, but they mainly serve to segment the scene automatically by distance from the projector, so that time-consuming manual segmentation of the image is no longer needed. We therefore decided to develop our own system for anamorphic projection onto an uneven surface, intended for viewing the projection from a predetermined viewpoint.

4 EQUIPMENT

For anamorphic projection onto an arbitrary uneven surface we need two external devices: a Microsoft Kinect and a video projector. Kinect measures distances between 0.5 m and 4.5 m, which also determines our working space for projecting the anamorphosis.

For speed of execution we developed the image deformation software in C++, although in terms of functionality high-level languages such as Processing and Python would also have been suitable. We used the following libraries:

• OpenGL: application programming interface (API) for graphics
  – GLFW: creating the OpenGL context
  – GLEW: loading OpenGL extensions
  – GLM: matrix and vector operations
• Kinect SDK: API for Kinect
• FFmpeg: decoding video files
• stb_image: reading image files

5 PERSPECTIVE ANAMORPHOSIS ON AN UNEVEN SURFACE

We divided the procedure for inverting the anamorphic deformation of an image into several steps.

Acquiring the depth image. The depth images acquired by Kinect have a resolution of 512 × 424 pixels, and each pixel holds an integer value represented with 16 bits.

Virtual anamorphosis. We first compute a virtual anamorphosis in virtual space before doing the same in real space. We assume a virtual observer who looks in a direction perpendicular to the direction of the projection beam. We then imagine that this observer projects the image onto the uneven projection surface. For the observer this image will look entirely correct, but from the direction of the projector it will be distorted. For each point in the 3D point cloud that represents the projection surface, we compute the direction between the observer and that point and determine where this line pierces the projected image. In this way we determine the correspondence between every point in the 3D point cloud and the matching pixel of the projected image. When the 3D points are assigned the corresponding texture from the image, a distorted image appears in the point cloud; if, however, we look at the cloud from the direction of the virtual observer, we obtain an undistorted image (Figure 7).

Figure 7: Virtual anamorphosis: view from the direction of the projector (left) and view from the direction of the virtual observer (right).

Real anamorphosis. To achieve the same effect in the real world, we must now compute the image that the projector should project so that the observer sees an undistorted image. For each pixel of the projected image we compute the direction in which that pixel is projected in 3D space. We are interested in the intersection of this direction with the projection surface, which is represented as a 3D point cloud. An additional problem is the difference in resolution: the projected image has a much higher resolution than the depth sensor (Kinect) that defines the point cloud. Most pixels of the projected image therefore had no directly corresponding 3D point, and we had to compute an approximate intersection from the four nearest 3D points. For each intersection point we could then, taking into account the position of the virtual observer, link the pixel of the projected image with the corresponding pixel of the source image.

Because this procedure is fairly time-consuming, we used multithreading, since the values of individual pixels in the projected image are determined independently of one another. An example of a projected image computed in this way is shown in Figure 8.

Figure 8: Anamorphosis in the 3D point cloud (left) and the projected anamorphically deformed image (right).

Calibration. Before we capture the 3D model of the projection surface, it must be calibrated with the video projector. We implemented a function that draws a red rectangle on the 3D point cloud, representing the area onto which Kinect expects the image to be projected. The user must then manually adjust the position of the Kinect or the video projector so that the red rectangle is aligned with the projected image.

6 RESULTS AND CONCLUSION

Figure 9 shows a photograph projected horizontally onto an inclined, furrowed stone surface, and a view of this projection from vertically above, where the anamorphosis is revealed: the proportions of the image are the same as in the original photograph. On a powerful personal computer the program runs fast enough to process video in real time as well [1].

Figure 9: Left: original image; centre: image projected onto an inclined, uneven surface; right: view of the projected image from vertically above.

Because of inaccuracies in capturing the depth image, the anamorphic image still contains some inaccuracies, which could be overcome with a more accurate depth sensor. Nevertheless, this method of video projection onto an arbitrary uneven surface can be used for many applications. If the depth image were captured live, which Kinect after all allows, undeformed images and video could also be projected onto moving targets.

ACKNOWLEDGEMENT

The research programme Computer Vision No. P2-0214 (B) was co-financed by the Slovenian Research Agency (Javna agencija za raziskovalno dejavnost Republike Slovenije) from the state budget.

REFERENCES

[1] Rok Cej. Demonstracija anamorfoze na neravno površino (video). 2020. url: http://youtu.be/_eypZlZTRcM (accessed 10 Sep. 2020).
[2] Daniel L Collins. “Anamorphosis and the Eccentric Observer: History, Technique, and Current Practice”. In: Leonardo 25.2 (1992), pp. 179–187.
[3] Daniel L Collins. “Anamorphosis and the Eccentric Observer: Inverted Perspective and Construction of the Gaze”. In: Leonardo 25.1 (1992), pp. 72–82.
[4] Creators. Box. 2013. url: https://youtu.be/lX6JcybgDFo (accessed 24 Sep. 2020).
[5] Anselm Grundhöfer and Daisuke Iwai. “Recent advances in projection mapping algorithms, hardware and applications”. In: Computer Graphics Forum. Vol. 37. 2. Wiley Online Library. 2018, pp. 653–675.
[6] Brett Jones et al. “RoomAlive: Magical Experiences Enabled by Scalable, Adaptive Projector-Camera Units”. In: Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology. 2014, pp. 637–644.
[7] Lightform, Design Tools for Projection. 2020. url: http://lightform.com (accessed 10 Sep. 2020).
[8] Robert Ravnik et al. “Dynamic anamorphosis as a special, computer-generated user interface”. In: Interacting with Computers 26.1 (2014), pp. 46–62.
[9] Franc Solina and Blaž Meden. “Light fountain – a virtually enhanced stone sculpture”. In: Digital Creativity 28.2 (2017), pp. 89–102.
[10] Franc Solina. Light Fountain 2 - Galaxy. 2018. url: http://youtu.be/y6NAiXlNm20 (accessed 10 Sep. 2020).
[11] Wikipedia contributors. Felice Varini - Wikipedia, The Free Encyclopedia. 2020. url: http://en.wikipedia.org/w/index.php?title=Felice_Varini&oldid=953793776 (accessed 10 Sep. 2020).
[12] Wikipedia contributors. Projection mapping - Wikipedia, The Free Encyclopedia. 2020. url: https://en.wikipedia.org/wiki/Projection_mapping (accessed 10 Sep. 2020).

Efficient Representation of Dictionary-Based Language Resources in Speech Technologies (Učinkovita predstavitev slovarskih jezikovnih virov pri govornih tehnologijah)

Jerneja Žganec Gros, Žiga Golob, Simon Dobrišek

Alpineon d.o.o.; Alpineon d.o.o.;
Univerza v Ljubljani, FE

Ulica Iga Grudna 15, 1000 Ljubljana, Slovenija; Ulica Iga Grudna 15, 1000 Ljubljana, Slovenija; Tržaška cesta 25, 1000 Ljubljana, Slovenija

jerneja.gros@alpineon.si; ziga.golob@alpineon.si; simon.dobrisek@fe.uni-lj.si

ABSTRACT

Finite-state transducers are a compact way of representing the pronunciation dictionaries needed for speech synthesis or recognition. This paper presents an extension of finite-state transducers, called finite-state super transducers, with which an extended pronunciation dictionary can be represented with fewer states and transitions than with a minimal deterministic finite-state transducer. A finite-state super transducer remains deterministic, and in addition to the words from the dictionary it can also accept some other, unknown words. The allophone transcriptions emitted for some unknown words may be wrong, but the error turns out to be comparable to that of the current best methods for grapheme-to-allophone conversion.

KEYWORDS

speech technologies, language resources, speech synthesis, pronunciation dictionaries

1 Introduction

Speech-enabled user interfaces allow user-friendly interactive communication, especially in mobile communications. Modern concepts of speech communication systems are in practice being transferred to small portable devices based on embedded systems, which are characterised by limited processing power and memory capacity. For the successful development and use of speech-enabled applications on portable devices, efficient and high-quality components of the spoken dialogue system must be provided, namely successful automatic speech recognition and high-quality, intelligible and natural-sounding speech synthesis.

A review of the scientific literature shows that, when building speech technologies for languages with a large number of inflected forms, the use of procedures developed for English is not efficient (Golob, 2012). Because of the large number of inflected word forms for the same number of lexemes, a Slovenian speech recogniser grows at least tenfold in size compared with a comparable English recogniser. Special attention must therefore be paid to optimising the models used and adapting them to the morphological characteristics of highly inflected languages.

An important part of a speech technology application such as a speech synthesiser is the system for converting the graphemic form of words into an allophone transcription. Automatic allophone transcription in Slovenian is based on a set of context-dependent rules, for which the word stress must be known (Gros and Mihelič, 1999). Automatic determination of word stress for Slovenian words is a demanding task because the stress position is unpredictable (Golob, 2009), so high-quality Slovenian speech synthesis requires extensive pronunciation dictionaries.

A pronunciation dictionary is a mapping from the graphemic forms of words to allophone transcriptions. For highly inflected languages such as Slovenian, dictionaries can contain several million entries, which can make their use problematic in systems with less memory, such as embedded systems. In such cases it is necessary to use methods that allow a memory-efficient representation of dictionaries.

We therefore wanted to find and test efficient methods for reducing redundancy in the representation and computer storage of language resources for highly inflected language groups, which would allow fast, memory-efficient and high-quality conversion of the graphemic form of words into a phonetic transcription and vice versa.
Implementing the representation of lexical language resources in complete speech recognition or synthesis systems on embedded platforms is a non-trivial problem, made harder by the constraints of the hardware used.

Three main methods for memory-efficient representation of pronunciation dictionaries can be found in the literature: numbered finite-state automata (Lucchesi and Kowaltowski, 1993; Daciuk and Piskorski, 2011), prefix trees (Ristov, 2005) and finite-state transducers (končni pretvorniki, hereafter KP) (Mohri, 1994; Golob et al., 2012). In this work we present a new representation based on finite-state super transducers (končni super pretvorniki, hereafter KSP), which are an extension of KPs. Besides giving a smaller dictionary representation than KPs, KSPs can also determine, with high accuracy, the allophone transcription of some unknown words, i.e. words not contained in the original pronunciation dictionary.

In this paper we first briefly present KPs and show how a pronunciation dictionary can be represented with them. We then show that the coverage of inflected forms in the dictionary strongly affects the size of the KP. This is followed by a presentation of KSPs, which are a new way of representing dictionaries. The results are presented on language resources that were upgraded within the OptiLEX project.

2 Finite-state transducers (KP) and the representation of pronunciation dictionaries

A KP consists of states and transitions between states. Each transition has an input and an output label. When an input string appears at the input of the KP, the KP is in its initial state. It then accepts the input symbols one by one. On accepting each input symbol it emits the output string of symbols given by the output label of the corresponding transition and moves to the next state. If, for some input symbol, the current state has no transition whose input label equals that symbol, we say that the KP does not accept the input string. If, after receiving all symbols of the input string, the KP is in a final state, we say that it accepts the input string, and the emitted output string becomes valid. Note also that the input and/or output label may be the empty symbol or string.

A KP that has at most one transition with a given input label in any state is called a deterministic KP. For such KPs the conversion of an input string into an output string is very fast and, with a suitable implementation, depends only on the length of the input string. Another advantage of deterministic KPs is that efficient algorithms exist for their minimisation. This yields the minimal KP, which has the smallest number of transitions and states among all equivalent KPs (Mohri, 1997), i.e. KPs that emit the same output string for any accepted input string.

Not every KP can be determinised, because deterministic KPs have less expressive power than non-deterministic ones (Hellis, 2004). A KP representing a pronunciation dictionary can always be determinised if homographs are removed from the dictionary. Figure 1 shows an example of a minimised and determinised KP (hereafter MDKP) representing a dictionary of four Slovenian words.

Figure 1: Example of a KP representing a pronunciation dictionary for four Slovenian words: hiš, hiša, hiter and hitra. Circles represent states and arrows represent transitions between states. Each transition is labelled with an input and an output label separated by a colon. The initial state is marked with a bold circle and the final states with a double circle.

3 Effect of the size of the pronunciation dictionary on the size of the finite-state transducer (KP)

In this experiment we wanted to examine how the size of the KP depends on the size of the dictionary to be represented. We had at our disposal the SI-PRON dictionary for Slovenian, which contains more than a million distinct entries (Žganec-Gros et al., 2006). We extended the dictionary with additional lexical units developed within the OptiLEX project.

By randomly selecting dictionary entries we built several sub-dictionaries of different sizes and constructed an MDKP for each of them.

As in the experiment carried out on less extensive language resources (Golob et al., 2012), the size of the MDKP peaked at 78% to 83% of the size of the original dictionary. This means that beyond a certain dictionary size the MDKP starts to shrink as new words, i.e. dictionary entries, are added. This observation motivated the development of a new kind of finite-state transducer, which we call finite-state super transducers and describe in Section 4.

To make this effect easier to picture, consider a minimal example that shows the mechanism behind the decrease in MDKP size. Dictionary entries consist of a (key, value) pair. In a pronunciation dictionary the graphemic form is the key and the allophone transcription is the value. As an example, take a sample dictionary whose keys consist of all possible selections of two letters out of three, e.g. the letters a, b and c. This gives 9 different keys: aa, ab, ac, ba, bb, ... For simplicity, let the corresponding values equal the keys. The MDKP for this dictionary is shown in Figure 2.

Figure 2: MDKP for an artificial dictionary whose keys consist of all possible selections of two letters out of three: a, b and c. The values are equal to the keys.

Now remove the entry cc : cc from this dictionary and rebuild the MDKP. The result is shown in Figure 3. We can see that removing an entry from the dictionary has increased the complexity of the MDKP, since representing the dictionary now requires one additional state and two additional transitions. In (Golob et al., 2012) and (Golob et al., 2016) we investigated in more detail the causes of the decrease in MDKP size when new entries are added to the dictionary.

Figure 3: MDKP for the same dictionary as in Figure 2, but without the entry cc : cc.

3.1 Effect of the number of inflected forms on the size of the pronunciation dictionary

We examined how the number of inflected forms of the lemmas in the dictionary affects the size of the MDKP. By the number of inflected forms we mean the number of different inflected forms of a given lemma. As an example we took the word skopati and searched the dictionary for all entries whose graphemic forms are inflected forms of its lemma. We obtained 27 different entries, from which we formed four further sub-dictionaries of different sizes by random selection. We built an MDKP for each sub-dictionary. The results show that the growth rate of the MDKP size decreases slightly as the dictionary grows, but no reversal of the growth trend is observed (Golob et al., 2012).

3.2 Effect of the number of inflected forms on the size of the pronunciation dictionary

Let us now look at how the MDKP size is affected by the coverage of inflected forms in a dictionary composed of several words that inflect similarly.

When all possible inflected forms are present, lemmas that inflect in the same way can fully share the part of the finite-state transducer that converts the endings. Complexity is still increased by words, or lemmas, whose inflected forms include an exception that inflects slightly differently. In the artificial dictionary used in this section this is the lemma zasp, two of whose inflected forms have slightly different endings, namely al and ali instead of il and ili.
From the SI-PRON dictionary we selected 28 graphemic word forms whose inflected forms have 9 different endings and belong to four different lemmas: potop, osmod, zasp and natoč. The selected lemmas and the corresponding endings are shown in Table 1. The lemma zasp is an exception that inflects slightly differently from the other three.

Table 1: Procedure for forming all the words contained in the dictionary. The left column lists the lemmas, the right column the possible endings.

POSSIBLE LEMMAS                  POSSIBLE ENDINGS
potop, osmod, zasp, natoč        iš, im, imo, ite, ijo
potop, osmod, natoč              i+l, i+li
zasp                             a+l, a+li

From these words we then formed a dictionary in which, for simplicity, the values were set equal to the keys. By random selection from this dictionary we formed four further sub-dictionaries of different sizes, and built an MDKP for each dictionary. The MDKP representing all 28 entries is smaller than the MDKPs representing the sub-dictionaries with 23 and 17 entries, and even the MDKP representing the sub-dictionary with 9 entries has more states. The results indicate that the coverage of inflected forms strongly affects the complexity of the resulting MDKP and can reverse the growth trend of the MDKP size.

Figure 4 shows schematically the part of the MDKP that represents the word endings for the dictionary with 28 entries. The complexity of an MDKP representing a dictionary with fewer entries is considerably greater. It therefore makes sense for a dictionary that is to be realised with a KP to contain all possible inflected forms.

Figure 4: Part of the MDKP representing the complete dictionary with all 28 entries. Only the part that converts the endings of the entries is shown.

An MDKP accepts only the entries contained in the dictionary. If a given application does not need such a strict requirement and it suffices that the MDKP accepts all entries from the dictionary, it can be simplified further. An even simpler form would be obtained if inflected forms with all 9 possible endings existed for all four lemmas in the dictionary. We can thus add further entries to the dictionary: entries with the lemmas potop, osmod and natoč and the endings al and ali, and entries with the lemma zasp and the endings il and ili. The resulting dictionary has 36 entries, and the MDKP simplifies to 23 states and 30 transitions.

4 Finite-state super transducer (KSP)

In the previous section we showed that adding selected entries to a dictionary can reduce the complexity of the MDKP. The difficulty lies in finding entries that reduce the complexity, especially in real dictionaries such as pronunciation dictionaries, which are larger and in which the key and the value of an entry differ, making the search for suitable entries harder.

We therefore approached the problem differently, by merging certain states while satisfying the following two conditions:

• The resulting KP must remain deterministic.
• The resulting KP must accept all keys of the original dictionary and emit the correct corresponding values for the accepted keys.

Thus we could merge only states with certain properties, which we call mergeable states. Two states are mergeable if they satisfy the following conditions:

• If one of the states is final, the states must not have outgoing transitions with empty input symbols (ε symbols). Merging such states can result in a non-deterministic KP.
• The states have no outgoing transitions with equal input symbols and different output symbols.
• The states have no outgoing transitions with equal input symbols and equal output symbols that lead to different next states that are not mergeable.

To determine which states are mergeable, the above conditions must be checked, which can be problematic in practice because checking mergeability is demanding owing to the recursion, which may be cyclic. We therefore simplified the last condition:

• The states have no outgoing transitions with equal input symbols and equal output symbols that lead to different next states.

Because of this simplification, some mergeable states could not be detected. We built the KSP by first building the MDKP and then merging all states that are mergeable. Each state had to be checked for mergeability with every other state. Because some states become mergeable only after other states have been merged, this had to be done in several iterations.

5 Representing the pronunciation dictionary with finite-state super transducers (KSP)

For the extended pronunciation dictionary from Section 3 we first built an MDKP with the open-source tool OpenFST (Allauzen et al., 2007) and then built the KSP with the procedure described in Section 4. Table 2 shows the number of states and transitions of the MDKP and the KSP.

Table 2: Reduction in the number of states and transitions when building a KSP from an MDKP.

                         MDKP       KSP        Change
1 output symbol
  States                 246,262    186,476    24.3%
  Transitions            556,723    441,234    20.7%

We can see that we managed to reduce the size of the KSP by more than 20% compared with the MDKP.

Although with a KSP the dictionary entries can be represented by a smaller KP than with an MDKP, we lose the information about which words are contained in the dictionary. It can thus happen that a KSP accepts a word that is grammatically correct but was not contained in the dictionary. In that case the emitted allophone transcription may be wrong.

6 Conclusion

This paper presents a new type of KP, which we call finite-state super transducers (KSP), which accept some additional words besides the desired ones so that the conversion of the desired words can be represented more compactly.

We showed that representing a pronunciation dictionary with a KSP can reduce the number of states and transitions by more than 20% when all the inflected forms of the lemmas in the pronunciation dictionary are present.

Because KSPs accept other, unknown words, for which they may emit a wrong output string, they are useful mainly in applications where we do not need to know which words are contained in the KP, but only need the correct conversion of the given words, i.e. the words from which the KSP was built.

Acknowledgement

The research and development work was partly co-financed by the Slovenian Research Agency (Javna agencija za raziskovalno dejavnost Republike Slovenije) within the applied research project OptiLEX (L7-9406).

REFERENCES AND SOURCES

[1] Allauzen C., Riley M., Schalkwyk J., Skut W., Mohri M., 2007. OpenFst: A General and Efficient Weighted Finite-State Transducer Library. Proceedings of the 12th International Conference on Implementation and Application of Automata (CIAA 2007). Lecture Notes in Computer Science, Prague, Springer-Verlag, Heidelberg, Germany, 4783: 11-23.
[2] Daciuk J., Piskorski J., Ristov S., 2011. Natural Language Dictionaries Implemented as Finite Automata. Scientific Applications of Language Methods. London: Imperial College Press, World Scientific Publishing.
[3] Golob Ž., 2009. Samodejno določanje mesta besednega naglasa pri sintezi slovenskega govora. Diploma thesis, Faculty of Electrical Engineering, Ljubljana.
[4] Golob Ž., Žganec-Gros J., Žganec M., Vesnicer B., Dobrišek S., 2012. FST-Based Pronunciation Lexicon Compression for Speech Engines. International Journal of Advanced Robotic Systems, 9: 2012.
[5] Golob Ž., Žganec-Gros J., Štruc V., Mihelič F., Dobrišek S., 2016. A composition algorithm of compact finite-state super transducers for grapheme-to-phoneme conversion. Proceedings of the Text, Speech and Dialogue Conference, Brno, Czech Republic, September 12-16, 2016. Switzerland: Springer, pp. 375-382, Lecture Notes in Artificial Intelligence, 2016.
[6] Gros J., Mihelič F., 1999. Acquisition of an Extensive Rule Set for Slovene Grapheme-to-Allophone Transcription. Proceedings 6th European Conference on Speech Communication and Technology, Eurospeech 1999, September 5–9, 1999, Budapest, 5: 2075–2078.
[7] Hellis T., 2004. On minimality and size reduction of one-tape and multitape finite automata. Doctoral dissertation.
[8] Lucchesi C., Kowaltowski T., 1993. Applications of Finite Automata Representing Large Vocabularies. Software-Practice & Experience, 23: 15-30.
[9] Mohri M., 1994. Compact Representations by Finite-State Transducers. 32nd Meeting of the Association for Computational Linguistics (ACL '94), Proceedings of the Conference, Las Cruces, NM, pp. 204–209.
[10] Mohri M., 1997. Finite-State Transducers in Language and Speech Processing. Computational Linguistics, 23: 269–311.
[11] Ristov S., 2005. LZ Trie and Dictionary Compression. Software-Practice & Experience, pp. 445–465.
[12] Žganec-Gros J., Cvetko-Orešnik V., Jakopin P., 2006. SI-Pron Pronunciation Lexicon: A New Language Resource for Slovenian. Informatica, 30: 447–452.

The Fundamentals of Sound Field Reproduction Using a Higher Order Ambisonics System

Rok Prislan*
rok.prislan@innorenew.eu
InnoRenew CoE
Livade 6, SI-6310, Izola, Slovenia

ABSTRACT

Conventional sound recording methods are based on recording the sound pressure level with a microphone; after some signal processing the signal is reproduced by loudspeakers. In spatial audio, more than one microphone and loudspeaker are required to provide the sound source location information to the listener. Several spatial audio formats have been developed and some have successfully entered our homes, such as the multichannel 5.1 surround system. Among spatial audio formats, Ambisonics stands out because it can capture and reproduce the whole sound field and is not limited to predefined loudspeaker setups. In the paper, the InnoRenew CoE’s Ambisonics system is introduced and some of its underlying principles are explained.
Furthermore, practical examples of the use of Ambisonics, including in virtual reality applications, are presented.

KEYWORDS
higher order Ambisonics, sound field reproduction

HCI-IS '20, October 05–09, 2020, Ljubljana, Slovenia

1 INTRODUCTION
Michael Gerzon [1] invented Ambisonics in the 1970s, and since then it has mainly been a research topic in acoustics. Its higher order version was developed twenty years later, but only recently has it become available as a commercial recording system [2]. More and more user applications of Ambisonics are now emerging, since Ambisonics is being positioned as the audio framework of choice for virtual reality [3, 4].

The acoustic laboratory of InnoRenew CoE has recently been equipped with a higher order Ambisonics system. The system is composed of a 32-channel microphone [2], a set of 64 full-range loudspeakers, a dedicated low-frequency loudspeaker, all the required AD/DA converters, and accessories such as stands and cables. The equipment is shown in Figure 1.

Figure 1: The higher order Ambisonics reproduction system with 64 loudspeakers (top) and the Ambisonics microphone [2] (bottom), which are part of the InnoRenew CoE's acoustic laboratory equipment.

The system will be used for perceptual acoustic experiments, mainly by exposing test subjects to different acoustic conditions and investigating their response. In fact, room acoustic conditions are essential for a healthy and creative working environment, which is one of the important research topics at InnoRenew CoE. Another use of Ambisonics is in combination with virtual reality systems (e.g. [7]) that can provide a multi-sensory immersive experience to users.

2 RECORDING AND ENCODING
Ambisonics is a method of recording and reproducing a sound field while preserving its directional properties. The signal is encoded, which distinguishes Ambisonics from traditional multichannel audio formats (e.g., stereo and 5.1 surround). In those, each channel contains the signal corresponding to a loudspeaker, while in Ambisonics each channel contains derivatives of the pressure field. The encoded signals are known as B format.

In Ambisonics we record with several microphones arranged on a (virtual) sphere. Summing properly weighted signals from each microphone is equivalent to recording with a microphone of a certain directional characteristic. Such processing is the basis of Ambisonics encoding [2], in which case the chosen directional patterns correspond to spherical harmonic functions (see Figure 2).

Figure 2: Polar patterns of spherical harmonics $Y_n^m(\theta, \phi)$ of zero, first, second, third and fourth order (from top to bottom) (figure from [5]).

Spherical harmonic functions are grouped by their order number $n$ and particular coefficient $m = -n, \ldots, n$. Mathematically, each spherical harmonic corresponds to the angular portion of the solution of the wave equation. This makes it possible to capture the whole sound field, as it can, in fact, be decomposed into spherical harmonic functions

$$p(k, r, \theta, \phi) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} 4\pi i^n j_n(kr) A_{n,m} Y_n^m(\theta, \phi) \qquad (1)$$

where $\phi$ and $\theta$ are the azimuth and elevation, $r$ is the spatial coordinate and $k$ is the wavenumber.

The general idea of higher order Ambisonics encoding is to record sound with directionality patterns that correspond to the polar patterns of spherical harmonics. As such, it is possible to encode the sound field in the form of spherical harmonic decomposition factors instead of the sound pressure level at each microphone position.

The maximum order $N$ at which we perform the expansion defines the order of the Ambisonics system. Each order $n$ contains $2n + 1$ channels, meaning that in total an Ambisonics system of order $N$ has $(N + 1)^2$ channels that have to be stored. Increasing the order to which the decomposition is done improves the directionality of the recording.

An important limiting factor for increasing the Ambisonics order is the number of microphones positioned on the sphere: the pressure is discretely sampled, which leads to artifacts such as aliasing. Issues related to low-frequency noise and several other technical limitations have been studied [3]. Generally, increasing the number of microphones is favored, although this obviously increases the cost of the system.

It is important to understand that the B format encoded signals can also be manipulated with proper signal processing. For example, the sound field can easily be rotated by a certain angle, and it is also possible to focus on a certain direction of the sound field [6].

3 REPRODUCING THE SOUND FIELD
The biggest advantage of Ambisonics over conventional multichannel spatial audio techniques (e.g. stereo, 5.1 and 7.1 surround), which assume fixed loudspeaker positions, is its independence of the loudspeaker setup. In Ambisonics, the decoding from the B format takes into account the actual positions of the available loudspeakers, which can be chosen arbitrarily. Nevertheless, a high number of loudspeakers spatially distributed around the listener is required to provide a full and precise spatial impression.

The number of loudspeakers required also depends on the order of the system. The $N$-th order requires a minimum of $(N + 1)^2$ loudspeakers, meaning that 9 loudspeakers are required for the 2nd order, 16 for the 3rd and 25 for the 4th.

There are several strategies for decoding the B format to be reproduced on a setup of loudspeakers. The basic idea is to directionally filter the recorded signals by virtual microphones pointing in the direction of each loudspeaker.

Setting the proper directionality patterns (see Figure 3) is the important part of the decoding process. In a regular layout, the signal emitted by a loudspeaker is the same as would be recorded by a supercardioid microphone pointing in that direction [6]. This means that almost all loudspeakers emit sound at the same time and that, for a given sound source position, loudspeakers in the opposite direction emit in opposite phase.

Figure 3: Example of a cardioid (left) and supercardioid (right) microphone polar pattern (figure from [8]).

4 THE AMBISONICS SYSTEM IN USE
Ambisonics systems are a useful research tool in acoustics, mainly because they make it possible to reproduce the sound emitted by sources together with the acoustic environment in which they are located. An important example of such use is the work of Tapio Lokki and his group [9], who have been investigating perceptually relevant acoustic properties of concert halls. In their research, listeners were asked about their preferences regarding the acoustics of different concert halls in which the same orchestra was performing. As an individual's acoustic memory is strongly affected by the time that has passed since each concert experience, such research requires moving the listener and orchestra between concert halls immediately. This can be achieved with an Ambisonics system in which recordings can be switched at the push of a button.

Currently at InnoRenew CoE, we are setting up the Ambisonics system for listeners to rate different acoustic environments. The research is not limited to a specific environment type, such as concert halls, but includes acoustic environments to which we are exposed on a daily basis (commonly referred to as soundscapes [12]). The recordings will be made at several different locations that include noisy and pleasant environments, such as high-traffic roads, busy workspaces and nature.

The interaction of the user with the system can be designed in various ways. For now, we rely on a tablet PC, as shown in Figure 4. The tablet is used to control the playback and to gather the responses of individuals. The system can be upgraded with more advanced response tracking options, such as eye tracking or tracking the electrodermal activity of the test subject.

Figure 4: Photo of a listener in the Ambisonics loudspeaker ring at the InnoRenew CoE's acoustic lab. The control over the system and the collection of perceptual responses are based on a tablet PC as an interface.

Spatial sound can be incorporated into virtual reality (VR) interfaces, such as VR headsets. The most accessible approach is to use headphones, for which the signals have to be processed based on head-related transfer functions [10]. The main drawback in this case is that wearing headphones is not natural to users and can produce discomfort. It is well known [11] that the listener does not localize the sound source as being external, but rather positions it in between the ears. This phenomenon, which occurs when using headphones, is known as lateralization of sound sources [11].

Generally, the relative position and orientation of the sound source in relation to the listener's ears changes over time, meaning that the head-related transfer functions applied to process the audio content have to adapt accordingly. Therefore, when using headphones in VR, head tracking and real-time audio processing are required. In this respect, the use of Ambisonics is advantageous, as the full sound field is reproduced and the listener can freely rotate their head while localization cues are correctly perceived. Additionally, in Ambisonics the ears are free from wearable equipment, which is a more natural condition for the user.

A relevant use of Ambisonics in relation to VR is also recording the sound field using an Ambisonics microphone and reproducing it over headphones instead of an Ambisonics reproduction system composed of a high number of loudspeakers. In fact, the B format encoded signals can be processed for binaural playback for any arbitrarily chosen head rotation. Recently, many commercial first order Ambisonics microphones containing four microphones have become available on the market, together with dedicated digital audio workstation plug-ins for binaural decoding.

5 ACKNOWLEDGMENTS
The author gratefully acknowledges the European Commission for funding the InnoRenew project (Grant Agreement #739574) under the Horizon2020 Widespread-Teaming program and the Republic of Slovenia (investment funding from the Republic of Slovenia and the European Union's European Regional Development Fund).

REFERENCES
[1] Michael A. Gerzon. Periphony: With-height sound reproduction. In: Journal of the Audio Engineering Society 21.1 (1973), pp. 2–10.
[2] Eigenmike, mh acoustics LLC. url: https://mhacoustics.com/products (visited on 16/09/2020).
[3] F. Zotter and M. Frank. Ambisonics: A Practical 3D Audio Theory for Recording, Studio Production, Sound Reinforcement, and Virtual Reality. SpringerOpen, 2019.
[4] S. Sherbourne et al. Ambisonics and VR/360 Audio in Pro Tools. url: http://www.avidblogs.com/ambisonics-vr360-audio-pro-tools-hd/ (visited on 16/09/2020).
[5] Ambisonics. Wikipedia, The Free Encyclopedia. url: http://en.wikipedia.org/w/index.php?title=Ambisonics&oldid=656474391 (visited on 16/09/2020).
[6] D. Arteaga. Introduction to Ambisonics. url: https://www.researchgate.net/publication/280010078_Introduction_to_Ambisonics (visited on 16/09/2020).
[7] Virtual Acoustics. Institute of Technical Acoustics, RWTH Aachen University. url: http://virtualacoustics.org/ (visited on 16/09/2020).
[8] Microphone. Wikipedia, The Free Encyclopedia. url: https://commons.wikimedia.org/w/index.php?curid=50230608 (visited on 16/09/2020).
[9] T. Lokki. Tasting music like wine: Sensory evaluation of concert halls. Physics Today 67(1), 27–32, 2014. doi: 10.1063/PT.3.2242.
[10] Head-related transfer function. Wikipedia, The Free Encyclopedia. url: https://en.wikipedia.org/wiki/Head-related_transfer_function (visited on 24/09/2020).
[11] Yost W.A., Hafter E.R. Lateralization. In: Yost W.A., Gourevitch G. (eds) Directional Hearing. Proceedings in Life Sciences. Springer, New York, NY (1987).
[12] Soundscape. Wikipedia, The Free Encyclopedia. url: https://en.wikipedia.org/wiki/Soundscape (visited on 24/09/2020).

The use of eCare services among informal carers of older people and psychological outcomes of their use

Kaja Smole-Orehek
kaja.smole-orehek@fdv.uni-lj.si
University of Ljubljana, Faculty of Social Sciences
Ljubljana, Slovenia

Vesna Dolničar
vesna.dolnicar@fdv.uni-lj.si
University of Ljubljana, Faculty of Social Sciences
Ljubljana, Slovenia

Simona Hvalič-Touzery
simona.hvalic-touzery@fdv.uni-lj.si
University of Ljubljana, Faculty of Social Sciences
Ljubljana, Slovenia

ABSTRACT
With increasing age and longevity, the need for informal care will increase significantly in the coming decades. The use of eCare services has potential benefits in meeting some of informal carers' needs. However, there is only a limited understanding of the psychological outcomes of using eCare services for informal carers of older people. The aim of this study is to identify positive and negative psychological outcomes of the use of eCare services for employed informal carers of older people, and to review the psychological outcomes of the use of different functionalities of eCare services. We conducted a four-month intervention study among 22 dyads of informal carers and older people. The preliminary results showed a prevalent pattern of positive outcomes of eCare services for employed informal carers. Further research is needed on the relationship between the use of different functionalities, psychological outcomes and care situations.

KEYWORDS
psychological outcomes, employed informal carers, telecare, ageing in place

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2020, 5–9 October, 2020, Ljubljana, Slovenia
© 2020 Copyright held by the owner/author(s).

1 INTRODUCTION
The growing pressure on families to provide informal care, due to demographic aging of the population, leads to a search for new and innovative solutions to meet those challenges. Increasing attention is being paid to the role of technology and its potential in supporting older people in their own homes, as well as their informal carers. However, understanding of the psychological outcomes of the use of eCare services is limited for informal carers of older people and even more so for working informal carers [1, 4, 6, 11, 14, 18].

Informal carers provide not just physical support but also social and emotional support, and they make sure that the older people are safe and healthy, so informal care is very demanding and dynamic [17]. Informal carers spend a lot of time at the home of the care receiver, and the demands of providing care can be high, especially for those who are employed. Because of caring duties, some informal carers face issues such as work interruptions, absences and reduced productivity [3, 17, 20]. Combining employment and care is a challenge for many carers, and it can affect informal carers' physical health, social relationships and work situations [17]. Many have trouble being understood by their employers or co-workers, and for some, career opportunities are out of reach [17].

ECare services have the potential to address those needs, for example by decreasing the demands on carers and alleviating stress [1, 4, 11, 17]. Many studies examine the link between psychological outcomes and the functionalities of eCare services, but these studies are very disease-specific, focusing on conditions such as dementia [2, 12, 19]. Informal carers of people with dementia have specific needs, and these needs cannot be fully transferred to the needs of informal carers providing a different type of care. There is therefore a gap in understanding which functionalities of eCare services can help informal carers in general to better combine work and care. In addition, many studies have examined different outcomes and models of eCare services use and acceptance among older people [8, 15], leaving aside informal carers of older people.

The aim of the study was to fill this gap: to identify the positive and negative psychological outcomes of the use of eCare services on employed informal carers and to review the psychological outcomes of the use of different functionalities of eCare services.

2 METHODS
Study design
A four-month qualitative intervention study was performed in 2018 and 2019 in the Central Slovenia region. In accordance with the aim of the study, surveys and interviews were conducted with employed informal carers only. The intervention study included dyads of an informal carer and an older person (65+). A total of 26 dyads, consisting of older care receivers and their primary informal carers, were recruited. The final sample included 22 dyads.

Apparatus
Older people had one of two eCare systems installed in their homes. Their informal carers used a mobile application, which allowed them to monitor certain activities in the older person's home and receive notifications in case of an unexpected event. Both services tested had motion and door sensors, a pendant alarm, a smoke detector and a mobile application for the carers, with alarms in the form of push notifications and activity monitoring. The second service also offered two additional functionalities, a 24/7 call center and fall detection, and was used by 7 out of 22 carers.

Participants
A purposive sample was used. Eligibility criteria for informal carers were: (i) being the primary carer, (ii) being a family member of the older person, (iii) providing long-term care to the older person (at least 5 hours of help per week for a minimum of 1 year), (iv) owning a smartphone, and (v) being interested in study participation. Care receivers' eligibility criteria were: (i) being interested in study participation, (ii) being 65 years old or more, (iii) needing help with activities of daily living, (iv) also using formal care, and (v) living alone in their own household.

Informal carers ranged in age from 35 to 67 (M = 53.9, SD = 7.56). More than half of them were female (n = 14). On average they provided 8.5 hours of care per week (SD = 12.15), on average for 6.1 years (SD = 5.89). A great majority of carers were the care receiver's children (n = 20) and two were daughters-in-law.

Care receivers were on average 83 years old (SD = 6.04), ranging from 73 to 92 years. All but two were female and all but one had fallen in the last five years, with 14 of them needing medical assistance afterwards. Five of them were severely dependent, eight moderately dependent, eight slightly dependent and one needed some help only occasionally.

Procedure
During the intervention, quantitative (screening questionnaire) and qualitative (semi-structured interviews) data were collected, with qualitative methodology playing a fundamental role. The survey at the beginning of the intervention collected basic social, health, care and demographic data. Two semi-structured interviews per informal carer were then conducted (the first after 3 weeks of use and the second in the fourth month), each lasting about one hour. Carers were asked about their caregiving situation, their experience with new technology, their use of the tested eCare services and the psychological outcomes of eCare services. The in-depth interviews were recorded and completely transcribed. Personal information was made anonymous. All participants received gift vouchers in recognition of their time and were not charged for their use of the eCare services. The study was approved by the Slovenian Commission for Medical Ethics (0120-193/2018/15).

Analysis
A descriptive analysis of the quantitative data was carried out. The qualitative data, comprising 755 pages of transcribed interview recordings, were subjected to a thematic analysis using the programme Atlas.ti 8 for qualitative data analysis. Structural coding was used. This is a question-based code that "acts as a labelling and indexing device, allowing researchers to quickly access data likely to be relevant to a particular analysis from a larger data set" [13, 16]. Deductive and inductive approaches were combined for data coding and analysis [7].

3 RESULTS
We examined psychological outcomes of the following eCare services functionalities: motion detection in the app, push notifications and alarms in the app, emergency pendant, smoke detector, call center and fall detector. The most frequently reported positive psychological outcome was reassurance, followed by peace of mind and reduced anxiety. In addition, the informal carers mentioned several other positive psychological outcomes of using eCare services, including an increased sense of control, less stress, the feeling of being less burdened, having positive feelings, a feeling of relief and satisfaction (Table 1).

                         Push            Sensor-based
                         notifications   motion        Call
                         and alarms      detection     centre       Fall       Emergency
                         on the app      on the app    assistance   detector   pendant     Total
Reassurance                    8              18            7           2          8         43
Peace of mind                  8              12            4           1          6         31
Reduced anxiety                5              12            2           0          3         22
Reduced stress                 3               6            1           0          1         11
Feeling less burdened          2               4            2           0          1          9
Positive feelings              0               2            1           0          1          4
A sense of relief              0               3            0           0          0          3
Satisfaction                   0               0            1           0          1          2
Total                         26              57           18           3         21

Table 1: Positive psychological outcomes of eCare services use on the employed informal carers

"Yes, yes, that gives you the feeling of reassurance that they will inform you, if you are not around when he presses the button. And then you go on vacation or somewhere else, as I say, even if something would happen, you organize other family members to make an action." (Carer 15)

"I will say it, if I did not reach mum over the phone call because of her bad hearing, then I looked at this application and saw that mom is inside doing something. If something was wrong, there is also an option for an emergency pendant, which she could activate, right ..." (Carer 6)

In our study, informal carers recognized that the emergency pendant and sensor-based motion detection are two of the most helpful functionalities of eCare services, as they are the most common contributors to positive psychological outcomes. The most useful functionality for those who cared for people with severe disabilities was the emergency pendant, while sensor-based motion detection was more useful for those carers who cared for people with mild or moderate dependency.

The few negative psychological outcomes in our study were mostly caused by technical failures and false alarms, although some participants were less disturbed than others. The most frequently mentioned negative outcome was anxiety, followed by distrust and stress. Other reasons for negative psychological outcomes were also mentioned: feelings of false security, invasion of the older person's privacy, feelings of guilt about not helping enough, and increased worries because carers know the everyday patterns of the older person (Table 2).

"Hm, yes, it caused me more stress personally because I was worried, under other circumstances I would not do that. Under other circumstances, I would not think about whether she was still cooking or not." (Carer 2)

                              Push            Sensor-based
                              notifications   motion        Call
                              and alarms      detection     centre       Fall       Emergency
                              on the app      on the app    assistance   detector   pendant     Total
Anxiety                             6               3            1           0          0         10
Hesitant                            0               0            7           0          0          7
Distrust                            1               3            1           1          0          6
Stress                              2               2            0           0          0          4
Feeling burdened                    1               3            0           0          0          4
Lack of relief                      0               1            0           0          2          3
Doubts                              0               1            0           0          2          3
Discomfort                          0               2            0           0          0          2
Less peace of mind                  0               2            0           0          0          2
Additional problem                  1               1            0           0          0          2
No reduced burden                   0               1            0           0          0          1
Feeling a moral obligation          0               1            0           0          0          1
Sense of guilt                      0               1            0           0          0          1
Bothered                            1               0            0           0          0          1
Unpleasant feeling                  1               0            0           0          0          1
Total                              13              21            9           1          4

Table 2: Negative psychological outcomes of eCare services use on the employed informal carers

"Even more, because she does not want to wear this necklace, then it seems to me to be useless. You do not need it for anything, it will not be very functional, because then it will not matter if she only has a mobile phone." (Carer 2)

In the present study, all call center users mentioned positive psychological outcomes in relation to it. They mostly felt reassured by the service. However, a few participants who did not have access to the call center service felt reluctant to use it because they said that they might not have enough information about the older person to be able to react well, that they would not feel comfortable talking to a "stranger" and that their situation was too specific for a call center to be helpful. They were worried that the call center would approach the older person too technically rather than attentively.

"... I know, even when mom falls, she is always very confused. It is much easier when she says, "daughter, I fell," opposing to explain it to them. Well, "lady, this and that". I do not know, I am absolutely for an option that one of the family members has it." (Carer 4)

"It would not help much. It would not help us, because if mum is alone when she falls (...). I can assume that she is able to say something on her own, maybe not, right." (Carer 8)

For some carers, eCare services contributed to their ability to be in paid employment and were useful in reconciling care and work obligations. They mentioned that eCare services supported their work and increased their labor productivity by making it easier for them to concentrate on their work.

"Yes, I can concentrate at work. I do not have to think about what if ... It helps, and it relieves you, but still, if she falls, and her phone is ten feet away and she cannot use it, then it's useless ..." (Carer 9)

4 DISCUSSION
Our study yielded several important findings. We found that positive psychological outcomes of eCare services use for employed informal carers were much more common than negative ones. This finding supports the findings of the few previous studies of informal carers [1, 9, 14]. Despite the prevalence of positive psychological outcomes, the negative ones should not be ignored. In particular, studies show that unreliable and/or inappropriate technology, which in our study was the main cause of negative psychological outcomes, can be harmful to both older people and their informal carers [5, 10]. In addition, a difference was also observed in the perceived usefulness of individual functionalities in relation to the degree of dependence. However, due to the small number of participants in different dependency groups, further empirical and conceptual studies are needed. Our study also confirmed the complex relationship between the functionalities of eCare services and the psychological outcomes for employed informal carers.

We also demonstrated that reassurance was the most frequently identified positive psychological outcome. It was mainly related to sensor-based motion detection in the application, that is, the possibility of monitoring the activities of an older person from a distance, e.g. to verify that he or she is moving around the home safely. In addition, employed informal carers reported that reassurance makes it easier for them to go on business trips, work and concentrate on their work, as they are notified in case of an emergency. From this, we can conclude that eCare services can provide opportunities for employed informal carers of older people to reconcile work and care responsibilities.

This study examined an under-researched aspect of eCare use in relation to informal care of older people. The methodology used allowed a detailed account of the experiences of employed informal carers with eCare as well as their perceptions of it. However, there are some limitations to this study. The first is the duration of the intervention. When conducting an intervention study to investigate the detection and vigilance of a potentially harmful event, a longer duration of the intervention is usually advisable, but we were limited in time and resources. In addition, the incidence of harmful or unexpected events during the testing phase in our study was low, so many participants had no real experience with the support and protocols for using eCare services. Moreover, one of the eCare services tested was still in the testing phase during the intervention study, so that several false alarms occurred, especially at the beginning of the study.

5 CONCLUSIONS
Our study confirmed the potential of eCare services to address challenges related to long-term care provision. There are many challenges that Slovenian society needs to address in order to realize the full potential of eCare technologies: (i) Public authorities must recognize the role and caring demands of informal carers and provide them with much needed support as soon as possible. (ii) Policy makers should promote a policy framework for the creation of eCare services for carers and beyond [19]. (iii) Affordable and accessible eCare services must be made available to informal carers and older people [18, 19]. At the same time, we must increase their acceptance of such technologies. Therefore, the design and usability of these technologies should be adapted and personalized to the needs of informal carers [2, 18]. End users should therefore be involved in the test phases [5, 18].

ACKNOWLEDGMENTS
The authors acknowledge that the projects (Smart ICT Solutions for Active and Healthy Ageing: Integrating Informal eCare Services in Slovenia, ID L5-7626; Factors impacting intention to use smart technology enabled care services among family carers of older people in the context of long-distance care, ID J5-1785; Programme Internet research, P5-0399) were financially supported by the Slovenian Research Agency.

REFERENCES
[1] Stefan Andersson, Christen Erlingsson, Lennart Magnusson, and Elizabeth Hanson. 2017. Information and communication technology-mediated support for working carers of older family members: an integrative literature review. IJCC 1, 2 (2017), 247–273.
[2] Laura Block, Andrea Gilmore-Bykovskyi, Anna Jolliff, Shannon Mullen, and Nicole E Werner. 2020. Exploring dementia family caregivers' everyday use and appraisal of technological supports. Geriatric Nursing (2020).
[3] Jill I Cameron, Rene-Louise Franche, Angela M Cheung, and Donna E Stewart. 2002. Lifestyle interference and emotional distress in family caregivers of advanced cancer patients. Cancer 94, 2 (2002), 521–527.
[4] Stephanie Carretero, James Stewart, and Clara Centeno. 2015. Information and communication technologies for informal carers and paid assistants: benefits from micro-, meso-, and macro-levels. European Journal of Ageing 12, 2 (2015), 163–173.
[5] Sara J Czaja. 2016. Long-term care services and support systems for older adults: The role of technology. Am Psychol 71, 4 (2016), 294.
[6] Anna Davies, Lorna Rixon, and Stanton Newman. 2013. Systematic review of the effects of telecare provided for a person with social care needs on outcomes for their informal carers. Health & social care in the community 21, 6 (2013), 582–597.
[7] Jennifer Fereday and Eimear Muir-Cochrane. 2006. Demonstrating rigor using thematic analysis: A hybrid approach of inductive and deductive coding and theme development. Int. J. Qual. Methods 5, 1 (2006), 80–92.
[8] Jean D Hallewell Haslwanter and Geraldine Fitzpatrick. 2018. The development of assistive systems to support older people: issues that affect success in practice. Technologies 6, 1 (2018), 2.
[9] Nat Harward. 2016. Caregivers & technology: What they want and need. http://www.aarp.org/content/dam/aarp/home-and-family/personal-technology/2016/04/Caregivers-and-Technology-AARP.pdf. Accessed: 2020-09-10.
[10] Helen Hawley-Hague, Elisabeth Boulton, Alex Hall, Klaus Pfeiffer, and Chris Todd. 2014. Older adults' perceptions of technologies aimed at falls prevention, detection or monitoring: a systematic review. International journal of medical informatics 83, 6 (2014), 416–426.
[11] Kara Jarrold and Sue Yeandle. 2009. A weight off my mind: Exploring the impact and potential benefits of telecare for unpaid carers in Scotland. https://www.scie-socialcareonline.org.uk/a-weight-off-my-mind-exploring-the-impact-and-potential-benefits-of-telecare-for-unpaid-carers-in-scotland/r/a11G0000001825wIAA. Accessed: 2010-09-30.
[12] Guang Ying Mo, Renée K Biss, Laurie Poole, Bianca Stern, Karen Waite, and Kelly J Murphy. 2020. Technology Use among Family Caregivers of People with Dementia. CAG/ACG (2020), 1–13.
[13] Emily Namey, Greg Guest, Lucy Thairu, and Laura Johnson. 2008. Data reduction techniques for large qualitative data sets.
[14] Kaja Smole Orehek, Ines Kožuh, Andraž Petrovčič, Vesna Dolničar, Matjaž Debevc, and Simona Hvalič-Touzery. 2018. Psychological outcomes of eCare technologies on informal carers of older people. IJIC 18 (2018).
[15] Sebastiaan Theodorus Michaël Peek. 2017. Understanding technology acceptance by older adults who are aging in place: A dynamic perspective. Ph.D. Dissertation. Tilburg University.
[16] Johnny Saldaña. 2015. The coding manual for qualitative researchers. Sage.
[17] Alice Spann, Joana Vicente, Camille Allard, Mark Hawley, Marieke Spreeuwenberg, and Luc de Witte. 2020. Challenges of combining work and unpaid care, and solutions: A scoping review. Health & Social Care in the Community 28, 3 (2020), 699–715.
[18] Vimal Sriram, Crispin Jenkinson, and Michele Peters. 2019. Informal carers' experience of assistive technology use in dementia care at home: a systematic review. BMC geriatrics 19, 1 (2019), 160.
[19] Vimal Sriram, Crispin Jenkinson, and Michele Peters. 2020. Carers' experience of using assistive technology for dementia care at home: a qualitative study. BMJ open 10, 3 (2020), e034460.
[20] Birgit Trukeschitz, Ulrike Schneider, Richard Mühlmann, and Ivo Ponocny. 2013. Informal eldercare and work-related strain. J GERONTOL B-PSYCHOL 68, 2 (2013), 257–267.

Indeks avtorjev / Author index
Attygale Nuwan ........ 21
Cej Rok ........ 41
Čopič Pucihar Klen ........ 21, 29, 37
Debevc Matjaž ........ 33
Deja Jordan Aiko ........ 21
Dobrišek Simon ........
45 Dolnicar Vesna ............................................................................................................................................................................. 52 Golob Žiga ................................................................................................................................................................................... 45 Hvalič Touzery Simona ................................................................................................................................................................ 52 Kljun Matjaž ..................................................................................................................................................................... 21, 29, 37 Kolmanič Simon ........................................................................................................................................................................... 25 Kožuh Ines ................................................................................................................................................................................... 33 Lochrie Mark ................................................................................................................................................................................ 37 Lukač Niko ................................................................................................................................................................................... 25 Martinovic Andrej .......................................................................................................................................................................... 5 Mlakar Miha ................................................................................................................................................................................. 
13 Pejović Veljko ................................................................................................................................................................................ 5 Plankelj Marko ............................................................................................................................................................................. 25 Prislan Rok ................................................................................................................................................................................... 49 Rizvić Selma ................................................................................................................................................................................ 25 Roglej Peter .................................................................................................................................................................................. 29 Romih Miro .................................................................................................................................................................................. 17 Šef Tomaž .................................................................................................................................................................................... 17 Škrlj Peter ..................................................................................................................................................................................... 37 Smole Orehek Kaja ...................................................................................................................................................................... 52 Solina Franc ................................................................................................................................................................................. 
41 Sotlar Gregor ................................................................................................................................................................................ 29 Štravs Miha .................................................................................................................................................................................. 13 Tušar Tea ........................................................................................................................................................................................ 9 Žganec Gros Jerneja ............................................................................................................................................................... 17, 45 Zorko Monika ............................................................................................................................................................................... 33 Zupančič Jernej ............................................................................................................................................................................ 
13 57 58 IS Interakcija človek-računalnik v informacijski družbi Human-Computer Interaction in Information Society 20 Veljko Pejović, Matjaž Kljun, Vida Groznik, Domen Šoberl, Klen Čopič Pucihar, Bojan Blažica, Jure Žabkar, Matevž Pesek, Jože Guna, Simon Kolmanič 20 Document Outline 02 - Naslovnica - notranja - H - TEMP 03 - Kolofon - H - TEMP 04, 05 - IS2020 - Predgovor & Odbori 07 - Kazalo - H 08 - Naslovnica podkonference - H 09 - Predgovor podkonference - H 10 - Programski odbor podkonference - H 01 - HCI-IS_2020_paper_1 Abstract 1 Introduction and Background 2 Methodology Mobile Application Data collection campaign 3 Mobile Ad Perception Modelling User ID-based model Personality-based model Predictive personality-based model 4 Discussion and Conclusion References 02 - HCI-IS_2020_paper_20 Abstract 1 Uvod 2 Državni proračun 2.1 Struktura proračuna 2.2 Dostopnost podatkov 2.3 Obstoječe vizualizacije 3 Interaktivna vizualizacija s Sankeyevim diagramom 3.1 Sankeyev diagram 3.2 Uporaba interakcije 3.3 Izdelava vizualizacije 3.4 Razprava 4 Zaključki Zahvala 03 - HCI-IS_2020_paper_25 Abstract 1 Introduction 2 MFVoice Architecture 3 The MFVoice NLU Service 3.1 Application View Context Processing 3.2 Intent Recognition 3.3 Entity Recognition 4 Testing 4.1 Laboratory Testing Set-up and Results 4.2 Real-life Testing Set-up and Results 5 Discussion 6 Conclusion Acknowledgments 04 - HCI-IS_2020_paper_4 05 - HCI-IS_2020_paper_11 Abstract 1 Introduction 2 Re-imagining music and the music interface 3 Design Scenarios 4 Conclusion References 06 - HCI-IS_2020_paper_23 07 - HCI-IS_2020_paper_3 Abstract 1 Uvod 2 Pregled področja 3 Opis sistema 4 Raziskava 5 Rezultati in razprava 6 Zaključek Literatura 08 - HCI-IS_2020_paper_14 09 - HCI-IS_2020_paper_15 Abstract 1 Introduction 2 System design 3 Prototype implementation Floor plane detection Point cloud processing RGB optimization Rendering 4 Prototype game 5 Conclusions References 10 - HCI-IS_2020_paper_18 Abstract 1 Uvod 1.1 