Zbornik 21. mednarodne multikonference INFORMACIJSKA DRUŽBA – IS 2018, Zvezek G
Proceedings of the 21st International Multiconference INFORMATION SOCIETY – IS 2018, Volume G

Sodelovanje, programska oprema in storitve v informacijski družbi
Collaboration, Software and Services in Information Society

Uredil / Edited by Marjan Heričko
http://is.ijs.si
8.–12. oktober 2018 / 8–12 October 2018, Ljubljana, Slovenia

Urednik: Marjan Heričko, University of Maribor, Faculty of Electrical Engineering and Computer Science
Založnik: Institut »Jožef Stefan«, Ljubljana
Priprava zbornika: Mitja Lasič, Vesna Lasič, Lana Zemljak
Oblikovanje naslovnice: Vesna Lasič
Dostop do e-publikacije: http://library.ijs.si/Stacks/Proceedings/InformationSociety

Ljubljana, oktober 2018
Informacijska družba, ISSN 2630-371X
Kataložni zapis o publikaciji (CIP) pripravili v Narodni in univerzitetni knjižnici v Ljubljani
COBISS.SI-ID=21853462
ISBN 978-961-264-141-2 (pdf)

PREDGOVOR MULTIKONFERENCI INFORMACIJSKA DRUŽBA 2018

Multikonferenca Informacijska družba (http://is.ijs.si) je z enaindvajseto zaporedno prireditvijo osrednji srednjeevropski dogodek na področju informacijske družbe, računalništva in informatike. Letošnja prireditev se ponovno odvija na več lokacijah, osrednji dogodki pa so na Institutu »Jožef Stefan«.

Informacijska družba, znanje in umetna inteligenca so še naprej nosilni koncepti človeške civilizacije. Se bo neverjetna rast nadaljevala in nas ponesla v novo civilizacijsko obdobje ali pa se bo rast upočasnila in začela stagnirati? Bosta IKT in zlasti umetna inteligenca omogočila nadaljnji razcvet civilizacije ali pa bodo demografske, družbene, medčloveške in okoljske težave povzročile zadušitev rasti? Čedalje več pokazateljev kaže v oba ekstrema – da prehajamo v naslednje civilizacijsko obdobje, hkrati pa so notranji in zunanji konflikti sodobne družbe čedalje težje obvladljivi.

Letos smo v multikonferenco povezali 11 odličnih neodvisnih konferenc. Predstavljenih bo 215 predstavitev, povzetkov in referatov v okviru samostojnih konferenc in delavnic. Prireditev bodo spremljale okrogle mize in razprave ter posebni dogodki, kot je svečana podelitev nagrad. Izbrani prispevki bodo izšli tudi v posebni številki revije Informatica, ki se ponaša z 42-letno tradicijo odlične znanstvene revije.

Multikonferenco Informacijska družba 2018 sestavljajo naslednje samostojne konference:
• Slovenska konferenca o umetni inteligenci
• Kognitivna znanost
• Odkrivanje znanja in podatkovna skladišča – SiKDD
• Mednarodna konferenca o visokozmogljivi optimizaciji v industriji, HPOI
• Delavnica AS-IT-IC
• Soočanje z demografskimi izzivi
• Sodelovanje, programska oprema in storitve v informacijski družbi
• Delavnica za elektronsko in mobilno zdravje ter pametna mesta
• Vzgoja in izobraževanje v informacijski družbi
• 5.
študentska računalniška konferenca
• Mednarodna konferenca o prenosu tehnologij (ITTC)

Soorganizatorji in podporniki konference so različne raziskovalne institucije in združenja, med njimi tudi ACM Slovenija, Slovensko društvo za umetno inteligenco (SLAIS), Slovensko društvo za kognitivne znanosti (DKZ) in druga slovenska nacionalna akademija, Inženirska akademija Slovenije (IAS). V imenu organizatorjev konference se zahvaljujemo združenjem in institucijam, še posebej pa udeležencem za njihove dragocene prispevke in priložnost, da z nami delijo svoje izkušnje o informacijski družbi. Zahvaljujemo se tudi recenzentom za njihovo pomoč pri recenziranju.

V letu 2018 bomo šestič podelili nagrado za življenjske dosežke v čast Donalda Michieja in Alana Turinga. Nagrado Michie-Turing za izjemen življenjski prispevek k razvoju in promociji informacijske družbe bo prejel prof. dr. Saša Divjak. Priznanje za dosežek leta bo pripadlo doc. dr. Marinki Žitnik. Že sedmič podeljujemo nagradi »informacijska limona« in »informacijska jagoda« za najbolj (ne)uspešne poteze v zvezi z informacijsko družbo. Limono letos prejme padanje državnih sredstev za raziskovalno dejavnost, jagodo pa Yaskawina tovarna robotov v Kočevju. Čestitke nagrajencem!

Mojca Ciglarič, predsednik programskega odbora
Matjaž Gams, predsednik organizacijskega odbora

FOREWORD – INFORMATION SOCIETY 2018

In its 21st year, the Information Society Multiconference (http://is.ijs.si) remains one of the leading conferences in Central Europe devoted to information society, computer science and informatics. In 2018, it is organized at various locations, with the main events taking place at the Jožef Stefan Institute.

Information society, knowledge and artificial intelligence continue to represent the central pillars of human civilization. Will the pace of progress of information society, knowledge and artificial intelligence continue, thus enabling unprecedented progress of human civilization, or will the progress stall and even stagnate? Will ICT and AI continue to foster human progress, or will the growth of human, demographic, social and environmental problems stall global progress? Both extremes seem to be playing out to a certain degree – we seem to be transitioning into the next civilization period, while the internal and external conflicts of contemporary society seem to be on the rise.

The Multiconference runs in parallel sessions with 215 presentations of scientific papers at eleven conferences, many round tables, workshops and award ceremonies. Selected papers will be published in the Informatica journal, which boasts a 42-year tradition of excellent research publishing.

The Information Society 2018 Multiconference consists of the following conferences:
• Slovenian Conference on Artificial Intelligence
• Cognitive Science
• Data Mining and Data Warehouses – SiKDD
• International Conference on High-Performance Optimization in Industry, HPOI
• AS-IT-IC Workshop
• Facing Demographic Challenges
• Collaboration, Software and Services in Information Society
• Workshop Electronic and Mobile Health and Smart Cities
• Education in Information Society
• 5th Student Computer Science Research Conference
• International Technology Transfer Conference (ITTC)

The Multiconference is co-organized and supported by several major research institutions and societies, among them ACM Slovenia, i.e.
the Slovenian chapter of the ACM, the Slovenian Artificial Intelligence Society (SLAIS), the Slovenian Society for Cognitive Sciences (DKZ) and the second national engineering academy, the Slovenian Engineering Academy (IAS). On behalf of the conference organizers, we thank all the societies and institutions, and particularly all the participants, for their valuable contribution and their interest in this event, and the reviewers for their thorough reviews.

For the sixth year, the award for life-long outstanding contributions will be presented in memory of Donald Michie and Alan Turing. The Michie-Turing award will be given to Prof. Saša Divjak for his life-long outstanding contribution to the development and promotion of information society in our country. In addition, an award for current achievements will be given to Assist. Prof. Marinka Žitnik. The information lemon goes to decreased national funding of research. The information strawberry is awarded to the Yaskawa robot factory in Kočevje. Congratulations!

Mojca Ciglarič, Programme Committee Chair
Matjaž Gams, Organizing Committee Chair

KONFERENČNI ODBORI / CONFERENCE COMMITTEES

International Programme Committee: Vladimir Bajic, South Africa; Heiner Benking, Germany; Se Woo Cheon, South Korea; Howie Firth, UK; Olga Fomichova, Russia; Vladimir Fomichov, Russia; Vesna Hljuz Dobric, Croatia; Alfred Inselberg, Israel; Jay Liebowitz, USA; Huan Liu, Singapore; Henz Martin, Germany; Marcin Paprzycki, USA; Karl Pribram, USA; Claude Sammut, Australia; Jiri Wiedermann, Czech Republic; Xindong Wu, USA; Yiming Ye, USA; Ning Zhong, USA; Wray Buntine, Australia; Bezalel Gavish, USA; Gal A. Kaminka, Israel; Mike Bain, Australia; Michela Milano, Italy; Derong Liu, USA; Toby Walsh, Australia.

Organizing Committee: Matjaž Gams, chair; Mitja Luštrek; Lana Zemljak; Vesna Koricki; Mitja Lasič; Blaž Mahnič; Jani Bizjak; Tine Kolenik.

Programme Committee: Franc Solina, co-chair; Viljan Mahnič, co-chair; Cene Bavec, co-chair; Tomaž Kalin, co-chair; Jozsef Györkös, co-chair; Tadej Bajd; Jaroslav Berce; Mojca Bernik; Marko Bohanec; Ivan Bratko; Andrej Brodnik; Dušan Caf; Saša Divjak; Tomaž Erjavec; Bogdan Filipič; Andrej Gams; Matjaž Gams; Marko Grobelnik; Nikola Guid; Marjan Heričko; Borka Jerman Blažič Džonova; Gorazd Kandus; Urban Kordeš; Marjan Krisper; Andrej Kuščer; Jadran Lenarčič; Borut Likar; Mitja Luštrek; Janez Malačič; Olga Markič; Dunja Mladenič; Franc Novak; Vladislav Rajkovič; Grega Repovš; Ivan Rozman; Niko Schlamberger; Stanko Strmčnik; Jurij Šilc; Jurij Tasič; Denis Trček; Andrej Ule; Tanja Urbančič; Boštjan Vilfan; Baldomir Zajc; Blaž Zupan; Boris Žemva; Leon Žlajpah.

KAZALO / TABLE OF CONTENTS

Sodelovanje, programska oprema in storitve v informacijski družbi / Collaboration, Software and Services in Information Society ..... 1
PREDGOVOR / FOREWORD ..... 3
PROGRAMSKI ODBORI / PROGRAMME COMMITTEES ..... 5
Self-Assessment Tool for Evaluating Sustainability of ICT in SMEs / Soini Jari, Leppäniemi Jari, Sillberg Pekka .....
7
Reference Standard Process Model for Farming to Support the Development of Applications for Farming / Rupnik Rok ..... 11
Semiotics of Graphical Signs in BPMN / Kuhar Saša, Polančič Gregor ..... 15
Knowledge Perception Influenced by Notation Used for Conceptual Database Design / Kamišalić Aida, Turkanović Muhamed, Heričko Marjan, Welzer Tatjana ..... 19
The Use of Standard Questionnaires for Evaluating the Usability of Gamification / Rajšp Alen, Kous Katja, Beranič Tina ..... 23
Analyzing Short Text Jokes from Online Sources with Machine Learning Approaches / Šimenko Samo, Podgorelec Vili, Karakatič Sašo ..... 27
A Data Science Approach to the Analysis of Food Recipes / Heričko Tjaša, Karakatič Sašo, Podgorelec Vili ..... 31
Introducing Blockchain Technology into a Real-Life Insurance Use Case / Vodeb Aljaž, Tišler Aljaž, Chuchurski Martin, Orgulan Mojca, Rola Tadej, Unger Tea, Žnidar Žan, Turkanović Muhamed ..... 35
A Brief Overview of Proposed Solutions to Achieve Ethereum Scalability / Podgorelec Blaž, Rek Patrik, Rola Tadej ..... 39
Integration Heaven of Nanoservices / Révész Ádám, Pataki Norbert ..... 43
Service Monitoring Agents for DevOps Dashboard Tool / Török Márk, Pataki Norbert ..... 47
Incremental Parsing of Large Legacy C/C++ Software / Fekete Anett, Cserép Máté ..... 51
Visualising Compiler-Generated Special Member Functions of C++ Types / Szalay Richárd, Porkoláb Zoltán ..... 55
How Does an Integration with VCS Affect SSQSA? / Popović Bojan, Rakić Gordana ..... 59
Indeks avtorjev / Author index ..... 63

Zbornik 21. mednarodne multikonference INFORMACIJSKA DRUŽBA – IS 2018, Zvezek G
Proceedings of the 21st International Multiconference INFORMATION SOCIETY – IS 2018, Volume G
Sodelovanje, programska oprema in storitve v informacijski družbi
Collaboration, Software and Services in Information Society
Uredil / Edited by Marjan Heričko
http://is.ijs.si
9. oktober 2018 / 9 October 2018, Ljubljana, Slovenia

PREFACE

This year, the Conference "Collaboration, Software and Services in Information Society" is being organised for the eighteenth time as a part of the "Information Society" multi-conference. As in previous years, the papers in this year's proceedings address current challenges and best practices related to the development of advanced software and information solutions, as well as collaboration in general.
Information technologies and the field of informatics have been the driving force of innovation in business, as well as in the everyday activities of individuals, for several decades. Blockchain technology, Big Data, intelligent solutions, reference models, open standards, interoperability and the increasing responsiveness of IS/IT experts are leading the way to the development of intelligent digital service platforms, innovative business models and new ecosystems, where not only partners, but also competitors, are connecting and working together. On the other hand, quality assurance remains a vital part of software and ICT-based service development and deployment. The papers in these proceedings provide a better insight into, and/or propose solutions to, challenges related to:
- Self-assessment of sustainability of ICT in SMEs;
- Ontology-based knowledge sharing on BPMN graphical signs using semiotics;
- Influence of notations used for conceptual design on knowledge perception;
- Application of machine learning techniques to obtain new knowledge;
- Establishment of domain-specific reference models;
- Introduction of Blockchain technology into real-life use cases;
- Architectural design proposals for ensuring the scalability of Blockchain platforms;
- Application of usability questionnaires when evaluating gamification and serious games;
- Visualization, analysis and comprehension of complex software systems;
- Continuous software development, integration and delivery;
- Integration of source code repositories and QA tools.
We hope that these proceedings will serve as a valuable reference and that the information in this volume will be useful for further advancements in both research and industry.

Prof. Dr. Marjan Heričko
CSS 2018 – Collaboration, Software and Services in Information Society, Conference Chair

PREDGOVOR

Konferenco "Sodelovanje, programska oprema in storitve v informacijski družbi" organiziramo v sklopu multikonference Informacijska družba že osemnajstič. Kot običajno tudi letošnji prispevki naslavljajo aktualne teme in izzive, povezane z razvojem sodobnih programskih in informacijskih rešitev ter storitev, kot tudi sodelovanja v splošnem.

Informatika in informacijske tehnologije so že več desetletij gonilo inoviranja na vseh področjih poslovanja podjetij ter delovanja posameznikov. Tehnologija veriženja blokov, velepodatki, inteligentne storitve, referenčni modeli, odprti standardi in interoperabilnost ter vedno višja odzivnost informatikov vodijo k razvoju inteligentnih digitalnih storitvenih platform in inovativnih poslovnih modelov ter novih ekosistemov, kjer se povezujejo in sodelujejo ne le partnerji, temveč tudi konkurenti. Napredne informacijske tehnologije in sodobni pristopi k razvoju, vpeljavi in upravljanju omogočajo višjo stopnjo avtomatizacije in integracije doslej ločenih svetov, saj vzpostavljajo zaključeno zanko in zagotavljajo nenehne izboljšave, ki temeljijo na aktivnem sodelovanju in povratnih informacijah vseh vključenih akterjev. Ob vsem tem zagotavljanje kakovosti ostaja eden pomembnejših vidikov razvoja in vpeljave na informacijskih tehnologijah temelječih storitev.
Prispevki, zbrani v tem zborniku, omogočajo vpogled v izzive in rešitve zanje na področjih, kot so npr.:
- samoocenitev kakovosti in zrelosti IKT podpore v malih in srednje velikih podjetjih;
- deljenje znanja o grafičnih simbolih BPMN z uporabo semiotike;
- vpliv notacije, uporabljene pri oblikovanju konceptualnih modelov, na dojeti nivo pridobljenega znanja;
- uporaba tehnik strojnega učenja za ekstrakcijo znanja;
- vzpostavitev domenskih referenčnih modelov;
- vpeljava tehnologije veriženja blokov v realne primere uporabe;
- arhitekturni predlogi za rešitev razširljivosti platform tehnologije veriženja blokov;
- uporaba standardnih vprašalnikov uporabnosti pri vrednotenju učinkov vpeljave igrifikacije in resnih iger;
- vizualizacija, analiza in razumevanje kompleksnih programskih sistemov;
- neprekinjen razvoj, integracija in dobava informacijskih rešitev;
- integracija repozitorijev izvorne kode z orodji za zagotavljanje kakovosti.
Upamo, da boste v zborniku prispevkov, ki povezujejo teoretična in praktična znanja, tudi letos našli koristne informacije za svoje nadaljnje delo tako pri temeljnem kot aplikativnem raziskovanju.

prof. dr. Marjan Heričko
predsednik konference CSS 2018 – Collaboration, Software and Services in Information Society

PROGRAMSKI ODBOR / PROGRAMME COMMITTEE

Dr. Marjan Heričko, University of Maribor, Faculty of Electrical Engineering and Computer Science
Dr. Gabriele Gianini, University of Milano, Faculty of Mathematical, Physical and Natural Sciences
Dr. Hannu Jaakkola, Tampere University of Technology, Information Technology (Pori)
Dr. Mirjana Ivanović, University of Novi Sad, Faculty of Science, Department of Mathematics and Informatics
Dr. Zoltán Porkoláb, Eötvös Loránd University, Faculty of Informatics
Dr. Stephan Schlögl, MCI Management Center Innsbruck, Department of Management, Communication & IT
Dr. Zlatko Stapić, University of Zagreb, Faculty of Organization and Informatics
Dr. Vili Podgorelec, University of Maribor, Faculty of Electrical Engineering and Computer Science
Dr. Maja Pušnik, University of Maribor, Faculty of Electrical Engineering and Computer Science
Dr. Muhamed Turkanović, University of Maribor, Faculty of Electrical Engineering and Computer Science
Dr. Boštjan Šumak, University of Maribor, Faculty of Electrical Engineering and Computer Science
Dr. Aida Kamišalić Latifić, University of Maribor, Faculty of Electrical Engineering and Computer Science
Dr. Gregor Polančič, University of Maribor, Faculty of Electrical Engineering and Computer Science
Dr. Luka Pavlič, University of Maribor, Faculty of Electrical Engineering and Computer Science

Self-Assessment Tool for Evaluating Sustainability of ICT in SMEs
Jari Soini, Jari Leppäniemi, Pekka Sillberg
Tampere University of Technology, P.O. Box 300, FI-28101 Pori, Finland
jari.o.soini@tut.fi, jari.leppaniemi@tut.fi, pekka.sillberg@tut.fi

ABSTRACT
The ever-increasing demand for ICT may compromise global objectives for emissions reduction if the aggregate effects of ICT sustainability are not considered in the business digitalization processes. In this paper, we present a free self-assessment tool enabling small and medium-sized companies to evaluate the utilized ICT in terms of sustainability. ICT4S is a free e-service, in effect a web-based self-assessment tool, that was developed in co-operation with the Swiss Green IT SIG. The assessment is currently divided into five categories of sustainability questions. The categories are strategy, procurement and recycling, practices, servers and network, and Green ICT. As the result, organizations will gain a general understanding of their state of sustainability, and practical suggestions for greater eco-friendliness and sustainability of their ICT operations.

Categories and Subject Descriptors
• Social and professional topics~Sustainability • Information systems~Web applications

General Terms
Measurement, Performance, Human Factors.

Keywords
Sustainability, Assessment, ICT, Metrics, Web tools, E-services.

1. INTRODUCTION
The study presented in this paper aims at contributing to the business activity digitalization of companies concerning the reduction of carbon footprint and the improvement of sustainability. The paper introduces a self-assessment tool, developed in the research project, that allows companies to self-evaluate the sustainability of the ICT exploited in the organization. The objective is to provide companies with concrete tools and proposals for actions enabling more ecological procedures in the organization. Additionally, the knowledge gained by using the self-assessment tool allows companies to become generally more aware of the distribution of energy consumption in a modern ICT infrastructure, as well as the factors affecting the sustainability of ICT.

2. BACKGROUND
There is a lot of evidence for significant benefits in terms of productivity and cost savings through the exploitation of ICT in the daily business activity of organizations. However, the increasingly dependent use of ICT also brings about "invisible" effects (e.g., electricity used by database servers, cloud servers, and network routers) that may not be consciously recognized [1, 2, 3, 4]. Typically, users are concerned only with the electricity consumption of their own devices. The increasing demand for ICT may, in fact, compromise the national objectives for emissions reduction if the aggregate effects of ICT un-sustainability (Figure 1) are not considered in the business digitalization processes.

Figure 1. Environmental impacts of the ICT. [5]

In 2017, it was estimated that ICT accounted for 12% of the overall electricity consumption around the globe, and the percentage is expected to increase twice as rapidly in the future (by approximately 7% per year). Most of the energy is consumed by networks, server rooms, and computing centers (Figure 2), the efficiency of which should urgently be improved.

Figure 2. Electricity consumption in the ICT sector. [6]
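To make the cited trend concrete, a back-of-the-envelope projection can be derived from the two figures given above (a 12% share in 2017 and roughly 7% annual growth). The sketch below is illustrative only and not part of the original study; it simply compounds the stated growth rate.

```python
# Illustrative projection of the ICT share of global electricity use,
# assuming the figures cited above: a 12% share in 2017 growing by
# roughly 7% per year (compound growth), all else held equal.
share = 0.12
for year in range(2017, 2026):
    print(f"{year}: ICT share of electricity use ~ {share:.1%}")
    share *= 1.07  # ~7% annual growth
```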
As most of the electricity is still being generated by using fossil fuels (Figure 3), the current ICT, with its heavy usage of electrical energy, constitutes a global issue that is, unfortunately, little known outside the expert field [7, 8]. This is partly because users do not perceive the energy consumption of data systems operating invisibly or in the background, but rather only notice the consumption of the terminal device, which, in reality, comprises a fraction of the overall energy consumption (Figure 2).

Figure 3. Electricity generation by source of energy. [9]

The problem of energy consumption due to the constantly increasing utilization of ICT is expected to further worsen (Figures 4a and 4b) through the amount of IoT devices and automatic steering systems [10]. If the majority of the predicted IoT devices and the information systems supporting them are implemented by the current practices, a near-catastrophic peak demand in terms of electricity will ensue. This, in turn, will result in an increase in emissions rather than their reduction.

Figures 4a and 4b. Estimated growth and impact of IoT devices: (a) estimated growth; (b) estimated standby energy consumption. [10]

Therefore, it is essential to establish instructions and an assessment procedure to support system planning to improve the sustainability of ICT and, thus, to promote methods for a low-carbon economy.

In 2015 through 2017, the TUT Pori Department implemented a research project (AjaTar) with the aim of improving the digitalization of organizations and companies while promoting a low-carbon economy and sustainability. As part of the project, a technology enabling organizations to self-evaluate their ICT sustainability was developed, tested, and studied, aiming at increasing general awareness of the distribution of electricity consumption in a modern IT infrastructure, in order for the organizations to be able to make ICT-related decisions more consciously than before. The most notable added value of the project comprises an increase in knowhow and knowledge promoting easy and lightweight assessment of sustainability in terms of the organization's business activities and support processes, as well as a freely available tool for evaluating the sustainability of the ICT used in the organization. By making the sustainability issues visible, the objective was to change attitudes and conventions related to the utilization of ICT in organizations: indeed, during the project, several organizations distinctly declared their need to recognize practices promoting sustainable development, as well as to invest in an eco-friendly image.

3. ICT4S SELF-ASSESSMENT TOOL
During the last six years, the SEIntS research group from the TUT Pori Department has studied, developed, and piloted innovative ICT solutions in cooperation with local organizations. Additionally, SEIntS has collaborated with, for example, Keio University in Japan, as well as with various information society associations, for example, in Switzerland regarding Green IT and the assessment of datacenters.

As a result of the AjaTar project, an open self-assessment website for organizations to quickly and easily evaluate the ecological aspects of their ICT-related operations was published at the end of 2017. The self-assessment tool, developed in collaboration with Green IT SIG, a Swiss Green IT information special interest group, is based on the assumption that most of the ICT equipment used in an organization is controllable, enabling the relatively easy adjustment of various functions. With the assessment tool developed in the project, it is possible to increase knowledge about the ecological aspects related to the use of ICT in organizations and, thus, affect their operations and practices. Based on the self-assessment, the organization is offered an overall evaluation of the current state and propositions for practices for more sustainable ICT operations.

The self-assessment tool is freely available on a dedicated website for sustainable ICT [11]. On the landing page of the tool (Figure 5) there is a welcoming message that explains the goals of the assessment. There is also information on the privacy solution that is used to keep all the information about the assessor's company private; the privacy solution is based on the HTML5 local storage concept, so the answers remain in the assessor's own browser. The assessment menu is currently divided into five categories of sustainability questions, plus the information of the organization to be evaluated. The categories are: strategy, procurement and recycling, practices, servers and network, and Green ICT.

Figure 5. Welcoming the assessors.

Each of the categories comprises several questions and additional text that explains the current issue to the assessor. While answering the questions, the assessor also receives background information on the current topic. In Figures 6 and 7, the assessor is facing questions concerning the strategy and the practices at the office.

Figure 6. Assessing the strategy.
Figure 7. Assessing the practices at the office category.

After assessing all categories, the assessment tool calculates and shows an evaluation of the given answers. The results are first shown in a short form, as in Figure 8, but users can explore the results more carefully by selecting "Display detailed evaluation." The percentage and the color of the beams give a fast indication of the maturity of the different categories. In the case of 100% and a green beam, the user can be satisfied with the sustainability state of their company in that certain category. In the case of low percentages (0-70%) or yellow or even red beams, the evaluation shows that there is room for improvement. In such a case, the user may find the detailed evaluation useful when planning concrete actions for these improvements.

Figure 8. Brief results of the assessment.
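The paper does not publish the tool's scoring algorithm, so the following is only a minimal sketch of the kind of per-category aggregation and color banding described above. The equal question weights, band thresholds and function names are our assumptions for illustration, not the ICT4S implementation.

```python
# Minimal sketch of per-category scoring with color bands, assuming
# equally weighted yes/no answers and the 0-70% "needs improvement"
# band mentioned above; the real ICT4S tool may aggregate differently.
def category_score(answers: list[bool]) -> float:
    """Return the share of sustainable answers in a category (0.0-1.0)."""
    return sum(answers) / len(answers) if answers else 0.0

def color_band(score: float) -> str:
    """Map a category score to a traffic-light style beam color."""
    if score >= 0.9:
        return "green"
    if score > 0.7:
        return "yellow"
    return "red"

categories = {
    "strategy": [True, False, True],
    "procurement and recycling": [True, True, True],
    "practices": [False, False, True],
    "servers and network": [True, True, False, True],
    "Green ICT": [True, False],
}
for name, answers in categories.items():
    s = category_score(answers)
    print(f"{name}: {s:.0%} ({color_band(s)})")
```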
The detailed evaluation can be shown by selecting the corresponding option in the user interface (see Figure 9). The user is also able to print the results – hopefully in a sustainable way, for example using an e-format such as the Portable Document Format (PDF).

Figure 9. Detailed results of the assessment.

The assessment tool has now been in use for several months. Unfortunately, we do not have exact statistics concerning the usage of the tool. However, we piloted the tool with the assistance of local companies before launching it last December. Since the piloting groups were satisfied with the tool, and because we wanted to keep our promises regarding the privacy of the assessments, we did not implement any logging system in it.

We have planned to enhance the tool with a new capability, aiming to enable an easy way to estimate the carbon footprint of the ICT usage in a company. It will not be a fully scientific life cycle assessment (LCA), but a practical version of such, targeted at non-professionals in the field of sustainability. The reasoning for this new capability is that we anticipate that by introducing easy assessment tools we will be able to raise the awareness of companies in terms of sustainability issues, and thus help them to develop their business processes toward a sustainable state.
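The planned footprint capability is only announced above, not specified. As a hedged illustration, a practical (non-LCA) estimator of the kind described typically multiplies device counts by assumed energy and emission factors; every number and name below is a hypothetical placeholder, not a value from the paper.

```python
# Hypothetical sketch of a practical (non-LCA) ICT carbon estimator:
# devices x annual energy use x grid emission factor. All numbers are
# illustrative placeholders, not figures from the ICT4S tool.
ANNUAL_KWH = {"laptop": 50, "desktop": 200, "server": 1500}  # assumed
GRID_KG_CO2_PER_KWH = 0.4  # assumed average grid emission factor

def ict_footprint_kg(inventory: dict[str, int]) -> float:
    """Estimate yearly CO2 (kg) for a device inventory."""
    return sum(count * ANNUAL_KWH[kind] * GRID_KG_CO2_PER_KWH
               for kind, count in inventory.items())

print(ict_footprint_kg({"laptop": 20, "desktop": 5, "server": 2}))  # ~2000 kg
```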
4. RESULTS AND FUTURE WORK
This paper presented the ICT4S self-assessment tool, enabling companies and other organizations to evaluate the utilized ICT in terms of a low-carbon economy and sustainability, and thus improve their image as well as their resource efficiency. As the result, organizations will gain a general understanding of the current sustainability state of their ICT and practical suggestions for more eco-friendly and sustainable operations.

The role of the TUT Pori Unit was to function as a producer and facilitator of new knowledge. The applied project aimed at contributing to business development, with the TUT Pori Unit acting as a distributor of knowledge and knowhow as well as an innovator. Within the project, the accumulation of diverse energy-related knowhow and knowledge and the exploitation of sustainable ICT solutions in organizations were successfully implemented.

Further development is planned to be realized in the ICT4LC project, launched at the beginning of 2018. It focuses on examining contemporary information processing that is based on mobile and 'thin' clients, as well as the increasing utilization rate of information networks and cloud computing. The new project explores tools for assessing the energy efficiency of business activities and support processes, as well as planning procedures of business processes, promoting responsible and sustainable utilization of ICT in organizations.

5. ACKNOWLEDGMENTS
Our thanks to Niklaus Meyer and Beat Koch from the Swiss Green IT SIG for collaboration.

6. REFERENCES
[1] Hilty, L., Arnfalk, P., Erdmann, L., Goodman, J., Lehmann, M. and Wager, A.P. 2006. The relevance of information and communication technologies for environmental sustainability – A prospective simulation study. Environmental Modelling & Software, vol. 21, issue 11, 1618-1629.
[2] Hilty, L. 2008. Information technology and sustainability: Essays on the relationship between ICT and sustainable development. Books on Demand, Norderstedt.
[3] Amsel, N., Ibrahim, Z., Malik, A. and Tomlinson, B. 2011. Toward sustainable software engineering: NIER track. In 33rd International Conference on Software Engineering (ICSE), 21-28 May 2011, Honolulu, USA.
[4] Baliga, J., Hinton, K., Ayre, R. and Tucker, R.S. 2009. Carbon footprint of the internet. Telecommunications Journal of Australia, vol. 59, no. 1, 5.1-5.14.
[5] Hilty, L. and Aebischer, B. (eds.). 2015. ICT Innovations for Sustainability. Advances in Intelligent Systems and Computing 310, Springer International Publishing, Switzerland.
[6] Corcoran, A. and Andrae, A. 2013. Emerging Trends in Electricity Consumption for Consumer ICT. Retrieved August 22, 2018 from https://www.researchgate.net/profile/Anders_Andrae/publication/255923829_Emerging_Trends_in_Electricity_Consumption_for_Consumer_ICT/
[7] Pickavet, M., Vereecken, W., Demeyer, S., Audenaert, P., Vermeulen, B., Develder, C., Colle, D., Dhoedt, B. and Demeester, P. 2008. Worldwide energy needs for ICT: The rise of power-aware networking. In Proceedings of the 2nd International Symposium on Advanced Networks and Telecommunication Systems, 1-3.
[8] Lambert, S. and Van Heddeghem, W. 2012. Worldwide electricity consumption of communication networks. Optics Express, vol. 20, no. 26, 513-524.
[9] OECD Factbook 2014: Economic, Environmental and Social Statistics. Retrieved August 27, 2018 from http://dx.doi.org/10.1787/888933025499
[10] International Energy Agency. 2016. Energy Efficiency of the Internet of Things, Technology and Energy Assessment Report. Prepared for IEA 4E EDNA. Retrieved August 27, 2018 from https://www.iea-4e.org/document/384/energy-efficiency-of-the-internet-of-things-technology-and-energy-assessment-report
[11] Tampere University of Technology. 2017. ICT4S Self-Assessment. Retrieved August 27, 2018 from https://green-ict.fi/arviointi/?lang=en
Reference Standard Process Model for Farming to Support the Development of Applications for Farming
Rok Rupnik
Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
+386 1 479 8266, rok.rupnik@fri.uni-lj.si

ABSTRACT
The paper introduces the idea and the concepts of a Reference Standard Process Model for Farming (RSPMF), which are based on the concepts of COBIT, an IT governance framework used worldwide. Our research on RSPMF is focused in two directions. First, RSPMF is aimed at becoming a support for Product Managers in software companies developing software products or IoT systems. Namely, each process in RSPMF is described through the following components: process goals, process metrics, KPIs (Key Performance Indicators) and process activities. Second, RSPMF is aimed at helping managers or owners of bigger farms in farm management. The paper presents research in progress.

Categories and Subject Descriptors
D.2.2 [Requirements/Specifications]: Tools.

General Terms
Farming, Standardization, Process model.

Keywords
Standard Process Model, COBIT, Transformation of model.

1. INTRODUCTION
In recent years, farming has become an area with an extensive need for the use of information systems and IoT technologies [1]. The experience gained in an EU funded project has revealed that software companies have diverse and unequal knowledge and understanding of farming processes, activities within processes and metrics. This causes a problem when software products and IoT systems need to be integrated. There are many software products and IoT systems on the market today, but each of them covers a quite narrow functional area and, for that reason, integration is simply a necessity [2].

The Reference Standard Process Model is one way to help Product Managers at software companies in removing the gap of diverse and unequal knowledge and understanding of farming processes, activities within processes and metrics. The reference model can become a common denominator, a kind of Esperanto, as a knowledge base for the development of software products and IoT systems for farming. The reference model, on the other hand, will also help farm managers and owners in farm management.

We built and designed a Reference Standard Process Model for Farming (RSPMF) based on the idea and concepts of the COBIT framework, which is defined for the area of IT governance [3], [4]. This paper introduces the research in progress and the concepts we have managed to define so far: domains, processes and elements of process description. We also introduce the current list of processes and domains.

The structure of the paper is as follows. The second chapter introduces the EU funded project AgroIT, during which the idea for the Reference Standard Process Model arose. Only aspects of the project relevant to the content of this paper are introduced. The third chapter introduces key findings from the AgroIT project which led to the idea of RSPMF. To support the idea of RSPMF, the COBIT framework for IT governance is also introduced, since many concepts of RSPMF are taken from the COBIT framework. The fourth chapter introduces the RSPMF, its concepts, a draft list of domains and their processes, and the methodology to facilitate the sustainability of RSPMF. The last chapter contains the conclusion and directions for future work on the RSPMF.

2. EXPERIENCE GAINED IN THE AgroIT PROJECT
AgroIT was an EU funded project covering various previously mentioned aspects and problems in today's implementation of IT and IoT in farming [5], [6]. First, the project included the implementation of ERP systems for farming: a traditional ERP system for small and medium enterprises which, additionally, also has modules for livestock, fruit growing, winery, etc. [7]. This way, the area of farm management was covered, which was the subject of several papers in recent years [1], [2], [6], [7], [9], [10]. Second, the project included the implementation of a decision support system based on advanced methods to support decision processes in farming [8]. This way, the area of the use of decision support within farm management was covered [1], [6]. Third, the project included the implementation of IoT systems, where various sensors were used to collect data about several measurements [2], [11], [12]. Having (a lot of) data available is the basis for farm management and the operations of farms [13]. Fourth, the project also covered the implementation of a cloud integration platform. All applications and IoT systems were integrated through the cloud integration platform to facilitate data exchange between them [6], [12], [14].

Six software companies (they were called software partners during the project) cooperated in the AgroIT project with their software products: applications, IoT systems and the cloud integration platform. Each software company "contributed" their product to the project and, during the project, the software products were improved significantly, i.e. upgraded and extended. They were also improved implicitly through integrations between each other.

For the pilot use of the integrated software products and IoT systems, several pilot projects were organised in 5 EU countries by pilot partners. Pilot partners did not do software implementation in the project, but supported pilot farms in the use of software products. For that reason, the pilot partners were organisations with extensive knowledge in agriculture and experience in consulting for farming.

3. KNOWLEDGE OF FARMING FOR IMPLEMENTATION OF SOFTWARE PRODUCTS AND IoT SYSTEMS FOR FARMING
Improving software products and IoT systems was based on extending their existing functionalities and upgrading them with new ones. The key goal of the project was to design functionalities which are based on integration between software products and IoT systems. This means that a software product can also use data from another software product or IoT system.

During the analysis and design phase it became apparent that software partners have diverse and unequal knowledge and understanding of farming processes, activities within processes and metrics. The gap was even bigger when compared to the knowledge and understanding of the pilot partners. The diversity mentioned, and having the expertise of COBIT, has, step-by-step, led to the idea of transferring the idea of COBIT to be used for farming [3], [4].

3.1 COBIT framework for IT governance
COBIT has, in recent years, become a de-facto standard for IT governance in companies and organisations. COBIT defines a set of generic processes (IT processes) for the management of IT. For each IT process the following is defined: process inputs and outputs, goals of the process, key process activities, metrics of the process (performance measures), and levels of process maturity (maturity model) [3]. The development of COBIT has been progressing since 1996, from version 1 to the current version 5. COBIT is the result of several working groups of highly experienced experts and of coordinated work within ISACA, which is an international professional association focused on IT governance. COBIT is defined as a process model which divides IT into four domains: Plan and Organise, Acquire and Implement, Deliver and Support, and Monitor and Evaluate. The domains altogether have 34 defined IT processes.

The schema below shows the meta model of COBIT and all of its concepts. The schema reveals the business orientation of COBIT: the aim of defining the COBIT framework is to align IT and business, where business goals dictate IT goals [3], [4].

Figure 1. COBIT meta model. [3]

A detailed explanation of the schema, i.e. a detailed explanation of the concepts and the relations between them, is beyond the scope of this paper.

3.2 The idea of the Standard Process Model for Farming
The idea and concepts of the previously introduced COBIT framework, and the problems arising from the diversity of knowledge of the partners in the project, initiated the idea of a Standard Process Model for farming. COBIT is based on various concepts, and those concepts can be used and adapted in other areas as well, not only in IT governance. The idea and concepts of COBIT were already transferred and used in the governance of flood management [15] and nursing [16].

The transfer of the idea and concepts of a particular standard or framework to another area, in this case the transfer of COBIT to the area of farming, does not mean a one-to-one transfer. Some concepts of the source area (in this case, IT governance) might not be relevant or make any sense in the destination area (in this case, farming). For this reason, a successful and significant transfer with a useful outcome can only be achieved through:
- Good understanding of the idea and concepts of the framework of the source area (in this case, COBIT),
- Extensive knowledge of and experience in the destination area: processes and their activities, metrics, responsibilities, rules, etc.

4. REFERENCE STANDARD PROCESS MODEL FOR FARMING (RSPMF)
As can be concluded from the previous discussion, we designed RSPMF on the idea and concepts of COBIT 4.1 [3]. In the literature, we have so far not found any paper presenting a Standard Process Model for Farming.

4.1 The concepts of RSPMF
Processes are divided on three hierarchical levels, which are called domains: Govern and Monitor (GM), Plan and Manage (PM) and Implement and Execute (IE). Farming has several branches: livestock, fruit growing, agriculture, winery (viticulture), etc. RSPMF enables a modular definition of processes for every branch of farming. For the Govern and Monitor domain, only common processes are defined; for the other two domains, a process module is also added for every branch of farming. For now, only the process module for livestock is defined for the domains PM and IE. Each process is described through the following components: process goals, process metrics, KPIs (Key Performance Indicators) and process activities. Each process has a unique code, which reveals the domain to which the process belongs and the process module. The code for Common Processes is CP and the code for LiveStock is LS. The aim of defining RSPMF is not to prevail over any existing standard for farming. RSPMF is defined and structured to be open, and it enables references to any existing standard in the process description section.
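The coding scheme and process components just described map naturally onto a small data structure. The sketch below is our illustration, not part of RSPMF itself; all field and function names are assumed (see also the draft process list in Section 4.3).

```python
# Sketch of an RSPMF-style process descriptor, assuming the coding
# scheme described above (e.g. "PM.LS.03" = Plan and Manage domain,
# LiveStock module, process 03). Names and fields are illustrative.
from dataclasses import dataclass, field

@dataclass
class RspmfProcess:
    code: str                 # e.g. "PM.LS.03" or "GM.01" (no module)
    name: str
    goals: list[str] = field(default_factory=list)
    metrics: list[str] = field(default_factory=list)
    kpis: list[str] = field(default_factory=list)
    activities: list[str] = field(default_factory=list)

    @property
    def domain(self) -> str:          # GM, PM or IE
        return self.code.split(".")[0]

    @property
    def module(self) -> str | None:   # CP, LS, or None for GM codes
        parts = self.code.split(".")
        return parts[1] if len(parts) == 3 else None

p = RspmfProcess("PM.LS.03", "Manage animals' health and veterinary service")
print(p.domain, p.module)  # PM LS
```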
4.2 Target groups and aimed benefits of RSPMF When designing a Standard Process Model, regardless of the area it is intended for, the group designing it must first decide which are 12 the target groups who will use the model, and what should be the  GM.09: Implement and monitor implementation of benefits of its use. For target groups this should become a strategy Reference Standard Process Model. Plan and Manage (PM) – Common Processes (CP):  We designed RSPMF for the following groups: PM.CM.01: Manage implementation of strategy and investments  Product Managers in software companies which  PM.CM.02: Manage budget and cost develop software products and IoT systems for farming.  PM.CM.03: Manage financials As can be revealed from our discussion, we noticed the  PM.CM.04: Manage risks need for a Standard Process Model,   PM.CM.05: Manage human resources Managers and owners of bigger farms: COBIT is the  PM.CM.06: Manage buildings and security first place aimed at bigger companies. Each Standard  PM.CM.07: Manage products sales Process Model should, in our opinion, be sized for bigger  institutions (organisations in general). Smaller PM.CM.08: Manage suppliers  institutions then use it to the extent for which they PM.CM.09: Manage sub-contractors believe is suitable for them. We followed this approach  PM.CM.10: Manage certifications in the designing of the RSPMF.  PM.CM.11: Manage environment and protection  PM.CM.12: Manage energy consumption The aimed benefits for Product Managers are as follows:  PM.CM.13: Manage energy production  Based on experience from the AgroIT project, we can  PM.CM.14: Manage farming machinery state that there is a diversity of farming knowledge of  PM.CM.15: Manage equipment Product Managers in software companies. RSPMF will  PM.CM.16: Manage IT become a common denominator, a kind of Esperanto as  PM.CM.17: Manage information system a knowledge base for the development of software  PM.CM.18: Manage innovations products and IoT systems for farming,   PM.CM.19: Manage investment projects We expect the integrations between various software  PM.CM.20: Manage needs and expectations products and IoT systems to be more straightforward and “softer” if  PM.CM.21: Manage knowledge and legislation Product Managers will base  functionalities on RSPMF. PM.CM.22: Manage changes based on legislation demands We are designing RSPMF to reach several aimed benefits for  PM.CM.25: Manage changes based on IT and managers and owners of bigger farms: innovation  Knowledge and experience of farming experts and  PM.CM.26: Manage assets academics will, step by step, be transferred to RSPMF.  PM.CM.27: Manage technical capacity We could say that RSPMF introduces the best practices  PM.CM.28: Manage internal control for farming, Plan and Manage (PM) – LiveStock (LS):  RSPMF provides the best practice guidelines for  PM.LS.01: Manage animal sales processes and their activities on farms. This helps  PM.LS.02: Manage animal purchases managers ensure that the processes perform according  PM.LS.03: Manage animals` health and veterinary to best practice, service  Metrics and KPI’s are defined for processes. This helps  PM.LS.04: Manage animal welfare managers to set goals and execute monitoring. This  PM.LS.05: Manage hygiene lowers various risks,  PM.LS.06: Manage animal feeding and grazing  Managers can identify gaps in process execution and  PM.LS.07: Manage animal reproduction monitoring. 
4.4 Concepts of the methodology to facilitate the sustainability of RSPMF
COBIT was first issued in 1996, which means that it has gone through an evolution in which experts from all over the world participated. COBIT is now at version 5, but had several versions before that [3], [4]. To facilitate the sustainability of RSPMF, we plan a similar approach. We plan to issue the first version in a year or a year and a half. The first version will cover only livestock. We will form an international panel of experts of various profiles: consultants, academics, Product Managers, farmers and government officials.

5. CONCLUSION AND FUTURE WORK
We have introduced the research in progress on the idea and concepts of the Reference Standard Process Model for Farming. Our aim in designing the reference model is to improve the support for managers and owners of bigger farms in farm management. Another aim is to facilitate Product Managers in the development of software products and IoT systems. In the midterm, we also want RSPMF to be suitable for government and EU officials who are responsible for farming.

At the moment, we plan to add the concept of maturity levels of a process. The maturity level of a process will show or indicate the level of detail and expertise with which a farm executes a process. This way, the comparison of different farms will also be possible.

We are aware that there are two phases of defining RSPMF: first, to define its concepts and structure; second, to put content into the structure of the process descriptions. Those two phases overlap because, while inserting the content, some ideas to change the structure will surely appear. The definition of the concepts and the structure is our research mission for the next 12 months, as we plan it.

6. REFERENCES
[1] A. Kaloxylos et al., "A cloud-based Farm Management System: Architecture and implementation," Comput. Electron. Agric., vol. 96, pp. 75–89, 2013.
[2] S. Fountas et al., "Farm management information systems: Current situation and future perspectives," Comput. Electron. Agric., vol. 115, pp. 40–50, 2015.
[3] ISACA, COBIT 4.1. 2007.
[4] ISACA, COBIT 5: Enabling Processes. 2012.
[5] L. Ruiz-Garcia and L. Lunadei, "The role of RFID in agriculture: Applications, limitations and challenges," Comput. Electron. Agric., vol. 79, no. 1, pp. 42–50, Oct. 2011.
[6] A. Kaloxylos et al., "Farm management systems and the Future Internet era," Comput. Electron. Agric., vol. 89, pp. 130–144, Nov. 2012.
[7] C. N. Verdouw, R. M. Robbemond, and J. Wolfert, "ERP in agriculture: Lessons learned from Dutch horticulture," Comput. Electron. Agric., vol. 114, pp. 125–133, 2015.
[8] R. Rupnik, M. Kukar, P. Vračar, D. Košir, D. Pevec, and Z. Bosnić, "AgroDSS: A decision support system for agriculture and farming," Comput. Electron. Agric., 2018.
[9] R. Nikkilä, I. Seilonen, and K. Koskinen, "Software architecture for farm management information systems in precision agriculture," Comput. Electron. Agric., vol. 70, no. 2, pp. 328–336, Mar. 2010.
, “Conceptual model of a future farm Another aim is to facilitate Product Managers in development of management information system,” Comput. Electron. Agric. , software products and IoT systems. vol. 72, no. 1, pp. 37–47, Jun. 2010. In midterm, we also want RSPMF to be suitable for government [11] J. De Baerdemaeker, Precision Agriculture Technology and and EU officials who are responsible for farming. At the moment, Robotics for Good Agricultural Practices, vol. 46, no. 4. we plan to add the concept of maturity levels of a process. The IFAC, 2013. maturity level of a process will show or indicate the level of detail [12] J. Santa, M. A. Zamora-Izquierdo, A. J. Jara, and A. F. and expertise with which a farm executes a process. This way, the Gómez-Skarmeta, “Telematic platform for integral comparison of different farms will also be possible. management of agricultural/perishable goods in terrestrial We are aware that there are two phases of defining RSPMF: First, logistics,” Comput. Electron. Agric. , vol. 80, no. null, pp. 31– to define its concepts and structure; second, to put content in the 40, Jan. 2012. structure of processes` descriptions. Those two phases overlap, [13] J. W. Jones et al. , “Toward a new generation of agricultural because, while inserting the content, for sure some ideas to change system data, models, and knowledge products: State of structure will appear. The definition of concepts and the structure agricultural systems science,” Agric. Syst. , vol. 155, pp. 269– is our research mission for the next 12 months, that is how we plan 288, 2017. it. [14] J. W. Kruize, R. M. Robbemond, H. Scholten, J. Wolfert, and 6. REFERENCES a. J. M. Beulens, “Improving arable farm enterprise [1] A. Kaloxylos et al. , “A cloud-based Farm Management integration - Review of existing technologies and practices System: Architecture and implementation,” from a farmer’s perspective,” Comput. Comput. Electron. Agric. , vol. Electron. Agric. , vol. 100, pp. 168–179, Jan. 2014. 96, pp. 75–89, 2013. [2] S. Fountas et al. , “Farm management information systems: [15] M. Othman, M. Nazir Ahmad, A. Suliman, N. Habibah Current situation and future perspectives,” Arshad, and S. Maidin, “COBIT principles to govern flood Comput. management,” Electron. Agric. , vol. 115, pp. 40–50, 2015. Int. J. Disaster Risk Reduct. , vol. 9, 2014. [3] ISACA, COBIT 4.1. 2007. [16] M. Burnik, “The Approach for the Presentation of Nursing Processes,” University of Primorska, 2011. [4] ISACA, COBIT5: Enabling Processes. 2012. 14 Semiotics of graphical signs in BPMN Saša Kuhar Gregor Polančič Faculty of Electrical Engineering and Computer Science Faculty of Electrical Engineering and Computer Science University of Maribor University of Maribor Maribor, Slovenia Maribor, Slovenia sasa.kuhar@um.si gregor.polancic@um.si ABSTRACT RQ2: Can we categorize graphical signs from BPMN according The terminology of graphical signs (e.g. icons, symbols, to semiotic studies? pictograms, markers etc.) is ambiguous in academic articles. This We organized the remainder of the article as follows. The next is the same with articles focusing on graphics in business chapter presents the theoretical background. Chapters 3 and 4 notations, although concepts of graphical elements in notations represent the main objective of this paper – answering the are well defined. In semiotics, on the other hand, the concepts research questions. The conclusion is given in the last chapter. related to signs are defined in detail. 
2. BACKGROUND

2.1 Semiotics
Semiotics is the study of signs and symbols (not only visual ones) and their use or interpretation. For the purpose of the terminology definition, we will summarize Daniel Chandler's book Semiotics: The Basics [3], which offers a comprehensive explanation of the field, including many views of modern theoreticians. There are two main traditions in contemporary semiotics: from Ferdinand de Saussure and from Charles Sanders Peirce.

Saussure's model of signs consists of two parts: the signifier (the form that the sign takes) and the signified (the concept to which it refers). The sign is then the whole that results from the association of the signifier and the signified (Figure 1 on the left). For Saussure, both the signifier and the signified take non-material form rather than substance. Nowadays, the common adoption of his model takes a more materialistic form, where the signifier is commonly interpreted as the material that can be seen, heard, touched, smelled or tasted. Being concerned mostly with linguistics, Saussure stressed that the relationship between the signifier and the signified is relatively arbitrary: there is no inherent, essential, transparent, self-evident or natural connection between the signifier and the signified – between the sound of a word and the concept to which it refers [3].

Figure 1: Saussure's model of signs on the left and Peirce's model of signs on the right.

Peirce, on the other hand, introduced a three-part model consisting of: the representamen (the form which the sign takes, also called the "sign vehicle" or, in the Saussurean model, the signifier), the interpretant (the sense made of the sign, or the signified in Saussure's model), and the object (something beyond the sign to which it refers, also called the referent). In this model, the sign is the unity of what is represented (the object), how it is represented (the representamen) and how it is interpreted (the interpretant) (Figure 1 on the right). The term sign is often used loosely and confused with signifier or representamen. However, the signifier or representamen is the form in which the sign appears, whereas the sign is the whole meaningful unity [3].
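One way to keep these distinctions straight is to make the triad explicit in code. The sketch below is merely our illustration of Peirce's terminology; all names and the example values are chosen by us, not taken from the paper.

```python
# Illustration of Peirce's sign triad as a record: the sign is the
# unity of representamen (form), interpretant (sense) and object
# (referent). Field names follow the terminology above.
from dataclasses import dataclass

@dataclass(frozen=True)
class PeirceanSign:
    representamen: str  # the form the sign takes, e.g. an envelope icon
    interpretant: str   # the sense made of it, e.g. "a message"
    object: str         # the referent, e.g. a message event in a process

sign = PeirceanSign("envelope icon", "a message", "message event in BPMN")
print(sign.representamen, "->", sign.interpretant)
```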
In this model, the sign is the unity of what is represented (the object), how it is represented (the representamen) and how it is interpreted (the interpretant) (Figure 1, right). The term sign is often used loosely and confused with signifier or representamen. However, the signifier or representamen is the form in which the sign appears, whereas the sign is the whole meaningful unity [3].

2.1.1 Symbol, Index, Icon
In addition to his sign model, Peirce offered a classification of signs based on the relationship between the representamen and its object or its interpretant, or, in Saussure's terms, the relationship between signifier and signified. Depending on whether this relationship is arbitrary, directly connected, or resembling, three types of signs are possible: symbol, index, and icon, respectively.

SYMBOL represents a relationship where the signifier does not resemble the signified but is arbitrary or conventional. The relationship must be agreed upon and learned, as in language (letters, words, phrases, and sentences), numbers, Morse code, traffic lights or national flags.

INDEX denotes a relationship where the signifier is not arbitrary but is connected directly (physically or causally) to the signified, which can be observed or inferred. An index indicates something that is necessarily existent. Examples are natural signs (smoke, thunder, footprints), medical symptoms (pain, a rash, pulse rate), measuring instruments (thermometer, clock), signals (a knock on a door, a phone ringing), recordings (a photograph, a film, a video shot), and personal trademarks (handwriting, catchphrases).

ICON represents a relationship where the signifier is perceived as resembling or imitating the signified, being similar in possessing some of its qualities: a portrait, a cartoon, a scale model, onomatopoeia, metaphors, sound effects in radio drama, a dubbed film soundtrack, and imitative gestures [3].

2.1.2 Synonyms of terms
The terminology of semiotics is rarely used in popular language. The semiotic term sign is frequently replaced by the term symbol in popular usage [3]. Several meanings of the term icon can also be found in everyday language: a) to be iconic means that something or someone is recognized as famous; b) in computing, an icon is a small image intended to signify a particular function to the user (in semiotic terms, such signs may be iconic, symbolic or indexical); c) religious icons represent sacred, holy images [3]. Unless stated otherwise, we use the terms as defined in semiotics throughout this paper.
2.2 Ontologies
Ontologies are explicit formal specifications of the terms in a domain and the relationships among them [4]. They define a common vocabulary and can, among other things, be used by researchers who need to understand and share the structure of information in a domain [5]. For these reasons, we find them appropriate for terminology clarification in the domain of graphical signs in BPMN. Since our research purpose is mainly the definition of terms, our ontology will, according to Obrst [6], have weak to moderately strong semantics; it is not intended for machine processing or machine interpretation (at least not at this stage of our research).

3. LINGUISTIC TERMS IN THE BPMN SPECIFICATION
To answer the first research question (What are the linguistic terms used in the BPMN specification for graphical shapes, graphical icons, and other visual signs?), we examined the BPMN specification and mapped its terms to the terms of semiotics. In the BPMN specification, the signs are denominated as follows: the term BPMN element corresponds to the semiotic term signified, while the terms shape, object, marker, indicator, icon and depiction stand for signifier. The answer to RQ1, with the detailed meaning of each BPMN term, is provided in Table 1.

Table 1: Linguistic terms used in the BPMN specification

Semiotics' term | BPMN term | Detailed meaning in the BPMN specification
Signified | BPMN element | Concept in the business notation
Signifier | Shape | Graphical element
Signifier | Object | Basic shape (e.g. circle representing a simple event)
Signifier | Marker, Indicator or Icon | Graphical icon that can be included in an object (e.g. message icon)
Signifier | Depiction | Graphical example of the usage

As we can observe from the table, many linguistic terms are used for signifier, some of them inconsistently (e.g. marker, indicator, and icon). The only term from semiotics that appears in the BPMN specification is icon, which denotes a graphical icon and stands for the term signifier.

4. ONTOLOGY CONSTRUCTION
For the purpose of ontology construction and answering RQ2 (Can we categorize graphical signs from BPMN according to semiotic studies?), we followed the recommendations in Ontology Development 101: A Guide to Creating Your First Ontology [5]. The authors suggest seven steps for ontology creation: Step 1, determine the domain and scope of the ontology; Step 2, consider reusing existing ontologies; Step 3, enumerate important terms in the ontology; Step 4, define the classes and the class hierarchy; Step 5, define the properties of the classes; Step 6, define the facets of the slots; and Step 7, create instances. Steps 4 and 5 are closely intertwined and can be executed simultaneously.

4.1 Domain and scope of the BPMN Sign Ontology
For the domain definition, the authors of [5] propose answering several questions. Our answers follow each question below.

What is the domain that the BPMN Sign Ontology will cover? Signs in BPMN.

What are we going to use the ontology for? To share a common understanding of knowledge about signs among researchers, and to be able to reuse and analyze domain knowledge.

For what types of questions should the information in the ontology provide answers? Definitions of concepts in semiotics and the relationships among them, the categorization of BPMN graphical signs according to semiotic concepts, and the frequency of occurrence of sign types in BPMN.

Who will use and maintain the ontology? The ontology will be maintained and used by us and will be available to other interested researchers.

To determine the scope of the ontology, a list of competency questions that the ontology should be able to answer can be used [5]. The competency questions we defined are:

- What does the term icon mean?
- How do icons, indices, and symbols correlate?
- Which type of sign (icon, index or symbol) is used most in BPMN?
- Are symbols always arbitrary, or can they convey a certain degree of meaning?
4.2 Reuse of existing ontologies
A literature search revealed no existing ontologies in the domain of signs or icons. However, we identified the Business Process Modelling Ontology (BPMO), which was built automatically from the XML schemas contained in the OMG's BPMN 2.0 specification [7]. It contains all the BPMN elements and their relationships as defined in the BPMN specification. The class most closely related to our research domain (graphical signs) is DiagramElement and its subclasses (Figure 2). In the BPMN specification, this class is defined in the BPMN Diagram Interchange (BPMN DI) meta-model and schema for the purpose of unambiguous rendering of BPMN diagrams in different tools [2].

Figure 2: DiagramElement class and its subclasses in BPMO, visualized by the OntoGraf plugin for Protégé

As our focus in the Sign Ontology is mainly on graphical signs, which are not contained in BPMO, we will start our own ontology and later consider the option of merging the two ontologies.

4.3 Definition of concepts in the Sign Ontology
The next step in ontology creation is the enumeration of important terms. We defined the concepts for the BPMN Sign Ontology from semiotics (Sign, Icon, Index, and Symbol) and from BPMN (BasicShape, Activity, Event, Gateway, and Data).

4.4 Relationships among concepts
For the definition of the class hierarchy and the class properties, we next define the relationships among the three types of signs, again drawing on semiotics.

At first sight, the relationship between the signifier and the signified (and, consequently, the type of a sign) seems unambiguous, but that is not always the case. We should keep in mind that signs denote concepts (not material objects), and each person has their own understanding of a certain concept. Concepts cannot be represented precisely [8]; icons, for example, cannot simply be called similar, because they are defined by perceived similarity [3]. Also, as stated in [9], the process of sign-making is the process of the constitution of metaphor, and symbols are therefore never purely arbitrary. Within each type, signs vary in their degree of conventionality. We should therefore speak not of types of signs but of modes of relationships, where the difference between signs lies in the hierarchy of their properties rather than in the properties themselves [3]. Over time, a mode can also change: originally, signs were partly iconic and partly indexical (primitive writing), and symbols came into being through development out of other signs, particularly icons [3].

4.5 Sign Ontology construction
Using the Protégé 5.2.0 software tool, we created a simple Sign Ontology according to the semiotic concepts and their relationships, as follows. We created a class Sign (with disjoint subclasses Icon, Index and Symbol), a class Relationship (with subclasses PrimaryRelationship and SecondaryRelationship), and a class BPMNElement (with subclasses BasicShape, Activity, Event, Gateway, and Data). We also created two object properties: hasRelationshipType (with subproperties hasPrimaryRelationshipType and hasSecondaryRelationshipType) and its inverse property definesModeOf (with subproperties definesPrimaryModeOf and definesSecondaryModeOf). The domain of hasPrimaryRelationshipType is the class Sign, and its range is the class PrimaryRelationship. We then defined three instances, Arbitrary, Indicative and Similar, and included them in the classes PrimaryRelationship and SecondaryRelationship. Next, we defined that if a Sign has a hasPrimaryRelationshipType property with the value Similar, it is included in the class Icon. Similarly, we defined the classes Index (hasPrimaryRelationshipType value Indicative) and Symbol (hasPrimaryRelationshipType value Arbitrary).
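The same structure can also be sketched outside Protégé. Below is a minimal reconstruction of the described classes, properties and defined-class conditions in Python with owlready2; the ontology IRI is a hypothetical placeholder, and the paper itself built the ontology interactively in Protégé 5.2.0.

```python
# A sketch of the Sign Ontology described above, using owlready2.
# The IRI is a hypothetical placeholder.
from owlready2 import Thing, ObjectProperty, AllDisjoint, get_ontology

onto = get_ontology("http://example.org/bpmn-sign-ontology.owl")

with onto:
    class Sign(Thing): pass
    class Icon(Sign): pass
    class Index(Sign): pass
    class Symbol(Sign): pass
    AllDisjoint([Icon, Index, Symbol])        # the subclasses are disjoint

    class Relationship(Thing): pass
    class PrimaryRelationship(Relationship): pass
    class SecondaryRelationship(Relationship): pass

    class BPMNElement(Thing): pass            # subclasses omitted for brevity

    class hasRelationshipType(ObjectProperty):
        domain = [Sign]
        range = [Relationship]
    class hasPrimaryRelationshipType(hasRelationshipType):
        range = [PrimaryRelationship]

    # The three modes as instances of both relationship classes
    arbitrary = PrimaryRelationship("Arbitrary")
    indicative = PrimaryRelationship("Indicative")
    similar = PrimaryRelationship("Similar")
    for mode in (arbitrary, indicative, similar):
        mode.is_a.append(SecondaryRelationship)

    # Defined classes: a Sign whose primary relationship is Similar is an
    # Icon, Indicative makes it an Index, and Arbitrary a Symbol.
    Icon.equivalent_to.append(Sign & hasPrimaryRelationshipType.value(similar))
    Index.equivalent_to.append(Sign & hasPrimaryRelationshipType.value(indicative))
    Symbol.equivalent_to.append(Sign & hasPrimaryRelationshipType.value(arbitrary))
```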
4.6 BPMN graphical shapes as instances in the Sign Ontology
To decide whether the graphical signs in BPMN are of the mode icon, index or symbol, we invited five BPMN experts to evaluate the BPMN signs and assign one sign mode to each. We chose BPMN experts because they are fully familiar with the concepts (signifieds) in BPMN. Before the evaluation, the experts were acquainted with the concepts from semiotics. The results of the evaluation are given in Table 2.

For six shapes, the experts agreed on the sign mode, thereby defining the primary relationship between signifier and signified. For the other shapes, where the experts had differing opinions, the mode was defined with a primary and a secondary relationship: the mode chosen most often by the experts was set as the primary relationship, and the mode that ranked second was set as the secondary relationship.
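The aggregation rule just described can be stated compactly. The sketch below uses hypothetical vote lists, since per-expert votes are not published in the paper; it derives the primary and secondary relationship from five expert choices and leaves a slot unset on ties, as in Table 2.

```python
# A sketch of the mode-aggregation rule described above; the vote data
# is hypothetical (per-expert votes are not published in the paper).
from collections import Counter

def aggregate(votes):
    counts = Counter(votes).most_common()
    top = [mode for mode, c in counts if c == counts[0][1]]
    if len(top) > 1:
        return None, top                 # tied primary -> two secondary modes
    primary = top[0]
    rest = counts[1:]
    if not rest:
        return primary, []               # full agreement, no secondary mode
    second = [m for m, c in rest if c == rest[0][1]]
    return primary, (second if len(second) == 1 else [])  # tie -> not set

print(aggregate(["arbitrary"] * 5))                   # ('arbitrary', [])
print(aggregate(["arbitrary"] * 4 + ["indicative"]))  # ('arbitrary', ['indicative'])
print(aggregate(["similar"] * 2 + ["indicative"] * 2 + ["arbitrary"]))
# (None, ['similar', 'indicative'])  -- like the Script task below
```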
As we can observe from Table 2, the majority of the signs were specified as symbols (their primary relationship is arbitrary). The six symbols were also the only signs where the experts agreed fully on the sign mode. Furthermore, for all but one symbol the secondary mode was set as index and, the other way around, for all indices the secondary mode was set as symbol. Consensus on the primary relationship was not possible for two signs (Script task and Data object), and on the secondary relationship for one sign (Manual task). Thus, for the Script task and the Data object the primary relationship was not set, but two secondary relationships were set; for the Manual task, only the primary relationship was set.

After the modes of the signs were defined, we included the signs in the Sign Ontology. The ontology, including the instances, is shown in Figure 3. The figure represents classes as circles and relationships as lines connecting the circles; the size of a circle corresponds to the number of instances included in the class.

Table 2: Modes of BPMN signs (the Signifier column, showing each graphical shape, is not reproduced here)

Signified | * | Secondary relationship
Primary relationship: Arbitrary (Symbol)
Activity | 5 | -
Gateway | 5 | -
Signal event | 5 | -
Multiple event | 5 | -
Ad-hoc sub-process | 5 | -
Complex gateway | 5 | -
Event | 4 | Indicative (index)
Parallel event | 4 | Indicative (index)
Escalation event | 4 | Indicative (index)
Link event | 4 | Indicative (index)
Service task | 4 | Indicative (index)
Inclusive gateway | 4 | Indicative (index)
Parallel gateway | 4 | Indicative (index)
Error event | 4 | Indicative (index)
Send task | 3 | Indicative (index)
Receive task | 3 | Indicative (index)
Business rule task | 3 | Indicative (index)
Sub-process | 3 | Indicative (index)
Exclusive gateway | 3 | Indicative (index)
Data object collection | 3 | Similar (icon)
Primary relationship: Indicative (Index)
Conditional event | 4 | Arbitrary (symbol)
Flow | 4 | Arbitrary (symbol)
Cancel event | 4 | Arbitrary (symbol)
Data store | 3 | Arbitrary (symbol)
Compensation event | 3 | Arbitrary (symbol)
Primary relationship: Similar (Icon)
Message event | 4 | Indicative (index)
Timer event | 4 | Indicative (index)
User task | 3 | Arbitrary (symbol)
Manual task | 3 | Not set
Primary relationship: Not set
Script task | 2* | Similar / Indicative
Data object | 2* | Similar / Arbitrary
* The number of experts who decided on this primary mode.

Figure 3: Sign Ontology with BPMN shapes, rendered in the NavigOwl plugin for Protégé

CONCLUSION
In this paper, we mapped linguistic terms from semiotics to the terms used for signs in the BPMN specification. We found that the BPMN specification uses many terms for the term signifier, some of them inconsistently.

To correlate concepts from semiotics with BPMN graphical signs, we developed the BPMN Sign Ontology based on definitions from semiotics. We categorized each BPMN graphical sign into a mode that represents the relationship between signifier and signified. The majority of BPMN signs are of mode symbol, followed by mode index. As the meaning of symbols needs to be learned, this indicates a possible correlation with the principle of semantic transparency from [10]. Addressing this issue, we will, in future work, compare our results with those from [11] and other related articles.

Since the current study included only five BPMN experts, resulting in possible bias, empirical research with more users is planned, as well as a thorough literature search. At this point, the BPMN Sign Ontology can serve for unambiguous knowledge definition and sharing in the BPMN domain.

5. REFERENCES
[1] M. Kocbek, G. Jošt, M. Heričko, and G. Polančič, "Business process model and notation: The current state of affairs," Comput. Sci. Inf. Syst., vol. 12, no. 2, pp. 509–539, 2015.
[2] OMG, "Business Process Modeling Notation," 2011.
[3] D. Chandler, Semiotics: The Basics, 2nd ed. London: Routledge, 2007.
[4] T. R. Gruber, "A translation approach to portable ontology specifications," Knowl. Acquis., vol. 5, no. 2, pp. 199–220, Jun. 1993.
[5] N. F. Noy and D. L. McGuinness, "Ontology Development 101: A Guide to Creating Your First Ontology," Stanford Knowl. Syst. Lab. Tech. Rep., pp. 1–25, 2001.
[6] L. Obrst, H. Liu, R. Wray, and L. Wilson, "Ontologies for semantically interoperable electronic commerce," IFIP Adv. Inf. Commun. Technol., vol. 108, pp. 325–333, 2003.
[7] L. Cabral, B. Norton, and J. Domingue, "The business process modelling ontology," Proc. 4th Int. Work. Semant. Bus. Process Manag., pp. 9–16, 2009.
[8] A. Fenk, "Symbols and icons in diagrammatic representation," Pragmat. Cogn., vol. 6, no. 1–2, pp. 301–334, 1998.
[9] G. R. Kress and T. van Leeuwen, Reading Images: The Grammar of Visual Design. London: Routledge, 1996.
[10] D. Moody, "The physics of notations: Toward a scientific basis for constructing visual notations in software engineering," IEEE Trans. Softw. Eng., vol. 35, no. 6, pp. 756–779, 2009.
[11] N. Genon, P. Heymans, and D. Amyot, "Analysing the Cognitive Effectiveness of the BPMN 2.0 Visual Notation," J. Vis. Lang. Comput., vol. 22, no. 6, pp. 377–396, 2011.

Knowledge Perception influenced by Notation Used for Conceptual Database Design

Aida Kamišalić, Muhamed Turkanović, Marjan Heričko, Tatjana Welzer
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
aida.kamisalic@um.si, muhamed.turkanovic@um.si, marjan.hericko@um.si, tatjana.welzer@um.si

ABSTRACT
The paper presents an experimental study which examined the influence of the notation used for conceptual design on students' knowledge perception at the higher-education level. The results demonstrate that students' knowledge perception is higher than their actual knowledge throughout the entire learning process and is correlated with the notation used.
Categories and Subject Descriptors
H.2.1 [Database Management]: Logical Design; K.3.2 [Computers and Education]: Computer and Information Science Education

General Terms
Theory, Experimentation

Keywords
Entity-relationship models, conceptual design, database design learning, Barker, Bachman, knowledge perception

1. INTRODUCTION
Relational databases are a fundamental part of any information system, and conceptual and logical design represent an important segment of almost every application. Different issues related to approaches for teaching database fundamentals and design must therefore be adequately addressed. The introductory databases course is one of the fundamentals of computer science and informatics higher-education programs. It is mostly a single-semester course that covers data requirements elicitation, conceptual database design, normalization, logical database design, and physical database design [3, 4, 5]. There is much research addressing issues related to teaching computer science and informatics disciplines, including various aspects of databases [3, 5, 9], and some research has dealt with the effectiveness of teaching approaches to database design (conceptual and logical modeling) [1, 2, 7, 8]. However, to the best of our knowledge, no research has dealt with knowledge perception within the database learning environment.

In order to examine the effectiveness of learning database fundamentals depending on the notation used for conceptual design, we set up a multi-level experimental study [7]. Different experimental instruments were developed to evaluate the effectiveness of a teaching approach using the Barker or Bachman notation for conceptual database design. In contrast to the Barker notation, the Bachman notation incorporates elements of logical design (i.e. foreign keys) at the conceptual design level. Students' achievements were examined with regard to influencing factors throughout the learning process. The results indicated that introducing the Bachman notation and a manual transformation from a conceptual into a logical data model increased students' understanding of conceptual, logical and relational data model concepts (CLR concepts).

Here we present another aspect of this study: the influence of the notation used for conceptual design on students' knowledge perception. The research questions addressed and answered in the paper are: (RQ1) How does the notation used for conceptual design influence students' knowledge perception? and (RQ2) Does the correlation between students' knowledge perception and their actual knowledge of CLR concepts change throughout the learning process?

The structure of the paper is as follows. Section 2 provides the methodological framework and experimental setting. The main contribution of the paper is presented in Section 3, where the results are detailed and discussed. Finally, conclusions are presented in Section 4.

2. METHODOLOGY
2.1 Experimental framework
The study was carried out during the academic year 2016/2017 at the Faculty of Electrical Engineering and Computer Science at the University of Maribor. The experiment was performed within the Database I course, a single-semester course that includes 45 hours of theory/practice lectures and 30 hours of laboratory work in the form of computer exercises.
The focus of the experiment was on the evaluation of students' laboratory work. Students were randomly split into two approximately equal-sized groups. Both groups worked on the same database modeling tasks, using the Oracle SQL Developer Data Modeler design tool. One group used the Bachman notation, which explicitly includes the foreign key in the E-R diagram, while the other group used the Barker notation, which does not [6].

2.2 Experimental instruments
This section presents the experimental instruments used during the study. The questionnaire was conducted twice: an Intro-Questionnaire and a Final-Questionnaire. Participation was optional on both occasions. The questionnaire used in the study is available on the web (http://bit.ly/2wMvrVQ).

The questionnaire is split into three parts. The first part consists of mainly closed-ended questions related to basic demographic information and database design tools (Questions 1–6). The second part consists of a Likert-scale-like multi-level table (Question 8), where participants have to mark one of the multi-level options for five basic database terms and concepts: Entity, Relationship, Attribute, Primary Key (hereinafter PK) and Foreign Key (hereinafter FK). The values of the Likert scale are: (1) I am not familiar with the term; (2) I am familiar with the term, but not with its meaning; (3) Undefined; (4) I am familiar with the meaning, but I do not know how to use it; and (5) I am familiar with the meaning and I know how to use it.

The third part consists of open-ended questions in the form of a short test (Question 9). The short test consists of three consecutive simple tasks (9a, 9b, 9c), each related to the previous one and each increasing in difficulty. To solve the test correctly, the participants have to use a form of one-to-many (hereinafter 1:N) and many-to-many (hereinafter M:N) relationship. The participants are not given any instructions on how to solve the test; they are left to use any means and techniques that seem appropriate. The foreseen time limit is 20 minutes.

The purpose of the questionnaire was to examine whether there was any correlation between the participants' perception of their knowledge of CLR concepts (Question 8) and their actual knowledge (score on test questions 9a, 9b, 9c). When the questionnaire was handed out the second time, an additional closed-ended question was added to the first part (Question 7), in which students were asked which notation they had used during the laboratory work. The purpose of this question was to examine whether there was any correlation between the notation used during the laboratory work and the students' knowledge (score on test questions 9a, 9b, 9c).

To evaluate the questionnaire, a scoring structure for the third part (Question 9) is needed. For the first task (9a), participants have to model an entity (e.g. person) and give it some attributes and possibly a primary key. For the second task (9b), they have to model an additional entity (e.g. phones) and present a 1:N relationship between the previous entity and the newly added one. For the third task (9c), they have to add a third entity (e.g. address) and correctly use a form of M:N relationship between the previous entities and the newly added one. To analyze the results, five concepts are evaluated: entity, relationship, attribute, PK and FK. The scoring is as follows: participants get a point for a concept if they used any possible form of the concept in their solution and its use was correct. Thus, five points can be scored in total.
3. RESULTS AND DISCUSSION
The following sections report the results achieved in the experiment. Statistical analyses were performed using IBM SPSS Statistics version 23.

3.1 Knowledge perception
An analysis was performed on related samples of the perception score and the test score, based on data gathered from the Intro-Questionnaire and the Final-Questionnaire. The data for each questionnaire was analyzed separately.

In the analysis we excluded all records where students rated one of the concepts as undefined; the total number of records taken into account was thus 116. This left four levels of knowledge and five different concepts. As mentioned in the previous section, part of the questionnaire was a short test; we refer to the total test score as the test score. In order to compare actual knowledge with perception effectively, we normalized the results of knowledge perception by dividing the total score (max. 20 points) by five. We refer to the normalized perception results as the perception score. Table 1 reports the results of the analysis, which was performed using a Wilcoxon signed-rank test for related samples.

Table 1: Correlation of results for perception score and test score.

Experimental instrument | Related samples | Asymp. Sig. (2-tailed) | N | Decision
Intro-Questionnaire | Perception score - Test score | 0.000** | 107 | Reject the null hypothesis
Final-Questionnaire | Perception score - Test score | 0.000** | 116 | Reject the null hypothesis
**Significant at 1%

We used the Wilcoxon signed-rank test to compare two sets of scores that are not normally distributed, the actual test score and the normalized perception score, coming from the same participants, since each participant both solved the tasks and evaluated their own knowledge of the CLR topics. The Shapiro-Wilk test of normality indicated that the data deviates significantly from a normal distribution (p-value below 0.05). The Wilcoxon signed-rank test returns an asymptotic significance lower than 0.01, thus rejecting the null hypothesis for related samples, which states that the median of the difference between the perception score and the test score equals zero. There is a statistically significant difference between the perception score and the test score, suggesting that students' perception of their knowledge is not in accordance with their actual knowledge of CLR concepts.
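The same comparison can be reproduced outside SPSS. The sketch below uses randomly generated stand-in scores, since the raw data is not published, and applies the normalization and the tests described above with SciPy:

```python
# A sketch of the related-samples comparison described above, with
# hypothetical stand-in data; the study itself used IBM SPSS Statistics 23.
import numpy as np
from scipy.stats import shapiro, wilcoxon

rng = np.random.default_rng(42)
perception_total = rng.integers(5, 21, size=116)  # summed Likert ratings (max 20)
test_score = rng.integers(0, 6, size=116)         # short-test score (max 5)

perception_score = perception_total / 5           # normalization used in the paper

print(shapiro(perception_score - test_score))     # Shapiro-Wilk normality check
print(wilcoxon(perception_score, test_score))     # Wilcoxon signed-rank test
```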
Figures 1 and 2 depict the correlation between students' actual knowledge and their knowledge perception, indicating a higher knowledge perception than actual knowledge in both questionnaires. The results indicate that the gap between knowledge perception and actual knowledge narrows by the end of the course (Final-Questionnaire), owing to the higher knowledge achieved by then. However, knowledge perception remains at a high level.

Figure 1: Correlation between students' actual knowledge and their knowledge perception. Intro-Questionnaire (course start).

Figure 2: Correlation between students' actual knowledge and their knowledge perception. Final-Questionnaire (course end).

Table 2 reports the ranks of the performed Wilcoxon signed-rank test. In the Intro-Questionnaire, 100 out of 107 participants assessed their knowledge higher than their actual knowledge; only two participants reached the opposite result, and only five assessed their knowledge correctly. The Final-Questionnaire results showed a slight increase in correctly assessed knowledge: 100 out of 116 participants assessed their knowledge as higher than their actual knowledge, one participant reached the opposite result, and 15 assessed their knowledge correctly. A further indication of inaccurate knowledge perception can be deduced from the means of the scored results. The mean of the perception score in the Intro-Questionnaire was 4.095, while the mean test score stood at 1.74; for the Final-Questionnaire, the means were 4.957 and 3.35, respectively. We conclude that students overestimated their knowledge of CLR concepts throughout the entire course.

Table 2: Cases of knowledge perception scores versus actual knowledge scores.

Experimental instrument | Related samples | Ranks | N | Mean rank | Sum of ranks
Intro-Questionnaire | Perception score - Test score | Negative ranks | 2 a | 17.25 | 34.5
 | | Positive ranks | 100 b | 52.19 | 5218.5
 | | Ties | 5 c | |
 | | Total | 107 | |
Final-Questionnaire | Perception score - Test score | Negative ranks | 1 a | 1 | 1
 | | Positive ranks | 100 b | 51.5 | 5150
 | | Ties | 15 c | |
 | | Total | 116 | |
a Perception score < Test score; b Perception score > Test score; c Perception score = Test score

Conclusions regarding RQ2: Students overestimated their knowledge of CLR concepts throughout the entire course. The correlation between students' knowledge perception and their actual knowledge improves by the end of the course, due to the higher knowledge reached by then; however, knowledge perception remains at a high level.

3.2 Knowledge perception and notation
Additionally, we analyzed the results of students' knowledge perception and actual knowledge with respect to the notation used in the learning process. The normalized results of students' self-assessment of their knowledge and the results of our assessment of their knowledge were summed and used to assess students' perception of knowledge in dependence on the notation. The range of the summed score is thus 1–10. As the summed score approaches the extremes, students were better able to assess their knowledge; that is, their perception of their knowledge and their actual knowledge were very close. Conversely, the closer the result is to the middle, the more a student incorrectly assessed their knowledge, either overestimating or underestimating it. For example, students could assess their knowledge as high, reaching five points for perception, and also score all five points on the test, collecting ten points in total. Conversely, students could assess their knowledge as high but reach few or no points on the test, scoring five points in total. The analysis of the impact of the notation was based on the data gathered from the Final-Questionnaire only, because the impact of the notation can only be seen after the notation has been used in the learning process. We used the Mann-Whitney U test to compare differences between two independent groups (students using the Bachman or the Barker notation) on the dependent variable (students' summed test score and normalized perception score), as the groups are not normally distributed; the Shapiro-Wilk test of normality indicated that the data deviates significantly from a normal distribution (p-value below 0.05).
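A minimal SciPy equivalent of this independent-samples comparison, again with stand-in scores for the two notation groups:

```python
# A sketch of the notation comparison described above; the group sizes
# follow the paper (68 Barker, 48 Bachman), the scores are hypothetical.
import numpy as np
from scipy.stats import mannwhitneyu, shapiro

rng = np.random.default_rng(7)
barker = rng.integers(5, 11, size=68)     # summed perception + test scores (1-10)
bachman = rng.integers(6, 11, size=48)

print(shapiro(barker), shapiro(bachman))  # normality checks
print(mannwhitneyu(barker, bachman))      # Mann-Whitney U test
```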
Table 3 reports the results of the Mann-Whitney U test for independent samples.

Table 3: Correlation of summed perception and test score and influencing factor (notation used).

Experimental instrument | Independent variable | Dependent variable | N | Asymp. Sig. | Decision
Final-Questionnaire | Notation | Summed perception and test score | 116 | 0.008** | Reject the null hypothesis
*Significant at 5%; **Significant at 1%

The Mann-Whitney U test returns an asymptotic significance lower than 0.01 for the notation variable, therefore rejecting the related null hypothesis, which states that the distribution of the summed score is the same across the Bachman and Barker notation categories. Considering the results, there is a statistically significant difference between the summed scores by notation used in the learning process. 68 out of 116 students used the Barker notation during the learning process, with a summed mean score of 8.1; the Bachman notation was used by 48 students, with a summed mean score of 8.608. As is evident from Figure 3, more students who used the Bachman notation assessed their knowledge well.

Figure 3: Summed perception score and test score in correlation with the notation used during the learning process.

In contrast, there were students who used the Barker notation with a summed score of five, which indicates the worst assessment of knowledge. We conclude that students who used the Bachman notation in the learning process evaluated their knowledge better than students who used the Barker notation.

Conclusions regarding RQ1: The Bachman notation positively influences students' ability of knowledge self-assessment. By the course's end, the difference between knowledge perception and actual knowledge decreases.

4. CONCLUSIONS
The paper reported the results of an experimental study aimed at analyzing the influence of the notation used for conceptual design on students' knowledge perception. The study continues the work already presented in [7], reporting on students' knowledge perception being higher than their actual knowledge.

We examined whether students' perception of their knowledge is in accordance with their actual knowledge of CLR concepts. The results confirm that their perception is higher than their actual knowledge throughout the entire learning process. By the end, their knowledge increases while their perception remains at a similar level as at the beginning. Additionally, the results show that students who used the Bachman notation during the learning process were able to estimate their knowledge better. In the future, we plan to analyze the correlation between students' educational background and their success rate when learning CLR concepts at the higher-education level.

5. ACKNOWLEDGMENTS
The authors acknowledge the financial support from the Slovenian Research Agency (Research Core Funding No. P2-0057).

6. REFERENCES
[1] A. Al-Shamailh. An Experimental Comparison of ER and UML Class Diagrams. International Journal of Hybrid Information Technology, 8(2):279–288, 2015.
[2] H. C. Chan, K. K. Wei, and K. L. Siau. Conceptual level versus logical level user-database interaction. In ICIS Proceedings, pages 29–40, 1991.
[3] T. M. Connolly and C. E. Begg. A Constructivist-Based Approach to Teaching Database Analysis and Design. Journal of Information Systems Education, pages 43–53, 2005.
[4] R. Dargie and A. Steele. Teaching Database Concepts using Spatial Data Types. In Proceedings of the 4th Annual Conference of Computing and Information Technology Research and Education New Zealand, pages 17–21, 2013.
[5] C. Domínguez and A. Jaime. Database design learning: A project-based approach organized through a course management system. Computers & Education, 55(3):1312–1320, 2010.
[6] D. C. Hay. A comparison of data modeling techniques. Essential Strategies, Inc., pages 1–52, 1999.
[7] A. Kamišalić, M. Heričko, T. Welzer, and M. Turkanović. Experimental Study on the Effectiveness of a Teaching Approach Using Barker or Bachman Notation for Conceptual Database Design. Computer Science and Information Systems, 15(2):421–448, 2018.
[8] H. C. Purchase, R. Welland, M. McGill, and L. Colpoys. Comprehension of diagram syntax: an empirical study of entity relationship notations. International Journal of Human-Computer Studies, 61(2):187–203, 2004.
[9] S. D. Urban and S. W. Dietrich. Integrating the Practical Use of a Database Product into a Theoretical Curriculum. SIGCSE Bull., 29(1):121–125, 1997.

The Use of Standard Questionnaires for Evaluating the Usability of Gamification

Alen Rajšp, Katja Kous, Tina Beranič
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
alen.rajsp@um.si, katja.kous@um.si, tina.beranic@um.si

ABSTRACT
Usability has a significant impact on the satisfaction with, and frequency of use of, a designed system. Nowadays, gamification and serious-game approaches are implemented in software solutions to increase their usability. We present a literature review of 32 identified studies that measure usability with established questionnaires in gamified systems and serious games. We identified 18 different questionnaires used for measuring usability and found the System Usability Scale to be the most widely used. An immense issue exists in the field: only 22% of the studies measuring usability actually describe or define what usability is.
Categories and Subject Descriptors
H.5.2 [User Interfaces]: User-centered design

General Terms
Measurement, Experimentation, Standardization, Theory, Verification

Keywords
Usability Evaluation Method, Formal Questionnaires, Gamification, Serious Games

1. INTRODUCTION
In recent years, gamification has become an essential part of a variety of domains, from education to medicine. It is used to facilitate the use of developed products, but it cannot achieve its purpose if the usability of the product is inadequate. Usability evaluation should therefore be a crucial step of development.

Solutions utilizing (1) gamification or (2) a serious-game approach should be evaluated separately, because they are inspired by games, which have very specific (and different) natures. The primary function of games is to entertain through experience, whereas serious games and gamification have some intended useful purpose [10]. Because gamification and serious-game approaches utilize elements from games, the resulting solutions also meet, to varying degrees, the other needs of their intended audience, which increases user satisfaction.

In the web domain, only 18% of the papers reviewed in [7] present usability evaluation methods relying on the standardized definitions of usability. Fernandez et al. [7] found that 59% of the reviewed papers reported end-user-based usability testing, while 35% used inquiry methods (such as focus groups, interviews, questionnaires and surveys). Based on these facts, this research focuses on inquiry methods, more specifically on questionnaires, and investigates which standard questionnaires are used most commonly for usability evaluation in the gamification domain. Within the presented paper, we focus on the research question: Which standard questionnaires are used for evaluating the usability of gamification? Using a literature review, we study the use and popularity of established usability questionnaires in the gamification domain.

A similar study, by Yáñez-Gómez et al. [18], reviews academic methods for the usability evaluation of serious games. The scope of that study is broader, aiming at finding the preferred approach for evaluating the usability of games. As its results show, standard questionnaires are the second most used technique applied in post-use analysis [18]. The authors mention three questionnaires in use, but a detailed analysis is not provided; in addition, our search string differs from theirs. Another review, by Calderón and Ruiz [5], covers the evaluation of serious games. One of its research questions concerned evaluation techniques, and it found that questionnaires are used most commonly, but a categorization or detailed analysis of the used questionnaires was not provided.

The paper is structured as follows. We start by presenting the research background, covering usability evaluation and gamification; we continue by presenting and discussing the results of the literature review; and we close by presenting the conclusions reached by our review.

2. USABILITY EVALUATION
The term usability represents a combination of several properties and attributes [13]. Regardless of the variety of definitions by different authors [1, 3, 9, 13, 15, 17], Jeng [12] states that the Nielsen and ISO 9241-11 definitions are the most widely cited. ISO 9241-11 defines usability as "the extent to which a product can be used by a specified user to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use" [11], while Nielsen [15] defines usability as an aggregation of five attributes: learnability, efficiency, memorability, errors and satisfaction.

The usability evaluation method is defined as "a procedure, composed of a set of activities for collecting usage data related to end user interaction with a software product, and/or how the specific properties of this software product contribute to achieving a certain degree of usability" [7].
According to Battleson et al. [2], usability evaluation methods are classified into three categories: (1) inquiry methods (such as focus groups, interviews, questionnaires and surveys), (2) formal usability testing (such as interactions with a website by performing tasks) and (3) inspection methods (such as heuristic evaluation, cognitive walk-through, pluralistic walk-through and formal inspection). The first two categories involve real users, while inspection methods are based on reviewing the usability aspects of web artifacts, which have to comply with established guidelines, and are performed by expert evaluators or designers [7].

3. GAMIFICATION
Gamification is the use of design elements characteristic of games in non-game contexts [6]. Gamification should not be confused with serious games. Whereas the goal of introducing gamification is to influence learning-related behaviours and attitudes without providing knowledge, serious games should influence learning and provide knowledge through the experience itself [14]. Another way to compare them is that gamification uses only parts (game elements) of games, while serious games provide the whole gaming experience [6].

4. EVALUATING THE USABILITY OF GAMIFICATION
4.1 Research
Our research aims to find the available standard questionnaires used for evaluating the usability of gamification. Using the search string "usability" AND ("gamification" OR "serious games" OR "educational games"), we searched the following digital libraries: ScienceDirect, IEEE Xplore, ACM Digital Library and Sage Journals. Predetermined inclusion and exclusion criteria guided the study selection process. We considered papers evaluating usability with the help of established and well-known questionnaires, and therefore excluded primary studies using ad-hoc questionnaires.

After the review process, we selected 33 primary studies. The list of primary studies used as input to the data extraction and data synthesis steps is available at https://tinyurl.com/CSS2018-IJS. 26 of the 33 primary studies are conference papers, whereas seven are journal articles. Figure 1 shows the number of primary studies by year of publication. We selected 23 primary studies from the IEEE Xplore digital library, six from the ACM Digital Library, three from ScienceDirect and one from Sage Journals.

Figure 1: Primary studies by years
4.2 Results
In the data extraction, we focused on two main areas. First, we searched for the definitions of usability that were used, since usability was the property evaluated in the analysed studies. The extracted data showed that only seven primary studies (22%) defined and described the term usability. Two of them treated usability as a concept (S5, S10), while five treated it as a construct: two studies (S11, S21) used Nielsen's definition, one (S4) used the ISO definition, one (S18) described usability as "ease of use of the game", and study S25 defined usability similarly to ISO but expanded the definition with two new concepts ("simple" and "operating with ease"). The remaining studies (78%) used the term usability without providing its meaning.

The studies are classified by domain in Table 1. Over half (56%) of all studies were from the field of health and medicine. Most studies from this domain addressed (1) training of health-care personnel (S8, S17, S18), (2) rehabilitation and exercise for patients (S3, S6, S7, S16) and (3) assessing patients (S1). The second most popular domain (37%) was education and learning. All other identified domains had only one study each.

Table 1: Domain

Domain | Primary studies
Agriculture | S27
Business Intelligence | S5
Computer Science | S5
Education & Learning | S2, S8, S10, S13, S14, S16, S17, S18, S23, S28, S29, S31
Entertainment | S4
Health & Medicine | S1, S3, S6, S7, S8, S11, S12, S16, S17, S18, S19, S20, S21, S22, S24, S29, S30, S32
Social Science | S25
Task Management | S9
Travel | S15

We continued the data extraction by identifying the standard questionnaires used for usability evaluation. We followed the explanation provided by Yáñez-Gómez et al. [18], which states that standard questionnaires are those that have been validated statistically. Table 2 presents the questionnaires in connection with the primary studies. The majority of studies evaluated usability using the System Usability Scale (SUS), which was used in 78% of the primary studies. Although the Technology Acceptance Model (TAM) is used in model-driven analysis for measuring users' acceptance and usage of technology, and is not classified as a standard questionnaire for usability evaluation, it was used for the assessment of gamification in four primary studies. The Game Experience Questionnaire (GEQ), Task Load Index (TLX), Game Engagement Questionnaire (GEQ), Post-Study System Usability Questionnaire (PSSUQ) and Net Promoter Score (NPS) were each used in two primary studies. We also extracted questionnaires used in only one primary study, such as the Presence Questionnaire (PQ) and the Software Usability Measurement Inventory (SUMI).

Table 2: Standard questionnaires in use

Questionnaire | Primary studies
System Usability Scale (SUS) | S1, S3, S6, S7, S10, S11, S12, S14, S16, S17, S18, S19, S20, S21, S22, S23, S24, S25, S26, S27, S28, S29, S30, S31, S32
Technology Acceptance Model (TAM) | S2, S3, S11, S20
Game Experience Questionnaire (GEQ) | S1, S4, S30
Task Load Index (TLX) | S1, S22
Game Engagement Questionnaire (GEQ) | S11, S18
Post-Study System Usability Questionnaire (PSSUQ) | S8, S15
Net Promoter Score (NPS) | S31, S32
User Engagement Scale (UES) | S5
Computer System Usability Questionnaire (CSUQ) | S13
Software Usability Measurement Inventory (SUMI) | S7
Intrinsic Motivation Inventory (IMI) | S16
User Interaction Satisfaction (QUIS) | S18
Presence Questionnaire (PQ) | S10
Usefulness, Satisfaction, and Ease of use (USE) Questionnaire | S9
Pick-A-Mood (PAM) | S10
Technology Affinity - Electronic Devices (TA-ED) Questionnaire | S20
Game User Experience and Satisfaction Scale (GUESS) | S19
Differential Emotions Scale (DES) | S10

For a comprehensive usability evaluation, it is crucial that the measurement instruments are utilized appropriately according to the attribute they are measuring. 41% (13/32) of the primary studies (S6, S12, S17–S19, S25–S30, S32) used the most established questionnaire, SUS, for measuring usability, but did not define the measured attribute in their research.
Table 3 presents the connection between the used questionnaires and the attributes measured in at least two primary studies. The most frequently measured attributes were "ease of use" and "usability", each measured in six primary studies. In all cases, the attribute "usability" was measured with SUS, while the attribute "ease of use" was measured with three different questionnaires: SUMI (S7), USE (S9) and TAM (S2, S3, S11, S20). The second most frequently measured attribute was "usefulness". In three primary studies (S2, S11, S20) it was treated as one of the two factors defined in TAM, while in one case it was measured with USE (S9) and in another with PSSUQ (S15). The attribute "satisfaction" was the third most common, measured with two different questionnaires: SUS (S21, S22, S31) and USE (S9).

Table 3: Connection between the measured attributes and used questionnaires

Measured attribute | Questionnaires
Ease of use | SUMI (S7), USE (S9), TAM (S2, S3, S11, S20)
Usability | SUS (S10, S11, S16, S23, S24, S31)
Usefulness | TAM (S2, S11, S20), USE (S9), PSSUQ (S15)
Satisfaction | USE (S9), SUS (S21, S22, S31)
Flow | GEQ (S1, S4, S11)
Learnability | SUMI (S7), USE (S9)
Competence | GEQ (S1, S4)
Overall | CSUQ (S13), SUMI (S7)
Quality of information | CSUQ (S13), PSSUQ (S15)
Quality of interface | CSUQ (S13), PSSUQ (S15)

The most popular devices on which the developed or proposed solutions ran were computers (62%), virtual-reality equipment (22%) and mobile devices (16%), as seen in Table 4.

Table 4: Devices on which the studied system runs

Device | Primary studies
Computer | S1, S2, S4, S5, S6, S7, S8, S10, S11, S13, S14, S16, S21, S22, S23, S25, S26, S28, S29, S31
Customised system | S19
Mobile device | S9, S12, S15, S20, S27
Smart TV | S3
Virtual reality | S10, S15, S17, S18, S24, S30, S32

4.3 Discussion
An extensive collection of standard questionnaires was found for evaluating the usability of gamification, with the System Usability Scale (SUS) as the prevailing choice (84% of all studies). Since SUS is a well-known questionnaire that is easy to administer and analyse, this is not a surprise; as SUS was developed to provide a subjective assessment of usability [4], its extensive use is even more understandable.
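For reference, SUS itself is scored with a fixed rule: ten five-point items, where odd items contribute (response - 1), even items contribute (5 - response), and the sum is multiplied by 2.5 to yield a 0–100 score. A minimal sketch follows; the example responses are hypothetical.

```python
# Standard SUS scoring (Brooke [4]); the example responses are hypothetical.
def sus_score(responses):
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    odd = sum(r - 1 for r in responses[0::2])    # items 1, 3, 5, 7, 9
    even = sum(5 - r for r in responses[1::2])   # items 2, 4, 6, 8, 10
    return (odd + even) * 2.5

print(sus_score([5, 1, 5, 2, 4, 1, 5, 1, 4, 2]))  # 90.0
```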
However, the majority of researchers that used SUS in their studies did not state explicitly which attribute of usability was measured. The remaining studies that used SUS defined two different attributes that can be measured with it. The first was "usability", which is in accordance with the description of SUS's purpose [4]; the second was "satisfaction", which is recommended by the ISO/TS 20282-2:2013 standard [8], where SUS is defined as a questionnaire for measuring satisfaction.

Another question is whether standard usability questionnaires can adequately evaluate the usability of gamification. Shewaga et al. [16] claim that the SUS questionnaire is a verified instrument for measuring usability in the serious-games domain. The Technology Acceptance Model (TAM) is widely used in the information-systems domain to investigate how accepted the use of a technology is among its target users. Although it is not classified as a standard questionnaire for usability evaluation, but rather as a model combining the constructs ease of use and usefulness, it was the second most used measuring instrument for usability evaluation in the reviewed literature. On the other hand, questionnaires like the Game Experience Questionnaire (GEQ) and the Game Engagement Questionnaire (GEQ), which originate from the gaming domain, are nowadays used to evaluate the usability of gamification; a fusion of the two fields can thus be perceived.

5. CONCLUSION
The paper presents a literature review aimed at finding the standard questionnaires used for the usability evaluation of gamification and serious games. We found that the majority (84%) of studies evaluate usability using the System Usability Scale (SUS), though some other questionnaires were also detected and used independently or in combination with SUS. As prospective researchers, we can determine only in a minority of cases what the primary studies were measuring, because only 22% of the primary studies measuring usability defined or described what usability is. That is an immense issue for the validity of their usability measurements, since multiple definitions of usability exist. We propose that methods for measuring usability in the field of gamification and serious games be formalised in the future. Although researchers already use standardised methods for measuring usability, research should also state what usability means for the authors and what they are measuring.

6. ACKNOWLEDGMENTS
The authors acknowledge the financial support from the Slovenian Research Agency (Research Core Funding No. P2-0057).

7. REFERENCES
[1] A. Abran, A. Khelifi, W. Suryn, and A. Seffah. Usability meanings and interpretations in ISO standards. Information and Software Technology, 11(4):325–338, Aug 2003.
[2] B. Battleson, A. Booth, and J. Weintrop. Usability testing of an academic library web site: A case study. J. Acad. Librariansh., 27(3):325–338, 2001.
[3] T. Brinck, D. Gergle, and S. D. Wood. Designing Web Sites that Work: Usability for the Web. Morgan Kaufmann, San Francisco, 2002.
[4] J. Brooke. SUS: A quick and dirty usability scale, 1996.
[5] A. Calderón and M. Ruiz. A systematic literature review on serious games evaluation: An application to software project management. Computers & Education, 87:396–422, 2015.
[6] S. Deterding, D. Dixon, R. Khaled, and L. Nacke. From game design elements to gamefulness: Defining gamification. Proceedings of the 15th International Academic MindTrek Conference on Envisioning Future Media Environments - MindTrek '11, pages 9–11, 2011.
[7] A. Fernandez, E. Insfran, and S. Abrahão. Usability evaluation methods for the web: A systematic mapping study. Inf. Softw. Technol., 53(8):789–817, Aug 2011.
[8] International Organization for Standardization. ISO/TS 20282-2:2013 Usability of consumer products and products for public use - Part 2: Summative test method, 2013.
[9] E. Furtado, J. J. V. Furtado, F. Lincoln Mattos, and J. Vanderdonckt. Improving usability of an online learning system by means of multimedia, collaboration, and adaptation resources. In Usability Eval. Online Learn. Programs, pages 69–86, October 2003.
[10] C. Girard, J. Ecalle, and A. Magnan. Serious games as new educational tools: how effective are they? A meta-analysis of recent studies. Journal of Computer Assisted Learning, 29(3):207–219, 2013.
[11] ISO. Standard 9241: Ergonomic Requirements for Office Work with Visual Display Terminals (VDTs), Part 11: Guidance on Usability, 1998.
[12] J. Jeng. What is usability in the context of the digital library and how can it be measured? Information Technology and Libraries, 24(2):47–56, Nov 2005.
[13] Z. Kılıç Delice and E. Güngör. The usability analysis with heuristic evaluation and analytic hierarchy process. Int. J. Ind. Ergon., 39(6):934–939, Nov 2009.
[14] R. N. Landers. Developing a Theory of Gamified Learning: Linking Serious Games and Gamification of Learning. Simulation & Gaming, 45(6):752–768, 2014.
[15] J. Nielsen. Usability Engineering. Academic Press, San Diego, 1993.
[16] R. Shewaga, A. Uribe-Quevedo, B. Kapralos, K. Lee, and F. Alam. A Serious Game for Anesthesia-Based Crisis Resource Management Training. Entertainment Computing, 16(2):6:1–6:16, Apr 2018.
[17] G. Tsakonas and C. Papatheodorou. Exploring usefulness and usability in the evaluation of open access digital libraries. Information Processing & Management, 44(3):1234–1250, May 2008.
[18] R. Yáñez-Gómez, D. Cascado-Caballero, and J.-L. Sevillano. Academic methods for usability evaluation of serious games: a systematic review. Multimedia Tools and Applications, 76(4):5755–5784, Feb 2017.

Analyzing Short Text Jokes from Online Sources with Machine Learning Approaches

Samo Šimenko, Vili Podgorelec, Sašo Karakatič
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
samo.simenko@student.um.si, vili.podgorelec@um.si, saso.karakatic@um.si

ABSTRACT
This paper presents the complete data mining process of analyzing jokes in the Slovenian language gathered from various online sources. The gathering was done with the help of a web scraping system, and an analysis was carried out on the gathered jokes to determine the properties of various types of jokes. In addition, with the help of various text-mining methods, we analyzed different types of jokes and built a machine learning model for classifying jokes into categories. These results are supplemented with a visualization of the different categories and an interpretation of the constructed machine learning classification models.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous; I.2.m [Artificial Intelligence]: Miscellaneous
General Terms
Machine Learning, Data Mining, MDS, SVC

Keywords
Data mining, Machine learning, Joke analysis, Short text analysis, Text mining

1. INTRODUCTION
Due to ever-advancing technology, opportunities are opening up for analyzing all types of data, so we can make the most of this and use it for our benefit. Scientists were already studying and examining various types of texts in the initial phases of textual analysis [8, 9, 10], but studying the meaning and connections of texts presents a rather new direction of research, where there is still a lot of room for improvement. While there has been a lot of work done on various short text types, e.g. tweets [12], reviews [11], recipes [13] and others, there is a lack of research published on the topic of joke analysis.

In our paper, we present a process of gathering, parsing and pre-processing jokes and applying various data- and text-mining techniques to extract patterns and new knowledge from the joke data. By semantic text processing, we identify more than just a sequence of symbols; we can assign them meaning, which can influence the classification of jokes. In our case, we undertook the processing of various jokes in order to determine how the categories of such texts are interconnected by their content, and to find out which categories of jokes share the most similar content. Based on the texts, we created a classification model for the classification of jokes into predefined categories.

The rest of the paper is structured in the following way. The following section presents the method for gathering and parsing jokes from the online sources. The third section presents the individual steps of data- and text-mining in detail; it consists of the machine learning method description, the applications and techniques used in the process, and the results themselves. We finish up with the conclusion and the discussion on the topic of joke analysis with various data mining methods.

2. GATHERING AND PARSING OF THE JOKES FROM THE ONLINE SOURCES
In order to fulfill the set goals of analyzing jokes, we obtained them from various sources. Three different sources were used:
– jokes from the first source, a web site called VERZIVICI [2], already classified into categories;
– jokes from the second source, NAJVICI [3];
– jokes from the third source, MLADINSKI [4].

For the data acquisition we developed a program in the Visual Studio IDE, using the C# programming language, which acquired jokes from the selected sources and saved them in a suitable text format. Due to the unstructured data of the selected web resources, we used HAP (HtmlAgilityPack) for processing. HAP is an HTML parser written in C# for reading/writing the DOM (Document Object Model) and supports plain XPATH or XSLT [1]. Using the HAP library and XPATH, we could easily access the individual sections which contained the content known as a "joke".

Jokes from VERZIVICI, which were categorized when gathered, were handled manually, since the program for collecting jokes from different categories used the category name in the creation of the URL, which is used for scrolling between categories. For NAJVICI, we manually created a URL for gathering jokes, so we could easily access all jokes on the site. On the website MLADINSKI, the jokes were already grouped and were sequentially recorded on one side of the web page.

For the purpose of processing and subsequent manipulation, a simple VIC class was created, which contains two textual attributes, Text and Category. Both attributes store values in string format; the Text attribute holds the raw text of a joke, while Category holds the category in which the joke is classified. When capturing, we encountered redundant badges before and between texts, and unreadable machine records were created instead of some symbols due to encoding. All badges with their associated symbols, and the non-nominal groups of characters created instead of symbols, were manually entered into the program and then programmatically removed.
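The scraper itself was written in C# with HtmlAgilityPack; purely as an illustration of the same XPath-driven extraction idea, a minimal Python sketch is shown below. The URL, the XPath expression and the dictionary layout (mirroring the VIC class) are assumptions, since the markup of the source sites is not described here.

```python
import requests
from lxml import html

def scrape_jokes(url, joke_xpath, category):
    """Download one page and extract joke texts with an XPath query.

    Illustrative only: the original system was written in C# with
    HtmlAgilityPack; the XPath expression depends on the page markup.
    """
    page = requests.get(url)
    page.raise_for_status()
    tree = html.fromstring(page.content)
    jokes = []
    for node in tree.xpath(joke_xpath):
        text = node.text_content().strip()
        if text:
            # Mirror the VIC class: raw joke text plus its category label.
            jokes.append({"Text": text, "Category": category})
    return jokes

# Hypothetical usage; both the URL and the XPath are placeholders:
# jokes = scrape_jokes("http://www.verzi-vici.com/...", "//div[@class='joke']", "janezek")
```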
As a result of obtaining and processing the data from the selected sources, we received the data which are used as the basis below:
– VERZIVICI [2] – 13 categories, a total of 1729 jokes,
– NAJVICI [3] – a total of 297 jokes, and
– MLADINSKI [4] – a total of 145 jokes.

We saved the acquired data in the CSV format. Due to the characteristics of the CSV format, the comma symbol "," was changed to the symbol XX (addressed below), because in CSV the comma represents a field separator, while in jokes commas can have a different meaning. All of the jokes were in the Slovenian language, so this had to be taken into consideration during the text analysis.

3. DATA ANALYSIS
In this section, we present the methods and techniques for analyzing the jokes and the results of these analyses. The whole process of cleaning, preprocessing, and the analysis itself was done with the Python programming language and its libraries.

3.1 Cleaning and preprocessing the data
As mentioned, we use the Python programming language to process the data; a CSV file can simply be imported using the Pandas library [5]. Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language [5, 14]. The imported data is then appropriately structured using the DataFrame class with the following columns (attributes):
– Index,
– Category, and
– RawText.
The XX symbols are then removed and replaced with the comma symbol ",".
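A minimal sketch of the loading and comma-restoration step described above, assuming the scraped data was saved to a headerless file named jokes.csv with the three columns listed:

```python
import pandas as pd

# Load the scraped CSV (file name and missing header row are assumptions).
df = pd.read_csv("jokes.csv", names=["Index", "Category", "RawText"])

# Undo the comma workaround: the scraper stored "," as "XX" so that commas
# inside jokes would not be mistaken for CSV field separators.
df["RawText"] = df["RawText"].str.replace("XX", ",", regex=False)
```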
From the text, we also removed stop words – common words that do not carry any semantic meaning or information. Stop words occur in texts with high frequency but are of little significance and consequently uninteresting. A sample of stop words in the Slovenian language: "in" (En. and), "ali" (En. or), "je" (En. is), "za" (En. for), "to" (En. this), "na" (En. on), "ti" (En. you), "ko" (En. when), "bi" (En. would), "ne" (En. no), "da" (En. yes), "že" (En. already), "le" (En. only).

In addition, the punctuation was removed, so the resulting text was in the form of one sentence without the most common stop words. From the resulting text, we built a representation of every joke in a format appropriate for the analysis. We used the method of counting the frequency of individual words, called word frequency. This number was normalized by the frequency of the word across all categories, so more common words got a lower score and less common, possibly unique, words got a higher score. This process is called tf-idf (term frequency–inverse document frequency) and is a common word scoring method in text mining [17]. The new dataset was built in such a way that each identified word represented one attribute of the joke, and the corresponding value of that attribute is the tf-idf score of that word in that joke.

3.2 Classification of jokes in the predetermined categories
We used the classification machine learning technique in order to construct a classification model that would learn how to classify yet unseen jokes into one of the predetermined categories. This can be useful if one wants to automate joke categorization on an online joke portal without any need for human intervention. Classification is a supervised machine learning method, which means that the machine learns to classify jokes from already solved (classified) examples [15].

There are numerous different classification algorithms [18], but for our case we used the Support Vector Machine (SVM) classifier, developed by Vapnik [16]. This method learns the boundaries that separate instances (jokes in our case) of one category from another, by finding a linear separation border called a hyper-plane that has a maximum distance from the entire instance set, which is called the maximum margin. The instances that are closest to the hyper-plane are called support vectors. The SVM method also uses the kernel trick [19], which maps the attribute space of the classification instances to a higher dimensional space. In our case, we used a linear kernel, which uses a linear function to transform the attributes in such a way that the margin of the hyper-plane is maximized.

We used the implementation of SVM from the library liblinear [20], which has high flexibility in the choice of penalties and loss functions and should scale to large numbers of samples. This library supports both dense and sparse input, and multiclass problems are handled according to a one-vs-the-rest scheme [6].

Upon preliminary data preparation, the whole joke dataset was divided into train and test sets, where the training set is used to build the SVM classification model, and the test set is used to test the quality of the model – the ability to correctly classify yet unseen jokes. In our experiment, we applied stratified sampling to split the data and used 60% of the data for the training set and the remaining 40% for the test set. The results of the experiment show that the resulting classification model classifies test jokes with 61% accuracy: the classifier correctly classified more than half of the jokes into their proper category, out of 13 possible categories.

The default classification of instances into one of the 13 categories would result in only 0.08 accuracy, so our resulting classifier improves on the default classifier significantly. This represents a higher precision than was foreseen at first glance. Additionally, we also manually examined some of the jokes that were misclassified. Interestingly, although the predicted categories were not correct, several of the examined jokes would fit well into the predicted category as well, as the semantics of a joke is not always monolithic.
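Continuing the sketch above, the following illustrates how the described pipeline (tf-idf weighting, a stratified 60/40 split, and a linear SVM backed by liblinear) could look with scikit-learn. The abbreviated stop-word list and the reported ~61% accuracy come from the text; everything else is an assumption about the implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# Abbreviated Slovenian stop-word list from the text above.
slovenian_stop_words = ["in", "ali", "je", "za", "to", "na", "ti",
                        "ko", "bi", "ne", "da", "že", "le"]

# tf-idf turns each joke into a vector of word scores (one attribute per word).
vectorizer = TfidfVectorizer(stop_words=slovenian_stop_words)
X = vectorizer.fit_transform(df["RawText"])
y = df["Category"]

# Stratified 60/40 split, as in the experiment described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.6, stratify=y, random_state=0)

clf = LinearSVC()  # linear-kernel SVM, implemented on top of liblinear
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))  # ~0.61 reported in the paper
```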
3.3 Word frequency analysis and visualization
From the dataset of jokes, with attributes holding the individual words' tf-idf scores, we built word cloud diagrams for every category of joke. The word clouds were made with the help of the matplotlib [21] and wordcloud libraries for the Python programming language. In a word cloud, the most common words (or rather those with higher tf-idf scores) are written in a larger font, while those with lower tf-idf scores are written in a smaller font. The color of the words only serves to make the words more differentiable and thus improves the readability of the word clouds.

These word clouds also show which highly informative words (non-stop words) are common for each category and can be used for manual classification. This way we can check whether a joke which reads "pride nekega dne k janezkovemu očetu domov nek njegov nadležen prijatelj tone potrka vpraša dober dan oče doma janezek tone ja kje janezek vem grem vprašat" was appropriately classified into a category (the original category is called "janezek", while "Solski" was predicted). As we can see in Figure 1, it is understandable that our model decided to classify the joke into the category "Solski", because the word "janezek" prevails in this category and is the dominant word in the content of the joke.

Figure 1: Word clouds for ten joke categories.

3.4 Hierarchy of the categories
With the help of the scipy [22] Python library, we also built a dendrogram of the relations between the categories using a hierarchical clustering method, which is shown in Figure 2. Here we also included the categories from the sources NAJVICI and MLADINSKI, so that we can visually display the content linkage between the different categories. The dendrogram is a hierarchical diagram which shows which terms (in our case joke categories) are closer together by putting the more similar categories closer together on the Y-axis. The more similar the categories are, the shorter the lines connecting them, and vice versa.

From the dendrogram we can see that the categories MLADINSKI (En. young ones) and SOLSKI (En. school ones) are the most similar, since school is usually attended by young people. Based on the names of the categories NAJVICI and Mesane sale (En. random jokes), it can also be assumed that these categories are very similar. From the dendrogram we can also see that the groups of categories marked by the red and green connections are very different. We can conclude that this division can be attributed primarily to slang expressions, which are more commonly used in foreign jokes as well as in older jokes.

Figure 2: Hierarchical clustering of joke categories.

3.5 Multidimensional scaling
Multidimensional scaling (MDS) enables the visualization of the level of similarity of individual cases of a dataset by lowering the number of different attributes to only two. It refers to a set of related ordination techniques used in information visualization, in particular to display the information contained in a distance matrix [7]. By using MDS from the sklearn.manifold [23] library and the mpl_toolkits.mplot3d [24] library, we can observe the relations between categories even more efficiently, as shown in the 2D graph in Figure 3. This plot shows which categories are closer together and which categories differ the most. Contrary to the dendrogram, we can see that "Mujo in Haso" is not so close to "Ciganski" and "Stari vici", but these three categories differ the most from the rest. This shows the seclusion of the three categories (the group marked in the dendrogram with red color, which includes Stari vici, Mujo in Haso and Ciganski) in relation to the other categories; they make up a kind of circle around the categories "NAJVICI" and "Mesana sale". The categories "NAJVICI" and "Mesana sale" are the closest neighbors, which also suggests an exceptional similarity between them. With the help of Figure 3 we can also see the relationships between the other categories even better; in the case of the categories "Moski" and "Zenske", we can see that, according to their content, these two are very similar categories.

Figure 3: 2D multidimensional scaling plot, which shows the similarity of different joke categories.
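The dendrogram and MDS plots described above could be produced along the following lines, again continuing the earlier sketch: one tf-idf centroid is computed per category, then scipy's hierarchical clustering and scikit-learn's MDS are applied to the centroids. This is an illustrative reconstruction, not the authors' code.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from sklearn.manifold import MDS

# One tf-idf centroid per category: average the vectors of its jokes.
categories = sorted(df["Category"].unique())
centroids = np.vstack([
    np.asarray(X[(df["Category"] == c).values].mean(axis=0)).ravel()
    for c in categories])

# Dendrogram of category similarity (cf. Figure 2).
dendrogram(linkage(centroids, method="average"), labels=categories)
plt.show()

# 2D MDS embedding of the same centroids (cf. Figure 3).
coords = MDS(n_components=2, random_state=0).fit_transform(centroids)
plt.scatter(coords[:, 0], coords[:, 1])
for (x0, y0), c in zip(coords, categories):
    plt.annotate(c, (x0, y0))
plt.show()
```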
Figure 4 depicts a 3D graph of the category relations, for use in further discussion. Using the 3D graph, we can determine the differences between the categories of texts even more accurately. This display mode can turn out to be even more useful with larger amounts of data and when looking for interesting patterns in these texts.

Figure 4: 3D multidimensional scaling plot, which shows the similarity of different joke categories.

4. CONCLUSION
This paper presents a use case of machine learning methods in the analysis of short texts in the form of jokes. We presented the process of gathering, cleaning and preprocessing the jokes, followed by a description of the analysis done with machine learning methods and various visualization techniques. We demonstrated how jokes can be automatically categorized into predefined categories using the Support Vector Machine classification method. With two different visualizations, the dendrogram and the multidimensional scaling plot, we showed how different joke categories are similar to one another. With these methods, we demonstrated how different comparisons can be performed, which can serve us in the further processing of the data, and how the connections in the data can be visualized in a useful and interesting way.

In this paper, we only analyzed jokes in the Slovenian language. For future work, we could compare jokes in different languages to find similarities and differences of jokes and their popularity across different languages and cultures.

ACKNOWLEDGMENTS
The authors acknowledge the financial support from the Slovenian Research Agency (research core funding No. P2-0057).

REFERENCES
[1] http://html-agility-pack.net, Last visited: 20.8.2018.
[2] http://www.verzi-vici.com, Last visited: 5.8.2018.
[3] http://www.naj-vici.com, Last visited: 5.8.2018.
[4] http://www.mladinska.com/, Last visited: 5.8.2018.
[5] https://pandas.pydata.org, Last visited: 13.8.2018.
[6] http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html, Last visited: 13.8.2018.
[7] https://en.wikipedia.org/wiki/Multidimensional_scaling, Last visited: 20.8.2018.
[8] Song, G., Ye, Y., Du, X., Huang, X. and Bie, S., 2014. Short text classification: A survey. Journal of Multimedia, 9(5), p. 635.
[9] Chen, M., Jin, X. and Shen, D., 2011. Short text classification improved by learning multi-granularity topics. In IJCAI (pp. 1776–1781).
[10] Sriram, B., Fuhry, D., Demir, E., Ferhatosmanoglu, H. and Demirbas, M., 2010. Short text classification in Twitter to improve information filtering. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 841–842). ACM.
[11] Dave, K., Lawrence, S. and Pennock, D.M., 2003. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In Proceedings of the 12th International Conference on World Wide Web (pp. 519–528). ACM.
[12] Wang, Y., Liu, J., Qu, J., Huang, Y., Chen, J. and Feng, X., 2014. Hashtag graph based topic model for tweet mining. In Data Mining (ICDM), 2014 IEEE International Conference on (pp. 1025–1030). IEEE.
[13] Badra, F., Bendaoud, R., Bentebibel, R., Champin, P.A., Cojan, J., Cordier, A., Després, S., Jean-Daubias, S., Lieber, J., Meilender, T. and Mille, A., 2008. Taaable: Text mining, ontology engineering, and hierarchical classification for textual case-based cooking. In 9th European Conference on Case-Based Reasoning – ECCBR 2008, Workshop Proceedings (pp. 219–228).
[14] McKinney, W., 2012. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly Media, Inc.
[15] Friedman, J., Hastie, T. and Tibshirani, R., 2001. The Elements of Statistical Learning (Vol. 1, No. 10). New York, NY, USA: Springer Series in Statistics.
[16] Vapnik, V. and Mukherjee, S., 2000. Support vector method for multivariate density estimation. In Advances in Neural Information Processing Systems (pp. 659–665).
[17] Aizawa, A., 2003. An information-theoretic perspective of tf–idf measures. Information Processing & Management, 39(1), pp. 45–65.
[18] http://en.wikipedia.org/wiki/Category:Classification_algorithms, Last visited: 13.9.2018.
[19] https://en.wikipedia.org/wiki/Support_vector_machine, Last visited: 13.9.2018.
[20] https://www.csie.ntu.edu.tw/~cjlin/liblinear/, Last visited: 13.9.2018.
[21] https://matplotlib.org/, Last visited: 13.9.2018.
[22] https://www.scipy.org/, Last visited: 13.9.2018.
[23] http://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html, Last visited: 13.9.2018.
[24] https://matplotlib.org/2.0.2/mpl_toolkits/mplot3d/api.html.
A Data Science Approach to the Analysis of Food Recipes

Tjaša Heričko, Sašo Karakatič, Vili Podgorelec
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
tjasa.hericko@student.um.si, saso.karakatic@um.si, vili.podgorelec@um.si

ABSTRACT
In this paper, we explore the correlation between cuisine and text-based information in recipes. The experiments are conducted on a real dataset consisting of 9,080 recipes, with data science approaches focusing on enhancing cuisine prediction and providing a detailed insight into the characterization of food cultures. The findings suggest that information about ingredients is the most relevant predictor of cuisines; however, despite being less efficient, recipe name, preparation instructions, preparation time, skill level and nutritional facts can be considered as well.

Categories and Subject Descriptors
I.2.m [Artificial Intelligence]: Miscellaneous.
I.5.m [Pattern Recognition]: Miscellaneous.

General Terms
Algorithms, Measurement, Experimentation.

Keywords
Data Science, Machine Learning, Text Mining, Classification, Food Recipes, Cuisines.
1. INTRODUCTION
In response to technological advancements and social changes in the last decades, the tendency to collect and store recipes only in cookbooks has changed. Numerous online recipe portals started to rapidly accumulate food-related content, with more and more recipes being published online daily. The growth in the amount of user-generated recipe data available on the Internet has raised several issues that researchers have been trying to address in recent years. The objective of this paper is to explore the correlation between cuisine and text-based information in recipes, including recipe name, list of ingredients, preparation instructions, preparation time, skill level, calories and nutritional information. The results of this study address the issue of automatic recipe cuisine categorization, making it easier to submit a new recipe and preventing possible additional noise in the recipe database; this can be helpful for the contributors as well as for the curators of culinary websites.

We conducted a series of experiments on a real dataset retrieved from BBC Good Food (https://www.bbcgoodfood.com/), consisting of 9,080 recipes from various cuisines, with data science approaches focusing on the following: (1) providing a detailed insight into the characterization of various food cultures; (2) identifying the text-based information from recipes needed to perform well at cuisine prediction; (3) enhancing cuisine prediction.

This paper is organized as follows. Section 2 gives a brief overview of related work. Section 3 presents the dataset used in our research. Section 4 describes the applied methodologies. Section 5 provides the results of our research. Section 6 concludes the paper by summarizing the main results of our work.

2. RELATED WORK
The correlation between recipes and their cuisines has been the subject of several studies related to recipe analysis. Mostly, previous studies focused on classifying recipes into their respective cuisines based on ingredients. H. Su et al. [1] evaluated data collected from Food (https://www.food.com/) and used the techniques of associative classification and support vector machines to classify 226,025 recipes into one of six cuisines, using ingredients as inputs, with a precision and recall of about 75 %. The researchers in [2–8] further studied the cuisine-ingredient connection, using 39,774 recipes from twenty cuisines provided by Yummly (https://www.yummly.com/). Similar studies were conducted on data from Epicurious (https://www.epicurious.com/) [9], Epicurious and Menupan (https://www.menupan.com/) [10], and Food, Epicurious and Yummly [11]. A variety of machine learning algorithms, including k-means [2, 9], random forest classifiers [2, 5, 6, 8, 9, 10], support vector machines [3, 5, 6, 7, 10, 11], logistic regression [4, 5, 6, 10, 11] and naive Bayes [5, 6, 7, 9, 10, 11], were used in these studies. Of the several tested algorithms, the linear support vector machine, reaching up to 80,9 % accuracy in [7], was found to be the most efficient for this cuisine prediction task based on ingredients.

Other studies focused on the importance of other information extracted from recipes for cuisine prediction. H. Kicherer et al. [12] evaluated the use of ingredients and preparation instructions for cuisine prediction, conducted on recipes from the German website Chefkoch (https://www.chefkoch.de/). The study revealed that ingredients alone are as good an indicator as the recipe instructions, whereas a combination of information from both – nouns from the instructions and the list of ingredients – performs better. T. Ozaki et al. [13] also demonstrated that, based on Japanese recipes from Cookpad (https://cookpad.com/), certain sets of ingredients and preparation actions deeply correspond to cuisine types.

Previous studies have thus already noted that ingredients reveal important information about cuisines and that predicting cuisines based on the ingredients is possible. Though, to our knowledge, few researchers have considered using additional text-based information from recipes, for instance preparation instructions, preparation time and nutrition facts, as possible attributes in cuisine prediction. Therefore, there is little understanding of how they are related to cuisine types. In contrast to the work presented above, we performed a richer analysis of recipes with a wider range of attributes extracted from recipes, whereas the dominant approach appears to deal only with ingredients as attributes.
3. DATASET
Our research was conducted on crawled data collected from the online food recipe portal BBC Good Food. A dataset of 9,429 recipes was scraped with Python (https://www.python.org/), using the Scrapy framework (https://scrapy.org/) and CSS selectors, in June 2018.

For each recipe, the following information was provided: recipe name, cuisine, list of ingredients, preparation instructions, preparation time, skill level and nutrition facts, including the amount of calories, total fat, saturated fat, total carbohydrate, sugars, protein, fiber and salt per serving. More details are presented in Table 1.

Table 1. Characteristics of text-based information in recipes

Information | Data type | Description
Recipe name | Unstructured | Arbitrary string described in natural language.
Cuisine | Categorical | One of 45 cuisine types.
List of ingredients | Unstructured | Arbitrary string depicting the ingredients needed for preparation, each ingredient normally consisting of an ingredient type, an amount and a unit.
Preparation instructions | Unstructured | Step-by-step instructions for preparation using the ingredients, described in natural language.
Preparation time | Numerical | A number representing the time in minutes needed for preparation.
Skill level | Categorical | One of 3 difficulties: easy, more effort or a challenge.
Nutrition facts | Numerical | A number representing nutrition per serving, measured in kcal for calorie intake or in grams for fat, saturated fat, carbohydrate, sugars, protein, fiber and salt.

4. METHODOLOGY
The methodology in this paper was implemented in the Jupyter notebook environment (http://jupyter.org/) running Python code and using a combination of Python libraries comprising pandas (https://pandas.pydata.org/), scikit-learn (http://scikit-learn.org/), NLTK (https://www.nltk.org/), seaborn (https://seaborn.pydata.org/), matplotlib (https://matplotlib.org/) and wordcloud (http://amueller.github.io/word_cloud/).

4.1 Data Preprocessing
For the dataset to be feasible for the analysis, preprocessing was performed on the raw scraped data.

During the data cleaning step, missing values and duplicates were resolved by removing these recipes from the original dataset, leaving a subset of 9,080 recipes.

The original dataset included 45 cuisine categories, many of which consisted of only a few recipes. In the next step of data preparation, based on the findings of previous research that cuisines are location-dependent [14], we combined smaller cuisines into bigger regional cuisine categories (e.g. Balinese, Thai, Vietnamese and Indonesian into Southeast Asian cuisine) and thereby reduced the number of cuisine categories to the following 13: African, Middle Eastern, South Asian, Southeast Asian, East Asian, Oceanic, American, Latin American, Western European, Northern European, Central European, Eastern European and Mediterranean.
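A hedged sketch of the cleaning and category-merging step with pandas; the file name, the column name and the mapping fragment are illustrative (the full 45-to-13 mapping is not reproduced here).

```python
import pandas as pd

recipes = pd.read_csv("bbc_recipes.csv")  # hypothetical file of scraped data

# Data cleaning: drop recipes with missing values and duplicates,
# leaving the 9,080-recipe subset described above.
recipes = recipes.dropna().drop_duplicates()

# Merge the 45 scraped cuisine labels into 13 regional categories.
# Only a fragment of the mapping is shown, following the paper's example.
regional = {"Balinese": "Southeast Asian", "Thai": "Southeast Asian",
            "Vietnamese": "Southeast Asian", "Indonesian": "Southeast Asian"}
recipes["cuisine"] = recipes["cuisine"].replace(regional)
```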
As highlighted in Table 1, preparation time and nutrition facts are numerical, cuisine and skill level are categorical, whereas recipe name, list of ingredients and preparation instructions are described in natural language. For all of them, additional preprocessing was needed prior to conducting the analyses. Numerical attributes were standardized, considering that certain algorithms used in our research are sensitive to varied number scales and intervals [15]. As scikit-learn algorithms only work on numerical data, categorical data needed to be encoded as numerical; this was done by converting categorical data into dummy variables [16]. For the unstructured data to be used for classification, several more text preprocessing methods were needed: tokenization, stop word removal, stemming and tf-idf term weighting. Tokenization is the process of segmenting a text into identifiable basic linguistic units called tokens, such as words and punctuation [17]. For better processing, all tokens were converted to lowercase. Stop words are frequently used common words, such as 'and', 'the' and 'this'. Because their presence in a text fails to distinguish it from other texts, and they are therefore not useful in classification, they were removed before further processing [18]. We also made a custom list of stop words, in which we included numbers that represent amounts and words that represent units, e.g. '2' and 'tbs', which would not be of value in the analysis. The same applies to punctuation, which was therefore filtered out as well. Next, stemming using the Porter stemming algorithm – the process of removing morphological affixes from words, which conflates variant forms of a word into a unified representation [19] – was performed. Lastly, for the word counts to be suitable for usage by a classifier, the tf-idf transform was applied. Tf-idf, short for term frequency times inverse document frequency, is used to re-weight a word's importance based on the frequency of the word in a document compared to its appearance in other documents [20].
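The preprocessing chain described above (tokenization, lowercasing, stop word and punctuation removal, Porter stemming, tf-idf) maps naturally onto NLTK and scikit-learn; a sketch is given below, continuing from the cleaned recipes frame of the previous sketch. Column names are assumptions, and the NLTK punkt and stopwords resources must be downloaded once.

```python
import pandas as pd
from nltk.tokenize import word_tokenize       # requires nltk.download("punkt")
from nltk.corpus import stopwords             # requires nltk.download("stopwords")
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler

stemmer = PorterStemmer()
# English stop words plus a custom list of amounts and units, as in the paper.
custom_stop = set(stopwords.words("english")) | {"2", "tbs"}

def preprocess(text):
    # Tokenize, lowercase, drop stop words and punctuation, then stem.
    tokens = [t.lower() for t in word_tokenize(text)]
    tokens = [t for t in tokens if t.isalpha() and t not in custom_stop]
    return " ".join(stemmer.stem(t) for t in tokens)

ingredients = recipes["ingredients"].apply(preprocess)   # column name assumed
X_text = TfidfVectorizer().fit_transform(ingredients)    # tf-idf term weighting

# Numerical attributes standardized, categorical ones encoded as dummies.
X_num = StandardScaler().fit_transform(recipes[["prep_time_min", "kcal"]])
skill = pd.get_dummies(recipes["skill_level"])
```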
4.2 Exploratory Data Analysis
To get an overall view of the data, an exploratory data analysis was performed on the preprocessed data using graphs, word clouds and tables. Visualization was especially used to provide clarity on the characterization of the various cuisines.

4.3 Classification
Various classification algorithms were used to perform the cuisine prediction based on the information from the recipes. The recipe dataset was randomly divided into a training (75 %) and a testing set (25 %). The training set was used to train the models, while the test set was used to assess them.

4.3.1 Naive Bayes
Naive Bayes is based on applying Bayes' theorem with the naive independence assumption between every pair of features. Gaussian naive Bayes assumes the probability of the features is Gaussian. Multinomial naive Bayes implements the algorithm for text classification [21].

4.3.2 Support Vector Machine
A linear support vector machine constructs a hyper-plane or a set of hyper-planes in a high or infinite dimensional space using linear algebra [22].

4.4 Evaluation Metrics
To measure classification performance, the following metrics were used: accuracy and F-score. Accuracy is the percentage of correct predictions. F-score is a weighted average of precision and recall, where precision represents the ability of the classifier not to label as positive a sample that is negative, and recall the ability of the classifier to find all the positive samples [23].
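The evaluation protocol (75/25 split, multinomial naive Bayes on the tf-idf ingredient vectors, accuracy and weighted F-score) could be sketched as follows, continuing the earlier snippets; the ~74 % figure reported in Section 5 is the paper's result, not a guaranteed output of this sketch.

```python
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, f1_score

X_train, X_test, y_train, y_test = train_test_split(
    X_text, recipes["cuisine"], test_size=0.25, random_state=0)

nb = MultinomialNB().fit(X_train, y_train)
pred = nb.predict(X_test)

print(accuracy_score(y_test, pred))              # ~0.74 reported for ingredients
print(f1_score(y_test, pred, average="weighted"))  # weighted F-score
```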
5. RESULTS
As an initial step, we carried out an exploratory data analysis to get a better understanding of the cuisines and their characteristics. Table 2 lists the average preparation time and calories per serving for each cuisine. Given the analysis, recipes from Northern Europe, the Middle East and Western Europe take the longest to prepare, whereas recipes from East Asia, Latin America and Southeast Asia are generally the quickest to prepare. Furthermore, on average, Mediterranean, Oceanic and American cuisines are high in energy; on the contrary, Southeast Asian, East Asian and South Asian cuisines have recipes with lower energy values.

To give an idea of the ingredients that form an integral part of each cuisine, we extracted the most common ingredients in every cuisine and visualized unigrams from the ingredient list in word clouds. As detailed in Table 2, many ingredients are frequent in all the cuisines, e.g. oil and onion; hence, these will not be useful for prediction, while others are typically used only in certain cuisines, e.g. soya sauce and clove. Figure 1 represents a word cloud consisting of the most common unigrams extracted from the ingredient list for East Asian cuisine. Although the most common ingredients did not give us much insight, these word clouds do show some typical ingredients based on which cuisines can be distinguished from one another, e.g. sugar, flour, milk, cream, chocolate, egg, mayonnaise, butter in American cuisine, and soy sauce, rice, ginger, soy, chili in East Asian cuisine.

Figure 1. Word cloud for East Asian cuisine

Table 2. Overview of the cuisines

Cuisine | Common ingredients | Average preparation time [min] | Average calories [kcal]
African | oil, onion, lemon, clove, coriander | 51,73 | 399,68
Middle Eastern | oil, onion, tomato, garlic, clove | 76,67 | 409,11
South Asian | onion, oil, coriander, chili, clove | 53,74 | 367,50
Southeast Asian | sauce, lime, chili, oil, sugar | 45,00 | 350,78
East Asian | sauce, oil, onion, chili, rice | 40,49 | 363,18
Oceanic | sugar, oil, egg | 60,36 | 430,70
American | sugar, butter, oil, flour, egg | 57,68 | 422,37
Latin American | onion, oil, chili, coriander, lime | 43,17 | 399,30
Western European | sugar, oil, butter, egg, flour | 66,59 | 394,85
Northern European | oil, sugar, onion, egg, cream | 119,59 | 374,61
Central European | sugar, butter, egg, flour, oil | 62,73 | 402,85
Eastern European | oil, butter, egg, flour, garlic | 57,96 | 390,04
Mediterranean | oil, garlic, clove, tomato, onion | 48,68 | 433,36

Cuisines also differ on nutrition facts. In Figure 2, the average value of each nutrient per serving is presented for every cuisine.

Figure 2. Nutrition facts for cuisines

In the next step, classification algorithms were applied to identify which text-based information from recipes is needed to perform well at cuisine prediction. A classification with multinomial naive Bayes, based on the list of ingredients, proved to be the most efficient; this model yielded an accuracy of 73,8 %. Less than 1 % lower was the accuracy obtained with classification based on the recipe name, and more than 2 % lower based on the preparation instructions. Classifications based on skill level, preparation time, calories and nutritional information all performed with an accuracy of about 56 %. The classification performance based on accuracy and F-score is summarized in Table 3.

Table 3. Results of classification

Information | Classifier | Accuracy | F-score
Recipe name | Multinomial naive Bayes | 72,73 % | 72,73 %
List of ingredients | Multinomial naive Bayes | 73,83 % | 73,83 %
Preparation instructions | Multinomial naive Bayes | 70,97 % | 70,97 %
Preparation time | Gaussian naive Bayes | 55,29 % | 55,29 %
Preparation time | Linear SVM | 55,68 % | 55,68 %
Skill level | Linear SVM | 56,12 % | 56,12 %
Calories | Gaussian naive Bayes | 55,68 % | 55,68 %
Calories | Linear SVM | 55,68 % | 55,68 %
Nutritional information | Gaussian naive Bayes | 53,48 % | 53,48 %
Nutritional information | Linear SVM | 57,00 % | 57,00 %

6. CONCLUSION
Thousands of recipes from various cuisines were analyzed with data science approaches, with the objective of providing a deeper understanding of culinary cultures and cuisine prediction. While previous research efforts have mostly used only ingredients for cuisine prediction, our findings demonstrate that other text-based information extracted from recipes can be used as well. While ingredients, with an obtained accuracy of almost 74 %, remain the most efficient, cuisine prediction from the recipe name and preparation instructions also performs well, whereas predictions based on preparation time, skill level and nutrition facts were discovered to be less effective, with about 56 % accuracy.

7. REFERENCES
[1] H. Su, M. K. Shan, T. W. Lin, J. Chang, and C. T. Li, "Automatic recipe cuisine classification by ingredients," Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication – UbiComp '14 Adjunct, pp. 565–570, 2014.
[2] S. Srinivasasubramanian, B. Kushwaha, and V. Parekh, "Identifying Cuisines From Ingredients," 2015. [Online]. Available: https://pdfs.semanticscholar.org/3daa/3c535a3c2580e69984203137db3ee6422601.pdf. Accessed on: August 16, 2018.
[3] P. Bhat, S. Gupta, and T. Nabar, "Bon Appetite: Prediction of cuisine based on Ingredients." [Online]. Available: http://cseweb.ucsd.edu/~jmcauley/cse255/reports/fa15/020.pdf. Accessed on: August 16, 2018.
[4] H. H. Holste, M. Nyayapati, and E. Wong, "What Cuisine? – A Machine Learning Strategy for Multi-label Classification of Food Recipes," 2015. [Online]. Available: http://jmcauley.ucsd.edu/cse190/projects/fa15/022.pdf. Accessed on: August 16, 2018.
[5] R. S. Verma, and H. Arora, "Cuisine Prediction/Classification based on ingredients." [Online]. Available: http://cseweb.ucsd.edu/~jmcauley/cse255/reports/fa15/028.pdf. Accessed on: August 16, 2018.
[6] R. Ghewari, and S. Raiyani, "Predicting Cuisine from Ingredients." [Online]. Available: http://cseweb.ucsd.edu/~jmcauley/cse255/reports/fa15/029.pdf. Accessed on: August 16, 2018.
[7] S. Kalajdziski, G. Radevski, I. Ivanoska, K. Trivodaliev, and B. R. Stojkoska, "Cuisine classification using recipes ingredients," 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2018.
[8] R. M. R. V. Kumar, M. A. Kumar, and K. P. Soman, "Cuisine Prediction based on Ingredients using Tree Boosting Algorithms," Indian Journal of Science and Technology, vol. 9, no. 45, Aug. 2016.
[9] T. Arffa, R. Lim, and J. Rachleff, "Learning to cook: An exploration of recipe data." [Online]. Available: https://pdfs.semanticscholar.org/3f63/269aa7910774e9386b1ffb340a9e8638c02d.pdf. Accessed on: August 16, 2018.
[10] J. Naik, and V. Polamreddi, "Cuisine Classification and Recipe Generation," 2015. [Online]. Available: https://pdfs.semanticscholar.org/aaa9/67ce597961bad308ec137a6169e1aba1fe35.pdf. Accessed on: August 16, 2018.
[11] S. Jayaraman, T. Choudhury, and P. Kumar, "Analysis of classification models based on cuisine prediction using machine learning," 2017 International Conference On Smart Technologies For Smart Nation (SmartTechCon), pp. 1485–1490, 2017.
[12] H. Kicherer, M. Dittrich, L. Grebe, C. Scheible, and R. Klinger, "What you use, not what you do: Automatic classification and similarity detection of recipes," Data & Knowledge Engineering, 2018.
[13] T. Ozaki, X. Gao, and M. Mizutani, "Extraction of Characteristic Sets of Ingredients and Cooking Actions on Cuisine Type," 2017 31st International Conference on Advanced Information Networking and Applications Workshops (WAINA), pp. 509–513, 2017.
[14] K. J. Kim, and C. H. Chung, "Tell Me What You Eat, and I Will Tell You Where You Come From: A Data Science Approach for Global Recipe Data on the Web," IEEE Access, vol. 4, pp. 8199–8211, 2016.
[15] Scikit-learn, "sklearn.preprocessing.StandardScaler." [Online]. Available: http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html. Accessed on: August 21, 2018.
[16] Pandas, "pandas.get_dummies." [Online]. Available: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html. Accessed on: August 21, 2018.
[17] NLTK, "NLP with Python – Processing Raw Text." [Online]. Available: http://www.nltk.org/book/ch03.html. Accessed on: August 21, 2018.
[18] NLTK, "NLP with Python – Accessing Text Corpora and Lexical Resources." [Online]. Available: https://www.nltk.org/book/ch02.html. Accessed on: August 21, 2018.
[19] NLTK, "NLTK HOWTOs – Stemmers." [Online]. Available: http://www.nltk.org/howto/stem.html. Accessed on: August 21, 2018.
[20] Scikit-learn, "Feature extraction." [Online]. Available: http://scikit-learn.org/stable/modules/feature_extraction.html. Accessed on: August 21, 2018.
[21] Scikit-learn, "Naive Bayes." [Online]. Available: http://scikit-learn.org/stable/modules/naive_bayes.html. Accessed on: August 21, 2018.
[22] Scikit-learn, "Support Vector Machines." [Online]. Available: http://scikit-learn.org/stable/modules/svm.html. Accessed on: August 21, 2018.
[23] Scikit-learn, "Classification metrics." [Online]. Available: http://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics. Accessed on: August 21, 2018.
Introducing Blockchain Technology into a Real-Life Insurance Use Case

Aljaž Vodeb, Martin Chuchurski, Mojca Orgulan, Tadej Rola, Žan Žnidar, Muhamed Turkanović
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
Aljaž Tišler
Faculty of Economics and Business, University of Maribor, Maribor, Slovenia
Tea Unger
Faculty of Law, University of Maribor, Maribor, Slovenia
aljaz.vodeb@student.um.si, aljaz.tisler@student.um.si, martin.chuchurski@student.um.si, mojca.orgulan@student.um.si, tadej.rola@student.um.si, tea.unger@student.um.si, zan.znidar@student.um.si, muhamed.turkanovic@um.si

ABSTRACT
The paper presents an analysis of a possible introduction of the blockchain technology into an insurance business use case. The analysis focuses on the implications such an attempt can have from various standpoints, and on the technical workarounds needed for a prototype to be implemented.

Categories and Subject Descriptors
H.3.4 [Information Storage and Retrieval]: Systems and Software

General Terms
Performance, Economics, Reliability, Experimentation, Security, Legal Aspects, Verification.

Keywords
Blockchain; Smart contracts; Ethereum; Insurance
1. INTRODUCTION
Blockchain technology is nowadays considered the new IT revolution and even the messiah for all IT-based problems. Nevertheless, as with other innovative technologies, the public's hype about the technology is fading. Experts now know that the technology is useful only for specific domains and use cases, such as public, virtual and untrusted environments or cryptocurrency-based scenarios. Nonetheless, the media is full of articles and news about corporations and companies using blockchain technology for some specific use case, which may or may not be fully meaningful. The result of such news is rising prices of cryptocurrencies and, more importantly, rising stock prices of organisations [1]. The outcome of such approaches varies: (1) proposals and prototypes of blockchain-based use cases which unnecessarily use this technology, (2) prototypes which are consistent with the technology's purpose but are unpractical and not user-friendly, and (3) failed attempts to produce a practical prototype or a production system. In this article, we explore the possibility of introducing the blockchain technology in an insurance-based use case. The aim was to explore the possible reasonableness of such a use case and its possible restrictions, limitations, advantages and disadvantages. The focus of the paper is thus on the implications of such a use case on all related processes and the overall picture of a possible implementation.

2. BLOCKCHAIN
A blockchain is an invention that can be seen as a distributed ledger of all transactions or events that have been executed and shared among distributed participants. All transactions are verified by distributed consensus inside the system. Considering basic blockchain platforms, once a transaction is recorded, it cannot be removed [2]. A group of verified transactions is stored in a block. Each block contains a cryptographic hash of the previous block and a timestamp. Each newly linked block strengthens the integrity of the previous one, making the chain extremely tamper-resistant and secure. With a public blockchain, a copy of the entire transaction database (ledger) is distributed to the network, and every person can view transactions and even participate in the consensus process. Blockchain enables a more effective way to solve the virtual currency problem: it solves it in a distributed manner, without the need for a central authority [3]. A central authority represents costs and must be trusted to act honestly.

A public blockchain is not the only possible type of blockchain platform; there are also private and consortium blockchains [4]. Private blockchains keep write permission centralized to one organization, which can be useful for a single company for database management, auditing, etc. In a consortium blockchain, partner companies are joined together in a trusted and adaptable network. The right to read in such blockchain types may be public or restricted to the participants.
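To make the hash-linking described above concrete, here is a deliberately simplified, illustrative Python sketch of a block structure; real platforms add distributed consensus, Merkle trees and much more.

```python
import hashlib
import json
import time

def make_block(transactions, prev_hash):
    """A toy block: a timestamp, transactions and the previous block's hash."""
    block = {"timestamp": time.time(),
             "transactions": transactions,
             "prev_hash": prev_hash}
    # The block's own hash covers its full content, including prev_hash,
    # so changing any earlier block breaks every later link.
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

genesis = make_block(["tx0"], "0" * 64)
second = make_block(["tx1", "tx2"], genesis["hash"])
# Tampering with `genesis` changes its hash and invalidates `second`.
```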
2.1 Smart contracts
The concept of the smart contract has been known since 1994, when Nick Szabo defined it as a "computerized transaction protocol that executes the terms of a contract". In the blockchain context, smart contracts are stored on the blockchain; they can be thought of as analogous to stored procedures in relational databases. Given that smart contracts are deployed on the blockchain, they have their own unique addresses. A smart contract is invoked by executing a transaction to the unique address of the contract; it is then executed independently and automatically on each node in the network [8]. The contract has its own state and can manage assets on the ledger, and it allows expressing business logic within programming code. A well-written smart contract should describe all the possible outcomes of the contract, which means that a function should refuse to execute in the case of incorrect parameters (inconsistent with the business logic) [8]. Smart contracts are deterministic: the same input will always produce the same output. When implementing smart contracts on known platforms (e.g., Ethereum), written for example in the Solidity programming language, the developer is prevented from writing non-deterministic contracts, since the programming language does not contain non-deterministic constructs. All communication with a smart contract is done through cryptographically signed transactions, which means that all blockchain stakeholders receive a cryptographically verified trace of a contract's operations.

2.2 Oracles
Smart contracts on the Ethereum blockchain platform run within the Ethereum ecosystem, where they communicate with each other. External data can only enter the blockchain (i.e. smart contracts) through external interaction using a transaction. This is also a shortcoming of the platform, because the majority of business logic is based on external data, which is thus not part of the blockchain ledger (e.g., weather, currency prices) [9]. To overcome this shortcoming, an oracle can be used. An oracle is a trusted data source that sends external data to a smart contract in the form of a transaction; by doing so, it relieves the smart contract of the need to directly access the desired data outside of the network. Oracles are usually offered as a third-party solution [8].

The oracle service behaves like a data courier, where the communication between the service and the smart contract is asynchronous. First, a transaction executes a function within the smart contract in which the instructions for the service are sent. The oracle service then obtains a result based on the given parameters, and this result is returned to the smart contract via a special function (a callback) implemented in the main smart contract that requested the data from the service [9].
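Both the direct invocation described in Section 2.1 and the oracle callback in Section 2.2 boil down to sending a signed transaction to the contract's address. A sketch with the web3.py library (v6-style names; exact names vary slightly between versions) is shown below; the node URL, contract address, ABI and the insureBaggage function are all placeholders, not part of the described prototype.

```python
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))  # node URL assumed

CONTRACT_ADDRESS = "0x0000000000000000000000000000000000000000"  # placeholder
CONTRACT_ABI = [{  # placeholder fragment of an ABI produced at compile time
    "name": "insureBaggage", "type": "function", "stateMutability": "payable",
    "inputs": [{"name": "bagTag", "type": "string"}], "outputs": []}]

contract = w3.eth.contract(address=CONTRACT_ADDRESS, abi=CONTRACT_ABI)

# Invoking a smart contract means sending a signed transaction to its
# unique address; every node then executes the called function.
tx_hash = contract.functions.insureBaggage("BAG-123").transact({
    "from": w3.eth.accounts[0],
    "value": Web3.to_wei(0.01, "ether"),  # the hypothetical insurance premium
})
receipt = w3.eth.wait_for_transaction_receipt(tx_hash)
```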
3. USE CASE
To test the concept of introducing the blockchain technology into a real-life business use case, we chose the insurance domain, which is also one of the promising domains for the blockchain technology [5]. A preliminary result of a market analysis has shown that a possibly meaningful, but not yet implemented, use case would be lost baggage insurance. This specific real-life use case nowadays still represents long-term problems for passengers and airlines. To make it as user-friendly and meaningful as possible, an app was envisaged. The key functionalities of such an app, as presented in Figure 1, would be: (1) the user scans the QR code of the flight ticket, (2) confirms the read data, (3) scans the barcode of the baggage, (4) acknowledges the terms of the smart contract, and (5) receives info about the possible payout.

With the help of RFID trackers at the airports, the system would be able to track the position of a passenger's baggage, based on the newly confirmed IATA Resolution 753. In the case of lost or delayed baggage, a blockchain-based smart contract is activated. Compensation could be given in crypto or fiat currencies (e.g. ETH, EUR), within 4 levels of payout.

Figure 1: Poster for a possible lost baggage insurance.

4. IMPLICATIONS
This section provides the implications of a possible implementation of a blockchain-based solution, as presented in Section 3, on three domains: legal, economic and organizational.

4.1 Legal implications
Blockchain technology as presented in Section 3 raises some legal issues. The main legal question concerns the General Data Protection Regulation (GDPR). The GDPR is a legal framework for personal data privacy, written by the European Union (EU), which became effective on May 25th, 2018. This framework is drastically changing the business of any digital venture. The Regulation granted EU citizens new rights, e.g., the right to be forgotten and the right to request all data storage and acquisition links. The former allows an individual to ask an organization to delete all the personal data it stores about them. This specific right is also the main problem for blockchain technology: blockchain relies on the principles of decentralization and immutability, which means that data stored on the ledger cannot be deleted. When this data includes personal data, we have a problem in the GDPR area. This is the main implication in this domain, since the use case we worked on required the processing of personal data. The main question is thus how to process personal data with the blockchain while still being able to delete it if needed, or how to process it outside the blockchain. Research shows that many experts are trying to find a solution [7]. The majority of the solutions are focused on the off/on-chain paradigm, whereby personal data is never dealt with on the blockchain. Nonetheless, new problems arise, such as how to link off- and on-chain data, and whether the link itself is a GDPR violation.
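One common form of the off/on-chain paradigm mentioned above is to keep personal data in an erasable off-chain store and put only a salted fingerprint on the ledger. The sketch below illustrates the idea; whether deleting the off-chain record satisfies the GDPR right to erasure is exactly the open legal question discussed here, and all names are illustrative.

```python
import hashlib
import os
import sqlite3

db = sqlite3.connect("insurance_offchain.db")  # conventional, erasable storage
db.execute("CREATE TABLE IF NOT EXISTS personal (user_id TEXT, salt BLOB, data TEXT)")

def store_personal_data(user_id, personal_json):
    """Keep personal data off-chain; return a salted fingerprint for the ledger."""
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + personal_json.encode()).hexdigest()
    db.execute("INSERT INTO personal VALUES (?, ?, ?)",
               (user_id, salt, personal_json))
    return digest  # only this opaque value would be written to the blockchain

def forget_user(user_id):
    # Erasure request: delete the off-chain record (and its salt). The
    # on-chain hash remains, but can no longer be linked to a person.
    db.execute("DELETE FROM personal WHERE user_id = ?", (user_id,))
```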
4.2 Economic
The main goal of the solution is to enable air passengers to sign an ad hoc luggage insurance, which is tied to an airline ticket. The blockchain technology would be used for the insurance coverage and the payout of an insurance premium. The solution should allow the payment of the insurance coverage through cryptocurrencies, to reach the biggest customer coverage. It is a new business model, where the target group are all airline users.

The biggest negative factor associated with the possible solution is the volatility of cryptocurrencies. In practice, this represents the possibility of losing some assets, whether as a customer or as an airline. In addition to volatility, problems can occur with certain processing delays. The application itself is also linked to airline and airport data; if that system fails, automatic payment is not possible, nor can the insurance be concluded. From an economic point of view, the application also brings many positive aspects: it introduces the possibility of speeding up the rigid process of current luggage insurance and redress. The cost of maintaining a blockchain network and smart contracts is not negligible; these costs can be covered through an annual contribution from the airlines for their usage of such a solution, while at the same time a certain percentage can be collected from each insurance. The economic advantages of such a solution are many: (1) the introduction of new technology, (2) the possibility of ad hoc insurance, and (3) a new business model.
4.3 Organizational
One of the main problems of a possible solution is its organizational structure. For it to make sense, a platform should be implemented where all willing airlines could register and provide baggage insurance to all possible consumers. Each airline can and should have a partnership with an insurance company. Thus, to complete the registration, the airlines must provide their insurance price and maximum payout in the case of lost baggage. Furthermore, the solution must be automatic and enable easy baggage checks and insurance claims. A simplification of such a requirement comes with the IATA Resolution 753, which states that by June 2018 airline members must be able to, among other things, demonstrate delivery of baggage when custody changes [6]. This furthermore implies that the ecosystem must include airports, which will provide the aforementioned data about the status of the baggage. Technically, a link to a web service is required, where data about the baggage is accessible.

5. PROTOTYPING
It should be emphasized that blockchain technology is a rather unexplored field; in most cases there are no examples of good practice on how the introduction of blockchain should start. After analyzing the possible use case and its implications, we propose a prototype in the form of a decentralized application (dApp), based on Ethereum smart contracts. The front end of the solution could be a simple Angular 2 web application with an intuitive, user-friendly interface, accessible on multiple devices. The main advantage of using a web application, as opposed to device-specific applications, is the support for various operating systems and models. If a user selects to pay with cryptocurrency, he/she can use the MetaMask plugin to connect to the Web3 part of the application and send a signed transaction to a smart contract on the blockchain. According to the GDPR, personal information needs to be delible; therefore, it should be stored in a separate database off-chain, accessible through an API. Such an architecture can be achieved by storing airline information off-chain and only non-identifying user insurance data on the blockchain.

Figure 2 presents the architecture of the possible solution. Users connect to the service through a dApp, with the option to pay with crypto or fiat currencies. For clarity, the former option is marked with the letter (a) and the latter with (b). Two blockchains are used: Ethereum's MainNet to process payment transactions, and our InsurNet (a private Ethereum network) for the business logic. Crypto transactions are first processed on the MainNet (2a), where an oracle is triggered to convert the value into fiat (2.1a) before sending it to the InsurNet (2.2a), whereas fiat requests are processed directly through the API and, if successful, forwarded to the InsurNet (2b) to create the insurance (smart) contract. The InsurNet smart contract uses an oracle deployed at an airline to retrieve the status of the baggage (3.1 and 3.2) before processing the business logic to determine the validity of the claim. If the user is entitled to a payout, the payout oracle is called (4) to determine the correct payment method and convert the currency if needed. In case the user paid in cryptocurrency (5a), the payout is processed on the MainNet (6a); otherwise, the fiat payout is handled off-chain (5b).

Figure 2: Architectural model of the proposed solution.

6. DISCUSSION
Due to the Ethereum protocol, where every transaction must be validated by miners and added to a block, transactions can be processed slowly. When a user pays the insurance with the cryptocurrency Ether into the smart contract on the MainNet and the transaction is confirmed, the function in our smart contract triggers an event, which we can listen for from outside, in our dApp; we detect the event only once the transaction is confirmed. When our server detects the "Paid" event from the MainNet, it creates a new smart contract on our private blockchain, InsurNet. This is reflected in some latency for the user. Along with the aforementioned oracle, we have two more: one verifies the location of the luggage, while the other processes the payment when the event is triggered on InsurNet.

Consider the following example, where the user pays the insurance for one piece of luggage in cryptocurrency, and assume the average time to validate a transaction on the MainNet is 25 seconds. The user transfers the cryptocurrency to our smart contract, where the validation of this transaction takes 25 seconds. Then, on the triggered event, an oracle performs a new transaction on our network, where the transaction validation time is set to 10 seconds. Because the user still does not have the luggage three hours after landing, he performs a payout request using the dApp; this transaction is done within 10 seconds. An oracle then performs a new transaction to write the current location information into the smart contract (+10 seconds). Since the baggage is still not available, the user is entitled to a payout, which is reflected in a new event, whereupon an oracle performs a transaction on the MainNet. The validation of this transaction takes 25 seconds. Thus, it takes at least 80 seconds for all the transaction validations to complete.
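The event-driven flow from the discussion above, with the server reacting to a confirmed "Paid" event on the MainNet, could be sketched with web3.py as follows. Method and argument names vary slightly between web3.py versions, and the addresses, ABI fragment and event fields are placeholders.

```python
import time
from web3 import Web3

PAYMENT_ADDRESS = "0x0000000000000000000000000000000000000000"  # placeholder
PAYMENT_ABI = [{  # placeholder fragment: the MainNet payment contract's event
    "name": "Paid", "type": "event", "anonymous": False,
    "inputs": [{"name": "buyer", "type": "address", "indexed": True}]}]

mainnet = Web3(Web3.HTTPProvider("https://mainnet.node.example"))  # node URLs
insurnet = Web3(Web3.HTTPProvider("http://insurnet-node:8545"))    # are assumed

payment = mainnet.eth.contract(address=PAYMENT_ADDRESS, abi=PAYMENT_ABI)

# A "Paid" event becomes visible only once the paying transaction has been
# mined, which is the source of the latency discussed above.
paid_filter = payment.events.Paid.create_filter(fromBlock="latest")

while True:
    for event in paid_filter.get_new_entries():
        buyer = event["args"]["buyer"]
        # React off-chain: deploy/initialize the insurance smart contract
        # for this buyer on the private InsurNet chain, e.g. via a signed
        # insurnet transaction (omitted in this sketch).
        print("Paid event from", buyer)
    time.sleep(2)  # simple polling loop
```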
A Brief Overview of Proposed Solutions to Achieve Ethereum Scalability

Blaž Podgorelec, Patrik Rek, Tadej Rola, Muhamed Turkanović
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
blaz.podgorelec@um.si, patrik.rek@um.si, tadej.rola@student.um.si, muhamed.turkanovic@um.si

ABSTRACT
Blockchain technology is part of Gartner's top technological trends for the following five years, already moving away from the peak of inflated expectations on its hype cycle towards the slope of enlightenment. With the development of blockchain technology, the emergence of completely new business processes is anticipated, as well as changes to existing business processes, which will include the use of blockchain technology in their implementation, partially or completely, thereby taking advantage of the benefits that the technology itself offers. Nevertheless, the technology has several drawbacks, of which the most vivid is the scalability problem. With the introduction of Blockchain 2.0 and the Ethereum platform, the scalability problem seemed settled for a moment, which proved otherwise with the first generations of non-fungible tokens and high traffic. Although Ethereum is in its infancy, progress is well under way, with this year's focus on the infrastructure. A lot of research and work is being done on Ethereum's layer 2 scaling solutions, such as state channels, plasma and sharding. This paper presents a brief overview of the current state of the mentioned proposed solutions and of some ongoing projects which are focused on their implementation.

Categories and Subject Descriptors
H.3.4 [Information Storage and Retrieval]: Systems and Software

General Terms
Performance, Design, Reliability, Experimentation, Security

Keywords
Blockchain, scalability, Ethereum, channels, plasma.
1. INTRODUCTION
In recent years, on the basis of the increase in the market capitalization [1] of the Ethereum platform, whose performance is based entirely on blockchain technology, we can conclude that it is becoming increasingly popular. The increase in popularity consequently increases the number of transactions performed within the Ethereum blockchain network [2], whereby we can assume that the number of business processes implemented with the help of blockchain technology and Ethereum is also increasing.
All transactions transmitted on the blockchain network are irreversibly recorded in a shared ledger among all network nodes [3, 4]. Nodes in the blockchain network perform a protocol defining the ability to create new blocks with associated transactions in an approximately 15-second time frame. This allows the frequency of transactions executed in the network to be approximately 7-15 transactions per second (tp/s) [5]. The open source Ethereum platform is based on a permissionless and publicly accessible blockchain network, which is at the same time a distributed and decentralized operating system for running smart contracts via its Ethereum Virtual Machine (EVM). Because of the platform's indigenous cryptocurrency, called Ether, generated by the blockchain network and defined by the protocol, the platform is often used as a payment system, like Bitcoin. Therefore it is often compared to existing non-crypto payment solutions, such as Visa, which, unlike the Ethereum platform, is capable of processing a much larger number of transactions (56,000 tp/s) [6].
In this paper, we present the problem of scaling the Ethereum network and the proposed solutions. These solutions could increase the number of transactions carried out on the Ethereum platform, thus approaching or exceeding the processing capacity of existing non-crypto payment systems. This would enable the development and implementation of new business processes with blockchain technology.

2. ETHEREUM SCALING PROBLEM
The current implementation of the Ethereum protocol requires the processing of all transactions transmitted within the network, as well as the storage of all states, by each node in the network that acts as a validator [7]. To confirm a change of the network state with a transaction, the transaction must be included in a block created by a node, which must solve the calculation puzzle defined by the distributed consensus protocol, which in the current Ethereum version is Proof of Work (PoW). The processing speed of transactions is limited by the capacity of each individual node participating in the network as a transaction validator. Such an implementation of the protocol provides increased safety in terms of secure processing of transactions within the network, which is one of the key properties of such systems. At the same time, the way in which this increased security is achieved is a major obstacle to achieving a greater number of transactions within the blockchain network, due to its need for heavy computation [8].
The number of transactions one block can include is limited by the amount of gas (the fee for processing the operations within a transaction) that can be consumed by all transactions in the block.
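A back-of-the-envelope calculation illustrates how the block gas limit caps throughput. The figures below are assumptions roughly matching the 2018 situation (an 8 million block gas limit, 21,000 gas for a simple value transfer), not data from this paper; real blocks mix in contract calls that consume far more gas, which is one reason the observed rate of 7-15 tp/s is lower than this ceiling.

# Illustrative tp/s ceiling implied by the block gas limit (assumed values).
BLOCK_GAS_LIMIT = 8_000_000   # assumed block gas limit
TRANSFER_GAS    = 21_000      # intrinsic gas of a simple value transfer
BLOCK_TIME_S    = 15          # approximate PoW block interval from the text

tx_per_block = BLOCK_GAS_LIMIT // TRANSFER_GAS   # ~380 simple transfers
tps = tx_per_block / BLOCK_TIME_S                # ~25 tp/s upper bound
print(f"{tx_per_block} transfers per block, ~{tps:.0f} tp/s at best")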
In the future, it is possible to expect a change in the way consensus is reached between the individual nodes in the Ethereum network. Namely, the transition to the Proof of Stake (PoS) protocol is planned, which would mean that the block generation time within the Ethereum network could be reduced to an average of four seconds [5]. The transition to a new protocol for reaching consensus among the nodes will thus reduce the current scaling problems. In addition, the switch to PoS distributed consensus will decrease the required computational power and thus the energy consumption of the network. Changing the network consensus protocol will have a positive effect on the transaction processing frequency within the blockchain, but it is expected that the number of processed transactions will still be significantly smaller compared to existing payment systems.
Assuming knowledge of the blockchain's structure and an understanding of the concepts of the technology, the described efficiency problems offer so-called "simple" theoretical solutions, such as:
1. The use of different "altcoins" within a variety of separate blockchain networks, which results in a strong increase in the throughput of individual transactions within the separate networks. As a result, due to the increased number of different blockchain networks, a reduced number of nodes within each network is expected, which would mean that separate blockchain networks will be more susceptible to attacks by malicious nodes than if all network nodes were merged within a single common blockchain network [9, 10].
2. Increasing the limit on the number of transactions per block, or increasing the gas consumption ceiling in the case of the Ethereum protocol, theoretically implies a larger number of processed transactions. Nevertheless, this requires significantly more computational power (when using the PoW protocol, or a larger stake when using the PoS protocol) for an individual node in the network to validate a block with an increased number of transactions [9, 11].
3. Combining computational power (when using the PoW protocol) or stake (when using the PoS protocol) between different blockchain networks can theoretically increase the throughput of transaction processing, but this could burden each individual node due to the need to process the transactions of the combined blockchain networks [12].
The described "simple" solutions directly relate to the so-called trilemma of blockchain technology, which says that a blockchain network can provide only two of the following three features:
- Decentralization
- Scalability
- Security
In the case of using different altcoins, this would mean increasing the efficiency (scalability) of transaction processing within the blockchain network, but in turn reducing the security of the network itself. Increasing the limit on the number of transactions in a single block, or aggregating computational power or stake between different blockchain networks, would theoretically increase efficiency (scalability), but would require greater use of computational power from the network nodes for processing all requirements within the blockchain network. This reduces the possibility of equal participation in the network by nodes with less computational power, which can lead to a reduction in the decentralization of the blockchain network in favor of nodes with greater computing power [8].
In the following chapters, we present some solutions that could solve the described efficiency problem without affecting the described properties of the trilemma of blockchain technology.
3. PROPOSED SOLUTIONS
The main concern of blockchain technology is security and distributed consensus in a decentralized network. The processing of every transaction by all nodes of the network is a process that provides these characteristics, but it does not leave much room for increasing efficiency and scalability. Below we describe some already proposed solutions which can help increase the efficiency and scalability of the Ethereum blockchain network without undermining the security and decentralization of the network as such.

3.1 State channels
One of the proposed solutions, currently considered the most mature and widely used, is based on processing transactions outside the blockchain network (i.e. off-chain) through the establishment of state channels [13]. The proposal derives from the so-called payment channels, the purpose of which was to allow multiple micro-transactions between two users of the system without the need to transmit each transaction through the blockchain network [14]. While payment channels focus on off-chain processing of payment transactions, the purpose of state channels is to establish a channel through which the state can be changed outside the blockchain network, between predefined participants [15]. This is because the Ethereum blockchain holds the state of each defined variable of every deployed smart contract. The need to process a transaction within the blockchain network occurs only in case of disagreement about the state changed by a transaction within the established channel by any participant, or in the case of a closed communication within the channel. If there is no disagreement about the changed state during the communication within the established channel, this solution significantly increases the number of transactions, since it aggregates micro-transactions and issues them as one at a predefined time [16].
State channels are implemented with the help of dedicated smart contracts. The establishment of communication through such a channel is carried out with a special "channel smart contract", aimed at ensuring fair communication between participants that perform operations, and at recording the final state into the blockchain network after the communication has ended. In case of a conflict between participants communicating outside the blockchain (within the channel), the smart contract has the task of selecting the most relevant last state that the users still agreed on when communicating within the channel [17]. The security of such an off-chain communication approach is based on the fact that each message sent through the state channel is cryptographically signed, with the aforementioned channel smart contract having an implementation for verifying these messages. Each participant can cancel the communication at any time, and the final state recorded in the blockchain is the one recognized by all participants in the off-chain communication [15].
This type of communication allows the implementation of more complex operations defined within smart contracts, completely independent of the blockchain network. Consequently, this means almost instantaneous execution of operations with very low total costs of executing all implemented channel transactions, since all transactions carried out within the established off-chain channel are aggregated into a single transaction [17, 13].
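The cryptographic core of such a channel, signing each off-chain state update and recovering the signer, can be sketched with the eth-account library. The state encoding below is a deliberate simplification, not the message format of any real channel implementation; on-chain, the channel smart contract performs the analogous recovery (ecrecover) to pick the latest state both parties agreed on.

# Minimal sketch of signed off-chain state updates (simplified encoding).
from eth_account import Account
from eth_account.messages import encode_defunct

key = Account.create()  # demo key pair; real participants hold their own keys

def sign_state(nonce: int, balances: dict, private_key):
    payload = encode_defunct(text=f"{nonce}:{sorted(balances.items())}")
    return payload, Account.sign_message(payload, private_key).signature

def signer_of(payload, signature) -> str:
    # The channel contract would run the same recovery to verify messages.
    return Account.recover_message(payload, signature=signature)

msg, sig = sign_state(nonce=7, balances={"A": 90, "B": 10}, private_key=key.key)
assert signer_of(msg, sig) == key.address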
3.2 Plasma
The scalability of the Ethereum network, with theoretically a trillion transactions per second, should be achieved by the introduction of a strategy called Plasma. Similarly to the solution described in Section 3.1, the purpose of Plasma is to implement transactions without the need for individual confirmation of each of them by the blockchain network. The solution envisages the introduction of several side chains, whereby the last state of a newly created chain is recorded in the main blockchain network. This could be implemented without any need to change the current protocol and Ethereum network. The most important factor in terms of achieving security in the Plasma solution relates to the privilege of every user to perform transactions within any side chain (with the exception of the main Ethereum chain), and to leave the side chain and write the final state into the main Ethereum chain, where the final valid state is defined. To prevent the recording of a false state into the main chain, the Plasma solution suggests a "challenge mechanism", which assumes that the state a user wants to record in the main chain is frozen for a certain period. During this period, other users can prove that the proposed state is not relevant. Because of this mechanism, the user must deposit a sum of the Ether cryptocurrency with the transaction that writes the state into the main Ethereum chain; if another user proves that the transaction contains an invalid state, the deposit is lost and acquired by the user who proved the invalid state. This mechanism could trigger a lot of false evidence of invalid transactions; therefore, a user wishing to prove an invalid transaction must also pledge a sum of the Ether cryptocurrency, which, in the case of false evidence of invalidity, is acquired by the user of the original transaction [18, 19].

3.3 Sharding
With the current implementation of the protocol, each node that is part of the Ethereum network must validate every transaction, which ensures a high level of network security. One solution is sharding, where the protocol would separate the network state into smaller partitions called shards. Each shard would store its separate state and transaction history. By implementing such a protocol, certain nodes would process only the transactions of certain shards. Processing transactions on different shards at the same time would increase the overall throughput [20].
Sharding is a general technique used in distributed computing, the implementation of which can be expected in Ethereum by 2020 [21]. The implementation of sharding is the only one of the described scaling solutions that will have practically no impact on end users, nor on smart contract developers on the Ethereum platform. The system for storing states will remain the same. The change will be at layer 1 of the Ethereum protocol, while the solutions mentioned in Sections 3.1 and 3.2 will work on layer 2 [22]. Sharding eliminates the need for the entire network (each node) to process all transactions. The result is an increased number of processed transactions per second [21].
Prior to implementing sharding in the protocol, various challenges must be addressed. The main challenge is a single-shard takeover attack. With such an attack, an attacker could possibly take control of an entire shard, which may result in the avoidance of sufficient validations or, even worse, in validating blocks that are incorrect. These attacks are usually prevented by random sampling schemes. The next challenge is the availability of states between different shards, where the effect of a transaction may depend on events that happened earlier in another shard. A simple example is a transfer of money where user A (e.g. in shard 2) transfers money to user B (e.g. in shard 7). First, a "debit" transaction is executed that destroys the tokens of user A (in shard 2), after which a "credit" transaction is created that creates the tokens of user B (in shard 7). The credit transaction carries a reference to the debit transaction, which proves that the credit transaction is legitimate [8].
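The two-phase cross-shard transfer just described can be modeled with a short toy sketch. The shard numbers follow the example above, while the receipt format and the in-memory state are illustrative only; a real protocol would verify the receipt against the source shard's state root rather than merely checking its presence.

# Toy model of a cross-shard debit/credit transfer with a receipt reference.
import hashlib

shards = {2: {"A": 100}, 7: {"B": 0}}            # shard id -> account balances

def debit(shard: int, user: str, amount: int) -> str:
    shards[shard][user] -= amount                 # tokens destroyed on shard 2
    return hashlib.sha256(f"{shard}:{user}:{amount}".encode()).hexdigest()

def credit(shard: int, user: str, amount: int, receipt: str) -> None:
    # Stand-in for verifying the debit receipt against shard 2's state root.
    assert receipt, "credit is only legitimate with a debit receipt"
    shards[shard][user] += amount                 # tokens created on shard 7

r = debit(2, "A", 25)
credit(7, "B", 25, r)
print(shards)   # {2: {'A': 75}, 7: {'B': 25}}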
4. CONCLUSION
In this paper, we presented several different solutions whose common purpose is to achieve greater efficiency of scalable transaction processing in the Ethereum blockchain network. State channels move state modifications outside of the main blockchain network. The Plasma solution envisages the introduction of several blockchains, whereby each chain is used for a specific purpose. Both solutions allow users to record the final state in the main Ethereum blockchain network. We also described the sharding solution, the introduction of which, in contrast to the above-mentioned solutions, requires a change of the lowest layer of the Ethereum protocol. All the described solutions pursue the goal of not reducing the current level of transaction processing security, as well as maintaining the decentralization of the blockchain itself, while achieving scalability. In the future, due to the increase in the number of transactions transmitted within the Ethereum network, it is reasonable to expect several concrete implementations (Loom Network, OmiseGO, Raiden, ...) of the described solutions, as well as their increased use in practice, since the increase in the efficiency of transaction processing is one of the key factors in achieving the optimization of existing and new business processes supported by blockchain technology.

5. ACKNOWLEDGMENTS
The authors acknowledge the financial support from the Slovenian Research Agency (research core funding No. P2-0057).

6. REFERENCES
[1] "Total Market Capitalization," coinmarketcap.com, 2018. [Online]. Available: https://coinmarketcap.com/charts/. [Accessed: 06-Jul-2018].
[2] "Ethereum Transaction Chart," etherscan.io, 2018. [Online]. Available: https://etherscan.io/chart/tx. [Accessed: 06-Jul-2018].
[3] B. Podgorelec, "Arhitektura za nadgradljivost in zamenljivost pametnih pogodb na platformi Ethereum," University of Maribor, 2018.
[4] M. Pustisek, A. Kos, and U. Sedlar, "Blockchain Based Autonomous Selection of Electric Vehicle Charging Station," 2016 Int. Conf. Identification, Inf. Knowl. Internet Things, pp. 217-222, 2016.
[5] F. M. Benčić and I. P. Žarko, "Distributed Ledger Technology: Blockchain Compared to Directed Acyclic Graph," 2018.
[6] Visa, "Visa Inc. at a Glance," no. August, p. 1, 2015.
[7] V. Buterin, "A next-generation smart contract and decentralized application platform," Ethereum, no. January, pp. 1-36, 2014.
[8] J. Ray, "On sharding blockchains," github.com/ethereum, 2018. [Online]. Available: https://github.com/ethereum/wiki/wiki/Sharding-FAQs. [Accessed: 04-Jul-2018].
[9] "The State of Scaling Ethereum - ConsenSys Media," 2018. [Online]. Available: https://media.consensys.net/the-state-of-scaling-ethereum-b4d095dbafae. [Accessed: 04-Jul-2018].
[10] A. Back, M. Corallo, and L. Dashjr, "Enabling blockchain innovations with pegged sidechains," pp. 1-25, 2014.
[11] GoChain, "GoChain: Blockchain at Scale," pp. 0-5, 2018.
[12] A. Judmayer, A. Zamyatin, N. Stifter, A. G. Voyiatzis, and E. Weippl, "Merged mining: Curse or cure?," Lect. Notes Comput. Sci., vol. 10436 LNCS, pp. 316-333, 2017.
[13] P. McCorry, S. Meiklejohn, and A. Miller, "Pisa: Arbitration Outsourcing for State Channels."
[14] "Lightning Network." [Online]. Available: https://lightning.network/. [Accessed: 01-Aug-2018].
[15] J. Coleman, L. Horne, and L. Xuanji, "Counterfactual: Generalized State Channels," L4, 2018.
[16] S. Dziembowski, L. Eckey, and S. Faust, "Perun: Virtual Payment Hubs over Cryptocurrencies."
[17] S. Dziembowski, S. Faust, and K. Hostáková, "Foundations of State Channel Networks," pp. 1-56, 2018.
[18] J. Poon and V. Buterin, "Plasma: Scalable Autonomous Smart Contracts," Whitepaper, pp. 1-47, 2017.
[19] "Explained: Ethereum Plasma - Argon Group - Medium." [Online]. Available: https://medium.com/@argongroup/ethereum-plasma-explained-608720d3c60e. [Accessed: 02-Aug-2018].
[20] R. Jordan, "How to Scale Ethereum: Sharding Explained," 2018. [Online]. Available: https://medium.com/prysmatic-labs/how-to-scale-ethereum-sharding-explained-ba2e283b7fce. [Accessed: 01-Aug-2018].
[21] J. Kim, "Vitalik Buterin: Sharding and Plasma to Help Ethereum Reach 1 Million Transactions Per Second," 2018. [Online]. Available: https://cryptoslate.com/vitalik-buterin-sharding-and-plasma-to-help-ethereum-reach-1-million-transactions-per-second/. [Accessed: 01-Aug-2018].
[22] A. Rathod, "We Should See Sharding in 2020 as Part of 'Ethereum 2.0,'" 2018. [Online]. Available: https://toshitimes.com/we-should-see-sharding-in-2020-as-part-of-ethereum-2-0-eth-foundation-researcher/. [Accessed: 01-Aug-2018].

Integration Heaven of Nanoservices

Ádám Révész, EPAM Hungary, Budapest, Hungary, Adam_Revesz@epam.com
Norbert Pataki, Department of Programming Languages and Compilers, Faculty of Informatics, Eötvös Loránd University, Budapest, Hungary, patakino@elte.hu

ABSTRACT
Microservices have become an essential software architecture in the last few years. Nanoservices, as a generalization of the microservice architecture, have recently been getting more and more popular. However, this means that every component has more and more public interfaces, and the number of components is increasing as well.
Integration hell appeared when the number of developers increased. Developers work in parallel, so it is necessary to merge their work. Collaboration requires software support, such as version control tools and continuous integration servers.
However, modern software development tools such as build systems, testing frameworks and continuous integration servers become sensitive regarding the version of the source code they deal with. This can result in an exponential explosion in many ways when nanoservices are in the focus.
In this paper, we argue for a workflow that can handle this exponential explosion. This workflow can be included into continuous integration servers as jobs in order to execute test cases in a reproducible way, even if the test cases deal with special environment specifications. Moreover, the workflow is able to deal with building and artifact publishing processes as well.

Categories and Subject Descriptors
D.2.7 [Software Engineering]: Distribution, Maintenance, and Enhancement; K.6.3 [Computing Milieux]: Software Management

Keywords
Nanoservices, Integration, version control

1. INTRODUCTION
Microservices and nanoservices have recently become essential software architectures. These architectures have many benefits: improved scalability, separate responsibilities and better maintainability, to name but a few [17]. On the other hand, having a software architecture utilizing more than 70 own-built nanoservices in active development requires special care for build processes.
In terms of continuous integration (CI) and continuous delivery (CD), modern software development process frameworks, pipelines are defined as composable parts of the process describing how the product is created, transformed and delivered, from making source code and configurations on developers' workstations to serving them to end users [16]. The pipelines mentioned in this paper are executed by automation systems following deterministic scripts referred to as "pipeline scripts".
This paper discusses the topic of bulk management of unified pipeline scripts in the aspects of reproducibility, replayability, compactness and overhead of change management.
This paper is organized as follows. We present the problem of integration hell in Section 2. We describe the problem in Section 3. Our proposed workflow is presented in Section 4. Finally, this paper concludes in Section 5.

2. INTEGRATION HELL
2.1 Case
The subject of the study is a software running on top of a container orchestration system operating over multiple nodes. Using event sourcing with Command Query Responsibility Segregation (CQRS), the software utilizes over 70 services.
Every own-built service is stored in its own version control system (VCS) repository [14]. Most of them are identical in the aspect of programming language, project structure, packaging system, types of artifacts, testing frameworks and static analysis system (e.g. [11]). The discussion continues about this kind of services.

2.2 Orchestration
A container orchestration tool manages resource allocations, configurations and credentials of containers. It provides a common internal network with service discovery and domain services, serving well-defined endpoints for outer network communications.
In terms of scalable services, when operating with nanoservices, an orchestration tool must provide a load balancer service over multiple nodes, ensuring high availability. It also provides declarative configuration and deployment management, with the ability of rolling updates and rollbacks between configuration and deployment versions.
Currently, the industry standard for a real battle-tested, serious production-grade orchestration tool is Kubernetes, developed by Google [9].

2.4.5 Common pipeline
The subject project uses mostly Java Spring Boot nanoservices, which kind of services have a common, actively developed pipeline script. The common pipeline script contains the following stages:
• VCS checkout (sometimes multiple)
• Build source code using package manager (like npm, Gradle, Cabal, etc.)

2.3 Build tools
Modern programming language ecosystems have their own package manager for dependency handling and easy build, test, install and deploy management [12]. The common pipeline script utilizes those package managers, reaching a higher level of abstraction [10].
For ex- • Run tests on the artifact using package manager ample: • Sending the source code to the static analysis system • Java, Scala: Gradle[3], Maven[5], Ant[1] • Building Docker image artifact • JavaScript - NodeJS: NPM[6], Yarn[8] • Uploading artifacts • C++: Conan [13] • Announcing build status on channels (email, instant • Python: Pip[7] messaging) • Haskell: Cabal[2] Since these are nanoservices, their Docker images differ only on the built artifact. The configurations, including en- • Docker (images): Docker (registry) [15] vironment variables, configuration and secret files, are han- Closed source software projects as the subject utilize arti- dled by the orchestration tool and building them into an fact repository systems which can serve repositories for mul- image is an anti-pattern in this use case. tiple type of packages for own artifacts and serve as cache 2.5 Integration hell definition for public domain packages (in case of outage and lowering network traffic). For example: Nexus, JFrog Artifactory. Integration hell is a place where developers have to main- tain all the pipeline scripts manually for each service or use 2.4 Pipelines a common pipeline script and update all the source codes and configurations on each service repository to be compat- The services are built automatically on VCS commit on ible with the pipeline script. Also called one pipeline script marked branches. Build pipeline scripts of actively devel- over all. oped services have to be in sync in order to guarantee the same level of quality and compatibility with environment (following its changes). 3. PROBLEM STATEMENT 2.4.1 Pipeline script 3.1 Build job generation A pipeline script is interpreted by a CI tool, a build system The jobs are generated depending on the VCS repository (e.g. Jenkins [4]), is a sequence of commands optionally path structure. The generator job accepts the list of the separated into stages. service names to make build job for. The build jobs are generated from template, the only difference is in the source 2.4.2 Pipeline script stage code repository URL and the project name. A pipeline script stage is a named sequence of commands. Used for visualizing the main parts of the script, leverag- 3.2 Single pipeline script repository approach ing process status display during execution, variable scope Having dozens of services with identical pipeline scripts, it segregation. would come in hand to use the exactly same pipeline script file checked out from one build script repository. 2.4.3 Pipeline command Each pipeline command can be variable declaration and 3.2.1 Limitations of updates definition (including functions), function invocation, shell The single pipeline script repository approach has mul- invocation. tiple pitfalls. Since the the job configuration has only the Ideally, a build system has its own pipeline script domain- repository, the branch name and the path of the pipeline specific language (DSL) with an application-programming script, any change on the pipeline script would affect all the interface (API) library for common operations like VCS check- build jobs at once. In this case either the ability to create out, packaging operations, status notifications, common con- experimental changes on the build scripts is lost or the abil- figuration and secret storage operations. ity to recreate all the build jobs without breaking any of them. 2.4.4 Build job In common CI tools, each pipeline script invoked by a 3.2.2 Lack of replayability corresponding build job. 
These jobs contain metadata for Other problem regarding the single repository approach running the pipeline script, like the location of the pipeline is the lack of replayability. Having a case when recreat- script itself. Storing and passing variables like job name, ing an artifact based on an older state of the service source parameters (given on job invocation via API call or web code repository is needed, there is no guarantee the cur- UI). rent state of the pipeline script in its repository is backward 44 Figure 1: Sequence diagram of the proposed work- flow compatible, so there is the risk of broken or unstable build (in worse case it turns out in production). The correct build script should be searched in the history of the pipeline script repository (see Figure 1). Figure 2: Sequence diagram of the single source of 3.2.3 Growing overhead truth approach The mentioned problems are getting harder to resolve as the size of the software project (the number of services) is This solution does not introduce the problem of difficult growing. The maintenance cost of those pipeline scripts is generator job but still carries the synchronization problem. high. Onboarding a new developer-, handing out the de- Pipeline scripts are being modified in multiple cases. There velopment of such project could be extremely difficult due are cases which are not strictly drived by source code changes. to the multiple tools and sytems, scripts and their difficult Having the case of enriching the log of the pipeline script in dependency graph. order to leverage traceability of the process. This change is made only in the pipeline script and the side effects are 4. PROPOSED WORKFLOW present only on the pipeline script log. Has no side effect on the artifacts or test results. There are multiple open ques- Addressing these problems a reasonable solution could be tions about which service VCS repository has to be updated a property file in each service source code repository. This first, which should be the subject of experimental changes approach makes the generator job more difficult since every and how to update all the other service pipeline script? invocation it should parse the property file of every repos- itory and generating the job according to that. An other 4.4 Automatized script updating problem is the synchronization of those property files. Addressing these questions, there is a pipeline script in the 4.1 Single source of truth VCS repository but unlike the single pipeline script reposi- tory approach (see 3.2), the service build jobs are not refer- There is an other, more compact, more robust and more ring to the script repository. There is a synchronization job redundant way to address the problems. The single source introduced instead. The pipeline script synchronization job of truth for service artifact build workflows should be the takes service name list as its arguments as the service build repository of their source code. This approach leverages the job generator job does. The pipeline script updater job has compactness of each service. The service VCS repository permission to update the service VCS repositories. To en- should contain the source code of the service, package de- force traceability an issue id referencing an issue describing scriptor (build scripts included) and the pipeline script. This the change and its cause is recommended to be present in the approach can be seen on Figure 2. commit message in all affected VCS repository. 
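The template-driven job generation and the automated script propagation described above can be illustrated with a short sketch. The repository layout, the issue id and the service names below are hypothetical; a real synchronization job would typically take the service list as a job argument and could call the CI server's API instead of shelling out to git.

# Hypothetical sketch of the pipeline script synchronization job.
import shutil
import subprocess
from pathlib import Path

CANONICAL = Path("pipeline-script/Jenkinsfile")   # assumed canonical script
ISSUE_ID = "BUILD-123"                            # hypothetical tracking issue

def update_service(repo: Path) -> None:
    # Copy the canonical pipeline script into the service repository and
    # commit with the traceability reference recommended above.
    shutil.copy(CANONICAL, repo / "Jenkinsfile")
    subprocess.run(["git", "-C", str(repo), "add", "Jenkinsfile"], check=True)
    subprocess.run(["git", "-C", str(repo), "commit", "-m",
                    f"{ISSUE_ID}: sync common pipeline script"], check=True)
    subprocess.run(["git", "-C", str(repo), "push"], check=True)

for service in ["service-a", "service-b"]:        # normally the job's arguments
    update_service(Path("repos") / service)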
The figure 4.2 Utilization of VCS 3 presents this workflow. Since the VCS repository handles the pipeline script along with the source code, any arbitrary snapshot (commit) of 5. CONCLUSION the repository in any time of its history should contain the Microservices and nanoservices are popular software archi- pipeline script which executes exactly the same pipeline with tectures. On the other, dealing with complex software devel- exactly the same result any time. opment processes and many different development software tools, the maintenance can be a critical problem because of 4.3 Keeping job generator simple the combinatorical explosion. 45 [12] M. P. Martinez, T. László, N. Pataki, C. Rotter, and C. Szalai. Multivendor deployment integration for future mobile networks. In A. M. Tjoa, L. Bellatreche, S. Biffl, J. van Leeuwen, and J. Wiedermann, editors, SOFSEM 2018: Theory and Practice of Computer Science: 44th International Conference on Current Trends in Theory and Practice of Computer Science, Krems, Austria, January 29 - February 2, 2018, Proceedings, pages 351–364, Cham, 2018. Springer International Publishing. [13] A. Miranda and J. a. Pimentel. On the use of package managers by the C++ open-source community. In Proceedings of the 33rd Annual ACM Symposium on Applied Computing, pages 1483–1491, New York, NY, USA, 2018. ACM. Figure 3: Sequence diagram of the proposed work- [14] S. Phillips, J. Sillito, and R. Walker. Branching and flow merging: An investigation into current version control practices. In Proceedings of the 4th International Workshop on Cooperative and Human Aspects of This solution holds some security concerns like the up- Software Engineering, CHASE ’11, pages 9–15, New dater pipeline execute right has to be available for restricted York, NY, USA, 2011. ACM. group of users since the VCS enables Jenkins to commit to [15] Á. Révész and N. Pataki. Containerized A/B testing. the master (trunk) branch. In Z. Budimac, editor, Proceedings of the Sixth The current prototype version is restricted to only one Workshop on Software Quality Analysis, Monitoring, kind of services to upgrade their build pipeline. Enabling Improvement, and Applications, pages 14:1–14:8. modular build scripts and their modular upgrade could be a CEUR-WS.org, 2017. next iteration. The bulk update problem could be derivated to a version controll system problem, updating common files [16] S. Stolberg. Enabling agile testing through continuous in two or more repositories. In context of build systems like integration. In Agile Conference, 2009. AGILE ’09., Jenkins (git) submodules could not be an optimal solution pages 369–374, New York, Aug 2009. IEEE. increasing complexity. [17] E. Wolff. Microservices: Flexible Software The proposed solution grants the robust script handling Architectures. CreateSpace Independent Publishing workflow allowing bulk pipeline script updates and replaya- Platform, 2016. bility. It introduces some additional difficulty with the up- date process but it has been automatized. The approach reached a single source of truth state for each service artifact creation process and the refered source is the VCS repository which is a great tool to manage and observe the whole devel- opment of its content through time. The approach reduces the cost of maintaining pipeline scripts. 6. REFERENCES [1] Ant. https://ant.apache.org/. [2] Cabal. https://www.haskell.org/cabal/. [3] Gradle. https://gradle.org/. [4] Jenkins. https://jenkins.io/. [5] Maven. https://maven.apache.org/. [6] Npm. 
https://npmjs.com/. [7] Pip. https://pypi.org/project/pip/. [8] Yarn. https://yarnpkg.com/. [9] D. Bernstein. Containers and cloud: From LXC to Docker to Kubernetes. IEEE Cloud Computing, 1(3):81–84, Sept. 2014. [10] C. Ebert, G. Gallardo, J. Hernantes, and N. Serrano. Devops. IEEE Software, 33(3):94–100, May 2016. [11] G. Horváth and N. Pataki. Source language representation of function summaries in static analysis. In Proceedings of the 11th Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems, ICOOOLPS ’16, pages 6:1–6:9, New York, NY, USA, 2016. ACM. 46 Service Monitoring Agents for DevOps Dashboard Tool Márk Török Norbert Pataki Department of Programming Languages Department of Programming Languages and Compilers, Faculty of Informatics, and Compilers, Faculty of Informatics, Eötvös Loránd University Eötvös Loránd University Budapest, Hungary Budapest, Hungary tmark@caesar.elte.hu patakino@elte.hu ABSTRACT sends report to the developers regarding the changes and their effects [6]. Deployment of the compiled application DevOps is an emerging approach that aims at the symbiosis and its necessary dependencies can be launched in various of development, quality assurance and operations. Develop- infrastructures [4]. Virtual machines in cloud, Docker con- ers need feedback from the test executions that Continuous tainers on a host take part in the deployment frequently [5]. Integration servers support. On the other hand, developers Configuration management tools (e.g. Ansible) can execute need feedback from deployed application that is in produc- specific code snippets for the deployment. Monitoring and tion. logging of the started application is useful to detect every Recently, we are working on the dashboard tool which vi- kind of runtime phenomenon and orchestrate the application sualizes the runtime circumstances for the developers and seamlessly [3]. architects. The tool requires runtime circumstances from However, tools landscape is missing good tools which are the production environment. In this paper, we introduce able to present the runtime performance of applications in our background mechanism which uses agents to retrieve staging or production environment regarding the changes of runtime information and send it to our tool. We present the source code. We are working on a dashboard tool to many specific agents that we have developed for this soft- visualize how the deployed application behaves in specific ware. Our approach deals with many useful services and environment. Many typical use-cases can be mentioned. tools, such as Docker and Tomcat. Does the memory consumption decrease when a feature’s new implementation is deployed? Which commit may cause Categories and Subject Descriptors a memory leak, if it is suspicious. Does the introduction of D.2.5 [Software Engineering]: Testing and Debugging; a new feature or API cause increase in the number of end- D.2.8 [Software Engineering]: Metrics users? How can one compare the performance of the system if the webserver or a database server is replaced? Keywords For our dashboard tool, we have developed many tool- specific agents to report runtime perception. Our tool vi- Agents, Monitoring, DevOps sualizes the reports come from agents. We have developed agents that deal with Docker, Tomcat webserver, etc. In this 1. INTRODUCTION paper, we present our agent-based approach and illustrate DevOps is an emerging approach in modern software en- some agents’ internal high-level functions. 
gineering. The key achievements of DevOps are compre- This paper is organized as follows. In section 2, we briefly hensive processes from building source to deployment, con- present the main concept of our tool. After, we present our tinuous synchronization of development and operations in agent-based approach in a detailed way with some examples order to make every new feature delivered to the end users. in section 3. Finally, this paper is concluded in section 4. DevOps emphasizes the feedback from every phase. DevOps-culture uses a wide range of software tools. Au- 2. DASHBOARD TOOL tomation of build processes is essential solution for many years. Continuous Integration (CI) servers track the version A safe software development requires control over the en- control system if a change of the source has been commited tire software development lifecycle (SDLC). During the de- [7]. In this case, the CI server (e.g. Jenkins [1]) starts the velopment, it is essential to avoid memory leakage, or overuse compilation process and executes the test cases and finally, of the CPUs. To get a good overview of the resource uti- lization engineers, DevOps engineers have to keep their eyes on these units that means they have to monitor their envi- ronments by using tools that can reflect the status of the Permission to make digital or hard copies of all or part of this work for different services, databases, network I/Os, or the amount personal or classroom use is granted without fee provided that copies are of written/read blocks. not made or distributed for profit or commercial advantage and that copies In this chapter, we would like to give a brief introduction bear this notice and the full citation on the first page. To copy otherwise, to about our Dashboard tool which can help developers to get republish, to post on servers or to redistribute to lists, requires prior specific metrics about their environments. Developers can declare permission and/or a fee. new environments on the board and assign charts to them. CSS ’18 Ljubljana, Slovenia Copyright 20XX ACM X-XXXXX-XX-X/XX/XX ...$15.00. A chart represents a single observable unit from the real en- 47 vironment. Metrics are provided by agents which run on the period. Beside these steps, an agent also has minor charac- machine where the application is deployed. A continuously teristics, like running agents send the gathered information back to the • Listener. This way software and DevOps engineers can get Runs as a daemon an accurate picture immediately. A screenshot can be seen • Validates the configuration file to have proper keys in Figure 1 about how a chart looks like. • Validates the values in the configuration yml file • Checks whether the related OS-level dependencies ex- ist • Transfers the collected metric to JSON Beside these steps, an agent also has minor characteristics. It runs as a daemon. It checks whether the related OS- level dependencies exist. It transfers the collected metric in JSON. All agents require a file that contains specific information for the observed unit, as well as, parameter for the connec- tion to the Listener. One file can be used by many agents, and one file can contain configurations for multiple observed units. Here we detail some of the agents mechanism, how they Figure 1: Memory consumption of a Tomcat in- work and what information we can get from the unit. stance 3.1 Tomcat Tomcat is one of the most popular and widely-used ap- 3. AGENTS IN OUR APPLICATION plication server among Java developers. 
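Before the Tomcat agent is detailed below, the general agent loop described earlier (poll the observed unit, transform the data, transfer it to the Listener as JSON) can be sketched as follows. The Listener URL, the Tomcat manager status path and the credentials are illustrative assumptions, not values from the paper, and a real agent would parse the returned status rather than record its size.

# Minimal sketch of an agent loop reporting to a hypothetical Listener.
import time
import requests

LISTENER = "http://listener.example:9000/metrics"          # assumed endpoint
TOMCAT_STATUS = "http://localhost:8080/manager/status?XML=true"

def collect() -> dict:
    resp = requests.get(TOMCAT_STATUS, auth=("admin", "admin"), timeout=5)
    # Placeholder metric; a real agent extracts memory/thread figures here.
    return {"status_bytes": len(resp.content), "ts": time.time()}

while True:                                                # runs as a daemon
    try:
        requests.post(LISTENER, json=collect(), timeout=5)
    except requests.RequestException as err:
        print("transfer failed:", err)
    time.sleep(10)                                         # configured period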
It provides a sim- In this section, we give a detailed view of how our agents ple dashboard-like landing page where software and DevOps work and what the main steps are that we kept in focus dur- engineers can manage the deployed packages. Via this page ing the implementation. Before we go through the agents those users, who are dedicated to enter the server, can check listed below, we would like to introduce system require- the state of their applications. This can be a simple health- ments. The target hosts are always based on Debian images, check, the number of threads or how much memory is avail- or any of its derivatives, like Ubuntu or Linux Mint. As we able for Tomcat to allocate more space for the applications. present later, we have strived to use as less dependencies as The tomcat agent monitors both the inside status page possible, like OS-related functionalities or commands. Most and the process itself as well. In the configuration file (see of the commands come with the basic OS, like Listing 2), DevOps engineer has to declare specific parame- ps, but some of the switches can be different on other OS, like ters. -eo is Unix syntax, but using axo is acceptable on both Unix and BSD uri : ’ l o c a l h o s t ’ OS, as well. po rt : 8 080 The architecture consists of a server, the Listener and u s e r n a m e : ’ admin ’ nodes which serve as hosts for the agents. In our solution, p a s s w o r d : ’ admin ’ an agent is responsible for the following steps: pid : 2 4 5 6 7 • After start, it runs endlessly Listing 2: Agent configuration file example • Collects the information about the observed unit If pid is not available, agent monitors the inside status • Transforms and if necessary aggregates the collected page only. An example metric that the agent is intended to data send towards the Listener can be seen in Listing 3. • Transfers the data towards the Listener server in JSON { format " s t a t u s ": { " jvm ": { At first, we have to start the agent with an agent-specific " m e m o r y ": { sub-command and a configuration file which contains all the " fr ee ": 233 564 5 , information that are necessary to observe the chosen unit " t o t a l ": 8 8 2 3 4 1 2 3 , (e.g. see Listing 1). " max ": 2 4 5 3 4 2 2 } $ tomcat - a g e n t s t a r t -- fil e c o n f i g . yml } , " c o n n e c t o r ": { Listing 1: Launching the agent " r e q u e s t I n f o ": { When it starts running, it validates the arguments and " m a x T i m e ": 12 , then parse and validates the file against the expected con- ... figuration settings that are required to the unit. Then it } , starts monitoring and collecting metrics in a specified time " t h r e a d I n f o ": { 48 " m a x T h r e a d s ": 1 , pa th : ’/ logs ’ " c u r r e n t T h r e a d C o u n t ": 1 , fi le : ’ obs erv ed ’ " c u r r e n t T h r e a d B u s y ": 0 f o r m a t : ’ S E V E R I T Y || ’ } n u m b e r _ o f _ l i n e s : 10 } } Listing 5: Example configuration for the log agent } The path tag is responsible for the path of the folder which is considered as a log folder and Listing 3: Example metric sent by the tomcat agent file is the observed unit. To distinguish an ERROR leveled message from other messages that contains the word error, engineers have to declare the 3.2 Docker format of the log. The last key is responsible for the number Containerization is new directive in virtualization: this fetched and forwarded messages. 
A sent message example lightweight approach supports operating system-based iso- sent can be seen in Listing 6. lation among different pieces of the application. Container- { ization is on the crest of a wave since Docker has been de- " l i n e s ": [ ".. ."] , veloped. Docker provides a systematic way to automate the " s e v e r i t y ": { fast deployment of Linux applications inside portable con- " in fo ": 655 , tainers [2]. " w a r n i n g ": 848 , The name of docker is basically almost equivalent of con- " e r r o r ": 2 , tainer for most of the engineers. Docker, just like Tomcat, " f a t a l ": 0 provides a calculation on how much memory it consumes or } what the total bytes of the received and transmitted data } is over the network for each container. These are the stats. Without declaring any specific container name in the config Listing 6: JSON message example sent by the log file, the agent sends information about all the containers at collection agent the same time that are shown up in the stats. An example message can be seen in Listing 4. Since an agent is run on a machine by an arbitrary user, the software, DevOps and test engineers have to take care { that the observed log can be any file depends on the privi- " c o n t a i n e r s " : [ leges of the user. { " pid ": 38 , 3.4 Host Machine " na me " : ’ j i n g l e _ b e l l ’ , The host machine which the agent is executed on, can be " cpu " : 1.86 , a real machine, a virtual machine or a container whether it " mem ": { is on local or on remote. Whichever the host machine is, " u s a g e ": " 1 6 8 . 2 M " , from the agent perspective they are the same. From inside " l i m i t ": " 1 5 . 4 3 G " , out it seems that machine has memory, CPU (or GPU), " p e r c e n t a g e ": 1 .06 hard disk and other resources. These resources are reachable } , for the agents that means agents can use them. Having a ... picture about the usage and consumption of these resources } are essential. ] With this agent, we can monitor the above-mentioned re- } sources and gather their metrics. These metrics are cumu- lated, agent takes, for example the total memory, the total Listing 4: JSON message example sent by agent swap memory or the size of the available space on the hard disk, regardless which processes use them. 3.3 Log Here we would like to give a view which metrics are taken One of the most important mirror of the status of an ap- during the agent’s execution. We arranged the resources plication is its logs. It could contain all the steps that an into three groups. All the metrics belong to the memory, or execution takes and provide those steps in different granu- CPU, or disk storage (volume). larity. The two main approaches in case of this agent are, first, 3.4.1 Memory get the last n messages from the log and forward it to the Memory has multiple parts from total to used to swap. Listener, and second, get the number of the different severity To get an accurate picture about the consumption we use, levels. The earlier can provide a view of the latest messages, multiple commands that can help calculating the usage of which is a talkative information based on the error or excep- the different parts. The agent uses free (see Listing 7), tion messages raised in the code. The latter one can show /proc/meminfo and the vmstat commands to get metrics the ratio of the different levels giving a clear overview how about the memory (see Listing 8). All of them provide in- much warnings or errors get hit during the execution. 
To formation about how much total memory is in that host, get these two metrics we mentioned above, engineers have what the size of the cached swap or how much memory is to use such a configuration seen in Listing 5. free or how much is available for allocating new processes. ... $ f ree - m 49 t o t a l u sed fre e ... $ df - t e x t 4 Mem : 1 5 8 0 2 5 485 570 7 ... F i l e s y s t e m 1 K - b l o c k s U s e d A v a i l a b l e Use % M o u n t e d on / dev / n v m e 0 n 1 p 5 1 2 0 4 6 2 0 6 4 7 7 2 5 9 4 9 2 3 7 0 4 0 3 9 6 68% / Sw ap : 204 7 0 204 7 Listing 11: Using the df command Listing 7: Using the free command { { " f i l e s y s t e m ": "/ dev / n v m e 0 n 1 p 5 " , " Mem ": { "1 k _ b l o c k s ": 1 2 0 4 6 2 0 6 4 , " t o t a l ": 15802 , " us ed ": 7 7 2 5 9 4 9 2 , " us ed ": 5485 , " a v a i l a b l e ": 3 7 0 4 0 3 9 6 , " fr ee ": 5707 , " use ": 68 , " s h a r e d ": 2088 , " m o u n t e d _ o n ": "/" " bu ff / c a c h e ": 4609 , } " a v a i l a b l e ": 789 4 } , Listing 12: Sent JSON message about volume usage " Sw ap ": { " t o t a l ": 2047 , " us ed ": 0 , 4. CONCLUSION " fr ee ": 204 7 DevOps is an emerging approach that aims at the symbio- } sis of development, quality assurance and operations. Devel- } opers need feedback from the test executions that CI servers support. On the other hand, no tools have been created that Listing 8: Sent message about memory consumption support feedback from the production enviroment to the de- velopers to follow up the code changes and its effect on the 3.4.2 CPU end-users and the production or the staging environment. There are plenty of tools that provide the opportunity to In this paper, we argue for a new tools into the DevOps monitor the usage of the CPU. Some of them are part of toolset. The aim of this tool is retriving and visualizing the default OS, then the rest come as a third-party tool and the runtime circumstances of deployed application because require installation with privileges. We took the focus on this information can be essential for the developers and ar- those tools that are part of the OS, or used in wide range, like chitects. For this tool, we have developed many agents to vmstat, or iostat (see Listing 9). Both tools can provide a collect the runtime performance information from specific picture of the CPU utilization in percentage. services. In this paper, we presented the mechanism of some $ i o s t a t - c specific agents in Linux environment. L i n u x 4.15.0 -32 - g e n e r i c 2018 -08 -25 _ x 8 6 _ 6 4 _ (8 CPU ) avg - cpu : % u s e r % n i c e % s y s t e m % i o w a i t % s t e a l % i d l e 5. REFERENCES 24 ,97 0 ,03 6 ,07 0 ,03 0 ,00 68 ,90 [1] Jenkins. https://jenkins.io/. [2] D. Bernstein. Containers and cloud: From LXC to Listing 9: Using the iostat command Docker to Kubernetes. IEEE Cloud Computing, The agent sends the above information towards the Lis- 1(3):81–84, Sept. 2014. tener as it seen in Listing 10. [3] P. P. I. Langi, Widyawan, W. Najib, and T. B. Aji. An { evaluation of twitter river and logstash performances as " us er ": 24.97 , elasticsearch inputs for social media analysis of twitter. " ni ce ": 0.03 , In Information Communication Technology and " s y s t e m ": 6.07 , Systems (ICTS), 2015 International Conference on, " i o w a i t ": 0.03 , pages 181–186, New York, Sept 2015. IEEE. " s t e a l ": 0.00 , [4] M. Leppänen, S. Mäkinen, M. Pagels, V. P. Eloranta, " id le ": 68. 9 J. Itkonen, M. V. Mäntylä, and T. Männistö. 
The } highways and country roads to continuous deployment. IEEE Software, 32(2):64–72, Mar 2015. Listing 10: Sent JSON message about CPU usage [5] Á. Révész and N. Pataki. Containerized A/B testing. In Z. Budimac, editor, Proceedings of the Sixth Workshop 3.4.3 Volume on Software Quality Analysis, Monitoring, Volume usage does not belong to the major metrics of Improvement, and Applications, pages 14:1–14:8. the previously mentioned three units. Though it can tell CEUR-WS.org, 2017. useful information about a running application. To get a [6] J. Roche. Adopting DevOps practices in quality metric about the volume agent uses df (see Listing 11) and assurance. Commun. ACM, 56(11):38–43, Nov. 2013. du commands. Both of them are responsible for giving a [7] S. Stolberg. Enabling agile testing through continuous view of how much space is taken by a folder or how the integration. In Agile Conference, 2009. AGILE ’09., size of the local storage changes. Moreover, agent can be pages 369–374, New York, Aug 2009. IEEE. parameterized. It takes the path to the observed folder or partition of the storage of type of the disk. The agent sends aggregated information as it seen in Listing 12. 50 Incremental Parsing of Large Legacy C/C++ Software Anett Fekete, Máté Cserép Eötvös Loránd University Faculty of Informatics Budapest, Hungary {hutche, mcserep}@inf.elte.hu ABSTRACT incremental parsing [14] and the lazy analysis [10] have been CodeCompass is an open source project intended to sup- studied. A great overview of pratical algorithms and the port code comprehension by providing textual information, exsiting methodology is given by Tim A. Wagner in [13]. source code metrics, version control information and visu- C/C++ language-specific compilation tools [12, 4] and pro- alization views of the file and directory level relations for gramming environments [7] supporting incremental parsing the analyzed project. Regarding the typical software de- have also emerged as an advancement. velopment methodologies (especially the agile ones), only a smaller portion of the code base is affected by any change CodeCompass [9] is an open source, scalable code compre- during a shorter amount of time (e.g. between nightly hension tool developed by Ericsson Ltd. and the Eötvös builds), therefore parsing the entire project each time is un- Loránd University, Budapest to help understanding large necessary and expensive. A newly introduced feature, in- legacy software systems. Its web user interface provides rich cremental parsing is intended to solve this problem by only textual search and navigation functionalities and also a wide processing files that have been recently changed and leaving range of rule-based visualization features [5, 6]. The code the rest alone. This is achieved by the maintenance of the comprehension capabilities of CodeCompass is not restricted project workspace database followed by the partial parsing to the existing code base, but important architectural infor- of the project. The feature has been tested both on medium mation are also gained from the build system by processing and large scale projects and proved to be an effective tool the compilation database of the project [11]. The C/C++ in CodeCompass. 
Categories and Subject Descriptors
D.2.3 [Software Engineering]: Coding Tools and Techniques; D.3.4 [Programming Languages]: Processors

General Terms
Management, Languages

Keywords
code comprehension, software maintenance, static analysis, incremental parsing, C/C++ programming language

1. INTRODUCTION
One of the main tasks of a code comprehension software tool is to provide exact textual information and visualization views regarding the analyzed codebase to support the (newcomer) developers in understanding the source code. For an enterprise software under development this requires the frequent static reanalysis of the program, which could take several hours for a large legacy software.

Performing a complete static analysis each time is a significant waste of computational resources, since in most cases (e.g. between nightly builds) only a few percent of the file set has been affected by any change. In order to boost the parsing and compilation process and to provide a richer user experience in integrated development environments (IDEs) [8], the concept of incremental parsing and compilation has been researched for decades. More recently further approaches, like the involvement of version control systems into incremental parsing [14] and lazy analysis [10], have been studied. A great overview of practical algorithms and the existing methodology is given by Tim A. Wagner in [13]. C/C++ language-specific compilation tools [12, 4] and programming environments [7] supporting incremental parsing have also emerged as an advancement.

CodeCompass [9] is an open source, scalable code comprehension tool developed by Ericsson Ltd. and the Eötvös Loránd University, Budapest to help understanding large legacy software systems. Its web user interface provides rich textual search and navigation functionalities and also a wide range of rule-based visualization features [5, 6]. The code comprehension capabilities of CodeCompass are not restricted to the existing code base: important architectural information is also gained from the build system by processing the compilation database of the project [11]. The C/C++ static analyzer component is based on the LLVM/Clang parser [1] and stores the position and type information of specific AST nodes in the project workspace database, together with further information collected during the parsing process (e.g. the relations between files). By introducing the concept of incremental parsing into CodeCompass we can detect the added, deleted or modified files in the program and carry out maintenance operations for the database of the code comprehension tool in only the required cases. Thus the required time of the reanalysis can be reduced by multiple orders of magnitude.

In this paper we first present our research in Section 2 on how we extended the static analysis capabilities of the CodeCompass code comprehension tool with incremental parsing. Then Section 3 demonstrates the usability of the concept by showcasing incremental parsing and measuring its performance on a medium and a large size C/C++ software. Finally, Section 4 concludes the results and discusses further research opportunities.

2. METHODOLOGY
A major consideration of the introduced incremental parsing feature was to integrate it seamlessly into the existing parsing process by not differentiating in how an initial or a follow-up incremental parse should be initiated. This was achieved by utilizing the partial parsing feature of CodeCompass, which means that the tool is capable of continuing a previously aborted analysis by omitting the already parsed files which are present in the workspace database.

Therefore the main concept of the introduced incremental parsing feature consists of two steps: i) perform a database maintenance operation, where the project workspace is restored into a state from which ii) the existing partial parsing can finish the procedure.

2.1 Determining file states
When a new parse is being done in incremental mode, the state of each file is determined first. Let F_DB be the file set stored in the workspace database and F_DISK be the file set stored on the disk. A file f ∈ F_DB ∪ F_DISK may take one of the three states listed as follows.

Added files: f has been added to the project since the latest parse if f ∈ F_DISK but f ∉ F_DB.

Deleted files: f has been deleted from the project if f ∈ F_DB but f ∉ F_DISK.

Modified files: f is modified when f ∈ F_DB ∩ F_DISK at the time of the new parse, but its content has changed since the latest one. This can be determined by comparing the contents that are stored in the database and on the disk, or by their respective hashes for performance optimization.
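The three states can be computed with plain set operations over F_DB and F_DISK. The following C++ sketch is our illustration of Section 2.1, not CodeCompass source; both file sets are modeled as path-to-hash maps, so the modification check reduces to a hash comparison:

    // Minimal sketch of the file-state classification (our illustration).
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    enum class FileState { Added, Deleted, Modified };

    std::vector<std::pair<std::string, FileState>>
    classify(const std::map<std::string, std::string>& db,    // F_DB: path -> hash
             const std::map<std::string, std::string>& disk)  // F_DISK: path -> hash
    {
        std::vector<std::pair<std::string, FileState>> result;
        for (const auto& [path, hash] : disk) {
            auto it = db.find(path);
            if (it == db.end())
                result.emplace_back(path, FileState::Added);     // on disk only
            else if (it->second != hash)
                result.emplace_back(path, FileState::Modified);  // hashes differ
        }
        for (const auto& [path, hash] : db)
            if (disk.find(path) == disk.end())
                result.emplace_back(path, FileState::Deleted);   // in DB only
        return result;
    }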
2.2 Header inclusion traversal
Specifically when parsing a C or C++ language project, changes in header inclusions provide one more challenge to tackle. Upon the modification of a header file, all further files in the inclusion chain depending on it should be considered as modified, even without containing any direct changes themselves. Therefore, when determining the modified state of a file as defined in Section 2.1, the set of files defined by the header inclusion relationships should be transitively checked for changes. There are two approaches for this, as described below and shown in Figure 1.

Definition 1. For files a, b and c, given that a is included by b and b is included by c, we say that file a is in an upward connection with b and, accordingly, file c is in a downward connection with b.

Figure 1: Traversal directions.

Upward traversal model: The upward traversal model depends on the upward connections between files. When resolving the state of file a, its included headers have to be checked for modifications transitively.

Downward traversal model: Similarly, the downward traversal model uses the downward connections that can be found between files. If a file a is resolved as modified, all files that include a can be marked as modified transitively. Note that with this method, the state of any marked file can be considered final and it can be omitted from further inspections.
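As an illustration of the downward model (our sketch, not the tool's code), the traversal is a simple worklist algorithm over the transposed inclusion graph G^T; since a marked file is final, each file is expanded at most once:

    // Downward traversal sketch: starting from the directly changed files W,
    // follow reverse inclusion edges and mark every includer as modified.
    #include <map>
    #include <set>
    #include <string>
    #include <vector>

    using Graph = std::map<std::string, std::vector<std::string>>;

    std::set<std::string> downwardTraversal(
        const Graph& includers,                // file -> files that include it
        const std::set<std::string>& changed)  // W, the directly changed files
    {
        std::set<std::string> marked(changed.begin(), changed.end());
        std::vector<std::string> worklist(changed.begin(), changed.end());
        while (!worklist.empty()) {
            std::string file = worklist.back();
            worklist.pop_back();
            auto it = includers.find(file);
            if (it == includers.end()) continue;
            for (const auto& includer : it->second)
                if (marked.insert(includer).second)  // newly marked -> expand once
                    worklist.push_back(includer);
        }
        return marked;  // all directly and indirectly changed files
    }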
Theorem 1. The downward traversal model has better computational complexity than the upward traversal model, and is therefore preferred for incremental parsing.

Proof. Let G = (V, W, E) be the directed acyclic graph (DAG) of header inclusions, with V containing the file set as vertices and E being the set of upward connections, n := |V|, e := |E|, and with W ⊆ V denoting the set of directly changed files, k := |W|.

Let N_G(v) be the neighborhood file set of vertex v in G, so w ∈ N_G(v) ⇔ (v, w) ∈ E. Therefore, for a file v we can define the directly included file set as N_G(v) and the includer files of v as N_{G^T}(v), where G^T is the transpose graph of G.

We define up(G, v) and down(G, v) as the file sets resulting from the upward and downward traversal for v ∈ V in G by the corresponding traversal model, as formally described below:

    up(G, v) = {v} ∪ ⋃_{w ∈ N_G(v)} up(G, w)    (1)

    down(G, v) = {v} ∪ ⋃_{w ∈ N_{G^T}(v)} down(G, w)    (2)

As a simplification in our model, let us assume a uniform distribution of header inclusions among the files. Since Σ_{v∈V} deg⁺(v) = Σ_{v∈V} deg⁻(v) = e, the average in-degree and out-degree of a file v is deg⁺(v) = deg⁻(v) = e/n, which will be denoted by d henceforth. As a consequence, the length of the longest path in G is log_d n, which is the length of the longest header inclusion chain in the project, since G was defined as a DAG.

Therefore the asymptotic tight bound both for up(G, v) and down(G, v) can be calculated as:

    Θ(up(G, v)) = Θ(down(G, v)) = d^(log_d n) = n    (3)

We define up(G) and down(G) as the upward and downward traversal algorithms which determine the indirectly changed files in V through header inclusions from W by the corresponding traversal model. We define the computational complexity of the algorithms as the number of files checked for changes in their content (or by their hash). Based on Equation 3, the asymptotic tight bounds for up(G) and down(G) can be calculated as:

    Θ(up(G)) = Σ_{v ∈ V} Θ(up(G, v)) = n²    (4)

    Θ(down(G)) = Σ_{w ∈ W} Θ(down(G, w)) = k·n    (5)

Since k ≤ n, and in a typical use case for incremental parsing k ≪ n, we obtain Θ(down(G)) < Θ(up(G)).

An example of the downward traversal model is showcased in Figure 2. On the left side of the figure the example file set is shown, with header inclusion dependencies denoted as arrows between the files. Directly modified files are marked with a dark background, while files requiring expansion through traversal to find indirectly changed files are marked with an italic font. Note that these two categories are equivalent in the initial stage. On the right side of the figure the effects of downward traversing a.h are demonstrated: files c.h, d.h, f.cpp and g.cpp are also detected as indirectly changed files. While c.h was also a directly modified file, observe that it no longer requires downward traversal.

Figure 2: Downward traversing of a.h demonstrated on a showcase file set.

2.3 Database maintenance
As mentioned above, incremental parsing includes some maintenance of the existing database, depending on the state of the changed files.

1. Added files are perceived as new files to the project and are therefore registered into the database.

2. Deleted files need to be purged from the database, as they have been removed from the project.

3. Modified files are handled as if they were a combination of deleted and added files. First, they are completely wiped out from the database – meaning that all their AST related information and file level relations are erased – thus considering them deleted, then re-registered like newly added files. Directory level relations are not sufficiently maintainable, but these relations can be effectively computed at runtime, on demand, from the file level relations.
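As a minimal sketch of this maintenance step (our illustration; the Database interface below is hypothetical, not the CodeCompass schema), the three cases reduce to a registration, a purge, or a purge followed by a re-registration:

    // Hedged sketch of the maintenance dispatch in Section 2.3.
    #include <string>

    struct Database {  // hypothetical persistence interface
        void registerFile(const std::string& path) { /* insert AST info */ }
        void purgeFile(const std::string& path)    { /* erase AST info  */ }
    };

    enum class FileState { Added, Deleted, Modified };

    void maintain(Database& db, const std::string& path, FileState state) {
        switch (state) {
        case FileState::Added:
            db.registerFile(path);
            break;
        case FileState::Deleted:
            db.purgeFile(path);
            break;
        case FileState::Modified:
            db.purgeFile(path);     // wipe all stored information first...
            db.registerFile(path);  // ...then re-register as a new file
            break;
        }
    }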
3. EXPERIMENTAL RESULTS
The go-to projects on which CodeCompass is usually tested are the Xerces-C++ [3] and LLVM [2] projects. Both are open source projects that have been under development for several years and are therefore considered legacy projects. Incremental parsing was also tested on these two, as Xerces-C++ is a medium size and LLVM is a large-scale project, and they contain enough files (347 and 2845, respectively) to produce a significant difference in runtime between even small portions of changes in the number of files.

Incremental parsing is aimed at reducing the parsing time of builds, especially nightly builds; therefore it was tested on 1, 5 and 10 percent changes of the file set, since no bigger difference between two builds is presumable. The changeset was generated automatically by random selection of files.¹ Table 1 shows the results for Xerces-C++, while Table 2 and Table 3 depict the results for LLVM. All measurements were carried out on a standard notebook computer, parsing on 2 processor cores.

¹Only leaf nodes from graph G introduced in Section 2.2 were included in the changeset, so header inclusions did not affect the number of changed files.

In order to keep database consistency in case of a graceful abort or unexpected termination of the parser module, the basic concept is that the maintenance operation of incremental parsing must be performed in a transactional mode, in one of the following ways:

- Carry out all deletions from the database in one single transaction, so the maintenance is either completely executed or no changes are performed at all.

- Generate multiple file level transactions, so the information regarding a file is either cleaned from the database or the file is untouched; therefore a consistent state of the database is always kept.

Table 1: Time measures for incrementally parsing the Xerces-C++ project
Parse type    Changed files    Time
Full parse    –                2 min 49 sec
1% change     3                10 sec
5% change     17               21 sec
10% change    35               49 sec

Table 2: Time measures for incrementally parsing the LLVM project by one atomic transaction
Parse type    Changed files    Time
Full parse    –                5 h 46 min
1% change     28               7 min 30 sec
5% change     142              1 h 58 min
10% change    284              2 h 45 min

Table 3: Time measures for incrementally parsing the LLVM project by file level transactions
Parse type    Changed files    Time
1% change     28               9 min 30 sec
5% change     142              49 min
10% change    284              1 h 21 min

Table 2 and Table 3 compare the differences when the database maintenance is executed through a single transaction and through file level transactions. It is clear that the extensive size of the database rollback log, containing all the deletion operations for a larger quantity of files, can significantly hinder the effectiveness of incremental parsing, producing a significant difference in the timespan of incremental parsing for large projects like LLVM. Hence, while a single transaction may provide stronger guarantees, file level transactions proved to be the more adequate solution, where the required time is more or less linear in the quantity of parsed files, depending on the length and content of the files in question.

4. CONCLUSIONS
Incremental parsing was introduced into CodeCompass to reduce the costs of parsing, both in time and computational resources, by omitting unchanged files in the project. The feature distinguishes added, deleted and modified files and handles them accordingly. The early tests of incremental parsing were run on the Xerces-C++ and LLVM projects and showed that it works according to its original purpose, especially in decreasing the timespan of parsing. While the results are promising, further challenges include the improved reduction of the timespan required by incremental parsing through parallelizing the process.

5. ACKNOWLEDGMENTS
This work is supported by the European Union, co-financed by the European Social Fund (EFOP-3.6.3-VEKOP-16-2017-00002).

6. REFERENCES
[1] Clang: a C language family frontend for LLVM. https://clang.llvm.org/.
[2] The LLVM Compiler Infrastructure. https://llvm.org/.
[3] Xerces-C++ XML Parser. https://xerces.apache.org/xerces-c/.
[4] Zapcc – A (Much) Faster C++ Compiler. https://www.zapcc.com/.
[5] T. Brunner and M. Cserép. Rule based graph visualization for software systems. In Proceedings of the 9th International Conference on Applied Informatics, pages 121–130, 2014.
[6] M. Cserép and D. Krupp. Visualization Techniques of Components for Large Legacy C/C++ software. Studia Universitatis Babes-Bolyai, Informatica, 59:59–74, 2014.
[7] M. Karasick. The Architecture of Montana: An Open and Extensible Programming Environment with an Incremental C++ Compiler. SIGSOFT Softw. Eng. Notes, 23(6):131–142, Nov. 1998.
[8] R. Medina-Mora and P. H. Feiler. An incremental programming environment. IEEE Transactions on Software Engineering, (5):472–482, 1981.
[9] Z. Porkoláb, T. Brunner, D. Krupp, and M. Csordás. CodeCompass: An open software comprehension framework for industrial usage. In Proceedings of the 26th Conference on Program Comprehension, ICPC '18, pages 361–369, New York, NY, USA, 2018. ACM.
[10] V. Savitskii and D. Sidorov. Fast analysis of source code in C and C++. Programming and Computer Software, 39(1):49–55, 2013.
[11] R. Szalay, Z. Porkoláb, and D. Krupp. Towards better symbol resolution for C/C++ programs: A cluster-based solution. In IEEE 17th International Working Conference on Source Code Analysis and Manipulation (SCAM), pages 101–110. IEEE, 2017.
[12] T. Tromey. Incremental compilation for GCC. In Proceedings of the GCC Developers' Summit. Citeseer, 2008.
[13] T. A. Wagner. Practical algorithms for incremental software development environments. PhD thesis, Citeseer, 1997.
[14] T. A. Wagner and S. L. Graham. Efficient and flexible incremental parsing. ACM Transactions on Programming Languages and Systems (TOPLAS), 20(5):980–1013, 1998.
Visualising Compiler-generated Special Member Functions of C++ Types

Richárd Szalay, Zoltán Porkoláb
Eötvös Loránd University, Faculty of Informatics
Department of Programming Languages and Compilers
Budapest, Hungary
szalayrichard@inf.elte.hu, gsd@elte.hu

ABSTRACT
In the C++ programming language, special member functions are either user-defined or automatically generated by the compiler. The detailed rules for when and how these methods are generated are complex and often surprise developers. As generated functions never appear in the source code, it is challenging to comprehend them. For a better understanding of the details under the hood, we provide a visualisation method which presents generated special functions in the form of C++ source code that is in effect identical to their implicit versions.

CCS CONCEPTS
• Software and its engineering → Source code generation; Software maintenance tools; • Human-centered computing → Information visualization;

GENERAL TERMS
programming languages, software development, visualisation

KEYWORDS
C++ programming language, compilers, code comprehension, code design

1 MOTIVATION
Languages supporting the object-oriented programming (OOP) paradigm define a central principle of object lifetime, which is surrounded by construction/initialisation and destruction/finalisation. In the Java programming language, apart from the basic default construction – where everything is initialised to the respective zero value – the developer must explicitly state their intent for different construction logic or custom finalisation. A special case is when a new object is created from an already existing one, where deep copy (clone) operations or conversions might be warranted. In C++, however, the Language Standard specifies that these aforementioned actions, in the form of special member functions [8], should have a default implementation automatically generated by the compiler if the user does not explicitly write them. The rules which dictate the conditions for generating the special member functions and their behaviour can appear dauntingly complex, and subsequent versions of the language standard may revise and elaborate these rules, increasing their complexity. The most recent, and most significant, such change came with the release of the C++11 standard, which introduced move semantics [9].

Modernising code initially written for an older standard can be cumbersome, as the behaviour of special members is never directly expressed, yet relied upon by the most trivial code. What is more, the compiler is free to lazily evaluate the generation of these members, which results in such a member's non-availability only being reported when its usage is attempted. In case the used software library is outdated, not easily modifiable, or not open source, this can result in a loss of run-time performance or in development effort wasted on having to redesign parts of the software. For discovery and understanding of the existence and behaviour of these methods, developers can either consult the Language Standard, read Abstract Syntax Trees (ASTs), or view the disassembly of the binary — none of which is favourable for the average developer.

 1  #include <iostream>
 2  struct A { int x; };
 3  int main() {
 4      A a1;       // <- Default constructor called.
 5      a1.x = 5;
 6      A a2(a1);   // <- Copy constructor called.
 7      a1.x = 6;
 8
 9      // Will print "6 5".
10      std::cout << a1.x << " " << a2.x;
11  }

Listing 1: Example code which uses a default and a copy constructor.
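A small example of why these rules surprise developers (our own illustration, not from the paper): providing any user-defined constructor already suppresses the implicit default constructor.

    struct B {
        B(int value) : x(value) {}  // user-provided constructor
        int x;
    };

    int main() {
        B b1(42);   // OK: uses the user-provided constructor
        // B b2;    // error: no default constructor is generated for B
    }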
To aid the ongoing development and code comprehension of projects, we introduced a tool that allows pretty-printing a visual representation of special member functions that is closest to how they would be written by developers. To further this aid, we do not only show the compiler-generated special members, but provide the subset of the type's member functions which shows both the user-written ones – e.g. a constructor that initialises from a different data type – and the standard, implicit ones. We used the open source LLVM/Clang Compiler Infrastructure [16] for parsing and generation.

The rest of the paper is organised as follows. In Section 2 we discuss the purpose and rules of C++ special member functions. Then, Section 3 describes the implementation approach and the challenges faced with respect to pretty-printing and presentation to the developers. The paper concludes in Section 4.

2 C++ SPECIAL MEMBER FUNCTIONS
Special member functions in C++ denote the functions that are necessary for the management of instances' lifetime [12]. These are the constructors, the assignment operators and the destructor.

2.1 Constructors
Constructors are responsible for the initialisation of an object. They are usually executed together with the memory allocation for the instance. Unless the user specifies and provides a constructor function, both C++ and Java will generate a default constructor. In Java, this function initialises every data member to its respective zero value, such as integer 0, rational 0.0, the \0 character, or a null reference. In C++, the initial state of the members depends on the storage scope of the object – in most cases, the memory garbage is retained from the memory block where the object is allocated.
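The difference can be made visible with a short example of ours (not from the paper): the generated default constructor of the class below leaves x indeterminate unless value-initialisation is requested explicitly.

    #include <iostream>

    struct C {
        int x;  // no initialiser, no user-defined constructor
    };

    int main() {
        C garbage;              // default-initialised: garbage.x is indeterminate
        C zeroed{};             // value-initialised: zeroed.x is zero
        std::cout << zeroed.x;  // prints 0; reading garbage.x would be UB
    }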
Unlike Java, however, the default constructor is not created if at least one data member does not have a default constructor.

Another case of construction is when a new object is initialised from the state of another, already living object of the same type. In Java, this functionality can be achieved in multiple ways, one of which is by using the special clone() function. This function is defined in Object and performs a shallow copy of the instance in question, only initialising the new object's members to the same values as those of the cloned one [4, 11]. In case of references to other objects this results in aliasing, the sharing of the same resource – usually an internal buffer – by two separate entities. Another problem with clone() is that the cloneability marker and the respective method must exist through the whole chain of the type hierarchy – this is usually referred to as an epidemic [10]. What is more, cloning does not actually invoke a construction, but rather creates a copy of the memory's snapshot, which means that business logic strictly bound to a constructor, such as the initialisation of read-only members, cannot be performed. In C++, the default behaviour of the copy constructor is to run the copy construction of every data member. For fundamental types, this means a copy of the value; for more complex types, their respective copy constructors are called. Thus, in case a custom resource which can be properly deep-copied is used, the copy constructor that is generated for the object using this resource will be sufficient.

2.2 Destructor
The destructor (or finaliser) is called at the end of an instance's lifetime and is responsible for tearing down the state of the instance. This most commonly means releasing resources, performing clean-up tasks and committing changes, e.g. to a database. In Java, the finalize() method's implementation is run for an object at an unspecified point in time, when the runtime's garbage collector decides that the object is to be reaped [3]. The behavioural differences between Java Virtual Machine versions and the general possibility of finalisation never happening for an instance resulted in a consensus on not using finalize() – it has also been deprecated since Java 9. Instead, the AutoCloseable design pattern is used, which explicitly requires writing a close() method that executes teardown logic but can be called arbitrarily by the developers when teardown is deemed necessary, such as at the end of a database operation. In C++, a destructor can be written by the user or is automatically generated by the compiler. It is always executed immediately when an instance's lifetime ends. The generated destructor does nothing in its body, and then the destructor of each data member is executed individually – as their lifetimes have also expired. Thus an implicit destructor always exists, unless a data member's destructor is explicitly hidden – this is a common practice for scenarios where a controller has to ensure an orderly or batch destruction.

2.3 Assignment operators
Contrary to Java, where there exist only primitive types and references, C++ is a language with value semantics. Assigning to a reference in Java only results in the actual memory modification of a memory address' size. The object that is no longer referred to by the assigned-away reference is then left for garbage collection, if applicable. In C++, however, assigning an object to another object of the same type results in the assigned-to object having a copy of the assigned object's state within its own memory region. Traditionally, copy assignment operators have a "destructor" part, where the current object's resources and buffers are released, and then a "copy constructor"-like logic where the copy of the state takes place; however, the developer is free to choose a different implementation. The compiler-generated copy assignment operator implements a memberwise copy assignment for the entire object. Thus, the copy assignment operator is not generated by the compiler, due to type infeasibility, if one of the data members cannot be copy-assigned.

It is noteworthy that not every language defines the = assignment as an operator: in some languages, such as Ada or Pascal, assignment is defined as a statement/instruction rather than an operator application. This has led to the inability to write copy assignment logic in Ada. To avoid the use of assignment on types that are not designed for memberwise copy, the limited keyword [18] and type annotation is used.

In C++ it is commonly referred to as The Rule of Three that if any of the copy constructor, copy assignment operator and destructor is written explicitly by the developer, all of them should be written explicitly. This rule of thumb is not enforced by compilers but is considered good practice because, as discussed earlier, explicitly specifying either will not stop the compiler from automatically creating the implicit definitions of the other special member functions.
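As a compact illustration of the Rule of Three (our own example, not from the paper), consider a class owning a raw buffer; once the destructor is user-defined, the memberwise copies the compiler would generate become incorrect, so both copy operations are written as well:

    #include <algorithm>
    #include <cstddef>

    class Buffer {
        std::size_t size_;
        int* data_;
    public:
        explicit Buffer(std::size_t n) : size_(n), data_(new int[n]()) {}
        ~Buffer() { delete[] data_; }                      // 1. destructor

        Buffer(const Buffer& rhs)                          // 2. copy constructor
            : size_(rhs.size_), data_(new int[rhs.size_]) {
            std::copy(rhs.data_, rhs.data_ + size_, data_);
        }

        Buffer& operator=(const Buffer& rhs) {             // 3. copy assignment
            if (this != &rhs) {
                Buffer tmp(rhs);                           // copy-and-swap
                std::swap(size_, tmp.size_);
                std::swap(data_, tmp.data_);
            }
            return *this;
        }
    };

Without the user-written copies, the generated memberwise versions would make two Buffer objects share data_, leading to a double delete.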
2.4 Members for move semantics
The release of the C++11 Language Specification introduced move semantics, which allows resources to be directly "stolen" by one variable from another, as opposed to being copy-constructed with the original data's memory then destroyed [13]. This is used heavily with temporary objects which would be destroyed in the next statement. The move special members' default implementation executes a move construction or move assignment of every data member; however, the rules for their existence are more exquisite. Move members are not generated automatically if any explicit destructor, copy or move member exists, and an explicitly defined move member also turns off the automatic generation of the copy members.

Accordingly, the Rule of Three has been extended to also include the two move members, and is referred to as The Rule of Five.

3 IMPLEMENTATION
3.1 Syntax transliteration
We used the open source LLVM/Clang Compiler Infrastructure for parsing and for the generation of special member visualisations, because Clang's object-oriented Abstract Syntax Tree (AST) API allows for an optimised and maintainable application. An example subtree of the AST corresponding to the source code in Listing 1 can be seen in Listing 2. The copy constructor's body corresponds to copying the right-hand record's single data member into the current record's corresponding data member.
CXXConstructorDecl implicit used constexpr A 'void (const struct A &) noexcept' inline
  ParmVarDecl 20f90c0 used 'const struct A &'
  CXXCtorInitializer Field 'x' 'int'
    ImplicitCastExpr 'int' <LValueToRValue>
      MemberExpr 'const int' lvalue .x
        DeclRefExpr 'const struct A' lvalue ParmVar 20f90c0 '' 'const struct A &'
  CompoundStmt
CXXConstructExpr <col:7, col:11> 'struct A' 'void (const struct A &) noexcept'

Listing 2: The Clang AST representation of the implicit copy constructor's body, and the call to it in main().

Other compilers might use different internal representations, on which these transformations would be infeasible to execute – in the case of GNU/GCC, the Register Transfer Language (RTL) is only meant to be used by compiler-internal applications, and code generation is organised into various steps called loops. An example of the same copy construction can be seen in Listing 3, which has already been stripped of semantic information; only the memory access for the data member can be studied from it by humans. It should be noted that the presented representation is the earliest and shortest one where the copy construction is apparent at the inner data member level. Previous transformation loops only show the copy constructor's call source line in its original form, i.e. A a2(a1);.

(insn 7 6 8 2
  (set (mem/c:SI (plus:DI (reg/f:DI 82 virtual-stack-vars)
          (const_int -8 [0xfffffffffffffff8]))
        [1 a2+0 S4 A64])
    (reg:SI 91)) "/tmp/main.cpp":6 -1
  (nil))

Listing 3: The GNU RTL of the copy constructor call in line 6 of Listing 1.

We have utilised Clang's architecture to perform the parsing of the translation unit, and then performed a traversal on the built AST, searching for all records, or for a particular record with a name specified by the user. Once the record is found, we visit every special member's body, and in the case of constructors their initialiser lists [5] too. The AST nodes found in the subtrees of these nodes are then manually converted into a textual, source code representation.

struct A {
    A() { }  // The default constructor.

    // The copy constructor.
    A(const A& rhs) : x(rhs.x) { }
};

Listing 4: The special members of the example class in Listing 1 translated back to source text.

There are three interesting cases that need to be noted, where explicit source code differs from what a compiler generates for itself automatically. First of all, the compiler generates the implicit members' arguments without an argument name. One such example can be seen in Listing 2, where the ParmVarDecl (parameter variable declaration) has no name, and the initialiser's DeclRefExpr (declaration reference expression) only refers to this ParmVarDecl by its memory address, 20f90c0. Such a construct cannot exist in actual source code. As a remedy, we manually assign the name rhs to the variable – or, in case multiple parameters are possible, number them as arg_1, arg_2, . . . – and use it in the pretty-printed code.

Another interesting case concerns move constructors and move assignment operators, namely that the compiler generates the argument as a temporary, an xvalue, from which move operations can be done. However, T&& rhs written in source code specifies a named variable, an lvalue, from whose members a move must explicitly be specified by using the type annotation std::move, which casts the members to xvalues – denoting variables that are essentially transformed into a temporary so that their resources can be moved from. The pretty-printer annotates the right-hand sides of move initialiser or assignment expressions with std::move to ensure the same semantics. We only do this for record types, as no fundamental type supports move operations.
The third case regards inheritance. In case a class has at least one superclass, the special members' default behaviour is to cast the current instance to the base class and call the appropriate constructor or assignment operator for each base class. A core principle in object-oriented programming is that up-casting – a cast to any base class – is always possible and well-defined; however, this would result in unintelligible source code lines, such as *this = rhs; – which would lead to an infinite recursion if written in source code verbatim. The type system allows us to see that this assignment is for the base class, so we explicitly wrap the statement into a cast at the appropriate location to show the base class initialisation to the developer. Examples of these cases are depicted in Figure 1.

We have encountered that the Standard only specifies generating a body for a special member if the currently compiled translation unit ODR-uses [7] the function. While no compiler error is given at compilation for an infeasible, implicitly deleted special member unless it is used, the type system in Clang annotates the forward declaration of the function if it is deleted. Thus, by using this annotation and the related diagnostics, we can, for each member without a body, either achieve an explicit body generation or print the reason behind the member being deleted by the type system, in a single pass. It should be noted that generating the body for members which are allowed to have one – when it was only an optimisation that the generation did not take place – is a non-functional change and does not affect the semantics of the generated code; thus this transformation can safely be integrated into other compilation steps.
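The paper does not include the tool's source. The following hedged C++ sketch of ours shows the shape of such a traversal with Clang's RecursiveASTVisitor, printing the implicitly declared members of each record; the tooling setup (FrontendAction, compilation database) is omitted, and since implicit members are declared lazily, a real implementation may first have to ask Clang's semantic analysis (e.g. Sema's ForceDeclarationOfImplicitMembers) to declare them:

    // Sketch: visit every C++ record definition and pretty-print the
    // members the compiler declared implicitly.
    #include "clang/AST/DeclCXX.h"
    #include "clang/AST/RecursiveASTVisitor.h"
    #include "llvm/Support/raw_ostream.h"

    class SpecialMemberVisitor
        : public clang::RecursiveASTVisitor<SpecialMemberVisitor> {
    public:
      bool VisitCXXRecordDecl(clang::CXXRecordDecl *RD) {
        if (!RD->isThisDeclarationADefinition())
          return true;  // skip forward declarations
        llvm::outs() << "record " << RD->getName() << ":\n";
        for (clang::CXXMethodDecl *M : RD->methods()) {
          // Implicit declarations are exactly the compiler-generated members.
          if (M->isImplicit()) {
            M->print(llvm::outs());  // textual form of the declaration
            llvm::outs() << "\n";
          }
        }
        return true;  // continue the traversal
      }
    };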
Figure 1: Special member overview for a class with two base classes and a single char data member.

3.2 Special member overview
To facilitate better code comprehension, we have decided to show not only the implicit special members but every related overload of constructors and assignment operators. This allows us to show the subset of the class' members which are related to the instance's lifetime.

The full overview proves useful when a special member is defaulted. If, for example, a class contains some constructors and a user-defined copy constructor, the move members will not be generated automatically; however, the developer can explicitly ask the compiler to generate the methods with the implicit body rules by using the = default specifier, available in C++11 and onwards. This is the suggested approach for modern C++, practised by most open-source projects. In this case, we show these members' bodies along with the rest of the class, with the annotation that the user requested the body generation.

Another case for the full view is showing the reason why a special member was not automatically generated, by printing a hint from the semantic analysis' diagnostics.

4 CONCLUSION
In this paper, we have discussed the rules and behaviour of automatically generated special member functions, an intrinsic feature of the C++ programming language. We have introduced an approach to transliterate the compiler's internal representation of these special members to source text, to promote the understanding of software projects without resorting to unfavourable techniques such as reading syntax trees manually.

We have implemented our solution in the open-source code comprehension tool CodeCompass [1, 14, 15] — http://github.com/Ericsson/CodeCompass — as an additional visualisation over C++ files. The upstreaming of this addition is underway at the time of writing this paper.

ACKNOWLEDGMENTS
The work presented in this paper was supported by the European Union, co-financed by the European Social Fund in project EFOP-3.6.3-VEKOP-16-2017-00002.

REFERENCES
[1] CodeCompass. 2012. A software comprehension tool for large-scale software written in C/C++ and Java. http://github.com/Ericsson/CodeCompass
[2] Margaret Ellis. 1990. The Annotated C++ Reference Manual. Addison-Wesley, Reading, Massachusetts, USA.
[3] James Gosling, Bill Joy, Guy L. Steele, Gilad Bracha, Alex Buckley, and Daniel Smith. 2017. Finalization of Class Instances (1st ed.), Chapter 12.6, 389–393. In [4]. https://docs.oracle.com/javase/specs/jls/se9/jls9.pdf visited on 2018-08-13.
[4] James Gosling, Bill Joy, Guy L. Steele, Gilad Bracha, Alex Buckley, and Daniel Smith. 2017. The Java Language Specification, Java SE 9 Edition. https://docs.oracle.com/javase/specs/jls/se9/jls9.pdf visited on 2018-08-13.
[5] ISO. 2012. Initializing bases and members, Chapter 12.6.2, [class.base.init]. In [6]. http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=50372
[6] ISO. 2012. ISO/IEC 14882:2011 Information technology — Programming languages — C++, version 11 (C++11). International Organization for Standardization, Geneva, Switzerland. 1338 (est.) pages. http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=50372
[7] ISO. 2012. One definition rule, Chapter 3.2.3, [basic.def.odr]. In [6]. http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=50372
[8] ISO. 2012. Special member functions, Chapter 12, [special]. In [6]. http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=50372
[9] ISO. 2012. Temporary objects, Chapter 12.2, [class.temporary]. In [6]. http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=50372
[10] Marián Juhás, Zoltán Juhász, Ladislav Samuelis, and Csaba Szabó. 2009. Measuring the complexity of students' assignments. Annales Universitatis Scientiarum Budapestinensis de Rolando Eötvös Nominatae. 31 (2009), 203–215.
[11] Zoltán Juhász, Marián Juhás, Ladislav Samuelis, and Csaba Szabó. 2008. Teaching Java programming using case studies. Teaching Mathematics and Computer Science. 6(2) (2008), 245–256.
[12] Stanley Lippman. 1996. Inside the C++ Object Model. Addison Wesley Longman, Reading, Massachusetts, USA.
[13] Scott Meyers. 2015. Effective Modern C++: 42 specific ways to improve your use of C++11 and C++14. O'Reilly Media, Sebastopol, California, USA.
[14] Zoltán Porkoláb and Tibor Brunner. 2018. The CodeCompass Comprehension Framework. In Proceedings of the 26th Conference on Program Comprehension (ICPC '18). ACM, New York, New York, USA, 393–396. https://doi.org/10.1145/3196321.3196352
[15] Zoltán Porkoláb, Tibor Brunner, Dániel Krupp, and Márton Csordás. 2018. CodeCompass: An Open Software Comprehension Framework for Industrial Usage. In Proceedings of the 26th Conference on Program Comprehension (ICPC '18). ACM, New York, New York, USA, 361–369. https://doi.org/10.1145/3196321.3197546
[16] The LLVM Project. 2003. Clang: C Language Family Frontend for LLVM. http://clang.llvm.org visited on 2018-08-13.
[17] Bjarne Stroustrup. 1994. The design and evolution of C++. Addison-Wesley, Reading, Massachusetts, USA.
[18] S. Tucker Taft, Robert A. Duff, Randall L. Brukardt, and Erhard Ploedereder. 2000. Consolidated Ada Reference Manual: Language and Standard Libraries. Springer-Verlag, Berlin, Heidelberg, Germany.
How Does an Integration with VCS Affect SSQSA?

Bojan Popović
Naovis d.o.o., Bulevar oslobođenja 30A, Novi Sad, Serbia
bojan.popovic@primafin.com

Gordana Rakić
University of Novi Sad, Faculty of Sciences, Trg Dositeja Obradovića 4, Novi Sad, Serbia
goca@dmi.uns.ac.rs

ABSTRACT
Contemporary trends in software development almost necessarily involve a version control system (VCS) for the storage and manipulation of source code and other artifacts. Consequently, tools supporting the development process, such as software analysis tools, integrate with VCS. In most cases tools support only the analysis of the resources in VCS repositories, while some of them rely on VCS to improve the analysis process and results. In this paper we explore how an integration of the SSQSA platform with VCS influences some of its performances.

Categories and Subject Descriptors
D.2.8 [Software Engineering]: Metrics—complexity measures, performance measures

Keywords
Software quality analysis, intermediate representation, Version Control System

1. INTRODUCTION
The quality of a software product is observed through the level of satisfied requirements. It could be assessed by executing the product and applying different techniques of dynamic analysis. These techniques are applicable when the product is ready for testing, which might be too late to recognize weaknesses or issues. On the other side, static analysis techniques traverse the source code and its various intermediate representations, which makes them applicable already in the early phases of the software development process [5].

Contemporary software development practice relies on source code repositories and their synchronization, implemented by various version control systems (VCS). VCS are used to store the whole history of activities in the evolution of a software product, from version information to the finest details about every individual change in the repository, including information about the contributors to the changes.

Consequently, software analysis tools integrate support for VCS. Usually this support means the possibility to analyze code stored in VCS repositories. In some cases tools also rely on the advantages of VCS to improve analysis performance or results.

In this paper we explore the potential advantages of integrating the SSQSA (Set of Software Quality Static Analyzers) platform [9] with Git [2] as a representative VCS. First, we introduce a concise background by describing VCS (Section 2) and SSQSA (Section 3). The prerequisites for the integration and the integration itself are described in Section 4. We discuss the results in Section 5 and possible application models and scenarios in Section 6, which is followed by a comparison to related integration solutions (Section 7). We conclude the paper in Section 8. This paper is a summary of a master thesis described in [8] (in Serbian).

2. VERSION CONTROL SYSTEMS
Version control systems (VCS) can have very broad application in different areas of content manipulation for personal or professional purposes. These are tools used primarily to support teams and individuals in the development and maintenance of software products. These systems remember all the changes of separate files, so that at any time we can recover a specific version, or follow and compare changes over time. In this way, all data is safer, good synchronization between the team members is ensured, the possibilities for errors are significantly reduced, and therefore the project development process is improved.
VCS are divided into two large groups [2]:

CVCS: Centralized Version Control Systems, where all the data are stored on a centralized server. This approach is certainly easier to maintain, but in case of a system failure, all information about the project will be lost. Additionally, the availability of a network connection is very important. Previously, this was the standard way to perform version control. Representatives of this group are CVS (Concurrent Versions System) [4] and Subversion [3].

DVCS: Distributed Version Control Systems, where clients map the whole repository. If a server failure occurs, any of the client repositories can be copied back to the server to restore it. Moreover, the local copy enables us to work on changes independently of a network connection, while the connection is necessary only for saving changes to the remote repository or taking a version from it. Files stored on the hard disk are of small size, and hence this does not pose a problem of storage space.

An additional advantage of DVCS is that we can share changes with other team members before they are shared globally. On the other hand, there is little advantage of centralized systems compared to distributed ones. Centralized systems offer an easier way to control all the people who access the server, as well as the easy provision of a central point where all the changes are in place. They also offer the option of downloading only a piece of code, if we only need to work on one project module. However, if needed, one copy of the project in a DVCS can be announced as the main one, and thus we can simulate a centralized system.

The distractions that can be attributed to distributed systems are more technical. For example, in the case of a project with many large files that cannot be compressed, more storage space is required. Additionally, if we are working on a large project that contains many customized changes, downloading a full version of the project can take longer than expected, and also take up more space on the hard drive than expected.

All the described differences led to the decision to conduct the first experimental integration of the SSQSA platform with a DVCS. Therefore, we compare Git [2] and Mercurial [6], as the main representatives of DVCS, in order to match their properties to our requirements (Table 1). We can conclude that Mercurial has better characteristics from the user's point of view, but for our integration these characteristics have no value. On the other hand, the ease of integration with other systems, the possibility to migrate to another system, and speed are extremely important to us. Therefore, in this work, we integrate SSQSA with Git.

Table 1: Comparison between Git and Mercurial
Property                               Git    Mercurial
Simple GUI                             -      +
Getting started for beginners          -      +
Simplicity of branch visualization     -      +
Speed (Windows OS)                     -      +
Speed online                           +      -
Changing the history                   +      +
Using the index                        +      -
PL independent extensions              +      -
Repo. migration to another system      +      -
3. THE SSQSA PLATFORM
SSQSA (Set of Software Quality Static Analyzers) [9] is a set of tools that enables language-independent static software product analysis based on its source code. Language independence is ensured by a universal intermediate representation of the source code called eCST (enriched Concrete Syntax Tree). Once this representation is produced for any system, written in any set of programming languages, it can be transformed into derived intermediate representations, such as dependency networks at different abstraction levels, or flow graphs. The fact that the derived representations are generated based on eCST, by a unique implementation of the derivation process, ensures their language independence and universality, too.

By traversing all or some of these universal intermediate representations, different analysis algorithms are implemented. Therefore, it is possible to have a single implementation of every functionality that we integrate into SSQSA, which ensures consistency of the results across different languages, but also adaptability to a new language and extendability by a new analysis [9]. The described process and the corresponding platform design are illustrated in Figure 1.

Figure 1: SSQSA platform and its integration with Git.

The current version of the SSQSA platform takes its input source code from a local directory (components colored gray in Figure 1), while our primary goal in this research is to integrate it to analyze code stored in a Git repository. Additionally, we will explore how the usage of a Git repository for storing the intermediate representation affects the SSQSA platform and its performances. This level of the integration will enable us to traverse only the changed fragments of the structures, which might further lead to an improvement of the performance of the analyses. The first prototype includes only the results of the generation of eCST in the repository. The new components that implement the integration are yellow-colored in Figure 1.

4. THE SSQSA AND GIT INTEGRATION
To enable the collaboration of SSQSA with Git, it was necessary to connect eCSTGenerator to the Git repository and to enable it to process the source code stored in it. After the first connection, eCSTGenerator processes the whole content of the repository and generates its eCST representation. Every subsequent time, eCSTGenerator processes only the changed files. This feature was not easily implementable before the integration with Git.

In addition, SSQSA uses the advantages of its integration with Git at one more level. Namely, after the set of eCSTs is generated, it is stored in a Git repository so that other components can also process only the changes between versions. For these purposes we do not use the same repository, as that is a dedicated development repository, and developers do not have to be affected by the analysis.
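The paper does not show how eCSTGenerator queries Git. As a rough illustration of the "process only changed files" step, the following C++ sketch of ours shells out to the git CLI and collects the paths changed between the last two commits:

    // Our illustration (not SSQSA source): list files changed between
    // the last two commits via `git diff --name-status`.
    #include <cstdio>
    #include <iostream>
    #include <string>
    #include <vector>

    std::vector<std::string> changedFiles(const std::string& repoPath) {
        std::string cmd =
            "git -C " + repoPath + " diff --name-status HEAD~1 HEAD";
        std::vector<std::string> files;
        if (FILE* pipe = popen(cmd.c_str(), "r")) {
            char line[4096];
            while (fgets(line, sizeof(line), pipe)) {
                // Each line looks like "M\tpath" (A = added, D = deleted,
                // M = modified); keep the status and path for the caller.
                files.emplace_back(line);
                if (!files.back().empty() && files.back().back() == '\n')
                    files.back().pop_back();  // strip the trailing newline
            }
            pclose(pipe);
        }
        return files;
    }

    int main() {
        for (const auto& f : changedFiles("."))
            std::cout << f << "\n";  // e.g. "M\tsrc/Generator.xyz"
    }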
The fact that 1https://github.com/kusti8/proton-native 60 Figure 1: SSQSA platform and its integration with Git Version from a from a Time for the ally works according to the pull-request model on the local no. of commit local dir Git repo Git connection level. 7. 744 ms 1250 ms 720 ms 14. 812 ms 1270 ms 754 ms The most practical model for implementing a new imple- 34. 1589 ms 1353 ms 739 ms mentation for the use of Git is the pull-request model. A 80. 1601 ms 1520 ms 870 ms project leader can start an eCST generator on a new repos- 126. 1650 ms 1515 ms 780 ms itory commit to analyze the modified file. If a developer wants to create XML trees, it can also launch an eCST gen- Table 2: Comparison between time needed for eCST erator at each commit. The problem can arise if more teams generation proccess from a local directory and from are made and the eCST generation process is lunched then a Git repository. only. In this case it must be adapted to go through all the commits, not only looking at the latest changes. Eventually, if we include functionality for committing of gen- The ”Director and Lieutenant” model is also suitable for new erated eCST to a repository, time needed for whole process implementation. Each sub-project has its own leader who goes over 6000ms. Obviously, in this scenario integration re- can create XML trees. Also, the leader of the repository duces performances of SSQSA. Still, further integration will may generate eCST when joining new changes to a branch utilize benefits of version control to improve generation of of a project (merge). Also, if developers want to generate derived intermediate representations. Finally, it will be in- XML trees, the same rules apply as with the Pull-Request tegrated with the analyzers. It can be expected that, with model. the growth of data that will be saved up in the exchange, traverse and analysis process, the benefits from the integra- The centralized model is the most unpractical model for us- tion will also grow. Therefore, effects of the integration on ing the new implementation. All team members commit other components still have to be explored (Section 8. their changes to a centralized repository, which in this case contains a lot of commits through which traversal should be 6. APPLICATION SCENARIOS conducted. Depending on a scenario, Git has three common application models: a centralized model, a pull-request model, and a Di- 7. RELATED SOLUTIONS rector and Lieutenants model [2]. In a centralized system, all Many tools also support code analysis from various VCS members of the team synchronize their changes in a central such as BCH: Better Code Hub2 and SonarQube3, primar- repository that stores all source code. In the pull-request ily because the repositories have become the standard code model, developers can make changes to his local repository, storage. However, only some tools rely on versions for more and he commits them to his own repository, and can see the advanced analysis. changes that other team members make. In this model one repository is considered the main repository. In order to ac- Lean Language Independent software analyzer (Lisa) is a complish the changes in it, a request is sent to the project software that analyzes the quality of software projects. The leader to pull the changes. The project leader can add devel- main goal of Lisa is to analyze a large number of project re- oper’s repository as a remote repository, locally test changes, visions asynchronously with minimal redundancy. 
The analyses aim to cover as many analyses, and as many programming languages, as possible. These goals are comparable with the goals of SSQSA, as well as with the new implementation presented in this paper. However, Lisa currently supports three programming languages, while the SSQSA framework currently allows us to work with more than ten programming languages. Concerning the subject of this paper, we can note certain differences in the approach to the problem and in the concrete solution implementation. For the needs of the Lisa analyzer, a special interface called SourceAgent has been developed. It supports asynchronous access to the Git repository and to file revisions [1]. On the other hand, SSQSA, with the current implementation, uses all the benefits of Git and of the library for interacting with it: it looks at the differences between the last two commits, reads all the files that have been changed, and generates XML trees for them. Furthermore, Lisa communicates directly with the Git repository by making a local copy of the remote repository on a local hard disk, while our implementation allows reading from a local disk and thus does not require an internet connection. An internet connection is only needed if we want to save the generated XML tree in a remote repository.

Analizo is a solution that analyzes source code written in different programming languages, with an emphasis on C, C++ and Java. The analysis supports reading content from remote repositories for each revision in which the source code of the project has been changed [10] and, unlike SSQSA, which currently allows reading of content only from a Git repository, it allows reading from Git and Subversion repositories, and then generates CSV files.
SSQSA also compares file revisions and decides from which files to create an XML tree. An advantage over Analizo is that we can monitor file versions on a remote repository. Again, the difference is in the number of supported languages: Analizo supports three languages, while SSQSA currently supports more than ten programming languages.

EvoJava is a tool for the static code analysis of input from a Java repository. It uses a VCS to access the code, mines the source repository, and calculates metrics. Unlike the SSQSA platform, EvoJava uses Subversion (SVN) and processes only .java files. The output file is also in XML format, but contains metric results. EvoJava takes the list of code versions that is in the repository and thus creates a model based on the XML-generated files [7]. SSQSA, on the other hand, observes the latest changes that are committed to a remote repository, finds these files in the file system and creates XML files based on them. Later it automatically commits them to a dedicated local or remote repository, where we can track what changes were made during the evolution of our software. We can also note the variety of supported programming languages in SSQSA, while EvoJava only supports the Java programming language.

8. CONCLUSION AND FUTURE WORK
Following the actual trends in software development and software analysis, the SSQSA framework is moving in the direction of integration with VCS. In this paper we compare the characteristics of different VCS and select Git as the first candidate for the integration. Further, we describe the integration of SSQSA with Git and explore the possible benefits of this integration for the performances of the platform.

The integration is developed at two levels. At the first level the platform is connected to the Git repository in order to enable processing of the source code stored in it. At the next level of the integration we use a Git repository to store the XML files containing the eCST intermediate representation of the source code, so that we can always look only at the changes and not traverse all the code, or, more precisely, its eCST representation. This is very important if we have in mind that one input file (compilation unit) is represented by one eCST.

At first look, the results of the integration are not promising. Namely, the Git connection used up the time that we can save by looking only at the changes and not at the whole source code. However, without storing trees in the Git repository we are already saving some processing time. In the case when we store eCST in a Git repository we spend more time, but in future work we will explore whether this cost may be paid off after extending this integration to the generation of derived representations and to the analyzers. For example, the generation of the dependency network currently traverses all the trees, while after the full integration with Git it will also look only at the changes. We have similar expectations for the integration of the analyzers with Git. Therefore, these integration activities will be the subject of future work, as well as the analysis of potential costs and benefits and the selection of the most suitable usage scenarios.

9. REFERENCES
[1] C. V. Alexandru, S. Panichella, and H. C. Gall. Reducing redundancies in multi-revision code analysis. In Software Analysis, Evolution and Reengineering (SANER), 2017 IEEE 24th International Conference on, pages 148–159. IEEE, 2017.
[2] S. Chacon and B. Straub. Pro Git. Apress, 2014.
[3] B. Collins-Sussman, B. W. Fitzpatrick, and C. M. Pilato. Version control with Subversion, 2006. Accessible at URL: http://svnbook.red-bean.com, 2007.
[4] D. Grune et al. Concurrent Versions System, a method for independent cooperation. VU Amsterdam, Subfaculteit Wiskunde en Informatica, 1986.
[5] G. O'Regan. Introduction to software quality. Springer, 2014.
[6] B. O'Sullivan. Mercurial: The Definitive Guide. O'Reilly Media, Inc., 2009.
[7] J. Oosterman, W. Irwin, and N. Churcher. EvoJava: A tool for measuring evolving software. In Proceedings of the Thirty-Fourth Australasian Computer Science Conference – Volume 113, pages 117–126. Australian Computer Society, Inc., 2011.
[8] B. Popović. Integration of a platform for static analysis with a version control system (in Serbian). Master's thesis, Faculty of Sciences, University of Novi Sad, 2018.
[9] G. Rakić. Extendable and adaptable framework for input language independent static analysis. PhD thesis, Faculty of Sciences, University of Novi Sad, 2015.
[10] A. Terceiro, J. Costa, J. Miranda, P. Meirelles, L. R. Rios, L. Almeida, C. Chavez, and F. Kon. Analizo: an extensible multi-language source code analysis and visualization toolkit. In Brazilian Conference on Software: Theory and Practice (Tools Session), 2010.
9. REFERENCES

[1] C. V. Alexandru, S. Panichella, and H. C. Gall. Reducing redundancies in multi-revision code analysis. In Software Analysis, Evolution and Reengineering (SANER), 2017 IEEE 24th International Conference on, pages 148-159. IEEE, 2017.
[2] S. Chacon and B. Straub. Pro Git. Apress, 2014.
[3] B. Collins-Sussman, B. W. Fitzpatrick, and C. M. Pilato. Version Control with Subversion, 2006. Available at http://svnbook.red-bean.com, 2007.
[4] D. Grune et al. Concurrent Versions System, a method for independent cooperation. VU Amsterdam, Subfaculteit Wiskunde en Informatica, 1986.
[5] G. O'Regan. Introduction to Software Quality. Springer, 2014.
[6] B. O'Sullivan. Mercurial: The Definitive Guide. O'Reilly Media, Inc., 2009.
[7] J. Oosterman, W. Irwin, and N. Churcher. EvoJava: a tool for measuring evolving software. In Proceedings of the Thirty-Fourth Australasian Computer Science Conference - Volume 113, pages 117-126. Australian Computer Society, Inc., 2011.
[8] B. Popović. Integration of a platform for static analysis with a version control system (in Serbian). Master's thesis, Faculty of Sciences, University of Novi Sad, 2018.
[9] G. Rakić. Extendable and adaptable framework for input language independent static analysis. PhD thesis, Faculty of Sciences, University of Novi Sad, 2015.
[10] A. Terceiro, J. Costa, J. Miranda, P. Meirelles, L. R. Rios, L. Almeida, C. Chavez, and F. Kon. Analizo: an extensible multi-language source code analysis and visualization toolkit. In Brazilian Conference on Software: Theory and Practice (Tools Session), 2010.

Indeks avtorjev / Author index

Beranič Tina .......... 23
Chuchurski Martin .......... 35
Cserép Máté .......... 51
Fekete Anett .......... 51
Heričko Marjan .......... 19
Heričko Tjaša .......... 31
Kamišalić Aida .......... 19
Karakatič Sašo .......... 27, 31
Kous Katja .......... 23
Kuhar Saša .......... 15
Leppäniemi Jari .......... 7
Orgulan Mojca .......... 35
Pataki Norbert .......... 43, 47
Podgorelec Blaž .......... 39
Podgorelec Vili .......... 27, 31
Polančič Gregor .......... 15
Popović Bojan .......... 59
Porkoláb Zoltán .......... 55
Rajšp Alen .......... 23
Rakić Gordana .......... 59
Rek Patrik .......... 39
Révész Ádám .......... 43
Rola Tadej .......... 35, 39
Rupnik Rok .......... 11
Sillberg Pekka .......... 7
Šimenko Samo .......... 27
Soini Jari .......... 7
Szalay Richárd .......... 55
Tišler Aljaž .......... 35
Török Márk .......... 47
Turkanović Muhamed .......... 19, 35
Unger Tea .......... 35
Vodeb Aljaž .......... 35
Welzer Tatjana .......... 19
Žnidar Žan .......... 35
Konferenca / Conference
Sodelovanje, programska oprema in storitve v informacijski družbi / Collaboration, Software and Services in Information Society
Uredil / Edited by Marjan Heričko