Organizacija, Volume 53, Issue 2, May 2020, Research Papers

Fraud Prevention in the Leasing Industry Using the Kohonen Self-Organising Maps¹

DOI: 10.2478/orga-2020-0009

Mirjana PEJIĆ BACH, Nikola VLAHOVIĆ, Jasmina PIVAR

University of Zagreb, Faculty of Economics & Business, Trg J. F. Kennedy 6, Zagreb, Croatia, mpejic@efzg.hr, nvlahovic@efzg.hr, jpivar@efzg.hr

Background and Purpose: Data mining techniques are used intensively in various industries for the purpose of fraud prevention and detection. Research that focuses on the leasing industry is scarce, although frauds in the field of leasing occur rather often. First, we identify clusters of business clients in one leasing company by using the method of self-organising maps based on leasing contract attributes. Second, we compare clusters based on the presence of fraudulent clients, in order to develop fraudsters’ profiles.

Methodology: For detecting characteristics of fraudulent clients, we use a client database containing leasing contract attributes of one Croatian leasing company. In order to develop profiles of fraudulent clients, we utilise a clustering procedure with the Kohonen Self-Organizing Maps supported by Viscovery SOMine software.

Results: Five clusters were identified and labelled according to the modal values of attributes describing the leasing object and the industry in which the client operates: (i) New cars / Trade; (ii) Used trucks or tugboats / Other services; (iii) New machinery / Construction; (iv) New motors / Trade; and (v) New machinery and tractors / Agriculture.

Conclusion: Self-organising maps have proved to be a useful methodology for developing profiles of fraudulent clients in leasing companies. Companies can use our results and make additional efforts in monitoring clients from the identified industries, buying specific leasing objects.
In addition, companies can apply our methodology to their own databases, in order to develop fraudster profiles for their specific purposes, and implement fraud alert mechanisms in their client database.

Keywords: fraud, leasing, self-organising maps, Viscovery SOMine, Ward algorithm, Croatia, data mining

¹ A preliminary version of this research (http://doi.org/10.23919/MIPRO.2018.8400218) was presented at the 41st International Convention on Information and Communication Technology, Electronics and Microelectronics MIPRO 2018, Opatija, May 21-25, 2018.

Received: July 11, 2019; revised: March 30, 2020; accepted: April 8, 2020

1 Introduction

Knowledge management consists of the processes of creating, storing/retrieving, transferring and applying knowledge (Alavi & Leidner, 2001). The process of knowledge discovery is an important subprocess in knowledge management (Wang & Wang, 2008). Some of the tasks solved by data mining are clustering and deviation detection (Folorunso & Ogunde, 2005), which also includes fraud detection. Numerous other applications also focus on rare events, such as bankruptcy (e.g. Moradi, Salehi, Ghorgani & Yazdi, 2013). In this paper, the focus is on fraud in the leasing industry.

Frauds represent an issue for leasing companies and regulators, which should be able to predict fraudulent behaviour and take different actions to prevent losses caused by fraud. Defence against frauds includes the implementation of operational and technical solutions for fraud prevention and detection. Fraud detection systems are based on data mining techniques and methods that can discover and visualise patterns related to fraudulent behaviour, such as financial frauds (Sadgali, Sael, & Benabbou, 2019), credit card frauds (Carcillo et al., 2019), and frauds in the insurance sector (Leite, Gschwandtner, Miksch, Gstrein, & Kuntner, 2018).
Cluster analyses and profiling of clients based on various behavioural, demographic and operational attributes contained in client databases are essential tools in analysing transactions and recognising client profiles, and they have been used in various industries, such as banking (e.g. Pejić Bach, Juković, Dumičić, & Šarlija, 2014). Client profiling based on cluster analysis has also been used in various studies and has proved to be a useful tool in predicting fraudulent behaviour, which can help companies to develop appropriate fraud detection and response systems, e.g. a financial statement fraud detection system (Chen, Liou, Chen & Wu, 2019).

Current research on fraud detection and prevention in the leasing industry is scarce (Singleton & Singleton, 2007), with only a few examples that present the utilization of data mining techniques for that purpose. For example, Horvat, Pejić Bach and Merkač Skok (2014) used decision tree modelling in order to discover fraud in leasing agreements. Self-organising maps have been used efficiently to explain fraudulent behaviour in different contexts of the financial industry, including banking (e.g. Merkevicius, Garšva, & Simutis, 2004; Balasupramanian, Ephrem, & Al-Barwani, 2017) and insurance (e.g. Hainaut, 2019). However, to the best of our knowledge, previous works did not utilise self-organising maps for fraud profiling in leasing, although self-organising maps have previously been deployed effectively for fraud prevention and detection (Jian, Ruicheng, & Rongrong, 2016). The research question that emerges is whether self-organising maps are an appropriate method for identifying and describing clusters of clients in the context of the leasing industry, with the specific goal of detecting attributes that could explain fraud in the leasing industry.
In order to shed some light on this issue, we develop a methodology for building fraudster profiles using self-organising maps, based on the leasing contract attributes. We use the database of one leasing company, with rich data on client characteristics and behaviour, for the identification of fraudulent behaviour. First, we use self-organising maps in order to develop clusters of business clients in a leasing company based on leasing contract attributes. Second, we identify the characteristics of fraudulent clients among cluster members.

The paper is structured as follows. After the introduction, the literature review section describes frauds in the leasing industry and gives an overview of previous research related to fraud modelling. The third section explains the methodology of the research, including the self-organising maps, the sample description, and the statistical analysis. The fourth section provides results of the clustering procedure and the fraud analysis according to client and leasing characteristics. It also contains the interpretation of the clusters and profiles of fraudsters for each of the clusters based on all the attributes used for the analysis. The last section is the discussion and conclusion section, which provides a response to the research question and describes the contributions of this research.

2 Literature review

2.1 Fraud in the leasing industry

Fraud causes material and immaterial losses to an organisation or a person. According to the Basel Committee (Basel Committee on Banking Supervision, 2002), frauds are loss events that are classified into internal and external frauds. Internal frauds are “losses due to acts of a type intended to defraud, misappropriate property or circumvent regulations, the law or company policy, excluding diversity and discrimination events, which involves at least one internal party” (Basel Committee on Banking Supervision, 2002, p.3), such as accounting administrators.
External frauds are “losses due to acts of a type intended to defraud, misappropriate property or circumvent the law, by a third party” (Basel Committee on Banking Supervision, 2002, p.3), such as clients or partners. Fraud is often both internal and external.

The European Commission (2011, p.3) defines a lease as “an agreement whereby the lessor conveys to the lessee in return for a payment or series of payments the right to use an asset for an agreed period”. In order to understand the concept of fraud in leasing, it is necessary to understand ownership rights in the context of the leasing contracts. During different stages of the leasing contract, difficulties in executing ownership rights can occur. Such difficulties can be the result of the complex leasing law framework (Flath, 1980). However, fraud in leasing, as in other financial industries, is often intentionally conducted by the client. In that case, leasing companies are usually not able to reach a client or locate a leasing object. Fraud also happens, for example, when a client refuses to return a leasing object after a lease expires. In such a scenario, the leasing company can contact the client and knows the location of the leasing object, but regaining or repurchasing the leasing object is not possible without a complex legal procedure.

This research focuses on frauds and defaults committed by clients (small and medium companies, and sole proprietorships) in the leasing industry. Defending leasing companies against leasing fraud brings challenging issues both operationally and technically. An efficient fraud defence system in the field of leasing has several prerequisites. A leasing organisation needs to create anti-fraud measures and introduce them to its employees, as well as to keep employees aware of the fact that frauds are a part of the leasing industry (Boobyer, 2003).
Cross-departmental cooperation and communication, especially among the sales, human resources, and accounting departments, as well as cooperation with external experts, are also needed. Additionally, an organisation should establish client verification procedures (Wang, Cheng, & Chen, 2019). In leasing, such procedures cover the verification of leasing objects, of the client's economic activity, of payments and so on. Upgrading information systems with data analytics and warning systems that support decisions in relation to potentially fraudulent clients is crucial as well (Bănărescu, 2015).

2.2 Fraud modelling

Fraud and default modelling are based on various data mining methods. Ngai, Hu, Wong, Chen and Sun (2011) reviewed data mining techniques for the detection of financial fraud. They concluded that logistic models, neural networks, decision trees, and the Bayesian belief network are the primary data mining techniques for financial fraud detection. Sadgali, Sael and Benabbou (2019) reviewed the performance of various machine-learning techniques such as classification, clustering, and regression for fraud prevention and detection. In addition, visual analysis techniques are used to identify fraud. Suspicious events can be detected by using visual analytics techniques, as shown by Leite, Gschwandtner, Miksch, Gstrein and Kuntner (2018), who categorised, described and discussed current visualisation, interaction and analytical methods that can be used in fraud detection systems. Chen, Liou, Chen and Wu (2019) proposed an approach for detecting fraud in the financial statements of business groups by using data mining techniques.
However, current research does not conclude which method performs best in fraud prevention and detection, although several authors identified neural networks and clustering as the most efficient. Deep convolutional neural networks (DCNN) were used to detect fraudsters in customer records of a mobile communication company (Chouiekh & Haj, 2018). The authors stated that DCNN outperforms support vector machines, random forest and a gradient boosting classifier in terms of accuracy and training duration.

Data mining methods have been implemented in various application areas related to fraud. Rousseeuw, Perorotta, Riani and Hubert (2019) drew on the idea of the Fast LTS (least trimmed squares) algorithm for robust regression to detect unexpected events in time series. These unexpected events are often outliers and shifts that can represent suspicious transactions. An intuitionistic fuzzy set, one of the classification methods, and evidential reasoning were proposed for fraud detection in banking transactions by Eshghi and Kargari (2019), who modelled transactional behaviour by considering the trends of different variables. The method determines the originality of a newly arrived transaction.

Credit card fraud has been researched by several authors. Lucas et al. (2020) used a hidden Markov model and a random forest classifier for credit card fraud detection. The hidden Markov model was used to associate a likelihood with a transaction given its sequence of previous transactions. The likelihoods are then used by a random forest classifier for fraud detection. Ryman-Tubb, Krause and Garn (2018) presented a survey of methods that use AI and machine learning for credit card fraud detection, with the conclusion that, in terms of accuracy, neural networks were on average better than other techniques.
West and Bhattacharya (2016) analysed issues of credit card fraud mining related to the choice of detection techniques, problem representation, feature and performance analysis. Nami and Shajari (2018) proposed a two-stage method of detecting fraudulent payment card transactions. The method is based on k-nearest neighbours, the dynamic random forest algorithm and the minimum risk model. Patil, Nemade and Soni (2018) used the big data analysis framework and machine learning algorithms for real credit card fraud detection. Deployment of a fraud detection system based on machine learning methods in a large e-tail merchant was explored and described by Carneiro, Figueira and Costa (2017). Ensemble learning is a common method used in various practical problems. Zareapoor and Shamsolmoali (2015) evaluated and compared various data mining techniques for credit card fraud detection. They presented the decision tree based bagging classifier as the best classifier to construct the fraud detection model. Deep learning neural networks, Generative Adversarial Networks, were used to improve the effectiveness of classifiers for credit card fraud detection by Fiore et al. (2019). Tu, He, Shang, Zgou and Li (2019) proposed convolutional neural networks for the enhancement of anti-fraud systems in the area of e-commerce payments.

Several pieces of research have been conducted in the area of insurance. Yan, Li, Liu, and Qi (2020) used an adaptive genetic algorithm with a backpropagation neural network for simulation and prediction of frauds in the automobile insurance claim data. An Artificial Bee Colony algorithm-based Kernel Ridge Regression was proposed for automobile insurance fraud detection by Yan et al. (2019). An Artificial Bee Colony was used for global optimization and to optimize the parameter combination of the Kernel Ridge Regression. Wang and Xu (2018) proposed a deep learning model for automobile insurance fraud detection based on text mining.
They used the Latent Dirichlet Allocation-based text analytics to extract text features of the descriptions of the accidents in the claims. Deep neural networks are used for detecting fraudulent claims. Neural networks were used to detect fraud in the automobile insurance industry, with the aim of fraud detection when it comes to personal injury claims (Viaene, Dedene, & Derrig, 2005). Machado and Santos (2015) used five strategies for auditing vehicle claims and concluded that neural networks perform the best. Šubelj, Furlan, and Baje (2011) proposed an expert system for the detection of groups of automobile insurance fraudsters by using an Iterative Assessment Algorithm (IAA). Patel and Singh (2013) used genetic algorithms to detect fraudulent activities in credit card transactions. Fuzzy C-Means clustering and supervised classifiers comprise the novel hybrid approach that was proposed for detecting fraud in an automobile insurance dataset (Subudhi & Panigrahi, 2017). Nian, Zhang, Tayal, Coleman and Li (2016) proposed a spectral ranking method for automobile insurance fraud detection, while Caldeira, Gassenferth, Machado and Santos (2015) used neural networks for the same purpose.

Additionally, neural networks were used to detect fraud in the context of bank direct marketing (Zakaryazad & Duman, 2016) and card payments and operations (Dorronsoro, Ginel, Sánchez, & Cruz, 1997). Recurrent neural networks were used for the detection of stock price manipulation activities by Wang, Xu, Huang, and Yang (2019). The authors concluded that the method could be used to identify unusual trading activities among huge amounts of data.

2.3 Kohonen self-organising maps in fraud research

Self-organising maps (SOMs), also called Kohonen maps or Kohonen neural networks, are feed-forward neural networks based on unsupervised learning and a clustering algorithm that produces two-dimensional, nonlinear mappings of multidimensional data (Urueña López et al., 2019). SOMs are widely used for research in different contexts of the financial industry, including banking, insurance and so on (Van Hulle, 2012).

Pejić Bach, Juković, Dumičić and Šarlija (2014) identified three clusters by using self-organising maps for business clients’ segmentation in the context of the Croatian banking industry, and suggested marketing activities for the identified clusters. Holmbom, Eklund and Back (2011) described how self-organising maps could be used for customer portfolio analysis. Merkevicius, Garšva, and Simutis (2004) explored the usage of self-organising maps for forecasting of credit classes.

Only a few researchers have investigated the usage of SOMs in fraud. Urueña López et al. (2019) used self-organising maps for finding hidden relationships in data about fraud on the Internet, computer users’ behaviour, as well as security incidents. Balasupramanian, Ephrem and Al-Barwani (2017) proposed an architectural framework that uses big data analytics and self-organising maps to handle card fraud effectively. Olszewski (2014) presented how self-organising maps can be used for visualisation of user profiles and comparison of frauds in credit card transactions, telecommunications, and networks. Almendra and Enachescu (2013) presented an algorithm that combines the self-organising map with the supervised learning paradigm with labelled data in the context of online auction sites. Quah and Sriganesh (2008) described a real-time fraud detection approach aimed at a better understanding of fraudulent spending patterns based on self-organising maps.
Zaslavsky and Strizhak (2006) derived the model of a typical cardholder’s behaviour and analysed suspicious transactions by using self-organising maps. Brockett, Xia, and Derrig (1998) classified suspicious automobile bodily injury claims by using self-organising maps.

Data mining has been extensively used in fraud detection and prevention, with various areas of application, such as credit card fraud and insurance fraud. Several researchers indicated that neural networks outperform other methods for fraud prevention and detection. To the best of our knowledge, no research presents the application of data mining in fraud prevention and detection in the leasing industry.

3 Methods

3.1 Self-organising maps (SOMs)

The goal of using SOMs is to discover similarities among elements in the set of instances and to organise the neurons in the computational layer into clusters associated with patterns in the set of instances. Therefore, SOMs are visual representations of learned structures that appear as clusters of similar objects.

The basic SOM algorithm can be described as follows (Bação, Lobo, & Painho, 2005). The neighbourhood function is a function that decreases with the distance to the winning node and is responsible for the interactions among nodes. During training, the radius of this function decreases, so each node becomes more isolated from the effects of its neighbours. The winning node changes its weight vector to become more similar to the input vector. All neighbours of the winning node also change their weights in the direction of the input vector. Thus, the weight vectors of neighbouring nodes become similar because of their convergence with the winning node towards the input data vector.
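The update rule described above can be illustrated with a minimal sketch in plain Python. It is not the implementation used by Viscovery SOMine; the function names (`som_update`, `train_som`), the Gaussian neighbourhood, the linear decay schedules and the random initialisation are assumptions chosen for illustration only.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_matching_unit(weights, x):
    """Grid position of the winning node, i.e. the weight vector closest to x."""
    return min(weights, key=lambda node: sq_dist(weights[node], x))


def som_update(weights, x, lr, radius):
    """One training step: pull the winner and its grid neighbours towards x."""
    winner = best_matching_unit(weights, x)
    for node, w in weights.items():
        grid_d2 = (node[0] - winner[0]) ** 2 + (node[1] - winner[1]) ** 2
        # Assumed Gaussian neighbourhood: 1 at the winner, decreasing with
        # the grid distance to the winner.
        h = math.exp(-grid_d2 / (2.0 * radius ** 2))
        for k in range(len(x)):
            w[k] += lr * h * (x[k] - w[k])
    return winner


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            frac = step / total
            # Learning rate and neighbourhood radius both shrink during
            # training, so nodes become more isolated from their neighbours.
            lr = lr0 * (1.0 - frac)
            radius = max(radius0 * (1.0 - frac), 0.5)
            som_update(weights, x, lr, radius)
            step += 1
    return weights
```

In this sketch, calling `train_som(records, 2, 2)` on a list of numeric, comparably scaled attribute vectors returns four weight vectors, and `best_matching_unit` assigns each record to a node of the map.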
The corresponding error function E(w), with an expectation value converging to a minimum during the training process (distortion measure), is:

$$E = \int \sum_i h_{ci}\, \lVert w_i - x \rVert\, g(x)\, d^n x, \qquad (1)$$

where $h_{ci}$ is the neighbouring function of node $i$ to the corresponding winner $c(x)$, and $g(x)$ is the density function of the vectors $x$ in the $n$-dimensional data space. The Kohonen net is obtained in a discrete data space by computing the optimal weight vectors for minimizing E(w) using a gradient descent (Viscovery, 2019).

In addition, SOMs can be seen as a form of k-means clustering in which every unit corresponds to a “cluster”, and the number of clusters is defined by the size of the grid (Wehrens & Buydens, 2007). In comparison to k-means clustering, Kohonen’s self-organizing maps were more accurate in classifying most of the objects when the number of clusters is lower than eight (Abbas, 2008). Bação, Lobo, and Painho (2005) also proposed the use of SOMs as a possible substitution for k-means clustering. They concluded that during the search, space is better explored by the SOM, and by the end of the search process, the SOM is the same as k-means, which allows for a minimization of the distances between the nodes and the winning node. The main reason for the usage of SOMs in this research is that the k-means clustering algorithm is mainly used for minimizing the sum of squared distances between the input and the prototype vectors, but it does not perform topological mapping like Kohonen self-organizing maps do (Van Laerhoven, 2001).

SOMs are used in state-of-the-art software. Viscovery SOMine is specialised software that enables clustering by using two algorithms based on the classical hierarchical agglomerative cluster method of Ward (Viscovery, 2019). The first algorithm is based on the Ward method, which uses the variance criterion as a distance measure.
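As an illustration of the variance criterion, the following sketch computes the textbook Ward distance between two clusters, that is, the increase in the within-cluster sum of squares caused by merging them. It is a generic formulation and not necessarily the exact computation performed inside Viscovery SOMine; the function names and the toy data are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def sum_of_squares(points):
    """Within-cluster sum of squared distances to the cluster centroid."""
    c = centroid(points)
    return sum((p[k] - c[k]) ** 2 for p in points for k in range(len(c)))


def ward_distance(cluster_a, cluster_b):
    """Increase in total within-cluster sum of squares if a and b are merged."""
    n_a, n_b = len(cluster_a), len(cluster_b)
    c_a, c_b = centroid(cluster_a), centroid(cluster_b)
    centre_gap = sum((u - v) ** 2 for u, v in zip(c_a, c_b))
    return (n_a * n_b) / (n_a + n_b) * centre_gap


# Toy example with made-up two-dimensional records.
a = [[0.0, 0.0], [2.0, 0.0]]
b = [[1.0, 3.0]]
cost = ward_distance(a, b)
```

An agglomerative procedure of this kind repeatedly merges the pair of clusters with the smallest such cost, which keeps the growth in within-cluster variance as small as possible at each step.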
The second algorithm is the SOM-Ward algorithm, based on the modified Ward method. It is developed on the basis of the soft computing paradigm. In this method, the topological neighbourhood influences the cluster merge steps (Viscovery, 2019). The nodes with many corresponding data records have a higher impact in comparison with the nodes with fewer matching records (Viscovery, 2019).

As a distance measure, a modified Ward distance is used. This distance takes into account the topological locations of the clusters, which means that two clusters that are not neighbouring in the SOM are never considered for merging (Viscovery, 2019): (2)

Then, the SOM-Ward distance is normalized with an exponential function (Viscovery, 2009):

$$\mu(c) = d(c)\, c^{\beta}, \qquad (3)$$

where $d(c)$ indicates the SOM-Ward distance used to merge $c$ clusters into $c-1$ clusters and $\beta$ is a linear regression coefficient (3≤c