Informatica 36 (2012) 359-368

Usage of Holt-Winters Model and Multilayer Perceptron in Network Traffic Modelling and Anomaly Detection

Maciej Szmit
Orange Labs Poland, 7 Obrzezna Street, 02-691 Warsaw, Poland
E-mail: maciej.szmit@gmail.com, http://maciej.szmit.info

Anna Szmit
Technical University of Lodz, Department of Management, 266 Piotrkowska Street, 90-924 Lodz, Poland
E-mail: agorecka@p.lodz.pl, http://anna.szmit.info

Slawomir Adamus
Technical University of Lodz, Computer Engineering Department, 18/22 Stefanowskiego Street, 90-924 Lodz, Poland
AMG.lab, 11 Lakowa Street, 90-562 Lodz, Poland
E-mail: slawomir.adamus@hotmail.com

Sebastian Bugala
Technical University of Lodz, Computer Engineering Department, 18/22 Stefanowskiego Street, 90-924 Lodz, Poland
E-mail: sebastian.bugala@hotmail.com

Keywords: network behavioral anomaly detection, Holt-Winters model, multilayer perceptron

Received: September 16, 2012

This paper presents the results of an analysis of a few kinds of network traffic using Holt-Winters methods and a Multilayer Perceptron. It also presents Anomaly Detection - a Snort-based network traffic monitoring tool which implements a few models of traffic prediction.

Povzetek (Slovenian summary): A method for modelling and detecting anomalies in a network is presented.

1 Introduction

Modern computer networks and heavily loaded business or industrial systems require continuous availability of services and hosts (see e.g. [28], [29] [30] [34]). Inaccessibility of some mission-critical services can have a large impact on the continuity of business processing and, as a result, generate losses. A solution to such potential problems could be permanent and uninterrupted supervision of network health. This in turn can be achieved by implementing a monitoring solution.
An efficient monitoring method helps achieve high service availability, and it is a good idea to extend network security with tools such as Intrusion Detection Systems (IDS), Intrusion Prevention Systems (IPS) and Unified Threat Managers (see e.g. [32] [33]). An IDS is a tool which monitors and analyses, in real time, every aspect of the inbound and outbound traffic of the network. Based on this analysis and on one of the mechanisms responsible for threat detection, it creates reports of abnormalities in the network traffic. The mechanisms most commonly used in IDS to detect threats are misuse detection and anomaly detection. These are two different approaches to threat detection: the first relies on defining abnormal parameters and network traffic behavior, so everything that is not known is treated as normal; the second is the reverse of the first and treats everything which deviates from the standard as a potential threat. An IDS on its own only reports and logs the abnormalities and does not take any further action. Its role is to report to the administrator, who decides what action should be taken to prevent imminent danger, which can be cumbersome for the administrator when the number of notifications is large. In order to reduce the administrator's workload, the idea of IDS has been extended with the possibility of taking predefined actions immediately when typical, schematic threats to the network are detected. The result is the IPS, a variety of IDS which cooperates with tools such as firewalls and controls their settings in order to counter the threat.

A typical representative of the tools described above is Snort (see e.g. [2] [3] [31]), an IDS/IPS software package based on a mechanism which detects attack signatures. It was originally intended only for the Unix platform but has since been ported to the Windows operating system, and it is developed under the principles of open source software licenses.
Large capacity and performance are the characteristics that made Snort popular among users. Its modular design makes the software very flexible, so it can easily be adapted to the requirements of the currently analyzed network environment and its functionality can be expanded.

This article extends the demonstration of the capabilities of the AnomalyDetection tool (a basic overview of the tool was published in [15] and [36]), created for network monitoring and future network traffic forecasting in Snort-based applications, using the flexibility and easy extensibility of this program (the ability to create one's own preprocessors and postprocessors). The preprocessor was developed to extend Snort's network traffic analysis capabilities with an anomaly detection mechanism [4]. The combination of the two mechanisms (i.e., misuse detection and anomaly detection) provides more comprehensive protection against all types of threats, even those partially abstract, such as the malice of employees. The tools included in Anomaly Detection 3.0 allow analysis of traffic, its forecasting with the help of advanced statistical algorithms, evaluation of the created forecasts, and real-time monitoring which verifies that the individual network traffic parameters do not exceed the forecasted values. When the norms are exceeded, appropriate messages are generated for the administrator, who should check each alarm for potential threats. The current (3.0) version (see e.g.
[5], [6]) of AnomalyDetection monitors the following network traffic parameters:
• total number of TCP, UDP, and ICMP packets,
• number of outgoing TCP, UDP, and ICMP packets,
• number of incoming TCP, UDP, and ICMP packets,
• number of TCP, UDP, and ICMP packets from the current subnet,
• number of TCP packets with SYN/ACK flags,
• number of outgoing and incoming WWW packets - TCP on port 80,
• number of outgoing and incoming DNS packets - UDP on port 53,
• number of ARP-request and ARP-reply packets,
• number of packets from non-TCP/IP stacks,
• total number of packets,
• TCP, WWW, UDP, and DNS upload and download speed [kBps].

The whole Anomaly Detection application consists of three parts: the Snort preprocessor, the Profile Generator and the Profile Evaluator. Data exchange between these parts is realized through CSV (Comma Separated Values) files (see: Figure 1).

Figure 1: Anomaly Detection data flow diagram. Source: [15]. Gray solid arrows mean saving to a file and black dotted arrows mean reading from a file.

The particular files are:
• Log file - this file gathers all network traffic data collected by the AD Snort preprocessor. Data from this file is then used by the Profile Generator for network traffic forecasting.
• Profile file - this file stores the network profile computed by the Profile Generator. It is generated by the Profile Generator and used by the AD preprocessor for detecting anomalies and generating alerts. After every time period has passed, the preprocessor reads the profile file and looks for the data corresponding to the current period. If the value of some counter falls outside the minimum (MIN) to maximum (MAX) range, an alert is generated.
• Predicted pattern file - this file contains predicted future data for the network. It is in fact the same as the profile file, but with a single value for each counter. This is necessary for evaluating the profile in the AD Evaluator script. The structure of the pattern file is the same as that of the log file.
• Pattern file - this file is created like the predicted pattern file, but the network traffic profile stored in it is historical data.
• Parameters file - this file stores information about the method of profile generation and the values of the method parameters. This file has a different structure for every profile generation algorithm.

The structures of the log and profile files can be found in [15].

Anomaly Detection has two main modes:
• data acquisition mode - only network traffic statistics are saved into the log file. Only the log file is created in this mode.
• alerting mode - in addition to data acquisition, a profile file is also created and the current traffic statistics are compared to the values stored in the profile file. In this mode the log and profile files are required.

The pattern, predicted pattern and parameters files are always optional and are useful for future research.

Anomaly Detection 3.0 can be downloaded from http://anomalydetection.info [24]. The preprocessor is available as source or as an RPM package. Both the Profile Generator and the Evaluator are available as R scripts - the free R software (available from CRAN) is additionally required to use them. Installation, update and removal scripts are also provided for the Profile Generator and the Evaluator.

2 Preprocessor

The main part of the Anomaly Detection system is a preprocessor written in the C programming language, designed to enhance Snort's ability to monitor, analyze and detect network traffic anomalies using the NBAD (Network Behavioral Anomaly Detection) approach. The first version of the AnomalyDetection preprocessor [6], for Snort version 2.4x, was published in a Master's Thesis [25] in 2006. The project was then developed further (see e.g. [5] [7] [8] [9] [17]) up to the current version 3.0, designed for Snort 2.9.x.

The main task of the preprocessor is anomaly detection, realized with a simple algorithm based on data acquisition and subsequent comparison of the collected values with a pattern.
The preprocessor reads a predicted pattern of the network traffic (of all parameters) from the 'profile' file and generates an alert when the current value falls outside the 'minimum' to 'maximum' range given in the profile file for the current moment (the moment is given by the day of the week, hour, minute and second corresponding to the intervals from the log file). The profile can be generated 'manually', using external tools, or by the Profile Generator using an appropriate model based on historic values from the log file. The architecture affords easy implementation of different statistical models of the traffic and the use of different tools (e.g. statistical packages) for building profiles. Data from the profile is read at intervals defined by the user. Only one line is read into the structure at a time, which makes it possible to alter the profile file dynamically. If the correct entry cannot be found in the profile, the anomaly report module is automatically disabled to prevent the generation of false positive alerts.

As mentioned above, the current version of the preprocessor can work with adaptive network models thanks to changes in the algorithm which loads profile information. Loading the whole network profile at once was abandoned in favor of loading a single line at each specified time interval. Profile data is loaded at the exact time when the counters are written to the log file. Although this solution increases the number of I/O operations, which adversely affects performance, it also allows the model to be replaced with another one at runtime without restarting the whole application. In addition, all the calculations have been relegated to third-party applications, and the profile has been changed so that it contains the minimum and maximum values.
This approach makes the preprocessor more flexible and efficient and does not limit the user to a single method of generating a network profile: the profile can be generated by any application as long as the appropriate file format is maintained.

Anomaly reporting was adjusted to Snort standards by implementing a mechanism which reports events and handles them with dedicated preprocessor rules. The user can freely adjust the rules to fit their needs, for example the content of the messages stored in the log, the priority, or the action to be taken when a rule matches. These changes make the application more customizable and user-friendly. Improving the packet acquisition algorithm by removing unnecessary comparisons and optimizing the remaining ones, together with the increased capacity of the counters, made it possible to use the preprocessor in high-bandwidth networks of 1 Gb/s and above.

The next function of the preprocessor is generating alerts, which relies on the comparison with the 'minimum' to 'maximum' range from the profile file described above.
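The range check performed by the preprocessor can be illustrated with a short sketch. It is written in Python purely for illustration (the actual preprocessor is written in C), and the column names `day`, `time`, `<counter>_MIN` and `<counter>_MAX` are hypothetical; the real structure of the profile file is given in [15].

```python
import csv
import io


def find_profile_row(profile_csv, day, time):
    """Return the profile row for the given moment, or None if it is missing.

    The 'day' and 'time' column names are assumptions made for this sketch.
    """
    for row in csv.DictReader(io.StringIO(profile_csv)):
        if row["day"] == day and row["time"] == time:
            return row
    return None


def alerts_for(profile_csv, day, time, current):
    """Return the names of counters whose value lies outside [MIN, MAX].

    If no matching profile entry exists, reporting is skipped (an empty
    list is returned), mirroring the rule that the anomaly report module
    is disabled to avoid false positive alerts.
    """
    row = find_profile_row(profile_csv, day, time)
    if row is None:
        return []
    alerts = []
    for name, value in current.items():
        low = float(row[name + "_MIN"])
        high = float(row[name + "_MAX"])
        if value < low or value > high:
            alerts.append(name)
    return alerts


# Hypothetical one-line profile with two counters.
PROFILE = (
    "day,time,TCP_MIN,TCP_MAX,UDP_MIN,UDP_MAX\n"
    "Mon,10:00:00,100,200,10,50\n"
)

if __name__ == "__main__":
    print(alerts_for(PROFILE, "Mon", "10:00:00", {"TCP": 250, "UDP": 20}))
```

Reading one row per interval, as in this sketch, is what allows the profile file to be replaced while the monitor is running.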
3 Profile Generator

In previous versions of the AnomalyDetection system the profile generation module was included in the preprocessor module, which made the whole application inflexible. The current version of the Profile Generator (see e.g. [7] [8] [9]) has been separated into an independent module which can be used to compute statistical models not only for the AD preprocessor. Furthermore, the current version is based on the R language / environment (The R Project for Statistical Computing) (see e.g. [10] [11] [12] [13] [14]), which is more flexible and user-friendly than the previous implementation in the C language. R is a free, open source package for statistical computing and graphics. This implementation uses the optional R packages tseries, quadprog, zoo and getopt.

The implementation of the Profile Generator is divided into a few parts. The first part prepares data from the log file for further calculations, and the other parts, depending on the given parameters, calculate future network traffic forecasts. At the end all computed values are written into the proper files, based on the given runtime parameters. The data flow in the ProfileGenerator module is shown in Figure 2.

Figure 2: Profile Generator data flow diagram. Source: [35].

The Profile Generator is controlled with parameters passed at script execution; all script parameters are handled with the getopt() function. The columns of the specification matrix contain, respectively:
• long flag name
• short flag
• parameters arguments
• arguments type description

The Profile Generator currently implements five methods of profile file generation: moving average, naive method, autoregressive time series model, Holt-Winters model and Brutlag's version of the HW model (see e.g. [1] [17]).
The value of the dependent variable is given as follows:

Moving average:

$$\hat{y}_t = \frac{1}{k}\sum_{i=t-k}^{t-1} y_i \tag{1}$$

Naive method:

$$\hat{y}_t = y_{t-T} \tag{2}$$

where T is the day or week period, or

$$\hat{y}_t = y_{t-1} \tag{3}$$

Autoregressive time series model:

$$\hat{y}_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + \dots + a_k y_{t-k} \tag{4}$$

Holt-Winters model:

$$\hat{y}_t = L_{t-1} + P_{t-1} + S_{t-T} \tag{5}$$

where:
L is the level component given by:

$$L_t = \alpha (y_t - S_{t-T}) + (1 - \alpha)(L_{t-1} + P_{t-1}) \tag{6}$$

P is the trend component given by:

$$P_t = \beta (L_t - L_{t-1}) + (1 - \beta) P_{t-1} \tag{7}$$

S is the seasonal component given by:

$$S_t = \gamma (y_t - L_t) + (1 - \gamma) S_{t-T} \tag{8}$$

Brutlag method:

$$\hat{y}_t^{\max} = L_{t-1} + P_{t-1} + S_{t-T} + m \cdot d_{t-T} \tag{9}$$

$$\hat{y}_t^{\min} = L_{t-1} + P_{t-1} + S_{t-T} - m \cdot d_{t-T} \tag{10}$$

where:
L, P and S are the same as in the Holt-Winters model,
d is the predicted deviation given by:

$$d_t = \gamma \left| y_t - \hat{y}_t \right| + (1 - \gamma) d_{t-T} \tag{11}$$

where:
k is the number of measurements in the time series
t is the moment in time
$\hat{y}_t$ is the predicted value of the variable at moment t
$y_t$ is the real (measured) value of the variable at moment t
T is the time series period
$\alpha$ is the data smoothing factor
$\beta$ is the trend smoothing factor
$\gamma$ is the seasonal change smoothing factor
m is the scaling factor for Brutlag's confidence bands

4 Implementation of Naïve Method

The naïve method is the simplest method implemented in the Profile Generator module. To compute a profile with this method, PG must be launched with the '-m NAIVE' parameter. The additional '--naive' parameter can be used to define the detailed 'periodicity' of the method. The method implements three versions of naïve prediction: LAST, DAILY and WEEKLY. In the LAST version the forecasted data are the same as the previous measurement. The DAILY version means that the predicted values for a given day are the same as the values from the previous day of the time series. In the WEEKLY version the forecasted values are determined from the data logged for the same day of the week in the previous week. Because of the simplicity of this method, it should be used only in adaptive startup mode, which results in fewer false-positive alerts and more dynamic prediction.
In this mode the profile is recalculated at regular intervals, so the predicted values are refreshed with every new period of counter value registration. Figure 3 shows a graph of predicted values with a 5-period recalculation interval. Step changes of the predicted values can be observed in succeeding periods. The Y-axis in Figure 3 shows the minimal and maximal bounds of the permitted values for the total number of TCP packets. The X-axis shows the sample number in the forecasted time series.

Figure 3: Naive method running in adaptive mode with 5 period interval of recalculation. Source: [35].

5 Implementation of Moving Average Method

The moving average method is computed when the Profile Generator is run with the '-m AVG' parameter set. The detailed periodicity of the method and the length of the horizon of values used for the calculation can be defined with the '--avg' parameter. As with the naïve method, there are three versions of periodicity: LAST, DAILY and WEEKLY. A second parameter is also required, which gives the number of values used to compute the moving average. For example, 'DAILY,3' means that values from the three previous days are used to compute the prediction, and 'LAST,5' means that the average is computed from the five previous values registered in the log file.

6 Implementation of Autoregressive Model

The AR model is calculated when the Profile Generator is run with the '-m AR' parameter. Calculations in this method are based on the ar() function from the stats package of the R environment. The ar() function fits an autoregressive time series model to the given data and is a wrapper for the functions ar.yw, ar.burg, ar.ols and ar.mle. The 'method' parameter of ar() defines the method used to fit the model. Four algorithms are available for fitting the model to a given time series: Yule-Walker, Burg, MLE (maximum likelihood) and OLS (ordinary least squares).
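The two simplest predictors from Sections 4 and 5 can be sketched as follows. This is a minimal Python illustration, not the R code of the Profile Generator; the function names are hypothetical, and the treatment of the DAILY and WEEKLY variants (taking values spaced one period apart) is an assumption based on the 'DAILY,3' example above.

```python
def naive_forecast(history, period=1):
    """Naive prediction: the value observed one period earlier.

    period=1 corresponds to the LAST variant (equation (3)); a period
    equal to the number of samples per day or week corresponds to the
    DAILY or WEEKLY variant (equation (2)).
    """
    if len(history) < period:
        raise ValueError("not enough history for the requested period")
    return history[-period]


def moving_average_forecast(history, k, period=1):
    """Mean of k earlier values spaced `period` samples apart.

    With period=1 this is equation (1): the mean of the last k values
    (e.g. 'LAST,5'). With a daily period it averages the values from the
    same time of day on the k previous days (e.g. 'DAILY,3'); this
    reading of the periodic variants is an assumption of the sketch.
    """
    if k < 1 or len(history) < k * period:
        raise ValueError("not enough history for k values")
    values = [history[-i * period] for i in range(1, k + 1)]
    return sum(values) / k


if __name__ == "__main__":
    counts = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy counter values
    print(naive_forecast(counts))                        # LAST
    print(moving_average_forecast(counts, k=5))          # 'LAST,5'
    print(moving_average_forecast(counts, k=3, period=3))
```

In adaptive mode such a forecast would simply be recomputed each time new counter values are appended to the history.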
7 Implementation of Holt-Winters Model

The Holt-Winters model, also called the triple exponential smoothing model, is a well-known adaptive model used for modeling time series characterized by trend and seasonality (see e.g. [20], [19] p. 248, [18], [21], [22]). The model is sometimes used for modeling and prediction of network traffic (see e.g. [23], [7], [8]).

To compute a Holt-Winters model, the Profile Generator must be launched with the parameter '-m HW'. The optional parameter '--hw' can be set to define the model periodicity and the subset of data used to build the model. The implementation of the Holt-Winters prediction method in the Profile Generator is based on the HoltWinters() function from the stats package. The HoltWinters() function requires time series data as an object of class 'ts' (time-series object). The 'ts' object is created as follows:

    ts_obj <- ts(log.data[, column.log],
                 frequency = profile.config.frequency,
                 start = c(as.numeric(log.first.date), log.first.sample.no))

In this implementation the 'ts' function gets 3 parameters:
• data - a numeric vector of the observed time-series values
• frequency - the number of observations per unit of time
• start - the time of the first observation. This parameter can be a single number or a vector of two integers; because of this, in our implementation the human-readable date from the log file is converted into a numeric value, and the second value is the sample number of the first observation in the day.

Next, the HoltWinters() function computes Holt-Winters filtering of the given time series. The function tries to find the optimal values of α, β and γ by minimizing the squared one-step prediction error with the optim() function. Start values for L, P and S are inferred by performing a simple decomposition into trend and seasonal components using moving averages, which is realized with the decompose() function. Figure 4 shows one weekly period (from January 1st to January 7th) of the testing data.

Figure 4: One period of testing data. Source: own research.
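The recursion behind equations (5)-(11) can be illustrated with a small, self-contained Python sketch. It is not the R HoltWinters() implementation: the smoothing parameters are passed in rather than optimized, and the start values come from a crude first-season initialization (an assumption of this sketch) instead of decompose(). The function name is hypothetical.

```python
def holt_winters_brutlag(y, period, alpha, beta, gamma, m=2.0):
    """One-step-ahead additive Holt-Winters forecasts with Brutlag bands.

    Returns three lists (forecast, lower band, upper band) for the
    samples y[period:], following equations (5)-(11).
    """
    if len(y) < 2 * period:
        raise ValueError("need at least two full periods of data")

    # Crude initialization (assumption): level = mean of the first season,
    # zero trend, seasonal terms = deviations from that mean, zero deviation.
    first = y[:period]
    level = sum(first) / period
    trend = 0.0
    season = [value - level for value in first]
    deviation = [0.0] * period

    forecast, lower, upper = [], [], []
    for t in range(period, len(y)):
        phase = t % period
        s_prev = season[phase]      # S_{t-T}
        d_prev = deviation[phase]   # d_{t-T}

        y_hat = level + trend + s_prev                    # equation (5)
        forecast.append(y_hat)
        upper.append(y_hat + m * d_prev)                  # equation (9)
        lower.append(y_hat - m * d_prev)                  # equation (10)

        new_level = alpha * (y[t] - s_prev) + (1 - alpha) * (level + trend)  # (6)
        trend = beta * (new_level - level) + (1 - beta) * trend              # (7)
        season[phase] = gamma * (y[t] - new_level) + (1 - gamma) * s_prev    # (8)
        deviation[phase] = gamma * abs(y[t] - y_hat) + (1 - gamma) * d_prev  # (11)
        level = new_level
    return forecast, lower, upper


if __name__ == "__main__":
    series = [10, 20, 30, 20] * 3  # toy, perfectly periodic traffic counter
    f, lo, hi = holt_winters_brutlag(series, period=4, alpha=0.5, beta=0.1, gamma=0.5)
    print(f)
```

A measured value falling outside the lower/upper band for its time point would be the condition for raising an alert; the scaling factor `m` plays the role of the '--scale' parameter.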
The decompose() function decomposes a time series into seasonal, trend and irregular components using moving averages. For the testing data, decompose() returns values with trend, seasonal and random components. Figure 5 shows the decomposed data.

Figure 5: Decomposed time series (decomposition of additive time series). Source: own research.

The HoltWinters() function estimates the smoothing parameters of the HW model (alpha, beta and gamma); the values obtained for the testing data are given with Figure 6.

Figure 6: Fitted Holt-Winters (fitted values with HoltWinters() function). Alpha=0.8140128; beta=0; gamma=1. Source: own research.

Figure 7 compares the fitted Holt-Winters values to the observed values for the given testing data: the black line stands for the observed data and the gray line for the fitted model (over most of the range the black line covers the gray one).

Figure 7: Holt-Winters fitted to observed comparison. Source: own research.

Once the Holt-Winters model is computed, future predictions can be calculated simply with the predict.HoltWinters() function. In this case the predict() function takes two arguments:
• the HoltWinters object with the fitted model parameters
• the number of future periods to predict

The function returns a time series of the predicted values for the given future periods. For the testing data, the values returned from the predict() function are shown in Figure 8.

Figure 8: Holt-Winters prediction. Source: own research.

8 Brutlag's Algorithm

The Holt-Winters method was used to detect network traffic anomalies as described in the article [1]. In that paper the concept of "confidence bands" was introduced. As described in the article, confidence bands measure the deviation for each time point in the seasonal cycle, and this mechanism is based on the expected seasonal variability. Figure 9 shows the confidence bands computed for the HW time series prediction.

Figure 9: Brutlag's confidence bands. Source: own research.

The confidence band is computed by comparing the last period of collected network traffic values with the fitted Holt-Winters values for the same period. The difference between the real and predicted values is then weighted with the γ estimated by the HoltWinters() function, and the obtained value is finally multiplied by the scaling factor. The confidence band width is controlled with the '--scale' parameter; the example above is computed with a scale parameter value of '2'. Brutlag proposes that sensible values of the '--scale' parameter are between 2 and 3. The particular lines in Figure 9 stand for:
• black - observed values of the time series
• medium gray - prediction of the time series computed with the Holt-Winters model
• light gray - upper bound of Brutlag's confidence band
• gray - lower bound of Brutlag's confidence band

9 Usage of Profile Generator

The Generator can be launched like any script from the CLI (Command Line Interface) of an operating system with the R software and the necessary packages installed. The scripts available at [24] were tested on a few GNU/Linux distributions: Fedora, Oracle Linux, CentOS, Debian, and Ubuntu. Parameters for the Profile Generator script are validated against the following BNF grammar:

    ad_profilegenerator.r <>
    <> ::= -(-log|l) <>
    <> ::= -(-profile|p) <> | <>
    <> ::= -(-evaluator|e) <> | <>
    <> ::= -(-pattern|P) <pattern file path> | <>
    <> ::= -(-save|s) <> | <>
    <> ::= -(-method|m) <>
    <> ::= AVG | NAIVE | AR | HW | BRUTLAG
    <> ::= --avg <> | --naive <> | --hw <> | --brutlag <> | <>
    <> ::= -(-verbose|v) | <>
    <> ::= -(-ahead|a) <>
    <> ::= WEEK | MONTH | <>
    <> ::= -(-scale|d) <>
    <> ::= <><> | <>
    <> ::= (DAILY|WEEKLY),(YW|BURG|MLE|OLE)
    <> ::= (DAILY|WEEKLY)
    <> ::= (DAILY|WEEKLY)
    <> ::= <> | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

The meaning of each parameter is explained under the '--help' parameter. At least one of the <>, <>, <> or <> parameters should be set for running the script to make any sense.
For example, the simplest naïve prediction for real data stored in the 'log.csv' file, with the profile data saved to the 'profile.csv' file, can be launched with:

    ./ad_profilegenerator.r -l log.csv -p profile.csv -m NAIVE --naive LAST

A one-week prediction for the same file, based on the Holt-Winters algorithm with daily periodicity and with 'verbose' mode, can be calculated with:

    ./ad_profilegenerator.r -l log.csv -p profile.csv -m HW --hw DAILY --ahead WEEK -v

10 Evaluator

The Profile Evaluator is the third part of the Anomaly Detection project. This script is designed for fast evaluation of a profile file compared to a log file. The script calculates simple statistics, MAE and M, for two files. The main application of the Evaluator is to check the fit between a pattern and the currently logged values (with the log and pattern files) or between a model and historical data (with the log and predicted pattern files). MAE means Mean Absolute Error and M means Mean.

$$MAE = \frac{1}{n}\sum_{t=1}^{n} \left| y_t - \hat{y}_t \right| = \frac{1}{n}\sum_{t=1}^{n} \left| e_t \right| \tag{12}$$

$$M = \frac{1}{n}\sum_{t=1}^{n} y_t \tag{13}$$

where:
$y_t$ is the real (current) value of the counter at moment t
$\hat{y}_t$ is the predicted (estimated) value of the counter at moment t
$e_t$ is the prediction error at moment t

The calculated values for each counter can be stored in an output file when the '-s' parameter is set. An example comparison of real registered values with their prediction is shown in Figure 10.

Figure 10: Real values compared to AVG - DAILY,3 prediction. Source: [35].

The Profile Evaluator script is launched in the same way as the Profile Generator script.
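The two statistics from equations (12) and (13) are straightforward to compute. The following Python sketch is only an illustration of the formulas (the Evaluator itself is an R script), with hypothetical function names and toy values.

```python
def mean_absolute_error(actual, predicted):
    """MAE from equation (12): mean of |y_t - y_hat_t| over all samples."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_value(actual):
    """M from equation (13): mean of the real counter values."""
    if not actual:
        raise ValueError("series must be non-empty")
    return sum(actual) / len(actual)


if __name__ == "__main__":
    logged = [10, 20, 30]     # toy values from a log file
    pattern = [12, 18, 33]    # toy values from a pattern file
    print(mean_absolute_error(logged, pattern), mean_value(logged))
```

Reporting M next to MAE gives a sense of scale: the same absolute error matters more for a counter whose mean value is small.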
The grammar of the Profile Evaluator script parameters is as follows:

    ad_evaluator.r <>
    <> ::= <> | <>
    <> ::= -(-help|h)
    <> ::= <>
    <> ::= -(-log|l) <>
    <> ::= -(-pattern|p) <pattern file path>
    <> ::= -(-save|s) <> | <>
    <> ::= -(-skip|S) <> | <>
    <> ::= -(-verbose|v) | <>

Evaluation of the pattern stored in the 'pattern.csv' file against the log data stored in the 'log.csv' file can be done with:

    ./ad_evaluator.r -l log.csv -p pattern.csv --verbose

11 Multilayer Perceptron

All our previous models can be classified as statistical models assigned to one of two groups: time series models and descriptive models. The next step is the use of artificial intelligence methods, particularly Artificial Neural Networks (ANN), which in the current state of our research are implemented only as offline models.

Artificial Neural Networks are mathematical models inspired by biological neural networks. An ANN consists of an interconnected group of artificial neurons operating in parallel. The function of an ANN is determined by the weights of the connections between neurons, which usually change during a learning phase. There are many types and architectures of ANN, depending on their purpose. Because of the nature of IDS, there are two main groups of issues: pattern recognition (especially classification) and prediction. These issues correspond to the two main areas of application of ANN. Consequently, ANN can be used for intrusion detection in two main ways: as a classifier which determines whether a given object (for example a network packet, e-mail or network flow) is normal or suspicious, and as a predictor which tries to forecast future values of system parameters (for example network traffic, CPU utilization or number of network connections). There are many publications about the use of different types of ANN for network traffic prediction (see e.g. [22], [23], [24], [25]) or intruder detection (see e.g. [19], [20], [21]).
In our current research we chose the simplest artificial neural network, the Multilayer Perceptron (MLP), for prediction of traffic time series values. An MLP is a network of neurons called perceptrons. The perceptron is a binary classifier which computes a single output from multiple inputs (and the 'bias', a constant term that does not depend on any input value) as a function of their weighted sum. y =