Informatica 37 (2013) 93-105

Network Anomaly Identification using Supervised Classifier

Prasanta Gogoi, B. Borah and D. K. Bhattacharyya
Department of Computer Science and Engineering, Tezpur University, Napam, Tezpur-784028, India
{prasant, bgb, dkb}@tezu.ernet.in and www.tezu.ernet.in

Keywords: supervised, incremental, clustering, anomaly, DoS, classification, PCC

Received: July 31, 2012

In this paper we present a clustering based classification method and apply it to network anomaly detection. A set of labeled training data consisting of normal and attack instances is divided into clusters, each represented by a profile consisting of attribute-value pairs for a selected subset of attributes. Each category of attack and normal instances is broken down into a set of clusters using a training algorithm based on a supervised classification algorithm. The cluster profiles, together with their class labels, form rules for labeling unseen test instances. Methods for clustering, training and prediction are provided. The proposed method is evaluated using real life TUIDS intrusion datasets. Evaluation results on the KDD 1999 datasets show good performance in comparison to results produced by decision tree counterparts. The method presented can be utilized for classification tasks in other domains as well.

Povzetek (Slovenian abstract, translated): A method for identifying anomalies in a network is presented.

1 Introduction

A network intrusion can be any exploit of a network that compromises its stability or the security of information stored on computers connected to it. An intrusion detection system gathers relevant data from computers or the network and analyzes them for signs of intrusion. Different methods, such as statistical analysis, pattern matching, machine learning and data mining, are used for intrusion detection. Some approaches are online, that is, they detect attacks in progress in real time, while offline approaches such as data mining provide after-the-fact clues about attacks to help reduce the possibility of future attacks of the same type. In general there are two types of approaches [1] to network intrusion detection: misuse detection and anomaly detection. Misuse detection searches for specific signatures to match, signaling previously known attacks without generating a large number of false alarms. Such methods fail to detect new types of attacks, as their signatures are not known. Anomaly detection builds models of normal behavior, and significant deviations from these models are flagged as attacks. Supervised or unsupervised learning can be used for anomaly detection. In a supervised approach, the model is developed from labeled training data. Unsupervised approaches work without any training data, or they may be trained with unlabeled data. The main advantage of anomaly detection is that it can detect previously unknown attacks if their behavior is significantly different from what is considered normal; however, false alarm rates tend to be higher for anomaly detection methods. Data mining methods, including classification, association analysis, clustering and outlier detection, are being used in network intrusion detection. Anomaly detection often tries to cluster test datasets into groups of similar instances, which may be either attacks or normal data. The intrusion detection problem is then reduced to the problem of labeling the clusters as intrusive or normal traffic. In unsupervised anomaly detection, a model based on clusters of data is trained using unlabeled data, normal as well as attack.
Supervised anomaly detection methods, such as classification algorithms, need to be presented with both normal and known attack data for training. In this paper we present a supervised classification method and use it for network anomaly detection. A labeled training dataset is broken down into clusters belonging to the normal and attack categories, and the clusters are represented by their profiles, which together with the category labels form the classification rules. During detection, the cluster profiles collect similar test instances, which are then labeled as belonging to the same category as the profile.

The rest of the paper is organized as follows. In Section 2, related work on supervised anomaly detection methods is presented. The proposed classification algorithm, including a training method and a prediction method, is presented in Section 3. The application of the method to network anomaly detection is described in Section 4. Experimental results for the TUIDS intrusion, KDD 1999 and NSL-KDD datasets are reported in Section 5. Finally, Section 6 concludes and suggests a few possible future extensions.

2 Related work

Classification is an important supervised learning method that has been applied to anomaly detection. Lee and Stolfo [1] proposed a systematic framework employing data mining methods for intrusion detection. The framework consists of classification, association rule and frequent episode algorithms that can be used to construct detection models. The authors of [2] presented PNrule for the multi-class classification problem. The key idea of PNrule is to learn a rule-based model in two stages: first find P-rules to predict the presence of a class, and then find N-rules to predict the absence of a class. The scoring mechanism used in PNrule allows one to selectively tune the effect of each N-rule on a given P-rule. ADWICE [3] uses an extended BIRCH [4] clustering algorithm to implement a fast, scalable and adaptive anomaly detection scheme; clustering is applied to train the normality model. Several soft computing paradigms, viz., fuzzy rule-based classifiers, support vector machines, linear genetic programming and an ensemble method for modeling fast and efficient intrusion detection systems, were investigated in [5]. The empirical results clearly show that a soft computing approach can play a major role in intrusion detection. Yang et al. [6] present an anomaly detection approach based on clustering and classification for intrusion detection. They perform clustering to group training data points into clusters, from which they select some clusters as normal and create known-attack profiles according to certain criteria. The training data excluded from the profiles are used to build a specific classifier. During the testing stage, they use an influence-based classification algorithm to classify network behaviors. A comparative study of several supervised probabilistic and predictive machine learning methods for intrusion detection is reported in [7]. Two probabilistic methods, Naive Bayes and Gaussian, and two predictive methods, Decision Trees [8] and Random Forests [9], are used, and the ability of each method to detect four attack categories (DoS, Probe, R2L and U2R) is compared. A new supervised intrusion detection method based on the TCM-KNN (Transductive Confidence Machine for K-Nearest Neighbors) algorithm has been presented in [10].
This algorithm works well even if sufficient attack data are not available for training. A review of the most well known anomaly-based intrusion detection methods is provided by [11, 12, 13]. These reviews also present available platforms, systems under development and research projects in the area, and outline the main challenges to be dealt with for wide scale deployment of anomaly-based intrusion detectors, with special emphasis on assessment issues. Beghdad [14] presents a study of the use of some supervised learning methods to predict intrusions. The performances of six machine learning algorithms, namely C4.5, ID3, Classification and Regression Trees (CART), Multinomial Logistic Regression (MLR), Bayesian Networks (BN) and the CN2 rule-based algorithm, are investigated, with the KDD 1999 datasets used for evaluation. An SVM-based intrusion detection system is found in [15]; it combines a hierarchical clustering algorithm (BIRCH [4]), a simple feature selection procedure and the SVM (support vector machine) method. This method is also evaluated on the KDD 1999 datasets. For these evaluations, three cases were considered: the whole-attacks case, the five behavior classes case and the two behavior classes case, and the performances were compared.

3 Proposed supervised method

The proposed supervised method uses a training algorithm to create a set of representative clusters from the available labeled training objects. Unlabeled test objects are then inserted into these representative clusters based on similarity calculations, and thus receive the labels of the clusters into which they are inserted. We first present the basics of the clustering algorithm, and then present the training and prediction algorithms.

3.1 Background

The dataset to be clustered contains n objects, each described by d attributes A_1, A_2, ..., A_d having finite, discrete valued domains D_1, D_2, ..., D_d, respectively. A data object can be represented as X = {x_1, x_2, ..., x_d}. The j-th component of object X is x_j, and it takes one of the possible values defined in the domain D_j of attribute A_j. Referring to each object by its serial number, the dataset can be represented by the set N = {1, 2, ..., n}. Similarly, the attributes are represented by the set M = {1, 2, ..., d}.

3.1.1 Similarity function between two objects

The similarity between two data objects X and Y is the sum of the per-attribute similarities over all the attributes:

    sim(X, Y) = \sum_{j=1}^{d} s(x_j, y_j),    (1)

where s(x_j, y_j) is the similarity of the j-th attribute, defined as

    s(x_j, y_j) = 1 if |x_j - y_j| <= \delta_j, and 0 otherwise,    (2)

where \delta_j is the similarity threshold for the j-th attribute. For categorical attributes \delta_j = 0, and for numeric attributes \delta_j > 0.
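As a concrete illustration of Equations (1) and (2), the per-attribute match can be coded directly. The sketch below is our own, not the authors' implementation; it assumes integer-coded attribute values (as produced by the preprocessing described later in Section 4.2) and a per-attribute threshold vector delta.

    #include <cstdlib>
    #include <vector>

    // Eq. (1)-(2): count the attributes on which objects X and Y differ by at
    // most the per-attribute threshold delta[j] (0 for categorical attributes,
    // > 0 for numeric ones).
    int sim(const std::vector<int>& X, const std::vector<int>& Y,
            const std::vector<int>& delta) {
        int score = 0;
        for (std::size_t j = 0; j < X.size(); ++j)
            if (std::abs(X[j] - Y[j]) <= delta[j])
                ++score;          // s(x_j, y_j) = 1; otherwise it contributes 0
        return score;             // sim(X, Y) lies in [0, d]
    }

For instance, for objects 1 and 2 of Table 1 (below) with all thresholds zero, sim returns 3, since the two objects agree on attributes A2, A3 and A5.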
3.1.2 Similarity between a cluster and an object

A subspace based incremental clustering method is used here. A cluster is a set of objects that are similar considering only a subset of the attributes. The minimum size of the subset of attributes required to form a cluster is defined by the threshold minAtt. Let the subset of defining attributes be represented by Dattributes = {a_1, a_2, ..., a_noAttributes}, such that Dattributes is a subset of M and noAttributes is the size of Dattributes. A cluster is represented by its profile, somewhat like an object; all objects in a cluster are similar to the profile. The cluster profile is defined by a set of values, Values = {v_1, v_2, ..., v_noAttributes}, for the attributes in Dattributes. That is, v_1 in D_{a_1} is the value for attribute a_1 in M, v_2 in D_{a_2} is the value for attribute a_2 in M, and so on. Thus, the cluster profile is defined as

    Profile = {noAttributes, Dattributes, Values}.    (3)

Let Olist, a subset of N, be the list of data objects in the cluster. A cluster C is completely defined by its Profile and Olist:

    C = {Olist, Profile}.    (4)

Our incremental clustering algorithm inserts an object into one of the clusters existing at a particular moment, so the similarity between a cluster and a data object needs to be computed. Obviously, the cluster profile is used for computing this similarity. As the similarity is computed over the set of attributes in Dattributes only, the similarity function between a cluster C and an object X becomes

    sim(C, X) = \sum_{j=1}^{noAttributes} s(v_j, x_{a_j}),    (5)

where s(v_j, x_{a_j}) is the similarity of the j-th defining attribute:

    s(v_j, x_{a_j}) = 1 if |v_j - x_{a_j}| <= \delta_{a_j}, and 0 otherwise.    (6)

Table 1: A sample dataset

  Serial no.  A1  A2  A3  A4  A5
  1           a3  b2  c4  d1  e2
  2           a2  b2  c4  d3  e2
  3           a3  b1  c2  d1  e1
  4           a2  b2  c4  d1  e2
  5           a3  b1  c2  d3  e1
  6           a1  b2  c1  d2  e2
  7           a3  b1  c2  d2  e1

Example: Consider the small dataset shown in Table 1, with seven objects defined over five attributes A1, A2, A3, A4 and A5. The domains of the attributes are, respectively, D1 = {a1, a2, a3}, D2 = {b1, b2}, D3 = {c1, c2, c4}, D4 = {d1, d2, d3} and D5 = {e1, e2}. With minAtt = 3 and \delta_{a_j} = 0 for all a_j in Dattributes, two clusters C1 and C2 can be identified in the dataset:

  C1 = {Olist = {1, 2, 4}, noAttributes = 3, Dattributes = {2, 3, 5}, Values = {b2, c4, e2}}, and
  C2 = {Olist = {3, 5, 7}, noAttributes = 4, Dattributes = {1, 2, 3, 5}, Values = {a3, b1, c2, e1}}.

3.1.3 Our supervised clustering algorithm

The clustering algorithm starts with an initially empty set of clusters. It reads each object X_i sequentially and inserts it into an existing cluster based upon the similarity between X_i and the clusters; a new cluster is created with X_i if the object is not similar enough, as defined by the threshold minAtt, for insertion into an existing cluster. The search for a cluster into which to insert the present object starts from the last cluster created and moves towards the first cluster until the search is successful. If it is successful, the object is inserted into the cluster found and the search is terminated. At the time of inserting the object into the found cluster C, the number of defining attributes of the cluster (C.noAttributes) is set according to the computed similarity between the cluster and the object, and the sets C.Dattributes and C.Values are updated accordingly. If the search is unsuccessful, a new cluster is created and the object itself is made the representative of the cluster; that is, the full set of attributes becomes Dattributes, while the full set of values of the object becomes the corresponding Values of the new cluster profile.

Table 2: Algorithm notations

  Symbol        Description
  minAtt        threshold on the minimum number of defining attributes
  minSize       threshold on the minimum number of objects in a cluster
  sim(X, Y)     similarity value between two objects
  sim(C, X)     similarity value between a cluster and an object
  Olist         object list of a cluster
  Profile       profile of a cluster
  noAttributes  total number of attributes in a cluster profile
  Dattributes   defining attributes of a cluster profile
  Values        values of the defining attributes of a cluster profile
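The cluster-object similarity of Equations (5)-(6) and the profile update performed on insertion can be sketched as follows. This is a minimal illustration under our reading of the algorithm; the Cluster structure and its field names are our own choices, not the authors' code. On insertion the profile keeps only the defining attributes that the new object matches, so every member remains similar to the (possibly smaller) profile.

    #include <cstdlib>
    #include <vector>

    struct Cluster {
        std::vector<int> a;      // Dattributes: indices of the defining attributes
        std::vector<int> v;      // Values: profile value of each defining attribute
        std::vector<int> olist;  // Olist: serial numbers of the member objects
    };

    // Eq. (5)-(6): number of defining attributes on which object X matches
    // the cluster profile within the per-attribute threshold delta.
    int sim(const Cluster& c, const std::vector<int>& X,
            const std::vector<int>& delta) {
        int score = 0;
        for (std::size_t j = 0; j < c.a.size(); ++j)
            if (std::abs(c.v[j] - X[c.a[j]]) <= delta[c.a[j]])
                ++score;
        return score;
    }

    // Insert object 'serial' into cluster c and shrink the profile to the
    // attribute-value pairs the object matches.
    void update(Cluster& c, int serial, const std::vector<int>& X,
                const std::vector<int>& delta) {
        c.olist.push_back(serial);
        std::size_t keep = 0;
        for (std::size_t j = 0; j < c.a.size(); ++j)
            if (std::abs(c.v[j] - X[c.a[j]]) <= delta[c.a[j]]) {
                c.a[keep] = c.a[j];   // retain matching attribute-value pairs
                c.v[keep] = c.v[j];
                ++keep;
            }
        c.a.resize(keep);
        c.v.resize(keep);
    }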
The classification method begins with a fixed number of existing clusters (as defined by their profiles) with no objects in them. Objects are incrementally inserted into these clusters and no new clusters are created. The notation used in the algorithms is described in Table 2.

3.2 Training method

Given a set of labeled training data, a combination of unsupervised incremental clustering and supervised classification is applied to create and refine a set of clusters, from which profiles are extracted for use in labeling test objects. The unsupervised clustering algorithm decides cluster membership immediately, as the objects arrive sequentially, without considering subsequent objects that are yet to be seen. Therefore, the created clusters are refined in a subsequent supervised classification step that allows possible realignment of the data objects in the clusters.

At the beginning of the training algorithm, all objects are marked unprocessed. The similarity thresholds minAtt (the minimum number of attributes) and minSize (the minimum number of objects in a cluster) are set high and are gradually decreased in steps. In each iteration, the remaining unprocessed objects are clustered using a combination of supervised classification and unsupervised clustering. If the supervised classification process fails to insert an object into any of the preexisting supervised clusters (SCs, created in the previous iteration), the unsupervised clustering process inserts it into one of the unsupervised clusters (UCs) created in the present iteration, or a new unsupervised cluster is created with the object. When the clustering process of the present iteration ends, cluster profiles are extracted from each SC containing at least minSize objects, and the objects in such clusters are marked processed. All SCs are then deleted. The UCs may include some insignificant clusters whose sizes are less than minSize; such clusters are deleted. The remaining UCs become the SCs for the next iteration, after their object lists are emptied. The thresholds minSize and minAtt are then reduced so that the next iteration can create larger clusters instead of fragmented ones; reducing the thresholds allows more generalization. The algorithm iterates as long as there are unprocessed objects. To ensure termination, minSize is halved in every iteration (minSize = minSize/2), so that it eventually reaches 1, beyond which no objects remain unprocessed. The threshold minAtt is loosened by setting minAtt = minAtt - a, where a is a small integer constant such as 1 or 2. Reduction of minAtt below a certain level (MIN) is not allowed; after reaching MIN it remains constant. Generalization beyond MIN would make data objects from two different classes indistinguishable. When training terminates, the set of profiles found in the profile file becomes the final set of cluster profiles for use in the prediction process. The training method is given as Algorithm 1.

Algorithm 1: Training algorithm
Input: training dataset
Output: file1 = TrainProfile

 1: read(n, d, minAtt, minSize);
 2: for (i = 0; i < n; i = i + 1) do
 3:   read(X[i], recordLabel[i]);
 4:   processed[i] = 0;
 5: end for
 6: totalProfile = 0;  totTrain = 0;
 7: while (minSize > 0) do
 8:   top = 0;
 9:   for (i = 0; i < n; i = i + 1) do  // unprocessed objects only
10:     found = FALSE;  // supervised clustering over the clusters SC
11:     for (j = 0; j < totTrain; j = j + 1) do
12:       if (recordLabel[i] == SC[j].label) then
13:         if (sim(SC[j], X[i]) == SC[j].noAttributes) then
14:           append(i, SC[j].oList);
15:           found = TRUE;
16:           break;
17:         end if
18:       end if
19:     end for  // end supervised clustering
20:     if (found == TRUE) then
21:       continue;
22:     end if
23:     for (j = top; j >= 0; j = j - 1) do  // unsupervised clustering over UC
24:       if (recordLabel[i] == UC[j].label) then
25:         if (sim(UC[j], X[i]) >= minAtt) then
26:           found = TRUE;
27:           update(UC[j], i, X[i]);
28:           break;
29:         end if
30:       end if
31:     end for
32:     if (found == FALSE) then  // create a new cluster with X[i]
33:       top = top + 1;
34:       UC[top] = new cluster;
35:       append(i, UC[top].oList);
36:       UC[top].noAttributes = d;  UC[top].label = recordLabel[i];
37:       for (j = 0; j < d; j = j + 1) do
38:         UC[top].a[j] = j;  UC[top].v[j] = x[j];
39:       end for
40:     end if
41:   end for
42:   for (i = 0; i < totTrain; i = i + 1) do  // extract profiles of significant SCs
43:     if (size(SC[i].oList) >= minSize) then
44:       k = SC[i].noAttributes;  write(file1, k);
45:       for (j = 0; j < k; j = j + 1) do write(file1, SC[i].a[j]); end for
46:       for (j = 0; j < k; j = j + 1) do write(file1, SC[i].v[j]); end for
47:       write(file1, SC[i].label);
48:       mark the objects in SC[i].oList as processed;
49:       totalProfile = totalProfile + 1;
50:     end if
51:     delete SC[i];
52:   end for
53:   totTrain = 0;  // significant UCs become the SCs of the next iteration
54:   for (i = 0; i <= top; i = i + 1) do
55:     if (size(UC[i].oList) >= minSize) then
56:       SC[totTrain] = UC[i];
57:       delete UC[i].oList;  UC[i].oList = NULL;
58:       totTrain = totTrain + 1;
59:     else
60:       delete UC[i];
61:     end if
62:   end for
63:   minSize = minSize / 2;
64:   minAtt = minAtt - a;
65:   if (minAtt < MIN) then minAtt = MIN; end if
66: end while

67: int function sim(cluster C, object X)
68:   scount = 0;  k = C.noAttributes;
69:   for (j = 0; j < k; j = j + 1) do
70:     l = C.a[j];
71:     if (abs(C.v[j] - x[l]) <= \delta[l]) then scount = scount + 1; end if
72:   end for
73:   return scount;
74: end function

75: function update(cluster C, int r, object X)
76:   append(r, C.oList);
77:   count = 0;  m = C.noAttributes;
78:   for (j = 0; j < m; j = j + 1) do
79:     l = C.a[j];
80:     if (abs(C.v[j] - x[l]) <= \delta[l]) then
81:       C.a[count] = C.a[j];  C.v[count] = C.v[j];  count = count + 1;
82:     end if
83:   end for
84:   C.noAttributes = count;
85: end function
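The generalization schedule of Algorithm 1 (lines 63-65) can be made concrete with a small sketch. This is our own illustration: the starting values follow Section 4.4, the names alpha and MIN come from the text, and the clustering work itself is elided.

    #include <cstdio>

    int main() {
        int minAtt = 33, minSize = 40;       // illustrative starting values
        const int MIN = 27, alpha = 1;       // floor and step size from the text
        while (minSize > 0) {
            std::printf("iterate with minAtt=%d, minSize=%d\n", minAtt, minSize);
            // ... cluster the remaining unprocessed objects, extract profiles ...
            minSize /= 2;                    // 40, 20, 10, 5, 2, 1, then 0: terminate
            minAtt -= alpha;
            if (minAtt < MIN) minAtt = MIN;  // generalization is capped at MIN
        }
        return 0;
    }

Halving minSize guarantees termination, since a cluster of a single object is eventually considered significant; capping minAtt at MIN prevents the over-generalization discussed above.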
3.2.1 Effective profile searching

A two-dimensional linked list (2-DLL) structure (as found in [16]) is used to store the profiles in memory. It allows the profile base to be updated dynamically: the structure grows row-wise with the insertion of a new parameter into a profile, and column-wise with the insertion of a new protocol type. It also allows the profile base to be searched protocol-wise, viz., TCP, ICMP, UDP and "other" (profiles that belong to none of TCP, ICMP or UDP). Once the respective protocol is identified in the 2-DLL structure, further traversal for matching the remaining parameters follows the linked list structure.

3.3 Prediction method

Once the set of cluster profiles is ready, labeling test objects becomes simple. Supervised clustering is performed with the test objects, and the label of the cluster profile becomes the label of each object inserted into the cluster. Objects not inserted into any of the clusters defined by the profiles need special attention; they may be flagged as suspicious anomalies. Another method for dealing with such an object is to insert it into the cluster that is most similar to the object and to label it accordingly. The prediction method is given as Algorithm 2.

Algorithm 2: Prediction algorithm
Input: file1 = TrainProfile, file2 = DataFile
Output: cluster label of each object in file2

 1: read(file1, totTrain);
 2: for (i = 0; i < totTrain; i = i + 1) do  // load the profiles
 3:   read(file1, k);  C[i].noAttributes = k;
 4:   for (q = 0; q < k; q = q + 1) do read(file1, C[i].a[q]); end for
 5:   for (q = 0; q < k; q = q + 1) do read(file1, C[i].v[q]); end for
 6:   read(file1, clusterLabel[i]);
 7: end for
 8: for each object X[i] in file2 do
 9:   found = 0;  max = 0;  position = 0;
10:   for (j = 0; j < totTrain; j = j + 1) do
11:     count = sim(C[j], X[i]);
12:     if (count == C[j].noAttributes) then
13:       print("Object " i " is of category " clusterLabel[j]);
14:       found = 1;  break;
15:     else if (count > max) then
16:       max = count;  position = j;
17:     end if
18:   end for
19:   if (found == 0) then
20:     print("Object " i " is of category " clusterLabel[position]);
21:   end if
22: end for
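The matching step of Algorithm 2 can also be phrased compactly in C++. This is a sketch only, with our own hypothetical Profile type, not the authors' code; it implements the fallback discussed above, labeling an unmatched object with the most similar profile.

    #include <cstdlib>
    #include <string>
    #include <vector>

    struct Profile {
        std::vector<int> a, v;   // defining attributes and their profile values
        std::string label;       // class label attached during training
    };

    // An object takes the label of the first profile it matches completely;
    // otherwise it falls back to the label of the most similar profile.
    // Assumes 'profiles' is non-empty.
    std::string predict(const std::vector<Profile>& profiles,
                        const std::vector<int>& X,
                        const std::vector<int>& delta) {
        std::size_t best = 0;
        int bestScore = -1;
        for (std::size_t j = 0; j < profiles.size(); ++j) {
            int score = 0;
            for (std::size_t q = 0; q < profiles[j].a.size(); ++q)
                if (std::abs(profiles[j].v[q] - X[profiles[j].a[q]])
                        <= delta[profiles[j].a[q]])
                    ++score;
            if (score == static_cast<int>(profiles[j].a.size()))
                return profiles[j].label;          // complete profile match
            if (score > bestScore) { bestScore = score; best = j; }
        }
        return profiles[best].label;               // most similar profile
    }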
3.3.1 Complexity analysis

The clustering algorithm requires one pass through the set of training examples that currently remain unprocessed. Each training example is compared with the existing clusters one after another until it is inserted into one of them, and the similarity computation involves only a subset of the attributes. Therefore, the clustering process has a complexity of O(ncd), where n is the number of training examples, c is the number of clusters and d is the number of attributes. Each of the created clusters must also be visited to extract its size and profile. Hence, the maximum time complexity of one iteration of the training algorithm is O(ncd) + O(c). The algorithm performs at most k iterations, where k = log2(minSize). As minSize is the minimum number of objects for a cluster to be considered significant, it is not large. The overall maximum time complexity of the algorithm is O(kncd) + O(kc).

4 Classification of network anomalies

In this section we apply our classification method to network anomaly detection.

4.1 Dataset description

The algorithm was evaluated with three real life TUIDS [17] intrusion datasets and two benchmark intrusion datasets, viz., the KDD Cup 1999 [18] and NSL-KDD [19] datasets.

4.1.1 TUIDS datasets

The real life TUIDS intrusion datasets consist of three datasets, Packet level, Flow level and Portscan, with 50 (11 numerical/39 categorical), 24 (9/15) and 23 (9/14) attributes, respectively, as given in Table 3.

Table 3: TUIDS intrusion datasets

  Dataset       Connection type  Training  Testing
  Packet level  Normal           35043     27895
                Attack           397832    138370
                Total            432875    166265
  Flow level    Normal           36402     16770
                Attack           363729    123955
                Total            400131    140725
  Portscan      Normal           2445      1300
                Attack           39215     28615
                Total            41660     29915

4.1.2 Benchmark intrusion datasets

The two benchmark intrusion datasets, KDD Cup 1999 [18] and NSL-KDD [19], are described in Table 4 and Table 5, respectively. Each record of these datasets represents a connection between two network hosts according to some well defined network protocol and is described by 41 attributes (38 continuous or discrete numerical attributes and 3 categorical attributes). Each record is labeled as either normal or one specific kind of attack; the attacks fall into one of four categories: User to Root (U2R), Remote to Local (R2L), Denial of Service (DoS) and Probe.

Table 4: KDD Cup 1999 datasets

  Dataset         DoS     U2R  R2L    Probe  Normal  Total
  Corrected KDD   229853  70   16347  4166   60593   311029
  10-percent KDD  391458  52   1126   4107   97278   494021

Table 5: NSL-KDD datasets

  Dataset    DoS    U2R  R2L   Probe  Normal  Total
  KDDTrain+  45927  52   995   11656  67343   125973
  KDDTest+   7458   67   2887  2422   9710    22544

- Denial of Service (DoS): the attacker tries to prevent legitimate users from using a service, e.g., SYN flood, smurf and teardrop.
- Remote to Local (R2L): the attacker tries to gain access to the victim machine without having an account on it, e.g., password guessing.
- User to Root (U2R): the attacker has local access to the victim machine and tries to gain super user privileges, e.g., buffer overflow attacks.
- Probe: the attacker tries to gain information about the target host, e.g., port scans and ping-sweeps.

The number of samples of each attack category in the Corrected KDD and 10-percent KDD datasets of KDD 1999 is shown in Table 4. The attack distributions in KDDTrain+ and KDDTest+ of NSL-KDD are shown in Table 5.

4.2 Data preprocessing

We discretize continuous valued attributes by taking the logarithm to base 2 and then converting the result to an integer. This is done for each attribute value z using the computation: if (z > 2) then z = int(log2(z)) + 1. Before taking the logarithm, attributes that take fractional values in the range [0, 1] are multiplied by 100 so that they take values in the range [0, 100]. Nominal valued attributes are mapped to discrete numeric codes, which are simply serial numbers, beginning with zero, assigned to the unique attribute values in the order in which they appear in the dataset. The class label attribute is removed from the dataset and stored separately in a different file; the class labels are used for training and for evaluating the detection performance of the algorithm.
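The discretization rule of Section 4.2 can be written down directly. The helper below is our own sketch, with the fractional flag indicating attributes whose raw values lie in [0, 1].

    #include <cmath>

    // Discretization of Section 4.2: fractional values in [0, 1] are first
    // scaled to [0, 100]; any value greater than 2 is then replaced by
    // int(log2(z)) + 1, while smaller values are simply truncated.
    int discretize(double z, bool fractional) {
        if (fractional) z *= 100.0;   // map [0, 1] to [0, 100]
        if (z > 2.0) return static_cast<int>(std::log2(z)) + 1;
        return static_cast<int>(z);
    }

For example, a byte count of 1500 maps to int(log2(1500)) + 1 = 11, compressing the wide numeric range into a small set of integer codes on which the zero-threshold similarity of Equation (2) is meaningful.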
4.3 Feature selection

Information gain [20] is computed for each of the discretized attributes. Six attributes (attribute numbers 7, 9, 15, 18, 20 and 21 in the KDD Cup 1999 and NSL-KDD datasets) with very low information gain are removed from the dataset, which reduces computation time.

4.4 Parameter selection

The training algorithm takes three input parameters: minAtt, minSize and MIN. The parameter minAtt takes values in the range [1, d], where d is the number of attributes. A higher value of minAtt corresponds to more specialization, leading to a larger number of rules; a lower value corresponds to more generalization, leading to fewer rules. Over-generalization, however, can make normal and attack data indistinguishable, resulting in more false positives or false negatives. So a lower limit for minAtt is specified using the parameter MIN, and the possible range of values for minAtt becomes [MIN, d]. To determine the value of MIN, information gain [20] is calculated for each attribute in the training dataset; attributes with very low information gain can be ignored, and the number of remaining attributes is set as the value of MIN. The parameter minSize is the minimum number of data elements required to form a cluster. In some cases even a single training example of an attack should be considered significant, so the minimum value for minSize is 1. Its value should increase with the value of minAtt. The maximum value of minSize can be set to a multiple of log2(n), where n is the number of training examples. In our experiments, MIN = 27; minAtt starts at 33 and is gradually reduced to MIN = 27 in steps of 1; and minSize starts at 2 log2(n). The similarity thresholds for all attributes are set to zero: \delta_i = 0 for i = 1, ..., d.

4.5 Performance measures

Evaluation of the performance of a classification model is based on the counts of test records correctly and incorrectly predicted by the model. These counts are tabulated in a confusion matrix [21]. A confusion matrix summarizing the number of instances predicted correctly or incorrectly by a classification model is shown in Table 6.

Table 6: Confusion matrix for binary classification

                      Predicted class
                      +           -
  Actual class   +    f++ (TP)    f+- (FN)
                 -    f-+ (FP)    f-- (TN)

Recall and precision are two widely used metrics in applications where the successful detection of one of the classes is considered more significant than detection of the other classes. Formally,

    Precision, p = TP / (TP + FP),    (7)

    Recall, r = TP / (TP + FN).    (8)

Building a model that maximizes both precision and recall is the key challenge for classification algorithms. Another measure of accuracy is the percentage of correct classification (PCC):

    PCC = (TP + TN) / (TP + FN + FP + TN).    (9)

A high value of PCC ensures reasonably correct classification.
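Equations (7)-(9) are simple functions of the four confusion matrix counts. The sketch below, written by us for illustration, checks them against the 2-class Corrected KDD figures reported later in Table 9, with attack taken as the positive class.

    #include <cstdio>

    int main() {
        // TP, FN, FP, TN taken from Table 9 (attack = positive class)
        double TP = 230281, FN = 20155, FP = 378, TN = 60215;
        double p   = TP / (TP + FP);                   // Eq. (7): precision = 0.9984
        double r   = TP / (TP + FN);                   // Eq. (8): recall    = 0.9195
        double pcc = (TP + TN) / (TP + FN + FP + TN);  // Eq. (9): PCC       = 93.40%
        std::printf("p=%.4f r=%.4f PCC=%.2f%%\n", p, r, 100.0 * pcc);
        return 0;
    }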
5 Experimental results

The experiments were performed on a desktop with a 3 GHz HP dc7000 series CPU, 2 GB RAM and a 250 GB HDD. C++ programs were used in a Linux environment. We provide cross evaluation results, training and testing the method with the corresponding training and testing datasets.

5.1 Results on the TUIDS intrusion datasets

Two categories of experiments, 2-class (normal and attack) and all-attacks, are carried out.

5.1.1 2-class prediction results

The confusion matrices for the 2-class behavioral categories on the Packet level, Flow level and Portscan datasets are shown in Table 7. Classification rates with PCC of 99.72%, 99.70% and 98.31% for the Packet level, Flow level and Portscan datasets, respectively, indicate good performance for our method.

Table 7: 2-class confusion matrices for the TUIDS datasets

  Packet level        Predicted class
  Actual class   Normal  Attack  Sum     Recall  1-Prc*
  Normal         34790   253     35043   0.9928  0.0267
  Attack         955     396877  397832  0.9976  0.0006
  Sum            35745   397130  432875
  Re-substitution error = 0.0028, PCC = 99.72%

  Flow level          Predicted class
  Actual class   Normal  Attack  Sum     Recall  1-Prc*
  Normal         36107   295     36402   0.9919  0.0246
  Attack         910     362819  363729  0.9975  0.0008
  Sum            37017   363114  400131
  Re-substitution error = 0.0030, PCC = 99.70%

  Portscan            Predicted class
  Actual class   Normal  Attack  Sum     Recall  1-Prc*
  Normal         2414    31      2445    0.9876  0.2175
  Attack         671     38544   39215   0.9829  0.0009
  Sum            3085    38575   41660
  Re-substitution error = 0.0169, PCC = 98.31%

  Note: * 1-Precision.

5.1.2 All-attacks prediction results

The confusion matrices for the all-attacks categories on the Packet level, Flow level and Portscan datasets are shown in Table 8. Classification rates of PCC are 99.42%, 99.01% and 98.31% for the Packet level, Flow level and Portscan datasets, respectively. The average execution times of classification for the Packet level, Flow level and Portscan datasets are 0.39, 0.17 and 0.14 minutes, respectively.

5.2 Results on the KDD Cup 1999 dataset

We perform three different experiments, categorized into 2-class (normal and attack), 5-class (normal, R2L, DoS, Probe and U2R) and all-attacks behaviors.

5.2.1 2-class prediction results

The confusion matrix for the 2-class behavioral categories on the Corrected KDD dataset is shown in Table 9. The classification model is trained with the 10-percent KDD dataset. The classification rate is PCC = 93.40%. The performance degrades because no examples are present in the training set corresponding to the 15 categories of attacks that appear only in the testing set; most of these attacks are misclassified as normal by our algorithm.

5.2.2 5-class prediction results

The confusion matrix for the 5-class behavioral categories on the Corrected KDD dataset is shown in Table 10. The classification model is trained with the 10-percent KDD dataset. In this case most of the unseen attacks are misclassified as the normal category, and PCC is reduced to 92.39%.

Table 10: 5-class confusion matrix for the Corrected KDD dataset, trained with the 10-percent KDD dataset

  Actual   Predicted class
  class    Normal  R2L   DoS     Probe  U2R  Sum     Recall  1-Prc*
  Normal   60211   21    73      284    4    60593   0.9937  0.7386
  R2L      13868   1115  3       1356   5    16347   0.0682  0.9378
  DoS      6360    42    223349  102    0    229853  0.9717  0.9977
  Probe    1036    0     443     2687   0    4166    0.6450  0.6067
  U2R      49      11    0       0      10   70      0.1429  0.5263
  Sum      81524   1189  223868  4429   19   311029
  Re-substitution error = 0.0761, PCC = 92.39%

  Note: * 1-Precision.

5.2.3 All-attacks prediction results

The confusion matrix for the all-attacks categories on the Corrected KDD dataset is shown in Table 11. The classification model is trained with the 10-percent KDD dataset, which includes 22 attack classes; the testing dataset Corrected KDD includes 37 attack classes. Since no examples are present in the training set, objects belonging to the remaining 15 categories of attacks cannot be classified correctly, and hence we exclude them (18729 objects) from the testing dataset. We see from the results that DoS and Probe attacks are detected well, but R2L and U2R attacks are detected very poorly. This is apparently because there are very few examples in the training set corresponding to attacks of the R2L and U2R categories compared to the DoS, Probe and normal categories. The average execution time of classification for the Corrected KDD dataset is 1 minute.

5.3 Results on the NSL-KDD dataset

NSL-KDD [19] is a dataset for network-based intrusion detection systems and is the new version of the KDD Cup 1999 intrusion detection benchmark dataset. The KDD Cup dataset contains a huge number of redundant records, which can bias learning algorithms towards the frequent records. To solve this issue, only one copy of each record is kept in the NSL-KDD dataset. Though this dataset is not a perfect representative of real networks, it can still be applied as an effective benchmark for comparing different intrusion detection methods. The two NSL-KDD datasets, KDDTrain+ and KDDTest+, are described in Table 5.

5.3.1 2-class prediction results

The confusion matrix for the 2-class behavioral categories on the KDDTest+ dataset is shown in Table 12. The classification model is trained with the KDDTrain+ dataset. The classification rate of PCC = 98.34% in Table 12 indicates good performance for our method. The performance on these datasets is better than on the KDD Cup dataset for 2-class classification.

5.3.2 5-class prediction results

The confusion matrix for the 5-class behavioral categories on the KDDTest+ dataset is shown in Table 16. The classification model is trained with the KDDTrain+ dataset. The classification rate of PCC = 98.39% in Table 16 indicates good performance for our method.

5.3.3 All-attacks prediction results

The confusion matrix for the all-attacks categories on the KDDTest+ dataset is shown in Table 13. Here the classification model is trained with the KDDTrain+ dataset. The testing dataset KDDTest+ includes 37 attack classes and the training dataset KDDTrain+ includes 15 attack classes. Objects belonging to the categories of attacks that are absent from the training set cannot be classified correctly, since no examples of them are present, and hence they are excluded (3751 objects) from the testing dataset. The classification rate of PCC = 98.94% in Table 13 indicates good performance for our method.
The classification rates remain better than those obtained on the KDD Cup dataset. The average execution time of classification for the KDDTest+ dataset is 0.07 minute.

5.3.4 Performance comparisons

The algorithms C4.5 [22], CART [23], Bayesian Network (BN) [24] and the CN2 [25] rule-based algorithm are executed on the TUIDS datasets using Weka [26]. Their results are compared with those of our method in Table 17; our method outperforms the other methods. The performance of our method on the Corrected KDD dataset is compared with the experimental results provided in [14] for C4.5 [14, 22], CART [14, 23], Bayesian Network (BN) [14, 24] and the CN2 [14, 25] rule-based algorithm in Tables 18-20. We see that in terms of PCC our algorithm performs better than these algorithms in most cases. The performance rates of the SVM-based IDS [15], a combined method of hierarchical clustering and SVMs, are compared with ours in Tables 14 and 15 using the Corrected KDD dataset. The evaluation shows that our algorithm's performance rates are distinctly higher for the 5-class attack behavior and for new attack detection: the detection accuracy for DoS attacks is 99.99%, as shown in Table 14, and for the snmpgetattack and snmpguess attacks our method detects 6457 and 2360 records respectively, whereas the other method detected 0 and 1 record respectively, as shown in Table 15.

Table 16: 5-class confusion matrix for the KDDTest+ dataset, trained with the KDDTrain+ dataset

  Actual  Predicted class
  class   Normal  R2L   DoS   Probe  U2R  Sum    Recall  1-Prc*
  Normal  9556    125   5     24     0    9710   0.9841  0.0195
  R2L     157     2727  0     0      3    2887   0.9446  0.0482
  DoS     14      1     7443  0      0    7458   0.9980  0.0009
  Probe   16      5     2     2399   0    2422   0.9905  0.0099
  U2R     3       7     0     0      57   67     0.8507  0.0500
  Sum     9746    2865  7450  2423   60   22544
  Re-substitution error = 0.0161, PCC = 98.39%

  Note: * 1-Precision.

Table 17: Comparison among CART, C4.5, CN2, BN and our algorithm for all attacks on the TUIDS datasets (each pair of values is Recall, 1-Prc*)

  Packet level
  Attack       CART            C4.5            CN2             BN              Our algorithm
  normal       0.9430 0.0540   0.9436 0.0507   0.9822 0.1442   0.7855 0.0215   0.9928 0.0014
  smurf        0.6360 0.3862   0.6384 0.3828   0.0026 0.0000   0.9503 0.6135   0.9981 0.0403
  1234         0.7289 0.1721   0.2353 0.5000   0.7900 0.2410   0.8400 0.2510   0.9883 0.0550
  bonk         0.8600 0.3156   0.7500 0.2100   0.8300 1.0000   0.8950 0.2400   0.5714 0.2000
  fraggle      1.0000 0.0000   1.0000 0.0000   1.0000 0.0011   0.9995 0.0012   1.0000 0.0000
  jolt         0.9641 0.0199   0.9902 0.0000   0.9248 0.0752   0.9739 0.2905   0.9998 0.0424
  nestea       0.7694 0.2310   0.0556 0.0000   0.6830 0.3100   0.8450 0.2010   0.9993 0.0027
  newtear      0.6829 0.2390   0.6300 0.2190   0.5910 0.2110   0.9300 0.3400   0.9900 0.0000
  oshare       0.7590 0.3270   0.8200 0.1040   0.6703 0.1401   0.8700 0.1031   1.0000 0.0000
  saihyousen   0.8900 0.1590   0.2353 0.5000   0.7810 0.2110   0.8505 0.2030   0.9500 0.0952
  syndrop      0.5910 0.3410   0.4550 0.1060   0.6900 0.1030   0.5450 0.1300   1.0000 0.0331
  syn          1.0000 0.0000   1.0000 0.0000   1.0000 0.0011   0.9995 0.0012   0.9931 0.0014
  teardrop     0.9641 0.0199   0.9902 0.0000   0.9248 0.0752   0.9739 0.2905   0.9759 0.0714
  window       0.7910 0.3200   0.0556 0.0000   0.7500 0.2100   0.7480 0.2041   0.9881 0.2727
  winnuke      0.6830 0.1520   0.5800 0.2170   0.5950 0.2140   0.6200 0.2100   1.0000 0.0000
  xmas         0.7420 0.1320   0.7300 0.2380   0.6350 0.2100   0.7900 0.2100   0.9911 0.0632
  PCC          98.30%          98.65%          95.67%          94.15%          99.42%

  Flow level
  normal       0.9500 0.2310   0.8850 0.2200   0.8350 0.2101   0.7950 0.0067   0.9918 0.0018
  smurf        0.8507 0.2030   0.7900 0.1220   0.8900 0.2020   0.8105 0.0821   0.6000 0.0103
  1234         0.9863 0.0137   0.9944 0.0293   0.8546 0.1428   0.9850 0.4125   0.9999 0.0150
  bonk         0.9800 0.1040   0.8700 0.0098   0.9300 0.2055   0.8600 0.0025   0.9961 0.0340
  fraggle      0.9724 0.2917   0.8598 0.0628   0.8898 0.1113   0.8677 0.1160   0.9987 0.0029
  jolt         0.8500 0.2010   0.2308 0.0000   0.8250 0.0712   0.9100 0.1090   0.8486 0.0474
  land         0.9706 0.0404   0.8927 0.0389   0.8965 0.0268   0.9924 0.0905   0.9998 0.0000
  nestea       0.9750 0.0250   0.9789 0.0000   0.8762 0.1086   0.9657 0.0639   0.5217 0.0021
  newtear      0.7809 0.3105   0.8700 0.2110   0.9700 0.2510   0.6950 0.0910   0.7956 0.0062
  saihyousen   0.8510 0.1700   0.9881 0.3197   1.0000 0.0000   1.0000 0.6147   0.7817 0.0059
  syndrop      0.9205 0.1850   0.9769 0.0000   0.8900 0.0072   0.6900 0.2900   0.6515 0.0371
  syn          0.9990 0.0016   0.9978 0.0011   0.9994 0.0011   0.9923 0.0000   0.9775 0.0214
  teardrop     0.4500 0.0230   0.5000 0.0000   0.4900 0.0087   0.7800 0.0426   0.5769 0.0314
  window       0.8950 0.0089   0.9400 0.2330   0.7800 0.2600   0.8100 0.1100   0.9832 0.3728
  winnuke      1.0000 0.4583   0.9545 0.0132   0.7951 0.1164   0.9791 0.0835   0.9740 0.0000
  xmas         0.7025 0.5088   0.8038 0.3553   0.8987 0.1744   0.9494 0.4700   0.9720 0.0538
  PCC          98.69%          97.19%          95.26%          95.15%          99.01%

  Portscan
  Normal       0.9342 0.3527   0.5000 0.0078   0.7263 0.4219   0.8621 0.0069   0.9876 0.0024
  SYN          0.8458 0.3910   0.8600 0.7610   0.6500 0.4510   0.5500 0.3290   0.9859 0.0040
  ACK          1.0000 0.4583   0.9545 0.0132   0.8252 0.1164   0.9791 0.0835   0.9930 0.0252
  FIN          0.7536 0.5088   0.8038 0.3553   0.9393 0.1744   0.9494 0.4700   0.9766 0.0153
  xmas         0.8813 0.4510   0.6800 0.2410   0.7502 0.3160   0.6578 0.2715   0.9761 0.0143
  PCC          87.45%          82.37%          78.76%          78.99%          98.31%

  Note: * 1-Precision.

Table 18: Comparison among CART, C4.5, CN2, BN and our algorithm for two classes on the Corrected KDD dataset (each pair of values is Recall, 1-Prc*)

  Class   CART            C4.5            CN2             BN              Our algorithm
  Normal  0.9297 0.0511   0.9297 0.0511   0.8917 0.0640   0.9869 0.1835   0.9001 0.0269
  Attack  0.9879 0.0169   0.9879 0.0145   0.9852 0.0259   0.9463 0.0033   0.9940 0.0237
  PCC     97.65%          97.85%          96.69%          95.42%          97.57%

  Note: * 1-Precision.

Table 19: Comparison among CART, C4.5, CN2, BN and our algorithm for five classes on the Corrected KDD dataset (each pair of values is Recall, 1-Prc*)

  Class   CART            C4.5            CN2             BN              Our algorithm
  Normal  0.9379 0.0526   0.9442 0.0526   0.8708 0.0490   0.7946 0.0624   0.9007 0.0273
  R2L     0.7813 0.1919   0.8153 0.1927   0.8451 0.3561   0.8871 0.4554   0.9110 0.2873
  DoS     0.9984 0.0027   0.9997 0.0011   0.9993 0.0015   0.9811 0.0022   0.9999 0.0001
  Probe   0.9081 0.2525   0.9482 0.0403   0.9585 0.0230   0.8778 0.3036   0.9875 0.0044
  U2R     0.5526 0.4545   0.6711 0.0192   0.6754 0.1200   0.7281 0.9195   0.7857 0.0517
  PCC     97.37%          97.83%          96.54%          93.83%          97.57%

  Note: * 1-Precision.

Table 20: Comparison among CART, C4.5, CN2, BN and our algorithm for all attacks on the Corrected KDD dataset (each pair of values is Recall, 1-Prc*)

  Attack           CART            C4.5            CN2             BN              Our algorithm
  normal           0.9430 0.0540   0.9436 0.0507   0.9822 0.1442   0.7855 0.0215   0.9016 0.0257
  snmpgetattack    0.6360 0.3862   0.6384 0.3828   0.0026 0.0000   0.9503 0.6135   0.8341 0.4792
  named            0.0000 1.0000   0.2353 0.5000   0.0000 1.0000   0.0000 1.0000   0.8824 0.3750
  xlock            0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   0.8889 0.1111
  smurf            1.0000 0.0000   1.0000 0.0000   1.0000 0.0011   0.9995 0.0012   1.0000 0.0000
  ipsweep          0.9641 0.0199   0.9902 0.0000   0.9248 0.0752   0.9739 0.2905   0.9869 0.0033
  multihop         0.0000 1.0000   0.0556 0.0000   0.0000 1.0000   0.0000 1.0000   0.7222 0.0714
  xsnoop           0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   0.7500 0.0000
  sendmail         0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   0.9412 0.2000
  guess_passwd     0.9968 0.0566   0.9863 0.0242   0.9725 0.0270   0.9679 0.0404   0.9979 0.0002
  saint            0.1236 0.0000   0.8302 0.2382   0.8003 0.2209   0.7024 0.3277   0.9402 0.6121
  buffer_overflow  0.0000 1.0000   0.4545 0.4118   0.0000 1.0000   0.0000 1.0000   0.9545 0.0455
  portsweep        0.8362 0.1111   0.9463 0.1604   0.7260 0.1376   0.9915 0.2252   0.9746 0.0000
  pod              0.8391 0.0000   1.0000 0.4082   0.8391 0.0000   0.7701 0.4071   0.9655 0.0118
  apache2          0.0000 1.0000   0.9937 0.1841   0.8476 0.0399   0.9786 0.0152   0.9987 0.0000
  phf              0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   1.0000 0.0000
  udpstorm         0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   1.0000 0.0000
  warezmaster      0.9863 0.0137   0.9944 0.0293   0.8546 0.1428   0.9850 0.4125   0.9744 0.0013
  perl             0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   1.0000 0.0000
  satan            0.9724 0.2917   0.8598 0.0628   0.8898 0.1113   0.8677 0.1160   0.3270 0.0582
  xterm            0.0000 1.0000   0.2308 0.0000   0.0000 1.0000   0.0000 1.0000   0.7692 0.0000
  mscan            0.9706 0.0404   0.8927 0.0389   0.8965 0.0268   0.9924 0.0905   1.0000 0.0000
  processtable     0.9750 0.0250   0.9789 0.0000   0.8762 0.1086   0.9657 0.0639   1.0000 0.0000
  ps               0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   0.8125 0.0000
  nmap             0.0000 1.0000   0.9881 0.3197   1.0000 0.0000   1.0000 0.6147   1.0000 0.0000
  rootkit          0.0000 1.0000   0.0769 0.0000   0.0000 1.0000   0.0000 1.0000   0.6923 0.1818
  neptune          0.9990 0.0016   0.9978 0.0011   0.9994 0.0011   0.9923 0.0000   1.0000 0.0000
  loadmodule       0.0000 1.0000   0.5000 0.0000   0.0000 1.0000   0.0000 1.0000   1.0000 0.0000
  imap             0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   1.0000 0.0000
  back             1.0000 0.4583   0.9545 0.0132   0.7951 0.1164   0.9791 0.0835   1.0000 0.0000
  httptunnel       0.7025 0.5088   0.8038 0.3553   0.8987 0.1744   0.9494 0.4700   0.9873 0.0000
  worm             0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   0.0000 1.0000
  mailbomb         0.9516 0.0000   0.9982 0.0062   0.9998 0.0161   0.9992 0.0046   0.9998 0.0000
  ftp_write        0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   1.0000 0.0000
  teardrop         0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   0.4167 0.1667
  land             0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   1.0000 0.0000
  sqlattack        0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   0.0000 1.0000   1.0000 0.0000
  snmpguess        0.9909 0.0004   0.9983 0.0144   0.3603 0.3812   0.3624 0.4718   0.9809 0.0000
  PCC              97.25%          97.69%          96.16%          94.75%          97.25%

  Note: * 1-Precision.
Table 8: All-attacks confusion matrices for the TUIDS datasets

  Packet level
  Actual class  Detected  Present  Recall  1-Prc*
  normal        34790     35043    0.9928  0.0014
  smurf         38652     38726    0.9981  0.0403
  1234          19679     19902    0.9883  0.0550
  bonk          387       678      0.5714  0.2000
  fraggle       8622      8624     1.0000  0.0000
  jolt          27659     27661    0.9998  0.0424
  nestea        23758     23775    0.9993  0.0027
  newtear       23537     23775    0.9900  0.0000
  oshare        20000     20000    1.0000  0.0000
  saihyousen    4338      4567     0.9500  0.0952
  syndrop       53623     53627    1.0000  0.0331
  syn           20230     20371    0.9931  0.0014
  teardrop      8171      8373     0.9759  0.0714
  window        39908     40386    0.9881  0.2727
  winnuke       68248     68249    1.0000  0.0000
  xmas          38771     39118    0.9911  0.0632
  Sum           430373    432875
  Re-substitution error = 0.0058, PCC = 99.42%

  Flow level
  Actual class  Detected  Present  Recall  1-Prc*
  normal        36103     36402    0.9918  0.0018
  smurf         18        30       0.6000  0.0103
  1234          163408    163412   0.9999  0.0150
  bonk          27196     27303    0.9961  0.0340
  fraggle       38238     38288    0.9987  0.0029
  jolt          1183      1394     0.8486  0.0474
  land          2         2        0.9998  0.0000
  nestea        48        92       0.5217  0.0021
  newtear       109       137      0.7956  0.0062
  saihyousen    197       252      0.7817  0.0059
  syndrop       43        66       0.6515  0.0371
  syn           20330     20797    0.9775  0.0214
  teardrop      75        130      0.5769  0.0314
  window        40947     41646    0.9832  0.3728
  winnuke       29915     30714    0.9740  0.0000
  xmas          38361     39465    0.9720  0.0538
  Sum           396173    400131
  Re-substitution error = 0.0099, PCC = 99.01%

  Portscan
  Actual class  Detected  Present  Recall  1-Prc*
  Normal        2414      2445     0.9876  0.0024
  SYN           9612      9750     0.9859  0.0040
  ACK           9875      9945     0.9930  0.0252
  FIN           9551      9780     0.9766  0.0153
  xmas          9505      9740     0.9761  0.0143
  Sum           40957     41660
  Re-substitution error = 0.0169, PCC = 98.31%

  Note: * 1-Precision.

Table 9: 2-class confusion matrix for the Corrected KDD dataset, trained with the 10-percent KDD dataset

                Predicted class
  Actual class  Normal  Attack  Sum     Recall  1-Prc*
  Normal        60215   378     60593   0.9938  0.2508
  Attack        20155   230281  250436  0.9195  0.0016
  Sum           80370   230659  311029
  Re-substitution error = 0.0660, PCC = 93.40%

  Note: * 1-Precision.

Table 11: All-attacks confusion matrix for the Corrected KDD dataset, trained with the 10-percent KDD dataset

  Actual class     Detected  Present  Recall  1-Prc*
  normal           60207     60593    0.9936  0.0790
  smurf            164089    164091   1.0000  0.0002
  ipsweep          298       306      0.9739  0.0418
  multihop         0         18       0.0000  1.0000
  guess_passwd     3         4367     0.0007  0.2500
  buffer_overflow  2         22       0.0909  0.5000
  portsweep        340       354      0.9605  0.2075
  pod              82        87       0.9425  0.1546
  phf              1         2        0.5000  0.0000
  warezmaster      2         1602     0.0012  0.0000
  perl             0         2        0.0000  1.0000
  satan            1280      1633     0.7838  0.1551
  nmap             84        84       1.0000  0.0455
  rootkit          2         13       0.1538  0.9962
  neptune          57971     58001    0.9995  0.0004
  loadmodule       0         2        0.0000  1.0000
  imap             0         1        0.0000  1.0000
  back             1068      1098     0.9727  0.0000
  ftp_write        0         3        0.0000  1.0000
  teardrop         12        12       1.0000  0.7857
  land             6         9        0.6667  0.6667
  warezclient      0         0        0.0000  1.0000
  spy              0         0        0.0000  1.0000
  Sum              285447    292300
  Re-substitution error = 0.0234, PCC = 97.66%

  Note: * 1-Precision.

Table 12: 2-class confusion matrix for the KDDTest+ dataset, trained with the KDDTrain+ dataset

                Predicted class
  Actual class  Normal  Attack  Sum    Recall  1-Prc*
  Normal        9531    179     9710   0.9816  0.0200
  Attack        195     12639   12834  0.9848  0.0140
  Sum           9726    12818   22544
  Re-substitution error = 0.0166, PCC = 98.34%

  Note: * 1-Precision.
Table 13: All-attacks confusion matrix for the KDDTest+ dataset, trained with the KDDTrain+ dataset

  Actual class     Detected  Present  Recall  1-Prc*
  normal           9609      9710     0.9895  0.0125
  smurf            605       665      0.9098  0.0529
  ipsweep          136       141      0.9845  0.0178
  multihop         17        18       0.9444  0.2273
  guess_passwd     1130      1231     0.9180  0.0238
  buffer_overflow  20        20       1.0000  0.0000
  portsweep        149       157      0.9490  0.0688
  pod              39        41       0.9512  0.0000
  phf              2         2        1.0000  0.0000
  warezmaster      902       944      0.9555  0.0075
  perl             2         2        1.0000  0.0000
  satan            716       735      0.9741  0.1836
  nmap             73        73       1.0000  0.0000
  rootkit          12        13       0.9231  0.2500
  neptune          4626      4657     0.9933  0.0008
  loadmodule       2         2        1.0000  0.0000
  imap             1         1        1.0000  0.0000
  back             339       359      0.9443  0.3416
  ftp_write        3         3        1.0000  0.0000
  teardrop         5         12       0.4167  0.2857
  land             7         7        1.0000  0.0000
  Sum              18593     18793
  Re-substitution error = 0.0181, PCC = 98.94%

  Note: * 1-Precision.

Table 14: Comparison with the SVM-based IDS for 5 classes on the Corrected KDD dataset

  Type of   SVM-based IDS                                     Our algorithm
  traffic   Correctly detected  Missed  Accuracy (%)          Accuracy (%)
  Normal    60166               427     99.29                 90.07
  DoS       228769              1084    99.53                 99.99
  Probe     4064                102     97.55                 98.75
  U2R       45                  183     19.73                 78.57
  R2L       4664                11525   28.81                 91.10
  Overall   297708              13321   95.72                 97.57

Table 15: Comparison with the SVM-based IDS for new attack detection on the Corrected KDD dataset

  Attack name    Attack present  Detected by SVM-based IDS  Detected by our algorithm
  apache2        794             536                        793
  mailbomb       5000            4459                       4999
  processtable   759             578                        759
  mscan          1053            981                        1053
  saint          736             724                        692
  httptunnel     158             15                         156
  ps             16              3                          13
  sqlattack      2               2                          2
  xterm          13              5                          10
  sendmail       17              2                          16
  named          17              3                          15
  snmpgetattack  7741            0                          6457
  snmpguess      2406            1                          2360
  xlock          9               2                          8
  xsnoop         4               0                          3
  worm           2               0                          0
  Total          18729           39.04%                     92.56%

6 Conclusion and future work

In this paper we have presented a clustering based classification method and applied it to network anomaly detection. We have developed a subspace based incremental clustering method, which forms the basis of the classification method. A training algorithm combining unsupervised incremental clustering and supervised classification partitions a labeled training dataset into clusters, which are then represented by their profiles. These profiles, together with the class labels, behave as classification rules. Prediction is done using a supervised classification algorithm that matches test objects with the cluster profiles to label them. The simple classification method provided is effective in network anomaly detection, as indicated by the evaluation results on the real life TUIDS intrusion datasets and the benchmark intrusion datasets. The method can be applied to other classification tasks as well.

At present, the rules (profiles) are stored in a flat file; a decision tree could be constructed from the derived rules to reduce search time. Investigating how to deal with test instances that fit the supervised profiles poorly may increase the performance of the algorithm. Another possible direction is rule refinement, which may reduce the number of attributes in each rule and also eliminate some rules altogether. For instance, the KDD 1999 dataset contains many similar training examples having different labels; in such situations, similar rules may be derived to detect more than one category of attack.

Acknowledgement

This work is an outcome of a research project funded by MCIT, New Delhi.

References

[1] W. Lee and S. J. Stolfo, "Data mining approaches for intrusion detection," in Proc. of the 7th USENIX Security Symposium, San Antonio, Texas, USA, January 1998, pp. 1-16.
Joshi, "PNrule: A New Framework for Learning Classifier Models in Data Mining (A Case-Study in Network Intrusion Detection)," Department of Computer Science, University of Minnesota, Tech. Rep. TR 00-015, 2000. [3] K. Burbeck and S. Nadjm-Tehrani, "ADWICE -Anomaly Detection with Real-Time Incremental Clustering," in Proc. of Information Security and Cryptology -ICISC 2004, vol. 3506/2005. Berlin, Germany: Springer, May 2005, pp. 407-424. [4] T. Zhang, R. Ramakrishnan, and M. Livny, "BIRCH: an effective data clustering method for very large databases," SIGMOID Record 1996 ACM SIGMOID Network Anomaly Identification... Informatica 37 (2013) 93-105 105 Int'nl Conf. on Management of Data, vol. 25, pp. 103-114, 1996. [5] A. Abraham and R. Jain, "Soft Computing Models for Network Intrusion Detection Systems," Studies in Computational Intelligence, vol. 16, pp. 191-211, 2005. [6] H. Yang, F. Xie, and Y. Lu, "Clustering and Classification Based Anomaly Detection," Fuzzy Systems and Knowledge Discovery, vol. 4223/2006, pp. 10821091,2006. [7] A. Lazarevic, L. Ertoz, V. Kumar, A. Ozgur, and J. Srivastava, "A Comparative Study of Supervised Machine Learning Techniques for Intrusion Detection," in Proc. of the 3rd SIAM Int'nl Conf. on Data Mining. San Francisco, CA: SIAM, May 1-3 2003, pp. 25-36. [8] G. Stein, B. Chen, A. S. Wu, and K. A. Hua, "Decision tree classifier for network intrusion detection with GA-based feature selection," in Proc. of the 43rd Annual Southeast Regional Conference, ACM-SE 43, 2005, pp. 136-141. [9] J. Zhang and M. Zulkernine, "Network intrusion detection using random forest," in Proc. of the 3rd Annual Conf. on Privacy, Security and Trust, 2005, pp. 53-61. [10] Y. Li, B.-X. Fang, L. Guo, and Y. Chen, "TCM-KNN Algorithm for Supervised Network Intrusion Detection," LNCS, vol. 4430/2007, pp. 141-151, 2007. [11] P. Garcia-Teodoro, J. Diaz-Verdejo, G. Macia-Fernandez, and E. Vazquez, "Anomaly-based network intrusion detection: Techniques, systems and challenges," Computers & Security, vol. 28, pp. 1828, 2009. [12] P. Gogoi, B. Borah, and D. K. Bhattacharyya, "Anomaly Detection Analysis of Intrusion Data using Supervised & Unsupervised Approach," Journal of Convergence Information Technology, vol. 5(1), pp. 95-110, Feb. 2010. [13] P. Gogoi, D. K. Bhattacharyya, B. Borah, and J. K. Kalita, "A Survey of Outlier Detection Methods in Network Anomaly Identification," The Computer Journal, vol. 54(4), pp. 570-588, 2011. [14] R. Beghdad, "Critical Study of Supervised Learning Techniques in Predicting Attacks," Information Security Journal: A Global Perspective, vol. 19, pp. 2235, 2010. [15] S.-J. Horng, M.-Y. Su, Y.-H. Chen, T.-W. Kao, R.-J. Chen, J.-L. Lai, and C. D. Perkasa, "A novel intrusion detection system based on hierarchical clustering and support vector machines," Expert Systems with Applications, vol. 38, pp. 306-313, 2011. [16] M. Roesch, "Snort-lightweight intrusion detection for networks," in Proc. of the 13th USENIX conf. on System administration. Seattle, Washington: USENIX, Nov. 1999, pp. 229-238. [17] P. Gogoi, M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, "Packet and Flow Based Network Intrusion Dataset," in LNCS Proc. ofIC3 2012, ser. CCIS, vol. 306. Berlin Heidelberg: Springer, August 6-8 2012, pp. 322-334. [18] University of California, "KDD Cup 1999 Data," http://kdd.ics.uci.edu/^/kddcup99.html, Irvine, 1999. [19] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A Detailed Analysis of the KDD CUP 99 Data Set," in Proc. 
[19] M. Tavallaee, E. Bagheri, W. Lu and A. A. Ghorbani, "A Detailed Analysis of the KDD CUP 99 Data Set," in Proc. of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA 2009), Ottawa, Canada, July 2009, pp. 1-6.

[20] H. G. Kayacik, A. N. Zincir-Heywood and M. I. Heywood, "Selecting Features for Intrusion Detection: A Feature Relevance Analysis on KDD 99 Intrusion Detection Datasets," in Proc. of the 3rd Annual Conf. on Privacy, Security and Trust, Dalhousie University, Halifax, October 2005.

[21] P.-N. Tan, M. Steinbach and V. Kumar, Introduction to Data Mining. Pearson Education, Inc., 2009.

[22] J. R. Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, USA, 1993.

[23] L. Breiman, J. H. Friedman, R. A. Olshen and C. J. Stone, Classification and Regression Trees. Wadsworth & Brooks, Monterey, CA, 1984.

[24] F. V. Jensen, An Introduction to Bayesian Networks. UCL Press, 1996.

[25] P. Clark and T. Niblett, "The CN2 induction algorithm," Machine Learning, vol. 3, pp. 261-283, 1989.

[26] University of Waikato, "Weka 3: Data Mining Software in Java," http://www.cs.waikato.ac.nz/ml/weka, New Zealand, 1999.