https://doi.org/10.31449/inf.v47i5.4527 Informatica 47 (2023) 1–20 1 
Analyzing Adaptive and Non-Adaptive Online Learners on 
Imbalanced Evolving Streams 
 
Himaja. D¹, Dondeti Venkatesulu², Uppalapati Srilakshmi³
¹ Computer Science and Engineering, Vignan’s Foundation for Science and Technology (Deemed to be University), Guntur, Andhra Pradesh, India
²,³ Advanced Computer Science and Engineering, Vignan’s Foundation for Science and Technology (Deemed to be University), Guntur, Andhra Pradesh, India
E-mail: himajadirsumilli@gmail.com 
Keywords: online learners, dynamic class imbalance, concept drift, active learning, support vector machine 
Received: November 22, 2022 
 
The combined problem of online class imbalance and concept drift (OCI-CD) has recently received much interest, but its impact on state-of-the-art adaptive and non-adaptive online learners has received little attention. This study investigates the effect of parameters such as current imbalance ratio, stream length, drift type, drift level, and imbalance state (static or dynamic) on adaptive and non-adaptive online learners. The experimental results show that each parameter considered has a significant impact on learner performance: (a) minority class performance decreases as the degree of imbalance increases, (b) non-adaptive learners are much more susceptible than adaptive learners to class imbalance, concept drift, and the combination of both drifts, (c) adaptive learners are susceptible only to class imbalance drifts, (d) a dynamic degree of imbalance affects the learner more than a static one, and (e) the adaptive large-scale support vector machine yields stable performance across all the parameters considered. Based on these findings, directions for developing new approaches are also presented.
Povzetek (Summary): Various machine learning methods are analyzed with respect to learning parameters, such as changing class imbalance.
1 Introduction 
Real-world classification problems like fraud and fault detection are constantly changing due to class imbalance (CI) and concept drift (CD) [1, 2, 3]. A stream exhibits CI when one class of samples is much smaller than the others [4, 5]. In evolving streams, the CI between classes changes with time (i.e., it is dynamic) [3]. CD happens when the underlying function that generates concepts changes.
Let the training dataset consist of input (x) and target (y) variables. By Bayes' theorem, three different types of drift can result from changes in (i) the posterior p(y/x), (ii) the prior p(y), without changing p(y/x) and p(x/y), and (iii) the likelihood p(x/y), without affecting p(y/x) and p(y) [3]. Real CDs are changes in (i) over time that do not depend on changes in p(y) and p(x/y). Drifts of types (ii) and (iii), on the other hand, are virtual CDs [1, 2, 3]. Real and virtual CDs coexist in the real world. Based on the speed of evolution, there are three types of drift: (i) gradual, where the concept changes gradually, (ii) abrupt, where the underlying concept changes suddenly, and (iii) recurrent (or cyclic), where the same concept recurs regularly [1]. Thus, an imbalanced stream evolving with CD poses an online class imbalance with CD (OCI-CD) problem.
 An adaptive learner is a window-based technique that preserves the training samples around the current time t while discarding older samples, so that only representative samples from the window are used [1, 2]. A non-adaptive learner, on the other hand, does not employ a window when learning incrementally from streams. Non-adaptive learning strategies are typically applied in static domains [1, 2, 3].
2 Motivation 
Wang et al. [3] recently conducted a systematic analysis to determine the impact of CI on three different types of drifts (i.e., p(y), p(y/x), and p(x/y)), each considered in isolation from the others. They gave advice based only on observations of cases with a high level of imbalance (i.e., 1:9). The effect of different levels of imbalance (static and dynamic) and of coupled dynamic OCI-CD drift on online learners' performance has yet to be empirically investigated.
3 Contributions 
To fill the gaps mentioned above, this study explores the impact of the training stream’s characteristics, including the degree of imbalance, the length at time t, the drift type (CI, CD, and OCI-CD), and the state of imbalance (static and dynamic), on state-of-the-art adaptive and non-adaptive learners used for minority class prediction.
This study explores various static and dynamic 
imbalanced streams with gradual and abrupt drift levels. 
This work also aims to answer the following research 
questions: 
RQ1. Does the length of the stream with respect to the 
imbalance ratio at the current time t impact the online 
learner’s performance? 
RQ2. Which has the more critical impact on minority class performance degradation: the degree of imbalance or CD?
RQ3. Is the impact of OCI-CD more adverse than 
individual p(y) or p(y/x) drifts on the learner’s 
performance? 
RQ4. To what extent does online SVM cope with OCI-
CD, p(y), and p(y/x) drifts compared to other online 
learners? 
The case of combined p(x/y), p(y/x) drift is not 
considered in the scope of this study, as the impact is only 
p(x/y) due to the change in the likelihood of the concept 
[3]. 
The paper is structured as follows. Section 4 reviews related work. Section 5 provides background on the problem-related methods. Section 6 describes the study design, while Section 7 discusses the experiments performed on the synthetic data. Section 8 discusses the validity of the observations on real-world data, and Section 9 discusses the results in greater depth. Section 10 concludes the paper.
4 Related work 
4.1 CI problem 
This problem has solutions at both the data level and the algorithm level [4, 5]. Data-level solutions include resampling techniques. Adjusting the threshold [6], cost-sensitive learning [7, 8], and novelty detection techniques [9] are examples of algorithm-level solutions.
 Ensemble learning techniques like bagging and boosting have been intensively studied as solutions to the CI problem. Cost-sensitive learning-based boosting [10, 11, 12], under-over bagging [13, 14], undersampling-based boosting [15], oversampling-based boosting [16, 17], undersampling-based bagging [18, 19], oversampling-based bagging [13], and hybrids of bagging and boosting are ensembles proposed to improve minority class prediction.
 
4.2 Learning streams from non-stationary 
environments 
Real and virtual drifts, or a combination of the two, can be 
found in an evolving stream. On the topic of drift 
detection, there has been a lot of research, including recent 
surveys [1, 2, 20]. These categorize drift detection 
techniques into two categories: (i) active and (ii) passive. 
 The active methods detect drift first and then update or rebuild the learner to adapt to data changes. Drift detection can be carried out by hypothesis tests [21, 22], the change-point method [23], sequential hypothesis tests [24], and change detection tests [23]. Recently, statistical methods that identify distribution differences have been used in SDDM [25] for drift detection, while cluster-based distance methods [26] have been used to detect recurring CDs. Drift detection methods such as the drift detection method for OCI (DDM-OCI) [27], LFR [28], and PAUC [29], on the other hand, detect p(y/x) drift in imbalanced distributions. Wang et al. defined AUC for multi-class classification as prequential multiclass AUC (PMAUC), weighted AUC (WAUC), and equal-weighted AUC (EWAUC) [30].
 Passive learning techniques, in contrast to detect-and-adapt methods, continuously adapt the model to change by updating a single classifier or by adding, removing, or modifying a classifier in an ensemble [1], so that new knowledge is retained and old knowledge is forgotten with each new set of data. A heterogeneous dynamic weighted majority (HDWM) [31] has been suggested that replaces the existing base learner in the ensemble when a performance decrease is seen. It works with both active and passive strategies. Even though they require more computation, ensemble approaches are superior to single learners. While passive approaches are better for gradual drifts, active ones are better for batch learning and for forecasting rapid drifts [1].
 
4.3 Learning imbalanced streams from non-
stationary environments 
This is the issue that arises when CD and CI streams 
coevolve. Either a static or dynamic evolution of the CI 
stream is possible. The term “dynamic” in this context 
refers to the p(y) change [3], or the dynamic change in CI 
degree. 
 Gao et al. [32] proposed an instance propagation ensemble mechanism. Chen and He [33] proposed selecting the best n minority class instances using the Mahalanobis distance. Lichten and Chawla [34, 35, 36] proposed an extension to the work of Gao et al. [32]. Instead of simply propagating minority class samples, they proposed a method that propagates misclassified majority class instances from the previous model. They also proposed weighting every ensemble member based on the probability of a change detection test that combines Hellinger distance and information gain [34]. HUWRS.IP [36], an instance selection method based on the NB classifier, has also been proposed.
 Learn++.CDS and Learn++.NIE are two ensemble approaches proposed as extensions to Learn++.NSE [37] to handle the OCI-CD problem in evolving streams. The former employs the SMOTE oversampling technique to rebalance the data, while the latter employs a bagging-based sub-ensemble method. Wang et al. [38] proposed resampling-based ensemble methods (OOB/UOB) for online bagging in which the time-decayed class size guides the oversampling and undersampling rates.
 Active approaches with single classifiers include 
recursive least square adaptive cost perceptron (RLSACP) 
[39] and online neural network (ONN) [40]. An ensemble 
of the subset of online sequential extreme learning 
machine (ESOS-ELM) [41] is proposed. Baruva et al. [42] 
proposed a generalized over-sampling-based online 
imbalanced learning framework (GOS-IL) for online 
learners to only cope with p(y) drift. 
 Lu et al. [43] proposed dynamic weighted 
majority for imbalance learning (DWMIL), a batch based 
incremental learning method to deal with the combined 
problem. Furthermore, as an extension of DWMIL, the 
same authors [44] proposed an adaptive batch-based 
dynamic weighted majority (ACDWM). The batch size is 
adaptively increased until the classifier produces stable 
predictions. Thwart and Schenck proposed a two-stage active learning algorithm [45]. Korycki et al. [46] proposed a two-module uncertainty-based active learning strategy for partially labeled nonstationary and imbalanced data streams. This method is only suitable for binary classification. Korycki and Krawczyk [47] proposed a solution for CI among multiple classes and CD in the presence of limited labels for multi-class classification. Other works present a comprehensive study of widely used classification and regression techniques [48], evaluate how well ensemble approaches perform [49], examine the effectiveness of various sampling approaches [50], and compare machine learning methods [51]. Table 1
summarizes all the related works. 
 Wang et al. [3] analyzed the impact of p(y), p(y/x), and p(x/y) drifts independently on active and passive approaches that are intended to learn from non-stationary environments. Throughout their study, the degree of imbalance is 1:9, and in most cases it is static. The authors pointed out that in the presence of OCI-CD, the impact of class imbalance in both static and dynamic forms is more critical than that of the p(y/x) and p(x/y) drifts. However, the authors did not simulate the combined OCI-CD drift to derive that conclusion. Furthermore, they stated that when the three drifts are considered independently, the p(y/x) drift has a more critical impact on learner performance than the p(y) and p(x/y) drifts, and that the drift detection methods considered are still highly susceptible to different types of drifts. However, these observations apply to online bagging classifiers (both active and passive), and other state-of-the-art adaptive and non-adaptive learners are not considered.
 Therefore, to address the above-mentioned gaps, 
the impact of the training stream’s characteristics, 
including the degree of imbalance, length at the time t, 
drift types (CI, CD, and OCI-CD), and the state of 
imbalance (static and dynamic) on state-of-the-art 
adaptive and non-adaptive learners used for minority class 
prediction, is explored. 
5 Background 
This section presents the adaptive and non-adaptive online 
learning algorithms and evaluation measures used in this 
study. 
 
5.1 Online learners 
The online learning algorithms update the model or learn 
a new model even when a single sample is available to 
learn. 
(a) Naive Bayes [52]:
This algorithm applies the Bayes independence assumption to the class-conditional likelihoods. The online version first predicts the class of each evolving sample as the one with the highest probability among the given N classes. The new sample is then used for training to update the probabilities of the existing model. Hence, the model learns incrementally from each evolving sample. For adaptive NB, the required probabilities are calculated only on the window of training samples at the current time t.
$$p(y/x) = \frac{p(x/y)\,p(y)}{p(x)} \tag{1}$$
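The windowed estimation described for adaptive NB can be sketched as follows. This is only an illustrative sketch, not the MATLAB implementation used in the study: the class name `WindowedNaiveBayes`, the restriction to categorical features, the window size, and the Laplace smoothing constant `alpha` are all assumptions made here for the example.

```python
import math
from collections import deque


class WindowedNaiveBayes:
    """Categorical naive Bayes estimated only from the last `window` samples."""

    def __init__(self, window=100, alpha=1.0):
        self.samples = deque(maxlen=window)  # old samples fall out automatically
        self.alpha = alpha                   # Laplace smoothing (assumption)

    def learn_one(self, x, y):
        self.samples.append((tuple(x), y))

    def predict_one(self, x):
        if not self.samples:
            return None
        n = len(self.samples)
        best_label, best_score = None, -math.inf
        for label in sorted({y for _, y in self.samples}):
            rows = [xi for xi, yi in self.samples if yi == label]
            score = math.log(len(rows) / n)  # log prior p(y) inside the window
            for j, value in enumerate(x):
                seen = {xi[j] for xi, _ in self.samples} | {value}
                hits = sum(1 for xi in rows if xi[j] == value)
                # smoothed log likelihood p(x_j / y) inside the window
                score += math.log(
                    (hits + self.alpha) / (len(rows) + self.alpha * len(seen))
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Because p(x) in Equation (1) is the same for every class, the sketch compares only the numerators p(x/y) p(y). A non-adaptive variant would keep all samples instead of a bounded window.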
(b) Perceptron [52]:
A Perceptron can be trained either online or in batch mode. In online mode, each input sample x of size m, together with the bias b, is fed to the network, which is initialized with a random weight vector W. Next, at each neuron, the weighted sum $\sum_i W_i x_i + b$ is computed and the output O is predicted by applying the activation function f. Each evolving sample undergoes a predefined number of epochs until the error (Y - O) is minimized by the update:
$$W_{new} = W_{old} + \eta\,(Y - O)\,x \tag{2}$$
where Y is the target, O is the observed prediction from the model, and η is the learning rate.
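The update in Equation (2) can be illustrated with a minimal single-neuron sketch. The step activation, the learning rate, the number of epochs per sample, and the names `OnlinePerceptron`, `predict_one`, and `learn_one` are assumptions chosen for the example and are not taken from the MOA implementation used in the study.

```python
import random


class OnlinePerceptron:
    """Single-neuron perceptron trained one sample at a time."""

    def __init__(self, n_features, eta=0.1, epochs=5, seed=0):
        rng = random.Random(seed)
        # random initial weights, as described in the text
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(n_features)]
        self.b = 0.0
        self.eta = eta        # learning rate (eta in Equation (2))
        self.epochs = epochs  # maximum passes over each incoming sample

    def predict_one(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if z >= 0 else 0  # step activation (assumption)

    def learn_one(self, x, y):
        for _ in range(self.epochs):
            error = y - self.predict_one(x)  # (Y - O)
            if error == 0:
                break
            # W_new = W_old + eta * (Y - O) * x, with the same rule for the bias
            self.w = [wi + self.eta * error * xi for wi, xi in zip(self.w, x)]
            self.b += self.eta * error
```

Since every incoming sample moves the boundary when it is misclassified, a learner of this kind follows the most recent data without needing an explicit window.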
(c) KNN [52]:
As KNN is a lazy learning algorithm, the model is built only when a test sample invokes it. For adaptive KNN, an initial training set is maintained in a window of constant size. Each evolving test sample is predicted against this window by considering the class labels of its K nearest neighbors. After the prediction, the new sample is added to the end of the window. As new samples are added, the oldest ones leave the window, so the old knowledge is forgotten. Hence, the window always holds data close to the current time and can adapt to change. For non-adaptive KNN, the model is built on the entire training set.
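The difference between the adaptive and non-adaptive variants can be sketched with a sliding window. This is a hedged illustration rather than the MOA code used in the experiments: the class name `SlidingWindowKNN`, the Euclidean distance, and the default values of `k` and `window` are assumptions.

```python
import math
from collections import Counter, deque


class SlidingWindowKNN:
    """KNN over the most recent `window` samples; window=None keeps everything."""

    def __init__(self, k=3, window=100):
        self.k = k
        self.samples = deque(maxlen=window)  # maxlen=None means unbounded

    def predict_one(self, x):
        if not self.samples:
            return None
        # lazy learning: neighbors are searched only when a prediction is requested
        nearest = sorted(self.samples, key=lambda s: math.dist(s[0], x))[: self.k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]

    def learn_one(self, x, y):
        # appending to a full window silently drops the oldest sample
        self.samples.append((tuple(x), y))
```

With a bounded window the learner forgets samples from an old concept, while `window=None` mimics the non-adaptive case in which every sample seen so far remains a candidate neighbor.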
(d) VFDT [53]:
The very fast decision tree (VFDT) algorithm builds a decision tree on evolving data based on the Hoeffding bound. The main idea behind the Hoeffding bound is to obtain some confidence in decisions made from the data seen so far. Initially, the root node is fitted to the available data, and sufficient statistics are collected to compute the information gain of each attribute. Let G(X_a) be the information gain of attribute a, the attribute with the highest gain, and G(X_b) that of attribute b, the attribute with the second highest gain. If G(X_a) - G(X_b) > ϵ, a split is carried out on attribute a, and each branch of the split receives a new leaf node, which is again initialized with sufficient statistics. Newly arriving instances are then forwarded to the leaf nodes, where the model is updated. In this manner, the tree is incrementally updated with the newly evolving instances. Here ϵ is calculated as
$$\epsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2n}} \tag{3}$$
where R is the range of a real-valued variable r, n is the number of independent observations seen so far, and r̄ is the mean of the n independent observations. According to the Hoeffding bound, with probability 1 - δ the true mean of the variable is at least r̄ - ϵ. For adaptive VFDT, the model is built on the window of training samples at the current time t.
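As a small worked example of Equation (3) and the split test, the following sketch computes ϵ and checks whether the gain difference is large enough to split. The function names and the numeric values (R = 1, δ = 1e-7, the two gains) are illustrative assumptions, not settings reported in this study.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon from Equation (3): sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split on the best attribute once its lead over the runner-up exceeds epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# With the same gain difference of 0.15, a leaf that has seen 100 samples must
# keep waiting, while a leaf that has seen 1000 samples is allowed to split.
few = should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=100)
many = should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=1000)
```

The bound shrinks as n grows, so a leaf that cannot yet separate the two best attributes simply waits for more instances before deciding.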
 
 
 
Table 1: Summary of all the related works. 
 
(e) LASVM active learning [54]:
To accommodate learning from incrementally arriving data, standard SVM tools such as LIBSVM [55] must resort to re-training [56], where the quadratic programming (QP) problem is repeatedly solved from scratch, which is computationally expensive when the data becomes large.
 To overcome this problem, a large support vector machine (LASVM) has been proposed. This is a kernel-based online active learning algorithm whose Process step tries to add each new sample to the existing set of support vectors S and whose Reprocess step removes some blatant non-support vectors from S. Active learning methods are usually incremental and, for SVMs, the next learning task proceeds from the current boundary. Thus, for each new sample, a new SVM boundary is learned at the end of Process (and Reprocess).
 LASVM solves its QP problem by simply extending the optimization procedure of sequential minimal optimization (SMO) [57], computing the gradient from the previous α’s and S. Hence, learning becomes faster. Further, the boundary always models the data currently being learned. Although there are many incremental learning approaches [55] for online SVM learning, LASVM is considered for this study because of its adaptability to changes in the data and its applicability to imbalanced standalone datasets [58].
 To the best of our knowledge, this is the first study that aims to analyze the behavior of LASVM on p(y), p(y/x), and OCI-CD drifts.
 
5.2 Performance measures 
Since the main focus is on minority class prediction, prequential evaluation [2] of minority class Recall is used as the evaluation measure. Recall is also used as the performance evaluator in [3, 27]. The performance of the online learners is depicted with incremental learning curves that plot the number of instances against minority class Recall.

Prequential evaluation:
This is an interleaved test-then-train procedure that evaluates the data stream by testing each evolving sample on the learned model and then using it for training.

Classification Recall:
The classification Recall of each class is measured as
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{4}$$

Here TP is the number of positive class samples predicted as positive and FN is the number of positive class samples predicted as negative. Though the size of the stream is usually assumed to be infinite, different finite sizes are assumed for the sake of comparison. In this work, the prequential results are reported for 50 fixed chunk sizes. Hence, the total time steps covered for each chunk size is (size of the stream/50).
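The interleaved test-then-train procedure and Equation (4) can be sketched as below. The sketch makes several assumptions that the text does not fix: Recall is accumulated over the whole stream rather than per chunk, one curve point is recorded every `chunk_size` samples, and the model exposes hypothetical `predict_one` and `learn_one` methods. `LastLabelModel` is a toy learner included only so that the loop can be run.

```python
def prequential_minority_recall(stream, model, minority_label=1, chunk_size=50):
    """Test each sample first, then train on it; record Recall once per chunk."""
    tp = fn = 0
    curve = []
    for i, (x, y) in enumerate(stream, start=1):
        y_hat = model.predict_one(x)  # test on the current model ...
        if y == minority_label:
            if y_hat == minority_label:
                tp += 1
            else:
                fn += 1
        model.learn_one(x, y)         # ... then use the sample for training
        if i % chunk_size == 0:
            # Recall = TP / (TP + FN); 0.0 while no minority sample has been seen
            curve.append(tp / (tp + fn) if (tp + fn) else 0.0)
    return curve


class LastLabelModel:
    """Toy learner for illustration only: predicts the last label it was given."""

    def __init__(self):
        self.last = None

    def predict_one(self, x):
        return self.last

    def learn_one(self, x, y):
        self.last = y
```

Plotting the returned list against the number of processed instances gives an incremental learning curve of the kind described above.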
 
(✓ = applies, - = does not apply)
Method  Type of learning   State of imbalance   Drift            Drift detection
        Batch   Online     Static   Dynamic     p(y/x)   p(y)    Active   Passive
[25]    ✓       -          ✓        -           ✓        -       ✓        -
[26]    ✓       -          ✓        -           ✓        -       ✓        -
[27]    -       ✓          ✓        -           ✓        -       ✓        -
[28]    ✓       ✓          -        ✓           ✓        ✓       ✓        -
[29]    -       ✓          ✓        ✓           ✓        ✓       ✓        -
[30]    -       ✓          ✓        ✓           ✓        ✓       ✓        -
[31]    ✓       -          ✓        -           ✓        -       ✓        ✓
[32]    -       ✓          ✓        -           ✓        -       -        ✓
[33]    ✓       -          ✓        -           ✓        -       -        ✓
[34]    ✓       -          ✓        ✓           ✓        ✓       -        ✓
[35]    ✓       -          ✓        -           ✓        -       ✓        -
[36]    ✓       -          ✓        -           ✓        -       ✓        -
[37]    ✓       -          ✓        -           ✓        -       -        ✓
[38]    -       ✓          ✓        ✓           -        ✓       -        ✓
[39]    -       ✓          ✓        -           ✓        -       -        ✓
[40]    -       ✓          ✓        -           ✓        -       -        ✓
[41]    ✓       ✓          ✓        -           ✓        -       ✓        -
[42]    -       ✓          ✓        ✓           -        ✓       -        ✓
[43]    ✓       -          ✓        -           ✓        -       -        ✓
[44]    ✓       -          ✓        -           ✓        -       -        ✓
[45]    ✓       -          -        ✓           ✓        -       -        ✓
[46]    -       ✓          ✓        -           ✓        -       -        ✓
[47]    -       ✓          ✓        ✓           ✓        ✓       -        ✓
6 Study design 
This section depicts the data generation procedure and the 
experimental setup used in this study. 
 
6.1 Data sets 
Table 2 describes how the synthetic streams are generated from two generating functions, CIRCLE and LINE [59]. From each of these stream generators, two states of imbalance are generated, as shown in Table 3: STATIC (i.e., a static degree of imbalance) and DYNAMIC (i.e., the prior probabilities p(y) of the classes change dynamically, giving a dynamic degree of class imbalance). For each of these states, streams with varying degrees of imbalance [1:9, 2:8, 3:7, 4:6, and 5:5] are generated. For each of these imbalanced streams, drifts with three different speeds are generated: NO (i.e., no drift; the streams are stationary), Gradual (the drift starts at the middle of the stream and takes a few time steps to completely change the underlying concept), and Abrupt (the drift starts at the middle of the stream and takes only one time step to completely change the underlying concept).
 To generate Gradual and Abrupt drifts, the speed of the drift is varied from 1 to (chunk size * 10). Each synthetic data set contains a single drift, which is simulated with one of three severities: NO (i.e., severity = 0%), LOW (i.e., severity ≈ 16%), and HIGH (i.e., severity ≈ 66%) [59] (as shown in Table 4). Here, severity refers to the percentage of change in the underlying concept after the drift. The streams with the specified settings are generated for lengths [1K, 50K, 100K, 150K, and 200K], where K denotes 1000.
 Figure 1 depicts the data generation scenario for DYNAMIC imbalance and HIGH drift for both CIRCLE and LINE generators. In addition to the simulated datasets, a real-world CD dataset, KDD CUP 99 [60], is also used in the analysis.
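A stream of the DYNAMIC imbalance-HIGH drift kind can be approximated with the following sketch. It is not the generator of [59]: labelling points inside the circle as Class 1, sampling points uniformly from the unit square, using rejection sampling to reach the requested class prior, and mixing the old and new concepts linearly over `drift_width` time steps are all assumptions made for illustration. The fixed values a = b = 0.5 and the radius change 0.2 to 0.5 follow Table 4.

```python
import random


def circle_label(x1, x2, r, a=0.5, b=0.5):
    """Class 1 inside the circle of radius r centred at (a, b), else Class 0."""
    return 1 if (x1 - a) ** 2 + (x2 - b) ** 2 <= r ** 2 else 0


def circle_stream(n, p1_before, p1_after, r_before, r_after, drift_width=1, seed=0):
    """Yield ((x1, x2), y) pairs with a single drift starting at the midpoint.

    drift_width=1 gives an abrupt drift; larger values give a gradual one in
    which the new concept is drawn with linearly increasing probability.
    """
    rng = random.Random(seed)
    start = n // 2
    for t in range(n):
        progress = min(max((t - start + 1) / drift_width, 0.0), 1.0)
        new_concept = rng.random() < progress
        r = r_after if new_concept else r_before      # p(y/x) change
        p1 = p1_after if new_concept else p1_before   # p(y) change
        target = 1 if rng.random() < p1 else 0
        while True:  # rejection sampling until the point has the target class
            x1, x2 = rng.random(), rng.random()
            if circle_label(x1, x2, r) == target:
                break
        yield (x1, x2), target
```

For example, `circle_stream(200000, 0.1, 0.9, 0.2, 0.5)` would correspond to a 1:9 stream that flips to 9:1 while the radius changes abruptly, and setting `p1_after` equal to `p1_before` gives the STATIC case.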
 
6.2 Experimental setup 
Except for LASVM, the adaptive and non-adaptive online learners, namely non-adaptive NB, adaptive and non-adaptive KNN, VFDT, and Perceptron, are taken from MOA. For non-adaptive KNN, the window size is set to the training set size. The adaptive NB is implemented in MATLAB. The same window size is used for both NB and KNN. LASVM, on the other hand, was originally developed to perform SVM active learning incrementally on an offline training set; here it has been modified to learn CD streams online. Prequential evaluation of minority class Recall is mainly used to demonstrate the performance of the classifiers, and majority class Recall is also used wherever necessary. Because they automatically adjust the boundary towards every incoming sample, both LASVM and Perceptron are considered adaptive algorithms.
7 Experimental results  
This section analyzes the research questions stated earlier. Although the study was carried out on all the aforementioned synthetic streams, the results for the small (1K) and large (200K) datasets are presented for simplicity. Results for Gradual drifts are also shown where necessary.
RQ1: Does the length of the stream with respect to the imbalance ratio at the current time t impact the online learner’s performance?
To address this research question, we consider the case of STATIC imbalance-NO drift, i.e., the degree of imbalance is static and there is no drift in the evolving stream. Here, each stream length [i.e., 1K, 50K, 200K] is considered with varying degrees of imbalance [1:9, 2:8, 3:7, 4:6, 5:5]. Table 5 shows that, for all the streams from both generators and for all learners considered, performance increases with the length of the stream until learning from the evolving stream saturates (Figure 2).
 The same trend is observed for all degrees of imbalance. However, the rate of convergence to maximum Recall varies with the degree of imbalance in the stream. Table 5 shows that, for all the streams from both generators and for all learners considered, the convergence rate of minority class Recall increases as the degree of imbalance decreases from 1:9 to 5:5 (read Table 5 horizontally). At a 1:9 degree of imbalance, performance is not able to rise from the minimum Recall value (i.e., 0), whereas at a 5:5 degree of imbalance, performance saturates near the maximum Recall value (i.e., 100%). This is observed for the streams from both generators and on all classifiers. Although Perceptron yielded the better performance more often (10 times altogether), its performance is not consistent on the CIRCLE stream generator, whereas LASVM (7 times altogether) exhibited consistent improvement in its performance across varying degrees of imbalance and stream sizes.
It is also found that, for the balanced streams (i.e., 5:5), fewer time steps are required to converge to maximum Recall than for imbalanced streams. The 5:5 stream converges to its maximum Recall of 100% within the first 1000 time steps, whereas the imbalanced streams converge to their maximum Recall only after the first 1000 steps. This trend is illustrated with adaptive NB and LASVM (Figure 2).
 In addition, LASVM shows a performance improvement over the rest of the learners. LASVM yields a Recall of [40%, 45%] at high degrees of imbalance [1:9, 2:8] after 1000 time steps, whereas the Recall is zero or nearly zero for the rest of the learners. One of the findings on standalone data sets [61], that minority class performance is affected by data set size, does not directly carry over to imbalanced evolving streams, as the length of the stream usually tends to infinity. Here only the degree of imbalance plays a critical role in performance degradation.
 
Table 2: Synthetic Dataset Description. 
 
 
State of imbalance  Before After 
STATIC 1:9 1:9 
DYNAMIC 1:9 9:1 
Table 3: State of imbalance before and after drift for 1:9 case.
Table 4: Settings of CD Generators. 
 
Figure 1: Dynamic imbalance.
 
RQ2. Which has the more critical impact on minority class performance degradation: the degree of imbalance or CD?
Here, the main concern is the effect of the degree of imbalance and of CD on minority class performance. This question is therefore addressed with the STATIC imbalance-HIGH drift case. The degree of imbalance is constant throughout each stream. Figure 3 shows that the non-adaptive NB, VFDT, and KNN learners are more sensitive to the drift than the adaptive versions. At the drift point, the minority class performance of these learners starts to drop significantly.
This effect is critical for the moderate degrees of imbalance [3:7, 4:6, and 5:5], because class imbalance has a low impact on the learner’s performance in these cases (from RQ1). For the high degrees of imbalance [1:9, 2:8], in contrast, the impact of the degree of imbalance on the learner’s performance is more critical than the impact of the CD. This is because the learners already achieve minimum or zero minority class Recall.
 
 
 
Dataset     Imbalance [1:9, 2:8, 3:7, 4:6, 5:5]   Speed of the drift    Dataset sizes                  Type of the drift
CIRCLE      STATIC                                NO, Gradual, Abrupt   [1K, 50K, 100K, 150K, 200K]    p(y/x)
CIRCLE      DYNAMIC                               NO, Gradual, Abrupt   [1K, 50K, 100K, 150K, 200K]    p(y), OCI-CD
LINE        STATIC                                NO, Gradual, Abrupt   [1K, 50K, 100K, 150K, 200K]    p(y/x)
LINE        DYNAMIC                               NO, Gradual, Abrupt   [1K, 50K, 100K, 150K, 200K]    p(y), OCI-CD
KDDCUP 99   STATIC [2:8]                          -------               500K                           p(y/x)
Problem   Fixed values   Before -> after drift
CIRCLE    a = b = 0.5    r = 0.2 -> 0.2 (NO DRIFT)
                         r = 0.24 -> 0.3 (LOW DRIFT)
                         r = 0.2 -> 0.5 (HIGH DRIFT)
LINE      a1 = 0.1       a0 = -0.25 -> -0.25 (NO DRIFT)
                         a0 = -0.25 -> -0.7 (LOW DRIFT)
                         a0 = -0.1 -> -0.8 (HIGH DRIFT)
Table 5: Minority class prequential Recall for CIRCLE and LINE of stream sizes [1K, 200K]. Here nonadap refers to 
non-adaptive and adap refers to adaptive. 
 
For the adaptive learners such as LASVM, KNN, and NB, the effect of the drift is nominal compared to the non-adaptive versions. Although the drift causes a performance drop at the beginning of the new concept, these learners quickly regain better performance than the non-adaptive versions. The same trend is observed for the Gradual cases (Figure 4). NB and KNN adapt to the drift because of the window of samples at the current time t, whereas LASVM adapts because of active learning of new samples near the boundary.
 
RQ3: Is the impact of OCI-CD more adverse than that of p(y) and p(y/x) drifts on the learner’s performance?
This research question is addressed by considering the cases of DYNAMIC imbalance-NO DRIFT (CI) and DYNAMIC imbalance-HIGH DRIFT (OCI-CD). Here each stream starts with one of the imbalance degrees (i.e., Minority:Majority) [1:9, 2:8, 3:7, 4:6, 5:5], and after the p(y) drift the degree of imbalance changes to [9:1, 8:2, 7:3, 6:4, 5:5] (i.e., the minority becomes the majority and the majority becomes the minority). For simplicity, Class 1 refers to the minority class and Class 0 to the majority class before the drift. This research question is illustrated with stream size 200K, because the drift is most clearly visible there. In the case of DYNAMIC imbalance-HIGH DRIFT, both the p(y) and p(y/x) changes are set to occur at the middle of the stream.
 Figures 5, 6, 7, and 8 show that the performance of the two classes is hindered by both CI and OCI-CD drifts. However, this impact varies from classifier to classifier.
 For Class 1 in the DYNAMIC imbalance-NO DRIFT case, a performance improvement is observed after the drift point for the high degrees of imbalance 1:9 and 2:8, whereas a performance drop is observed for the moderate degrees of imbalance 3:7 and 4:6. This trend is similar for adaptive and non-adaptive learners but more significant for non-adaptive learners (Figures 5, 6, 7, 8). On average, the performance on DYNAMIC imbalance-NO DRIFT and DYNAMIC imbalance-HIGH DRIFT is very similar for adaptive learners.
 
Recall (%) for different imbalance ratios
Size            Classifier        1:9     2:8     3:7     4:6     5:5
1K (CIRCLE)     NB (nonadap)      0       19.5    61.66   90.25   98.8
1K (CIRCLE)     NB (adap)         5       25      57.66   77.5    89.8
1K (CIRCLE)     KNN (nonadap)     6       30      62.66   90.25   99.2
1K (CIRCLE)     KNN (adap)        0       11      52      87.25   98.6
1K (CIRCLE)     VFDT (nonadap)    0       0       61.66   90.25   98.6
1K (CIRCLE)     VFDT (adap)       0       0       67.33   91      91
1K (CIRCLE)     Perceptron        0       0       0       25.75   11.6
1K (CIRCLE)     LASVM             8       14.5    70      91      99.6
200K (CIRCLE)   NB (nonadap)      0       26.1    63.9    95.4    100
200K (CIRCLE)   NB (adap)         0.01    26.8    62.97   94.11   100
200K (CIRCLE)   KNN (nonadap)     1.4     24.5    73      98.8    100
200K (CIRCLE)   KNN (adap)        2.6     42.5    64.4    95.9    99.8
200K (CIRCLE)   VFDT (nonadap)    2.3     0.3     93.3    99.7    99.9
200K (CIRCLE)   VFDT (adap)       17.7    85.9    99.8    99.8    99.9
200K (CIRCLE)   Perceptron        84.2    92.5    100     100     10.9
200K (CIRCLE)   LASVM             42.15   45.9    83.95   95.42   100
1K (LINE)       NB (nonadap)      0       50      81      91.75   98.4
1K (LINE)       NB (adap)         12      41.5    70.66   86.75   96.8
1K (LINE)       KNN (nonadap)     0       18.5    50.33   87.5    97
1K (LINE)       KNN (adap)        0       18.5    50.33   87.5    97
1K (LINE)       VFDT (nonadap)    0       1.5     75.66   95.25   97.8
1K (LINE)       VFDT (adap)       0       2.5     75.66   95.5    97.2
1K (LINE)       Perceptron        0       0       32.33   78      98.4
1K (LINE)       LASVM             7       20      67      87      98.4
200K (LINE)     NB (nonadap)      0       57.3    86      96      100
200K (LINE)     NB (adap)         0.01    56.35   84.9    94.4    100
200K (LINE)     KNN (nonadap)     0.4     17.5    63.7    96.7    100
200K (LINE)     KNN (adap)        5.7     49.6    69.8    98      99.7
200K (LINE)     VFDT (nonadap)    0.5     1.6     97.2    99.9    99.9
200K (LINE)     VFDT (adap)       21.2    20      97.5    95.9    100
200K (LINE)     Perceptron        77.1    87.7    100     100     100
200K (LINE)     LASVM             40.74   43.60   77.81   94.09   100
Figure 2: Minority class Recall Prequential for the stream sizes [1K, 50K, 200K] on STATIC IMBALANCE-NO
DRIFT for CIRCLE DATASET. Panels: (a) Adaptive NB (1K), (b) Adaptive NB (200K), (c) LASVM (1K),
(d) LASVM (200K).
 
 
Because both drifts occur at the same position, the impact
of the p(y) drift is observed to be more prominent than that
of the p(y/x) drift in OCI-CD (figures 5, 6, 7, 8, Class 1).
Compared to non-adaptive learners, adaptive learners cope
better with CI and OCI-CD drifts.
Concerning Class 0, in the DYNAMIC imbalance-NO DRIFT
case, performance drops after the drift point on the streams
for most of the learners, but LASVM exhibits stable
performance towards p(y) drift compared to the rest of the
learners. However, a few streams exhibited stable
performance, due either to the adaptive nature of the learner
or to more learning before the drift and scarcity of the
concept after it. In the DYNAMIC imbalance-HIGH DRIFT case,
all learners except LASVM show a performance drop compared
to DYNAMIC imbalance-NO DRIFT. In addition, this impact is
greater on non-adaptive learners than on the other learners.
Owing to their learning mechanism, the non-adaptive learners
are prone to p(y) and OCI-CD drift. Since the behavior of the
Perceptron is not consistent in either the adaptive or the
non-adaptive setting, we cannot derive conclusions for it
(figures 5, 6, 7, 8, Class 0). Compared to non-adaptive
learners, adaptive learners cope better with CI and OCI-CD
drifts.
For non-adaptive NB, KNN, and VFDT, there is a performance
drop after the drift. This impact is significant in both the
DYNAMIC imbalance-NO DRIFT and DYNAMIC imbalance-HIGH DRIFT
cases. The two cases differ only in the rate at which
performance falls at the drift point and in the rate of
convergence after the drift. Here, even the balanced 5:5
cases are prone to CI and OCI-CD drifts for non-adaptive
learners. Compared with non-adaptive learners, the adaptive
learners cope better with CI and OCI-CD drifts in terms of
Class 1 performance (figures 5, 6, 7, and 8).
Compared with other adaptive and non-adaptive learners,
LASVM exhibits stable performance towards the OCI-CD drift
for Class 0. For Class 1, the moderate degree of imbalance
cases such as [3:7, 4:6] and the balanced case [5:5] are not
very sensitive to the different types of drift, whereas the
high degree of imbalance cases such as [1:9, 2:8] are prone
only to CI drift. The same behavior is observed with gradual
drifts on LASVM.
 
Figure 3: Minority class Recall Prequential on STATIC IMBALANCE-HIGH DRIFT (Abrupt) for
CIRCLE dataset. Panels: (a) non-adaptive NB (200K), (b) adaptive NB (200K), (c) non-adaptive KNN (200K),
(d) adaptive KNN (200K), (e) non-adaptive VFDT (200K), (f) adaptive VFDT (200K), (g) PERCEPTRON (200K),
(h) LASVM (200K).
 
 
 
Figure 4: Minority class Recall Prequential on STATIC IMBALANCE-HIGH DRIFT (Gradual) for CIRCLE dataset.
Panels: (a) non-adaptive NB (200K), (b) LASVM (200K).
 
 
RQ4. To what extent does online SVM cope with OCI-CD, p(y), and p(y/x) drifts compared to other online learners?
From RQ1, LASVM has consistently performed better than the
other adaptive and non-adaptive learners in coping with the
degree of imbalance. As the degree of imbalance decreases
[1:9, 2:8, 3:7, 4:6, 5:5], the performance increases. Even as
the stream size increases from 1K to 200K, LASVM converges
early to its maximum performance (Recall) (table 5 and
figure 2).
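The RQ1 trend for LASVM can be checked directly against the reported Recall values. The short sketch below copies the LASVM rows from the Recall table (ratios 1:9 to 5:5) and tests whether Recall is non-decreasing as the stream becomes more balanced; the helper name `is_non_decreasing` and the dictionary layout are illustrative choices, not part of the study.

```python
# LASVM minority class Recall (%) for ratios 1:9, 2:8, 3:7, 4:6, 5:5,
# copied from the Recall table of the paper.
lasvm_recall = {
    "1K (CIRCLE)": [8, 14.5, 70, 91, 99.6],
    "200K (CIRCLE)": [42.15, 45.9, 83.95, 95.42, 100],
    "1K (LINE)": [7, 20, 67, 87, 98.4],
    "200K (LINE)": [40.74, 43.60, 77.81, 94.09, 100],
}


def is_non_decreasing(values):
    """Return True if each value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Does Recall rise (or stay level) as the imbalance decreases?
monotone = {name: is_non_decreasing(row) for name, row in lasvm_recall.items()}

# Gain in Recall at the 1:9 ratio when moving from the 1K to the 200K stream.
gain_at_1_9 = {
    "CIRCLE": lasvm_recall["200K (CIRCLE)"][0] - lasvm_recall["1K (CIRCLE)"][0],
    "LINE": lasvm_recall["200K (LINE)"][0] - lasvm_recall["1K (LINE)"][0],
}
```

All four LASVM rows pass the check. The same helper applied to, for example, the Perceptron row for 200K (CIRCLE) would fail, because its Recall falls to 10.9 at 5:5.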
From RQ2, unlike the other adaptive and non-adaptive
algorithms, LASVM is not very sensitive to p(y/x) CD. In this
case, both LASVM and adaptive KNN consistently yielded better
performance (figure 3).
From RQ3, LASVM is sensitive to p(y) drift on Class 1 only
at the high degree of imbalance cases such as [1:9, 2:8]. For
LASVM, the performance is similar in the DYNAMIC
imbalance-NO DRIFT and DYNAMIC imbalance-HIGH DRIFT cases.
Hence, the observations regarding online SVM mirror the
conclusions of [62, 53], which were obtained in static
training set settings. Further, apart from the high degree
of imbalance cases, the performance of online SVM active
learning is not very sensitive to p(y), p(y/x), and OCI-CD
drifts. Among the other adaptive and non-adaptive learners,
adaptive KNN performs comparably in coping with the three
drifts considered.
8 Analysis on real world data sets 
The conclusions are further validated on a real-world drift
dataset, the 10% subset of KDDCUP’99 [60], which has a
constant degree of imbalance of 20:80. The corresponding data
characteristics are given in table 2. The dataset is well
discriminative in nature (figure 9(b)) and is prone only to
p(y/x) drift. Here, the last five Fisher discriminant
components are considered in order to preserve the
non-linearity of the data [63], based on the assumption that
non-linear concepts are not very separable. Consequently,
only the impact of static imbalance (STATIC) with p(y/x)
drift is studied in the context of adaptive and non-adaptive
learners. This scenario is illustrated with stream lengths
from 1K to 500K. For better readability, only a subset of the
classifiers from MOA (non-adaptive NB, VFDT, adaptive KNN,
Perceptron) is considered for comparison with LASVM.
From figure 9(a) and (b), as the length of the stream
increases from 0 to 500K, the minority class Recall increases
from 0% to 100% for nearly all classifiers. Here, LASVM
converges to the maximum Recall earlier than the other
learners. NB is more susceptible to drift (figure 9(b) at
350K) than the other online learners. On the other hand,
under p(y/x) drift LASVM is consistent and exhibits stable
performance compared to the other adaptive learners, KNN and
Perceptron, and to non-adaptive learners such as VFDT and NB.
For the smaller stream length, i.e., 1K (figure 9(a)), the
experiments were repeated three times and the average
prequential Recall is reported.
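The prequential (test-then-train) Recall used throughout the figures can be sketched as follows. This is a minimal illustration in its plain cumulative form, without a fading factor or sliding window; the exact evaluator settings of the study are not restated here. `MajorityLearner` and `prequential_recall` are hypothetical names, and the toy learner is only a stand-in for the online learners compared in the paper.

```python
from collections import Counter


class MajorityLearner:
    """Toy online learner that always predicts the most frequent class seen."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # Default to class 0 before any example has been seen.
        return self.counts.most_common(1)[0][0] if self.counts else 0

    def learn(self, x, y):
        self.counts[y] += 1


def prequential_recall(stream, learner, positive=1):
    """Predict each example first, then train on it.

    Returns the running Recall of the `positive` class after every example
    (None until the first positive example has arrived).
    """
    true_pos = false_neg = 0
    curve = []
    for x, y in stream:
        y_hat = learner.predict(x)
        if y == positive:
            if y_hat == positive:
                true_pos += 1
            else:
                false_neg += 1
        learner.learn(x, y)
        seen = true_pos + false_neg
        curve.append(true_pos / seen if seen else None)
    return curve


# A 2:8 toy stream: eight majority examples followed by two minority ones.
stream = [(None, 0)] * 8 + [(None, 1)] * 2
curve = prequential_recall(stream, MajorityLearner())
```

On this toy stream the learner is right on 8 of 10 examples, yet its minority class Recall stays at zero, which is why Recall per class, rather than overall accuracy, is tracked on imbalanced streams. Averaging over repeated runs, as done for the 1K stream, amounts to averaging such curves position by position.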
Hence, from the real-world dataset, it is identified that
adaptive learners cope better with p(y/x) drift than
non-adaptive learners. In addition, LASVM is clearly not very
sensitive to static class imbalance or to p(y/x) drift, which
mirrors the findings on synthetic streams. Because the data
are well separable, the impact of class imbalance on the
other learners is also minimal.
9 Discussion 
This section discusses the observations from analyzing the
impact of parameters such as the static degree of imbalance,
stream length, and drifts (i.e., p(y), p(y/x), OCI-CD) on
adaptive and non-adaptive learners over imbalanced evolving
streams. The main observations for the considered parameters
are:
• Degree of imbalance: In an evolving stream, as the
degree of imbalance increases, the performance of the
minority class decreases. This holds whether the degree
of imbalance is static or dynamic. In the dynamic case,
there is a performance drop at the p(y) drift position.
This impact is more severe for moderately imbalanced
streams than for highly imbalanced ones. For high
degrees of imbalance, constant performance degradation
is observed in the static imbalance state, whereas
performance improvement is observed after the p(y)
drift in the dynamic imbalance state.

Figure 5: CLASS 0 and CLASS 1 Recall prequential of DYNAMIC IMBALANCE on NB Learner. Here, b refers to
before, a refers to after the drift (Abrupt). (i): non-adaptive NB with no p(y/x) drift. (ii): non-adaptive NB with high
p(y/x) drift. (iii): adaptive NB with no p(y/x) drift. (iv): adaptive NB with high p(y/x) drift. Panel columns:
CLASS0 (CIRCLE), CLASS1 (CIRCLE), CLASS0 (LINE), CLASS1 (LINE).

Figure 6: CLASS 0 and CLASS 1 Recall prequential of DYNAMIC IMBALANCE on KNN Learner. Here, b refers to
before, a refers to after the drift (Abrupt). (i): non-adaptive KNN with no p(y/x) drift. (ii): non-adaptive KNN with
high p(y/x) drift. (iii): adaptive KNN with no p(y/x) drift. (iv): adaptive KNN with high p(y/x) drift. Panel columns:
CLASS0 (CIRCLE), CLASS1 (CIRCLE), CLASS0 (LINE), CLASS1 (LINE).

Figure 7: CLASS 0 and CLASS 1 Recall prequential of DYNAMIC IMBALANCE on PERCEPTRON Learner. Here,
b refers to before, a refers to after the drift (Abrupt). (i): non-adaptive PERCEPTRON with no p(y/x) drift. (ii):
non-adaptive PERCEPTRON with high p(y/x) drift. Panel columns: CLASS0 (CIRCLE), CLASS1 (CIRCLE),
CLASS0 (LINE), CLASS1 (LINE).

• Length of the Stream: As the size of the stream
increases, streams with more balanced classes converge
earlier than streams with unbalanced classes. Unlike on
standalone training sets, the length of the stream with
respect to imbalance does not have much impact on
minority class performance. However, as the length of
the stream increases, the performance increases until
learning from the stream saturates.
• Real Drift (i.e., p(y/x)): As the length of the stream
increases, non-adaptive classifiers are much more prone
to p(y/x) drift than adaptive classifiers. This impact
is critical for the moderate degree of imbalance cases,
where the imbalance itself has less impact. For
adaptive learners, the impact of class imbalance on
minority class performance is more critical than that
of real drift.
• Virtual and Real Drift (i.e., p(y) and OCI-CD):
Non-adaptive classifiers are much more prone to both
p(y) and OCI-CD drifts than adaptive learners, whereas
adaptive learners are much more prone to p(y) drifts at
the high degree of imbalance cases. In addition, the
impact of p(y) and OCI-CD drifts varies from learner to
learner and from concept to concept. Learners such as
NB, KNN, and VFDT are prone to both p(y) and OCI-CD
drifts. Further, the impact of dynamic imbalance is more
severe than that of real drift because of the sudden
drop in performance.
• Adaptive and Non-Adaptive learners: Non-adaptive
learners are much more susceptible to drifts (i.e.,
p(y), p(y/x), and OCI-CD) than adaptive learners. The
degree of imbalance, in both its static and dynamic
forms, is the common factor to which both types of
learners are prone. Among all the classifiers
considered, and apart from the high degree of imbalance
cases such as [1:9, 2:8] in both the static and dynamic
settings, the adaptive large-scale SVM with active
learning is not very sensitive to OCI-CD, p(y), and
p(y/x) drift compared to the other online learning
methods. Further, it is not sensitive to either state
of imbalance (static or dynamic). Consistent
performance is observed with respect to all the
parameters considered, even though it was not designed
to handle CD and p(y) changes in evolving streams.
 
 
 
 
 
 
 
 
Figure 8: CLASS 0 and CLASS 1 Recall prequential of DYNAMIC IMBALANCE on LASVM Learner. Here, b
refers to before, a refers to after the drift. (i): LASVM with no p(y/x) drift (Gradual). (ii): LASVM with high p(y/x)
drift (Gradual). (iii): LASVM with no p(y/x) drift (Abrupt). (iv): LASVM with high p(y/x) drift (Abrupt). Panel
columns: CLASS0 (CIRCLE), CLASS1 (CIRCLE), CLASS0 (LINE), CLASS1 (LINE).
 
 
 
 
 
 
 
 
  
Figure 9: Minority class Recall Prequential for KDD CUP for data set sizes [1K-500K]. Panels: (a) 1K, (b) 500K.
 
 
In terms of overall performance, adaptive algorithms
outperformed non-adaptive algorithms because they maintain a
window throughout the learning process, in which older
samples are discarded and only representative samples are
retained. Among the adaptive learners, LASVM outperformed
the other algorithms owing to its adaptability to changing
data and its applicability to imbalanced data sets [58].
Furthermore, LASVM employs active learning of new samples
near the decision boundary.
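The windowing idea behind adaptive learners can be illustrated with a small nearest-neighbour classifier that keeps only the most recent labelled samples. This is a sketch under stated assumptions: the class name `WindowedKNN`, the fixed window size, and the squared Euclidean distance are illustrative choices, and the adaptive learners evaluated in the study may manage their windows differently.

```python
from collections import Counter, deque


class WindowedKNN:
    """k-NN over a fixed-size window of the most recent labelled samples."""

    def __init__(self, k=3, window=100):
        self.k = k
        # A bounded deque drops the oldest sample when a new one arrives.
        self.buffer = deque(maxlen=window)

    @staticmethod
    def _squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    def predict(self, x):
        if not self.buffer:
            return 0  # arbitrary default before any sample is stored
        nearest = sorted(
            self.buffer, key=lambda item: self._squared_distance(item[0], x)
        )[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]

    def learn(self, x, y):
        self.buffer.append((x, y))


knn = WindowedKNN(k=3, window=5)

# Old concept: the region around the origin belongs to class 0.
for _ in range(5):
    knn.learn((0.0, 0.0), 0)
prediction_before = knn.predict((0.1, 0.1))

# After a p(y/x) drift the same region is labelled class 1.
for _ in range(5):
    knn.learn((0.0, 0.0), 1)
prediction_after = knn.predict((0.1, 0.1))
```

Once the window has been refilled with post-drift samples, the old concept no longer influences the prediction, whereas a learner that accumulates all samples would keep voting with the outdated labels.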
From the above discussions the following 
recommendations are made to cope with the combined 
problem of OCI-CD. 
• It is envisaged that, with adaptive methods such as
LASVM that are less sensitive to p(y/x) drifts, the
p(y) drift can be handled dynamically by adapting the
methods that address the class imbalance problem [4].
This sort of solution is viable for the high degree of
imbalance cases (both static and dynamic), where the
impact of p(y/x) drift is masked by the
underperformance of the learner. However, in
environments where tracking of changes is mandatory,
drift detection methods can be implemented, while the
dynamic degree of imbalance can be handled with an
indicator function [38].
• For non-adaptive learners, the combined OCI-CD
problem can be approached by employing drift detection
together with methods that address the class imbalance
problem. First, the class imbalance methods are applied
according to the current degree of imbalance at time t.
Then, drift is detected using drift detection methods
[1, 2]. If no drift is detected, the current model is
updated; otherwise, a new model is learned with the new
sample. In the case of a dynamic degree of imbalance,
however, the p(y) change can be captured dynamically by
an indicator function and then countered adaptively by
the methods that address the class imbalance problem
[3]. This sort of solution is viable for the moderate
degree of imbalance cases (both static and dynamic),
where the p(y/x) drift can be tracked and addressed.
• New drift detection methods need to be developed for
identifying CI, CD, and OCI-CD drifts at the high
degree of imbalance cases, or the existing drift
detection methods need to be fine-tuned for
adaptability. However, change detection methods based
on classification error or performance are themselves
affected by imbalance, in both its static and dynamic
states.
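The indicator-function idea mentioned in the recommendations above can be sketched as a time-decayed estimate of each class's share of the stream, in the spirit of the online class imbalance literature such as [38]. The sketch below is illustrative only: the class name `DecayedClassProportion`, the decay factor `theta = 0.9`, and the deterministic toy stream are assumptions, not settings taken from this study.

```python
class DecayedClassProportion:
    """Track the current share of each class with exponential forgetting.

    For every class c the update is
        share[c] = theta * share[c] + (1 - theta) * [label == c],
    where [.] is the indicator function.
    """

    def __init__(self, classes=(0, 1), theta=0.9):
        self.theta = theta  # decay factor (assumed value)
        self.share = {c: 0.0 for c in classes}

    def update(self, label):
        for c in self.share:
            indicator = 1.0 if label == c else 0.0
            self.share[c] = self.theta * self.share[c] + (1.0 - self.theta) * indicator

    def minority(self):
        """Return the class that currently has the smallest estimated share."""
        return min(self.share, key=self.share.get)


tracker = DecayedClassProportion()

# Phase 1: class 1 appears once in every ten examples (ratio 1:9).
phase_1 = [1 if t % 10 == 9 else 0 for t in range(200)]
# Phase 2: after a p(y) drift the roles are swapped (ratio 9:1).
phase_2 = [0 if t % 10 == 9 else 1 for t in range(200)]

for label in phase_1:
    tracker.update(label)
minority_before = tracker.minority()

for label in phase_2:
    tracker.update(label)
minority_after = tracker.minority()
```

Because old labels are forgotten geometrically, the estimate follows the swap of the class roles within a few dozen examples. A class imbalance method could then consult `tracker.minority()` at time t to decide which class to emphasize, instead of relying on a fixed, predefined minority class.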
10 Conclusion 
This work presents an exploratory study analyzing the
impact of the combined problem of CI (both static and
dynamic) and CD (i.e., p(y/x)). First, the study explores
the impact of the degree of imbalance on the performance of
online learners. It is identified that as the degree of
imbalance increases, the rate at which performance converges
on the stream decreases. Further, balanced streams converge
earlier to their maximum performance than unbalanced
streams. Next, the impact of CD is analyzed over adaptive
and non-adaptive learners. It is noticed that the impact of
real CD is greater on non-adaptive learners than on adaptive
learners. This effect is critical for evolving streams with
a moderate degree of imbalance. For streams with a high
degree of imbalance, the degree of imbalance is more
critical than the CD.
In addition to the above findings, the effects of virtual
drift (i.e., p(y)) and combined drift (i.e., OCI-CD) are
analyzed. It is noticed that non-adaptive learners such as
NB and VFDT are much more prone to both p(y) and OCI-CD
drifts, whereas adaptive classifiers such as NB, KNN, and
VFDT are much more prone to the virtual p(y) drift. Beyond
these findings, it is reported that the large-scale active
learning SVM (LASVM) is not very sensitive to the degree of
imbalance or to the different types of drift, even though it
was not designed to counter the combined problem of virtual
and real drift. This study also presents a few guidelines
for designing online learning algorithms that address the
combined problem of imbalanced evolving streams with CD.
Although LASVM has coped better with the combined problem
than the other learners, it is still prone to p(y) drift at
high degrees of imbalance. Therefore, an enhanced LASVM with
better prediction performance is under study, and a drift
detector that can identify CD in the presence of class
imbalance is also in progress.
 
Acknowledgement 
 
This study was funded by India’s Defence Research and
Development Organisation (DRDO) under the sanction code
ERIPR/GIA/17-18/038. The work was reviewed by the Centre for
Artificial Intelligence and Robotics (CAIR). We would like
to thank the late Dr. T. Maruthi Padmaja, the grant
recipient, for her assistance and support in this work.
 
References 
 
[1] G. Ditzler, M. Roveri, C. Alippi and R. Polikar. 
Learning in nonstationary environments: A survey. 
IEEE Computational intelligence magazine, 10(4): 
12-25, 2015. 
https://doi.org/10.1109/mci.2015.2471196 
 
[2] J. Gama, I. Zliobaite, M. Pechenizkiy and A. 
Bouchachia. A survey on concept drift adaptation. 
ACM Computing Surveys, 46(4): 1-37, 2014.                                                      
https://doi.org/10.1145/2523813 
 
[3] S. Wang, LL. Minku and X. Yao. A systematic study
of online class imbalance learning with concept drift.
IEEE transactions on neural networks and learning
systems, 29(10): 4802-4821, 2018.
https://doi.org/10.1109/tnnls.2017.2771290
 
[4] H. He and EA. Garcia. Learning from imbalanced
data. IEEE transactions on knowledge and data
engineering, 21(9): 1263-1284, 2009.
https://doi.org/10.1109/tkde.2008.239
 
[5] Y. Sun, A. Wong and MS. Kamel. Classification
of imbalanced data: a review. International journal of
pattern recognition and artificial intelligence, 23(4):
687-719, 2009.
https://doi.org/10.1142/s0218001409007326
 
 
 
 
 
[6] K. Morik, P. Brockhausen and T. Joachims. 
Combining statistical learning with a knowledge-
based approach - a case study in intensive care 
monitoring. In proceedings of the 16th international 
conference on machine learning ICML: 268-277, 
1999. 
 
[7] C. Elkan. The foundations of cost sensitive learning, 
In proceedings of intelligence joint conference on 
artificial intelligence (IJCAI’01): 973-978, 2001. 
 
 
[8] X. Liu and Z. Zhou. The influence of class imbalance 
on cost-sensitive learning: An empirical study. In 
sixth international conference on data mining 
(ICDM’06): 970- 974, 2006. 
https://doi.org/10.1109/icdm.2006.158 
 
[9] HJ. Lee and S. Cho. The novelty detection approach 
for difference degrees of class imbalance. In I. King, 
J. Wang, LW. Chan, D. Wang, ed, neural information 
processing, ICONIP 2006, 4233, lecture notes in 
computer science. springer, berlin, Heidelberg: 21-
30, 2006. 
https://doi.org/10.1007/11893257_3 
 
[10] W. Fan, SJ. Stolfo, J. Zhang and PK. Chan. AdaCost:
misclassification cost sensitive boosting. In
proceedings of 16th international conference on
machine learning, Morgan Kaufmann: 97–105, 1999.
 
[11] Y. Sun, MS. Kamel, AK. Wong and Y. Wang. Cost-
sensitive boosting for classification of imbalanced 
data. Pattern recognition, 40(12): 3358-3378, 2007. 
https://doi.org/10.1016/j.patcog.2007.04.009 
 
[12] MV. Joshi, V. Kumar and RC. Agarwal. Evaluating 
boosting algorithms to classify rare classes: 
comparison and improvements. In proceedings 2001 
IEEE international conference on data mining, 257-
264. 
https://doi.org/10.1109/icdm.2001.989527 
 
[13] S. Wang and X. Yao. Diversity analysis on 
imbalanced data sets by using ensemble models. In 
IEEE symposium on computational intelligence and 
data mining CIDM ’09, 324-331, 2009. 
https://doi.org/10.1109/cidm.2009.4938667 
 
[14] X. Liu, J. Wu and Z. Zhou. Exploratory
undersampling for class imbalance learning. IEEE
transactions on systems, man and cybernetics, part
B (cybernetics), 39(2): 539-550, 2009.
https://doi.org/10.1109/tsmcb.2008.2007853
 
 
 
 
 
 
[15] C. Seiffert, TM. Khoshgoftaar, JV. Hulse and A. 
Napolitano. RUSBoost: A hybrid approach to 
alleviating class imbalance, IEEE transactions on 
systems, man, and cybernetics - part A: systems and 
humans, 40(1): 185-197, 2010. 
https://doi.org/10.1109/tsmca.2009.2029559 
 
[16] NV. Chawla, A. Lazarevic, LO. Hall and KW. 
Bowyer. SMOTEBoost: Improving prediction of the 
minority class in boosting. In N. Lavrac, D. 
Gamberger, L. Todorovski and H. Blockeel ed, 
Knowledge discovery in databases: PKDD 2003, 
2838, lecture notes in computer science, springer, 
berlin, heidelberg, 107-119, 2003. 
https://doi.org/10.1007/978-3-540-39804-2_12 
 
[17] H. Guo and HL. Viktor. Learning from
imbalanced data sets with boosting and data
generation: the DataBoost-IM approach. ACM
SIGKDD explorations newsletter, special issue on
learning from imbalanced datasets, 6(1): 30-39, 2004.
https://doi.org/10.1145/1007730.1007736
 
[18] S. Hido, H. Kashima and Y. Takahashi. Roughly 
balanced bagging for imbalanced data, 2: 412-426, 
2009. 
https://doi.org/10.1002/sam.10061 
 
[19] H. Blaszczynski and J. Stefanowski. Neighborhood 
sampling in bagging for imbalanced data. 
Neurocomputing, 184-203, 2015. 
 
[20] I. Khamassi, MS. Mouchaweh, M. Hammami and K. 
Ghedira. Discussion and review on evolving data 
streams and concept drift adapting. Evolving 
Systems, 9: 1-23, 2018. 
https://doi.org/10.1007/s12530-016-9168-2 
 
[21] JP. Patist. Optimal window change detection. In 
seventh IEEE international conference on data 
mining workshops (ICDMW 2007), 557-562, 2007. 
https://doi.org/10.1109/icdmw.2007.9 
 
[22] K. Nishida and K. Yamauchi. Detecting concept drift
using statistical testing. In V. Corruble, M. Takeda
and E. Suzuki ed, Discovery Science, 4755, lecture
notes in computer science, Springer, Berlin,
Heidelberg, 264-269, 2007.
https://doi.org/10.1007/978-3-540-75488-6_27
 
[23] DM. Hawkins, P. Qiu and CW. Kang. The
change point model for statistical process control.
Journal of quality technology, 35(4): 355-366, 2003.
 
[24] A. Wald. Sequential tests of statistical hypotheses. In 
S. Kotz and NL. Johnson ed. Breakthroughs in 
statistics, springer series in statistics (perspectives in 
statistics), new york, NY, 1992 
https://doi.org/10.1007/978-1-4612-0919-5_18 
 
[25] S. Micevska, A. Awad and S. Sakr. SDDM: An 
interpretable statistical concept drift detection 
method for data streams. Journal of intelligent 
information systems, 56: 459-484, 2021. 
https://doi.org/10.1007/s10844-020-00634-5 
 
[26] P. Li, Wu. Man, He. Junhong and Hu. Xuegang. 
Recurring drift detection and model selection-based 
ensemble classification for data streams with 
unlabeled data. New generation computing, 39: 341-
376, 2021. 
 
[27] S. Wang, LL. Minku, D. Ghezzi, D. Caltabiano, P. 
Tino and X. Yao. Concept drift detection for online 
class imbalance learning. In international joint 
conference on neural networks, 1-8, 2013. 
https://doi.org/10.1109/ijcnn.2013.6706768 
 
[28] H. Wang, Z. Abraham. Concept drift detection for 
streaming data. In international joint conference of 
neural networks, 1-9,2015. 
https://doi.org/10.1109/ijcnn.2015.7280398 
 
[29] D. Brzezinski and J. Stefanowski. Prequential auc for 
classifier evaluation and drift detection in evolving 
data streams. New frontiers in mining complex 
patterns, 8983, 87-101, 2015. 
https://doi.org/10.1007/978-3-319-17876-9_6 
 
[30] S. Wang and LL. Minku. AUC estimation and 
concept drift detection for imbalanced data streams 
with multiple classes. In 2020 international joint 
conference on neural networks (IJCNN), 1-8, 2020. 
 
[31] MM. Idrees, LL. Minku, F. Stahl and A. Badii. A 
heterogeneous online learning ensemble for non-
stationary environments. Knowledge-based systems, 
188, 104983, 2020. 
https://doi.org/10.1016/j.knosys.2019.104983 
 
[32] J. Gao, W. Fan, J. Han and P. Yu. A general
framework for mining concept drifting data streams
with skewed distributions. In proceedings of the
2007 SIAM international conference on data mining
(SDM), 3-14, 2007.
https://doi.org/10.1137/1.9781611972771.1
 
[33] S. Chen and H. He. SERA: Selectively recursive
approach towards nonstationary imbalanced stream 
data mining. In 2009 international joint conference 
on neural networks, 522-529, 2009. 
https://doi.org/10.1109/ijcnn.2009.5178874 
 
[34] R. Lichtenwalter and NV. Chawla. Adaptive methods 
for classification in arbitrarily imbalanced and 
drifting data streams. In T. Theeramunkong et al ed. 
new frontiers in applied data mining, PAKDD, 5669, 
Springer, Berlin, Heidelberg, 53-75, 2009. 
https://doi.org/10.1007/978-3-642-14640-4_5 
 
[35] T. Ryan Hoens and NV. Chawla. Learning in non-
stationary environments with class imbalance. In 
18th ACM SIGKDD international conference on 
knowledge discovery and data mining, 168-176, 
2012. 
https://doi.org/10.1145/2339530.2339558 
[36] TR. Hoens, NV. Chawla, R. Polikar. Heuristic 
updatable weighted random subspaces for non-
stationary environments. In 2011 IEEE 11th 
international conference on data mining, 241-250, 
2011. 
https://doi.org/10.1109/icdm.2011.75 
[37] G. Ditzler and R. Polikar. Incremental learning of 
concept drift from streaming imbalanced data. IEEE 
transactions on knowledge and data engineering, 
25(10): 2283-2301, 2013. 
https://doi.org/10.1109/tkde.2012.136 
 
[38] S. Wang, LL. Minku and X. Yao. Resampling-based 
ensemble methods for online class imbalance 
learning. IEEE transactions on knowledge and data 
engineering, 27(5): 1356-1368, 2015. 
https://doi.org/10.1109/tkde.2014.2345380 
[39] A. Ghazikhani, R. Monsefi and YH. Sadoghi. 
Recursive least square perceptron model for non-
stationary and imbalanced data stream classification. 
Evolving Systems, 4: 119–131, 2013. 
https://doi.org/10.1007/s12530-013-9076-7 
[40] A. Ghazikhani, R. Monsefi and YH. Sadoghi. Online 
neural network model for nonstationary and 
imbalanced data stream classification. International 
journal of machine learning and cybernetics, 5(1): 
51-62, 2014. 
https://doi.org/10.1007/s13042-013-0180-6 
[41] B. Mirza, Z. Lin and N. Liu. Ensemble of subset
online sequential extreme learning machine for class 
imbalance and concept drift. Neurocomputing, 149: 
316-329, 2015. 
https://doi.org/10.1016/j.neucom.2014.03.075 
[42] S. Barua, MM. Islam and K. Murase. GOSIL: A
generalized over-sampling based online imbalanced
learning framework. In S. Arik, T. Huang, W. Lai
and Q. Liu ed, Neural information processing,
ICONIP, Lecture notes in computer science,
Springer, Cham, 9489, 2015.
https://doi.org/10.1007/978-3-319-26532-2_75
[43] LU. Yang, Y. Cheung and Y. Tang. Dynamic
weighted majority for incremental learning of 
imbalanced data streams with concept drift. In 
proceedings of the twenty-sixth international joint 
conference on artificial intelligence, 2393-2399, 
2017. 
https://doi.org/10.24963/ijcai.2017/333 
[44] LU. Yang, Y. Cheung and Y. Tang. Adaptive
chunk-based dynamic weighted majority for 
imbalanced data streams with concept drift. IEEE 
transactions on neural networks and learning 
systems, 31(8): 2764-2778, 2020 
https://doi.org/10.1109/tnnls.2019.2951814 
 
[45] A. Tharwat and W. Schenck. Balancing
exploration and exploitation: a novel active learner 
for imbalanced data. Knowledge based systems, 210, 
2020. 
https://doi.org/10.1016/j.knosys.2020.106500 
[46] L. Korycki, A. Cano and B. Krawczyk. Active
learning with abstaining classifiers for imbalanced 
drifting data streams. In 2019 IEEE international 
conference on big data (big data), 2334-2343, 2019. 
https://doi.org/10.1109/bigdata47090.2019.9006453 
[47] L. Korycki and B. Krawczyk. Online oversampling 
for sparsely labeled imbalanced and non-stationary 
data streams. In 2020 international joint conference 
on neural networks (IJCNN), 1-8, 2020. 
https://doi.org/10.1109/ijcnn48605.2020.9207118 
[48] M. Jena and D. Satchidananda. Decision tree for 
classification and regression: a state-of-the art 
review, informatica, 44: 405-420, 2019. 
https://doi.org/10.31449/inf.v44i4.3023 
[49] R. Saifan, K. Sharif, M. Abu-Ghazaleh and M. Abdel-
Majeed. Investigating algorithmic stock market 
trading using ensemble machine learning methods. 
Informatica, 44: 311-325, 2020. 
https://doi.org/10.31449/inf.v44i3.2904 
[50] EB, Francis, Q. Zhen, K. Hughes-Lartey and EA. 
Kwame. Predicting fraud in mobile money 
transactions using machine learning: the effects of 
sampling techniques on the imbalanced dataset. 
Informatica, 45: 45-46, 2021. 
https://doi.org/10.31449/inf.v45i7.3179 
[51] AA. Ali and AS. Fakhreldeen. A Comparative 
analysis of machine learning algorithms to build a 
predictive model for detecting diabetes 
complications. Informatica, 45: 117-125, 2021. 
https://doi.org/10.31449/inf.v45i1.3111  
[52] A. Bifet, G. Holmes, R. Kirkby and B. Pfahringer. 
MOA: Massive online analysis, Journal of Machine 
Learning Research, 11: 1601-1604, 2010. 
[53] P. Domingos and G. Hulten. Mining high-speed data
streams. In proceedings of 6th ACM SIGKDD
international conference on knowledge discovery and
data mining, Boston, MA USA, 71-80, 2000.
https://doi.org/10.1145/347090.347107
[54] A. Bordes, S. Ertekin, J. Weston and L. Bottou. Fast
kernel classifiers with online and active learning.
Journal of Machine Learning Research, 6: 1579-1619,
2005.
[55] C. Chang and CJ. Lin. LIBSVM A library for support 
vector machines. ACM transactions on intelligent 
systems and technology, 2(3), 2011. 
[56] L. Pierre-Xavier. Adaptive machine learning 
algorithms for data streams subject to concept drifts, 
Ph. D Thesis., University pierre at marie curie, Paris 
VI, 2017. 
[57] JC. Platt. Sequential minimal optimization: A
fast algorithm for training support vector machines.
Technical report MSR-TR-98-14, 1998.
[58] S. Ertekin, J. Huang and C. Lee Giles. Active learning
for class imbalance problem. In conference on research
and development in information retrieval,
Netherlands, 823-824, 2007.
https://doi.org/10.1145/1277741.1277927
 
 
[59] LL. Minku, AP. White and X. Yao. The impact
of diversity on on-line ensemble learning in the 
presence of concept drift. IEEE transactions on 
knowledge and data engineering, 22(5): 730-742, 
2010. 
https://doi.org/10.1109/tkde.2009.156 
[60] D. Dua and K. Taniskidou. UCI machine learning 
repository. 
http://kdd.ics.uci.edu/databases/kddcup99.html  
[61] N. Japkowicz and S. Stephen. The class imbalance 
problem: A systematic study. Intelligent data 
analysis, 6(5): 429-449, 2002. 
https://doi.org/10.3233/ida-2002-6504 
[62] TS. Sethi and M. Kantardzic. On the reliable
detection of concept drift from streaming unlabeled 
data. Expert systems with applications: an 
international journal, 82(c): 77-99, 2017. 
https://doi.org/10.1016/j.eswa.2017.04.008 
[63] RC. Prati, GEAPA. Batista and MC. Monard. Class 
imbalances versus class overlapping: An analysis of 
a learning system behavior. In R. Monroy, G. 
Arroyo-Figueroa, LE. Sucar and H. Sossa ed, 
MICAI: Advances in Artificial Intelligence, 2972, 
Lecture notes in computer science, springer, Berlin, 
Heidelberg, 312-321, 2004. 
https://doi.org/10.1007/978-3-540-24694-7_32   
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 