Viktor Losing, Barbara Hammer, Heiko Wersing,
"Tackling Heterogeneous Concept Drift with the Self Adjusting Memory (SAM)",
Knowledge and Information Systems, vol. 54, no. 1, pp. 171-201, 2018.
Data Mining in non-stationary data streams is particularly relevant in the context of Internet of Things and Big Data. Its challenges arise from fundamentally different drift types violating assumptions of data independence or stationarity. Available methods often struggle with certain forms of drift or require unavailable a priori task knowledge. We propose the Self Adjusting Memory (SAM) model for the k Nearest Neighbor (kNN) algorithm. SAM-kNN can deal with heterogeneous concept drift, i.e different drift types and rates, using biologically inspired memory models and their coordination. Its basic idea is to have dedicated models for current and former concepts used according to the demands of the given situation. It can be easily applied in practice without meta parameter optimization.
We conduct an extensive evaluation on various benchmarks, consisting of artificial streams with known drift characteristics and real world datasets. We explicitly add new benchmarks enabling a precise performance analysis on multiple types of drift. Highly competitive results throughout all experiments underline the robustness of SAM-kNN as well as its capability to handle heterogeneous concept drift.
Knowledge about drift characteristics in streaming data is not only crucial for a precise algorithm evaluation, but it also facilitates the choice of an appropriate algorithm on real world applications. Therefore, we additionally propose two tests, able to determine the type and strength of drift. We extract the drift characteristics of all utilized datasets and use it for our analysis of the SAM in relation to other methods.
Download Bibtex file