Urio-Larrea, Asier and ANDREU-PEREZ, Javier and Pereira Dimuro, Gracaliz (2024) Data stream clustering: introducing recursively extendable aggregation functions for incremental cluster fusion processes. IEEE Transactions on Cybernetics. (In Press)
Abstract
In Data Stream (DS) learning, the system has to extract knowledge from data generated continuously, usually at high speed and in large volumes, making it impossible to store the entire data set for processing in batch mode. Hence, machine learning models must be built incrementally, processing the incoming examples as they arrive and updating the model so that it remains compatible with the current data. In fuzzy DS clustering, the model can either absorb incoming data into existing clusters or initiate a new cluster. As the volume of data increases, clusters may overlap to the point where it is convenient to merge two or more of them into one. A cluster comparison measure (CM) should then be applied, also in an incremental manner, to decide whether such clusters should be combined. This defines an incremental fusion process based on aggregation functions that can aggregate the incoming inputs without storing all the previous inputs. The objective of this paper is to solve, on a formal basis, the fuzzy DS clustering problem of incrementally comparing fuzzy clusters. First, we formalize and operationalize incremental fusion processes of fuzzy clusters by introducing Recursively Extendable (RE) aggregation functions, studying construction methods and different classes of such functions. Second, we propose two approaches to compare clusters, similarity and overlapping between clusters, both based on RE aggregation functions. Finally, we analyze the effect of those incremental CMs on the online and offline phases of the well-known fuzzy clustering algorithm d-FuzzStream, showing that our new approach outperforms the original algorithm and presents better or comparable performance to other state-of-the-art DS clustering algorithms found in the literature.
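The idea of aggregating incoming inputs without storing the previous ones can be illustrated with a minimal Python sketch. It is not the paper's definition of RE aggregation functions, nor one of its comparison measures: it only uses two standard aggregation functions, the arithmetic mean and the maximum, that happen to admit such an update. The names `RunningMean` and `running_max` are hypothetical, and the input values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunningMean:
    """Arithmetic mean updated one input at a time (hypothetical helper)."""

    count: int = 0      # number of inputs seen so far
    value: float = 0.0  # mean of the inputs seen so far

    def update(self, x: float) -> float:
        # The new mean depends only on the previous mean, the count,
        # and the new input; earlier inputs are never stored.
        self.count += 1
        self.value += (x - self.value) / self.count
        return self.value


def running_max(previous: float, x: float) -> float:
    """Maximum of all inputs so far, given the previous maximum and a new input."""
    return max(previous, x)


if __name__ == "__main__":
    # Made-up stream of values in [0, 1], e.g. membership-like degrees.
    stream = [0.2, 0.8, 0.5, 0.9]
    mean = RunningMean()
    current_max = 0.0  # 0.0 is a lower bound for values in [0, 1]
    for x in stream:
        mean.update(x)
        current_max = running_max(current_max, x)
    print(mean.value, current_max)
```

In both cases the state kept between updates has constant size, which is the property that makes an aggregation usable on an unbounded stream; the mean needs the count in addition to the previous value, while the maximum needs the previous value only.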
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Data streams, fuzzy clustering, similarity measures, overlap indices, aggregation functions |
Divisions: | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
Depositing User: | Unnamed user with email elements@essex.ac.uk |
Date Deposited: | 03 Dec 2024 16:24 |
Last Modified: | 03 Dec 2024 17:05 |
URI: | http://repository.essex.ac.uk/id/eprint/39682 |
Available files
Filename: Preprint_IEEE_Trans_Cybernetics.pdf
Licence: Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0