Yang, Xinan and Vernitski, Alexei and Carrea, Laura (2016) An approximate dynamic programming approach for improving accuracy of lossy data compression by Bloom filters. European Journal of Operational Research, 252 (3). pp. 985-994. DOI https://doi.org/10.1016/j.ejor.2016.01.042
Yang, Xinan and Vernitski, Alexei and Carrea, Laura (2016) An approximate dynamic programming approach for improving accuracy of lossy data compression by Bloom filters. European Journal of Operational Research, 252 (3). pp. 985-994. DOI https://doi.org/10.1016/j.ejor.2016.01.042
Yang, Xinan and Vernitski, Alexei and Carrea, Laura (2016) An approximate dynamic programming approach for improving accuracy of lossy data compression by Bloom filters. European Journal of Operational Research, 252 (3). pp. 985-994. DOI https://doi.org/10.1016/j.ejor.2016.01.042
Abstract
Bloom filters are a data structure for storing data in a compressed form. They offer excellent space and time efficiency at the cost of some loss of accuracy (so-called lossy compression). This work presents a yes–no Bloom filter, which as a data structure consisting of two parts: the yes-filter which is a standard Bloom filter and the no-filter which is another Bloom filter whose purpose is to represent those objects that were recognized incorrectly by the yes-filter (that is, to recognize the false positives of the yes-filter). By querying the no-filter after an object has been recognized by the yes-filter, we get a chance of rejecting it, which improves the accuracy of data recognition in comparison with the standard Bloom filter of the same total length. A further increase in accuracy is possible if one chooses objects to include in the no-filter so that the no-filter recognizes as many as possible false positives but no true positives, thus producing the most accurate yes–no Bloom filter among all yes–no Bloom filters. This paper studies how optimization techniques can be used to maximize the number of false positives recognized by the no-filter, with the constraint being that it should recognize no true positives. To achieve this aim, an Integer Linear Program (ILP) is proposed for the optimal selection of false positives. In practice the problem size is normally large leading to intractable optimal solution. Considering the similarity of the ILP with the Multidimensional Knapsack Problem, an Approximate Dynamic Programming (ADP) model is developed making use of a reduced ILP for the value function approximation. Numerical results show the ADP model works best comparing with a number of heuristics as well as the CPLEX built-in solver (B&B), and this is what can be recommended for use in yes–no Bloom filters. In a wider context of the study of lossy compression algorithms, our research is an example showing how the arsenal of optimization methods can be applied to improving the accuracy of compressed data.
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Lossy compression; Bloom filter; Integer linear program; Approximate dynamic programming; Heuristics |
Subjects: | Q Science > QA Mathematics |
Divisions: | Faculty of Science and Health Faculty of Science and Health > Computer Science and Electronic Engineering, School of Faculty of Science and Health > Mathematics, Statistics and Actuarial Science, School of |
SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
Depositing User: | Unnamed user with email elements@essex.ac.uk |
Date Deposited: | 01 Feb 2016 12:32 |
Last Modified: | 16 May 2024 16:50 |
URI: | http://repository.essex.ac.uk/id/eprint/16006 |
Available files
Filename: YangVernitskiCarrea-revision2_AV_18012016.pdf
Licence: Creative Commons: Attribution-Noncommercial-No Derivative Works 3.0