Creating and sharing knowledge for telecommunications

Bloom Filter based Optimization Scheme for Massive Data Handling in IoT Environment

Singh, A.; Garg, S.; Batra, S.; Kumar, N.; Rodrigues, J.

Future Generation Computer Systems, Vol. 82, pp. 440-449, May 2018.

ISSN (print): 0167-739X
ISSN (online):

Journal Impact Factor: 2.786 (in 2014)

Digital Object Identifier: 10.1016/j.future.2017.12.016

Abstract
With the widespread popularity of big data usage across various applications, the need for efficient storage, processing, and retrieval of the massive datasets generated by different applications has become inevitable. Handling these datasets has become one of the biggest challenges for the research community because of the heterogeneity of their formats, which can be attributed to their diverse sources of generation, ranging from sensor readings to online transaction data and social media activity. In this direction, probabilistic data structures (PDS) are well suited to large-scale data processing, approximate prediction, fast retrieval, and unstructured data storage. In conventional databases, the entire dataset must be held in memory for efficient processing, but applications involving real-time in-stream data demand time-bound query output in a single pass. Hence, this paper proposes the Accommodative Bloom filter (ABF), a variant of the scalable Bloom filter in which bulk data is inserted by adding new filters vertically. An array of m bits is divided into b buckets of m/b bits each, and new filters of size m/b are added to each bucket to accommodate the incoming data. Data generated from various sensors has been used for experimental purposes, where query processing is done at two levels to improve accuracy and reduce search time. It has been found that the insertion and search time complexity of ABF does not increase with the number of elements. Further, the results indicate that ABF outperforms existing variants of the Bloom filter in terms of false positive rate and query complexity, especially when dealing with in-stream data.
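The bucketed, vertically growing design the abstract describes can be sketched roughly as follows. This is a hypothetical Python illustration, not the authors' implementation: the per-filter size m/b, the saturation threshold that triggers adding a new filter, and the hash scheme are all assumptions made for the sketch.

```python
import hashlib

def _hashes(item, k, size, salt):
    """Derive k bit positions for `item` in a filter of `size` bits."""
    return [
        int(hashlib.sha256(f"{salt}:{i}:{item}".encode()).hexdigest(), 16) % size
        for i in range(k)
    ]

class AccommodativeBloomFilter:
    """Illustrative ABF-like structure: an m-bit array split into b buckets
    of m/b bits each; when a bucket's newest filter saturates, a fresh
    m/b-bit filter is stacked onto that bucket (vertical growth)."""

    def __init__(self, m=1024, b=8, k=3, max_fill=0.5):
        assert m % b == 0, "m must split evenly into b buckets"
        self.size = m // b            # m/b bits per filter (assumed)
        self.b, self.k = b, k
        self.max_fill = max_fill      # assumed saturation threshold
        # level 1: b buckets; level 2: a stack of filters per bucket
        self.buckets = [[[0] * self.size] for _ in range(b)]

    def _bucket_index(self, item):
        # level-1 lookup: route the item to one of the b buckets
        return int(hashlib.md5(str(item).encode()).hexdigest(), 16) % self.b

    def add(self, item):
        filters = self.buckets[self._bucket_index(item)]
        current = filters[-1]
        # stack a new m/b-bit filter once the newest one saturates
        if sum(current) / self.size >= self.max_fill:
            current = [0] * self.size
            filters.append(current)
        for pos in _hashes(item, self.k, self.size, salt=len(filters)):
            current[pos] = 1

    def __contains__(self, item):
        filters = self.buckets[self._bucket_index(item)]
        # level-2 lookup: scan only the filters stacked in this bucket
        for depth, f in enumerate(filters, start=1):
            if all(f[p] for p in _hashes(item, self.k, self.size, salt=depth)):
                return True
        return False
```

Because a query touches only the filters stacked within one bucket rather than the whole structure, lookup cost depends on the bucket's stack depth, which matches the abstract's claim that search time does not grow with the total number of inserted elements.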