Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems - DEBS '16
Data Stream Processing (DSP) applications are widely used to extract information in a timely manner from distributed data sources, such as sensing devices, monitoring stations, and social networks. To handle this ever-increasing amount of data, recent work investigates exploiting decentralized computational resources (e.g., Fog computing) for application placement. Several placement policies have been proposed in the literature, but they rest on different assumptions and optimization goals and, as such, are not fully comparable with each other. In this paper, we study the placement problem for distributed DSP applications. Our contributions are twofold. First, we provide a general formulation of the optimal DSP placement (ODP, for short) as an Integer Linear Programming problem that explicitly takes into account the heterogeneity of computing and networking resources and that encompasses, as special cases, the different solutions proposed in the literature. Second, we present an ODP-based scheduler for the Apache Storm DSP framework, which allows us to compare some well-known centralized and decentralized placement solutions. We also extensively analyze the scalability of ODP with respect to various parameter settings.