Dissemin is shutting down on January 1st, 2025

Published in

VLDB Endowment, Proceedings of the VLDB Endowment, 5(13), p. 656-669, 2020

DOI: 10.14778/3377369.3377375

Links

Tools

Export citation

Search in Google Scholar

Hunting multiple bumps in graphs

Journal article published in 2020 by Yahui Sun, Jun Luo, Theodoros Lappas, Xiaokui Xiao, Bin Cui ORCID
This paper is made freely available by the publisher.
This paper is made freely available by the publisher.

Full text: Download

Green circle
Preprint: archiving allowed
Green circle
Postprint: archiving allowed
Green circle
Published version: archiving allowed
Data provided by SHERPA/RoMEO

Abstract

Bump hunting is an important approach to the extraction of insights from Euclidean datasets. Recently, it has been explored for graph datasets for the first time, and a single bump is hunted in an unweighted graph in this exploration. Here, we extend this exploration by hunting multiple bumps in a weighted graph. Given a weighted graph and a set of query nodes exhibiting a property of interest, our objective is to find k non-overlapping and connected subgraphs, i.e., bumps, in which the discrepancy between the numbers of query and non-query nodes is maximized and the sum of edge costs is minimized simultaneously. We prove that our extended bump hunting problem can be transformed to a recently formulated Prize-Collecting Steiner Forest Problem (PCSFP). We further prove that PCSFP is NP-hard even in trees. Then, we propose a fast approximation algorithm for solving PCSFP in trees. Based on this algorithm, we improve the state-of-the-art approximation algorithm for solving PCSFP in graphs, and prove that the solutions of our improvement are always better than or equal to those of the state-of-the-art algorithm. Moreover, we adapt the existing bump hunting algorithms for solving our extended bump hunting problem. We evaluate our methodology via real datasets, and show that 1) our improvement scales well to large graphs, while producing solutions that dominate those of the state-of-the-art algorithm; and 2) our adaptation of an existing bump hunting algorithm can also produce solutions that are better than those of the state-of-the-art algorithm in some cases.