Bottom-up k-Vertex Connected Component Enumeration by Multiple Expansion

IEEE International Conference on Data Engineering (ICDE) 2024

 

Haoyu Liu, Yongcai Wang, Xiaojia Xu, Deying Li

School of Information, Renmin University of China, Beijing, 100872

 

image-20240529183359317 image-20240529183422179 image-20240529183517490

 

 

Overview

Bottom-up k-vertex connected component (k-VCC) enumeration methods, referred to as VCCE-BU, have exhibited better efficiency compared to the exact top-down k-VCC enumeration method (VCCE-TD). However, VCCE-BU}has been found to have surprisingly low detection accuracy, that it may detect fewer vertices in k-VCC than VCCE-TD. This raises the question of what causes VCCE-BU to have a low k-VCC enumeration quality.

This paper investigates the reason and proposes that the local expansion should be reformulated as a Multiple vertex collaborative Expansion problem instead of the traditional Unitary Expansion (UE). A Multiple Expansion (ME) approach, which allows to expand multiple neighboring vertices jointly and collaboratively is proposed, which is proven exact in local expansion. However, the exact ME-based local expansion needs to explore large neighborhoods in each step, which is time-consuming. To address the efficiency issue, a Ring-based Multiple Expansion (RME) is proposed to conduct ME within one-hop neighbors. A maximum flow-based merging algorithm FBM is proposed for effective merging. A maximal clique and breath-first-search-based quick seeding algorithm QkVCS is proposed to generate k-VCC seeds efficiently. As a result, RIPPLE which integrates QkVCS+FBM+RME is presented as a new accurate and efficient bottom-up approach. Extensive verifications in real large-scale graph datasets demonstrate that even the single-thread RIPPLE is much more accurate and a magnitude faster than the state-of-the-art VCCE-BU method. We also demonstrate the effective speeding up to run RIPPLE in parallel.

System Architecture

We first use QkVCS to generate a large number of seed subgraphs in the graph, and these seeds are connected by k vertices. We then use the FBM algorithm and the RME algorithm for multiple rounds of iterations. FBM determines whether the two seeds can be merged into a larger k-vertex connected subgraph; RME tries to add more vertices to the seed subgraph. Until all seeds are no longer growing, RIPPLE outputs the results.

fig1

Contributions

 

Evaluations

image-20240530163844315

image-20240530163828219

 

image-20240530163852013

 

image-20240530163857732

image-20240530163903645

image-20240530163911266

image-20240530163917177

Acknowledgment

This work was partially supported by the National Natural Science Foundation of China Grant No. 61972404, No. 12071478, and Public Computing Cloud, Renmin University of China.