We study the graphlet sampling problem: given an integer ≥ 3 and a graph = ( , ), sample a connected induced -node subgraph of (also called -graphlet) uniformly at random. This is a fundamental graph mining primitive, with applications in social network analysis and bioinformatics. The two state-of-the-art techniques are random walks and color coding. The random walk is elegant, but the current upper bounds and lower bounds on its mixing time suffer a gap of Δ −1 where Δ is the maximum degree of . Color coding is better understood, but requires a 2 ()-time preprocessing over the entire graph. Moreover, no efficient algorithm is known for sampling graphlets uniformly — random walks and color coding yield only -uniform samples. In this work, we provide the following results: (i) A near-optimal mixing time bound for the classic -graphlet random walk, as a function of the mixing time of . In particular, ignoring () factors, we show that the -graphlet random walk mixes in Θe(() · () −1 ) steps, where () is the mixing time of and () is the ratio between its maximum and minimum degree, and on some graphs this is tight up to poly lg factors. (ii) The first efficient algorithm for uniform graphlet sampling. The algorithm has a preprocessing phase that uses time (2 lg +) and space (), and a sampling phase that takes () lg Δ time per sample. It is based on ordering in a simple way, so to virtually partition the graphlets into buckets, and then sampling from those buckets using rejection sampling. The algorithm can be used also for counting, with additive guarantees. (iii) A near-optimal algorithm for-uniform graphlet sampling, with a preprocessing phase that runs in time ( 6 −1 lg) and space (), and a sampling phase that takes () 1 10 lg 1 expected time per sample. The algorithm is based on a nontrivial sketching of the ordering of, followed by emulating uniform sampling through coupling arguments.
Efficient and near-optimal algorithms for sampling connected subgraphs / M. Bressan - In: STOC 2021: Proceedings / [a cura di] S. Khuller, V. Vassilevska Williams. - [s.l] : ACM, 2021. - ISBN 9781450380539. - pp. 1132-1143 (( Intervento presentato al 53. convegno Symposium on Theory of Computing tenutosi a Roma nel 2021.
|Titolo:||Efficient and near-optimal algorithms for sampling connected subgraphs|
BRESSAN, MARCO (Primo) (Corresponding)
|Parole Chiave:||graphlet sampling; subgraph sampling; sublinear algorithms; random walks|
|Settore Scientifico Disciplinare:||Settore INF/01 - Informatica|
Settore ING-INF/05 - Sistemi di Elaborazione delle Informazioni
|Data di pubblicazione:||2021|
|Digital Object Identifier (DOI):||http://dx.doi.org/10.1145/3406325.3451042|
|Tipologia:||Book Part (author)|
|Appare nelle tipologie:||03 - Contributo in volume|