We study the graphlet sampling problem: given an integer ≥ 3 and a graph = ( , ), sample a connected induced -node subgraph of (also called -graphlet) uniformly at random. This is a fundamental graph mining primitive, with applications in social network analysis and bioinformatics. The two state-of-the-art techniques are random walks and color coding. The random walk is elegant, but the current upper bounds and lower bounds on its mixing time suffer a gap of Δ −1 where Δ is the maximum degree of . Color coding is better understood, but requires a 2 ()-time preprocessing over the entire graph. Moreover, no efficient algorithm is known for sampling graphlets uniformly — random walks and color coding yield only -uniform samples. In this work, we provide the following results: (i) A near-optimal mixing time bound for the classic -graphlet random walk, as a function of the mixing time of . In particular, ignoring () factors, we show that the -graphlet random walk mixes in Θe(() · () −1 ) steps, where () is the mixing time of and () is the ratio between its maximum and minimum degree, and on some graphs this is tight up to poly lg factors. (ii) The first efficient algorithm for uniform graphlet sampling. The algorithm has a preprocessing phase that uses time (2 lg +) and space (), and a sampling phase that takes () lg Δ time per sample. It is based on ordering in a simple way, so to virtually partition the graphlets into buckets, and then sampling from those buckets using rejection sampling. The algorithm can be used also for counting, with additive guarantees. (iii) A near-optimal algorithm for-uniform graphlet sampling, with a preprocessing phase that runs in time ( 6 −1 lg) and space (), and a sampling phase that takes () 1 10 lg 1 expected time per sample. The algorithm is based on a nontrivial sketching of the ordering of, followed by emulating uniform sampling through coupling arguments.
Efficient and near-optimal algorithms for sampling connected subgraphs / M. Bressan - In: STOC 2021: Proceedings / [a cura di] S. Khuller, V. Vassilevska Williams. - [s.l] : ACM, 2021. - ISBN 9781450380539. - pp. 1132-1143 (( Intervento presentato al 53. convegno Symposium on Theory of Computing tenutosi a Roma nel 2021 [10.1145/3406325.3451042].
Efficient and near-optimal algorithms for sampling connected subgraphs
M. Bressan
Primo
2021
Abstract
We study the graphlet sampling problem: given an integer ≥ 3 and a graph = ( , ), sample a connected induced -node subgraph of (also called -graphlet) uniformly at random. This is a fundamental graph mining primitive, with applications in social network analysis and bioinformatics. The two state-of-the-art techniques are random walks and color coding. The random walk is elegant, but the current upper bounds and lower bounds on its mixing time suffer a gap of Δ −1 where Δ is the maximum degree of . Color coding is better understood, but requires a 2 ()-time preprocessing over the entire graph. Moreover, no efficient algorithm is known for sampling graphlets uniformly — random walks and color coding yield only -uniform samples. In this work, we provide the following results: (i) A near-optimal mixing time bound for the classic -graphlet random walk, as a function of the mixing time of . In particular, ignoring () factors, we show that the -graphlet random walk mixes in Θe(() · () −1 ) steps, where () is the mixing time of and () is the ratio between its maximum and minimum degree, and on some graphs this is tight up to poly lg factors. (ii) The first efficient algorithm for uniform graphlet sampling. The algorithm has a preprocessing phase that uses time (2 lg +) and space (), and a sampling phase that takes () lg Δ time per sample. It is based on ordering in a simple way, so to virtually partition the graphlets into buckets, and then sampling from those buckets using rejection sampling. The algorithm can be used also for counting, with additive guarantees. (iii) A near-optimal algorithm for-uniform graphlet sampling, with a preprocessing phase that runs in time ( 6 −1 lg) and space (), and a sampling phase that takes () 1 10 lg 1 expected time per sample. The algorithm is based on a nontrivial sketching of the ordering of, followed by emulating uniform sampling through coupling arguments.File | Dimensione | Formato | |
---|---|---|---|
STOC_21_camera_ready CORRECTED 2.pdf
accesso riservato
Tipologia:
Post-print, accepted manuscript ecc. (versione accettata dall'editore)
Dimensione
659.8 kB
Formato
Adobe PDF
|
659.8 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
3406325.3451042.pdf
accesso riservato
Tipologia:
Publisher's version/PDF
Dimensione
803.04 kB
Formato
Adobe PDF
|
803.04 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.