We study the graphlet sampling problem: given an integer ≥ 3 and a graph = ( , ), sample a connected induced -node subgraph of (also called -graphlet) uniformly at random. This is a fundamental graph mining primitive, with applications in social network analysis and bioinformatics. The two state-of-the-art techniques are random walks and color coding. The random walk is elegant, but the current upper bounds and lower bounds on its mixing time suffer a gap of Δ −1 where Δ is the maximum degree of . Color coding is better understood, but requires a 2 ()-time preprocessing over the entire graph. Moreover, no efficient algorithm is known for sampling graphlets uniformly — random walks and color coding yield only -uniform samples. In this work, we provide the following results: (i) A near-optimal mixing time bound for the classic -graphlet random walk, as a function of the mixing time of . In particular, ignoring () factors, we show that the -graphlet random walk mixes in Θe(() · () −1 ) steps, where () is the mixing time of and () is the ratio between its maximum and minimum degree, and on some graphs this is tight up to poly lg factors. (ii) The first efficient algorithm for uniform graphlet sampling. The algorithm has a preprocessing phase that uses time (2 lg +) and space (), and a sampling phase that takes () lg Δ time per sample. It is based on ordering in a simple way, so to virtually partition the graphlets into buckets, and then sampling from those buckets using rejection sampling. The algorithm can be used also for counting, with additive guarantees. (iii) A near-optimal algorithm for-uniform graphlet sampling, with a preprocessing phase that runs in time ( 6 −1 lg) and space (), and a sampling phase that takes () 1 10 lg 1 expected time per sample. The algorithm is based on a nontrivial sketching of the ordering of, followed by emulating uniform sampling through coupling arguments.

Efficient and near-optimal algorithms for sampling connected subgraphs / M. Bressan - In: STOC 2021: Proceedings / [a cura di] S. Khuller, V. Vassilevska Williams. - [s.l] : ACM, 2021. - ISBN 9781450380539. - pp. 1132-1143 (( Intervento presentato al 53. convegno Symposium on Theory of Computing tenutosi a Roma nel 2021 [10.1145/3406325.3451042].

Efficient and near-optimal algorithms for sampling connected subgraphs

M. Bressan
Primo
2021

Abstract

We study the graphlet sampling problem: given an integer ≥ 3 and a graph = ( , ), sample a connected induced -node subgraph of (also called -graphlet) uniformly at random. This is a fundamental graph mining primitive, with applications in social network analysis and bioinformatics. The two state-of-the-art techniques are random walks and color coding. The random walk is elegant, but the current upper bounds and lower bounds on its mixing time suffer a gap of Δ −1 where Δ is the maximum degree of . Color coding is better understood, but requires a 2 ()-time preprocessing over the entire graph. Moreover, no efficient algorithm is known for sampling graphlets uniformly — random walks and color coding yield only -uniform samples. In this work, we provide the following results: (i) A near-optimal mixing time bound for the classic -graphlet random walk, as a function of the mixing time of . In particular, ignoring () factors, we show that the -graphlet random walk mixes in Θe(() · () −1 ) steps, where () is the mixing time of and () is the ratio between its maximum and minimum degree, and on some graphs this is tight up to poly lg factors. (ii) The first efficient algorithm for uniform graphlet sampling. The algorithm has a preprocessing phase that uses time (2 lg +) and space (), and a sampling phase that takes () lg Δ time per sample. It is based on ordering in a simple way, so to virtually partition the graphlets into buckets, and then sampling from those buckets using rejection sampling. The algorithm can be used also for counting, with additive guarantees. (iii) A near-optimal algorithm for-uniform graphlet sampling, with a preprocessing phase that runs in time ( 6 −1 lg) and space (), and a sampling phase that takes () 1 10 lg 1 expected time per sample. The algorithm is based on a nontrivial sketching of the ordering of, followed by emulating uniform sampling through coupling arguments.
graphlet sampling; subgraph sampling; sublinear algorithms; random walks
Settore INF/01 - Informatica
Settore ING-INF/05 - Sistemi di Elaborazione delle Informazioni
2021
Book Part (author)
File in questo prodotto:
File Dimensione Formato  
STOC_21_camera_ready CORRECTED 2.pdf

accesso riservato

Tipologia: Post-print, accepted manuscript ecc. (versione accettata dall'editore)
Dimensione 659.8 kB
Formato Adobe PDF
659.8 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
3406325.3451042.pdf

accesso riservato

Tipologia: Publisher's version/PDF
Dimensione 803.04 kB
Formato Adobe PDF
803.04 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/850657
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 5
  • ???jsp.display-item.citation.isi??? 1
social impact