We study the graphlet sampling problem: given an integer ≥ 3 and a graph = ( , ), sample a connected induced -node subgraph of (also called -graphlet) uniformly at random. This is a fundamental graph mining primitive, with applications in social network analysis and bioinformatics. The two state-of-the-art techniques are random walks and color coding. The random walk is elegant, but the current upper bounds and lower bounds on its mixing time suffer a gap of Δ −1 where Δ is the maximum degree of . Color coding is better understood, but requires a 2 ()-time preprocessing over the entire graph. Moreover, no efficient algorithm is known for sampling graphlets uniformly — random walks and color coding yield only -uniform samples. In this work, we provide the following results: (i) A near-optimal mixing time bound for the classic -graphlet random walk, as a function of the mixing time of . In particular, ignoring () factors, we show that the -graphlet random walk mixes in Θe(() · () −1 ) steps, where () is the mixing time of and () is the ratio between its maximum and minimum degree, and on some graphs this is tight up to poly lg factors. (ii) The first efficient algorithm for uniform graphlet sampling. The algorithm has a preprocessing phase that uses time (2 lg +) and space (), and a sampling phase that takes () lg Δ time per sample. It is based on ordering in a simple way, so to virtually partition the graphlets into buckets, and then sampling from those buckets using rejection sampling. The algorithm can be used also for counting, with additive guarantees. (iii) A near-optimal algorithm for-uniform graphlet sampling, with a preprocessing phase that runs in time ( 6 −1 lg) and space (), and a sampling phase that takes () 1 10 lg 1 expected time per sample. The algorithm is based on a nontrivial sketching of the ordering of, followed by emulating uniform sampling through coupling arguments.

Efficient and near-optimal algorithms for sampling connected subgraphs / M. Bressan - In: STOC 2021: Proceedings / [a cura di] S. Khuller, V. Vassilevska Williams. - [s.l] : ACM, 2021. - ISBN 9781450380539. - pp. 1132-1143 (( Intervento presentato al 53. convegno Symposium on Theory of Computing tenutosi a Roma nel 2021.

Efficient and near-optimal algorithms for sampling connected subgraphs

Bressan, Marco
2021

Abstract

We study the graphlet sampling problem: given an integer ≥ 3 and a graph = ( , ), sample a connected induced -node subgraph of (also called -graphlet) uniformly at random. This is a fundamental graph mining primitive, with applications in social network analysis and bioinformatics. The two state-of-the-art techniques are random walks and color coding. The random walk is elegant, but the current upper bounds and lower bounds on its mixing time suffer a gap of Δ −1 where Δ is the maximum degree of . Color coding is better understood, but requires a 2 ()-time preprocessing over the entire graph. Moreover, no efficient algorithm is known for sampling graphlets uniformly — random walks and color coding yield only -uniform samples. In this work, we provide the following results: (i) A near-optimal mixing time bound for the classic -graphlet random walk, as a function of the mixing time of . In particular, ignoring () factors, we show that the -graphlet random walk mixes in Θe(() · () −1 ) steps, where () is the mixing time of and () is the ratio between its maximum and minimum degree, and on some graphs this is tight up to poly lg factors. (ii) The first efficient algorithm for uniform graphlet sampling. The algorithm has a preprocessing phase that uses time (2 lg +) and space (), and a sampling phase that takes () lg Δ time per sample. It is based on ordering in a simple way, so to virtually partition the graphlets into buckets, and then sampling from those buckets using rejection sampling. The algorithm can be used also for counting, with additive guarantees. (iii) A near-optimal algorithm for-uniform graphlet sampling, with a preprocessing phase that runs in time ( 6 −1 lg) and space (), and a sampling phase that takes () 1 10 lg 1 expected time per sample. The algorithm is based on a nontrivial sketching of the ordering of, followed by emulating uniform sampling through coupling arguments.
graphlet sampling; subgraph sampling; sublinear algorithms; random walks
Settore INF/01 - Informatica
Settore ING-INF/05 - Sistemi di Elaborazione delle Informazioni
Book Part (author)
File in questo prodotto:
File Dimensione Formato  
STOC_21_camera_ready CORRECTED 2.pdf

non disponibili

659.8 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
3406325.3451042.pdf

non disponibili

803.04 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

Caricamento pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/2434/850657
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact