IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

Counting graphlets is a well-studied problem in graph mining and social network analysis. Recently, several papers explored very simple and natural approaches based on Monte Carlo sampling of Markov Chains (MC), and reported encouraging results. We show, perhaps surprisingly, that this approach is outperformed by a carefully engineered version of color coding (CC) [1], a sophisticated algorithmic technique that we extend to the case of graphlet sampling and for which we prove strong statistical guarantees. Our computational experiments on graphs with millions of nodes show CC to be more accurate than MC. Furthermore, we formally show that the mixing time of the MC approach is too high in general, even when the input graph has high conductance. All this comes at a price, however. While MC is very efficient in terms of space, CC's memory requirements become demanding as the size of the input graph and of the graphlets grows. And yet, our experiments show that a careful implementation of CC can push the limits of the state of the art, both in terms of the size of the input graph and of the size of the graphlets.
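The abstract names color coding without describing it. The sketch below illustrates the classic technique of Alon, Yuster and Zwick on its simplest case: counting k-vertex paths. It is not the authors' engineered graphlet-sampling implementation, which extends the idea to graphlets.

The sketch makes a few assumptions:

- The graph is undirected and stored as a Python adjacency-list dict.
- The function names `count_colorful_paths` and `estimate_k_paths` are hypothetical.

Each vertex gets one of k random colors. A dynamic program over color subsets then counts "colorful" paths, meaning paths whose vertices all have distinct colors. A fixed k-vertex path is colorful with probability k!/k^k, so rescaling by k^k/k! gives an unbiased estimate of the number of k-vertex paths.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List

# Assumed representation: adjacency lists of an undirected simple graph.
Graph = Dict[int, List[int]]


def count_colorful_paths(graph: Graph, colors: Dict[int, int], k: int) -> int:
    """Count k-vertex paths whose vertices all received distinct colors.

    dp[v][mask] = number of colorful paths ending at v whose set of colors
    is `mask` (a bitmask over the k colors); paths grow one vertex per round.
    """
    # Round 1: every single vertex is a colorful one-vertex path.
    dp = {v: {1 << colors[v]: 1} for v in graph}
    for _ in range(k - 1):
        nxt: Dict[int, Dict[int, int]] = {v: defaultdict(int) for v in graph}
        for v in graph:
            bit = 1 << colors[v]
            for u in graph[v]:
                for mask, cnt in dp[u].items():
                    # Appending v is allowed only if its color is new; distinct
                    # colors also guarantee that the path stays simple.
                    if not mask & bit:
                        nxt[v][mask | bit] += cnt
        dp = nxt
    directed = sum(sum(table.values()) for table in dp.values())
    # For k >= 2 each undirected path is found once from each endpoint.
    return directed if k == 1 else directed // 2


def estimate_k_paths(graph: Graph, k: int, trials: int, seed: int = 0) -> float:
    """Unbiased color-coding estimate of the number of k-vertex paths."""
    rng = random.Random(seed)
    # A fixed k-vertex path is colorful with probability k!/k^k.
    scale = k ** k / math.factorial(k)
    total = 0
    for _ in range(trials):
        colors = {v: rng.randrange(k) for v in graph}
        total += count_colorful_paths(graph, colors, k)
    return scale * total / trials


if __name__ == "__main__":
    triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    print(count_colorful_paths(triangle, {0: 0, 1: 1, 2: 2}, 3))  # → 3
    print(estimate_k_paths(triangle, 3, trials=100))
```

The table holds up to one entry per vertex and color subset, that is O(n·2^k) counters. This gives a concrete feel for why the memory cost of CC grows with both the size of the input graph and the size of the pattern, while a random-walk sampler stores only its current state.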

Counting graphlets: Space vs time / M. Bressan, F. Chierichetti, R. Kumar, S. Leucci, A. Panconesi - In: WSDM '17: Proceedings / [edited by] M. de Rijke, M. Shokouhi, A. Tomkins, M. Zhang. - [s.l.] : ACM, 2017. - ISBN 9781450346757. - pp. 557-566 (Paper presented at the 10th Web Search and Data Mining conference, held in Cambridge in 2017) [10.1145/3018661.3018732].

Counting graphlets: Space vs time

M. Bressan (first author); F. Chierichetti; R. Kumar; S. Leucci; A. Panconesi

2017

	Scientific-disciplinary sectors of the contribution
	
				Sector INF/01 - Computer Science
			
	Publication date
	
				2017
			
	Organizations associated with the conference
	
				ACM SIGKDD
ACM SIGMOD
ACM SIGWEB
Special Interest Group on Information Retrieval (ACM SIGIR)
			
	DOI
	
				https://dx.doi.org/10.1145/3018661.3018732
			
	Type
	
				Book Part (author)
			
	Appears in types:
	
				03 - Contribution in volume

Files in this item:

File	Size	Format
Bressan&2017-WSDM.pdf (open access; type: Publisher's version/PDF)	394.95 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/922284
