In recent years, networked data have become widespread due to the increasing importance of social networks and other web-related applications. This growing interest is driving researchers to design new algorithms for solving important problems that involve networked data. In this thesis we present a few practical yet principled algorithms for learning and sequential decision-making on graphs. Classification of networked data is an important problem that has recently received a great deal of attention from the machine learning community. This is due to its many important practical applications: computer vision, bioinformatics, spam detection and text categorization, just to cite a few of the more conspicuous examples. We focus our attention on the task called ``node classification'', often studied in the semi-supervised (transductive) setting. We present two algorithms, motivated by different theoretical frameworks. The first algorithm is studied in the well-known online adversarial setting, within which it enjoys an optimal mistake bound (up to logarithmic factors). The second algorithm is based on a game-theoretic approach, where each node of the network is maximizing its own payoff. The setting corresponds to a Graph Transduction Game in which the graph is a tree. For this special case, we show that the Nash Equilibrium of the game can be reached in linear time. We complement our theoretical findings with an extensive set of experiments using datasets from many different domains. In the second part of the thesis, we present a rapidly emerging theme in the analysis of networked data: signed networks, graphs whose edges carry a label encoding the positive or negative nature of the relationship between the connected nodes. For example, social networks and e-commerce offer several examples of signed relationships: Slashdot users can tag other users as friends or foes, Epinions users can rate each other positively or negatively, Ebay users develop trust and distrust towards sellers in the network. More generally, two individuals that are related because they rate similar products in a recommendation website may agree or disagree in their ratings. Many heuristics for link classification in social networks are based on a form of social balance summarized by the motto “the enemy of my enemy is my friend”. This is equivalent to saying that the signs on the edges of a social graph tend to be consistent with some two-clustering structure of the nodes, where edges connecting nodes from the same cluster are positive and edges connecting nodes from different clusters are negative. We present algorithms for the batch transductive active learning setting, where the topology of the graph is known in advance and our algorithms can ask for the label of some specific edges during the training phase (before starting with the predictions). These algorithms can achieve different tradeoffs between the number of mistakes during the test phase and the number of labels required during the training phase. We also presented an experimental comparison against some state-of-the-art spectral heuristics presented in a previous work, where we show that the simplest or our algorithms is already competitive with the best of these heuristics. In the last chapter we present another way to exploit relational information for sequential predictions: the networks of bandits. Contextual bandits adequately formalize the exploration-exploitation trade-offs arising in several industrially relevant applications, such online advertisement and recommendation systems. Many practical applications have a strong social component whose integration in the bandit algorithm could lead to a significant performance improvement: for example, since often friends have similar taste, we may want to serve contents to a group of users by taking advantage of an underlying network of social relationships among them. We introduce a novel algorithmic approach to a particular networked bandit problem. More specifically, we run a bandit algorithm on each network node (e.g., user), allowing it to ``share'' feedback signals with the other nodes by employing the multi-task kernel. We derive the regret analysis of this algorithm and, finally, we report on the results of an experimental comparison between our approach and the state of the art techniques, on both artificial and real-world social networks.

LEARNING ON GRAPHS: ALGORITHMS FOR CLASSIFICATION AND SEQUENTIAL DECISIONS / G. Zappella ; relatore: N. Cesa Bianchi ; coordinatore: G. Naldi. DIPARTIMENTO DI INFORMATICA, 2014 Mar 28. 26. ciclo, Anno Accademico 2013. [10.13130/zappella-giovanni_phd2014-03-28].

LEARNING ON GRAPHS: ALGORITHMS FOR CLASSIFICATION AND SEQUENTIAL DECISIONS

G. Zappella
2014

Abstract

In recent years, networked data have become widespread due to the increasing importance of social networks and other web-related applications. This growing interest is driving researchers to design new algorithms for solving important problems that involve networked data. In this thesis we present a few practical yet principled algorithms for learning and sequential decision-making on graphs. Classification of networked data is an important problem that has recently received a great deal of attention from the machine learning community. This is due to its many important practical applications: computer vision, bioinformatics, spam detection and text categorization, just to cite a few of the more conspicuous examples. We focus our attention on the task called ``node classification'', often studied in the semi-supervised (transductive) setting. We present two algorithms, motivated by different theoretical frameworks. The first algorithm is studied in the well-known online adversarial setting, within which it enjoys an optimal mistake bound (up to logarithmic factors). The second algorithm is based on a game-theoretic approach, where each node of the network is maximizing its own payoff. The setting corresponds to a Graph Transduction Game in which the graph is a tree. For this special case, we show that the Nash Equilibrium of the game can be reached in linear time. We complement our theoretical findings with an extensive set of experiments using datasets from many different domains. In the second part of the thesis, we present a rapidly emerging theme in the analysis of networked data: signed networks, graphs whose edges carry a label encoding the positive or negative nature of the relationship between the connected nodes. For example, social networks and e-commerce offer several examples of signed relationships: Slashdot users can tag other users as friends or foes, Epinions users can rate each other positively or negatively, Ebay users develop trust and distrust towards sellers in the network. More generally, two individuals that are related because they rate similar products in a recommendation website may agree or disagree in their ratings. Many heuristics for link classification in social networks are based on a form of social balance summarized by the motto “the enemy of my enemy is my friend”. This is equivalent to saying that the signs on the edges of a social graph tend to be consistent with some two-clustering structure of the nodes, where edges connecting nodes from the same cluster are positive and edges connecting nodes from different clusters are negative. We present algorithms for the batch transductive active learning setting, where the topology of the graph is known in advance and our algorithms can ask for the label of some specific edges during the training phase (before starting with the predictions). These algorithms can achieve different tradeoffs between the number of mistakes during the test phase and the number of labels required during the training phase. We also presented an experimental comparison against some state-of-the-art spectral heuristics presented in a previous work, where we show that the simplest or our algorithms is already competitive with the best of these heuristics. In the last chapter we present another way to exploit relational information for sequential predictions: the networks of bandits. Contextual bandits adequately formalize the exploration-exploitation trade-offs arising in several industrially relevant applications, such online advertisement and recommendation systems. Many practical applications have a strong social component whose integration in the bandit algorithm could lead to a significant performance improvement: for example, since often friends have similar taste, we may want to serve contents to a group of users by taking advantage of an underlying network of social relationships among them. We introduce a novel algorithmic approach to a particular networked bandit problem. More specifically, we run a bandit algorithm on each network node (e.g., user), allowing it to ``share'' feedback signals with the other nodes by employing the multi-task kernel. We derive the regret analysis of this algorithm and, finally, we report on the results of an experimental comparison between our approach and the state of the art techniques, on both artificial and real-world social networks.
28-mar-2014
Settore INF/01 - Informatica
Settore MAT/06 - Probabilita' e Statistica Matematica
machine Learning ; graphs ; online learning
CESA BIANCHI, NICOLO' ANTONIO
NALDI, GIOVANNI
Doctoral Thesis
LEARNING ON GRAPHS: ALGORITHMS FOR CLASSIFICATION AND SEQUENTIAL DECISIONS / G. Zappella ; relatore: N. Cesa Bianchi ; coordinatore: G. Naldi. DIPARTIMENTO DI INFORMATICA, 2014 Mar 28. 26. ciclo, Anno Accademico 2013. [10.13130/zappella-giovanni_phd2014-03-28].
File in questo prodotto:
File Dimensione Formato  
phd_unimi_R08968.pdf

accesso aperto

Tipologia: Tesi di dottorato completa
Dimensione 2.02 MB
Formato Adobe PDF
2.02 MB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/234167
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact