We cast several problems arising from digital markets and economics into an online learning framework, where a learner sequentially interacts with an unknown environment, trying to discover its relevant features to maximize her cumulative reward. After an introduction to online learning in Chapter 1, we start with a study of the bilateral trade problem in Chapter 2. Here, the learner plays the role of a broker whose goal is to increase the value of the market by sequentially interacting with pairs of sellers and buyers, facilitating trades between them. We show how the interplay between the feedback received by the learner and the set of available trading mechanisms affects the attainable regret regimes, devising ad hoc solutions to address the exploration/exploitation dilemma in various types of environments. In Chapter 3, we present an analysis of transparency in repeated first-price auctions. Here, the learner participates in a sequence of first-price auctions to win objects whose exact value is revealed only when she wins the corresponding auction. We show how the level of transparency of the auctioneer (i.e., the amount of information disclosed at the end of each auction) influences the regret rates in different types of environments. In Chapter 4, we study the problem of adaptive optimal taxation. Here, the learner plays the role of a policymaker whose goal is to increase social welfare (seen as a weighted sum of private utility and public revenue) by sequentially setting the tax rate in the labor market. Interestingly, once framed in a formal online learning setting, this problem can be seen as a non-trivial generalization of the classical dynamic pricing problem. Finally, in Chapter 5, we propose an abstract framework generalizing bandits with delayed feedback. This framework allows us to capture scenarios arising frequently in advertising campaigns, where the feedback received comes in the form of delayed and composite income, the sources of which depend on the actions the learner took in the past, and whose exact contributions are not easily identifiable.

ONLINE LEARNING METHODS FOR DIGITAL MARKETS / R. Colomboni ; supervisor: N. Cesa-Bianchi ; co-supervisor: M. Pontil, T. Cesari ; coordinatore: R. Sassi. - Milano. Dipartimento di Informatica Giovanni Degli Antoni, 2024 Apr. 36. ciclo, Anno Accademico 2022/2023.

ONLINE LEARNING METHODS FOR DIGITAL MARKETS

R. Colomboni
2024

Abstract

We cast several problems arising from digital markets and economics into an online learning framework, where a learner sequentially interacts with an unknown environment, trying to discover its relevant features to maximize her cumulative reward. After an introduction to online learning in Chapter 1, we start with a study of the bilateral trade problem in Chapter 2. Here, the learner plays the role of a broker whose goal is to increase the value of the market by sequentially interacting with pairs of sellers and buyers, facilitating trades between them. We show how the interplay between the feedback received by the learner and the set of available trading mechanisms affects the attainable regret regimes, devising ad hoc solutions to address the exploration/exploitation dilemma in various types of environments. In Chapter 3, we present an analysis of transparency in repeated first-price auctions. Here, the learner participates in a sequence of first-price auctions to win objects whose exact value is revealed only when she wins the corresponding auction. We show how the level of transparency of the auctioneer (i.e., the amount of information disclosed at the end of each auction) influences the regret rates in different types of environments. In Chapter 4, we study the problem of adaptive optimal taxation. Here, the learner plays the role of a policymaker whose goal is to increase social welfare (seen as a weighted sum of private utility and public revenue) by sequentially setting the tax rate in the labor market. Interestingly, once framed in a formal online learning setting, this problem can be seen as a non-trivial generalization of the classical dynamic pricing problem. Finally, in Chapter 5, we propose an abstract framework generalizing bandits with delayed feedback. This framework allows us to capture scenarios arising frequently in advertising campaigns, where the feedback received comes in the form of delayed and composite income, the sources of which depend on the actions the learner took in the past, and whose exact contributions are not easily identifiable.
17-apr-2024
Settore INF/01 - Informatica
online learning; partial monitoring; multi-armed bandits; two-sided markets; first-price auctions
CESA BIANCHI, NICOLO' ANTONIO
CESA BIANCHI, NICOLO' ANTONIO
SASSI, ROBERTO
Doctoral Thesis
ONLINE LEARNING METHODS FOR DIGITAL MARKETS / R. Colomboni ; supervisor: N. Cesa-Bianchi ; co-supervisor: M. Pontil, T. Cesari ; coordinatore: R. Sassi. - Milano. Dipartimento di Informatica Giovanni Degli Antoni, 2024 Apr. 36. ciclo, Anno Accademico 2022/2023.
File in questo prodotto:
File Dimensione Formato  
phd_unimi_R12845.pdf

accesso aperto

Descrizione: Tesi di PhD
Tipologia: Post-print, accepted manuscript ecc. (versione accettata dall'editore)
Dimensione 2.84 MB
Formato Adobe PDF
2.84 MB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/1038352
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact