Date: Sept. 29, 2016, 3:30 p.m. - Room: Amphi Garcia

Sequential Decision Making in Linear Bandit Setting

Marta SOARE - Aalto University, Finland

No videoconference.

When making a decision in an unknown environment, a learning agent decides at every step whether to gather more information about the environment (explore) or to choose what seems to be the best action given the current information (exploit). The multi-armed bandit setting is a simple framework that captures this exploration-exploitation trade-off and offers efficient solutions for sequential decision making. In this talk, I will review a particular multi-armed bandit setting in which the environment has a global linear structure. I will then show how this structure can be exploited to find the best action in a minimal number of steps, and to decide when to transfer samples to improve performance in other, similar environments.
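As a rough illustration of the linear structure the abstract refers to, the toy sketch below (not the speaker's algorithm; all names, arm vectors, and parameters are invented for illustration) shows a bandit whose arms share one unknown parameter vector: because every reward reveals information about the same vector, pulls on any arm inform the estimated value of all arms.

```python
import random

# Toy linear bandit: reward(x) = <theta, x> + noise, with theta unknown.
# The linear structure means samples from any arm help estimate theta,
# and hence the value of every other arm.
random.seed(0)

theta = (0.8, 0.3)                            # unknown true parameter
arms = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]   # arm feature vectors

def pull(x):
    """Noisy reward sharing the global linear structure."""
    return theta[0] * x[0] + theta[1] * x[1] + random.gauss(0, 0.05)

# Explore: round-robin pulls, accumulating the (regularized) normal
# equations A * theta_hat = b for a least-squares estimate of theta.
A = [[1e-6, 0.0], [0.0, 1e-6]]
b = [0.0, 0.0]
for _ in range(50):
    for x in arms:
        r = pull(x)
        A[0][0] += x[0] * x[0]; A[0][1] += x[0] * x[1]
        A[1][0] += x[1] * x[0]; A[1][1] += x[1] * x[1]
        b[0] += x[0] * r;       b[1] += x[1] * r

# Solve the 2x2 system by hand (Cramer's rule).
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
theta_hat = ((A[1][1] * b[0] - A[0][1] * b[1]) / det,
             (A[0][0] * b[1] - A[1][0] * b[0]) / det)

# Exploit: pick the arm with the highest estimated reward.
best = max(arms, key=lambda x: theta_hat[0] * x[0] + theta_hat[1] * x[1])
print(best)
```

The best-arm identification problem discussed in the talk asks how to allocate such exploratory pulls so that the best arm is identified with as few samples as possible; the round-robin allocation above is only the simplest baseline.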