Séminaire
Date : 14 mai 2024 11:00 - Salle :Amphi 3 - Pôle commun
Combinatorial optimization layers in deep learning pipelines provide efficient policies for dynamic vehicle routing problemsAxel PARMENTIER - ENPC / Cermics |
The 2022 EURO-NeurIPS challenge (1) on dynamic vehicle routing highlights the difficulties of multistage stochastic optimization problems whose state and decision spaces are discrete and combinatorially large. Multistage stochastic optimization policies do not scale, while reinforcement learning approaches have difficulties with the combinatorial decisions. We suggest using a combinatorial optimization layer in a deep learning pipeline : a deep neural network takes in input the state of the system and returns as output the parameters of a deterministic vehicle routing problem, which we solve with a classic combinatorial optimization heuristic to obtain the decision. It seems to overcome the difficulties above and enabled to win the challenge with a three percent margin. The approach requires a learning algorithm to choose the parameters of the neural network. The results of the challenge startled the winning team and opened a scientific question because the learning algorithm used should not have worked. Indeed, it imitates the anticipative policy, which is known to perform poorly in multi-stage stochastic optimization. In this talk, we use tools from combinatorial optimization, reinforcement learning, and multi-stage stochastic optimization to reinterpret the learning algorithm, understand why it works on the problem of the challenge and fails on others, and introduce better learning algorithms for policies based on combinatorial optimization layers. We illustrate the performance of these algorithms on several dynamic routing problems. In particular, we significantly improve the performance of the policy used for the challenge.
(1) https://euro-neurips-vrp-2022.challenges.ortec.com/
https://ecoledesponts.fr/axel-parmentier