Seminar time shown in CDT (UTC -5)
I will present a simple algorithm that achieves constant regret in many widely-studied online decision-making problems, including online resource-allocation and pricing, generalized assignment, and online bin packing. In particular, I will consider a general class of finite-horizon control problems, where we see a stream of stochastic arrivals from some known distribution, and need to select actions, with the final objective depending only on the aggregate type-action counts. For such settings, I will introduce a unified algorithmic paradigm, and provide a simple, yet general, condition under which these algorithms achieve constant regret, i.e., additive loss compared to the hindsight optimal solution which is independent of the horizon and state-space. The results stem from an elementary sample-path coupling argument, which may prove useful for a larger class of problems in online decision-making. Time permitting, I will illustrate this by showing how we can use this technique to obtain simple data-driven implementations of the above algorithms, which achieve constant regret with as little as a single data trace.
Access:
Seminar will be delivered live: Zoom link (sign-in required)
The Zoom conferencing system is accessible to UT faculty, staff, and students with support from ITS. Otherwise, you can sign up for a free account on the Zoom website.