- This event has passed.
December 6 10:00 am - 11:00 am EST
Defender: Maryam Aziz
Date: December 6, 2018
Location: 138 ISEC
Title: On Multi-Armed Bandits Theory and Applications
How would one go about choosing a near-best option from an effectively infinite set in finite time, with imperfect knowledge of the quality of the options? Such problems arise in computer science (e.g. online/active learning, reinforcement learning, and recommendation systems) and beyond. Consider drug testing, for example. One may have access to many candidate drugs (“arms”) but only resources to perform a limited number of tests, and yet one’s goal is to identify a “near-optimal” drug within the budget available. Such problems are well-modeled by variants of the classical multi-armed bandit problem.
We focus on the pure exploration version of the infinitely-armed bandit problem, wherein one seeks one of the best arms without penalty for trying sub-optimal arms along the way. The challenge is to quickly identify an arm with “near best” mean reward by repeatedly testing arms in some intelligent manner. We provide good general strategies to solve this problem with theoretical guarantees in both the fixed budget setting, wherein one attempts to maximize performance with a certain number of tests, and the fixed confidence setting, wherein one attempts to minimize the number of tests needed to meet a certain performance target.
We also present two real-world applications. The first aims to train greedy-optimal boosted decision trees faster than state-of-the-art algorithms using a novel bandit-inspired algorithm. Our algorithm minimizes the number of training documents used to measure each possible decision tree split while ensuring that we identify the split which would score the best were all documents used. We show that our algorithm empirically almost matches a lower bound on algorithms of its class, and approaches a more general lower bound on the number of documents needed for any class of algorithms.
Lastly we present an application to dose finding in phase I clinical trials of cancer treatments. We develop a bandit algorithm which balances three conflicting objectives: the need to efficiently find the best dose level to treat future patients, the need to avoid giving trial patients unsafe doses, and the need to give trial patients large enough doses for effective treatment during the trial. Our method typically beats state-of-the-art methods at all three goals, balancing these competing concerns well.
- Javed Aslam, Northeastern University
- Jonathan Ullman, Northeastern University
- Byron Wallace, Northeastern University
- Emilie Kaufmann, Inria, Lille, France