Optimal Resource Allocation with SemiBandit Feedback
Tor Lattimore, Koby Crammer, Csaba Szepesvari
Abstract:
We study a sequential resource allocation prob
lem involving a fixed number of recurring jobs.
At each timestep the manager should distribute
available resources among the jobs in order to
maximise the expected number of completed
jobs. Allocating more resources to a given job in
creases the probability that it completes, but with
a cutoff. Specifically, we assume a linear model
where the probability increases linearly until it
equals one, after which allocating additional re
sources is wasteful. We assume the difficulty of
each job is unknown and present the first algo
rithm for this problem and prove upper and lower
bounds on its regret. Despite its apparent sim
plicity, the problem has a rich structure: we show
that an appropriate optimistic algorithm can im
prove its learning speed dramatically beyond the
results one normally expects for similar problems
as the problem becomes resourceladen.
Keywords:
Pages: 477486
PS Link:
PDF Link: /papers/14/p477lattimore.pdf
BibTex:
@INPROCEEDINGS{Lattimore14,
AUTHOR = "Tor Lattimore
and Koby Crammer and Csaba Szepesvari",
TITLE = "Optimal Resource Allocation with SemiBandit Feedback",
BOOKTITLE = "Proceedings of the Thirtieth Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI14)",
PUBLISHER = "AUAI Press",
ADDRESS = "Corvallis, Oregon",
YEAR = "2014",
PAGES = "477486"
}

