Stefan Otte
https://github.com/sotte
“There is nothing so practical as a good theory.”
– Kurt Lewin
“The key idea behind *active learning* is that a machine learning algorithm can achieve *greater accuracy* with *fewer training labels* if it is allowed to *choose the data* from which it learns.
An active learner may *pose queries*, usually in the form of unlabeled data instances to be labeled by an oracle (e.g., a human annotator).
Active learning is well-motivated in many modern machine learning problems, where unlabeled data may be abundant or easily obtained, but *labels are difficult, time-consuming, or expensive to obtain*.”
– Burr Settles, Active Learning Literature Survey
greater accuracy with fewer training labels
→ “good data™”
actively query for data
→ sequential decision making
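The query loop behind these two ideas can be sketched as pool-based active learning with sklearn (a minimal sketch; the dataset, model, and 20-query budget are illustrative assumptions, not from the talk):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Pool-based active learning: start with a few labels, then repeatedly
# query the pool instance the model is least confident about.
X, y = make_classification(n_samples=500, random_state=0)
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
pool = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for _ in range(20):  # 20 queries to the oracle
    model.fit(X[labeled], y[labeled])
    probs = model.predict_proba(X[pool])
    query = pool[int(np.argmax(1.0 - probs.max(axis=1)))]  # least confident
    labeled.append(query)  # the oracle reveals y[query]
    pool.remove(query)
```

Swapping the acquisition line changes the strategy (margin, entropy, …) without touching the rest of the loop.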
- uncertainty
- least confident
- margin
- entropy
- query-by-committee
- expected model change (decision theory)
- expected error reduction
- expected variance reduction
- …
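The first three uncertainty measures from the list can be written down directly (function names are mine; each takes an array of per-class predicted probabilities, one row per instance):

```python
import numpy as np

def least_confident(probs):
    # 1 minus the probability of the most likely class; higher = more uncertain
    return 1.0 - probs.max(axis=1)

def margin(probs):
    # gap between the two most probable classes; LOWER = more uncertain
    sorted_p = np.sort(probs, axis=1)
    return sorted_p[:, -1] - sorted_p[:, -2]

def entropy(probs):
    # Shannon entropy of the predictive distribution; higher = more uncertain
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

probs = np.array([[0.1, 0.8, 0.1],    # fairly certain prediction
                  [0.4, 0.35, 0.25]]) # uncertain prediction
# all three measures rank the second instance as the one to query
```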
Problem statement
- Find a multi-armed bandit
- Play arms using bandit theory
- Profit $$$
- given a bandit with $n$ arms
- each arm $i \in \{1, \dots, n\}$ returns a reward $y_i$
- Goal: Find a policy that $$\max \sum_{t=1}^T y_t$$
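As a concrete stand-in for this setup (Bernoulli rewards are an assumption; the slides leave the reward distribution open):

```python
import random

class Bandit:
    """n-armed bandit: arm i returns a stochastic reward y (here: Bernoulli)."""

    def __init__(self, success_probs):
        self.success_probs = success_probs  # hidden from the player

    def play(self, i):
        # reward y_t for pulling arm i in round t
        return 1.0 if random.random() < self.success_probs[i] else 0.0

bandit = Bandit([0.2, 0.5, 0.8])  # three arms, arm 2 is best
```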
past performance + exploration bonus
Play each bandit once
Then play the bandit that maximizes $$\bar\mu_i + \sqrt{\frac{2 \ln n}{n_i}}$$
- $\bar\mu_i$: mean reward of bandit $i$
- $n$: total rounds played
- $n_i$: rounds bandit $i$ was played
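A compact sketch of this rule (UCB1 from the Auer et al. reference below); the demo uses deterministic rewards so the run is reproducible, real arms would be stochastic:

```python
import math

def ucb1(pull, n_arms, rounds):
    """Play each arm once, then the arm maximizing mu_i + sqrt(2 ln n / n_i)."""
    counts = [0] * n_arms    # n_i: rounds arm i was played
    sums = [0.0] * n_arms    # running reward sums; sums[i] / counts[i] = mu_i
    for i in range(n_arms):  # 1) play each bandit once
        sums[i] += pull(i)
        counts[i] += 1
    for n in range(n_arms, rounds):  # 2) past performance + exploration bonus
        ucb = [sums[i] / counts[i] + math.sqrt(2 * math.log(n) / counts[i])
               for i in range(n_arms)]
        best = ucb.index(max(ucb))
        sums[best] += pull(best)
        counts[best] += 1
    return counts

# deterministic rewards for illustration only
counts = ucb1(lambda i: [0.2, 0.5, 0.8][i], n_arms=3, rounds=100)
# the best arm (index 2) ends up played most often
```

The bonus shrinks as an arm is played more, so under-explored arms keep getting revisited instead of being written off early.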
- brand bandit
- car body bandit
- segment bandit
- Pythons all the way down ;D
- sklearn
- Flask REST API
- Elasticsearch
Active Learning or: How I Learned to Stop Worrying and Love Small Data
- Sequential Decision Making
- Global Optimization
- Experimental Design
- (Bayesian) Reinforcement Learning
- An optimal solution exists (planning in belief space), but it is infeasible
- Tuning hyperparams with Hyperband
Questions?
Stefan Otte
- Active Learning Literature Survey – Burr Settles
- Finite-time Analysis of the Multiarmed Bandit Problem – Auer et al.
- Bandits, Global Optimization, Active Learning, and Bayesian RL – understanding the common ground – Toussaint (video)