RIKEN Center for Advanced Intelligence Project Online Decision Making Unit
Unit Leader: Junya Honda (D.Sci.)
We are studying algorithms for sequential decision-making problems. In most decision-making situations, we lack sufficient data or knowledge about the target and must search for the best choice through trial and error. Such problems are formulated as bandit problems, in which an agent aims either to maximize its cumulative reward or to identify the action with the highest expected reward, using information obtained only from the actions it actually chooses. We are establishing the limits of achievable performance for these problems and constructing algorithms that attain these limits.
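To make the bandit setting concrete, here is a minimal Python sketch of a Bernoulli multi-armed bandit played with Thompson sampling. It is a textbook illustration only, not the Unit's specific method. The arm means, horizon, Beta(1, 1) priors, and the helper names `thompson_sampling` and `pseudo_regret` are hypothetical choices made for this example. The key feature of the setting is that the agent observes a reward only for the arm it actually pulls.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Play a Bernoulli bandit for `horizon` rounds with Thompson sampling.

    true_means: success probability of each arm (hypothetical values). The
    agent never reads these directly; they only simulate the chosen arm's reward.
    Returns (total_reward, pulls), where pulls[i] counts how often arm i was chosen.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k  # observed rewards equal to 1, per arm
    failures = [0] * k   # observed rewards equal to 0, per arm
    pulls = [0] * k
    total_reward = 0
    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta(1 + s, 1 + f) posterior.
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=samples.__getitem__)
        # Feedback is observed only for the arm actually chosen.
        reward = 1 if rng.random() < true_means[arm] else 0
        pulls[arm] += 1
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return total_reward, pulls


def pseudo_regret(true_means, pulls):
    """Expected reward lost relative to always playing the best arm."""
    best = max(true_means)
    return sum((best - mu) * n for mu, n in zip(true_means, pulls))


# Hypothetical three-armed instance; arm 2 has the highest mean.
means = [0.3, 0.5, 0.7]
total, pulls = thompson_sampling(means, horizon=2000, seed=1)
print("total reward:", total, "pulls per arm:", pulls)
print("pseudo-regret:", round(pseudo_regret(means, pulls), 1))
```

With clear gaps between the arm means, pulls should concentrate on the best arm as the horizon grows, and the pseudo-regret measures how much expected reward was lost while exploring. This sketch addresses only the cumulative-reward objective; identifying the best arm is a separate criterion.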
Main Research Fields
- Computer Science
Related Research Fields
- Bandit Problems
- Experimental Design
Publications
Papers marked with an asterisk (*) are based on research conducted outside RIKEN.
- 1.*Komiyama, J., Honda, J., and Nakagawa, H.:
"Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm"
The 33rd International Conference on Machine Learning (ICML2016), pp.1235-1244, (2016).
- 2.*Honda, J., and Takemura, A.:
"Non-Asymptotic Analysis of a New Bandit Algorithm for Semi-Bounded Rewards"
Journal of Machine Learning Research, vol.16, pp.3721-3756, (2015).
- 3.*Komiyama, J., Honda, J., and Nakagawa, H.:
"Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring"
The 29th Annual Conference on Neural Information Processing Systems (NIPS2015), pp.1783-1791, (2015).
- 4.*Komiyama, J., Honda, J., and Nakagawa, H.:
"Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays"
The 32nd International Conference on Machine Learning (ICML2015), pp.1152-1161, (2015).
- 5.*Komiyama, J., Honda, J., Kashima, H., and Nakagawa, H.:
"Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem"
The 28th Annual Conference on Learning Theory (COLT2015), pp.1141-1154, (2015).
- 6.*Honda, J., and Takemura, A.:
"Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors"
The 17th International Conference on Artificial Intelligence and Statistics (AISTATS2014), (2014).
- 7.*Honda, J., and Takemura, A.:
"Stochastic Bandit Based on Empirical Moments"
The 15th International Conference on Artificial Intelligence and Statistics (AISTATS2012), pp.529-537, (2012).
- 8.*Honda, J., and Takemura, A.:
"An Asymptotically Optimal Policy for Finite Support Models in the Multiarmed Bandit Problem"
Machine Learning, vol.85, pp.361-391, (2011).
- 9.*Honda, J., and Takemura, A.:
"An Asymptotically Optimal Bandit Algorithm for Bounded Support Models"
The 23rd Annual Conference on Learning Theory (COLT2010), pp.67-79, (2010).
- Junya Honda
- Unit Leader
Email: junya.honda [at] riken.jp