
Multi-armed bandit problem

Source: Wikipedia, the free encyclopedia (revision of 2023/02/28 09:44 UTC)

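The multi-armed bandit problem is a problem in probability theory and machine learning in which a fixed, limited set of resources must be allocated among competing choices so as to maximize the expected gain. The properties of each choice are only partially known at the time of allocation, and they may become better understood as time passes or as resources are allocated to that choice[1][2]. It is a classic reinforcement learning problem that illustrates the exploration versus exploitation tradeoff: the dilemma between gathering more information about uncertain options (exploration) and using the option that currently looks best (exploitation). The name comes from imagining a gambler facing a row of slot machines (each of which is also called a "one-armed bandit") who must decide which machines to play, how many times to play each one, in what order to play them, and whether to keep playing the current machine or try a different one[3]. In a broad sense, the multi-armed bandit problem is also classified as a form of stochastic scheduling.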
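The exploration versus exploitation tradeoff can be made concrete with a small simulation. The sketch below uses epsilon-greedy, one simple and widely used heuristic: with probability `epsilon` the player pulls a random arm (exploration), and otherwise it pulls the arm with the highest average reward observed so far (exploitation). Other strategies, such as upper confidence bound (UCB) methods, Thompson sampling, or the Gittins index, balance the tradeoff differently. The Bernoulli arms, their success probabilities, the `epsilon` value, and the function name `epsilon_greedy` are hypothetical choices made only for illustration.

```python
import random
from typing import List, Tuple


def epsilon_greedy(
    probs: List[float], steps: int, epsilon: float = 0.1, seed: int = 0
) -> Tuple[List[int], List[float], float]:
    """Play a simulated Bernoulli bandit with the epsilon-greedy rule.

    probs   -- hypothetical success probability of each arm (hidden from the player)
    steps   -- total number of pulls (the limited resource being allocated)
    epsilon -- probability of exploring a uniformly random arm on each pull
    Returns (pull count per arm, estimated mean reward per arm, total reward).
    """
    rng = random.Random(seed)          # seeded so runs are reproducible
    n_arms = len(probs)
    counts = [0] * n_arms              # how many times each arm was pulled
    estimates = [0.0] * n_arms         # running average reward of each arm
    total = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random arm
        else:
            # exploit: best estimate so far (max() keeps the lowest index on ties)
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # incremental mean update: new = old + (reward - old) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward

    return counts, estimates, total


if __name__ == "__main__":
    counts, estimates, total = epsilon_greedy([0.2, 0.5, 0.7], steps=10_000)
    print("pulls per arm:", counts)
    print("estimated means:", [round(e, 3) for e in estimates])
    print("total reward:", total)
```

With these made-up probabilities, the third arm (p = 0.7) should receive the large majority of pulls once its estimate overtakes the others, while about 10% of pulls stay exploratory. Because a fixed `epsilon` never stops exploring, practical variants often decay it over time.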

Footnotes

[1] Gittins, John C. (1989), Multi-armed bandit allocation indices, Wiley-Interscience Series in Systems and Optimization, Chichester: John Wiley & Sons, Ltd., ISBN 978-0-471-92059-5
[2] Berry, Don; Fristedt, Bert (1985), Bandit problems: Sequential allocation of experiments, Monographs on Statistics and Applied Probability, London: Chapman & Hall, ISBN 978-0-412-24810-8
[3] Weber, Richard (1992), “On the Gittins index for multiarmed bandits”, Annals of Applied Probability 2 (4): 1024-1033, doi:10.1214/aoap/1177005588, JSTOR 2959678, https://jstor.org/stable/2959678
[4] Press, William H. (2009), “Bandit solutions provide unified ethical models for randomized clinical trials and comparative effectiveness research”, Proceedings of the National Academy of Sciences 106 (52): 22387-22392, Bibcode: 2009PNAS..10622387P, doi:10.1073/pnas.0912378106, PMC 2793317, PMID 20018711
[5] Brochu, Eric; Hoffman, Matthew W.; de Freitas, Nando (2010), Portfolio Allocation for Bayesian Optimization, arXiv:1009.5419, Bibcode: 2010arXiv1009.5419B
[6] Shen, Weiwei; Wang, Jun; Jiang, Yu-Gang; Zha, Hongyuan (2015), “Portfolio Choices with Orthogonal Bandit Learning”, Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2015)
[7] Farias, Vivek F.; Madan, Ritesh (2011), “The irrevocable multiarmed bandit problem”, Operations Research 59 (2): 383-399, doi:10.1287/opre.1100.0891
[8] Whittle, Peter (1979), “Discussion of Dr Gittins' paper”, Journal of the Royal Statistical Society, Series B 41 (2): 148-177, doi:10.1111/j.2517-6161.1979.tb01069.x
[9] Vermorel, Joannes; Mohri, Mehryar (2005), “Multi-armed bandit algorithms and empirical evaluation”, European Conference on Machine Learning (ECML 2005), Springer, pp. 437-448
