End-to-end reinforcement learning


Source: Wikipedia, the free encyclopedia (revision of 2021/01/14 06:13 UTC)

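In end-to-end reinforcement learning, the end-to-end process, that is, the entire process from sensors to motors in a robot or agent, is carried out by a single layered or recurrent neural network without modularization, and that network is trained by reinforcement learning (RL). The approach had been proposed long before, but it regained momentum after successful results in learning to play Atari video games (2013–15)^[1] ^[2] ^[3] ^[4] and with AlphaGo (2016)^[5] by Google DeepMind.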

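RL has traditionally required the state space and the action space to be designed explicitly, while the mapping from the state space to the action space is learned^[6]. RL was therefore limited to learning actions only: human designers had to decide how the state space is constructed from sensor signals and to provide, before learning, a way of generating the motion commands for each action. Neural networks have often been used in RL to provide nonlinear function approximation and so avoid the curse of dimensionality. Recurrent neural networks have also been employed, mainly to cope with perceptual aliasing, that is, with partially observable Markov decision processes (POMDPs)^[7] ^[8] ^[9] ^[10] ^[11].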
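The traditional division of labour described above can be illustrated with a minimal tabular Q-learning sketch. Everything in it is an assumption made for illustration: the five-cell corridor environment, the `step`, `train`, and `greedy_policy` helpers, and the hyperparameters are hypothetical and are not taken from the cited works. The point is only that the designer fixes the state and action spaces by hand, and learning fills in the state-to-action mapping.

```python
import random

# Hand-designed by a human: a discrete state space (cells 0..4) and two
# discrete actions. Only the state-to-action mapping (the Q-table) is learned.
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right
GOAL = N_STATES - 1


def step(state, action):
    """Hypothetical corridor environment: reward 1.0 on reaching the goal cell."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                # Greedy choice, breaking ties at random so early episodes explore.
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: bootstrap from the best value of the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Action with the highest learned value in each designed state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
            for s in range(N_STATES)]
```

Calling `greedy_policy(train())` gives one action per hand-designed cell; the sensing and motor layers never enter the learning problem, which is exactly the limitation the paragraph describes.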

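End-to-end RL extends RL from learning actions only to learning the entire process from sensors to motors, including higher-level functions that are difficult to develop independently of other functions. Because higher-level functions are connected directly to neither sensors nor motors, even specifying their inputs and outputs is difficult.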
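As a rough illustration of learning the whole sensor-to-motor path, the sketch below feeds raw sensor values straight into one small layered network whose outputs are motor preferences, and trains every weight from a scalar reward alone. The toy task (a five-pixel "camera" with one bright pixel), the network size, and the use of the REINFORCE policy-gradient rule are all assumptions chosen for brevity; they are not the method of any work cited in this article.

```python
import math
import random

N_PIXELS = 5   # raw sensor: a 1-D "camera" in which one pixel is lit
N_HIDDEN = 8
N_MOTORS = 2   # motor command 0 = turn left, 1 = turn right


def make_network(rng):
    """One layered network from raw pixels to motor preferences (no hand-made state)."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(N_PIXELS)] for _ in range(N_HIDDEN)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)] for _ in range(N_MOTORS)]
    return w1, w2


def forward(net, pixels):
    """Return hidden activations and softmax probabilities over motor commands."""
    w1, w2 = net
    hidden = [math.tanh(sum(w * x for w, x in zip(row, pixels))) for row in w1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return hidden, [e / total for e in exps]


def sense(target):
    """Toy sensor: the target lights one pixel; the agent sits at the centre."""
    return [1.0 if i == target else 0.0 for i in range(N_PIXELS)]


def train(episodes=3000, lr=0.2, seed=0):
    rng = random.Random(seed)
    net = make_network(rng)
    w1, w2 = net
    centre = N_PIXELS // 2
    for _ in range(episodes):
        target = rng.choice([i for i in range(N_PIXELS) if i != centre])
        pixels = sense(target)
        hidden, probs = forward(net, pixels)
        action = 0 if rng.random() < probs[0] else 1
        # The scalar reward is the only training signal.
        reward = 1.0 if (action == 1) == (target > centre) else -1.0
        # REINFORCE: ascend reward * grad log pi(action), backpropagated end to end.
        dlogits = [reward * ((1.0 if k == action else 0.0) - probs[k])
                   for k in range(N_MOTORS)]
        dhidden = [sum(dlogits[k] * w2[k][j] for k in range(N_MOTORS))
                   * (1.0 - hidden[j] ** 2) for j in range(N_HIDDEN)]
        for k in range(N_MOTORS):
            for j in range(N_HIDDEN):
                w2[k][j] += lr * dlogits[k] * hidden[j]
        for j in range(N_HIDDEN):
            for i in range(N_PIXELS):
                w1[j][i] += lr * dhidden[j] * pixels[i]
    return net


def act(net, target):
    """Most probable motor command for a given target position."""
    _, probs = forward(net, sense(target))
    return max(range(N_MOTORS), key=lambda k: probs[k])
```

No state space is defined anywhere in this sketch: whatever internal representation the hidden layer ends up with is a by-product of reward-driven training, which is the sense in which the process is learned end to end. A recurrent network would be the natural substitute when a single observation does not determine the right action.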

History

This approach originated in TD-Gammon (1992)^[12]. In backgammon, the evaluation of the game situation during self-play, through TD($\lambda$) with a layered neural network, …

^ Mnih, Volodymyr (December 2013). “Playing Atari with Deep Reinforcement Learning”. NIPS Deep Learning Workshop 2013.

^ Mnih, Volodymyr (2015). “Human-level control through deep reinforcement learning”. Nature 518 (7540): 529–533. Bibcode: 2015Natur.518..529M. doi:10.1038/nature14236. PMID 25719670.

^ V. Mnih (26 February 2015). Performance of DQN in the Game Space Invaders.

^ V. Mnih (26 February 2015). Demonstration of Learning Progress in the Game Breakout.

^ Sutton, Richard S.; Barto, Andrew G. (1998). Reinforcement Learning: An Introduction. MIT Press. ISBN 978-0262193986

^ Lin, Long-Ji; Mitchell, Tom M. (1993). “Reinforcement Learning with Hidden States”. 2. 271–280

^ Onat, Ahmet; Kita, Hajime (1998). “Q-learning with Recurrent Neural Networks as a Controller for the Inverted Pendulum Problem”. The 5th International Conference on Neural Information Processing (ICONIP). pp. 837–840

^ Onat, Ahmet; Kita, Hajime (1998). “Recurrent Neural Networks for Reinforcement Learning: Architecture, Learning Algorithms and Internal Representation”. International Joint Conference on Neural Networks (IJCNN). pp. 2010–2015. doi:10.1109/IJCNN.1998.687168

^ Bakker, Bram; Linaker, Fredrik (2002). “Reinforcement Learning in Partially Observable Mobile Robot Domains Using Unsupervised Event Extraction”. 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 938–943

^ Bakker, Bram; Zhumatiy, Viktor (2003). “A Robot that Reinforcement-Learns to Identify and Memorize Important Previous Observation”. 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 430–435

^ Tesauro, Gerald (March 1995). “Temporal Difference Learning and TD-Gammon”. Communications of the ACM 38 (3): 58–68. doi:10.1145/203330.203343. Retrieved 10 March 2017.

^ Shibata, Katsunari; Okabe, Yoichi (1997). “Reinforcement Learning When Visual Sensory Signals are Directly Given as Inputs”. International Conference on Neural Networks (ICNN) 1997

^ Shibata, Katsunari; Iida, Masaru (2003). “Acquisition of Box Pushing by Direct-Vision-Based Reinforcement Learning”. SICE Annual Conference 2003

^ Utsunomiya, Hiroki; Shibata, Katsunari (2008). “Contextual Behavior and Internal Representations Acquired by Reinforcement Learning with a Recurrent Neural Network in a Continuous State and Action Space Task”. International Conference on Neural Information Processing (ICONIP) '08 [dead link]

^ Shibata, Katsunari; Kawano, Tomohiko (2008). “Learning of Action Generation from Raw Camera Images in a Real-World-like Environment by Simple Coupling of Reinforcement Learning and a Neural Network”. International Conference on Neural Information Processing (ICONIP) '08

^ Shibata, Katsunari (7 March 2017). "Functions that Emerge through End-to-End Reinforcement Learning". arXiv:1703.02239 [cs.AI].

^ Shibata, Katsunari (10 March 2017). "Communications that Emerge through Reinforcement Learning Using a (Recurrent) Neural Network". arXiv:1703.03543 [cs.AI].