Vanishing gradient problem (勾配消失問題): meaning and usage explained, Weblio Dictionary

こうばいしょうしつ‐もんだい〔コウバイセウシツ‐〕【勾配消失問題】 (vanishing gradient problem)

Reading: こうばいしょうしつもんだい (kōbai shōshitsu mondai)

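In machine learning, the phenomenon in which training of a deep (multilayer) neural network stops making progress beyond a certain point. Training proceeds by minimizing the error between the predicted and actual values. During backpropagation, the error gradient is passed backward from the output layer, and at each layer it is multiplied by, among other factors, the gradient (derivative) of that layer's activation function. When these derivatives are close to zero, the gradient shrinks with every layer it passes through, so the weights of the layers nearer the input can hardly be corrected, and training becomes harder as the number of layers grows. It is known that using the ReLU function instead of the sigmoid function as the activation function makes vanishing gradients less likely: the derivative of the sigmoid is at most 0.25, whereas the derivative of ReLU is exactly 1 for positive inputs.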
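The mechanism described above can be illustrated with a minimal sketch. It assumes a hypothetical "chain" network with one unit per layer, the same weight (1.0) and the same pre-activation value (0.5) at every layer. The function names are illustrative, not taken from any library. It multiplies the per-layer activation derivatives exactly as the chain rule does during backpropagation, so the sigmoid and ReLU cases can be compared side by side.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid, s(x) * (1 - s(x)); its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x: float) -> float:
    """Derivative of ReLU max(0, x): 1 for x > 0, otherwise 0 (0 is used at x = 0)."""
    return 1.0 if x > 0 else 0.0


def backprop_scale(act_grad, depth: int, pre_activation: float = 0.5, weight: float = 1.0) -> float:
    """Factor by which a gradient arriving at the output is scaled after flowing back
    through `depth` layers of a hypothetical chain network (one unit per layer, the
    same weight and the same pre-activation value at every layer)."""
    scale = 1.0
    for _ in range(depth):
        # Chain rule: each layer multiplies the gradient by weight * activation'(z).
        scale *= weight * act_grad(pre_activation)
    return scale


if __name__ == "__main__":
    for depth in (1, 5, 10, 20):
        print(f"depth={depth:2d}  sigmoid={backprop_scale(sigmoid_grad, depth):.3e}  "
              f"relu={backprop_scale(relu_grad, depth):.1f}")
```

Under these assumptions the sigmoid factor is bounded by 0.25 raised to the depth, so it is already below 1e-6 at a depth of 10. The ReLU factor stays at 1.0. In real networks the weights also enter the product, and ReLU passes no gradient for negative inputs, so this only illustrates why ReLU is less prone to the problem. It does not show that ReLU eliminates it.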

Wikipedia


Vanishing gradient problem

Source: Wikipedia, the free encyclopedia (version of 2024-02-25 00:42 UTC)


Footnotes

Notes

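^ Some sources use the term "topmost layer" instead.[17]
^ Short for residual neural network. Although its initials would spell RNN, it is unrelated to the recurrent neural network.
^ In the original 2010 proposal, the Xavier initialization is derived using a uniform distribution,[35] but, as the 2015 paper by He et al. shows, it can also be expressed as a normal distribution parameterized by the numbers of input and output nodes.[36]
^ Some later studies suggest that batch normalization may not actually help mitigate the internal covariate shift problem.[40]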
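The note on Xavier initialization can be checked numerically. The sketch below is plain Python with hypothetical helper names, not the API of any framework. It assumes the usual Glorot and Bengio form of the uniform version, U(-a, a) with a = sqrt(6 / (n_in + n_out)). Since the variance of U(-a, a) is a²/3, this equals 2 / (n_in + n_out), which is the variance of the corresponding normal form N(0, 2 / (n_in + n_out)).

```python
import math
import random
import statistics


def xavier_uniform_bound(n_in: int, n_out: int) -> float:
    """Half-width a of U(-a, a) in the uniform form: a = sqrt(6 / (n_in + n_out))."""
    return math.sqrt(6.0 / (n_in + n_out))


def xavier_normal_std(n_in: int, n_out: int) -> float:
    """Standard deviation of the normal form N(0, 2 / (n_in + n_out))."""
    return math.sqrt(2.0 / (n_in + n_out))


def xavier_init(n_in: int, n_out: int, mode: str = "uniform", seed: int = 0) -> list[list[float]]:
    """Return an n_out x n_in weight matrix drawn with Xavier (Glorot) initialization."""
    rng = random.Random(seed)
    if mode == "uniform":
        a = xavier_uniform_bound(n_in, n_out)
        return [[rng.uniform(-a, a) for _ in range(n_in)] for _ in range(n_out)]
    if mode == "normal":
        std = xavier_normal_std(n_in, n_out)
        return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    n_in, n_out = 256, 128
    target = 2.0 / (n_in + n_out)  # variance shared by both forms
    # Var[U(-a, a)] = a^2 / 3 = (6 / (n_in + n_out)) / 3 = 2 / (n_in + n_out)
    print("uniform, analytic:", xavier_uniform_bound(n_in, n_out) ** 2 / 3, "target:", target)
    for mode in ("uniform", "normal"):
        flat = [w for row in xavier_init(n_in, n_out, mode) for w in row]
        print(mode, "empirical variance:", statistics.pvariance(flat))
```

The layer sizes (256 inputs, 128 outputs) are arbitrary. For them both empirical variances should come out close to 2/384, roughly 0.0052. This shows that the two forms differ only in the shape of the distribution, not in the variance they target.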

References

[1] Okatani, Takayuki (2015). “On Deep Learning”. Journal of the Robotics Society of Japan 33 (2): 92–96. doi:10.7210/jrsj.33.92. ISSN 0289-1824. https://doi.org/10.7210/jrsj.33.92.
[2] Basodi et al. 2020, p. 197.
[3] Yang 2020, pp. 53–54.
[4] Yang 2020, p. 54.
[5] Schmidhuber, Jürgen (2015-01-01). “Deep learning in neural networks: An overview”. Neural Networks 61: 90–91, 93–94. doi:10.1016/j.neunet.2014.09.003. ISSN 0893-6080.
[6] Deng 2012, pp. 2–3, 4.
[7] Hochreiter, S. (1991). Untersuchungen zu dynamischen neuronalen Netzen (PDF) (Diplom thesis). Institut f. Informatik, Technische Univ. Munich.
[8] Hochreiter, S.; Bengio, Y.; Frasconi, P.; Schmidhuber, J. (2001). “Gradient flow in recurrent nets: the difficulty of learning long-term dependencies”. In Kremer, S. C.; Kolen, J. F. (eds.). A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press. ISBN 0-7803-5369-2.
[9] Schmidhuber, Jürgen (2015-01-01). “Deep learning in neural networks: An overview”. Neural Networks 61: 93–94. doi:10.1016/j.neunet.2014.09.003. ISSN 0893-6080.
[10] Goh, Garrett B.; Hodas, Nathan O.; Vishnu, Abhinav (2017-06-15). “Deep learning for computational chemistry”. Journal of Computational Chemistry 38 (16): 1291–1307. arXiv:1701.04503. Bibcode: 2017arXiv170104503G. doi:10.1002/jcc.24764. PMID 28272810.
[11] Pascanu, Razvan; Mikolov, Tomas; Bengio, Yoshua (2012-11-21). “On the difficulty of training Recurrent Neural Networks”. arXiv:1211.5063 [cs.LG].
[12] Pascanu, Razvan; Mikolov, Tomas; Bengio, Yoshua (2013-06-16). “On the difficulty of training recurrent neural networks”. Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28 (Atlanta, GA, USA: JMLR.org): III–1310–III–1318. doi:10.5555/3042817.3043083.
[13] Deng, Li (2014). “A tutorial survey of architectures, algorithms, and applications for deep learning”. APSIPA Transactions on Signal and Information Processing 3 (1): 18. doi:10.1017/atsip.2013.9. ISSN 2048-7703.
[14] Schmidhuber, J. (1992). “Learning complex, extended sequences using the principle of history compression”. Neural Computation 4: 234–242.
[15] Deng 2012, p. 3.
[16] Deng 2012, p. 4.
[17] Kawakami, Rei (2018-10-15). 「5分で分かる?!有名論文ナナメ読み」 [Understand in 5 minutes?! Skimming famous papers] (PDF). 情報処理 (Jōhō Shori) 59 (10): 946. Information Processing Society of Japan.
[18] Hinton, G. E.; Osindero, S.; Teh, Y. (2006). “A fast learning algorithm for deep belief nets”. Neural Computation 18 (7): 1527–1554. doi:10.1162/neco.2006.18.7.1527. PMID 16764513.
[19] Hinton, G. (2009). “Deep belief networks”. Scholarpedia 4 (5): 5947. Bibcode: 2009SchpJ...4.5947H. doi:10.4249/scholarpedia.5947.
[20] Hochreiter, Sepp; Schmidhuber, Jürgen (1997). “Long Short-Term Memory”. Neural Computation 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276.
[21] Ribeiro, Antônio H.; Tiels, Koen; Aguirre, Luis A.; Schön, Thomas B. (2020-08-26). “Beyond exploding and vanishing gradients: analysing RNN training using attractors and smoothness”. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR: 2371.
[22] Graves, Alex; Schmidhuber, Jürgen (2009). “Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks”. In Bengio, Yoshua; Schuurmans, Dale; Lafferty, John; Williams, Chris K. I.; Culotta, Aron (eds.). Advances in Neural Information Processing Systems 22 (NIPS'22), December 7th–10th, 2009, Vancouver, BC. Neural Information Processing Systems (NIPS) Foundation. pp. 545–552.
[23] Graves, A.; Liwicki, M.; Fernandez, S.; Bertolami, R.; Bunke, H.; Schmidhuber, J. (2009). “A Novel Connectionist System for Improved Unconstrained Handwriting Recognition”. IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (5): 855–868. doi:10.1109/tpami.2008.137. PMID 19299860.
[24] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016). “Deep Residual Learning for Image Recognition”. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE. pp. 770–778. arXiv:1512.03385. doi:10.1109/CVPR.2016.90. ISBN 978-1-4673-8851-1.
[25] Zaeemzadeh, Alireza; Rahnavard, Nazanin; Shah, Mubarak (2021-11-01). “Norm-Preservation: Why Residual Networks Can Become Extremely Deep?”. IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (11): 3980–3990. arXiv:1805.07477. doi:10.1109/TPAMI.2020.2990339. ISSN 0162-8828.
[26] Veit, Andreas; Wilber, Michael; Belongie, Serge (2016-05-20). “Residual Networks Behave Like Ensembles of Relatively Shallow Networks”. arXiv:1605.06431 [cs.CV].
[27] Noel, Mathew Mithra; L, Arunkumar; Trivedi, Advait; Dutta, Praneet (2021-09-04). “Growing Cosine Unit: A Novel Oscillatory Activation Function That Can Speedup Training and Reduce Parameters in Convolutional Neural Networks”. arXiv:2108.12943 [cs.LG].
[28] Glorot, Xavier; Bordes, Antoine; Bengio, Yoshua (2011-06-14). “Deep Sparse Rectifier Neural Networks”. PMLR: 315–323.
[29] LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (2015). “Deep learning”. Nature 521 (7553): 436–444. Bibcode: 2015Natur.521..436L. doi:10.1038/nature14539. PMID 26017442.
[30] Ramachandran, Prajit; Zoph, Barret; Le, Quoc V. (2017-10-16). “Searching for Activation Functions”. arXiv:1710.05941 [cs.NE].
[31] Noel, Matthew Mithra; Bharadwaj, Shubham; Muthiah-Nakarajan, Venkataraman; Dutta, Praneet; Amali, Geraldine Bessie (2021-11-07). “Biologically Inspired Oscillating Activation Functions Can Bridge the Performance Gap between Biological and Artificial Neurons”. arXiv:2111.04020 [cs.NE].
[32] Balduzzi, David; Frean, Marcus; Leary, Lennox; Lewis, J P; Ma, Kurt Wan-Duo; McWilliams, Brian (2017-08-06). “The shattered gradients problem: if resnets are the answer, then what is the question?”. Proceedings of the 34th International Conference on Machine Learning - Volume 70 (Sydney, NSW, Australia: JMLR.org): 344. doi:10.5555/3305381.3305417.
[33] Glorot & Bengio 2010.
[34] He et al. 2015.
[35] Glorot & Bengio 2010, pp. 251, 253.
[36] He et al. 2015, p. 1030.
[37] He et al. 2015, pp. 1029, 1030.
[38] Bjorck et al. 2018, p. 7705.
[39] Bjorck et al. 2018, pp. 7705–7706.
[40] Santurkar et al. 2018, pp. 2488–2489.
[41] Ioffe, Sergey; Szegedy, Christian (2015-07-06). “Batch normalization: accelerating deep network training by reducing internal covariate shift”. Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37 (Lille, France: JMLR.org): 449, 455. doi:10.5555/3045118.3045167.
[42] Santurkar et al. 2018, pp. 2492–2493.
[43] Bjorck et al. 2018, pp. 7709–7710.
[44] Schmidhuber, Jürgen (2015). “Deep learning in neural networks: An overview”. Neural Networks 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003. PMID 25462637.
[45] Behnke, Sven (2003). Hierarchical Neural Networks for Image Interpretation. Lecture Notes in Computer Science 2766. Springer.
[46] “Sepp Hochreiter's Fundamental Deep Learning Problem (1991)”. people.idsia.ch. Retrieved 2017-01-07.

(C) Shogakukan Inc.
All text is available under the terms of the GNU Free Documentation License. This article is a copy and redistribution of the Wikipedia article 勾配消失問題 (Vanishing gradient problem) and is provided under the GNU Free Documentation License. All Wikipedia articles published in the Weblio Dictionary are likewise provided under the GNU Free Documentation License.