勾配消失問題勾配消失問題の概要

勾配消失問題（こうばいしょうしつもんだい、英: vanishing gradient problem）は、機械学習において、勾配ベースの学習手法と誤差逆伝播法を利用してニューラルネットワークを学習する際に、誤差逆伝播に必要な勾配が非常に小さくなり、学習が制御できなくなる問題である^[1]。この問題を解決するために、リカレントニューラルネットワークではLSTMと呼ばれる構造が導入されたり、深層のネットワークではResNetと呼ばれる構造が導入される。

また、活性化関数の勾配が非常に大きな値をとり、発散してしまうこともある。このような問題は、勾配爆発問題（こうばいばくはつもんだい、英: exploding gradient problem）と呼ばれる。

脚注

注釈

^ 各種文献では、最上層という表現が使われることもある^[17]。
^ Residual neural networkの略であり、頭文字をとるとRNNとなるが、回帰型ニューラルネットワークとは関係がない。
^ Xavielの初期値は、2010年の提案では一様分布を用いた導出が紹介されている^[35]が、2015年のHeらの論文が示しているように、入出力のノード数をパラメータとして用いた正規分布として表すこともできる^[36]。
^ その後の研究で、バッチノーマライゼーションは内部共変量シフトの問題の緩和には寄与していない可能性があるとしているものもある^[40]。

出典

^ Okatani, Takayuki (2015). “On Deep Learning”. Journal of the Robotics Society of Japan 33 (2): 92–96. doi:10.7210/jrsj.33.92. ISSN 0289-1824. https://doi.org/10.7210/jrsj.33.92.
^ ^a ^b Basodi et al. 2020, p. 197.
^ Yang 2020, p. 53-54.
^ ^a ^b Yang 2020, p. 54.
^ Schmidhuber, Jürgen (2015-01-01). “Deep learning in neural networks: An overview” (英語). Neural Networks 61: 90-91,93-94. doi:10.1016/j.neunet.2014.09.003. ISSN 0893-6080.
^ Deng 2012, p. 2-3,4.
^ Hochreiter, S. (1991). Untersuchungen zu dynamischen neuronalen Netzen (PDF) (Diplom thesis). Institut f. Informatik, Technische Univ. Munich.
^ Hochreiter, S.; Bengio, Y.; Frasconi, P.; Schmidhuber, J. (2001). “Gradient flow in recurrent nets: the difficulty of learning long-term dependencies”. In Kremer, S. C.; Kolen, J. F.. A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press. ISBN 0-7803-5369-2
^ Schmidhuber, Jürgen (2015-01-01). “Deep learning in neural networks: An overview” (英語). Neural Networks 61: 93-94. doi:10.1016/j.neunet.2014.09.003. ISSN 0893-6080.
^ Goh, Garrett B.; Hodas, Nathan O.; Vishnu, Abhinav (2017-06-15). “Deep learning for computational chemistry” (英語). Journal of Computational Chemistry 38 (16): 1291–1307. arXiv:1701.04503. Bibcode: 2017arXiv170104503G. doi:10.1002/jcc.24764. PMID 28272810.
^ Pascanu, Razvan; Mikolov, Tomas; Bengio, Yoshua (21 November 2012). "On the difficulty of training Recurrent Neural Networks". arXiv:1211.5063 [cs.LG]。
^ Pascanu, Razvan; Mikolov, Tomas; Bengio, Yoshua (2013-06-16). “On the difficulty of training recurrent neural networks”. Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28 (Atlanta, GA, USA: JMLR.org): III–1310–III–1318. doi:10.5555/3042817.3043083.
^ ^a ^b ^c Deng, Li (2014). “A tutorial survey of architectures, algorithms, and applications for deep learning” (英語). APSIPA Transactions on Signal and Information Processing 3 (1): 18. doi:10.1017/atsip.2013.9. ISSN 2048-7703.
^ J. Schmidhuber., "Learning complex, extended sequences using the principle of history compression," Neural Computation, 4, pp. 234–242, 1992.
^ Deng 2012, p. 3.
^ Deng 2012, p. 4.
^ 川上玲「5分で分かる?!有名論文ナナメ読み」（pdf）『情報処理』第59巻第10号、情報処理学会、2018年10月15日、946頁。
^ Hinton, G. E.; Osindero, S.; Teh, Y. (2006). “A fast learning algorithm for deep belief nets”. Neural Computation 18 (7): 1527–1554. doi:10.1162/neco.2006.18.7.1527. PMID 16764513.
^ Hinton, G. (2009). “Deep belief networks”. Scholarpedia 4 (5): 5947. Bibcode: 2009SchpJ...4.5947H. doi:10.4249/scholarpedia.5947.
^ Hochreiter, Sepp; Schmidhuber, Jürgen (1997). “Long Short-Term Memory”. Neural Computation 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276.
^ Ribeiro, Antônio H.; Tiels, Koen; Aguirre, Luis A.; Schön, Thomas B. (2020-08-26). “Beyond exploding and vanishing gradients: analysing RNN training using attractors and smoothness” (English). Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR: 2371.
^ Graves, Alex; and Schmidhuber, Jürgen; Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks, in Bengio, Yoshua; Schuurmans, Dale; Lafferty, John; Williams, Chris K. I.; and Culotta, Aron (eds.), Advances in Neural Information Processing Systems 22 (NIPS'22), December 7th–10th, 2009, Vancouver, BC, Neural Information Processing Systems (NIPS) Foundation, 2009, pp. 545–552
^ Graves, A.; Liwicki, M.; Fernandez, S.; Bertolami, R.; Bunke, H.; Schmidhuber, J. (2009). “A Novel Connectionist System for Improved Unconstrained Handwriting Recognition”. IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (5): 855–868. doi:10.1109/tpami.2008.137. PMID 19299860.
^ ^a ^b He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE. pp. 770–778. arXiv:1512.03385. doi:10.1109/CVPR.2016.90. ISBN 978-1-4673-8851-1。
^ ^a ^b ^c Zaeemzadeh, Alireza; Rahnavard, Nazanin; Shah, Mubarak (2021-11-01). “Norm-Preservation: Why Residual Networks Can Become Extremely Deep?”. IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (11): 3980–3990. arXiv:1805.07477. doi:10.1109/TPAMI.2020.2990339. ISSN 0162-8828.
^ Veit, Andreas; Wilber, Michael; Belongie, Serge (20 May 2016). "Residual Networks Behave Like Ensembles of Relatively Shallow Networks". arXiv:1605.06431 [cs.CV]。
^ ^a ^b Noel, Mathew Mithra; L, Arunkumar; Trivedi, Advait; Dutta, Praneet (4 September 2021). "Growing Cosine Unit: A Novel Oscillatory Activation Function That Can Speedup Training and Reduce Parameters in Convolutional Neural Networks". arXiv:2108.12943 [cs.LG]。
^ Glorot, Xavier; Bordes, Antoine; Bengio, Yoshua (2011-06-14). “Deep Sparse Rectifier Neural Networks” (英語). PMLR: 315–323.
^ LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (2015). “Deep learning”. Nature 521 (7553): 436–444. Bibcode: 2015Natur.521..436L. doi:10.1038/nature14539. PMID 26017442.
^ Ramachandran, Prajit; Barret, Zoph; Quoc, V. Le (16 October 2017). "Searching for Activation Functions". arXiv:1710.05941 [cs.NE]。
^ Noel, Matthew Mithra; Bharadwaj, Shubham; Muthiah-Nakarajan, Venkataraman; Dutta, Praneet; Amali, Geraldine Bessie (7 November 2021). "Biologically Inspired Oscillating Activation Functions Can Bridge the Performance Gap between Biological and Artificial Neurons". arXiv:2111.04020 [cs.NE]。
^ ^a ^b Balduzzi, David; Frean, Marcus; Leary, Lennox; Lewis, J P; Ma, Kurt Wan-Duo; McWilliams, Brian (2017-08-06). “The shattered gradients problem: if resnets are the answer, then what is the question?”. Proceedings of the 34th International Conference on Machine Learning - Volume 70 (Sydney, NSW, Australia: JMLR.org): 344. doi:10.5555/3305381.3305417.
^ Glorot & Bengio 2010.
^ He et al. 2015.
^ Glorot & Bengio 2010, p. 251, 253.
^ He et al. 2015, p. 1030.
^ He et al. 2015, p. 1029, 1030.
^ Bjorck et al. 2018, p. 7705.
^ Bjorck et al. 2018, p. 7705-7706.
^ Santurkar et al. 2018, p. 2488-2489.
^ Ioffe, Sergey; Szegedy, Christian (2015-07-06). “Batch normalization: accelerating deep network training by reducing internal covariate shift”. Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37 (Lille, France: JMLR.org): 449，455. doi:10.5555/3045118.3045167.
^ Santurkar et al. 2018, p. 2492-2493.
^ Bjorck et al. 2018, p. 7709-7710.
^ ^a ^b Schmidhuber, Jürgen (2015). “Deep learning in neural networks: An overview”. Neural Networks 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003. PMID 25462637.
^ Sven Behnke (2003). Hierarchical Neural Networks for Image Interpretation.. Lecture Notes in Computer Science. 2766. Springer
^ “Sepp Hochreiter's Fundamental Deep Learning Problem (1991)”. people.idsia.ch. 2017年1月7日閲覧。

[前の解説]

[続きの解説]

「勾配消失問題」の続きの解説一覧

[18] 各種文献では、最上層という表現が使われることもある^[17]。

[25] Residual neural networkの略であり、頭文字をとるとRNNとなるが、回帰型ニューラルネットワークとは関係がない。

[39] Xavielの初期値は、2010年の提案では一様分布を用いた導出が紹介されている^[35]が、2015年のHeらの論文が示しているように、入出力のノード数をパラメータとして用いた正規分布として表すこともできる^[36]。

[44] その後の研究で、バッチノーマライゼーションは内部共変量シフトの問題の緩和には寄与していない可能性があるとしているものもある^[40]。

[1] Okatani, Takayuki (2015). “On Deep Learning”. Journal of the Robotics Society of Japan 33 (2): 92–96. doi:10.7210/jrsj.33.92. ISSN 0289-1824. https://doi.org/10.7210/jrsj.33.92.

[FOOTNOTEBasodiJiZhangPan2020197-2] Basodi et al. 2020, p. 197.

[FOOTNOTEYang202053-54-3] Yang 2020, p. 53-54.

[FOOTNOTEYang202054-4] Yang 2020, p. 54.

[5] Schmidhuber, Jürgen (2015-01-01). “Deep learning in neural networks: An overview” (英語). Neural Networks 61: 90-91,93-94. doi:10.1016/j.neunet.2014.09.003. ISSN 0893-6080.

[FOOTNOTEDeng20122-3,4-6] Deng 2012, p. 2-3,4.

[7] Hochreiter, S. (1991). Untersuchungen zu dynamischen neuronalen Netzen (PDF) (Diplom thesis). Institut f. Informatik, Technische Univ. Munich.

[8] Hochreiter, S.; Bengio, Y.; Frasconi, P.; Schmidhuber, J. (2001). “Gradient flow in recurrent nets: the difficulty of learning long-term dependencies”. In Kremer, S. C.; Kolen, J. F.. A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press. ISBN 0-7803-5369-2

[9] Schmidhuber, Jürgen (2015-01-01). “Deep learning in neural networks: An overview” (英語). Neural Networks 61: 93-94. doi:10.1016/j.neunet.2014.09.003. ISSN 0893-6080.

[10] Goh, Garrett B.; Hodas, Nathan O.; Vishnu, Abhinav (2017-06-15). “Deep learning for computational chemistry” (英語). Journal of Computational Chemistry 38 (16): 1291–1307. arXiv:1701.04503. Bibcode: 2017arXiv170104503G. doi:10.1002/jcc.24764. PMID 28272810.

[11] Pascanu, Razvan; Mikolov, Tomas; Bengio, Yoshua (21 November 2012). "On the difficulty of training Recurrent Neural Networks". arXiv:1211.5063 [cs.LG]。

[12] Pascanu, Razvan; Mikolov, Tomas; Bengio, Yoshua (2013-06-16). “On the difficulty of training recurrent neural networks”. Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28 (Atlanta, GA, USA: JMLR.org): III–1310–III–1318. doi:10.5555/3042817.3043083.

[Deng2014-13] Deng, Li (2014). “A tutorial survey of architectures, algorithms, and applications for deep learning” (英語). APSIPA Transactions on Signal and Information Processing 3 (1): 18. doi:10.1017/atsip.2013.9. ISSN 2048-7703.

[SCHMID1992-14] J. Schmidhuber., "Learning complex, extended sequences using the principle of history compression," Neural Computation, 4, pp. 234–242, 1992.

[FOOTNOTEDeng20123-15] Deng 2012, p. 3.

[FOOTNOTEDeng20124-16] Deng 2012, p. 4.

[17] 川上玲「5分で分かる?!有名論文ナナメ読み」（pdf）『情報処理』第59巻第10号、情報処理学会、2018年10月15日、946頁。

[hinton2006-19] Hinton, G. E.; Osindero, S.; Teh, Y. (2006). “A fast learning algorithm for deep belief nets”. Neural Computation 18 (7): 1527–1554. doi:10.1162/neco.2006.18.7.1527. PMID 16764513.

[20] Hinton, G. (2009). “Deep belief networks”. Scholarpedia 4 (5): 5947. Bibcode: 2009SchpJ...4.5947H. doi:10.4249/scholarpedia.5947.

[lstm-21] Hochreiter, Sepp; Schmidhuber, Jürgen (1997). “Long Short-Term Memory”. Neural Computation 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276.

[22] Ribeiro, Antônio H.; Tiels, Koen; Aguirre, Luis A.; Schön, Thomas B. (2020-08-26). “Beyond exploding and vanishing gradients: analysing RNN training using attractors and smoothness” (English). Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR: 2371.

[23] Graves, Alex; and Schmidhuber, Jürgen; Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks, in Bengio, Yoshua; Schuurmans, Dale; Lafferty, John; Williams, Chris K. I.; and Culotta, Aron (eds.), Advances in Neural Information Processing Systems 22 (NIPS'22), December 7th–10th, 2009, Vancouver, BC, Neural Information Processing Systems (NIPS) Foundation, 2009, pp. 545–552

[24] Graves, A.; Liwicki, M.; Fernandez, S.; Bertolami, R.; Bunke, H.; Schmidhuber, J. (2009). “A Novel Connectionist System for Improved Unconstrained Handwriting Recognition”. IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (5): 855–868. doi:10.1109/tpami.2008.137. PMID 19299860.

[He2015-26] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE. pp. 770–778. arXiv:1512.03385. doi:10.1109/CVPR.2016.90. ISBN 978-1-4673-8851-1。

[Zaeemzadeh2021-27] Zaeemzadeh, Alireza; Rahnavard, Nazanin; Shah, Mubarak (2021-11-01). “Norm-Preservation: Why Residual Networks Can Become Extremely Deep?”. IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (11): 3980–3990. arXiv:1805.07477. doi:10.1109/TPAMI.2020.2990339. ISSN 0162-8828.

[Resnet_emsemble-28] Veit, Andreas; Wilber, Michael; Belongie, Serge (20 May 2016). "Residual Networks Behave Like Ensembles of Relatively Shallow Networks". arXiv:1605.06431 [cs.CV]。

[:0-29] Noel, Mathew Mithra; L, Arunkumar; Trivedi, Advait; Dutta, Praneet (4 September 2021). "Growing Cosine Unit: A Novel Oscillatory Activation Function That Can Speedup Training and Reduce Parameters in Convolutional Neural Networks". arXiv:2108.12943 [cs.LG]。

[30] Glorot, Xavier; Bordes, Antoine; Bengio, Yoshua (2011-06-14). “Deep Sparse Rectifier Neural Networks” (英語). PMLR: 315–323.

[31] LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (2015). “Deep learning”. Nature 521 (7553): 436–444. Bibcode: 2015Natur.521..436L. doi:10.1038/nature14539. PMID 26017442.

[32] Ramachandran, Prajit; Barret, Zoph; Quoc, V. Le (16 October 2017). "Searching for Activation Functions". arXiv:1710.05941 [cs.NE]。

[33] Noel, Matthew Mithra; Bharadwaj, Shubham; Muthiah-Nakarajan, Venkataraman; Dutta, Praneet; Amali, Geraldine Bessie (7 November 2021). "Biologically Inspired Oscillating Activation Functions Can Bridge the Performance Gap between Biological and Artificial Neurons". arXiv:2111.04020 [cs.NE]。

[Balduzz2017-344-34] Balduzzi, David; Frean, Marcus; Leary, Lennox; Lewis, J P; Ma, Kurt Wan-Duo; McWilliams, Brian (2017-08-06). “The shattered gradients problem: if resnets are the answer, then what is the question?”. Proceedings of the 34th International Conference on Machine Learning - Volume 70 (Sydney, NSW, Australia: JMLR.org): 344. doi:10.5555/3305381.3305417.

[FOOTNOTEGlorotBengio2010-35] Glorot & Bengio 2010.

[FOOTNOTEHeZhangRenSun2015-36] He et al. 2015.

[FOOTNOTEGlorotBengio2010251,_253-37] Glorot & Bengio 2010, p. 251, 253.

[FOOTNOTEHeZhangRenSun20151030-38] He et al. 2015, p. 1030.

[FOOTNOTEHeZhangRenSun20151029,_1030-40] He et al. 2015, p. 1029, 1030.

[FOOTNOTEBjorckGomesSelmanWeinberger20187705-41] Bjorck et al. 2018, p. 7705.

[FOOTNOTEBjorckGomesSelmanWeinberger20187705-7706-42] Bjorck et al. 2018, p. 7705-7706.

[FOOTNOTESanturkarTsiprasIlyasMądry20182488-2489-43] Santurkar et al. 2018, p. 2488-2489.

[45] Ioffe, Sergey; Szegedy, Christian (2015-07-06). “Batch normalization: accelerating deep network training by reducing internal covariate shift”. Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37 (Lille, France: JMLR.org): 449，455. doi:10.5555/3045118.3045167.

[FOOTNOTESanturkarTsiprasIlyasMądry20182492-2493-46] Santurkar et al. 2018, p. 2492-2493.

[FOOTNOTEBjorckGomesSelmanWeinberger20187709-7710-47] Bjorck et al. 2018, p. 7709-7710.

[#1-48] Schmidhuber, Jürgen (2015). “Deep learning in neural networks: An overview”. Neural Networks 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003. PMID 25462637.

[49] Sven Behnke (2003). Hierarchical Neural Networks for Image Interpretation.. Lecture Notes in Computer Science. 2766. Springer

[50] “Sepp Hochreiter's Fundamental Deep Learning Problem (1991)”. people.idsia.ch. 2017年1月7日閲覧。

[1]

[17]

[35]

[36]

[40]

勾配消失問題 勾配消失問題の概要

勾配消失問題

注釈

出典

急上昇のことば

「勾配消失問題」の関連用語

勾配消失問題勾配消失問題の概要