こうばいしょうしつ‐もんだい〔コウバイセウシツ‐〕【勾配消失問題】
勾配消失問題
出典: フリー百科事典『ウィキペディア(Wikipedia)』 (2024/02/25 00:42 UTC 版)
注釈
出典
- ^ Okatani, Takayuki (2015). “On Deep Learning”. Journal of the Robotics Society of Japan 33 (2): 92–96. doi:10.7210/jrsj.33.92. ISSN 0289-1824 .
- ^ a b Basodi et al. 2020, p. 197.
- ^ Yang 2020, p. 53-54.
- ^ a b Yang 2020, p. 54.
- ^ Schmidhuber, Jürgen (2015-01-01). “Deep learning in neural networks: An overview” (英語). Neural Networks 61: 90-91,93-94. doi:10.1016/j.neunet.2014.09.003. ISSN 0893-6080 .
- ^ Deng 2012, p. 2-3,4.
- ^ Hochreiter, S. (1991). Untersuchungen zu dynamischen neuronalen Netzen (PDF) (Diplom thesis). Institut f. Informatik, Technische Univ. Munich.
- ^ Hochreiter, S.; Bengio, Y.; Frasconi, P.; Schmidhuber, J. (2001). “Gradient flow in recurrent nets: the difficulty of learning long-term dependencies”. In Kremer, S. C.; Kolen, J. F.. A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press. ISBN 0-7803-5369-2
- ^ Schmidhuber, Jürgen (2015-01-01). “Deep learning in neural networks: An overview” (英語). Neural Networks 61: 93-94. doi:10.1016/j.neunet.2014.09.003. ISSN 0893-6080 .
- ^ Goh, Garrett B.; Hodas, Nathan O.; Vishnu, Abhinav (2017-06-15). “Deep learning for computational chemistry” (英語). Journal of Computational Chemistry 38 (16): 1291–1307. arXiv:1701.04503. Bibcode: 2017arXiv170104503G. doi:10.1002/jcc.24764. PMID 28272810.
- ^ Pascanu, Razvan; Mikolov, Tomas; Bengio, Yoshua (21 November 2012). "On the difficulty of training Recurrent Neural Networks". arXiv:1211.5063 [cs.LG]。
- ^ Pascanu, Razvan; Mikolov, Tomas; Bengio, Yoshua (2013-06-16). “On the difficulty of training recurrent neural networks”. Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28 (Atlanta, GA, USA: JMLR.org): III–1310–III–1318. doi:10.5555/3042817.3043083 .
- ^ a b c Deng, Li (2014). “A tutorial survey of architectures, algorithms, and applications for deep learning” (英語). APSIPA Transactions on Signal and Information Processing 3 (1): 18. doi:10.1017/atsip.2013.9. ISSN 2048-7703 .
- ^ J. Schmidhuber., "Learning complex, extended sequences using the principle of history compression," Neural Computation, 4, pp. 234–242, 1992.
- ^ Deng 2012, p. 3.
- ^ Deng 2012, p. 4.
- ^ 川上玲「5分で分かる?!有名論文ナナメ読み」(pdf)『情報処理』第59巻第10号、情報処理学会、2018年10月15日、946頁。
- ^ Hinton, G. E.; Osindero, S.; Teh, Y. (2006). “A fast learning algorithm for deep belief nets”. Neural Computation 18 (7): 1527–1554. doi:10.1162/neco.2006.18.7.1527. PMID 16764513 .
- ^ Hinton, G. (2009). “Deep belief networks”. Scholarpedia 4 (5): 5947. Bibcode: 2009SchpJ...4.5947H. doi:10.4249/scholarpedia.5947.
- ^ Hochreiter, Sepp; Schmidhuber, Jürgen (1997). “Long Short-Term Memory”. Neural Computation 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276.
- ^ Ribeiro, Antônio H.; Tiels, Koen; Aguirre, Luis A.; Schön, Thomas B. (2020-08-26). “Beyond exploding and vanishing gradients: analysing RNN training using attractors and smoothness” (English). Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR: 2371 .
- ^ Graves, Alex; and Schmidhuber, Jürgen; Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks, in Bengio, Yoshua; Schuurmans, Dale; Lafferty, John; Williams, Chris K. I.; and Culotta, Aron (eds.), Advances in Neural Information Processing Systems 22 (NIPS'22), December 7th–10th, 2009, Vancouver, BC, Neural Information Processing Systems (NIPS) Foundation, 2009, pp. 545–552
- ^ Graves, A.; Liwicki, M.; Fernandez, S.; Bertolami, R.; Bunke, H.; Schmidhuber, J. (2009). “A Novel Connectionist System for Improved Unconstrained Handwriting Recognition”. IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (5): 855–868. doi:10.1109/tpami.2008.137. PMID 19299860.
- ^ a b He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE. pp. 770–778. arXiv:1512.03385. doi:10.1109/CVPR.2016.90. ISBN 978-1-4673-8851-1。
- ^ a b c Zaeemzadeh, Alireza; Rahnavard, Nazanin; Shah, Mubarak (2021-11-01). “Norm-Preservation: Why Residual Networks Can Become Extremely Deep?”. IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (11): 3980–3990. arXiv:1805.07477. doi:10.1109/TPAMI.2020.2990339. ISSN 0162-8828 .
- ^ Veit, Andreas; Wilber, Michael; Belongie, Serge (20 May 2016). "Residual Networks Behave Like Ensembles of Relatively Shallow Networks". arXiv:1605.06431 [cs.CV]。
- ^ a b Noel, Mathew Mithra; L, Arunkumar; Trivedi, Advait; Dutta, Praneet (4 September 2021). "Growing Cosine Unit: A Novel Oscillatory Activation Function That Can Speedup Training and Reduce Parameters in Convolutional Neural Networks". arXiv:2108.12943 [cs.LG]。
- ^ Glorot, Xavier; Bordes, Antoine; Bengio, Yoshua (2011-06-14). “Deep Sparse Rectifier Neural Networks” (英語). PMLR: 315–323 .
- ^ LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (2015). “Deep learning”. Nature 521 (7553): 436–444. Bibcode: 2015Natur.521..436L. doi:10.1038/nature14539. PMID 26017442.
- ^ Ramachandran, Prajit; Barret, Zoph; Quoc, V. Le (16 October 2017). "Searching for Activation Functions". arXiv:1710.05941 [cs.NE]。
- ^ Noel, Matthew Mithra; Bharadwaj, Shubham; Muthiah-Nakarajan, Venkataraman; Dutta, Praneet; Amali, Geraldine Bessie (7 November 2021). "Biologically Inspired Oscillating Activation Functions Can Bridge the Performance Gap between Biological and Artificial Neurons". arXiv:2111.04020 [cs.NE]。
- ^ a b Balduzzi, David; Frean, Marcus; Leary, Lennox; Lewis, J P; Ma, Kurt Wan-Duo; McWilliams, Brian (2017-08-06). “The shattered gradients problem: if resnets are the answer, then what is the question?”. Proceedings of the 34th International Conference on Machine Learning - Volume 70 (Sydney, NSW, Australia: JMLR.org): 344. doi:10.5555/3305381.3305417 .
- ^ Glorot & Bengio 2010.
- ^ He et al. 2015.
- ^ Glorot & Bengio 2010, p. 251, 253.
- ^ He et al. 2015, p. 1030.
- ^ He et al. 2015, p. 1029, 1030.
- ^ Bjorck et al. 2018, p. 7705.
- ^ Bjorck et al. 2018, p. 7705-7706.
- ^ Santurkar et al. 2018, p. 2488-2489.
- ^ Ioffe, Sergey; Szegedy, Christian (2015-07-06). “Batch normalization: accelerating deep network training by reducing internal covariate shift”. Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37 (Lille, France: JMLR.org): 449,455. doi:10.5555/3045118.3045167 .
- ^ Santurkar et al. 2018, p. 2492-2493.
- ^ Bjorck et al. 2018, p. 7709-7710.
- ^ a b Schmidhuber, Jürgen (2015). “Deep learning in neural networks: An overview”. Neural Networks 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003. PMID 25462637.
- ^ Sven Behnke (2003). Hierarchical Neural Networks for Image Interpretation.. Lecture Notes in Computer Science. 2766. Springer
- ^ “Sepp Hochreiter's Fundamental Deep Learning Problem (1991)”. people.idsia.ch. 2017年1月7日閲覧。
[続きの解説]
「勾配消失問題」の続きの解説一覧
- 1 勾配消失問題とは
- 2 勾配消失問題の概要
- 3 勾配消失問題の発生
- 4 解決手法
- 5 参考文献
- 6 関連項目
- 勾配消失問題のページへのリンク