What is ReLU? A plain-language explanation

The rectified linear function (also simply called the rectifier^{[Note 1]}) is an activation function defined as the positive part of its argument:

f(x)=x^{+}=\max(0,x)

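In the equation above, $x$ is the input to the neuron. The function is also known as the ramp function and is analogous to a half-wave rectifier circuit in electrical engineering. It was first proposed as a new neuron model by Tang et al. in 1993, who applied it to neural network training and demonstrated its effectiveness^[1]. In 2000, Hahnloser et al. introduced it into dynamical networks with strong biological motivation and mathematical justification^[2]^[3]. In 2011 it was demonstrated for the first time^[5] that it enables better training of deeper networks than the activation functions widely used before 2011, such as the logistic sigmoid (which is inspired by probability theory; see logistic regression) and its more practical^[4], functionally equivalent counterpart, the hyperbolic tangent. As of 2018, the rectified linear function was the most widely used activation function in deep neural networks^[6]^[7].

A unit that uses the rectified linear function is also called a rectified linear unit (ReLU)^[8].

Rectified linear units are used with deep neural networks in computer vision^[5] and speech recognition^[9]^[10].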
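
The definition $f(x)=\max(0,x)$ can be illustrated with a minimal Python sketch. The function names `relu` and `relu_layer` are chosen here for illustration and do not come from any particular library.

```python
def relu(x):
    """Rectified linear function: the positive part of x."""
    return max(0.0, x)


def relu_layer(inputs):
    """Apply ReLU elementwise, as an activation layer would."""
    return [relu(v) for v in inputs]


# Negative inputs are clipped to zero; positive inputs pass through unchanged.
print(relu_layer([-2.0, -0.5, 0.0, 0.5, 2.0]))  # [0.0, 0.0, 0.0, 0.5, 2.0]
```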

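Softplus

A smooth approximation to the rectified linear function is the analytic function

f(x)=\log(1+e^{x}),

which is called the softplus function^[11]^[5] or the SmoothReLU function^[12]. The derivative of softplus is the logistic function, $f'(x)={\frac {e^{x}}{1+e^{x}}}={\frac {1}{1+e^{-x}}}$. The logistic function is in turn a smooth approximation of the Heaviside step function, which is the derivative of the rectified linear function.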
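
The relationship between softplus and the logistic function can be checked numerically. The sketch below is one possible implementation, not a reference one: it rewrites $\log(1+e^{x})$ as $\max(x,0)+\log(1+e^{-|x|})$ to avoid overflow, and compares a finite-difference derivative with the logistic function. The helper `numerical_derivative` and the step size `h` are assumptions made for this check.

```python
import math


def softplus(x):
    """log(1 + e^x), rewritten so that exp() never overflows."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logistic(x):
    """1 / (1 + e^-x), evaluated without overflowing exp()."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def numerical_derivative(f, x, h=1e-6):
    """Central finite difference, used only as a sanity check."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# The last two columns should agree closely at every x.
for x in (-3.0, 0.0, 3.0):
    print(x, softplus(x), numerical_derivative(softplus, x), logistic(x))
```

For large positive inputs softplus approaches $x$ and for large negative inputs it approaches 0, which is the sense in which it smooths the rectified linear function.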

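The multivariable generalization of single-variable softplus is the LogSumExp function with its first argument set to zero:

\mathrm {LSE} _{0}^{+}(x_{1},\dots ,x_{n}):=\mathrm {LSE} (0,x_{1},\dots ,x_{n})=\log \left(1+e^{x_{1}}+\cdots +e^{x_{n}}\right).

The LogSumExp function itself is

\mathrm {LSE} (x_{1},\dots ,x_{n})=\log \left(e^{x_{1}}+\cdots +e^{x_{n}}\right),

and its gradient is the softmax function. Softmax with its first argument set to zero is the multivariable generalization of the logistic function. Both LogSumExp and softmax are used in machine learning.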
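
These identities can be sketched in a few lines of Python. The function names below are illustrative, and subtracting the maximum before exponentiating is a common stabilization trick assumed here rather than something stated in the text.

```python
import math


def logsumexp(xs):
    """log(sum(exp(x_i))), shifted by the maximum for numerical stability."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def softmax(xs):
    """Gradient of logsumexp with respect to each x_i."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def lse0_plus(xs):
    """LogSumExp with an extra first argument fixed at zero."""
    return logsumexp([0.0] + list(xs))


x = 1.5
# With one variable, LSE0+ reduces to softplus, log(1 + e^x).
print(lse0_plus([x]), math.log(1.0 + math.exp(x)))
# Softmax of (0, x): the second component is the logistic function of x.
print(softmax([0.0, x])[1], 1.0 / (1.0 + math.exp(-x)))
```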

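Variants

Noisy ReLU

Rectified linear units can be extended to include Gaussian noise. The result is called the noisy ReLU and is given by^[8]

f(x)=\max(0,x+Y),\quad {\text{with }}Y\sim {\mathcal {N}}(0,\sigma (x))

Noisy ReLUs have been used with some success in restricted Boltzmann machines for computer vision tasks^[8].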
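
A minimal sketch of sampling a noisy ReLU follows. The text does not define $\sigma (x)$; the sketch assumes it is the logistic sigmoid and that it gives the variance of the noise, so the standard deviation passed to the sampler is its square root. The seed and input value are arbitrary.

```python
import math
import random


def logistic(x):
    """1 / (1 + e^-x), evaluated without overflowing exp()."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def noisy_relu(x, rng):
    """max(0, x + Y) with Y ~ N(0, sigma(x)).

    Assumption: sigma(x) is the logistic sigmoid and is the variance of
    the noise, so the standard deviation is its square root.
    """
    std = math.sqrt(logistic(x))
    return max(0.0, x + rng.gauss(0.0, std))


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [noisy_relu(0.5, rng) for _ in range(5)]
print(samples)  # values vary around 0.5 but are never negative
```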

Leaky ReLU

The leaky ReLU allows a small positive gradient when the unit is not active^[10].

f(x)={\begin{cases}x&{\mbox{if }}x>0\\0.01x&{\mbox{otherwise}}\end{cases}}
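
As an illustrative sketch, the leaky ReLU and its derivative can be written as follows. The parameter name `slope` is hypothetical; its default of 0.01 matches the coefficient in the formula above.

```python
def leaky_relu(x, slope=0.01):
    """x for x > 0, slope * x otherwise."""
    return x if x > 0 else slope * x


def leaky_relu_grad(x, slope=0.01):
    """Derivative; x == 0 is assigned to the 'otherwise' branch."""
    return 1.0 if x > 0 else slope


# Unlike plain ReLU, the gradient stays non-zero for negative inputs.
print(leaky_relu(3.0), leaky_relu(-3.0))
print(leaky_relu_grad(3.0), leaky_relu_grad(-3.0))
```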

Parametric ReLU

The parametric ReLU (PReLU) takes this idea further by learning the leakage coefficient jointly with the other parameters of the neural network^[13].

f(x)={\begin{cases}x&{\mbox{if }}x>0\\ax&{\mbox{otherwise}}\end{cases}}

For $a\leq 1$, this is equivalent to

f(x)=\max(x,ax)

and is therefore related to "maxout" networks^[13].
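
The equivalence with $\max(x,ax)$ and its restriction to $a\leq 1$ can be checked directly. The sketch below also includes the partial derivative of $f$ with respect to $a$, which follows from the piecewise definition and is what a gradient-based learner would use to update $a$; all function names are illustrative.

```python
def prelu(x, a):
    """Piecewise definition: x for x > 0, a * x otherwise."""
    return x if x > 0 else a * x


def prelu_max_form(x, a):
    """Equivalent form max(x, a * x); valid only when a <= 1."""
    return max(x, a * x)


def prelu_grad_a(x):
    """Partial derivative of f with respect to a: 0 for x > 0, x otherwise."""
    return 0.0 if x > 0 else x


# The two forms agree for every a <= 1 tried here.
for a in (-1.0, 0.0, 0.25, 1.0):
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        assert prelu(x, a) == prelu_max_form(x, a)

# For a > 1 the two forms disagree, e.g. a = 2 and x = 1.
print(prelu(1.0, 2.0), prelu_max_form(1.0, 2.0))  # 1.0 2.0
```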

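ELU

The exponential linear unit (ELU) tries to push mean activations closer to zero, which speeds up learning. ELUs have been shown to achieve higher classification accuracy than ReLUs^[14].

f(x)={\begin{cases}x&{\mbox{if }}x>0\\a(e^{x}-1)&{\mbox{otherwise}}\end{cases}}

Here $a$ is a hyperparameter to be tuned, subject to the constraint $a\geq 0$.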
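
A minimal sketch of the ELU and its derivative is given below. The default `a=1.0` is an assumption for illustration only, since the text leaves $a$ as a tunable hyperparameter.

```python
import math


def elu(x, a=1.0):
    """x for x > 0, a * (e^x - 1) otherwise; expm1 keeps precision near 0."""
    return x if x > 0 else a * math.expm1(x)


def elu_grad(x, a=1.0):
    """Derivative: 1 for x > 0, a * e^x otherwise."""
    return 1.0 if x > 0 else a * math.exp(x)


# Negative inputs saturate toward -a instead of being clipped to 0.
for x in (-5.0, -1.0, 0.0, 1.0):
    print(x, elu(x), elu_grad(x))
```

Because negative inputs produce negative outputs bounded below by $-a$, the mean activation can sit closer to zero than with plain ReLU.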

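Swish

The Swish function, also called the sigmoid-weighted linear unit, traces a curve similar to that of ReLU, but the curve is smooth, that is, infinitely differentiable everywhere. It has been reported to achieve higher classification accuracy than ReLU and the variants described above^[15]^[7].

f(x)=x\cdot \sigma (x)

Here $\sigma (x)$ denotes the standard sigmoid function, defined by

\sigma (x)={\tfrac {1}{1+e^{-x}}}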
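
As a sketch, Swish and its derivative can be computed from the sigmoid alone. The derivative formula in `swish_grad` follows from the product rule together with $\sigma '(x)=\sigma (x)(1-\sigma (x))$; the function names are illustrative.

```python
import math


def sigmoid(x):
    """Standard sigmoid 1 / (1 + e^-x), evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def swish(x):
    """f(x) = x * sigma(x)."""
    return x * sigmoid(x)


def swish_grad(x):
    """d/dx [x * sigma(x)] = sigma(x) + x * sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s + x * s * (1.0 - s)


# Unlike ReLU, the derivative is defined at 0, and small negative
# inputs give small negative outputs.
print(swish(0.0), swish_grad(0.0))  # 0.0 0.5
print(swish(-1.0), swish(5.0))
```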

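Funnel Activation

Funnel Activation, or FReLU, is an activation function specialized for computer vision and designed to be applied to two-dimensional images^[16]. By its nature, Funnel Activation cannot be expressed as a function of a single scalar input, but for convenience of notation it is written as

f(x)=\max(x,\mathbb {T} (x))

where $\mathbb {T} (x)$ is a depthwise convolution module learned jointly with the other parameters of the neural network.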
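
The idea can be sketched in plain Python under several simplifying assumptions: $\mathbb {T}$ is taken to be a single 3×3 filter per channel with zero padding, the kernel values are hypothetical placeholders for learned parameters, and no normalization or training step is included. It is meant only to show why the output at each pixel depends on its spatial neighbourhood.

```python
def depthwise_conv3x3(channel, kernel, bias=0.0):
    """3x3 filter over one channel with zero padding (same-size output).

    As in most deep learning code, this is a cross-correlation:
    the kernel is not flipped.
    """
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = bias
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    r, c = i + di, j + dj
                    if 0 <= r < h and 0 <= c < w:
                        acc += kernel[di + 1][dj + 1] * channel[r][c]
            out[i][j] = acc
    return out


def frelu(feature_map, kernels):
    """Apply max(x, T(x)) per channel; feature_map is [channel][row][col]."""
    result = []
    for channel, kernel in zip(feature_map, kernels):
        t = depthwise_conv3x3(channel, kernel)
        result.append([[max(x, tx) for x, tx in zip(row, trow)]
                       for row, trow in zip(channel, t)])
    return result


feature_map = [[[-1.0, 2.0], [3.0, -4.0]]]   # one 2x2 channel
zero_kernel = [[[0.0] * 3 for _ in range(3)]]
# With an all-zero kernel T(x) = 0, so the result coincides with ReLU.
print(frelu(feature_map, zero_kernel))  # [[[0.0, 2.0], [3.0, 0.0]]]
```

Each channel has its own kernel and is filtered independently of the others, which is what makes the convolution depthwise.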

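Advantages

Biological plausibility
- One-sided, in contrast to the antisymmetry of tanh.
Sparse activation
- For example, in a randomly initialized network, only about 50% of hidden units are activated (have a non-zero output).
Better gradient propagation
- Fewer vanishing-gradient problems than with sigmoidal activation functions, which saturate in both directions.
Efficient computation
- Only comparison, addition, and multiplication.
Scale invariance
- $\max(0,ax)=a\max(0,x)\quad \mathrm {for} \;a\geq 0$.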
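
Two of the properties listed above, sparse activation and scale invariance, can be checked with a small experiment. The sketch assumes zero-mean Gaussian weights, no bias terms, and arbitrary layer sizes (`n_inputs`, `n_hidden`); other initialization schemes may give a different active fraction.

```python
import random


def relu(x):
    return max(0.0, x)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_inputs, n_hidden = 20, 1000

# One random input vector fed to a layer of randomly initialized units.
x = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
active = 0
for _ in range(n_hidden):
    w = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
    pre_activation = sum(wi * xi for wi, xi in zip(w, x))
    if relu(pre_activation) > 0.0:
        active += 1

fraction_active = active / n_hidden
print(fraction_active)  # expected to be close to 0.5

# Scale invariance: max(0, a*x) == a * max(0, x) for a >= 0.
for a in (0.0, 0.5, 3.0):
    for v in (-2.0, -0.1, 0.0, 0.1, 2.0):
        assert relu(a * v) == a * relu(v)
```

The symmetry of the weight distribution around zero is what makes each pre-activation positive with probability one half.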

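The rectified linear function was used to separate specific excitation from unspecific inhibition in the Neural Abstraction Pyramid, which was trained in a supervised manner to learn several computer vision tasks^[17]. In 2011^[5], using the rectified linear function as the nonlinearity was shown to enable the training of deep supervised neural networks without unsupervised pre-training. Compared with the sigmoid function or similar activation functions, rectified linear units allow faster and more efficient training of deep neural architectures on large and complex datasets.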

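Potential problems

Non-differentiable at the origin
- However, it is differentiable everywhere else, and the value of the derivative at an input of 0 can be chosen arbitrarily to be 0 or 1.
Not zero-centered
Unbounded
Dying ReLU problem
- A ReLU neuron can be pushed into a state in which it is inactive for essentially all inputs. In this state no gradient flows backward through the neuron, so it becomes stuck in a permanently inactive state and "dies". This is a form of the vanishing gradient problem. In some cases many neurons in a network can become stuck in dead states, which substantially reduces the model capacity. The problem typically arises when the learning rate is set too high. It can be mitigated by using a leaky ReLU instead, which assigns a small positive slope to the left of x = 0.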
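
The first and last items above can be illustrated with a single neuron $f(wx+b)$. In the sketch below the data, the weight, and the large negative bias are hypothetical values chosen to mimic the state a neuron might reach after an oversized update; `at_zero` expresses the free choice of derivative at the origin.

```python
def relu_grad(pre, at_zero=0.0):
    """Derivative of ReLU; the value at exactly 0 is a free choice (0 or 1)."""
    if pre > 0:
        return 1.0
    if pre < 0:
        return 0.0
    return at_zero


def leaky_relu_grad(pre, slope=0.01):
    """Derivative of leaky ReLU: small but non-zero for inactive inputs."""
    return 1.0 if pre > 0 else slope


def weight_gradients(w, b, inputs, act_grad):
    """d(output)/dw for a single neuron f(w*x + b), one value per input."""
    return [act_grad(w * x + b) * x for x in inputs]


inputs = [0.1, 0.5, 1.0, 2.0]  # hypothetical data
w, b = 1.0, -10.0              # hypothetical state after an oversized update

dead = weight_gradients(w, b, inputs, relu_grad)
leaky = weight_gradients(w, b, inputs, leaky_relu_grad)
print(dead)   # [0.0, 0.0, 0.0, 0.0]: no input can move the weight again
print(leaky)  # small positive values: the neuron can still recover
```

Since every gradient in `dead` is zero, gradient descent leaves `w` and `b` unchanged no matter how many steps are taken, which is the stuck state described above.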

Footnotes


Notes

[Note 1] In the sense of an electrical rectifier.

References

[1] Tang, Z.; Ishizuka, O.; Matsumoto, H. (1993). “A Model of Neurons with Unidirectional Linear Response”. IEICE Trans. on Fundamentals. E76-A (9): 1537–1540.
[2] Hahnloser, R.; Sarpeshkar, R.; Mahowald, M. A.; Douglas, R. J.; Seung, H. S. (2000). “Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit”. Nature. 405: 947–951. doi:10.1038/35016072.
[3] Hahnloser, R.; Seung, H. S. (2001). Permitted and Forbidden Sets in Symmetric Threshold-Linear Networks. NIPS 2001.
[4] LeCun, Yann; Bottou, Leon; Orr, Genevieve B.; Müller, Klaus-Robert (1998). “Efficient BackProp” (PDF). In G. Orr and K. Müller (ed.). Neural Networks: Tricks of the Trade. Springer.
[5] Glorot, Xavier; Bordes, Antoine; Bengio, Yoshua (2011). Deep sparse rectifier neural networks (PDF). AISTATS. Quote: “Rectifier and softplus activation functions. The second one is a smooth version of the first.”
[6] LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (2015). “Deep learning”. Nature. 521 (7553): 436–444. Bibcode:2015Natur.521..436L. doi:10.1038/nature14539. PMID 26017442.
[7] Ramachandran, Prajit; Zoph, Barret; Le, Quoc V. (16 October 2017). “Searching for Activation Functions”. arXiv:1710.05941 [cs.NE].
[8] Nair, Vinod; Hinton, Geoffrey (2010). Rectified Linear Units Improve Restricted Boltzmann Machines (PDF). ICML.
[9] Tóth, László (2013). Phone Recognition with Deep Sparse Rectifier Neural Networks (PDF). ICASSP.
[10] Maas, Andrew L.; Hannun, Awni Y.; Ng, Andrew Y. (2014). Rectifier Nonlinearities Improve Neural Network Acoustic Models.
[11] Dugas, Charles; Bengio, Yoshua; Bélisle, François; Nadeau, Claude; Garcia, René (1 January 2000). “Incorporating second-order functional knowledge for better option pricing” (PDF). Proceedings of the 13th International Conference on Neural Information Processing Systems (NIPS'00). MIT Press: 451–457. Quote: “Since the sigmoid h has a positive first derivative, its primitive, which we call softplus, is convex.”
[12] “Smooth Rectifier Linear Unit (SmoothReLU) Forward Layer”. Developer Guide for Intel® Data Analytics Acceleration Library (in American English). 2017. Retrieved 4 December 2018.
[13] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2015). “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification”. arXiv:1502.01852 [cs.CV].
[14] Clevert, Djork-Arné; Unterthiner, Thomas; Hochreiter, Sepp (2015). “Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)”. arXiv:1511.07289 [cs.LG].
[15] “［活性化関数］シグモイド関数（Sigmoid function）とは？” (in Japanese). ＠IT. Retrieved 28 November 2021.
[16] Ma, Ningning; Zhang, Xiangyu; Sun, Jian (24 July 2020). “Funnel Activation for Visual Recognition”. arXiv:2007.11824 [cs.CV].
[17] Behnke, Sven (2003). Hierarchical Neural Networks for Image Interpretation. Lecture Notes in Computer Science. Vol. 2766. Springer. doi:10.1007/b11963.

External links

“Ramp function (ReLU, rectified linear function)” (in Japanese) - 高校数学の美しい物語
