Convolutional neural network: Overview

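Because of its shared-weight structure (the same weight matrices are reused across positions) and its translation-invariance property, a CNN is also called a shift-invariant or space-invariant artificial neural network (SIANN)^[1]^[2].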
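Strictly speaking, the convolution itself is shift-equivariant (shifting the input shifts the output by the same amount), and a shift-invariant quantity appears once positions are pooled, for example by a global maximum. The NumPy sketch below checks both properties numerically. The circular (wrap-around) boundary, the random 1D signal and kernel, and the helper name `cross_correlate_1d_circular` are illustrative assumptions; with zero padding instead, the equality holds only away from the borders.

```python
import numpy as np

def cross_correlate_1d_circular(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Circular 1D cross-correlation: y[i] = sum_m w[m] * x[(i + m) % n]."""
    n = len(x)
    return np.array([sum(w[m] * x[(i + m) % n] for m in range(len(w))) for i in range(n)])

rng = np.random.default_rng(0)
x = rng.standard_normal(16)  # toy input signal (assumption)
w = rng.standard_normal(3)   # one kernel shared by every position (weight sharing)
shift = 5

# Equivariance: shifting then convolving == convolving then shifting
y_shift_then_conv = cross_correlate_1d_circular(np.roll(x, shift), w)
y_conv_then_shift = np.roll(cross_correlate_1d_circular(x, w), shift)
print(np.allclose(y_shift_then_conv, y_conv_then_shift))  # True

# Invariance after global max pooling: the pooled value ignores the shift
print(np.isclose(y_shift_then_conv.max(), cross_correlate_1d_circular(x, w).max()))  # True
```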

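A general convolution operation is formulated as follows^[3]. Here $C_{\mathrm {out} _{j}}$ denotes the j-th output channel, $C_{\mathrm {in} }$ is the number of input channels, and $\star$ denotes the cross-correlation operator.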

$\mathrm {output} (C_{\mathrm {out} _{j}})=\mathrm {bias} (C_{\mathrm {out} _{j}})+\sum _{k=1}^{C_{\mathrm {in} }}\mathrm {weight} (C_{\mathrm {out} _{j}},k)\star \mathrm {input} (k)$

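That is, each output channel has its own set of $C_{\mathrm {in} }$ convolution kernels $\mathrm {weight} (C_{\mathrm {out} _{j}},k)$, one per input channel. Each input channel is convolved with its kernel, the results are summed, and the bias term $\mathrm {bias} (C_{\mathrm {out} _{j}})$ is added to give the output of that channel. As the formula shows, the input channels are combined by summation rather than by convolution, and the kernel applied to a given input channel $\mathrm {input} (k)$ differs from one output channel to another.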
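A direct, loop-based NumPy sketch of the formula may make the roles of the indices concrete. It assumes stride 1, no padding and no dilation, and the function name `conv2d_multichannel` is hypothetical; it mirrors the equation term by term and is not meant to reproduce how PyTorch's `nn.Conv2d` is implemented internally.

```python
import numpy as np

def conv2d_multichannel(inp: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """output(C_out_j) = bias(C_out_j) + sum_k weight(C_out_j, k) ⋆ input(k)

    inp:    (C_in, H, W)
    weight: (C_out, C_in, kH, kW), one kH x kW kernel per (output, input) channel pair
    bias:   (C_out,)
    returns (C_out, H - kH + 1, W - kW + 1)  (stride 1, no padding)
    """
    c_in, h, w = inp.shape
    c_out, c_in_w, kh, kw = weight.shape
    if c_in != c_in_w:
        raise ValueError("weight must hold one kernel per input channel")
    out_h, out_w = h - kh + 1, w - kw + 1
    out = np.empty((c_out, out_h, out_w))
    for j in range(c_out):                 # each output channel
        out[j] = bias[j]                   # start from the bias term
        for k in range(c_in):              # input channels are combined by summation
            kernel = weight[j, k]          # kernel depends on both j and k
            for y in range(out_h):
                for x in range(out_w):
                    # cross-correlation (the kernel is not flipped)
                    out[j, y, x] += np.sum(kernel * inp[k, y:y + kh, x:x + kw])
    return out

inp = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)  # C_in = 2, 4x4 inputs
weight = np.ones((3, 2, 3, 3))                             # C_out = 3, 3x3 kernels
bias = np.array([0.0, 1.0, -1.0])
out = conv2d_multichannel(inp, weight, bias)
print(out.shape)     # (3, 2, 2)
print(out[:, 0, 0])  # [234. 235. 233.]
```

With all-ones kernels, each output value is just the bias plus the sum of one 3×3 window across both input channels, which makes the result easy to check by hand: for the top-left window, 45 comes from channel 0 and 189 from channel 1.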

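The kernel is often called a filter^[4]. The name comes from the fact that sliding a position-dependent weighted sum over the input (convolution) is equivalent to applying a filter.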
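To illustrate the filter view, the sketch below slides two hand-picked 3×3 kernels over a toy image containing a vertical edge: a box (averaging) kernel that smooths the image, and a Prewitt-style horizontal-difference kernel that responds only where the intensity changes. The kernels, the image, and the helper `correlate2d_valid` are assumptions chosen for clarity; in a trained CNN the kernel values are learned rather than designed by hand.

```python
import numpy as np

def correlate2d_valid(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `img` and take the weighted sum at each position (stride 1, no padding)."""
    kh, kw = kernel.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(kernel * img[y:y + kh, x:x + kw])
    return out

# Toy 5x6 image: dark left half (0), bright right half (1), i.e. a vertical edge
img = np.zeros((5, 6))
img[:, 3:] = 1.0

box_blur = np.full((3, 3), 1.0 / 9.0)          # smoothing (low-pass) filter
horiz_diff = np.array([[-1.0, 0.0, 1.0]] * 3)  # Prewitt-style left-to-right difference

print(correlate2d_valid(img, box_blur).round(3))  # each row is 0, 0.333, 0.667, 1
print(correlate2d_valid(img, horiz_diff))         # each row is 0, 3, 3, 0
```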

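The convolution itself is a simple linear transformation. This can be seen by looking at a single output point: it is equivalent to a fully connected layer in which every weight outside the local window is zero (and the same weights are reused at every output point). In many CNNs, the convolution is followed by a nonlinear transformation through an activation function such as the sigmoid function or ReLU.^[citation needed]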
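The equivalence with a sparse fully connected layer can be checked in a 1D case. The sketch builds the dense weight matrix whose row i holds the kernel in columns i to i+k-1 and zeros elsewhere, confirms that multiplying by it gives the same result as the convolution, and then applies ReLU as the separate nonlinear step. Stride 1, no padding, the toy values, and the helper names are assumptions for this example.

```python
import numpy as np

def conv1d_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1D cross-correlation, stride 1, no padding."""
    k = len(w)
    return np.array([np.dot(w, x[i:i + k]) for i in range(len(x) - k + 1)])

def conv_as_dense_matrix(w: np.ndarray, n: int) -> np.ndarray:
    """Fully connected weight matrix equivalent to conv1d_valid(., w) on length-n inputs.
    Row i holds the kernel in columns i..i+k-1; every other weight is 0,
    and every row reuses the same kernel values (weight sharing)."""
    k = len(w)
    m = np.zeros((n - k + 1, n))
    for i in range(n - k + 1):
        m[i, i:i + k] = w
    return m

def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)

x = np.array([1.0, -2.0, 3.0, 0.0, 4.0, -1.0])  # toy input
w = np.array([0.5, -1.0, 2.0])                  # toy kernel
M = conv_as_dense_matrix(w, len(x))

z = conv1d_valid(x, w)        # linear step: 8.5, -4, 9.5, -6
print(np.allclose(M @ x, z))  # True: the convolution is this sparse linear map
print(relu(z))                # nonlinear step: 8.5, 0, 9.5, 0
```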

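A simple CNN is a feedforward network (FFN), i.e., it has connections only from shallower layers to deeper ones. However, the term CNN denotes a class of networks defined by the connection pattern between two layers, so a CNN is not necessarily an FFN^[citation needed]. As one example of a non-FFN CNN, the Recurrent CNN, which has recurrent connections at the global level while performing convolutions between layers, has been proposed^[5].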
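A minimal sketch of a recurrent convolutional layer, loosely following the idea in Liang et al. (2015) of adding recurrent connections inside a convolutional layer: the feed-forward term w_f ⋆ u is computed once, and the layer's own state is repeatedly convolved with a recurrent kernel w_r and added back. The 1D single-channel setting, zero padding, ReLU, the number of steps, and all names are assumptions, and any normalization used in the paper is omitted.

```python
import numpy as np

def conv1d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1D cross-correlation with zero padding; output length == input length (odd kernel length assumed)."""
    pad = len(w) // 2
    xp = np.pad(x, pad)
    return np.array([np.dot(w, xp[i:i + len(w)]) for i in range(len(x))])

def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)

def recurrent_conv_layer(u, w_f, w_r, b, steps):
    """Simplified recurrent convolutional layer (hypothetical helper):
        x(0) = relu(w_f ⋆ u + b)
        x(t) = relu(w_f ⋆ u + w_r ⋆ x(t-1) + b),  t = 1..steps
    """
    ff = conv1d_same(u, w_f) + b  # feed-forward term, fixed across steps
    x = relu(ff)
    for _ in range(steps):
        x = relu(ff + conv1d_same(x, w_r))  # recurrent connection within the same layer
    return x

u = np.zeros(9)
u[4] = 1.0                       # a single spike as feed-forward input
w_f = np.array([0.0, 1.0, 0.0])  # identity-like feed-forward kernel (toy choice)
w_r = np.array([0.5, 0.0, 0.5])  # recurrent kernel mixing neighbours (toy choice)

for steps in range(3):
    x = recurrent_conv_layer(u, w_f, w_r, 0.0, steps)
    print(steps, np.count_nonzero(x))  # active positions: 1, 3, 5
```

In this toy run each recurrent step spreads activity one position further to each side, so more context reaches each unit without adding layers.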

CNNs have been applied to image and video recognition, recommender systems^[6], and natural language processing^[7].

Footnotes

^ Zhang, Wei (1988). “Shift-invariant pattern recognition neural network and its optical architecture”. Proceedings of annual conference of the Japan Society of Applied Physics.
^ Zhang, Wei (1990). “Parallel distributed processing model with local space-invariant interconnections and its optical architecture”. Applied Optics 29 (32).
^ “Conv2d — PyTorch 1.6.0 documentation”. pytorch.org. Retrieved October 3, 2020.
^ "convolved with its own set of filters" PyTorch 1.10 Conv1D
^ "we propose a recurrent CNN (RCNN) for object recognition by incorporating recurrent connections into each convolutional layer" p.3367 and "This work shows that it is possible to boost the performance of CNN by incorporating more facts of the brain. " p.3374 of Liang, et al. (2015). Recurrent Convolutional Neural Network for Object Recognition.
^ van den Oord, Aaron; Dieleman, Sander; Schrauwen, Benjamin (2013-01-01). Burges, C. J. C.. ed. Deep content-based music recommendation. Curran Associates, Inc.. pp. 2643–2651
^ Collobert, Ronan; Weston, Jason (2008-01-01). “A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning”. Proceedings of the 25th International Conference on Machine Learning. ICML '08 (New York, NY, USA: ACM): 160–167. doi:10.1145/1390156.1390177. ISBN 978-1-60558-205-4.
^ "a 1×1 convolution called a pointwise convolution." Howard, et al. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv.
^ "In a group conv layer ..., input and output channels are divided into C groups, and convolutions are separately performed within each group." Xie, et al. (2017). Aggregated Residual Transformations for Deep Neural Networks. arXiv.
^ "groups controls the connections between inputs and outputs. ... At groups=1, all inputs are convolved to all outputs ... At groups= in_channels, each input channel is convolved with its own set of filters" PyTorch nn.Conv2d
^ "Depthwise convolution with one filter per input channel (input depth)" Howard, et al. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv.
^ "depthwise separable convolutions which is a form of factorized convolutions which factorize a standard convolution into a depthwise convolution and a 1×1 convolution called a pointwise convolution." Howard, et al. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.
^ "we propose a recurrent CNN (RCNN) for object recognition by incorporating recurrent connections into each convolutional layer" Liang, et al. (2015). Recurrent Convolutional Neural Network for Object Recognition.
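^ "The number R of past data points (a finite length) used for prediction represents the size of the receptive field." (translated) Matsumoto (2019). WaveNetによる言語情報を含まない感情音声合成方式の検討 [A study of an emotional speech synthesis method without linguistic information using WaveNet]. IPSJ SIG Technical Report.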
^ "Effective Receptive Field (ERF): is the area of the original image that can possibly influence the activation of a neuron. ... ERF and RF are sometimes used interchangeably" Le. (2017). What are the Receptive, Effective Receptive, and Projective Fields of Neurons in Convolutional Neural Networks?. Arxiv.
^ "layer k ... R_k be the ERF ... f_k represent the filter size ... the final top-down equation: $R_{k,j}=\left(R_{k,j+1}-1\right)s_{j+1}+f_{j+1}$ "
^ Matsugu, Masakazu; Katsuhiko Mori; Yusuke Mitari; Yuji Kaneda (2003). “Subject independent facial expression recognition with robust face detection using a convolutional neural network”. Neural Networks 16 (5): 555–559. doi:10.1016/S0893-6080(03)00115-1. Retrieved November 17, 2013.
^ Fukushima, K. (2007). “Neocognitron”. Scholarpedia 2 (1): 1717. doi:10.4249/scholarpedia.1717.
^ Fukushima, Kunihiko (1980). “Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position”. Biological Cybernetics 36 (4): 193–202. doi:10.1007/BF00344251. PMID 7370364. Retrieved November 16, 2013.
^ LeCun, Yann. “LeNet-5, convolutional neural networks”. Retrieved November 16, 2013.
^ Fujiyoshi 2019, pp. 293–294.
^ Dosovitskiy, et al. (2021). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR 2021.
^ Liu, et al. (2021). Pay Attention to MLPs. NeurIPS 2021.
