What Is Self-Supervised Learning? A Plain Explanation

→ See also "Unsupervised learning" and "Supervised learning"

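Self-supervised learning (SSL; Japanese: 自己教師あり学習) is a machine learning framework in which a model is trained on supervisory signals generated from the data itself rather than on externally provided labels. In the context of neural networks, it aims to exploit the structure and relationships inherent in the input data to produce meaningful training signals. SSL tasks are designed so that solving them requires capturing important features or relationships in the data. The input data is typically augmented or transformed, for example by adding noise, cropping, or rotating, to create pairs of related samples. One member of each pair is used as the input, and the other is used as the material that provides the supervisory signal. Self-supervised learning more closely imitates the way humans learn to classify things^[1].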
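The pairing step described above can be sketched roughly as follows. This is only an illustration under assumptions: the sample is a toy 1-D signal, and the helper names (`add_noise`, `random_crop`, `make_pair`) and parameter values are hypothetical, not taken from any particular library or paper.

```python
import random

def add_noise(x, sigma, rng):
    """Return a copy of x with Gaussian noise (std sigma) added to each element."""
    return [v + rng.gauss(0.0, sigma) for v in x]

def random_crop(x, length, rng):
    """Return a contiguous window of the given length at a random offset."""
    start = rng.randrange(len(x) - length + 1)
    return x[start:start + length]

def make_pair(x, crop_len=8, sigma=0.05, rng=None):
    """Build two related views of one unlabeled sample.

    One view can serve as the model input, the other as the material
    from which the supervisory signal is derived.
    """
    rng = rng or random.Random()
    view_a = add_noise(random_crop(x, crop_len, rng), sigma, rng)
    view_b = add_noise(random_crop(x, crop_len, rng), sigma, rng)
    return view_a, view_b

sample = [float(i) for i in range(16)]  # toy unlabeled 1-D signal (assumed data)
a, b = make_pair(sample, rng=random.Random(0))
```

No label is involved at any point: both views come from the same unlabeled sample, which is what makes them a related pair.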

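Training typically proceeds in two stages. First, an auxiliary (pretext) classification task is solved using pseudo-labels, and the useful features obtained in this process are used to initialize the model parameters^[2]^[3]. Next, the actual task is carried out with supervised or unsupervised learning^[4]^[5]^[6].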
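As a rough illustration of how pseudo-labels for the first stage can be produced, the sketch below assumes rotation prediction as the pretext task (an assumption chosen for simplicity; the text does not prescribe a specific task). Each image is rotated by a random multiple of 90 degrees, and the rotation index becomes its pseudo-label. The function names are hypothetical, and the second stage (training on the actual task) is not shown.

```python
import random

def rot90(img):
    """Rotate a 2-D list 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def rotate(img, k):
    """Apply k successive 90-degree clockwise rotations and return a new image."""
    for _ in range(k % 4):
        img = rot90(img)
    return [list(row) for row in img]

def make_pretext_dataset(images, rng):
    """Pair each rotated image with its rotation index as the pseudo-label."""
    dataset = []
    for img in images:
        k = rng.randrange(4)  # pseudo-label in {0, 1, 2, 3}
        dataset.append((rotate(img, k), k))
    return dataset

images = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # toy 2x2 "images" (assumed data)
pretext = make_pretext_dataset(images, random.Random(0))
```

A classifier trained to recover `k` from the rotated image never sees a human-provided label; its learned weights would then serve as the initialization for the second stage.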

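Self-supervised learning has produced promising results in recent years. It has found practical application in audio processing and is used for speech recognition by Facebook and others^[7].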

Types

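For a binary classification task, the training data can be divided into positive examples and negative examples. Positive examples are those that match the target. For example, when learning to identify birds, photographs that contain a bird are the positive training examples. Negative examples are those that do not match the target^[8].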

Contrastive self-supervised learning

Contrastive self-supervised learning is contrastive learning that does not use supervised labels^[8]. Typical ways of obtaining positive examples include:

Data augmentation (e.g., SimCLR)
Co-occurrence (e.g., CPC)

Typical ways of obtaining negative examples include:

Other samples in the same mini-batch
Non-co-occurrence
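To make the roles of positive and negative examples concrete, here is a minimal sketch of an InfoNCE-style contrastive loss that uses the other samples in the mini-batch as negatives. It is a simplified, one-directional form written for illustration, not the exact loss of SimCLR or CPC; the function names and the temperature value are assumptions.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale v to unit length so that dot products are cosine similarities."""
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE-style loss over a mini-batch.

    anchors[i] and positives[i] are embeddings of two views of sample i;
    positives[j] with j != i act as the negatives for anchor i.
    """
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [dot(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # negative log-softmax of the positive pair
    return total / len(anchors)

# Toy embeddings (assumed data): matching pairs vs. deliberately swapped pairs.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.1], [0.1, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.1, 1.0], [1.0, 0.1]])
```

The loss is small when each anchor is closest to its own positive and large when it is closer to another sample in the batch, which is the behavior the positive and negative examples are meant to induce.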

Non-contrastive self-supervised learning

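Non-contrastive self-supervised learning (NCSSL) uses only positive examples. Counterintuitively, NCSSL converges to a useful local minimum rather than reaching a trivial solution with zero loss. In the binary classification example, the trivial solution would be to classify every example as positive. Effective NCSSL requires an extra predictor on the online side that does not back-propagate on the target side^[8].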
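The following sketch illustrates, under assumptions, the two ingredients mentioned above: a loss computed only on positive pairs, and a target branch that receives no gradient. The loss form (2 minus twice the cosine similarity) and the moving-average update of the target follow the BYOL style and are used here only as an example; the function names and values are hypothetical, and no actual network or gradient step is shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def byol_style_loss(online_prediction, target_projection):
    """2 - 2*cos between the online prediction and the target projection.

    In training, target_projection is treated as a constant: no gradient
    is propagated into the target branch.
    """
    return 2.0 - 2.0 * cosine(online_prediction, target_projection)

def ema_update(target_params, online_params, tau=0.99):
    """Move the target parameters slowly toward the online parameters."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]

# A collapsed model that maps every input to the same vector reaches zero loss;
# this is the trivial solution that training has to avoid.
collapsed = byol_style_loss([1.0, 1.0], [1.0, 1.0])
mismatch = byol_style_loss([1.0, 0.0], [0.0, 1.0])
new_target = ema_update([0.0, 0.0], [1.0, 2.0], tau=0.9)
```

The `collapsed` case shows why the result is counterintuitive: nothing in the loss itself penalizes a constant output, so avoiding collapse depends on the training setup (the predictor and the gradient-free target) rather than on negative examples.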

Comparison with other forms of machine learning

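Insofar as it aims to produce a classified output from an input, SSL is a supervised learning method. However, it does not require the explicit use of labeled input-output pairs. Instead, correlations, metadata embedded in the data, or domain knowledge present in the input are extracted implicitly and autonomously from the data. These supervisory signals generated from the data can then be used for training^[1].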

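SSL resembles unsupervised learning in that it does not require labels in the sample data. Unlike unsupervised learning, however, it does not learn from the inherent structure of the data alone; it trains against explicit supervisory signals derived from the data.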

Semi-supervised learning combines supervised and unsupervised learning and requires that only a small portion of the training data be labeled^[3].

In transfer learning, a model developed for one task is reused for a different task^[9].

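Training an autoencoder is intrinsically a self-supervised process, because the output pattern must be an optimal reconstruction of the input pattern. In current usage, however, the term "self-supervised" has become associated with classification tasks that are based on a pretext-task training setup. This involves the (human) design of such a pretext task, unlike fully self-contained autoencoder training^[10].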

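In reinforcement learning, self-supervised learning from a combination of losses can create abstract representations in which only the most important information about the state is kept in compressed form^[11].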

Examples

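Self-supervised learning is particularly well suited to speech recognition. For example, Facebook developed wav2vec, a self-supervised algorithm for speech recognition that uses two deep convolutional neural networks that build on each other^[7].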

Google's BERT model is used to better understand the context of search queries^[12].
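BERT is pre-trained in a self-supervised way by hiding some input tokens and predicting them from their context. The sketch below shows only how such training pairs can be built from unlabeled text. It is a simplification under assumptions: the function name, the whitespace tokenization, and the masking probability are hypothetical, and BERT's actual masking procedure has additional details that are omitted here.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob, rng):
    """Hide a random subset of tokens; the hidden originals become the targets.

    Returns the masked input sequence and a dict mapping each masked
    position to the original token the model should predict.
    """
    inputs, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets[i] = tok
        else:
            inputs.append(tok)
    return inputs, targets

tokens = "the bird sat on the branch".split()  # toy sentence (assumed data)
inputs, targets = mask_tokens(tokens, 0.5, random.Random(0))
```

The targets come entirely from the original sentence, so any amount of raw text can be turned into training data without annotation.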

OpenAI's GPT is an autoregressive language model that can be used for language processing, including tasks such as translating text and answering questions^[13].
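An autoregressive language model is trained to predict each token from the tokens that precede it, so the training targets again come from the text itself. The sketch below builds such (context, next token) pairs; the function name, the fixed context window, and the toy sentence are assumptions for illustration and do not describe GPT's actual data pipeline.

```python
def next_token_pairs(tokens, context_size):
    """Build (context, next_token) training pairs from one token sequence.

    The context holds at most context_size tokens immediately before the target.
    """
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs

pairs = next_token_pairs(["the", "cat", "sat", "down"], context_size=2)
# pairs == [(["the"], "cat"), (["the", "cat"], "sat"), (["cat", "sat"], "down")]
```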

Bootstrap Your Own Latent (BYOL) is an NCSSL method that produced excellent results on ImageNet and on transfer and semi-supervised benchmarks^[14].

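The Yarowsky algorithm is an example of self-supervised learning in natural language processing. From a small number of labeled examples, it learns to predict which sense of a polysemous word is being used at a given point in a text.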

Facebook's DirectPred is an NCSSL method that sets the predictor weights directly instead of learning them through gradient updates^[8].

Footnotes

[1] Bouchard, Louis (25 November 2020). “What is Self-Supervised Learning? | Will machines ever be able to learn like humans?”. Medium. Retrieved 9 June 2021.
[2] Doersch, Carl; Zisserman, Andrew (October 2017). “Multi-task Self-Supervised Visual Learning”. 2017 IEEE International Conference on Computer Vision (ICCV) (IEEE): 2070–2079. arXiv:1708.07860. doi:10.1109/iccv.2017.226. ISBN 978-1-5386-1032-9. https://doi.org/10.1109/iccv.2017.226.
[3] Beyer, Lucas; Zhai, Xiaohua; Oliver, Avital; Kolesnikov, Alexander (October 2019). “S4L: Self-Supervised Semi-Supervised Learning”. 2019 IEEE/CVF International Conference on Computer Vision (ICCV) (IEEE): 1476–1485. arXiv:1905.03670. doi:10.1109/iccv.2019.00156. ISBN 978-1-7281-4803-8.
[4] Doersch, Carl; Gupta, Abhinav; Efros, Alexei A. (December 2015). “Unsupervised Visual Representation Learning by Context Prediction”. 2015 IEEE International Conference on Computer Vision (ICCV) (IEEE): 1422–1430. arXiv:1505.05192. doi:10.1109/iccv.2015.167. ISBN 978-1-4673-8391-2. https://doi.org/10.1109/iccv.2015.167.
[5] Zheng, Xin; Wang, Yong; Wang, Guoyou; Liu, Jianguo (April 2018). “Fast and robust segmentation of white blood cell images by self-supervised learning”. Micron 107: 55–71. doi:10.1016/j.micron.2018.01.010. ISSN 0968-4328. PMID 29425969.
[6] Gidaris, Spyros; Bursuc, Andrei; Komodakis, Nikos; Pérez, Patrick; Cord, Matthieu (October 2019). “Boosting Few-Shot Visual Learning With Self-Supervision”. 2019 IEEE/CVF International Conference on Computer Vision (ICCV) (IEEE): 8058–8067. arXiv:1906.05186. doi:10.1109/iccv.2019.00815. ISBN 978-1-7281-4803-8.
[7] “Wav2vec: State-of-the-art speech recognition through self-supervision”. ai.facebook.com. Retrieved 9 June 2021.
[8] “Demystifying a key self-supervised learning technique: Non-contrastive learning”. ai.facebook.com. Retrieved 5 October 2021.
[9] Littwin, Etai; Wolf, Lior (June 2016). “The Multiverse Loss for Robust Transfer Learning”. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE): 3957–3966. arXiv:1511.09033. doi:10.1109/cvpr.2016.429. ISBN 978-1-4673-8851-1. https://doi.org/10.1109/cvpr.2016.429.
[10] Kramer, Mark A. (1991). “Nonlinear principal component analysis using autoassociative neural networks”. AIChE Journal 37 (2): 233–243. doi:10.1002/aic.690370209.
[11] Francois-Lavet, Vincent; Bengio, Yoshua; Precup, Doina; Pineau, Joelle (2019). “Combined Reinforcement Learning via Abstract Representations”. Proceedings of the AAAI Conference on Artificial Intelligence. arXiv:1809.04506.
[12] “Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing”. Google AI Blog. Retrieved 9 June 2021.
[13] Wilcox, Ethan; Qian, Peng; Futrell, Richard; Kohita, Ryosuke; Levy, Roger; Ballesteros, Miguel (2020). “Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models”. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (Stroudsburg, PA, USA: Association for Computational Linguistics): 4640–4652. arXiv:2010.05725. doi:10.18653/v1/2020.emnlp-main.375.
[14] Grill, Jean-Bastien; Strub, Florian; Altché, Florent; Tallec, Corentin; Richemond, Pierre H.; Buchatskaya, Elena; Doersch, Carl; Pires, Bernardo Avila; Guo, Zhaohan Daniel; Azar, Mohammad Gheshlaghi; Piot, Bilal (10 September 2020). “Bootstrap your own latent: A new approach to self-supervised Learning”. arXiv:2006.07733 [cs.LG].

External links

Abshire, Chris (6 April 2018). “Self-Supervised Learning: A Key to Unlocking Self-Driving Cars?”. Toyota Ventures. Retrieved 5 October 2021.
Doersch, Carl; Zisserman, Andrew (October 2017). “Multi-task Self-Supervised Visual Learning”. 2017 IEEE International Conference on Computer Vision (ICCV): 2070–2079. arXiv:1708.07860. doi:10.1109/ICCV.2017.226. ISBN 978-1-5386-1032-9.
Doersch, Carl; Gupta, Abhinav; Efros, Alexei A. (December 2015). “Unsupervised Visual Representation Learning by Context Prediction”. 2015 IEEE International Conference on Computer Vision (ICCV): 1422–1430. arXiv:1505.05192. doi:10.1109/ICCV.2015.167. ISBN 978-1-4673-8391-2.
Zheng, Xin; Wang, Yong; Wang, Guoyou; Liu, Jianguo (2018-04-01). “Fast and robust segmentation of white blood cell images by self-supervised learning”. Micron 107: 55–71. doi:10.1016/j.micron.2018.01.010. ISSN 0968-4328. PMID 29425969.
Shenwai, Tanushree (30 September 2021). “Google AI's New Study Enhance Reinforcement Learning (RL) Agent's Generalization In Unseen Tasks Using Contrastive Behavioral Similarity Embeddings”. MarkTechPost. Retrieved 7 October 2021.
Yarowsky, David (1995). “Unsupervised Word Sense Disambiguation Rivaling Supervised Methods”. Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (Cambridge, MA: Association for Computational Linguistics): 189–196. doi:10.3115/981658.981684. https://aclanthology.org/P95-1026/. Retrieved 1 November 2022.




All text is available under the terms of the GNU Free Documentation License. This article reproduces and redistributes the Wikipedia article 自己教師あり学習 (revision history) and is provided under the GNU Free Documentation License, as are all Wikipedia articles published in the Weblio dictionary.
