What is GPT-2? A plain-language explanation

Generative Pre-trained Transformer 2 (GPT-2)
	Image caption: Hugging Face's Write With Transformer website showing GPT-2 completing a prompt. Text taken from this article on Wikipedia was used as the prompt. All highlighted text following the initial prompt was machine-generated from the first suggested completion, with no further editing.
Developer	OpenAI
Initial release	February 14, 2019
Repository	https://github.com/openai/gpt-2
Predecessor	GPT-1
Successor	GPT-3
Type	Large language model; Generative Pre-trained Transformer; foundation model
License	Open source
Website	openai.com/blog/gpt-2-1-5b-release/

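GPT-2 (Generative Pre-trained Transformer 2) is a large language model in the GPT series developed by OpenAI. It was first announced on February 14, 2019, and the complete 1.5-billion-parameter model was published in November 2019^[1]^[2]^[3]^[4].

Overview

GPT-2 translates text, answers questions, summarizes passages^[5]^[1], and generates text output that is at times indistinguishable from that of humans^[6], although it can become repetitive or nonsensical when generating long passages^[7]. It is a general-purpose generative model: it was not specifically trained for any of these tasks, and its ability to perform them is an extension of its general ability to accurately synthesize the next item in an arbitrary sequence^[8]^[5]. GPT-2 was built as a "scaled-up version" of OpenAI's 2018 GPT model^[9], with a tenfold increase in both its parameter count and the size of its training dataset^[1].

The GPT architecture implements a deep neural network, specifically a Transformer model^[9], which uses attention in place of earlier recurrence-based and convolution-based architectures^[10]^[11]. The attention mechanism lets the model selectively focus on the segments of input text it predicts to be most relevant^[12]^[13]. This design allows far greater parallelization and outperforms previous benchmarks for RNN-, CNN-, and LSTM-based models^[9].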
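To make the attention idea concrete, here is a minimal sketch of single-head scaled dot-product attention with a causal mask, the form generally used in decoder-only Transformers. It is not GPT-2's actual implementation: the function names and toy vectors are assumptions chosen for illustration, and learned projections, multiple heads, and batching are all omitted.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Position i may only attend to positions 0..i. Inputs are lists of
    equal-length float vectors (hypothetical toy data, no learned weights).
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Score the query against every key at or before position i.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k)
                  for k in keys[: i + 1]]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values[: i + 1]))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy input: three positions, two-dimensional vectors (made-up numbers).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_attention(x, x, x)
```

Each row of `weights` sums to 1 and shows how strongly a position focuses on itself and on earlier positions. Because every row can be computed independently of the others, the computation parallelizes well, unlike the step-by-step updates of a recurrent network.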

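In November 2019, OpenAI released the complete version of the GPT-2 language model, with 1.5 billion parameters^[14]. GPT-2 was followed by GPT-3, with 175 billion parameters^[15], which was released in 2020^[16] (its source code has not been made public). Access to GPT-3 is provided exclusively through APIs offered by OpenAI and Microsoft^[17].

Capabilities

GPT-2 was created as a scaled-up version of GPT, with roughly ten times the parameter count and ten times the dataset size^[8]^[9]^[1]. Both are unsupervised Transformer models trained to generate text by predicting the next word given a sequence of tokens. The GPT-2 model has 1.5 billion parameters and was trained on a dataset of 8 million web pages^[8]. Although GPT-2 was trained on a very simple objective (interpreting a sequence of words in a text sample and predicting the most likely next word), by repeatedly predicting further words it produces complete sentences and paragraphs that are fully comprehensible (and semantically meaningful) in natural language^[8]. Notably, GPT-2 was evaluated on its performance on tasks in a zero-shot setting.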
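The generate-by-predicting-the-next-word loop can be illustrated with a deliberately tiny stand-in. The sketch below replaces the Transformer with a bigram frequency table, which is an assumption made purely for illustration; only the outer loop (predict one token, append it, repeat) mirrors how an autoregressive model such as GPT-2 produces text.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the training text."""
    follows = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        follows[prev][nxt] += 1
    return follows

def generate(follows, prompt, max_new_tokens):
    """Greedy autoregressive decoding: append the most likely next token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # no continuation was ever observed for this token
        out.append(candidates.most_common(1)[0][0])
    return out

# Hypothetical toy corpus; a real model is trained on far more text.
corpus = "the cat sat on the mat and the cat sat on the rug".split()
model = train_bigram(corpus)
text = generate(model, ["the"], max_new_tokens=6)
print(" ".join(text))  # the cat sat on the cat sat
```

Even this toy falls into a loop under greedy decoding, a small-scale analogue of the repetition reported for long GPT-2 outputs. Sampling from the predicted distribution instead of always taking the top token is one common way to reduce it.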

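Dataset

GPT-2 uses the newly developed WebText corpus as its dataset.

WebText corpus

The WebText corpus is a high-quality natural-language text corpus extracted from about 8 million web pages^[18].

GPT-2 was developed with the aim of being a foundation model capable of zero-shot inference. For a model to perform tasks zero-shot, without being explicitly trained on each one, the training text is thought to need concrete examples of a wide variety of tasks (without task labels)^[19]. At the same time, low-quality text degrades model accuracy^[20], so indiscriminately collected corpora such as Common Crawl cannot be used as they are^[21]. The WebText corpus was developed in the GPT-2 paper to resolve these problems.

WebText relies on human curation to improve quality^[22]. First, web pages linked from Reddit posts that received at least 3 karma were treated as text of acceptable quality^[23]. Wikipedia articles were then removed (because they appear in many other datasets and could cause overfitting), duplicate documents were removed, and heuristic cleaning was applied. The result is WebText: about 40 GB of natural-language text extracted from about 8 million web pages^[24].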
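The curation steps above can be sketched as a small filter. The karma threshold of 3 comes from the paper quote in the footnotes; everything else here is a hypothetical simplification, including the `(url, karma, text)` record layout, the hostname check for Wikipedia, and the exact-match deduplication. The paper's actual cleaning heuristics are not reproduced.

```python
from urllib.parse import urlparse

MIN_KARMA = 3  # threshold quoted from the GPT-2 paper

def build_corpus(links):
    """Filter (url, karma, text) records in the spirit of WebText.

    The record layout and the exact-match deduplication are assumptions
    made for this sketch, not the paper's own pipeline.
    """
    seen_texts = set()
    kept = []
    for url, karma, text in links:
        if karma < MIN_KARMA:
            continue  # not enough human endorsement
        host = urlparse(url).netloc.lower()
        if host == "wikipedia.org" or host.endswith(".wikipedia.org"):
            continue  # Wikipedia pages are excluded
        normalized = " ".join(text.split()).lower()
        if normalized in seen_texts:
            continue  # drop duplicate documents
        seen_texts.add(normalized)
        kept.append((url, text))
    return kept

# Made-up records for illustration only.
links = [
    ("https://example.com/a", 5, "A long, well written article."),
    ("https://example.com/b", 1, "Low-karma page."),
    ("https://en.wikipedia.org/wiki/GPT-2", 40, "Encyclopedia entry."),
    ("https://example.org/copy", 7, "A long,  well written ARTICLE."),
]
corpus = build_corpus(links)
```

Only the first record survives: the second fails the karma threshold, the third is a Wikipedia page, and the fourth is a whitespace and case variant of the first.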

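Model

GPT-2's model architecture is a minor revision of GPT-1. The architectural changes are as follows:

Layer normalization was moved to the input of each sub-block^[25].
An additional layer normalization was added after the final self-attention block^[26].
The weights of residual layers are scaled at initialization by a factor of 1/√N, where N is the number of residual layers^[27].

The largest model has 1.5 billion parameters^[28]. The vocabulary was expanded to 50,257^[29], the context size was increased from 512 to 1024 tokens^[30], and a larger batch size of 512 was used^[31].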
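Two of the changes quoted from the GPT-2 paper in footnotes [25] and [27] can be sketched in a few lines: moving layer normalization to the sub-block input, and scaling residual-layer weights by 1/√N at initialization. This is a simplified illustration, not the released code. The layer norm below has no learned gain or bias, the sub-layer is a placeholder function, and the layer count of 96 is an arbitrary example value.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def pre_ln_block(x, sublayer):
    """GPT-2 style: normalize the sub-block input, then add the residual."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

def post_ln_block(x, sublayer):
    """Earlier style: add the residual first, then normalize the sum."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

def residual_init_scale(n_residual_layers):
    """Factor 1/sqrt(N) applied to residual-layer weights at initialization."""
    return 1.0 / math.sqrt(n_residual_layers)

def double(v):
    """Placeholder sub-layer standing in for attention or the feed-forward net."""
    return [2.0 * t for t in v]

x = [1.0, 2.0, 3.0, 4.0]
pre = pre_ln_block(x, double)
post = post_ln_block(x, double)
scale = residual_init_scale(96)  # 96 is an illustrative layer count
```

In the pre-normalization form the residual path carries `x` through unchanged, so only the sub-layer's contribution is normalized. In the post-normalization form the whole sum is rescaled at every block. The 1/√N factor keeps the accumulated residual contributions from growing with depth.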

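Release

GPT-2 was first announced on February 14, 2019. A February 2019 article in The Verge by James Vincent said that while "[the] writing it produces is usually easily identifiable as non-human", it remained "one of the most exciting examples yet" of language generation programs^[35].

"Give it a fake headline, and it'll write the rest of the article, complete with fake quotations and statistics. Feed it the first line of a short story, and it'll tell you what happens to your character next. It can even write fan fiction, given the right prompt."^[35]

The Guardian described this output as "plausible newspaper prose"^[7], and Kelsey Piper of Vox said "one of the coolest AI systems I've ever seen may also be the one that will kick me out of my job"^[36]. The Verge described GPT-2's flexibility as "impressive", noting in particular its ability to translate text between languages, summarize long articles, and answer trivia questions^[35].

A University of Amsterdam study using a modified Turing test found that, in at least some scenarios, participants could not tell poems generated by GPT-2 from poems written by humans^[37].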

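Restrictions and partial release

Image caption: although "Skub" is not a real product, even the reduced-size model used in DistilGPT2 can produce plausible arguments both for and against it.

Whereas OpenAI's earlier models had been made available to the public immediately, in its February 2019 announcement OpenAI initially refused to release GPT-2's source code, citing the risk of malicious use^[7]. At the time of the announcement, only selected members of the press were given limited access to the model (an interface that accepted input and returned output, not the source code itself). One commonly cited justification was that, because generated text is usually entirely novel, spammers could use it to evade automated filters. OpenAI demonstrated a version of GPT-2 fine-tuned to generate an endless stream of positive or negative product reviews^[7]. Another concern was that GPT-2 could be used to generate obscene or racist text. Researchers such as Jeremy Howard warned of "the technology to totally fill Twitter, email, and the web up with reasonable-sounding, context-appropriate prose, which would drown out all other speech and be impossible to filter"^[35]. In response to GPT-2, the Allen Institute for Artificial Intelligence announced a tool for detecting "neural fake news"^[38].

Opinion was divided, however. A February 2019 article in The Verge argued that the threat posed by GPT-2 had been exaggerated^[39]. Anima Anandkumar, a professor at Caltech and director of machine learning research at Nvidia, said there was no evidence that GPT-2 had the capabilities to pose the threats OpenAI described, called what OpenAI had done the "opposite of open", and characterized its refusal to release the full model as "malicious BS"^[39]. The Gradient published an open letter urging OpenAI to release the model, comparing the threat posed by text-generation AI to that of the printing press and citing Photoshop as an example of "a technology that (thankfully) has not destroyed modern society despite its potential for chaos"^[40]^[41]:

"Thirty years later, society has emerged relatively unscathed despite Photoshop being simple enough for high school students to use and ubiquitous enough to commandeer its own verb. Why? Precisely because everyone knows about Photoshop."^[40]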

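774M release

Although OpenAI did not release the fully trained model or the training corpus, the description of its methods in earlier publications (and the free availability of the underlying technology) made it possible for others to replicate GPT-2 as free software. One such replication, OpenGPT-2, was released in August 2019 together with a freely licensed version of WebText called OpenWebText. The cloud computing cost of OpenGPT-2 was given as approximately $50,000^[42].

On August 20, 2019, OpenAI released a reduced version of GPT-2 with 774 million parameters (roughly half the size of the full 1.5-billion-parameter model)^[3].

Full 1.5B release

The initial concerns that GPT-2 would lead to widespread misuse did not materialize. The Verge said that "there's reason to be skeptical of claims that AI technology will usher in some sort of 'infopocalypse.' For a start, we already have programs that can generate plausible text at high volume for little cost: humans."^[43] By November 2019, OpenAI said it had "seen no strong evidence of misuse so far", and on November 5, 2019 it released the full version with 1.5 billion parameters^[4]^[14].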

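Limitations

Image caption: GPT-2 can generate thematically appropriate text for a range of scenarios, even surreal ones such as a CNN article about Donald Trump giving a speech praising the anime character Asuka Langley Soryu. Here the tendency to produce nonsensical and repetitive text as the output grows longer is visible even in the full 1.5B model: grammar begins to deteriorate in the second paragraph, and the output eventually becomes a single incoherent sentence repeated over and over.

While GPT-2's ability to generate natural-language text has generally been praised, its shortcomings have also been noted, especially when generating texts longer than a couple of paragraphs. Vox said "the prose is pretty rough, there's the occasional non-sequitur, and the articles get less coherent the longer they get"^[36]. The Verge similarly noted that longer samples of GPT-2 writing tended to "stray off topic" and lack overall coherence^[35]. The Register wrote that "a human reading it should, after a short while, realize something's up", and noted that "GPT-2 doesn't answer questions as well as other systems that rely on algorithms to extract and retrieve information"^[32].

Deploying GPT-2 is resource-intensive: the full version of the model is larger than five gigabytes, which makes it difficult to embed locally in applications, and it consumes large amounts of RAM. In addition, performing a single prediction "can occupy a CPU at 100% utilization for several minutes", and even with GPU processing "a single prediction can take seconds"^[6]. To alleviate these issues, Hugging Face created DistilGPT2, using knowledge distillation to produce a smaller model that "scores a few points lower on some quality benchmarks" but is "33% smaller and twice as fast"^[6].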
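The "more than five gigabytes" figure is consistent with a back-of-the-envelope estimate. The sketch below assumes the weights are stored as 32-bit floats (4 bytes per parameter) and ignores file-format overhead, optimizer state, and activation memory; the helper name is hypothetical.

```python
def weights_size_gib(n_params, bytes_per_param=4):
    """Size of the raw weights in GiB, assuming a fixed width per parameter."""
    return n_params * bytes_per_param / 2**30

full_fp32 = weights_size_gib(1_500_000_000)     # 32-bit floats
full_fp16 = weights_size_gib(1_500_000_000, 2)  # 16-bit floats
print(f"fp32: {full_fp32:.2f} GiB, fp16: {full_fp16:.2f} GiB")
# prints "fp32: 5.59 GiB, fp16: 2.79 GiB"
```

Under these assumptions the 1.5-billion-parameter model needs about 5.6 GiB for its weights alone, which is why halving the numeric precision or distilling to a smaller model is attractive for deployment.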

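Applications and subsequent research

Applications of GPT-2 described by journalists include helping humans write text such as news articles^[7]. Even before the release of the full version, GPT-2 was used in a variety of applications, services, and entertainment. In June 2019, a subreddit named r/SubSimulatorGPT2 was created in which GPT-2 instances trained on various subreddits made posts and replied to each other's comments, creating a situation where one could observe "an AI personification of r/Bitcoin argue with the machine learning-derived spirit of r/ShittyFoodPorn"^[43]. By July of that year, GPT-2-based software that autocompletes lines of code in a variety of programming languages had been released, and users described it as a "game-changer"^[44].

In 2019, AI Dungeon was launched, which used GPT-2 to generate dynamic text adventures based on user input^[45]. As of 2021, AI Dungeon offered access to the largest release of the GPT-3 API as an optional paid upgrade, while the free version used the second-largest release of GPT-3^[46]. Latitude, the company formed around AI Dungeon, raised $3.3 million in seed funding in 2021^[41]. Several websites host interactive demonstrations of various instances of GPT-2 and other Transformer models^[47]^[48]^[49].

In February 2021, a crisis center for troubled teens announced that it would begin using a GPT-2-derived chatbot to help train counselors by letting them hold conversations with simulated teens (this use was purely for internal training purposes, and GPT-2 did not communicate with real teens)^[50].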

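Comparison of GPT-1 to GPT-3

Comparison of the GPT series
Model	Architecture	Parameter count	Training data
GPT-1	12-layer, 12-head Transformer decoder (no encoder), followed by linear-softmax	120 million	BookCorpus: 4.5 GB of text, from 7,000 unpublished books of various genres^[51]
GPT-2	Variant of GPT-1	1.5 billion^[28]	WebText corpus (40 GB)
GPT-3	GPT-2, but modified to allow larger scaling	175 billion	570 GB of plaintext, 400 billion tokens. Mostly CommonCrawl, WebText, English Wikipedia, and two book corpora (Books1 and Books2)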
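As a quick check of the "tenfold" scale-up against the figures in the table, the ratios between successive models can be computed directly. The dictionary layout is an assumption for this sketch, and the numbers are the rounded values from the table, not exact counts.

```python
# Rounded figures taken from the comparison table above.
models = {
    "GPT-1": {"params": 0.12e9, "data_gb": 4.5},
    "GPT-2": {"params": 1.5e9, "data_gb": 40.0},
    "GPT-3": {"params": 175e9, "data_gb": 570.0},
}

def growth(older, newer):
    """Return (parameter ratio, data ratio) between two models in the table."""
    a, b = models[older], models[newer]
    return b["params"] / a["params"], b["data_gb"] / a["data_gb"]

param_ratio_12, data_ratio_12 = growth("GPT-1", "GPT-2")
param_ratio_23, data_ratio_23 = growth("GPT-2", "GPT-3")
```

With these rounded inputs, GPT-2 has about 12.5 times the parameters and roughly 9 times the training data of GPT-1, in line with the "about ten times" description. The step to GPT-3 is over 100 times in parameters but only about 14 times in data size.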

Footnotes

[1] “Better Language Models and Their Implications”. OpenAI (February 14, 2019). Archived from the original on December 19, 2020. Retrieved December 19, 2020.
[2] Piper, Kelsey (May 15, 2019). “A poetry-writing AI has just been unveiled. It's ... pretty good.”. Vox. Archived from the original on November 7, 2020. Retrieved December 19, 2020.
[3] Johnson, Khari (August 20, 2019). “OpenAI releases curtailed version of GPT-2 language model”. VentureBeat. Archived from the original on December 18, 2020. Retrieved December 19, 2020.
[4] Vincent, James (November 7, 2019). “OpenAI has published the text-generating AI it said was too dangerous to share”. The Verge. Archived from the original on June 11, 2020. Retrieved December 19, 2020.
[5] Hegde, Chaitra; Patil, Shrikumar (June 9, 2020). “Unsupervised Paraphrase Generation using Pre-trained Language Models”. arXiv:2006.05477 [cs.CL].
[6] Kaiser, Caleb (January 31, 2020). “Too big to deploy: How GPT-2 is breaking servers”. Towards Data Science. Archived from the original on February 15, 2020. Retrieved February 27, 2021.
[7] Hern, Alex (February 14, 2019). “New AI fake text generator may be too dangerous to release, say creators”. The Guardian. Archived from the original on February 14, 2019. Retrieved December 19, 2020.
[8] Radford, Alec; Wu, Jeffrey; Child, Rewon; Luan, David; Amodei, Dario; Sutskever, Ilya (February 14, 2019). Language models are unsupervised multitask learners. 1. Archived from the original on February 6, 2021. Retrieved December 19, 2020.
[9] “Improving Language Understanding by Generative Pre-Training”. OpenAI. p. 12 (June 11, 2018). Archived from the original on January 26, 2021. Retrieved January 23, 2021.
[10] Polosukhin, Illia; Kaiser, Lukasz; Gomez, Aidan N.; Jones, Llion; Uszkoreit, Jakob; Parmar, Niki; Shazeer, Noam; Vaswani, Ashish (June 12, 2017). “Attention Is All You Need”. arXiv:1706.03762 [cs.CL].
[11] Olah, Chris; Carter, Shan (September 8, 2016). “Attention and Augmented Recurrent Neural Networks”. Distill 1 (9). doi:10.23915/distill.00001. Archived from the original on December 22, 2020. Retrieved January 22, 2021.
[12] Bahdanau, Dzmitry; Cho, Kyunghyun; Bengio, Yoshua (September 1, 2014). “Neural Machine Translation by Jointly Learning to Align and Translate”. arXiv:1409.0473 [cs.CL].
[13] Luong, Minh-Thang; Pham, Hieu; Manning, Christopher D. (August 17, 2015). “Effective Approaches to Attention-based Neural Machine Translation”. arXiv:1508.04025 [cs.CL].
[14] “GPT-2: 1.5B Release”. OpenAI (November 5, 2019). Archived from the original on November 14, 2019. Retrieved November 14, 2019.
[15] Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (July 22, 2020). “Language Models are Few-Shot Learners”. arXiv:2005.14165 [cs.CL].
[16] Arram (July 9, 2020). “GPT-3: An AI that's eerily good at writing almost anything”. Arram Sabeti. Archived from the original on July 20, 2020. Retrieved July 31, 2020.
[17] Hao, Karen (September 23, 2020). “OpenAI is giving Microsoft exclusive access to its GPT-3 language model”. MIT Technology Review. Retrieved September 25, 2020. "The companies say OpenAI will continue to offer its public-facing API, which allows chosen users to send text to GPT-3 or OpenAI’s other models and receive its output. Only Microsoft, however, will have access to GPT-3’s underlying code, allowing it to embed, repurpose, and modify the model as it pleases."
[18] "a new dataset of millions of webpages called WebText ... which emphasizes document quality." Radford. (2019). Language Models are Unsupervised Multitask Learners.
[19] "Our approach motivates building as large and diverse a dataset as possible in order to collect natural language demonstrations of tasks in as varied of domains and contexts as possible." Radford. (2019)
[20] Trinh, Trieu H.; Le, Quoc V. (June 7, 2018). “A Simple Method for Commonsense Reasoning”. arXiv:1806.02847 [cs.CL].
[21] "Common Crawl ... they have significant data quality issues ... We observed similar data issues in our initial experiments with Common Crawl." Radford. (2019)
[22] "emphasizes document quality. To do this we only scraped web pages which have been curated/filtered by humans." Radford. (2019)
[23] "we scraped all outbound links from Reddit, a social media platform, which received at least 3 karma." Radford. (2019)
[24] "a preliminary version of WebText ... which ... contains slightly over 8 million documents for a total of 40 GB of text." Radford. (2019)
[25] "Layer normalization ... was moved to the input of each sub-block" Radford. (2019)
[26] "an additional layer normalization was added after the final self-attention block." Radford. (2019)
[27] "A modified initialization which accounts for the accumulation on the residual path with model depth ... scale the weights of residual layers at initialization by a factor of 1/√N where N is the number of residual layers." Radford. (2019)
[28] "Our largest model, GPT-2, is a 1.5B parameter Transformer" Radford. (2019)
[29] "The vocabulary is expanded to 50,257." Radford. (2019)
[30] "We also increase the context size from 512 to 1024 tokens" Radford. (2019)
[31] "a larger batchsize of 512 is used." Radford. (2019)
[32] Quach, Katyanna (February 14, 2019). “Roses are red, this is sublime: We fed OpenAI's latest chat bot a classic Reg headline”. The Register. Archived from the original on March 9, 2021. Retrieved February 27, 2021.
[33] “The Staggering Cost of Training SOTA AI Models”. Synced (June 27, 2019). Archived from the original on November 24, 2020. Retrieved February 27, 2021.
[34] Wiggers, Kyle (March 23, 2020). “Google open-sources framework that reduces AI training costs by up to 80%”. VentureBeat. Archived from the original on November 26, 2020. Retrieved February 27, 2021.
[35] Vincent, James (February 14, 2019). “OpenAI's new multitalented AI writes, translates, and slanders”. The Verge. Archived from the original on December 18, 2020. Retrieved December 19, 2020.
[36] Piper, Kelsey (February 14, 2019). “An AI helped us write this article”. Vox. Archived from the original on November 8, 2020. Retrieved December 19, 2020.
[37] Köbis, Nils; Mossink, Luca D. (January 1, 2021). “Artificial intelligence versus Maya Angelou: Experimental evidence that people cannot differentiate AI-generated from human-written poetry”. Computers in Human Behavior 114: 106553. doi:10.1016/j.chb.2020.106553.
[38] Schwartz, Oscar (July 4, 2019). “Could 'fake text' be the next global political threat?”. The Guardian. Archived from the original on July 16, 2019. Retrieved July 16, 2019.
[39] Vincent, James (February 21, 2019). “AI researchers debate the ethics of sharing potentially harmful programs”. The Verge. Archived from the original on February 9, 2021. Retrieved February 27, 2021.
[40] Zhang, Hugh (February 19, 2019). “OpenAI: Please Open Source Your Language Model”. The Gradient. Archived from the original on January 28, 2021. Retrieved February 28, 2021.
[41] Ha, Anthony (February 4, 2021). “AI Dungeon-maker Latitude raises $3.3M to build games with 'infinite' story possibilities”. TechCrunch. Archived from the original on February 21, 2021. Retrieved February 27, 2021.
[42] “OpenGPT-2: We Replicated GPT-2 Because You Can Too”. Noteworthy (August 22, 2019). Retrieved February 27, 2021.
[43] Vincent, James (June 6, 2019). “There's a subreddit populated entirely by AI personifications of other subreddits”. The Verge. Archived from the original on February 21, 2021. Retrieved February 27, 2021.
[44] Vincent, James (July 24, 2019). “This AI-powered autocompletion software is Gmail's Smart Compose for coders”. The Verge. Archived from the original on March 9, 2021. Retrieved February 27, 2021.
[45] Olson, Mathew (December 17, 2019). “AI Dungeon 2, the Text Adventure Where You Can do Nearly Anything, Is Now on Mobile”. Archived from the original on September 20, 2020. Retrieved February 27, 2021.
[46] Nelius, Joanna (August 3, 2020). “This AI-Powered Choose-Your-Own-Adventure Text Game Is Super Fun and Makes No Sense”. Gizmodo. Archived from the original on February 28, 2021. Retrieved February 27, 2021.
[47] “Write With Transformer”. Retrieved December 4, 2019.
[48] “Talk to Transformer”. Retrieved December 4, 2019.
[49] “CreativeEngines”. Retrieved June 25, 2021.
[50] “An AI is training counselors to deal with teens in crisis”. MIT Technology Review (February 26, 2021). Archived from the original on February 27, 2021. Retrieved February 27, 2021.
[51] Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja (2015). Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books. pp. 19–27. arXiv:1506.06724.

External links

GPT-2: 1.5B release - OpenAI


