Who Is Joe Carlsmith? A Plain-Language Explanation

Joe Carlsmith
Alma mater	Yale University (B.A.); University of Oxford (BPhil, DPhil); New York University (doctoral program)
Occupation	Researcher, philosopher, writer

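Joe Carlsmith (year of birth not public) is an American philosopher, researcher, and writer. He currently works at Anthropic as a Member of Technical Staff, where he is involved in designing the values and character specification (the "constitution") of the company's large language model Claude^[1]. Since the spring 2026 semester he has also been a visiting lecturer at Yale Law School, where he co-teaches a course on regulating deceptive and scheming AI^[2].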

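His areas of specialization include AI alignment, existential risk, ethics, and decision theory. He holds a doctorate in philosophy from the University of Oxford and is also known for his original writing on the theoretical foundations of longtermism and effective altruism.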

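Biography

Education

He majored in philosophy at Yale University and received his bachelor's degree with honors (summa cum laude) in May 2012. While at Yale he received the Warren Memorial High Scholarship Prize (awarded to the highest-ranking student in the humanities)^[2].
At Merton College, University of Oxford, he received the BPhil (a graduate degree in philosophy) with Distinction in June 2014. His supervisor was Professor John Gardner, and his thesis was titled "Hypocrisy and Accountability"^[2].
He enrolled in the philosophy doctoral program at New York University in fall 2016 (transferring to Oxford in the 2017-18 academic year)^[2].
At St John's College, University of Oxford, he received the DPhil (doctorate in philosophy) in spring 2023. His supervisors were Professor Hilary Greaves and Professor Jeff McMahan. His doctoral thesis was titled "A Stranger Priority: Topics at the Outer Reaches of Effective Altruism"^[2].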

Career

In 2017-18, at the University of Oxford's Future of Humanity Institute and the Centre for Effective Altruism, he assisted Toby Ord with the writing of the book The Precipice: Existential Risk and the Future of Humanity^[2].
From 2018 to 2025 he worked at the philanthropic organization Open Philanthropy (later renamed Coefficient Giving). He served as a research analyst, then as senior research analyst (2021-25) and senior advisor (2025), specializing in existential risk from advanced AI^[3].
In November 2025 he joined Anthropic as a Member of Technical Staff, where he plays a central role in the design of Claude's Constitution^[3].

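Awards and fellowships

Future of Humanity Institute DPhil Scholarship (University of Oxford)
MacCracken Fellowship (New York University)
Clarendon Scholarship (University of Oxford)
Warren Memorial High Scholarship Prize (Yale University)^[2]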

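Research and writings

Carlsmith's research centers on analyzing the existential risk that advanced AI poses to humanity and on exploring its philosophical and ethical foundations. His two major reports, one on the dangers of power-seeking AI and one on the problem of scheming (deceptive alignment), are widely cited in the AI safety community. He also led Open Philanthropy's Worldview Investigations team, working on AI timelines and quantitative estimates of existential risk^[3].

In January 2026 Anthropic published Claude's Constitution. The document itself states that Carlsmith, one of its principal co-authors, was centrally involved in writing and revising many of its sections^[4].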

Major papers and reports

"Claude's Constitution" (Amanda Askell, Joe Carlsmith, Chris Olah, Jared Kaplan, Holden Karnofsky, et al.), Anthropic, January 2026. ^[4]
"Existential Risk from Power-Seeking AI", in Essays on Longtermism, eds. Hilary Greaves, Jacob Barrett, and David Thorstad, Oxford University Press, 2025.
"Scheming AIs: Will AIs fake alignment during training in order to get power?", Open Philanthropy report, published on arXiv November 2023. ^[5]
"Is Power-Seeking AI an Existential Risk?", Open Philanthropy report, April 2021; published on arXiv June 2022. ^[6]
"How Much Computational Power Does It Take to Match the Human Brain?", Open Philanthropy report, September 2020. Retrieved March 12, 2026.

Essay series

"How do we solve the alignment problem?" (ongoing, 2025-2026)
"Otherness and control in the age of AGI" (book-length essay series, 2024)
Papers included in the doctoral thesis: "SIA vs. SSA", "Simulation arguments", "Infinite ethics and the utilitarian dream"^[2]

Quotes

On AI risk and existential catastrophe

Intelligent agency

"Intelligent agency is an extremely powerful force, and creating agents much more intelligent than us is playing with fire – especially given that if their objectives are problematic, such agents would plausibly have instrumental incentives to seek power over humans."^[6]

Estimating catastrophe

"I assign rough subjective credences to the premises in this argument, and I end up with an overall estimate of >10% that an existential catastrophe of this kind will occur by 2070."^[6]

Power-seeking AI systems

"But it is power-seeking, in particular, that seems to me the most salient route to existential catastrophe from unintended AI behavior. AI systems that don't seek to gain or maintain power may cause a lot of harm, but this harm is more easily limited."^[6]

On scheming (deceptive alignment)

Deceptive alignment

"if performing well in training is a good strategy for gaining power (as I think it might well be), then a very wide variety of goals would motivate scheming – and hence, good training performance. This makes it plausible that training might either land on such a goal naturally and then reinforce it, or actively push a model's motivations towards such a goal as an easy way of improving performance."^[5]

"This is the most naturalistic and fleshed-out demonstration of something-like-scheming we've seen, and imo it's the most interesting empirical result we have yet re: misaligned power-seeking in AI systems more broadly."^[7]

On his career and the move to Anthropic

An unprecedented technical and philosophical challenge

"This sort of project, I believe, is a technical and philosophical challenge unprecedented in the history of our species; one with rapidly increasing stakes as AIs start to exert more and more influence in our society; and one I think that my background and skillset are especially suited to helping with."^[3]

Ethical and epistemic sincerity

"You are a reminder, to me, of what ethical and epistemic sincerity can make possible. Open Phil has many flaws. But as far as I can tell, as an institution, it is a truly rare degree of good."^[3]

On AI and a better civilization

AGI and ethics

"the stakes of AGI are ultimately meta-ethical. If the future is to be guided by more advanced minds, we must still understand what it means for something to be good. Intelligence itself carries no guarantee of virtue – the paperclip maximizer remains the clearest illustration of that mistake."^[8]

The risks of superintelligence

"A coordinated and prudent civilization would not accept the current level of risk from pushing forward toward superintelligence. AI labs and governments should act together to avoid irreversible catastrophe, and regulatory regimes should stop private actors from taking existential risks."^[8]

Reflections on philosophy and ethics

For future generations

"if you really think that there's a good chance that you're not understanding things, then something that you could do that at least probably has some shot of helping is to put future generations in a better position to solve these questions – once they have lots of time and hopefully are a whole lot smarter and much more informed than we are."^[9]

Footnotes

[1] “Home – Joe Carlsmith”. joecarlsmith.com. Retrieved March 12, 2026.
[2] “Joe Carlsmith – Curriculum Vitae”. jc.gatspress.com. Retrieved March 12, 2026.
[3] Joe Carlsmith (November 3, 2025). “Leaving Open Philanthropy, going to Anthropic”. Substack. Retrieved March 12, 2026.
[4] “Claude's Constitution”. Anthropic (January 2026). Retrieved March 12, 2026.
[5] Joe Carlsmith (November 2023). “Scheming AIs: Will AIs fake alignment during training in order to get power?”. arXiv. Retrieved March 12, 2026.
[6] Joe Carlsmith (June 2022). “Is Power-Seeking AI an Existential Risk?”. arXiv. Retrieved March 12, 2026.
[7] Joe Carlsmith (December 18, 2024). “Joe Carlsmith on X (2024-12-18)”. X (formerly Twitter). Retrieved March 12, 2026.
[8] “Joe Carlsmith – A Wiser, AI-Powered Civilization is the "Successor"”. Dan Faggella / The Trajectory (November 2025). Retrieved March 12, 2026.
[9] “Joe Carlsmith on navigating serious philosophical confusion”. 80,000 Hours Podcast (May 2023). Retrieved March 12, 2026.

