What is Web ARChive? A plain-language explanation

Web ARChive
Filename extension	warc
MIME type	application/warc
Derived from	ARC
International standard	ISO 28500:2017
Website	iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1-annotated/

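WARC (Web ARChive) is an archive format that specifies how to combine multiple digital resources, together with related information, into a single aggregate archive file. The combined resources are stored as a WARC file, which can be replayed with suitable software such as ReplayWeb.page or used by archive websites such as the Wayback Machine.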

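The WARC format is a revision of the Internet Archive's ARC_IA file format^[3], which was traditionally used to store "web crawls" harvested from the World Wide Web as sequences of content blocks. WARC generalizes the older format to better support the harvesting, access, and exchange needs of archiving organizations. Besides the primary content currently recorded, the revision also accommodates related secondary content such as assigned metadata, abbreviated duplicate-detection events^{[note 1]}, and later-date transformations^[4]. The format is inspired by HTTP/1.0 streams and uses similar headers with CRLF as the delimiter, which makes it well suited to crawler implementations.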

WARC was first specified in 2008^[5] and is now recognized by most national library systems as the standard for web archiving^[6]. Some national library systems have also begun to list WACZ as an acceptable format^[7]^[8].

Software

Footnotes

Notes

[note 1] See §7.6, "revisit".

References

[1] "Introduction". SourceForge. Retrieved 5 March 2015.
[2] "Information and documentation -- WARC file format". Retrieved 16 March 2018.
[3] "ARC_IA, Internet Archive ARC file format". www.digitalpreservation.gov (14 February 2008). Retrieved 9 May 2015.
[4] "WARC, Web ARChive file format". www.digitalpreservation.gov (31 August 2009). Retrieved 9 May 2015.
[5] Arvidson, Allan; Kunze, John; Mohr, Gordon; Stack, Michael (5 July 2008). The WARC File Format. Retrieved 29 April 2021.
[6] Allegrezza, Stefano (21 April 2016). "Nuove prospettive per il Web archiving: Gli standard ISO 28500 (Formato WARC) e ISO/TR 14873 sulla qualità del Web archiving". Digitalia 2015: 49–61.
[7] "Web Archive Collection Zipped". www.loc.gov (19 May 2023). Retrieved 28 March 2025.
[8] "Preferred file formats". digitalpreservation.no (5 December 2024). Retrieved 28 March 2025.
[9] "ArchiveBox". ArchiveBox. Retrieved 6 March 2025.
[10] "ArchiveWeb.page • Webrecorder". Webrecorder (10 January 2025). Retrieved 28 March 2025.
[11] "Frequently Asked Questions". Conifer User Guide. Retrieved 27 March 2025.
[12] webrecorder/har2warc, Webrecorder (25 January 2025). Retrieved 28 March 2025.
[13] "User Guide - Replay Webpage Docs". replayweb.page. Retrieved 28 March 2025.
[14] harvard-lil/scoop, Harvard Library Innovation Laboratory (26 March 2025), https://github.com/harvard-lil/scoop. Retrieved 28 March 2025.
[15] Scrivano, Giuseppe (6 August 2012). "GNU wget 1.14 released". Free Software Foundation, Inc. Retrieved 25 February 2016.
[16] "WebsiteArchiver - 保存と整理". websitearchiver.net. Retrieved 5 May 2026.

External links

WARC File Format specifications, archived at the Wayback Machine (8 November 2023)
The WARC File Format (ISO 28500) - Information, Maintenance, Drafts
WARC, Web ARChive file format
WARC implementation guidelines
Welcome
13. Internet Archive ARC files
The WARC Ecosystem

All text is available under the terms of the GNU Free Documentation License. This article reproduces and redistributes the Wikipedia articles "WARC (file format)", "Webarchive", and "Web archive", and is provided under the GNU Free Documentation License.