LEB128とは？わかりやすく解説

LEB128 または Little Endian Base 128 は任意精度の整数を少ないバイト数に格納するのに用いられる可変長符号圧縮である。

LEB128はDWARF デバッグファイルフォーマット^[1]^[2]や WebAssembly のすべての整数リテラルの符号化方式に使用されている^[3]。

符号化形式

LEB128形式は可変長数値表現 (VLQ) と非常に似ている。主な違いは LEB128 はリトルエンディアンであるのに対し、可変長数値表現はビッグエンディアンであることであり、どちらも小さな数値を1バイトで格納できる一方で、任意の長さの数値をエンコードすることもできる。LEB128には2つのバージョンがあり、符号無しLEB128と符号付きLEB128である。復号する際は、エンコードされた値が符号無しLEB128か符号付きLEB128かを知っている必要がある。

符号無し LEB128

符号無し整数を Unsigned LEB128 (ULEB128) にするには、まず2進数で表現する。

次に数値を7ビットの倍数になるようにゼロ拡張する（数値が0でない場合、最上位7ビットがすべて0にならないようにする）。数値を7ビットのグループに分割する。各7ビットグループに対して、最下位グループから最上位グループの順に1バイトずつ出力する。各バイトは、そのグループを7つの最下位ビットに持ちます。最後のバイトを除くすべてのバイトの最上位ビットを1に設定する。数値0は通常、単一バイト0x00としてエンコードされふ。WebAssemblyでは、0の代替エンコーディングも許可されている（0x80 0x00、0x80 0x80 0x00、...）。

MSB ------------------ LSB
      10011000011101100101  元の値の2進数表現
     010011000011101100101  7の倍数ビットに拡張
 0100110  0001110  1100101  7ビットごとに分割
00100110 10001110 11100101  最後（最上位）のグループを除くすべてに高位1ビットを追加してバイトを形成
    0x26     0x8E     0xE5  16進数表現

→ 0xE5 0x8E 0x26            出力 (最下位ビットから最上位ビットの順)

符号付き LEB128

符号付きの数値も同様に表現される。2の補数で表現された、7の倍数の $N$ ビットから始め、符号なしエンコーディングと同様にグループに分割する。

例えば、符号付き数値-123456は0xC0 0xBB 0x78としてエンコードされる。

MSB ------------------ LSB
         11110001001000000  123456 の2進数表現
     000011110001001000000  21ビットの
     111100001110110111111  すべてのビットを反転 (1の補数)
     111100001110111000000  1を足す(2の補数)
 1111000  0111011  1000000  7ビットごとに分割
01111000 10111011 11000000  最後（最上位）のグループを除くすべてに高位1ビットを追加してバイトを形成
    0x78     0xBB     0xC0  16進数表現

→ 0xC0 0xBB 0x78            出力 (最下位ビットから最上位ビットの順)

C言語風擬似コード

符号無し整数のエンコード

do {
  byte = low-order 7 bits of value;
  value >>= 7;
  if (value != 0) /* more bytes to come */
    set high-order bit of byte;
  emit byte;
} while (value != 0);

符号付き整数のエンコード

more = 1;
negative = (value < 0);

/* the size in bits of the variable value, e.g., 64 if value's type is int64_t */
size = no. of bits in signed integer;

while (more) {
  byte = low-order 7 bits of value;
  value >>= 7;
  /* the following is only necessary if the implementation of >>= uses a
     logical shift rather than an arithmetic shift for a signed left operand */
  if (negative)
    value |= (~0 << (size - 7)); /* sign extend */

  /* sign bit of byte is second high-order bit (0x40) */
  if ((value == 0 && sign bit of byte is clear) || (value == -1 && sign bit of byte is set))
    more = 0;
  else
    set high-order bit of byte;
  emit byte;
}

符号無し整数の復号

result = 0;
shift = 0;
while (true) {
  byte = next byte in input;
  result |= (low-order 7 bits of byte) << shift;
  if (high-order bit of byte == 0)
    break;
  shift += 7;
}

符号付き整数の復号

result = 0;
shift = 0;

/* the size in bits of the result variable, e.g., 64 if result's type is int64_t */
size = number of bits in signed integer;

do {
  byte = next byte in input;
  result |= (low-order 7 bits of byte << shift);
  shift += 7;
} while (high-order bit of byte != 0);

/* sign bit of byte is second high-order bit (0x40) */
if ((shift <size) && (sign bit of byte is set))
  /* sign extend */
  result |= (~0 << shift);

^ UNIX International (July 1993), “7.8”, DWARF Debugging Information Format Specification Version 2.0, Draft 2009年7月19日閲覧。
^ Free Standards Group (2005年12月). “DWARF Debugging Information Format Specification Version 3.0”. p. 70. 2009年7月19日閲覧。
^ WebAssembly Community Group (2020年11月12日). “Values — Binary Format — WebAssembly 1.1”. 2020年12月31日閲覧。

使用例

Android プロジェクトはDalvik 実行可能ファイルフォーマット (.dex) で LEB128 で使用^[1]。
Hewlett-Packard IA-64の例外処理におけるテーブルの圧縮^[2]。
DWARF ファイルフォーマットは、様々なフィールドで符号なしと符号付きの両方のLEB128エンコーディングを使用。
LLVM ではカバレッジマッピングフォーマットで使用している^[3]。LLVMのLEB128エンコーディングとデコーディングの実装は、前述の擬似コードと並んで有用である^[4]。
Microsoft .NET Framework では、BinaryReaderとBinaryWriterクラスで「7ビットエンコード整数」フォーマットをサポートしており、BinaryWriterで文字列を書き込む際、文字列の長さはこのメソッドでエンコードされている。
Minecraft は、パケット内のデータの長さを測定するためにプロトコルでLEB128を使用^[5]。
mpatrolデバッギングツールは、トレースファイルフォーマットでLEB128を使用^[6]。
osu! は、osu!リプレイ（.osr）フォーマットでLEB128を使用^[7]。
W3C Efficient XML Interchange (EXI) は、ここで説明されているのとまったく同じ方法で、LEB128を使用して符号無し整数を表現する^[8]。
WebAssemblyは、モジュールのポータブルなバイナリエンコーディングで使用。
Ixzファイルフォーマット^[9]

出典

^ “Dalvik Executable Format”. 2021年5月18日閲覧。
^ Christophe de Dinechin (2000年10月). “C++ Exception Handling for IA-64”. 2009年7月19日閲覧。
^ LLVM Project (2016年). “LLVM Code Coverage Mapping Format”. 2016年10月20日閲覧。
^ LLVM Project (2019年). “LLVM LEB128 encoding and decoding”. 2019年11月2日閲覧。
^ “Minecraft Modern Varint & Varlong”. wiki.vg (2020年). 2020年11月29日閲覧。
^ “MPatrol documentation” (2008年12月). 2009年7月19日閲覧。
^ “Osr (file format) - osu!wiki” (英語). osu.ppy.sh. 2017年3月18日閲覧。
^ “Efficient XML Interchange (EXI) Format 1.0”. www.w3.org. World Wide Web Consortium (2014年2月11日). 2020年12月31日閲覧。
^ “The .xz File Format”. tukaani.org (2009年). 2017年10月30日閲覧。
^ “LLVM Bitcode File Format — LLVM 13 documentation”. 2024年10月4日閲覧。

[dwarfspec2-1] UNIX International (July 1993), “7.8”, DWARF Debugging Information Format Specification Version 2.0, Draft 2009年7月19日閲覧。

[dwarfspec3-2] Free Standards Group (2005年12月). “DWARF Debugging Information Format Specification Version 3.0”. p. 70. 2009年7月19日閲覧。

[wasmint-3] WebAssembly Community Group (2020年11月12日). “Values — Binary Format — WebAssembly 1.1”. 2020年12月31日閲覧。

[dex-4] “Dalvik Executable Format”. 2021年5月18日閲覧。

[Christophe-5] Christophe de Dinechin (2000年10月). “C++ Exception Handling for IA-64”. 2009年7月19日閲覧。

[6] LLVM Project (2016年). “LLVM Code Coverage Mapping Format”. 2016年10月20日閲覧。

[7] LLVM Project (2019年). “LLVM LEB128 encoding and decoding”. 2019年11月2日閲覧。

[8] “Minecraft Modern Varint & Varlong”. wiki.vg (2020年). 2020年11月29日閲覧。

[tracefile-9] “MPatrol documentation” (2008年12月). 2009年7月19日閲覧。

[10] “Osr (file format) - osu!wiki” (英語). osu.ppy.sh. 2017年3月18日閲覧。

[11] “Efficient XML Interchange (EXI) Format 1.0”. www.w3.org. World Wide Web Consortium (2014年2月11日). 2020年12月31日閲覧。

[12] “The .xz File Format”. tukaani.org (2009年). 2017年10月30日閲覧。

[13] “LLVM Bitcode File Format — LLVM 13 documentation”. 2024年10月4日閲覧。

[1]

[2]

[3]

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

LEB128とは？わかりやすく解説

LEB128

符号化形式

符号無し LEB128

符号付き LEB128

C言語風擬似コード

符号無し整数のエンコード

符号付き整数のエンコード

符号無し整数の復号

符号付き整数の復号

使用例

関連するエンコーディング

関連項目

出典

LEB128とは？ わかりやすく解説

LEB128

符号化形式

符号無し LEB128

符号付き LEB128

C言語風 擬似コード

符号無し整数のエンコード

符号付き整数のエンコード

符号無し整数の復号

符号付き整数の復号

使用例

関連するエンコーディング

関連項目

出典

急上昇のことば

LEB128とは？わかりやすく解説

C言語風擬似コード