Big-endian vs. little-endian in the context of bit-level encoding

A colleague of mine continues to be vexed by the v0 dynamic array layout, which would remain the same in v1.

According to my colleague, given an example type:

uint8[<=16] data

and a payload with a data array length of 14:

73 2B FE 14 0F 73 FE 73 FE 83 FA 9C 06 4B F8

The first octet, 0x73, contains the 5-bit array length in bits 4 - 8, with bits 1 - 3 holding the lowest 3 bits of the first element of the array.

The second octet, 0x2B, contains the highest 5 bits of the first element in bits 1 - 5 and the lowest 3 bits of the second element in bits 6 - 8.

This pattern continues until the highest 5 bits of the fourteenth element are taken from bits 1 - 5 of the fifteenth octet, 0xF8 (bits 6 - 8 of that octet are padding and are ignored).

In C++, the array length is found as follows:

#include <cstdint>

// 0x73        = 0111 0011 b
// 0x73 & 0xF8 = 0111 0000 b
std::uint8_t array_length = (0x73 & 0xF8) >> 3; // 14
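
Sticking with the same bit numbering (bit 1 being the least significant bit), the first element straddles the first two octets. A minimal sketch of reassembling it, purely as an illustration of the layout described above:

// Bits 1 - 3 of the first octet are the lowest 3 bits of element 0;
// bits 1 - 5 of the second octet are its highest 5 bits.
std::uint8_t element_0 = (0x73 & 0x07) | ((0x2B & 0x1F) << 3); // 0x5B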

The problem is that the length value itself violates little-endian encoding of integers. Since the length is the first value in the data, its bits should start at the least-significant end of the first octet rather than the most-significant end. This defect is more evident when considering array lengths that require > 8 bits to encode.
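
To illustrate, suppose a hypothetical 10-bit length field (e.g. for a uint8[<=600] array). If the field is written most-significant-bits-first, extending the layout described above, its high bits land on the wire ahead of its low bits, which is the opposite of little-endian byte order. A sketch of the two possible extractions (illustrative only, not taken from any implementation):

#include <cstdint>

// Hypothetical 10-bit length field occupying the first 10 bits of the payload.

// MSB-first placement (extending the layout described above): the top 8 bits
// of the length fill the first octet, the remaining 2 bits sit at the top of
// the second octet.
std::uint16_t length_msb_first(const std::uint8_t* p)
{
    return static_cast<std::uint16_t>((p[0] << 2) | (p[1] >> 6));
}

// LSB-first placement (consistent with little-endian integer encoding): the
// low 8 bits of the length fill the first octet, the top 2 bits sit in the
// low bits of the second octet.
std::uint16_t length_lsb_first(const std::uint8_t* p)
{
    return static_cast<std::uint16_t>(p[0] | ((p[1] & 0x03) << 8));
}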

So instead the story should be:

The first octet contains the 5-bit array length in bits 1 - 5, with bits 6 - 8 holding the lowest 3 bits of the first element of the array.

The second octet contains the highest 5 bits of the first element in bits 1 - 5 and the lowest 3 bits of the second element in bits 6 - 8.

This pattern continues until the highest 5 bits of the fourteenth element are taken from bits 1 - 5 of the fifteenth octet (bits 6 - 8 of that octet are padding and are ignored).
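
Under that layout the length extraction becomes a plain mask of the low bits. A sketch only (extract_length and first_octet are illustrative names, and the example payload above was encoded the old way, so its first octet would differ):

#include <cstdint>

// Proposed layout: the 5-bit length occupies bits 1 - 5 (the least
// significant bits) of the first octet, so a simple mask suffices.
std::uint8_t extract_length(std::uint8_t first_octet)
{
    return static_cast<std::uint8_t>(first_octet & 0x1F);
}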

Is this something we should fix in v1?

Regardless, we should update the v0 specification text:

Dynamic array encoding rules are sophisticated; hence it is recommended to review the existing implementations for a deeper understanding.

to actually specify how this has been implemented.