[Cyphal/UDP] Architectural issues caused by the dependency between the node's IP address and its identity

pavel.kirienko · November 19, 2022, 3:25pm

scottdixon:

I’m having to think carefully about this. On the one hand, most high-assurance systems require “end-to-end integrity checking” which often leads to every layer in a network stack adding a checksum before handing off to the next layer. This ensures that errors introduced after serialization but before transmission are caught. That said, if you are on an embedded system with lockstep CPUs and ECC RAM the expectation is your program and network stack are inherently robust against data corruption. The one weakness would be if the message was moved into non-ECC hardware buffers before the Ethernet checksum was calculated (e.g. perhaps a stack copies a message into DMA before the Ethernet header is added? Not sure). Perhaps the strategy should be to reserve enough space for such a checksum but not call out the checksum just yet?

The argument for the checksum does sound convincing but I wonder if we could repurpose the Transfer CRC for this. At the moment, the Transfer CRC only applies to multi-frame transfer payloads. Should we explore the possibility of modifying the Transfer CRC such that:

It is provided for all transfers, not only multi-frame.
Optionally, it covers the headers of all involved frames as well. This option will complicate its computation though.

The downside of this approach is that the validity of received frames will not be possible to ascertain until the entire transfer is reassembled. Are we aware of any specific failure modes this approach might reveal?

If we were to adopt the dedicated header CRC as you described, I presume that we will want to modify the Transfer CRC such that it applies to single-frame transfers regardless, do you agree?

pavel.kirienko · November 19, 2022, 3:43pm

I think this should be “zero on transmit, discard on receipt unless zero”. This should enhance forward compatibility, making all nodes unaware of domain-IDs (subnet-IDs) confined to the domain-ID of zero.

Let us please rename it as reserved, transmit 1/ignore. I still don’t understand what you are trying to achieve here and I presume we don’t want to digress to this topic now.

The tentative checksum field — or the space reserved for it — should probably be located at the very end of the header.

The octets are misnumbered on the diagram at offset 86; it should be 88. All following offsets are also incorrect.

scottdixon · November 21, 2022, 6:20pm

Bah! I forgot the anonymous message bit. Hang on…

scottdixon · November 21, 2022, 6:30pm

If we have a dedicated CRC in the header that covers each frame why would we use the transfer CRC for single-frame transfers at all? I feel like I’m missing something?

pavel.kirienko · November 21, 2022, 6:36pm

My implied suggestion was to use your 16-bit CRC for the header only (Hamming distance of 6 is possible) and protect the payload using the transfer-CRC. I neglected to make this clear, sorry.

scottdixon · November 21, 2022, 8:36pm

The saga continues:

This is an optimization for UDP/IP on Ethernet. By limiting the multicast group ID to the least significant 23 bits, Ethernet hosts can avoid additional filtering responsibilities above layer 2.
RFC 2365, Section 6.2.1 reserves 239.0.0.0/10 and 239.64.0.0/10 for future use (because of footnote 1, Cyphal/UDP does not have access to the 239.128.0.0/10 scope). Cyphal/UDP uses this bit to isolate IP header version 0 traffic (note that the IP header version is not, necessarily, the same as the Cyphal Header version) to the 239.0.0.0/10 scope but we can enable the 239.64.0.0/10 scope in the future.
SNM (Service, Not Message): If set, then this is an RPC request or response and the 16 LSbs of the destination IP address are the full-range destination node identifier. If not set, then the 13 LSbs of the destination IP address are a subject identifier for a pub/sub message and the 14th, 15th, and 16th LSbs are 0.
I’ve omitted the subnet concept for now. I think we should introduce that in a later change once the Cyphal/UDP specification is more mature. As such this is zero on transmit, discard on receipt unless zero.
We’ll register UDP ports later. These are just examples.
Per RFC 1112, the default TTL is 1, which is unacceptable. Therefore, publishers should use the TTL value of 16 by default, which is chosen as a sensible default suitable for any intravehicular network.
Reserved bits that we can use for a future version of the header that supports a variable size, or we can decide to do other things with these bits.
In the future we want to propose using this bit in Cyphal/UDP and bit 23 of the Cyphal/CAN spec as a “valid data” flag in a similar manner to ARINC-825. It would mark a given transfer as containing valid data versus invalid data for a periodic signal (think GNSS, where the position message is always sent at XXHz and is marked invalid until satellite fixes are adequate). This allows “signal is present but not valid” logic branches in vehicle control code, which are different from “signal is missing”. I’m not going to dive into this proposal right now, but the TL;DR is that this bit must be the same for all transfers composing the same message.
If the SNM¹⁰ bit is set then this is a 10-bit service identifier with a 1-bit IRNR¹¹ flag, otherwise it is a 13-bit subject identifier.
SNM (Service, Not Message). Same value as found in the destination IP header (SNM³).
IRNR (Is Request Not Response) if SNM¹⁰ is set.
SwST (Starts with Synchronized Time): If set then Cyphal routers can interpret the first 56 bits after the Cyphal header as a uavcan.time.SynchronizedTimestamp-1.0 field. This deep packet inspection enables custom routing rules based on time but the specific rules are not controlled by the specification.
Like in CAN: 0 – highest priority, 7 – lowest priority. This data is duplicated from lower-layer QoS fields but provided in the Extended Cyphal header to simplify transfer forwarding where the QoS data is not readily available above the transport layer.
IAM (Is Anonymous Message) bit is set following the same rules as Cyphal/CAN. Note that any message with both the SNM¹⁰ and IAM set is invalid.
The 31-bit frame index within the current transfer.
EOT (End Of Transfer): the most significant bit (bit 31) of the 32-bit frame index is set if the current frame is the last frame of the transfer.
If EOT¹⁶ is set then this is the CRC-32 of the reassembled transfer (header data excluded).
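Footnotes 1 to 3 describe how the destination multicast group is formed and why it is confined to the 23 least significant bits. The Python sketch below illustrates that arithmetic under loud assumptions: the diagram that fixed the exact bit positions is not reproduced here, so placing the SNM flag at bit 16 is hypothetical, and `destination_group` and `ethernet_mac` are illustrative names, not an existing API. The MAC mapping itself (prefix `01:00:5e` followed by the low 23 bits of the group address) is the standard one from RFC 1112.

```python
import ipaddress

CYPHAL_BASE = int(ipaddress.IPv4Address("239.0.0.0"))  # 239.0.0.0/10 scope (footnote 2)
SNM_BIT = 1 << 16       # hypothetical position of the SNM flag; the diagram is not reproduced
SUBJECT_ID_BITS = 13    # per footnote 3 of this revision; a later revision widens it to 15


def destination_group(*, subject_id=None, destination_node_id=None) -> ipaddress.IPv4Address:
    """Build the destination multicast group following footnote 3 (sketch)."""
    if (subject_id is None) == (destination_node_id is None):
        raise ValueError("give exactly one of subject_id or destination_node_id")
    if subject_id is not None:
        if not 0 <= subject_id < 2**SUBJECT_ID_BITS:
            raise ValueError("subject-ID out of range")
        return ipaddress.IPv4Address(CYPHAL_BASE | subject_id)  # SNM clear: message
    if not 0 <= destination_node_id < 2**16:
        raise ValueError("node-ID out of range")
    return ipaddress.IPv4Address(CYPHAL_BASE | SNM_BIT | destination_node_id)  # SNM set: service


def ethernet_mac(group: ipaddress.IPv4Address) -> str:
    """RFC 1112 mapping: 01:00:5e followed by the 23 least significant bits of the group."""
    low23 = int(group) & 0x7FFFFF
    octets = (0x01, 0x00, 0x5E, low23 >> 16, (low23 >> 8) & 0xFF, low23 & 0xFF)
    return ":".join(f"{b:02x}" for b in octets)


msg = destination_group(subject_id=1234)
print(msg, ethernet_mac(msg))  # 239.0.4.210 01:00:5e:00:04:d2
srv = destination_group(destination_node_id=42)
print(srv, ethernet_mac(srv))  # 239.1.0.42 01:00:5e:01:00:2a
```

Because every bit that Cyphal controls sits below bit 23, distinct Cyphal groups map to distinct Ethernet addresses, so the NIC's layer-2 multicast filter alone can separate them.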

pavel.kirienko · November 23, 2022, 11:46am

This mostly makes sense except:

I understand that this is not formally part of this proposal but in the future I will strongly oppose this because it constitutes a breach of abstraction: the function of the transport layer is to deliver a serialized object from A to B without regard for its contents (things like the SwST flag are exempted because in this case the operation of the transport layer is dependent on the application data). If your application needs to differentiate valid data from invalid data then it has to be explicitly expressed in the data type definition. I think ARINC-825 (along with some related standards) is poorly designed in this respect.

If strongly desired, we could introduce some opaque user-specific flags that one could use to emulate ARINC-825-like behaviors in closed ecosystems.

I would like to revise the CRC definition once again. Could you comment on my suggestion to limit the CRC such that it covers only the header, and the payload is covered by the Transfer-CRC which is always present (even for single-frame transfers)?

Next I would like to bikeshed the field arrangement a little without altering the semantics significantly. The objectives are two: ensure natural alignment of each field to make struct aliasing possible, and harmonize the header format with Cyphal/serial. The latter stems from the fact that the updated Cyphal/UDP header definition is surprisingly close to that of Cyphal/serial and we could reap some benefits from their unification.

Here is basically your header definition with some fields shifted around, expressed in DSDL:

uint4 version           # <- 1
void4                   # Reserved for minor version or optional feature flags.

uint3 priority          # Duplicates QoS for ease of access; 0 -- highest, 7 -- lowest.
void5

@assert _offset_ == {16}
uint16 source_node_id
uint16 destination_node_id
uint16 data_specifier   # Like in Cyphal/serial: subject-ID | (service-ID + request/response discriminator).

@assert _offset_ == {64}
void64
uint64 transfer_id

@assert _offset_ % 32 == {0}
uint32 frame_index_eot  # MSB is set if the current frame is the last frame of the transfer.

bool starts_with_synchronized_time
void7

uint8 user_flags
# Opaque application-specific flags with user-defined semantics. Generic implementations should ignore.

@assert _offset_ % 16 == {0}
uint16 header_crc

@assert _offset_ / 8 == {32}  # Fixed-size 32-byte header with natural alignment for each field ensured.
@sealed
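One way to sanity-check the natural-alignment objective is to mirror the layout as a `ctypes` structure with default (native) alignment: if every multi-byte field already sits on its natural boundary, no padding is inserted and the size stays at 32 bytes. This is a sketch; the class name `HeaderV1` is hypothetical, and the sub-byte fields are folded into whole bytes because only the multi-byte fields matter for alignment.

```python
import ctypes


class HeaderV1(ctypes.LittleEndianStructure):
    """Hypothetical C-style view of the 32-byte layout above (no _pack_ override)."""
    _fields_ = [
        ("version_reserved", ctypes.c_uint8),    # uint4 version + void4
        ("priority_reserved", ctypes.c_uint8),   # uint3 priority + void5
        ("source_node_id", ctypes.c_uint16),
        ("destination_node_id", ctypes.c_uint16),
        ("data_specifier", ctypes.c_uint16),
        ("reserved", ctypes.c_uint64),           # void64
        ("transfer_id", ctypes.c_uint64),
        ("frame_index_eot", ctypes.c_uint32),
        ("flags", ctypes.c_uint8),               # starts_with_synchronized_time + void7
        ("user_flags", ctypes.c_uint8),
        ("header_crc", ctypes.c_uint16),
    ]


# Every field offset is a multiple of the field's own size (natural alignment),
# so the structure needs no padding and can alias a received buffer directly.
for name, ctype in HeaderV1._fields_:
    assert getattr(HeaderV1, name).offset % ctypes.sizeof(ctype) == 0, name
print(ctypes.sizeof(HeaderV1))  # 32
```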

Key changes:

The field reserved for the Cyphal Header Length (CHL) is removed because its addition will necessitate changing the version number, which makes it redundant.
The data specifier incorporates the service-not-message and request-not-response flags. The subject-ID is 15 bits wide and the service-ID is 14 bits wide. This mirrors Cyphal/serial.
Anonymous transfers are indicated by setting the source node-ID to 2^{16}-1.
Multicast (i.e., message) transfers are indicated by setting the destination node-ID to 2^{16}-1.
The CRC is reduced to 16-bit and covers only the header (Hamming distance of 6 (sic!) is possible). The freed space is allocated for the user-specific flags.
TLAs are replaced with spelled-out names!!!1one
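To make the data specifier and the sentinel node-IDs concrete, here is a Python sketch. It assumes the service-not-message flag is the most significant bit (bit 15) and the request-not-response flag is bit 14, which is consistent with the 15-bit subject-ID and 14-bit service-ID widths listed above; the exact bit assignment and the helper names are assumptions for illustration.

```python
SERVICE_NOT_MESSAGE = 0x8000   # assumed: bit 15 set for service transfers
REQUEST_NOT_RESPONSE = 0x4000  # assumed: bit 14 set for requests (services only)
NODE_ID_UNSET = 0xFFFF         # 2**16 - 1: anonymous source or multicast destination


def encode_data_specifier(*, subject_id=None, service_id=None, is_request=False) -> int:
    if (subject_id is None) == (service_id is None):
        raise ValueError("exactly one of subject_id or service_id must be given")
    if subject_id is not None:
        if not 0 <= subject_id < 2**15:
            raise ValueError("subject-ID is 15 bits wide")
        return subject_id  # SNM clear
    if not 0 <= service_id < 2**14:
        raise ValueError("service-ID is 14 bits wide")
    return SERVICE_NOT_MESSAGE | (REQUEST_NOT_RESPONSE if is_request else 0) | service_id


def decode_data_specifier(ds: int) -> dict:
    if ds & SERVICE_NOT_MESSAGE:
        return {"service_id": ds & 0x3FFF, "is_request": bool(ds & REQUEST_NOT_RESPONSE)}
    return {"subject_id": ds & 0x7FFF}


def describe_endpoints(source_node_id: int, destination_node_id: int) -> tuple:
    """Return (is_anonymous, is_multicast) according to the sentinel rules above."""
    return source_node_id == NODE_ID_UNSET, destination_node_id == NODE_ID_UNSET


print(hex(encode_data_specifier(service_id=430, is_request=True)))  # 0xc1ae
print(decode_data_specifier(7509))  # {'subject_id': 7509}
```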

scottdixon · November 29, 2022, 7:51pm

indeed. Let’s drop this subject for now.

Okay, but where is the transfer CRC for the reassembled message then (i.e. with EOT=1)?

How about this?

Generic Header

uint4 version                      # <- 1
bool starts_with_synchronized_time # <- 0
void3

@assert _offset_ == {8}
uint3 priority                     # Duplicates QoS for ease of access; 0 -- highest, 7 -- lowest.
void5

@assert _offset_ == {16}
uint16 source_node_id
uint16 destination_node_id
uint16 data_specifier              # Like in Cyphal/serial: subject-ID | (service-ID + request/response discriminator).

@assert _offset_ == {64}
uint64 transfer_id

@assert _offset_ == {128}
uint31 frame_index
bool end_of_transfer

uint16 user_data
# Opaque application-specific data with user-defined semantics. Generic implementations should ignore.

@assert _offset_ % 16 == {0}
uint16 header_crc

void64

@assert _offset_ / 8 == {32}       # Fixed-size 32-byte header with natural alignment for each field ensured.
@sealed

Synchronized Header

uint4 version                      # <- 1
bool starts_with_synchronized_time # <- 1
void3

@assert _offset_ == {8}
uint3 priority                     # Duplicates QoS for ease of access; 0 -- highest, 7 -- lowest.
void5

@assert _offset_ == {16}
uint16 source_node_id
uint16 destination_node_id
uint16 data_specifier             # Like in Cyphal/serial: subject-ID | (service-ID + request/response discriminator).

@assert _offset_ == {64}
uint64 transfer_id

@assert _offset_ == {128}
uint31 frame_index
bool end_of_transfer

uint16 user_data
# Opaque application-specific data with user-defined semantics. Generic implementations should ignore.

@assert _offset_ % 16 == {0}
uint16 header_crc

uavcan.time.SynchronizedTimestamp.1.0 timestamp
# Allows a node to apply lower-layer logic using message timestamps. The specific meaning
# of these timestamps is system defined for v1 of this header.

void8

@assert _offset_ / 8 == {32}       # Fixed-size 32-byte header with natural alignment for each field ensured.
@sealed

pavel.kirienko · November 29, 2022, 8:14pm

Okay, but where is the transfer CRC for the reassembled message then (i.e. with EOT=1)?

Right after the payload! It is not part of the header because there is one Transfer-CRC per transfer, not per frame. So in a multi-frame transfer, only the last frame will contain the CRC. One implication is that if at least one frame contains mangled data, it will not be discovered until the last frame is received, but I presume it is not a problem.
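The Transfer-CRC placement can be sketched as follows. Assumptions: the thread only says “CRC-32”, so `zlib.crc32` (CRC-32/ISO-HDLC) stands in for whatever polynomial is finally chosen, the CRC is appended little-endian, and the helper names are hypothetical. The splitter keeps the whole CRC inside the last frame, matching the description above.

```python
import zlib

CRC_SIZE = 4


def transfer_crc(payload: bytes) -> bytes:
    # Stand-in polynomial and byte order; both are assumptions of this sketch.
    return zlib.crc32(payload).to_bytes(CRC_SIZE, "little")


def split_transfer(payload: bytes, max_frame_payload: int) -> list:
    """Cut a transfer into frame payloads, keeping the Transfer-CRC whole in the last frame."""
    if max_frame_payload < CRC_SIZE:
        raise ValueError("frames must be able to hold the Transfer-CRC")
    chunks = [payload[i:i + max_frame_payload]
              for i in range(0, len(payload), max_frame_payload)] or [b""]
    if len(chunks[-1]) + CRC_SIZE <= max_frame_payload:
        chunks[-1] += transfer_crc(payload)
    else:
        chunks.append(transfer_crc(payload))  # the EOT frame carries only the CRC
    return chunks


def reassemble_transfer(frames: list) -> bytes:
    """Concatenate frame payloads in order and verify the trailing Transfer-CRC."""
    data = b"".join(frames)
    if len(data) < CRC_SIZE:
        raise ValueError("transfer too short to contain a Transfer-CRC")
    payload, crc = data[:-CRC_SIZE], data[-CRC_SIZE:]
    if transfer_crc(payload) != crc:
        raise ValueError("Transfer-CRC mismatch")  # detectable only once EOT arrives
    return payload
```

As noted above, corruption in an early frame surfaces only when the frame with EOT set has been received and the reassembled transfer is checked.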

Shouldn’t the Header CRC be the last field of the header? Or do you want to exclude the timestamp from CRC check? (if yes, why?) Can we not use the 64-bit void I provided in my suggested layout above for this?
Could you please share a few words on the rationale behind moving starts_with_synchronized_time next to the version field? The void after the version field is a good place to add the minor version later on.

scottdixon · November 29, 2022, 8:33pm

I get it. We get a bit of coverage from the UDP checksum per-frame so I think I’m buying this scheme.

The idea is to make deep packet inspection as natural as possible by placing the timestamp at the start of the data segment.

The checksum would include the timestamp for these message types since the timestamp becomes an implied part of the header.

I’m not married to the location. Where would you put it?

pavel.kirienko · November 29, 2022, 8:41pm

This is no big deal but is awkward to compute. Could you please expand a bit on what we gain, in terms of DPI, by locating the timestamp and the payload adjacently?

After the priority, perhaps? I don’t have any use in mind for these five bits.

scottdixon · November 29, 2022, 8:46pm

Dots Per Inch?

pavel.kirienko · November 29, 2022, 8:48pm

Yes, also known as Deep Packet Inspection. Same thing, essentially.

scottdixon · November 29, 2022, 9:21pm

My scheme allows 56-bit timestamps to be used both by a lower layer and the application without incurring the overhead of including the same timestamp twice. For example, a Cyphal-aware router could rate-limit based on these timestamps (i.e. signal decimation) or a generic framework could provide message group synchronization without having to deserialize all the messages in a group, etc.
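A Cyphal-aware router performing signal decimation on those timestamps might look roughly like this. It is a hypothetical sketch: the class name, the per-subject keying, and the choice to always forward messages whose timestamp is zero (which `uavcan.time.SynchronizedTimestamp.1.0` uses to mean “unknown”) are assumptions, not part of any proposal in this thread.

```python
class TimestampDecimator:
    """Hypothetical router-side rate limiter keyed by subject-ID."""

    def __init__(self, min_period_us: int):
        self.min_period_us = min_period_us
        self._last_forwarded = {}  # subject-ID -> timestamp (us) of the last forwarded message

    def should_forward(self, subject_id: int, timestamp_us: int) -> bool:
        if timestamp_us == 0:  # unknown timestamp: cannot decimate, so pass it through
            return True
        last = self._last_forwarded.get(subject_id)
        if last is None or timestamp_us - last >= self.min_period_us:
            self._last_forwarded[subject_id] = timestamp_us
            return True
        return False


# Decimate a 100 Hz stream (10 ms spacing) down to 10 Hz.
decimator = TimestampDecimator(min_period_us=100_000)
stamps = [1_000_000 + 10_000 * i for i in range(25)]
print([t for t in stamps if decimator.should_forward(1234, t)])  # [1000000, 1100000, 1200000]
```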

sold.

pavel.kirienko · November 30, 2022, 8:57am

I think I now see what you were trying to achieve with the variable-size Cyphal header: the header size field would change depending on whether the payload incorporates the timestamp as the first field or not; if it does, the header size would be minimal (24 bytes per our examples above). Otherwise, the header timestamp would be injected at the same offset from the origin of the Cyphal frame where it would have been if it were part of the serialized payload, and the Cyphal header length would be increased by 7 bytes such that the injected timestamp is not deserialized as part of the message. Except that in your case, the CHL is said to be a multiple of 4 bytes. Is this more or less in line with your goal, at least on a high level?

With a fixed-size header, the objective of “without incurring the overhead of including the same timestamp twice” is not achievable because there will always be some reserved space in the header if the timestamp is not provided.

One possible option here that doesn’t require variable-size headers and yet allows the transport layer to reach the timestamp is to replace your timestamp indication flag with an optional offset:

uint8 timestamp_offset_from_header_origin  # Zero if not timestamped.

If the serialized object is timestamped, this field would be set to the header size (32 bytes).
If the serialized object is not timestamped but the header timestamp is given in the reserved field, this field would be set to the offset of the reserved field (8 bytes per my proposal, 24 bytes per your proposal).
If neither is present, this field would be set to zero, indicating the lack of timestamps.
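Given such an offset, a transport-layer consumer could locate the timestamp without deserializing the message. The sketch assumes the header has already been parsed into `timestamp_offset_from_header_origin` and that the timestamp is the 7-byte little-endian `microsecond` field of `uavcan.time.SynchronizedTimestamp.1.0`; the function name is hypothetical.

```python
TIMESTAMP_SIZE = 7  # uint56 microsecond


def read_timestamp(datagram: bytes, timestamp_offset_from_header_origin: int):
    """Return the 56-bit microsecond timestamp, or None if the offset is zero."""
    if timestamp_offset_from_header_origin == 0:
        return None  # neither the payload nor the header carries a timestamp
    start = timestamp_offset_from_header_origin
    end = start + TIMESTAMP_SIZE
    if end > len(datagram):
        raise ValueError("timestamp offset points past the end of the datagram")
    return int.from_bytes(datagram[start:end], "little")


# Timestamped payload: the offset equals the 32-byte header size.
ts = 1_234_567_890_123
datagram = bytes(32) + ts.to_bytes(TIMESTAMP_SIZE, "little") + b"rest of payload"
print(read_timestamp(datagram, 32) == ts)  # True
```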

The obvious disadvantage is that we always incur the overhead of transferring the reserved field in the header so that kind of defeats the point.

scottdixon · November 30, 2022, 8:00pm

yes

uint8 timestamp_offset_from_header_origin  # Zero if not timestamped.

this works for me; however, the checksum becomes an issue. My proposal has the property of including the timestamp in the header checksum. DPI can’t use header data if the header is invalid, so it must be part of this checksum.

My proposal does require 8 more bytes to expand the header but it never transmits those bytes on the wire as additional header data. What we’re doing is borrowing 8 bytes, sometimes, from the data payload to save bandwidth but we still require anyone reading the header to always read those 8 bytes when calculating the checksum and to allocate 8 bytes if using the timestamp without de-serializing the message.

pavel.kirienko · December 1, 2022, 11:02am

We must choose: either to retain the fixed-size header or to sacrifice intelligent timestamping.

To this end, I would like to discuss how a Cyphal/UDP implementation would deduce whether the serialized object contains a timestamp, so that it can decide whether to add an explicit timestamp to the header. Do you have some API solution in mind for this? I presume that a generic implementation that always adds the header timestamp would also defeat the point of this variable-size-header approach.

scottdixon · December 1, 2022, 5:29pm

My assumption was this would be an API where the sender tells the library that there’s a timestamp and any offset (if we go that route) involved.

pavel.kirienko · December 1, 2022, 8:32pm

Okay. I certainly see the value in this but I fear that we might accidentally build another DDS unless we scrutinize every feature we add to the protocol.

I suspect that the situation where the application layer and the transport layer view the timestamping problem differently is not entirely impossible. From the transport layer standpoint, a timestamp could be leveraged for queueing policy implementations, discarding of obsolete data, and rate limiting, and we can expect it to represent the point in time at which the transfer is emitted to the network. At the application layer, a timestamp could potentially refer to a point in time that precedes the formation of the transfer (e.g., it is common for state estimators to timestamp published estimates based on the timestamp of the most recently arrived sensor feed message). This creates a certain danger that the transport layer might misuse an application-layer timestamp for transport purposes where it would not be appropriate. If you accept this, then perhaps you can see how the danger arises out of our attempt to save seven bytes per transfer (sic! not per frame) by mixing two distinct layers of the communication stack.

The risk is indeed low (I expect the cases where the application layer timestamp is ill-suited for transport purposes to be rare), but then so is the reward (saving just 7 bytes per transfer). Considering our commitment to simplicity, I am mildly inclined towards the option of using a simpler and less efficient fixed-size header with an optional transport layer timestamp that does not invite implementations to make assumptions about the contents of the serialized payload.

Do you think this is sound or am I missing something?

scottdixon · December 2, 2022, 8:14pm

Per the dev call, this is what I think we all agreed to:

uint4 version                      # <- 1
void4
 
@assert _offset_ == {8}
uint3 priority                     # Duplicates QoS for ease of access; 0 -- highest, 7 -- lowest.
void5
 
@assert _offset_ == {16}
uint16 source_node_id
uint16 destination_node_id
uint16 data_specifier              # Like in Cyphal/serial: subject-ID | (service-ID + request/response discriminator).
 
@assert _offset_ == {64}
uint64 transfer_id
 
@assert _offset_ == {128}
uint31 frame_index
bool end_of_transfer
 
uint16 user_data
# Opaque application-specific data with user-defined semantics. Generic implementations should ignore
 
@assert _offset_ % 16 == {0}
uint8[2] header_crc16_big_endian
 
@assert _offset_ / 8 == {24}       # Fixed-size 24-byte header with natural alignment for each field ensured.
@sealed

This is an optimization for UDP/IP on Ethernet. By limiting the multicast group ID to the least significant 23 bits, Ethernet hosts can avoid additional filtering responsibilities above layer 2.
RFC 2365, Section 6.2.1 reserves 239.0.0.0/10 and 239.64.0.0/10 for future use (because of footnote 1, Cyphal/UDP does not have access to the 239.128.0.0/10 scope). Cyphal/UDP uses this bit to isolate IP header version 0 traffic (note that the IP header version is not, necessarily, the same as the Cyphal Header version) to the 239.0.0.0/10 scope but we can enable the 239.64.0.0/10 scope in the future.
SNM (Service, Not Message): If set, then this is an RPC request or response and the 16 LSbs of the destination IP address are the full-range destination node identifier. If not set, then the 15 LSbs of the destination IP address are a subject identifier for a pub/sub message and the 16th LSb is 0.
Zero on transmit, discard on receipt unless zero.
This is a temporary UDP port. We’ll register an official one later.
Per RFC 1112, the default TTL is 1, which is unacceptable. Therefore, publishers should use the TTL value of 16 by default, which is chosen as a sensible default suitable for any intravehicular network.
(comment removed)
The data specifier is taken directly from Cyphal/Serial (pycyphal.transport.serial package — PyCyphal 1.15.0 documentation)
If the SNM¹⁰ bit is set then this is a 10-bit service identifier with a 1-bit IRNR¹¹ flag, otherwise it is a 13-bit subject identifier.
SNM (Service, Not Message). Same value as found in the destination IP header (SNM³).
IRNR (Is Request Not Response) if SNM¹⁰ is set.
(comment removed)
Like in CAN: 0 – highest priority, 7 – lowest priority. This data is duplicated from lower-layer QoS fields but provided in the Cyphal header to simplify transfer forwarding where the QoS data is not readily available above the transport layer.
0xFFFF == anonymous transfer
The 31-bit frame index within the current transfer.
EOT (End Of Transfer): the most significant bit (bit 31) of the 32-bit frame index is set if the current frame is the last frame of the transfer.
If EOT¹⁶ is set then this is the CRC of the reassembled transfer (header data excluded). This also applies to single-frame transfers, where EOT is always set. In this case the CRC covers just the single frame, which differs from CAN, where single-frame transfers carry no Cyphal CRC because they rely exclusively on the CAN CRC. Because the UDP checksum is weak, Cyphal/UDP relies on it only as an optimization for multi-frame transfers (a UDP checksum failure can catch an error before the transfer is reassembled and the strong Cyphal CRC is applied).
0xFFFF == broadcast
Header CRC is CRC-16/CCITT-FALSE (aka CRC-16/AUTOSAR) and is encoded as a two-byte, big-endian integer.
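The agreed 24-byte layout and its header CRC can be exercised with Python's `struct` and `binascii` modules. This is a sketch under assumptions: sub-byte fields occupy the least significant bits of their byte (DSDL's bit order), the CRC covers the first 22 bytes, and the function names are illustrative. `binascii.crc_hqx` seeded with `0xFFFF` computes CRC-16/CCITT-FALSE.

```python
import binascii
import struct

# Little-endian fields preceding the CRC: version, priority, source, destination,
# data specifier, transfer-ID, frame index + EOT, user data (22 bytes, no padding).
_BODY = struct.Struct("<BBHHHQIH")


def crc16_ccitt_false(data: bytes) -> int:
    # crc_hqx: poly 0x1021, MSB-first, no final XOR; init 0xFFFF gives CCITT-FALSE.
    return binascii.crc_hqx(data, 0xFFFF)


def pack_header(*, priority: int, source_node_id: int, destination_node_id: int,
                data_specifier: int, transfer_id: int, frame_index: int,
                end_of_transfer: bool, user_data: int = 0, version: int = 1) -> bytes:
    if not 0 <= frame_index < 2**31:
        raise ValueError("frame index must fit in 31 bits")
    body = _BODY.pack(version & 0x0F, priority & 0x07,  # void bits transmitted as zero
                      source_node_id, destination_node_id, data_specifier, transfer_id,
                      frame_index | (int(end_of_transfer) << 31), user_data)
    return body + crc16_ccitt_false(body).to_bytes(2, "big")


def unpack_header(buf: bytes) -> dict:
    if len(buf) < 24:
        raise ValueError("buffer shorter than the 24-byte header")
    if crc16_ccitt_false(buf[:22]) != int.from_bytes(buf[22:24], "big"):
        raise ValueError("header CRC mismatch")
    ver, prio, src, dst, ds, tid, fi_eot, user = _BODY.unpack_from(buf)
    return {
        "version": ver & 0x0F, "priority": prio & 0x07,
        "source_node_id": src, "destination_node_id": dst,
        "data_specifier": ds, "transfer_id": tid,
        "frame_index": fi_eot & 0x7FFFFFFF, "end_of_transfer": bool(fi_eot >> 31),
        "user_data": user,
    }


# A single-frame message (destination 0xFFFF == broadcast) from node 42.
hdr = pack_header(priority=4, source_node_id=42, destination_node_id=0xFFFF,
                  data_specifier=1234, transfer_id=7, frame_index=0, end_of_transfer=True)
print(len(hdr), unpack_header(hdr)["end_of_transfer"])  # 24 True
```

One plausible reason for the big-endian CRC encoding: for a non-reflected CRC with no final XOR, running the CRC over the data followed by its big-endian CRC yields zero, so a receiver can validate the header with a single pass over all 24 bytes.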