Amendment to the transfer reception state machine implementations

The implementations of the multi-frame transfer reassembly state machine in libcanard, libudpard, and pycyphal may be considered non-spec-compliant under some interpretations of the Specification. There is one edge case that may require special treatment.

The Specification sets forth the requirements for the [multi-frame] transfer reception process in section 4.1.4 "Transfer reception". The description is deliberately abstract to allow implementation flexibility while ensuring wire compatibility.

To the best of my knowledge, the existing implementations available in libcanard, libudpard, and pycyphal comply with the Specification except for one recently discovered edge case. The Specification requires that if the time elapsed since the arrival of the last transfer exceeds the configured transfer-ID timeout, the next transfer shall be accepted even if its transfer-ID is out of sequence. It does not require a transfer to be rejected if the time delta between the arrival of any pair of its frames exceeds the transfer-ID timeout. However, libcanard, libudpard, and pycyphal reject such transfers as invalid.
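To illustrate the difference, here is a deliberately simplified, hypothetical single-session reassembler in Python: one source, one port, no redundant transports, no CRC check, and a repeated transfer-ID is simply treated as a duplicate until the timeout expires. `Frame`, `Reassembler`, `run`, and the `amended` flag are illustrative names only and are not part of the libcanard, libudpard, or pycyphal APIs. With `amended=False` it models the rejection described above, under the assumption that the timeout is measured from the first frame of the transfer and checked on every incoming frame. With `amended=True` the timeout is consulted only when a start-of-transfer frame arrives, which is one possible way to satisfy the Specification without discarding slow transfers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    timestamp: float   # arrival time in seconds
    transfer_id: int
    sot: bool          # start of transfer
    eot: bool          # end of transfer
    toggle: bool
    data: bytes        # frame payload, tail byte excluded


class Reassembler:
    """Hypothetical single-session reassembler; CRC and redundant transports omitted."""

    def __init__(self, tid_timeout: float, amended: bool) -> None:
        self.tid_timeout = tid_timeout
        self.amended = amended
        self.ts: Optional[float] = None   # first-frame timestamp of the latest transfer
        self.tid: Optional[int] = None    # transfer-ID of the latest transfer
        self.toggle = True
        self.payload: Optional[bytearray] = None  # None when no transfer is in progress

    def push(self, f: Frame) -> Optional[bytes]:
        timed_out = self.ts is not None and f.timestamp - self.ts > self.tid_timeout
        # amended=False: any frame may trigger the reset, even in the middle of a transfer.
        # amended=True: only a start-of-transfer frame may trigger it.
        if timed_out and (f.sot or not self.amended):
            self.tid = None       # forget the transfer-ID so any new one is accepted
            self.payload = None   # drop the partially reassembled transfer
        if f.sot:
            if self.tid is not None and f.transfer_id == self.tid:
                return None       # same transfer-ID within the timeout: duplicate
            self.ts, self.tid, self.toggle = f.timestamp, f.transfer_id, True
            self.payload = bytearray()
        if self.payload is None or f.transfer_id != self.tid or f.toggle != self.toggle:
            return None           # nothing in progress, foreign transfer-ID, or toggle error
        self.payload += f.data
        self.toggle = not self.toggle
        if not f.eot:
            return None
        done, self.payload = bytes(self.payload), None
        return done


def run(frames: List[Frame], tid_timeout: float, amended: bool) -> List[bytes]:
    """Feed the frames in order and collect the completed transfers."""
    r = Reassembler(tid_timeout, amended)
    out = []
    for f in frames:
        result = r.push(f)
        if result is not None:
            out.append(result)
    return out
```

For instance, ten one-byte frames spaced 3 ms apart with a 10 ms timeout, `run([Frame(0.003 * i, 17, i == 0, i == 9, i % 2 == 0, b"\x00") for i in range(10)], 0.010, amended=False)`, returns `[]`, while the same call with `amended=True` returns a single 10-byte transfer.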

For example, below is a dump of a valid multi-frame CAN transfer (with the transfer CRC of 0x104A):

 (1681243583.288644)  slcan0  RX - -  10644C7F   [8]  09 30 00 00 00 00 00 B1   '.0......'
 (1681243583.291624)  slcan0  RX - -  10644C7F   [8]  00 00 00 00 00 00 00 11   '........'
 (1681243583.294662)  slcan0  RX - -  10644C7F   [8]  00 00 00 00 00 00 00 31   '.......1'
 (1681243583.297647)  slcan0  RX - -  10644C7F   [8]  00 00 00 00 00 00 00 11   '........'
 (1681243583.300635)  slcan0  RX - -  10644C7F   [8]  00 00 00 00 00 00 00 31   '.......1'
 (1681243583.303616)  slcan0  RX - -  10644C7F   [8]  00 00 00 00 00 00 00 11   '........'
 (1681243583.306614)  slcan0  RX - -  10644C7F   [8]  00 00 00 00 00 00 00 31   '.......1'
 (1681243583.309578)  slcan0  RX - -  10644C7F   [8]  00 00 00 00 00 00 00 11   '........'
 (1681243583.312569)  slcan0  RX - -  10644C7F   [8]  00 00 00 00 00 00 10 31   '.......1'
 (1681243583.315564)  slcan0  RX - -  10644C7F   [2]  4A 51                     'JQ'
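The structure of the dump can be checked by decoding the tail byte of each frame, which in Cyphal/CAN carries the start-of-transfer (bit 7), end-of-transfer (bit 6), and toggle (bit 5) flags followed by the 5-bit transfer-ID. The sketch below is a minimal, hypothetical parser for this particular candump layout; it assumes the column positions shown above and ignores the ASCII column. It also concatenates the per-frame payloads, which leaves the big-endian transfer CRC as the last two bytes. It does not recompute the CRC.

```python
from typing import List, NamedTuple

# The ten frames from the candump excerpt above (ASCII column dropped).
DUMP = """\
(1681243583.288644)  slcan0  RX - -  10644C7F   [8]  09 30 00 00 00 00 00 B1
(1681243583.291624)  slcan0  RX - -  10644C7F   [8]  00 00 00 00 00 00 00 11
(1681243583.294662)  slcan0  RX - -  10644C7F   [8]  00 00 00 00 00 00 00 31
(1681243583.297647)  slcan0  RX - -  10644C7F   [8]  00 00 00 00 00 00 00 11
(1681243583.300635)  slcan0  RX - -  10644C7F   [8]  00 00 00 00 00 00 00 31
(1681243583.303616)  slcan0  RX - -  10644C7F   [8]  00 00 00 00 00 00 00 11
(1681243583.306614)  slcan0  RX - -  10644C7F   [8]  00 00 00 00 00 00 00 31
(1681243583.309578)  slcan0  RX - -  10644C7F   [8]  00 00 00 00 00 00 00 11
(1681243583.312569)  slcan0  RX - -  10644C7F   [8]  00 00 00 00 00 00 10 31
(1681243583.315564)  slcan0  RX - -  10644C7F   [2]  4A 51
"""


class DumpFrame(NamedTuple):
    timestamp: str     # kept as text to avoid float rounding
    can_id: int
    data: bytes        # frame payload without the tail byte
    sot: bool
    eot: bool
    toggle: bool
    transfer_id: int


def parse(line: str) -> DumpFrame:
    f = line.split()
    dlc = int(f[6].strip("[]"))
    raw = bytes(int(b, 16) for b in f[7:7 + dlc])
    tail = raw[-1]
    # Cyphal/CAN tail byte: bit 7 start, bit 6 end, bit 5 toggle, bits 4..0 transfer-ID.
    return DumpFrame(f[0].strip("()"), int(f[5], 16), raw[:-1],
                     bool(tail & 0x80), bool(tail & 0x40), bool(tail & 0x20), tail & 0x1F)


FRAMES: List[DumpFrame] = [parse(line) for line in DUMP.splitlines()]


def reassemble(frames: List[DumpFrame]) -> bytes:
    """Concatenate per-frame payloads; the last two bytes are the transfer CRC (big-endian)."""
    return b"".join(f.data for f in frames)
```

Every tail byte carries transfer-ID 17, the toggle bit alternates starting from 1, and the reassembled 64 bytes end with `10 4A`, the 0x104A CRC quoted above.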

If the transfer-ID timeout is set to any value below approx. 27 ms, all existing implementations will reject it as invalid. This case may not affect high-priority transfers or lightly loaded networks, but it may cause nodes to reject well-formed lower-priority transfers if they are delayed by intervening high-priority traffic. Service traffic, such as logging, file transfer, and diagnostics, is particularly likely to be affected.
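Which delay actually matters can be worked out from the timestamps in the dump: the span from the first frame to the last versus the largest gap between consecutive frames. The helper names below are hypothetical, and `dropped_by_current_logic` encodes an assumption about the existing implementations, namely that every frame is compared against the arrival time of the first frame of its transfer.

```python
from decimal import Decimal
from typing import List

# Frame arrival timestamps copied from the dump above, in seconds.
TIMESTAMPS = [
    "1681243583.288644", "1681243583.291624", "1681243583.294662",
    "1681243583.297647", "1681243583.300635", "1681243583.303616",
    "1681243583.306614", "1681243583.309578", "1681243583.312569",
    "1681243583.315564",
]


def to_us(ts: str) -> int:
    """Convert a candump timestamp to integer microseconds without float rounding."""
    return int(Decimal(ts) * 1_000_000)


def span_us(stamps: List[str]) -> int:
    """Time from the first frame of the transfer to the last one."""
    return to_us(stamps[-1]) - to_us(stamps[0])


def max_gap_us(stamps: List[str]) -> int:
    """Largest interval between two consecutive frames."""
    us = [to_us(s) for s in stamps]
    return max(b - a for a, b in zip(us, us[1:]))


def dropped_by_current_logic(stamps: List[str], tid_timeout_us: int) -> bool:
    """Assumed current behaviour: each frame is checked against the first frame's timestamp."""
    return span_us(stamps) > tid_timeout_us
```

For this transfer the span is 26920 µs while the largest consecutive gap is 3038 µs, so under the assumed logic a 20 ms timeout discards the transfer and a 30 ms timeout does not.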

I see it as an implementation defect and an unintentional deviation from the Specification; hence I am planning to submit changes to libcanard and pycyphal to rectify it.


Good catch, Pavel!