Refined requirements for the RX pipeline in LibUDPard

pavel.kirienko · July 13, 2023, 12:19pm

In LibUDPard, the transfer reassembler is a bit more convoluted than in the other Cyphal libraries due to the following assumptions:

Frames of multi-frame transfers may be delivered out-of-order (A2 A0 A1).
Frames of adjacent multi-frame transfers may be interleaved (A0 A1 B0 A2 B1 …).
Frames may be duplicated even within the same interface either due to the behavior of the network or intentionally as a means of forward error correction.
If redundant interfaces are used, the redundant group should operate at the throughput and latency of the best-performing interface with zero failover delay.
The reassembler should offer linear time complexity on payload size and, at most, logarithmic on the number of transmitting nodes and the number of local subscriptions.
The reassembler should not consume dynamic memory aside from the fixed-size session state objects (one per remote node emitting matching transfers) and for the payload (at most two per session).

Given these assumptions and requirements, one particular challenge is the detection of transfer completion. As transfer frames are assigned a unique frame index [0,2^{31}), the reassembler can reconstruct the correct ordering of the frames upon reception. Knowing the MTU, which is guaranteed to be constant within a transfer and thus easy to discover automatically on a per-transfer basis, the reassembler can trivially determine the offset of each frame within the reassembled payload buffer (a special case applies when the first received frame is the EOT-frame as its MTU is unobservable, but it is easy to handle too). Duplicated frames will cause the reassembler to overwrite the payload received earlier, which is acceptable. With all of these behaviors out of the way, the difficulty arises when the reassembler has to decide whether a newly received frame completes its transfer. A slightly more abstract formulation given below makes it clear that completion is undecidable for the observer under the constant memory requirement, as it is necessary to consult with a list of unique observed elements:

The frame index values within a transfer form an ordered contiguous set of naturals of unknown size, and the reassembler (observer) is presented with elements of the set one at a time; each element may be observed repeatedly. The observer is informed of whether the currently observed element is the maximum one. For example, a set containing {0, 1, 2, 3} may be presented to the observer as a sequence of observations like (2, non-maximum), (0, non-maximum), (0, non-maximum), (3, maximum), (1, non-maximum).

Hence it is necessary to keep a certain state per unique received frame within the current transfer. To allow fast lookups (per the complexity requirements), the per-frame states need to be ordered by frame index in a tree. To avoid excessive memory footprint and, at the same time, minimize data copying, the reassembler can take ownership of the frame payloads received from the application and store them as-is, each payload attached to the metadata of its frame. The reassembler can then either copy the payloads into a contiguous buffer immediately before handing over the reassembled transfer to the application or simply hand over the fragments to the application as-is, allowing it to either make use of them directly or to copy them into a contiguous buffer beforehand. (N.B.: PyCyphal uses a similar approach to eliminate data copying within the stack: the reassembled transfer payload is passed to the application as a list of fragments.)

The trade-offs involved are as follows:

The application will need to be ready to relinquish ownership of the datagram buffers received from the network driver when invoking functions like udpardRxSubscriptionReceive and udpardRxServiceDispatcherReceive. Whether the buffers are heap-allocated or otherwise is irrelevant here, as we can control the memory management policy via UdpardMemoryResource.
When a transfer is reassembled, its payload will be presented to the application as an ordered sequence of fragments (either a linked list or a tree). The library can optionally provide a convenience function that copies the fragmented payload into a contiguous buffer for ease of consumption at the expense of time and memory. Applications that are capable of processing fragmented payloads as-is will enjoy the benefits of a zero-copy stack. It is the responsibility of the application to free the fragments.
Single-frame transfers are always zero-copy and contiguous.
The constant-MTU requirement will be removed. This doesn’t mean that the Specification will not require that.
Applications that are unable to hand over the ownership of the datagram payload buffers to the library (e.g., if the application does not own its frames but is given them by some third party) will have to create a new owned buffer and copy the data into it. From the standpoint of data copy efficiency, this approach is on par with libcanard.

I am going to amend the RX pipeline API accordingly and then we can discuss the specifics further.