At the last dev call, I mentioned that I wanted to discuss an architectural deficiency in the current working draft of the Cyphal/UDP transport specification that requires attention. The deficiency is not a major one and is not likely to jeopardize the utility of the transport at large, but it may lead to unnecessary friction in its deployment and long-term maintenance. I am not the first to call attention to this property of the protocol; the first one to question it was someone from the Amazon team (sorry, can’t recall the exact person).
This post provides an exposition of the problem, suggests a possible solution, and lists its advantages and known disadvantages at the end. I suggest implementing this proposal on a separate branch of PyCyphal for evaluation and testing purposes.
CC @scottdixon @ASMik @schoberm @lydiagh
Problem
The current draft of the Cyphal/UDP transport reifies the Cyphal node-ID through the IP address of the node as follows:
xxxxxxxx.xddddddd.nnnnnnnn.nnnnnnnn
\________/\_____/ \_______________/
 (9 bits) (7 bits)    (16 bits)
  prefix subnet-ID     node-ID
The prefix is fixed, and the subnet-ID bears little relevance, so assume it is also fixed. The node-ID replaces the two least significant octets of the address.
One can determine the origin of a given Cyphal transfer (whether message or service) by evaluating the source IP address. The destination address depends on the kind of transfer: message transfers are directed towards their multicast group address (the rules of its composition are not covered here), while service transfers are directed towards the IP address of the recipient (composed as described above).
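The mapping described above can be sketched in Python as follows. This is only an illustration: the actual value of the fixed prefix is not given in this post, so `prefix` is passed in as a parameter, and the function names are hypothetical.

```python
import ipaddress


def node_ip_address(prefix: int, subnet_id: int, node_id: int) -> ipaddress.IPv4Address:
    """Compose a node address: 9-bit prefix, 7-bit subnet-ID, 16-bit node-ID."""
    if not 0 <= prefix < 2**9:
        raise ValueError("prefix must fit in 9 bits")
    if not 0 <= subnet_id < 2**7:
        raise ValueError("subnet-ID must fit in 7 bits")
    if not 0 <= node_id < 2**16:
        raise ValueError("node-ID must fit in 16 bits")
    return ipaddress.IPv4Address((prefix << 23) | (subnet_id << 16) | node_id)


def node_id_from_address(address: ipaddress.IPv4Address) -> int:
    """Recover the node-ID from the two least significant octets of the address."""
    return int(address) & 0xFFFF


# Assumed prefix for illustration: the top 9 bits of 10.0.0.0.
address = node_ip_address(prefix=0b000010100, subnet_id=42, node_id=1234)
print(address, node_id_from_address(address))
```

With the assumed prefix, node-ID 1234 on subnet 42 lands on 10.42.4.210, and the receiver recovers 1234 from the source address alone.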
This approach was chosen as it allows the system to be decentralized above OSI layer 3 (no need to maintain a central agent responsible for brokering the connections between nodes) and, at the same time, allows a node to commence operation immediately without the need to discover the network configuration dynamically. These requirements are based on the core design principles of Cyphal and, as such, are not questioned.
The specification focuses on IPv4. It is trivial to extend it to cover IPv6 as well, but the benefits of such an extension remain unclear.
While the approach is simple and functional, as indicated by limited testing in lab conditions, it suffers from several issues that appear to be cheap to address. They are reviewed below.
1. Leaky abstraction
The protocol stack is layered as follows:
L5…7 | Cyphal/UDP |
---|---|
L4 | UDP |
L3 | IP |
The Cyphal node-ID is a property of the top-layer protocol, but it is manifested at L3 directly. This leads to practical complications (discussed below) and has the potential to make the evolution of the protocol difficult in the longer term.
2. Difficulty supporting multiple nodes per local NIC
This problem does not affect simple, deeply embedded devices that run one logical Cyphal node per hardware unit, but it is significant for higher-level devices that may run several nodes concurrently on the same machine, which may or may not require connectivity to remote nodes over the network. A Cyphal node in this case may be hosted by a separate application running in a higher-level OS; multiple nodes per application are also possible.
Generally, the network configuration of a higher-level OS should not affect the application software executed in it. For example, a ROS node launched on one computer can communicate with its peers regardless of how the network is configured and whether the peers are located on the same machine or are remote. The erasure of the distinction between local and remote resources is a critical architectural feature of a networked OS that should be respected to maximize the utility of the protocol. In the current draft, however, Cyphal/UDP is incompatible with these principles, as the application-level parameters of a node (specifically its ID) derive from the low-level aspects of the network configuration (the IP address). Further, to run more than one node on the same machine, one would have to apply an uncommon configuration of the OS’s networking stack to enable multiple IP addresses per NIC.
Ideally, it should be possible to run an arbitrary software node on any networked OS without the need to alter or query the network configuration at all.
3. Minor inner inconsistency
The current design relies on IP multicast for subjects but on IP unicast for services. This seems logical at first, but considering that IP multicast is effectively a distinct protocol (very different from IP unicast), one might argue that the Cyphal/UDP transport could be simplified by focusing on IP multicast exclusively.
4. Boundary node-IDs are not usable
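From the node-ID to IP mapping above, one can see that node-ID values of 0 and 4095 (and possibly others that are one less than a power of 2) may be unusable depending on the network mask setting, because a host part of all zeros or all ones denotes the network address or the broadcast address of the subnet.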
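A quick way to see which node-IDs are affected is to check each candidate address against the network and broadcast addresses of its subnet. The sketch below uses the standard `ipaddress` module; the base address 10.42.0.0 and the helper names are assumptions made for illustration, not values from the draft.

```python
import ipaddress

# Hypothetical base address: fixed prefix and subnet-ID with a zero node-ID.
BASE = int(ipaddress.IPv4Address("10.42.0.0"))


def is_reserved(address: ipaddress.IPv4Address, prefix_length: int) -> bool:
    """True if the address is the network or broadcast address of its subnet."""
    network = ipaddress.ip_network(f"{address}/{prefix_length}", strict=False)
    return address in (network.network_address, network.broadcast_address)


def unusable_node_ids(prefix_length: int, node_id_count: int) -> list[int]:
    """List the node-IDs below node_id_count that map to a reserved address."""
    return [
        node_id
        for node_id in range(node_id_count)
        if is_reserved(ipaddress.IPv4Address(BASE | node_id), prefix_length)
    ]


print(unusable_node_ids(prefix_length=20, node_id_count=4096))
```

Under these assumptions, a /20 mask makes node-IDs 0 and 4095 unusable, while a /16 mask moves the upper boundary to 65535.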
5. ARP is required
Either a static or dynamic ARP table is required for sending IP unicast datagrams.
Solution
In general terms, the solution is to break the hard link between the Cyphal node-ID and the IP address of a node. A conventional solution would be to rely on a central broker that manages the routing and keeps the mapping between IP addresses and node identities (see ROS1). In the case of Cyphal, this is incompatible with the core requirements, but it is possible to rely on IP multicast to attain virtually the same result.
In the existing draft, a service transfer is performed by sending a unicast IP datagram to the IP address computed as shown earlier. This proposal modifies that such that service transfers are also multicast transfers, where the destination address is computed as follows:
  fixed         service
 (9 bits)  res. selector
 ________      ||
/        \     vv
11101111.0ddddd01.nnnnnnnn.nnnnnnnn
\__/      \___/   \_______________/
(4 bits) (5 bits)     (16 bits)
  IPv4   subnet-ID     node-ID
multicast \_______________________/
 prefix           (23 bits)
          collision-free multicast
             addressing limit of
            Ethernet MAC for IPv4
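The layout above translates into a few shifts and masks. The following Python sketch shows one way to compute the destination group for a service transfer; the function and constant names are hypothetical.

```python
import ipaddress

FIXED_PREFIX = 0b111011110  # The 9 fixed bits: 11101111.0
SERVICE_SELECTOR = 0b01     # Reserved bit (0) followed by the service selector bit (1)


def service_multicast_group(subnet_id: int, destination_node_id: int) -> ipaddress.IPv4Address:
    """Compute the multicast group that a service transfer to the given node is sent to."""
    if not 0 <= subnet_id < 2**5:
        raise ValueError("subnet-ID must fit in 5 bits")
    if not 0 <= destination_node_id < 2**16:
        raise ValueError("node-ID must fit in 16 bits")
    value = (
        (FIXED_PREFIX << 23)
        | (subnet_id << 18)
        | (SERVICE_SELECTOR << 16)
        | destination_node_id
    )
    return ipaddress.IPv4Address(value)


print(service_multicast_group(subnet_id=0, destination_node_id=42))
print(service_multicast_group(subnet_id=5, destination_node_id=42))
```

For node-ID 42 this yields 239.1.0.42 on subnet 0 and 239.21.0.42 on subnet 5; each node would join exactly one such group to receive service transfers addressed to it.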
The subject multicast group address is modified as follows (the two least significant bits of the subnet-ID are replaced with the message selector bit and one reserved zero bit):
  fixed   message reserved
 (9 bits) select. (3 bits)
 ________   res.|  _
/        \     vv / \
11101111.0ddddd00.000sssss.ssssssss
\__/      \___/      \____________/
(4 bits) (5 bits)      (13 bits)
  IPv4   subnet-ID     subject-ID
multicast \_______________________/
 prefix           (23 bits)
          collision-free multicast
             addressing limit of
            Ethernet MAC for IPv4
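The subject group address can be computed in the same way. The sketch below also derives the Ethernet multicast MAC address using the standard IPv4 mapping (01:00:5e followed by the low 23 bits of the group address), which is why keeping all variable bits within the low 23 bits avoids MAC-level collisions. The function names are hypothetical.

```python
import ipaddress

FIXED_PREFIX = 0b111011110  # The 9 fixed bits: 11101111.0


def subject_multicast_group(subnet_id: int, subject_id: int) -> ipaddress.IPv4Address:
    """Compute the multicast group for a subject; selector and reserved bits are zero."""
    if not 0 <= subnet_id < 2**5:
        raise ValueError("subnet-ID must fit in 5 bits")
    if not 0 <= subject_id < 2**13:
        raise ValueError("subject-ID must fit in 13 bits")
    return ipaddress.IPv4Address((FIXED_PREFIX << 23) | (subnet_id << 18) | subject_id)


def multicast_mac(group: ipaddress.IPv4Address) -> str:
    """Map an IPv4 multicast group to its Ethernet MAC: 01:00:5e plus the low 23 bits."""
    low23 = int(group) & 0x7FFFFF
    octets = [0x01, 0x00, 0x5E, low23 >> 16, (low23 >> 8) & 0xFF, low23 & 0xFF]
    return ":".join(f"{octet:02x}" for octet in octets)


group = subject_multicast_group(subnet_id=5, subject_id=7509)
print(group, multicast_mac(group))
```

Because the message selector bit is 0 and the service selector bit is 1, a subject group can never coincide with a service group, at either the IP or the MAC level.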
Since the unicast address of a node is no longer connected to its Cyphal identity, the Cyphal node-ID of the origin of a given transfer needs to be communicated using some other means. It is therefore proposed to modify the Cyphal/UDP header as follows:
-uint8   version          # =0 in this revision; ignore frame otherwise.
+uint8   version          # =1 in this revision; ignore frame otherwise.
 uint8   priority         # Like in CAN: 0 -- highest priority, 7 -- lowest priority.
-void16                   # Set to zero when transmitting, ignore when receiving.
+uint16  source_node_id   # Cyphal node-ID of the origin.
 uint32  frame_index_eot  # MSB is set if the current frame is the last frame of the transfer.
 uint64  transfer_id      # The transfer-ID never overflows.
 void64                   # This space may be used later for runtime type identification.
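To make the modified layout concrete, here is a minimal Python sketch of serializing and parsing the proposed header. It assumes little-endian byte order and no padding between fields, neither of which is stated in this post; the `Header` class and its field names are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Optional

# Assumption: little-endian, packed. Fields: version, priority, source_node_id,
# frame_index_eot, transfer_id, and the reserved 64 bits.
_HEADER = struct.Struct("<BBHIQQ")
_VERSION = 1
_EOT_MASK = 1 << 31  # MSB of frame_index_eot marks the last frame of a transfer.


@dataclass(frozen=True)
class Header:
    priority: int
    source_node_id: int
    frame_index: int
    end_of_transfer: bool
    transfer_id: int

    def serialize(self) -> bytes:
        frame_index_eot = self.frame_index | (_EOT_MASK if self.end_of_transfer else 0)
        return _HEADER.pack(
            _VERSION, self.priority, self.source_node_id, frame_index_eot, self.transfer_id, 0
        )

    @staticmethod
    def parse(data: bytes) -> Optional["Header"]:
        """Return None for short frames or unknown versions (the frame is ignored)."""
        if len(data) < _HEADER.size:
            return None
        version, priority, source_node_id, frame_index_eot, transfer_id, _ = _HEADER.unpack_from(data)
        if version != _VERSION:
            return None
        return Header(
            priority=priority,
            source_node_id=source_node_id,
            frame_index=frame_index_eot & ~_EOT_MASK,
            end_of_transfer=bool(frame_index_eot & _EOT_MASK),
            transfer_id=transfer_id,
        )


raw = Header(priority=4, source_node_id=1234, frame_index=0, end_of_transfer=True, transfer_id=99).serialize()
print(len(raw), Header.parse(raw))
```

The header size stays the same as before, since `source_node_id` reuses the 16 bits previously occupied by `void16`.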
The existing draft does not support anonymous Cyphal/UDP transfers for two reasons: first, that would require anonymous IP transfers which are not supported by the Internet protocol; second, the problem of automatic address assignment is already covered by DHCP. This reasoning does not apply to the updated architecture anymore. Therefore, it may be desirable to introduce the support for anonymous transfers with a trivial change: by reserving the maximum valid source node-ID value of 65535 to indicate that the source address is not valid. This is also in line with the Cyphal/serial (aka Cyphal/TCP) specification draft. An alternative would be to compact the version+priority fields to provide a few free bits for the new flags field, where one of the flags would indicate anonymity.
The UDP port number assignment is not altered by this proposal.
Advantages
- Cleaner architecture. The link between the IP address of a node and its Cyphal identity is completely eliminated.
- Compatibility with software nodes. One can execute an arbitrary software node on any system without the need to alter or inspect the configuration of its networking stack. This brings the Cyphal experience on par with high-level pub/sub frameworks.
- The resulting solution relies solely on IP multicast for all types of communication. Neither IP unicast nor broadcast is used; ARP is therefore also unnecessary.
- The entire range of Cyphal node-ID values is usable, as IP-related restrictions are no longer relevant.
- Implementations are expected to be simplified due to the removal of IP unicast communication (only one mode of communication is left).
Disadvantages
- The network routers will have to manage a larger number of IP multicast groups: the number of subjects plus the number of nodes on the network.
- IGMP level 2 implementation will be required for all nodes. Previously, it was only needed for nodes that subscribe to at least one subject.
- The number of Cyphal subnets (domains) is reduced from 128 to 32, because the subnet-ID shrinks from 7 bits to 5 bits.