I designed and implemented a simple transport-agnostic extension that works on top of the existing Cyphal transports and adds support for minimalist named topics. I tried not to compromise on Cyphal’s general simplicity and robustness while also avoiding the need for manual configuration. The system is entirely plug-and-play and also supports instant power-up-and-go, provided that the configuration, once automatically established, is recovered from non-volatile memory.
Requirements I defined for this work:
- Discriminate data flows using descriptive string names instead of integer port-IDs.
- Allow nodes to join the network with zero prior configuration of the protocol (at least above the physical layer).
- The introduction of new topics and/or nodes must not disturb the operation of existing participants.
- A fully converged network must offer a service quality at least as good as that of a statically configured network.
- The autoconfiguration protocol must be stateless and must not require central coordinators or special nodes.
- Retain backward compatibility with old Cyphal nodes that do not support named topics.
- Preferably, the background service traffic should be exchanged at a constant rate to simplify latency and throughput analysis.
- Once a stable configuration is found, it should be possible to store it in non-volatile memory on each node for instant recovery after a power cycle, bypassing the autoconfiguration stage. An obsolete or incorrect per-node configuration should not affect the rest of the network. This ensures that a vehicular network with named topics will perform identically to a fully statically configured one until its configuration is changed.
- Scalability beyond a thousand nodes and topics per network.
- The solution should be implementable in under 1k lines of C, without dynamic memory or undue computing costs for small nodes.
Stretch goals:
- Support subscriptions with wildcard topic name matching / name substitution. See https://forum.opencyphal.org/t/rfc-add-array-of-ports/1878
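The non-volatile persistence requirement can be illustrated with a minimal sketch, assuming the stored record holds the node-ID plus one (name hash, subject-ID) pair per topic, sealed with a CRC-32. The record layout, `PERSIST_MAGIC`, and all function names are hypothetical and not part of cy. A record that fails validation is simply discarded and the node falls back to autoconfiguration; presumably a valid record should also only seed the initial state rather than override what the network agrees on, so that a stale record cannot disturb other nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PERSIST_MAGIC      0x43594346u  /* hypothetical record tag */
#define PERSIST_MAX_TOPICS 8u
#define SUBJECT_ID_LIMIT   8191u

struct persisted_topic { uint64_t name_hash; uint16_t subject_id; };

struct persisted_config {
    uint32_t magic;
    uint16_t node_id;
    uint16_t topic_count;
    struct persisted_topic topics[PERSIST_MAX_TOPICS];
    uint32_t crc;  /* CRC-32 over the fields above (padding bytes excluded) */
};

/* Bitwise CRC-32 (IEEE, reflected, poly 0xEDB88320); chainable like zlib's crc32(). */
static uint32_t crc32_update(uint32_t crc, const void* data, size_t size)
{
    const uint8_t* p = (const uint8_t*)data;
    crc = ~crc;
    for (size_t i = 0; i < size; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Hash field by field so that struct padding never enters the checksum. */
static uint32_t config_crc(const struct persisted_config* c)
{
    uint32_t crc = 0;
    crc = crc32_update(crc, &c->magic, sizeof c->magic);
    crc = crc32_update(crc, &c->node_id, sizeof c->node_id);
    crc = crc32_update(crc, &c->topic_count, sizeof c->topic_count);
    for (uint16_t i = 0; (i < c->topic_count) && (i < PERSIST_MAX_TOPICS); i++) {
        crc = crc32_update(crc, &c->topics[i].name_hash, sizeof c->topics[i].name_hash);
        crc = crc32_update(crc, &c->topics[i].subject_id, sizeof c->topics[i].subject_id);
    }
    return crc;
}

/* Call right before writing the record to non-volatile memory. */
void config_seal(struct persisted_config* c)
{
    assert(c->topic_count <= PERSIST_MAX_TOPICS);
    c->crc = config_crc(c);
}

/* Call after reading the record back; on false, ignore it and autoconfigure from scratch. */
bool config_is_valid(const struct persisted_config* c)
{
    if ((c->magic != PERSIST_MAGIC) || (c->topic_count > PERSIST_MAX_TOPICS)) {
        return false;
    }
    for (uint16_t i = 0; i < c->topic_count; i++) {
        if (c->topics[i].subject_id > SUBJECT_ID_LIMIT) {
            return false;
        }
    }
    return c->crc == config_crc(c);
}
```

Any corruption or truncation of the record makes `config_is_valid()` return false, so the worst outcome of a bad record is a slower start, not a misconfigured network.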
The PoC is implemented as a very compact C library in less than 1k SLoC, which I call `cy`. The library is transport-agnostic and thus requires glue logic to bind it to the specific transport library and the platform-specific code underneath in a user-friendly way, so I made another 500-SLoC library specifically for Cyphal/UDP on POSIX, which I call `cy_udp` (it should actually be renamed to `cy_udp_posix`). The actual user-facing product is therefore `cy_udp`, while `cy` is its core. There should be a `cy_can` as well.
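The split between the transport-agnostic core and the transport glue can be pictured as the core calling through a table of function pointers that the glue layer fills in. This is a hypothetical sketch: `struct transport_vtable`, `core_publish()`, `core_subscribe()`, and the mock transport are illustrative names and do not reflect the actual interface between `cy` and `cy_udp`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operations the transport-agnostic core needs from the glue layer. */
struct transport_vtable {
    int (*publish)(void* ctx, uint16_t subject_id, const void* data, size_t size);
    int (*subscribe)(void* ctx, uint16_t subject_id, size_t extent);
};

struct core {
    const struct transport_vtable* vt;
    void* ctx;  /* glue-layer state: sockets on POSIX, a CAN driver on baremetal, etc. */
};

/* The core never touches the transport directly; it only forwards through the table. */
static int core_publish(struct core* self, uint16_t subject_id, const void* data, size_t size)
{
    assert((self != NULL) && (self->vt != NULL));
    return self->vt->publish(self->ctx, subject_id, data, size);
}

static int core_subscribe(struct core* self, uint16_t subject_id, size_t extent)
{
    assert((self != NULL) && (self->vt != NULL));
    return self->vt->subscribe(self->ctx, subject_id, extent);
}

/* A mock "glue layer" that only records what the core asked it to do. */
struct mock_ctx { unsigned published; uint16_t last_subject; };

static int mock_publish(void* ctx, uint16_t subject_id, const void* data, size_t size)
{
    (void)data;
    (void)size;
    struct mock_ctx* m = ctx;
    m->published++;
    m->last_subject = subject_id;
    return 0;
}

static int mock_subscribe(void* ctx, uint16_t subject_id, size_t extent)
{
    (void)ctx;
    (void)subject_id;
    (void)extent;
    return 0;
}

static const struct transport_vtable mock_vtable = { mock_publish, mock_subscribe };
```

With this shape, a `cy_can`-style package would only need to provide a different table and context, leaving the core untouched.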
Please look at the PoC, read how it works, and perhaps test it locally here:
Distributed consensus protocols are very hard to get right. It is not impossible that there are major design flaws in my solution, so I would very much welcome an in-depth review and criticism.
Impact on the project
Compatibility with existing nodes is generally not affected. A named-topic-capable node can interact with an old node provided that the topic is pinned, meaning that its name is the subject-ID; e.g., `/1234`. The subject-ID range [0, 6144) is now dedicated to named topic allocation, while the remaining range [6144, 8192) is reserved for fixed subject-IDs and pinned topics. It is possible to pin topics anywhere, but it is not recommended because there exist edge cases where it may cause allocation collisions; more on this in the README.
Libudpard/libcanard/libserard will require a slight extension of their APIs to add optional support for node-ID autoconfiguration and topic allocation collision detection. The overall impact is about 100 SLoC per library. The current PoC works with vanilla libudpard but there is a risk of data misinterpretation; more on this in the README.
The heartbeat message is replaced with a new one, tentatively called `cyphal.node.Heartbeat`. It is obviously wire-compatible with the old heartbeat, but it adds more data for the distributed consensus algorithm, which piggybacks on the heartbeat message for simplicity. The heartbeat rate should also be increased to about 10 Hz; although this is not strictly required for the protocol to function, it speeds up the initial configuration stage.
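Sending the heartbeat at a constant rate (10 Hz, i.e., every 100 ms) without accumulating drift can be done by advancing the deadline by whole periods from its original phase, rather than rescheduling relative to whenever the event loop happened to run. A minimal sketch; `struct periodic` and `periodic_due()` are hypothetical helpers, not part of cy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HEARTBEAT_PERIOD_us 100000  /* 10 Hz */

struct periodic {
    int64_t next_us;    /* next deadline */
    int64_t period_us;
};

/* Returns true if a heartbeat is due at now_us. Deadlines stay on a fixed grid, so
 * event-loop jitter does not turn into drift; missed slots are skipped rather than
 * sent in a burst, which keeps the long-term rate constant. */
static bool periodic_due(struct periodic* self, int64_t now_us)
{
    assert(self->period_us > 0);
    if (now_us < self->next_us) {
        return false;
    }
    const int64_t missed = (now_us - self->next_us) / self->period_us;
    self->next_us += (missed + 1) * self->period_us;
    return true;
}
```

Calling `periodic_due()` once per `cy_udp_spin_once()` iteration would be enough, as long as the loop wakes up at least once per period.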
One undesirable but hard-to-avoid side effect is that every node that wishes to participate in the named topic protocol must process heartbeat messages from all other nodes. In a network with a large number of participants, this may be burdensome for small MCUs. I attempted to simplify the heartbeat processing pipeline as much as possible. Right now it amounts to just deserializing the message and searching two binary trees, each holding one entry per local topic. If a reallocation is needed, a few more tree traversals are added. I am not yet certain how it will scale, but I am cautiously optimistic.
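The per-heartbeat work described above boils down to two logarithmic lookups. The sketch below assumes that each heartbeat announces a (topic name hash, subject-ID) pair and that a node keeps one index sorted by hash and another sorted by subject-ID. A sorted array with `bsearch()` stands in for the binary trees for brevity (same O(log n) lookup, but without cheap insertion); the verdict names and their interpretation are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct topic { uint64_t hash; uint16_t subject_id; };

enum verdict {
    VERDICT_IRRELEVANT,  /* unknown topic on a subject-ID we do not use */
    VERDICT_CONSISTENT,  /* same topic, same subject-ID */
    VERDICT_DIVERGENT,   /* same topic, different subject-ID: needs consensus */
    VERDICT_COLLISION,   /* different topic on one of our subject-IDs */
};

static int cmp_hash(const void* key, const void* elem)
{
    const uint64_t k = *(const uint64_t*)key;
    const uint64_t h = (*(const struct topic* const*)elem)->hash;
    return (k > h) - (k < h);
}

static int cmp_sid(const void* key, const void* elem)
{
    const uint16_t k = *(const uint16_t*)key;
    const uint16_t s = (*(const struct topic* const*)elem)->subject_id;
    return (k > s) - (k < s);
}

/* by_hash and by_sid are arrays of n pointers to the same local topics,
 * sorted by hash and by subject-ID respectively. */
static enum verdict on_gossip(const struct topic* const* by_hash, const struct topic* const* by_sid,
                              size_t n, uint64_t hash, uint16_t sid)
{
    assert(((by_hash != NULL) && (by_sid != NULL)) || (n == 0));
    const struct topic* const* a = bsearch(&hash, by_hash, n, sizeof *by_hash, cmp_hash);
    if (a != NULL) {
        return ((*a)->subject_id == sid) ? VERDICT_CONSISTENT : VERDICT_DIVERGENT;
    }
    const struct topic* const* b = bsearch(&sid, by_sid, n, sizeof *by_sid, cmp_sid);
    return (b != NULL) ? VERDICT_COLLISION : VERDICT_IRRELEVANT;
}
```

For a node with 64 local topics, each lookup takes at most about 6 comparisons, so the per-heartbeat cost grows only logarithmically with the local topic count.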
Next steps
If this first PoC does not uncover any major design flaws, the next step would be to add support for RPC endpoints, which is much easier because it does not require consensus.
Then we could focus on building compact wrapper libraries with a very simple API that combine Cy and one of the libxxxards into atomic packages for various systems and protocols: Cyphal/CAN for SocketCAN, Cyphal/CAN for baremetal environments, Cyphal/UDP for POSIX (which is my `cy_udp`), Cyphal/UDP for baremetal, etc. The goal here is to significantly simplify the API and lower the entry barrier.
TL;DR
```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
// ...plus the cy_udp library header.

// SET UP LOCAL NODE:
struct cy_udp_t cy_udp;
cy_err_t res = cy_udp_new(&cy_udp,
                          local_unique_id,     // 64-bit composed of VID+PID+IID
                          "/my_namespace",     // topic name prefix (defaults to "/")
                          (uint32_t[3]){ udp_parse_iface_address("127.0.0.1") },
                          CY_NODE_ID_INVALID,  // will self-allocate
                          1000);               // tx queue capacity per interface
if (res < 0) { ... }

// JOIN A TOPIC (to publish and/or subscribe).
// To interface with an old node that does not support named topics, put the subject-ID into the topic name;
// e.g., `/1234`. This will bypass the automatic subject-ID allocation and pin the topic as specified.
struct cy_udp_topic_t my_topic;
res = cy_udp_topic_new(&cy_udp,
                       &my_topic,
                       "my_topic",  // expands into "/my_namespace/my_topic"
                       NULL);
if (res < 0) { ... }

// SUBSCRIBE TO THE TOPIC (nothing needs to be done if we only want to publish):
struct cy_subscription_t my_subscription;
res = cy_udp_subscribe(&my_topic,
                       &my_subscription,
                       1024 * 1024,                        // extent (max message size)
                       CY_TRANSFER_ID_TIMEOUT_DEFAULT_us,  // going to remove this
                       on_message_received_callback);
if (res < 0) { ... }

// SPIN THE EVENT LOOP
while (true) {
    const cy_err_t err_spin = cy_udp_spin_once(&cy_udp);
    if (err_spin < 0) { ... }
    // PUBLISH MESSAGES (unlike subscription, no extra setup is needed).
    // Optionally, check whether the local node has a node-ID. If it was not given explicitly at startup,
    // it will appear automatically within a few seconds. If a collision is discovered,
    // it will briefly disappear and reappear a few seconds later.
    if (cy_has_node_id(&cy_udp.base)) {
        // `now` is the current time in microseconds, obtained elsewhere (not shown).
        char msg[256];
        snprintf(msg, sizeof(msg), "I am %016llx. time=%lld us", (unsigned long long)cy_udp.base.uid, (long long)now);
        const struct cy_payload_t payload = { .data = msg, .size = strlen(msg) };
        const cy_err_t pub_res = cy_udp_publish(&my_topic, now + 100000, payload);  // now + 100 ms
        if (pub_res < 0) { ... }
    }
}
```
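The node-ID self-allocation mentioned in the loop comments could work roughly like this: pick a pseudo-random candidate seeded from the 64-bit unique-ID, wait through a quiet period before using it, and start over if a heartbeat from a node with a different unique-ID claims the same node-ID. This is a hypothetical sketch with invented names and timing (`CLAIM_QUIET_us`, `NODE_ID_NONE`, the xorshift generator); in it both colliding nodes give up the ID and re-pick, which matches the "disappear and reappear" behavior described in the comments, but the actual implementation may resolve collisions differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NODE_ID_NONE   0xFFFFu   /* hypothetical "no node-ID yet" marker */
#define CLAIM_QUIET_us 3000000   /* hypothetical: listen 3 s before using a candidate */

struct node_id_alloc {
    uint64_t uid;          /* 64-bit unique-ID of the local node */
    uint64_t rng;          /* xorshift64 state, seeded from the UID */
    uint16_t max_node_id;  /* e.g. 127 on Cyphal/CAN */
    uint16_t candidate;
    int64_t  claim_at_us;  /* the candidate becomes usable at this time */
};

static uint64_t xorshift64(uint64_t* state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

static void pick_candidate(struct node_id_alloc* self, int64_t now_us)
{
    self->candidate = (uint16_t)(xorshift64(&self->rng) % ((uint64_t)self->max_node_id + 1u));
    self->claim_at_us = now_us + CLAIM_QUIET_us;
}

void node_id_alloc_init(struct node_id_alloc* self, uint64_t uid, uint16_t max_node_id, int64_t now_us)
{
    assert(max_node_id < NODE_ID_NONE);
    self->uid = uid;
    self->rng = (uid != 0) ? uid : 1u;  /* xorshift state must be nonzero */
    self->max_node_id = max_node_id;
    pick_candidate(self, now_us);
}

/* The usable node-ID, or NODE_ID_NONE while the candidate is still being vetted. */
uint16_t node_id_get(const struct node_id_alloc* self, int64_t now_us)
{
    return (now_us >= self->claim_at_us) ? self->candidate : NODE_ID_NONE;
}

/* Feed every received heartbeat here; a foreign claim on our candidate restarts the process. */
void node_id_on_heartbeat(struct node_id_alloc* self, uint16_t src_node_id, uint64_t src_uid, int64_t now_us)
{
    if ((src_node_id == self->candidate) && (src_uid != self->uid)) {
        pick_candidate(self, now_us);
    }
}
```

Seeding from the unique-ID makes different nodes likely to start from different candidates without any coordination.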
Related posts
Now, building a ROS 2 middleware based on Cyphal is entirely within reach. Here is an interesting article on the subject: ROS 2 Over Email: rmw_email, an Actual Working RMW Implementation – Christophe Bédard. Do we have any volunteers?