For a little over a month, I’ve been working toward bringing my PoC implementation of what I’d like to eventually call Cyphal v1.1 to a functional state where it can be shown to others. Today, it is ready to be cautiously looked at:
I named this thread “RFC” but it is not a conventional RFC — there is no design doc to look at; instead, this time I decided to straight-up code it to get a better feel of it. In this case it is a more efficient approach than drafting an RFC because I got to experience real-time feedback from my design decisions and change course accordingly with zero delay, and also because the resulting codebase is very compact and simple. At some later point we are either going to formalize this as a proper RFC, or (more likely) directly submit a changeset to the Specification.
Those unfamiliar can catch up by skimming through this topic where this project was first announced half a year ago:
Other related discussions:
- An exploratory study: UAVCAN as a middleware for ROS
- Using UAVCAN as a general purpose CAN(FD) Embedded DCPS Middleware?
Quick summary
The updated design provides higher-level abstractions to applications at a low complexity cost. The main selling point is that instead of numerical subject-IDs we now use conventional topic names. There is still an option to use numerical IDs if necessary, especially for compatibility with v1.0 nodes — more on this later.
I added a few more fields to the Heartbeat type to use it as a gossip message for exchanging CRDT state between nodes. This is used to find consensus on how to assign unique subject-IDs to topic names. A TLA+ model for the formal verification of the protocol is included.
An API is provided for pattern subscriptions – an essential feature of the protocol inspired by RFC: add array of ports. This allows an application to subscribe to a topic using patterns like ins/?/data, which will collect data from topics like ins/foo/data and ins/bar/data. For each received message, the application is informed which exact topic was matched, and which name substitutions had to be made (foo and bar in this example).
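To make the matching rule concrete, here is a minimal sketch in Python. It is not the Cy API: the function name `match_topic` is hypothetical, and it assumes that `?` matches exactly one name segment between slashes, which is all the example above shows. The real matcher may support other wildcards or different rules.

```python
from typing import Optional


def match_topic(pattern: str, topic: str) -> Optional[list[str]]:
    """Return the segments substituted for '?' wildcards, or None if no match.

    Assumption: '?' stands for exactly one slash-separated segment.
    """
    pattern_parts = pattern.strip("/").split("/")
    topic_parts = topic.strip("/").split("/")
    if len(pattern_parts) != len(topic_parts):
        return None
    substitutions = []
    for expected, actual in zip(pattern_parts, topic_parts):
        if expected == "?":
            substitutions.append(actual)  # remember what the wildcard stood for
        elif expected != actual:
            return None  # a literal segment differs
    return substitutions


for name in ("ins/foo/data", "ins/bar/data", "ins/foo/status"):
    print(name, "->", match_topic("ins/?/data", name))
```

A subscriber on `ins/?/data` would receive messages from the first two topics together with the substitution (`foo` or `bar`), while `ins/foo/status` would not match.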
RPC as a separate transport-layer entity has been removed from v1.1. Instead, we allow subscribers to send a direct peer-to-peer response to any message. Any topic can be both a conventional pub/sub link and an RPC endpoint.
Reliable delivery is supported at the transport layer. At the moment, “reliable” means that the published message is retried until at least one acknowledgement is received or a deadline is reached, and the application is notified of the outcome. There are plans to amend this by adding a discovery of the active subscriber set, approximating the logic of DDS, which is a relatively simple change to the transport library (ca. 100 SLoC estimated).
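The current "reliable" semantics can be sketched roughly as follows. This is an illustration in Python, not the libudpard or Cy implementation: `publish_reliable`, `send`, `wait_ack`, and `retry_interval` are hypothetical names, and the fixed retry interval is an assumption made for simplicity.

```python
import time
from typing import Callable


def publish_reliable(
    send: Callable[[], None],
    wait_ack: Callable[[float], bool],
    deadline: float,
    retry_interval: float = 0.05,
) -> bool:
    """Retransmit until one acknowledgement arrives or the deadline passes.

    send()            transmits the message once (hypothetical callback).
    wait_ack(timeout) blocks up to `timeout` seconds, True if an ack came in.
    deadline          absolute time on the time.monotonic() clock.
    Returns the outcome that would be reported to the application.
    """
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # deadline reached without any acknowledgement
        send()
        if wait_ack(min(retry_interval, remaining)):
            return True  # at least one subscriber acknowledged


# Demo with a fake link where the third transmission is acknowledged.
attempts = []
acks = iter([False, False, True])
delivered = publish_reliable(
    send=lambda: attempts.append(1),
    wait_ack=lambda timeout: next(acks),
    deadline=time.monotonic() + 5.0,
)
print(delivered, len(attempts))
```

Note that success here only means that some subscriber acknowledged the message; tracking the full set of active subscribers is the planned amendment mentioned above.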
Backward compatibility
The solution is fully wire-compatible with Cyphal/CAN v1.0 through so-called “pinned topics”, where one can pub/sub on a specially named topic of the form /#01ab that will always map to the same subject-ID encoded as hex in its name (0x01ab = 427 in this example). This allows full interoperability with old devices that are unable to participate in the new topic allocation protocol. The old RPCs have been removed but old devices can continue using RPCs between themselves – these interactions are invisible to Cyphal v1.1.
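As a rough illustration of the naming rule, a pinned topic name could be decoded like this. The sketch is in Python and the helper `pinned_subject_id` is hypothetical; the exact grammar (number of digits, letter case, range checks) is an assumption based only on the `/#01ab` example.

```python
import re
from typing import Optional

# Assumption: a pinned name is "/#" followed by lowercase hex digits only.
_PINNED = re.compile(r"^/#([0-9a-f]+)$")


def pinned_subject_id(topic: str) -> Optional[int]:
    """Return the subject-ID encoded in a pinned topic name, else None."""
    match = _PINNED.match(topic)
    return int(match.group(1), 16) if match else None


print(pinned_subject_id("/#01ab"))        # 427
print(pinned_subject_id("ins/foo/data"))  # None: an ordinary named topic
```

Ordinary names return `None` here because their subject-IDs come from the allocation protocol rather than from the name itself.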
Wire compatibility with the experimental Cyphal/UDP v1.0 could not be preserved, but both versions can share the same network. The Cyphal/UDP stack has seen major changes toward simplification that make it impractical to try and support both. The updated proposal can be found on the experimental branch of the libudpard repo, and the new header format can be found in specification issue 143. It is worth mentioning that the redesigned libudpard API is smaller and much more ergonomic, while also supporting occasionally useful features such as message ordering recovery (if messages arrive like 1 3 2, the old implementation would only accept 1 and 3, while the new one will wait for a configurable reordering window duration for 2 to show up, such that the application sees 1 2 3).
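The ordering recovery idea can be sketched as a small reordering buffer. This is a simplified Python illustration, not the libudpard algorithm: the class `Reorderer`, its methods, and the use of plain sequence numbers with explicit timestamps are all assumptions made to keep the example deterministic.

```python
class Reorderer:
    """Deliver messages in sequence order, waiting up to `window` seconds
    for a missing message before skipping past the gap."""

    def __init__(self, window: float, first_seq: int = 1):
        self.window = window
        self.next_seq = first_seq
        self.held = {}  # seq -> (payload, arrival time)

    def push(self, seq: int, payload, now: float) -> list:
        """Accept one message and return whatever is deliverable in order."""
        if seq >= self.next_seq:  # older messages are late, drop them
            self.held.setdefault(seq, (payload, now))
        return self.poll(now)

    def poll(self, now: float) -> list:
        """Release in-order messages; skip a gap once the window expires."""
        out = []
        while self.held:
            if self.next_seq in self.held:
                out.append(self.held.pop(self.next_seq)[0])
                self.next_seq += 1
                continue
            oldest = min(self.held)
            if now - self.held[oldest][1] < self.window:
                break  # keep waiting for the missing message
            self.next_seq = oldest  # window expired, give up on the gap
        return out


rx = Reorderer(window=0.1)
seen = []
seen += rx.push(1, "m1", now=0.00)
seen += rx.push(3, "m3", now=0.01)  # held back: message 2 is still missing
seen += rx.push(2, "m2", now=0.02)  # fills the gap, releases 2 and then 3
print(seen)
```

With the arrival order 1 3 2 the application sees the messages in the order 1 2 3. If message 2 never arrived, a later `poll()` after the window elapsed would release message 3 on its own.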
Current status and next steps
The updated libudpard is robust and well-tested. It is not ready to be rolled out because without the higher-level parts of the stack it is not very useful.
The Cy library that sits on top of it is poorly tested, lacks deinitialization routines, and is overall not yet stable. It is my intention to cautiously apply it in a few experimental or low-criticality systems to collect empirical feedback, which is to be used later to guide our next steps toward Cyphal v1.1.
Our very own @laktoosivaba is working on Rust bindings on top of Cy. We would very much welcome native support for Cyphal v1.1 in canadensis eventually (wink @samcrow), but for now it is easier to work with a single codebase than to maintain two separate implementations. In retrospect, I slightly regret my decision to code Cy in C, but it is too late to turn back on that now.
I am going to personally focus on upgrading libcanard to support v1.1 while retaining compatibility with v1.0 in the same library revision.
Zooming out, I would like to slightly pivot Cyphal v1.1 toward being a more general-purpose real-time embedded-friendly pub/sub framework, without explicitly focusing on any particular kind of application within that domain. We discussed this with @scottdixon on a few occasions in the past and I believe there is a clear consensus here. One of the immediate practical outcomes of this decision is that we will remove all standard DSDL types except for two:
- cyphal.Heartbeat, which is wire-compatible with the original uavcan.node.Heartbeat.
- Potentially cyphal.CRUD, which defines a very basic set of create/read/update/delete operations on named entities inside a node (such as files, parameters, etc).
Cyphal v1.1 will focus only on providing a simple and robust pub/sub layer, with everything else built on top by third parties.
PyCyphal hasn’t been touched yet but there is an issue that outlines the changes I intend for its v2 release, which will support both Cyphal v1.1 and v1.0: PyCyphal v2 roadmap · Issue #351 · OpenCyphal/pycyphal · GitHub. One of the key changes is the removal of all application-level features, in line with the overall direction of the project.
One longer-term objective of great interest is to build a new ROS middleware on top of Cyphal v1.1; looking for volunteers.
Call to action
Please grab Cy with libudpard, play with it, and report your findings here. Out of the box it can only run on GNU/Linux, but if you make it run elsewhere, please submit patches. You should not be surprised if it segfaults, leaks memory, or explodes. By contrast, the new libudpard is expected to be robust already and should run anywhere.
If you found this interesting, it is best to attend the next bi-weekly call, which is due next Friday, Jan 16, where this project will be discussed.