RFC: Early preview of Cyphal v1.1

For a little over a month, I’ve been working toward bringing my PoC implementation of what I’d like to eventually call Cyphal v1.1 to a functional state where it can be shown to others. Today, it is ready to be cautiously looked at:

I named this thread “RFC” but it is not a conventional RFC — there is no design doc to look at; instead, this time I decided to straight-up code it to get a better feel for it. In this case that is a more efficient approach than drafting an RFC, because I got real-time feedback on my design decisions and could change course with zero delay, and because the resulting codebase is very compact and simple. At some later point we will either formalize this as a proper RFC or (more likely) directly submit a changeset to the Specification.

Those unfamiliar can catch up by skimming through this topic where this project was first announced half a year ago:

Other related discussions:

Quick summary

The updated design provides higher-level abstractions to applications at a low complexity cost. The main selling point is that instead of numerical subject-IDs we now use conventional topic names. There is still an option to use numerical IDs if necessary, especially for compatibility with v1.0 nodes — more on this later.

I added a few more fields to the Heartbeat type so that it can serve as a gossip message for exchanging CRDT state between nodes. This is used to reach consensus on how to assign unique subject-IDs to topic names. A TLA+ model for the formal verification of the protocol is included.
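
To make the gossip mechanism concrete, here is a purely illustrative sketch of what a per-topic gossip record and its merge rule could look like. None of these names or fields are taken from the actual Cy sources, and the real payload layout may differ entirely; a last-write-wins register per topic is simply one minimal CRDT that fits the described use case:

#include <stdint.h>

typedef struct
{
    uint64_t topic_hash;  // identifies the topic name being gossiped (illustrative)
    uint32_t subject_id;  // the subject-ID this node currently believes is allocated (illustrative)
    uint32_t version;     // monotonic counter; the higher version wins on merge (illustrative)
} gossip_record_t;

// LWW merge: adopt the remote record if it is newer; break ties deterministically
// so that all nodes converge to the same allocation regardless of message order.
static void gossip_merge(gossip_record_t* local, const gossip_record_t* remote)
{
    if ((remote->version > local->version) ||
        ((remote->version == local->version) && (remote->subject_id > local->subject_id)))
    {
        *local = *remote;
    }
}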

An API is provided for pattern subscriptions – an essential feature of the protocol inspired by RFC: add array of ports. This allows an application to subscribe to a topic using patterns like ins/?/data, which will collect data from topics like ins/foo/data and ins/bar/data. For each received message, the application is informed which exact topic was matched, and which name substitutions had to be made (foo and bar in this example).
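
For illustration, a pattern subscription could look like the sketch below. All of the declarations here are invented for this example and do not necessarily match the actual Cy API; the point is only to show the shape of a pattern-matching callback that receives the matched topic and the wildcard substitutions:

#include <stdio.h>

// Hypothetical declarations for this sketch only; the real Cy API may differ.
typedef struct cy cy_t;
typedef struct cy_message cy_message_t;
extern const char* cy_message_topic_name(const cy_message_t* msg);                  // e.g. "ins/foo/data"
extern const char* cy_message_substitution(const cy_message_t* msg, unsigned idx);  // e.g. "foo"
extern void cy_subscribe_pattern(cy_t* cy, const char* pattern, void (*cb)(const cy_message_t*));

static void on_ins_data(const cy_message_t* msg)
{
    printf("matched topic %s, wildcard substitution %s\n",
           cy_message_topic_name(msg),
           cy_message_substitution(msg, 0));
}

// During initialization: cy_subscribe_pattern(cy, "ins/?/data", on_ins_data);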

RPC as a separate transport-layer entity has been removed from v1.1. Instead, we allow subscribers to send a direct peer-to-peer response to any message. Any topic can be both a conventional pub/sub link and an RPC endpoint.

Reliable delivery is supported at the transport layer. At the moment, “reliable” means that the published message is retried until at least one acknowledgement is received or a deadline is reached, and the application is notified of the outcome. There are plans to amend this by adding discovery of the active subscriber set, approximating the logic of DDS, which is a relatively simple change to the transport library (an estimated ~100 SLoC).
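
As a sketch, the application-facing side could resemble the following; the publish helper here is hypothetical and modeled on the cy_respond_reliable() call shown further down this thread:

cy_future_t* fut = cy_publish_reliable(topic, deadline, message);  // hypothetical helper
// ...later, once the event loop has been spun...
if (cy_future_status(fut) == cy_future_success) {
    // At least one subscriber acknowledged the message before the deadline.
} else if (cy_future_status(fut) == cy_future_failure) {
    // The deadline passed without a single acknowledgement.
}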

Backward compatibility

The solution is fully wire-compatible with Cyphal/CAN v1.0 through so-called “pinned topics”, where one can pub/sub on a specially named topic of the form /#01ab that will always map to the same subject-ID encoded as hex in its name (0x01ab = 427 in this example). This allows full interoperability with old devices that are unable to participate in the new topic allocation protocol. The old RPCs have been removed but old devices can continue using RPCs between themselves – these interactions are invisible to Cyphal v1.1.
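
The mapping from a pinned topic name to its subject-ID is a plain hex conversion; a minimal illustrative helper (not part of any library) might look like this:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Returns the subject-ID encoded in a pinned topic name, or -1 if the name is not pinned.
static int32_t pinned_subject_id(const char* topic)
{
    if (strncmp(topic, "/#", 2) != 0) {
        return -1;
    }
    return (int32_t)strtol(topic + 2, NULL, 16);  // "/#01ab" -> 0x01AB == 427
}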

Wire compatibility with the experimental Cyphal/UDP v1.0 could not be preserved, but both versions can share the same network. The Cyphal/UDP stack has seen major changes toward simplification that make it impractical to try to support both. The updated proposal can be found on the experimental branch of the libudpard repo, and the new header format can be found in specification issue 143. It is worth mentioning that the redesigned libudpard API is smaller and much more ergonomic, while also supporting occasionally useful features such as message ordering recovery (if messages arrive like 1 3 2, the old implementation would only accept 1 and 2, while the new one will wait for a configurable reordering window for 3 to show up, so that the application sees 1 2 3).
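
To illustrate only the reordering-window behavior (this is not the libudpard implementation), a deliberately simplified acceptance rule with a single open gap could look like this:

#include <stdbool.h>
#include <stdint.h>

#define REORDER_WINDOW_USEC 10000ULL  // illustrative window duration

typedef struct
{
    uint64_t next_expected;  // next transfer-ID to release to the application
    uint64_t gap_deadline;   // when to stop waiting for the missing transfer
    bool     gap_open;
} reorder_state_t;

// Returns true if the transfer may be delivered to the application now.
static bool reorder_accept(reorder_state_t* st, uint64_t transfer_id, uint64_t now_usec)
{
    if (transfer_id == st->next_expected) {  // in order: deliver immediately
        st->next_expected++;
        st->gap_open = false;
        return true;
    }
    if (transfer_id > st->next_expected) {   // a gap: hold until the window expires
        if (!st->gap_open) {
            st->gap_open     = true;
            st->gap_deadline = now_usec + REORDER_WINDOW_USEC;
        }
        if (now_usec >= st->gap_deadline) {  // give up on the missing transfer
            st->next_expected = transfer_id + 1;
            st->gap_open      = false;
            return true;
        }
        return false;  // keep it buffered; the caller re-offers it later
    }
    return false;  // stale or duplicate transfer: drop
}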

Current status and next steps

The updated libudpard is robust and well-tested. It is not being rolled out yet because, without the higher-level parts of the stack, it is not very useful on its own.

The Cy library that sits on top of it is poorly tested, lacks deinitialization routines, and is overall not yet stable. It is my intention to cautiously apply it in a few experimental or low-criticality systems to collect empirical feedback, which is to be used later to guide our next steps toward Cyphal v1.1.

Our very own @laktoosivaba is working on Rust bindings on top of Cy. We would very much welcome native support for Cyphal v1.1 in canadensis eventually (wink @samcrow), but for now it is easier to work with a single codebase than to maintain two separate implementations. In retrospect, I am slightly regretful of my decision to code Cy in C, but it is too late to turn back now.

I am going to personally focus on upgrading libcanard to support v1.1 while retaining compatibility with v1.0 in the same library revision.

Zooming out, I would like to slightly pivot Cyphal v1.1 toward being a more general-purpose real-time embedded-friendly pub/sub framework, without explicitly focusing on any particular kind of application within that domain. @scottdixon and I have discussed this on a few occasions in the past and I believe there is a clear consensus here. One of the immediate practical outcomes of this decision is that we will remove all standard DSDL types except for two:

  • cyphal.Heartbeat, which is wire-compatible with the original uavcan.node.Heartbeat.
  • Potentially cyphal.CRUD, which defines a very basic set of create/read/update/delete operations on named entities inside a node (such as files, parameters, etc.).

Cyphal v1.1 will focus only on providing a simple and robust pub/sub layer, with everything else built on top by third parties.

PyCyphal hasn’t been touched yet but there is an issue that outlines the changes I intend for its v2 release, which will support both Cyphal v1.1 and v1.0: PyCyphal v2 roadmap · Issue #351 · OpenCyphal/pycyphal · GitHub. One of the key changes is the removal of all application-level features, in line with the overall direction of the project.

One longer-term objective of great interest is to build a new ROS middleware on top of Cyphal v1.1; looking for volunteers.

Call to action

Please grab Cy with libudpard, play with it, and report your findings here. Out of the box it can only run on GNU/Linux, but if you make it run elsewhere, please submit patches. You should not be surprised if it segfaults, leaks memory, or explodes. The new libudpard, by contrast, is expected to be robust already and should run anywhere.

If you found this interesting, it is best to attend the next bi-weekly call, scheduled for next Friday, Jan 16, where this project will be discussed.

Can the “response” feature be expanded to handle multiple responses, like a gRPC stream? For example:

request →
← working 10%
← working 50%
← working 99%
← response: success

Yes, quite trivially. P2P responses are sent in a transport-specific way.

In Cyphal/UDP, responses are unicast back to the sender that delivered the original message with a fixed 24-byte header where we encode the topic hash and the transfer-ID of the original message that we are responding to. Currently we only send a single response, but nothing in the protocol design prohibits us from sending an arbitrary number of them. It is purely an API design issue (a rather simple one).
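
Only the presence of the topic hash and the original transfer-ID is stated above; the remaining fields, widths, and ordering in the sketch below are assumptions for illustration (the authoritative format is in specification issue 143 and the libudpard sources):

#include <stdint.h>

// One *possible* view of the fixed 24-byte P2P header; only topic_hash and
// transfer_id are from the description above, everything else is assumed.
typedef struct
{
    uint8_t  version;      // header format version (assumed)
    uint8_t  kind;         // e.g. P2P_KIND_ACK or P2P_KIND_RESPONSE (assumed placement)
    uint16_t reserved_a;   // reserved/padding (assumed)
    uint32_t reserved_b;   // reserved/padding (assumed)
    uint64_t topic_hash;   // hash of the topic the original message was published on
    uint64_t transfer_id;  // transfer-ID of the original message being responded to
} p2p_header_t;            // 1 + 1 + 2 + 4 + 8 + 8 = 24 bytes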

In Cyphal/CAN, which I am still working on right now, responses are sent as the old RPC service response transfers using service-ID 511 (unused in Cyphal v1.0) with a 7-byte prefix encoding the topic hash and the transfer-ID of the original message (like in UDP, except that the transfer-IDs are only 5 bits wide, and we only send 48 most significant bits of the topic hash to fit into a single Classic CAN frame). Again, it is a no-brainer to send more than one response, but at the moment cy.c removes the state associated with the pending response once the first one is received.
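
As a sketch of packing the 7-byte prefix described above (the byte order chosen here is an assumption for illustration):

#include <stdint.h>

// Packs the 48 most significant bits of the topic hash (6 bytes) followed by the
// 5-bit transfer-ID of the original message: 6 + 1 = 7 bytes, so the prefix fits
// into a single Classic CAN frame.
static void pack_p2p_prefix(uint8_t out[7], uint64_t topic_hash, uint8_t transfer_id)
{
    for (unsigned i = 0; i < 6; i++) {
        out[i] = (uint8_t)(topic_hash >> (56U - (8U * i)));  // 48 MSBs of the hash
    }
    out[6] = transfer_id & 0x1FU;  // transfer-IDs are only 5 bits wide on CAN
}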


UPDATE: I decided to amend my response with code references. This is where we accept a P2P transfer and extract the transfer-ID and the topic hash from the header; there are two kinds of P2P transfers — delivery acks and RPC responses:

If the transfer is of kind P2P_KIND_RESPONSE (as opposed to P2P_KIND_ACK), we send the result to the application via self->vtable->on_message(). It is forwarded through the thin glue layer called cy_udp_posix and lands in this protocol-agnostic handler cy_on_response(), where we do an AVL tree lookup for the matching pending response state:

You can see that we set the result and finalize the future:

future_cancel_and_notify(&fut->base); // future invalidated

So what needs to be done to support streaming is just to not finalize the future.
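
A hypothetical sketch of that change inside cy_on_response(), where the streaming flag and the non-finalizing notify call are invented names:

fut->base.result = response;               // hand the payload to the application
if (fut->streaming) {                      // hypothetical flag
    future_notify(&fut->base);             // hypothetical: notify but keep the future alive
} else {
    future_cancel_and_notify(&fut->base);  // current behavior: future invalidated
}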

Is this the start of a Cyphal Streaming protocol? :thinking:

I think it’s an extension that can be added very easily at any point without even altering the wire format. If you have a specific practical use case where it could be immediately useful, we can add it right away, otherwise I would prefer to wait because I don’t have a real use case for this and hence it will be harder for me to design it sensibly.

video over Cyphal UDP with flow control

request (start stream bitrate xxx) →
← data xxx
← data xxx
← data xxx
update (change stream bitrate yyy) →
← data yyy
← data yyy
← data yyy
← stall
← stall

(etc, etc, etc: all the audio and video “stuff”)

But this is getting close to 802.1CB and 802.1Q-2022 stream handles so I’m not sure how the layering works between something like Cyphal P2P and SRP.

Yes, TSN is interesting. I just updated my earlier post with code references for clarity.

Just last Saturday, @laktoosivaba and I (though my participation was very marginal) prototyped LAN video multicast streaming using Cy. It was pure pub/sub without P2P, but it was nevertheless an interesting test, considering that we were pushing 11.6 Mbps using large multiframe messages. It is almost entirely vibe-coded though :grimacing:

https://github.com/OpenCyphal-Garage/cy-vid/pull/2/changes

I prototyped and merged a very simple implementation of P2P streaming as discussed above. This is more of a side project related to Cyphal v1.1, mostly done to prove that the existing protocol design can accommodate this feature without significant issues.

Please look at the README and also check out examples/main_udp_streaming_client.c and examples/main_udp_streaming_server.c.

We provide two QoS levels: best-effort and reliable. RPC responses, just like any other transfer, can be of either kind.

Best-effort responses do not provide the server with any deliverability feedback. While this is fine where a finite number of responses is required (ordinary RPC with a single response being the limit case), with streaming there needs to be a way to tell the server to terminate the stream. With a stateful connection-oriented protocol this comes naturally but statefulness is something that we explicitly want to avoid in Cyphal. With best-effort responses we pretty much have to tell the server when to stop streaming, and periodically re-request it to prevent the stream from timing out. That is quite straightforward.
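
On the client side, this amounts to a periodic refresh loop; the publish helper, message names, and period below are illustrative only, not the Cy API:

// Hypothetical client-side loop: keep refreshing the stream request so it does
// not time out on the server, then terminate it explicitly.
while (!done) {
    cy_publish(topic, &start_or_refresh_request);
    sleep_ms(500);  // refresh well before the server-side stream timeout
}
cy_publish(topic, &stop_request);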

Reliable responses are more interesting:

cy_future_t* future = cy_respond_reliable(breadcrumb, deadline, message);
// Some time later...
if (cy_future_status(future) == cy_future_success) {
    // Message delivered, the client is still alive. We can send another one.
} else if (cy_future_status(future) == cy_future_failure) {
    // Client did not acknowledge the message. Stop the stream.
}

One problem with this implementation, currently, is that message reliability operates at the transport layer. This makes sense for many reasons that I don’t want to discuss right now; let’s just say it’s a pretty typical solution. The transport layer will confirm the reception of a response before the higher layer is able to check if there is an active future for it, misleading the server into thinking that the stream is still being listened to. This is a software design issue rather than a protocol design issue, and not a major one at that, so for now I am going to set it aside; end-to-end ACKs can be retrofitted at a later stage.