Cyphal vs. DroneCAN

pavel.kirienko · January 10, 2023, 6:17pm

Cyphal has been conceived as an improved successor to a highly successful experimental protocol called UAVCAN v0, also known as DroneCAN. This article briefly outlines the differences between Cyphal and its predecessor and provides a rationale for why new designs should use Cyphal.

If any concepts covered here seem unfamiliar, consulting the Cyphal Guide may help.

Complex decentralized systems

Cyphal is well-suited for the construction of complex decentralized real-time networks. Any Cyphal network is chiefly defined by the publish/subscribe topics that the participants of this network use to exchange data in real time. Each topic has a data type. Any node can publish or subscribe to any topic as long as it uses a compatible data type. These principles constitute a faithful implementation of the standard data-centric publish-subscribe (DCPS) architecture, adapted for use in deterministic high-integrity vehicular systems.

The syntax of a data type is identified by its name, like uavcan.si.sample.temperature.Scalar.1.0. The semantics are identified by an integer topic identifier. The two are orthogonal except for certain special use cases, which one can think of as singletons in an OOP program or as well-known UDP/TCP port numbers.

DroneCAN is incompatible with the DCPS pattern and unsuitable for complex networks because data types must be attached to a topic when defined. It follows that, in order to construct a new node or a data type, the designer has to have a holistic view of the system it will be used in, which isn’t feasible aside from trivial scenarios.

To illustrate how Cyphal allows system designers to implement the DCPS pattern, suppose there is a need to render a certain temperature available on the network. To achieve this in Cyphal, one could come up with a straightforward data type definition like:

float32 temperature  # [kelvin]
@sealed

Or one could just use one of the standard definitions under the uavcan.si.* namespace (the standard namespace is still named uavcan to avoid breaking compatibility with existing applications; otherwise, it would have been cyphal). This solution is architecturally clean as it does not require the designer to introduce concepts that do not pertain to the problem at hand, such as how to route the data through the network or how many different temperature measurements may need to be present on the network; Cyphal takes care of that automatically.

In DroneCAN, it is not possible to define abstract interfaces like the one above because each data type has to be bound to a fixed topic at the time of the data type definition. Hence, the designer would have to amend the definition with some additional information that will be used by the application to filter out the required information from the message stream:

float32 temperature
uint8 temperature_id

This approach has multiple flaws that may be obvious to someone experienced with DCPS systems. First, the problem of routing the data through the network to its intended consumers has now shifted from the protocol itself to the application because it is no longer possible to express the publisher-consumer relation at the protocol layer (like it is done in Cyphal via topic identifiers); instead, it is done using a separate field in the definition. The application would then contain lines of code like:

def handle_reading(msg):
    if msg.temperature_id not in (10, 11, 12, 13):
        return   # This is not the reading you are looking for.
    # Process the reading...

Second, the definition is polluted with an entity unrelated to the business logic, namely the temperature_id field. While it may not seem like a big deal in this stripped-down example, it becomes a problem in complex definitions or when data types have to be composed with each other.

Third, the designer has to be aware of the possible composition of all other sources of temperature data on the network; this knowledge is expressed through the temperature_id field. What if uint8 is not wide enough? Should it be expanded at the cost of the extra overhead? Or, perhaps, temperature_id needs to be removed completely, and then we require that there shall be at most one source of temperature data per node on the network? But then, what if the node has to publish more than one reading? If we keep temperature_id, should a temperature sensor be identified by said ID alone or by a tuple of its node-ID plus the temperature-ID? Soon it becomes evident that in DroneCAN, it is difficult or often impossible to decompose the network into isolated components (network services); and vice versa — to compose complex systems from basic components — due to the logical coupling between the transport and the application layers imposed through the hard linkage between data types and topics.

Fourth, as all data types have to have unique topic identifiers, independent vendors/teams defining their own data types are bound to run into identifier conflicts. Indeed, the DroneCAN ecosystem constantly runs the risk of data type identifier conflicts, but even this is a comparatively less pressing problem than the fundamental incompatibility of the protocol with well-architected DCPS interfaces.
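The conflict risk can be made concrete with a small sketch. The vendor names, type names, and numeric identifiers below are hypothetical and chosen only for illustration; the point is that two catalogs authored independently have no way to know which fixed identifiers the other one has already taken.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical vendor catalogs: data type name -> fixed identifier baked into the definition.
vendor_a: Dict[str, int] = {
    "vendor_a.TemperatureReading": 20100,
    "vendor_a.PumpStatus": 20101,
}
vendor_b: Dict[str, int] = {
    "vendor_b.FuelLevel": 20100,  # Chosen independently; happens to collide with vendor A.
}


def find_conflicts(*catalogs: Dict[str, int]) -> Dict[int, List[str]]:
    """Return every fixed identifier that is claimed by more than one data type."""
    users: Dict[int, List[str]] = defaultdict(list)
    for catalog in catalogs:
        for type_name, fixed_id in catalog.items():
            users[fixed_id].append(type_name)
    return {fixed_id: names for fixed_id, names in users.items() if len(names) > 1}


conflicts = find_conflicts(vendor_a, vendor_b)
print(conflicts)
```

When the identifier is a deployment-time setting rather than a property of the type, a collision like this is resolved by the integrator changing one number, without touching either vendor's definitions.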

Cyphal solves these problems by enforcing separation between the transport layer and the application layer. The transport layer facilitates DCPS; its job is to deliver the data from the source to the destination using whatever means necessary without disturbing other nodes. The application layer is concerned only with the business logic, not with routing decisions. This enables the construction of complex and robust systems while imposing little cognitive load on the designer.
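To make the separation tangible, here is a minimal in-process sketch. It is not a Cyphal implementation: ToyBus, the Temperature class, and the topic ID values are all hypothetical stand-ins. What it shows is that the message carries only the measurement, while the choice of who receives it is made by topic ID outside the application handler.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Temperature:
    """Logically pure payload: no instance or routing fields."""
    kelvin: float


Handler = Callable[[Temperature], None]


class ToyBus:
    """Hypothetical in-process stand-in for the transport layer."""

    def __init__(self) -> None:
        self._subscribers: Dict[int, List[Handler]] = {}

    def subscribe(self, topic_id: int, handler: Handler) -> None:
        self._subscribers.setdefault(topic_id, []).append(handler)

    def publish(self, topic_id: int, message: Temperature) -> None:
        # Routing is decided here, by topic ID alone; handlers never filter.
        for handler in self._subscribers.get(topic_id, []):
            handler(message)


# Arbitrary illustrative values that an integrator might assign.
COOLANT_TEMPERATURE_TOPIC = 2001
BATTERY_TEMPERATURE_TOPIC = 2002

bus = ToyBus()
coolant_readings: List[float] = []
bus.subscribe(COOLANT_TEMPERATURE_TOPIC, lambda m: coolant_readings.append(m.kelvin))

bus.publish(COOLANT_TEMPERATURE_TOPIC, Temperature(kelvin=350.0))
bus.publish(BATTERY_TEMPERATURE_TOPIC, Temperature(kelvin=298.0))  # Never reaches the coolant handler.
print(coolant_readings)
```

Compare this with the handle_reading() snippet earlier: the handler here contains no identifier checks at all, and adding a third temperature source requires a new topic ID, not a change to the data type.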

More on how to design good Cyphal-based systems can be found in the Cyphal Guide.

More practical examples

Groups of motors on a VTOL

There is one vendor of a quadplane VTOL that once said: “We have two groups of motors in our design: one drives the tractor propellers in the plane mode, the other is used in the hover mode only. We need to run one at a slow rate and the other at a high rate. How do we implement that?”

They were using DroneCAN. The answer given to them was roughly like this: “You don’t – DroneCAN does not support that because this use case was not envisioned when the standard was designed. Define your own data types to fix it.”

The design broke because DroneCAN is too rigid. We don’t want anyone to repeat these mistakes in the future. To see how this use case can be addressed in Cyphal, look at the UDRAL actuator service definition, which manages it properly:

The setpoint message does not have a fixed topic identifier, allowing one to implement an arbitrary number of motor groups.
The interface to motor drives and servos is largely unified. Commonalities are easy to extract when following sensible design practices.
The feedback from actuators is published in logically pure messages which do not contain any means of instance segregation — this job is delegated to the broker (the networking stack) as it should be. We don’t have a “motor status” message with everything in it; instead, we have a message for kinematics, electric power, and so on. This is also invaluable for bandwidth-limited networks where the possibility to disable or throttle unnecessary publications is critical.
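As a rough illustration of the first point, the sketch below models two motor groups, each with its own setpoint topic and publication rate. The group names, topic IDs, and rates are assumptions made up for this example (the vendor's question does not say which group runs faster), and MotorGroup is not a UDRAL type.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MotorGroup:
    name: str
    setpoint_topic_id: int  # Assigned by the integrator; illustrative values only.
    rate_hz: int            # Assumed to divide 1000 evenly to keep the arithmetic simple.


def publication_schedule(groups: List[MotorGroup], duration_ms: int) -> List[Tuple[int, int]]:
    """Return sorted (time_ms, topic_id) pairs for every setpoint publication in the window."""
    events: List[Tuple[int, int]] = []
    for group in groups:
        period_ms = 1000 // group.rate_hz
        events.extend((t, group.setpoint_topic_id) for t in range(0, duration_ms, period_ms))
    return sorted(events)


groups = [
    MotorGroup(name="hover", setpoint_topic_id=1100, rate_hz=200),
    MotorGroup(name="tractor", setpoint_topic_id=1101, rate_hz=50),
]
schedule = publication_schedule(groups, duration_ms=100)
for topic_id in (1100, 1101):
    count = sum(1 for _, topic in schedule if topic == topic_id)
    print(f"topic {topic_id}: {count} setpoints in 100 ms")
```

Adding a third group is a matter of appending another entry with its own topic ID; neither the setpoint data type nor the existing groups change.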

Internal combustion engine controls

In DroneCAN, there is an interface for reporting the status of the fuel injection system of an internal combustion engine. The interface aggregates a large number of loosely coupled parameters into one large message, which is a violation of the interface segregation principle and a major design problem in itself, but let’s skip this part for now. Another major problem is that the interface lacks any controls for starting/stopping the engine and commanding its power setting.

To fix the problem, a DroneCAN-based system will have to either abuse the actuator messages (which is a common workaround) or define ad-hoc special-purpose messages. A Cyphal system would not have this problem because the idiomatic way to accept a command is to subscribe to a highly generic type like uavcan.si.unit... or uavcan.primitive..., which don’t necessitate allocation of a fixed ID. New types do need to be designed occasionally, but by virtue of being architecturally pure, they are much more reusable, shielding the end-user of the protocol from problems that arise when the interface designer fails to envision a specific use case.

Idiomatic Cyphal would also split the old fuel injection status message into a group of smaller messages: dynamic/kinematic states (torque, angular velocity, etc.), states of the electrical system (voltage, current, etc.), and so on. Reliance on highly generic types widens the scope of the standard.

Safety switch button

Take any off-the-shelf DroneCAN GNSS receiver. Currently, its usage in applications that do not involve traditional small UAVs is hindered by the rigidity of the DroneCAN interfaces. Let’s focus on the seemingly benign feature: the safety switch button, which is seen often in such devices.

In DroneCAN, there is a dedicated message type that may be leveraged to publish the state of this button as the global system arming state. But the button is such a basic UI feature; surely one could reuse it for something different?

In Cyphal, it is trivially implementable by publishing uavcan.primitive.scalar.Bit. Any subscriber can be configured to use this button for any purpose by merely setting its topic ID accordingly. In DroneCAN, the button can only be used for those applications that were envisioned by the author of the interface.

Forget the button. What if one has several subsystems that must be armed/disarmed selectively, such as in more complex vehicles? In Cyphal, this can be addressed by extracting relevant concerns into a separate network service which can be instantiated as necessary (e.g., see the standard Readiness service in UDRAL). For example, one instance can be configured to control the avionics while a dedicated one arms the propulsion system. In DroneCAN, none of these options are available because the designer of that service failed to think about more complicated usage scenarios; fixing that requires introducing new data types that will have to co-exist with the legacy ones.
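The idea of instantiating one service several times can be sketched as follows. This is a simplified boolean stand-in and not the actual UDRAL Readiness definition; ArmingService, the topic IDs, and the routing table are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ArmingService:
    """Reusable component that knows nothing about which subsystem it belongs to."""
    label: str
    armed: bool = False

    def handle(self, engage: bool) -> None:
        self.armed = engage


avionics = ArmingService(label="avionics")
propulsion = ArmingService(label="propulsion")

# The integrator binds each instance to its own topic; the IDs are illustrative.
routes: Dict[int, Callable[[bool], None]] = {
    3000: avionics.handle,
    3001: propulsion.handle,
}


def deliver(topic_id: int, engage: bool) -> None:
    """Stand-in for the transport layer dispatching a received message by topic ID."""
    handler = routes.get(topic_id)
    if handler is not None:
        handler(engage)


deliver(3000, True)  # Arm the avionics only.
print(avionics.armed, propulsion.armed)
```

The same routing table is also how a plain button published as a bit could be repurposed: point a different consumer at the button's topic ID and nothing else needs to change.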

Rigorous formal specification

Cyphal has a rigorous and complete formal specification, a necessary precondition for deploying the protocol in safety-critical systems. The specification offered by DroneCAN is incomplete — certain aspects are left uncovered, which introduces risks related to safety and wire compatibility.

Transport-agnosticism

DroneCAN only works over CAN and CAN FD, while Cyphal supports multiple transports: CAN (FD), UDP, TCP, serial port.

For the benefit of safety-critical systems, Cyphal supports heterogeneous transport redundancy, where multiple physical transports with different failure modes (e.g., CAN and a wireless link) facilitate one aggregate data link such that the failure of any of the physical transports will not affect connectivity at the application level.
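A heavily simplified sketch of the receiving side may help here. It assumes that copies of the same transfer arriving over different transports can be matched by the pair of source node-ID and transfer-ID; real implementations also have to handle transfer-ID wraparound and timeouts, which this toy RedundantReceiver (a hypothetical class) ignores.

```python
from typing import List, Set, Tuple


class RedundantReceiver:
    """Toy deduplicator: accepts the first copy of each transfer from any transport."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[int, int]] = set()
        self.delivered: List[Tuple[str, bytes]] = []

    def on_transfer(self, transport: str, source_node_id: int, transfer_id: int, payload: bytes) -> bool:
        """Return True if the transfer was passed to the application, False if it was a duplicate."""
        key = (source_node_id, transfer_id)
        if key in self._seen:
            return False  # The same transfer already arrived via another transport.
        self._seen.add(key)
        self.delivered.append((transport, payload))
        return True


rx = RedundantReceiver()
results = [
    rx.on_transfer("can", 42, 0, b"a"),
    rx.on_transfer("wireless", 42, 0, b"a"),  # Duplicate of the transfer above.
    rx.on_transfer("wireless", 42, 1, b"b"),  # CAN copy lost; the wireless copy still gets through.
]
print(results, rx.delivered)
```

The application sees each transfer exactly once and does not need to know which physical link delivered it.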

Software ecosystem

Cyphal offers a much stronger software ecosystem that includes MISRA-compliant, verified implementations suitable for hard real-time high-integrity applications, as well as quality implementations in high-level languages suitable for tooling and automation. None of this is available in the DroneCAN ecosystem; most of the existing DroneCAN implementations are poorly tested, feature awkward APIs, and are unsuitable for high-integrity systems.

The highlights of the Cyphal software ecosystem are:

Libcanard — reference implementation of Cyphal/CAN in C.
Libudpard — reference implementation of Cyphal/UDP in C.
PyCyphal — implementation in Python for HMI and automation supporting all transport layers.
Kocherga — multi-transport, highly robust bootloader for high-integrity systems that supports both Cyphal and DroneCAN.
Canadensis — multi-transport implementation of Cyphal in Rust supporting Cyphal/UDP, Cyphal/CAN, et al.
107-Arduino-Cyphal — an easy-to-use implementation for the Arduino platform.
Nunavut — a multi-language transpiler for DSDL (Cyphal’s interface definition language).
Application examples

Cyphal offers better tooling as well! There is a powerful command line utility called Yakut.

Also, there is Yukon, a feature-rich GUI for diagnostics and management of Cyphal networks (it also provides minimal support for DroneCAN firmware upgrade).

Common misconceptions

Fixed data type identifiers used in DroneCAN are robust. In Cyphal, they are substituted with configuration parameters that can be easily changed. We can’t dispatch blobs of bytes to a subsystem in the flight controller based on what amounts to a hint.

This is only true as long as you treat topic IDs as a hint rather than a robust system-defining parameter. There are safety-critical production systems all over the world running diverse protocol stacks ranging from CANopen (with variable/dynamic PDO) and CANKingdom up to DDS and your typical message queues like MQTT. They do just fine without fixed predefined identifiers because they manage configuration properly.

Another example is in aviation, where DO-178C defines a concept of “Parameter Data Items” as a core part of software systems. PDIs are managed alongside software binaries (termed Executable Object Code, EOC) using the same verification and certification processes.

The topic ID space is very large; surely we can tolerate a few fixed topic IDs without ever running out of them?

This idea comes from a misunderstanding of what the topics (aka subjects) are for. The problem is not that we run out of identifiers; the problem is that fixed identifiers break the network architecture, as explained earlier.

All that process of configuring topics just to avoid having a type identifier in the transfers? Why?

Merely specifying the type is not sufficient as it communicates no information about the meaning of data. Say, if the type is vendor.geometry.Pose, what pose is this? We seem to require an identifier of semantics rather than type, which is what the topic identifiers (aka subject identifiers) are for.

As a vendor of avionics, my concern is that Cyphal recommends that vendors provide a way to reconfigure their devices to allow non-fixed topic identifiers. This seems complex and something that a user of my devices doesn’t care about at all.

Vendors are interested in upholding this new approach because it expands the set of viable applications of their products and, thus, the addressable market.

If you manufacture, say, a BMS, you will delegate to the integrator to decide where and how it should be integrated into the system. As a vendor, you would be unable to decide for the user which particular role the battery is going to be serving in the system: is it a traction battery? Which one of several? Is it powering the payload? Every participant of the ecosystem is interested in ensuring that the resulting systems are scalable, evolvable, and flexible, even if the methods of ensuring these properties may seem non-obvious at first.

The view that the approaches implemented in Cyphal are more complex than the traditional alternatives is incorrect and arises from a misunderstanding of how intravehicular networks are composed of COTS components. Whichever strategy is chosen, the hardware vendor cannot dictate the place and function of the hardware unit within the system (excepting a few marginal cases) – at least some configurability is necessary. Keeping this in mind, you can see that Cyphal’s approach does not add complexity or variance to COTS units, but rather it reorganizes the existing variabilities in a different way. Continuing the BMS example, the traditional approach would assume that the BMS has a configuration parameter that sets the instance-identifier for this device (which one of the several batteries onboard would that be?); the architecturally clean approach that we have implemented in Cyphal prescribes removing the instance-identifier and configuring the topic ID instead. A convenient analogy is the topic name from ROS, DDS, MQ*, etc.

Are you saying that the topic identifiers will need to be assigned dynamically?

No. They will be assigned statically, but they will be assigned by the integrator rather than the vendor. This means that the vendor will have to make them configurable. This is not the same as “dynamic” because they won’t be changing while the system is operational.

Is there a well-known fixed service that can be used to assign system-specific topic identifiers? Almost like how TCP port 80 is used to distribute out random unused ports to actually serve data during a transfer.

Yes, it’s called the Register API. It’s on a fixed ID (just like many other standard services).
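A sketch of what static assignment by the integrator could look like is given below. ToyNode is a hypothetical in-memory stand-in, not a client of the real Register API. The register name follows the uavcan.pub.PORT_NAME.id pattern, and the values 65535 (unset) and 8191 (upper bound) are written from memory of the standard conventions, so treat them as assumptions and check them against the specification.

```python
from typing import Dict

UNSET_PORT_ID = 65535   # Assumed "not configured" value.
MAX_SUBJECT_ID = 8191   # Assumed upper bound of the subject-ID range.


class ToyNode:
    """Hypothetical node that keeps its port IDs in named registers."""

    def __init__(self, registers: Dict[str, int]) -> None:
        self.registers = dict(registers)

    def write_register(self, name: str, value: int) -> None:
        if name not in self.registers:
            raise KeyError(f"No such register: {name}")
        if not 0 <= value <= MAX_SUBJECT_ID:
            raise ValueError(f"Subject-ID out of range: {value}")
        self.registers[name] = value


# As shipped by the vendor: the port exists but is not bound to any topic yet.
bms = ToyNode({"uavcan.pub.battery_status.id": UNSET_PORT_ID})

# Decided once by the integrator at design time and applied during commissioning.
integrator_config = {"uavcan.pub.battery_status.id": 4010}
for register_name, subject_id in integrator_config.items():
    bms.write_register(register_name, subject_id)

print(bms.registers)
```

Nothing in this flow happens while the vehicle is operating: the configuration is a fixed artifact of the system design, stored on the node like any other persistent setting.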

My network is running out of bandwidth with DroneCAN. I cannot switch to Cyphal/CAN because its bandwidth utilization is even higher.

Cyphal/CAN does not use more bandwidth than DroneCAN. A properly architected and configured Cyphal network is, in fact, likely to exhibit a lower network bandwidth utilization than a comparable DroneCAN deployment. This is because Cyphal uses more granular network interfaces composed of smaller messages. The system integrator can selectively enable only those topics that are necessary and, optionally, configure the publication rates and priorities per topic. The bandwidth management tools provided by DroneCAN are comparatively limited to nonexistent, which normally leads to suboptimal network utilization. There are helper spreadsheets that one can use to estimate the bandwidth utilization of a Cyphal/CAN network: Cyphal/CAN over CAN FD, Classic CAN; plus there is one practical example.
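For a quick back-of-the-envelope estimate without the spreadsheets, a sketch like the one below can be used. It makes several assumptions: Classic CAN with 29-bit identifiers, single-frame transfers only (up to 7 payload bytes plus one tail byte per frame), and the commonly quoted worst-case bit-stuffing formula from the CAN schedulability literature. The topic names, payload sizes, and rates are invented for illustration and are not taken from any real service definition.

```python
from typing import List, Tuple


def worst_case_frame_bits(data_bytes: int) -> int:
    """Worst-case length of a Classic CAN extended-ID data frame in bits,
    including interframe space and the maximum number of stuff bits."""
    if not 0 <= data_bytes <= 8:
        raise ValueError("Classic CAN frames carry 0 to 8 data bytes")
    unstuffed = 67 + 8 * data_bytes
    stuff_bits = (54 + 8 * data_bytes - 1) // 4
    return unstuffed + stuff_bits


def bus_utilization(topics: List[Tuple[str, int, float]], bitrate: int) -> float:
    """Estimate utilization for single-frame topics given as (name, payload_bytes, rate_hz)."""
    bits_per_second = 0.0
    for name, payload_bytes, rate_hz in topics:
        if payload_bytes > 7:
            raise ValueError(f"{name}: multi-frame transfers are outside this sketch")
        # One extra data byte per frame is taken by the tail byte.
        bits_per_second += worst_case_frame_bits(payload_bytes + 1) * rate_hz
    return bits_per_second / bitrate


all_topics = [("kinematics", 7, 100.0), ("power", 4, 10.0), ("temperature", 4, 1.0)]
essential_topics = [t for t in all_topics if t[0] != "power"]  # Integrator disables one topic.

full = bus_utilization(all_topics, bitrate=1_000_000)
reduced = bus_utilization(essential_topics, bitrate=1_000_000)
print(f"all topics: {full:.2%}, without power: {reduced:.2%}")
```

The useful part is less the absolute numbers than the shape of the calculation: because each topic is a separate line item, the integrator can see what disabling or slowing down a given publication buys.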

A DroneCAN network is easier to inspect thanks to the rigid coupling between topics and data types. Have you ever used the Wireshark protocol analyzer? Have you tried opening it on a random network and having a look? You’ll see that it can parse the data flowing on the network. It can do this because there is sufficient information going over the network to know what is going on. With the model that Cyphal is pushing, the network analyzer can’t do that. It can’t look at the packets and know how to parse them.

This Wireshark example is incorrect because general-purpose packet analyzers are useless with application-layer protocols. Wireshark lets you debug USB or DDS transfers, but it cannot offer anything at the higher layers that these protocols are designed to serve (e.g., UVC streams or ROS topics). Speaking of Cyphal in particular, Wireshark is able to discern message/service transfers and segregate them by topics, but looking into them is above the layer of abstraction it is designed to manage. Just like you can’t substitute, say, rostopic echo with pcap.

To analyze an intravehicular network at the application layer, one has to know the data type and function of each topic. These relations are a core part of the vehicle’s design, just like the software that runs on its nodes or the data type definitions they use to communicate. When analyzing a live network, though, the developer doesn’t need to provide such knowledge externally since it is trivially extractable from the nodes using the built-in introspection capabilities in Cyphal (this is implemented in Yukon and Yakut).
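The kind of introspection described above can be approximated with a short sketch. It assumes that each node exposes registers named uavcan.pub.PORT_NAME.id and uavcan.pub.PORT_NAME.type, and it works on hypothetical register dumps held in plain dictionaries. It is not a description of how Yukon or Yakut are implemented.

```python
from typing import Dict, List, Tuple

PREFIX = "uavcan.pub."
ID_SUFFIX = ".id"
TYPE_SUFFIX = ".type"

RegisterDump = Dict[str, object]


def build_topic_map(dumps: Dict[str, RegisterDump]) -> Dict[int, Tuple[str, List[str]]]:
    """Map each topic ID to (data type name, list of publishing node names)."""
    topics: Dict[int, Tuple[str, List[str]]] = {}
    for node_name, registers in dumps.items():
        for name, value in registers.items():
            if not (name.startswith(PREFIX) and name.endswith(ID_SUFFIX)):
                continue
            port_name = name[len(PREFIX):-len(ID_SUFFIX)]
            data_type = str(registers.get(PREFIX + port_name + TYPE_SUFFIX, "unknown"))
            _, publishers = topics.setdefault(int(value), (data_type, []))
            publishers.append(node_name)
    return topics


# Hypothetical register dumps collected from two nodes on a live network.
dumps: Dict[str, RegisterDump] = {
    "bms": {
        "uavcan.pub.battery_status.id": 4010,
        "uavcan.pub.battery_status.type": "vendor.battery.Status.1.0",
    },
    "thermometer": {
        "uavcan.pub.temperature.id": 2001,
        "uavcan.pub.temperature.type": "uavcan.si.sample.temperature.Scalar.1.0",
        "vendor.some_setting": 7,  # Unrelated register; ignored by the mapper.
    },
}

topic_map = build_topic_map(dumps)
for topic_id, (data_type, publishers) in sorted(topic_map.items()):
    print(topic_id, data_type, publishers)
```

With such a map in hand, a tool knows which data type to use for decoding each topic it observes, without any out-of-band description of the vehicle.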

Even if you ignore the above considerations, the DroneCAN approach also depends on the prior knowledge of the vehicle configuration because it relies on instance identifier fields (see examples above) and node identifiers to specify the meaning of the data exchanged through the network. Merely observing, say, a GNSS solution or a voltage reading on the network is not sufficient because one has to know the origin of that data: is it the left-side or right-side GNSS receiver? What component is the voltage reading coming from? If there is only one data producer per node, what component does that node identifier belong to?