UAVCAN: a highly dependable publish-subscribe protocol for real-time intravehicular networking

pavel.kirienko · July 2, 2019, 6:24pm

This write-up is loosely based on the materials presented at the PX4 Dev Summit in Zurich on June 20th, 2019 by Pavel Kirienko (Zubax Robotics) and Scott Dixon (Amazon Prime Air).

What is UAVCAN?

The following diagram is designed to aid one’s understanding of the place of UAVCAN among the other communication protocols used in relevant industries. We define a two-dimensional space where one dimension represents the effort required to obtain a verified and validated deployment; and the other dimension represents the level of abstraction provided by the protocol for the application. Although this mapping is somewhat arbitrary, its purpose is to give a general overview of the relative positions of the communication protocols rather than provide an accurate representation of them.

The space is segregated into very broad categories pertaining to different application domains. One can see that the upper-left corner of the diagram represents the “holy grail” of safety-critical intravehicular networking. If such a protocol existed, it would provide rich abstractions but would not require complex failure mode analysis or other costly activities needed to ensure the safety of critical vehicular systems. Our objective is to push UAVCAN towards the upper-left corner; this objective is reflected in our core design goals (see the Specification for details).

The above principles are manifested in the simple and robust design. It favors statelessness and static configurations, focusing the protocol on field deployments and tangible products rather than research or experimental applications (one might say that this contrasts with historically research-oriented systems like ROS). UAVCAN has a formal specification document that is, at the time of writing, a mere 80 pages or so (not including the standard data type definitions). By our assessment, constructing a custom UAVCAN implementation requires a modest effort of approximately one man-month, which makes the protocol comparable to CANaerospace or CANopen in complexity. The availability of a formal specification sets the protocol apart from solutions intended for non-critical or research applications (such as ROS) and enables high-level requirement tracing and verification.

Simplicity extends beyond just ease of implementation. UAVCAN provides convenient and familiar high-level communication abstractions. These are:

stateless publish-subscribe, where “stateless” means the lack of any state sharing between the publisher and the subscriber; and
request-response (similar to RPC) interactions that model function calls executed on remote nodes.

The abstractions are designed to impose a very low semantic overhead (or, to use the terminology of some programming languages, they are “near zero-cost”), meaning that a customized solution designed for a particular application is unlikely to be more efficient than the generic abstractions built into UAVCAN. For example, a single-frame message publication in UAVCAN-over-CAN carries only one byte of payload overhead.
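To put the one-byte figure in perspective, the short sketch below computes how much application payload fits in a single frame. It assumes the usual maximum data field sizes of 8 bytes for Classic CAN and 64 bytes for CAN FD; the function name is made up for this illustration and does not come from any UAVCAN library.

```python
# One byte of each single-frame transfer is taken by the protocol (per the text above).
OVERHEAD_BYTES = 1


def single_frame_capacity(frame_data_bytes: int) -> int:
    """Application payload that fits in one frame with the given data field size."""
    if frame_data_bytes <= OVERHEAD_BYTES:
        raise ValueError("frame too small to carry any payload")
    return frame_data_bytes - OVERHEAD_BYTES


# Assumed data field sizes: 8 bytes (Classic CAN), 64 bytes (CAN FD).
for name, size in (("Classic CAN", 8), ("CAN FD", 64)):
    capacity = single_frame_capacity(size)
    print(f"{name}: {capacity} of {size} bytes usable ({capacity / size:.1%})")
```

Under these assumptions, a Classic CAN frame leaves 7 bytes for the application and a CAN FD frame leaves 63.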

A major part of UAVCAN that could further the ease-of-use argument is the schema definition language DSDL (Data Structure Description Language) which should be easily recognizable to anyone familiar with ROS:

#
# Generic human-readable text message for logging and displaying purposes.
# Generally it should be published at the lowest priority level.
#

# Optional timestamp in the network-synchronized time system; zero if undefined.
# The timestamp value conveys the exact moment when the reported event took place.
uavcan.time.SynchronizedTimestamp.1.0 timestamp

# Standard severity, 3 bit wide.
Severity.1.0 severity

# Message text.
# Normally, messages should be kept as short as possible, especially those of high severity.
void6
uint8[<=112] text

@assert offset % 8 == {0}
@assert offset.max <= (124 * 8)     # Two CAN FD frames max

The line uint8[<=112] text showcases an important design decision of the protocol and DSDL in particular: no arbitrary-size data structures are allowed. Whenever you define a variable-size entity, you have to define an upper bound on its size. This property, together with some other design decisions made at the transport layer, makes the protocol suitable for highly deterministic, real-time systems, because such systems can only be implemented using constant-complexity and/or well-characterized bounded-complexity algorithms.
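Because every field is bounded, the worst-case size of the message can be computed ahead of time, which is what the two @assert lines in the definition check. The sketch below reproduces that arithmetic in Python. Two things in it are assumptions rather than facts stated in this post: that the timestamp occupies 56 bits, and that the array length prefix is the narrowest unsigned field able to hold the values 0 to 112.

```python
# Field widths in bits. TIMESTAMP_BITS is an assumption about
# uavcan.time.SynchronizedTimestamp.1.0; the rest comes from the listing.
TIMESTAMP_BITS = 56
SEVERITY_BITS = 3      # "Standard severity, 3 bit wide."
PADDING_BITS = 6       # void6
TEXT_CAPACITY = 112    # uint8[<=112]
ELEMENT_BITS = 8       # each element is a uint8


def length_prefix_bits(capacity: int) -> int:
    """Assumed rule: the narrowest unsigned field able to hold 0..capacity."""
    return capacity.bit_length()


def possible_end_offsets() -> set:
    """Bit offset at the end of the message for every permitted text length."""
    fixed = (TIMESTAMP_BITS + SEVERITY_BITS + PADDING_BITS
             + length_prefix_bits(TEXT_CAPACITY))
    return {fixed + n * ELEMENT_BITS for n in range(TEXT_CAPACITY + 1)}


offsets = possible_end_offsets()
assert all(o % 8 == 0 for o in offsets)   # mirrors: @assert offset % 8 == {0}
assert max(offsets) <= 124 * 8            # mirrors: @assert offset.max <= (124 * 8)
print(max(offsets) // 8, "bytes in the worst case")
```

Under these assumptions the fixed part is 72 bits and the longest possible message is 121 bytes, which stays within the 124-byte budget named in the definition.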

Normally, a regular developer wishing to leverage UAVCAN in their application would not have to deal with the protocol at the implementation level, since we have high-quality reference implementations available that happen to be MIT-licensed (the MIT license is one of the most permissive OSS licenses out there and it is highly valued by the industry):

Implementation	Language	Transports
Libcanard	C99	CAN (FD)
Libuavcan	C++11	CAN (FD)
PyUAVCAN	Python	CAN (FD), UDP/IP, serial, … (extensible)
uavcan.rs	Rust	CAN (FD)

The simplicity and abstractions provided by UAVCAN make it a viable candidate for safety-critical systems. Being simple and equipped with a formal specification, the protocol can be implemented in about 1000 SLoC (see libcanard as an example), making its validation and verification a substantially less expensive effort than for some other solutions offering a comparable level of abstraction.

Failure mode analysis is simplified by the decentralized nature of UAVCAN where every node has equal rights on the bus (no master or other types of concern separation – every node can exchange data with any other node) and the nodes are not required to establish their presence before beginning their normal duties; that is, a node is able to begin its normal operation immediately upon powering up. High-assurance systems will benefit from the native support for redundant transports built into the transport layer: the protocol models redundant transports as a virtual aggregate link that sits on top of an arbitrary number of redundant transports; the associated data duplication and deduplication activities are abstracted away from the application.
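The deduplication side of the virtual aggregate link can be pictured with a toy model. The sketch below is hypothetical and is not the API of any UAVCAN library: it identifies each transfer by a (source node, transfer ID) pair, which is an assumption of this illustration, and delivers only the first copy to the application. A real implementation would also bound the history and handle transfer-ID wraparound.

```python
from typing import Callable, Set, Tuple


class RedundantLink:
    """Toy model of a virtual link that sits on top of several redundant transports."""

    def __init__(self, deliver: Callable[[int, bytes], None]) -> None:
        self._deliver = deliver
        self._seen: Set[Tuple[int, int]] = set()

    def on_receive(self, transport_index: int, source_node: int,
                   transfer_id: int, payload: bytes) -> None:
        # transport_index is kept only to show that copies arrive on
        # different transports; it plays no role in deduplication.
        key = (source_node, transfer_id)
        if key in self._seen:
            return  # duplicate from another transport: drop silently
        self._seen.add(key)
        self._deliver(source_node, payload)


received = []
link = RedundantLink(lambda node, data: received.append((node, data)))
link.on_receive(0, source_node=42, transfer_id=7, payload=b"hello")
link.on_receive(1, source_node=42, transfer_id=7, payload=b"hello")  # duplicate
link.on_receive(1, source_node=42, transfer_id=8, payload=b"world")
print(received)
```

The application callback sees each transfer once, regardless of how many transports carried it.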

In order to keep the protocol simple, certain design trade-offs have been made. The major one is that UAVCAN does not support inherently unreliable transports; in other words, every data frame must be likely to be deliverable to the recipient at least once under normal operating conditions (frame duplication is handled by the protocol). This requirement usually does not impose additional constraints on safety-critical vehicular deployments, since they use transports that provide the necessary reliability metrics at the link layer, often rendering unnecessary the advanced QoS/reliability features built into more advanced protocols such as DDS (features that are usually non-deterministic by nature due to the associated data recovery uncertainties). Currently, CAN 2.0 and CAN FD are supported as the first-class transport options, and there is ongoing research into making UAVCAN support UDP/IP, IEEE 802.15.4 (like WAIC), and raw serial links (such as UART, RS-485/232, or USB CDC ACM).

Concerning data transfers, one common misconception should be resolved. UAVCAN found a major use as a motor control protocol in unmanned vehicles, where low data latency is paramount. For some time there existed a misconception that UAVCAN is unable to meet the performance levels attainable with low-level specialized protocols such as DShot. Simple math and empirical evidence indicate otherwise – for a basic 1Mbps (8Mbps) CAN (FD) bus deployment one can arrive at the following performance figures:
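For the Classic CAN case, the "simple math" can be approximated with a back-of-the-envelope estimate of how long one frame occupies the bus. The sketch below uses the standard CAN 2.0B bit counts for a data frame with a 29-bit identifier (an assumption about the frame format, not a figure taken from the talk) and ignores arbitration delays and CAN FD bit-rate switching.

```python
def extended_frame_bits(data_bytes: int, worst_case_stuffing: bool = True) -> int:
    """Bits on the wire for a Classic CAN data frame with a 29-bit identifier.

    Includes the 3-bit interframe space. Standard CAN 2.0B figures,
    not numbers taken from the talk.
    """
    if not 0 <= data_bytes <= 8:
        raise ValueError("Classic CAN carries 0..8 data bytes")
    bits = 8 * data_bytes + 67  # 67 = fixed fields plus interframe space
    if worst_case_stuffing:
        # At most one stuff bit per four bits of the stuffable region.
        bits += (54 + 8 * data_bytes - 1) // 4
    return bits


def frame_time_us(data_bytes: int, bitrate_bps: int = 1_000_000) -> float:
    """Worst-case bus occupancy of one frame, in microseconds."""
    return extended_frame_bits(data_bytes) * 1e6 / bitrate_bps


print(frame_time_us(8), "microseconds for a full 8-byte frame at 1 Mbps")
```

With these assumptions a full frame takes at most 160 bit times, that is 160 microseconds at 1 Mbps, so a single-frame message can be delivered well within a millisecond on an otherwise idle bus.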

It’s been mentioned that UAVCAN can be implemented from scratch in a relatively short time-frame with a low number of SLoCs but another way to gauge its complexity is to look at its resource utilization. Consider the product known as NicaDrone OpenGrab EPMv3:

It is a very interesting device in its own right but in the context of this piece, the relevant fact is that it runs a complete UAVCAN node while being based on a highly resource-constrained MCU: an NXP LPC11C24 (ARM Cortex M0, 50 MHz, 8K RAM, 32K ROM).

Another similar example is the open source ESC Thiemar S2740VC. This device is also equipped with a UAVCAN node and a UAVCAN bootloader while being based on STM32F302 (ARM Cortex M4F at 72 MHz) with only 16K RAM and 64K ROM:

Generally, if we were to summarize the memory footprint of a typical application based on UAVCAN against other protocols, we would expect data points similar to those in the table below. The exact values are, of course, approximate and depend upon many variables such as the particular implementation, the complexity of the business logic, the microarchitecture, the transport, and many other such parameters.

Protocol	RAM	ROM	Topology
UAVCAN	8K	32K	Decentralized
CANaerospace	4K	8K	Decentralized
DDS/XRCE	32K	250K	Master node required
DDS/RTPS	100K	>>250K	Decentralized

The availability of tools and resources is a major focus area for the maintainers. As mentioned, we maintain high-quality MIT-licensed reference implementations in major programming languages (see uavcan · GitHub) which are reusable in closed-source commercial applications with no strings attached. Developers working with and learning UAVCAN will benefit from our graphical user tools that allow them to monitor, test, diagnose, and understand the UAVCAN bus. One such tool is UAVCAN GUI Tool (screenshots shown below) which will soon be replaced by the newer, more capable web-based tool called Yukon (front-end developers are welcome to join the effort!).

Ultimately, we would like the new graphical tool to provide a visual schematic network editor for the benefit of research and experimental applications, as demonstrated on the following mock-up:

Brief history of UAVCAN

What’s described above is the so-called UAVCAN v1, the first long-term stable version of the protocol. At the time of writing, UAVCAN v1 is so new that it is hardly used yet. Most existing systems use UAVCAN v0, which was the first experimental version of the protocol authored in 2015. As v1 is not backward-compatible with v0, we face a transition period that demands resources which adopters may find concerning. We must therefore explain why the transition is necessary.

The core idea behind UAVCAN v0 is that it is a research vessel. UAVCAN v0 is used in the field in real applications. It has brought us a wealth of empirical data and experience about how it is used and what constraints and bottlenecks exist. UAVCAN v1 has been built on top of that experience. Consider the following statistics sampled among the members of the Dronecode foundation in early 2018:

UAVCAN v0 was designed for the narrow domain of unmanned aerial vehicles. However, we observed that this domain shares the requirements for an adequate intravehicular communication solution with other domains within the metadomain of software-defined intelligent vehicles. Hence, we have changed the scope of the protocol by relinquishing the UAV domain to a third party, covering the requirements of other application domains by generalizing the provided communication abstractions, and supporting other communication mediums. One can visualize the transition graphically as follows; this is the scope of v0:

Upon transition to v1 the new state of affairs is as follows (the set of domains and protocols is for reference purposes only and is not exhaustive):

We propose establishing and maintaining domain-specific profiles where each domain would be managed by an inter-vendor non-profit body representing the interests of the relevant industries. For the domain of UAV, the Dronecode foundation seems to be in an ideal position to undertake the role of such domain-specific UAVCAN profile maintainer. Logistically, each domain-specific profile would be a DSDL root namespace containing relevant data types and specifications defining how the data types should be used to ensure cross-vendor compatibility and interoperability. Such explicit separation of concerns between the UAVCAN maintainers and domain-specific profile maintainers ensures that UAVCAN will remain sufficiently generic for unconventional, new, or cross-domain applications. At the same time, its domain-specific profiles will be built from the experience and expertise of companies operating in relevant industries. Upon closer inspection of the proposal, one can identify weak parallels with the concept of device profile defined by CANopen or device class defined by USB.

However, UAVCAN should remain usable even outside of the scope of a domain-specific profile, particularly as it is expected that a profile is likely to address only the most common use cases. This is why vendor- and application-specific data types are first-class citizens in the stable version of UAVCAN. The specification also defines a set of management policies that govern development, release, and retirement of the vendor-specific data types that form public interfaces; the policies are intended to ensure conflict-free interoperation and backward compatibility spanning across different versions and different domains of the protocol. Those interested are most welcome to familiarize themselves with the Specification.

UAVCAN v1 goes a step further in accommodating the needs of research, experimental, and low-demand deployments by providing two new collections of standard data types: uavcan.si (international system of units) and uavcan.primitive (arbitrary scalars and vectors). These definitions allow one to quickly draft UAVCAN-based applications without the need to define custom data types.

uavcan
├── primitive
│   ├── array
│   │   ├── Bit.1.0.uavcan
│   │   ├── Integer16.1.0.uavcan
│   │   ├── Integer32.1.0.uavcan
│   │   ├── Integer64.1.0.uavcan
│   │   ├── Integer8.1.0.uavcan
│   │   ├── Natural16.1.0.uavcan
│   │   ├── Natural32.1.0.uavcan
│   │   ├── Natural64.1.0.uavcan
│   │   ├── Natural8.1.0.uavcan
│   │   ├── Real16.1.0.uavcan
│   │   ├── Real32.1.0.uavcan
│   │   └── Real64.1.0.uavcan
│   ├── Empty.1.0.uavcan
│   ├── scalar
│   │   ├── Bit.1.0.uavcan
│   │   ├── Integer16.1.0.uavcan
│   │   ├── Integer32.1.0.uavcan
│   │   ├── Integer64.1.0.uavcan
│   │   ├── Integer8.1.0.uavcan
│   │   ├── Natural16.1.0.uavcan
│   │   ├── Natural32.1.0.uavcan
│   │   ├── Natural64.1.0.uavcan
│   │   ├── Natural8.1.0.uavcan
│   │   ├── Real16.1.0.uavcan
│   │   ├── Real32.1.0.uavcan
│   │   └── Real64.1.0.uavcan
│   ├── String.1.0.uavcan
│   └── Unstructured.1.0.uavcan
└── si
    ├── acceleration
    │   ├── Scalar.1.0.uavcan
    │   └── Vector3.1.0.uavcan
    ├── angle
    │   ├── Quaternion.1.0.uavcan
    │   └── Scalar.1.0.uavcan
    ├── angular_velocity
    │   ├── Scalar.1.0.uavcan
    │   └── Vector3.1.0.uavcan
    ├── duration
    │   ├── Scalar.1.0.uavcan
    │   └── WideScalar.1.0.uavcan
    ├── electric_charge
    │   └── Scalar.1.0.uavcan
    ├── electric_current
    │   └── Scalar.1.0.uavcan
    ├── energy
    │   └── Scalar.1.0.uavcan
    ├── length
    │   ├── Scalar.1.0.uavcan
    │   ├── Vector3.1.0.uavcan
    │   └── WideVector3.1.0.uavcan
    ├── magnetic_field_strength
    │   ├── Scalar.1.0.uavcan
    │   └── Vector3.1.0.uavcan
    ├── mass
    │   └── Scalar.1.0.uavcan
    ├── power
    │   └── Scalar.1.0.uavcan
    ├── pressure
    │   └── Scalar.1.0.uavcan
    ├── temperature
    │   └── Scalar.1.0.uavcan
    ├── velocity
    │   ├── Scalar.1.0.uavcan
    │   └── Vector3.1.0.uavcan
    ├── voltage
    │   └── Scalar.1.0.uavcan
    ├── volume
    │   └── Scalar.1.0.uavcan
    └── volumetric_flow_rate
        └── Scalar.1.0.uavcan

There are two other major changes that set v1 apart from v0: the introduction of data type versioning and the deprecation of the data type identifier. A detailed technical description of either would take far more space than the format of this overview permits, so we will limit ourselves to a brief summary. Those interested in the details can learn more by reading the Stockholm Summit recap.

Data type versioning enables one to introduce changes into released data types while ensuring backward compatibility and/or a well-defined migration path. It allows a complete vehicle system to evolve its parts incrementally and allows vendors to innovate without breaking compatibility with their existing install base. This is still a static system so one cannot magically make the existing base use a newer type, but one can allow devices that use the new type to interoperate with the existing devices. Historical trivia: extensive technical discussions around this feature spanned almost 1.5 years.
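The version is part of every type reference, as in uavcan.time.SynchronizedTimestamp.1.0 in the DSDL example earlier. The sketch below splits such a reference into its name and version numbers. The compatibility rule it applies (same name and same major version) is an assumption made for illustration, and version 1.1 of the type is hypothetical; the authoritative rules are in the Specification.

```python
from typing import NamedTuple


class VersionedType(NamedTuple):
    full_name: str
    major: int
    minor: int


def parse_type_reference(reference: str) -> VersionedType:
    """Split a reference such as 'uavcan.si.length.Scalar.1.0' into name and version."""
    name, major, minor = reference.rsplit(".", 2)
    return VersionedType(name, int(major), int(minor))


def is_compatible(a: VersionedType, b: VersionedType) -> bool:
    """Assumed rule for this sketch: same name and same major version."""
    return a.full_name == b.full_name and a.major == b.major


old = parse_type_reference("uavcan.si.length.Scalar.1.0")
new = parse_type_reference("uavcan.si.length.Scalar.1.1")  # hypothetical newer minor version
print(old, is_compatible(old, new))
```

A device built against the older definition and a device built against the newer one would interoperate under this rule, while a change of the major version would signal a break.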

Removal of the data type identifier allowed us to untangle data formats from their meaning. Such abstract description may sound odd, so picture it this way: in UAVCAN v0 you could not use the same data type for reporting the temperature of the tire surface and the brake mechanism because the same data type ID would render their messages indistinguishable. In UAVCAN v1, the same data type can be used for both, and the semantics of each reading would be reflected in the new concept of Subject ID.
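The tire and brake example can be modeled with a toy in-process bus. This is only an illustration of the idea: the class, the subject ID values, and the use of a bare float in kelvin as the shared data type are all assumptions of the sketch, not the API of any UAVCAN library.

```python
from typing import Callable, Dict, List

# Hypothetical subject IDs chosen for this illustration only.
TIRE_TEMPERATURE_SUBJECT = 1001
BRAKE_TEMPERATURE_SUBJECT = 1002


class ToyBus:
    """In-process stand-in for a bus that routes messages by subject ID."""

    def __init__(self) -> None:
        self._subscribers: Dict[int, List[Callable[[float], None]]] = {}

    def subscribe(self, subject_id: int, handler: Callable[[float], None]) -> None:
        self._subscribers.setdefault(subject_id, []).append(handler)

    def publish(self, subject_id: int, kelvin: float) -> None:
        # The data type (a temperature in kelvin) is the same for every
        # subject; only the subject ID tells the readings apart.
        for handler in self._subscribers.get(subject_id, []):
            handler(kelvin)


readings = {}
bus = ToyBus()
bus.subscribe(TIRE_TEMPERATURE_SUBJECT, lambda k: readings.__setitem__("tire", k))
bus.subscribe(BRAKE_TEMPERATURE_SUBJECT, lambda k: readings.__setitem__("brake", k))
bus.publish(TIRE_TEMPERATURE_SUBJECT, 330.0)
bus.publish(BRAKE_TEMPERATURE_SUBJECT, 620.0)
print(readings)
```

Both publishers use the same data format, yet each subscriber receives only the reading it asked for, because the meaning is carried by the subject ID rather than by the type.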

Integrating UAVCAN

This section describes UAVCAN from the standpoint of a large-scale adopter.

One of the most important characteristics of the protocol is that it is an open standard that can be used by anyone license-free without approval of any kind. This is never going to change. The specification is available to anyone (no membership required) and MIT-licensed reference implementations are hosted openly on GitHub.

An adopter can choose UAVCAN quickly without much involvement by legal departments and with no involvement by procurement departments. They can start this integration at the prototype stage and can continue to use UAVCAN as their prototype develops into a production system.

The UAVCAN team maintains open libraries, debug tools, documentation, and a public discussion forum, ensuring that the barriers to entry and the cost of adoption are minimal. Safety-critical systems are first-class targets, being supported by the ongoing work to allow system integrators to validate their integration and to generate documentation for certification.

DSDL enables a greater degree of isolation between subsystems without weakening type safety, which contrasts with historically monolithic designs of legacy CAN deployments. With DSDL, subsystems can be developed in parallel by a large engineering organization or by external vendors; the language encourages design of clear and well-defined interfaces. Because DSDL is a language-agnostic IDL, its adoption does not lock one into a specific language. Just to belabor the point: the figure on the left shows how CAN bus messaging is typically implemented without an intervening higher-layer protocol. When using UAVCAN and DSDL, the two firmwares can depend on interface descriptions that need only be compatible with each other (see versioning in specification):

UAVCAN comes with only a single required function (heartbeat message uavcan.node.Heartbeat must be published by all nodes) and a small list of standard application functions that are needed for most production systems. These are:

Heartbeat and uptime are required and are essential to a reliable vehicle system. Uptime is often overlooked but it can be the only indication that a sub-system has entered a continuous reset loop.
uavcan.node.GetInfo is essential for validating a vehicle’s configuration and forensic analysis of failures.
Diagnostics, data flow statistics, and the registers’ function all provide a degree of flexibility and introspection that can greatly accelerate prototyping, debugging, and tuning vehicle systems.
Plug-and-play may seem unnecessary at first but, for some systems, it quickly becomes essential. With plug-and-play capability, vehicles can use COTS parts without having to redefine an otherwise static system. This is critical for certified systems.
Node update over the data bus is essential for aerospace where weight is a critical factor. Often the sub-systems on these vehicles are ultra-compact and weatherized making it difficult to expose a debug port to allow direct firmware upload.
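The point about uptime and reset loops can be illustrated with a small monitor that watches the uptime reported in each node's heartbeat. The class and parameter names below are hypothetical and do not reproduce the fields of uavcan.node.Heartbeat exactly; the only idea taken from the text is that an uptime that goes backwards reveals a restart.

```python
from typing import Dict


class ResetDetector:
    """Counts restarts of each node from the uptime carried in its heartbeats."""

    def __init__(self) -> None:
        self._last_uptime: Dict[int, int] = {}
        self.restarts: Dict[int, int] = {}

    def on_heartbeat(self, node_id: int, uptime_seconds: int) -> bool:
        """Return True if this heartbeat reveals that the node has restarted."""
        previous = self._last_uptime.get(node_id)
        self._last_uptime[node_id] = uptime_seconds
        restarted = previous is not None and uptime_seconds < previous
        if restarted:
            self.restarts[node_id] = self.restarts.get(node_id, 0) + 1
        return restarted


detector = ResetDetector()
for uptime in (0, 1, 2, 0, 1, 0, 1):
    detector.on_heartbeat(node_id=7, uptime_seconds=uptime)
print(detector.restarts)
```

A node that keeps resetting shows up as a growing restart count even if it otherwise looks alive. A node that resets faster than it publishes heartbeats would always report the same uptime and would need a different check.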

Future of UAVCAN

Broadly speaking, we intend to continue pushing UAVCAN towards the upper-left corner of the matrix presented at the beginning of this publication. We are continuing to research new transport protocols and the limits of UAVCAN in modern and future vehicular systems, where legacy networking solutions are unable to meet the demands of new applications. At the same time, we remain committed to simplicity. UAVCAN should remain simple at its core and the barriers to adoption should continue to fall.

Certified safety-critical applications will remain our first-class targets. To this end, we are committed to maintaining high software quality standards and meeting the requirements of safety-critical system integrators.

Although just recently released, UAVCAN v1 is now the recommended choice for new projects. Existing deployments using UAVCAN v0 will benefit from our commitment to provide support until at least 2021.

Our longer-term goal for v1 is to commence formal standardization after the system has been deployed in the field with at least two major large-scale projects and a substantial number of cumulative operating hours. Interested parties can follow our progress along this path by monitoring the public forum.

In the future, we would like our adopters to be more public about their reliance on UAVCAN. Some of the largest technology companies use UAVCAN in their research projects and products, yet this largely remains unknown to the public. This may have somewhat adverse effects on the image of UAVCAN. Such undisclosed, invisible adoption is an expected consequence of the open nature of the project, yet we would like to restate that for the long-term success of UAVCAN, it is desirable for our users to come forward and acknowledge their reliance on the protocol publicly. Ideally, we would like to set up a public list of featured adopters on the front page; if your company is amenable to this, please contact us via this forum or send an email to uavcan-maintainers@googlegroups.com.

Last, but not least: many consider “UAVCAN” to mean “CAN for UAV”. Nowadays we recommend interpreting it as Uncomplicated Application-level Vehicular Communication And Networking.

Resources

pavel.kirienko · July 3, 2019, 11:11pm

Video of the Zurich talk: