First I should restate the idea I shared at the call. The old v0 (aka Andrew’s approach) builds the standard in a holistic way, such that the standard envisions the entire architecture of the vehicle. The v1 approach is focused on isolated and composable network services instead to manage complexity and allow reuse. I am not sure analogies can be of help here but perhaps you can conceptualize it roughly as the difference between a highly complex monolithic program versus idiomatic OOP.
In v1, the design of the vehicular system is finalized by the integrator that links various network services together to achieve the required behaviors. This approach is superior compared to the legacy but it is still being misunderstood. I will try another way to explain it, this time speaking in very practical & hands-on terms instead of abstract ideas.
Hands-on examples
Laser rangefinder
You can grab a COTS product like this and use it as an altimeter:
The underlying lidar is well-suited for any other task involving rangefinding, which is also stated by the manufacturer. Yet, its UAVCAN v0 interface stands in the way. The v0 approach would interface it with the help of a dedicated message type with a fixed-ID roughly like this:
uint8 lidar_id
float32 altitude
You can’t use that for measuring any distance other than AGL.
Okay, so what if you defined a more generic data type that carries just the raw range reading like so:
uint8 lidar_id
float32 range # not altitude but just range to whatever
That lets you remove the assumption about the direction the sensor is facing, cool. Now, suppose that on your (flying) robot there are multiple sensors and your application needs to read the data of select few sensors. You make a subscriber and go roughly like:
def handle_reading(msg):
if msg.lidar_id not in (10, 11, 12, 13):
return # This is not the reading you are looking for.
# Process the reading...
<...>
Problems galore:
- Your application is forced to do the job of the transport layer by sifting through data at the application layer instead of letting the protocol stack figure out which data to deliver efficiently. This is a bad design.
- Addition of a new rangefinder will affect all existing subscribers to rangefinder data because they are coupled through the common topic. This is a very bad design.
- Your models are polluted with entities that bear no relevance for the application: instead of having just the range you care about, you also have to make assumptions about how many sensors will be there and which topics they are to share. This is a terrible design.
In idiomatic UAVCAN v1, you solve this problem as follows:
float32 range
(or you just use uavcan.si.unit.length.Scalar
in order to enhance compatibility with 3rd-party software and avoid reinventing this particular wheel)
See the instance-ID? Me neither. In order to interface your sensor with the component that is intended to receive its data, you pick any free subject-ID and assign it to both the sensor and the subscriber. The protocol stack will ensure that the data is delivered from one to the other without introducing undue logical coupling with other parts of the system and without leaking the transport layer details up to your application layer. Shall any new data consumers appear later, you string them onto the same subject-ID.
You might say here that the sensor-ID is unnecessary in v0 because you can just differentiate sources by node-ID. This is also a terrible design because it leaks abstractions the other way: 1. you have to assume that there be at most one data source per node, which is not a valid assumption at the design stage; 2. you can’t rely on the well-known and well-understood principles of data-centric publish-subscribe (DCPS) that would let you isolate data consumers from data providers with the help of the network (which effectively serves as the data broker).
Why is the second point important? Because it reduces the cognitive load on the person designing the network services — in v1, when you design a service you are focused on the service itself and you don’t need to care how the entire network is going to integrate with it. It surely doesn’t matter with this lidar example but it becomes seriously critical when you get to highly complex networks involving multiple nodes interoperating via a dozen of subjects. At this point, people experienced with UAVCAN v0 start asking if there is a way to run DDS over (UAV)CAN because v0 is clearly unfit for the purpose.
Idiomatic DCPS is much simpler conceptually because entities that belong to other layers of the protocol (node-ID from the transport layer and instance-ID) are not manifested in the application. Instead, you have one robust identifier — the port-ID (or the “topic name” in other systems) — that is solely responsible for addressing the data within the system. If you prefer a more traditional CS analogy, think of the network as the address space in a program, where the port-ID is the pointer that points to your data structure in memory. You don’t care who puts that structure where you are reading it from, because it’s irrelevant — all you care about is the data itself. Aside from making the system more flexible, this approach also simplifies failure mode analysis by virtue of involving fewer variable states.
Groups of ESC
So there is one vendor of quadplane VTOL that once told me this: “We have two groups of ESC in our design: one drives the tractor propellers in the plane mode, the other is used in the hover mode only. We need to run one at a slow rate at the other at a high rate. How do we implement that?”
They were (still are) using v0. The answer I gave them was roughly like: “You don’t – UAVCAN v0 does not support that because this use case was not envisioned when the standard was designed. Go define your own data types to fix it.”
IIRC they are using RCPWM for one of the groups now.
The design broke because the v0 architecture is too rigid. It is, quite literally, its fatal flaw, and I don’t want you to carry this broken design into the new standard. To see how this use case is to be addressed in UAVCAN v1, go look at the DS-015 ESC service, which manages it properly:
-
The setpoint message does not have a fixed port-ID, allowing you to implement an arbitrary number of groups.
-
The interface to ESC and servo is largely unified. Commonalities are easy to extract when you are following sensible design practices instead of jamming bits together in one large fixed message.
-
The feedback from ESC/servo is published in logically pure messages which do not contain any means of instance segregation — this job is delegated to the broker (the networking stack) as it should be. We don’t have an “ESC status” message with everything in it; instead, we have a message for kinematic states, one for electric states, and so on. This is also invaluable for bandwidth-limited networks where the possibility to disable or throttle unnecessary publications is critical.
Magnetometer on the gimbal
How can you “re use” this as anything other than a magnetic field?
Magnetic field readings are not only used for navigation. You seem to insist that the only way a (flying) robot or a vehicle may use magnetic field readings is for navigation, which is obviously not true. A magnetometer installed on an electropermanent magnet or on the gimbal obviously has no relevance for the navigation system, yet the published data contains magnetic field readings nonetheless.
Internal combustion engine controls
This example is derived from my today’s chat with @Dima, he might be able to add more info on this as soon as he caught up with this thread. His team is working on the Innopolis VTOL dynamics simulator (among other things).
In UAVCAN v0, there is an interface for reporting the status of the fuel injection system of an internal combustion engine. The interface aggregates a large number of loosely coupled parameters into one large message, which is a violation of the interface segregation principle and a major design problem in itself, but I spoke about it above already so let’s skip this part. Another major problem is that the interface lacks any controls for starting/stopping the engine and commanding its power setting.
To fix the problem, a v0 system will have to either abuse the ESC messages (which they do in their simulator at the moment) or to define ad-hoc fixed messages I spoke about several posts earlier. A v1 system would not have this problem because the idiomatic way to accept command is to subscribe to a highly generic type like uavcan.si.unit...
or uavcan.primitive...
, which don’t necessitate allocation of a fixed-ID. New types do need to be designed occasionally, but by virtue of being architecturally pure, they are much more reusable, shielding the end-user of the protocol from problems that arise when the protocol designer failed to envision a specific use case.
Idiomatic v1 would also split the old fuel injection status message into a group of smaller messages arranged roughly as follows:
- Dynamic states of the engine (torque, speed, etc). In DS-015, the corresponding type was
reg.drone.physics.dynamics.rotation.PlanarTs
. - States of the electrical system, like
reg.drone.physics.electricity.PowerTs
- etc.
Reliance on highly generic types widens the scope of the standard drastically.
Arming controls
Take a product like this:
https://www.tindie.com/products/arkelectronics/ark-gps/
Currently, its reuse in applications that do not involve traditional small UAVs is hindered by the rigidity of the v0 interfaces it supports. Let’s focus on the seemingly benign feature: the safety switch button.
In v0, there is a dedicated message type that may be leveraged to publish the state of this button as the global system arming state. But the button is such a basic UI feature, surely I could reuse it for something different?
In UAVCAN v1, it is trivially implementable by publishing uavcan.primitive.scalar.Bit
. Any subscriber can be configured to use this button for any purpose by merely setting its subject-ID accordingly. In v0, the button can only be used for those applications that were envisioned by the author of the interface.
Forget the button. What if I have several subsystems that need to be armed/disarmed selectively, as is the case in more complex vehicles? In DS-015, this is addressed by extracting relevant concerns into the readiness service, which can be instantiated as necessary. For example, one instance can be configured to control the avionics while a dedicated one arms the propulsion system (details in the documentation). In UAVCAN v0, none of these options are available because the designer of that service failed to think about more complicated usage scenarios; fixing that requires the introduction of new data types that will have to co-exist with the legacy ones.
Synthetic PyUAVCAN demo
I want you spend an hour to launch this demo on your machine (doing that should not require removal of UAVCAN v0 from your system anymore thanks to @coder_kalyan):
https://pyuavcan.readthedocs.io/en/stable/pages/demo.html
Try to answer the following question: how do you implement the same behaviors using fixed port-IDs without defining data types specific to this application only?
Summary
We can’t dispatch blobs of bytes to a subsystem in the flight controller based on what amounts to a hint.
This is only true as long as you treat port-IDs as a hint rather than a robust system-defining parameter. There are safety-critical production systems all over the world running diverse protocol stacks ranging from CANopen (with variable/dynamic PDO) and CANKingdom up to DDS and your typical message queues. They do just fine without fixed predefined identifiers because they manage configuration properly. Saying that non-fixed port-IDs are unfit for UDRAL amounts to discarding the extensive experience from the industry, which is hardly a sensible thing to do. Only inherently limited protocols that are useless outside of their extremely narrow domains, such as CANaerospace or MAVLink, can afford to rely on fixed identifiers and be blind to the general trends in the wider world out there.
Your requirement of having fixed identifiers at the cost of reusability and application flexibility is not justifiable. The failure modes you are concerned about are only manifested at configuration time and therefore they do not affect the operational safety of the vehicle. At the configuration stage, they are trivial to mitigate using mechanisms already discussed.
We are not going to re-introduce the same broken design back into v1, so fixed semantic-IDs are not happening. Instead, I suggest we give ArduPilot a closer look to try and find out how to integrate the non-fixed port-IDs into it without breaking the existing logic and without putting undue strain on the maintainers.
strong typing is one of the bedrocks of reliable computing. It is why we use classes, types etc in languages like C++.
I don’t think this analogy is particularly relevant, but since you brought it up — in native languages like C, C++, Rust, or perhaps any language without mandatory RTTI, types do not exist at runtime. Even Java generics are implemented via type erasure so they lack detailed type information at runtime. You obviously know that. In these terms, interfaces in C++ do work like subjects in UAVCAN v1 — the compiler guarantees that the types are correct before the runtime, just like the configuration stage guarantees that subjects are linked correctly before the vehicle is operational. At runtime, all you have are binary blobs.
For unregulated data types the situation is completely different. In that case a central assignment of IDs would be very counter productive, as it would make v1 much less attractive for experimentation.
<…>
I suspect the examples @pavel.kirienko was talking about on the last call would have been for the unregulated case.
The fact that you mention experimentation suggests that you still don’t understand what port-IDs (topic names) are for. I have attempted to correct this in this post; let me know if I succeeded or not. In the context of this discussion, I make no distinction between regulated and unregulated types.
Remember that implementing one method and applying it throughout the stack is also easier than implementing two separate approaches for regulated and unregulated types.
on receipt regulated messages can be immediately dispatched without any registry lookups
Neither of the approaches requires registry lookups at runtime.
Andrew, if I were to put this post into one sentence: I urge you to be a little visionary and look beyond the immediate needs you have in front of you right now. Lest one day somebody will ask you how to run DDS over CAN.