Port type safety enforcement

we could, but it really just takes up flash space for no purpose.

This is where I start feeling uncomfortable. It’s not that there is no purpose; it’s that the purpose may not be immediately obvious in the field of small unmanned aerial vehicles in the present day. I would think that to claim to support UDRAL, a UAVCAN node would at least need to acknowledge the existence of other domains and use cases.

We could, but would it suffice to determine types in the packet (perhaps with the reserved bit or perhaps in the payload) and leave semantics up to configuration?

It appears that I wrote a bad example. My point is that you could still end up getting data of the same “type” that is useful to some other node but not to vehicle navigation. The flight controller isn’t the only consumer of data on the bus.

if the compiler handled it efficiently so it doesn’t generate an extra level of function calls then adding the extra type layer would be harmless. I suspect the compiler won’t be that smart, so it will just add flash.
What exactly are you trying to re-use with this? The serialization of a vector of int16_t for a magnetic field?
Even so, if that is what it takes to make Pavel happy with having a semantic_id then it would be worth the extra flash space.

Our terminology got somewhat distorted. In your SOAP example, soapaction has the same role as port-ID (subject-ID or service-ID) in UAVCAN. There is no need to identify the data type explicitly because the data is represented using a self-describing encoding (XML). I don’t think SOAP is hugely relevant though — my references to SOA are related to the design principles rather than one particular method of implementing them in a practical system.

I did spend some time pondering about your latest posts but I am not yet ready to offer much valuable input. I do have a question though.

Speaking very generally, if the type information was trivially deducible from any received transfer, would you find such design acceptable? Your proposal of adding semantic-ID to DSDL suggests that it might be the case.

I should leave this thread here which contains my unedited thought process behind the current architecture:

who knew that happiness can be measured in bytes

I’m not sure where you are going with this. The semantic type needs to be directly available so we can dispatch the packet in a manner that can be completely relied upon and so we can decode it in a network analyser without probing the source. Just the syntactic type information isn’t enough: for example, using BER encoding (a self-describing format commonly used with ASN.1) would give you the syntax but not the semantics unless you also added a semantic tag.

Let’s refine our definitions. By “semantic type” you mean a joint descriptor of the syntax (the data structure layout) and the semantics (purpose) of the transfer, right? Can the v0 data type ID be considered a semantic type ID?

purpose isn’t a good word. As discussed with @coder_kalyan, we don’t need to know what the data is going to be used for. So for magnetometer data we don’t care if the data is for a yaw source or to detect when a magnetic payload has been inserted or some completely different use for magnetic fields. We just care that it is a magnetic field reading (and which magnetic field sensor on the source node, the instance number).
It needs to be sufficient information for dispatch of the data to the right subsystem (or multiple subsystems in some cases). To do that you need to know what the data is (ie. it is a “magnetic field”) and the shape of the structure and the units of the fields.
For a protocol that uses a description language like DSDL, that means enough information to associate the incoming data with the compiled form of the DSDL.

So it seems that you just need the syntax (data structure), not the semantics at all. This actually simplifies things somewhat.

I’m curious how you plan to work without this information?

Could this be accomplished by just using a separate port ID per instance? (Assuming we put the data ID in the payload) Or would that not be acceptable?

nope. In CS the syntax of the following two structures is the same:

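#include <stdint.h>

/* Magnetometer reading: field strength per axis, in milligauss. */
struct {
    uint8_t instance;
    int16_t mag_field_mgauss[3];
} struct1;

/* Accelerometer reading: acceleration per axis, in milli-g. */
struct {
    uint8_t instance;
    int16_t acceleration_mg[3];
} struct2;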

but their semantics is different, as the meaning of the fields is different.
Neither the syntax nor the semantics tells you anything about what someone will do with the data.
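
A minimal Python sketch of the same point, assuming a packed little-endian layout; the decoder names and the sample values are made up for illustration. The same bytes decode without error under both interpretations, so the payload alone cannot tell a receiver which one is meant:

```python
import struct

# Both structures share one wire layout: uint8 instance + 3 x int16
# (assumed packed, little-endian).
LAYOUT = struct.Struct("<B3h")


def decode_mag(payload: bytes) -> dict:
    instance, x, y, z = LAYOUT.unpack(payload)
    return {"instance": instance, "mag_field_mgauss": (x, y, z)}


def decode_accel(payload: bytes) -> dict:
    instance, x, y, z = LAYOUT.unpack(payload)
    return {"instance": instance, "acceleration_mg": (x, y, z)}


payload = LAYOUT.pack(1, 120, -340, 455)  # made-up sample values

mag = decode_mag(payload)
accel = decode_accel(payload)

# Both decoders accept the same bytes; only out-of-band knowledge
# (an ID, a port, configuration) says which interpretation is right.
print(mag)
print(accel)
```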

Thinking about this some more, I think we are missing one key factor in the discussion, which is the distinction between regulated and unregulated data types.
I have been focused on regulated data types, such as the core sensor messages for use by ArduPilot and PX4. For those regulated data types the use of a centrally administratively assigned ID is essential as the whole point of the regulation is to create a shared use standard with strong type safety.
For unregulated data types the situation is completely different. In that case a central assignment of IDs would be very counter productive, as it would make v1 much less attractive for experimentation. Having to get IDs assigned by a central administrative authority would be a major pain.
I suspect the examples @pavel.kirienko was talking about on the last call would have been for the unregulated case. For unregulated messages having a registry based lookup to assign subject IDs would be fine. It would actually be very nice, as it means you could mix and match unregulated messages while avoiding collisions either by manual configuration of devices or by string ID matching for the semantic name of the message. We couldn’t do that in v0 without risk of ID collisions.
So I think what we should do is:

  • in the DSDL, use “@regulated_id NNNN” instead of the @semantic_id I suggested earlier
  • have a bit in the header marking if a message is from a regulated message set or not (it comes from a regulated message set if there is a @regulated_id in the message DSDL). This bit could be one of the existing reserved bits, or we could carve off one bit of the 13 bit subject ID for this
  • when sending a regulated message we would generate code that by default sets the “regulated” bit and puts the @regulated_id in the subject-ID. A caller could choose to send the message as unregulated if they want to, in which case it will need config (either manual or automatic) for the subject-ID, and would not set the regulated bit in the packet
  • on receipt regulated messages can be immediately dispatched without any registry lookups
  • unregulated messages would need to be mapped to their semantics via registry lookups
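
The receive side of the list above could be sketched roughly like this. It is only an illustration: using the top bit of the 13-bit subject-ID as the "regulated" flag is one of the two options mentioned, and the table contents and names (REGULATED_TYPES, unregulated_map, dispatch) are invented for the example.

```python
REGULATED_BIT = 1 << 12        # assumed: top bit of the 13-bit subject-ID field
ID_MASK = REGULATED_BIT - 1    # the remaining 12 bits carry the ID itself

# Hypothetical fixed table generated from @regulated_id entries in the DSDL.
REGULATED_TYPES = {1001: "MagneticField"}

# Hypothetical per-vehicle mapping, filled in from registry lookups
# when the vehicle is configured.
unregulated_map = {300: "payload_magnet_field"}


def dispatch(subject_field: int) -> str:
    """Return a label for the handler that should receive this transfer."""
    port_id = subject_field & ID_MASK
    if subject_field & REGULATED_BIT:
        # Regulated: the ID is authoritative, no registry involved.
        name = REGULATED_TYPES.get(port_id)
        return "regulated:" + name if name else "unknown"
    # Unregulated: meaning comes from the configured mapping.
    name = unregulated_map.get(port_id)
    return "unregulated:" + name if name else "unknown"


print(dispatch(REGULATED_BIT | 1001))
print(dispatch(300))
print(dispatch(301))
```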

This approach has one big advantage over what I proposed before with @semantic_id, which is the ability for even regulated messages to be checked for complete syntactic compatibility via registry lookups (eg. could be done by arming checks). We would just need a structural checksum of the DSDL available as a registry entry on the sending node. Ideally it would actually be 2 structural checksums, one for the unextended message, and one including any extensions that the sending node includes.
This ability to check the structural checksum of messages could be very useful if a vendor releases a bad firmware that sends a regulated message with bad DSDL (eg. through a mixup with git versions). We could catch it and report an error.
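
As a rough sketch of what such an arming check might look like: the checksum here is simply a CRC-32 over the DSDL text with comments and extra whitespace stripped. That is an assumption made for the example, not an existing UAVCAN algorithm, and the DSDL snippets and function names are hypothetical.

```python
import zlib


def normalize(dsdl: str) -> str:
    """Strip comments and collapse whitespace so only the structure remains."""
    lines = []
    for line in dsdl.splitlines():
        line = " ".join(line.split("#", 1)[0].split())
        if line:
            lines.append(line)
    return "\n".join(lines)


def structural_checksum(dsdl: str) -> int:
    return zlib.crc32(normalize(dsdl).encode("utf-8"))


def arming_check(local_dsdl: str, reported_checksum: int) -> bool:
    """Compare our compiled-in definition against what the sender reports."""
    return structural_checksum(local_dsdl) == reported_checksum


LOCAL = "uint8 instance\nint16[3] mag_field_mgauss\n"
SAME_STRUCTURE = "uint8   instance   # sensor index\nint16[3] mag_field_mgauss\n"
BAD_FIRMWARE = "uint8 instance\nint32[3] mag_field_mgauss\n"

print(arming_check(LOCAL, structural_checksum(SAME_STRUCTURE)))
print(arming_check(LOCAL, structural_checksum(BAD_FIRMWARE)))
```

In a real system the reported checksum would be read from a registry entry on the sending node rather than computed locally.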

I like where this is going, but I do have one other concern:

UAVCANv1 has the intention of being applicable to many vehicular and robotics projects, not just drones/unmanned aerial vehicles. Hence transport layer rigidity such as switching bits and fixing port IDs makes me uncomfortable as it shows a strong preference towards the UAV use case in the regulated types. I’m not sure whether or not this will actually end up being a problem, but it’s something to keep in mind.

for any regulated set of messages you really want those fixed IDs. It isn’t specific to UAVs.
For unregulated messages having the mapping done via registry lookups is a win for sharing messages between loosely connected projects.

Do we want to have a call tomorrow? I can do the same time or earlier.

First I should restate the idea I shared at the call. The old v0 (aka Andrew’s approach) builds the standard in a holistic way, such that the standard envisions the entire architecture of the vehicle. The v1 approach is focused on isolated and composable network services instead to manage complexity and allow reuse. I am not sure analogies can be of help here but perhaps you can conceptualize it roughly as the difference between a highly complex monolithic program versus idiomatic OOP.

In v1, the design of the vehicular system is finalized by the integrator, who links various network services together to achieve the required behaviors. This approach is superior to the legacy one, but it is still being misunderstood. I will try another way to explain it, this time speaking in very practical, hands-on terms instead of abstract ideas.

Hands-on examples

Laser rangefinder

You can grab a COTS product like this and use it as an altimeter:

The underlying lidar is well-suited for any other task involving rangefinding, which is also stated by the manufacturer. Yet, its UAVCAN v0 interface stands in the way. The v0 approach would interface it with the help of a dedicated message type with a fixed-ID roughly like this:

uint8 lidar_id
float32 altitude

You can’t use that for measuring any distance other than AGL.

Okay, so what if you defined a more generic data type that carries just the raw range reading like so:

uint8 lidar_id
float32 range  # not altitude but just range to whatever

That lets you remove the assumption about the direction the sensor is facing, cool. Now, suppose that on your (flying) robot there are multiple sensors and your application needs to read the data of a select few sensors. You make a subscriber and go roughly like:

def handle_reading(msg):
    if msg.lidar_id not in (10, 11, 12, 13):
        return   # This is not the reading you are looking for.
    # Process the reading...
    ...

Problems galore:

  • Your application is forced to do the job of the transport layer by sifting through data at the application layer instead of letting the protocol stack figure out which data to deliver efficiently. This is a bad design.
  • Addition of a new rangefinder will affect all existing subscribers to rangefinder data because they are coupled through the common topic. This is a very bad design.
  • Your models are polluted with entities that bear no relevance for the application: instead of having just the range you care about, you also have to make assumptions about how many sensors will be there and which topics they are to share. This is a terrible design.

In idiomatic UAVCAN v1, you solve this problem as follows:

float32 range

(or you just use uavcan.si.unit.length.Scalar in order to enhance compatibility with 3rd-party software and avoid reinventing this particular wheel)

See the instance-ID? Me neither. In order to interface your sensor with the component that is intended to receive its data, you pick any free subject-ID and assign it to both the sensor and the subscriber. The protocol stack will ensure that the data is delivered from one to the other without introducing undue logical coupling with other parts of the system and without leaking the transport layer details up to your application layer. Should any new data consumers appear later, you string them onto the same subject-ID.
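
To illustrate the idea without pulling in a real stack, here is a toy in-process stand-in for the network. The Bus class, the subject-ID values, and the readings are all invented for the example; this is not the PyUAVCAN API.

```python
from collections import defaultdict


class Bus:
    """Toy stand-in for the network acting as the data broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, subject_id, handler):
        self._subscribers[subject_id].append(handler)

    def publish(self, subject_id, message):
        # Delivery is decided by subject-ID alone; handlers never filter.
        for handler in self._subscribers.get(subject_id, []):
            handler(message)


bus = Bus()

# Arbitrary free subject-IDs picked by the integrator (hypothetical values).
ALTIMETER_SUBJECT_ID = 1200
OBSTACLE_SUBJECT_ID = 1201

altitude_readings = []
obstacle_readings = []
bus.subscribe(ALTIMETER_SUBJECT_ID, altitude_readings.append)
bus.subscribe(OBSTACLE_SUBJECT_ID, obstacle_readings.append)

# Two identical rangefinders publish the same message type (a bare range)
# on different subjects; neither message carries an instance-ID.
bus.publish(ALTIMETER_SUBJECT_ID, 12.5)
bus.publish(OBSTACLE_SUBJECT_ID, 3.2)

print(altitude_readings, obstacle_readings)
```

Adding a third rangefinder on a new subject-ID would not touch either existing subscriber.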

You might say here that the sensor-ID is unnecessary in v0 because you can just differentiate sources by node-ID. This is also a terrible design because it leaks abstractions the other way: 1. you have to assume that there is at most one data source per node, which is not a valid assumption at the design stage; 2. you can’t rely on the well-known and well-understood principles of data-centric publish-subscribe (DCPS) that would let you isolate data consumers from data providers with the help of the network (which effectively serves as the data broker).

Why is the second point important? Because it reduces the cognitive load on the person designing the network services: in v1, when you design a service you are focused on the service itself and you don’t need to care how the entire network is going to integrate with it. It surely doesn’t matter with this lidar example but it becomes seriously critical when you get to highly complex networks involving multiple nodes interoperating via a dozen subjects. At this point, people experienced with UAVCAN v0 start asking if there is a way to run DDS over (UAV)CAN because v0 is clearly unfit for the purpose.

Idiomatic DCPS is much simpler conceptually because entities that belong to other layers of the protocol (node-ID from the transport layer and instance-ID) are not manifested in the application. Instead, you have one robust identifier — the port-ID (or the “topic name” in other systems) — that is solely responsible for addressing the data within the system. If you prefer a more traditional CS analogy, think of the network as the address space in a program, where the port-ID is the pointer that points to your data structure in memory. You don’t care who puts that structure where you are reading it from, because it’s irrelevant — all you care about is the data itself. Aside from making the system more flexible, this approach also simplifies failure mode analysis by virtue of involving fewer variable states.

Groups of ESC

So there is one vendor of quadplane VTOL that once told me this: “We have two groups of ESC in our design: one drives the tractor propellers in the plane mode, the other is used in the hover mode only. We need to run one at a slow rate and the other at a high rate. How do we implement that?”

They were (still are) using v0. The answer I gave them was roughly like: “You don’t – UAVCAN v0 does not support that because this use case was not envisioned when the standard was designed. Go define your own data types to fix it.”

IIRC they are using RCPWM for one of the groups now.

The design broke because the v0 architecture is too rigid. It is, quite literally, its fatal flaw, and I don’t want you to carry this broken design into the new standard. To see how this use case is to be addressed in UAVCAN v1, go look at the DS-015 ESC service, which manages it properly:

  • The setpoint message does not have a fixed port-ID, allowing you to implement an arbitrary number of groups.

  • The interface to ESC and servo is largely unified. Commonalities are easy to extract when you are following sensible design practices instead of jamming bits together in one large fixed message.

  • The feedback from ESC/servo is published in logically pure messages which do not contain any means of instance segregation — this job is delegated to the broker (the networking stack) as it should be. We don’t have an “ESC status” message with everything in it; instead, we have a message for kinematic states, one for electric states, and so on. This is also invaluable for bandwidth-limited networks where the possibility to disable or throttle unnecessary publications is critical.
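
For the two-group question above, a rough sketch of what non-fixed setpoint port-IDs buy you: each group gets its own subject and its own publication rate. The subject-ID values, the rates, and the group names are made up for illustration, and the scheduler is a plain simulation rather than anything from DS-015.

```python
# Hypothetical integrator-assigned configuration: one setpoint subject per group.
GROUPS = {
    "fast_group": {"subject_id": 2000, "rate_hz": 400},
    "slow_group": {"subject_id": 2001, "rate_hz": 50},
}


def schedule(groups, duration_s):
    """Return a time-ordered list of (time_s, subject_id) publication events."""
    events = []
    for cfg in groups.values():
        period = 1.0 / cfg["rate_hz"]
        count = int(round(duration_s * cfg["rate_hz"]))
        events.extend((i * period, cfg["subject_id"]) for i in range(count))
    return sorted(events)


events = schedule(GROUPS, 1.0)
per_subject = {}
for _, subject_id in events:
    per_subject[subject_id] = per_subject.get(subject_id, 0) + 1

print(per_subject)
```

Adding a third group is one more entry in the configuration, with no new data type.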

Magnetometer on the gimbal

How can you “re use” this as anything other than a magnetic field?

Magnetic field readings are not only used for navigation. You seem to insist that the only way a (flying) robot or a vehicle may use magnetic field readings is for navigation, which is obviously not true. A magnetometer installed on an electropermanent magnet or on the gimbal obviously has no relevance for the navigation system, yet the published data contains magnetic field readings nonetheless.

Internal combustion engine controls

This example is derived from my chat today with @Dima; he might be able to add more info on this as soon as he has caught up with this thread. His team is working on the Innopolis VTOL dynamics simulator (among other things).

In UAVCAN v0, there is an interface for reporting the status of the fuel injection system of an internal combustion engine. The interface aggregates a large number of loosely coupled parameters into one large message, which is a violation of the interface segregation principle and a major design problem in itself, but I spoke about it above already so let’s skip this part. Another major problem is that the interface lacks any controls for starting/stopping the engine and commanding its power setting.

To fix the problem, a v0 system will have to either abuse the ESC messages (which they do in their simulator at the moment) or define the ad-hoc fixed messages I spoke about several posts earlier. A v1 system would not have this problem because the idiomatic way to accept commands is to subscribe to a highly generic type like uavcan.si.unit... or uavcan.primitive..., which don’t necessitate allocation of a fixed-ID. New types do need to be designed occasionally, but by virtue of being architecturally pure, they are much more reusable, shielding the end-user of the protocol from problems that arise when the protocol designer failed to envision a specific use case.

Idiomatic v1 would also split the old fuel injection status message into a group of smaller messages arranged roughly as follows:

  • Dynamic states of the engine (torque, speed, etc). In DS-015, the corresponding type was reg.drone.physics.dynamics.rotation.PlanarTs.
  • States of the electrical system, like reg.drone.physics.electricity.PowerTs
  • etc.

Reliance on highly generic types widens the scope of the standard drastically.

Arming controls

Take a product like this:

Currently, its reuse in applications that do not involve traditional small UAVs is hindered by the rigidity of the v0 interfaces it supports. Let’s focus on a seemingly benign feature: the safety switch button.

In v0, there is a dedicated message type that may be leveraged to publish the state of this button as the global system arming state. But the button is such a basic UI feature, surely I could reuse it for something different?

In UAVCAN v1, it is trivially implementable by publishing uavcan.primitive.scalar.Bit. Any subscriber can be configured to use this button for any purpose by merely setting its subject-ID accordingly. In v0, the button can only be used for those applications that were envisioned by the author of the interface.

Forget the button. What if I have several subsystems that need to be armed/disarmed selectively, as is the case in more complex vehicles? In DS-015, this is addressed by extracting relevant concerns into the readiness service, which can be instantiated as necessary. For example, one instance can be configured to control the avionics while a dedicated one arms the propulsion system (details in the documentation). In UAVCAN v0, none of these options are available because the designer of that service failed to think about more complicated usage scenarios; fixing that requires the introduction of new data types that will have to co-exist with the legacy ones.

Synthetic PyUAVCAN demo

I want you to spend an hour to launch this demo on your machine (doing that should not require removal of UAVCAN v0 from your system anymore thanks to @coder_kalyan):

https://pyuavcan.readthedocs.io/en/stable/pages/demo.html

Try to answer the following question: how do you implement the same behaviors using fixed port-IDs without defining data types specific to this application only?

Summary

We can’t dispatch blobs of bytes to a subsystem in the flight controller based on what amounts to a hint.

This is only true as long as you treat port-IDs as a hint rather than a robust system-defining parameter. There are safety-critical production systems all over the world running diverse protocol stacks ranging from CANopen (with variable/dynamic PDO) and CANKingdom up to DDS and your typical message queues. They do just fine without fixed predefined identifiers because they manage configuration properly. Saying that non-fixed port-IDs are unfit for UDRAL amounts to discarding the extensive experience from the industry, which is hardly a sensible thing to do. Only inherently limited protocols that are useless outside of their extremely narrow domains, such as CANaerospace or MAVLink, can afford to rely on fixed identifiers and be blind to the general trends in the wider world out there.

Your requirement of having fixed identifiers at the cost of reusability and application flexibility is not justifiable. The failure modes you are concerned about are only manifested at configuration time and therefore they do not affect the operational safety of the vehicle. At the configuration stage, they are trivial to mitigate using mechanisms already discussed.

We are not going to re-introduce the same broken design back into v1, so fixed semantic-IDs are not happening. Instead, I suggest we give ArduPilot a closer look to try and find out how to integrate the non-fixed port-IDs into it without breaking the existing logic and without putting undue strain on the maintainers.

strong typing is one of the bedrocks of reliable computing. It is why we use classes, types etc in languages like C++.

I don’t think this analogy is particularly relevant, but since you brought it up — in native languages like C, C++, Rust, or perhaps any language without mandatory RTTI, types do not exist at runtime. Even Java generics are implemented via type erasure so they lack detailed type information at runtime. You obviously know that. In these terms, interfaces in C++ do work like subjects in UAVCAN v1 — the compiler guarantees that the types are correct before the runtime, just like the configuration stage guarantees that subjects are linked correctly before the vehicle is operational. At runtime, all you have are binary blobs.
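
To make the compiler analogy a bit more tangible, here is a sketch of a configuration-stage "link check" that runs before the vehicle is operational. The port table format, the node names, and the validate function are assumptions made for the example, not an existing tool.

```python
from collections import defaultdict


def validate(ports):
    """Check a port table of (node, role, subject_id, type_name) tuples.

    Returns a list of human-readable errors; empty means the links are consistent.
    """
    types = defaultdict(set)
    roles = defaultdict(set)
    for _node, role, subject_id, type_name in ports:
        types[subject_id].add(type_name)
        roles[subject_id].add(role)

    errors = []
    for subject_id in sorted(types):
        if len(types[subject_id]) > 1:
            errors.append("subject %d: conflicting types %s"
                          % (subject_id, sorted(types[subject_id])))
        if "pub" not in roles[subject_id]:
            errors.append("subject %d: no publisher" % subject_id)
    return errors


good = [
    ("lidar", "pub", 1200, "uavcan.si.unit.length.Scalar"),
    ("autopilot", "sub", 1200, "uavcan.si.unit.length.Scalar"),
]
bad = [
    ("lidar", "pub", 1200, "uavcan.si.unit.length.Scalar"),
    ("autopilot", "sub", 1200, "uavcan.primitive.scalar.Bit"),
    ("autopilot", "sub", 1300, "uavcan.primitive.scalar.Bit"),
]

print(validate(good))
print(validate(bad))
```

Once a table like this passes, the runtime only ever sees port-IDs and binary blobs, much like a linked program.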

For unregulated data types the situation is completely different. In that case a central assignment of IDs would be very counter productive, as it would make v1 much less attractive for experimentation.
<…>
I suspect the examples @pavel.kirienko was talking about on the last call would have been for the unregulated case.

The fact that you mention experimentation suggests that you still don’t understand what port-IDs (topic names) are for. I have attempted to correct this in this post; let me know if I succeeded or not. In the context of this discussion, I make no distinction between regulated and unregulated types.

Remember that implementing one method and applying it throughout the stack is also easier than implementing two separate approaches for regulated and unregulated types.

on receipt regulated messages can be immediately dispatched without any registry lookups

Neither of the approaches requires registry lookups at runtime.


Andrew, if I were to put this post into one sentence: I urge you to be a little visionary and look beyond the immediate needs you have in front of you right now, lest one day somebody asks you how to run DDS over CAN.

I would like to, however tonight does not work for me. Could we do tomorrow night?

I think we should have this call, because it seemed like we were close to making some progress at the end of the last call, before Andrew had to drop off.

@dagar @pavel.kirienko @bbworld1 Could you please state your availability?

Also, I was wondering if we could move it a bit earlier. I understand that it would be difficult for @pavel.kirienko, but every member of the call in North America was having a hard time keeping focus (especially @dagar, for whom it was 3AM).

Moving the call 30 minutes earlier is not a huge problem. I should be available at this time for the entire week, including the weekend.

Perhaps we should spend some time now analyzing the possible failure modes at configuration time (due to autoconfiguration, user error, hotplug failures, etc) and addressing them in our design before we move forward. Maybe that will make others feel more comfortable?

This sounds like we’ve looped back to the starting point.