Port type safety enforcement

@dagar and @tridge are correctly pointing out that non-fixed port-ID assignment is prone to a particular failure mode where multiple nodes interact with the same subject under incompatible assumptions about its semantics. This is an intentional design decision that is made explicit in the Specification, where it simply hands off this problem to the implementer:

UAVCAN v0 used to offer a somewhat limited solution to this problem by seeding the multi-frame transfer-CRC with some function of the identity of the data type. While it worked well in simple scenarios, it breaks when type polymorphism, intentional network casting, or differently named data types are involved, so it had to be removed from v1. Here is a relevant excerpt from the Guide:

[A] network service specification is an abstract blueprint, focused on high-level behaviors and contracts that do not define the means of its implementation. Implementing the said behaviors and contracts such that they become available to consumers is referred to as service instantiation. Instantiating a service necessarily involves assigning its subjects and UAVCAN-services certain specific port-identifiers at the discretion of the implementer (for example, configuring an air data computer of an aircraft to publish its estimates as defined by the air data service contract over specific subject-IDs chosen by the integrator).

Excepting special use cases, the port-ID assignment is necessarily removed from the scope of service specification because its inclusion would render the service inherently less composable and less reusable, and additionally forcing the service designer to decide in advance that there should be at most one instance of the said service per network. While it is possible to embed the means of instance identification into the service contract itself (for example, by extending the data types with a numerical instance identifier like it is sometimes done in DDS keyed topics), this practice is ill-advised because it constitutes a leaky abstraction that couples the service instance identification with its domain objects. Continuing with the air data computer (ADC) example, one could assume that multiple ADC may be differentiated by a dedicated numerical ID, but this would come at a cost of polluting the application data with unrelated implementation support details and forcing the service designer to determine the allowed composition strategies.

[…]

Even though there are costs in the form of the new failure mode, they are justified by the significantly expanded capabilities of the protocol. I am providing this context to help the members understand the design of v1; keep in mind that the discussion of this specific design aspect, or the design of UAVCAN v1 in general, is outside of the scope of this SIG.

This issue was first brought up by Andrew in the now-closed thread on DS-015:

I actually wrote a comprehensive response to that post that was never published because the decision to terminate DS-015 was made just before I could post it. Let me copy-paste the relevant bit from that unpublished response here:

There is another option that I did not mention in my original response, but it was recalled by @dagar — the standard subject type register, which is actually present in the (now defunct) DS-015 demos I shared earlier:

From the pure functionality standpoint, I think that these three measures should be sufficient to address the problem. The UDRAL standard (do we agree on this name, by the way?) should probably make it a requirement that the uavcan.PORT_KIND.PORT_NAME.type and uavcan.node.cookie registers are mandatory for all nodes.

I want to reiterate once again that adding a sensor type-ID field to every message is deadly for the architecture of the distributed system for reasons explained in the Guide.

So say Holybro and CUAV both create a differential pressure sensor.

  • Holybro stumbles through the public_regulated_data_types and decides to use uavcan.si.unit.pressure.Scalar.1.0.
  • CUAV just uses a float (uavcan.primitive.scalar.Real32.1.0)
  • they both use the obvious subject name “differential_pressure” (configured via the register uavcan.pub.differential_pressure.id)

This is okay because we can look at uavcan.pub.differential_pressure.type and see the type? What about actually using the data meaningfully in the flight controller? Are we going to do this on a case by case basis with every product from every vendor? How are vendors going to know what to implement to get support for peripherals across different flight controllers?

What you’ve laid out is very flexible, but what I think most people want from a standard at this level is the piece of information that tells them what to do in the first place. What would be so wrong with having a simple registry of predefined subjects and types? Vendors would know exactly what to do for the common cases and we’d know what baseline to implement in the flight controller for support. Providing a set of common subjects and types doesn’t limit the flexibility of the system, but it would make it much simpler (or even potentially viable) for a large portion of the usage we have right now.

Yes, strict matching by the name of the type will yield false-positive type errors so it is actually incompatible with the structural subtyping implemented in UAVCAN v1 (this may change eventually), but it still suits the specific case at hand, at least to some extent. One naive approach is to let the flight controller alert the integrator that nodes are using differently named types on the same subject, allowing the integrator to validate the configuration manually (in your specific example, the human would deduce that uavcan.primitive.scalar.Real32.1.0 and uavcan.si.unit.pressure.Scalar.1.0 are, in specification-speak, “sufficiently congruent”, and tell the autopilot to stop complaining about this specific subject).
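
To make the naive approach concrete, here is a minimal sketch of the check a flight controller could run. It assumes the subject-ID and type name of every port have already been read from the uavcan.pub.PORT_NAME.id and uavcan.pub.PORT_NAME.type registers of each node; the function name, the data layout, the node-IDs, and the subject-IDs are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def find_type_conflicts(nodes, acknowledged=frozenset()):
    """Return {subject_id: {type_name: [node_id, ...]}} for every subject that
    carries more than one type name and was not acknowledged by the integrator.

    nodes: {node_id: {port_name: (subject_id, type_name)}} (hypothetical layout).
    """
    by_subject = defaultdict(lambda: defaultdict(list))
    for node_id, ports in nodes.items():
        for subject_id, type_name in ports.values():
            by_subject[subject_id][type_name].append(node_id)
    return {
        sid: dict(types)
        for sid, types in by_subject.items()
        if len(types) > 1 and sid not in acknowledged
    }

# Two vendors publish differential pressure on the same subject with different types.
nodes = {
    10: {"differential_pressure": (6010, "uavcan.si.unit.pressure.Scalar.1.0")},
    11: {"differential_pressure": (6010, "uavcan.primitive.scalar.Real32.1.0")},
    12: {"static_air_temperature": (6020, "uavcan.si.unit.temperature.Scalar.1.0")},
}
conflicts = find_type_conflicts(nodes)
for sid, types in conflicts.items():
    print(f"Subject {sid} carries differently named types: {types}")

# Once a human has decided the types are sufficiently congruent, the warning is silenced.
silenced = find_type_conflicts(nodes, acknowledged={6010})
```

The check compares names only, so it will flag structurally compatible types too; the acknowledged set is where the integrator's manual validation is recorded.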

The magic cookie approach should be robust if the flight controller implements the auto-configuration capability. I don’t expect this to be implemented soon so probably we should omit this option for now.

The device-ID register could be a robust alternative that is completely decoupled from the type information; so, in your example, Holybro and CUAV would use different types but by virtue of using the same device-ID, they will avoid triggering a false-positive warning from the flight controller.

They will follow the documentation we provide for our service definitions, similar to the way it was with DS-015. Eventually, we will probably extend DSDL to enable automatic code generation for service instantiation to automate this part away.

There is a significant risk that implementations will appear that will only use hard-coded identifiers, which will then harm the ecosystem and the protocol itself, because people will learn to perceive UAVCAN as a low-level protocol suitable only for UAV sensor networks. Notice how I am talking about UAVCAN here rather than UDRAL, because people will attribute the deficiencies of one to the other.

I think the perceived complexity of the non-fixed port-ID management is mostly exaggerated. The DS-015 demo applications illustrate that it doesn’t really add complexity to the nodes, it merely moves one configurable state from the application layer to the transport layer where it belongs (the state being the so-called sensor-ID).

I’m not sure I see that. In many of the use cases today there’s already the expectation of supporting multiple devices on the same bus (airspeed, GPS, ESCs, etc.), and supporting this is obviously aligned with the vendors’ interest in selling more units.

If we lay out and enforce the full concept (automatic code generation for service instantiation) then we can also easily supply a compliance kit that tests the required configurability of the port ID for a given subject (among other things). It also eliminates the potential type matching problems.

I really don’t like relying on register lookups for type safety. One of the issues I see with it is that an entity observing the traffic on the bus may not be able to decode the published data correctly. This causes several problems:

  • it makes writing a useful packet analyser much harder as the packet analyser needs to have a UI for mapping subject IDs to particular types
  • it means captured traces may not be able to be decoded if the trace is missing a key register lookup transaction, so a single lost packet can render the trace useless
  • this impacts on both developers and on end users. For example, end-users commonly use the packet analyser built into MissionPlanner. They expect to be able to start that up and immediately see what is happening on the bus
  • it makes “black box” data recorders much less functional as any packet loss makes it not possible to decode the trace

Looking at the two largest implementors of v0, ArduPilot and PX4, we can see we have about 60 actively used v0 messages (publishers or subscribers). We have a subject ID range of 8k, minus a few fixed subject IDs currently in the standard (e.g., uavcan.time.Synchronization, which is 7168).
I propose we use all subject IDs of value 7000 and above as fixed subject IDs, and allocate them to the publicly regulated message sets or to the core v1 types (like time sync). This gives us lots of room, while not really having much impact on the flexibility of v1.
This one change would make UAVCAN v1 much easier to implement and much more robust.
The question then would be if we should do the same with a range of service IDs. I am in general much less concerned about services, partly because it is much less common for devices to define new services.
btw, how did we end up with such an odd scattering of fixed subject IDs and service IDs in the standard? For example, we seem to have 13 fixed subject IDs in v1.0-beta, 2 of which are deprecated. Lowest value is 7168, highest is 8184. Is there some pattern to these that I’m missing?

I think your position is self-contradictory, look:

  • One implication is that by adding recommended non-fixed subject-IDs, we significantly simplify the system configuration, because a new device can be integrated into the system by merely connecting it to the bus without the need to change the subject-IDs.

  • The other implication is that configurations where different subject-IDs will be required (like multiple devices of the same kind) are common, so vendors will avoid hard-coding the IDs. Yet, this implies that predefined IDs will typically need to be changed anyway, which nullifies the advantages enabled by the first point.

If I read you right and the statements are, indeed, mutually exclusive, we will end up with either:

  • Hard-coded IDs will emerge at least for some device types. This outcome has the potential to kill UAVCAN v1.
  • Predefined subject-IDs will not significantly improve the experience for an average user.

One way forward is just to table this problem until later. We move on to define some initial draft of the message set and then, having that as a working MVP, think about how to improve the life of a regular integrator. As I said earlier, the complexity of manually assigning the subject-IDs is being seriously exaggerated. Once we have some minimal user-friendly GUI for that implemented (either in Yukon or integrated into QGC/MP/whatever), it should no longer be perceived as a problem. That GUI, by the way, will be necessary regardless of whether there are predefined subject-IDs or not.

This is a different topic though. We were mostly talking about type safety enforcement during regular operation, while your points are mostly focused on network analysis rather than the normal operation of the vehicle. We will talk about this too, but if we have reached a consensus on the previously raised questions (e.g., that it is feasible to use register lookups for type checking by the autopilot), it should be made clear.

A correctly configured entity is always able to decode the published data correctly. The problem is then to find robust means of configuring said entity.

Your approach is focused on changing the protocol by re-introducing the same defective architecture as in v0 that we removed from v1: using type ID to specify the semantics of the data. This approach is incompatible with modern paradigms of designing information systems, as I wrote at length in The Guide.

I just want to steer this conversation away from any dead-ends to save us all time: fixed port-IDs are not going to happen, because they are not supported by UAVCAN v1, and because architectures built on them are fundamentally flawed. We are not going to change the protocol; that is not what this SIG is about.

Note that Daniel is not speaking about fixed port-IDs. His suggestion was to link pre-defined ID ranges with specific subjects, not data types, if you follow the difference. His proposal is to basically add a table that says, for instance, that outside air temperature is reported on subjects from X to Y, and their types are uavcan.si.sample.temperature.Scalar.1.0. Then there might be some other group of subjects, say, engine temperature goes on subjects Y to Z, and they are probably to be of the same type.

The idea of defining a separate data type to represent one particular specialization of the same underlying process or state is inadmissible because it doesn’t scale and it simply goes against the modern knowledge of how to build a distributed system. This is why in UAVCAN v1, unlike v0, you don’t just define a data type for outside air temperature and then a separate one for engine temperature next to it.

The case of removing sensor-ID fields from data type definitions may seem different but it follows from the same basic principles: a data type is defined to model a particular concept that the application deals with. To continue with the above example of temperature measurement, you just define a data type that contains temperature and that’s that. You may add variance or timestamp or whatever your application requires. If you add a sensor-ID at this point, or define a fixed subject, you are dealing with a bad case of leaky abstraction because suddenly you have to think about the architecture of the whole system whereas the initial task was to define only one particular component of it.

I have heard objections like “but the subject-ID space is so large, surely we can tolerate a few fixed IDs without ever running out of it” many times; they come out of a misunderstanding of what the subjects are for. The problem is not that we run out of space, the problem is that fixed IDs are simply not suitable for the purpose.

Going back to the logging issue: adding a fixed port-ID would help you fix the logging problem at the cost of everything else. The user who reads the log is interested in the application-level states exchanged over the network (like that temperature). The way these states are exchanged is defined by the protocol layer, which is below the application layer. When you add fixed IDs, you introduce a hard link between the layers which breaks the application’s abstractions and limits the applicability of the protocol. Making the ID configurable allows us to decouple the two layers. The mapping of subject-IDs to the application’s states/processes becomes part of the vehicle’s design (or configuration). You need to know how the vehicle is designed (or configured) to read its black box.

Yes. From what I wrote above you now see how this is a mandatory part of UAVCAN v1 that is basically baked into the protocol: you need your subject-IDs configured if you want to do anything at all with the network.

The subject mapping problem is outside of the scope of the protocol. Either you obtain that information from the vehicle’s configuration (e.g., MissionPlanner would know how the subjects are configured on the flight controller, or this information may be stored in the log’s metadata) or it is punched in by the user.

In the case of real-time analysis, can’t you just query the subject-ID mapping from the flight controller directly? Otherwise, the same points apply as above.


To summarize, we can talk about Daniel’s proposal. I think it creates a serious existential threat to UAVCAN (not just UDRAL), but it is not fundamentally incompatible with v1.

We will not talk about fixed port-IDs. This topic was discussed on this very forum many times already and a large section of the Guide is dedicated to it.

If I understood your question correctly, it should be answered in the README here: https://github.com/UAVCAN/public_regulated_data_types/#identifier-ranges

I don’t think it’s contradictory if you distinguish the groups and actual usage. For example, only a tiny minority of users (say < 1%) have redundant GPS, but all GPS vendors will have customers that want redundant GPS. The users (this includes system integrators) are the ones who deal with the burden of configuration complexity; the vendors only need to follow the standard. Predefined subject IDs will not need to be changed in the majority of typical usage, and even in the redundant cases they might only need to change for each additional unit (e.g., a secondary GPS). This would very much improve the experience of the average user without limiting anyone.

We can minimize this by being very clear with the standard and combining things in generated code/libraries like the predefined subject id only getting its port ID from a register of a certain name. Failure to do this can be stated unambiguously as not compliant.

I’m far more concerned that if we don’t get traction with UDRAL it’s going to do more damage to UAVCANv1. If we fail with UDRAL and some people still continue with UAVCANv1 I can pretty much guarantee it’s going to be done with hard coded port IDs if at all. We need to make it easy and obvious for people to do the “right” thing via this standard.

Consider that we already have one iteration of that with DS-015. Based on the feedback on it from multiple parties, I don’t think we’re going to have much success if we don’t find some way to make the experience more palatable.

I think we might be misaligned in what we’re trying to accomplish here. It’s great that with UAVCANv1 people will be able to trivially re-use anything from public_regulated_data_types for whatever purpose on custom subjects/port ids, but that’s not really sufficient. The entire value of this standard from my perspective is finding a way to rein in the complexity and solve the majority of existing use cases out of the box. We want to be able to enforce some level of conformity across vendors and flight stacks for these relatively simple use cases (GPS, ESC, etc). Combining that baseline ecosystem with the flexibility of UAVCANv1 is the killer feature that could make it ubiquitous.

We need to try and collaborate on solving these problems (type safety, subject IDs, AND efficient messages) in unison right now or people will never let go of simple sensor messages and fixed port-IDs because quite frankly that would solve most of our problems. I don’t know if a registry of subject-ids (possibly with some default port IDs) is sufficient or if it means we really need Yukon for a minimal MVP, but I do know what we currently have is not enough.

I see what you are saying. I think the listed goals could perhaps be attained without the introduction of predefined subject-identifiers but with predefined device classes specific to UDRAL.

For example, we could define a UDRAL-standard read-only register (say udral.pnp.class) that contains a certain predefined value that communicates to outside entities that this node implements, say, a UDRAL-conformant GNSS receiver. The flight controller would read this value and deduce which port names should be configured to make the node operational. There may be multiple classes per node.

Nodes that are already auto-configured can be detected by reading another mutable UDRAL-defined register (say udral.pnp.cookie) that contains an arbitrary value identifying the system that was the last one to auto-configure the node (such that the user would not distress the network by connecting a device that is already configured by another flight controller).

The advantages of this approach are:

  • We don’t have to pre-define any port-IDs at all. We pre-define device classes instead at the UDRAL level. Vendors of simple nodes like sensors or actuators only need to provide one constant and one mutable register to support auto-configuration.

  • Specific port-ID ranges can be defined by vehicle vendors or flight controller vendors as necessary. This does not have to be documented anywhere because it does not affect compatibility between conformant devices. PX4 and ArduPilot may use completely different strategies for port-ID assignment without any harm to the ecosystem.

  • The flight controller can automatically accommodate multiple devices of the same class on the network. For instance, if the user connects multiple GNSS units, the flight controller would detect that there are multiple devices of the same class and assign their port-IDs appropriately.

This obviously does not mean that no manual configuration will be needed. For example, if I am integrating a gimbal unit that requires subscription to a subject containing geodetic coordinates of the vehicle, it is impossible to auto-configure that. Ultimately, this approach only works for very simple star networks where data feeds either originate or terminate at the flight controller, but this is intentional since the full auto-configuration can only be performed with deep understanding of the architecture of the vehicle, which is not feasible to automate.
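
A rough sketch of the class-based idea follows, to show how little the flight controller would need. Everything in it is an assumption made for illustration: the class table, the class value "airspeed", the starting subject-ID 6000, and the PortAllocator name are not defined anywhere yet; only the register naming pattern uavcan.pub.PORT_NAME.id and the airspeed port names are taken from the existing demos.

```python
import itertools

# Hypothetical table: value of the udral.pnp.class register -> published port names.
CLASS_PORTS = {
    "airspeed": [
        "airspeed.differential_pressure",
        "airspeed.static_air_temperature",
    ],
}

class PortAllocator:
    """Hands out the next free subject-ID for every (node, port) pair."""

    def __init__(self, first_subject_id=6000):  # Arbitrary, vendor-specific choice
        self._next = itertools.count(first_subject_id)
        self.assigned = {}  # (node_id, port_name) -> subject_id

    def configure(self, node_id, classes):
        """Return the register writes needed to make the node operational."""
        writes = {}
        for cls in classes:
            for port in CLASS_PORTS.get(cls, ()):  # Unknown classes are ignored
                key = (node_id, port)
                if key not in self.assigned:
                    self.assigned[key] = next(self._next)
                writes[f"uavcan.pub.{port}.id"] = self.assigned[key]
        return writes

alloc = PortAllocator()
first = alloc.configure(118, ["airspeed"])   # First airspeed sensor
second = alloc.configure(119, ["airspeed"])  # Second sensor of the same class
print(first)
print(second)
```

Two sensors of the same class end up on distinct subjects without either vendor knowing anything about port-IDs, and a different flight controller is free to replace the allocation strategy entirely.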

I am going to try and provide a crude functional demo of this idea shortly to enhance the discussion.

Took me a while but the PoC is now ready. It’s a bit crude but I hope it’s good enough to illustrate the concept:

It is a mix of the original idea I shared at Automatic configuration of port identifiers - #7 by pavel.kirienko and the cookie register I mentioned in the previous post. Before I explain how it works, let me explain its behavior.

What it does

You can grab the demo from the repo, install Yakut (pip install yakut), and run it:

yakut compile -O . https://github.com/UAVCAN/public_regulated_data_types/archive/master.zip \
                   https://github.com/Zubax/zubax_dsdl/archive/master.zip
export UAVCAN__CAN__IFACE="socketcan:vcan0"
export UAVCAN__NODE__ID=1                    # Any node-ID will do
./udral_pnp.py

Be sure to configure a virtual CAN interface or use a real one.

Run Yakut monitor with the node-ID allocator:

export UAVCAN__CAN__IFACE="socketcan:vcan0"
export UAVCAN__NODE__ID=127                  # Any other node-ID will do
y mon -P node_id_allocation_table.db

The PoC will emit a few messages saying that it’s detected a new node (the monitor), but the node doesn’t seem to be UDRAL PnP-capable because there is no cookie register in it:

root:       Detected new online node 127
node_proxy: Started auto-configuration of node 127
node_proxy: Node 127 is not UDRAL-PnP-capable, please configure it manually
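
The decision behind that last message can be sketched roughly as below. This is not the PoC code; the Action enum and the classify function are hypothetical names. The cookie format "autoconfigured TOKEN" is taken from the PoC output shown further down, and an empty cookie means "not configured" per the demo diff. Treating a cookie that carries a stale token as "reconfigure" and anything else (such as "manual") as "leave alone" is my assumption about how the two cases should be told apart.

```python
from enum import Enum
from typing import Optional

class Action(Enum):
    NOT_CAPABLE = "no cookie register, configure manually"
    UP_TO_DATE = "already configured for this vehicle"
    CONFIGURE = "run auto-configuration"
    LEAVE_ALONE = "unexpected cookie, assume manual configuration"

AUTO_PREFIX = "autoconfigured "  # Prefix seen in the PoC output

def classify(cookie: Optional[str], token: str) -> Action:
    """cookie is None if udral.pnp.cookie is missing or not a string."""
    if cookie is None:
        return Action.NOT_CAPABLE
    if cookie == AUTO_PREFIX + token:
        return Action.UP_TO_DATE
    # Assumption: empty or stale auto-generated cookies trigger reconfiguration.
    if cookie == "" or cookie.startswith(AUTO_PREFIX):
        return Action.CONFIGURE
    return Action.LEAVE_ALONE

token = "4b65eb38"
print(classify(None, token))    # The monitor node above: no cookie register
print(classify("", token))      # Factory-fresh PnP-capable node
print(classify("manual", token))
```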

Now you can grab the old DS-015 demos extended with the cookie register and build them; here is the relevant diff:

@@ -645,6 +649,13 @@ int main(const int argc, char* const argv[])
     val._string.value.count = 0;
     registerRead("uavcan.node.description", &val);  // We don't need the value, we just need to ensure it exists.
 
+    // The UDRAL cookie is used to mark nodes that are auto-configured by a specific auto-configuration authority.
+    // We don't use this value, it is managed by remote nodes; our only responsibility is to persist it across reboots.
+    // This register is entirely optional though; if not provided, the node will have to be configured manually.
+    uavcan_register_Value_1_0_select_string_(&val);
+    val._string.value.count = 0;  // The value should be empty by default, meaning that the node is not configured.
+    registerRead("udral.pnp.cookie", &val);
+
     // Configure the transport by reading the appropriate standard registers.
     uavcan_register_Value_1_0_select_natural16_(&val);
     val.natural16.value.count       = 1;

Now, if you run either demo (or both), the PoC will auto-configure them immediately. For example, when I start the differential pressure demo for the first time, I get roughly this:

root:       Detected new online node 118
node_proxy: Started auto-configuration of node 118
node_proxy: Node 118 requires autoconfiguration because cookie '' != expected 'autoconfigured 4b65eb38'
node_proxy: Registers available on node 118: ['uavcan.node.id', 'uavcan.node.description', 'udral.pnp.cookie', 'uavcan.can.mtu', 'uavcan.pub.airspeed.differential_pressure.id', 'uavcan.pub.airspeed.differential_pressure.type', 'uavcan.pub.airspeed.static_air_temperature.id', 'uavcan.pub.airspeed.static_air_temperature.type', 'uavcan.node.unique_id']
node_proxy: Node 118: currently configured ports: PortAssignment(pub={'airspeed.differential_pressure': 65535, 'airspeed.static_air_temperature': 65535}, sub={}, cln={}, srv={})
root:       Allocating services of remote node 118; available ports: PortAssignment(pub={'airspeed.differential_pressure': 65535, 'airspeed.static_air_temperature': 65535}, sub={}, cln={}, srv={})
root:       Detected services on node 118: {'airspeed': {'': PortSuffixMapping(pub={'differential_pressure': 'airspeed.differential_pressure', 'static_air_temperature': 'airspeed.static_air_temperature'}, sub={}, cln={}, srv={})}}
root:       New airspeed client of node 118: AirspeedClient(diff_pressure=Subscriber(dtype=uavcan.si.sample.pressure.Scalar.1.0, transport_session=CANInputSession(InputSessionSpecifier(data_specifier=MessageDataSpecifier(subject_id=6010), remote_node_id=None), PayloadMetadata(extent_bytes=11))), temperature=Subscriber(dtype=uavcan.si.sample.temperature.Scalar.1.0, transport_session=CANInputSession(InputSessionSpecifier(data_specifier=MessageDataSpecifier(subject_id=6020), remote_node_id=None), PayloadMetadata(extent_bytes=11))))
node_proxy: Node 118: new ports: PortAssignment(pub={'airspeed.differential_pressure': 6010, 'airspeed.static_air_temperature': 6020}, sub={}, cln={}, srv={})
node_proxy: Writing registers of node 118: {'uavcan.pub.airspeed.differential_pressure.id': uavcan.register.Value.1.0(natural16=uavcan.primitive.array.Natural16.1.0(value=[6010])), 'uavcan.pub.airspeed.static_air_temperature.id': uavcan.register.Value.1.0(natural16=uavcan.primitive.array.Natural16.1.0(value=[6020])), 'udral.pnp.cookie': uavcan.register.Value.1.0(string=uavcan.primitive.String.1.0(value='autoconfigured 4b65eb38'))}
node_proxy: Node 118: command uavcan.node.ExecuteCommand.Request.1.1(command=65530, parameter='') response: uavcan.node.ExecuteCommand.Response.1.1(status=0)
node_proxy: Node 118: command uavcan.node.ExecuteCommand.Request.1.1(command=65535, parameter='') response: uavcan.node.ExecuteCommand.Response.1.1(status=0)
node_proxy: Node 118 configured successfully

These messages are followed by the output of the simulated sensor feed after the node is configured. The servo demo is integrated in a similar fashion; the PoC supports up to 2 airspeed sensors and up to 2 servos.
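
For readers who want to see where the PortAssignment values in the log come from, here is a small sketch (not the PoC code) that derives them from the register list. The regular expression and the function name are my own; the register names and the value 65535, which the log shows for ports that are not yet assigned, are taken from the output above.

```python
import re

# Port-ID registers look like uavcan.<kind>.<port name>.id
PORT_ID_REGISTER = re.compile(r"uavcan\.(pub|sub|cln|srv)\.([a-z0-9_.]+)\.id")
UNSET = 65535  # Value shown in the log for unassigned ports

def ports_from_registers(registers):
    """Map {register_name: value} to {kind: {port_name: port_id}}."""
    out = {"pub": {}, "sub": {}, "cln": {}, "srv": {}}
    for name, value in registers.items():
        m = PORT_ID_REGISTER.fullmatch(name)
        if m:  # Skips uavcan.node.id, *.type, udral.pnp.cookie, etc.
            out[m.group(1)][m.group(2)] = value
    return out

registers = {
    "uavcan.node.id": 118,
    "udral.pnp.cookie": "",
    "uavcan.pub.airspeed.differential_pressure.id": 65535,
    "uavcan.pub.airspeed.differential_pressure.type": "uavcan.si.sample.pressure.Scalar.1.0",
    "uavcan.pub.airspeed.static_air_temperature.id": 65535,
    "uavcan.pub.airspeed.static_air_temperature.type": "uavcan.si.sample.temperature.Scalar.1.0",
}
ports = ports_from_registers(registers)
unassigned = [n for kind in ports.values() for n, pid in kind.items() if pid == UNSET]
print(ports["pub"])
print(unassigned)
```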

The following screenshot shows both demos running, with the auto-allocated ports being 5000, 5050, 6010, and 6020:

How it works

The basic algorithm is already explained in Automatic configuration of port identifiers - #7 by pavel.kirienko. While working on its implementation, I noticed that it is not necessary to introduce additional registers for network service identification, since this information is trivially deducible from the names of the available ports, especially if they are mandated to follow a specific pattern. Here is a high-level overview of the process:

  1. The configurator node (e.g., the flight controller) has a certain token that identifies the current configuration of the vehicle. In the PoC, this is simply a 32-bit random number that is generated once and stored in a non-volatile register. If the vehicle needs to be re-configured, the token can be changed to trigger reconfiguration of all nodes connected to the vehicle from now on.

  2. Whenever a new node is detected on the network, the PnP configurator reads the value of its string-typed register udral.pnp.cookie. If the register does not exist (or is of a wrong type), the node does not support auto-configuration, in which case the process ends here. The human may be hinted to configure the node manually.

  3. If the value of the cookie matches the value of the configuration token, the node was auto-configured earlier so it needs no further processing. If the cookie contains some unexpected value (e.g., a string like manual), auto-configuration is also skipped, assuming that the human prefers to configure the device manually. Notice that this hot path (one register request) is executed every time a node comes online.

  4. By this point we have determined that the device supports and requires auto-configuration. We will need to determine which standard network services it supports (like servo control or airspeed sensor data publication). If we find any standard services that we can accommodate automatically, that’s fine; there may be other services that need to be configured manually (like vendor extensions, application-specific services, or other standards). Let’s call this process service discovery. It is very simple: the first thing to do is to read all the registers that are available on the node using uavcan.register.List.

  5. Of all registers exposed by the node, we are only interested in those that configure port-IDs. Per the standard, they follow a pattern like uavcan.(pub|sub|cln|srv).[a-z0-9_.]+.id. For example, register uavcan.pub.airspeed.differential_pressure.id defines the ID of published subject airspeed.differential_pressure.

  6. Extract port names from the register names. At this stage, we end up with a list like:

    PortAssignment(
        pub={'servo.feedback': 1234, 'servo.status': 65535, 'servo.power': 1236, 'servo.dynamics': 1237},
        sub={'servo.setpoint': 1238, 'servo.readiness': 65535, 'some_vendor_specific_thing': 3333},
        cln={'another.vendor_specific_thing': 123},
        srv={},
    )
    
  7. Next we check if there are any port names that look familiar (follow the naming convention we define for UDRAL). Ports that don’t follow the convention are simply ignored (either to be manually configured or to be auto-configured using some other means). Suppose that UDRAL-compliant network services name their ports following a pattern like service_name.instance_name.suffix, where the service name defines which kind of service it is (e.g., “airspeed”, “servo”, “gnss”, “esc”, etc.), the instance name is provided if there is more than one instance implemented by the node (e.g., “first”, or just “0”), and the suffix reflects the purpose (e.g., differential_pressure). For example, servo.left.dynamics, or airspeed.static_air_temperature. Here is how this logic is implemented in Python:

    import dataclasses
    from typing import Iterable, Tuple
    
    @dataclasses.dataclass(frozen=True)
    class PortSuffixMapping:
        pub: dict[str, str] = dataclasses.field(default_factory=dict)
        sub: dict[str, str] = dataclasses.field(default_factory=dict)
        cln: dict[str, str] = dataclasses.field(default_factory=dict)
        srv: dict[str, str] = dataclasses.field(default_factory=dict)
    
    def detect_service_instances(pub: Iterable[str] = (),
                                 sub: Iterable[str] = (),
                                 cln: Iterable[str] = (),
                                 srv: Iterable[str] = ()) -> dict[str, dict[str, PortSuffixMapping]]:
        out: dict[str, dict[str, PortSuffixMapping]] = {}
    
        def psm(s: str, i: str) -> PortSuffixMapping:
            return out.setdefault(s, {}).setdefault(i, PortSuffixMapping())
    
        for svc, ins, suf, port in _split(pub): psm(svc, ins).pub[suf] = port
        for svc, ins, suf, port in _split(sub): psm(svc, ins).sub[suf] = port
        for svc, ins, suf, port in _split(cln): psm(svc, ins).cln[suf] = port
        for svc, ins, suf, port in _split(srv): psm(svc, ins).srv[suf] = port
        return out
    
    def _split(port_names: Iterable[str]) -> Iterable[Tuple[str, str, str, str]]:
        for pn in port_names:
            p = pn.split(".", 2)
            if   len(p) > 2: yield p[0], p[1], p[2], pn  # e.g., "servo.first.dynamics"
            elif len(p) > 1: yield p[0],   "", p[1], pn  # e.g., "servo.dynamics" (no instance name)
            else:            yield   "",   "", p[0], pn  # e.g., "dynamics" (no service/instance name)
    
    >>> pub = [
    ...     "airspeed.foo.differential_pressure",   # Service "airspeed", instance "foo"
    ...     "airspeed.foo.static_air_temperature",  # Service "airspeed", instance "foo"
    ...     "airspeed.bar.differential_pressure",   # Service "airspeed", instance "bar"
    ...     "servo.feedback",                       # Service "servo", anonymous instance (singleton)
    ...     "servo.status",                         # etc.
    ...     "servo.power",
    ...     "servo.dynamics",
    ... ]
    >>> sub = [
    ...     "airspeed.bar.heater.state",            # Service "airspeed", instance "bar" (see above)
    ...     "servo.setpoint",                       # Service "servo", anonymous instance (see above)
    ...     "servo.readiness",
    ...     "unrelated.subscription",               # Application-specific or vendor-specific subject, non-standard
    ... ]
    >>> srv = [
    ...     "unrelated.server",                     # Application-specific or vendor-specific server, non-standard
    ...     "standalone_server",                    # Not part of a service, non-standard
    ... ]
    >>> result = detect_service_instances(pub=pub, sub=sub, srv=srv)
    >>> list(result)
    ['airspeed', 'servo', 'unrelated', '']
    >>> result["airspeed"]
    {'foo': PortSuffixMapping(pub={'differential_pressure':  'airspeed.foo.differential_pressure',
                                   'static_air_temperature': 'airspeed.foo.static_air_temperature'},
                              sub={},
                              cln={},
                              srv={}),
     'bar': PortSuffixMapping(pub={'differential_pressure': 'airspeed.bar.differential_pressure'},
                              sub={'heater.state':          'airspeed.bar.heater.state'},
                              cln={},
                              srv={})}
    >>> result[""]
    {'': PortSuffixMapping(pub={},
                           sub={},
                           cln={},
                           srv={'standalone_server': 'standalone_server'})}
    

    In this example, we assume that the airspeed service defines the following ports:

    • Publisher differential_pressure of type uavcan.si.sample.pressure.Scalar
    • Publisher static_air_temperature of type uavcan.si.sample.temperature.Scalar

    There are similar conventions for the servo service.

  8. We instantiate service clients locally by picking any arbitrary unoccupied port-IDs. The demo uses 6010…6019 for differential pressure, 6020…6029 for the static air temperature, 5000…5049 for servo dynamics, and so on. Naturally, such ranges do not need to be mentioned in the standard to discourage poor design. Type safety is ensured by segregating different functions by ID ranges; although, in the case of manual configuration, the user is still responsible for setting the types correctly.

  9. Having allocated the identifiers, we update the PortAssignment instance (see above) and update the remote node configuration accordingly. We also rewrite the cookie to indicate that the configuration is now installed.

  10. At the last step, we issue uavcan.node.ExecuteCommand with COMMAND_STORE_PERSISTENT_STATES (in case the node does not implement an automatic update of the storage) and then COMMAND_RESTART (to apply the new settings).
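
The allocation in step 8 can be sketched roughly as follows. This is not the code of the PoC, only a minimal illustration that assumes the ID ranges quoted above; the names `RANGES` and `allocate` are made up for this example.

```python
from typing import Dict, Set

# Ranges quoted in step 8 of the demo description; they are not part of any standard.
RANGES: Dict[str, range] = {
    "differential_pressure": range(6010, 6020),   # 6010...6019
    "static_air_temperature": range(6020, 6030),  # 6020...6029
    "dynamics": range(5000, 5050),                # 5000...5049 (servo dynamics)
}


def allocate(suffix: str, occupied: Set[int]) -> int:
    """Return the first unoccupied port-ID in the range set aside for this port suffix."""
    for port_id in RANGES[suffix]:
        if port_id not in occupied:
            occupied.add(port_id)  # Remember the ID so it is not handed out twice.
            return port_id
    raise LookupError(f"no free port-ID left for {suffix!r}")


occupied: Set[int] = set()
first = allocate("differential_pressure", occupied)
second = allocate("differential_pressure", occupied)
servo = allocate("dynamics", occupied)
```

Because each function draws from its own range, two ports with different suffixes can never end up on the same ID, which is the segregation that step 8 relies on.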

Note that this particular demo does not handle indexed group command messages, such as ESC commands. While this is trivial to implement, I decided not to overcomplicate the PoC for now. If you find this direction sensible, then it might be better to start experimenting with the actual flight controller codebase rather than with Python scripts.

The advantage of this approach is that it is architecturally clean, unlike fixed port-identifiers, and it is able to automatically configure multiple instances of function nodes.

The downside of this approach is the added complexity on the flight controller (but not the other nodes, which only have to implement the cookie register to support this). The Python PoC is a little under 400 lines of code. I expect that on an embedded system it would take about two thousand lines of C++.

This is not “clean”, but massively overcomplicated. All that complicated process just to avoid having a type in the packets, and it still doesn’t allow a network analyser to monitor traffic in a sane way.
Having the packet type encoded in the packet is vastly better. It is much more robust and so much simpler.

It’s great to have you back, Andrew, really.

Observe that merely specifying the type is not sufficient as it communicates no information about the semantics of data. Say, if the type is vendor.geometry.Pose, what pose is this?

We seem to require an identifier of semantics rather than type, which we already have – it’s called the port-ID.

You have been absent for a while and I’m not sure if you’ve had a chance to read my earlier post in this thread where I talked about this (among other things).

Thanks for getting back online Andrew!

I just wanted to elaborate on what Pavel said, in case anyone else is reading and got confused. I am trying not to just hop on the bandwagon here, so I gave port type safety quite a bit of thought yesterday and (for the most part) came to the same conclusions as Pavel - granted that certain assumptions and architectural values are taken into consideration.

We are operating under the assumption that fixed port IDs are not going to happen, and hard coding any type of ID should be minimized (whether or not it can be completely avoided is to be seen). Note that this is not a value desired only for UAVCAN, but in general computer science: hard coding is an antipattern due to the increased rigidity of the architecture. This may make someone uncomfortable to begin with, but please bear with me.

I hope that the conclusions I make here demonstrate that Pavel’s PoC actually solves problems in an intuitive way rather than an overcomplicated fashion.

The basic problem statement:
A device operating under the UDRAL specification may choose to publish a piece of data, let’s say that’s a vendor.motor.Setpoint. Another (motor driver) device is expected to consume this data. As per how UAVCAN v1 is defined, this data is published and subscribed on a port, which is identified using a port ID (integer). The question now becomes how we effectively (and automatically) define these port IDs in order to link the publisher to the subscriber, without hard coding the IDs in the specification. We also want to prevent type safety issues. For example, another device publishing a differential pressure output from an airspeed sensor decides to use the same port ID, causing our motor driver to consume data that was serialized as differential pressure and deserialize it as a motor setpoint. This can have disastrous consequences.
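
To make the failure mode concrete, here is a small sketch using Python's `struct` module. Both wire layouts are hypothetical and chosen only so that the sizes match; real DSDL serialization is done by generated code, not by `struct`.

```python
import struct

# Hypothetical little-endian layouts, for illustration only:
PRESSURE_FORMAT = "<f"   # one float32: differential pressure in pascal
SETPOINT_FORMAT = "<hh"  # two int16: made-up fields of a motor setpoint

# The airspeed sensor publishes 250 Pa on the shared port-ID.
payload = struct.pack(PRESSURE_FORMAT, 250.0)

# The motor driver decodes the same bytes as a setpoint. Nothing fails here:
# the bytes are simply reinterpreted under the wrong layout.
misread = struct.unpack(SETPOINT_FORMAT, payload)
```

The decode succeeds and yields numbers that have nothing to do with the pressure reading, which is why the mismatch has to be prevented at the level of port assignment.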

At first glance, it seems that providing a data type in the message payload determining its type could fix this problem. However, observe that this is not actually necessary, because the problem is not about data types. It is instead about port semantics. What I mean by this is that we want to make sure that the type of data (GPS position, motor setpoints, differential pressure) being published on a certain port ID is strictly the type of data that the subscriber expects. UDRAL will enforce the type of data published on a port, so if we get the port correct, we are guaranteed that the data is what we expect.

If we want to solve this problem without hardcoding port IDs, we turn to the next layer of abstraction: port names. Observe that this is a practice that is popular in other DDS implementations and protocols, such as ROS topics. UDRAL’s service classes define a list of standard port names that devices must use. Along with these port names, there is a standard of what UDRAL-specified data type is being published on that port. Thus, @dagar’s concern:

So say Holybro and CUAV both create a differential pressure sensor. Holybro stumbles through the public_regulated_data_types and decides to use uavcan.si.unit.pressure.Scalar.1.0. CUAV just uses a float (uavcan.primitive.scalar.Real32.1.0). They both use the obvious subject name “differential_pressure” (configured via the register uavcan.pub.differential_pressure.id)

is invalid as far as I can see, because UDRAL will enforce compliance with what type is published under a port name.

Now we are back to the problem of how we map an architectural design (port name) to its wire implementation (port ID). This is trivially accomplished by having a register to assign a port ID to a port name. The configurator, either an automatic process or a manual integrator, need only look at a list of the registers, check the service class(es) implemented by the device, and compare it with their own record of all the port names allowed under that service class, as defined by UDRAL. For each one that is found, the configurator pulls an ID out of a hat and assigns it.
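
As a rough sketch of what the configurator does, assuming port-ID registers are named like uavcan.pub.differential_pressure.id as in the quote above: the helper names `extract_port_names` and `assign` are made up, and 65535 stands for an unconfigured port as in the PortAssignment listing earlier in the thread.

```python
import re
from typing import Dict, Iterable, Iterator

# Matches registers such as "uavcan.pub.servo.feedback.id"; other registers are ignored.
_PORT_REGISTER = re.compile(r"^uavcan\.(pub|sub|cln|srv)\.(.+)\.id$")


def extract_port_names(registers: Dict[str, int]) -> Dict[str, Dict[str, int]]:
    """Group port-ID registers by port kind, keyed by the port name."""
    out: Dict[str, Dict[str, int]] = {"pub": {}, "sub": {}, "cln": {}, "srv": {}}
    for name, value in registers.items():
        match = _PORT_REGISTER.match(name)
        if match:
            kind, port_name = match.groups()
            out[kind][port_name] = value
    return out


def assign(ports: Dict[str, int], known: Iterable[str], ids: Iterator[int]) -> Dict[str, int]:
    """Give a fresh ID to every port whose name is known; leave all other ports untouched."""
    known_set = set(known)
    return {name: (next(ids) if name in known_set else value) for name, value in ports.items()}


registers = {
    "uavcan.node.id": 42,
    "uavcan.pub.servo.feedback.id": 65535,
    "uavcan.pub.some_vendor_specific_thing.id": 3333,
    "uavcan.sub.servo.setpoint.id": 65535,
}
ports = extract_port_names(registers)
new_pub = assign(ports["pub"], known=["servo.feedback"], ids=iter(range(1234, 1300)))
```

Ports with unfamiliar names keep whatever value they had, which matches the idea that they are left to manual configuration or to some other mechanism.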

The last remaining problem is actually the human aspect: allowing manual integrators to configure the network (this is required both for determinism when the user desires it as well as allowing port names not defined under UDRAL) can cause a conflict between IDs assigned by the configurator and those assigned by a human (or a previous configurator). The previous configurator issue (which is easier to solve) is then trivially addressed by marking the configuration as “stale” through some cookie mechanism (such as a random 32-bit integer). For a potential solution to the human problem, see the bottom of my post.

Note that such problems have been faced and addressed in the field of IP networking. An analogy would be if the HTTP specification specified what IP address a webserver could have, or what ports it could use to serve HTTP requests. HTTP is different in that there are indeed conventions, but they are merely conventions. I can still choose to run my webserver on port 8080, 3000, or whatever, a flexibility that is paramount. In IP networking, the state space is much larger, and still, devices communicate extremely reliably while also avoiding hard coding things. Dynamic assignment (DHCP, etc.) has been proven to be reliable and convenient enough for anyone to pull out a laptop and connect to a network without any understanding of what is going on. Do we dismiss the implementation of DHCP because it adds “complex” logic to our protocol? Of course not!

I spoke to a friend of mine who works with networking, and he had a couple of suggestions. I am not sure yet if these will work but they are definitely worth considering:

  1. DHCP also suffers from the problem of conflicts between manually and automatically allocated IDs. The solution to this is simple in networking: remove all manual configuration of IDs from the device, and instead move them to a “hints” table on the automatic allocator. DHCP implements this via an “IP reservations table”. This allows the human to customize anything while preventing conflicts with the allocator.
  2. (This one is a bit more tricky and may not be practical) create a transaction to allow devices to negotiate on port ID uniqueness. For example, both an airspeed sensor and a flight controller say that they are publishing something on port 1234. However, they realize that since both of them are publishing something completely different on said port, something is wrong. The allocator or human can then be hinted to solve this conflict.
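
Both suggestions can be sketched in a few lines. This is only an illustration of the idea, not a proposal for an actual API: `Allocator`, `find_conflicts`, and the port names and numbers are all invented for the example.

```python
from typing import Dict, Iterable, List, Tuple


class Allocator:
    """Hands out port-IDs from a pool while honoring a table of manual reservations."""

    def __init__(self, pool: Iterable[int], reservations: Dict[str, int]) -> None:
        self._reservations = dict(reservations)
        reserved = set(self._reservations.values())
        # Reserved IDs are never handed out automatically.
        self._free: List[int] = [i for i in pool if i not in reserved]
        self._assigned: Dict[str, int] = {}

    def assign(self, port_name: str) -> int:
        """Return the reserved ID if there is one, otherwise the next free ID."""
        if port_name not in self._assigned:
            if port_name in self._reservations:
                self._assigned[port_name] = self._reservations[port_name]
            else:
                self._assigned[port_name] = self._free.pop(0)
        return self._assigned[port_name]


def find_conflicts(claims: Iterable[Tuple[str, int]]) -> Dict[int, List[str]]:
    """Given (port_name, port_id) pairs reported by publishers, return IDs claimed under different names."""
    by_id: Dict[int, List[str]] = {}
    for name, port_id in claims:
        names = by_id.setdefault(port_id, [])
        if name not in names:
            names.append(name)
    return {port_id: names for port_id, names in by_id.items() if len(names) > 1}


allocator = Allocator(range(100, 105), {"airspeed.differential_pressure": 101})
servo_id = allocator.assign("servo.setpoint")
pressure_id = allocator.assign("airspeed.differential_pressure")
feedback_id = allocator.assign("servo.feedback")
conflicts = find_conflicts([("airspeed.differential_pressure", 1234), ("servo.setpoint", 1234)])
```

The human edits only the reservations table, so the allocator always knows which IDs are taken. A real implementation would also have to handle an exhausted pool and two reservations that name the same ID, which this sketch does not.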

Nice summary!

Not to be too pedantic but the part about type matching requires an amendment:

These points do not take into account covariant or equivalent types:

  • The publisher may use a data type that is a structural supertype of that of the subscriber. The subscriber will ignore the extra data at the end by virtue of the implicit truncation rule.

  • Members of any data link may successfully use different data types that do not belong to a given type hierarchy as long as they are semantically compatible. Examples of this are given in the first section of the Guide that talks about data type evolution; also consider the case of data type renaming (changing the name does not change the syntax).

It is therefore not technically correct to require exact type matching. The Specification only requires that the types are to be “sufficiently congruent”.
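
The first bullet can be illustrated with a toy example. The two layouts below are hypothetical and byte-aligned for simplicity; `struct` is used here only as a stand-in for real DSDL serialization.

```python
import struct

# Hypothetical little-endian layouts, for illustration only:
SUBSCRIBER_FORMAT = "<f"   # the subscriber's type: a single float32
PUBLISHER_FORMAT = "<fH"   # the publisher's type: the same float32 plus a uint16 appended at the end

# The publisher serializes its larger type.
payload = struct.pack(PUBLISHER_FORMAT, 1.5, 7)

# The subscriber reads only the fields it knows about; the trailing bytes are ignored,
# which mimics the implicit truncation rule.
(value,) = struct.unpack_from(SUBSCRIBER_FORMAT, payload)
```

The subscriber recovers the shared field correctly even though the two sides do not use the same type, so an exact type match would reject a link that works.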

It is therefore not technically correct to require exact type matching. The Specification only requires that the types are to be “sufficiently congruent”.

I see what you mean, the thing that makes me uncomfortable is the definition of sufficiently congruent. I am not 100% versed on the specification, but as long as it can be statically guaranteed that two types, upon deserialization, will be interpreted the same way, that is satisfactory.

@pavel.kirienko What did you think about the two suggestions at the end (especially completely moving static configuration to a hints table on the allocator rather than the allocator ignoring manually configured nodes)? We can keep this to just UDRAL, or even make it more widely applicable: if port identification is a separate standard, all ports could be differentiated by names? Either way, the allocator wouldn’t conflict with the human.

It’s an interesting idea but wouldn’t it make the configuration management a bit more centralized since manual configuration will be entered via the hints table instead of directly at each node? Or maybe I am misreading you. Either way, I think that in general, we should aim at having a simple MVP sooner rather than a more sophisticated solution later. One architectural benefit of the auto-configuration approach we are discussing is that nodes other than the auto-configuration authority (e.g., the flight controller) are isolated from the allocation strategy, which allows us to roll out a simple strategy first, and then perhaps replace it with a more advanced one later without having to upgrade any of the fielded nodes other than the autoconfiguration authority.

At this stage, we should definitely keep it to UDRAL. Should the design prove successful and widely applicable, we can consider migrating it to the core Specification at some point. Right now I suggest we avoid looking into such a distant future and focus on the tasks at hand.

By type I mean ‘magnetometer’ or ‘imu’. Like we did in v0. It is how it is done by just about every protocol in widespread use. Because it makes sense to put the identifier of the semantics and the syntax in an integer in the packet.

I’m afraid this is complete rubbish. That is not an anti pattern in CS. In case it matters, I do have a PhD in CS, and a heck of a lot of experience in network protocols. I have no idea what text book this is coming from, but it sure isn’t one I would teach from.
We’re building for a resource constrained, bandwidth constrained environment. When you build for that sort of environment you design for efficiency.

If you want to use the webserver analogy (and it isn’t a good one) then we can run with that if it helps you understand. I am not suggesting hard coding IP addresses or TCP/UDP port numbers. I’m suggesting that the packets being sent by the protocol contain information in the packet itself that says what the semantics of the packet are.
Have you ever used the wireshark protocol analyser? Have you tried opening it on a random network and having a look? You’ll see that it can parse the data flowing on the network. It can do this because there is sufficient information going over the network to know what is going on.
With the model that v1 is pushing the network analyser can’t do that. It can’t look at the packets and know how to parse them.
Also, both you and Pavel, please stop assuming I’m some ignorant newbie in networking. I understand what you’re doing with getting semantics via a lookup on the sender’s registry. I understand it and I know it is a really bad idea. Go spend some time with wireshark looking at all the protocols flowing on your local network. See that you can (for unencrypted data) know the semantics by inspecting the packets in the capture. Look at all of the fixed identifiers. They are not evil protocols. They are sensibly designed.

@tridge I’m sorry if I came across as patronizing. Please note that I’m also trying to explain to myself and anyone else reading this by documenting the problem statement in a very verbose fashion.

@tridge Are you open to a voice chat at some point, preferably with @dagar and @pavel.kirienko? I feel that we are not getting anywhere on this forum - I want my agenda fulfilled, you want yours, Pavel has his, and no one is agreeing on anything. I am not sure if a VC will solve our problems but I’d like to try it.

Please let me know if you’re open/available and when.

Meta

Feels awkward to say it directly considering that you are far more experienced and decorated than any of us, but this is not how constructive dialogue works. I understand you are terribly frustrated that we are being blind to the truths that seem obvious to you, just like I am unhappy to see some of my arguments overlooked. Can we please focus on the ideas rather than their authors?

Your Wireshark example is instructive because general-purpose packet analyzers are mostly useless with application-layer protocols. Wireshark lets you debug USB or DDS but it is unable to offer anything at the higher layers that these protocols are designed to serve (e.g., UVC streams or ROS topics). Speaking of UAVCAN in particular, Wireshark would be able to discern message/service transfers and segregate them by port-ID, but looking into them is above the layer of abstraction it is designed to manage. Just like you can’t substitute, say, rostopic echo with pcap.

I said that the example is instructive because you chose to constrain your reasoning about the problem at the transport layer. When you look at our proposal from this perspective, it appears nonsensical indeed.

Fixing a bunch of port-IDs for specific messages is not expected to help much because that would bring us back to square one, the v0 design with: 1. rigidly coupled data types and their semantics; 2. instance identifiers (fields like sensor_id) polluting application’s abstractions; 3. non-composable interfaces ignoring the problems addressed by SOA.


I think Kalyan is right, we are not getting anywhere: three months in, still going in circles. I’m not sure if the call would help here but surely we can give it a try.

I imagine that our timezones are roughly like this: https://www.timeanddate.com/worldclock/meetingtime.html?iso=20210719&p1=242&p2=57&p3=234

Next week I am easily available at almost any time in 09:00—01:00 Tallinn time (which is EEST or UTC+3), but mornings suit me better. If you are unable to make it work, let me know and I will stretch my schedule as necessary.

@tridge @dagar @coder_kalyan please state your availability.


I should be able to make myself available at most times of the day from 08:00 to 22:00, stretched if necessary. Mornings and afternoons work better but based on looking at the meeting time link that may not be possible.

To avoid endless discussion: here is a proposed meeting time. It is not ideal for anyone (quite early for Pavel and very late for Daniel) but it’s the best I could find so far.

21:00 PDT == 14:00 ACT == 07:00 EEST == 24:00 EDT

I am not sure about @scottdixon’s time zone or availability but I cannot imagine that his would conflict with the already stretched schedule shown here :slight_smile:. @scottdixon if you are available to join, please let me know, I think it would be beneficial.

@bbworld1 I’d appreciate it if you could make it as well.

Fixing a bunch of port-IDs for specific messages is not expected to help much because that would bring us back to square one, the v0 design with: 1. rigidly coupled data types and their semantics; 2. instance identifiers (fields like sensor_id) polluting application’s abstractions; 3. non-composable interfaces ignoring the problems addressed by SOA.

Amendment: it would definitely help in the sense that it would solve the port type safety and auto-configuration issue at the same time. I don’t think anyone is questioning that because it’s how v0 works and v0 does not suffer from any of these robustness problems. However, as you pointed out, it is not ideal in other regards.