Port type safety enforcement

At the end of the discussion, @pavel.kirienko mentioned to me an example that I am reposting here (Pavel, I hope you don’t mind). I have also added some of my own thoughts to the example to enhance it a bit.

The example was not brought up during the call as it is not in the same domain as UAVCAN or UAVs.

In the early days of consumer computing, PCs were equipped with all sorts of random connectors for various things. These include: parallel port (what Pavel brought up), as well as others - PS/2, RS-232, etc. At the time, computing resources were limited, and it was cheaper to create dedicated interfaces that addressed specific use cases in an efficient manner. However, as applications of computers grew, this quickly got out of hand - computers had far too many random connectors, operating system kernels had quite a lot of drivers, and daisy chaining adapters was common.

Enter the Universal Serial Bus (USB). It is a highly inefficient protocol that relies on multiple layers of abstraction and does not optimize for any specific use cases. It slowly started replacing existing solutions:

  • RS-232 was replaced by the CDC ACM device class
  • PS/2 was replaced by the HID device class
  • parallel port was also replaced
  • USB-C provides charging and configurable power delivery (UCPD) and has replaced dedicated power connectors.
  • In recent years, even dedicated display ports (like HDMI) have been replaced by USB connectors!

Is USB non-specialized? Yes
Does USB introduce more application layer complexity? A lot of it
Is USB slow and inefficient? Absolutely
Do these properties mean it is not in use? Clearly not

Despite these shortcomings, USB had the benefit of standardization and composability. The introduction of a standard connector across the entire ecosystem, one that can still handle custom arbitrary use cases, has proved to be useful. When it was first created, USB 1.0 was very slow (relatively speaking). USB 2.0 high speed and 3.0 super speed fixed this over time - just like CAN-FD and Ethernet will in the future. Did the designers of USB imagine that someone would be plugging a laptop into a workstation setup where 65 watt power, external keyboard and mouse, external audio setup, and two 4K monitors are served over a single port? No.

Let us please be very cautious when over-optimizing a protocol to the current use cases. Future use cases are, by definition, not easy/possible to predict. It is also far easier to introduce more rigidity later on than remove rigidity in a standard that is already widely deployed.

EDIT: I am not removing this message as it does illustrate my point well, but I realize in hindsight that it is a bit too emotionally loaded. I do not want to be a hypocrite by condemning others’ use of strong language while doing the same myself. This post is not meant to offend anyone. It also does not imply that MAVLink is not fit for its current use case - just that it is an example of a very optimized protocol.

To take a more relevant example: MAVLink.

Disclaimer: I do not like MAVLink as a protocol, I believe it has many design flaws. Sorry if this offends anyone.

MAVLink was developed almost a decade ago when the concept of autonomous vehicles was new. It was defined to solve a specific use case: communication of commands, telemetry, and other relevant information between a ground control station and a vehicle using a very constrained transport (433/900 MHz telemetry radios are usually in the kilobit range). It is highly efficient (very little packet overhead) and optimized (fixed message IDs). MAVLink is even worse than v0 in the sense that it does not provide an easy means of creating custom messages for other use cases. It has further issues:

  • It does not provide nearly enough node IDs (1 byte only) for the complex fleet use cases these days (or other use cases)
  • Even the entire node and component ID thing is a huge mess because the standard isn’t specific enough about how to implement it, and hardly anyone follows it
  • Data types are a mess. There are degrees being used everywhere (instead of radians) and other non-SI units that create developer headaches. I myself faced an issue where I had to implement 2 versions of a message (one as an RPC request and one as a pub/sub) where the pub/sub used quaternions but the RPC request used Euler angles due to the limitations of the protocol. This is ridiculous.
  • It is not flexible or configurable, limiting its applicability.

Recently, I have seen the proposed adoption of MAVLink to similar use cases that UAVCAN was designed to target: software defined vehicles. There has been the introduction of a MAVLink gimbal protocol, smart battery protocol, tunnel protocol, camera protocol, and more. I have even seen prototypes of MAVLink over CAN bus. I am surprised and absolutely horrified at this. MAVLink was not created to target such use cases. I have myself tried to implement complex internal logic using MAVLink over UDP, and ended up abandoning the project due to the rigidity and bad design of MAVLink. The software-defined vehicle problem should not be solved using MAVLink.

MAVLink went from being a good protocol for the only use case in existence, to overstaying its welcome, in under a decade. I do not want the same thing to happen to UAVCAN by 2030.

that only works if the id is guaranteed by the design, preferably by the dsdl compiler. If I’ve understood your proposal correctly it is a convention, not a guarantee. Unless it is guaranteed then it is no use at all. We can’t dispatch blobs of bytes to a subsystem in the flight controller based on what amounts to a hint.

it doesn’t achieve what we have in v0 without it being guaranteed by the compiler

you would need to introduce new data types and code in v1 as well. The node that knows the pos/vel/attitude (usually the flight controller) needs to offer that data. It won’t do that without someone writing the code to do that. If multiple nodes offer that data then you still need some way for the gimbal to be told which source of that information to trust. You certainly don’t want the gimbal to just take any position data on the network, as the position data from something like a GNSS is not suitable for a gimbal (due to GNSS lag the gimbal would perform very badly).
Just because two pieces of data are the same type does not mean they are equivalent.

a few counter points.
The mavlink compiler (mavgen) generates code that still works with v0.9 and v1.0, despite v2.0 being introduced several years ago. Users mix and match v1.0 and 2.0 on the same uart. We evolved mavlink in a way that introduced very little pain for users whereas UAVCAN is taking the attitude of “throw it all out, start again and damn the users”.

the way things are going now we can expect UAVCAN to be dead and forgotten well before then. I expect that MAVLink will still be around and widely used.
When we were low on IDs (the 8 bit message ID limitation) we extended it to 24 bits in a way that users didn’t even notice. When we wanted message extensibility we added that in a simple manner that allowed for extensions without breaking existing code.
The design of many of the message types in MAVLink is indeed poor, just as the design of many of the messages in v0 is poor. Neither is nearly as bad as DS-015 was though.
UAVCAN could learn a lot from MAVLink.

Here is another idea. There are several reserved bits in the current v1 beta spec. Here is the current packet format:


Bit 23 is marked as “discard if the bit isn’t 1”. We could change that so that when bit 23 is 0 the subject ID is used as a semantic ID.
Then we’d have DSDL like this:

# GNSS message

@semantic_id 17

uint3 instance
uint3 status
uint32 time_week_ms
uint16 time_week
int36 latitude
int36 longitude
float32 altitude
float32 yaw
uint16 hdop
uint16 vdop
uint8 num_sats
float32 velocity[3]
float16 speed_accuracy
float16 horizontal_accuracy
float16 vertical_accuracy
float16 yaw_accuracy

note the @semantic_id marker. This would do several things:

  • the code generator would automatically fill in the subject_id with the semantic_id when a semantic_id is specified in a message
  • bit 23 would be set to 0 in these messages
  • the compiler would check that there is no re-use of the semantic_id values across all of the DSDL being compiled for the project
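
The last bullet, the compile-time uniqueness check, could look roughly like the following. This is only a sketch under stated assumptions: the `@semantic_id <number>` syntax is taken from the example above, while the function names and the idea of passing definitions as a name-to-text mapping are hypothetical and not part of any existing DSDL tooling.

```python
import re
from collections import defaultdict

# Directive syntax as written in the example message: "@semantic_id <number>".
_SEMANTIC_ID = re.compile(r"^\s*@semantic_id\s+(\d+)\s*(?:#.*)?$", re.MULTILINE)


def collect_semantic_ids(definitions):
    """Map each semantic ID to the names of the definitions that claim it.

    `definitions` maps a definition name to its DSDL source text
    (hypothetical input format). Definitions without the directive are skipped.
    """
    claimed = defaultdict(list)
    for name, text in definitions.items():
        match = _SEMANTIC_ID.search(text)
        if match:
            claimed[int(match.group(1))].append(name)
    return dict(claimed)


def find_conflicts(definitions):
    """Return {semantic_id: [names]} for every ID claimed more than once."""
    return {
        sid: sorted(names)
        for sid, names in collect_semantic_ids(definitions).items()
        if len(names) > 1
    }


if __name__ == "__main__":
    project = {
        "Gnss": "# GNSS message\n\n@semantic_id 17\n\nuint3 instance\n",
        "Magnetometer": "@semantic_id 2\nuint3 instance\n",
    }
    conflicts = find_conflicts(project)
    print("conflicts:", conflicts if conflicts else "none")
```

A real compiler would fail the build when `find_conflicts` returns anything, which is what turns the ID from a hint into something the toolchain enforces.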

This still allows for v1 to support all the SOA capabilities that Pavel wants to see, while also supporting robust sensor networks which is what we need for ArduPilot and PX4. By using up one reserved bit we get past this logjam in our discussions.
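
To make the bit-23 toggle concrete, here is a small sketch of how a frame identifier could carry either kind of ID. The flag bit and its meaning (0 means semantic ID) come from the proposal; the position of the subject-ID field (`SUBJECT_ID_SHIFT`) is an assumption made for illustration, and the 13-bit width is the one mentioned elsewhere in this thread. Check the actual spec before relying on any of these offsets.

```python
SEMANTIC_FLAG_BIT = 23      # bit named in the proposal; 0 => semantic ID
SUBJECT_ID_SHIFT = 8        # ASSUMPTION: offset of the subject-ID field
SUBJECT_ID_MASK = 0x1FFF    # 13-bit subject-ID field
CAN_ID_MASK = 0x1FFFFFFF    # 29-bit extended CAN identifier


def with_subject(can_id, port_id, semantic):
    """Return can_id with the subject-ID field and the bit-23 flag set."""
    if not 0 <= port_id <= SUBJECT_ID_MASK:
        raise ValueError("port_id does not fit in 13 bits")
    can_id &= ~(SUBJECT_ID_MASK << SUBJECT_ID_SHIFT)
    can_id |= port_id << SUBJECT_ID_SHIFT
    if semantic:
        can_id &= ~(1 << SEMANTIC_FLAG_BIT)   # 0: field holds a semantic ID
    else:
        can_id |= 1 << SEMANTIC_FLAG_BIT      # 1: field holds a regular subject ID
    return can_id & CAN_ID_MASK


def read_subject(can_id):
    """Return (port_id, is_semantic) extracted from a frame identifier."""
    port_id = (can_id >> SUBJECT_ID_SHIFT) & SUBJECT_ID_MASK
    is_semantic = ((can_id >> SEMANTIC_FLAG_BIT) & 1) == 0
    return port_id, is_semantic


if __name__ == "__main__":
    gnss = with_subject(0, 17, semantic=True)
    print(read_subject(gnss))
```

A receiver that only understands regular subject IDs would keep discarding frames with bit 23 cleared, which is why this bit was chosen for the proposal.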

Noooooooo! Not our precious reserved bit!?! @kjetilkjeka, Andrew wants our reserved bit! Do you have any idea how hard we fought @pavel.kirienko to get that bit? :grinning:

Seriously though, I think the idea of just having a smattering of fixed port identifiers as part of a super-simple profile is the best way forward and I really don’t want to touch the UAVCAN V1 core specification to make this work (while I don’t expect to use UDRAL I also don’t want it to conflict with other profiles I will use). Again, I’ll raise the not-particularly-apt-but-somewhat-illustrative example of I2C addresses. Millions of I2C devices are manufactured with very little by way of a guarantee that the device address will be unique for a given bus. Manufacturers are willing to live with this because there are probably not many devices on any single I2C bus and they all can probably be configured between a set of two or three different address options and if not the board designer can probably just use another I2C peripheral to isolate the troublesome part, etc. So while I2C addressing should be a shit-show in theory it’s generally fine in practice.

If you define a “simple sensor network” profile as the first output from the UDRAL effort where each manufacturer is given guidance in how they should map port identifiers to datatypes for just this profile and we require that the manufactured-in defaults can be manually tweaked if someone fails to follow the community guidance then I think the semantic mapping of ids should be, generally, fine for the constrained problem space. Even looking at the newest offerings from NXP, there’ll be enough CAN peripherals available that you could just put a poorly configured sensor on its own CAN bus.

So, my concrete proposal is:

  1. define a set of datatypes that compose the UDRAL “Simple Sensor Network Profile.”
  2. maintain a matrix of data types to three port identifiers each (one recommended and two alternative) that is generated from the @semantic_id fields in the profile’s DSDL
  3. publish guidance for how port identifier deconfliction should be accommodated (e.g. solder jumpers, dip switches, UART terminals, USB programmers, etc) but plan for most devices to “just work” when you attach them to a system.
  4. (later/next) develop a more advanced profile (“Advanced Avionics Network” profile, perhaps?) that includes a distributed ledger defining the system configuration including port to data-type identifier mapping. Build on this to implement device attestation and trust boundaries.
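
Items 2 and 3 could be sketched as below: each data type gets one recommended and two alternative port identifiers, and a device falls back to the next alternative only when the earlier one is taken. The matrix contents, port numbers, and function names are all hypothetical; nothing here is defined by UDRAL.

```python
# HYPOTHETICAL matrix: data type -> (recommended, alternative 1, alternative 2).
PROFILE_PORTS = {
    "Magnetometer": (100, 101, 102),
    "Gnss": (110, 111, 112),
}


def choose_port(candidates, in_use):
    """Return the first candidate port ID not already taken, or None."""
    for port_id in candidates:
        if port_id not in in_use:
            return port_id
    return None


def assign_ports(devices, matrix=PROFILE_PORTS, reserved=()):
    """Assign a port ID to each (device_name, data_type) pair in order.

    `reserved` lists port IDs already occupied on the bus. Raises if a
    device has exhausted all three options and needs manual deconfliction.
    """
    in_use = set(reserved)
    assignment = {}
    for device_name, data_type in devices:
        port_id = choose_port(matrix[data_type], in_use)
        if port_id is None:
            raise RuntimeError(f"no free port for {device_name} ({data_type})")
        in_use.add(port_id)
        assignment[device_name] = port_id
    return assignment


if __name__ == "__main__":
    bus = [("mag_a", "Magnetometer"), ("mag_b", "Magnetometer"), ("gps", "Gnss")]
    print(assign_ports(bus))
```

In the common case every device lands on its recommended identifier and “just works”; the alternatives only come into play when two devices of the same kind share a bus.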

Now, this all said, I’m curious Andrew, how does Ardupilot deal with node identifiers today?

Whoops! I see that Scott replied ahead of me, and he doesn’t appear to like this proposal. However:

While I don’t know what @pavel.kirienko will think about modifying the standard like this, I do have to say that this is the best solution I’ve seen so far. It is the only one I’ve read that tries to approach some sort of compromise, and at a superficial level it seems quite reasonable to me.

My only question is: @tridge if this actually is implemented, I would want vendors/firmware to support both at configuration time - otherwise the fragmentation would become annoying. Would you be okay with that? Whether or not ardupilot would implement support for consuming the non-fixed counterparts is, I think, up to you.

Either way: I’d like to thank you for coming up with a solution that doesn’t involve restating our own existing ideas again and again. It really helps to add novel ideas to the discussion.

and I thank you for doing so! now I’ll steal it :slight_smile:
note that there are several reserved bits. Which particular one did you have your eyes on? I picked bit 23 for this proposal, but it could just as easily be one of the other ones.

I’d be perfectly happy to reserve a range of values for the 13 bit subject-ID field for use by the sensor messages instead of grabbing one of the reserved bits. We don’t need 8192 IDs, so the reserved bit is grabbing a lot more space than it needs, but @pavel.kirienko did not seem keen on taking a chunk of that 13 bit space, so this was an alternative.

I2C is really quite different. I2C is a master/slave setup, where devices only answer when they are asked to do something (at least in normal i2c usage, there are other ways to use i2c).
In UAVCAN we really want a sensor to be able to immediately start sending as soon as it is powered up. We have seen cases of sensors browning out in flight and resetting, and we don’t want to wait around for a “we haven’t heard from that sensor in a while, maybe we should probe it and see if it needs reconfiguring” cycle.

one of the things that is really missing in v1 is the semantic CRC we have in mavlink 1.0 and 2.0. For those who don’t know about that, the way it works is this:

this means that if someone is using a different message (different xml) for the same ID then it will almost certainly give a crc failure and the message is discarded by the recipient.
One thing that I regret is seeding the whole of the CRC with the structural CRC. I should have seeded only 8 bits of it, and left 8 bits as a “just the data” CRC. That would mean mavlink routers could get a checksum for message IDs they didn’t have at compile time.
Another subtlety is that you can only run the structural CRC over the base part of the message - any message extensions don’t change the structural CRC or you end up breaking if sender and recipient have different subsets of extensions for a message.
The application of this idea for UAVCAN is complicated by the lack of a CRC on single frame messages. It still would be nice for multi-frame however, and would provide a lot of protection against projects not keeping their DSDL in sync.
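
For readers who have not seen the structural CRC idea, here is a simplified sketch of it. It uses an X.25-style CRC-16 accumulator and folds a hash of the message structure into the checksum as one extra byte. The way `structural_byte` derives that byte (field types of the base message only) is a stand-in chosen for illustration; it is not the exact derivation mavgen uses, and all function names are hypothetical.

```python
def x25_accumulate(byte, crc):
    """Feed one byte into an X.25-style CRC-16 accumulator."""
    tmp = (byte ^ crc) & 0xFF
    tmp = (tmp ^ (tmp << 4)) & 0xFF
    return ((crc >> 8) ^ (tmp << 8) ^ (tmp << 3) ^ (tmp >> 4)) & 0xFFFF


def x25_crc(data, crc=0xFFFF):
    """CRC over a bytes object, starting from the usual 0xFFFF seed."""
    for byte in data:
        crc = x25_accumulate(byte, crc)
    return crc


def structural_byte(base_field_types):
    """Fold a CRC of the base field types into a single byte.

    SIMPLIFIED: extension fields are deliberately left out, so adding an
    extension does not change the result.
    """
    crc = x25_crc(" ".join(base_field_types).encode("ascii"))
    return (crc & 0xFF) ^ (crc >> 8)


def message_checksum(payload, extra):
    """Data CRC with the structural byte mixed in at the end."""
    return x25_accumulate(extra, x25_crc(payload))


def accept(payload, received_checksum, extra):
    """Receiver side: accept only if our own structure yields the same checksum."""
    return message_checksum(payload, extra) == received_checksum


if __name__ == "__main__":
    sender = structural_byte(["uint8", "int16[3]"])
    receiver = structural_byte(["uint8", "int32[3]"])  # definitions drifted apart
    payload = bytes([1, 2, 3, 4, 5, 6, 7])
    checksum = message_checksum(payload, sender)
    print("accepted by receiver:", accept(payload, checksum, receiver))
```

Because the structural byte is mixed in last, any two receivers whose structural bytes differ are guaranteed to compute different checksums for the same payload; the “almost certainly” in the description comes from the chance that two different definitions hash to the same byte.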

i’m not really sure what you’re asking …
we use the CAN node ID as part of device IDs, and we track it in our DNA server (we have our own DNA implementation, we don’t use the libuavcan one). We have arming checks for duplicates discovered by the DNA server, and we persistently store the DNA database as part of hal.storage.
For AP_Periph, we default to CAN_NODE=0 parameter, meaning “use DNA”. Users can change the parameter on the node to get a fixed node ID.
I get the feeling I missed the point of your question though :slight_smile:
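
As an illustration only, a duplicate check of the kind described above might look like this. It is not ArduPilot’s actual implementation; the observation format and function names are hypothetical.

```python
from collections import defaultdict


def find_duplicate_node_ids(observations):
    """Return {node_id: [unique_ids]} for node IDs claimed by several devices.

    `observations` is an iterable of (node_id, hardware_unique_id) pairs,
    e.g. as recorded by a node-ID allocation server (hypothetical format).
    """
    seen = defaultdict(set)
    for node_id, unique_id in observations:
        seen[node_id].add(unique_id)
    return {
        node_id: sorted(unique_ids)
        for node_id, unique_ids in seen.items()
        if len(unique_ids) > 1
    }


def arming_allowed(observations):
    """Pre-arm check: refuse to arm while any node ID is duplicated."""
    return not find_duplicate_node_ids(observations)


if __name__ == "__main__":
    bus = [(10, "a1b2"), (11, "c3d4"), (10, "e5f6")]
    print(find_duplicate_node_ids(bus), arming_allowed(bus))
```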

in the short term we wouldn’t, as it would just waste flash space. If someone can show me a real benefit to users of implementing it then we could add it once the kinks in the config system are all sorted out.
We take a very pragmatic approach in ArduPilot. We will support just about anything if it really does benefit users and the implementation/maintenance cost is not horrific. We have a bunch of different CAN protocols now, added on the basis that they are useful to someone.

I like this reserved bit + semantic_id directive approach. It preserves the flexibility and capability of v1 while also allowing for a simple and clean solution that can be defined in DSDL and checked at compile time; in my opinion it preserves many of the benefits v0 had while also allowing users to take advantage of v1’s flexibility. Such a solution can be implemented into the tooling fairly simply, and the modification involved to the UAVCANv1 standard to accommodate such changes is largely uninvasive as far as I can see. This solution has my full support, for what it’s worth.

@tridge

If you don’t particularly care about which reserved bit we’re using, perhaps one of these would be more preferable? They’re currently completely unused (ignored by the specification).

thanks!

thanks for checking if they are used. I chose 23 as it is marked as an incompatible bit (i.e. don’t process the packet if it’s not set to 1). Using 21 or 22 would be fine as long as there aren’t v1 implementations out there that will be unhappy about the packets sent by this semantic_id approach. I suspect it’s early enough in the v1 release process that we can re-purpose one of those bits, but Pavel may know of deployments that would care.

I think you understand my proposal correctly, yes. I am wary of using terms like “convention” or “guarantee” at this point though because our environment is not that well formalized so the difference is not clear-cut, which, I think, matters.

Say, in the common usage scenario, v0 ensures that you don’t accidentally network-cast (nice term by the way) your data to a wrong type, so that’s safe. But it doesn’t provide stronger guarantees than v1 does regarding the instance-ID: say, if you were to misidentify the feed from sensor X as sensor Y, drastic consequences may result despite the data being of the same type (e.g., you could have two magnetometers oriented differently). The instance identification is implemented not at the compile time (unlike data type ID) but at the configuration time (by assigning the node-ID and/or sensor-ID). Following your terminology, it would be a “convention”, not a “guarantee”. Am I reading you correctly?

You seem to be accepting that as safe with the help of your arming checks that ensure that the node-IDs are unique within the system. I don’t think this is fundamentally different from v1, where similar checks can be easily automated as well. Let me summarize so that you could double-check that I really understand where you are coming from:

Property checked at… | v0                 | v1
---------------------|--------------------|--------------------
Data type            | compile-time       | configuration-time
Data semantics       | N/A                | configuration-time
Service instance     | configuration-time | configuration-time

In order to implement a safe service client, the full triple is required. For instance, using the magnetometer example, we need to know:

  • the type (is it a magnetic field strength type?)
  • the semantics (is this magnetic field reading intended for navigation or is it coming from the payload (like gimbal)?)
  • the instance (is it the sensor A, which is oriented (0,180,0), or sensor B, which is oriented (90,90,0)?)

v0 statically ensures (type+semantics) but not instance. v1 statically ensures none. Both have the potential for misconfiguration. Admittedly, v1 contains two more dimensions to be misconfigured, but it is not fundamentally different.

Just like your PnP node-ID allocator can take care of these checks for many use cases, my port autoconfiguration method can offer equivalent user experience (actually better user experience since it can automate more cases as I wrote earlier).
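
A rough sketch of what such an automated configuration-time check could look like is given below. It only covers the data type dimension: it compares the type each subscriber expects on a port with the type the publisher on that port declares. The input format, the type names, and the function name are hypothetical; semantics and instance checks would need additional configuration data that is not modelled here.

```python
def check_port_config(publishers, subscribers):
    """Return a list of human-readable problems; an empty list means OK.

    Both arguments are lists of (node_name, port_id, type_name) tuples,
    e.g. as read back from each node's configuration (hypothetical format).
    """
    problems = []
    published = {}  # port_id -> (node_name, type_name) of the first publisher
    for node, port_id, type_name in publishers:
        first_node, first_type = published.setdefault(port_id, (node, type_name))
        if first_type != type_name:
            problems.append(
                f"port {port_id}: {node} publishes {type_name}, "
                f"but {first_node} publishes {first_type}"
            )
    for node, port_id, type_name in subscribers:
        if port_id not in published:
            problems.append(
                f"port {port_id}: {node} expects {type_name}, "
                "but nothing publishes there"
            )
        elif published[port_id][1] != type_name:
            pub_node, pub_type = published[port_id]
            problems.append(
                f"port {port_id}: {node} expects {type_name}, "
                f"but {pub_node} publishes {pub_type}"
            )
    return problems


if __name__ == "__main__":
    pubs = [("mag_a", 100, "MagneticField"), ("baro", 101, "Pressure")]
    subs = [("autopilot", 100, "MagneticField"), ("autopilot", 101, "MagneticField")]
    for line in check_port_config(pubs, subs):
        print(line)
```

Run as a pre-arm check, a non-empty result would block arming in the same way a duplicate node-ID does today.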

We agree on all of the points listed. I think I wasn’t clear enough, let me correct that.

New functionality certainly won’t appear magically so:

  • someone needs to design network services to model it (which usually but not always involves crafting new data types);
  • someone needs to write code to implement these services;
  • someone needs to configure nodes so that they are properly connected to said services (this means setting port-IDs).

The difference is that per SOA, v1 allows you to extensively reuse network services without the need to patch the network with ad-hoc specialized interfaces (types). This is the critical advantage that is necessary to make UAVCAN long-term viable and usable outside of small drones.

sus

Are you trying to smuggle data type IDs back into the Spec (disguised under a different name)? I think you are.

I think it is a bad (and quite complex) proposal because it perpetuates the deficiencies of v0 that I am not ready to tolerate. You already know my position on this. We did not yet exhaust our options in idiomatic v1 so let’s put your proposal aside for now.

Very true. I consider this as one of the fundamental advantages of UAVCAN over higher-level solutions like DDS.

The lack of a single-frame CRC is the lesser of the problems here. The really critical problem is that UAVCAN has to rely on polymorphism to make advanced and extensible interfaces possible, which is incompatible with rigid hashing. This is a known issue but it cannot be addressed with a simple CRC, which is why it was removed. More on this:

For use cases ArduPilot is concerned about, this is addressable using preflight checks as I mentioned above.

not even close.
The sensor_id in 1002.MagneticFieldStrength2.uavcan is in the packet. First the v0 type safety guarantees you are getting a 1002.MagneticFieldStrength2.uavcan packet, then directly inside that packet you have the sensor_id.
In v1 the “configuration” based check is distributed. You don’t get information with each packet on either the type or the instance ID.
The big distinction I am making is that type safety needs to be in the packet.

you know better than that. Stop trying to dress this up as something it’s not.

again, totally different things. The consequences of the DNA being broken and ending up with two nodes having the same ID is fairly benign. We don’t end up casting baro data to compass data.

Calling it ad-hoc when it isn’t the design you like doesn’t make it bad. Your SOA approach is just as ad-hoc as adding a message to v0.

yes. i’m trying to do whatever it takes to keep UAVCAN viable. That means finding some way past the blockage formed by your ideas on what makes a good protocol for sensors on CAN in UAVs.
The hand waving approach to type safety that you are advocating in v1 just doesn’t cut it. Type safety really matters. You can still compose packets all you like.

then come up with something that has compile time guaranteed type safety and can work with a network analyser like we have in v0.
SOA and type safety are not incompatible, you just need to have the semantic type in the packet. The SOAP protocol (which is the poster child for SOA) does this all the time, by putting the string request type at the top of every request. Here is a typical SOAP request (this is the SOAP FlightAxis protocol used for connecting flight controllers to the RealFlight simulator):
[image: screenshot of a SOAP request from the FlightAxis protocol]
see the “soapaction” field? That is a semantic ID. This is normal in SOAP. Why? Because eliminating the semantic type in packets is a really really bad idea.
See SOAP spec:

The hash can be on the structure only, not the field names. It is meant to protect against the evolution of the DSDL within two different projects (eg. PX4 and ArduPilot) that want to be able to communicate reliably. It has absolutely nothing to do with polymorphism.

not even close.

At this point, I am becoming increasingly cautious about taking sides in this conversation. As such, I’m going to try to ignore my own opinion on this and ask questions (to everyone) as a 3rd party.

This worries me, because if AP_Periph does not support the flexible approach, it would harm interoperability in other domains.

It is different - from what I can see, the failure modes caused by not ensuring type or semantics are far more drastic than instance IDs. It is also easier to figure out instance IDs automatically if you are 100% sure the type and semantics are correct.

Yes he is, but in a non-intrusive fashion. For that specific reason, I support Andrew’s idea in this case. If that bit is kept at 1, network services can carry on as if nothing was ever hard-coded.

Upon thinking more, I have another concern with this proposal (although originally I really liked it): in order to support the fixed-id use case, messages would need to be duplicated for each specific type and semantic meaning. This would conflict with the benefits of having non-fixed port IDs in the first place.

This is the one point that is still underemphasized despite reiteration - some optimization-related sacrifices must be made in order to make UAVCAN applicable to a wider domain across a longer time scale. There are some sacrifices - a direct consequence of designing services that are applicable to a wider scope than what is used in the present day - but I believe these sacrifices can be overcome and are worth it in the long run, but @tridge may not agree.

What I don’t understand is why? Maybe I’m missing something extremely obvious but I am yet to come across an example where an auto configurator would break down.

I think you misinterpreted what Pavel is saying. Ad-hoc is not a good word here; specialized is more accurate. He is saying that he is opposed to any overly domain or equipment-specialized messages because such messages limit the reusability and applicable scope of any network service across future use cases and the broader UAVCANv1 ecosystem.

There are two completely independent problems here, so let’s please treat them as such.

  1. Compile time guaranteed type safety:
    Yes, you are correct. The current proposal does not guarantee this. It is unfortunate but I still can’t see any actual problems occurring as a result of this.
  2. Network analysis
    It will be some more time before I have a prototype, but this can definitely be arranged if an application layer bus monitor (such as post-mortem playback in Yukon) has context about port IDs. This can be achieved by looking at the same registers that the ID allocator does, or by just asking the allocator directly. Keep in mind that these configurations are reasonably static, so once exported (or manually input, even though this is quite cumbersome) once, they apply to any future log.

As stated before, the concern in general is not that we will run out of IDs. I hope you understand the following already, @tridge, but I will state it again: my main problem with introducing standardized semantic IDs (other than the service classes I already proposed for UDRAL, which are serialized at a packet level) is that it is limited to only those use cases that the original standards designer can think up, or at the very least, introduces (un)intentional biases towards such semantics. This restricts integrators from using these services in a semantic configuration that is not envisioned by the original standards designer.

I am satisfied with the reserved bit toggle because these restrictions can be turned off, no questions asked, by a network service provider or consumer - but they are still there for those who want them. Pavel is of the opinion that he doesn’t even want such semantic associations anywhere in the standard because they have the potential to encourage bad practices, which is why he suggested enforcing it at the autopilot ecosystem level. I don’t agree with that solution because it doesn’t solve the original problems you intended it to, which is static type and semantic safety across the entire UDRAL ecosystem.

not at all. Take the magnetic field message:

@semantic_id 2
uint3 instance
int16 field_mgauss[3] # in milliGauss

it contains a magnetic field. The recipient needs to know it contains a magnetic field (and the units) to make any use of it. How can you “re use” this as anything other than a magnetic field? Changing the subject ID you send on doesn’t magically make this a gyro rate or an acceleration.
You can combine this magnetic field message with any other number of messages to achieve whatever goals you like - so it is composable. When you do that the bits in this message still represent a magnetic field.

nope. That magnetic field will always be a magnetic field no matter how much you play with service ID assignments. There is no applicability of this data outside of it being a magnetic field.
The semantic_id doesn’t force you to use this field in a particular way - you can use magnetic fields in lots of different ways. What you can’t sanely do is use it as an air pressure or a gyro rate. That is just nonsense.
The whole idea that we will turn a message that has a specific set of semantics into something grander through removing the semantic type is just rubbish.

strong typing is one of the bedrocks of reliable computing. It is why we use classes, types etc in languages like C++. We want to guarantee that we don’t pass a magnetic field into a piece of code expecting a pressure. Languages that guarantee that at compile time are preferred as they lead to more robust systems and fewer aircraft ending up in pieces on the ground due to bugs.
The network equivalent of that is putting an ID in each packet saying what the semantic type of the packet is. That is not a difficult concept, and it is the way it has been done for decades in network protocols because it is simple and it is robust. It is also done in SOA systems and is completely compatible with that.
You could have dynamic address assignment and still have semantic IDs in packets. We have that in v0 with the DNA dynamic address assignment. We could add another layer of ports in between and still keep the semantic type ID somewhere else in the packet (that is what my first CANDevices proposal did). I don’t think that extra layer of runtime assigned port IDs actually adds any value, but it is possible.
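
A minimal sketch of what “an ID in each packet” buys the receiver: the decoder is selected by the semantic ID carried with the data, and anything without a registered decoder is dropped instead of being reinterpreted. The semantic ID value 2 and the field names are taken from the magnetic field example in this thread; the byte-aligned wire layout and the function names are assumptions made to keep the example short, not the real DSDL serialization.

```python
import struct

MAG_SEMANTIC_ID = 2  # value used in the magnetic field example in this thread


def decode_magnetic_field(payload):
    """Decode a magnetic field message.

    ASSUMED layout for illustration: one byte holding the 3-bit instance,
    followed by three little-endian int16 values in milliGauss.
    """
    instance, x, y, z = struct.unpack("<Bhhh", payload)
    return {"instance": instance & 0x07, "field_mgauss": (x, y, z)}


# Dispatch table keyed by the semantic ID found in the packet.
DECODERS = {MAG_SEMANTIC_ID: decode_magnetic_field}


def dispatch(semantic_id, payload):
    """Decode the payload by its semantic ID; return None for unknown IDs."""
    decoder = DECODERS.get(semantic_id)
    if decoder is None:
        return None  # unknown type: drop rather than cast to something else
    return decoder(payload)


if __name__ == "__main__":
    frame = struct.pack("<Bhhh", 1, 250, -30, 410)
    print(dispatch(MAG_SEMANTIC_ID, frame))
    print(dispatch(99, frame))
```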

Yes. I think you misread my point again. Of course that type will always be a magnetic field. However, it may not always be a magnetic field for use in a UAV for heading estimation. Type != semantic usage. This is also the reason I am strongly in favor of enforcing SI units and semantic splitting of messages, as they make the same data applicable to more use cases.

I am not saying it is. I understand it perfectly. I am saying that it puts an implicit boundary on the subset of applications that can use the message by making associations between type and semantic usage where there should not be any, at the standards level.

the semantic_id just says it’s a magnetic field. It does not say you have to use it for heading. That is up to the consumer of the data.

it does no such thing. It just says what the data is. It says nothing about how someone should use it.

In that case, I find this somewhat acceptable. Not ideal by any stretch in my mind, but acceptable.

However, are you content with just solving the type safety? What about the semantic usage and instance ID safety problem (using the gimbal’s magnetometer for vehicle heading reference)?

Pavel may not find this acceptable, but I will let him make his own conclusions.

We can still do both if you’re worried about building up a library of re-usable types.

# MagneticField.uavcan
int16 field_strength_milligauss[3] 
# Magnetometer.uavcan
@semantic_id 2
uint3 instance
MagneticField raw_value