Port type safety enforcement

scottdixon · July 19, 2021, 4:46pm

I’m PDT. So, this meeting proposal is for when? I see 21:00 PDT proposed as a time but I’m not clear what the day is.

pavel.kirienko · July 19, 2021, 4:49pm

We don’t know yet either but let’s say it’s Wed/Thu/Fri?

Also, I sure hope we could do it later because 07:00 is a bit too extreme for me. If @dagar happens to be an owl like I am, this might just work.

coder_kalyan · July 19, 2021, 4:57pm

@pavel.kirienko does 8:00 work for you, or should we go later than that? I should be fine but I don’t want to impose on @dagar too much.

As for the date, I don’t see much of a reason to push it back… today might be too early for some others (although I am free, so LMK), so if that doesn’t work, maybe Tuesday night North America / Wednesday morning Europe?

scottdixon · July 19, 2021, 5:34pm

I can do 22:00 PDT on my Tuesday evening; otherwise, 23:00 actually works better.

coder_kalyan · July 19, 2021, 5:37pm

23:00 is a stretch for me, but also very late for @dagar .

Ahh, timezones

coder_kalyan · July 19, 2021, 5:42pm

Oh, and I spoke to @bbworld1 and he said he could make it at 22:00 PDT as well. He’s more of an owl than I am, so 23:00 should work for him if necessary.

pavel.kirienko · July 19, 2021, 5:44pm

Okay so: @dagar can you tolerate 02:00? @tridge are you available on Wed 16:00 your time?

tridge · July 19, 2021, 9:34pm

Each of those things is just using disparaging language for what is clearly an advantage for a sensor network protocol for UAVs.
Rigidly coupling data types and semantics is what gives you robustness (and the ability to make a sane network analyser).
Instance identifiers give us a way to speak clearly about the sensor instances a node provides. When a node provides two GPS instances we want to know which is the one on the left wing and which is the one on the right wing, as position matters. We don’t want to force that to be auto-assigned. Same goes for many other sensor types.
On composing interfaces, why do we actually want to compose a GPS message with a magnetometer or any other combination?
With each of these attributes you are forcing bad properties on the protocol for no gain for the actual users of the protocol.

The v1 protocol already has a bunch of fixed IDs for specific messages. I want to reserve a range for UAV sensors so that we get the properties we need for a UAV sensor network. We don’t need a huge number. We have fewer than 50 in use in v0 after many years. We’d actually end up with fewer in v1 as we wouldn’t have the Fix/Fix2/aux mess. The extensibility of messages in v1 means we shouldn’t get the same growth in numbers we had in v0.

tridge · July 19, 2021, 9:35pm

I could, but I’d have a hard limit of 1 hour as the weekly ArduPilot EU dev call starts at 5pm on Wednesday my time.

coder_kalyan · July 19, 2021, 9:39pm

@tridge wrote: “I could, but I’d have a hard limit of 1 hour as the weekly ArduPilot EU dev call starts at 5pm on Wednesday my time.”

@tridge I think this is acceptable. If we make any progress on the call, we can schedule another one.

@tridge @pavel.kirienko Or we could move it back one hour, which would require you to get up an hour earlier but give us 2 hours to work. I am fine with either. @dagar?

dagar · July 20, 2021, 1:56am

I’m late to the party, so I’ll accommodate whatever works.

coder_kalyan · July 20, 2021, 2:03am

Alright. I propose a compromise:

We move it back 30 minutes, so 22:30 PDT == 01:30 EDT == 15:30 AEST == 08:30 EEST

Vincent and Scott said they would be available at that time
Slightly easier for Daniel, maybe still doable for Pavel?
A bit earlier for me is preferable
Gives us 1.5 hours to discuss before Andrew needs to drop off. I think this is a perfect amount of time to discuss, after which time people will start getting tired and frustrated. If the meeting goes well, and there is more to discuss, we can schedule another call.

@scottdixon @tridge @pavel.kirienko @dagar Does this work for you?

See you guys tomorrow night/morning/afternoon!

pavel.kirienko · July 20, 2021, 1:59pm

Deal. The meeting point is here: https://meet.jit.si/UAVCANWeeklyCall

pavel.kirienko · July 21, 2021, 11:49am

This is NOT a summary of today’s call (Kalyan may post one later) but an illustration to go with what I said.

Say, we have an airspeed sensor. Per v1 design, its messages have no fixed port-ID, so when a new sensor is connected to the network, it just sits there silent until configured. Suppose that ArduPilot defines a table that goes like this:

Function	Subject-ID
Airspeed sensor #0 pressure	1000
Airspeed sensor #0 temperature	1001
Airspeed sensor #1 pressure	1002
Airspeed sensor #1 temperature	1003
…	…

(whether pressure and temperature have to be separated is irrelevant at this moment, either approach is fine)

This table does not necessarily have to be part of the standard. For instance, PX4 may manage without any predefined ranges at all (which it probably will considering @PetervdPerk’s work on DS-015 so far). ArduPilot may be able to implement a trivial mapping between port-IDs from the table and its internal device-ID via bitfield manipulations.
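As a rough illustration of the bitfield mapping mentioned above, here is a minimal Python sketch. It assumes the layout of the example table (base subject-ID 1000, two consecutive subject-IDs per airspeed sensor, pressure first). The names `AIRSPEED_BASE`, `AIRSPEED_MAX_INSTANCES`, `decode_subject_id`, and `encode_subject_id` are made up for this example and are not part of ArduPilot or UAVCAN.

```python
from typing import NamedTuple, Optional

# Assumed layout, taken from the example table (not a standard):
#   subject-ID = AIRSPEED_BASE + ((instance << 1) | function_index)
AIRSPEED_BASE = 1000          # first subject-ID of the hypothetical range
AIRSPEED_MAX_INSTANCES = 8    # illustrative range size, chosen arbitrarily
FUNCTIONS = ("pressure", "temperature")


class AirspeedPort(NamedTuple):
    instance: int   # sensor index, e.g. 0 or 1
    function: str   # "pressure" or "temperature"


def decode_subject_id(subject_id: int) -> Optional[AirspeedPort]:
    """Map a subject-ID to (instance, function), or None if outside the range."""
    offset = subject_id - AIRSPEED_BASE
    if not 0 <= offset < AIRSPEED_MAX_INSTANCES * len(FUNCTIONS):
        return None
    # Lowest bit selects the function, the remaining bits select the instance.
    return AirspeedPort(instance=offset >> 1, function=FUNCTIONS[offset & 1])


def encode_subject_id(instance: int, function: str) -> int:
    """Inverse mapping: (instance, function) -> subject-ID."""
    if not 0 <= instance < AIRSPEED_MAX_INSTANCES:
        raise ValueError("instance out of range")
    return AIRSPEED_BASE + ((instance << 1) | FUNCTIONS.index(function))
```

With this layout, `decode_subject_id(1002)` yields instance 1, function "pressure", which matches the third row of the table.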

So in order to connect a new sensor, we have to configure the subject-IDs appropriately. Suppose that initially, in the early days of UDRAL, we expect the human to do that manually. The user experience is terrible and the potential for misconfiguration is huge. However:

Automation for non-critical setups will arrive later (see my PoC).
High-integrity setups are expected to prefer manual configuration.
The effects of misconfiguration are contained at the initial commissioning stage and under reasonable assumptions cannot manifest in-flight. Hence, the overall operational safety of the vehicle is not affected.

When ArduPilot receives a message from subject 1002, it handles it exactly the same way as if it received a v0 message like 1027.RawAirData with a sensor-ID field in it (well, the one we use in this example doesn’t have a sensor-ID, but suppose that it does). The node-ID is irrelevant and is not expected to be used in this design except for RPC calls (a detailed explanation of why reliance on node-IDs at the application layer is evil is given in the Interface Design Guidelines; tldr: anonymous DCPS abstracts the origin/destination away, reaching to the node-ID constitutes a leaky abstraction). The remote register lookup that Andrew mentioned does not occur at any point (it was never part of the design).

If a node contains several sensors, then it will publish on multiple subjects concurrently. The integrator will have the option to manage bandwidth utilization by selectively enabling only those subjects that are of relevance for the system at hand.
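From the integrator’s side, the selective enabling described above could look roughly like the sketch below. The port names, the plain-dictionary configuration, and the use of 65535 as an “unconfigured” marker are assumptions made for illustration only; a real node would expose this through its own configuration interface.

```python
from typing import Dict, Iterable, List

UNSET = 65535  # assumed sentinel meaning "this publisher is not configured"


def configure_publishers(available_ports: Iterable[str],
                         assignments: Dict[str, int]) -> Dict[str, int]:
    """Return port name -> subject-ID; ports without an assignment stay UNSET."""
    ports = list(available_ports)
    unknown = set(assignments) - set(ports)
    if unknown:
        raise KeyError(f"node does not offer these ports: {sorted(unknown)}")
    ids = list(assignments.values())
    if len(ids) != len(set(ids)):
        raise ValueError("the same subject-ID was assigned to two ports")
    return {name: assignments.get(name, UNSET) for name in ports}


def enabled_ports(config: Dict[str, int]) -> List[str]:
    """Names of the ports that will actually publish and consume bandwidth."""
    return sorted(name for name, subject_id in config.items()
                  if subject_id != UNSET)
```

For example, a node offering hypothetical ports `airspeed.pressure`, `airspeed.temperature`, and `magnetometer` could be given subject-IDs only for the first two, leaving the magnetometer silent.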

ArduPilot’s own log analysis software will know how to interpret sensor messages correctly since it will use the aforementioned table as a reference. Other systems (e.g. PX4) may approach this entirely differently, I don’t suppose we need to discuss this at this moment but we can do it later if there is interest.

I see how this can be perceived as a somewhat unconventional way of attaining the same goals as v0 did. The critical advantage offered by this architecture is its ability to accommodate new functionality without the need to change data type definitions and introduce modifications into existing components (which is of paramount importance for high-integrity deployments). Recall the two examples I introduced at the call: a gimbal with geotagging and a servo with its position feedback loop closed through the network. As I attempted to illustrate during the call, the v0-style approach does not allow one to address these use cases sensibly, forcing one to either introduce highly specialized ad-hoc data types (which is disastrous for the ecosystem in the long term) or to force high logical coupling between independent components of the system. Note that I don’t want us to spend too much time assessing practical usage scenarios where v0 falls short and thinking about how to improve it, because we will end up overfitting our solution to only those scenarios we can come up with. Instead, the preferred strategy is to rely on the well-known principles of SOA, where the same problems have been addressed long before us.

Vendors of compliant hardware are critically interested in the adoption of this enhanced, more abstract architecture, because it allows them to significantly expand the target market from only UAV to adjacent domains such as certain robotic systems. Anticipating objections, I don’t think relying on their direct input here would be wise, as the example of the old DS-015 thread we all remember indicated that many of the participants simply failed to grasp the topic of the conversation.

I am not sure yet what the next steps should be. I feel like we were making some progress at the call towards convergence. I think we should repeat that exercise again but @coder_kalyan, @tridge, and @dagar should post their notes/observations/feedback first.

coder_kalyan · July 21, 2021, 6:10pm

At the end of the discussion, @pavel.kirienko mentioned to me an example that I am reposting here (Pavel, I hope you don’t mind). I have also added some of my own thoughts to the example to enhance it a bit.

The example was not brought up during the call as it is not in the same domain as UAVCAN or UAVs.

In the early days of consumer computing, PCs were equipped with all sorts of random connectors for various things. These include: parallel port (what Pavel brought up), as well as others - PS/2, RS-232, etc. At the time, computing resources were limited, and it was cheaper to create dedicated interfaces that addressed specific use cases in an efficient manner. However, as applications of computers grew, this quickly got out of hand - computers had far too many random connectors, operating system kernels had quite a lot of drivers, and daisy chaining adapters was common.

Enter the Universal Serial Bus (USB). It is a highly inefficient protocol that relies on multiple layers of abstraction and does not optimize for any specific use cases. It slowly started replacing existing solutions:

RS-232 was replaced by the CDC ACM device class
PS/2 was replaced by the HID device class
parallel port was also replaced
USB-C provides charging and configurable power delivery (UCPD) and has replaced dedicated power connectors.
In recent years, even dedicated display ports (like HDMI) have been replaced by USB connectors!

Is USB non-specialized? Yes
Does USB introduce more application layer complexity? A lot of it
Is USB slow and inefficient? Absolutely
Do these properties mean it is not in use? Clearly not

Despite these shortcomings, USB had the benefit of standardization and composability. The introduction of a standard connector across the entire ecosystem, one that can still handle arbitrary custom use cases, has proved to be useful. When it was first created, USB 1.0 was very slow (relatively speaking). USB 2.0 high speed and 3.0 super speed fixed this over time - just like CAN-FD and Ethernet will in the future. Did the designers of USB imagine that someone would be plugging a laptop into a workstation setup where 65-watt power, an external keyboard and mouse, an external audio setup, and two 4K monitors are served over a single port? No.

Let us please be very cautious when over-optimizing a protocol to the current use cases. Future use cases are, by definition, not easy/possible to predict. It is also far easier to introduce more rigidity later on than remove rigidity in a standard that is already widely deployed.

coder_kalyan · July 21, 2021, 6:28pm

EDIT: I am not removing this message as it does illustrate my point well, but I realize in hindsight that it is a bit too emotionally loaded. I do not want to be a hypocrite by condemning others’ use of strong language while doing the same myself. This post is not meant to offend anyone. It also does not imply that MAVLink is not fit for its current use case - just that it is an example of a very optimized protocol.

To take a more relevant example: MAVLink.

Disclaimer: I do not like MAVLink as a protocol, I believe it has many design flaws. Sorry if this offends anyone.

MAVLink was developed almost a decade ago when the concept of autonomous vehicles was new. It was defined to solve a specific use case: communication of commands, telemetry, and other relevant information between a ground control station and a vehicle using a very constrained transport (433/900 MHz telemetry radios are usually in the kilobit range). It is highly efficient (very little packet overhead) and optimized (fixed message IDs). MAVLink is even worse than v0 in the sense that it does not provide an easy means of creating custom messages for other use cases. It has further issues:

It does not provide nearly enough node IDs (1 byte only) for the complex fleet use cases these days (or other use cases)
Even the entire node and component ID thing is a huge mess because the standard isn’t specific enough about how to implement it, and hardly anyone follows it
Data types are a mess. Degrees are used everywhere (instead of radians), along with other non-SI units that create developer headaches. I myself faced an issue where I had to implement 2 versions of a message (one as an RPC request and one as a pub/sub) where the pub/sub used quaternions but the RPC request used Euler angles due to the limitations of the protocol. This is ridiculous.
It is not flexible or configurable, limiting its applicability.

Recently, I have seen MAVLink proposed for use cases similar to those UAVCAN was designed to target: software-defined vehicles. There has been the introduction of a MAVLink gimbal protocol, smart battery protocol, tunnel protocol, camera protocol, and more. I have even seen prototypes of MAVLink over CAN bus. I am surprised and absolutely horrified at this. MAVLink was not created to target such use cases. I have myself tried to implement complex internal logic using MAVLink over UDP, and ended up abandoning the project due to the rigidity and bad design of MAVLink. The software-defined vehicle problem should not be solved using MAVLink.

MAVLink went from being a good protocol for the only use case in existence, to overstaying its welcome, in under a decade. I do not want the same thing to happen to UAVCAN by 2030.

tridge · July 21, 2021, 10:28pm

That only works if the ID is guaranteed by the design, preferably by the DSDL compiler. If I’ve understood your proposal correctly, it is a convention, not a guarantee. Unless it is guaranteed, it is no use at all. We can’t dispatch blobs of bytes to a subsystem in the flight controller based on what amounts to a hint.

It doesn’t achieve what we have in v0 without it being guaranteed by the compiler.

You would need to introduce new data types and code in v1 as well. The node that knows the pos/vel/attitude (usually the flight controller) needs to offer that data. It won’t do that without someone writing the code to do that. If multiple nodes offer that data then you still need some way for the gimbal to be told which source of that information to trust. You certainly don’t want the gimbal to just take any position data on the network, as the position data from something like a GNSS is not suitable for a gimbal (due to GNSS lag the gimbal would perform very badly).
Just because two pieces of data are the same type does not mean they are equivalent.

tridge · July 21, 2021, 10:40pm

A few counterpoints.
The MAVLink compiler (mavgen) generates code that still works with v0.9 and v1.0, despite v2.0 being introduced several years ago. Users mix and match v1.0 and 2.0 on the same UART. We evolved MAVLink in a way that introduced very little pain for users, whereas UAVCAN is taking the attitude of “throw it all out, start again and damn the users”.

The way things are going now, we can expect UAVCAN to be dead and forgotten well before then. I expect that MAVLink will still be around and widely used.
When we were low on IDs (the 8-bit message ID limitation) we extended it to 24 bits in a way that users didn’t even notice. When we wanted message extensibility we added that in a simple manner that allowed for extensions without breaking existing code.
The design of many of the message types in MAVLink is indeed poor, just as the design of many of the messages in v0 is poor. Neither is nearly as bad as DS-015 was though.
UAVCAN could learn a lot from MAVLink.

tridge · July 22, 2021, 12:13am

Here is another idea. We have several reserved bits in the current v1 beta spec. Here is the current packet format:

Bit 23 is marked as discard if bit isn’t 1. We could change that to be that when bit 23 is 0 the subject ID is used as a semantic ID.
Then we’d have DSDL like this:

# GNSS message

@semantic_id 17

uint3 instance
uint3 status
uint32 time_week_ms
uint16 time_week
int36 latitude
int36 longitude
float32 altitude
float32 yaw
uint16 hdop
uint16 vdop
uint8 num_sats
float32[3] velocity
float16 speed_accuracy
float16 horizontal_accuracy
float16 vertical_accuracy
float16 yaw_accuracy

Note the @semantic_id marker. This would do several things:

the code generator would automatically fill in the subject_id with the semantic_id when a semantic_id is specified in a message
bit 23 would be set to 0 in these messages
the compiler would check that there is no re-use of the semantic_id values across all of the DSDL being compiled for the project
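The uniqueness check in the last point can be sketched in a few lines of Python. This is not how any existing DSDL compiler works: `@semantic_id` is only the directive proposed in this post, and the function names and the regular expression below are assumptions made for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches the *proposed* directive on its own line, e.g. "@semantic_id 17".
SEMANTIC_ID_RE = re.compile(
    r"^[ \t]*@semantic_id[ \t]+(\d+)[ \t]*(?:#.*)?$", re.MULTILINE
)


def collect_semantic_ids(definitions: Dict[str, str]) -> Dict[int, List[str]]:
    """definitions: type name -> DSDL source text. Returns id -> type names."""
    users: Dict[int, List[str]] = defaultdict(list)
    for type_name, text in definitions.items():
        for match in SEMANTIC_ID_RE.finditer(text):
            users[int(match.group(1))].append(type_name)
    return dict(users)


def find_conflicts(definitions: Dict[str, str]) -> Dict[int, List[str]]:
    """Return only the semantic IDs claimed by more than one definition."""
    return {
        semantic_id: sorted(names)
        for semantic_id, names in collect_semantic_ids(definitions).items()
        if len(names) > 1
    }
```

A build step could call `find_conflicts` over all DSDL files in the project and fail if the result is non-empty, which is what would turn the ID from a convention into something checked at compile time.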

This still allows for v1 to support all the SOA capabilities that Pavel wants to see, while also supporting robust sensor networks which is what we need for ArduPilot and PX4. By using up one reserved bit we get past this logjam in our discussions.

scottdixon · July 22, 2021, 3:47am

Noooooooo! Not our precious reserved bit!?! @kjetilkjeka, Andrew wants our reserved bit! Do you have any idea how hard we fought @pavel.kirienko to get that bit?

Seriously though, I think the idea of just having a smattering of fixed port identifiers as part of a super-simple profile is the best way forward and I really don’t want to touch the UAVCAN V1 core specification to make this work (while I don’t expect to use UDRAL I also don’t want it to conflict with other profiles I will use). Again, I’ll raise the not-particularly-apt-but-somewhat-illustrative example of I2C addresses. Millions of I2C devices are manufactured with very little by way of a guarantee that the device address will be unique for a given bus. Manufacturers are willing to live with this because there are probably not many devices on any single I2C bus and they all can probably be configured between a set of two or three different address options and if not the board designer can probably just use another I2C peripheral to isolate the troublesome part, etc. So while I2C addressing should be a shit-show in theory it’s generally fine in practice.

If you define a “simple sensor network” profile as the first output from the UDRAL effort, where each manufacturer is given guidance on how they should map port identifiers to datatypes for just this profile, and we require that the manufactured-in defaults can be manually tweaked if someone fails to follow the community guidance, then I think the semantic mapping of IDs should be, generally, fine for the constrained problem space. Even looking at the newest offerings from NXP, there’ll be enough CAN peripherals available that you could just put a poorly configured sensor on its own CAN bus.

So, my concrete proposal is:

define a set of datatypes that compose the UDRAL “Simple Sensor Network Profile.”
maintain a matrix of data types to three port identifiers each (one recommended and two alternative) that is generated from the @semantic_id fields in the profile’s DSDL
publish guidance for how port identifier deconfliction should be accommodated (e.g. solder jumpers, dip switches, UART terminals, USB programmers, etc) but plan for most devices to “just work” when you attach them to a system.
(later/next) develop a more advanced profile (“Advanced Avionics Network” profile, perhaps?) that includes a distributed ledger defining the system configuration including port to data-type identifier mapping. Build on this to implement device attestation and trust boundaries.
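To make the “one recommended and two alternative” idea a bit more concrete, here is a small Python sketch of how deconfliction against such a matrix might work. The data type names and every port identifier in `PROFILE` are invented for this example, and the first-free-candidate policy is just one possible choice, not something the proposal specifies.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical profile matrix:
#   data type -> (recommended, alternative 1, alternative 2)
PROFILE: Dict[str, Sequence[int]] = {
    "Gnss": (17, 117, 217),
    "Barometer": (18, 118, 218),
    "Magnetometer": (19, 119, 219),
}


def assign_port_ids(
    profile: Dict[str, Sequence[int]],
    already_in_use: Iterable[int] = (),
) -> Tuple[Dict[str, int], List[str]]:
    """Pick the first free candidate per data type.

    Returns (assigned, needs_manual): data types whose three candidates are
    all taken are listed in needs_manual and must be deconflicted by hand
    (solder jumper, DIP switch, and so on).
    """
    taken = set(already_in_use)
    assigned: Dict[str, int] = {}
    needs_manual: List[str] = []
    for data_type, candidates in profile.items():
        choice = next((pid for pid in candidates if pid not in taken), None)
        if choice is None:
            needs_manual.append(data_type)
        else:
            taken.add(choice)
            assigned[data_type] = choice
    return assigned, needs_manual
```

On an empty bus every data type gets its recommended identifier, which corresponds to the “just work” case; the alternatives only come into play when something else already occupies an identifier.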

Now, all this said, I’m curious, Andrew: how does ArduPilot deal with node identifiers today?