First draft of the UDRAL DSDL namespace

The pull request is up:

Responding to @coder_kalyan’s comment on GitHub:

I would suggest allocating more time for interested parties to actually take a look at it in depth and comment

I see no problem with that. Shall we make it Oct 15? As I just wrote above, merging this PR does not amount to any hard commitments.

Sounds acceptable, but considering the previous feedback on the original DS-015 message set, we should wait until everyone is actually content with the draft before merging (and I don’t think everyone is currently), which may or may not take until after Oct 15. The point is to make a satisfactory message set, not meet a deadline, if that means ripping it up later on.

I’ll provide my own neutral review soon when I get a chance.

This objective is likely to be unreachable within a realistic time frame, which is why having a soft deadline is important. If one prefers a v0-style design, that is simply not going to happen regardless of the amount of time allocated for the debate.

The issues raised here: https://forum.opencyphal.org/t/meeting-minutes-july-21-2021-utc-udral-call/1365
and https://forum.opencyphal.org/t/uavcan-drone-application-layer-sig-guidelines/1280/16 provide a good summary of the requirements a UDRAL proposal should meet. The definitions in the PR are nothing but find/replace drone/udral on the DS-015 definitions. With no attempt to address the shortfalls identified in that message set, I see no reason to expect different feedback on them.
As I’ve mentioned before, I think we need to get the design agreed and the tooling in place before publishing message definitions. I don’t think we’re at that point yet.
I’d also suggest that as definitions are developed we leave them in a branch until they’re iterated, agreed, and stable. MAVLink has demonstrated many times that WIP messages have a habit of making their way into production systems, making it very difficult to iterate and change them. That’s why in MAVLink we’ve moved away from WIP messages in favour of development.xml (which is annoying, but a necessary compromise given the workflows in that project).

The proposed design addresses every significant issue raised on this forum so far, which I explained in the OP post. If you find this to be untrue, please, provide a detailed response instead of making vague references to past discussions. The UDRAL DSDL codebase did not see significant change compared to DS-015 because none was necessary to address the known issues.

Regarding the risk of publishing WIP types: I see your point, but I imagine that UAVCAN is reasonably shielded from this risk by virtue of incorporating a well-defined version number with every definition. One should not expect a data type with a version number of v0.x to be stable. Yet, having WIP types available in the main branch is helpful as it encourages early-stage experimental adoption.
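
As a rough illustration of the versioning argument, a consumer could flag any type whose major version is zero before relying on it. This is only a sketch: it assumes versioned names of the form `full.name.major.minor` (the form seen in names such as `reg.udral.physics.electricity.SourceTs.0.1`), and the helper name `is_unstable` is made up for this example.

```python
def is_unstable(versioned_name: str) -> bool:
    """Return True if the type's major version is 0 (no stability promise)."""
    # Assumption: the last two dot-separated components are major and minor.
    *_, major, minor = versioned_name.split(".")
    int(minor)  # raises ValueError if the name is malformed
    return int(major) == 0


if __name__ == "__main__":
    for name in ("reg.udral.physics.electricity.SourceTs.0.1",
                 "uavcan.time.SynchronizedTimestamp.1.0"):
        print(name, "unstable" if is_unstable(name) else "stable")
```

A check like this only warns the integrator; it does not stop anyone from deploying a v0.x type.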

Re WIP / versioning. WIP messages in mavlink were explicitly tagged as WIP in the xml. Didn’t stop people deploying them. I understand the intent to make prototyping easier, but we also need to protect users as much as we can. Any dev can build off a development branch, which in my view is a safer approach. Perhaps we could ask the SIG/consortium?

Re specific points, if you’d prefer I copy them in here, sure:

Andrew Tridgell (tridge), UAVCAN Consortium member representative, May 4:

yes, we should have simple ones that just reflect the real data. I don’t think we should add the “air_data_computer” unless there is going to be real hardware that will really be used that needs it.

we should just stop using covariance matrices. It just encourages developers to make up meaningless numbers. We shouldn’t be wasting bandwidth on stuff that is just made up.
A few guiding principles in message design:

  • don’t add fields unless there is a real need for them
  • don’t add fields that force the developer to make stuff up that they don’t really know

closer, but should remove a bunch of fields.

  • I don’t think the timestamp really has value on this message. Timestamps have enormous value on messages like GNSS position and velocity, but on differential pressure used for airspeed I don’t think it is useful. The time of arrival is fine.
  • remove both filter_delay and the filtered differential pressure. It would only make sense if we were greatly reducing the sample rate on the bus and I don’t think we are likely to be doing that.
  • get rid of the variance, as the sensor is unlikely to really have a good measure of that
  • only have one temperature

We should aim to get it down to a single CAN frame if possible.

but that doesn’t tell you what this reading actually is. It just says it is a difference between two pressures and a temperature (plus a pointless timestamp and covariance). We need it to be broadcast in a form that says “this is from a pitot tube, if you want to get a pitot based airspeed you can use this”.

To summarise, this PR does not address the fundamental problem:

Vadim has attempted to address the port identification/type safety issues, but the other significant rub points are untouched.
The DS-015/UDRAL messages assume high level functionality in each node, and this simply isn’t the reality for most of the systems that currently use UAVCAN. For some nodes, such as actuators or gimbals, this level of abstraction can be made to work. For many sensors, such as GNSS and other low level devices, it can’t without introducing risk. It is not reasonable to expect that this can be overcome by a one-off configuration by the integrator, or by transferring complexity to another system (ie the autopilot).
Developing a standard based on an idealised expectation of some future reality doesn’t make sense. Having the scope/flexibility within the standard to adapt to a future state is important, but for the standard to have any chance at success it needs to also be able to functionally replace v0 (particularly given that you’ve deprecated v0).
The reality is that if UAVCANv1/UDRAL doesn’t achieve that, it fails. With what you’ve presented here, it fails.

I provided a detailed review of the benefits of SOA in this post: https://forum.opencyphal.org/t/port-type-safety-enforcement/1303/73?u=pavel.kirienko. The bandwidth concerns (along with the derived issues, such as timestamping and covariance) appear to be unsubstantiated, as illustrated in the OP post.

As for the other points, they are already covered in the OP post, and I see little value in restating them again.

I agree and disagree with some of the points made here. Note that I also made some notes in the UDRAL planning document based on a compromise, and I’d like to see some of those decisions implemented here (I think they were quite reasonable).

Developing a standard based on an idealised expectation of some future reality doesn’t make sense. Having the scope/flexibility within the standard to adapt to a future state is important, but for the standard to have any chance at success it needs to also be able to functionally replace v0 (particularly given that you’ve deprecated v0).
The reality is that if UAVCANv1/UDRAL doesn’t achieve that, it fails. With what you’ve presented here, it fails.

Re flexibility: My original design draft aimed to create usable low level data types as well as high level services for each service class. The idea was to make it very practical to use low level types while providing a path to transition to high level services. Most of this is alright currently, but I believe some of the physics namespace types are meant for higher level services and don’t really promote the use of direct publishing.

we should just stop using covariance matrices. It just encourages developers to make up meaningless numbers. We shouldn’t be wasting bandwidth on stuff that is just made up.

I agree with this. Unfortunately no one has convinced me yet that they are useful enough (or useful at all) to justify the bandwidth they are taking, regardless of whether the bandwidth can be spared.

We should aim to get it down to a single CAN frame if possible.

This is not a necessary goal for a GNSS or air data frame. However, it is definitely a necessary goal for an ESC setpoint - something that I believe I outlined in the planning document which was not implemented in the message draft.

Developing a standard based on an idealised expectation of some future reality doesn’t make sense.

This is only partly true - standards tend to (and should) outlive specific implementations and design choices of the day. The best we can do is think hard about the current “best practices” in the hope they are somewhat future proof. I think UAVCANv1 does that correctly.

The bulk of the call was basically to-ing and fro-ing about “how do we use an architecture designed for high level distributed computing as a low level sensor network”.

We don’t - the idea (in my mind) was to provide a set of solid low level types that vendors can use temporarily without concern, while providing higher level abstract services to migrate to that are more practical in a more complex system.

For many sensors, such as GNSS and other low level devices, it can’t without introducing risk.

How so? I believe all the proven issues were taken care of by @VadimZ’s proposals.

In any case, it’s good to have a nice example fleshed out for a reasonable VTOL vehicle, and still have the bandwidth utilization for v1 on a 1Mbps CAN 2.0B bus be < 30%. I think the bandwidth for our (Volansi’s) vehicles might be higher, but on the other hand, we may split devices between two buses, and further, we’re planning to adopt CAN-FD as soon as we can, so it will be a non-issue either way.

I like the idea here - but the question of what vehicle to use as a standard remains. UDRAL currently takes up more bandwidth than I’d like on a 1Mbps CAN 2.0B bus, which is the reality of the hardware most of us are working on. For instance, my team develops an octocopter, and we’d really like to be able to fit that as well…

I think the approach of presenting abstract arguments (or pointers to past arguments) is unlikely to bring consensus here.

I would like to propose a two-part approach:

  1. Try reviewing transport, service discovery and tooling part (service registers, nunaweb, type signature etc) separately from message formats. Here the basic requirements are likely to be uncontroversial, so it should be possible to find common ground.
  2. When discussing the message formats, it would be more productive to focus on comparing specific, fully described alternatives of message implementation, before their abstract properties. It might also make sense to postpone this discussion until after some common ground is established on part 1.

@auturgy, are you seeing remaining rub points outside of message design (with several sub-issues such as granularity, nesting, timestamping and bandwidth optimization)?

That’s a good starting point. Shall we examine a GNSS + magnetometer device case in more detail? We are actually manufacturing those now, so I could meaningfully participate.

When discussing the message formats, it would be more productive to focus on comparing specific, fully described alternatives of message implementation, before their abstract properties.

Sure! Let me propose something simple below then, related to the actuator setpoint service, since it’s a very simple case to start out with:

Currently the generic setpoint messages look like this:

float16[<n>] value
@extent 16 * 256

I propose modifying it to use 14 bit integer setpoints (same as V0):

int14[<n>] value
@extent 16 * 256

The above has the benefit of halving bandwidth usage on quadcopters when using Classic CAN (7 byte payload * 8 bits / 4 setpoint values = 14 bits, so four setpoints fit in a single frame), which are likely one of the most common use cases. Octocopters also see a decrease in bandwidth usage. Since ESC setpoints are relatively heavy (high rate), this is a significant improvement, and it also allows the integrator to potentially raise the ESC setpoint frequency. There’s also minimal loss in semantic meaning: they are still normalized/scaled ratiometric setpoints with sufficient resolution [-8192, 8191].
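
The frame arithmetic can be checked with a short sketch. It assumes a 7-byte usable payload per Classic CAN frame (as stated above), a fixed-length array with no length prefix, and a 2-byte transfer CRC appended only when a transfer spans several frames. The function name `frames_needed` is invented for this example, and per-frame header overhead is ignored.

```python
import math

CLASSIC_CAN_PAYLOAD_BYTES = 7  # 8-byte frame minus one tail byte
TRANSFER_CRC_BYTES = 2         # assumption: only multi-frame transfers carry a CRC


def frames_needed(bits_per_value: int, count: int) -> int:
    """Number of Classic CAN frames for `count` values of `bits_per_value` bits."""
    payload_bytes = math.ceil(bits_per_value * count / 8)
    if payload_bytes <= CLASSIC_CAN_PAYLOAD_BYTES:
        return 1  # single-frame transfer, no CRC
    total = payload_bytes + TRANSFER_CRC_BYTES
    return math.ceil(total / CLASSIC_CAN_PAYLOAD_BYTES)


if __name__ == "__main__":
    for motors in (4, 8):
        print(motors, "motors:",
              "float16 ->", frames_needed(16, motors), "frames,",
              "int14 ->", frames_needed(14, motors), "frames")
```

Under these assumptions, four float16 values need two frames while four int14 values fit in one. For eight values the frame count is the same for both encodings, and the saving is limited to a shorter last frame.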

Setpoint efficiency
  • Approve (14 bit scaled integer setpoint)
  • Disapprove (keep the 16 bit float setpoint)


I think I like this idea, though for slightly different reasons.

The change from floating point to fixed point would require explicit scaling parameter for non-ratiometric control variables (those with physical unit such as current or rotation speed). Having this extra parameter slightly increases configuration complexity, however it makes the quantization uniform, and more importantly, predictable and explicit.

FP16 would provide an illusion of infinite range, and then surprise with unexpected non-uniform quantization effects.
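
To make the quantization point concrete, here is a minimal sketch comparing a float16 round trip with a scaled int14 round trip. The `full_scale` parameter stands in for the explicit scaling parameter mentioned above. Its value (4000.0) and the function names are assumptions chosen only for illustration.

```python
import struct

INT14_MAX = 8191  # largest positive value of a signed 14-bit integer


def float16_roundtrip(x: float) -> float:
    """Encode x as IEEE 754 binary16 and decode it again."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def int14_roundtrip(x: float, full_scale: float) -> float:
    """Quantize x to a scaled int14 and convert it back."""
    raw = round(x / full_scale * INT14_MAX)
    raw = max(-INT14_MAX - 1, min(INT14_MAX, raw))  # saturate to [-8192, 8191]
    return raw * full_scale / INT14_MAX


if __name__ == "__main__":
    # Hypothetical non-ratiometric setpoints, e.g. rotation speed.
    for value in (10.3, 1000.3, 3000.7):
        print(value,
              "float16:", float16_roundtrip(value),
              "int14:", int14_roundtrip(value, full_scale=4000.0))
```

The float16 step grows with magnitude (0.5 between 512 and 1024, 2 between 2048 and 4096), whereas the int14 step stays at `full_scale / 8191` across the whole range.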

I think I can live with the general high level design and defining services where each subject is mapped to a particular register and all the required pieces to generate code end to end.

The number of subjects per servo seems a bit excessive (4 publications, 2 subscriptions in the demo), but I suppose with appropriate tooling it could be tolerable.

Isn’t a reg.udral.physics.kinematics.translation.Linear.0.1 sent to each servo kind of excessive? What about having some flexibility in the services so that a manufacturer can expose what’s even supported in the first place.

Instead of having so much flexibility per service, why not simply carry different services for variations that actually exist? Leaving things open to interpretation with multiple options and details buried in a comment essay seems like a good way to ensure every vendor will carry their own set of implementation quirks.

Do you understand the motivation for such design though, and do you recognize its validity (assuming that we are not discussing the most trivial applications)?

Observe that the complexity of the new design scales with the application. The most basic systems are unlikely to require, say, the feedback, status, and power subjects, which leaves only three, which is comparable to v0.

On a typical UAV, it probably is excessive, which is why there is a simpler alternative provided. The full kinematics message is intended for applications that require controlling the motion profile (such as limiting the acceleration).

We can certainly discuss the extra flexibility and feature reporting but right now I suggest focusing on the bare MVP.

This is partly due to the fact that UDRAL provides a much more detailed specification compared to v0, which also boosts the illusion of added complexity.

Before we solve a problem, let us ensure that it is actually there. Can we please model the subjects of a highly complex vehicle that would run out of bandwidth with UDRAL, and then optimize for that? I am not questioning the existence of such configurations, obviously, but I am questioning their relevance for UDRAL over Classic CAN.

So far, your proposal seems inferior to the current design in two points:

  1. It introduces extra complexity for non-ratiometric modes (requires additional scaling as Vadim wrote).

  2. It somewhat undermines the composability for the case where the actuator group contains one actuator (or a number of them commanded in lockstep) because there is no primitive-typed message for int14 while there is one for float16.

You’re saying the setpoint subject can be different types? I thought that had been settled in the type safety discussion, but perhaps I’ve misunderstood? How is that supposed to work safely? How do you even know which type of setpoint your servo can use on that subject? I feel like we’re going in circles.

Right now the overwhelming majority of vehicles in our ecosystem use dumb PWM servos with no feedback, status, etc. That’s the basic problem we’ve so far failed to solve and where I think we should be focused before anything else.

The type is chosen based on the configuration of the node, so the type safety is not affected:

Alternatively, there could be a separate port for the second type of setpoint.

indeed.

I’ve been looking through the battery, ESC and servo messages in the PR, and evaluating them against the needs of ArduPilot (and presumably PX4). I was going to do one post to summarize my findings for all 3, but it turns out to be much too long, so I will do battery first, then go onto ESCs and servos.
To facilitate this analysis I wrote a little hackish tool to allow me to ‘unwind’ a type from the DSDL in v1. This just automates what I did manually for my posts on GNSS and other messages on the DS-015 thread. Perhaps there was already a tool to do this (in which case a link would be appreciated!), but otherwise if anyone wants it my ugly hack tool is here:
http://uav.tridgell.net/tmp/showtype.py

Battery Messages

The README.md in reg/udral explains the battery service as the main example for udral structure. The first thing we notice is it puts the concept of how the battery will be used inside the node providing the service. I think this is a mistake. The example given shows a battery with registered names of “battery.primary” and “main_drive”. It is a bad idea for this “how it will be used” information to be inside the node itself as it increases the amount of information you need to configure in the nodes, and doesn’t lend itself to automatic configuration. I would much prefer that nodes just present an integer ID for a battery provided by the node and do all of the assignment in the flight controller. Putting the smarts in the node makes it much harder to mix technologies (eg. analog batteries, SMBus/I2C batteries, v0 batteries and others).

Adding a generic ability for a node to have an optional string label set by the user would be OK, but this shouldn’t really be linked into the data type.

Diving into the messages themselves, we have 3 top level messages for batteries.

energy_source reg.udral.physics.electricity.SourceTs (at between 1 and 100Hz)
status        reg.udral.service.battery.Status       (at around 1Hz)
parameters    reg.udral.service.battery.Parameters   (at around 0.2Hz)

Let’s expand the energy_source message using the showtype tool:

reg.udral.physics.electricity.SourceTs.0.1 length=23
        timestamp: uavcan.time.SynchronizedTimestamp.1.0 length=7
                uavcan.time.SynchronizedTimestamp.1.0 length=7
                        microsecond: truncated uint56 length=7
        value: reg.udral.physics.electricity.Source.0.1 length=16
                reg.udral.physics.electricity.Source.0.1 length=16
                        power: reg.udral.physics.electricity.Power.0.1 length=8
                                reg.udral.physics.electricity.Power.0.1 length=8
                                        current: uavcan.si.unit.electric_current.Scalar.1.0 length=4
                                                uavcan.si.unit.electric_current.Scalar.1.0 length=4
                                                        ampere: saturated float32 length=4
                                        voltage: uavcan.si.unit.voltage.Scalar.1.0 length=4
                                                uavcan.si.unit.voltage.Scalar.1.0 length=4
                                                        volt: saturated float32 length=4
                        energy: uavcan.si.unit.energy.Scalar.1.0 length=4
                                uavcan.si.unit.energy.Scalar.1.0 length=4
                                        joule: saturated float32 length=4
                        full_energy: uavcan.si.unit.energy.Scalar.1.0 length=4
                                uavcan.si.unit.energy.Scalar.1.0 length=4
                                        joule: saturated float32 length=4

The good news is this only has one timestamp, although I’d argue that even that is too many. I don’t think time-stamping this data is really useful, as the integration time for this sort of data is quite long and the transport delays won’t be significant.

In the expansion, we have current, voltage, energy and full_energy. Unfortunately there is no information on what to give if the node doesn’t have the information. Low end CAN battery monitors (where they just connect to a battery over an XT60 for example) don’t know the full_energy or energy. In that case it would know the amount of energy used so far, but not the full_energy, which means it could not fill in either field.

Now let’s look at the status message (sent at about 1Hz). Expanding the reg.udral.service.battery.Status message we see:

reg.udral.service.battery.Status.0.2 length=604
        heartbeat: reg.udral.service.common.Heartbeat.0.1 length=2
                reg.udral.service.common.Heartbeat.0.1 length=2
                        readiness: reg.udral.service.common.Readiness.0.1 length=1
                                reg.udral.service.common.Readiness.0.1 length=1
                                        value: truncated uint2 length=1
                        health: uavcan.node.Health.1.0 length=1
                                uavcan.node.Health.1.0 length=1
                                        value: saturated uint2 length=1
        temperature_min_max: uavcan.si.unit.temperature.Scalar.1.0[2] length=8
                uavcan.si.unit.temperature.Scalar.1.0 length=4
                        kelvin: saturated float32 length=4
        available_charge: uavcan.si.unit.electric_charge.Scalar.1.0 length=4
                uavcan.si.unit.electric_charge.Scalar.1.0 length=4
                        coulomb: saturated float32 length=4
        error: reg.udral.service.battery.Error.0.1 length=1
                reg.udral.service.battery.Error.0.1 length=1
                        value: saturated uint8 length=1
        cell_voltages: saturated float16[<=255] length=2

The high value for total length (604 bytes) comes from the high maximum array size for cell_voltages. Supporting 255 cell batteries does seem like overkill to me, but maybe such sizes are coming.

Using coulombs for available charge seems like a poor choice when Ah or (better) Wh is much more commonly used. The conversion isn’t too hard though.
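
For reference, the conversion could look like the sketch below. The Ah conversion is exact (1 Ah = 3600 C). The Wh figure is only an approximation that assumes the voltage stays at a nominal value, such as the nominal_voltage field of the Parameters message. The function names are invented for this example.

```python
COULOMBS_PER_AMP_HOUR = 3600.0  # 1 Ah = 1 A * 3600 s


def coulombs_to_amp_hours(charge_c: float) -> float:
    """Convert electric charge in coulombs to ampere-hours."""
    return charge_c / COULOMBS_PER_AMP_HOUR


def coulombs_to_watt_hours(charge_c: float, nominal_voltage_v: float) -> float:
    """Approximate energy in Wh, assuming a constant nominal voltage."""
    return coulombs_to_amp_hours(charge_c) * nominal_voltage_v


if __name__ == "__main__":
    charge = 18000.0  # hypothetical available_charge value, in coulombs
    print(coulombs_to_amp_hours(charge), "Ah")
    print(coulombs_to_watt_hours(charge, nominal_voltage_v=22.2), "Wh (approx.)")
```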

Now the parameters message:

reg.udral.service.battery.Parameters.0.3 length=71
        unique_id: truncated uint64 length=8
        mass: uavcan.si.unit.mass.Scalar.1.0 length=4
                uavcan.si.unit.mass.Scalar.1.0 length=4
                        kilogram: saturated float32 length=4
        design_capacity: uavcan.si.unit.electric_charge.Scalar.1.0 length=4
                uavcan.si.unit.electric_charge.Scalar.1.0 length=4
                        coulomb: saturated float32 length=4
        design_cell_voltage_min_max: uavcan.si.unit.voltage.Scalar.1.0[2] length=8
                uavcan.si.unit.voltage.Scalar.1.0 length=4
                        volt: saturated float32 length=4
        discharge_current: uavcan.si.unit.electric_current.Scalar.1.0 length=4
                uavcan.si.unit.electric_current.Scalar.1.0 length=4
                        ampere: saturated float32 length=4
        discharge_current_burst: uavcan.si.unit.electric_current.Scalar.1.0 length=4
                uavcan.si.unit.electric_current.Scalar.1.0 length=4
                        ampere: saturated float32 length=4
        charge_current: uavcan.si.unit.electric_current.Scalar.1.0 length=4
                uavcan.si.unit.electric_current.Scalar.1.0 length=4
                        ampere: saturated float32 length=4
        charge_current_fast: uavcan.si.unit.electric_current.Scalar.1.0 length=4
                uavcan.si.unit.electric_current.Scalar.1.0 length=4
                        ampere: saturated float32 length=4
        charge_termination_threshold: uavcan.si.unit.electric_current.Scalar.1.0 length=4
                uavcan.si.unit.electric_current.Scalar.1.0 length=4
                        ampere: saturated float32 length=4
        charge_voltage: uavcan.si.unit.voltage.Scalar.1.0 length=4
                uavcan.si.unit.voltage.Scalar.1.0 length=4
                        volt: saturated float32 length=4
        cycle_count: saturated uint16 length=2
        : void16 length=2
        state_of_health_pct: saturated uint7 length=1
        : void1 length=1
        technology: reg.udral.service.battery.Technology.0.1 length=1
                reg.udral.service.battery.Technology.0.1 length=1
                        value: saturated uint8 length=1
        nominal_voltage: uavcan.si.unit.voltage.Scalar.1.0 length=4
                uavcan.si.unit.voltage.Scalar.1.0 length=4
                        volt: saturated float32 length=4

A few oddities here. It has a design_cell_voltage_min_max, but not the number of cells. Are you supposed to wait for the status message and look at the array length? If that is the plan then it won’t work, as it is quite common for a smart battery to not be able to read cell voltages on all the cells. Common SMBus chips may (for example) support up to 8 cells but only be able to read cell voltages for 4 of the cells, but can then also give you a total voltage. In that case we can’t really fill things in correctly. We could calculate an average voltage for the remaining cells (ArduPilot can do this for mavlink), but it would be better to add a num_cells in Parameters.
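
The averaging workaround mentioned above could be sketched as follows. This is not ArduPilot’s actual code: the function name, the argument names and the explicit `num_cells` input are assumptions, and `num_cells` is exactly the piece of information the Parameters message does not currently provide.

```python
from typing import List


def fill_cell_voltages(total_voltage: float,
                       measured: List[float],
                       num_cells: int) -> List[float]:
    """Pad measured per-cell voltages with the average of the unmeasured cells."""
    missing = num_cells - len(measured)
    if missing < 0:
        raise ValueError("more measured cells than num_cells")
    if missing == 0:
        return list(measured)
    # Whatever voltage is not accounted for is spread evenly over the rest.
    average = (total_voltage - sum(measured)) / missing
    return list(measured) + [average] * missing


if __name__ == "__main__":
    # Hypothetical 8-cell pack where only 4 cells can be read individually.
    print(fill_cell_voltages(32.0, [4.0, 4.0, 4.0, 4.0], num_cells=8))
```

The padded values are estimates, so a consumer cannot tell them apart from real per-cell measurements unless that is signalled separately.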

The Parameters message lacks any model name string or manufacturer name. I’d very much like to log the manufacturer info and serial number. It does have a unique_id, but I think a manufacturer string is worthwhile. I suspect that most UAVCAN battery monitors will backend onto SMBus battery systems, which do offer a manufacturer name and also a manufacture date. As the age of a battery can be quite important (for maintenance schedules at least) we really should include the date as well.

It is also worth comparing this battery message set to the closest equivalents in MAVLink, which are BATTERY_STATUS and SMART_BATTERY_INFO.


I think the first thing you notice is that the mavlink XML is much easier to read and understand than the dsdl. The deep nesting in the dsdl really gets in the way of making this understandable. Once you unwind the dsdl you find that the mavlink and dsdl definitions are similar. The dsdl supports more cells (mavlink tops out at 14 cells, but could be extended). MAVLink uses a current_consumed instead of energy_remaining, which better fits adapters that don’t know the full capacity of the battery.

Overall the proposed battery messages are not terrible. There are some issues, but nothing awful. The ESC and servo messages have much bigger issues.


Using coulombs for available charge seems like a poor choice when Ah or (better) Wh is much more commonly used. The conversion isn’t too hard though.

Just regarding this, I know it’s strange, and I thought so as well at first, but I think the minor inconvenience is worth the added consistency of keeping A * s as per SI. And using this doesn’t hurt the data (clipping or transport bandwidth costs) so it should be fine.

Other than that, I agree on your review of the battery services for the most part. I don’t feel I’m well versed enough in this topic to provide a thorough review though.