First draft of the UDRAL DSDL namespace

Do you understand the motivation for such design though, and do you recognize its validity (assuming that we are not discussing the most trivial applications)?

Observe that the complexity of the new design scales with the application. The most basic systems are unlikely to require, say, the feedback, status, and power subjects, which leaves only three, which is comparable to v0.

On a typical UAV, it probably is excessive, which is why there is a simpler alternative provided. The full kinematics message is intended for applications that require controlling the motion profile (such as limiting the acceleration).

We can certainly discuss the extra flexibility and feature reporting but right now I suggest focusing on the bare MVP.

This is partly due to the fact that UDRAL provides a much more detailed specification compared to v0, which also boosts the illusion of added complexity.

Before we solve a problem, let us ensure that it is actually there. Can we please model the subjects of a highly complex vehicle that would run out of bandwidth with UDRAL, and then optimize for that? I am not questioning the existence of such configurations, obviously, but I am questioning their relevance for UDRAL over Classic CAN.

So far, your proposal seems inferior to the current design in two points:

  1. It introduces extra complexity for non-ratiometric modes (requires additional scaling as Vadim wrote).

  2. It somewhat undermines the composability for the case where the actuator group contains one actuator (or a number of them commanded in lockstep) because there is no primitive-typed message for int14 while there is one for float16.

Youā€™re saying the setpoint subject can be different types? I thought that had been settled in the type safety discussion, but perhaps Iā€™ve misunderstood? How is that supposed to work safely? How do you even know which type of setpoint your servo can use on that subject? I feel like weā€™re going in circles.

Right now the overwhelming majority of vehicles in our ecosystem use dumb PWM servos with no feedback, status, etc. Thatā€™s the basic problem weā€™ve so far failed to solve and where I think we should be focused before anything else.

The type is chosen based on the configuration of the node, so the type safety is not affected:

Alternatively, there could be a separate port for the second type of setpoint.

indeed.

Iā€™ve been looking through the battery, ESC and servo messages in the PR, and evaluating them against the needs of ArduPilot (and presumably PX4). I was going to do one post to summarize my findings for all 3, but it turns out to be much too long, so I will do battery first, then go onto ESCs and servos.
To facilitate this analysis I wrote a little hackish tool to allow me to ā€˜unwindā€™ a type from the DSDL in v1. This just automates what I did manually for my posts on GNSS and other messages on the DS-015 thread. Perhaps there was already a tool to do this (in which case a link would be appreciated!), but otherwise if anyone wants it my ugly hack tool is here:
http://uav.tridgell.net/tmp/showtype.py

Battery Messages

The README.md in reg/udral explains the battery service as the main example for udral structure. The first thing we notice is it puts the concept of how the battery will be used inside the node providing the service. I think this is a mistake. The example given shows a battery with registered names of ā€œbattery.primaryā€ and ā€œmain_driveā€. It is a bad idea for this ā€œhow it will be usedā€ information to be inside the node itself as it increases the amount of information you need to configure in the nodes, and doesnā€™t lend itself to automatic configuration. I would much prefer that nodes just present an integer ID for a battery provided by the node and do all of the assignment in the flight controller. Putting the smarts in the node makes it much harder to mix technologies (eg. analog batteries, SMBus/I2C batteries, v0 batteries and others).

Adding a generic ability for a node to have an optional string label set by the user would be OK, but this shouldnā€™t really be linked into the data type.

Diving into the messages themselves, we have 3 top level messages for batteries.

energy_source reg.udral.physics.electricity.SourceTs (at between 1 and 100Hz)
status        reg.udral.service.battery.Status       (at around 1Hz)
parameters    reg.udral.service.battery.Parameters   (at around 0.2Hz)

Letā€™s expand the energy_source message using the showtype tool:

reg.udral.physics.electricity.SourceTs.0.1 length=23
        timestamp: uavcan.time.SynchronizedTimestamp.1.0 length=7
                uavcan.time.SynchronizedTimestamp.1.0 length=7
                        microsecond: truncated uint56 length=7
        value: reg.udral.physics.electricity.Source.0.1 length=16
                reg.udral.physics.electricity.Source.0.1 length=16
                        power: reg.udral.physics.electricity.Power.0.1 length=8
                                reg.udral.physics.electricity.Power.0.1 length=8
                                        current: uavcan.si.unit.electric_current.Scalar.1.0 length=4
                                                uavcan.si.unit.electric_current.Scalar.1.0 length=4
                                                        ampere: saturated float32 length=4
                                        voltage: uavcan.si.unit.voltage.Scalar.1.0 length=4
                                                uavcan.si.unit.voltage.Scalar.1.0 length=4
                                                        volt: saturated float32 length=4
                        energy: uavcan.si.unit.energy.Scalar.1.0 length=4
                                uavcan.si.unit.energy.Scalar.1.0 length=4
                                        joule: saturated float32 length=4
                        full_energy: uavcan.si.unit.energy.Scalar.1.0 length=4
                                uavcan.si.unit.energy.Scalar.1.0 length=4
                                        joule: saturated float32 length=4

the good news is this only has one timestamp, although Iā€™d argue that even that is too many. I donā€™t think time-stamping this data is really useful, as the integration time for this sort of data is quite long and the transport delays wonā€™t be significant.

In the expansion, we have current, voltage, energy and full_energy. Unfortunately there is no information on what to give if the node doesnā€™t have the information. Low end CAN battery monitors (where they just connect to a battery over an XT60 for example) donā€™t know the full_energy or energy. In that case it would know the amount of energy used so far, but not the full_energy, which means it could not fill in either field.

Now lets look at the status message (sent at about 1Hz). Expanding the reg.udral.service.battery.Status message we see:

reg.udral.service.battery.Status.0.2 length=604
        heartbeat: reg.udral.service.common.Heartbeat.0.1 length=2
                reg.udral.service.common.Heartbeat.0.1 length=2
                        readiness: reg.udral.service.common.Readiness.0.1 length=1
                                reg.udral.service.common.Readiness.0.1 length=1
                                        value: truncated uint2 length=1
                        health: uavcan.node.Health.1.0 length=1
                                uavcan.node.Health.1.0 length=1
                                        value: saturated uint2 length=1
        temperature_min_max: uavcan.si.unit.temperature.Scalar.1.0[2] length=8
                uavcan.si.unit.temperature.Scalar.1.0 length=4
                        kelvin: saturated float32 length=4
        available_charge: uavcan.si.unit.electric_charge.Scalar.1.0 length=4
                uavcan.si.unit.electric_charge.Scalar.1.0 length=4
                        coulomb: saturated float32 length=4
        error: reg.udral.service.battery.Error.0.1 length=1
                reg.udral.service.battery.Error.0.1 length=1
                        value: saturated uint8 length=1
        cell_voltages: saturated float16[<=255] length=2

the high value for total length (604 bytes) comes from the high maximum array size for cell_voltages. Supporting 255 cell batteries does seem like overkill to me, but maybe such sizes are coming.

Using coulombs for available charge seems like a poor choice when Ah or (better) Wh is much more commonly used. The conversion isnā€™t too hard though.

Now the parameters message:

reg.udral.service.battery.Parameters.0.3 length=71
        unique_id: truncated uint64 length=8
        mass: uavcan.si.unit.mass.Scalar.1.0 length=4
                uavcan.si.unit.mass.Scalar.1.0 length=4
                        kilogram: saturated float32 length=4
        design_capacity: uavcan.si.unit.electric_charge.Scalar.1.0 length=4
                uavcan.si.unit.electric_charge.Scalar.1.0 length=4
                        coulomb: saturated float32 length=4
        design_cell_voltage_min_max: uavcan.si.unit.voltage.Scalar.1.0[2] length=8
                uavcan.si.unit.voltage.Scalar.1.0 length=4
                        volt: saturated float32 length=4
        discharge_current: uavcan.si.unit.electric_current.Scalar.1.0 length=4
                uavcan.si.unit.electric_current.Scalar.1.0 length=4
                        ampere: saturated float32 length=4
        discharge_current_burst: uavcan.si.unit.electric_current.Scalar.1.0 length=4
                uavcan.si.unit.electric_current.Scalar.1.0 length=4
                        ampere: saturated float32 length=4
        charge_current: uavcan.si.unit.electric_current.Scalar.1.0 length=4
                uavcan.si.unit.electric_current.Scalar.1.0 length=4
                        ampere: saturated float32 length=4
        charge_current_fast: uavcan.si.unit.electric_current.Scalar.1.0 length=4
                uavcan.si.unit.electric_current.Scalar.1.0 length=4
                        ampere: saturated float32 length=4
        charge_termination_threshold: uavcan.si.unit.electric_current.Scalar.1.0 length=4
                uavcan.si.unit.electric_current.Scalar.1.0 length=4
                        ampere: saturated float32 length=4
        charge_voltage: uavcan.si.unit.voltage.Scalar.1.0 length=4
                uavcan.si.unit.voltage.Scalar.1.0 length=4
                        volt: saturated float32 length=4
        cycle_count: saturated uint16 length=2
        : void16 length=2
        state_of_health_pct: saturated uint7 length=1
        : void1 length=1
        technology: reg.udral.service.battery.Technology.0.1 length=1
                reg.udral.service.battery.Technology.0.1 length=1
                        value: saturated uint8 length=1
        nominal_voltage: uavcan.si.unit.voltage.Scalar.1.0 length=4
                uavcan.si.unit.voltage.Scalar.1.0 length=4
                        volt: saturated float32 length=4

A few oddities here. It has a design_cell_voltage_min_max, but not the number of cells. Are you supposed to wait for the status message and look at the array length? If that is the plan then it wonā€™t work, as it is quite common for a smart battery to not be able to read cell voltages on all the cells. Common SMBus chips may (for example) support up to 8 cells but only be able to read cell voltages for 4 of the cells, but can then also give you a total voltage. In that case we canā€™t really fill things in correctly. We could calculate an average voltage for the remaining cells (ArduPilot can do this for mavlink), but it would be better to add a num_cells in Parameters.

The Parameters message lacks any model name string or manufacturer name. Iā€™d very much like to log the manufacturer info and serial number. It does have a unique_id, but I think a manufacturer string is worthwhile. I suspect that most UAVCAN battery monitors will backend onto SMBus battery systems, which do offer a manufacturer name and also a manufacture date. As the age of a battery can be quite important (for maintenance schedules at least) we really should include the date as well.

It is also worth comparing this battery message set to the closest equivalent in MAVLink is BATTERY_STATUS and SMART_BATTERY_INFO.


I think the first thing you notice is it is much easier to read and understand the mavlink XML than the dsdl. The deep nesting in the dsdl really gets in the way of making this understandable. Once you unwind the dsdl you find that the mavlink and xml are similar. The dsdl supports more cells (mavlink tops out at 14 cells, but could be extended). MAVLink uses a current_consumed instead of energy_remaining, which better fits adapters that donā€™t know the full capacity of the battery.

Overall the proposed battery messages are not terrible. There are some issues, but not awful. The ESC and servos messages have much bigger issues.

2 Likes

Using coulombs for available charge seems like a poor choice when Ah or (better) Wh is much more commonly used. The conversion isnā€™t too hard though.

Just regarding this, I know its strange, and I thought so as well at first, but I think the minor inconvenience is worth the added consistency of keeping A * s as per SI. And using this doesnā€™t hurt the data (clipping or transport bandwidth costs) so it should be fine.

Other than that, I agree on your review of the battery services for the most part. I donā€™t feel Iā€™m well versed enough in this topic to provide a thorough review though.

Also, just wanted to mention again that this tool exists (written by @bbworld1) : https://nunaweb.uavcan.org/api/storage/docs/docs/reg/index.html

Not quite the same as the tool you wrote, but probably beneficial nonetheless.

ESC Messages

Following on with my dive into the proposed battery, ESC and servo messages, here is what I got from looking at the ESC dsdl. The ESC messages are a much more complex design than the battery messages, and really show some of the design flaws in the type system for UAVCAV v1.

Before we get into the messages themselves we need to first off discuss ā€˜groupsā€™ in the documentation for ESCs in UDRAL. The doc says:

# ESCs (drives) are segregated into groups. Each ESC in a group has an index that is unique within the group.
# Drives in a group are commanded synchronously by publishing a message containing an array of setpoints.

ok, so having groups could make sense. We donā€™t need it for ArduPilot (we have a flat namespace of actuator numbering), but I could see some systems using it. The problem is I cannot at all see how the group you are trying to control is communicated. There is no group ID field in any of the messages that I can see, so I wondered if it was a top
level concept Iā€™d missed in the UAVCAN_Specification_v1.0.pdf, so I checked there and while the word ā€˜groupā€™ is used for quite a few different things (eg. a byte is a group of 8 bits), there is no use of group in there that makes sense with this ESC usage of group.

Can someone explain how groups work?

Now on to the messages to command the ESCs to spin. There are two sets of messages,

reg.udral.service.actuator.common.sp.* and
reg.udral.service.common.Readiness.

The wildcard reg.udral.service.actuator.common.sp.* is really odd. It comes from the way zero truncation in UAVCAN v1 is handled. In UAVCAN v0 it is this simple array in 1030.RawCommand:

 int14[<=20] cmd

In the proposed URDAL message the recipient needs to subscribe to Vector31.0.1.uavcan which is this:

 float16[31] value

but the sender (typically the flight controller) has to choose which of the following messages to send based on the number of actuators:

 Scalar.0.1.uavcan
 Vector2.0.1.uavcan
 Vector3.0.1.uavcan
 Vector4.0.1.uavcan
 Vector6.0.1.uavcan
 Vector8.0.1.uavcan
 Vector31.0.1.uavcan

(all of these are down inside the reg.udral.service.actuator.common.sp.* name space).

This then takes advantage of the zero truncation so when the ESC receives Vector31 and the flight controller sends Vector6 that the ESC gets zeros in the later elements.

This way of handling arrays of ESC commands is very awkward, and requires that the flight controller have code for sending a bunch of different messages for the same purpose, presumably using a set of if statements to work out the threshold.

It is also inefficient if the ID number of the ESCs youā€™re trying to address doesnā€™t start with zero. For example, you may want your ESCs on a quadplane to be ID numbers 5 to 8 (which is the default for ArduPilot quadplanes). With this structure the flight controller has to send a Vector8, with the first 4 elements as zero. It would be much better to have a bitmask at the start of message saying which ESCs you are wanting to address. That would scale nicely to a larger number of ESCs. As it stands, bandwidth usage will depend a lot on the IDs you choose.

Combined with these sp messages, we have reg.udral.service.common.Readiness, which puts the ESC into a ready
state of SLEEP, STANDBY or ENGAGED. Having a way to change the readiness state could be useful, although I havenā€™t missed not having it in v0.

Next we look at the ESC feedback messages. There are 4 separate messages for ESC feedback, following along with the philosophy in UAVCAN v1 of breaking things down into small pieces. The 4 feedback
messages are:

  feedback            reg.udral.service.actuator.common.Feedback
  status              reg.udral.service.actuator.common.Status
  power               reg.udral.physics.electricity.PowerTs
  dynamics            reg.udral.physics.dynamics.rotation.PlanarTs

Letā€™s look at these messages. First the feedback message:

reg.udral.service.actuator.common.Feedback.0.1 length=67
        heartbeat: reg.udral.service.common.Heartbeat.0.1 length=2
                reg.udral.service.common.Heartbeat.0.1 length=2
                        readiness: reg.udral.service.common.Readiness.0.1 length=1
                                reg.udral.service.common.Readiness.0.1 length=1
                                        value: truncated uint2 length=1
                        health: uavcan.node.Health.1.0 length=1
                                uavcan.node.Health.1.0 length=1
                                        value: saturated uint2 length=1
        demand_factor_pct: saturated int8 length=1

now the status message:

reg.udral.service.actuator.common.Status.0.1 length=67
        motor_temperature: uavcan.si.unit.temperature.Scalar.1.0 length=4
                uavcan.si.unit.temperature.Scalar.1.0 length=4
                        kelvin: saturated float32 length=4
        controller_temperature: uavcan.si.unit.temperature.Scalar.1.0 length=4
                uavcan.si.unit.temperature.Scalar.1.0 length=4
                        kelvin: saturated float32 length=4
        error_count: saturated uint32 length=4
        fault_flags: reg.udral.service.actuator.common.FaultFlags.0.1 length=2
                reg.udral.service.actuator.common.FaultFlags.0.1 length=2
                        overload: saturated bool length=1
                        voltage: saturated bool length=1
                        motor_temperature: saturated bool length=1
                        controller_temperature: saturated bool length=1
                        velocity: saturated bool length=1
                        mechanical: saturated bool length=1
                        vibration: saturated bool length=1
                        configuration: saturated bool length=1
                        control_mode: saturated bool length=1
                        : void6 length=1
                        other: saturated bool length=1

the power message:

reg.udral.physics.electricity.PowerTs.0.1 length=15
        timestamp: uavcan.time.SynchronizedTimestamp.1.0 length=7
                uavcan.time.SynchronizedTimestamp.1.0 length=7
                        microsecond: truncated uint56 length=7
        value: reg.udral.physics.electricity.Power.0.1 length=8
                reg.udral.physics.electricity.Power.0.1 length=8
                        current: uavcan.si.unit.electric_current.Scalar.1.0 length=4
                                uavcan.si.unit.electric_current.Scalar.1.0 length=4
                                        ampere: saturated float32 length=4
                        voltage: uavcan.si.unit.voltage.Scalar.1.0 length=4
                                uavcan.si.unit.voltage.Scalar.1.0 length=4
                                        volt: saturated float32 length=4

and finally the dynamics message:

reg.udral.physics.dynamics.rotation.PlanarTs.0.1 length=23
        timestamp: uavcan.time.SynchronizedTimestamp.1.0 length=7
                uavcan.time.SynchronizedTimestamp.1.0 length=7
                        microsecond: truncated uint56 length=7
        value: reg.udral.physics.dynamics.rotation.Planar.0.1 length=16
                reg.udral.physics.dynamics.rotation.Planar.0.1 length=16
                        kinematics: reg.udral.physics.kinematics.rotation.Planar.0.1 length=12
                                reg.udral.physics.kinematics.rotation.Planar.0.1 length=12
                                        angular_position: uavcan.si.unit.angle.Scalar.1.0 length=4
                                                uavcan.si.unit.angle.Scalar.1.0 length=4
                                                        radian: saturated float32 length=4
                                        angular_velocity: uavcan.si.unit.angular_velocity.Scalar.1.0 length=4
                                                uavcan.si.unit.angular_velocity.Scalar.1.0 length=4
                                                        radian_per_second: saturated float32 length=4
                                        angular_acceleration: uavcan.si.unit.angular_acceleration.Scalar.1.0 length=4
                                                uavcan.si.unit.angular_acceleration.Scalar.1.0 length=4
                                                        radian_per_second_per_second: saturated float32 length=4
                        torque: uavcan.si.unit.torque.Scalar.1.0 length=4
                                uavcan.si.unit.torque.Scalar.1.0 length=4
                                        newton_meter: saturated float32 length=4

This ā€œlots of small piecesā€ approach is really bad for several reasons:

  • the flight controller will want to combine the data from these messages at a single point in time in order to make decisions about whether an ESC is in a truly healthy state, or if something may be going wrong. This is needed both for in-flight diagnostics and for accurate logging and analysis of a failure.
  • it is much less efficient on the wire than just putting all the info in one message, and rapid ESC messages. When complex vehicles are using CAN ESCs then bxCAN bandwidth is at a premium, so breaking things up like this is a poor use of resources.
  • aligning the messages using the timestamp (for dynamics and power) makes it a lot more complex for the flight controller and for log analysis

The breaking down into lots of messages is a really good example of ā€˜the tail wagging the dogā€™. UAVCAN v1 starts off by defining a huge pile of base SI unit types, and now wants some way to use them all to justify their existence. Pavel has repeatedly said this is good as it makes for more re-use, but it is all topsy turvy. In UAVCAN v0 we have less than 50 messages total that are actually used in ArduPilot (PX4 has considerably fewer). In UAVCAN v1 I count 207 messages, and weā€™ve only got battery, ESC and servo so far. So the ā€˜reuseā€™ approach has greatly increased the number of messages.
It would be vastly simpler and more efficient to use flat message definitions and not do this level of abstraction. Adding abstraction layers and nesting is not a good end goal in and of itself.

Ok, back to the message fields. Most ESCs wonā€™t be able to fill in the angular_position and angular_acceleration fields, and could only give angular_velocity. The units they normally can give is an eRPM, which canā€™t be converted to radian/second without knowing the number of poles. This message set would mean that you would have to set the number of poles on each ESC, rather than setting it centrally on the flight controller and until you do that you wonā€™t get RPM logging as the ESC node would have no way to fill it in. Iā€™d much prefer to report an eRPM and also report a number of poles, where zero is used if not known. That would allow ESCs to work out of the box, giving a much better experience for end users while still allowing an RPM to be displayed and (if someone really wanted it) a value in radian_second.

Servos

The servo messages, like ESC messages, are in two parts. The first is the actuator command, for which there are two versions:

  reg.udral.physics.dynamics.rotation.Planar
  reg.udral.physics.dynamics.translation.Linear

oddly, there are both timestamped and non-timestamped versions of each of these messages. It isnā€™t clear what the purpose is of sending a timestamp in an actuation command. Is the servo supposed to not act upon the command until the timestamp is past?

The Planar message (which is likely to be the most common type of servo) looks like this:

eg.udral.physics.dynamics.rotation.Planar.0.1 length=16
        kinematics: reg.udral.physics.kinematics.rotation.Planar.0.1 length=12
                reg.udral.physics.kinematics.rotation.Planar.0.1 length=12
                        angular_position: uavcan.si.unit.angle.Scalar.1.0 length=4
                                uavcan.si.unit.angle.Scalar.1.0 length=4
                                        radian: saturated float32 length=4
                        angular_velocity: uavcan.si.unit.angular_velocity.Scalar.1.0 length=4
                                uavcan.si.unit.angular_velocity.Scalar.1.0 length=4
                                        radian_per_second: saturated float32 length=4
                        angular_acceleration: uavcan.si.unit.angular_acceleration.Scalar.1.0 length=4
                                uavcan.si.unit.angular_acceleration.Scalar.1.0 length=4
                                        radian_per_second_per_second: saturated float32 length=4
        torque: uavcan.si.unit.torque.Scalar.1.0 length=4
                uavcan.si.unit.torque.Scalar.1.0 length=4
                        newton_meter: saturated float32 length=4

The Linear message is as follows:

reg.udral.physics.dynamics.translation.Linear.0.1 length=16
        kinematics: reg.udral.physics.kinematics.translation.Linear.0.1 length=12
                reg.udral.physics.kinematics.translation.Linear.0.1 length=12
                        position: uavcan.si.unit.length.Scalar.1.0 length=4
                                uavcan.si.unit.length.Scalar.1.0 length=4
                                        meter: saturated float32 length=4
                        velocity: uavcan.si.unit.velocity.Scalar.1.0 length=4
                                uavcan.si.unit.velocity.Scalar.1.0 length=4
                                        meter_per_second: saturated float32 length=4
                        acceleration: uavcan.si.unit.acceleration.Scalar.1.0 length=4
                                uavcan.si.unit.acceleration.Scalar.1.0 length=4
                                        meter_per_second_per_second: saturated float32 length=4
        force: uavcan.si.unit.force.Scalar.1.0 length=4
                uavcan.si.unit.force.Scalar.1.0 length=4
                        newton: saturated float32 length=4

the way these work is you send NaN for the fields you donā€™t want. The vast majority of servos will only want angular_position in radians. One unfortunate consequence of the use of SI units is that it wonā€™t be possible to use this message without the CAN node knowing the angle the servo can go through. I think the only practical thing to do when a CAN node is being used to interface to a non-CAN (eg. PWM) servo is to default to an assumption of 90 degrees range, and allow for a parameter to set the actual range. For native CAN servos they should know the true angular range.

It is unfortunate how bandwidth inefficient these messages are. In UAVCAN v0 each Command was 3 bytes, which is reasonably efficient for something that is typically sent at 50 to 100Hz. In the proposed UDRAL it is 16 bytes per servo. That is a huge increase in bandwidth requirement and is just not practical for something like a complex quadplane with UAVCAN for both ESCs and servos.

The servo message set also offers an alternative approach which is based on the reg.udral.service.actuator.common.sp.* messages, just like ESCs. Unfortunately that suffers from all the problems I mentioned above for ESCs.

Overall I think these ESC and servo messages are a backwards step from what we have now with UAVCAN v0 which is surprising as I think the v0 messages are not great.

Conclusion

Which brings me to the summary from having looked through these 3 sample message sets (battery, ESC and servo). I think it shows that the lessons I tried to make clear in my DS-015 analysis have really not been learnt. There would be a lot of pain in moving from v0 to v1 for users, for developers and for vendors. To balance that pain there would have to be really big gains in what we get out of the protocol. So far all Iā€™m seeing is pain (and loss of functionality) and no gains. So there is absolutely no motivation to adopt UAVCAN v1 with UDRAL.
For ArduPilot to adopt v1 Pavel will need to have a major change of heart, and give up some of the things heā€™s been adamant on. There are many ways to do that that Iā€™ve suggested over the past months. Iā€™m sure there are other approaches, but to just keep insisting that all is well and that the detractors are just exaggerating the problems is not going to get us to a solution.
That is even without the really big issue of a clear migration path from v0 to v1. We would have to develop capabilities messages so we can automatically work out which of the new features we can use in a mixed network (eg. bit rate, message sets). Without that technical underpinning of the migration path then v1 is dead in the water.

we must give up this ā€œmust be SIā€ mantra. These standards are not being developed in a vacuum. None of the battery vendors use coulombs and certainly none of the end users, and it just gives a new way to trip up developers.

I could be wrong, but my understanding was that groups are handled using separate topics/port IDs, and named.

This is what groups are designed to address. Your quadplane ESCs would be on one group, while the fixed wing servos and forward thruster would be on another group. This way you can command each without any interference with the other, and no wasted bandwidth there. Anything that should be handled synchronously by a single control loop (for instance, you would always want to command the 4 vertical ESCs of a quadplane together) are in a single group.

Sorry, I donā€™t understand what you mean here by change the state? Isnā€™t that up to the ESC itself?

Why would you want to set this value on the flight controller? In my mind itā€™s an implementation detail that only the drive itself should be worried about. Making the flight controller keep track of physical motor specifications instead of hiding them behind a motor drive-level abstraction layer seems counterintuitive to me, especially as there are other motor properties used by more complex drives. Furthermore, both PX4/Sapog and Zubaxā€™s Myxa ESCs currently configure the poles on the ESC already, and thatā€™s worked fine for me in the past. This also allows a potential vendor of an integrated motor + drive solution to create a drive that does not need any user configuration to work in that regard.

I generally agree with your bandwidth concerns with feedback, I donā€™t have any specific comments but Iā€™m also worried that this might be a bottleneck.

Agreed.

This is a whole other beast that you should probably bring up in a new thread if you want to. :+1:

I hope that clarified at least a few things about where the design is coming from.

Thanks @tridge. In summary it seems that for the ArduPilot use cases, the cost (in terms of bandwidth, utility and complexity) of implementing highly disaggregated messages outweighs any (potential) benefit of composability, which is compounded by the limitations imposed by strict SI units. These costs would largely remain even if specific message contents and rates are fixed/resolved.
To me the remaining advantages of UAVCAN/CAN v1 are FDCAN support, message extensions/versioning, removal of TAO, and the opportunity to revisit message definitions (an acknowledged weakness in v0.9).
With no identified pathway to reasonably migrate from v0.9 to v1, immature tooling and libraries (afaik libuavcan remains WIP), and continued debate over message/service structure and core design principles, is it perhaps better to iterate v0.9 than continue going around in circles with v1?
Some of the necessary work to improve v0.9, such as FDCAN support, is already well underway (https://github.com/ArduPilot/ardupilot/pull/17336), and it is most certainly easier to manage migration taking that approach.
Any work expended in designing a refreshed message set would be re-useable for future implementation in v1, should Pavel relax his idealogical insistence on absolute composability and base SI units.
Food for thought.

that is certainly the path that ArduPilot will go down if v1 doesnā€™t change a lot from itā€™s current path.
Incrementally improving v0.9 would gain us a very clear migration path, smaller code, good existing tools and vastly less pain for users, developers and vendors.
Note that Sid already has uavcan_gui_tool working with FDCAN on v0.9.

it also add a lot of complexity.
Note that if we use a bitmask approach along with packed array of commands then we can also command any subset of actuators, and it involves a lot less complex infrastructure, less code in the flight controller, and will be an awful lot easier to understand in a network analyser. It will also be more bandwidth efficient.

being able to distinguish between ā€œhold current positionā€ and ā€œgo slackā€ can be useful in an ESC. It is common to want to get an ESC aligned with the airflow when in forward flight on a quadplane. Iā€™m not sure that the messages in this UDRAL draft are actually sufficient for that, as you really want some way to command the ESC to go into a ā€œslow rotationā€ mode while you align with a set of sensors, then command a ā€œholdā€ torque once you are happy. I suspect weā€™d actually need a new message to handle that sort of thing.
Similarly with servos it is very useful to be able to command a servo to ā€œgo slackā€ versus ā€œhold current positionā€. The difference becomes critical when you have multiple servos doing one job, and you want to be able to tell one that is misbehaving to stop trying to control position. Whether this is actually good enough for real servo redundancy will depend on the level of assurance you need. It would be much safer to have a separate mechanism to cut power to the servo when you want it inactive, and make sure you choose servos that are loose when not powered.

ok, thanks!
I donā€™t see how to make it produce a compact summary though.

Yeah, maybe thatā€™s something we should add. Iā€™ll fille an issue later and hopefully @bbworld1 can get to it at some point.

Glad to see new activity on the UDRAL standard.

Thanks for the suggestion, Iā€™ll work on adding something like this.

Correct me if Iā€™m wrong, but wouldnā€™t this be handleable by setting a force setpoint of zero? A separate message here might be a good idea anyway however, so we can see about adding a message for this.

I will try to post a more comprehensive review of the ideas on this thread when Iā€™ve read more.

2 Likes

thanks!

On battery

The first thing we notice is it puts the concept of how the battery will be used inside the node providing the service. I think this is a mistake.

I see that you made an inference that I did not expect a reader to make. My bad ā€“ perhaps I should improve the example. A microservice does not necessarily need to be aware of the context it is being incorporated into.

In the expansion, we have current, voltage, energy and full_energy. Unfortunately there is no information on what to give if the node doesnā€™t have the information. Low end CAN battery monitors (where they just connect to a battery over an XT60 for example) donā€™t know the full_energy or energy.
<ā€¦>
MAVLink uses a current_consumed instead of energy_remaining, which better fits adapters that donā€™t know the full capacity of the battery.

You are approaching the problem of network service design from the wrong end.

Conventional SOA prescribes that a network service is designed based on the needs of the service consumer ā€“ in your case that would be the flight controller. The flight controller needs to be aware of the available energy and the full energy, hence they are to be defined in the service contract.

There already are nodes out there (e.g., BatMon) that implement this interface as described. Shall a limited node be unable to uphold it, the missing parameters may need to be configured separately using the register API.

Designing a network service based on the capabilities of the least capable node is backwards.

One question that is likely to arise here is how do we implement UDRAL in the existing nodes of limited functionality that are available here and now: the most basic battery monitors, UAVCAN-RCPWM adapters, and so on. As they are unable to implement the full spec, we should either define reduced interfaces (e.g., see the differential pressure sensor demo vs. the air data computer example), or describe which quantities are to be configured using the register interface (such as the number of poles in the motor, the travel range of a servo, the full energy in the battery, and so on). If we document these aspects explicitly (easy), we would have a sensible migration plan in place that would allow us to bridge the gap between the current, limited v0 nodes and the feature-complete v1 nodes in an iterative manner, without rendering userā€™s hardware obsolete overnight.

Using coulombs for available charge seems like a poor choice when Ah or (better) Wh is much more commonly used. The conversion isnā€™t too hard though.

SI is a non-negotiable requirement for reasons explained in the Guide. We should be aware of the costs associated with not using SI.

It has a design_cell_voltage_min_max, but not the number of cells. Are you supposed to wait for the status message and look at the array length? If that is the plan then it wonā€™t work, as it is quite common for a smart battery to not be able to read cell voltages on all the cells.

Generally, my answer about service design applies. Specifically, in this case, I see no trouble with adding an extra field for the cell count as it does not seem to introduce extra coupling, abstraction leaks, or other undesirable effects.

The Parameters message lacks any model name string or manufacturer name. Iā€™d very much like to log the manufacturer info and serial number. It does have a unique_id, but I think a manufacturer string is worthwhile. I suspect that most UAVCAN battery monitors will backend onto SMBus battery systems, which do offer a manufacturer name and also a manufacture date. As the age of a battery can be quite important (for maintenance schedules at least) we really should include the date as well.

I see no problem with that. Let me update the DSDL accordingly.

Overall the proposed battery messages are not terrible. There are some issues, but not awful.

We are making progress!

On actuators

The problem is I cannot at all see how the group you are trying to control is communicated.
<ā€¦>
Can someone explain how groups work?

A set of ESC belongs to the same group if they subscribe to the same setpoint subject. If in ArduPilot you always have one setpoint subject, you always have one group. For example, in VTOLs, you may have two groups: a high-frequency group for lifters and a low-rate group for the main thrusters.

This way of handling arrays of ESC commands is very awkward, and requires that the flight controller have code for sending a bunch of different messages for the same purpose, presumably using a set of if statements to work out the threshold.

Perhaps this is awkward, and I am not particularly keen on keeping this design. The idea is to eliminate the redundancy between the array length prefix and the total message length, with the long-term goal of possibly supporting flex arrays in DSDL natively so that the multiple vector definitions could be avoided. We can easily switch back to the old regular variable-length array if preferred.

It is also inefficient if the ID number of the ESCs youā€™re trying to address doesnā€™t start with zero. For example, you may want your ESCs on a quadplane <ā€¦> As it stands, bandwidth usage will depend a lot on the IDs you choose.

Irrelevant ā€“ see how groups work.

the flight controller will want to combine the data from these messages at a single point in time in order to make decisions about whether an ESC is in a truly healthy state, or if something may be going wrong. This is needed both for in-flight diagnostics and for accurate logging and analysis of a failure.

This is a non-issue as it can be done trivially in a dozen lines of code with no significant runtime costs incurred.

Speaking of which, I should reiterate that I am willing to make the Consortium (co-)sponsor the software development on ArduPilot/AP_Periph in order to provide a correct and conformant implementation of UDRAL. I donā€™t expect the ArduPilot team and its partners to carry this burden alone.

it is much less efficient on the wire

sigh

image

putting all the info in one message

This breaks the protocol, Andrew. Did you not read my post on the benefits of SOA, and why we should care? You are proposing to ditch the sensible design for the imaginary benefit of conserving the bandwidth.

It saddens me to see that you choose to flat-out disregard the arguments I introduced earlier. It is going to be difficult to find a common ground like this.

It isnā€™t clear what the purpose is of sending a timestamp in an actuation command. Is the servo supposed to not act upon the command until the timestamp is past?

Setpoints are not timestamped, only feedback messages are. This is explained in the docs for the servo service.

I think the only practical thing to do when a CAN node is being used to interface to a non-CAN (eg. PWM) servo is to default to an assumption of 90 degrees range, and allow for a parameter to set the actual range

Just like in the case of motor poles in an ESC, the missing context is to be provided through configuration.

Overall I think these ESC and servo messages are a backwards step from what we have now with UAVCAN v0 which is surprising as I think the v0 messages are not great.

While certainly imperfect in some aspects, the new design is supposed to be a leap forward, although to understand that we need to do some aligning on the values of service-oriented design first. This also involves changing the typical UX compared to what we have with v0.

Next steps

We are clearly failing to communicate here. For some reason, you are choosing to stick to the old practices instead of at least considering the new values I am trying to communicate.

Let us try a different approach. I suggest you, Andrew, to come up with an alternative set of DSDL definitions addressing the three use cases weā€™ve discussed here: smart battery, ESC, and servo. If you donā€™t mind, I recommend we call it ā€œAPDSDLā€.

I ask you to respect only the following rules:

  • No fixed port-IDs ā€“ UAVCAN v1 doesnā€™t really support them outside of the standard message set.

  • Adhere to SI ā€“ there is no valid reason not to use SI; the fact that the smart battery spec or any other standard doesnā€™t use SI couldnā€™t be less relevant.

It is probably best to host this in a separate repository for now.

You may find Nunavutā€™s HTML backend helpful here. This is how you invoke it on a specific DSDL namespace:

nnvg --experimental-languages --target-language html /home/andrew/apdsdl --lookup-dir public_regulated_data_types/uavcan

Iā€™m proposing to design it to include support for both least capable and most capable nodes. This can be done by either adding a consumed_energy alongside full_energy or just having full_energy.
The idea is to get as much functionality as possible with the minimum of user configuration. Having a CAN battery monitor provide useful information without it being configured is a good thing. If the user then does configure it then it can provide more information.
It is also very common with these XT60 based adapters to change battery capacity between flights. It is much easier to let the CAN adapter report what it can directly measure (consumed energy) and let the user configure the total capacity in the flight controller using existing interfaces for that setting.

this is only a reasonable approach if we strive to minimise the number of additional things to be configured. For v1 to have any hope of adoption it needs to provide a user experience at least as good as v0, preferably better. Otherwise v1 will quickly get a reputation of being painful to use and best avoided.

This is a dogmatic and poor decision. Using Ah or Wh would make for less mistakes. It is what the underlying devices will use, and it is what the target for these services will use. It is also what the users expect, and what engineers and developers are used to.
Using coulomb just increases the chance of bugs due to unfamiliar units.
In scientific literature and in engineering using domain specific units is common and good practice. In astrophysics solar masses are used, not kg, when talking about the weight of a black hole.

this doesnā€™t seem to add any value. To use it requires more coupling down to the network layer. Far better to offer a larger linear addressing scheme and let the flight controller deal with breaking that into groups if it wants to.
For example, a msg like this:

  uint8 base_idx
  uint8 mask
  float16[<=8] command

that allows up to 8 to be sent in a single message, it has the same network cost no matter what the indexes are and allows for up to 263 actuators.

It canā€™t be done correctly in a dozen lines of code. You need to cope with missing packets and you need to cope with the timestamps not matching (we donā€™t control all the implementations). It also requires memory allocation for the buffered messages, which could come in in any order. So to implement it correctly is quite complex. It is just a completely unnecessary complication.

Showing the bandwidth is OK for one config, or even a set of configs, does not mean that throwing away bandwidth is OK.
One of the key things users will want is to be able to push their hardware to the limit. As Iā€™ve told you several times, there are existing systems out there that already use nearly all the bandwidth on 1MBit buses. Simply dismissing bandwidth concerns is not going to see v1 adoption.

yes, and absolutely nothing in your reasons aligns with the needs of the ArduPilot developer or user community. Weā€™re not going through a purely academic exercise here. At the end of the process I need to justify the considerable pain and time involved in switching to v1, including hardware costs (more flash), bandwidth, software complexity, tools etc.
Right now v1 is not looking attractive at all.

I could say the same.

that again makes the user experience worse.
If we provide an eRPM and we also provide a field for poles then you could sell UAVCAN ESCs that do not need a config tool to be run to change node config. The node ID could be auto-assigned, and the user could turn the propellers in sequence to tell the GCS what order the propellers are in and what spin direction to use. The filght log can then show an eRPM and the user has a simple to setup, easy to use system.
If instead we require the ESC to give a number in radian/sec then the ESC canā€™t report anything till the user has configured each ESC with the right number of motor poles. That makes these ESCs less appealing for wide usage.

that just isnā€™t going to happen. It is a recipe for a very unpopular protocol in the UAV space.

we are looking at a new message set, but weā€™re doing it in the v0 framework. At this stage it seems unlikely you will be willing to accept the sorts of changes to v1 that would be needed for it to be an attractive option so evolving v0 seems like a more productive approach.

The basic adapters you are referring to here are not smart batteries, so trying to interface them using the smart battery interface is not expected to work. This is one of those cases where a more limited interface is needed to bridge the gap between the devices that are currently available and the full-featured UDRAL nodes.

A message like this is inferior to the alternative that contains only the setpoints because it couples the means of group management (base_idx, mask) with their application-layer data (command). It may not seem like much of a difference at this point, but as systems grow in complexity it becomes critically important.

I am confident that one can always find a way to run out of bandwidth regardless of how much bandwidth optimization was done. The question is whether we should care. We have robust data that shows that common configurations do not require much more than half of the available bandwidth, and we build our decisions upon that. You seem to be driving your design based on a vague notion of running out of bandwidth without having ever introduced any evidence to support that.

You are aware of how MAVLink breaks for marginally unconventional use cases that have not been envisioned by the designer of the data types, which occasionally requires arcane solutions for trivial issues (like reporting the fuel tank level using messages designed for smart batteries). Problems of this sort will grow in magnitude over time as the industry moves on to more sophisticated architectures, and I donā€™t want UAVCAN to be an obstacle to that, like MAVLink.

Let us review the changes you consider necessary here, please. I am confident that nobody would benefit from you sticking to v0 and the MAVLink way of handling data on the network.

Today is October 15 ā€” per the original plan, the UDRAL proposal is intended to move on now.

Based on the feedback provided by @tridge, we expect that ArduPilot is not going to accept the new standard due to the incompatible business requirements; yet, accommodating their requirements would compromise the design. It does not appear to be possible to find a middle ground here that would satisfy all involved parties, therefore, we are going to split the project into two independent efforts:

  • The UDRAL standard is going to continue on its current course.
  • A separate application-layer standard under the working name of v0_style will be developed alongside.

As the name suggests, the second standard is going to be designed in the spirit of UAVCAN v0, with fixed regulated port-IDs and other features inherent to v0. Based on the inputs provided by Andrew earlier, @dagar and @coder_kalyan have drafted up the initial vision of v0_style on a separate branch here:

I would like to ask @tridge if this is more in line with his expectations. Kalyan and Daniel may add extra details later. I am going to set up a separate place for the second SIG here soon.