First draft of the UDRAL DSDL namespace

The lengthy debate on the architecture of UDRAL culminated in the following draft that is currently maintained on a feature branch of the public regulated data types repository:

Everyone is welcome to read through the enclosed documentation in README.md and submit feedback in this thread.

At the moment, the draft only includes the definitions for three basic network services: smart battery, ESC, and servo. Additional services will be added incrementally. One known issue is the reg.udral.physics namespace, which may need some refactoring; any help to this end would be welcome.

Those interested in the basic ideas behind this design may find this thread relevant: On multi-agent services and design guidelines

The demos have been updated to use the UDRAL draft instead of DS-015:

Overview

This design is based on the extensive notes taken by @coder_kalyan and @bbworld1 earlier, as well as the late DS-015 draft. The new design is intended to address the following issues raised on this forum:

  • Composability and service orientation – by inheriting core design principles from DS-015.
  • Type compatibility validation – by adopting the type compatibility check proposal submitted by @VadimZ.
  • Automated port-ID assignment – by adopting the automated port-ID allocation proposal for which I made a PoC.
  • Well-documented port-ID assignment – by adopting Vadimfiles containing the mapping from port names to port-IDs as proposed by @VadimZ.
  • Bandwidth limitations – by validating the expected bandwidth utilization of this design against a set of typical vehicle configurations.
  • Cognitive complexity – by relying on auto-generated rich HTML documentation for reading the data type definitions instead of raw DSDL files.

Find details on each item below.

Composability and service orientation

The draft upholds the principles of service-oriented design as explained in the Interface Design Guidelines published in the UAVCAN Guide.

This approach necessitates splitting large specialized messages as found in v0 into smaller orthogonal messages. Subscribers that are interested in the full set of data will need to perform timestamp-based matching. One related concern was that of the bandwidth implications, which is addressed in a dedicated section below.
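As a rough illustration of what timestamp-based matching could look like on the subscriber side, here is a minimal Python sketch. The `(timestamp, payload)` tuple representation, the function name `match_by_timestamp`, and the `tolerance` parameter are assumptions made for this example; they are not part of the draft or of any UAVCAN library.

```python
from bisect import bisect_left

def match_by_timestamp(a, b, tolerance):
    """Pair each sample in `a` with the closest-in-time sample in `b`.

    `a` and `b` are lists of (timestamp_seconds, payload) tuples sorted by
    timestamp (a hypothetical representation). A pair is produced only if
    the two timestamps differ by at most `tolerance` seconds.
    """
    b_times = [t for t, _ in b]
    pairs = []
    for t_a, payload_a in a:
        i = bisect_left(b_times, t_a)
        # The nearest neighbour is either just before or just after index i.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(b)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(b_times[k] - t_a))
        if abs(b_times[j] - t_a) <= tolerance:
            pairs.append((payload_a, b[j][1]))
    return pairs

# Hypothetical samples: positions at 100 Hz, velocities arriving less often.
positions = [(0.000, "p0"), (0.010, "p1"), (0.020, "p2")]
velocities = [(0.001, "v0"), (0.019, "v2")]
matched = match_by_timestamp(positions, velocities, tolerance=0.002)
print(matched)
```

A real subscriber would likely do this incrementally over a bounded buffer rather than over complete lists; the sketch only shows the matching rule itself.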

Type compatibility validation

@VadimZ has demonstrated a convincing prototype of a data type signature that is compatible with the structural polymorphism and extensibility introduced in UAVCAN v1. The technical details are discussed in the dedicated thread.

At the moment, the draft does not mention this capability at all. I would like to invite Vadim to submit a pull request adding the documentation for his type signature directly to the UDRAL’s README document. Also, the actual implementation of the signature generation code should probably be accepted into PyDSDL, and the data types generated by Nunavut should be annotated with their signatures.

Automated port-ID assignment

A detailed description is already available in the README document. The only implementation available at the moment is my PoC in Python. Volunteers are needed to provide a production-grade implementation of this logic in C++ for eventual inclusion into PX4 & ArduPilot. Please report here if you would like to take this on.

Well-documented port-ID assignment

Vadimfiles are intended to define the mapping between ports and their identifiers. This proposal has seen very little development so far, so I would like to encourage @VadimZ to continue on this promising path once the type signature is documented in the UDRAL doc and its first implementation deployed in PyDSDL and Nunavut. I suggest we start a separate thread dedicated to this sub-project.

Bandwidth limitations

A deeper inquiry revealed that the new design is not significantly inferior to v0 in terms of its bandwidth utilization. This is mostly attributed to two properties:

  1. The new design provides a standard interface for turning off undesirable publications via uavcan.pub.*.id, which is not possible with the legacy version.
  2. The new design avoids large specialized messages in favor of smaller, generic, orthogonal messages, which can be published at different rates, thus conserving bandwidth.
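The effect of publishing orthogonal messages at different rates can be sketched with a rough utilization estimate. This is an illustration under stated assumptions, not the method used in the spreadsheets: the frame length of 131 bits is the size of a Classic CAN 2.0B data frame with a 29-bit identifier and 8 data bytes, including the interframe space and ignoring bit stuffing, and all message sizes and rates below are made up for the example.

```python
# Length of a Classic CAN 2.0B extended data frame with 8 data bytes,
# including the 3-bit interframe space. Bit stuffing is ignored, so real
# frames can be somewhat longer.
FRAME_BITS = 131

def bus_utilization(streams, bitrate=1_000_000, frame_bits=FRAME_BITS):
    """Estimate bus load as a fraction of the bitrate.

    `streams` is a list of (frames_per_transfer, rate_hz) tuples.
    """
    frames_per_second = sum(frames * rate for frames, rate in streams)
    return frames_per_second * frame_bits / bitrate

# Hypothetical numbers for illustration only.
# One large 3-frame message that must be sent at the fastest required rate:
combined = [(3, 100.0)]
# The same data split into three single-frame messages, each at its own rate:
split = [(1, 100.0), (1, 10.0), (1, 1.0)]

print(f"combined: {bus_utilization(combined):.4f}")
print(f"split:    {bus_utilization(split):.4f}")
```

Whether splitting wins in practice depends on how many of the fields really can be published at a lower rate, which is what the vehicle-specific spreadsheets are meant to establish.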

While the new design does indeed require more bandwidth to attain the same goal, the difference is not significant enough to make it unsuitable for Classic CAN networks (which are not the primary transport for this standard). For reference, this spreadsheet submitted by @Dima directly compares the v0 vs. v1 bandwidth utilization for a reasonably complex VTOL:

There is another assessment for a similar configuration done for v1 only:

Cognitive complexity

While the HTML documentation generator created by @bbworld1 is not yet finished, it already provides a significant improvement over browsing DSDL files manually. Please submit feedback and change proposals on GitHub.

Next steps

I started this thread to collect feedback and introduce appropriate modifications to the draft. Once this stage is completed, the draft will be merged into the main branch of the public regulated data types repository, replacing the defunct DS-015 proposal.

We need to complete the two major proposals submitted by @VadimZ: the type signature and the Vadimfiles.

Also, the physics namespace requires a review to determine whether its refactoring is necessary.

Nice milestone! While not (yet) perfect, I think this is a significant improvement from the days of the DS-015 struggles and can be a base for future improvements.
If this seems a lot to digest (and it certainly is), what might help is to see and review it as a combination of (mostly) independent proposals:

  1. UAVCAN service discovery and initialization protocol
  2. supporting tools and conventions
  3. specific service/message design for drone application area

The first two items, if proven useful, would be applicable to all of UAVCAN beyond drones.

Besides the inno_vtol bandwidth, it would be nice to see a similar calculation based on any complex UAVCAN application. It would be especially interesting to see something close to the limit (I remember that tridge wrote that they are running out of bandwidth on v0 already). For the attached inno_vtol case, it seems that the increased frame rate after moving to v1 will not lead to any issues. Even if we use our UAVCAN HITL simulator, which publishes additional IMU data, it will be OK.

Are the values in this spreadsheet correct? I went through a very similar exercise a while back, and for standard COTS components that do actually include the “optional” covariance matrices in the v0 messages, the actual frame count per message is considerably higher than what is listed. For example, the frame count for a compass is actually 2; for the GNSS Fix message, it’s 12. For an accurate bandwidth utilization study, the “worst-case” message size from v0 would be a more accurate representation of the typical bus utilization.

In any case, it’s good to have a nice example fleshed out for a reasonable VTOL vehicle, and still have the bandwidth utilization for v1 on a 1Mbps CAN 2.0B bus be < 30%. I think the bandwidth for our (Volansi’s) vehicles might be higher, but on the other hand, we may split devices between two buses, and further, we’re planning to adopt CAN-FD as soon as we can, so it will be a non-issue either way.

My understanding is that @Dima’s spreadsheet was supposed to model a very specific vehicle in its current configuration rather than a synthetic worst case. If you could share a similar one for your case, that would make us more confident in our bandwidth analysis.

I am planning to open a pull request from the current udral branch against the main branch after Vadim’s signature PR is taken in. I would like it to be merged by Oct 15 (moved from Oct 1st) unless there are major change proposals. Note that all of the DSDL definitions in it are versioned as v0.1, so we are not committing to long-term stability yet.

The pull request is up:

Responding to @coder_kalyan’s comment on GitHub:

I would suggest allocating more time for interested parties to actually take a look at it in depth and comment

I see no problem with that. Shall we make it Oct 15? As I just wrote above, merging this PR does not amount to any hard commitments.

Sounds acceptable, but considering the previous feedback on the original DS-015 message set, we should wait until everyone is actually content with the draft before merging (and I don’t think everyone is currently), which may or may not take until after Oct 15. The point is to make a satisfactory message set, not to meet a deadline if that means ripping it up later on.

I’ll provide my own neutral review soon when I get a chance.

This objective is likely to be unreachable within a realistic time frame, which is why having a soft deadline is important. If one prefers a v0-style design, that is simply not going to happen regardless of the amount of time allocated for the debate.

The issues raised here: https://forum.opencyphal.org/t/meeting-minutes-july-21-2021-utc-udral-call/1365 and https://forum.opencyphal.org/t/uavcan-drone-application-layer-sig-guidelines/1280/16 provide a good summary of the requirements a UDRAL proposal should meet. The definitions in the PR are nothing but find/replace drone/udral on the DS-015 definitions. With no attempt to address the shortfalls identified in that message set, I see no reason to expect different feedback on them.
As I’ve mentioned before, I think we need to get the design agreed and the tooling in place before publishing message definitions. I don’t think we’re at that point yet.
I’d also suggest that as definitions are developed we leave them in a branch until they’re iterated, agreed, and stable. MAVLink has demonstrated many times that WIP messages have a habit of making their way into production systems, making it very difficult to iterate and change them. That’s why in MAVLink we’ve moved away from WIP messages in favour of development.xml (which is annoying, but a necessary compromise given the workflows in that project).

The proposed design addresses every significant issue raised on this forum so far, as I explained in the OP. If you find this to be untrue, please provide a detailed response instead of making vague references to past discussions. The UDRAL DSDL codebase did not see significant change compared to DS-015 because none was necessary to address the known issues.

Regarding the risk of publishing WIP types: I see your point, but I imagine that UAVCAN is reasonably shielded from this risk by virtue of incorporating a well-defined version number with every definition. One should not expect a data type with a version number of v0.x to be stable. Yet, having WIP types available in the main branch is helpful as it encourages early-stage experimental adoption.

Re WIP / versioning. WIP messages in MAVLink were explicitly tagged as WIP in the XML. That didn’t stop people from deploying them. I understand the intent to make prototyping easier, but we also need to protect users as much as we can. Any dev can build off a development branch, which in my view is a safer approach. Perhaps we could ask the SIG/consortium?

Re specific points, if you’d prefer I copy them in here, sure:

Andrew Tridgell (tridge), UAVCAN Consortium member representative

May 4

yes, we should have simple ones that just reflect the real data. I don’t think we should add the “air_data_computer” unless there is going to be real hardware that will really be used that needs it.

we should just stop using covariance matrices. It just encourages developers to make up meaningless numbers. We shouldn’t be wasting bandwidth on stuff that is just made up.
A few guiding principles in message design:

  • don’t add fields unless there is a real need for them
  • don’t add fields that force the developer to make stuff up that they don’t really know

closer, but should remove a bunch of fields.

  • I don’t think the timestamp really has value on this message. Timestamps have enormous value on messages like GNSS position and velocity, but on differential pressure used for airspeed I don’t think it is useful. The time of arrival is fine.
  • remove both filter_delay and the filtered differential pressure. It would only make sense if we were greatly reducing the sample rate on the bus and I don’t think we are likely to be doing that.
  • get rid of the variance, as the sensor is unlikely to really have a good measure of that
  • only have one temperature

We should aim to get it down to a single CAN frame if possible.

but that doesn’t tell you what this reading actually is. It just says it is a difference between two pressures and a temperature (plus a pointless timestamp and covariance). We need it to be broadcast in a form that says “this is from a pitot tube, if you want to get a pitot based airspeed you can use this”.

To summarise, this PR does not address the fundamental problem:

Vadim has attempted to address the port identification/type safety issues, but the other significant rub points are untouched.
The DS-015/UDRAL messages assume high level functionality in each node, and this simply isn’t the reality for most of the systems that currently use UAVCAN. For some nodes, such as actuators or gimbals, this level of abstraction can be made to work. For many sensors, such as GNSS and other low level devices, it can’t without introducing risk. It is not reasonable to expect that this can be overcome by a one-off configuration by the integrator, or by transferring complexity to another system (i.e., the autopilot).
Developing a standard based on an idealised expectation of some future reality doesn’t make sense. Having the scope/flexibility within the standard to adapt to a future state is important, but for the standard to have any chance at success it needs to also be able to functionally replace v0 (particularly given that you’ve deprecated v0).
The reality is that if UAVCANv1/UDRAL doesn’t achieve that, it fails. With what you’ve presented here, it fails.

I provided a detailed review of the benefits of SOA in this post: https://forum.opencyphal.org/t/port-type-safety-enforcement/1303/73?u=pavel.kirienko. The bandwidth concerns (along with the derived issues, such as timestamping and covariance) appear to be unsubstantiated, as illustrated in the OP.

As for the other points, they are already covered in the OP, and I see little value in restating them.

I agree and disagree with some of the points made here. Note that I also made some notes in the UDRAL planning document based on a compromise, and I’d like to see some of those decisions implemented here (I think they were quite reasonable).

Developing a standard based on an idealised expectation of some future reality doesn’t make sense. Having the scope/flexibility within the standard to adapt to a future state is important, but for the standard to have any chance at success it needs to also be able to functionally replace v0 (particularly given that you’ve deprecated v0).
The reality is that if UAVCANv1/UDRAL doesn’t achieve that, it fails. With what you’ve presented here, it fails.

Re flexibility: My original design draft aimed to create usable low level data types as well as high level services for each service class. The idea was to make it very practical to use low level types while providing a path to transition to high level services. Most of this is alright currently, but I believe some of the physics namespace types are meant for higher level services and don’t really promote the use of direct publishing.

we should just stop using covariance matrices. It just encourages developers to make up meaningless numbers. We shouldn’t be wasting bandwidth on stuff that is just made up.

I agree with this. Unfortunately no one has convinced me yet that they are useful enough (or useful at all) to justify the bandwidth they are taking, regardless of whether the bandwidth can be spared.

We should aim to get it down to a single CAN frame if possible.

This is not a necessary goal for a GNSS or air data frame. However, it is definitely a necessary goal for an ESC setpoint - something that I believe I outlined in the planning document which was not implemented in the message draft.

Developing a standard based on an idealised expectation of some future reality doesn’t make sense.

This is only partly true - standards tend to (and should) outlive specific implementations and design choices of the day. The best we can do is think hard about the current “best practices” in the hope they are somewhat future proof. I think UAVCANv1 does that correctly.

The bulk of the call was basically to-ing and fro-ing about “how do we use an architecture designed for high level distributed computing as a low level sensor network”.

We don’t - the idea (in my mind) was to provide a set of solid low level types that vendors can use temporarily without concern, while providing higher level abstract services to migrate to that are more practical in a more complex system.

For many sensors, such as GNSS and other low level devices, it can’t without introducing risk.

How so? I believe all the proven issues were taken care of by @VadimZ’s proposals.

In any case, it’s good to have a nice example fleshed out for a reasonable VTOL vehicle, and still have the bandwidth utilization for v1 on a 1Mbps CAN 2.0B bus be < 30%. I think the bandwidth for our (Volansi’s) vehicles might be higher, but on the other hand, we may split devices between two buses, and further, we’re planning to adopt CAN-FD as soon as we can, so it will be a non-issue either way.

I like the idea here - but the question of what vehicle to use as a standard remains. UDRAL currently takes up more bandwidth than I’d like on a 1Mbps CAN 2.0B bus, which is the reality of the hardware most of us are working on. For instance, my team develops an octocopter, and we’d really like to be able to fit that as well…

I think the approach of presenting abstract arguments (or pointers to past arguments) is unlikely to bring consensus here.

I would like to propose a two-part approach:

  1. Try reviewing the transport, service discovery, and tooling parts (service registers, nunaweb, type signature, etc.) separately from the message formats. Here the basic requirements are likely to be uncontroversial, so it should be possible to find common ground.
  2. When discussing the message formats, it would be more productive to focus on comparing specific, fully described alternatives of message implementation, before their abstract properties. It might also make sense to postpone this discussion until after some common ground is established on part 1.

@auturgy, are you seeing remaining rub points outside of message design (with several sub-issues such as granularity, nesting, timestamping, and bandwidth optimization)?

That’s a good starting point. Let’s examine a GNSS + magnetometer device case in more detail. We are actually manufacturing those now, so I could meaningfully participate.

When discussing the message formats, it would be more productive to focus on comparing specific, fully described alternatives of message implementation, before their abstract properties.

Sure! Let me propose something simple below then, related to the actuator setpoint service, since it’s a very simple case to start out with:

Currently the generic setpoint messages look like this:

float16[<n>] value
@extent 16 * 256

I propose modifying it to use 14-bit integer setpoints (same as v0):

int14[<n>] value
@extent 16 * 256

The above has the benefit of halving bandwidth usage on quadcopters when using Classic CAN (7-byte payload * 8 bits / 4 setpoint values = 14 bits, so four setpoints fit in a single frame), which are likely one of the most common use cases. Octocopters also see a decrease in bandwidth usage. Since ESC setpoints are relatively heavy (high rate), this is a significant improvement, and it also allows the integrator to potentially raise the ESC setpoint frequency. There’s also minimal loss in semantic meaning: they are still normalized/scaled ratiometric setpoints with sufficient resolution [-8192, 8191].
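The frame arithmetic can be checked with a short Python sketch. It assumes the UAVCAN v1 Classic CAN framing, where each 8-byte frame carries 7 payload bytes plus one tail byte and multi-frame transfers append a 2-byte transfer CRC, and it treats the array as fixed-length (no length prefix). The function name `transfer_layout` is made up for this example.

```python
import math

PAYLOAD_PER_FRAME = 7   # Classic CAN: 8 data bytes minus 1 tail byte
TRANSFER_CRC_BYTES = 2  # appended to multi-frame transfers only

def transfer_layout(element_bits, count):
    """Return (frames, bytes_on_bus) for a fixed-length array of setpoints.

    bytes_on_bus counts the serialized payload plus the transfer CRC when
    one is needed; tail bytes and CAN frame overhead are not included.
    """
    payload_bytes = math.ceil(element_bits * count / 8)
    if payload_bytes <= PAYLOAD_PER_FRAME:
        return 1, payload_bytes  # single-frame transfer, no CRC
    total = payload_bytes + TRANSFER_CRC_BYTES
    return math.ceil(total / PAYLOAD_PER_FRAME), total

for motors in (4, 8):
    for name, bits in (("float16", 16), ("int14", 14)):
        frames, size = transfer_layout(bits, motors)
        print(f"{motors} x {name}: {frames} frame(s), {size} byte(s)")
```

Under these assumptions, four setpoints drop from two frames to one. For eight setpoints the frame count stays at three and the saving is limited to a shorter last frame.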

Setpoint efficiency
  • Approve (14 bit scaled integer setpoint)
  • Disapprove (keep the 16 bit float setpoint)

I think I like this idea, though for slightly different reasons.

The change from floating point to fixed point would require an explicit scaling parameter for non-ratiometric control variables (those with a physical unit, such as current or rotation speed). Having this extra parameter slightly increases configuration complexity; however, it makes the quantization uniform and, more importantly, predictable and explicit.

FP16 would provide an illusion of infinite range, and then surprise with unexpected non-uniform quantization effects.
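The difference between the two quantization behaviours can be seen in a small Python sketch. The `full_scale` argument stands in for the explicit scaling parameter mentioned above; its name and the value 4096.0 are assumptions chosen for the example, not something defined in the draft.

```python
import struct

def float16_roundtrip(x):
    """Encode x as IEEE 754 binary16 and decode it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def int14_roundtrip(x, full_scale):
    """Quantize x to a signed 14-bit integer and scale it back.

    `full_scale` is a hypothetical configuration parameter: the physical
    value that corresponds to a raw count of 8192.
    """
    step = full_scale / 8192
    raw = max(-8192, min(8191, round(x / step)))
    return raw * step

# float16: the step size depends on the magnitude of the value.
print(float16_roundtrip(1.3))      # error below 0.001
print(float16_roundtrip(1000.3))   # step is 0.5 in [512, 1024)
print(float16_roundtrip(2049.4))   # step is 2.0 in [2048, 4096)

# int14 with full_scale=4096: the step is 0.5 everywhere, and values
# outside the configured range are clamped explicitly.
print(int14_roundtrip(1.3, 4096.0))
print(int14_roundtrip(1000.3, 4096.0))
print(int14_roundtrip(5000.0, 4096.0))
```

With fixed point, the resolution and the range limit are both visible in the configuration, whereas with float16 the resolution degrades as the commanded value grows.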

I think I can live with the general high level design and defining services where each subject is mapped to a particular register and all the required pieces to generate code end to end.

The number of subjects per servo seems a bit excessive (4 publications, 2 subscriptions in the demo), but I suppose with appropriate tooling it could be tolerable.

Isn’t a reg.udral.physics.kinematics.translation.Linear.0.1 sent to each servo kind of excessive? What about having some flexibility in the services so that a manufacturer can expose what’s even supported in the first place?

Instead of having so much flexibility per service, why not simply carry different services for variations that actually exist? Leaving things open to interpretation with multiple options and details buried in a comment essay seems like a good way to ensure every vendor will carry their own set of implementation quirks.