Problems with DS-015

Thesis: The direction DS-015 is going in is fundamentally misaligned with ArduPilot's expectations of UAVCAN. Put another way, the ArduPilot community is “misusing” UAVCAN to create sensor networks.

Proposal: Not deprecating UAVCAN v0


Having read the sources recommended by @pavel.kirienko, I agree that the requirements proposed by @tridge are incompatible with current DS-015. At the risk of adding a “+1 type post” to the discussion, I also think that ensuring interoperation of v0 and v1 on the same networks would be more important to current users of UAVCAN on ArduPilot than a better-designed but currently (!) less usable standard.

I think that ArduPilot cannot fully embrace the current goals of UAVCAN. The very first design goal is also the first incompatibility, one that in my opinion is not really feasible to resolve:

Democratic network— There will be no master node. All nodes in the network will have the same communication rights; there should be no single point of failure.

I think that there must be a master node in the UAV, just as there is a pilot in a manned aircraft. Even if a co-pilot is present, there is no democracy. One senior entity has command, and I believe the same has been true at sea for millennia. The vast majority (if not all) of ArduPilot users run their UAV with a central flight controller connected to a network of sensors and actuators, and for complexity and cost reasons I doubt that will change anytime soon.

It is understandable that the developers of the AP flight stack, acting in the interest of us (AP users), are repeatedly pushed to use UAVCAN as the basis for a star topology with only the central element being “smart”. I agree that a more distributed model of computing has the potential to massively improve the capabilities of autonomous vehicles. But from the user feedback it appears that a large part of the UAV community is not (yet!) ready to benefit from it.


From the point of view of a UAVCAN developer and maintainer, it certainly would be better to mark v0 as legacy and focus your efforts solely on improving the v1 feature set. There is, however, a major problem with that:

Isn’t data exchange precisely what ArduPilot expects from UAVCAN: a common interface to peripherals, compatible with the CAN physical layer? In line with the “improve, not fork” attitude, AP developers turned to the UAVCAN standard, and after spending considerable development effort they received the features they wanted.

I am afraid that UAVCAN v1 decision-makers dismiss these functions as trivial, but all users do with their UAVs is:

  • connect sensors
  • connect motors and servos
  • set parameters

Even if the current v0 implementation does not offer infinite routes of self-governing improvement and new data types, as long as it provides these services it is really usable for us (UAV operators). Having the ability to update firmware over the data link? Brilliant, love it, but it wouldn’t be a deal-breaker for me if it weren’t possible. Vendors and developers have to coordinate their message types? It would be great if that were automagically solved by the devices, but as long as the developers figure it out I’m okay with it. Even though there seems to be a divide between ArduPilot and PX4, thanks to MAVLink it usually “just works”.


To sum up my lengthy, opinion-filled post: given that every discussion constantly derails into v0/v1 critique, maybe that is what the Drone Special Interest Group is actually interested in. I am afraid that for my use case, the benefits of v1 are remote, too advanced, and academic.

I would feel cheated if I were invited to collaborate on a standard supposedly meant to help me, but all my input was dismissed with “But you see, you’re making your drones wrong. We will not work with you on simple sensor networks.”


Thanks Marek.
I should also explain that I am very familiar with distributed computing. I programmed systems with tens of thousands of CPUs back in the 90s when those types of systems first came out (e.g. the CM2). A large part of my PhD research was in parallel computing. That type of parallel algorithm is different from what Pavel is talking about here (distributed algorithms versus distributed components), but many of the key ideas are in common.
There is a huge difference between a network that enables you to distribute components and one that tries to force you to distribute components. I am perfectly happy for ArduPilot to use distributed computing components. A node that just does state estimation (implementing the ArduPilot 23-state EKF in a CAN node) is something I have thought about doing, as I’ve wanted to create an ArduPilot implementation of “state estimation in a box”, much like VectorNav, InertialLabs etc. I had that in mind when I added the “external AHRS” subsystem in ArduPilot recently, and you are likely to see it become an option with AP_Periph in the future.
That type of system where the system designer can choose to distribute components is perfectly achievable with v0. We already have ArduPilot partners flying triple-redundant autopilot systems (3 flight controllers, 2 in standby) and they are doing it with UAVCAN v0.
Trying to force distribution of the algorithms involved in flying a UAV does not make it more robust. In fact it will almost certainly lead to it being more complex and less robust. Enabling a system designer to distribute components where it makes sense for the design of the vehicle can make it more robust, but that must come from an overall understanding of the failure modes and a careful design of the system to minimise those failures.
The vast majority of ArduPilot users are best served with a central flight controller with a set of attached sensors and actuators.
Cheers, Tridge

Hi Marek,

Thank you for the sensible input. It is great that you recognize the potential benefits of the distributed architecture.

I think your point about the distinction between democratic and centralized networks might be a bit off the mark. The core design goal that you mentioned — that the network be democratic — refers to the communication protocol design, rather than the design of the application built on top of said protocol. The objective here is that the network itself does not require centralization, the rest is irrelevant at this level. I think this point is not actually related to the conversation; but, while we’re at it, it’s pertinent to mention that the very same design goal is present in the v0 specification, and in the CAN specification itself (it uses the term “multimaster” to the same effect).

This thread is mostly about v0 vs. DS-015 rather than v0 vs. v1 since we are primarily discussing the application-layer aspects rather than the underlying connectivity. As I wrote earlier, and as previously suggested by @dagar, we can stretch DS-015 towards simpler architectures that involve the exchange of basic, concrete states, as opposed to rich abstractions. I call this “non-idiomatic DS-015”, but a better term is welcome if one comes to mind. The point is that we address the existing requirements using the means provided by DS-015 while avoiding undue fragmentation of the ecosystem. I think we should construct a convincing proof-of-concept to demonstrate that the adoption of that “non-idiomatic DS-015” does not really go against the business objectives of ArduPilot. I need @tridge’s input for this to be sure that we are sufficiently aligned; see the “call to action” in my post above.

Speaking of the business objectives, I don’t think we have seen any evidence yet that the special interest group (SIG) is actually interested in the perpetuation of v0 or its particular style of problem-solving. This is rather an XY problem.

According to my mental model, the users you mentioned don’t want to transfer data or build a sensor network. They don’t want to build a distributed computing system either. Rather, they want their unmanned vehicles to do the job well. In this light, the approaches proposed by v0 and DS-015 should be considered equivalent. Does the user care whether the vehicle computes the airspeed on the flight management unit or on a separate node? When you flip your cellphone from portrait to landscape, do you care or know which part of it was responsible for determining its orientation? Is that the CPU or a dedicated IMU chip?

Leaving aside the purely technical benefits offered by the more advanced architecture (which you have already recognized and which have been covered earlier in this thread), there is also the network effect.

An architecturally cleaner design that is sufficiently abstracted from the specifics of one extremely narrow set of applications (i.e., small drones of a particular configuration) can foster a much larger ecosystem of DS-015-compliant products (hardware and software) and services (design & consulting). Meaning that:

  1. Vendors of said products benefit from a larger target market (not only small drones that run ArduPilot or PX4 but also robotics and other advanced cyber-physical systems).

  2. Users of said products benefit from a larger pool of available options thanks to the expanded market of DS-015-compliant items.

It is my understanding (based on the years of engagement with various users of UAVCAN v0) that these benefits are relevant to the SIG, even though a regular user may not immediately recognize it.

Thank you for your explanation, Pavel. I misunderstood the scope of this design goal; indeed it does not prove the point I was trying to make. I hoped to illustrate that at the application level we have a centralized master-slave design, and I doubt we’ll move away from it.

I share the opinion that it would be better if a communication standard did not require every node to have complex processing in order to cooperate. But a requirement for uncomplicated data on the network forces the complexity of processing that data into the nodes. As some other posts here have pointed out, for this specific application that is undesirable.

But it is the central element. The user assumes it is the central element. We still call it the autopilot, like the single person that controls the aircraft. Please don’t force the users of the standard (both developers and end users) to reject this and pretend it is not true. A more abstract, extensible standard will be welcomed by those who use it; I am afraid that for everyone else it will be a source of confusion.

These users do care about implementation

In a perfect environment the implementation details are hidden from users, but that is rarely the case for us. I have no idea what kind of architecture a drone sold by DJI has, and I will never need to know, because that is a closed, proprietary system. For user experience, the openness of the standard and the wide variety of vendors is a double-edged sword. Things will inevitably be misconfigured, ship with wrong defaults, or need to be fixed by a “simple software update” that requires the user to install SLCAN drivers, flip their autopilot into passthrough mode, etc. Learning to do that is not trivial with just the autopilot, and I fear that I’d need to learn to configure every smart device I buy. Even if UAVCAN provides a unified parameter service, which I do appreciate, I will still need to read separate manuals to learn what those parameters do.

I am afraid that there will never be an “in the long run” in this particular case, because people turn to open flight stacks in order to build custom machines with cutting-edge capabilities, using devices that have only been on sale since the previous year. If they were fine with waiting until a feature was widespread and well developed, they would simply buy a whole COTS drone.

With that in mind, even ignoring current solutions, I believe that keeping the application architecture similar to the physical system is a valid advantage.

The tolerant middle ground

I think the key disagreement happens on the “smart nodes - simple nodes” axis. It seems that participants of this discussion stand at and see from different points on this line. I hope that everyone will look at the following example from the same direction:

In the new approach we find attitude control too coupled to the specifics of the implementation, so we define a new service called reg.drone.attitude.Roll. This may be provided by an aileron servo, or by an ESC with a propeller placed on the relevant arm of the multicopter. Thanks to this new service-oriented approach, the system will be much more composable, as you will be able to swap autopilots between different airframes more easily. The autopilot will no longer need to know whether it is driving a fixed-wing or a multicopter aircraft.

There is a decision to be made about how far we take the service concept. The example is a bit extreme, but this is how I perceive the discussion when coming from the simpler side. I hope it illustrates that this specific application calls for some special treatment.

I appreciate that Pavel recognises that a compromise needs to be made. I don’t think that anyone is trying to persuade UAVCAN to abandon its long-term goals, just to stretch the standard to satisfy the needs of its adopters as they perceive them, if you care about adoption at all. Even if the requests seem wrong. It is true that the XY-problem response does apply to me, but I would refrain from suggesting the same about the core AP dev team.


A few problems with this:

  • monolithic kernels won, and for very good reason. The complexity of the layers needed to make real microkernels created worse problems than what microkernels tried to solve. That old microkernel vs monolithic kernel debate is long over, and monolithic won. There is a good parallel with your vision for v1 - you’re making the same mistake that Tanenbaum did. The complexity inherent in your vision of smart nodes will make the system, taken as a whole, much more complex and less robust.
  • You seem to be implying that you can’t do hard real-time with sensor nodes feeding a central autopilot. You can. Real-time is completely orthogonal to the topology.

That is a complete cop out. We can and should expect plug and play to be the default for most users just like it is for v0. Saying this is like a firmware writing itself is utter nonsense.

it is a good example, but only of why the DS-015 idiom is an absolutely terrible one. It brings no tangible benefit to our user community, increases complexity and greatly increases fragility of the system. Alignment of firmware versions to get a working system gets a lot harder.

no, it’s not. The air data computer doesn’t have the information available to do it.
I’m not sure if you realise that selection of sensors in a modern autopilot is dynamic not static. The EKF will switch what sensors it uses based on the data, not just based on a configuration. You can’t just assign an ID and think it is equivalent.
We’ve been moving towards more and more dynamic sensor selection in ArduPilot as it is a huge win for robustness. Take a look at the “sensor affinity” documentation for ArduPilot EKF3 as an example:
https://ardupilot.org/plane/docs/common-ek3-affinity-lane-switching.html

this could work for differential pressure, but would not extend nicely to GNSS and other more complex sensors.
The “smart nodes” and “forcing decentralisation” in DS-015 is terminal. We need a completely different message set with a design that meets our requirements.

I have in fact prototyped that, tunnelling i2c and spi, but I abandoned that path as it has awful timing requirements, and results in very fragile sensors, plus uses a lot more bandwidth. So no, we’re not going to do that for commonly used sensors.
We may support it for rarely used sensors, say a radiation counter or similar sensor which is not flight critical, but we’re not going down that path for core sensors.

In my model most airspeed sensors would just publish a pressure, a temperature and an ID to allow for multiple sensors on one node. That’s it.
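To show how little central work those raw fields imply, here is a sketch (my own illustration, not ArduPilot code; the constant and function name are made up) of the standard incompressible-flow conversion from differential pressure to indicated airspeed:

```python
import math

# Hedged illustration: the autopilot-side conversion enabled by a sensor
# that publishes only raw differential pressure.
#   IAS = sqrt(2 * dp / rho0)   (incompressible-flow model)
RHO0 = 1.225  # ISA sea-level air density, kg/m^3

def ias_from_differential_pressure(dp_pa: float) -> float:
    """Indicated airspeed in m/s from differential pressure in Pa."""
    return math.sqrt(max(2.0 * dp_pa / RHO0, 0.0))

print(ias_from_differential_pressure(612.5))  # ~31.6 m/s
```

A few lines like this, running centrally, is the whole job; which is part of the argument for keeping the sensor dumb.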
If you’re referring to a hypothetical airspeed sensor that publishes CAS, then no, I’m not going to do that exercise with you. It is just pointless. If you still have not understood, after all my explanations, why it is a terrible idea to go down that path for a simple differential pressure sensor, then I suspect there is no use discussing it further.

DS-015, in anything like its current form, is not something we want.
If the UAVCAN consortium is OK with the ArduPilot community developing an alternative message set which is published for anyone in the UAV community to use and collaborate on then that would be great. I think you’ll find it will suit vendors, users and autopilot projects a lot better than DS-015 and will give us a real way to move v1 forward.
If not then we’ll look at extending v0 to add FDCAN support, larger frames and the ability to extend messages (similar to how mavlink2 allows extensions). In that framework we’d create a new message set. Those are the features we really want and what will provide real benefit to our users.
Cheers, Tridge

  • There are sometimes valid reasons to choose a lower level of abstraction, i.e. the physical decomposition of the system.
    Among those reasons:
    • The design expertise and complexity tolerance are not distributed evenly in the ecosystem: there is significantly more of both in the AP dev community than among peripheral developers. So by Conway’s law (architecture follows organization) it makes sense to split the functionality accordingly.
    • Debugging and diagnostic tools favor a more centralized design.
  • All abstractions are leaky when put under sufficient pressure, so it takes very deep domain expertise to choose the abstraction level. It is a complex tradeoff with an optimum point that cannot be driven by one-sided mandates to “move as high as possible”.
  • A “thin client” hardware architecture does not equal bad design or the “god object” antipattern. There can still be well-designed software with optimal modularity on the central node(s).
  • Meta point 1: I think it serves little useful purpose to load an excellent transport layer with opinions on what are essentially independent adjacent layers. UAVCAN v1 as the transport layer can support different system architectures (“sensor networks”, “smart nodes”, and everything in between) equally well. It would be much better to let players with “skin in the game” work to converge on a suitable solution while having the best platform possible to work on. It is entirely possible that different groups would converge on different solutions (ArduPilot vs a nuclear reactor manufacturers association). And none would be abstractly better, just more suited to the respective use case.
    • The argument over “air data computer” looks especially redundant in this light: if someone wants to make one, fine. If someone else prefers to make a simple sensor, fine too. Let the market select which one wins (if not both).
    • The approach of @tridge and Ardupilot is not in contradiction to “UAVCANv1 as the transport layer”. Any attempts to couple the transport layer with the system design and specific decomposition/architecture decisions are bound to slow adoption and harm the nascent and still fragile ecosystem.
  • Meta point 2: It would help foster a spirit of collaboration and good faith if there were less use of terms loaded with negative connotations applied to concepts the writer opposes. Examples include “modern” vs “legacy”. State practical downsides; don’t attack emotionally.

A look at GNSS in DS-015
We’ve spent a lot of time now analyzing airspeed. For a simple float differential pressure it has caused a lot of discussion, but it is time to move on to the true horrors of DS-015. Nothing exemplifies those horrors more than the GNSS message. This will be a long post, because it will dive into exactly what is proposed for GNSS, and that is extremely complex.
What it should be
This is the information we actually need from a CAN GNSS sensor:

        uint3 instance
        uint3 status
        uint32 time_week_ms
        uint16 time_week
        int36 latitude
        int36 longitude
        float32 altitude
        float32 ground_speed
        float32 ground_course
        float32 yaw
        uint16 hdop
        uint16 vdop
        uint8 num_sats
        float32 velocity[3]
        float16 speed_accuracy
        float16 horizontal_accuracy
        float16 vertical_accuracy
        float16 yaw_accuracy

It is 56 bytes long and pretty easy to understand. Note that it includes yaw support, as yaw from GPS is becoming pretty common these days, and in fact is one of the key motivations for users buying the higher-end GNSS modules. When a float value in the above isn’t known, a NaN would be used. For example, if you have an NMEA sensor attached (those mostly don’t report vertical velocity), then the 3rd element of velocity would be NaN. Same for any accuracies that you don’t get from the sensor.
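To illustrate the NaN convention, here is a small sketch (hypothetical field names, not a real ArduPilot or UAVCAN API) of how a consumer would skip the fields a given sensor cannot provide:

```python
import math

# Hypothetical consumer-side handling of the proposed message: any float
# field the sensor cannot provide arrives as NaN and is simply ignored.
def usable_fields(msg: dict) -> dict:
    """Return only the fields that carry real measurements."""
    return {k: v for k, v in msg.items()
            if not (isinstance(v, float) and math.isnan(v))}

# Example: an NMEA receiver with no vertical velocity or accuracy data.
msg = {
    "ground_speed": 12.3,
    "velocity_down": float("nan"),   # unknown -> NaN
    "speed_accuracy": float("nan"),  # not reported by this sensor
}
print(usable_fields(msg))  # only ground_speed survives
```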
I should note that the above structure is a big improvement over the one in UAVCAN v0, which requires a bunch of separate messages to achieve the same thing.
GNSS in DS-015
Now let’s look at what GNSS would be like using current DS-015. Following the idiom of UAVCAN v1 the GNSS message is a very deeply nested set of types. It took me well over an hour to work out what is actually in the message set for GNSS as the nesting goes so deep.
To give you a foretaste though, to get the same information as the 56 byte message above you need 243 bytes in DS-015, and even then it is missing some key information.
How does it manage to expand such a simple set of data to 243 bytes? Here are some highlights:

  • there are 55 covariance floats. wow
  • there are 6 timestamps. Some of the timestamps have timestamps on them! Some of the timestamps even have a variance on them.

I’m sure you must be skeptical by now, so I’ll go into it in detail. I’ll start from the top level and work down to the deepest part of the tree of types

Top Level
The top level of GNSS is this:

#   point_kinematics            reg.drone.physics.kinematics.geodetic.PointStateVarTs   1...100
#   time                        reg.drone.service.gnss.Time                             1...10
#   heartbeat                   reg.drone.service.gnss.Heartbeat                        ~1
#   sensor_status               reg.drone.service.sensor.Status

the “1…100” is the range of update rates. This is where we hit the first snag. It presumes you’ll be sending the point_kinematics fast (typical would be 5Hz or 10Hz) and the other messages less often. The problem with this is it means you don’t get critical information that the autopilot needs on each update. So you could fuse data from the point_kinematics when the status of the sensor is saying “I have no lock”. The separation of the time from the kinematics also means you can’t do proper jitter correction for transport timing delays.

point_kinematics - 74 bytes
The first item in GNSS is point_kinematics. It is the following:

reg.drone.physics.kinematics.geodetic.PointStateVarTs:
   74 bytes
   uavcan.time.SynchronizedTimestamp.1.0 timestamp
   PointStateVar.0.1 value

Breaking down the types we find:

uavcan.time.SynchronizedTimestamp.1.0:
   7 bytes
   uint56

Keep note of this SynchronizedTimestamp, we’re going to be seeing it a lot.

PointStateVar.0.1:
   67 bytes
   PointVar.0.1 position
   reg.drone.physics.kinematics.translation.Velocity3Var.0.1 velocity

Looking into PointVar.0.1 we find:

PointVar.0.1 position:
   36 bytes
   Point.0.1 value
   float16[6] covariance_urt

and there we have our first covariances. I’d guess most developers will just shrug their shoulders and fill in zero for those 6 float16 values. The chances that everyone treats them in a consistent and sane fashion are zero.
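For readers unfamiliar with the layout, the six values are presumably the upper-right triangle of a symmetric 3x3 covariance matrix; a sketch of that interpretation (my reading, not the normative DSDL semantics):

```python
# Sketch of what the 6-element covariance_urt likely represents: the
# upper-right triangle of a symmetric 3x3 covariance matrix, row-major.
def unpack_urt3(urt):
    """Expand [c00, c01, c02, c11, c12, c22] into a full symmetric 3x3."""
    c00, c01, c02, c11, c12, c22 = urt
    return [[c00, c01, c02],
            [c01, c11, c12],
            [c02, c12, c22]]

# A developer "shrugging and filling in zeros" would effectively send:
print(unpack_urt3([0.0] * 6))  # a meaningless all-zero covariance
```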
Ok, so now we need to parse Point.0.1:

Point.0.1:
   24 bytes
   float64 latitude   # [radian]
   float64 longitude  # [radian]
   uavcan.si.unit.length.WideScalar.1.0 altitude

there at last we have the latitude/longitude. Instead of 36 bits for UAVCAN v0 (which gives mm accuracy) we’re using float64, which allows us to get well below the atomic scale. Not a great use of bandwidth.
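As a back-of-envelope check on those claims (assuming the 36-bit figure implies a v0-style fixed-point encoding at 1e-8 degree per LSB, which is my assumption), the resolutions work out roughly like this:

```python
import math

# Rough resolution comparison for latitude encoding.
# Assumption: int36 fixed-point at 1e-8 degree per LSB, vs float64 radians.
EARTH_RADIUS_M = 6_371_000.0

# Fixed-point: one LSB of 1e-8 degree, converted to metres of arc.
fixed_res_m = math.radians(1e-8) * EARTH_RADIUS_M
print(f"int36 @ 1e-8 deg: {fixed_res_m * 1000:.2f} mm")  # ~1.1 mm

# float64: the ULP of a latitude near pi/2 radians, in metres.
float_res_m = math.ulp(math.pi / 2) * EARTH_RADIUS_M     # needs Python 3.9+
print(f"float64 radians : {float_res_m * 1e9:.2f} nm")   # nanometre scale
```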
What about altitude? That is a WideScalar:

uavcan.si.unit.length.WideScalar.1.0:
   8 bytes, float64

yep, another float64. So atomic scale vertically too.
Back up to our velocity variable (from PointStateVar.0.1) we see:

reg.drone.physics.kinematics.translation.Velocity3Var.0.1:
  31 bytes
  uavcan.si.sample.velocity.Vector3.1.0 value
  float16[6] covariance_urt

so, another 6 covariance values. More confusion, more rubbish consuming the scant network resources.
Looking inside the actual velocity in the velocity we see:

uavcan.si.sample.velocity.Vector3.1.0:
  19 bytes
  uavcan.time.SynchronizedTimestamp.1.0
  float32[3] velocity

there is our old friend SynchronizedTimestamp again, consuming another useless 7 bytes.
Now we get to the Time message in GNSS:

time: reg.drone.service.gnss.Time
  21 bytes
  reg.drone.physics.time.TAI64VarTs.0.1 value
  uavcan.time.TAIInfo.0.1 info

Diving deeper we see:

reg.drone.physics.time.TAI64VarTs.0.1:
  19 bytes
  uavcan.time.SynchronizedTimestamp.1.0 timestamp
  TAI64Var.0.1 value

yes, another SynchronizedTimestamp! And what is this timestamp timestamping? A timestamp. You’ve got to see the funny side of this.
Looking into TAI64Var.0.1 we see:

TAI64Var.0.1:
  12 bytes
  TAI64.0.1 value
  float32 error_variance

so there we have it. A timestamped timestamp with a 32 bit variance. What the heck does that even mean?
Completing the timestamp type tree we have:

TAI64.0.1:
  8 bytes
  int64 tai64n

so finally we have the 64-bit time. It still hasn’t given me the timestamp that I actually want, though. I want the iTOW. That value in milliseconds tells me about the actual GNSS fix epochs. Tracking that timestamp at its multiple of 100ms or 200ms is what really gives you the time info you want from a GNSS. Can I get it from the huge tree of timestamps in DS-015? Maybe. I’m not sure yet if it’s possible.
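For reference, if a continuous GPS time in milliseconds could be recovered from that tree, splitting it into week number and iTOW would be trivial; the hard part is obtaining the input, which this sketch simply assumes:

```python
# Sketch of recovering the iTOW (GPS time-of-week in milliseconds),
# assuming we already have a continuous GPS time in ms since the GPS
# epoch (1980-01-06). Deriving that from the TAI64 tree is the hard part.
MS_PER_WEEK = 7 * 24 * 3600 * 1000

def gps_time_to_week_itow(gps_time_ms: int):
    """Split continuous GPS time into (week number, iTOW in ms)."""
    return gps_time_ms // MS_PER_WEEK, gps_time_ms % MS_PER_WEEK

print(gps_time_to_week_itow(604_800_000))  # exactly one week -> (1, 0)
```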
Now on to the heartbeat. This is where we finally know what the status is. Note that the GNSS top level docs suggest this is sent at 1Hz. There is no way we can do that, as it contains information we need before we can fuse the other data into the EKF.
A heartbeat is a reg.drone.service.gnss.Heartbeat

reg.drone.service.gnss.Heartbeat:
  25 bytes
  reg.drone.service.common.Heartbeat.0.1 heartbeat
  Sources.0.1 sources
  DilutionOfPrecision.0.1 dop
  uint8 num_visible_satellites
  uint8 num_used_satellites
  bool fix
  bool rtk_fix

here we finally find out the fix status. But we can’t differentiate between 2D, 3D, 3D+SBAS, RTK-Float and RTK-Fixed, which are all distinct levels of fix and are critical for users and log analysis. Instead we get just 2 bits (presumably to keep the protocol compact?).
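A quick sketch of the information loss (the mapping is my guess at the intended semantics of the two booleans, not taken from the spec):

```python
# Illustration: the five fix grades that users and log analysis need,
# vs what the two booleans (fix, rtk_fix) can actually distinguish.
FIX_GRADES = ["2D", "3D", "3D+SBAS", "RTK-Float", "RTK-Fixed"]

def to_ds015_bits(grade: str):
    """Collapse a fix grade into the (fix, rtk_fix) pair (assumed mapping)."""
    return (grade != "none", grade.startswith("RTK"))

# 2D, 3D and 3D+SBAS all collapse to (True, False);
# RTK-Float and RTK-Fixed both collapse to (True, True).
print({g: to_ds015_bits(g) for g in FIX_GRADES})
```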
We do however get both the number of used and number of visible satellites. That is somewhat handy, but is a bit tricky as “used” has multiple meanings in the GNSS world.
Looking deeper we have:

reg.drone.service.common.Heartbeat.0.1:
  2 bytes
  Readiness.0.1 readiness
  uavcan.node.Health.1.0 health

which is made up of:

Readiness.0.1:
  1 byte
  truncated uint2 value
uavcan.node.Health.1.0:
  1 byte
  uint2

these are all sealed btw. If 2 bits ain’t enough then you can’t grow it.
Now on to the sources:

Sources.0.1:
  48 bits, 6 bytes
  bool gps_l1
  bool gps_l2
  bool gps_l5
  bool glonass_l1
  bool glonass_l2
  bool glonass_l3
  bool galileo_e1
  bool galileo_e5a
  bool galileo_e5b
  bool galileo_e6
  bool beidou_b1
  bool beidou_b2
  void5
  bool sbas
  bool gbas
  bool rtk_base
  void3
  bool imu
  bool visual_odometry
  bool dead_reckoning
  bool uwb
  void4
  bool magnetic_compass
  bool gyro_compass
  bool other_compass
  void14

so we have lots of bits (48 of them) telling us exactly which satellite signals we’re receiving, but no information on what type of RTK fix we have.
What are “imu”, “visual_odometry”, “dead_reckoning” and “uwb” doing in there? Does someone really imagine you’ll be encoding your UWB sensors on UAVCAN using this GNSS service? Why??
Diving deeper we have the DOPs:

DilutionOfPrecision.0.1:
  14 bytes
  float16 geometric
  float16 position
  float16 horizontal
  float16 vertical
  float16 time
  float16 northing
  float16 easting

that is far more DOP values than we need. The DOPs are mostly there to keep the users who are used to them happy, and they want 1, or at most 2, values. We don’t do fusion with these, as they are not a measure of accuracy. Sending 6 of them is silly.
Now onto sensor_status:

sensor_status: reg.drone.service.sensor.Status

reg.drone.service.sensor.Status:
  12 bytes
  uavcan.si.unit.duration.Scalar.1.0 data_validity_period
  uint32 error_count
  uavcan.si.unit.temperature.Scalar.1.0 sensor_temperature

yep, we have the temperature of the GNSS in there, along with an “error_count”. What sort of error? I have no idea. The doc says it is implementation-dependent.
The types in the above are:

uavcan.si.unit.duration.Scalar.1.0:
  4 bytes
  float32

uavcan.si.unit.temperature.Scalar.1.0:
  4 bytes
  float32

quite what you are supposed to do with the “data_validity_period” from a GNSS I have no idea.
Ok, we’re done with what is needed for a GNSS that doesn’t do yaw, but as I mentioned, yaw from GNSS is one of the killer features attracting users to new products, so how would that be handled?
We get this:

# Sensors that are able to estimate orientation (e.g., those equipped with IMU, VIO, multi-antenna RTK, etc.)
# should also publish the following in addition to the above:
#
#   PUBLISHED SUBJECT NAME      SUBJECT TYPE                                            TYP. RATE [Hz]
#   kinematics                  reg.drone.physics.kinematics.geodetic.StateVarTs        1...100

so, our GNSS doing moving baseline RTK for yaw needs to publish reg.drone.physics.kinematics.geodetic.StateVarTs, presumably at the same rate as the above. For ArduPilot we fuse the GPS yaw in the same measurement step as the GPS position and velocity, so we’d like it at the same rate. We could split that out to a separate fusion step, but given yaw is just a single float, why not send it at the same time?
Well, we could, but in DS-015 it takes us 111 bytes to send that yaw. Hold onto your hat while I dive deep into how it is encoded.

reg.drone.physics.kinematics.geodetic.StateVarTs:
  111 bytes
  uavcan.time.SynchronizedTimestamp.1.0 timestamp
  StateVar.0.1 value

another SynchronizedTimestamp. Why? Because more timestamps is good timestamps I expect.
Now into the value:

StateVar.0.1:
  104 bytes
  PoseVar.0.1 pose
  reg.drone.physics.kinematics.cartesian.TwistVar.0.1 twist

yep, our yaw gets encoded as a pose and a twist. I’ll give you all of that in one big lump now, just so I’m not spending all day writing this post. Take a deep breath:

StateVar.0.1:
  104 bytes
  PoseVar.0.1 pose
  reg.drone.physics.kinematics.cartesian.TwistVar.0.1 twist

reg.drone.physics.kinematics.cartesian.TwistVar.0.1:
  66 bytes
  Twist.0.1 value
  float16[21] covariance_urt

Twist.0.1:
  24 bytes
  uavcan.si.unit.velocity.Vector3.1.0 linear
  uavcan.si.unit.angular_velocity.Vector3.1.0 angular

PoseVar.0.1:
  82 bytes
  Pose.0.1 value
  float16[21] covariance_urt

Pose.0.1:
  40 bytes
  Point.0.1                           position
  uavcan.si.unit.angle.Quaternion.1.0 orientation

uavcan.si.unit.angular_velocity.Vector3.1.0:
  12 bytes
  float32[3]

uavcan.si.unit.velocity.Vector3.1.0:
  12 bytes
  float32[3]

Point.0.1:
  24 bytes
  float64 latitude   # [radian]
  float64 longitude  # [radian]
  uavcan.si.unit.length.WideScalar.1.0 altitude

uavcan.si.unit.angle.Quaternion.1.0:
  16 bytes
  float32[4]

phew! That simple yaw has cost us 111 bytes, including 42 covariance variables, some linear and angular velocities, our latitude and longitude (again!!) and even a 2nd copy of our altitude, all precise enough for quantum physics. Then finally the yaw itself is encoded as a 16 byte quaternion, just to make it maximally inconvenient.
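For completeness, extracting the single float we actually wanted from that 16-byte quaternion is the standard ZYX Euler yaw formula; a sketch assuming a normalised (w, x, y, z) quaternion:

```python
import math

# The yaw the autopilot wants is one Euler angle; in DS-015 it arrives
# inside a 16-byte unit quaternion. Standard ZYX extraction.
def yaw_from_quaternion(w, x, y, z):
    """Return heading (yaw) in radians from a unit quaternion (w, x, y, z)."""
    return math.atan2(2.0 * (w * z + x * y),
                      1.0 - 2.0 * (y * y + z * z))

# A pure 90-degree rotation about the vertical axis:
q = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
print(math.degrees(yaw_from_quaternion(*q)))  # ~90.0
```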
Conclusion
if you’ve managed to get this far then congratulations. If someone would like to check my work then please do. Diving through the standard to work out what actually goes into a service is a tricky task in UAVCAN v1, and it is very possible I’ve missed a turn or two.
The overall message should be pretty clear however. The idiom of DS-015 (and to a pretty large degree UAVCAN v1) is “abstraction to ridiculous degrees”. It manages to encode a simple 56 byte structure into a 243 byte monster, spread across multiple messages, with piles of duplication.
We’re already running low on bandwidth with v0 at 1MBit. When we switch to v1 we will for quite a long time be stuck at 1MBit as there will be some node on the bus that can’t do higher rates. So keeping the message set compact is essential. Even when the day comes that everyone has FDCAN the proposed DS-015 GNSS will swallow up most of that new bandwidth. It will also swallow a huge pile of flash, as the structure of DS-015 and of v1 means massively deep nesting of parsing functions. So expect the expansion in bandwidth to come along with an equally (or perhaps higher?) expansion in flash cost.
The DS-015 “standard” should be burnt. It is an abomination.


I’m trying to stay somewhat neutral in this, but the amount of extra data on the bus is unacceptable. As someone leading a project to deliver a UAVCAN-based GNSS to the market in the $2000 USD price range, and someone driving adoption of UAVCAN v1 onto our actuators: fix this, Pavel. The time for debate has passed; collaborate, or we’re going to consider leaving.

Our hardware will be following Tridge’s recommendation here; why put the effort into moving to FDCAN just to throw it away on wasted data?

This continued experimental reinventing of UAVCAN needs to stop! Improvements are fine, but this is just going backwards, @pavel.kirienko

This topic has gotten far out of hand. The hostility and unwillingness to collaborate is not productive.

As someone with some stake in the UAVCAN v1 / DS-015 game, I decided I should step in to try to make this discussion more constructive - because this is a totally solvable problem, and the rancor here is hiding how simple I think it could be to solve.

@tridge and others - you seem to be assuming that as soon as DS-015 was “released”, it was carved in stone and no longer subject to any further modifications. This is absolutely not the case. If you had been present in the discussions surrounding DS-015, I think you would have very different context on it. I wasn’t too actively involved in most of the discussions, but I did follow along for many of them, and my main takeaway was that the decision in the end was along the lines of “we don’t know how this is going to work until we try it, so let’s just release something to get the ball rolling and iterate from experience”. At no point, from my perspective, did I hear the opinion that the current form of DS-015 was expected to be the end-all be-all of UAVCAN drone communications.

So please, instead of foaming at the mouth at how bad DS-015 is, let’s be reasonable and start working together on improving the situation. To @tridge’s point, the GNSS service, in its current form, is rather abysmal. In hindsight, however, I’m not convinced anyone had previously bothered to go through the details of looking at the full size of the service as defined - it was designed at the level of the abstract types, and that’s as far as it went. So now is the time to iterate on that.

For additional context (at least from my tangentially-involved perspective; apologies if I’m putting words in anyone’s mouth), the goal was to release something just so we could stop debating the details and go try to create a real proof-of-concept implementation, because none of this matters if it doesn’t get built into a real system. That’s the state that PX4 is currently working towards: the UAVCAN v1 infrastructure is still being implemented, with the composable services of DS-015 being used to guide what that implementation looks like (specifically, driving us towards dynamic, reconfigurable Publisher/Subscriber objects from which to build DS-015-like services). Most of the work is actually being done by @PetervdPerk, who has (to my knowledge) really only focused on the Battery service, while I have focused on the ESC and Servo services. We haven’t gotten to a full GNSS service, or any others for that matter.

So this is my roundabout way of saying yes, I agree that DS-015 needs to change, and I don’t think you’ll find any vocal arguments to the contrary. But before it’s completely overhauled, we need to find the issues and propose alternatives. The way in which the PX4 community has been doing that is by just going out and building it, which I think is far more constructive than any number of critical forum posts.

So in the spirit of collaboration and revising DS-015, here’s a rough proposal for a revised GNSS service, for the sake of discussion (based on @tridge’s recommendation):

# Revised GNSS Service:
reg.drone.service.gnss.TimeWeek.0.1 gnss_time_week # Proposed new topic
  6 bytes
  + uint32 time_week_ms
  + uint16 time_week
reg.drone.physics.kinematics.geodetic.Point.0.1
  24 bytes
  + float64 latitude
  + float64 longitude
  + uavcan.si.unit.length.WideScalar
    + float64
uavcan.si.unit.velocity.Vector3.1.0
  12 bytes
  + float32[3]
reg.drone.service.gnss.Status.0.2 dop # Proposed new topic -- placeholder name
  6 bytes
  + uint16 hdop
  + uint16 vdop
  + uint8 num_sats
  + uint3 fix_type

# Optional Topics
reg.drone.service.gnss.Accuracy.0.1 # Proposed new topic
  8 bytes
  + float16 speed_accuracy
  + float16 horizontal_accuracy
  + float16 vertical_accuracy
  + float16 yaw_accuracy
reg.drone.service.common.GroundTrack.0.0 # (Very rough) Proposed new topic
  12 bytes
  + uavcan.si.unit.velocity.Scalar.1.0 ground_speed
    + float32 meter_per_second
  + uavcan.si.unit.angle.Scalar.1.0 ground_course
    + float32 radian
  + uavcan.si.unit.angle.Scalar.1.0 heading
    + float32 radian

Total Bytes:
6 + 24 + 12 + 6 + 8 + 12 = 68 (20 bytes of which are optional).

(Omitted: The standard Heartbeat message that all nodes must publish).

Note that this keeps the UAVCAN v1 mindset of using a composition of basic types to do most of the work rather than a single message (like Fix or Fix2) that not everyone agrees on, and adds in DS-015-specific types where the basic UAVCAN types don’t suffice. Those more specific types are split up so that a change to one won’t affect the others, and some of the data can remain an optional part of the service.

This same approach can be taken to all of the services defined by DS-015, and to future services that need to be added to it (e.g. rangefinders, optical flow sensors, IMUs, VectorNav-type units, …).

Also note that, if we add perhaps just a little more clarification and detail to the port-naming conventions, we can very easily develop plug & play support around all of these services, so the hobbyist ArduPilot user can plug & play with new devices to their heart’s content, while still letting the professional / commercial integrators fine-tune the system to their own specifications.

Let’s try to remain constructive here, friends, and work towards a better solution!


Differential pressure sensor demo

@tridge It is great that you have moved on to analyze other parts of DS-015, but I would like to reach some sort of conclusion regarding the airspeed sensor node (mind the difference: not an air data computer, so not an idiomatic DS-015) before switching the topic. I proposed that we construct a very simple demo based on your inputs. I did that yesterday; please, do have a look (@scottdixon also suggested that we make the repository public, so it is now public):

I hope this demo will be a sufficient illustration of my proposition that DS-015 can be stretched to accommodate your requirements (at least in this case for now). In fact, as it stands, the demo does not actually leverage any data types from the reg.drone namespace, not that it was an objective to avoid it.

May I suggest that you run it locally using Yakut and poke it around a little bit? I have to note though that Yakut is a rather new tool; if you run into any issues during its installation, please, open a ticket, and sorry for any inconvenience. You may notice that the register read/write command is terribly verbose, that’s true; I should improve this experience soon (this is, by the way, the part that can be automated if the aforementioned plug-and-play auto-configuration proposal is implemented).

We can easily construct additional demos as visual aids for this discussion (it only takes an hour or two).

Goals and motivation

I risk repeating myself here since this topic was covered in the Guide, but please bear with me — I don’t want any accidental misunderstanding to poison this conversation further.

My reference to Torvalds vs. Tanenbaum was to illustrate the general scope of the debate, not to recycle the arguments from a different domain. We both know that distributed systems are commonly leveraged in state-of-the-art robotics and ICT. I am not sure if one can confidently claim that “distributed systems won” (I’m not even sure what that would mean exactly), but I think we can easily agree that there exists a fair share of applications where they are superior. Avionics is one of them. Robotic systems are another decent example — look at the tremendous ecosystem built around ROS!

It is my aspiration (maybe an overly ambitious one? I guess we’ll see) to enable a similar ecosystem, spanning a large set of closely related domains from avionics to robotics and more, with the help of UAVCAN. It is already being leveraged in multiple domains, although light unmanned aviation remains, by far, the largest adopter (this survey was also heavily affected by a selection bias, so the numbers are only crude approximations):

Requirements for LRUs and software components overlap to a significant extent between many of these domains. It is, therefore, possible to let engineers and researchers working in any of these fields rely on the advances made in the adjacent areas. I am certain that many business-minded people who are following this conversation will recognize the benefits of this.

UAVCAN v1 is perfectly able to serve as the foundation for such an inter-domain ecosystem, but it will only succeed if we make the first steps right and position it properly from the start. One of the preconditions is that we align its design philosophy with the expectations of modern-day experts, many of whom are well-versed in software engineering. This is not surprising, considering that just about every sufficiently complex automation system being developed today — whether vehicular, robotic, industrial — is software-defined.

The idea of UAVCAN becoming a technology that endorses and propagates flawed design practices like the specimen below keeps me up at night. I take full responsibility for it because an engineer working with UAVCAN v0 simply does not have access to adequate tools to produce good designs.

You might say that a man is always free to shoot himself in the foot, no matter how great the tool is. But as a provider of said tool, I am responsible for doing my part to raise the sanity waterline, if only by a notch. Hence the extensive design guide, philosophy, tropes, ideologies, and opinionated best practices. Being a mere human, I might occasionally get carried away and produce overly idealistic proposals, which is why I depend on you and other adopters to keep the hard business objectives in sight. My experience with the PX4 leadership indicates that it is always possible to find a compromise between immediate interests and the long-term benefits of a cleaner architecture by way of sensible and respectful dialogue.

Your last few posts look a bit troubling, as they appear to contain critique directed at your original interpretation of the standard while not taking into account the corrections that I introduced over the course of this conversation. Perhaps the clarity of expression is not my strong suit. The one thing that troubles me most is that you appear to be evaluating DS-015 as a sensor network rather than what it really is. I hope the demos will make things clearer.

On GNSS service

I think I understand where you are coming from. We had a lengthy conversation a year and a half ago about the deficiencies of the v0 application layer, where we agreed it could be improved; you went on to list the specifics. I suppose I spent enough time tinkering with v0 and its many applications to produce a naïve design that would exactly match your (and many other adopters’) expectations. Instead, I made DS-015. Why?

When embarking on any non-trivial project, one starts by asking “what are the business requirements” and “what are the constraints”, then applies constrained optimization to approximate the former. With DS-015, the requirements (the publicly visible subset thereof) can be seen here: DS-015 MVP progress tracking. One specific constraint was that it must be possible to deploy a basic DS-015 configuration on a small UAV equipped with a 1 Mbps Classic CAN bus. If the constraint is satisfied and the business requirements are met, the design is acceptable. I suppose it makes sense.

One might instead maximize an arbitrary parameter without regard for other aspects of the design. For example, it could be the bus utilization, data transfer latency, flash space, how many lines of code one needs to write to bring up a working system, et cetera. Such blind optimization is mostly reminiscent of games or hobby projects, where the process of optimization is the goal in itself. This is not how engineering works.

At 10 Hz, the example 57-byte message you’ve shown requires 57 bytes * 10 Hz = 570 bytes per second of bandwidth, or 90 Classic CAN frames per second. At 1 Mbps, the resulting worst-case bus load would be about 1%.

At the same 10 Hz, the DS-015 GNSS service along with a separate yaw message requires 156 frames per second, thereby loading the bus by 2%:

The yaw is to be represented using uavcan.si.sample.angle.Scalar. The kinematic state message is intended only for publishers that are able to estimate the full kinematic state, which is not the case in this example.

Is DS-015 less efficient? Yes! It is roughly half as efficient compared to your optimized message, or 1% of bus capacity less efficient, depending on how you squint. Should you care? I don’t think so. You would need to run about 50 GNSS receivers on the bus to exhaust its throughput. Then you will be able to selectively disable subjects that your application doesn’t require to conserve more bandwidth (this is built into UAVCAN v1).
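The frame arithmetic above can be cross-checked with a short sketch. It assumes the UAVCAN/CAN v1 framing rules as I understand them (7 payload bytes per Classic CAN frame after the tail byte, plus a 2-byte CRC appended to multi-frame transfers) and a rough 128 bits on the wire per extended-ID frame:

```python
import math

def frames_per_transfer(payload_bytes: int) -> int:
    """Classic CAN frames needed for one UAVCAN/CAN v1 transfer:
    each frame carries 7 payload bytes plus a tail byte, and
    multi-frame transfers append a 2-byte transfer CRC."""
    if payload_bytes <= 7:                     # single-frame transfer
        return 1
    return math.ceil((payload_bytes + 2) / 7)  # CRC joins the payload

RATE_HZ = 10
BITS_PER_FRAME = 128   # rough worst case for an extended-ID Classic CAN frame

frames_per_second = frames_per_transfer(57) * RATE_HZ
bus_load_pct = 100 * frames_per_second * BITS_PER_FRAME / 1_000_000

print(frames_per_second)        # 90
print(round(bus_load_pct, 1))   # 1.2
```

A 57-byte transfer needs 9 frames, so 10 Hz gives the 90 frames per second quoted above, on the order of 1% of a 1 Mbps bus; the same arithmetic puts 156 frames per second near 2%.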

If you understand the benefits of service-oriented design in general (assuming idealized settings detached from our constraints), you might see then how this service definition is superior compared to your alternative, while having negligible cost in terms of bandwidth. I, however, should stop making references to the Guide, where this point is explained sufficiently well.

I should also address your note about double timestamping in reg.drone.physics.time.TAI64VarTs. In robotics, it is commonly required to map various sensor feeds, data transfers, and events on a shared time system — this enables complex distributed activities. In UAVCAN, we call it “synchronized time”. Said time may be arbitrarily chosen as long as all network participants agree about it. In PX4-based systems (maybe this is also true for ArduPilot?), this time is usually the autopilot’s own monotonic clock. In more complex systems like ROS-based ones, this is usually the wall clock. Hence, this message represents the GNSS time in terms of the local distributed system’s time, which is actually a rather common design choice.

Timestamping of all sensor feeds also allows you to address the transport latency variations since each sample from time-sensitive sensor feeds comes with an explicit timestamp that is invariant to the transmission latency.

Regarding the extra timestamp in reg.drone.physics.kinematics.translation.Velocity3Var: this is a defect. The nested type should have been uavcan.si.unit.velocity.Vector3 rather than uavcan.si.sample.velocity.Vector3. @coder_kalyan has already volunteered to fix this, thanks Kalyan.

As for the excessive variance states, you simply counted them incorrectly. This is understandable because crawling through the many files in the DS-015 namespace is unergonomic at best. The good news is that @bbworld1 is working to improve this experience (at the time of writing this, data type sizes reported by this tool may be a bit nonsensical, so beware):

https://bbworld1.gitlab.io/uavcan-documentation-example/reg/Namespace.html

It is hard not to notice that your posts are getting a bit agitated. I can relate. But do you not see how a hasty dismissal may have long-lasting negative consequences on the entire ecosystem? People like @proficnc, @joshwelsh, and other prominent members of the community look up to you to choose the correct direction for ArduPilot, and, by extension, for the entire world of light unmanned systems for a decade to come. We don’t want this conversation to end up in any irresponsible decisions being made, so let us please find a way to communicate more sensibly.

I don’t want to imply that the definitions we have are perfect and you are just reading them wrong. Sorry if it came out this way. I think they are, in fact, far from perfect (which is why the version numbers are v0.1, not v1.0), but the underlying principles are worth building upon.

Should we craft up and explore a GNSS demo along with the airspeed one?

Questions

Lastly, I should run through the following questions that appear to warrant special attention.

I implied no such thing. Sorry if I was unclear.

I agree this is useful in many scenarios, but the degree to which you can make the system self-configurable is obviously limited. By way of example, your existing v0 implementation is not fully auto-configurable either, otherwise, there would be no bits like this:

What I am actually suggesting is that we build the implementation gradually. We start with an MVP that takes a bit of dedication to set up correctly. Then we can apply autoconfiguration where necessary to improve the user experience. Said autoconfiguration does not need to require active collaboration from simple nodes, if you read the thread I linked.

Observe that the main product of the GNSS service is the set of kinematic states published over separate subjects. These subjects are completely abstracted from the means of estimating the vehicle’s pose/position. Whether it is UWB, VO, or any related technology, the output is representable using the same basic model.

I approve of your intent to move the conversation into a more constructive plane. Although before we propose any changes, we should first identify how exactly @tridge’s business requirements differ, and why. For example, it is not clear why the iTOW is actually necessary and how it is compatible with GNSS constellations other than GPS; I suspect another case of an XY problem, but maybe we don’t have the full information yet (in which case I invite Andrew to share it).


Hey all,

Thanks @JacobCrabill for pushing for a more constructive conversation. The only way we will ever actually improve the standard is with constructive criticism and real world examples, not heated and pointless arguments.

Regarding the extra timestamp in reg.drone.physics.kinematics.translation.Velocity3Var: this is a defect. The nested type should have been uavcan.si.unit.velocity.Vector3 rather than uavcan.si.sample.velocity.Vector3. @coder_kalyan has already volunteered to fix this, thanks Kalyan.

No problem! I’ll fix this soon, and we can take another look at the other service types to make sure there are no other defects of a similar type. @tridge If you have any other specific examples of defects like this, please let me know and I’ll be happy to fix them.

Should we craft up and explore a GNSS demo along with the airspeed one?

This is a good time to mention that I am (in my spare time) developing an independent firmware for the CUAV NEO V3 Pro CAN GPS with UAVCAN v1. The board has a fast enough processor (F4) to do a fair bit of calculation, and a standard UAV GPS puck sensor set (GNSS, magnetometer, barometer, arming switch) but no IMU, so an EKF can’t/shouldn’t be directly used.

Now, I am aware that this may be pointless in the long run, as the infrastructure built up already by AP_Periph and PX4 cannode is quite extensive and hard to beat. However, while I embarked on this with the goal of learning UAVCAN, I realize that this may be a good opportunity: an open and independently developed example will demonstrate a GNSS CAN node in a real world scenario (not just a Linux socketcan demonstration with fake data), as well as allow us to verify that the message set is what we want and make modifications as necessary. I will post a link to this demo once it starts to take shape.

Perhaps as importantly, I am also looking to implement support in PX4 on the autopilot side to make sure the data published is sufficiently usable to actually fly a drone.

@pavel.kirienko and I have also been discussing a similar port of Sapog (an independent open source brushless ESC firmware) to UAVCANv1, which will be a good opportunity to stress test the ESC/actuator services. I am especially interested in testing the real world performance considerations for the ESC service, which unlike the low rate GNSS service, causes considerable load on the bus. (The hardware is not FD capable).

What I am actually suggesting is that we build the implementation gradually. We start with an MVP that takes a bit of dedication to set up correctly. Then we can apply autoconfiguration where necessary to improve the user experience. Said autoconfiguration does not need to require active collaboration from simple nodes, if you read the thread I linked.

I agree; let’s focus on an MVP that is robust and otherwise meets criteria, and then we can slowly improve the pnp experience without compromising on other design goals.


Thank you, that is a nice demo. I’ve run it on Linux and it highlights some issues nicely.
The first issue I hit was having to remove the old uavcan python module to run any of the v1 demos. This may seem trivial, but it really isn’t. Remember how I’ve harped on so many times about the critical importance of v0/v1 coexistence? If the basic tools can’t even be installed at the same time then we have no hope of v0 and v1 coexisting. Coexistence should be a fundamental aim at all levels. If we don’t fix this then it will become a major impediment to adoption of v1.
The next thing the demo highlights is the cost of the separate subjects for things that really should be one subject. Having temperature and differential pressure as separate subjects costs in several ways:

  • it doubles the bandwidth cost, as it has to be two frames for something that easily fits in one frame (assuming bxCAN 8 byte frames)
  • it makes for more user confusion and opportunity for mis-configuration, as users need to configure two subject IDs. We will end up with users that have two airspeed sensors lumping the temperature of one with the pressure of another.
  • it clutters user interfaces with double the subjects
  • it doubles the sensor status costs, with the periodic announcements, which are very large
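The first bullet can be illustrated with a tiny sketch (my own; the combined message with a float32 differential pressure plus a float16 temperature is a hypothetical layout, not an existing DS-015 type):

```python
import math

def frames_per_transfer(payload_bytes: int) -> int:
    # UAVCAN/CAN v1: 7 payload bytes per Classic CAN frame (tail byte),
    # plus a 2-byte CRC appended to multi-frame transfers.
    if payload_bytes <= 7:
        return 1
    return math.ceil((payload_bytes + 2) / 7)

# Two separate subjects, each a bare float32 (4 bytes), as in the demo:
separate = frames_per_transfer(4) + frames_per_transfer(4)

# One hypothetical combined message: float32 pressure + float16 temperature:
combined = frames_per_transfer(4 + 2)

print(separate, combined)  # 2 1
```

Note the threshold effect: anything up to 7 payload bytes rides in a single frame, so merging the two readings halves the frame count at this rate.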

On that last point, I’d like to understand what is going on in these packets in the demo:

 (1620509330.669797)  vcan0  TX B -  1C7D567D  [64]  0A 00 00 00 01 04 55 1D 56 1D 64 00 65 00 02 00 00 00 01 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 A7
 (1620509330.669814)  vcan0  TX B -  1C7D567D  [64]  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 07
 (1620509330.669831)  vcan0  TX B -  1C7D567D  [48]  00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 00 40 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 11 F2 67
 (1620509330.679558)  vcan0  TX B -  0C60647D   [5]  C5 B8 20 4D E0            '.. M.'
 (1620509330.680321)  vcan0  TX B -  1C7D567F  [64]  0A 00 00 00 01 04 55 1D 56 1D E5 1F E6 1F 01 00 00 00 02 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 A8
 (1620509330.682447)  vcan0  TX B -  1C7D567F  [64]  00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08
 (1620509330.682752)  vcan0  TX B -  1C7D567F  [32]  00 00 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 DC 3D 68

Once a second we’re paying for 336 bytes of mostly zeros. I presume this is the subject and node announcements? Can I use yakut to parse these to see exactly what we’re getting for such a large cost?
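For what it’s worth, the CAN IDs in that capture can be decoded by hand; this sketch assumes the UAVCAN/CAN v1 message CAN ID layout (priority in bits 26..28, subject-ID in bits 8..20, source node-ID in bits 0..6):

```python
def decode_message_can_id(can_id: int) -> dict:
    """Split a UAVCAN/CAN v1 message-transfer CAN ID into its fields."""
    return {
        "priority": (can_id >> 26) & 0x7,       # bits 26..28
        "subject_id": (can_id >> 8) & 0x1FFF,   # bits 8..20
        "source_node_id": can_id & 0x7F,        # bits 0..6
    }

# The large once-per-second transfers from the capture above:
print(decode_message_can_id(0x1C7D567D))
print(decode_message_can_id(0x1C7D567F))
# Both decode to subject-ID 7510, the fixed subject of uavcan.node.port.List,
# from source nodes 125 and 127 -- i.e. these are the port announcements.
```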
I know I am emphasizing bandwidth a lot, and you don’t seem to see it as important. It really does matter. We really are running out of bandwidth on v0 already. The v1 protocol is setting itself up to be a death of a thousand cuts on bandwidth. The fundamental decisions in v1 are doubling (and in many cases much worse) the cost of encoding the data we need.
The demo and the discussion we’re having also highlights very clearly that we don’t seem to have the tools we need to keep bandwidth under control. It takes far too much effort to work out what the bandwidth cost of a message set is.

Distribution is great when applied intelligently. I’ve worked on distributed systems since the 90s, and I have no problem with distribution of services when done well. What I do have a problem with is applying it in what seems to me to be a dogmatic manner when the distribution of the service in question runs counter to the other important engineering goals for the project. That is what happened with our debate on CAS for airspeed sensors. Distribution of the maths of airspeed calibration hurts the system and it should not have taken a big debate to see that.

ROS is great, but that doesn’t mean mimicking it in UAVCAN is good. It focuses on quite a different domain.
Where ROS shines is:

  • high level, low update rate applications (drive a rover, SLAM algorithms etc)
  • academic environments where the end user is highly technical
  • relatively high bandwidth environments

You don’t see ROS being used to directly control the ESCs on a consumer quadcopter. I’m sure someone could point me at an example where someone has done this, but I doubt it produced a good result, as the architecture of ROS is poor at time-critical operations like this.
In UAVCAN we do care about time critical, bandwidth sensitive applications. A multirotor is a very good example.
That doesn’t mean we can’t build a great ecosystem around UAVCAN. In fact, we have built a great ecosystem around UAVCAN already, using v0. There are flaws in it, just like there are flaws in every system, but it did become a widespread, robust CAN ecosystem. You should be very proud of what you achieved. I just don’t want those successes to be lost with v1 because the fundamental building blocks and the way they are applied are inappropriate to the task.

I completely agree with this, but I think that the execution on this has not been good.

I’m a software engineer, and I think the application of the design philosophy that I’ve seen so far in UAVCAN v1 is not good. It neglects some really key principles:

  • clarity: it should not take an hour to work out what the actual network cost of a message structure will be. That lack of clarity is what made the GNSS DS-015 example so bad.
  • user experience: the proliferation of separately configurable subjects for what in the user’s mind is one logical piece makes for a poor experience
  • migration path: we’re not bringing v1 into a vacuum. v1 will only succeed if it makes the process of coexisting with v0 very easy. Look at how I managed the migration from mavlink 0.9 to 1.0 to 2.0 to see something that is largely seamless and easy for users.

and there is a major problem. The principal way it would be used is as a sensor network. There are other use cases I’m sure, but a sensor network is going to be 99% of the use cases. So it must be a darn good sensor network. It isn’t.

That is not sufficient. With current v0 and a simple vehicle we do have plenty of bandwidth, but with a more complex vehicle we don’t. Setting the bar so low as to find it acceptable as long as a simple vehicle fits in 1Mbps results in a design that does not scale well to more complex vehicles.

it sure is how engineering works. Efficient use of resources is a central tenet of professional engineering. In my experience it tends to be hobby projects that discard that principle. A huge number of professional engineers spend their lives optimising for the real world constraints of the field they work in. Think about aerodynamics - hobbyists throw together a bit of foam and balsa and are happy when they can fly around a park for a few minutes. Professional aerodynamic engineers spend millions on CFD, wind-tunnel tests, advanced materials, all to make a bit more efficient use of the environment the vehicle is in.
Good network protocol design does care about bandwidth. I’m not talking about the java coders of the world making banking apps - those tend to be bloated monsters and they just throw a few more servers at it. I’m talking about professional design of a network meant to be used for realtime control. That is the field we are in, and we must think in those terms.

and yet the same could be said of pretty much every v0 and v1 message, but I can assure you that real world complex vehicles are running out of bandwidth already. By dismissing those concerns you are precluding the application of v1 to the most interesting professional applications.

no thank you. uavcan.si.sample.angle.Scalar costs us in several ways:

  • it includes yet another useless timestamp
  • it is 11 bytes, so 2 frames, when it should be included in the base GNSS message
  • it doesn’t give us the yaw accuracy, so we’d need another two frames to get that

yet that isn’t what the DS-015 spec says - it says we use the kinematic state message for systems that do multi-antenna RTK, which is exactly the case here.

I very much care. I don’t want to be explaining to the next company building a complex vehicle that they can’t achieve what they want because some vague philosophy meant we have to lose half their bandwidth, especially when it is actually worse than a 2x cost.

No, it is not superior. It is worse in so many ways:

  • the bandwidth, which matters
  • the user complexity of multiple subjects, which matters. It is not negligible.
  • the disassociation of data that should be associated in time. We want to fuse yaw at the same time as fusing velocity and position. The yaw is generated on the same GNSS time horizons as the velocity and position
  • the use of so many useless timestamps
  • the massive proliferation of covariances, which are ill-defined and massively inefficient
  • the separation of status information from the data it is logically associated with

It is a fundamentally poor design.

We use a time jitter removal system that does not require synchronization. Each device can operate within its own time domain and yet the recipient can correct any incoming timestamp into its own time domain with zero network cost.

yep, I’m rather familiar with timestamping as I wrote the timing correction code for ArduPilot. I’m not arguing for removing all timestamps. I’m saying we only need 1 timestamp for the complete GNSS message set. If internally PX4 wants to replicate that into dozens of timestamps they can do that. We don’t need them on the wire.
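A minimal sketch of this style of correction (my illustration, not ArduPilot’s actual code): the receiver tracks the minimum observed difference between local receive time and the remote timestamp; since transport delay is bounded below, that minimum converges on the true clock offset and the jitter cancels out:

```python
class JitterCorrection:
    """Map remote timestamps into the local time domain with zero
    network cost (simplified sketch)."""

    def __init__(self):
        self.link_offset_ms = None  # local_clock - remote_clock + min_delay

    def correct(self, remote_ts_ms, local_rx_ms):
        observed = local_rx_ms - remote_ts_ms
        # Extra transport jitter only ever raises `observed`, so the
        # minimum converges on the least-delayed path through the link.
        if self.link_offset_ms is None or observed < self.link_offset_ms:
            self.link_offset_ms = observed
        return remote_ts_ms + self.link_offset_ms

jc = JitterCorrection()
# Remote samples every 100 ms; remote clock is 1000 ms behind the local
# one, and transport delay varies between 5 and 25 ms:
for remote, delay in [(0, 25), (100, 5), (200, 18)]:
    print(jc.correct(remote, remote + 1000 + delay))  # 1025, 1105, 1205
```

A production implementation also needs to handle counter wrap-around and slow relative clock drift, which this sketch omits.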

If that is the only defect you see from my analysis of the GNSS message set then you really don’t understand. It is all defects from top to bottom.

Please point out specifically where I miscounted. I tried to be pretty careful. It is 6+6+21+21 if we include the yaw (which as I pointed out, is following the spec as written). That is 54 - ok, so I did miscount, I said 53 when it is 54.
We should stop using covariances completely in these sensor messages. They just cause confusion and are a poor representation of what we’ve got available from the sensors. We should send data that has a clear representation of what real sensors can provide. For GNSS that is accuracy numbers as listed in my small example.

My posts are indeed agitated. That is a result of how deep a hole v1 has dug itself into with its design goals, and of how you do not seem to understand just how deep in a pit you are.
When the v1 design goals first came out I thought “ok, a bit odd, but let’s see how this pans out”. Now that I see how it pans out, I am horrified. The realization of the design goals has resulted in a very poor system. That needs to be fixed for v1 to have a chance.

We support all of those technologies in ArduPilot, but as the underlying physical mechanisms are very different we should not represent them with one message. The types of errors that a VO system has are very different from those of GNSS. The handling of origins is very different. An autopilot really does need to treat them quite differently if it wants to be robust.
In a vicon lab it is fine to represent vision data as a GNSS service as a quick hack to get something going. The environment is highly constrained, the vision system is of extremely high quality, and the consequences of error are small (a quad hits the nets at the side). In a professional UAS setting the two do need to be separated, as they are fundamentally different.

A GNSS (not just GPS) operates on a discrete time interval with a delayed time horizon. Internally a GNSS does signal tracking on a very fine time scale, but the output is on discrete time steps. You can monitor that discrete time internal process using the iTOW. By doing the jitter correction for transport delays against the iTOW we eliminate not just the CAN transport delays but also the transport delays within the CAN sensor caused by UART timing jitter and the jitter from the internal processing load inside the sensor (eg. inside a u-blox module).
With something like a moving baseline RTK setup where two GNSS are cooperating, the iTOW is what links the times between the units. The autopilot being able to see the iTOW from both units allows it to properly handle both sets of data.
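To illustrate the linking role of iTOW, a toy sketch (argument shapes and names invented for illustration) that pairs base and rover outputs only when their iTOW values match, so data from different GNSS epochs is never fused together:

```python
def pair_by_itow(base_samples, rover_samples):
    """Pair moving-baseline GNSS samples from two units by iTOW.

    Toy sketch: each argument maps iTOW (GPS time of week, ms) to that
    unit's measurement. Only epochs seen by both units are paired, so
    the consumer never mixes base and rover data from different epochs.
    """
    shared = base_samples.keys() & rover_samples.keys()
    return {itow: (base_samples[itow], rover_samples[itow])
            for itow in sorted(shared)}
```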

in that example the defaults are correct for one UAVCAN rangefinder. So for most people no config is needed. A good principle of design in software engineering is to make the common cases easy and the more complex cases possible.

My language is strong because we’re close to the point of abandoning plans to do v1 at all. I hoped that illustrating how the rigid design principles of v1, when applied to real sensors, produce such a poor result would lead to a rethink, so that we end up with something that is fit for purpose.

and that is one of the problems. The v1 design principles encourage this sort of poor design because it is actually pretty difficult to see the relationship between the message design and the wire representation.

thanks - it is missing the accuracy numbers and missing the yaw, but closer. I also think it should not split it up as separate subjects. We should get all the GNSS data in one packet. That removes a bunch of overhead, and prevents the recipient having to match timestamps to align the data.
My structure also wasted a bit of space, as it had a redundant ground_course and ground_speed; that is a historical thing from the early days of ArduPilot and should not be in the UAVCAN packet.

Sending those optional bytes as a separate message is not a good idea. All reasonable consumers of this data will need it, and the overhead in code complexity, framing, and user confusion from the proliferation of published subjects means it is so much better to include it in the packet.

I think that is pretty clearly a mistake in this case.

Proposal for CANDevices Repository
I’d like to make a more concrete proposal for how we can move forward with sensor and device oriented messages that specifically aim to cover the very important use case of a set of sensors on the bus while being robust and efficient.
I propose creating a new CANDevices git repo under github.com/ArduPilot which we will populate with UAVCAN v1 messages for the key devices (at least GNSS, mag, baro, airspeed, rangefinder and likely a few more).
With Pavel’s permission we’d also like a section on this UAVCAN forum to be created specifically for discussion of this CANDevices message set. That gives a common location for discussion between vendors and autopilot stack developers.
Changes to CANDevices would be by pull requests against the CANDevices repo.

Idiom of CANDevices
The messages in CANDevices would be designed to be a regulated message set, and meant to be brought in as a git submodule. The design of the messages would follow the efficient/clear/robust approach I’ve been advocating above. It would not follow the v1 idiom of separate topics for a single logical device (eg. barometer would be one topic containing both pressure and temperature, not two topics). This would be explained in the top level README.md for the repo.
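As a sketch of what such a single-subject barometer message might look like on the wire (the layout and field names here are purely illustrative, not a proposed DSDL definition): one timestamp covers both readings, instead of two subjects each carrying their own timestamp.

```python
import struct

# Illustrative wire layout for a combined barometer message:
# sample_ms:u32, pressure_pa:f32, temperature_c:f16 -> 10 bytes,
# versus two separate subjects each with framing and a timestamp.
BARO_FMT = "<Ife"

def pack_baro(sample_ms, pressure_pa, temperature_c):
    """Pack one barometer reading into a single payload (toy example)."""
    return struct.pack(BARO_FMT, sample_ms, pressure_pa, temperature_c)
```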
The AP_Periph firmware would implement CANDevices, plus the core UAVCAN v1 message set, along with v0. For boards with sufficient flash, selection of v0/v1 will be by parameter, allowing for only v0, only v1, or both.
The CANDevices repo will also offer a nice chance for people to point out where I screw up messages for v1, just like I’ve been (very) critical of the DS-015 message set. As I’ve never created a v1 message before it seems likely I’ll make mistakes and I hope we can sort those out on the CANDevices section of the UAVCAN forum.

A Device Class Id
I am also considering having a common uint8 device class ID as the first byte in every message in CANDevices. This is to address a concern I have about the fragility of the UAVCAN v1 subject-id configuration system. There would be a text file in the root of CANDevices where these class IDs are allocated by pull request. The IDs would be high level devices classes, such as “GNSS”, “Barometer”, “Magnetometer”.
The aim of this ID is to allow for mis-configuration detection at the point the data is entering the consuming node. So for example in the ArduPilot code structure, the AP_Compass_UAVCAN.cpp driver would check this field when it gets a message, and if it isn’t the right class ID then it would signal a mis-configuration error, prevent arming, and notify the user of the error.
The motivation for this ID is this sort of scenario:

  • a user is out at the field about to do an important flight. One of their UAVCAN v1 barometers is misbehaving, and it is causing a pre-arm error.
  • the user decides to swap it out, so pulls the misbehaving device off the vehicle and replaces it with one from their box of spares.
  • the spare was previously used on a different vehicle, perhaps one running a different firmware version; it may even have been on a copter when they are now configuring a plane. On that previous vehicle the barometer had been allocated a PnP subject ID, which it had stored in its parameters.
  • when the spare baro is plugged into the new vehicle it has no idea that it has been sitting in the spares box for 6 months. It has no realtime clock, no battery, so no way of knowing that something might be wrong. It thus immediately takes its subject ID configuration and starts publishing barometer data.
  • the subject ID it inherited from the other vehicle happens to be one allocated for an airspeed sensor on this new vehicle.
  • as v1 has no indicator in the wire format of what message format the data is, and doesn’t even have a structural checksum like MAVLink, the baro data effectively gets “network cast” into differential pressure data by the recipient.
  • when the user turns on the vehicle it starts getting garbage for airspeed. The EKF might or might not be able to detect this, especially if it is a backup sensor.

Is there any existing mechanism in V1 that protects against scenarios like this? If so, can someone explain what it is?
Adding a 1 byte sensor class ID at the front of all CANDevices messages seems like a cheap way to auto-detect this type of config error for critical sensors, unless there is some mechanism for robustly detecting mis-configuration that I’ve missed. Note that this 8 bit ID is just another field in a v1 message as far as UAVCAN v1 is concerned. The meaning is given by the conventions of the message set and the table committed into the repo.
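A consumer-side check along these lines might look like the following sketch (the class ID constants, names, and payload layout are all hypothetical; real IDs would be allocated in the table committed to the repo):

```python
# Hypothetical CANDevices class IDs; real values would be allocated
# by pull request in a table at the root of the CANDevices repo.
CLASS_ID_GNSS = 1
CLASS_ID_BAROMETER = 2
CLASS_ID_AIRSPEED = 3

class MisconfigurationError(Exception):
    """Wrong device class on this subject: refuse to arm, warn the user."""

def accept_airspeed_payload(payload: bytes) -> bytes:
    """Return the sensor fields only if the leading class ID matches."""
    if not payload or payload[0] != CLASS_ID_AIRSPEED:
        raise MisconfigurationError(
            "expected airspeed (class %d), got class byte %r"
            % (CLASS_ID_AIRSPEED, payload[:1]))
    return payload[1:]  # remaining bytes are the actual sensor fields
```

With this check in place, a barometer that inherited an airspeed subject ID from a previous vehicle would be rejected at the driver boundary instead of being silently fused as differential pressure.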

@tridge Cubepilot / Hex / ProfiCNC are in support of your proposal.

We make hardware that is used by both the ArduPilot and PX4 communities.

As the longest continuously running hardware project in the PX4/ArduPilot community, the Pixhawk/Cube hardware has been designed with CAN at its heart, and we have dedicated considerable expense and effort to the support of UAVCAN.

As a significant stakeholder in the UAVCAN user base, we spoke up at the very beginning of this v1 push, and we were ignored by those who wanted to fragment the community by dumping support for the original UAVCAN.

Calling it experimental well after release, and demoting its designation from simply UAVCAN to 0.9, and now v0, has eroded the industry’s trust in this standard.

We have bet our company’s future on UAVCAN, @pavel.kirienko, stop letting us down! You have one of the wisest people in this whole industry giving you absolutely brilliant advice, please bury your pride and listen to @tridge.

What Tridge is offering is a massive olive branch.

Thanks for posting your concerns, @tridge, and thanks for clarifying your good intentions. They are very well received. I want to start by stating that I agree with you on most of the issues you have raised on this thread, and I think we should definitely find a way to resolve any technical deficiencies in the standards (UAVCAN v1 & DS-015).

I’m not going to weigh in on any of the technical discussions. I can say that DS-015’s original goal was to define a drone message set for the UAVCAN v1 implementation in the same spirit as the v0 messages, but with some lessons learned applied, and I think that goal hasn’t changed, nor has our willingness to fulfill its mission.

As @dagar pointed out above, we are not that far from being fully aligned, and I propose we work together to define the message set that works best for the drone industry.

Given the history behind APM and PX4, and how both communities still have members who continue to exacerbate the situation, we are open to discussing how best to structure governance, and in what venue and form, so that everybody feels we are on neutral ground.

thanks Ramon. I’ll wait till Pavel has had a chance to consider my CANDevices proposal before proceeding.
I also think we need to urgently address a few associated issues:

  • re-releasing pyuavcan for v0 as a “pyuavcan_v0” module in pip so it can be installed alongside the v1 variant (alternatively, the v1 variant could be renamed pyuavcan_v1).
  • re-working the old uavcan v0 GUI to use the result of step 1 (should be trivial)
  • getting a GUI tool going for mixed v0/v1 networks. I think the fastest way forward on this is likely to be to build upon the old uavcan_gui_tool, but maybe those working on yukon could comment on the likely timeline for a mixed v0/v1 UI based on that work?

Sounds good. We will wait for @pavel.kirienko feedback. Thanks again for putting this together.