- There are sometimes valid reasons to choose a lower level of abstraction in the physical decomposition of the system.
Among those reasons:
- The design expertise and complexity tolerance are not distributed evenly in the ecosystem. There is significantly more of both available in the AP dev community than in peripheral development, so by Conway’s law (architecture follows organization) it makes sense to split the functionality accordingly.
- Debug and diagnostic tools favor a more centralized design.
- All abstractions are leaky abstractions when put under sufficient pressure, so it takes very deep domain expertise to choose the abstraction level. It is a complex tradeoff with an optimum point, and it cannot be settled by one-sided mandates like “move as high as possible”.
- A “thin client” hardware architecture does not equal bad design or the “god object” antipattern. There can still be well-designed software on the central node(s) with optimal modularity of architecture.
- Meta point 1: I think it serves little useful purpose to load an excellent transport layer with opinions on what are essentially technically independent adjacent layers. UAVCAN v1 as the transport layer can support different system architectures (“sensor networks”, “smart nodes”, and everything in between) equally well. It would be much better to let players with “skin in the game” work to converge on a suitable solution while having the best platform possible to work on. It is entirely possible that different groups would converge on different solutions (Ardupilot vs a nuclear reactor manufacturers association), and none would be abstractly better, just more suited to the respective use case.
- The argument over “air data computer” looks especially redundant in this light: if someone wants to make one, fine. If someone else prefers to make a simple sensor, fine too. Let the market select which one wins (if not both).
- The approach of @tridge and Ardupilot is not in contradiction to “UAVCANv1 as the transport layer”. Any attempts to couple the transport layer with the system design and specific decomposition/architecture decisions are bound to slow adoption and harm the nascent and still fragile ecosystem.
- Meta point 2: It would help to foster the spirit of collaboration and good faith if there were less use of terms loaded with negative connotations applied to concepts the writer is opposed to. Examples include “modern” vs “legacy”. State practical downsides, don’t attack emotionally.
A look at GNSS in DS-015
We’ve spent a lot of time now analyzing airspeed. For a simple float differential pressure it has caused a lot of discussion, but it is time to move on to the true horrors of DS-015. Nothing exemplifies those horrors more than the GNSS message. This will be a long post because it will dive into exactly what is proposed for GNSS, and that is extremely complex.
What it should be
This is the information we actually need from a CAN GNSS sensor:
uint3 instance
uint3 status
uint32 time_week_ms
uint16 time_week
int36 latitude
int36 longitude
float32 altitude
float32 ground_speed
float32 ground_course
float32 yaw
uint16 hdop
uint16 vdop
uint8 num_sats
float32 velocity[3]
float16 speed_accuracy
float16 horizontal_accuracy
float16 vertical_accuracy
float16 yaw_accuracy
It is 56 bytes long and pretty easy to understand. Note that it includes yaw support, as yaw from GPS is becoming pretty common these days, and is in fact one of the key motivations for users buying the higher-end GNSS modules. When a float value in the above isn’t known, a NaN would be used. For example, if you have a NMEA sensor attached (most of those don’t report vertical velocity), the 3rd element of velocity would be NaN. Same for any accuracies you don’t get from the sensor.
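As an illustration only (a minimal Python sketch with made-up values, not any real driver code), a consumer would apply the NaN convention like this:

import math

def known(x: float) -> bool:
    # A field is usable only if the sensor actually reported it (i.e. not NaN).
    return not math.isnan(x)

# Example: an NMEA sensor that reports no vertical velocity and no yaw accuracy.
msg = {"velocity": [1.2, -0.4, float("nan")], "yaw_accuracy": float("nan")}

if known(msg["velocity"][2]):
    print("fuse vertical velocity:", msg["velocity"][2])
else:
    print("skip vertical velocity, not reported by this sensor")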
I should note that the above structure is a big improvement over the one in UAVCAN v0, which requires a bunch of separate messages to achieve the same thing.
GNSS in DS-015
Now let’s look at what GNSS would be like using current DS-015. Following the idiom of UAVCAN v1 the GNSS message is a very deeply nested set of types. It took me well over an hour to work out what is actually in the message set for GNSS as the nesting goes so deep.
To give you a foretaste though, to get the same information as the 56 byte message above you need 243 bytes in DS-015, and even then it is missing some key information.
How does it manage to expand such a simple set of data to 243 bytes? Here are some highlights:
- there are 55 covariance floats. wow
- there are 6 timestamps. Some of the timestamps have timestamps on them! Some of the timestamps even have a variance on them.
I’m sure you must be skeptical by now, so I’ll go into it in detail. I’ll start from the top level and work down to the deepest part of the tree of types.
Top Level
The top level of GNSS is this:
# point_kinematics reg.drone.physics.kinematics.geodetic.PointStateVarTs 1...100
# time reg.drone.service.gnss.Time 1...10
# heartbeat reg.drone.service.gnss.Heartbeat ~1
# sensor_status reg.drone.service.sensor.Status
The “1…100” is the range of update rates. This is where we hit the first snag: it presumes you’ll be sending the point_kinematics fast (typically 5Hz or 10Hz) and the other messages less often. The problem with this is that you don’t get critical information that the autopilot needs on each update, so you could end up fusing data from the point_kinematics while the status of the sensor is saying “I have no lock”. The separation of the time from the kinematics also means you can’t do proper jitter correction for transport timing delays.
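To make the snag concrete, here is a rough sketch (Python pseudologic with hypothetical names, not real autopilot code) of the bookkeeping a subscriber is forced into when status and kinematics arrive on separate subjects at different rates:

import time

class GnssConsumer:
    STATUS_TIMEOUT = 2.0          # heartbeat is nominally ~1 Hz

    def __init__(self):
        self.last_status = None   # latest reg.drone.service.gnss.Heartbeat seen
        self.last_status_time = 0.0

    def on_heartbeat(self, status):          # ~1 Hz subject
        self.last_status = status
        self.last_status_time = time.monotonic()

    def on_point_kinematics(self, kin):      # 5-10 Hz subject
        stale = time.monotonic() - self.last_status_time > self.STATUS_TIMEOUT
        if self.last_status is None or stale or not self.last_status.get("fix"):
            return                # the status may be up to ~1 s out of date here
        print("fusing", kin)      # hand off to the EKF in a real system

c = GnssConsumer()
c.on_heartbeat({"fix": True})
c.on_point_kinematics({"lat": 55.75, "lon": 37.62})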
point_kinematics - 74 bytes
The first item in GNSS is point_kinematics. It is the following:
reg.drone.physics.kinematics.geodetic.PointStateVarTs:
74 bytes
uavcan.time.SynchronizedTimestamp.1.0 timestamp
PointStateVar.0.1 value
Breaking down the types we find:
uavcan.time.SynchronizedTimestamp.1.0:
7 bytes
uint56
Keep note of this SynchronizedTimestamp, we’re going to be seeing it a lot.
PointStateVar.0.1:
67 bytes
PointVar.0.1 position
reg.drone.physics.kinematics.translation.Velocity3Var.0.1 velocity
Looking into PointVar.0.1 we find:
PointVar.0.1 position:
36 bytes
Point.0.1 value
float16[6] covariance_urt
and there we have our first covariances. I’d guess most developers will just shrug their shoulders and fill in zero for those 6 float16 values. The chances that everyone treats them in a consistent and sane fashion are zero.
Ok, so now we need to parse Point.0.1:
Point.0.1:
24 bytes
float64 latitude # [radian]
float64 longitude # [radian]
uavcan.si.unit.length.WideScalar.1.0 altitude
there at last we have the latitude/longitude. Instead of the 36 bits used in UAVCAN v0 (which gives mm accuracy), we’re using float64, which allows us to get well below the atomic scale. Not a great use of bandwidth.
What about altitude? That is a WideScalar:
uavcan.si.unit.length.WideScalar.1.0:
8 bytes, float64
yep, another float64. So atomic scale vertically too.
Back up to our velocity variable (from PointStateVar.0.1) we see:
reg.drone.physics.kinematics.translation.Velocity3Var.0.1:
31 bytes
uavcan.si.sample.velocity.Vector3.1.0 value
float16[6] covariance_urt
so, another 6 covariance values. More confusion, more rubbish consuming the scant network resources.
Looking inside the actual velocity value we see:
uavcan.si.sample.velocity.Vector3.1.0:
19 bytes
uavcan.time.SynchronizedTimestamp.1.0
float32[3] velocity
there is our old friend SynchronizedTimestamp again, consuming another useless 7 bytes.
Now we get to the Time message in GNSS:
time: reg.drone.service.gnss.Time
21 bytes
reg.drone.physics.time.TAI64VarTs.0.1 value
uavcan.time.TAIInfo.0.1 info
Diving deeper we see:
reg.drone.physics.time.TAI64VarTs.0.1:
19 bytes
uavcan.time.SynchronizedTimestamp.1.0 timestamp
TAI64Var.0.1 value
yes, another SynchronizedTimestamp! And what is this timestamp timestamping? A timestamp. You’ve got to see the funny side of this.
Looking into TAI64Var.0.1 we see:
TAI64Var.0.1:
12 bytes
TAI64.0.1 value
float32 error_variance
so there we have it. A timestamped timestamp with a 32 bit variance. What the heck does that even mean?
Completing the timestamp type tree we have:
TAI64.0.1:
8 bytes
int64 tai64n
so finally we have the 64 bit time. It still hasn’t given me the timestamp that I actually want though. I want the iTOW. That value in milliseconds tells me about the actual GNSS fix epochs. Tracking that timestamp in its multiples of 100ms or 200ms is what really gives you the time info you want from a GNSS. Can I get it from the huge tree of timestamps in DS-015? Maybe. I’m not sure yet if it’s possible.
Now on to the heartbeat. This is where we finally know what the status is. Note that the GNSS top level docs suggest this is sent at 1Hz. There is no way we can do that, as it contains information we need before we can fuse the other data into the EKF.
A heartbeat is a reg.drone.service.gnss.Heartbeat
reg.drone.service.gnss.Heartbeat:
25 bytes
reg.drone.service.common.Heartbeat.0.1 heartbeat
Sources.0.1 sources
DilutionOfPrecision.0.1 dop
uint8 num_visible_satellites
uint8 num_used_satellites
bool fix
bool rtk_fix
here we finally find out the fix status. But we can’t differentiate between 2D, 3D, 3D+SBAS, RTK-Float and RTK-Fixed, which are all distinct levels of fix and are critical for users and log analysis. Instead we get just 2 bits (presumably to keep the protocol compact?).
We do however get both the number of used and number of visible satellites. That is somewhat handy, but is a bit tricky as “used” has multiple meanings in the GNSS world.
Looking deeper we have:
reg.drone.service.common.Heartbeat.0.1:
2 bytes
Readiness.0.1 readiness
uavcan.node.Health.1.0 health
which is made up of:
Readiness.0.1:
1 byte
truncated uint2 value
uavcan.node.Health.1.0:
1 byte
uint2
these are all sealed btw. If 2 bits ain’t enough then you can’t grow it.
Now on to the sources:
Sources.0.1:
48 bits, 6 bytes
bool gps_l1
bool gps_l2
bool gps_l5
bool glonass_l1
bool glonass_l2
bool glonass_l3
bool galileo_e1
bool galileo_e5a
bool galileo_e5b
bool galileo_e6
bool beidou_b1
bool beidou_b2
void5
bool sbas
bool gbas
bool rtk_base
void3
bool imu
bool visual_odometry
bool dead_reckoning
bool uwb
void4
bool magnetic_compass
bool gyro_compass
bool other_compass
void14
so we have lots of bits (48 of them) telling us exactly which satellite signals we’re receiving, but no information on what type of RTK fix we have.
What are “imu”, “visual_odometry”, “dead_reckoning” and “uwb” doing in there? Does someone really imagine you’ll be encoding your UWB sensors on UAVCAN using this GNSS service? Why??
Diving deeper we have the DOPs:
DilutionOfPrecision.0.1:
14 bytes
float16 geometric
float16 position
float16 horizontal
float16 vertical
float16 time
float16 northing
float16 easting
that is far more DOP values than we need. The DOPs are mostly there to keep users who are used to them happy. They want 1, or at most 2, values. We don’t do fusion with these as they are not a measure of accuracy. Sending 6 of them is silly.
Now onto sensor_status:
sensor_status: reg.drone.service.sensor.Status
reg.drone.service.sensor.Status:
12 bytes
uavcan.si.unit.duration.Scalar.1.0 data_validity_period
uint32 error_count
uavcan.si.unit.temperature.Scalar.1.0 sensor_temperature
yep, we have the temperature of the GNSS in there, along with an “error_count”. What sort of error? I have no idea. The doc says it is implementation-dependent.
The types in the above are:
uavcan.si.unit.duration.Scalar.1.0:
4 bytes
float32
uavcan.si.unit.temperature.Scalar.1.0:
4 bytes
float32
quite what you are supposed to do with the “data_validity_period” from a GNSS I have no idea.
Ok, we’re done with what is needed for a GNSS that doesn’t do yaw, but as I mentioned, yaw from GNSS is one of the killer features attracting users to new products, so how would that be handled?
We get this:
# Sensors that are able to estimate orientation (e.g., those equipped with IMU, VIO, multi-antenna RTK, etc.)
# should also publish the following in addition to the above:
#
# PUBLISHED SUBJECT NAME SUBJECT TYPE TYP. RATE [Hz]
# kinematics reg.drone.physics.kinematics.geodetic.StateVarTs 1...100
so, our GNSS doing moving baseline RTK for yaw needs to publish reg.drone.physics.kinematics.geodetic.StateVarTs, presumably at the same rate as the above. For ArduPilot we fuse the GPS yaw in the same measurement step as the GPS position and velocity, so we’d like it at the same rate. We could split that out to a separate fusion step, but given yaw is just a single float, why not send it at the same time?
Well, we could, but in DS-015 it takes us 111 bytes to send that yaw. Hold onto your hat while I dive deep into how it is encoded.
reg.drone.physics.kinematics.geodetic.StateVarTs:
111 bytes
uavcan.time.SynchronizedTimestamp.1.0 timestamp
StateVar.0.1 value
another SynchronizedTimestamp. Why? Because more timestamps is good timestamps I expect.
Now into the value:
StateVar.0.1:
104 bytes
PoseVar.0.1 pose
reg.drone.physics.kinematics.cartesian.TwistVar.0.1 twist
yep, our yaw gets encoded as a pose and a twist. I’ll give you all of that in one big lump now, just so I’m not spending all day writing this post. Take a deep breath:
reg.drone.physics.kinematics.cartesian.TwistVar.0.1:
66 bytes
Twist.0.1 value
float16[21] covariance_urt
Twist.0.1:
24 bytes
uavcan.si.unit.velocity.Vector3.1.0 linear
uavcan.si.unit.angular_velocity.Vector3.1.0 angular
PoseVar.0.1:
82 bytes
Pose.0.1 value
float16[21] covariance_urt
Pose.0.1:
40 bytes
Point.0.1 position
uavcan.si.unit.angle.Quaternion.1.0 orientation
uavcan.si.unit.angular_velocity.Vector3.1.0:
12 bytes
float32[3]
uavcan.si.unit.velocity.Vector3.1.0:
12 bytes
float32[3]
Point.0.1:
24 bytes
float64 latitude # [radian]
float64 longitude # [radian]
uavcan.si.unit.length.WideScalar.1.0 altitude
uavcan.si.unit.angle.Quaternion.1.0:
16 bytes
float32[4]
phew! That simple yaw has cost us 111 bytes, including 42 covariance variables, some linear and angular velocities, our latitude and longitude (again!!) and even a 2nd copy of our altitude, all precise enough for quantum physics. Then finally the yaw itself is encoded as a 16 byte quaternion, just to make it maximally inconvenient.
Conclusion
If you’ve managed to get this far then congratulations. If someone would like to check my work then please do. Diving through the standard to work out what actually goes into a service is a tricky task in UAVCAN v1, and it is very possible I’ve missed a turn or two.
The overall message should be pretty clear however. The idiom of DS-015 (and to a pretty large degree UAVCAN v1) is “abstraction to ridiculous degrees”. It manages to encode a simple 56 byte structure into a 243 byte monster, spread across multiple messages, with piles of duplication.
We’re already running low on bandwidth with v0 at 1MBit. When we switch to v1 we will, for quite a long time, be stuck at 1MBit as there will be some node on the bus that can’t do higher rates. So keeping the message set compact is essential. Even when the day comes that everyone has FDCAN, the proposed DS-015 GNSS will swallow up most of that new bandwidth. It will also swallow a huge pile of flash, as the structure of DS-015 and of v1 means massively deep nesting of parsing functions. So expect the expansion in bandwidth to come along with an equal (or perhaps greater?) expansion in flash cost.
The DS-015 “standard” should be burnt. It is an abomination.
I’m trying to stay somewhat neutral in this, but the amount of extra data on the bus is unacceptable. As someone leading a project to deliver a UAVCAN-based GNSS to the market in the $2000 USD price range, as well as someone driving adoption of UAVCAN v1 onto our actuators: fix this, Pavel. The time for debate has passed; collaborate or we’re going to consider leaving.
Our hardware will be following Tridge’s recommendation here; why put the effort in to move to FDCAN just to throw it away on wasted data?
The experimental continued reinventing of UAVCAN needs to stop! Improvements are fine, but this is just going backwards @pavel.kirienko
This topic has gotten far out of hand. The hostility and unwillingness to collaborate is not productive.
As someone with some stake in the UAVCAN v1 / DS-015 game, I decided I should step in to try to make this discussion more constructive - because this is a totally solvable problem, and the rancor here is hiding how simple I think it could be to solve.
@tridge and others - you seem to be assuming that as soon as DS-015 was “released”, it was carved in stone and no longer subject to any further modifications. This is absolutely not the case. If you had been present in the discussions surrounding DS-015, I think you would have very different context on it. I wasn’t too actively involved in most of the discussions, but I did follow along for many of them, and my main takeaway was that the decision in the end was along the lines of “we don’t know how this is going to work until we try it, so let’s just release something to get the ball rolling and iterate from experience”. At no point, from my perspective, did I hear the opinion that the current form of DS-015 was expected to be the end-all be-all of UAVCAN drone communications.
So please, instead of foaming at the mouth at how bad DS-015 is, let’s be reasonable and start working together on improving the situation. To @tridge’s point, the GNSS service, in its current form, is rather abysmal. In hindsight, however, I’m not convinced anyone had previously bothered to go through the details of looking at the full size of the service as defined - it was designed at the level of the abstract types, and that’s as far as it went. So now is the time to iterate on that.
For additional context (at least from my tangentially-involved perspective; apologies if I’m putting words in anyone’s mouths), the goal was to release something just so we could stop debating the details and go try to create a real proof of concept implementation, because none of this matters if it doesn’t get built into a real system. That’s the state that PX4 is currently working towards - the UAVCAN v1 infrastructure is still being implemented, with the composable services of DS-015 being used to guide what that implementation looks like (specifically, driving us towards dynamic, reconfigurable Publisher/Subscriber objects from
which to build DS-015-like services). Most of the work is actually being done by @PetervdPerk who has (to my knowledge) really only focused on the Battery service, while I have focused on the
ESC and Servo services. We haven’t gotten to a full GNSS service, or any others for that matter.
So this is my roundabout way of saying yes, I agree that DS-015 needs to change, and I don’t think you’ll find any vocal arguments to the contrary. But before it’s completely overhauled, we need to find the issues and propose alternatives. The way in which the PX4 community has been doing that is by just going out and building it, which I think is far more constructive than any number of critical forum posts.
So in the spirit of collaboration and revising DS-015, here’s a rough proposal for a revised
GNSS service, for the sake of discussion (based on @tridge’s recommendation):
# Revised GNSS Service:
reg.drone.service.gnss.TimeWeek.0.1 gnss_time_week # Proposed new topic
6 bytes
+ uint32 time_week_ms
+ uint16 time_week
reg.drone.physics.kinematics.Point.0.1
24 bytes
+ float64 latitude
+ float64 longitude
+ uavcan.si.unit.length.WideScalar
+ float64
uavcan.si.unit.velocity.Vector3.0.1
6 bytes
+ float32[3]
reg.drone.service.gnss.Status.0.2 dop # Proposed new topic -- placeholder name
6 bytes
+ uint16 hdop
+ uint16 vdop
+ uint8 num_sats
+ uint3 fix_type
# Optional Topics
reg.drone.service.gnss.Accuracy.0.1 # Proposed new topic
8 bytes
+ float16 speed_accuracy
+ float16 horizontal_accuracy
+ float16 vertical_accuracy
+ float16 yaw_accuracy
reg.drone.service.common.GroundTrack.0.0 # (Very rough) Proposed new topic
6 bytes
+ uavcan.si.unit.velocity.Scalar.1.0 ground_speed
+ float32 meter_per_second
+ uavcan.si.unit.velocity.Scalar.1.0 ground_course
+ float32 radian
+ uavcan.si.unit.velocity.Scalar.1.0 heading
+ float32 radian
Total Bytes:
6 + 24 + 6 + 6 + 8 + 6 = 56 (14 bytes of which are optional).
(Omitted: The standard Heartbeat message that all nodes must publish).
Note that this keeps the UAVCAN v1 mindset of using a composition of basic types to do most of the work rather than a single message (like Fix or Fix2) that not everyone agrees on, and adds in DS-015-specific types when the basic UAVCAN types don’t suffice, with those more specific types split up such that a change to one won’t affect the others, and allows some of the data to be an optional part of the service.
This same approach can be taken to all of the services defined by DS-015, and future services
that need to be added to it (e.g. rangefinders, optical flow sensors, IMUs, VectorNav-type
units, …).
Also note that, if we add perhaps just a little more clarification and detail to the port-naming
conventions, we can very easily develop plug & play support around all of these services,
so the hobbyist ArduPilot user can plug & play with new devices to their heart’s content,
while still letting the professional / commercial integrators fine-tune the system to their
own specifications.
Let’s try to remain constructive here, friends, and work towards a better solution!
Differential pressure sensor demo
@tridge It is great that you have moved on to analyze other parts of DS-015, but I would like to reach some sort of conclusion regarding the airspeed sensor node (mind the difference: not an air data computer, so not an idiomatic DS-015) before switching the topic. I proposed that we construct a very simple demo based on your inputs. I did that yesterday; please, do have a look (@scottdixon also suggested that we make the repository public, so it is now public):
I hope this demo will be a sufficient illustration of my proposition that DS-015 can be stretched to accommodate your requirements (at least in this case for now). In fact, as it stands, the demo does not actually leverage any data types from the reg.drone
namespace, not that it was an objective to avoid it.
May I suggest that you run it locally using Yakut and poke it around a little bit? I have to note though that Yakut is a rather new tool; if you run into any issues during its installation, please, open a ticket, and sorry for any inconvenience. You may notice that the register read/write command is terribly verbose, that’s true; I should improve this experience soon (this is, by the way, the part that can be automated if the aforementioned plug-and-play auto-configuration proposal is implemented).
We can easily construct additional demos as visual aids for this discussion (it only takes an hour or two).
Goals and motivation
I risk repeating myself again here since this topic was covered in the Guide, but please bear with me — I don’t want any accidental misunderstanding to poison this conversation further.
My reference to Torvalds vs. Tanenbaum was to illustrate the general scope of the debate, not to recycle the arguments from a different domain. We both know that distributed systems are commonly leveraged in state-of-the-art robotics and ICT. I am not sure if one can confidently claim that “distributed systems won” (I’m not even sure what that would mean exactly), but I think we can easily agree that there exists a fair share of applications where they are superior. Avionics is one of them. Robotic systems are another decent example — look at the tremendous ecosystem built around ROS!
It is my aspiration (maybe an overly ambitious one? I guess we’ll see) to enable a similar ecosystem, spanning a large set of closely related domains from avionics to robotics and more, with the help of UAVCAN. It is already being leveraged in multiple domains, although light unmanned aviation remains, by far, the largest adopter (this survey was also heavily affected by a selection bias, so the numbers are only crude approximations):
Requirements for LRUs and software components overlap to a significant extent between many of these domains. It is, therefore, possible to let engineers and researchers working in any of these fields rely on the advances made in adjacent areas. I am certain that many business-minded people who are following this conversation will recognize the benefits of this.
UAVCAN v1 is perfectly able to serve as the foundation for such an inter-domain ecosystem, but it will only succeed if we make the first steps right and position it properly from the start. One of the preconditions is that we align its design philosophy with the expectations of modern-day experts, many of whom are well-versed in software engineering. This is not surprising, considering that just about every sufficiently complex automation system being developed today — whether vehicular, robotic, industrial — is software-defined.
The idea of UAVCAN becoming a technology that endorses and propagates flawed design practices like the specimen below keeps me up at night. I take full responsibility for it because an engineer working with UAVCAN v0 simply does not have access to adequate tools to produce good designs.
You might say that a man is always free to shoot himself in the foot, no matter how great the tool is. But as a provider of said tool, I am responsible for doing my part to raise the sanity waterline, if only by a notch. Hence the extensive design guide, philosophy, tropes, ideologies, and opinionated best practices. Being a mere human, I might occasionally get carried away and produce overly idealistic proposals, which is why I depend on you and other adopters to keep the hard business objectives in sight. My experience with the PX4 leadership indicates that it is always possible to find a compromise between immediate interests and the long-term benefits of a cleaner architecture by way of sensible and respectful dialogue.
Your last few posts look a bit troubling, as they appear to contain critique directed at your original interpretation of the standard while not taking into account the corrections that I introduced over the course of this conversation. Perhaps the clarity of expression is not my strong suit. The one thing that troubles me most is that you appear to be evaluating DS-015 as a sensor network rather than what it really is. I hope the demos will make things clearer.
On GNSS service
I think I understand where you are coming from. We had a lengthy conversation a year and a half ago about the deficiencies of the v0 application layer, where we agreed it could be improved; you went on to list the specifics. I suppose I spent enough time tinkering with v0 and its many applications to produce a naïve design that would exactly match your (and many other adopters’) expectations. Instead, I made DS-015. Why?
When embarking on any non-trivial project, one starts by asking “what are the business requirements” and “what are the constraints”, then applies constrained optimization to approximate the former. With DS-015, the requirements (the publicly visible subset thereof) can be seen here: DS-015 MVP progress tracking. One specific constraint was that it should be possible to deploy a basic DS-015 configuration on a small UAV equipped with a 1 Mbps Classic CAN bus. If the constraint is satisfied and the business requirements are met, the design is acceptable. I suppose it makes sense.
One might instead maximize an arbitrary parameter without regard for other aspects of the design. For example, it could be the bus utilization, data transfer latency, flash space, how many lines of code one needs to write to bring up a working system, et cetera. Such blind optimization is mostly reminiscent of games or hobby projects, where the process of optimization is the goal in itself. This is not how engineering works.
At 10 Hz, the example 57-byte message you’ve shown requires 57 bytes * 10 Hz = 570 bytes per second of bandwidth, or 90 Classic CAN frames per second. At 1 Mbps, the resulting worst-case bus load would be about 1%.
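For those who want to check the arithmetic, here is a back-of-the-envelope sketch (my own Python with assumed per-frame costs, not an official tool), using the usual UAVCAN/CAN framing: 7 payload bytes per Classic CAN frame in multi-frame transfers (1 byte is the tail byte) plus a 2-byte transfer CRC, and roughly 130 bits per extended-ID frame on the wire before stuff bits:

import math

def frames_per_transfer(payload_bytes: int) -> int:
    if payload_bytes <= 7:
        return 1                                  # single-frame transfer, no transfer CRC
    return math.ceil((payload_bytes + 2) / 7)     # multi-frame: add the 2-byte transfer CRC

def bus_load(payload_bytes: int, rate_hz: float,
             bits_per_frame: int = 130, bitrate: float = 1e6) -> float:
    return frames_per_transfer(payload_bytes) * rate_hz * bits_per_frame / bitrate

print(frames_per_transfer(57) * 10)   # -> 90 frames/s for the 57-byte message at 10 Hz
print(f"{bus_load(57, 10):.1%}")      # -> 1.2%, i.e. on the order of 1% of a 1 Mbps bus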
At the same 10 Hz, the DS-015 GNSS service along with a separate yaw message requires 156 frames per second, thereby loading the bus by 2%:
The yaw is to be represented using uavcan.si.sample.angle.Scalar. The kinematic state message is intended only for publishers that are able to estimate the full kinematic state, which is not the case in this example.
Is DS-015 less efficient? Yes! It is about half as efficient as your optimized message, or about one percentage point more bus load, depending on how you squint. Should you care? I don’t think so. You would need to run just about 500 GNSS receivers on the bus to exhaust its throughput. Then you will be able to selectively disable subjects that your application doesn’t require to conserve more bandwidth (this is built into UAVCAN v1).
If you understand the benefits of service-oriented design in general (assuming idealized settings detached from our constraints), you might then see how this service definition is superior to your alternative, while having negligible cost in terms of bandwidth. I should, however, stop making references to the Guide, where this point is explained sufficiently well.
I should also address your note about double timestamping in reg.drone.physics.time.TAI64VarTs. In robotics, it is commonly required to map various sensor feeds, data transfers, and events onto a shared time system — this enables complex distributed activities. In UAVCAN, we call it “synchronized time”. Said time may be arbitrarily chosen as long as all network participants agree about it. In PX4-based systems (maybe this is also true for ArduPilot?), this time is usually the autopilot’s own monotonic clock. In more complex systems like ROS-based ones, it is usually the wall clock. Hence, this message represents the GNSS time in terms of the local distributed system’s time, which is actually a rather common design choice.
Timestamping of all sensor feeds also allows you to address transport latency variations, since each sample from a time-sensitive sensor feed comes with an explicit timestamp that is invariant to the transmission latency.
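As a trivial illustration (hypothetical names, assuming the network-wide synchronized clock described above), the consumer can recover the age of a sample regardless of how long the transfer spent in queues:

# Sketch: with a synchronized timestamp embedded in the sample, the measurement
# age seen by the consumer does not depend on the transport delay.
def sample_age(sample_timestamp_us: int, synchronized_now_us: int) -> float:
    # Age of the measurement, in seconds, at the moment of processing.
    return (synchronized_now_us - sample_timestamp_us) / 1e6

# The transfer may have spent 2 ms or 20 ms on the bus; the result is the same
# because both values are expressed in the shared time base.
print(sample_age(1_000_000, 1_050_000))   # -> 0.05 s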
Regarding the extra timestamp in reg.drone.physics.kinematics.translation.Velocity3Var: this is a defect. The nested type should have been uavcan.si.unit.velocity.Vector3 rather than uavcan.si.sample.velocity.Vector3. @coder_kalyan has already volunteered to fix this, thanks Kalyan.
As for the excessive variance states, you simply counted them incorrectly. This is understandable because crawling through the many files in the DS-015 namespace is unergonomic at best. The good news is that @bbworld1 is working to improve this experience (at the time of writing this, data type sizes reported by this tool may be a bit nonsensical, so beware):
https://bbworld1.gitlab.io/uavcan-documentation-example/reg/Namespace.html
It is hard not to notice that your posts are getting a bit agitated. I can relate. But do you not see how a hasty dismissal may have long-lasting negative consequences on the entire ecosystem? People like @proficnc, @joshwelsh, and other prominent members of the community look up to you to choose the correct direction for ArduPilot, and, by extension, for the entire world of light unmanned systems for a decade to come. We don’t want this conversation to end up in any irresponsible decisions being made, so let us please find a way to communicate more sensibly.
I don’t want to imply that the definitions we have are perfect and you are just reading them wrong. Sorry if it came out this way. I think they are, in fact, far from perfect (which is why the version numbers are v0.1, not v1.0), but the underlying principles are worth building upon.
Should we craft up and explore a GNSS demo along with the airspeed one?
Questions
Lastly, I should run through the following questions that appear to warrant special attention.
I implied no such thing. Sorry if I was unclear.
I agree this is useful in many scenarios, but the degree to which you can make the system self-configurable is obviously limited. By way of example, your existing v0 implementation is not fully auto-configurable either, otherwise, there would be no bits like this:
What I am actually suggesting is that we build the implementation gradually. We start with an MVP that takes a bit of dedication to set up correctly. Then we can apply autoconfiguration where necessary to improve the user experience. Said autoconfiguration does not need to require active collaboration from simple nodes, if you read the thread I linked.
Observe that the main product of the GNSS service is the set of kinematic states published over separate subjects. These subjects are completely abstracted from the means of estimating the vehicle’s pose/position. Whether it is UWB, VO, or any related technology, the output is representable using the same basic model.
I approve of your intent to move the conversation onto a more constructive plane. Although before we propose any changes, we should first identify how exactly @tridge’s business requirements differ, and why. For example, it is not clear why the iTOW is actually necessary and how it is compatible with other GNSSs out there aside from GPS; I suspect another case of an XY problem, but maybe we don’t have the full information yet (in which case I invite Andrew to share it).
Hey all,
Thanks @JacobCrabill for pushing for a more constructive conversation. The only way we will ever actually improve the standard is with constructive criticism and real world examples, not heated and pointless arguments.
Regarding the extra timestamp in reg.drone.physics.kinematics.translation.Velocity3Var: this is a defect. The nested type should have been uavcan.si.unit.velocity.Vector3 rather than uavcan.si.sample.velocity.Vector3. @coder_kalyan has already volunteered to fix this, thanks Kalyan.
No problem! I’ll fix this soon, and we can take another look at the other service types to make sure there are no other defects of a similar type. @tridge If you have any other specific examples of defects like this, please let me know and I’ll be happy to fix them.
Should we craft up and explore a GNSS demo along with the airspeed one?
This is a good time to mention that I am (in my spare time) developing an independent firmware for the CUAV NEO V3 Pro CAN GPS with UAVCAN v1. The board has a fast enough processor (F4) to do a fair bit of calculation, and a standard UAV GPS puck sensor set (GNSS, magnetometer, barometer, arming switch) but no IMU, so an EKF can’t/shouldn’t be directly used. Now, I am aware that this may be pointless in the long run, as the infrastructure built up already by AP_Periph and PX4 cannode is quite extensive and hard to beat. However, while I embarked on this with the goal of learning UAVCAN, I realize that this may be a good opportunity; an open and independently developed example will demonstrate a GNSS CAN node in a real world scenario (not just a Linux socketcan demonstration with fake data) as well as allow us to verify that the message set is what we want and make modifications as necessary. I will post a link to this demo once it starts to take shape. Perhaps as importantly, I am also looking to implement support in PX4 on the autopilot side to make sure the data published is sufficiently usable to actually fly a drone.
@pavel.kirienko and I have also been discussing a similar port of Sapog (an independent open source brushless ESC firmware) to UAVCANv1, which will be a good opportunity to stress test the ESC/actuator services. I am especially interested in testing the real world performance considerations for the ESC service, which unlike the low rate GNSS service, causes considerable load on the bus. (The hardware is not FD capable).
What I am actually suggesting is that we build the implementation gradually. We start with an MVP that takes a bit of dedication to set up correctly. Then we can apply autoconfiguration where necessary to improve the user experience. Said autoconfiguration does not need to require active collaboration from simple nodes, if you read the thread I linked.
I agree; let’s focus on an MVP that is robust and otherwise meets criteria, and then we can slowly improve the pnp experience without compromising on other design goals.
Thank you, that is a nice demo. I’ve run it on Linux and it highlights some issues nicely.
The first issue I hit was having to remove the old uavcan python module to run any of the v1 demos. This may seem trivial, but it really isn’t. Remember how I’ve harped on so many times about the critical importance of v0/v1 coexistence? If the basic tools can’t even be installed at the same time then we have no hope of v0 and v1 coexisting. Coexistence should be a fundamental aim at all levels. If we don’t fix this then it will become a major impediment to adoption of v1.
The next thing the demo highlights is the cost of the separate subjects for things that really should be one subject. Having temperature and differential pressure as separate subjects costs in several ways:
- it doubles the bandwidth cost, as it has to be two frames for something that easily fits in one frame (assuming bxCAN 8 byte frames)
- it makes for more user confusion and opportunity for mis-configuration, as they need to configure two subject IDs. We will end up with users that have two airspeed sensors lumping the temperature of one with the pressure of another.
- it clutters user interfaces with double the subjects
- it doubles the sensor status costs, with the periodic announcements, which are very large
On that last point, I’d like to understand what is going on in these packets in the demo:
(1620509330.669797) vcan0 TX B - 1C7D567D [64] 0A 00 00 00 01 04 55 1D 56 1D 64 00 65 00 02 00 00 00 01 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 A7
(1620509330.669814) vcan0 TX B - 1C7D567D [64] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 07
(1620509330.669831) vcan0 TX B - 1C7D567D [48] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 00 40 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 11 F2 67
(1620509330.679558) vcan0 TX B - 0C60647D [5] C5 B8 20 4D E0 '.. M.'
(1620509330.680321) vcan0 TX B - 1C7D567F [64] 0A 00 00 00 01 04 55 1D 56 1D E5 1F E6 1F 01 00 00 00 02 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 A8
(1620509330.682447) vcan0 TX B - 1C7D567F [64] 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08
(1620509330.682752) vcan0 TX B - 1C7D567F [32] 00 00 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 DC 3D 68
Once a second we’re paying for 336 bytes of mostly zeros. I presume this is the subject and node announcements? Can I use yakut to parse these to see exactly what we’re getting for such a large cost?
I know I am emphasizing bandwidth a lot, and you don’t seem to see it as important. It really does matter. We really are running out of bandwidth on v0 already. The v1 protocol is setting itself up for death by a thousand cuts on bandwidth. The fundamental decisions in v1 are doubling (and in many cases much worse) the cost of encoding the data we need.
The demo and the discussion we’re having also highlights very clearly that we don’t seem to have the tools we need to keep bandwidth under control. It takes far too much effort to work out what the bandwidth cost of a message set is.
Distribution is great when applied intelligently. I’ve worked on distributed systems since the 90s, and I have no problem with distribution of services when done well. What I do have a problem with is applying it in what seems to me to be a dogmatic manner when the distribution of the service in question runs counter to the other important engineering goals for the project. That is what happened with our debate on CAS for airspeed sensors. Distribution of the maths of airspeed calibration hurts the system and it should not have taken a big debate to see that.
ROS is great, but that doesn’t mean mimicking it in UAVCAN is good. It focuses on quite a different domain.
Where ROS shines is:
- high level, low update rate applications (drive a rover, SLAM algorithms etc)
- academic environments where the end user is highly technical
- relatively high bandwidth environments
You don’t see ROS being used to directly control the ESCs on a consumer quadcopter. I’m sure someone could point me at an example of where someone has done this, but I’m sure it won’t produce a good result as the architecture of ROS is poor at time critical operations like this.
In UAVCAN we do care about time critical, bandwidth sensitive applications. A multirotor is a very good example.
That doesn’t mean we can’t build a great ecosystem around UAVCAN. In fact, we have built a great ecosystem around UAVCAN already, using v0. There are flaws in it, just like there are flaws in every system, but it did become a widespread, robust CAN ecosystem. You should be very proud of what you achieved. I just don’t want those successes to be lost with v1 because the fundamental building blocks and the way they are applied are inappropriate to the task.
I completely agree with this, but I think that the execution on this has not been good.
I’m a software engineer, and I think the application of the design philosophy that I’ve seen so far in UAVCAN v1 is not good. It neglects some really key principles:
- clarity: it should not take an hour to work out what the actual network cost of a message structure will be. That lack of clarity is what made the GNSS DS-015 example so bad.
- user experience: the proliferation of separately configurable subjects for what in the user’s mind is one logical piece makes for a poor experience
- migration path: we’re not bringing v1 into a vacuum. v1 will only succeed if it makes the process of coexisting with v0 very easy. Look at how I managed the migration from mavlink 0.9 to 1.0 to 2.0 to see something that is largely seamless and easy for users.
and there is a major problem. The principal way it would be used is as a sensor network. There are other use cases I’m sure, but a sensor network is going to be 99% of the use cases. So it must be a darn good sensor network. It isn’t.
That is not sufficient. With current v0 and a simple vehicle we do have plenty of bandwidth, but with a more complex vehicle we don’t. Setting the bar so low as to find it acceptable as long as a simple vehicle fits in 1Mbps results in a design that does not scale well to more complex vehicles.
it sure is how engineering works. Efficient use of resources is a central tenet of professional engineering. In my experience it tends to be hobby projects that discard that principle. A huge number of professional engineers spend their lives optimising for the real world constraints of the field they work in. Think about aerodynamics - hobbyists throw together a bit of foam and balsa and are happy when they can fly around a park for a few minutes. Professional aerodynamic engineers spend millions on CFD, wind-tunnel tests, advanced materials, all to make a bit more efficient use of the environment the vehicle is in.
Good network protocol design does care about bandwidth. I’m not talking about the java coders of the world making banking apps - those tend to be bloated monsters and they just throw a few more servers at it. I’m talking about professional design of a network meant to be used for realtime control. That is the field we are in, and we must think in those terms.
and yet the same could be said of pretty much every v0 and v1 message, but I can assure you that real world complex vehicles are running out of bandwidth already. By dismissing those concerns you are precluding the application of v1 to the most interesting professional applications.
no thank you. uavcan.si.sample.angle.Scalar costs us in several ways:
- it includes yet another useless timestamp
- it is 11 bytes, so 2 frames, when it should be included in the base GNSS message
- it doesn’t give us the yaw accuracy, so we’d need another two frames to get that
yet that isn’t what the DS-015 spec says - it says we use the kinematic state message for systems that do multi-antenna RTK, which is exactly the case here.
I very much care. I don’t want to be explaining to the next company building a complex vehicle that they can’t achieve what they want because some vague philosophy meant we have to lose half their bandwidth, especially when it is actually worse than a 2x cost.
No, it is not superior. It is worse in so many ways:
- the bandwidth, which matters
- the user complexity of multiple subjects, which matters. It is not negligible.
- the disassociation of data that should be associated in time. We want to fuse yaw at the same time as fusing velocity and position. The yaw is generated on the same GNSS time horizons as the velocity and position
- the use of so many useless timestamps
- the massive proliferation of covariances, which are ill-defined and massively inefficient
- the separation of status information from the data it is logically associated with
It is a fundamentally poor design.
We use a time jitter removal system that does not require synchronization. Each device can operate within its own time domain and yet the recipient can correct any incoming timestamp into its own time domain with zero network cost.
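For readers unfamiliar with the technique, here is a minimal sketch of one way such a scheme can work (simple offset tracking against the sender’s free-running clock; hypothetical code, not the actual ArduPilot implementation, and clock-drift handling is omitted):

class JitterCorrector:
    # Map a remote timestamp into local time without any clock sync on the wire.
    # Track the smallest observed (local_arrival - remote_timestamp) offset; the
    # minimum corresponds to the least-delayed transfer, so later samples can be
    # corrected for queuing jitter at zero network cost.
    def __init__(self):
        self.min_offset = None

    def correct(self, remote_ts: float, local_arrival: float) -> float:
        offset = local_arrival - remote_ts
        if self.min_offset is None or offset < self.min_offset:
            self.min_offset = offset
        return remote_ts + self.min_offset   # estimated local time of the sample

jc = JitterCorrector()
print(jc.correct(100.000, 205.004))   # first sample defines the offset
print(jc.correct(100.200, 205.230))   # later queuing jitter is removed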
yep, I’m rather familiar with timestamping as I wrote the timing correction code for ArduPilot. I’m not arguing for removing all timestamps. I’m saying we only need 1 timestamp for the complete GNSS message set. If internally PX4 wants to replicate that into dozens of timestamps they can do that. We don’t need them on the wire.
If that is the only defect you see from my analysis of the GNSS message set then you really don’t understand. It is all defects from top to bottom.
Please point out specifically where I miscounted. I tried to be pretty careful. It is 6+6+21+21 if we include the yaw (which as I pointed out, is following the spec as written). That is 54 - ok, so I did miscount, I said 53 when it is 54.
We should stop using covariances completely in these sensor messages. They just cause confusion and are a poor representation of what we’ve got available from the sensors. We should send data that has a clear representation of what real sensors can provide. For GNSS that is accuracy numbers as listed in my small example.
My posts are indeed agitated. That is a result of how much of a hole v1 has dug itself into with its design goals, and you not seeming to understand just how deep in a pit you are.
When the v1 design goals first came out I thought “ok, a bit odd, but let’s see how this pans out”. Now I see how it pans out I am horrified. The realization of the design goals has resulted in a very poor system. That needs to be fixed for v1 to have a chance.
We support all of those technologies in ArduPilot, but as the underlying physical mechanisms are very different we should not represent them with the one message. The types of errors that a VO system has are very different from those of a GNSS. The handling of origins is very different. An autopilot really does need to treat them quite differently if it wants to be robust.
In a vicon lab it is fine to represent vision data as a GNSS service as a quick hack to get something going. The environment is highly constrained, the vision system is of extremely high quality, and the consequences of error are small (a quad hits the nets at the side). In a professional UAS setting the two do need to be separated as they are fundamentally different.
A GNSS (not just GPS) operates on a discrete time interval with a delayed time horizon. Internally a GNSS does signal tracking on a very fine time scale, but the output is on discrete time steps. You can monitor that discrete time internal process using the iTOW. By doing the jitter correction for transport delays against the iTOW we eliminate not just the CAN transport delays but also the transport delays within the CAN sensor caused by UART timing jitter and the jitter from the internal processing load inside the sensor (eg. inside a u-blox module).
With something like a moving baseline RTK setup where two GNSS are cooperating, the iTOW is what links the times between the units. The autopilot being able to see the iTOW from both units allows it to properly handle both sets of data.
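As a toy illustration (made-up iTOW values, not real driver code), pairing data from the two units by iTOW is straightforward once both report it:

def match_epochs(base_samples, rover_samples):
    # Pair base and rover measurements that belong to the same GNSS epoch.
    rover_by_itow = {s["itow_ms"]: s for s in rover_samples}
    return [(b, rover_by_itow[b["itow_ms"]])
            for b in base_samples if b["itow_ms"] in rover_by_itow]

base  = [{"itow_ms": 432000200, "yaw": None}, {"itow_ms": 432000400, "yaw": None}]
rover = [{"itow_ms": 432000400, "yaw": 1.57}]
print(match_epochs(base, rover))    # only the 432000400 epoch lines up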
in that example the defaults are correct for one UAVCAN rangefinder. So for most people no config is needed. A good principle of design in software engineering is to make the common cases easy and the more complex cases possible.
My language is strong because we’re close to the point of abandoning plans to do v1 at all. I hoped that illustrating how the rigid design principles of v1, when applied to real sensors, produce such a poor result would lead to a rethink, so that we end up with something that is fit for purpose.
and that is one of the problems. The v1 design principles encourage this sort of poor design because it is actually pretty difficult to see the relationship between the message design and the wire representation.
thanks - it is missing the accuracy numbers and missing the yaw, but closer. I also think it should not be split up into separate subjects. We should get all the GNSS data in one packet. That removes a bunch of overhead, and prevents the recipient having to match timestamps to align the data.
My structure also wasted a bit of space as it had redundant ground_course and ground_speed, which is a historical thing from the early days of ArduPilot and should not be in the UAVCAN packet.
Sending those optional bytes as a separate message is not a good idea. All reasonable consumers of this data will need it, and the overhead in terms of code complexity, framing overheads, and user confusion with the proliferation of published subjects means it is so much better to include it in the packet.
I think that is pretty clearly a mistake in this case.
Proposal for CANDevices Repository
I’d like to make a more concrete proposal for how we can move forward with sensor and device oriented messages that specifically aim to cover the very important use case of a set of sensors on the bus while being robust and efficient.
I propose creating a new CANDevices git repo under github.com/ArduPilot which we will populate with UAVCAN v1 messages for the key devices (at least GNSS, mag, baro, airspeed, rangefinder and likely a few more).
With Pavel’s permission we’d also like a section on this UAVCAN forum to be created specifically for discussion of this CANDevices message set. That gives a common location for discussion between vendors and autopilot stack developers.
Changes to CANDevices would be by pull requests against the CANDevices repo.
Idiom of CANDevices
The messages in CANDevices would be designed to be a regulated message set, and meant to be brought in as a git submodule. The design of the messages would follow the efficient/clear/robust approach I’ve been advocating above. It would not follow the v1 idiom of separate topics for a single logical device (eg. barometer would be one topic containing both pressure and temperature, not two topics). This would be explained in the top level README.md for the repo.
The AP_Periph firmware would implement CANDevices, plus the core UAVCAN v1 message set, along with v0. For boards with sufficient flash selection of v0/v1 will be by parameter allowing for only v0, only v1 or both.
The CANDevices repo will also offer a nice chance for people to point out where I screw up messages for v1, just like I’ve been (very) critical of the DS-015 message set. As I’ve never created a v1 message before it seems likely I’ll make mistakes and I hope we can sort those out on the CANDevices section of the UAVCAN forum.
A Device Class Id
I am also considering having a common uint8 device class ID as the first byte in every message in CANDevices. This is to address a concern I have about the fragility of the UAVCAN v1 subject-id configuration system. There would be a text file in the root of CANDevices where these class IDs are allocated by pull request. The IDs would be high level devices classes, such as “GNSS”, “Barometer”, “Magnetometer”.
The aim of this ID is to allow for mis-configuration detection at the point the data is entering the consuming node. So for example in the ArduPilot code structure, the AP_Compass_UAVCAN.cpp driver would check this field when it gets a message, and if it isn’t the right class ID then it would signal a mis-configuration error, prevent arming, and notify the user of the error.
The motivation for this ID is this sort of scenario:
- a user is out at the field about to do an important flight. One of their UAVCAN v1 barometers is misbehaving, and it is causing a pre-arm error.
- the user decides to swap it out, so pulls the misbehaving device off the vehicle and replaces it with one from their box of spares.
- the spare was previously used on a different vehicle, perhaps one running a different firmware version; it may even have been on a copter while they are currently configuring a plane. On that previous vehicle the barometer had been allocated a PnP subject ID, which it had stored in its parameters.
- when the spare baro is plugged into the new vehicle it has no idea that it has been sitting in the spares box for 6 months. It has no realtime clock, no battery, so no way of knowing that something might be wrong. It thus immediately takes its subject ID configuration and starts publishing barometer data.
- the subject ID it inherited from the other vehicle happens to be one allocated for an airspeed sensor on this new vehicle.
- as V1 has no indicator in the wire format of what message format the data is, and doesn’t even have a structural checksum like mavlink, the baro data effectively gets “network cast” into differential pressure data by the recipient.
- when the user turns on the vehicle it starts getting garbage for airspeed. The EKF might or might not be able to detect this, especially if it is a backup sensor.
Is there any existing mechanism in V1 that protects against scenarios like this? If so, can someone explain what it is?
Adding a 1 byte sensor class ID at the front of all CANDevices messages seems like a cheap way to auto-detect this type of config error for critical sensors, unless there is some mechanism for robustly detecting mis-configuration that I’ve missed. Note that this 8 bit ID is just another field in a v1 message as far as UAVCAN v1 is concerned. The meaning is given by the conventions of the message set and the table committed into the repo.
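To illustrate the idea (hypothetical class-ID values and layout; the real allocation table would live in the CANDevices repo), the consumer-side check might look roughly like this:

# Sketch of the proposed consumer-side class-ID check. IDs are hypothetical.
DEVICE_CLASS_GNSS = 1
DEVICE_CLASS_BAROMETER = 2
DEVICE_CLASS_AIRSPEED = 3

def handle_airspeed_subject(payload: bytes) -> None:
    # First byte of every CANDevices message would be the device class ID.
    if payload[0] != DEVICE_CLASS_AIRSPEED:
        # e.g. a spare barometer still configured with this subject ID:
        # flag a mis-configuration, refuse to arm, notify the user.
        raise ValueError(f"subject mis-configuration: got class {payload[0]}")
    decode_airspeed(payload[1:])      # hypothetical decoder for the real fields

def decode_airspeed(body: bytes) -> None:
    print("airspeed message of", len(body), "bytes")

handle_airspeed_subject(bytes([DEVICE_CLASS_AIRSPEED, 0x00, 0x40]))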
@tridge Cubepilot / Hex / ProfiCNC are in support of your proposal.
We make hardware that is used by both the Ardupilot and PX4 communities.
As the longest continuously running hardware project in the PX4 / Ardupilot community, the Pixhawk/ Cube hardware has been designed with CAN at its heart and we have dedicated considerable expense and effort in the support of UAVCAN.
As a significant stakeholder in the UAVCAN user base, we spoke up at the very beginning of this V1 push, and we were ignored by those that wanted to fragment the community by dumping support for the original UAVCAN.
Calling it experimental well after release, dropping its designation from simply UAVCAN to 0.9, and now v0 has eroded the industry’s trust in this standard.
We have bet our company future on UAVCAN, @pavel.kirienko, stop letting us down! You have one of the wisest people in this whole industry giving you absolutely brilliant advice, please bury your pride and listen to @tridge.
What Tridge is offering is a massive olive branch.
Thanks for posting your concerns, @tridge, and thanks for clarifying your good intentions. They are very well received. I want to start by stating that I agree with you on most of the issues you have raised on this thread, and I think we should definitely find a way to resolve any technical deficiencies in the standards (UAVCAN v1 & DS-015)
I’m not going to weigh in on any of the technical discussions. I can say that DS-015’s original goal was to define a Drone message set for the UAVCANv1 implementation in the same spirit as the v0 messages, but with some learnings applied, and I think that goal hasn’t changed, nor has our willingness to fulfill its mission.
As @dagar pointed out above, we are not that far from being fully aligned, and I propose we work together to define the message set that works best for the drone industry.
Given the history behind APM and PX4 and how both communities still have members that continue to exacerbate the situation, we are open to discussing how best to structure governance, and where and in which form, so everybody feels that we are on neutral ground.
thanks Ramon. I’ll wait till Pavel has had a chance to consider my CANDevices proposal before proceeding.
I also think we need to urgently address a few associated issues:
- re-releasing pyuavcan for v0 as a “pyuavcan_v0” module in pip so it can be installed alongside the v1 variant (alternatively the v1 variant could be renamed to pyuavcan_v1).
- re-working the old uavcan v0 GUI to use the result of step 1 (should be trivial)
- getting a GUI tool going for mixed v0/v1 networks. I think the fastest way forward on this is likely to be to build upon the old uavcan_gui_tool, but maybe those working on yukon could comment on the likely timeline for a mixed v0/v1 UI based on that work?
@tridge CUAV supports your suggestions, and thank you for your efforts to prevent UAVCAN from generating more forks. As a hardware manufacturer, we are very troubled by the divergence in standards. There are already many UAVCAN devices on the market, yet UAVCAN is still discussing protocol compatibility (v1 and v0). The standard has fallen far behind the products, and v1 and v0 are not compatible, which will force hardware manufacturers to abandon UAVCAN v1 and switch to other CAN protocols. Although our hardware supports FDCAN, we should still avoid occupying too much CAN bandwidth, because using CAN to replace traditional interfaces is the trend, including CAN ESCs, CAN airspeed sensors, and CAN smart batteries.
Thank you everyone for the criticism. Based on the extensive feedback accumulated here, we have resolved to cancel the DS-015 project and start a new application-layer standard on top of UAVCAN v1.0 under the exclusive governance of the UAVCAN Consortium. The Consortium is ideally positioned to provide a neutral ground for its members to collaborate on the standard.
I am going to formulate a more detailed proposal and share it with the key stakeholders privately to secure initial alignment. A public announcement will follow.
I’ve been following the conversation here over the last few days and I can see that there is a very high level of energy and commitment to create a standard - otherwise the debate wouldn’t be this intense. I personally believe that simplicity is key for adoption (and for keeping things maintainable) and so having messages that you can wrap your head around within seconds instead of minutes is essential.
Back when we started DS-015 the UAVCAN consortium didn’t exist formally and in today’s world it seems to be the kind of neutral ground that can enable wider collaboration, so I wholeheartedly believe that creating a fresh, new effort here would be a good thing for the industry.
Completely resetting the effort seems fair as it originally started with message definitions that were not too far away from what v0 is carrying but with some cleanups - and coming back to this spirit seems like a consensus everyone could potentially agree on.
This topic has been closed, but @pavel.kirienko has suggested I post a final message.
We need a replacement for DS-015, one that meets the requirements for an efficient, reliable message set for sensors and actuators for drones. That new message set should be created on an open forum. I’ve suggested a topic in this forum. Network protocols created in the open tend to be technically better than those created behind closed doors.
We also need to address a few “elephants in the room”.
The first is the fact that for years now the official position on the UAVCAN website has been that new designs should use v1. That should not be the case until v1 is actually workable. The key decision point for a new release of a protocol or a software package should be “what would give the best result for a new user?”. Pointing a new vendor at v1 now would do them a great disservice.
The second big thing is the lack of focus on the migration path from v0 to v1. The fact that the core tools for v0 and v1 can’t even be installed at the same time makes this glaringly obvious. We can’t build an ecosystem which gives users a clear migration path if developers can’t do any monitoring and analysis of a mixed bus.
Third, we need to seriously consider the problem of protocol fragility. The v1 protocol embraces lack of inherent type safety as a virtue. The little example I gave above, of a situation where we end up with a cast from one packet type to another and a misinterpretation of data, is a massive problem. This needs to be addressed in a truly robust fashion. I know v1 is trying to be ambitious in its goals, and ambition is fine if it is backed up with rigorous answers to basic properties like robustness. The suggestion I made for a class ID in CANDevices is a band-aid over a major problem in v1, but at least it gives us something to counter the inherent fragility of v1.
Finally, I was very hard on DS-015 above. If individuals took offence at the approach I took then I apologise for that. Please remember that this is part of a much longer conversation where the problems with DS-015 and v1 have been raised many times, and it just wasn’t cutting through to get a good technical result.
We can create a great network protocol to replace v0, but we do need to keep the focus on the key requirements for any good protocol:
- robustness
- clarity
- efficiency
The DS-015 standard lacked all 3. We’re vastly better off without it.