The resource utilization issues you are describing seem to be rooted in the fact that your networks are highly dynamic.
Oh yes, definitely. I’m very aware I’m porting this to an application domain which is a little bit ‘next door’ to what UAVCAN was originally intended for, but I’m hoping my experiences can help make the spec more widely used in those domains. I see a lot of potential there, and the work on alternative transport protocols is an indication that UAVCAN is trying to expand in those directions.
I’ve had extensive experience writing microcontroller firmware over the years for PICs, Atmels (Arduinos), and now ESP, and a constant issue has been finding ways to link them together. Protocols are either so heavy-weight that they won’t fit the memory constraints, or so light-weight that they fail to provide enough capability (e.g. classic MODBUS doesn’t do strings or floating point numbers). IoT is all the rage now, and while RESTful services are easier to implement on modern chips, their Achilles heel is the need for centralized servers.
In v0 we had a provision for dynamically reconfigurable networks where we allowed transfer-ID counters to be dropped by timeout to reclaim the memory back. If you consider this measure sufficient to support your case, we could consider re-introducing that provision back into v1 by lowering the requirement level of the above text to “ destruction of transfer-ID counter states is not recommended ”.
The last thing I want to do is cause changes to the spec. which either make it more complicated, or which degrade the original focus on high-reliability intra-vehicle networks. What I suspect is happening here is a conflict between two core design goals of UAVCAN: high reliability for hard real-time systems, and minimal shared context.
The Transfer IDs are the ‘battleground’ between those two goals: they are the minimum shared state required to create high reliability, and in static networks with fixed numbers of nodes and transports they do that with minimal overhead. All good.
But yes, in more dynamic implementations with arbitrary numbers of peers and alternate transports, that minimum shared state starts getting quite large. One design goal starts losing the conflict with the other.
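To make the memory trade-off concrete, here is a rough sketch of the v0-style provision mentioned above, where idle transfer-ID counters are dropped by timeout. Everything here is my own illustration: `TransferIdCounters`, `idle_timeout` and the string session keys are hypothetical names, not part of the spec or of any UAVCAN library, and the transfer-ID modulus is deliberately not modelled.

```python
from typing import Dict, Hashable, Tuple


class TransferIdCounters:
    """Outgoing transfer-ID counters that are forgotten after an idle timeout.

    Illustrative only: the class and its parameters are assumptions,
    and transfer-ID wrap-around (the modulus) is ignored.
    """

    def __init__(self, idle_timeout: float) -> None:
        self._idle_timeout = idle_timeout
        # session key -> (next transfer-ID, time the counter was last used)
        self._state: Dict[Hashable, Tuple[int, float]] = {}

    def evict(self, now: float) -> None:
        """Drop every counter that has been idle for longer than the timeout."""
        stale = [
            session
            for session, (_, last_used) in self._state.items()
            if now - last_used > self._idle_timeout
        ]
        for session in stale:
            del self._state[session]

    def next_id(self, session: Hashable, now: float) -> int:
        """Return the transfer-ID to use now and advance the counter."""
        self.evict(now)
        transfer_id, _ = self._state.get(session, (0, now))
        self._state[session] = (transfer_id + 1, now)
        return transfer_id

    def __len__(self) -> int:
        return len(self._state)


counters = TransferIdCounters(idle_timeout=2.0)
counters.next_id("peer-a", now=0.0)   # new counter, starts at 0
counters.next_id("peer-b", now=1.5)   # second counter
counters.next_id("peer-b", now=3.2)   # "peer-a" has been idle for 3.2 s and is evicted
```

The cost is visible in the last line: once a counter is evicted, the next transfer to that peer starts again from zero, which is exactly the reliability that is being traded away for memory.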
Worse, the moment something becomes optional in the spec (or “not recommended”) it begins causing problems for implementors. They’ll ask why it’s optional, and in what cases. That’s bad; confusion must be avoided at all costs.
So how do we resolve those conflicting goals? How do we keep FAA CAST-16 reliability, while potentially enabling low-context alternate transports for other domains (especially configuration and ‘debug’ monitoring) while preventing confusion?
Other specs resolve this by having ‘Profiles’. For example, the MPEG/H.264 standards specify a set of features, but also define which features should be enabled in certain circumstances. Limits on block sizes allow ‘hardware’ ASIC decoders in Blu-ray players to guarantee they will always be able to decode a baseline stream (since once sold, those players last for decades in people’s homes and cannot be upgraded), while other profiles are more flexible and advanced because they are intended for modern professional-grade camera gear, which only needs to communicate with equally modern editing software.
So they have a “Baseline Profile” used in videoconferencing hardware, a “High Profile” for high-definition television broadcasts, and “High 4:4:4 Predictive Profile” for pro camera gear. (as well as others) In effect the same algorithms are used in each profile, but with changes to datatype sizes and limits on the amount of processing power and memory available to the codec.
I would advise thinking along the same lines, and defining which features of UAVCAN form a “High Reliability Profile” (basically the entire current feature set) which gives the safety-critical guarantees you’ve worked so hard for, but also a “Low State Profile” which lists what parts of the protocol become optional when optimizing for that domain.
Any place in the spec where you say must or shall or should potentially gets marked with the profile that it’s for.
Ideally the profiles are interoperable in at least one direction… the same way an “Extended Profile” H.264 decoder can parse “Constrained Baseline Profile” streams but not vice-versa. But it’s clear to implementors that if they don’t conform to some optional part of the spec, then their implementation sits in a different class, even if they can exchange compatible frames.
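As a sketch of what I mean by marking requirements with a profile, something like the following would do. The requirement names and their assignment to profiles are entirely made up for illustration; deciding the real split is exactly the work the spec would have to do.

```python
from enum import IntEnum
from typing import Iterable, Set


class Profile(IntEnum):
    """Profiles ordered so that a higher value implies every lower one."""
    LOW_STATE = 1
    HIGH_RELIABILITY = 2


# Hypothetical requirement names, each tagged with the lowest profile
# at which it becomes mandatory. Not taken from the actual specification.
REQUIREMENTS = {
    "decode_single_frame_transfers": Profile.LOW_STATE,
    "keep_transfer_id_counters_forever": Profile.HIGH_RELIABILITY,
    "deduplicate_within_transfer_id_timeout": Profile.HIGH_RELIABILITY,
}


def mandatory_for(profile: Profile) -> Set[str]:
    """Requirements that an implementation claiming `profile` must meet."""
    return {name for name, level in REQUIREMENTS.items() if level <= profile}


def conforms(implemented: Iterable[str], profile: Profile) -> bool:
    """True if the implemented features cover everything the profile demands."""
    return mandatory_for(profile) <= set(implemented)


wifi_node = ["decode_single_frame_transfers"]
print(conforms(wifi_node, Profile.LOW_STATE))
print(conforms(wifi_node, Profile.HIGH_RELIABILITY))
```

Because the profiles are ordered, anything that conforms to the higher profile automatically conforms to the lower one, which gives the one-directional interoperability described above.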
This also gives you options down the track if you wish to extend the spec further, such as adding extra timing constraints that might differentiate a “Hard Real-Time” profile (perhaps implemented in an FPGA) from libraries such as pyuavcan which will always be limited by OS delays.
The issue I’m having is that I have to keep a large amount of state to satisfy a Profile that I’m never going to reach regardless. There’s no way a UDP transport over WiFi is ever going to meet FAA CAST-16 reliability standards.
So if I can’t reach that bar anyway, but still see huge advantages in using UAVCAN for its app-level features like SI datatypes and decentralized node discovery, then what other parts of the spec also become optional? Making my own choices on a feature-by-feature basis seems… unwise. And it has the potential to fragment the standard into confetti.
I am currently working on a UAVCAN bootloader for deeply embedded systems that supports UAVCAN/CAN alongside with UAVCAN/serial over UART or USB CDC ACM. The bootloader exposes an independent logical node instance per transport interface, so they do not form a redundant group. Hence, a request received over USB is responded to using USB only, on the assumption that the interfaces interconnect completely different networks (e.g., the CAN may be connected to the vehicular bus while the USB may interconnect only the local node and the technician’s laptop).
Yes, that is exactly the kind of use case I’m also looking at! (The TCP serial case is mostly a wireless version of the same.) That means you’ve also got a situation where you need to allocate lots of state to provide the alternate transport, so much that it might potentially interfere with the CAN functions.
If I understand the bootloader concept, you may have a situation where the USB interface is connected to a host network with an unknown set of logical nodes, and the unit has to emit a broad range of subjects and invoke services (if you’re using uavcan.file to retrieve the new firmware) from arbitrary remote units, possibly while sharing some state/services (such as statistics and registers) between the ‘logical interface nodes’ within the unit. I have this problem multiplied by a potentially arbitrary number of TCP/IP connections (practically limited to a dozen or so) if I choose to implement that transport.
Anything that can be done to reduce the overhead of the serial transport implicitly improves the reliability of the CAN transport. Paradoxically, degrading the reliability of one can improve the other. In this case, “The Perfect is the enemy of the Good” quite literally.
This is specified explicitly but the wording may be suboptimal. Observe, section 4.1.4 Transfer reception
For a given session specifier, a successfully reassembled transfer that is temporally separated from any other successfully reassembled transfer under the same session specifier by more than the transfer-ID timeout is considered unique regardless of its transfer-ID value.
I’ll admit I didn’t get the full implication of that at first, but I did read the bit which said:
4.1.4.1:
Transfer-ID timeout is a time interval that is 2 (two) seconds long. The semantics of this entity are explained below. Implementations are allowed to redefine this value provided that such redefinition is explicitly documented.
And said to myself “Well, in that case I’ll simply set it to zero for my implementation, document that, and then I won’t have to worry about it anymore.” Again, not nice and probably not what you wanted, but if you’re going to make things optional then some of us are going to take the easy way out.
Although in my defense I was thinking about the WiFi UDP and TCP/IP serial protocols, where the order is fairly strictly determined by the laws of physics (for the radio link) and by TCP itself. Plus I’m a big fan of idempotency.
The alternative is storing even more monotonic Transfer IDs per session for up to 2 seconds on what could be high-traffic links (10 Mbit/s over WiFi), and that could easily overflow my microcontrollers’ limited RAM.
Hmm… re-reading it again I still don’t see whether the protocol explicitly specifies what should happen if the transfers (not frames, but complete transfers) arrive out-of-sequence. The Transfer-ID timeout seems to de-duplicate identical transfer IDs within that interval, but non-sequential transfers seem to be totally allowed.
I suppose that implicitly means that out-of-order transfers are allowed and should be responded to in the order they arrive? And if they’re service requests it’s up to the requester to sort it out when the replies get back?
I guess that’s unavoidable. If a unit starts rebooting several times a second (eg from power brownouts) you have to accept transfers where the counter’s gone back to zero, although if it’s within the 2 second window they will be discarded.
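Here is how I currently read that rule, as a minimal sketch. It follows my interpretation above (only a repeat of the same transfer-ID inside the timeout counts as a duplicate, anything else is accepted in arrival order), so it may well be wrong about what the spec intends. `SessionReceiver` and its arguments are hypothetical names of my own.

```python
from typing import Dict, Hashable, Tuple


class SessionReceiver:
    """De-duplicates completed transfers per session (my reading, not the spec)."""

    def __init__(self, transfer_id_timeout: float = 2.0) -> None:
        self._timeout = transfer_id_timeout
        # session key -> (last accepted transfer-ID, its reception timestamp)
        self._last: Dict[Hashable, Tuple[int, float]] = {}

    def accept(self, session: Hashable, transfer_id: int, timestamp: float) -> bool:
        """Return True if the transfer should be delivered to the application."""
        previous = self._last.get(session)
        if previous is not None:
            last_id, last_timestamp = previous
            same_id = transfer_id == last_id
            within_timeout = timestamp - last_timestamp <= self._timeout
            if same_id and within_timeout:
                return False  # treated as a duplicate and dropped
        self._last[session] = (transfer_id, timestamp)
        return True


rx = SessionReceiver(transfer_id_timeout=2.0)
rx.accept("node42/subject7", 5, timestamp=0.0)  # accepted
rx.accept("node42/subject7", 5, timestamp=0.5)  # same ID within 2 s: dropped
rx.accept("node42/subject7", 3, timestamp=0.6)  # out of order, still accepted
rx.accept("node42/subject7", 0, timestamp=0.7)  # counter restarted: accepted
```

Even this minimal version needs one entry per session, which is where the RAM goes once the number of peers is unbounded. Constructing it with a timeout of zero effectively turns the check off whenever timestamps differ.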
Oh, on the topic of reception timestamps:
4.1.4.1:
Transport frame reception timestamp specifies the moment of time when the frame is received by a node. Transfer reception timestamp is the reception timestamp of the earliest received frame of the transfer.
In cases where a frame arrives in small fragments (say over serial links), would you prefer that timestamp to be sampled at the start of the frame or the end? I could set the timestamp to the start frame delimiter, the first actual frame byte, the last byte, or the end delimiter in cases where they’re measurably different. I’d probably pick the first frame byte, since frame delimiters can be ambiguous as to which one you’re getting.
ps: when I’m re/quoting you, what’s the bbcode to include your name header thingy? That’s quite nice.
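For what it’s worth, this is roughly how I’d implement the “first frame byte” option. It is only a sketch: the delimiter value is an assumption picked for the example, no escaping or payload decoding is done, and `FrameTimestamper` is a made-up name.

```python
from typing import List, Optional, Tuple

DELIMITER = 0x00  # assumed frame delimiter, chosen only for this example


class FrameTimestamper:
    """Splits a byte stream on DELIMITER and stamps each frame with the
    arrival time of its first payload byte (not of the delimiter)."""

    def __init__(self) -> None:
        self._buffer = bytearray()
        self._first_byte_timestamp: Optional[float] = None

    def feed(self, chunk: bytes, timestamp: float) -> List[Tuple[float, bytes]]:
        """Feed a fragment that arrived at `timestamp`; return completed frames."""
        frames: List[Tuple[float, bytes]] = []
        for byte in chunk:
            if byte == DELIMITER:
                # A delimiter closes the current frame; repeated or leading
                # delimiters with nothing buffered are simply skipped.
                if self._buffer:
                    frames.append((self._first_byte_timestamp, bytes(self._buffer)))
                self._buffer.clear()
                self._first_byte_timestamp = None
            else:
                if not self._buffer:
                    self._first_byte_timestamp = timestamp
                self._buffer.append(byte)
        return frames


parser = FrameTimestamper()
parser.feed(b"\x00\x01\x02", timestamp=10.0)          # frame starts, not complete yet
print(parser.feed(b"\x03\x00", timestamp=10.5))       # completed frame stamped 10.0
```

Because the timestamp is taken from the first payload byte, it doesn’t matter whether the preceding delimiter was the end of the previous frame or the start of this one.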