There have been reports that the current approach to timestamping is too wasteful, as every timestamp is represented by a large 56-bit unsigned integer. There have been suggestions to utilize a more compact form for the sake of lower transfer fragmentation, lower bus load, and lower latency.
The reason we use large 56-bit timestamps is that they provide an unambiguous, context-independent representation of a point in time. A 56-bit timestamp contains an immediately usable value that never expires and is always valid.
Let us discuss the advantages and disadvantages of smaller, overflowing, expirable timestamp representations. I would like to avoid discussing representations that offer reduced resolution (coarser than one microsecond), both for consistency and to avoid introducing new restrictions on application design (a message that relies on a compact timestamp may still require full microsecond resolution).
The standard 56-bit microsecond timestamp would overflow in 2284 years, which means virtually never.
A reduced 32-bit microsecond representation would overflow in about 1 hour and 12 minutes, while being 24 bits shorter and introducing an additional computational burden on the application.
Neighboring byte-aligned representations (namely, 24-bit and 40-bit) do not seem to be worth the effort: one of them would overflow too frequently (every ~16.8 s), and the other does not offer a significant improvement over the full timestamp (being only 16 bits shorter).
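The overflow periods quoted above are easy to verify with a quick sketch (plain arithmetic, no protocol-specific assumptions):

```python
# Overflow period of an N-bit microsecond counter.
def overflow_seconds(bits: int) -> float:
    return (1 << bits) / 1e6

print(overflow_seconds(56) / (365.25 * 24 * 3600))  # thousands of years
print(overflow_seconds(32) / 60)                    # a bit over an hour
print(overflow_seconds(24))                         # under 17 seconds
print(overflow_seconds(40) / (24 * 3600))           # roughly 12-13 days
```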
The following calculation does not show any improvement for 32-bit timestamps with CAN FD, as the freed up space would be used by padding bytes anyway:
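One way to run that calculation: the payload buckets (0–8, 12, 16, 20, 24, 32, 48, 64 bytes) are defined by the CAN FD DLC encoding, while the per-length check below is an illustrative sketch.

```python
# CAN FD payloads are padded up to the nearest valid DLC bucket, so shaving
# 3 bytes off a timestamp (56 -> 32 bits) often does not shrink the frame.
CANFD_PAYLOAD_SIZES = (0, 1, 2, 3, 4, 5, 6, 7, 8, 12, 16, 20, 24, 32, 48, 64)

def padded_length(payload_bytes: int) -> int:
    """Smallest valid CAN FD payload length that fits the given byte count."""
    return next(s for s in CANFD_PAYLOAD_SIZES if s >= payload_bytes)

# Payload lengths (beyond the 8-byte classic range) where removing 3 bytes
# does NOT reduce the padded frame size -- the saved bytes become padding.
wasted = [n for n in range(9, 65) if padded_length(n) == padded_length(n - 3)]
print(wasted)
```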
I think trying to have one single synchronized time for the whole bus makes it too complex and introduces much difficulty in implementation as well as potential failure modes, assuming the goal is just to timestamp events.
I can think of a couple of solutions:
Solution 1:
All timestamped messages become multi-frame messages (this is effectively true already). A field is added to the last frame which contains the interval between the occurrence of the event and the transmission of the first frame. 24-bit microseconds would be more than reasonable for this field; 32-bit microseconds would allow timestamping an event an hour in the past. A value of 0 would mean that timestamping is unsupported by the transmitter.
Pros:
Relatively simple to implement, and in most cases offers lower overhead on CAN than the existing solution.
Provides support for optional timestamping at the protocol level. The application doesn’t really have to implement anything (at least, nothing stateful).
Cons:
On CAN FD, most messages will be single-frame and the overhead from doing this could get nasty - it basically doubles the bus time required by timestamped messages.
Protocol change
Solution 2:
Perhaps we could base timestamps off of the most recently transmitted NodeStatus message? They can effectively be uniquely identified by source node id and uptime.
A timestamp then becomes:
Source node id of the referenced NodeStatus message.
Uptime of the referenced NodeStatus message.
Interval between the transmit time of the referenced NodeStatus message and the event being timestamped.
Then the overhead of a timestamp is dependent on the number of assumptions we are willing to make, for example:
Assumption #1: The referenced NodeStatus message has the same source node id as the timestamped message
Assumption #2: The transmitter can guarantee that the timestamped transfer will not be transmitted more than a few seconds after it is generated. I.e. it is guaranteed to reference one of the 32 most recent NodeStatus messages. (Note: most nodes will probably only keep track of the last 2 or 3 from a given node)
Assumption #3: We never want to timestamp an event more than about a minute away from the most recent NodeStatus message.
That gives us a 32-bit timestamp: 5 bits for the uptime of the referenced NodeStatus message and a 27-bit signed microsecond interval between the referenced NodeStatus message and the event being timestamped.
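A minimal sketch of packing and unpacking such a 32-bit value (the field names and layout are my own assumptions, chosen only to illustrate the bit budget):

```python
# 5 bits: uptime (mod 32) of the referenced NodeStatus message.
# 27 bits: signed microsecond offset from that message (about +/-67 s).
INTERVAL_BITS = 27
INTERVAL_MAX = (1 << (INTERVAL_BITS - 1)) - 1
INTERVAL_MIN = -(1 << (INTERVAL_BITS - 1))

def pack(uptime_sec: int, interval_usec: int) -> int:
    assert INTERVAL_MIN <= interval_usec <= INTERVAL_MAX
    mask = (1 << INTERVAL_BITS) - 1
    return ((uptime_sec & 0x1F) << INTERVAL_BITS) | (interval_usec & mask)

def unpack(value: int) -> tuple[int, int]:
    uptime_ref = value >> INTERVAL_BITS
    interval = value & ((1 << INTERVAL_BITS) - 1)
    if interval > INTERVAL_MAX:  # sign-extend the 27-bit field
        interval -= 1 << INTERVAL_BITS
    return uptime_ref, interval
```

The 27-bit signed field covers roughly plus or minus 67 seconds, which matches the "about a minute" bound in Assumption #3.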
Pros:
Arguably simpler to implement and offers lower overhead in most cases than the existing solution.
Cons:
More complex than solution 1.
Receivers of timestamped messages that care to decode the timestamp need to keep track of the most recent NodeStatus messages from transmitting nodes.
Vulnerable to priority issues - if NodeStatus messages can’t transmit because of a flooded bus, timestamped messages can’t transmit either. There would need to be some kind of priority inheritance to deal with this. This is a pain in the ass.
I really, really like the simplicity of solution 1 - it is a real shame that it will add so much overhead with CAN FD.
Also note that there’s inconsistency between the algorithm description and the pseudocode in the time sync docs: the description specifies that passive masters must synchronize with the active master, but the pseudocode does not reflect that.
Clever, although the assumption about the goal of the time sync mechanism is not perfectly valid. It is true that event timestamping is probably the most common application of a shared time model, but there are other uses we would like to support for the sake of genericity, e.g., synchronous execution, where a setpoint is supplied with a future timestamp to be acted upon by several actuators simultaneously (although I understand that a technique similar to the one you described could probably be employed there as well).
You are right to note that the disadvantage of Solution 1 is that it only makes sense with CAN 2.0, which I personally expect to be eventually displaced by CAN FD (especially in the emerging robotic applications which are unencumbered by legacy).
Unless I misunderstood the second solution, it would suffer from ambiguity if the referenced NodeStatus message is emitted more often than once per second, since two consecutive messages would then carry the same integer uptime value. This is trivial to fix, though, by replacing the uptime reference with a transfer ID reference.
An important issue with both of the proposed solutions is that they only work with physical bus topologies (e.g., CAN or MIL-STD-1553, unlike Ethernet or SpaceWire), which limits their applicability in a high-level protocol (because it would force the protocol to make strong assumptions about the medium). Do you happen to have an equally clever proposition that would be free from this limitation?
We consider the technical aspects of UAVCAN v1.0 to be already decided, but we may want to go back and introduce minor changes if there is a sufficiently compelling case built.
You seem to be referring to the old spec, are you not? The new spec states that the omission is intentional: