As indicated in the UDRAL Planning Document, a robust, machine-comprehensible error protocol would be a powerful addition to the new UDRAL standard, allowing devices such as flight controllers to intelligently monitor for errors and handle component failures. While UAVCANv1 defines some protocols related to error handling, namely uavcan.node.Health (published as part of uavcan.node.Heartbeat) and uavcan.diagnostic.Record, these alone seem insufficient to provide robust error monitoring and handling.
This thread is intended to organize discussion around two questions:
- How should UDRAL service classes report errors?
- How will UDRAL handle error reporting in an efficient and robust manner?
The planning document currently proposes publishing a Health message at a low, regular rate for each service class implemented by a node, in addition to the Health value carried in the node heartbeat. This proposal allows for the monitoring and handling of health and errors on each service, but it has a few disadvantages:
- If a node implements only one service, then the service health would be redundant with the node heartbeat health.
- The Health message contains little information on the actual type of error; it merely reports that an error has happened with a certain severity. This means that any type of error handling based on Health reports would need to rely on inference based on service class type, which does not seem like a particularly robust or flexible method to handle failures.
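To make the second disadvantage concrete, here is a minimal Python sketch of what a monitoring node would have to do under the current proposal. The severity values mirror uavcan.node.Health.1.0; everything else (the `ServiceHealthReport` type, the service class labels, the action names, and the policy table) is hypothetical and only for illustration, not part of UDRAL or UAVCAN.

```python
from dataclasses import dataclass
from enum import IntEnum


class Health(IntEnum):
    """Severity levels, mirroring uavcan.node.Health.1.0."""
    NOMINAL = 0
    ADVISORY = 1
    CAUTION = 2
    WARNING = 3


@dataclass(frozen=True)
class ServiceHealthReport:
    """Hypothetical per-service report: it carries a severity but no cause."""
    node_id: int
    service_class: str  # hypothetical label, e.g. "esc" or "battery"
    health: Health


# Hypothetical policy table. Because the report has no error code, the
# reaction can only be keyed on (service class, severity).
FALLBACK_ACTIONS = {
    ("esc", Health.CAUTION): "limit_throttle",
    ("esc", Health.WARNING): "disarm_motor",
    ("battery", Health.WARNING): "return_to_launch",
}


def choose_action(report: ServiceHealthReport) -> str:
    """Infer a reaction from the service class and severity alone."""
    if report.health == Health.NOMINAL:
        return "none"
    # Unknown combinations fall back to a generic operator notification.
    return FALLBACK_ACTIONS.get(
        (report.service_class, report.health), "notify_operator"
    )


# An ESC that is overheating and an ESC that has lost its rotor position
# feedback would both arrive as the same report, so they get the same action.
overheating = ServiceHealthReport(42, "esc", Health.WARNING)
feedback_lost = ServiceHealthReport(42, "esc", Health.WARNING)
print(choose_action(overheating), choose_action(feedback_lost))
```

The monitor cannot tell the two faults apart, even though one might justify reduced power and the other an immediate shutdown. Any scheme that adds a machine-readable cause to the report would let the policy table be keyed on that cause instead.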
With that in mind, this thread is open to proposals that address these shortcomings. If you have an alternative error handling scheme, please chime in.