SocketCAN API on an RTOS

PetervdPerk · February 10, 2020, 3:56pm

This is a follow up to the SocketCAN proposal discussion for the NuttX RTOS.

SocketCAN allows CAN controllers to be addressed through the network socket layer (see the architecture diagram in the SocketCAN article on Wikipedia: SocketCAN - Wikipedia).

The NuttX architecture already supports the socket interface for networking, so a SocketCAN driver should fit in nicely.

The UAVCAN library already supports the SocketCAN interface, so it can easily be used on Linux systems.

So an architecturally clean solution for adding UAVCAN to the S32K14x chips would be to:

1. Create a CAN controller driver for the S32K14x.
2. Create a SocketCAN driver, which would be beneficial for all NuttX targets.

We’ve received feedback from Pavel:

I concur with the general idea that SocketCAN in Linux may be suboptimal for time-deterministic systems. At the same time, as far as I understand SocketCAN, the limitations are not due to its fundamental design but rather due to its Linux-specific implementation details. I admit that I did not dig deep but my initial understanding is that it should be possible to design a sensible API in the spirit of SocketCAN that would meet the design objectives I linked in this thread earlier.
https://forum.opencyphal.org/t/queue-disciplines-and-linux-socketcan/548

Based on all the feedback, I decided to create a testbed running SocketCAN on a microcontroller. That’s when I found Zephyr RTOS, which is partly POSIX-compatible and provides a SocketCAN implementation.

The testbed consists of:

NXP FRDM-K64F board with an NXP DEVKIT-COMM CAN transceiver
Zephyr RTOS
CAN bus bitrate 1 Mbit/s
PC sending an 8-byte CAN frame every 100 ms

When the interrupt occurs, the GPIO pin is pulled up, and when the userspace application receives the CAN frame, the GPIO pin is pulled down. This is shown by the yellow trace; the pink trace is the CAN frame.

1st Measurement Zephyr Native CAN implementation
Time from frame to interrupt: 12.8 us
Time from interrupt to user space copy: 10.85 us
On my oscilloscope I don’t see any variance in the “interrupt to user space copy” time, therefore jitter < 0.001 us

2nd Measurement Zephyr SocketCAN implementation
Time from frame to interrupt: 12.8 us
Time from interrupt to user space copy: 65.32 us
On my oscilloscope I don’t see any variance in the “interrupt to user space copy” time, therefore jitter < 0.001 us

Testbed conclusion

Zephyr SocketCAN increases the latency from interrupt to userspace by 54.47 us (65.32 us - 10.85 us). Is this acceptable? I don’t know. Furthermore, I didn’t look into the specifics of the Zephyr network stack, so some fine-tuning may still be possible.
Zephyr SocketCAN seems to be deterministic and doesn’t increase jitter, which is good real-time behavior.

Most of the points discussed in RTOS CAN API requirements can be covered by the SocketCAN API:

Received/transmitted frames shall be accurately timestamped. This can be achieved by adding a socket option that enables this behavior.
Transmit frames that could not be transmitted by the specified deadline shall be automatically discarded by the driver. This can also be achieved by a socket option, or by an ioctl if libuavcan wants full control.
Avoidance of the inner priority inversion problem. I didn’t look into the specifics of the Zephyr CAN implementation, but I assume a correct implementation will avoid this problem.
SocketCAN allows easy porting of POSIX applications to an RTOS. I do have an internal port of libuavcan (master branch) to Zephyr, which was fairly simple. It’s working on the FRDM-K64F board, but I have to look for better tooling to measure the real-time performance of the full stack.

If the UAVCAN team can provide feedback on these results that would be great.

pavel.kirienko · February 11, 2020, 6:33pm

It is hard to judge without having specific application requirements at hand but my educated guess is that 54 microseconds on a 120 MHz ARMCM4 as a starting point (i.e., design worst case) might be acceptable, keeping in mind also that NuttX-specific opportunities for optimization may become available.

Regarding the question raised in the email about whether it makes sense to keep the old API alongside SocketCAN: I can’t say for other applications, but we found the old interface to be unsuitable for the needs of distributed real-time control systems, which was the reason we ended up writing baremetal drivers instead of relying on the RTOS-native APIs. Given that, I don’t see much value in keeping the old API around other than to support legacy applications, if there are any.

scottdixon · February 12, 2020, 6:19am

Thanks for all the due diligence here. It’s strong work, but I’m not clear on the parameters of your testing. What interrupt are you measuring here? It looks like the receive interrupt? I’m actually more concerned with the latency and jitter of transmitted messages because of the additional complexity introduced by intermediate TX buffers. I’ll explain this in a bit, but first: we also need to look at the jitter experienced when more than one user-space process is sending messages, and with other load on the system that could interrupt the kernel (e.g. Ethernet traffic). This is where SocketCAN can lose the ability to provide the real-time guarantees that on-metal firmware enjoys.

Buffer Bloat and Virtual Arbitration

In this (very rough, sorry, didn’t have much time for this) diagram I’m showing the TX queues that a Linux device using libuavcan v0 will have, and the path an enqueued message traverses to get on-bus (I am omitting the peripheral and possible DMA queues to simplify our discussion). The revision cloud labeled (1) shows an unintended intermediate step that, for some queue disciplines, is completely incorrect (e.g. codel) and, for others, sub-optimal (e.g. pfifo_fast). In addition to the latency introduced by the kernel buffer, we have priority inversions: the media layer expects hardware-managed arbitration to take place as soon as it hands the next message in its priority buffer to the system. Instead, that message may get stuck behind a lower-priority message sent by another process.

One solution is to change the queueing discipline to replicate CAN arbitration, but we already do this in libuavcan’s media layer, so we would be doing it twice. Moreover, software-managed arbitration makes the TX buffer act like a virtual CAN bus on top of the real CAN bus. This changes the topology of the bus, creating a star network on top of one (or more, if there are more POSIX systems on this bus) of the real CAN nodes. While I haven’t fully thought through the ramifications of this, it does seem like modeling tools may fail to account for latencies or bandwidth utilization if we treat these POSIX nodes as if they were regular CAN nodes. Trying to model the virtual topology could help, but it would require the UAVCAN standard to provide guidance on how routing between two networks should be handled.
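To illustrate what “replicating CAN arbitration in the queue discipline” means, here is a minimal sketch (not libuavcan’s actual media-layer code) of a TX queue that always releases the frame with the numerically lowest CAN ID first, since that frame would win arbitration on the bus, and keeps FIFO order among equal IDs. It assumes every frame uses a 29-bit extended ID, as UAVCAN does; the names TxQueue, txq_push() and txq_pop() and the fixed capacity are hypothetical. A plain FIFO would instead release frames in submission order, which is where the inversion described above comes from.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-capacity TX queue. Assumes all frames use 29-bit
 * extended IDs, so a numerically lower ID wins bus arbitration. */
#define TXQ_CAPACITY 16U

typedef struct {
    uint32_t can_id; /* 29-bit extended identifier */
    uint64_t seq;    /* submission order, keeps FIFO order among equal IDs */
} TxItem;

typedef struct {
    TxItem items[TXQ_CAPACITY]; /* sorted, highest priority at index 0 */
    size_t size;
    uint64_t next_seq;
} TxQueue;

/* Returns true if a would win arbitration against b. */
static bool txq_higher_priority(const TxItem* a, const TxItem* b)
{
    if (a->can_id != b->can_id) {
        return a->can_id < b->can_id;
    }
    return a->seq < b->seq;
}

/* Inserts a frame while keeping the array sorted; false if the queue is full. */
static bool txq_push(TxQueue* q, uint32_t can_id)
{
    if (q->size >= TXQ_CAPACITY) {
        return false;
    }
    TxItem item = { can_id & 0x1FFFFFFFU, q->next_seq++ };
    size_t i = q->size;
    while (i > 0 && txq_higher_priority(&item, &q->items[i - 1])) {
        q->items[i] = q->items[i - 1]; /* shift lower-priority frames back */
        i--;
    }
    q->items[i] = item;
    q->size++;
    return true;
}

/* Removes the frame that would win arbitration; false if the queue is empty. */
static bool txq_pop(TxQueue* q, uint32_t* out_can_id)
{
    if (q->size == 0) {
        return false;
    }
    *out_can_id = q->items[0].can_id;
    for (size_t i = 1; i < q->size; i++) {
        q->items[i - 1] = q->items[i];
    }
    q->size--;
    return true;
}
```

With pushes of IDs 0x300, 0x100 and 0x200, txq_pop() returns 0x100, then 0x200, then 0x300, regardless of submission order.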

A proposal

What if we avoided the problems of buffer bloat and multiple virtual arbitration by moving the libuavcan media layer into the kernel and having each user-space transport layer interface with it directly? This would give two or more applications on an RTOS exactly the same characteristics as two or more threads in a process with direct and exclusive access to a CAN peripheral. It leaves us with a single set of problems to solve in libuavcan and puts logic optimized for the UAVCAN protocol into the kernel. SocketCAN would still be the API, but the socket would be set up differently; for example:

s = socket(PF_CAN, SOCK_RAW, CAN_UAVCAN);

The revision cloud (2) in this diagram still denotes the use of software-managed arbitration, but a single TX buffer means far more deterministic behaviour. Furthermore, we can optimize timestamping within the kernel layer, reducing the userspace CPU time each application has to dedicate to this task. Finally, we can handle redundant interfaces in this kernel module, which would otherwise become awkward if each application had to manage redundancy independently.

PetervdPerk · February 12, 2020, 2:52pm

Hi Pavel & Scott,

Thank you for the valuable feedback.

@pavel.kirienko, I agree that 54 microseconds is “not great, not terrible”; if we take latency and determinism into account when implementing SocketCAN, we can aim to bring this number down. About the NuttX mailing list discussion: I agree that I see no use for the old CAN interface, and I don’t want its weight to bear down on the new implementation.

@scottdixon I’m measuring the receive interrupt, and then the time it takes to get the data to the userspace application. I didn’t measure transmitted messages, but based on your feedback I’ve decided to measure them as well.

The setup is the same as in my initial post, but now the MCU sends a CAN message every 100 ms. Before sending, it pulls the GPIO up; that’s how we measure the latency from the send call to the CAN frame appearing on the bus.

(Unfortunately I’m not allowed to upload images, so I’ve uploaded them to Imgur.)

Zephyr Native CAN send
Has a latency of ~4 us with a jitter of ~0.5 us

[Album] Zephyr Native CAN transmit latency

Zephyr SocketCAN send
Has a latency of ~55 us with a jitter of ~0.5 us

[Album] Zephyr SocketCAN send

It’s true that these numbers don’t include the influence of other processes/tasks on the system. The question is how we can still provide real-time guarantees in those cases.

I think your proposal of moving the libuavcan transport/media layer into the kernel, by adding a CAN_UAVCAN protocol on top of PF_CAN just like TCP/UDP sit on top of IP, is a great idea (it even opens the possibility of adding it to Linux). However, this approach also has some caveats:

libuavcan is C++, so moving this transport layer would mean a rewrite in C (we could use libcanard, though I’m not sure whether it is feature-complete)
I’m not sure how flexible the libuavcan design is when it comes to moving the transport layer; it might require some kind of fork
I’m not sure how big the libuavcan transport layer is, but I would suppose that a kernel maintainer doesn’t want to maintain a big and complex network family
We’re still not sure whether this approach would improve real-time performance (high effort vs. low reward)

However, we do have an alternative: we can implement libuavcan on NuttX using SocketCAN and use some kind of zero-copy queue discipline to avoid the buffer bloat you’ve explained in your picture, i.e., to avoid these unnecessary buffers. Then, if the implementation is good enough, we can always decide to move the libuavcan transport layer into the kernel later.

pavel.kirienko · February 12, 2020, 4:24pm

I like where this is going. Moving the transport layer into the kernel does sound like an interesting idea. I would also like to see it implemented in the Linux version of SocketCAN. With the transport layer implemented in the kernel, and the DSDL serialization implemented in Nunavut in an implementation-agnostic way, there will be very little logic left in Libuavcan itself, so I imagine that some applications would choose to work on the raw socket layer without any additional library logic on top.

I am working on Libcanard (slowly) and there’s not much left to be done. The transport layer is under 1k lines of C99 (in essence, Libcanard is just the transport layer itself with some very minimal serialization logic on top which is irrelevant for the kernel) so I imagine porting that into the kernel space should be fairly trivial.

PetervdPerk · February 28, 2020, 9:49am

An update about implementing SocketCAN onto NuttX.

Initial SocketCAN implementation is working on a S32K148EVB. We can create a PF_CAN socket and read/write to it. I’m supporting both CAN and CAN FD.

Below are some preliminary results from the NuttX SocketCAN implementation. Please note that this is on an S32K148 with a Cortex-M4 @ 80 MHz, compared to the K64 with a Cortex-M4 @ 120 MHz used for the Zephyr tests. The receive latency is lower than with the Zephyr implementation, while the transmit latency is a bit worse. However, I’ve already found why the transmit latency is higher and can probably reduce it to 30-40 us; this needs some further testing.

Preliminary results	Transmit	Receive
NuttX SocketCAN	67us	26us
NuttX SocketCAN FD	71us	30us

If you’re interested in the code, it’s located on SocketCAN NuttX. The current implementation has mostly focused on getting all required features to work; I haven’t looked specifically at performance and network scheduling yet.

There’s still a lot of work on my todo list to make it fully SocketCAN compliant, please find the list of missing features below.

Missing features (TODO list)

CAN & CAN FD bitrate configuration (autobaud)
Socket setsockopt filtering support
Self-receive loopback support
Filter/error mask support
Multi-CAN device
Avoid priority inversion (Here I’ve to investigate how NuttX network scheduling works)
Avoid buffer bloat
Implementing specific UAVCAN socket options, e.g. frame timestamping, transmitted frame removal from TX mailbox

pavel.kirienko · February 28, 2020, 10:10am

This is wonderful.

CAN & CAN FD bitrate configuration (autobaud)

Do you have an approach in mind on how to implement autobaud for the data segment of CAN FD frames when BRS is set? One possible solution is to define the data segment bitrate as a function of the arbitration segment bit rate (e.g., a fixed multiplier ca. 2x…5x). We have a table of recommended bit rates which is supposed to define the data bit rate as a multiple of the arbitration bit rate, but as of right now it is not yet populated (see section 7.2.2.2 CAN FD). I’ve been told that the hardware people at Amazon are going to provide guidance there, and if things go right we will have that table populated very soon so we can utilize it.
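A minimal sketch of the fixed-multiplier idea, assuming autobaud is done only on the arbitration phase: each candidate arbitration bit rate is tried with a listen-only probe, and the data-phase bit rate is then derived as a fixed multiple. The probe callback, the candidate list and the multiplier value are all assumptions for illustration (the recommended table is not populated yet); mock_probe() is only a test stand-in for a real driver call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe: on real hardware this would put the controller in
 * listen-only mode at the given arbitration bit rate and report whether
 * valid frames were observed without bus errors. */
typedef bool (*ListenProbe)(uint32_t arbitration_bps, void* ctx);

typedef struct {
    uint32_t arbitration_bps;
    uint32_t data_bps;
} FdBitrates;

/* Assumed candidate arbitration bit rates, tried from fastest to slowest. */
static const uint32_t kCandidates[] = { 1000000U, 500000U, 250000U, 125000U };

/* Detects the arbitration bit rate, then derives the data-phase bit rate as a
 * fixed multiple (the multiplier, e.g. 2..5, is an assumption). Returns false
 * if no candidate produced error-free traffic. */
static bool autobaud_fd(ListenProbe probe, void* ctx, uint32_t data_multiplier, FdBitrates* out)
{
    for (size_t i = 0; i < sizeof(kCandidates) / sizeof(kCandidates[0]); i++) {
        if (probe(kCandidates[i], ctx)) {
            out->arbitration_bps = kCandidates[i];
            out->data_bps = kCandidates[i] * data_multiplier;
            return true;
        }
    }
    return false;
}

/* Test stand-in: pretends the bus runs at the arbitration rate stored in ctx. */
static bool mock_probe(uint32_t arbitration_bps, void* ctx)
{
    return arbitration_bps == *(const uint32_t*)ctx;
}
```

Deriving the data rate this way avoids having to detect the data phase at all, at the cost of requiring every node on the bus to agree on the same multiplier.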

Avoid priority inversion (Here I’ve to investigate how NuttX network scheduling works)

Please, please look at the recommendations in section 4.2.4.3 Inner priority inversion. I believe that the considerations expressed there can be used to gauge the adequacy of a TX pipeline. If you have any reservations about that, please let us know.

This is great, I have just a minor nit: I think for documentation purposes it’s best to call these real-time-specific options rather than UAVCAN-specific because UAVCAN is not unique in these requirements.

PetervdPerk · February 28, 2020, 11:19am

Do you have an approach in mind on how to implement autobaud for the data segment of CAN FD frames when BRS is set?

Not particularly, but I will look into the bit rate table you mentioned. It will probably look somewhat like what David Sidrane did in the Kinetis CAN driver. Feedback is always welcome, though.

Please, please look at the recommendations in section 4.2.4.3 Inner priority inversion

Thanks, I was aware of that; we have to verify that the SocketCAN implementation provides the specified behavior.

This is great, I have just a minor nit: I think for documentation purposes it’s best to call these real-time-specific options rather than UAVCAN-specific because UAVCAN is not unique in these requirements.

I called them UAVCAN socket options because they are not officially part of the reference SocketCAN implementation. But we can call them something like SocketCAN real-time extensions / socket options. However, for now these will only be available on the NuttX operating system and not in the reference SocketCAN implementation on Linux.

PetervdPerk · March 6, 2020, 1:53pm

Just a small update for this week: the current focus has been on running the libuavcan library on NuttX. Unfortunately I ran into some problems getting the C++ standard library to work, but eventually I managed to compile and use the LLVM C++ standard library (instructions here). The PX4 team seems to use another solution; unfortunately I haven’t gotten a response yet to my question on their Slack channel.

Demo video:

The demo consists of an S32K148EVB running NuttX 8.2 with the SocketCAN port. It’s connected through a PeakCAN USB dongle to a PC running uavcan_dynamic_node_id_server. The UAVCAN GUI tool is sniffing the bus through the Zubax Babel. What you see is that the S32K148EVB is able to publish messages on the DebugKeyValue topic. It’s not perfect, though: the NodeStatus topic seems to publish incorrect uptime values, and the node name doesn’t always come up in the UAVCAN GUI tool. Still, for demo and proof-of-concept purposes it seems to work. The next focus will be on finishing the TODO list from my previous post.

PetervdPerk · March 12, 2020, 2:34pm

Update: I’ve implemented the software-based CAN_RAW_FILTER socket option, which filters CAN frames on a per-socket basis. I’m also thinking about implementing an ioctl for hardware-based CAN filtering to save CPU cycles.
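For reference, mainline Linux applies CAN_RAW_FILTER with the rule (received_can_id & can_mask) == (can_id & can_mask): a frame is accepted if any filter matches, and the CAN_INV_FILTER bit in can_id inverts a filter. An application installs filters with setsockopt(s, SOL_CAN_RAW, CAN_RAW_FILTER, &rfilter, sizeof(rfilter)). The sketch below reproduces only that matching rule in plain C, assuming the NuttX software filter follows the same semantics; CanFilter is a local mirror of struct can_filter, and error-frame/RTR special cases are ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of Linux's struct can_filter (<linux/can.h>), redeclared so
 * the sketch compiles anywhere. */
typedef struct {
    uint32_t can_id;
    uint32_t can_mask;
} CanFilter;

/* Same bit value as CAN_INV_FILTER in <linux/can.h>. */
#define SKETCH_CAN_INV_FILTER 0x20000000U

/* Returns true if received_id passes at least one filter. A filter matches when
 * (received_id & mask) == (filter_id & mask); CAN_INV_FILTER inverts the result.
 * With zero filters nothing is accepted, as on Linux. */
static bool can_filter_accepts(const CanFilter* filters, size_t count, uint32_t received_id)
{
    for (size_t i = 0; i < count; i++) {
        const bool inverted = (filters[i].can_id & SKETCH_CAN_INV_FILTER) != 0U;
        const uint32_t id = filters[i].can_id & ~SKETCH_CAN_INV_FILTER;
        const bool match = (received_id & filters[i].can_mask) == (id & filters[i].can_mask);
        if (match != inverted) {
            return true;
        }
    }
    return false;
}
```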

The SocketCAN implementation now also supports the SO_TIMESTAMP socket option. When this option is enabled, you can use recvmsg() with a cmsg structure to receive the associated timestamp in a timeval structure. The timestamp is retrieved using the clock_systimespec() function, and its resolution depends on the system timer tick period (the NuttX default is 10000 us, i.e. 10 ms). Alternatively, you can enable the high-res RTC timer, which is based on a 32.768 kHz clock and gives a resolution of about 31 us (1/32768 s ≈ 30.5 us).

I couldn’t find hard requirements on the accuracy of this clock in the UAVCAN spec. So the question is: does a 32.768 kHz RTC provide enough accuracy, or do we have to use other timing functions to get higher accuracy?

If anyone is interested, the code is located on my GitHub under:

And the required apps:

I’ve now divided my TODO list into a functionality list, to get a working product, and a list that focuses on improving quality of life:

Functionality todo list:

Transmitted frame removal from TX mailbox
Comply with 4.2.4.3 Inner priority inversion

Quality-of-life todo list:

CAN & CAN FD bitrate configuration (autobaud)
Self-receive loopback support
Filter/error mask support
Multi-CAN device
Ioctl for HW based CAN filtering

PetervdPerk · March 18, 2020, 11:40am

Update: I’ve added a new socket option called CAN_RAW_TX_DEADLINE. When this socket option is enabled, you have to use the sendmsg() function and send each CAN frame with a deadline (struct timeval) in the cmsg header.
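A sketch of how an application might build such a message. The cmsg level (SOL_CAN_RAW) and the numeric value of CAN_RAW_TX_DEADLINE are assumptions here: CAN_RAW_TX_DEADLINE is not defined on Linux, so the block defines a clearly hypothetical placeholder that the real NuttX header would override. The deadline is assumed to be an absolute time on the driver’s clock, and the option itself would presumably be enabled first with setsockopt() on the same level.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/uio.h>

/* SOL_CAN_RAW is 101 on Linux (<linux/can/raw.h>); assumed to match on NuttX. */
#ifndef SOL_CAN_RAW
#define SOL_CAN_RAW 101
#endif
/* HYPOTHETICAL placeholder: the real value comes from the NuttX SocketCAN headers. */
#ifndef CAN_RAW_TX_DEADLINE
#define CAN_RAW_TX_DEADLINE 1000
#endif

/* Control buffer large enough for one struct timeval, correctly aligned. */
typedef union {
    char bytes[CMSG_SPACE(sizeof(struct timeval))];
    struct cmsghdr align;
} DeadlineControl;

/* Fills *msg so that sendmsg(fd, msg, 0) would send the frame at `frame`
 * (frame_len bytes, e.g. a struct can_frame) with `deadline` attached as
 * ancillary data. iov and ctl must outlive the sendmsg() call. */
static void build_deadline_msg(struct msghdr* msg, struct iovec* iov, DeadlineControl* ctl,
                               void* frame, size_t frame_len, const struct timeval* deadline)
{
    iov->iov_base = frame;
    iov->iov_len = frame_len;
    memset(msg, 0, sizeof(*msg));
    memset(ctl, 0, sizeof(*ctl));
    msg->msg_iov = iov;
    msg->msg_iovlen = 1;
    msg->msg_control = ctl->bytes;
    msg->msg_controllen = sizeof(ctl->bytes);

    struct cmsghdr* c = CMSG_FIRSTHDR(msg);
    c->cmsg_level = SOL_CAN_RAW;
    c->cmsg_type = CAN_RAW_TX_DEADLINE;
    c->cmsg_len = CMSG_LEN(sizeof(struct timeval));
    memcpy(CMSG_DATA(c), deadline, sizeof(*deadline));
}
```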

When the CAN driver supports this TX deadline option, it stores the deadline timestamp when sending the CAN message and sets up a TX timeout watchdog for the mailbox it used to send the frame. When the TX interrupt occurs, the watchdog linked to that mailbox is cancelled. If the watchdog timer fires, the TX deadlines in all mailboxes are checked, and CAN frames that have timed out are aborted.
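The driver-side bookkeeping can be modelled in a few lines. This is a simplified sketch with hypothetical names, not the NuttX driver: each mailbox stores the deadline of the frame it holds, a TX-complete interrupt clears it, and when any watchdog fires, every pending mailbox whose deadline has passed is aborted, not only the one that armed the watchdog. Times are plain microsecond counters for simplicity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the mailbox bookkeeping described above. */
#define NUM_MAILBOXES 4U

typedef struct {
    bool busy;            /* a frame is pending in this mailbox */
    uint64_t deadline_us; /* absolute TX deadline stored at send time */
} Mailbox;

/* Stand-in for the hardware abort request; here it only frees the slot. */
static void mailbox_abort(Mailbox* mb)
{
    mb->busy = false;
}

/* TX-complete interrupt: the frame made it onto the bus, so its deadline no
 * longer matters (the real driver cancels the per-mailbox watchdog here). */
static void on_tx_complete(Mailbox* mbs, size_t index)
{
    mbs[index].busy = false;
}

/* Watchdog expiry: abort every pending frame whose deadline has passed.
 * Returns the number of frames aborted. */
static size_t on_tx_watchdog(Mailbox* mbs, size_t count, uint64_t now_us)
{
    size_t aborted = 0;
    for (size_t i = 0; i < count; i++) {
        if (mbs[i].busy && now_us >= mbs[i].deadline_us) {
            mailbox_abort(&mbs[i]);
            aborted++;
        }
    }
    return aborted;
}
```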

Currently I’m in the process of cleaning up code and merging back to apache/incubator-nuttx/SocketCAN branch. For more information see https://github.com/apache/incubator-nuttx/pull/581/

pavel.kirienko · March 18, 2020, 6:33pm

The work described here so far looks great and I have no substantial feedback. I think this is generally on the right track. We will most likely need to have this API supported on other platforms as well (STM32 first of all) so a porting guide would help, I suppose.

I expect that 31 us will meet the requirements of most applications. Are there going to be facilities for higher-resolution timestamping shall the application require that?

PetervdPerk · March 19, 2020, 8:00am

Hi Pavel, Good to hear you’re pleased with the results.

When the merge has been completed, I will work on documentation, add sample code (probably a port of can-utils candump) to NuttX, and write a basic porting guide. For now, for porting new hardware I would refer to the existing Ethernet device porting guide (Porting Guide - NUTTX - Apache Software Foundation), since SocketCAN relies on the same software interface.

I do have a question about UAVCAN v1: libcanard v1 has now been released, but what’s the status of libuavcan v1? Can I start making a NuttX SocketCAN driver? Is there a Linux SocketCAN driver I can use as a base? I cannot find the CAN drivers anymore in the libuavcan v1 repo or on the UAVCAN GitHub page.

Are there going to be facilities for higher-resolution timestamping shall the application require that?

The high-resolution timer is a bit of a NuttX problem: it uses the MCU RTC as the high-resolution timer, which doesn’t give much resolution because it is based on a 32.768 kHz clock. The solution would be to use a dedicated timer with a high-frequency input as the high-resolution timer, which would give higher precision. The PX4 people use an out-of-tree solution where they implement their own high-resolution timer (see hrt.c), so in PX4 we could use the hrt timer instead. In the future we could ask PX4 to upstream their solution, or discuss with the NuttX developers what a good solution would be.
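To put numbers on it: a 32.768 kHz clock ticks every 1/32768 s ≈ 30.5 us, whereas a timer clocked at 1 MHz resolves 1 us. The sketch below shows one common way to build such a timer, extending a free-running 16-bit hardware counter to a 64-bit microsecond timestamp by detecting wraparound. It is a generic illustration under stated assumptions (1 MHz counter, called at least once per wrap, no concurrent callers), not PX4’s hrt.c; the raw counter value is passed in so the logic can be exercised without hardware.

```c
#include <stdint.h>

/* Hypothetical sketch: a free-running 16-bit timer clocked at 1 MHz
 * (1 tick = 1 us) extended in software to 64 bits. Assumes hrt_now() runs at
 * least once per wrap (every 65.536 ms), e.g. from the overflow interrupt, and
 * that callers are serialized (a real driver would use a critical section). */
typedef struct {
    uint16_t last_count; /* counter value seen on the previous call */
    uint64_t base_us;    /* microseconds accumulated from completed wraps */
} Hrt;

/* Converts a raw counter reading into a monotonic 64-bit microsecond value. */
static uint64_t hrt_now(Hrt* hrt, uint16_t raw_count)
{
    if (raw_count < hrt->last_count) {
        hrt->base_us += 65536U; /* the counter wrapped since the last call */
    }
    hrt->last_count = raw_count;
    return hrt->base_us + raw_count;
}
```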

pavel.kirienko · March 19, 2020, 11:20am

It’s a deep WIP. We are not going to see the release until 2020Q3.

Yes. The low-level API is already mostly defined. It may undergo minor modifications as we extend the library to be transport-agnostic, but I expect that change to be fairly minor on the API side.

@noxuz has built a bare-metal Libuavcan v1 driver for S32K already; he’s going to send a pull request soon; meanwhile, you can find his work here:

Yes, there is a Libuavcan v0 driver for GNU/Linux SocketCAN (all drivers for all libraries have been moved into this single repository):

You will notice that it relies on a clumsy method of TX timeout management where it keeps track of enqueued frames; it causes issues with some SocketCAN backends and also may cause a priority inversion under some circumstances. Thankfully, with your new API, it will be possible to resolve these issues.

Seeing as the driver should also be compatible with GNU/Linux, I think it should somehow detect if the extended API options you’ve implemented are supported and simply omit the relevant functionality if they are not. For example, if the TX timeout tracking is not supported, it should behave as if the TX timeout is infinite, etc.
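One way to do this detection at runtime, sketched under assumptions: try to enable the option with setsockopt() and treat any failure as “not supported” (mainline Linux typically rejects unknown CAN_RAW options with ENOPROTOOPT), then fall back to an infinite deadline. SOL_CAN_RAW is the Linux value; CAN_RAW_TX_DEADLINE is a hypothetical placeholder here because only the NuttX headers define it, and the helper names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>

#ifndef SOL_CAN_RAW
#define SOL_CAN_RAW 101 /* Linux value from <linux/can/raw.h> */
#endif
/* HYPOTHETICAL placeholder; the real value would come from the NuttX headers. */
#ifndef CAN_RAW_TX_DEADLINE
#define CAN_RAW_TX_DEADLINE 1000
#endif

/* Tries to enable an optional socket option. Any failure is treated as
 * "not supported" rather than as a fatal error. */
static bool try_enable_option(int fd, int level, int optname)
{
    const int on = 1;
    return setsockopt(fd, level, optname, &on, sizeof(on)) == 0;
}

/* Deadline to hand to the transmit path: if the platform cannot enforce
 * deadlines, behave as if the TX timeout were infinite. */
static uint64_t effective_deadline_us(bool deadline_supported, uint64_t requested_us)
{
    return deadline_supported ? requested_us : UINT64_MAX;
}
```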

While we are at it, we could really use the same driver for Libcanard. It’s much simpler than for Libuavcan because the library itself does not interact with the underlying media layer; see the existing example for v0.

pavel.kirienko · March 27, 2020, 11:23pm

@lorenz.meier said today that the work on the media layer adapter bridging SocketCAN with libuavcan/libcanard is about to be underway (if not already), so I would like to provide some hands-on guidance here. I invite @scottdixon to verify this and expand my recommendations as necessary. CC @TSC21 FYI.

Generally, I think the solid work @PetervdPerk has done covers most if not all of the specifics of real-time communication, so the media layer implementations for our libraries will amount to a few hundred lines of trivial glue logic. It makes sense to consult with the v0 platform-specific components, keeping in mind that in v1 the media layer has been extended to support CAN FD and in some sense simplified.

Libuavcan v1

Libuavcan has a layered architecture where it bridges the underlying transport network and the application:

+---------------+
|  Application  |
+-------+-------+
        |
+-------+-------+
|   Libuavcan   |
+-------+-------+
        |  <---------- you are here
+-------+-------+
|  Media layer  |  <-- e.g., SocketCAN
+-------+-------+
        |
+-------+-------+
|    Hardware   |
+---------------+

So there are two interfaces: the front-end and the back-end. The library is not yet implemented but the back-end interface is already defined under /libuavcan/include/libuavcan/media. The interface may end up being slightly altered after the support for other transports is built into the design, but such changes are not expected to have any major impact on existing implementations so it is safe to proceed.

The media layer adapter is just the implementation of the interfaces defined under libuavcan::media; @noxuz has done it for his baremetal S32K driver. An important difference compared to v0 is that the TX queue and transmission deadline tracking will now be managed by SocketCAN itself (in v0 it used to be managed by Libuavcan). This design choice has certain implications for Linux.

Seeing as the Linux version of SocketCAN does not support transmission deadline tracking, that option should simply be omitted when compiling for Linux. In v0 we run a sophisticated queue monitoring trying to enforce deadlines and prevent priority inversion, but in reality neither of the objectives is fully attained and there are serious issues with certain types of CAN hardware (discussed on this forum and on GitHub/Libuavcan), so that logic should not be recreated. This feature can be re-enabled later if someone manages to upstream Peter’s real-time extensions into mainline Linux SocketCAN.

The adapter should be designed to support any flavor of SocketCAN on any POSIX-like system. Judging by Peter’s experiment with Libuavcan v0, I don’t expect such OS-agnosticism to be hard to ensure, but it does mean that we will have to let go of certain facilities to accommodate the requirements of real-time systems: heap, exception handling, etc. will have to go.

I think that despite this being a concrete driver, it might make sense to make it header-only to simplify integration.

None of the apps and utilities shipped with the original v0 SocketCAN driver are needed anymore.

This component should be located under /socketcan/libuavcan in the platform-specific component repository.

Libcanard v1

Libcanard v1 is not layered. The library provides data handling services for the application; it is the responsibility of the latter to organize the flow of data between the library, the hardware, and the business logic:

+---------------------------------+
|           Application           |
+-------+-----------------+-------+
        |                 | <----------- you are here
+-------+-------+ +-------+-------+
|   Libcanard   | |  Media layer  |  <-- e.g., SocketCAN
+---------------+ +-------+-------+
                          |
                  +-------+-------+
                  |    Hardware   |
                  +---------------+

It follows that the definition of the interface between the application and the media layer is entirely up to its designer. I imagine that in our case it would amount to a simple wrapper that hides the somewhat clumsy Berkeley socket API behind a libcanard-oriented interface; perhaps it might look like this:

typedef int fd_t;
fd_t socketcanOpen(const char* const can_iface_name, const bool can_fd);
int16_t socketcanTransmit(const fd_t fd, const struct canfd_frame* const frame, const uint64_t deadline_usec);
int16_t socketcanReceive(const fd_t fd, struct canfd_frame* const out_frame, uint64_t* const out_timestamp_usec);
int16_t socketcanConfigureFilter(const fd_t fd, const size_t num_filters, const struct can_filter* filters);
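To make the proposal a bit more concrete, here is a minimal Linux-only sketch of two of these functions using only standard SocketCAN calls (socket(), bind(), and setsockopt() with CAN_RAW_FD_FRAMES and CAN_RAW_FILTER). The “negative on error” return convention is an assumption, since the proposed signatures don’t specify one, and header locations may differ on NuttX. Deadlines and timestamps are left out here.

```c
#include <linux/can.h>
#include <linux/can/raw.h>
#include <net/if.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

typedef int fd_t;

/* Opens a raw CAN socket bound to can_iface_name; if can_fd is true, CAN FD
 * frames are enabled as well. Returns the descriptor, or -1 on error. */
fd_t socketcanOpen(const char* const can_iface_name, const bool can_fd)
{
    const unsigned int ifindex = if_nametoindex(can_iface_name);
    if (ifindex == 0U) {
        return -1; /* no such interface */
    }
    const fd_t fd = socket(PF_CAN, SOCK_RAW, CAN_RAW);
    if (fd < 0) {
        return -1;
    }
    struct sockaddr_can addr;
    memset(&addr, 0, sizeof(addr));
    addr.can_family = AF_CAN;
    addr.can_ifindex = (int)ifindex;

    const int on = 1;
    if ((can_fd && (setsockopt(fd, SOL_CAN_RAW, CAN_RAW_FD_FRAMES, &on, sizeof(on)) != 0)) ||
        (bind(fd, (struct sockaddr*)&addr, sizeof(addr)) != 0)) {
        (void)close(fd);
        return -1;
    }
    return fd;
}

/* Replaces the socket's acceptance filters. Zero filters means the socket
 * receives nothing (Linux semantics). Returns 0 on success, -1 on error. */
int16_t socketcanConfigureFilter(const fd_t fd, const size_t num_filters, const struct can_filter* filters)
{
    const int rc = setsockopt(fd, SOL_CAN_RAW, CAN_RAW_FILTER, filters,
                              (socklen_t)(num_filters * sizeof(struct can_filter)));
    return (rc == 0) ? 0 : -1;
}
```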

Reliance on Libcanard itself is entirely optional.

We should strive to follow the rules of MISRA C, although there is no intention to enforce full documentation compliance.

This component should be located under /socketcan/libcanard in the platform-specific component repository.

You may use the old v0 driver as an inspiration but note that its API is a bit overcomplicated.

UPDATE: keep in mind that multiple SocketCAN sockets may be used on the same node side-by-side in configurations with redundant transports.

Quality assurance

The v1 codebase adheres to a higher quality standard. We enforce a consistent coding style following the Zubax C/C++ Conventions, use static analysis extensively, and strive to provide full test coverage. We recognize that it may sometimes be impractical to hold the platform-specific components (which we are discussing here) to the same stringent QA requirements because our resources are limited. Still, we would appreciate contributions leveraging the following methods:

Clang-Tidy and Clang-Format. A recommended configuration can be found in the Libcanard v1 repo.
A cloud static analysis tool supporting a decent subset of MISRA and CERT rules. Our C/C++ libraries currently leverage SonarQube, but we are entirely open to new options (they are actually preferable for the sake of exploration). The tool should be configured adequately, relevant rules should be enabled.
Configured CI/CD pipeline, preferably with HITL.

Also see the contribution guideline for Libuavcan and Libcanard.