Loopback = false problems

deekay · January 29, 2019, 6:09pm

Hi!

Are testing uavcan socketCan driver on can0 interface. But only few frames (2 nodestatus) are coining out from the platform.
When debugging it is noticed that int res below always returns -1.

const int res = ::recvmsg(fd_, &msg, MSG_DONTWAIT);
if (res <= 0)
{
     return (res < 0 && errno == EWOULDBLOCK) ? 0 : res; 
}

And therefore, the internal queue is not cleared hence no msg is coming back. In turn, the small queue gets full and no more frames can be sent.

candump can0 in shell works, hence the loopback is working. A test was performed to set on = 0 in socket initiation routine. When loopback = off, candump can0 did not show anything. Turning loopback back on enabled candump can0 again.

Has anyone seen this problem before?

BR

deekay · January 30, 2019, 5:38pm

After further deep diving into the uavcvan stack, it is noticed that the loopback variable in socketcan.cpp always sets to 0:

        loopback = (msg.msg_flags & static_cast<int>(MSG_CONFIRM)) != 0;
//loopback is  zero
        if (!loopback && !checkHWFilters(sockcan_frame))
        {
            return 0;
        }

Hence the recvmg seems to work, but the msg.msg_flags does not contain MSG_CONFIRM.
The platform is freescale and flexcan driver is used. Looking into flexcan drivers it looks like it uses hardware loopback. If that has something to do with it, I do not know.

pavel.kirienko · February 1, 2019, 1:27am

What do you mean by “coming out from the platform” – frames being transferred from the bus to the library, or the other way around?

What is the value of errno?

Libuavcan uses loopback frames only for time synchronization needs. If you are not using a time sync master, it will never emit loopback frames.

MSG_CONFIRM is used to flag loopback packets only. By the way, much of the background explaining why the socketcan driver was built the way it is can be found in this (very old) thread: CAN bus transmission timeout

deekay · February 1, 2019, 7:39am

Hi Pavel!

I am sorry for being unclear about the details…

Only two nodestatus messages are leaving the can0 interface and going out on the physical bus.

Regarding res = -1 and errno, I think this result was normal in this situation, so sorry for confusing…

The problem seems to be that MSG_CONFIRM flag is never set by the the socket. So the
read(..) never reach 'frame = makeUavcanFrame(sockcan_frame);
As I have understood, correct me if I am wrong, this flag is dependent on if can driver implements support for echo or not according to SocketCAN readme:

4.1.7 RAW socket returned message flags

  When using recvmsg() call, the msg->msg_flags may contain following flags:
    MSG_DONTROUTE: set when the received frame was created on the local host.
    MSG_CONFIRM: set when the frame was sent via the socket it is received on.
      This flag can be interpreted as a 'transmission confirmation' when the
      CAN driver supports the echo of frames on driver level, see 3.2 and 6.2.
      In order to receive such messages, CAN_RAW_RECV_OWN_MSGS must be set.

Right now I am not sure if it is a driver issue or not, but it might be… I have read some forums about similar issues with the flexcan driver, http://socket-can.996257.n3.nabble.com/Help-with-CAN-RAW-LOOPBACK-td431.html
Have tested the “tst-rcv-own-msgs” program mentioned in the thread, and it works ok, however, it does not evaluate the MSG_CONFIRM flag.

I noticed your test_socket.cpp application under libuavcan_drivers/linux/apps and will execute it also to see if it passes as well.

Thanks for the link! I will investigate further…

pavel.kirienko · February 1, 2019, 10:08am

I now realize that I accidentally misinformed you in my earlier post. The SocketCAN driver enables loopback for all outgoing frames for reasons of TX timeout management (the background is explained in the mailing list thread linked above). If the underlying CAN driver does not support loopback frames, the libuavcan’s SocketCAN driver won’t work. The symptoms you’re describing seem to match well.

Specifically, you are seeing only two frames transmitted because this is the maximum hardware TX queue depth utilized by the driver by default:

github.com

UAVCAN/libuavcan_linux/blob/4ee477b54007da03ebb585c0437c8a1ca1ce5119/include/uavcan_linux/socketcan.hpp#L372


        return true;
    }
}


public:
/**
 * Takes ownership of socket's file descriptor.
 *
 * @ref max_frames_in_socket_tx_queue       See a note in the class comment.
 */
SocketCanIface(const SystemClock& clock, int socket_fd, int max_frames_in_socket_tx_queue = 2)
    : clock_(clock)
    , fd_(socket_fd)
    , max_frames_in_socket_tx_queue_(max_frames_in_socket_tx_queue)
{
    assert(fd_ >= 0);
}


/**
 * Socket file descriptor will be closed.
 */

I see two solutions:

Modify the libuavcan’s driver: add a special mode that doesn’t expect loopback frames from SocketCAN. This approach is undesrable because it breaks the CAN scheduling guarantees.
Fix your FlexCAN driver by adding support for loopback frames. This is the preferred approach; the other one is prone to disturbing the bus in the event of a temporary interface failure.

deekay · February 1, 2019, 1:00pm

Thanks for the info!

The kernel we use on our freescale hardware is quite old. 2.6.35.3 (year 2010). And I see in the requirements of libuavcan that newer i preferred. I do not know if just modifying the kernel module driver flexcan.ko will help, but perhaps… Some thread discussion at http://socket-can.996257.n3.nabble.com/Detection-of-CAN-frame-actually-sent-td1963.html tells problem in old kernels.

deekay · February 12, 2019, 8:33pm

Hi again Pavel!
After reading the thread in the link you sent me (the old thread), and as you describe in your response here, the reason for evaluating MSG_CONFIRM flag is to handle occasions of bus interface failures (buffered frames are queued up and flushed when interface comes back online). Am I right?

If I cannot change the driver to set this flag, is my assumption correct that if I use parts from wasInPendingLoopbackSet(...) on all frames (no just the expected loopback frames but on all RX frames) we can go around the driver MSG_CONFIRM issue?

Like:

if (pending_loopback_ids_.count(frame.id) > 0)
{
     loopback = 1;
}
else 
{
     loopback = 0;
}

Kind regards

pavel.kirienko · February 13, 2019, 11:17am

I can’t comment on the code right now (because I wrote it years ago and can’t investigate it right now) but the idea is that you should remove the loopback checks completely and remove the TX queue scheduling code; just dump all outgoing frames into the low-level driver’s TX queue immediately as they appear, and route all incoming frames back to the application without any checks. You may need to implement a special handling logic for frames where a loopback is requested; since you can’t rely on the driver’s built-in low-level loopback feature, you will have to manually put every frame for which the loopback flag is set into the RX queue (otherwise features like time sync master won’t work at all). This will lead to a degraded TX timestamping performance, but there doesn’t seem to be a better way of handling this (assuming you can’t just fix your low-level driver; if you could, this whole mess won’t be necessary).

The upstream could benefit from your changes, as long as they remain disabled by default.