Unknown reason for timeout during file.Read

I am implementing a SW update routine for my application, based on the example code provided for libuavcan. However, the file.Read service call sometimes fails. Below is a modified version of the response handler for the read_client. I have implemented retries to work around the problem until I figure out the root cause.

So, when it happens, result.isSuccessful() is false, i.e. result.getStatus() != Success.
When I look at the frames in the GUI tool, they all look good and the messages are complete and without errors, but the SW update read routine stops because of the above fault. Hence the retry workaround below.

My conclusion so far is that the service call timed out, but I am not sure why. My best guess is that the response was not processed by the stack in time, or that some frame got lost on the way from the bus to the library.

I am calling .spin(…) periodically, with a 250 ms blocking time. My application does other work in between calls, but nothing that blocks. My node subscribes to many broadcasts, so I guess some time is needed to process them during spin(). The read_client call has a timeout of 1000 ms.

    void handleReadResponse(const uavcan::ServiceCallResult<uavcan::protocol::file::Read> &result)
    {
        // Consecutive failed reads; reset on success or after giving up
        static uint8_t retry = 0;

        if (result.isSuccessful() && result.getResponse().error.value == 0)
        {
            auto &data = result.getResponse().data;
            retry = 0;
            image_.insert(image_.end(), data.begin(), data.end());

            if (data.size() < data.capacity())
            { // Termination condition
                status_ = Status::Success;
                uavcan::TimerBase::stop();
                DEBUG_OUT(DEBUG_LVL_2, "Finished!");
            }
        }
        else
        {
            retry++;
            DEBUG_OUT(DEBUG_LVL_2, "Read fail, status = " << result.getStatus());
            if(retry > 3)
            {
                status_ = Status::Failure;
                uavcan::TimerBase::stop();
                DEBUG_OUT(DEBUG_LVL_2, "STOP");
                retry = 0;
            }
            else
            {
                DEBUG_OUT(DEBUG_LVL_2, "Retry " << unsigned(retry));
            }
        }
    }
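
For illustration, the retry bookkeeping above could also be kept in a small object instead of a function-local static, so that each transfer owns its own counter. This is only a minimal sketch: RetryBudget and its methods are hypothetical names, not part of libuavcan, and it only mirrors the "give up on the fourth consecutive failure" rule from the handler.

```cpp
#include <cstdint>

// Hypothetical helper (not part of libuavcan): counts consecutive failures
// for one transfer, replacing the function-local static counter.
class RetryBudget
{
public:
    explicit RetryBudget(std::uint8_t max_retries) : max_retries_(max_retries) {}

    // Call after a successful response; clears the failure streak.
    void onSuccess() { failures_ = 0; }

    // Call after a failed response.
    // Returns true if another attempt is allowed, false if the budget is spent.
    bool onFailure()
    {
        if (failures_ < max_retries_)
        {
            ++failures_;
            return true;
        }
        failures_ = 0;  // Giving up; reset so the object can serve the next transfer
        return false;
    }

    // Number of consecutive failures recorded so far.
    std::uint8_t failures() const { return failures_; }

private:
    std::uint8_t max_retries_;
    std::uint8_t failures_ = 0;
};
```

With RetryBudget budget(3), the handler would call budget.onSuccess() in the success branch and stop the timer only when budget.onFailure() returns false.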

The above approach seems to solve the problem. However, I am keen on making my application multithreaded with sub-node(s), as per the example provided here: https://uavcan.org/Implementations/Libuavcan/Tutorials/12._Multithreading/
The idea is to let some UAVCAN functionality, such as the SW update, run in the main thread, and other application functions run in a sub-node.

Would this be a more controlled way of handling this type of problem?

Are there more examples for Linux and SocketCAN on the multithreading topic?

Kind regards

I just noticed the test_multithreading.cpp in the Linux driver folder…

In order to weed out the possibility of this being timeout-related, increase the timeout drastically (say, 30 seconds) and see if the problem is resolved. If not, you may be losing frames in the TX queue of your server-side CAN interface. To investigate that, get a USB CAN adapter (a separate adapter! you won’t be able to assess the bus reliably using the same interface that is used for sending Read responses) and use it to ensure that the response frames make their way to the bus quickly enough for the request to not time out and that none of them are lost. That should help you determine where to dig next.
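
As a complement on the client side, the request-to-response latency can be logged to see how close the calls come to the timeout. The following is a rough sketch only: CallLatencyProbe is a hypothetical helper built on std::chrono, not a libuavcan facility, and it assumes onRequestSent() is called right before the Read request is issued and onResponse() at the top of the response handler.

```cpp
#include <chrono>

// Hypothetical helper (not part of libuavcan): records how long each
// request/response round trip takes and how much timeout margin is left.
class CallLatencyProbe
{
public:
    using Clock = std::chrono::steady_clock;

    explicit CallLatencyProbe(std::chrono::milliseconds timeout) : timeout_(timeout) {}

    // Call right before issuing the request.
    void onRequestSent(Clock::time_point now = Clock::now()) { sent_at_ = now; }

    // Call when the response callback fires; returns the round-trip time
    // and updates the worst case seen so far.
    std::chrono::milliseconds onResponse(Clock::time_point now = Clock::now())
    {
        const auto elapsed =
            std::chrono::duration_cast<std::chrono::milliseconds>(now - sent_at_);
        if (elapsed > worst_)
        {
            worst_ = elapsed;
        }
        return elapsed;
    }

    // Longest round trip observed so far.
    std::chrono::milliseconds worst() const { return worst_; }

    // Timeout minus the worst round trip; a small or negative value
    // suggests the timeout is too tight.
    std::chrono::milliseconds margin() const { return timeout_ - worst_; }

private:
    std::chrono::milliseconds timeout_;
    Clock::time_point sent_at_{};
    std::chrono::milliseconds worst_{0};
};
```

If the margin shrinks toward zero while the bus capture shows timely responses, the delay is more likely on the client side (for example, time spent between spin() calls) than on the bus.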

Are there more examples for linux and socketCan on the Multithreading topic?

None that I know of, but you could search GitHub.

Hi!

My setup consists of a PC (Linux) running the GUI tool, another PC (Linux) running the GUI tool, and my target Linux platform. Each has its own CAN interface connected to the same CAN bus (via SocketCAN). One of the GUI tools acts as the file server using the “firmware update” feature. The other GUI tool just sniffs the bus, and according to the capture everything looks OK: the delay between request and response is somewhat long (~300 ms), but nowhere near 1000 ms. All frames are actually transferred to the bus.

But I will increase the timeout on the client (target platform) side and test again to see if the problem lies there (which it probably does).

Thanks!
BR

Just a short question:

Does read_client_.hasPendingCalls() return false after the response callback has returned? Assuming no more calls are queued…

BR

Yes.

Yes, that was obvious :grin:

Extending the timeout seems to solve it.
