Unable to receive a correct transfer with heap allocation crash

Hi,

I am using libcanard library and I just want tor subscribe and receive data from the remote node.
So first, I have initiated the instance:

static O1HeapInstance* heap_;

static void* memAllocate(CanardInstance* const ins, const size_t amount)
{
    (void) ins;
    return o1heapAllocate(heap_, amount);
}

static void memFree(CanardInstance* const ins, void* const pointer)
{
    (void) ins;
    o1heapFree(heap_, pointer);
}

inline auto allocate(CanardInstance* const ins, const std::size_t amount) -> void*
{
    (void) ins;
    (void) amount;
    return nullptr;
}

inline void free(CanardInstance* const ins, void* const pointer)
{
    (void) ins;
    (void) pointer;
}

and Instance is a member of my class which is initialized during the constructor call:

ins_ = canardInit(&memAllocate, &memFree);
ins_.mtu_bytes = CANARD_MTU_CAN_CLASSIC;
ins_.node_id = 10;

After initializing my instance, I made a subscription:

(void) canardRxSubscribe(&ins_, CanardTransferKindMessage, 1234, 500, CANARD_DEFAULT_TRANSFER_ID_TIMEOUT_USEC, &radar1_subscription_);

And then, in my main loop function I just read the CAN interface to populate my CanardFrame and then using canardRxAccept:

CanardTransfer transfer;
CanardFrame frame{};
struct canfd_frame cfd;
poll_int = poll(&poll_, 1, 100);
if (poll_int <= 0) continue;

struct timespec ts;
clock_gettime(CLOCK_TAI, &ts);

uint8_t payload_buffer[500]{};
const ssize_t read_size = read(soc_, &cfd, sizeof(struct can_frame));
(void) memset(&frame, 0, sizeof(CanardFrame));
frame.timestamp_usec  = (CanardMicrosecond)((ts.tv_sec * MEGA) + (ts.tv_nsec / KILO));
frame.extended_can_id = cfd.can_id & CAN_EFF_MASK;
frame.payload_size    = cfd.len;
frame.payload         = payload_buffer;
(void) memcpy(payload_buffer, &cfd.data[0], cfd.len);
ROS_INFO("The Extended CAN ID: %d, payload size from CAN: %lu", frame.extended_can_id, frame.payload_size);

const int8_t result = canardRxAccept(&ins_, &frame, 0, &transfer);
ROS_INFO("Port ID: %d, remote_id: %d, the payload size: %lu", transfer.port_id, transfer.remote_node_id, transfer.payload_size);

However, I do not see the correct Port ID nor correct remote id and some garbage payload size and eventually the program crashes due to heap assertion:

[ INFO] [1597998075.124745439]: The Extended CAN ID: 268751402, payload size from CAN: 8
[ INFO] [1597998075.124920397]: Port ID: 58992, remote_id: 58, the payload size: 187650331065304
can_server1: /home/master-wheel/ros_ws/src/can_server/include/can_server/o1heap.c:293: o1heapAllocate: Assertion `handle != ((void *)0)' failed.

What am I doing wrong?

You donā€™t need your allocate()/free(), delete them. One of the problems is that you access invalid memory when doing your last logging call:

ROS_INFO("Port ID: %d, remote_id: %d, the payload size: %lu", transfer.port_id, transfer.remote_node_id, transfer.payload_size);

Because the transfer instance is populated only if the transfer is accepted. Please read the documentation for canardRxAccept().

I did not look further but I suggest to use Valgrind if you encounter memory issues.

P.S.: you donā€™t seem to need payload_buffer, too; you can just make the frame payload point to cfd.data.

I see. Can you please tell me what does the result of 0 means when using canardRxAccept? I can see that documentation specifies the result of 1 or negated

It means that the received frame did not complete a transfer. In other words, the library updated its inner state but there is nothing yet for your application to do because the transfer is not yet fully received (or the frame is not related to any of the existing subscriptions).

I just pushed a fix that should make the docs a bit clearer:

Thanks. Regarding the possible memory leak, I used Valgrind but it just did not make a lot of sense to me. Do you think this is due to the CAN frame or something I done in my code with heap?

[ INFO] [1598000187.289836366]: CanServer::CanServer: Trying to open a CAN
==2251== Syscall param socketcall.bind(my_addr.sa_data) points to uninitialised byte(s)
==2251==    at 0x4DD1B28: bind (in /usr/local/lib/libc.so.6)
==2251==    by 0x1455DF: CanServer::OpenCANPort() (in /home/master-wheel/ros_ws/devel/lib/can_server/can_server1)
==2251==    by 0x144F3F: CanServer::CanServer(ros::NodeHandle&) (in /home/master-wheel/ros_ws/devel/lib/can_server/can_server1)
==2251==    by 0x146CF3: main (in /home/master-wheel/ros_ws/devel/lib/can_server/can_server1)
==2251==  Address 0x1ffeffe9ea is on thread 1's stack
==2251==  in frame #1, created by CanServer::OpenCANPort() (???:)
==2251== 
[ INFO] [1598000187.468103211]: CanServer::openCANPort: Opened a CAN port
[ INFO] [1598000187.915139324]: Result: 0
[ INFO] [1598000187.932190015]: Result: 0
[ INFO] [1598000188.012498456]: Result: 0
[ INFO] [1598000188.018163940]: Result: 0
can_server1: /home/master-wheel/ros_ws/src/can_server/include/can_server/o1heap.c:293: o1heapAllocate: Assertion `handle != ((void *)0)' failed.
==2251== 
==2251== Process terminating with default action of signal 6 (SIGABRT): dumping core
==2251==    at 0x4D34070: raise (in /usr/local/lib/libc.so.6)
==2251==    by 0x4D20D67: abort (in /usr/local/lib/libc.so.6)
==2251==    by 0x4D2D5DB: ??? (in /usr/local/lib/libc.so.6)
==2251==    by 0x4D2D643: __assert_fail (in /usr/local/lib/libc.so.6)
==2251==    by 0x1638AF: o1heapAllocate (in /home/master-wheel/ros_ws/devel/lib/can_server/can_server1)
==2251==    by 0x14473F: memAllocate(CanardInstance*, unsigned long) (in /home/master-wheel/ros_ws/devel/lib/can_server/can_server1)
==2251==    by 0x16158F: rxAcceptFrame (in /home/master-wheel/ros_ws/devel/lib/can_server/can_server1)
==2251==    by 0x161C0F: canardRxAccept (in /home/master-wheel/ros_ws/devel/lib/can_server/can_server1)
==2251==    by 0x1460EF: CanServer::Run() (in /home/master-wheel/ros_ws/devel/lib/can_server/can_server1)
==2251==    by 0x14503F: CanServer::CanServer(ros::NodeHandle&) (in /home/master-wheel/ros_ws/devel/lib/can_server/can_server1)
==2251==    by 0x146CF3: main (in /home/master-wheel/ros_ws/devel/lib/can_server/can_server1)
==2251== 
==2251== HEAP SUMMARY:
==2251==     in use at exit: 8,473,300 bytes in 611 blocks
==2251==   total heap usage: 2,239 allocs, 1,628 frees, 8,715,682 bytes allocated
==2251== 
==2251== LEAK SUMMARY:
==2251==    definitely lost: 16 bytes in 2 blocks
==2251==    indirectly lost: 0 bytes in 0 blocks
==2251==      possibly lost: 1,824 bytes in 6 blocks
==2251==    still reachable: 8,471,460 bytes in 603 blocks
==2251==         suppressed: 0 bytes in 0 blocks
==2251== Rerun with --leak-check=full to see details of leaked memory
==2251== 
==2251== Use --track-origins=yes to see where uninitialised values come from
==2251== For lists of detected and suppressed errors, rerun with: -s
==2251== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Iā€™m sorry but I canā€™t debug your code for you. The best suggestion I can make is to sit down and sift through the output carefully. Fixing memory issues takes time and dedication, itā€™s natural. For a start you can just follow the advice given by Valgring at the end of the output.

I am not asking you to debug my code but just asking whether this is caused by incorrectly using the libcanard which is obviously is and you being the active member I thought you could easily spot some stupid mistake. Nevermind, I will try to find how I am misusing the library.

1 Like

Ok I found a mistake if anyone encounters. I simply did not allocate the heap. I am using Linux so I removed the o1heap instance and simply rewrote the memalocation like this:

static void* memAllocate(CanardInstance* const ins, const size_t amount)
{
    //(void) ins;
    //return o1heapAllocate(heap_, amount);
    return malloc(amount);
}

static void memFree(CanardInstance* const ins, void* const pointer)
{
    free(pointer);
}

I should warn you that if the code you posted above is still in use, you did not actually fix the problem but made it less apparent, for now. You can clearly see that the pointer that was not supposed to be NULL became so, which implies memory corruption (stack corruption, most likely). Therefore your program is likely to crash again but in a different place this time.

Where do you mean the pointer is becoming NULL?

Here:

can_server1: /home/master-wheel/ros_ws/src/can_server/include/can_server/o1heap.c:293: o1heapAllocate: Assertion `handle != ((void *)0)' failed.

I am not using o1heap anymore, that is why I posted the above code that I only use malloc and free without any heap instances

It doesnā€™t matter. The problem that caused it to crash is obviously still there, because your change does not affect the root cause.

What do you mean? With no heap instance there is no way for my program to even call the o1heapAllocate, it does not even exist now. Previously, my program used the declared but not initialized instace:

static O1HeapInstance* heap_;

And during the canardRxAccept, it called the function from o1heapAllocate:
return o1heapAllocate(heap_, amount);

There, heap_ was NULLPTR and it crashed my program but now, that function does not exist anymore

You donā€™t understand. The crash was not caused by o1heap, it was caused by a broken API contract which was due to the problem in your code. You changed your code such that it no longer relies on that API, which made the problem disappear for now, but you did not fix the root cause of the broken contract.

Looking at your OP post again I can see that you are not checking the return value of read() which implies that you are handling garbage data, which is at least one of the critical issues.