I’d like to propose a new subproject for Cyphal: an extremely limited Ethernet/IP/UDP stack written in C++14 with a very small footprint (hopefully < 10k when finished). It would be an ideal stack for a microcontroller that does not need the bulk of lwIP (30k) or other stacks.
Basic Ethernet Frames - possibly VLANs in the future
ARP Requests and Responses
Local Addresses, Multicast, Private LAN, etc.
ICMP (echo request/response only)
IGMP (for alerting other networking equipment)
Multiple Interfaces and a Loadable Route Table
Hardware offload (or software) CRC/Checksum
As few frame copies as possible (none ideally)
No support for IPv4 header options
No support for TCP at all
No support for other ICMP subtypes
No POSIX Socket Model
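To illustrate the software side of the "CRC/Checksum" item above, here is a minimal sketch of the ones' complement Internet checksum (RFC 1071) that IPv4, ICMP, and UDP rely on. The name `InternetChecksum` is hypothetical and not taken from the proposed stack; a hardware-offload path would bypass this function.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (an assumption, not the stack's actual API):
// ones' complement checksum over a byte buffer, as described in RFC 1071.
inline std::uint16_t InternetChecksum(const std::uint8_t* data, std::size_t length) {
    std::uint32_t sum = 0;
    // Add up the buffer as 16-bit big-endian words.
    while (length > 1) {
        sum += (static_cast<std::uint32_t>(data[0]) << 8) | data[1];
        data += 2;
        length -= 2;
    }
    // A trailing odd byte is treated as the high byte of a zero-padded word.
    if (length == 1) {
        sum += static_cast<std::uint32_t>(data[0]) << 8;
    }
    // Fold the carries back into the low 16 bits.
    while ((sum >> 16) != 0) {
        sum = (sum & 0xFFFFu) + (sum >> 16);
    }
    return static_cast<std::uint16_t>(~sum & 0xFFFFu);
}
```

Running the function over a received IPv4 header with its checksum field left in place yields zero when the header is intact, which makes for a cheap validity check.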
UDP Registrar, Listeners, and Transmitters - This is the client interface layer.
IP Stack - holds the Route Table and implements a UDP Registrar and an IP Packet Listener
Interfaces - holds the IPv4 address, implements the IP Packet Transmitter and Ethernet Frame Listener
Devices - holds the MAC address and ARP Table, interface to control the Ethernet Controllers (HW), takes a reference to the Interface to push frames up.
Frame Pool - a statically allocated number of ethernet frames
Some global objects to decouple the stack from libc calls like printf, etc., along with a standard implementation which calls them.
Heavy constexpr usage to test some features at compile time.
C++14 w/ move semantics to prevent Frames from being accessed at multiple layers at the same time.
As few #defines as possible preferring namespace redirection or constexprs.
Has need of some common templates like span which could come from CETL
Cyphal, of course, would sit directly on top of this module and send/receive UDP datagrams using the Client Interface layers (UDP Registrar, UDP Listener, UDP Transmitter).
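The combination of a statically allocated Frame Pool and move-only frame ownership could be sketched as follows. Everything here (`Frame`, `StaticFramePool`, `Returner`, `ConsumeFrame`, the 1514-byte size) is a hypothetical illustration under the assumption that frames are handed out as `std::unique_ptr` with a custom deleter; the actual design may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>

// Assumed size: 14-byte Ethernet header + 1500-byte payload, no VLAN tag or FCS.
constexpr std::size_t kMaxFrameSize = 1514;

struct Frame {
    std::array<std::uint8_t, kMaxFrameSize> bytes;
    std::size_t length;
};

template <std::size_t Count>
class StaticFramePool {
public:
    // Deleter that returns the frame to the pool instead of freeing memory.
    struct Returner {
        StaticFramePool* pool;
        void operator()(Frame* frame) const { pool->Release(frame); }
    };
    using Handle = std::unique_ptr<Frame, Returner>;

    // Returns an empty handle when every frame is in use.
    Handle Acquire() {
        for (std::size_t i = 0; i < Count; ++i) {
            if (!in_use_[i]) {
                in_use_[i] = true;
                frames_[i].length = 0;
                return Handle(&frames_[i], Returner{this});
            }
        }
        return Handle(nullptr, Returner{this});
    }

    std::size_t Available() const {
        std::size_t free_count = 0;
        for (bool used : in_use_) {
            if (!used) { ++free_count; }
        }
        return free_count;
    }

private:
    void Release(Frame* frame) {
        const std::ptrdiff_t index = frame - frames_.data();
        assert(index >= 0 && static_cast<std::size_t>(index) < Count);
        in_use_[static_cast<std::size_t>(index)] = false;
    }

    std::array<Frame, Count> frames_{};
    std::array<bool, Count> in_use_{};
};

// A layer that takes ownership by value: the caller must std::move the handle in,
// so it cannot keep touching the frame afterwards.
inline std::size_t ConsumeFrame(StaticFramePool<2>::Handle frame) {
    return frame->length;  // The frame goes back to the pool when `frame` is destroyed.
}
```

Because the handle is move-only, passing a frame to the next layer leaves the previous layer with an empty handle, which is the compile-time-enforced exclusivity described above.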
Is it going to be possible to disable the ARP support at compilation time if it’s not needed? A Cyphal/UDP application that does not make use of other IP-based protocols does not need unicast support; hence ARP becomes redundant. It is a potentially significant saving because the ARP cache needs memory.
It would be nice if the frame buffers were heap-allocated instead as it would allow for a significant reduction of memory copying when passing data between different components of the application (e.g., between your stack and libudpard). We understand that “heap” does not mean “bad for real-time”; possibly the opposite if buffer copying is eliminated. Here, when I say “heap” I don’t mean malloc but rather o1heap or a similar constant-complexity low-fragmentation solution.
I don’t understand the entire design yet, but I would cautiously invite everyone not to avoid dynamic memory unnecessarily as its avoidance often costs in complexity (and in this case in buffer copying). Our o1heap, suitably applied, covers items a, b, c, e from the list below:
Item “d” is probably outside of the scope of the allocator itself but is addressable with smart pointers.
Do I understand correctly that the device layer is responsible for the construction of link-layer packets from outgoing IP packets, but the interface layer is responsible for the reverse operation — extraction of IP packets from link-layer packets? This is an interesting asymmetry.
ARP could be made optional, but I found it was needed to get the stack off the ground for a couple of different uses (primarily ICMP echoes, then some non-multicast UDP). I only put 5 entries in the ARP cache in my testing, so it’s rather small (16 bytes per entry), and the total functionality is tiny (<200 LOC), so removing it would not save much. Today it doesn’t support all ARP features, just responses to requests for its own IP address.
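For a sense of scale, a fixed-capacity ARP cache of the kind described can be sketched in a few lines. This is a hypothetical illustration, not the stack's code: the names `ArpCache` and `MacAddress`, the round-robin replacement policy, and the entry layout are all assumptions. The capacity is a template parameter, so its memory cost is fixed at compile time.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using MacAddress = std::array<std::uint8_t, 6>;

// Hypothetical fixed-size IPv4 -> MAC table with round-robin replacement.
template <std::size_t Capacity>
class ArpCache {
    static_assert(Capacity > 0, "an ARP cache needs at least one entry");

public:
    // Updates an existing mapping, or overwrites the next slot in rotation.
    void Insert(std::uint32_t ipv4, const MacAddress& mac) {
        for (auto& entry : entries_) {
            if (entry.valid && entry.ipv4 == ipv4) {
                entry.mac = mac;
                return;
            }
        }
        Entry& slot = entries_[next_];
        slot.ipv4 = ipv4;
        slot.mac = mac;
        slot.valid = true;
        next_ = (next_ + 1) % Capacity;
    }

    // Returns nullptr on a cache miss.
    const MacAddress* Lookup(std::uint32_t ipv4) const {
        for (const auto& entry : entries_) {
            if (entry.valid && entry.ipv4 == ipv4) {
                return &entry.mac;
            }
        }
        return nullptr;
    }

private:
    struct Entry {
        std::uint32_t ipv4;
        MacAddress mac;
        bool valid;
    };

    std::array<Entry, Capacity> entries_{};
    std::size_t next_ = 0;
};
```

An application that only uses multicast Cyphal/UDP would never instantiate the template, so the cache would cost nothing in that configuration.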
The frame pool itself is not required to be static or dynamic; there’s just an interface to provide it, and the default implementation is static. Once provided, it works w/ smart pointers and move semantics in the stack. I started with it in anticipation of needing to delay packets for state machines, delayed transmission, or retransmission, which so far I have been able to avoid, so the Frame Pool may vanish altogether. Since the call paths run straight through to the other side, the fanciness here is possibly an over-complication. Once we incorporate IGMP support, we’d re-evaluate the need for it.
The (near) zero-copy approach that the stack takes means that the same frame which holds the data received off the wire is passed all the way to the application. Since most platforms I work with are little-endian rather than Network Byte Order (big-endian), there’s a bit of “copy and flip” code per object (Ethernet Frame, IP Header, UDP Header, ICMP Datagram, etc.) to convert between host order and Network Byte Order. It does not touch the datagram itself and leaves that to clients.
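A “copy and flip” step for one header type might look like the sketch below, which converts a UDP header between its wire representation and a host-order struct. The names (`UdpHeader`, `LoadBe16`, `StoreBe16`, `ParseUdpHeader`, `WriteUdpHeader`) are hypothetical. Using shifts rather than pointer casts is one way to keep the code correct on both little- and big-endian hosts and free of alignment problems.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical host-order view of the 8-byte UDP header.
struct UdpHeader {
    std::uint16_t source_port;
    std::uint16_t destination_port;
    std::uint16_t length;
    std::uint16_t checksum;
};

// Reads a 16-bit big-endian (network order) value on any host.
constexpr std::uint16_t LoadBe16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((static_cast<unsigned>(p[0]) << 8) | p[1]);
}

// Writes a 16-bit value in big-endian (network order) on any host.
inline void StoreBe16(std::uint8_t* p, std::uint16_t value) {
    p[0] = static_cast<std::uint8_t>(value >> 8);
    p[1] = static_cast<std::uint8_t>(value & 0xFFu);
}

// Wire -> host: copies the four fields out of the frame, flipping as needed.
inline UdpHeader ParseUdpHeader(const std::uint8_t* wire) {
    return UdpHeader{LoadBe16(wire), LoadBe16(wire + 2), LoadBe16(wire + 4), LoadBe16(wire + 6)};
}

// Host -> wire: the reverse operation, used on the transmit path.
inline void WriteUdpHeader(const UdpHeader& header, std::uint8_t* wire) {
    StoreBe16(wire, header.source_port);
    StoreBe16(wire + 2, header.destination_port);
    StoreBe16(wire + 4, header.length);
    StoreBe16(wire + 6, header.checksum);
}
```

Only the eight header bytes are copied; the payload that follows them in the frame stays where it is.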
Receive Call Order:
Device::ProcessIncomingFrames() → grabs frame, extracts frame data from hardware, extracts frame header
Interface::OnReceiveAsFrame(frameheader, frame) → asks frame for subspan of IP packet, extracts IP header
Stack::OnReceiveAsPacket(ipheader, frame) → asks frame for subspan for UDP Datagram, extracts datagram header
udp::Listener::OnReceiveDatagram(addresses, ports, datagram&) → Cyphal copies into transfer reassembly layer
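The receive path above amounts to peeling one header per layer off a view into the same buffer. A minimal sketch of that idea follows; `ByteSpan` is a stand-in for a real span type (such as one from CETL), and the function names and checks are assumptions rather than the stack's actual code. For brevity it ignores the IP and UDP length fields, so any Ethernet padding would remain in the returned view.

```cpp
#include <cstddef>
#include <cstdint>

// Minimal non-owning view; a placeholder for a proper span type.
struct ByteSpan {
    const std::uint8_t* data;
    std::size_t size;

    // Returns the view starting at `offset`, or an empty view if out of range.
    ByteSpan Subspan(std::size_t offset) const {
        return (offset <= size) ? ByteSpan{data + offset, size - offset} : ByteSpan{nullptr, 0};
    }
};

constexpr std::size_t kEthernetHeaderSize = 14;  // destination MAC, source MAC, EtherType
constexpr std::size_t kIpv4HeaderSize = 20;      // fixed, because header options are unsupported
constexpr std::size_t kUdpHeaderSize = 8;

// Device -> Interface: accept only EtherType 0x0800 (IPv4) and strip the frame header.
inline ByteSpan IpPacketOf(ByteSpan frame) {
    if (frame.size < kEthernetHeaderSize) { return ByteSpan{nullptr, 0}; }
    if (frame.data[12] != 0x08 || frame.data[13] != 0x00) { return ByteSpan{nullptr, 0}; }
    return frame.Subspan(kEthernetHeaderSize);
}

// Interface -> Stack: accept only version 4 with IHL 5 (no options) and protocol 17 (UDP).
inline ByteSpan UdpDatagramOf(ByteSpan packet) {
    if (packet.size < kIpv4HeaderSize) { return ByteSpan{nullptr, 0}; }
    if (packet.data[0] != 0x45) { return ByteSpan{nullptr, 0}; }
    if (packet.data[9] != 17) { return ByteSpan{nullptr, 0}; }
    return packet.Subspan(kIpv4HeaderSize);
}

// Stack -> Listener: strip the UDP header, leaving the payload in place.
inline ByteSpan UdpPayloadOf(ByteSpan datagram) {
    if (datagram.size < kUdpHeaderSize) { return ByteSpan{nullptr, 0}; }
    return datagram.Subspan(kUdpHeaderSize);
}
```

Each step returns a narrower view of the same memory, so nothing is copied until the client decides to copy the payload.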
The transmit side starts with an Ethernet-frame-sized buffer and reverse packs in the datagram, then the UDP header, then the IP header, then finally the frame header. Since this is an extremely limited-functionality stack, there’s really only one way to do this. The future VLAN support would require the Ethernet Frame to grow by a short, so that would be a compile-time choice then.
udp::Transmitter::Transmit(address, ports, datagram&) (the client-facing interface; actually implemented in Stack) → allocates frame, copies from Cyphal’s transfer buffer into frame, constructs UDP header in frame, determines which Interface to route to
Interface::TransmitAsPacket(<ip metadata>, frame) → Constructs IP header in frame (handle loopback address)
Device::TransmitAsFrame(<frame metadata>, frame) → constructs the Frame header in the frame and transmits it (however the platform does that).
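One way to picture the reverse packing on the transmit side is a buffer with headroom reserved in front of the payload, where each layer prepends its header by moving a cursor backwards. The sketch below is hypothetical (`FrameBuilder`, `SetPayload`, `Prepend`, and the constants are illustrative names and assume no VLAN tag and no IPv4 options); it shows only the layout mechanics, not header contents.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kFrameCapacity = 1514;       // assumed maximum frame size without VLAN tag
constexpr std::size_t kHeadroom = 14 + 20 + 8;     // Ethernet + IPv4 (no options) + UDP headers

class FrameBuilder {
public:
    // Copies the payload behind the reserved headroom; fails if it does not fit.
    bool SetPayload(const std::uint8_t* payload, std::size_t size) {
        if (size > kFrameCapacity - kHeadroom) { return false; }
        std::memcpy(buffer_.data() + kHeadroom, payload, size);
        begin_ = kHeadroom;
        end_ = kHeadroom + size;
        return true;
    }

    // Reserves `size` bytes directly in front of the current contents and
    // returns a pointer where the caller writes that layer's header.
    std::uint8_t* Prepend(std::size_t size) {
        assert(size <= begin_);
        begin_ -= size;
        return buffer_.data() + begin_;
    }

    const std::uint8_t* data() const { return buffer_.data() + begin_; }
    std::size_t size() const { return end_ - begin_; }

private:
    std::array<std::uint8_t, kFrameCapacity> buffer_{};
    std::size_t begin_ = kHeadroom;
    std::size_t end_ = kHeadroom;
};
```

In this arrangement the payload is copied exactly once, and the UDP, IP, and Ethernet headers are written in place in that order as the frame travels down the layers.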
I want to note that Device is just a C++ pure virtual interface, not a supplied class like the network interfaces or the stack. It’s glue code that a platform needs to write given the primitives supplied by MicroNet and their OS/FW.
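As a rough picture of what such glue code could look like, the sketch below declares a pure virtual `Device` and a loopback test double. The method names `ProcessIncomingFrames`, `TransmitAsFrame`, and `OnReceiveAsFrame` come from the call order described earlier; their signatures, and the `FrameListener`, `LoopbackDevice`, and `CountingListener` types, are assumptions made for illustration only.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Upward-facing callback; the Interface layer would play this role.
class FrameListener {
public:
    virtual ~FrameListener() = default;
    virtual void OnReceiveAsFrame(const std::uint8_t* frame, std::size_t size) = 0;
};

// Contract that platform glue code implements (signatures are assumed).
class Device {
public:
    virtual ~Device() = default;
    virtual void ProcessIncomingFrames() = 0;
    virtual bool TransmitAsFrame(const std::uint8_t* frame, std::size_t size) = 0;
};

// Hypothetical test double: a transmitted frame is delivered back on the next poll.
class LoopbackDevice final : public Device {
public:
    explicit LoopbackDevice(FrameListener& listener) : listener_(listener) {}

    bool TransmitAsFrame(const std::uint8_t* frame, std::size_t size) override {
        if (pending_ || size > buffer_.size()) { return false; }
        std::memcpy(buffer_.data(), frame, size);
        size_ = size;
        pending_ = true;
        return true;
    }

    void ProcessIncomingFrames() override {
        if (pending_) {
            pending_ = false;
            listener_.OnReceiveAsFrame(buffer_.data(), size_);
        }
    }

private:
    FrameListener& listener_;
    std::array<std::uint8_t, 1514> buffer_{};
    std::size_t size_ = 0;
    bool pending_ = false;
};

// Listener that only records what it was given.
class CountingListener final : public FrameListener {
public:
    void OnReceiveAsFrame(const std::uint8_t*, std::size_t size) override {
        ++frames;
        bytes += size;
    }
    std::size_t frames = 0;
    std::size_t bytes = 0;
};
```

A real port would replace `LoopbackDevice` with code that drives the Ethernet controller's DMA descriptors or the OS raw-socket equivalent, while the rest of the stack stays unchanged.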
Also note that IP packet fragmentation would be an unsupported feature. Hopefully we weren’t counting on that in Cyphal/UDP?
Does the application receive a smart pointer, or is it still required to copy the frame into its own buffer? What I’m after is to be able to pass the frame payload into libudpard without copying (possibly by releasing it from the smart pointer) and the other way around. This is expected to be highly beneficial for single-frame transfers.
For multi-frame transfers, one copy will be necessary, though (from the frame buffer into the contiguous reassembly buffer), as gather-scatter buffers are a bit difficult to work with in bare C (in Python, we use multi-fragment buffers specifically to eliminate all data copying throughout the stack, end-to-end, and Nunavut serialization is designed around it).