Cyphal/UDP Routing over Multiple Networks

I think the solution I described above would lie between Cyphal and the application layer. I’m not sure we should require users of the protocol to have to come up with a way to send a transfer from one CAN bus to another CAN bus or to an Ethernet bus.

From my perspective, if we provide a way to send broadcast messages through the router node, we should also provide a way to support RPC-style messages for the sake of completeness.

I don’t think talking to the entire Internet is a hard requirement. I was just pointing out that the solution above is extensible enough to allow it. Perhaps, @scottdixon or @schoberm could comment more here on what our requirements are.

This seems to imply that the application layer has to know about the gateway by selecting a meta-message type? Maybe I don’t understand the proposal.

May 25th meeting on VC (Jitsi Meet)

We discussed the design for Cyphal/UDP and how we can converge on an initial proposal and begin drafting a Cyphal/UDP section of the specification. Attendees included:

@pavel.kirienko
@lydiagh
@scottdixon
@schoberm
@erik.rainey

Notes:

  • Cyphal/UDP should be optimized for lab scenarios rather than integrated vehicle system scenarios.
    • Zero-touch configuration is a goal. Ideally you’d just plug a vehicle system into a consumer-level Ethernet switch, open yakut on your laptop attached to the same switch, and it just works.
    • We need a story about deep inspection of nodes. How do we get a full view of a system from a UDP segment such that we don’t hide CAN nodes from diagnostic tools? This should also cover logging: ensuring we can record the full state of the networks over time from a UDP segment (TBD: whether you can do the same from any network segment of a mixed CAN/UDP system).
    • RPC is needed for software update and other RPC calls typical in test, diagnostic, or maintenance mode (whatever you want to call it).
  • IGMP
    • Pavel suggested multicast should degrade to broadcast on most cheap routers.
    • Need to bottom out on the issues with IGMP/multicast reported by some macOS users.
    • May need a broadcast option for cases where firewall rules prevent IGMP.
  • ARINC-825
    • TODO: read ARINC-825 section 6 to see if there are obvious patterns we should follow.
  • Routing Service proposal

Is the shared node-ID space across all segments a dealbreaker or not? If not, then the bridge might be an adequate solution to, it would appear, all (?) of the listed points (this would obviate the need for @lydiagh’s proposal).

Can’t help with the macOS multicast issue but I don’t expect it to be a major blocker.

I think that’s right, Scott. This might require coordination at the application level, but I think there are still relevant requirements we could add to the Cyphal specification itself, considering that the current specification has a section dedicated to the Application layer. We could also define a routing service and assign it a fixed service ID like the GetInfo service.

Also, wanted to call out a silly typo in my original post to avoid confusion. In the section where I talk about node routing tables:

This should actually be:

Bus 0 NID 1:
Bus 0 → local
* → Bus 0 NID 3 (star means all others here)

By shared node-ID space does that mean that across all network segments we would only be able to have 128 nodes? So, for example, if we had three network segments - CAN bus 0, CAN bus 1, and Ethernet bus 1 - CAN bus 0 could hypothetically have 43 nodes, CAN bus 1 could have 42 nodes, and the Ethernet bus could have 43 nodes.

All nodes that want to communicate across network boundaries need unique node-IDs. However, nodes that don’t need to communicate across boundaries can live in a local-only subset of the shared space (e.g. bus 0 node-IDs 1-50 are local only and bus 1 node-IDs 1-50 are local only, while bus 0 and bus 1 share node-IDs 51-127, which are globally unique). On a UDP-only node the node-ID can be 128 or higher if it doesn’t need to transfer to CAN nodes.
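
As a sketch of this partitioning rule (the split points 50/51 and the 7-bit ceiling of 127 are just the example values from this post):

```python
# Sketch of the shared/local node-ID partitioning described above.
# The split points (50/51) are illustrative, not a settled convention.
LOCAL_ONLY_MAX = 50   # IDs 1..50 may repeat on every CAN bus
SHARED_MAX = 127      # IDs 51..127 must be globally unique

def may_cross_bridge(node_id: int) -> bool:
    """A node may communicate across network boundaries only if its ID
    lies in the globally unique (shared) range."""
    return LOCAL_ONLY_MAX < node_id <= SHARED_MAX

def check_global_uniqueness(buses: dict) -> set:
    """Return shared-range IDs that appear on more than one bus (collisions).
    Local-only IDs are allowed to repeat, so they are ignored."""
    seen: dict = {}
    for ids in buses.values():
        for nid in ids:
            if may_cross_bridge(nid):
                seen[nid] = seen.get(nid, 0) + 1
    return {nid for nid, count in seen.items() if count > 1}
```

For example, `check_global_uniqueness({"bus0": {3, 60}, "bus1": {3, 60}})` flags only node 60, because node 3 falls in the local-only range where duplication is allowed.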

The bridge will throw out messages that are incompatible due to node-ID collisions, etc.


Update on why pycyphal/UDP doesn’t work on macOS:

I verified this is the issue.


My apologies if I misuse RPC/service transfer, etc.

edit/disclaimer: if certain node IDs are already reserved we can just choose different numbers; this is mostly for example and discussion.

What about a convention/heuristic for routing or bridging that converts between CAN node-IDs and IP-address-derived node-IDs? edit: and put routing and bridging together as a gateway.

Assuming IP address format A.B.C.X

ABC defines the target network (as usual in IP) where A and B are static for local networks and C varies as necessary

X: 0-125 defines a CAN-only node or a UDP-only node
X: 126 is reserved for external traffic
X: 127 defines a CAN/UDP gateway (bridge/router)
X: 128-255 defines a UDP-only node
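
A sketch of how a node might classify the last octet under this convention (the ranges are the proposal’s illustrative values; I’ve assumed the UDP-only range is 128-255, since 126 and 127 are already assigned and an IPv4 octet tops out at 255):

```python
# Sketch of the A.B.C.X addressing heuristic proposed above.
# The exact ranges are for discussion, not a settled convention.
def classify_last_octet(x: int) -> str:
    if not 0 <= x <= 255:
        raise ValueError("not an IPv4 octet")
    if x <= 125:
        return "CAN-or-UDP node"   # node-ID fits the 7-bit CAN space
    if x == 126:
        return "external traffic"  # reserved for off-segment peers
    if x == 127:
        return "gateway"           # CAN/UDP bridge-router
    return "UDP-only node"         # 128..255: no CAN counterpart
```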

Gateway responsibilities:

  1. Route/Forward broadcast messages
  2. Bridge for RPC / service
  3. Push network traffic to a designated logging node

With the transport + gateway/bridge/router we have the following:

CAN-only nodes can only initiate RPC with nodes on their own network, regardless of transport, via the gateway, as long as the node IDs are compatible. E.g. Net_0 Node_1 can initiate an RPC over CAN to Net_0 Node_57 via Net_0 Node_127, but not to Net_0 Node_132.

A broadcast from Net_1 Node_132 will be received as a broadcast from Net_1 Node_127 to CAN nodes on Net_1 (CAN Bus 1). That same broadcast would be received as a broadcast from Net_0 Node_127 to CAN nodes on Net_0 (CAN Bus 0).

All traffic on UDP nodes is forwarded to the Net_2 Node_n logger. CAN traffic on Net_1 would be forwarded to the logger by the Net_1 Node_127 gateway node, and similarly for Net_0 and its gateway.

For UDP-to-CAN RPC we can use a reserved node ID (or set of reserved IDs) so as not to confuse the CAN bus with external traffic. For example, with a laptop (but this could be any UDP node):

Laptop connects to Net_2 as Node_1
Laptop starts communications with Net_1 Node_n via Net_1 Node_127
Gateway Net_1 Node_127 sets Net_2 Node_1 as Net_1 Node_126 to CAN Bus 1
Net_1 Node_n uses destination Node_126 and source Node_n for RPC
Gateway converts Node_126 back to Net_2 Node_1

If we didn’t use a reserved Node ID then Net_1 Node_n would use destination Node_1 and source Node_n which might confuse Net_1 Node_1?

Alternatively it could use Net_1 Node_127 (the gateway) as the reserved external traffic node. Or perhaps reserve 123-126 (multiple external traffic nodes).
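
The steps above amount to a NAT-like lease of reserved local IDs. A minimal sketch, assuming a single reserved ID (126) per the example (the class and method names are mine, purely for illustration):

```python
# Sketch of the NAT-like mapping the gateway performs above: external
# (network, node-ID) pairs are leased a reserved local ID (here just
# 126, per the example; a pool like 123..126 would work the same way)
# so CAN nodes see a valid source in their own ID space.
class GatewayNat:
    RESERVED = [126]  # reserved external-traffic node-IDs on the CAN side

    def __init__(self) -> None:
        self._fwd = {}  # (net, node-ID) -> leased local node-ID
        self._rev = {}  # leased local node-ID -> (net, node-ID)

    def to_local(self, net: str, nid: int) -> int:
        """Map an external node onto a reserved local CAN node-ID."""
        key = (net, nid)
        if key in self._fwd:
            return self._fwd[key]
        free = [r for r in self.RESERVED if r not in self._rev]
        if not free:
            raise RuntimeError("reserved node-ID pool exhausted")
        self._fwd[key] = free[0]
        self._rev[free[0]] = key
        return free[0]

    def to_external(self, local_nid: int):
        """Translate a reply addressed to the reserved ID back out."""
        return self._rev[local_nid]
```

So the laptop (Net_2 Node_1) is seen by CAN Bus 1 as Node_126, and the gateway translates replies to Node_126 back to Net_2 Node_1. The statefulness concern raised below is visible here: the lease table is per-gateway mutable state.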

Maybe I am thinking about this wrong and this is becoming too stateful? There may be some more optimal ways to actually do this with existing message types, etc.

edit: we could probably just use ports instead of reserved Node IDs, right?

If I am reading you correctly, you are trying to solve the problem of the limited address space in the case of the bridging approach (based on snooping/spoofing). My understanding is that the only remaining issue with the bridge is the flat shared address space across all CAN segments.

Would it be a similar outcome if we were to extend the bridge example with a new parameter defined per bridge, let’s call it a node-ID mask, such that transfer forwarding in either direction (downlink/uplink) alters the source & destination node-IDs as follows:

src_node_id = (src_node_id ^ node_id_mask)  # no effect if anonymous
dst_node_id = (dst_node_id ^ node_id_mask)  # no effect if broadcast

Normally, the 7 least significant bits of the mask should be zero, but this is not strictly required.

The degenerate case where node_id_mask = 0 equals the current bridge PoC. Usage of non-zero and unique node-ID masks per bridge allows you to have multiple CAN segments connected to the same UDP segment such that nodes belonging to different CAN segments are unable to communicate/collide with each other, while nodes on the UDP segment can communicate with any CAN nodes that share the same mask.

The critical limitation is that a UDP node can only send transfers to CAN nodes that are behind a bridge whose node_id_mask equals (the UDP node’s own node-ID & 0xFF80). In the old bridge PoC, a UDP node was unable to send transfers to any CAN nodes unless its own node-ID is less than 128 (which is the degenerate case of node_id_mask=0).
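
A minimal sketch of this XOR translation and the reachability condition, assuming `None` stands in for an anonymous source or broadcast destination (the real wire encoding differs):

```python
# Sketch of the per-bridge XOR translation described above. The low
# 7 bits of node_id_mask are normally zero so CAN-side IDs stay in 0..127.
def translate(node_id, node_id_mask: int):
    """Applied symmetrically in both directions (downlink and uplink)."""
    if node_id is None:
        return None  # no effect if anonymous source / broadcast destination
    return node_id ^ node_id_mask

def udp_node_can_reach_can(udp_node_id: int, bridge_mask: int) -> bool:
    """The limitation stated above: a UDP node can reach only CAN nodes
    behind a bridge whose mask equals its own node-ID & 0xFF80."""
    return (udp_node_id & 0xFF80) == bridge_mask
```

Because XOR is its own inverse, the same function maps a CAN-side ID up to the UDP segment and back again, e.g. UDP node 0x205 appears as node 5 behind a bridge with mask 0x200, and vice versa.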

I implemented this idea here:

Usage of a single gateway node will not work in the bridge scenario because it is likely to cause a transfer-ID collision. Say, if UDP nodes X and Y send a transfer with the same transfer-ID, on the other side of the bridge the second transfer will look like a duplicate of the first.
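
The collision can be shown with a toy receiver that deduplicates on (source node-ID, transfer-ID), which is a simplification of the real transfer reassembly logic:

```python
# Sketch of the transfer-ID collision described above: if a single
# gateway re-originates transfers from two distinct UDP nodes under its
# own node-ID, a receiver deduplicating on (source, transfer-ID) drops
# the second transfer as a duplicate of the first.
def dedup(frames):
    """Naive receiver: accept only the first transfer per (src, transfer_id)."""
    seen, accepted = set(), []
    for src, transfer_id, payload in frames:
        if (src, transfer_id) not in seen:
            seen.add((src, transfer_id))
            accepted.append(payload)
    return accepted

# UDP nodes X and Y both happen to use transfer-ID 7.
# Via a single gateway (node 127) re-originating both as itself:
via_gateway = [(127, 7, "from X"), (127, 7, "from Y")]
assert dedup(via_gateway) == ["from X"]  # Y's transfer is silently lost

# The XOR-mask bridge preserves distinct source IDs, so both survive:
via_bridge = [(72, 7, "from X"), (73, 7, "from Y")]
assert dedup(via_bridge) == ["from X", "from Y"]
```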

Throwing my 2 cents in:

Expanding this notion into something more like “Cyphal Forwarding Across Heterogeneous Segments/Transports”, I think at a high level (above transports like CAN or UDP) the required pieces of information are:

  • a table which correlates the Subject ID to a set of Transports which can accept that broadcast (i.e. a “Forwarding” Table)
  • a table which correlates each unique Node ID (regardless of bit depth) to the Transport it is located on (i.e. a “Node” Table).
    • Each Transport may have some Node ID Mask to indicate the bit space of the Nodes on that Transport.
  • a table which correlates each forwardable Subject ID with its originating Transport (network segment) (i.e. a Subscription List) [optional]

Using these, the Bridge can subscribe to Broadcasts on the appropriate Transports using the Subscription List (or on all Transports if none is given). When a Message is received, it simply needs to be looked up in the Forwarding Table to find the set of Transports to use (subtracting the originating Transport, found either via the Subscription List or via the Node Table keyed by the originating Node ID). If the originating Node ID is transmittable on the other Transports, the Broadcast is sent using the originating Node ID. Incidentally, the Message does not need to be deserialized in this scheme as long as the metadata (priority, etc.) is propagated to the transmitting Transports.
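
The forwarding decision itself is a small lookup. A sketch, with illustrative table contents (the subject-ID and node-IDs here are arbitrary):

```python
# Sketch of the bridge's forwarding decision using the tables above.
# Table contents are illustrative examples, not real port assignments.
forwarding_table = {1010: {"can0", "can1", "udp0"}}  # subject-ID -> transports
node_table = {42: "can0", 300: "udp0"}               # node-ID -> home transport

def forward_targets(subject_id: int, src_node_id: int) -> set:
    """Transports a received broadcast should be re-emitted on: every
    transport registered for the subject, minus the originating one
    (looked up via the Node table)."""
    targets = set(forwarding_table.get(subject_id, ()))
    targets.discard(node_table.get(src_node_id))
    return targets
```

E.g. a broadcast on subject 1010 from node 42 (homed on `can0`) gets re-emitted on `can1` and `udp0` only; an unregistered subject forwards nowhere.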

If the originating Node ID is preserved, the Cyphal notion of a Node ID, regardless of Transport, has to be unique and equally sized across Transports for this to work. By allowing UDP Node IDs a larger range, there isn’t a way to preserve the UDP originator on the CAN Transport, and the information would be lost (even if a flag existed to indicate its loss). CAN-to-UDP forwarding can easily preserve a larger-depth Node ID w/ some special subnet for CAN devices.

There can be multiple Bridges across multiple segments as long as each Bridge has these tables. Two CAN segments in this forwarding scheme would have to share the same limited range however. UDP segments are free to have different top level subnets when Node ID bit depth is larger.

UDP based Node Publishers which wish to be forwardable to CAN must exist within the 7 bit Node ID limit. All CAN Node Message originators would be forwardable to UDP Nodes.

While source Node ID may not be the most critical piece of information on a Transport it does help disambiguate multiple Broadcasters in the case Pavel mentioned.

I presume the tables you described are to be configured explicitly at the time when the bridge is commissioned, is that not so? The discussion in this thread is focused on purely automatic solutions unless I misunderstood the basic objectives.

This is a predefined set of knowledge, yes. For any automatic support, these tables in the Bridge could be populated at run time, if the Bridge has no previous knowledge of the Node ID topology or the Subject IDs of such nodes, by passively sniffing the Heartbeat (it should take a few seconds to populate each segment the Bridge connects to). Then the Forwarding Table can be configurable via a Service interface, unless the protocol itself marks each message as “forwardable” (with a flag; however, it would be assumed to be forwardable everywhere it didn’t come from). This mechanism is equivalent to IGMP multicast notifications to routers. Conversely, some Node, not necessarily the originator, could query the Bridge to insert a forwarding entry into its table. It could be up to the Bridge to disallow this after a certain point or from random Node IDs.
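
The passive-population step could look something like this sketch, assuming frames arrive as already-parsed (transport, subject-ID, source node-ID) tuples; 7509 is the fixed subject-ID of `uavcan.node.Heartbeat`:

```python
# Sketch of passive Node-table population: the bridge learns which
# transport each node lives on by watching Heartbeats (published at
# 1 Hz on the fixed heartbeat subject) on every attached transport.
HEARTBEAT_SUBJECT_ID = 7509  # fixed port-ID of uavcan.node.Heartbeat

node_table = {}  # node-ID -> transport it was first heard on

def on_frame(transport: str, subject_id: int, src_node_id) -> None:
    """Feed every received frame through this; only Heartbeats from
    non-anonymous sources contribute to the table."""
    if subject_id == HEARTBEAT_SUBJECT_ID and src_node_id is not None:
        node_table.setdefault(src_node_id, transport)
```

Since Heartbeats arrive at 1 Hz, this does populate each segment within a few seconds, as noted above.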

The Bridge Service could report how many transports/segments are available (as a mask), query/modify existing forwards, and add/sub a Subject ID as a forwardable message. The pre-compiled-table version of a Bridge would simply come with a completely fleshed-out Forwarding Table which would return failures on modify/add/sub.

@union
# Empty
GetTransportMask get_mask
# Empty
GetForwardTableEntryCount get_count
# Contains the index which must be < size <= capacity
GetForwardTableEntryByIndex get_entry
# Contains a ForwardTableEntry and index (modify on existing or add when index == size but < capacity)
# Subtractions could remove an index by setting it to blank fields.
SetForwardTableEntryByIndex set_entry
---
@union
# Contains a bit mask of a set of Transports (or some other set topology)
TransportMask full_mask
# Contains active size plus capacity of the table where size <= capacity
ForwardTableEntryCount count 
# Contains the Subject ID + Transport Mask
ForwardTableEntry entry
# Success or Failure of Setting the Entry
StatusCode code

As we just discussed at the call, the current bridge demo is incompatible with the security requirements. We should explore the possibility of extending the bridge with additional configuration parameters to address that.

Extending the protocol with additional services or states that modify the forwarding behaviors is possible but there might be some risk of defeating the advantages provided by the DCPS architecture.

This Bridge Service as constructed above does have some downsides, in that a Node which is modifying the Forwarding Table needs special knowledge about each segment a Transport is servicing. In the Transport Mask, each available Transport is simply a bit in a mask with no information as to what it is or why it should be enabled. This may necessitate a Service to query a Bridge Node about its segments, which could communicate other important information such as its Node-ID-to-Transport table per transport.

The effect of adding these services and features, however, is to convert the Bridge from needing a static table to possibly having a dynamic table if the implementer wishes. Moreover, it just moves the locus of static knowledge from the Bridge to some other Node (perhaps a script or a human behind dev tools).

The more we design this to be flexible, the more it seems like some form of SNMP (Simple Network Management Protocol).

Only a little of this addresses the original ask of automatic behaviors. If we wish the publishing and receiving nodes to be unaware of the network topology themselves, then that forces us into a position where the Bridges must have these forwarding notions built in (either statically or dynamically), by listening for the Port List and forwarding everything by default, or some subset by configuration.

The security aspect forces us to consider that fully automatic forwarding of all information may not be a good idea.

I think it would be useful to extend the bridge demo I linked above with optional statically configured forwarding masks. The configuration can be set up using dedicated registers (e.g., bit arrays per interface selecting which subject/service to forward or something similar); this way the configuration can be modified at runtime by the system integrator by merely changing the value of the register. The default configuration could be to allow forwarding of everything everywhere.
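
A sketch of that register-driven configuration, with one bit array per interface gating forwarding by subject-ID (the register names are hypothetical; 8192 is the size of the Cyphal subject-ID space; the default allows everything everywhere, as suggested):

```python
# Sketch of per-interface forwarding masks held in registers. Register
# names ("bridge.forward.<iface>") are hypothetical, for illustration.
SUBJECT_ID_COUNT = 8192  # Cyphal subject-IDs span 0..8191

registers = {
    "bridge.forward.can0": [True] * SUBJECT_ID_COUNT,
    "bridge.forward.udp0": [True] * SUBJECT_ID_COUNT,
}

def should_forward(dst_iface: str, subject_id: int) -> bool:
    """Forward a transfer onto dst_iface only if that interface's
    mask register has the bit for this subject-ID set."""
    mask = registers.get(f"bridge.forward.{dst_iface}")
    return mask is not None and mask[subject_id]

# The system integrator can block a subject on one interface at
# runtime by merely flipping its bit in the register:
registers["bridge.forward.can0"][1234] = False
```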

That’s a good start.