Cyphal/UDP Routing over Multiple Networks

This is my understanding also. Considering our discussion so far there is nothing that would have to be manifested in the transport layer design.

Per my proposal above both options are acceptable.

I think you mean service transfers (there is no such thing as a service request/response message). I don’t quite understand how you expect a service transfer to cross a router if its destination node-ID is not valid on the other side of the router.

The router idea is super-interesting. This simple approach using IGMP is a great start, and it provides a clear path to 802.1Qcc (SRP), 802.1CB (FRER), and other TSN protocols. Generally, this is what I’m looking for: a clear path to mapping Cyphal/CAN ports to TSN streams. This requires translation rather than encapsulation in order to maintain hardware acceleration for anything that isn’t a boundary node (i.e. we have to accept Cyphal specialization somewhere between CAN and Ethernet, but once we route onto an Ethernet segment we should expect no further specialization unless/until the packet reaches another Cyphal/Ethernet<->Cyphal/CAN router).

The lack of RPC seems inelegant but, perhaps, acceptable. This is something I’ll need to consider.

My impulse is that a router is a special Cyphal node and that it is acceptable to require a valid network architecture without loops as a simplifying factor for production systems; however, such a requirement may make experimental configurations more problematic. As such, we should look to Ethernet for a solution. Don’t the multicast protocols provide mechanisms for detecting and avoiding cyclic paths?

Not to be overly pedantic but did you mean “mapping Cyphal ports to TSN streams”? My point is that there is nothing special to Cyphal/CAN ports as opposed to Cyphal/anything ports. One of the core design objectives should be to ensure a clear boundary between transport-specific features and abstract Cyphal concepts like ports.

Indeed. I want Cyphal/(TSN|UDP) to be a first-class transport that has value on its own, beyond merely tunneling Cyphal/CAN through Ethernet networks.


Absolutely. You are correct sir. Thanks.

Using Scott’s diagram from a couple of posts ago:


Regarding the problem of the “router node” approach not handling service transfers, one possibly naive solution might be to encapsulate the protocol for cross-network communication so that it carries the extra bits needed for the transport layer. So, let’s say NID1 on bus 0 wants to send a transfer to NID1 on bus 1; we can use a special routing service-ID X. Each node should have a routing table. So in the graph we will have:

Bus 0 NID 1:
Bus 0 → local

* → Bus 0 NID 3 (star means all others here)

Bus 0 NID 3:
Bus 0 → local
Eth → local
Bus 1 → Bus 1 NID 3

Bus 1 NID 1:
Bus 0 → Bus 1 NID 3
Eth → Bus 1 NID 3
Bus 1 → local

To send a service transfer, Bus 0 NID 1 will craft the following request:

service(type X, src:NID1, dst:NID3 [srcBus: bus0, dstBus: bus1, actual service(type Y, src:NID1, dst:NID1)])

where [ ] is the data field.

Bus 0 NID3 will receive this transfer on the local bus (bus 0) and see that it is a routing service type X, then it will look at the data and see that the destination bus is Bus 1. Bus 0 NID3 will look at its routing table and determine that it needs to send the transfer over UDP to Bus 1 NID3. The data that was sent to it is going to be in the data section of the UDP transfer. When Bus 1 NID 3 receives the transfer, it will look inside the data and see that it needs to be sent to the local bus 1 NID1. The bus 1 NID 1 will have enough information to send a response to bus 0 NID 1.
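
To make the forwarding step concrete, here is a rough Python sketch of what Bus 0 NID 1 and the router at NID 3 would do. Everything in it (the service-ID value, the Transfer type, the header encoding, the routing table layout) is made up for illustration and is not part of any existing Cyphal API:

```python
from dataclasses import dataclass

ROUTING_SERVICE_ID = 510  # hypothetical "routing service type X"


@dataclass
class Transfer:
    service_id: int
    src_nid: int
    dst_nid: int
    payload: bytes


# Routing table of Bus 0 NID 3 (the router), as in the post above:
# destination bus -> "local" or the (bus, node-ID) next hop.
ROUTES_BUS0_NID3 = {
    "bus0": "local",
    "eth": "local",
    "bus1": ("bus1", 3),  # forward over UDP to Bus 1 NID 3
}


def encapsulate(src_bus: str, dst_bus: str, inner: Transfer, router_nid: int) -> Transfer:
    """What Bus 0 NID 1 does: wrap the real type-Y transfer inside a type-X request."""
    header = f"{src_bus}:{dst_bus}:{inner.service_id}:{inner.src_nid}:{inner.dst_nid}:".encode()
    return Transfer(ROUTING_SERVICE_ID, inner.src_nid, router_nid, header + inner.payload)


def route(table: dict, outer: Transfer) -> str:
    """What a router does with a type-X transfer: look inside the data, not at the frame."""
    fields = outer.payload.split(b":", 5)
    src_bus, dst_bus, svc, src, dst = (f.decode() for f in fields[:5])
    inner_payload = fields[5]
    next_hop = table[dst_bus]
    if next_hop == "local":
        return f"deliver service {svc} from NID {src} to local NID {dst} ({inner_payload!r})"
    bus, nid = next_hop
    return f"forward the encapsulated transfer to {bus} NID {nid}"


inner = Transfer(service_id=77, src_nid=1, dst_nid=1, payload=b"request-bytes")
outer = encapsulate("bus0", "bus1", inner, router_nid=3)
print(route(ROUTES_BUS0_NID3, outer))  # -> forward the encapsulated transfer to bus1 NID 3
```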

The way to think of this approach is that we are recreating the IP address of each node by using some extra bits in the data field for the network mask. So, if we operate in 192.168.x.x, then we can say that bus 0 is 192.168.0.x, bus 1 is 192.168.1.x, and the Ethernet network is 192.168.2.x, and every node-ID can be mapped directly to an IP and back (simplified by keeping it to a 24-bit mask, but a 25-bit mask fits the 128 NID/IP limit better).

If we set the number of bits for the src/dst bus in the above service transfer to 25 bits, we could talk to the entire internet(!), with the only limitation that a local network on a bus cannot have more than 128 IPs. As an optimization, we could choose a number smaller than 25 bits by fixing the first Y bits (e.g. fixing 192.168.0 as the prefix would allow us to talk to two networks with a single-bit overhead).
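
A minimal sketch of this mapping, assuming the 192.168.x.x example and a fixed bus-to-third-octet assignment (the function names and table below are hypothetical):

```python
import ipaddress

BUS_TO_SUBNET = {"bus0": 0, "bus1": 1, "eth": 2}
SUBNET_TO_BUS = {v: k for k, v in BUS_TO_SUBNET.items()}


def nid_to_ip(bus: str, node_id: int) -> ipaddress.IPv4Address:
    if not 0 <= node_id < 128:  # keep to the 128 node-ID-per-bus limit
        raise ValueError("node-ID must fit the 7-bit range")
    return ipaddress.IPv4Address(f"192.168.{BUS_TO_SUBNET[bus]}.{node_id}")


def ip_to_nid(ip: ipaddress.IPv4Address) -> tuple:
    third, fourth = ip.packed[2], ip.packed[3]
    return SUBNET_TO_BUS[third], fourth


print(nid_to_ip("bus1", 1))             # 192.168.1.1
print(ip_to_nid(nid_to_ip("bus1", 1)))  # ('bus1', 1)
```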

A nice feature of this approach is that it adds no overhead for intra-bus communication. You only pay for cross-network communication if you request it.

The uavcan.metatransport namespace appears somewhat related to what you described but it approaches the problem differently, by simply tunneling transport frames on dedicated subjects:

https://nunaweb.opencyphal.org/api/storage/docs/docs/uavcan/index.html#uavcan_metatransport

These matters might be out of the scope of the Cyphal/UDP transport design though, as @schoberm pointed out above. If we want to tunnel things at the application layer then it matters little whether the underlying transport is CAN, UDP, or pigeons.

What are the specific use cases for RPC-service forwarding through the router nodes? In my understanding, they are to act as logical isolators between network segments, emitting/consuming data to/from topics on their own behalf. That is, in the view of a subscriber consuming data from a topic published by a router node, it is the router node itself that is the data provider on this topic and not some hidden agent on a different network segment. Is this not compatible with your requirements? Is, by any chance, talking to the entire Internet a hard requirement? (in that case, perhaps, it should be addressed differently)

I think the solution I described above would lie between Cyphal and the application layer. I’m not sure we should require users of the protocol to come up with their own way to send a transfer from one CAN bus to another CAN bus or to an Ethernet bus.

From my perspective, if we provide a way to send broadcast messages through the router node, we should also provide a way to support RPC-style messages for the sake of completeness.

I don’t think talking to the entire Internet is a hard requirement. I was just pointing out that the solution above is extensible enough to allow it. Perhaps, @scottdixon or @schoberm could comment more here on what our requirements are.

This seems to imply that the application layer has to know about the gateway by selecting a meta-message type? Maybe I don’t understand the proposal.

May 25th meeting on VC (Jitsi Meet)

Discussed was the design for Cyphal/UDP and how we can close on an initial proposal and begin drafting a Cyphal/UDP section of the specification. Attendees included:

@pavel.kirienko
@lydiagh
@scottdixon
@schoberm
@erik.rainey

Notes:

  • Cyphal/UDP should be optimized for lab scenarios rather than integrated vehicle system scenarios.
    • Zero-touch configuration is a goal. Ideally you’d just plug a vehicle system into a consumer-level Ethernet switch, open yakut on a laptop attached to the same switch, and it just works.
    • We need a story about deep inspection of nodes. How do we get a full view of a system from a UDP segment such that we don’t hide CAN nodes from diagnostic tools? This should also consider logging: ensuring we can record the full state of the networks over time from a UDP segment (TBD is whether you can do the same from any network segment on a mixed CAN/UDP system).
    • RPC is needed for software update and other RPC calls typical in test, diagnostic, or maintenance mode (whatever you want to call it).
  • IGMP
    • Pavel suggested multicast should degrade to broadcast for most cheap routers.
    • Need to bottom out on the issues with IGMP/multicast reported by some macOS users.
    • May need a broadcast option where firewall rules prevent IGMP.
  • ARINC-825
    • TODO: read ARINC-825 section 6 to see if there are obvious patterns we should follow.
  • Routing Service proposal

Is the shared node-ID space across all segments a dealbreaker or not? If not, then the bridge might be an adequate solution to, it would appear, all (?) of the listed points (this would obviate the need for @lydiagh’s proposal).

Can’t help with the macOS multicast issue but I don’t expect it to be a major blocker.

I think that’s right, Scott. This might require coordination at the application level, but I think there are still relevant requirements we could add to the Cyphal specification itself, considering that the current specification has a section dedicated to the Application layer. We could also define a routing service and assign it a fixed service ID like the GetInfo service.

Also, I wanted to call out a silly typo in my original post to avoid confusion. In the section where I talk about Node routing tables:

This should actually be:

Bus 0 NID 1:
Bus 0 → local
* → Bus 0 NID 3 (star means all others here)

By shared node-ID space does that mean that across all network segments we would only be able to have 128 nodes? So, for example, if we had three network segments - CAN bus 0, CAN bus 1, and Ethernet bus 1 - CAN bus 0 could hypothetically have 43 nodes, CAN bus 1 could have 42 nodes, and the Ethernet bus could have 43 nodes.

All nodes that want to communicate across network boundaries need globally unique node-IDs. However, nodes that don’t need to communicate across boundaries can live in a locally reused subset (e.g. bus 0 nodes 1-50 are local only and bus 1 nodes 1-50 are local only, but bus 0 and bus 1 share nodes 51-128, which are globally unique). A UDP-only node could use a node-ID higher than 128 if it doesn’t need to exchange transfers with CAN nodes.
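
A minimal sketch of that partition, using the ranges from the example above (and capping the shared range at 127 so it fits the 7-bit CAN node-ID space; the names are made up):

```python
LOCAL_ONLY = range(1, 51)   # node-IDs 1-50: reused on each CAN bus, may collide
SHARED = range(51, 128)     # node-IDs 51-127: globally unique across segments


def bridge_should_forward(src_node_id: int) -> bool:
    """A bridge forwards only transfers whose source node-ID is globally unique."""
    return src_node_id in SHARED


print(bridge_should_forward(42))   # False: local-only, stays on its own bus
print(bridge_should_forward(100))  # True:  shared, visible on every segment
```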

The bridge will throw out incompatible messages resulting from node-ID collisions, etc.


Update on why pycyphal/UDP doesn’t work on mac:

I verified this is the issue.


My apologies if I misuse RPC/service transfer, etc.

Edit/disclaimer: if certain node IDs are already reserved we can just choose different numbers; this is mostly for example and discussion.

What about a convention/heuristic with routing or bridging that converts between CAN Node IDs and IP address Node IDs? Edit: and put routing and bridging together as a gateway.

Assuming IP address format A.B.C.X

ABC defines the target network (as usual in IP) where A and B are static for local networks and C varies as necessary

X: 0-125 defines a CAN only node or a UDP only node
X: 126 reserved for external traffic
X: 127 defines a CAN / UDP gateway (bridge/router)
X: 128-255 defines a UDP only node
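
A quick sketch of how a node (or a diagnostic tool) could classify the last octet under this convention; the specific numbers are just the examples above and could move if they clash with anything already reserved:

```python
def classify(x: int) -> str:
    """Classify the X octet of A.B.C.X per the convention above (example values only)."""
    if 0 <= x <= 125:
        return "CAN or UDP node"
    if x == 126:
        return "reserved for external traffic"
    if x == 127:
        return "CAN/UDP gateway (bridge/router)"
    if 128 <= x <= 255:
        return "UDP only node"
    raise ValueError("X must fit in one octet")


print(classify(57))   # CAN or UDP node
print(classify(127))  # CAN/UDP gateway (bridge/router)
print(classify(132))  # UDP only node
```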

Gateway responsibilities:

  1. Route/Forward broadcast messages
  2. Bridge for RPC / service
  3. Push network traffic to a designated logging node

With the transport + gateway/bridge/router we have the following:

CAN-only nodes can only initiate RPCs with nodes on their own network (e.g. Net_1), regardless of transport, via the gateway, as long as the Node IDs are compatible. E.g. Net_0 Node_1 can initiate an RPC over CAN to Net_0 Node_57 via Net_0 Node_127, but not to Net_0 Node_132.

A broadcast from Net_1 Node_132 will be received as a broadcast from Net_1 Node_127 to CAN nodes on Net_1 (CAN Bus 1). That same broadcast would be received as a broadcast from Net_0 Node_127 to CAN nodes on Net_0 (CAN Bus 0).

All traffic on UDP nodes is forwarded to the Net_2 Node_n logger. CAN traffic on Net_1 would be forwarded by the Net_1 Node_127 gateway node to the logger, similar with Net_0 and its gateway.

For UDP-to-CAN RPC we can use a reserved Node ID (or a set of reserved IDs) so that external traffic is not confused with local CAN nodes. For example, with a laptop (but this could be any UDP Node):

Laptop connects to Net_2 as Node_1
Laptop starts communications with Net_1 Node_n via Net_1 Node_127
Gateway Net_1 Node_127 presents Net_2 Node_1 as Net_1 Node_126 on CAN Bus 1
Net_1 Node_n uses destination Node_126 and source Node_n for RPC
Gateway converts Node_126 back to Net_2 Node_1

If we didn’t use a reserved Node ID then Net_1 Node_n would use destination Node_1 and source Node_n which might confuse Net_1 Node_1?

Alternatively it could use Net_1 Node_127 (the gateway) as the reserved external traffic node. Or perhaps reserve 123-126 (multiple external traffic nodes).
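
To show how stateful this actually is, here is a rough sketch of the gateway’s translation table for the laptop example, using the hypothetical reserved pool 123-126 (none of this is an existing API):

```python
class GatewayNat:
    """Leases reserved CAN node-IDs to external UDP nodes and translates back."""

    def __init__(self, reserved_pool=(123, 124, 125, 126)):
        self._free = list(reserved_pool)
        self._ext_to_can = {}  # (external net, node) -> reserved CAN node-ID
        self._can_to_ext = {}  # reserved CAN node-ID -> (external net, node)

    def to_can(self, ext_net: int, ext_node: int) -> int:
        """Map an external node (e.g. Net_2 Node_1) onto a reserved local node-ID."""
        key = (ext_net, ext_node)
        if key not in self._ext_to_can:
            nid = self._free.pop()  # IndexError here means the pool is exhausted
            self._ext_to_can[key] = nid
            self._can_to_ext[nid] = key
        return self._ext_to_can[key]

    def to_external(self, can_node: int) -> tuple:
        """Map a reserved local node-ID back to the external (net, node) pair."""
        return self._can_to_ext[can_node]


gw = GatewayNat()
local = gw.to_can(ext_net=2, ext_node=1)  # the laptop on Net_2 appears on CAN Bus 1 as...
print(local)                              # 126
print(gw.to_external(local))              # (2, 1) when the response goes back out
```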

Maybe I am thinking about this wrong and this is becoming too stateful? There may be some more optimal ways to actually do this with existing message types, etc.

Edit: we could probably just use ports instead of reserved Node IDs, right?

If I am reading you correctly, you are trying to solve the problem of the limited address space in the case of the bridging approach (based on snooping/spoofing). My understanding is that the only remaining issue with the bridge is the flat shared address space across all CAN segments.

Would it be a similar outcome if we were to extend the bridge example with a new parameter defined per bridge, let’s call it a node-ID mask, such that transfer forwarding in either direction (downlink/uplink) alters the source & destination node-IDs as follows:

src_node_id = (src_node_id ^ node_id_mask)  # no effect if anonymous
dst_node_id = (dst_node_id ^ node_id_mask)  # no effect if broadcast

Normally, the 7 least significant bits of the mask should be zero, but this is not strictly required.

The degenerate case where node_id_mask = 0 equals the current bridge PoC. Usage of non-zero and unique node-ID masks per bridge allows you to have multiple CAN segments connected to the same UDP segment such that nodes belonging to different CAN segments are unable to communicate/collide with each other, while nodes on the UDP segment can communicate with any CAN nodes that share the same mask.

The critical limitation is that a UDP node can only send transfers to CAN nodes that are behind a bridge whose node_id_mask equals (the UDP node’s own node-ID & 0xFF80). In the old bridge PoC, a UDP node was unable to send transfers to any CAN nodes unless its own node-ID is less than 128 (which is the degenerate case of node_id_mask=0).
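
A small sketch of that translation rule, showing how two bridges with different masks keep their CAN segments apart on the shared UDP segment (the masks and node-IDs below are arbitrary examples):

```python
def translate(node_id, node_id_mask):
    """Apply the bridge's node-ID mask; anonymous/broadcast (None) passes through unchanged."""
    return node_id if node_id is None else node_id ^ node_id_mask


BRIDGE_A_MASK = 0x080  # CAN segment A maps to UDP node-IDs 128..255
BRIDGE_B_MASK = 0x100  # CAN segment B maps to UDP node-IDs 256..383

print(translate(5, BRIDGE_A_MASK))     # 133: CAN node 5 behind bridge A, as seen on UDP
print(translate(5, BRIDGE_B_MASK))     # 261: same local ID behind bridge B, no collision
print(translate(133, BRIDGE_A_MASK))   # 5:   XOR is its own inverse, so replies map back
print(translate(None, BRIDGE_A_MASK))  # None: anonymous source / broadcast destination
print(hex(133 & 0xFF80))               # 0x80: UDP node 133 can only reach CAN nodes behind bridge A
```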

I implemented this idea here:

Usage of a single gateway node will not work in the bridge scenario because it is likely to cause a transfer-ID collision. Say, if UDP nodes X and Y send a transfer with the same transfer-ID, on the other side of the bridge the second transfer will look like a duplicate of the first.
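
Roughly, the collision looks like this if the receiver deduplicates per source node-ID, which is a grossly simplified stand-in for what a Cyphal receiver’s transfer reassembly does:

```python
last_seen = {}  # source node-ID -> transfer-ID of the last accepted transfer


def accept(src_node_id: int, transfer_id: int) -> bool:
    """Simplified duplicate rejection: drop a repeated transfer-ID from the same source."""
    if last_seen.get(src_node_id) == transfer_id:
        return False
    last_seen[src_node_id] = transfer_id
    return True


# With a single gateway node-ID, transfers from distinct UDP nodes X and Y
# both arrive on CAN with the gateway's source node-ID (say 127):
print(accept(127, transfer_id=9))  # True:  transfer from X
print(accept(127, transfer_id=9))  # False: transfer from Y is wrongly dropped as a duplicate

# With per-node translation (e.g. the XOR-mask bridge) the sources stay distinct:
print(accept(133, transfer_id=9))  # True
print(accept(134, transfer_id=9))  # True
```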

Throwing my 2 cents in:

Expanding this notion into something more like “Cyphal Forwarding Across Heterogeneous Segments/Transports”, I think at the high level (above transports like CAN or UDP) the required pieces of information are:

  • a table which correlates the Subject ID to a set of Transports which can accept that broadcast (i.e. a “Forwarding” Table)
  • a table which correlates each unique Node ID (regardless of bit depth) with the Transport it is located on (i.e. a “Node” Table).
    • Each Transport may have some Node ID Mask to indicate the bit space of the Nodes on that Transport.
  • a table which correlates the originating Transport (network segment) for the forwardable Subject ID (i.e. Subscription List) [optional]

Using these, the Bridge can subscribe to the Broadcasts on the appropriate Transports using the subscription list (or on all Transports if none is given). When a Message is received, it simply needs to be looked up in the Forwarding table to find the set of Transports to use (subtracting the originating Transport, which is found either from the subscription or from the Node table entry associated with the source Node ID). If the originating Node ID is transmittable on the other Transports, the Broadcast is sent (using the originating Node ID). Incidentally, the Message does not need to be de-serialized in this scheme as long as the metadata (priority, etc.) is propagated to the transmitting Transports.
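
As a sketch of the lookup (the table contents and names below are invented; a real Bridge would presumably load them from configuration):

```python
FORWARDING_TABLE = {  # Subject ID -> Transports that accept this broadcast
    1234: {"can0", "can1", "udp0"},
    2345: {"udp0"},
}
NODE_TABLE = {  # Node ID -> Transport it is located on
    5: "can0",
    7: "can1",
    200: "udp0",
}
NODE_ID_LIMIT = {  # per-Transport node-ID space (the "Node ID Mask" above)
    "can0": 128,
    "can1": 128,
    "udp0": 65535,
}


def transports_to_forward(subject_id: int, src_node_id: int) -> set:
    """Pick the Transports a received broadcast is re-emitted on, without deserializing it."""
    targets = set(FORWARDING_TABLE.get(subject_id, ()))
    targets.discard(NODE_TABLE.get(src_node_id))  # subtract the originating Transport
    # Only forward where the originating Node ID is representable, so it can be preserved:
    return {t for t in targets if src_node_id < NODE_ID_LIMIT[t]}


print(transports_to_forward(1234, src_node_id=5))    # {'can1', 'udp0'} (order may vary)
print(transports_to_forward(1234, src_node_id=200))  # set(): node-ID 200 does not fit on CAN
```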

If the originating Node ID is preserved, the Cyphal notion of a Node ID regardless of Transport has to be unique and equally sized across Transports for this to work. By allowing UDP Node IDs to be a larger range, there isn’t a way to preserve the UDP originator in the CAN Transport, and the information would be lost (even if a flag existed to indicate its loss). CAN-to-UDP forwarding can easily preserve a larger-depth Node ID w/ some special subnet for CAN devices.

There can be multiple Bridges across multiple segments as long as each Bridge has these tables. Two CAN segments in this forwarding scheme would have to share the same limited range however. UDP segments are free to have different top level subnets when Node ID bit depth is larger.

UDP based Node Publishers which wish to be forwardable to CAN must exist within the 7 bit Node ID limit. All CAN Node Message originators would be forwardable to UDP Nodes.

While source Node ID may not be the most critical piece of information on a Transport it does help disambiguate multiple Broadcasters in the case Pavel mentioned.

I presume the tables you described are to be configured explicitly at the time when the bridge is commissioned, is that not so? The discussion in this thread is focused on purely automatic solutions unless I misunderstood the basic objectives.