Sorry I’m late to this discussion - I’ve been in implementor hell battling with UDP. I love the proposal to move to COBS, but I wanted to throw in a couple of ideas before code gets written, since if things are changing anyway we might as well consider them now. Most of these ideas are probably bad for various reasons, but shooting them down now might save trouble later.
Since COBS does a look-ahead anyway, and since zero is probably going to be a very common byte value in most streams, we might want to consider a modification where instead of one byte of overhead we add two - the first to indicate where the next zero is, and the second to count how many zeroes are in that run. That would compress runs of zeroes, which could reduce the total frame size. (Of course the analysis of that is very app-dependent.)
Another variant of that idea might use the top bit of the first overhead byte to signal that a second run-count byte follows (i.e. that the run length is not one). That would increase overhead to at least one byte every 126 payload bytes, but if we expect zeroes to turn up that often anyway, that’s not a big loss.
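To make the two-byte idea concrete, here’s a rough Python sketch of one way it could work. The byte layout here (code byte = non-zero count + 1, run byte = zero count + 1, both capped so neither can ever be 0x00) and the function names are just my own strawman for illustration, and it doesn’t include the top-bit refinement:

```python
# Sketch of a zero-run-compressing COBS variant (strawman layout, not the proposal):
# each block is [code, run, non-zero bytes...] where
#   code = (number of non-zero bytes) + 1          (1..255, never 0x00)
#   run  = (number of zero bytes that follow) + 1  (1..255, never 0x00)

def encode_cobs_zrun(payload: bytes) -> bytes:
    out = bytearray()
    i, n = 0, len(payload)
    while True:
        start = i                      # gather up to 254 non-zero bytes
        while i < n and payload[i] != 0 and i - start < 254:
            i += 1
        chunk = payload[start:i]
        zstart = i                     # then gather up to 254 zeroes
        while i < n and payload[i] == 0 and i - zstart < 254:
            i += 1
        out.append(len(chunk) + 1)
        out.append(i - zstart + 1)
        out.extend(chunk)
        if i >= n:
            return bytes(out)

def decode_cobs_zrun(encoded: bytes) -> bytes:
    out = bytearray()
    i, n = 0, len(encoded)
    while i < n:
        code, run = encoded[i], encoded[i + 1]
        i += 2
        out.extend(encoded[i:i + code - 1])   # copy the non-zero bytes verbatim
        i += code - 1
        out.extend(b"\x00" * (run - 1))       # expand the zero run
    return bytes(out)

# A run of three zeroes costs two overhead bytes instead of three code bytes:
frame = b"\x11\x22\x00\x00\x00\x33"
assert decode_cobs_zrun(encode_cobs_zrun(frame)) == frame
```

Whether this actually wins depends entirely on how zero-heavy real payloads are, as noted above.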
The second thing I’d suggest is making it explicit that trailing zeroes in the frame should be truncated (or at least that this is the expected behavior). I think at the moment that’s kind of optional and expected only with minor version polymorphism? Since the decoder is already supposed to pretend each frame has infinite trailing zeroes, this only modifies the encoder and makes full use of the existing decoder logic. (If this is already the recommendation, whoops, sorry I missed it…) Again this may have the effect of actually reducing the transmitted frame size, which is always nice. This idea is largely redundant with the previous zero-run-compression idea… if run compression is already happening, it adds almost nothing.
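On the encoder side that’s essentially a one-liner, and the receiving side just pads back out to whatever length the schema expects (the function names here are just for illustration):

```python
def strip_trailing_zeroes(payload: bytes) -> bytes:
    # Encoder side: drop trailing zeroes before COBS-encoding; the decoder
    # already treats every frame as if it had infinite trailing zeroes.
    return payload.rstrip(b"\x00")

def pad_to_length(decoded: bytes, expected_len: int) -> bytes:
    # Receiver side: restore the implicit trailing zeroes.
    return decoded.ljust(expected_len, b"\x00")
```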
And now the third and final thing… this is the big one… many years ago when I was looking into serial encoding I spent some time investigating Low-Density Parity-Check (LDPC) codes, which are the ultimate evolution of forward error correction algorithms. Basically, this would replace the CRC32 check.
LDPC is related to “Turbo Codes”, which are slightly less ultimate (LDPC codes get as close as possible to the Shannon limit) but perhaps slightly easier to compute. Neither would be as quick as CRC32, but they have a big payoff… while CRC is an error detection algorithm, LDPC and Turbo Codes are error correction algorithms. So long as relatively few bits are corrupted by noise, the algorithms can figure out which ones and flip them back. An otherwise lost packet becomes recoverable.
Most Forward Error Correction (FEC) codes have the issue that a corruption of the error-check bits themselves is an unrecoverable error… you’d need another FEC to ‘protect’ your previous check code, and then another to error-correct that block… so it’s either “turtles all the way down” or you have parts of the frame which are inherently more vulnerable to noise than others.
LDPC is an improvement because the parity check bits are themselves included in the error correction. Another way of thinking of it is to say that the check block is never transmitted (it’s assumed to be a fixed value like zero) and ‘extra’ bits are appended to the payload, flipped so that the condition holds. Those “parity check” bits are either scattered through the payload or simply appended to the end. The algorithm has other nice properties, such as constant overhead and being easily implementable in ASIC/FPGA hardware, although it requires a “parity check matrix” much like the CRC32 table. Most of the complexity of the algorithm is in the generation of this matrix - it takes quite a lot of math to create it - but of course that doesn’t matter to the encoder/decoder once the matrix is prepared.
In essence you can think of LDPC codes as a set of parity checks, each connected to a ‘random-ish’ subset of the payload and ‘extra’ bits, all of which must sum to a constant known value (usually zero). If the receiver finds that the checks don’t quite work out, it asks “what’s the minimum set of bits I have to flip back to make all the parity checks pass?”
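Here’s a toy illustration of that “flip the minimum set of bits” idea in Python. To keep it readable I’ve used a tiny Hamming(7,4) parity-check matrix and brute-force search, which is emphatically not how a real LDPC decoder works (real ones use large sparse matrices and iterative bit-flipping or belief propagation), but the syndrome idea is the same:

```python
import itertools

# Toy parity-check matrix (Hamming(7,4)) -- a stand-in for a real LDPC
# matrix, which would be much larger and sparse ("low density").
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def syndrome(bits):
    # Each row of H is one parity check: the selected bits must sum to 0 mod 2.
    return [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]

def correct(bits, max_flips=2):
    # "What's the minimum set of bits I have to flip to make every check pass?"
    # Brute force over small flip sets, purely for illustration.
    if not any(syndrome(bits)):
        return bits, 0
    for k in range(1, max_flips + 1):
        for positions in itertools.combinations(range(len(bits)), k):
            candidate = list(bits)
            for p in positions:
                candidate[p] ^= 1
            if not any(syndrome(candidate)):
                return candidate, k
    return None, None   # too badly corrupted: detect-and-discard instead

codeword = [1, 0, 1, 1, 0, 1, 0]   # satisfies all three checks
received = list(codeword)
received[4] ^= 1                   # simulate one bit flipped by line noise
fixed, flips = correct(received)
print(fixed == codeword, flips)    # -> True 1
```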
It sounds like a lot of CPU overhead, but the matrix pre-encodes much of the work. When I was last looking at the algorithm the transcoders were still a LOT slower than CRC32, but that was years ago, and there may have been significant practical improvements since then, such as matrices that are optimal for software encoders. Either way, LDPC represents the “Gold Standard” for best-possible FEC (and therefore most efficient channel use) and is worth looking at.
There are very few times in Comp. Sci. that you can say an algorithm is “unbeatable”, and LDPC is one of them.
Turbo Codes don’t quite reach that level but get pretty close. While developed beforehand, they can be considered a kind of ‘special case’ of LDPC with a sub-optimal matrix, but they’re quite fast, if I recall.
In both cases, the assumption is that correcting occasional small errors gives better channel use than simply detecting errors and discarding frames. You can “tune” the LDPC matrix so that it has a truly stunning level of reliability. Some variants even allow the payload-to-parity-check ratio to be varied, so that the algorithm can dynamically allocate more error-correction bits to cope with changing line-noise conditions, at the cost of implementation complexity.
It would certainly make more efficient use of the channel than the temporal redundancy option, although at the cost of extra CPU. If we’re changing the protocol anyway, I’d say now’s the time to look at forward error-correction.
In situations where the error correction introduces a variable-time overhead that breaks hard real-time systems, you can revert to using it purely for error detection and discard the packet (as you would have to anyway if more bits were corrupted than the parity block can cope with), but that gets to be an implementation choice.
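Continuing the toy sketch from above (reusing its `syndrome()` and `received`), the detection-only fallback is just the syndrome check with no search, which is constant time:

```python
def check_only(bits):
    # Hard real-time fallback: use the same parity checks purely for
    # detection -- constant time, no search. Any failed check means the
    # frame gets dropped, exactly as a CRC mismatch would be handled.
    return not any(syndrome(bits))

if not check_only(received):
    pass   # discard the frame (or fall back to temporal redundancy, etc.)
```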
Argh, look at the time… I gotta run.