New proposed project: llvm-dsdl

llvm-dsdl

I’m proposing to replace both pydsdl and nunavut with a new project that does all code generation using MLIR and LLVM.

I’ve code-generated a version that emits C, C++, C++/PMR, Rust (std), Go, Python, Python with C bindings, and TypeScript.

This is still a proof of concept, but I’m now comfortable enough with the idea to propose it here. I’d like to move it into the garage and start maturing it. My plan, as the Nunavut maintainer, is to deprecate Nunavut as soon as I’ve ensured llvm-dsdl has reached parity (and more) and has a release pipeline for Windows, macOS, and Linux. This is a native compiler and will be released as binaries, so it should be significantly faster.
The use of MLIR means a large part of the compiler’s structure is shared, so adding a new language is no longer a major undertaking (it took Codex about 22 minutes to add both pure Python and Python with C bindings to the project).
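To illustrate why a shared intermediate representation makes new back ends cheap, here is a minimal Python sketch. It is not llvm-dsdl’s actual MLIR design: `Field`, `StructIR`, `storage_bits`, and the two emitters are hypothetical names, and the IR only models unsigned integer fields. The point is that language-independent decisions (field order, mapping a DSDL bit length to a native width) live in shared code, and each back end is only a small formatter.

```python
from dataclasses import dataclass

# Hypothetical language-neutral IR: a struct made of unsigned integer fields.
@dataclass(frozen=True)
class Field:
    name: str
    bit_length: int  # e.g. 2 for a DSDL uint2

@dataclass(frozen=True)
class StructIR:
    name: str
    fields: tuple[Field, ...]

def storage_bits(bit_length: int) -> int:
    """Shared logic: round a DSDL bit length up to a native integer width."""
    for width in (8, 16, 32, 64):
        if bit_length <= width:
            return width
    raise ValueError(f"unsupported bit length: {bit_length}")

# Each back end only formats the shared IR in its own syntax.
def emit_c(ir: StructIR) -> str:
    lines = ["typedef struct {"]
    for f in ir.fields:
        lines.append(f"    uint{storage_bits(f.bit_length)}_t {f.name};")
    lines.append(f"}} {ir.name};")
    return "\n".join(lines)

def emit_rust(ir: StructIR) -> str:
    lines = [f"pub struct {ir.name} {{"]
    for f in ir.fields:
        lines.append(f"    pub {f.name}: u{storage_bits(f.bit_length)},")
    lines.append("}")
    return "\n".join(lines)
```

For example, `emit_c(StructIR("Status", (Field("uptime", 32), Field("health", 2))))` yields a C struct with a `uint32_t uptime` and a `uint8_t health`; adding a third language would mean writing one more small formatter rather than touching the shared logic.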

What’s left to do?

  1. Finish Python testing (low-risk grunt work for the codegen)
  2. Release pipelines and CI
  3. Peer reviews of the design
  4. More features for the frontend and CLI
  5. Documentation cleanup
  6. Some hand-tuning of the generated types (cosmetic work)
  7. More verification and fuzz testing

Very interesting! Also, sick logo.

At Zubax, we’ve been migrating some of our work to Rust, so the first question that comes to mind here is: are there significant blockers for no-std?


I’m sure the effort reduction here has something to do with the LLM agents, besides just the new architecture.

I don’t think this tool is a replacement for PyDSDL, as far as I can see at the moment. It is close to Nunavut, but PyDSDL has utility on its own, including but not limited to this: “Add serialization” by pavel-kirienko, Pull Request #120 in OpenCyphal/pydsdl on GitHub.


I am now going to experiment with this a little with the help of my own agents. My immediate feedback after opening the repo is that the README contains too many red herrings; it looks LLM-generated and a little noisy, and I think some cleanup would help.

For sure. The documentation is a mess. It’s all LLM-generated. It’s a bit of an experiment in just how far I can push LLMs to work autonomously, since I simply don’t have time to do all of this myself.

As for Rust no-std? Easy. I’ll do that next. I’m just having Codex finish its work on Python right now.

I also just pushed dsdld last night. This is a full-featured language server for DSDL. I’ve only generated the framework and haven’t had time to debug it, so it probably has big gaps, but the idea is there: a clangd-like daemon for DSDL.
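A clangd-like daemon talks to editors over the Language Server Protocol, whose base layer frames each JSON-RPC message with a `Content-Length` header followed by a blank line. The Python sketch below shows only that framing, using the standard library; it is not taken from dsdld, and `encode_message`/`decode_messages` are hypothetical names.

```python
import json

def encode_message(payload: dict) -> bytes:
    """Frame a JSON-RPC payload as an LSP base-protocol message."""
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body

def decode_messages(buffer: bytes) -> tuple[list[dict], bytes]:
    """Extract every complete message from buffer; return them plus leftover bytes."""
    messages = []
    while True:
        header_end = buffer.find(b"\r\n\r\n")
        if header_end < 0:
            break  # header not fully received yet
        length = None
        for line in buffer[:header_end].split(b"\r\n"):
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if length is None:
            raise ValueError("missing Content-Length header")
        body_start = header_end + 4
        if len(buffer) < body_start + length:
            break  # body not fully received yet
        messages.append(json.loads(buffer[body_start:body_start + length]))
        buffer = buffer[body_start + length:]
    return messages, buffer
```

Returning the leftover bytes lets the server keep reading from stdin and feed partial reads back in, which is how a long-running daemon typically handles a stream.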

What does pydsdl do that this package couldn’t do better? For one thing, the performance is fantastic, and it hasn’t even been aggressively optimized. It can chew through 2000+ DSDL types and generate C code for them in ~6 seconds.

The frontend hasn’t been developed much at all, so there’s a lot more cleanup to do there. Overall, though, because we’re building an LLVM AST, this code base should be able to do whatever pydsdl can do, and do it better simply because it’s native code.
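As a rough picture of a frontend’s first step, here is a hypothetical Python sketch that turns DSDL field declarations into AST nodes. It handles only a small subset (scalar fields, fixed `[N]` and bounded `[<=N]` arrays, `#` comments; `@` directives are skipped) and rejects everything else, including constants and padding fields. `FieldDecl` and `parse_fields` are illustrative names, not llvm-dsdl or pydsdl APIs.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical AST node for one DSDL field declaration.
@dataclass(frozen=True)
class FieldDecl:
    type_name: str
    name: str
    array_capacity: Optional[int] = None  # None means a scalar field
    variable_length: bool = False         # True for [<=N], False for [N]

_FIELD_RE = re.compile(
    r"^(?P<type>[A-Za-z_][\w.]*)"             # e.g. uint8 or uavcan.node.Health.1.0
    r"(?:\[(?P<le><=)?(?P<cap>\d+)\])?"       # optional [N] or [<=N]
    r"\s+(?P<name>[A-Za-z_]\w*)$"
)

def parse_fields(source: str) -> list[FieldDecl]:
    """Parse a small subset of DSDL: field lines with optional array suffixes."""
    fields = []
    for lineno, raw in enumerate(source.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("@"):
            continue  # blank line or directive (directives are not handled here)
        m = _FIELD_RE.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: unsupported declaration: {line!r}")
        cap = m.group("cap")
        fields.append(FieldDecl(
            type_name=m.group("type"),
            name=m.group("name"),
            array_capacity=int(cap) if cap else None,
            variable_length=m.group("le") is not None,
        ))
    return fields
```

A real frontend would also resolve composite type references, evaluate constant expressions, and report precise source locations; this sketch only shows the shape of the AST it would produce.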

Finally, the distribution story is going to be a bit of a pain for me, but for users it’ll be so much better than Python. I’m completely over Python’s broken distribution story. It’s been a nightmare of incompatible runtimes and ten different ways to set up an environment. For llvm-dsdl, my plan is to push it through all the standard distribution channels for such tools: brew, apt-get, and maybe WinGet.

Of course. But I meant that the LLMs took about 20 hours to add the first language and then only 22 minutes to add Python once all that groundwork was in place. Twenty hours to build a common representation versus 22 minutes to add a new language on top of it shows how much logic is shared and how little new logic is needed to add a language back end.

It exposes the AST in Python. This is occasionally useful, for example for serialization without code generation (see the linked PR). It is also useful for linting, although you could argue that dsdld could do that too, so I’m not sure about this point specifically. It can also serve as a relatively clean reference implementation, at least for a while. Overall, I see no point in deprecating it unless we introduce major DSDL spec changes that would require massive rework (and it doesn’t seem like we’re going to do that soon).
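To make “serialization without code generation” concrete, here is a hypothetical Python sketch in which a runtime type description (as a parser’s AST might provide) drives packing and unpacking directly, with no generated code. It is heavily simplified: it assumes only unsigned integer fields and little-endian, LSB-first bit packing, and it raises on out-of-range values instead of applying DSDL cast modes. It does not cover signed or float types, arrays, unions, or delimiter headers, so it is not a complete implementation of the Cyphal serialization rules; `serialize`/`deserialize` are illustrative names, not the API from the linked PR.

```python
def serialize(fields: list[tuple[str, int]], values: dict[str, int]) -> bytes:
    """Pack unsigned integers LSB-first, driven by a runtime type description."""
    acc = 0     # accumulated bits; the first field occupies the least significant bits
    offset = 0  # number of bits written so far
    for name, bit_length in fields:
        value = values[name]
        if not 0 <= value < (1 << bit_length):
            raise ValueError(f"{name}={value} does not fit in {bit_length} bits")
        acc |= value << offset
        offset += bit_length
    return acc.to_bytes((offset + 7) // 8, "little")

def deserialize(fields: list[tuple[str, int]], data: bytes) -> dict[str, int]:
    """Inverse of serialize(): walk the same description to unpack each field."""
    acc = int.from_bytes(data, "little")
    out = {}
    offset = 0
    for name, bit_length in fields:
        out[name] = (acc >> offset) & ((1 << bit_length) - 1)
        offset += bit_length
    return out
```

For example, with `fields = [("uptime", 32), ("health", 2), ("mode", 3)]` and values `1`, `2`, `5`, the 37 bits pack into 5 bytes, `b"\x01\x00\x00\x00\x16"`, and `deserialize` recovers the original dictionary.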

Yeah, I’ve implemented linting in dsdld. I haven’t fully developed a ruleset yet, but it handles the basics, with quick fixes and refactoring. dsdld will provide syntax highlighting, linting, autocomplete/tooltips, quick fixes, full namespace indexing, AI integration via MCP, and possibly additional code generation targets. It is designed to be the only tool one needs to work with DSDL, and because it uses the same AST and MLIR as dsdlc, it will guarantee high fidelity between the editing state of DSDL files and the code generated for them.
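As an illustration of what a single lint rule with a quick fix might look like, here is a hypothetical Python sketch. It assumes a snake_case convention for field names and takes `(line, name)` pairs as input, as an AST walk might supply them; `Diagnostic`, `lint_field_names`, and the camelCase-to-snake_case heuristic are illustrative and not dsdld’s actual rules or API.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Diagnostic:
    line: int
    message: str
    suggested_fix: str  # replacement name offered as a quick fix

_SNAKE = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$")

def to_snake_case(name: str) -> str:
    """Convert camelCase/PascalCase to snake_case (simple heuristic, no acronym handling)."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

def lint_field_names(fields: list[tuple[int, str]]) -> list[Diagnostic]:
    """Flag field names that are not snake_case; input is (line, name) pairs."""
    diagnostics = []
    for line, name in fields:
        if not _SNAKE.match(name):
            diagnostics.append(Diagnostic(
                line, f"field name {name!r} is not snake_case", to_snake_case(name)))
    return diagnostics
```

Keeping each rule a pure function from AST data to diagnostics makes it easy to run the same rules from a language server, a CLI, or CI.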

That’s good, but my point still stands – I want to keep a pure-Python parser (with serializer) in parallel with llvm-dsdl because it has utility of its own that is not directly related to code generation.