Generate rich navigable HTML docs using Nunavut

DSDL definitions in their original form are hard for humans to read, which impedes the adoption of UAVCAN and DS-015. Yet DSDL is sufficient for describing the behaviors of a distributed computing system (DCS) without resorting to additional means of documentation (which would run the risk of divergence). It is therefore desirable to make DSDL specifications more approachable for humans without changing the language or the specifications themselves.

To illustrate, suppose that you want to implement the servo network service as defined by the DS-015 standard. You go to the service definition file:

…whereat you see that, to fully grasp what’s in there, you need to do quite a bit of jumping around files in the repo that are not even syntax-highlighted. This is a serious obstacle if you are just evaluating whether UAVCAN/DS-015 is the right solution for you.

We, therefore, need to come up with a better presentation of DSDL definitions. The solution I propose is to define an additional target for Nunavut that yields HTML pages with documentation per DSDL root namespace. But before we get to that, there is one blocker to take care of:

Exposing comments in the AST constructed by PyDSDL

PyDSDL is the DSDL processing front-end used by Nunavut. It accepts a root namespace and yields a well-annotated AST based on that. Currently, PyDSDL discards comments, so we need to change this behavior:

The AST should be extended with two extra entities — composite type documentation and attribute documentation:

# This header comment is the documentation for this composite type.
# It may span an arbitrary number of lines and is terminated by the first non-comment line.
float64[4] foo  # This is an attribute comment for field "foo"
bool bar
# This is an attribute comment for field "bar".
# It spans multiple lines.

# This comment is not attached to anything because it follows a blank line, so it is dropped.
uavcan.primitive.Empty.1.0 baz  # This is for "baz".
# And this one is for "baz", too.
---
# This comment is attached to the response section.
void64 # This comment is for the padding field.
int64 MATH_PI = 4
# This is the best known approximation of Pi.

The composite type documentation is to be exposed via a new property doc: str on pydsdl.CompositeType. A similar property should be added to pydsdl.Attribute.

Comments can be extracted from the source file by adding a new node handler visit_comment() to the internal class pydsdl.parser._ParseTreeProcessor.

The leading # and the space after it (if present) should be removed.
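The stripping rule above can be sketched as follows. This is only an illustration of the text transformation; the function names are made up here and are not part of PyDSDL, and how visit_comment() would receive the raw comment text is left open.

```python
def strip_comment_marker(raw: str) -> str:
    """Remove the leading '#' and at most one space that follows it.

    Hypothetical helper, not an existing PyDSDL function.
    """
    if not raw.startswith("#"):
        raise ValueError("not a comment: %r" % raw)
    text = raw[1:]
    if text.startswith(" "):
        text = text[1:]  # Only one space is removed; further indentation is kept.
    return text


def join_doc(comment_lines):
    """Join consecutive comment lines into one multi-line documentation string."""
    return "\n".join(strip_comment_marker(x) for x in comment_lines)
```

Keeping any indentation beyond the first space means that indented blocks inside comments survive, which matters if the text is later interpreted as Markdown.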

Once this is done, we can proceed to the second part.

Emitting HTML using Nunavut

Proper templates provided, Nunavut can map a DSDL root namespace to a fully-static website (which may be contained in one or several HTML files, perhaps with additional files for styles, scripts, or other resources; in the interest of portability it might be better to bundle everything into one large file). It is important to rely on a web-compatible format because we can’t require the user to download any artifacts to be able to explore DSDL.

The view should be similar to a directory tree. Take the standard root namespace uavcan:

- uavcan
  + diagnostic
  + file
  + internet
  + metatransport
  + node
  + pnp
  + primitive
  + register
  + si
  + time

The user clicks on a namespace and it expands in-place. The same goes for data type definitions; this is important:

- uavcan
  - diagnostic
    + Record.1.0 [fixed subject-ID 8184, extent 300 bytes]
    - Record.1.1 [fixed subject-ID 8184, extent 300 bytes]

        Generic human-readable text message for logging and displaying purposes.
        Generally, it should be published at the lowest priority level.
      + uavcan.time.SynchronizedTimestamp.1.0 timestamp
        Optional timestamp in the network-synchronized time system; zero if undefined.
        The timestamp value conveys the exact moment when the reported event took place.
      + Severity.1.0 severity
        uint8[<256] text
        Message text.
        Normally, messages should be kept as short as possible, especially those of high severity.

    + Severity.1.0
  + file
  + internet
  + metatransport
  + node
  + pnp
  + primitive
  + register
  + si
  + time
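One way to get expand-in-place behavior without any scripting is to emit nested HTML details/summary elements, which browsers render collapsed by default. The sketch below assumes that approach (it is not something the proposal prescribes) and uses a plain dict as a stand-in for the namespace tree; render_tree is a hypothetical name.

```python
from html import escape


def render_tree(name, children):
    """Render a namespace (dict of children) or a leaf (None) as nested HTML.

    Namespaces become <details> elements that the browser can expand in place;
    leaves become plain <div> elements.
    """
    if children is None:
        return "<div>%s</div>" % escape(name)
    inner = "".join(render_tree(k, v) for k, v in sorted(children.items()))
    return "<details><summary>%s</summary>%s</details>" % (escape(name), inner)


# Stand-in for a small part of the uavcan root namespace.
tree = {"diagnostic": {"Record.1.1": None, "Severity.1.0": None}, "file": {}}
page = render_tree("uavcan", tree)
```

A real template would render data types as expandable nodes too, with their docs and attributes inside.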

The text should be syntax-highlighted but it does not need to replicate the source token-by-token (it is not even possible because the AST does not contain the required information). It is easier to re-generate the text by simply invoking __str__() on each attribute and adding the docs around them:

>>> import pydsdl
>>> composites = pydsdl.read_namespace('public_regulated_data_types/uavcan')
>>> str(composites[1].attributes[2])
'saturated uint8[<=112] text'
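To make the "re-generate the text" idea concrete, here is a rough sketch that interleaves str() of each attribute with its documentation. It uses a stand-in class rather than real PyDSDL objects, because the doc property is only proposed above; FakeAttribute and render_definition are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeAttribute:
    """Stand-in for pydsdl.Attribute with the proposed doc property."""
    text: str  # What str(attribute) would return.
    doc: str   # The proposed documentation string.

    def __str__(self) -> str:
        return self.text


def render_definition(type_doc: str, attributes: List[FakeAttribute]) -> str:
    """Rebuild a readable definition: header docs, then each attribute with its docs."""
    lines = ["# " + x for x in type_doc.splitlines()]
    for attr in attributes:
        lines.append(str(attr))
        lines.extend("# " + x for x in attr.doc.splitlines())
    return "\n".join(lines)


out = render_definition(
    "Generic message.",
    [FakeAttribute("saturated uint8[<=112] text", "Message text.")],
)
```

In an actual template the docs would be wrapped in markup rather than re-emitted as comments; the point is only that the AST plus the doc strings is enough to rebuild the text.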

The user may click any attribute inside a composite type and it would expand in-place in the same manner. Another kind of click (with a modifier key like shift+click or using a dedicated button) should take the user directly to the definition of the attribute’s type instead of unfurling it in-place.

Hovering over a field, type, or namespace should display its contents along with key information like size but without doc comments in a quick pop-up.

PyDSDL provides the offset information per field; it should be displayed next to the field to simplify manual serialization and to keep the user aware of the data footprint.

Many doc comments contain references to other data types. They lack any special formatting but full data type names are sufficiently unique to unambiguously detect them in text as-is. For example:

Notice the reference to reg.drone.physics.kinematics.translation.Velocity1VarTs. The version number is not given, which means that the latest one is implied (v0.1 in this case). Such references should be automatically highlighted as clickable links. There may also be links to namespaces (with or without the trailing .*):

This fragment should take the user to the namespace reg.drone.service.actuator.common.sp.
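Detection of such references could look roughly like this: scan the comment for dotted identifiers and keep only those that match a known full type name or namespace. This is a sketch under the assumption that the generator has the set of known names at hand; find_references is a hypothetical helper, and references with an explicit version suffix are not handled.

```python
import re

# A dotted identifier such as "reg.drone.service", optionally followed by ".*".
_DOTTED = re.compile(r"[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+(?:\.\*)?")


def find_references(text, known_names):
    """Return known type/namespace names mentioned in text, in order of appearance."""
    found = []
    for match in _DOTTED.finditer(text):
        name = match.group(0)
        if name.endswith(".*"):
            name = name[:-2]  # "ns.*" refers to the namespace "ns".
        if name in known_names:
            found.append(name)
    return found


known = {
    "reg.drone.physics.kinematics.translation.Velocity1VarTs",
    "reg.drone.service.actuator.common.sp",
}
text = (
    "See reg.drone.physics.kinematics.translation.Velocity1VarTs, "
    "e.g. under reg.drone.service.actuator.common.sp.*."
)
refs = find_references(text, known)
```

Checking against the known names is what keeps ordinary dotted text like "e.g." from being turned into a link.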

Because Nunavut is unable to process more than one namespace at once, links to foreign root namespaces would necessarily navigate the user to a different generated site. If the generated site is compressed into a single HTML file, the navigation would be trivial to implement, since we know that an entity like reg.anything can be reached via a URI like reg.html#anything.
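Under the one-file-per-root-namespace assumption, the mapping from a full name to a URI is a one-liner. A minimal sketch (uri_for is a hypothetical name):

```python
def uri_for(full_name: str) -> str:
    """Map a full name like 'reg.anything' to 'reg.html#anything'.

    Assumes every root namespace is generated into a single file named
    '<root>.html' whose anchors are the dotted paths below the root.
    """
    root, _, rest = full_name.partition(".")
    return root + ".html" + ("#" + rest if rest else "")
```

For example, uri_for("reg.drone.service.actuator.common.sp") gives "reg.html#drone.service.actuator.common.sp", and a bare root namespace maps to the file itself.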

There are special data type definitions that are used to document namespaces. They are named _ (a single low line); one is shown above. Such data types need not be shown in the output; instead, their contents should be expanded directly under the corresponding namespace entry.

I think it is sensible to interpret the text of doc comments as Markdown to allow data type developers to construct more appealing documentation. It would require fixing the formatting across the public regulated data types repository but it is no big deal.

@bbworld1 Would you like to work on this? This is very high-priority right now (above Yukon) because it is perceived to be an adoption blocker.

@scottdixon Did I miss anything important?

Sorry for the slightly late response. I am definitely interested in working on this, but I don’t have much time this weekend to work on it - I will however probably be able to work on it next week.

What are the time constraints on this task?

There is no rigid limit, but we should aim to have it deployed by the end of April, so in 4 weeks.

Design-wise, I’d model this as a language in Nunavut. By treating this as just another language you shouldn’t have to build anything new in the API or core library and can utilize the language ‘support’ mechanism for the ancillary, static artifacts.

What the language is, specifically, is interesting. Our options (in my view) are:

  • html – Generating html is the easiest path and would require no new dependencies or capabilities in Nunavut. However, it would lead to a lot of duplication if we wanted to support additional documentation formats like PDF, and it might drive Nunavut to reinvent existing and mature documentation translation frameworks like pandoc, which would be distracting and not useful.
  • latex – Generating latex is interesting since this can act as an intermediate format from which PDF or HTML can be generated. The downside is that HTML from latex tends to be … not great.
  • docbook – Using an XML format like docbook as an intermediate would allow good translations to other formats like HTML using pandoc. The downside to this is that Nunavut would not be sufficient to generate HTML (i.e. you would need to go through nunavut → docbook → html)
  • sphinx – Generating Sphinx-compliant ReStructuredText would provide a pythonic intermediate format that has good html generators with minimal additional Python dependencies but which would allow for translation to PDF and other formats using pandoc.
  • xhtml – Similar to docbook, generating XHTML would output a valid set of HTML documents but only the ugly, structural part of the information. This ugliness could then be transformed (possibly using only css) into something beautiful and human.

These are valid points but if we want really rich previews (think George Soros-rich) with hovers and folds, can we obtain that with docbook or sphinx? LaTeX is an interesting option but any work done with latex is 30% suffering so my preference is to steer clear of it unless one requires really high-quality static output (like we do with the Specification).

I’m unfamiliar with docbook but, being XML and therefore pure structure, I have to imagine it would translate into rich HTML nicely. Another option I didn’t mention in my list is XHTML, which we could use to generate sterile, ugly, but structurally correct HTML that can be translated into beautiful tag soups using pandoc (I assume) or perhaps just using advanced CSS and JavaScript frameworks. This needs a front-end expert to validate my assumptions (the last time I created HTML/JavaScript the DOM was just a twinkle in the W3C’s eye).

This seems like something we should build a quick prototype of. Ultimately, my desired requirement is that Nunavut outputs a single, structured, and correct documentation format and that further translations are performed by other tools.

Do we have a target HTML style template we can use in such a prototype?

Can you share practical examples where final output formats other than HTML may be required?

Nope, we’re starting from scratch.

As part of a structured avionics program, Interface Control Documents (ICDs) are often required. These documents may be in many formats including Word, PDF, or XML. It is advantageous to programmatically generate such ICDs instead of maintaining them by hand.

But then again, HTML is exportable into PDF, too.

I have very basic HTML generation working:

Obviously, a lot of it is still missing, but the general idea is there, I think.

As for what generation format to use, I agree that an intermediate format would probably be best, but on the other hand in my view it’s simpler to start with HTML and then go from there to other formats, not the other way around.

Related to the generated view - after experimenting with the nested directory-tree style view, it seems to me that it’s a bit difficult to read and search, as you have to expand each namespace and type definition to see what’s inside. It’s a great view if you know where the message you’re looking for is, but if you’re searching through the docs to find a certain kind of message (e.g. you want to find a message for latitude/longitude of a drone, but don’t know the name) then it becomes difficult to navigate. I propose instead that we split the docs into a tree view and a list of all types within the namespace, a la MAVLink docs:

The tree view then becomes a handy navigation aid, rather than a cumbersome view of all information inside the namespace. Having all the types in a list on the page also makes it much easier to CTRL-F for relevant types, and other types can be linked to on the page. What do you think @pavel.kirienko ?

P.S. Once we have documentation generation working we could possibly tweak Nunaweb to generate and statically host documentation pages. It shouldn’t be a significant load increase (just static pages), and it would make it much easier to host easily accessible online documentation. What do you think of this idea?

A tree view on the left is a sensible idea but it is important to keep the nested structure in the main view as well such that one could expand/collapse nested fields and namespaces there. The MAVLink experience will not work well for UAVCAN because DSDL types tend to be much more complex with multiple levels of nesting whereas MAVLink types are flat.

Let’s just take what you have on the first screenshot and attach the tree view on the left.

Let’s do it. It would be best to automatically regenerate the docs whenever new commits are pushed to the public regulated data types repo. Do you think we could somehow automate this eventually, e.g., via webhooks?

Sorry it’s been a week since the last update. School has been busy.

Unfortunately not that much progress this week, but we do have two major updates.
One is that we now have expandable nested composite types, so for example a uavcan.node.ID can be expanded (so you can see the docs on it from within the other data type). Links are still pending.
The other is the extent and fixed-port-ID reporting; all serializable data types now show their extent/size, and services/messages with fixed port IDs are also tagged.

The documentation is now in sans-serif font because having the docs in monospace serif while the data types are in sans-serif is a) ugly and b) kind of hard to read. It’s a holdover until we have Markdown comments anyway, so that’s a minor change.

What remains to be done is:

  1. Linking of types (especially to other external namespaces)
  2. Expansion of array types
  3. Linking of types within comments
  4. Type info popups
  5. Field offset information (separate from extent?)
  6. Clean up code and generated output (remove unnecessary generated files, move namespace docs to where we expect them to be)

Absolutely; setting up webhooks should actually be quite simple and require very little modification. All we need to do is set up an endpoint to receive GitHub’s event API format, then extract the repo URL and feed it through the regular upload endpoint. We can probably then use that to deploy a new version of the docs.
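The extraction step might look roughly like this. GitHub push event payloads carry a repository object with a clone_url field; everything else here is an assumption, in particular the upload callback, which merely stands in for the existing upload endpoint, and both function names are hypothetical.

```python
import json
from typing import Callable


def repo_url_from_push_event(body: bytes) -> str:
    """Extract the repository clone URL from a GitHub push event payload."""
    event = json.loads(body)
    return event["repository"]["clone_url"]


def handle_webhook(body: bytes, upload: Callable[[str], None]) -> None:
    """Feed the pushed repository into the (hypothetical) upload step."""
    upload(repo_url_from_push_event(body))


# Usage with a trimmed-down sample payload and a dummy upload function.
sample = json.dumps(
    {"ref": "refs/heads/master",
     "repository": {"clone_url": "https://github.com/example/types.git"}}
).encode()
received = []
handle_webhook(sample, received.append)
```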

Here is an example of a pure-monospace document: I am thinking that maybe we could use monospace everywhere to ensure that the text looks similar both on the webpage and in the text editor? This is of minor importance though.

Yeah. Extent is the property of the field itself, whereas its offset is a property of the outer type. They are unrelated. If you think this is time-consuming to implement, we can safely remove it from the MVP and introduce it later.

Update on progress: Most of the basic features should be ironed out; types are linked properly, namespace documentation is expanded and array types expand properly. Type info popups and field offset information can probably wait until after we have an MVP. The only main feature left is probably linking of types in doc comments, which shouldn’t be too difficult.

A couple of points I wanted to ask:

  1. The templates currently generate documentation only for namespaces, with actual types being included in the nested view. The actual type templates (e.g. uavcan/primitive/String_1_0.html) are currently empty. Should we generate pages for specific types or remove them from the output?
  2. At this point in time, I’d like to start getting feedback on the usability and navigability of the generated documentation. There’s something I’m not quite satisfied with in the current UI, but I can’t put my finger on it; some feedback from other developers using UAVCAN would be helpful in determining points to improve.

A functional generated documentation site for the public regulated data types is here:

I’d like to clean up the code and start the process of getting the documentation generation merged into upstream Nunavut some time this week, after resolving the above issues.

We are definitely on the right track here. Here is my feedback on the current functionality.

  • Earlier you suggested that we make a tree list on the left and use that for navigation. We rejected that on the basis that it is not as convenient as what we have now. However, it might be sensible as a high-level navigation aid rather than a replacement of the current approach. If I want to look up a specific data type or quickly inspect the contents of a namespace, a high-level tree view on the left might be helpful. What if we added a new panel for that on the left and possibly made it synchronized with the main view, such that when you expand a definition in the main view, the left panel automatically scrolls to reveal the place where the expanded type is defined? Do you think it makes sense?

  • As discussed at the dev call, strong styling/coloring is needed to make the docs look more structured and less like a wall of text. For instance, we could make data type names appear in bold font, apply syntax highlighting in definitions, make the documentation text stand out compared to the actual definitions, show definitions on a darker background to make it clear where the next one begins, etc. We can discuss the specifics separately.

  • PyDSDL specifies the extent (and everything in general) in bits, not bytes. I think we should show bytes because they are more convenient for users. The DSDL Specification guarantees that the extent of any composite type is always an integer multiple of 8 bits. This does not hold for primitives though, so we may end up with extents like “1.375 bytes”. We can either tolerate that or hide the extent for such types completely.

  • Service request docs are shown twice: first, in the service’s own entry, and then again in its request entry. I suppose we don’t need the latter, do we?

  • One radical idea regarding the services: do you think it’s possible to show request/response side-by-side without overcomplicating the layout? This is of minor importance though.

  • Deprecated data types should be shown in muted/dimmed color and labeled as such. Maybe we could even have a checkbox somewhere that would show/hide deprecated types? We could use this later to add other display configuration options.

  • Can we (eventually?) have a search bar at the top?

  • Can you force tree nodes to forget the state of their children when they are collapsed, such that when you re-expand the node again, all its children are restored collapsed? This logic is implemented in CLion/PyCharm and I find it super convenient.

The difference a little text styling makes is remarkable. The type definitions and names are in bold and the namespaces are in italic; the docs remain in normal text.

Basic syntax highlighting is also implemented. (Primitive) types are highlighted in green, constant values in blue, and constant names in magenta, which helps differentiate them from plain black fields.

We can definitely try adding a panel on the left to improve navigability. I’ll take a look at adding this in the morning if possible.

The extent display has been fixed. It displays bytes when the extent is a multiple of 8, and bits when it is not.
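For reference, the display rule amounts to something like the following sketch (format_extent is a hypothetical name; PyDSDL reports the extent in bits):

```python
def format_extent(bits: int) -> str:
    """Show whole bytes when the extent is a multiple of 8 bits, else show bits."""
    if bits % 8 == 0:
        n = bits // 8
        return "%d byte%s" % (n, "" if n == 1 else "s")
    return "%d bit%s" % (bits, "" if bits == 1 else "s")
```

So an extent of 2400 bits is shown as "300 bytes", while an 11-bit primitive is shown as "11 bits" rather than "1.375 bytes".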

Service request docs are now hidden; the docs are displayed on the service itself.

The side-by-side service display I think is an interesting idea, but it might sort of interrupt the flow of the page. I’ll try it anyway and post the results.

Restoring children as collapsed is trivial with a little bit of JS; I’ve just implemented it. Animations have stopped working however, which I’ll debug in the morning.

I’m thinking of including a sort of combined search/options panel in the top right for hiding deprecated types, filtering message types, etc. Will post an update on this as well when it’s implemented.

Overall there’s a good amount still left to implement, and on top of all this I also have exams this month, so we may need another week or two before this is ready to merge. Apologies for the delay. Most of the features should be in by next week, and after some code cleanup we can probably start the merge process.

One question regarding the display of types: attributes are currently prefixed with saturated or truncated to indicate their cast mode. However, all attributes are already saturated by default, so should we display only truncated and hide saturated? (We could also differentiate the two via syntax highlighting, e.g. truncated in orange and saturated in gray.)

The reader will never know that saturated is the default without reading the Spec, which (understandably) very few people do, so I would keep it displayed (as it is now) for the sake of better approachability.

@scottdixon can you please give it a look and say if you think this is okay for inclusion into Nunavut?

Can you be a bit more specific as to what “this” is?


And this: