Should Mr. Docs be XML-only? #678
Comments
The XML format could be "breadth-first." That is, once we have extracted all the symbols into a tree, the XML file is produced by performing a breadth-first visitation on the root. There should be no nested symbols. That is, a namespace will not have children. Instead the children will refer to the namespace by id, and the namespace will be seen first. |
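A minimal sketch of the breadth-first idea above, in Python for brevity. The `Symbol` class and the `symbol`/`parent` element and attribute names are hypothetical and are not the Mr. Docs data model or schema; the point is only that a queue-based visitation yields a flat list in which every parent appears before its children.

```python
from collections import deque
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Symbol:
    # Hypothetical in-memory symbol node, for illustration only.
    id: str
    kind: str
    name: str
    children: list = field(default_factory=list)


def flatten_breadth_first(root: Symbol) -> ET.Element:
    """Emit one flat element per symbol, parents always before children."""
    out = ET.Element("symbols")
    queue = deque([(root, None)])  # pairs of (symbol, parent id)
    while queue:
        sym, parent_id = queue.popleft()
        attrs = {"id": sym.id, "kind": sym.kind, "name": sym.name}
        if parent_id is not None:
            # Children point back at the enclosing scope instead of nesting.
            attrs["parent"] = parent_id
        ET.SubElement(out, "symbol", attrs)
        queue.extend((child, sym.id) for child in sym.children)
    return out


tree = Symbol("1", "namespace", "", [
    Symbol("2", "namespace", "boost", [
        Symbol("4", "class", "url"),
    ]),
    Symbol("3", "function", "main"),
])
flat = flatten_breadth_first(tree)
print(ET.tostring(flat, encoding="unicode"))
```

No `symbol` element has child elements here; the hierarchy is recoverable only through the `parent` attribute.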
This matches my intuition of what MrDocs should have been when it started. It would do one thing well, at least in the MVP, and we would have additional utilities over time. I also have stated that I'm in favor of removing some significant things, although for different reasons, like:
Each of these things could be reevaluated in the future, but I don't think these dependencies are helping us a lot right now. However, I'm not too fond of removing the Asciidoc generator now. We depend a lot on Asciidoc for other projects, and removing Asciidoc support now would complicate things a lot. Although this could have been a good idea at the beginning of the project, I don't think it's a good idea now:
|
Most of the stuff you mentioned is not really relevant. The one thing which is:
That's a problem. |
After some analysis, it looks like XML-only is dead on arrival for now.
And having an immutable data structure which is shared between multiple Python threads of execution. |
Yes. That's true if we find an easy way to integrate the post-processing step.
We'll only know about efficiency with some experiments. |
Pasting some comments by René:
|
@grafikrobot in the current implementation, authors can create additional Generator extensions implemented as dynamically loaded DLL or shared object files. So Mr. Docs today is not in theory limited to the Handlebars templates that we have created. The nice thing is that developing for our extensions API does not require a local installation of clang/LLVM source code.
There are overheads with XML but I believe we can mitigate them completely, or close to it, by restructuring the XML. Making it flat instead of hierarchical will allow parallel processing. And we can build a separate index of file offsets to enable random access. I am confident we can make this work, but it isn't clear that Python will deliver us the performance we expect.
Of course I would love your "markdown-agnostic" solution, but the annoying implementation details get in the way. For example, computation of the "SafeName" for a C++ symbol very much depends on the target Markdown language, because some special characters are valid in some markdowns and not others. In fact the SafeName also has to care about the target filesystem, because it can't use characters which are illegal or special for that filesystem. Our current implementation limits the set of characters in a SafeName to only the subset of characters which are not special on Linux or Windows partitions.
Furthermore, forming a link to a symbol depends on the target markdown language. It is kind of ugly and, for performance reasons, best implemented natively. Having a partial / template produce the link would be a mess. The formed link also depends on whether you are doing single-page or multi-page. In other words, there are a small handful of markdown-specific algorithms which are best expressed in C++, having access to the in-memory representation of the program's metadata.
I appreciate that we explored the XML-only solution, and I think for now we should just finish what we have so that we have uncovered all the unknowns. Get this into the hands of users and some months or years of field experience. And then, after all the rough edges are exposed and smoothed out, we can revisit this XML-only idea. |
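To make the "separate index of file offsets" idea concrete, here is a rough Python sketch. It assumes a flat file with one record per line; the record contents, the `write_with_index` and `load_symbol` helpers, and the in-memory index format are all hypothetical and not something Mr. Docs provides.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical flat records; element and attribute names are illustrative.
records = [
    '<symbol id="1" kind="namespace" name="boost"/>',
    '<symbol id="2" kind="class" name="url" parent="1"/>',
    '<symbol id="3" kind="function" name="parse" parent="1"/>',
]


def write_with_index(stream, records):
    """Write one record per line and return {id: (offset, length)} in bytes."""
    index = {}
    stream.write(b"<symbols>\n")
    for rec in records:
        data = rec.encode("utf-8")
        sym_id = ET.fromstring(data).get("id")
        # Record where this element starts and how many bytes it occupies.
        index[sym_id] = (stream.tell(), len(data))
        stream.write(data + b"\n")
    stream.write(b"</symbols>\n")
    return index


def load_symbol(stream, index, sym_id):
    """Seek straight to one record and parse only that element."""
    offset, length = index[sym_id]
    stream.seek(offset)
    return ET.fromstring(stream.read(length))


buf = io.BytesIO()  # stands in for a file on disk
index = write_with_index(buf, records)
sym = load_symbol(buf, index, "3")
print(sym.get("name"))
```

Because each lookup touches only one record, independent workers could in principle each open the file and fetch the symbols they need without parsing the whole document. Whether that is fast enough in practice is exactly the kind of thing that would need measuring.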
If we make Mr. Docs XML-only, we get a number of benefits:
However there could be downsides:
Some of my rambling:
Maybe our XML output should be close to flat, instead of having nested scopes.
The advantage of making Mr. Docs XML-only is that we can focus on doing one thing and doing it well. We can optimize every step of the extraction for XML, without worrying about bitcode or other representations. And we wouldn't need plugins. This is a tradeoff though, because we have 1. the problem of emitting one huge XML file, and 2. performance questions about converting XML to markdown using a sandboxed language.
And we get to use industry standard components like Jinja or Handlebars. The author of the XML converter has complete freedom to use any tools they want.
I'm ok with emitting a single XML file but I think we need to be smart about the format. We should consider flattening the output so that a child comes after the parent rather than being nested in the parent. We can use node IDs or whatever, the "id" field, to refer to the parent. If we order the entries in the XML output from the top scope down to the most nested scope, then the XML converter can ingest parent scopes incrementally and then decide if it wants to launch additional threads to process the children.
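As a sketch of what incremental ingestion could look like on the converter side, the Python snippet below streams a flat file with `xml.etree.ElementTree.iterparse` and resolves each symbol's qualified name as soon as it is read. The sample XML, the `parent` attribute, and the `ingest` function are assumptions made for illustration, not an existing Mr. Docs format.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical flat output, ordered from the top scope down.
FLAT_XML = b"""<symbols>
<symbol id="1" kind="namespace" name="boost"/>
<symbol id="2" kind="class" name="url" parent="1"/>
<symbol id="3" kind="function" name="parse" parent="1"/>
<symbol id="4" kind="function" name="size" parent="2"/>
</symbols>"""


def ingest(stream):
    """Stream symbols one at a time, relying on parents arriving first."""
    qualified = {}                # id -> fully qualified name
    children = defaultdict(list)  # parent id -> child ids, in file order
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "symbol":
            continue
        sym_id, parent = elem.get("id"), elem.get("parent")
        name = elem.get("name")
        if parent is None:
            qualified[sym_id] = name
        elif parent in qualified:
            qualified[sym_id] = qualified[parent] + "::" + name
            children[parent].append(sym_id)
        else:
            # The ordering guarantee was violated by the producer.
            raise ValueError(f"symbol {sym_id} precedes its parent {parent}")
        elem.clear()  # drop attributes we no longer need
    return qualified, children


qualified, children = ingest(io.BytesIO(FLAT_XML))
print(qualified["4"])
```

Since a scope is complete as soon as it has been read, a converter could hand the children of each parent to a worker at that point instead of waiting for the end of the file.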