This is a WIP to add a CST structure that preserves some details of the original markdown syntax. This is a draft PR and still needs a lot of work. I'm opening it to start a discussion and get early feedback.
The first two commits are from @patricoferris' work on https://github.com/patricoferris/omd/tree/omd-print. A few of the commits after those are not very relevant.
a3a8fae is where the CST structure is added. This is a very basic implementation: I copied a lot of the current AST code and implemented functions that go from AST to CST. I'm not sure I understood the discussion in #223 regarding the CST structure and how it should be implemented. I'm pretty sure there's a much better solution than what I have here.
Subsequent commits add details to the CST so that information is not lost when trying to print the structure back to string.
What I realized is that there is some information that we need to keep in order not to change the "meaning" of the markdown.
This is the case with:
On master, when parsing the above markdown into an AST structure, we correctly parse it as regular text and not a heading because of the escape character `\`, but the escape character is not preserved in the AST. So when we print the AST back to a string and parse that again, it becomes a heading, which is obviously not correct and needs to be fixed.
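A minimal sketch of the round-trip problem, using toy types rather than omd's actual AST or parser (the names `block`, `parse`, `print_naive` and `print_escaped` are hypothetical and only handle a single line):

```ocaml
(* Toy model, not omd's real types: a block is a heading or a paragraph. *)
type block =
  | Heading of string
  | Paragraph of string

let starts_with ~prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

let drop n s = String.sub s n (String.length s - n)

(* Parse one line. A leading backslash escapes the '#', and the
   backslash itself is dropped from the stored text. *)
let parse line =
  if starts_with ~prefix:"\\#" line then Paragraph (drop 1 line)
  else if starts_with ~prefix:"# " line then Heading (drop 2 line)
  else Paragraph line

(* Naive printer: emits paragraph text verbatim, so the escape is lost. *)
let print_naive = function
  | Heading s -> "# " ^ s
  | Paragraph s -> s

(* Escape-aware printer: re-escapes a leading '#' in paragraph text. *)
let print_escaped = function
  | Heading s -> "# " ^ s
  | Paragraph s -> if starts_with ~prefix:"#" s then "\\" ^ s else s

let source = "\\# not a heading"
let ast = parse source
let reparsed_naive = parse (print_naive ast)
let reparsed_escaped = parse (print_escaped ast)
```

With the naive printer, `reparsed_naive` comes back as a `Heading`, while the escape-aware printer gives back the original `Paragraph`. Re-escaping on print is only one possible fix, shown here for illustration; keeping the escape in the CST is the approach this PR explores.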
Besides that, there are other missing pieces of information that make the string we generate different from the original, but don't change the "meaning" of the markdown. That's the case with the emphasis character, for example.
We don't store in the AST whether the emphasis character is `_` or `*`. But in the end, we can choose whichever we want when we print the AST back to a string; it won't change the "meaning" and the HTML will be the same. Actually, Pandoc doesn't keep this information either.

I'm wondering what we're aiming for in our case. Do we strictly want to print back the exact same string we parsed, or is it fine as long as the markdown result/HTML output is the same?
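To make the two options concrete, here is a small sketch with hypothetical types (`ast_inline`, `cst_inline` and `delim` are illustrative names, not the types in this PR). The AST node forgets the delimiter, so its printer has to pick one; the CST node carries it, so the original string can be reproduced exactly:

```ocaml
(* Hypothetical types for illustration; omd's real AST/CST differ. *)
type ast_inline =
  | Text of string
  | Emph of string

type delim = Star | Underscore

type cst_inline =
  | C_text of string
  | C_emph of delim * string

let delim_to_string = function
  | Star -> "*"
  | Underscore -> "_"

(* The AST printer must choose a delimiter; here it always picks '*'. *)
let print_ast = function
  | Text s -> s
  | Emph s -> "*" ^ s ^ "*"

(* The CST printer reuses whichever delimiter the source had. *)
let print_cst = function
  | C_text s -> s
  | C_emph (d, s) ->
    let m = delim_to_string d in
    m ^ s ^ m

(* Forgetting the delimiter turns a CST node into an AST node. *)
let to_ast = function
  | C_text s -> Text s
  | C_emph (_, s) -> Emph s

let node = C_emph (Underscore, "hello")
let exact = print_cst node
let normalized = print_ast (to_ast node)
```

Here `exact` is `_hello_` and `normalized` is `*hello*`. The strings differ, but both are emphasis around the same text, which is the trade-off the question above is about.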