General problem with extensibility of the Microdown parser framework #913

Open
riuttner opened this issue Oct 31, 2024 · 0 comments

In my current application I have a use case where I can successfully use the Microdown package to cover most of the functionality I need: I have plain Markdown files which I convert to HTML files. During the conversion, I need the following extensions:

  • Markdown does not need explicit anchors for targets that are headings. For local references, you just convert the heading text to lowercase and replace spaces with hyphens. To turn these references into HTML anchors, however, you have to convert them to plain ASCII if they contain Unicode characters.
  • When a markdown file refers to another file, I want to replace the file name extension from .md to .html.
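To make the two rules above concrete, here is a minimal Playground sketch. It is not the actual HTMLUrlTranslator: the block names headingToAnchor and mdToHtml are hypothetical, and it assumes that percent-encoding is an acceptable way to get plain ASCII anchors.

```smalltalk
"Playground sketch with hypothetical names; not the real HTMLUrlTranslator."
| headingToAnchor mdToHtml |

"Lowercase the heading text, replace spaces with hyphens, then percent-encode
 everything that is not a plain ASCII unreserved character (assumption)."
headingToAnchor := [ :heading |
	ZnPercentEncoder new encode:
		(heading asLowercase copyReplaceAll: ' ' with: '-') ].

"Swap a trailing .md for .html and keep an optional #fragment untouched."
mdToHtml := [ :url | | path fragment |
	path := url copyUpTo: $#.
	fragment := url copyFrom: path size + 1 to: url size.
	(path endsWith: '.md')
		ifTrue: [ (path allButLast: 3) , '.html' , fragment ]
		ifFalse: [ url ] ].

headingToAnchor value: 'Getting started'.
mdToHtml value: 'guide/setup.md#install'. "'guide/setup.html#install'"
```

URLs that do not end in .md are returned unchanged, so the second block can be applied to every link without checking its type first.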

I implemented this functionality in an HTMLUrlTranslator class, which provides a preconfigured instance that can be called on every single URL to process it. Instead of the original MicHTMLVisitor, I use my own subclass that hooks #visitAnchor:, #visitHeader: and #visitLink: to add the needed anchors and track everything, so that I can check consistency later. So far, so easy. However, I run into an additional problem during the translation from Markdown to HTML: parsing already fails, because ZnPercentEncoder throws an error if it encounters a Unicode character within an anchor. This happens during the call of MicInlineBlockWithUrl>>#closeMe, which is the last call made from within MicInlineParser>>#parseNameUrlBlock:from:token:.
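The following Playground snippet should trigger the same kind of error in isolation. It rests on an assumption: that the failure comes from ZnPercentEncoder>>#decode:, which expects its input to consist of ASCII characters (plus percent escapes) only. The sample string is made up.

```smalltalk
"Assumption: the parse failure originates in ZnPercentEncoder>>#decode:."
[ ZnPercentEncoder new decode: 'überblick' ]
	on: Error
	do: [ :error | error messageText ]
```

If that assumption holds, percent-encoding the anchor before it reaches #closeMe avoids the error, which is what fixing the URL in the overridden method amounts to.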

Of course, my subclass of MicInlineParser can override the whole method and fix the URL before #closeMe is run for the block (which I already do, because I want to make it work). But when taking a closer look at how variation is handled in the current implementation, I noticed the following:

  • Typically, classes representing a variant of functionality are passed to a method (for example to MicInlineParser>>#parseNameUrlBlock:from:token:, which does not actually expect a UrlBlock instance, but a block type).
  • These classes are then sent #new to create an instance of the respective type, which makes it impossible to pass any useful context at this late point.
  • In general, there is no hook on creation of a subelement that would optionally let it get context from the parent creator. Such a hook would also make sense because some of the child elements can hold properties, even though these properties are mostly unused.

Even passing some context to my own subclass of MicInlineParser was quite tricky, as the original code creates it as follows:

MicElement>>#newInlineParser

	^ self inlineParserClass new

Fortunately, the hierarchy is deep enough that I could implement the following extension to inject my own subclass and call a setup hook on the main parser, which is available in each block element:

MicAbstractBlock>>#newInlineParser

	^ self inlineParserClass == MicConfiguredInlineParser superclass 
		ifTrue: [ MicConfiguredInlineParser newOn: parser ]
		ifFalse: [ super newInlineParser ]

With this approach I could at least reduce the number of methods to override to one, even though I consider this solution preliminary.

What I would like to contribute, to make the whole thing much easier for my own and possibly others' future changes, is:

  • Changing method arguments from types to instances wherever appropriate.
  • Providing general hooks for setting up the different types of children on their respective creators.
  • None of the changes should lead to different behaviour of the standard classes, of course.
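As a rough illustration of the kind of hook meant here, consider the sketch below. It is only an assumption about how such a hook could look: #configureInlineParser: does not exist in Microdown, and MyBlock, #urlTranslator and #urlTranslator: are placeholder names.

```smalltalk
"Sketch only: #configureInlineParser: is a hypothetical hook."

MicElement>>#newInlineParser

	| inlineParser |
	inlineParser := self inlineParserClass new.
	"Give the creator a chance to pass context to the new child."
	self configureInlineParser: inlineParser.
	^ inlineParser

MicElement>>#configureInlineParser: anInlineParser

	"Default hook: intentionally empty, so standard classes behave as before."

"Hypothetical extension; all names below are placeholders."
MyBlock>>#configureInlineParser: anInlineParser

	anInlineParser urlTranslator: self urlTranslator
```

Because the default hook does nothing, the standard classes keep their current behaviour, while an extension only has to override one small method instead of replacing #newInlineParser as a whole.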

I noticed that the inline parser has recently been refactored very thoroughly (which was nice anyway, as the old code was a bit strange). If similar work is expected on other core classes in the near future, it might not be the best time to come up with a PR from my side as a basis for discussion, but waiting a bit is no problem for me. I will prepare my proposal anyway and check how it behaves and how easy it really makes implementing extensions that reach deeper inside. (What I plan for the near future is creating XHTML in parallel to HTML, as I want to generate output that I can later translate outside of my application into something quite different.)

Please tell me what you think!
