Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Microdown conversion to HTML creates nested external links #919

Open
riuttner opened this issue Nov 6, 2024 · 0 comments
Open

Microdown conversion to HTML creates nested external links #919

riuttner opened this issue Nov 6, 2024 · 0 comments

Comments

@riuttner
Copy link

riuttner commented Nov 6, 2024

When updating my microdown extensions based on the latest release, I ran into an issue where my tests for compatibility with the legacy implementations failed. One of my checks uses MicDocFactory class>>#exampleMicAsHTMLBodyString to create a HTML body string.

This string now includes a nested anchor for external links (so one inside of another), as you can see here:

<p>Here is an external link: <a target="_blank" href="http://pharo.org/"><a target="_blank" href="http://pharo.org/">http://pharo.org</a></a>.</p>

So for some reason the same link has been created inside of a link that was just being created. My first guess was to check MicHTMLVisitor>>#visitLink: because I noticed that there were some changes considering the last clause:

with: [self visitAll: aLink children] had been recently changed from the older with: aLink caption.

But reverting to the old implementation lead to another wrong result:

<p>Here is an external link: <a target="_blank" href="http://pharo.org/">[http://pharo.org](http://pharo.org)</a>.</p>

This was quite strange, because it told me that the reason for nesting the anchors must be something totally different. I finally noticed that something special had been introduced about one year ago by Kasper and changed on March 26 by Cyril, which both introduced the actual nesting bug, but in a way that it did not catch somebody's attention. My extensions were not affected, because I stayed on P12 for a long time and after that did not have compatibility tests, such that I did not notice the problem at all. Even though I do not understand all details, I could fix the issue in method

MicInlineHttpDelimiter>>#markupAsRegex`

	^ `'http(s)?\://[^]\s]+'`

First of all, I tested the regexp and found that this regexp has 2 matches on a link like this:
[https://pharo.org/](https://pharo.org)

and only one match on a link without a URL-style link caption:
[Pharo](https://pharo.org)

I tried several variants, excluding and including brackets at start, but none of them worked in principal (because I do not understand the intended way how to exclude some further characters left of the pattern such that the microdown logic does not complain about something). But reading the method comment carefully, finally lead me to a solution that fully works for me (and hopefully also in general): If I change the regular expression such that it does not match an URL string that is part of the link name, everything works fine. To achieve that, I include an additional ( at start and check for the opposite parenthesis at end. This does not lead to problems, because the Mic link handler internally anyway strips off the parentheses of the URL part. My regexp now looks like:

'\(http(s)?\://[^)\s]+'

I think this fix is correct, especially as the the old code was dealing with a square bracket limiter, which only worked because anyway both parts of the link were available. As a side effect of my fix, the restriction mentioned in the method comment should not exist any longer, because the new pattern does not check for square brackets at all.

(If you try to drop the very first escaped ( of the regexp, this leads to a strange concatenation of URL caption and URL itself, clarifying that it is important to not include the URL caption in the pattern match.)

I am testing on a recent P13 image created by launcher, adding the following loading script:

#( 'Microdown' 'BeautifulComments' 'DocumentBrowser' ) do: [ :name |
		(IceRepository repositoryNamed: name)
			ifNil: [ self inform: 'Project not found: ' , name ]
			ifNotNil: [ :found |
				found
					unload;
					forget ] ].
					
Metacello new
    baseline: 'Microdown';
    repository: 'github://pillar-markup/Microdown:Pharo13';
    load.

For checking the fix, I recommend to add the following testing method, which is green for the fixed pattern only:

MicHTMLDocumentTest>>#testLinksFromMicrodown
	| html |
	
	html := doc fromMicrodown: '
		Here is an external link: [Pharo](https://pharo.org).

		This external link includes its URL as text: [https://pharo.org/](https://pharo.org).

		Here is an internal link with an intentionally misleading text: [http://misleading.stuff](#heading-1).'.
	#(
		'<a target="_blank" href="https://pharo.org/">Pharo</a>'
		'<a target="_blank" href="https://pharo.org/">https://pharo.org/</a>'
		'<a target="_blank" href="#heading-1">http://misleading.stuff</a>'
	)
		do: [ :t | self assert: (html contents indexOfSubCollection: t) > 0 ]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant