You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When updating my microdown extensions based on the latest release, I ran into an issue where my tests for compatibility with the legacy implementations failed. One of my checks uses MicDocFactory class>>#exampleMicAsHTMLBodyString to create a HTML body string.
This string now includes a nested anchor for external links (so one inside of another), as you can see here:
<p>Here is an external link: <a target="_blank" href="http://pharo.org/"><a target="_blank" href="http://pharo.org/">http://pharo.org</a></a>.</p>
So for some reason the same link has been created inside of a link that was just being created. My first guess was to check MicHTMLVisitor>>#visitLink: because I noticed that there were some changes considering the last clause:
with: [self visitAll: aLink children] had been recently changed from the older with: aLink caption.
But reverting to the old implementation lead to another wrong result:
<p>Here is an external link: <a target="_blank" href="http://pharo.org/">[http://pharo.org](http://pharo.org)</a>.</p>
This was quite strange, because it told me that the reason for nesting the anchors must be something totally different. I finally noticed that something special had been introduced about one year ago by Kasper and changed on March 26 by Cyril, which both introduced the actual nesting bug, but in a way that it did not catch somebody's attention. My extensions were not affected, because I stayed on P12 for a long time and after that did not have compatibility tests, such that I did not notice the problem at all. Even though I do not understand all details, I could fix the issue in method
First of all, I tested the regexp and found that this regexp has 2 matches on a link like this: [https://pharo.org/](https://pharo.org)
and only one match on a link without a URL-style link caption: [Pharo](https://pharo.org)
I tried several variants, excluding and including brackets at start, but none of them worked in principal (because I do not understand the intended way how to exclude some further characters left of the pattern such that the microdown logic does not complain about something). But reading the method comment carefully, finally lead me to a solution that fully works for me (and hopefully also in general): If I change the regular expression such that it does not match an URL string that is part of the link name, everything works fine. To achieve that, I include an additional ( at start and check for the opposite parenthesis at end. This does not lead to problems, because the Mic link handler internally anyway strips off the parentheses of the URL part. My regexp now looks like:
'\(http(s)?\://[^)\s]+'
I think this fix is correct, especially as the the old code was dealing with a square bracket limiter, which only worked because anyway both parts of the link were available. As a side effect of my fix, the restriction mentioned in the method comment should not exist any longer, because the new pattern does not check for square brackets at all.
(If you try to drop the very first escaped ( of the regexp, this leads to a strange concatenation of URL caption and URL itself, clarifying that it is important to not include the URL caption in the pattern match.)
I am testing on a recent P13 image created by launcher, adding the following loading script:
#( 'Microdown' 'BeautifulComments' 'DocumentBrowser' ) do: [ :name |
(IceRepository repositoryNamed: name)
ifNil: [ self inform: 'Project not found: ' , name ]
ifNotNil: [ :found |
found
unload;
forget ] ].
Metacello new
baseline: 'Microdown';
repository: 'github://pillar-markup/Microdown:Pharo13';
load.
For checking the fix, I recommend to add the following testing method, which is green for the fixed pattern only:
MicHTMLDocumentTest>>#testLinksFromMicrodown
| html |
html := doc fromMicrodown: '
Here is an external link: [Pharo](https://pharo.org).
This external link includes its URL as text: [https://pharo.org/](https://pharo.org).
Here is an internal link with an intentionally misleading text: [http://misleading.stuff](#heading-1).'.
#(
'<a target="_blank" href="https://pharo.org/">Pharo</a>'
'<a target="_blank" href="https://pharo.org/">https://pharo.org/</a>'
'<a target="_blank" href="#heading-1">http://misleading.stuff</a>'
)
do: [ :t | self assert: (html contents indexOfSubCollection: t) > 0 ]
The text was updated successfully, but these errors were encountered:
When updating my microdown extensions based on the latest release, I ran into an issue where my tests for compatibility with the legacy implementations failed. One of my checks uses
MicDocFactory class>>#exampleMicAsHTMLBodyString
to create a HTML body string.This string now includes a nested anchor for external links (so one inside of another), as you can see here:
So for some reason the same link has been created inside of a link that was just being created. My first guess was to check
MicHTMLVisitor>>#visitLink:
because I noticed that there were some changes considering the last clause:with: [self visitAll: aLink children]
had been recently changed from the olderwith: aLink caption
.But reverting to the old implementation lead to another wrong result:
This was quite strange, because it told me that the reason for nesting the anchors must be something totally different. I finally noticed that something special had been introduced about one year ago by Kasper and changed on March 26 by Cyril, which both introduced the actual nesting bug, but in a way that it did not catch somebody's attention. My extensions were not affected, because I stayed on P12 for a long time and after that did not have compatibility tests, such that I did not notice the problem at all. Even though I do not understand all details, I could fix the issue in method
First of all, I tested the regexp and found that this regexp has 2 matches on a link like this:
[https://pharo.org/](https://pharo.org)
and only one match on a link without a URL-style link caption:
[Pharo](https://pharo.org)
I tried several variants, excluding and including brackets at start, but none of them worked in principal (because I do not understand the intended way how to exclude some further characters left of the pattern such that the microdown logic does not complain about something). But reading the method comment carefully, finally lead me to a solution that fully works for me (and hopefully also in general): If I change the regular expression such that it does not match an URL string that is part of the link name, everything works fine. To achieve that, I include an additional
(
at start and check for the opposite parenthesis at end. This does not lead to problems, because the Mic link handler internally anyway strips off the parentheses of the URL part. My regexp now looks like:'\(http(s)?\://[^)\s]+'
I think this fix is correct, especially as the the old code was dealing with a square bracket limiter, which only worked because anyway both parts of the link were available. As a side effect of my fix, the restriction mentioned in the method comment should not exist any longer, because the new pattern does not check for square brackets at all.
(If you try to drop the very first escaped
(
of the regexp, this leads to a strange concatenation of URL caption and URL itself, clarifying that it is important to not include the URL caption in the pattern match.)I am testing on a recent P13 image created by launcher, adding the following loading script:
For checking the fix, I recommend to add the following testing method, which is green for the fixed pattern only:
The text was updated successfully, but these errors were encountered: