-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix some emphasis parsing #269
fix some emphasis parsing #269
Conversation
Okay, I think I managed to fix all the broken tests mentioned in #244. At least tests are passing now and nothing else broke 😁 It's the first time I'm working on a parser, and that was not easy 🥵😅 Based on the expected outcome, I first tried to guess what should be the solution until I told myself that it should probably be a good idea to read the spec. First time that I'm reading a spec (or rather a part of it), and that was very helpful! 😁 There are 3 main commits:
I'm not very happy with the code of 1 and 3 so I'll see if I come up with something better. I'm planning on creating a separate PR for each of them so we can have a dedicated discussion for each. The fix is different anyway so I think it makes sense. |
This is great. Thank you. Splitting PRs would be good. |
Yeah, this is really great work! I agree that refactoring it into independent PRs could help keep things very clean. One heads up is should we be aware of the potential with conflicts with #266. I'll work on trying to get that merged in the next day or two, which should help clear up that risk one way or another :) |
I would be fine with review this from the current PR. While I appreciate the intention to keep the PRs clean and focused, this group of spec violations was already grouped into a single issue, and the changeset is small enough here, and close enough related, that I think it makes sense to review in one PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some initial questions and suggestions.
I think your additions generally read ok and fits with the existing code around it.
My biggest ask is that you provide comments to explain the logic you're introducing. This is something we are sorely lacking in the existing parsing code, and it would be awesome to start sharing the knowledge while it is fresh in people's minds after they have to reconstruct it from the code.
All of my other suggestions here are negotiable if you have strong feelings that point in another direction.
src/parser.ml
Outdated
is_opener x | ||
&& (n1 + n2) mod 3 = 0 | ||
&& n1 mod 3 != 0 | ||
&& n2 mod 3 != 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you either add a comment explaining the logic for this conditional, or define a predicate that provided a "self-documenting" explanation of what is being tested in this condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added an is_match
function with a description coming from the spec: https://github.com/ocaml/omd/pull/269/files#diff-2f56b48efe4c0b849c5b527b223a6c8ab85abd599e10f8b465f5c9801449a468R921
Not convinced by the function's name though but I couldn't come up with anything better. Open to any better alternative :)
src/parser.ml
Outdated
let xs = | ||
if n1 >= 2 && n2 >= 2 then | ||
if n2 > 2 then Emph (Other, post, q2, n2 - 2) :: xs else xs | ||
else if n2 > 1 then Emph (Punct, post, q2, n2 - 1) :: xs | ||
else xs | ||
in | ||
let r = | ||
let il = concat (List.map to_r (parse_emph (List.rev acc))) in | ||
if n1 >= 2 && n2 >= 2 then R (Strong ([], il)) :: xs | ||
else R (Emph ([], il)) :: xs | ||
in | ||
let r = | ||
if n1 >= 2 && n2 >= 2 then | ||
if n1 > 2 then Emph (pre, Other, q1, n1 - 2) :: r else r | ||
else if n1 > 1 then Emph (pre, Punct, q1, n1 - 1) :: r | ||
else r | ||
in | ||
parse_emph r |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you didn't add this code, but if you happened to have reasoned out what it does and can help illuminate with some comments, that would be a big help. I find it pretty inscrutable! If not, not worries :)
src/parser.ml
Outdated
if not is_next_closer_same then loop (x :: acc) xs1 | ||
else | ||
let xs' = parse_emph xs in | ||
if xs' = xs then loop (x :: acc) xs1 else loop acc xs' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't puzzled out this function enough, do you happen to know why we are doing a (polymorphic) equality comparison on the lists here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea 😬 I could investigate but that comparison doesn't seem to be needed as the following code successfully pass all the tests.
diff --git a/src/parser.ml b/src/parser.ml
index e90b96d..46f37b6 100644
--- a/src/parser.ml
+++ b/src/parser.ml
@@ -959,9 +959,7 @@ module Pre = struct
| Some (_, _, q3, _) -> q2 = q3
in
if not is_next_closer_same then loop (x :: acc) xs1
- else
- let xs' = parse_emph xs in
- if xs' = xs then loop (x :: acc) xs1 else loop acc xs'
+ else loop acc (parse_emph xs)
| x :: xs -> loop (x :: acc) xs
| [] -> x :: List.rev acc
in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's already a good investigation! I'll make note to followup here with a change that removes this bit. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried removing that equality comparison in master
diff --git a/src/parser.ml b/src/parser.ml
index 6342989..053ca50 100644
--- a/src/parser.ml
+++ b/src/parser.ml
@@ -926,9 +926,8 @@ module Pre = struct
else r
in
parse_emph r
- | (Emph _ as x) :: xs1 as xs when is_opener x ->
- let xs' = parse_emph xs in
- if xs' = xs then loop (x :: acc) xs1 else loop acc xs'
+ | (Emph _ as x) :: _ as xs when is_opener x ->
+ loop acc (parse_emph xs)
| x :: xs -> loop (x :: acc) xs
| [] -> x :: List.rev acc
in
When running the tests, it gets stuck at 99%. I assume it's because of an infinite loop. So it appears that this code is useful in master but becomes unnecessary with that PR.
7e6a712
to
72ea961
Compare
Thanks for the review! 😊 I would have preferred to split that PR into 3 smaller ones, but now that you started the review process, let's continue here :) I will address your comments and try to do my best to document my changes. |
I should have addressed your comments. I also added some comments to document the code I added. That wasn't super easy to explain so hopefully it's more or less understandable 😅 I'm wondering if the functions I'm not sure if that makes sense and if that's feasible. I'll try and see how far I can go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few followup suggestions, but looks excellent to me.
Thank you very much for your careful work here. The comments are a huge help in understanding what's going on in this code and, moreover and more importantly, set a good example for the rest of us to follow as we continue maintaining the parser.
(* | ||
- *foo**bar**baz* | ||
|
||
*foo** -> the second delimiter ** is both an opening and closing delimiter. | ||
The sum of the length of both delimiters is 3, so they can't be matched. | ||
|
||
**bar** -> they are both opening and closing delemiters. | ||
Their sum is 4 which is not a multiple of 3 so they can be matched to produce <strong>bar</strong> | ||
|
||
The end result is: <em>foo<strong>bar</strong>baz</em> | ||
|
||
- *foo***bar**baz* | ||
|
||
*foo*** -> *** is both an opening and closing delimiter. | ||
Their sum is 4 so they can be matched to produce: <em>foo</em>** | ||
|
||
**bar** -> they are both opening and closing delemiters. | ||
Their sum is 4 which is not a multiple of 3 so they can be matched to produce <strong>bar</strong> | ||
|
||
The end result is: <em>foo</em><strong>bar</strong>baz* | ||
|
||
- ***foo***bar**baz* | ||
|
||
***foo*** -> the second delimiter *** is both an opening and closing delimiter. | ||
Their sum is 6 which is a multiple of 3. However, both lengths are multiples of 3 | ||
so they can be matched to produce: <em><strong>foo</strong></em> | ||
|
||
bar**baz* -> ** is both an opening and closing delimiter. | ||
Their sum is 3 so they can't be matched | ||
|
||
The end result is: <em><strong>foo</strong></em>bar**baz* | ||
*) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wonderful documentation. Thank you very much!
*foo**bar*baz* The second delimiter that's both an opener/closer ( ** before bar) | ||
doesn't match with the next delimiter ( * after bar). **bar will be | ||
considered as regular text. The end result will be: <em>foo**bar</em>baz* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dang, this is so subtle. Having the comment explaining really helps.
src/parser.ml
Outdated
if not is_next_closer_same then loop (x :: acc) xs1 | ||
else | ||
let xs' = parse_emph xs in | ||
if xs' = xs then loop (x :: acc) xs1 else loop acc xs' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's already a good investigation! I'll make note to followup here with a change that removes this bit. Thanks!
I'm sorry about that! I didn't mean to push you into doing just one PR. I'll refrain from doing a review before you indicate you're ready in the future. As soon as you are happy with this one, please convert it from a draft PR. All that's left to be done there is to add an entry into the changelog (I'll do that if you haven't beaten me to it by the time this is converted from a draft). Thanks again for your excellent work here on fixing the parser! :) |
Hi, @tatchi! Just checking here. If you're happy with the state of things, I'd be happy to make a small add here to update the changelog and merge this in. But I don't want to step on your toes in case you had been planning to do more work on this. |
Hi @shonfeder, everything looks good to me except these two I'm not sure if that will be feasible but I would like at least to give it a try. Good news is that I'll have time to look at it this week, so expect an outcome by the end of the week 😁 |
I discovered an emphasis parsing bug that was not covered by the conformance tests. I added an extra test in 46d6a61 with the fix in 6e17dd3 For the rest, I haven't found a way to remove these two I marked the PR as ready for review so it can be merged now 😁 |
Many thanks for the great work here, @tatchi! |
CHANGES: - Expose the HTML escape function `htmlentities` (ocaml-community/omd#295 @cuihtlauac) - Support generation of identifiers in headers (ocaml-community/omd#294, @tatchi) - Support GitHub-Flavoured Markdown tables (ocaml-community/omd#292, @bobatkey) - Update parser to support CommonMark Spec 0.30 (ocaml-community/omd#266, @SquidDev) - Preserve the order of input files in the HTML output to stdout (ocaml-community/omd#258, @patricoferris) - Fix all deviations from CommonMark Spec 0.30 (ocaml-community/omd#284, ocaml-community/omd#283, ocaml-community/omd#278, ocaml-community/omd#277, ocaml-community/omd#269, @tatchi)
Not ready for review yet. Opening it as a draft to give an overview of what I'm currently working on.