Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix some emphasis parsing #269

Merged
merged 15 commits into from
Aug 1, 2022
135 changes: 114 additions & 21 deletions src/parser.ml
Original file line number Diff line number Diff line change
Expand Up @@ -904,31 +904,124 @@ module Pre = struct
| Emph (_, _, Underscore, n) -> Text ([], String.make n '_')
| R x -> x

let rec find_next_emph = function
| Emph (pre, post, style, n) :: _ -> Some (pre, post, style, n)
| _ :: xs -> find_next_emph xs
| [] -> None

let rec find_next_closer_emph = function
| (Emph (pre, post, style, n) as e) :: _ when is_closer e ->
Some (pre, post, style, n)
| _ :: xs -> find_next_closer_emph xs
| [] -> None

(* From the spec: "If one of the delimiters can both open and close emphasis, then the sum of the lengths
of the delimiter runs containing the opening and closing delimiters must not be
a multiple of 3 unless both lengths are multiples of 3" *)
tatchi marked this conversation as resolved.
Show resolved Hide resolved
let is_match n1 n2 =
tatchi marked this conversation as resolved.
Show resolved Hide resolved
(*
- *foo**bar**baz*

*foo** -> the second delimiter ** is both an opening and closing delimiter.
The sum of the length of both delimiters is 3, so they can't be matched.

**bar** -> they are both opening and closing delemiters.
Their sum is 4 which is not a multiple of 3 so they can be matched to produce <strong>bar</strong>

The end result is: <em>foo<strong>bar</strong>baz</em>

- *foo***bar**baz*

*foo*** -> *** is both an opening and closing delimiter.
Their sum is 4 so they can be matched to produce: <em>foo</em>**

**bar** -> they are both opening and closing delemiters.
Their sum is 4 which is not a multiple of 3 so they can be matched to produce <strong>bar</strong>

The end result is: <em>foo</em><strong>bar</strong>baz*

- ***foo***bar**baz*

***foo*** -> the second delimiter *** is both an opening and closing delimiter.
Their sum is 6 which is a multiple of 3. However, both lengths are multiples of 3
so they can be matched to produce: <em><strong>foo</strong></em>

bar**baz* -> ** is both an opening and closing delimiter.
Their sum is 3 so they can't be matched

The end result is: <em><strong>foo</strong></em>bar**baz*
*)
Comment on lines +924 to +955
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wonderful documentation. Thank you very much!

if (n1 + n2) mod 3 = 0 && n1 mod 3 != 0 && n2 mod 3 != 0 then false
else true

let rec parse_emph = function
| (Emph (pre, _, q1, n1) as x) :: xs when is_opener x ->
let rec loop acc = function
| (Emph (_, post, q2, n2) as x) :: xs when is_closer x && q1 = q2 ->
let xs =
if n1 >= 2 && n2 >= 2 then
if n2 > 2 then Emph (Punct, post, q2, n2 - 2) :: xs else xs
else if n2 > 1 then Emph (Punct, post, q2, n2 - 1) :: xs
else xs
in
let r =
let il = concat (List.map to_r (parse_emph (List.rev acc))) in
if n1 >= 2 && n2 >= 2 then R (Strong ([], il)) :: xs
else R (Emph ([], il)) :: xs
in
let r =
if n1 >= 2 && n2 >= 2 then
if n1 > 2 then Emph (pre, Punct, q1, n1 - 2) :: r else r
else if n1 > 1 then Emph (pre, Punct, q1, n1 - 1) :: r
else r
| (Emph (_, post, q2, n2) as x) :: xs1 as xs
when is_closer x && q1 = q2 ->
(* At that point ☝️ we have an openener followed by a closer. Both are of the same style (either * or _) *)
tatchi marked this conversation as resolved.
Show resolved Hide resolved
if is_opener x && not (is_match n1 n2) then
(*
The second delimiter (the closer) is also an opener, and both delimiters don't match together,
according to the "mod 3" rule. In that case, we check if the next delimiter can match.

*foo**bar**baz* The second delimiter that's both an opener/closer ( ** before bar)
matches with the next delimiter ( ** after bar). They'll become
<strong>bar</strong>. The end result will be: <em>foo<strong>bar</strong>baz</em>


*foo**bar*baz* The second delimiter that's both an opener/closer ( ** before bar)
doesn't match with the next delimiter ( * after bar). **bar will be
considered as regular text. The end result will be: <em>foo**bar</em>baz*
Comment on lines +976 to +978
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dang, this is so subtle. Having the comment explaining really helps.

*)
match find_next_emph xs1 with
| Some (_, _, _, n3) when is_match n3 n2 ->
let xs' = parse_emph xs in
loop acc xs'
| _ -> loop (x :: acc) xs1
else
let xs =
if n1 >= 2 && n2 >= 2 then
if n2 > 2 then Emph (Other, post, q2, n2 - 2) :: xs1
else xs1
else if n2 > 1 then Emph (Punct, post, q2, n2 - 1) :: xs1
else xs1
in
let r =
let il = concat (List.map to_r (parse_emph (List.rev acc))) in
if n1 >= 2 && n2 >= 2 then R (Strong ([], il)) :: xs
else R (Emph ([], il)) :: xs
in
let r =
if n1 >= 2 && n2 >= 2 then
if n1 > 2 then Emph (pre, Other, q1, n1 - 2) :: r else r
else if n1 > 1 then Emph (pre, Punct, q1, n1 - 1) :: r
else r
in
parse_emph r
| (Emph (_, _, q2, _) as x) :: xs1 as xs when is_opener x ->
(*
This case happens when we encounter a second opener delimiter. We look ahead for the next closer,
and if the next closer is of the same style, we can match them together.

*foo _bar_ baz_ The second opener (_ before `bar`) is of the same style as the next closer
(_ after `bar`). We can match them to produce <em>bar</em>
The end result will be: *foo <em>bar</em> baz_


*foo _bar* baz_ The second opener (_ before `bar`) is not of the same style as the next closer
( * after `bar`). They can't be matched so we'll consider _bar as regular text.
The end result will be: <em>foo _bar</em> baz_
*)
let is_next_closer_same =
match find_next_closer_emph xs1 with
| None -> false
| Some (_, _, q3, _) -> q2 = q3
in
parse_emph r
| (Emph _ as x) :: xs1 as xs when is_opener x ->
let xs' = parse_emph xs in
if xs' = xs then loop (x :: acc) xs1 else loop acc xs'
if not is_next_closer_same then loop (x :: acc) xs1
else
let xs' = parse_emph xs in
if xs' = xs then loop (x :: acc) xs1 else loop acc xs'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't puzzled out this function enough, do you happen to know why we are doing a (polymorphic) equality comparison on the lists here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea 😬 I could investigate but that comparison doesn't seem to be needed as the following code successfully pass all the tests.

diff --git a/src/parser.ml b/src/parser.ml
index e90b96d..46f37b6 100644
--- a/src/parser.ml
+++ b/src/parser.ml
@@ -959,9 +959,7 @@ module Pre = struct
                 | Some (_, _, q3, _) -> q2 = q3
               in
               if not is_next_closer_same then loop (x :: acc) xs1
-              else
-                let xs' = parse_emph xs in
-                if xs' = xs then loop (x :: acc) xs1 else loop acc xs'
+              else loop acc (parse_emph xs)
           | x :: xs -> loop (x :: acc) xs
           | [] -> x :: List.rev acc
         in

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's already a good investigation! I'll make note to followup here with a change that removes this bit. Thanks!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried removing that equality comparison in master

diff --git a/src/parser.ml b/src/parser.ml
index 6342989..053ca50 100644
--- a/src/parser.ml
+++ b/src/parser.ml
@@ -926,9 +926,8 @@ module Pre = struct
                 else r
               in
               parse_emph r
-          | (Emph _ as x) :: xs1 as xs when is_opener x ->
-              let xs' = parse_emph xs in
-              if xs' = xs then loop (x :: acc) xs1 else loop acc xs'
+          | (Emph _ as x) :: _ as xs when is_opener x ->
+              loop acc (parse_emph xs)
           | x :: xs -> loop (x :: acc) xs
           | [] -> x :: List.rev acc
         in

When running the tests, it gets stuck at 99%. I assume it's because of an infinite loop. So it appears that this code is useful in master but becomes unnecessary with that PR.

| x :: xs -> loop (x :: acc) xs
| [] -> x :: List.rev acc
in
Expand Down
8 changes: 8 additions & 0 deletions tests/dune.inc

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 1 addition & 2 deletions tests/extract_tests.ml
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,7 @@ let protect ~finally f =
finally ();
r

let disabled =
[ 206; 215; 216; 410; 411; 414; 415; 416; 428; 468; 469; 519; 539 ]
let disabled = [ 206; 215; 216; 519; 539 ]

let with_open_in fn f =
let ic = open_in fn in
Expand Down