stdlib audit: use DLS for parsing global state #11225

Octachron · 2022-04-28T15:17:38Z

The Parsing module which is used by every ocamlyacc generated parsers relies on a global mutable work state.

This means that currently no parsers generated by ocamlyacc should be run concurrently.
This property does not seem that simple to satisfy in presence of multiple libraries that may use ocamlyacc parser internally.

This PR proposes thus to alleviate this issue by making the Parsing work state a domain local state.

This is not sufficient to make ocamlyacc thread-safe: users will still need to use locks to achieve concurrency safety, but they would no longer be required to use a single lock for all ocamlyacc generated parsers. I believe that this is an usability improvement. Moreover, this is the the best that we can do in backward compatible way since location functions like right_symbol depends on the implicit parser work state.

avsm · 2022-04-28T15:25:34Z

manual/src/cmds/lexyacc.etex

+domain but is shared between various threads.
+It is thus recommended to use locks when using ocamlyacc parsers
+on different threads, but domain parallelism is achievable.
+


Could we state a little more explicitly what is safe? If I understand this right, spawning multiple parsers in separate domains and letting them run to completion is safe. Suspending a computation on a domain and resuming it on the same domain is also safe, if no other parsers run in the interim on that domain.

But stack switching and migrating the parser execution to a different domain is not safe. And suspending a parser execution on a domain and resuming another parser execution on the same domain is not safe.

I agree that it is better to be more explicit, and yes the simple safe use case is to run a parser to completion on the same domain without any interleaving without another ocamlyacc parser.

Maybe something like:

It is safe to spawn multiple parsers in separate domains and let them run to completion.
It is also safe to suspend and resume a parser on the same domain if no other parsers run in the interim on this domain.
It is not safe to migrate a parser execution to a different domain or to interleave the execution of two or more parsers on the same domain.

?

That reads better to me! It might also be worth giving a small example about how locks might be used to achieve concurrency-safety:

users will still need to use locks to achieve concurrency safety

It's not quite clear to me how to use locks safely here without a single lock around the entire execution of the parser.

The lock needs to be taken on the entire execution of the parser indeed.
I have added explicit examples to the manual to clarify the limited safety offered.

If I understand correctly, unsafe usage of ocamlyacc according to these guidelines is already possible in OCaml 4 (you can call recursively another parser from within a semantic action) and will not break memory- or type-safety, right?

I think the documentation could clarify what safe means here, and in particular what unsafe means. If I'm hunting a segfault in a Multicore OCaml program, should I or should I not rule out ocamlyacc as a plausible culprit?

This is a significant turn-off: it means that ocamlyacc was already memory-unsafe to use with OCaml 4.x, and this PR makes the code more complex (and potentially slightly less efficient) without fixing the issue.

(Florian told me "by voice" about an alternative approach he has in mind that is simpler and may also fix this Threads issue.)

If you're looking for weird soundness bugs with ocamlyacc, #10080 is still open. It doesn't even need other threads or other parser, just some raise Parse_error in the wrong places.

Since after more thinking, we really don't want to encourage users to run ocamlyacc parser concurrently, a simpler fix is probably to add a global lock to Parsing and make yyparse acquire this lock. This creates a deadlock for nested ocamlyacc parsers (parser calling other parser in their semantic action), but this use case is not really supported and we could potentially add a global parameter to allow such nested parsers.

If using nested parsers is not supported, a lock will detect it and fail appropriately, I think ("deadlock avoided"). The user should even get a nice stacktrace showing where the lock attempt failed?

xavierleroy

The code is fine, but I'm still nit-picking on the documentation.

xavierleroy · 2022-05-04T08:39:32Z

manual/src/cmds/lexyacc.etex

+It is also safe to suspend and resume a parser on the same domain if no other parsers run in the interim on this domain.
+
+\begin{verbatim}
+let parse s =
+  let lexbuf = Lexing.from_string s in
+  let r = Parser.main Lexer.token lexbuf in
+  Format.printf "%d@." r
+
+let rec print s n =
+  if n = 0 then ()
+  else begin
+    Format.printf "print(%s) %d@." s n;
+    Thread.yield ();
+    print s (n-1)
+  end
+
+let on_domain s () =
+  let t1 = Thread.create (print s) 10 in
+  let t2 = Thread.create parse (s^"\n") in
+  Thread.join t1; Thread.join t2
+
+let () =
+  let dl =
+    List.map (fun s ->Domain.spawn (on_domain s))
+      ["1+6"; "7+8"; "9+17"]
+  in
+  List.iter Domain.join dl
+\end{verbatim}


I must say I don't fully understand what this example does :-( Could we tell a simpler story, e.g. "in all other cases use locking" ? Just omit this example and go straight to the next one...

That example was supposed to illustrate the narrow coexistence path between Threads and ocamlyacc parser. But yes, it is probably better to omit this example.

xavierleroy · 2022-05-04T08:40:43Z

manual/src/cmds/lexyacc.etex

+\end{verbatim}
+
+
+It is not safe to migrate a parser execution to a different domain or to interleave the execution of two or more parsers on the same domain.


I'm not sure all our users understand that "interleave the execution" includes in particular "using the Threads library".

Octachron · 2022-07-22T13:42:23Z

What should we do with this change? Should we consider dropping it and seeing if users are impacted by the ocamlyacc parser shared mutable state?

gasche

I read the code and I believe it is correct.

I wonder if there is a noticeable impact on performance. (We are now making several DLS reads on most semantic actions.) Would it be worth taking a simple ocamlyacc grammar and checking that the runtimes are undistinguishable?

(The env-hungry functions are mostly around locations/positions, so I'd try a small grammar but with location/positions handling).

gasche · 2022-07-22T14:27:18Z

manual/src/cmds/lexyacc.etex

+domain but is shared between various threads.
+It is thus recommended to use locks when using ocamlyacc parsers
+on different threads, but domain parallelism is achievable.
+


If I understand correctly, unsafe usage of ocamlyacc according to these guidelines is already possible in OCaml 4 (you can call recursively another parser from within a semantic action) and will not break memory- or type-safety, right?

I think the documentation could clarify what safe means here, and in particular what unsafe means. If I'm hunting a segfault in a Multicore OCaml program, should I or should I not rule out ocamlyacc as a plausible culprit?

Octachron · 2022-08-22T08:44:05Z

Running the ocaml 4.07 parser on the 353 compiler ml files after loading them in memory, yields a median increase of parsing time of 7% when using the DLS version.

More precisely, looking at percentile

percentile	increase in parsing time
10%	+3.2%
20%	+4.6%
30%	+5.5%
40%	+6.4%
50%	+6.9%
60%	+7.4%
70%	+8%
80%	+8.6%
90%	+9.6%

with the worst cases being:

rank	percentile	name	increase in parsing time
349	98.5876%	ocamldoc/odoc_module.ml	1.12673
350	98.8701%	bytecomp/simplif.ml	1.12819
351	99.1525%	tools/eqparsetree.ml	1.12993
352	99.435%	typing/printtyped.ml	1.13111
353	99.7175%	ocamldoc/odoc_texi.ml	1.15284

Consequently, it seems that there is some visible cost (at least when parsing from memory).
Moreover, since the restriction on concurrent uses already exist with threads, I am starting to think that it might be better to keep the global state with some added warning in the documentation.

Octachron · 2022-09-23T15:48:18Z

Taking in account the facts that:

thread safety is already an issue in OCaml 4
menhir is a widely available upgrade
there are hacks that allow to decouple a specific parser from the global ocamlyacc state in case of imperious needs

I am closing this PR in favor of #11562 which documents the existing issue and points towards menhir as a thread-safe alternative to ocamlyacc.

Octachron added 2 commits April 28, 2022 15:26

stdlib audit: isolate parsing work state

c6ca2d1

stdlib: parsing documentation

29f45c2

Octachron added the no-change-entry-needed label Apr 28, 2022

avsm reviewed Apr 28, 2022

View reviewed changes

Octachron added 2 commits April 29, 2022 11:41

manual: explicit examples for ocamlyacc parsers

71627bb

concurrency-safety documentation: parsing

2a2bd08

This was referenced Apr 29, 2022

stdlib concurrency safety documentation, part two: global states #11227

Merged

Audit stdlib for mutable state #10960

Closed

xavierleroy reviewed May 4, 2022

View reviewed changes

ocamlyacc documentation: simplify concurrency documentation

6250024

Octachron added this to the 5.0 milestone Jul 22, 2022

gasche approved these changes Jul 22, 2022

View reviewed changes

Octachron mentioned this pull request Sep 23, 2022

manual: ocamlyacc parsers are not thread-safe #11562

Merged

Octachron closed this Sep 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

stdlib audit: use DLS for parsing global state #11225

stdlib audit: use DLS for parsing global state #11225

Octachron commented Apr 28, 2022

avsm Apr 28, 2022

Octachron Apr 28, 2022

avsm Apr 28, 2022

Octachron Apr 29, 2022 •

edited

Loading

gasche Jul 22, 2022

gasche Jul 22, 2022

gasche Jul 22, 2022

lthls Jul 22, 2022

Octachron Jul 22, 2022

c-cube Jul 22, 2022

xavierleroy left a comment

xavierleroy May 4, 2022

Octachron May 4, 2022

xavierleroy May 4, 2022

Octachron commented Jul 22, 2022

gasche left a comment

gasche Jul 22, 2022

Octachron commented Aug 22, 2022

Octachron commented Sep 23, 2022

		\end{verbatim}


		It is not safe to migrate a parser execution to a different domain or to interleave the execution of two or more parsers on the same domain.

stdlib audit: use DLS for parsing global state #11225

stdlib audit: use DLS for parsing global state #11225

Conversation

Octachron commented Apr 28, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Octachron Apr 29, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

xavierleroy left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Octachron commented Jul 22, 2022

gasche left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Octachron commented Aug 22, 2022

Octachron commented Sep 23, 2022

Octachron Apr 29, 2022 •

edited

Loading