Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stdlib audit: use DLS for parsing global state #11225

Closed
wants to merge 5 commits into from

Conversation

Octachron
Copy link
Member

The Parsing module which is used by every ocamlyacc generated parsers relies on a global mutable work state.

This means that currently no parsers generated by ocamlyacc should be run concurrently.
This property does not seem that simple to satisfy in presence of multiple libraries that may use ocamlyacc parser internally.

This PR proposes thus to alleviate this issue by making the Parsing work state a domain local state.

This is not sufficient to make ocamlyacc thread-safe: users will still need to use locks to achieve concurrency safety, but they would no longer be required to use a single lock for all ocamlyacc generated parsers. I believe that this is an usability improvement. Moreover, this is the the best that we can do in backward compatible way since location functions like right_symbol depends on the implicit parser work state.

domain but is shared between various threads.
It is thus recommended to use locks when using ocamlyacc parsers
on different threads, but domain parallelism is achievable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we state a little more explicitly what is safe? If I understand this right, spawning multiple parsers in separate domains and letting them run to completion is safe. Suspending a computation on a domain and resuming it on the same domain is also safe, if no other parsers run in the interim on that domain.

But stack switching and migrating the parser execution to a different domain is not safe. And suspending a parser execution on a domain and resuming another parser execution on the same domain is not safe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it is better to be more explicit, and yes the simple safe use case is to run a parser to completion on the same domain without any interleaving without another ocamlyacc parser.

Maybe something like:

It is safe to spawn multiple parsers in separate domains and let them run to completion.
It is also safe to suspend and resume a parser on the same domain if no other parsers run in the interim on this domain.
It is not safe to migrate a parser execution to a different domain or to interleave the execution of two or more parsers on the same domain.

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That reads better to me! It might also be worth giving a small example about how locks might be used to achieve concurrency-safety:

users will still need to use locks to achieve concurrency safety

It's not quite clear to me how to use locks safely here without a single lock around the entire execution of the parser.

Copy link
Member Author

@Octachron Octachron Apr 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lock needs to be taken on the entire execution of the parser indeed.
I have added explicit examples to the manual to clarify the limited safety offered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, unsafe usage of ocamlyacc according to these guidelines is already possible in OCaml 4 (you can call recursively another parser from within a semantic action) and will not break memory- or type-safety, right?

I think the documentation could clarify what safe means here, and in particular what unsafe means. If I'm hunting a segfault in a Multicore OCaml program, should I or should I not rule out ocamlyacc as a plausible culprit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a significant turn-off: it means that ocamlyacc was already memory-unsafe to use with OCaml 4.x, and this PR makes the code more complex (and potentially slightly less efficient) without fixing the issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Florian told me "by voice" about an alternative approach he has in mind that is simpler and may also fix this Threads issue.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're looking for weird soundness bugs with ocamlyacc, #10080 is still open. It doesn't even need other threads or other parser, just some raise Parse_error in the wrong places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since after more thinking, we really don't want to encourage users to run ocamlyacc parser concurrently, a simpler fix is probably to add a global lock to Parsing and make yyparse acquire this lock. This creates a deadlock for nested ocamlyacc parsers (parser calling other parser in their semantic action), but this use case is not really supported and we could potentially add a global parameter to allow such nested parsers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If using nested parsers is not supported, a lock will detect it and fail appropriately, I think ("deadlock avoided"). The user should even get a nice stacktrace showing where the lock attempt failed?

Copy link
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is fine, but I'm still nit-picking on the documentation.

Comment on lines 748 to 775
It is also safe to suspend and resume a parser on the same domain if no other parsers run in the interim on this domain.

\begin{verbatim}
let parse s =
let lexbuf = Lexing.from_string s in
let r = Parser.main Lexer.token lexbuf in
Format.printf "%d@." r

let rec print s n =
if n = 0 then ()
else begin
Format.printf "print(%s) %d@." s n;
Thread.yield ();
print s (n-1)
end

let on_domain s () =
let t1 = Thread.create (print s) 10 in
let t2 = Thread.create parse (s^"\n") in
Thread.join t1; Thread.join t2

let () =
let dl =
List.map (fun s ->Domain.spawn (on_domain s))
["1+6"; "7+8"; "9+17"]
in
List.iter Domain.join dl
\end{verbatim}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must say I don't fully understand what this example does :-( Could we tell a simpler story, e.g. "in all other cases use locking" ? Just omit this example and go straight to the next one...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That example was supposed to illustrate the narrow coexistence path between Threads and ocamlyacc parser. But yes, it is probably better to omit this example.

\end{verbatim}


It is not safe to migrate a parser execution to a different domain or to interleave the execution of two or more parsers on the same domain.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure all our users understand that "interleave the execution" includes in particular "using the Threads library".

@Octachron Octachron added this to the 5.0 milestone Jul 22, 2022
@Octachron
Copy link
Member Author

What should we do with this change? Should we consider dropping it and seeing if users are impacted by the ocamlyacc parser shared mutable state?

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the code and I believe it is correct.

I wonder if there is a noticeable impact on performance. (We are now making several DLS reads on most semantic actions.) Would it be worth taking a simple ocamlyacc grammar and checking that the runtimes are undistinguishable?

(The env-hungry functions are mostly around locations/positions, so I'd try a small grammar but with location/positions handling).

domain but is shared between various threads.
It is thus recommended to use locks when using ocamlyacc parsers
on different threads, but domain parallelism is achievable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, unsafe usage of ocamlyacc according to these guidelines is already possible in OCaml 4 (you can call recursively another parser from within a semantic action) and will not break memory- or type-safety, right?

I think the documentation could clarify what safe means here, and in particular what unsafe means. If I'm hunting a segfault in a Multicore OCaml program, should I or should I not rule out ocamlyacc as a plausible culprit?

@Octachron
Copy link
Member Author

Running the ocaml 4.07 parser on the 353 compiler ml files after loading them in memory, yields a median increase of parsing time of 7% when using the DLS version.

More precisely, looking at percentile

percentile increase in parsing time
10% +3.2%
20% +4.6%
30% +5.5%
40% +6.4%
50% +6.9%
60% +7.4%
70% +8%
80% +8.6%
90% +9.6%

with the worst cases being:

rank percentile name increase in parsing time
349 98.5876% ocamldoc/odoc_module.ml 1.12673
350 98.8701% bytecomp/simplif.ml 1.12819
351 99.1525% tools/eqparsetree.ml 1.12993
352 99.435% typing/printtyped.ml 1.13111
353 99.7175% ocamldoc/odoc_texi.ml 1.15284

Consequently, it seems that there is some visible cost (at least when parsing from memory).
Moreover, since the restriction on concurrent uses already exist with threads, I am starting to think that it might be better to keep the global state with some added warning in the documentation.

@Octachron
Copy link
Member Author

Taking in account the facts that:

  • thread safety is already an issue in OCaml 4
  • menhir is a widely available upgrade
  • there are hacks that allow to decouple a specific parser from the global ocamlyacc state in case of imperious needs

I am closing this PR in favor of #11562 which documents the existing issue and points towards menhir as a thread-safe alternative to ocamlyacc.

@Octachron Octachron closed this Sep 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants