-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stdlib audit: use DLS for parsing global state #11225
Conversation
domain but is shared between various threads. | ||
It is thus recommended to use locks when using ocamlyacc parsers | ||
on different threads, but domain parallelism is achievable. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we state a little more explicitly what is safe? If I understand this right, spawning multiple parsers in separate domains and letting them run to completion is safe. Suspending a computation on a domain and resuming it on the same domain is also safe, if no other parsers run in the interim on that domain.
But stack switching and migrating the parser execution to a different domain is not safe. And suspending a parser execution on a domain and resuming another parser execution on the same domain is not safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it is better to be more explicit, and yes the simple safe use case is to run a parser to completion on the same domain without any interleaving without another ocamlyacc parser.
Maybe something like:
It is safe to spawn multiple parsers in separate domains and let them run to completion.
It is also safe to suspend and resume a parser on the same domain if no other parsers run in the interim on this domain.
It is not safe to migrate a parser execution to a different domain or to interleave the execution of two or more parsers on the same domain.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That reads better to me! It might also be worth giving a small example about how locks might be used to achieve concurrency-safety:
users will still need to use locks to achieve concurrency safety
It's not quite clear to me how to use locks safely here without a single lock around the entire execution of the parser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lock needs to be taken on the entire execution of the parser indeed.
I have added explicit examples to the manual to clarify the limited safety offered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, unsafe usage of ocamlyacc according to these guidelines is already possible in OCaml 4 (you can call recursively another parser from within a semantic action) and will not break memory- or type-safety, right?
I think the documentation could clarify what safe means here, and in particular what unsafe means. If I'm hunting a segfault in a Multicore OCaml program, should I or should I not rule out ocamlyacc as a plausible culprit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a significant turn-off: it means that ocamlyacc was already memory-unsafe to use with OCaml 4.x, and this PR makes the code more complex (and potentially slightly less efficient) without fixing the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Florian told me "by voice" about an alternative approach he has in mind that is simpler and may also fix this Threads issue.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're looking for weird soundness bugs with ocamlyacc
, #10080 is still open. It doesn't even need other threads or other parser, just some raise Parse_error
in the wrong places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since after more thinking, we really don't want to encourage users to run ocamlyacc parser concurrently, a simpler fix is probably to add a global lock to Parsing
and make yyparse
acquire this lock. This creates a deadlock for nested ocamlyacc parsers (parser calling other parser in their semantic action), but this use case is not really supported and we could potentially add a global parameter to allow such nested parsers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If using nested parsers is not supported, a lock will detect it and fail appropriately, I think ("deadlock avoided"). The user should even get a nice stacktrace showing where the lock attempt failed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is fine, but I'm still nit-picking on the documentation.
manual/src/cmds/lexyacc.etex
Outdated
It is also safe to suspend and resume a parser on the same domain if no other parsers run in the interim on this domain. | ||
|
||
\begin{verbatim} | ||
let parse s = | ||
let lexbuf = Lexing.from_string s in | ||
let r = Parser.main Lexer.token lexbuf in | ||
Format.printf "%d@." r | ||
|
||
let rec print s n = | ||
if n = 0 then () | ||
else begin | ||
Format.printf "print(%s) %d@." s n; | ||
Thread.yield (); | ||
print s (n-1) | ||
end | ||
|
||
let on_domain s () = | ||
let t1 = Thread.create (print s) 10 in | ||
let t2 = Thread.create parse (s^"\n") in | ||
Thread.join t1; Thread.join t2 | ||
|
||
let () = | ||
let dl = | ||
List.map (fun s ->Domain.spawn (on_domain s)) | ||
["1+6"; "7+8"; "9+17"] | ||
in | ||
List.iter Domain.join dl | ||
\end{verbatim} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must say I don't fully understand what this example does :-( Could we tell a simpler story, e.g. "in all other cases use locking" ? Just omit this example and go straight to the next one...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That example was supposed to illustrate the narrow coexistence path between Threads
and ocamlyacc parser. But yes, it is probably better to omit this example.
manual/src/cmds/lexyacc.etex
Outdated
\end{verbatim} | ||
|
||
|
||
It is not safe to migrate a parser execution to a different domain or to interleave the execution of two or more parsers on the same domain. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure all our users understand that "interleave the execution" includes in particular "using the Threads library".
What should we do with this change? Should we consider dropping it and seeing if users are impacted by the ocamlyacc parser shared mutable state? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read the code and I believe it is correct.
I wonder if there is a noticeable impact on performance. (We are now making several DLS reads on most semantic actions.) Would it be worth taking a simple ocamlyacc grammar and checking that the runtimes are undistinguishable?
(The env
-hungry functions are mostly around locations/positions, so I'd try a small grammar but with location/positions handling).
domain but is shared between various threads. | ||
It is thus recommended to use locks when using ocamlyacc parsers | ||
on different threads, but domain parallelism is achievable. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, unsafe usage of ocamlyacc according to these guidelines is already possible in OCaml 4 (you can call recursively another parser from within a semantic action) and will not break memory- or type-safety, right?
I think the documentation could clarify what safe means here, and in particular what unsafe means. If I'm hunting a segfault in a Multicore OCaml program, should I or should I not rule out ocamlyacc as a plausible culprit?
Running the ocaml 4.07 parser on the 353 compiler ml files after loading them in memory, yields a median increase of parsing time of 7% when using the DLS version. More precisely, looking at percentile
with the worst cases being:
Consequently, it seems that there is some visible cost (at least when parsing from memory). |
Taking in account the facts that:
I am closing this PR in favor of #11562 which documents the existing issue and points towards |
The
Parsing
module which is used by every ocamlyacc generated parsers relies on a global mutable work state.This means that currently no parsers generated by ocamlyacc should be run concurrently.
This property does not seem that simple to satisfy in presence of multiple libraries that may use ocamlyacc parser internally.
This PR proposes thus to alleviate this issue by making the
Parsing
work state a domain local state.This is not sufficient to make ocamlyacc thread-safe: users will still need to use locks to achieve concurrency safety, but they would no longer be required to use a single lock for all ocamlyacc generated parsers. I believe that this is an usability improvement. Moreover, this is the the best that we can do in backward compatible way since location functions like
right_symbol
depends on the implicit parser work state.