Parser combinators are not thread-safe #321
Comments
Imported From: https://issues.scala-lang.org/browse/SI-4929?orig=1
|
Brian Maso (bmaso) said (edited on Oct 7, 2011 4:15:51 PM UTC): The stack trace I received is the same as the original reporter's for the last few stack frames: java.lang.NullPointerException: null The offending line is 132. AFAICT, the only way an NPE would happen at this point is if the "next", "lastNoSuccess", or "lastNoSuccess.next" members are null. "lastNoSuccess" is a var in the outer class, so while the not-null check at line 132 would seem to guarantee "lastNoSuccess" is not null, the highly infrequent and random occurrence of the NPE points to a race condition -- perhaps the "lastNoSuccess" member is somehow being changed to null after the null check, but before the "< lastNoPosition.next" check? |
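To make the suspected race concrete, here is a minimal sketch of the check-then-act shape described above. It is not the library's code: `Failure`, `last`, `racy` and `snapshot` are hypothetical names, and the assumption is simply that some other thread may reset the shared var to null between the null check and the comparison.

```scala
object CheckThenActSketch {
  // Hypothetical stand-in for a failed parse result at some input offset.
  final case class Failure(offset: Int)

  // Shared mutable state, analogous to a var on the outer Parsers instance.
  @volatile var last: Failure = null

  // Racy shape: `last` is read twice. If another thread writes null
  // between the two reads, the second read throws an NPE.
  def racy(next: Failure): Boolean =
    last != null && next.offset < last.offset

  // Safer shape: read the shared var once and work on the local copy.
  def snapshot(next: Failure): Boolean = {
    val seen = last
    seen != null && next.offset < seen.offset
  }
}
```

Reading into a local only removes the NPE; it does not make the "remember the furthest failure" logic correct when two threads parse with the same instance.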
@retronym said: You should construct a new This should be better documented in both the parser and JSON packages. http://scala-programming-language.1934581.n4.nabble.com/Scala-Parsers-are-not-thread-safe-td2243477.html |
@acruise said: |
@dcsobral said: Now, I'm all for correctness -- it's easy to make something fast if it doesn't have to work. However, I can't think of any reason why a parser should be shared between multiple threads. Why not simply instantiate it on demand? JSON itself could be turned into a class, and the object turned into a factory. Here's the pull request. |
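A rough sketch of the "class plus factory object" idea, using a toy parser rather than the real JSON object. `DigitsParser` and its members are hypothetical; the only point is that each call gets its own instance, so per-instance mutable state is never shared between threads.

```scala
// Hypothetical toy parser with per-instance mutable state.
class DigitsParser {
  private var lastFailure: Option[String] = None

  // Assumes the input is short enough to fit in an Int.
  def parse(in: String): Either[String, Int] =
    if (in.nonEmpty && in.forall(_.isDigit)) Right(in.toInt)
    else {
      lastFailure = Some("expected digits, got '" + in + "'")
      Left(lastFailure.get)
    }
}

// The companion acts as a factory: a fresh parser per call.
object DigitsParser {
  def parse(in: String): Either[String, Int] = new DigitsParser().parse(in)
}
```

The cost is one allocation per parse, which is what the benchmark discussion further down is about.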
@paulp said: |
@dcsobral said: |
Stephen Judkins (stephenjudkins) said:
|
@dcsobral said: |
Stephen Judkins (stephenjudkins) said: We should have actual benchmark data before we continue this discussion. Have you written one yet? If not, would you like me to do so? |
@dcsobral said: I haven't written the benchmark for this yet -- I won't have time for that before Friday, and I didn't have any particular regex parser in mind. However, I just recalled Scalate is based on Scala parsers, and I think they do have some benchmarks. If so, it would be a nice, non-trivial, real-world test case. |
jsh (jhooda) said: The exception was noticed in |
@SethTisue said (edited on Jan 4, 2013 3:01:32 AM UTC): I was able to work around it as follows by running this code when I'm done parsing:
but, yuck. the good news is the lastNoSuccess thing is deprecated in 2.10, so it can be removed in 2.11. |
Stephen Judkins (stephenjudkins) said: I'm confused, though. From http://docs.oracle.com/javase/6/docs/api/java/lang/ThreadLocal.html:
Further, I can't reproduce this: watching this contrived example [https://gist.github.com/8df7bba520c7f3a56ac3] run on 2.10 in VisualVM indicates no leakage. I suppose there might be a circular reference back to the Thread somewhere in your particular parser? As a form of penance I'd be happy to help you track down this particular issue. |
@SethTisue said (edited on Jan 9, 2013 6:30:36 PM UTC): You don't have to have multiple threads for there to be a leak. In fact, the leak potential is greatest when you have only a single thread. The issue is that Parsers.lastNoSuccessVar retains a reference to some of the input to the parser. That is the leak. (Your input might just be some Strings, but in my app, the input was Token objects that were connected to entire graphs of other objects.) dce6b34c didn't introduce the leak; it already existed. But it made the leak worse, as follows. Before dce6b34c, a reference to the input was retained, but assuming the Parsers object itself eventually became eligible for GC, the reference would get GC'ed along with it. After dce6b34c, it is no longer enough for the Parsers object to become eligible for GC for the reference to the input to be dropped. Because the reference is retained through a ThreadLocal, it won't go away until the thread on which parsing took place gets GC'ed. So if you do parsing on a long-lived thread, you're in trouble. Furthermore, dce6b34c doesn't just use ThreadLocal, it uses InheritableThreadLocal, so the issue propagates to any child threads of the original thread, even if no parsing takes place on the child thread. (This wrinkle is what led me down a garden path about multiple threads; in my code I was doing parsing on both a parent thread and its child thread.) |
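For readers unfamiliar with the difference, the following sketch shows the propagation described above using only the JDK classes. `InheritanceSketch`, `plain` and `inheritable` are illustrative names, not anything in the parser library.

```scala
object InheritanceSketch {
  // Hypothetical stand-ins for the parser's "last failure" slot.
  val plain = new ThreadLocal[String]
  val inheritable = new InheritableThreadLocal[String]

  // Returns what a freshly created child thread observes in each slot.
  def seenByChild(): (Option[String], Option[String]) = {
    plain.set("parser input")
    inheritable.set("parser input")
    var seen: (Option[String], Option[String]) = (None, None)
    // Inheritable values are copied when the child Thread is constructed.
    val child = new Thread(new Runnable {
      def run(): Unit = {
        seen = (Option(plain.get), Option(inheritable.get))
      }
    })
    child.start()
    child.join() // join() makes the child's write to `seen` visible here
    // Explicit cleanup; without it the parent thread keeps both references.
    plain.remove()
    inheritable.remove()
    seen
  }
}
```

The child never parsed anything, yet it holds its own reference to the inherited value, and the `remove()` calls on the parent do not clear the child's copy.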
@SethTisue said (edited on Jan 9, 2013 6:31:24 PM UTC): An alternative would be to change the InheritableThreadLocal to a regular ThreadLocal (this would require using ThreadLocal directly, instead of scala.util.DynamicVariable) and provide an explicit cleanup method that calls Either solution would make me and my code happy. I don't have a strong opinion about the "best" solution. But I guess I lean towards simply reverting, since ThreadLocal is notoriously error-prone and leak-prone, and since the original thread safety issue is of long standing. This ticket was never closed, so the issue was never advertised as fixed in 2.10. |
@adriaanm said: |
@adriaanm said: deferring to 2.11 |
@adriaanm said: |
Jens Halm (jenshalm) said:
Regarding 1) there would be a fairly straightforward fix. Currently each NoSuccess instance writes to the variable, blindly. So when the top-level parser is not a phrase parser, a NoSuccess will be written and never cleaned up afterwards. Although using phrase as the top-level parser is probably the most common use case, some have apparently run into this issue. The solution would be to add a layer of indirection inside the ThreadLocal and wrap the option in an instance that knows whether it is tracking or not:
private class LastNoSuccess (tracking: Boolean) {
  private var current: Option[NoSuccess] = None
  def set (value: Option[NoSuccess]) = if (tracking) current = value
  def get = current
}
private val lastNoSuccess = new DynamicVariable(new LastNoSuccess(false))
And then in the phrase parser:
def apply(in: Input) = lastNoSuccess.withValue(new LastNoSuccess(true)) {
This would retain the thread-safety while mitigating the effect of the leak, as it would not leave a reference to some (potentially large) input in the ThreadLocal. (Note that I haven't tested this approach, as I'd like to gather feedback first.)
Regarding 2) I'm not sure how much value the tracking variable actually adds and whether it is worth all the hassle it causes, now that it is no longer public API. The only scenario in which it is usually used is when the phrase parser succeeds and has some input left which has not been consumed. If the lastNoSuccess offset was behind the position of the final parser that succeeded, it is returned to the caller. I don't have a good overview of how many realistic scenarios exist where this will give the best hint for the actual cause of the failure. Any thoughts? If someone thinks 1) or 2) would be a promising route, I'd be happy to implement and test this and do a pull request (probably after ScalaDays though). |
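A self-contained sketch of how proposal 1) might behave, outside the real Parsers trait. The `NoSuccess` case class here is a hypothetical stand-in for the library's type, and `recordFailure`, `phrase` and `retained` are illustrative helpers, not library API; only scala.util.DynamicVariable is real.

```scala
import scala.util.DynamicVariable

object TrackingSketch {
  // Hypothetical stand-in for Parsers#NoSuccess; it holds on to the input.
  final case class NoSuccess(msg: String, input: String)

  private class LastNoSuccess(tracking: Boolean) {
    private var current: Option[NoSuccess] = None
    // Writes are dropped unless a phrase-like scope enabled tracking.
    def set(value: Option[NoSuccess]): Unit = if (tracking) current = value
    def get: Option[NoSuccess] = current
  }

  // Default holder does not track, so nothing is retained outside phrase.
  private val lastNoSuccess = new DynamicVariable(new LastNoSuccess(false))

  // What every failing parser would do.
  def recordFailure(f: NoSuccess): Unit = lastNoSuccess.value.set(Some(f))

  // What the phrase parser would do: track only for the duration of `body`.
  def phrase[A](body: => A): (A, Option[NoSuccess]) =
    lastNoSuccess.withValue(new LastNoSuccess(true)) {
      val result = body
      (result, lastNoSuccess.value.get)
    }

  // What stays reachable from the thread once parsing is finished.
  def retained: Option[NoSuccess] = lastNoSuccess.value.get
}
```

Under these assumptions, a failure recorded outside `phrase` is discarded, and a failure recorded inside it is dropped once `withValue` restores the non-tracking holder, so the thread is left pointing at a holder containing None.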
@SethTisue said: I agree that suggestion #1 is strictly better than the current code. However, of course ideally we'd have no ThreadLocal at all. (The perfect is always the enemy of the good...) Let's try and construct a case where
And then we parse "0+". With the So then the question is, which error is better? There are two ways of viewing the input. One is that it consists of good input (0) followed by some garbage (+), thus, "end of input expected". The other is that all the input is good but it is merely incomplete, thus, "expected number". The |
@SethTisue said: |
@SethTisue said: |
Jens Halm (jenshalm) said (edited on Jun 28, 2013 12:03:41 AM UTC): As for putting it inside Success, I'm afraid that won't work (or I don't understand the idea). It's a case class used in pattern matches everywhere, so we cannot change the API. And in the hundreds of places where new Success instances get created, how would we set the value there, and where would we get it from? I feel the ideal solution (though it would require API changes, too) would be to route something through the parsing operations, e.g. piggy-backing on Reader or, semantically cleaner, in a separate, custom state object, similar to how the Parsec library for Haskell lets you route user state through the parsing process. But I don't know the considerations behind the original design that led to user state being left out. Regarding my idea 1), what are your concerns, apart from it not being 100% perfect? It would reduce the leak from a ThreadLocal potentially holding the full input string to a ThreadLocal holding a None, so a huge improvement. And yes, always using the phrase parser as an entry point avoids the issue. I do that in Laika, too, so I'm not really affected. I'm just volunteering as I noticed this ancient issue causes some "bad press" for the combinators. :-) |
@SethTisue said: I have nothing specific against the ThreadLocal thing other than general fear and distrust of ThreadLocal. I think you should go ahead. |
Sarah Gerweck (gerweck) said: I also think it would be a big mistake to get rid of the proper phrase support that As far as I can tell Jens's proposal would be good enough to enable any real-world use case. It's a nice bonus that you wouldn't have to move away from I think any kind of thread-local storage means you should put a notice in the documentation that things won't work properly unless all the parsing happens on the same thread. One of the benefits of combinatorial parsing in Scala is that you can interact with your code & data structures from inside your parser. This can make it easy to start mixing in actors or parallel collections in certain situations. An inheritable thread-local variable won't be retained if you're dealing with actors or worker pools. I don't think that it's at all unreasonable to say that all the parsing has to happen on the same thread. As long as people know it's a restriction, I don't think they'll have any trouble respecting it. If they don't know, somebody is going to be back here filing another bug. ;-) |
@adriaanm said: |
@adriaanm said (edited on Jan 29, 2014 5:46:05 PM UTC): |
David Carlton (davidcarltonsumo) said: |
Sarah Gerweck (gerweck) said: A cursory look at the code does suggest that if you're reusing the same parser from a lot of different threads, you can wind up with stray references. If your parsers are actually short-lived though, those dynamic variables should be getting garbage collected. |
David Carlton (davidcarltonsumo) said: Examining the heap dump, it seems like the thread object has an array of inherited thread locals which contain the I dunno; I'll try to come up with an isolated test case so I can understand where my hundreds of thousands of non-garbage-collected parsers are coming from... |
Jens Halm (jenshalm) said: |
David Carlton (davidcarltonsumo) said: |
David Carlton (davidcarltonsumo) said: |
Jens Halm (jenshalm) said (edited on Nov 27, 2014 1:17:11 PM UTC): The problem is already fully understood. From the source code it is obvious that the ThreadLocal will leak when you don't use the phrase parser as the top-level parser. I already suggested a workaround that would minimize the problem some time ago (see older comments). It would reduce the leak from having the entire parser hierarchy, including the full input string, hanging in the ThreadLocal to only leaking a None per ThreadLocal, which is comparatively harmless. I just never got around to implementing this improvement because I never got feedback from the project maintainers on whether they'd accept such a fix, and then it somehow fell off my radar. |
@SethTisue said: |
with any luck, fixed by #234 |
JSON.parseFull or parseRaw randomly fails with an NPE. To get the stack trace, one must use the -Xint flag. It works fine most of the time and fails randomly. When I run a simple script that calls JSON.parseFull("{\"hello\": \"dude\"}") in a loop 10K times, it fails a few times during the run (again, randomly). Below is the stack trace...
java.lang.NullPointerException: null
at scala.util.parsing.combinator.Parsers$NoSuccess.<init>(Parsers.scala:132) ~[scala-library.jar:na]
at scala.util.parsing.combinator.Parsers$Failure.<init>(Parsers.scala:159) ~[scala-library.jar:na]
at scala.util.parsing.combinator.Parsers$$anonfun$acceptIf$1.apply(Parsers.scala:499) ~[scala-library.jar:na]
...