Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser combinators are not thread-safe #321

Closed
scabug opened this issue Aug 19, 2011 · 38 comments
Closed

Parser combinators are not thread-safe #321

scabug opened this issue Aug 19, 2011 · 38 comments

Comments

@scabug
Copy link

scabug commented Aug 19, 2011

JSON.parseFull or parseRaw randomly fails with NPE. In order to get the stacktrace, one must use -Xint flag. It works fine most of the time and randomly fails. When I run a simple script that parses JSON.parseFull("{"hello": "dude"}"), when run in a loop 10K times, it fails a few times during the run (again, randomly). Below is the stacktrace...

java.lang.NullPointerException: null
at scala.util.parsing.combinator.Parsers$NoSuccess.(Parsers.scala:132) ~[scala-library.jar:na]
at scala.util.parsing.combinator.Parsers$Failure.(Parsers.scala:159) ~[scala-library.jar:na]
at scala.util.parsing.combinator.Parsers$$anonfun$acceptIf$1.apply(Parsers.scala:499) ~[scala-library.jar:na]
...

@scabug
Copy link
Author

scabug commented Aug 19, 2011

Imported From: https://issues.scala-lang.org/browse/SI-4929?orig=1
Reporter: Ilya Sterin (isterin)
Affected Versions: 2.9.2
Attachments:

@scabug
Copy link
Author

scabug commented Oct 7, 2011

Brian Maso (bmaso) said (edited on Oct 7, 2011 4:15:51 PM UTC):
I just ran in to this same NPE, at the same line in Parsers.NoSuccess. No that the NPE occurred when processing tens of thousands of JSON docs (Twitter API tweets, actually). The NPE seems to have occurred on a line of JSON input document that is well formed. The JSON parser is the Dispatch JSON parser, version 0.8.5.

The tack trace I received is the same as the original reporter'sfor the last few stack frames:

java.lang.NullPointerException: null
at scala.util.parsing.combinator.Parsers$NoSuccess.(Parsers.scala:132) ~[scala-library.jar:na]
at scala.util.parsing.combinator.Parsers$Failure.(Parsers.scala:159) ~[scala-library.jar:na]
at scala.util.parsing.combinator.Parsers$$anonfun$acceptIf$1.apply(Parsers.scala:499) ~[scala-library.jar:na]
at scala.util.parsing.combinator.Parsers$$anonfun$acceptIf$1.apply(Parsers.scala:497) ~[scala-library.jar:na]

The offending line is 132. AFAICT, The only way an NPE would happen at this point is if the "next", "lastNoSuccess", or "lastNoSuccess.next" members are null. "lastNoSuccess" is a var in the outer class, so while the not-null check at line 132 would seem to guarantee "lastNoSuccess" is not null, the highly infrequent and random occurrence of the NPE points to a race condition -- perhaps the "lastNoSuccess" member is somehow being changed to null after the null check, but before the "< lastNoPosition.next" check?

@scabug
Copy link
Author

scabug commented Oct 8, 2011

@retronym said:
The stdlib Parser combinators aren't threadsafe, and hence neither is JSON.parseRaw

You should construct a new scala.util.parsing.json.Parser for each request, and use a pool of them if performance counts.

This should be better documented in both the parser and JSON packages.

http://scala-programming-language.1934581.n4.nabble.com/Scala-Parsers-are-not-thread-safe-td2243477.html
http://www.scala-lang.org/node/8356

@scabug
Copy link
Author

scabug commented Nov 23, 2011

@acruise said:
Can we either change the title of this bug, or file a new one, about the lack of thread safety?

@scabug
Copy link
Author

scabug commented May 8, 2012

@dcsobral said:
I'm troubled by the fix to this bug. It makes phrase and NoSuccess slower, by introducing a lazy val, and DynamicVariable and an Option. Now, phrase shouldn't be much of a problem, but normal parsing process will create tons of NoSuccess, even if the parser succeeds (one for every option that was considered and failed).

Now, I'm all for correctness -- it's easy to make something fast if it doesn't have to work. However, I can't think of any reason why parser should be shared between multiple threads. Why not simply instantiate it on demand? JSON itself could be turned into a class, and the object turned into a factory.

Here's the pull request.

@scabug
Copy link
Author

scabug commented May 8, 2012

@paulp said:
Well, I sat on this for months because I wasn't real excited about it, and finally I asked adriaan to make a determination and he said go with it, although it wasn't the most ringing endorsement either. But mostly, neither of us can afford to engage with peripheral library issues right now. If you want to send another pull request that'd be great.

@scabug
Copy link
Author

scabug commented May 8, 2012

@dcsobral said:
Sure. Close the ticket and I'll look into possible performance issues. Happily, I have the benchmark setup I wrote for Range.foreach.

@scabug
Copy link
Author

scabug commented May 8, 2012

Stephen Judkins (stephenjudkins) said:
Thanks for looking into possible performance implications. A few points:

  • I did some benchmarking (of various JSON strings) and did not, to my recollection, see a major drop in performance. The cost of creating NoSuccess instances was small relative to everything else the parser does.
  • In our case (IRC protocol parser) constructing the parser instances was more expensive than the parsing itself. This is obviously a function of the very small size of messages, and relatively complexity of parser, but just creating a new non-threadsafe parser for every parse job would not be an acceptable solution for many people.
  • I know this solution is fugly. I played around with other solutions that don't use mutable state, but the only reasonable one I came up with (state monad containing lastNoSuccess) would involve breaking the existing parser API. If anyone has any superior suggestions I'd love to help implement them.

@scabug
Copy link
Author

scabug commented May 8, 2012

@dcsobral said:
Have you tried without {{lazy}}? I don't see it gaining much.

@scabug
Copy link
Author

scabug commented May 8, 2012

Stephen Judkins (stephenjudkins) said:
I recall (it's been awhile) that lazy val is used instead of val to avoid initialization order problems, not as an optimization.

We should have actual benchmark data before we continue this discussion. Have you written one yet? If not, would you like me to do so?

@scabug
Copy link
Author

scabug commented May 8, 2012

@dcsobral said:
I have a framework for the benchmark. http://github.com/dcsobral/scala-foreach-benchmark. What is nice about it is that you can put different versions of the compiler in a directory, and it will run the tests through caliper for each compiler version.

I haven't written the benchmark for this yet -- I won't have time for that before Friday, and I didn't have any particular regex parser in mind.

However, I just recalled Scalate is based on Scala parsers, and I think they do have some benchmarks. If so, it would be a nice, non-trivial, real-world test case.

@scabug
Copy link
Author

scabug commented Jul 20, 2012

jsh (jhooda) said:
Wondering when it will be resolved? Another issue is deep recursion inherent in the code, which affects performance (see stack trace):

The exception was noticed in
org.scala-lang
scala-library
2.9.1

@scabug
Copy link
Author

scabug commented Jul 20, 2012

@dcsobral said:
@jsh A solution has been provided. The only thing's missing is benchmark to see if there's no serious performance impact.

@scabug
Copy link
Author

scabug commented Jan 4, 2013

@SethTisue said (edited on Jan 4, 2013 3:01:32 AM UTC):
the "cure" here was worse than the disease, at least for me. scala/scala@dce6b34 broke my app under 2.10. the commit introduces a ThreadLocal, without providing any way of calling remove() on it to clean up. so my no-longer-used Thread isn't eligible for garbage collection anymore like it was on 2.9.

I was able to work around it as follows by running this code when I'm done parsing:

val field = getClass.getDeclaredField("scala$util$parsing$combinator$Parsers$$lastNoSuccessVar")
field.setAccessible(true)
val field2 = classOf[scala.util.DynamicVariable[_]].getDeclaredField("tl")
field2.setAccessible(true)
field2.get(field.get(this)).asInstanceOf[java.lang.ThreadLocal[_]].remove()
field.set(this, null)

but, yuck.

the good news is the lastNoSuccess thing is deprecated in 2.10, so it can be removed in 2.11.

@scabug
Copy link
Author

scabug commented Jan 4, 2013

Stephen Judkins (stephenjudkins) said:
Agreed: yuck. Given this report, I'm inclined to agree with skeptics that this shouldn't have been merged in.

I'm confused, though. From http://docs.oracle.com/javase/6/docs/api/java/lang/ThreadLocal.html:
`
Each thread holds an implicit reference to its copy of a thread-local variable as long as the thread is alive and the ThreadLocal instance is accessible; after a thread goes away, all of its copies of thread-local instances are subject to garbage collection (unless other references to these copies exist).

This implies that a non-alive thread can be garbage-collected even if aremove` hasn't been called. If ThreadLocal follows this contract, this implies that the NoSuccess instance contains a reference.

Further, I can't reproduce this: watching this contrived example [https://gist.github.com/8df7bba520c7f3a56ac3] run on 2.10 in VisualVM indicates no leakage. I suppose there might be a circular reference back to the Thread somewhere in your particular parser?

As a form of penance I'd be happy to help you track down this particular issue.

@scabug
Copy link
Author

scabug commented Jan 9, 2013

@SethTisue said (edited on Jan 9, 2013 6:30:36 PM UTC):
The situation here is simpler than I had imagined.

You don't have to have multiple threads for there to be a leak. In fact, the leak potential is greatest when you have only a single thread.

The issue is that Parsers.lastNoSuccessVar retains a reference to some of the input to the parser. That is the leak. (Your input might just be some Strings, but in my app, the input was Token objects that were connected to entire graphs of other objects.)

dce6b34c didn't introduce the leak; it already existed. But it made the leak worse, as follows.

Before dce6b34c, a reference to the input was retained, but assuming the Parsers object itself eventually became eligible for GC, then the reference will get GC'ed along with it.

After dce6b34c, it is no longer enough for the Parser object to become eligible for GC for the reference to the input be dropped. Because the reference is retained through a ThreadLocal, that means it won't go away until the thread on which parsing took place gets GC'ed. So if you do parsing on a long-lived thread, you're in trouble.

Furthermore, dce6b34c doesn't just use ThreadLocal, it used InheritableThreadLocal, so the issue propagates to any child threads of the original thread, even if no parsing takes place on the child thread. (This wrinkle is what led me down a garden path about multiple threads; in my code I was doing parsing on both a parent thread and its child thread.)

@scabug
Copy link
Author

scabug commented Jan 9, 2013

@SethTisue said (edited on Jan 9, 2013 6:31:24 PM UTC):
For 2.10.1, simply reverting dce6b34c is an option, but the drawback is, it would reintroduce the original thread safety issue.

An alternative would be to change the InheritableThreadLocal to a regular ThreadLocal (this would require using ThreadLocal directly, instead of scala.util.DynamicVariable) and provide an explicit cleanup method that calls remove() and/or sets lastNoSuccessVar.value to None, so I don't have to do so via reflection.

Either solution would make me and my code happy. I don't have a strong opinion about the "best" solution. But I guess I lean towards simply reverting, since ThreadLocal is notoriously error-prone and leak-prone, and since the original thread safety issue is of long standing. This ticket was never closed, so the issue was never advertised as fixed in 2.10.

@scabug
Copy link
Author

scabug commented Jan 9, 2013

@adriaanm said:
I'm happy to give decision power to whomever comes up with a convincing pull request.

@scabug
Copy link
Author

scabug commented Jan 28, 2013

@adriaanm said:
unfortunately, a change to private lazy val lastNoSuccess: DynamicVariable[Option[NoSuccess]] in trait Parsers
affects binary compatible, as private vals defined in traits give rise to java-public accessors in the corresponding interface

deferring to 2.11

@scabug
Copy link
Author

scabug commented Mar 15, 2013

@adriaanm said:
Un-assigning to foster work stealing, as announced in https://groups.google.com/forum/?fromgroups=#!topic/scala-internals/o8WG4plpNkw

@scabug
Copy link
Author

scabug commented May 26, 2013

Jens Halm (jenshalm) said:
Wouldn't the fact that lastNoSuccess is no longer public in 2.11. offer new opportunities to prevent the leak? The only guy left in town actually listening to this tracking variable is the phrase parser. As I see it this would open two new possibilities:

  1. Don't track the value if no-one is listening to it (and no-one is cleaning up afterwards)
  2. Stop tracking altogether

Regarding 1) there would be a fairly straightforward fix. Currently each NoSuccess instance writes to the variable, blindly. So when the top level parser is not a phrase parser a NoSuccess will be written and never cleaned up afterwards. Although using phrase as the top level parser is probably the most common use case, some have run into this issue apparently. The solution would be to add a layer of indirection inside the ThreadLocal and wrap the option in an instance that would know whether it is tracking or not:

private class LastNoSuccess (tracking: Boolean) {
  private var current: Option[NoSuccess] = None
  def set (value: Option[NoSuccess]) = if (tracking) current = value
  def get = current
}
  
private val lastNoSuccess = new DynamicVariable(new LastNoSuccess(false))

And then in the phrase parser:

def apply(in: Input) = lastNoSuccess.withValue(new LastNoSuccess(true)) {

This would retain the thread-safety while mitigating the effect of the leak as it would not leave a reference to some (potentially large) input in the ThreadLocal. (Note that I haven't tested this approach, as I'd like to gather feedback first).

Regarding 2) I'm not sure how much value the tracking variable actually adds and wether it is worth all the hassle it causes, now that it is no longer public API. The only scenario in which it is usually used is when the phrase parser succeeds and has some input left which has not been consumed. If the lastNoSuccess offset was behind the position of the final parser that succeeded it is returned to the caller. I don't have a good overview of how many realistic scenarios exist where this will give the best hint for the actual cause of the failure.

Any thoughts? If someone thinks 1) or 2) would be a promising route, I'd be happy to implement and test this and do a pull request (probably after ScalaDays though).

@scabug
Copy link
Author

scabug commented Jun 27, 2013

@SethTisue said:
Jens, I'm glad you're working on this.

I agree that suggestion scala/bug#1 is strictly better than the current code.

However, of course ideally we'd have no ThreadLocal at all. (The perfect is always the enemy of the good...)

Let's try and construct a case where lastNoSuccessVar makes a difference. Consider the arithmetic expression parser in Programming in Scala, at http://booksites.artima.com/programming_in_scala_2ed/examples/html/ch33.html . It uses RegexParsers.parseAll which calls Parsers.phrase. Suppose we change the definition of factor a little to give a better error:

  def factor: Parser[Any] = floatingPointNumber | "("~expr~")" | failure("expected factor")

And then we parse "0+". With the lastNoSuccessVar logic in place, we get "expected factor". Without it we just get "end of input expected". (Yup, I ran it both ways to be sure.)

So then the question is, which error is better? There are two ways of viewing the input. One is that it consists of good input (0) followed by some garbage (+), thus, "end of input expected". The other is that all the input is good but it is merely incomplete, thus, "expected number".

The filterNot call in Parser.phrase is there so that the choice that consumed more input is taken. And, sorry to say, I think that makes perfect sense. If you agree, then I think we need to address this without altering the behavior.

@scabug
Copy link
Author

scabug commented Jun 27, 2013

@SethTisue said:
Here's an idea. I haven't tried coding it up. What if {{lastNoSuccess}} was a {{val}} inside {{Success}}, rather than a a member of {{Parsers}}? Then every time something parses successfully, I could also ask what the last {{NoSuccess}} was along the way. It could be an ordinary field, not a ThreadLocal, and the {{phrase}} method would use it to pick the better error. Is this idea workable...?

@scabug
Copy link
Author

scabug commented Jun 27, 2013

@SethTisue said:
(As an aside, in my own code I was able to remove my clumsy reflection-based workaround code, simply by instead inserting a call to phrase(), which I wasn't previously calling. Always using phrase() as your entry point for parsing ensures proper cleanup of lastNoSuccessVar.)

@scabug
Copy link
Author

scabug commented Jun 28, 2013

Jens Halm (jenshalm) said (edited on Jun 28, 2013 12:03:41 AM UTC):
Thanks for providing the example. I agree that rules out no. 2 of my suggestions.

As for putting it inside Success, I'm afraid that won't work (or I don't understand the idea). It's a case class used in pattern matches everywhere so we cannot change the API. And in the hundreds of places where new Success instances get created, how would we set the value there, where would we get it from?

I feel the ideal solution (but that would require API changes, too), would be to route something through the parsing operations (e.g. piggy-backing on Reader or (semantically cleaner) in a separate, custom state object. Similarly like the parsec library for Haskell allows to route user state through the parsing process. But I don't know the considerations behind the original design that led to user state being left out.

Regarding my idea 1), what are your concerns, apart from not being 100% perfect? It would reduce the leak from a ThreadLocal potentially holding the full input string, to a ThreadLocal holding a None, so a huge improvement.

And yes, always using the phrase parser as an entry point avoids the issue. I do that in Laika, too, so I'm not really affected. I'm just volunteering as I noticed this ancient issue causes some "bad press" for the combinators. :-)

@scabug
Copy link
Author

scabug commented Jun 28, 2013

@SethTisue said:
Jens: I think you're right — my idea (making lastNoSuccess part of Success) wouldn't work without sweeping changes.

I have nothing specific against the ThreadLocal thing other than general fear and distrust of ThreadLocal. I think you should go ahead.

@scabug
Copy link
Author

scabug commented Jul 9, 2013

Sarah Gerweck (gerweck) said:
I have to say I'm very glad that the "we don't need thread safety" proposal seems to be off the table. I've built a few parsers using the combinator library, and in my tests creating a parser is a couple orders of magnitude more expensive than running a parser, making it far too expensive in the use cases I've shipped. An object pool is workable but it's kind of embarrassing to have to do that in a functional language.

I also think it would be a big mistake to get rid of the proper phrase support that noLastSuccess enables. Seth's example is a good one: no user would think that "end of input expected" meant that you forgot to include a second argument. That would be a classic example of an error message that gives no guidance on what's really wrong or how to fix it—and the parsing library would offer no way to improve the message.

As far as I can tell Jens's proposal would be good enough to enable any real-world use case. It's a nice bonus that you wouldn't have to move away from DynamicVariable since there's no risk of GC or data leaks when the container object is explicitly scoped like that.

I think any kind of thread-local storage means you should put a notice in the documentation that things won't work properly unless all the parsing happens on the same thread. One of the benefits of combinatorial parsing in Scala is that you can interact with your code & data structures from inside your parser. This can make it easy to start mixing in actors or parallel collections in certain situations. An inheritable thread-local variable won't be retained if you're dealing with actors or worker pools.

I don't think that it's at all unreasonable to say that all the parsing has to happen on the same thread. As long as people know it's a restriction, I don't think they'll have any trouble respecting it. If they don't know, somebody is going to be back here filing another bug. ;-)

@scabug
Copy link
Author

scabug commented Jul 10, 2013

@adriaanm said:
Unassigning and rescheduling to M6 as previous deadline was missed.

@scabug
Copy link
Author

scabug commented Jan 29, 2014

@adriaanm said (edited on Jan 29, 2014 5:46:05 PM UTC):
FYI, we've made it much easier to contribute to the parser combinator library.
https://github.com/scala/scala-parser-combinators eagerly awaits your pull requests -- it has a standard sbt build, test suite, travis CI

@scabug
Copy link
Author

scabug commented Nov 25, 2014

David Carlton (davidcarltonsumo) said:
I think I just ran into this - I've got a heap dump with hundreds of thousands of Parsers$Failure objects, and they seem to be held through inherited thread locals. Am I correct in thinking that, if you have a mix of long-lived threads with short-lived parsers, that this creates a memory leak? If so, that seems quite bad to me, worse than the thread safety issues that this bug was originally about.

@scabug
Copy link
Author

scabug commented Nov 25, 2014

Sarah Gerweck (gerweck) said:
I'm not sure that it's worse, because before you effectively had to make a new parser every time or force all your code onto a single thread. If you make a new parser every time, garbage collection will remove the DynamicVariable that comes along with it. If you force all your code onto a single thread, you can pretty easily use a single-threaded executor so that only one instance of ThreadLocal is ever allocated.

A cursory look at the code does suggest that if you're reusing the same parser from a lot of different threads, you can wind up with stray references. If your parsers are actually short-lived though, those dynamic variables should be getting garbage collected.

@scabug
Copy link
Author

scabug commented Nov 25, 2014

David Carlton (davidcarltonsumo) said:
Yeah; I really don't understand why they're not being garbage collected in my case, because they should indeed be short-lived. But somehow these DynamicVariable objects and the associated parsers are staying around.

Examining the heap dump, it seems like the thread object has an array of inherited thread locals which contain the DynamicVariable objects, so just looking at it from a naive point of view, it seems like the dynamic variables and the associated parsers should remain alive in perpetuity. And that's the behavior that I'm currently seeing; but it seems to me like, if that were really the case, then DynamicVariable would be very hard to use safely, so there has to be something that I'm missing. (And this actually used to work fine for me, before making what should have been a relatively minor change in the code.)

I dunno; I'll try to come up with an isolated test case so I can understand when my hundreds of thousands of not garbage collected parsers are coming from...

@scabug
Copy link
Author

scabug commented Nov 25, 2014

Jens Halm (jenshalm) said:
The current version leaks into the ThreadLocal if the top-level parser is not phrase. Maybe that was the little change you made? Whether objects are short-lived or not should not make a difference, btw.

@scabug
Copy link
Author

scabug commented Nov 26, 2014

David Carlton (davidcarltonsumo) said:
Ah, thanks for the insight, that could be related.

@scabug
Copy link
Author

scabug commented Nov 27, 2014

David Carlton (davidcarltonsumo) said:
I managed to reproduce my problem with an isolated test case; given that my problem is different from the problem that this bug was filed about, I figured the best thing to do would be to file a new bug, so I created scala/bug#9010 with reproduction steps.

@scabug
Copy link
Author

scabug commented Nov 27, 2014

Jens Halm (jenshalm) said (edited on Nov 27, 2014 1:17:11 PM UTC):
I'm not sure it is a good idea to create a new ticket. Yes, the title of this ticket is misleading, but most of the discussion in this ticket already relates to the leak and not the original problem. The new ticket suggests that it is something that requires new investigations and might lead to someone overlooking this extensive discussion.

The problem is already fully understood. From the source code it is very obvious that the ThreadLocal will leak when you don't use the phrase parser as the top level parser.

I already suggested a workaround that would minimize the problem sometime ago (see older comments). It would reduce the leak from having the entire parser hierarchy including the full input string hanging in the ThreadLocal to only leaking a None per ThreadLocal which is comparably harmless.

I just never got around to implementing this improvement because I never got feedback from project maintainers whether they'd accept such a fix and then it somehow fell from my radar.

@scabug scabug closed this as completed Jul 17, 2015
@scabug
Copy link
Author

scabug commented Jul 18, 2015

@SethTisue said:
there is a PR for this, needing review, at #59

@SethTisue SethTisue transferred this issue from scala/bug Nov 19, 2020
@scala scala deleted a comment from scabug Nov 19, 2020
@SethTisue SethTisue reopened this Nov 19, 2020
@SethTisue
Copy link
Member

with any luck, fixed by #234

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants