Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-31214][BUILD] Upgrade Janino to 3.1.2 #27860

Closed
wants to merge 1 commit into from

Conversation

HeartSaVioR
Copy link
Contributor

@HeartSaVioR HeartSaVioR commented Mar 10, 2020

What changes were proposed in this pull request?

This PR proposes to upgrade Janino to 3.1.2 which is released recently.

Major changes were done for refactoring, as well as there're lots of "no commit message". Belows are the pairs of (commit title, commit) which seem to deal with some bugs or specific improvements (not purposed to refactor) after 3.0.15.

You can see the changelog from the link: http://janino-compiler.github.io/janino/changelog.html

Why are the changes needed?

We got some report on failure on user's query which Janino throws error on compiling generated code. The issue is here: janino-compiler/janino#113 It contains the information of generated code, symptom (error), and analysis of the bug, so please refer the link for more details.
Janino 3.1.1 contains the PR janino-compiler/janino#114 which would enable Janino to succeed to compile user's query properly. I've also fixed a couple of more bugs as 3.1.1 made Spark UTs fail - hence we need to upgrade to 3.1.2.

Furthermore, from my testing, janino-compiler/janino#90 (which Josh Rosen filed before) seems to be also resolved in 3.1.2 as well.

Looks like Janino is maintained by one person and there's no even version branches and releases/tags so we can't expect Janino maintainer to release a new bugfix version - hence have to try out new minor version.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing UTs.

@SparkQA
Copy link

SparkQA commented Mar 10, 2020

Test build #119594 has finished for PR 27860 at commit dc29112.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 10, 2020

Test build #119596 has finished for PR 27860 at commit 09548f1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR HeartSaVioR changed the title [SPARK-31101][SQL][BUILD] Upgrade Janino to 3.1.1 [WIP][SPARK-31101][SQL][BUILD] Upgrade Janino to 3.1.1 Mar 10, 2020
@HeartSaVioR
Copy link
Contributor Author

Build failure seems to be related - just changed the PR to WIP. I'll kindly ask Janino maintainer to retain 3.0.x release but not sure it can be accepted. I'll look into details on the failures and try to deal with 3.1.1 as well.

@dongjoon-hyun
Copy link
Member

Yes. The relevant UT failure seems to block us from upgrading.

failed to compile: org.codehaus.commons.compiler.InternalCompilerException: 
Compiling "GeneratedClass" in File 'generated.java', 
Line 1, Column 1: Compiling "MapEntries_0(InternalRow i)": 
Inconsistent operand stack: 

@dongjoon-hyun
Copy link
Member

BTW, thank you for taking a look at this, @HeartSaVioR .

@HeartSaVioR
Copy link
Contributor Author

UPDATE: I only have local dev. for now so cannot run all of Spark UTs easily, but I tried out some failed UTs with the custom build of Janino 3.0.15 + janino-compiler/janino#114 and they passed.

So it would be the ideal if we can have Janino 3.0.16 only adding janino-compiler/janino#114 on top of 3.0.15. I've asked it and waiting for the response. janino-compiler/janino#115

In parallel, hopefully one of failing case is easy to reproduce - I've reported the issue via janino-compiler/janino#116 . I'll try to cover more cases as there's a case having Janino 3.0.16 couldn't happen.

@HeartSaVioR
Copy link
Contributor Author

FYI, while I'm still waiting for hearing the news from Janino, I also took the workaround into #27872 - while #27872 targets to master branch, I guess it would be more helpful for older versions, including 2.4.x. It would depend on how long we can wait; maybe we can simply apply #27872, and revert once Janino is ready.

@HeartSaVioR
Copy link
Contributor Author

UPDATE: Somehow I fixed janino-compiler/janino#116 as well via janino-compiler/janino#117. Since I'm not sure this PR would fix all of UT failures we've seen before, I just made a tag in my repo and published via Jitpack.

Please note that I don't intend to maintain custom version Janino. Once we confirm it fixes the all test failures, I'll get back to Janino maintainer and ask for merging the PR & releasing next version. After then we can finally update the Janino version to the official one here.

@SparkQA
Copy link

SparkQA commented Mar 13, 2020

Test build #119753 has finished for PR 27860 at commit 38d6a86.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 14, 2020

Test build #119776 has finished for PR 27860 at commit 90bebd3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

UPDATE: I also found janino-compiler/janino#119 and fixed it as well via janino-compiler/janino#120. Now Jenkins passes :)

If we feel OK to rely on UT passes as considering stability of Janino 3.1.x, we can go forward once Janino adopts these patches and do the next release.

@SparkQA
Copy link

SparkQA commented Mar 17, 2020

Test build #119905 has finished for PR 27860 at commit 51d090d.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

retest this, please

@SparkQA
Copy link

SparkQA commented Mar 17, 2020

Test build #119907 has finished for PR 27860 at commit 51d090d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

retest this, please

@dongjoon-hyun
Copy link
Member

Thank you for E2E experimenting. BTW, if there is a chance, let's test with JDK11 together.

@SparkQA
Copy link

SparkQA commented Mar 17, 2020

Test build #119922 has finished for PR 27860 at commit 51d090d.

  • This patch fails from timeout after a configured wait of 400m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

retest this, please

@SparkQA
Copy link

SparkQA commented Mar 17, 2020

Test build #119933 has finished for PR 27860 at commit 51d090d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR HeartSaVioR changed the title [WIP][SPARK-31101][SQL][BUILD] Upgrade Janino to 3.1.1 [WIP][SPARK-31101][SQL][BUILD][test-java11] Upgrade Janino to 3.1.1 Mar 17, 2020
@HeartSaVioR
Copy link
Contributor Author

retest this, please

@SparkQA
Copy link

SparkQA commented Mar 18, 2020

Test build #119957 has finished for PR 27860 at commit 51d090d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR HeartSaVioR changed the title [WIP][SPARK-31101][SQL][BUILD][test-java11] Upgrade Janino to 3.1.1 [WIP][SPARK-31101][SQL][BUILD][test-hadoop3.2][test-java11] Upgrade Janino to 3.1.1 Mar 18, 2020
@HeartSaVioR
Copy link
Contributor Author

retest this, please

@maropu
Copy link
Member

maropu commented Mar 19, 2020

In general, this looks okay to me for 3.1.0. BTW, @maropu ~ Do you have a plan to update your useful repo, https://github.com/maropu/spark-tpcds-datagen , for 3.0.0?

Ur, yea! I will. Thanks for letting me know!

@dongjoon-hyun
Copy link
Member

Thank you, @maropu . The benchmark rang me your repository. :)

@gatorsmile
Copy link
Member

@kiszk Yes. Let us do the perf regression tests before merging this.

@xuanyuanking Could you run our internal benchmarking regression tests and see whether this upgrade will trigger the regressions?

@gatorsmile
Copy link
Member

@HeartSaVioR @kiszk @maropu @rednaxelafx How confident are you about this new version especially when 3.1.1 are still not stable?

Since Spark 3.0 is already code freeze, could you justify why we have to upgrade it?

@HyukjinKwon
Copy link
Member

Personally I am a bit cautious whenever I see vendors are explicitly mentioned in OSS.

@HeartSaVioR
Copy link
Contributor Author

HeartSaVioR commented Mar 20, 2020

So that's the matter of maintenance; I wouldn't expect we will get Janino 3.0.17 easily, even though we fix the issue and provide the patch. The 3.0.x version line wasn't even maintained and even after branching it's defined as "deprecated".

Regarding confidence of stability, yeah, Spark is used to work with Janino 3.0.x so Janino 3.0.16 gives more confidence, though what I originally fixed in Janino is actually long-standing bug. There're still some known bugs in Janino 3.0.x (SPARK-25987, and janino-compiler/janino#90 (comment) Josh Rosen reported to Janino) which will never be fixed in Janino 3.0.x I think, because these bugs don't exist in Janino 3.1.x.

The question remains: have we ever constructed the way to justify the confidence of Janino stability? Any tests fail to catch the bugs I mentioned, so I don't think we have stability tests which verify edge-cases. In normal cases, a bunch of Spark UTs generate codes and Janino will be tested by these codes whenever we integrate Janino. If we evaluate the confidence from just test coverages, nothing looks to be changed, assume we guarantee any new version of Janino should pass Spark UTs to be adopted.

I'd agree about general fear of adopting minor version which actually had major changes; that's why I asked about opinions in other PR. In the long run I think Janino 3.1.x is the way to go, but if adopting Janino 3.1.x becomes the burden for Spark 3.0.0, I'm OK to go with Janino 3.0.16 in Spark 3.0 - any further Janino bugs may require Spark 3.1 to resolve.

@gatorsmile
Copy link
Member

https://issues.apache.org/jira/browse/SPARK-25987 has been addressed in SPARK 3.0?

Upgrading to Janino 3.0.16 looks much safer to me. Using the latest version is always risky to me. We normally avoid doing it for critical libraries we rely on. How about waiting for a few more months, after more projects try Janino 3.1.

@HeartSaVioR
Copy link
Contributor Author

HeartSaVioR commented Mar 20, 2020

https://issues.apache.org/jira/browse/SPARK-25987 has been addressed in SPARK 3.0?

Looks like, though it might be just by luck (no one figured out which change fixed the issue). We don't seem to fix the root issue (because the root issue is in Janino 3.0.x), so it could be still possible Spark 3.0 only escalates the threshold higher and the code is no longer crashed, but another code could still crash (though that's stack overflow which Xss parameter would simply help).

The code Spark generates are far from hand-written code which only few of Janino users would have gone through the same path - the bugs we've figured out are edge-cases which only happen when we put lots of branch in a method, or the switch statement has bunch of statements (exceeding 32K of offset) in it.

If we consider Janino as a critical library we may need to consider it as something we need to investigate and collaborate, instead of treating it simply as a one of 3rd party.

I totally understand we may want to avoid the risk on Spark 3.0.0 as it's closer and closer to out the door, but if we ever fear about adopting Janino 3.1.x into Spark 3.1 and want to wait for some other one to encounter issues and deal with these instead of us, well... I'm not sure it will happen.

@kiszk
Copy link
Member

kiszk commented Mar 20, 2020

@HyukjinKwon Sorry, I will take care of it. In my mind, there is a thought about whether this great framework can be used for this.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-31101][SQL][BUILD] Upgrade Janino to 3.1.2 [SPARK-31214][SQL][BUILD] Upgrade Janino to 3.1.2 Mar 22, 2020
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-31214][SQL][BUILD] Upgrade Janino to 3.1.2 [SPARK-31214][BUILD] Upgrade Janino to 3.1.2 Mar 22, 2020
@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Mar 22, 2020

Hi, @HeartSaVioR and all.

  • The JIRA issue is changed into SPARK-31214 because we need more time for this.
  • Apache Spark 3.1.0 has Janino 3.0.16 first (f55f6b5).

@srowen
Copy link
Member

srowen commented Mar 22, 2020

I'm OK with janino 3.0.x now, but, yeah if 3.1.x is risky, it's not less risky in Spark 3.1, 3.2, etc. Maybe we simply cross that bridge when forced to.

@HeartSaVioR
Copy link
Contributor Author

HeartSaVioR commented Mar 24, 2020

Thanks for sorting it out, @dongjoon-hyun !

Btw, I have been thinking that the concern was against Spark 3.0 instead of further minor versions, like Spark 3.1. e.g. for Spark 3.1, based on comments we seem to think OK to go with Janino 3.1.x.

I'll keep this open as basically I consider Janino 3.0.16 as the last version of 3.0.x (the bridge is burning). And then we may realize that we should deal with Janino 3.1.x in the long run, IMHO in near future.

@dongjoon-hyun
Copy link
Member

Hi, @HeartSaVioR . Could you rebase this PR once more?

@HeartSaVioR
Copy link
Contributor Author

OK just rebased. Thanks for reminding.

@SparkQA
Copy link

SparkQA commented May 29, 2020

Test build #123287 has finished for PR 27860 at commit 1386031.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Thank you all. Merged to master for Apache Spark 3.1.

@HeartSaVioR
Copy link
Contributor Author

Thanks all for reviewing and merging!

@HeartSaVioR HeartSaVioR deleted the SPARK-31101 branch May 30, 2020 08:56
dongjoon-hyun pushed a commit that referenced this pull request Aug 20, 2020
### What changes were proposed in this pull request?

This PR reverts #27860 to downgrade Janino, as the new version has a bug.

### Why are the changes needed?

The symptom is about NaN comparison. For code below
```
if (double_value <= 0.0) {
  ...
} else {
  ...
}
```

If `double_value` is NaN, `NaN <= 0.0` is false and we should go to the else branch. However, current Spark goes to the if branch and causes correctness issues like SPARK-32640.

One way to fix it is:
```
boolean cond = double_value <= 0.0;
if (cond) {
  ...
} else {
  ...
}
```

I'm not familiar with Janino so I don't know what's going on there.

### Does this PR introduce _any_ user-facing change?

Yes, fix correctness bugs.

### How was this patch tested?

a new test

Closes #29495 from cloud-fan/revert.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
srowen pushed a commit that referenced this pull request May 12, 2021
### What changes were proposed in this pull request?

This PR proposes to bump up the janino version from 3.0.16 to v3.1.4.
The major changes of this upgrade are as follows:
 - Fixed issue #131: Janino 3.1.2 is 10x slower than 3.0.11: The Compiler's IClassLoader was initialized way too eagerly, thus lots of classes were loaded from the class path, which is very slow.
 - Improved the encoding of stack map frames according to JVMS11 4.7.4: Previously, only "full_frame"s were generated.
 - Fixed issue #107: Janino requires "org.codehaus.commons.compiler.io", but commons-compiler does not export this package
 - Fixed the promotion of the array access index expression (see JLS7 15.13 Array Access Expressions).

For all the changes, please see the change log: http://janino-compiler.github.io/janino/changelog.html

NOTE1: I've checked that there is no obvious performance regression. For all the data, see a link: https://docs.google.com/spreadsheets/d/1srxT9CioGQg1fLKM3Uo8z1sTzgCsMj4pg6JzpdcG6VU/edit?usp=sharing

NOTE2: We upgraded janino to 3.1.2 (#27860) once before, but the commit had been reverted in #29495 because of the correctness issue. Recently, #32374 had checked if Spark could land on v3.1.3 or not, but a new bug was found there. These known issues has been fixed in v3.1.4 by following PRs:
 - janino-compiler/janino#145
 - janino-compiler/janino#146

### Why are the changes needed?

janino v3.0.X  is no longer maintained.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

GA passed.

Closes #32455 from maropu/janino_v3.1.4.

Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants