Spark loader has dependency conflicts #464

Open
1 task done
haohao0103 opened this issue May 17, 2023 · 12 comments
Labels
bug Something isn't working

Comments

@haohao0103
Contributor

Bug Type (问题类型)

None

Before submit

  • I had searched in the issues and found no similar issues.

Environment (环境信息)

  • Server Version: v1.0.0
  • Toolchain Version: v1.0.0

Expected & Actual behavior (期望与实际表现)

When running spark-loader, I hit many dependency conflicts. I traced them to the jersey, jakarta, and hk2 packages: the versions in Spark's jars directory differ from the versions in the loader's lib directory after packaging. For example, lib mainly contains jersey 3.x jars, while Spark ships 2.x jars. My workaround was to remove the conflicting jars from Spark's jars directory and use the loader unchanged, after which the loader could run. Screenshots of the error message:
image
image

Simply removing the jars worked for me, but it is not a general solution. I'm trying spark-submit --exclude-jars, and I'd like to know whether the community has a better solution. Thanks.

Vertex/Edge example (问题点 / 边数据举例)

No response

Schema [VertexLabel, EdgeLabel, IndexLabel] (元数据结构)

No response

@haohao0103 haohao0103 added the bug Something isn't working label May 17, 2023
@imbajin
Member

imbajin commented May 17, 2023

It seems our spark loader currently hits some exceptions (see #404). Should we fix/verify them as a small task? @simon824

@simon824
Member

How about solving this issue with the shade plugin?
What do you think? @haohao0103 @JackyYangPassion @imbajin @liuxiaocs7

@haohao0103
Contributor Author

> How about solving this issue with the shade plugin? What do you think? @haohao0103 @JackyYangPassion @imbajin @liuxiaocs7

Thanks. I tried giving the user classpath higher priority, but I was blocked by many other issues. You mean using shade plugin relocation to solve this problem, right? There are a few conflicting jars, but it's a good idea and I can try it.
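
For readers following the thread, shade plugin relocation could look roughly like the sketch below. It is not the configuration from the eventual PR: the plugin version, the list of relocated packages, and the `shaded.` prefix are all assumptions chosen for illustration, and reflection-heavy libraries such as jersey and hk2 may need additional handling.

```xml
<!-- Sketch only: the relocated packages and the "shaded." prefix are assumptions -->
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.4.1</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <relocations>
                    <!-- Move the loader's jersey 3.x classes away from Spark's jersey 2.x -->
                    <relocation>
                        <pattern>org.glassfish.jersey</pattern>
                        <shadedPattern>shaded.org.glassfish.jersey</shadedPattern>
                    </relocation>
                    <!-- hk2 is the injection framework used by jersey -->
                    <relocation>
                        <pattern>org.glassfish.hk2</pattern>
                        <shadedPattern>shaded.org.glassfish.hk2</shadedPattern>
                    </relocation>
                    <!-- jersey 3.x uses the jakarta.ws.rs API -->
                    <relocation>
                        <pattern>jakarta.ws.rs</pattern>
                        <shadedPattern>shaded.jakarta.ws.rs</shadedPattern>
                    </relocation>
                </relocations>
                <transformers>
                    <!-- Merges META-INF/services files and applies relocations to their entries -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>
```

The idea is that the loader's copies of these classes get new package names inside the shaded jar, so they no longer collide with the versions in Spark's jars directory.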

@imbajin
Member

imbajin commented Jun 9, 2023

> > How about solving this issue with the shade plugin? What do you think? @haohao0103 @JackyYangPassion @imbajin @liuxiaocs7
>
> Thanks. I tried giving the user classpath higher priority, but I was blocked by many other issues. You mean using shade plugin relocation to solve this problem, right? There are a few conflicting jars, but it's a good idea and I can try it.

Following up on this issue: is there any update?

@haohao0103
Contributor Author

@imbajin Hello, I am following up on this. The local tests have basically passed, and I will submit a PR as soon as possible. Thanks.

@imbajin imbajin moved this to 🏗 In progress in HugeGraph Tasks Jun 12, 2023
haohao0103 pushed a commit to haohao0103/incubator-hugegraph-toolchain that referenced this issue Jun 14, 2023
Spark loader has dependency conflicts
haohao0103 pushed a commit to haohao0103/incubator-hugegraph-toolchain that referenced this issue Jun 15, 2023
Spark loader has dependency conflicts
haohao0103 pushed a commit to haohao0103/incubator-hugegraph-toolchain that referenced this issue Jun 19, 2023
Spark loader has dependency conflicts
@liuxiaocs7
Member

liuxiaocs7 commented Jul 25, 2023

Hi @haohao0103, may I ask whether you have solved this problem? There are many dependency conflicts between hugegraph-common and Spark, mainly jakarta vs. javax version conflicts, so the two cannot be imported at the same time. Also, how can this be run in IDEA?

With the following dependencies in the loader:

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>3.2.2</version>
        <scope>provided</scope>
        <exclusions>
            <exclusion>
                <artifactId>jersey-client</artifactId>
                <groupId>org.glassfish.jersey.core</groupId>
            </exclusion>
            <exclusion>
                <groupId>org.glassfish.jersey.media</groupId>
                <artifactId>jersey-media-json-jackson</artifactId>
            </exclusion>
            <exclusion>
                <artifactId>jersey-common</artifactId>
                <groupId>org.glassfish.jersey.core</groupId>
            </exclusion>
            <exclusion>
                <artifactId>jersey-container-servlet</artifactId>
                <groupId>org.glassfish.jersey.containers</groupId>
            </exclusion>
            <exclusion>
                <artifactId>jersey-container-servlet-core</artifactId>
                <groupId>org.glassfish.jersey.containers</groupId>
            </exclusion>
            <exclusion>
                <artifactId>jersey-hk2</artifactId>
                <groupId>org.glassfish.jersey.inject</groupId>
            </exclusion>
            <exclusion>
                <artifactId>jersey-server</artifactId>
                <groupId>org.glassfish.jersey.core</groupId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>3.2.2</version>
        <scope>provided</scope>
        <exclusions>
            <exclusion>
                <artifactId>antlr4-runtime</artifactId>
                <groupId>org.antlr</groupId>
            </exclusion>
        </exclusions>
    </dependency>

    <dependency>
        <groupId>org.apache.hugegraph</groupId>
        <artifactId>hugegraph-client</artifactId>
        <version>1.0.0</version>
        <exclusions>
            <!-- Note: jackson version should < 2.13 with scala 2.12 -->
            <exclusion>
                <groupId>com.fasterxml.jackson.core</groupId>
                <artifactId>*</artifactId>
            </exclusion>
            <exclusion>
                <groupId>com.fasterxml.jackson.module</groupId>
                <artifactId>*</artifactId>
            </exclusion>
            <exclusion>
                <groupId>com.fasterxml.jackson.jaxrs</groupId>
                <artifactId>*</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
</dependencies>

image

logs:

23/07/26 12:05:49 INFO SparkEnv: Registering OutputCommitCoordinator
Exception in thread "main" java.lang.NoClassDefFoundError: jakarta/servlet/Filter
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	at org.apache.spark.status.api.v1.ApiRootResource$.getServletHandler(ApiRootResource.scala:63)
	at org.apache.spark.ui.SparkUI.initialize(SparkUI.scala:68)
	at org.apache.spark.ui.SparkUI.<init>(SparkUI.scala:81)
	at org.apache.spark.ui.SparkUI$.create(SparkUI.scala:183)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:480)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at org.apache.hugegraph.spark.HelloWorld1.main(HelloWorld1.java:17)
Caused by: java.lang.ClassNotFoundException: jakarta.servlet.Filter
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 19 more
23/07/26 12:05:49 INFO DiskBlockManager: Shutdown hook called
23/07/26 12:05:49 INFO ShutdownHookManager: Shutdown hook called

hugegraph-common uses jersey 3.0.3, which is built on the jakarta namespace, while the Spark UI uses jersey 2.x with the javax namespace, so the namespaces conflict.

@haohao0103
Contributor Author

haohao0103 commented Jul 27, 2023

@liuxiaocs7
Hello, the conflict I'm trying to resolve is exactly what you described!
After the shaded jar is packaged successfully, you can run bin/hugegraph-spark-loader.sh to test it.
As for how to run it in the IDE:
As I understand it, since the Spark dependency is set to provided scope, we can't run it in the IDE. Before shading, we could run temporary tests in the IDE by changing the scope of the Spark dependency, but now I don't think that is possible. If we change the scope of the Spark dependency, we need to adjust the shade strategy depending on whether Spark is provided externally or bundled in the project.
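
The scope switch described above is often expressed as a Maven property plus a profile. The following is only a sketch under assumptions, not the project's pom: the property name `spark.scope` and the profile id `ide-local` are hypothetical, and it does not address the shade strategy, which would still have to be adjusted when Spark ends up on the compile classpath.

```xml
<!-- Sketch only: spark.scope and ide-local are hypothetical names -->
<properties>
    <!-- Default: Spark is supplied by the cluster at runtime -->
    <spark.scope>provided</spark.scope>
</properties>

<profiles>
    <profile>
        <!-- Activate with: mvn -Pide-local ... (or select the profile in the IDE) -->
        <id>ide-local</id>
        <properties>
            <!-- Put Spark on the runtime classpath for local runs -->
            <spark.scope>compile</spark.scope>
        </properties>
    </profile>
</profiles>

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>3.2.2</version>
        <!-- Resolved from the property above, or from the active profile -->
        <scope>${spark.scope}</scope>
    </dependency>
</dependencies>
```

With this layout a normal build keeps Spark as provided, and only builds that activate the profile pull it in.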

@z7658329
Member

@haohao0103 @liuxiaocs7

  1. Can we refer to how the Spark community deals with the javax and jakarta package-name issue?
  2. In this case, can we exclude jakarta.servlet-api from spark-core_2.12 and add it manually? jakarta.servlet-api 4.0.x uses the javax namespace, but it changed to jakarta starting with version 5.0.0, so we can upgrade to 5.0.0. See:
         <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>3.2.2</version>
            <exclusions>
                <exclusion>
                    <groupId>jakarta.servlet</groupId>
                    <artifactId>jakarta.servlet-api</artifactId>
                </exclusion>
                
                ......
                
            </exclusions>
        </dependency>


        <dependency>
            <groupId>jakarta.servlet</groupId>
            <artifactId>jakarta.servlet-api</artifactId>
            <version>5.0.0</version>
        </dependency>

@liuxiaocs7
Member

Hi @z7658329, I tried manually setting the version of jakarta.servlet-api to 5.0.0 or 6.0.0 instead of the default 4.0.3, and got the following result:

Exception in thread "main" java.lang.NoClassDefFoundError: javax/servlet/Servlet
	at org.apache.spark.ui.SparkUI$.create(SparkUI.scala:183)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:480)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at top.liuxiaocs.Main.main(Main.java:16)
Caused by: java.lang.ClassNotFoundException: javax.servlet.Servlet
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 4 more

pom:

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>3.2.2</version>
        <scope>provided</scope>
        <exclusions>
            <exclusion>
                <groupId>jakarta.servlet</groupId>
                <artifactId>jakarta.servlet-api</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>3.2.2</version>
        <scope>provided</scope>
        <exclusions>
            <exclusion>
                <artifactId>antlr4-runtime</artifactId>
                <groupId>org.antlr</groupId>
            </exclusion>
        </exclusions>
    </dependency>

    <dependency>
        <groupId>org.apache.hugegraph</groupId>
        <artifactId>hugegraph-client</artifactId>
        <version>1.0.0</version>
        <exclusions>
            <!-- Note: jackson version should < 2.13 with scala 2.12 -->
            <exclusion>
                <groupId>com.fasterxml.jackson.core</groupId>
                <artifactId>*</artifactId>
            </exclusion>
            <exclusion>
                <groupId>com.fasterxml.jackson.module</groupId>
                <artifactId>*</artifactId>
            </exclusion>
            <exclusion>
                <groupId>com.fasterxml.jackson.jaxrs</groupId>
                <artifactId>*</artifactId>
            </exclusion>
        </exclusions>
    </dependency>

    <dependency>
        <groupId>jakarta.servlet</groupId>
        <artifactId>jakarta.servlet-api</artifactId>
        <version>5.0.0</version>
    </dependency>
</dependencies>

@liuxiaocs7
Member

> @liuxiaocs7 Hello, the conflict I'm trying to resolve is exactly what you described! After the shaded jar is packaged successfully, you can run bin/hugegraph-spark-loader.sh to test it. As for how to run it in the IDE: as I understand it, since the Spark dependency is set to provided scope, we can't run it in the IDE. Before shading, we could run temporary tests in the IDE by changing the scope of the Spark dependency, but now I don't think that is possible. If we change the scope of the Spark dependency, we need to adjust the shade strategy depending on whether Spark is provided externally or bundled in the project.

Thank you for your detailed explanation. I will try it based on your PR. As for dependencies with provided scope, you can check this option in IDEA:

image

> but now I don't think that is possible. If we change the scope of the Spark dependency, we need to adjust the shade strategy depending on whether Spark is provided externally or bundled in the project.

Yep, they cannot coexist in IDEA.

@liuxiaocs7
Member

A minimal runnable Spark3.2.2+HugeGraph-Client1.0.0 example: https://github.com/liuxiaocs7/HugeGraphSpark

@haohao0103
Contributor Author

@liuxiaocs7 Thank you very much. I didn't know it was possible to do this without modifying the pom file.

haohao0103 pushed a commit to haohao0103/incubator-hugegraph-toolchain that referenced this issue Aug 2, 2023
remove incubating
haohao0103 pushed a commit to haohao0103/incubator-hugegraph-toolchain that referenced this issue Aug 4, 2023
fix delimiter is null
Projects
Status: 🏗 In progress
Development

No branches or pull requests

5 participants