[HUDI-7919][DNM] Migrate integration tests to run on Spark 3.5 #11994

Open. @yihua wants to merge 14 commits into master.

@yihua (Contributor) commented Sep 24, 2024

Change Logs

This PR migrates the integration tests from running on Spark 2.4 to running on Spark 3.5. Changes include:

  • Fixes hudi-hive-sync-bundle to include the necessary Jackson and Parquet classes, avoiding missing or conflicting classes in Hive sync;
  • Fixes hudi-sync/hudi-hive-sync/run_sync_tool.sh to avoid a Jackson dependency conflict;
  • Changes the configuration of the integration-tests task in .github/workflows/bot.yml to run on Spark 3.5;
  • Changes the docker demo setup to use Spark 3.5, and rebuilds and uploads new images (for the linux/amd64 architecture only in this PR);
    • Dockerfiles are changed. The Debian stretch release is no longer supported, so the Debian bullseye release is used as the base instead.
    • Spark 3.5.3 is used in the docker images that need Spark.
  • Changes hudi-aws/pom.xml so that the moto image is pulled for the correct architecture. The apachehudi/moto image is uploaded. The moto port is changed to avoid a port collision when running locally on a MacBook;
  • Changes how ITTestBase#checkHealth checks for an open port, based on the new output;
  • Re-enables ITTestHoodieDemo#testParquetDemo to verify that the Spark job and queries work in the new docker demo setup;
  • Disables ITs in ITTestHoodieSanity and ITTestHoodieSyncCommand that are already covered by other tests (HUDI-8274 to revisit).

New docker images are uploaded to the Docker Hub:

  • apachehudi/moto
  • apachehudi/hudi-hadoop_2.8.4-base
  • apachehudi/hudi-hadoop_2.8.4-datanode
  • apachehudi/hudi-hadoop_2.8.4-history
  • apachehudi/hudi-hadoop_2.8.4-hive_2.3.3
  • apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkbase_3.5.3
  • apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_3.5.3
  • apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkmaster_3.5.3
  • apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkworker_3.5.3

Old images under the same names are pushed to a separate tag, stretch, in case they are needed with Spark 2.4.

Impact

Makes the integration tests run on Spark 3.5, which unblocks the deprecation of the Spark 2 integration in Hudi.

Risk level

low

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@yihua changed the title from "[HUDI-7919] Migrate integration tests to run on Spark 3.5" to "[HUDI-7919][DNM] Migrate integration tests to run on Spark 3.5" on Sep 24, 2024
@github-actions bot added the size:L label (PR with lines of changes in (300, 1000]) on Sep 24, 2024
@yihua force-pushed the HUDI-7919-spark35-it branch 3 times, most recently from 079bc1c to 9cddf3e, on September 24, 2024 16:42
@github-actions bot added the size:M label (PR with lines of changes in (100, 300]) and removed the size:L label (PR with lines of changes in (300, 1000]) on Sep 24, 2024

@@ -37,11 +36,12 @@ services:
 retries: 3

 datanode1:
-image: apachehudi/hudi-hadoop_2.8.4-datanode:latest
+image: apachehudi/hudi-hadoop_2.8.4-datanode:bullseye

@yihua (Contributor, Author) commented:

The image tag will be reverted from bullseye to latest once the PR is close to merging.

@@ -17,7 +17,7 @@

 ARG HADOOP_VERSION=2.8.4
 ARG HADOOP_DN_PORT=50075
-FROM apachehudi/hudi-hadoop_${HADOOP_VERSION}-base:latest
+FROM apachehudi/hudi-hadoop_${HADOOP_VERSION}-base:bullseye

@yihua (Contributor, Author) commented:

Similar here in Dockerfiles: the image tag will be reverted from bullseye to latest once the PR is close to merging.

@@ -268,11 +269,12 @@
 </run>
 </image>
 <image>
-<name>motoserver/moto:${moto.version}</name>
+<name>apachehudi/moto:${moto.version}</name>

@yihua (Contributor, Author) commented:

We have to do this to pull the image of the correct architecture as the plugin cannot do this properly.

Comment on lines +63 to +65:
+// This port number must be the same as {@code moto.port} defined in pom.xml
+private static final int MOTO_PORT = 5002;
+private static final String MOTO_ENDPOINT = "http://localhost:" + MOTO_PORT;

@yihua (Contributor, Author) commented:

To make this integration test run locally.
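
Since the port was changed to avoid a local collision, a quick pre-check can tell whether the chosen port is actually available before the moto container starts. The following is only a minimal sketch, not part of this PR: the class `LocalPortCheck` and the method `isPortFree` are hypothetical names, and the check assumes that binding to the loopback address is a good enough proxy for "free on this machine".

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical helper (not in the Hudi code base): reports whether a TCP port
// can currently be bound on the loopback interface.
final class LocalPortCheck {

  private LocalPortCheck() {
  }

  // Returns true if a listening socket can be opened on the given port,
  // false if the bind fails (for example because another process holds the port).
  static boolean isPortFree(int port) {
    try (ServerSocket ignored = new ServerSocket(port, 1, InetAddress.getLoopbackAddress())) {
      return true;
    } catch (IOException e) {
      return false;
    }
  }
}
```

For instance, `LocalPortCheck.isPortFree(5002)` could be called before the test starts, with 5002 being the `MOTO_PORT` value shown above. The result is only a snapshot: another process may still take the port between the check and the container start.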

@@ -170,7 +170,7 @@ private boolean checkHealth(String fromContainerName, String hostname, int port)
 TestExecStartResultCallback resultCallback =
     executeCommandStringInDocker(fromContainerName, command, false, true);
 String stderrString = resultCallback.getStderr().toString().trim();
-if (!stderrString.contains("open")) {
+if (!stderrString.contains("succeeded")) {

@yihua (Contributor, Author) commented:

Output message change.
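
The diff above swaps the matched substring from "open" to "succeeded". As an illustration only, a more tolerant variant could accept either marker, so that the same check works against both the old and the new images. This is a sketch under stated assumptions, not what the PR does: the class `PortProbeOutput` and the method `indicatesOpenPort` are hypothetical names, and the only facts taken from the PR are the two marker strings.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not in the Hudi code base): decides whether the stderr
// output of the port probe reports a reachable port.
final class PortProbeOutput {

  // "open" is the marker the check matched before this PR, "succeeded" after it.
  private static final List<String> SUCCESS_MARKERS = Arrays.asList("succeeded", "open");

  private PortProbeOutput() {
  }

  // Returns true if the trimmed output contains any known success marker.
  static boolean indicatesOpenPort(String stderr) {
    if (stderr == null) {
      return false;
    }
    String trimmed = stderr.trim();
    for (String marker : SUCCESS_MARKERS) {
      if (trimmed.contains(marker)) {
        return true;
      }
    }
    return false;
  }
}
```

Substring matching stays as loose as in the original check, so an unrelated message that happens to contain "open" would also pass. Matching only the single new marker, as the PR does, is the stricter choice once the old images are no longer in use.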

assertStdOutContains(stdOutErrPair,
"|default |stock_ticks_cow |false |\n"
+ "|default |stock_ticks_cow_bs |false |\n"
+ "|default |stock_ticks_mor |false |\n"

@yihua (Contributor, Author) commented:

Now the original table name is also synced to the metastore based on the recent behavior change (#10685).

@hudi-bot commented:

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build
