[HUDI-7919][DNM] Migrate integration tests to run on Spark 3.5 #11994
base: master
@@ -37,11 +36,12 @@ services:
       retries: 3

   datanode1:
-    image: apachehudi/hudi-hadoop_2.8.4-datanode:latest
+    image: apachehudi/hudi-hadoop_2.8.4-datanode:bullseye
The image tag will be reverted from bullseye to latest once the PR is close to merging.
@@ -17,7 +17,7 @@

 ARG HADOOP_VERSION=2.8.4
 ARG HADOOP_DN_PORT=50075
-FROM apachehudi/hudi-hadoop_${HADOOP_VERSION}-base:latest
+FROM apachehudi/hudi-hadoop_${HADOOP_VERSION}-base:bullseye
Similar here in the Dockerfiles: the image tag will be reverted from bullseye to latest once the PR is close to merging.
@@ -268,11 +269,12 @@
 </run>
 </image>
 <image>
-<name>motoserver/moto:${moto.version}</name>
+<name>apachehudi/moto:${moto.version}</name>
We have to do this to pull the image of the correct architecture as the plugin cannot do this properly.
// This port number must be the same as {@code moto.port} defined in pom.xml
private static final int MOTO_PORT = 5002;
private static final String MOTO_ENDPOINT = "http://localhost:" + MOTO_PORT;
To make this integration test run locally.
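Because the constant above has to be kept in sync with moto.port in pom.xml by hand, one possible alternative is to resolve the port at runtime. The following is only a sketch: passing moto.port to the test JVM as a system property is an assumption (the PR hard-codes the value), and the class and method names are hypothetical.

```java
public class MotoEndpoint {
    // Default mirrors the MOTO_PORT constant shown in the snippet above.
    static final int DEFAULT_MOTO_PORT = 5002;

    /** Parses a port from a property value, falling back to the default when unset or blank. */
    static int resolvePort(String propertyValue) {
        if (propertyValue == null || propertyValue.trim().isEmpty()) {
            return DEFAULT_MOTO_PORT;
        }
        return Integer.parseInt(propertyValue.trim());
    }

    /** Builds the local endpoint URL in the same shape as MOTO_ENDPOINT. */
    static String endpoint(int port) {
        return "http://localhost:" + port;
    }

    public static void main(String[] args) {
        // Assumption: the build forwards "moto.port" as a JVM system property.
        int port = resolvePort(System.getProperty("moto.port"));
        System.out.println(endpoint(port));
    }
}
```

With no property set, this falls back to the same port as the hard-coded constant, so local runs behave as in the PR.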
@@ -170,7 +170,7 @@ private boolean checkHealth(String fromContainerName, String hostname, int port)
     TestExecStartResultCallback resultCallback =
         executeCommandStringInDocker(fromContainerName, command, false, true);
     String stderrString = resultCallback.getStderr().toString().trim();
-    if (!stderrString.contains("open")) {
+    if (!stderrString.contains("succeeded")) {
Output message change.
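If the check ever needs to work against both the old and the new images, the match could accept either marker. This is a minimal sketch rather than what the PR does: the two markers come from the diff above, while the class name, the method name, and the sample stderr strings are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class PortCheckOutput {
    // "open" is the marker the old check looked for, "succeeded" the new one.
    // Accepting both is an assumption of this sketch, not the PR's behavior.
    private static final List<String> SUCCESS_MARKERS = Arrays.asList("open", "succeeded");

    /** Returns true if the captured stderr contains any known success marker. */
    static boolean indicatesOpenPort(String stderr) {
        if (stderr == null) {
            return false;
        }
        String trimmed = stderr.trim();
        return SUCCESS_MARKERS.stream().anyMatch(trimmed::contains);
    }

    public static void main(String[] args) {
        // Sample messages are illustrative only; real output depends on the tool in the image.
        System.out.println(indicatesOpenPort("Connection to namenode 8020 port [tcp/*] succeeded!"));
        System.out.println(indicatesOpenPort("namenode (172.18.0.2:8020) open"));
        System.out.println(indicatesOpenPort("connect to namenode port 8020 (tcp) failed: Connection refused"));
    }
}
```

A substring match stays as loose as the original check, so a failure message that happens to contain one of the markers would still be treated as success.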
assertStdOutContains(stdOutErrPair,
    "|default |stock_ticks_cow |false |\n"
        + "|default |stock_ticks_cow_bs |false |\n"
        + "|default |stock_ticks_mor |false |\n"
Now the original table name is also synced to the metastore based on the recent behavior change (#10685).
Change Logs
This PR migrates the integration tests from running on Spark 2.4 to running on Spark 3.5. Changes include:
- hudi-hive-sync-bundle to include necessary jackson and parquet classes in the bundle to avoid missing or conflicting classes in Hive sync;
- hudi-sync/hudi-hive-sync/run_sync_tool.sh to avoid jackson dependency conflict;
- integration-tests task in .github/workflows/bot.yml to run on Spark 3.5;
- (linux/amd64 architecture only in this PR);
- Dockerfiles are changed. The Debian stretch release is no longer supported, so the Debian bullseye release is used as the base instead;
- hudi-aws/pom.xml so that we can pull the moto image based on the correct architecture. The apachehudi/moto image is uploaded. The Moto port is changed to avoid a port collision when running locally on a MacBook;
- ITTestBase#checkHealth based on the new output;
- ITTestHoodieDemo#testParquetDemo to verify that the Spark job and queries work fine in the new docker demo setup;
- ITTestHoodieSanity and ITTestHoodieSyncCommand, which are already covered by other tests (HUDI-8274 to revisit).
New docker images are uploaded to Docker Hub:
apachehudi/moto
apachehudi/hudi-hadoop_2.8.4-base
apachehudi/hudi-hadoop_2.8.4-datanode
apachehudi/hudi-hadoop_2.8.4-history
apachehudi/hudi-hadoop_2.8.4-hive_2.3.3
apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkbase_3.5.3
apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_3.5.3
apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkmaster_3.5.3
apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkworker_3.5.3
Old images under the same names are pushed to a different tag, stretch, in case we'd like to use that with Spark 2.4.
Impact
Makes the integration tests run on Spark 3.5, unblocking the deprecation of Spark 2 integration in Hudi.
Risk level
low
Documentation Update
N/A
Contributor's checklist