Added image_version support to Dataproc #1833
@piotrpolatowski, thanks for your PR! By analyzing the annotation information on this pull request, we identified @constantijn to be a potential reviewer.
lgtm
lgtm. Can someone who knows GCP review this too?
I like the idea, but I don't think image version 1.1 should be the default. It might be the latest image now, but eventually v1.1 will become obsolete and you won't even be able to provision that version anymore. Can you change it so that, if not specified, the image version isn't passed to the API at all? (That means it'll default to giving you the latest version.) More info on Dataproc versioning here:
Is there a way in
Nope. But you can typically use
Done, manual test created.
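A minimal sketch of the behavior requested above, assuming the cluster is created from a Dataproc v1 `clusters.create` request body. `build_cluster_body` is a hypothetical helper, not the Luigi task's actual code: `imageVersion` is added to `softwareConfig` only when a version is given, so an unpinned request leaves the choice of image to the service.

```python
def build_cluster_body(project_id, cluster_name, image_version=None):
    """Hypothetical helper: build a Dataproc v1 clusters.create request body.

    imageVersion is only included when image_version is truthy, so an
    unpinned request lets the service choose its default image.
    """
    software_config = {}
    if image_version:
        software_config["imageVersion"] = image_version
    return {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {"softwareConfig": software_config},
    }


pinned = build_cluster_body("my-project", "my-cluster", image_version="1.1")
unpinned = build_cluster_body("my-project", "my-cluster")
print(pinned["config"]["softwareConfig"])    # {'imageVersion': '1.1'}
print(unpinned["config"]["softwareConfig"])  # {}
```

Only the pinned request carries a version, so nothing in the task itself goes stale when 1.1 is retired.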
software_config = {}

if self.image_version:
    software_config["imageVersion"] = self.image_version
This one-liner is much more readable, IMO:
software_config = {"imageVersion": self.image_version} if self.image_version else {}
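A quick check, using two hypothetical wrapper functions, that the suggested one-liner behaves the same as the if-block from the diff, including for `None` and the empty string:

```python
def software_config_if_block(image_version):
    # Form from the diff: start empty, add the key only when a version is set.
    software_config = {}
    if image_version:
        software_config["imageVersion"] = image_version
    return software_config


def software_config_one_liner(image_version):
    # Suggested conditional-expression form.
    return {"imageVersion": image_version} if image_version else {}


for value in (None, "", "1.0", "1.1"):
    assert software_config_if_block(value) == software_config_one_liner(value)
```

Both forms rely on truthiness, so an empty string is treated like an unset value and no `imageVersion` is sent; the choice between them is purely about readability.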
lgtm
Any idea why CI might have failed?
Looks like flake8 (style checker/linter) failed with:
Well, all you need now is to add tests, I believe: https://github.com/spotify/luigi/blob/master/test/contrib/dataproc_test.py
Other properties are not tested; I thought there was a reason for that. Is there?
I bet it's laziness ;p. But we have the author here, so I'll let @constantijn answer it. :)
Heh, there's a full set of tests to check whether the Luigi side of things works properly.
I kind of agree with @constantijn that we can't test every parameter. I've added a simple unit test with
Yeah, they're integration tests that check whether the Luigi task does what you expect it to do on Google Cloud. I'm not sure this is the right way to test, since what you have now leaves a cluster up and running after a successful test run, which will confuse future test runs and use up cloud credit unnecessarily. At the very least we need to add a delete-cluster call after this one, and probably also add a "2" to the cluster name so it can't interfere with the other tests. The "hard" way would be to write a real unit test that intercepts the API calls to Google Cloud (it's all REST under the hood) and runs assertions on them to make sure that the Luigi options you set result in the API calls you expect.
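A rough sketch of that "hard" way, under two assumptions: the task talks to Dataproc through a googleapiclient-style client (`projects().regions().clusters().create(...).execute()`), and the hypothetical `create_cluster` function stands in for the real Luigi task code. `unittest.mock.MagicMock` records the chained calls, so the tests can assert on the request body without contacting Google Cloud or leaving a cluster running.

```python
import unittest
from unittest import mock


def create_cluster(api, project_id, region, cluster_name, image_version=None):
    """Hypothetical stand-in for the task's cluster-creation call.

    `api` is assumed to be a googleapiclient-style Dataproc v1 client.
    """
    software_config = {"imageVersion": image_version} if image_version else {}
    body = {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {"softwareConfig": software_config},
    }
    return api.projects().regions().clusters().create(
        projectId=project_id, region=region, body=body
    ).execute()


class CreateClusterRequestTest(unittest.TestCase):
    def _sent_body(self, **kwargs):
        # MagicMock returns the same child mock for every chained call, so
        # the recorded create(...) call can be read back afterwards.
        api = mock.MagicMock()
        create_cluster(api, "my-project", "global", "luigi-test-2", **kwargs)
        clusters = api.projects.return_value.regions.return_value.clusters.return_value
        clusters.create.assert_called_once()
        _, call_kwargs = clusters.create.call_args
        return call_kwargs["body"]

    def test_image_version_is_sent_when_set(self):
        body = self._sent_body(image_version="1.1")
        self.assertEqual(body["config"]["softwareConfig"], {"imageVersion": "1.1"})

    def test_image_version_is_omitted_by_default(self):
        body = self._sent_body()
        self.assertNotIn("imageVersion", body["config"]["softwareConfig"])
```

Run it with `python -m unittest`. Pointing the same assertions at the real task would mean patching wherever the task builds its API client, which this sketch does not attempt.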
Oh, and flake8 is complaining again ;)
Flake8 again:
Other than that, I'm happy if @Tarrasch is happy :)
Yeah, I just want some minimal test (the something-is-better-than-nothing principle). But I'll leave it entirely up to you to decide whether it should be there, because you know this code much better than I do. :)
Go ahead!
@constantijn @piotrpolatowski, can you check whether this PR introduced a regression in this test? https://travis-ci.org/spotify/luigi/jobs/157539534 Unfortunately, people without write access to this repository cannot let Travis run the GCP tests, but perhaps you can check whether you get the same error.
The error I see is: What's the env variable? Or what's the cluster name that gets passed to the test? In the Travis output I only see that the API didn't like the clusterName it was passed; I can't see the actual clusterName it received.
@constantijn, I can't help you much as I don't know anything about Dataproc. Could you send a patch that increases the debug output? Or can you see any way this patch could have caused the build to start failing?
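One way to get the extra debug output asked for above is to log the resolved cluster name before it is sent and fail early when it is missing. This is only a sketch: the environment variable name `DATAPROC_TEST_CLUSTER`, the function `resolve_cluster_name`, and the optional suffix (echoing the earlier idea of appending "2" to the name) are hypothetical and not taken from the Luigi test suite.

```python
import logging
import os

logger = logging.getLogger(__name__)


def resolve_cluster_name(env_var="DATAPROC_TEST_CLUSTER", suffix=""):
    """Read the test cluster name from a hypothetical env variable and log it.

    Failing fast with a clear message is easier to debug than a generic API
    error about an invalid clusterName.
    """
    base = os.environ.get(env_var)
    if not base:
        raise RuntimeError(
            "Environment variable %s is not set; cannot build a cluster name" % env_var
        )
    name = base + suffix
    # %r keeps quotes around the value, so stray whitespace is visible in CI logs.
    logger.info("Using Dataproc clusterName=%r (from %s)", name, env_var)
    return name
```

The log line only shows up in CI if logging is configured at INFO level or lower, for example with `logging.basicConfig(level=logging.INFO)` in the test setup.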
I'm kinda debugging in the dark here, but I can make a guess: No idea why this wasn't caught before if this is indeed the issue. |
Blindly following the advice given by @constantijn here #1833 (comment)
@constantijn OK, I submitted a PR trying what you suggested. If it doesn't work, perhaps you or @piotrpolatowski could try running the tests against your own GCP cluster?
Description
Adds the image_version parameter. (Version list)
Motivation and Context
Relates to issue "Dataproc image-version param not supported"
Have you tested this? If so, how?
Manually tested against Google Dataproc. Cluster creation worked with both 1.0 and 1.1.