-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow GCSTarget's as input and GCSFlagTarget's as output for Hadoop jobs #1664
Conversation
By analyzing the blame information on this pull request, we identified @Tarrasch, @mvj3 and @daveFNbuck to be potential reviewers |
This PR is changing a module I don't know so much about. Can you do a |
@mvj3 @daveFNbuck The |
@geowurster The patch seems make sense, and doesn't change the main logic. But it should at least pass the travis CI. And after that, I think it deserves to be merged. |
I don't think the failing test is related to the changes.
|
@geowurster You're right. I contributed a little patch to |
Current coverage is 75.93%@@ master #1664 diff @@
=====================================
Files 94 94
Lines 10236 10240 +4
Methods 0 0
Branches 0 0
=====================================
- Hits 8047 7776 -271
- Misses 2189 2464 +275
Partials 0 0
|
This PR allows
GCSTarget()
's to be valid input andGCSFlagTarget()
's to be valid output for HadoopJobTask()
's just likeS3Target()
's.Looking at the tests in
test/contrib/hadoop_teset.py
I don't see thes3
functionality tested anywhere, so this PR does not introduce any new tests, but I am happy to add some with a bit of guidance.When running a Hadoop job on Google Compute Engine Google Cloud Storage can be used as HDFS, so
gs://bucket/path/to/resource
URL's are valid in addition to normal HDFS paths.Also like
S3FlagTarget()
, an exception is raised ifend_job_with_atomic_move_dir=True
and the output is aGCSFlagTarget()
. I'm not 100% positive this is the correct behavior, but when I tried to run a test job with atomic move aUserWarning
was issued stating the client doesn't support atomic move, and the job ultimately failed. I'm on the fence about this portion of the PR and feel like I don't necessarily have enough information to confidently implement this, so I'm happy to remove it, but unfortunately I don't have time to investigate if the atomic move is possible on GCS. We found that it was prohibitively slow anyway.