-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
targets: Implement temporary_path decorator #1909
Conversation
@ulzha, do you mind reading the docs I included in this. Maybe there's been other experiences or best practices from Spotify you want to share? |
9b9b3de
to
82f52e2
Compare
looks reasonable, although consider using |
@ulzha @dlstadther @daveFNbuck, mind reviewing this? Or at least say you have not had big problems with atomic writes like I've had. |
I like the documentation and agree with the importance of atomic writes, but I've mostly just used core code for it and not had to write any myself so I haven't had any big problems. I'm still in favor of anything that makes it easier though. |
if exc_type is None: | ||
# There were no exceptions | ||
self.target.fs.rename_dont_move(self._temp_path, self.target.path) | ||
return False # False means we don't suppress the exception |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we clean up the temporary file on failure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've thought about this actually. And I think it's best to not clean them up. I usually find the temporary files to be invaluable traces left when debugging. I'm however up for in an upcoming patch add a kwarg like delete_on_failure=False
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that's reasonable. I have a daily cron job to clean up all the leftover tmp files from MRs so HDFS doesn't fill up, so I'm mostly ok either way.
self._temp_path = '{}-luigi-tmp-{:010}{}'.format( | ||
self.target.path.rstrip('/'), | ||
num, | ||
'/' if self.target._has_trailing_slash() else '') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way you're handling slashes here could cause problems with targets on Windows machines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
82f52e2
to
36c50c9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from @daveFNbuck 's comment about deleting tmp file parts upon failure, nothing stands out to me as odd, lacking, or error prone. Personally, I don't have issues with file writes not completing (perhaps my output isn't as large/complex as others').
with target.temporary_path() as tmp_path: | ||
assert re.match(pattern, tmp_path) | ||
self.fs.rename_dont_move.assert_called_once_with(tmp_path, target.path) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To save you the trouble of reading the Travis report, your only issue was a flake8 complaint. You included too may \n
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As has been detailed in spotify#1519, there are many ways to do temporary files in luigi. The use case for temporary files are always about achieving atomic writes. This PR finally describes the problem in the luigi documentation and tries to interlink essential parts. Also it implements a decorator that is meant to be able to unify many of the existing implementations, and as a new, clean and recommended way to do temporary files and atomic moves using luigi.
36c50c9
to
c2b38e6
Compare
Description
As has been detailed in #1519, there are many ways to do
temporary files in luigi. The use case for temporary files are always
about achieving atomic writes.
This PR finally describes the problem in the luigi documentation and
tries to interlink essential parts. Also it implements a decorator
that is meant to be able to unify many of the existing implementations,
and as a new, clean and recommended way to do temporary files and atomic
moves using luigi.
Have you tested this? If so, how?
There are included unit tests, although they only run with a lot mocked. I've run something similar in production for over half a year now. I intend to also try this a little bit (at least for hdfs+local disk)