Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow a file to be moved cross-device. #1877

Merged
merged 1 commit into from
Oct 25, 2016

Conversation

miku
Copy link
Contributor

@miku miku commented Oct 8, 2016

Description

Implementation of luigi.File would now allow movement of a file across filesystems.

Motivation and Context

If you use luigi.File(src).move(target) when source and target are on different filesystems, you get an error:

OSError: [Errno 18] Invalid cross-device link

Also, this would align implementation of File.move() with File.copy() which already uses shutil.

Have you tested this? If so, how?

Semantic stays the same. Existing tests apply.

If the destination is on the current filesystem, then os.rename() is used. - shutil.move

Important tradeoff: If you move a file across filesystems, you lose atomicity of os.rename.

@mention-bot
Copy link

@miku, thanks for your PR! By analyzing the history of the files in this pull request, we identified @Tarrasch, @erikbern and @freider to be potential reviewers.

@erikbern
Copy link
Contributor

erikbern commented Oct 8, 2016

lgtm

Copy link
Contributor

@Tarrasch Tarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the following?

  • Tests actually moving across fs's, or at least mock out that shutil.move is called
  • Mention the atomicity things in the docs? Atomicity of renames are a very important thing in luigi. If it can't be maintained for some functions, the docs should say under what condition.

Other than that, LGTM

@miku
Copy link
Contributor Author

miku commented Oct 10, 2016

Thanks for the review, @Tarrasch, I tried to address your points. As for the docs, I didn't find a perfect place yet, so added a short section to the tasks docs.

@erikbern
Copy link
Contributor

Mention the atomicity things in the docs? Atomicity of renames are a very important thing in luigi. If it can't be maintained for some functions, the docs should say under what condition.

pretty sure cross device file moving will never be atomic

@miku
Copy link
Contributor Author

miku commented Oct 10, 2016

@erikbern, closest thing to atomic would probably be: copy file to a temp location on the target device, rename it to destination, then unlink the source? But I wanted to keep the code short.

Luigi allows non-atomicity, e.g. in tasks with multiple outputs:

If multiple outputs are returned, atomicity will be lost ... -- Task.output

So I thought it would be ok to drop atomicity if this special case, if the user chooses to (chooses to work with two filesystems).

Copy link
Contributor

@Tarrasch Tarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erikbern Althogh the luigi docs shouldn't go into details about things a savy engineer might have already inferred. All its readers are still humans and many might benefit from a warning-bubble like the one I'm suggesting.

t = LocalTarget(self.path)
f = t.open('w')
f.write('test_data')
f.close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, can't you do the much more elegant context-manager thingie? with-statements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sure. Almost done.


.. code:: python

luigi.LocalTarget(path=src).move(dst)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be much more easy to put this in the docs for LocalTarget.move() docs?? You don't even need a code example. Just a warning saying to be aware of the risks when doing this across filesystems's.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. I addressed this the next PR.

@erikbern
Copy link
Contributor

@erikbern, closest thing to atomic would probably be: copy file to a temp location on the target device, rename it to destination, then unlink the source? But I wanted to keep the code short.

great idea, let's do this

With `os.rename` you get a `OSError: [Errno 18] Invalid cross-device link`
if you attempt to move a file across filesystems.

`shutil.move` covers the basic case:

> If the destination is on the current filesystem, then os.rename() is used.

and additionally handles the cross-device case.

If the destination is on a different filesystem,
we approximate atomicity by first copying the source
to a temporary file on the target filesystem, followed
by a rename. Finally the source file is removed.
@miku
Copy link
Contributor Author

miku commented Oct 13, 2016

great idea, let's do this

Thanks for the feedback. I implemented an almost-atomic cross-device move operation and updated the docs. (I'll be AFK for ten days, can only look into this when I'm back).

@Tarrasch
Copy link
Contributor

Does this work for you in production as well? If so, let's merge. :)

@erikbern
Copy link
Contributor

lgtm – nice!

@miku
Copy link
Contributor Author

miku commented Oct 24, 2016

Does this work for you in production as well?

@Tarrasch, yes it does.

Background: We have $TEMPDIR and data on different partitions plus we sometimes do writes to a tempfile first, then manually move the file into place with a

luigi.LocalTarget(sometempfile).move(self.output().path)

@Tarrasch Tarrasch merged commit e9ce79e into spotify:master Oct 25, 2016
@miku
Copy link
Contributor Author

miku commented Oct 25, 2016

Thanks!

This was referenced Jun 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants