Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repository.copy_tree: omit subdirectories from path when copying #5648

Merged
merged 2 commits into from
Sep 22, 2022

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Sep 20, 2022

Fixes #5647

The intention for copy_tree was to copy only the contents of the
relative subpath, however, it was including the subdirectories. This was
unintentional and users also probably won't expect this behavior. For
example, when copying the content of the relative/sub subdirectory,
one wouldn't expect the content to be copied inside a relative/sub
folder as well. This is also not the behavior of cp. When executing
cp -r relative/sub target, the folder relative/sub won't be present
in the target directory.

The solution is to make the target path of the file to be copied
relative to the source path within the repository.

Although this change is technically breaking backwards compatibility, it
is correcting a bug of a feature that was introduced in v2.0. It is
very unlikely that any plugins started depending on this behavior so we
still have the possibility to fix it now.

@sphuber
Copy link
Contributor Author

sphuber commented Sep 20, 2022

This should be backported and released with v2.0.4

(None, {'file_a': b'a', 'relative': {'file_b': b'b', 'sub': {'file_c': b'c'}}}),
('.', {'file_a': b'a', 'relative': {'file_b': b'b', 'sub': {'file_c': b'c'}}}),
('relative', {'file_b': b'b', 'sub': {'file_c': b'c'}}),
('relative/sub', {'file_c': b'c'}),
Copy link
Member

@unkcpz unkcpz Sep 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about this, I think the last test case should be like this below? But this is what are you going to fix?

('relative/sub', {'sub': {'file_c': b'c'}}),

I made a test in my command line:

╭─jyu at theospc38 in /tmp
╰─○ mkdir target_d
╭─jyu at theospc38 in /tmp
╰─○ mkdir -p relative/sub
╭─jyu at theospc38 in /tmp
╰─○ touch relative/b
╭─jyu at theospc38 in /tmp
╰─○ touch relative/sub/c
╭─jyu at theospc38 in /tmp
╰─○ cp -r relative/sub target_d
╭─jyu at theospc38 in /tmp
╰─○ ls target_d 
sub
╭─jyu at theospc38 in /tmp
╰─○ ls target_d/sub 
c

My cp version is cp (GNU coreutils) 8.30.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, the behavior of cp is different. Note that tat is not just for the last test case but also the penultimate. Essentially, the last directory in the path is kept. One would have to do cp -r relative/sub/* target_d to achieve what is currently suggested by the test.

I think the docstring of copytree is consistent with the implemented behavior. It is just that my commit message regarding the parallel with cp is incorrect. I am wondering if that should be the behavior though. I am actually mirroring the behavior of shutil.copytree. If you run shutil.copytree('relative/sub', 'target_d'), it would simply create target_d and copy the file c into it. Which is what Repository.copy_tree currently does.

Also, if we were to emulate the cp behavior, then I am not sure what someone would have to do if they didn't want to keep the sub directory. In cp you would a glob, but that is not supported in the Python API.

So in short, I think we should keep the current implementation but I correct the commit message saying that it mirrors shutil.copytree and remove the mention of cp. Would that work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. I'll continue to review along this line. Thanks for the explanation!

Copy link
Member

@unkcpz unkcpz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sphuber thanks! Looks all good from my side. cheers.

The intention for `copy_tree` was to copy only the contents of the
relative subpath, however, it was including the subdirectories. This was
unintentional and users also probably won't expect this behavior. For
example, when copying the content of the `relative/sub` subdirectory,
one wouldn't expect the content to be copied inside a `relative/sub`
folder as well. This is also not the behavior of `cp`. When executing
`cp -r relative/sub target`, the folder `relative/sub` won't be present
in the `target` directory.

The solution is to make the target path of the file to be copied
relative to the source path within the repository.

Although this change is technically breaking backwards compatibility, it
is correcting a bug of a feature that was introduced in `v2.0`. It is
very unlikely that any plugins started depending on this behavior so we
still have the possibility to fix it now.
When the `local_copy_list` was provided a directory as source, the
contents would be copied to the sandbox folder in `upload_calculation`.
The point of using the `local_copy_list` is to guarantee that the files
copied this way, are not also copied to the repository of the process
node, since this would duplicate content. This was accomplished by
adding the files copied through this method to `provenance_exclude_list`.

The implementation contained a bug though when the `local_copy_list`
instruction also defined a target. In this case the target was not
included in the paths to be excluded and so were missed when copying the
contents of the sandbox to the repository of the process node. This is
corrected and regression tests are added.
@sphuber sphuber force-pushed the fix/5647/repository-copy-tree branch from 569bf1e to 9a60e24 Compare September 22, 2022 12:55
@sphuber sphuber merged this pull request into aiidateam:main Sep 22, 2022
@sphuber sphuber deleted the fix/5647/repository-copy-tree branch September 22, 2022 13:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Repository.copy_tree method keeps the sub directories of path instead of copying only the contents
2 participants