-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Repository.copy_tree
: omit subdirectories from path when copying
#5648
Repository.copy_tree
: omit subdirectories from path when copying
#5648
Conversation
This should be backported and released with v2.0.4 |
49ceca9
to
569bf1e
Compare
(None, {'file_a': b'a', 'relative': {'file_b': b'b', 'sub': {'file_c': b'c'}}}), | ||
('.', {'file_a': b'a', 'relative': {'file_b': b'b', 'sub': {'file_c': b'c'}}}), | ||
('relative', {'file_b': b'b', 'sub': {'file_c': b'c'}}), | ||
('relative/sub', {'file_c': b'c'}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused about this, I think the last test case should be like this below? But this is what are you going to fix?
('relative/sub', {'sub': {'file_c': b'c'}}),
I made a test in my command line:
╭─jyu at theospc38 in /tmp
╰─○ mkdir target_d
╭─jyu at theospc38 in /tmp
╰─○ mkdir -p relative/sub
╭─jyu at theospc38 in /tmp
╰─○ touch relative/b
╭─jyu at theospc38 in /tmp
╰─○ touch relative/sub/c
╭─jyu at theospc38 in /tmp
╰─○ cp -r relative/sub target_d
╭─jyu at theospc38 in /tmp
╰─○ ls target_d
sub
╭─jyu at theospc38 in /tmp
╰─○ ls target_d/sub
c
My cp
version is cp (GNU coreutils) 8.30
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, the behavior of cp
is different. Note that tat is not just for the last test case but also the penultimate. Essentially, the last directory in the path
is kept. One would have to do cp -r relative/sub/* target_d
to achieve what is currently suggested by the test.
I think the docstring of copytree
is consistent with the implemented behavior. It is just that my commit message regarding the parallel with cp
is incorrect. I am wondering if that should be the behavior though. I am actually mirroring the behavior of shutil.copytree
. If you run shutil.copytree('relative/sub', 'target_d')
, it would simply create target_d
and copy the file c
into it. Which is what Repository.copy_tree
currently does.
Also, if we were to emulate the cp
behavior, then I am not sure what someone would have to do if they didn't want to keep the sub
directory. In cp
you would a glob, but that is not supported in the Python API.
So in short, I think we should keep the current implementation but I correct the commit message saying that it mirrors shutil.copytree
and remove the mention of cp
. Would that work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me. I'll continue to review along this line. Thanks for the explanation!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sphuber thanks! Looks all good from my side. cheers.
The intention for `copy_tree` was to copy only the contents of the relative subpath, however, it was including the subdirectories. This was unintentional and users also probably won't expect this behavior. For example, when copying the content of the `relative/sub` subdirectory, one wouldn't expect the content to be copied inside a `relative/sub` folder as well. This is also not the behavior of `cp`. When executing `cp -r relative/sub target`, the folder `relative/sub` won't be present in the `target` directory. The solution is to make the target path of the file to be copied relative to the source path within the repository. Although this change is technically breaking backwards compatibility, it is correcting a bug of a feature that was introduced in `v2.0`. It is very unlikely that any plugins started depending on this behavior so we still have the possibility to fix it now.
When the `local_copy_list` was provided a directory as source, the contents would be copied to the sandbox folder in `upload_calculation`. The point of using the `local_copy_list` is to guarantee that the files copied this way, are not also copied to the repository of the process node, since this would duplicate content. This was accomplished by adding the files copied through this method to `provenance_exclude_list`. The implementation contained a bug though when the `local_copy_list` instruction also defined a target. In this case the target was not included in the paths to be excluded and so were missed when copying the contents of the sandbox to the repository of the process node. This is corrected and regression tests are added.
569bf1e
to
9a60e24
Compare
Fixes #5647
The intention for
copy_tree
was to copy only the contents of therelative subpath, however, it was including the subdirectories. This was
unintentional and users also probably won't expect this behavior. For
example, when copying the content of the
relative/sub
subdirectory,one wouldn't expect the content to be copied inside a
relative/sub
folder as well. This is also not the behavior of
cp
. When executingcp -r relative/sub target
, the folderrelative/sub
won't be presentin the
target
directory.The solution is to make the target path of the file to be copied
relative to the source path within the repository.
Although this change is technically breaking backwards compatibility, it
is correcting a bug of a feature that was introduced in
v2.0
. It isvery unlikely that any plugins started depending on this behavior so we
still have the possibility to fix it now.