Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow current directories in header paths #220

Merged
merged 1 commit into from
Jan 14, 2020
Merged

Allow current directories in header paths #220

merged 1 commit into from
Jan 14, 2020

Conversation

schultetwin1
Copy link
Contributor

The current implementation removes any "current directories" from the
header path for a file. This change stops the removal to leave the path
as is.

This change adds a test to check for regression

Fixes #219

@alexcrichton
Copy link
Owner

Thanks for the report and PR! Could you elaborate though to the motivation for this? I'm not sure why it would be necessary to keep the . directives in there

@schultetwin1
Copy link
Contributor Author

For sure! The issue is paths passed into tar::Builder::append_data (or some of the other append functions) are not comparable to what is returned from tar::Header::path. The following code (playground link) is a basic example of this:

let mut header = Header::new_gnu();
let path = std::path::Path::new("./control");
header.set_path(&path).unwrap();
    
println!("Path: {}", path.display());
println!("Path in Header: {}", header.path().unwrap().display());
assert!(path == header.path().unwrap());

This will print

Path: ./control
Path in Header: control

and then panic on the assertion. This makes it hard to iterate through a tar to find a file you put in it.

./control and control are technically the same path but I don't know how to detect that using std::path::Path. It looks like there is some work for support to normalize paths but it does not exists yet. rust-lang/rfcs#2208

Maybe another option would be to provide a function which normalizes paths for the user so they can see how the path they passed into the crate might have changed?

@alexcrichton
Copy link
Owner

Ah ok, I see!

Would counting files work alright for your use case? Indexing files by their position in the archive should be consisten across implementations

@schultetwin1
Copy link
Contributor Author

Unfortunately, counting files will not work for me. There is no defined order for the files in the tar in my use case (more info below if you are curious).

I have a simple work around to get past this issue, I was just filing this issue and this fix so that others do not run into this same issue. It took me a little while to figure out that tar-rs was silently swallowing the ./ in my path.

If you would prefer to not make this change, I'd like to switch this PR to document the behavior of the append functions modifying the path parameter the user passed in to not including the current directory ..

For more background:

I'm trying to create and parse binary debian packages. The metadata for a binary debian package is stored in a file named "control" which in turn is inside a archive (control.tar).

In most debian packages I have found in ubuntu's package repository the "control" file has a path of ./control in their tar file. So I was trying to mimic that behavior but it's impossible to with this library.

@alexcrichton
Copy link
Owner

Thanks for the info! And yeah if you're ok documenting this for now, that seems good to me to add!

@schultetwin1
Copy link
Contributor Author

Sounds good!

For my own learning, why do you want to keep in this path normalization logic which strips paths of the current directory path? Is there some edge case related to the tar file format that we can avoid by doing this?

@alexcrichton
Copy link
Owner

Oh I'm not really particularly wed to the logic one way or another, I was mostly curious as to the motivations for this. Some degree of path normalization is done to avoid dealing with .. in names or giving errors when it goes outside something. Other path normalization is required to translate \ to / on Windows. Ignoring . for the current directory just sort of fell out of all of this because it was already happening and was easy enough to handle.

I don't think that anything would break with this but I'm not 100% confident myself, hence the question for motivation to see how deeply I needed to look!

tar-rs will strip any current directory out of the paths users are
passing into the API. This documents that behavior and adds a test
to ensure the documented behavior.
@schultetwin1
Copy link
Contributor Author

My apologies for the delay. Updated this change to be documentation only.

@alexcrichton alexcrichton merged commit 4a47d31 into alexcrichton:master Jan 14, 2020
@alexcrichton
Copy link
Owner

👍

moschroe pushed a commit to moschroe/tar-rs that referenced this pull request Jun 11, 2020
tar-rs will strip any current directory out of the paths users are
passing into the API. This documents that behavior and adds a test
to ensure the documented behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't remove CurrentDir on Copy Path
2 participants