Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TarReader doesn't read paths with spaces from Pax extended attributes #78456

Closed
prettybits opened this issue Nov 16, 2022 · 3 comments · Fixed by #78744
Closed

TarReader doesn't read paths with spaces from Pax extended attributes #78456

prettybits opened this issue Nov 16, 2022 · 3 comments · Fixed by #78744

Comments

@prettybits
Copy link

prettybits commented Nov 16, 2022

Description

For tar archives with pax entries whose extended metadata contains a path with spaces that path is currently silently ignored and the standard name field used, which can lead to extraction errors when the names of headers aren't unique (e.g. I came across files that put "PaxHeader" as the name for every pax entry and their consecutive file entry.

Cannot create '/home/prettybits/tests/PaxHeader' because a file or directory with the 
same name already exists.

Spaces in values of extended metadata entries are explicitly treated as malformed here (introduced in #74281), but I can't see why. @stephentoub

Reproduction Steps

Given a tar file with multiple pax entries whose name field values for extended attribute and file entry each are "PaxHeader", and pax extended attributes which contain a path with spaces (e.g. 29 path=file with spaces.xml\n13 size=7720\n) :

using System.Formats.Tar;

TarFile.ExtractToDirectory(@"/home/prettybits/tests/paxheaders.tar", @"/home/prettybits/tests/", false);

=> Cannot create '/home/prettybits/tests/PaxHeader' because a file or directory with the same name already exists.

Expected behavior

Paths with spaces from pax extended attributes should be used as the name of the logical entry.

Actual behavior

Paths with spaces are ignored and the original name used, which might not be unique.

Regression?

Reading the diff, this likely worked correctly before #74281 was merged.

Known Workarounds

No response

Configuration

No response

Other information

No response

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Nov 16, 2022
@ghost
Copy link

ghost commented Nov 16, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

For tar archives with pax entries whose extended metadata contains a path with spaces that path is currently silently ignored and the standard name field used, which can lead to extraction errors when the names of headers aren't unique (e.g. I came across files that put "PaxHeader" as the name for every pax entry and their consecutive file entry.

Cannot create '/home/prettybits/tests/PaxHeader' because a file or directory with the 
same name already exists.

Spaces in values of extended metadata entries are explicitly treated as malformed here (introduced in #74281), but I can't see why.

Reproduction Steps

Given a tar file with multiple pax entries whose name field values for extended attribute and file entry each are "PaxHeader", and pax extended attributes which contain a path with spaces (e.g. 29 path=file with spaces.xml\n13 size=7720\n) :

using System.Formats.Tar;

TarFile.ExtractToDirectory(@"/home/prettybits/tests/paxheaders.tar", @"/home/prettybits/tests/", false);

=> Cannot create '/home/prettybits/tests/PaxHeader' because a file or directory with the same name already exists.

Expected behavior

Paths with spaces from pax extended attributes should be used as the name of the logical entry.

Actual behavior

Paths with spaces are ignored and the original name used, which might not be unique.

Regression?

Reading the diff, this likely worked correctly before #74281 was merged.

Known Workarounds

No response

Configuration

No response

Other information

No response

Author: prettybits
Assignees: -
Labels:

area-System.IO

Milestone: -

@adamsitnik
Copy link
Member

@jozkee @carlossanlop PTAL

Reading the diff, this likely worked correctly before #74281 was merged.

cc @stephentoub

@stephentoub
Copy link
Member

introduced in #74281

Yup. I misread the Split call I was replacing, which is why I added that IndexOf, and there wasn't a test for paths with spaces to flag my mistake.

@stephentoub stephentoub self-assigned this Nov 16, 2022
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Nov 23, 2022
@ghost ghost removed in-pr There is an active PR which will close this issue when it is merged untriaged New issue has not been triaged by the area owner labels Nov 23, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Dec 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
4 participants