Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tar.Header.Header.unmarshal should treat Normal entries with trailing slash name as Directory. #129

Closed
gravicappa opened this issue Jun 9, 2023 · 10 comments

Comments

@gravicappa
Copy link

As written here.

Also, for backward compatibility, tar treats a regular file whose name ends with a slash as a directory.

And I have encountered archives like that.

@reynir
Copy link
Member

reynir commented Jun 12, 2023

Thanks, I was not aware.

How do you think we should handle this? There's the Tar module, but also the Tar_unix and Tar_unix_lwt.

For the latter two it makes sense to consider this when extracting. In #127 I'm reworking Tar_unix etc.

For Tar I'm unsure if we should do anything differently. When we unmarshal we could synthesize the link_indicator to be Tar.Header.Link.Directory. I'm not sure we should do this. We could do it in the header reader as we already synthesize other things (long names IIRC). What do you think?

I am curious if you know how to create such an archive, or if you have a (small) sample.

@gravicappa
Copy link
Author

You can create such archives with bsdtar c --format v7 ….

I would implement that link_indicator fix in Tar.Header.unmarshal either when hdr_version (not level though) is pre-ustar or unconditionally after synthesizing long names (to avoid being misdirected by trailing slash in cut-off name). In my code I just call

let sanitize_header (header: Tar.Header.t) =
  match header.link_indicator with
  | Normal ->
      if path_is_directory header.file_name then
        { header with link_indicator = Directory }
      else
        header
  | _ -> header

in Tar_unix.Archive.with_next_file called proc and it generally works although I'm dealing with docker image layers archives only at the moment.

@reynir
Copy link
Member

reynir commented Jun 16, 2023

I was curious how gnu tar would treat a tar archive with a pax header setting the file name to clearly/a/directory/ followed by a header for an empty file placeholder with normal link indicator. So I created that with ocaml-tar, and interestingly tar treats the contents as a directory clearly/a/directory/:

$ tar -tvf pax-shenanigans.tar 
dr-------- 0/0               0 1970-01-01 01:00 clearly/a/directory/

Here's the sample archive (gzipped so github accepts it) pax-shenanigans.tar.gz

I will make a PR shortly implementing your second proposal.

reynir added a commit that referenced this issue Jun 20, 2023
reynir added a commit to reynir/opam-repository that referenced this issue Jun 20, 2023
CHANGES:

- Treat headers with link indicator '0' or '\000' (`Normal`) as directories for backward compatibility (reported in mirage/ocaml-tar#129, fix by @reynir)
@reynir
Copy link
Member

reynir commented Jun 20, 2023

tar.2.5.1 has been submitted to opam-repository with a fix. Thank you for the report and the discussion on how to fix it.

@reynir reynir closed this as completed Jun 20, 2023
reynir added a commit to reynir/opam-repository that referenced this issue Jun 21, 2023
CHANGES:

- Treat headers with link indicator '0' or '\000' (`Normal`) as directories for backward compatibility (reported in mirage/ocaml-tar#129, fix by @reynir)
@gravicappa
Copy link
Author

It seems that I was wrong: the fix up for link_indicator should be done in read_header instead: file_name is being updated there.

And I encountered archive (which unfortunately I can't provide) there LongLink contains full name and file_name is cut off right after /.

@gravicappa
Copy link
Author

This is how I fixed the issue locally (probably missed a couple of other issues):

diff --git a/lib/tar.ml b/lib/tar.ml
index 0ff1346..16de231 100644
--- a/lib/tar.ml
+++ b/lib/tar.ml
@@ -522,17 +522,7 @@ module Header = struct
         let mod_time = match extended.Extended.mod_time with
           | None -> get_hdr_mod_time c
           | Some x -> x in
-        let link_indicator =
-          let link_indicator = get_hdr_link_indicator c in
-          (* For backward compatibility we treat normal files ending in slash
-             as directories. Because [Link.of_char] treats unrecognized link
-             indicator values as normal files we check directly *)
-          if String.length file_name > 0 && file_name.[String.length file_name - 1] = '/' &&
-             (link_indicator = '0' || link_indicator = '\000') then
-            Link.Directory
-          else
-            Link.of_char ~level link_indicator
-        in
+        let link_indicator = Link.of_char ~level (get_hdr_link_indicator c) in
         let uname = match extended.Extended.uname with
           | None -> if ustar then get_hdr_uname c else ""
           | Some x -> x in
@@ -718,6 +708,17 @@ module HeaderReader(Async: ASYNC)(Reader: READER with type 'a t = 'a Async.t) =
           | None -> return (Error `Eof)
         end in
 
+    let true_link_indicator link_indicator file_name =
+      (* For backward compatibility we treat normal files ending in slash
+         as directories. Because [Link.of_char] treats unrecognized link
+         indicator values as normal files we check directly *)
+      if String.length file_name > 0
+         && file_name.[String.length file_name - 1] = '/'
+         && link_indicator = Header.Link.Normal then
+        Header.Link.Directory
+      else
+        link_indicator in
+
     let rec read_header (file_name, link_name, hdr) : (Header.t, [`Eof]) result Async.t =
       let raw_link_indicator = Header.get_hdr_link_indicator buffer in
       if (raw_link_indicator = 'K' || raw_link_indicator = 'L') && level = Header.GNU then
@@ -738,7 +739,8 @@ module HeaderReader(Async: ASYNC)(Reader: READER with type 'a t = 'a Async.t) =
       else begin
         let link_name = if link_name = "" then hdr.Header.link_name else link_name in
         let file_name = if file_name = "" then hdr.Header.file_name else file_name in
-        return (Ok {hdr with Header.link_name; file_name })
+        let link_indicator = true_link_indicator hdr.Header.link_indicator file_name in
+        return (Ok {hdr with Header.link_name; file_name; link_indicator })
       end in
     get_hdr ()
     >>= function

@hannesm hannesm reopened this Jul 31, 2023
@hannesm
Copy link
Member

hannesm commented Jul 31, 2023

reopening, thanks for your testing and reporting!

@reynir
Copy link
Member

reynir commented Sep 6, 2023

Dear @gravicappa, thank you for your report and for sharing your own fix, and sorry for not reacting earlier.

Unfortunately Link.of_char returns Link.Normal if the link indicator is unknown - so maybe a bit too much is treated as directories. Ideally, only '0' and '\000' would receive this treatment. I don't know how often these unknown-to-us link indicators occur in the wild, and I imagine the chances are slim that they end in a slash so your solution is probably acceptable. In #127 I will reconsider returning Link.Normal on "unknown" link indicators.

reynir added a commit to reynir/opam-repository that referenced this issue Sep 7, 2023
CHANGES:

- Add eio backend for tar in `tar-eio` (@patricoferris, review by @talex5, @reynir, mirage/ocaml-tar#132)
- Also apply backwards compatibility fix when GNU LongName is used.
  The compatibility fix is unfortunately also applied for unknown-to-ocaml-tar link indicators (reported by @gravicappa in mirage/ocaml-tar#129, @reynir, mirage/ocaml-tar#133)
@reynir
Copy link
Member

reynir commented Sep 8, 2023

A fix's released in 2.6.0. I will keep this issue open until I have a fix for the next branch as well. Thanks again for your reports and suggestions!

@hannesm
Copy link
Member

hannesm commented Jan 10, 2024

Done in #127

@hannesm hannesm closed this as completed Jan 10, 2024
nberth pushed a commit to nberth/opam-repository that referenced this issue Jun 18, 2024
CHANGES:

- Add eio backend for tar in `tar-eio` (@patricoferris, review by @talex5, @reynir, mirage/ocaml-tar#132)
- Also apply backwards compatibility fix when GNU LongName is used.
  The compatibility fix is unfortunately also applied for unknown-to-ocaml-tar link indicators (reported by @gravicappa in mirage/ocaml-tar#129, @reynir, mirage/ocaml-tar#133)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants