fix MkdirAll usage #2361

Closed
wants to merge 3 commits into from

kolyshkin
Contributor

This subtle bug keeps lurking in because error checking for Mkdir()
and MkdirAll() is slightly different wrt EEXIST/IsExist:

  • for Mkdir(), IsExist error should (usually) be ignored
    (unless you want to make sure directory was not there before)
    as it means "the destination directory was already there";

  • for MkdirAll(), IsExist error should NEVER be ignored.

This commit removes ignoring the IsExist error, as it should not
be ignored.

For more details, a quote from opencontainers/runc#162:

-quote-

TL;DR: check for IsExist(err) after a failed MkdirAll() is both
redundant and wrong -- so two reasons to remove it.

Quoting MkdirAll documentation:

MkdirAll creates a directory named path, along with any necessary
parents, and returns nil, or else returns an error. If path
is already a directory, MkdirAll does nothing and returns nil.

This means two things:

  1. If a directory to be created already exists, no error is
    returned.

  2. If the error returned is IsExist (EEXIST), it means that a
    non-directory already exists under a name that MkdirAll needs to
    use for a directory. Example: we want to MkdirAll("a/b"), but file
    "a" (or "a/b") already exists, so MkdirAll fails.

The above is a theory, based on quoted documentation and my UNIX
knowledge.

  3. In practice, though, the current MkdirAll implementation [1] returns
    ENOTDIR in most of the cases described in #2, with the exception of
    a race between MkdirAll and someone else creating the last component
    of the MkdirAll argument as a file. In this very case MkdirAll()
    will indeed return EEXIST.

Because of #1, IsExist check after MkdirAll is not needed.

Because of #2 and #3, ignoring the IsExist error is just plain wrong,
as the directory we require has not been created. It's cleaner to
report the error now.

Note this error is all over the tree, I guess due to copy-paste,
or trying to follow the same usage pattern as for Mkdir(),
or some not quite correct examples on the Internet.

[1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go

-end-quote-

Signed-off-by: Kir Kolyshkin [email protected]


@rhatdan
Member

rhatdan commented May 18, 2020

This is causing a test failure?

[+0751s] not ok 143 bud WORKDIR isa symlink
[+0751s] # (from function `die' in file ./helpers.bash, line 175,
[+0751s] #  from function `run_buildah' in file ./helpers.bash, line 163,
[+0751s] #  in test file ./bud.bats, line 1478)
[+0751s] #   `run_buildah bud --signature-policy ${TESTSDIR}/policy.json -t ${target} ${TESTSDIR}/bud/workdir-symlink' failed with status 125
[+0751s] # # [checking for: alpine]
[+0751s] # # [restoring from cache: /var/tmp/buildah-image-cache.5986 / alpine]
[+0751s] # Getting image source signatures
[+0751s] # Copying blob sha256:3eee30c545e47333e6fe551863f6f29c3dcd850187ae3f37c606adb991444886
[+0751s] # Copying config sha256:af88fdb253aac46693de7883c9c55244327908c77248d7654841503f744aae8b
[+0751s] # Writing manifest to image destination
[+0751s] # Storing signatures
[+0751s] # Loaded image(s): localhost/alpine
[+0751s] # $ /var/tmp/go/src/github.com/containers/buildah/tests/./../buildah bud --signature-policy /var/tmp/go/src/github.com/containers/buildah/tests/./policy.json -t alpine-image /var/tmp/go/src/github.com/containers/buildah/tests/./bud/workdir-symlink
[+0751s] # STEP 1: FROM alpine
[+0751s] # STEP 2: RUN mkdir /var/lib/tempest
[+0751s] # STEP 3: RUN ln -sf /var/lib/tempest /tempest
[+0751s] # STEP 4: WORKDIR /tempest
[+0751s] # STEP 5: RUN touch /etc/notareal.conf
[+0751s] # error building at STEP "RUN touch /etc/notareal.conf": error ensuring working directory "/tempest" exists: mkdir /var/tmp/tmpf8ea2454e7e8b4660048bd1f/root/vfs/dir/c9d010971efe482349ad1debbed0edde9af03e0d5125f64a8f377a1688e21f74/tempest: file exists
[+0751s] # [ rc=125 (** EXPECTED 0 **) ]
[+0751s] # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
[+0751s] # #| FAIL: exit code is 125; expected 0
[+0751s] # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[+0753s] not ok 144 bud WORKDIR isa symlink no target dir
[+0753s] # (from function `die' in file ./helpers.bash, line 175,
[+0753s] #  from function `run_buildah' in file ./helpers.bash, line 163,
[+0753s] #  in test file ./bud.bats, line 1495)
[+0753s] #   `run_buildah bud --signature-policy ${TESTSDIR}/policy.json -t ${target} -f Dockerfile-2 ${TESTSDIR}/bud/workdir-symlink' failed with status 125
[+0753s] # # [checking for: alpine]
[+0753s] # # [restoring from cache: /var/tmp/buildah-image-cache.5986 / alpine]
[+0753s] # Getting image source signatures
[+0753s] # Copying blob sha256:3eee30c545e47333e6fe551863f6f29c3dcd850187ae3f37c606adb991444886
[+0753s] # Copying config sha256:af88fdb253aac46693de7883c9c55244327908c77248d7654841503f744aae8b
[+0753s] # Writing manifest to image destination
[+0753s] # Storing signatures
[+0753s] # Loaded image(s): localhost/alpine
[+0753s] # $ /var/tmp/go/src/github.com/containers/buildah/tests/./../buildah bud --signature-policy /var/tmp/go/src/github.com/containers/buildah/tests/./policy.json -t alpine-image -f Dockerfile-2 /var/tmp/go/src/github.com/containers/buildah/tests/./bud/workdir-symlink
[+0753s] # STEP 1: FROM alpine
[+0753s] # STEP 2: RUN ln -sf /var/lib/tempest /tempest
[+0753s] # STEP 3: WORKDIR /tempest
[+0753s] # STEP 4: RUN touch /etc/notareal.conf
[+0753s] # error building at STEP "RUN touch /etc/notareal.conf": error ensuring working directory "/tempest" exists: mkdir /var/tmp/tmpbc4177b113dd5580bb2e2559/root/vfs/dir/20bce62acc8567daa7438b335555be3365acc3a06dba0ea15550063b3bfd8a5d/tempest: file exists
[+0753s] # [ rc=125 (** EXPECTED 0 **) ]
[+0753s] # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
[+0753s] # #| FAIL: exit code is 125; expected 0
[+0753s] # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[+0754s] not ok 145 bud WORKDIR isa symlink no target dir and follow on dir
[+0754s] # (from function `die' in file ./helpers.bash, line 175,
[+0754s] #  from function `run_buildah' in file ./helpers.bash, line 163,
[+0754s] #  in test file ./bud.bats, line 1515)
[+0754s] #   `run_buildah bud --signature-policy ${TESTSDIR}/policy.json -t ${target} -f Dockerfile-3 ${TESTSDIR}/bud/workdir-symlink' failed with status 125
[+0754s] # # [checking for: alpine]
[+0754s] # # [restoring from cache: /var/tmp/buildah-image-cache.5986 / alpine]
[+0754s] # Getting image source signatures
[+0754s] # Copying blob sha256:3eee30c545e47333e6fe551863f6f29c3dcd850187ae3f37c606adb991444886
[+0754s] # Copying config sha256:af88fdb253aac46693de7883c9c55244327908c77248d7654841503f744aae8b
[+0754s] # Writing manifest to image destination
[+0754s] # Storing signatures
[+0754s] # Loaded image(s): localhost/alpine
[+0754s] # $ /var/tmp/go/src/github.com/containers/buildah/tests/./../buildah bud --signature-policy /var/tmp/go/src/github.com/containers/buildah/tests/./policy.json -t alpine-image -f Dockerfile-3 /var/tmp/go/src/github.com/containers/buildah/tests/./bud/workdir-symlink
[+0754s] # STEP 1: FROM alpine
[+0754s] # STEP 2: RUN ln -sf /var/lib/tempest /tempest
[+0754s] # STEP 3: WORKDIR /tempest/lowerdir
[+0754s] # STEP 4: RUN touch /etc/notareal.conf
[+0754s] # error building at STEP "RUN touch /etc/notareal.conf": error ensuring working directory "/tempest/lowerdir" exists: mkdir /var/tmp/tmpee366ec5c66cd5695f9c8890/root/vfs/dir/8ac7536b8fbed086a10d9a2bfa64fd23900013cf6820bea187eceb49467b0326/tempest: file exists
[+0754s] # [ rc=125 (** EXPECTED 0 **) ]
[+0754s] # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
[+0754s] # #| FAIL: exit code is 125; expected 0
[+0754s] # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@rhatdan
Member

rhatdan commented May 18, 2020

It looks like if the destination directory is a symlink, we would expect this to succeed as well, at least in this case.

@rhatdan
Member

rhatdan commented May 20, 2020

@kolyshkin How do you want to fix this?

@giuseppe
Member

is this still being worked on?

@kolyshkin
Contributor Author

> is this still being worked on?

Yes. Sorry I've neglected this one. Looking...

@kolyshkin
Contributor Author

OK, this happens because MkdirAll is executed on the host, but the symlink is absolute and only makes sense in a chroot.

I can think of two ways to fix this, but I don't like either. It is probably best to ignore the error and (in case it's not actually a directory) rely on failing during cd later. This is exactly what the current code does (implicitly), see 78fb869

In other words, we have something like this:

ls -l ~/.local/share/containers/storage/overlay/$SHA/merged/
lrwxrwxrwx. 1 kir kir   16 Jun 11 13:24 tempest -> /var/lib/tempest
drwxrwxr-x. 3 kir kir 4096 Jun 11 13:24 var

Obviously, the symlink is correct from within the container's chroot, but since MkdirAll() is called from the host context (without chroot), it sees the symlink as invalid and reports the EEXIST error which we see here.

I have added a commit which ignores the error from MkdirAll only in this particular case, but frankly I am not sure if it brings much value. The only value is we fail earlier rather than later (when actually trying to cd into the directory).

So, second patch is optional and can be removed; please let me know what you think.

cben referenced this pull request Jun 16, 2020
Add a pull request template.  Modeled after CRI-O (Thanks @saschagrunert!) and
Dockers.

Signed-off-by: TomSweeneyRedHat <[email protected]>
@TomSweeneyRedHat
Member

@kolyshkin can you try rebasing this please? We had a fix in #2427 that should help with at least some, if not all, of the test issues.

This subtle bug keeps lurking in because error checking for `Mkdir()`
and `MkdirAll()` is slightly different wrt `EEXIST`/`IsExist`:

 - for `Mkdir()`, `IsExist` error should (usually) be ignored
   (unless you want to make sure directory was not there before)
   as it means "the destination directory was already there";

 - for `MkdirAll()`, `IsExist` error should NEVER be ignored.

This commit removes ignoring the IsExist error, as it should not
be ignored.

[v2: skip patching (*Builder).Run]

For more details, a quote from opencontainers/runc PR #162:

-quote-

TL;DR: check for IsExist(err) after a failed MkdirAll() is both
redundant and wrong -- so two reasons to remove it.

Quoting MkdirAll documentation:

> MkdirAll creates a directory named path, along with any necessary
> parents, and returns nil, or else returns an error. If path
> is already a directory, MkdirAll does nothing and returns nil.

This means two things:

1. If a directory to be created already exists, no error is
returned.

2. If the error returned is IsExist (EEXIST), it means that a
non-directory already exists under a name that MkdirAll needs to
use for a directory. Example: we want to MkdirAll("a/b"), but file
"a" (or "a/b") already exists, so MkdirAll fails.

The above is a theory, based on quoted documentation and my UNIX
knowledge.

3. In practice, though, the current MkdirAll implementation [1] returns
ENOTDIR in most of the cases described in #2, with the exception of
a race between MkdirAll and someone else creating the last component
of the MkdirAll argument as a file. In this very case MkdirAll()
will indeed return EEXIST.

Because of #1, IsExist check after MkdirAll is not needed.

Because of #2 and #3, ignoring the IsExist error is just plain wrong,
as the directory we require has not been created. It's cleaner to
report the error now.

Note this error is all over the tree, I guess due to copy-paste,
or trying to follow the same usage pattern as for Mkdir(),
or some not quite correct examples on the Internet.

> [1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go

-end-quote-

Signed-off-by: Kir Kolyshkin <[email protected]>

It is not entirely correct to always ignore EEXIST here. It should only
be ignored in one special case: when a working directory already exists,
and is an absolute symlink to another directory under container root.

MkdirAll reports an error because the symlink is broken in the host
context (without chroot).

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin
Contributor Author

> can you try rebasing this please?

done; thanks for looking into it @TomSweeneyRedHat

@openshift-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kolyshkin
To complete the pull request process, please assign cevich after the PR has been reviewed.
You can assign the PR to them by writing /assign @cevich in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rhatdan
Member

rhatdan commented Oct 28, 2020

@kolyshkin Would you like to rebase this so we can get it in?

openshift-merge-robot added a commit that referenced this pull request Nov 5, 2020
Attempt to test `fix MkdirAll usage #2361`
@rhatdan
Member

rhatdan commented Nov 6, 2020

I completed this PR here: #2735

@rhatdan rhatdan closed this Nov 6, 2020
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 27, 2023