Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checking for gpg availability sometimes times out #428

Closed
znewman01 opened this issue Sep 21, 2022 · 8 comments · Fixed by #437
Closed

Checking for gpg availability sometimes times out #428

znewman01 opened this issue Sep 21, 2022 · 8 comments · Fixed by #437

Comments

@znewman01
Copy link
Contributor

When importing constants.py, securesystemslib checks for GPG by calling is_available_gnupg()"

https://github.com/secure-systems-lab/securesystemslib/blob/v0.24.0/securesystemslib/gpg/constants.py#L26-L32

In go-tuf, we've seen this time out a few times. For instance,

https://github.com/theupdateframework/go-tuf/actions/runs/3100521501/jobs/5020899719

You might think: that's ridiculous! How would a call to GPG time out? And the answer is: GitHub's hosted Windows runners experience intermittent extreme slowness. I don't believe GPG is even present on these machines, so it's really Windows taking >3 seconds to tell you that the binary is not found. Regardless, it happens.

My preference would be to avoid doing this on import at all, but I think just catching subprocess.TimeoutExpired would fix it.

znewman01 added a commit to znewman01/securesystemslib that referenced this issue Sep 21, 2022
Sometimes, in some systems, not pointing any fingers but Windows, 
just finding out that there's no `gpg` available takes >3 seconds.

In that case, we now just assume that there's no GPG because a 
binary that takes 3 seconds even to start isn't any better than no binary at all.

Fixes secure-systems-lab#428.
znewman01 added a commit to znewman01/securesystemslib that referenced this issue Sep 21, 2022
Sometimes, in some systems, not pointing any fingers but Windows,
just finding out that there's no `gpg` available takes >3 seconds.

In that case, we now just assume that there's no GPG because a
binary that takes 3 seconds even to start isn't any better than no binary at all.

Fixes secure-systems-lab#428.

Signed-off-by: Zachary Newman <[email protected]>
@adityasaky
Copy link
Member

adityasaky commented Sep 21, 2022

I think just catching subprocess.TimeoutExpired would fix it

Hm, I'm not sure if this is the best way to handle the issue because unless we're very careful, we'd end up not running GPG tests on a Windows box but the CI still goes green, meaning we merge something we shouldn't. There are a number of tests gated by the HAVE_GPG flag.

@znewman01
Copy link
Contributor Author

znewman01 commented Sep 21, 2022

Alternatives might be:

  • Test HAVE_GPG == true in a test that runs in CI, so CI fails if this doesn't work. (Seems like you should be doing this anyway, once you fix GPG tests fail on windows #339, which looks related, since otherwise you'd still wind up green if GPG went missing from the environment.)
  • Increase timeout for subprocess call to like 10s? I don't like this because it feels very band-aid-y. Who's to say it won't hang indefinitely? That's why we have timeouts, after all.
  • Don't try to execute processes on import:
    • Replace HAVE_GPG and GPG_COMMAND with two functions.
    • If you're worried about breaking people who depend on HAVE_GPG, you can make it an object and override __bool__ to lazily run this (and cache the result)
  • Let me set an environment variable to turn this behavior off (it can't be a Python global or anything because we're running into this when invoking the python-tuf CLI).

@znewman01
Copy link
Contributor Author

Ping—do any of the above sound acceptable?

@adityasaky
Copy link
Member

1 (quick fix) and 3 (more sound fix) sound good to me but I'd like to defer to the core securesystemslib devs @lukpueh @jku @joshuagl

@joshuagl
Copy link
Collaborator

Agree with @adityasaky. I like the plan to not execute processes on import, what do you think @lukpueh ?

@lukpueh
Copy link
Member

lukpueh commented Oct 4, 2022

I think it's a good idea to assert HAVE_GPG==True in tests, where we expect gpg to be available.

Evaluating HAVE_GPG and GPG_COMMAND lazily when needed seems fine, but we might want to cache the result so that we don't have to run the commands more often than necessary.

Background:
At first we ran the command on import only to determine whether to use gpg2 or gpg and to set a global GPG_COMMAND (see 507c1c4, originally pr-ed in in-toto/in-toto#218). Only later we re-purposed that info for a global HAVE_GPG, to allow clean imports of a pure-python securesystemslib (see #200, #206).

znewman01 added a commit to znewman01/securesystemslib that referenced this issue Oct 13, 2022
See secure-systems-lab#428.

This commit is over-complicated, but (mostly) not breaking (you can continue to
use `HAVE_GPG` and `GPG_COMMAND`).

In this commit:

- Add cacheing (`functools.lru_cache`) to `is_available_gnupg`.
- Add `gpg_command()` to replace `GPG_COMMAND`.
- Add `have_gpg()` to replace `HAVE_GPG`.
- Replace `GPG_COMMAND` and `HAVE_GPG` with a `lazy.wrap_thunk()` wrapper for
  their corresponding functions.
  - This wrapper lazily runs the underlying thunk and passes
    through *everything* to the result of calling it.
  - This is a terrible hack and I'm sorry.
  - Some things still break, like `StringIO`.
- Add a test that imports `securesystemslib.constants.gpg` with a busted
  `$GNUPG` to make sure that importing the library doesn't try to shell
  out. This failed before this change.

Signed-off-by: Zachary Newman <[email protected]>
znewman01 added a commit to znewman01/securesystemslib that referenced this issue Oct 13, 2022
Fixes secure-systems-lab#428. Okay to do this because we will fail tests if GPG is unexpectedly
unavailable (secure-systems-lab#434).

Signed-off-by: Zachary Newman <[email protected]>
znewman01 added a commit to znewman01/securesystemslib that referenced this issue Oct 13, 2022
See secure-systems-lab#428.

This commit is over-complicated, but (mostly) not breaking (you can continue to
use `HAVE_GPG` and `GPG_COMMAND`).

In this commit:

- Add cacheing (`functools.lru_cache`) to `is_available_gnupg`.
- Add `gpg_command()` to replace `GPG_COMMAND`.
- Add `have_gpg()` to replace `HAVE_GPG`.
- Replace `GPG_COMMAND` and `HAVE_GPG` with a `lazy.wrap_thunk()` wrapper for
  their corresponding functions.
  - This wrapper lazily runs the underlying thunk and passes
    through *everything* to the result of calling it.
  - This is a terrible hack and I'm sorry.
  - Some things still break, like `StringIO`.
- Add a test that imports `securesystemslib.constants.gpg` with a busted
  `$GNUPG` to make sure that importing the library doesn't try to shell
  out. This failed before this change.

Signed-off-by: Zachary Newman <[email protected]>
@lukpueh lukpueh reopened this Oct 26, 2022
@lukpueh lukpueh closed this as completed Oct 26, 2022
@jku
Copy link
Collaborator

jku commented Nov 1, 2022

https://github.com/secure-systems-lab/securesystemslib/actions/runs/3362099110/jobs/5573366570

FAIL: test_gpg_available (__main__.TestGpgAvailable)
Test that GPG is available.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/runner/work/securesystemslib/securesystemslib/tests/check_gpg_available.py", line 39, in test_gpg_available
    self.assertTrue(securesystemslib.gpg.constants.have_gpg())
AssertionError: False is not true

----------------------------------------------------------------------
Ran 1 test in 3.043s

This was on MacOS. The 3.043s looks suspiciously like one default timeout.

@znewman01
Copy link
Contributor Author

Weird that it's happening on Mac, it was always Windows before.

I wonder whether it's just a weird virtualization hiccup where reading the binary from disk takes a really long time. Maybe we should consider increasing the timeout?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants