ohpc node provisioning fails due to repo GPG key errors #77

Open
jprorama opened this issue Mar 21, 2019 · 7 comments

@jprorama
Owner

jprorama commented Mar 21, 2019

After the upstream release of OpenHPC 1.3.7, the TASK [compute_build_vnfs : yum install into the image chroot] is failing. There is a lot of output, but the final error message is:

Transaction Summary
================================================================================
Install  42 Packages (+381 Dependent packages)
Upgrade              (  19 Dependent packages)

Total download size: 403 M
Downloading packages:
Delta RPMs reduced 12 M of updates to 3.8 M (67% saved)
Public key for ModemManager-glib-1.6.10-1.el7.x86_64.rpm is not installed
Public key for NetworkManager-1.12.0-10.el7_6.x86_64.rpm is not installed
Public key for Thunar-1.6.16-1.el7.x86_64.rpm is not installed
Public key for lua-bit-ohpc-1.0.2-1.1.x86_64.rpm is not installed
Public key for lmod-ohpc-7.8.15-4.1.ohpc.1.3.67.x86_64.rpm is not installed
Some delta RPMs failed to download or rebuild. Retrying..
Public key for mesa-libEGL-18.0.5-4.el7_6.x86_64.rpm is not installed
--------------------------------------------------------------------------------
Total                                               11 MB/s | 401 MB  00:35     
Retrieving key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
Retrieving key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-OpenHPC-1
Retrieving key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7

This seems related to yum not trusting the GPG keys for these repos. Given that they are file-based keys, it may simply mean they are out of date relative to the upstream repos.
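One way to test this hypothesis is to look at the rpm database inside the image chroot rather than on the host, since with --installroot yum appears to check signatures against the keys imported into the target root. rpm records each imported key as a gpg-pubkey pseudo-package whose version is the key's short ID, so a small filter over rpm -q gpg-pubkey is enough. This is a minimal sketch; has_key is a hypothetical helper, and the image path is the one this role uses.

```shell
#!/usr/bin/env bash
# has_key KEYID: read `rpm -q gpg-pubkey` output on stdin and succeed if a
# key with the given short ID (8 hex digits, any case) has been imported.
# Imported keys are listed as gpg-pubkey-<keyid>-<creation-date>.
has_key() {
    local id
    id=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    grep -q "^gpg-pubkey-${id}-"
}

# Hypothetical usage on the ohpc machine:
# rpm --root /opt/ohpc/admin/images/centos7-compute -q gpg-pubkey | has_key F4A80EB5 \
#     && echo "CentOS key imported in image" || echo "CentOS key missing in image"
```

If none of the keys are present in the image, `rpm -q gpg-pubkey` prints a "not installed" message instead, and has_key fails as expected.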

@jprorama
Owner Author

I verified that the repo keys are all correct.
The CentOS project keys are described here.
The CentOS key matches the official project key:

[vagrant@ohpc vagrant]$ gpg --quiet --with-fingerprint /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
pub  4096R/F4A80EB5 2014-06-23 CentOS-7 Key (CentOS 7 Official Signing Key) <[email protected]>
      Key fingerprint = 6341 AB27 53D7 8A78 A7C2  7BB1 24C6 A8A7 F4A8 0EB5

According to the OpenHPC guides, the first step is to install the ohpc-release RPM to set up the repo and keys.
This package is installed on the ohpc machine.
The current OpenHPC key is not published in summary form, but it is:

[vagrant@ohpc vagrant]$ gpg --quiet --with-fingerprint /etc/pki/rpm-gpg/RPM-GPG-KEY-OpenHPC-1 
pub  1024R/26CE6884 2015-10-27 OpenHPC Build Service <obsrun@localhost>
      Key fingerprint = DD5D 8CAA CB57 364F FCC2  D3AE C468 07FF 26CE 6884
sub  1024R/F4BF4F26 2015-10-27

The EPEL key from the Fedora project is also correct:

[vagrant@ohpc vagrant]$ gpg --quiet --with-fingerprint /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7 
pub  4096R/352C64E5 2013-12-16 Fedora EPEL (7) <[email protected]>
      Key fingerprint = 91E9 7D7C 4A5E 96F1 7F3E  888F 6A2F AEA2 352C 64E5
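The short key IDs that rpm and yum print in their messages (for example in the NOKEY warnings of the verbose log) are the last eight hex digits of these fingerprints, which is why the pub lines above show F4A80EB5, 26CE6884 and 352C64E5. A small helper, named fpr_to_keyid here purely for illustration, makes that mapping explicit when cross-checking a warning against gpg output:

```shell
#!/usr/bin/env bash
# fpr_to_keyid FINGERPRINT: print the short (32-bit) key ID of a V4 OpenPGP
# fingerprint, lowercased to match rpm's "key ID xxxxxxxx" messages.
fpr_to_keyid() {
    local fpr
    # Drop the spaces gpg uses to group the fingerprint, then lowercase it.
    fpr=$(printf '%s' "$1" | tr -d ' ' | tr '[:upper:]' '[:lower:]')
    # The short key ID is the last 8 hex digits.
    printf '%s\n' "${fpr: -8}"
}

fpr_to_keyid "6341 AB27 53D7 8A78 A7C2  7BB1 24C6 A8A7 F4A8 0EB5"   # f4a80eb5
fpr_to_keyid "DD5D 8CAA CB57 364F FCC2  D3AE C468 07FF 26CE 6884"   # 26ce6884
fpr_to_keyid "91E9 7D7C 4A5E 96F1 7F3E  888F 6A2F AEA2 352C 64E5"   # 352c64e5
```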

@jprorama
Owner Author

I tried reverting to OpenHPC 1.3.6 for this task by replacing the update repo entry with the content of the 1.3.6 release: http://build.openhpc.community/OpenHPC:/1.3:/Update6/CentOS_7/OpenHPC:1.3:Update6.repo

This still failed with the same error.

@jprorama
Owner Author

jprorama commented Mar 21, 2019

The failing task is found here.

Converting that task into a yum command and executing it on the command line produces no errors:

sudo yum -y install --installroot=/opt/ohpc/admin/images/centos7-compute \
   chrony \
   'kernel-3.10.0-957.1.3.el7' \
   lmod-ohpc \
   grub2 \
   freeipmi \
   ipmitool \
   ohpc-slurm-client \
   ohpc-base-compute \
   tmux ruby turbojpeg nc \
   '@X Window System' \
   '@Xfce'

This doesn't provide an easy workaround, because an earlier task removes the existing compute node image, so the same error recurs each time this role is executed via Ansible.

@jprorama
Owner Author

I turned on logging for Ansible and noticed a difference between the output when the above command is run by hand and when it runs within Ansible.

Here's the transaction summary from the command-line run:

Transaction Summary
======================================================================================================================
Install  42 Packages (+381 Dependent packages)
Upgrade              (  19 Dependent packages)

Total size: 402 M
Total download size: 5.1 M
Downloading packages:
Finishing delta rebuilds of 1 package(s) (5.1 M)
systemd-219-62.el7.x86_64 is not installed
Some delta RPMs failed to download or rebuild. Retrying..
systemd-219-62.el7_6.5.x86_64.rpm                                                              | 5.1 MB  00:00:00     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Updating   : systemd-libs-219-62.el7_6.5.x86_64                                                               1/461 
  Updating   : 1:dbus-libs-1.10.24-13.el7_6.x86_64                                                              2/461 
  Updating   : freetype-2.8-12.el7_6.1.x86_64
...

Here's the transaction summary from the run of the task within Ansible:

Transaction Summary
================================================================================
Install  42 Packages (+381 Dependent packages)
Upgrade              (  19 Dependent packages)

Total download size: 402 M
Downloading packages:
Delta RPMs reduced 12 M of updates to 3.8 M (67% saved)
Public key for ModemManager-glib-1.6.10-1.el7.x86_64.rpm is not installed
Public key for NetworkManager-1.12.0-10.el7_6.x86_64.rpm is not installed
Public key for Thunar-1.6.16-1.el7.x86_64.rpm is not installed
Public key for lua-bit-ohpc-1.0.2-1.1.x86_64.rpm is not installed
Public key for lmod-ohpc-7.8.1-5.1.ohpc.1.3.6.x86_64.rpm is not installed
Public key for slurm-contribs-ohpc-17.11.11-6.2.ohpc.1.3.6.x86_64.rpm is not installed
Some delta RPMs failed to download or rebuild. Retrying..
Public key for mesa-libEGL-18.0.5-4.el7_6.x86_64.rpm is not installed
--------------------------------------------------------------------------------
Total                                               10 MB/s | 400 MB  00:38     
Retrieving key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
Retrieving key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
Retrieving key from http://build.openhpc.community/OpenHPC:/1.3:/Update6/CentOS_7/repodata/repomd.xml.key
Retrieving key from https://copr-be.cloud.fedoraproject.org/results/louistw/slurm-17.11.11-ohpc-1.3.6/pubkey.gpg

In particular, notice the difference in how the delta packages are resolved. When run from the command line, there is a warning about a failed delta, but the install succeeds. Also, the total download size is 5.1 MB, supposedly down from 402 MB due to deltas.

The Ansible-based execution neither sees the same download-size reduction nor appears to recover from the delta download failure.
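To compare the two runs without eyeballing them, the size lines of each Transaction Summary can be pulled out of the captured output. A minimal sketch, assuming the yum output of each run was saved to a file; the helper yum_sizes and the file names in the usage line are hypothetical:

```shell
#!/usr/bin/env bash
# yum_sizes FILE: print the "Total size" and "Total download size" lines
# from a captured yum Transaction Summary (the per-second "Total" progress
# line is not matched because it has no "size:" label).
yum_sizes() {
    grep -E '^Total (download )?size:' "$1"
}

# Hypothetical usage:
# diff <(yum_sizes cli-run.log) <(yum_sizes ansible-run.log)
```

If the delta rebuild step is the prime suspect, re-running with delta RPMs disabled (assuming EL7 yum's deltarpm main option, e.g. --setopt=deltarpm=0) would take it out of the comparison; that variant was not tried in this thread.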

@jprorama
Owner Author

After much debugging of the Ansible yum command, the difference between successful installs and failures was narrowed down to the centos/7 Vagrant box version. Systems running box versions prior to 1901.1 were failing.

My existing vagrant box versions were:

$ vagrant box list
centos/7              (virtualbox, 1804.02)
centos/7              (virtualbox, 1809.01)
centos/7              (virtualbox, 1811.02)
centos/7              (virtualbox, 1812.01)

I was able to resolve this error by running vagrant box update to get 1902.1. Repeating vagrant up ohpc after that, to pick up the newest version, allowed this task to succeed.
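To spot affected boxes before provisioning, the vagrant box list output can be filtered for centos/7 versions older than 1901.1. A sketch, assuming GNU sort -V for version ordering; old_centos7_boxes is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# old_centos7_boxes [MIN]: read `vagrant box list` output on stdin and print
# the centos/7 box versions older than MIN (default 1901.1).
old_centos7_boxes() {
    local min=${1:-1901.1} ver
    # Lines look like: centos/7              (virtualbox, 1804.02)
    sed -n 's|^centos/7[[:space:]]*([^,]*,[[:space:]]*\([^)]*\))[[:space:]]*$|\1|p' |
    while read -r ver; do
        # sort -V orders version strings; if ver sorts first and is not
        # equal to min, it is older than min.
        if [ "$ver" != "$min" ] &&
           [ "$(printf '%s\n' "$ver" "$min" | sort -V | head -n 1)" = "$ver" ]; then
            printf '%s\n' "$ver"
        fi
    done
}
```

Run as vagrant box list | old_centos7_boxes against the list above, it prints all four versions (1804.02, 1809.01, 1811.02, 1812.01). After vagrant box update, stale versions can be removed with vagrant box remove centos/7 --box-version <version>.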

@jprorama
Owner Author

jprorama commented Mar 22, 2019

To provide some background on the debugging of this failed task: the perplexing thing is that the task fails, yet when a similar yum command is run from the ohpc command line after the failed task, it installs all packages without error. This suggests there really is no problem with the repos and that the Ansible task should succeed.

The equivalent yum command that succeeds without error is:

sudo yum -y install --installroot=/opt/ohpc/admin/images/centos7-compute chrony 'kernel-3.10.0-957.1.3.el7' lmod-ohpc grub2 freeipmi ipmitool ohpc-slurm-client ohpc-base-compute tmux ruby turbojpeg nc '@X Window System' '@Xfce'

The Ansible yum module builds a yum command line and calls that command to execute the task. Ansible's yum module can use yum3, yum4, or dnf, but by default automatically picks the one on the system. Both Ansible and yum rely on the system Python 2.7 install, and yum4 (aka dnf) does not appear to be called. This suggests there is some subtle difference between the two commands or the environment they are executed in.
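One way to look for a difference in the execution environment is to capture env once from an interactive shell on ohpc and once through Ansible on the same host, then list the entries that differ. A sketch, assuming both dumps are saved to files; env_diff is a hypothetical helper, and an ad-hoc call such as ansible ohpc -m command -a env (adding -b if the role uses become, and stripping the ad-hoc header line) is one possible way to produce the Ansible side.

```shell
#!/usr/bin/env bash
# env_diff A B: print NAME=value lines that appear in only one of two saved
# `env` dumps, prefixed with "< " (only in A) or "> " (only in B).
env_diff() {
    # comm needs both inputs sorted with the same collation, so force C.
    # comm -3 prints lines unique to A as-is and lines unique to B after a tab.
    LC_ALL=C comm -3 <(LC_ALL=C sort "$1") <(LC_ALL=C sort "$2") |
        sed -e $'s/^\t/> /' -e '/^> /!s/^/< /'
}

# Hypothetical usage:
# env_diff cli.env ansible.env
```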

Further investigation is warranted, but the workaround of using the latest centos/7 box avoids requiring any edits to the role.

@jprorama
Copy link
Owner Author

For future debugging reference, logging the failure with the -vvv argument shows the following log for this task in the output log. Note the error right at the end about the file not found. I'm not sure this is specifically the cause of the failure, but it's the last part of the msg field. Not included is the result output, which just contains the dump of the yum output, part of which is shown above.

2019-03-22 15:34:53,750 p=8904 u=vagrant |  TASK [compute_build_vnfs : yum install into the image chroot] ********************************************************
2019-03-22 15:34:53,751 p=8904 u=vagrant |  task path: /vagrant/CRI_XCBC/roles/compute_build_vnfs/tasks/main.yml:43
2019-03-22 15:34:53,767 p=8904 u=vagrant |  [DEPRECATION WARNING]: Invoking "yum" only once while using a loop via squash_actions is deprecated. Instead of using a loop to supply multiple items and specifying `name: "{{ item }}"`, please use `name: ['chrony', 'kernel-{{ running_kernel_version.stdout }}', 'lmod-ohpc', 'grub2', 'freeipmi', 'ipmitool', 'ohpc-slurm-client', 'ohpc-base-compute', 'tmux', 'ruby', 'turbojpeg', 'nc', '@X Window System', '@Xfce']` and remove the loop. This feature will be removed in version 2.11. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
2019-03-22 15:34:53,804 p=8904 u=vagrant |  Using module file /usr/lib/python2.7/site-packages/ansible/modules/packaging/os/yum.py
2019-03-22 15:36:17,823 p=8904 u=vagrant |  failed: [ohpc] (item=[u'chrony', u'kernel-3.10.0-957.1.3.el7', u'lmod-ohpc', u'grub2', u'freeipmi', u'ipmitool', u'ohpc-slurm-client', u'ohpc-base-compute', u'tmux', u'ruby', u'turbojpeg', u'nc', u'@X Window System', u'@Xfce']) => {
    "changed": false, 
    "invocation": {
        "module_args": {
            "allow_downgrade": false, 
            "autoremove": false, 
            "bugfix": false, 
            "conf_file": null, 
            "disable_excludes": null, 
            "disable_gpg_check": false, 
            "disable_plugin": [], 
            "disablerepo": [], 
            "download_only": false, 
            "enable_plugin": [], 
            "enablerepo": [], 
            "exclude": [], 
            "install_repoquery": true, 
            "installroot": "/opt/ohpc/admin/images/centos7-compute", 
            "list": null, 
            "name": [
                "chrony", 
                "kernel-3.10.0-957.1.3.el7", 
                "lmod-ohpc", 
                "grub2", 
                "freeipmi", 
                "ipmitool", 
                "ohpc-slurm-client", 
                "ohpc-base-compute", 
                "tmux", 
                "ruby", 
                "turbojpeg", 
                "nc", 
                "@X Window System", 
                "@Xfce"
            ], 
            "releasever": null, 
            "security": false, 
            "skip_broken": false, 
            "state": "present", 
            "update_cache": false, 
            "update_only": false, 
            "use_backend": "auto", 
            "validate_certs": true
        }
    }, 
    "item": [
        "chrony", 
        "kernel-3.10.0-957.1.3.el7", 
        "lmod-ohpc", 
        "grub2", 
        "freeipmi", 
        "ipmitool", 
        "ohpc-slurm-client", 
        "ohpc-base-compute", 
        "tmux", 
        "ruby", 
        "turbojpeg", 
        "nc", 
        "@X Window System", 
        "@Xfce"
    ], 
    "msg": "mesa-libEGL-18.0.5-3.el7.x86_64 is not installed\nmesa-libgbm-18.0.5-3.el7.x86_64 is not installed\nmesa-libGL-18.0.5-3.el7.x86_64 is not installed\nmesa-libglapi-18.0.5-3.el7.x86_64 is not installed\nsystemd-libs-219-62.el7.x86_64 is not installed\nruby-libs-2.0.0.648-33.el7_4.x86_64 is not installed\nsystemd-python-219-62.el7.x86_64 is not installed\nwarning: /opt/ohpc/admin/images/centos7-compute/var/cache/yum/x86_64/7/base/packages/ModemManager-glib-1.6.10-1.el7.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID f4a80eb5: NOKEY\nxorg-x11-server-common-1.20.1-3.el7.x86_64 is not installed\n/usr/share/locale/da/LC_MESSAGES/util-linux.mo: No such file or directory\ncannot reconstruct rpm from disk files\nwarning: /opt/ohpc/admin/images/centos7-compute/var/cache/yum/x86_64/7/epel/packages/Thunar-1.6.16-1.el7.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID 352c64e5: NOKEY\nwarning: /opt/ohpc/admin/images/centos7-compute/var/cache/yum/x86_64/7/OpenHPC-updates/packages/lmod-ohpc-7.8.15-4.1.ohpc.1.3.67.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID 26ce6884: NOKEY\nImporting GPG key 0xF4A80EB5:\n Userid     : \"CentOS-7 Key (CentOS 7 Official Signing Key) <[email protected]>\"\n Fingerprint: 6341 ab27 53d7 8a78 a7c2 7bb1 24c6 a8a7 f4a8 0eb5\n Package    : centos-release-7-6.1810.2.el7.centos.x86_64 (@os-base/$releasever)\n From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7\nImporting GPG key 0x26CE6884:\n Userid     : \"OpenHPC Build Service <obsrun@localhost>\"\n Fingerprint: dd5d 8caa cb57 364f fcc2 d3ae c468 07ff 26ce 6884\n From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-OpenHPC-1\nImporting GPG key 0x352C64E5:\n Userid     : \"Fedora EPEL (7) <[email protected]>\"\n Fingerprint: 91e9 7d7c 4a5e 96f1 7f3e 888f 6a2f aea2 352c 64e5\n From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7\nTraceback (most recent call last):\n  File \"/bin/yum\", line 29, in <module>\n    yummain.user_main(sys.argv[1:], exit_code=True)\n  File \"/usr/share/yum-cli/yummain.py\", line 375, in user_main\n    errcode = main(args)\n  File \"/usr/share/yum-cli/yummain.py\", line 281, in main\n    return_code = base.doTransaction()\n  File \"/usr/share/yum-cli/cli.py\", line 695, in doTransaction\n    if self.gpgsigcheck(downloadpkgs) != 0:\n  File \"/usr/share/yum-cli/cli.py\", line 839, in gpgsigcheck\n    result, errmsg = self.sigCheckPkg(po)\n  File \"/usr/lib/python2.7/site-packages/yum/__init__.py\", line 2760, in sigCheckPkg\n    ts, po.localPkg(), payload=self.conf.payload_gpgcheck,\n  File \"/usr/lib/python2.7/site-packages/rpmUtils/miscutils.py\", line 76, in checkSig\n    fdno = os.open(package, os.O_RDONLY)\nOSError: [Errno 2] No such file or directory: '/opt/ohpc/admin/images/centos7-compute/var/cache/yum/x86_64/7/updates/packages/systemd-219-62.el7_6.5.x86_64.rpm'\n", 
    "rc": 1, 

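When reading a dump like this, the two things worth pulling out of the msg field are the key IDs that rpm reported as NOKEY and the file that the final OSError could not open. A minimal sketch, assuming the log has been saved to a file; nokey_ids and missing_files are hypothetical helper names, and both work on the escaped one-line msg as well as on expanded output because grep -o reports every match on a line.

```shell
#!/usr/bin/env bash
# nokey_ids FILE: print each short key ID that rpm reported as NOKEY, once.
nokey_ids() {
    grep -o 'key ID [0-9a-f]\{8\}: NOKEY' "$1" |
        awk '{ sub(":", "", $3); print $3 }' |
        LC_ALL=C sort -u
}

# missing_files FILE: print the paths quoted in Python-style
# "No such file or directory: '<path>'" errors.
missing_files() {
    grep -o "No such file or directory: '[^']*'" "$1" |
        sed "s/^No such file or directory: '\(.*\)'\$/\1/"
}
```

The IDs can be matched against the fingerprints checked earlier in this thread; the missing path here is the systemd-219-62.el7_6.5 update, the same package whose delta rebuild had to be retried in the command-line run.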