Summary
I tried to install OKD on bare metal with the agent installer as described here. Overall I succeeded, but I encountered two problems:
at some point, the bootstrap/rendez-vous (i.e. "BS") node "lost" the DNS configuration declared in the agent-config.yaml file, and it had to be re-entered manually
the installation could not complete because the BS node never appeared as a node in the output of oc get nodes. All other nodes were there, but not the BS one. Manually rebooting the node forced it to finish its initialization and appear in the list of nodes
Topology
3 masters : okd5-master[1-3], 192.168.5.[63-65]
2 workers: okd5-worker[1-2], 192.168.5.[66-67]
1 load balancer in front of the cluster with HAProxy (well...) configured
DNS, DHCP and all other prerequisites set up and working
okd5-master1 is designated as the rendez-vous / bootstrap node (IP: 192.168.5.63)
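For reference, an agent-config.yaml for this topology might look roughly like the sketch below, written out with a heredoc. It follows the documented AgentConfig layout (rendezvousIP plus a per-host networkConfig in NMState format, whose dns-resolver section carries the DNS servers that later went missing from the BS node). The interface name eno1, the MAC address, the DNS server and gateway 192.168.5.1, and the cluster name okd5 are assumptions, not values from this report. Only okd5-master1 is shown; the other hosts would follow the same pattern.

```shell
#!/bin/sh
# Sketch only: writes a hypothetical agent-config.yaml for the rendez-vous host.
# Interface name, MAC address, DNS server, gateway and cluster name are ASSUMPTIONS.
set -eu

INSTALL_DIR="${INSTALL_DIR:-install}"   # same directory passed to openshift-install --dir
mkdir -p "$INSTALL_DIR"

cat > "$INSTALL_DIR/agent-config.yaml" <<'EOF'
apiVersion: v1beta1
kind: AgentConfig
metadata:
  name: okd5                          # assumption: must match metadata.name in install-config.yaml
rendezvousIP: 192.168.5.63            # okd5-master1, the rendez-vous / bootstrap node
hosts:
  - hostname: okd5-master1
    role: master
    interfaces:
      - name: eno1                    # assumption
        macAddress: 52:54:00:aa:05:63 # assumption
    networkConfig:
      interfaces:
        - name: eno1
          type: ethernet
          state: up
          mac-address: 52:54:00:aa:05:63
          ipv4:
            enabled: true
            dhcp: false
            address:
              - ip: 192.168.5.63
                prefix-length: 24
      dns-resolver:
        config:
          server:
            - 192.168.5.1             # assumption: the DNS server expected in /etc/resolv.conf
      routes:
        config:
          - destination: 0.0.0.0/0
            next-hop-address: 192.168.5.1   # assumption: default gateway
            next-hop-interface: eno1
            table-id: 254
EOF

echo "wrote $INSTALL_DIR/agent-config.yaml"
```

The file is only illustrative; the agent installer reads it from the --dir directory when the ISO image is created.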
Installation
First problem
After creating the ISO image, etc., all 5 nodes are started at the same time and the installation starts.
The progress is monitored with
./openshift-install --dir install agent wait-for install-complete
...
INFO Host okd5-master2: updated status from insufficient to known (Host is ready to be installed)
INFO Cluster is ready for install
INFO Cluster validation: All hosts in the cluster are ready to install.
INFO Preparing cluster for installation
INFO Host okd5-master2: updated status from known to preparing-for-installation (Host finished successfully to prepare for installation)
INFO Host okd5-master3 validation: Host NTP is synced
INFO Host okd5-master2 validation: Host NTP is synced
INFO Host okd5-worker2 validation: Host NTP is synced
INFO Host okd5-worker2: validation 'ntp-synced' is now fixed
INFO Host okd5-worker1 validation: Host NTP is synced
INFO Host okd5-master1 validation: Host NTP is synced
INFO Host okd5-worker1: validation 'ntp-synced' is now fixed
INFO Host okd5-master1: New image status quay.io/openshift/okd-content@sha256:786a746a4cdce34c925e0cf10082a2b9caa27edd9c0bc037272cd8a85f79f922. result: success. time: 4.04 seconds; size: 509.25 Megabytes; download rate: 132.32 MBps
INFO Host okd5-worker1: updated status from preparing-for-installation to preparing-successful (Host finished successfully to prepare for installation)
INFO Cluster installation in progress
INFO Host: okd5-master1, reached installation stage Writing image to disk
INFO Host: okd5-master2, reached installation stage Rebooting
INFO Host: okd5-master1, reached installation stage Waiting for control plane: Waiting for bootstrap node preparation
INFO Host: okd5-master1, reached installation stage Waiting for control plane: Waiting for masters to join bootstrap control plane
Then everything stops, and the console of okd5-master1 shows that something is looping.
I then SSHed into the node:
[root@okd5-master1 ~]# podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a86556f2908e localhost/podman-pause:4.7.0-1695838680 8 minutes ago Up 7 minutes 11e0716db4f5-infra
6eda9b76734b quay.io/openshift/okd-content@sha256:ae9c813b78902dc4fc99cafd7b8f3d76b06aa11b4205d18f931cf62200a2c6d5 /bin/bash start_d... 7 minutes ago Up 7 minutes assisted-db
e33f5947e76e quay.io/openshift/okd-content@sha256:ae9c813b78902dc4fc99cafd7b8f3d76b06aa11b4205d18f931cf62200a2c6d5 /assisted-service 7 minutes ago Up 7 minutes service
ebf19a760d6d quay.io/openshift/okd-content@sha256:ae9c813b78902dc4fc99cafd7b8f3d76b06aa11b4205d18f931cf62200a2c6d5 /usr/local/bin/ag... 7 minutes ago Exited (0) 7 minutes ago apply-host-config
85c950aa98b6 quay.io/openshift/okd-content@sha256:57109646c2e66aee05c7003d0e0b7f1538f37a01c2f633fad8e962b3e1727335 next_step_runner ... 7 minutes ago Up 7 minutes next-step-runner
7d601e5cca4f quay.io/openshift/okd-content@sha256:786a746a4cdce34c925e0cf10082a2b9caa27edd9c0bc037272cd8a85f79f922 --role bootstrap ... 4 minutes ago Up 4 minutes assisted-installer
d3e4d0f0bb0c quay.io/openshift/okd-content@sha256:b4aa05ed09915158bbf554dff010f1a5adde269a8c9a207fae85a8739b627583 start --node-name... 4 minutes ago Exited (0) 3 minutes ago suspicious_chandrasekhar
[root@okd5-master1 ~]# journalctl -xn -u crio | less
Mar 21 01:49:39 okd5-master1 crio[6495]: time="2024-03-21 01:49:39.388636025Z" level=info msg="Registered SIGHUP reload watcher"
Mar 21 01:49:39 okd5-master1 crio[6495]: time="2024-03-21 01:49:39.389892926Z" level=info msg="Starting seccomp notifier watcher"
Mar 21 01:49:39 okd5-master1 crio[6495]: time="2024-03-21 01:49:39.390031988Z" level=info msg="Create NRI interface"
Mar 21 01:49:39 okd5-master1 crio[6495]: time="2024-03-21 01:49:39.390052759Z" level=info msg="NRI interface is disabled in the configuration."
Mar 21 01:49:39 okd5-master1 crio[6495]: time="2024-03-21 01:49:39.391515863Z" level=info msg="Serving metrics on :9537 via HTTP"
Mar 21 01:49:39 okd5-master1 systemd[1]: Started crio.service - Container Runtime Interface for OCI (CRI-O).
░░ Subject: A start job for unit crio.service has finished successfully
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ A start job for unit crio.service has finished successfully.
░░
░░ The job identifier is 1618.
Mar 21 01:49:41 okd5-master1 crio[6495]: time="2024-03-21 01:49:41.209741849Z" level=info msg="Checking image status: quay.io/openshift/okd-content@sha256:6308b9e9ba777ea62ad55ea4ea6a9a06aa770ad40f11fc310fc915fdaf48ddb2" id=4f6e2aaa-4c1b-4252-81d2-851c74658612 name=/runtime.v1.ImageService/ImageStatus
Mar 21 01:49:41 okd5-master1 crio[6495]: time="2024-03-21 01:49:41.210172943Z" level=info msg="Image quay.io/openshift/okd-content@sha256:6308b9e9ba777ea62ad55ea4ea6a9a06aa770ad40f11fc310fc915fdaf48ddb2 not found" id=4f6e2aaa-4c1b-4252-81d2-851c74658612 name=/runtime.v1.ImageService/ImageStatus
Mar 21 01:54:41 okd5-master1 crio[6495]: time="2024-03-21 01:54:41.327860027Z" level=info msg="Checking image status: quay.io/openshift/okd-content@sha256:6308b9e9ba777ea62ad55ea4ea6a9a06aa770ad40f11fc310fc915fdaf48ddb2" id=e5855e4d-94d7-4e45-b4e2-aa9bc6ba86d4 name=/runtime.v1.ImageService/ImageStatus
Mar 21 01:54:41 okd5-master1 crio[6495]: time="2024-03-21 01:54:41.328261482Z" level=info msg="Image quay.io/openshift/okd-content@sha256:6308b9e9ba777ea62ad55ea4ea6a9a06aa770ad40f11fc310fc915fdaf48ddb2 not found" id=e5855e4d-94d7-4e45-b4e2-aa9bc6ba86d4 name=/runtime.v1.ImageService/ImageStatus
[root@okd5-master1 ~]# ping quay.io
ping: quay.io: Temporary failure in name resolution
[root@okd5-master1 ~]# more /etc/resolv.conf
[root@okd5-master1 ~]#
So the BS node could not continue: it could not download the image from quay.io because /etc/resolv.conf was empty at this stage! ("Image quay.io/openshift/okd-content@sha256:... not found")
I added the DNS servers declared in agent-config.yaml to /etc/resolv.conf, and the installation immediately stopped looping and went on,
and the installation of the 4 other nodes continued and succeeded.
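The check and the manual workaround above can be sketched as a few shell helpers. This is a minimal sketch assuming a POSIX shell with awk and getent; the function names has_nameserver, add_nameservers and can_resolve are hypothetical, and the DNS server address used below is an assumption that should mirror dns-resolver.config.server from agent-config.yaml.

```shell
#!/bin/sh
# Sketch: detect an empty resolv.conf and re-add the nameserver lines.
# Helper names are HYPOTHETICAL; the file path is passed explicitly so the
# helpers can be tried on a scratch file before touching /etc/resolv.conf.
set -eu

# Succeeds if the file contains at least one "nameserver <addr>" line.
has_nameserver() {
    awk '$1 == "nameserver" && NF >= 2 { found = 1 } END { exit !found }' "$1" 2>/dev/null
}

# Appends "nameserver <addr>" for each address not already present in the file.
add_nameservers() {
    file="$1"
    shift
    for server in "$@"; do
        if ! awk -v s="$server" '$1 == "nameserver" && $2 == s { found = 1 } END { exit !found }' "$file" 2>/dev/null; then
            printf 'nameserver %s\n' "$server" >> "$file"
        fi
    done
}

# Succeeds if the resolver can look up the given host name (same idea as "ping quay.io").
can_resolve() {
    getent hosts "$1" > /dev/null
}
```

On the BS node, as root, this would be used as: has_nameserver /etc/resolv.conf || add_nameservers /etc/resolv.conf 192.168.5.1, followed by can_resolve quay.io to confirm. If /etc/resolv.conf is managed by NetworkManager on that host, a manual edit like this may be overwritten later, so it is a stopgap rather than a fix for the DNS configuration being lost.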
Second problem
Then the installation stopped again and never finished. After waiting a long time (with all nodes at about 5% CPU), I managed to open an oc session to okd5-master1.
oc get nodes listed all the nodes as "Ready" except the BS node (okd5-master1), which was not even in the list. And of course, oc get co and oc get clusterversion indicated that many operators were broken because one of the three masters was missing. At this point the status was:
INFO Bootstrap Kube API Initialized
INFO Bootstrap configMap status is complete
INFO cluster bootstrap is complete
So I SSHed into okd5-master1 again and forced a reboot with shutdown -r now, and tada: the installation of the BS node finished, and the cluster installation finally ran to completion with all 5 nodes known to the cluster and "Ready".
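A quick way to spot the situation described in the second problem is to compare the expected node names against oc get nodes -o name, which prints one node/NAME line per node. The helper missing_nodes below is hypothetical; it reads the oc output on stdin so it can be tried without a cluster, and the node names are the topology from this report.

```shell
#!/bin/sh
# Sketch: print every expected node that is absent from "oc get nodes -o name".
# HYPOTHETICAL helper; returns 1 if at least one expected node is missing.
set -eu

# Usage: oc get nodes -o name | missing_nodes okd5-master1 okd5-master2 ...
missing_nodes() {
    listed=$(sed 's|^node/||')      # strip the "node/" prefix from each line
    status=0
    for node in "$@"; do
        if ! printf '%s\n' "$listed" | grep -Fxq "$node"; then
            echo "$node"
            status=1
        fi
    done
    return "$status"
}
```

Usage on the cluster: oc get nodes -o name | missing_nodes okd5-master1 okd5-master2 okd5-master3 okd5-worker1 okd5-worker2. In the case above it would have printed only okd5-master1, the host that joined only after the manual reboot.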
titou10titou10 changed the title to: Agent Installer installation "loses" the dns config at some point and need a manual reboot for rendez-vous host (Mar 25, 2024)
OKD version: 4.15.0-0.okd-2024-03-10-010116