
[OKD 4.11][AWS] Installing cluster fails since new S3 bucket settings #1576

Closed
poperne opened this issue Apr 24, 2023 · 8 comments

@poperne commented Apr 24, 2023

Describe the bug

While trying to set up a new cluster in an AWS account, I got an error during installation:

error creating S3 bucket ACL for mock-cluster-tkxmn-bootstrap: AccessControlListNotSupported: The bucket does not allow ACLs

I checked the AWS documentation and found the following:

Starting in April 2023, Amazon S3 will change the default settings for S3 Block Public Access and Object Ownership (ACLs disabled) for all new S3 buckets. For new buckets created after this update, all S3 Block Public Access settings will be enabled, and S3 access control lists (ACLs) will be disabled. These defaults are the recommended best practices for securing data in Amazon S3. You can adjust these settings after creating your bucket.
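
For reference, a quick way to confirm these new defaults on a freshly created bucket (the bucket name below is just a placeholder):

$ aws s3api get-bucket-ownership-controls --bucket my-new-bucket
# new buckets now report "ObjectOwnership": "BucketOwnerEnforced", which disables ACLs
$ aws s3api get-public-access-block --bucket my-new-bucket
# all four Block Public Access settings are now enabled by default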

Version
4.11.0-0.okd-2022-08-20-022919
IPI method

How reproducible
100%, since April 24, 2023

Log bundle

poperne changed the title from "[OKD 4.11][AWS] Installing cluster fails due to new S3 bucket" to "[OKD 4.11][AWS] Installing cluster fails since new S3 bucket settings" on Apr 24, 2023
@vrutkovs (Member) commented

@poperne (Author) commented Apr 24, 2023

Thank you for the reference, but it still isn't clear to me: will this be fixed in OKD 4.11? I found that the newest 4.12 build already includes the fix.

@vrutkovs (Member) commented

Once the fix is cherry-picked into the installer repo's release-4.11 branch, it will land in OKD 4.11 nightlies.

@lisfo4ka commented

Hi folks, this issue is a critical blocker for our team and our customer as well; our release has completely failed.

We really need to get the fix into the 4.11.0-0.okd-2022-08-20-022919 version, since it is the latest one that works on our platform due to a bug with Rook + Ceph (#1505).

We've already tried building the OKD installer ourselves from https://github.com/openshift/installer/tree/7493bb2821ccd348c11aa36f05aa060b3ab6beda (which corresponds to https://github.com/okd-project/okd/releases/tag/4.11.0-0.okd-2022-08-20-022919), with openshift/installer@3b33851 cherry-picked on top. Unfortunately, it seems to pull in some newer dependencies, such as the AMI and the machine set API.
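
Roughly what we did (hack/build.sh is assumed to be the installer's usual build entry point; exact steps may differ):

$ git clone https://github.com/openshift/installer.git && cd installer
$ git checkout 7493bb2821ccd348c11aa36f05aa060b3ab6beda
# cherry-pick the S3 ACL fix on top (the commit must be present in the local clone)
$ git cherry-pick 3b33851
# build the installer binary into bin/openshift-install
$ ./hack/build.sh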

Please help deliver the fix to the 4.11.0-0.okd-2022-08-20-022919 release (https://github.com/okd-project/okd/releases/tag/4.11.0-0.okd-2022-08-20-022919), or point out how to pin the dependencies to the corresponding versions during the build process.

P.S. We can't update our clusters to 4.12, since our application requires an Istio release that includes the fix for istio/istio#42485.

@vrutkovs (Member) commented

We no longer make OKD 4.11 stable releases, as development has moved to 4.12.

> Please help deliver the fix to the 4.11.0-0.okd-2022-08-20-022919 release, or point out how to pin the dependencies to the corresponding versions during the build process.

You could use an installer with the fix (either extracted from a nightly or built manually) together with the existing 4.11.0-0.okd-2022-08-20-022919 release image; see https://github.com/openshift/installer/blob/master/docs/dev/alternative_release_image_sources.md?plain=1#L8
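
A rough sketch of that workflow (the nightly release image tag and install directory are placeholders):

# extract an installer binary that contains the fix from a nightly release image
$ oc adm release extract --command=openshift-install --to=. quay.io/openshift/okd:<nightly-tag-with-fix>
# point it at the existing release image so the cluster still gets 4.11.0-0.okd-2022-08-20-022919
$ export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=quay.io/openshift/okd:4.11.0-0.okd-2022-08-20-022919
$ ./openshift-install create cluster --dir=<install-dir>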

@lisfo4ka commented

@vrutkovs thanks a lot! We've set the OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE env variable and specified our old amiID in install-config. This solved the problem: we've managed to install the 4.11.0-0.okd-2022-08-20-022919 version with the fix. The OKD cluster looks healthy, and it seems we are unblocked now.

But there is still one warning: the machine-config ClusterOperator cannot update, with the following messages in its status:

status:
  conditions:
    - lastTransitionTime: '2023-04-26T12:04:59Z'
      message: Working towards 4.11.0-0.okd-2022-08-20-022919
      status: 'True'
      type: Progressing
    - lastTransitionTime: '2023-04-26T12:06:40Z'
      message: >-
        One or more machine config pools are degraded, please see `oc get mcp`
        for further details and resolve before upgrading
      reason: DegradedPool
      status: 'False'
      type: Upgradeable
    - lastTransitionTime: '2023-04-26T12:16:34Z'
      message: >-
        Unable to apply 4.11.0-0.okd-2022-08-20-022919: error during
        syncRequiredMachineConfigPools: [timed out waiting for the condition,
        error pool master is not ready, retrying. Status: (pool degraded: true
        total: 3, ready 0, updated: 0, unavailable: 3)]
      reason: RequiredPoolsFailed
      status: 'True'
      type: Degraded
    - lastTransitionTime: '2023-04-26T12:16:34Z'
      message: 'Cluster has deployed []'
      status: 'True'
      type: Available
$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master                                                      False     True       True       3              0                   0                     3                      136m
worker   rendered-worker-ee65876cac02343f139233f00a2d66fc   True      False      False      9              9                   9                     0                      136m

Do you have any idea what could have changed to cause the problem? On pure 4.11.0-0.okd-2022-08-20-022919 OKD clusters we do not have this issue, but we've already checked two new clusters installed with the fix and both show the same behavior.
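
For context, this is how we're digging into the degraded pool so far (the daemon pod name is a placeholder):

$ oc describe mcp master
# the NodeDegraded condition lists the failing nodes and the reason
$ oc -n openshift-machine-config-operator get pods -o wide
# then check the machine-config-daemon logs on one of the degraded masters
$ oc -n openshift-machine-config-operator logs <machine-config-daemon-pod> -c machine-config-daemon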

@vrutkovs (Member) commented

Not sure; we'll need a must-gather to find out more. Also, make sure you use an installer built from the release-4.11 branch, as more recent installers may not be compatible.
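
For reference, collecting one is a single command (the destination directory name is arbitrary):

$ oc adm must-gather --dest-dir=must-gather-okd411
$ tar czf must-gather-okd411.tar.gz must-gather-okd411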

@lisfo4ka commented

Thanks. We built it from commit openshift/installer@7493bb2, which is definitely in the release-4.11 branch. We'll spend some time investigating and testing whether it really causes any issues, and I'll report back if it does.
Really appreciate your quick response.
