Insights Operator pulling and exposing data from the OCM API #683

tremes · 2021-03-09T13:54:14Z

No description provided.

enhancements/insights/pulling-data-from-ocm.md

openshift-ci-robot · 2021-04-08T08:46:20Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign staebler after the PR has been reviewed.
You can assign the PR to them by writing /assign @staebler in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

enhancements/insights/pulling-data-from-ocm.md

sbose78 · 2021-06-07T17:09:02Z

Does the Insights Operator have access to the org ID and/or cluster ID?

tremes · 2021-06-08T10:04:10Z

@sbose78 The Insights Operator does have access to the cluster ID, but from what I know it doesn't have access to the org ID.

iNecas · 2021-06-25T13:00:29Z

/lgtm
/approve

sbose78 · 2021-06-25T14:37:19Z

Thanks Ivan, I think we have all the approvals needed at this stage.

sbose78 · 2021-07-07T12:55:57Z

Hi @smarterclayton, would you be able to review this, and potentially merge it?

tremes · 2021-07-07T13:19:50Z

@sbose78 Thanks for pushing it, but maybe it would be good if someone from OCM could review it too. @abhgupta, please.

enhancements/insights/pulling-data-from-ocm.md

petli-openshift · 2021-07-16T14:20:03Z

LGTM from my perspective.

enhancements/insights/pulling-data-from-ocm.md

sbose78 · 2021-08-10T03:50:56Z

I'd like to address a couple of questions that came up in multiple discussion threads above.

How does an entity make the SCA cert accessible to service accounts in other namespaces?

Make the Secret etc-pki-entitlement available for use in other namespaces by creating a cluster-scoped Share resource:

apiVersion: projectedresource.storage.openshift.io/v1alpha1
kind: Share
metadata:
  name: etc-pki-entitlement
spec:
  backingResource:
    kind: Secret
    apiVersion: v1
    name: etc-pki-entitlement
    namespace: openshift-config-managed

An admin creates a ClusterRoleBinding to allow a service account (for example, pod-sa) access to the above Share resource (sample).
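For readers who want to see the shape of that binding, here is a minimal sketch that builds the two RBAC objects as plain Python dictionaries and prints them as JSON. The API group is taken from the Share manifest above. The `shares` resource plural, the verbs, the `use-share-` role name prefix, and the function name `share_access_manifests` are assumptions made for illustration; check the driver's documentation for the permissions it actually evaluates.

```python
import json

# API group copied from the Share manifest in this comment.
SHARE_API_GROUP = "projectedresource.storage.openshift.io"


def share_access_manifests(share_name, sa_name, sa_namespace,
                           verbs=("get", "list", "watch")):
    """Build a ClusterRole and ClusterRoleBinding for one Share and one service account.

    The resource plural "shares" and the default verbs are assumptions,
    not confirmed by the discussion above.
    """
    role_name = f"use-share-{share_name}"  # hypothetical naming convention
    cluster_role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": role_name},
        "rules": [{
            "apiGroups": [SHARE_API_GROUP],
            "resources": ["shares"],         # assumed plural of the Share kind
            "resourceNames": [share_name],   # restrict access to this one Share
            "verbs": list(verbs),            # assumed verbs
        }],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": role_name},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": role_name,
        },
        "subjects": [{
            "kind": "ServiceAccount",
            "name": sa_name,
            "namespace": sa_namespace,
        }],
    }
    return cluster_role, binding


if __name__ == "__main__":
    # Service account and namespace match the Pod example in this comment.
    for manifest in share_access_manifests("etc-pki-entitlement", "pod-sa", "foo"):
        print(json.dumps(manifest, indent=2))
```

Restricting the rule with `resourceNames` keeps the grant scoped to the single Share rather than to every Share in the cluster.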

Consumption by Pods

kind: Pod
apiVersion: v1
metadata:
  name: user-workload
  namespace: foo
spec:
  serviceAccountName: pod-sa  # <----- Previously a clusterrolebinding was created for this sa
  containers:
    - name: my-frontend
      image: quay.io/quay/busybox
      volumeMounts:
        - mountPath: "/data"
          name: my-csi-volume
      command:
        - sh
        - -c
        - |
          while true
          do
            ls -la /data
            touch /data/foo
            ls -la /data
            echo "calling sleep"
            sleep 120
          done
  volumes:
    - name: my-csi-volume
      csi:
        #readOnly: true
        driver: csi.shared-resources.openshift.io
        volumeAttributes:
          share: etc-pki-entitlement  # <----- name of the cluster-scoped `Share` resource

Consumption by Builds

After https://issues.redhat.com/browse/BUILD-274 is done, one would be able to do the following:

kind: Build
apiVersion: build.openshift.io/v1
metadata:
  ...
spec:
  source:
     ...
  strategy:
    dockerStrategy:
      volumes:
      - name: etc-pki-entitlement
        type: CSI
        csi: 
          driver: projected-resource.csi.openshift.io
          volumeAttributes:
            share: etc-pki-entitlement
      volumeMounts:
      - name: etc-pki-entitlement
        mountPath: /etc/pki/entitlement

As of today, the Build API team has made progress on supporting Secret and ConfigMap volume mounts in Builds as part of https://issues.redhat.com/browse/BUILD-87. https://issues.redhat.com/browse/BUILD-274 is a candidate for 4.10.

sbose78 · 2021-08-10T03:54:11Z

As Ben mentioned, this EP's sole goal is to ensure the SCA certs are pulled into the cluster and lifecycle'd by the Insights Operator.

Consumption of the certs is being driven by work in distinct OpenShift components:

The brand new OpenShift Projected Resource CSI Driver
Changes in the Build API to support volume mounts ( specifically, CSI volume mounts )

tremes · 2021-08-10T13:50:31Z

I slightly updated the proposal to address some points and to cover how the SCA certs are used. @bparees, @dhellmann, can you please take a look at my last commit? Let's discuss potential blockers tomorrow.

enhancements/insights/pulling-sca-certs-from-ocm.md

sbose78

Requesting one change: create the Share resource.

enhancements/insights/pulling-sca-certs-from-ocm.md

wking · 2021-08-16T18:46:21Z

enhancements/insights/pulling-sca-certs-from-ocm.md

+
+- `insights-operator-e2e-tests` suite can verify the SCA cert data
+  is available
+- Basic test of the validity of the SCA certs. Mount the `etc-pki-entitlement` secret and run e.g `yum install` in the container


Since most consumers will presumably be mounting the Share, maybe this integration test should use that approach instead of shortcutting to use the Secret directly?

this was discussed in a since-resolved comment thread somewhere, but.... i don't want the insights operator team's testing to be dependent on Share behavior.

they have the ability to test their functionality end to end directly, so they should do that.

The team that owns the Shared-Resource driver+builds should have tests that ensure they can consume this content successfully at their end.

I think the bits involving the Share resource should ultimately live in the OpenShift build suite. Test plan as follows:

insights operator obtains SCA cert from OCM

insights operator creates Share resource and ClusterRoleBinding

ocp build suite creates a build that does a yum install of subscription-only content

wking · 2021-08-16T18:51:20Z

enhancements/insights/pulling-sca-certs-from-ocm.md

+
+### SCA certs in API
+
+The SCA certificate is available via the `etc-pki-entitlement` secret in the `openshift-config-managed` namespace. The secret will be available for use in other namespaces by creating a cluster-scoped `Share` resource. Cluster admin creates a `clusterrolebinding` to allow a service account access to the `Share` resource.


Not covered in this sentence is the structure of the Secret. Will a particular well-known key be used? If so, can we document it here, or is that a low-enough level of detail that it's not worth including in the enhancement proposal?

IIRC the key name can be any .pem file. On a RHEL system, the subscription cert key names have what appears to be a UUID as the file name.

Maybe we should mention it here. I am using a kubernetes.io/tls secret, with tls.crt for the certificate and tls.key for the private key.
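To make the Secret layout described in this reply concrete, here is a minimal sketch of how a consumer might read it. It assumes the Secret has already been fetched as a manifest-style dictionary, with the usual base64-encoded `data` values; the function name `decode_tls_secret` and the placeholder contents in the usage example are illustrative only.

```python
import base64

TLS_SECRET_TYPE = "kubernetes.io/tls"
REQUIRED_KEYS = ("tls.crt", "tls.key")  # key names mentioned in the reply above


def decode_tls_secret(secret):
    """Return (certificate, private_key) as bytes from a Secret manifest dict."""
    if secret.get("type") != TLS_SECRET_TYPE:
        raise ValueError(
            f"expected type {TLS_SECRET_TYPE}, got {secret.get('type')!r}"
        )
    data = secret.get("data", {})
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"secret is missing keys: {missing}")
    # Values under `data` are base64-encoded in the Kubernetes API.
    return tuple(base64.b64decode(data[key]) for key in REQUIRED_KEYS)


if __name__ == "__main__":
    # Placeholder contents; a real Secret would carry PEM-encoded material.
    example = {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "kubernetes.io/tls",
        "metadata": {
            "name": "etc-pki-entitlement",
            "namespace": "openshift-config-managed",
        },
        "data": {
            "tls.crt": base64.b64encode(b"placeholder certificate").decode(),
            "tls.key": base64.b64encode(b"placeholder key").decode(),
        },
    }
    cert, key = decode_tls_secret(example)
    print(len(cert), len(key))
```

Documenting the two key names in the proposal would let consumers rely on them instead of scanning for arbitrary .pem file names.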

wking · 2021-08-16T18:54:35Z

enhancements/insights/pulling-sca-certs-from-ocm.md

+
+### Upgrade / Downgrade Strategy
+
+There is no upgrade/downgrade strategy needed.


If a cluster is downgraded to a version that does not poll for entitlement updates, will that version of the insights operator (4.8.z?) have logic around to remove the etc-pki-entitlement secret and other cruft to keep in-cluster components from trying to consume stale data? If the insights operator is downgraded before some consumer (builds and/or CSI drivers?), will the higher-version consumers gracefully handle the consumed secret's removal?

If a cluster is downgraded to a version that does not poll for entitlement updates, will that version of the insights operator (4.8.z?)

for tech preview it's not applicable since upgrade/downgrade isn't allowed, but in general if you downgrade below the level at which the insights operator has this behavior then yes, i'd expect the content to become stale (i'm not sure how long the tokens are good for)

i'm not sure how you'd propose to solve that on downgrade, though. the older version of the operator wouldn't even know about the content to remove it (even if we could agree that was the right thing to do, which i don't think i do)

Yes, this sounds like an edge case to me, and yes, there will be a stale secret in that case.

enhancements/insights/pulling-sca-certs-from-ocm.md

adambkaplan · 2021-08-16T20:39:57Z

enhancements/insights/pulling-sca-certs-from-ocm.md

+
+Risk: Insights Operator is unable to expose/update the data in the OpenShift API
+
+Mitigation: The Insights Operator is marked as Degraded (in case the HTTP code is lower than 200 or greater than 399 and is not equal to 404, because HTTP 404 means that the organization didn't allow this feature).


Degraded means the cluster can't upgrade. Are we sure we want this?

yes, we want it.

this is tech preview. we should absolutely gather data on whether this operator is ending up degraded during the tech preview period before making a final decision for GA

degraded doesn't block z-stream upgrades, and can be overridden for y-stream upgrades, the point is it's a conscious choice. I gave this (somewhat dubious) example in another comment thread on this EP: suppose you have pods that need this entitlement to do work at startup. Doing an upgrade is going to trigger all those pods to restart. You want to know, before you upgrade your cluster, that you don't have valid entitlements on your cluster, or all those pods are going to fail to restart after the node reboots triggered by the upgrade.

fundamentally this operator has a job to do, and if it's not doing the job, it should report degraded (after a reasonable period of time)

that said, going degraded should only happen after some number of retries or period of time. not on first failure.

and needs to take into account disconnected clusters (i.e. it shouldn't be degraded on a cluster w/ no network access)
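The conditions collected in this thread (the status-code rule from the proposal, degrading only after repeated failures, and not degrading on disconnected clusters) can be sketched as follows. This is an illustration under stated assumptions, not the operator's implementation: the class name `ScaPullStatus`, the default threshold of 5, counting consecutive failures, and representing "no network access" as a missing status code are all hypothetical choices.

```python
from dataclasses import dataclass


def counts_as_failure(http_status):
    """Status-code rule quoted from the proposal: outside 200-399 and not 404.

    HTTP 404 is excluded because it means the organization has not
    enabled the feature.
    """
    return (http_status < 200 or http_status > 399) and http_status != 404


@dataclass
class ScaPullStatus:
    failure_threshold: int = 5      # hypothetical default
    consecutive_failures: int = 0

    def record(self, http_status=None):
        """Record one pull attempt against the OCM API.

        http_status is None when no response was received at all, for
        example on a disconnected cluster. Assumption: such attempts are
        ignored so that a cluster without network access is not degraded.
        """
        if http_status is None:
            return
        if counts_as_failure(http_status):
            self.consecutive_failures += 1
        else:
            self.consecutive_failures = 0

    @property
    def degraded(self):
        # Degraded only once the failure count exceeds the threshold,
        # never on the first failure.
        return self.consecutive_failures > self.failure_threshold


if __name__ == "__main__":
    status = ScaPullStatus(failure_threshold=2)
    for code in (500, 503, 404, 500, 500, 500):
        status.record(code)
        print(code, status.consecutive_failures, status.degraded)
```

A time-based window ("after a reasonable period of time") would be an equally valid reading of the thread; the counter is used here only because it is the simpler of the two to show.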

adambkaplan · 2021-08-16T20:42:11Z

enhancements/insights/pulling-sca-certs-from-ocm.md

+
+### SCA certs in API
+
+The SCA certificate is available via the `etc-pki-entitlement` secret in the `openshift-config-managed` namespace. The secret will be available for use in other namespaces by creating a cluster-scoped `Share` resource. Cluster admin creates a `clusterrolebinding` to allow a service account access to the `Share` resource.


IIRC the key name can be any .pem file. On a RHEL system, the subscription cert key names have what appears to be a UUID as the file name.

adambkaplan · 2021-08-16T20:48:57Z

enhancements/insights/pulling-sca-certs-from-ocm.md

+
+- `insights-operator-e2e-tests` suite can verify the SCA cert data
+  is available
+- Basic test of the validity of the SCA certs. Mount the `etc-pki-entitlement` secret and run e.g `yum install` in the container


I think the bits involving the Share resource should ultimately live in the OpenShift build suite. Test plan as follows:

insights operator obtains SCA cert from OCM

insights operator creates Share resource and ClusterRoleBinding

ocp build suite creates a build that does a yum install of subscription-only content

adambkaplan · 2021-08-16T20:49:17Z

enhancements/insights/pulling-sca-certs-from-ocm.md

+
+### Graduation Criteria
+
+This feature is planned as a technical preview in OCP 4.9 and is planned to go GA in 4.10.


Note that the Share bits are now planned as tech preview for OCP 4.10

Does that mean it will block the graduation from TP to GA mentioned here? If so, then I guess we would need to go GA together with your bits.

adambkaplan · 2021-08-16T20:50:45Z

enhancements/insights/pulling-sca-certs-from-ocm.md

+#### Dev Preview -> Tech Preview
+- opt-in feature (called `InsightsOperatorPullingSCA`) enabled with `TechPreviewNoUpgrade` feature set
+- Insights Operator is able to download the certificates from OCM API and expose it in a cluster API
+- Insights Operator is marked as degraded in case of the number of unsuccessful requests to the OCM API exceeds defined threshold


Ditto my comments above - degraded means no cluster upgrades. Do we want this?

It appears that in prior GH comments we reached consensus that Degraded is a desirable condition. I would like to see this justification reflected in the proposal itself.

enhancements/insights/pulling-sca-certs-from-ocm.md

bparees · 2021-08-16T20:46:13Z

enhancements/insights/pulling-sca-certs-from-ocm.md

+
+- `insights-operator-e2e-tests` suite can verify the SCA cert data
+  is available
+- Basic test of the validity of the SCA certs. Mount the `etc-pki-entitlement` secret and run e.g `yum install` in the container


this was discussed in a since-resolved comment thread somewhere, but.... i don't want the insights operator team's testing to be dependent on Share behavior.

they have the ability to test their functionality end to end directly, so they should do that.

The team that owns the Shared-Resource driver+builds should have tests that ensure they can consume this content successfully at their end.

bparees · 2021-08-16T20:50:19Z

enhancements/insights/pulling-sca-certs-from-ocm.md

+
+### Upgrade / Downgrade Strategy
+
+There is no upgrade/downgrade strategy needed.


If a cluster is downgraded to a version that does not poll for entitlement updates, will that version of the insights operator (4.8.z?)

for tech preview it's not applicable since upgrade/downgrade isn't allowed, but in general if you downgrade below the level at which the insights operator has this behavior then yes, i'd expect the content to become stale (i'm not sure how long the tokens are good for)

i'm not sure how you'd propose to solve that on downgrade, though. the older version of the operator wouldn't even know about the content to remove it (even if we could agree that was the right thing to do, which i don't think i do)

enhancements/insights/pulling-sca-certs-from-ocm.md

bparees · 2021-08-16T21:05:31Z

enhancements/insights/pulling-sca-certs-from-ocm.md

+
+Risk: Insights Operator is unable to expose/update the data in the OpenShift API
+
+Mitigation: The Insights Operator is marked as Degraded (in case the HTTP code is lower than 200 or greater than 399 and is not equal to 404, because HTTP 404 means that the organization didn't allow this feature).


yes, we want it.

this is tech preview. we should absolutely gather data on whether this operator is ending up degraded during the tech preview period before making a final decision for GA

degraded doesn't block z-stream upgrades, and can be overridden for y-stream upgrades, the point is it's a conscious choice. I gave this (somewhat dubious) example in another comment thread on this EP: suppose you have pods that need this entitlement to do work at startup. Doing an upgrade is going to trigger all those pods to restart. You want to know, before you upgrade your cluster, that you don't have valid entitlements on your cluster, or all those pods are going to fail to restart after the node reboots triggered by the upgrade.

fundamentally this operator has a job to do, and if it's not doing the job, it should report degraded (after a reasonable period of time)

bparees · 2021-08-16T21:09:05Z

enhancements/insights/pulling-sca-certs-from-ocm.md

+
+Risk: Insights Operator is unable to expose/update the data in the OpenShift API
+
+Mitigation: The Insights Operator is marked as Degraded (in case the HTTP code is lower than 200 or greater than 399 and is not equal to 404, because HTTP 404 means that the organization didn't allow this feature).


that said, going degraded should only happen after some number of retries or period of time. not on first failure.

bparees · 2021-08-16T21:09:54Z

enhancements/insights/pulling-sca-certs-from-ocm.md

+
+Risk: Insights Operator is unable to expose/update the data in the OpenShift API
+
+Mitigation: The Insights Operator is marked as Degraded (in case the HTTP code is lower than 200 or greater than 399 and is not equal to 404, because HTTP 404 means that the organization didn't allow this feature).


and needs to take into account disconnected clusters (i.e. it shouldn't be degraded on a cluster w/ no network access)

enhancements/insights/pulling-sca-certs-from-ocm.md

bparees · 2021-08-17T16:39:07Z

/approve

looks like there might be a few final items @adambkaplan would like to see explicitly stated in the EP, but in general i think the critical concerns are addressed and/or roadmapped.

openshift-ci · 2021-08-17T16:41:41Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees, iNecas

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [bparees]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

tremes · 2021-08-18T13:06:35Z

Thanks @bparees. I added one more commit mentioning the reasons for the decision on the degraded status.

bparees · 2021-08-18T18:49:49Z

/lgtm

sbose78 · 2021-09-29T16:08:24Z

FYI, the consumption APIs are undergoing changes:
openshift/csi-driver-shared-resource#55

openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 9, 2021

openshift-ci-robot requested review from dustymabe and hardys March 9, 2021 13:54

tremes force-pushed the ocm_communication branch from 4de96ea to 4dd02df March 9, 2021 14:07

sbose78 reviewed Mar 11, 2021

sbose78 suggested changes Mar 12, 2021

tremes changed the title ~~WIP Insights Operator pulling and exposing data from the OCM API~~ Insights Operator pulling and exposing data from the OCM API Mar 18, 2021

openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 18, 2021

sbose78 reviewed Mar 18, 2021

sbose78 reviewed Apr 7, 2021

tremes commented Apr 8, 2021

tremes force-pushed the ocm_communication branch from 0078ca5 to c9ca705 April 27, 2021 11:07

tremes commented Apr 27, 2021

sbose78 approved these changes Jun 22, 2021

openshift-ci bot assigned iNecas Jun 25, 2021

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 25, 2021

Serhii1011010 reviewed Jul 13, 2021

dhellmann reviewed Jul 20, 2021

tremes force-pushed the ocm_communication branch from c9ca705 to c969d7c July 22, 2021 12:33

openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 22, 2021

mfojtik reviewed Jul 26, 2021

mfojtik reviewed Jul 26, 2021

bparees reviewed Aug 5, 2021

Updates based on reviews - describe use of the SCA certs in a cluster

0b42cc6

bparees reviewed Aug 10, 2021

bparees reviewed Aug 10, 2021

Next update based on the feedback & review

5eeb9ce

sbose78 suggested changes Aug 11, 2021

dhellmann reviewed Aug 11, 2021

tremes added 3 commits August 11, 2021 17:26

Next update based on the feedback & review

15ef7c0

Added new section talking about configuration

141a6ea

Define the conditions for the degraded state more explicitly

5e84453

wking reviewed Aug 16, 2021

wking reviewed Aug 16, 2021

adambkaplan reviewed Aug 16, 2021

bparees reviewed Aug 16, 2021

Update

6089e73

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 17, 2021

Mention the reason for the degraded status

bcebd4c

openshift-ci bot assigned bparees Aug 18, 2021

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 18, 2021

openshift-merge-robot merged commit 2eda226 into openshift:master Aug 18, 2021


