Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[STRMCMP-944] Add proper validation for the sysctls field in securityContext #204

Merged
merged 1 commit into from
Jun 18, 2020

Conversation

mwylde
Copy link
Contributor

@mwylde mwylde commented Jun 18, 2020

A user created a flink config that had a map where an array was expected in the sysctls field of securityContext:

i.e., they had:

  securityContext:
    sysctls:
      net.ipv4.tcp_keepalive_time: "900"
      net.ipv4.tcp_retries2: "8"

when it should have been:

  securityContext:
    sysctls:
      - name: net.ipv4.tcp_keepalive_time
        value: "900"
      - name: net.ipv4.tcp_retries2
        value: "8"

This managed to take down the operator, which was spending all of its time logging this error:

E0507 00:24:06.711346       1 reflector.go:126] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: Failed to list *v1beta1.FlinkApplication: v1beta1.FlinkApplicationList.Items: []v1beta1.FlinkApplication: v1beta1.FlinkApplication.Spec: v1beta1.FlinkApplicationSpec.SecurityContext: v1.PodSecurityContext.Sysctls: []v1.Sysctl: decode slice: expect [ or n, but found {, error found in #10 byte of ...|sysctls":{"net.ipv4.|..., bigger context ...|,"savepointInfo":{},"securityContext":{"sysctls":{"net.ipv4.tcp_keepalive_time":"900","net.ipv4.tcp_|...

This PR adds validation of the sysctls field to the CRD to prevent such configs from being accepted by the API.

With this change, users will get the following error when they try to create the bad config above:

The FlinkApplication "operator-test-app" is invalid: spec.securityContext.sysctls: Invalid value: "object": spec.securityContext.sysctls in body must be of type array: "object"

The deeper issue here is that in the source code we re-use the existing Kubernetes API definitions for objects like securityContext. But we're prevented by kubernetes/kubernetes#62872 from also using the existing schemas, so we are forced to duplicate them. When we miss fields like we did here, it creates the potential for users to submit invalid configuration that can take down the operator.

@mwylde mwylde merged commit e169e6f into master Jun 18, 2020
@mwylde mwylde deleted the micah_securitycontext_validation branch June 18, 2020 23:08
vDMG added a commit to vDMG/flinkk8soperator that referenced this pull request Jun 19, 2020
Add proper validation for the sysctls field in securityContext (lyft#204)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants