Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(operator): Allow structured metadata only if V13 schema provided #13463

Merged
merged 8 commits into from
Jul 16, 2024

Conversation

periklis
Copy link
Collaborator

@periklis periklis commented Jul 9, 2024

What this PR does / why we need it:
With #13422 the Loki configuration defaults to allow_structured_metadata: true including runtime validation to have at least one V13 schema defined. However, existing LokiStack resource might not include a V13 schema yet running into a crashloop situation.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@periklis periklis self-assigned this Jul 9, 2024
@periklis periklis requested review from xperimental and a team as code owners July 9, 2024 12:29
@xperimental
Copy link
Collaborator

I don't think this is the correct fix for us, because we want the users to be able to use structured metadata at some point.

The validation code seems to only check the "currently active schema", which means that this validation still triggers even when the "future schemas" contain a valid configuration.

I think how the content of the error is to be interpreted is:

  • "if you don't want structured metadata at all" -> "set allow_structured_metadata: false"
  • "if you want structured metadata" -> "add a valid schema configuration to your Loki configuration and set -validation.allow-structured-metadata=false (CLI argument)

I think for us the correct handling is the second case. We already produce a warning that the user should update to v13, so they know that we want them to use v13. We should probably add an explanation that switching to v13 is a requirement for structured metadata somewhere.

For me using allow_structured_metadata looks like we would need to switch the configuration once the new schema version becomes active, but I might be misunderstanding the situation. I also have not found/looked at the ingestion code yet to see how it handles incoming structured metadata in the case of an unsupported schema version (I would assume it rejects it with a 4xx error).

@periklis periklis force-pushed the structured-metadata-on-v13-only branch from 8d00c5d to 87c38a2 Compare July 10, 2024 06:16
Copy link
Collaborator

@JoaoBraveCoding JoaoBraveCoding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm confused... I don’t get why we set both the loki-config.yaml + CLI Arg when we want to allow structured metadata enabled 🤔

operator/internal/manifests/storage/configure.go Outdated Show resolved Hide resolved
@xperimental
Copy link
Collaborator

I'm similarly confused. I was more thinking of something like this when I wrote the comment yesterday.

@periklis
Copy link
Collaborator Author

I think I'm confused... I don’t get why we set both the loki-config.yaml + CLI Arg when we want to allow structured metadata enabled 🤔

@JoaoBraveCoding @xperimental I agree with both of you that disabling the validation is enough, but since we allow only valid schemas via the LokiStack CRD, I though we should set allow_structured_metadata: false when we only have v11 and v12 on LokiStack. This will make Push-requests with structured metadata to the distributor fail as BadRequests IIRC. I am thinking more of this distributor code: https://github.com/grafana/loki/blob/main/pkg/distributor/validator.go#L116-L124

Copy link
Collaborator

@JoaoBraveCoding JoaoBraveCoding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still very confused... isn't allow_structured_metadata on the config and the CLI arg -validation.allow-structured-metadata=false the same thing? So currently:

  • v11 || v12 we don't set a CLI but we set allow_structured_metadata: false
  • v13 we set both the CLI -validation.allow-structured-metadata=false && allow_structured_metadata: true which seems contradictory.

Also by looking at the distributor code, it seems allow_structured_metadata: true is required to ingest structured metadata no? Otherwise, we will always be landing here

Do we have steps/instructions for sending structured metadata for Loki to test this PR?

@periklis
Copy link
Collaborator Author

  • v13 we set both the CLI -validation.allow-structured-metadata=false && allow_structured_metadata: true which seems contradictory.

Nope because the validation code is doing this over the "active schemas". we can define v13 into the future and still want to have allow to true and validation to false, else the pods crash.

@xperimental
Copy link
Collaborator

Tested this with a client actually trying to send structured metadata and, in the end, it turns out that I did not understand what -validation.allow-structured-metadata=false does correctly: it does not disable the validation, but sets the allow-structured-metadata limit globally, effectively disabling structured metadata.

For me it looks as if we need to duplicate the validation logic that Loki does during startup in the operator to set the correct allow_structured_metadata for the currently active schema. This means that we need to update this over time, so that when the v13 schema becomes active the configuration is updated. This does not need to be immediate just "close".

@JoaoBraveCoding suggested that there is probably an interval after which the operator will re-reconcile the LokiStack again (I remember this being a thing called "ResyncPeriod" in the informers). If this is the case then we probably would not need to add extra logic to the operator to requeue the LokiStack after a timeout, because it might happen often enough for our "at most daily" configuration change.

@JoaoBraveCoding
Copy link
Collaborator

I was trying to dig this out and found that maybe it makes more sense to use reconcile.Result{RequeueAfter: t} got this idea from https://github.com/kubernetes-sigs/controller-runtime/blob/a0c9fd9d3f310f48155ce985366b21914675fbea/pkg/cache/cache.go#L162-L170

@xperimental
Copy link
Collaborator

Seemed to work fine in our "overnight test", so 👍

@periklis periklis merged commit 3ac130b into grafana:main Jul 16, 2024
68 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants