Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(promtail): validate scrape_config job name #13719

Merged
merged 4 commits into from
Sep 18, 2024

Conversation

fgouteroux
Copy link
Contributor

What this PR does / why we need it:

Fix the panic error when there is duplicate job name defined in scrape config:

grafana-agent[2859804]: panic: send on closed channel
grafana-agent[2859804]: goroutine 350 [running]:
grafana-agent[2859804]: github.com/grafana/loki/clients/pkg/logentry/stages.(*Pipeline).Wrap.func1()
grafana-agent[2859804]:         /go/pkg/mod/github.com/grafana/[email protected]/clients/pkg/logentry/stages/pipeline.go:163 +0x1e5
grafana-agent[2859804]: created by github.com/grafana/loki/clients/pkg/logentry/stages.(*Pipeline).Wrap in goroutine 1
grafana-agent[2859804]:         /go/pkg/mod/github.com/grafana/[email protected]/clients/pkg/logentry/stages/pipeline.go:150 +0x137

Which issue(s) this PR fixes:
n/a

Special notes for your reviewer:

I've also tested the code in branch k190 (removing v3 version in import statement) and I built the grafana-agent with it.
In case of duplicate job_name it failed at the startup with this error message:

ts=2024-07-31T08:24:21.612580848Z caller=main.go:78 level=error msg="error creating the agent server entrypoint" err="unable to apply config for global_grafana_promtail: unable to create logs instance: `job_name` must be unique for each scrape_config, a duplicate `job_name` of agent was found"

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@fgouteroux fgouteroux requested a review from a team as a code owner July 31, 2024 09:06
Signed-off-by: François Gouteroux <[email protected]>
@fgouteroux fgouteroux force-pushed the fix/validate_scrape_config_job_name branch from 4d0a9e3 to c020273 Compare July 31, 2024 09:13
@cstyan
Copy link
Contributor

cstyan commented Aug 5, 2024

I think this is a good thing to fix, but imo we should do it as early as possible. We can reuse the same model Prometheus does and override the structs yaml unmarshal function: https://github.com/prometheus/prometheus/blob/main/config/config.go#L371-L382

diff --git a/clients/pkg/promtail/config/config.go b/clients/pkg/promtail/config/config.go
index 615b8e9ab..abe7ed9d2 100644
--- a/clients/pkg/promtail/config/config.go
+++ b/clients/pkg/promtail/config/config.go
@@ -40,6 +40,29 @@ type Config struct {
 	WAL             wal.Config            `yaml:"wal"`
 }

+// UnmarshalYAML implements the yaml.Unmarshaler interface.
+func (c *Config) UnmarshalYAML(unmarshal func(interface{}) error) error {
+	fmt.Println("unmarshalling config")
+	*c = Config{}
+	// We want to set c to the defaults and then overwrite it with the input.
+	// To make unmarshal fill the plain data struct rather than calling UnmarshalYAML
+	// again, we have to hide it using a type indirection.
+	type plain Config
+	if err := unmarshal((*plain)(c)); err != nil {
+		return err
+	}
+
+	jobNames := map[string]struct{}{}
+	for _, j := range c.ScrapeConfig {
+		fmt.Println("jobname: ", j.JobName)
+		if _, ok := jobNames[j.JobName]; ok {
+			return fmt.Errorf("found multiple scrape configs with job name %q", j.JobName)
+		}
+		jobNames[j.JobName] = struct{}{}
+	}
+	return nil
+}
+
 // RegisterFlags with prefix registers flags where every name is prefixed by
 // prefix. If prefix is a non-empty string, prefix should end with a period.
 func (c *Config) RegisterFlagsWithPrefix(prefix string, f *flag.FlagSet) {
diff --git a/clients/pkg/promtail/config/config_test.go b/clients/pkg/promtail/config/config_test.go
index 32bab7050..3f764666f 100644
--- a/clients/pkg/promtail/config/config_test.go
+++ b/clients/pkg/promtail/config/config_test.go
@@ -47,12 +47,47 @@ clients:
       name: value
 `

+const testDuplicateJobsFile = `
+clients:
+  - external_labels:
+      cluster: dev1
+    url: https://1:[email protected]/loki/api/v1/push
+  - external_labels:
+      cluster: prod1
+    url: https://1:[email protected]/loki/api/v1/push
+scrape_configs:
+  - job_name: kubernetes-pods-name
+    kubernetes_sd_configs:
+      - role: pod
+  - job_name: system
+    static_configs:
+    - targets:
+      - localhost
+      labels:
+        job: varlogs
+  - job_name: system
+    static_configs:
+    - targets:
+      - localhost
+      labels:
+        job: varlogs2
+limits_config:
+  readline_rate: 100
+  readline_burst: 200
+`
+
 func Test_Load(t *testing.T) {
 	var dst Config
 	err := yaml.Unmarshal([]byte(testFile), &dst)
 	require.Nil(t, err)
 }

+func Test_Load_Duplicates(t *testing.T) {
+	var dst Config
+	err := yaml.Unmarshal([]byte(testDuplicateJobsFile), &dst)
+	require.Nil(t, err)
+}
+
 func TestHeadersConfigLoad(t *testing.T) {
 	var dst Config
 	err := yaml.Unmarshal([]byte(headersTestFile), &dst)

@grafana/agent-squad does this seem like the right fix to you guys?

@ptodev
Copy link
Contributor

ptodev commented Aug 19, 2024

Hi, I gave the PR a ✅ since I don't mind the fix staying like this, but I do agree that checking the config during the unmarshal stage would be better. @fgouteroux, would it be possible for you to do the check during the unmarshal?

@@ -139,10 +140,16 @@ func (p *Promtail) reloadConfig(cfg *config.Config) error {
}

cfg.Setup(p.logger)
var err error
err = scrapeconfig.ValidateJobName(cfg.ScrapeConfig)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we only need the check during a config reload? Is it not also necessary when Promtail starts?

ptodev

This comment was marked as outdated.

Copy link
Contributor

@ptodev ptodev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to remove the ✅ until we clarify how we'd like to proceed 😄

@fgouteroux
Copy link
Contributor Author

Do you want I implement what @cstyan suggest ?

@ptodev
Copy link
Contributor

ptodev commented Aug 27, 2024

Do you want I implement what @cstyan suggest ?

@fgouteroux yes, it sounds reasonable.

@fgouteroux
Copy link
Contributor Author

Do you want I implement what @cstyan suggest ?

@fgouteroux yes, it sounds reasonable.

@cstyan implementation done

@fgouteroux fgouteroux requested a review from ptodev August 28, 2024 08:59
Copy link
Contributor

@cstyan cstyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fgouteroux sorry, I was out on vacation

LGTM, thanks!

@cstyan cstyan merged commit f2d3499 into grafana:main Sep 18, 2024
62 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants