Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/postgresql] -- "Null values not handled in colum client_addr" #33107

Open
adussarps opened this issue May 17, 2024 · 5 comments
Open
Labels
bug Something isn't working receiver/postgresql

Comments

@adussarps
Copy link

adussarps commented May 17, 2024

Component(s)

receiver/postgresql

What happened?

Description

I've recently initiated monitoring on our SQL databases hosted on OVH using the PostgreSQL receiver. While the receiver successfully establishes a connection to the database, it encounters an error during the scraping process, resulting in failed metrics retrieval.

scraperhelper/scrapercontroller.go:197  Error scraping metrics  {"kind": "receiver", "name": "postgresql", "data_type": "metrics", "error": "sql: Scan error on column index 0, name \"client_addr\": converting NULL to string is unsupported", "scraper": "postgresql"}

Note that there is indeed a null value in the pg_stat_activity; 'client_addr' column.

Expected Result

Ideally, the collector should be capable of gracefully handling rows containing NULL values, thereby bypassing them during the scraping process rather than halting due to such instances.

This issue impedes our ability to effectively monitor our databases and necessitates a resolution to ensure seamless metric collection.

Collector version

v0.88.0

PostgreSQL - version 15

Environment information

Environment

Installed via Helm version v0.99.0 on Kubernetes

OpenTelemetry Collector configuration

extensions:
    health_check: {}
    basicauth/grafana_cloud_tempo:
      # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/basicauthextension
      client_auth:
        username: XX
        password: XX
    basicauth/grafana_cloud_prometheus:
      client_auth:
        username: XX
        password: XX
    basicauth/grafana_cloud_loki:
      client_auth:
        username: XX
        password: XX

  receivers:
    postgresql:
      endpoint: "postgresql-f32d4eca-o64f91e18.database.cloud.ovh.net:20184"
      username: avnadmin
      password: XXXXX
      databases: []
    hostmetrics:
      # Optional. Host Metrics Receiver added as an example of Infra Monitoring capabilities of the OpenTelemetry Collector
      # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver
      scrapers:
        load:
        memory:

  processors:
    resource:
      attributes:
      - action: insert
        key: service_name
        from_attribute: service.name
      - action: insert
        key: service_name
        from_attribute: k8s.deployment.name
      - action: insert
        key: service_name
        from_attribute: k8s.service.name
      - action: insert
        key: service_namespace
        from_attribute: service.namespace
      - action: insert
        key: service_namespace
        from_attribute: k8s.namespace.name
      - action: insert
        key: service_version
        from_attribute: service.version
      - action: insert
        key: deployment_environment
        from_attribute: deployment.environment
      - action: insert
        key: environment
        value: staging
      - action: insert
        key: project
        value: xano-weweb
      - action: insert
        key: loki.resource.labels
        value: service_name,service_namespace,service_version,deployment_environment,project,environment
    resourcedetection:
      # Enriches telemetry data with resource information from the host
      # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor
      detectors: ["env", "system"]
      override: false
    transform/add_resource_attributes_as_metric_attributes:
      # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/transformprocessor
      error_mode: ignore
      metric_statements:
        - context: datapoint
          statements:
            - set(attributes["namespace"], resource.attributes["k8s.namespace.name"])
            - set(attributes["container"], resource.attributes["k8s.container.name"])
            - set(attributes["pod"], resource.attributes["k8s.pod.name"])

  exporters:
    debug:
      verbosity: normal
      sampling_initial: 5
      sampling_thereafter: 200
    otlp/grafana_cloud_traces:
      # https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/otlpexporter
      endpoint: "tempo-prod-10-prod-eu-west-2.grafana.net:443"
      auth:
        authenticator: basicauth/grafana_cloud_tempo

    loki/grafana_cloud_logs:
      # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/lokiexporter
      endpoint: "https://logs-prod-012.grafana.net/loki/api/v1/push"
      auth:
        authenticator: basicauth/grafana_cloud_loki

    prometheusremotewrite/grafana_cloud_metrics:
      # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/prometheusremotewriteexporter
      endpoint: XXXX
      resource_to_telemetry_conversion:
        enabled: true
      add_metric_suffixes: false
      auth:
        authenticator: basicauth/grafana_cloud_prometheus


  service:
    extensions: [health_check, basicauth/grafana_cloud_tempo, basicauth/grafana_cloud_prometheus, basicauth/grafana_cloud_loki]
    pipelines:
      traces:
        receivers: [otlp]
        processors: [resourcedetection]
        exporters: [otlp/grafana_cloud_traces]
      metrics:
        receivers: [otlp, hostmetrics, postgresql]
        processors: [resourcedetection, transform/add_resource_attributes_as_metric_attributes]
        exporters: [prometheusremotewrite/grafana_cloud_metrics]
      logs:
        receivers: [otlp]
        processors: [resourcedetection, resource]
        exporters: [loki/grafana_cloud_logs]

Log output

2024-05-17T09:45:45.624Z	info	[email protected]/service.go:99	Setting up own telemetry...
2024-05-17T09:45:45.625Z	info	[email protected]/telemetry.go:103	Serving metrics	{"address": "10.2.0.80:8888", "level": "Normal"}
2024-05-17T09:45:45.628Z	info	[email protected]/exporter.go:44	using the new Loki exporter	{"kind": "exporter", "data_type": "logs", "name": "loki/grafana_cloud_logs"}
2024-05-17T09:45:45.628Z	warn	[email protected]/scraper.go:85	Feature gate receiver.postgresql.separateSchemaAttr is not enabled. Please see the README for more information: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.88.0/receiver/postgresqlreceiver/README.md	{"kind": "receiver", "name": "postgresql", "data_type": "metrics"}
2024-05-17T09:45:45.630Z	info	[email protected]/service.go:166	Starting otelcol-contrib...	{"Version": "0.99.0", "NumCPU": 2}
2024-05-17T09:45:45.630Z	info	extensions/extensions.go:34	Starting extensions...
2024-05-17T09:45:45.630Z	info	extensions/extensions.go:37	Extension is starting...	{"kind": "extension", "name": "basicauth/grafana_cloud_loki"}
2024-05-17T09:45:45.630Z	info	extensions/extensions.go:52	Extension started.	{"kind": "extension", "name": "basicauth/grafana_cloud_loki"}
2024-05-17T09:45:45.630Z	info	extensions/extensions.go:37	Extension is starting...	{"kind": "extension", "name": "basicauth/grafana_cloud_prometheus"}
2024-05-17T09:45:45.630Z	info	extensions/extensions.go:52	Extension started.	{"kind": "extension", "name": "basicauth/grafana_cloud_prometheus"}
2024-05-17T09:45:45.630Z	info	extensions/extensions.go:37	Extension is starting...	{"kind": "extension", "name": "basicauth/grafana_cloud_tempo"}
2024-05-17T09:45:45.630Z	info	extensions/extensions.go:52	Extension started.	{"kind": "extension", "name": "basicauth/grafana_cloud_tempo"}
2024-05-17T09:45:45.630Z	info	extensions/extensions.go:37	Extension is starting...	{"kind": "extension", "name": "health_check"}
2024-05-17T09:45:45.630Z	info	[email protected]/healthcheckextension.go:35	Starting health_check extension	{"kind": "extension", "name": "health_check", "config": {"Endpoint":"10.2.0.80:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2024-05-17T09:45:45.634Z	info	extensions/extensions.go:52	Extension started.	{"kind": "extension", "name": "health_check"}
2024-05-17T09:45:45.635Z	info	internal/resourcedetection.go:125	began detecting resource information	{"kind": "processor", "name": "resourcedetection", "pipeline": "metrics"}
2024-05-17T09:45:45.635Z	info	system/system.go:209	This attribute changed from int to string. Temporarily switch back to int using the feature gate.	{"kind": "processor", "name": "resourcedetection", "pipeline": "metrics", "attribute": "host.cpu.family", "feature gate": "processor.resourcedetection.hostCPUModelAndFamilyAsString"}
2024-05-17T09:45:45.635Z	info	system/system.go:228	This attribute changed from int to string. Temporarily switch back to int using the feature gate.	{"kind": "processor", "name": "resourcedetection", "pipeline": "metrics", "attribute": "host.cpu.model.id", "feature gate": "processor.resourcedetection.hostCPUModelAndFamilyAsString"}
2024-05-17T09:45:45.635Z	info	system/system.go:247	This attribute will change from int to string. Switch now using the feature gate.	{"kind": "processor", "name": "resourcedetection", "pipeline": "metrics", "attribute": "host.cpu.stepping", "feature gate": "processor.resourcedetection.hostCPUSteppingAsString"}
2024-05-17T09:45:45.635Z	info	internal/resourcedetection.go:139	detected resource information	{"kind": "processor", "name": "resourcedetection", "pipeline": "metrics", "resource": {"host.name":"my-opentelemetry-collector-9c7d446cb-g8mr7","os.type":"linux"}}
2024-05-17T09:45:45.638Z	info	[email protected]/otlp.go:102	Starting GRPC server	{"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "10.2.0.80:4317"}
2024-05-17T09:45:45.638Z	info	[email protected]/otlp.go:152	Starting HTTP server	{"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "10.2.0.80:4318"}
2024-05-17T09:45:45.719Z	info	[email protected]/receiver.go:63	Starting shared informers and wait for initial cache sync.	{"kind": "receiver", "name": "k8s_cluster", "data_type": "metrics"}
2024-05-17T09:45:45.720Z	info	[email protected]/receiver.go:74	Object Receiver started	{"kind": "receiver", "name": "k8sobjects", "data_type": "logs"}
2024-05-17T09:45:45.720Z	info	[email protected]/receiver.go:101	Started collecting	{"kind": "receiver", "name": "k8sobjects", "data_type": "logs", "gvr": "events.k8s.io/v1, Resource=events", "mode": "watch", "namespaces": []}
2024-05-17T09:45:45.720Z	info	healthcheck/handler.go:132	Health Check state change	{"kind": "extension", "name": "health_check", "status": "ready"}
2024-05-17T09:45:45.720Z	info	[email protected]/service.go:192	Everything is ready. Begin running and processing data.
2024-05-17T09:45:45.720Z	warn	localhostgate/featuregate.go:63	The default endpoints for all servers in components will change to use localhost instead of 0.0.0.0 in a future version. Use the feature gate to preview the new default.	{"feature gate ID": "component.UseLocalHostAsDefaultHost"}
2024-05-17T09:45:45.820Z	info	[email protected]/receiver.go:84	Completed syncing shared informer caches.	{"kind": "receiver", "name": "k8s_cluster", "data_type": "metrics"}
2024-05-17T09:45:46.800Z	error	scraperhelper/scrapercontroller.go:197	Error scraping metrics	{"kind": "receiver", "name": "postgresql", "data_type": "metrics", "error": "sql: Scan error on column index 0, name \"client_addr\": converting NULL to string is unsupported", "scraper": "postgresql"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
	go.opentelemetry.io/collector/[email protected]/scraperhelper/scrapercontroller.go:197
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
	go.opentelemetry.io/collector/[email protected]/scraperhelper/scrapercontroller.go:169
2024-05-17T09:45:56.767Z	error	scraperhelper/scrapercontroller.go:197	Error scraping metrics	{"kind": "receiver", "name": "postgresql", "data_type": "metrics", "error": "sql: Scan error on column index 0, name \"client_addr\": converting NULL to string is unsupported", "scraper": "postgresql"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
	go.opentelemetry.io/collector/[email protected]/scraperhelper/scrapercontroller.go:197
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
	go.opentelemetry.io/collector/[email protected]/scraperhelper/scrapercontroller.go:173

Additional context

No response

@adussarps adussarps added bug Something isn't working needs triage New item requiring triage labels May 17, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@djaglowski
Copy link
Member

Since client_addr is a common resource attribute, I think we should use a default value (e.g. "unknown") when it is null or empty.

@crobert-1 crobert-1 removed the needs triage New item requiring triage label May 17, 2024
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jul 17, 2024
djaglowski pushed a commit that referenced this issue Aug 7, 2024
…gresql replication (#34456)

**Description:**
When monitoring a postgresql system with replication over unix sockets,
the receiver was failing to scrape metrics due to an unexpected null
value. As the postgres
[documentation](https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-REPLICATION-VIEW)
states, the client_addr will have a null value if replication is being
done over unix sockets.

**Link to tracking Issue:**
#33107

**Testing:** Local testing was done. The existing tests do not cover
replication metrics, and after some time attempting to update the
integration tests to provide a dockerized replication setup using unix
sockets in testcontainers I had to admit defeat. I found two sources
online for others setting up dockerized replication but I was not able
to get it fully functioning using unix sockets
(https://stackoverflow.com/questions/75524147/create-postgres-replica-container
and
https://github.com/eremeykin/pg-primary-replica/blob/main/docker-compose.yaml)

**Documentation:** Documentation already claimed that unix sockets would
show the work 'unix' for the replication_client attribute. This was
previously incorrect (as it would break collection), but now will be
accurate.
@bamaas
Copy link

bamaas commented Sep 11, 2024

Any solution/workaround for this?

@github-actions github-actions bot removed the Stale label Sep 12, 2024
f7o pushed a commit to f7o/opentelemetry-collector-contrib that referenced this issue Sep 12, 2024
…gresql replication (open-telemetry#34456)

**Description:**
When monitoring a postgresql system with replication over unix sockets,
the receiver was failing to scrape metrics due to an unexpected null
value. As the postgres
[documentation](https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-REPLICATION-VIEW)
states, the client_addr will have a null value if replication is being
done over unix sockets.

**Link to tracking Issue:**
open-telemetry#33107

**Testing:** Local testing was done. The existing tests do not cover
replication metrics, and after some time attempting to update the
integration tests to provide a dockerized replication setup using unix
sockets in testcontainers I had to admit defeat. I found two sources
online for others setting up dockerized replication but I was not able
to get it fully functioning using unix sockets
(https://stackoverflow.com/questions/75524147/create-postgres-replica-container
and
https://github.com/eremeykin/pg-primary-replica/blob/main/docker-compose.yaml)

**Documentation:** Documentation already claimed that unix sockets would
show the work 'unix' for the replication_client attribute. This was
previously incorrect (as it would break collection), but now will be
accurate.
@djaglowski
Copy link
Member

Reviewing the thread, I think we just need a PR which sets a default value when client_addr is null.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working receiver/postgresql
Projects
None yet
Development

No branches or pull requests

4 participants