agent: Service deregistration blocked by ACLs #19717

Closed
Rabbit-st opened this issue Nov 22, 2023 · 6 comments

@Rabbit-st

Overview of the Issue

After upgrading from version 1.14.4 to version 1.16.2, services registered on some nodes intermittently fail their health checks. Restarting the Consul server restores them to normal.
[Screenshot: failed service instances]

Reproduction Steps

I don't know how to reproduce it, but it appears every once in a while.

Consul info for both Client and Server

Client info
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 0
        services = 0
build:
        prerelease = 
        revision = 68f81912
        version = 1.16.2
        version_metadata = 
consul:
        acl = enabled
        bootstrap = false
        known_datacenters = 1
        leader = false
        leader_addr = 10.60.112.132:8300
        server = true
raft:
        applied_index = 314534
        commit_index = 314534
        fsm_pending = 0
        last_contact = 70.541794ms
        last_log_index = 314534
        last_log_term = 37
        last_snapshot_index = 311338
        last_snapshot_term = 37
        latest_configuration = [{Suffrage:Voter ID:cc0f834e-8c67-d394-344f-ee5331ea663a Address:10.60.238.199:8300} {Suffrage:Voter ID:f5ec631a-488d-37de-b2b2-7914cd030996 Address:10.60.112.132:8300} {Suffrage:Voter ID:a72236a8-a966-bc19-578b-65021b8f12ca Address:10.60.97.199:8300}]
        latest_configuration_index = 0
        num_peers = 2
        protocol_version = 3
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Follower
        term = 37
runtime:
        arch = amd64
        cpu_count = 4
        goroutines = 144
        max_procs = 4
        os = linux
        version = go1.20.8
serf_lan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 9
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 77
        members = 3
        query_queue = 0
        query_time = 1
serf_wan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 40
        members = 3
        query_queue = 0
        query_time = 1
Server info
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 0
        services = 0
build:
        prerelease = 
        revision = 68f81912
        version = 1.16.2
        version_metadata = 
consul:
        acl = enabled
        bootstrap = false
        known_datacenters = 1
        leader = false
        leader_addr = 10.60.112.132:8300
        server = true
raft:
        applied_index = 314534
        commit_index = 314534
        fsm_pending = 0
        last_contact = 70.541794ms
        last_log_index = 314534
        last_log_term = 37
        last_snapshot_index = 311338
        last_snapshot_term = 37
        latest_configuration = [{Suffrage:Voter ID:cc0f834e-8c67-d394-344f-ee5331ea663a Address:10.60.238.199:8300} {Suffrage:Voter ID:f5ec631a-488d-37de-b2b2-7914cd030996 Address:10.60.112.132:8300} {Suffrage:Voter ID:a72236a8-a966-bc19-578b-65021b8f12ca Address:10.60.97.199:8300}]
        latest_configuration_index = 0
        num_peers = 2
        protocol_version = 3
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Follower
        term = 37
runtime:
        arch = amd64
        cpu_count = 4
        goroutines = 144
        max_procs = 4
        os = linux
        version = go1.20.8
serf_lan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 9
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 77
        members = 3
        query_queue = 0
        query_time = 1
serf_wan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 40
        members = 3
        query_queue = 0
        query_time = 1

Operating system and Environment details

Deployed using consul-k8s.

# consul-k8s status

==> Consul Status Summary
Name    Namespace       Status          Chart Version   AppVersion      Revision        Last Updated            
consul  consul          deployed        1.2.2           1.16.2          1               2023/10/12 10:32:19 CST

Log Fragments

2023-11-21T09:07:08.346Z [WARN]  agent: Coordinate update blocked by ACLs: accessorID="anonymous token"
2023-11-21T09:07:34.373Z [WARN]  agent: Coordinate update blocked by ACLs: accessorID="anonymous token"
2023-11-21T09:07:51.336Z [WARN]  agent: Coordinate update blocked by ACLs: accessorID="anonymous token"
2023-11-21T09:08:15.443Z [WARN]  agent: Coordinate update blocked by ACLs: accessorID="anonymous token"
2023-11-21T09:08:28.816Z [WARN]  agent: Node info update blocked by ACLs: node=cc0f834e-8c67-d394-344f-ee5331ea663a accessorID="anonymous token"
2023-11-21T09:08:28.817Z [WARN]  agent: Service deregistration blocked by ACLs: service=xxx_10.60.131.181_80 accessorID="anonymous token"
2023-11-21T09:08:28.817Z [WARN]  agent: Service deregistration blocked by ACLs: service=xxx_10.60.171.169_80 accessorID="anonymous token"
2023-11-21T09:08:28.817Z [WARN]  agent: Service deregistration blocked by ACLs: service=xxx_10.60.171.140_80 accessorID="anonymous token"
2023-11-21T09:08:28.818Z [WARN]  agent: Check deregistration blocked by ACLs: check=service:xxx_10.60.171.169_80 accessorID="anonymous token"
2023-11-21T09:08:28.818Z [WARN]  agent: Check deregistration blocked by ACLs: check=service:xxx_10.60.171.140_80 accessorID="anonymous token"
2023-11-21T09:08:28.818Z [WARN]  agent: Check deregistration blocked by ACLs: check=service:xxx_10.60.131.181_80 accessorID="anonymous token"
2023-11-21T09:08:33.416Z [WARN]  agent: Coordinate update blocked by ACLs: accessorID="anonymous token"
2023-11-21T09:08:54.231Z [WARN]  agent: Coordinate update blocked by ACLs: accessorID="anonymous token"
2023-11-21T09:09:23.554Z [WARN]  agent: Coordinate update blocked by ACLs: accessorID="anonymous token"
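
To see at a glance which operations are being rejected and under which token, the warnings above can be tallied with a short script. This is only a sketch: the regular expression is an assumption based on the log lines quoted in this issue, and the function name `summarize_blocked` is hypothetical, not part of Consul.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log fragment in this issue:
#   <timestamp> [WARN]  agent: <operation> blocked by ACLs: ... accessorID="<accessor>"
BLOCKED = re.compile(
    r'\[WARN\]\s+agent: (?P<operation>.+?) blocked by ACLs:.*accessorID="(?P<accessor>[^"]*)"'
)

# A few lines copied from the log fragment above.
SAMPLE_LOG = '''\
2023-11-21T09:08:15.443Z [WARN]  agent: Coordinate update blocked by ACLs: accessorID="anonymous token"
2023-11-21T09:08:28.816Z [WARN]  agent: Node info update blocked by ACLs: node=cc0f834e-8c67-d394-344f-ee5331ea663a accessorID="anonymous token"
2023-11-21T09:08:28.817Z [WARN]  agent: Service deregistration blocked by ACLs: service=xxx_10.60.131.181_80 accessorID="anonymous token"
2023-11-21T09:08:28.818Z [WARN]  agent: Check deregistration blocked by ACLs: check=service:xxx_10.60.171.169_80 accessorID="anonymous token"
2023-11-21T09:08:33.416Z [WARN]  agent: Coordinate update blocked by ACLs: accessorID="anonymous token"
'''


def summarize_blocked(lines):
    """Count ACL-blocked operations per (operation, accessor) pair."""
    counts = Counter()
    for line in lines:
        match = BLOCKED.search(line)
        if match:
            counts[(match["operation"], match["accessor"])] += 1
    return counts


if __name__ == "__main__":
    for (operation, accessor), n in summarize_blocked(SAMPLE_LOG.splitlines()).items():
        print(f"{n:3d}  {operation}  (accessor: {accessor})")
```

If every entry reports the anonymous token, as in the fragment above, the agent is making these requests without a token of its own.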
@huikang
Contributor

huikang commented Nov 22, 2023

Just want to gather more info to help us reproduce the issue:

  • are consul agents running in VM or K8s?
  • are the failed service instances (shown in the screenshot) registered at the server nodes?

@Rabbit-st
Author

Rabbit-st commented Nov 22, 2023

> Just want to gather more info to help us reproduce the issue:
>
>   • are consul agents running in VM or K8s?
>   • are the failed service instances (shown in the screenshot) registered at the server nodes?

  1. The agents run on k8s.
  2. Sorry, I described it incorrectly before. The problem occurs when a service has stopped: the Consul server does not automatically deregister the stopped service.

@huikang
Contributor

huikang commented Nov 29, 2023

@Rabbit-st

Thanks for the updated info. Consul-k8s should handle deregistering the service if you remove it with kubectl delete.

However, Consul will not deregister a stopped service automatically, since the service instance is stored in Consul's catalog. Consul won't route traffic to the failed instance, so connections from downstream will be directed to healthy instances of the service.

Could you provide more details about how the services were stopped? (Was it due to a genuine alarm or a k8s node failure?)

@AiJiangnan

I had the same problem and fixed it: the service needs an agent token to register and deregister.
https://developer.hashicorp.com/consul/docs/security/acl/tokens/create/create-an-agent-token
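
As a rough illustration of what sending a token looks like, the sketch below builds a deregistration request against the agent HTTP API (`PUT /v1/agent/service/deregister/:service_id`) with the token in the `X-Consul-Token` header. The agent address, the token value, and the helper name `build_deregister_request` are placeholders chosen for this example; the token must carry `service:write` on the service, and how tokens are provisioned in a consul-k8s deployment is outside the scope of this sketch.

```python
import urllib.request
from urllib.parse import quote


def build_deregister_request(service_id, token, agent_addr="http://127.0.0.1:8500"):
    """Build (but do not send) an authenticated deregistration request.

    agent_addr and token are placeholders; substitute your own values.
    """
    url = f"{agent_addr}/v1/agent/service/deregister/{quote(service_id, safe='')}"
    req = urllib.request.Request(url, method="PUT")
    # Without this header the request is evaluated as the anonymous token.
    req.add_header("X-Consul-Token", token)
    return req


if __name__ == "__main__":
    req = build_deregister_request("xxx_10.60.131.181_80", "example-token")
    print(req.get_method(), req.full_url)
    # To actually send it against a reachable agent:
    # urllib.request.urlopen(req)
```

A request sent without the header is treated as coming from the anonymous token, which matches the accessorID shown in the warnings earlier in this issue.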

@Rabbit-st
Author

> I had the same problem and fixed it: the service needs an agent token to register and deregister. https://developer.hashicorp.com/consul/docs/security/acl/tokens/create/create-an-agent-token

The token is already configured. The problem is on the client side: it does not support Consul 1.16.2. Consul recovered after downgrading the version.

@MageshSrinivasulu

MageshSrinivasulu commented Aug 5, 2024

@Rabbit-st Which version did you downgrade to? I'm facing similar issues. Also, please reopen this issue.
