CHANGELOG-3.2.md


Previous change logs can be found at CHANGELOG-3.1.

v3.2.24 (TBD 2018-07)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

Improved

Metrics, Monitoring

Note that any etcd_debugging_* metrics are experimental and subject to change.

  • Add etcd_server_slow_read_indexes_total Prometheus metric.
  • Add etcd_server_quota_backend_bytes Prometheus metric.
    • Use it with etcd_mvcc_db_total_size_in_bytes and etcd_mvcc_db_total_size_in_use_in_bytes.
    • etcd_server_quota_backend_bytes 2.147483648e+09 means current quota size is 2 GB.
    • etcd_mvcc_db_total_size_in_bytes 20480 means current physically allocated DB size is 20 KB.
    • etcd_mvcc_db_total_size_in_use_in_bytes 16384 means the DB size would be 16 KB once a defragment operation completes.
    • etcd_mvcc_db_total_size_in_bytes - etcd_mvcc_db_total_size_in_use_in_bytes is the number of bytes that can be saved on disk with defragment operation.
  • Add etcd_mvcc_db_total_size_in_bytes Prometheus metric.
  • Add etcd_mvcc_db_total_size_in_use_in_bytes Prometheus metric.
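The relationship between these three metrics can be worked through with the sample values quoted above. This is only a minimal sketch: the helper names reclaimableBytes and quotaUsedRatio are hypothetical and not part of etcd, and the values would normally be scraped from the /metrics endpoint rather than hard-coded.

```go
package main

import "fmt"

// reclaimableBytes returns how many bytes a defragment operation could free:
// the physically allocated size minus the size actually in use.
// (Hypothetical helper, not an etcd API.)
func reclaimableBytes(totalSize, inUse int64) int64 {
	if inUse > totalSize {
		return 0
	}
	return totalSize - inUse
}

// quotaUsedRatio returns the fraction of the backend quota occupied by the
// physically allocated database. (Hypothetical helper, not an etcd API.)
func quotaUsedRatio(totalSize, quota int64) float64 {
	if quota <= 0 {
		return 0
	}
	return float64(totalSize) / float64(quota)
}

func main() {
	const (
		quota     = int64(2147483648) // etcd_server_quota_backend_bytes
		totalSize = int64(20480)      // etcd_mvcc_db_total_size_in_bytes
		inUse     = int64(16384)      // etcd_mvcc_db_total_size_in_use_in_bytes
	)
	fmt.Println("reclaimable bytes:", reclaimableBytes(totalSize, inUse)) // 4096
	fmt.Printf("quota used: %.6f%%\n", 100*quotaUsedRatio(totalSize, quota))
}
```

With the sample values, a defragment operation would free 4096 bytes (4 KB).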

gRPC Proxy

Go

v3.2.23 (2018-06-15)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

Improved

Metrics, Monitoring

Note that any etcd_debugging_* metrics are experimental and subject to change.

Go

v3.2.22 (2018-06-06)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

Security, Authentication

  • Support TLS cipher suite whitelisting.

Go

v3.2.21 (2018-05-31)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

etcd server

  • Fix auth storage panic when simple token provider is disabled.
  • Fix mvcc server panic from restore operation.
    • Suppose a watcher was requested with a future revision X and sent to node A, which then became network-partitioned. Meanwhile, the cluster makes progress. When the partition is removed, the leader sends a snapshot to node A. Previously, if the snapshot's latest revision was still lower than the watch revision X, the etcd server panicked during the snapshot restore operation.
    • Now, this server-side panic has been fixed.

Go

v3.2.20 (2018-05-09)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

etcd server

  • Purge old *.snap.db snapshot files.
    • Previously, etcd did not respect the --max-snapshots flag when purging old *.snap.db files.
    • Now, etcd purges old *.snap.db files to keep at most --max-snapshots files on disk.
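The retention rule can be sketched as follows. This is not etcd's purge implementation: filesToPurge is a hypothetical helper, and it assumes that file names sort lexicographically from oldest to newest (for example, zero-padded indexes). It treats 0 as "unlimited", which is how the --max-snapshots flag is documented.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// filesToPurge returns the *.snap.db names that exceed the retention limit,
// oldest first. Assumption: names sort lexicographically from oldest to
// newest. (Hypothetical helper, not etcd's implementation.)
func filesToPurge(names []string, maxSnapshots int) []string {
	var snaps []string
	for _, n := range names {
		if strings.HasSuffix(n, ".snap.db") {
			snaps = append(snaps, n)
		}
	}
	sort.Strings(snaps)
	// 0 means unlimited retention; nothing is purged.
	if maxSnapshots <= 0 || len(snaps) <= maxSnapshots {
		return nil
	}
	return snaps[:len(snaps)-maxSnapshots]
}

func main() {
	names := []string{
		"0000000000000003.snap.db",
		"0000000000000001.snap.db",
		"0000000000000002.snap.db",
		"0000000000000001.wal", // ignored: not a *.snap.db file
	}
	fmt.Println(filesToPurge(names, 2)) // [0000000000000001.snap.db]
}
```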

Go

v3.2.19 (2018-04-24)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

Metrics, Monitoring

Note that any etcd_debugging_* metrics are experimental and subject to change.

Security, Authentication

  • Fix TLS reload when certificate SAN field only includes IP addresses but no domain names.
    • In Go, the server calls (*tls.Config).GetCertificate for TLS reload if and only if the server's (*tls.Config).Certificates field is empty, or (*tls.ClientHelloInfo).ServerName is not empty with a valid SNI from the client. Previously, etcd always populated (*tls.Config).Certificates as non-empty on the initial client TLS handshake. Thus, the client was always expected to supply a matching SNI in order to pass the TLS verification and to trigger (*tls.Config).GetCertificate to reload TLS assets.
    • However, a certificate whose SAN field includes only IP addresses and no domain names results in a *tls.ClientHelloInfo with an empty ServerName field, which failed to trigger the TLS reload on the initial TLS handshake; this becomes a problem when expired certificates need to be replaced online.
    • Now, (*tls.Config).Certificates is created empty on the initial TLS client handshake, first to trigger (*tls.Config).GetCertificate, and then to populate the rest of the certificates on every new TLS connection, even when the client SNI is empty (e.g. the cert only includes IPs).

etcd server

  • Add etcd --initial-election-tick-advance flag to configure initial election tick fast-forward.
    • By default, etcd --initial-election-tick-advance=true, so the local member fast-forwards election ticks to speed up the "initial" leader election trigger.
    • This benefits deployments with larger election ticks. For instance, a cross-datacenter deployment may require a longer election timeout of 10 seconds. If true, the local node does not need to wait up to 10 seconds. Instead, it forwards its election ticks to 8 seconds, leaving only 2 seconds before leader election.
    • The major assumptions are that either the cluster has no active leader, so advancing ticks enables faster leader election, or the cluster already has an established leader, and the rejoining follower is likely to receive heartbeats from the leader after the tick advance and before the election timeout.
    • However, when the network from the leader to the rejoining follower is congested, and the follower does not receive a leader heartbeat within the remaining election ticks, a disruptive election has to happen, affecting cluster availability.
    • Now, this can be disabled by setting --initial-election-tick-advance=false.
    • Disabling this slows down the initial bootstrap process for cross-datacenter deployments. Configure --initial-election-tick-advance with this trade-off in mind: fewer disruptive elections at the cost of a slower initial bootstrap.
    • On a single-node cluster, ticks are advanced regardless.
    • Address disruptive rejoining follower node.

Go

v3.2.18 (2018-03-29)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

Improved

  • Adjust election timeout on server restart to reduce disruptive rejoining servers.
    • Previously, etcd fast-forwarded election ticks on server start, with only one tick left for leader election. This was to speed up the start phase, without having to wait until all election ticks elapse. Advancing election ticks is useful for cross-datacenter deployments with larger election timeouts. However, it affected cluster availability if the last tick elapsed before the leader contacted the restarted node.
    • Now, when etcd restarts, it adjusts election ticks with more than one tick left, giving the leader more time to prevent a disruptive restart.

Metrics, Monitoring

Note that any etcd_debugging_* metrics are experimental and subject to change.

Go

v3.2.17 (2018-03-08)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

etcd server

Proxy v2

Go

v3.2.16 (2018-02-12)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

etcd server

  • Fix mvcc "unsynced" watcher restore operation.
    • An "unsynced" watcher is a watcher that still needs to catch up with events that have already happened.
    • That is, an "unsynced" watcher is a slow watcher that was requested on an old revision.
    • The "unsynced" watcher restore operation was not correctly populating its underlying watcher group, which could cause "unsynced" watchers to miss events.
    • A node gets network-partitioned with a watcher on a future revision, falls behind, and receives a leader snapshot after the partition is removed. When applying this snapshot, etcd watch storage moves current synced watchers to unsynced, since synced watchers might have become stale during the network partition, and resets the synced watcher group to restart watcher routines. Previously, there was a bug when moving watchers from the synced watcher group to unsynced, so a client would miss events when the watcher was requested on the network-partitioned node.

Go

v3.2.15 (2018-01-22)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

etcd server

Go

v3.2.14 (2018-01-11)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

Improved

etcd server

Go

v3.2.13 (2018-01-02)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

etcd server

Go

v3.2.12 (2017-12-20)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

Dependency

etcd server

client v3

Go

v3.2.11 (2017-12-05)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

Dependency

Security, Authentication

See security doc for more details.

client v3

Documentation

  • Remove --listen-metrics-urls flag in monitoring document (non-released in v3.2.x, planned for v3.3.x).

Go

v3.2.10 (2017-11-16)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

Dependency

Security, Authentication

See security doc for more details.

  • Revert discovery SRV auth ServerName with *.{ROOT_DOMAIN} to support non-wildcard subject alternative names in the certs (see issue #8445 for more context).
    • For instance, etcd --discovery-srv=etcd.local will only authenticate peers/clients when the provided certs have root domain etcd.local (not *.etcd.local) as an entry in Subject Alternative Name (SAN) field.
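Why the choice between etcd.local and *.etcd.local matters can be seen with Go's standard hostname verification. The sketch below only approximates the check: sanMatches is a hypothetical helper that builds a bare x509.Certificate with the given DNS SAN entries, and it does not reproduce etcd's peer authentication code.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// sanMatches reports whether a certificate carrying the given DNS SAN entries
// is accepted for serverName under Go's standard hostname verification.
// (Hypothetical helper, not an etcd API.)
func sanMatches(dnsNames []string, serverName string) bool {
	cert := &x509.Certificate{DNSNames: dnsNames}
	return cert.VerifyHostname(serverName) == nil
}

func main() {
	// A root-domain SAN matches the root domain itself.
	fmt.Println(sanMatches([]string{"etcd.local"}, "etcd.local")) // true

	// A wildcard SAN covers one sub-domain label, not the root domain.
	fmt.Println(sanMatches([]string{"*.etcd.local"}, "etcd.local"))    // false
	fmt.Println(sanMatches([]string{"*.etcd.local"}, "m1.etcd.local")) // true
}
```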

etcd server

client v3

Go

v3.2.9 (2017-10-06)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

Security, Authentication

See security doc for more details.

  • Update golang.org/x/crypto/bcrypt (see golang/crypto@6c586e1).
  • Fix discovery SRV bootstrapping to authenticate ServerName with *.{ROOT_DOMAIN}, in order to support sub-domain wildcard matching (see issue #8445 for more context).
    • For instance, etcd --discovery-srv=etcd.local will only authenticate peers/clients when the provided certs have the wildcard domain *.etcd.local as an entry in the Subject Alternative Name (SAN) field.

Go

v3.2.8 (2017-09-29)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

client v2

  • Fix v2 client failover to next endpoint on mutable operation.

gRPC Proxy

Go

v3.2.7 (2017-09-01)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

Security, Authentication

client v3

Go

v3.2.6 (2017-08-21)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

etcd server

  • Fix watch restore from snapshot.
  • Fix multiple URLs for --listen-peer-urls flag.
  • Add --enable-pprof flag to etcd configuration file format.

Metrics, Monitoring

Note that any etcd_debugging_* metrics are experimental and subject to change.

  • Fix etcd_debugging_mvcc_keys_total inconsistency.

Go

v3.2.5 (2017-08-04)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

etcdctl v3

  • Return non-zero exit code on unhealthy endpoint health.

Security, Authentication

See security doc for more details.

  • Server supports reverse-lookup on wildcard DNS SAN.
    • For instance, if the peer cert contains only DNS names (no IP addresses) in the Subject Alternative Name (SAN) field, the server first reverse-lookups the remote IP address to get a list of names mapping to that address (e.g. nslookup IPADDR), then accepts the connection if any of those names matches one of the peer cert's DNS names (either by exact or wildcard match).
    • If none matches, the server forward-lookups each DNS entry in the peer cert (e.g. looks up example.default.svc when the entry is *.example.default.svc), and accepts the connection only when one of the host's resolved addresses matches the peer's remote IP address.
    • For example, peer B's CSR (with cfssl) SAN field is ["*.example.default.svc", "*.example.default.svc.cluster.local"] when peer B's remote IP address is 10.138.0.2. When peer B tries to join the cluster, peer A reverse-lookups the IP 10.138.0.2 to get the list of host names, and matches them, exactly or by wildcard, against peer B's cert DNS names in the SAN field.
    • If neither the reverse nor the forward lookup succeeds, it returns the error "tls: "10.138.0.2" does not match any of DNSNames ["*.example.default.svc","*.example.default.svc.cluster.local"]". See issue#8268 for more detail.
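The two-step check described above (reverse lookup first, forward lookup as a fallback) can be sketched as follows. Everything here is illustrative: peerAllowed, matchDNSName, and the resolver type are hypothetical, DNS is replaced by injected functions so that the example runs without network access, and the wildcard rule assumes the wildcard stands for exactly one label.

```go
package main

import (
	"fmt"
	"strings"
)

// matchDNSName reports whether host matches pattern, either exactly or
// through a single leading wildcard label such as "*.example.com".
func matchDNSName(pattern, host string) bool {
	if pattern == host {
		return true
	}
	if !strings.HasPrefix(pattern, "*.") {
		return false
	}
	i := strings.Index(host, ".") // the wildcard covers the first label only
	return i > 0 && host[i:] == pattern[1:]
}

// resolver abstracts DNS lookups (hypothetical type for this sketch).
type resolver struct {
	reverse func(ip string) []string   // IP address -> host names
	forward func(host string) []string // host name -> IP addresses
}

// peerAllowed applies the reverse lookup first, then the forward lookup.
func peerAllowed(remoteIP string, certDNSNames []string, r resolver) bool {
	// Step 1: names that map to the remote IP must match a cert DNS name.
	for _, name := range r.reverse(remoteIP) {
		name = strings.TrimSuffix(name, ".") // drop a trailing root dot
		for _, pattern := range certDNSNames {
			if matchDNSName(pattern, name) {
				return true
			}
		}
	}
	// Step 2: resolve each cert DNS entry (wildcard prefix stripped) and
	// compare the resulting addresses with the remote IP.
	for _, pattern := range certDNSNames {
		for _, ip := range r.forward(strings.TrimPrefix(pattern, "*.")) {
			if ip == remoteIP {
				return true
			}
		}
	}
	return false
}

func main() {
	sans := []string{"*.example.default.svc", "*.example.default.svc.cluster.local"}
	r := resolver{
		reverse: func(string) []string { return []string{"b.example.default.svc."} },
		forward: func(string) []string { return nil },
	}
	fmt.Println(peerAllowed("10.138.0.2", sans, r)) // true
}
```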

Metrics, Monitoring

Note that any etcd_debugging_* metrics are experimental and subject to change.

  • Fix unreachable /metrics endpoint when --enable-v2=false.

gRPC Proxy

Other

  • Add container registry gcr.io/etcd-development/etcd.

Go

v3.2.4 (2017-07-19)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

etcd server

  • Do not block on active client stream when stopping server

gRPC proxy

  • Fix gRPC proxy Snapshot RPC error handling

Go

v3.2.3 (2017-07-14)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

client v3

  • Let clients establish unlimited streams

Other

  • Tag docker images with minor versions
    • e.g. docker pull quay.io/coreos/etcd:v3.2 to fetch latest v3.2 versions

Go

v3.2.2 (2017-07-07)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

Improved

  • Rate-limit lease revoke on expiration.
  • Extend leases on promote to avoid queueing effect on lease expiration.

Security, Authentication

See security doc for more details.

  • Server accepts connections if IP matches, without checking DNS entries. For instance, if the peer cert contains IP addresses and DNS names in the Subject Alternative Name (SAN) field, and the remote IP address matches one of those IP addresses, the server just accepts the connection without further checking the DNS names. For example, peer B's CSR (with cfssl) SAN field is ["invalid.domain", "10.138.0.2"] when peer B's remote IP address is 10.138.0.2 and invalid.domain is an invalid host. When peer B tries to join the cluster, peer A successfully authenticates B, since the Subject Alternative Name (SAN) field has a valid matching IP address. See issue#8206 for more detail.

etcd server

  • Accept connection with matched IP SAN but no DNS match.
    • Don't check DNS entries in certs if there's a matching IP.

gRPC gateway

  • Use user-provided listen address to connect to gRPC gateway.
    • net.Listener rewrites IPv4 0.0.0.0 to IPv6 [::], breaking hosts that have IPv6 disabled.
    • Only v3.2.0, v3.2.1 are affected.

Go

v3.2.1 (2017-06-23)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

etcd server

  • Fix backend database in-memory index corruption issue on restore (only 3.2.0 is affected).

gRPC gateway

  • Fix Txn marshaling.

Metrics, Monitoring

Note that any etcd_debugging_* metrics are experimental and subject to change.

  • Fix backend database size debugging metrics.

Go

v3.2.0 (2017-06-09)

See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.

Improved

  • Improve backend read concurrency.

Breaking Changes

  • Increased --snapshot-count default value from 10,000 to 100,000.
    • Higher snapshot count means it holds Raft entries in memory for longer before discarding old entries.
    • It is a trade-off between less frequent snapshotting and higher memory usage.
    • Use a lower --snapshot-count value for lower memory usage.
    • Use a higher --snapshot-count value for better availability of slow followers (less frequent snapshots from the leader).
  • clientv3.Lease.TimeToLive returns LeaseTimeToLiveResponse.TTL == -1 on lease not found.
  • clientv3.NewFromConfigFile is moved to clientv3/yaml.NewConfig.
  • embed.Etcd.Peers field is now []*peerListener.
  • Reject domain names for --listen-peer-urls and --listen-client-urls (3.1 only prints out warnings), since a domain name is invalid for network interface binding.

Dependency

Metrics, Monitoring

Note that any etcd_debugging_* metrics are experimental and subject to change.

  • Add etcd_debugging_server_lease_expired_total metrics.

Security, Authentication

See security doc for more details.

  • TLS certificates get reloaded on every client connection. This is useful when replacing expired certs without stopping etcd servers; it can be done by overwriting old certs with new ones. Refreshing certs for every connection should not have too much overhead, but can be improved in the future with a caching layer. Example tests can be found here.
  • Server denies incoming peer certs with wrong IP SAN. For instance, if the peer cert contains any IP addresses in the Subject Alternative Name (SAN) field, the server authenticates a peer only when the remote IP address matches one of those IP addresses. This is to prevent unauthorized endpoints from joining the cluster. For example, peer B's CSR (with cfssl) SAN field is ["*.example.default.svc", "*.example.default.svc.cluster.local", "10.138.0.27"] when peer B's actual IP address is 10.138.0.2, not 10.138.0.27. When peer B tries to join the cluster, peer A will reject B with the error x509: certificate is valid for 10.138.0.27, not 10.138.0.2, because B's remote IP address does not match the one in the Subject Alternative Name (SAN) field.
  • Server resolves TLS DNSNames when checking SAN. For instance, if the peer cert contains only DNS names (no IP addresses) in the Subject Alternative Name (SAN) field, the server authenticates a peer only when forward-lookups (dig b.com) on those DNS names return an IP that matches the remote IP address. For example, peer B's CSR (with cfssl) SAN field is ["b.com"] when peer B's remote IP address is 10.138.0.2. When peer B tries to join the cluster, peer A looks up the incoming host b.com to get the list of IP addresses (e.g. dig b.com), and rejects B if the list does not contain the IP 10.138.0.2, with the error tls: 10.138.0.2 does not match any of DNSNames ["b.com"].
  • Auth supports JWT tokens.
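The IP SAN rule can be approximated with Go's standard x509 verification, which matches an IP address only against the IP SAN entries of a certificate. In this minimal sketch, checkPeerIP is a hypothetical helper operating on a bare x509.Certificate; it is not etcd's peer authentication code.

```go
package main

import (
	"crypto/x509"
	"fmt"
	"net"
)

// checkPeerIP verifies remoteIP against the given IP SAN entries using the
// standard library's hostname verification. It returns nil on a match.
// (Hypothetical helper, not an etcd API.)
func checkPeerIP(ipSANs []string, remoteIP string) error {
	cert := &x509.Certificate{}
	for _, s := range ipSANs {
		if ip := net.ParseIP(s); ip != nil {
			cert.IPAddresses = append(cert.IPAddresses, ip)
		}
	}
	return cert.VerifyHostname(remoteIP)
}

func main() {
	// Remote IP is listed in the SAN field: accepted.
	fmt.Println(checkPeerIP([]string{"10.138.0.27"}, "10.138.0.27") == nil) // true

	// Remote IP differs from the SAN entry: rejected with an error.
	if err := checkPeerIP([]string{"10.138.0.27"}, "10.138.0.2"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```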

etcd server

  • RPCs
    • Add Election, Lock service.
  • Native client etcdserver/api/v3client
    • client "embedded" in the server.
  • Logging, monitoring
    • Server warns large snapshot operations.
  • Add etcd --enable-v2 flag to enable v2 API server.
    • etcd --enable-v2=true by default.
  • Add etcd --auth-token flag.
  • v3.2 compactor runs every hour.
    • Compactor only supports periodic compaction.
    • Compactor continues to record latest revisions every 5 minutes.
    • Every hour, it uses the last revision that was fetched before the compaction period, from the revision records that were collected every 5 minutes.
    • That is, every hour, the compactor discards historical data created before the compaction period.
    • The retention window of the compaction period moves to the next hour.
    • For instance, when hourly writes are 100 and --auto-compaction-retention=10, v3.1 compacts revisions 1000, 2000, and 3000 every 10 hours, while v3.2 compacts revisions 1000, 1100, and 1200 every hour.
    • If compaction succeeds or the requested revision has already been compacted, it resets the period timer and removes the used compacted revision from the historical revision records (e.g. it starts the next revision collection and compaction from previously collected revisions).
    • If compaction fails, it retries in 5 minutes.
  • Allow snapshot over 512MB.

client v3

  • STM prefetching.
  • Add namespace feature.
  • Add ErrOldCluster with server version checking.
  • Translate WithPrefix() into WithFromKey() for empty key.

etcdctl v3

  • Add check perf command.
  • Add etcdctl --from-key flag to role grant-permission command.
  • lock command takes an optional command to execute.

gRPC Proxy

  • Proxy endpoint discovery.
  • Namespaces.
  • Coalesce lease requests.

etcd gateway

Other

  • v3 client
    • concurrency package's elections updated to match RPC interfaces.
    • let client dial endpoints not in the balancer.
  • Release
    • Annotate acbuild with supports-systemd-notify.
    • Add nsswitch.conf to Docker container image.
    • Add ppc64le, arm64(experimental) builds.

Go