Bump go-dqlite to v1.5.1 #29

Merged
merged 1 commit into from
Apr 23, 2020

freeekanayaka
Contributor

@freeekanayaka freeekanayaka commented Mar 25, 2020

I have no hard evidence, but anecdotally some testing that I have done suggests that v1.4.1 has a few fixes that might help with canonical/dqlite#190 and improve stability in general, especially for the kind of workload k8s produces.

EDIT: since v1.5.1 has been released, I've updated this PR to use v1.5.1 instead of v1.4.1. It shouldn't make any difference for kine, but just to have the latest.

@leolb-aphp

@ibuildthecloud hey! could we get this merged and bumped in k3s so that k3s-io/k3s#1639 can be solved (maybe)?
thank you

awesome work @freeekanayaka and @ibuildthecloud & rancher teams, love k3s and dqlite.

@leolb-aphp

@freeekanayaka still getting segfaults and panics in a multi master HA of k3s with dqlite 1.4.1 : k3s-io/k3s#1639 (comment)

@freeekanayaka
Contributor Author

> @freeekanayaka still getting segfaults and panics in a multi master HA of k3s with dqlite 1.4.1 : rancher/k3s#1639 (comment)

The go-dqlite 1.4.1 release won't be enough. You need to use the latest master commits in canonical/dqlite and canonical/raft C libraries.

Have you built them too?

@leolb-aphp

> > @freeekanayaka still getting segfaults and panics in a multi master HA of k3s with dqlite 1.4.1 : rancher/k3s#1639 (comment)
>
> The go-dqlite 1.4.1 release won't be enough. You need to use the latest master commits in canonical/dqlite and canonical/raft C libraries.
>
> Have you built them too?

Seems I had not, I had only replaced go-dqlite in go.mod and thought it would get the rest of dependencies on its own. I dug into the build system of k3s and it's a bit odd but it should be good now! In Dockerfile.dapper there's a line that gets a Docker image in which there's raft/dqlite sources. And that Docker image is built from a git repo with a version that has to be specified as well.

@freeekanayaka
Contributor Author

> > > @freeekanayaka still getting segfaults and panics in a multi master HA of k3s with dqlite 1.4.1 : rancher/k3s#1639 (comment)
> >
> > The go-dqlite 1.4.1 release won't be enough. You need to use the latest master commits in canonical/dqlite and canonical/raft C libraries.
> > Have you built them too?
>
> Seems I had not, I had only replaced go-dqlite in go.mod and thought it would get the rest of dependencies on its own. I dug into the build system of k3s and it's a bit odd but it should be good now! In Dockerfile.dapper there's a line that gets a Docker image in which there's raft/dqlite sources. And that Docker image is built from a git repo with a version that has to be specified as well.

I didn't cut a release yet, so there's currently no version tag. If using a commit hash (the one from HEAD master) does not work, please let me know and I'll do a release, since it's also about time.

@freeekanayaka
Contributor Author

> I didn't cut a release yet, so there's currently no version tag. If using a commit hash (the one from HEAD master) does not work, please let me know and I'll do a release, since it's also about time.

For sake of clarity I went on and cut 3 releases: go-dqlite v1.5.0, dqlite v1.4.1 and raft v0.9.18. Those should be the ones to test.

Note that for kine go-dqlite v1.5.0 is basically the same as v1.4.1, there are just more helpers added which are not used by kine. On the other hand dqlite v1.4.1 and raft v0.9.18 do have fixes that should be relevant for kine.

@leolb-aphp

@freeekanayaka Could do just what you said, observing stability now! Thanks.

@leolb-aphp

@freeekanayaka been running for 8 hours now with several services installed on top of the cluster, no issues in particular that come from dqlite I believe. Though I had some issues, but I don't think they're about dqlite.

@freeekanayaka
Contributor Author

> @freeekanayaka been running for 8 hours now with several services installed on top of the cluster, no issues in particular that come from dqlite I believe. Though I had some issues, but I don't think they're about dqlite.

Great! Thanks for the feedback, keep me posted if you notice anything.

@leolb-aphp

leolb-aphp commented Apr 17, 2020

@freeekanayaka As a side note, maybe get dqlite into https://github.com/google/oss-fuzz?
LXD would classify as critical to IT infrastructure I'd say. https://google.github.io/oss-fuzz/getting-started/accepting-new-projects/

@leolb-aphp

@freeekanayaka Ah.. speaking of which, it just crashed again:

avril 17 18:10:41 hostname k3s[32414]: Trace[447322305]: [880.788724ms] [880.788724ms] END
avril 17 18:10:41 hostname k3s[32414]: I0417 18:10:41.235540   32414 trace.go:116] Trace[1770918308]: "List etcd3" key:/persistentvolumes,resourceVersion:,limit:10000,continue: (started: 2020-04-17 18:10:40.159699161 +0200 CEST m=+0.922300603) (total time: 1.075796643s):
avril 17 18:10:41 hostname k3s[32414]: Trace[1770918308]: [1.075796643s] [1.075796643s] END
avril 17 18:10:41 hostname k3s[32414]: I0417 18:10:41.236895   32414 trace.go:116] Trace[512743640]: "List etcd3" key:/persistentvolumeclaims,resourceVersion:,limit:10000,continue: (started: 2020-04-17 18:10:40.160902868 +0200 CEST m=+0.923504310) (total time: 1.075948014s):
avril 17 18:10:41 hostname k3s[32414]: Trace[512743640]: [1.075948014s] [1.075948014s] END
avril 17 18:10:41 hostname k3s[32414]: I0417 18:10:41.251609   32414 trace.go:116] Trace[2105912555]: "List etcd3" key:/minions,resourceVersion:,limit:10000,continue: (started: 2020-04-17 18:10:40.164826944 +0200 CEST m=+0.927428357) (total time: 1.086728021s):
avril 17 18:10:41 hostname k3s[32414]: Trace[2105912555]: [1.086728021s] [1.086728021s] END
avril 17 18:10:41 hostname k3s[32414]: panic: runtime error: index out of range [147167] with length 143072
avril 17 18:10:41 hostname k3s[32414]: goroutine 739 [running]:
avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/internal/protocol.(*Message).lastByte(...)
avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/internal/protocol/message.go:476
avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/internal/protocol.(*Rows).Close(0xc00318c228, 0x10, 0x38b0820)
avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/internal/protocol/message.go:631 +0x14c
avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/driver.(*Rows).Close(0xc00318c200, 0x42ff6a, 0x7fa7e9579e40)
avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/driver/driver.go:560 +0x35
avril 17 18:10:41 hostname k3s[32414]: database/sql.(*Rows).close.func1()
avril 17 18:10:41 hostname k3s[32414]: /usr/local/go/src/database/sql/sql.go:3076 +0x3c
avril 17 18:10:41 hostname k3s[32414]: database/sql.withLock(0x4773ee0, 0xc006ec1100, 0xc001f554e8)
avril 17 18:10:41 hostname k3s[32414]: /usr/local/go/src/database/sql/sql.go:3184 +0x6d
avril 17 18:10:41 hostname k3s[32414]: database/sql.(*Rows).close(0xc00318c580, 0x0, 0x0, 0x0, 0x0)
avril 17 18:10:41 hostname k3s[32414]: /usr/local/go/src/database/sql/sql.go:3075 +0x129
avril 17 18:10:41 hostname k3s[32414]: database/sql.(*Rows).Close(...)
avril 17 18:10:41 hostname k3s[32414]: /usr/local/go/src/database/sql/sql.go:3059
avril 17 18:10:41 hostname k3s[32414]: database/sql.(*Rows).Next(0xc00318c580, 0xc00985a090)
avril 17 18:10:41 hostname k3s[32414]: /usr/local/go/src/database/sql/sql.go:2748 +0xb8
avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/sqllog.RowsToEvents(0xc00318c580, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/sqllog/sql.go:279 +0xfa
avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/sqllog.(*SQLLog).List(0xc000779b00, 0x47d93e0, 0xc001964de0, 0xc001a30220, 0x15, 0x0, 0x0, 0x2711, 0x0, 0x40e600, ...)
avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/sqllog/sql.go:246 +0x16e
avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured.(*LogStructured).List(0xc000b8ceb0, 0x47d93e0, 0xc001964de0, 0xc001a30220, 0x15, 0xc001a30240, 0x15, 0x2711, 0x0, 0x0, ...)
avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/logstructured.go:161 +0x1a6
avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/server.(*LimitedServer).list(0xc0014f0d40, 0x47d93e0, 0xc001964de0, 0xc0009b60e0, 0x0, 0xc001b1ea10, 0x9b07b9)
avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/server/list.go:39 +0x307
avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/server.(*LimitedServer).Range(0xc0014f0d40, 0x47d93e0, 0xc001964de0, 0xc0009b60e0, 0x0, 0x0, 0xc0009b60e0)
avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/server/limited.go:18 +0xa3
avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/server.(*KVServerBridge).Range(0xc00077ddd8, 0x47d93e0, 0xc001964de0, 0xc0009b60e0, 0xc00077ddd8, 0xc001964de0, 0xc001b1ea80)
avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/server/server.go:83 +0xbe
avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/go.etcd.io/etcd/etcdserver/etcdserverpb._KV_Range_Handler(0x3bcad20, 0xc00077ddd8, 0x47d93e0, 0xc001964de0, 0xc001a90780, 0x0, 0x47d93e0, 0xc001964de0, 0xc0002288c0, 0x31)
avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/go.etcd.io/etcd/etcdserver/etcdserverpb/rpc.pb.go:3545 +0x217
avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/google.golang.org/grpc.(*Server).processUnaryRPC(0xc0014b4c00, 0x482b500, 0xc001a9aa80, 0xc001274700, 0xc0014ef1d0, 0x6b32e20, 0x0, 0x0, 0x0)
avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/google.golang.org/grpc/server.go:1007 +0x460
avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/google.golang.org/grpc.(*Server).handleStream(0xc0014b4c00, 0x482b500, 0xc001a9aa80, 0xc001274700, 0x0)
avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/google.golang.org/grpc/server.go:1287 +0xd97
avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc0009b80d0, 0xc0014b4c00, 0x482b500, 0xc001a9aa80, 0xc001274700)
avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/google.golang.org/grpc/server.go:722 +0xbb
avril 17 18:10:41 hostname k3s[32414]: created by github.com/rancher/k3s/vendor/google.golang.org/grpc.(*Server).serveStreams.func1
avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/google.golang.org/grpc/server.go:720 +0xa1
avril 17 18:10:41 hostname systemd[1]: k3s.service: main process exited, code=exited, status=2/INVALIDARGUMENT
avril 17 18:10:41 hostname systemd[1]: Failed to start Lightweight Kubernetes.

@freeekanayaka
Contributor Author

@leolb-aphp can you provide more details about your setup? How many k3s nodes? And ideally provide me with the exact yaml to reproduce your deployment (with kubectl apply -f).

I have seen "panic: runtime error: index out of range" myself too at the beginning, but not anymore with the latest fixes.
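
For readers unfamiliar with this class of failure, here is a minimal Go sketch of how an "index out of range" runtime panic arises and how a bounds check turns it into an ordinary error. The helpers `uncheckedIndex` and `checkedIndex` are hypothetical names made up for illustration; they are not go-dqlite code and say nothing about the actual cause of the crash. Only the two numbers are taken from the log above.

```go
package main

import "fmt"

// uncheckedIndex (hypothetical) indexes buf directly and converts the
// resulting runtime panic, if any, into a string so the demo keeps running.
func uncheckedIndex(buf []byte, i int) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	_ = buf[i] // panics when i is outside [0, len(buf))
	return ""
}

// checkedIndex (hypothetical) validates the index first and reports a
// regular error instead of panicking.
func checkedIndex(buf []byte, i int) (byte, error) {
	if i < 0 || i >= len(buf) {
		return 0, fmt.Errorf("index %d out of range for buffer of length %d", i, len(buf))
	}
	return buf[i], nil
}

func main() {
	// Index and length as reported in the k3s log.
	buf := make([]byte, 143072)

	fmt.Println("recovered:", uncheckedIndex(buf, 147167))

	if _, err := checkedIndex(buf, 147167); err != nil {
		fmt.Println("error:", err)
	}
}
```

An unrecovered panic in any goroutine terminates the whole Go process with exit status 2, which matches the `status=2` that systemd reports for k3s.service in the log.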

@leolb-aphp

leolb-aphp commented Apr 17, 2020

> @leolb-aphp can you provide more details about your setup? How many k3s nodes? And ideally provide me with the exact yaml to reproduce your deployment (with kubectl apply -f).
>
> I have seen "panic: runtime error: index out of range" myself too at the beginning, but not anymore with the latest fixes.

I have 3 masters and deployments are various, JupyterHub k8s 0.9 beta.4, Longhorn 0.8 for the default storage class, Rancher 2.4.2 (with built in monitoring enabled), Harbor 1.3.2, all from Helm.

Default configuration for everything besides JupyterHub that gets:

proxy:
  secretToken: "changeme"
  service:
    type: ClusterIP

ingress:
  enabled: true
  hosts:
    - jhub-k8s.changeme.lan

singleuser:
  image:
    name: jupyter/minimal-notebook
    tag: 2343e33dec46
  profileList:
    - display_name: "Minimal environment"
      description: "To avoid too much bells and whistles: Python."
      default: true
    - display_name: "Datascience environment"
      description: "If you want the additional bells and whistles: Python, R, and Julia."
      kubespawner_override:
        image: jupyter/datascience-notebook:2343e33dec46
  defaultUrl: "/lab"

hub:
  service:
    type: ClusterIP
  extraConfig:
    jupyterlab: |
      c.Spawner.cmd = ['jupyter-labhub']

auth:
  type: dummy
  dummy:
    password: 'changeme'
  whitelist:
    users:
      - admin
  admin:
    access: true
    users:
      - admin

Rancher can be installed like this:

$ helm install rancher rancher-latest/rancher --namespace cattle-system --set tls=external --set hostname=rancher.changeme.lan

Rancher has default configuration templates for services such as Harbor or Longhorn, just take those.

And by the way, I have to wait 8-10 hours for the crash to happen and I'm actively fiddling with the cluster trying out things.

Is there a way to be absolutely sure that k3s is built with the correct version of dqlite etc? Can it write versions to disk somehow? Do you have a repo where you built it from so we are sure we get same results? Can you describe your build process?

@leolb-aphp

@freeekanayaka Also I'm thinking this could be the cause, I used go-dqlite 1.4.1 because 1.5.0 did not build:

Note: checking out 'v1.5.0'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 1c6b000 Merge pull request #88 from freeekanayaka/cli-shell
patching file internal/bindings/build.go
app/options.go:10:2: cannot find package "github.com/lxc/lxd/shared" in any of:
	/usr/local/go/src/github.com/lxc/lxd/shared (from $GOROOT)
	/go/src/github.com/lxc/lxd/shared (from $GOPATH)
cmd/dqlite-demo/dqlite-demo.go:17:2: cannot find package "golang.org/x/sys/unix" in any of:
	/usr/local/go/src/golang.org/x/sys/unix (from $GOROOT)
	/go/src/golang.org/x/sys/unix (from $GOPATH)
The command '/bin/sh -c go get -d github.com/canonical/go-dqlite &&     cd /go/src/github.com/canonical/go-dqlite &&     git checkout $GO_DQLITE_VER &&     ls /patch/go-dqlite-* | xargs -r -n1 patch -p1 -i &&     go install         -tags libsqlite3         -ldflags "-w -s -extldflags '-static'"         ./cmd/dqlite-demo' returned a non-zero code: 1

@leolb-aphp

leolb-aphp commented Apr 17, 2020

Here's my build process:

Clone and cd into: https://github.com/leolb-aphp/dqlite-build branch master

Run:

$ docker build --build-arg http_proxy=$http_proxy -t leolb-aphp/dqlite-build:v1.4.1-r0 .

(or sort it out if you aren't behind a corporate proxy)

Clone and cd into: https://github.com/leolb-aphp/k3s.git branch release-1.17

Run:

$ curl -sL https://releases.rancher.com/dapper/v0.4.2/dapper-`uname -s`-`uname -m` > .dapper.tmp

(workaround because things did not work using the Makefile.. it's probably my corporate proxy but I hardcoded modifications in my branch so you still need to do this)

(You need to be part of the "docker" group so you can use the docker socket, or install podman-docker if it works)
Run: make ci

The latest commits on each of these repos use go-dqlite 1.5.0 because I just changed them, but my current tests use go-dqlite 1.4.1 because 1.5.0 does not build.

@leolb-aphp

I'm thinking maybe it is due to docker build cache now, re-trying.

@freeekanayaka
Contributor Author

> I'm thinking maybe it is due to docker build cache now, re-trying.

Sorry, the build error is a mistake on my part. I've pushed a branch to fix it. Will release v1.5.1 shortly.

In any case, for k8s, v1.5.0 and v1.5.1 should behave exactly like v1.4.1.

@freeekanayaka
Contributor Author

@leolb-aphp fyi I just pushed the v1.5.1 tag, so your build should work now.

@leolb-aphp

@freeekanayaka Cool! Thanks a lot. By the way, the cluster got back on its feet on its own, thanks to automatic service restart and Kubernetes' resilience mechanisms. Interesting!

@leolb-aphp

> @leolb-aphp fyi I just pushed the v1.5.1 tag, so your build should work now.

Still getting:

cmd/dqlite-demo/dqlite-demo.go:17:2: cannot find package "golang.org/x/sys/unix" in any of:
	/usr/local/go/src/golang.org/x/sys/unix (from $GOROOT)
	/go/src/golang.org/x/sys/unix (from $GOPATH)

Is it on my side now?

@leolb-aphp

I'm thinking the panic may be facilitated by lossy/bad networking conditions

@leolb-aphp

Got these:

avril 17 20:38:55 hostname k3s[24725]: time="2020-04-17T20:38:55.337341755+02:00" level=info msg="Tunnel endpoint watch event: [10.172.28.2:6443 10.172.28.3:6443]"
avril 17 20:38:55 hostname k3s[24725]: time="2020-04-17T20:38:55.337429169+02:00" level=info msg="Stopped tunnel to 10.172.28.1:6443"
avril 17 20:38:55 hostname k3s[24725]: I0417 20:38:55.345994   24725 trace.go:116] Trace[954913950]: "GuaranteedUpdate etcd3" type:*core.Pod (started: 2020-04-17 20:38:53.66255931 +0200 CEST m=+27.598005983) (total time: 1.683391798s):
avril 17 20:38:55 hostname k3s[24725]: Trace[954913950]: [1.6831782s] [1.681723168s] Transaction committed
avril 17 20:38:55 hostname k3s[24725]: I0417 20:38:55.346320   24725 trace.go:116] Trace[636473417]: "Patch" url:/api/v1/namespaces/harbor/pods/harbor-harbor-redis-0/status,user-agent:k3s/v1.17.4+k3s (linux/amd64) kubernetes/507cc82,client:127.0.0.1 (started: 2020-04-17 20:38:53.662434276 +0200 CEST m=+27.597880949) (total time: 1.68383904s):
avril 17 20:38:55 hostname k3s[24725]: Trace[636473417]: [1.683625347s] [1.682370846s] Object stored in database
avril 17 20:38:55 hostname k3s[24725]: I0417 20:38:55.377765   24725 trace.go:116] Trace[1983668907]: "Get" url:/api/v1/namespaces/kube-system/endpoints/kube-controller-manager,user-agent:k3s/v1.17.4+k3s (linux/amd64) kubernetes/507cc82/leader-election,client:127.0.0.1 (started: 2020-04-17 20:38:53.753661212 +0200 CEST m=+27.689107911) (total time: 1.624003934s):
avril 17 20:38:55 hostname k3s[24725]: Trace[1983668907]: [1.623909824s] [1.623874019s] About to write a response
avril 17 20:38:55 hostname k3s[24725]: I0417 20:38:55.398573   24725 trace.go:116] Trace[11400181]: "GuaranteedUpdate etcd3" type:*core.Event (started: 2020-04-17 20:38:53.71447337 +0200 CEST m=+27.649920048) (total time: 1.684041642s):
avril 17 20:38:55 hostname k3s[24725]: Trace[11400181]: [1.606173362s] [1.606173362s] initial value restored
avril 17 20:38:55 hostname k3s[24725]: I0417 20:38:55.398794   24725 trace.go:116] Trace[1752954359]: "Patch" url:/api/v1/namespaces/harbor/events/harbor-harbor-redis-0.1606af5bd1b6f72d,user-agent:k3s/v1.17.4+k3s (linux/amd64) kubernetes/507cc82,client:127.0.0.1 (started: 2020-04-17 20:38:53.714377778 +0200 CEST m=+27.649824452) (total time: 1.68437132s):
avril 17 20:38:55 hostname k3s[24725]: Trace[1752954359]: [1.606272417s] [1.606233835s] About to apply patch
avril 17 20:38:56 hostname systemd[1]: k3s.service: main process exited, code=killed, status=11/SEGV
avril 17 20:38:56 hostname systemd[1]: Unit k3s.service entered failed state.
avril 17 20:38:56 hostname systemd[1]: k3s.service failed.
avril 17 20:39:01 hostname systemd[1]: k3s.service holdoff time over, scheduling restart.

So the first node, the one that is used as cluster init (presumably k3s handles multi master in a bad way by making later joined nodes connect to the first connected node, which makes the whole cluster probably resilient to the cluster init node I think?).

The cluster init node sometimes panics and the rest of the nodes get segfaults; not at the same time, but that's the behavior.

@SteelCrow

Please see my case in k3s-io/k3s#1633. I have the same thoughts about HA...

@freeekanayaka
Contributor Author

> Please see my case in rancher/k3s#1633. I have the same thoughts about HA...

@SteelCrow as far as I could observe myself, the latest version of the dqlite dependencies have fixed most of the issues.

I'd suggest for the k3s team to make a new build with those dependencies upgraded (or do it yourself) and then retry.

@freeekanayaka
Contributor Author

> > @leolb-aphp fyi I just pushed the v1.5.1 tag, so your build should work now.
>
> Still getting:
>
> cmd/dqlite-demo/dqlite-demo.go:17:2: cannot find package "golang.org/x/sys/unix" in any of:
> 	/usr/local/go/src/golang.org/x/sys/unix (from $GOROOT)
> 	/go/src/golang.org/x/sys/unix (from $GOPATH)
>
> Is it on my side now?

Yes, this is a bit on "your" side. It's a new dependency and so the go.mod configuration of k3s needs to be upgraded. I'll do this in this PR and bump it from 1.4.1 to 1.5.0.

@freeekanayaka
Contributor Author

> @freeekanayaka Ah.. speaking of which, it just crashed again:

@leolb-aphp, actually looking at this traceback more carefully I noticed that this can't be something generated by v1.4.1 or v1.5.0, since the line numbers and function names in the traceback you pasted could be generated only by v1.4.0.

So for some reason you haven't tested v1.4.1 or v1.5.0, but v1.4.0 or lower.
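
The check described above, matching traceback frames against a specific release, can be sketched in Go as follows. This is only an illustration: `dqliteFrames` and the `frame` type are hypothetical names, and the sample input is three file lines copied from the traceback earlier in this thread. The program extracts the file and line number of every frame that points into the vendored go-dqlite tree, so they can be compared by hand with the source at a given tag.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// frameRE matches the "file.go:line" part of a stack frame located inside
// the vendored go-dqlite tree and captures the path relative to that tree.
var frameRE = regexp.MustCompile(`vendor/github\.com/canonical/go-dqlite/(\S+\.go):(\d+)`)

// frame is a hypothetical type holding one source location from a trace.
type frame struct {
	File string
	Line int
}

// dqliteFrames (hypothetical helper) returns the go-dqlite source locations
// found in a Go stack trace, in the order they appear.
func dqliteFrames(trace string) []frame {
	var frames []frame
	for _, line := range strings.Split(trace, "\n") {
		m := frameRE.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		n, err := strconv.Atoi(m[2])
		if err != nil {
			continue
		}
		frames = append(frames, frame{File: m[1], Line: n})
	}
	return frames
}

func main() {
	// File lines copied from the traceback posted in this thread.
	trace := `/go/src/github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/internal/protocol/message.go:476
/go/src/github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/internal/protocol/message.go:631 +0x14c
/go/src/github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/driver/driver.go:560 +0x35`

	for _, f := range dqliteFrames(trace) {
		fmt.Printf("%s:%d\n", f.File, f.Line)
	}
}
```

Each printed location can then be looked up in a go-dqlite checkout at the tag under suspicion, for example with `git show v1.4.1:internal/protocol/message.go`, to see whether the named function really sits at that line.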

@SteelCrow

> > Please see my case in rancher/k3s#1633. I have the same thoughts about HA...
>
> @SteelCrow as far as I could observe myself, the latest version of the dqlite dependencies have fixed most of the issues.
>
> I'd suggest for the k3s team to make a new build with those dependencies upgraded (or do it yourself) and then retry.

Sorry about that, I mistyped the issue number. I meant this one: k3s-io/k3s#1639

@SteelCrow

@freeekanayaka Ah.. speaking of which, it just crashed again:

avril 17 18:10:41 hostname k3s[32414]: Trace[447322305]: [880.788724ms] [880.788724ms] END

avril 17 18:10:41 hostname k3s[32414]: I0417 18:10:41.235540 32414 trace.go:116] Trace[1770918308]: "List etcd3" key:/persistentvolumes,resourceVersion:,limit:10000,continue: (started: 2020-04-17 18:10:40.159699161 +0200 CEST m=+0.922300603) (total time: 1.075796643s):

avril 17 18:10:41 hostname k3s[32414]: Trace[1770918308]: [1.075796643s] [1.075796643s] END

avril 17 18:10:41 hostname k3s[32414]: I0417 18:10:41.236895 32414 trace.go:116] Trace[512743640]: "List etcd3" key:/persistentvolumeclaims,resourceVersion:,limit:10000,continue: (started: 2020-04-17 18:10:40.160902868 +0200 CEST m=+0.923504310) (total time: 1.075948014s):

avril 17 18:10:41 hostname k3s[32414]: Trace[512743640]: [1.075948014s] [1.075948014s] END

avril 17 18:10:41 hostname k3s[32414]: I0417 18:10:41.251609 32414 trace.go:116] Trace[2105912555]: "List etcd3" key:/minions,resourceVersion:,limit:10000,continue: (started: 2020-04-17 18:10:40.164826944 +0200 CEST m=+0.927428357) (total time: 1.086728021s):

avril 17 18:10:41 hostname k3s[32414]: Trace[2105912555]: [1.086728021s] [1.086728021s] END

avril 17 18:10:41 hostname k3s[32414]: panic: runtime error: index out of range [147167] with length 143072

avril 17 18:10:41 hostname k3s[32414]: goroutine 739 [running]:

avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/internal/protocol.(*Message).lastByte(...)

avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/internal/protocol/message.go:476

avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/internal/protocol.(*Rows).Close(0xc00318c228, 0x10, 0x38b0820)

avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/internal/protocol/message.go:631 +0x14c

avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/driver.(*Rows).Close(0xc00318c200, 0x42ff6a, 0x7fa7e9579e40)

avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/driver/driver.go:560 +0x35

avril 17 18:10:41 hostname k3s[32414]: database/sql.(*Rows).close.func1()

avril 17 18:10:41 hostname k3s[32414]: /usr/local/go/src/database/sql/sql.go:3076 +0x3c

avril 17 18:10:41 hostname k3s[32414]: database/sql.withLock(0x4773ee0, 0xc006ec1100, 0xc001f554e8)

avril 17 18:10:41 hostname k3s[32414]: /usr/local/go/src/database/sql/sql.go:3184 +0x6d

avril 17 18:10:41 hostname k3s[32414]: database/sql.(*Rows).close(0xc00318c580, 0x0, 0x0, 0x0, 0x0)

avril 17 18:10:41 hostname k3s[32414]: /usr/local/go/src/database/sql/sql.go:3075 +0x129

avril 17 18:10:41 hostname k3s[32414]: database/sql.(*Rows).Close(...)

avril 17 18:10:41 hostname k3s[32414]: /usr/local/go/src/database/sql/sql.go:3059

avril 17 18:10:41 hostname k3s[32414]: database/sql.(*Rows).Next(0xc00318c580, 0xc00985a090)

avril 17 18:10:41 hostname k3s[32414]: /usr/local/go/src/database/sql/sql.go:2748 +0xb8

avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/sqllog.RowsToEvents(0xc00318c580, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)

avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/sqllog/sql.go:279 +0xfa

avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/sqllog.(*SQLLog).List(0xc000779b00, 0x47d93e0, 0xc001964de0, 0xc001a30220, 0x15, 0x0, 0x0, 0x2711, 0x0, 0x40e600, ...)

avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/sqllog/sql.go:246 +0x16e

avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured.(*LogStructured).List(0xc000b8ceb0, 0x47d93e0, 0xc001964de0, 0xc001a30220, 0x15, 0xc001a30240, 0x15, 0x2711, 0x0, 0x0, ...)

avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/logstructured.go:161 +0x1a6

avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/server.(*LimitedServer).list(0xc0014f0d40, 0x47d93e0, 0xc001964de0, 0xc0009b60e0, 0x0, 0xc001b1ea10, 0x9b07b9)

avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/server/list.go:39 +0x307

avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/server.(*LimitedServer).Range(0xc0014f0d40, 0x47d93e0, 0xc001964de0, 0xc0009b60e0, 0x0, 0x0, 0xc0009b60e0)

avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/server/limited.go:18 +0xa3

avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/server.(*KVServerBridge).Range(0xc00077ddd8, 0x47d93e0, 0xc001964de0, 0xc0009b60e0, 0xc00077ddd8, 0xc001964de0, 0xc001b1ea80)

avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/server/server.go:83 +0xbe

avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/go.etcd.io/etcd/etcdserver/etcdserverpb._KV_Range_Handler(0x3bcad20, 0xc00077ddd8, 0x47d93e0, 0xc001964de0, 0xc001a90780, 0x0, 0x47d93e0, 0xc001964de0, 0xc0002288c0, 0x31)

avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/go.etcd.io/etcd/etcdserver/etcdserverpb/rpc.pb.go:3545 +0x217

avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/google.golang.org/grpc.(*Server).processUnaryRPC(0xc0014b4c00, 0x482b500, 0xc001a9aa80, 0xc001274700, 0xc0014ef1d0, 0x6b32e20, 0x0, 0x0, 0x0)

avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/google.golang.org/grpc/server.go:1007 +0x460

avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/google.golang.org/grpc.(*Server).handleStream(0xc0014b4c00, 0x482b500, 0xc001a9aa80, 0xc001274700, 0x0)

avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/google.golang.org/grpc/server.go:1287 +0xd97

avril 17 18:10:41 hostname k3s[32414]: github.com/rancher/k3s/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc0009b80d0, 0xc0014b4c00, 0x482b500, 0xc001a9aa80, 0xc001274700)

avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/google.golang.org/grpc/server.go:722 +0xbb

avril 17 18:10:41 hostname k3s[32414]: created by github.com/rancher/k3s/vendor/google.golang.org/grpc.(*Server).serveStreams.func1

avril 17 18:10:41 hostname k3s[32414]: /go/src/github.com/rancher/k3s/vendor/google.golang.org/grpc/server.go:720 +0xa1

avril 17 18:10:41 hostname systemd[1]: k3s.service: main process exited, code=exited, status=2/INVALIDARGUMENT

avril 17 18:10:41 hostname systemd[1]: Failed to start Lightweight Kubernetes.

@leolb-aphp, actually looking at this traceback more carefully I noticed that this can't be something generated by v1.4.1 or v1.5.0, since the line numbers and function names in the traceback you pasted could be generated only by v1.4.0.

So for some reason what you tested was v1.4.0 or lower, not v1.4.1 or v1.5.0.
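One way to double-check which go-dqlite version actually ended up in the vendor tree (the traceback paths show it is vendored under k3s) is to look at the `vendor/modules.txt` file that `go mod vendor` writes. Below is a minimal sketch, not part of k3s or kine: the helper name `vendoredVersion` and the sample excerpt are made up for illustration, and it assumes the usual `# <module> <version>` header lines in that file.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// vendoredVersion is a hypothetical helper. It scans the text of a
// vendor/modules.txt file and returns what is recorded after the module
// path on its "# <module> <version>" header line. If the module is
// replaced, the "=> <replacement>" part is returned as well.
func vendoredVersion(modulesTxt, module string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(modulesTxt))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Header lines start with a lone "#"; "## explicit" annotations and
		// the package lines that follow each header are skipped.
		if len(fields) >= 3 && fields[0] == "#" && fields[1] == module {
			return strings.Join(fields[2:], " "), true
		}
	}
	return "", false
}

func main() {
	// Made-up excerpt for illustration; a real check would read
	// vendor/modules.txt from the k3s source tree instead.
	sample := "# github.com/canonical/go-dqlite v1.4.0\n" +
		"## explicit\n" +
		"github.com/canonical/go-dqlite/driver\n"

	version, ok := vendoredVersion(sample, "github.com/canonical/go-dqlite")
	fmt.Println(version, ok) // prints "v1.4.0 true"
}
```

If the reported version is not the one from `go.mod`, the vendor directory was most likely not regenerated after the bump.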

I think the problem is a direct dependency in the k3s build, so you need to update it there too. When I first compiled it, I accidentally missed it.

@freeekanayaka freeekanayaka changed the title Bump go-dqlite to v1.4.1 Bump go-dqlite to v1.5.0 Apr 20, 2020
@freeekanayaka
Contributor Author

@ibuildthecloud any plans to upgrade go-dqlite, libdqlite and raft in k3s?

@leolb-aphp

leolb-aphp commented Apr 21, 2020

I think the problem is a direct dependency in the k3s build, so you need to update it there too. When I first compiled it, I accidentally missed it.

I did all of that; I posted my build process here: #29 (comment)
The other issue is that sometimes k3s doesn't build statically, for some reason.

@freeekanayaka
Contributor Author

I think the problem is a direct dependency in the k3s build, so you need to update it there too. When I first compiled it, I accidentally missed it.

I did all of that; I posted my build process here: #29 (comment)
The other issue is that sometimes k3s doesn't build statically, for some reason.

I'm not familiar with the k3s build process. But again, the traceback you pasted does come from go-dqlite v1.4.0 or earlier. I'm not sure which versions of the libdqlite and libraft shared libraries you have, but you need to make sure those are up to date too.

Too bad that the k3s devs seem unresponsive.

@erikwilson
Contributor

sorry we haven't been answering @freeekanayaka, please feel free to ping myself or @cjellick in the future, in github or slack.

if we can update the pr to use 1.5.1 hopefully that is good enough to move forward (& might help to have pr for dqlite-build)

thanks for testing @leolb-aphp, you probably want to make sure you go mod vendor the kine change using the same go minor version used for the build. if using the Makefile (& Dockerfile.dapper) it will set environment variables to ensure k3s & parts are compiled statically; if calling the scripts directly, the proper environment for static compiling may not be set.
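To tell whether a given build really came out static, one option is to list the shared libraries the binary asks the dynamic loader for. The following is a rough sketch using Go's standard `debug/elf` package; it is not part of the k3s build scripts, and the `dynamicDeps` helper name is invented here. It assumes a Linux ELF binary and treats "no DT_NEEDED entries" as a hint of a static build, not as proof.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// dynamicDeps is a hypothetical helper. It returns the shared libraries
// (DT_NEEDED entries) recorded in an ELF binary. An empty list suggests
// the binary was linked statically.
func dynamicDeps(path string) ([]string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return f.ImportedLibraries()
}

func main() {
	// Inspect the path given on the command line, or this program itself
	// when no argument is passed.
	path, err := os.Executable()
	if len(os.Args) > 1 {
		path, err = os.Args[1], nil
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine binary path:", err)
		return
	}

	libs, err := dynamicDeps(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot inspect binary:", err)
		return
	}
	if len(libs) == 0 {
		fmt.Println(path, "has no shared library dependencies (looks static)")
		return
	}
	fmt.Println(path, "depends on shared libraries:", libs)
}
```

Running it against the built k3s binary and seeing entries such as libdqlite or libraft in the output would mean those C libraries are picked up from the host at runtime instead of being linked in.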

@freeekanayaka
Contributor Author

sorry we haven't been answering @freeekanayaka, please feel free to ping myself or @cjellick in the future, in github or slack.

if we can update the pr to use 1.5.1 hopefully that is good enough to move forward (& might help to have pr for dqlite-build)

As mentioned in this PR's description, I've changed the branch to use go-dqlite 1.5.0, so everything should be ready for merging.

On top of this change, you'll have to make sure you also update the dqlite and raft C libraries in the k3s build process (dqlite v1.4.1 and raft v0.9.18).

@erikwilson
Contributor

thanks, is there a reason for not going to go-dqlite 1.5.1?

@freeekanayaka freeekanayaka changed the title Bump go-dqlite to v1.5.0 Bump go-dqlite to v1.5.1 Apr 23, 2020
@freeekanayaka
Contributor Author

thanks, is there a reason for not going to go-dqlite 1.5.1?

Ugh, no reason. Sorry, I updated this PR to point to 1.5.1 now, not 1.5.0.

@erikwilson
Contributor

Thanks @freeekanayaka!

@erikwilson erikwilson merged commit 445a965 into k3s-io:master Apr 23, 2020
@leolb-aphp

Thanks a lot! I deployed a v1.17.5-rc2+k3s1 cluster; we'll see how this goes!

@leolb-aphp

@freeekanayaka @erikwilson

For your information, no stability issues since then!

@freeekanayaka freeekanayaka deleted the bump-go-dqlite-to-v1.4.1 branch June 1, 2020 14:55