Weave-Net Addon causing kernel panics on RPI 3B+. #3314
@12wrigja Thanks for the issue. The syslog output of the oops you provided does not include the contents of the general-purpose registers. Could you check whether it is in |
Although I don't have evidence, the kernel version rings a bell: coreos/bugs#2382. |
I just checked, and indeed the latest Raspbian is affected by the kernel bug mentioned in my comment above, as https://github.com/raspberrypi/linux/tree/raspberrypi-kernel_1.20180417-1 is missing the fix: torvalds/linux@f15ca72#diff-4f541554c5f8f378effc907c8f0c9115. To work around it, you can disable |
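The workaround referred to here (truncated above) is disabling fast datapath (fastdp), as later comments in this thread confirm. For the Kubernetes addon installed with the command from the original report, a minimal sketch of doing that is shown below. It assumes that the cloud.weave.works manifest generator accepts `env.<NAME>=<VALUE>` query parameters and that the weave-kube launch script disables fastdp when `WEAVE_NO_FASTDP` is set; `weave_url` is a hypothetical helper name. Please check both assumptions against the Weave Net documentation for your version.

```shell
#!/bin/sh
# Sketch: reinstall the Weave Net addon with fast datapath (fastdp) disabled.
# ASSUMPTIONS: the addon generator honours env.<NAME>=<VALUE> query parameters,
# and weave-kube treats WEAVE_NO_FASTDP as "do not use fastdp".

# Hypothetical helper: build the manifest URL from a base64-encoded
# `kubectl version` string (same encoding as the original install command).
weave_url() {
  printf 'https://cloud.weave.works/k8s/net?k8s-version=%s&env.WEAVE_NO_FASTDP=1\n' "$1"
}

# Only talk to the cluster when kubectl is installed.
if command -v kubectl >/dev/null 2>&1; then
  version_b64=$(kubectl version | base64 | tr -d '\n')
  kubectl apply -f "$(weave_url "$version_b64")"
fi
```

As noted further down in the thread, switching a node that already ran with fastdp over to the non-fastdp (sleeve) mode may also require tearing down the existing Weave Net structures, or simply rebooting the node.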
I'm also experiencing this issue; I've been trying to set it up without any success. Sadly, there are no logs from Kubernetes, and most of the time my Pi (3B+) reboots. Output of uname -a:
Linux black-pearl 4.14.34-hypriotos-v7+ #1 SMP Sun Apr 22 14:57:31 UTC 2018 armv7l GNU/Linux
|
@arnulfojr Have you tried following the workaround suggested above? |
Thanks Martynas,
1. Re: dmesg output, I didn't see anything useful when I was following it
using `dmesg -w` over ssh; the ssh connection terminates before anything
useful is printed. I don't have a serial console to connect to the Pi
either, which I think is what people normally use to capture this kind of
info.
2. Good point re: the architecture. I'll check when I get back to my
setup and see what arch the images are. I did not compile the add-on myself;
I just ran the script using the command.
3. Thanks for pointing out the CoreOS bug. I saw that too, but am not
familiar enough with how different systems' kernels relate to know if it
was the same.
Assuming that the images are the right architecture, I'll try
disabling fastdp and see if that helps.
On Mon, 11 Jun 2018, 07:19 Martynas Pumputis <***@***.***> wrote:
@12wrigja <https://github.com/12wrigja> OT: Did you compile Weave Net
yourself? Many are struggling with running it on RPI 3 B+ due to invalid
image arch: #3276 <#3276>.
|
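To answer the architecture question quoted above (see #3276), one rough check is to compare the Pi's CPU architecture with the architecture recorded in the pulled Weave Net image. The sketch below assumes Docker is the container runtime and that the image is `weaveworks/weave-kube:2.3.0` (2.3.0 is the version the original report observed; the repository name and tag are assumptions, so override `IMAGE` as needed). `docker_arch_for` is a hypothetical helper that maps `uname -m` output to Docker's architecture names.

```shell
#!/bin/sh
# Sketch: compare the host CPU architecture with the architecture of a pulled image.
# ASSUMPTIONS: Docker is the container runtime, and IMAGE matches the image the
# Weave Net addon actually pulled on this node.
IMAGE="${IMAGE:-weaveworks/weave-kube:2.3.0}"

# Hypothetical helper: translate `uname -m` output into Docker's architecture names.
docker_arch_for() {
  case "$1" in
    armv6l|armv7l) echo arm ;;
    aarch64|arm64) echo arm64 ;;
    x86_64) echo amd64 ;;
    *) echo "$1" ;;
  esac
}

host_arch=$(docker_arch_for "$(uname -m)")

# Only inspect the image when Docker is available and the image exists locally.
if command -v docker >/dev/null 2>&1 &&
   image_arch=$(docker image inspect --format '{{.Architecture}}' "$IMAGE" 2>/dev/null); then
  if [ "$image_arch" = "$host_arch" ]; then
    echo "OK: $IMAGE is $image_arch, host is $host_arch"
  else
    echo "MISMATCH: $IMAGE is $image_arch, host is $host_arch"
  fi
fi
```

A mismatch (for example an `amd64` image on an `armv7l` Pi) would point at the image-architecture problem from #3276 rather than at the kernel bug discussed here.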
Following up here:
I subscribed to the issue you created for the rpi kernel, and would be happy to test any fixes they release. |
@12wrigja Thanks for the follow up.
Interesting. The |
First off, I'm glad I'm getting the same error as @12wrigja, as debugging this is close to impossible (at least for me) and a bit frustrating 😅. I have inspected the image and indeed I get a https://gist.github.com/arnulfojr/a32a33a42dc7e8254e1abdbb5e3873df The kernel panic error and the |
@arnulfojr Could you answer my question above? |
@brb So indeed, is the So far, deactivating it avoids the kernel panic. I'm using Weave as a plugin with Kubernetes and finally they all came to life! In the meantime, it would be great if you could post officially somewhere about the kernel issue, so that people hitting it can easily track it down. Furthermore, I still have the I still have to test it under real usage, but so far my issue went away by deactivating the Thanks a lot for filing the issues in the Pi kernel repo!
|
Thanks.
The non-
Agreed. We are going to update our docs to include known issues. |
I'm still trying to help identify the source of the problem. During the tests, I got a kernel crash log on my Pi (note that I'm running a 32-bit kernel):
For info:
|
I may be missing some context: this command is necessary if Weave Net is not running and you want to tear down all the supporting structures. For instance if you want to move from fastdp to sleeve. In many cases a reboot is an easier option, and will make everything go away except the IP allocation data. |
I don't have a clear understanding of why this is necessary, but the "weave net" container crashed during launch when switching from "fastdp" to "non-fastdp" (even on x86, where it had otherwise run without any problem). |
An update on my latest tests: I launched the weave container with "fastdp" disabled:
And the container is now running for about a whole day:
With "fastdp" activated, the container triggers the kernel crash we are discussing here after about one minute. So this is a satisfactory workaround for me. |
According to raspberrypi/linux#2580 (comment), the issue is going to be fixed in the next Raspbian release, or has already been fixed in Could someone try updating the package and running Weave Net with fastdp? Thanks. |
Martynas, I updated my Raspberry Pis to use the latest kernel provided by running |
Thanks @12wrigja; I'll close this issue based on your report. |
One more confirmation: I upgraded my Raspbian with a standard "apt upgrade", which upgraded my kernel. |
@bernhara Thanks for letting us know. |
EDIT: I see it's been fixed upstream already. Ignore me.
I have this problem too. Would a full dump of the OOPS help you?
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:31:33Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/arm"}
Server: Docker Engine - Community
Linux k3 4.14.34-hypriotos-v7+ #1 SMP Sun Apr 22 14:57:31 UTC 2018 armv7l GNU/Linux
|
@press5 we closed this issue based on evidence that the kernel bug is fixed in later versions. Since you report the same kernel version I don't see any reason to look further. A workaround is described at #3314 (comment) |
Posting to confirm that running Just to add my experience... I'm following this Medium guide to set up a 5-node k8s Pi cluster, and discovered that Weave Net pods are problematic if you strictly follow the article's commands (it happens 😏). Here are the Pi models in my cluster:
(command to acquire this info: I decided to make the Cheers all! |
What you expected to happen?
What I expected to happen: upon two weave pods discovering each other, weave to start working.
What happened?
The weave pod seems to execute some command that causes one of the two nodes connecting to each other to crash with a kernel panic. I'm guessing it's unlikely that weave itself is the root cause here, but I figure here is a good place to start.
Logs from the kernel are at the end of this issue.
How to reproduce it?
Set up Kubernetes 1.9.7 on two nodes, and apply the K8s Weave addon using
$ kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
This seems to install version 2.3.0, according to the pod descriptions from the API. Wait a short time for the pods to try to connect to each other.
Notice that one of the machines has rebooted, and the other is unable to connect to the first (as it has crashed).
Anything else we need to know?
Probably the most important part: both nodes in this case are running the latest version of Raspbian, as they are Raspberry Pi 3B+ machines. They are all located on a home network, with IPs 192.168.0.3-5, statically assigned. These are configured using Ansible to an extent, and I might be able to share the scripts used if needed.
Versions:
Logs:
Before one node connects to the other, everything looks mostly fine. The initial connections are attempted to each of the three peers - .3, .4, and .5.
Once one pod connects to another, it's random as to which one crashes (but one always does). All the "logs" of the kernel panic that I could get hold of are here: