
Update Weave Net to version 2.5.2 #6997

Closed
wants to merge 1 commit

Conversation

murali-reddy
Contributor

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 16, 2019
@k8s-ci-robot
Contributor

Hi @murali-reddy. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label May 16, 2019
@zetaab
Member

zetaab commented May 16, 2019

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 16, 2019
@granular-ryanbonham
Contributor

/retest

@granular-ryanbonham
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 5, 2019
@hadrianvalentine

Greetings,
Is there any update on the status of this PR?

@ReillyBrogan
Contributor

/approve

I don't know if this will carry any weight given that I'm just drive-by reviewing, but it's worth a shot. I've been using Weave Net 2.5.2 with my kops clusters for the last month and haven't hit any issues.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: murali-reddy, ReillyBrogan
To complete the pull request process, please assign chrisz100
You can assign the PR to them by writing /assign @chrisz100 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hairyhenderson

hairyhenderson commented Jul 11, 2019

@zetaab Any possibility to get this merged? Would be great to get this upgrade!

@zetaab
Member

zetaab commented Jul 11, 2019

/lgtm

/assign @chrisz100

@itskingori
Member

I would like to recommend that we test this a little further. We upgraded to this version after a bunch of weave pods got OOM'd at 200MB during scaling. As part of that upgrade we increased the limit to 300MB, and our cluster was very, very unstable on 2.5.2. We've rolled back to 2.5.1 but kept the 300MB limit change, and it's been working well for the past hour or so (with scaling). With 2.5.2 we had to remove cluster-autoscaler because the cluster would slowly fall apart as weave state became inconsistent.
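
For anyone wanting to make the same limit change, a minimal sketch of how it can be set in the kops cluster spec (assuming a kops release that exposes the weave memory options; field names may differ across versions, so treat this as illustrative):

    # kops edit cluster, then under spec:
    networking:
      weave:
        memoryRequest: 300Mi
        memoryLimit: 300Mi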

@murali-reddy
Contributor Author

@itskingori

We upgraded to this version after a bunch of weave pods got OOM'd at 200MB during scaling.

As part of that upgrade we increased the limit to 300MB, and our cluster was very, very unstable on 2.5.2.

2.5.2 is a bug-fix release. I don't see any reason why 2.5.2 would cause your cluster to be unstable compared to 2.5.1.

It's possible you may run into the issue on 2.5.1 as well (weaveworks/weave#3650).

To me it's best to understand the root cause of the memory leak before coming to a conclusion on 2.5.2.

@itskingori
Member

@murali-reddy I'm still monitoring our setup. There are two issues here:

2.5.2 is a bug-fix release. I don't see any reason why 2.5.2 would cause your cluster to be unstable compared to 2.5.1.

We were equally surprised, but our tests have been consistent since Saturday: when we're running 2.5.2 we get weave pods with different states after a while ... it's not immediate (and memory usage is not rising to the point where we get OOMs), so this is a different issue. Memory usage went down slightly, but we would eventually hit problems (inconsistent IPAM lists) as long as we had cluster-autoscaler scaling ... to cope we would remove it and run a fixed number of nodes.
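
A quick way to spot that kind of IPAM drift (a sketch; the pod name is a placeholder and the script path assumes the standard weave-kube image) is to compare the IPAM view reported by a few different weave pods:

    # Run against several weave-net pods and compare the output
    kubectl exec -n kube-system <weave-net-pod> -c weave -- \
      /home/weave/weave --local status ipam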

It's possible you may run into the issue on 2.5.1 as well (weaveworks/weave#3650).

Yes, I've hit that issue and that is what made us upgrade after reading the CHANGELOG. I consider both these issues separate. The only connection is that I thought 2.5.2 would address the OOM issue.

To me it's best to understand the root cause of the memory leak before coming to a conclusion on 2.5.2.

Agreed. I'll refrain from posting until I have a memory heap profile of a pod that's getting to/past 200MB.
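
In case it helps anyone gather the same data, a sketch of how a heap profile might be captured, under the assumption that the weave router serves Go's pprof endpoints on its status port (6784); run it on the node hosting the pod, since weave-net uses host networking:

    # Grab a heap profile from the local weave router, then inspect it
    curl -s http://127.0.0.1:6784/debug/pprof/heap > weave-heap.pprof
    go tool pprof weave-heap.pprof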

@itskingori
Member

itskingori commented Jul 26, 2019

FYI, there's weaveworks/weave#3659 (comment), where @sferrett has tested and confirms that 2.5.2 is working for him.

@hairyhenderson

For what it's worth, we've been running 2.5.2 in 9 different clusters for over 2 weeks and haven't seen any of the error rates that we were seeing with 2.5.1 or 2.5.0.

@itskingori
Member

My colleague (@zacblazic) has done extensive testing with this version and he has posted the results in weaveworks/weave#3677.

@rifelpet
Member

It looks like we can close this as #7444 was a duplicate and merged into master and cherry-picked into 1.14.

@k8s-ci-robot
Contributor

@murali-reddy: The following tests failed, say /retest to rerun them all:

Test name                    Commit    Rerun command
pull-kops-verify-gomod       0ce6ac1   /test pull-kops-verify-gomod
pull-kops-verify-generated   0ce6ac1   /test pull-kops-verify-generated

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@murali-reddy
Contributor Author

It looks like we can close this as #7444 was a duplicate and merged into master and cherry-picked into 1.14.

@rifelpet thanks for letting me know. Closing this PR.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
lgtm "Looks good to me", indicates that a PR is ready to be merged.
ok-to-test Indicates a non-member PR verified by an org member that is safe to test.
size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
10 participants