This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

frequent failure in 150_connect_forget_test #1030

Closed
rade opened this issue Jun 26, 2015 · 9 comments

@rade
Member

rade commented Jun 26, 2015

e.g. here

---= Running 150_connect_forget_test.sh =---
Connecting and forgetting routers after launch
7e52b95f812d0bf678250a792b650fac3be0950e5fcd7a2a5a1f8ab9479129e7
fedc53aa1431eb10bcd877f3d19b5aefc7ad1cd3f7da28c6e81755ac99543d0a
fcb9ef937002787a01ba8a53c3e23d26c9e339c398e3b7292e662f738be577de
1c0712d8f0f27a8e659edef20370e0da8eb8db67946aafd08b60effb060ef201
test #6 "exec_on host1-1323-1.us-central1-a.positive-cocoa-90213 c1 ping -nq -W 1 -c 1 10.2.1.7" failed:
    program terminated with code 1 instead of 0
1 of 11 tests failed in 17.464s.

I suspect it's just timing: we've just connected to another peer, and odds are the topology and routing info haven't fully updated yet.
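If the failure is indeed a timing race, one common mitigation is to poll the connectivity check until it passes or a deadline expires, rather than asserting once immediately after connecting. The following is a minimal sketch of that idea, not the weave test harness's actual code; `wait_until` and its parameters are hypothetical names chosen for illustration.

```python
import time


def wait_until(check, timeout=5.0, interval=0.1):
    """Call check() until it returns True or `timeout` seconds have elapsed.

    Returns True if check() succeeded in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage with a simulated check that only succeeds on its third attempt,
# standing in for a ping that fails until routing info has propagated.
attempts = []


def simulated_ping():
    attempts.append(1)
    return len(attempts) >= 3


reachable = wait_until(simulated_ping, timeout=1.0, interval=0.0)
```

Polling hides a genuine race only if the system does eventually converge; as later comments in this thread show, that was not the whole story here.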

@rade rade added the bug label Jun 26, 2015
@awh
Contributor

awh commented Jun 29, 2015

I've seen this a couple of times locally with 110_encryption_test.sh, not quite enough times to make an issue of it yet but I'll keep an eye on it.

@rade
Member Author

rade commented Jun 29, 2015

> I've seen this a couple of times locally with 110_encryption_test.sh

So have I, but that's a separate issue, though possibly with the same cause.

@awh
Contributor

awh commented Jul 7, 2015

@bboreham do you have any thoughts on whether #1052 fixes this too?

@bboreham
Contributor

bboreham commented Jul 7, 2015

See #1052 (comment)

@rade
Member Author

rade commented Jul 13, 2015

I haven't seen this fail for a long time. Let's assume #1052 fixed it.

@rade rade closed this as completed Jul 13, 2015
@bboreham
Contributor

Failed on my local PC today.

@rade rade reopened this Jul 14, 2015
@rade rade modified the milestones: current, 1.1.0 Jul 17, 2015
@bboreham
Contributor

I see that the two peers can disconnect because they have inconsistent IPAM ring data.
Suggest starting with --init-peer-count=2.
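To illustrate why `--init-peer-count=2` could help, here is a toy sketch. It assumes that ring initialisation requires a majority (quorum) of the declared initial peers, and that a router launched without peers behaves as if the count were 1. Both are assumptions made for illustration, and the function names are hypothetical rather than weave's API.

```python
def quorum(init_peer_count):
    """Smallest strict majority of `init_peer_count` peers."""
    return init_peer_count // 2 + 1


def can_initialise_ring(reachable_peers, init_peer_count):
    """`reachable_peers` counts this router itself plus the peers it can see."""
    return reachable_peers >= quorum(init_peer_count)


# An isolated router with an assumed count of 1 can create a ring on its own.
alone_default = can_initialise_ring(reachable_peers=1, init_peer_count=1)

# With a count of 2, an isolated router has to wait for the other peer.
alone_with_two = can_initialise_ring(reachable_peers=1, init_peer_count=2)
connected_with_two = can_initialise_ring(reachable_peers=2, init_peer_count=2)
```

Under these assumptions, neither router can create a ring before the two are connected, so they cannot end up with two conflicting rings.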

@tomwilkie
Contributor

The test does the following:

  1. start 2 routers, not connected
  2. start 2 containers (1 on each), with manually assigned IPs
  3. connect the 2 routers

The change that stops this working (#1200) is that we now check manually assigned IPs with IPAM. As a result, each router creates its own ring at step 2, and step 3 then fails.
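The sequence above can be sketched with a toy model. This is not weave's IPAM implementation: the `Router` class, the `can_merge` rule, and the first IP address are hypothetical and only mirror the three steps as described (10.2.1.7 is taken from the failing test output).

```python
class Router:
    """Toy stand-in for a router's IPAM state; not weave's real data structure."""

    def __init__(self, name):
        self.name = name
        self.ring_owner = None  # None until this router has created a ring
        self.claimed = []

    def claim(self, ip):
        # Assumed behaviour: checking an address with IPAM forces ring creation,
        # and an unconnected router can only create a ring it owns entirely.
        if self.ring_owner is None:
            self.ring_owner = self.name
        self.claimed.append(ip)


def can_merge(a, b):
    """Assume rings merge cleanly only if at most one side has created one."""
    return (
        a.ring_owner is None
        or b.ring_owner is None
        or a.ring_owner == b.ring_owner
    )


# Step 1: start two routers, not connected.
host1, host2 = Router("host1"), Router("host2")

# Step 2: one container per host, each with a manually assigned IP.
host1.claim("10.2.1.4")  # example address, not from the source
host2.claim("10.2.1.7")

# Step 3: connecting now finds two independently created rings.
connect_ok = can_merge(host1, host2)
```

In this model `connect_ok` is false because both routers already own a ring by the time they meet, which matches the inconsistent ring data reported earlier in the thread.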

@tomwilkie tomwilkie self-assigned this Jul 28, 2015
rade added a commit that referenced this issue Jul 28, 2015
Don't init the ring if the address is not in range.

Fixes #1030.
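The idea in this commit message can be sketched as a simple range check, assuming the fix amounts to skipping ring initialisation when a manually assigned address lies outside the IPAM allocation range. The function name and the range `10.128.0.0/10` are assumptions for illustration; the actual change is in Go and may differ.

```python
import ipaddress


def should_init_ring(claimed_ip, allocation_range):
    """Only addresses inside the IPAM range need the ring to exist."""
    address = ipaddress.ip_address(claimed_ip)
    network = ipaddress.ip_network(allocation_range)
    return address in network


# The address from the failing test lies outside the example range,
# so claiming it would not trigger ring creation in this sketch.
outside = should_init_ring("10.2.1.7", "10.128.0.0/10")
inside = should_init_ring("10.128.0.5", "10.128.0.0/10")
```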
@bboreham
Contributor

Is it wrong to add the --init-peer-count=2? I guess it's unnecessary given your change, but I'm thinking we should follow our own rules.
