IPv6 configurations missing on test machines #1105

Closed
adam-thorpe opened this issue Jan 21, 2020 · 17 comments

@adam-thorpe

Part of the jdk8u242-b08_openj9-0.18.0 triage
Platform: xlinux
Machine: test-packet-ubuntu1604-x64-2

Tests:
java/net/Inet6Address/B6206527.java

trying LL addr: /fe80:0:0:0:a863:4eff:fe29:3b2e%veth3d7f09a
trying LL addr: /fe80:0:0:0:a863:4eff:fe29:3b2e
    
java.net.BindException: Cannot assign requested address (Bind failed)
	at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
	at java.net.ServerSocket.bind(ServerSocket.java:390)
	at java.net.ServerSocket.bind(ServerSocket.java:344)
	at B6206527.main(B6206527.java:53)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
	at java.lang.Thread.run(Thread.java:821)

JavaTest Message: Test threw exception: java.net.BindException
JavaTest Message: shutting down test

java/net/ipv6tests/B6521014.java

java.lang.RuntimeException: Test failed: cannot create socket.
	at B6521014.main(B6521014.java:123)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
	at java.lang.Thread.run(Thread.java:821)
Caused by: java.net.BindException: Cannot assign requested address (Bind failed)
	at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
	at java.net.Socket.bind(Socket.java:662)
	at B6521014.test2(B6521014.java:103)
	at B6521014.main(B6521014.java:121)
	... 6 more

JavaTest Message: Test threw exception: java.lang.RuntimeException
JavaTest Message: shutting down test

Re-build grinders
Failing on test-packet-ubuntu1604-x64-2: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1810/
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1809/
Passing on other machines:
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1813/
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1812/
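
The `%veth3d7f09a` scope in the first trace suggests the address being tried belongs to a virtual Ethernet interface rather than a physical NIC. A minimal sketch for checking which link-local IPv6 addresses on a machine can actually be bound follows. It is not the source of B6206527; the class name `LinkLocalBindProbe` and its output format are made up for illustration.

```java
import java.io.IOException;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.ServerSocket;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of the JDK test suite.
class LinkLocalBindProbe {

    /** Tries to bind a ServerSocket to every link-local IPv6 address and records the outcome. */
    static List<String> probe() {
        List<String> results = new ArrayList<>();
        Enumeration<NetworkInterface> ifaces;
        try {
            ifaces = NetworkInterface.getNetworkInterfaces();
        } catch (SocketException e) {
            results.add("enumeration -> failed: " + e.getMessage());
            return results;
        }
        if (ifaces == null) {
            return results; // no interfaces visible to the JVM
        }
        for (NetworkInterface nif : Collections.list(ifaces)) {
            for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                if (addr instanceof Inet6Address && addr.isLinkLocalAddress()) {
                    results.add(nif.getName() + " " + addr + " -> " + tryBind(addr));
                }
            }
        }
        return results;
    }

    /** Binds to an ephemeral port on the given address and reports success or the error text. */
    static String tryBind(InetAddress addr) {
        try (ServerSocket ss = new ServerSocket()) {
            ss.bind(new InetSocketAddress(addr, 0));
            return "bound";
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        for (String line : probe()) {
            System.out.println(line);
        }
    }
}
```

Running this on a failing and a passing machine and comparing which interfaces report `failed` may help narrow down whether the problem is tied to a particular interface.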

@sxa
Member

sxa commented Jan 21, 2020

Passed at https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1828/console after I got rid of a running docker container that was presumably blocking it.

root@test-packet-ubuntu1604-x64-2:~# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
19bee1fa3099        b1226cfc7094        "/bin/bash /kafka-te…"   2 months ago        Up 2 months                             cocky_lewin
root@test-packet-ubuntu1604-x64-2:~# docker rm 19bee1fa3099
Error response from daemon: You cannot remove a running container 19bee1fa30990f1e53f1df997c27e83185455be827fd534cfa226ea1648a00b9. Stop the container before attempting removal or force remove
root@test-packet-ubuntu1604-x64-2:~# docker stop 19bee1fa3099
19bee1fa3099

@sxa sxa closed this as completed Jan 21, 2020
@adam-thorpe
Author

@sxa555 could you have a poke around on test-osuosl-ubuntu1804-ppc64le-2 and see if it's also got an old running docker process on it? I'm seeing the same failures as above on this machine.

@adam-thorpe
Author

adam-thorpe commented Feb 3, 2020

Also test-packet-ubuntu1604-x64-3 and test-softlayer-ubuntu1604-x64-1 are now showing the same failures. Any idea why this is a recurring problem?

@adam-thorpe
Author

test-marist-ubuntu1604-s390x-2 as well now

@adam-thorpe
Author

Reiterating the full list of machines that I believe still have this problem:

test-softlayer-ubuntu1604-x64-1
test-osuosl-ubuntu1804-ppc64le-2
test-packet-ubuntu1604-x64-3
test-packet-ubuntu1604-x64-1
test-scaleway-ubuntu1604-x64-1
test-marist-ubuntu1604-s390x-2

@adam-thorpe adam-thorpe changed the title from "JDK net failures on test-packet-ubuntu1604-x64-2" to "IPv6 configurations missing on test machines" Feb 26, 2020
@adam-thorpe
Author

I've excluded these tests on openj9 for jdk8 and 11. Couldn't find any instances of failures on hotspot or jdk14

@sxa
Member

sxa commented Feb 26, 2020

@smlambert Given that this seems to affect a fairly wide variety of boxes, do you know if there's any extra config we could apply that would resolve these test issues? Have we seen this internally at IBM on any of your systems?

@smlambert
Contributor

Yes, we have the same issue internally. Companies running IPv6 on Azure DevOps (where osx does not have IPv6) also have this issue.

related: adoptium/aqa-tests#1524

@sxa sxa modified the milestones: February 2020, Icebox / On Hold Feb 27, 2020
@sxa sxa removed their assignment Feb 27, 2020
@sxa sxa modified the milestones: Icebox / On Hold, May 2020 May 4, 2020
@sxa
Member

sxa commented May 4, 2020

Now that #1298 has been merged, it might be worth seeing if that solution can be used to resolve the problem described above.

@Willsparker
Contributor

I've had a quick look at test-packet-ubuntu1604-x64-1, as it happens to be a machine I have access to.
All #1298 does is enable IPv6. I ran the following on a U16 Vagrant VM, as it appears to be the Ubuntu equivalent of that change:

sysctl -w net.ipv6.conf.all.disable_ipv6=0
sysctl -w net.ipv6.conf.default.disable_ipv6=0
sysctl -w net.ipv6.conf.lo.disable_ipv6=0

That enabled IPv6 on the VM; however, the test machine I was looking at already has it enabled:

root@test-packet-ubuntu1604-x64-1:~# sysctl -a | grep disable_ipv6
...
net.ipv6.conf.all.disable_ipv6 = 0
...
net.ipv6.conf.default.disable_ipv6 = 0
...
net.ipv6.conf.lo.disable_ipv6 = 0

@adam-thorpe can we run a Grinder to make sure the problem still affects the machine?
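
The `sysctl -a | grep disable_ipv6` check above could be scripted when several machines need comparing. A rough sketch, assuming the `key = value` line format shown in the output above; the class name `Ipv6SysctlCheck` and the method `parseDisabled` are hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: reads `sysctl -a` output and reports the disable_ipv6 flag per interface.
class Ipv6SysctlCheck {

    // Matches lines such as "net.ipv6.conf.all.disable_ipv6 = 0".
    private static final Pattern LINE =
            Pattern.compile("^net\\.ipv6\\.conf\\.([^.\\s]+)\\.disable_ipv6\\s*=\\s*([01])$");

    /** Maps each interface name to true when IPv6 is disabled on it; other lines are ignored. */
    static Map<String, Boolean> parseDisabled(List<String> lines) {
        Map<String, Boolean> out = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            if (m.matches()) {
                out.put(m.group(1), m.group(2).equals("1"));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            List<String> lines = in.lines().collect(Collectors.toList());
            parseDisabled(lines).forEach((iface, disabled) ->
                    System.out.println(iface + ": IPv6 " + (disabled ? "DISABLED" : "enabled")));
        }
    }
}
```

It would be fed from the shell, for example `sysctl -a | java Ipv6SysctlCheck.java` on JDK 11 or later. A machine where every entry reports `enabled` but the tests still fail, as here, points away from the sysctl settings as the cause.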

@adam-thorpe
Author

It still seems to be having problems: a different exception, but at the same line.

java/net/Inet6Address/B6206527.java: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3082

10:48:14  STDOUT:
10:48:14  trying LL addr: /fe80:0:0:0:f0bf:e2ff:fe62:740%veth2be7f48
10:48:14  trying LL addr: /fe80:0:0:0:f0bf:e2ff:fe62:740
10:48:14  STDERR:
10:48:14  java.net.SocketException: No such device (Bind failed)
10:48:14  	at java.net.PlainSocketImpl.socketBind(Native Method)
10:48:14  	at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
10:48:14  	at java.net.ServerSocket.bind(ServerSocket.java:390)
10:48:14  	at java.net.ServerSocket.bind(ServerSocket.java:344)
10:48:14  	at B6206527.main(B6206527.java:53)
10:48:14  	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
10:48:14  	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
10:48:14  	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
10:48:14  	at java.lang.reflect.Method.invoke(Method.java:498)
10:48:14  	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
10:48:14  	at java.lang.Thread.run(Thread.java:823)

@Willsparker
Contributor

Willsparker commented May 27, 2020

Alright, I've looked at the machine, and the same docker container that @sxa found when he was fixing the first machine is there. It appears to be hanging whilst running kafka-test.sh; presumably that's what is taking up the socket and causing other tests to fail. It may not be relevant, but the version of Kafka in the Docker container is 2.12-2.5.0-SNAPSHOT, and the process that's being run on the machine itself is

jenkins  20857  0.0  0.0 452996  5916 ?        Sl    2019  17:20 docker run --rm adoptopenjdk-kafka-test:latest

According to docker ps -a, it had been running for 6 months(!).
Removing the container fixed the issue again:
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3084/console

I'll go through the list of affected machines and clear them all off. If it recurs on a machine that has already been cleaned up, we could look into what's causing it (test-packet-ubuntu1604-x64-2 is still succeeding, so it hasn't recurred there).

Cleanup list:

  • test-softlayer-ubuntu1604-x64-1
  • test-osuosl-ubuntu1804-ppc64le-2
  • test-packet-ubuntu1604-x64-3
  • test-packet-ubuntu1604-x64-1
  • test-scaleway-ubuntu1604-x64-1
  • test-marist-ubuntu1604-s390x-2
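
Going through the machines by hand could be partly automated by flagging containers that have been up for a long time. A minimal sketch, assuming `docker ps --format` with the `.ID`, `.Image` and `.RunningFor` placeholders and docker's human-readable age strings (such as the "2 months ago" seen earlier in this thread); the class name `StaleContainerReport` and the week/month/year threshold are made up for illustration:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: lists running containers whose age is measured in weeks or longer.
class StaleContainerReport {

    /** True when the human-readable age mentions weeks, months or years. */
    static boolean looksStale(String runningFor) {
        String s = runningFor.toLowerCase(Locale.ROOT);
        return s.contains("week") || s.contains("month") || s.contains("year");
    }

    /** Keeps only the tab-separated "ID, image, age" lines whose age looks stale. */
    static List<String> staleLines(List<String> psLines) {
        List<String> stale = new ArrayList<>();
        for (String line : psLines) {
            String[] fields = line.split("\t");
            if (fields.length == 3 && looksStale(fields[2])) {
                stale.add(line);
            }
        }
        return stale;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // The "\t" escapes are real tab characters passed through to docker's template.
        Process p = new ProcessBuilder(
                "docker", "ps", "--format", "{{.ID}}\t{{.Image}}\t{{.RunningFor}}")
                .redirectErrorStream(true)
                .start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        p.waitFor();
        for (String s : staleLines(lines)) {
            System.out.println("stale: " + s);
        }
    }
}
```

The sketch only reports candidates; stopping or removing a container is left as a manual step so that a legitimately long-running job is not killed by accident.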

@Willsparker
Contributor

FYI: test-osuosl-ubuntu1804-ppc64le-2 didn't have that container on it; however, running the Grinder job failed with:

12:30:41  unzip file: OpenJDK8U-jdk_x64_linux_openj9_2020-05-27-09-44.tar.gz ...
12:30:42  Run /home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/bin/java -version
12:30:43  warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
12:30:43  /lib64/ld-linux-x86-64.so.2: No such file or directory

I don't think this is related to this issue. It also may not be a coincidence that it is the only non-ubuntu1604 machine there.

@Willsparker
Contributor

There's the same issue as above with the test-marist-ubuntu1604-s390x-2 machine too:
https://ci.adoptopenjdk.net/job/Grinder/3100/console

@sxa
Member

sxa commented May 28, 2020

12:30:41  unzip file: OpenJDK8U-jdk_x64_linux_openj9_2020-05-27-09-44.tar.gz ...
12:30:42  Run /home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/bin/java -version
12:30:43  warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
12:30:43  /lib64/ld-linux-x86-64.so.2: No such file or directory

Why is it pulling an x64 JDK for a ppc64le test? Not too surprising the CPU doesn't support it ...

@Willsparker
Contributor

Ah, that will be my ignorance of Grinder. Rerunning with the correct variables:
https://ci.adoptopenjdk.net/job/Grinder/3101/
https://ci.adoptopenjdk.net/job/Grinder/3102/

@Willsparker Willsparker self-assigned this Jun 1, 2020
@Willsparker
Contributor

https://ci.adoptopenjdk.net/job/Grinder/3128/console
Last machine has been fixed! Closing issue :-)
