Test-extended.system-JDK8-osx_x86-64_cmprssptrs SharedClassesAPI_0 Failed to set group access permission #4375

Closed
pshipton opened this issue Jan 22, 2019 · 20 comments

Comments

@pshipton
Member

https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-osx_x86-64_cmprssptrs/40

JVMSHRC756W Failed to set group access permission on the shared cache file as requested by the 'groupAccess' sub-option.
@pshipton
Member Author

@jdekonin is there a umask on the osx machines which prevents setting group access?

@pshipton
Member Author

pshipton commented Jan 22, 2019

Also, what is the current setting for sysv shared memory?
We are still getting the following in this same test. Maybe it needs to be bumped up again.

JVMSHRC659E An error has occurred while opening shared memory
JVMSHRC336E Port layer error code = -393970
JVMSHRC337E Platform error message: shmget : Cannot allocate memory
JVMSHRC029E Not enough memory left on the system

@pshipton
Member Author

@Mesbah-Alam

@jdekonin
Contributor

osx1011vm1:workspace jenkins$ umask
0022
osx1011vm1:workspace jenkins$ mkdir joe
osx1011vm1:workspace jenkins$ ls -al
...
drwxr-xr-x   2 jenkins  staff    68 Jan 22 13:36 joe
...
osx1011vm1:workspace jenkins$ chown jenkins:everyone joe
osx1011vm1:workspace jenkins$ ls -al
...
drwxr-xr-x   2 jenkins  everyone    68 Jan 22 13:36 joe
...
osx1011vm1:workspace jenkins$ chown jenkins:foo joe
chown: foo: illegal group name
osx1011vm1:workspace jenkins$ chmod 700 joe
osx1011vm1:workspace jenkins$ ls -al
...
drwx------   2 jenkins  staff    68 Jan 22 13:36 joe
...
osx1011vm1:workspace jenkins$ chmod 770 joe
osx1011vm1:workspace jenkins$ ls -al
...
drwxrwx---   2 jenkins  staff    68 Jan 22 13:36 joe
...
osx1011vm1:~ jenkins$ cat /etc/sysctl.conf 
kern.sysv.shmmax=1258291200
kern.sysv.shmall=307200

Doesn't seem to be a problem. There doesn't appear to be shared memory left by previous jobs, but the cleanup jobs delete that content every few hours now.

@Mesbah-Alam
Contributor

I believe the original reason why this issue was opened is resolved by adoptium/openj9-systemtest#76

There was a comma missing in front of the groupAccess option that the test was setting.
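
A minimal sketch of why the separator matters, assuming -Xshareclasses sub-options are comma-separated, as in the `name=ShareClassesCMLTests,nonpersistent` command line that appears in the logs in this thread. The helper name `build_shareclasses_option` and the cache name `demoCache` are hypothetical and are not taken from the test code.

```python
def build_shareclasses_option(suboptions):
    """Join -Xshareclasses sub-options with commas (hypothetical helper)."""
    # Drop empty entries so a blank sub-option cannot produce ",," or a trailing comma.
    cleaned = [s.strip() for s in suboptions if s and s.strip()]
    if not cleaned:
        raise ValueError("at least one sub-option is required")
    return "-Xshareclasses:" + ",".join(cleaned)


option = build_shareclasses_option(["name=demoCache", "nonpersistent", "groupAccess"])
print(option)
```

Building the option from a list, instead of concatenating strings by hand, makes it harder to drop the comma in front of a sub-option such as groupAccess.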

@pshipton
Member Author

@Mesbah-Alam note the tests still failed last night, which included adoptium/openj9-systemtest#76

The sysv numbers seem very exact, to allow 4 caches of 300MB each.
@Mesbah-Alam do the tests set a cache size?
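
The "4 caches of 300MB each" observation can be checked with a few lines of arithmetic. This is only a sketch: it assumes kern.sysv.shmmax is in bytes and kern.sysv.shmall is counted in 4096-byte pages, which is consistent with the two values posted from /etc/sysctl.conf but is not stated in this thread.

```python
MB = 1024 * 1024
PAGE_SIZE = 4096  # assumption: shmall is counted in 4096-byte pages

# Values from the /etc/sysctl.conf output posted in this thread.
shmmax = 1258291200
shmall = 307200

cache_bytes = 300 * MB
print("whole 300MB caches in shmmax:", shmmax // cache_bytes)
print("bytes left over:", shmmax % cache_bytes)
print("shmall pages cover exactly shmmax:", shmall * PAGE_SIZE == shmmax)
```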

@Mesbah-Alam
Contributor

@pshipton - the test does not set any cache size.

@hangshao0
Contributor

The warning JVMSHRC756W is due to the umask 0022.
umask
0022
umask -S
u=rwx,g=rx,o=rx

With this umask, group users are not able to write to the shared cache.
However, this warning might not cause the test to fail.
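
As a rough illustration of the umask effect, assuming the cache file is created with a requested mode that includes group write (0664 here is an illustrative value; how the JVM actually creates the file is not shown in this thread):

```python
import stat


def effective_mode(requested_mode, umask):
    """Permission bits left after the process umask is applied at file creation."""
    return requested_mode & ~umask


for mask in (0o022, 0o002):
    mode = effective_mode(0o664, mask)
    group_write = bool(mode & stat.S_IWGRP)
    print(f"umask {mask:04o} -> mode {mode:04o}, group write: {group_write}")
```

With umask 0022 the group write bit is masked off, while umask 0002 leaves it in place.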

The sysv numbers seem very exact, to allow 4 caches of 300MB each.

My impression is that on OSX the total shared memory in use cannot be equal to kern.sysv.shmmax/kern.sysv.shmall; it must be strictly less than kern.sysv.shmmax/kern.sysv.shmall. So technically this setting (1200MB) only allows 3 caches. It is better to set kern.sysv.shmmax/kern.sysv.shmall to a number like N * 300MB + 10MB.
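
The N * 300MB + 10MB suggestion can be turned into concrete sysctl values. The sketch below assumes shmmax is in bytes and shmall is in 4096-byte pages; the function name `sysv_limits` and its parameters are hypothetical.

```python
MB = 1024 * 1024
PAGE_SIZE = 4096  # assumption: kern.sysv.shmall is counted in 4096-byte pages


def sysv_limits(n_caches, cache_mb=300, headroom_mb=10):
    """Return (shmmax_bytes, shmall_pages) for n_caches caches plus headroom."""
    shmmax = (n_caches * cache_mb + headroom_mb) * MB
    # Round up so the page count always covers shmmax.
    shmall = -(-shmmax // PAGE_SIZE)
    return shmmax, shmall


shmmax, shmall = sysv_limits(4)
print(f"kern.sysv.shmmax={shmmax}")
print(f"kern.sysv.shmall={shmall}")
```

The 10MB of headroom keeps the limit strictly above N full-size caches, so the N-th cache does not have to land exactly on the limit.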

@hangshao0
Contributor

I suspect SharedClassesCacheChecker.delete() failed to delete the nonpersistent cache, either in the default directory (user home) or under /tmp/ when "groupAccess" is used.

@Mesbah-Alam
Contributor

The non-default directory is actually somewhere under the build's workspace directory (not /tmp).

@hangshao0
Contributor

There are 2 default shared cache directories:

  1. When "groupAccess" is used, the default directory is under /tmp/.
  2. When "groupAccess" is not used, the default directory is under the user's home.

@pshipton
Member Author

@jdekonin can you please fix the umask and kern.sysv.shmmax/kern.sysv.shmall as per #4375 (comment)

@Mesbah-Alam
Contributor

@hangshao0,

Looking at the test code:

When a non-default location is used, the cacheDir is set to test.env().getResultsDir().childDirectory("caches"); (which should point to a folder inside the test's execution directory, under the build's workspace) - https://github.com/eclipse/openj9-systemtest/blob/6cf9420140df3a982a7af8f91153b7dc71ceefe5/openj9.test.sharedClasses.jvmti/src/test.sharedClasses.jvmti/net/openj9/stf/SharedClassesAPI.java#L138

When the default location is used, cacheDir="", and the cacheDir option in -Xshareclasses is not used
(e.g. https://github.com/eclipse/openj9-systemtest/blob/6cf9420140df3a982a7af8f91153b7dc71ceefe5/openj9.test.sharedClasses.jvmti/src/test.sharedClasses.jvmti/net/openj9/stf/SharedClassesAPI.java#L248), so the test then expects the cache to be created in whatever the "default" location is.

@Mesbah-Alam
Contributor

Hi @jdekonin,

We ran a grinder with the SharedClassesAPI test at Adopt: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/896/tapResults/

It ran on macos1010-x64-1: {ip: 74.80.250.151, user: admin}, and failed with the following:

JVMSHRC659E An error has occurred while opening shared memory
JVMSHRC336E Port layer error code = -174
JVMSHRC337E Platform error message: Invalid argument
JVMSHRC026E Cannot create cache of requested size: Please check your SHMMAX and SHMMIN settings
JVMSHRC663I Error recovery: destroyed semaphore set with id=65538 associated with shared class cache.
JVMJ9VM015W Initialization error for library j9shr29(11): JVMJ9VM009E J9VMDllMain failed
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

Checking inside the machine, we found that it does not contain an /etc/sysctl.conf file.

Is it because the osx test machines at Adopt are not yet configured to run the Shared Classes tests the way the OpenJ9 osx machines are?

A different but similar error is seen while running the test in the Internal Grinder (on osx): https://hyc-runtimes-jenkins.swg-devops.com/view/Test_system/job/Grinder/1280/console

JVMSHRC659E An error has occurred while opening shared memory
JVMSHRC336E Port layer error code = -393970
JVMSHRC337E Platform error message: shmget : Cannot allocate memory
JVMSHRC029E Not enough memory left on the system
JVMSHRC663I Error recovery: destroyed semaphore set with id=1966101 associated with shared class cache.
JVMJ9VM015W Initialization error for library j9shr29(11): JVMJ9VM009E J9VMDllMain failed
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

Can you please check the output of cat /etc/sysctl.conf on the internal osx machine: https://hyc-runtimes-jenkins.swg-devops.com/computer/osxrt2/.

@jdekonin
Contributor

umask has been set to 0002 and sysctl.conf, following @hangshao0's suggestion, to
kern.sysv.shmmax=125839605760
kern.sysv.shmall=30722560

These changes have been applied to all openj9 ci osx systems. FYI @sxa555, these settings are required on the existing Adopt osx machines. I have updated adoptium/infrastructure#212 with the updated requirement for the playbook.

@pshipton
Member Author

Now we are getting a different error.

https://ci.eclipse.org/openj9/job/Test-sanity.functional-JDK11-osx_x86-64_cmprssptrs/88
osx1011-x86-2

Testing: Test 26: CMVC 168131 : Create a non persistent cache
Test start time: 2019/01/25 00:34:09 Eastern Standard Time
Running command: "/Users/jenkins/workspace/Test-sanity.functional-JDK11-osx_x86-64_cmprssptrs/openjdkbinary/j2sdk-image/bin/java"  -Xcompressedrefs -Xcompressedrefs -Xjit -Xgcpolicy:gencon  -Xshareclasses:name=ShareClassesCMLTests,nonpersistent -version
Time spent starting: 3 milliseconds
Time spent executing: 2050 milliseconds
Test result: FAILED
 [ERR] JVMSHRC659E An error has occurred while opening shared memory
 [ERR] JVMSHRC336E Port layer error code = -174
 [ERR] JVMSHRC337E Platform error message: Invalid argument
 [ERR] JVMSHRC026E Cannot create cache of requested size: Please check your SHMMAX and SHMMIN settings
 [ERR] JVMSHRC663I Error recovery: destroyed semaphore set with id=2686977 associated with shared class cache.
 [ERR] JVMJ9VM015W Initialization error for library j9shr29(11): JVMJ9VM009E J9VMDllMain failed
 [ERR] Error: Could not create the Java Virtual Machine.
 [ERR] Error: A fatal exception has occurred. Program will exit.
>> Success condition was not found: [Output match: (java|openjdk) version]
>> Failure condition was not found: [Output match: Unhandled Exception]
>> Failure condition was not found: [Output match: Exception:]
>> Failure condition was not found: [Output match: corrupt]
>> Failure condition was not found: [Output match: Processing dump event]

@hangshao0
Contributor

[ERR] JVMSHRC026E Cannot create cache of requested size: Please check your SHMMAX and SHMMIN settings

It still complains about SHMMAX @jdekonin

@pshipton
Member Author

pshipton commented Jan 25, 2019

@jdekonin it seems the shmmax/shmall changes to the machines #4375 (comment) are also blocking OMR acceptance builds.
https://ci.eclipse.org/openj9/job/Test-sanity.functional-JDK11-osx_x86-64_cmprssptrs/90/
osx1011-x86-2

Testing: Test 26: CMVC 168131 : Create a non persistent cache
Test start time: 2019/01/25 12:26:31 Eastern Standard Time
Running command: "/Users/jenkins/workspace/Test-sanity.functional-JDK11-osx_x86-64_cmprssptrs/openjdkbinary/j2sdk-image/bin/java"  -Xcompressedrefs -Xcompressedrefs -Xjit -Xgcpolicy:gencon  -Xshareclasses:name=ShareClassesCMLTests,nonpersistent -version
Time spent starting: 3 milliseconds
Time spent executing: 2061 milliseconds
Test result: FAILED
 [ERR] JVMSHRC659E An error has occurred while opening shared memory
 [ERR] JVMSHRC336E Port layer error code = -174
 [ERR] JVMSHRC337E Platform error message: Invalid argument
 [ERR] JVMSHRC026E Cannot create cache of requested size: Please check your SHMMAX and SHMMIN settings
 [ERR] JVMSHRC663I Error recovery: destroyed semaphore set with id=5111809 associated with shared class cache.
 [ERR] JVMJ9VM015W Initialization error for library j9shr29(11): JVMJ9VM009E J9VMDllMain failed

@AdamBrousseau
Contributor

Typo in config file. Rebooting now

@pshipton removed the blocker label Jan 25, 2019

@hangshao0
Contributor

Test-extended.system-JDK8-osx_x86-64_cmprssptrs SharedClassesAPI_0 is now passing:
https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-osx_x86-64_cmprssptrs/
