Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#74 causing asserts on Windows, and failures on osx #75

Open
pshipton opened this issue Jan 18, 2019 · 22 comments
Open

#74 causing asserts on Windows, and failures on osx #75

pshipton opened this issue Jan 18, 2019 · 22 comments

Comments

@pshipton
Copy link
Contributor Author

@Mesbah-Alam

@pshipton
Copy link
Contributor Author

Windows

JVMSHRC777W The softmx limit for shared cache usage is smaller than the minimum feasible value 3980759048 bytes.
08:16:49.391 0x1bd9a00   j9shr.1012   *   ** ASSERTION FAILED ** at CompositeCache.cpp:6276: (((0 != _theca) && hasWriteMutex(currentThread) && (getTotalSize() >= softMaxBytes) && (softMaxBytes >= getUsedBytes())))

osx
JVMSHRC756W Failed to set group access permission on the shared cache file as requested by the 'groupAccess' sub-option.

JVMSHRC659E An error has occurred while opening shared memory
JVMSHRC336E Port layer error code = -393970
JVMSHRC337E Platform error message: shmget : Cannot allocate memory
JVMSHRC029E Not enough memory left on the system
JVMSHRC663I Error recovery: destroyed semaphore set with id=65622 associated with shared class cache.

@pshipton
Copy link
Contributor Author

Failed to set group access permission sounds like a machine problem. Not sure if this was occurring in the passing runs since there is no results file.

@Mesbah-Alam
Copy link
Contributor

Related: eclipse-openj9/openj9#4375

@pshipton
Copy link
Contributor Author

We are getting the assert again
https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-win_x86-64_cmprssptrs/146/

JVMSHRC777W The softmx limit for shared cache usage is smaller than the minimum feasible value 3980759048 bytes.
10:57:18.897 0xf39a00   j9shr.1012   *   ** ASSERTION FAILED ** at CompositeCache.cpp:6276: (((0 != _theca) && hasWriteMutex(currentThread) && (getTotalSize() >= softMaxBytes) && (softMaxBytes >= getUsedBytes())))

@hangshao0 please help figure this out.

@pshipton
Copy link
Contributor Author

@hangshao0
Copy link
Contributor

hangshao0 commented Jan 25, 2019

23:43:37.432892070 0x0000000000ee9a00 j9prt.58 Exit <j9shmem_stat exit - contents of controlFile C:\Users\jenkins\workspace\Test-extended.system-JDK8-win_x86-64_cmprssptrs\openjdk-tests\TestConfig\test_output_15477678633841\SharedClassesAPI_0\20190117-174310-SharedClassesAPI\results\caches\javasharedresources\C290M4F1A64_DefinedLocationGroupAccessJavaWL4_G35 is corrupted

23:43:37.432899289 0x0000000000ee9a00 j9shr.2233 Exception * SH_OSCachesysv::initializeHeader: The softmx bytes passed in is too big, it is reset to the cache file size 0

The cache file size is 0 and has been corrupted.

@Mesbah-Alam
Copy link
Contributor

There were updates made to the test by this PR yesterday: #79.

A 5x Grinder was run at OpenJ9 for it and it passed: https://ci.eclipse.org/openj9/view/Test/job/Test-Grinder/245/consoleFull

The grinder ran on https://ci.eclipse.org/openj9/computer/win2012r2-x86-1/

The release build, in which the test failed, ran on the same machine too : https://ci.eclipse.org/openj9/computer/win2012r2-x86-1/

@hangshao0
Copy link
Contributor

hangshao0 commented Jan 25, 2019

GEN 00:16:02.085 -     5  execute  Run java          DefaultLocationGroupAccessJava: Start workload for DefaultLocationGroupAccessJavaWL1
GEN 00:16:02.085 -     6  execute  Run java          DefaultLocationGroupAccessJava: Start workload for DefaultLocationGroupAccessJavaWL2
GEN 00:16:02.085 -     7  execute  Run java          DefaultLocationGroupAccessJava: Start workload for DefaultLocationGroupAccessJavaWL3
GEN 00:16:02.085 -     8  execute  Run java          DefaultLocationGroupAccessJava: Start workload for DefaultLocationGroupAccessJavaWL4
...
...
GEN 00:16:02.085 -    27  execute  Run java          DefinedLocationGroupAccessJava: Start workload for DefinedLocationGroupAccessJavaWL1
GEN 00:16:02.085 -    28  execute  Run java          DefinedLocationGroupAccessJava: Start workload for DefinedLocationGroupAccessJavaWL2
GEN 00:16:02.085 -    29  execute  Run java          DefinedLocationGroupAccessJava: Start workload for DefinedLocationGroupAccessJavaWL3
GEN 00:16:02.085 -    30  execute  Run java          DefinedLocationGroupAccessJava: Start workload for DefinedLocationGroupAccessJavaWL4

Step 7 & 8 always pass and step 29 & 30 always fail. The only difference I notice is that a cacheDir parameter is passed to step 29 & 30.

23:43:37.432873001  0x0000000000ee9a00 j9shr.510           Exit       <shmemOpenWrapper exiting, return code = 110, specLeng = 314572800
23:43:37.432879661  0x0000000000ee9a00 j9prt.56            Entry      >j9shmem_stat entered, name = C290M4F1A64_DefinedLocationGroupAccessJavaWL4_G35
23:43:37.432892070  0x0000000000ee9a00 j9prt.58            Exit       <j9shmem_stat exit - contents of controlFile C:\Users\jenkins\workspace\Test-extended.system-JDK8-win_x86-64_cmprssptrs\openjdk-tests\TestConfig\test_output_15477678633841\SharedClassesAPI_0\20190117-174310-SharedClassesAPI\results\caches\javasharedresources\\C290M4F1A64_DefinedLocationGroupAccessJavaWL4_G35 is corrupted

j9shr.510 shows the cache file is created successfully (return code 110 means J9PORT_INFO_SHMEM_CREATED). But the trace point j9prt.58 - Trc_PRT_shmem_j9shmem_stat_Exit2 shows up right after that. It is triggered only when CreateFileW() returns NULL or INVALID_HANDLE_VALUE.

memHandle = CreateFileW(unicodePath, GENERIC_READ, (FILE_SHARE_READ | FILE_SHARE_WRITE), NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
	
	if (unicodeBuffer != unicodePath) {
		omrmem_free_memory(unicodePath);
	}
		
	if ((NULL == memHandle) || (INVALID_HANDLE_VALUE==memHandle)) {
		Trc_PRT_shmem_j9shmem_stat_Exit2(sharedMemoryFullPath);        <---------- triggered
		omrmem_free_memory(sharedMemoryFullPath);
		return -1;
	}

So it means the cache file is created then immediately deleted.
I guess it might be related to the cacheDir being used in step 29 & 30. (C:\Users\jenkins\workspace\Test-extended.system-JDK8-win_x86-64_cmprssptrs\openjdk-tests\TestConfig\test_output_15477678633841\SharedClassesAPI_0\20190117-174310-SharedClassesAPI\results\caches). The default cacheDir in Step 7 & 8 works fine.

@hangshao0
Copy link
Contributor

hangshao0 commented Jan 25, 2019

I've created a PR to include the error code in the trace point: eclipse-openj9/openj9#4463

@hangshao0
Copy link
Contributor

#75 (comment)

Also @Mesbah-Alam launched a 5x grinder (Windows JDK8) in internal Jenkins yesterday, all passed.

@pshipton
Copy link
Contributor Author

Perhaps the failure is machine specific? Grinders only run on a single machine.

@pshipton
Copy link
Contributor Author

Although Mesba says in the comment that the same machine was used.

@pshipton
Copy link
Contributor Author

If it fails in the nightly builds but not in a grinder, maybe one of the other tests is leaving a process running.

@Mesbah-Alam
Copy link
Contributor

@hangshao0 - you mentioned about the default limit, in osx, for the maximum number of shared classes cache, that is allowed to be created - 3. Just curious, is the limit same on Windows too?

@hangshao0
Copy link
Contributor

Just curious, is the limit same on Windows too?

No.

The length of path
"C:\Users\jenkins\workspace\Test-extended.system-JDK8-win_x86-64_cmprssptrs\openjdk-tests\TestConfig\test_output_15477678633841\SharedClassesAPI_0\20190117-174310-SharedClassesAPI\results\caches\javasharedresources\\C290M4F1A64_DefinedLocationGroupAccessJavaWL4_G35"
is 262, which is greater than MAX_PATH(=260) on Windows. The cacheDir/cacheName is too long. The test needs to be updated to use a shorter cacheDir/cacheName. @Mesbah-Alam

The JVM should do a better job reporting this error.

@Mesbah-Alam
Copy link
Contributor

@hangshao0 - I will have a pr with changed path for non-default cache location.

@hangshao0
Copy link
Contributor

CacheDir used in grinder is shorter than the one used in builds. So it passed in grinder but failed in builds.

CacheDir in grinder:
C:\Users\jenkins\workspace\Test-Grinder\openjdk-tests\TestConfig\test_output_15483611858909\SharedClassesAPI_0\20190124-141950-SharedClassesAPI\results\caches
CacheDir in builds:
C:\Users\jenkins\workspace\Test-extended.system-JDK8-win_x86-64_cmprssptrs\openjdk-tests\TestConfig\test_output_15477678633841\SharedClassesAPI_0\20190117-174310-SharedClassesAPI\results\caches

@Mesbah-Alam
Copy link
Contributor

Path has been shortened in #80. They should now be less than MAX_PATH(=260).

@hangshao0
Copy link
Contributor

This test is now passing on Windows and OSX

Windows JDK8:

22:20:49 STF 21:20:51.524 - Overall result: PASSED
22:20:49 
22:20:49 SharedClassesAPI_0_PASSED

from https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-win_x86-64_cmprssptrs/152/consoleFull

Window JDK11:
https://ci.eclipse.org/openj9/job/Test-extended.system-JDK11-win_x86-64_cmprssptrs/152/

OSX JDK8:
https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-osx_x86-64_cmprssptrs/51/

OSX JDK11:

03:01:00 STF 00:00:59.275 - Overall result: PASSED
03:01:00 
03:01:00 SharedClassesAPI_0_PASSED

from https://ci.eclipse.org/openj9/job/Test-extended.system-JDK11-osx_x86-64_cmprssptrs/68/consoleFull

@Mesbah-Alam
Copy link
Contributor

The JVM should do a better job reporting this error.

@hangshao0 - do we have an issue opened for this?

@hangshao0
Copy link
Contributor

do we have an issue opened for this?

I will create a pull request to fix this shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants