#74 causing asserts on Windows, and failures on osx #75

pshipton · 2019-01-18T13:17:10Z

I assume these are the result of #74 and #71

https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-win_x86-64_cmprssptrs/141/
https://ci.eclipse.org/openj9/job/Test-extended.system-JDK11-win_x86-64_cmprssptrs/137/

https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-osx_x86-64_cmprssptrs/36/
https://ci.eclipse.org/openj9/job/Test-extended.system-JDK11-osx_x86-64_cmprssptrs/53/

pshipton · 2019-01-18T13:17:19Z

@Mesbah-Alam

pshipton · 2019-01-18T13:21:58Z

Windows

JVMSHRC777W The softmx limit for shared cache usage is smaller than the minimum feasible value 3980759048 bytes.
08:16:49.391 0x1bd9a00   j9shr.1012   *   ** ASSERTION FAILED ** at CompositeCache.cpp:6276: (((0 != _theca) && hasWriteMutex(currentThread) && (getTotalSize() >= softMaxBytes) && (softMaxBytes >= getUsedBytes())))

osx
JVMSHRC756W Failed to set group access permission on the shared cache file as requested by the 'groupAccess' sub-option.

JVMSHRC659E An error has occurred while opening shared memory
JVMSHRC336E Port layer error code = -393970
JVMSHRC337E Platform error message: shmget : Cannot allocate memory
JVMSHRC029E Not enough memory left on the system
JVMSHRC663I Error recovery: destroyed semaphore set with id=65622 associated with shared class cache.

pshipton · 2019-01-18T13:23:59Z

Failed to set group access permission sounds like a machine problem. Not sure if this was occurring in the passing runs since there is no results file.

Mesbah-Alam · 2019-01-23T17:40:48Z

Related: eclipse-openj9/openj9#4375

pshipton · 2019-01-25T12:53:18Z

We are getting the assert again
https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-win_x86-64_cmprssptrs/146/

JVMSHRC777W The softmx limit for shared cache usage is smaller than the minimum feasible value 3980759048 bytes.
10:57:18.897 0xf39a00   j9shr.1012   *   ** ASSERTION FAILED ** at CompositeCache.cpp:6276: (((0 != _theca) && hasWriteMutex(currentThread) && (getTotalSize() >= softMaxBytes) && (softMaxBytes >= getUsedBytes())))

@hangshao0 please help figure this out.

pshipton · 2019-01-25T12:55:02Z

0.12 release build
https://ci.eclipse.org/openj9/job/Test-extended.system-JDK11-win_x86-64_cmprssptrs/144

hangshao0 · 2019-01-25T15:35:56Z

23:43:37.432892070 0x0000000000ee9a00 j9prt.58 Exit <j9shmem_stat exit - contents of controlFile C:\Users\jenkins\workspace\Test-extended.system-JDK8-win_x86-64_cmprssptrs\openjdk-tests\TestConfig\test_output_15477678633841\SharedClassesAPI_0\20190117-174310-SharedClassesAPI\results\caches\javasharedresources\C290M4F1A64_DefinedLocationGroupAccessJavaWL4_G35 is corrupted

23:43:37.432899289 0x0000000000ee9a00 j9shr.2233 Exception * SH_OSCachesysv::initializeHeader: The softmx bytes passed in is too big, it is reset to the cache file size 0

The cache file size is 0 and has been corrupted.

Mesbah-Alam · 2019-01-25T16:31:12Z

There were updates made to the test by this PR yesterday: #79.

A 5x Grinder was run at OpenJ9 for it and it passed: https://ci.eclipse.org/openj9/view/Test/job/Test-Grinder/245/consoleFull

The grinder ran on https://ci.eclipse.org/openj9/computer/win2012r2-x86-1/

The release build, in which the test failed, ran on the same machine too : https://ci.eclipse.org/openj9/computer/win2012r2-x86-1/

hangshao0 · 2019-01-25T20:22:28Z

GEN 00:16:02.085 -     5  execute  Run java          DefaultLocationGroupAccessJava: Start workload for DefaultLocationGroupAccessJavaWL1
GEN 00:16:02.085 -     6  execute  Run java          DefaultLocationGroupAccessJava: Start workload for DefaultLocationGroupAccessJavaWL2
GEN 00:16:02.085 -     7  execute  Run java          DefaultLocationGroupAccessJava: Start workload for DefaultLocationGroupAccessJavaWL3
GEN 00:16:02.085 -     8  execute  Run java          DefaultLocationGroupAccessJava: Start workload for DefaultLocationGroupAccessJavaWL4
...
...
GEN 00:16:02.085 -    27  execute  Run java          DefinedLocationGroupAccessJava: Start workload for DefinedLocationGroupAccessJavaWL1
GEN 00:16:02.085 -    28  execute  Run java          DefinedLocationGroupAccessJava: Start workload for DefinedLocationGroupAccessJavaWL2
GEN 00:16:02.085 -    29  execute  Run java          DefinedLocationGroupAccessJava: Start workload for DefinedLocationGroupAccessJavaWL3
GEN 00:16:02.085 -    30  execute  Run java          DefinedLocationGroupAccessJava: Start workload for DefinedLocationGroupAccessJavaWL4

Step 7 & 8 always pass and step 29 & 30 always fail. The only difference I notice is that a cacheDir parameter is passed to step 29 & 30.

23:43:37.432873001  0x0000000000ee9a00 j9shr.510           Exit       <shmemOpenWrapper exiting, return code = 110, specLeng = 314572800
23:43:37.432879661  0x0000000000ee9a00 j9prt.56            Entry      >j9shmem_stat entered, name = C290M4F1A64_DefinedLocationGroupAccessJavaWL4_G35
23:43:37.432892070  0x0000000000ee9a00 j9prt.58            Exit       <j9shmem_stat exit - contents of controlFile C:\Users\jenkins\workspace\Test-extended.system-JDK8-win_x86-64_cmprssptrs\openjdk-tests\TestConfig\test_output_15477678633841\SharedClassesAPI_0\20190117-174310-SharedClassesAPI\results\caches\javasharedresources\\C290M4F1A64_DefinedLocationGroupAccessJavaWL4_G35 is corrupted

j9shr.510 shows the cache file is created successfully (return code 110 means J9PORT_INFO_SHMEM_CREATED). But the trace point j9prt.58 - Trc_PRT_shmem_j9shmem_stat_Exit2 shows up right after that. It is triggered only when CreateFileW() returns NULL or INVALID_HANDLE_VALUE.

memHandle = CreateFileW(unicodePath, GENERIC_READ, (FILE_SHARE_READ | FILE_SHARE_WRITE), NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
	
	if (unicodeBuffer != unicodePath) {
		omrmem_free_memory(unicodePath);
	}
		
	if ((NULL == memHandle) || (INVALID_HANDLE_VALUE==memHandle)) {
		Trc_PRT_shmem_j9shmem_stat_Exit2(sharedMemoryFullPath);        <---------- triggered
		omrmem_free_memory(sharedMemoryFullPath);
		return -1;
	}

So it means the cache file is created then immediately deleted.
I guess it might be related to the cacheDir being used in step 29 & 30. (C:\Users\jenkins\workspace\Test-extended.system-JDK8-win_x86-64_cmprssptrs\openjdk-tests\TestConfig\test_output_15477678633841\SharedClassesAPI_0\20190117-174310-SharedClassesAPI\results\caches). The default cacheDir in Step 7 & 8 works fine.

hangshao0 · 2019-01-25T21:25:32Z

I've created a PR to include the error code in the trace point: eclipse-openj9/openj9#4463

hangshao0 · 2019-01-25T21:35:21Z

#75 (comment)

Also @Mesbah-Alam launched a 5x grinder (Windows JDK8) in internal Jenkins yesterday, all passed.

pshipton · 2019-01-25T21:55:01Z

Perhaps the failure is machine specific? Grinders only run on a single machine.

pshipton · 2019-01-25T21:56:11Z

Although Mesba says in the comment that the same machine was used.

pshipton · 2019-01-26T13:47:39Z

If it fails in the nightly builds but not in a grinder, maybe one of the other tests is leaving a process running.

Mesbah-Alam · 2019-01-28T15:26:30Z

@hangshao0 - you mentioned about the default limit, in osx, for the maximum number of shared classes cache, that is allowed to be created - 3. Just curious, is the limit same on Windows too?

hangshao0 · 2019-01-28T15:51:39Z

Just curious, is the limit same on Windows too?

No.

The length of path
"C:\Users\jenkins\workspace\Test-extended.system-JDK8-win_x86-64_cmprssptrs\openjdk-tests\TestConfig\test_output_15477678633841\SharedClassesAPI_0\20190117-174310-SharedClassesAPI\results\caches\javasharedresources\\C290M4F1A64_DefinedLocationGroupAccessJavaWL4_G35"
is 262, which is greater than MAX_PATH(=260) on Windows. The cacheDir/cacheName is too long. The test needs to be updated to use a shorter cacheDir/cacheName. @Mesbah-Alam

The JVM should do a better job reporting this error.

Mesbah-Alam · 2019-01-28T15:56:46Z

@hangshao0 - I will have a pr with changed path for non-default cache location.

hangshao0 · 2019-01-28T16:43:37Z

CacheDir used in grinder is shorter than the one used in builds. So it passed in grinder but failed in builds.

CacheDir in grinder:
C:\Users\jenkins\workspace\Test-Grinder\openjdk-tests\TestConfig\test_output_15483611858909\SharedClassesAPI_0\20190124-141950-SharedClassesAPI\results\caches
CacheDir in builds:
C:\Users\jenkins\workspace\Test-extended.system-JDK8-win_x86-64_cmprssptrs\openjdk-tests\TestConfig\test_output_15477678633841\SharedClassesAPI_0\20190117-174310-SharedClassesAPI\results\caches

Mesbah-Alam · 2019-01-28T17:42:07Z

Path has been shortened in #80. They should now be less than MAX_PATH(=260).

hangshao0 · 2019-01-29T15:54:39Z

This test is now passing on Windows and OSX

Windows JDK8:

22:20:49 STF 21:20:51.524 - Overall result: PASSED
22:20:49 
22:20:49 SharedClassesAPI_0_PASSED

from https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-win_x86-64_cmprssptrs/152/consoleFull

Window JDK11:
https://ci.eclipse.org/openj9/job/Test-extended.system-JDK11-win_x86-64_cmprssptrs/152/

OSX JDK8:
https://ci.eclipse.org/openj9/job/Test-extended.system-JDK8-osx_x86-64_cmprssptrs/51/

OSX JDK11:

03:01:00 STF 00:00:59.275 - Overall result: PASSED
03:01:00 
03:01:00 SharedClassesAPI_0_PASSED

from https://ci.eclipse.org/openj9/job/Test-extended.system-JDK11-osx_x86-64_cmprssptrs/68/consoleFull

Mesbah-Alam · 2019-01-29T16:01:18Z

The JVM should do a better job reporting this error.

@hangshao0 - do we have an issue opened for this?

hangshao0 · 2019-01-29T16:05:30Z

do we have an issue opened for this?

I will create a pull request to fix this shortly.

Mesbah-Alam mentioned this issue Jan 28, 2019

Use shorter path for cacheDir since on Windows MAX_PATH is 260 #80

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

#74 causing asserts on Windows, and failures on osx #75

#74 causing asserts on Windows, and failures on osx #75

pshipton commented Jan 18, 2019 •

edited

Loading

pshipton commented Jan 18, 2019

pshipton commented Jan 18, 2019

pshipton commented Jan 18, 2019

Mesbah-Alam commented Jan 23, 2019

pshipton commented Jan 25, 2019

pshipton commented Jan 25, 2019

hangshao0 commented Jan 25, 2019 •

edited

Loading

Mesbah-Alam commented Jan 25, 2019

hangshao0 commented Jan 25, 2019 •

edited

Loading

hangshao0 commented Jan 25, 2019 •

edited

Loading

hangshao0 commented Jan 25, 2019

pshipton commented Jan 25, 2019

pshipton commented Jan 25, 2019

pshipton commented Jan 26, 2019

Mesbah-Alam commented Jan 28, 2019

hangshao0 commented Jan 28, 2019

Mesbah-Alam commented Jan 28, 2019

hangshao0 commented Jan 28, 2019

Mesbah-Alam commented Jan 28, 2019

hangshao0 commented Jan 29, 2019

Mesbah-Alam commented Jan 29, 2019

hangshao0 commented Jan 29, 2019

#74 causing asserts on Windows, and failures on osx #75

#74 causing asserts on Windows, and failures on osx #75

Comments

pshipton commented Jan 18, 2019 • edited Loading

pshipton commented Jan 18, 2019

pshipton commented Jan 18, 2019

pshipton commented Jan 18, 2019

Mesbah-Alam commented Jan 23, 2019

pshipton commented Jan 25, 2019

pshipton commented Jan 25, 2019

hangshao0 commented Jan 25, 2019 • edited Loading

Mesbah-Alam commented Jan 25, 2019

hangshao0 commented Jan 25, 2019 • edited Loading

hangshao0 commented Jan 25, 2019 • edited Loading

hangshao0 commented Jan 25, 2019

pshipton commented Jan 25, 2019

pshipton commented Jan 25, 2019

pshipton commented Jan 26, 2019

Mesbah-Alam commented Jan 28, 2019

hangshao0 commented Jan 28, 2019

Mesbah-Alam commented Jan 28, 2019

hangshao0 commented Jan 28, 2019

Mesbah-Alam commented Jan 28, 2019

hangshao0 commented Jan 29, 2019

Mesbah-Alam commented Jan 29, 2019

hangshao0 commented Jan 29, 2019

pshipton commented Jan 18, 2019 •

edited

Loading

hangshao0 commented Jan 25, 2019 •

edited

Loading

hangshao0 commented Jan 25, 2019 •

edited

Loading

hangshao0 commented Jan 25, 2019 •

edited

Loading