Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

omrsysinfo_get_CPU_load() return a new error for insufficient data #7189

Merged
merged 1 commit into from
Dec 6, 2023

Conversation

thallium
Copy link
Contributor

@thallium thallium commented Nov 22, 2023

This is to determine if a call to omrsysinfo_get_CPU_load has only
one data point recorded.

A new error, OMRPORT_ERROR_INSUFFICIENT_DATA, is returned
in this scenario.

Also fixed a mistake in the description saying that the first two
invocations return OMRPORT_ERROR_OPFAILED and fix the time unit in the
description to ms.

Related: eclipse-openj9/openj9#18451

@thallium
Copy link
Contributor Author

@tajila FYI

port/unix/omrsysinfo.c Outdated Show resolved Hide resolved
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for supporting the project, and congratulations on your first contribution! A project committer will shortly review your contribution. In the mean time, if you haven't had a chance please skim over the contribution guidelines which all pull requests must adhere to. If the ECA pull request check fails, have a look at the instructions for signing the ECA in the legal considerations section.

If you run into any problems our community will be happy to assist you in any way we can. There are a number of recommended ways to interact with the community. We encourage you to ask questions, or drop by to say hello.

@tajila
Copy link
Contributor

tajila commented Nov 22, 2023

@babsingh Please review these changes

@babsingh
Copy link
Contributor

jenkins build all

@babsingh
Copy link
Contributor

Currently, this PR only updates the unix impl. The win32 impl also needs to be updated:

omr/port/win32/omrsysinfo.c

Lines 1271 to 1275 in 8b19b80

if (oldestCPUTime->timestamp == 0) {
*oldestCPUTime = currentCPUTime;
*latestCPUTime = currentCPUTime;
return OMRPORT_ERROR_OPFAILED;
}

@babsingh
Copy link
Contributor

Also, the function description needs to be updated to reflect the new error handling:

/**
* Obtain the cumulative CPU utilization of all CPUs on the system as a double in the [0.0, 1.0] interval. A value of
* 0.0 means all of the CPUs were idle in the recent period of time observed, while a value of 1.0 means that all CPUs
* were actively utilized 100% of the time during the recent period of time observed.
*
* All values in the interval are possible.
*
* @param[in] portLibrary The port library.
* @param[out] cpuLoad the cumulative CPU utilization of all CPUs on the system
*
* @return
* - 0 on success
* - \arg OMRPORT_ERROR_OPFAILED on the first two invocations of this API
* - \arg OMRPORT_ERROR_OPFAILED if less than 10ns have passed since the second call to this API
* - negative portable error code on other failures
*/
intptr_t
omrsysinfo_get_CPU_load(struct OMRPortLibrary *portLibrary, double *cpuLoad)
{
return OMRPORT_ERROR_SYSINFO_NOT_SUPPORTED;
}

@babsingh
Copy link
Contributor

There is also the zos impl, which needs to be updated:

omr/port/unix/omrsysinfo.c

Lines 4671 to 4678 in 8b19b80

#elif defined(J9ZOS390) /* (defined(LINUX) && !defined(OMRZTPF)) || defined(AIXPPC) || defined(OSX) */
currentCPUTime.timestamp = portLibrary->time_nano_time(portLibrary);
if (oldestCPUTime->timestamp == 0) {
*oldestCPUTime = currentCPUTime;
*latestCPUTime = currentCPUTime;
return OMRPORT_ERROR_OPFAILED;
}

@thallium
Copy link
Contributor Author

Also, the function description needs to be updated to reflect the new error handling:

/**
* Obtain the cumulative CPU utilization of all CPUs on the system as a double in the [0.0, 1.0] interval. A value of
* 0.0 means all of the CPUs were idle in the recent period of time observed, while a value of 1.0 means that all CPUs
* were actively utilized 100% of the time during the recent period of time observed.
*
* All values in the interval are possible.
*
* @param[in] portLibrary The port library.
* @param[out] cpuLoad the cumulative CPU utilization of all CPUs on the system
*
* @return
* - 0 on success
* - \arg OMRPORT_ERROR_OPFAILED on the first two invocations of this API
* - \arg OMRPORT_ERROR_OPFAILED if less than 10ns have passed since the second call to this API
* - negative portable error code on other failures
*/
intptr_t
omrsysinfo_get_CPU_load(struct OMRPortLibrary *portLibrary, double *cpuLoad)
{
return OMRPORT_ERROR_SYSINFO_NOT_SUPPORTED;
}

Why does it says "on the first two invocations"? Is that a typo?

@babsingh
Copy link
Contributor

babsingh commented Nov 23, 2023

Why does it says "on the first two invocations"? Is that a typo?

I am unable to confirm by just reading the below design and implementation; it doesn't seem straightforward.

@thallium Can you verify by adding print statements for the return value in the existing test (see below)? I can run PR builds on all platforms once you have added debug code in the test. on the first two invocations: might be platform specific.

omr/fvtest/porttest/si.cpp

Lines 1689 to 1716 in 8b19b80

TEST(PortSysinfoTest, sysinfo_test_get_CPU_load)
{
OMRPORT_ACCESS_FROM_OMRPORT(portTestEnv->getPortLibrary());
/* As per the API specification the first two calls to this API will return a negative portable error code. However
* for the purposes of this test we will not be testing this. This is because the test infrastructure is setup such
* that we cannot guarantee that no other test has called omrsysinfo_get_CPU_utlization or omrsysinfo_get_CPU_load
* up to this point. If some other test did call these APIs then the internal buffers would have been populated and
* as such the omrsysinfo_get_CPU_load could return a zero return code on the very first invocation within this
* test.
*
* To avoid inter-test dependencies we do not assert on the return value of the first two calls here, and only test
* that the API returns valid numbers within the range outlined in the API specification.
*/
double cpuLoad;
omrsysinfo_get_CPU_load(&cpuLoad);
omrsysinfo_get_CPU_load(&cpuLoad);
/* Sleep for 100ms before re-sampling processor usage stats. This allows other processes and the operating system to
* use the CPU and drive up the user and kernel utilization. The call to cpuBurner probably won't be optimized out,
* but use the result to make absolutely sure that it isn't.
*/
omrthread_sleep(100 + cpuBurner(OMRPORTLIB, "a"));
ASSERT_EQ(omrsysinfo_get_CPU_load(&cpuLoad), 0);
ASSERT_GE(cpuLoad, 0.0);
ASSERT_LE(cpuLoad, 1.0);
}

@babsingh
Copy link
Contributor

babsingh commented Nov 23, 2023

@fjeremic For #7189 (comment), do you remember why the second invocation of omrsysinfo_get_CPU_load also returns OMRPORT_ERROR_OPFAILED?

@thallium
Copy link
Contributor Author

@babsingh I've added the debug output, can you run the test?

@babsingh
Copy link
Contributor

jenkins build all(test:'-R porttest',env:'GTEST_FILTER=PortSysinfoTest.sysinfo_test_get_CPU_load',compile:'-j8')

If the printf output is not seen, then the log methods in the test suite will need to be used.

@babsingh
Copy link
Contributor

jenkins build all(test:'-R porttest',env:'GTEST_FILTER=PortSysinfoTest.sysinfo_test_get_CPU_load')

@babsingh
Copy link
Contributor

Windows compilation failed:

00:25:44      53>c:\omr\workspace\pullrequest-win_x86-64\build\fvtest\porttest\si.cpp(1705): error C2220: warning treated as error - no 'object' file generated [C:\omr\workspace\PullRequest-win_x86-64\Build\build\fvtest\porttest\omrporttest.vcxproj]
00:25:44      53>c:\omr\workspace\pullrequest-win_x86-64\build\fvtest\porttest\si.cpp(1705): warning C4244: '=': conversion from 'intptr_t' to 'int', possible loss of data [C:\omr\workspace\PullRequest-win_x86-64\Build\build\fvtest\porttest\omrporttest.vcxproj]
00:25:44      53>c:\omr\workspace\pullrequest-win_x86-64\build\fvtest\porttest\si.cpp(1707): warning C4244: '=': conversion from 'intptr_t' to 'int', possible loss of data [C:\omr\workspace\PullRequest-win_x86-64\Build\build\fvtest\porttest\omrporttest.vcxproj]

Test output:

00:43:49  15: Return value of first invocation: -21
00:43:49  15: Return value of second invocation: -1

I only checked a few build jobs. I consistently saw the above test output. @thallium Verify the test output in the PR builds as well. Fix the Windows compilation issue and verify the test output on Windows.

@babsingh
Copy link
Contributor

babsingh commented Nov 24, 2023

The test output for Windows is also the same:

00:43:49  15: Return value of first invocation: -21
00:43:49  15: Return value of second invocation: -1

The second OMRPORT_ERROR_OPFAILED is probably returned from the end of the function. @tajila I don't believe the current changes are sufficient to support eclipse-openj9/openj9#18451. Based upon the above behaviour, there is still ambiguity between the insufficient data and insufficient time cases.

@tajila
Copy link
Contributor

tajila commented Dec 4, 2023

Intent of eclipse-openj9/openj9#18451: Make get[Process|System]CpuLoad() return 0 on the first call.

I don't think this is fully accurate. The real intent is to return zero if we have one data point instead of -1 as the RI does. This is usually the case on the first call.

can the RI still throw an error in the first call?

Im not certain, ubt I would expect that to be the case. We can try to verify this. For example, if /proc/stat was not reachable then I would expect RI to fail as well and not return zero.

If RI doesn't do this, then they are losing information, because returning 0 when there is a hard failure means there is no way to tell that there is a functional issue. In which case, I would still be okay with the behaviour being proposed here.

@babsingh
Copy link
Contributor

babsingh commented Dec 5, 2023

@thallium Please post an update in the PR once the below tasks are completed.

  • Update the documentation to reflect .... The real intent is to return zero if we have one data point instead of -1 as the RI does.
  • Verify RI's behaviour if /proc/stat is not reachable.

@thallium
Copy link
Contributor Author

thallium commented Dec 5, 2023

Verify RI's behaviour if /proc/stat is not reachable.

RI returns -1 in this situation.

@babsingh
Copy link
Contributor

babsingh commented Dec 6, 2023

Thanks @thallium for verifying RI's behaviour. Can you also update the commit message to reflect the new documentation?

Return another error if only one data point has been recorded

This is to determine if a call to omrsysinfo_get_CPU_load has only
one data point recorded.

...

@thallium thallium force-pushed the master branch 2 times, most recently from 4897e4a to 5c4e691 Compare December 6, 2023 15:32
@babsingh
Copy link
Contributor

babsingh commented Dec 6, 2023

new line (return) is missing from the first line.

image

@babsingh babsingh changed the title Make omrsysinfo_get_CPU_load() return another error code on first call omrsysinfo_get_CPU_load() return a new error for insufficient data Dec 6, 2023
@babsingh
Copy link
Contributor

babsingh commented Dec 6, 2023

still a space is missing onlyone -> only one. i have updated the PR description for you.
image

This is to determine if a call to omrsysinfo_get_CPU_load has only one
data point recorded.

Also fixed a mistake in the description saying that the first two
invocations return OMRPORT_ERROR_OPFAILED and fix the time unit in the
description to "ms".

Related: eclipse-openj9/openj9#18451

Signed-off-by: Gengchen Tuo <[email protected]>
@babsingh
Copy link
Contributor

babsingh commented Dec 6, 2023

jenkins build all

@babsingh
Copy link
Contributor

babsingh commented Dec 6, 2023

jenkins build aix

@babsingh
Copy link
Contributor

babsingh commented Dec 6, 2023

Only a known issue is observed:

@babsingh babsingh merged commit e5e7d55 into eclipse:master Dec 6, 2023
16 of 18 checks passed
thallium added a commit to thallium/openj9 that referenced this pull request Dec 13, 2023
Currently, getProcessCpuLoad() and getSystemCpuLoad() return -1 if only
one data point has been recorded. A compatibility flag
-XX:[+/-]CpuLoadCompatibility is added to match the behaviour of the RI,
which is to return 0.

Fixes: eclipse-openj9#13389
Related: eclipse/omr#7189

Signed-off-by: Gengchen Tuo <[email protected]>
thallium added a commit to thallium/openj9 that referenced this pull request Dec 15, 2023
Currently, getProcessCpuLoad() and getSystemCpuLoad() return -1 if only
one data point has been recorded. A compatibility flag
-XX:[+/-]CpuLoadCompatibility is added to match the behaviour of the RI,
which is to return 0.

Fixes: eclipse-openj9#13389
Related: eclipse/omr#7189

Signed-off-by: Gengchen Tuo <[email protected]>
thallium added a commit to thallium/openj9 that referenced this pull request Dec 17, 2023
Currently, getProcessCpuLoad() and getSystemCpuLoad() return -1 if only
one data point has been recorded. A compatibility flag
-XX:[+/-]CpuLoadCompatibility is added to match the behaviour of the RI,
which is to return 0.

Fixes: eclipse-openj9#13389
Related: eclipse/omr#7189

Signed-off-by: Gengchen Tuo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants