Improve performance of TimeUtil in different load conditions #1746

Merged
merged 1 commit into from
Jun 29, 2021


jasonjoo2010
Collaborator

Describe what this PR does / why we need it

The current design of TimeUtil can cost far more CPU than users expect on idle systems.

Does this pull request fix one issue?

#1702

Describe how you did it

This is achieved by giving TimeUtil multiple running states:

  • IDLE: Direct invocation state for idle conditions.
  • RUNNING: Legacy mode for busy conditions.
  • REQUEST: Temporary state from IDLE to RUNNING.

With this design, TimeUtil no longer costs much on idle systems.
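
The state switching described above can be sketched roughly as follows. This is only an illustration and not the code in this PR: the class name, the `BUSY_THRESHOLD` value, the read counter, and the `tick()` method are all assumptions made for the example (the PR itself keeps its statistics in a `LeapArray<Statistic>`).

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Illustrative three-state clock. NOT Sentinel's TimeUtil: the class name,
 * threshold, counter, and method names are assumptions for this sketch.
 */
class AdaptiveClockSketch {
    enum State { IDLE, REQUEST, RUNNING }

    // Hypothetical threshold: reads per check above which caching pays off.
    static final long BUSY_THRESHOLD = 1000;

    private volatile State state = State.IDLE;
    private volatile long cachedMillis = System.currentTimeMillis();
    private final AtomicLong readsSinceCheck = new AtomicLong();

    /** Called by any thread that needs the current time. */
    long currentTimeMillis() {
        readsSinceCheck.incrementAndGet();
        // IDLE and REQUEST both read the system clock directly;
        // only RUNNING trusts the cached value.
        return state == State.RUNNING ? cachedMillis : System.currentTimeMillis();
    }

    /** One iteration of the single background thread's loop. */
    void tick() {
        long reads = readsSinceCheck.getAndSet(0);
        switch (state) {
            case IDLE:
                // Busy enough: ask for the cached mode.
                if (reads > BUSY_THRESHOLD) {
                    state = State.REQUEST;
                }
                break;
            case REQUEST:
                // Refresh the cache before readers are allowed to use it.
                cachedMillis = System.currentTimeMillis();
                state = State.RUNNING;
                break;
            case RUNNING:
                cachedMillis = System.currentTimeMillis();
                // Load dropped: fall back to direct invocation.
                if (reads <= BUSY_THRESHOLD) {
                    state = State.IDLE;
                }
                break;
        }
    }

    State state() {
        return state;
    }
}
```

Only the background thread writes `state` in this sketch, so readers never see RUNNING before the cached value has been refreshed at least once.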

Describe how to verify it

Refer to the unit test TimeUtilTest, or generate different loads against the demo projects.

Special notes for reviews

No

@jasonjoo2010 jasonjoo2010 linked an issue Sep 18, 2020 that may be closed by this pull request
@sczyh30 sczyh30 added the kind/enhancement Category issues or prs related to enhancement. label Sep 18, 2020
@jasonjoo2010
Collaborator Author

@sczyh30 Why does TimeUtilTest fail only in CI? Do you know the reason?
It succeeds in my local environment. What's the difference?

private volatile long currentTimeMillis;
private AtomicLong lastCheck = new AtomicLong();
private LeapArray<Statistic> statistics;
private STATE state = STATE.IDLE;
Contributor

I think state should be a volatile variable.

Collaborator Author

@jasonjoo2010 jasonjoo2010 Sep 21, 2020

Hi, liqiangz

state is declared without volatile for the following reasons:

  • State switches are assumed to be infrequent.
  • All writes to it happen on the same thread (so perhaps we should consider removing the unnecessary CAS operation).
  • Reads can be served from the L1/L2 cache for better performance.

So should it still be volatile? Is there another case I have not considered?

Contributor

@liqiangz liqiangz Sep 22, 2020

Thanks for your reply, but I am still confused about one thing.

When the daemon thread switches state to IDLE, currentTimeMillis is no longer updated. But another thread calling the current-time function may still see state as RUNNING, so getTime() would keep returning the stale currentTimeMillis.

Collaborator Author

Hi, liqiangz

Both IDLE and PREPARE are treated as IDLE in getTime(), while the daemon switches PREPARE to RUNNING only once it has actually started updating currentTimeMillis.

I will remove the CAS operation, which is a leftover from several design iterations. I will also consider whether state needs to be volatile in Java, since this scenario does not strictly require reordering to be forbidden. As we know, in rare cases C++ code like the following:

bool cond = true;
while (cond);

may be "optimized" to be

if (cond) {
    while (true) {}
}

But I am not sure whether this can happen with the JVM's bytecode, because it looks more like a bug that is common in C++ compilers. (aha)

What do you think about the visibility latency in practice?

Collaborator Author

The CAS operation has been removed and state is now volatile.
The concurrency level in the unit test has also been raised.

Contributor

@liqiangz liqiangz Sep 22, 2020

I think we should be very careful about variables shared by multiple threads. If you want to do some concurrency optimization, you must really understand what you are doing.

For programs that are not properly synchronized, it is almost impossible to reason correctly about the execution order of memory operations. A simple approach is to synchronize access to every variable shared by multiple threads. This is the view taken in the book Java Concurrency in Practice 😄

In this scenario, the value of state modified by one thread may never be seen by another thread, because we don't know what optimizations the compiler and processor will perform. But because currentTimeMillis is volatile (unlike C++, since JDK 5 the JMM strictly restricts how the compiler and processor may reorder volatile accesses with ordinary variable accesses), this program may have no problems.
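
To make the visibility concern concrete, here is a minimal sketch of a flag shared between two threads. It is not taken from this PR; the class and method names are made up for illustration. As far as I understand the Java Memory Model, without `volatile` the JIT is allowed to read the flag once and keep spinning, whereas a `volatile` read must observe the write made by `stop()`.

```java
/**
 * Illustration only: a stop flag shared between a worker and a controller.
 * The names here are hypothetical and unrelated to Sentinel's code.
 */
class VolatileFlagSketch {
    // volatile guarantees that the worker's read sees the controller's write.
    private volatile boolean running = true;
    // Touched only by the worker thread, so it needs no synchronization.
    private long iterations;

    /** Worker loop: spins until another thread calls stop(). */
    void spin() {
        while (running) {
            iterations++;
        }
    }

    /** Controller side: requests the worker to stop. */
    void stop() {
        running = false;
    }

    /** Runs the scenario and reports whether the worker observed stop(). */
    static boolean demo() {
        VolatileFlagSketch sketch = new VolatileFlagSketch();
        Thread worker = new Thread(sketch::spin, "spin-worker");
        worker.setDaemon(true);
        worker.start();
        try {
            Thread.sleep(50);   // let the worker spin for a while
            sketch.stop();
            worker.join(2000);  // wait up to two seconds for it to exit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }
}
```

Removing `volatile` from `running` does not necessarily hang the loop on every JVM, which is exactly why this kind of bug is hard to catch in a unit test.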

Collaborator Author

Yeah, I totally agree with you. So the CAS has been removed and state is marked volatile, while lastCheck remains a private, local variable.

@sczyh30 sczyh30 self-requested a review September 21, 2020 16:54
@sczyh30 sczyh30 added to-review To review area/performance Issues or PRs related to runtime performance labels Sep 21, 2020
- IDLE: Direct invocation state for idle conditions.
- RUNNING: Legacy mode for busy conditions.
- REQUEST: Temporary state from IDLE to RUNNING.

Through this design TimeUtil won't cost much in idle systems any more.

Signed-off-by: Jason Joo <[email protected]>
@sczyh30 sczyh30 added this to the 1.8.1 milestone Nov 6, 2020
@sczyh30 sczyh30 self-assigned this Nov 6, 2020
@linlinisme
Collaborator

Making the behavior mode of TimeUtils and its sleep duration configurable would be closer to what users need.

@jasonjoo2010
Collaborator Author

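> Making the behavior mode of TimeUtils and its sleep duration configurable would be closer to what users need.

The thinking here is this: more configuration options look like freedom, but in practice they add complexity to the tool.
So for parameterized scenarios, if a reasonable combination can be worked out (that is the precondition, of course), we should try not to add configuration options.

Occam's razor: entities should not be multiplied beyond necessity.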

@linlinisme
Collaborator

linlinisme commented Nov 20, 2020

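>> Making the behavior mode of TimeUtils and its sleep duration configurable would be closer to what users need.

> The thinking here is this: more configuration options look like freedom, but in practice they add complexity to the tool.
> So for parameterized scenarios, if a reasonable combination can be worked out (that is the precondition, of course), we should try not to add configuration options.
>
> Occam's razor: entities should not be multiplied beyond necessity.

What I mean is that for most Sentinel users, going straight to the native system call internally will not cause performance problems. The adaptive mode brings little benefit to these users and can sometimes cause confusion. The TimeUtils optimization is meant for cases already known to have extremely high traffic, which should be a minority. As for the sleep duration of TimeUtils, users with extremely high traffic may test the impact of different durations themselves; they are power users and will have their own ideas.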

@jasonjoo2010
Collaborator Author

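>>> Making the behavior mode of TimeUtils and its sleep duration configurable would be closer to what users need.

>> The thinking here is this: more configuration options look like freedom, but in practice they add complexity to the tool.
>> So for parameterized scenarios, if a reasonable combination can be worked out (that is the precondition, of course), we should try not to add configuration options.
>> Occam's razor: entities should not be multiplied beyond necessity.

> What I mean is that for most Sentinel users, going straight to the native system call internally will not cause performance problems. The adaptive mode brings little benefit to these users and can sometimes cause confusion. The TimeUtils optimization is meant for cases already known to have extremely high traffic, which should be a minority. As for the sleep duration of TimeUtils, users with extremely high traffic may test the impact of different durations themselves; they are power users and will have their own ideas.

That depends on how we evaluate it: either provide two implementations behind an interface and expose the choice as a configuration option (similar to the JVM's GC algorithms), or provide one unified implementation whose cost is not significantly higher (same order of magnitude) than that of either implementation A or B.
This class has quite a few call sites, and it is an open class that users can also call from their own business systems, which fits its role as a "performance extension toolkit".

So both approaches can indeed meet this need.
Personally I lean toward automatic adaptation; after all, the JVM's GC algorithms already dazzle more than just beginners.
This can be discussed further to collect different people's views.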
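
For comparison, the "two implementations behind an interface" alternative mentioned above could look roughly like this. This is a hypothetical sketch, not something proposed in this PR: the interface, the class names, and the "direct"/"cached" mode strings are all assumptions.

```java
/** Hypothetical abstraction over "where the current time comes from". */
interface TimeSourceSketch {
    long currentTimeMillis();
}

/** Asks the system clock on every call. */
class DirectTimeSource implements TimeSourceSketch {
    @Override
    public long currentTimeMillis() {
        return System.currentTimeMillis();
    }
}

/** Returns a value refreshed by a daemon thread roughly once per millisecond. */
class CachedTimeSource implements TimeSourceSketch {
    private volatile long cached = System.currentTimeMillis();

    CachedTimeSource() {
        Thread updater = new Thread(() -> {
            while (true) {
                cached = System.currentTimeMillis();
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    return; // stop refreshing when interrupted
                }
            }
        }, "cached-time-source");
        updater.setDaemon(true);
        updater.start();
    }

    @Override
    public long currentTimeMillis() {
        return cached;
    }
}

class TimeSources {
    /** mode is a hypothetical configuration value: "direct" or "cached". */
    static TimeSourceSketch forMode(String mode) {
        return "cached".equals(mode) ? new CachedTimeSource() : new DirectTimeSource();
    }
}
```

The trade-off discussed in this thread is visible here: the caller has to pick a mode up front, whereas the adaptive design in this PR makes that choice at runtime.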

binbin0325
binbin0325 approved these changes Dec 7, 2020
@sczyh30 sczyh30 added the priority/high Very important, need to be worked with soon but not very urgent label Dec 18, 2020
@sczyh30 sczyh30 modified the milestones: 1.8.1, 1.8.2 Jan 29, 2021
@JerryChin
Contributor

One small suggestion: using Thread.sleep(ONE_MILLISECOND) is more efficient than TimeUnit.MILLISECONDS.sleep(1).

PS: ONE_MILLISECOND = 1
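
For reference, the two calls being compared look like this side by side. According to its Javadoc, `TimeUnit.sleep` is a convenience method that converts its argument and then performs a `Thread.sleep`, so the direct call skips that conversion. The class and method names below are made up for this sketch, and it makes no claim about how large the difference is in practice.

```java
import java.util.concurrent.TimeUnit;

/** Illustration only: two ways of pausing for about one millisecond. */
class SleepComparisonSketch {
    static final long ONE_MILLISECOND = 1;

    /** Sleeps via Thread.sleep and returns the elapsed nanoseconds. */
    static long viaThreadSleep() {
        long start = System.nanoTime();
        try {
            Thread.sleep(ONE_MILLISECOND);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return System.nanoTime() - start;
    }

    /** Sleeps via TimeUnit, which converts the duration and delegates to Thread.sleep. */
    static long viaTimeUnitSleep() {
        long start = System.nanoTime();
        try {
            TimeUnit.MILLISECONDS.sleep(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return System.nanoTime() - start;
    }
}
```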

@sczyh30 sczyh30 removed the to-review To review label Jun 29, 2021
Member

@sczyh30 sczyh30 left a comment

LGTM

@sczyh30 sczyh30 merged commit f646997 into alibaba:master Jun 29, 2021
@sczyh30
Member

sczyh30 commented Jun 29, 2021

Nice work. This reduces the CPU footprint under low-frequency requests. Thanks for contributing!

sczyh30 pushed a commit that referenced this pull request Jun 29, 2021
…ns (#1746)

To achieve this by making TimeUtil have multiple running states:

- IDLE: Direct invocation state for idle conditions.
- RUNNING: Legacy mode for busy conditions.
- REQUEST: Temporary state from IDLE to RUNNING.

Through this design, TimeUtil won't cost much in idle systems anymore.

Signed-off-by: Jason Joo <[email protected]>
@JJFly-JOJO

JJFly-JOJO commented Jun 30, 2021

Local test verification

    1. Low-traffic situation
      thread count = 16
      Call SphU#entry every 1,000 ms
      QPS = 16

Before optimization: thread CPU usage = 60%-80%
After optimization: thread CPU usage = about 10%

    2. High-traffic situation
      thread count = 16
      Call SphU#entry every 2 ms
      QPS = 8000

Before optimization: thread CPU usage = about 8%
After optimization: thread CPU usage = about 8%

    3. High-concurrency situation
      thread count = 32
      Call SphU#entry every 10 ms

Before optimization: thread CPU usage = about 16%
After optimization: thread CPU usage = about 18%

Summary:
1. This optimization significantly reduces CPU usage in the low-traffic situation.
2. The CPU overhead of the private LeapArray<Statistic> statistics field is not noticeable in the high-traffic and high-concurrency situations.
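
The QPS figures quoted for the first two scenarios follow from the thread count and the call interval. A small sketch of that arithmetic, assuming every thread issues exactly one call per interval and the call itself takes negligible time (the class and method names are made up for illustration):

```java
/** Illustration only: nominal request rate of a fixed-interval load generator. */
class LoadMathSketch {
    /**
     * Each of the given threads issues one call every intervalMillis,
     * so together they issue threads * (1000 / intervalMillis) calls per second.
     */
    static long nominalQps(int threads, long intervalMillis) {
        return threads * 1000L / intervalMillis;
    }
}
```

With these assumptions, `nominalQps(16, 1000)` is 16 and `nominalQps(16, 2)` is 8000, matching the reported values. The same formula gives 3200 for 32 threads at 10 ms, which is a derived number and not one reported in the test above.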

linkolen pushed a commit to shivagowda/Sentinel that referenced this pull request Aug 14, 2021
…ns (alibaba#1746)

linkolen pushed a commit to shivagowda/Sentinel that referenced this pull request Aug 16, 2021
…ns (alibaba#1746)

Zhang-0952 pushed a commit to Zhang-0952/Sentinel that referenced this pull request Mar 4, 2022
…ns (alibaba#1746)

Labels
area/performance Issues or PRs related to runtime performance kind/enhancement Category issues or prs related to enhancement. priority/high Very important, need to be worked with soon but not very urgent
Development

Successfully merging this pull request may close these issues.

com.alibaba.csp.sentinel.util.TimeUtil high CPU usage
7 participants