Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High CPU usage when using MultiThreadedExecutor & TimerCallback #1618

Closed
k0suke-murakami opened this issue Apr 2, 2021 · 8 comments
Closed
Labels
bug Something isn't working

Comments

@k0suke-murakami
Copy link

k0suke-murakami commented Apr 2, 2021

Bug report

Required Info:

  • Operating System:
    • Ubuntu 20.04
  • Installation type:
    • binaries
  • Version or commit hash:
    • 2.3.0-1focal.20210225.215848
  • DDS implementation:
    • cyclone_dds
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

Clone test package: https://github.com/k0suke-murakami/multi_thread_sample

git clone [email protected]:k0suke-murakami/multi_thread_sample.git

Build & Run

ros2 run sample sample

Expected behavior

Low CPU usage.
When executed with a single-threaded executor, the CPU usage is almost 0%.

int main(int argc, char **argv)
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<Chatter>());
  rclcpp::shutdown();
}

Actual behavior

Reaches 40% CPU usage in my environment.

Additional information

@fujitatomoya
Copy link
Collaborator

checked with source build ros2/ros2@0e6cc1c, i cannot reproduce the problem.

  • terminal-1
root@134f29e8f25f:~/ros2_ws/colcon_ws# ros2 run sample sample
[INFO] [1617930704.586983609] [chatter]: update is called
[INFO] [1617930705.587006977] [chatter]: update is called
[INFO] [1617930706.587756538] [chatter]: update is called
...<snip>
  • terminal-2
root@134f29e8f25f:~# pgrep -l sample
14461 sample
root@134f29e8f25f:~# pidstat -p 14461 1
Linux 5.4.0-70-generic (134f29e8f25f)   04/09/2021      _x86_64_        (24 CPU)

09:39:00 AM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
09:39:01 AM     0     14461    0.00    0.00    0.00    0.00    0.00     2  sample
09:39:02 AM     0     14461    0.00    0.00    0.00    0.00    0.00     2  sample
09:39:03 AM     0     14461    0.00    0.00    0.00    0.00    0.00     2  sample
09:39:04 AM     0     14461    0.00    0.00    0.00    0.00    0.00     2  sample
09:39:05 AM     0     14461    0.00    0.00    0.00    0.00    0.00     2  sample
09:39:06 AM     0     14461    0.00    0.00    0.00    0.00    0.00     2  sample
09:39:07 AM     0     14461    0.00    0.00    0.00    0.00    0.00     2  sample
09:39:08 AM     0     14461    0.00    0.00    0.00    0.00    0.00     2  sample
09:39:09 AM     0     14461    0.00    0.00    0.00    0.00    0.00     2  sample
09:39:10 AM     0     14461    0.00    0.00    0.00    0.00    0.00     2  sample
09:39:11 AM     0     14461    0.00    0.00    0.00    0.00    0.00     2  sample
...<snip>

@Karsten1987
Copy link
Contributor

We had the same issue over at ros2-control: ros-controls/ros2_control#275

Even without any load, the multithreaded executor would exhaust all CPU and could not achieve the desired update rates.

@fujitatomoya
Copy link
Collaborator

I think this can be also fixed by #1516, which introduces MutexTwoPriorities to reduce race condition for get_next_executable.

before MutexTwoPriorities, it is really hard to remove Timer with

it is because the same mutex wait_mutex_ is used with get_next_executable

std::lock_guard<std::mutex> wait_lock(wait_mutex_);

there is a lot of race condition, and this makes really hard to remove executed Timer from the list. on the other hand, other threads just keep trying to get mutex and execute the Timer (which is already scheduled). this loop runs and consumes CPU resources.

btw, i do not think MutexTwoPriorities is perfect solution. in theory Timer should be taken by thread once, and nobody else needs to get that Timer via get_next_executable.

i might be missing something, any comments are welcome 😄

@mitsudome-r
Copy link

@Karsten1987 @fujitatomoya

I'm writing on behalf of @k0suke-murakami. Thank you for your information. We have confirmed that the CPU usage decreased significantly on master. (We were originally testing on foxy)

Do you know if this feature is going to be back-ported to foxy?

@fujitatomoya
Copy link
Collaborator

We have confirmed that the CPU usage decreased significantly on master. (We were originally testing on foxy)

good to know 👍 thanks for the confirmation.

Do you know if this feature is going to be back-ported to foxy?

as far as i see, it will break the ABI. so i guess it is unlikely to happen...but i think this could change user experience significantly.

@ivanpauno @clalancette what do you think about this backport?

@clalancette
Copy link
Contributor

as far as i see, it will break the ABI. so i guess it is unlikely to happen...but i think this could change user experience significantly.

Yes, if it changes ABI, we can't put it into Foxy. However, if someone were to come up with an ABI-compatible version of it, we could consider putting that into Foxy.

@mitsudome-r
Copy link

if someone were to come up with an ABI-compatible version of it, we could consider putting that into Foxy.

Thanks. We will take a deeper look into #1516 and see if we can do that without breaking the ABI.

@ivanpauno ivanpauno added the bug Something isn't working label Apr 9, 2021
@ivanpauno
Copy link
Member

I cannot imagine a ABI compatible fix for this, but if someone can figure out a ABI compatible fix that would be great.

The bug is fixed in master though, so I think that we can close this ticket.
@mitsudome-r @k0suke-murakami do you think that's ok.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants