Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(nebula): optimize AT128 decoder performance #37

Merged
merged 7 commits into from
Aug 9, 2023

Conversation

mojomex
Copy link
Collaborator

@mojomex mojomex commented Jul 20, 2023

PR Type

  • Improvement

Related Links

--

Description

This PR improves performance of the Hesai AT128 decoder by approx. 23% without changing the logical output.
To achieve this, three changes have been made:

  • Decoding points directly to scan_pc_ and overflow_pc_ instead of temporary buffers and re-using the buffers (~ 11%point improvement)
  • Removing unused copy operation for incoming scan packets (~ 9%point improvement)
  • Combining azimuth and elevation sin/cos lookup tables as they were exact duplicates (~ 4%point improvement)

Points were decoded in convert(block_id) to a temporary PointCloud, the contents of which were then copied to scan_pc_ or overflow_pc_ respectively. This caused avoidable latency. Now, depending on whether the current block being decoded is in a new frame/scan or not, convert(block_id, pcl) decodes those points directly into the right buffer.

Further, overflow_pc_ was reallocated on each full scan, causing avoidable heap allocations.
Because each incoming PandarScan message always contains a full scan (albeit not necessarily aligned to our phase setting), scan_decoder->get_pointcloud() will be called exactly once in ConvertScanToPointCloud. Thus, it is guaranteed that one of scan_pc_ and overflow_pc_ can always be used for decoding, and the other as the output buffer.
Once a scan is complete, the buffers are swapped and the overflow_pc_ is cleared (not reallocated).

No changes have been made to the base class, the new function is an overloaded version of the existing one (which has been changed to throw a "not implemented" error).

f7249549-193f-4193-a43a-ef47260f2416

Before Optimized
duration AVG 7.972725 ms 6.147384 ms
duration STD 0.759695 0.613904
duration AVG % rel. to before 100.000000 77.105188
duration STD % rel. to before 100.000000 80.809372

All measurements have been made on an Intel Core i7-12700H with

  • TurboBoost disabled
  • All cores set to 2.3 GHz
  • Thread pair 2/3 isolated
  • Nebula taskset to core 2

Measurements were repeated 3 times and ran for 37s each (~370 iterations).

Review Procedure

To confirm logical equality with the previous implementation, a rosbag of recorded AT128 packets can be used.
To confirm performance improvement, I provide the scripts and logging that yielded the above charts:

scripts/
    test_runner.bash  # runs the driver for 3 iterations on a given rosbag
    plot_times.py     # takes the logs from test_runner.bash and creates a plot like the one above

The test runner locks/unlocks core frequencies and tasksets nebula onto the specified core(s). Ideally, the taskset cores are isolated (cf. Autoware performance measurement guide)

Remarks

Pre-Review Checklist for the PR Author

PR Author should check the checkboxes below when creating the PR.

  • Assign PR to reviewer

Checklist for the PR Reviewer

Reviewers should check the checkboxes below before approval.

  • Commits are properly organized and messages are according to the guideline
  • (Optional) Unit tests have been written for new behavior
  • PR title describes the changes

Post-Review Checklist for the PR Author

PR Author should check the checkboxes below before merging.

  • All open points are addressed and tracked via issues or tickets

CI Checks

  • Build and test for PR: Required to pass before the merge.

@mojomex mojomex changed the title perf(nebula): Optimize AT128 decoder performance perf(nebula): optimize AT128 decoder performance Jul 20, 2023
@mojomex mojomex marked this pull request as ready for review July 20, 2023 07:22
scripts/plot_times.py Outdated Show resolved Hide resolved
scripts/test_runner.bash Outdated Show resolved Hide resolved
@mojomex mojomex force-pushed the hesai-at128-performance-tuning branch from c00658e to 5f1fc29 Compare July 20, 2023 08:42
scripts/plot_times.py Outdated Show resolved Hide resolved
nebula_ros/src/hesai/hesai_decoder_ros_wrapper.cpp Outdated Show resolved Hide resolved
@mojomex mojomex requested a review from amc-nu July 27, 2023 01:33
Get rid of temporary decoding pointcloud, decode packets directly into
scan_pc_ and overflow_pc_. Further, get rid of unused copy of the
incoming packets. This results in less copy operations and around a 23%
speedup of ReceiveScanMsgCallback() with no subscribers on the
published pointclouds.

Additionally, scan_pc_ and overflow_pc_ are now std::swap'd instead of
de-/reallocated upon scan completion. This eliminates heap assignments.

The duplicate sin/cos lookup tables for elevation/azimuth have been
combined and have impoved cache performance slightly.

Signed-off-by: Maximilian Schmeller <[email protected]>
This commit serves to ease performance validation of Hesai decoders by
adding logging of ReceiveScanMsgCallback() callback duration and adding
measurement runner and plotting scripts.

`profiling_runner.bash` is configurable via commandline to use
an arbitrary set of CPU cores, CPU frequency, sensor model, rosbag,
number of repetitions, etc.

The `plot_times.py` tool takes scenario names as console arguments.

Usage of those tools is now documented in `README.md`.

Signed-off-by: Maximilian Schmeller <[email protected]>
To support on-the-fly switching to dual return mode, the scan_pc_ and
overflow_pc_ buffers have to reserve that capacity beforehand.
This commit allocates the maximum number of points (2 * 1200 * 128) on
construction of the decoder.

Moved performance measurement explanation to the end of README.md.
@mojomex mojomex force-pushed the hesai-at128-performance-tuning branch from 9b07a9f to f20d7f4 Compare July 28, 2023 04:35
@mojomex mojomex requested a review from amc-nu July 28, 2023 04:38
@codecov-commenter
Copy link

codecov-commenter commented Aug 7, 2023

Codecov Report

Patch coverage: 57.14% and project coverage change: +8.17% 🎉

Comparison is base (cb25896) 13.38% compared to head (e591f46) 21.55%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #37      +/-   ##
==========================================
+ Coverage   13.38%   21.55%   +8.17%     
==========================================
  Files         111       48      -63     
  Lines       10991     6313    -4678     
  Branches     1727     1664      -63     
==========================================
- Hits         1471     1361     -110     
+ Misses       8339     3800    -4539     
+ Partials     1181     1152      -29     
Flag Coverage Δ
differential 21.55% <57.14%> (?)
total ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
nebula_ros/src/hesai/hesai_decoder_ros_wrapper.cpp 0.00% <0.00%> (ø)
...bula_decoders_hesai/decoders/pandar_at_decoder.cpp 69.04% <66.66%> (+0.65%) ⬆️

... and 63 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@amc-nu
Copy link
Contributor

amc-nu commented Aug 7, 2023

Other than my comment above, the PR works great!
Thank you. @mojomex
cc @drwnz

@mojomex mojomex requested a review from amc-nu August 7, 2023 12:32
@amc-nu
Copy link
Contributor

amc-nu commented Aug 8, 2023

@mojomex can you please attend to the spelling errors on the spell-check job?

@mojomex
Copy link
Collaborator Author

mojomex commented Aug 8, 2023

@amc-nu Most of the warnings were the names of Bash commands, added them to the dictionary. stds for standard deviations is also commonly used in Python/Pandas, added to the dictionary as well.

@amc-nu amc-nu merged commit 358f09b into tier4:main Aug 9, 2023
6 of 7 checks passed
@mojomex mojomex deleted the hesai-at128-performance-tuning branch August 9, 2023 01:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants