perf(nebula): optimize AT128 decoder performance #37

mojomex · 2023-07-20T07:03:43Z

PR Type

Improvement

Description

This PR improves performance of the Hesai AT128 decoder by approx. 23% without changing the logical output.
To achieve this, three changes have been made:

Decoding points directly to scan_pc_ and overflow_pc_ instead of temporary buffers and re-using the buffers (~ 11%point improvement)
Removing unused copy operation for incoming scan packets (~ 9%point improvement)
Combining azimuth and elevation sin/cos lookup tables as they were exact duplicates (~ 4%point improvement)

Points were decoded in convert(block_id) to a temporary PointCloud, the contents of which were then copied to scan_pc_ or overflow_pc_ respectively. This caused avoidable latency. Now, depending on whether the current block being decoded is in a new frame/scan or not, convert(block_id, pcl) decodes those points directly into the right buffer.

Further, overflow_pc_ was reallocated on each full scan, causing avoidable heap allocations.
Because each incoming PandarScan message always contains a full scan (albeit not necessarily aligned to our phase setting), scan_decoder->get_pointcloud() will be called exactly once in ConvertScanToPointCloud. Thus, it is guaranteed that one of scan_pc_ and overflow_pc_ can always be used for decoding, and the other as the output buffer.
Once a scan is complete, the buffers are swapped and the overflow_pc_ is cleared (not reallocated).

No changes have been made to the base class, the new function is an overloaded version of the existing one (which has been changed to throw a "not implemented" error).

	Before	Optimized
duration AVG	7.972725 ms	6.147384 ms
duration STD	0.759695	0.613904
duration AVG % rel. to before	100.000000	77.105188
duration STD % rel. to before	100.000000	80.809372

All measurements have been made on an Intel Core i7-12700H with

TurboBoost disabled
All cores set to 2.3 GHz
Thread pair 2/3 isolated
Nebula taskset to core 2

Measurements were repeated 3 times and ran for 37s each (~370 iterations).

Review Procedure

To confirm logical equality with the previous implementation, a rosbag of recorded AT128 packets can be used.
To confirm performance improvement, I provide the scripts and logging that yielded the above charts:

scripts/
    test_runner.bash  # runs the driver for 3 iterations on a given rosbag
    plot_times.py     # takes the logs from test_runner.bash and creates a plot like the one above

The test runner locks/unlocks core frequencies and tasksets nebula onto the specified core(s). Ideally, the taskset cores are isolated (cf. Autoware performance measurement guide)

Remarks

Pre-Review Checklist for the PR Author

PR Author should check the checkboxes below when creating the PR.

Assign PR to reviewer

Checklist for the PR Reviewer

Reviewers should check the checkboxes below before approval.

Commits are properly organized and messages are according to the guideline
(Optional) Unit tests have been written for new behavior
PR title describes the changes

Post-Review Checklist for the PR Author

PR Author should check the checkboxes below before merging.

All open points are addressed and tracked via issues or tickets

CI Checks

Build and test for PR: Required to pass before the merge.

scripts/plot_times.py

scripts/test_runner.bash

nebula_ros/src/hesai/hesai_decoder_ros_wrapper.cpp

nebula_ros/include/nebula_ros/hesai/hesai_decoder_ros_wrapper.hpp

scripts/plot_times.py

nebula_ros/src/hesai/hesai_decoder_ros_wrapper.cpp

nebula_decoders/src/nebula_decoders_hesai/decoders/pandar_at_decoder.cpp

scripts/profiling_runner.bash

Get rid of temporary decoding pointcloud, decode packets directly into scan_pc_ and overflow_pc_. Further, get rid of unused copy of the incoming packets. This results in less copy operations and around a 23% speedup of ReceiveScanMsgCallback() with no subscribers on the published pointclouds. Additionally, scan_pc_ and overflow_pc_ are now std::swap'd instead of de-/reallocated upon scan completion. This eliminates heap assignments. The duplicate sin/cos lookup tables for elevation/azimuth have been combined and have impoved cache performance slightly. Signed-off-by: Maximilian Schmeller <[email protected]>

This commit serves to ease performance validation of Hesai decoders by adding logging of ReceiveScanMsgCallback() callback duration and adding measurement runner and plotting scripts. `profiling_runner.bash` is configurable via commandline to use an arbitrary set of CPU cores, CPU frequency, sensor model, rosbag, number of repetitions, etc. The `plot_times.py` tool takes scenario names as console arguments. Usage of those tools is now documented in `README.md`. Signed-off-by: Maximilian Schmeller <[email protected]>

To support on-the-fly switching to dual return mode, the scan_pc_ and overflow_pc_ buffers have to reserve that capacity beforehand. This commit allocates the maximum number of points (2 * 1200 * 128) on construction of the decoder. Moved performance measurement explanation to the end of README.md.

nebula_decoders/include/nebula_decoders/nebula_decoders_hesai/decoders/pandar_at_decoder.hpp

codecov-commenter · 2023-08-07T08:00:04Z

Codecov Report

Patch coverage: 57.14% and project coverage change: +8.17% 🎉

Comparison is base (cb25896) 13.38% compared to head (e591f46) 21.55%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #37      +/-   ##
==========================================
+ Coverage   13.38%   21.55%   +8.17%     
==========================================
  Files         111       48      -63     
  Lines       10991     6313    -4678     
  Branches     1727     1664      -63     
==========================================
- Hits         1471     1361     -110     
+ Misses       8339     3800    -4539     
+ Partials     1181     1152      -29

Flag	Coverage Δ
differential	`21.55% <57.14%> (?)`
total	`?`

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed	Coverage Δ
nebula_ros/src/hesai/hesai_decoder_ros_wrapper.cpp	`0.00% <0.00%> (ø)`
...bula_decoders_hesai/decoders/pandar_at_decoder.cpp	`69.04% <66.66%> (+0.65%)`	⬆️

... and 63 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

amc-nu · 2023-08-07T09:04:55Z

Other than my comment above, the PR works great!
Thank you. @mojomex
cc @drwnz

amc-nu · 2023-08-08T00:42:15Z

@mojomex can you please attend to the spelling errors on the spell-check job?

mojomex · 2023-08-08T02:08:14Z

@amc-nu Most of the warnings were the names of Bash commands, added them to the dictionary. stds for standard deviations is also commonly used in Python/Pandas, added to the dictionary as well.

mojomex changed the title ~~perf(nebula): Optimize AT128 decoder performance~~ perf(nebula): optimize AT128 decoder performance Jul 20, 2023

mojomex marked this pull request as ready for review July 20, 2023 07:22

mojomex commented Jul 20, 2023

View reviewed changes

scripts/plot_times.py Outdated Show resolved Hide resolved

mojomex commented Jul 20, 2023

View reviewed changes

scripts/test_runner.bash Outdated Show resolved Hide resolved

mojomex commented Jul 20, 2023

View reviewed changes

nebula_ros/src/hesai/hesai_decoder_ros_wrapper.cpp Show resolved Hide resolved

mojomex commented Jul 20, 2023

View reviewed changes

nebula_ros/include/nebula_ros/hesai/hesai_decoder_ros_wrapper.hpp Outdated Show resolved Hide resolved

mojomex force-pushed the hesai-at128-performance-tuning branch from c00658e to 5f1fc29 Compare July 20, 2023 08:42

amc-nu suggested changes Jul 24, 2023

View reviewed changes

scripts/plot_times.py Outdated Show resolved Hide resolved

nebula_ros/src/hesai/hesai_decoder_ros_wrapper.cpp Outdated Show resolved Hide resolved

nebula_decoders/src/nebula_decoders_hesai/decoders/pandar_at_decoder.cpp Outdated Show resolved Hide resolved

mojomex requested a review from amc-nu July 27, 2023 01:33

amc-nu reviewed Jul 28, 2023

View reviewed changes

scripts/profiling_runner.bash Show resolved Hide resolved

mojomex added 3 commits July 28, 2023 13:29

mojomex force-pushed the hesai-at128-performance-tuning branch from 9b07a9f to f20d7f4 Compare July 28, 2023 04:35

mojomex requested a review from amc-nu July 28, 2023 04:38

Merge branch 'main' into hesai-at128-performance-tuning

a65bd35

amc-nu reviewed Aug 7, 2023

View reviewed changes

nebula_decoders/include/nebula_decoders/nebula_decoders_hesai/decoders/pandar_at_decoder.hpp Outdated Show resolved Hide resolved

Fixed uint16_t overflow in AT decoder header

762ea61

mojomex requested a review from amc-nu August 7, 2023 12:32

amc-nu approved these changes Aug 8, 2023

View reviewed changes

Update spellcheck dictionary to include bash commands

d7a78b7

Merge branch 'main' into hesai-at128-performance-tuning

e591f46

amc-nu approved these changes Aug 8, 2023

View reviewed changes

amc-nu merged commit 358f09b into tier4:main Aug 9, 2023
6 of 7 checks passed

mojomex deleted the hesai-at128-performance-tuning branch August 9, 2023 01:48

mebasoglu mentioned this pull request Aug 9, 2023

feat(velodyne_ros_wrapper): add profiling log to velodyne #51

Merged

5 tasks

mojomex mentioned this pull request Aug 18, 2023

refactor(nebula_decoders_hesai): implement generic Hesai decoder #42

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(nebula): optimize AT128 decoder performance #37

perf(nebula): optimize AT128 decoder performance #37

mojomex commented Jul 20, 2023 •

edited by amc-nu

Loading

codecov-commenter commented Aug 7, 2023 •

edited

Loading

amc-nu commented Aug 7, 2023

amc-nu commented Aug 8, 2023

mojomex commented Aug 8, 2023 •

edited

Loading

perf(nebula): optimize AT128 decoder performance #37

perf(nebula): optimize AT128 decoder performance #37

Conversation

mojomex commented Jul 20, 2023 • edited by amc-nu Loading

PR Type

Related Links

Description

Review Procedure

Remarks

Pre-Review Checklist for the PR Author

Checklist for the PR Reviewer

Post-Review Checklist for the PR Author

CI Checks

codecov-commenter commented Aug 7, 2023 • edited Loading

Codecov Report

amc-nu commented Aug 7, 2023

amc-nu commented Aug 8, 2023

mojomex commented Aug 8, 2023 • edited Loading

mojomex commented Jul 20, 2023 •

edited by amc-nu

Loading

codecov-commenter commented Aug 7, 2023 •

edited

Loading

mojomex commented Aug 8, 2023 •

edited

Loading