
Performance Enhancement: Parallelized Unpacking of Velodyne Packets #61

Closed · mebasoglu opened this issue Aug 29, 2023 · 3 comments

@mebasoglu (Collaborator)

Feature Overview

This enhancement parallelizes the unpacking of Velodyne packets. By decoding the packets of an accumulated scan concurrently instead of sequentially, the unpacking stage completes in a fraction of the time. The improvement is particularly relevant for the VLS-128, whose large packet volume needs to be processed quickly and efficiently.
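For illustration, here is a minimal sketch of the idea, assuming one scan's worth of packets has already been accumulated. `Packet`, `Point`, and `unpack_packet()` are hypothetical placeholders, not nebula's actual types or API:

```cpp
// Sketch only: decode all packets of an accumulated scan concurrently.
#include <functional>
#include <future>
#include <vector>

struct Packet { unsigned char data[1206]; };  // raw Velodyne UDP payload
struct Point  { float x, y, z, intensity; };

// Hypothetical per-packet decoder; each packet can be decoded independently.
std::vector<Point> unpack_packet(const Packet & packet)
{
  std::vector<Point> points;
  // ... a real decoder would convert each firing in `packet` to points ...
  return points;
}

std::vector<Point> unpack_scan_parallel(const std::vector<Packet> & packets)
{
  // Launch one asynchronous decode task per accumulated packet.
  std::vector<std::future<std::vector<Point>>> futures;
  futures.reserve(packets.size());
  for (const auto & packet : packets) {
    futures.emplace_back(
      std::async(std::launch::async, unpack_packet, std::cref(packet)));
  }

  // Merge per-packet results in packet order so point ordering stays stable.
  std::vector<Point> cloud;
  for (auto & f : futures) {
    auto points = f.get();
    cloud.insert(cloud.end(), points.begin(), points.end());
  }
  return cloud;
}
```

A real implementation would likely bound concurrency with a thread pool or OpenMP rather than spawning one task per packet, since a VLS-128 scan spans many packets.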

Could you please provide any feedback, suggestions, or guidance on the proposed enhancement, @drwnz @amc-nu?

Here are the performance metrics obtained with profiling scripts:

| Metric | nebula-vls128-main | nebula-vls128-parallel |
|---|---|---|
| d_total AVG | 24.8819 | 6.36068 |
| n_out AVG | 213395 | 213395 |
| d_total STD | 9.54071 | 2.84753 |
| n_out STD | 114.332 | 114.378 |
| d_total AVG, % rel. to nebula-vls128-main | 100 | 25.5634 |
| n_out AVG, % rel. to nebula-vls128-main | 100 | 99.9999 |
| d_total STD, % rel. to nebula-vls128-main | 100 | 29.8461 |
| n_out STD, % rel. to nebula-vls128-main | 100 | 100.041 |
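Reading the table: the parallel build's average total unpacking time d_total is 6.36068 / 24.8819 × 100 ≈ 25.56 % of the baseline, i.e. roughly a 3.9× speedup, while the average output point count n_out is effectively unchanged.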

[Figure: timing comparison between nebula-vls128-main and nebula-vls128-parallel]

@drwnz (Collaborator) commented Aug 29, 2023

@mebasoglu this looks interesting. Are you able to create a PR for your implementation?

@mebasoglu (Collaborator, Author)

Hello @drwnz, I created the PR: #62

@mojomex (Collaborator) commented Jun 12, 2024

Hi @mebasoglu, we implemented the per-packet decoding proposed in [linked issue] in [linked PR] and will thus close this issue in favor of that approach. The reason we favored the other approach is that packets arrive one by one anyway; instead of accumulating them so they can be decoded in parallel, decoding them on a single thread as they arrive is even more memory-efficient. It is also faster from a latency perspective: once the last packet arrives, only that one packet still has to be decoded before the pointcloud can be published.
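For contrast, here is a minimal sketch of that per-arrival, single-thread approach. `Packet`, `Point`, `ScanDecoder`, and its methods are hypothetical placeholders, not nebula's actual API:

```cpp
// Sketch only: decode each packet immediately as it arrives.
#include <optional>
#include <utility>
#include <vector>

struct Packet { unsigned char data[1206]; };  // raw Velodyne UDP payload
struct Point  { float x, y, z, intensity; };

class ScanDecoder
{
public:
  // Called once per packet, in arrival order, on a single thread. Returns the
  // finished cloud when this packet completes a full rotation; at that moment
  // only this one packet still had to be decoded, so publish latency is low.
  std::optional<std::vector<Point>> on_packet(const Packet & packet)
  {
    decode_into(cloud_, packet);
    if (completes_rotation(packet)) {
      std::vector<Point> done = std::move(cloud_);
      cloud_.clear();
      return done;
    }
    return std::nullopt;  // scan still in progress
  }

private:
  std::vector<Point> cloud_;

  void decode_into(std::vector<Point> & out, const Packet & packet)
  {
    // ... decode this packet's firings and append the points to `out` ...
  }
  bool completes_rotation(const Packet & packet)
  {
    // ... e.g. detect the azimuth wrapping past the configured scan phase ...
    return false;
  }
};
```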

Nevertheless, thank you for your investigation and proposal.

@mojomex closed this as completed Jun 12, 2024