Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rtlil: Add packed extract implementation for SigSpec #4350

Merged
merged 2 commits into from
Apr 24, 2024

Conversation

jix
Copy link
Member

@jix jix commented Apr 22, 2024

Previously extract on a SigSpec would always unpack it. Since a significant amount of SigSpecs have one or few chunks, it's worth having a dedicated implementation.

This is especially true, since the RTLIL frontend calls into this for every wire [lhs:rhs] slice, making this extract take up 40% when profiling read_rtlil with one of the largest coarse grained RTLIL designs I had on hand.

With this change the read_rtlil profile looks like I would expect it to look like, but I noticed that a lot of the other core RTLIL methods also are a bit too eager with unpacking or implementing SigChunk/Const overloads that just convert to a single chunk SigSpec and forward to the implementation for that, when a direct implementation would avoid temporary std::vector allocations. While not relevant for read_rtlil, to me it looks like there might be a few easy overall performance gains to be had by addressing this more generally.

Previously `extract` on a `SigSpec` would always unpack it. Since a
significant amount of `SigSpec`s have one or few chunks, it's worth
having a dedicated implementation.

This is especially true, since the RTLIL frontend calls into this for
every `wire [lhs:rhs]` slice, making this `extract` take up 40% when
profiling `read_rtlil` with one of the largest coarse grained RTLIL
designs I had on hand.

With this change the `read_rtlil` profile looks like I would expect it
to look like, but I noticed that a lot of the other core RTLIL methods
also are a bit too eager with unpacking or implementing
`SigChunk`/`Const` overloads that just convert to a single chunk
`SigSpec` and forward to the implementation for that, when a direct
implementation would avoid temporary std::vector allocations. While not
relevant for `read_rtlil`, to me it looks like there might be a few easy
overall performance gains to be had by addressing this more generally.
@jix jix requested a review from povik April 22, 2024 11:36
kernel/rtlil.cc Outdated Show resolved Hide resolved
@povik
Copy link
Member

povik commented Apr 22, 2024

While not relevant for read_rtlil, to me it looks like there might be a few easy overall performance gains to be had by addressing this more generally.

I also have the impression from looking at some profiles, and knowing how a lot of operations can trigger unpacking/packing.

@povik povik requested a review from RCoeurjoly April 22, 2024 15:27
@jix
Copy link
Member Author

jix commented Apr 22, 2024

Your new implementation looks good to me

@povik
Copy link
Member

povik commented Apr 22, 2024

Thanks!

Copy link
Contributor

@RCoeurjoly RCoeurjoly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me. I checked with valgrind and test defvalue_ref.sv and no memory issues

@povik
Copy link
Member

povik commented Apr 24, 2024

Let me merge it then

@povik povik merged commit cd1fb8b into YosysHQ:main Apr 24, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants