post-processing: negate: optimise for non-vectorising builds #64

Merged
merged 1 commit into from
Aug 5, 2021


davidplowman
Collaborator

The loop can be better optimised when the vectorising compiler cannot
be used.

Signed-off-by: David Plowman [email protected]

@davidplowman
Collaborator Author

@naushir
For my amusement, I timed the loop for debug, release and vectorising builds, using either uint8_t, uint32_t or uint64_t operations. The results (in ms per 3MP frame, running single-threaded) were:

             uint8_t  uint32_t  uint64_t
Debug        900      230       120
Release      230      120       120
Vectorising  60       60        60

I've assumed we can rely on the buffers being 4-byte aligned. Do you think 8-byte alignment could be assumed, as that would run faster in a debug build?
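For readers following along, here is a minimal sketch of the kind of loop being timed, using uint32_t operations. It is not the code merged in this PR: the function name `negate_buffer` and its signature are hypothetical, and it assumes the buffer is 4-byte aligned and that the underlying memory may legitimately be accessed as uint32_t (for example, an mmap'd or malloc'd image buffer).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration only, not the merged implementation.
// Inverts every byte in the buffer (255 - value is the same as ~value for
// an 8-bit sample), processing four bytes per loop iteration.
//
// Assumptions: `data` is 4-byte aligned, and the memory it points to can be
// accessed as uint32_t without violating aliasing rules.
inline void negate_buffer(uint8_t *data, std::size_t size)
{
    // Catch a misaligned buffer early in debug builds.
    assert(reinterpret_cast<std::uintptr_t>(data) % alignof(uint32_t) == 0);

    // Main loop: one 32-bit invert instead of four 8-bit inverts.
    uint32_t *words = reinterpret_cast<uint32_t *>(data);
    std::size_t num_words = size / sizeof(uint32_t);
    for (std::size_t i = 0; i < num_words; i++)
        words[i] = ~words[i];

    // Tail: handle any remaining 1 to 3 bytes individually.
    for (std::size_t i = num_words * sizeof(uint32_t); i < size; i++)
        data[i] = static_cast<uint8_t>(~data[i]);
}
```

Swapping uint32_t for uint64_t in the sketch would halve the iteration count again, but only if 8-byte alignment can be relied on, which is the question above.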

@naushir
Collaborator

naushir commented Aug 3, 2021

For YUV420 formats, they will be 8-byte aligned. However, other formats will currently be 4-byte aligned. There's nothing to stop us from changing them all to 8-byte alignment... but it doesn't really seem worth it just to improve Debug builds.

@davidplowman
Collaborator Author

Sounds good as it is, then. Do you want to push the merge button?

@naushir naushir merged commit fc8f6b6 into raspberrypi:post-processing Aug 5, 2021
@davidplowman davidplowman deleted the negate-optimise branch September 5, 2021 12:10