[Performance] Improving speed by splitting kernels #97

Merged
merged 3 commits into from
Nov 1, 2017

Conversation

shibatch
Owner

With this patch, kernels can be split into odd and even parts to make computation faster.

If the processor is powerful enough, these two parts can be executed in parallel by out-of-order execution.
However, splitting the original kernel into two incurs some overhead.
The degree of speed-up depends on the microarchitecture, and the split kernels may be slower on some microarchitectures.
This feature is enabled by the SPLIT_KERNEL macro and is currently enabled only for AVX2 and AVX512F.
This change affects the computation speed of many functions. For example,
the ratio of execution time between the previous version and the new version of asind4_u35 is 2.08.
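
To illustrate the idea of splitting a kernel into even and odd parts, here is a minimal scalar sketch in C. It is not the code from this patch: the coefficient table `c` and the function names `poly_horner` and `poly_split` are hypothetical, and the real kernels operate on SIMD vectors. It only shows how one long dependency chain becomes two shorter, independent chains that an out-of-order core may overlap.

```c
/* Hypothetical coefficients c[0..7], chosen for illustration only. */
static const double c[8] = {
    1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125
};

/* Unsplit form: plain Horner evaluation.
 * Every multiply-add depends on the result of the previous one,
 * so the seven steps form a single serial chain. */
static double poly_horner(double x) {
    double r = c[7];
    for (int i = 6; i >= 0; i--) {
        r = r * x + c[i];
    }
    return r;
}

/* Split form: p(x) = E(x^2) + x * O(x^2), where E holds the
 * even-index coefficients and O the odd-index ones.
 * The "even" and "odd" chains do not depend on each other, so they
 * can be in flight at the same time. The extra x * x and the final
 * combine are the overhead of splitting. */
static double poly_split(double x) {
    double x2 = x * x;
    double even = c[6];
    double odd = c[7];
    even = even * x2 + c[4];
    odd = odd * x2 + c[5];
    even = even * x2 + c[2];
    odd = odd * x2 + c[3];
    even = even * x2 + c[0];
    odd = odd * x2 + c[1];
    return even + x * odd;
}
```

Both functions evaluate the same polynomial. In general the two forms can differ in the last bits because the rounding order changes, and whether the split version is actually faster depends on the microarchitecture, as noted above.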

Collaborator

@fpetrogalli fpetrogalli left a comment


Please cherry pick this change into cmake-transition.

Please squash the commits into a single commit.

@shibatch shibatch merged commit 18bd3ab into master Nov 1, 2017
shibatch added a commit that referenced this pull request Nov 1, 2017
@shibatch shibatch deleted the Improving_speed_by_splitting_kernels branch December 4, 2017 02:06