[Performance] Improving speed by splitting kernels #97

Merged 3 commits on Nov 1, 2017

Commits on Oct 20, 2017

  1. With this patch, kernels can be split into odd and even parts to make computation faster.

    If the processor is powerful enough, the two parts can be executed in parallel by out-of-order execution.
    However, splitting the original kernel into two parts incurs some overhead.
    The degree of speed-up depends on the microarchitecture, and the split kernels may be slower on some microarchitectures.
    This feature is enabled by the SPLIT_KERNEL macro, and it is currently enabled only for AVX2 and AVX512F.
    This change affects the computation speed of many functions. For example,
    the ratio of the execution time of the previous version to that of the new version of asind4_u35 is 2.08.
    shibatch committed Oct 20, 2017
    31246ff
  2. fc0516c

Commits on Oct 23, 2017

  1. 2
    f52f024
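
The odd/even split described in the first commit message can be illustrated with a small scalar sketch. It assumes the kernel is a polynomial evaluated with Horner's method, which the commit message does not state explicitly. The function names `poly_horner` and `poly_split` are hypothetical and are not taken from the patch, and the code is plain scalar C rather than the AVX2/AVX512F vector code the patch targets.

```c
#include <stddef.h>

/* Plain Horner evaluation: one long chain of dependent multiply-adds.
   c[0] is the constant term, c[n-1] the highest-degree coefficient.
   Requires n >= 1. */
double poly_horner(const double *c, size_t n, double x) {
    double u = c[n - 1];
    for (size_t i = n - 1; i-- > 0; )
        u = u * x + c[i];
    return u;
}

/* Split evaluation (illustrative assumption, not the patch's code):
   even-index and odd-index coefficients form two independent Horner
   chains in x*x, combined at the end as
       p(x) = even(x^2) + x * odd(x^2).
   For simplicity this sketch requires n to be even and n >= 2. */
double poly_split(const double *c, size_t n, double x) {
    double x2 = x * x;
    double even = c[n - 2];
    double odd = c[n - 1];
    for (size_t i = n - 2; i >= 2; i -= 2) {
        /* These two updates do not depend on each other, so an
           out-of-order core may overlap them. */
        even = even * x2 + c[i - 2];
        odd = odd * x2 + c[i - 1];
    }
    return odd * x + even;
}
```

Each chain in `poly_split` is half as long as the single chain in `poly_horner`, which is where a speed-up could come from. The extra multiplication for `x2` and the final combining step are the kind of overhead the commit message mentions, so whether the split form wins depends on the processor.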