Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fmt::string_view ctor under Clang (in pre c++17) much slower than of gcc #914

Closed
gabime opened this issue Oct 24, 2018 · 6 comments
Closed

Comments

@gabime
Copy link
Contributor

gabime commented Oct 24, 2018

Version: fmt 5.2.1, Clang 6.0

Summary
Constructing fmt::string_view with const char* of size 515 bytes takes about ~180 ns under Clang, while under gcc it takes less than 1ns.

Cause
The cause is that https://github.com/fmtlib/fmt/blob/master/include/fmt/core.h@L217

FMT_CONSTEXPR size_t length(const char *s) { return std::strlen(s); }

is only enabled for gcc (why?) so clang ends up using the much slower implementation (https://github.com/fmtlib/fmt/blob/master/include/fmt/core.h@L210)

Just to be sure I tried to enable for clang as well, and performance went from 180 to 1ns:

#if FMT_GCC_VERSION || defined(__clang__)
FMT_CONSTEXPR size_t length(const char *s) { return std::strlen(s); }
#endif
@vitaut
Copy link
Contributor

vitaut commented Oct 24, 2018

The problem is that strlen is not constexpr on clang, only on gcc. It can be solved by using __builtin_strlen on clang I think.

vitaut added a commit that referenced this issue Oct 24, 2018
@gabime
Copy link
Contributor Author

gabime commented Oct 24, 2018

__builtin_strlen on clang I think.

I tried the below and it fixes the issue:

FMT_CONSTEXPR size_t length(const char *s)
{
#if FMT_GCC_VERSION
  return std::strlen(s);
#elif defined(__clang__) && __has_builtin(__builtin_strlen)
  return __builtin_strlen(s);
#else
  return length<char>(s); //fallback
#endif
}

I can issue a PR if it's ok.
What about msvc?

@vitaut
Copy link
Contributor

vitaut commented Oct 24, 2018

Should be fixed in f0328f8 (including msvc). @gabime could you confirm that it works for you?

@gabime
Copy link
Contributor Author

gabime commented Oct 24, 2018

Confirmed. Same performance for clang and g++ now.
Thanks for the quick resolution

@gabime gabime closed this as completed Oct 24, 2018
@vitaut
Copy link
Contributor

vitaut commented Oct 25, 2018

@gabime 180 -> 1 ns seems like excessively big difference though. Out of curiosity, could you give the details about your setup: clang version, what optimization level you used and test code?

@gabime
Copy link
Contributor Author

gabime commented Oct 25, 2018

Sure. Here is the test:

#include "benchmark/benchmark.h" //google benchmark
#include "fmt/core.h"

void bench_sv(benchmark::State& state)
{
    const char *msg = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum pharetra metus cursus "
                      "lacus placerat congue. Nulla egestas, mauris a tincidunt tempus, enim lectus volutpat mi, eu consequat sem "
                      "libero nec massa. In dapibus ipsum a diam rhoncus gravida. Etiam non dapibus eros. Donec fringilla dui sed "
                      "augue pretium, nec scelerisque est maximus. Nullam convallis, sem nec blandit maximus, nisi turpis ornare "
                      "nisl, sit amet volutpat neque massa eu odio. Maecenas malesuada quam ex, posuere congue nibh turpis duis.";


    for (auto _ : state)
    {
        fmt::string_view sv(msg);
        benchmark::DoNotOptimize(sv);
    }
}

BENCHMARK(bench_sv);

BENCHMARK_MAIN();

Compiled with:

g++ latency.cpp -o latency -march=native -Wall -Wextra -pedantic -std=c++11 -pthread -I./fmt/include -Ofast -flto -Wl,--no-as-needed -lbenchmark

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants