Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Format directly in to a std::string? #292

Closed
nlyan opened this issue Mar 12, 2016 · 8 comments
Closed

Format directly in to a std::string? #292

nlyan opened this issue Mar 12, 2016 · 8 comments

Comments

@nlyan
Copy link

nlyan commented Mar 12, 2016

The current Writer abstraction supports returning the newly formatted string as a std::string, but this involves copying out of the writers internal buffer. Would there be any gain in introducing an interface that accepts a std::string object by reference, such that its memory buffer can be used directly?

This is the primary motivation for the API design of std::getline for instance, and is somewhat similar to how std::ostringstream works. Using ostringstream and fmt::print doesn't really solve the problem however, because pulling a std::string out of one results in a copy.

I've tried to hack up a fmt::Buffer derivative that can do this, but something isn't really jelling and I'm getting ASAN errors.

@vitaut
Copy link
Contributor

vitaut commented Mar 14, 2016

It's possible to avoid std::string construction/copying by accessing the result as a C string via MemoryWriter::c_str(). This should be faster than reusing std::string because there will be no dynamic memory allocation at all for small strings.

But if you like, you can add a custom Writer that writes directly to an std::string:

#include "format.h"

class StringBuffer : public fmt::Buffer<char> {
 private:
  std::string data_;

 protected:
  virtual void grow(std::size_t size) {
    data_.resize(size);
    ptr_ = &data_[0];
    capacity_ = size;
  }

 public:
  const std::string &data() {
    data_.resize(size_);
    return data_;
  }
};

class StringWriter : public fmt::Writer {
 private:
  StringBuffer buffer_;

 public:
  StringWriter() : fmt::Writer(buffer_) {}

  const std::string &str() { return buffer_.data(); }
};

int main() {
  StringWriter w;
  w.write("The answer is {}.\n", 42);
  w.clear();
  w.write("Something else.\n"); // The string is reused.
}

@vitaut vitaut closed this as completed Mar 15, 2016
@nlyan
Copy link
Author

nlyan commented Mar 15, 2016

For reference, I was writing a class which provides an option to format both to a string, and to JSON. Printing to JSON was twice as fast as using CppFormats stock MemoryWriter, because the interface forces you to copy when you're done with it. I feel it's exceptionally common to want do a bunch write operations and then immediately discard the Writer. In my case, the Writer itself is an implementation detail and cannot be reused without bloating a data structure somewhere, or poluting the API.

Here is my current solution

class StringBuffer: public fmt::Buffer<char> {
public:
    StringBuffer() noexcept {}

    explicit
    StringBuffer (std::string&& str) noexcept: str_(std::move(str)) {
        str_.resize (str_.capacity());
        if (!str_.empty()) {
            fmt::Buffer<char>::ptr_ = &str_[0];
            fmt::Buffer<char>::capacity_ = str_.size();
        }
    }

    void
    grow (std::size_t const size) override {
        if (size <= fmt::Buffer<char>::size_) {
            return;
        }
        str_.resize (std::max (size, 2 * fmt::Buffer<char>::size_));
        fmt::Buffer<char>::ptr_ = &str_[0];
        fmt::Buffer<char>::capacity_ = str_.size();
    }

    std::string
    release() noexcept {
        std::string str {std::move(str_)};
        str.resize (fmt::Buffer<char>::size_);
        fmt::Buffer<char>::ptr_ = nullptr;
        fmt::Buffer<char>::size_ = 0;
        fmt::Buffer<char>::capacity_ = 0;
        return str;
    }
private:
    std::string str_;
};

class StringWriter: public fmt::BasicWriter<char> {
public:
    StringWriter() noexcept: fmt::BasicWriter<char>(sbuf_) {}

    explicit
    StringWriter (std::string&& str) noexcept:
    fmt::BasicWriter<char>(sbuf_), sbuf_(std::move (str)){}

    std::string str() noexcept {
        return sbuf_.release();
    }

private:
    StringBuffer sbuf_;
};

What I was proprosing is introducing a StringBuffer that has this behaviour

@vitaut
Copy link
Contributor

vitaut commented Mar 16, 2016

Right, writing to an std::string directly might be useful if you need to pass it elsewhere. Would you be willing to submit a PR that adds such StringWriter? The above code can be used as a starting point but it will need tests and ensuring compatibility with C++98/03.

@vitaut vitaut reopened this Mar 16, 2016
@nlyan
Copy link
Author

nlyan commented Mar 16, 2016

I'll put in on my todo list. C++98/03 compat will obviously involve a swap() to get the same efficiency.

@vitaut
Copy link
Contributor

vitaut commented Jul 16, 2016

Implemented in d4885ce (docs). Comments are welcome.

@dean0x7d
Copy link
Contributor

If I understood the Buffer interface correctly, I think StringBuffer exhibits undefined behavior in C++98 because std::string wasn't guaranteed to hold contiguous data back then:

std::string str("data");
char* p = &str[0];
str[2] = 'd'; // fine, using string::operator[]
p[2] = 'd';   // UB in C++98, fine in C++11

Also, shouldn't std::string data_ be std::basic_string<Char> data_ for wchar_t support?

@vitaut
Copy link
Contributor

vitaut commented Jul 18, 2016

@dean0x7d Good points.

Although technically std::string is not guaranteed to have contiguous storage in pre-C++11, according to Herb Sutter:

... current (C++03) ISO C++ does require &str[0] to cough up a pointer to contiguous string data (but not necessarily null-terminated!), so there wasn’t much leeway for implementers to have non-contiguous strings, anyway.

and &str[0] is exactly what StringBuffer uses.

As for the wchar_t issue, it should be fixed in 65cd664, thanks!

@dean0x7d
Copy link
Contributor

Cool, that's good to know.
I was under the impression that libstdc++'s copy-on-write implementation was taking advantage of non-contiguous storage, but that's not the case. It looks like even pre-C++11 rules made contiguous storage the only reasonable option.

@vitaut vitaut closed this as completed Jul 19, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants