Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<fstream>: Performance issue when reading a binary file using std::basic_ifstream<unsigned char> #2109

Closed
jamesmagnus opened this issue Aug 10, 2021 · 11 comments · Fixed by #2739
Labels
fixed Something works now, yay! performance Must go faster

Comments

@jamesmagnus
Copy link

Describe the bug
Performance issue.
When reading a binary file using unsigned data type (ie std::basic_ifstream<unsigned char>) is order of magnitude slower than reading it using signed data type (ie std::basic_ifstream<char>).
It seems you are doing character by character locale conversion, altough the file is open as binary and not text.

Command-line test case

Z:\beaus\Desktop\MSVC>type Source.cpp
#include <vector>
#include <iostream>
#include <fstream>
#include <chrono>
#include <filesystem>

int main()
{
        const std::filesystem::path filePath{ "test.data" }; // 128 MB binary file
        const size_t fileSize{ std::filesystem::file_size(filePath) };

        // Read the whole file using char data type
        {
                // Open file in binary mode
                std::basic_ifstream<char> fileStream{ filePath, std::fstream::binary };
                std::vector<char> data;
                data.resize(fileSize);

                const auto start{ std::chrono::system_clock::now() };
                fileStream.read(data.data(), fileSize);
                const auto end{ std::chrono::system_clock::now() };

                std::cout << "Duration read signed: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << " ms" << std::endl;
        }

        // Read the whole file using unsigned char data type
        {
                // Open file in binary mode
                std::basic_ifstream<unsigned char> fileStream{ filePath, std::fstream::binary };
                std::vector<unsigned char> data;
                data.resize(fileSize);

                const auto start{ std::chrono::system_clock::now() };
                fileStream.read(data.data(), fileSize);
                const auto end{ std::chrono::system_clock::now() };

                std::cout << "Duration read unsigned: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << " ms" << std::endl;
        }

        return 0;
}
Z:\beaus\Desktop\MSVC>cl /EHsc /W4 /WX /std:c++17 .\Source.cpp
Compilateur d'optimisation Microsoft (R) C/C++ version 19.29.30040 pour x64
Copyright (C) Microsoft Corporation. Tous droits réservés.

Source.cpp
Microsoft (R) Incremental Linker Version 14.29.30040.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:Source.exe
Source.obj

Z:\beaus\Desktop\MSVC>Source.exe
Duration read signed: 79 ms
Duration read unsigned: 12838 ms

Expected behavior
Read a binary file using char or unsigned char should take the same time. (no character conversion for binary data)

STL version

  • Option 1: Visual Studio version
    • Displayed in Help > About Microsoft Visual Studio
      Microsoft Visual Studio Community 2019
      Version 16.10.4
      

Additional context
Use a randomly generated test.data binary file to reproduce the issue with my code. Mine is 128MB.

@fsb4000
Copy link
Contributor

fsb4000 commented Aug 10, 2021

Duplicate: #817

See the comment for more details: #817 (comment)

use .rdbuf()->pubsetbuf()

https://en.cppreference.com/w/cpp/io/basic_streambuf/pubsetbuf

@fsb4000
Copy link
Contributor

fsb4000 commented Aug 10, 2021

#include <vector>
#include <iostream>
#include <fstream>
#include <chrono>
#include <filesystem>

int main()
{
        const std::filesystem::path filePath{ "test.data" }; // 128 MB binary file
        const size_t fileSize{ std::filesystem::file_size(filePath) };

        // Read the whole file using char data type
        {
                // Open file in binary mode
                std::basic_ifstream<char> fileStream{ filePath, std::fstream::binary };
                std::vector<char> data;
                data.resize(fileSize);

                const auto start{ std::chrono::system_clock::now() };
                fileStream.read(data.data(), fileSize);
                const auto end{ std::chrono::system_clock::now() };

                std::cout << "Duration read signed: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << " ms" << std::endl;
        }

        // Read the whole file using unsigned char data type
        {
                // Open file in binary mode
                std::basic_ifstream<unsigned char> fileStream{ filePath, std::fstream::binary };
                std::vector<unsigned char> data;
                data.resize(fileSize);

                const auto start{ std::chrono::system_clock::now() };
                fileStream.read(data.data(), fileSize);
                const auto end{ std::chrono::system_clock::now() };

                std::cout << "Duration read unsigned: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << " ms" << std::endl;
        }

        // Read the whole file using unsigned char data type with buffering
        {
                // Open file in binary mode
                std::basic_ifstream<unsigned char> fileStream{ filePath, std::fstream::binary };
                unsigned char buf[1024*10 + 1];
                fileStream.rdbuf()->pubsetbuf(buf, sizeof buf);
                std::vector<unsigned char> data;
                data.resize(fileSize);

                const auto start{ std::chrono::system_clock::now() };
                fileStream.read(data.data(), fileSize);
                const auto end{ std::chrono::system_clock::now() };

                std::cout << "Duration read unsigned with buffering: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << " ms" << std::endl;
        }

        return 0;
}
Duration read signed: 135 ms
Duration read unsigned: 7279 ms
Duration read unsigned with buffering: 92 ms

изображение
изображение

@jamesmagnus
Copy link
Author

Mmmh Ok I see. But why there is no internal buffer for all 1 byte data types ?

@fsb4000
Copy link
Contributor

fsb4000 commented Aug 10, 2021

@jamesmagnus As far as I understand we don't use buffering for cout/cin and ofstream/ifstream.
So pubsetbuf with user provided buffer is the fastest method.
But we use some optimizations for char only because codecvt will be always noconv and we rely on fread/fwrite buffering:

STL/stl/inc/xlocale

Lines 717 to 720 in bfa2cb2

bool __CLR_OR_THIS_CALL do_always_noconv() const noexcept override {
// return true if conversions never change input (from codecvt)
return is_same_v<_Byte, _Elem>;
}

STL/stl/inc/fstream

Lines 609 to 636 in 280347a

virtual streamsize __CLR_OR_THIS_CALL xsputn(const _Elem* _Ptr, streamsize _Count) override {
// put _Count characters to stream
if constexpr (sizeof(_Elem) == 1) {
if (_Pcvt) { // if we need a nontrivial codecvt transform, do the default expensive thing
return _Mysb::xsputn(_Ptr, _Count);
}
const streamsize _Start_count = _Count;
streamsize _Size = _Mysb::_Pnavail();
if (0 < _Count && 0 < _Size) { // copy to write buffer
if (_Count < _Size) {
_Size = _Count;
}
_Traits::copy(_Mysb::pptr(), _Ptr, static_cast<size_t>(_Size));
_Ptr += _Size;
_Count -= _Size;
_Mysb::pbump(static_cast<int>(_Size));
}
if (0 < _Count && _Myfile) { // open C stream, attempt write
_Count -= _CSTD fwrite(_Ptr, sizeof(_Elem), static_cast<size_t>(_Count), _Myfile);
}
return _Start_count - _Count;
} else { // non-chars always get element-by-element processing
return _Mysb::xsputn(_Ptr, _Count);
}

STL/stl/inc/fstream

Lines 561 to 607 in 280347a

virtual streamsize __CLR_OR_THIS_CALL xsgetn(_Elem* _Ptr, streamsize _Count) override {
// get _Count characters from stream
if constexpr (sizeof(_Elem) == 1) {
if (_Count <= 0) {
return 0;
}
if (_Pcvt) { // if we need a nontrivial codecvt transform, do the default expensive thing
return _Mysb::xsgetn(_Ptr, _Count);
}
// assuming this is OK because _Ptr + _Count must be valid
auto _Count_s = static_cast<size_t>(_Count);
const auto _Start_count = _Count;
const auto _Available = static_cast<size_t>(_Mysb::_Gnavail());
if (0 < _Available) { // copy from get area
const auto _Read_size = (_STD min)(_Count_s, _Available);
_Traits::copy(_Ptr, _Mysb::gptr(), _Read_size);
_Ptr += _Read_size;
_Count_s -= _Read_size;
_Mysb::gbump(static_cast<int>(_Read_size));
}
if (_Myfile) { // open C stream, attempt read
_Reset_back(); // revert from _Mychar buffer
// process in 4k - 1 chunks to avoid tripping over fread's clobber-the-end behavior when
// doing \r\n -> \n translation
constexpr size_t _Read_size = 4095; // _INTERNAL_BUFSIZ - 1
while (_Read_size < _Count_s) {
const auto _Actual_read = _CSTD fread(_Ptr, sizeof(_Elem), _Read_size, _Myfile);
_Ptr += _Actual_read;
_Count_s -= _Actual_read;
if (_Actual_read != _Read_size) {
return static_cast<streamsize>(_Start_count - _Count_s);
}
}
if (0 < _Count_s) {
_Count_s -= _CSTD fread(_Ptr, sizeof(_Elem), _Count_s, _Myfile);
}
}
return static_cast<streamsize>(_Start_count - _Count_s);
} else { // non-chars always get element-by-element processing
return _Mysb::xsgetn(_Ptr, _Count);
}
}

If @StephanTLavavej has time, he can answer in a more detailed way...

@fsb4000
Copy link
Contributor

fsb4000 commented Aug 10, 2021

By the way LLVM libc++ do buffering in the basic_filebuf constructor: https://github.com/llvm/llvm-project/blob/main/libcxx/include/fstream#L314

Maybe we could do similar...

@fsb4000
Copy link
Contributor

fsb4000 commented Aug 10, 2021

@jamesmagnus one more thing.

https://en.cppreference.com/w/cpp/io/basic_streambuf/pubsetbuf and https://stackoverflow.com/a/40317135/4544798 and other internet resources tell that you need to do (it is true for gnu libstdc++):

  1. pubsetbuf
  2. open

But this is not true for MS STL. With MS STL you need to do

  1. open (or constructor with filename)
  2. pubsetbuf

So cross platform C++ might be tricky :(

https://i.imgur.com/rasLdzk.png

@AdamBucior
Copy link
Contributor

Couldn't we make codecvt::do_always_noconv return true when _Byte and _Elem are same size integral types (the same criteria copy uses to decide whether to use memmove)? It seems that do_in and do_out just static_cast everything so no real conversion happens in that case.

@fsb4000
Copy link
Contributor

fsb4000 commented Aug 10, 2021

Probably, I offered return true for codecvt<unsigned char, char>::do_always_noconv , codecvt<char8_t, char>::do_always_noconv and codecvt<std::byte, char>::do_always_noconv but your idea is just better because more general.

@StephanTLavavej StephanTLavavej added the performance Must go faster label Aug 11, 2021
@StephanTLavavej
Copy link
Member

@CaseyCarter notes that basic_ifstream<unsigned char> is a quasi-non-Standard specialization, and is unsure what the motivation is for using this type.

@jessey-git
Copy link

@CaseyCarter notes that basic_ifstream<unsigned char> is a quasi-non-Standard specialization, and is unsure what the motivation is for using this type.

As somewhat alluded to in #817, it's kind of about trying to have sane concepts in your api's and types. Surely, if you open a stream in binary mode you'd really like to do so while using the std::byte type right? Right?! Or similar with uint8_t's or whatever might fit within your given domain while still conveying the concept of "this is an opaque blob of data" not a character string.

And right now you can't do it without resorting to extra, ugly, API calls to set the internal buffers. Maybe at least fix it for std::byte?

In my case, had I not realized that my unit tests were taking 30 to 50x longer to complete (a few seconds longer than previous) I would have subjected dozens (dozens!) of people who use my lib to very bad performance on Windows. And Windows still needs all the help it can get due to slight performance losses everywhere else (ABI nonsense).

@CaseyCarter
Copy link
Member

I think what could happen here is that someone audits the code to remove the impediment that prevents the optimization being used for all char-like types of size 1, instead of only for char. If WG21 later decides that codecvt<char, std::byte, blah blah> needs to mean something else, then we can make that change at that time, potentially invalidating the optimization.

We don't plan to make this change right now, but would be happy to review a pull request doing so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fixed Something works now, yay! performance Must go faster
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants