<fstream>: Performance issue when reading a binary file using std::basic_ifstream<unsigned char> #2109
Comments
Duplicate of #817. See the comment for more details: #817 (comment). As a workaround, use https://en.cppreference.com/w/cpp/io/basic_streambuf/pubsetbuf
Hmm, OK, I see. But why is there no internal buffer for all 1-byte data types?
@jamesmagnus As far as I understand we don't use buffering for Lines 717 to 720 in bfa2cb2
Lines 609 to 636 in 280347a
Lines 561 to 607 in 280347a
If @StephanTLavavej has time, he can answer in a more detailed way... |
By the way, LLVM libc++ sets up buffering in the basic_filebuf constructor: https://github.com/llvm/llvm-project/blob/main/libcxx/include/fstream#L314 Maybe we could do something similar...
@jamesmagnus one more thing. https://en.cppreference.com/w/cpp/io/basic_streambuf/pubsetbuf, https://stackoverflow.com/a/40317135/4544798, and other internet resources say that you need to do the following (this is true for GNU libstdc++):
But this is not true for MS STL. With MS STL you need to do
So cross-platform C++ might be tricky :(
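A minimal sketch of one way to cope with the ordering difference, not the code from the comment above. It assumes that libstdc++ only honors `pubsetbuf` while no file is open (and ignores it afterwards), and that MS STL only honors it once a file is open (and fails harmlessly before). Under those assumptions, calling it on both sides of `open` covers both implementations. `open_with_buffer` is a hypothetical helper name, and the example uses a plain `char` stream to stay portable.

```cpp
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper: opens `path` for binary reading and tries to install
// `buffer` as the stream's internal buffer, whichever ordering the standard
// library in use expects. `buffer` must outlive `in`.
inline bool open_with_buffer(std::ifstream& in, const std::string& path,
                             std::vector<char>& buffer) {
    const std::streamsize size = static_cast<std::streamsize>(buffer.size());

    // Assumption: libstdc++ accepts the buffer only before a file is open.
    in.rdbuf()->pubsetbuf(buffer.data(), size);

    in.open(path, std::ios::binary);
    if (!in.is_open()) {
        return false;
    }

    // Assumption: MS STL accepts the buffer only after a file is open;
    // on libstdc++ this second call is expected to have no effect.
    in.rdbuf()->pubsetbuf(buffer.data(), size);
    return true;
}
```

The buffer is passed in by reference so that the caller controls its lifetime; destroying it while the stream is still in use would leave the stream pointing at freed memory.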
Couldn't we make |
Probably, I offered return |
@CaseyCarter notes that |
As somewhat alluded to in #817, it's kind of about trying to have sane concepts in your APIs and types. Surely, if you open a stream in binary mode you'd really like to do so while using the std::byte type, right? Right?! Or similarly with uint8_t or whatever fits your domain while still conveying the concept of "this is an opaque blob of data", not a character string. And right now you can't do it without resorting to extra, ugly API calls to set the internal buffers. Maybe at least fix it for std::byte? In my case, had I not realized that my unit tests were taking 30 to 50x longer to complete (a few seconds longer than before), I would have subjected dozens (dozens!) of people who use my lib to very bad performance on Windows. And Windows still needs all the help it can get due to slight performance losses everywhere else (ABI nonsense).
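Until the stream itself is fast for byte-like element types, one possible way to keep a byte-typed API is to read through an ordinary `char` stream and expose the data as `unsigned char`. This is only a sketch: `read_bytes` is a hypothetical helper, not part of any library, and the same pattern should work with `std::byte` in C++17.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole file at `path` into a byte vector.
// The stream element type stays `char`, so the buffered fast path applies;
// only the destination storage is typed as unsigned bytes.
inline std::vector<unsigned char> read_bytes(const std::string& path) {
    // Open at the end so tellg() reports the file size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    const std::streamoff end = in.tellg();
    if (end < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }

    std::vector<unsigned char> data(static_cast<std::size_t>(end));
    in.seekg(0, std::ios::beg);

    // Viewing unsigned char storage through char* is allowed by the
    // aliasing rules, so read() can fill the vector directly.
    if (!data.empty() &&
        !in.read(reinterpret_cast<char*>(data.data()),
                 static_cast<std::streamsize>(data.size()))) {
        throw std::runtime_error("short read from " + path);
    }
    return data;
}
```

Callers then work with `std::vector<unsigned char>` and never see the `char` stream, which keeps the "opaque blob of data" intent in the interface.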
I think what could happen here is that someone audits the code to remove the impediment that prevents the optimization being used for all char-like types of size 1, instead of only for We don't plan to make this change right now, but would be happy to review a pull request doing so. |
Describe the bug
Performance issue.
Reading a binary file through a stream with an unsigned character type (i.e. std::basic_ifstream<unsigned char>) is an order of magnitude slower than reading it through one with a signed character type (i.e. std::basic_ifstream<char>).
It seems you are doing character-by-character locale conversion, although the file is opened as binary and not text.
Command-line test case
Expected behavior
Reading a binary file using char or unsigned char should take the same time (no character conversion for binary data).
STL version
Additional context
Use a randomly generated test.data binary file to reproduce the issue with my code. Mine is 128 MB.