-
-
Notifications
You must be signed in to change notification settings - Fork 6.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix iterator_input_adapter consuming input before it should #3442
base: develop
Are you sure you want to change the base?
fix iterator_input_adapter consuming input before it should #3442
Conversation
|
||
template<typename BaseInputAdapter, size_t T> | ||
friend struct wide_string_input_helper; | ||
|
||
bool empty() const | ||
bool empty() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this remain const
qualified? (Meaning current
and current_has_been_consumed
would have to be made mutable
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do it should?
Empty actually need to do work (advance the underlying iterator
) if input has been already consumed
keeping it const
would be only to legacy reasons
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, that's why we have the mutable
keyword in the language. It's more to meet expectations. You're not going to expect empty()
to not work on a const
object based on how the same function operates in other classes.
Edit: I'll take Niels' thumb up as agreement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Roger that!
@@ -5542,30 +5543,46 @@ class iterator_input_adapter | |||
using char_type = typename std::iterator_traits<IteratorType>::value_type; | |||
|
|||
iterator_input_adapter(IteratorType first, IteratorType last) | |||
: current(std::move(first)), end(std::move(last)) | |||
: current(std::move(first)), end(std::move(last)), current_has_been_consumed(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may as well remove the member initializer here, since you've already initialized it in the declaration.
I don't know if clang-tidy, etc. will complain. There're various checks for redundant stuff.
Edit: clang-tidy did in fact complain:
error: member initializer for 'current_has_been_consumed' is redundant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is better to keep it in the constructor (maybe explicitly delete the default constructor) and remove the initialisation in the declaration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is better to keep it in the constructor (maybe explicitly delete the default constructor) and remove the initialisation in the declaration?
Pick one. I prefer in-class myself, so I'd remove the member initializer as suggested.
The default constructor is implicitly deleted because we have a user-defined constructor, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default constructor is implicitly deleted because we have a user-defined constructor, right?
Yes, this is actually correct
I assume this is at least somewhat related to this discussion? #3411 Can you construct a unit test that fails with |
It could be, I'm going to check later
Yes, I was thinking to add some unit-test too after Thanks |
For small things like this you can always use Compiler Explorer. A starting point: https://godbolt.org/z/c83xMcc77 Note that you can Edit: To clarify: Add your test here. This is more for you to be sure your test actually demonstrates how your changes fixed something that would previously fail. |
IteratorType end; | ||
mutable IteratorType current; | ||
const IteratorType end; | ||
mutable bool current_has_been_consumed; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/__w/json/json/include/nlohmann/detail/input/input_adapters.hpp:163:18: error: use default member initializer for 'current_has_been_consumed' [modernize-use-default-member-init,-warnings-as-errors]
mutable bool current_has_been_consumed;
^
{false}
clang-tidy seems to insist on this:
mutable bool current_has_been_consumed{false};
(Didn't know/remember that.)
CI should turn green after these two changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use
mutable bool current_has_been_consumed = false;
This library has a troubled relationship to braced initializers... :-)
{ | ||
current_has_been_consumed = false; | ||
return std::char_traits<char_type>::eof(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And clang-tidy doesn't like else after return. It wants this instead:
if (JSON_HEDLEY_LIKELY(current != end))
{
current_has_been_consumed = true;
return std::char_traits<char_type>::to_int_type(*current);
}
current_has_been_consumed = false;
return std::char_traits<char_type>::eof();
@falbrechtskirchinger Edit: extended test |
Thanks! That demonstrates the issue clearly enough. |
@falbrechtskirchinger @nlohmann should be now |
Does this allow to finally parse JSON lines (https://json.nlohmann.me/features/parsing/json_lines/)? |
Sorry can you expand on that? what is not working about json_lines? If you are talking about something like this there the problem is multiple fold and is not directly corrected by this PR The problem with
If you want I can push a PR to address that problem too [edit] [edit2] |
@imwhocodes Can you please check the comments? |
Just came back from holiday, gonna address in this days |
@imwhocodes Is there anything I can do to support this? |
@imwhocodes Do you have a test case that demonstrates this error? |
See quote below. I can share my test case which fails to reproduce the issue later.
|
Thanks. I thought that since you couldn't reproduce it in a test case, we didn't have a working repro yet. |
Sorry everybody for the delay, Thanks for the patience |
@imwhocodes I could not independently write a test that would trigger your issue. Guess I should make that available so we can compare and figure out who's right. Edit: To be crystal clear: I don't believe there's an issue to fix. |
Can you show me your test case? The situation where the current (not fixed) implementation fail is on an input iterator where the action of reading actually remove from the underlaying buffer, ex:
The current (not fixed) implementation of On the link you can see this happening, you can change Here for any question, |
Ok, I understand the problem now. It's about advancing the iterator vs. dereferencing it. You can look at my test case in PR #3531. Either way, your fix breaks my use case. I'll need to look at it a bit closer when I have some spare time, meaning not right now. |
The logic problem with your test is that you are de-referencing a pointer to the last read When you deference you iterator again to check where you are you are consuming the same char again (the last parsed by the parser) Here lay the problem, on your test iterator end up were it actually should be, if it would end up one char after the parameter |
@falbrechtskirchinger sorry but this example that you give me is broken too (undefined behaviour) You are getting the value when you deference the iterator pointer, but the pointer could be already the Here is a failing test for your example, I had to move from Edit: Also with your proposed solution even |
Well, this is your code. :-) |
I've taken a closer look and even though I'd recommend changes to both ...,'u','e','_','B' }; After completing the input as follows the assertion succeedes. ...,'u','e','_','B','"','}' }; |
And here's the updated code with some improvements to the operators: Possibly still not 100% correct but at least closer. |
I'm honestly thinking that you are trolling me,
Code example where you moved the
And you proposed it as solution, solution where end iterator is dereferenced As you can see on my initial test case (here edited to be more clear) the |
Yes I removed on purpose to show you that on incomplete input your solution is UB About that:
This is not valid solution, on a STREAM-like input you don't have |
I modified the pertinent bits of your code to make the unaltered example work.
|
How? The program is crashing on your assertion, not because the end iterator is dereferenced?
Well, the |
@falbrechtskirchinger ok lets start fresh: This is your proposed solution: to obtaining the This is mine original solution: to obtaining the Said that my solution can work only if the upper layer (the parser) don't try to read one
|
So, I finally reproduced the dereference you were talking about. You're not using the last code I posted, which includes tweaked This is why we were talking past one another. We were using different code. Here's the quote with the update:
|
Yes i have seen the code at https://godbolt.org/z/Ee1Pr3Gse and this is why I'm telling you that it is not valid, if we want this to work with a stream (even simply Please read carefully this message, everything is explained |
I've read your message. You're trying to shoehorn a stream into the iterator interface. If you insist on parsing JSON directly from a stream, it's on you to find a way to make it work. My point is that the library code is working as intended and your "fix" breaks other valid code. I'm sorry I didn't catch this right at the beginning. If you want help with your iterator code for a specific stream API (beyond the contrived examples using Let me address one more point of your argument directly:
Advancing the iterator to the next character should never be a problem. I'll quote the standard to you again: "Iterators are generalized pointers". They are supposed to follow similar semantics. @nlohmann @gregmarr We've reached a stalemate and need someone else to chime in. |
The real scenario problem it is not about knowing if the stream is closed, but it is that on the current About breaking the API I don't think it is true, all ctest pass correctly and final behaviour is the same for all common function My idea is it that it is better to have the Giving some context: personally I started using nlohmann::json for the very powerfull combo of the sax_parser with a stream as input, giving me the ability to parse json and msgpack that are bigger than the available ram with a custom sax_parser keeping only the data which I'm actually interested (and also without any heap allocations, mostly on MCUs) |
You can avoid that with
It's a breaking change and I've demonstrated why. Passing the unit tests does not invalidate that assertion.
You're breaking expected iterator semantics. The past-the-end iterator points past the end because otherwise, we'd have no way to represent an empty range. You're suggestion to "simply do a
Nothing prevents you from doing that. You can implement your own iterator to behave as you describe. Just don't break iterator semantics for everyone else because it's convenient for one use case out of many. In summary: C++ developers understand iterators, they expect them to follow established rules. The concept of the past-the-end iterator is fundamental, don't mess with that. What you want can be done without modifying the library code. I rest my case. |
No, this don't fix the problem, it is not about strict, it is about the parsing before the strict check
Never said that i want to break the past-the-end iterator, simply that value pointed by iterator when finished parsing should be last parsed
Yes, the fact that the upper layer is advancing the iterator one position ahead of the one actually used
I reiterate, it is not about breaking iterator, like this library itself say I want to stick to this specification: https://en.cppreference.com/w/cpp/named_req/InputIterator Explaining as a simple loop where
auto it = data.begin();
while( it != data.end() ){
auto value = *it;
it++;
if( ! parse(value) ){
break;
}
}
//Now iterator point to one item after the last actually used
auto it = data.begin();
while( it != data.end() ){
auto value = *it;
if( ! parse(value) ){
break;
}
it++;
}
//Now iterator point to the last item that was actually used Hope that this is clear of where the problem lays |
I've looked at the discussion, and thought a lot about this, and done some researching, and I'm still not sure if it's possible to "fix" this use case at this level. I looked at https://en.cppreference.com/w/cpp/iterator/istream_iterator and it seems like it should be possible for the stream iterator itself, rather than the json adapter, to handle this. However, the I don't know how to reconcile the "leaving the iterator pointing at the last thing read" with "the iterator needs to be pointing to the next thing to read in order to start parsing". On the other hand, there is no visible change as far as the caller of
That's JSON vs CBOR, not regular parse vs sax parse, got the two confused, never mind. |
That's what https://en.cppreference.com/w/cpp/iterator/istream_iterator
|
The Having a newline or other whitespace after the last char of the first would actually fix both issues, but that doesn't help if you can't change the source. |
The difference being that If you want a |
@falbrechtskirchinger I just went looking through the discussions in this PR for it and couldn't find it. Can you remind me again which case is broken with this change? |
See #3548. |
Thanks, that makes sense. I hadn't been considering the underlying iterator surviving beyond the adapter. |
Another thought: Combining an Once again, there are several (external) solutions to this problem. Ideally, we should work towards making |
Is some way I cannot understand why we should preserve an internal behaviour that is not defined by the API, on the other side is possible that people is relaying on that hidden/implicit behaviour What I'm proposing now is to split
What do you think about it? @gregmarr @falbrechtskirchinger |
I took a |
Got it, but now you need to always keep the See https://godbolt.org/z/1f9sGW6cE (btw funny that losing one Still more convinced that the previously proposed could be a good solution managing all scenario:
|
How should we proceed here? |
As is, this PR would conflict with #3548. Unless @imwhocodes wants to try another solution, this can be closed. |
Before this PR
iterator_input_adapter
was indiscriminately consuming/advancing the input iterator before actually know if it should onget_character()
Case example:
iterator_input_adapter
over a custom input (ex input adapter over a stream) that can yield on iterator advance: even withstrict=false
theiterator_input_adapter
was trying to consume one item more even if the parser or the sax implementation hadreturn false
iterator_input_adapter
over a custom input (ex vector) with multiple json chained together even withstrict=false
theiterator_input_adapter
was consuming one char more than it should even if the parser or the sax implementation hadreturn false
This implementation fix the issue, it advance the iterator only if the
current
has ben already consumed, otherwise it wait the nextget_character()
to actually advance the iteratorPull request checklist
Read the Contribution Guidelines for detailed information.
include/nlohmann
directory, runmake amalgamate
to create the single-header filesingle_include/nlohmann/json.hpp
. The whole process is described here.