Issues with the arithmetic in iterator and reverse iterator #593

HenryRLee · 2017-05-26T16:04:06Z

Hi, I just found that the iterator class hasn't override the n + iterator operator, which is a requirement in the random access iterator type.

More specifically, this operator is missing:

operator+(difference_type, const iter_impl&);

The failing code is:

json j_object
auto it = j_object.cbegin();
1 + it; // error

The text was updated successfully, but these errors were encountered:

HenryRLee · 2017-05-27T05:56:54Z

Actually I found some more issues during the fix up.

The iter_impl was defined as a random access iterator (https://github.com/nlohmann/json/blob/develop/src/json.hpp#L7994), but a couple of lines later, the iterator category is then changed to bidirectional (https://github.com/nlohmann/json/blob/develop/src/json.hpp#L8018). I tried to fix this, but it introduced some fail the test cases, e.g. distance(rbegin(), rend()) == size() failed.

So I dig it up and found that the json reverse iterator is overriding all the operators, while the implementation of the iter - iter operator is not correct (https://github.com/nlohmann/json/blob/develop/src/json.hpp#L8677). Instead of this->base() - other.base(), it should be other.base() - this->base().

Here is the implementation in llvm:

template <class _Iter1, class _Iter2>
inline _LIBCPP_INLINE_VISIBILITY
typename reverse_iterator<_Iter1>::difference_type
operator-(const reverse_iterator<_Iter1>& __x, const reverse_iterator<_Iter2>& __y)
{
    return __y.base() - __x.base();
}

In addition, the iter + n and iter - n operators are implemented in a reversed way.

Personally, I would recommend removing all the operator override in the reverse iterator, since the template class reverse_iterator<iterator> should give us all the correct implementations.

Reference: std::reverse_iterator

HenryRLee · 2017-05-27T15:53:07Z

Sorry, I made a mistake in the previous message.

As for the reverse iterator, I can see why the operators are override. But still three operators were implemented in an incorrect way, which are iter + n, iter - n and iter - iter.

nlohmann · 2017-05-27T15:59:49Z

It took me quite a while to get the iterators "working". I expect that there are better ways to implement them, and I am not surprised that there are issues. I am grateful for any hints how to improve the situation.

HenryRLee · 2017-05-27T16:05:45Z

I could make them compiled. But some of the cases are failing. I think it's the problem of the test code.

For example, I can see that iterators of object types are not able to compare, but some test cases were still using distance(j.begin(), j.end()) (unit-capacity.cpp +345). Shall we remove these cases?

And the other cases are related to the reverse iterator, could be fixed easily.

HenryRLee · 2017-05-27T16:15:53Z

Oh, just misunderstood your last message. I have an idea of implementing the reverse iterator, instead of inheriting the reverse_iterator<iter_impl>, we could inherit template <class T> reverse_iterator. In such a way, the return type of all the operators will still be consistent with the iter_impl classes, so no need to implement the operators again. But it's just an idea, I haven't tested it yet.

nlohmann · 2017-05-28T20:27:15Z

As I said: I was happy once the iterators worked, and the did not receive much love until they were refactored with #395. If you have any proposals how to improve them, I would be happy about a PR. Maybe as a bonus, you may have and idea how to fix #550. :-)

HenryRLee · 2017-05-29T08:25:04Z

Sure. I'll have these bugs related to arithmetic operators fixed today, then I'll see what I can do to improve the design, and also #550.

Pleasure to help.

HenryRLee · 2017-05-30T09:38:45Z

I did a code walk today, and wondering if I could improve the iterator code structure.

Currently the iterator has three internal iterators, which looked very redundant at the first glance. So I was trying to merge the three internal iterators into one type.

Another issue is that the iterator category, actually should be consistent with the iterator class in the internal container. For example, if the container is std::map, the iterator category tag should be bidirectional. While if the container is std::vector, the tag should be random accessible. I was wondering if I could define the category tag dynamically, according to the real container type.

It turned out that, none of these improvement is possible, because the basic_json class is using the value_type as a variable instead of a type name, unless we introduce polymorphism. I understand that there is no easy way to change value_type to a type name, so I guess we can only keep it that way.

My last recommendation is to change the iterator category to random accessible tag, because in this tag, running functions like std::advance and std::distance will be constant time complexity if the container is a vector-like container. As for the non random accessible containers, we can still re-implement the linear running time iter + n operators for these types, such that std::distance wouldn't throw an exception.

nlohmann · 2017-05-31T19:52:54Z

I'm sorry that I won't find the time to properly answer here or have a deeper look at the PRs. Sorry about this - I shall get back to this next week.

HenryRLee · 2017-06-01T00:56:59Z

No worries. Take your time.

nlohmann · 2017-06-06T19:08:05Z

Regarding #593 (comment):

I agree to paragraph 1-4. The idea is that all JSON values have the same type (basic_json) with the specific type encoded in variable m_type. We won't change this.

In the 5th paragraph, you propose to change the behavior of the iterator so iter + n would work for objects. I don't really like this idea, because it would change the behavior of JSON objects compared to std::map and would also suggest an ordering of the keys in an object. This would contradict the definition in RFC 7159 which states

An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array.

HenryRLee · 2017-06-07T07:53:35Z

I understand.

How about the iterator category tag? Do you think we should change it to random access tag? Even if we don't change the implementation, we can still use the random accessible tag. All we need to do is removing the test cases that are using random access operations on object types. (i.e. https://github.com/nlohmann/json/blob/develop/test/src/unit-capacity.cpp#L327)

nlohmann · 2017-06-07T08:04:01Z

Your mentioned line is

CHECK(std::distance(j.rbegin(), j.rend()) == j.size());

for an object j. Would this mean std::distance would not work any more?

HenryRLee · 2017-06-07T08:08:08Z

Correct. Because std::distance would call iter - iter if it finds the iterator is a random accessible iterator. But iter - iter of an object type throws an exception in the current implementation.

nlohmann · 2017-06-07T08:22:17Z

I am puzzled - why did it work before?

nlohmann · 2017-06-07T08:22:44Z

Or to clarify: I would like JSON objects to behave just like std::map.

HenryRLee · 2017-06-07T08:23:35Z

It's still working now. We haven't change the tag to random access tag yet.

HenryRLee · 2017-06-07T08:27:42Z

We have to choose one tag, either bidirectional or random access. Currently we are using bidirectional, and it works as std::map does. However the functions for std::vector types would not be optimized.

HenryRLee · 2017-06-07T09:46:22Z

I'll try to make this clearer. The iterator has been using the bidirectional tag all the time. Even though it inherits the random accessible iterator, the tag is changed to bidirectional a few lines later.

When I tried to change the tag to random access, many bugs occurred, but most of them got resolved in the PR#595. I am still keeping this tag as bidirectional, because if we use random access, std::distance would throw exceptions on object types. (I would expect std::advance does the same.)

If we don't change the current implementation, the pros and cons in bidirectional and random access tag ares:

bidirectional tag:
- pros: compatible to all types
- cons: functions such as std::distance runs linear time on array types
random access tag:
- pros: functions optimized for array types
- cons: functions such as std::distance doesn't work for object types

HenryRLee · 2017-06-07T09:59:53Z

Actually there is a ambiguity on the definition of the object type. If you believe the objects should have no ordering at all, then std::distance is meaningless, same as ++ and --. But std::map does have an ordering. If you have operator++ implemented, then operator+(n) somehow makes sense. Although std::map doesn't have operator+(n) either, due to the nature of the data structure, we can still overloading it to support std::distance and std::advance.

nlohmann · 2017-06-07T20:39:10Z

Why would std::distance be linear on array types if we use std::vector internally?

gregmarr · 2017-06-07T20:42:41Z

If the iterators are bidirectional instead of random access, then distance needs to iterate the first iterator until it gets to the second and count the steps, instead of doing second - first.

HenryRLee · 2017-06-08T02:35:27Z

Gregmarr is right, if the tag is bidirectional, the distance function assumes iter - iter operation is invalid, therefore it would only use ++ operator for linear times to count the distance.

nlohmann · 2017-06-08T07:23:22Z

Sorry @HenryRLee for my ongoing confusion. I am unsure about this. Any further opinion is greatly appreciated.

HenryRLee · 2017-06-08T10:00:18Z

Personally, I still think adding iter + n operators for the object type is the best solution. These are just higher level of the ++ operators, wouldn't harm the existing property of the container. While the unordered property is contradicted when an iterator is provided for the object type.

Having overloading the iter + n operators and choosing the random access tag, the std::distance and std::advance functions would work on both array types and object types. In addition, theses functions would be optimized for array types with only O(1) time complexity.

theodelrieu · 2017-06-08T11:04:39Z

I think the iterator should remain bidirectional, we don't want to lose correctness on JSON objects.

IIRC, there is a way to access the underlying value (get_ref/ptr), for people that want optimizations on std::distance and other algorithms.

This commit changes the iterator category to andom_access_iterator and allows offsets and subscript operators for object iterators.

nlohmann · 2017-08-16T19:36:28Z

@HenryRLee Thanks for the offer, but I already gave it a try, see c77a0be. It would be great if you could have a look at the changes.

jaredgrubb · 2017-08-16T23:27:08Z

It's incorrect to make this iterator class look like a random-access iterator -- in some cases it is (vector), but in others it's not (map). You'll create a correctness issue, and could cause major performance regressions if the STL starts choosing algorithms tuned for O(1) access that turns out to be O(n). For example, std::sort requires random-access in order to achieve O(n log n), but if you give it iterators that actually have O(n) traversal when it is expecting O(1), you could end up with really bad performance.

This iterator needs to be the greatest-common-denominator of the choices, and that is bidirectional.

HenryRLee · 2017-08-17T05:04:42Z

Hi @jaredgrubb Correct me if I am wrong. As far as I can see, there are no performance regressions in any functions in <algorithm> or <iterator>. sort() takes only random-access iterator tag, same as the heap related functions. The binary search or merge related functions have only one version for both forward and random-access iterator tags. So, for example, if user chooses to sort on an object type when it's in a random-access iterator tag, the complexity is indeed O(n^2 logn) . But using a bidirectional iterator tag instead, sort() doesn't work at all, there wouldn't be O(n logn) or O(n^2) solutions. So, choosing a random-access tag is no worse than bidirectional tag.

The only problem is that user may be expecting more when they sees the tag as a random-access tag. It's the same issue when running advance or distance functions, where although the tag is random-access but it turns out to be running in O(n) time. It could be a correctness issue depending on how we define it.

theodelrieu · 2017-08-17T06:29:25Z

I agree with @jaredgrubb, as I wrote a few months ago , we should not modify this.

Bidirectional is correct in every cases, not optimal that's true, so what?

If a user wants to perform some algorithms on a JSON array, they just convert it into a vector or array.
And if the cost of conversion is too high, they use get_ptr to access the underlying vector

I'm sorry but I don't see how this proposed solution could be beneficial to the library

jaredgrubb · 2017-08-17T16:53:41Z

@HenryRLee, I have to disagree. You're right that there's no direct performance regression with your patch (the things that worked before continue to work just fine, and maybe even faster), but you're opting the iterator into new things that it may not be suited for.

Look, if you review the requirements for a random-access iterator (eg, at http://en.cppreference.com/w/cpp/concept/RandomAccessIterator), then you will see that the '+=' has a note requiring 'complexity is constant'. The json::iterator clearly does not fulfill that, and therefore must not be labelled a random-access iterator.

nlohmann · 2017-08-17T19:16:21Z

I agree that it is ugly/incorrect to call an interator "random-access" when it does not satisfy all of its requirements. I also agree that using a bidirectional iterator as least common denominator could be a solution for this.

But the nature of the JSON container makes life more difficult, because we need to find a way to find a common interface for differing underlying containers such as std::map and std::vector. In the end, I always aimed for an interface that makes life easy and that tries to hide the fact that we are working with a union of different types under the hood, but rather with a JSON value.

That is why I like the idea of having an iterator that combines the best of all worlds. Constant runtime for arrays, and also the same interface for objects. I agree that it may feel awkward/terrible/wrong for people familiar with the iterator requirements, but helpful for others that, for instance, call std::sort on a JSON value and do not need to care whether they cope with an array or an object.

I'm looking forward to reading more here, and I hope that we can find a way to achieve a nice interface for a JSON container.

theodelrieu · 2017-08-18T09:16:59Z

Several things here:

It's good for an interface to be easy to use, but also hard to misuse.

About std::sort, it's the same problem with std::list, if people want a sorted list, they call list::sort, so we could add a basic_json::sort method.
Often times you have to call a container member function, since std:: algorithms just deal with iterators, and knowing their category is sometimes not sufficient:

Even with C++17 searchers, using string::find can be more efficient than std::search.

~~But one of the issue mentioned by @HenryRLee was that std::advance et al. didn't work in some specific cases.~~

~~I think the solution to have both correctness and good performance is to define distance/next/advance etc... for nlohmann::detail::iter_impl, like you would do for std::swap.~~

This would make std::* algorithms call our custom distance automatically.
For non std code, the following snippet will be necessary to have the best performance:

~~I don't know the library iterators that well, it seems that calling begin/end on a number or string work, so we'll have to keep that in mind if we implement distance et al.~~

I forgot that algorithms just SFINAE on the iterator tag, nevermind that.

Anyway I stand by my first point, that we should not defeat the purpose of iterator machinery (and thus the whole Standard Library design)

If a user desires to perform an algorithm that requires RandomAccessIterator, we can add it as a member of basic_json.

I don't remember seeing an issue about a support of such an algorithm yet (though I need to look closely at issues to confirm that), so crippling correctness to solve an issue that nobody has mentioned yet is bound to create some.

That is a pretty big deal IMO

This reverts commit c77a0be.

nlohmann · 2017-08-20T18:45:54Z

I reverted the changes for now.

gregmarr · 2017-08-21T20:08:20Z

I agree that misrepresenting the iterator tag can be very bad. If this checking could be done at runtime, so that if it's on an array, the iterator is random access, but on an object it's bidirectional, that would be okay, but I'm pretty sure that the machinery doesn't work that way, and it's all static typing.

theodelrieu · 2017-08-21T21:09:33Z

You are right, it's compile-time only.

nlohmann · 2017-08-25T21:30:41Z

OK, the longer I think about this, the more I am convinced that we should not change the code to random_access_iterator_tag.

Question is: is there anything to be changed in the code to close this issue once and for all?

HenryRLee · 2017-08-25T21:56:23Z

I think we have to find a solution to bind the iterator type dynamically. It would be difficult but not impossible. Note that if we are strict to the least common denominator, bidirectional is not the one, because user may choose a forward list as an array type, or even an unordered map as an object (although we don't support it yet).

Having said that, this issue can be closed. Perhaps we need to change the template argument of the iterator to bidirectional and update all the documents, since I remember they were all random access tag all the time.

theodelrieu · 2017-08-25T21:59:02Z

Actually yes, IIRC we inherit from random_iterator_tag, and then alias iterator_category, to bidirectional_tag. We should inherit from bidirectional_tag directly.

There is code that relies on std::is_base_of<std::random_iterator_tag, ...> (not in the library AFAIK, but client code could), that is one correctness issue to fix.

I don't see anything concerning the iterator category.

stale · 2017-10-25T09:53:20Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

nlohmann · 2017-10-27T16:48:00Z

Would this be sufficient to switch to bidirectional iterators and close this issue?

@ -3618,7 +3618,7 @@ This class implements a both iterators (iterator and const_iterator) for the
@since version 1.0.0, simplified in version 2.0.9
*/
template<typename BasicJsonType>
- class iter_impl : public std::iterator<std::random_access_iterator_tag, BasicJsonType>
+ class iter_impl : public std::iterator<std::bidirectional_iterator_tag, BasicJsonType>
{
    /// allow basic_json to access private members
    friend iter_impl

theodelrieu · 2017-10-27T16:57:11Z

You can also remove the using iterator_category = ..., since std::iterator already exposes this typedef.

nlohmann · 2017-10-28T07:53:52Z

Thanks everyone. I shall close this issue once the CI succeeds.

HenryRLee changed the title ~~The (n + iterator) operator is missing in the iterator class~~ Issues with the arithmetic in iterator and reverse iterator May 29, 2017

HenryRLee mentioned this issue Jun 5, 2017

Issue #593 Fix the arithmetic operators in the iterator and reverse iterator #595

Merged

nlohmann added the kind: enhancement/improvement label Jun 7, 2017

nlohmann added the state: please discuss please discuss the issue or vote for your favorite option label Jun 8, 2017

nlohmann mentioned this issue Aug 16, 2017

Support startIndex for from_cbor/from_msgpack #462

Closed

nlohmann added a commit that referenced this issue Aug 16, 2017

💥 changed iterators to andom_access_iterator #593

c77a0be

This commit changes the iterator category to andom_access_iterator and allows offsets and subscript operators for object iterators.

nlohmann added solution: proposed fix a fix for the issue has been proposed and waits for confirmation and removed state: please discuss please discuss the issue or vote for your favorite option labels Aug 16, 2017

nlohmann self-assigned this Aug 16, 2017

nlohmann added this to the Release 3.0.0 milestone Aug 16, 2017

nlohmann added the state: please discuss please discuss the issue or vote for your favorite option label Aug 17, 2017

nlohmann added a commit that referenced this issue Aug 20, 2017

Revert ":boom: changed iterators to andom_access_iterator #593"

64ef100

This reverts commit c77a0be.

stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Oct 25, 2017

stale bot removed the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Oct 27, 2017

nlohmann added a commit that referenced this issue Oct 28, 2017

🔨 set bidirectional iterator tag #593

8e067c0

nlohmann removed the state: please discuss please discuss the issue or vote for your favorite option label Oct 28, 2017

nlohmann closed this as completed Oct 28, 2017

Issues with the arithmetic in iterator and reverse iterator #593

Issues with the arithmetic in iterator and reverse iterator #593

Comments

HenryRLee commented May 26, 2017 • edited by nlohmann Loading

HenryRLee commented May 27, 2017 • edited by nlohmann Loading

HenryRLee commented May 27, 2017

nlohmann commented May 27, 2017

HenryRLee commented May 27, 2017 • edited Loading

HenryRLee commented May 27, 2017 • edited Loading

nlohmann commented May 28, 2017

HenryRLee commented May 29, 2017 • edited Loading

HenryRLee commented May 30, 2017 • edited Loading

nlohmann commented May 31, 2017

HenryRLee commented Jun 1, 2017

nlohmann commented Jun 6, 2017

HenryRLee commented Jun 7, 2017

nlohmann commented Jun 7, 2017

HenryRLee commented Jun 7, 2017 • edited Loading

nlohmann commented Jun 7, 2017

nlohmann commented Jun 7, 2017

HenryRLee commented Jun 7, 2017

HenryRLee commented Jun 7, 2017

HenryRLee commented Jun 7, 2017 • edited Loading

HenryRLee commented Jun 7, 2017

nlohmann commented Jun 7, 2017

gregmarr commented Jun 7, 2017

HenryRLee commented Jun 8, 2017

nlohmann commented Jun 8, 2017

HenryRLee commented Jun 8, 2017 • edited Loading

theodelrieu commented Jun 8, 2017

nlohmann commented Aug 16, 2017

jaredgrubb commented Aug 16, 2017

HenryRLee commented Aug 17, 2017

theodelrieu commented Aug 17, 2017

jaredgrubb commented Aug 17, 2017

nlohmann commented Aug 17, 2017 • edited Loading

theodelrieu commented Aug 18, 2017 • edited Loading

nlohmann commented Aug 20, 2017

gregmarr commented Aug 21, 2017

theodelrieu commented Aug 21, 2017

nlohmann commented Aug 25, 2017

HenryRLee commented Aug 25, 2017

theodelrieu commented Aug 25, 2017

stale bot commented Oct 25, 2017

nlohmann commented Oct 27, 2017

theodelrieu commented Oct 27, 2017

nlohmann commented Oct 28, 2017

HenryRLee commented May 26, 2017 •

edited by nlohmann

Loading

HenryRLee commented May 27, 2017 •

edited by nlohmann

Loading

HenryRLee commented May 27, 2017 •

edited

Loading

HenryRLee commented May 27, 2017 •

edited

Loading

HenryRLee commented May 29, 2017 •

edited

Loading

HenryRLee commented May 30, 2017 •

edited

Loading

HenryRLee commented Jun 7, 2017 •

edited

Loading

HenryRLee commented Jun 7, 2017 •

edited

Loading

HenryRLee commented Jun 8, 2017 •

edited

Loading

nlohmann commented Aug 17, 2017 •

edited

Loading

theodelrieu commented Aug 18, 2017 •

edited

Loading