Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with the arithmetic in iterator and reverse iterator #593

Closed
HenryRLee opened this issue May 26, 2017 · 45 comments
Closed

Issues with the arithmetic in iterator and reverse iterator #593

HenryRLee opened this issue May 26, 2017 · 45 comments
Assignees
Labels
kind: enhancement/improvement solution: proposed fix a fix for the issue has been proposed and waits for confirmation
Milestone

Comments

@HenryRLee
Copy link
Contributor

HenryRLee commented May 26, 2017

Hi, I just found that the iterator class hasn't override the n + iterator operator, which is a requirement in the random access iterator type.

More specifically, this operator is missing:

operator+(difference_type, const iter_impl&);

The failing code is:

json j_object
auto it = j_object.cbegin();
1 + it; // error
@HenryRLee
Copy link
Contributor Author

HenryRLee commented May 27, 2017

Actually I found some more issues during the fix up.

The iter_impl was defined as a random access iterator (https://github.com/nlohmann/json/blob/develop/src/json.hpp#L7994), but a couple of lines later, the iterator category is then changed to bidirectional (https://github.com/nlohmann/json/blob/develop/src/json.hpp#L8018). I tried to fix this, but it introduced some fail the test cases, e.g. distance(rbegin(), rend()) == size() failed.

So I dig it up and found that the json reverse iterator is overriding all the operators, while the implementation of the iter - iter operator is not correct (https://github.com/nlohmann/json/blob/develop/src/json.hpp#L8677). Instead of this->base() - other.base(), it should be other.base() - this->base().

Here is the implementation in llvm:

template <class _Iter1, class _Iter2>
inline _LIBCPP_INLINE_VISIBILITY
typename reverse_iterator<_Iter1>::difference_type
operator-(const reverse_iterator<_Iter1>& __x, const reverse_iterator<_Iter2>& __y)
{
    return __y.base() - __x.base();
}

In addition, the iter + n and iter - n operators are implemented in a reversed way.

Personally, I would recommend removing all the operator override in the reverse iterator, since the template class reverse_iterator<iterator> should give us all the correct implementations.

Reference: std::reverse_iterator

@HenryRLee
Copy link
Contributor Author

Sorry, I made a mistake in the previous message.

As for the reverse iterator, I can see why the operators are override. But still three operators were implemented in an incorrect way, which are iter + n, iter - n and iter - iter.

@nlohmann
Copy link
Owner

It took me quite a while to get the iterators "working". I expect that there are better ways to implement them, and I am not surprised that there are issues. I am grateful for any hints how to improve the situation.

@HenryRLee
Copy link
Contributor Author

HenryRLee commented May 27, 2017

I could make them compiled. But some of the cases are failing. I think it's the problem of the test code.

For example, I can see that iterators of object types are not able to compare, but some test cases were still using distance(j.begin(), j.end()) (unit-capacity.cpp +345). Shall we remove these cases?

And the other cases are related to the reverse iterator, could be fixed easily.

@HenryRLee
Copy link
Contributor Author

HenryRLee commented May 27, 2017

Oh, just misunderstood your last message. I have an idea of implementing the reverse iterator, instead of inheriting the reverse_iterator<iter_impl>, we could inherit template <class T> reverse_iterator. In such a way, the return type of all the operators will still be consistent with the iter_impl classes, so no need to implement the operators again. But it's just an idea, I haven't tested it yet.

@nlohmann
Copy link
Owner

As I said: I was happy once the iterators worked, and the did not receive much love until they were refactored with #395. If you have any proposals how to improve them, I would be happy about a PR. Maybe as a bonus, you may have and idea how to fix #550. :-)

@HenryRLee
Copy link
Contributor Author

HenryRLee commented May 29, 2017

Sure. I'll have these bugs related to arithmetic operators fixed today, then I'll see what I can do to improve the design, and also #550.

Pleasure to help.

@HenryRLee HenryRLee changed the title The (n + iterator) operator is missing in the iterator class Issues with the arithmetic in iterator and reverse iterator May 29, 2017
@HenryRLee
Copy link
Contributor Author

HenryRLee commented May 30, 2017

I did a code walk today, and wondering if I could improve the iterator code structure.

Currently the iterator has three internal iterators, which looked very redundant at the first glance. So I was trying to merge the three internal iterators into one type.

Another issue is that the iterator category, actually should be consistent with the iterator class in the internal container. For example, if the container is std::map, the iterator category tag should be bidirectional. While if the container is std::vector, the tag should be random accessible. I was wondering if I could define the category tag dynamically, according to the real container type.

It turned out that, none of these improvement is possible, because the basic_json class is using the value_type as a variable instead of a type name, unless we introduce polymorphism. I understand that there is no easy way to change value_type to a type name, so I guess we can only keep it that way.

My last recommendation is to change the iterator category to random accessible tag, because in this tag, running functions like std::advance and std::distance will be constant time complexity if the container is a vector-like container. As for the non random accessible containers, we can still re-implement the linear running time iter + n operators for these types, such that std::distance wouldn't throw an exception.

@nlohmann
Copy link
Owner

I'm sorry that I won't find the time to properly answer here or have a deeper look at the PRs. Sorry about this - I shall get back to this next week.

@HenryRLee
Copy link
Contributor Author

No worries. Take your time.

@nlohmann
Copy link
Owner

nlohmann commented Jun 6, 2017

Regarding #593 (comment):

I agree to paragraph 1-4. The idea is that all JSON values have the same type (basic_json) with the specific type encoded in variable m_type. We won't change this.

In the 5th paragraph, you propose to change the behavior of the iterator so iter + n would work for objects. I don't really like this idea, because it would change the behavior of JSON objects compared to std::map and would also suggest an ordering of the keys in an object. This would contradict the definition in RFC 7159 which states

An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array.

@HenryRLee
Copy link
Contributor Author

I understand.

How about the iterator category tag? Do you think we should change it to random access tag? Even if we don't change the implementation, we can still use the random accessible tag. All we need to do is removing the test cases that are using random access operations on object types. (i.e. https://github.com/nlohmann/json/blob/develop/test/src/unit-capacity.cpp#L327)

@nlohmann
Copy link
Owner

nlohmann commented Jun 7, 2017

Your mentioned line is

CHECK(std::distance(j.rbegin(), j.rend()) == j.size());

for an object j. Would this mean std::distance would not work any more?

@HenryRLee
Copy link
Contributor Author

HenryRLee commented Jun 7, 2017

Correct. Because std::distance would call iter - iter if it finds the iterator is a random accessible iterator. But iter - iter of an object type throws an exception in the current implementation.

@nlohmann
Copy link
Owner

nlohmann commented Jun 7, 2017

I am puzzled - why did it work before?

@nlohmann
Copy link
Owner

nlohmann commented Jun 7, 2017

Or to clarify: I would like JSON objects to behave just like std::map.

@HenryRLee
Copy link
Contributor Author

It's still working now. We haven't change the tag to random access tag yet.

@HenryRLee
Copy link
Contributor Author

We have to choose one tag, either bidirectional or random access. Currently we are using bidirectional, and it works as std::map does. However the functions for std::vector types would not be optimized.

@HenryRLee
Copy link
Contributor Author

HenryRLee commented Jun 7, 2017

I'll try to make this clearer. The iterator has been using the bidirectional tag all the time. Even though it inherits the random accessible iterator, the tag is changed to bidirectional a few lines later.

When I tried to change the tag to random access, many bugs occurred, but most of them got resolved in the PR#595. I am still keeping this tag as bidirectional, because if we use random access, std::distance would throw exceptions on object types. (I would expect std::advance does the same.)

If we don't change the current implementation, the pros and cons in bidirectional and random access tag ares:

  • bidirectional tag:
    • pros: compatible to all types
    • cons: functions such as std::distance runs linear time on array types
  • random access tag:
    • pros: functions optimized for array types
    • cons: functions such as std::distance doesn't work for object types

@HenryRLee
Copy link
Contributor Author

Actually there is a ambiguity on the definition of the object type. If you believe the objects should have no ordering at all, then std::distance is meaningless, same as ++ and --. But std::map does have an ordering. If you have operator++ implemented, then operator+(n) somehow makes sense. Although std::map doesn't have operator+(n) either, due to the nature of the data structure, we can still overloading it to support std::distance and std::advance.

@nlohmann
Copy link
Owner

nlohmann commented Jun 7, 2017

Why would std::distance be linear on array types if we use std::vector internally?

@gregmarr
Copy link
Contributor

gregmarr commented Jun 7, 2017

If the iterators are bidirectional instead of random access, then distance needs to iterate the first iterator until it gets to the second and count the steps, instead of doing second - first.

@HenryRLee
Copy link
Contributor Author

Gregmarr is right, if the tag is bidirectional, the distance function assumes iter - iter operation is invalid, therefore it would only use ++ operator for linear times to count the distance.

@nlohmann nlohmann added the state: please discuss please discuss the issue or vote for your favorite option label Jun 8, 2017
@nlohmann
Copy link
Owner

nlohmann commented Jun 8, 2017

Sorry @HenryRLee for my ongoing confusion. I am unsure about this. Any further opinion is greatly appreciated.

@HenryRLee
Copy link
Contributor Author

HenryRLee commented Jun 8, 2017

Personally, I still think adding iter + n operators for the object type is the best solution. These are just higher level of the ++ operators, wouldn't harm the existing property of the container. While the unordered property is contradicted when an iterator is provided for the object type.

Having overloading the iter + n operators and choosing the random access tag, the std::distance and std::advance functions would work on both array types and object types. In addition, theses functions would be optimized for array types with only O(1) time complexity.

@theodelrieu
Copy link
Contributor

I think the iterator should remain bidirectional, we don't want to lose correctness on JSON objects.

IIRC, there is a way to access the underlying value (get_ref/ptr), for people that want optimizations on std::distance and other algorithms.

nlohmann added a commit that referenced this issue Aug 16, 2017
This commit changes the iterator category to andom_access_iterator and allows offsets and subscript operators for object iterators.
@nlohmann
Copy link
Owner

@HenryRLee Thanks for the offer, but I already gave it a try, see c77a0be. It would be great if you could have a look at the changes.

@nlohmann nlohmann added solution: proposed fix a fix for the issue has been proposed and waits for confirmation and removed state: please discuss please discuss the issue or vote for your favorite option labels Aug 16, 2017
@nlohmann nlohmann self-assigned this Aug 16, 2017
@nlohmann nlohmann added this to the Release 3.0.0 milestone Aug 16, 2017
@jaredgrubb
Copy link
Contributor

It's incorrect to make this iterator class look like a random-access iterator -- in some cases it is (vector), but in others it's not (map). You'll create a correctness issue, and could cause major performance regressions if the STL starts choosing algorithms tuned for O(1) access that turns out to be O(n). For example, std::sort requires random-access in order to achieve O(n log n), but if you give it iterators that actually have O(n) traversal when it is expecting O(1), you could end up with really bad performance.

This iterator needs to be the greatest-common-denominator of the choices, and that is bidirectional.

@HenryRLee
Copy link
Contributor Author

Hi @jaredgrubb Correct me if I am wrong. As far as I can see, there are no performance regressions in any functions in <algorithm> or <iterator>. sort() takes only random-access iterator tag, same as the heap related functions. The binary search or merge related functions have only one version for both forward and random-access iterator tags. So, for example, if user chooses to sort on an object type when it's in a random-access iterator tag, the complexity is indeed O(n^2 logn) . But using a bidirectional iterator tag instead, sort() doesn't work at all, there wouldn't be O(n logn) or O(n^2) solutions. So, choosing a random-access tag is no worse than bidirectional tag.

The only problem is that user may be expecting more when they sees the tag as a random-access tag. It's the same issue when running advance or distance functions, where although the tag is random-access but it turns out to be running in O(n) time. It could be a correctness issue depending on how we define it.

@theodelrieu
Copy link
Contributor

I agree with @jaredgrubb, as I wrote a few months ago , we should not modify this.

Bidirectional is correct in every cases, not optimal that's true, so what?

If a user wants to perform some algorithms on a JSON array, they just convert it into a vector or array.
And if the cost of conversion is too high, they use get_ptr to access the underlying vector

I'm sorry but I don't see how this proposed solution could be beneficial to the library

@nlohmann nlohmann added the state: please discuss please discuss the issue or vote for your favorite option label Aug 17, 2017
@jaredgrubb
Copy link
Contributor

@HenryRLee, I have to disagree. You're right that there's no direct performance regression with your patch (the things that worked before continue to work just fine, and maybe even faster), but you're opting the iterator into new things that it may not be suited for.

Look, if you review the requirements for a random-access iterator (eg, at http://en.cppreference.com/w/cpp/concept/RandomAccessIterator), then you will see that the '+=' has a note requiring 'complexity is constant'. The json::iterator clearly does not fulfill that, and therefore must not be labelled a random-access iterator.

@nlohmann
Copy link
Owner

nlohmann commented Aug 17, 2017

I agree that it is ugly/incorrect to call an interator "random-access" when it does not satisfy all of its requirements. I also agree that using a bidirectional iterator as least common denominator could be a solution for this.

But the nature of the JSON container makes life more difficult, because we need to find a way to find a common interface for differing underlying containers such as std::map and std::vector. In the end, I always aimed for an interface that makes life easy and that tries to hide the fact that we are working with a union of different types under the hood, but rather with a JSON value.

That is why I like the idea of having an iterator that combines the best of all worlds. Constant runtime for arrays, and also the same interface for objects. I agree that it may feel awkward/terrible/wrong for people familiar with the iterator requirements, but helpful for others that, for instance, call std::sort on a JSON value and do not need to care whether they cope with an array or an object.

I'm looking forward to reading more here, and I hope that we can find a way to achieve a nice interface for a JSON container.

@theodelrieu
Copy link
Contributor

theodelrieu commented Aug 18, 2017

Several things here:

It's good for an interface to be easy to use, but also hard to misuse.

About std::sort, it's the same problem with std::list, if people want a sorted list, they call list::sort, so we could add a basic_json::sort method.
Often times you have to call a container member function, since std:: algorithms just deal with iterators, and knowing their category is sometimes not sufficient:

Even with C++17 searchers, using string::find can be more efficient than std::search.

But one of the issue mentioned by @HenryRLee was that std::advance et al. didn't work in some specific cases.

I think the solution to have both correctness and good performance is to define distance/next/advance etc... for nlohmann::detail::iter_impl, like you would do for std::swap.

This would make std::* algorithms call our custom distance automatically.
For non std code, the following snippet will be necessary to have the best performance:

I don't know the library iterators that well, it seems that calling begin/end on a number or string work, so we'll have to keep that in mind if we implement distance et al.

I forgot that algorithms just SFINAE on the iterator tag, nevermind that.

Anyway I stand by my first point, that we should not defeat the purpose of iterator machinery (and thus the whole Standard Library design)

If a user desires to perform an algorithm that requires RandomAccessIterator, we can add it as a member of basic_json.

I don't remember seeing an issue about a support of such an algorithm yet (though I need to look closely at issues to confirm that), so crippling correctness to solve an issue that nobody has mentioned yet is bound to create some.

That is a pretty big deal IMO

nlohmann added a commit that referenced this issue Aug 20, 2017
@nlohmann
Copy link
Owner

I reverted the changes for now.

@gregmarr
Copy link
Contributor

I agree that misrepresenting the iterator tag can be very bad. If this checking could be done at runtime, so that if it's on an array, the iterator is random access, but on an object it's bidirectional, that would be okay, but I'm pretty sure that the machinery doesn't work that way, and it's all static typing.

@theodelrieu
Copy link
Contributor

You are right, it's compile-time only.

@nlohmann
Copy link
Owner

OK, the longer I think about this, the more I am convinced that we should not change the code to random_access_iterator_tag.

Question is: is there anything to be changed in the code to close this issue once and for all?

@HenryRLee
Copy link
Contributor Author

I think we have to find a solution to bind the iterator type dynamically. It would be difficult but not impossible. Note that if we are strict to the least common denominator, bidirectional is not the one, because user may choose a forward list as an array type, or even an unordered map as an object (although we don't support it yet).

Having said that, this issue can be closed. Perhaps we need to change the template argument of the iterator to bidirectional and update all the documents, since I remember they were all random access tag all the time.

@theodelrieu
Copy link
Contributor

Actually yes, IIRC we inherit from random_iterator_tag, and then alias iterator_category, to bidirectional_tag. We should inherit from bidirectional_tag directly.

There is code that relies on std::is_base_of<std::random_iterator_tag, ...> (not in the library AFAIK, but client code could), that is one correctness issue to fix.

I don't see anything concerning the iterator category.

@stale
Copy link

stale bot commented Oct 25, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Oct 25, 2017
@nlohmann
Copy link
Owner

Would this be sufficient to switch to bidirectional iterators and close this issue?

@ -3618,7 +3618,7 @@ This class implements a both iterators (iterator and const_iterator) for the
@since version 1.0.0, simplified in version 2.0.9
*/
template<typename BasicJsonType>
- class iter_impl : public std::iterator<std::random_access_iterator_tag, BasicJsonType>
+ class iter_impl : public std::iterator<std::bidirectional_iterator_tag, BasicJsonType>
{
    /// allow basic_json to access private members
    friend iter_impl

@stale stale bot removed the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Oct 27, 2017
@theodelrieu
Copy link
Contributor

You can also remove the using iterator_category = ..., since std::iterator already exposes this typedef.

@nlohmann nlohmann removed the state: please discuss please discuss the issue or vote for your favorite option label Oct 28, 2017
@nlohmann
Copy link
Owner

Thanks everyone. I shall close this issue once the CI succeeds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: enhancement/improvement solution: proposed fix a fix for the issue has been proposed and waits for confirmation
Projects
None yet
Development

No branches or pull requests

5 participants