Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using input_adapter in a slightly unexpected way #834

Closed
chetmurthy opened this issue Nov 20, 2017 · 31 comments
Closed

Using input_adapter in a slightly unexpected way #834

chetmurthy opened this issue Nov 20, 2017 · 31 comments
Assignees
Labels
kind: enhancement/improvement state: please discuss please discuss the issue or vote for your favorite option state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated

Comments

@chetmurthy
Copy link

chetmurthy commented Nov 20, 2017

Feature Request

I'm trying to use input_adapter to wrap an Apache Thrift Transport (the details aren't really relevant). To that end, I need to provide the json::parse() function my own subclass of input_adapter_protocol, instead of an input_adapter. I'm happy to send a PR, but figured I should at least first check that this isn't something foolish, or something that can be achieved in another way? It was extremely straightforward to make my own subclass of input_adapter_protocol, btw.

modified   src/lib/cpp/json.hpp
@@ -1562,6 +1562,8 @@ class input_adapter
   public:
     // native support
 
+  explicit input_adapter(input_adapter_t arg_ia) : ia(arg_ia) { }
+
     /// input adapter for input stream
     input_adapter(std::istream& i)
         : ia(std::make_shared<cached_input_stream_adapter<16384>>(i)) {}
@nlohmann
Copy link
Owner

I am sorry, I need more context on this. I mean: it's good that you could achieve your goal without much effort, but I have not understood yet how a PR could look like.

@nlohmann nlohmann added the state: needs more info the author of the issue needs to provide more details label Nov 21, 2017
@chetmurthy
Copy link
Author

Ah, I wasn't very clear. So: in this particular case, I have a type that is like a std::istream, but with very few of the methods. It is easy to construct an input_adapter_protocol from it, but would be a lot of work to construct {istream, iterator, etc}. So I'd like to pass that input_adapter to json::parse, e.g.

json j = nlohmann::json::parse(input_adapter(std::shared_ptr<input_adapter_protocol>(new transport_input_adapter(s)))) ;

To do that, I needed to add another constructor to class input_adapter, where I just pass in the input_adapter_t. This is a change to your code, and I figured I should at least report it back to you, for two reasons:

(a) maybe I did something stupid grin

(b) I don't know whether you'd want to explain in your documentation that, in addition to all the other nice ways to read a JSON object, one can provide this really simple object (an input_adapter_protocol) with really only a get_character() method) and you'll parse from that.

In any case, it's working nicely, but I figured I should report back, is all.

I can send you a PR once I get everything all working and am certain that nothing else needs to be done.

@nlohmann
Copy link
Owner

If extending the library with a new input type is so simple, then of course I would be interested in an example or an PR to understand what needs to be changed.

@stale
Copy link

stale bot commented Jan 15, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Jan 15, 2018
@chetmurthy
Copy link
Author

Neils, sorry I got sidetracked and didn't respond sooner. The code-snippet in my first comment -is- the change. That's it.

(1) the problem in the original code,was that there's no way for me to supply an input_adapter_t to the constructor to class input_adapter. An input_adapter can be made from many kinds of things, but not from an input_adapter_t. In my case, an input_adapter_t was exceedingly simple to construct, whereas all the other things were much harder.

(2) So I just added a new constructor entry.

That was it!

If you'd prefer, I can send you a PR. But really, the diff in my first comment is -everything-.

@stale stale bot removed the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Jan 15, 2018
@nlohmann
Copy link
Owner

@chetmurthy Thanks for checking back! I wouldn't need a PR for this, but could you provide an example for the use case?

@chetmurthy
Copy link
Author

Sure. In thrift, there's this type called "TTransport", and it's sort of like an istream, but .... different. It would be very messy to make it look like an istream. But -easy- to make it look like an input_adapter_protocol, e.g.

https://github.com/chetmurthy/thrift-nicejson/blob/3ce7a84af55591be4388f93c095032d739ee3917/src/lib/cpp/TNiceJSONProtocol.cpp#L316

https://github.com/chetmurthy/thrift-nicejson/blob/3ce7a84af55591be4388f93c095032d739ee3917/src/lib/cpp/TNiceJSONProtocol.h#L34

an input_adapter_protocol has two methods, both of which are concerned with reading. It's really simple. An iterator's interface is also pretty simple, but .... it requires knowing a lot more C++ to build something that looks like an iterator. [it's also possible that I've forgotten, and an iterator requires implementing things that a TTransport doesn't support -- but I forget, so I can't be firm on this.]

In any case, the type input_adapter_protocol is -so- trivial to implement.

@nlohmann
Copy link
Owner

I see.

So it's just adding

explicit input_adapter(input_adapter_t arg_ia) : ia(arg_ia) { }

then? Sounds doable.

@chetmurthy
Copy link
Author

grin Yepper, it's that simple.

@nlohmann nlohmann added kind: enhancement/improvement and removed state: needs more info the author of the issue needs to provide more details labels Jan 16, 2018
@nlohmann nlohmann self-assigned this Jan 16, 2018
@nlohmann nlohmann added this to the Release 3.1.0 milestone Jan 16, 2018
@theodelrieu
Copy link
Contributor

Should the constructor be explicit? I remember it broke everything when I tried with other constructors

@chetmurthy
Copy link
Author

I'm not a C++ genius. But my understanding is:

(1) a constructor with a single-argument is by-default treated as suitable for casting. So in this case, if a programmer makes a mistake and writes "a = b" where "a" is of type "input_adapter" and "b" is of type "input_adapter_t", then the code will compile, b/c the constructor will be silently introduced as a cast.

(2) but in this case, that's not what's intended.

That's why I marked the method as explicit.

Hope that's clear. And again, I'm no C++ wizard. I might be wrong.

@theodelrieu
Copy link
Contributor

Agreed, you don't want that behaviour usually. But this library works thanks to implicit conversions.

I have several issues with your proposed addition.

First, input_adapter resides in the detail namespace (i.e. not exposed to users), and we rely on implicit conversions to make it work. I believe input_adapter_protocol should be exposed though.

So having an inconsistency on the way to call parse is a bit worrying to me.
Users who write their own adapters will not be able to use parse easily in generic code (because of that inconsistency):

auto json_str = "{}"s;
auto my_adapter = /* ... */;

auto j1 = json::parse(json_str);
auto j2 = json::parse(detail::input_adapter(my_adapter)));

Furthermore, I think the shared_ptr should not be part of the API, and rather have two overloads like the istream overloads:

input_adapter(input_adapter_protocol&);
input_adapter(input_adapter_protocol&&);

There may be a better solution that using shared_ptr inside input_adapter, but I don't know that part of the code well enough right now.

@nlohmann nlohmann added the state: please discuss please discuss the issue or vote for your favorite option label Jan 23, 2018
@nlohmann
Copy link
Owner

A PR would be helpful here.

@nlohmann nlohmann removed this from the Release 3.1.0 milestone Jan 27, 2018
@nlohmann
Copy link
Owner

I now had a colleague needing the exact described use case, so I had another look :)

I added the constructor described in #834 (comment). The following is a full example.

#include "json.hpp"
#include <iostream>

class string_protocol : public nlohmann::detail::input_adapter_protocol
{
public:
    explicit string_protocol(const std::string &s)
        : the_string(s), the_index(0)
    {}
    
    /// get a character [0,255] or std::char_traits<char>::eof().
    std::char_traits<char>::int_type get_character() override
    {
        if (the_index <= the_string.size() - 1)
        {
            return the_string[the_index++];
        }
        else
        {
            return std::char_traits<char>::eof();
        }
    }

    /// restore the last non-eof() character to input
    virtual void unget_character() override
    {
        --the_index;
    }
    
private:
    const std::string &the_string;
    size_t the_index;
};

int main()
{
    std::string my_input("[1,2,3]");
    
    nlohmann::detail::input_adapter_t simple_string_input_adapter = std::make_shared<string_protocol>(my_input);
    auto j = nlohmann::json::parse(nlohmann::detail::input_adapter(simple_string_input_adapter));
    
    std::cout << j << std::endl;
}

This seems to work, but the API seems really clumsy. But I do think that such an extension is necessary.

I am open for any proposals - also on how to get rid of the shared pointers...

@nlohmann
Copy link
Owner

Even nicer, we could add a constructor

template<typename IAP,
         typename std::enable_if<
             std::is_base_of<input_adapter_protocol, IAP>::value,
             int>::type = 0>
input_adapter(IAP& i)
    : ia(std::make_shared<IAP>(i))
{}

and allow the above main function to be

int main()
{
    std::string my_input("[1,2,3]");
    auto sp = string_protocol(my_input);
    auto j = nlohmann::json::parse(sp);
    std::cout << j << std::endl;
}

Any thoughts on this? @theodelrieu @chetmurthy

@theodelrieu
Copy link
Contributor

theodelrieu commented Mar 20, 2018

I thought a bit more about all this, this is still a draft of course but here are my early thoughts:

// as before, won't change
auto j = json::parse("[1, 2, 3]");

auto j2 = json::parse<my_input_adapter>("[1, 2, 3]");

So instead of having virtual inheritance based on the input_adapter_protocol, I propose to add a InputAdapter concept.

This concept would roughly have the same methods than input_adapter_protocol (although unget_character seems clunky for some streams, e.g. sockets).

In addition to that, parse would have an additional overload:

// requires is_constructible<InputAdapter, Args...> && is_input_adapter<InputAdapter>
template <typename InputAdapter, typename... Args>
auto parse(Args&&... args)
{
  InputAdapter adapter(std::forward<Args>(args)...);
  // ...
}

With this overload you can pass any argument you want, depending on how your input source is defined.

Internally we could then remove the input_adapter_protocol, and using the adapters defined in detail/adapters.hpp, we could use the same InputAdapter concept.

Later on, we could even allow users to override default implementations, but I'm not proposing that right now.

EDIT:

Or we could simply have a parse(InputAdapter) function, without the variadic argument pack, it's better for already constructed values.

This proposal would bring back every parse overload by the way.

@chetmurthy
Copy link
Author

Neils,

I'm no C++ guru. Your proposal appears to say "add a templatized constructor, that can take a ref of any type that is a subclass of input_adapter_protocol; make a shared-pointer from it". I assume implicitly it's copied?

A priori, this seems fine, but again, I'm no C++ guru. The one thing I sort of like about having the invoker create the shared_pointer is that nobody can get confused about memory-lifetimes. And I can easily imagine that somebody could get confused.

But really, that's the only quibble I can make. Otherwise, this looks fine. Also, I -do- feel that this is a corner case, and I would feel badly if you ended up changing a lot of your code, to support this. It really, really is a corner-case.

@theodelrieu
Copy link
Contributor

Yes IIRC there is a copy involved right now. With the template version users could do:

my_adapter a;
auto j = json::parse(a);
auto j2 = json::parse(std::move(a));

I'm not sure this is a corner-case, rather a less often needed use-case :)
Having a customization point here looks very useful to me.

@nlohmann
Copy link
Owner

Another idea would be to to replace the input adapter protocol by a bidirectional iterator, maybe even an input iterator. This would avoid the clumsy "protocol" and replace it with a standard C++ concept.

@chetmurthy
Copy link
Author

Ah. I think this would be .... not so great. Writing iterators is so complicated, whereas this input_adapter class, it's .... trivial to implement. That's a valuable property.

@theodelrieu
Copy link
Contributor

I agree that iterators are tricky to write correctly, almost as tricky as defining a concept :)

If we go the concept route, it would be nice to inspect the existing code and look at what are the real minimal requirements for InputAdapter.

For instance, the unget_character function looks superfluous, it should be the lexer's job to store that information, since ungetting a char on a SocketAdapter requires writing a cache and adding lots of logic to ensure everything works correctly.

Also, since there's no guarantee that the library will not chain-call that function, I don't see how it could be 100% correctly implemented.

@nlohmann
Copy link
Owner

Yes, unget could be programmed away.

@chetmurthy
Copy link
Author

Oh, please don't do that. In the particular use I had, in marshalling, it was critical that when the JSON parser didn't consume a char, it got pushed back to the provider. In short, unget is actually important.

@nlohmann
Copy link
Owner

@chetmurthy I do not want to remove unget for the sake of it - I just think that (if I remember correctly) in most cases I only use that function, because I am too lazy keep track of whether I need to consume an already read character or if I need to read a new one.

@nlohmann
Copy link
Owner

I had another look at the code: In the binary formats (CBOR, MessagePack, and UBJSON), unget() is not used at all. unget() is only called in one situation in the JSON parser, namely after reading the first character after a number's last digit.

We need to read that last character to know that the number ended, but we do not process that character right away. The current implementation then gives the character back to the buffer to read it again in a later call.

We could add a Boolean variable to the lexer to "remember" not the call the input adapter's get function the next time and continue working with the already read character after the number. While this would work in most cases, it could be ugly in other scenarios:

  • In the SAX parser, the parsed number is announced with a call like bool number_unsigned(number_unsigned_t val). If the receiver of the event decides to return false, parsing ends at that point - and the character after the number would be gone.
  • The same happens if we do not required a JSON value to end with EOF, we could continue reading the input stream and parse another JSON value.

However, in these cases I would expect the JSON texts to be at least separated by whitespace, so an input like 17 false would yield an JSON number and a JSON Boolean. In that scenario, however, it wouldn't be an issue if the space after the 17 was dropped - the lexer would start reading the f when the second value is read.

That said, I think unget() would only be really necessary if we would allow for use cases where the byte after a number if not consumed by the same JSON parser/lexer run.

What did I miss?

@chetmurthy
Copy link
Author

Thoughts:

(1) I used json.hpp specifically in a case where a stream of JSON objects is read, and each by different instances of the parser.

(2) but they were all objects ("{}"), not numbers.

So as long as the json.hpp parser/lexer does not read past the end of a JSON object, I think this is fine.

@nlohmann
Copy link
Owner

Even if these were other values, I would assume you would not accept them without any separation like

{"key":true}17false"I am a string"[]

@chetmurthy
Copy link
Author

agreed. but for objects, I would. That is:

{"key":true}{"a":"b"}{"c":"d"}

@nlohmann
Copy link
Owner

Yes, that should work.

@stale
Copy link

stale bot commented Apr 20, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Apr 20, 2018
@stale stale bot closed this as completed Apr 27, 2018
dgrunwald pushed a commit to dgrunwald/json that referenced this issue Jan 28, 2019
@dgrunwald
Copy link

I ended up having to patch this library in order to support a custom input adapter.

In my case I'm dealing with a read-only, non-contiguous container as input. Implementing a custom input adapter is easy; a custom std::streambuf is harder (and requires extra copies to a temporary mutable buffer).
I would also be fine with an iterator pair, but the input adapter for iterators requires them be random-access and contiguous :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: enhancement/improvement state: please discuss please discuss the issue or vote for your favorite option state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated
Projects
None yet
Development

No branches or pull requests

4 participants