REST API #82
First of all, thanks so much for working on this! Really fantastic stuff. A REST API is something that's always been in the back of my mind, but I wouldn't have had time to get to it. I don't see any problem with using UUIDs. And since we don't have AWS working yet anyway, that's obviously not an issue.

I'll ask @kveerama, but I think this is something we want in the main repo. If you create a pull request, I'll try to review it this weekend and have some more feedback. It would also be very helpful if you could write a couple of tests for this.

Also, FYI: I was a full-time maintainer of this project for a while, but I've taken another job and will no longer be able to devote much time. I'll keep up with the repo, but updates may come much more sporadically than they did the past couple of months. I believe the lab is trying to hire another maintainer, but until that happens, feedback may be slow. Thanks again for contributing!
Hi! I recently looked at my code again and realized that, of all things, I had forgotten to do the PR here. Just did it. I'll look into adding some tests too.
Hey there - I'm picking up this issue. I think @jobliz's design was pretty good, and I'm going to propose a few more endpoints and decouple the API from the database.
Here are some more thoughts. I'd really appreciate feedback and other insights into this. A. I started off using @jobliz's existing REST API. However, since I would very much like to force database access to go through the classes, this would require adding to_json methods to the following classes:
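To make the to_json idea concrete, here is a minimal sketch. ATM's real classes are SQLAlchemy models, so a plain Python class stands in here; the class name and fields are illustrative, not ATM's actual schema.

```python
import json

# Hypothetical stand-in for one of ATM's model classes. A real SQLAlchemy
# model could instead iterate over self.__table__.columns to build the dict.
class Dataset:
    def __init__(self, id, name, train_path):
        self.id = id
        self.name = name
        self.train_path = train_path

    def to_json(self):
        # Serialize only the public, JSON-safe attributes of the object.
        return json.dumps({"id": self.id,
                           "name": self.name,
                           "train_path": self.train_path})

ds = Dataset(1, "pollution", "data/pollution.csv")
print(ds.to_json())
```

Routing all serialization through methods like this keeps the REST layer from reaching into the database directly, which is the decoupling discussed above.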
F. I'm not very familiar with graphQL, but a co-worker likes it, and it seems like it's sticking around and may replace REST. Given that I'm going to need to do work in the
Please let me know what you think. If I do build a REST API, here are the endpoints I think would make sense:
- GET summarize datasets table
- GET information about a particular dataset
- GET info about all the times we ran ATM
- GET info about one particular time we ran ATM
- GET all classifiers that were trained in dataruns
- GET single classifier that was trained in a datarun
- GET dataruns on a given dataset
- GET classifiers for a given datarun
- GET information on all workers
- GET information on a single worker
- GET a hyperpartition
- GET a single hyperpartition
- GET the hyperpartitions used in a datarun
- GET the classifiers using a hyperpartition

Changed from @jobliz's PR:
I'm posting discussion from our slack here so that it's available to public contributors. From @csala to @RogerTangos: I think that REST will completely satisfy our needs, and it's something that most people are already familiar with.
from @csala Apart from that, a couple of comments about the API design: in general, when building a REST API I follow this pattern for each possible "object" (e.g. dataruns, datasets, etc.)
So, for example, I would implement this as follows:
Apart from that, I would also implement DELETE in some cases, where it makes sense:
Another example of the schema explained above: to "train a new classifier on a datarun", instead of posting to
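The collection/item pattern @csala describes can be sketched as a route table: one GET and POST on each collection, plus GET, PUT, and DELETE on individual items. The paths below are illustrative of the pattern, not ATM's actual endpoints.

```python
# Illustrative route table for the REST pattern described above.
# Object names follow the discussion; the exact paths are assumptions.
OBJECTS = ["datasets", "dataruns", "classifiers", "hyperpartitions"]

def routes_for(obj):
    return [
        ("GET",    f"/{obj}"),        # list the whole collection
        ("POST",   f"/{obj}"),        # create a new item
        ("GET",    f"/{obj}/<id>"),   # fetch one item
        ("PUT",    f"/{obj}/<id>"),   # update one item
        ("DELETE", f"/{obj}/<id>"),   # delete one item, where it makes sense
    ]

ROUTES = {obj: routes_for(obj) for obj in OBJECTS}
```

Following one uniform scheme like this means a client can guess the URL for any object type once it has seen one of them.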
from @RogerTangos Good critique. Thanks @csala! I think we should allow for deleting dataruns. TBH, I haven't spent as much time on methods as I'd like to. (I was trying to get that out before leaving work.) @micah - responding to the ATM-Vis stuff first: I've taken a look at their API, and although it does cover more use cases, I don't think it's nearly as well thought out as what @jobliz put together in the PR.
from @RogerTangos I went through their getting started guide. Once under the hood, it seems surprisingly simple. There are methods that would need to be defined on the classes, which need to have some relationships to each other. Those relationships are already explicit due to SQLAlchemy, so my impression is that it's an equal or slightly larger amount of work for a significantly easier-to-use client experience.
from @RogerTangos to @csala Because of your feedback, I'm leaning towards just building a REST API, but I'd like to make that decision based on more than familiarity. (I'm really unhappy with the fickleness of front-end development, and actually dragged my feet on looking at graphQL because of that. After having a look though, I'm realizing it's a specification, not an implementation, and I think it's likely to stick around.)
from @micahjsmith albert [@RogerTangos] I looked into GraphQL based on your recommendation and it does look really great. If I understand correctly, the main use is the endpoint where someone wants to get all classifiers associated with a datarun. Then, we would specify an allowed query using graphql on the server that returns, for a given datarun, any info about the dataset and classifiers that the client requires. The REST alternative would be to first make a GET
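The trade-off @micahjsmith describes can be shown with a toy in-memory sketch: REST needs one GET for the datarun and a second for its classifiers, while a GraphQL-style resolver returns the nested selection in one round trip. All data and field names here are invented for illustration.

```python
# Toy in-memory tables; field names are assumptions, not ATM's schema.
DATARUNS = {1: {"id": 1, "dataset_id": 7}}
CLASSIFIERS = {10: {"id": 10, "datarun_id": 1, "f1": 0.81},
               11: {"id": 11, "datarun_id": 1, "f1": 0.85}}

# REST style: the client makes two requests and joins the results itself.
def rest_get_datarun(run_id):
    return DATARUNS[run_id]

def rest_get_classifiers(run_id):
    return [c for c in CLASSIFIERS.values() if c["datarun_id"] == run_id]

# GraphQL style: the server resolves the nested selection in one round trip,
# returning only the classifier fields the client asked for.
def graphql_datarun_with_classifiers(run_id, fields=("f1",)):
    run = dict(DATARUNS[run_id])
    run["classifiers"] = [{k: c[k] for k in ("id",) + tuple(fields)}
                          for c in rest_get_classifiers(run_id)]
    return run
```

The client-side difference is one round trip instead of two, plus control over which classifier fields come back.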
Hey @micahjsmith, thanks for taking a look. That's correct. It would also allow the client to extend that query to get metrics about the classifier. The alternative is something similar to the endpoints that @csala and I have been discussing above. (I'm editing those today.) @csala, have you had a chance to look at graphQL?
I'll let @csala weigh in, as it seems like the two of you have more expertise than I do on this type of question. But if it seems that both approaches will work, and you are the one putting in the time to implement this feature, I'd think you could use your best judgment.
New endpoints. I'm working on implementing these
note: Further down, this is abbreviated as follows:
A couple questions:
Why not just provide the datarun id? Why is tuner a query parameter? Would we stop dataruns with a tuner matching some string?
What is the difference here?
Thought we discussed that this is not currently possible or relevant |
Thank you for taking a look!
I'm trying to give users a maximal ability to search for dataruns here. I'd also like to include
Do you think it'd be better to limit searching to just the ID?
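The kind of multi-parameter search being discussed can be sketched as a simple server-side filter, e.g. for a request like GET /dataruns?tuner=gp&dataset_id=3. The parameter names follow the discussion above but are assumptions, not ATM's actual schema.

```python
# Sketch of query-parameter filtering for a hypothetical datarun
# search endpoint. Each query parameter must match exactly.
def search_dataruns(dataruns, **params):
    """Return the dataruns whose fields match every given parameter."""
    return [run for run in dataruns
            if all(run.get(key) == value for key, value in params.items())]

runs = [{"id": 1, "tuner": "gp", "dataset_id": 3},
        {"id": 2, "tuner": "uniform", "dataset_id": 3}]
print(search_dataruns(runs, tuner="gp"))   # only datarun 1 matches
```

Because searching by ID is just the special case search_dataruns(runs, id=1), supporting the extra parameters costs little on the server side.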
Good catch. We had. I just forgot to remove those lines when copy-pasting. Edited above.
Thanks for your explanations.
Based on your explanations, I don't think it makes sense to limit searching. Though I wonder if it makes sense to limit deletion. I wonder if
Note: There are requests from Kalyan that
Both seem doable. I'll make a separate issue for no. 2 if it's not already possible.
Hey guys, I'm the architect and main contributor of . My idea is to start helping you out because I believe AutoML is the future. This is one paper that, more or less, shows the idea that supports marvin. Summarising the benefits of this integration:
What do you guys think about this?
The first read-only version of the REST API is already in master, so I'm closing this issue now so future enhancements can be added as new issues. |
A REST API would be useful to access ATM from any other language and device through the web, and also to illustrate how the code is structured and might be extended.
From the project's readme I get that the internal APIs are going to change and that it consequently might be a bit early to develop a REST API, but I wanted to see what was possible to do with the current code. The API currently serves:
No modifications were made outside of the rest_api_server.py file except to the requirements.txt file, adding flask and simplejson to the project dependencies.
TODOs / caveats:
Example api.py usage:
After following the readme's installation instructions and running python scripts/rest_api_server.py in a separate shell under the virtualenv:
curl localhost:5000/enter_data -F file=@/path/file.csv
It should return:
{"success": true}
To see the created dataset:
curl localhost:5000/datasets/1 | jq
To see the created datarun:
curl localhost:5000/dataruns/1 | jq
To run the worker.py script once:
curl localhost:5000/simple_worker | jq
after a while
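The curl calls above can also be issued from any language; as a sketch, here is a small Python client that only builds the requests (it never sends them), so the URLs are easy to verify. The localhost:5000 base matches the examples above; the function names are invented for illustration.

```python
from urllib.request import Request

# Base URL used by the curl examples above; adjust if the server
# runs elsewhere.
BASE = "http://localhost:5000"

def get_dataset(dataset_id):
    # Equivalent of: curl localhost:5000/datasets/<id>
    return Request(f"{BASE}/datasets/{dataset_id}", method="GET")

def get_datarun(datarun_id):
    # Equivalent of: curl localhost:5000/dataruns/<id>
    return Request(f"{BASE}/dataruns/{datarun_id}", method="GET")

req = get_dataset(1)
print(req.get_method(), req.full_url)   # GET http://localhost:5000/datasets/1
```

Passing a built Request to urllib.request.urlopen would actually perform the call, mirroring the curl examples.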