
Moving to a web service #54

Open · jakirkham opened this issue Mar 5, 2018 · 28 comments

@jakirkham
Contributor

It’s certainly reasonable to start out with a cron job for these sorts of things. Also, as we resolve some technical debt, the cron job is very helpful. That said, we have generally found in conda-forge that cron jobs inevitably struggle to scale.

To solve this problem, we have ultimately moved all of them to web services that use webhooks. This allows them to deal with notifications as they come in and respond by doing some task. This approach seems well suited for updates. However, it will require some thought into how we can get notifications from package indexes, GitHub, etc. We expect this will iron out any issues related to load.
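To make this concrete, here is a minimal sketch of what such a webhook receiver could look like, assuming Flask and a hypothetical dispatch_task helper (this is not the actual conda-forge service):

```python
# Minimal webhook receiver sketch: handle notifications as they arrive
# instead of waiting for a cron run. Endpoint name and payload handling
# are illustrative only.
from flask import Flask, request

app = Flask(__name__)

def dispatch_task(event, payload):
    # Hypothetical stand-in: enqueue or run whatever job fits this event.
    repo = payload.get("repository", {}).get("full_name", "?")
    print(f"received {event} for {repo}")

@app.route("/hooks/event", methods=["POST"])
def handle_event():
    payload = request.get_json(force=True)
    event = request.headers.get("X-GitHub-Event", "unknown")
    dispatch_task(event, payload)
    return "", 204
```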

@CJ-Wright
Member

CJ-Wright commented Mar 5, 2018

I think that part of this is wrapped up in #53, since it is difficult to know exactly what kind of stress to expect on the system without some ballpark numbers. The whole bootstrap of the system also exaggerates some things (since we max out the CI time on every run of 03).

Currently we are clearing about 20 feedstocks per run with 03 and all the feedstocks with the others (although we'll fall behind when we hit 5000 feedstocks on 01).

I'm not opposed to moving this to a webservice, but the notification wrangling could be hard.

We also could do this in steps/build a hybrid system (a rough routing sketch follows this list):
00 can be removed with a hook on staged (or via the feed, #38)
01 can be removed with a hook on all the feedstocks when PRs are merged (or via the feed, #38)
02 might be the most difficult to remove, since listening for new releases from PyPI, CRAN, and GitHub may be difficult.
03 I'm less certain about, since I don't know what triggers it (I guess whatever triggers 02). If we can find a trigger for 02, then 02 and 03 could be merged.
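For illustration, the 00/01 hooks could be routed by repository roughly like this (a sketch only; the handler names are hypothetical stand-ins for the work the cron jobs do today):

```python
# Rough sketch of routing GitHub webhook events for steps 00 and 01.
def add_node_for_new_feedstock(payload):
    # 00: a new recipe landed on staged-recipes
    print("add node for new feedstock, commit", payload.get("after"))

def mark_node_dirty(repo):
    # 01: a feedstock PR was merged, so refresh just that node
    print("refresh node for", repo)

def route_github_event(event, payload):
    repo = payload.get("repository", {}).get("full_name", "")
    if repo == "conda-forge/staged-recipes" and event == "push":
        add_node_for_new_feedstock(payload)
    elif repo.endswith("-feedstock") and event == "pull_request":
        pr = payload.get("pull_request", {})
        if payload.get("action") == "closed" and pr.get("merged"):
            mark_node_dirty(repo)
```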

@jakirkham
Contributor Author

Honestly, our experience doing this at conda-forge has taught us that the system ends up being less stressed when converted from batch to a web service. If you think about it a bit, this actually makes sense. The reason is that updates in a web service don't all come at once in a big batch (this could be re-renderings, updating Circle SSH keys, or package updates). Instead they are sprinkled throughout the day at various times. The result is that things stay pretty light and the system handles events right away, which makes the whole thing more maintainable.

Agreed that handling the detection of updates has been and remains challenging. PyPI lacks the right kind of notification. ( pypi/warehouse#1683 ) Same story with R. Both provide index-wide feeds (Python, R), which we could parse. Not sure what we do with everything else. Maybe piggyback on Arch Linux? For the cases where we have feeds, we could have a process that filters these for us and triggers the update PRs. Presumably this would live on Heroku, though it could live elsewhere.

Just to outline this a bit, it sounds like we would want the webservice to handle these events. Am I missing any?

  1. Feedstock added
  2. Feedstock updated
  3. Feedstock removed
  4. Package updated -> Update feedstock

Given how package indexes seem to handle these problems, our web service would need to be designed around processing these feeds. Namely it would check feed notifications against a listing of packages. Periodically a new package could be added, in which case we would need to check its version independently and then add it to the list. Removal would be relatively straightforward. In some ways, it might not be worth processing feedstock updates (possibly removals), as this could easily be checked when the feedstock's package comes up again.
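Roughly, the feed-filtering piece could look something like this sketch, assuming the feedparser library and PyPI's updates RSS feed; the tracked-package dict and trigger_update_pr helper are placeholders:

```python
# Sketch: filter an index-wide feed against the packages we track.
import feedparser  # third-party: pip install feedparser

# name -> version the bot last saw; in practice this would come from the graph
TRACKED = {"numpy": "1.14.0", "scipy": "1.0.0"}

def trigger_update_pr(name, version):
    print(f"would open a version-bump PR for {name} {version}")  # stand-in

def poll_pypi_feed(url="https://pypi.org/rss/updates.xml"):
    for entry in feedparser.parse(url).entries:
        # feed entries are titled roughly "<name> <version>"
        name, _, version = entry.title.partition(" ")
        if name in TRACKED and version != TRACKED[name]:
            TRACKED[name] = version
            trigger_update_pr(name, version)
```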

Thoughts?

@jakirkham
Contributor Author

@isuruf, might be interested in this. 😉

@isuruf
Member

isuruf commented Mar 8, 2018

@isuruf, might be interested in this.

I have an idea about using Libraries.io, GitHub, and IFTTT to make this a web service. Will look into it once I have some time.

@CJ-Wright
Member

I think this is now available for action. The graph is stored in a JSON format and so can be written to by pretty much anything. We could provide a webservice with the bot's credentials (or provision a new bot) so it could update the versions in the graph. Each package (that is not a stub or archived) should have a new_version key that represents what the bot thinks the newest upstream version is.
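For example, an external writer could do something like the following sketch; it assumes each package node lives in its own JSON file in the graph repo, and the paths and key names other than new_version are illustrative:

```python
# Sketch: set the new_version key on a package's node in the JSON graph.
import json
from pathlib import Path

def set_new_version(graph_dir, package, version):
    node_path = Path(graph_dir) / "node_attrs" / f"{package}.json"
    attrs = json.loads(node_path.read_text())
    # skip stubs and archived feedstocks
    if attrs.get("archived") or attrs.get("stub"):
        return
    attrs["new_version"] = version  # what the bot thinks the newest upstream version is
    node_path.write_text(json.dumps(attrs, indent=2, sort_keys=True))
```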

@CJ-Wright
Member

If external things write to the graph, we could then kick off GitHub Actions that then cause PRs to be issued.
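One way to wire that up (a sketch only; the target repo, event_type, and token variable are assumptions) would be a repository_dispatch call after the graph write:

```python
# Sketch: trigger a repository_dispatch-driven GitHub Actions workflow
# after an external writer updates the graph.
import os
import requests

def trigger_pr_workflow(package):
    resp = requests.post(
        "https://api.github.com/repos/regro/cf-scripts/dispatches",  # assumed target repo
        headers={
            "Authorization": f"token {os.environ['BOT_TOKEN']}",  # assumed env var
            "Accept": "application/vnd.github+json",
        },
        json={"event_type": "version-update", "client_payload": {"package": package}},
        timeout=30,
    )
    resp.raise_for_status()
```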

@viniciusdc
Contributor

@CJ-Wright @beckermr Is there a way to clarify this somehow? I read some of the issues regarding the web services, but most of them jump straight into a migration or a closed PR.

@CJ-Wright
Member

Sorry what do you want clarified?

@viniciusdc
Contributor

Sorry what do you want clarified?

What are the web services? I just want to understand and to help.

@CJ-Wright
Member

Sorry, it's no problem; I didn't know what you were asking. Conda-forge has a bunch of web services: these are tasks/jobs/things that are triggered by some action on the web. For instance, if there were something that published that a new version was available, we wouldn't need to scrape the web for it. Similarly, rather than updating all the feedstocks in the graph every run, we could just update the ones that have changed.

@viniciusdc
Contributor

viniciusdc commented Jul 1, 2020

Hmm... I kind of get it. But just to be sure, when you say web services, are you referring to services like Azure, CircleCI, and others? Or is it something else, like a server request? (Also, thanks for the reply.)

@CJ-Wright
Member

The code for the existing webservices (which run things like the team and token updating) is located here if you want to take a look.

My understanding (@beckermr might be able to provide more insight here, since he has contributed considerably to our web services) is that we set up a server (usually a Heroku instance) that listens for updates from web pages and then acts accordingly.

@viniciusdc
Contributor

The code for the existing webservices (which run things like the team and token updating) is located here if you want to take a look.

My understanding (@beckermr might be able to provide more insight here, since he has contributed considerably to our web services) is that we set up a server (usually a Heroku instance) that listens for updates from web pages and then acts accordingly.

Hmm, now I understand what was being said. My understanding of the web services was rather different. Thanks.

@viniciusdc
Contributor

"rather than updating all the feedstocks in the graph every run we could just update the ones that have changed."It's an exceptional idea isn't it ? Is there a way for me to help ? I saw the items list above, but its still vague.

@beckermr
Contributor

beckermr commented Jul 1, 2020

So the essential idea of this issue is to refactor the bot into a distributed system that responds to events.

Imagine we are running a migration and package A depends on package B. When the PR for package B is merged/closed, we could detect this event by listening to a webhook. When we see that, we could look at the graph and queue up the PR for package A. We could then have a cron-ish job read from the queue and try to issue the migration.
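A sketch of that flow, assuming a networkx graph of feedstocks and a plain list standing in for a real queue (all names here are illustrative):

```python
# Sketch: react to a merged migration PR for package B by queuing its dependents.
import networkx as nx

graph = nx.DiGraph()                     # an edge B -> A means A depends on B
graph.add_edge("package-b", "package-a")
pr_queue = []                            # stand-in for a real queue (SQS, Redis, ...)

def issue_migration_pr(package):
    print("would open the migration PR for", package)  # stand-in

def on_pr_closed(payload):
    pr = payload.get("pull_request", {})
    if not pr.get("merged"):
        return
    repo = payload["repository"]["full_name"]             # e.g. conda-forge/package-b-feedstock
    package = repo.split("/")[-1].removesuffix("-feedstock")
    for dependent in graph.successors(package):           # package A and friends
        pr_queue.append(dependent)

def worker():
    # the cron-ish job: drain the queue and try each migration
    while pr_queue:
        issue_migration_pr(pr_queue.pop(0))
```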

This would be a big refactor of how the bot works and is pretty out of scope right now.

@viniciusdc
Contributor

So the essential idea of this issue is to refactor the bot into a distributed system that responds to events.

Imagine we are running a migration and package A depends on package B. When the PR for package B is merged/closed, we could detect this event by listening to a webhook. When we see that, we could look at the graph and queue up the PR for package A. We could then have a cron-ish job read from the queue and try to issue the migration.

This would be a big refactor of how the bot works and is pretty out of scope right now.

Oh, OK... thanks CJ and Matt for the comments; now I have an idea of it.

@CJ-Wright
Member

To be fair, we could have some things done by a web service; for instance, marking a PR as merged/closed might be possible now. I think the main issue there is that the GH repo for the graph is rather large and might not fit inside the server. (This was part of the initial reasoning to move to something like Dynamo; we should really put all the things that need a distributed-database-like thing into a milestone.)
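One possible workaround for the size issue, sketched under the assumption that node attributes live as individual JSON files in the graph repo: fetch just the node you need over HTTP instead of cloning the whole repo.

```python
# Sketch: pull a single node's attributes without a full clone of the graph repo.
import json
import requests

def fetch_node(package):
    url = (
        "https://raw.githubusercontent.com/regro/cf-graph-countyfair/"  # assumed repo/layout
        f"master/node_attrs/{package}.json"
    )
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return json.loads(resp.text)
```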

@viniciusdc
Contributor

To be fair, we could have some things done by a web service; for instance, marking a PR as merged/closed might be possible now. I think the main issue there is that the GH repo for the graph is rather large and might not fit inside the server. (This was part of the initial reasoning to move to something like Dynamo; we should really put all the things that need a distributed-database-like thing into a milestone.)

What was the reason the DynamoDB idea was dropped?

@CJ-Wright
Member

We weren't able to implement it in a way that was cost-effective, and other issues were more pressing.

@viniciusdc
Contributor

We weren't able to implement it in a way that was cost-effective, and other issues were more pressing.

Hmm, and isn't there any other platform we could try? I think it could be a great improvement to reduce the burden on the CI clients.

@beckermr
Contributor

beckermr commented Jul 1, 2020

If you can find another provider then go for it.

@beckermr
Contributor

beckermr commented Jul 1, 2020

Don’t spend money without asking

@viniciusdc
Contributor

Don’t spend money without asking

OK, I will definitely not do that, but it's good advice, thanks.

@viniciusdc
Contributor

viniciusdc commented Jul 2, 2020

@beckermr What about MongoDB?

@CJ-Wright
Member

Mongo could work, although you need to host it somewhere.

@viniciusdc
Contributor

Mongo could work, although you need to host it somewhere.

Yup, actually I was wondering about the cloud version of it, but I was not sure about the amount of data we will need (as the cloud tier is limited to 5 GB).

@CJ-Wright
Member

I think the first move there is figuring out how little of the PR JSON we can get away with.
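For a sense of what that trimming could look like, here is a sketch; which fields the bot truly needs is exactly the open question, so this KEEP list is a guess:

```python
# Sketch: trim a PR's JSON down to a handful of fields before storing it.
KEEP = ("id", "number", "state", "merged_at", "html_url", "head", "base")

def prune_pr_json(pr):
    slim = {k: pr[k] for k in KEEP if k in pr}
    # head/base are large nested objects; keep only the branch refs
    for side in ("head", "base"):
        if side in slim and isinstance(slim[side], dict):
            slim[side] = {"ref": slim[side].get("ref")}
    return slim
```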

@viniciusdc
Contributor

viniciusdc commented Jul 3, 2020

I think the first move there is figuring out how little of the PR JSON we can get away with.

Or maybe there are some classes of PRs we could get rid of. I was thinking of doing the 'track opened and closed PRs' work first to reduce the number of PRs hosted, and then migrating the result to a table on some NoSQL server. (We could also expose this list as a web service, which would let us avoid bumping into any API limits.) I could be missing something, too.
