-
-
Notifications
You must be signed in to change notification settings - Fork 690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
datasette publish lambda plugin #236
Comments
This change introduces a new plugin hook, publish_subcommand, which can be used to implement new subcommands for the "datasette publish" command family. I've used this new hook to refactor out the "publish now" and "publish heroku" implementations into separate modules. I've also added unit tests for these two publishers, mocking the subprocess.call and subprocess.check_output functions. As part of this, I introduced a mechanism for loading default plugins. These are defined in the new "default_plugins" list inside datasette/app.py Closes #217 (Plugin support for datasette publish) Closes #348 (Unit tests for "datasette publish") Refs #14, #59, #102, #103, #146, #236, #347
… heroku/now (#349) This change introduces a new plugin hook, publish_subcommand, which can be used to implement new subcommands for the "datasette publish" command family. I've used this new hook to refactor out the "publish now" and "publish heroku" implementations into separate modules. I've also added unit tests for these two publishers, mocking the subprocess.call and subprocess.check_output functions. As part of this, I introduced a mechanism for loading default plugins. These are defined in the new "default_plugins" list inside datasette/app.py Closes #217 (Plugin support for datasette publish) Closes #348 (Unit tests for "datasette publish") Refs #14, #59, #102, #103, #146, #236, #347
Hi Simon, I'm thinking of attempting this. Can you clarify some questions I have?
Do you have a preference between ALB or API GW? ALB has better economics at scale, but has a minimum monthly cost. API GW has worse per-request economics, but scales to zero when no requests are happening.
Thanks for your help! |
I made a repo at https://github.com/code402/datasette-lambda to demonstrate the idea, and scratch my personal itch for this. The demo relies on some central authority having already published a public, reusable Lambda layer with Datasette & its dependencies. I think that differs from the other publish plugins which seem to mainly publish Dockerfiles that the host will interpret to install deps from a requirements.txt file. I chose that approach because
If it's the case that |
Hi Colin, Sorry I didn't see this sooner! I've just started digging into this myself, to try and play with the new EFS Lambda support: #850. Yes, uvloop is only needed because of uvicorn. I have a branch here that removes that dependency just for trying out Lambda: https://github.com/simonw/datasette/tree/no-uvicorn - so you can run I'm going to try out your |
As for your other questions:
Yes, exactly. I know this will limit the size of database that can be deployed (since Lambda has a 50MB total package limit as far as I can tell) but there are plenty of interesting databases that are small enough to fit there. The new EFS support for Lambda means that theoretically the size of database is now unlimited, which is really interesting. That's what got me inspired to take a look at a proof of concept in #850.
I personally like scale-to-zero because many of my projects are likely to receive very little traffic. So API GW first, and maybe ALB as an option later on for people operating at scale?
As you've found, the only native component is uvloop which is only needed if uvicorn is being used to serve requests.
For the eventual "datasette publish lambda" command I want whatever results in the smallest amount of inconvenience for users. I've been trying out Amazon SAM in #850 and it requires users to run Docker on their machines, which is a pretty huge barrier to entry! I don't have much experience with CloudFormation but it's probably a better bet, especially if you can "pip install" the dependencies needed to deploy with it. |
Now that Lambda supports Docker, this probably is a bit easier and may be able to build on top of the existing package command. There are weirdnesses in how the command actually gets invoked; the aws-lambda-python image shows a bit of that. So Datasette would probably need some sort of Lambda-specific entry point to make this work. |
Oh, and the container image can be up to 10GB, so the EFS step might not be needed except for pretty big stuff. |
Yeah the Lambda Docker stuff is pretty odd - you still don't get to speak HTTP, you have to speak their custom event protocol instead. https://github.com/glassechidna/serverlessish looks interesting here - it adds a proxy inside the container which allows your existing HTTP Docker image to run within Docker-on-Lambda. I've not tried it out yet though. |
👋 I just put together a small example using the lambda container image support: https://github.com/sethvincent/datasette-aws-lambda-example It uses mangum and AWS's python runtime interface client to handle the lambda event stuff. I'd be happy to help with a publish plugin for AWS lambda as I plan to use this for upcoming projects. The example uses the serverless cli for deployment but there might be a more suitable deployment approach for the plugin. It would be cool if users didn't have to install anything additional other than the aws cli and its associated config/credentials setup. |
That's so useful @sethvincent! Really interesting reading your code there, especially clever how you're using the I'd be very interested to see what your demo looks like without using serverless - completely agree that the less additional dependencies there are for this the better. I'm also very interested in figuring out a way to run Datasette in Lambda but with the SQLite database on an EFS volume. Do you have a feel for how hard that would be? |
Hi @simonw, I've received some inquiries over the last year or so about Datasette and how it might be supported by Mangum. I maintain Mangum which is, as far as I know, the only project that provides support for ASGI applications in AWS Lambda. If there is anything that I can help with here, please let me know because I think what Datasette provides to the community (even beyond OSS) is noble and worthy of special consideration. |
I keep coming back to this in search for the related exploration, so I'll just link it now: @simonw has meanwhile researched how to deploy Datasette to AWS Lambda using function URLs and Mangum via simonw/public-notes#6 and concluded that's everything I need to know in order to build a datasette-publish-lambda plugin. |
Refs #217 - create a publish plugin that can deploy to AWS Lambda.
https://docs.aws.amazon.com/lambda/latest/dg/limits.html says lambda packages can be up to 50 MB, so this would only work with smaller databases (the command can check the filesize before attempting to package and deploy it).
Lambdas do get a 512 MB
/tmp
directory too, so for larger databases the function could start and then download up to 512MB from an S3 bucket - so the plugin could take an optional S3 bucket to write to and know how to upload the.db
file there and then have the lambda download it on startup.The text was updated successfully, but these errors were encountered: