datasette publish lambda plugin #236

simonw · 2018-04-23T22:10:30Z

Refs #217 - create a publish plugin that can deploy to AWS Lambda.

https://docs.aws.amazon.com/lambda/latest/dg/limits.html says Lambda packages can be up to 50 MB, so this would only work with smaller databases (the command can check the file size before attempting to package and deploy it).
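A minimal sketch of that pre-flight check, assuming the 50 MB figure quoted above and a hypothetical helper name `check_database_fits`. Checking the uncompressed .db size is conservative, since the zipped package will usually be smaller, but Datasette and its dependencies also count against the same budget.

```python
import os

# Assumption: the 50 MB package limit quoted from the AWS Lambda limits page.
LAMBDA_PACKAGE_LIMIT_BYTES = 50 * 1024 * 1024


def check_database_fits(db_path: str, limit: int = LAMBDA_PACKAGE_LIMIT_BYTES) -> int:
    """Hypothetical helper: return the database size in bytes, or raise if it is too big."""
    size = os.path.getsize(db_path)
    if size > limit:
        raise ValueError(
            f"{db_path} is {size / 1024 / 1024:.1f} MB, "
            f"over the {limit / 1024 / 1024:.0f} MB Lambda package limit"
        )
    return size
```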

Lambdas also get a 512 MB /tmp directory, so for larger databases the function could download the database (up to 512 MB) from an S3 bucket when it starts. The plugin could take an optional S3 bucket to write to, upload the .db file there, and have the Lambda download it on startup.
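A sketch of the startup download, assuming boto3 is available in the function (it is normally bundled with the managed Python runtimes) and using the real `download_file(Bucket, Key, Filename)` client method. The function name `ensure_local_database` and the default `/tmp/data.db` path are hypothetical. Because /tmp survives between warm invocations of the same container, the file is only fetched once per container; the `download` parameter exists so the logic can be exercised without AWS.

```python
import os
from typing import Callable, Optional


def ensure_local_database(
    bucket: str,
    key: str,
    dest: str = "/tmp/data.db",  # hypothetical path inside the 512 MB /tmp area
    download: Optional[Callable[[str, str, str], None]] = None,
) -> str:
    """Fetch s3://bucket/key into dest once per container and return the local path."""
    if os.path.exists(dest):
        # Warm invocation: an earlier request in this container already fetched it.
        return dest
    if download is None:
        import boto3  # assumed present in the Lambda runtime

        download = boto3.client("s3").download_file
    # Download to a side file and rename, so a failed download never leaves a partial db.
    partial = dest + ".partial"
    download(bucket, key, partial)
    os.replace(partial, dest)
    return dest
```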


This change introduces a new plugin hook, publish_subcommand, which can be used to implement new subcommands for the "datasette publish" command family. I've used this new hook to refactor out the "publish now" and "publish heroku" implementations into separate modules. I've also added unit tests for these two publishers, mocking the subprocess.call and subprocess.check_output functions. As part of this, I introduced a mechanism for loading default plugins. These are defined in the new "default_plugins" list inside datasette/app.py.

Closes #217 (Plugin support for datasette publish)
Closes #348 (Unit tests for "datasette publish")
Refs #14, #59, #102, #103, #146, #236, #347
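The commit mentions testing the publishers by mocking subprocess.call and subprocess.check_output. Here is a minimal sketch of that testing technique with `unittest.mock.patch`; the publisher function and the `hypothetical-deploy` CLI name are illustrative stand-ins, not Datasette's actual publish code.

```python
import subprocess
from unittest import mock


def publish_with_cli(directory: str, name: str) -> int:
    """Hypothetical publisher that shells out to a deploy CLI from inside `directory`."""
    # Fail early if the CLI is missing (check_output raises on a non-zero exit).
    subprocess.check_output(["which", "hypothetical-deploy"])
    return subprocess.call(["hypothetical-deploy", "--name", name], cwd=directory)


def test_publish_with_cli():
    # Patch the module attributes so no real process is ever started.
    with mock.patch("subprocess.check_output") as check_output, mock.patch(
        "subprocess.call", return_value=0
    ) as call:
        assert publish_with_cli("/tmp/build", "demo") == 0
    check_output.assert_called_once_with(["which", "hypothetical-deploy"])
    call.assert_called_once_with(
        ["hypothetical-deploy", "--name", "demo"], cwd="/tmp/build"
    )
```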


cldellow · 2020-04-03T22:19:00Z

Hi Simon,

I'm thinking of attempting this. Can you clarify some questions I have?

I assume the goal is to have a CORS-friendly HTTPS endpoint that hosts the datasette service + user's db.
If that's the goal, I think Lambda alone is insufficient. Lambda provides the compute fabric, but not the HTTP routing. You'd also need to add Application Load Balancer or API Gateway to provide an HTTP endpoint that routes to the lambda function.

Do you have a preference between ALB or API GW? ALB has better economics at scale, but has a minimum monthly cost. API GW has worse per-request economics, but scales to zero when no requests are happening.
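To make the division of labour concrete: with API Gateway in front, the gateway terminates HTTPS and hands the function a JSON event, and the function returns a dict describing the HTTP response. A minimal sketch, assuming an HTTP API using payload format 2.0 (fields `rawPath` and `requestContext.http.method`); the permissive CORS header is an illustrative choice, not a recommendation.

```python
import json


def handler(event, context):
    """Minimal Lambda handler for an API Gateway HTTP API (payload format 2.0) event."""
    method = event["requestContext"]["http"]["method"]
    path = event.get("rawPath", "/")
    return {
        "statusCode": 200,
        "headers": {
            "content-type": "application/json",
            # CORS header so pages on other origins can read the response.
            "access-control-allow-origin": "*",
        },
        "body": json.dumps({"method": method, "path": path}),
    }
```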

Does Datasette have any native components, or is it all pure python? If it has native bits, they'll likely need to be recompiled to work on Amazon Linux 2.
There are a few disparate services that need to be wired together to expose a Python service securely to the web. If I were doing this outside of the datasette publish system, I'd use an AWS CloudFormation template. Even within datasette, I think it still makes sense to use a CloudFormation template and just have the publish plugin invoke it (via the standard aws cli) with user-specified parameters. Does that sound reasonable to you?
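A sketch of how the publish plugin might invoke such a template through the AWS CLI, using the real `aws cloudformation deploy` options `--template-file`, `--stack-name`, `--capabilities` and `--parameter-overrides`. The template file name and parameter names are hypothetical, and `CAPABILITY_IAM` is only needed on the assumption that the template creates an IAM role for the function.

```python
import subprocess
from typing import Dict, List


def cloudformation_deploy_command(
    template: str, stack_name: str, parameters: Dict[str, str]
) -> List[str]:
    """Build an `aws cloudformation deploy` invocation with user-supplied parameters."""
    cmd = [
        "aws", "cloudformation", "deploy",
        "--template-file", template,
        "--stack-name", stack_name,
        # Assumption: the template creates an IAM execution role for the function.
        "--capabilities", "CAPABILITY_IAM",
    ]
    if parameters:
        # Keep this option last: it consumes all following Key=Value arguments.
        cmd.append("--parameter-overrides")
        cmd.extend(f"{key}={value}" for key, value in parameters.items())
    return cmd


def deploy(template: str, stack_name: str, parameters: Dict[str, str]) -> None:
    """Run the deploy; check=True raises CalledProcessError if the CLI fails."""
    subprocess.run(
        cloudformation_deploy_command(template, stack_name, parameters), check=True
    )
```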

Thanks for your help!

cldellow · 2020-04-10T21:03:38Z

I made a repo at https://github.com/code402/datasette-lambda to demonstrate the idea, and scratch my personal itch for this.

The demo relies on some central authority having already published a public, reusable Lambda layer with Datasette & its dependencies. I think that differs from the other publish plugins which seem to mainly publish Dockerfiles that the host will interpret to install deps from a requirements.txt file.

I chose that approach because uvloop appears to be a dependency with native code that needs to be compiled for the target runtime environment. In this case, that's Amazon Linux 2. I'm not 100% clear on whether that's still required, because:

- maybe uvloop is only needed for uvicorn, which the demo doesn't actually use since HTTP routing is handled by API Gateway
- it seems like uvloop may be an optional, drop-in optimization for asyncio in any case (but I may be misreading this; I'm very much a Python noob)
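uvloop is indeed a drop-in replacement for the standard asyncio event loop, so code written against asyncio runs either way. A common pattern, sketched here with a hypothetical `new_event_loop` helper, is to prefer uvloop when it is importable and fall back to the pure-Python loop otherwise, which is what would let a package built without the native extension still work.

```python
import asyncio

try:
    import uvloop  # optional native extension; may be missing, e.g. in a Lambda bundle
except ImportError:
    uvloop = None


def new_event_loop() -> asyncio.AbstractEventLoop:
    """Prefer uvloop when installed; otherwise use the standard asyncio loop."""
    if uvloop is not None:
        return uvloop.new_event_loop()
    return asyncio.new_event_loop()
```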

If it's the case that uvloop is truly optional, then I think the publish plugin could do the packaging on the user's machine, regardless of what flavour of operating system they're on. That'd be a bit slower for the user, but would provide the most long-term flexibility in terms of supporting plugins.
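If every dependency is pure Python, local packaging could be as simple as installing into a directory with pip's real `--target` option and zipping that directory so its contents sit at the archive root. A sketch under that assumption; the helper names are hypothetical, and packages with native code would still need wheels built for Amazon Linux 2.

```python
import os
import subprocess
import sys
import zipfile
from typing import List


def install_dependencies(target: str, requirements: List[str]) -> None:
    """Install dependencies into `target` using the current interpreter's pip."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "--target", target, *requirements],
        check=True,
    )


def zip_directory(source: str, zip_path: str) -> int:
    """Zip the contents of `source` (zip_path must live outside it); return the size."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(source):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to `source` so packages import from the zip root.
                zf.write(full, arcname=os.path.relpath(full, source))
    return os.path.getsize(zip_path)
```

The returned size can then be compared against the 50 MB package limit before uploading.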

simonw · 2020-06-16T23:45:45Z

Hi Colin,

Sorry I didn't see this sooner! I've just started digging into this myself, to try and play with the new EFS Lambda support: #850.

Yes, uvloop is only needed because of uvicorn. I have a branch here that removes that dependency just for trying out Lambda: https://github.com/simonw/datasette/tree/no-uvicorn - so you can run pip install https://github.com/simonw/datasette/archive/no-uvicorn.zip to get that.

I'm going to try out your datasette-lambda project next - really excited to see how far you've got with it.

simonw · 2020-06-16T23:50:12Z

As for your other questions:

> I assume the goal is to have a CORS-friendly HTTPS endpoint that hosts the datasette service + user's db.

Yes, exactly. I know this will limit the size of database that can be deployed (since Lambda has a 50MB total package limit as far as I can tell) but there are plenty of interesting databases that are small enough to fit there.

The new EFS support for Lambda means that theoretically the size of database is now unlimited, which is really interesting. That's what got me inspired to take a look at a proof of concept in #850.
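With EFS the database file would simply appear at a local mount path inside the function. A sketch assuming a hypothetical mount path (Lambda requires EFS mount paths to start with /mnt/) and opening SQLite read-only through a `mode=ro` URI, which is one reasonable way to keep many concurrent Lambda instances from writing to a file on a network file system.

```python
import sqlite3

# Hypothetical mount point configured on the function; Lambda requires a /mnt/ prefix.
EFS_DB_PATH = "/mnt/datasette/data.db"


def open_readonly(path: str = EFS_DB_PATH) -> sqlite3.Connection:
    """Open a SQLite file read-only (assumes the path has no '?' or '#' characters)."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)
```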

> If that's the goal, I think Lambda alone is insufficient. Lambda provides the compute fabric, but not the HTTP routing. You'd also need to add Application Load Balancer or API Gateway to provide an HTTP endpoint that routes to the lambda function.

> Do you have a preference between ALB or API GW? ALB has better economics at scale, but has a minimum monthly cost. API GW has worse per-request economics, but scales to zero when no requests are happening.

I personally like scale-to-zero because many of my projects are likely to receive very little traffic. So API GW first, and maybe ALB as an option later on for people operating at scale?

> Does Datasette have any native components, or is it all pure python? If it has native bits, they'll likely need to be recompiled to work on Amazon Linux 2.

As you've found, the only native component is uvloop which is only needed if uvicorn is being used to serve requests.

> There are a few disparate services that need to be wired together to expose a Python service securely to the web. If I was doing this outside of the datasette publish system, I'd use an AWS CloudFormation template. Even within datasette, I think it still makes sense to use a CloudFormation template and just have the publish plugin invoke it (via the standard aws cli) with user-specified parameters. Does that sound reasonable to you?

For the eventual "datasette publish lambda" command I want whatever results in the smallest amount of inconvenience for users. I've been trying out Amazon SAM in #850 and it requires users to run Docker on their machines, which is a pretty huge barrier to entry! I don't have much experience with CloudFormation but it's probably a better bet, especially if you can "pip install" the dependencies needed to deploy with it.

jacobian · 2021-03-14T23:41:51Z

Now that Lambda supports Docker, this probably is a bit easier and may be able to build on top of the existing package command.

There are weirdnesses in how the command actually gets invoked; the aws-lambda-python image shows a bit of that. So Datasette would probably need some sort of Lambda-specific entry point to make this work.

jacobian · 2021-03-14T23:42:57Z

Oh, and the container image can be up to 10GB, so the EFS step might not be needed except for pretty big stuff.

simonw · 2021-03-15T03:34:52Z

Yeah the Lambda Docker stuff is pretty odd - you still don't get to speak HTTP, you have to speak their custom event protocol instead.
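For reference, the "custom event protocol" is the Lambda Runtime API: a container repeatedly fetches the next event over HTTP from the host in `AWS_LAMBDA_RUNTIME_API` and posts the result back, using the request id from the `Lambda-Runtime-Aws-Request-Id` header. AWS's runtime interface client implements this loop; the bare-bones sketch below omits error reporting (the `/error` endpoint) and is only meant to show the shape of the protocol.

```python
import json
import os
import urllib.request

API_VERSION = "2018-06-01"


def next_url(runtime_api: str) -> str:
    return f"http://{runtime_api}/{API_VERSION}/runtime/invocation/next"


def response_url(runtime_api: str, request_id: str) -> str:
    return f"http://{runtime_api}/{API_VERSION}/runtime/invocation/{request_id}/response"


def run_forever(handle) -> None:
    """Fetch each event, call handle(event), and post its JSON result back."""
    runtime_api = os.environ["AWS_LAMBDA_RUNTIME_API"]  # set by Lambda in the container
    while True:
        # Blocks until Lambda has an invocation for this container.
        with urllib.request.urlopen(next_url(runtime_api)) as resp:
            request_id = resp.headers["Lambda-Runtime-Aws-Request-Id"]
            event = json.loads(resp.read())
        result = json.dumps(handle(event)).encode()
        req = urllib.request.Request(
            response_url(runtime_api, request_id), data=result, method="POST"
        )
        urllib.request.urlopen(req).close()
```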

https://github.com/glassechidna/serverlessish looks interesting here - it adds a proxy inside the container which allows your existing HTTP Docker image to run within Docker-on-Lambda. I've not tried it out yet though.

sethvincent · 2021-09-16T03:19:08Z

👋 I just put together a small example using the lambda container image support: https://github.com/sethvincent/datasette-aws-lambda-example

It uses Mangum and AWS's Python runtime interface client to handle the Lambda event stuff.
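With the real packages the entry point is typically just `handler = Mangum(app)`, where `app` is the ASGI application returned by `Datasette(...).app()`. To show roughly what such an adapter does, here is a much-simplified, standard-library-only illustration (not Mangum's actual code): it turns a payload format 2.0 event into an ASGI HTTP scope, runs the app once, and collects the response. Headers are flattened, so repeated headers are not handled.

```python
import asyncio
import base64


def event_to_scope(event: dict) -> dict:
    """Translate an API Gateway HTTP API (payload 2.0) event into an ASGI HTTP scope."""
    path = event["rawPath"]
    headers = [(k.lower().encode(), v.encode()) for k, v in event.get("headers", {}).items()]
    return {
        "type": "http",
        "asgi": {"version": "3.0"},
        "http_version": "1.1",
        "method": event["requestContext"]["http"]["method"],
        "scheme": "https",
        "path": path,
        "raw_path": path.encode(),
        "query_string": event.get("rawQueryString", "").encode(),
        "root_path": "",
        "headers": headers,
        "server": None,
        "client": None,
    }


def handle(app, event: dict) -> dict:
    """Run one request through an ASGI app and return a Lambda proxy response dict."""
    body = event.get("body") or ""
    body_bytes = base64.b64decode(body) if event.get("isBase64Encoded") else body.encode()
    status, headers, chunks = 500, {}, []

    async def receive():
        return {"type": "http.request", "body": body_bytes, "more_body": False}

    async def send(message):
        nonlocal status
        if message["type"] == "http.response.start":
            status = message["status"]
            headers.update((k.decode(), v.decode()) for k, v in message.get("headers", []))
        elif message["type"] == "http.response.body":
            chunks.append(message.get("body", b""))

    asyncio.run(app(event_to_scope(event), receive, send))
    # Base64-encode the body so binary responses survive the JSON round trip.
    return {
        "statusCode": status,
        "headers": headers,
        "body": base64.b64encode(b"".join(chunks)).decode(),
        "isBase64Encoded": True,
    }
```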

I'd be happy to help with a publish plugin for AWS lambda as I plan to use this for upcoming projects.

The example uses the serverless CLI for deployment, but there might be a more suitable deployment approach for the plugin. It would be cool if users didn't have to install anything other than the AWS CLI and its associated config/credentials setup.

simonw · 2021-09-17T20:54:13Z

That's so useful @sethvincent! Really interesting reading your code there, especially clever how you're using the base_url config.

I'd be very interested to see what your demo looks like without using serverless - completely agree that the fewer additional dependencies there are for this the better.

I'm also very interested in figuring out a way to run Datasette in Lambda but with the SQLite database on an EFS volume. Do you have a feel for how hard that would be?

jordaneremieff · 2022-02-09T13:40:52Z

Hi @simonw,

I've received some inquiries over the last year or so about Datasette and how it might be supported by Mangum. I maintain Mangum which is, as far as I know, the only project that provides support for ASGI applications in AWS Lambda.

If there is anything that I can help with here, please let me know because I think what Datasette provides to the community (even beyond OSS) is noble and worthy of special consideration.

sopel · 2023-03-12T14:04:15Z

I keep coming back to this in search of the related exploration, so I'll just link it now:

@simonw has since researched how to deploy Datasette to AWS Lambda using function URLs and Mangum in simonw/public-notes#6, and concluded "that's everything I need to know in order to build a datasette-publish-lambda plugin."
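Function URLs deliver events in the same shape as API Gateway payload format 2.0, so the same Mangum handler applies and API Gateway is no longer required. A sketch of the two real AWS CLI calls that would expose an existing function publicly; the statement id is an arbitrary hypothetical label, and auth type `NONE` means anyone on the internet can invoke the URL.

```python
import subprocess
from typing import List


def function_url_commands(function_name: str) -> List[List[str]]:
    """AWS CLI calls that give an existing function a public function URL."""
    return [
        [
            "aws", "lambda", "create-function-url-config",
            "--function-name", function_name,
            "--auth-type", "NONE",
        ],
        # With auth type NONE, a resource-based policy must still allow public invocation.
        [
            "aws", "lambda", "add-permission",
            "--function-name", function_name,
            "--statement-id", "FunctionURLAllowPublicAccess",  # hypothetical label
            "--action", "lambda:InvokeFunctionUrl",
            "--principal", "*",
            "--function-url-auth-type", "NONE",
        ],
    ]


def enable_function_url(function_name: str) -> None:
    """Run each command, raising CalledProcessError on the first failure."""
    for cmd in function_url_commands(function_name):
        subprocess.run(cmd, check=True)
```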

simonw added help wanted datasette-publish labels Apr 23, 2018

simonw added the feature label Jul 10, 2018

simonw mentioned this issue Jul 26, 2018

publish_subcommand hook + default plugins mechanism, used for publish heroku/now #349

Merged
