
Feature request: make it possible to keep docker container warm #239

Closed
jandockx opened this issue Dec 22, 2017 · 81 comments

@jandockx

I understand from other issues that a new docker container is started for each request. This makes some experiments and automated tests impractical. SAM Local is much too slow in any context where more than one request has to be handled.

I suspect that hot reloading depends on this feature.

I think it would be a good idea, while this project evolves further, to make it possible to choose to forego hot reloading and instead keep the docker container warm.

Something like

sam local start-api -p <PORT> --profile <AWS PROFILE> --keep-it-warm

This would broaden the applicability of sam local enormously.

Thank you for considering this suggestion. This looks like an awesome project.

@aldegoeij

+1 Python container takes too long to start for simple debugging...

@zippadd

zippadd commented Jan 5, 2018

+1. This currently makes local automated testing painful at best.

Thanks for the continued work on this project!

@dannymcpherson

Have there been any eyes on this? The benefit would be so huge.

@cagoi

cagoi commented Apr 19, 2018

+1

@hobotroid

+1

@daveykane

+1

@adrians5j

+1

@CRogers

CRogers commented Jun 16, 2018

+1, even a simple hello world java8 lambda takes 3-4 seconds for each request!

@CRogers

CRogers commented Jun 18, 2018

My sketch proposal to make warm containers work and maintain all the existing nice hot reload/memory usage etc functionality around them:

Currently, the container is simply run with the handler as its argument and the event passed in via an environment variable. The container's logs are then piped to the console stdout/stderr, and the tool just records how much memory is used.

Instead, we can start the container with bash as the entrypoint and -c "sleep infinity" as the argument, so it effectively runs nothing and keeps the container alive. We record the container id in an (expiring) dict so we can reuse it again. When we want to run the lambda, we run docker exec with the previously used lambda entrypoint and the correct environment. Since we run one lambda per container, we can still record memory usage. If we key the running containers by the version of the lambda code we're running, we can ensure hot reload still works. As always with caches, the invalidation would be the interesting part: you probably want to kill out-of-date containers, and kill all containers when the tool exits.
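
A minimal sketch of that idea, assuming the docker Python SDK (docker-py); the runtime entrypoint, event variable, and TTL are illustrative placeholders, not the actual SAM CLI implementation:

import time

import docker

client = docker.from_env()
warm_containers = {}  # code_version -> (container, created_at), evicted when stale

RUNTIME_ENTRYPOINT = ["/var/runtime/bootstrap"]  # illustrative; the real value comes from the runtime image
TTL_SECONDS = 300

def get_warm_container(image, code_version, code_dir):
    entry = warm_containers.get(code_version)
    if entry and time.time() - entry[1] < TTL_SECONDS:
        return entry[0]  # reuse the already-running container
    # Start an idle container that just sleeps, so it stays alive between invokes.
    container = client.containers.run(
        image,
        entrypoint=["/bin/sh", "-c"],
        command=["sleep infinity"],
        volumes={code_dir: {"bind": "/var/task", "mode": "ro"}},
        detach=True,
    )
    warm_containers[code_version] = (container, time.time())
    return container

def invoke(container, handler, event_json):
    # docker exec re-runs only the runtime entrypoint inside the live container.
    exit_code, output = container.exec_run(
        RUNTIME_ENTRYPOINT + [handler],
        environment={"AWS_LAMBDA_EVENT_BODY": event_json},
    )
    return exit_code, output

Keying warm_containers on the code version is what keeps hot reload working in this sketch: a code change produces a new key, the stale container gets evicted, and the next invoke creates a fresh one.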

@monofonik

+1

@luisvsm

luisvsm commented Aug 15, 2018

+1 Very interested in this feature

@luketn

luketn commented Aug 24, 2018

+1 Yes please!

@nodeit

nodeit commented Sep 6, 2018

+1, throwing my hat in the ring on this too

@jfuss
Contributor

jfuss commented Sep 6, 2018

As a note: Please use the reaction feature on the top comment. We do look at issues sorted by thumbs up (as well as other reactions). Commenting +1 does no good there and adds noise to the issue.

@scoates

scoates commented Sep 6, 2018

@jfuss I agree (and had done this). Any feedback from your team would be helpful here, though. The closest thing we had to knowing if this is on your radar (before your comment) was duplicate issue consolidation and labeling.

@ejoncas

ejoncas commented Sep 24, 2018

+1, this would be very beneficial for people using java + spring boot.

@thoratou

thoratou commented Oct 6, 2018

+1, around 1s for golang case

@kevanpng

I did an experiment with container reuse. This is just with a lambda in Python; I'm developing on Ubuntu 16.04. In summary, spinning up the docker container only takes an extra second, so it is not worth building the container reuse feature. Link to my code: https://github.com/kevanpng/aws-sam-local .

For a fixed query, both my colleague and I see a 4s invocation time on sam local; his is a Windows machine. With the profile flag and container reuse, it goes down to 2.5s on my Ubuntu machine.

My colleague is running on a Mac, and when he tried the same query with lambda reuse and the profile flag, it still took 11-14 seconds to run.

Maybe it could be that docker is slow on mac?

@ghost

ghost commented Oct 11, 2018

1 second makes a world of difference when you're building an API and expect to serve more than one request.

I think it's well worth the feature.

@sanathkr
Contributor

@kevanpng Hey, I was looking through your code to understand what exactly you did. So basically, you create the container once with a fixed name, run the function, and on the next invocation you look for the container with the same name and simply container.exec_run instead of creating it from scratch again. Is my summary correct?

I am super surprised Docker container creation makes this big of a difference. We can certainly look deeper into this if it is becoming a usability blocker.
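
For reference, the fixed-name reuse pattern described above looks roughly like this (docker-py; the name handling and exec command are illustrative, not kevanpng's actual code):

import docker
from docker.errors import NotFound

client = docker.from_env()

def run_in_reused_container(image, name, cmd):
    try:
        container = client.containers.get(name)  # reuse the container if it already exists
    except NotFound:
        # First invoke: start a container that just sleeps so it stays around.
        container = client.containers.run(
            image, entrypoint=["/bin/sh", "-c"], command=["sleep infinity"], name=name, detach=True
        )
    exit_code, output = container.exec_run(cmd)  # warm invokes skip container create/start entirely
    return exit_code, output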

@scoates

scoates commented Oct 11, 2018

@sanathkr. Thanks for looking at this. FWIW, it's a huge usability blocker for me:

~/src/faculty/buildshot$ time curl -s http://127.0.0.1:3000/ >/dev/null # SAM container via Docker

real	0m6.891s
user	0m0.012s
sys	0m0.021s
~/src/faculty/buildshot$ time curl -s http://127.0.0.1:5000/ >/dev/null # regular python app via flask dev/debug server (slow)

real	0m0.039s
user	0m0.012s
sys	0m0.019s

And the Instancing.. is quick. It's Docker (and the way Docker is used here) that's slow. The (slow) werkzeug-based dev server is ~175x faster than waiting around for Docker. And this is for every request, not just startup. (And yes, this is from my Mac.)

@sanathkr
Contributor

@scoates Thanks for the comparison. It's not apples-to-apples to compare vanilla Flask to a Docker-based app, but a 6-second duration with SAM CLI is definitely not what I would expect.

  • Did you have the Docker image already downloaded?
  • Also, can you start SAM CLI with the --skip-pull-image flag? This will prevent the CLI from asking Docker for the latest image version on every invoke. Do share your numbers again with this flag set.

Thinking ahead:
I think we need to add more instrumentation to the SAM CLI codebase in order to understand which parts contribute to the high latency. It would be cool if we could run the instrumented code in a Travis build with every PR so we can assess the performance impact of new code changes. We also need to run this on a variety of platforms to understand the real difference between Mac/Ubuntu.
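
One lightweight way to get that kind of instrumentation, sketched here as a hypothetical timing decorator rather than anything that exists in the SAM CLI codebase today:

import functools
import logging
import time

LOG = logging.getLogger(__name__)

def timed(step_name):
    # Log how long a single step of the invoke path takes.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                LOG.info("%s took %.3f s", step_name, time.perf_counter() - start)
        return wrapper
    return decorator

# Usage (function names are illustrative):
# @timed("pull_image")
# def pull_image(...): ...

Wrapping each stage (image pull check, container create, container start, function run) this way would make per-step numbers like the ones in the next comment reproducible in CI.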

@sanathkr
Contributor

sanathkr commented Oct 11, 2018

I did some more profiling by crudely commenting out parts of the codebase. Also, these were not run multiple times, so the numbers are ballpark estimates. I ran sam init and then ran sam local start-api on the simple HelloWorld Lambda function created by the init template.

Platform: MacOSX
Docker version: 18.06.0

WARNING: Very crude measurements.

Total execution time (sam local start-api): 2.67 seconds
Skip pull images (sam local start-api --skip-pull-image): 1.45 seconds
Create container, run it, and return immediately without waiting for the function to terminate: 1.05 seconds
Create container, don't run it: 0.2 seconds
SAM CLI code overhead (don't create container at all): 0.045 seconds

Based on the above numbers, I arrived at a rough estimate for each step of the invoke path by assuming:

Total execution = SAM CLI overhead + Docker Image pull + Create container + Run Container + Run function

Then, here is how much each step took:

SAM CLI Overhead: 0.045 seconds
Docker Image Pull Check: 1.3 seconds
Create Container: 0.15 seconds
Run container: 0.85 seconds
Run function: 0.45 seconds

The most interesting part is the Create vs Run container duration. Run is about 5x Create, so we are better off optimizing the Run duration.

If we were to do a warm start, we would save some fraction of the 0.85 seconds it takes to run the container. We would need to keep the runtime process up and running inside the container and re-run just the function in place; otherwise we aren't going to save much.
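
Spelling out the arithmetic behind those per-step estimates (a small worked example using the measurements above; the quoted figures are rounded):

total           = 2.67   # sam local start-api
skip_pull_total = 1.45   # --skip-pull-image
no_wait_total   = 1.05   # create + run, returning without waiting for the function
create_only     = 0.20   # create container, don't run it
overhead        = 0.045  # no container at all

image_pull_check = total - skip_pull_total          # ~1.22 s, quoted as ~1.3 s
run_function     = skip_pull_total - no_wait_total  # ~0.40 s, quoted as ~0.45 s
run_container    = no_wait_total - create_only      # ~0.85 s
create_container = create_only - overhead           # ~0.155 s, quoted as ~0.15 s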

@scoates

scoates commented Oct 17, 2018

Hi. Sorry for the late reply. I was traveling last week and forgot to get to this when I returned.

I agree absolutely that apigw and flask aren't apples-to-apples, and crude measurements are definitely where we're at right now.

With --skip-pull-image, I still get request starts in the 5+ second range. Entirely possible there's slow stuff in my code (though it's small, so I'm not sure where that would come from; it really does seem like docker). Here are the relevant bits of a request (on a warm start; this is several requests into sam local start-api --skip-pull-image):

[ 0.00] 2018-10-16 20:18:44 Starting new HTTP connection (1): 169.254.169.254
[ 1.01] 2018-10-16 20:18:45 Requested to skip pulling images ...
[ 0.00]
[ 0.00] 2018-10-16 20:18:45 Mounting /Users/sean/src/faculty/buildshot/buildshot/build as /var/task:ro inside runtime container
[!5.32] START RequestId: 13e564e9-1160-4c0e-b1e2-b31bbadd899a Version: $LATEST
[ 0.00] Instancing..
[ 0.00] [DEBUG]	2018-10-17T00:18:50.714Z	13e564e9-1160-4c0e-b1e2-b31bbadd899a	Zappa Event: {'body': None, 'httpMethod': 'GET', 'resource': '/', 'queryStringParameters': None, 'requestContext': {'httpMethod': 'GET', 'requestId': 'c6af9ac6-7b61-11e6-9a41-93e8deadbeef', 'path': '/', 'extendedRequestId': None, 'resourceId': '123456', 'apiId': '1234567890', 'stage': 'prod', 'resourcePath': '/', 'identity': {'accountId': None, 'apiKey': None, 'userArn': None, 'cognitoAuthenticationProvider': None, 'cognitoIdentityPoolId': None, 'userAgent': 'Custom User Agent String', 'caller': None, 'cognitoAuthenticationType': None, 'sourceIp': '127.0.0.1', 'user': None}, 'accountId': '123456789012'}, 'headers': {'X-Forwarded-Port': '3000', 'Host': 'localhost:3000', 'X-Forwarded-Proto': 'http', 'Accept': '*/*', 'User-Agent': 'curl/7.54.0'}, 'stageVariables': None, 'path': '/', 'pathParameters': None, 'isBase64Encoded': True}
[ 0.00]
[ 0.00] [INFO]	2018-10-17T00:18:50.731Z	13e564e9-1160-4c0e-b1e2-b31bbadd899a	127.0.0.1 - - [17/Oct/2018:00:18:50 +0000] "GET / HTTP/1.1" 200 15 "" "curl/7.54.0" 0/16.916
[ 0.00]
[ 0.00] END RequestId: 13e564e9-1160-4c0e-b1e2-b31bbadd899a
[ 0.00] REPORT RequestId: 13e564e9-1160-4c0e-b1e2-b31bbadd899a Duration: 4684 ms Billed Duration: 4700 ms Memory Size: 128 MB Max Memory Used: 42 MB
[ 0.58] 2018-10-16 20:18:51 127.0.0.1 - - [16/Oct/2018 20:18:51] "GET / HTTP/1.1" 200 -

The [ 0.xx] prefix is returned by a util I have that shows elapsed time between stdout lines. Here's the important part, I think:

[!5.32] START RequestId: 13e564e9-1160-4c0e-b1e2-b31bbadd899a Version: $LATEST
[ 0.00] Instancing..

I acknowledge that Instancing.. might just not be output until it's complete, so that by itself isn't a valid measurement point. Just wanted to pass on that I'm seeing 5s of lag in my requests.

I'm not sure how to measure much deeper than that.

More info:

$ docker --version
Docker version 18.06.1-ce, build e68fc7
$ uname -a
Darwin sarcosm.local 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64 i386 MacBookPro11,4 Darwin
$ sam --version
SAM CLI, version 0.5.0

I also agree that if I can get this down to sub-1s request times, it's probably usable. 5s+ is painful, still, though.

(Edit: adding in case anyone looking for Zappa info stumbles on this. I'm using an experimental fork of the Zappa handler runtime. This doesn't really apply to Zappa-actual. At least not right now.)

@OFranke

OFranke commented Apr 9, 2020

If sam is using the same docker image under the hood, would it theoretically be possible to just pass the DOCKER_LAMBDA_STAY_OPEN=1 variable via sam's environments.json?
Right now I've observed that for some reason I cannot add arbitrary variables to environments.json, only ones I had already defined in the template.yaml.
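
For reference, the --env-vars override file that sam local accepts is keyed by the function's logical ID, and it only overrides variables already declared under Environment in the template, which matches the behaviour described above. A hypothetical example for the function below:

{
  "SrvApigraphqlapi8D508D37": {
    "DB_NAME": "postgres",
    "DOCKER_LAMBDA_STAY_OPEN": "1"
  }
}

Variables that are not present in the template's Environment block are ignored.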

When I hardcode the environment variable in my template.yaml like that:

SrvApigraphqlapi8D508D37:
    Type: AWS::Lambda::Function
    Properties:
      Code: SrvApigraphqlapi8D508D37
      Handler: base.handler
      Role:
        Fn::GetAtt:
        - SrvApigraphqlapiServiceRoleFD44AE9E
        - Arn
      Runtime: nodejs12.x
      Environment:
        Variables:
          DB_HOST:
            Fn::GetAtt:
            - SrvDatabasecdkgraphilelambdaexampledbD17C7F0B
            - Endpoint.Address
          DB_PORT:
            Fn::GetAtt:
            - SrvDatabasecdkgraphilelambdaexampledbD17C7F0B
            - Endpoint.Port
          DB_NAME: postgres
          DB_USERNAME: postgres
          DB_PASSWORD: postgres
          AWS_STAGE: prod
          DOCKER_LAMBDA_STAY_OPEN: 1

The whole thing crashes, giving me this error message:

Lambda API listening on port 9001...
Function 'SrvApigraphqlapi8D508D37' timed out after 20 seconds
<class 'samcli.local.apigw.local_apigw_service.LambdaResponseParseException'>

@flache

flache commented Apr 15, 2020

Are there any updates or is there a timeline on this? This is the single biggest blocker for us (and I imagine for many others) to doing more with AWS Lambda, because it makes it almost impossible to develop and test things locally. Even with --skip-pull-image, a delay of ~5 seconds for each request makes it just unusable. There is also the problem of global context not being preserved between invocations.

I understand that features must be prioritized, but I am having a hard time understanding why the fact that nothing running on Lambda can really be tested locally is not a high-priority issue. Or am I missing something?

@literakl

literakl commented Apr 15, 2020 via email

@jfuss
Contributor

jfuss commented Apr 15, 2020

Update: The team is working on other priorities at the moment. We know the time it takes to invoke locally is a pain point for many, and we have plans to address it in the future. We do not have an ETA as of now.

@OFranke

OFranke commented Apr 25, 2020

@flache
I've moved away from sam as it seems to not play so well with cdk at the moment, see #1911. I worked around it by having an app that I run in docker locally but let cdk deploy. I just use two different application entry points, which are not so different at all:

// lambda entry
import { Response, Request } from 'express';

const awsServerlessExpress = require('aws-serverless-express');
const express = require('express');

const app = express();
const handler = (req: Request, res: Response): void => {
  try {
    app(
      req,
      res,
      (err: { status: number; statusCode: number; message: string }) => {
        if (err) {
          if (!res.headersSent) {
            res.statusCode = err.status || err.statusCode || 500;
            res.setHeader('Content-Type', 'application/json');
          }
          res.end(JSON.stringify({ errors: [{ message: `${err.message}` }] }));
          return;
        }
        if (!res.finished) {
          if (!res.headersSent) {
            res.statusCode = 404;
          }
          res.end(`'${req.url}' not found`);
        }
      },
    );
  } catch (err) {
    res.end(JSON.stringify({ errors: [{ message: `${err.message}` }] }));
  }
};

const server = awsServerlessExpress.createServer(handler, undefined);
exports.handler = (event: unknown, context: unknown): unknown =>
  awsServerlessExpress.proxy(server, event, context);

// docker entry
import express from 'express';

const main = async () => {
  const app = express();

  app.listen(5000, '0.0.0.0');
};

try {
  void main();
} catch (e) {
  console.error('Fatal error occurred starting server!');
  console.error(e);
  process.exit(101);
}

I have built a whole graphql service like that and have been running it on AWS for a few weeks now. Seems to be fine.

@elthrasher

For those who are very comfortable with Docker and docker-compose, I created a proxy image that works with the underlying SAM (lambci) images and can bring your lambda function into existing docker-compose workflows as a long-lived function. https://github.com/elthrasher/http-lambda-invoker

@literakl

I have personally switched from AWS Lambda to NodeJS+Express+nodemon and my productivity and happiness got a boost.

@duartemendes

Spent the last week writing a CLI tool to help with this issue; just 2 days ago I published the first version.

It's available on npm for download and installation. It passes both the DOCKER_LAMBDA_STAY_OPEN and DOCKER_LAMBDA_WATCH environment variables to the underlying containers, mitigating cold starts after the first invocation and watching for code changes.

I think the tool is easy to use (takes one command to run your api locally) but it's in a very early stage. It works very well for my APIs but I'm pretty sure I didn't take all use cases into consideration. So, give it a go, report any issues you find and please leave some feedback.

@S-Cardenas

@duartemendes that tool is amazing! Congratulations and let me know if you need any help.

Does your tool currently support layers?

@duartemendes

Thanks @S-Cardenas. It doesn't, but it's something I'm happy to take a look at 👍

@kingferiol

This is really a roadblock with this technology for us. Too painful.

It is not sustainable to wait 10 seconds per request during development. Without any action on this, I think we will have to reconsider our approach to this technology.

@jfuss
Contributor

jfuss commented May 20, 2020

Update: We have prioritized some work that will help with the slow request time and provide a better warm invoke experience. I do not have timelines or ETAs to share at this point, but I wanted to communicate that we are starting to look at what we can do in this space.

@ianballard

@jfuss any updates?

@guichafy

I'm very excited to see this feature.

@awsjeffg added the stage/pm-review label (Waiting for review by our Product Manager, please don't work on this yet) Aug 12, 2020
@leonardobork

@jfuss any news?

@S-Cardenas

Ditto. Would be great if this was officially released. Currently using https://github.com/elthrasher/http-lambda-invoker as a substitute.

@moelasmar mentioned this issue Nov 17, 2020
@awsjeffg added the stage/in-progress label (A fix is being worked on) and removed the stage/pm-review label Nov 19, 2020
@OGoodness

🤞 Let's hope we can see this soon

@S-Cardenas

Seems like it's getting very close to being approved and merged. Would love to get a notification when/if it does.

@millsy

millsy commented Dec 16, 2020

Fingers crossed this is soon added

@kaarejoergensen

This feature has been added to the newest release (https://github.com/aws/aws-sam-cli/releases/tag/v1.14.0) 🎉

@mndeveci
Contributor

(As @kaarejoergensen mentioned 😄) Happy to share that this has been released with v1.14, resolving the issue.
