Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[🐛 BUG]: Workers being killed during request processing when memory usage exceeds max_worker_memory #1561

Closed
1 task done
ekisu opened this issue May 17, 2023 · 1 comment · Fixed by roadrunner-server/sdk#70
Assignees
Labels
B-bug Bug: bug, exception
Milestone

Comments

@ekisu
Copy link

ekisu commented May 17, 2023

No duplicates 🥲.

  • I have searched for a similar issue in our bug tracker and didn't find any solutions.

What happened?

When Roadrunner is configured to stop workers exceeding max_worker_memory megabytes, it eventually kills the worker during request processing if the request takes long enough. max_worker_memory is described in the documentation as a "soft limit", which "will not interrupt current request processing".

I expected that, in those situations, the worker would be able to finish its current request, and that it would be restarted only after the request is finished.

Version (rr --version)

rr version 2023.1.3 (build time: 2023-05-11T12:32:12+0000, go1.20.4), OS: linux, arch: amd64

How to reproduce the issue?

Just run curl http://localhost:9000 with the following configuration files:

.rr.yaml:

version: "2.8"
http:
  address: 127.0.0.1:9000
  pool:
    num_workers: 1
    supervisor:
      max_worker_memory: 128
      exec_ttl: 90s
rpc:
  listen: tcp://127.0.0.1:6001
server:
  command: "php psr-worker.php"

psr-worker.php:

<?php

require __DIR__ . '/vendor/autoload.php';

use Nyholm\Psr7\Response;
use Nyholm\Psr7\Factory\Psr17Factory;
use Psr\Http\Message\ServerRequestInterface;
use Spiral\RoadRunner\Worker;
use Spiral\RoadRunner\Http\PSR7Worker;


// Create new RoadRunner worker from global environment
$worker = Worker::create();

// Create common PSR-17 HTTP factory
$factory = new Psr17Factory();

$psr7 = new PSR7Worker($worker, $factory, $factory, $factory);

while (true) {
    try {
        $request = $psr7->waitRequest();
        if (!$request instanceof ServerRequestInterface) { // Termination request received
            break;
        }
    } catch (\Throwable $e) {
        $psr7->respond(new Response(400));
        continue;
    }

    try {
        // Allocate some memory, this should be enough to exceed the `max_worker_memory` limit defined in
        // .rr.yaml
        $array = range(1, 10000000);
        fprintf(STDERR, 'memory allocated: ' . memory_get_usage());
        sleep(30);

        $psr7->respond(new Response(200, [], 'Hello RoadRunner!'));
    } catch (\Throwable $e) {
        $psr7->respond(new Response(500, [], 'Something Went Wrong!'));
        
        $psr7->getWorker()->error((string)$e);
    }
}

Relevant log output

2023-05-17T19:45:04+0000	DEBUG	rpc         	plugin was started	{"address": "tcp://127.0.0.1:6001", "list of the plugins with RPC methods:": ["lock", "informer", "app", "resetter"]}
2023-05-17T19:45:04+0000	DEBUG	server      	worker is allocated	{"pid": 122311, "internal_event_name": "EventWorkerConstruct"}
[INFO] RoadRunner server started; version: 2023.1.3, buildtime: 2023-05-11T12:32:12+0000
2023-05-17T19:45:04+0000	DEBUG	http        	http server was started	{"address": "127.0.0.1:9000"}
2023-05-17T19:45:10+0000	INFO	server      	memory allocated: 269421016
2023-05-17T19:45:14+0000	DEBUG	server      	memory_limit	{"reason": "max memory is reached", "pid": 122311, "internal_event_name": "EventMaxMemory"}
2023-05-17T19:45:19+0000	DEBUG	server      	sending stop request to the worker	{"pid": 122311}
2023-05-17T19:45:29+0000	WARN	server      	worker doesn't respond on stop command, killing process	{"PID": 122311}
2023-05-17T19:45:29+0000	DEBUG	server      	sending stop request to the worker	{"pid": 122311}
2023-05-17T19:45:29+0000	WARN	server      	network error	{"reason": "network", "pid": 122311, "internal_event_name": "EventWorkerError", "error": "worker_exec_with_timeout: Network:\n\tsync_worker_exec_payload: EOF"}
2023-05-17T19:45:29+0000	DEBUG	server      	worker stopped	{"internal_event_name": "EventWorkerWaitExit", "error": "signal: killed\nprocess_wait: signal: killed"}
2023-05-17T19:45:29+0000	ERROR	http        	execute	{"start": "2023-05-17T19:45:10+0000", "elapsed": "19.297709831s", "error": "worker_exec_with_timeout: Network:\n\tsync_worker_exec_payload: EOF"}
2023-05-17T19:45:29+0000	INFO	http        	http log	{"status": 500, "method": "GET", "URI": "/", "remote_address": "127.0.0.1:57690", "read_bytes": 0, "write_bytes": 0, "start": "2023-05-17T19:45:10+0000", "elapsed": "19.298026343s"}
2023-05-17T19:45:29+0000	DEBUG	server      	worker is allocated	{"pid": 122362, "internal_event_name": "EventWorkerConstruct"}
@rustatian
Copy link
Member

Hey @ekisu 👋🏻
Yeah, you're right, found the buggy code ⚡
Will be fixed and fix will be released next Thursday, thanks for the contribution 👍🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-bug Bug: bug, exception
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants