
Jobs causing memory overflow do not consider $maxExceptions (and are retried endlessly if $tries=0) #1346

Closed
graemlourens opened this issue Nov 8, 2023 · 15 comments

Comments

@graemlourens
Contributor

Horizon Version

5.21.2

Laravel Version

10.29.0

PHP Version

8.2.7

Redis Driver

PhpRedis

Redis Version

6.0.1

Database Driver & Version

No response

Description

We ran into a bug which we believe did not exist in previous versions. We cannot pinpoint when this bug appeared, and we also cannot tell whether this is a Horizon issue or a framework issue.

A job which causes a memory overflow will be retried endlessly if $tries=0, despite having $maxExceptions=1.
We also tried out what happens with jobs that time out; there it works as expected.

We'd expect a job which dies because of a memory overflow to fail on the second pickup by the worker, because $maxExceptions is set to 1 (and I'd argue that a memory overflow constitutes an exception, or is our assumption wrong here?).

Steps To Reproduce

Create a job with:

  • timeout=10
  • tries=0
  • maxExceptions=1

Set 'retry_after' to something low, so you don't have to wait long for the workers to pick the job up again.
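For illustration, 'retry_after' is set per queue connection; the Redis connection in config/queue.php could be lowered like this (the value here is only an example):

// config/queue.php
'redis' => [
    'driver' => 'redis',
    'connection' => 'default',
    'queue' => env('REDIS_QUEUE', 'default'),
    'retry_after' => 15, // seconds; keep this low so the stalled job is picked up again quickly
    'block_for' => null,
],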

The handle() method of the job should look like this:

ini_set('memory_limit', '60M'); // lower the limit close to the worker's baseline memory usage

$leak = [];
$ticks = 0;
while (true) {
    $leak[$ticks] = file_get_contents(__FILE__);
    $ticks++;
    unset($leak[$ticks]); // unsets the next, not-yet-written index, so the array keeps growing
}

In our stack the initial memory used by a queue worker is approximately 55 MB. You may need to adjust the memory limit to something closer to your worker's initial memory usage.
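Putting the steps together, a complete job class for this reproduction could look roughly like the sketch below; the class name and namespace are made up, while the properties and handle() body mirror the snippet above:

<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;

// Hypothetical reproduction job; dispatch it with MemoryOverflowJob::dispatch().
class MemoryOverflowJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    public $timeout = 10;        // seconds before the worker should kill the job
    public $tries = 0;           // 0 = the job may be attempted indefinitely
    public $maxExceptions = 1;   // expected to fail the job after one unhandled exception

    public function handle(): void
    {
        ini_set('memory_limit', '60M'); // close to the worker's baseline memory usage

        $leak = [];
        $ticks = 0;
        while (true) {
            $leak[$ticks] = file_get_contents(__FILE__);
            $ticks++;
            unset($leak[$ticks]); // no-op on the next index, so the array keeps growing
        }
    }
}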

If you dispatch the job and watch what Horizon does, it will

  • run the job
  • report a PHP fatal error 'Allowed memory size of xxxx bytes exhausted'
  • pick the same job up again after the 'retry_after' period and run into the same issue; this goes on forever
@driesvints
Member

Thanks @graemlourens. Would appreciate any help here from a community member.


github-actions bot commented Nov 8, 2023

Thank you for reporting this issue!

As Laravel is an open source project, we rely on the community to help us diagnose and fix issues as it is not possible to research and fix every issue reported to us via GitHub.

If possible, please make a pull request fixing the issue you have described, along with corresponding tests. All pull requests are promptly reviewed by the Laravel team.

Thank you!

@graemlourens
Contributor Author

@driesvints thank you. Could you please classify this as a bug? As it's easily reproducible, I guess you were also able to reproduce it by now.

@driesvints
Member

I haven't been able to reproduce this, no. I currently don't have time to look into this.

@driesvints driesvints added the bug label Nov 8, 2023
@nearueng

nearueng commented Dec 8, 2023

This one is pretty interesting, and I've been wondering why some of our Horizon workers get into this loop. I recently added $tries = 1 to a few of them and that seems to have stopped it, but the ones that did not have this are still running into it.

I've just added $tries = 1 to these other jobs too and will report back if this was indeed the bug that was causing it.
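For reference, the workaround mentioned here is just the standard Laravel job property, set per job class:

public $tries = 1; // fail the job after a single attempt instead of retrying indefinitely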

@driesvints
Member

Moved to laravel/framework#49389

@graemlourens
Contributor Author

@driesvints You referenced the wrong ticket.

The issue here is about a memory overflow not considering maxExceptions.

The issue you say it was moved to is about something else (a timeout & jobs not being logged).

Please re-open, or move this to a new ticket, as these two things are entirely separate issues.

@driesvints
Member

@graemlourens what did you mean by this then:

I reported this in laravel/horizon but it turns out it has nothing to do with horizon but seems to be a framework issue.

@graemlourens
Contributor Author

@driesvints that was #1200, which I reported over a year ago (I'm a patient person :) ).
That issue I have now reported to the Laravel framework directly.

This one you just closed here is still open. However, it could well be that it has nothing to do with Horizon but only with Laravel. I'm happy to look into that, but I ask you to reopen it as long as it is not clear whether this is a Horizon or a Laravel issue. The fact remains that this situation is unhandled.

@driesvints driesvints reopened this Dec 22, 2023
@driesvints
Member

Ok re-opened.

@driesvints
Member

Hi @graemlourens. I'm sorry, but this has been open for quite a while already with no activity, and you're the only one experiencing this. That's why I'm going to close this one. If you ever figure out a solution, we'd appreciate a PR. Thanks.

@graemlourens
Contributor Author

@driesvints that's fine. It turns out it's not a Horizon issue but one in the Laravel framework itself, so an issue will have to be created there separately.

It can, however, be reproduced by anyone; I guess we just stumble on these rare edge cases because we have tons of jobs flowing through our systems daily.

I hope to have the time and patience to create the ticket in the laravel/framework repo at some point. However, I'm not very motivated right now, as my last ticket there regarding a serious queue issue was just closed without appropriate investigation, which is really frustrating.

@driesvints
Member

@graemlourens to be fair, I don't think it's worth it since you're the only one experiencing this right now. It's highly unlikely we'll find the time to deep dive into this soon. I know a fair number of other apps with tons of jobs going through Horizon that don't experience this at all.

@graemlourens
Contributor Author

@driesvints I understand. It would still have been nice, as this is so easily reproducible, to at least receive an acknowledgment that it is indeed the case, even if you are not able to deep dive into it.

An acknowledgment/reproduction of the error, or indeed a report that you are not able to reproduce it at all, is valuable information, which I am mostly missing when I report issues in Laravel or Horizon.

Thank you for your time

@shamil8124

shamil8124 commented Apr 10, 2024

Just FYI, when we were running a few million Horizon jobs a day, my workers would definitely stall and not exit gracefully. This would spike CPU usage.

I had to htop and manually kill the process once every few days.

It's definitely possible this was due to the nature of our app, since I'm sure others are running higher workloads.

We had a few different servers running Horizon processes and a single Redis node.

If I run into this issue again, I'll see about creating an issue too.

Just thought I'd chime in before this one is fully closed.
