Long running PHP PubSub process that eats all cpu exactly 1h into processing #2338
@udAL that sounds really frustrating 🙁. We'll definitely take a look to see where/how we can help. Do you have the protobuf/gRPC extensions enabled? If so, would you be able to share the versions of each? As an aside, would you be able to share the compatibility issues you're experiencing? Maybe there's something we can do there to help as well.
Hi @dwsupplee
Our compatibility issues come mainly from the requirements of the package kreait/firebase-php. If I uninstall it and update, I can get:
I'll do some tests to see if this was part of the problem.
With the latest versions of the packages the result is the same: 100% CPU one hour into the script.
To the best of my knowledge, the connection to PubSub freezes one hour in. I don't know how or why, but since PubSub is usually used in long-running processes, isn't that something that should be taken into account? To hotfix the problem we installed cron and restarted the process every 50 minutes, but that has its own problems, since a task could be caught in the middle and, not being acknowledged, would be reprocessed. I have put a timer in the script to destroy and recreate the PubSub object to reconnect, and this should hotfix the problem for now.
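The cron-based restart workaround described above might look roughly like this (the `pubsub-worker` program name is a hypothetical placeholder; note that `*/50` actually fires at minutes 0 and 50 of every hour, which only approximates "every 50 minutes"):

```
# /etc/cron.d/restart-worker -- sketch only; "pubsub-worker" is a placeholder
# Fires at minutes 0 and 50 of each hour, approximating a 50-minute cycle
*/50 * * * * root supervisorctl restart pubsub-worker
```

As noted, the downside is that a message being processed when the restart fires is never acknowledged and gets redelivered.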
This also occurred on google-cloud-python around Dec 2017. Does that help?
Is there anything being done to solve this?
About cURL error 35: gnutls_handshake(): we experience a similar issue in one app hosted on the PHP flexible environment (see screenshot), and we use supervisord to keep the worker stable. Another thing: we run different environments of this application, each in a different project, and currently we see this problem in only one of the three projects. We may try redeploying to a new project; maybe the problem is specific to a network zone.
I did not have the protobuf or grpc extensions installed.
Shouldn't these extensions be required by google/cloud-core? EDIT: False 'all is good' alarm. It still spikes to 100% CPU.
Not all of our libraries that depend on core require the extensions, and often users may find themselves on something like shared hosting where the extensions aren't even an option. As a result, we've worked to make sure they aren't hard requirements.
Gah, sorry to hear that. I'm setting up an environment to do some testing here soon.
Any news?
@udAL I've been working on reproducing. Would you be able to share the modified Dockerfile you're using?
@udAL are you able to share this additional info? :)
Hi, sorry for the delay. Here you can find a project that can reproduce the error. It requires:
To deploy:
To reproduce:
You don't need to send any message to it. Thanks
Any progress?
@jdpedrie Could you reproduce the error?
Hello there!
Hi @udAL, I'm really sorry for the lengthy delay. I'm looking into this now.
How many messages, on average, does the worker generally process before the CPU usage spikes? I've been running your sample application for quite some time (nearly 90 minutes) without seeing any sign of unusually high CPU usage. :-/ Have you tried using a specific tag of the
The error I'm getting now is the
It pops up at exactly 60 minutes. Have you seen the error on It doesn't seem to depend on how many messages it processes. In tests, the error appeared without processing any messages.
I could, but what version should I use? I don't know of any that has worked. Online I've found other
Have you tried running the container outside of GCE? Does the error still happen? If you haven't tried, please confirm that. If it works outside GCE, I'll see about having a member of the network team investigate.
I've just tested both locally and on GCE.
Thank you for checking. Let me pass this information along to our team and we'll try to see what the next steps are. Thank you again for your patience, I absolutely know how frustrating this kind of thing can be.
Looks like #2562 may be related.
@udAL would you be able to reach out to Google Cloud Support and provide your project ID and the name of your GCE VM that was exhibiting the issue? You can reference this GitHub issue and that the Pub/Sub team asked you to provide this information for debugging. Thanks!
@kamalaboulhosn Absolutely! Sorry if this is obvious, but where should I contact them? I can't find an email address or a form.
Hi @meredithslota, I hope this issue will be fixed soon... I don't understand how an issue can stay open for more than 2 years... Please, we need this fixed as soon as possible. Thanks for your reply
Hi @dpassola, let me apologise for this issue staying unresolved for such a long time.
Hi @saranshdhingra, thanks for your reply.
Hi @dpassola, I was able to replicate the issue in a GCE instance.
Hi @saranshdhingra, thank you for your feedback. Keep me updated!
To confirm whether the issue persists on different machine types, I tested the same setup on the following machine types: For anyone interested, I used this Terraform script to generate multiple GCE instances.
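A minimal Terraform sketch for spinning up one GCE instance per machine type (the machine types, names, zone, and image here are assumptions for illustration, not the actual script used):

```hcl
# Sketch only: creates one instance per machine type (all values are assumptions)
variable "machine_types" {
  default = ["e2-small", "n1-standard-1", "n2-standard-2"]
}

resource "google_compute_instance" "repro" {
  count        = length(var.machine_types)
  name         = "pubsub-repro-${count.index}"
  machine_type = var.machine_types[count.index]
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  network_interface {
    network = "default"
  }
}
```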
Hey @dpassola, just wanted to update you (since it's been a week): we're discussing this internally right now and attempting to zero in on the source.
Hi Saransh, thanks a lot for keeping me updated!
I'm so glad you're working on it!
Santa clou(d)s please help us...
Hi @saranshdhingra, any news?
I agree it's time to light a huge fire, join hands in a circle around it, look at the skies and pray. When everything else fails...
Hi @dpassola. So far I have tried the same setup with Node on different machines, and it doesn't seem to replicate.
Hi @dpassola, I did come across something today, so I wanted to ask a few questions about the versions of the libraries installed. In an old message in this thread, it was mentioned:
Can you tell me the current versions that are running? The reason I ask is that, using the Docker image from the project specified by @udAL, we are able to replicate the issue on a VM instance, but the version of I tried a manual You can use the
Hi @saranshdhingra, here are our versions as shown by `composer show` (there are more libraries installed):
Thanks!
@saranshdhingra, yes, our composer.json and Dockerfile are the same as mentioned.
@dpassola Please do so in a test project.
Hi @saranshdhingra, if we put
we have deployed this code and are waiting to test this new version.
Hi @dpassola, keep the caret present and then try to run As you can see, it's currently downgrading the pubsub library to
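For reference, a caret version constraint in composer.json looks like the following (the version numbers are the ones mentioned earlier in the thread; treat this as a sketch of the syntax, not a recommended set):

```json
{
    "require": {
        "google/cloud-pubsub": "^1.11",
        "google/cloud-core": "^1.27"
    }
}
```

With the caret, `composer update` may resolve to any 1.x release at or above the stated version, so the installed versions can end up newer than the ones written in the file.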
Hi @saranshdhingra, here are the library versions with the caret :) Sorry for my mistake. List of libraries:
composer.json content: (in grey, the versions our IDE detects will be installed)
Hi @saranshdhingra, our composer.json is as I showed yesterday, but we don't have composer update in our Dockerfile... We are testing the new configuration.
Hi @dpassola, were you able to test the process with the new version of the libraries?
Hi @saranshdhingra, thank you, and sorry for the delay in replying. We are testing the process. At the moment it is working fine! In the next few days I'll send you my feedback again. Thanks!
Hi @dpassola, I'll be closing this issue for now and will try to dig more into the specific version that fixes this issue and the reason for it. If you find that the issue is still not fixed, feel free to resume the thread. Again, apologies for such a delay on the resolution, but I can only hope it works from this point forward :)
Upon more digging, I found some info which may be useful for folks who may be hitting this issue on old versions.
Hello, it has been in production for 17 days and, so far, everything is working and we have not had to restart it. This is great news! Have a great year! Thank you @saranshdhingra!
This is more of a plea for help than a bug report...
google/cloud-pubsub 1.11.0
google/cloud-core 1.27.0
(outdated because of compatibility reasons)
We have been using PubSub for a while to manage background tasks. We have a GCE VM running supervisord with two long-running PHP processes that check for new messages and process them. It looks something like this:
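The supervisord setup described (two long-running PHP workers kept alive and restarted on exit) might look roughly like this; the program name, command, and paths are hypothetical placeholders:

```ini
; /etc/supervisor/conf.d/pubsub-worker.conf -- sketch; names and paths are placeholders
[program:pubsub-worker]
command=php /app/worker.php          ; long-running pull loop (hypothetical script)
numprocs=2                           ; two worker processes, as described above
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true                     ; supervisord restarts the worker if it dies
```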
As discussed in #1986, since we started we have seen these errors popping up every hour:
It wasn't a big deal for us, since we had supervisord to keep the processes alive. That changed a few days ago, when we updated our Docker image (which we do frequently). Suddenly our worker VMs were constantly at 100% CPU, even though there was nothing to process on PubSub and no task was running. We could restart the processes and they would work fine for exactly one hour, then jump to 100% CPU for no apparent reason. We're stuck with an essential server that needs restarting every 60 minutes.
We have done a lot of debugging, and the only difference we found between the old and new machines is that the Ubuntu package libcurl3-gnutls was updated from 7.47.0-1ubuntu2.12 to 7.47.0-1ubuntu2.14. We haven't found a way to downgrade. We have tried to force Guzzle to use TLS 1.1. PubSub allows passing parameters to it like this:
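The snippet referenced here seems to have been lost in formatting; based on the description, it was presumably a Guzzle/cURL option array along these lines (the `restOptions` key and exactly how the client forwards these options to Guzzle are assumptions, not confirmed by this thread):

```php
<?php
// Sketch: cURL options forcing a TLS version, in Guzzle's request-option shape.
// Whether and how the Pub/Sub client forwards these (e.g. via a "restOptions"
// config key) is an assumption, not confirmed by this thread.
$restOptions = [
    'curl' => [
        CURLOPT_SSLVERSION => CURL_SSLVERSION_TLSv1_2, // also tried TLSv1, TLSv1_0
    ],
];
```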
But it hasn't worked... Not with CURL_SSLVERSION_TLSv1, nor with CURL_SSLVERSION_TLSv1_0 or CURL_SSLVERSION_TLSv1_2.
I know this is a complex error; it probably has nothing to do with google/cloud-pubsub, and this is an outdated version. Maybe it's a TLS incompatibility with the Google PubSub service, or an outdated library in an updated OS. I really don't know, but we'll take any suggestions you can give us... Any ideas?
Tomorrow I'll try a clean project with updated packages and post the results.