Couldn't take snapshot error with replicas #2266
Hey @sameervitian, the error below is an interesting one and is linked to #2254. I am investigating it. What about the other replicas: do they also have similar logs?
Is the high load average causing any performance degradation?
@pawanrawal Yes, the others also have similar logs. I don't have a reference point to measure whether performance has degraded. Our read calls take ~200 ms on average (the /p folder is 15 GB). The machine has 32 GB RAM and 16 cores, so I would definitely like to downgrade it to cut costs.
I was able to reproduce the
@pawanrawal is
I am not sure. How do you get this load average value? The
Closing this, as the fix for the main issue (which was a bug) has been merged.
If you suspect this could be a bug, follow the template.
What version of Dgraph are you using?
Dgraph version : v1.0.4-dev
Commit SHA-1 : 807976c
Commit timestamp : 2018-03-22 14:55:24 +1100
Branch : HEAD
Have you tried reproducing the issue with latest release?
Yes
What is the hardware spec (RAM, OS)?
Ubuntu 14.04 / 16 cores, 32 GB RAM
Steps to reproduce the issue (command/config used to run Dgraph).
Config for Dgraph:
Three Dgraph servers are running in a cluster with a replication factor of 3. I can see from /state that all nodes are in groupId 1. The following starts appearing regularly in the logs, so something seems to be wrong.
Along with that, I see that the load average is in the range 6-13 on all servers. I am running this in production. CPU utilization is very low, and I am using SSDs for data.
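For reference when comparing these numbers: on Linux, the load average shown by `top` and `uptime` comes from `/proc/loadavg` (Python's `os.getloadavg()` returns the same three averages). Below is a minimal sketch, assuming a Linux host, that reads that file and also reports the runnable/total task counts it contains. `LoadAvg` and `parse_loadavg` are hypothetical helper names, not part of Dgraph.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    load1: float   # 1-minute load average
    load5: float   # 5-minute load average
    load15: float  # 15-minute load average
    running: int   # tasks runnable at the moment of the read
    total: int     # total tasks (threads) on the system


def parse_loadavg(text: str) -> LoadAvg:
    """Parse a line in /proc/loadavg format: 'load1 load5 load15 running/total lastpid'."""
    fields = text.split()
    if len(fields) < 4 or "/" not in fields[3]:
        raise ValueError(f"unexpected /proc/loadavg format: {text!r}")
    running, total = fields[3].split("/", 1)
    return LoadAvg(float(fields[0]), float(fields[1]), float(fields[2]),
                   int(running), int(total))


if __name__ == "__main__":
    try:
        with open("/proc/loadavg") as f:
            la = parse_loadavg(f.read())
    except OSError as err:
        # /proc/loadavg only exists on Linux.
        print("cannot read /proc/loadavg:", err)
    else:
        print(f"1m={la.load1:.2f} 5m={la.load5:.2f} 15m={la.load15:.2f} "
              f"runnable={la.running} total={la.total}")
```

Comparing `running` against the core count (16 on these machines) shows how many tasks are actually competing for CPU at the instant of the read, which the averaged load figures do not.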
The following are the CPU metrics from top:
This is what I see in vmstat:
iostat result:
As CPU idle time is high and I/O wait time is low, I would expect the load average to be lower. Also, are these frequently appearing logs alarming?
Could someone check what is wrong here?
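One possible explanation for a high load average on mostly idle CPUs, offered as a hypothesis rather than a diagnosis of this issue: the Linux load average counts tasks that are runnable (state `R`) plus tasks in uninterruptible sleep (state `D`), and it counts threads, not only processes. Threads blocked in `D` state raise the load without consuming CPU time. The sketch below, assuming Linux and readable `/proc/<pid>/task/<tid>/stat` files, tallies threads in those two states; `parse_stat_state` and `count_load_states` are hypothetical names.

```python
import glob
from collections import Counter


def parse_stat_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/.../stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split after the LAST ')' instead of on whitespace.
    """
    rparen = stat_line.rfind(")")
    rest = stat_line[rparen + 1:].split() if rparen != -1 else []
    if not rest:
        raise ValueError(f"unexpected stat format: {stat_line!r}")
    return rest[0]


def count_load_states(paths) -> Counter:
    """Count threads in R (runnable) and D (uninterruptible sleep) state."""
    counts = Counter()
    for path in paths:
        try:
            with open(path) as f:
                state = parse_stat_state(f.read())
        except (OSError, ValueError):
            continue  # thread exited during the scan, or file unreadable
        if state in ("R", "D"):
            counts[state] += 1
    return counts


if __name__ == "__main__":
    # Load average counts threads, so scan every task of every process.
    counts = count_load_states(glob.glob("/proc/[0-9]*/task/[0-9]*/stat"))
    print(f"runnable (R): {counts['R']}  uninterruptible (D): {counts['D']}")
```

Running it a few times while the load is high gives a rough picture: if the `D` count roughly tracks the load average, the extra load most likely comes from blocked threads rather than CPU work. The scanning thread itself always shows up as `R`.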