Monitor NFS servers - critical diagnostics to understand issues #2242

Closed
1 task
consideRatio opened this issue Feb 22, 2023 · 3 comments

Comments

@consideRatio
Member

consideRatio commented Feb 22, 2023

Ideally we would be able to monitor the NFS servers we rely on directly in the Grafana instances, but if we can't do that, we need at least some way to tell whether the NFS servers are overloaded.

As I understand it, we rely on the cloud-provided NFS services GCP Filestore and AWS EFS. If we can't give the Grafana instances access to those data sources and import predefined dashboards for them, we should at least learn how to monitor them using the cloud console.

Cloud services

Action points

  • Explore the options to monitor NFS services performance and come up with refined action points

Related

@pnasrat
Contributor

pnasrat commented Feb 22, 2023

I believe @yuvipanda already has some graphs that could be added

@abkfenris
Contributor

I just encountered this kind of issue on EFS, and it took a lot of digging to understand what is going on.

EFS has three throughput modes. Bursting is the default, and AWS does some sneaky stuff to make sure it's initially fast, but if you don't put enough data on it right away, you can hit a wall and end up with really variable, hard-to-diagnose performance.

The key metrics for EFS to look at are Burst Credit Balance, Permitted Throughput, and Throughput Utilization.
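To make the relationship between these metrics concrete, here is a minimal sketch of how throughput utilization could be derived from raw CloudWatch datapoints. It assumes that the metered I/O arrives as a sum of bytes over one period (the `MeteredIOBytes` metric) and that permitted throughput is reported in bytes per second; the function name is hypothetical and not part of any AWS library.

```python
def throughput_utilization_percent(
    metered_io_bytes_sum: float,
    period_seconds: float,
    permitted_throughput_bytes_per_s: float,
) -> float:
    """Return the share of permitted throughput actually used, in percent.

    Assumptions (not taken from the thread): metered_io_bytes_sum is the
    "Sum" statistic of metered I/O bytes over one period, and permitted
    throughput is expressed in bytes per second.
    """
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    if permitted_throughput_bytes_per_s <= 0:
        raise ValueError("permitted_throughput_bytes_per_s must be positive")

    # Average throughput actually achieved during the period, in bytes/s.
    actual_bytes_per_s = metered_io_bytes_sum / period_seconds
    return 100.0 * actual_bytes_per_s / permitted_throughput_bytes_per_s


if __name__ == "__main__":
    MIB = 1024 * 1024
    # 600 MiB transferred in 60 s against a permitted 100 MiB/s.
    print(throughput_utilization_percent(600 * MIB, 60, 100 * MIB))
```

A utilization that sits near 100% while the burst credit balance is falling would be one sign that the file system is about to drop to its baseline throughput.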

If that's what you are encountering, I'd be happy to pull together some of the resources that I found while trying to diagnose it.

@consideRatio
Member Author

@abkfenris thanks for sharing that - sorry for super-late followup!

I verified that I could see such metrics via AWS CloudWatch right away - nice!!
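For anyone who wants to pull the same metrics programmatically rather than through the console, here is a rough sketch of the query parameters. It only builds the keyword arguments, so it runs without AWS credentials; the helper name, the file system ID, and the choice of statistic per metric are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def efs_metric_query(file_system_id, metric_name, end_time, hours=24, period_seconds=300):
    """Build keyword arguments for a CloudWatch metric-statistics request.

    Hypothetical helper: byte counters are summed per period, while the
    other metrics are averaged. Adjust to taste.
    """
    statistic = "Sum" if metric_name == "MeteredIOBytes" else "Average"
    return {
        "Namespace": "AWS/EFS",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "FileSystemId", "Value": file_system_id}],
        "StartTime": end_time - timedelta(hours=hours),
        "EndTime": end_time,
        "Period": period_seconds,
        "Statistics": [statistic],
    }


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # "fs-EXAMPLE" is a placeholder, not a real file system ID.
    for name in ("BurstCreditBalance", "PermittedThroughput", "MeteredIOBytes"):
        print(efs_metric_query("fs-EXAMPLE", name, now))
```

With boto3 installed and credentials configured, each dictionary could be passed as `boto3.client("cloudwatch").get_metric_statistics(**query)`.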

image
image


Closing this issue as it's outdated and stale.

Status: Needs Shaping / Refinement
3 participants