Generate diagnostic report from system menu #399

Open
ghukill opened this issue Apr 18, 2019 · 3 comments

Comments

@ghukill
Contributor

ghukill commented Apr 18, 2019

One consistent pain point is diagnosing problems in the operations of Combine. This is due, in part, to the variety of services that Combine relies on:

  • MySQL
  • MongoDB
  • ElasticSearch
  • Livy and Spark
  • Celery

Each has its own logs, which provide helpful information, but these are not readily available through the GUI.

Proposing a "Run Diagnostics" button that would generate a zip file full of potentially helpful information. Perhaps even a "Diagnostics" page that shows which services are up and operational.
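A rough sketch of what the zip step could look like in Python, using only the standard library. The function name `build_diagnostics_zip` and the idea of passing in a list of log paths are assumptions for illustration; where each service actually writes its logs would need to be filled in per deployment.

```python
import zipfile
from pathlib import Path


def build_diagnostics_zip(log_paths, zip_path):
    """Bundle whichever of the given log files exist into one zip archive.

    Returns the list of archive names that were written, so the caller
    can tell which logs were missing.
    """
    included = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for raw_path in log_paths:
            path = Path(raw_path)
            if not path.is_file():
                # A missing log is skipped rather than failing the whole report.
                continue
            # Store as "<parent directory>/<file name>" so logs from
            # different services do not overwrite each other.
            arcname = f"{path.parent.name}/{path.name}"
            archive.write(path, arcname=arcname)
            included.append(arcname)
    return included
```

Skipping missing files instead of raising means the report can still be produced when one service has never written a log, which is likely the situation in which someone wants diagnostics.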

@ghukill ghukill closed this as completed Apr 23, 2019
@ghukill ghukill reopened this Apr 23, 2019
@richardcadler

This looks like a very helpful thing to me.

@antmoth
Collaborator

antmoth commented Jul 31, 2019

Two possible things to go with this ticket:

  • An array of green/yellow/red status lights, per service
  • A pile of viewable logs somewhere

I'm thinking that what we need to do is to get the logs for everything co-located into a spot on the filesystem that Combine has access to, and allow the user to view them...

Only certain services will be amenable to the 'array of green lights' option if used from inside Django, I think, but it seems like we could potentially set up a 'status'/'diagnostics' page that bypasses all of the running services, allowing us to Check Stuff Out when Django is down?

@antmoth
Collaborator

antmoth commented Jul 31, 2019

I'm thinking that what we may want to do for the array-of-green-lights is to stitch together health-checks for all our services into a little command-line script, then set up (somehow?) a /status endpoint that doesn't rely on any of those services being up to run. The endpoint can call the script and construct HTML based on it? Am I totally off-base here?
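One way the script-plus-endpoint idea might be sketched, using only Python's standard library so that it does not depend on Django or any of the services being up. Everything named here (`run_checks`, `render_status_html`, `make_handler`, `serve`, and the port) is hypothetical, and each check is assumed to be a zero-argument function returning `"green"`, `"yellow"` or `"red"`.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_checks(checks):
    """Run each named check and collect its status.

    Any exception is reported as "red", so one dead service cannot
    break the report for the others.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = check()
        except Exception:
            results[name] = "red"
    return results


def render_status_html(results):
    """Render the results as a small HTML table, one row per service."""
    rows = "".join(
        f"<tr><td>{html.escape(name)}</td>"
        f"<td class='{html.escape(status)}'>{html.escape(status)}</td></tr>"
        for name, status in results.items()
    )
    return f"<table>{rows}</table>"


def make_handler(checks):
    """Build a request handler that answers GET /status with the table."""

    class StatusHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/status":
                self.send_error(404)
                return
            body = render_status_html(run_checks(checks)).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return StatusHandler


def serve(checks, port=8099):
    """Serve /status until interrupted. The port is an arbitrary assumption."""
    HTTPServer(("127.0.0.1", port), make_handler(checks)).serve_forever()
```

Running this as its own small process is what would keep the page reachable when Django is down; the trade-off is one more process to keep alive.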

Celery: There's apparently a web monitor program called Flower (as in flow, not as in botany). It's also possible to query redis-cli to monitor queue lengths.

Livy/Spark: Here, getting the status of all the Livy sessions might be the best we can do.
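For Livy, a minimal sketch might call its REST API's `GET /sessions` and count sessions by state. The function names and the default URL are assumptions (8998 is Livy's usual default port, but the deployment may differ), and how the counts should map to green/yellow/red is left open.

```python
import json
from collections import Counter
from urllib.request import urlopen


def summarize_livy_sessions(body):
    """Count sessions by their "state" field in a GET /sessions JSON body."""
    payload = json.loads(body)
    sessions = payload.get("sessions", [])
    return dict(Counter(session.get("state", "unknown") for session in sessions))


def fetch_livy_session_summary(base_url="http://localhost:8998", timeout=5):
    """Fetch /sessions from Livy and summarize it; base_url is an assumption."""
    with urlopen(f"{base_url}/sessions", timeout=timeout) as response:
        return summarize_livy_sessions(response.read().decode("utf-8"))
```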

ElasticSearch: Actually has a health-check endpoint: GET _cluster/health.
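The ElasticSearch check maps neatly onto the status lights, since the `_cluster/health` response carries a `status` field that is itself `green`, `yellow` or `red`. A sketch with the standard library follows; the function names and the default URL are assumptions, and treating an unreachable or unparseable response as `red` is a design choice rather than anything ElasticSearch prescribes.

```python
import json
from urllib.request import urlopen

VALID_STATUSES = ("green", "yellow", "red")


def parse_cluster_health(body):
    """Return the "status" from a _cluster/health JSON body, or "red" if absent."""
    payload = json.loads(body)
    status = payload.get("status") if isinstance(payload, dict) else None
    return status if status in VALID_STATUSES else "red"


def check_elasticsearch(base_url="http://localhost:9200", timeout=5):
    """Query _cluster/health; any connection or parsing failure counts as "red"."""
    try:
        with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as response:
            return parse_cluster_health(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses; invalid JSON and
        # malformed URLs raise ValueError.
        return "red"
```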

Mongo: The mongo CLI has a ping command.

MySQL: It might be that the best way to check MySQL health is to try connecting to the db and performing a SELECT 1;.
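The `SELECT 1` approach could be sketched against the generic Python DB-API, so it does not commit to a particular MySQL driver. `check_sql` is a hypothetical name, and `connect` is assumed to be a zero-argument callable that returns a DB-API 2.0 connection built from whatever driver and credentials Combine already uses.

```python
def check_sql(connect):
    """Report "green" if a connection can be opened and SELECT 1 returns 1."""
    try:
        connection = connect()
    except Exception:
        # Could not connect at all: the database is down or unreachable.
        return "red"
    try:
        cursor = connection.cursor()
        cursor.execute("SELECT 1")
        row = cursor.fetchone()
        return "green" if row is not None and row[0] == 1 else "red"
    except Exception:
        return "red"
    finally:
        connection.close()
```

Because it only relies on `cursor()`, `execute()` and `fetchone()`, the same function can be tried locally with `sqlite3.connect(":memory:")` as a stand-in before pointing it at MySQL.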
