One consistent pain point is diagnosing problems in the operations of Combine. This is due, in part, to the variety of services that Combine relies on:
MySQL
MongoDB
ElasticSearch
Livy and Spark
Celery
Each has its own logs, which provide helpful information, but these are not readily available through the GUI.
Proposing a "Run Diagnostics" button that would generate a zip file full of potentially helpful information. Perhaps even a "Diagnostics" page that shows which services are up and operational.
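As a rough sketch of what the "Run Diagnostics" button could do behind the scenes, the helper below bundles a list of log files into a zip archive. The function name `build_diagnostics_zip` and the log paths passed to it are hypothetical; where each service actually writes its logs depends on the deployment.

```python
import zipfile
from pathlib import Path


def build_diagnostics_zip(log_paths, out_path):
    """Write every existing file in log_paths into a zip archive at out_path.

    Returns the list of paths that were missing, so the caller can report them.
    """
    missing = []
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for raw in log_paths:
            path = Path(raw)
            if path.is_file():
                # Prefix with the parent directory name to reduce name clashes
                # between services that both write e.g. "error.log".
                archive.write(path, arcname=f"{path.parent.name}/{path.name}")
            else:
                missing.append(str(path))
    return missing
```

A Django view could call this with whatever log locations are configured and return the resulting file as a download; reporting the missing paths is itself a useful diagnostic.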
An array of green/yellow/red status lights, per service
A pile of viewable logs somewhere
I'm thinking that what we need to do is to get the logs for everything co-located into a spot on the filesystem that Combine has access to, and allow the user to view them...
I think only certain services will be amenable to the 'array of green lights' option if used from inside Django. But it seems like we could potentially set up a 'status'/'diagnostics' page that bypasses all of the running services, allowing us to Check Stuff Out when Django is down?
I'm thinking that what we may want to do for the array-of-green-lights is to stitch together health-checks for all our services into a little command-line script, then set up (somehow?) a /status endpoint that doesn't rely on any of those services being up to run. The endpoint can call the script and construct HTML based on it? Am I totally off-base here?
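A minimal sketch of such a stand-alone script, assuming plain TCP reachability is an acceptable first-pass health check. The `SERVICES` table uses each service's usual default port on `localhost`, and it assumes Redis is the Celery broker; all of these are assumptions to adjust for the real deployment. The names `check_port`, `run_checks`, and `render_html` are hypothetical.

```python
import socket

# Assumed hosts and default ports; adjust to the actual deployment.
SERVICES = {
    "MySQL": ("localhost", 3306),
    "MongoDB": ("localhost", 27017),
    "ElasticSearch": ("localhost", 9200),
    "Livy": ("localhost", 8998),
    "Redis (Celery broker)": ("localhost", 6379),
}


def check_port(host, port, timeout=2.0):
    """Return "green" if a TCP connection succeeds, otherwise "red"."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "green"
    except OSError:
        return "red"


def run_checks(services):
    """Check every service and return a {name: status} mapping."""
    return {name: check_port(host, port) for name, (host, port) in services.items()}


def render_html(results):
    """Build a bare HTML list that a /status endpoint could serve."""
    rows = "".join(
        f'<li class="{status}">{name}: {status}</li>'
        for name, status in results.items()
    )
    return f"<ul>{rows}</ul>"


if __name__ == "__main__":
    for name, status in run_checks(SERVICES).items():
        print(f"{name}: {status}")
```

Because it only uses the standard library, the script has no dependency on Django or on any of the services it checks. An open port only shows that something is listening, so this distinguishes green from red but cannot produce a yellow.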
Celery: There's apparently a web monitor program called Flower (as in flow, not as in botany). It's also possible to query redis-cli to monitor queue lengths.
Livy/Spark: Here, getting the status of all the Livy sessions might be the best we can do.
ElasticSearch: Actually has a health-check endpoint: GET _cluster/health.
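For the two services with HTTP APIs, the responses can be mapped to status lights fairly directly. The sketch below assumes ElasticSearch and Livy are on their default ports on `localhost`. The `status` field of `_cluster/health` is already green/yellow/red; the mapping from Livy session states to colours is my own guess at a reasonable policy, not something Livy defines. The function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed default base URLs; adjust to the actual deployment.
ES_HEALTH_URL = "http://localhost:9200/_cluster/health"
LIVY_SESSIONS_URL = "http://localhost:8998/sessions"


def fetch_json(url, timeout=5.0):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def elasticsearch_status(health):
    """Map a _cluster/health response body to a status light."""
    status = health.get("status")
    return status if status in ("green", "yellow", "red") else "red"


def livy_status(sessions_body):
    """Summarise a Livy GET /sessions response body as a status light."""
    states = [s.get("state") for s in sessions_body.get("sessions", [])]
    if not states:
        return "yellow"  # Livy answered, but there is no session to run jobs in
    if any(state in ("error", "dead", "killed") for state in states):
        return "red"
    if all(state in ("idle", "busy") for state in states):
        return "green"
    return "yellow"  # e.g. sessions still starting or shutting down


def check(url, interpret):
    """Fetch a URL and interpret it; any network or parse failure is red."""
    try:
        return interpret(fetch_json(url))
    except (OSError, ValueError):
        return "red"
```

Usage would look like `check(ES_HEALTH_URL, elasticsearch_status)` and `check(LIVY_SESSIONS_URL, livy_status)`. Keeping the interpretation separate from the fetch makes the colour policy easy to change and to test without a running cluster.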