This repository has been archived by the owner on Sep 22, 2021. It is now read-only.

Added health monitor #697

Merged
niklabh merged 31 commits into master from niklabh-feature-health-monitor on May 12, 2020

Contributor

@niklabh niklabh commented Apr 22, 2020

Closes: #684

The nodewatcher job check (get the blockIndex in chain-db, wait 10s, then get it again; it should have incremented) is not done yet: it needs a service addition in k8s.
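
A minimal sketch of the "read, wait 10s, read again" idea described above. The names `FetchBlockIndex`, `hasAdvanced` and `nodewatcherIsIndexing` are hypothetical and not taken from this PR; how the block index is actually read from chain-db is left to the injected function.

```typescript
// Hypothetical helper: only the "read, wait, read again" idea comes from the PR text.
type FetchBlockIndex = () => Promise<number>;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Pure comparison, kept separate so it can be tested without waiting.
function hasAdvanced(before: number, after: number): boolean {
  return Number.isFinite(before) && Number.isFinite(after) && after > before;
}

// Reads the block index twice, waitMs apart, and reports whether it moved forward.
async function nodewatcherIsIndexing(
  fetchBlockIndex: FetchBlockIndex,
  waitMs: number = 10000
): Promise<boolean> {
  const before = await fetchBlockIndex();
  await sleep(waitMs);
  const after = await fetchBlockIndex();
  return hasAdvanced(before, after);
}
```

Injecting the fetch function keeps the sketch independent of whichever client ends up talking to chain-db.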

@niklabh niklabh marked this pull request as ready for review April 27, 2020 08:00
@niklabh niklabh requested a review from Tbaut April 27, 2020 08:00
Collaborator

@Tbaut Tbaut left a comment

I'd like to have additional deeper queries to assess the situation better:

  • Just because hasura and auth are green doesn't mean the connection between the two is fine (LB problems, etc.). Can we add a hasura->auth check that queries the last 10 posts with their author and name?
  • Same for hasura->chain-db: can we query the last 10 posts where onchain_link.referendum_id !== null, along with onchain_referendum.delay?
  • Same for nodewatcher-server->node-watcher-deployment:
query{
  referendums(last:10){
    delay
  }
}

^ This is the link that breaks regularly, and nothing monitors it as of now. This is not a replacement for the nodewatcher check (below), but an additional "link" check, I'd say.
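
One way such a "link" check could evaluate the response to the query above, as a rough sketch. The type and function names are hypothetical, and treating `delay` as a number and an empty result as a failure are assumptions, not something this thread confirms.

```typescript
// Query text taken from the review comment above.
const REFERENDUM_LINK_QUERY = "query { referendums(last: 10) { delay } }";

// Hypothetical response shape; `delay` being numeric is an assumption.
interface ReferendumRow {
  delay?: number | null;
}

interface GraphQLResponse {
  data?: { referendums?: ReferendumRow[] | null } | null;
  errors?: unknown[];
}

// The link counts as healthy only if the server answered without errors
// and every returned row carries a delay.
function referendumLinkIsHealthy(response: GraphQLResponse): boolean {
  if (response.errors && response.errors.length > 0) return false;
  const rows = response.data ? response.data.referendums : undefined;
  // Assumption: no rows means the link could not be verified.
  if (!Array.isArray(rows) || rows.length === 0) return false;
  return rows.every((row) => typeof row.delay === "number");
}
```

Sending `REFERENDUM_LINK_QUERY` is left out on purpose, since the endpoint URL and client depend on the deployment.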

> is not done and service addition in k8 is needed
Right, we don't have direct access to it.

Contributor Author

niklabh commented Apr 30, 2020

Done

Collaborator

@Tbaut Tbaut left a comment

LGTM for the code, just a couple of comments. Testing it now.

niklabh and others added 3 commits May 4, 2020 20:05
Collaborator

@Tbaut Tbaut left a comment

I had a play with it and things look good. I think it'd be important to check with @fevo1971 to find out what is possible for his health monitor.

I don't think we can do too many crazy things. Maybe he would expect something like a 200 when everything is fine, and a 5XX with a message describing what went wrong, because the prisma version, for instance, won't help him. Also, I ran a test with a chain-db and no referenda, and I still get a 200 even though:

onchainLinkReferendumDelays | false
referendumDelays | false

This shouldn't be the case.
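
A sketch of the behaviour suggested above: 200 only when every check passes, otherwise a 5XX that names the failing checks. The function and type names are hypothetical, and 503 is just one possible 5XX code chosen for illustration.

```typescript
// Map of check name to result, e.g. { referendumDelays: false }.
type CheckResults = Record<string, boolean>;

interface HealthResponse {
  statusCode: number;
  body: { healthy: boolean; failing: string[] };
}

// Any failing check turns the whole response into a 5XX and is listed by name.
function toHealthResponse(checks: CheckResults): HealthResponse {
  const failing = Object.keys(checks).filter((name) => !checks[name]);
  const healthy = failing.length === 0;
  // 503 is an assumption; the thread only asks for "a 5XX".
  return { statusCode: healthy ? 200 : 503, body: { healthy, failing } };
}
```

With the two results shown above both false, this sketch would answer with a 503 and list `onchainLinkReferendumDelays` and `referendumDelays` as failing.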

Contributor Author

niklabh commented May 11, 2020

onchainLinkReferendumDelays

Throwing errors

Collaborator

Tbaut commented May 12, 2020

  • Fixed the bug with onchain_links being an array, which made the health monitor fail constantly.
  • Fixed the check so that it actually queries the chain-db server. It could have failed without the health check noticing.
  • Upgraded polkadot/api (it kept showing errors in the console).
  • Added a better check to avoid failing when id === 0.
  • Changed the emoji (it was not showing in bash).
  • Added some logs that clearly show the URLs used.
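
Two of the fixes above (onchain_links being an array, and id === 0) are easy to get wrong, so here is an illustrative sketch. It is not the code from these commits; the `Post` and `OnchainLink` shapes and both function names are assumptions built around the field names mentioned in this thread.

```typescript
// Hypothetical shapes; only onchain_links and referendum_id appear in the thread.
interface OnchainLink {
  referendum_id?: number | null;
}

interface Post {
  onchain_links?: OnchainLink[] | null;
}

// 0 is a valid id, so compare against null/undefined instead of relying on truthiness.
function hasId(id: number | null | undefined): id is number {
  return id !== null && id !== undefined;
}

// onchain_links is an array, so look through its elements
// rather than reading referendum_id off the array itself.
function postHasReferendum(post: Post): boolean {
  const links = Array.isArray(post.onchain_links) ? post.onchain_links : [];
  return links.some((link) => hasId(link.referendum_id));
}
```

A plain `if (id)` test would reject the referendum with id 0, which is the kind of failure the id === 0 fix addresses.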

Collaborator

@Tbaut Tbaut left a comment

Looks good, please have a look at my commits before merging.

Contributor Author

niklabh commented May 12, 2020

Working nicely. We just need the k8s setup now.

@niklabh niklabh merged commit 488a092 into master May 12, 2020
@niklabh niklabh deleted the niklabh-feature-health-monitor branch May 12, 2020 18:42

Successfully merging this pull request may close these issues.

One GET endpoint for monitoring