
do not create a new cert each time akash-provider gets restarted #9

Closed
arno01 opened this issue Feb 18, 2022 · 10 comments · Fixed by #36
Comments

arno01 (Contributor) commented Feb 18, 2022

After checking with @88plug, it looks like akash-provider is creating a new cert each time it gets restarted:

$ akash query cert list --owner akash10fl5f6ukr8kc03mtmf8vckm6kqqwqpc04eruqa -o json | jq | grep state | sort | uniq -c
     58         "state": "revoked",
     27         "state": "valid",
$ akash query txs --events "message.sender=akash10fl5f6ukr8kc03mtmf8vckm6kqqwqpc04eruqa&message.action=cert-create-certificate" --page 1 --limit 100  | jq -r '.txs[] | [ .txhash, .timestamp, (.tx.body.messages[] | [  ."@type", .cert, .owner, .host_uri ] )[] ] | @csv' | awk -F',' -v OFS=',' '{tmp="echo " $4 " | openssl base64 -A -d | openssl x509 -ext subjectAltName -noout | xargs"; tmp | getline cksum; $4=cksum; print;}'

...
...
"4B22C17F35A7B08D87E2CC7DAA6AC4D474B4E9B4FFAEE3F3640C1B7F94DD4EE8","2022-02-16T16:30:53Z","/akash.cert.v1beta1.MsgCreateCertificate",X509v3 Subject Alternative Name: DNS:provider.akash.world,"akash10fl5f6ukr8kc03mtmf8vckm6kqqwqpc04eruqa",
"26844B77B5539385059AEF901F0F03AC23F87C459F2006EFB963F6D0E3790290","2022-02-16T16:39:38Z","/akash.cert.v1beta1.MsgCreateCertificate",X509v3 Subject Alternative Name: DNS:provider.akash.world,"akash10fl5f6ukr8kc03mtmf8vckm6kqqwqpc04eruqa",
"480F683DDA2CCDB22B73FEDC13BDD221108F44758386A8F4121EAF0B7274A112","2022-02-16T17:47:15Z","/akash.cert.v1beta1.MsgCreateCertificate",X509v3 Subject Alternative Name: DNS:provider.akash.world,"akash10fl5f6ukr8kc03mtmf8vckm6kqqwqpc04eruqa",
...
...

Based on the configmap-boot template:
https://github.com/ovrclk/helm-charts/blob/688b55b5/charts/akash-provider/templates/configmap-boot.yaml#L64

    /bin/akash tx cert create server provider.{{ .Values.domain }} $POPTS || exit 1

I think it should really save the ~/.akash/<akash1....>.pem file in a local volume somewhere, so it won't attempt to recreate the cert each time it gets restarted; instead, it would first detect that the cert is already present.

Local volumes can be added this way:
https://kubernetes.io/docs/concepts/storage/volumes/#local

There are more alternatives in K8s; most of them are covered on that page.
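A minimal sketch of how run.sh could skip the cert creation when the pem is already present on a mounted volume (AKASH_HOME, ACCOUNT_ADDRESS and DOMAIN are illustrative placeholders here, not the chart's actual variable names):

    # Skip cert creation when the pem already exists on the mounted volume.
    # AKASH_HOME/ACCOUNT_ADDRESS/DOMAIN are placeholders for illustration only.
    CERT_PEM="${AKASH_HOME:-$HOME/.akash}/${ACCOUNT_ADDRESS}.pem"

    if [ -f "$CERT_PEM" ]; then
      echo "Found existing certificate at $CERT_PEM, skipping cert creation."
    else
      /bin/akash tx cert create server "provider.${DOMAIN}" $POPTS || exit 1
    fi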

88plug (Contributor) commented Feb 18, 2022

Already solved in new fork -

boz (Contributor) commented Feb 18, 2022

I think we should create the state that's needed before running helm and then have helm package it up into secrets/config, where we can mount it into the pod.

  • create keys
  • create certificate
  • fund account
  • create/update provider attributes

can all be done locally, before installing the helm chart.

Local storage works well but it depends on node affinity, which can be cumbersome and/or problematic.

boz (Contributor) commented Feb 19, 2022

For instance, this is how we used to include an init script:

https://github.com/ovrclk/akash/blob/v0.5.0/_run/multi/akash-provider/templates/configmaps.yaml#L11-L12

arno01 (Contributor, Author) commented Feb 21, 2022

Since we can query the blockchain before invoking akash provider run, I don't see the need to add any additional manual steps.

  • create keys

That's already done locally and injected upon helm install.

  • create certificate

The existing run.sh would just need to check the cert's validity and SAN before invoking akash tx cert create server provider.{{ .Values.domain }} ...

This command can be used; it gets the last available valid cert:

$ akash query cert list --state=valid --owner=akash10fl5f6ukr8kc03mtmf8vckm6kqqwqpc04eruqa | jq -r '.certificates[-1].certificate.cert | @base64d' | openssl x509 -noout -enddate -ext subjectAltName
notAfter=Feb 18 17:49:57 2023 GMT
X509v3 Subject Alternative Name: 
    DNS:provider.akash.world
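
A sketch of how run.sh could wrap that query to decide whether a new cert is actually needed (ACCOUNT_ADDRESS and DOMAIN are placeholders; the openssl options are the same ones used in the commands above):

    # Fetch the newest valid on-chain cert and only create a new one when it is
    # missing, expired, or issued for a different hostname.
    LAST_CERT="$(akash query cert list --state=valid --owner="$ACCOUNT_ADDRESS" -o json \
      | jq -r '.certificates[-1].certificate.cert // empty | @base64d')"

    if [ -n "$LAST_CERT" ] \
      && echo "$LAST_CERT" | openssl x509 -noout -checkend 0 \
      && echo "$LAST_CERT" | openssl x509 -noout -ext subjectAltName | grep -q "DNS:provider.${DOMAIN}"; then
      echo "Valid on-chain certificate for provider.${DOMAIN} found, skipping cert creation."
    else
      /bin/akash tx cert create server "provider.${DOMAIN}" $POPTS || exit 1
    fi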
  • fund account

run.sh can also check the balance with akash query bank balances $ACCOUNT before starting the provider, and loop with echo "Please fund provider account: $ACCOUNT"; sleep 30; while the balance is low (e.g. < 10 AKT).
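
For example, a sketch of such a wait loop (ACCOUNT_ADDRESS and the 10 AKT threshold are illustrative):

    # Block until the provider account holds at least 10 AKT (10,000,000 uakt).
    MIN_UAKT=10000000

    while true; do
      BALANCE_UAKT="$(akash query bank balances "$ACCOUNT_ADDRESS" -o json \
        | jq -r '.balances[] | select(.denom == "uakt") | .amount')"
      BALANCE_UAKT="${BALANCE_UAKT:-0}"
      if [ "$BALANCE_UAKT" -ge "$MIN_UAKT" ]; then
        break
      fi
      echo "Please fund provider account: $ACCOUNT_ADDRESS (current balance: ${BALANCE_UAKT}uakt)"
      sleep 30
    done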

  • create/update provider attributes

Same here: just diff the new values in provider.yaml against what's seen on the blockchain:

akash query provider get akash10fl5f6ukr8kc03mtmf8vckm6kqqwqpc04eruqa
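
One possible sketch of that comparison (assuming yq is available in the image and the normalised provider.yaml maps cleanly onto the on-chain JSON; a real script will likely need extra field massaging):

    # Create the provider if it is not on-chain yet, otherwise only update it
    # when the local provider.yaml differs from the on-chain record.
    # Assumes yq (v4) is installed; field names may still need normalisation.
    ONCHAIN="$(akash query provider get "$ACCOUNT_ADDRESS" -o json 2>/dev/null || true)"

    if [ -z "$ONCHAIN" ]; then
      akash tx provider create provider.yaml $POPTS || exit 1
    elif ! diff <(echo "$ONCHAIN" | jq -S .) <(yq -o=json provider.yaml | jq -S .) >/dev/null; then
      akash tx provider update provider.yaml $POPTS || exit 1
    else
      echo "Provider attributes unchanged, skipping update."
    fi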

More ideas:

We can add an Akash RPC time check, so the provider won't run until the Akash RPC node is synced.
We can re-use the code I've added here: https://github.com/arno01/akash-tools/blob/58cfbd37/cli-booster/akash.source#L69-L87
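
Roughly along these lines (a sketch only; assumes GNU date is available, RPC_NODE points at the provider's RPC endpoint, and the 30-second drift threshold mirrors the linked check):

    # Wait until the RPC node's latest block time is within 30s of wall-clock time.
    while true; do
      LATEST_BLOCK_TIME="$(curl -s "${RPC_NODE}/status" | jq -r '.result.sync_info.latest_block_time')"
      DRIFT=$(( $(date +%s) - $(date -d "$LATEST_BLOCK_TIME" +%s) ))
      [ "$DRIFT" -lt 0 ] && DRIFT=$(( -DRIFT ))
      if [ "$DRIFT" -le 30 ]; then
        break
      fi
      echo "Akash RPC node is ${DRIFT}s behind/ahead, waiting for it to sync ..."
      sleep 10
    done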

> Local storage works well but it depends on node affinity, which can be cumbersome and/or problematic.

Akash allows multiple valid certs unless they expire or are manually revoked, so this is going to work since we don't revoke old certs.
Should one worker node die or go into maintenance, the akash provider will just spawn on the next available worker node. Yes, it will create a new cert if one is absent there locally, but it won't keep creating new ones until the cert expires (current cert validity is 1 year) or is manually revoked.
The akash provider will reuse the already generated cert should it return to its original worker node.

I do not really see any issues with using local volumes, nor the need to deal with node affinity.
This approach limits the number of new certs to the number of worker nodes.

@boz let me know WDYT.

sacreman (Contributor) commented Mar 9, 2022

I'd like to default persistent storage in this chart to enabled at some point. This is the very simple helm chart persistent storage, which uses local-storage (not Ceph).

With persistent storage and a StatefulSet, we know whether the chart is installing for the first time, because we have a disk we can query for the config.

This config could also be compared against the chain if it exists, and the update commands would only run if they are needed.

We are constantly conscious about not extending the initial install documentation at all. Enabling persistent storage in this chart means two additional steps: 1. finding a node name to bind the pod to, and 2. creating a directory on that node to hold the data. We'll get feedback on whether this over-complicates the instructions, and if not, I think that's the preferred way to go.

arno01 added a commit to arno01/helm-charts that referenced this issue May 8, 2022
Changes:

- bump akash-provider chart to 0.153.0
- install bc
- check the Akash RPC node is not 30 seconds behind/ahead before continuing
- do not append to provider.yaml but rather create it from scratch
- figure out the provider address in case the user passes `--from=<key_name>` instead of the `--from=<akash1...>` address
- check provider existence on the blockchain before attempting to create
  a new one (`akash tx provider create provider.yaml ...`)
- check whether provider settings (host uri, attributes, ...) have changed
  before broadcasting the new ones on the blockchain (`akash tx provider
update provider.yaml ...`)
- before generating and broadcasting the new provider certificate
  - check the last provider certificate found on the blockchain is valid
  - check whether the last provider certificate serial number found on the blockchain
  matches the local one

Issues addressed:

- fixes akash-network#35
- fixes akash-network#9
sacreman pushed a commit that referenced this issue May 8, 2022
arno01 (Contributor, Author) commented May 8, 2022

I think this is now solved through the pod's lifecycle alone (PR #36), which should be sufficient since we don't expect this pod to be recreated often enough to cause any significant issue such as AKT drainage.

I'm going to keep this issue open until I confirm the fix (PR #36) works as expected when the pod is only restarted (by killing the child process of PID 1) instead of recreated (kubectl delete pod).

arno01 reopened this May 8, 2022
arno01 (Contributor, Author) commented May 8, 2022

I have just tested this, and it appears that everything gets removed even on a simple pod restart.
We should probably enable persistent storage similarly to how it was done for akash-node, using a K8s local-storage PersistentVolume.

Evidence

Pod akash-provider-774c47d94-h9vkt:

root@node1:~# kubectl get pods -A |grep akash-provi
akash-services                                  akash-provider-774c47d94-h9vkt             1/1     Running   0          4m7s

Making pod akash-provider-774c47d94-h9vkt restart:

root@node1:~# kubectl -n akash-services exec -ti $(kubectl -n akash-services get pods -l app=akash-provider --output jsonpath='{.items[0].metadata.name}') -- bash
root@akash-provider-774c47d94-h9vkt:/# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 17:12 ?        00:00:00 /bin/bash /boot/run.sh
root        2971       1  0 17:13 ?        00:00:02 /bin/akash provider run --cluster-k8s
root        2988       0  0 17:17 pts/0    00:00:00 bash
root        3063    2988  0 17:17 pts/0    00:00:00 ps -ef
root@akash-provider-774c47d94-h9vkt:/# kill 2971
root@akash-provider-774c47d94-h9vkt:/# command terminated with exit code 137

Pod is still same akash-provider-774c47d94-h9vkt:

root@node1:~# kubectl get pods -A |grep akash-provi
akash-services                                  akash-provider-774c47d94-h9vkt             1/1     Running   1 (3s ago)   4m30s

But the cert is gone now...

root@node1:~# kubectl -n akash-services logs $(kubectl -n akash-services get pods -l app=akash-provider --output jsonpath='{.items[0].metadata.name}') --tail=10 -f

...
/root/.akash/akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0.pem file is missing.

tvanes pushed a commit to hoolia/akash-helm-charts that referenced this issue Jun 24, 2022
andy108369 (Collaborator) commented
See if we can leverage the configmap to store & restore the cert.

andy108369 reopened this Dec 8, 2022
andy108369 (Collaborator) commented
> See if we can leverage the configmap to store & restore the cert.

ConfigMaps & Secrets are for consuming only (i.e. they are always mounted read-only) - ref kubernetes/kubernetes#62099

The easiest and most straightforward way is to use a hostPath (/root/.akash/k8s-provider) for this purpose.
I've prepared and tested the PR; going to push it in a few minutes.

andy108369 (Collaborator) commented
#177
