Improve devops #576

Closed
1 of 5 tasks
joepio opened this issue Feb 3, 2023 · 3 comments

Comments

@joepio
Member

joepio commented Feb 3, 2023

Current situation

  • A GitHub CI action is triggered manually.
  • The binary is built.
  • The binary is sent over SSH to a VPS on Vultr.
  • We use systemctl to stop atomic-server.
  • We create an export.
  • We use systemctl to start atomic-server.
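The steps above end with `systemctl start`, but a started unit is not necessarily a working server. Below is a minimal sketch of a post-deploy smoke test that a CI job could run after that step, assuming the server is reachable over plain HTTP. It polls a `host:port` until the server returns a 2xx status or a timeout expires. The program name `smoke-test`, the 30-second timeout and the 500 ms polling interval are hypothetical choices, not part of the current setup.

```rust
use std::env;
use std::io::{Read, Write};
use std::net::TcpStream;
use std::process::ExitCode;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Send a minimal HTTP/1.1 GET and report whether the status code is 2xx.
fn responds_ok(addr: &str, path: &str) -> bool {
    let Ok(mut stream) = TcpStream::connect(addr) else {
        return false; // nothing is listening (yet)
    };
    let _ = stream.set_read_timeout(Some(Duration::from_secs(5)));
    let request = format!("GET {path} HTTP/1.1\r\nHost: {addr}\r\nConnection: close\r\n\r\n");
    if stream.write_all(request.as_bytes()).is_err() {
        return false;
    }
    // Only the status line matters: "HTTP/1.1 200" is exactly 12 bytes,
    // and the first digit of the status code sits at index 9.
    let mut status = [0u8; 12];
    stream.read_exact(&mut status).is_ok() && status.starts_with(b"HTTP/") && status[9] == b'2'
}

/// Poll until the server answers with a 2xx status or `timeout` expires.
fn wait_until_healthy(addr: &str, path: &str, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if responds_ok(addr, path) {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(Duration::from_millis(500));
    }
}

fn main() -> ExitCode {
    // Hypothetical usage: smoke-test <host:port> [path]
    let mut args = env::args().skip(1);
    let Some(addr) = args.next() else {
        eprintln!("usage: smoke-test <host:port> [path]");
        return ExitCode::from(2);
    };
    let path = args.next().unwrap_or_else(|| "/".to_string());
    if wait_until_healthy(&addr, &path, Duration::from_secs(30)) {
        println!("{addr}{path} is healthy");
        ExitCode::SUCCESS
    } else {
        eprintln!("{addr}{path} did not become healthy within 30s");
        ExitCode::FAILURE
    }
}
```

Using only the standard library keeps the sketch dependency-free; in practice `curl --fail` in the CI script, or an HTTP client crate, would do the same job.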

What I like about this approach

  • It's simple to run: just two clicks in GitHub.
  • It gives me status updates and error notifications.
  • It's a fairly standard setup, similar to what many other developers do. That means I run into the same problems others may encounter, which is a good thing.
  • No vendor lock-in. I don't rely on any AWS / Azure / Google services.
  • Lots of control over hardware. I can move to a local machine if needed, with few changes.

What went wrong

AtomicData.dev was just down for longer than I'd like to admit. Let's evaluate what went wrong, and how to tackle the problems.

  • I replaced the binary on my VPS, which made it harder to revert to a backup. I've since fixed that by creating backups in the CI.
  • An upstream change updated OpenSSL in the Rust build, but not on my VPS. I'm still not sure where this came from. Maybe I should pin the versions of GitHub Actions and Ubuntu images.
  • I don't have a staging machine / environment, and I should. It should resemble production as closely as possible (although it could be more resource-constrained).
  • The binary I built wasn't tested before it was deployed. I should have used a pre-tested Docker image designed to run on the same OS. Ideally, I'd also run at least some tests on staging.
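The backup fix from the first point can be sketched as follows, assuming the deploy step runs on the server and the live binary sits at a single path. `deploy_with_backup`, `rollback` and the `release_id` naming scheme are hypothetical; what matters is the order of operations: copy the old binary aside first, then swap in the new one with a rename, so the live path never holds a half-written file.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: back up the live binary (if any), then swap in the new one.
/// `release_id` (e.g. a git tag or commit hash) names the backup file.
fn deploy_with_backup(
    new_bin: &Path,
    live_bin: &Path,
    backup_dir: &Path,
    release_id: &str,
) -> io::Result<Option<PathBuf>> {
    fs::create_dir_all(backup_dir)?;
    // Keep a copy of whatever is currently deployed before touching it.
    let backup = if live_bin.exists() {
        let file_name = live_bin.file_name().ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidInput, "live path has no file name")
        })?;
        let dest = backup_dir.join(format!("{}.{}", file_name.to_string_lossy(), release_id));
        fs::copy(live_bin, &dest)?;
        Some(dest)
    } else {
        None // first install: nothing to back up
    };
    replace_atomically(new_bin, live_bin)?;
    Ok(backup)
}

/// Copy `src` next to `dest`, then rename it over `dest`. On POSIX systems a
/// rename within one filesystem replaces the target atomically.
fn replace_atomically(src: &Path, dest: &Path) -> io::Result<()> {
    let tmp = dest.with_extension("tmp");
    fs::copy(src, &tmp)?;
    fs::rename(&tmp, dest)
}

/// Roll back by putting a previously saved backup in place.
fn rollback(backup: &Path, live_bin: &Path) -> io::Result<()> {
    replace_atomically(backup, live_bin)
}

fn main() -> io::Result<()> {
    // Demo in a scratch directory; on a server these would be the real paths.
    let dir = std::env::temp_dir().join("deploy-sketch-demo");
    let _ = fs::remove_dir_all(&dir);
    fs::create_dir_all(&dir)?;
    let live = dir.join("atomic-server");
    let new = dir.join("atomic-server-new");
    fs::write(&live, b"old build")?;
    fs::write(&new, b"new build")?;

    let backup = deploy_with_backup(&new, &live, &dir.join("backups"), "v1")?
        .expect("a binary was already deployed, so a backup exists");
    println!("deployed; backup at {}", backup.display());

    // If the new release misbehaves, put the old binary back.
    rollback(&backup, &live)?;
    assert_eq!(fs::read(&live)?, b"old build");
    println!("rolled back");
    Ok(())
}
```

Stopping and starting the service around these calls (the systemctl steps) is left out; the sketch only covers the file handling.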

Things that can be improved

  • Use (tested) images instead of binaries, to prevent problems like this one.
  • Use tools that improve observability, such as Grafana / Prometheus / Jaeger (see Add metrics / Prometheus support #420). I think I'd like to run these on the same machine, to save costs.
  • Cattle vs. pets. In the future, I'd like to not depend on single machines. For now, though, I'm focusing on a cost-effective single-node setup. Also, performance right now is pretty amazing, so I don't think I'll need multiple nodes for performance scaling anytime soon.
  • Performance regression tests.
  • Set up staging (Staging environment #588).
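For the performance regression tests, one simple approach is to compare a median timing against a stored baseline with a tolerance, since single runs are noisy. A sketch under those assumptions; the workload, the 50 ms baseline and the 10% tolerance are placeholders, and a real setup would more likely use a benchmark harness such as criterion and store baselines from previous CI runs.

```rust
use std::time::{Duration, Instant};

/// Run `f` `runs` times and return the median wall-clock duration
/// (the upper median when `runs` is even).
fn median_duration<F: FnMut()>(runs: usize, mut f: F) -> Duration {
    assert!(runs > 0, "need at least one run");
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[runs / 2]
}

/// A regression is a measurement more than `tolerance` (0.10 = 10%) slower than the baseline.
fn is_regression(baseline: Duration, measured: Duration, tolerance: f64) -> bool {
    measured.as_secs_f64() > baseline.as_secs_f64() * (1.0 + tolerance)
}

fn main() {
    // Placeholder baseline; in CI this would come from an earlier run.
    let baseline = Duration::from_millis(50);
    // Placeholder workload standing in for e.g. a query against atomic-server.
    let measured = median_duration(5, || {
        let total: u64 = (0..100_000u64).sum();
        std::hint::black_box(total);
    });
    if is_regression(baseline, measured, 0.10) {
        eprintln!("possible regression: {measured:?} vs baseline {baseline:?}");
        std::process::exit(1);
    }
    println!("ok: {measured:?} (baseline {baseline:?})");
}
```

Taking the median rather than the mean keeps one slow outlier (a noisy CI runner, a cold cache) from failing the build on its own.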

What tech to use for deployments

How do I approach these different goals? What tools could help me?

  • Docker. I'm pretty sure the answer will involve running images instead of running directly on Ubuntu.
  • Docker Compose. I'm familiar with it, and it seems like a decent pick for a single-node setup, although I suppose it doesn't really scale or offer much flexibility. I'm not sure how easy it is to deploy.
    • Kubernetes. Definitely powerful, but I'm not sure I need it. As of now, everything runs on one node.
  • Terraform / Pulumi. Allows for a lot of configuration! Can deploy to pretty much anything, but Pulumi will probably require Kubernetes.
  • Earthly, a build tool that uses Docker.
  • sup, for running a command on multiple machines.
  • monit for monitoring a single Unix system, and mmonit for multiple systems.
  • SeaweedFS for a multi-node file system.
@AlexMikhalev
Collaborator

AlexMikhalev commented Feb 3, 2023

For deployment: Terraform (see https://vincent.bernat.ch/en/blog/2022-cdktf-nixos) and NixOS. I had the same experience with Pulumi: it is a thin abstraction layer on top of Terraform, with diminishing returns once you start using additional plugins, such as Cloudflare or other DNS providers and tunnels. Pulumi doesn't require Kubernetes and will happily work locally, with a file store or S3 as a secret store; the same applies to Terraform.

For secrets management: https://developer.1password.com/docs/ci-cd. I haven't found anything better so far, and you can use HashiCorp Vault as a backend for 1Password. For enterprises with stricter security requirements, another option is Fortanix. I know about Doppler (doppler.com), but I see no benefit over the 1Password Connect server.

Kubernetes: YAGNI (You Aren't Gonna Need It). I will build a Firecracker VM for you as a test for my private cloud, and then we can spin up and move the VM around as needed for each user.
I plan to go ballistic and build atomic-server into an initramfs for Firecracker VMs (see https://blog.cloudkernels.net/posts/fc-rootfs/). That may also be helpful if you want to run atomic-server on a Raspberry Pi 3 or a smaller, older device, but I may never get to that.

Network: use ZeroTier (https://www.zerotier.com/) to connect nodes in production or staging; see the multi-cloud deployment example at https://docs.zerotier.com/terraform/multicloud-quickstart. ZeroTier lets you create a mesh between different networks and nodes, flattening the network route between two peers when they communicate. For example, a ping from my Raspberry Pi 4 to my laptop takes 125 ms the first time (when the packet goes via the network node to confirm authentication) and <2 ms afterwards. WireGuard only lets you create point-to-point connections: there is no real "mesh", even when it's called one. It's hub and spoke with shared keys, and routing always goes via the hub unless you explicitly create a peer route. Kubernetes/containerd networking is a much sadder story.

@AlexMikhalev
Collaborator

Example Earthfile that compiles a Rust binary with caching (it could be improved with sccache and a Docker registry proxy):

cat Earthfile                
VERSION 0.6
IMPORT ./frontend-ui-svelte-ts AS frontend
# Pin the Debian release so the runtime image in +docker can match it
FROM rust:1-bookworm
WORKDIR /app

install-chef:
   RUN cargo install --debug cargo-chef

prepare-cache:
    FROM +install-chef
    COPY --dir src Cargo.lock Cargo.toml .
    RUN cargo chef prepare
    SAVE ARTIFACT recipe.json

# Using cutoff-optimization to ensure cache hit (see examples/cutoff-optimization)
build-cache:
    FROM +install-chef
    COPY +prepare-cache/recipe.json ./
    RUN cargo chef cook --release
    SAVE ARTIFACT target
    SAVE ARTIFACT $CARGO_HOME cargo_home

build:
    RUN apt update && apt upgrade -y
    RUN apt install -y g++-aarch64-linux-gnu libc6-dev-arm64-cross

    RUN rustup target add aarch64-unknown-linux-gnu
    RUN rustup toolchain install stable-aarch64-unknown-linux-gnu
    ENV CARGO_TARGET_AARCH64_UNKNOWN_LINUX_GNU_LINKER=aarch64-linux-gnu-gcc
    ENV CC_aarch64_unknown_linux_gnu=aarch64-linux-gnu-gcc
    ENV CXX_aarch64_unknown_linux_gnu=aarch64-linux-gnu-g++

    COPY --dir src Cargo.lock Cargo.toml .
    COPY +build-cache/cargo_home $CARGO_HOME
    COPY +build-cache/target target
    COPY frontend+build/dist ./public
    RUN cargo build --release
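    # Both builds produce a file called openapi-upload, so give each artifact
    # a distinct name instead of saving two files under the same name.
    SAVE ARTIFACT ./target/release/openapi-upload openapi-upload-host AS LOCAL ./release/openapi-upload-host
    RUN cargo build --release --target aarch64-unknown-linux-gnu
    SAVE ARTIFACT ./target/aarch64-unknown-linux-gnu/release/openapi-upload openapi-upload-aarch64 AS LOCAL ./release/openapi-upload-aarch64

docker:
    # Use the same Debian release as the rust builder image (bookworm here);
    # an older base such as buster ships an older glibc, and the binary may fail to start.
    FROM debian:bookworm-slim
    COPY +build/openapi-upload-host openapi-upload
    EXPOSE 9091
    ENTRYPOINT ["./openapi-upload"]
    SAVE IMAGE aks/openapi-upload:latest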

@AlexMikhalev
Collaborator

Front-end (Node/Svelte) dependency for the Earthfile above (it can be run separately for debugging):

 cat Earthfile            
VERSION 0.6
FROM node:latest

WORKDIR frontend

deps:
    COPY package.json tsconfig.json vite.config.ts tsconfig.node.json index.html ./
    COPY src src
    COPY public public
    RUN yarn install

build:
    FROM +deps
    RUN yarn run build
    SAVE ARTIFACT dist /dist AS LOCAL dist

joepio added a commit that referenced this issue Oct 2, 2023
joepio added a commit that referenced this issue Oct 2, 2023
joepio mentioned this issue Oct 2, 2023
joepio added a commit that referenced this issue Oct 3, 2023
joepio added a commit that referenced this issue Oct 3, 2023
joepio added a commit that referenced this issue Oct 3, 2023
joepio added a commit that referenced this issue Oct 3, 2023
joepio added a commit that referenced this issue Oct 3, 2023
joepio added a commit that referenced this issue Nov 9, 2023
joepio added a commit that referenced this issue Nov 9, 2023
joepio added a commit that referenced this issue Nov 9, 2023
joepio added a commit that referenced this issue Nov 9, 2023
joepio added a commit that referenced this issue Nov 11, 2023
Describe earthly
joepio added a commit that referenced this issue Nov 13, 2023
joepio added a commit that referenced this issue Nov 13, 2023
joepio added a commit that referenced this issue Nov 13, 2023
joepio added a commit that referenced this issue Nov 13, 2023
joepio added a commit that referenced this issue Nov 13, 2023
joepio added a commit that referenced this issue Nov 20, 2023
joepio added a commit that referenced this issue Nov 20, 2023
joepio added a commit that referenced this issue Nov 20, 2023
joepio added a commit that referenced this issue Nov 20, 2023
joepio added a commit that referenced this issue Nov 20, 2023
joepio added a commit that referenced this issue Nov 21, 2023
joepio added a commit that referenced this issue Nov 21, 2023
joepio added a commit that referenced this issue Nov 21, 2023
joepio added a commit that referenced this issue Nov 21, 2023
joepio added a commit that referenced this issue Nov 21, 2023
joepio added a commit that referenced this issue Nov 22, 2023
#576 push to atomicdata

#576 Fix earthly pnpm

#576 get e2e tests in earthly working
joepio added a commit that referenced this issue Nov 22, 2023
Fix clippy

Less flaky test

#576 Fix earthly


WIP earthfile

run earthly github

CI

earthly main-pipeline

Fix test

add artefacts

CI tauri

Upload artifacts

tauri deps

deps

typo

Fix test

Fix test in build

fix test

Use explicit satellite

Fix use org

Disable test

Update contribute

Fix earthly try

fix ci

fix CI earthly

Fix clippy

Lint fix

less lint fails
joepio added a commit that referenced this issue Nov 22, 2023
joepio added a commit that referenced this issue Nov 27, 2023
joepio added a commit that referenced this issue Nov 27, 2023
joepio added a commit that referenced this issue Nov 27, 2023
joepio added a commit that referenced this issue Nov 27, 2023
joepio added a commit that referenced this issue Nov 27, 2023
joepio added a commit that referenced this issue Nov 27, 2023
joepio closed this as completed in 0a92d72 Nov 29, 2023