Bacalhau project report 20220620

lukemarsden edited this page Jun 20, 2022 · 6 revisions

Service Provider focus

We have added documentation for "running a node", which should make it more inviting for service providers to start running Bacalhau jobs where their data is hosted.

We've also added job selection HTTP and exec hooks, which allow service providers to filter the jobs that they accept.

We've started using the job selection for our own production nodes, which now accept jobs from any CID on IPFS, rather than just the ones that are local to the nodes. This makes the production network much more useful! 🎉
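
As an illustration of the job selection HTTP hook described above, here is a minimal sketch of what a provider-side hook endpoint might look like. The payload fields (`job_id`, `input_cids`), the `/job-selection` path, and the convention that a 200 response means "accept" and any other status means "reject" are all assumptions made for this example; they are not taken from the Bacalhau documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// probeRequest is a HYPOTHETICAL payload shape for a job selection hook.
// The real request body sent by Bacalhau may differ.
type probeRequest struct {
	JobID     string   `json:"job_id"`
	InputCIDs []string `json:"input_cids"`
}

// acceptJob is an example provider policy: accept a job only if it names
// at least one input CID and every input CID is on the allowlist.
func acceptJob(req probeRequest, allowed map[string]bool) bool {
	if len(req.InputCIDs) == 0 {
		return false
	}
	for _, cid := range req.InputCIDs {
		if !allowed[cid] {
			return false
		}
	}
	return true
}

// hookHandler answers 200 to accept and 403 to reject (assumed convention).
func hookHandler(allowed map[string]bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var req probeRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		if acceptJob(req, allowed) {
			w.WriteHeader(http.StatusOK)
			return
		}
		http.Error(w, "job rejected", http.StatusForbidden)
	}
}

func main() {
	// Exercise the handler in-process instead of starting a real server.
	allowed := map[string]bool{"QmExampleCID": true}
	body := strings.NewReader(`{"job_id":"job-1","input_cids":["QmExampleCID"]}`)
	rec := httptest.NewRecorder()
	hookHandler(allowed)(rec, httptest.NewRequest(http.MethodPost, "/job-selection", body))
	fmt.Println("status:", rec.Code)
}
```

In a real deployment the handler would be served with `http.ListenAndServe` and the node pointed at its URL. An exec hook would presumably apply the same kind of policy in a standalone program.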

Documentation

A comprehensive Bacalhau Architecture document with diagrams was completed in time for the website refresh.

CLI improvements

The flags --id-filter, --sort-by and --reverse were added to the bacalhau CLI to make it more manageable to run a public network where many jobs are submitted. There is also an improved test suite for these filters.

CLI design to support WASM and templates

A proposal is up for growing the Bacalhau CLI to support multiple frontends and backends.

This will unlock the WASM implementation.

Stress testing

We began work on a stress test cluster, extending the existing production Terraform script to deploy a cluster of X nodes. We are working on a Go program, bacalhau-bench, which submits N jobs in C concurrent batches. We will have Prometheus exporters in addition to OpenTelemetry so we can measure where the bottlenecks are.

OpenTelemetry

OpenTelemetry support has landed with the new REST API, and traces now span from the CLI to the server and back!

Here's a screenshot demonstrating it working for one of our tests:

[Image: Screenshot 2022-06-19 at 13 53 00]

This will be super valuable when (a) turned on in production (soon, as soon as we have a Honeycomb key enabled on our production nodes) and (b) used in conjunction with the stress test, which will reveal slow areas to focus on optimizing.

What's next

  • Finish the first round of stress testing and (we expect) knock down some low-hanging performance/reliability issues
  • Continue work on WASM implementation (Python FaaS)