Drafted plan and stories for Round 1 release.
karlmdavis committed Jun 27, 2021
1 parent 09e687d commit 14e2649
Showing 14 changed files with 198 additions and 1 deletion.
54 changes: 54 additions & 0 deletions dev/plans/0001-round-1.md
# Round 1 Release Plan

This will be the first official release, and it will also mark
the publication of the first permanent benchmark results.
In general, the goals here are:

1. Provide enough data to help FHIR API server users
make better-informed decisions than they otherwise could,
without going so overboard that the release never happens.
2. Provide enough data to help FHIR API server implementors
understand how their performance stands relative to other implementations
and also enough information for them to work on improving their performance,
if they'd like to.
3. Ensure that the results are reasonably stable / repeatable.
4. Make it reasonably simple to add new benchmarks.

It's worth pointing out some goals that are neat & cool & stuff
but are nevertheless explicitly not in scope for this first release
(in the interests of ensuring that there _is_ a first release sometime this century):

* Make it easy for implementors to incorporate these benchmarks
into their build processes.
* Add support for SaaS-only FHIR API server implementations.
* Calculate the cost per request served for each FHIR API server.


## Detailed Plan

The following user stories are currently planned to be in scope for this release:

* [x] [Compare Performance of FHIR Servers](../stories/0001-perf-compare.md)
* [x] [Continuous Integration](../stories/0002-ci.md)
* [x] [Sample Data](../stories/0003-sample-data.md)
* [x] [Support Firely Spark](../stories/0005-firely-spark.md)
* [x] [Increase Detail in the Application's Errors](../stories/0007-error-details.md)
* [x] [GitHub README](../stories/0008-readme.md)
* [x] [Publish Results to a Website](../stories/0009-publish-results.md)
* [ ] [IBM FHIR](../stories/0012-ibm-fhir.md)
* [ ] [Improve Management of FHIR Server Dockerfiles](../stories/0013-refactor-dockerfiles.md)
* [ ] [Cache Sample Data in S3](../stories/0014-cache-sample-data-in-s3.md)
* [ ] [Tracing](../stories/0015-tracing.md)
* [ ] [Support More `Organization` Operations](../stories/0016-more-organization-operations.md)
* [ ] [Make it Simpler to Add Benchmark Operations](../stories/0017-simplify-adding-operations.md)
* [ ] [Analyze Synthea Output](../stories/0018-analyze-synthea-output.md)
* [ ] [Support `Patient` Resource Operations](../stories/0004-patient-ops.md)
* [ ] [Timeseries Data: Latency, Operation Count, Request Size](../stories/0019-timeseries-data.md)
* [ ] [Improve Debugging of Operation Failures](../stories/0011-operation-failure-debugging.md)
* [ ] [Automate Benchmark Runs in Cloud](../stories/0020-automate-runs-in-cloud.md)
* [ ] [Give the CI Some TLC](../stories/0021-ci-tlc.md)

In addition, the following bugs are currently planned to be fixed:

* [x] [HAPI Failures After Launch](../stories/0006-hapi-startup-wait.md)
* [ ] [HAPI 'POST /Organization' Failures With Timeouts](../stories/0010-hapi-post-org-timeouts.md)
@@ -1,4 +1,4 @@
-# Synthetic Data
+# User Story: Sample Data

As a reader/consumer of benchmarks,
I need the data used by the FHIR servers when they're being benchmarked to be realistic,
12 changes: 12 additions & 0 deletions dev/stories/0010-hapi-post-org-timeouts.md
# Bug: HAPI 'POST /Organization' Failures With Timeouts

I'm getting a lot of operation failures, with the benchmarks logging this:

```
{
"msg": "Operation 'POST /Organization' failed: 'ServerOperationIterationState { _inner: ServerOperationIterationFailed { completed: ServerOperationIterationCompleted { start: ServerOperationIterationStarting { started: 2021-05-31T18:22:55.066093905Z }, completed: 2021-05-31T18:23:05.066276646Z }, error: Operation timed out: 'future has timed out' } }",
"level": "WARN",
"ts": "2021-05-31T18:23:45.09314212000:00"
}
```

These consistently pop up with more concurrency; e.g. when running on `eddings`, about a quarter of requests at `concurrent_users: 10` are failing due to this.
25 changes: 25 additions & 0 deletions dev/stories/0011-operation-failure-debugging.md
# User Story: Improve Debugging of Operation Failures

At higher concurrency levels right now,
I'm seeing a lot of operation timeout failures for both HAPI and Spark.

However, the only way to debug those failures
is to stare at the log output during a benchmark run,
wait for failures to get logged,
and then quickly try to run `docker logs ...` for the container.
As debugging experiences go,
this is bad.

Instead, I think we need to start collecting logs for the FHIR servers,
writing those logs to disk,
and then referencing all of the log file locations in the JSON output.

## Details

* For larger benchmark runs, these log files are liable to eat a lot of disk space,
perhaps even enough to exhaust the benchmark system's free space.
I may want to compress them upfront, to help mitigate this.
* I haven't spent much time trying out the goofy manual debugging procedure above for the HAPI timeouts,
so it's not clear to me that these logs
will actually provide enough information to diagnose the problems.
Nevertheless, this seems like a necessary first step towards improving debugging in general.
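
A rough sketch of what that collection step could look like, assuming the logs are pulled via `docker logs` and compressed with the `flate2` crate; the container name and output directory here are hypothetical:

```rust
use std::fs::File;
use std::io::Write;
use std::path::{Path, PathBuf};
use std::process::Command;

use flate2::{write::GzEncoder, Compression};

/// Captures a container's logs to a gzipped file and returns that file's path,
/// so it can be referenced from the benchmark's JSON output.
fn capture_container_logs(container: &str, output_dir: &Path) -> std::io::Result<PathBuf> {
    // Grab everything the container has logged so far.
    let logs = Command::new("docker").arg("logs").arg(container).output()?;

    // Compress up front, since these files can eat a lot of disk space on larger runs.
    let log_path = output_dir.join(format!("{}.log.gz", container));
    let mut encoder = GzEncoder::new(File::create(&log_path)?, Compression::default());
    encoder.write_all(&logs.stdout)?;
    encoder.write_all(&logs.stderr)?;
    encoder.finish()?;

    Ok(log_path)
}
```

The returned paths could then be recorded alongside the operation results in the JSON output.
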
7 changes: 7 additions & 0 deletions dev/stories/0012-ibm-fhir.md
# User Story: Support [IBM FHIR Server](https://github.com/IBM/FHIR)

As a benchmark user,
I'd like to see the benchmarks include
[IBM FHIR Server](https://github.com/IBM/FHIR),
so that I can understand its performance,
relative to other FHIR server implementations.
9 changes: 9 additions & 0 deletions dev/stories/0013-refactor-dockerfiles.md
# User Story: Improve Management of FHIR Server Dockerfiles

As a benchmark contributor,
I'd like the various `Dockerfile`s to be better managed
so that it's easy to understand which versions things are pinned to
and so that it's easy to update them.
I'd also like to either stop using `git submodule`s for those
or at least to better document how to work with them
(because I can never remember).
5 changes: 5 additions & 0 deletions dev/stories/0014-cache-sample-data-in-s3.md
# User Story: Cache Sample Data in S3

As a user of the benchmark application,
I would like it to cache the sample data it generates in S3,
so that I can run benchmarks more quickly.
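
A minimal sketch of the caching flow, shelling out to the AWS CLI rather than committing to a particular Rust S3 crate; the bucket name, cache-key scheme, and `generate_sample_data` helper are all assumptions:

```rust
use std::path::Path;
use std::process::Command;

/// Downloads the sample data archive from S3 if it's already cached there;
/// otherwise generates it locally and uploads it for next time.
fn fetch_or_generate_sample_data(cache_key: &str, local_path: &Path) -> std::io::Result<()> {
    // Hypothetical bucket; the real one would come from project configuration.
    let s3_uri = format!("s3://fhir-benchmarks-sample-data/{}.tar.gz", cache_key);

    // Cache hit: just download the archive.
    let download = Command::new("aws")
        .args(["s3", "cp"])
        .arg(&s3_uri)
        .arg(local_path)
        .status()?;
    if download.success() {
        return Ok(());
    }

    // Cache miss: generate the data (e.g. by running Synthea), then upload it.
    generate_sample_data(local_path)?;
    Command::new("aws")
        .args(["s3", "cp"])
        .arg(local_path)
        .arg(&s3_uri)
        .status()?;
    Ok(())
}

/// Placeholder for the existing Synthea-based generation step.
fn generate_sample_data(_local_path: &Path) -> std::io::Result<()> {
    unimplemented!()
}
```

A natural cache key would be a hash of the Synthea configuration and version, so that changing either of them invalidates the cache.
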
15 changes: 15 additions & 0 deletions dev/stories/0015-tracing.md
# User Story: Switch to [Tracing](https://lib.rs/crates/tracing) for Logs

As a user of the benchmark application,
I would like the logs to provide more information on causality,
so that I'm better able to diagnose issues when they're encountered.


## Details

* I'll be honest: I don't have a really good use case or burning need for this right now;
rather, I've been watching Tracing for a while now and I'm intrigued by it.
I mostly just want to try it out and see if it works well.
* In addition, I think it's probably time to move away from NDJSON log output,
as it's mostly just making the log output less useful right now.
* And Tracing also supports NDJSON output, if it turns out to be needed in the future.
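
A minimal sketch of what the switch could look like, assuming the `tracing` and `tracing-subscriber` crates; the operation name and fields are purely illustrative:

```rust
use tracing::{info, instrument};

/// Spans capture causality: everything logged inside this function (and anything
/// it calls) is attached to the `post_organization` span, along with its fields.
#[instrument]
async fn post_organization(server_url: &str, concurrent_users: u32) {
    info!(server_url, "starting operation");
    // ... issue the requests ...
    info!(latency_ms = 42, "operation completed");
}

fn main() {
    // Human-readable output by default; the subscriber's `.json()` option could
    // bring back NDJSON output if it turns out to be needed later.
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::INFO)
        .init();
}
```
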
12 changes: 12 additions & 0 deletions dev/stories/0016-more-organization-operations.md
# User Story: Support More `Organization` Operations

As a benchmark consumer,
I would like the benchmarks to cover more operations on the `Organization` resource,
so that I can see how reads, searches, etc. perform.


## Details

* The `Organization` resource isn't particularly compelling;
it's selected here because we already have support for its `POST` operation,
so adding support for more operations should be less of a lift.
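
For reference, a sketch of the kind of requests those operations would add, using `reqwest`; the base URL handling and the `name` search parameter are illustrative choices (the FHIR REST API defines `GET [base]/Organization/[id]` for reads and `GET [base]/Organization?name=[value]` for searches):

```rust
use reqwest::{Client, StatusCode};

/// Read a single `Organization` by its logical ID.
async fn read_organization(client: &Client, base: &str, id: &str) -> reqwest::Result<StatusCode> {
    let response = client
        .get(format!("{}/Organization/{}", base, id))
        .send()
        .await?;
    Ok(response.status())
}

/// Search `Organization` resources by name; the server returns a searchset Bundle.
async fn search_organizations_by_name(
    client: &Client,
    base: &str,
    name: &str,
) -> reqwest::Result<StatusCode> {
    let response = client
        .get(format!("{}/Organization", base))
        .query(&[("name", name)])
        .send()
        .await?;
    Ok(response.status())
}
```
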
5 changes: 5 additions & 0 deletions dev/stories/0017-simplify-adding-operations.md
# User Story: Make it Simpler to Add Benchmark Operations

As a benchmark contributor,
I would like the benchmark operations to be refactored and better documented,
such that adding new benchmark operations is more straightforward.
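
One possible shape for that refactoring, sketched as a hypothetical trait rather than the project's actual design, where each benchmark operation is a self-describing unit that the runner simply iterates over (the `async_trait` crate is assumed here):

```rust
use std::time::Duration;

use async_trait::async_trait;

/// Each benchmark operation describes itself and knows how to run a single
/// iteration against a FHIR server; the runner just walks the registry below.
#[async_trait]
pub trait ServerOperation {
    /// A display name, e.g. "POST /Organization".
    fn name(&self) -> &'static str;

    /// Runs one iteration, returning its latency or an error message.
    async fn run_iteration(&self, server_url: &str) -> Result<Duration, String>;
}

/// Adding a new operation then becomes: implement the trait, register it here,
/// and document it.
pub fn all_operations() -> Vec<Box<dyn ServerOperation + Send + Sync>> {
    vec![
        // Box::new(PostOrganization::default()),
        // Box::new(ReadOrganization::default()),
    ]
}
```
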
15 changes: 15 additions & 0 deletions dev/stories/0018-analyze-synthea-output.md
# User Story: Analyze Synthea Output

As a benchmark contributor,
I would like the logs to report on how many resources Synthea produced,
broken out by resource type and total storage size,
so that I have a better idea which operations I might want to add support for next.


## Details

* This will help to determine which resource's operations
will most stress servers in terms of data volume.
* I suspect that `Patient` resources are actually not the most common,
and perhaps there are far more `Encounter` resources,
but it's really just a guess.
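
A rough sketch of that report, assuming Synthea's FHIR output is a directory of JSON Bundle files (one per patient); it tallies resource counts by type and sums the files' on-disk sizes:

```rust
use std::collections::BTreeMap;
use std::fs;
use std::path::Path;

use serde_json::Value;

/// Counts resources by type and totals the storage size for a directory of
/// Synthea FHIR Bundle files.
fn analyze_synthea_output(
    dir: &Path,
) -> Result<(BTreeMap<String, u64>, u64), Box<dyn std::error::Error>> {
    let mut counts: BTreeMap<String, u64> = BTreeMap::new();
    let mut total_bytes = 0u64;

    for dir_entry in fs::read_dir(dir)? {
        let path = dir_entry?.path();
        if path.extension().map_or(false, |ext| ext == "json") {
            total_bytes += fs::metadata(&path)?.len();

            let bundle: Value = serde_json::from_str(&fs::read_to_string(&path)?)?;
            if let Some(entries) = bundle["entry"].as_array() {
                for entry in entries {
                    if let Some(resource_type) = entry["resource"]["resourceType"].as_str() {
                        *counts.entry(resource_type.to_string()).or_insert(0) += 1;
                    }
                }
            }
        }
    }

    Ok((counts, total_bytes))
}
```

Logging those tallies at the end of data generation would be enough to confirm (or refute) the `Encounter` guess above.
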
17 changes: 17 additions & 0 deletions dev/stories/0019-timeseries-data.md
# User Story: Timeseries Data: Latency, Operation Count, Request Size

As a benchmark contributor and/or FHIR server implementer,
I would like the benchmark application to output its request data
to a timeseries database and analysis suite,
such as [InfluxDB](https://www.influxdata.com/products/influxdb/)
plus [Grafana](https://grafana.com/),
so that I can analyze the performance of my FHIR server
over time during each benchmark operation.


## Details

* This is something Wind Tunnel provides.
* I've heard from FHIR server implementers that this is an important feature for them.
* Some FHIR servers might have native support for InfluxDB,
and it'd be interesting to turn that on during benchmarking.
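
A sketch of what the write side could look like, assuming an InfluxDB 1.x-style `/write` endpoint and its line protocol, with `reqwest` for the HTTP call; the measurement, tag, and field names are made up for illustration:

```rust
use reqwest::Client;

/// Writes one point per request: the operation and server as tags, latency and
/// request size as fields, and the request's start time as the point's timestamp.
async fn write_request_point(
    client: &Client,
    influx_url: &str,
    operation: &str,
    server: &str,
    latency_ms: u64,
    request_bytes: u64,
    timestamp_ns: i64,
) -> reqwest::Result<()> {
    // Line protocol: measurement,tags fields timestamp. Tag values containing
    // spaces (e.g. "POST /Organization") would need escaping in real usage.
    let line = format!(
        "fhir_request,operation={},server={} latency_ms={}i,request_bytes={}i {}",
        operation, server, latency_ms, request_bytes, timestamp_ns
    );

    client
        .post(format!("{}/write?db=benchmarks", influx_url))
        .body(line)
        .send()
        .await?
        .error_for_status()?;
    Ok(())
}
```

In practice the points would be batched (the line protocol accepts many newline-separated points per request) rather than written one at a time.
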
11 changes: 11 additions & 0 deletions dev/stories/0020-automate-runs-in-cloud.md
# User Story: Automate Benchmark Runs in Cloud

As a benchmark contributor,
I would like the official benchmark runs (in the cloud) to be fully automated,
so that I can reliably reproduce their results.


## Details

* Until we support SaaS/cloud-service FHIR server implementations,
all official benchmark runs should be performed on EC2 metal instances.
10 changes: 10 additions & 0 deletions dev/stories/0021-ci-tlc.md
# User Story: Give the CI Some TLC

As a benchmark contributor,
I would like the CI setup for the benchmark application to get some TLC,
so that its runs are as reliable and fast as is reasonable.


## Details

* I don't have any specific problems to resolve yet; I just expect that I will, eventually.
