Drafted plan and stories for Round 1 release.
karlmdavis committed Jun 27, 2021
1 parent 09e687d commit 14e2649
Showing 14 changed files with 198 additions and 1 deletion.
54 changes: 54 additions & 0 deletions dev/plans/0001-round-1.md
# Round 1 Release Plan

This will be the first official release, and it will also mark
the publication of the first permanent benchmark results.
In general, the goals here are:

1. Provide enough data to help FHIR API server users
make better-informed decisions than they otherwise could,
without going so overboard that the release never happens.
2. Provide enough data to help FHIR API server implementors
understand how their performance stands relative to other implementations
and also enough information for them to work on improving their performance,
if they'd like to.
3. Ensure that the results are reasonably stable / repeatable.
4. Make it reasonably simple to add new benchmarks.

It's worth pointing out some goals that are neat & cool & stuff
but are nevertheless explicitly not in scope for this first release
(in the interests of ensuring that there _is_ a first release sometime this century):

* Make it easy for implementors to incorporate these benchmarks
into their build processes.
* Add support for SaaS-only FHIR API server implementations.
* Calculate the cost per request served for each FHIR API server.


## Detailed Plan

The following user stories are currently planned to be in scope for this release:

* [x] [Compare Performance of FHIR Servers](../stories/0001-perf-compare.md)
* [x] [Continuous Integration](../stories/0002-ci.md)
* [x] [Sample Data](../stories/0003-sample-data.md)
* [x] [Support Firely Spark](../stories/0005-firely-spark.md)
* [x] [Increase Detail in the Application's Errors](../stories/0007-error-details.md)
* [x] [GitHub README](../stories/0008-readme.md)
* [x] [Publish Results to a Website](../stories/0009-publish-results.md)
* [ ] [IBM FHIR](../stories/0012-ibm-fhir.md)
* [ ] [Improve Management of FHIR Server Dockerfiles](../stories/0013-refactor-dockerfiles.md)
* [ ] [Cache Sample Data in S3](../stories/0014-cache-sample-data-in-s3.md)
* [ ] [Tracing](../stories/0015-tracing.md)
* [ ] [Support More `Organization` Operations](../stories/0016-more-organization-operations.md)
* [ ] [Make it Simpler to Add Benchmark Operations](../stories/0017-simplify-adding-operations.md)
* [ ] [Analyze Synthea Output](../stories/0018-analyze-synthea-output.md)
* [ ] [Support `Patient` Resource Operations](../stories/0004-patient-ops.md)
* [ ] [Timeseries Data: Latency, Operation Count, Request Size](../stories/0019-timeseries-data.md)
* [ ] [Improve Debugging of Operation Failures](../stories/0011-operation-failure-debugging.md)
* [ ] [Automate Benchmark Runs in Cloud](../stories/0020-automate-runs-in-cloud.md)
* [ ] [Give the CI Some TLC](../stories/0021-ci-tlc.md)

In addition, the following bugs are currently planned to be fixed:

* [x] [HAPI Failures After Launch](../stories/0006-hapi-startup-wait.md)
* [ ] [HAPI 'POST /Organization' Failures With Timeouts](../stories/0010-hapi-post-org-timeouts.md)
@@ -1,4 +1,4 @@
-# Synthetic Data
+# User Story: Sample Data

As a reader/consumer of benchmarks,
I need the data used by the FHIR servers when they're being benchmarked to be realistic,
12 changes: 12 additions & 0 deletions dev/stories/0010-hapi-post-org-timeouts.md
# Bug: HAPI 'POST /Organization' Failures With Timeouts

I'm getting a lot of operation failures, with the benchmarks logging this:

```
{
"msg": "Operation 'POST /Organization' failed: 'ServerOperationIterationState { _inner: ServerOperationIterationFailed { completed: ServerOperationIterationCompleted { start: ServerOperationIterationStarting { started: 2021-05-31T18:22:55.066093905Z }, completed: 2021-05-31T18:23:05.066276646Z }, error: Operation timed out: 'future has timed out' } }",
"level": "WARN",
"ts": "2021-05-31T18:23:45.09314212000:00"
}
```

These consistently pop up with more concurrency; e.g. when running on `eddings`, about a quarter of requests at `concurrent_users: 10` are failing due to this.
25 changes: 25 additions & 0 deletions dev/stories/0011-operation-failure-debugging.md
# User Story: Improve Debugging of Operation Failures

At higher concurrency levels right now,
I'm seeing a lot of operation timeout failures for both HAPI and Spark.

However, the only way to debug those failures
is to stare at the log output during a benchmark run,
wait for failures to get logged,
and then quickly try to run `docker logs ...` for the container.
As debugging experiences go,
this is bad.

Instead, I think we need to start collecting logs for the FHIR servers,
writing those logs to disk,
and then referencing all of the log file locations in the JSON output.

## Details

* For larger benchmark runs, these log files are liable to eat a lot of disk space,
perhaps even enough to exhaust the benchmark system's free space.
I may want to compress them upfront, to help mitigate this.
* I haven't spent much time trying out the goofy manual debugging procedure above for the HAPI timeouts,
so it's not clear to me that these logs
will actually provide enough information to diagnose the problems.
Nevertheless, this seems like a necessary first step towards improving debugging in general.
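
A rough sketch of what that collection step could look like, assuming the logs are pulled via `docker logs` and compressed with the `flate2` crate; the container name and output directory here are hypothetical:

```rust
use std::fs::File;
use std::io::Write;
use std::path::{Path, PathBuf};
use std::process::Command;

use flate2::{write::GzEncoder, Compression};

/// Captures a container's logs to a gzipped file and returns that file's path,
/// so it can be referenced from the benchmark's JSON output.
fn capture_container_logs(container: &str, output_dir: &Path) -> std::io::Result<PathBuf> {
    // Grab everything the container has logged so far.
    let logs = Command::new("docker").arg("logs").arg(container).output()?;

    // Compress up front, since these files can eat a lot of disk space on larger runs.
    let log_path = output_dir.join(format!("{}.log.gz", container));
    let mut encoder = GzEncoder::new(File::create(&log_path)?, Compression::default());
    encoder.write_all(&logs.stdout)?;
    encoder.write_all(&logs.stderr)?;
    encoder.finish()?;

    Ok(log_path)
}
```

The returned paths could then be recorded alongside the operation results in the JSON output.
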
7 changes: 7 additions & 0 deletions dev/stories/0012-ibm-fhir.md
# User Story: Support [IBM FHIR Server](https://github.com/IBM/FHIR)

As a benchmark user,
I'd like to see the benchmarks include
[IBM FHIR Server](https://github.com/IBM/FHIR),
so that I can understand its performance,
relative to other FHIR server implementations.
9 changes: 9 additions & 0 deletions dev/stories/0013-refactor-dockerfiles.md
# User Story: Improve Management of FHIR Server Dockerfiles

As a benchmark contributor,
I'd like the various `Dockerfile`s to be better managed
so that it's easy to understand which versions things are pinned to
and so that it's easy to update them.
I'd also like to either stop using `git submodule`s for those
or at least to better document how to work with them
(because I can never remember).
5 changes: 5 additions & 0 deletions dev/stories/0014-cache-sample-data-in-s3.md
# User Story: Cache Sample Data in S3

As a user of the benchmark application,
I would like it to cache the sample data it generates in S3,
so that I can run benchmarks more quickly.
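
A minimal sketch of the caching flow, shelling out to the AWS CLI rather than committing to a particular Rust S3 crate; the bucket name, cache-key scheme, and `generate_sample_data` helper are all assumptions:

```rust
use std::path::Path;
use std::process::Command;

/// Downloads the sample data archive from S3 if it's already cached there;
/// otherwise generates it locally and uploads it for next time.
fn fetch_or_generate_sample_data(cache_key: &str, local_path: &Path) -> std::io::Result<()> {
    // Hypothetical bucket; the real one would come from project configuration.
    let s3_uri = format!("s3://fhir-benchmarks-sample-data/{}.tar.gz", cache_key);

    // Cache hit: just download the archive.
    let download = Command::new("aws")
        .args(["s3", "cp"])
        .arg(&s3_uri)
        .arg(local_path)
        .status()?;
    if download.success() {
        return Ok(());
    }

    // Cache miss: generate the data (e.g. by running Synthea), then upload it.
    generate_sample_data(local_path)?;
    Command::new("aws")
        .args(["s3", "cp"])
        .arg(local_path)
        .arg(&s3_uri)
        .status()?;
    Ok(())
}

/// Placeholder for the existing Synthea-based generation step.
fn generate_sample_data(_local_path: &Path) -> std::io::Result<()> {
    unimplemented!()
}
```

A natural cache key would be a hash of the Synthea configuration and version, so that changing either of them invalidates the cache.
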
15 changes: 15 additions & 0 deletions dev/stories/0015-tracing.md
# User Story: Switch to [Tracing](https://lib.rs/crates/tracing) for Logs

As a user of the benchmark application,
I would like the logs to provide more information on causality,
so that I'm better able to diagnose issues when they're encountered.


## Details

* I'll be honest: I don't have a really good use case or burning need for this right now;
rather, I've been watching Tracing for a while now and I'm intrigued by it.
I mostly just want to try it out and see if it works well.
* In addition, I think it's probably time to move away from NDJSON log output,
as it's mostly just making the log output less useful right now.
* And Tracing also supports NDJSON output, if it turns out to be needed in the future.
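
A minimal sketch of what the switch could look like, assuming the `tracing` and `tracing-subscriber` crates; the operation name and fields are purely illustrative:

```rust
use tracing::{info, instrument};

/// Spans capture causality: everything logged inside this function (and anything
/// it calls) is attached to the `post_organization` span, along with its fields.
#[instrument]
async fn post_organization(server_url: &str, concurrent_users: u32) {
    info!(server_url, "starting operation");
    // ... issue the requests ...
    info!(latency_ms = 42, "operation completed");
}

fn main() {
    // Human-readable output by default; the subscriber's `.json()` option could
    // bring back NDJSON output if it turns out to be needed later.
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::INFO)
        .init();
}
```
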
12 changes: 12 additions & 0 deletions dev/stories/0016-more-organization-operations.md
# User Story: Support More `Organization` Operations

As a benchmark consumer,
I would like the benchmarks to cover more operations on the `Organization` resource,
so that I can see how reads, searches, etc. perform.


## Details

* The `Organization` resource isn't particularly compelling;
it's selected here because we already have support for its `POST` operation,
so adding support for more operations should be less of a lift.
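
For reference, a sketch of the kind of requests those operations would add, using `reqwest`; the base URL handling and the `name` search parameter are illustrative choices (the FHIR REST API defines `GET [base]/Organization/[id]` for reads and `GET [base]/Organization?name=[value]` for searches):

```rust
use reqwest::{Client, StatusCode};

/// Read a single `Organization` by its logical ID.
async fn read_organization(client: &Client, base: &str, id: &str) -> reqwest::Result<StatusCode> {
    let response = client
        .get(format!("{}/Organization/{}", base, id))
        .send()
        .await?;
    Ok(response.status())
}

/// Search `Organization` resources by name; the server returns a searchset Bundle.
async fn search_organizations_by_name(
    client: &Client,
    base: &str,
    name: &str,
) -> reqwest::Result<StatusCode> {
    let response = client
        .get(format!("{}/Organization", base))
        .query(&[("name", name)])
        .send()
        .await?;
    Ok(response.status())
}
```
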
5 changes: 5 additions & 0 deletions dev/stories/0017-simplify-adding-operations.md
# User Story: Make it Simpler to Add Benchmark Operations

As a benchmark contributor,
I would like the benchmark operations to be refactored and better documented,
such that adding new benchmark operations is more straightforward.
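
One possible shape for that refactoring, sketched as a hypothetical trait rather than the project's actual design, where each benchmark operation is a self-describing unit that the runner simply iterates over (the `async_trait` crate is assumed here):

```rust
use std::time::Duration;

use async_trait::async_trait;

/// Each benchmark operation describes itself and knows how to run a single
/// iteration against a FHIR server; the runner just walks the registry below.
#[async_trait]
pub trait ServerOperation {
    /// A display name, e.g. "POST /Organization".
    fn name(&self) -> &'static str;

    /// Runs one iteration, returning its latency or an error message.
    async fn run_iteration(&self, server_url: &str) -> Result<Duration, String>;
}

/// Adding a new operation then becomes: implement the trait, register it here,
/// and document it.
pub fn all_operations() -> Vec<Box<dyn ServerOperation + Send + Sync>> {
    vec![
        // Box::new(PostOrganization::default()),
        // Box::new(ReadOrganization::default()),
    ]
}
```
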
15 changes: 15 additions & 0 deletions dev/stories/0018-analyze-synthea-output.md
# User Story: Analyze Synthea Output

As a benchmark contributor,
I would like the logs to report on how many resources Synthea produced,
broken out by resource type and total storage size,
so that I have a better idea which operations I might want to add support for next.


## Details

* This will help to determine which resource's operations
will most stress servers in terms of data volume.
* I suspect that `Patient` resources are actually not the most common,
and perhaps there are far more `Encounter` resources,
but it's really just a guess.
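
A rough sketch of that report, assuming Synthea's FHIR output is a directory of JSON Bundle files (one per patient); it tallies resource counts by type and sums the files' on-disk sizes:

```rust
use std::collections::BTreeMap;
use std::fs;
use std::path::Path;

use serde_json::Value;

/// Counts resources by type and totals the storage size for a directory of
/// Synthea FHIR Bundle files.
fn analyze_synthea_output(
    dir: &Path,
) -> Result<(BTreeMap<String, u64>, u64), Box<dyn std::error::Error>> {
    let mut counts: BTreeMap<String, u64> = BTreeMap::new();
    let mut total_bytes = 0u64;

    for dir_entry in fs::read_dir(dir)? {
        let path = dir_entry?.path();
        if path.extension().map_or(false, |ext| ext == "json") {
            total_bytes += fs::metadata(&path)?.len();

            let bundle: Value = serde_json::from_str(&fs::read_to_string(&path)?)?;
            if let Some(entries) = bundle["entry"].as_array() {
                for entry in entries {
                    if let Some(resource_type) = entry["resource"]["resourceType"].as_str() {
                        *counts.entry(resource_type.to_string()).or_insert(0) += 1;
                    }
                }
            }
        }
    }

    Ok((counts, total_bytes))
}
```

Logging those tallies at the end of data generation would be enough to confirm (or refute) the `Encounter` guess above.
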
17 changes: 17 additions & 0 deletions dev/stories/0019-timeseries-data.md
# User Story: Timeseries Data: Latency, Operation Count, Request Size

As a benchmark contributor and/or FHIR server implementer,
I would like the benchmark application to output its request data
to a timeseries database and analysis suite,
such as [InfluxDB](https://www.influxdata.com/products/influxdb/)
plus [Grafana](https://grafana.com/),
so that I can analyze the performance of my FHIR server
over time during each benchmark operation.


## Details

* This is something Wind Tunnel provides.
* I've heard from FHIR server implementers that this is an important feature for them.
* Some FHIR servers might have native support for InfluxDB,
and it'd be interesting to turn that on during benchmarking.
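
A sketch of what the write side could look like, assuming an InfluxDB 1.x-style `/write` endpoint and its line protocol, with `reqwest` for the HTTP call; the measurement, tag, and field names are made up for illustration:

```rust
use reqwest::Client;

/// Writes one point per request: the operation and server as tags, latency and
/// request size as fields, and the request's start time as the point's timestamp.
async fn write_request_point(
    client: &Client,
    influx_url: &str,
    operation: &str,
    server: &str,
    latency_ms: u64,
    request_bytes: u64,
    timestamp_ns: i64,
) -> reqwest::Result<()> {
    // Line protocol: measurement,tags fields timestamp. Tag values containing
    // spaces (e.g. "POST /Organization") would need escaping in real usage.
    let line = format!(
        "fhir_request,operation={},server={} latency_ms={}i,request_bytes={}i {}",
        operation, server, latency_ms, request_bytes, timestamp_ns
    );

    client
        .post(format!("{}/write?db=benchmarks", influx_url))
        .body(line)
        .send()
        .await?
        .error_for_status()?;
    Ok(())
}
```

In practice the points would be batched (the line protocol accepts many newline-separated points per request) rather than written one at a time.
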
11 changes: 11 additions & 0 deletions dev/stories/0020-automate-runs-in-cloud.md
# User Story: Automate Benchmark Runs in Cloud

As a benchmark contributor,
I would like the official benchmark runs (in the cloud) to be fully automated,
so that I can reliably reproduce their results.


## Details

* Until we support SaaS/cloud-service FHIR server implementations,
all official benchmark runs should be performed on EC2 metal instances.
10 changes: 10 additions & 0 deletions dev/stories/0021-ci-tlc.md
# User Story: Give the CI Some TLC

As a benchmark contributor,
I would like the CI setup for the benchmark application to get some TLC,
so that its runs are as reliable and fast as is reasonable.


## Details

* I don't have any specific problems to resolve yet; I just expect that I will, eventually.
