
Optimize storage (serialization and de-serialization) of very large dictionaries inside MongoDB #4846

Merged
107 commits merged on Mar 20, 2021

Conversation

Member

@Kami Kami commented Jan 20, 2020

This pull request builds on top of various research I did in other PRs and issues in the past (#4837, #4838, etc).

The goal is to speed up serialization and de-serialization of very large dictionaries (e.g. large results produced by various actions) stored inside MongoDB.

Right now we store those dictionaries as native MongoDB dictionaries, but this means we need to escape the keys, since keys can't contain $ and ., which are special characters in MongoDB.

This escaping is inefficient and wasteful for large (and nested) dictionaries.
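For illustration, here is a minimal sketch of the kind of recursive key escaping this involves. The function shape mirrors mongoescape.escape_chars mentioned later in the diff, but the replacement characters below are assumptions for the example, not the exact st2 implementation.

```python
# Illustrative sketch only - the replacement characters are assumptions.
RULES = {"$": "\uff04", ".": "\uff0e"}  # hypothetical full-width substitutes


def escape_chars(value):
    """Recursively replace MongoDB-special characters in every dict key."""
    if not isinstance(value, dict):
        return value

    escaped = {}
    for key, item in value.items():
        for char, replacement in RULES.items():
            key = key.replace(char, replacement)
        escaped[key] = escape_chars(item)  # recurse into nested values
    return escaped
```

With large, deeply nested results this per-key work (and the matching unescaping on read) has to run for every key on every save and load, which is where the cost comes from.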

Proposed Implementation / Improvement

This pull request tries to optimize that by serializing those dictionaries as JSON instead.

Keep in mind that this is fine, since we always treat those values as opaque strings anyway.

In fact, a future improvement would be to have a result database object per runner; that way we could use a simple "string" type for the actual result, but that's a much more involved change.

Some actual numbers with a synthetic dataset are available in the other issues mentioned above. Actual numbers with real-life production data will follow once we deploy this to the CI/CD server.

The change is fully backward compatible, but it's behind a feature flag (opt-in) until we are really sure, after running it in production for a while with diverse datasets, that it doesn't make some corner / edge case worse.

A related optimization I discussed with @m4dcoder is getting rid of the EscapedDict DB field type in scenarios where keys can't contain special characters (e.g. potentially various workflow related models).

TODO

  • Update all the affected models to use this field type
  • Tests
  • Deploy it on CI/CD for a month, observe performance, memory usage, etc
  • Add optional migration script for affected models - future
  • Document change in upgrade notes and point users to the optional migration script - future
  • Add some docs for developers on the new DB format
  • Update execution list / get command to display execution run duration which includes database write time not just action run time
  • Document the change

serialize and unserialize large dictionary data (such as result field,
etc.).
@pull-request-size pull-request-size bot added the size/L PR that changes 100-499 lines. Requires some effort to review. label Jan 20, 2020
def to_mongo(self, value, use_db_field=True, fields=None):
    if not cfg.CONF.database.use_json_dict_field:
        value = mongoescape.escape_chars(value)
    return super(stormbase.EscapedDynamicField, self).to_mongo(
Contributor

This bugs out because JSONDictEscapedFieldCompatibilityField is incompatible with EscapedDynamicField.

@@ -152,6 +152,10 @@ class ActionExecutionAPI(BaseAPI):
    @classmethod
    def from_model(cls, model, mask_secrets=False):
        doc = cls._from_model(model, mask_secrets=mask_secrets)

        import json
        doc['result'] = json.loads(doc['result'])
Contributor

I had to add this to get the result in JSONDictEscapedFieldCompatibilityField to work properly with the ujson/cjson backend. This is because _from_model will call to_mongo, which dumps the result to a JSON string. Without this, the CLI output is displayed as a string.

Contributor

@m4dcoder m4dcoder left a comment

This experimental patch is not ready yet. I tested it and it is buggy, and I also don't like that it needs to json.dumps/loads more than it needs to. I commented on the problem places in the code in this PR.

@stale

stale bot commented Jul 1, 2020

Thanks for contributing to this issue. As it has been 90 days since the last activity, we are automatically marking it as stale. If this issue is not relevant or applicable anymore (problem has been fixed in a new version or similar), please close the issue or let us know so we can close it. On the contrary, if the issue is still relevant, there is nothing you need to do, but if you have any additional details or context which would help us when working on this issue, please include it as a comment to this issue.

@guzzijones
Contributor

I have taken a look at this. +1 from me. Saving the result as a string instead of a mongo object/dict is ideal. There is no current use case to index nested dicts from workflow transitions while running.

@stale stale bot removed the stale label Feb 18, 2021
using two different approaches for serializing execution / live action
result.
@Kami
Member Author

Kami commented Feb 18, 2021

After some discussions on Slack, I decided to attack this problem again and wrap up this change - the slow insertion of large executions (which is fairly common in the StackStorm world) still hasn't gone away and is very much as alive as it was 3 years ago.

I already performed a lot of benchmarking, research and prototyping in the past, but since quite a lot of time has passed since then (and the data is scattered across various issues and PRs), I decided to start from scratch with micro benchmarks which compare different approaches for storing the LiveActionDB.result field value, which is the main offender in our case.

The good thing about micro benchmarks is that it's very easy to add a new one or use a different fixture file, etc. to see how it affects the numbers (and they are fully reproducible by everyone).

First, to refresh everyone's memory (including mine).

The two issues which contribute to slow insertion / saving of executions with large results:

  1. mongoengine inefficiency when converting large dictionaries to native pymongo SON types.
  2. Escaping in our EscapedDictField, which is performed on every single key in a dictionary (we escape $ and . - this is needed to store dictionaries whose keys contain those characters, since they are special in MongoDB, acting as operators and separators).

And one very important invariant of the result field itself - we use a dictionary type for the execution result field, but as far as the database is concerned, that field is an opaque binary blob - we don't perform any querying on its values.

That's very important because it allows us to treat it as a string / binary blob, which makes things easier (in fact, one of the options I already debated in the past is storing the result in a separate collection as a string value and then adding a FK to the LiveAction collection).

Proposed Approach

The proposed change to mitigate / fix this issue is still the same one I proposed and researched more than a year ago - storing the result as a dictionary serialized to a JSON string / blob. The only difference is that now I use orjson instead of ujson, since the micro benchmark shows it's 5-15% faster than ujson on this specific dataset.
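A minimal sketch of the kind of micro benchmark behind that comparison (the fixture path, iteration count and backend list are illustrative, not the actual benchmark harness in this PR):

```python
import json
import time

import orjson
import ujson

# Hypothetical fixture file - any large, nested execution result would do.
with open("fixtures/json/large_execution_result.json", "rb") as fp:
    result = orjson.loads(fp.read())

backends = {
    "json": lambda value: json.dumps(value).encode("utf-8"),
    "ujson": lambda value: ujson.dumps(value).encode("utf-8"),
    "orjson": orjson.dumps,  # already returns bytes
}

for name, dumps in backends.items():
    start = time.perf_counter()
    for _ in range(100):
        dumps(result)  # serialize the same large result repeatedly
    print("%s: %.3f seconds" % (name, time.perf_counter() - start))
```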

A lot of other approaches were also brought up in the past - things such as removing and replacing mongo all together, using a different ORM library, etc.

Yes, many of those would likely result in much better performance if they were implemented correctly (if!), but most of them are not even close to realistic - they are much riskier (a much larger change, harder to make backward compatible, etc.) and would require many person-months of work (and that's the ideal scenario; a more realistic one where various edge cases and compatibility issues crop up is closer to 6 months).

This means that this approach is one of the few possible realistic approaches we have left.

This approach exploits the invariant that the result is, for all purposes, an opaque binary blob to the database itself - if we use a binary field, we don't need to run the escaping code on it, and mongoengine doesn't need to perform the slow conversion.
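A minimal sketch of a field built around that idea (the class name and exact mongoengine hooks here are illustrative assumptions; the actual field added by this PR may differ):

```python
import orjson
from mongoengine.fields import BinaryField


class JSONDictField(BinaryField):
    """Sketch: persist a dict as an orjson-serialized blob so no per-key
    escaping and no dict-to-SON conversion is needed."""

    def to_mongo(self, value, use_db_field=True, fields=None):
        if isinstance(value, dict):
            return orjson.dumps(value)
        return value  # assume the value is already serialized

    def to_python(self, value):
        if isinstance(value, (bytes, str)):
            return orjson.loads(value)
        return value
```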

Initial Results

Initial results from micro benchmarks are very promising.

Save

[Screenshot: save benchmark results (2021-02-18)]

Read

[Screenshot: read benchmark results (2021-02-18)]

For a very large execution (8 MB), we see almost ~10x improvement in insertion speed (5s vs 500ms) and ~5x improvement in read speed (800ms vs 150ms).

The good news is that this change also scales down - i.e. it's not slower for smaller results; in fact it's a bit faster in those scenarios as well.

Those numbers are from running the micro benchmarks on my computer. They also run continuously on CI, and everyone who wants to can easily run them and add a different dataset (e.g. some real-life execution result data).

Things to experiment with

I still want to experiment with compressing the actual JSON blob using zstandard.

The goal here is not to save bytes on disk (that was never really a big issue), but to reduce the number of bytes which need to be sent over the network when talking to the database - this could perhaps reduce the overall operation time even more for very large executions, but I doubt it will be worth the effort in the end. We'll see what the data shows us.
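A rough sketch of what that experiment could look like (the compression level and helper names are assumptions):

```python
import orjson
import zstandard


def dumps_compressed(result):
    """Serialize a result dict with orjson, then compress the JSON bytes."""
    return zstandard.ZstdCompressor(level=3).compress(orjson.dumps(result))


def loads_compressed(blob):
    """Inverse of dumps_compressed()."""
    return orjson.loads(zstandard.ZstdDecompressor().decompress(blob))
```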

TODO

  • Also add micro benchmark just for the dict key escaping code
  • Add micro benchmark which include zstandard compression
  • Finish up the code
  • Add more backward and forward compatibility tests
  • Add some fixtures which utilize (anonymized) real-life execution results
  • Get rid of the json backend option - it was only used to benchmark different backends; just use orjson

all JSON fixture files so they contain at least one key with a character
which needs to be escaped.
@Kami
Member Author

Kami commented Feb 18, 2021

I added micro benchmarks just for the escape and unescape functions and also unit tests for the new fields.

Running the existing tests did uncover some issues - in some places, we run a set query for updating a live action and pass a dict for the value (which also kind of breaks our ORM layer / DB abstraction and shows why such changes are hard - no abstraction is perfect and there are always leaks).

Sadly this won't work out of the box; I will need to think and experiment some more on how to best approach this.

Technically, I think I should be able to make it work, as long as I add some additional code to the places which use the field__set notation to make sure the value we pass is already serialized as JSON instead of the raw dict - sadly, to_mongo is not called for set queries.
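A hedged sketch of the kind of call-site change that would require (the helper below is hypothetical and only illustrates serializing the value before the raw set query):

```python
import orjson

from st2common.models.db.liveaction import LiveActionDB


def update_result(liveaction_db, result):
    # to_mongo() is not invoked for set queries, so serialize the dict
    # ourselves before handing it to the raw update.
    serialized_result = orjson.dumps(result)
    LiveActionDB.objects(id=liveaction_db.id).update_one(
        set__result=serialized_result
    )
```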

@Kami
Member Author

Kami commented Feb 18, 2021

For now, I'm leaning towards adding a header to the serialized field value, even though we will likely only support a single serialization library and format (orjson), to make it a bit more future-proof in case we ever need to change the serialization format.

Something along the lines of b"<compression algorithm (if any)>:<serialization format>:<serialized data>".

So the value would look something like this: n:o:{"foo": "bar"}, where o stands for orjson and n for no compression (and z could be zstandard).

This adds some overhead, but it's really just 4 bytes so it's negligible (especially if we ever support compression).
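A small sketch of that header scheme (the marker bytes follow the example above; the helper names are illustrative):

```python
import orjson

NO_COMPRESSION = b"n"
ORJSON_FORMAT = b"o"


def dumps_with_header(value):
    """Prefix the serialized data with <compression>:<format>: markers."""
    return NO_COMPRESSION + b":" + ORJSON_FORMAT + b":" + orjson.dumps(value)


def loads_with_header(blob):
    # maxsplit=2 so colons inside the JSON payload are left untouched
    compression, fmt, data = blob.split(b":", 2)
    assert compression == NO_COMPRESSION and fmt == ORJSON_FORMAT
    return orjson.loads(data)
```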

@m4dcoder
Contributor

Can we be opinionated here and support a specific method of serialization? There are too many options to support (i.e. orjson, ujson, cjson) and I'd rather JSON be JSON without a prefix that only st2 understands.

Contributor

@amanda11 amanda11 left a comment

LGTM - just added a minor comment.
Would like someone else to review as well though, if possible.

@@ -21,6 +21,11 @@

import yaml

try:
    from yaml import CSafeDumper as YamlSafeDumper
Contributor

Could we possibly add a comment to explain why we may not have CSafeDumper, and need this catch/exception?

Member Author

@Kami Kami Mar 19, 2021

Funny enough, I just added a similar change recently (51f811c).

Technically, as long as people are using the official packages it should be there. But if they do a from-source install, etc., it may not be.

Agreed it's a good idea and I will add a comment as well :)


@Kami Kami force-pushed the optimize_escaped_dict_fields branch from 8362768 to bdd8e3c on March 19, 2021 13:25
Contributor

@amanda11 amanda11 left a comment

Sorry - realised I hadn't seen the full PR changes on last review.

Just a couple of other queries...

"This message was generated by StackStorm action "
"send_mail running on %s" % (HOSTNAME)
)
expected_body = (
Contributor

It looks like we've kept the code from the if branch - which, if I read it correctly, only applied when we were running on Python 2 - so should we not have kept the else logic instead?

Member Author

@Kami Kami Mar 19, 2021

The version of the library used has also been changed / upgraded, so now the behavior is different / more correct.

To clarify - we don't want to use escape sequences since that is Python specific and would not display correctly in the email clients, etc.

I assume we just made that change when working on Python 3 support a long time ago since it was the easiest option, but technically not the most correct one :)

"This message was generated by StackStorm action "
"send_mail running on %s" % (HOSTNAME)
)
expected_body = (
Contributor

It looks like we've kept the code from the if branch - which, if I read it correctly, only applied when we were running on Python 2 - so should we not have kept the else logic instead?

Member Author

Same as above.

  description: True to compress the response body
  type: boolean
  default: false
- name: pretty_format
Contributor

Out of interest, is the UI going to use the pretty format or the non-pretty format, and does this affect how the response appears in the UI at all?

Member Author

@Kami Kami Mar 19, 2021

Yeah, the UI will use the pretty format, but keep in mind that this API endpoint is only used when displaying results for large executions - we don't display those directly in the UI anymore (since that would freeze the browser, etc.); we just display a link to the API endpoint which returns the raw result.

Here is a WIP st2web change - StackStorm/st2web#868 (https://github.com/StackStorm/st2web/pull/868/files#diff-38caf478018c81662a6660b616ab9dd030868fb0aec1c01a6c3916465d44f6f0R48).

In short - it doesn't affect the output of st2web for executions whose results we display directly in the UI (those go through the code highlighter widget which does re-formatting anyway).

Contributor

@amanda11 amanda11 left a comment

LGTM

Contributor

@blag blag left a comment

LGTM, but there are a few small improvements that could be made.

Resolved review comments on: Makefile, st2client/st2client/commands/action.py, test-requirements.txt
@Kami
Member Author

Kami commented Mar 20, 2021

@blag Thanks for the review.

I will go ahead and merge this since it's already a massive PR, but I plan to open another small PR on top of it in the near future.


Having said that, I'm a bit OCD and that extra query on get in the st2web change (StackStorm/st2web#868) was not making me happy, so I identified another approach I can use to avoid the additional query in st2web to get the execution result when result size < threshold.

Basically, I will modify the API and add a ?max_result_size=X (or similar, tbd) query parameter to the /v1/executions/<id> API endpoint.

Instead of performing a simple "get one by id" query, we will perform a more complex query (something along the lines of id == Y and (result_size < X or result_size does not exist)) and only return the result with the execution object in case the query returns any data.

The good news is that this query is fast since the first filter is already on the PK, so it should add negligible overhead (funny enough, based on the testing I did, it sometimes even returns faster than a plain get-by-ID query, no idea why), but it will allow us to simplify the st2web change and also avoid one extra API operation on the st2web side.
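A rough mongoengine sketch of that query (the field and parameter names are assumptions, not the final API):

```python
from mongoengine import Q

from st2common.models.db.execution import ActionExecutionDB  # existing model


def get_execution_with_result(execution_id, max_result_size):
    # Match the execution only when the stored result is below the threshold,
    # or when result_size was never recorded (e.g. pre-upgrade documents).
    query = Q(id=execution_id) & (
        Q(result_size__lt=max_result_size) | Q(result_size__exists=False)
    )
    return ActionExecutionDB.objects(query).first()
```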

@jegesh

jegesh commented Jun 13, 2021

I'm looking to patch this PR into my current st2 installation. Is there any documentation I should refer to? Tips? I'm running v3.4 on Ubuntu 18 and st2 slows to a crawl when we run a few nested workflows, even though I set WORKERS=30 in the config file (and I can see 30 worker processes in htop), so fixing this would really be a godsend for us.

@Kami
Member Author

Kami commented Jun 13, 2021

@jegesh this has been merged and will be included in the next release, which should be out soon. In the meantime you can use the dev unstable packages for testing if you feel adventurous.
