Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade aeson to 2.0.* #2292

Merged
merged 15 commits into from
Jun 14, 2022
Merged

Upgrade aeson to 2.0.* #2292

merged 15 commits into from
Jun 14, 2022

Conversation

robx
Copy link
Contributor

@robx robx commented May 25, 2022

Aeson 2.0 abstracts the object type (from Data.HashMap.Strict to Data.Aeson.KeyMap). From version 2.0.1.0, the default map implementation is switched to Data.Map.Strict, addressing the hash flooding vulnerability. This is also one step towards compatibility with GHC 9.2 (#2288).

Compare https://frasertweedale.github.io/blog-fp/posts/2021-10-12-aeson-hash-flooding-protection.html for some background.

Besides fixing the vulnerability, the change to Data.Map.Strict from Data.HashMap.Strict is likely to have a performance impact for some work loads.

Overview:

  • update cabal file to require aeson-2, and a number of related bound update, specifically
    • jose-0.9
    • swagger-2.8
    • hasql-1.5
      These don't appear to have important changes.
  • update sources to work with the abstract aeson KeyMap type, and change auth claims and CSV row maps accordingly
  • fix IO tests for updated wai-logger
  • update stack configuration for updated dependencies
  • nix updates:
    • update nixpkgs to a version from 2022-03-30, which comes with aeson-2
    • update haskell package overrides
    • update static nix build from static-haskell-nix to build with updated nixpkgs
    • explicitly keep the old version of hlint in order to not deal with changed linter complaints in this PR

The load test fails as per #2308 due to the nix dependency update; this will go away after merging.

@robx

This comment was marked as resolved.

@robx
Copy link
Contributor Author

robx commented Jun 2, 2022

I have no idea where that aeson-2.0.3.0 even comes from...

That was because I'd copied over the aeson source hash. So, since it has the aeson sources in the store as a fixed output thingy, it tries building the aeson sources as OneTuple which... can't work.

@steve-chavez
Copy link
Member

steve-chavez commented Jun 2, 2022

Whoa, this looks like hard work. Unfortunately I'm also lost and don't know a simple way to solve the many dependency errors.

Another option(not sure how feasible), would be to try and replace aeson? I've been eyeing jsonifier for a while.

@robx
Copy link
Contributor Author

robx commented Jun 3, 2022

Whoa, this looks like hard work. Unfortunately I'm also lost and don't know a simple way to solve the many dependency errors.

Another option(not sure how feasible), would be to try and replace aeson? I've been eyeing jsonifier for a while.

Ah don't worry, I think it's a simple change that's just made complicated by a combination of nixpkg's rigidity and my lack of understanding. I think I can probably sort it out by also updating the nixpkgs pin, was just hoping to do only one thing at a time.

@robx
Copy link
Contributor Author

robx commented Jun 3, 2022

One reliable test failure is the following:

 test/spec/Feature/OpenApi/RootSpec.hs:27:7: 
  1) Feature.OpenApi.RootSpec, root spec function, accepts application/json
       body mismatch:
         expected: [{"qiName":"authors_w_entities","qiSchema":"test"},"test"]
         but got:  [{"qiName":"referrals","qiSchema":"test"},"private"]

Any idea what happened there? Is the test fragile, or did something break with the update? (I could see that relying on the key order of JSON maps might have changed with the new aeson version, but that doesn't obviously apply here...)

@steve-chavez
Copy link
Member

Any idea what happened there? Is the test fragile

Yeah, that test can be changed, no problem about it. I've been doing the same on recent commits. In fact that root spec feature is not documented and will be dropped once we have #2299.

@robx robx mentioned this pull request Jun 7, 2022
robx added a commit to robx/postgrest that referenced this pull request Jun 7, 2022
Whereas the test used to specify a body of

  [{"qiName":"referrals","qiSchema":"test"},"private"]

we see both

  [{"qiName":"authors_w_entities","qiSchema":"test"},"test"]

and

  [{"qiName":"organizations","qiSchema":"test"},"test"]

on the GHC 9.2 upgrade PR.

This changes the test to no longer expect a particular body.

Compare:
- PostgREST#2292 (comment)
- PostgREST#1698
@robx
Copy link
Contributor Author

robx commented Jun 7, 2022

@steve-chavez I think this is at a point now where it's "stable" from the point of view of the underlying nixpkgs change -- would it make sense to update cachix (nix-shell --run postgrest-push-cachix) to save on CI resources?

@robx robx mentioned this pull request Jun 7, 2022
robx added a commit that referenced this pull request Jun 7, 2022
Whereas the test used to specify a body of

  [{"qiName":"referrals","qiSchema":"test"},"private"]

we see both

  [{"qiName":"authors_w_entities","qiSchema":"test"},"test"]

and

  [{"qiName":"organizations","qiSchema":"test"},"test"]

on the GHC 9.2 upgrade PR.

This changes the test to no longer expect a particular body.

Compare:
- #2292 (comment)
- #1698
@steve-chavez
Copy link
Member

would it make sense to update cachix (nix-shell --run postgrest-push-cachix) to save on CI resources?

@robx Do you want me to do this manually for this PR? Because postgrest-push-cachix runs on CI already and the cache gets refreshed when it runs on the main branch.

@robx
Copy link
Contributor Author

robx commented Jun 8, 2022

would it make sense to update cachix (nix-shell --run postgrest-push-cachix) to save on CI resources?

@robx Do you want me to do this manually for this PR? Because postgrest-push-cachix runs on CI already and the cache gets refreshed when it runs on the main branch.

I was thinking about manually doing this, yeah, but not sure it's a good idea... Right now every single CI job takes 20 minutes to set up the nix environment. (I don't mind the time, just seems a bit expensive while iterating.)

Besides requiring manual work, a good argument against is that the nixpkgs version might need to change again after all. There's two outstanding issues here, the load test encoding failure and the static build. I expect they should be fixable without changing the nixpkgs version, but it's not really clear at this point.

@steve-chavez
Copy link
Member

the load test encoding failure

We can ignore that one here. It will be gone when merging to main. As seen on #2302 (comment)

pkgsPrev = import nixpkgsPrev { };
inherit (pkgsPrev) hlint;
in
pkgs.callPackage nix/tools/style.nix { inherit hlint hsie; };
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd file an issue and/or a follow-up PR to make postgrest hlint-clean for the updated hlint version, and to drop this shim again. But would rather not add linter fixes into the mix here, the PR ballooned enough already.

@robx
Copy link
Contributor Author

robx commented Jun 13, 2022

Assuming the tests pass 🤞, this should be good for review now.

The one thing I can't evaluate is if the switch to Data.Map.Strict has noticeable performance downsides. Do we run some benchmarks to track performance regressions?

@robx robx marked this pull request as ready for review June 13, 2022 10:51
@robx robx requested a review from steve-chavez June 13, 2022 10:52
robx added 2 commits June 13, 2022 13:16
With both HashMap and Map imported as M in different modules,
linter rules prevented ever importing both modules in one place.
robx and others added 13 commits June 13, 2022 13:25
This means that we're now using Data.Map.Strict instead of
Data.HashMap.Strict for JSON objects in general, and specifically
for claims maps and CSV rows.

This addresses certain hash flooding vulnerabilities, but may
have performance downsides.

Compare e.g. https://frasertweedale.github.io/blog-fp/posts/2021-10-12-aeson-hash-flooding-protection.html
jose version before 0.8.5.1 lacked an upper bound on
aeson-2, causing build failures with aeson-2 present.
wai-logger version 2.4.0 fixes log output to not say 'unknownSocket'
for unix sockets.
This patches static-haskell-nix to work for building postgrest
with updated nixpkgs (from ~202203):

- The ncurses 'enableStatic' argument doesn't exist anymore.
  We use the vanilla package instead, which seems to work fine.
- The 'isExecutable' check fails with a strange error related
  to trying to override 'mkDerivation'.
  We patch 'isExecutable' to check explicitly whether we're building
  postgrest. ('isExecutable' is used to determine whether to build
  a package statically.)
This keeps the pre-nixpkgs update hlint version, so we can defer
addressing the warnings to a follow-up PR.
@steve-chavez
Copy link
Member

The one thing I can't evaluate is if the switch to Data.Map.Strict has noticeable performance downsides. Do we run some benchmarks to track performance regressions?

Ah, that would be the loadtest, which unfortunately doesn't run here. But let me try some manual load tests.

Copy link
Member

@steve-chavez steve-chavez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't able to do manual load tests(it takes too long to compile the static build 😞), will do it when cachix gets updated. At any rate, we have to upgrade eventually. So this LGTM.

@robx robx merged commit bac63f3 into PostgREST:main Jun 14, 2022
@robx robx deleted the aeson2 branch June 14, 2022 05:43
@robx
Copy link
Contributor Author

robx commented Jun 14, 2022

Here are load test results post-merge on main: https://github.com/PostgREST/postgrest/runs/6874664942
And the previous commit on main: https://github.com/PostgREST/postgrest/runs/6813060905

@robx
Copy link
Contributor Author

robx commented Jun 14, 2022

Randomly pulling the HEAD column from both reports, we get:

metric main pr pr/main
bytes_in.mean 841 841 1.0
bytes_in.total 14688631 16883495 1.14
bytes_out.mean 18 18 1.0
bytes_out.total 309042 355416 1.15
duration 60000099777 60018094347 1.0
latencies.50th 1025381 929754 0.9
latencies.90th 13953938 10120705 0.72
latencies.95th 23995719 21513839 0.89
latencies.99th 26633982 22525434 0.84
latencies.max 43676099 38462247 0.88
latencies.mean 3436057 2987338 0.86
latencies.min 515510 429336 0.83
latencies.total 59997005490 59991731686 0.99
rate 291 335 1.15
requests 17461 20082 1.15
status_codes.200 10477 12050 1.15
status_codes.201 1746 2008 1.15
status_codes.204 5238 6024 1.15
success 1 1 1.0
throughput 291 335 1.15
wait 25301762 949680 0.03

The wait value for pre-merge seems to be an outlier -- the main column is similar to the post-merge result.

Other than that, it looks like the upgrade (whether aeson2 or some of the other dependency changes) rather improved results, looks to me like 15% throughput increase, matching latency decrease.

@steve-chavez
Copy link
Member

steve-chavez commented Jun 15, 2022

@robx Loadtest results look great! I've been trying to run other loadtests but..

Unfortunately, I no longer seem to be able to produce a static binary:

nix-build --attr postgrestStatic

ldd result/bin/postgrest
        linux-vdso.so.1 (0x00007ffe5bb03000)
        libz.so.1 => /nix/store/srww288f7sb0f8hb5399431k8ldg2pss-zlib-1.2.11/lib/libz.so.1 (0x00007fda1a070000)
        libpq.so.5 => /nix/store/8y5nh8zx42qbzw0ki5rc9ywgi7z3axqb-postgresql-14.2-lib/lib/libpq.so.5 (0x00007fda19d2c000)
        libgmp.so.10 => /nix/store/wxyn3pv3kvqrl2lnfzpaa82vsd6fizka-gmp-6.2.1/lib/libgmp.so.10 (0x00007fda19c8c000)
        libc.so => /nix/store/88nqrkcal1mmiqbx10ax6wkyncf1h1gk-musl-1.2.2/lib/libc.so (0x00007fda19bd9000)
        libffi.so.8 => /nix/store/wb3ji9r5b2f747phif5lfx7np45fl9px-libffi-3.4.2/lib/libffi.so.8 (0x00007fda19bcc000)

9.0.1 shows:

ldd postgrest
$       not a dynamic executable

We need a test on CI to ensure the above output.


On another NixOS server, trying to run produces an error(No such file or directory) and when running ldd on it:

ldd /nix/store/hpx1rapsd6ylshh8d115ab7ifpjd3bia-postgrest/bin/postgrest

/nix/store/hpx1rapsd6ylshh8d115ab7ifpjd3bia-postgrest/bin/postgrest: 
error while loading shared libraries: 
/nix/store/2pi6zgkwnr3zdslvlv16nixpzvbyjx1n-glibc-2.31/lib/libc.so: invalid ELF header

This was referenced Jun 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants