Reduce memory usage of create staked command #4021

Merged

@newhoggy newhoggy commented Jun 9, 2022

The problem arises because we build a very large in-memory JSON value and then write it out as genesis.json. The in-memory JSON value can be large enough to exhaust all available memory.

This PR reduces the severity of the memory usage in the following ways.

  1. It introduces a ListMap type, which is almost like Map, but has a [(k, v)] as its internal representation. This avoids the serialisation cost of constructing a map only to convert it back into a list.

  2. Uses lazy IO so that generated stuffed UTxOs are created on demand rather than upfront and all in memory.

  3. Introduces a new LazyToJson type class which doesn't have the memory retention problems of the aeson library.

  4. Forces evaluation of fields to WHNF so that the parent object can be GCed, which allows large fields that have already been serialised to be collected as well.

  5. Writes delegations to a single delegations.jsonl file which is a newline delimited JSON file. This file is streamed multiple times so that generation of the genesis.json file does not retain memory unnecessarily.
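
As a rough illustration of the first point, a `ListMap` can be sketched as a newtype over an association list whose JSON rendering walks the pairs directly instead of going through `Data.Map`. This sketch uses only `base`. `renderObject` is a hypothetical helper (the PR itself works with aeson-style instances), keys and values are fixed to `String` and `Int`, and `show` stands in for proper JSON string escaping.

```haskell
import Data.List (intercalate)

-- | Association-list wrapper. Unlike Data.Map it keeps the order of the
-- pairs and does not merge duplicate keys.
newtype ListMap k v = ListMap { unListMap :: [(k, v)] }
  deriving (Eq, Show)

-- | Hypothetical renderer: emits a JSON object by walking the pairs in
-- order, without building an intermediate Map.
-- Assumption: 'show' is used for escaping, which is only adequate for
-- simple ASCII keys.
renderObject :: ListMap String Int -> String
renderObject (ListMap kvs) =
  "{" ++ intercalate "," [show k ++ ":" ++ show v | (k, v) <- kvs] ++ "}"
```

For example, `renderObject (ListMap [("a", 1), ("b", 2)])` yields the text `{"a":1,"b":2}`. Pairs keep their list order and duplicate keys are not merged, which is where it differs from `Map`.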

This PR also changes the command to no longer generate payment keys and stake keys. If we want the ability to optionally output these files, there is additional work to do.

Addresses #3938

@newhoggy newhoggy changed the title Newhoggy/reduce memory usage of create staked command Reduce memory usage of create staked command Jun 9, 2022
@newhoggy newhoggy force-pushed the newhoggy/reduce-memory-usage-of-create-staked-command branch from 1d7c997 to 60eae8c Compare June 9, 2022 06:39
@newhoggy newhoggy marked this pull request as ready for review June 9, 2022 06:40
@newhoggy newhoggy force-pushed the newhoggy/reduce-memory-usage-of-create-staked-command branch 2 times, most recently from f817ee4 to 7a72ec1 Compare June 9, 2022 06:44
@newhoggy newhoggy force-pushed the newhoggy/reduce-memory-usage-of-create-staked-command branch 6 times, most recently from 661ba47 to 2f68910 Compare June 13, 2022 11:38
@newhoggy
Contributor Author

Example run:

$ time dist-newstyle/build/aarch64-osx/ghc-8.10.7/cardano-cli-1.33.0/x/cardano-cli/build/cardano-cli/cardano-cli genesis create-staked --genesis-dir example --supply 10000000000000 --gen-utxo-keys 1 --gen-genesis-keys 0 --supply-delegated 2000000000000000 --gen-pools 2 --gen-stake-delegs 1300000 --testnet-magic 42 --num-stuffed-utxo 8000000
generated genesis with: 0 genesis keys, 1 non-delegating UTxO keys, 2 stake pools, 1300000 delegating UTxO keys, 1300000 delegation map entries,
 genesis create-staked --genesis-dir example --supply 10000000000000  1  0     441.74s user 321.01s system 99% cpu 12:47.91 total

@newhoggy newhoggy force-pushed the newhoggy/reduce-memory-usage-of-create-staked-command branch 4 times, most recently from cfca52c to 3c8c31d Compare June 14, 2022 05:21
@newhoggy
Contributor Author

Slight improvement and uses about 1GB of memory:

$ time dist-newstyle/build/aarch64-osx/ghc-8.10.7/cardano-cli-1.33.0/x/cardano-cli/build/cardano-cli/cardano-cli genesis create-staked --genesis-dir example --supply 10000000000000 --gen-utxo-keys 1 --gen-genesis-keys 0 --supply-delegated 2000000000000000 --gen-pools 2 --gen-stake-delegs 1300000 --testnet-magic 42 --num-stuffed-utxo 8000000
generated genesis with: 0 genesis keys, 1 non-delegating UTxO keys, 2 stake pools, 1300000 delegating UTxO keys, 1300000 delegation map entries,
 genesis create-staked --genesis-dir example --supply 10000000000000  1  0     133.75s user 290.89s system 98% cpu 7:12.31 total

@newhoggy newhoggy requested a review from deepfire June 14, 2022 07:43
@Jimbo4350
Contributor

I need to make a second pass

Contributor

@Jimbo4350 Jimbo4350 left a comment

Latest review comments courtesy of @dcoutts .

distribution = [pool | (pool, poolIx) <- zip pools [1 ..], _ <- [1 .. delegsForPool poolIx]]

-- Distribute M delegates across N pools:
delegations <- liftIO $ Lazy.forM distribution $ computeDelegation network
Contributor

I would say let's just implement our own versions of forM and replicateM in the cli and remove the hw-lazy dependency.

Contributor Author

@newhoggy newhoggy Jun 14, 2022

cardano-node doesn't seem the right place to put them given how generic they are. Can I put them into cardano-base?

Note these sorts of functions need associated tests.

Contributor

cardano-prelude might be a good place for those functions and tests, unless the overall plan is to stop depending on cardano-prelude (at least that is what we are striving for in the cardano-base and cardano-ledger repos).

Contributor Author

I've put them in Cardano.CLI.IO.Lazy for now. It can be moved to a better place in a different PR if we find one.
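
For readers unfamiliar with the technique, lazy versions of `forM` and `replicateM` can be sketched with `unsafeInterleaveIO` from `base`. The names `lazyForM` and `lazyReplicateM` are assumptions for illustration; the actual definitions in Cardano.CLI.IO.Lazy may differ.

```haskell
import System.IO.Unsafe (unsafeInterleaveIO)

-- | Like 'forM' for lists, but each element's action runs only when that
-- list cell is demanded.
lazyForM :: [a] -> (a -> IO b) -> IO [b]
lazyForM [] _ = pure []
lazyForM (x : xs) f = unsafeInterleaveIO $ do
  y <- f x
  ys <- lazyForM xs f -- returns immediately; the tail is itself deferred
  pure (y : ys)

-- | Like 'replicateM', deferring each action in the same way.
lazyReplicateM :: Int -> IO a -> IO [a]
lazyReplicateM n action = lazyForM (replicate n ()) (const action)
```

Because effects are deferred until the list is consumed, this even terminates on an infinite input such as `take 3 <$> lazyForM [1 ..] f`, where the strict `forM` would not. The usual lazy IO caveat applies: the order and timing of effects follow consumption, which is why such helpers deserve tests.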

-- times ensures that any data structures that are created as a result of the read is not
-- retained in memory.

let numDelegations = length delegations
Contributor

This is suspicious, calling length could destroy the laziness. We could get numDelegations in this list comprehension: distribution = [pool | (pool, poolIx) <- zip pools [1 ..], _ <- [1 .. delegsForPool poolIx]] if we multiply the number of pools by the number of delegations per pool
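
This suggestion can be sketched as follows: derive the count from the per-pool sizes rather than from the length of the generated list. The definition of `delegsForPool` is not shown in this thread, so the version below (an even split with the remainder going to the first pools) and its extra `total` and `nPools` parameters are assumptions for illustration.

```haskell
-- | Hypothetical: how many delegators pool number @poolIx@ (1-based) receives
-- when @total@ delegators are split as evenly as possible over @nPools@ pools.
delegsForPool :: Int -> Int -> Int -> Int
delegsForPool total nPools poolIx = q + (if poolIx <= r then 1 else 0)
  where
    (q, r) = total `divMod` nPools

-- | One entry per delegator, naming the pool it delegates to (built lazily).
distribution :: Int -> [pool] -> [pool]
distribution total pools =
  [ pool
  | (pool, poolIx) <- zip pools [1 ..]
  , _ <- [1 .. delegsForPool total nPools poolIx]
  ]
  where
    nPools = length pools

-- | Number of delegations computed from the pool sizes alone, so the
-- potentially huge delegation list is never traversed just to count it.
numDelegations :: Int -> Int -> Int
numDelegations total nPools =
  sum [delegsForPool total nPools ix | ix <- [1 .. nPools]]
```

Only the short list of pools is measured with `length` here; the long list of delegations stays lazy.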

Contributor Author

Yes, calling length destroys laziness, but I've been in discussion with @kosyrevSerge and he is happy to pay the memory cost in this case.

Note, length is not the only thing that "destroys laziness". Using delegations more than once does the same thing. Previously, delegations was written to a temporary file and the temporary file was read multiple times to preserve laziness, but @kosyrevSerge didn't want to pay the serialisation cost of that.
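
The temporary-file approach described here can be sketched with plain lazy IO: write one record per line, then open a fresh lazy stream for each pass so that no pass shares an in-memory list with another. The helper names and the use of `String` lines are assumptions for illustration, not the PR's actual code.

```haskell
import Control.Exception (evaluate)

-- | Write one record per line (newline-delimited, as in a .jsonl file).
writeRecords :: FilePath -> [String] -> IO ()
writeRecords path = writeFile path . unlines

-- | Open a fresh lazy stream over the file. Lines are read on demand, and
-- each call re-reads from disk instead of reusing a list kept in memory.
streamRecords :: FilePath -> IO [String]
streamRecords path = lines <$> readFile path

-- | Two independent passes: one counts the records, one totals their sizes.
-- 'evaluate' forces each result so a pass is finished before the next starts.
countAndSize :: FilePath -> IO (Int, Int)
countAndSize path = do
  n <- evaluate . length =<< streamRecords path
  size <- evaluate . sum . map length =<< streamRecords path
  pure (n, size)
```

The trade-off raised in this thread is visible in the sketch: each pass pays the cost of reading (and, in the real command, decoding) the file again in exchange for not retaining the whole list.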

Contributor Author

The comment above is out of date however and I've deleted it.

StakeVerificationKey stakeVK <- firstExceptT ShelleyGenesisCmdTextEnvReadFileError
. newExceptT
$ readFileTextEnvelope (AsVerificationKey AsStakeKey) stakeVKF
computeDelegation :: ()
Contributor

@Jimbo4350 Jimbo4350 Jun 14, 2022

So in this function we are generating two keys but all we are interested in are their hashes. We should create a new primitive in the cli that creates non-cryptographically secure key hashes for testing purposes like this.

We can use System.Random for the entropy and hashFromBytes (in cardano-base) to generate dummy key hashes. This should speed things up a lot as generateSigningKey is an expensive function.

Contributor Author

A new generateInsecureSigningKey function has been introduced.
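
A sketch of the seed-generation step behind such an insecure key generator, using `System.Random` from the `random` package (1.2 or later) and `bytestring`. One detail worth noting: `getStdGen` only reads the global generator and does not advance it, so unless the generator is threaded through or written back, repeated calls start from the same state. The helper names below are hypothetical, and turning the bytes into a key (for example via `deserialiseFromRawBytes`) is left out.

```haskell
import qualified Data.ByteString as BS
import qualified System.Random as Random

-- | Draw @n@ pseudo-random bytes and return the advanced generator.
-- Not cryptographically secure: suitable for test data only.
insecureSeed :: Random.RandomGen g => Int -> g -> (BS.ByteString, g)
insecureSeed n g = Random.genByteString n g

-- | Produce @count@ seeds of @n@ bytes each, threading the generator so
-- that every seed is drawn from a fresh generator state.
insecureSeeds :: Random.RandomGen g => Int -> Int -> g -> [BS.ByteString]
insecureSeeds count n g0 = take count (go g0)
  where
    go g = let (bs, g') = insecureSeed n g in bs : go g'
```

In `IO`, the starting generator could come from `Random.newStdGen`, which splits the global generator rather than just reading it.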

import qualified Data.List as L
import qualified Data.Vector as V

newtype ListMap k v = ListMap
Contributor

We should be able to generically derive the JSON instances for ListMap and it should do the right thing. @newhoggy can you try this?

Contributor Author

Note, generically deriving JSON would do one of two things:

deriving newtype would change the format of the JSON file to use [["key1", "value1"], ["key2", "value2"]] instead of {"field1": "value1", "field2": "value2"}.

DeriveGeneric would additionally add { "unListMap": ... } wrapper.

Contributor Author

@newhoggy newhoggy Jun 15, 2022

Looks like we're settling on changing the ledger code, so this code will disappear once the ledger has been updated.

I'm keeping this the same for now.

@newhoggy newhoggy force-pushed the newhoggy/reduce-memory-usage-of-create-staked-command branch from 3c8c31d to d2abe0b Compare June 14, 2022 21:23
@newhoggy newhoggy requested a review from Jimbo4350 June 15, 2022 13:43
@newhoggy newhoggy dismissed Jimbo4350’s stale review June 15, 2022 13:45

Some changes addressed

@@ -330,6 +330,18 @@ source-repository-package
tag: ee59880f47ab835dbd73bea0847dab7869fc20d8
--sha256: 1lrzknw765pz2j97nvv9ip3l1mcpf2zr4n56hwlz0rk7wq7ls4cm

source-repository-package
Contributor

@Jimbo4350 Jimbo4350 Jul 6, 2022

I still think we should duplicate the functions used from hw-lazy. The node has enough dependencies and we need to be stricter about introducing new ones.

Contributor Author

I put the functions into Cardano.CLI.IO.Lazy for the moment.

@@ -67,6 +69,16 @@ generateSigningKey keytype = do
seedSize = deterministicSigningKeySeedSize keytype


generateInsecureSigningKey :: (Key keyrole, SerialiseAsRawBytes (SigningKey keyrole))
Contributor

Usually we indent the ::

-> IO (SigningKey keyrole)
generateInsecureSigningKey keytype = do
g <- Random.getStdGen
let (bs, _) = Random.genByteString (fromIntegral (deterministicSigningKeySeedSize keytype)) g
Contributor

Can we ditch a pair of the parentheses with $?

Contributor Author

Done

let (bs, _) = Random.genByteString (fromIntegral (deterministicSigningKeySeedSize keytype)) g
case deserialiseFromRawBytes (AsSigningKey keytype) bs of
Just key -> return key
Nothing -> error "Unable to generate insecure key"
Contributor

A better error message would be "generateInsecureSigningKey: Unable to generate insecure key"

Contributor Author

Done

makeShelleyTransactionBody :: ()
=> ShelleyBasedEra era
makeShelleyTransactionBody ::
ShelleyBasedEra era
Contributor

Why not indent the ::?

Contributor Author

Moved the ::

@@ -481,3 +483,8 @@ readVerificationKeyOrHashOrTextEnvFile asType verKeyOrHashOrFile =
eitherVk <- readVerificationKeyOrTextEnvFile asType vkOrFile
pure (verificationKeyHash <$> eitherVk)
VerificationKeyHash vkHash -> pure (Right vkHash)

generatePaymentKeys :: Key keyrole => AsType keyrole -> IO (VerificationKey keyrole, SigningKey keyrole)
Contributor

So these may not necessarily be payment keys because we can vary the keyrole. generateKeyPair might be a better name.

Contributor Author

Renamed

newtype VerificationKey StakeKey =
StakeVerificationKey (Shelley.VKey Shelley.Staking StandardCrypto)
newtype VerificationKey StakeKey = StakeVerificationKey
{ unStakeVerificationKey :: Shelley.VKey Shelley.Staking StandardCrypto
Contributor

Why this change?

Contributor Author

Contributor

In general, I am always in favour of adding an accessor to a newtype. If it's there and no one uses it, it's not a big deal (especially with a totally unambiguous name like this), but when it's needed, having it already there is a huge win IMO.

Contributor

@Jimbo4350 Jimbo4350 left a comment

A few comments but nothing major

AddressKeyHash vkf mOFp -> runAddressKeyHash vkf mOFp
AddressBuild paymentVerifier mbStakeVerifier nw mOutFp -> runAddressBuild paymentVerifier mbStakeVerifier nw mOutFp
AddressBuildMultiSig sFp nId mOutFp -> runAddressBuildScript sFp nId mOutFp
AddressInfo txt mOFp -> firstExceptT ShelleyAddressCmdAddressInfoError $ runAddressInfo txt mOFp

runAddressKeyGen :: AddressKeyType
runAddressKeyGenToFile :: AddressKeyType
Contributor

indent/alignment?

Contributor Author

Moved the ::

AddressKeyShelleyExtended -> generateAndWriteKeyFiles AsPaymentExtendedKey vkf skf
AddressKeyByron -> generateAndWriteKeyFiles AsByronKey vkf skf

generateAndWriteKeyFiles :: Key keyrole
Contributor

indent/alignment of ::?

Contributor Author

Moved the ::

generateAndWriteKeyFiles asType vkf skf = do
uncurry (writePaymentKeyFiles vkf skf) =<< liftIO (generatePaymentKeys asType)

writePaymentKeyFiles :: Key keyrole
Contributor

indent/alignment of ::?

Contributor Author

Moved the ::

start genDlgs mNonDlgAmount (length nonDelegAddrs) nonDelegAddrs stakePools stake
stDlgAmount numDelegations delegAddrs stuffedUtxoAddrs template

-- shelleyGenesis contains lazy loaded data, so using lazyToJson to serialise to avoid
Contributor

We should be able to delete this comment

Contributor Author

Deleted

StakeVerificationKey stakeVK <- firstExceptT ShelleyGenesisCmdTextEnvReadFileError
. newExceptT
$ readFileTextEnvelope (AsVerificationKey AsStakeKey) stakeVKF
computeInsecureDelegation ::
Contributor

We should add a comment here to indicate that the keys are not cryptographically generated and we don't care because this is for testing purposes. computeInsecureDelegation is a strange name. Why not generateTestDelegation or something to that effect?

Contributor Author

Done

@@ -1229,4 +1282,3 @@ readAlonzoGenesis fpath = do
lbs <- handleIOExceptT (ShelleyGenesisCmdGenesisFileError . FileIOError fpath) $ LBS.readFile fpath
firstExceptT (ShelleyGenesisCmdAesonDecodeError fpath . Text.pack)
. hoistEither $ Aeson.eitherDecode' lbs

Contributor

Is your editor deleting newlines at the end of the file?

Contributor Author

I don't think so.

skeyDesc, vkeyDesc :: TextEnvelopeDescr
skeyDesc = "Stake Signing Key"
vkeyDesc = "Stake Verification Key"
runStakeAddressKeyGenToFile ::
Contributor

Indent/alignment of ::?

Contributor Author

Moved the ::

-- Genesis staking: pools/delegation map & delegated initial UTxO spec:
-> [(Ledger.KeyHash 'Ledger.StakePool StandardCrypto, Ledger.PoolParams StandardCrypto)]
-> [(Ledger.KeyHash 'Ledger.Staking StandardCrypto, Ledger.KeyHash 'Ledger.StakePool StandardCrypto)]
-> Lovelace
Contributor

Can we add haddocks for these parameters?

Contributor Author

I added the haddocks

{ dInitialUtxoAddr = shelleyAddressInEra initialUtxoAddr
, dDelegStaking = Ledger.hashKey stakeVK
, dDelegStaking = Ledger.hashKey (unStakeVerificationKey stakeVK)
Contributor Author

Use of unStakeVerificationKey

@newhoggy newhoggy force-pushed the newhoggy/reduce-memory-usage-of-create-staked-command branch 3 times, most recently from 46db29c to 5be4f73 Compare July 7, 2022 13:16
@newhoggy newhoggy dismissed Jimbo4350’s stale review July 7, 2022 13:21

Comments addressed

@newhoggy newhoggy requested review from Jimbo4350 and lehins and removed request for Jimbo4350 July 7, 2022 13:21
Contributor

@lehins lehins left a comment

👍

@newhoggy
Contributor Author

newhoggy commented Jul 8, 2022

bors r+

@iohk-bors
Contributor

iohk-bors bot commented Jul 8, 2022

Merge conflict.

@newhoggy newhoggy force-pushed the newhoggy/reduce-memory-usage-of-create-staked-command branch from 5be4f73 to cc24684 Compare July 11, 2022 00:29
@newhoggy
Contributor Author

bors r+

iohk-bors bot added a commit that referenced this pull request Jul 11, 2022
4021: Reduce memory usage of create staked command r=newhoggy a=newhoggy

Co-authored-by: John Ky <[email protected]>
@iohk-bors
Contributor

iohk-bors bot commented Jul 11, 2022

This PR was included in a batch that successfully built, but then failed to merge into master. It will not be retried.

Additional information:

{"message":"Waiting on code owner review from JaredCorduan, Jimbo4350, dcoutts, and/or erikd.","documentation_url":"https://docs.github.com/articles/about-protected-branches"}

@deepfire
Contributor

bors r+

@iohk-bors
Contributor

iohk-bors bot commented Jul 11, 2022

Build succeeded:

6 participants