Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat(standalone): Storage backend #375

Merged
merged 1 commit into from
Nov 25, 2021

Conversation

birchmd
Copy link
Member

@birchmd birchmd commented Nov 23, 2021

Lays out the necessary logic for the rocksdb backend. This implementation follows @mfornet 's suggestion (using labelled keys instead of snapshots + diffs). Diffs are still persisted because they might be useful for other things.

I've added some basic tests, and I'll write more before this PR gets merged. But it's a relatively large PR so I wanted to get it out there to give others time to review it. Edit: I'm happy with the test coverage now.

I also need to fix some clippy warnings that popped up because I wanted to import test utils into the standalone crate. Though maybe the better way to do it would be to put standalone tests into the engine-tests crate. Edit: moved standalone-storage into its own crate and put the tests for it in engine-tests; this will also enable correctness checks by running the same transaction under both the wasm contract and the standalone backed by the new storage layer.

@birchmd birchmd added A-storage Area: Any changes that affect storage. A-standalone Area: the standalone engine EVM labels Nov 23, 2021
@birchmd birchmd linked an issue Nov 23, 2021 that may be closed by this pull request
Copy link
Contributor

@joshuajbouw joshuajbouw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of changes. All in all, looks correct as far as I can tell right now. Good start!

engine-standalone-storage/Cargo.toml Outdated Show resolved Hide resolved
engine-standalone-storage/src/engine_state.rs Show resolved Hide resolved
engine-standalone-storage/src/engine_state.rs Outdated Show resolved Hide resolved
engine-standalone-storage/src/lib.rs Outdated Show resolved Hide resolved
@joshuajbouw
Copy link
Contributor

The three storage lifetimes, thats more of a optimisation to prevent cloning of large storage states I take it?

Copy link
Contributor

@mfornet mfornet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation doesn't allow to iterate through ranges in the storage. One of the suggestions made in the notes were:

Maintain the current state indexed by an empty key. This can provide range queries for the latest storage only.

Do we plan to support this as well?

engine-standalone-storage/src/lib.rs Outdated Show resolved Hide resolved
engine-standalone-storage/src/lib.rs Outdated Show resolved Hide resolved
engine-standalone-storage/src/lib.rs Show resolved Hide resolved
Comment on lines +111 to +119
fn write_storage(&mut self, key: &[u8], value: &[u8]) -> Option<Self::StorageValue> {
let original_value = self.read_storage(key);

self.transaction_diff
.borrow_mut()
.modify(key.to_vec(), value.to_vec());

original_value
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we allow some option to write to storage without reading evicted value! I think this feature is not always necessary, and reading from storage can be slow if the value is not in the diff right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with remove_storage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this is part of IO trait, and probably that involve a bigger change.

Don't take this comment as blocking, just something that passed through my mind while reading the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interface for the IO trait is based on the one from NEAR itself where write also takes a register to store the old value.

I agree the current implementation is more overhead when the old value is not used. An alternative implementation is the following:

We add another variant to EngineStorageValue called LazyDBRead which holds the key. And its AsRef implementation actually performs the read to return the value. Then for write_storage and remove_storage we check if the key is in the diff, if so we capture the value as we do now, but if not this LazyDBRead is returned insteasd. This means no DB read is done unless the returned value is used. It works because we do not modify the DB during the lifetime of this IO instance (writes only change the Diff), so we can technically perform the write before reading the old value from the DB.

What do you think about that? It is more efficient (writes only require allocating memory to store the key, not DB reads unless the value is used), but more complex also. I'm not sure what level of optimization we need for the standalone engine. It probably has an inherent advantage in terms of performance because it does not require wasm interpretation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an excellent suggestion, though not strictly necessary for now. I'm ok skipping it.

engine-standalone-storage/src/engine_state.rs Show resolved Hide resolved
@birchmd
Copy link
Member Author

birchmd commented Nov 24, 2021

The current implementation doesn't allow to iterate through ranges in the storage. One of the suggestions made in the notes were:

Maintain the current state indexed by an empty key. This can provide range queries for the latest storage only.

Do we plan to support this as well?

I don't think I will support this. Having a "hot" state that is updated with every block seems like it will make it hard to get a consistent view of the state. I think if we need to do range queries it should be possible the get the relevant data from diffs alone. It might be a high latency operation, but it seems like use cases requiring range queries will be infrequent and not require instantaneous responses.

@birchmd
Copy link
Member Author

birchmd commented Nov 24, 2021

The three storage lifetimes, thats more of a optimisation to prevent cloning of large storage states I take it?

The lifetimes are needed because the engine currently requires IO + Copy on its storage access trait bound. None of the rust smart pointers (Box, Rc, Arc, etc) implement Copy, so regular references needed to be used. Semantically there are three different references involved, with potentially distinct origins, and therefore three lifetimes. That said, in practice the implementation ends up with all three lifetimes being the same so we could simplify to a single lifetime. I kept the lifetimes independent though because I think it is an implementation detail that they all happen to come out the same and in the future we may wish to change the implementation such that the lifetimes are different.

@birchmd
Copy link
Member Author

birchmd commented Nov 24, 2021

@joshuajbouw @mfornet Thanks for the reviews! I've addressed all the comments. I've also added the last test I wanted to include for this feature. I think this PR is ready to merge once I have your approvals.

@mfornet mfornet self-requested a review November 25, 2021 11:14
Copy link
Contributor

@joshuajbouw joshuajbouw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just note, I believe there are a few optimisations here. But, we need to get this over the edge and its fine for now.

Specifically with write_storage, we require .to_vec(), and taking in owned values as a reference even though we are taking ownership of them. Likewise, write_storage_direct shouldn't exist.

engine-standalone-storage/src/diff.rs Outdated Show resolved Hide resolved
@birchmd
Copy link
Member Author

birchmd commented Nov 25, 2021

Just note, I believe there are a few optimisations here. But, we need to get this over the edge and its fine for now.

Specifically with write_storage, we require .to_vec(), and taking in owned values as a reference even though we are taking ownership of them. Likewise, write_storage_direct shouldn't exist.

write_storage_direct makes sense in the NEAR-based IO implementation as a performance optimization because it means we don't need to copy values from a register just to send it back to the register again (the StorageValue type is a register ID). In this implementation I agree it doesn't add any value. But I also think it's harm is pretty minimal, so I doubt it'll cause any performance issue. If it does then the implementation can be adjusted to reduce cloning bytes.

@birchmd birchmd merged commit 5bc72a4 into aurora-is-near:develop Nov 25, 2021
@birchmd birchmd deleted the standalone-storage branch November 25, 2021 18:47
artob pushed a commit that referenced this pull request Dec 10, 2021
* Feat(engine): London hard fork support (#244)
* Fix(exit precompile): Address to refund in case of error is an argument (#311)
* Feat(engine): Make engine parametric in storage access (#314)
* Test verifying the EVM log returns the correct address (#341)
* Remove sdk::current_account_id usage from engine-precompiles (#346)
* Remove Default trait bound from engine IO (#342)
* Remove some sdk usage from core logic (#347)
* Factor out blockchain environment variable access as a trait (#349)
* Factor out NEAR promise host functions into a trait (#353)
* Borsh deserialized value field for call args (#351)
* Refactor(eth-connector): Use Result return values instead of panicking (#355)
* Gate all NEAR host functions behind the contract feature (#356)
* Bump @openzeppelin/contracts from 4.3.2 to 4.3.3 in /etc/eth-contracts
* Chore: Newtypes for gas (#344)
* Feat(standalone): Standalone (#345)
* Minor fixes to sdk refactor (#359)
* Refactor(engine): Move submit logic into engine module (#366)
* Feat(standalone): Storage backend (#375)
* NEAR random numbers from solidity contract (#368)
* Feat(standalone): EVM tracing via SputnikVM (#383)
* Feat(standalone): Bootstrap storage from relayer and state snapshots (#379)
* Feat(standalone): Structures and logic for keeping storage in sync with the blockchain (#382)
* Feat(standalone): Capture geth-like tracing from SputnikVM events (#384)
* Connector cleanup (#374)
* Remove betanet
* Fix(engine): original_storage bug fix; more tracing tests (#390)
* Increase NEAR Gas for ft_on_transfer (#389)

Co-authored-by: Andrew Bednoff <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Evgeny Ukhanov <[email protected]>
Co-authored-by: Marcelo Fornet <[email protected]>
Co-authored-by: Michael Birch <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-standalone Area: the standalone engine EVM A-storage Area: Any changes that affect storage.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Standalone storage
3 participants