Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compilation artifacts are altered by JSON deserialization - resulting in a mismatch with the bytecode's appended IPFS hash #1325

Closed
joaquim-verges opened this issue May 29, 2022 · 12 comments · Fixed by #1365
Labels
bug Something isn't working

Comments

@joaquim-verges
Copy link

joaquim-verges commented May 29, 2022

Version
master branch, building from source

Platform
Darwin joaquims-mbp.lan 21.3.0 Darwin Kernel Version 21.3.0: Wed Jan 5 21:37:58 PST 2022; root:xnu-8019.80.24~20/RELEASE_ARM64_T6000 arm64

Description

When running forge build --extra-output metadata, with bytecode_hash: "ipfs", I encountered an unexpected mismatch when comparing the metadata IPFS hash from the bytecode and the IPFS hash of the metadata outputted by forge.

I first thought it might be because of absolute vs relative paths, filed foundry-rs/foundry#1738 which got resolved, but even with that fix the IPFS don't match.

I kept investigating and found the root of the issue: the raw solc compilation output, particularly the metadata is being deserialized by ethers-rs and then serialized again when writing the final artifacts to disk.

This causes multiple alterations to the original metadata output from solc. Here's the ones I've confirmed:

1. The abi in metadata.output.abi is getting all empty inputs stripped, and indexed is missing from all event properties.

The empty inputs being stripped is caused by the json deserialization policy:

#[derive(Clone, Debug, PartialEq, Eq, Serialize, Deserialize)]
pub struct SolcAbi {
  #[serde(default, skip_serializing_if = "Vec::is_empty")]
  pub inputs: Vec<Item>,
  ...

This can easily be verified by compiling a contract with an empty constructor with forge. The abi in the metadata will be missing the inputs property for the constructor, whereas the top level abi (the one that always get outputted by forge) will have it.

I believe that SolcAbi model is missing the indexed optional property as well, causing wrong event ABIs.

2. When the contract has no libraries, the libraries: {} entry in the metadata gets stripped.

Similarly, when deserializing the settings entry in the metadata, the JSON policy strips out any empty objects:

#[derive(Clone, Debug, PartialEq, Eq, Serialize, Deserialize)]
pub struct Settings {
  ...
   #[serde(default, skip_serializing_if = "Libraries::is_empty")]
   pub libraries: Libraries,
}

Because of that, the metadata output is altered from its original content.

3. The metadata output contains the key outputSelection which is not in the solidity metadata spec

This is due to the re-use of pub struct Settings for representing both solc inputs AND outputs. But in the spec, outputSelection is only an input field, and should not be in the finale metadata output.

4. Deserializing and serializing the compiler output shuffles the keys in the final output JSON

The compiler outputs metadata with keys in alphabetical order, but because of the deserialization and serialization process, those keys get shuffled, resulting in a different JSON than the one that was generated by the compiler.


These are the immediate issues I found while testing with a simple contract, I believe there might be more.

My recommendation would be to avoid deserializing / serializing the compiler metadata output, and instead just output the raw string that is produced by the compiler without altering it. This ensures that the produced bytecode IPFS hash matches the exact metadata contents that gets written in the final artifact files.

I believe that if end users want to do something with the content of the metadata, it is trivial for them to parse the raw string into JSON and do what they need to do. It should not be the responsibility of ethers-rs or forge to format that into JSON, especially since it breaks the guarantee that this metadata matches the bytecode.

For what it's worth, both hardhat and truffle do exactly this, in their build output there is a metadata entry that is just a raw, escaped JSON string, taken straight from the solc output. Without any alteration or formatting, like this:

{
  ... rest of the artifact output
  "metadata": "{\"compiler\":{\"version\":\"0.8.13+commit.abaa5c0e\"},\"language\":\"Solidity\",...}"
 }

Let me know if that makes sense, I'd be happy to try and do these changes in a PR.

@mattsse
Copy link
Collaborator

mattsse commented May 30, 2022

thanks for looking into this,

not sure what the implications of emitting the metadata as string would be, whether this would break other tools. but simply emitting this as-is would make sense to me, given that solc does the same.

I fixed some issues you raised (1-3) in this PR #1326 but in order to make this a string we would simply need to change the type of metadata to String for the Contract::metadata field

@gakonst
Copy link
Owner

gakonst commented May 31, 2022

RE: 4, should we do something like this instead, so that fields are always serialized alphabetically? https://stackoverflow.com/a/67792465

@joaquim-verges
Copy link
Author

joaquim-verges commented May 31, 2022

@mattsse @gakonst I do think emitting the raw string from solc is the safest choice, since there might be other serialization issues im missed. Could be done in a separate field (something like solcMetadata or rawMetadata) if you're worried about backwards compatibility.

I guess as long as there are some unit tests that check something like serialize(parsedMetadata) == rawSolcMetadata for various type of contracts, then it would be ok?

Still feels unnecessary and error-prone to parse the solc metadata just to re-serialize it again, especially since it's trivial for any consumer to deserialize JSON on their end. I don't think forge/ethers-rs itself does anything with the parsed metadata right?

@gakonst
Copy link
Owner

gakonst commented Jun 1, 2022

Cool - I'm personally fine with adding a raw meta field on top of our already typed ones, wdyt @mattsse?

@tfalencar
Copy link

tfalencar commented Jun 3, 2022

Unsure if related (appears to be), but between foundry --version 0.2.0 (85f69d9 2022-05-09T00:04:25.613861+00:00) and foundry --version 0.2.0 (0342bc2 2022-06-03T00:04:00.330112Z) I had a regression in specific solidity files which don't contain Contract definitions (for example, a .sol file with only constants declared).

In such cases, the ABI that was generated formerly would look like:
{ "abi": [], ... }
whereas in the later version, the following is generated:
{ "abi": null, .... }
This broke my toolchain when such json abi file is passed down to "typechain", with error:
"Error occured: Not a valid ABI"
since typechain doesn't handle the case where object is null (instead it understands an empty array):

https://github.com/dethcrypto/TypeChain/blob/e1f832c624524bb32518f76040b6fb48899f9be7/packages/typechain/src/parser/abiParser.ts#L379

Do I understand it correctly that this PR should fix it: https://github.com/gakonst/ethers-rs/pull/1326/files , or am I completely off? I'm happy to create a new issue if its a separate / unrelated problem.

@mattsse
Copy link
Collaborator

mattsse commented Jun 4, 2022

I had a regression in specific solidity files which don't contain Contract definitions

that was a recent change that affected files without contract definitions, looks like we're emitting the empty abi wrong in those cases.

should be an easy fix,

@mattsse
Copy link
Collaborator

mattsse commented Jun 4, 2022

Cool - I'm personally fine with adding a raw meta field on top of our already typed ones, wdyt @mattsse?

that makes sense, basically passing through the string solc emits

@joaquim-verges
Copy link
Author

@gakonst @mattsse I found another issue with devdoc and userdoc parsing inside the solc metadata.

pub struct Doc {
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub kind: Option<String>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub methods: Option<DocLibraries>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub version: Option<u32>,
}

The struct is missing the top level title and notice properties.

Any updates on having that raw metadata field directly from solc ?

@mattsse
Copy link
Collaborator

mattsse commented Jun 9, 2022

@0xYYY mind taking a look at this? not sure if this was resolved with your latest PR

@joaquim-verges
Copy link
Author

@gakonst @mattsse thank you for adding the raw metadata field!

do these ethers-rs updates get picked up on forge nightly builds? trying to estimate when I can migrate to using this field

@mattsse
Copy link
Collaborator

mattsse commented Jun 10, 2022

bumped here foundry-rs/foundry#1909

will be included in tonight's nightly

@joaquim-verges
Copy link
Author

@mattsse wonderful. Thank you for your hard work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants