Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add utility function for converting an address to checksummed string #5067

Merged
merged 9 commits into from
Jun 4, 2024

Conversation

cairoeth
Copy link
Contributor

@cairoeth cairoeth commented Jun 3, 2024

Closes #4633

PR Checklist

  • Tests
  • Documentation
  • Changeset entry (run npx changeset add)

@cairoeth cairoeth added this to the 5.1 milestone Jun 3, 2024
@cairoeth cairoeth requested review from Amxx and ernestognw June 3, 2024 18:14
Copy link

changeset-bot bot commented Jun 3, 2024

🦋 Changeset detected

Latest commit: 8f11b32

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
openzeppelin-solidity Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@cairoeth
Copy link
Contributor Author

cairoeth commented Jun 3, 2024

There's some re-used logic here from toHexString but the main difference is removing the 0x from the string. toHexString adds 0x to the buffer by default, so to use this function it would require removing it form the returned string which would be extra computation just for reusability.

Copy link
Member

@ernestognw ernestognw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying to come up with a way of reusing the logic of toHexString and I noticed some old behaviors we've kept in the library like providing the expected length of the hex string (which we already know by Math.log256(value) + 1).

Perhaps this PR derives in a broader discussion about how to implement this, but I'd say the simplest option is to rewrite toHexString in the following way:

function toHexString(uint256 value, uint256 length) internal pure returns (string memory) {
    if (length < Math.log256(value) + 1) {
        revert StringsInsufficientHexLength(value, length);
    }

    bytes memory buffer = new bytes(2 * length + 2);
    buffer[0] = "0";
    buffer[1] = "x";
    _setHexString(buffer, 2, value);

    return string(buffer);
}

function _setHexString(bytes memory buffer, uint256 offset, uint256 value) internal pure {
    for (uint256 i = buffer.length - 1; i >= offset; --i) {
        buffer[i] = HEX_DIGITS[value & 0xf];
        value >>= 4;
    }
}

This way, we can reuse _setHexString in the new function. Similar to:

function toChecksumHexString(address addr) internal pure returns (string memory) {
    bytes memory lowercase = new bytes(40);
    _setHexString(lowercase, 0, uint160(address);
    bytes32 hashedAddr = keccak256(abi.encodePacked(lowercase));
    ...
}

What do you think?

@cairoeth
Copy link
Contributor Author

cairoeth commented Jun 3, 2024

@ernestognw Yes, I was thinking something like that as well -- should we should make _setHexString private as it's not designed to be used outside the library (mainly because it doesn't have any checks on value after the operation)?

@cairoeth cairoeth requested a review from ernestognw June 3, 2024 21:44
contracts/utils/Strings.sol Outdated Show resolved Hide resolved
contracts/utils/Strings.sol Outdated Show resolved Hide resolved
contracts/utils/Strings.sol Outdated Show resolved Hide resolved
contracts/utils/Strings.sol Outdated Show resolved Hide resolved
contracts/utils/Strings.sol Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a rust implementation of this checksum algorithm in Foundry (as seen in cast), so it should be relatively trivial to make a PR and request for it to be exposed through VM.sol as with Base64.

With that, we can fuzz the implementation, which would be extremely valuable.
Not required for this PR though, but something to consider given that the changes we're making are somewhat relevant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totally -- fuzzing should be used in some of the other utils as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, feel free to open PRs adding fuzzing or Halmos FV to those utils you consider make sense

Comment on lines 123 to 133
describe('toChecksumHexString address', function () {
it('converts a random address', async function () {
const addr = '0xa9036907dccae6a1e0033479b12e837e5cf5a02f';
expect(await this.mock.getFunction('$toChecksumHexString(address)')(addr)).to.equal(ethers.getAddress(addr));
});

it('converts an address with leading zeros', async function () {
const addr = '0x0000e0ca771e21bd00057f54a68c30d400000000';
expect(await this.mock.getFunction('$toChecksumHexString(address)')(addr)).to.equal(ethers.getAddress(addr));
});
});
Copy link
Member

@ernestognw ernestognw Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a little consideration in the documentation of getAddress:

  • If %%address%% contains both upper-case and lower-case, it is
  • assumed to already be a checksum address and its checksum is
  • validated, and if the address fails its expected checksum an
  • error is thrown.

None of these tests are using mixed-case letters so I'd recommend adding .toLowerCase() according to the same docs:

  • If you wish the checksum of %%address%% to be ignore, it should
  • be converted to lower-case (i.e. .toLowercase()) before
  • being passed in.

Even better, let's rewrite these tests:

const addresses = [...]

describe('toChecksumHexString address', function () {
  for (const addr of addresses) {
    it(`converts ${addr}`, async function () {
      expect(await this.mock.getFunction('$toChecksumHexString(address)')(addr.toLowerCase())).to.equal(
        ethers.getAddress(addr),
      );
    });
  }
});

I'm pushing a commit

ernestognw
ernestognw previously approved these changes Jun 3, 2024
Copy link
Member

@ernestognw ernestognw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, we'll need an approval from @Amxx imo

@@ -63,17 +64,15 @@ library Strings {
* @dev Converts a `uint256` to its ASCII `string` hexadecimal representation with fixed length.
*/
function toHexString(uint256 value, uint256 length) internal pure returns (string memory) {
uint256 localValue = value;
if (length < Math.log256(value) + 1) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

computing the log is quite expensive. Checking at the end is less expensive (when everything works fine, which is the case we should optimize for). Any reason to not keep it?

contracts/utils/Strings.sol Outdated Show resolved Hide resolved
bytes memory lowercase = new bytes(40);
uint160 addrValue = uint160(addr);
_unsafeSetHexString(lowercase, 0, addrValue);
bytes32 hashedAddr = keccak256(abi.encodePacked(lowercase));
Copy link
Collaborator

@Amxx Amxx Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You often do abi.encodePacked(x) when x is already a bytes. (here and on line 107)

Used like this, abi.encodePacked is the identity function ... the output exactly match the input ... so it is not necessary. One thing that happens however, is that memory is allocated for this copy... there is also a copy loop (or a mcopy if we are lucky).

Anyway, this increasse costs and leaks memory, so we should avoid it!

Comment on lines 91 to 108
function toChecksumHexString(address addr) internal pure returns (string memory) {
bytes memory lowercase = new bytes(40);
uint160 addrValue = uint160(addr);
_unsafeSetHexString(lowercase, 0, addrValue);
bytes32 hashedAddr = keccak256(abi.encodePacked(lowercase));

bytes memory buffer = new bytes(42);
buffer[0] = "0";
buffer[1] = "x";
uint160 hashValue = uint160(bytes20(hashedAddr));
for (uint256 i = 41; i > 1; --i) {
uint8 digit = uint8(addrValue & 0xf);
buffer[i] = hashValue & 0xf > 7 ? HEX_DIGITS_UPPERCASE[digit] : HEX_DIGITS[digit];
addrValue >>= 4;
hashValue >>= 4;
}
return string(abi.encodePacked(buffer));
}
Copy link
Collaborator

@Amxx Amxx Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entier function should be done "in place". Allocating two buffers is a waste.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redesigned toChecksumHexString to avoid double allocation.

  • removed the need for _unsafeSetHexString
  • removed the need for HEX_DIGITS_UPPERCASE

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right this is extremely cleaner. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great changes, thxs!


// hash the hex part of buffer (skip length + 2 bytes, length 40)
uint256 hashValue;
assembly ("memory-safe") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: this is safe because we know buffer is 42 bytes long.

@Amxx Amxx requested a review from ernestognw June 4, 2024 16:22
Copy link
Member

@ernestognw ernestognw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM again

@Amxx Amxx merged commit 337bfd5 into OpenZeppelin:master Jun 4, 2024
21 checks passed
Copy link

gitpoap-bot bot commented Jun 4, 2024

Congrats, your important contribution to this open-source project has earned you a GitPOAP!

GitPOAP: 2024 OpenZeppelin Contracts Contributor:

GitPOAP: 2024 OpenZeppelin Contracts Contributor GitPOAP Badge

Head to gitpoap.io & connect your GitHub account to mint!

Learn more about GitPOAPs here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Util function for converting an address to checksummed string
3 participants