Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Math operations using branchless bool to uint translation. #4878

Merged
merged 10 commits into from
Feb 7, 2024

Conversation

CodeSandwich
Copy link
Contributor

@CodeSandwich CodeSandwich commented Feb 5, 2024

This PR makes the Math.log2 function branchless. I benchmarked it using Solidity 0.8.24, via-ir and 1B optimizer runs.

Here are the results.
Old gas usage New gas usage Input value
273 218 0
273 218 1
300 218 2
300 218 3
304 218 4
304 218 5
304 218 6
304 218 7
331 218 8
331 218 9
331 218 10
331 218 11
331 218 12
331 218 13
331 218 14
331 218 15
304 218 16
304 218 17
304 218 18
304 218 19
304 218 20
304 218 21
304 218 22
304 218 23
304 218 24
304 218 25
304 218 26
304 218 27
304 218 28
304 218 29
304 218 30
304 218 31
331 218 32
335 218 100
362 218 200
304 218 500
362 218 10000
366 218 20000
393 218 50000
362 218 1000000
335 218 2000000
366 218 5000000
366 218 100000000
393 218 200000000
366 218 500000000
331 218 10000000000
335 218 20000000000
362 218 50000000000
393 218 1000000000000
335 218 2000000000000
366 218 5000000000000
397 218 100000000000000
424 218 200000000000000
335 218 500000000000000
393 218 10000000000000000
397 218 20000000000000000
424 218 50000000000000000
424 218 1000000000000000000
397 218 2000000000000000000
428 218 5000000000000000000
336 218 100000000000000000000
363 218 200000000000000000000
336 218 500000000000000000000
363 218 10000000000000000000000
367 218 20000000000000000000000
394 218 50000000000000000000000
425 218 1000000000000000000000000
336 218 2000000000000000000000000
367 218 5000000000000000000000000
398 218 100000000000000000000000000
425 218 200000000000000000000000000
367 218 500000000000000000000000000
425 218 10000000000000000000000000000
429 218 20000000000000000000000000000
456 218 50000000000000000000000000000
394 218 1000000000000000000000000000000
367 218 2000000000000000000000000000000
394 218 5000000000000000000000000000000
398 218 100000000000000000000000000000000
425 218 200000000000000000000000000000000
398 218 500000000000000000000000000000000
367 218 10000000000000000000000000000000000
394 218 20000000000000000000000000000000000
425 218 50000000000000000000000000000000000
456 218 1000000000000000000000000000000000000
398 218 2000000000000000000000000000000000000
425 218 5000000000000000000000000000000000000
460 218 100000000000000000000000000000000000000
487 218 200000000000000000000000000000000000000
300 218 500000000000000000000000000000000000000
331 218 10000000000000000000000000000000000000000
358 218 20000000000000000000000000000000000000000
389 218 50000000000000000000000000000000000000000
389 218 1000000000000000000000000000000000000000000
362 218 2000000000000000000000000000000000000000000
389 218 5000000000000000000000000000000000000000000
362 218 100000000000000000000000000000000000000000000
389 218 200000000000000000000000000000000000000000000
362 218 500000000000000000000000000000000000000000000
362 218 10000000000000000000000000000000000000000000000
389 218 20000000000000000000000000000000000000000000000
420 218 50000000000000000000000000000000000000000000000
451 218 1000000000000000000000000000000000000000000000000
331 218 2000000000000000000000000000000000000000000000000
358 218 5000000000000000000000000000000000000000000000000
393 218 100000000000000000000000000000000000000000000000000
420 218 200000000000000000000000000000000000000000000000000
362 218 500000000000000000000000000000000000000000000000000
393 218 10000000000000000000000000000000000000000000000000000
420 218 20000000000000000000000000000000000000000000000000000
451 218 50000000000000000000000000000000000000000000000000000
420 218 1000000000000000000000000000000000000000000000000000000
393 218 2000000000000000000000000000000000000000000000000000000
420 218 5000000000000000000000000000000000000000000000000000000
424 218 100000000000000000000000000000000000000000000000000000000
451 218 200000000000000000000000000000000000000000000000000000000
424 218 500000000000000000000000000000000000000000000000000000000
332 218 10000000000000000000000000000000000000000000000000000000000
359 218 20000000000000000000000000000000000000000000000000000000000
363 218 50000000000000000000000000000000000000000000000000000000000
421 218 1000000000000000000000000000000000000000000000000000000000000
363 218 2000000000000000000000000000000000000000000000000000000000000
390 218 5000000000000000000000000000000000000000000000000000000000000
421 218 100000000000000000000000000000000000000000000000000000000000000
425 218 200000000000000000000000000000000000000000000000000000000000000
363 218 500000000000000000000000000000000000000000000000000000000000000
394 218 10000000000000000000000000000000000000000000000000000000000000000
421 218 20000000000000000000000000000000000000000000000000000000000000000
425 218 50000000000000000000000000000000000000000000000000000000000000000
452 218 1000000000000000000000000000000000000000000000000000000000000000000
425 218 2000000000000000000000000000000000000000000000000000000000000000000
452 218 5000000000000000000000000000000000000000000000000000000000000000000
390 218 100000000000000000000000000000000000000000000000000000000000000000000
394 218 200000000000000000000000000000000000000000000000000000000000000000000
394 218 500000000000000000000000000000000000000000000000000000000000000000000
394 218 10000000000000000000000000000000000000000000000000000000000000000000000
421 218 20000000000000000000000000000000000000000000000000000000000000000000000
425 218 50000000000000000000000000000000000000000000000000000000000000000000000
483 218 1000000000000000000000000000000000000000000000000000000000000000000000000
394 218 2000000000000000000000000000000000000000000000000000000000000000000000000
421 218 5000000000000000000000000000000000000000000000000000000000000000000000000
452 218 100000000000000000000000000000000000000000000000000000000000000000000000000
456 218 200000000000000000000000000000000000000000000000000000000000000000000000000
425 218 500000000000000000000000000000000000000000000000000000000000000000000000000
456 218 10000000000000000000000000000000000000000000000000000000000000000000000000000
483 218 20000000000000000000000000000000000000000000000000000000000000000000000000000
487 218 50000000000000000000000000000000000000000000000000000000000000000000000000000
479 233 100000000000000000000000000000000000000000000000000000000000000000000000000000
491 230 115792089237316195423570985008687907853269984665640564039457584007913129639935

PR Checklist

  • Tests
  • Documentation
  • Changeset entry (run npx changeset add)

Copy link

changeset-bot bot commented Feb 5, 2024

🦋 Changeset detected

Latest commit: b0ad358

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
openzeppelin-solidity Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@Amxx
Copy link
Collaborator

Amxx commented Feb 6, 2024

Hello @CodeSandwich for this proposal.

@ernestognw & @frangio, how do you feel about this kind of assembly. We used to go for readability first, but it feels like we are doing assembly more and more (for gas costs). I have mixed feelings. On the one hand, I can see how effective it can be in math libraries. On the other, I don't want us to start doing assembly everywhere.

@Amxx
Copy link
Collaborator

Amxx commented Feb 6, 2024

Note: if we decide to do this change, then I think we should also do it for log10 and log256.

@frangio
Copy link
Contributor

frangio commented Feb 6, 2024

This is interesting, I don't really see this PR as significantly increasing the use of assembly. The only reason that assembly is used here is for an integer-valued "greater than" operation returning 0/1. Solidity has no branchless way of casting a boolean to an integer in this way. This operation could be written as a separate Solidity function to keep the function body free of assembly.

The question is if this PR increases the complexity of the function significantly, and I don't think it does. The gas improvement is not negligible either.

@ernestognw
Copy link
Member

ernestognw commented Feb 6, 2024

The amount of assembly seems reasonable.

Also, gt is one of those opcodes that doesn't need any "fancy" parameter like a memory offset or storage location. It actually looks clean:

assembly {
    isGt := gt(value, x)
}

Alternatively, the gt can be abstracted in a private function:

function _gt(uint256 a, uint256 b) private pure returns(uint256 isGt) {
    assembly {
        isGt := gt(a, b)
    }
}

function log2(uint256 value) internal pure returns (uint256 result) {
    unchecked {
        uint256 isGt = _gt(value,0xffffffffffffffffffffffffffffffff);
        value >>= isGt * 128;
        result += isGt * 128;
        
        isGt = _gt(value, 0xffffffffffffffff);
        value >>= isGt * 64;
        result += isGt * 64;

        isGt = _gt(value, 0xffffffff);
        value >>= isGt * 32;
        result += isGt * 32;

        ...
}

@Amxx
Copy link
Collaborator

Amxx commented Feb 6, 2024

What about a boolToUint helper

function _boolToUint(bool b) private pure returns (uint256 u) {
    assembly { u := b }
}

function log2(uint256 value) internal pure returns (uint256 result) {
    unchecked {
        uint256 isGt = _boolToUint(value > 0xffffffffffffffffffffffffffffffff);
        value >>= isGt * 128;
        result += isGt * 128;
        
        isGt = _boolToUint(value > 0xffffffffffffffff);
        value >>= isGt * 64;
        result += isGt * 64;

        isGt = _boolToUint(value > 0xffffffff);
        value >>= isGt * 32;
        result += isGt * 32;

        ...
}

?

Note: maybe the return type should be uint8 which should automatically (and safelly?) be upcasted

@ernestognw
Copy link
Member

What about a boolToUint helper
?

That also seems fine. I still see no issue with the assembly version, so I'd reconsider using it if there's a significant difference in gas (shouldn't be though).

@ernestognw
Copy link
Member

I'm trying to run a snapshot myself with the same parameters as @CodeSandwich:

forge snapshot --via-ir --optimizer-runs 1000000000 --use 0.8.24 --gas-report --match-test testLog2

But I'm getting a nasty error:

[⠒] Compiling...
[⠰] Compiling 257 files with 0.8.24
[⠑] Solc 0.8.24 finished in 46.75s
Error:
Compiler run failed:
Error: Yul exception:Variable var_assets is 1 too deep in the stack [ RET var_assets _3 expr_2 _5 _2 expr_1 _7 expr _8 cleaned var_caller _6 _16 _12 var_receiver _20 _21 ]
memoryguard was present.
memoryguard was present

We should benchmark both the _gt() and the _boolToUint() versions. I'll try back later with more time to investigate.

@CodeSandwich
Copy link
Contributor Author

But I'm getting a nasty error:

This is a known, common compiler error when using Solidity with via-ir on larger projects. I haven't been compiling the entire OZ to do the benchmarks, I created a separate contract with a copy of the original log2 and the modified version, and only run regular forge test on the modified version of OZ, without via-ir.

Here's the code I used for benchmarking:
// SPDX-License-Identifier: GPL-3.0-only
pragma solidity ^0.8.23;

import {Test, console} from "forge-std/Test.sol";

contract Log2Test is Test {
    function log2Old(uint256 value) internal pure returns (uint256) {
        uint256 result = 0;
        unchecked {
            if (value >> 128 > 0) {
                value >>= 128;
                result += 128;
            }
            if (value >> 64 > 0) {
                value >>= 64;
                result += 64;
            }
            if (value >> 32 > 0) {
                value >>= 32;
                result += 32;
            }
            if (value >> 16 > 0) {
                value >>= 16;
                result += 16;
            }
            if (value >> 8 > 0) {
                value >>= 8;
                result += 8;
            }
            if (value >> 4 > 0) {
                value >>= 4;
                result += 4;
            }
            if (value >> 2 > 0) {
                value >>= 2;
                result += 2;
            }
            if (value >> 1 > 0) {
                result += 1;
            }
        }
        return result;
    }

    function log2(uint256 value) internal pure returns (uint256 result) {
        unchecked {
            uint256 isGt;

            assembly {
                isGt := gt(value, 0xffffffffffffffffffffffffffffffff)
            }
            value >>= isGt * 128;
            result += isGt * 128;

            assembly {
                isGt := gt(value, 0xffffffffffffffff)
            }
            value >>= isGt * 64;
            result += isGt * 64;

            assembly {
                isGt := gt(value, 0xffffffff)
            }
            value >>= isGt * 32;
            result += isGt * 32;

            assembly {
                isGt := gt(value, 0xffff)
            }
            value >>= isGt * 16;
            result += isGt * 16;

            assembly {
                isGt := gt(value, 0xff)
            }
            value >>= isGt * 8;
            result += isGt * 8;

            assembly {
                isGt := gt(value, 0xf)
            }
            value >>= isGt * 4;
            result += isGt * 4;

            assembly {
                isGt := gt(value, 0x3)
            }
            value >>= isGt * 2;
            result += isGt * 2;

            assembly {
                isGt := gt(value, 0x1)
            }
            result += isGt;
        }
    }

    uint256 zero = 0;

    function bench(uint256 x) public {
        unchecked {
            x += zero;
        }
        uint256 gasOld = gasleft();
        uint256 oldY = log2Old(x);
        gasOld -= gasleft();
        uint256 gasNew = gasleft();
        uint256 newY = log2(x);
        gasNew -= gasleft();
        console.log("Gas old : gas new : input", gasOld, gasNew, x);
        if (zero > 0) console.log(oldY, newY);
        assertEq(oldY, newY, "Invalid result");
    }

    function testBench() public {
        for (uint256 i = 0; i < 33; i++) {
            bench(i);
        }
        for (uint256 i = 1; i < 39; i++) {
            bench(100 ** i);
            bench(100 ** i * 2);
            bench(100 ** i * 5);
        }
        bench(10 ** 77);
        bench(type(uint256).max);
    }
}

The bool-to-uint helper is a brilliant idea, it produced 0 overhead and is very elegant, I'll push a commit using it:

function boolToUint(bool b) internal pure returns(uint256 u) {
    assembly {
        u := b
    }
}

It's important to return uint256, uint8 for some reason adds an overhead of about 6 gas, probably Solidity doesn't trust YUL and cleans up the upper bits, that would fit the gas cost.

@Amxx
Copy link
Collaborator

Amxx commented Feb 6, 2024

I've added boolToUint and used it in a few places

@CodeSandwich
Copy link
Contributor Author

CodeSandwich commented Feb 6, 2024

I see that you added the changes yourself. I did that too, with some clean-ups for readability: CodeSandwich@1d7f601, please take a look at them, IMO they are worth applying. I pushed the clean-ups, feel free to revert them.

I think that there's a very important caveat I highlighted in the docs for boolToUint, that Solidity sometimes keeps the higher bits dirty because for an if a 2 is as true as a 1, and YUL will just pass them to the result. I think that this makes boolToUint very dangerous, it should be private and used with caution.

There could be 2 versions, a private (the OZ API seems pitfall-averse) boolToUintUnchecked and an internal boolToUint that does isZero(isZero(b)), but at that point it may not be faster than just a b ? 1 : 0.


Actually the safe boolToUint creates no overhead for log2, it seems that the compiler can catch the safe conversions and remove the unneeded iszeros. b ? 1 : 0 creates a large overhead.

function boolToUint(bool b) internal pure returns (uint256 u) {
    assembly {
        u := iszero(iszero(b))
    }
}

@Amxx Amxx changed the title Make Math.log2 branchless Optimize Math operations using branchless bool to uint translation. Feb 6, 2024
Amxx
Amxx previously approved these changes Feb 6, 2024
Copy link
Collaborator

@Amxx Amxx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks pretty good to me.

Need a second review from @ernestognw

@Amxx Amxx requested a review from ernestognw February 6, 2024 20:24
Copy link
Member

@ernestognw ernestognw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me but I'd like to reconsider the boolToUint function

Comment on lines 545 to 550
function boolToUint(bool b) internal pure returns (uint256 u) {
/// @solidity memory-safe-assembly
assembly {
u := b
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to me more of a fit into the SafeCast library with the function toUint(bool b) signature as it already requires a boolean as an argument.

Copy link
Contributor Author

@CodeSandwich CodeSandwich Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm totally fine with this, but does it fit the other functions from SafeCast? IIUC the main goal of this library is to make dangerous operations and throw in case of an overflow. In toUint256(bool) there's no overflow possible, there's nothing "safe" about this SafeCast function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's right, I don't have the last word but I feel confident that SafeCast is a better fit than Math.

We might consider having these unconventional casts in a different library but that's a broader discussion.

@ernestognw
Copy link
Member

I pushed a commit moving the boolToUint function to SafeCast since I feel strong it shouldn't be in the Math library.
Regardless, @CodeSandwich raised a good point about the intention of the SafeCast library.

I still feel it's a better fit in SafeCast but I'm not sure if that's the best fit.
Thoughts @Amxx?

If this takes too long. I'm down for keeping the function private for now. The intention of this PR was to Optimize Math#log2 but not to add a new function to the library. I think we're accomplishing the goal and we might give it a better thought.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants