Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Replace ParseHex with consteval ""_hex literals #30377

Merged
merged 10 commits into from
Aug 31, 2024

Conversation

hodlinator
Copy link
Contributor

@hodlinator hodlinator commented Jul 2, 2024

Motivation:

""_hex produces std::array<std::byte> as the momentum in the codebase is to use std::byte over uint8_t.

Also makes uint256 hex string constructor disallow uppercase hex digits. Discussed: #30560 (comment)

Surprisingly does not change the size of the Guix bitcoind binary (on x86_64-linux-gnu) by 1 single byte.

Spawned already merged PRs: #30436, #30482, #30532, #30560.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jul 2, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK l0rinc, ryanofsky, stickies-v
Stale ACK maflcko

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30765 (refactor: Extend CScript with operator<< for std::array by l0rinc)
  • #30618 (test: increase FromUserHex FUZZ and unit testing coverage by l0rinc)
  • #28710 (Remove the legacy wallet and BDB dependency by achow101)
  • #26593 (tracing: Only prepare tracepoint arguments when actually tracing by 0xB10C)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@maflcko
Copy link
Member

maflcko commented Jul 4, 2024

transaction_identifier.h - Fixed dormant bug in TxidFromString() where the string_view length wasn't respected(!).

This is known, see #28922 (comment). Thanks for picking it up!

Maybe submit the fix first?

Realizing my GH history has minimum proof of work, I might have delayed creating a PR, but it felt timely as Testnet 4 is being worked on.

Not sure what this has to do with testnet 4. May be best to remove unrelated non-technical details from the commits and pull request description. (You can still share them in later comments, if you really want)

Concept ACK. The same should be done to ParseHex: #30048 (comment)

@hodlinator
Copy link
Contributor Author

transaction_identifier.h - Fixed dormant bug in TxidFromString() where the string_view length wasn't respected(!).

This is known, see #28922 (comment). Thanks for picking it up!

Maybe submit the fix first?

Aha, good that you are tracking it! I see at least 4 possible fixes:

  1. Switch back to Txid TxidFromString(const std::string& str)
  2. Make TxidFromString() convert from std::string_view back to std::string internally before calling uint256S() to introduce a null-terminator.
  3. Carry over the full SetHex(std::string_view) implementation from this PR without touching the SetHex(const char*) implementation.
  4. Like 3 but implement SetHex(const char* str) by calling the std::string_view version.

Which do you recommend?

Realizing my GH history has minimum proof of work, I might have delayed creating a PR, but it felt timely as Testnet 4 is being worked on.

Not sure what this has to do with testnet 4. May be best to remove unrelated non-technical details from the commits and pull request description. (You can still share them in later comments, if you really want)

Thanks for the feedback. The Testnet 4 PR from this weeks review club introduces new hash-literals to the code, but I concede that it's a weak argument.

Concept ACK. The same should be done to ParseHex: #30048 (comment)

Thanks for having a look and the pointer to ParseHex! It was on my radar momentarily but I didn't reconsider it after reaching the current solution for uint256S(). Should probably introduce a consteval ParseHex(const char*) implementation as part of this PR. Moving to draft for now.

@hodlinator hodlinator marked this pull request as draft July 5, 2024 22:00
@maflcko
Copy link
Member

maflcko commented Jul 8, 2024

Like 3 but implement SetHex(const char* str) by calling the std::string_view version.

I don't think const char* overloads will need to be provided when string_view exists. Seems fine to just have a single sting_view function (and call it a fix at the same time).

@ryanofsky
Copy link
Contributor

Concept ACK, and would be very nice for this to cover ParseHex. If it did, it seems like it would fix the unexpected consensus library dependency on the util library that hebasto reported in #29015 (comment):

static const std::vector<unsigned char> NUMS_H_DATA{ParseHex("50929b74c1a04954b78b4b6035e97a5e078a5a0f28ec96d547bfee9ace803ac0")};

@hodlinator
Copy link
Contributor Author

transaction_identifier.h - Fixed dormant bug in TxidFromString() where the string_view length wasn't respected(!).

This is known, see #28922 (comment). Thanks for picking it up!

Maybe submit the fix first?

PR up now: #30436

@hodlinator hodlinator changed the title refactor: Make uint256S(const char*) consteval refactor: Make uint256S(const char*) and ParseHex(const char*) consteval Jul 12, 2024
@hodlinator hodlinator force-pushed the 2024-07_uint256S_consteval branch 4 times, most recently from afbf8f9 to 92b0fb0 Compare July 16, 2024 21:51
@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/27531471253

Hints

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

src/uint256.h Outdated Show resolved Hide resolved
src/uint256.h Outdated Show resolved Hide resolved
src/uint256.h Outdated Show resolved Hide resolved
src/uint256.h Outdated Show resolved Hide resolved
src/uint256.h Outdated Show resolved Hide resolved
src/util/strencodings.h Outdated Show resolved Hide resolved
@l0rinc
Copy link
Contributor

l0rinc commented Aug 28, 2024

Changes since 92e599d:

diff --git a/src/test/miniscript_tests.cpp b/src/test/miniscript_tests.cpp
index 452baebff3..e9b0d0ec1d 100644
--- a/src/test/miniscript_tests.cpp
+++ b/src/test/miniscript_tests.cpp
@@ -596,11 +596,11 @@ BOOST_AUTO_TEST_CASE(fixed_tests)
     //  - no pubkey before the CHECKSIG
     constexpr KeyConverter tap_converter{miniscript::MiniscriptContext::TAPSCRIPT};
     constexpr KeyConverter wsh_converter{miniscript::MiniscriptContext::P2WSH};
-    const auto no_pubkey{"ac519c"_hex_v_u8};
+    const auto no_pubkey{"ac519c"_hex_u8};
     BOOST_CHECK(miniscript::FromScript({no_pubkey.begin(), no_pubkey.end()}, tap_converter) == nullptr);
-    const auto incomplete_multi_a{"ba20c6047f9441ed7d6d3045406e95c07cd85c778e4b8cef3ca7abac09b95c709ee5ba519c"_hex_v_u8};
+    const auto incomplete_multi_a{"ba20c6047f9441ed7d6d3045406e95c07cd85c778e4b8cef3ca7abac09b95c709ee5ba519c"_hex_u8};
     BOOST_CHECK(miniscript::FromScript({incomplete_multi_a.begin(), incomplete_multi_a.end()}, tap_converter) == nullptr);
-    const auto incomplete_multi_a_2{"ac2079be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798ac20c6047f9441ed7d6d3045406e95c07cd85c778e4b8cef3ca7abac09b95c709ee5ba519c"_hex_v_u8};
+    const auto incomplete_multi_a_2{"ac2079be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798ac20c6047f9441ed7d6d3045406e95c07cd85c778e4b8cef3ca7abac09b95c709ee5ba519c"_hex_u8};
     BOOST_CHECK(miniscript::FromScript({incomplete_multi_a_2.begin(), incomplete_multi_a_2.end()}, tap_converter) == nullptr);
     // Can use multi_a under Tapscript but not P2WSH.
     Test("and_v(v:multi_a(2,03d01115d548e7561b15c38f004d734633687cf4419620095bc5b0f47070afe85a,025601570cb47f238d2b0286db4a990fa0f3ba28d1a319f5e7cf55c2a2444da7cc),after(1231488000))", "?", "20d01115d548e7561b15c38f004d734633687cf4419620095bc5b0f47070afe85aac205601570cb47f238d2b0286db4a990fa0f3ba28d1a319f5e7cf55c2a2444da7ccba529d0400046749b1", TESTMODE_VALID | TESTMODE_NONMAL | TESTMODE_NEEDSIG | TESTMODE_P2WSH_INVALID, 4, 2, {}, {}, 3);
@@ -645,12 +645,12 @@ BOOST_AUTO_TEST_CASE(fixed_tests)
 
     // Misc unit tests
     // A Script with a non minimal push is invalid
-    const std::vector<unsigned char> nonminpush{"0000210232780000feff00ffffffffffff21ff005f00ae21ae00000000060602060406564c2102320000060900fe00005f00ae21ae00100000060606060606000000000000000000000000000000000000000000000000000000000000000000"_hex_v_u8};
+    constexpr auto nonminpush{"0000210232780000feff00ffffffffffff21ff005f00ae21ae00000000060602060406564c2102320000060900fe00005f00ae21ae00100000060606060606000000000000000000000000000000000000000000000000000000000000000000"_hex_u8};
     const CScript nonminpush_script(nonminpush.begin(), nonminpush.end());
     BOOST_CHECK(miniscript::FromScript(nonminpush_script, wsh_converter) == nullptr);
     BOOST_CHECK(miniscript::FromScript(nonminpush_script, tap_converter) == nullptr);
     // A non-minimal VERIFY (<key> CHECKSIG VERIFY 1)
-    const std::vector<unsigned char> nonminverify{"2103a0434d9e47f3c86235477c7b1ae6ae5d3442d49b1943c2b752a68e2a47e247c7ac6951"_hex_v_u8};
+    constexpr auto nonminverify{"2103a0434d9e47f3c86235477c7b1ae6ae5d3442d49b1943c2b752a68e2a47e247c7ac6951"_hex_u8};
     const CScript nonminverify_script(nonminverify.begin(), nonminverify.end());
     BOOST_CHECK(miniscript::FromScript(nonminverify_script, wsh_converter) == nullptr);
     BOOST_CHECK(miniscript::FromScript(nonminverify_script, tap_converter) == nullptr);
diff --git a/src/test/script_tests.cpp b/src/test/script_tests.cpp
index ef6ca8dc0d..5825684947 100644
--- a/src/test/script_tests.cpp
+++ b/src/test/script_tests.cpp
@@ -1358,7 +1358,7 @@ template <typename T>
 CScript ToScript(const T& byte_container)
 {
     auto span{MakeUCharSpan(byte_container)};
-    return {span.data(), span.data() + span.size()};
+    return {span.begin(), span.end()};
 }
 
 static CScript ScriptFromHex(const std::string& str)
diff --git a/src/util/strencodings.h b/src/util/strencodings.h
index f9076de047..1543de03ab 100644
--- a/src/util/strencodings.h
+++ b/src/util/strencodings.h
@@ -396,7 +396,7 @@ consteval uint8_t ConstevalHexDigit(const char c)
  *   length and serialized with no prefix.
  *
  * @warning It may be preferable to use vector variants to save stack space when
- *   declaring local variables if hex strings are large. Alternately variables
+ *   declaring local variables if hex strings are large. Alternatively variables
  *   could be declared constexpr to avoid using stack space.
  *
  * @warning Avoid `uint8_t` variants when not necessary, as the codebase

ACK a096215

hodlinator and others added 10 commits August 28, 2024 19:09
Also changes compile-time asserts with comments into throws.
Lines will be touched in next 2 commits.
Beyond renaming it also adjusts whitespace and adds braces to conform to current doc/developer-notes.md.

TestEncrypt: Change iterator type to auto in ahead of vector -> span conversion.

Only touches functions that will be modified in next commit.
TestEncryptSingle: Remove no longer needed plaintext2-variable that existed because vectors had different allocators.
Length was already asserted inside of base_blob-ctor.
* Use BOOST_CHECK_EQUAL_COLLECTIONS and BOOST_CHECK_EQUAL instead of deprecated BOOST_CHECK.
* Avoid repeating expected values.
* Break out repeated HEX_PARSE_INPUT and rename ParseHex_expected to HEX_PARSE_OUTPUT.

Done in preparation for adding a couple more tests in the next commit.

Co-Authored-By: l0rinc <[email protected]>
""_hex is a compile-time user-defined literal returning std::array<std::byte>, equivalent of ParseHex.

Variants:
- ""_hex_v returns std::vector<std::byte>
- ""_hex_u8 returns std::array<uint8_t>
- ""_hex_v_u8 returns std::vector<uint8_t> - Directly serializable as a size-prefixed OP_PUSH CScript payload using operator<<.

Also extracts from_hex into shared util::ConstevalHexDigit function.

Co-Authored-By: hodlinator <[email protected]>
Co-Authored-By: MarcoFalke <*~=`'#}+{/-|&$^[email protected]>
Co-Authored-By: Ryan Ofsky <[email protected]>
Co-Authored-By: stickies-v <[email protected]>
The following scripted-diff commit will replace ParseHex("...") with "..."_hex_u8, but this replacement will not work in cases where vectors are needed instead of arrays, and is not ideal in cases where std::byte is accepted.

For example, it is currently necessary to use _hex_v_u8 when calling CScript operator<< because that operator does not currently support std::array or std::byte.

Conversely, it is incorrect to use _hex_v instead of _hex in net_processing.cpp for the MakeAndPushMessage argument, because if the argument is a std::vector it is considered variable-length and serialized with a size prefix, but if the argument is a std::array or Span is it considered fixed length and serialized without a prefix.

By the same logic, it is also safe to change the NUMS_H constant in pubkey.cpp from a std::vector to std::array because it is never serialized.
- Adds using namespace.
- Extracts ToScript helper function from ScriptFromHex, to be used heavily in the next commit.
- Changes ScriptFromHex from using ParseHex to TryParseHex, now asserting the string is valid.
- Use even number of hex digits in comment (and apply replacement from next commit to only touch line once).
Ideally all call sites should accept std::byte instead of uint8_t but those transformations are left to future PRs.

-BEGIN VERIFY SCRIPT-
sed -i --regexp-extended 's/\bParseHex\(("[^"]*")\)/\1_hex_u8/g' $(git grep -l ParseHex -- :src ':(exclude)src/test/util_tests.cpp')
sed -i --regexp-extended 's/\bParseHex<std::byte>\(("[^"]*")\)/\1_hex/g' $(git grep -l ParseHex -- :src ':(exclude)src/test/util_tests.cpp')
sed -i --regexp-extended 's/\bScriptFromHex\(("[^"]*")\)/ToScript(\1_hex)/g' src/test/script_tests.cpp
-END VERIFY SCRIPT-

Co-Authored-By: MarcoFalke <*~=`'#}+{/-|&$^[email protected]>
Co-Authored-By: Ryan Ofsky <[email protected]>
@l0rinc
Copy link
Contributor

l0rinc commented Aug 28, 2024

Non-trivial rebase with conflicts in bloom_tests, crypto_tests and txpackage_tests.
The resolutions seem correct, all 3 are passing locally.
ACK 8756ccd

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 8756ccd, just rebasing since last review and taking advantage of CScript constructors in #29369, also tweaking a code comment

@stickies-v
Copy link
Contributor

Rebase re-ACK 8756ccd, no changes since a096215

@maflcko
Copy link
Member

maflcko commented Sep 4, 2024

macOS follow-up idea: Someone could check on macOS 13, if compilation with XCode still works. If not, build-osx.md could be updated (Ref: #30377 (comment))

edit: i guess this is already checked by CI?

@hodlinator hodlinator deleted the 2024-07_uint256S_consteval branch September 4, 2024 13:02
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Sep 4, 2024
Fix check-deps.sh to check for weak symbols so it can detect when an exported
template function is used from another library.

In a previous version of this commit, this change caused an invalid dependency
in the consensus library on the TryParseHex template function from the util
library to be detected, and a suppression was added here. But bitcoin#30377 removed
the invalid dependency so the suppression is no longer needed.

The invalid dependency and problem detecting weak symbol usage was originally
reported by Hennadii Stepanov in
bitcoin#29015 (comment)
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Sep 5, 2024
Fix check-deps.sh to check for weak symbols so it can detect when an exported
template function is used from another library.

In a previous version of this commit, this change caused an invalid dependency
in the consensus library on the TryParseHex template function from the util
library to be detected, and a suppression was added here. But bitcoin#30377 removed
the invalid dependency so the suppression is no longer needed.

The invalid dependency and problem detecting weak symbol usage was originally
reported by Hennadii Stepanov in
bitcoin#29015 (comment)
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Sep 5, 2024
Fix check-deps.sh to check for weak symbols so it can detect when an exported
template function is used from another library.

In a previous version of this commit, this change caused an invalid dependency
in the consensus library on the TryParseHex template function from the util
library to be detected, and a suppression was added here. But bitcoin#30377 removed
the invalid dependency so the suppression is no longer needed.

The invalid dependency and problem detecting weak symbol usage was originally
reported by Hennadii Stepanov in
bitcoin#29015 (comment)
achow101 added a commit that referenced this pull request Sep 20, 2024
…s, not just vectors

5e190cd Replace CScript _hex_v_u8 appends with _hex (Lőrinc)
cac846c Allow CScript's operator<< to accept spans, not just vectors (Lőrinc)
c78d8ff prevector: avoid GCC bogus warnings in insert method (Lőrinc)

Pull request description:

  Split out of #30377 (comment).

  Replace `_hex_v_u8` for `CScript` appends to `_hex`, to skip vector conversion before serializing to the `prevector` in `CScript`.

  To enable both `unsigned char` and `std::byte` values, I've extracted the existing serialization to append the size & data in separate private methods to clarify that it does more than just a simple data insertion.

  There were also discussion on eliminating the operators here completely to obviate when we're serializing fixed-size collections as raw bytes, and when we're prefixing them with their size - should also be done in a separate PR.

ACKs for top commit:
  achow101:
    ACK 5e190cd
  ryanofsky:
    Code review ACK 5e190cd. Looks good!
  hodlinator:
    re-ACK 5e190cd

Tree-SHA512: 27a646629e017b2a05416d5eb964dda8b25b900d466221eff7bfa1339ded443e1c5c4cf8ff20cb3bba915a2603787a9fa6f6ec12bc0b9415d9eb07b57289192b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants