feat: resolve domains in enode strings #8188

Rjected · 2024-05-09T20:05:01Z

Introduces a type that should properly deserialize bootnode enode strings with domains, since the current NodeRecord type does not do this.

fixes #8187

emhane

nice to have the new type DNSNodeRecord live in reth-net-common

crates/net/discv5/src/config.rs

emhane · 2024-05-10T18:05:54Z

crates/net/types/src/dns_node_record.rs

+        f.write_char(':')?;
+        self.tcp_port.fmt(f)?;
+        if self.tcp_port != self.udp_port {
+            f.write_str("?discport=")?;


we should open an issue to spec this behaviour with discport

sergerad · 2024-05-17T22:51:55Z

crates/node/builder/src/launch/common.rs

+
+            // resolve trusted peers if they use a domain instead of dns
+            for peer in &config.network.trusted_peers {
+                let resolved = resolve_dns_node_record(peer.clone())


Are we sure we want to resolve DNS at this point?

When deploying geth on k8s, we have found its behaviour to eagerly resolve DNS of hostnames in enode strings to be a blocker for trusted peer setups (e.g. trying to resolve hostname for a clusterip of a statefulset that is being deployed in parallel, preventing peers from starting up further).

to make it work for this use case, what would be the best way to resolve?

The hostname for a statefulset pod on k8s should stabilize within a few seconds of the pod being up and running. The issue with geth was that the pods would terminate almost instantly on start up due to the hostname resolution failure.

To give us the best shot of not having the same issue, we could consider some or all of these options:

Build in a configurable number of retries (and maybe backoff strategy) for resolving peer hostnames;

Make the type for accessing peer records handle hostname resolution just in time (e.g. peer.ip_addr() would perform resolution if it had not been done before).

Option 1 would allow operators to ensure start up does not fail on the basis of hostnames that are in the process of being available on cluster.

Option 2 is not guaranteed to solve the issue, but might mitigate the number of retries required.

Let me know if you think we should address this in a separate issue and PR. And if you want me to help finish off this PR at all.

Build in a configurable number of retries (and maybe backoff strategy) for resolving peer hostnames;

yeah, to me this seems like the most likely to work, and option 2 seems more difficult anyways. It would be great if you could help implement it! Feel free to submit a PR to this branch!

sergerad · 2024-05-18T03:45:26Z

@Rjected I looked into this unit test failure

thread 'dns_node_record::tests::test_url_parse' panicked at crates/net/types/src/dns_node_record.rs:130:9:
assertion `left == right` failed
  left: DNSNodeRecord { host: Domain("10.3.58.6"), tcp_port: 30303, udp_port: 30301, id: 0x6f8a80d14311c39f35f516fa664deaaaa13e85b2f7493f37f6144d86991ec012937307647bd3b9a82abe2974e1407241d54947bbb39763a4cac9f77166ad92a0 }
 right: DNSNodeRecord { host: Ipv4(10.3.58.6), tcp_port: 30303, udp_port: 30301, id: 0x6f8a80d14311c39f35f516fa664deaaaa13e85b2f7493f37f6144d86991ec012937307647bd3b9a82abe2974e1407241d54947bbb39763a4cac9f77166ad92a0 }

The enode prefix in the URL leads the parser to use parse_opaque() on the host string which always returns a Host::Domain whereas we expect a Host::Ipv4 here.

From what I can see our only option is to post-process the url.host after that parsing has occurred. Would be better if we could control the url parsing logic but haven't seen a way to do that in our favour here.

mattsse

not super obvious to me why we need this.

this basically replaces the Noderecord usage, why?

what's the progress here?

mattsse · 2024-05-22T08:49:35Z

crates/net/common/src/dns_node_record_resolve.rs

+/// cannot be resolved, an error is returned.
+///
+/// If the host is already an IP address, the [NodeRecord] is returned immediately.
+pub async fn resolve_dns_node_record(node_record: DNSNodeRecord) -> Result<NodeRecord, Error> {


why is this a standalone function and not a fn of dnsrecord?

@mattsse I have moved the fn to be a method of the dns struct in this pr #8358 (review)

I also think we should consolidate DNSNodeRecord into NodeRecord and perhaps lazily resolve domain names when the tcp/udp getter fns are used

sergerad · 2024-05-22T09:03:03Z

not super obvious to me why we need this.

this basically replaces the Noderecord usage, why?

what's the progress here?

I have only just started contributing to reth so please take what I say with a grain of salt.

My understanding is that the current impl of NodeRecord does not support domain names, only IPv4 and IPv6 addrs:

pub struct NodeRecord {
    /// The Address of a node.
    pub address: IpAddr

This PR is adding support for domain names. @mattsse are you suggesting we should be adding this support with the existing NodeRecord struct instead of DNSNodeRecord? Given that domain name support will be a requirement for many L2 networks (like the ones I work with).

Apologies if I have misinterpreted anything 🙏

mattsse · 2024-05-22T09:47:06Z

gotcha,

@Rjected please document all of this because having like 4 different node variants is super confusing

mattsse · 2024-05-29T19:20:35Z

@Rjected what's the path forward here?

Rjected · 2024-05-31T17:02:48Z

@Rjected what's the path forward here?

sorry, have been occupied with other things, need to review #8358, and then will polish this up and add a ton of docs

mattsse

solving this with a trustedpeer type makes sense to me.

I'm okay with the Option but don't love it, especially since we already use the backon crate.

don't have a preference, but ideally the caller is responsible for retrying futures, like backon does.

eg:

reth/bin/reth/src/commands/debug_cmd/in_memory_merkle.rs

Lines 110 to 123 in 1e0d724

    
           let backoff = ConstantBuilder::default().with_max_times(retries); 
        
           let client = fetch_client.clone(); 
        
           let header = (move || { 
        
               get_single_header(client.clone(), BlockHashOrNumber::Number(target_block_number)) 
        
           }) 
        
           .retry(&backoff) 
        
           .notify(|err, _| warn!(target: "reth::cli", "Error requesting header: {err}. Retrying...")) 
        
           .await?; 
        
           let client = fetch_client.clone(); 
        
           let chain = provider_factory.chain_spec(); 
        
           let block = (move || get_single_body(client.clone(), Arc::clone(&chain), header.clone())) 
        
               .retry(&backoff)

mattsse · 2024-06-05T09:20:03Z

crates/net/network/src/manager.rs

+        let mut resolved_boot_nodes = vec![];
+        for record in &boot_nodes {
+            let resolved = record.resolve(None).await?;
+            resolved_boot_nodes.push(resolved);
+        }


this can be a join

mattsse · 2024-06-05T09:22:08Z

crates/net/types/src/trusted_peer.rs

+    pub async fn resolve(
+        &self,
+        retry_strategy: Option<RetryStrategy>,
+    ) -> Result<NodeRecord, Error> {


I don't like intregrating a retrystrategy in there like this.

imo it should be the caller's responsibility, to retry the resolve fn.

I'd rather have this as a separate function, or leave it up to the caller entirely.

we also already use the backon crate for retries. can we use that instead, the tokio-retry crate is very smol, but still. Also open to remove backon in favor of tokio-retry

Have created #8613 to address this 🙏

I can't see load_toml_config() being tested. Now that the retry logic is moved there, I feel obligated to add tests for that fn.

Is that functionality perhaps tested by some e2e suite in CI? Or would we want to add a unit test for it perhaps?

mattsse · 2024-06-05T09:27:54Z

crates/node/builder/src/launch/common.rs

+            for peer in &config.network.trusted_peers {
+                let retry_strategy = RetryStrategy::new(
+                    std::time::Duration::from_millis(config.network.retry_millis),
+                    config.network.retry_attempts,
+                );
+                let resolved = peer
+                    .resolve(Some(retry_strategy))
+                    .await
+                    .wrap_err_with(|| format!("Could not resolve trusted peer {peer}"))?;
+                toml_config.peers.trusted_nodes.insert(resolved);


should be a join as well.

mattsse · 2024-06-05T22:11:25Z

crates/node/builder/src/launch/common.rs

+            // resolve trusted peers if they use a domain instead of dns
+            for peer in &config.network.trusted_peers {
+                let backoff = ConstantBuilder::default().with_max_times(config.network.dns_retries);
+                let resolved = (move || { peer.resolve() })
+                .retry(&backoff)
+                .notify(|err, _| warn!(target: "reth::cli", "Error resolving peer domain: {err}. Retrying..."))
+                .await?;
+                toml_config.peers.trusted_nodes.insert(resolved);


we could extract this logic, into a separate function.

but will track this in a followup.

Feel free to ping me for any followup if I can help

Rjected added A-devp2p Related to the Ethereum P2P protocol A-cli Related to the reth CLI A-discv4 Related to discv4 discovery A-discv5 Related to discv5 discovery labels May 9, 2024

emhane requested changes May 10, 2024

View reviewed changes

sergerad reviewed May 17, 2024

View reviewed changes

sergerad mentioned this pull request May 22, 2024

Fix enode URL parse failure due to enode prefix #8339

Merged

mattsse reviewed May 22, 2024

View reviewed changes

sergerad mentioned this pull request May 23, 2024

Add retries to DNS lookup for enode hostnames #8358

Merged

mattsse mentioned this pull request May 23, 2024

rename reth-network-types to reth-network-peer-types or reth-peer-types #8367

Closed

Rjected force-pushed the dan/dns-resolution-on-enodes branch from 3510f0f to cdfdaff Compare May 23, 2024 16:12

Rjected force-pushed the dan/dns-resolution-on-enodes branch 3 times, most recently from 5d3c0be to 6277861 Compare June 3, 2024 20:39

Rjected marked this pull request as ready for review June 3, 2024 21:11

Rjected requested review from onbjerg and gakonst as code owners June 3, 2024 21:11

emhane approved these changes Jun 5, 2024

View reviewed changes

mattsse requested changes Jun 5, 2024

View reviewed changes

sergerad mentioned this pull request Jun 5, 2024

Replace tokio-retry with backon and remove RetryStrategy #8613

Merged

mattsse approved these changes Jun 5, 2024

View reviewed changes

Rjected and others added 4 commits June 5, 2024 18:52

feat: resolve domains in enode strings

8e54067

Fix enode URL parse failure due to enode prefix (#8339)

241e0bc

Add retries to DNS lookup for enode hostnames (#8358)

dafe2fd

fix rebase compile errors

bd88a23

Rjected and others added 13 commits June 5, 2024 18:53

make clippy happy

ebe34cd

retry strategy resolve cleanup

846b0c8

rename to trusted peer, add docs

62436f4

fix docs

fa3601a

final doc fix

62f75e9

update book

578c0d3

make clippy docs happy

5593ecb

make clippy docs happy again

e5e9042

clippy

26d87f4

improve docs

0002344

remove unnecessary use

f456a79

revert discv5 bootnodes changes

c47916f

Replace tokio-retry with backon and remove RetryStrategy (#8613)

83323eb

Rjected force-pushed the dan/dns-resolution-on-enodes branch from 7bc27a3 to 83323eb Compare June 5, 2024 22:54

fix from merge conflicts

89dc3c7

Rjected added this pull request to the merge queue Jun 5, 2024

Merged via the queue into main with commit ef3f677 Jun 5, 2024
30 checks passed

Rjected deleted the dan/dns-resolution-on-enodes branch June 5, 2024 23:58

Rjected mentioned this pull request Jun 21, 2024

Please add support for dns names in bootnodes records #9015

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: resolve domains in enode strings #8188

feat: resolve domains in enode strings #8188

Rjected commented May 9, 2024 •

edited

Loading

emhane left a comment

emhane May 10, 2024

sergerad May 17, 2024 •

edited

Loading

Rjected May 20, 2024

sergerad May 20, 2024

Rjected May 21, 2024

sergerad commented May 18, 2024

mattsse left a comment

mattsse May 22, 2024

sergerad May 29, 2024

sergerad commented May 22, 2024

mattsse commented May 22, 2024 •

edited

Loading

mattsse commented May 29, 2024

Rjected commented May 31, 2024

mattsse left a comment

mattsse Jun 5, 2024

mattsse Jun 5, 2024

sergerad Jun 5, 2024

sergerad Jun 5, 2024

mattsse Jun 5, 2024

mattsse Jun 5, 2024

sergerad Jun 6, 2024

	let backoff = ConstantBuilder::default().with_max_times(retries);

	let client = fetch_client.clone();
	let header = (move \|\| {
	get_single_header(client.clone(), BlockHashOrNumber::Number(target_block_number))
	})
	.retry(&backoff)
	.notify(\|err, _\| warn!(target: "reth::cli", "Error requesting header: {err}. Retrying..."))
	.await?;

	let client = fetch_client.clone();
	let chain = provider_factory.chain_spec();
	let block = (move \|\| get_single_body(client.clone(), Arc::clone(&chain), header.clone()))
	.retry(&backoff)

feat: resolve domains in enode strings #8188

feat: resolve domains in enode strings #8188

Conversation

Rjected commented May 9, 2024 • edited Loading

emhane left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sergerad May 17, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sergerad commented May 18, 2024

mattsse left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sergerad commented May 22, 2024

mattsse commented May 22, 2024 • edited Loading

mattsse commented May 29, 2024

Rjected commented May 31, 2024

mattsse left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Rjected commented May 9, 2024 •

edited

Loading

sergerad May 17, 2024 •

edited

Loading

mattsse commented May 22, 2024 •

edited

Loading