Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

holepunching: expose metrics #2103

Closed
marten-seemann opened this issue Feb 15, 2023 · 4 comments · Fixed by #2246
Closed

holepunching: expose metrics #2103

marten-seemann opened this issue Feb 15, 2023 · 4 comments · Fixed by #2246
Assignees
Labels
effort/days Estimated to take multiple days, but less than a week exp/novice Someone with a little familiarity can pick up good first issue Good issue for new contributors help wanted Seeking public contribution on this issue kind/enhancement A net-new feature or improvement to an existing feature

Comments

@marten-seemann
Copy link
Contributor

We should:

  • count hole punching attempts and outcomes (by transport)
  • differentiate between connection reversal and actual hole punches

@dennis-tra Are there any more Prometheus metrics you think would be helpful to have on a dashboard?

@marten-seemann marten-seemann added kind/enhancement A net-new feature or improvement to an existing feature help wanted Seeking public contribution on this issue exp/novice Someone with a little familiarity can pick up good first issue Good issue for new contributors effort/days Estimated to take multiple days, but less than a week labels Feb 15, 2023
@marten-seemann marten-seemann mentioned this issue Feb 15, 2023
16 tasks
@dennis-tra
Copy link
Contributor

Just brainstorming, not saying we should track all of the following:

  • track hole punch outcomes (this includes connection reversal). In our measurement campaign we tracked the following outcomes:

    • SUCCESS - any attempt succeeded
    • FAILED - all attempts failed
    • CONNECTION_REVERSED - dcutr stream wasn't opened but we still ended up with a direct connection
    • NO_STREAM - remote peer supported dcutr but didn't open a stream.

    I can imagine that the last two are more difficult to track than the first two.

  • Each outcome could be enriched with the following information:

    • number of required attempts
    • used transport
    • used IP version
    • remote peers agent version - to avoid cardinality explosion we could limit it kubo/non-kubo + kubo minor version. This could help in spotting regressions. On the other hand, since libp2p is a generic library I would rather not couple it to Kubo - perhaps make it configurable.
    • Information about the local network
      • has active port mapping (yes/no)
      • reachability status (unknown, public, private)
      • is in a VPN (not sure how to track that)
    • somehow also keep track of the measured RTTs

I think the following would make a great start:

  • hole punch outcome SUCCESS, FAILED, CONNECTION_REVERSED - NO_STREAM would only make sense if we also tracked the remote peers agent version
  • used transport
  • used IP version (because our measurement suggested that IPv6 doesn't work well)
  • number of required attempts

@sukunrt
Copy link
Member

sukunrt commented Mar 20, 2023

would only make sense if we also tracked the remote peers agent version

@dennis-tra do you think this info is not useful without the peers agent version?

@dennis-tra
Copy link
Contributor

@sukunrt I think what I wanted to say is that we wouldn't be able to act on e.g., a degraded success rate because we didn't correlate it with something else. "Something else" could be the agent version (as I mentioned above), but it could also be another property like hasActivePortMapping or isCGNAT'd, etc.

But I would step back from my above statement because it would still be useful to know if the distribution of hole punch outcomes changes over time.

@sukunrt
Copy link
Member

sukunrt commented Mar 22, 2023

tracking agent version is a difficult since it's a user controlled field, so an attacker can generate lots of agent versions which would lead to creation of a lot of prometheus time series on our end.
@marten-seemann has some ideas around a useful construct where we allow applications to specify a regex. However we can take that up after we finish the initial set of metrics.

I propose the following set of metrics in the first cut:

  • Holepunch Attempt by (ipversion, transport)
    • We will only count a (ipv, transport) combination if both the sides provide an address with that combination.
  • Holepunch Outcome by (ipversion, transport)
    • Same condition as holepunch attempt for considering an (ipv, transport) combination
  • Holepunch Outcome by (number of attemps)
    • only measured on the dcutr initiator side
  • Holepunch Errors by reason
    • reasons for not trying holepunching. malformed message, no public address etc.
  • ConnectionReversed:
    • Tracks whether attempt to direct dial before holepunching succeeds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort/days Estimated to take multiple days, but less than a week exp/novice Someone with a little familiarity can pick up good first issue Good issue for new contributors help wanted Seeking public contribution on this issue kind/enhancement A net-new feature or improvement to an existing feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants