Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change reporting to TLS #2089

Merged
merged 3 commits into from
Dec 16, 2021
Merged

Change reporting to TLS #2089

merged 3 commits into from
Dec 16, 2021

Conversation

rainest
Copy link
Contributor

@rainest rainest commented Dec 10, 2021

What this PR does / why we need it:
Makes reporting connections over TLS.

PR Readiness Checklist:

Complete these before marking the PR as ready to review:

  • the CHANGELOG.md release notes have been updated to reflect any significant (and particularly user-facing) changes introduced by this PR

@rainest rainest temporarily deployed to Configure ci December 10, 2021 23:11 Inactive
@rainest
Copy link
Contributor Author

rainest commented Dec 10, 2021

I am doing something that breaks the test TLS connection. Possibly wrapping its accept in a goroutine is freezing it somehow? Not sure why. It receives a TLS client hello, but never responds with its own server hello. The struct indicates that it has not completed any handshakes:

timeout.pcap.gz
conn.txt

@rainest rainest temporarily deployed to Configure ci December 11, 2021 00:13 Inactive
@rainest rainest temporarily deployed to Configure ci December 11, 2021 00:29 Inactive
@rainest
Copy link
Contributor Author

rainest commented Dec 11, 2021

Deadlock resolved by manually handshaking. I'm not particularly fond of the connection re-establishment behavior, but not immediately seeing how we can handle it better since we don't get an affirmative "yes, the connection is closed" net.ErrClosed, and what we do get isn't exported and would require us to scrape the syscall text. Is there some better way to determine that upstream closed the connection?

> github.com/kong/kubernetes-ingress-controller/v2/internal/util.Reporter.send() ./reports.go:105 (hits goroutine(12):3 total:3) (PC: 0x1a449c8)
	err: error(*crypto/tls.permanentError) *{
		err: net.Error(*net.OpError) *{
			Op: "write",
			Net: "tcp",
			Source: net.Addr(*net.TCPAddr) ...,
			Addr: net.Addr(*net.TCPAddr) ...,
			Err: error(*os.SyscallError) ...,},}
   100:	
   101:	func (r Reporter) send(signal string, uptime int) {
   102:		message := "<14>signal=" + signal + ";uptime=" +
   103:			strconv.Itoa(uptime) + ";" + r.serializedInfo
   104:		_, err := r.conn.Write([]byte(message))
=> 105:		if err != nil {
   106:			r.Logger.Errorf("failed to send report: %s", err)
   107:			r.conn = nil
   108:		}
   109:	}
(dlv) call err.Error()
> github.com/kong/kubernetes-ingress-controller/v2/internal/util.Reporter.send() ./reports.go:105 (PC: 0x1a449c8)
Values returned:
	~r0: "write tcp 127.0.0.1:46284->127.0.0.1:61833: write: broken pipe"

   100:	
   101:	func (r Reporter) send(signal string, uptime int) {
   102:		message := "<14>signal=" + signal + ";uptime=" +
   103:			strconv.Itoa(uptime) + ";" + r.serializedInfo
   104:		_, err := r.conn.Write([]byte(message))
=> 105:		if err != nil {
   106:			r.Logger.Errorf("failed to send report: %s", err)
   107:			r.conn = nil
   108:		}
   109:	}

@rainest rainest temporarily deployed to Configure ci December 14, 2021 00:49 Inactive
@github-actions
Copy link

Licenses differ between commit 17d8820 and base:

+++ pr_licenses.csv	2021-12-14 00:50:51.734731162 +0000
@@ -8,12 +8,7 @@
 github.com/bombsimon/logrusr,https://github.com/bombsimon/logrusr/blob/master/LICENCE,MIT
 github.com/cenkalti/backoff/v4,https://github.com/cenkalti/backoff/blob/master/v4/LICENSE,MIT
 github.com/cespare/xxhash/v2,https://github.com/cespare/xxhash/blob/master/v2/LICENSE.txt,MIT
-github.com/containerd/containerd,https://github.com/containerd/containerd/blob/master/LICENSE,Apache-2.0
 github.com/davecgh/go-spew/spew,https://github.com/davecgh/go-spew/blob/master/spew/LICENSE,ISC
-github.com/docker/distribution,https://github.com/docker/distribution/blob/master/LICENSE,Apache-2.0
-github.com/docker/docker,https://github.com/docker/docker/blob/master/LICENSE,Apache-2.0
-github.com/docker/go-connections,https://github.com/docker/go-connections/blob/master/LICENSE,Apache-2.0
-github.com/docker/go-units,https://github.com/docker/go-units/blob/master/LICENSE,Apache-2.0
 github.com/evanphx/json-patch,https://github.com/evanphx/json-patch/blob/master/LICENSE,BSD-3-Clause
 github.com/evanphx/json-patch/v5,https://github.com/evanphx/json-patch/blob/master/v5/LICENSE,BSD-3-Clause
 github.com/fatih/color,https://github.com/fatih/color/blob/master/LICENSE.md,MIT
@@ -44,8 +39,6 @@
 github.com/mitchellh/reflectwalk,https://github.com/mitchellh/reflectwalk/blob/master/LICENSE,MIT
 github.com/modern-go/concurrent,https://github.com/modern-go/concurrent/blob/master/LICENSE,Apache-2.0
 github.com/modern-go/reflect2,https://github.com/modern-go/reflect2/blob/master/LICENSE,Apache-2.0
-github.com/opencontainers/go-digest,https://github.com/opencontainers/go-digest/blob/master/LICENSE,Apache-2.0
-github.com/opencontainers/image-spec/specs-go,https://github.com/opencontainers/image-spec/blob/master/specs-go/LICENSE,Apache-2.0
 github.com/pkg/errors,https://github.com/pkg/errors/blob/master/LICENSE,BSD-2-Clause
 github.com/prometheus/client_golang/prometheus,https://github.com/prometheus/client_golang/blob/master/prometheus/LICENSE,Apache-2.0
 github.com/prometheus/client_model/go,https://github.com/prometheus/client_model/blob/master/go/LICENSE,Apache-2.0```

@rainest rainest temporarily deployed to Configure ci December 15, 2021 22:48 Inactive
@rainest rainest temporarily deployed to Configure ci December 15, 2021 22:54 Inactive
@rainest
Copy link
Contributor Author

rainest commented Dec 15, 2021

Realized we don't actually need persistent connections and interruption handling: while the test interval is 1 second, and the actual interval does include time.Second, the actual interval is 3600 * time.Second, not just time.Second.

Dunno if Splunk will even let us keep the idle connection for an hour, but there's no real need to, since reopening it once each hour doesn't raise the same performance issues as doing it once each second, and drastically simplifies testing.

This is ready for review, but we want to conduct a practical test where we run an image built from this branch against the actual reporting server to confirm we receive stats via the new listen. I need to get permissions to access that.

@rainest rainest marked this pull request as ready for review December 15, 2021 22:57
@rainest rainest requested a review from a team as a code owner December 15, 2021 22:57
@rainest rainest temporarily deployed to Configure ci December 15, 2021 22:58 Inactive
Changes the reporting system to use TLS instead of UDP. Connections are
no longer part of the reporter, and are instead scoped to the individual
runs of send().

Add test utility scaffolding to run a TLS server.
@rainest rainest temporarily deployed to Configure ci December 15, 2021 23:43 Inactive
@rainest rainest temporarily deployed to Configure ci December 15, 2021 23:57 Inactive
shaneutt
shaneutt previously approved these changes Dec 16, 2021
Copy link
Contributor

@shaneutt shaneutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, only things before merging:

  • if there's no reason NOT to set MinVersion: tls.VersionTLS13, then let's just set that
  • let's make sure we take care of deadlines and write verifications for the tls connection.

internal/util/reports.go Outdated Show resolved Hide resolved
internal/util/reports.go Show resolved Hide resolved
internal/util/reports.go Show resolved Hide resolved
internal/util/reports_test.go Outdated Show resolved Hide resolved
internal/util/reports_test.go Outdated Show resolved Hide resolved
@rainest rainest temporarily deployed to Configure ci December 16, 2021 18:55 Inactive
Travis Raines added 2 commits December 16, 2021 10:56
Bump min TLS version.

Change test certificate variable name.

Add an R/W deadline and exit sends early on failure.
@rainest rainest temporarily deployed to Configure ci December 16, 2021 20:06 Inactive
@rainest
Copy link
Contributor Author

rainest commented Dec 16, 2021

#2104 seems quite determined to fail here. Wait for that fix, merge main, and this should pass.

@shaneutt
Copy link
Contributor

actually the most recent failure seemed unrelated:

        	            	Internal error occurred: failed calling webhook "validations.kong.konghq.com": Post "https://webhook-kong-validations-gateway.kong-system.svc:443/?timeout=10s": dial tcp 10.96.85.183:443: connect: connection refused

@rainest rainest merged commit 3112fcf into main Dec 16, 2021
@rainest rainest deleted the feat/encrypted-reporting branch December 16, 2021 20:56
@rainest
Copy link
Contributor Author

rainest commented Dec 16, 2021

The heck, auto-merge merged it despite the failure?

Webhook thing looks like we probably need a connection test eventually on the webhook service before trying actually create resources.

@mflendrich mflendrich mentioned this pull request Dec 16, 2021
4 tasks
rainest pushed a commit that referenced this pull request Jan 18, 2022
Changes the reporting system to use TLS instead of UDP. Connections are
no longer part of the reporter, and are instead scoped to the individual
runs of send().

Add test utility scaffolding to run a TLS server.
rainest pushed a commit that referenced this pull request Jan 18, 2022
Changes the reporting system to use TLS instead of UDP. Connections are
no longer part of the reporter, and are instead scoped to the individual
runs of send().

Add test utility scaffolding to run a TLS server.
rainest pushed a commit that referenced this pull request Jan 19, 2022
Changes the reporting system to use TLS instead of UDP. Connections are
no longer part of the reporter, and are instead scoped to the individual
runs of send().

Add test utility scaffolding to run a TLS server.
rainest pushed a commit that referenced this pull request Jan 19, 2022
Changes the reporting system to use TLS instead of UDP. Connections are
no longer part of the reporter, and are instead scoped to the individual
runs of send().

Add test utility scaffolding to run a TLS server.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants