
fix: use proper HTTP clients to fetch files #66

Merged (6 commits) on Sep 17, 2023

Conversation

@rebornplusplus (Member) commented Jun 16, 2023

Fixes #14.

As suggested below by Gustavo, use the short-timeout httpClient and the long-timeout bulkClient in archive.go properly: use the quick httpClient to fetch small files such as Release files, and use the bulkClient for packages and indexes, as those files can be quite big.
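
(For context, a minimal sketch of the two clients in question; the variable names come from archive.go, but the timeout values shown here are illustrative assumptions rather than the actual ones in the code:)

package archive

import (
        "net/http"
        "time"
)

var (
        // httpClient serves small, interactive fetches such as Release files.
        httpClient = &http.Client{Timeout: 30 * time.Second} // assumed value

        // bulkClient serves large downloads such as packages and indexes.
        bulkClient = &http.Client{Timeout: 10 * time.Minute} // assumed value
)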


  • Have you signed the CLA?

Add ``--timeout`` option in chisel cut so that users can
specify their preferred limit for each request to avoid
context deadline error [0].

Duration (with unit) should be specified as the argument of
the new option. A timeout of zero means no timeout. Default
timeout is 60 seconds.

Additionally, increase the hardcoded Timeout in the http
client in archive.go to match the default 60 seconds limit.

Fixes canonical#14.

References:
- [0] canonical#14
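
(This --timeout option was later dropped in favor of using the two archive clients properly, as discussed below. For illustration only, a minimal sketch of the wiring described in the commit message above, using plain standard-library flag parsing rather than chisel's actual option handling:)

package main

import (
        "flag"
        "fmt"
        "net/http"
        "time"
)

// Illustration only: chisel's real cmd_cut.go wires options differently.
// This just shows the semantics described above: a duration with unit,
// zero meaning no timeout, and a default of 60 seconds.
func main() {
        timeout := flag.Duration("timeout", 60*time.Second, "per-request timeout; 0 means no timeout")
        flag.Parse()

        client := &http.Client{Timeout: *timeout} // a Timeout of 0 disables the limit
        fmt.Println("using per-request timeout:", client.Timeout)
}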
@cjdcordeiro (Collaborator) left a comment

Nice, thanks. Just a couple of comments and a question: is it possible to have a test for this in archive_test?

(Two inline review comments on internal/archive/archive.go, now outdated and resolved.)
@rebornplusplus changed the title from "fix: add --timeout option in cut command" to "feat: add --timeout option in cut command" on Jun 16, 2023
Pass the --real-archive option to go test to run this test with real archives.
@rebornplusplus (Member, Author)

Added a test for the new option. Please make sure to pass the --real-archive option to go test to run this test.
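
(For reference, gating a test on a custom go test flag like this can look roughly as sketched below; the flag name matches the PR, but the variable and test names are hypothetical and the real archive_test may use a different test harness.)

package archive_test

import (
        "flag"
        "testing"
)

// Hypothetical gate: the testing package parses command-line flags
// before the tests run, so a plain package-level flag works here.
var realArchive = flag.Bool("real-archive", false, "run tests against the real Ubuntu archives")

func TestRealArchiveFetch(t *testing.T) {
        if !*realArchive {
                t.Skip("pass --real-archive to run tests against the real archives")
        }
        // ... fetch indexes and packages from the real archive and check for timeouts ...
}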

@cjdcordeiro (Collaborator) commented Jun 16, 2023

> Added a test for the new option. Please make sure to pass the --real-archive option to go test to run this test.

Cool!
Then it should also be added to the CI tests, shouldn't it?

The --real-archive flag specified in archive_test is used to test real archives and timeout limits. Run these tests in the GitHub workflow.
@rebornplusplus (Member, Author)

I was a bit hesitant to run the tests with --real-archive in the GitHub workflow because they might simply error out due to a slow connection to the archives. But I guess that is for the better: if they time out or error out there, it would probably give us a good idea of whether the issue still persists.

Additionally, the new ./internal/archive tests run with --real-archive do not contribute to the coverprofile test-coverage.out. Let me know if you want me to include those there as well; in that case, I would probably need to use the gocovmerge tool to merge the two coverprofiles.

@woky (Contributor) left a comment

I'd prefer the timeout to be passed down via archive.Options but that may be tricky as in the Archive interface we don't use http.Client directly but rather just the global httpDo() function. Apparently it's possible to create interruptible requests via NewRequestWithContext() but that would need to be cancelled from another goroutine.

In conclusion, LGTM. :-)
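
(A minimal sketch of the NewRequestWithContext pattern mentioned above; with context.WithTimeout the cancellation is driven by the context's deadline rather than a hand-rolled goroutine. The URL and timeout below are only examples:)

package main

import (
        "context"
        "fmt"
        "log"
        "net/http"
        "time"
)

func main() {
        // The per-request deadline lives in the context rather than in the client.
        ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
        defer cancel()

        req, err := http.NewRequestWithContext(ctx, http.MethodGet,
                "http://archive.ubuntu.com/ubuntu/dists/jammy/Release", nil)
        if err != nil {
                log.Fatal(err)
        }
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
                // Once the deadline passes, this surfaces as a
                // "context deadline exceeded" error.
                log.Fatal(err)
        }
        defer resp.Body.Close()
        fmt.Println("status:", resp.Status)
}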

@cjdcordeiro (Collaborator) left a comment

ty!

@niemeyer (Contributor) left a comment

There's a detail I mention below about the option name, but reviewing this code made me realize that the implementation is currently misusing a concept that I ported from earlier applications I wrote a while ago. Note how we have two HTTP clients, httpClient and bulkClient, with different timeouts explicitly defined. The distinction exists precisely because the former provides interactivity for smaller API-like content, while the latter provides a configuration suited to bulk data (IOW, large downloads, such as packages and Content files).

So this change is in effect working around the fact that we're misusing these two values: it's struggling with the client made for quick interactions and attempting to turn it into the bulk client. We might as well actually use the two clients properly instead.

So here is the proposal: can we please drop the explicit flag for now, and make this PR simply change the implementation to use the two different clients correctly, for the proper cases?

We can do that by simply introducing a flag parameter to the fetch() function:

type fetchFlags uint

const (
        fetchBulk    fetchFlags = 1 << iota
        fetchDefault fetchFlags = 0
)

func (a ...) fetch(..., flags fetchFlags) ... { ... }

Then, packages can use the bulk client, while indexes can use the short timeout one.

How does that sound?
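
(A rough sketch of what that dispatch could look like, assuming the fetchFlags constants proposed above and the existing httpClient/bulkClient package variables; the receiver type, field name, and error handling are placeholders rather than chisel's actual code, and imports of fmt, io, and net/http are omitted:)

// archiveFetcher is a placeholder standing in for chisel's archive type.
type archiveFetcher struct {
        baseURL string
}

func (a *archiveFetcher) fetch(path string, flags fetchFlags) (io.ReadCloser, error) {
        client := httpClient // short timeout: Release files and other small fetches
        if flags&fetchBulk != 0 {
                client = bulkClient // long timeout: packages and large downloads
        }
        resp, err := client.Get(a.baseURL + path)
        if err != nil {
                return nil, err
        }
        if resp.StatusCode != http.StatusOK {
                resp.Body.Close()
                return nil, fmt.Errorf("cannot fetch %s: %s", path, resp.Status)
        }
        return resp.Body, nil
}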

(Inline review comment on cmd/chisel/cmd_cut.go, now outdated and resolved.)
@cjdcordeiro added the "Reviewed" label (Supposedly ready for tuning or merging) on Jun 27, 2023
@rebornplusplus (Member, Author)

Hiya @niemeyer, thanks for the insight! I had also wondered what the intention behind bulkClient was while inspecting the source for this PR. I have dropped the previous changes and used the HTTP clients as you suggested.

Although you suggested using the shorter-timeout httpClient for indexes, I am using the bulkClient. The reason is that one of the index files, namely the one for the jammy suite's universe component, is quite big: a whopping 17M. Please see Packages.gz at http://archive.ubuntu.com/ubuntu/dists/jammy/universe/binary-amd64/. As shown in #14 and as I elaborated in the Jira ticket, this file is one of the main reasons Chisel fails with a "context deadline exceeded" error.

Perhaps using the bulk client for downloading indexes doesn't seem intuitive or natural. Let me know if you want to use the shorter-timeout httpClient for downloading indexes anyway; in that case, I would encourage increasing that client's timeout from 30s to 60s. Asking it to download 17M in 30s seems a bit unfair.

@rebornplusplus changed the title from "feat: add --timeout option in cut command" to "fix: use proper HTTP clients to fetch files" on Jul 4, 2023
@cjdcordeiro removed the "Reviewed" label (Supposedly ready for tuning or merging) on Jul 6, 2023
@cjdcordeiro added the "Simple" label (Nice for a quick look on a minute or two) on Aug 30, 2023
@niemeyer (Contributor) left a comment

Thanks for the changes.

Labels: Simple (Nice for a quick look on a minute or two)
Projects: none yet
Development: successfully merging this pull request may close the issue "context deadline exceeded error when running chisel cut"
4 participants