Changing mirrorbrain's check_large_file threshold #257

Merged
merged 1 commit into from
Dec 13, 2023

rgaudin (Member) commented Dec 13, 2023

The MirrorBrain scanner has a `$gig2` variable set to 2 GiB. It is used as a threshold during rsync scans: files larger than it trigger an HTTP Range request, and if that request fails, a warning is printed.

These requests are unnecessary and create load on the mirrors.

This PR changes the variable's value to 256 GiB.

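As a rough illustration of the behavior described above, here is a Python sketch of the size comparison and the kind of Range request involved. It is not MirrorBrain's code (the real scanner is Perl, in tools/scanner.pl); the function names, the example mirror URL, and the choice of requesting the file's last byte are all assumptions made for illustration.

```python
import urllib.request

# Hypothetical stand-ins for the scanner's $gig2 threshold:
# 2 GiB before this PR (2**31 bytes), 256 GiB after it (2**38 bytes).
OLD_THRESHOLD = 2 ** 31
NEW_THRESHOLD = 2 ** 38


def needs_http_check(size_bytes: int, threshold: int) -> bool:
    """True if a file seen during the rsync scan would get the extra HTTP check."""
    return size_bytes > threshold


def build_range_request(url: str, size_bytes: int) -> urllib.request.Request:
    """Build (but do not send) a Range request for the file's last byte.

    Asking for the last byte is an assumption about what such a check
    looks like; it is not taken from scanner.pl.
    """
    last = size_bytes - 1
    return urllib.request.Request(url, headers={"Range": f"bytes={last}-{last}"})


size = 5 * 2 ** 30  # a hypothetical 5 GiB file
print(needs_http_check(size, OLD_THRESHOLD))  # True: checked over HTTP
print(needs_http_check(size, NEW_THRESHOLD))  # False: skipped after this PR
req = build_range_request("https://mirror.example.org/file.zim", size)
print(req.get_header("Range"))  # bytes=5368709119-5368709119
```

With the old 2 GiB value, every file above 2 GiB costs one extra HTTP request per mirror per scan; raising the threshold simply moves that boundary out of reach of the files actually served.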
rgaudin self-assigned this on Dec 13, 2023.
rgaudin merged commit 752ca1a into main on Dec 13, 2023 (1 check passed).
rgaudin deleted the mb_http branch on December 13, 2023 at 10:20.
benoit74 (Collaborator) commented:

I'm not sure this is done for a good reason.

The code/documentation at https://github.com/poeml/mirrorbrain/blob/76f2909e33004a7f5e0dd52b816881eb9fbd4246/tools/scanner.pl#L1396-L1398 explains why this double-check is done on files larger than 2GB.

If this double-check fails, the file is marked as not available on the mirror (or at least it should be).

If you search Grafana for the log message "cannot be delivered via HTTP! Skipping", you will see plenty of occurrences.

Oddly, when I try to download one or two of the files that are supposed to fail, the download actually works. So the double-check looks broken.

In conclusion, I would suggest modifying the value so that the check never runs, since it seems to add load on servers (our scanner and the mirrors) and do more harm than good.

rgaudin (Member, Author) commented Dec 14, 2023

> In conclusion, I would suggest modifying the value so that the check never runs, since it seems to add load on servers (our scanner and the mirrors) and do more harm than good.

Well, disabling it is not easily feasible, but if you think 256 GiB is not enough, you can change the 38 value (the power of two): 39 would be 512 GiB, 40 would be 1 TiB, and 41 would be 2 TiB. I'll let you make the change with whatever value feels most appropriate.
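The 38 is the exponent that gives the threshold in bytes (2^38 bytes = 256 GiB). This short Python sketch, using a hypothetical helper that is not taken from scanner.pl, reproduces the sizes quoted in the comment, plus the original 2^31 bytes = 2 GiB.

```python
def binary_size(n_bytes: int) -> str:
    """Format a byte count with IEC binary units (KiB, MiB, GiB, ...).

    Uses integer division, so the result is exact for powers of two such as 2**38.
    """
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    value, i = n_bytes, 0
    while value >= 1024 and i < len(units) - 1:
        value //= 1024
        i += 1
    return f"{value} {units[i]}"


# Original threshold, then the values discussed for the exponent.
for exponent in (31, 38, 39, 40, 41):
    print(f"2**{exponent} bytes = {binary_size(2 ** exponent)}")
```

This prints 2 GiB, 256 GiB, 512 GiB, 1 TiB and 2 TiB respectively: each increment of the exponent doubles the threshold.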

benoit74 (Collaborator) commented:

In fact, I was thinking that we might produce files of up to 1 TB, but those are offspot cards (containing many ZIMs), which are not served by the mirrors, rather than single ZIMs or files. It is too early in the morning here; I probably need one more coffee. Your value is probably fine, let's keep it.

rgaudin (Member, Author) commented Dec 14, 2023

We do have a TB+ ZIM file (in dev, not synced), and increasing the value has no downside, so maybe we should just set it to 2 TiB once and forget about it. Hopefully that will last until we get rid of mb (MirrorBrain), if ever! 😵‍💫

benoit74 (Collaborator) commented:

As discussed live, I will set it to 2^63 bytes = 8 EiB so that, hopefully, we never have to come back to this issue.
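A quick arithmetic check of that final value, as a Python sketch: 1 EiB is 2^60 bytes, so 2^63 bytes is exactly 8 EiB. Assuming file sizes are held in a signed 64-bit integer (largest value 2^63 - 1, as with a 64-bit off_t), no file size can exceed the threshold, so the "larger than" comparison never triggers. The 64-bit assumption and the helper name are illustrative, not taken from scanner.pl.

```python
# Proposed threshold: 2**63 bytes.
THRESHOLD = 2 ** 63

# 1 EiB = 1024**6 bytes, so 2**63 bytes is exactly 8 EiB.
assert THRESHOLD == 8 * 1024 ** 6

# Assumption: sizes are stored as signed 64-bit integers,
# whose largest value is 2**63 - 1.
INT64_MAX = 2 ** 63 - 1


def triggers_http_check(size_bytes: int, threshold: int = THRESHOLD) -> bool:
    """Hypothetical helper: same 'larger than the threshold' rule as described in the PR."""
    return size_bytes > threshold


# Even the largest representable size stays below the threshold,
# so the extra HTTP check effectively never fires.
print(triggers_http_check(INT64_MAX))
```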
