
Occasional NotFoundError failures #524

Open
gsnedders opened this issue Jan 11, 2019 · 14 comments

Comments

@gsnedders

From web-platform-tests/wpt#13274:

These are stacks like:

Traceback (most recent call last):
  File "./wpt", line 5, in <module>
    wpt.main()
  File "/home/test/web-platform-tests/tools/wpt/wpt.py", line 129, in main
    rv = script(*args, **kwargs)
  File "/home/test/web-platform-tests/tools/wpt/run.py", line 510, in run
    **kwargs)
  File "/home/test/web-platform-tests/tools/wpt/run.py", line 488, in setup_wptrunner
    kwargs["binary"] = setup_cls.install(venv, channel=channel)
  File "/home/test/web-platform-tests/tools/wpt/run.py", line 165, in install
    return self.browser.install(venv.path, channel)
  File "/home/test/web-platform-tests/tools/wpt/browser.py", line 134, in install
    destination=dest).download()
  File "/home/test/web-platform-tests/_venv/lib/python2.7/site-packages/mozdownload/factory.py", line 121, in __init__
    scraper_types[scraper_type].__init__(self, **kwargs)
  File "/home/test/web-platform-tests/_venv/lib/python2.7/site-packages/mozdownload/scraper.py", line 346, in __init__
    Scraper.__init__(self, *args, **kwargs)
  File "/home/test/web-platform-tests/_venv/lib/python2.7/site-packages/mozdownload/scraper.py", line 135, in __init__
    self._retry_check_404(self.get_build_info)
  File "/home/test/web-platform-tests/_venv/lib/python2.7/site-packages/mozdownload/scraper.py", line 150, in _retry_check_404
    self._retry(func, **retry_kwargs)
  File "/home/test/web-platform-tests/_venv/lib/python2.7/site-packages/mozdownload/scraper.py", line 141, in _retry
    return redo.retry(func, **retry_kwargs)
  File "/home/test/web-platform-tests/_venv/lib/python2.7/site-packages/redo/__init__.py", line 162, in retry
    return action(*args, **kwargs)
  File "/home/test/web-platform-tests/_venv/lib/python2.7/site-packages/mozdownload/scraper.py", line 403, in get_build_info
    self.date, self.build_index)
  File "/home/test/web-platform-tests/_venv/lib/python2.7/site-packages/mozdownload/scraper.py", line 484, in get_build_info_for_date
    raise errors.NotFoundError(message, url)
mozdownload.errors.NotFoundError: Folder for builds on 2018-09-28-22-04-33 has not been found: https://archive.mozilla.org/pub/firefox/nightly/2018/09/

From @jgraham in web-platform-tests/wpt#13274 (comment):

I fairly strongly suspect that this is happening when a new nightly is being released (maybe some platforms are available and some are not?). But we are already handling that badly; it's possible to end up with some tests run against the previous nightly and some against the new one. Really we need a single decision task that picks a binary URL and makes it available to the subsequent tasks, to ensure that they all run against the exact same version. Note that Chrome could have the same issue, but it's less likely since its releases are less frequent. But it's harder to solve in that case; we probably actually need to download the .deb and make it available as an artifact, since there isn't a long-lived URL AFAIK.

@whimboo
Contributor

whimboo commented Jan 14, 2019

Yes, this is most likely the case. If the build is not present, the download will fail. But that is not a problem with mozdownload.

@whimboo whimboo closed this as completed Jan 14, 2019
@jgraham
Member

jgraham commented Jan 14, 2019

It seems like a problem with mozdownload if we ask "give me the latest linux nightly", it finds a directory where a linux nightly could be, and, because the build hasn't been uploaded yet, it fails rather than falling back to the previous nightly. If we are just using the API wrongly, that's fine (and suggestions/patches welcome), but otherwise this seems like a real issue.

@gsnedders
Author

Specifically, we're doing:

from mozdownload import FactoryScraper

scraper = FactoryScraper("daily",
                         branch="mozilla-central",
                         version="latest",
                         destination="browsers/nightly")
filename = scraper.download()

At least to me, that looks like it shouldn't ever fail because the build isn't present: either there's some bug on the server side which causes mozdownload to think a build exists, or there's a bug in mozdownload not handling some bit of server behaviour.
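A caller-side workaround for this race would be to catch mozdownload's NotFoundError and try again after a pause. A minimal sketch — `download_with_retry` is a hypothetical helper, not part of wpt's or mozdownload's actual code, and the retry count and delay are arbitrary:

```python
import time

def download_with_retry(download, not_found_error, attempts=5, delay=60):
    """Call download(), retrying when it raises not_found_error.

    Sketch of a caller-side workaround: if the latest nightly's files
    are still being uploaded, wait and try again instead of failing.
    """
    for attempt in range(attempts):
        try:
            return download()
        except not_found_error:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the original error
            time.sleep(delay)

# Intended usage (requires mozdownload; names as used in this thread):
#     from mozdownload import FactoryScraper, errors
#     scraper = FactoryScraper("daily", branch="mozilla-central",
#                              version="latest",
#                              destination="browsers/nightly")
#     filename = download_with_retry(scraper.download, errors.NotFoundError)
```

This papers over short upload gaps but, as discussed below in the thread, cannot cheaply cover a gap of an hour or more.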

@whimboo
Contributor

whimboo commented Jan 14, 2019

Oh, I see. You don't use the build id to specify a particular fixed version. In that case this needs some further investigation. I assume you don't have a way to run mozdownload with -vv to get more verbose logging output?

So @jgraham's reply would make sense. It looks like we don't traverse back into older folders until a version has been found. I wish we already had Taskcluster support. Maybe we should raise that issue's severity to make things like this easier.

@whimboo whimboo reopened this Jan 14, 2019
@whimboo
Contributor

whimboo commented Jan 14, 2019

How often do you hit that?

@jgraham
Member

jgraham commented Jan 30, 2019

On a timescale of days, i.e. a few times a week. It's often enough that the behaviour is problematic.

@whimboo
Contributor

whimboo commented Jan 30, 2019

So the code clearly finds the build status file of the latest build. But I wonder whether new files first get populated in latest-mozilla-central before the corresponding dated folder is created under the month folder. That is the only thing I can imagine happening here.

Does it help if you add the --retry-attempts and maybe also the --retry-sleeptime arguments? As far as I can see you don't make use of that feature yet, and it might help here.

I'm not sure this is something we can fix while using archive.mozilla.org if the builds are uploaded in the wrong order. Ideally we would use Taskcluster to download the builds, but that work hasn't been started yet. See issue #365.
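For reference, the retry options mentioned above could also be passed through the Python API rather than the CLI. A sketch only: the keyword names retry_attempts and retry_delay are assumptions mirroring the CLI flags, and should be checked against the installed mozdownload version before use:

```python
# Hypothetical keyword arguments for FactoryScraper, assumed to mirror
# the CLI retry flags discussed in this thread; verify the names
# against your mozdownload version's Scraper signature.
retry_kwargs = {
    "retry_attempts": 5,   # try up to 5 times before giving up
    "retry_delay": 60.0,   # seconds to sleep between attempts
}

# Intended usage (requires mozdownload):
#     from mozdownload import FactoryScraper
#     scraper = FactoryScraper("daily", branch="mozilla-central",
#                              version="latest",
#                              destination="browsers/nightly",
#                              **retry_kwargs)
#     filename = scraper.download()
```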

@jgraham
Member

jgraham commented Jan 31, 2019

I assumed that the problem is the latest-mozilla-central link being updated when the first artifact is available for the new set of builds, not the last artifact. So if e.g. there's a linux32-debug build ready but not linux64-opt then latest-mozilla-central will temporarily point at a folder with no suitable build. In that case there's no fallback to just using the latest build that is ready.

Looking at http://archive.mozilla.org/pub/firefox/nightly/latest-mozilla-central/ I see a time gap of at least an hour between the first artifact and the last one (ignoring the non-67 builds), so this at least seems plausible. Given that, retries would help but the total timeout would have to be prohibitively long to avoid hitting this problem.
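A back-of-envelope check of that claim, assuming the observed one-hour upload gap and a hypothetical 60-second retry delay:

```python
# If the window in which the new nightly's artifacts may be incomplete
# is about an hour, how much retrying would be needed to ride it out?
# (The 60-second retry delay is an arbitrary assumption.)
upload_gap = 60 * 60   # seconds between first and last artifact upload
retry_delay = 60       # assumed seconds between retry attempts

attempts_needed = upload_gap // retry_delay  # attempts to span the gap
total_wait = attempts_needed * retry_delay   # worst-case total wait, seconds
```

With these numbers, 60 attempts and a full hour of waiting would be required in the worst case, which supports the point that retries alone are not a practical fix.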

@whimboo
Contributor

whimboo commented Jan 31, 2019

I don't think we completely replace this folder. If we did, there wouldn't still be old Firefox 66 nightly builds present, as there currently are.

What mozdownload actually does is check for a status file like:
http://archive.mozilla.org/pub/firefox/nightly/latest-mozilla-central/firefox-67.0a1.en-US.linux-x86_64.txt

Currently it lists 20190130215539 as the build id. Based on that information we try to download the files from https://archive.mozilla.org/pub/firefox/nightly/2019/01/2019-01-30-21-55-39-mozilla-central/. But if the files aren't present there, it will fail.
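The mapping just described can be sketched as follows (build_id_to_folder is a hypothetical helper written for illustration, not mozdownload's actual code):

```python
# Derive the dated directory URL from the build id listed in the
# status .txt file, as described above.
BASE = "https://archive.mozilla.org/pub/firefox/nightly"

def build_id_to_folder(build_id, branch="mozilla-central"):
    """Turn a build id like '20190130215539' into its dated folder URL."""
    y, mo, d = build_id[0:4], build_id[4:6], build_id[6:8]
    h, mi, s = build_id[8:10], build_id[10:12], build_id[12:14]
    return (f"{BASE}/{y}/{mo}/"
            f"{y}-{mo}-{d}-{h}-{mi}-{s}-{branch}/")

# build_id_to_folder("20190130215539")
# → "https://archive.mozilla.org/pub/firefox/nightly/2019/01/2019-01-30-21-55-39-mozilla-central/"
```

The failure mode in this issue is exactly the window where the .txt file already names a build id but the directory this function points at is not fully populated yet.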

So I really wonder if we maybe first update the links in the latest folder before actually adding/updating the build-id-specific folder.

@nthomas-mozilla who from RelEng could explain to us how the latest-mozilla-central folder and the build-id-specific folders are getting populated?

@nthomas-mozilla
Contributor

You're right that the latest-mozilla-central directory is appended to rather than recreated (paired with an expiration policy to clean out older builds). Files are moved into that dir by a beetmover task, here's an example log for a recent linux64 nightly.

The artifacts are handled asynchronously, copied first into the dated directory and then to latest-mozilla-central, and the .txt file is started before the actual tar.bz2 for Firefox. In that particular log there's only a 10-second gap, but longer is possible depending on network conditions. Another complication is that there is up to 14400 seconds (4h) of caching on the latest directory on archive.m.o, but I'm not sure how that would lead to files not being found. More likely would be getting the previous nightly via a stale copy of the .txt file.

In terms of alternatives

@foolip

foolip commented Feb 11, 2019

Would it be possible for this issue to be assigned to someone? It's coming up in web-platform-tests/wpt#13274 with some regularity, requiring manual intervention each time, as we can't tell the difference between mozdownload failures and other types of failures.

@whimboo
Contributor

whimboo commented Feb 11, 2019

@foolip I replied on the wpt issue with the second part of @nthomas-mozilla's reply. If that doesn't work, maybe increase the retry attempts and delays for mozdownload for now.

@nthomas-mozilla is there a way to prevent using the cache and always get a fresh copy? We currently try to do it via https://github.com/mozilla/mozdownload/blob/master/mozdownload/scraper.py#L425, but that might be wrong?

@jgraham
Member

jgraham commented Feb 11, 2019

I don't think the caching should be a problem because, as noted, there's no suggested mechanism by which the cached file would not exist, whereas it looks like we are getting a pointer to a file that doesn't yet exist.

@nthomas-mozilla
Contributor

@whimboo AFAIK there's no cache busting that can be done. I had a look through the mozdownload source, and after retrieving the build date it seems get_build_info_for_date() will then try to parse the directory listing at firefox/nightly/YYYY/MM/ to make sure YYYY-MM-DD-hh-mm-ss is present. The listing is also cached, for 15 minutes judging by the headers. For this particular use case I suggest just scraping firefox/nightly/YYYY/MM/YYYY-MM-DD-hh-mm-ss/ directly.
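A sketch of probing the dated directory directly before committing to a build, in the spirit of that suggestion. artifact_present is a hypothetical helper, and a plain urlopen issues a GET rather than a true HEAD request, so this is an illustration rather than a production approach:

```python
import urllib.request

def artifact_present(url, opener=urllib.request.urlopen):
    """Return True if fetching `url` succeeds, False on any error.

    Sketch only: probe the dated directory (or a file inside it)
    directly, instead of trusting the latest-mozilla-central listing,
    before deciding which build to download.
    """
    try:
        with opener(url):
            return True
    except Exception:
        return False
```

Injecting the opener keeps the helper testable without network access; in real use the default urlopen would hit archive.mozilla.org.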
