Add some missing logic for failed URI's in datasets and test_saving #607
Conversation
Codecov Report
@@ Coverage Diff @@
## master #607 +/- ##
==========================================
- Coverage 83.58% 81.19% -2.40%
==========================================
Files 207 209 +2
Lines 13840 14186 +346
==========================================
- Hits 11568 11518 -50
- Misses 2272 2668 +396
alibi_detect/datasets.py (Outdated)
try:
    corruption_types = google_bucket_list(url, folder, filetype)
except RequestException:
    logger.exception("Could not connect, URL may be out of service")
    raise
This exact error with the same message is already caught and raised in google_bucket_list, so not sure we need to do this again here?
Thanks, you're totally correct. I have removed the above.
I think I got confused as I was trying to address issue #315 (comment). The actual correction needs to be to add a pytest.skip if the above error is raised when corruption_types_cifar10c is used within test_datasets.py. I'll do this now.
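For illustration, a minimal sketch of what such a skip could look like; the test name, assertion and call signature below are assumptions, not the actual test_datasets.py code:

```python
# Sketch only: the test name, assertion and call signature are assumptions.
import pytest
from requests.exceptions import RequestException

from alibi_detect.datasets import corruption_types_cifar10c


def test_corruption_types_cifar10c():
    try:
        corruption_types = corruption_types_cifar10c()
    except RequestException:
        # Skip instead of failing if the Google Cloud bucket is unreachable
        pytest.skip('CIFAR-10-C corruption types URL down')
    assert len(corruption_types) > 0
```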
try:
    return get_movie_sentiment_data()
except RequestException:
    pytest.skip('Movie sentiment dataset URL down')
How does this actually work? The test within which this data is used is magically skipped?
I was actually surprised this works. But I tested it by changing MOVIESENTIMENT_URLS to contain incorrect URLs and running pytest alibi_detect/saving/tests/test_saving.py -k test_save_preprocess_nlp -sv, and SKIPPED (Movie sentiment dataset URL down) is returned. Magic indeed!
I think this is a special feature of the "cases" from pytest-cases that we use to represent datasets in the new saving tests. They're like fancy fixtures, and have clever capabilities like being able to use pytest marks (i.e. skip).
- tokenizer = AutoTokenizer.from_pretrained(model_name)
+ try:
+     tokenizer = AutoTokenizer.from_pretrained(model_name)
+ except OSError:
Is OSError the only one that can happen? Seems a bit weird here: is it because the network call was done somewhere else and failed silently, and the loading is attempted from the locally downloaded file (which doesn't exist)? Or is from_pretrained performing the network call but huggingface isn't raising network errors up the stack?
This is the full error traceback:
https://github.com/SeldonIO/alibi-detect/runs/7521461525
I think the huggingface strategy must be what you described, i.e. on a HTTPError they try to load from cache or a local directory. I tried passing an incorrect model_name to from_pretrained and using RepositoryNotFoundError (this is raised instead of HTTPError if I use a fake URI) or ValueError in the except, but then I get the following:
OSError: We couldn't connect to 'https://huggingface.co/' to load this model, couldn't find it in the cached files and it looks like bert-base-cased is not the path to a directory containing a config.json file.
It seems like the other errors are not raised up the stack, as you say?
Maybe the solution is just to do except (OSError, HTTPError)?
Yea probably, in case they decide to change the error type at some point.
Done!
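For reference, a rough sketch of the agreed pattern, assuming HTTPError is requests.exceptions.HTTPError and using the bert-base-cased model mentioned in the traceback above; the function name is illustrative and this is not the verbatim test_saving.py change:

```python
# Sketch of the except (OSError, HTTPError) pattern; not the verbatim change.
import pytest
from requests.exceptions import HTTPError
from transformers import AutoTokenizer


def case_tokenizer():
    model_name = 'bert-base-cased'
    try:
        tokenizer = AutoTokenizer.from_pretrained(model_name)
    except (OSError, HTTPError):
        # from_pretrained currently wraps most connection failures in OSError;
        # catching HTTPError as well guards against the error type changing upstream.
        pytest.skip(f"Could not fetch '{model_name}' from the HuggingFace hub")
    return tokenizer
```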
LGTM, left some questions.
This PR resolves two issues:
1. Improve error handling for URL requests in dataset fetch methods #315 (comment): the try/except RequestException pattern is added to corruption_types_cifar10c in alibi_detect.datasets.
2. Handle errors when fetching models/datasets in save/load tests #572: tests in test_saving.py are skipped when the URIs for huggingface models (e.g. tokenizer or embedding models) are down, or when the URLs used by get_movie_sentiment_data are down.