Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add some missing logic for failed URI's in datasets and test_saving #607

Merged
merged 8 commits into from
Sep 9, 2022
Merged

Add some missing logic for failed URI's in datasets and test_saving #607

merged 8 commits into from
Sep 9, 2022

Conversation

ascillitoe
Copy link
Contributor

This PR resolves two issues:

  1. Improve error handling for URL requests in dataset fetch methods #315 (comment): The try/except RequestException pattern is added to corruption_types_cifar10c in alibi_detect.datasets.

  2. Handle errors when fetching models/datasets in save/load tests #572:

    • Skip tests in test_saving.py when URI's for huggingface models (e.g. tokenizer's or embedding's) are down.
    • Skip tests when the URI's in get_movie_sentiment_data are down.

@codecov-commenter
Copy link

codecov-commenter commented Sep 6, 2022

Codecov Report

Merging #607 (a33243a) into master (571d501) will decrease coverage by 2.39%.
The diff coverage is 52.94%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #607      +/-   ##
==========================================
- Coverage   83.58%   81.19%   -2.40%     
==========================================
  Files         207      209       +2     
  Lines       13840    14186     +346     
==========================================
- Hits        11568    11518      -50     
- Misses       2272     2668     +396     
Impacted Files Coverage Δ
alibi_detect/saving/tests/test_saving.py 87.87% <38.46%> (-7.66%) ⬇️
alibi_detect/datasets.py 68.69% <50.00%> (-0.64%) ⬇️
alibi_detect/saving/tests/datasets.py 95.83% <66.66%> (-4.17%) ⬇️
alibi_detect/tests/test_datasets.py 86.20% <66.66%> (-1.30%) ⬇️
alibi_detect/models/tensorflow/embedding.py 30.76% <0.00%> (-57.70%) ⬇️
alibi_detect/utils/prediction.py 75.00% <0.00%> (-25.00%) ⬇️
alibi_detect/cd/tests/test_learned_kernel.py 83.82% <0.00%> (-12.26%) ⬇️
alibi_detect/saving/saving.py 78.19% <0.00%> (-8.06%) ⬇️
alibi_detect/saving/loading.py 82.88% <0.00%> (-6.96%) ⬇️
... and 12 more

Comment on lines 293 to 297
try:
corruption_types = google_bucket_list(url, folder, filetype)
except RequestException:
logger.exception("Could not connect, URL may be out of service")
raise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exact error with the message is already caught and raised in google_bucket_list, not sure we need to do this again here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, you're totally correct. I have removed the above.

Think I got confused as was trying to address this issue #315 (comment). The actual correction needs to be to add a pytest.skip if the above error is raised when corruption_types_cifar10c is used within test_datasets.py. I'll do this now.

Comment on lines +68 to +71
try:
return get_movie_sentiment_data()
except RequestException:
pytest.skip('Movie sentiment dataset URL down')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this actually work? The test within which this data is used is magically skipped?

Copy link
Contributor Author

@ascillitoe ascillitoe Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually surprised this works. But I tested it by changing MOVIESENTIMENT_URLS to contain incorrect url's, and running pytest alibi_detect/saving/tests/test_saving.py -k test_save_preprocess_nlp -sv, and SKIPPED (Movie sentiment dataset URL down) is returned. Magic indeed!

I think this is a special feature of the "cases" from pytest-cases that we use to represent datasets in the new saving tests. They're like fancy fixtures, and have clever capabilities like being able to use pytest marks (i.e. skip).

tokenizer = AutoTokenizer.from_pretrained(model_name)
try:
tokenizer = AutoTokenizer.from_pretrained(model_name)
except OSError:
Copy link
Contributor

@jklaise jklaise Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is OSError the only one that can happen? Seems a bit weird here, is it because the network call was done somewhere else and failed silently and the loading is attempted from the locally downloaded file (which doesn't exist) or is from_pretrained performing the network call but huggingface isn't raising network errors up the stack?

Copy link
Contributor Author

@ascillitoe ascillitoe Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the full error traceback:

https://github.com/SeldonIO/alibi-detect/runs/7521461525

I think the huggingface strategy must be what you described i.e. if a HTTPError then they try to load from cache or a local directory. I tried passing an incorrect model_name to from_pretrained and using RepositoryNotFoundError (this is raised instead of HTTPError if I use a fake URI) or ValueError in except, but then I get the following:

OSError: We couldn't connect to 'https://huggingface.co/' to load this model, couldn't find it in the cached files and it looks like bert-base-cased is not the path to a directory containing a config.json file.

Seems to be like the other errors are not raised up the stack as you say?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the solution might just be to do except (OSError, HTTPError)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea probably, in case they decide to change the error type at some point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Contributor

@jklaise jklaise left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left some questions.

@ascillitoe ascillitoe merged commit 0d46a04 into SeldonIO:master Sep 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants