Add some missing logic for failed URI's in datasets and test_saving #607
Conversation
Codecov Report
@@ Coverage Diff @@
## master #607 +/- ##
==========================================
- Coverage 83.58% 81.19% -2.40%
==========================================
Files 207 209 +2
Lines 13840 14186 +346
==========================================
- Hits 11568 11518 -50
- Misses 2272 2668 +396
alibi_detect/datasets.py (Outdated)
try:
    corruption_types = google_bucket_list(url, folder, filetype)
except RequestException:
    logger.exception("Could not connect, URL may be out of service")
    raise
This exact error with the same message is already caught and raised in google_bucket_list, so not sure we need to do this again here?
Thanks, you're totally correct. I have removed the above.
I think I got confused as I was trying to address issue #315 (comment). The actual correction needs to be to add a pytest.skip if the above error is raised when corruption_types_cifar10c is used within test_datasets.py. I'll do this now.
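For illustration, a minimal sketch of what such a skip could look like; the test name, assertion and call signature below are assumptions, not the actual test_datasets.py code:

```python
# Sketch only: the test name, assertion and call signature are assumptions.
import pytest
from requests.exceptions import RequestException

from alibi_detect.datasets import corruption_types_cifar10c


def test_corruption_types_cifar10c():
    try:
        corruption_types = corruption_types_cifar10c()
    except RequestException:
        # Skip instead of failing if the Google Cloud bucket is unreachable
        pytest.skip('CIFAR-10-C corruption types URL down')
    assert len(corruption_types) > 0
```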
try:
    return get_movie_sentiment_data()
except RequestException:
    pytest.skip('Movie sentiment dataset URL down')
How does this actually work? The test within which this data is used is magically skipped?
I was actually surprised this works. But I tested it by changing MOVIESENTIMENT_URLS to contain incorrect URLs and running pytest alibi_detect/saving/tests/test_saving.py -k test_save_preprocess_nlp -sv, and SKIPPED (Movie sentiment dataset URL down) is returned. Magic indeed!
I think this is a special feature of the "cases" from pytest-cases that we use to represent datasets in the new saving tests. They're like fancy fixtures, and have clever capabilities like being able to use pytest marks (i.e. skip).
- tokenizer = AutoTokenizer.from_pretrained(model_name)
+ try:
+     tokenizer = AutoTokenizer.from_pretrained(model_name)
+ except OSError:
Is OSError the only one that can happen? Seems a bit weird here: is it because the network call was done somewhere else and failed silently, and the loading is attempted from the locally downloaded file (which doesn't exist)? Or is from_pretrained performing the network call but huggingface isn't raising network errors up the stack?
This is the full error traceback:
https://github.com/SeldonIO/alibi-detect/runs/7521461525
I think the huggingface strategy must be what you described, i.e. on a HTTPError they try to load from cache or a local directory. I tried passing an incorrect model_name to from_pretrained and using RepositoryNotFoundError (this is raised instead of HTTPError if I use a fake URI) or ValueError in the except, but then I get the following:
OSError: We couldn't connect to 'https://huggingface.co/' to load this model, couldn't find it in the cached files and it looks like bert-base-cased is not the path to a directory containing a config.json file.
It seems like the other errors are not raised up the stack, as you say?
Maybe the solution is just to do except (OSError, HTTPError)?
Yea probably, in case they decide to change the error type at some point.
Done!
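For reference, a rough sketch of the agreed pattern, assuming HTTPError is requests.exceptions.HTTPError and using the bert-base-cased model mentioned in the traceback above; the function name is illustrative and this is not the verbatim test_saving.py change:

```python
# Sketch of the except (OSError, HTTPError) pattern; not the verbatim change.
import pytest
from requests.exceptions import HTTPError
from transformers import AutoTokenizer


def case_tokenizer():
    model_name = 'bert-base-cased'
    try:
        tokenizer = AutoTokenizer.from_pretrained(model_name)
    except (OSError, HTTPError):
        # from_pretrained currently wraps most connection failures in OSError;
        # catching HTTPError as well guards against the error type changing upstream.
        pytest.skip(f"Could not fetch '{model_name}' from the HuggingFace hub")
    return tokenizer
```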
LGTM, left some questions.
This PR resolves two issues:
1. Improve error handling for URL requests in dataset fetch methods #315 (comment): the try/except RequestException pattern is added to corruption_types_cifar10c in alibi_detect.datasets.
2. Handle errors when fetching models/datasets in save/load tests #572: tests in test_saving.py are skipped when the URIs for huggingface models (e.g. tokenizer or embedding models) are down, or when the URLs used by get_movie_sentiment_data are down.