Add tests for drivers #42

winfried-ripken · 2022-06-20T18:20:20Z

Description

The drivers should be tested fully in isolation. The goal is to mock all data where applicable.

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Refactoring including code style reformatting
Other (please describe):

Checklist:

I have read the contributing guideline doc (external contributors only)
Lint and unit tests pass locally with my changes
I have kept the PR small so that it can be easily reviewed
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
All dependency changes have been reflected in the pip requirement files.

winfried-ripken · 2022-06-20T18:20:46Z

requirements.dev.in

@@ -12,7 +12,7 @@ myst_parser # For importing md into rst files.
 pyspark>=3.2.0
 flaky # for retry flaky tests
 nbmake
-notebook>=6.4.10


fix vulnerability

winfried-ripken · 2022-06-20T18:21:09Z

src/squirrel_datasets_core/datasets/conceptual_captions/driver.py

@@ -58,7 +60,7 @@ def _download_image(url: str) -> np.ndarray:
            url: location of the image
        """
        req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
-        resp = urllib.request.urlopen(req)
+        resp = urllib.request.urlopen(req, timeout=1)


add timeout here to unblock the loading
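A minimal sketch of how such a timeout can be exercised in a test without network access. The helper name `fetch_bytes` and the example URL are hypothetical and not part of the driver; only the `urlopen(req, timeout=...)` call mirrors the change above.

```python
import socket
import urllib.error
import urllib.request
from typing import Optional
from unittest import mock
from urllib.request import Request


def fetch_bytes(url: str, timeout: float = 1.0) -> Optional[bytes]:
    """Hypothetical helper: return the response body, or None on failure or timeout."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        # The timeout keeps a slow or dead host from blocking the loader forever.
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    except (urllib.error.URLError, socket.timeout):
        return None


# Simulate an unresponsive host by making urlopen raise a timeout.
with mock.patch("urllib.request.urlopen", side_effect=socket.timeout("timed out")):
    timed_out_result = fetch_bytes("http://example.invalid/image.jpg")
```

Returning `None` on failure is an assumption made for this sketch; the driver may handle failed downloads differently.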

winfried-ripken · 2022-06-20T18:21:56Z

src/squirrel_datasets_core/datasets/cc100/cc100.py

@@ -32,14 +33,6 @@ def __init__(

        self.compression = compression

-    @property


winfried-ripken · 2022-06-20T18:22:23Z

src/squirrel_datasets_core/datasets/bdd100k/driver.py

            sample["image"] = load_image(sample["image_url"])
-        if parse_label:
+        if parse_label and "label_url" in sample:


check if label is really there to avoid fetching None
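The guard can be illustrated with a small stand-alone sketch. `parse_sample` and `fake_loader` are hypothetical stand-ins, not the driver's actual functions; only the `parse_label and "label_url" in sample` condition comes from the diff above.

```python
from typing import Any, Callable, Dict


def parse_sample(
    sample: Dict[str, Any],
    load_label: Callable[[str], Any],
    parse_label: bool = True,
) -> Dict[str, Any]:
    """Hypothetical stand-in for the driver's per-sample parsing step."""
    out = dict(sample)
    # Only fetch a label when the sample actually carries a label URL.
    if parse_label and "label_url" in out:
        out["label"] = load_label(out["label_url"])
    return out


def fake_loader(url: str) -> str:
    """Mock loader so the sketch runs without any file or network access."""
    return f"loaded:{url}"


labelled = parse_sample({"image_url": "a.jpg", "label_url": "a.png"}, fake_loader)
unlabelled = parse_sample({"image_url": "b.jpg"}, fake_loader)
```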

requirements.txt

AlpAribal

Looks nice! I am dropping some intermediate feedback for some of the drivers. Will continue the review soon.

In general, you can check the assertions in tests for all samples rather than the first one.
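As a rough illustration of asserting on every sample rather than only the first, here is a self-contained sketch. The helpers `write_mock_samples` and `read_samples` and the JSON-lines layout are assumptions made for this example; they are not the helpers in `mock_utils.py`.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Dict, Iterator, List


def write_mock_samples(n: int, directory: Path) -> Path:
    """Write n fake records as JSON lines (hypothetical mock helper)."""
    path = directory / "samples.jsonl"
    with path.open("w") as f:
        for i in range(n):
            f.write(json.dumps({"id": i, "text": f"sample {i}"}) + "\n")
    return path


def read_samples(path: Path) -> Iterator[Dict]:
    """Stand-in for a driver's iterator over samples."""
    with path.open() as f:
        for line in f:
            yield json.loads(line)


N = 5
with TemporaryDirectory() as tmp:
    samples: List[Dict] = list(read_samples(write_mock_samples(N, Path(tmp))))

# Assert on every sample, not only samples[0].
assert len(samples) == N
for i, sample in enumerate(samples):
    assert sample == {"id": i, "text": f"sample {i}"}
```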

Currently, we are only testing with local paths. If it makes sense to you too, we can define a separate ticket for adding GCP paths via a fixture.

requirements.txt

src/squirrel_datasets_core/datasets/bdd100k/driver.py

test/test_datasets/test_bdd100k.py

test/test_datasets/mock_utils.py

src/squirrel_datasets_core/datasets/monthly_german_tweets/driver.py

test/test_datasets/test_monthly_german_tweets.py

winfried-ripken · 2022-07-04T15:52:20Z

/gcbrun

AlpAribal

Very good job with the mocks, overall they are very easy to understand and cover all driver functionalities, thank you!

I am leaving some more minor comments now that I have checked the whole PR.

test/test_datasets/test_allenai.py

AlpAribal · 2022-07-27T04:40:15Z

test/test_datasets/test_allenai.py

+    save_path = mock_allenai_data(N, tmp_path)
+
+    config = defaultdict(dict)
+    config["zu"]["train"] = [save_path]


I would add two other languages here (perhaps with a different number of samples and different splits) and then test:

- switching between languages and checking for the expected number of samples
- switching between splits and checking for the expected number of samples
- using lang=None before and after selecting specific languages. I guess that with None we should get all samples. I think there might be a bug where None does not work as intended if a specific language was selected beforehand.

Very good catch! Indeed there was a bug
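The three checks suggested above could look roughly like the following sketch. `MultilingualStore` is a hypothetical stand-in and not the real driver API, the extra languages and sample counts are made up, and raising `ValueError` for an unknown language is an assumption. The thread does not say what caused the bug; the sketch only shows one way to make `lang=None` independent of earlier selections.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class MultilingualStore:
    """Hypothetical stand-in for a multilingual driver; not the real squirrel API."""

    def __init__(self, data: Dict[str, Dict[str, List[str]]]) -> None:
        self._data = data

    def select(self, split: str, lang: Optional[str] = None) -> List[str]:
        # Resolve the language list on every call instead of remembering the
        # previous selection, so lang=None always means "all languages".
        langs = sorted(self._data) if lang is None else [lang]
        samples: List[str] = []
        for name in langs:
            if name not in self._data:
                raise ValueError(f"unknown language: {name}")
            samples.extend(self._data[name].get(split, []))
        return samples


data: Dict[str, Dict[str, List[str]]] = defaultdict(dict)
data["zu"]["train"] = ["zu-0", "zu-1", "zu-2"]
data["de"]["train"] = ["de-0", "de-1"]
data["de"]["validation"] = ["de-val-0"]
data["en"]["validation"] = ["en-val-0", "en-val-1"]

store = MultilingualStore(data)
all_before = store.select("train")           # lang=None before a specific selection
only_de = store.select("train", lang="de")   # switch to one language
all_after = store.select("train")            # lang=None again afterwards
```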

test/test_datasets/test_allenai.py

test/test_datasets/test_camvid.py

test/test_datasets/test_cc100.py

test/test_datasets/test_conceptual_captions.py

test/test_datasets/test_imagenet.py

winfried-ripken · 2022-08-03T14:54:41Z

/gcbrun

winfried-ripken · 2022-08-03T16:21:14Z

/gcbrun

winfried-ripken · 2022-08-08T21:01:53Z

Thanks a lot @AlpAribal for the great feedback! I have integrated your suggestions and fixed a few more bugs.

winfried-ripken · 2022-08-08T21:06:25Z

> Looks nice! I am dropping some intermediate feedback for some of the drivers. Will continue the review soon.
>
> In general, you can check the assertions in tests for all samples rather than the first one.
>
> Currently, we are only testing with local paths. If it makes sense to you too, we can define a separate ticket for adding GCP paths via a fixture.

I don't know if testing with GCP paths is really necessary. Since we rely on the fsspec module to handle paths, we would effectively be testing whether that module handles GCP and local paths equally, which would be out of scope. Would you agree, or am I missing something?

AlpAribal

LGTM! Do you want to make the bump to py39 in this PR or in another?

AlpAribal · 2022-08-09T06:44:45Z

requirements.txt

@@ -1,5 +1,5 @@
 #
-# This file is autogenerated by pip-compile with python 3.8
+# This file is autogenerated by pip-compile with python 3.9


To switch to Python 3.9, we also need to update the Dockerfile base image and regenerate the hashes in this file (i.e. run pip-compile with --no-reuse-hashes once).

Thanks for the hint!
Let's do this in another PR then

Thanks a lot for your very detailed feedback!

src/squirrel_datasets_core/datasets/allenai_c4/allenai_c4_multilingual.py

test/test_datasets/test_allenai.py

test/test_datasets/test_conceptual_captions.py

AlpAribal · 2022-08-09T07:35:49Z

> I don't know if testing with GCP paths is really necessary. Since we rely on the fsspec module to handle paths, we would effectively be testing whether that module handles GCP and local paths equally, which would be out of scope. Would you agree, or am I missing something?

You are right, I agree. I was concerned that we might be introducing code that only works for local paths (e.g. using pathlib to append to paths), but this does not seem to be the case.
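To illustrate the concern about pathlib: a short sketch of how joining with pathlib damages a URL-style path, while plain POSIX-style string joining leaves it intact. The bucket name and file names are hypothetical.

```python
import posixpath
from pathlib import PurePosixPath

base = "gs://some-bucket/dataset"  # hypothetical bucket name

# pathlib collapses the repeated slash, which damages the URL scheme.
broken = str(PurePosixPath(base) / "train" / "data.jsonl")

# Plain string joining keeps the scheme intact for remote and local paths alike.
remote = posixpath.join(base, "train", "data.jsonl")
local = posixpath.join("/tmp/dataset", "train", "data.jsonl")
```

Here `broken` ends up with a single slash after `gs:`, so it is no longer a valid bucket URL, whereas `remote` and `local` are joined as expected.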

AlpAribal

LGTM! Great job, I really like and appreciate the amount of detail that you put into mocking the drivers' data 🎉

winfried-ripken requested a review from AlpAribal June 21, 2022 22:26

winfried-ripken mentioned this pull request Jun 27, 2022

Update requirements #43

Merged

12 tasks

winfried-ripken marked this pull request as ready for review June 27, 2022 10:27

winfried-ripken force-pushed the winnie-add-tests branch from 73716fa to 932cfc9 Compare June 27, 2022 20:13

AlpAribal reviewed Jun 28, 2022

AlpAribal reviewed Jul 27, 2022

Winfried Loetzsch added 17 commits July 28, 2022 12:47

8542d09  Use recommended tmp_path instead of tmpdir
f28c675  Add camvid test
31b16ec  Test datascience bowl
4443743  Skip type checking for coverage report
7e06b6d  Add test for allenai
2fca44b  Fix dependabot alert
55e5683  Add test for bdd100k
2da92c2  Add test for cc100
3f29394  Add test for conceptual captions
ef9b11a  Add test for casting quality
940877f  Add test for monthly german tweets
269be68  Add imagenet test
6245617  Fix conceptual captions dimensions
145629e  Add mock to requirements
9e7afaa  Update test to work with unittest.mock directly
b113344  Remove mock from requirements
be503cf  Correct mistakes in requirements.txt

winfried-ripken force-pushed the winnie-add-tests branch from fc8fc5a to be503cf Compare July 28, 2022 10:47

Winfried Lötzsch and others added 8 commits August 8, 2022 21:23

3129453  Update test/test_datasets/mock_utils.py (Co-authored-by: Alp Arıbal)
9c97375  Update test/test_datasets/mock_utils.py (Co-authored-by: Alp Arıbal)
33567cc  Update test/test_datasets/test_cc100.py (Co-authored-by: Alp Arıbal)
34615c6  Update test/test_datasets/test_conceptual_captions.py (Co-authored-by: Alp Arıbal)
779d9a2  Update test/test_datasets/test_conceptual_captions.py (Co-authored-by: Alp Arıbal)
10159bf  Update test/test_datasets/test_imagenet.py (Co-authored-by: Alp Arıbal)
20606f9  Integrate feedback from Alp
1a3578f  Check all examples instead of only one

winfried-ripken requested a review from AlpAribal August 8, 2022 21:00

AlpAribal reviewed Aug 9, 2022

Winfried Lötzsch and others added 8 commits August 9, 2022 11:43

074f0e4  Update test/test_datasets/test_allenai.py (Co-authored-by: Alp Arıbal)
5f35017  Update test/test_datasets/test_allenai.py (Co-authored-by: Alp Arıbal)
f36fc68  Update test/test_datasets/test_conceptual_captions.py (Co-authored-by: Alp Arıbal)
69c217a  Update src/squirrel_datasets_core/datasets/allenai_c4/allenai_c4_multilingual.py (Co-authored-by: Alp Arıbal)
5bb9ba6  Expect value error instead of runtime error
7bbe659  Fix typehints
60b562b  switch requirements back to python 3.8
b8a6438  Bump version

winfried-ripken requested a review from AlpAribal August 9, 2022 10:09

AlpAribal approved these changes Aug 9, 2022

winfried-ripken merged commit d71ace2 into main Aug 9, 2022

winfried-ripken deleted the winnie-add-tests branch August 9, 2022 13:14

github-actions bot locked and limited conversation to collaborators Aug 9, 2022
