Is there an e2e integration test on toy data? #303
Comments
Indeed, having end-to-end tests on the methods themselves is something that we need. However, a toy dataset cannot correctly evaluate all methods. A decent middle ground could be a subset of imagenet100 (say 10%) for a couple of epochs, checking that the obtained results (accuracy and loss values) fall into a predefined range that we compute beforehand. What do you think @DonkeyShot21?
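The range check proposed above could be sketched like this; the metric names and the reference ranges here are placeholders, not measured values, since the real bounds would have to be computed beforehand as described:

```python
# Sketch of the proposed regression check: after a short training run,
# assert that the final metrics fall inside a precomputed range.
# REFERENCE_RANGES below is illustrative only.

REFERENCE_RANGES = {
    # metric name: (lower bound, upper bound), measured beforehand
    "val_acc1": (0.35, 0.55),
    "train_loss": (1.5, 4.0),
}

def check_metrics(metrics: dict) -> list:
    """Return a list of human-readable failures; empty means all in range."""
    failures = []
    for name, (lo, hi) in REFERENCE_RANGES.items():
        value = metrics[name]
        if not (lo <= value <= hi):
            failures.append(f"{name}={value:.4f} outside [{lo}, {hi}]")
    return failures

# Example with fake metric values:
assert check_metrics({"val_acc1": 0.42, "train_loss": 2.1}) == []
assert check_metrics({"val_acc1": 0.10, "train_loss": 2.1}) != []
```

The advantage of bounds over exact expected values is that small nondeterminism (data order, cudnn kernels) doesn't cause flaky failures.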
@vturrisi Yeah, "toy" didn't necessarily mean synthetic: a tiny imagenet100 or a tiny MNIST. (I suggest MNIST just because there are so few labels that fewer instances might make more sense.) I googled quickly but couldn't find any tiny image datasets; maybe you are familiar with some. Going by your timing profiles, 10% of 4m55s epochs is about 30 seconds on a GPU. You might consider, for this e2e test, using a smaller model than a big resnet so you can run it on CPU. BTW, if you can decide upon a simple spec (which dataset, which main functions you want to try, etc.), I'm happy to contribute to the development. COOL NOTE: Lightning Ecosystem CI lets you "automate issue discovery for your projects against Lightning nightly and releases". I would suggest starting with simple e2e tests in your repo and later adding them to the Lightning nightly CI.
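To make the "smaller model on CPU" idea concrete, here is a minimal smoke-test sketch. It uses a plain numpy softmax classifier on random MNIST-shaped data purely as a stand-in for the real PyTorch model; a smoke test only needs to confirm the run completes and the loss moves in the right direction:

```python
import numpy as np

def smoke_train(steps=50, lr=0.1, seed=0):
    """Train a tiny linear softmax classifier on random MNIST-shaped data.
    Returns (initial_loss, final_loss). This is an illustrative stand-in
    for the repo's actual training loop, not its real model."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(256, 784)).astype(np.float32)  # fake 28x28 images
    y = rng.integers(0, 10, size=256)                   # fake labels
    W = np.zeros((784, 10), dtype=np.float32)
    losses = []
    for _ in range(steps):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)     # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        losses.append(-np.log(probs[np.arange(len(y)), y]).mean())
        grad = probs.copy()
        grad[np.arange(len(y)), y] -= 1.0               # d(loss)/d(logits)
        W -= lr * (X.T @ grad) / len(y)
    return losses[0], losses[-1]

first, last = smoke_train()
assert last < first  # the only guarantee a smoke test needs
```

This runs in well under a second on CPU, which is the budget an e2e smoke test in CI should aim for.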
@turian I'm also not sure how to manage data with GitHub Actions such that we can upload this imagenet100 subset (is this even possible?). The first step would be to check whether we can upload datasets, and then run all the current methods in that specific setting to gather a range of values for their losses and top-1 accuracy, which we would use to write the tests. About the Lightning CI: they reached out to us some time ago and we are already part of it. I haven't had time to look into it, so I'm probably not taking any advantage of it, but if we can use it for these new tests, that would be cool.
I think we're kinda talking about two separate things. I'm more interested in an e2e test that runs quickly and just makes sure nothing breaks. (Unit tests are cool but don't always test the handoff points between different units.) You are interested in doing hardcore regression testing to make sure scores don't drop on a known dataset. A few opinions on my e2e proposal:
FYI, Travis will offer free credits to academic / open source projects, but these get exhausted very quickly if you use huge testing matrices (like every python x every pytorch x every OS), so I'd use that judiciously and only as a periodic supplement to GitHub Actions. (Maybe every time something is merged to main, not on every single push.) Regarding your suggestion:
Overall, my suggestion is to get the simple, dumb, fast e2e test working first (as I described above). Once that works, we can figure out how to do a proper e2e regression test on a "real" dataset.
@turian been quite busy this week, but I'll try to get back here as soon as possible. Regardless, the end-to-end tests that you mentioned can easily be done with cifar10, even without a GPU, on GitHub Actions. It's just a matter of defining the scripts in a similar way to what I did in
For the tests that I mentioned, I think we don't need anything fancy or automatic, just a set of scripts that we could manually run every couple of versions (or before any major version) to properly check that nothing got screwed performance-wise.
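One lightweight way to wire those scripts into CI is a test that shells out to each entry point with the epochs cut down and fails loudly on a non-zero exit code. The snippet below is a sketch of that pattern; the real invocation would name one of the repo's training scripts, and the trivial `-c` command here is only a stand-in:

```python
import subprocess
import sys

def run_script(args, timeout=600):
    """Run one training entry point as a subprocess and fail loudly on error.
    On CI, args would be something like [sys.executable, "<pretrain script>",
    "--max_epochs", "1", ...]; the exact script name is hypothetical here."""
    result = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(f"{args} failed:\n{result.stderr}")
    return result.stdout

# Stand-in invocation: a real test would loop over every method's script
# with epochs/steps cut down so the whole matrix fits in a CI job.
out = run_script([sys.executable, "-c", "print('1 epoch done')"])
assert "1 epoch done" in out
```

Because it only checks the exit code (and optionally the logs), this catches wiring breakage from refactors without needing reference accuracy numbers.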
The latest commit has tests for all scripts in
Describe the bug
In doing a major refactor (e.g. switching to OmegaConf or hydra), it's not clear to me that there is a full e2e integration test. Which main(s) would be the best to test this on?
Additional comments
I might be mistaken, but `tests/` only contains unit tests. A full e2e test on the most common `main` method(s), on a toy dataset, could exercise many code paths and make sure that a refactor does what was intended. (This came up because I wanted to try a hydra port but had no idea how to quickly test whether there was breakage or some crazy MSE on the downstream score versus the expected.) An unintended side effect is that codecov will increase :)