Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the zim files in zim-testing-suite for unit tests. #531

Closed
wants to merge 4 commits into from

Conversation

mgautierfr
Copy link
Collaborator

No description provided.

@mgautierfr mgautierfr force-pushed the no_test_data branch 4 times, most recently from 7365992 to 7d72dc8 Compare April 12, 2021 16:27
@mgautierfr mgautierfr force-pushed the no_test_data branch 2 times, most recently from 08c3eea to 00f04f3 Compare April 12, 2021 17:21
@codecov
Copy link

codecov bot commented Apr 12, 2021

Codecov Report

Merging #531 (732194a) into master (4e22ac3) will decrease coverage by 0.02%.
The diff coverage is n/a.

❗ Current head 732194a differs from pull request most recent head 5fb5f36. Consider uploading reports for the commit 5fb5f36 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master     #531      +/-   ##
==========================================
- Coverage   76.39%   76.37%   -0.03%     
==========================================
  Files          89       89              
  Lines        3661     3661              
  Branches     1630     1630              
==========================================
- Hits         2797     2796       -1     
- Misses        863      864       +1     
  Partials        1        1              
Impacted Files Coverage Δ
src/archive.cpp 55.86% <0.00%> (-0.56%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4e22ac3...5fb5f36. Read the comment docs.

@mgautierfr mgautierfr changed the title [TEST] Use a helper function to get the path of a test zim file. Use the zim files in zim-testing-suite for unit tests. Apr 12, 2021
@mgautierfr mgautierfr force-pushed the no_test_data branch 5 times, most recently from 0f71799 to 732194a Compare April 12, 2021 18:37
@mgautierfr
Copy link
Collaborator Author

@legoktm The idea behind this PR is to move all the test zim files in a separated repository (https://github.com/openzim/zim-testing-suite).
So now, unittests launched during packaging are failing.
What is the best approch on packaging side ? Maybe just add the archive as new source of the package ?

@legoktm
Copy link
Member

legoktm commented Apr 13, 2021

Can we do what we did in openzim/python-libzim@ff560ea and have the tests be skipped if the zim test files are missing?

And then, how many repositories do you expect will use the new zim-testing-suite repository? If it's just this one, then we can add it as an additional source, but if it'll be multiple we probably want to just package it separately and have libzim, etc. depend on it.

Copy link
Collaborator

@veloman-yunkan veloman-yunkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this change we must agree on the new/proposed workflow of managing test ZIM archives and unit-tests depending on them? In particular:

  1. Under what circumstances and how existing ZIM archives can be updated in the broad sense (even if it means creating a new manually versioned revision of that archive)?
  2. How their versioning is going to be handled?
  3. What is the envisioned approach to creating and maintaining backward compatibility tests (when the same test must run on various versions of the conceptually "same" ZIM archive)?
  4. How do you make sure that developers have the most up-to-date test data on their local development hosts? zim-testing-suite becomes a weak dependency of the source repositories since it is used only for unit-tests. How to avoid scenarios when unit-tests might fail on more fresh test data but keep passing because they use the old data? This now becomes a problem since zim-testing-suite must be updated separately.

@kelson42
Copy link
Contributor

@mgautierfr I have extracted the commit about Brew update in the CI, to speed-up the fix and make it independant. Here is the dedicated PR #533

@mgautierfr
Copy link
Collaborator Author

Can we do what we did in openzim/python-libzim@ff560ea and have the tests be skipped if the zim test files are missing?

I prefer not, if we silently skip the tests we can be pretty sure that we will miss that.
But we can add a option (as for test using a lot of memory) to skip them.

And then, how many repositories do you expect will use the new zim-testing-suite repository? If it's just this one, then we can add it as an additional source, but if it'll be multiple we probably want to just package it separately and have libzim, etc. depend on it.

This is a open question. For now only one.
But this may change in the future. But then, we may also change the way it is working (using git-lfs or something else).

@mgautierfr
Copy link
Collaborator Author

Before this change we must agree on the new/proposed workflow of managing test ZIM archives and unit-tests depending on them? In particular:

This is somehow something I didn't want to discuss now.
The proposed solution is a easy, not perfect solution to move the tests files elsewhere and go ahead on PR that need new test files.
The priority is the release of libzim7. We need to test it (and only it for now).

Under what circumstances and how existing ZIM archives can be updated in the broad sense (even if it means creating a new manually versioned revision of that archive)?

For now, zim files in zim-testing-suite must not be changed.
We may add new zim files however.

How their versioning is going to be handled?

Manually for now.

What is the envisioned approach to creating and maintaining backward compatibility tests (when the same test must run on various versions of the conceptually "same" ZIM archive)?

The next step here is to move the existing zim files in zim-testing-suite/data into zim-testing-suite/data/with-ns (or any other name) and create another "clone" of with-ns with zim files with the "same" data but in the new format.

On libzim's test side, update getDataFilePath to return a list of files (a list of variant), and update the tests to do the same check on all variant.

How do you make sure that developers have the most up-to-date test data on their local development hosts? zim-testing-suite becomes a weak dependency of the source repositories since it is used only for unit-tests. How to avoid scenarios when unit-tests might fail on more fresh test data but keep passing because they use the old data?

We can add a check in meson to check the data version.
The CI will ensure that we test with the last data.

As said at the beginning of my comment. This solution is not perfect.
I don't want to add 27Mb more of binary data in the repository.
But we need a reference test files in the same time.
And we cannot pass weeks to design a proper solution (see https://github.com/openzim/libzim/issues/469)

@veloman-yunkan
Copy link
Collaborator

As said at the beginning of my comment. This solution is not perfect.
I don't want to add 27Mb more of binary data in the repository.
But we need a reference test files in the same time.
And we cannot pass weeks to design a proper solution (see #469)

Then I don't see why we need to switch existing tests to a new approach now. You could have created zim-testing-suite repository with only the data required for the new test. Then if that solution proves viable (and after any potential initial quirks with it are cleared) we would gradually migrate other tests to it as needed. I have dealt in an industrial setting with both approaches (1. keeping the test data with the test scripts or 2. keeping them separate from each other) and I find the former approach easier if there is no need to share the data across multiple tests. Of course that doesn't work for large data, but in our case majority of the test ZIM archives are small enough.

@mgautierfr
Copy link
Collaborator Author

Then I don't see why we need to switch existing tests to a new approach now

Because current tests must be run on current test zim files AND new ones.

@veloman-yunkan
Copy link
Collaborator

Then I don't see why we need to switch existing tests to a new approach now

Because current tests must be run on current test zim files AND new ones.

Still, we better arrive at it by creating one such test first, ironing out any issues and then migrating existing tests that rely on large test data. I am going to defend my opinion of keeping small data with the tests.

@mgautierfr
Copy link
Collaborator Author

Still, we better arrive at it by creating one such test first, ironing out any issues and then migrating existing tests that rely on large test data. I am going to defend my opinion of keeping small data with the tests.

PR #535 add the new test files and update the tests.
I still prefer to not add the zim file to the git history. They will be part of the git history "forever" even if we remove them the next PR.
But at least we have some concrete code to see.

@kelson42
Copy link
Contributor

Superseeded by #538

@kelson42 kelson42 closed this Apr 21, 2021
@mgautierfr mgautierfr deleted the no_test_data branch May 12, 2021 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants