Set http caching for tests #207

Open · tgrandje wants to merge 38 commits into base: master
Conversation

@tgrandje (Collaborator) commented Aug 8, 2024

  • install requests-cache on the workflows
  • set HTTP caching when requests-cache is present; it might be absent because of the workflow or the local environment (see the sketch below)
  • remove testing for Python 3.9 to 3.11
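
A minimal sketch of that conditional patching, assuming it lives in conftest.py (the cache name is illustrative):

try:
    import requests_cache
    HAS_REQUESTS_CACHE = True
except ImportError:
    # requests-cache may be absent from the workflow or the local environment
    HAS_REQUESTS_CACHE = False

def pytest_sessionstart(session):
    if HAS_REQUESTS_CACHE:
        # transparently patch requests for the whole test session
        requests_cache.install_cache("pynsee_tests", backend="sqlite")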

* install requests-cache on workflows
* set http caching when requests-cache is present (might be absent because of the workflow or the local environment)
@tgrandje (Collaborator, Author) commented Aug 8, 2024

The current test results should not be considered at all: this will need to be merged first so it can be tested against the updated workflow (with requests-cache installed).

In time, we can add a pytest_sessionfinish hook to preserve the cache on an external source (most probably SSP Cloud's S3) between test runs.
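
A minimal sketch of such a hook, assuming s3fs is available; the cache file, bucket, and endpoint variable names are placeholders:

import os

import s3fs

def pytest_sessionfinish(session, exitstatus):
    # hypothetical cache file and bucket names, for illustration only
    if not os.path.exists("pynsee_tests.sqlite"):
        return
    fs = s3fs.S3FileSystem(
        # assumed environment variable holding the S3 endpoint URL
        client_kwargs={"endpoint_url": os.environ["S3_ENDPOINT_URL"]}
    )
    fs.put("pynsee_tests.sqlite", "my-bucket/pynsee-cache/pynsee_tests.sqlite")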

@tfardet (Collaborator) previously approved these changes Aug 8, 2024, leaving a comment:

thanks for that!

@hadrilec (Contributor) commented Aug 8, 2024

It looks great! Could you please modify the test .py files accordingly? Some tests run only on specific Python versions. Thanks!

@tgrandje (Collaborator, Author) commented Aug 9, 2024

Oh, so that's why the 3.8 tests were almost always conclusive :-). Sorry about that, I missed it since my machine is currently running Python 3.10...

I'll remove all Python version checks; that should be fine with the refactored Python version handling in the workflow.

* Remove Python version checks from unit tests
* Remove (one or two) unnecessary module imports
* Black formatting
@tgrandje (Collaborator, Author) commented Aug 9, 2024

Done

@tfardet dismissed their stale review August 9, 2024 14:47

changes were made in-between

@tfardet (Collaborator) commented Aug 9, 2024

@tgrandje the tests timed out, so something might not be working as expected with the caching... or it's just that running all the tests is too slow; maybe that's why the version check was there?

@tgrandje (Collaborator, Author) commented Aug 9, 2024

I'd say the second one.
As I removed the test selection on 3.8, those runs should take longer; lately, I think the 3.8 run has sometimes been the only one to finish. I'd say we should try to merge this into the main branch (if you don't see anything amiss, that is) and rerun some tests. If this fails, we can revert the merge; the changes are small enough.
Just to be sure, I'll recheck the tests on my machine (I don't think I've re-run them).

@tgrandje (Collaborator, Author) commented Aug 9, 2024

Well, tests on my machine seem to be inconclusive: there might be something wrong with my proxy (and/or with me switching back and forth between branches and juggling tempdirs).

The tests have not all finished, and I already have failures on some tests which ran correctly on GitHub.
For instance, I got one failure on the 3rd test (out of 4) in test_pynsee_download, and it's perfectly fine on GitHub...

If I have the time, I'll try to re-run some of those this evening on a personal laptop outside my office's network (i.e. without proxies). But I may be short on time... If you don't hear from me before tomorrow, feel free to do what you think is best!

@tgrandje (Collaborator, Author) commented Aug 9, 2024

OK, so I ran the tests outside of a corporate network, with Python 3.9.

I got 3 failing tests (which were previously silenced, as they only ran on 3.8):

FAILED tests/localdata/test_pynsee_localdata.py::TestFunction::test_get_geo_list_1 - xml.etree.ElementTree.ParseError: syntax error: line 1, column 0
FAILED tests/localdata/test_pynsee_localdata.py::TestFunction::test_get_local_data_1 - xml.etree.ElementTree.ParseError: syntax error: line 1, column 0
FAILED tests/localdata/test_pynsee_localdata.py::TestFunction::test_get_local_data_all - xml.etree.ElementTree.ParseError: syntax error: line 1, column 0

I've seen something about the arrow/gpd issue on the last one, but I'm not certain this triggered the failure. The 2 other failing tests seem to run OK outside of pytest, so maybe there is something wrong with the patching indeed.
I will continue when I come back in 2 weeks (or, if anybody has time to spare, feel free to pursue this).

@tgrandje (Collaborator, Author) commented Aug 9, 2024

Forget that. I ran it again (using the cache) and those 3 failed tests now pass. Maybe it had to do with a failure of the source, but the tests are OK.

* Add optional test flag --clean-cache to remove any existing cache prior to tests
* Add optional test flag --no-cache to deactivate the cache during tests
* Set a 30-day expiration on the cache during tests
@tgrandje (Collaborator, Author) commented

Personal note: test computation times without cache (results for tests > 30 s), on my personal machine:

230.43s call     tests/macrodata/test_pynsee_macrodata.py::TestFunction::test_get_series_1
204.96s call     tests/geodata/test_pynsee_geodata.py::TestFunction::test_get_geodata_short
187.06s call     tests/localdata/test_pynsee_localdata.py::TestFunction::test_get_geo_list_1
169.76s call     tests/download/test_pynsee_download.py::MyTests::test_download_big_file
159.25s call     tests/sirene/test_pynsee_sirene.py::TestFunction::test_request_sirene
131.43s call     tests/macrodata/test_pynsee_macrodata.py::TestFunction::test_get_date
118.71s call     tests/localdata/test_pynsee_localdata.py::TestFunction::test_get_local_data_all
113.11s call     tests/localdata/test_pynsee_localdata.py::TestFunction::test_get_local_data_latest_error
91.86s call     tests/macrodata/test_pynsee_macrodata.py::TestFunction::test_get_dataset_metadata_1
76.47s call     tests/macrodata/test_pynsee_macrodata.py::TestFunction::test_get_column_title_1
74.69s call     tests/macrodata/test_pynsee_macrodata.py::TestFunction::test_get_dataset_1
66.01s call     tests/download/test_pynsee_download.py::MyTests::test_download_file_all
57.10s call     tests/sirene/test_pynsee_sirene.py::TestFunction::test_search_sirene
51.43s call     tests/macrodata/test_pynsee_macrodata.py::TestFunction::test_download_idbank_list_1
43.04s call     tests/macrodata/test_pynsee_macrodata.py::TestFunction::test_get_series_list_1
40.78s call     tests/localdata/test_pynsee_localdata.py::TestFunction::test_get_area_list_1
40.30s call     tests/localdata/test_pynsee_localdata.py::TestFunction::test_get_included_area
40.12s call     tests/localdata/test_pynsee_localdata.py::TestFunction::test_get_population

All tests passed in 2077.28s (0:34:37) with Python 3.9.

@tgrandje (Collaborator, Author) commented

OK, I've gone farther than I thought I would. So far:

  • requests are patched with requests-cache during the tests;
  • all tests are now run on 3.8 and 3.12 exclusively;
  • the cache is uploaded to SSP Cloud's storage system (there are 2 CI variables to update for that);
  • the cache is compressed with py7zr before being sent to SSP Cloud (see the sketch below); this is rather slow, even if efficient (the SQLite file went from 1.5 GB to 600 MB): we should run some tests on GitHub to check whether we keep this (depending on the bandwidth and the CPU), but I can't do that on my laptop;
  • I've changed some exception types from ValueError to RequestException in the utils package (for consistency) and fixed the linked tests; this could also be removed from the current PR and moved to a new one (I ran into it while trying to understand what went wrong with certain tests);
  • I've added a patch for the recent change in authentication (which didn't take proxies from environment variables into account); maybe this one should go into a separate PR...
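
A minimal sketch of the compression step; the file names are illustrative:

import py7zr

# compress the SQLite cache before uploading it: slow, but space-efficient
with py7zr.SevenZipFile("pynsee_cache.7z", "w") as archive:
    archive.write("pynsee_tests.sqlite")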

I still have a problem with 3 tests (test_pynsee_localdata.py::test_get_geo_list_1, test_pynsee_localdata.py::test_get_local_data_1, test_pynsee_localdata.py::test_get_local_data_all). They seem to fail due to some web miscommunication, but when I run only those 3 (without the cache), they seem to pass. I'll run some further tests to see whether this is linked to the caching or not.

@tgrandje (Collaborator, Author) commented

It seems there is definitely something buggy with requests-cache and those 3 tests.

I'll move the 2 patches on proxies/exceptions into separate PRs in case I can't resolve this bug...

@tgrandje (Collaborator, Author) commented Sep 1, 2024

There is something very strange going on with this bug.

As far as I understand, requests-cache uses requests under the hood to send HTTP(S) requests, and those are not altered (with the only exception of the multipart form boundary, which is set to a constant to allow the hashing of queries). Yet, when I use the requests-cache patch, I always get some strange results, mostly on those 3 tests, so that doesn't seem to be a coincidence.

But when I track the HTTP response, I see that for those failures, we get perfectly valid XML results from INSEE's API instead of JSON. This is kind of funny, as the server is not supposed to serve XML results (according to the API's documentation, anyway...).

For instance, for https://api.insee.fr/metadonnees/V1/geo/departements, I got:

<Departements>
    <Departement code="01" uri="http://id.insee.fr/geo/departement/84680e6f-2e99-44c9-a9ba-2e96a2ae48b7">
        <Intitule>Ain</Intitule>
        <Type>Departement</Type>
        <DateCreation>1967-12-31</DateCreation>
        <IntituleSansArticle typeArticle="5">Ain</IntituleSansArticle>
        <ChefLieu>01053</ChefLieu>
    </Departement>
    ...
    <Departement code="976" uri="http://id.insee.fr/geo/departement/7f7fe4af-ae1c-4641-80ef-5c8199aea45c">
        <Intitule>Mayotte</Intitule>
        <Type>Departement</Type>
        <DateCreation>2011-03-31</DateCreation>
        <IntituleSansArticle typeArticle="0">Mayotte</IntituleSansArticle>
        <ChefLieu>97611</ChefLieu>
    </Departement>
</Departements>

I think I've tracked every argument (proxies, stream, headers, ...) sent to the webserver, and I can't see a difference between valid and invalid results.
I've also run the 3 problematic(?) tests alone (with pytest and the requests-cache patching), but everything worked fine.
I've started commenting out tests randomly in the file, and I've even managed to get those strange results on another test altogether. I've also tried a 10-second delay between each test, and that had no impact.

I'm starting to think this might be a webserver failure, but I'm not sure either (why mostly those 3 tests, why only when patching, why would the delay have no impact...). If that's really the case, I see 2 ways out of here:

  • abandoning the hope of consolidating tests with requests-cache, because it is unreliable (whether the response is XML or JSON, the response code is always 200, so requests-cache can't filter what it caches); maybe using mocks would be sufficient, but the amount of work would be troublesome;
  • adding an XML parser as a fallback when the JSON parser fails (see the sketch below); that might also be a good thing for users, but only if this is confirmed.
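
A minimal sketch of that fallback; the helper name is purely illustrative:

import json
import xml.etree.ElementTree as ET

def parse_response(text):
    # try JSON first, as documented by the API, then fall back to XML
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        root = ET.fromstring(text)
        # illustrative conversion: tags and attributes only, not a full mapping
        return {root.tag: [child.attrib for child in root]}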

@hadrilec @tfardet do you see anything I'm missing?

(If anybody is ready to test this locally, please send me an email and I'll send you the S3 credentials.)

@tgrandje (Collaborator, Author) commented Sep 3, 2024

OK, colleagues from INSEE put me on a lead, and I finally got this after much debugging. It appears requests-cache does not take the Accept header into account when hashing requests, and we still have some XML queries in the localdata app.

Regarding the (apparently) random results, that's entirely the result of me selecting different tests, and/or of improper caching throughout that history...

For memory's sake:

  • this is the doc on requests-cache's header matching (a sketch of the fix follows this list);
  • and this is the doc (on page 4) from INSEE stating that the Accept header alters the API's responses (this is not described in the swagger).
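
A minimal sketch of the fix, assuming requests-cache's match_headers option is used to include Accept in the cache key:

from datetime import timedelta

import requests_cache

requests_cache.install_cache(
    "pynsee_tests",
    backend="sqlite",
    expire_after=timedelta(days=30),
    # include the Accept header in the cache key, so a JSON request is
    # never served a cached XML response (and vice versa)
    match_headers=["Accept"],
)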

I still have to check the different backends on different machines to determine which is fastest a priori, but I hope we can soon test this directly on GitHub.

@tgrandje (Collaborator, Author) commented Sep 4, 2024

The best backend seems to be SQLite (I only tested SQLite against the filesystem backend; that took long enough 🙄).

Other features in this PR that I forgot to describe in the conversation include the following:

  • the presence of 2 optional flags when running pytest locally (see the sketch after this list):
    • --clean-cache will clean up the cache from S3 (and also locally);
    • --no-cache will deactivate the cache altogether.
  • I've changed the matrix strategy of the workflows to ensure that, if queries have expired, the update from one test run speeds up the next one: we will have to see how that fares. That doesn't cover multiple workflows running at the same time; for that, I have no solution right now. I think the easiest solution would be to add a new workflow and run it every 30 days with the --clean-cache flag, to force cache recreation outside usual working hours.
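
A minimal sketch of how those flags can be wired up in conftest.py; the cache file name and option handling are illustrative:

import pathlib
from datetime import timedelta

import requests_cache

def pytest_addoption(parser):
    parser.addoption("--clean-cache", action="store_true", default=False,
                     help="remove any existing cache before running the tests")
    parser.addoption("--no-cache", action="store_true", default=False,
                     help="deactivate HTTP caching during the tests")

def pytest_sessionstart(session):
    if session.config.getoption("--no-cache"):
        return
    if session.config.getoption("--clean-cache"):
        # drop the local cache file; the S3 copy would be removed here too
        pathlib.Path("pynsee_tests.sqlite").unlink(missing_ok=True)
    requests_cache.install_cache(
        "pynsee_tests", backend="sqlite", expire_after=timedelta(days=30)
    )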

I'm still not so sure we should keep the compression of the artifacts: that will have to be tested in real conditions on GitHub. The fact is that with compression, restoring the cache is fast (~3 min), but storing it is really slow (like 19 min slow, on the same machine).

Further modifications to consider (not yet done, but for memory's sake):

  • maybe run tests concurrently (I think there is a pytest plugin for that); the SQLite backend should work well enough with that, but it may need some pruning/reordering of the tests;
  • remove those time.sleep calls in the tests (like here) when running with requests-cache operational, though I'm not sure how to check that none of the previous inner functions ran into cache expiration.

In any case, for now I'd say someone can review this PR. If it's OK, we can move on to real-conditions tests and iterate from there.

@hadrilec (Contributor) commented Sep 9, 2024

Hello, I tried to run the tests on Onyxia and got the following error; any idea how to proceed?

trying to restore artifact from SSP Cloud
downloading ...
took 5sec
extracting archive...
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR>   File "/opt/conda/lib/python3.12/site-packages/_pytest/main.py", line 281, in wrap_session
INTERNALERROR>     config.hook.pytest_sessionstart(session=session)
INTERNALERROR>   File "/opt/conda/lib/python3.12/site-packages/pluggy/_hooks.py", line 513, in __call__
INTERNALERROR>     return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
INTERNALERROR>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/opt/conda/lib/python3.12/site-packages/pluggy/_manager.py", line 120, in _hookexec
INTERNALERROR>     return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
INTERNALERROR>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/opt/conda/lib/python3.12/site-packages/pluggy/_callers.py", line 139, in _multicall
INTERNALERROR>     raise exception.with_traceback(exception.__traceback__)
INTERNALERROR>   File "/opt/conda/lib/python3.12/site-packages/pluggy/_callers.py", line 122, in _multicall
INTERNALERROR>     teardown.throw(exception)  # type: ignore[union-attr]
INTERNALERROR>     ^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/opt/conda/lib/python3.12/site-packages/_pytest/logging.py", line 784, in pytest_sessionstart
INTERNALERROR>     return (yield)
INTERNALERROR>             ^^^^^
INTERNALERROR>   File "/opt/conda/lib/python3.12/site-packages/pluggy/_callers.py", line 103, in _multicall
INTERNALERROR>     res = hook_impl.function(*args)
INTERNALERROR>           ^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/home/onyxia/work/pynsee/tests/conftest.py", line 175, in pytest_sessionstart
INTERNALERROR>     with py7zr.SevenZipFile(archive_path, "r") as archive:
INTERNALERROR>          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/opt/conda/lib/python3.12/site-packages/py7zr/py7zr.py", line 416, in __init__
INTERNALERROR>     raise e
INTERNALERROR>   File "/opt/conda/lib/python3.12/site-packages/py7zr/py7zr.py", line 397, in __init__
INTERNALERROR>     self._real_get_contents(password)
INTERNALERROR>   File "/opt/conda/lib/python3.12/site-packages/py7zr/py7zr.py", line 436, in _real_get_contents
INTERNALERROR>     raise Bad7zFile("not a 7z file")
INTERNALERROR> py7zr.exceptions.Bad7zFile: not a 7z file

@tgrandje (Collaborator, Author) commented

Apparently the file is corrupt (I downloaded it and checked it with other software). Did you run the tests multiple times?
(For now, the MinIO console is down, so we're stuck with the interface through Onyxia, which only displays the name of the file...)

On any machine (running the tests manually, of course), you can also remove the previous cache by adding the --clean-cache flag to the pytest command.

That being said, we should add a check on the 7z file's integrity before uploading it to SSP Cloud... 🤔
Unpacking is quite fast, so this should not slow the tests down too much.
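
A minimal sketch of such a check; the helper name is illustrative:

import py7zr

def archive_is_valid(path):
    # reject files that are not 7z archives, or whose CRC check fails
    if not py7zr.is_7zfile(path):
        return False
    try:
        with py7zr.SevenZipFile(path, "r") as archive:
            return archive.test() is not False
    except py7zr.exceptions.Bad7zFile:
        return False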

* handle bad 7z file (dumping the cache in case of corruption)
* check 7z's integrity before sending it to S3
@tgrandje (Collaborator, Author) commented

@hadrilec I've just added a quick fix to handle corrupted 7z files (I've also reverted some changes which were due to go to another PR).

If that's OK with you, I'll re-run the tests on my laptop; you should then be able to test (one more time) directly on Onyxia.

@tgrandje (Collaborator, Author) commented

OK, that's done.

@tfardet (Collaborator) commented Sep 18, 2024

Should we merge this to see whether it fixes the tests, and iterate from there if it doesn't?

EDIT: never mind, I think there might actually be an issue with the tempfile commit.

EDIT 2: I'm a bit tired; I actually don't think there is an issue with the tempfile commit, but I'm running the tests locally too, on Python 3.12, just to check.

@tgrandje with which Python version did you test? Was it with a clean install in a venv?

EDIT 3: the local API is failing with code 500 (internal server error) for me at the moment, so no tests.

@tgrandje (Collaborator, Author) commented

@tfardet I think I tried this on both 3.9 and 3.10, each time through a venv: I'm using a local pyproject.toml to work with poetry (that might also be an enhancement to think about in the future, but I'm a bit partial on that).

Re-reading the code, I may have forgotten to handle the case where you have a valid s3fs installation but invalid credentials (I'll rewrite that). Is that what happened in your case? I think that should trigger a botocore.exceptions.BotoCoreError of some kind...
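
A minimal sketch of that handling; the bucket name is hypothetical, and the exact exceptions raised on bad credentials are an assumption:

import botocore.exceptions
import s3fs

def get_remote_fs():
    # return None when S3 is unreachable or the credentials are invalid,
    # so the tests fall back to a purely local cache
    try:
        fs = s3fs.S3FileSystem()
        fs.ls("my-bucket")  # hypothetical bucket name, forces authentication
        return fs
    except (botocore.exceptions.BotoCoreError,
            botocore.exceptions.ClientError,
            PermissionError):
        return None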

(Do you have access to the CI variables? If not, I'll send you the credentials by email.)
