Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/centroids as gdf #122

Merged
merged 51 commits into from
May 13, 2024
Merged

Feature/centroids as gdf #122

merged 51 commits into from
May 13, 2024

Conversation

emanuel-schmid
Copy link
Contributor

@emanuel-schmid emanuel-schmid commented Apr 11, 2024

Changes proposed in this PR:

  • Adaptations to the centroids as gdf refactoring (climada 5.0.0-dev)

This PR fixes #

PR Author Checklist

PR Reviewer Checklist

emanuel-schmid and others added 30 commits February 23, 2024 00:47
use hdf5 centroids test data
assign region_id to geodataframe, not centroids
# Conflicts:
#	script/jenkins/branches/Jenkinsfile
…/climada_petals into feature/centroids_as_gdf
@peanutfun
Copy link
Member

@emanuel-schmid The code coverage test fails because the coverage of modified lines is too low (49%, target: 60%). Is there something we want to do about this? Do we want to add some more testing to meet the target? If we simply ignore it, the pipeline for develop might succeed after merging, because the overall quality gate might still met and there will be no check of modified files only.

@emanuel-schmid
Copy link
Contributor Author

Yes, ignore. In this particular PR I don't see a point in adding tests - it's about making necessary adaptations to be compatible with core develop and not about improving anything at all.

I'm actually inclined to reconfigure the coverage restrictions. We could try the MODIFIED_LINES_DELTA baseline. I wonder though what we should set as THRESHOLD then - 100%?. And I wonder how it affects novel modules.
Also - what about plotting and I/O methods which are supposed to be in the integration test?
Perhaps we shouldn't make coverage cause failures at all.

@peanutfun
Copy link
Member

Alright, I should have the review ready by Wednesday

Copy link
Member

@peanutfun peanutfun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done! Most changes look really straightforward. However, I am confused about the changes to the Makefile and the Jenkins scripts in this PR. Are they related to the centroids PR at all? There are two other minor points. See my comments.

Makefile Outdated

.PHONY : unit_test
unit_test : ## Unit tests execution with coverage and xml reports
pytest $(PYTEST_ARGS) --ignore=climada_petals/test climada_petals/
python -m pytest $(PYTEST_ARGS) --ignore=climada_petals/test climada_petals/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you updated the Jenkins scripts. Why was this addition necessary? python -m pytest as opposed to pytest adds the current directory to the sys.path. This enables testing local source code of modules which are not installed. At the same time, it makes it less clear what code exactly is tested. I recommend making sure that Petals is installed (possibly from sources with pip install -e), and then removing python -m, to make sure that the installed version is tested. See the GitHub Actions for reference, where python -m was not required so far

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition is necessary on Jenkins because all the branches use the same conda environment. pip install -e would interfere with other branches' tests.
For referring to an arbitrary climada core branch one can install that by creating a venv environment on top of the conda environment. (Which is about 10 times faster). But in such a layered environment pytest is taken from the underlying conda sources and imports climada from there, while python -m pytest starts from venv and imports climada from here.
This was done, e.g., for exactly this PR before #787 was merged.

GitHub Actions on the other hand build a new conda environment for every branch and commit. I'm afraid we don't have enough resources in terms of cpu or diskspace to do that on Jenkins.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still confused why this is necessary here in Petals but not in Core. Doesn't that mean there is something off with the environment configuration?

Anyway, I see that this is too much hassle to change. However, I would like to keep the default invoking of pytest for users, which should work best with the current installation instructions. I found a way to insert the different command into the Makefile, and will push it shortly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sorry I did not realize you added new bash scripts for Jenkins, see my comments there

@@ -137,22 +136,22 @@ def from_nc(cls, dph_path=None, frc_path=None, origin=False,
if ISINatIDGrid:

dest_centroids = RiverFlood._select_exact_area(countries, reg)[0]
meta_centroids = copy.copy(dest_centroids)
meta_centroids.set_lat_lon_to_meta()
centroids_meta = dest_centroids.get_meta()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the linter is not up-to-date with the latest version of climada_python? It complains that Centroids.get_meta does not exist, but it does in develop. This might be related to the changes in the Jenkins scripts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. The develop environment of climada_petals points to core package before the refactoring. Opposed to the unit_test stage, the lint stage does not change to the venv environment on top. That's why.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we revert changes to this setup anyway, no worries ✌️

climada_petals/hazard/tc_rainfield.py Outdated Show resolved Hide resolved
Comment on lines 287 to 288
# TODO: check whether allowing for a tolerance is valid here
TOLERANCE = 0.005
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to check that? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well - the test is obviously anxious about the firms' extent being well within the centroids' extent. That is not anymore exactly the case: the lower bound of the centroids is a wee bit higher then the firms'.

  • can this be tolerated? pls. check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I see it, the bounds of the resulting centroids should never exclude the inserted bounds. But this apparently happened here. Maybe this has to do with the resolution set via the res argument in Centroids.from_pnt_bounds and the new handling of this argument. @chahank, could you have a look?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the method Centroids.from_pnt_bounds should produce exactly the same latitude and longitude values in both versions of the code.

I think the point here is that we are now only using the centre of the raster pixels instead of the raster itself, there might be a half-pixel difference in the bounds. I am not sure to understand what this test is supposed to check... Why is the tolerance set to 0.005? Why should there be a tolerance? O really have to dig into this code to understand this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the tolerance is an artifact that I introduced simply to make the test pass, suspecting that the reason for failure may be something like what you mention - and that it does not matter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aaaa got it! I will try to debug and find the reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the change has to do with the pixel definition. We have that

new_centroids.total_bounds[1] - 0.375/ONE_LAT_KM= old_centroids.total_bounds[1]

This almost makes sense, as 0.375/ONE_LAT_KM is the data raster resolution. But I would have expected a factor of one-half... However, the method wf._firms_centroids_creation(firms, 0.375/ONE_LAT_KM, 1/2) applies another factor to obtain the final raster resolution, here equal exactly to 1/2, which thus cancels the other one out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solution suggested in 116e2e8

Copy link
Contributor Author

@emanuel-schmid emanuel-schmid May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 excellent analysis! I'm still wondering though - shall we keep the test ast it is, do we change it or should we even get back and change the implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. 116e2e8 answers the question 😄

Copy link
Member

@peanutfun peanutfun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I somehow missed the new bash scripts for Jenkins, and I have some trouble understanding what is going on. Why is the setup so much different from the one in Core? I think we should make sure that both users and the pipeline can simply use the Makefile provided.

Comment on lines 8 to 10
CORENV=~/jobs/petals_branches/core_env
BRANCH=`git name-rev --name-only HEAD | cut -f 3- -d /`
if [ -f $CORENV/$BRANCH ]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what happens here. Why would $CORENV/$BRANCH not exist and how can the unit tests work then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all is about having specific core einvironments for petals branches (and vice versa)
It is unrelated to this PR now, but before #787 was merged into develop I needed a way to point to the sources from that branch. (The pointing itself is done using files on Jenkins by the name of the petals branch and content of the core branch: quick and dirty!)
We can undo all changes in Jenkinsfile and Makefile. They don't relate to this PR (anymore) and presumably we don't need them in the near future. 🤷

Copy link
Member

@peanutfun peanutfun May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, I was very confused by this! If we can undo them, I would prefer that. We can discuss such changes in a separate PR, I think (if they are necessary at all)

#!/bin/bash
export PATH=$PATH:$CONDAPATH

source activate petals_env
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is inside this environment? Wouldn't creating a venv with --system-site-packages possibly interfere with installed packages in that environment? Why is pytest installed in this environment although it should only be installed via pip when calling pip install "climada[test]"?

python -m venv --system-site-packages tvenv
source tvenv/bin/activate

pip install -e `cat $CORENV/$BRANCH`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this only install core and not petals?

Also, cat should not be necessary to expand variables

Suggested change
pip install -e `cat $CORENV/$BRANCH`
pip install -e ${CORENV}/${BRANCH}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not about expanding variables, it's about the content of that file.

Comment on lines 5 to 6
rm -rf tests_xml/
rm -rf coverage/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clean the test results, execute

Suggested change
rm -rf tests_xml/
rm -rf coverage/
make ci-clean

Comment on lines 5 to 7
rm -f pylint.log

pylint -ry climada_petals | tee pylint.log
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't this use the Makefile?

Suggested change
rm -f pylint.log
pylint -ry climada_petals | tee pylint.log
make ci-clean
make lint

Adapt the Jenkins script accordingly.
pip install -e `cat $CORENV/$BRANCH`
fi

make unit_test PYTEST_CMD="python -m pytest"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emanuel-schmid This is how we can invoke a different command in the Makefile when executing it via Jenkins, see the changes in the Makefile as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@emanuel-schmid
Copy link
Contributor Author

Unless I'm missing something we're ready for merging - right?

@emanuel-schmid emanuel-schmid merged commit ec58e88 into develop May 13, 2024
11 checks passed
@emanuel-schmid emanuel-schmid deleted the feature/centroids_as_gdf branch May 13, 2024 06:25
@emanuel-schmid emanuel-schmid restored the feature/centroids_as_gdf branch May 22, 2024 09:04
@emanuel-schmid emanuel-schmid deleted the feature/centroids_as_gdf branch May 22, 2024 09:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants