implementation for fast categorize #819
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #819 +/- ##
=====================================
Coverage 95.0% 95.0%
=====================================
Files 64 63 -1
Lines 6134 6144 +10
=====================================
+ Hits 5828 5841 +13
+ Misses 306 303 -3
It's not clear to me why the ruff CI check is failing.
The error message seems instructive to me:
pyam/core.py:1:1: I001 [*] Import block is un-sorted or un-formatted
pyam/core.py:69:29: F401 [*] `pyam.validation._apply_criteria` imported but unused
Found 2 errors.
[*] 2 fixable with the `--fix` option.
We have recently introduced ruff in this repo, which can handle several nice things at once. For example, it sorts your import blocks and cleans up unused imports. All these fixes have been applied to the main branch, so if you rebase this PR on top of it, you'll get them automatically. If you don't want to do that for whatever reason, run
If you've done that and the action is still failing, please compare your local ruff version with the one used for the check. By default, the GitHub Action (GHA) uses the latest PyPI release, i.e. 0.3.2.
Looks good to me in principle, but for this to be a proof of concept, I'd like to see a test case that uses the new behaviour and confirms that it works as intended.
Or am I misunderstanding this? The PR this one follows up on changed the call signature of df.validate(), but this one keeps the signature of categorize() the same, doesn't it?
Then the existing tests are enough for me :)
(But please note that I'm not actually a maintainer of this repo.)
Correct, all we do here is update the internals of
Will review tonight.
This PR does not actually implement the new signature of
Closing in favor of #837.
As an FYI, I was using this prototype, including the filters and the upper/lower bound arguments. The code snippets I was using are here: in a functioning code
if len(idx) == 0:
# find all data that matches categorization
# TODO: if validate returned an empty index, this check would be easier
not_valid = self.validate(criteria=criteria, **kwargs)
@danielhuppmann, here the validate() kwargs like upper_bound and the other filtering arguments are passed through to validate().
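The TODO in the snippet above notes that the emptiness check would be simpler if validate() returned an empty index rather than nothing when all data pass. A minimal sketch of that normalization, assuming validate() returns either None (everything passes) or a DataFrame of the failing rows with model and scenario columns; the helper name failing_index is hypothetical and not part of pyam:

```python
import pandas as pd


def failing_index(not_valid, names=("model", "scenario")):
    """Hypothetical helper: the (model, scenario) pairs with at least one failing row.

    Assumes `not_valid` is either None (all data pass) or a DataFrame of the
    failing rows that contains the columns listed in `names`.
    """
    names = list(names)
    if not_valid is None or len(not_valid) == 0:
        # Always return an index (possibly empty), never None
        return pd.MultiIndex.from_tuples([], names=names)
    # One entry per failing (model, scenario) pair, however many rows failed
    return pd.MultiIndex.from_frame(not_valid[names].drop_duplicates())
```

With an index returned in every case, callers can test len(...) == 0 or take set differences against the full scenario index without special-casing None first.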
OK, right: the method works with the new signature, but this was not described in the docstring and no tests were added in this PR. That is what I implemented in the new branch and PR.
This is only a prototype for now, but I hope it shows how we could implement a fast categorization in line with the previous work on df.validate().
Please confirm that this PR has done the following:
Tests Added: no user-facing API changes
Documentation Added: no user-facing API changes
Description of PR
This PR now uses the machinery from validate() in categorize(), resulting in huge speedups.
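To illustrate the general idea (not the code in this PR), here is a minimal pandas sketch of a vectorized categorization: filter the long-format data, check every row against the bounds in one pass, and assign the category to each (model, scenario) pair whose selected rows all lie within the bounds. The function categorize_sketch, its parameters and the column names are assumptions for this example; pyam's actual categorize() and validate() signatures may differ.

```python
import pandas as pd


def categorize_sketch(data, category, upper_bound=None, lower_bound=None, **filters):
    """Hypothetical sketch, not pyam's API: map passing (model, scenario) pairs to `category`.

    `data` is assumed to be a long-format DataFrame with model, scenario and
    value columns, plus any columns used as filters (e.g. variable, year).
    """
    # Restrict to the rows selected by the filters, e.g. variable="Emissions|CO2"
    rows = data
    for col, val in filters.items():
        values = val if isinstance(val, (list, tuple, set)) else [val]
        rows = rows[rows[col].isin(values)]

    # One vectorized pass over all selected rows instead of a loop per scenario
    within = pd.Series(True, index=rows.index)
    if upper_bound is not None:
        within &= rows["value"] <= upper_bound
    if lower_bound is not None:
        within &= rows["value"] >= lower_bound

    # A scenario passes only if every one of its selected rows is within bounds
    passed = within.groupby([rows["model"], rows["scenario"]]).all()
    return pd.Series(category, index=passed[passed].index, name="category")
```

In this sketch the work is done by boolean masks and a single groupby(...).all() rather than by checking each scenario group in a Python loop; scenarios without any rows matching the filters are simply left uncategorized.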