Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

codeCheck() misses interface violation #82

Closed
0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q opened this issue Jan 24, 2024 · 23 comments
Closed

codeCheck() misses interface violation #82

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q opened this issue Jan 24, 2024 · 23 comments
Assignees
Labels
bug Something isn't working

Comments

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q
Copy link
Member

In remind#1b90822, codeCheck() should complain about

  + sum(tePrc2opmoPrc(tePrc,opmoPrc)$(p37_specFEDem("2005",regi,enty,tePrc,opmoPrc) gt 0.),
      p37_specFEDem("2005",regi,enty,tePrc,opmoPrc)

as p37_specFEDem should be exclusive to module 37_industry, but does not.

@fbenke-pik
Copy link
Contributor

fbenke-pik commented Jan 30, 2024

codeCheck misses this because the variable is called p37_specFEDem instead of p37_specFeDem. As I learned, it is the same for GAMS.

How to proceed @tscheypidi? Should I simply make the check for variable usage outside the module case insensitive or should we introduce a warning / error when several variable names are found that are the same to GAMS like in this example?

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q
Copy link
Member Author

There is only one correct way to handle this.

GAMS is not case sensitive. This means that lower and upper case letters may be mixed freely but are treated identically.

[↑]

@tscheypidi
Copy link
Member

@fbenke-pik given that having different versions of the same variable in the code can lead to various problems I would suggest to have a dedicated warning for these instances rather than making the other check case insensitive.

@fbenke-pik
Copy link
Contributor

fbenke-pik commented Jan 31, 2024

The easiest fix seems to be to make checkAppearance ignore case. This function gathers the occurrences of all declared variables in all modules and is the foundation for the error checking in this issue. If we can agree on this fix, this PR would solve this particular issue already.

Checking if declared variables appear in various versions in the code is surely a useful check. Even if GAMS allows any variable name version, we might not want to allow it, as it is error-prone (e.g. when doing search/replace during a refactoring). However, at first glance it probably will be a time-consuming check to search for all versions of each declared variable in the whole codebase, so it comes at a cost (assuming, we do a naive implementation and don't try to somehow build on the results from checkAppearance to make it more efficient). Is it worth implementing?

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q
Copy link
Member Author

Even if GAMS allows any variable name version, we might not want to allow it, as it is error-prone (e.g. when doing search/replace during a refactoring).

  1. That is why you can search case-insensitive.
  2. Even if one does not do that, it will only result in a compilation error and therefore be captured easily and early.

@tscheypidi
Copy link
Member

@fbenke-pik I thought that the check should be rather easy to implement as we have the list of all parameters/variables already extracted in codeCheck. Something like anyDuplicates(tolower(names)) should do it, or not?

@tscheypidi
Copy link
Member

tscheypidi commented Jan 31, 2024

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q There is an important difference between "you can search case-insensitive" and "users search case-insensitive". The fact that you can circumvent that error does not mean that people will not run into this trap. And also depending on what you want to do or figure out in the code you do not necessarily run into a compilation error...(you might just miss an important line in the code and thereby being not able to understand another problem)

@fbenke-pik
Copy link
Contributor

fbenke-pik commented Jan 31, 2024

@fbenke-pik I thought that the check should be rather easy to implement as we have the list of all parameters/variables already extracted in codeCheck. Something like anyDuplicates(tolower(names)) should do it, or not?

Yes, you have a list of all the declared parameters/variables extracted. But in this case p37_specFeDem was declared (so it is in the list). And then in the code it was used in some places as p37_specFEDem (which is not in the list).

So an algorithm probably has to count occurrences of all declared variables in all the code with and without case sensitive condition and then check if the count differs.

@tscheypidi
Copy link
Member

ah ok, I forgot that only the declarations are scanned. In that case it indeed looks like a bit more computational effort. @fbenke-pik what is your opinion on it?

@fbenke-pik
Copy link
Contributor

fbenke-pik commented Feb 1, 2024

It all depends on whether you view variables with different capitalization as a problem. As I don't work directly with GAMS / REMIND, i don't know if this is a problem in practice. Assuming you view it as a problem, we should add proper code a check, fix all current occurrences in REMIND and accept the increase in runtime.

This is a PR that implements such a check (so far only prints the variables) #84

Please take a look to get an idea which variables are currently potentially affected in REMIND (I still need to check thoroughly for false positives)

@fbenke-pik
Copy link
Contributor

Of course i can make it an optional check so each model team can decide individually, if they want to enforce this. @LaviniaBaumstark would you be willing to enforce this for REMIND?

@tscheypidi
Copy link
Member

I would be in favor of enforcing it. Users tend to search the code without considering the option that the same variable/parameter might be written with different capitalization and I do not see an advantage in allowing for this kind of flexibility. I guess all instances where the capitalization are used different are probably more mistakes than conscious decisions.

The issue I just see is the runtime of the check :(

@LaviniaBaumstark
Copy link
Member

I agree to Jan. But first we would need to fix it for REMIND before activating it :-)

@fbenke-pik
Copy link
Contributor

fbenke-pik commented Feb 2, 2024

If we can agree on this fix, this PR would solve this particular issue already.

This is not that easy as I initially thought. We would have to refactor all checks of codeCheck to make sure it sees GAMS vars as equal even with different capitalization.

@tscheypidi @LaviniaBaumstark In order to move forward, I need a binding decision how to proceed:

  1. Enforce case sensitivity in codeCheck
  2. Refactor codeCheck to view variables as equal regardless of capitalization just like GAMS does
  3. Support both approaches, make it an option for codeCheck

@tscheypidi
Copy link
Member

I am in favor of 1., especially also in the light that a case insensitive check does not seem to be as easy to be implemented as initially thought.

@fbenke-pik
Copy link
Contributor

I am in favor of 1., especially also in the light that a case insensitive check does not seem to be as easy to be implemented as initially thought.

To clarify, it is not necessarily hard, just a larger refactoring touching all checks. The core of codeCheck is an appearance matrix that tracks declared and not used variables times the realizations of all models. This matrix would have to be unified (e.g. all lowercase) to remove duplicates due to different capitalization. As all checks build on top of this matrix, I would have to check all the checks to make sure this does not break anything.

@LaviniaBaumstark
Copy link
Member

under this condition, I am also in favor of 1, but this also means that we will have the longer test, right?

@fbenke-pik
Copy link
Contributor

fbenke-pik commented Feb 2, 2024

under this condition, I am also in favor of 1, but this also means that we will have the longer test, right?

yes, the additional test takes 2 minutes. at first glance, i don't see an easy way to reduce complexity.

@LaviniaBaumstark
Copy link
Member

I would say this 2 minutes are o.k.

@tscheypidi
Copy link
Member

@fbenke-pik Concerning runtime I was wondering whether stringr could help here to speed things up (already a dependency of gms. I think you currently search the code twice which is probably that time consuming part. stringr allows to search the code for occurrences of a string and to extract all these occurrences. If you do this kind of extraction you could just afterwards analyze these extracted objects which might potentially be faster and perhaps even more reliable. What do you think?

@fbenke-pik
Copy link
Contributor

@fbenke-pik Concerning runtime I was wondering whether stringr could help here to speed things up (already a dependency of gms. I think you currently search the code twice which is probably that time consuming part. stringr allows to search the code for occurrences of a string and to extract all these occurrences. If you do this kind of extraction you could just afterwards analyze these extracted objects which might potentially be faster and perhaps even more reliable. What do you think?

See my reply here: #84 (comment)

@tscheypidi
Copy link
Member

Thanks for testing

@fbenke-pik
Copy link
Contributor

As Jan and Lavinia decided to go with unifying capitalization, this is the PR to address this issue: #84

I am working on a corresponding PR to REMIND that fixes all current instances that make the test fail:
remindmodel/remind#1542

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants