Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python package support for SCHISM #107

Open
saeed-moghimi-noaa opened this issue Jul 6, 2023 · 7 comments
Open

Python package support for SCHISM #107

saeed-moghimi-noaa opened this issue Jul 6, 2023 · 7 comments

Comments

@saeed-moghimi-noaa
Copy link

saeed-moghimi-noaa commented Jul 6, 2023

@josephzhang8 @cuill @wzhengui @brey @SorooshMani-NOAA @BahramKhazaei-NOAA @gseroka

All,

Here at NOAA, we are investing a large amount of resources on using Pyschims as a part of our operational frameworks (STOFS, Psurge, UFS and ...). It seems like there are other PY packages also being developed (by SCHISM team and EU, ...) however none of them are addressing the need of the users.

I think we reach to the point that we need a concrete plan and have every body united around one accepted solution.

We need to consider:

  • software design, good coding prctice and extensibility of the code
  • purpose of the code (download forcings /or/ prep nml files /or/ all!?
  • unit testing
  • ...

Please comment and let every body knows your point of view.
Please tag others may are interested

Thanks,
@saeed-moghimi-noaa

@josephzhang8
Copy link
Member

Thx Saeed. I've asked Linlin to push out and work on a branch 'clean' to start trimming down the package and also come up with tests. Given her workload, this may take a while but rest assure we are on it..

@SorooshMani-NOAA
Copy link

There are multiple fronts to this from my perspective.

Most importantly, we need to make sure that the package we use is maintained, and then know what the intended use of that package is. For example if we use pyschism for setting up SCHISM input files, but the package is only maintained for its download capabilities, then we're in trouble.

The second front is using one package for each task. For example let's say, pyschism, metpy, etc for downloading data and pylibs for processing all the downloaded data and generating SCHISM input, in this case model setup is one (big) task and downloading each type of data is another. I'd like to note that if a single package e.g. pyschism does both downloading and setup, it's OK, as long as it is maintained to do both (though it just makes the package complicated over time as @brey can attest to). What we'd like to avoid at all cost is a case where we need to do some types of model setup with one package and some other types of setup with another package!

The third front is release and testing. We ideally would like to have a package that is installable (from conda or pip or a combination) without needing to download the source code. And that there are some automated tests that ensure all the advertised features work (related to schism-dev/pyschism#84).

Last but not least, it is important that we have a standard style or convention for coding such that the code is maintainable by the community rather than a single person or group. This doesn't mean that we need to rewrite all the existing code over night, but rather to move toward migration to the agreed upon style and develop new code with the convention; the convention should be enforceable by automated tools rather than "I feel like it's right".

@feiye-vims
Copy link
Member

feiye-vims commented Jul 7, 2023

There are multiple fronts to this from my perspective.

Most importantly, we need to make sure that the package we use is maintained, and then know what the intended use of that package is. For example if we use pyschism for setting up SCHISM input files, but the package is only maintained for its download capabilities, then we're in trouble.

The second front is using one package for each task. For example let's say, pyschism, metpy, etc for downloading data and pylibs for processing all the downloaded data and generating SCHISM input, in this case model setup is one (big) task and downloading each type of data is another. I'd like to note that if a single package e.g. pyschism does both downloading and setup, it's OK, as long as it is maintained to do both (though it just makes the package complicated over time as @brey can attest to). What we'd like to avoid at all cost is a case where we need to do some types of model setup with one package and some other types of setup with another package!

The third front is release and testing. We ideally would like to have a package that is installable (from conda or pip or a combination) without needing to download the source code. And that there are some automated tests that ensure all the advertised features work (related to schism-dev/pyschism#84).

Last but not least, it is important that we have a standard style or convention for coding such that the code is maintainable by the community rather than a single person or group. This doesn't mean that we need to rewrite all the existing code over night, but rather to move toward migration to the agreed upon style and develop new code with the convention; the convention should be enforceable by automated tools rather than "I feel like it's right".

@SorooshMani-NOAA
I agree with your comments.

As Joseph said, we are trying to make some changes to the pyschism package.

Concerning your Comment 1, we would like to maintain one package, and pyschism is the one. But we want to introduce some of the pylib functions as a dependent of pyschism and I'll explain more of this below. We will keep in mind the disadvantage of having a package in control of both data preparation and model setup and assess if the package is too complicated during future development.

(Per your Comment 2) we are in the process of importing some core functions of pylib (especially on hgrid manipulation) into pyschism. These functions can make the downstream script simpler (thus more readable and maintainable, which is related to your Comment 4), e.g., setting up a *.gr3 file with values based on multiple criteria (region based, bathy slope-based, nudging zone-based, etc.). In addition, the functions from pylib are generally more efficient.

However, pylib has issues with namespace contamination due to the extensive use of "import *". For example, you reported a numpy issue with pyDEM (which imports pylib) recently. It turns out it is related to some function name collision due to the numpy functions imported by "from numpy import *" in pylib. As a temporary measure, we extract the hgrid functions from pylib and put it into an experimental branch (which is pip installable, and can serve as a dependent when pip installing pyschism). This is an attempt to address your Comment 3.

Let me know if you have any additional suggestions.

@saeed-moghimi-noaa
Copy link
Author

Hi Fei

These are music to my ear! We share the same concerns.

Thanks a lot to all of you at VIMS,
-Saeed

@wzhengui
Copy link
Contributor

wzhengui commented Jul 7, 2023

@feiye-vims

I don't have any comments on pyschism development. Glad to know pylib can help in your work.

Regarding the namespace contamination issue in pylib. Using explicitly import style (import pylib as pl), this issue can be avoided. That was the price I chose to pay when I first started designing pylib in order to maintain a simple/efficient coding style. The problem will only arise when one decides to do implicit import of pylib, but reuse pylib/numpy/pyplot function names as new variable names.

@brey
Copy link
Contributor

brey commented Jul 30, 2023

Dear All,

Sorry for the late reply. As I have said in many occasions, pyposeidon is for us the do-all package that can get all necessary input and do it all (mesh generation, dem, meteo, prepare schism, post-process, visualisation). pyposeidon grew organically, I suspect the same way as pylibs & pyschism still do.

In my view, this is problematic, especially for attracting new devs. The success of searvey (a pyposeidon spinoff) suggests that smaller packages with specific task can be more modular and can serve multiple uses. I am not sure what our team will decide, but in my view we should decompose pyposeidon to smaller dedicated packages that can be used in a pipeline.

This includes the schism class that prepares a schism run based on specific input (in my view what pyschism should only do).

It could be, in the future, that any of them could become obsolete due to upstream updates. The stack can adapt easily by removing that package. In addition smaller packages can have more efficient CI/CD.

my 2 cents.

@josephzhang8
Copy link
Member

josephzhang8 commented Jul 30, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants