Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizing dependencies #208

Closed
banesullivan opened this issue Dec 17, 2023 · 3 comments · Fixed by #205
Closed

Optimizing dependencies #208

banesullivan opened this issue Dec 17, 2023 · 3 comments · Fixed by #205
Labels
new feature PyOpenSci Review Issues related to the review for PyOpenSci

Comments

@banesullivan
Copy link
Contributor

Is your feature request related to a problem? Please describe.
EOmaps has quite a few hard requirements under it's basic installation mode: pip install EOmaps. I'm curious if we could trim down the requirements to a critical few to minimize the impact of installing EOmaps?

I realize that many of these dependencies are indeed mission-critical (matplotlib, numpy, cartopy), but other dependencies may be more for user convenience or used for specific tasks.

Keeping the core, hard dependencies of a Python package slim ensures that the package remains lightweight and easy to install, reducing the risk of compatibility issues and dependency conflicts. Leveraging Python packaging's structure for optional dependencies allows for greater flexibility and customization, enabling users to tailor the package to their specific needs without being burdened by unnecessary components.

However, I want you to do whatever makes the most sense for most users. If you think most users want all of these, give them all of these! I just thought I'd raise this as it was a bit of a red flag to see so many dependencies without any optional.

Describe the solution you'd like
Would it make sense to group some of these dependencies using extras_requires (or optional-dependencies in the new setuptools paradigm)? I think you'll have a better understand of which groups to implement but I'm imagining something like:

Perhaps remove the following:

  • xmltodict: I don't see this used anywhere? Perhaps it's optional to Pandas? If so, I don't recommend making it a hard requirement unless you are directly relying on it
  • packaging: Can we use a standard library to accomplish what you need?
  • pyepsg: Again not directly seeing this used.
  • descartes: Again not directly seeing this used

And set the dependencies as:

dependencies = [
    'numpy',
    'pandas',
    'matplotlib>=3.4',
    'cartopy>=0.20.0',
    'pyproj',
    'geopandas',
    'click',
]

[project.optional-dependencies]
all = ['EOmaps[wms,features]']
wms = [
    'owslib',
    'cairosvg',
    'requests'
]
features = [
    'mapclassify',
    'xmltodict',
    'descartes',
    'pyepsg',
    'scipy',
]

I see that you are already handling many of these dependencies as "optional" in a sense, like here:

EOmaps/eomaps/shapes.py

Lines 1269 to 1274 in 7136149

def _get_voronoi_verts_and_mask(self, x, y, crs, radius, masked=True):
try:
from scipy.spatial import Voronoi
from itertools import zip_longest
except ImportError:
raise ImportError("'scipy' is required for 'voronoi'!")

But this isn't well handled everywhere:

from scipy.spatial import cKDTree

As a second part of this issue/request, I'm curious if you could make sure that optional dependencies are well-handled throughout the library: raising user-friendly exceptions when not available (as you have above)

Describe alternatives you've considered

If nothing else, I think it would be great to explain the purpose of each dependency

Additional context
Add any other context or screenshots about the feature request here.

@raphaelquast raphaelquast added the PyOpenSci Review Issues related to the review for PyOpenSci label Dec 18, 2023
@raphaelquast
Copy link
Owner

@banesullivan Thank you for opening this issue!

I agree that EOmaps is currently installing a lot of optional dependencies that are not needed for basic functionalities.

The reason why I choose to be quite extensive at the moment is twofold:

  1. I wanted EOmaps to become a package that is intended for a broad audience of geo-data scientists, and not all of them are experts in what's going on behind the scenes of a python package.
    If optional dependencies require some c++ libraries like GDAL, PROJ, running pip install geopandas after a conda install -c conda-forge eomaps can potentially break the whole environment and from my experience this is one of the main reasons why I encounter a lot of people that think it is difficult to setup python for geo-data analysis.
  2. My main target from the start was to use conda since it solved all problems I ever had with (geo related) python environments. Now the biggest problem here is that conda has no suitable "easy-to-use" machinery for optional dependencies (such as pip). Setting up meta-packages similar to matplotlib-base would be nice, but it just feels like a lot of work (and maintenance work) if I want to support multiple different sets of dependencies. (see Optional groups of dependencies conda/conda#7502)

I am unsure if implementing optional dependencies with pip and keeping the full install on conda is a good idea or if this might cause confusion. On the other hand, I am not sure how I would even implement a conda metapackage...
Any help/insights on this are very much appreciated!

Concerning the handling of optional dependencies.... I need to explicitly test this to be 100% sure but in general I think that optional dependencies should already be treated as expected (except that they are installed...)!

on packages that might be removed

  • xmltodict
    You are right... this was a dependency back in EOmaps v2.0 but it's no longer required and should be removed! This must have slipped under my radar, thanks for pointing this out!

  • packaging:
    As I see it it is a really widely used library to parse and compare package versions (after pkg_resources was deprecated)
    A quick check in my current environment (mamba repoquery whoneeds -t packaging) reveals the following relevant EOmaps dependencies that already use it: matplotlib-base, geopandas-base, rioxarray, pytest, pytest-cov, xarray, sphinx. Also setuptools is vendoring it since v8.0 (see here). Since it already shipps with core-dependencies of EOmaps, I don't really see a reason to replace it.

  • pyepsg:
    Again right... this used to be an explicit dependency but it is no longer required.

  • descartes:
    This is an optional dependency of cartopy for geometry plotting.

@raphaelquast
Copy link
Owner

First draft for optional dependency groups now implemented in the PR that implements the switch from setup.py to pyproject.toml: #216 !

@raphaelquast raphaelquast linked a pull request Dec 19, 2023 that will close this issue
2 tasks
@raphaelquast raphaelquast linked a pull request Jan 2, 2024 that will close this issue
@raphaelquast
Copy link
Owner

Optional pip dependency groups are now included in EOmaps v8.0 (see #205)!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature PyOpenSci Review Issues related to the review for PyOpenSci
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants