Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimizing incompatibility between cartopy and shapely #805

Closed
sgillies opened this issue Oct 6, 2016 · 40 comments · Fixed by #1251
Closed

Minimizing incompatibility between cartopy and shapely #805

sgillies opened this issue Oct 6, 2016 · 40 comments · Fixed by #1251

Comments

@sgillies
Copy link

sgillies commented Oct 6, 2016

Cartopy has a module that both imports Shapely and adds new GEOS bindings: https://github.com/SciTools/cartopy/blob/master/lib/cartopy/trace.pyx. I think this is an unsound approach that can lead innocent cartopy users into DLL Hell.

I'd like to propose two better approaches.

Move all of cartopy's GEOS dependency into Shapely

The extension code in trace.pyx is rather small and could be moved to Shapely and deleted from cartopy. The DLL Hellmouth would be closed and cartopy users could benefit from binary Shapely wheels. There's still time to do this before Shapely 1.6.0.

Vendorize Shapely

Cartopy could copy the Shapely source and add build steps for it into setup.py, removing it from the list of dependencies that would be resolved by pip or setuptools. The internal copy of Shapely would be built using the same geos_c as cartopy's trace module and the DLL Hellmouth would thus be closed.

@QuLogic @pelson I see in other tickets here that conda is what you're focusing on. Neither of my proposed approaches are need for conda, though moving the extension code into Shapely will have the benefit of removing calls into Shapely's private internal API and thus making cartopy more forward compatible with Shapely. I think that's probably the better approach for making pip install cartopy work on OS X.

@dopplershift
Copy link
Contributor

👍 to upstreaming things.

@QuLogic
Copy link
Member

QuLogic commented Oct 7, 2016

I really only use Fedora packages (which are almost always in sync), and only point to conda because it's the only one I know of that works on Macs. I'm all for moving things into shapely if that's the better location for things.

@ResidentMario
Copy link

ResidentMario commented Jan 1, 2017

Howdy, for the past month or so I've been working on a high level-plotting package which extends cartopy and uses geopandas for input. The DLL issues stopped development very early in the process, but then I was somehow (through plodding around enough I suppose) able to arrive at a Mac OSX environment in which everything "just worked".

Fast forward a month later, and geoplot is (at least initially) done (some examples are in the docs). Returning to the platform issue, however, I was not even able to reproduce the working environment I had generated earlier! I can't distribute the codebase, because I don't know of a way of getting all of the components installed successfully.

There are a few other complications, but this issue is the core of it. So I'm obviously interested in helping to resolve it. I could use some direction from you folks in doing so, since I have significantly more time than experience.

To begin with, right now I'm trying to get the shapely and cartopy libraries built from source in the same environment and passing all of their tests. @QuLogic how do you run your cartopy tests? Running nosetests in the command line (post-python setup.py install) errors out for me with ImportError: No module named cartopy._crs.

@QuLogic
Copy link
Member

QuLogic commented Jan 2, 2017

@ResidentMario I don't usually rebuild my environment, but when I do, I basically follow the steps from .travis.yml (except for not re-downloading conda.)

@dopplershift
Copy link
Contributor

I think it's time to discuss this more--it's now impossible to build cartopy on travis (linux) against the 1.5.17 manylinux wheels that have been uploaded. In fact, I have to work harder to get pip to ignore those wheels.

GIven that shapely is a full-fledged wrapper around GEOS, I think the sane solution is to add any APIs that cartopy needs to Shapely and eliminate cartopy's direct dependence on GEOS. Can someone on the CartoPy side weigh in on whether such a move would be supported? I'm willing to put in the work needed, but can only afford to do so if the effort won't go wasted.

@ResidentMario
Copy link

I don't see a patch doing what you've just described being rejected, given the concurring opinions of a cartopy core dev and the fact that it was originally proposed by the core shapely maintainer.

@ajdawson
Copy link
Member

ajdawson commented Jan 8, 2017

Thanks @dopplershift - I'm in favour of doing what we can to improve ease of use and compatibility, and it is great you are keen to help.

I think a good starting point would be to record here what the main changes will be for cartopy (i.e. what moves from cartopy to shapely, what needs to change in cartopy to accomodate this) and what (if any) will be the differences from an end-user perspective.

@ResidentMario
Copy link

ResidentMario commented Jan 29, 2017

Has there been any progress on this?

Edit: There's a licensing incompatibility, see shapely #451 (again).

@dopplershift
Copy link
Contributor

License incompatibility. Of course there is.

@sgillies
Copy link
Author

I've used the GPL before and have nothing against it, but it does mean that Shapely cannot lift the code from Cartopy. We'd need the copyright holder, British Crown Copyright 2011 - 2016, Met Office, to donate a new version of trace.pyx with a permissive license.

@dopplershift
Copy link
Contributor

I only hold the (L)GPL against people who assert that it doesn't create any problems--ones like this for instance.

@QuLogic
Copy link
Member

QuLogic commented Mar 20, 2017

That's not really something that's unique to (L)GPL, so let's not get off topic here. At least we have one advantage with the CLA here; there's only one "person" to ask about re-licensing. But I think first we need to figure out what part.

@ResidentMario
Copy link

From a quick scan, all of the Met Office projects seem to be GPL or LGPL. So I think this is just their policy on things. They can do that because of UK crown copyright, AFAIK all GitHub code published by e.g. the NOAA in the US is unlicensed by default because federal law compels most work released by federal agencies to be public domain.

@QuLogic It seems we just want them to release trace.pyx, no?

@marqh
Copy link
Member

marqh commented Mar 21, 2017

I've used the GPL before and have nothing against it, but it does mean that Shapely cannot lift the code from Cartopy. We'd need the copyright holder, British Crown Copyright 2011 - 2016, Met Office, to donate a new version of trace.pyx with a permissive license.

i'll pick this up and get it looked into

in principal, I see no problems, I expect it is just some administration, but I will report back

@sgillies @ResidentMario is
shapely/shapely#451
in your view the set of changes to make, or is that just part of the work?

thank you

@marqh
Copy link
Member

marqh commented Apr 7, 2017

hello @sgillies

I have looked into this and there are no issues with relicensing trace.pyx and moving it to shapely
shapely/shapely#479

Regarding Licensing
@QuLogic : you are the only contributor not already covered by the copyright, so approval from you to relicense your code from LGPL to BSD is required in order for this code to be donated. A comment on this Issue and that PR is sufficient

@QuLogic
Copy link
Member

QuLogic commented Apr 7, 2017

Is Cartopy not covered by the CLA I signed?

@QuLogic
Copy link
Member

QuLogic commented Apr 7, 2017

Never mind, I forgot that was an LGPL-only grant. I agree with this re-licensing for Shapely.

@sgillies
Copy link
Author

I've overlooked that _trace.pyx uses both GEOS and PROJ.4 APIs. I think a refactor would be needed before the GEOS calling code could be moved to Shapely, and I don't want to insist on such a refactor for Cartopy. Vendorizing Shapely may be the best option after all, yes?

@pelson
Copy link
Member

pelson commented Jun 16, 2017

Hi @sgillies,

Sorry for the radio silence on this 📻 . I've been somewhat pre-occupied with fluffy animals down-under recently 🐨 🙃 (I have been seconded in Australia for 6 months). I'm now back the right way up, and I'm keen to solve this issue once and for all.

trace.pyx is a combination of GEOS and PROJ.4 - its responsibility is to produce interpolated geometries in their target projection. A good deal of the code was written some time ago, and although has held up well, is in some need of love. As I recall, the reason we ended up using GEOS directly was that there were a number of bottlenecks in Shapely's geometry creation (even in Cython) that we were able to significantly optimise by making prior assumptions about the incoming coordinate sequences.

I'd like to split the conversation at this point into two threads: what we do in the short term; and what cartopy does in the long term.

What we do in the short-term?

The fact that Shapely's wheels bundle GEOS, and that cartopy depends on both shapely and GEOS means that cartopy MUST either:

  • be using exactly the same version of GEOS at build and run time as is used in the shapely wheels
  • vendor its own version of Shapely so that it is decoupled from the GEOS version being used by the official Shapely wheels

Unfortunately we are in this situation because bundling third party dependencies in a binary format designed to contain a single package, such as is done with binary wheels, takes away a means for us to express dependencies fully. This is not a criticism of Shapely having done so - it is the only solution that is currently available to us for distributing binary packages through pip, and is the reason I continue to advocate the use of conda or a user's system package manager (the pip interface is excellent, and there are downsides to conda too, it is just that PythonPackaging doesn't deal with the concept of non-python packages well IMO).

Anyway, this is not intended to be a rant (and I definitely don't want to criticise what I actually believe is the best tool for the job when it comes to pure python packages) - we are where we are, and I'm keen to find a sustainable solution.

At this point, though it is my least favourite option, I believe vendoring shapely is the less brittle of the two options. The one question I have is whether the use of two potentially different versions of GEOS (the one from cartopy.shapely and the one from shapely) is likely to see shared symbol issues for GEOS functions? (shapely isn't statically linked in the wheels, right?)

What cartopy does in the long-term?

In the long term I can see a few feasible options:

  • cartopy feeds-back any optimisation steps back into Shapely, rather than circumventing it, and uses a Cython level Shapely APIs exclusively (downside: long term we might want to be looking at things like numba for our optimisations, which means low level calls to GEOS (or a numba shaped shapely) is going to be pretty important)
  • GEOS is shipped as its own wheel, and shapely depends upon that. It would enable cartopy to express its dependency on (specific versions of) both shapely and GEOS, and pip can do its magic on resolving the dependencies correctly.
  • the trace part of cartopy if hived off into its own package, and it makes no use of shapely whatsoever - GEOS stuff only. Cartopy could then depend on it and shapely at the Python level [if the option of cartopy vendoring shapely is going to work, so would this].

At this point, [naturally] I prefer the idea of moving the optimisations out of cartopy and into Shapely. Not entirely unrelatedly, I'm expecting to be putting a significant effort into the trace code in the next 6-12 months in order to solidify cartopy for a major release. I think doing this will inform us some more about the direction of travel for cartopy in the long-term.


This is somewhat longer than my normal comments, hope you're still with me! Would love to get your feedback/thoughts.

Cheers,

@dopplershift
Copy link
Contributor

Isn't vendoring shapely simply going to shift the problem onto cartopy users who are using shapely manually? Wouldn't this:

import cartopy
import shapely

still only load one copy of the geos library, which will break one of those two?

@pelson
Copy link
Member

pelson commented Jun 17, 2017

Thanks for putting my question more succinctly @dopplershift. That is the key question we need to be able to answer, otherwise at least one (but probably both cartopy and shapely) will need to vendor GEOS as well (and have to do unpleasant things with its namespace). Nobody wants to be doing that though, so we should get a concrete grip/PoC on answering that question.

@QuLogic
Copy link
Member

QuLogic commented Jun 17, 2017

@marqh already opened shapely/shapely#479 which should be the long-term solution; he just didn't finish cleaning it up. Is that not enough?

@QuLogic
Copy link
Member

QuLogic commented Dec 31, 2018

Oops, that PR was not supposed to close this issue; it was only a cross-reference.

@QuLogic
Copy link
Member

QuLogic commented Sep 10, 2021

As a followup to #1251, I have a branch that converts to Shapely instead of GEOS directly, but as I recall, it was quite a bit slower. Perhaps using pygeos might work better.

@greglucas
Copy link
Contributor

@QuLogic, have you tried your cartopy branch with the shapely 2.0 branch? I believe that incorporates a lot of the pygeos routines into shapely and could speed things up without having to switch to pygeos directly.

@QuLogic
Copy link
Member

QuLogic commented Sep 14, 2021

I don't think that's the problem; it's more that we have to swap from Cython to Python to C, and flipping the GIL so much is quite slow. Checking current commit, doing this doubles the projection time.

@jorisvandenbossche
Copy link
Contributor

@QuLogic do you have a link to your branch? I am happy to take a look in case I can spot a potential performance improvement with the vectorized operations of pygeos/shapely 2.0

@QuLogic
Copy link
Member

QuLogic commented Sep 15, 2021

I've rebased the branch and pushed it here: https://github.com/QuLogic/cartopy/tree/shapely

I think we could use the vectorized creation routines in pygeos (are those in shapely 2?) here: https://github.com/QuLogic/cartopy/blob/shapely/lib/cartopy/trace.pyx#L123

You can see the effect on projection here: https://qulogic.github.io/cartopy-bench/#project_linear.Suite.time_project_linear (in the sixth- to second-last commits)

@jorisvandenbossche
Copy link
Contributor

I think we could use the vectorized creation routines

Yes, so if you have an array of coordinates, you can create linestrings from it with pygeos.linestrings(..). The default usage requires to each linestring to have the same number of coordinates, but if you keep track of a separate indices array to indicate to which linestring each point belongs, you can have variable length linestrings:

In [29]: pygeos.linestrings([(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)], indices=[0, 0, 0, 1, 1])
Out[29]: 
array([<pygeos.Geometry LINESTRING (1 1, 2 2, 3 3)>,
       <pygeos.Geometry LINESTRING (4 4, 5 5)>], dtype=object)

(the coordinates can also be supplied as to separate 1D arrays instead of a single 2D array)
And then from the array of linestrings, you can create the single MultiLineString.

are those in shapely 2?

Those will be, but are not yet in the shapely-2.0 branch in the Shapely repo. So if you want to experiment with it, you will need to import them from pygeos at the moment.

Now, the above is only for the geometry creation in LineAccumulator.as_geom (and I assume it can "fix" the performance regression in that part), but I don't know how important this part is in the slowdown compared to the other changes (like calling covers through python objects in the other functions)

@greglucas
Copy link
Contributor

Just pinging this issue again as we are getting a lot of new reports about incompatibilities again with the Shapely 1.8.1 release, which changed the packaging of Shapely. We can solve the installation problems with the old workaround of ignoring the binary shapely wheel: pip install --upgrade --no-binary shapely shapely, but we should probably look at how to best resolve this inconsistency in the long-run.

@MHBalsmeier
Copy link
Contributor

After the release of shapely 1.8.2, I encountered a tricky issue outlined here.
I found that downgrading shapely to 1.8.0 solves the problem.
I think this issue here describes the core problem.
There are some ideas in this conversation since a long time regarding how this could be solved. I wonder if anyone works on this?

@yucsong
Copy link

yucsong commented Jun 8, 2023

amazing the problem hasn't been fixed!!!

tom-andersson added a commit to alan-turing-institute/deepsensor that referenced this issue Jun 10, 2023
…y#805). Assume user has set up separately if using `deepsensor`'s `cartopy` plotting functionality.
@ClariNerd617
Copy link

Do we have any intelligence yet on whether using Docker containers for environment management improves this issue?

@greglucas
Copy link
Contributor

We have just released v0.22 which should help with the compatibility between packages and installation much easier. Please open a new issue if you are still having problems.

@QuLogic QuLogic added this to the 0.22 milestone Aug 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.