Package IDs are not unique #4381

Closed
snoyberg opened this issue Mar 7, 2017 · 2 comments

@snoyberg
Collaborator

snoyberg commented Mar 7, 2017

Following up from commercialhaskell/stack#2904. Feel free to read my comments there for background on the downstream issues this caused; I'll focus on just Cabal here.

I've tested with cabal-install 1.24 and HEAD, and in both cases it appears that newly installed packages have an identical package id and key. Furthermore, changing the contents of the package does not result in a change in the package id. This is expected behavior for the key (which is calculated from the inputs to the build plan), but is not expected for the id (which should be based on the resulting binary, and should be unique for changes in source code or dependencies).

I confirmed this behavior with the following Docker script:

FROM ubuntu:16.04

RUN apt-get update
RUN apt-get install software-properties-common -y
RUN add-apt-repository ppa:hvr/ghc
RUN apt-get update
RUN apt-get install ghc-8.0.2 cabal-install-head -y
ENV PATH=/opt/ghc/8.0.2/bin:/opt/cabal/head/bin:$PATH
RUN cabal update
RUN cabal install stm && ghc-pkg describe stm

This results in the following output (among much else):

name: stm
version: 2.4.4.1
id: stm-2.4.4.1-JQn4hNPyYjP5m9AcbI88Ve
key: stm-2.4.4.1-JQn4hNPyYjP5m9AcbI88Ve

Changing the source code and rebuilding results in exactly the same id (unexpected) and key (expected).
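
The id/key comparison above can also be scripted. The following is a minimal Python sketch, not part of the original report: `parse_fields` is a hypothetical helper, and it assumes the simple one-line `field: value` layout shown in the output above, skipping any indented continuation lines.

```python
def parse_fields(describe_output):
    """Parse single-line `field: value` pairs from `ghc-pkg describe` output.

    Assumption: only simple one-line fields (name, version, id, key) are
    needed, so indented continuation lines are skipped.
    """
    fields = {}
    for line in describe_output.splitlines():
        if not line or line[0].isspace() or ":" not in line:
            continue
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return fields


# The excerpt quoted in this issue.
sample = """\
name: stm
version: 2.4.4.1
id: stm-2.4.4.1-JQn4hNPyYjP5m9AcbI88Ve
key: stm-2.4.4.1-JQn4hNPyYjP5m9AcbI88Ve
"""

fields = parse_fields(sample)
id_equals_key = fields["id"] == fields["key"]
print(id_equals_key)  # prints True for the excerpt above
```

Running the same check before and after a source change would show whether the id moved; in the report above it did not.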

@ezyang
Contributor

ezyang commented Mar 7, 2017

The assumption that is being made is that if the id does not change, relinking is not necessary. As you've seen, Cabal may allocate an identical id even if the source code changes. Before I talk about why it is this way, I think the correct way to solve this downstream is to relink depending on whether or not any of the inputs to the linker (i.e., the library files) have changed. A reasonable approximation is to test if any of the transitively depended upon source code changed (this is what cabal new-build implements).
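
As an illustration of deciding on the linker inputs rather than on the id, here is a minimal Python sketch. It is an assumption about how a downstream tool might do this, not how Stack or cabal new-build actually implement it; `fingerprint`, `needs_relink`, and the file name `libHSexample.a` are all hypothetical.

```python
import hashlib
import os
import tempfile


def fingerprint(paths):
    """Return one SHA-256 digest covering the names and contents of `paths`."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        with open(path, "rb") as handle:
            content_hash = hashlib.sha256(handle.read()).digest()
        digest.update(os.path.basename(path).encode("utf-8"))
        digest.update(b"\0")
        digest.update(content_hash)
    return digest.hexdigest()


def needs_relink(paths, previous_fingerprint):
    """Relink when the linker inputs differ from the last recorded build."""
    return fingerprint(paths) != previous_fingerprint


# Demonstration with a stand-in library file (hypothetical name and contents).
with tempfile.TemporaryDirectory() as tmp:
    lib = os.path.join(tmp, "libHSexample.a")
    with open(lib, "wb") as handle:
        handle.write(b"version 1")
    recorded = fingerprint([lib])
    unchanged = needs_relink([lib], recorded)

    # Simulate a rebuild that produced different library contents.
    with open(lib, "wb") as handle:
        handle.write(b"version 2")
    changed = needs_relink([lib], recorded)

print(unchanged, changed)  # prints: False True
```

The same shape works for the source-based approximation: pass the transitively depended-upon source files instead of the library files.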

In fact, it's not correct to rely on ABI hashes to decide when to relink, even in GHC 7.10 and earlier, where id was based on the ABI hash. The reason is that there are functional changes to a library which may not change its ABI; the easiest way to trigger this is to build a library without optimizations (-O0), and change a value without modifying any types in the program. For an example of this affecting programs in practice, see http://hackage.haskell.org/trac/ghc/ticket/7277

OK, so let's explain why Cabal works this way. In GHC 7.8 and earlier, the id was based on an ABI hash of the package. This was problematic because the id could not be used in symbol names: you needed to know the id prior to building to work it into symbol names, but if the id was based on the ABI, you couldn't know it until after building. So GHC 7.10 introduced the key field to represent the value we used for the symbols, while keeping id around as the old-fashioned ABI hash (http://ghc.haskell.org/trac/ghc/ticket/9265).
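
To make the ordering constraint concrete, here is a toy Python sketch of an identifier derived only from information available before compilation. This is explicitly not the hashing scheme Cabal or GHC use; `package_key`, its parameters, and the dependency versions are assumptions chosen for illustration.

```python
import hashlib


def package_key(name, version, dependency_keys, flags):
    """Derive an identifier from build-plan inputs only (toy scheme)."""
    digest = hashlib.sha256()
    for part in [name, version, *sorted(dependency_keys), *sorted(flags)]:
        digest.update(part.encode("utf-8"))
        digest.update(b"\0")
    return f"{name}-{version}-{digest.hexdigest()[:22]}"


# Illustrative dependency identifiers, not taken from a real build plan.
deps = ["base-4.9.1.0", "array-0.5.1.1"]

key_before = package_key("stm", "2.4.4.1", deps, [])
# Editing the package's source files changes none of the arguments,
# so the identifier stays the same after such an edit.
key_after = package_key("stm", "2.4.4.1", deps, [])
# Changing a dependency does change the inputs, and therefore the identifier.
key_new_dep = package_key("stm", "2.4.4.1", ["base-4.9.1.0", "array-0.5.2.0"], [])

print(key_before == key_after, key_before == key_new_dep)  # prints: True False
```

Because nothing here depends on compiled output, the value is known before the build starts and can be embedded in symbol names, which an ABI-based id cannot be.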

SPJ was not very happy with having two very similar but subtly different "unique" identifiers for a package, so he lobbied strongly for collapsing IPIDs and package keys together as one notion. In GHC 8.0, we did just that (http://ghc.haskell.org/trac/ghc/ticket/10714): now what we previously called the key is also used for id, and the two field values are always the same. (Cabal still emits both for backwards compatibility with GHC 7.10.) This helped us simplify a good chunk of GHC (but had some bad fallout with shadowing; now fixed in 8.0.2). In any case, in the world where key and id are always identical, it is clearly impossible for id to incorporate the ABI hash.

If you really want to keep relinking based on ABI hashes, ghc-8.0 does record the ABI hash of a package in the abi field, so you could look at that to see if it's changed or not. But you really shouldn't.
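
For completeness, reading that field could look roughly like the Python sketch below. It assumes `ghc-pkg field <package> abi` prints a line of the form `abi: <hash>`; the helper names are hypothetical and the hash in the demonstration is a made-up placeholder, not a real ABI hash.

```python
import subprocess


def parse_abi(field_output):
    """Extract the value from output of the assumed form `abi: <hash>`."""
    for line in field_output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "abi":
            return value.strip()
    return None


def current_abi(package):
    """Ask ghc-pkg for the recorded ABI hash (requires ghc-pkg on PATH)."""
    result = subprocess.run(
        ["ghc-pkg", "field", package, "abi"],
        check=True, capture_output=True, text=True,
    )
    return parse_abi(result.stdout)


def abi_changed(recorded_abi, field_output):
    """Compare a previously recorded hash with freshly queried output."""
    return parse_abi(field_output) != recorded_abi


# Placeholder output; a real run would use current_abi("stm") instead.
sample_output = "abi: 0123456789abcdef0123456789abcdef\n"
print(abi_changed("0123456789abcdef0123456789abcdef", sample_output))  # prints False
```

An unchanged hash here does not prove that relinking is unnecessary, since a library can change behavior without changing its ABI.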

@snoyberg
Collaborator Author

snoyberg commented Mar 9, 2017

I guess that's that, then. I don't think it's good practice for GHC to be regularly redefining fields and removing guarantees that downstream can rely upon. We'll work around this in Stack; it doesn't look like there's much alternative to doing so.

Thanks for the thorough response, even if it's not the response I was hoping for :)
