Source Guide

Liam edited this page Oct 9, 2023 · 12 revisions

Spring cleaning update:

The current information (which dates to 2018) is substantially accurate for the core "Cabal" repository (the core package builder, with a Setup.hs interface, as opposed to the manifest parser and the CLI). The Cabal subrepository has since grown from around 24,000 lines of code to 33,000 SLOC. The wider repository, however, is currently around 133,000 lines of code and growing, with cabal-install (the CLI tool) being the largest package.

For details on the package at large, the reader is better off gleaning information from README.md.

For details on the recommended code style, check out CONTRIBUTING.md.

Some immediately obvious issues:

While this is still a mostly good description of the Cabal (core) subrepository, the subrepository has been massively refactored since then, and quite a few of the links no longer work.

Imported from Trac wiki, in the process of being updated to match the current reality. Nothing below should be substantially wrong, but it's still somewhat incomplete.

Guide to the Cabal source code

On first look the Cabal code seems large and intimidating. This page is intended to give you a head start in understanding it.

Structure

All the Cabal modules live under Distribution.*

The modules can be roughly divided into two groups:

  • The declarative modules: They are mostly concerned with data structures like package descriptions. These modules live under Distribution.*. Much of the code in these modules consists of utility functions for handling the data types, along with functions for parsing and showing them.

  • The active modules: They are concerned with actually doing things like configuring, building and installing packages. These modules live under Distribution.Simple.*.

Size

According to SLOCCount Cabal is currently about 23,500 lines of code. This breaks down as about 7,000 lines for the declarative part and about 16,500 for the active part. Most modules are less than a few hundred lines, though there are a couple monsters nearer 1,000 lines.

Language features and packages

Cabal is 100% Haskell. It is written in Haskell2010 with a fair number of extensions. Formerly, it was written in fairly pure Haskell98 and avoided dependencies on non-core packages, but this is no longer true, with dependencies on transformers, mtl, parsec, and text.

Declarative modules

Really dull modules

  • Distribution/GetOpt.hs (source) (no docs - hidden module): This should live under Compat/; it's just a bundled version of the standard GetOpt. Not very interesting.

Some simple data types

  • Distribution/Version.hs (source) (docs): exports the Version type along with a parser and pretty printer. A version is something like "1.3.3". It also defines a VersionRange data type. Version ranges are like ">= 1.2 && < 2".

  • Distribution/Package.hs (source) (docs): defines a package identifier along with a parser and pretty printer for it. It also defines PackageIdentifiers and Dependencys. Identifiers consist of a name and an exact version and dependencies consist of a name and a version range.

  • Distribution/Verbosity.hs (source) (docs): a simple Verbosity type with associated utilities. There are 4 standard verbosity levels from Silent, Normal, Verbose up to Deafening, with further control in private flags. This is used for deciding what logging messages to print in the active parts.

  • Distribution/Compiler.hs (source) (docs): This has an enumeration of the various compilers that Cabal knows about. It also specifies the default compiler. Sadly you'll often see code that does case analysis on this compiler flavour enumeration like:

      case compilerFlavor comp of
          GHC -> GHC.getInstalledPackages verbosity packageDb progconf
          JHC -> JHC.getInstalledPackages verbosity packageDb progconf
    

    Obviously it would be better to use the proper Compiler abstraction because that would keep all the compiler-specific code together. Unfortunately we cannot make this change yet without breaking the UserHooks API, which would break all custom Setup.hs files, so for the moment we just have to live with this deficiency. If you're interested, see issue #57. "For the moment" has now lasted twelve years, so it's safe to say that this probably isn't happening.

  • Distribution/System.hs (source) (docs): Cabal often needs to do slightly different things on specific platforms. You probably know about System.Info.os :: String; however, using it is very inconvenient because it is a string and different Haskell implementations do not agree on using the same strings for the same platforms! (In particular see the controversy over "windows" vs "mingw32".) So to make it more consistent and easy to use we have an OS enumeration. This also performs a similar duty for the CPU architecture of the system.

  • Distribution/License.hs (source) (docs): The .cabal file allows you to specify a license file. Of course you can use any license you like, but people often pick common open source licenses and it's useful if we can automatically recognise that (e.g. so we can display it on the Hackage web pages). So you can also specify the license itself in the .cabal file from a short enumeration defined in this module. It includes GPL, LGPL and BSD3 licenses. This works with a subset of SPDX.Licenses.
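As a rough illustration of the Version and VersionRange types from the Distribution/Version.hs entry above, the following sketch builds the range ">= 1.2 && < 2" from combinators and tests a few versions against it. It assumes a reasonably recent Cabal library is installed (one whose Distribution.Version exports mkVersion); the names exampleRange and inExampleRange are made up for this example.

```haskell
import Distribution.Version
  ( VersionRange
  , earlierVersion
  , intersectVersionRanges
  , mkVersion
  , orLaterVersion
  , withinRange
  )

-- The range ">= 1.2 && < 2", built with combinators rather than parsed.
-- (exampleRange is a name invented for this sketch.)
exampleRange :: VersionRange
exampleRange =
  intersectVersionRanges
    (orLaterVersion (mkVersion [1, 2]))
    (earlierVersion (mkVersion [2]))

-- Does the version with these components fall inside exampleRange?
-- (inExampleRange is a name invented for this sketch.)
inExampleRange :: [Int] -> Bool
inExampleRange components = withinRange (mkVersion components) exampleRange

main :: IO ()
main = do
  print (inExampleRange [1, 3, 3]) -- 1.3.3 is >= 1.2 and < 2
  print (inExampleRange [2, 0])    -- 2.0 is not < 2
  print (inExampleRange [1, 1])    -- 1.1 is not >= 1.2
```

Building ranges from combinators keeps the example independent of the parser, which has changed more over the years than these functions have.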

The package description data types

  • Distribution/ParseUtils.hs (source) (no docs - hidden module): The .cabal file format is not trivial, especially with the introduction of configurations and the section syntax that goes with that. This module has a bunch of parsing functions that are used by the .cabal parser and a couple of others. It has the parsing framework code and also little parsers for many of the formats we get in various .cabal file fields, like module names, comma separated lists etc.

  • Distribution/PackageDescription.hs (source) (docs): This defines the data structure for the .cabal file format. There are several parts to this structure. It has top level info and then Library and Executable sections each of which have associated BuildInfo data that's used to build the library or exe. To further complicate things there is both a PackageDescription and a GenericPackageDescription. This distinction relates to Cabal configurations. When we initially read a .cabal file we get a GenericPackageDescription which has all the conditional sections. Before actually building a package we have to decide on each conditional. Once we've done that we get a PackageDescription. It was done this way initially to avoid breaking too much stuff when the feature was introduced. It could probably do with being rationalised at some point to make it simpler.

    This has been split apart into several more files since this was last updated, but that isn't crucial.

  • Distribution/PackageDescription/Configuration.hs (source) (docs): This is about the Cabal configurations feature. It exports finalizePackageDescription and flattenPackageDescription which are functions for converting GenericPackageDescriptions down to PackageDescriptions. It has code for working with the tree of conditions and resolving or flattening conditions.

  • Distribution/PackageDescription/Parse.hs (source) (docs): This defines parsers and partial pretty printers for the .cabal format. Some of the complexity in this module is due to the fact that we have to be backwards compatible with old .cabal files, so there's code to translate into the newer structure.

  • Distribution/PackageDescription/Check.hs (source) (docs): This has code for checking for various problems in packages. There is one set of checks that just looks at a PackageDescription in isolation and another set of checks that also looks at files in the package. Some of the checks are basic sanity checks, others are portability standards that we'd like to encourage. There is a PackageCheck type that distinguishes the different kinds of check so we can see which ones are appropriate to report in different situations. This code gets used when configuring a package, when we consider only basic problems. The higher standard is used when preparing a source tarball and by Hackage when uploading new packages. The reason for this is that we want to hold packages that are expected to be distributed to a higher standard than packages that are only ever expected to be used in the author's own environment.

  • Distribution/InstalledPackageInfo.hs (source) (docs): The .cabal file format is for describing a package that is not yet installed. It has a lot of flexibility like conditionals and dependency ranges. As such that format is not at all suitable for describing a package that has already been built and installed. By the time we get to that stage we have resolved all conditionals and resolved dependency version constraints to exact versions of dependent packages. So this module defines the InstalledPackageInfo data structure that contains all the info we keep about an installed package. There is a parser and pretty printer. The textual format is rather simpler than the .cabal format: there are no sections, for example. This is the format that ghc-pkg understands.
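The step from a GenericPackageDescription to a PackageDescription, described in the entries above, amounts to deciding every conditional and keeping only the branches that apply. The sketch below is a toy model of that idea using only the Haskell base library. It is not Cabal's real representation: every type and function name in it (Condition, CondTree, FlagAssignment, eval, resolve, example) is invented for illustration, and it tracks only dependency names.

```haskell
-- A toy model, not Cabal's real types: all names below are hypothetical.

-- | A condition over named boolean flags.
data Condition
  = Lit Bool
  | FlagSet String
  | Not Condition
  | And Condition Condition

-- | A tree of build dependencies guarded by conditions, standing in for
-- the conditional sections of a generic package description.
data CondTree = CondTree
  { always   :: [String]                -- dependencies that always apply
  , branches :: [(Condition, CondTree)] -- extra content when a condition holds
  }

type FlagAssignment = [(String, Bool)]

-- | Decide a condition; flags missing from the assignment count as False.
eval :: FlagAssignment -> Condition -> Bool
eval _   (Lit b)     = b
eval env (FlagSet f) = lookup f env == Just True
eval env (Not c)     = not (eval env c)
eval env (And a b)   = eval env a && eval env b

-- | Resolve every conditional, leaving a flat list of dependencies.
resolve :: FlagAssignment -> CondTree -> [String]
resolve env (CondTree deps bs) =
  deps ++ concat [resolve env t | (c, t) <- bs, eval env c]

-- | "base" always; "network" if the flag is on, otherwise "unix".
example :: CondTree
example =
  CondTree ["base"]
    [ (FlagSet "network", CondTree ["network"] [])
    , (Not (FlagSet "network"), CondTree ["unix"] [])
    ]

main :: IO ()
main = do
  print (resolve [("network", True)] example) -- ["base","network"]
  print (resolve [] example)                  -- ["base","unix"]
```

The real functions also have to choose flag values that make the dependencies satisfiable, which this sketch leaves out entirely.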

Active modules

Useful internal abstractions

  • Distribution/Simple/Program.hs (source) (docs): This provides an abstraction which deals with configuring and running programs. A Program is a static notion of a known program. A ConfiguredProgram is a Program that has been found on the current machine and is ready to be run (possibly with some user-supplied default args). Configuring a program involves finding its location and if necessary finding its version. There is also a ProgramConfiguration type which holds configured and not-yet configured programs. It is the parameter to lots of actions elsewhere in Cabal that need to look up and run programs. If we had a Cabal monad, the ProgramConfiguration would probably be a reader or state component of it.

    The module also defines all the known built-in Programs and the defaultProgramConfiguration which contains them all.

  • Distribution/Simple/Command.hs (source) (docs): This is to do with command line handling. The Cabal command line is organised into a number of named sub-commands (much like darcs). The Command abstraction represents one of these sub-commands, with a name, a description and a set of flags. Commands can be associated with actions and run. It handles some common stuff automatically, like the --help and command line completion flags. It is designed to allow other tools to make derived commands. This feature is used heavily in cabal-install.

  • Distribution/Simple/InstallDirs.hs (source) (docs): This manages everything to do with where files get installed (though does not get involved with actually doing any installation). It provides an InstallDirs type which is a set of directories for where to install things. It also handles the fact that we use templates in these install dirs. For example most install dirs are relative to some $prefix, so changing the prefix changes all the other dirs appropriately. So it provides a PathTemplate type and functions for substituting for these templates.

  • Distribution/Simple/Compiler.hs (source) (docs): This should be a much more sophisticated abstraction than it is. Currently it's just a bit of data about the compiler, like its flavour and name and version. The reason it's just data is because currently it has to be in Read and Show so it can be saved along with the LocalBuildInfo. The only interesting bit of info it contains is a mapping between language extensions and compiler command line flags. This module also defines a PackageDB type which is used to refer to package databases. Most compilers only know about a single global package collection but GHC has a global and per-user one and it lets you create arbitrary other package databases. We do not yet support this latter feature very much.

  • Distribution/Simple/PreProcess.hs (source) (docs): This defines a PreProcessor abstraction which represents a pre-processor that can transform one kind of file into another. There is also a PPSuffixHandler which is a combination of a file extension and a function for configuring a PreProcessor. It defines a bunch of known built-in preprocessors like cpp, cpphs, c2hs, hsc2hs, happy, alex etc. and lists them in knownSuffixHandlers. On top of this it provides a function for actually preprocessing some sources given a bunch of known suffix handlers. This module is not as good as it could be; it could really do with a rewrite to address some of the problems we have with pre-processors.

  • Distribution/Simple/Utils.hs (source) (docs): A large and somewhat miscellaneous collection of utility functions used throughout the rest of the Cabal lib and in other tools that use the Cabal lib like cabal-install. It has a very simple set of logging actions. It has low level functions for running programs, a bunch of wrappers for various directory and file functions that do extra logging.

  • Distribution/Simple/LocalBuildInfo.hs (source) (docs): Once a package has been configured we have resolved conditionals and dependencies, configured the compiler and other needed external programs. The LocalBuildInfo is used to hold all this information. It holds the install dirs, the compiler, the exact package dependencies, the configured programs, the package database to use and a bunch of miscellaneous configure flags. It gets saved and reloaded from a file (dist/setup-config). It gets passed in to very many subsequent build actions.
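The template mechanism described in the Distribution/Simple/InstallDirs.hs entry above can be pictured with a small toy model, written against the Haskell base library only. It is a sketch of the idea, not Cabal's PathTemplate implementation: the names Piece, Template, parseTemplate, subst and installDirs, and the rule that a variable name is a run of letters after a '$', are all assumptions made for this example.

```haskell
import Data.Char (isAlpha)

-- A toy model, not Cabal's real PathTemplate: all names are hypothetical.

-- | A template is a sequence of literal text and named variables.
data Piece = Literal String | Variable String
  deriving (Eq, Show)

type Template = [Piece]

-- | Split a string such as "$prefix/bin" into pieces. A variable name is
-- taken to be the run of letters following a '$'.
parseTemplate :: String -> Template
parseTemplate "" = []
parseTemplate ('$' : rest) =
  let (name, rest') = span isAlpha rest
  in Variable name : parseTemplate rest'
parseTemplate s =
  let (lit, rest) = break (== '$') s
  in Literal lit : parseTemplate rest

-- | Substitute variables from an environment, expanding nested templates
-- (so "$bindir" may itself mention "$prefix"). Unknown variables are left
-- in place. Assumes the environment contains no cycles.
subst :: [(String, String)] -> Template -> String
subst env = concatMap go
  where
    go (Literal s)  = s
    go (Variable v) = case lookup v env of
      Just t  -> subst env (parseTemplate t)
      Nothing -> '$' : v

-- | Install dirs expressed relative to a prefix.
installDirs :: String -> [(String, String)]
installDirs prefix =
  [ ("prefix", prefix)
  , ("bindir", "$prefix/bin")
  , ("libdir", "$prefix/lib")
  ]

main :: IO ()
main = do
  let expand prefix = subst (installDirs prefix) (parseTemplate "$bindir")
  putStrLn (expand "/usr/local") -- /usr/local/bin
  putStrLn (expand "/opt/ghc")   -- /opt/ghc/bin
```

Because the other directories are stored as templates rather than as expanded paths, changing the prefix is enough to move all of them.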

Particular phases or actions within the build process

  • Distribution/Simple/Configure.hs (source) (docs): This deals with the configure phase. It provides the configure action which is given the package description and configure flags. It then tries to:

    • configure the compiler
    • resolve any conditionals in the package description
    • resolve the package dependencies
    • check if all the extensions used by this package are supported by the compiler
    • check that all the build tools are available (including version checks if appropriate)
    • check for any required pkg-config packages (updating the BuildInfo with the results)

    Then based on all this it saves the info in the LocalBuildInfo and writes it out to a file. It also displays various details to the user, the amount of information displayed depending on the verbosity level.

  • Distribution/Simple/Build.hs (source) (docs): This is the entry point to actually building the modules in a package. It doesn't actually do much itself; most of the work is delegated to compiler-specific actions. It does do some non-compiler specific bits like running pre-processors.

  • Distribution/Simple/Build/PathsModule.hs (source) (docs): Generates the Paths_pkgname module. This is a module that Cabal generates for the benefit of packages. It enables them to find their version number and find any installed data files at runtime. This code should probably be split off into another module.

  • Distribution/Simple/Install.hs (source) (docs): This is the entry point into installing a built package. It does the generic bits and then calls compiler-specific functions to do the rest.

  • Distribution/Simple/Haddock.hs (source) (docs): This module deals with the haddock and hscolour commands. Sadly this is a rather complicated module. It has to call ghc-pkg to find the locations of documentation for dependent packages, so it can create links. The hscolour support allows generating html versions of the original source, with coloured syntax highlighting.

  • Distribution/Simple/Register.hs (source) (docs): This module deals with registering and unregistering packages. There are a couple of ways it can do this: one is to do it directly; another is to generate a script that can be run later to do it. The idea here being that the user is shielded from the details of what command to use for package registration for a particular compiler. In practice this aspect was not especially popular so we also provide a way to simply generate the package registration file which then must be manually passed to ghc-pkg. It is possible to generate registration information for where the package is to be installed, or alternatively to register the package inplace in the build tree. The latter is occasionally handy, and will become more important when we try to build multi-package systems. This module does not delegate anything to the per-compiler modules but just mixes it all into this module, which is rather unsatisfactory. The script generation and the unregister feature are not well used or tested.

  • Distribution/Simple/SrcDist.hs (source) (docs): This handles the sdist command. The module exports an sdist action but also some of the phases that make it up so that other tools can use just the bits they need. In particular the preparation of the tree of files to go into the source tarball is separated from actually building the source tarball. The sdist action also does some distribution QA checks.

Compiler-specific modules

  • Distribution/Simple/GHC.hs (source) (docs): This is a fairly large module. It contains most of the GHC-specific code for configuring, building and installing packages. It also exports a function for finding out what packages are already installed. Configuring involves finding the ghc and ghc-pkg programs, finding what language extensions this version of ghc supports and returning a Compiler value. getInstalledPackages involves calling the ghc-pkg program to find out what packages are installed. Building is somewhat complex as there is quite a bit of information to take into account. We have to build libs and programs, possibly for profiling and shared libs. We have to support building libraries that will be usable by GHCi and also ghc's -split-objs feature. We have to compile any C files using ghc. Linking, especially for split-objs, is remarkably complex, partly because there tend to be thousands of .o files and this can often be more than we can pass to the ld or ar programs in one go. There is also some code for generating Makefiles but the less said about that the better. Installing for libs and exes involves finding the right files and copying them to the right places. One of the more tricky things about this module is remembering the layout of files in the build directory (which is not explicitly documented) and thus what search dirs are used for various kinds of files.

  • Distribution/Simple/UHC.hs (source) (docs)
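The GHC entry above notes that a link step can involve more .o files than can be passed to ld or ar in one go. One common way to cope, sketched here as a toy model rather than as Cabal's actual code, is to split the argument list into batches that each fit under a length limit and to invoke the tool once per batch. The function chunkArgs and the limit of 10 characters are invented for this example; real limits are platform-dependent and far larger.

```haskell
-- A toy model, not Cabal's real code: chunkArgs is a hypothetical name.

-- | Split arguments into batches whose combined length (each argument plus
-- one separating space) stays within the limit. An argument longer than the
-- limit still gets a batch of its own, so no argument is ever dropped.
chunkArgs :: Int -> [String] -> [[String]]
chunkArgs limit = go [] 0
  where
    -- acc holds the current batch in reverse; used is its length so far.
    go acc _ [] = [reverse acc | not (null acc)]
    go acc used (x : xs)
      | null acc || used + cost <= limit = go (x : acc) (used + cost) xs
      | otherwise = reverse acc : go [x] cost xs
      where
        cost = length x + 1

main :: IO ()
main =
  -- With a limit of 10, two 3-character names fit per batch.
  mapM_ print (chunkArgs 10 ["a.o", "b.o", "c.o", "d.o"])
```

Concatenating the batches gives back the original argument list in order, which matters when the order of object files is significant.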

Stuff related to the front end

  • Distribution/Simple/UserHooks.hs (source) (docs): This defines the API that Setup.hs scripts can use to customise the way the build works. This module just defines the UserHooks type. The predefined sets of hooks that implement the Simple, Make and Configure build systems are defined in Distribution.Simple. The UserHooks is a big record of functions. There are 3 for each action, a pre, post and the action itself. There are a few other miscellaneous hooks, ones to extend the set of programs and preprocessors and one to override the function used to read the .cabal file. This hooks type is widely agreed to not be the right solution. Partly this is because changes to it usually break custom Setup.hs files and yet many internal code changes do require changes to the hooks. For example we cannot pass any extra parameters to most of the functions that implement the various phases because it would involve changing the types of the corresponding hook. At some point it will have to be replaced.

  • Distribution/Simple/Setup.hs (source) (docs): This is a big module, but not very complicated. The code is very regular and repetitive. It defines the command line interface for all the Cabal commands. For each command (like configure, build etc) it defines a type that holds all the flags, the default set of flags and a Command that maps command line flags to and from the corresponding flags type. All the flags types are instances of Monoid, see http://www.haskell.org/pipermail/cabal-devel/2007-December/001509.html for an explanation. The types defined here get used in the front end and especially in cabal-install which has to do quite a bit of manipulating sets of command line flags. This is actually relatively nice, it works quite well. The main change it needs is to unify it with the code for managing sets of fields that can be read and written from files. This would allow us to save configure flags in config files.

  • Distribution/Simple.hs (source) (docs): This is the command line front end to the Simple build system. The original idea was that there could be different build systems that all presented the same compatible command line interfaces. There is still a Make system (see below) but in practice no packages use it. This module exports the main functions that Setup.hs scripts use. It re-exports the UserHooks type, the standard entry points like defaultMain and defaultMainWithHooks and the predefined sets of UserHooks that custom Setup.hs scripts can extend to add their own behaviour.

  • Distribution/Make.hs (source) (docs): This is an alternative build system that delegates everything to the make program. All the commands just end up calling make with appropriate arguments. The intention was to allow preexisting packages that used makefiles to be wrapped into Cabal packages. In practice essentially all such packages were converted over to the Simple build system instead. Consequently this module is probably not used much and it certainly only sees cursory maintenance and no testing. Perhaps at some point we should stop pretending that it works.
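The remark in the Distribution/Simple/Setup.hs entry above, that all the flags types are instances of Monoid, is easier to see with a small model. The sketch below uses only the Haskell base library (GHC 8.4 or later, for Semigroup in the Prelude) and is not Cabal's real code: Setting, ConfigFlags, flagPrefix, flagProfiling, defaults and fromCommandLine are names invented for illustration. The idea it shows is that each source of settings yields a partially filled record and that combining records lets later sources override earlier ones.

```haskell
-- A toy model, not Cabal's real flag types: all names are hypothetical.

-- | A single optional setting. Combining two keeps the right-hand value
-- when it is set, so later sources override earlier ones.
data Setting a = Unset | Set a
  deriving (Eq, Show)

instance Semigroup (Setting a) where
  s <> Unset = s
  _ <> s     = s

instance Monoid (Setting a) where
  mempty = Unset

-- | A record of settings for one command, combined field by field.
data ConfigFlags = ConfigFlags
  { flagPrefix    :: Setting FilePath
  , flagProfiling :: Setting Bool
  } deriving (Eq, Show)

instance Semigroup ConfigFlags where
  a <> b = ConfigFlags
    { flagPrefix    = flagPrefix a <> flagPrefix b
    , flagProfiling = flagProfiling a <> flagProfiling b
    }

instance Monoid ConfigFlags where
  mempty = ConfigFlags mempty mempty

-- | Built-in defaults: every field is set.
defaults :: ConfigFlags
defaults = ConfigFlags (Set "/usr/local") (Set False)

-- | What a command line mentioning only profiling would produce.
fromCommandLine :: ConfigFlags
fromCommandLine = ConfigFlags { flagPrefix = Unset, flagProfiling = Set True }

main :: IO ()
main = print (defaults <> fromCommandLine)
```

With this shape, a config file would simply be one more ConfigFlags value placed between the defaults and the command line.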
