Reduce size of Heroku-24 run and build images #266

Closed
edmorley opened this issue Mar 18, 2024 · 2 comments

@edmorley
Member

edmorley commented Mar 18, 2024

Initial experimental Heroku-24 images (experimental since Ubuntu 24.04 itself isn't even GA yet) were added in #245.

In that PR, a few packages were dropped compared to Heroku-22, to try and reduce the image size.

However, we'd like to reduce the size of the image further, since even with those changes the images have ended up larger than Heroku-22:

```
$ docker images
REPOSITORY      TAG        IMAGE ID       CREATED      SIZE
heroku/heroku   24-build   19fbe99496c9   7 days ago   1.09GB
heroku/heroku   24         260534c30d08   7 days ago   665MB
heroku/heroku   22-build   9bd3e3a84bac   7 days ago   1.02GB
heroku/heroku   22         37fee4e277ea   7 days ago   643MB
```

Smaller image sizes are going to be even more important in a CNB world, where the image size trade-offs have shifted quite a bit from the stack+slug model. In addition, in an SBOM world, reducing the number of packages (and thus the potential vulnerability surface area) is going to be something users become increasingly interested in.

We can't remove packages from a new base image version once it GAs (due to image rebasing meaning every new image update must be backwards compatible), so we must do this before Heroku-24 GAs.
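Because rebasing means every update must remain a superset of what shipped at GA, a cheap guard is to diff the package lists of the GA image and a candidate update. A minimal sketch, assuming each image's package list is a plain-text file with one package name per line; the function and file names are hypothetical (the repo does keep `installed-packages-*.txt` files, but this doesn't assume their exact format or that such a check exists):

```bash
#!/usr/bin/env bash
# Report packages present in the old list but missing from the new one.
# Both files are assumed (hypothetically) to contain one package name per line.
# Returns 1 if any package was removed (which would break image rebasing), 0 otherwise.
check_no_removals() {
  local old_list=$1 new_list=$2 removed
  # comm -23 prints only the lines unique to the first (sorted) input.
  removed=$(comm -23 <(sort -u "$old_list") <(sort -u "$new_list"))
  if [[ -n "$removed" ]]; then
    printf 'Packages removed since GA:\n%s\n' "$removed"
    return 1
  fi
  echo "No packages removed."
}

# Usage (hypothetical file names):
#   check_no_removals installed-packages-ga.txt installed-packages-candidate.txt
```

Adding packages passes this check; only removals fail it, which mirrors the asymmetry described above.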

Possible ideas:

  • Remove build toolchains from the run image (they were added in #127, "Add GCC to runtime dependencies", for Ruby MJIT support, but MJIT has now mostly been superseded by YJIT)
  • Remove git from the run image
  • Remove Python from the run image
  • Remove stunnel (since it pulls in systemd too, and Heroku Redis now supports TLS natively, making the stunnel buildpack redundant)

GUS-W-15159536.

edmorley self-assigned this Mar 18, 2024
edmorley added a commit that referenced this issue Mar 20, 2024
When creating an `ext3` filesystem with `mkfs` (which underneath calls
`mke2fs` via the `mkfs.ext3` alias) various default filesystem settings
(such as the inode ratio and block size) are chosen based on the
"usage type" of the filesystem.

If not explicitly specified, this "usage type" is determined based on
the size of the filesystem. For example, the `default` profile is used
for filesystems between 512 MB and 4 TB, and the `small` profile is
used for filesystems between 3 MB and 512 MB. See:
https://manpages.ubuntu.com/manpages/jammy/en/man8/mkfs.ext3.8.html

For #266 I have several local changes for making the Heroku-24 images
smaller; however, image generation was failing since the slimmer images
now fall under the 512 MB threshold, causing `mke2fs` to use the `small`
profile instead. This `small` profile uses a drastically different
`inode_ratio`, which is very inefficient for our use-case, resulting
in a filesystem overhead of over 11%, which throws off the `.img` size
calculation.

Whilst we could work around this by adjusting the `.img` size
calculations, it makes more sense to force the usage of the `default`
profile, so all of our base images use the same filesystem settings,
rather than relying on `mke2fs`'s size heuristics.
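To illustrate the heuristic, here is a minimal sketch of the size-based "usage type" selection documented in the mke2fs(8) man page (`floppy` below 3 MB, `small` below 512 MB, `default` below 4 TB, `big` below 16 TB, otherwise `huge`). Sizes are in MiB; treating the documented thresholds as binary units is an assumption, and the function name is hypothetical. The trailing comment shows the kind of `-T default` override described above, with `-v` for verbose output; the image path is made up, and this is not necessarily the exact command the build scripts run.

```bash
#!/usr/bin/env bash
# Mirror mke2fs's size-based choice of usage type (size given in MiB).
# Thresholds follow the mke2fs(8) man page; binary units are an assumption.
mke2fs_usage_type() {
  local size_mib=$1
  if (( size_mib < 3 )); then
    echo floppy
  elif (( size_mib < 512 )); then
    echo small
  elif (( size_mib < 4 * 1024 * 1024 )); then    # < 4 TiB
    echo default
  elif (( size_mib < 16 * 1024 * 1024 )); then   # < 16 TiB
    echo big
  else
    echo huge
  fi
}

mke2fs_usage_type 640   # prints "default"
mke2fs_usage_type 450   # prints "small": a slimmer image silently switches profile

# Forcing the profile instead of relying on the heuristic (hypothetical path):
#   mkfs.ext3 -v -T default ./heroku-24.img
```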

I've also enabled verbose output (which shows the profile being used)
and added additional file size logging.

GUS-W-15292800.
edmorley added a commit that referenced this issue Mar 21, 2024
GCC was added to our run images back in #127 in order to support
Ruby 2.6's then new MJIT feature:
https://www.ruby-lang.org/en/news/2018/12/25/ruby-2-6-0-released/

However, since then:
- The Ruby MJIT feature hasn't really resulted in significant
  performance benefits for real world use-cases like a Rails app.
- Ruby's MJIT has since been superseded by YJIT, which is faster and
  doesn't need GCC at runtime:
  https://shopify.engineering/yjit-just-in-time-compiler-cruby
  https://shopify.engineering/ruby-yjit-is-production-ready
- The image size impact of including build tools in our run images has
  increased considerably (#127 quoted it as 84 MB, but measuring now
  it's 203 MB).
- In a CNB world, image size is much more of a concern than in the S3
  `.img` + slug model, so we need to be more selective over what
  packages we include.

As such, this removes `gcc`, `make` and `libc6-dev` from the run image
for a 203 MB saving (they are still present in the build image, hence
zero changes to `installed-packages-*.txt` for that image).

Richard (Ruby owner) has confirmed he's fine with this change.

Note: I'm intentionally not adding `binutils` back (which was a
transitive dependency), since its 15 MB cost is not worth it for the
~once a year platform operator debugging use-case.

Before:

```
-----> Size breakdown...
       heroku/heroku:24         661MB
       heroku/heroku:24-build   1.13GB
```

After:

```
-----> Size breakdown...
       heroku/heroku:24         458MB
       heroku/heroku:24-build   1.13GB
```

Towards #266.
GUS-W-15159536.
edmorley added a commit that referenced this issue Mar 21, 2024
Since:
- Most Git use-cases are for cloning dependencies during the build.
- On Heroku at runtime there is no `.git/` metadata to query the
  local project's repo anyway (since the directory isn't preserved
  during the build).
- It saves 17 MB, and in a CNB world image size is a much bigger
  concern, so we need to be more selective about what packages
  we include.
- Once Heroku-24 GAs we can't remove packages (since it would break
  backwards compatibility given stack rebasing); however, we can add
  packages, so we should err on the side of removing packages now.

Before:

```
-----> Size breakdown...
       heroku/heroku:24         458MB
       heroku/heroku:24-build   1.13GB
```

After:

```
-----> Size breakdown...
       heroku/heroku:24         441MB
       heroku/heroku:24-build   1.13GB
```

Towards #266.
GUS-W-15159536.
edmorley added a commit that referenced this issue Mar 21, 2024
Since:
- `heroku-buildpack-pgbouncer` hasn't used stunnel since 2018:
  heroku/heroku-buildpack-pgbouncer#104
- Redis 6 and newer support native TLS, making `heroku-buildpack-redis` redundant:
  heroku/heroku-buildpack-redis#40
  (The buildpack can be sunset now that old Redis instances have been shut down)
- If any other less common use-case needs stunnel, they can install it using APT.
- It reduces the run and build image sizes by 17 MB, and in a CNB world image size is a much bigger concern, so we need to be more selective about what packages we include.
- Once Heroku-24 GAs we can't remove packages (since it would break backwards compatibility given stack rebasing); however, we can add packages, so we should err on the side of trying out package removals now.

Before:

```
-----> Size breakdown...
       heroku/heroku:24         441MB
       heroku/heroku:24-build   1.13GB
```

After:

```
-----> Size breakdown...
       heroku/heroku:24         424MB
       heroku/heroku:24-build   1.11GB
```

Towards #266.
GUS-W-15159536.
edmorley added a commit that referenced this issue Mar 21, 2024
Since:
- Python apps will (or should) be using Python provided by the
  Python buildpack instead.
- Non-Python buildpacks/apps typically don't need Python at runtime.
- Having Python in the run image has caused confusion in support tickets
  where the Python buildpack wasn't present (such as it being
  accidentally replaced when adding a second buildpack), since at runtime
  apps then fail with a less obvious `ModuleNotFoundError` instead of
  `python: command not found`.
- None of our other officially supported languages (that have their own
  buildpacks) are also installed as system packages in the base image.
- Removing Python reduces the run image size by 34 MB, and in a CNB
  world image size is a much bigger concern, so we need to be more
  selective about what packages we include.
- Once Heroku-24 GAs we can't remove packages (since it would break
  backwards compatibility given stack rebasing); however, we can add
  packages, so we should err on the side of trying out package
  removals now.

Python is still in the build image since various non-Python use-cases
need it (for example Node.js packages that use node-gyp require Python
at install time), plus several other system packages in the build image
depend on it anyway.

I've intentionally removed the `python-is-python3` package entirely
(rather than still including it in the build image), since the vast
majority of tooling will (or should) be checking for the presence of
`python3` directly (given that's the default name on Ubuntu unless the
backward compat package is installed). And for most end-user/app
use-cases we would prefer they use the Python buildpack (rather than
system Python), so a `python: command not found` will nudge them in that
direction. We can always add `python-is-python3` back later if this
turns out to be a bigger issue than expected.

Note: The classic PHP buildpack does use Python in its
`heroku-php-apache2` and `heroku-php-nginx` scripts, however, it's only
used when `realpath` doesn't exist (eg macOS), so is unused on Heroku.
The buildpack will need to adjust for the `python-is-python3` removal,
but arguably should have done that previously (given during the Python
2 -> 3 transition the major version of `python` changed). (If it needs
to support environments where only the command `python` exists, and not
`python3`, then it can use something like:
`PYTHON=$(which python3 || which python)`)
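The same fallback can be written with the `command -v` shell builtin, which avoids depending on an external `which` binary. A minimal sketch; the `first_available` helper is a hypothetical name, not something the PHP buildpack defines:

```bash
#!/usr/bin/env bash
# Print the path of the first command in the argument list that exists on PATH.
# Returns non-zero (and prints nothing) if none of them are found.
first_available() {
  local cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      command -v "$cmd"
      return 0
    fi
  done
  return 1
}

# Prefer `python3` (the default name on Ubuntu), falling back to `python`.
if PYTHON=$(first_available python3 python); then
  echo "Using ${PYTHON}"
else
  echo "No Python found on PATH" >&2
fi
```

Checking `python3` first means the script keeps working on images without `python-is-python3`, while still supporting environments that only provide `python`.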


Before (once the other PRs are merged):

```
-----> Size breakdown...
       heroku/heroku:24         424MB
       heroku/heroku:24-build   1.11GB
```

After:

```
-----> Size breakdown...
       heroku/heroku:24         390MB  (34 MB reduction)
       heroku/heroku:24-build   1.11GB (unchanged)
```

Towards #266.
GUS-W-15159536.
@edmorley
Member Author

edmorley commented May 7, 2024

> Smaller image sizes are going to be even more important in a CNB world

As is already being seen in:
heroku/buildpacks#6 (comment)
https://www.reddit.com/r/rails/comments/1chrwmq/comment/l24ylu2/

edmorley added a commit that referenced this issue May 8, 2024
Since:
* It's a niche package that appears to have been installed only because it
  was a transitive dependency of `dnsutils` in Cedar-14, which was then
  copied to Heroku-16 as an explicit dependency (along with a number of
  others) when that stack was added.
* The `libgeoip1` library (which is needed along with `geoip-database` to
  actually use it) has been missing from the run image since Heroku-20, and
  no one has noticed its absence.
* It reduces the run/build image sizes by ~10 MB.

See:
https://packages.ubuntu.com/noble/geoip-database
https://packages.ubuntu.com/noble/libgeoip-dev

Towards #266.
GUS-W-15159536.
edmorley added a commit that referenced this issue May 13, 2024
Since:
- The `libnetpbm10-dev` package is actually an empty virtual package,
- The runtime library it pulls in (`libnetpbm11`) isn't in any of our
  run images (all the way back to Heroku-18), meaning it's not actually
  usable at runtime anyway, and yet no one has reported its absence in
  the last 6 years.

Towards #266.
GUS-W-15159536.
edmorley added a commit that referenced this issue May 13, 2024
Since:
- All of the language bindings I could find for it were unpopular and
  not actively maintained. For example:
    - Ruby: https://github.com/chrisliaw/gcrypt
      (last commit 3 years ago, 0 stars, not published to rubygems.org)
    - Python: https://framagit.org/okhin/pygcrypt/
      (last commit 6 years ago, 0 stars, close to zero PyPI downloads
      excluding mirror syncs)
- It's the dev package for the library extracted from GnuPG, and it's
  much more common for use-cases to interact with the `gpg` CLI directly.
  For example, https://github.com/vsajip/python-gnupg (8 million
  downloads/month) uses the CLI rather than the library.

See:
https://packages.ubuntu.com/noble/libgcrypt20-dev
https://gnupg.org/software/libgcrypt/

Towards #266.
GUS-W-15159536.
edmorley added a commit that referenced this issue May 13, 2024
Since:
- This is the dev package for `libdb5.3`, a lib for Berkeley DB, which as
  DBs go is fairly obscure.
- The main reason this is in the base image is that the Python stdlib
  contains a module for Berkeley DB (`dbm.ndbm`); however, we don't
  need the headers in the build image for that (since they can be installed
  in the image where the Python runtimes are built instead).
- There are very few language bindings for `libdb`, and those I could find
  were unpopular and not actively maintained. eg:
  https://github.com/ruby-bdb/bdb (38 stars, last commit and rubygems.org
  release in 2011)

See:
https://packages.ubuntu.com/noble/libdb-dev

Towards #266.
GUS-W-15159536.
edmorley added a commit that referenced this issue May 13, 2024
Since:
- It was added in #146 along with the `libc-client2007e` runtime library
  for use by PHP, however, for PHP's use-case (binary compilation) the
  headers don't need to be in the build image itself, but can instead be
  installed during the PHP binary build process.
- There are no popular bindings for `libc-client2007e` in languages other
  than PHP that use these headers. (By contrast, `libldap-dev`, which is also
  already in the build image, has several popular bindings.)

See:
https://packages.ubuntu.com/noble/libc-client2007e-dev

Towards #266.
GUS-W-15159536.
@edmorley
Member Author

edmorley commented May 13, 2024

Ok, we're now in a much better place (and as good as we're going to get for now without affecting image usability; longer term we can also discuss having a separate slim variant)...

Before:

```
$ docker images
REPOSITORY      TAG        IMAGE ID       CREATED      SIZE
heroku/heroku   24-build   19fbe99496c9   7 days ago   1.09GB
heroku/heroku   24         260534c30d08   7 days ago   665MB
```

After:

```
$ docker images
REPOSITORY      TAG        IMAGE ID       CREATED          SIZE
heroku/heroku   24-build   f324a55dc0cd   29 minutes ago   959MB
heroku/heroku   24         5de54ecec00d   42 minutes ago   373MB
```
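The overall savings can be worked out directly from the before/after numbers above. A minimal sketch, assuming sizes are in decimal units (1GB = 1000MB), which we believe is how `docker images` formats them; the function names are illustrative:

```bash
#!/usr/bin/env bash
# Convert a size string as printed by `docker images` (e.g. "373MB", "1.09GB")
# into whole MB. Assumes decimal units (1GB = 1000MB).
to_mb() {
  awk -v s="$1" 'BEGIN {
    n = s + 0                      # leading numeric part, e.g. 1.09
    if (s ~ /GB$/) n *= 1000
    else if (s ~ /kB$/) n /= 1000
    printf "%.0f\n", n
  }'
}

# Print the saving for a before/after pair, in MB and as a percentage of "before".
reduction() {
  local before after
  before=$(to_mb "$1")
  after=$(to_mb "$2")
  awk -v b="$before" -v a="$after" \
    'BEGIN { printf "%d MB (%.1f%%)\n", b - a, (b - a) * 100 / b }'
}

reduction 665MB 373MB    # run image
reduction 1.09GB 959MB   # build image
```

Run as-is, this prints `292 MB (43.9%)` for the run image and `131 MB (12.0%)` for the build image.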

Note:

  1. These are the amd64 image sizes (the arm64 images are a few MB either side)
  2. These sizes are those shown using the traditional Docker storage backend, not the new containerd snapshotter (since the latter not only displays the combined sizes of all architectures, but also displays wonky size numbers in general)
  3. The size of the images will fluctuate up and down over time as and when packages that are in the upstream `ubuntu:24.04` image receive updates, until the next upstream image refresh (which occurs upstream twice a month). This is because if the `apt-get dist-upgrade` command run by the scripts in this repo pulls in a newer version of a package already in the `ubuntu:24.04` image, there will temporarily be two copies of the package in different layers of our final images. The size impact of this can be seen using the tool `dive` and looking at the "potential wasted space" number.
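To make note 3 more concrete: a file written in one layer and rewritten (e.g. by a package upgrade) in a later layer still occupies space in the earlier layer. The sketch below is a simplified model of that idea only, not how `dive` actually computes its "potential wasted space" figure; the tab-separated input format, paths and sizes are hypothetical and don't come from any Docker or `dive` command.

```bash
#!/usr/bin/env bash
# Simplified model of "wasted space" in a layered image: if the same path is
# written in more than one layer, every copy except the last is unreachable
# but still stored. Input (stdin): "layer<TAB>path<TAB>size_in_bytes" records,
# in layer order (hypothetical format). Requires bash 4+ (associative arrays).
wasted_bytes() {
  local -A last_size=()
  local total=0 layer path size prev
  while IFS=$'\t' read -r layer path size; do
    prev=${last_size[$path]:-}
    if [[ -n "$prev" ]]; then
      # The earlier copy of this path is now shadowed by the current layer.
      total=$(( total + prev ))
    fi
    last_size[$path]=$size
  done
  echo "$total"
}

# A file shipped in the base layer, then upgraded in a later layer (made-up data).
printf 'base\t/usr/lib/libexample.so.1\t1000\nupgrade\t/usr/lib/libexample.so.1\t1010\nupgrade\t/etc/example.conf\t20\n' \
  | wasted_bytes   # prints 1000
```

This is also why the waste disappears after an upstream refresh: once the base layer already contains the newer version, nothing gets shadowed.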
