
Significant build regressions on swift:6.0-noble compared to 5.10-noble #76555

Open · MahdiBM opened this issue Sep 18, 2024 · 37 comments
Labels
bug: A deviation from expected or documented behavior. Also: expected but undesirable behavior.
triage needed: This issue needs more specific labels

Comments

@MahdiBM commented Sep 18, 2024

Description

Significant build regressions on Linux, worsened by swift-testing.

EDIT: Please also read my next comment which includes more info.

Environment

  • GitHub Actions
  • 8 CPU x 16GB RAM c7x EC2 Instance (we use RunsOn)
  • Migrating from swift:5.10-noble to swift:6.0-noble
  • Tried with and without using a previous .build cache. No behavior difference noticed at all.
  • 200K-300K LOC project
  • Updated to // swift-tools-version:6.0 in Package.swift
  • All targets still have Swift 5 language mode enabled.
  • No actual tests migrated to swift-testing just yet.

What Happened?

  • Our Tests CI started getting stuck after the toolchain update.
  • The CI machines were getting killed by AWS. Considering running out of RAM during builds has been a common problem, especially in the server-side Swift ecosystem, and has been brought up numerous times for ages, my first guess was that the machine runs out of RAM while building the Swift project.
  • After many attempts, I noticed that a c7x-family machine with 64 CPU x 128 GB RAM (8x larger than before) runs the tests about as well as they ran before on Swift 5.10.
  • My next guess was that maybe Swift Testing was causing an issue, so I tried --disable-swift-testing.
  • With that flag, our tests have been running on a machine only 2x larger, and everything else in the tests CI is back to normal.
  • Even with this bigger machine, the tests CI still takes almost as long to complete as before (we run tests in parallel with 64 workers).
  • So not only does Swift 6 have a significant build regression, but swift-testing also appears to turn it from significant into massive.
  • This is borderline unusable for the Swift server ecosystem. I hope this problem requires more specific circumstances than what meets the eye.
  • Still trying things out with the deployment CI. Even 2x CPU and 4x RAM isn't proving helpful, even though I did throw the --disable-swift-testing flag into the mix.

Reproduction

Not sure.

Expected behavior

No regressions. Preferably even faster than previous Swift versions.

Environment

Mentioned above.

Additional information

No response

@grynspan (Contributor) commented Sep 18, 2024

Some notes, in no particular order (will update if I think of more):

  • @stmontgomery's and my initial reaction is that this is probably not related to Swift Testing since this project hasn't added any Swift Testing tests yet.
  • --disable-swift-testing does not affect the build. It did for a while with prerelease Swift 6 toolchains before we added Swift Testing to the toolchain, but it has no effect on the release toolchain. Only one build product is produced for a test target, and it's the same either way. It does affect whether or not a second test process is spun up after building to run Swift Testing tests.
  • It is normal to see text output from Swift Testing when you run swift test even if you don't have any tests using it. The output is basically just saying "didn't find any Swift Testing tests, bye now." If you pass --disable-swift-testing, it suppresses spawning the process that looks for Swift Testing tests, which is why you don't see any output from it when you pass that flag.
  • Swift Testing is available in Swift 5 language mode so long as you're using a Swift 6 toolchain. Language mode isn't a factor.
  • @tayloraswift suggested this might be "Ubuntu 24.04 Swift 5.10.1 release toolchain should not have assertions enabled" (#76091).

@MahdiBM (Author) commented Sep 20, 2024

To be clear:

  • I don't think it was necessarily swift-testing doing something wrong; more likely the compiler or SwiftPM is mishandling it.
  • Even without swift-testing there is a significant build-time regression, when the build finishes at all. The benchmarks below show it properly.

It's very sad to see the Linux build situation constantly getting worse despite us asking for faster and less RAM-hungry builds for ages.
Based on the many user reports we consistently get on the Vapor Discord server, a lot of deployment services have blocking problems building Swift projects because their builders run out of RAM. As you can imagine, this can be a significant obstacle to deploying server-side Swift apps.
Just ask the SSWG folks @0xTim / @gwynne if you have any doubts.

Anyway, let's get to the benchmarks I did. I've likely run 500+ CI jobs in the past 48 hours ...
Note that we do use a couple of macros, all maintained by us; one of them lives in an external repo.
Image names are the exact official Docker image names.

Tests CI

  • Tests CI machine sizes are relative to what we had for swift:5.10-noble: an 8 CPU x 16GB RAM c7x EC2 instance.
    • Other than "Same", the other machines are of the m7x family, which has a higher RAM-to-CPU ratio.
  • In the tests below, I tried both with and without --disable-swift-testing. The results are only marginally different.
  • There are 3 variables in the build times: 1) cache usage, 2) build-step time, 3) tests-build/run time.
    • The table below contains 2 of these variables.
    • If cache usage and build-step time have not changed but the total run time has, the difference comes from the tests-build/run time.
    • We build the package, then cache .build, and then run the tests in another step.
    • The tests-build/run step does also build something more. I'm not sure what, but it seems like it does.
    • Build-step command: swift build --explicit-target-dependency-import-check error --build-tests
    • Tests-build/run-step command: swift test --enable-code-coverage --parallel --num-workers 64
Image      | Machine Size  | w/ cache total | w/ cache build | no cache total | no cache build
-----------|---------------|----------------|----------------|----------------|---------------
5.10-noble | Same          | 12m 53s        | 3m 7s          | 19m 37s        | 7m 34s
5.10-noble | 2x RAM        | 13m 41s        | 3m 16s         | 21m 9s         | 7m 55s
5.10-noble | 4x RAM 2x CPU | 9m 19s         | 2m 40s         | 16m 37s        | 6m 18s
6.0-jammy  | Same          | 15m 21s        | 3m 12s         | 24m 8s         | 9m 14s
6.0-jammy  | 2x RAM        | 16m 18s        | 3m 25s         | 25m 1s         | 9m 36s
6.0-jammy  | 4x RAM 2x CPU | 9m 5s          | 2m 17s         | 17m 15s        | 6m 15s
6.0-noble  | Same          | 17m 21s        | 5m 10s         | 23m 31s        | 9m 9s
6.0-noble  | 2x RAM        | 18m 32s        | 5m 42s         | 23m 34s        | 9m 1s
6.0-noble  | 4x RAM 2x CPU | 9m 47s         | 2m 59s         | 18m 42s        | 6m 43s

(Side note: I didn't know using higher RAM could hurt?! I don't think it's a machine-type problem since the deployment builds below show the expected behavior of some performance improvements when having access to more RAM.)

Analyzing the results (excluding the bigger 4x RAM 2x CPU machine):

  • ~30% worse tests-build/run-step performance on 6.0-jammy compared to 5.10-noble.
    • The exact tests-build/run-step numbers are not in the table; they hover around 7-11 minutes.
  • 60+% worse build performance in the build step w/ cache on 6.0-noble compared to 6.0-jammy.
  • Overall, all things considered, 40-50+% worse build performance when moving from 5.10-noble to 6.0-noble.
  • A bit of the total time (not included in the comparisons above, but visible in the totals) is due to our "cache .build" step, which uses actions/cache and apparently doesn't handle the big 2GB .build directories well.

Deployment CI

  • Deployment CI machine sizes are relative to what we had for swift:5.10-noble: a 4 CPU x 8GB RAM c7x EC2 instance.
    • Other than "Same", the other machines are of the m7x family, which has a higher RAM-to-CPU ratio.
  • Compared to the test builds, deployment builds lack --build-tests, use jemalloc, and only build the specific product that will be deployed.
    • jemalloc might be why the behavior is worse than in the test builds?!
Image      | Machine Size  | w/ cache total | w/ cache build | no cache total | no cache build
-----------|---------------|----------------|----------------|----------------|---------------
5.10-noble | Same          | 4m 19s         | 2m 37s         | 13m 27s        | 9m 1s
5.10-noble | 2x RAM        | 3m 31s         | 1m 49s         | 12m 29s        | 9m 12s
5.10-noble | 4x RAM 2x CPU | 4m 7s          | 2m 14s         | 9m 47s         | 6m 9s
6.0-jammy  | Same          | 6m 25s         | 3m 7s          | OOM            | OOM
6.0-jammy  | 2x RAM        | 3m 40s         | 2m 0s          | 13m 57s        | 10m 14s
6.0-jammy  | 4x RAM 2x CPU | 4m 39s         | 2m 39s         | 10m 7s         | 6m 32s
6.0-noble  | Same          | OOM            | OOM            | OOM            | OOM
6.0-noble  | 2x RAM        | OOM            | OOM            | OOM            | OOM
6.0-noble  | 4x RAM 2x CPU | 6m 12s         | 4m 10s         | 11m 6s         | 7m 21s

The only notable change when I moved our app to the Swift 6 compiler concerns our 3 executable targets, which Swift was throwing errors about (e.g. using @testable import on an executable target). What I did is add 3 new .executableTarget targets, turn the existing targets into plain .target library targets, and have each new .executableTarget contain only ~5 lines that call the original target's entry point.
This way we can still @testable import the original targets. A minimal sketch of that layout follows.
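For illustration, a minimal Package.swift sketch of that layout, reusing the MyLib/MyExec names that show up in the build logs below (the real manifest, products, and dependencies are of course more involved, so treat the names here as placeholders):

// swift-tools-version:6.0
// Sketch only: target names are illustrative, not the project's real manifest.
import PackageDescription

let package = Package(
    name: "my-app",
    targets: [
        // The original executable code, now a plain library target so tests can
        // `@testable import MyLib`.
        .target(name: "MyLib"),
        // A thin executable wrapper; its Entrypoint.swift just calls MyLib's entry point.
        .executableTarget(name: "MyExec", dependencies: ["MyLib"]),
        // Tests depend on the library target, not the executable.
        .testTarget(name: "MyLibTests", dependencies: ["MyLib"]),
    ]
)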

You may ask: "Didn't you say you needed an 8x larger machine to run the tests? How come this time they ran even on the same machine as you had before when using swift:5.10-noble?!"
That's a good question.
I tried multiple times yesterday, the results of which are still available in GitHub Actions. I was even posting the results live in a thread on the Vapor Discord server. I also sent a sample of such failing logs to @grynspan.
I also checked that the swift:6.0-noble images from yesterday and today are the same (matching hashes).
There haven't been any other changes to the project since yesterday, so I'm not exactly sure why the tests are no longer getting stuck today. I have no complaints about that, though the deployment CI does still get stuck.

So:
Yesterday I reported that there seemed to be 2 problems: a general build-time regression, and a massive regression when not using --disable-swift-testing. Today I'm unable to reproduce the latter.
Something's really up, and it definitely isn't the Swift code, since that hasn't changed.
I don't think it's the AWS machines we use either; those are just standard EC2 instances.
The only thing I can think of is our cache usage. Maybe Swift 6 was using Swift 5's caches and didn't like that at all. But even then, I remember testing both with and without cache. I've set the CI to disable caching when you rerun a job, so it was trivial to test both scenarios.

Worth mentioning: when the build is stuck, I consistently see a sequence of logs like this near the end, containing Write Objects.LinkFileList:

[6944/6948] Wrapping AST for MyLib for debugging
[6946/6950] Compiling MyExec Entrypoint.swift
[6947/6950] Emitting module MyExec
[6948/6951] Wrapping AST for MyExec for debugging
[6949/6951] Write Objects.LinkFileList

@gwynne (Contributor) commented Sep 20, 2024

This almost starts to sound like a recurrence of the infamous linker RAM usage problem due to huge command lines with repeated libraries. @al45tair Is there any chance we're still failing to deduplicate linker command lines fully?

@MahdiBM changed the title from "Absolutely massive regressions on swift:6.0-noble" to "Significant build regressions on swift:6.0-noble compared to 5.10-noble" on Sep 20, 2024
@finagolfin (Contributor) commented:

Is there any chance we're still failing to deduplicate linker command lines fully?

Yes, I pinged you on that last month, but never got a response. A fix was just merged into release/6.0 in #76504, so it will not be released until 6.0.2 or 6.0.3.

@jmschonfeld or @shahmishal, can that fix be prioritized to get into the next patch release?

@finagolfin (Contributor) commented:

@MahdiBM, thanks for all the build info. Do you do any CI builds of the 6.0 snapshot toolchains before the final release? That would help find and stop build regressions like this when they happen, rather than being surprised on the final release.

If you can, I'd like to know how an earlier 6.0 July 19 snapshot toolchain build for jammy does on these same CI runs of yours. That might help figure out the regression, particularly if you compare it to the next July 21 build of the 6.0 toolchain.

@MahdiBM (Author) commented Sep 20, 2024

@MahdiBM, thanks for all the build info. Do you do any CI builds of the 6.0 snapshot toolchains before the final release? That would help find and stop build regressions like this when they happen, rather than being surprised on the final release.

@finagolfin this is an "executable" work project, not a public library, so we don't test on multiple Swift versions.
I think it would be possible for us though to do such a thing on the next nightly images. Not a bad idea to catch these kinds of issues.
The only question is how reliable the nightly images are. Don't they have assertions and such enabled, which makes the build slower? How can I trust the results? Do I need to run current-nightly and next-ver-nightly and compare those?
I can set up a weekly job perhaps 🤔

If you can, I'd like to know how an earlier 6.0 July 19 snapshot toolchain build for jammy does on these same CI runs of yours. That might help figure out the regression, particularly if you compare it to the next July 21 build of the 6.0 toolchain.

We just use Docker images in CI. To be clear, the image names above are the exact Docker image names we use (I've added this explanation to the comment). I haven't tried or figured out manually using nightly images, although I imagine I could just use swiftly to set up the machine with a specific nightly toolchain with few problems. It would make the different benchmarks diverge a bit in terms of environment/setup, though.
Preferably, I'd just try a nightly image if you think that'll be helpful; just let me know the exact nightly image (the image identifier/hash and tag?).
Or I can try a newer 6.0 nightly image that does contain the linker RAM usage fix, if one already exists.

@MahdiBM (Author) commented Sep 20, 2024

Another visible issue is the noble vs jammy difference ... I don't think I could have caught that even if we were running CIs on nightly images, considering Swift 6 just very recently got a noble image.

Not sure where that comes from. Any ideas?

@MahdiBM (Author) commented Sep 20, 2024

I would guess that even on 5.10, the jammy images would behave better than the noble ones, though I haven't tested that.

@gwynne (Contributor) commented Sep 20, 2024

My guess is that the difference comes from the updated glibc in noble; when going from bionic to jammy we saw a significant improvement in malloc behavior for that reason, so I wouldn't be too surprised if noble regressed some.

@finagolfin (Contributor) commented:

The only question is how reliable the nightly images are. Don't they have assertions and such enabled, which makes the build slower? How can I trust the results?

I don't know. In my Android CI, the latest Swift release builds the same code about 70-80% faster than the development snapshot toolchains. But you'd be looking for regressions in relative build time, so those branch differences shouldn't matter.

Do I need to run current-nightly and next-ver-nightly and compare those?

I'd simply build with the snapshots of the next release, e.g. 6.1 once that's branched, and look for regressions in build time with the almost-daily snapshots.

I can set up a weekly job perhaps

The CI tags snapshots and provides official builds a couple times a week: I'd set it up to run whenever one of those drops.

Preferably, I'd just try a nightly image if you think that'll be helpful; just let me know the exact nightly image (the image identifier/hash and tag?).

I don't use Docker, so don't know anything about its identifiers, but I presume the 6.0 snapshot tag dates I listed should identify the right images.

Or I can try a newer 6.0 nightly image that does contain the linker RAM usage fix, if one already exists.

Not yet. The fix was added to trunk a couple weeks ago, so you could try the Sep. 4 or latest 6.1 snapshot build with it and compare to the Aug. 29 build without it.

You may also want to talk to @ktoso and the SSWG about what kind of toolchain benchmarking exists to catch these issues on linux and what needs to be done to either start or augment it.

@tbkka (Contributor) commented Sep 20, 2024

If you can log into the build machine and watch memory usage with top, it should be a lot clearer what's going on. There's a big difference between linker and compiler, as noted above. There's also a big difference between "the compiler uses too much memory" and "the build system is running too many compilers at the same time."
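For example, a rough sketch along these lines (assuming bash and procps are available in the container, and using the build command from earlier in this thread; adapt as needed) would show which processes are eating the memory: many large swift-frontend processes point at too much compile parallelism, while one huge ld/lld/clang process points at the link step.

# Rough monitoring sketch: run the build in the background and sample memory usage.
swift build --build-tests > build.log 2>&1 &
build_pid=$!
while kill -0 "$build_pid" 2>/dev/null; do
    date
    # Top memory consumers by resident set size (RSS, in KB).
    ps -eo rss,comm --sort=-rss | head -n 12
    sleep 5
done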

@MahdiBM (Author) commented Sep 20, 2024

@tbkka I can SSH into the containers since RunsOn provides such a feature, but as you already noticed, I'm not well-versed in compiler/build-system internals. I can use top, but I'm not sure what to look for or how to draw conclusions about where the problem is.

It does appear @gwynne was on point though, about linker issues.

@MahdiBM (Author) commented Sep 20, 2024

You may also want to talk to @ktoso and the SSWG about what kind of toolchain benchmarking exists to catch these issues on linux and what needs to be done to either start or augment it.

@finagolfin I imagine the SSWG is already aware of these long-standing problems and I expect they have already communicated their concerns over the past few years; they just probably haven't managed to get this higher on the Swift team's priority list. I've seen some discussions in public places.

Even if there were no regressions, Swift's build system looks to me - purely from a user's perspective with no knowledge of the inner workings - pretty behind (dare I say, bad), and one of the biggest pain points of the language. We've just gotten used to our M-series devices building things fast enough before we get way too bored.

That said, I'm open to helping benchmark things. I think one problem is that we need a real, big, and messy project, like most corporate projects are, so we can test things in realistic environments.
There is Vapor's penny-bot project, which shouldn't be a bad start though: not small, and with a fair amount of complexity.

@finagolfin (Contributor) commented:

@MahdiBM, I have submitted that linker fix for the next 6.0.1 patch release, but the branch managers would like some confirmation from you that this is fixed in trunk first. Specifically, you should compare the Aug. 29 trunk 6.1 snapshot build from last month to the Sep. 4 or subsequent trunk builds.

@finagolfin (Contributor) commented:

Moving the thread about benchmarking the linker fix here, since it adds nothing to the review of that pull. I was off the internet for the last seven hours, so only seeing your messages now.

Complains about CompilerPluginSupport or whatever

Maybe you can give some error info on that.

there was a recent release and our dependencies were not up to date

Ideally, you'd build against the same commit of your codebase as the test runs you measured above.

.build/aarch64-unknown-linux-gnu/debug/CNIODarwin-tool.build/module.modulemap:1:8: error: redefinition of module 'CNIODarwin'

Hmm, this is building inside the linux image? A quick fix might be to remove that CNIODarwin dependency from the targets you're building in the NIO package manifest, as it is unused on linux.

However, this seems entirely unrelated to the toolchain used: I'd try first to build the same working commit of your codebase that you used for the test runs you measured above.

@MahdiBM (Author) commented Sep 21, 2024

@finagolfin

Ideally, you'd build against the same commit of your codebase as the test runs you measured above.

That's not possible for multiple reasons, such as the fact that I normally use Swift Docker images, but here I need to install specific toolchains, which means I need to use the Ubuntu image.

Hmm, this is building inside the linux image?

Yes. That's how GitHub Actions works. (ubuntu jammy)

I'd try first to build the same working commit of your codebase that you used for the test runs you measured above.

Not sure how that'd be helpful. Current commit is close enough though.

Those tests above were made in a different environment (Swift Docker images, release images only), so while I trust that you know better than me about this stuff, I don't understand how you're going to be able to properly compare the numbers considering I had to make some adjustments.

@finagolfin (Contributor) commented:

Not sure how that'd be helpful.

I figure you know it builds to completion at least, without hitting all these build errors.

Those tests above were made in a different environment (Swift Docker images, release images only)

There are Swift Docker images for all these snapshot toolchains too, why not use those?

Basically, you can't compare snapshot build timings if you keep hitting compilation errors, so I'm saying you should try to reproduce the known-good environment where you measured the runs above, but only change one ingredient, ie swapping the 6.0 Docker image for the snapshot Docker images.

If these build errors are happening because other factors, like your Swift project, are changing too, that should fix it. If it still doesn't build, I suggest you use the 6.0 snapshot toolchain tags given, as they will be most similar to the 6.0 release, and show any build error output for those.

If you can't get anything but the final release builds to compile your codebase, you're stuck simply observing the build with some process monitor or file timestamps. If linking seems to be the problem, you could get the linker command from the verbose -v output and manually deduplicate the linker flags to see how much of a difference it makes.

I took a look at some linux CI build times of swift-docc between the 6.0.0 and 6.0 branches, ie with and without the linker fix, and didn't see a big difference. I don't know if that's because they have a lot of RAM, unlike your baseline config that showed the most regression.

@MahdiBM (Author) commented Sep 21, 2024

@finagolfin But how do I figure out the hash of the exact image that relates to the specific main snapshots?

I tried a bunch to fix the NIO errors with no luck: https://github.com/MahdiBM/swift-nio/tree/mmbm-no-cniodarwin-on-linux

This is not normal, and not a fault of the project. This is not the first time I've built the app in a Linux environment.

@MahdiBM (Author) commented Sep 21, 2024

It also complained about CNIOWASI, as well as CNIOLinux.
I deleted those alongside CNIOWindows and now I'm getting this:

error: exhausted attempts to resolve the dependencies graph, with the following dependencies unresolved:
* 'swift-nio' from https://github.com/mahdibm/swift-nio.git

@finagolfin (Contributor) commented:

how do I figure out the hash of the exact image that relates to the specific main snapshots?

Hmm, looking it up now, I guess you can't. As I said, I don't use Docker, so I was unaware of that.

My suggestion is that you get the 6.0 Docker image and make sure it builds some known-stable commit of your codebase. Then, use that same docker image to download the 6.0 snapshots given above, like the one I linked yesterday, and after unpacking them in the Docker image, use them to build your code instead. That way, you have a known-good Docker environment and source commit, with the only difference being the Swift 6.0 toolchain build date.

The Docker files almost never change, so only swapping out the toolchain used inside the 6.0 image should minimize the differences.

@MahdiBM (Author) commented Sep 21, 2024

The Docker files almost never change, so only swapping out the toolchain used inside the 6.0 image should minimize the differences.

@finagolfin Good idea, didn't think of that, but still didn't work.

For reference:

CI File
name: test build

on:
  pull_request: { types: [opened, reopened, synchronize] }

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  unit-tests:
    strategy:
      fail-fast: false
      matrix:
        snapshot:
          - swift-6.0-DEVELOPMENT-SNAPSHOT-2024-07-19-a
          - swift-6.0-DEVELOPMENT-SNAPSHOT-2024-07-21-a
        machine:
          - name: "medium" # 16gb 8cpu c7i-flex
            arch: amd64
          - name: "large" # 32gb 16cpu c7i-flex
            arch: amd64
          - name: "huge-stable-arm" # 128gb 64cpu bare metal c7g
            arch: arm64

    runs-on:
      labels:
        - runs-on
        - runner=${{ matrix.machine.name }}
        - run-id=${{ github.run_id }}

    timeout-minutes: 60

    steps:
      - name: Check out ${{ github.event.repository.name }}
        uses: actions/checkout@v4

      - name: Build Docker Image
        run: |
          docker build \
            --network=host \
            --memory=128g \
            -f SwiftDockerfile \
            -t custom-swift:1 . \
            --build-arg DOWNLOAD_DIR="${{ matrix.snapshot }}" \
            --build-arg TARGETARCH="${{ matrix.machine.arch }}"

      - name: Prepare
        run: |
          docker run --name swift-container custom-swift:1 bash -c 'apt-get update -y && apt-get install -y libjemalloc-dev && git config --global --add url."https://${{ secrets.GH_PAT }}@github.com/".insteadOf "https://github.com/" && git clone https://github.com/${{ github.repository }} && cd ${{ github.event.repository.name }} && git checkout ${{ github.head_ref }} && swift package resolve --force-resolved-versions --skip-update'
          docker commit swift-container prepared-container:1

      - name: Build ${{ matrix.snapshot }}
        run: |
          docker run prepared-container:1 bash -c 'cd ${{ github.event.repository.name }} && swift build --build-tests'
Modified Dockerfile
FROM ubuntu:22.04 AS base
LABEL maintainer="Swift Infrastructure <[email protected]>"
LABEL description="Docker Container for the Swift programming language"

RUN export DEBIAN_FRONTEND=noninteractive DEBCONF_NONINTERACTIVE_SEEN=true && apt-get -q update && \
    apt-get -q install -y \
    binutils \
    git \
    gnupg2 \
    libc6-dev \
    libcurl4-openssl-dev \
    libedit2 \
    libgcc-11-dev \
    libpython3-dev \
    libsqlite3-0 \
    libstdc++-11-dev \
    libxml2-dev \
    libz3-dev \
    pkg-config \
    tzdata \
    zip \
    zlib1g-dev \
    && rm -r /var/lib/apt/lists/*

# Everything up to here should cache nicely between Swift versions, assuming dev dependencies change little

# gpg --keyid-format LONG -k FAF6989E1BC16FEA
# pub   rsa4096/FAF6989E1BC16FEA 2019-11-07 [SC] [expires: 2021-11-06]
#       8A7495662C3CD4AE18D95637FAF6989E1BC16FEA
# uid                 [ unknown] Swift Automatic Signing Key #3 <[email protected]>
ARG SWIFT_SIGNING_KEY=8A7495662C3CD4AE18D95637FAF6989E1BC16FEA
ARG SWIFT_PLATFORM=ubuntu
ARG OS_MAJOR_VER=22
ARG OS_MIN_VER=04
ARG SWIFT_WEBROOT=https://download.swift.org/development
ARG DOWNLOAD_DIR

# This is a small trick to enable if/else for arm64 and amd64.
# Because of https://bugs.swift.org/browse/SR-14872 we need adjust tar options.
FROM base AS base-amd64
ARG OS_ARCH_SUFFIX=

FROM base AS base-arm64
ARG OS_ARCH_SUFFIX=-aarch64

FROM base-$TARGETARCH AS final

ARG OS_VER=$SWIFT_PLATFORM$OS_MAJOR_VER.$OS_MIN_VER$OS_ARCH_SUFFIX
ARG PLATFORM_WEBROOT="$SWIFT_WEBROOT/$SWIFT_PLATFORM$OS_MAJOR_VER$OS_MIN_VER$OS_ARCH_SUFFIX"

RUN echo "${PLATFORM_WEBROOT}/latest-build.yml"

ARG download="$DOWNLOAD_DIR-$SWIFT_PLATFORM$OS_MAJOR_VER.$OS_MIN_VER$OS_ARCH_SUFFIX.tar.gz"

RUN echo "DOWNLOAD IS THIS: ${download} ; ${DOWNLOAD_DIR}"

RUN set -e; \
    # - Grab curl here so we cache better up above
    export DEBIAN_FRONTEND=noninteractive \
    && apt-get -q update && apt-get -q install -y curl && rm -rf /var/lib/apt/lists/* \
    # - Latest Toolchain info
    && echo $DOWNLOAD_DIR > .swift_tag \
    # - Download the GPG keys, Swift toolchain, and toolchain signature, and verify.
    && export GNUPGHOME="$(mktemp -d)" \
    && curl -fsSL ${PLATFORM_WEBROOT}/${DOWNLOAD_DIR}/${download} -o latest_toolchain.tar.gz \
        ${PLATFORM_WEBROOT}/${DOWNLOAD_DIR}/${download}.sig -o latest_toolchain.tar.gz.sig \
    && curl -fSsL https://swift.org/keys/all-keys.asc | gpg --import -  \
    && gpg --batch --verify latest_toolchain.tar.gz.sig latest_toolchain.tar.gz \
    # - Unpack the toolchain, set libs permissions, and clean up.
    && tar -xzf latest_toolchain.tar.gz --directory / --strip-components=1 \
    && chmod -R o+r /usr/lib/swift \
    && rm -rf "$GNUPGHOME" latest_toolchain.tar.gz.sig latest_toolchain.tar.gz \
    && apt-get purge --auto-remove -y curl

# Print Installed Swift Version
RUN swift --version

RUN echo "[ -n \"\${TERM:-}\" -a -r /etc/motd ] && cat /etc/motd" >> /etc/bash.bashrc; \
    ( \
      printf "################################################################\n"; \
      printf "# %-60s #\n" ""; \
      printf "# %-60s #\n" "Swift Nightly Docker Image"; \
      printf "# %-60s #\n" "Tag: $(cat .swift_tag)"; \
      printf "# %-60s #\n" ""; \
      printf "################################################################\n" \
    ) > /etc/motd

@MahdiBM (Author) commented Sep 21, 2024

To be clear, by "didn't work" I mean that I'm getting exactly the same errors.

@MahdiBM (Author) commented Sep 21, 2024

Tried the 6.0 snapshots; they complain about the usage of swiftLanguageMode in dependencies.

@finagolfin (Contributor) commented:

Does simply building your code with the Swift 6.0 release still work? If so, I'd try to instrument the build to figure out the bottlenecks, as I suggested before. In particular, if you're building a large executable at the end, that might be taking the most time.

As I said yesterday, you could try dumping all the commands that swift build is running with the -v flag, check if the final linker command is repeating -lFoundationEssentials -l_FoundationICU over and over again, then manually run that link alone twice to measure the timing: once exactly as it was to get the baseline, then a second time with those library flags de-duplicated to see if that helps. Ideally, you'd do this on the same lower-RAM hardware where you were seeing the largest build-time regressions before.
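Something like this rough sketch, assuming bash and the GNU time package are available in the container (the grep pattern for extracting the link command is only a guess and will likely need adjusting for your product name, and shell quoting in the extracted command may need fixing by hand):

# 1. Capture every command swift build runs, including the final link.
swift build -v > build-verbose.log 2>&1

# 2. Pull out the last command that writes into the debug build directory; this is
#    approximate -- inspect the log and adjust so you grab the actual link invocation.
grep -- '-o .build/.*/debug/' build-verbose.log | tail -n 1 > link-original.sh

# 3. Make a copy with repeated -lFoo flags removed (keeping only the first occurrence of each).
awk '{ for (i = 1; i <= NF; i++) { if ($i ~ /^-l/ && seen[$i]++) continue; printf "%s ", $i }; print "" }' \
    link-original.sh > link-dedup.sh

# 4. Run both and compare wall time and "Maximum resident set size"
#    (needs GNU time: apt-get install -y time).
/usr/bin/time -v bash link-original.sh
/usr/bin/time -v bash link-dedup.sh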

@MahdiBM (Author) commented Sep 22, 2024

Of course there is no problem on the released Swift 6 😅. This whole issue is about CI slowness. We run CI on push, PR, etc... and they've all passed.

@MahdiBM (Author) commented Sep 22, 2024

@finagolfin
Finally got the build working with some dependency downgrades.

        snapshot:
          - swift-6.0-DEVELOPMENT-SNAPSHOT-2024-07-19-a
          - swift-6.0-DEVELOPMENT-SNAPSHOT-2024-07-21-a
        machine:
          - name: "medium" # 16gb 8cpu c7i-flex
            arch: amd64
          - name: "large" # 32gb 16cpu c7i-flex
            arch: amd64
          - name: "huge-stable-arm" # 128gb 64cpu bare metal c7g
            arch: arm64

The results are only marginally different:

[Screenshot of the CI job durations for the two snapshots across the three machines, 2024-09-22 5:08 PM]

Ignore that it says "unit-tests"; it's only a swift build --build-tests, like in the shared files above.
Also, the differences are really negligible, especially when you look at the build step's times.

@finagolfin (Contributor) commented:

It's not the same low-RAM scenarios though, right? You initially gave data for 8 and 16 GB RAM servers that showed regressions, whereas this is from 16, 32, and 128 GB runners, with the smallest showing the biggest regression albeit less than with the final 6.0 release.

Why don't you wait for the next 6.0 snapshot toolchain with the linker patch? That should be a good comparison to the 6.0 release.

@MahdiBM (Author) commented Sep 22, 2024

@finagolfin Sorry, you're right; I just wrote my conclusions here before looking at the smallest machine.
Trying on an 8GB RAM 4-CPU machine now.

For the record, the 07-19 snapshot on the "medium" machine took 9m 46s, and the 07-21 snapshot took 10m 29s.

@MahdiBM (Author) commented Sep 22, 2024

@finagolfin This is not exactly the same scenario as before, so these numbers are not comparable to the older numbers.
We should compare them to each other.

With a smaller machine (half the "medium" machine), 07-19 takes 17m 3s and 07-21 takes 18m 5s (not the whole job run, only the build step).
So yeah, this does indicate part of the regression, although definitely not all of it.

In the comparisons above, 6.0-jammy was 22% slower than 5.10-noble on the same machine (look at the Tests CI build-step time w/ no cache, which uses the same command but has to resolve the packages itself: 9m 14s vs 7m 34s).
Here it's a 7% difference on the same machine (9m 46s vs 10m 29s).

So maybe a third of the regression? I guess there can be other factors here as well (like how snapshots behave vs release builds), so let me know what you think.

@MahdiBM (Author) commented Sep 22, 2024

You initially gave data for 8 and 16 GB RAM servers that showed regressions, whereas this is from 16, 32, and 128 GB runners,

The medium machine is the same as the "Same" machines in "Test CI" section.
There are 3 machines in the image, not 2.

@finagolfin (Contributor) commented:

It seems to indicate the Foundation re-core in July definitely contributed to the slowdown, but wasn't all of it. Are you able to also compare the trunk snapshot toolchains from a couple weeks ago? That might be a better benchmark.

The medium machine is the same as the "Same" machines in "Test CI" section.
There are 3 machines in the image, not 2.

I was talking about your original test and deployment runs from a couple days ago; the smallest from each were 8 and 16 GB.

@MahdiBM (Author) commented Sep 22, 2024

Are you able to also compare the trunk snapshot toolchains from a couple weeks ago?

Just let me know what the snapshot names are.

I was talking about #76555 (comment); the smallest from each were 8 and 16 GB.

The deployment CI uses a smaller base machine because it only builds one specific product, not everything including the tests.
The comparisons I mentioned above are the closest we have, and should be reliable enough considering there's a 3x difference.

@finagolfin (Contributor) commented:

Just let me know what the snapshot names are.

See above: "you could try the Sep. 4 or latest 6.1 snapshot build with it and compare to the Aug. 29 build without it."

@MahdiBM (Author) commented Sep 22, 2024

@finagolfin Ah, I did try those already; the error situation was even less solvable.

@MahdiBM (Author) commented Sep 22, 2024

It was the CNIO errors. I couldn't get them fixed, although I made a decent effort.

@MahdiBM (Author) commented Sep 24, 2024

First victim:
Heroku can't build Swift apps that it previously could: vapor-community/heroku-buildpack#78 (comment)

Heroku is a pretty popular platform for server-side Swift apps, considering it once had a free tier. This problem will have a decent negative impact.

@MahdiBM (Author) commented Sep 30, 2024

Ever since the last comment, I've seen 2 more users encountering this problem on Heroku. So a total of 3, and that's only those who bothered to ask about their weird build failures before reverting.

The interesting bit is that one person was consistently having problems on Swift 6 jammy, but their project would consistently build fine on Swift 6 noble, in Heroku buildpack builders.
This is unexpected to me, but it shows that the regressions are even more complicated and have happened on multiple fronts.

So:

  • Swift on noble, even on 5.10, has build-time regressions.
  • Swift 6 itself contains some regressions as well.
  • jammy / noble contain their own regressions as well?!

I was unable to find the specs of Heroku's builders (I also asked around a bit, no luck) to see if anything changed between their Ubuntu 22 and 24 stacks.
