gcc: update to 11.2.0 #9088

Closed
wants to merge 1 commit into from
Conversation

lazka
Member

@lazka lazka commented Jul 4, 2021

Continuing from #8320

@lazka lazka mentioned this pull request Jul 4, 2021
@oscarfv
Contributor

oscarfv commented Jul 4, 2021

I don't see the i386 PEH fix here. Is that intentional?

@lazka
Member Author

lazka commented Jul 4, 2021

> I don't see the i386 PEH fix here. Is that intentional?

Are you sure? it's rebased on master.

@oscarfv
Contributor

oscarfv commented Jul 4, 2021

> > I don't see the i386 PEH fix here. Is that intentional?
>
> Are you sure? it's rebased on master.

I missed that bit. Sorry for the noise.

@oscarfv
Contributor

oscarfv commented Jul 4, 2021

My build failed at the usual point, so it would be a surprise if this PR succeeds.

I propose either go through the path @revelator suggests or drop Ada on i686 altogether. It is a dying platform that attracts less and less attention from upstream and it is very likely that future releases will be problematic too.

There are about a dozen packages that depend on Ada, mostly developer tools. It doesn't look as if dropping Ada on i686 would cause any major problem to our users or to the MSYS2 ecosystem.

@mati865
Collaborator

mati865 commented Jul 4, 2021

It's not only about Ada. We should not ship a compiler with broken exception handling.

@oscarfv
Contributor

oscarfv commented Jul 4, 2021

> It's not only about Ada. We should not ship a compiler with broken exception handling.

But again: is there proof that the exception handling is broken beyond Ada (supposing that the broken thing is the EH)? The Ada compiler/runtime might be using it in a specific way that does not affect C/C++.

And are we sure that the broken thing is in 11 but not in 10? Because the fact that the bootstrap completes when initiated with gcc-10 sjlj points to gcc-10's DWARF EH as the culprit. Let's remember that gcc-11 successfully compiles itself during the bootstrap process, so we have an indication that 10 is defective while 11 is fine.

It would be interesting to bootstrap 11 from @revelator's 11 packages. If it succeeds, IMO we would have a strong indication that whatever is broken in 10 was fixed on 11.

@lazka
Member Author

lazka commented Jul 4, 2021

> It would be interesting to bootstrap 11 from @revelator's 11 packages. If it succeeds, IMO we would have a strong indication that whatever is broken in 10 was fixed on 11.

I can try that if I get the packages

@revelator
Contributor

uploading now :)

@revelator
Contributor

and he is right: if it turns out we can bootstrap it again using my packages, it indicates that something might have been broken in the earlier version (not the first time either, it does happen).

@revelator
Contributor

revelator commented Jul 4, 2021

https://sourceforge.net/projects/cbadvanced/files/bootstrap/

give it a little if all the packages are not there yet, sourceforge can be a bit slow at times to show newly uploaded items.

i hope it helps.

@lazka
Member Author

lazka commented Jul 4, 2021

thanks, I'll try tomorrow.

@oscarfv
Contributor

oscarfv commented Jul 4, 2021

Just tried building PR #8320 with @revelator's packages. It fails earlier (stage 1) on Ada too. @lazka: please go ahead with your build, maybe my setup is botched.

echo timestamp > s-i386-bt
cp -p ../../gcc-11.1.0/gcc/ada/sinfo.ads ../../gcc-11.1.0/gcc/ada/sinfo.adb ../../gcc-11.1.0/gcc/ada/xsinfo.adb ../../gcc-11.1.0/gcc/ada/csinfo.adb ada/bldtools/sinfo
(cd ada/bldtools/einfo; gnatmake -q xeinfo ; ./xeinfo einfo.h )
mkdir -p ada/bldtools/nmake
(cd ada/bldtools/sinfo; gnatmake -q xsinfo ; ./xsinfo sinfo.h )
rm -f ada/bldtools/nmake/sinfo.ads ada/bldtools/nmake/nmake.adt ada/bldtools/nmake/xnmake.adb ada/bldtools/nmake/xutil.ads ada/bldtools/nmake/xutil.adb
mkdir -p ada/bldtools/treeprs
cp -p ../../gcc-11.1.0/gcc/ada/sinfo.ads ../../gcc-11.1.0/gcc/ada/nmake.adt ../../gcc-11.1.0/gcc/ada/xnmake.adb ../../gcc-11.1.0/gcc/ada/xutil.ads ../../gcc-11.1.0/gcc/ada/xutil.adb ada/bldtools/nmake
rm -f ada/bldtools/treeprs/treeprs.adt ada/bldtools/treeprs/sinfo.ads ada/bldtools/treeprs/xtreeprs.adb
mkdir -p ada/bldtools/snamest
(cd ada/bldtools/nmake; gnatmake -q xnmake ; ./xnmake -b nmake.adb ; ./xnmake -s nmake.ads)
rm -f ada/bldtools/snamest/snames.ads-tmpl ada/bldtools/snamest/snames.adb-tmpl ada/bldtools/snamest/snames.h-tmpl ada/bldtools/snamest/xsnamest.adb ada/bldtools/snamest/xutil.ads ada/bldtools/snamest/xutil.adb
cp -p ../../gcc-11.1.0/gcc/ada/treeprs.adt ../../gcc-11.1.0/gcc/ada/sinfo.ads ../../gcc-11.1.0/gcc/ada/xtreeprs.adb ada/bldtools/treeprs
cp -p ../../gcc-11.1.0/gcc/ada/snames.ads-tmpl ../../gcc-11.1.0/gcc/ada/snames.adb-tmpl ../../gcc-11.1.0/gcc/ada/snames.h-tmpl ../../gcc-11.1.0/gcc/ada/xsnamest.adb ../../gcc-11.1.0/gcc/ada/xutil.ads ../../gcc-11.1.0/gcc/ada/xutil.adb ada/bldtools/snamest
touch ada/GNAT_DATE
(cd ada/bldtools/treeprs; gnatmake -q xtreeprs ; ./xtreeprs treeprs.ads )
(cd ada/bldtools/snamest; gnatmake -q xsnamest ; ./xsnamest )
mkdir -p ada/libgnat
mkdir -p ada/libgnat
cp -p ../../gcc-11.1.0/gcc/ada/libgnat/s-excmac__gcc.ads ada/libgnat/s-excmac.ads
cp -p ../../gcc-11.1.0/gcc/ada/libgnat/s-excmac__gcc.adb ada/libgnat/s-excmac.adb
echo "pragma Style_Checks (Off);" >tmp-sdefault.adb
gnatmake: "xsinfo.adb" compilation error
echo "with Osint; use Osint;" >>tmp-sdefault.adb
cp ../../gcc-11.1.0/gcc/gcc-ar.c gcc-nm.c
/bin/sh: line 1: ./xsinfo: No such file or directory
make[3]: *** [../../gcc-11.1.0/gcc/ada/Make-generated.in:45: ada/sinfo.h] Error 127
make[3]: *** Waiting for unfinished jobs....

@revelator
Contributor

hmm, does indeed sound like something might have broken on your end :S but just to be sure I'll try setting up a local version to bootstrap from.

@revelator
Contributor

local bootstrap also fails here in the exact same spot, so it seems this is indeed an upstream bug :S

echo "pragma Style_Checks (Off);" >tmp-sdefault.adb
echo "with Osint; use Osint;" >>tmp-sdefault.adb
cp ../../gcc-11.1.0/gcc/gcc-ar.c gcc-nm.c
cp ../../gcc-11.1.0/gcc/gcc-ar.c gcc-ranlib.c
echo "package body Sdefault is" >>tmp-sdefault.adb
rm -f mm_malloc.h
TARGET_CPU_DEFAULT="" \
HEADERS="auto-host.h ansidecl.h config/i386/xm-mingw32.h" DEFINES="USED_FOR_TARGET " \
/bin/sh ../../gcc-11.1.0/gcc/mkconfig.sh tconfig.h
gnatmake: "xsinfo.adb" compilation error
echo "   S0 : constant String := \"/mingw32/\";" >>tmp-sdefault.adb
cat ../../gcc-11.1.0/gcc/config/i386/gmm_malloc.h > mm_malloc.h
/bin/sh: line 1: ./xsinfo: No such file or directory
(echo "@set version-GCC 11.1.0"; \
 if [ "" = "experimental" ]; \
 then echo "@set DEVELOPMENT"; \
 else echo "@clear DEVELOPMENT"; \
 fi) > gcc-vers.texiT
make[3]: *** [../../gcc-11.1.0/gcc/ada/Make-generated.in:45: ada/sinfo.h] Error 127
make[3]: *** Waiting for unfinished jobs....
CMD //c echo @set srcdir `echo /D/mingw-w64-gcc/src/build-i686-w64-mingw32/gcc/../../gcc-11.1.0/gcc | sed -e 's|\\([@{}]\\)|@\\1|g'` >> gcc-vers.texiT

@oscarfv
Contributor

oscarfv commented Jul 5, 2021

Tried make -j1 to rule out a Makefile race. Failed at xsinfo.adb too.

Then compiled a "Hello world" Ada program to check that the gcc 11 Ada toolset provided by @revelator's packages is minimally functional. It worked fine.

Then tried building today's gcc 11 snapshot. Failed at xsinfo.adb too.

Some web searching hinted at a filesystem related problem (file path length exceeded some limit). Launched the build in /d/g directory. Same result.

@oscarfv
Contributor

oscarfv commented Jul 5, 2021

OTOH a simple /path/to/src/configure --disable-multilib && make on a MINGW32 shell goes along happily until stage 2.

Maybe our makepkg-mingw and PKGBUILD are a bit over-engineered?

Configuring like this:

 ../gcc-11-20210703/configure --disable-multilib --host=i686-w64-mingw32 --target=i686-w64-mingw32 --build=i686-w64-mingw32 --enable-languages=c,lto,c++,fortran,ada,jit --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers --enable-threads=posix --enable-shared --enable-static

triggers the xsinfo.adb error. Looks like an opportunity for a command-line bisection ;-)
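
A minimal sketch of how such a command-line bisection could be automated, assuming a hypothetical `still_fails` callback that runs configure and make with the given flags and reports whether the xsinfo.adb error still shows up. The names `Flags`, `reduce_flags` and `still_fails` are illustrative only. It uses a plain one-flag-at-a-time reduction rather than a true binary search, since with a build this slow correctness of the result matters more than the number of rounds:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Flags = std::vector<std::string>;

// Greedy reduction: drop each flag in turn and keep the drop whenever the
// failure still reproduces without it. `still_fails` is a hypothetical
// callback; in practice it would run configure && make with these flags.
Flags reduce_flags(Flags flags,
                   const std::function<bool(const Flags&)>& still_fails) {
    assert(still_fails);  // the callback must be callable
    for (std::size_t i = 0; i < flags.size();) {
        Flags candidate = flags;
        candidate.erase(candidate.begin() + static_cast<std::ptrdiff_t>(i));
        if (still_fails(candidate)) {
            // Flag i is irrelevant; index i now refers to the next flag.
            flags = std::move(candidate);
        } else {
            // Flag i is needed to reproduce the failure; keep it.
            ++i;
        }
    }
    return flags;
}
```

Whatever remains after the loop is a set of flags where removing any single one makes the failure go away, which is the kind of short command line that is reasonable to put in an upstream bug report.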

@revelator
Contributor

guess we need to report this upstream, maybe fixed in the next version then :/.
besides the same options oscarfv used, I also tried different -gdwarf versions, sadly to no avail.

@revelator
Contributor

interesting, though odd unless something changed recently in the autotools build scripts.
there was a patch added in 2019 that did away with some duplicated build lines in ada. not sure if this might have triggered it, though I suspect version 10 would also have been affected then.

@revelator
Contributor

there was a similar bug back in gcc-4.8.0, I don't remember how it was fixed unfortunately, though at that time the package was built using mingw.org's old msys shell, which sadly does not work too well with win10. it did have one plus though, as it did not hide some errors, which made finding the bugger a lot easier ugh...

@revelator
Contributor

sorry was gcc-4.1.0 so long ago my memory fades :S the fix was compiling it with the old gcc-3.3.6 sadly this does not apply here as that build was 32 bit only. The culprit was that 4.1.0 produced corrupt code when bootstrapping https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23894

@oscarfv
Contributor

oscarfv commented Jul 5, 2021

> OTOH a simple /path/to/src/configure --disable-multilib && make on a MINGW32 shell goes along happily until stage 2.

Of course it does: Ada is an optional language, so the above command does not build it :-/

Tried configure --disable-multilib --enable-languages=c,ada with and without CPPFLAGS=-D__USE_MINGW_ANSI_STDIO=1. Both ways keep failing at xsinfo.adb. The --disable-multilib bit is just because I know the build will fail without it.

At least we can file a bug report upstream without throwing a screenful of configure options at them.

@revelator
Contributor

Aye

@revelator
Contributor

I commented in the upstream bug report about the new build error with the successful bootstrap.

@revelator
Contributor

https://aur.archlinux.org/packages/gcc-ada-git/

huh arch is also running into problems with ada :S

@mati865
Collaborator

mati865 commented Jul 5, 2021

> https://aur.archlinux.org/packages/gcc-ada-git/
>
> huh arch is also running into problems with ada :S

This was almost 2 years ago, this AUR package uses latest commit again.

@revelator
Contributor

must have been some heavy changes to cause so many problems heh. noticed it now also has preliminary support for C++20 modules via libcody, which still seems to be in a preliminary state. I wonder if the best thing to do would be waiting until the next version of it is out? we could still report bugs upstream to make things happen sooner, unless someone has an epiphany and spots the culprit.

@oscarfv
Contributor

oscarfv commented Jul 5, 2021

Building i686 10.3 PKGBUILD with @revelator's 11.1 packages also fails with

gnatmake: "xsinfo.adb" compilation error

Seems that the problem is in 11.1 itself (either in those packages or upstream, dunno).

@revelator
Contributor

it would seem so indeed :(

@oscarfv
Contributor

oscarfv commented Jul 9, 2021

For the record:

A new build bug was fixed upstream. Applying that patch to the latest GCC 11 snapshot does not fix our problem.

On that bug report, the reporter says "I have bootstrapped GCC 11.1.1 on both {x86_64,i686}-w64-mingw32 and seen no problem so far." The maintainer who responded to our bug report also says that the bootstrap builds fine for him. I'm mystified about why we seem to be the only ones having problems building 11 on i686. One of the things I tried last week was to build 11 without local patches, and it failed the same way.

Maybe the problem is our binutils?

@mati865
Collaborator

mati865 commented Sep 9, 2021

Extracted the zip without making any changes into clean master branch checkout and the build failed with no specific error: gcc-err.txt

@revelator
Contributor

revelator commented Sep 9, 2021

hmm odd, the first error points to this line in Make-lang.in

.adb.o:
	-> mkdir -p $(dir $@) // wtf!!!
	$(CC) -c $(ALL_ADAFLAGS) $(ADA_INCLUDES) $< $(ADA_OUTPUT_OPTION)
	@$(ADA_DEPS)

strangely it builds here, but I bootstrapped my dwarf version using the sjlj compiler to get the missing files installed first.
then I rebuilt the dwarf version using that.

@revelator
Contributor

revelator commented Sep 10, 2021

well here's something mighty interesting: bootstrapping 32 bit dwarf gcc-10.3.0 with 32 bit dwarf gcc-11.2.0 works with ada Oo

EDIT: no still crashes but a bit later on :S

@revelator
Contributor

yep, gnat with dwarf exceptions is broken: compiling gprbuild-bootstrap with the sjlj bootstrapped dwarf version aborts at the first ada source file. Even the simplest ada example will segfault it, so there is something not right with it.

@oscarfv
Contributor

oscarfv commented Sep 18, 2021

I built an Ada-less gcc 11.2.0 with our PKGBUILD and on i686 the most trivial test that throws an exception fails:

int main() {
  try {
    throw 13;
  }
  catch(...) {
    return 1;
  }
  return 0;
}

$ g++ foo.cpp
$ ./a.exe
terminate called after throwing an instance of 'int'
terminate called recursively

Using a gcc built with debug info, I see on _Unwind_RaiseException that it returns here:

      _Unwind_FrameState fs;

      /* Set up fs to describe the FDE for the caller of cur_context.  The
	 first time through the loop, that means __cxa_throw.  */
      code = uw_frame_state_for (&cur_context, &fs);

      if (code == _URC_END_OF_STACK)
	/* Hit end of stack with no handler found.  */
	return _URC_END_OF_STACK;

This causes a call to terminate on the calling function and then another call to terminate. Obviously the next question is what happens in uw_frame_state_for, but I know zilch about this code and can't assign meaning to what I'm seeing.
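
A slightly extended smoke test along the same lines, as a minimal sketch. It covers both the same-frame throw from the test above and a throw that crosses a function boundary, and it installs a terminate handler so that a failure shows up as a distinct exit status. The function names and the exit status 2 are illustrative choices, not part of any existing MSYS2 test:

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>

// Replacement terminate handler (hypothetical helper): report and exit with
// a distinct status so a script can tell "EH broken" from other failures.
[[noreturn]] void report_broken_unwinder() {
    std::fputs("EH smoke test: thrown exception was not caught\n", stderr);
    std::_Exit(2);
}

// Throws from a separate function so that, unless the call is inlined,
// the unwinder has to walk at least one frame to find the handler.
void thrower() { throw 13; }

// Same-frame case, equivalent to the original test.
bool throw_is_caught() {
    try {
        throw 13;
    } catch (...) {
        return true;
    }
    return false;
}

// Cross-frame case: the handler lives in the caller of the throwing function.
bool throw_crosses_frame() {
    try {
        thrower();
    } catch (int value) {
        return value == 13;
    }
    return false;
}

// Installs the handler and runs both checks; returns 0 on success.
int run_eh_smoke_test() {
    std::set_terminate(report_broken_unwinder);
    return (throw_is_caught() && throw_crosses_frame()) ? 0 : 1;
}
```

Calling `run_eh_smoke_test()` from `main` and returning its result gives a binary that exits with 0 on a working toolchain. On a toolchain showing the behaviour described above, the expectation is that the terminate handler runs instead.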

BTW, mingw32 gdb is broken (catch throw runs the program and gdb exits.) Had to use MSYS2 gdb.

@oscarfv
Contributor

oscarfv commented Sep 18, 2021

One more detail: compiling with our clang has the same effect, so the problem is in the gcc support libraries which are also used by clang.

This could be a great bisection project, but bootstrapping gcc takes two hours here, so...

@revelator
Contributor

stack handler broken maybe? hmm, there was a report on the gcc bug list for m68k which seemed to have a similar problem with memory corruption, I think lazka mentioned our problem there.

@jeremyd2019
Member

Weirdly reminiscent of #9091's unwinder issues, though that's LLVM's libunwind rather than libgcc (though the function names look much the same - did they end up both using libunwind or something?)

@revelator
Contributor

does clang even get installed beforehand in a CI build?

@revelator
Contributor

thinking about it, I tried bootstrapping it with a different gcc than ours and hit the same problem (no libunwind present in that build), so that can't be it :/.

@jeremyd2019
Member

No, looking at the code I think it's just that the function names are similar because they're doing similar things

@revelator
Contributor

aye, clang's libunwind uses several unwinders. the main difference compared to the gcc unwinders is that all of clang's unwinders are present in the unwinder library; though it defaults to one specific model, you can actually select a totally different one.

@revelator
Contributor

revelator commented Sep 19, 2021

@oscarfv seems it is crashing in some newer code that was added as far back as gcc-7 in the atomic fde path...

specifically in this function _Unwind_Find_FDE where it returns a null pointer ->

  if (__builtin_expect (!__atomic_load_n (&any_objects_registered,
					  __ATOMIC_RELAXED), 1))
    return NULL;

I'm going to try disabling ATOMIC_FDE_FAST_PATH for mingw builds to see what it gets us.

@jeremyd2019
Member

That check should only be returning NULL if no .eh_frame sections were registered with libgcc (which would indeed mean that nobody would be able to catch an exception). That registration is supposed to happen early in the module load process.
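
To make the described behaviour concrete, here is a toy model of such a registration fast path. It is a simplified sketch, not libgcc's actual code: `FrameRange`, `ToyUnwinderRegistry`, `register_frame` and `find` are made-up names, and the model ignores locking and everything else the real unwinder does:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for one registered .eh_frame range (not libgcc's own type).
struct FrameRange {
    std::uintptr_t begin;
    std::uintptr_t end;  // one past the last covered address
};

// Single-threaded toy registry; only the flag is atomic, as in a fast path
// that wants to skip the search without taking a lock.
class ToyUnwinderRegistry {
public:
    // Corresponds to the registration step at module load.
    void register_frame(FrameRange range) {
        assert(range.begin < range.end);
        ranges_.push_back(range);
        any_registered_.store(true, std::memory_order_relaxed);
    }

    // Returns the range covering pc, or nullptr if none is known.
    const FrameRange* find(std::uintptr_t pc) const {
        // Fast path: nothing was ever registered, so skip the search.
        if (!any_registered_.load(std::memory_order_relaxed)) {
            return nullptr;
        }
        for (const FrameRange& range : ranges_) {
            if (pc >= range.begin && pc < range.end) {
                return &range;
            }
        }
        return nullptr;
    }

private:
    std::vector<FrameRange> ranges_;
    std::atomic<bool> any_registered_{false};
};
```

In this model a lookup before any `register_frame` call always returns `nullptr`, which mirrors the point above: if registration never happens, no frame can be found and no exception can be caught.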

@revelator
Contributor

might be it then :) though something must have broken it afterwards, as it worked in gcc-7 as far as I'm aware.
let's see what disabling it nets us. if it starts working after that, then we actually have something to present to the gcc devs.

@jeremyd2019
Member

I would also try setting a breakpoint/stepping through __gcc_register_frame.

@revelator
Contributor

good point, probably need to build the first dwarf version with this disabled using the sjlj compiler to avoid any nastiness, then I'll try bootstrapping it using that.

@revelator
Contributor

damnation still fails in the exact same spot when building gnat :(

@oscarfv
Contributor

oscarfv commented Sep 20, 2021

Replacing gcc-11 libstdc++.dll with gcc-10 makes the simple test case work. Replacing libgcc_s_dw2-1.dll has no effect. So either this is a different problem from Ada's failure or libstdc++.dll has the problematic code inlined.

Anyway, a quick method for pinpointing the source of the problem is simply stepping through all the exception-related code in parallel for executables built with each version, noting where the execution diverges, reverting the associated commit, checking if the problem gets fixed, rinse, repeat. The EH-related machinery has very few changes from version 10 to 11, so it is faster than git-bisect.
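
The "note where the execution diverges" step can be mechanised once each run has been written down as a list of visited functions or source lines. A minimal sketch, assuming the two traces were collected by hand or from debugger logs into vectors of strings; `first_divergence` is a hypothetical helper name:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns the index of the first entry at which the two traces differ.
// If one trace is a strict prefix of the other, that is the length of the
// shorter trace. Returns -1 when both traces are identical.
std::ptrdiff_t first_divergence(const std::vector<std::string>& good,
                                const std::vector<std::string>& bad) {
    // Four-iterator std::mismatch (C++14) stops at the end of the shorter range.
    auto its = std::mismatch(good.begin(), good.end(), bad.begin(), bad.end());
    if (its.first == good.end() && its.second == bad.end()) {
        return -1;
    }
    return its.first - good.begin();
}
```

The entry at the returned index in the gcc-10 trace is then the place to start looking for the commit to revert.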

Actually, reverting all EH-related changes from 10 to 11 in libgcc probably would fix the problem without introducing regressions ;-)

@jeremyd2019
Member

jeremyd2019 commented Sep 20, 2021

Based on what @revelator said, I'm guessing it's failing to register its .eh_frame section with the unwinder

@revelator
Contributor

one mighty interesting thing is that if I build it using a very old gcc-4.9.1, it actually fails in the libstdc++ build but not in gnat. strangely, it first fails in stage2 of the bootstrap, so I guess oscarfv might be onto something with the problem being in the libstdc++ code.
gcc-4.9.1 uses dwarf version 2, not 4.

@GitMensch
Contributor

@revelator Did you disable that part of the code for MINGW32 now - and did it get you anything further?

In general: could this be pushed for now with MINGW32 building GDB 10.2 as before and MINGW64 building current gdb?

@revelator
Contributor

aye, unfortunately it still fails, so that was not it :/ in fact, I found out the code there was made as part of an effort to fix up dwarf exceptions for mingw* builds, so disabling it would lead to even more problems.

so far every effort has been in vain and I don't know enough about exception models to do much further :(, guess it will be up to upstream to get this one sorted out.

@oscarfv
Contributor

oscarfv commented Oct 19, 2021

An update on upstream's bug report:

The problem with Ada is that during the bootstrap gcc-10's exception system somehow "mixes" with stage 2 gcc-11 Ada code. The possibility of this happening was known but not addressed because in practice it was not happening... until now. They say that our build environment might be tainted in a way similar to grep's case, bringing in the conditions required for the Ada problem to surface, but that's just speculation. They are trying to fix the problem with Ada exception propagation, but no promises were made.

So we now know that the breakage is specific to Ada, it does not affect C++.

Therefore I propose that we wait a couple of weeks for a fix. If the problem remains unfixed, we release gcc 11.2 without Ada on i686.

@revelator
Contributor

aye, I also suspect something more than grep might have breakage that contributes to this, since I tried bootstrapping it using several older versions of gcc which were never built with grep-3.6 in the first place, but all of them fail just as well when using a version that sports dwarf exceptions. Guess we now see the effects of having continual (and maybe unstable) updates.

@revelator
Contributor

bug found and squashed upstream https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100486 ;)

@GitMensch
Contributor

Yay, so I guess that means building the 11.2 release with that patch integrated for this single release, correct? Note: that may not be worth it if 11.3 stays within the 2-4 month release interval, because then 11.3 is likely to be released in the next two weeks and this can be adjusted to build the fixed 11.3 instead.
Any insights on 11.3?

@oscarfv oscarfv mentioned this pull request Oct 20, 2021
@lazka
Member Author

lazka commented Oct 20, 2021

Closing in favor of #9825

@lazka lazka closed this Oct 20, 2021