Julia 1.4 fails on startup (AMD Phenom on Linux) #35215

Closed
alea54 opened this issue Mar 22, 2020 · 77 comments · Fixed by #38347

@alea54

alea54 commented Mar 22, 2020

uname -a
Linux odie 4.15.0-88-generic #88-Ubuntu SMP Tue Feb 11 20:11:34 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

cpuinfo.txt
errjulia-1.4.txt

@jdadavid

Similar error here, on an AMD Turion:

uname -a
Linux qube.jdad.org 5.5.9-desktop-1.mga7 #1 SMP Thu Mar 12 08:02:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

err-julia-1.4.0-turion.txt
cpuinfo-turion.txt

Also attaching versioninfo from (working) julia-1.3.1:
julia-1.3-versioninfo-turion.txt

@ViralBShah
Member

Did you happen to try RC1 or RC2?

@benz0li
Contributor

benz0li commented Mar 22, 2020

Fails on startup (QEMU Virtual CPU):

$ julia
LLVM ERROR: 64-bit code requested on a subtarget that doesn't support it!
$ uname -a
Linux aa614eb95626 4.15.0-91-generic #92-Ubuntu SMP Fri Feb 28 11:09:48 UTC 2020 x86_64 GNU/Linux

cpuinfo.txt


Related to llvm/llvm-project@b7b353b?

@alea54
Author

alea54 commented Mar 22, 2020

@ViralBShah : Just tried. Both RC1 and RC2 start normally.

@fredrikekre
Member

Has to be the new llvm binaries then I guess? v1.4.0-rc2...v1.4.0

@fabmazz

fabmazz commented Mar 22, 2020

It also fails to start on my PC (AMD Phenom II X6 1075T), which runs Fedora 31.
Running julia produces the following stack trace:
julia_err.txt

$ uname -a
Linux desktopsopra 5.5.8-200.fc31.x86_64 #1 SMP Thu Mar 5 21:28:03 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

@staticfloat
Sponsor Member

staticfloat commented Mar 22, 2020

It appears your machines are complaining at a palignr instruction. You can see in that linked webpage that the instruction is marked as "SSSE3", and that its opcode starts with 66 0F 3A 0F.

Our typical processor support mantra so far has been "core2 or greater", but what that actually means hasn't been that well-defined; we often used only older instructions, but newer instructions keep getting used by various tools in our toolchains, and they often give significant speedups (As an example, GMP and MPFR, the libraries behind our BigInt capabilities, recently had a 2x performance difference when restricted to only older instructions).

It appears that what's happened here is that the LLVM rebuild for 1.4.0 was done with a newer GCC version, which ended up using SSSE3 instructions where it previously didn't, and while that still passes the "core2 or greater" check (and thus was passed by our CI), it looks like you need a Bobcat/Bulldozer or later AMD processor to have SSSE3 support.

@fabmazz

fabmazz commented Mar 23, 2020

In the end I compiled everything from scratch using the instructions on the julia repo.
I've uploaded the package I created to: https://github.com/fabmazz/julia/releases/tag/v1.4.0-amd10
If you share the same microarchitecture (AMD K10), it should also work on your computer.

@jdadavid

jdadavid commented Mar 24, 2020 via email

@ViralBShah
Member

One of the things that this issue brings up is that we should probably formalize our informal min-CPU requirements. As @staticfloat says, we always assume something like core2 when we build all our binaries - and it sounds like some of these AMD systems are not compatible - since they don't have the newer instructions.

It may be that we have to drop older architectures and formalize what we require. I know it would be terrible for some users with systems like those in this issue - but perhaps we suggest they build from source. There are further ramifications beyond just Julia itself in the entire BinaryBuilder ecosystem where similar assumptions are perhaps being made.

@fabmazz

fabmazz commented Mar 24, 2020 via email

@alea54
Author

alea54 commented Mar 24, 2020

@fabmazz Thank you very much. Your binary did not work for me because your libc6 is too recent for my computer, but I recompiled with
MARCH=amdfam10
USE_BINARYBUILDER=0
in Make.user and it works.

@briochemc
Contributor

👍 Had the same issue when I updated to Julia v1.4 for running simulations on an HPC cluster that has some old AMD CPUs (model: AMD Opteron(tm) Processor 6176).

It would be great if this could be solved so that noob users like me can use Julia v1.4 without trying to build from source with some flags that, to be honest, I don't understand (embarrassed noob here 😅). I will personally be reverting to Julia v1.3 because that's much easier for me. But it would be great not to have to do that 😃

@alea54
Author

alea54 commented Mar 26, 2020

I agree with briochemc. K10 is neither that old nor exotic. Dropping these architectures would be very bad for Julia's adoption.

Meanwhile, you can use 1.4.0-rc2
https://sourceforge.net/projects/julia.mirror/files/v1.4.0-rc2/

@briochemc
Contributor

> Meanwhile, you can use 1.4.0-rc2

I got that but I had 1.3.1 already installed, so I just reverted back to using it because that's the easiest in my case.

@yuyichao
Contributor

> Our typical processor support mantra so far has been "core2 or greater", but what that actually means hasn't been that well-defined

It was pretty well defined (based on codegen) as sse2.

@pfarndt
Contributor

pfarndt commented Apr 13, 2020

I have a 48-CPU machine whose flags, according to /proc/cpuinfo, include sse2:

flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat 
pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm
3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm 
pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a 
misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr hw_pstate vmmcall npt 
lbrv svm_lock nrip_save pausefilter

bugs            : tlb_mmatch fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2

but still julia-1.4.0 won't run. However, the above provided version 1.4.0-rc2 actually does run.
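
A possible explanation for the list above: it contains sse2 and pni (the Linux name for SSE3) but not ssse3, the extension that the palignr instruction mentioned earlier in this thread belongs to. The following is only a minimal Python sketch for checking a flags line. The helper names `parse_flags` and `missing_flags` are hypothetical, and `SAMPLE` is an abbreviated, single-line copy of the flags posted above, not real tool output.

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text.

    Only the first "flags" entry is read. Wrapped continuation lines are
    not handled, so the flags must be on a single line.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(flags, required):
    """Return the required flags that are absent, in sorted order."""
    return sorted(set(required) - set(flags))


# Abbreviated sample based on the flags posted above (illustrative input).
SAMPLE = "flags : fpu sse sse2 pni cx16 popcnt sse4a misalignsse"

if __name__ == "__main__":
    flags = parse_flags(SAMPLE)
    print("sse2 present:", "sse2" in flags)
    print("missing:", missing_flags(flags, ["sse2", "ssse3"]))
```

On a Linux machine the same check could be run against the live data by passing the contents of /proc/cpuinfo to `parse_flags` instead of `SAMPLE`.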

@KristofferC
Sponsor Member

1.4.0-rc2 didn't use the same LLVM build.

@shkit

shkit commented Apr 19, 2020

I'm using an old AMD processor (Opteron 2374 HE) and encountered this problem.
I agree that these old processors may have to be dropped, but I would like to see these requirements in the release notes, or a "this processor is not supported" message when Julia starts.

By the way, the Linux binary fails to start, but the FreeBSD binary works well.
I'm wondering what the difference is.

@gsagoo

gsagoo commented Apr 19, 2020

AMD Athlon(tm) II X4 630 Processor using Manjaro Linux.
I tried to run Julia 1.4.0 and Julia 1.4.1 and encountered the error below. I had to downgrade and will now probably uninstall Julia unless this problem gets resolved.

A major issue like this should not go unfixed for over a month.

Invalid instruction at 0x7f8e638869f8: 0x66, 0x0f, 0x3a, 0x0f, 0xc0, 0x08, 0x0f, 0x83, 0x61, 0x05, 0x00, 0x00, 0x48, 0xc1, 0xe0

signal (4): Illegal instruction
in expression starting at none:0
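
The first four bytes reported above, 0x66 0x0f 0x3a 0x0f, match the palignr opcode prefix that @staticfloat cited earlier in the thread. The comparison can be sketched in Python as follows. `PALIGNR_PREFIX`, `faulting_bytes`, and `looks_like_palignr` are hypothetical names, and the regular expression assumes the message format shown in this comment.

```python
import re

# SSSE3 palignr (xmm form) opcode prefix, as cited earlier in this thread.
PALIGNR_PREFIX = bytes([0x66, 0x0F, 0x3A, 0x0F])


def faulting_bytes(message):
    """Extract the instruction bytes from an 'Invalid instruction at ...' line."""
    match = re.search(r"Invalid instruction at 0x[0-9a-fA-F]+:(.*)", message)
    if match is None:
        return b""
    tokens = re.findall(r"0x[0-9a-fA-F]{2}\b", match.group(1))
    return bytes(int(tok, 16) for tok in tokens)


def looks_like_palignr(message):
    """True if the reported bytes start with the palignr opcode prefix."""
    return faulting_bytes(message).startswith(PALIGNR_PREFIX)


# The line from the error output above.
LINE = ("Invalid instruction at 0x7f8e638869f8: 0x66, 0x0f, 0x3a, 0x0f, "
        "0xc0, 0x08, 0x0f, 0x83, 0x61, 0x05, 0x00, 0x00, 0x48, 0xc1, 0xe0")

if __name__ == "__main__":
    print(looks_like_palignr(LINE))
```

A prefix match like this is only a heuristic. A disassembler would be needed to decode the instruction properly.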

@StefanKarpinski
Sponsor Member

StefanKarpinski commented Apr 19, 2020

@gsagoo: Demanding fixes and "threatening" to go away doesn't make anyone more inclined to do free work for you. If you really want this fixed, you are entirely welcome to submit a fix yourself.

@fabmazz

fabmazz commented Apr 19, 2020 via email

@StefanKarpinski
Sponsor Member

Sure, it could be improved in several ways, but that doesn't justify people being ungrateful and demanding about it. Minimum requirements should be documented clearly, and if a processor isn't supported, it would be better to just print a message saying that. And while a maintainer needs to merge any changes to fix this, this project is open source, you're using it for free and if you want something fixed, you can do the work to fix it instead of demanding that people you don't pay a dime to do it for you. If you're going to ask for fixes, at least be polite.

@ViralBShah
Member

ViralBShah commented Apr 19, 2020

A place where folks in this thread can help is in documenting the minimum requirements and linking those prominently in various places.

@gsagoo

gsagoo commented Apr 19, 2020

@StefanKarpinski Whether someone fixes the issue or not is up to them. You're right, I am not paying anyone to maintain it; it's their choice. It's not a "threat" to yourself or anyone else whether I leave Julia; it's my freedom of choice to leave, and I certainly don't owe Julia any allegiance.
I am not demanding a fix, merely pointing out the problem and the facts.
As @fabmazz noted, most people will just see a crash on startup and leave.

@StefanKarpinski
Sponsor Member

This is only an issue for hardware that is over a decade old and which we do not officially support. A better error message and documenting that we don't support such old hardware would be good.

@ViralBShah
Member

ViralBShah commented Apr 19, 2020

@gsagoo Your point is absolutely right, and I am in full agreement that it isn't a great experience and people will be unhappy and leave. I think your comment was made in the right spirit, which your reply reinforces.

> A major issue like this should not go unfixed for over a month.

It was only when I saw the line above that I immediately felt uncomfortable. My first thought was to write back something snarky. After all, these are 10-year-old architectures that are unsupported, and the problem affects a very small number of users (hence not a major issue). But then I had to tell myself that saying all this wouldn't help, and that I should stay focussed on trying to address the issue. I knew immediately that, given the extremely small number of compiler contributors we have, a fix in the llvm/compiler domain was not a good idea in the short term, although it will happen eventually. In the meanwhile, what's the best way we can warn users? Hence my subsequent reply about focussing on documentation and communication. That is something a lot more people can engage in and help with.

The reason I point this out is only in response to the follow-on conversation above. Broad, sweeping statements like that, which overstate the issue, do not help the project, get everyone discussing all sorts of other things, and make things look bad to those passing by.

@krcabrer

I tried to compile with some of the suggestions above. It still fails.

My CPU is: model name : AMD Phenom(tm) II X4 955 Processor.

I'm using gcc version gcc (Ubuntu 8.4.0-1ubuntu1~18.04) 8.4.0

I'm using GNU Fortran (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

I use this Make.user file also:

MARCH=amdfam10
OPENBLAS_TARGET_ARCH=NEHALEM
USE_BINARYBUILDER=0
OPENBLAS_USE_THREAD=0

This is the error that I got:

julia_compiling_errors.txt

It does not work either. What else can I do?

@yuyichao
Contributor

No, this issue is about the generic binary you download not working on all x64 CPUs. The flag you use to compile openblas isn't what causes it. If you have questions about how to compile julia yourself, please post the question on http://discourse.julialang.org/.

@xianwenchen

What's the verdict now regarding the solution to this issue? There was an earlier suggestion to specifically state what CPUs are not supported. Is that going to be the outcome? If yes, then it is relevant for this issue, that someone produces a Julia binary that will fit all x86-64 CPUs, including the old ones that the official binary does not support.

@staticfloat
Sponsor Member

I have been running my own experiments which verify Yichao's findings; namely that newer GCC versions get very good performance with even the restricted x86-64 instruction sets on GMP and MPFR, and so there is one less reason for us to restrict Julia itself to require SSSE3 instructions.

I've been trying to get us to come to a decision on the triage slack channel, but so far the right people have not been online to discuss it yet. But I think most likely what will happen is that Julia 1.6 will roll back the requirement for Julia itself, especially as we have new support coming into BB (it's not built yet, but should be ready somewhere around the 1.6 release timeframe) for building microarchitecture-specific tarballs for packages that really need it. This will allow us to get the best of both worlds for our binary dependencies as well. I am not 100% certain this will happen, as some other devs may have usecases that preclude this, but I think it likely this will happen.

I do not think this issue will be solved for Julia 1.4 or 1.5, as it's too late in the release cycle to go and rebuild all the dependencies. For the time being, to get a Julia that runs on these older machines, the best solution is still to build from source with USE_BINARYBUILDER=0. Unfortunately, most binary packages will not work due to the same issue, and there's no easy fix for that.

@yuyichao
Contributor

> as it's too late in the release cycle to go and rebuild all the dependencies.

Well, just build the generic binary with USE_BINARYBUILDER=0.

> Unfortunately, most binary packages will not work due to the same issue, and there's no easy fix for that.

Rebuilding everything without the march set is still a solution, and I assume there's a way to reset the database to trigger that. It probably won't be fast but it can be done and it won't block any julia release since those will not use the BB binaries.

@yuyichao
Contributor

And really it won't even be everything. It's everything after JuliaPackaging/Yggdrasil#358 is deployed. It should take no longer than 5 months to rebuild on a super low capacity, and probably much less. And the sooner the rebuild starts, the shorter it'll take.

@yuyichao
Contributor

> There was an earlier suggestion to specifically state what CPUs are not supported. Is that going to be the outcome?

No, that's not acceptable.

> If yes, then it is relevant for this issue, that someone produces a Julia binary that will fit all x86-64 CPUs, including the old ones that the official binary does not support.

And no, even if that's the case, the build instructions for compiling Julia on your own are still a different issue. They will indeed become more important if the requirement changes, but discussing them does not help in reaching a decision or a solution on this issue. The question very much still deserves an answer, which is why I suggested that you open a thread on Discourse, where I (and many others) can answer all the questions you may have about what target to use.

@stochastic-thread

I have a 6 year old dell r710 for my homelab and I can't use Julia because of this.

I am sad. Gotta really recompile the binary because of some LLVM nonsense? That sucks bigly

@PallHaraldsson
Contributor

PallHaraldsson commented Aug 7, 2020

@stochastic-thread there's a way for you to use Julia, even the latest version(s): 32-bit Julia has tier 1 support on Windows (and tier 2 on Linux), and should work even with very old CPUs.

Assuming you really need, or just want a 64-bit version, I see a comment linking to the latest, I think, Julia version (currently) available for older CPUs (up to 12 year old AMDs): #35215 (comment)

I'm not sure what #36502 changes, but it at least seems people are working on getting more CPUs to work. If Julia 1.6 (nightly, look at the download page) works for you then great, and the relevant change might get backported to 1.5.1: #36899


@yuyichao
Contributor

yuyichao commented Aug 7, 2020

No, #36502 has nothing to do with this issue. Julia has always worked on these old CPUs. This is purely a build problem and requires absolutely no change in julia itself.

@jdadavid

jdadavid commented Aug 8, 2020 via email

@stochastic-thread

Thank you, appreciate it!

@staticfloat
Sponsor Member

staticfloat commented Aug 25, 2020

We recently rebuilt all the GCC shards for BinaryBuilder, changing the default microarchitecture back to x86-64. This, paired with the already-merged but yet-unused capability for BinaryBuilder to generate microarchitecture-specific tarballs, will give us the best of both worlds: fast binaries when possible, but maximally compatible ones when we must be.

The dependencies of Julia itself (such as LLVM, GMP, MPFR, etc.) must be rebuilt with these new BB GCC shards. I will collect here the list of things that need to be rebuilt:

Julia's binary dependencies that need to be rebuilt:

  • OpenBLAS
  • LibCURL
  • GMP
  • LibGit2
  • LibSSH2
  • LibUV
  • LLVM
  • MbedTLS
  • MPFR
  • OpenLibm
  • p7zip
  • SuiteSparse
  • Zlib_jll
  • LibOSXUnwind (not important; this issue doesn't affect any macs we support)

Many of these will be naturally rebuilt over the course of us working on the 1.6 release, but rebuilding the current versions with the new BB shards and backporting those new binaries onto the 1.5 release is definitely possible. If anyone is interested in doing that work, I'll be happy to coach them through it; it's quite simple, just a little rote and time-consuming.

@ViralBShah
Member

ViralBShah commented Sep 7, 2020

People are now reporting that Julia 1.5 does work on these older CPUs. Perhaps it just means that it doesn't segfault on startup.

@giordano
Contributor

giordano commented Sep 7, 2020

> People are now reporting that Julia 1.5 does work on these older CPUs. Perhaps it just means that it doesn't segfault on startup.

That was from an ArchLinux user who installed the julia package from official ArchLinux repositories, which uses system libraries, not those provided by BinaryBuilder

@jtappin

jtappin commented Sep 9, 2020

> > People are now reporting that Julia 1.5 does work on these older CPUs. Perhaps it just means that it doesn't segfault on startup.

> That was from an ArchLinux user who installed the julia package from official ArchLinux repositories, which uses system libraries, not those provided by BinaryBuilder

Also working on Manjaro (which I think uses the same package as Arch for julia).

@jdadavid

jdadavid commented Sep 10, 2020 via email

@giordano
Contributor

As I said above, people not experiencing the problem are using a Julia version from the Arch repositories that doesn't use libraries built with BinaryBuilder, which can, however, lead to other problems at a different point.

@ViralBShah
Member

The good news is that the compilers have been upgraded, but they have to work their way through everything.
