
Darwin/ARM64 tracking issue #36617

Closed · 29 of 31 tasks
Keno opened this issue Jul 11, 2020 · 155 comments
Labels
  • help wanted - Indicates that a maintainer wants help on an issue or pull request
  • system:apple silicon - Affects Apple Silicon only (Darwin/ARM64), e.g. M1 and other M-series chips
  • system:arm - ARMv7 and AArch64
  • system:mac - Affects only macOS

Comments

@Keno (Member) commented Jul 11, 2020

I figured it would be worth having a single issue to track all the known issues on Apple Silicon. I'll try to keep this list updated as things get fixed or people encounter additional issues.

  • LLVM assertion failure ("asserting value handle") in the LinearAlgebra/diagonal test:
      From worker 14:	While deleting: i8* %splitgep
      From worker 14:	An asserting value handle still pointed to this value!
      From worker 14:	UNREACHABLE executed at /Users/julia/julia/deps/srccache/llvm-10.0.0/lib/IR/Value.cpp:917!
      From worker 14:	
      From worker 14:	signal (6): Abort trap: 6
      From worker 14:	in expression starting at /Users/julia/julia/usr/share/julia/stdlib/v1.6/LinearAlgebra/test/diagonal.jl:11
  • Some sort of (intermittent) issue during precompile:
Generating REPL precompile statements... 22/28ERROR: LoadError: IOError: stream is closed or unusable
  • Test failure in worlds test:
worlds                             (4) |         failed at 2020-11-13T00:31:04.270
On worker 4:
BoundsError: attempt to access 3-element BitVector at index [0:3]
Worker 6 terminated.
numbers                            (6) |         failed at 2020-11-13T00:31:34.703
ProcessExitedException(6)
  • Segfault in complex test:
    complex (2) | started at 2020-11-13T00:39:12.332
      From worker 2:	
      From worker 2:	signal (11): Segmentation fault: 11
      From worker 2:	in expression starting at /Users/julia/julia23/test/complex.jl:30
      From worker 2:	jl_method_error_bare at /Users/julia/julia23/usr/lib/libjulia.1.6.dylib (unknown line)
      From worker 2:	jl_method_error at /Users/julia/julia23/usr/lib/libjulia.1.6.dylib (unknown line)
      From worker 2:	jl_apply_generic at /Users/julia/julia23/usr/lib/libjulia.1.6.dylib (unknown line)
      From worker 2:	do_call at /Users/julia/julia23/usr/lib/libjulia.1.6.dylib (unknown line)
  • Hangs in multiple tests:
LinearAlgebra/triangular (running for 61 minutes)
LinearAlgebra/addmul (running for 55 minutes)
bitarray (running for 53 minutes)
iterators (running for 52 minutes)
ccall (running for 39 minutes)
loading (running for 39 minutes)
sorting (running for 24 minutes)
  • Test failure in inference
compiler/inference                 (5) |         failed at 2020-11-13T01:24:18.980
Test Failed at /Users/julia/julia23/test/compiler/inference.jl:944
  Expression: break_21369()
    Expected: ErrorException
      Thrown: BoundsError
  • Segfault during inlining (ssa_substitute_op!):
signal (11): Segmentation fault: 11
in expression starting at REPL[1]:1
jfptr_LinearIndices_7740 at /Users/julia/julia-master/usr/lib/julia/sys-debug.dylib (unknown line)
_jl_invoke at /Users/julia/julia-master/src/gf.c:2223
jl_apply_generic at /Users/julia/julia-master/src/gf.c:2424
ssa_substitute_op! at ./compiler/ssair/inlining.jl:1432
ssa_substitute! at ./compiler/ssair/inlining.jl:1406 [inlined]
ir_inline_item! at ./compiler/ssair/inlining.jl:369
batch_inline! at ./compiler/ssair/inlining.jl:553
  • Segfault in range construction:
signal (11): Segmentation fault: 11
in expression starting at none:0
<= at ./int.jl:444 [inlined]
>= at ./operators.jl:409 [inlined]
unitrange_last at ./range.jl:359 [inlined]
UnitRange at ./range.jl:354 [inlined]
  • Incorrect result from typemin(Int32) (prints 2147483648 instead of -2147483648):
julia> typemin(Int32)
2147483648
  • LLVM Assertion failure in iterators/bitarray test
      From worker 4:	Assertion failed: (isInt<33>(Addend) && "Invalid page reloc value."), function encodeAddend, file /Users/julia/julia-master/deps/srccache/llvm-11.0.1/lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldMachOAArch64.h, line 210.
      From worker 4:	
      From worker 4:	signal (6): Abort trap: 6
      From worker 4:	in expression starting at /Users/julia/julia-master/test/iterators.jl:343
      From worker 4:	__pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
      From worker 4:	Allocations: 962963350 (Pool: 962516480; Big: 446870); GC: 662
Worker 4 terminated.
iterators                          (4) |         failed at 2021-02-25T16:52:13.534
LibCURL                           (24) |         failed at 2021-02-25T17:03:28.235
Error During Test at /Users/julia/julia-master/usr/share/julia/stdlib/v1.7/LibCURL/test/runtests.jl:34
  Got exception outside of a @test
  SSL peer handshake failed, the server most likely requires a client certificate to connect while requesting https://github.com/JuliaWeb/LibCURL.jl/blob/master/README.md
Test Failed at /Users/julia/julia-master/usr/share/julia/stdlib/v1.7/LibCURL/test/ssl.jl:32
  Expression: res == CURLE_OK
   Evaluated: 0x00000023 == 0x00000000
Keno added the system:mac, system:arm, and help wanted labels on Jul 11, 2020
@yuyichao (Contributor)

Is the compiler enabling all the features available by default?

In other words, does it pass

# ifdef __ARM_FEATURE_CRC32

by default? Or do we have to add +crc one way or another ourselves?
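For reference, a minimal sketch (an illustration added here, not code from Julia) of the kind of compile-time gate being asked about. It assumes the ACLE header arm_acle.h and its __crc32d intrinsic, which are only usable when the compiler predefines the macro:

/* Sketch only: hardware CRC32 path gated on __ARM_FEATURE_CRC32. */
#include <stdint.h>

#ifdef __ARM_FEATURE_CRC32
#include <arm_acle.h>

static uint32_t crc32_accumulate(uint32_t crc, uint64_t data)
{
    /* Maps to the AArch64 CRC32X instruction. */
    return __crc32d(crc, data);
}
#else
#error "__ARM_FEATURE_CRC32 not predefined; pass +crc (or provide a software fallback)"
#endif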

@Keno (Member, Author) commented Jul 11, 2020

Here's what's enabled by default:

#define __ARM64_ARCH_8__ 1
#define __ARM_64BIT_STATE 1
#define __ARM_ACLE 200
#define __ARM_ALIGN_MAX_STACK_PWR 4
#define __ARM_ARCH 8
#define __ARM_ARCH_ISA_A64 1
#define __ARM_ARCH_PROFILE 'A'
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_CRYPTO 1
#define __ARM_FEATURE_DIRECTED_ROUNDING 1
#define __ARM_FEATURE_DIV 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_FEATURE_LDREX 0xF
#define __ARM_FEATURE_NUMERIC_MAXMIN 1
#define __ARM_FEATURE_UNALIGNED 1
#define __ARM_FP 0xE
#define __ARM_FP16_ARGS 1
#define __ARM_FP16_FORMAT_IEEE 1
#define __ARM_NEON 1
#define __ARM_NEON_FP 0xE
#define __ARM_NEON__ 1
#define __ARM_PCS_AAPCS64 1
#define __ARM_SIZEOF_MINIMAL_ENUM 4
#define __ARM_SIZEOF_WCHAR_T 4

Since I doubt there'll be a Mac without crc32, we should just add that to the default feature flags in our Makefile. For everything else, we can do runtime detection with sysctl.
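A minimal sketch of what that sysctl-based runtime detection could look like on macOS (an illustration, not Julia's actual detection code; the hw.optional.* keys are the ones listed in the next comment):

/* Sketch: query Apple's hw.optional.* sysctl keys at runtime. */
#include <stdio.h>
#include <sys/sysctl.h>

static int have_feature(const char *name)
{
    int value = 0;
    size_t size = sizeof(value);
    /* sysctlbyname returns 0 on success; treat missing keys as "not present". */
    if (sysctlbyname(name, &value, &size, NULL, 0) != 0)
        return 0;
    return value;
}

int main(void)
{
    printf("crc: %d\n", have_feature("hw.optional.armv8_crc32"));
    printf("lse: %d\n", have_feature("hw.optional.armv8_1_atomics"));
    return 0;
}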

@yuyichao (Contributor)

I'm surprised that it enables crypto but not crc... Yeah, I don't think it's worth doing runtime detection here.

And from #36592 (comment), it doesn't seem to provide all the features that LLVM may use.

The features currently detectable appear to be:

hw.optional.neon_fp16: fullfp16
hw.optional.armv8_1_atomics: lse
hw.optional.armv8_crc32: crc
hw.optional.armv8_2_fhm: fp16fml
__ARM_FEATURE_CRYPTO (compile time): aes, sha2

The ones that should also be supported on that CPU (all required by armv8.3-a) are jsconv, complxnum, rcpc, ccpp, rdm. Some of the floating-point ones are quite interesting.

Also interesting: since fp16fml is reported, the feature set is closer to that of the A13 than the A12 (that, or the LLVM feature set for the A12 is wrong...).

Anyway, this is probably a low-priority item...

@Keno (Member, Author) commented Jul 11, 2020

Looks like they're just shipping an old LLVM. For example, if I try to build jsconv (just to see whether it would run), I get: fatal error: error in backend: Cannot select: intrinsic %llvm.aarch64.fjcvtzs

@yuyichao (Contributor)

Huh, which LLVM version do they have? Over at

JL_FEATURE_DEF(jsconv, 13, 0) // HWCAP_JSCVT. Required in ARMv8.3
I was assuming that as long as the feature is available in AArch64.td it's usable... Is that not the case? (And/or is that a Mac-only problem?)

@Keno (Member, Author) commented Jul 11, 2020

Huh, which LLVM version do they have?

I don't know. It claims to be LLVM 12, but Apple lies about versions. I'm building upstream clang now to try it out.

@yuyichao (Contributor)

It also seems that although the feature was added in https://reviews.llvm.org/D54633, which is in LLVM 8.0, the intrinsic wasn't added until https://reviews.llvm.org/D64495, much later. Does that error mean that it's a recognized intrinsic but just isn't supported by the backend? I guess just writing inline assembly should be good enough for testing.
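A hedged sketch of what such an inline-assembly test could look like (an illustration, not code from this thread). FJCVTZS is the ARMv8.3 instruction behind jsconv; whether the assembler accepts the mnemonic may still depend on the -march/-mcpu flags discussed below:

/* Sketch: exercise FJCVTZS directly via inline assembly. */
#include <stdint.h>
#include <stdio.h>

static int32_t jcvt(double x)
{
    int32_t result;
    /* FJCVTZS also sets the condition flags, hence the "cc" clobber. */
    __asm__ volatile("fjcvtzs %w0, %d1" : "=r"(result) : "w"(x) : "cc");
    return result;
}

int main(void)
{
    printf("%d\n", jcvt(3.7)); /* expect 3 if the instruction executes */
    return 0;
}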

@Keno (Member, Author) commented Jul 11, 2020

Fails upstream too.

@Keno (Member, Author) commented Jul 11, 2020

Works with raw llc and +mattr though, so I'm gonna say it does exist.

@yuyichao (Contributor)

... I thought the error you got was a backend one... (so llc should behave the same as clang, unless clang emits the wrong IR...)

@Keno (Member, Author) commented Jul 11, 2020

I manually added the correct mattr to llc. I also managed to get it to work with -mcpu=apple-a12 at the clang level (it appears to default to apple-a7). I filed an issue with Apple to get a better error message, as well as to bump the default.

@yuyichao (Contributor)

Ah, OK. So you didn't set the target when running with clang.

@Keno (Member, Author) commented Jul 12, 2020

I tried, but mattr=armv8.3-a+jsconv didn't seem to do it.

@yuyichao (Contributor)

  From worker 14:	While deleting: i8* %splitgep
  From worker 14:	An asserting value handle still pointed to this value!
  From worker 14:	UNREACHABLE executed at /Users/julia/julia/deps/srccache/llvm-10.0.0/lib/IR/Value.cpp:917!

Ah, this is where I've seen this issue... It's not Darwin- or ARM/AArch64-specific, and it's fixed by https://reviews.llvm.org/D84031

@ViralBShah (Member)

Can we get a BinaryBuilder (BB) shard going without the Fortran compiler, and see how much of the BB ecosystem can be built?

@ViralBShah (Member)

Just thinking out loud here. The major use of Fortran in the Julia build is to build LAPACK (part of the OpenBLAS build). We could have a Fortran-to-Julia translator and move LAPACK to Julia. Of course, BB has a bunch of other Fortran libraries, and there are lots of commercial software packages that need Fortran compilers.

@certik (Contributor) commented Aug 14, 2020

We could have a Fortran to Julia translator and move LAPACK to Julia.

If anyone is interested in helping, I'll be happy to add and maintain a Fortran-to-Julia translator in LFortran. We already have LLVM and C++ backends. It took us quite some time to get to this point, as a lot of infrastructure had to be figured out and implemented, but we now have the foundation of a production C++ implementation of the compiler and are making rapid progress in adding features. As an example of what works already, this Fortran code:

https://gitlab.com/lfortran/lfortran/-/blob/7384b0ff81eaa2043281e48ae5158d34fcbf26f6/integration_tests/arrays_04.f90

gets correctly translated to this C++ code (and it compiles and runs):

https://gitlab.com/lfortran/lfortran/-/blob/master/tests/reference/cpp-arrays_04-ae9bd17.stdout

The C++ translator itself is implemented here: https://gitlab.com/lfortran/lfortran/-/blob/7384b0ff81eaa2043281e48ae5158d34fcbf26f6/src/lfortran/codegen/asr_to_cpp.cpp. As you can see, it is a simple visitor pattern over the Abstract Semantic Representation (ASR), which contains all the types; everything is figured out and ready for LLVM or C++ translation.

I don't like making predictions about how long it will take us to be able to compile LAPACK, but I am hoping it is in the range of months now.

Assuming we could translate LAPACK to C++ (or Julia) automatically, correctly, and quickly in a few months, what would the workflow be?

I can imagine two workflows in the future:

  • You translate once and just maintain the resulting code in C++ (or Julia). We will try to ensure the translator produces nice, readable, and maintainable C++ code.

  • You keep LAPACK in Fortran but translate each new version to C++ or Julia. That way, when upstream makes changes, you get them.

Regarding the speed and performance of the translated code, it is currently unclear to me whether there is some obstacle that would prevent it from matching the performance of the original Fortran code. But we will find out, and I would think it should be possible to translate in a way that keeps the performance.

@ViralBShah (Member)

LAPACK will keep moving upstream, so we would have to keep running the translator on any new version - perhaps this could even be integrated into BinaryBuilder. Performance shouldn't be a major problem, since 90% of the performance comes from calling the BLAS anyway. The main problem will be testing correctness. Presumably the translated LAPACK tests plus the Julia tests would be sufficient to get started.

@certik (Contributor) commented Aug 14, 2020

@ViralBShah that makes sense. Regarding correctness: my goal is for people to use LFortran as a regular Fortran compiler via LLVM, which will ensure that the parsing -> AST -> ASR -> LLVM path is all correct. The ASR -> C++ backend thus starts from a well-tested representation (ASR) that has been exercised via the LLVM route, so there will be bugs, but they will be well isolated, and engineering-wise I think this can be delivered and made robust. The ASR -> Julia backend would be similar.

I am very excited about this, and I will keep you updated. As I said, it will probably take us months to get something initially usable, and then it takes time for everything to mature, so I don't want to give you false hope that it can fix your immediate problem; but I will work towards this, and I think it will become very useful to a lot of people once it matures.

@Keno (Member, Author) commented Aug 14, 2020

I think for actively developed upstream projects, we'd rather just use LFortran as a straight LLVM compiler. The automatic translation part mostly makes sense where people want to do new development in Julia.

@claui commented Sep 4, 2020

Just learned that there's an ongoing effort to port the GCC backend: https://github.com/iains/gcc-darwin-arm64

@Keno (Member, Author) commented Sep 4, 2020

Yep, we're on top of it (JuliaPackaging/Yggdrasil#1626), thanks!

@ViralBShah (Member)

Should the "LLVM9 process ARM64 relocations incorrectly" item be marked done, since the linked PR is merged?

@Keno (Member, Author) commented Nov 13, 2020

I've updated the tracking list with all items I currently know about.

@ViralBShah (Member)

I wonder how well Julia will run on Rosetta 2.

@Keno (Member, Author) commented Nov 13, 2020

Works ok, but at reduced perf of course.

@hexaeder (Contributor) commented Aug 7, 2021

The MWE was actually much simpler than expected; I just didn't try enough evaluations in my previous attempts.

using Base.Threads: @threads

function foo()
    @threads for i in 1:10
        bar()
    end
end

function bar()
    @threads for i in 1:10
        rand(100)
    end
end

for i in 1:1000
    println(i)
    for j in 1:10000
        foo()
    end
end

For me this freezes after printing 20-500 numbers. Sometimes ^C works, resulting in a stacktrace like the one above. Sometimes only multiple ^C presses help (then there is no stacktrace).

julia> versioninfo()
Julia Version 1.7.0-beta3
Commit e76c9dad42* (2021-07-07 08:12 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin20.5.0)
  CPU: Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.0 (ORCJIT, cyclone)

Works fine under 1.7-beta3 on Rosetta.

@Keno (Member, Author) commented Aug 7, 2021

Might be a memory model issue? The M1's reorder queues are much deeper than those of any other AArch64 chip people will have tried.
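For illustration only (this is not the actual bug, just the general class of problem being suggested): the classic pattern below happens to work on x86's stronger memory model but is allowed to fail on a weakly ordered AArch64 core, and deeper reordering makes the failure more likely to actually show up:

/* Sketch of a missing-ordering bug that weak memory models expose. */
#include <stdatomic.h>

int payload;          /* plain, non-atomic data */
atomic_int ready;     /* flag guarding the data */

void producer(void)
{
    payload = 42;
    /* Should be memory_order_release: with relaxed ordering, the store to
       payload may become visible after the flag on a weakly ordered CPU. */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

int consumer(void)
{
    /* Should be memory_order_acquire for the same reason. */
    while (!atomic_load_explicit(&ready, memory_order_relaxed))
        ;
    return payload;   /* may observe a stale value */
}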

@Keno (Member, Author) commented Aug 7, 2021

Can you open that as a separate issue, so it doesn't get lost?

@dnadlinger (Member)

Recent PRs:

#43612
#43613
#43516

With these and my WIP JITLink migration changes, I've got the test suite passing except for a sporadic hang in test/threads.jl, which is probably the above issue.

@dnadlinger (Member)

JITLink PR: #43664

This should fix the "LLVM Assertion failure in iterators/bitarray test" and "Segfault in SVD test" boxes in the above list.

@dnadlinger (Member) commented Jan 20, 2022

@Keno: You might want to tick those boxes now that #43664 was merged, and add #41820 to the to-do list (I don't have edit permissions). FWIW, I don't think I've ever seen the precompilation issue while I was working on the other issues.

@Keno (Member, Author) commented Jan 20, 2022

Done, but this issue has probably outlived its usefulness at this point, so I'm gonna go ahead and close it.

Keno closed this as completed on Jan 20, 2022
@oleg-kachan commented Jun 27, 2022

Seems that many questions regarding Julia's Apple silicon support redirect here.

So I wonder about the current state of things:

  • Do I understand correctly that currently Julia only runs through emulation on Rosetta 2 on Apple silicon?
  • When (or in which version), if planned at all, will Julia run natively on Apple silicon?
  • Or does it build natively, but for now the official macOS binary is meant to be run on Rosetta 2?
  • If it does run natively, are there plans to support the Apple GPU or Neural Engine?

Clarifications are highly appreciated!

@hexaeder (Contributor) commented Jun 27, 2022

Do I understand correctly that currently Julia only runs through emulation on Rosetta 2 on Apple silicon?

The current stable release v1.7.3 technically runs natively but is very buggy. Do not use it.

When (or in which version), if planned at all, will Julia run natively on Apple silicon?

The upcoming v1.8 (rc1 already available) runs much more smoothly. However, it is still experimental, with Tier 3 support. This won't change until ARM CI is available. It mostly works fine, but you might encounter problems with libraries that have binary dependencies.

Or does it build natively, but for now the official macOS binary is meant to be run on Rosetta 2?

It builds natively from 1.7 and runs acceptably from 1.8 onwards. Both versions can be downloaded precompiled for ARM from the downloads page if you don't want to compile Julia yourself.

If it does run natively, are there plans to support the Apple GPU or Neural Engine?

There is https://github.com/JuliaGPU/Metal.jl

@gbaraldi (Member)

On Metal.jl: it works best on macOS 13, which is still in beta IIRC. The Neural Engine doesn't have an API outside of CoreML, so one would have to make some bindings for it, but it doesn't seem trivial.
We already have Apple silicon CI, actually :)

@thynus commented Jun 28, 2022

Seems that many questions regarding Julia's Apple silicon support redirect here.

So I wonder about the current state of things:

* Do I understand correctly that currently Julia only runs through emulation on Rosetta 2 on Apple silicon?

* When (or in which version), if planned at all, will Julia run natively on Apple silicon?

* Or does it build natively, but for now the official macOS binary is meant to be run on Rosetta 2?

* If it does run natively, are there plans to support the Apple GPU or Neural Engine?

Clarifications are highly appreciated!

Julia 1.7 is a Universal binary on my machine (M1). Download it from:
https://julialang.org/downloads/

@jheinen commented Jun 28, 2022

Julia 1.7 is a Universal binary on my machine (M1).

Sure? At least not for 1.7.3 ...

% file julia-1.7.3/bin/julia 
julia-1.7.3/bin/julia: Mach-O 64-bit executable x86_64

@giordano (Contributor) commented Jun 28, 2022

To avoid further confusion: Julia binaries are not universal; there are different builds for x86_64 and aarch64. Also, the build of Julia v1.7.3 for aarch64-darwin failed, so you won't find a native M1 build of Julia v1.7.3 specifically anywhere, and it isn't worth spending time fixing the build for a version where Julia is known to suffer from many more issues on that platform. We now have CI for this platform, which should support future development better.

To summarize: do not use the native M1 build of Julia v1.7; go to v1.8 or higher.

@thynus commented Jun 28, 2022

Very interesting, I just went by what System Information says:
Julia-1:

Version: 1.7.3
Obtained from: Identified Developer
Last Modified: 5/25/22, 6:20 AM
Kind: Universal
Signed by: Developer ID Application: Julia Computing LLC (A427R7F42H), Developer ID Certification Authority, Apple Root CA
Location: /Applications/Julia-1.7.app

cheers,

@staticfloat (Member)

That's a false positive due to the launcher application being a universal application.
