fftw3 in prop_gnlse is causing Julia 1.8 to crash #285

Closed
islent opened this issue Aug 24, 2022 · 20 comments

@islent

islent commented Aug 24, 2022

I'm trying to run the code in examples/simple_interface and find that prop_gnlse cannot be executed:

julia> using Luna

julia> γ = 0.1
0.1

julia> β2 = -1e-26
-1.0e-26

julia> N = 1.0
1.0

julia> τ0 = 10e-15
1.0e-14

julia> fr = 0.18
0.18

julia> P0 = N^2*abs(β2)/((1 - fr)*γ*τ0^2)
1219.512195121951

julia> flength = pi/2*τ0^2/abs(β2)*90
1.4137166941154071

julia> βs =  [0.0, 0.0, β2]
3-element Vector{Float64}:
  0.0
  0.0
 -1.0e-26

julia> 

julia> λ0 = 835e-9
8.35e-7

julia> λlims = [450e-9, 8000e-9]
2-element Vector{Float64}:
 4.5e-7
 8.0e-6

julia> trange = 12e-12
1.2e-11

julia> 

julia> output = prop_gnlse(γ, flength, βs; λ0, τfwhm=1.763*τ0, power=P0, pulseshape=:sech, λlims, trange,
                           raman=true, shock=false, fr, shotnoise=false, ramanmodel=:sdo, τ1=12.2e-15, τ2=32e-15,
                           saveN=601)
[ Info: Freq limits 0.04 - 0.67 PHz
[ Info: Samples needed: 8971.03, samples: 16384, δt = 1337.64 as
[ Info: No FFTW wisdom found

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x5fce07ac -- OLATION with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x5fce07ac --  at 0x5fce07ac --  at 0x5fce07ac -- OLATION with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATIONt E:\.julia\artifacts\b7dd1809d0626eac3bf6f97ba8ccfbb6cc63c509\bin\libfftw3-3.dll (unknown line)
in expression starting at REPL[13]:1
in expression starting at  at 0x5fce07ac -- text at E:\.julia\artifacts\b7dd1809d0626eac3bf6f97ba8ccfbb6cc63c509\bin\libfftw3-3.dll (unknown line)
in expression starting at REPL[13]:1
in expression starting at REPL[13]:1
0626eac3bf6f97ba8ccfbb6cc63c509\bin\libfftw3-3.dll (unknown line)
in expression starting at REPL[13]:1
0626eac3bf6f97ba8ccfbb6cc63c509\bin\libfftw3-3.dll (unknown line)
in expression starting at REPL[13]:1
.text at E:\.julia\artifacts\b7dd18unknown function (ip: 0000000066bdfdf5)
6eac3bf6f97ba8ccfbb6cc63c509\bin\libfftw3-3.dll (unknown line)
unknown function (ip: 0000000066bdfdf5)
.text at E:\.julia\artifacts\b7dd18unknown function (ip: 0000000066bdfdf5)
6eac3bf6f97ba8ccfbb6cc63c509\bin\libfftw3-3.dll (unknown line)
unknown function (ip: 0000000066bdfdf5)
#2 at .\threadingconstructs.jl:258
unknown function (ip: 0000000066bdfdf5)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
start_task at /cygdrive/c/buildbot/worker/package_win64/build/src\task.c:931
Allocations: 112850355 (Pool: 112819631; Big: 30724); GC: 78
Allocations: 112850355 (Pool: 112819631; Big: 30724); GC: 78
src\task.c:931
inlined]
start_task at /cygdrive/c/buildbot/worker/package_win64/build/src\task.c:931
Allocations: 112850355 (Pool: 112819631; Big: 30724); GC: 78
Allocations: 112850355 (Pool: 112819631; Big: 30724); GC: 78
src\task.c:931
inlined]

Here's my running environment:

julia> versioninfo()
Julia Version 1.8.0
Commit 5544a0fab7 (2022-08-17 13:38 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 16 × Intel(R) Xeon(R) W-10885M CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 8 on 16 virtual cores
Environment:
  JULIA_DEPOT_PATH = E:\.julia
  JULIA_NUM_THREADS = 8
@islent
Author

islent commented Aug 24, 2022

prop_capillary throws the same error.

@chrisbrahms
Collaborator

chrisbrahms commented Aug 24, 2022

I can't reproduce this locally, and the CI tests on #286 also seem to work fine. I think the issue isn't with Luna but with FFTW.jl. If you run something super simple like

julia> import Luna: FFTW
julia> FFTW.fft([1, 2, 3])

do you get the same error?

@islent
Author

islent commented Aug 24, 2022

> do you get the same error?

No, I don't.

[image]

Julia 1.7 and the latest Julia 1.9 show the same problem, in environments where I only installed Luna and PyPlot:
[image]
[image]

@chrisbrahms
Collaborator

Since the error appears during the FFTW planning, try deleting ~/.luna, which is where we store the cached FFTW wisdom.

@islent
Author

islent commented Aug 24, 2022

What I tried that did not work:

  • manually deleting the FFTW artifact folder and re-instantiating
  • rebuilding FFTW
  • deleting ~/.luna

Tomorrow I will try these on another computer.

@chrisbrahms
Collaborator

Just to check whether we broke something: has Luna ever run fine on this machine?

@islent
Author

islent commented Aug 24, 2022

> since the error appears with/during the FFTW planning

I found the problem:
JuliaMath/FFTW.jl#243

@islent
Author

islent commented Aug 24, 2022

> has Luna ever run fine on this machine?

I'm a first-time user. The example in fourwavemixing_opposite_helicity.jl works fine:

[image: fourwavemixing]

@chrisbrahms
Collaborator

Hm, this is very strange. There is no difference in the underlying FFT machinery between that example and the others, unless the error is connected to the specific FFT being planned. What happens if you change trange=500e-15 to trange=1000e-15 (i.e. double the time window to match the number of samples in your examples above)?

@chrisbrahms
Collaborator

A quicker way of checking that would be

julia> import Luna: FFTW
julia> for n=1:16
       FFTW.plan_fft(zeros(2^n))
       FFTW.plan_rfft(zeros(2^n))
       end

@islent
Author

islent commented Aug 24, 2022

> What happens if you change trange=500e-15 to trange=1000e-15 (ie double the time window to match the number of samples to your examples above)?

Problem solved:
[image]

There seems to be a problem in the result. Nevertheless, we have moved one step further!

> a quicker way of checking that would be

No error in Julia 1.7 or Julia 1.8:
[image]

@jtravs
Contributor

jtravs commented Aug 24, 2022

The problem above is just that you need a larger time grid. If you set trange=12e-12 it should work fine.

@chrisbrahms
Collaborator

> The problem above is just that you need a larger time grid. If you set trange=12e-12 it should work fine.

This is correct: your time window is too small, so the pulse hits the edge of the window and gets "absorbed" at the boundary.

However, it seems to me that making the grid smaller solved the problem by changing the required FFT size. I think this can only be down to the underlying fftw3 library.
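The link between the time window and the FFT size can be checked against the log lines in the original report. The sketch below is only arithmetic on the reported numbers (trange = 12e-12 and δt = 1337.64 as); rounding up to the next power of two is an assumption inferred from the reported "samples: 16384", not something confirmed from Luna's source.

```julia
# Values copied from the "[ Info: ..." lines in the original report.
trange = 12e-12        # requested time window, in seconds
δt = 1337.64e-18       # sample spacing reported in the log, in seconds

# Number of samples needed to cover the window (the log reports 8971.03).
samples_needed = trange / δt

# Assumption: the grid size is rounded up to the next power of two.
samples = 2^ceil(Int, log2(samples_needed))

println("samples needed ≈ ", round(samples_needed; digits=2), ", grid size = ", samples)
```

Under that assumption, a shorter trange with the same δt gives a smaller power-of-two grid, so FFTW has to plan a different transform size.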

@islent
Author

islent commented Aug 24, 2022

Thanks!

I'd appreciate it if parameters in these files (and others) could be optimised:

  • examples\simple_interface\gnlse_scg.jl
  • examples\simple_interface\gnlse_sol.jl
  • examples\simple_interface\gnlse_ssfs.jl

@chrisbrahms
Collaborator

The parameters in all of those examples are correct (as determined by the physics of the problem), and they all run fine on every machine we've tried them on. I'm afraid there must be a problem with the fftw library as installed/built on your machine.

One other thing you could try (just a guess) is disabling multithreading for FFTW by running

Luna.set_fftw_threads(1)

immediately after restarting Julia and importing Luna.

@islent
Author

islent commented Aug 24, 2022

> Luna.set_fftw_threads(1)

That works for me!

I notice that the default number of FFTW threads is 32, which is larger than the number of CPU cores I have (8 physical cores and 16 logical cores). To verify my hypothesis, I tried

  • Luna.set_fftw_threads(length(Sys.cpu_info())) -> error
  • Luna.set_fftw_threads(8) -> error
  • Luna.set_fftw_threads(3) -> error
  • Luna.set_fftw_threads(2) -> fine
  • Luna.set_fftw_threads(1) -> fine

and found that multi-threaded FFT on my computer is unpredictably unstable.
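Because an access violation kills the whole Julia session, a sweep like the one above is easier to repeat if each thread count is tried in a fresh child process. The following is a minimal sketch, not part of Luna: `runs_cleanly` and `fftw_probe` are hypothetical helper names, the probe calls FFTW.jl directly instead of `Luna.set_fftw_threads`, and the `FFTW.MEASURE` planning flag and the transform size of 16384 are assumptions chosen to resemble the failing case. It also assumes FFTW.jl is installed in the environment the child process starts in, and it may not reproduce the crash seen through Luna.

```julia
# Run a snippet of Julia code in a fresh child process and report whether
# that process exited cleanly (exit code 0). A crash gives a non-zero code.
function runs_cleanly(code::AbstractString; threads::Integer = Threads.nthreads())
    cmd = `$(Base.julia_cmd()) --startup-file=no --threads=$threads -e $code`
    return success(pipeline(cmd; stdout = devnull, stderr = devnull))
end

# Hypothetical probe script: plan one complex FFT with `n` FFTW threads.
# Assumes FFTW.jl is available in the child process's environment.
fftw_probe(n::Integer) = """
    import FFTW
    FFTW.set_num_threads($n)
    FFTW.plan_fft(zeros(ComplexF64, 16384); flags = FFTW.MEASURE)
    """

# Example sweep (left commented out so that loading this file has no side effects):
# for n in (1, 2, 3, 8, 16, 32)
#     println(n, " FFTW threads: ", runs_cleanly(fftw_probe(n)) ? "fine" : "crashed")
# end
```

Running the probe in a separate process keeps the interactive session alive whatever happens, and repeating the sweep a few times gives a rough idea of whether the failures are deterministic.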

@chrisbrahms
Collaborator

chrisbrahms commented Aug 24, 2022

Good! I think we probably don't test this routinely simply because we tend to run all simulations single-threaded (i.e. JULIA_NUM_THREADS is 1, whereas on your machine it seems to be 8).

> I notice that the default FFTW threads is 32, which is larger than my CPU cores (8 physical cores and 16 logical CPU cores)

This is expected, because FFTW wants you to spawn more threads than are actually available to help with load balancing (see JuliaMath/FFTW.jl#121 (comment)).
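As a quick check of the numbers in this thread: 32 FFTW threads with JULIA_NUM_THREADS = 8 is consistent with an oversubscription factor of 4. The factor and the helper name below are assumptions inferred from those two values only; this is not Luna's actual implementation.

```julia
# Assumption: the default FFTW thread count is a fixed multiple of the
# number of Julia threads. The factor 4 is inferred from 32 / 8.
const ASSUMED_OVERSUBSCRIPTION = 4

# Hypothetical helper, for illustration only.
assumed_default_fftw_threads(julia_threads::Integer) =
    ASSUMED_OVERSUBSCRIPTION * julia_threads

println(assumed_default_fftw_threads(8))  # the reporter's JULIA_NUM_THREADS
```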

@chrisbrahms
Collaborator

We implement this here:

function FFTWthreads()

@islent
Author

islent commented Aug 24, 2022

> whereas on your machine it seems to be 8

Yes, I'm using 8 threads by default.

@islent
Author

islent commented Aug 24, 2022

I'm closing this issue and will continue working through the examples. Many thanks!

@islent islent closed this as completed Aug 24, 2022