
download() multi-thread error #110

Closed
Moelf opened this issue Apr 11, 2021 · 16 comments

@Moelf

Moelf commented Apr 11, 2021

on:

Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
julia> imgs = [
       "https://images.unsplash.com/photo-1554266183-2696fdafe3ff?ixlib=rb-1.2.1&auto=format&fit=crop&w=564&q=80"
       "https://images.unsplash.com/photo-1527026950045-9e066846bae4?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1419&q=80"
       "https://images.unsplash.com/photo-1549287748-f095932c9f81?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=606&q=80"
       "https://images.unsplash.com/photo-1437448317784-3a480be9571e?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=634&q=80"
       ]

julia> imgs = repeat(imgs, 4)

julia> Threads.nthreads()
4

julia> Threads.@threads for url in imgs
           download(url)
       end

results in

ERROR: TaskFailedException
Stacktrace:
 [1] wait
   @ ./task.jl:317 [inlined]
 [2] threading_run(func::Function)
   @ Base.Threads ./threadingconstructs.jl:34
 [3] top-level scope
   @ ./threadingconstructs.jl:93

    nested task error: TaskFailedException
    Stacktrace:
      [1] try_yieldto(undo::typeof(Base.ensure_rescheduled))

Also seen on another occasion:

Error: curl_multi_socket_action: 8
fredrikekre transferred this issue from JuliaLang/julia on Apr 12, 2021
@StefanKarpinski

Yeah, something definitely isn't threadsafe. Will look into it.

@fredrikekre

Here is the nested task error which might shed some more light:

        nested task error: val already in a list
        Stacktrace:
          [1] error(s::String)
            @ Base ./error.jl:33
          [2] push!
            @ ./linked_list.jl:53 [inlined]
          [3] wait(c::Base.GenericCondition{Base.Threads.SpinLock})
            @ Base ./condition.jl:103
          [4] lock(rl::ReentrantLock)
            @ Base ./lock.jl:100
          [5] lock(f::Downloads.Curl.var"#39#41"{Int32, Downloads.Curl.Multi}, l::ReentrantLock)
            @ Base ./lock.jl:185
          [6] event_callback(uv_poll_p::Ptr{Nothing}, status::Int32, events::Int32)
            @ Downloads.Curl ~/julia16/usr/share/julia/stdlib/v1.6/Downloads/src/Curl/Multi.jl:120

@anandijain

anandijain commented Apr 13, 2021

I have little to add, except that I am also hitting this. Is the temporary solution to just never call @threads ... download(...) end?

https://github.com/anandijain/SBMLBioModelsRepository.jl/runs/2335985628?check_suite_focus=true#step:5:2330 is a link to a full trace of the crash happening on CI.

Edit: I think threading run(`curl ...`), i.e. shelling out to the curl binary, is still okay
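A minimal sketch of that workaround, assuming the curl binary is on PATH (download_with_cli is a hypothetical helper, not part of Downloads.jl): each download runs in its own curl process, so no libcurl state is shared between Julia threads.

```julia
# Hypothetical helper: fetch each URL with the external curl binary, so that
# no libcurl handles are shared across Julia threads. Assumes curl is on PATH.
function download_with_cli(urls)
    paths = Vector{String}(undef, length(urls))
    Threads.@threads for i in eachindex(urls)
        paths[i] = tempname()
        # each run() spawns a separate curl process; nothing is shared
        run(`curl -sL -o $(paths[i]) $(urls[i])`)
    end
    return paths
end
```

Spawning a process per file has overhead, so this only makes sense as a stopgap until the libcurl-based path is threadsafe.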

@StefanKarpinski

StefanKarpinski commented Apr 13, 2021

This should be fixed but there's also no need to do multithreaded downloading — you can just use tasks:

@sync for url in imgs
    @async @show download(url)
end

In general, unless you're doing concurrent CPU-bound compute, tasks are preferable to threads.
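To bound how many of those tasks run at once (thousands of unbounded @async tasks can open too many simultaneous connections), Base.asyncmap takes an ntasks keyword. A sketch; fetch_one here is a placeholder standing in for download so the snippet is self-contained:

```julia
# Task-based concurrency with a cap on in-flight requests via ntasks.
# fetch_one is a stand-in for download(url); swap in download for real use.
fetch_one(url) = uppercase(url)   # placeholder, no network needed
urls = ["a", "b", "c", "d"]
results = asyncmap(fetch_one, urls; ntasks = 2)  # at most 2 tasks at a time
```

asyncmap preserves input order in its results, so this is a drop-in bounded version of the @sync/@async loop above.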

@c42f

c42f commented Aug 20, 2021

Reading https://curl.se/libcurl/c/threadsafe.html it seems there could be several things we're not doing:

  • We should call curl_global_init explicitly as part of libcurl __init__
  • We probably need to set locking and unlocking callbacks using curl_share_setopt in order that libcurl itself can protect its internal connection pool and DNS cache. See https://curl.se/libcurl/c/curl_share_setopt.html
  • The page says "You must never share the same handle in multiple threads", but we share the multi handle. It's behind our own lock so that might be ok, but multiple easy handles are created from that multi handle — it's unclear from the docs whether they will then point to shared multi handle data structures and implicitly need locking as well.

@c42f

c42f commented Aug 20, 2021

To add to this, libcurl has a known bug where it can't share the connection pool between multiple threads.

The following PR removed the libcurl example which showed how to use shared connection pools (due to this bug), but it's interesting nonetheless: curl/curl#6795

Unfortunately I think this libcurl bug combines quite badly with the task migration introduced in julia-1.7. A set of tasks which share a connection pool should also be running on the same OS thread, but the Julia runtime can now migrate those tasks at any time. We can't disable sharing of the connection pool either - that would be a performance disaster in the normal @async case.

@StefanKarpinski

We could potentially do all libcurl calls from a single thread, but I'm not sure that we have a mechanism to do that anymore since all tasks can be migrated now? The API calls could just dispatch to a single thread that does the actual libcurl stuff. Since libcurl doesn't actually benefit from multithreading at all, it would be fine.

@c42f

c42f commented Aug 27, 2021

We could potentially do all libcurl calls from a single thread, but I'm not sure that we have a mechanism to do that anymore since all tasks can be migrated now?

At first I thought this was true, but there's a way around it:

  • @async still pins tasks to the thread of the parent task. (Side note — unpinned tasks become pinned in julia-1.7 if they launch children with @async to preserve coscheduling on the same OS thread, see [Tasks] Mark parent as sticky if we use @async julia#41334)
  • So we just need to launch all curl-calling tasks from a particular well-defined parent task using @async. This could be a worker pool or we could just launch a task per download.
  • User tasks can then run on any thread, but we can insulate libcurl from that with a Channel.
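One way to sketch this scheme (hypothetical names, not Downloads.jl's actual implementation): a single long-lived dispatcher task reads requests from a Channel and launches the curl-calling work with @async, so every worker stays pinned to the dispatcher's OS thread. fetch_fn stands in for the real libcurl-backed download:

```julia
# Confine all "curl" work to one OS thread: a dispatcher task pulls requests
# from a Channel and services them with @async children, which are pinned to
# the same thread as their parent.
function make_dispatcher(fetch_fn)
    requests = Channel{Tuple{String,Channel{Any}}}(Inf)
    Threads.@spawn begin
        for (url, result) in requests
            @async begin                       # pinned to the dispatcher's thread
                try
                    put!(result, fetch_fn(url))
                catch err
                    put!(result, err)
                end
            end
        end
    end
    return requests
end

# Callable from any task on any thread; the fetch itself never migrates.
function dispatch_download(requests, url)
    result = Channel{Any}(1)
    put!(requests, (url, result))
    take!(result)
end
```

User tasks can then call dispatch_download from anywhere; only the dispatcher's thread ever touches the (hypothetical) curl state.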

Possible caveats:

  • Limiting to a particular OS thread might make I/O resource starvation worse if a compute-bound task ends up on that thread.
  • How do curl callbacks relate to libuv, in terms of which thread they end up scheduled on?

libcurl doesn't actually benefit from multithreading at all

This is interesting. There must be some crossover point of CPU memory bandwidth vs network speed. But a single core should have on the order of ~10 GiB/s of memory bandwidth, which would naively take a ~100 gigabit network to overwhelm. AWS networking tops out around there, except for special cases like the P4.

@ericphanson

ericphanson commented Aug 27, 2021

For what it's worth, @kolia and I have both been using Downloads.jl with AWS.jl to asyncmap (with the default 100 tasks) over a bunch of objects in S3, and we are getting single-threaded throughput of only ~0.15 Gbps (on the network between S3 and an EC2 instance in the same region), with 100% CPU utilization on one core (per htop). By throwing a pmap on top with 16 cores, Kolia was able to hit 2.4 Gbps, which is at least the right order of magnitude (depending on the instance you should be able to get at least 5-10 Gbps, and way more for the network-optimized ones). So naively at least it seems like there is a CPU bottleneck, and threading could really help.

@c42f

c42f commented Aug 30, 2021

Excellent, I've been meaning to do some performance tests of this kind of thing.

What was the typical file size for the data you're requesting, and what kind of bandwidth can you hit if you use a standalone downloader like rclone or s5cmd?

@ericphanson

I think Kolia was working with larger files, but I had ~380k 20 kB files. I put 100k of them in a prefix test_cache_path to make it easier to use s5cmd.

I get (using Downloads.jl)

using AWSS3, AWS
AWS.DEFAULT_BACKEND[] = AWS.DownloadsBackend()
paths = readdir(test_cache_path)
all_bytes, t = @timed asyncmap(read, paths)
throughput = (sum(length, all_bytes)*8)/(t*1e9)
@info "Pulled down $(Base.format_bytes(sum(length, all_bytes))) bytes in $(round(t; digits=1)) seconds for a throughput of $(round(throughput; sigdigits=3)) Gbps (gigabits per second)"

using (bits/ns == gigabits/s), I get

[ Info: Pulled down 1.837 GiB bytes in 145.0 seconds for a throughput of 0.109 Gbps (gigabits per second)

I couldn't figure out a good way to read them in bulk into memory with s5cmd (it seems cat doesn't support wildcards, so I guess the right way is to write out a file with my 100k paths and pass it to run?), so I tried just writing them to disk. With 100k such files in a prefix test_cache_path, just doing

_, s5cmd_time = @timed read(`s5cmd cp $(test_cache_path)\* /tmp/s5cmd`)
throughput = (sum(length, all_bytes)*8)/(s5cmd_time*1e9)
@info "Pulled down $(Base.format_bytes(sum(length, all_bytes))) bytes in $(round(s5cmd_time; digits=1)) seconds for a throughput of $(round(throughput; sigdigits=3)) Gbps (gigabits per second)"

gives

[ Info: Pulled down 1.837 GiB bytes in 97.9 seconds for a throughput of 0.161 Gbps (gigabits per second)

So faster, but also not good. But I wonder how much of that is the cost of the disk IO -- since we skip that in the Julia case, we should be able to be pretty fast. (That's also my actual use case: pulling cached data directly into RAM, skipping disk IO.)

@kolia

kolia commented Aug 30, 2021

I was getting 2.4 Gbps downloading 23k files of ~2-4 MB each, using pmap from a single EC2 instance.

@c42f

c42f commented Aug 31, 2021

I couldn't figure out a good way to read them in bulk into memory with s5cmd

You could use a ramdisk (mount type tmpfs) to avoid any disk overhead. /dev/shm is usually mounted as tmpfs so you could create the files in there.

So faster, but also not good.

But if Downloads.jl can come close to the performance of s5cmd (or whatever else is regarded to be the fastest concurrent downloader) then we should be in reasonable shape.

@mkschulze

Hello, just leaving a note that we would also like to see thread safety with libcurl/Downloads.jl. We use it extensively within our TypeDBClient.jl (with gRPCClient.jl as the transaction layer) and it would be great if it could use multithreading. Shall we post an example of our use case and where we would need it?

@StefanKarpinski

I haven't tested beyond the original example here, but #151 seems to have fixed this. Are others able to test Downloads master in multithreaded use cases and confirm that it works now? If it does, then I'll tag a new release since thread safety seems like a significant feature.

@StefanKarpinski

After some back and forth, the version on master here as well as Julia master should be threadsafe. Closing since that's the case and we can open specific issues about more fine-grained threading problems as they're encountered.

arnaudh added a commit to arnaudh/AWS.jl that referenced this issue Dec 13, 2022
The mentioned issues are closed:
- JuliaLang/Downloads.jl#110 (closed in Nov 2021)
- JuliaLang/Downloads.jl#182 (closed in Apr 2022)
mattBrzezinski added a commit to JuliaCloud/AWS.jl that referenced this issue Mar 21, 2023
* Remove mention of Downloads.jl having threading issues

The mentioned issues are closed:
- JuliaLang/Downloads.jl#110 (closed in Nov 2021)
- JuliaLang/Downloads.jl#182 (closed in Apr 2022)

* Update src/utilities/downloads_backend.jl

Co-authored-by: arnaudh <[email protected]>

---------

Co-authored-by: mattBrzezinski <[email protected]>