Slow download speed v214 #522

Open
kafker opened this issue Jun 2, 2023 · 20 comments

@kafker

kafker commented Jun 2, 2023

Dear devs,

I'm not sure whether this is caused by a network problem on my side or yours. I am trying to download the latest GTDB database:

wget https://data.gtdb.ecogenomic.org/releases/release214/214.0/auxillary_files/gtdbtk_r214_data.tar.gz

However, after about 10 minutes the download speed drops to ~100 KB/s, making it impossible to download the database in a reasonable amount of time.

I tried different connections (HPC and home), but nothing seems to help.

Thank you!
K
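Because the slowdown described above sets in partway through a multi-gigabyte transfer, one workaround is to make the download resumable and let it fall back to the mirror. The following is a minimal sketch under assumptions, not an official GTDB-Tk script: the function name `gtdb_fetch` is hypothetical, and it assumes the primary and mirror files are byte-identical (both URLs end in `gtdbtk_r214_data.tar.gz`, so `wget -c` can continue the same partial file from either host) and that both servers honor HTTP Range requests.

```shell
# Minimal sketch (hypothetical helper, not part of GTDB-Tk): resume the R214
# download and fall back to the UQ mirror if the primary host fails.
# Assumes both hosts serve byte-identical files under the same file name.
PRIMARY="https://data.gtdb.ecogenomic.org/releases/release214/214.0/auxillary_files/gtdbtk_r214_data.tar.gz"
MIRROR="https://data.ace.uq.edu.au/public/gtdb/data/releases/release214/214.0/auxillary_files/gtdbtk_r214_data.tar.gz"

gtdb_fetch() {
  local url
  for url in "$@"; do
    # -c                 continue the partial gtdbtk_r214_data.tar.gz in the
    #                    current directory instead of restarting (needs Range support)
    # --read-timeout=60  give up on a connection that sends no data for 60 s
    # --tries=5          try each URL up to 5 times in total
    # --waitretry=10     back off between retries, waiting up to 10 s
    if wget -c --tries=5 --waitretry=10 --read-timeout=60 "$url"; then
      echo "downloaded from $url"
      return 0
    fi
    echo "failed: $url" >&2
  done
  return 1
}
```

Running `gtdb_fetch "$PRIMARY" "$MIRROR"` again after an interruption continues from the partial file rather than starting over; the `--read-timeout` value is a guess and may need tuning for very slow links.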

@konstantin-demin

Same issue here. I have relatively high-speed internet at home, but the download rate of the GTDB database barely exceeds 200 KB/s.

@aaronmussig
Member

Hello,

Thank you for raising this issue, I'll take a look into this.

Can you both let me know what speed you get when trying to download from the mirror? https://data.ace.uq.edu.au/public/gtdb/data/releases/release214/214.0/auxillary_files/gtdbtk_r214_data.tar.gz

I'm also in the process of applying for an additional quota to use Zenodo as a secondary mirror.

Cheers,
Aaron
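To answer the question above with comparable numbers, it may help to sample the primary server and the mirror under the same conditions. The sketch below is an illustration under assumptions, not a tool provided by the GTDB team: the helper names `measure_speed` and `pick_fastest` are hypothetical, and it relies on curl's `--max-time` option and its `%{speed_download}` write-out variable (average bytes per second) to download each URL for a fixed number of seconds while discarding the data.

```shell
# Hypothetical helpers (not part of GTDB-Tk): sample each URL for a fixed time
# and report which host delivered the highest average download speed.
measure_speed() {
  # Download $1 for at most $2 seconds, discarding the data. curl prints the
  # average speed in bytes/s via -w; it exits with code 28 when --max-time is
  # reached, which is expected here, so the exit status is ignored.
  curl -s -o /dev/null --max-time "$2" -w '%{speed_download}' "$1" || true
}

pick_fastest() {
  local secs best best_speed url speed
  secs="$1"; shift
  best=""
  best_speed=0
  for url in "$@"; do
    speed=$(measure_speed "$url" "$secs")
    echo "$speed B/s  $url" >&2
    # speed_download may include decimals, so compare numerically in awk
    if awk -v a="$speed" -v b="$best_speed" 'BEGIN { exit !(a > b) }'; then
      best="$url"
      best_speed="$speed"
    fi
  done
  echo "$best"
}
```

For example, `pick_fastest 30 <primary-url> <mirror-url>` prints each measured speed to stderr and the faster URL to stdout. Since some users report that the speed only drops after several minutes, a short sample can overestimate the sustained throughput.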

@kafker
Author

kafker commented Jun 4, 2023

Can you both let me know what speed you get when trying to download from the mirror? https://data.ace.uq.edu.au/public/gtdb/data/releases/release214/214.0/auxillary_files/gtdbtk_r214_data.tar.gz

Hi Aaron,

The download from the mirror is much more stable, at 4-7 MB/s.

Thank you!
K

@konstantin-demin

Can you both let me know what speed you get when trying to download from the mirror? https://data.ace.uq.edu.au/public/gtdb/data/releases/release214/214.0/auxillary_files/gtdbtk_r214_data.tar.gz

Hello Aaron. The link you provided gives the same speed as before, ~200 KB/s. However, I managed to download the database by switching to Windows and downloading it directly from the latest link in the list of releases here: https://ecogenomics.github.io/GTDBTk/installing/index.html. From Windows, the speed was 4-10 MB/s. I don't know whether the problem lies in the automatic download or in my Linux machine (which has no problems with any other downloads).

I think an additional mirror would be a good idea.

Thanks for help!

@Sumsarium

Can you both let me know what speed you get when trying to download from the mirror? https://data.ace.uq.edu.au/public/gtdb/data/releases/release214/214.0/auxillary_files/gtdbtk_r214_data.tar.gz

I get 0.5-5 MB/s using the above link. That's about 10-20x faster than normal...

@aaronmussig
Member

Sorry to hear about the slow speeds. I am still waiting on Zenodo to get back to me about additional storage.

In the meantime, I've developed a small program that downloads the GTDB-Tk R214 reference database from the unarchived data. It's fault-tolerant and lets you download with multiple threads.

If anyone who is experiencing slow download speeds would like to give it a go, please see: https://github.com/Ecogenomics/gtdbtk-db-download

I have a few more involved ideas for speeding it up, namely downloading the FASTA files directly from NCBI, but I'll only pursue them if this is still unusable.

@ValentinCledassou

ValentinCledassou commented Aug 25, 2023

It's impossible to download the R214 database; it's too slow (~20 KB/s). The same goes for the mirror.

@aaronmussig
Member

I tested the download speed from Denmark and Australia and got ~7 MB/s. Nevertheless, I restarted NGINX; did that help?

@ValentinCledassou

With a VPN endpoint in Australia, I get the same speed as you. Without the VPN (in France), it's always ~20 KB/s.

@Sumsarium

Mine starts at 8 MB/s but quickly drops to around 300-500 KB/s, and the speed generally seems a bit unstable. I haven't tested it via VPN. It's not a big issue (for me at least) as long as the databases aren't updated on a weekly basis...

@bheimbu

bheimbu commented Apr 5, 2024

Hi @aaronmussig,

Any news on this? The download speed from Germany is very slow, around 200 KB/s.

Cheers Bastian

@iwilkie

iwilkie commented May 13, 2024

Hi,

Is there a solution for this issue when downloading R220? My download of the new release oscillates between 10 and 60 KB/s, and our IT department confirmed that the problem is not on our side.

Thanks!

@Sumsarium

This seems to be a persistent issue. It still takes me several days to download the databases (Denmark).

@iwilkie

iwilkie commented May 13, 2024

It still takes me several days to download the databases (Denmark)

I'm in Germany and my download has been running for 10 days now... Have you tried using a VPN to Australia? Unfortunately I can't test this from my work setup, but I'll give it a try when I'm back at my personal computer.

@marianamnoriega

Hello!!
I am based in Germany and facing the same issue when downloading R220. The mirror didn't improve anything, and the speed fluctuates between 20 and 300 KB/s (mostly in the lower range). Has anyone found a solution to this? (Downloading on Windows is not an option.)

@jolespin

@marianamnoriega are you getting this with the mirror too? I just downloaded the other day (US) and it was pretty fast.

@cpauvert

Hi! Thanks for this resource. I'm facing the same issue.

@marianamnoriega are you getting this with the mirror too? I just downloaded the other day (US) and it was pretty fast.

@jolespin, in Germany I get the same rate (~50 KB/s) with either the primary or the mirror URL, both with wget and in a browser.

Let me know how I can help you help us!
Best regards,

@AstrobioMike

heya folks!

I had trouble with this for a while too, especially because I'm often installing GTDB-Tk on multiple systems. In the US, the mirror has worked well for me, but I see from this thread that's not always the case for everyone :/

I don't know how well Google Drive downloads work from different places around the world, but if anyone would like another source to try, I've put R220 up there (as downloaded from here on 21-Jun-2024).

You can grab it from the google drive directly from here: https://drive.google.com/drive/u/0/folders/1YOtMHILvs3xS9cZ2CjW7n20myZYDYVh6

Or, if you need to pull it directly to a remote machine: downloading from Google Drive programmatically is more difficult than it should be, but lately I've had luck with gdrive3 (I'm using the latest version, 3.9.1, at the time of posting). After installing and setting that up, it can be grabbed with the following:

gdrive files download 16qqRgrlb0Xwip_fvhXQgXGlPcZoab3UC
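Whichever source the archive comes from, a download that was interrupted and resumed several times, or pulled from a third-party copy such as the Google Drive folder above, is worth checking before extraction. A minimal sketch, assuming standard `gzip` and `tar` are installed; `check_archive` and `extract_if_valid` are hypothetical helper names, and this only detects a truncated or corrupted gzip/tar stream, not whether the contents match the official release.

```shell
# Hypothetical helper: succeed only if a .tar.gz archive is complete and readable.
check_archive() {
  # gzip -t decompresses the whole stream without writing output and fails on
  # a truncated or corrupted file; tar -tzf then lists every member, which
  # also checks the tar structure.
  gzip -t "$1" 2>/dev/null && tar -tzf "$1" >/dev/null 2>&1
}

# Hypothetical helper: extract archive $1 into directory $2 (default: current
# directory) only after it passes check_archive.
extract_if_valid() {
  if check_archive "$1"; then
    tar -xzf "$1" -C "${2:-.}"
  else
    echo "archive $1 is incomplete or corrupted; re-download it" >&2
    return 1
  fi
}
```

Both checks read the entire archive, so on a database of this size they take a while, but that is usually much cheaper than discovering a truncated download halfway through extraction.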

@cpauvert

Thanks @AstrobioMike for this effort for the community, much appreciated! It is now downloading at ~10 MB/s on a remote machine (my own estimate, as gdrive currently displays no rate; see glotlabs/gdrive#44).

@bayegy

bayegy commented Aug 7, 2024

Is there any mirror for China? Downloading the database from China is currently impossible (<1 KB/s).
