Problems downloading 2GB text file #69

Closed
MarkEdmondson1234 opened this issue May 25, 2017 · 16 comments

Comments

@MarkEdmondson1234
Collaborator

MarkEdmondson1234 commented May 25, 2017

Thread from cloudyr/cloudyr.github.io#15

@fncalderong said:

Hi, I am using the googleCloudStorageR package. I am trying to download a 2 GB .txt file with R but have had no success.

I tried to download the file with the gcs_get_object function, but the file comes back in raw format and is quite difficult to convert.

I tried to get the URL of the file with the gcs_download_url function, but when I try to download the file via that URL using downloadFile from the R.utils package, it downloads an HTML file (the page where Google asks for credentials).

@MarkEdmondson1234 said:

Yes, best to report it there, but it sounds like you only need to specify the .txt file extension in the name you downloaded it to. Try renaming the file.

@fncalderong said:

When I try to run this:

gcs_get_object(objects$name[[36]], bucket = bucket, saveToDisk = "data.txt")

I get this:

Error in writeBin(bin, saveToDisk) :
  long vectors not supported yet: connections.c:4123

where:

> objects$name[[36]]
[1] "20170523/planofull_170523.txt"

Thanks.

@MarkEdmondson1234 changed the title "Downloading 2GC text file" to "Problems downloading 2GB text file" on May 25, 2017
@MarkEdmondson1234
Collaborator Author

Hmm, OK: this is the save function in base R (writeBin()) failing because 2 GB is too large. I'll investigate whether it can be solved.

@MarkEdmondson1234
Collaborator Author

From ?writeBin: Only 2^31 - 1 bytes can be written in a single call (and that is the maximum capacity of a raw vector on 32-bit platforms).
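Because the limit applies per call, one possible workaround is to write the raw vector through a binary connection in several smaller pieces. This is a minimal sketch, not what googleCloudStorageR does: `write_raw_chunked()` is a hypothetical helper, and it still assumes the whole raw vector already fits in memory; it only avoids handing more than 2^31 - 1 bytes to a single `writeBin()` call.

```r
# Hypothetical helper (not part of googleCloudStorageR): write a raw vector
# to `path` in pieces, so no single writeBin() call receives more than
# `chunk_size` bytes. The default of 16 MiB stays far below 2^31 - 1.
write_raw_chunked <- function(bin, path, chunk_size = 2^24) {
  con <- file(path, open = "wb")
  on.exit(close(con))
  n <- length(bin)
  start <- 1
  while (start <= n) {
    end <- min(start + chunk_size - 1, n)
    # Subsetting copies only this chunk; the full vector must still be in memory.
    writeBin(bin[start:end], con)
    start <- end + 1
  }
  invisible(path)
}
```

For example, `write_raw_chunked(bin, "data.txt")` would stand in for the failing `writeBin(bin, "data.txt")`.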

@leeper is there a workaround for this?

@MarkEdmondson1234
Collaborator Author

Same issue here: richfitz/remake#76

@fncalderong

fncalderong commented May 25, 2017

Nice! Thank you so much. I think the reason is that gcs_get_object downloads the file as a raw vector (a long vector) and, as you said, it is too big for R. But if I instead download the file using RCurl with these commands:

URL <- gcs_download_url(objects$name[[36]], bucket, public = TRUE)
library(RCurl)
getURL(URL)
download.file(URL, destfile = "data2.txt", method = "wget")

I can download the file successfully, but only if I make the file public (which I can't do for security reasons).

@MarkEdmondson1234
Collaborator Author

Hmm, OK. In that case try downloading it via the gcs_download_url method, since it can give you URLs that are not public but are accessible once you sign in with your Google account.

@fncalderong

So in that case I must always enter my credentials in the browser. Am I right?

@fncalderong

I mean: if I do it with a non-public object, the result is an HTML file (a Google page asking for credentials).

@MarkEdmondson1234
Collaborator Author

Yes, because unfortunately we don't have that feature yet: a signed URL that you could fetch from code (#54).

I'll see whether the same authentication you use for the other API calls can be applied to a download made via httr::GET.

@MarkEdmondson1234
Collaborator Author

MarkEdmondson1234 commented May 25, 2017

Probably the best solution at the moment would be to use the Python gsutil tool (https://cloud.google.com/storage/docs/access-control/create-signed-urls-gsutil), either to download the file directly or to create a signed URL and then call that URL from R. If you must stay within R, you can wrap the command-line tool in R functions.

@MarkEdmondson1234
Collaborator Author

OK, it looks like this is a solved issue in httr, so I'll look at implementing it: r-lib/httr#44
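The streaming idea can be sketched roughly as follows. This is not the googleCloudStorageR implementation: `gcs_media_url()` and `download_to_disk()` are hypothetical helpers, the URL assumes the Cloud Storage JSON API `alt=media` download endpoint, and `token` is assumed to be an httr OAuth 2.0 token authorised to read Cloud Storage (for example with the `https://www.googleapis.com/auth/devstorage.read_only` scope). `httr::write_disk()` makes httr write the response body to a file instead of returning it as one in-memory raw vector, which sidesteps the `writeBin()` limit.

```r
# Hypothetical helper: build the JSON API media-download URL for an object.
# Object names can contain "/", which must be percent-encoded as %2F.
gcs_media_url <- function(bucket, object_name) {
  paste0(
    "https://storage.googleapis.com/storage/v1/b/", bucket,
    "/o/", utils::URLencode(object_name, reserved = TRUE),
    "?alt=media"
  )
}

# Hypothetical helper: stream an object straight to `path` using an
# existing httr OAuth 2.0 token (assumed), rather than a public URL.
download_to_disk <- function(bucket, object_name, path, token) {
  resp <- httr::GET(
    gcs_media_url(bucket, object_name),
    httr::config(token = token),
    httr::write_disk(path, overwrite = TRUE)
  )
  httr::stop_for_status(resp)
  invisible(path)
}
```

A call would look like `download_to_disk("your-bucket", "20170523/planofull_170523.txt", "data.txt", token)`, where `"your-bucket"` is a placeholder.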

@fncalderong

Man, that would be awesome! I appreciate your help.

@MarkEdmondson1234
Collaborator Author

Give that a go now with the latest version on GitHub; it's working in my local tests, for small files at least.

@MarkEdmondson1234
Collaborator Author

gcs_get_object(obj$name[[3]], bucket = "your-project", saveToDisk = "blah.txt", overwrite = TRUE)

@fncalderong

I will try this!

@fncalderong

Worked like a charm! You're the best, man. Thank you so much.

@MarkEdmondson1234
Collaborator Author

Thanks for bringing it up, now others will benefit too :)
