read from gzfile #70

Open
mvaudel opened this issue Jun 13, 2017 · 6 comments

Comments

@mvaudel

mvaudel commented Jun 13, 2017

Hi,

Thank you for this useful package. I usually read my matrices from text files using read.big.matrix. I was wondering whether it would be possible to support input from gzfiles?

Best regards,

Marc

@cdeterman
Contributor

@mvaudel I can imagine a quick and dirty solution: uncompress the file with gunzip, then read the resulting file in as a big.matrix. Beyond that, I'm not aware of an R interface we could use to read directly from gzfiles. If such an interface exists, we could certainly explore it; otherwise I think we will likely refer users to simply uncompress the file themselves (assuming the other authors feel the same).
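For readers who want to try the uncompress-then-read workaround described above, here is a minimal sketch in base R. The helper name `decompress_gz`, the path `data.txt.gz`, and the tab separator are assumptions made for illustration; only `read.big.matrix` comes from the thread, and its call is guarded so that the snippet still runs without bigmemory installed.

```r
# Stream-decompress a .gz file to disk in fixed-size chunks (base R only),
# so the compressed file never has to fit in memory at once.
# decompress_gz is a hypothetical helper, not part of bigmemory.
decompress_gz <- function(src, dest, chunk_bytes = 1e6) {
  con_in <- gzfile(src, open = "rb")
  on.exit(close(con_in), add = TRUE)
  con_out <- file(dest, open = "wb")
  on.exit(close(con_out), add = TRUE)
  repeat {
    chunk <- readBin(con_in, what = raw(), n = chunk_bytes)
    if (length(chunk) == 0L) break  # end of the compressed stream
    writeBin(chunk, con_out)
  }
  invisible(dest)
}

# Hypothetical usage: "data.txt.gz" is a placeholder path, and the file is
# assumed to be tab-separated numeric data with a header row.
if (file.exists("data.txt.gz") && requireNamespace("bigmemory", quietly = TRUE)) {
  txt <- decompress_gz("data.txt.gz", tempfile(fileext = ".txt"))
  x <- bigmemory::read.big.matrix(txt, sep = "\t", header = TRUE, type = "double")
  unlink(txt)  # remove the temporary uncompressed copy
}
```

The cost of this approach is the temporary uncompressed copy on disk, which is the overhead the next comment wants to avoid.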

@mvaudel
Author

mvaudel commented Jun 13, 2017

Thank you for your answer. It would be really convenient to read directly from the gzipped files: our files are huge, so reading from them directly and decompressing on the fly would save substantial time and disk space. Are you working on the files themselves or using a connection? In the latter case, letting us provide the connection directly instead of the file name should do the trick (https://stat.ethz.ch/R-manual/R-devel/library/base/html/connections.html).
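To illustrate the connection idea, here is a rough base R sketch that reads a gzipped text file through a `gzfile()` connection and decompresses it on the fly, one chunk of lines at a time. It does not use bigmemory's API: `read_gz_chunks` and `process_chunk` are hypothetical names, and the file is assumed to be non-empty, tab-separated, numeric, and to start with a header row.

```r
# Read a gzipped, delimited numeric file through a connection, chunk by chunk.
# read_gz_chunks and process_chunk are hypothetical names for illustration.
read_gz_chunks <- function(path, process_chunk, chunk_lines = 10000L, sep = "\t") {
  con <- gzfile(path, open = "rt")
  on.exit(close(con), add = TRUE)
  # Assumes the first line is a header naming the columns.
  header <- strsplit(readLines(con, n = 1L), sep, fixed = TRUE)[[1]]
  n_rows <- 0L
  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0L) break  # end of file
    chunk <- as.matrix(read.table(text = lines, sep = sep, col.names = header))
    # first_row is the 1-based position of this chunk's first data row.
    process_chunk(chunk, first_row = n_rows + 1L)
    n_rows <- n_rows + nrow(chunk)
  }
  n_rows  # total number of data rows read
}
```

A callback could, for example, copy each chunk into the matching rows of a pre-allocated matrix. That requires knowing the total row count up front, since it cannot be read off the compressed file without decompressing it.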

@privefl
Contributor

privefl commented Jun 13, 2017

Hey, check this function. You can find a vignette with more information.
This may not be super fast, but it is quite flexible.
Check all the arguments you need to specify, especially file.nline, which you have to provide explicitly because the function can't compute it on a compressed file.

@jarbet

jarbet commented Mar 28, 2023

Any updates on this? I am trying to read a large .txt.gz file that contains character/string data. I know fread can read .txt.gz files, but the file is larger than my available RAM. I can't use bigstatsr::big_read because it does not support character type data.

Would it be possible to combine read.big.matrix with fread in some way, to support reading .gz files?

@privefl
Contributor

privefl commented Mar 29, 2023

Maybe this?

@jarbet

jarbet commented Mar 29, 2023

> Maybe this?

Cool, I see they have a workaround for reading .gz files, so this should work. Thanks!
