copy_to dimension limit #49

Closed
karoliskascenas opened this issue Feb 2, 2021 · 2 comments

Comments

@karoliskascenas
Contributor

Is there any reason why this limitation exists? I've rebuilt the package without it and was successfully able to write a 50000x5 table.

@ianmcook
Owner

ianmcook commented Feb 3, 2021

The only direct means implyr has of copying data from an R session into an Impala table is to run a SQL INSERT ... VALUES() statement, with the values to be inserted represented as literals in the statement. This method of loading data into Impala is kludgy, inefficient for large amounts of data, and potentially difficult to cancel or undo. That's why this limit exists.

Whenever possible, I'd encourage you to use a more efficient method of loading data into whatever storage system backs your Impala table. But I recognize that this implyr method might be your best option in some cases.

The choice of the number 1000 was somewhat arbitrary. It was based on some informal tests I performed while connecting to a remote instance of Impala over an average-speed internet connection. If you would like to submit a PR to make this limit configurable using options(), I will gladly merge it. Thank you.
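For readers wondering what "configurable using options()" could look like, here is a minimal sketch. It is not implyr's code: the option name `example.copy_to_size_limit` and the helpers `get_copy_limit()` and `check_copy_size()` are hypothetical, and it assumes the limit counts values (rows times columns), which the thread does not confirm.

```r
# Hypothetical option name for illustration; not taken from implyr.
copy_limit_option <- "example.copy_to_size_limit"

# Return the configured limit, falling back to the default of 1000
# mentioned in the comment above.
get_copy_limit <- function() {
  getOption(copy_limit_option, default = 1000)
}

# Stop with an error if the data frame holds more values than the limit.
# Assumption: the limit counts values (rows times columns).
check_copy_size <- function(df) {
  limit <- get_copy_limit()
  n_values <- nrow(df) * ncol(df)
  if (n_values > limit) {
    stop(
      "Data frame has ", n_values, " values, above the limit of ", limit,
      ". Raise it with options(", copy_limit_option, " = ...).",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# A small data frame stays under the default limit.
small <- data.frame(x = 1:10, y = letters[1:10])
check_copy_size(small)

# A 50000 x 5 data frame only passes once the user raises the limit.
big <- data.frame(matrix(0, nrow = 50000, ncol = 5))
options(example.copy_to_size_limit = 250000)
check_copy_size(big)
```

Reading the limit with `getOption()` at call time, rather than once at package load, means a user can raise or lower it mid-session without reloading anything.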

@karoliskascenas
Contributor Author

#50
