Snowflake: Add Support for "read_csv" with https #7591
Conversation
LGTM, if a bit circuitous (internet to local to internet) :)
Is there any chance that this will ever be natively supported by Snowflake itself?
Happy to help add a test and deal with the optional requests dependency.
So Snowflake has this in PrPr: https://docs.snowflake.com/en/sql-reference/sql/create-external-access-integration, but it seems a little excessive and often requires higher permissions. I agree, it's strange going from internet to local and back to internet; I wish it could be streamlined. I was thinking that we could simplify the code by using the Snowflake connector's …

Could we use urllib rather than requests to avoid that dep?
I think so, and it looks like it'll be a bit cleaner:

```python
from urllib.request import urlretrieve

tmp_file, _ = urlretrieve("https://storage.googleapis.com/ibis-tutorial-data/wowah_data/locations.csv")
# PUT file://{tmp_file}
```
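To see what `urlretrieve` actually returns without touching the network, here's a small self-contained sketch using a `file://` URL; the CSV content and file name are made up for illustration:

```python
import csv
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

# Write a small CSV locally to stand in for the remote file.
src = Path(tempfile.mkdtemp()) / "locations.csv"
src.write_text("id,name\n1,Orgrimmar\n2,Stormwind\n")

# urlretrieve accepts file:// URLs too; it returns the path of the
# local copy, which is what would be handed to the PUT command.
tmp_file, _ = urlretrieve(src.as_uri())
rows = list(csv.reader(open(tmp_file)))
print(rows[0])  # ['id', 'name']
```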
Would adding this functionality to the other Snowflake `read_*` methods be worthwhile?

```python
if path.startswith("https://"):
    with tempfile.NamedTemporaryFile() as tmp:
        urlretrieve(path, filename=tmp.name)
        tmp.flush()
        con.exec_driver_sql(
            f"PUT 'file://{tmp.name}' @{stage} PARALLEL = {threads:d} AUTO_COMPRESS = TRUE"
        )
else:
    con.exec_driver_sql(
        f"PUT 'file://{Path(path).absolute()}' @{stage} PARALLEL = {threads:d} AUTO_COMPRESS = TRUE"
    )
```

I guess there's also the possibility of "http" being an accepted prefix.
@sfc-gh-twhite Yeah, I think so! Let's do that in a follow up though!
LGTM!
🚢
The included test passes.
Currently, using the Snowflake backend with Ibis, you'll get an error if you specify a URL as the file path for read_csv. This works with other engines, where the behavior is natively supported, but Snowflake requires additional preprocessing steps.
I added logic to download the content to a temporary file and load it into the temporary stage. Complex file formats might cause issues with this, but I'll try to do some additional testing.
We could likely add this to the other read_... methods, but I wanted to start with this for now. It's possible to make this work with S3, Azure Blob, GCS, etc., but that could get complex when determining whether to use client storage options or the integrations available natively in Snowflake.
Here's some code you can use to test the new feature:
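The original snippet was not captured in this export. As a stand-in, here's a self-contained sketch of the download-then-load flow the feature relies on, using a `file://` URL so no Snowflake connection or network access is needed; all file names and contents are made up:

```python
import csv
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

# Stand-in for a remote CSV: a local file exposed via a file:// URL.
remote = Path(tempfile.mkdtemp()) / "prices.csv"
remote.write_text("sku,price\nA1,9.99\nB2,4.50\n")
url = remote.as_uri()

# Mirror the backend's approach: fetch the URL into a temp file, then
# read from that local copy (the backend would PUT it to a stage here).
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    urlretrieve(url, filename=tmp.name)
    local_path = tmp.name

with open(local_path, newline="") as f:
    rows = list(csv.DictReader(f))

print(rows[0]["sku"])  # A1
```

With a configured Snowflake connection, the equivalent end-to-end check would be passing an https URL directly to read_csv and confirming the returned table matches the file contents.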