Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snowflake: Add Support for "read_csv" with https #7591

Merged
merged 1 commit into from
Dec 18, 2023
Merged

Snowflake: Add Support for "read_csv" with https #7591

merged 1 commit into from
Dec 18, 2023

Conversation

IndexSeek
Copy link
Contributor

Currently using the Snowflake backend with Ibis, you'll get an error if you specify a URL as a file path for read_csv. This works with other engines where the behavior is natively supported, but Snowflake requires those additional preprocessing steps.

I added a portion to use a TempFile and load the content to the temporary stage. Complex file formats might cause issues with this, but I'll try to do some additional testing.

We could likely add this to the other read_... methods, but I wanted to start with this for now.

It's possible to make this work with S3, Azure Blob, GCS, etc., but that could get complex determining whether or not to use client storage options or integrations available natively in Snowflake.

Here's some code you can use to test the new feature:

from ibis.interactive import *
from pathlib import Path
from snowflake.ml.utils.connection_params import SnowflakeLoginOptions

con = ibis.snowflake.connect(**SnowflakeLoginOptions())

con.read_csv("https://storage.googleapis.com/ibis-tutorial-data/wowah_data/locations.csv")

Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, if a bit circuitous (internet to local to internet) :)

Is there any chance that this will ever be natively supported by Snowflake itelf?

Happy to help add a test and deal with the optional requests dependency.

@IndexSeek
Copy link
Contributor Author

So Snowflake has this in PrPr: https://docs.snowflake.com/en/sql-reference/sql/create-external-access-integration, but it seems a little excessive and often requires higher permission.I agree, it's strange going from Internet, Local, back to Internet, I wished it could be streamlined.

I was thinking that we could simplify the code by using the Snowflake connector's put_stream function, but I wasn't sure if that would require a larger change.

Could we use urllib rather than requests to avoid that dep?

@cpcloud
Copy link
Member

cpcloud commented Nov 18, 2023

Could we use urllib rather than requests to avoid that dep?

I think so, and it looks like it'll be a bit cleaner:

from urllib.request import urlretrieve

tmp_file, _ = urlretrieve("https://storage.googleapis.com/ibis-tutorial-data/wowah_data/locations.csv")

# PUT file://{tmp_file}

@cpcloud cpcloud added this to the 7.2 milestone Nov 19, 2023
@cpcloud cpcloud added feature Features or general enhancements snowflake The Snowflake backend ux User experience related issues io Issues related to input and/or output labels Nov 19, 2023
@sfc-gh-twhite
Copy link

Would adding this functionality to the other Snowflake read... methods also be worthwhile? Perhaps breaking out this piece for reusability:

if path.startswith("https://"):
    with tempfile.NamedTemporaryFile() as tmp:
        urlretrieve(path, filename=tmp.name)
        tmp.flush()
        con.exec_driver_sql(
            f"PUT 'file://{tmp.name}' @{stage} PARALLEL = {threads:d} AUTO_COMPRESS = TRUE"
        )
else:
    con.exec_driver_sql(
        f"PUT 'file://{Path(path).absolute()}' @{stage} PARALLEL = {threads:d} AUTO_COMPRESS = TRUE"
    )

I guess there's also the possibility of "http" being an accepted prefix.

@cpcloud
Copy link
Member

cpcloud commented Nov 20, 2023

@sfc-gh-twhite Yeah, I think so!

Let's do that in a follow up though!

Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

🚢

@cpcloud
Copy link
Member

cpcloud commented Dec 18, 2023

The included test passes:

ibis/backends/snowflake/tests/test_client.py::test_read_csv_https PASSED [100%]

@cpcloud cpcloud enabled auto-merge (rebase) December 18, 2023 17:53
@cpcloud cpcloud merged commit 72752eb into ibis-project:master Dec 18, 2023
79 of 80 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements io Issues related to input and/or output snowflake The Snowflake backend ux User experience related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants