feat: Add better spark support for snowflake offline store #3419
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: sfc-gh-madkins. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files.
/ok-to-test
@amithadiraju1694 try this out ... you will need a PySpark environment with the Snowflake Spark connector installed. You will need to pass in the Spark session plus a dict of Snowflake login params ... see the function comments.
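A hedged sketch of what that login-params dict might look like. The `sf*` option keys follow the Snowflake Spark connector's naming convention, but every value, the package coordinates, and the helper name below are illustrative assumptions, not taken from the PR:

```python
# Illustrative sketch: option keys follow the Snowflake Spark connector's
# "sf*" convention; all values here are placeholders, not from the PR.

# The connector and JDBC driver must be on Spark's classpath, e.g. via
# spark.jars.packages. Exact versions depend on your Spark/Scala build.
SNOWFLAKE_PACKAGES = (
    "net.snowflake:spark-snowflake_2.12:2.11.0-spark_3.3,"
    "net.snowflake:snowflake-jdbc:3.13.30"
)

def make_sf_params(account: str, user: str, password: str,
                   database: str, schema: str, warehouse: str) -> dict:
    """Build the login-params dict that gets passed to .options(**sf_params)."""
    return {
        "sfURL": f"{account}.snowflakecomputing.com",
        "sfUser": user,
        "sfPassword": password,
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }

sf_params = make_sf_params("my_account", "my_user", "my_password",
                           "MY_DB", "MY_SCHEMA", "MY_WH")
```

The dict is then spread into the reader, as in `spark.read.format("net.snowflake.spark.snowflake").options(**sf_params)`.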
@amithadiraju1694 you probably already tried this code, is my guess ... the reason you were getting that error is that the initial input DataFrame is scoped to a different connection and Spark can't find it.
Thanks for this @sfc-gh-madkins. I tried different variations of this along with your original solution, but the execution never stops (Databricks keeps saying "Running Command"). My final attempt looked like this in `snowflake.py`:

```python
def to_pyspark_df(self, spark_session: SparkSession, sfparam: dict) -> DataFrame:
    """
    Convert Snowflake query results to a PySpark DataFrame.

    Args:
        spark_session: the SparkSession of the current environment.
        sfparam: Snowflake connection options for the Spark connector.

    Returns:
        spark_df: a PySpark DataFrame.
    """
    if isinstance(spark_session, SparkSession):
        # Materialize the query results into a uniquely named table
        table_name = "feast_spark_" + uuid.uuid4().hex
        self.to_snowflake(table_name=table_name)

        # Read the table back through the Snowflake Spark connector
        query = f'SELECT * FROM "{table_name}"'
        spark_df = (
            spark_session.read.format("net.snowflake.spark.snowflake")
            .options(**sfparam)
            .option("query", query)
            .option("autopushdown", "on")
            .load()
        )

        # Drop the intermediate table now that Spark has loaded it
        query = f'DROP TABLE "{table_name}"'
        execute_snowflake_statement(self.snowflake_conn, query)

        return spark_df
```

I tried the original solution as well, which gives the same result. I'm wondering whether, in the original solution, `snowflake.py` lines 486 to 500 should run inside the `with query` scope or outside it. If inside, I'm confused about why they need to run inside that scope.

Are you sure you have the Snowflake Spark connector installed? Do you see the query being issued on the Snowflake side? I was able to test this successfully locally. Temporary tables are scoped to a specific Snowflake session.
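Because temporary tables are invisible to the separate connection the Spark connector opens, the workaround above materializes results into a uniquely named regular table, reads it back, then drops it. A minimal sketch of just that naming/SQL plumbing, with the helper name being mine rather than anything in the PR:

```python
import uuid

def staging_statements(prefix: str = "feast_spark_") -> tuple:
    """Build a unique staging-table name plus the SELECT used by the
    Spark reader and the DROP used for cleanup afterwards."""
    table_name = prefix + uuid.uuid4().hex
    select_stmt = f'SELECT * FROM "{table_name}"'
    drop_stmt = f'DROP TABLE "{table_name}"'
    return table_name, select_stmt, drop_stmt

name, select_stmt, drop_stmt = staging_statements()
```

A hex UUID keeps concurrent callers from colliding on the staging-table name, and quoting the name preserves its lower-case prefix against Snowflake's default upper-casing.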
I was running my code on Databricks, so does …
I debugged and found that the program halts at `self.to_snowflake(table_name)`; the temporary table with the given table name isn't created at all in the given database.schema, for some reason. I'm not sure if this is because of access issues; I'm using a dev instance and should have the required access.

Is there an error on the Snowflake side?
In the `to_snowflake` method, the `temporary` argument was set to false by default; changing that to true solved the unresponsiveness of the query. But now I see a SQL compilation error: my `DB.SCHEMA."table_name"` is not found or not authorized. I faced a similar issue before, for which I made a quick fix, but even the quick fix isn't working now (my schema is not public and contains an underscore in its name).

Can you try this using the default Snowflake project? `feast init -t snowflake`. The `temporary` argument should be set to false, as you don't want to create a temporary table. `sf_params` should match the offline store params.
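The "not found or not authorized" error is consistent with Snowflake's identifier rules: unquoted identifiers are folded to upper case, while double-quoted identifiers are matched case-sensitively, so a mixed-case or non-PUBLIC schema has to be quoted exactly. A small sketch of building a fully quoted qualified name; the helper is illustrative, not part of the PR:

```python
def quoted_qualified_name(database: str, schema: str, table: str) -> str:
    """Return DATABASE.SCHEMA.TABLE with each part double-quoted so that
    mixed-case names and schemas with underscores resolve verbatim."""
    def quote(part: str) -> str:
        # Escape embedded double quotes per Snowflake's quoting rules.
        return '"' + part.replace('"', '""') + '"'
    return ".".join(quote(p) for p in (database, schema, table))

stmt = "SELECT * FROM " + quoted_qualified_name("MY_DB", "my_schema", "feast_spark_ab12")
```

Quoting every part means the name the DROP statement targets is byte-for-byte the name the CREATE produced, regardless of the session's case-folding.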
faf98d7 to 9affa5c (compare)
@amithadiraju1694 is this new PR going to break your existing code?
Signed-off-by: Miles Adkins <[email protected]>
@adchia this might cause a breaking change for a single user, but he has been unresponsive.
Signed-off-by: miles.adkins <[email protected]>
What this PR does / why we need it:
Adds Spark output support to the Snowflake offline store.
Which issue(s) this PR fixes:
Fixes #3364