T-SQL HASHBYTES function is not replaced when converting to Spark SQL.
import sqlglot
qry = "SELECT HASHBYTES('SHA2_256', 'input') AS hash"
sqlglot.transpile(qry, read="tsql", write="spark")[0]
Output:
SELECT HASHBYTES('SHA2_256', 'input') AS hash
Expected output:
SELECT SHA2('input', 256) AS hash
This conversion is complicated by the fact that HASHBYTES and its algorithm argument can translate to different functions in Spark SQL:
| T-SQL | Spark SQL |
| --- | --- |
| HASHBYTES('SHA1', 'input') | SHA1('input') |
| HASHBYTES('SHA2_256', 'input') | SHA2('input', 256) |
| HASHBYTES('SHA2_512', 'input') | SHA2('input', 512) |
| HASHBYTES('MD5', 'input') | MD5('input') |
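To make the mapping concrete, here is a minimal sketch of the lookup in plain Python. It is only an illustration of the table above, not how sqlglot implements or should implement the conversion; `SPARK_HASH_FUNCS` and `hashbytes_to_spark` are hypothetical names, and the helper works on SQL text rather than on sqlglot expressions.

```python
# Hypothetical lookup: T-SQL algorithm name -> (Spark function, optional bit length).
# Only the four algorithms listed in the table above are covered.
SPARK_HASH_FUNCS = {
    "SHA1": ("SHA1", None),
    "SHA2_256": ("SHA2", 256),
    "SHA2_512": ("SHA2", 512),
    "MD5": ("MD5", None),
}


def hashbytes_to_spark(algorithm: str, arg_sql: str) -> str:
    """Return the Spark SQL call matching HASHBYTES(algorithm, arg_sql).

    `arg_sql` is the already-rendered SQL text of the second argument,
    for example "'input'" or a column name.
    """
    try:
        func, bits = SPARK_HASH_FUNCS[algorithm.upper()]
    except KeyError:
        raise ValueError(f"No Spark equivalent listed for algorithm {algorithm!r}")
    if bits is None:
        # SHA1 and MD5 take only the value to hash.
        return f"{func}({arg_sql})"
    # SHA2 takes the bit length as a second argument.
    return f"{func}({arg_sql}, {bits})"


print(hashbytes_to_spark("SHA2_256", "'input'"))
```

An algorithm that is not in the table raises `ValueError` instead of silently passing HASHBYTES through.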
It's also important to note that equivalent functions in T-SQL and Spark SQL don't produce matching outputs since HASHBYTES returns VARBINARY and Spark SQL equivalents (sha1, sha2, etc.) return hex strings. This may or may not be important depending on the project.
cjkoester changed the title from "T-SQL Hashbytes not Converted" to "T-SQL HASHBYTES not Converted" on Apr 29, 2023
Thanks for sharing this library and your quick response!
There is a binary type in Spark, but at the moment I'm not sure how to get an equivalent result to HASHBYTES in Spark, or if the added complexity is warranted. My interest in this involves data warehouse migrations, where the hashes aren't necessarily required to match between systems.
It is trivial to modify T-SQL to match Spark, but that is the reverse of this scenario.
The T-SQL below returns the same result as SELECT sha2('input', 256) as hash in Spark.
SELECT lower(convert(char(64), HASHBYTES('SHA2_256', 'input'), 2)) as hash
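For anyone who wants to check the relationship outside of either database, the same steps can be reproduced with Python's `hashlib`. This is a rough sketch, assuming an ASCII-only varchar input so that both systems hash the same bytes (an nvarchar input would be hashed from a different byte encoding on the T-SQL side and would not match).

```python
import hashlib

# Assumption: ASCII-only varchar input, so the bytes are identical on both systems.
data = "input".encode("utf-8")

# Analogous to HASHBYTES('SHA2_256', 'input'): 32 raw bytes (VARBINARY).
raw = hashlib.sha256(data).digest()

# Analogous to CONVERT(char(64), <binary>, 2): uppercase hex without a 0x prefix.
hex_upper = raw.hex().upper()

# Analogous to LOWER(...): lowercase hex string.
hex_lower = hex_upper.lower()

# Analogous to Spark's sha2('input', 256), which returns a lowercase hex string.
spark_style = hashlib.sha256(data).hexdigest()

print(len(raw), len(hex_upper), hex_lower == spark_style)
```

The binary digest and the hex string carry the same information; only the representation differs, which is why the T-SQL side needs the CONVERT and LOWER wrappers to line up with Spark.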