
T-SQL HASHBYTES not Converted #1508

Closed
cjkoester opened this issue Apr 29, 2023 · 2 comments

Comments


cjkoester commented Apr 29, 2023

T-SQL HASHBYTES function is not replaced when converting to Spark SQL.

```python
import sqlglot

qry = "SELECT HASHBYTES('SHA2_256', 'input') as hash"
sqlglot.transpile(qry, read="tsql", write="spark")[0]
```

Output:

```sql
SELECT HASHBYTES('SHA2_256', 'input') AS hash
```

Expected output:

```sql
SELECT SHA2('input', 256) AS hash
```

This conversion is complicated by the fact that HASHBYTES and its algorithm argument can translate to different functions in Spark SQL:

| T-SQL | Spark SQL |
| --- | --- |
| `HASHBYTES('SHA1', 'input')` | `SHA1('input')` |
| `HASHBYTES('SHA2_256', 'input')` | `SHA2('input', 256)` |
| `HASHBYTES('SHA2_512', 'input')` | `SHA2('input', 512)` |
| `HASHBYTES('MD5', 'input')` | `MD5('input')` |
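The mapping above can be sketched in plain Python. This is only an illustration of the algorithm-name-to-function translation, not sqlglot's actual transform API; the helper name `hashbytes_to_spark` and the string-based approach are assumptions:

```python
# Hypothetical helper illustrating the HASHBYTES -> Spark SQL mapping.
# This is plain string manipulation, not sqlglot's real transform machinery.

def hashbytes_to_spark(algorithm: str, arg_sql: str) -> str:
    """Map a T-SQL HASHBYTES algorithm name to a Spark SQL call."""
    algorithm = algorithm.upper()
    if algorithm.startswith("SHA2_"):
        # SHA2_256 / SHA2_512 -> SHA2(expr, 256) / SHA2(expr, 512)
        bits = algorithm.split("_", 1)[1]
        return f"SHA2({arg_sql}, {bits})"
    if algorithm in ("SHA", "SHA1"):
        return f"SHA1({arg_sql})"
    if algorithm == "MD5":
        return f"MD5({arg_sql})"
    raise ValueError(f"No Spark SQL equivalent for HASHBYTES({algorithm!r})")

print(hashbytes_to_spark("SHA2_256", "'input'"))  # SHA2('input', 256)
```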

It's also important to note that equivalent functions in T-SQL and Spark SQL don't produce matching outputs since HASHBYTES returns VARBINARY and Spark SQL equivalents (sha1, sha2, etc.) return hex strings. This may or may not be important depending on the project.
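The mismatch can be reproduced outside either engine with Python's `hashlib`, where `digest()` stands in for HASHBYTES' VARBINARY result and `hexdigest()` for Spark's string result (an illustration only, not either engine's implementation):

```python
import hashlib

h = hashlib.sha256(b"input")

raw = h.digest()         # 32 raw bytes, analogous to HASHBYTES' VARBINARY
hex_str = h.hexdigest()  # 64-char lowercase hex string, like Spark's sha2(..., 256)

print(len(raw), len(hex_str))   # 32 64
print(raw == hex_str.encode())  # False: same hash, different representations
print(raw.hex() == hex_str)     # True once the bytes are hex-encoded
```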

@cjkoester cjkoester changed the title T-SQL Hashbytes not Converted T-SQL HASHBYTES not Converted Apr 29, 2023
tobymao (Owner) commented Apr 29, 2023

thanks for the clear input and output. we’ll have this in soon.

should we convert hex strings into binary? does spark support binary types?

cjkoester (Author) commented

Thanks for sharing this library and your quick response!

There is a binary type in Spark, but at the moment I'm not sure how to get an equivalent result to HASHBYTES in Spark, or if the added complexity is warranted. My interest in this involves data warehouse migrations, where the hashes aren't necessarily required to match between systems.

It is trivial to modify T-SQL to match Spark, but that is the reverse of this scenario.

The T-SQL below returns the same result as SELECT sha2('input', 256) as hash in Spark.

```sql
SELECT lower(convert(char(64), HASHBYTES('SHA2_256', 'input'), 2)) as hash
```
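Why this lines up, sketched with `hashlib`: T-SQL's `CONVERT(..., 2)` hex-encodes the VARBINARY without a `0x` prefix (producing uppercase hex), and `lower()` then matches Spark's lowercase hex output. A minimal model, assuming that CONVERT behavior:

```python
import hashlib

digest = hashlib.sha256(b"input").digest()  # VARBINARY analogue of HASHBYTES

# CONVERT(char(64), ..., 2): hex-encode without a '0x' prefix, uppercase.
tsql_hex = digest.hex().upper()

# lower(...) then matches what sha2('input', 256) returns in Spark.
spark_hex = hashlib.sha256(b"input").hexdigest()
print(tsql_hex.lower() == spark_hex)  # True
```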
