Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support for varbinary data type #33

Closed
BuzzCutNorman opened this issue Mar 7, 2023 · 4 comments · Fixed by #37
Closed

feat: Support for varbinary data type #33

BuzzCutNorman opened this issue Mar 7, 2023 · 4 comments · Fixed by #37

Comments

@BuzzCutNorman
Copy link
Owner

BuzzCutNorman commented Mar 7, 2023

Binary data types of variable length.

https://learn.microsoft.com/en-us/sql/t-sql/data-types/binary-and-varbinary-transact-sql?view=sql-server-ver16

https://docs.sqlalchemy.org/en/14/core/type_basics.html#sqlalchemy.types.VARBINARY
Construct a VARBINARY type.

Parameters:
length – optional, a length for the column for use in DDL statements, for those binary types that accept a length, such as the MySQL BLOB type.

filestream=False –

if True, renders the FILESTREAM keyword in the table definition. In this case length must be None or 'max'.

New in version 1.4.31.

@BuzzCutNorman
Copy link
Owner Author

This is the test table I came up with. I used a Meltano logo image from their website as the binary file I loaded into TestColumn1.

use [testdata]
go
/*********************************
Create the simple test table
in an MSSQL database
*********************************/
DROP TABLE IF EXISTS [TestTable];
CREATE TABLE TestTable (
    Id int IDENTITY(1,1) PRIMARY KEY,
    TestColumn1 VARBINARY(MAX)
);
go
/*********************************
Insert some test data into the
test table
*********************************/
INSERT INTO [testdata].[dbo].[TestTable]
           ([TestColumn1])
     VALUES
           ((SELECT * FROM OPENROWSET(BULK N'C:\Development\Work\test-sql-varbinary\meltano-logo.png', SINGLE_BLOB) as T1))
;

@BuzzCutNorman
Copy link
Owner Author

I need a way to test how the picture looks in the table after it is loaded and then view the picture again after the EL is over. Here is the simple program I used to test with.

import io
import sqlalchemy as sqla
import matplotlib.pyplot as plt
import matplotlib.image as mpimg


engine = sqla.create_engine("mssql+pyodbc://[username]:[password]@[server]:1433/[database name]?driver=ODBC+Driver+17+for+SQL+Server&TrustServerCertificate=yes")

# Create the Metadata Object
meta_data = sqla.MetaData()
 
# Get the `TestTable` table from the Metadata object
TestTable = sqla.Table('TestTable', meta_data, autoload_with=engine)

# SQLAlchemy Query to SELECT specific column
query = TestTable.select()

# Fetch one record
with engine.connect() as connection:
    result = connection.execute(TestTable.select()).fetchone()
 
# View the record
print(type(result[1]))
print(result)
image = result[1]

i = io.BytesIO(image)
i = mpimg.imread(i, format='PNG')

plt.imshow(i)
plt.show()

@BuzzCutNorman
Copy link
Owner Author

BuzzCutNorman commented Mar 13, 2023

After looking at the JSON Schema documentation I thought I would use the keywords string and contentEncoding to define a binary, image, or varbinary data type. On the target side this definition will be translated to varbinary. The documentation says that normally encoding of pictures is base64 so I will try and match that.

https://json-schema.org/understanding-json-schema/reference/non_json_data.html#contentencoding

# VARBINARY(MAX) definition
return {
    "type": ["string"],
    "contentEncoding": "base64",
}

# VARBINARY(N) definition
return {
    "type": ["string"],
    "contentEncoding": "base64",
    "maxLength": maxLength
}

@BuzzCutNorman
Copy link
Owner Author

While trying to add support for varbinary I found a couple of things that would need to be added to the SDK to allow for the JSON Scheme definition I chose.

First in singer_sdk._singerlib.schema the Class itself needs to have the following variable added to the Schema class. Then the STANDARD_KEYS needs a corresponding entry added as well. Once these are added the keyword contentEncoding will be present in the schema message for the target to consume.

STANDARD_KEYS = [
...
    "contentEncoding",
...
]

@dataclass
class Schema:
...
    contentEncoding: str | None = None

    def to_dict(self) -> dict[str, t.Any]:

Second, we need add the encoding code to _conform_primitive_property() in singer_sdk.helpers._typing .

def _conform_primitive_property(elem: Any, property_schema: dict) -> Any:
    """Converts a primitive (i.e. not object or array) to a json compatible type."""
...
    elif isinstance(elem, bytes):
...
        if property_schema.get("contentEncoding") == "base64":
            return base64.b64encode(elem)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant