Error with read_iceberg function when using an s3 warehouse location #2004
Comments
I think I know what this might be. We may not be handling field metadata correctly... Could you try this code that I modified from your snippet and print the output please?
import daft
from pyiceberg.catalog.sql import SqlCatalog
catalog = SqlCatalog(
    "default",
    **{
        "uri": "postgresql://my-postgresql-server-url",
        "warehouse": "s3://my-bucket/warehouse",
    },
)
table = catalog.load_table("test_database.test_table")
# Inspect the schema and fields
from pyiceberg.io.pyarrow import schema_to_pyarrow
from daft.logical.schema import Schema
iceberg_schema = table.schema()
arrow_schema = schema_to_pyarrow(table.schema())
daft_schema = Schema.from_pyarrow_schema(arrow_schema)
print("Iceberg Schema:\n", iceberg_schema)
print("Converted arrow schema:\n", arrow_schema)
print("Converted Daft schema:\n", daft_schema)
Thanks for taking a look, here is the output:
@samster25 to reproduce:
import daft
from pyiceberg.catalog.sql import SqlCatalog
warehouse_path = "s3://eventual-data-test-bucket/test-iceberg-issue-2004/"
catalog = SqlCatalog(
    "default",
    **{
        # Plain string: the original f-string had no placeholders
        "uri": "sqlite:///pyiceberg_test_catalog.db",
        "warehouse": warehouse_path,
    },
)
catalog_table = catalog.load_table("test_database.test_table")
daft.read_iceberg(catalog_table).to_pandas()
Second try:
* Ignores metadata when comparing Fields
* Adds test that reads table written by pyiceberg

closes: #2004
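For anyone following along, here is a rough sketch of why ignoring metadata in the Field comparison helps. It is not Daft's actual code: the `Field` class, the `strict_equal` helper, and the `PARQUET:field_id` metadata key are all assumptions made up for illustration. The idea is that two fields with the same name and type should compare equal even when only one of them carries extra metadata.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field; not Daft's real class."""

    name: str
    dtype: str
    # compare=False keeps metadata out of the generated __eq__,
    # so equality depends on name and dtype only.
    metadata: dict = field(default_factory=dict, compare=False)


def strict_equal(a: Field, b: Field) -> bool:
    """Equality that also requires identical metadata (the failing behaviour)."""
    return a == b and a.metadata == b.metadata


# Same column seen from two sources: one carries a field id, one does not.
# The metadata key here is an illustrative assumption.
from_catalog = Field("id", "int64", {"PARQUET:field_id": "1"})
from_file = Field("id", "int64")

print(from_catalog == from_file)              # metadata ignored
print(strict_equal(from_catalog, from_file))  # metadata compared
```

With the metadata-insensitive comparison the two fields match, while the strict variant treats them as different, which is the kind of mismatch that can surface as a schema error.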
@maxime-petitjean Just merged in a fix to this issue! We plan to release the next version of daft tomorrow! Will ping you when it's out :)
@maxime-petitjean we just cut
Thanks a ton for your help guys, this is awesome.
Tested and working, thanks!
Describe the bug
When reading a table from Iceberg (with the read_iceberg function), I get a weird schema error:
This error appears only if I use an s3 warehouse location in Iceberg.
To Reproduce
Expected behavior
If I use a local path (file:///tmp/warehouse) in the warehouse configuration, I get the correct result:
Desktop: