Question: how to speed up iceberg reads #55

michaelzwong · 2024-05-31T23:19:13Z

I'm currently reading from my Glue-cataloged Iceberg table using the following:

import duckdb

# Load the httpfs and iceberg extensions and configure S3 credentials.
duckdb.sql(
    f"""
    INSTALL httpfs;
    LOAD httpfs;
    SET s3_region = 'us-west-2';
    SET s3_access_key_id = '{settings.AWS_ACCESS_KEY_ID}';
    SET s3_secret_access_key = '{settings.AWS_SECRET_ACCESS_KEY}';
    INSTALL iceberg;
    LOAD iceberg;
    """
)
# Read the first 100 rows through the Iceberg table metadata.
res = duckdb.execute(
    "SELECT * FROM iceberg_scan('s3://foopath') LIMIT 100"
)

Execution is very slow compared to reading the .parquet files at the same path directly (e.g. 2 minutes vs. 2 seconds):

# Read the same data by globbing the Parquet files directly.
res = duckdb.execute(
    "SELECT * FROM parquet_scan('s3://foopath/*.parquet') LIMIT 100"
)

I'd like to know what I'm doing wrong, or whether anyone has a solution.


harel-e · 2024-06-04T09:14:31Z

Hi,

I would first suggest running the query with EXPLAIN ANALYZE and posting the results here.
The cause might be issue #2, where more Parquet files are scanned than necessary.
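
For reference, here is a minimal sketch of how the profiling run could be done from Python. The helper names `explain_analyze_sql` and `profile_query` are hypothetical (they are not part of DuckDB), and the sketch assumes the result rows carry the plan text in their last column. Note that EXPLAIN ANALYZE actually executes the query, so the profiled run takes as long as the slow read.

```python
def explain_analyze_sql(query: str) -> str:
    """Prefix a single SQL statement with EXPLAIN ANALYZE (hypothetical helper)."""
    # Drop surrounding whitespace and a trailing semicolon before prefixing.
    return "EXPLAIN ANALYZE " + query.strip().rstrip(";").strip()


def profile_query(query: str, con=None) -> str:
    """Run the query under EXPLAIN ANALYZE and return the plan as text."""
    import duckdb  # imported lazily so the helper above has no dependencies

    if con is None:
        con = duckdb  # use the module-level default connection, as in the question
    rows = con.execute(explain_analyze_sql(query)).fetchall()
    # Assumption: each row holds the plan text in its last column.
    return "\n".join(str(row[-1]) for row in rows)
```

After the extension and credential setup from the original post, something like `print(profile_query("SELECT * FROM iceberg_scan('s3://foopath') LIMIT 100"))` should print the analyzed plan, which can then be pasted into this thread.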

michaelzwong changed the title ~~Question:~~ Question: how to speed up iceberg reads May 31, 2024
