
Reading a remote Parquet file with a simple WHERE clause transfers more than twice the file's size over the network #1577

Open
ericemc3 opened this issue Jan 13, 2024 · 3 comments


@ericemc3

What happens?

My remote Parquet file is 7.2 MB.
If I read it with a simple WHERE clause, more than 15 MB are transferred over the network.

To Reproduce

CREATE OR REPLACE TABLE t AS FROM 'https://static.data.gouv.fr/resources/tables-aufilduboamp-2024/20240113-061700/boamp-panorama-2024-parquet-integral.parquet' ;
=> 7.2 MB transferred (Chrome DevTools network inspector)

CREATE OR REPLACE TABLE t AS FROM 'https://static.data.gouv.fr/resources/tables-aufilduboamp-2024/20240113-061700/boamp-panorama-2024-parquet-integral.parquet'
WHERE P_35_Typemarche = 'SERVICES'  ;

=> 15.6 MB transferred

OS:

Win11

DuckDB Version:

0.9.2

DuckDB Client:

Wasm shell or CLI

Full Name:

eric mauviere

Affiliation:

icem7

Have you tried this on the latest main branch?

I have tested with a main build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
@carlopi
Collaborator

carlopi commented Jan 13, 2024

Thanks a lot for the bug report.

This does not reproduce in the DuckDB CLI, where in both cases 7.5 MB go through the network, according to EXPLAIN ANALYZE.
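For reference, here is a minimal sketch of that CLI check. It assumes the httpfs extension is available (recent DuckDB versions can also autoload it for https:// URLs) and that the profiler output includes HTTP statistics, such as bytes received, for remote reads. The exact layout of that output varies between versions. The query mirrors the filtered statement from the report.

```sql
-- Load the extension that handles https:// URLs (may be autoloaded anyway).
INSTALL httpfs;
LOAD httpfs;

-- Profile the filtered read from the report. SELECT * reads every column,
-- like the original CREATE TABLE ... AS FROM statement.
EXPLAIN ANALYZE
SELECT *
FROM 'https://static.data.gouv.fr/resources/tables-aufilduboamp-2024/20240113-061700/boamp-panorama-2024-parquet-integral.parquet'
WHERE P_35_Typemarche = 'SERVICES';
```

Running the same statement without the WHERE clause provides the baseline transfer size for comparison.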

This problem is specific to the duckdb-wasm implementation of GET requests and needs to be solved there. It's pretty bad, since the multiplier can be even worse.

@szarnyasg: can you move it to the duckdb-wasm repository?

@szarnyasg szarnyasg transferred this issue from duckdb/duckdb Jan 13, 2024
@szarnyasg
Contributor

Thanks @carlopi for chiming in. I moved the issue.
