DuckDB WASM doesn't support all S3 options #1186

Closed
2 tasks done
rmoff opened this issue Mar 15, 2023 · 4 comments

Comments

@rmoff

rmoff commented Mar 15, 2023

What happens?

DuckDB WASM is missing several S3 configuration options, which limits its use:

s3_url_compatibility_mode     
s3_url_style                  
s3_uploader_thread_limit      
s3_uploader_max_parts_per_file
s3_uploader_max_filesize      
s3_use_ssl                    
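
For context, these are the kinds of options that native DuckDB uses when pointing at a self-hosted, S3-compatible endpoint. The following is a minimal Python sketch that renders the corresponding `SET` statements. The helper name `render_set_statement` and the example values (endpoint `localhost:9000`, path-style URLs, SSL off) are assumptions for illustration, not taken from this report; since `s3_url_style` and `s3_use_ssl` do not appear in the WASM settings list, those statements would presumably not be accepted there.

```python
def render_set_statement(name: str, value) -> str:
    """Render one DuckDB SET statement.

    Booleans and integers are emitted unquoted; everything else is
    emitted as a single-quoted string with embedded quotes doubled.
    """
    if isinstance(value, bool):
        literal = "true" if value else "false"
    elif isinstance(value, int):
        literal = str(value)
    else:
        literal = "'" + str(value).replace("'", "''") + "'"
    return f"SET {name}={literal};"


# Hypothetical values for a self-hosted, S3-compatible endpoint.
example_options = {
    "s3_endpoint": "localhost:9000",
    "s3_url_style": "path",
    "s3_use_ssl": False,
}

statements = [render_set_statement(k, v) for k, v in example_options.items()]
print("\n".join(statements))
```

The generated statements can be pasted into the native CLI; only `s3_endpoint` is listed among the WASM shell's settings.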

To Reproduce

DuckDB WASM (from https://shell.duckdb.org/):

duckdb> SELECT * FROM duckdb_settings() where name ilike 's3%';
┌──────────────────────┬───────┬────────────────────────────────────────┬────────────┐
│ name                 ┆ value ┆ description                            ┆ input_type │
╞══════════════════════╪═══════╪════════════════════════════════════════╪════════════╡
│ s3_endpoint          ┆       ┆ S3 Endpoint (default s3.amazonaws.com) ┆ VARCHAR    │
│ s3_session_token     ┆       ┆ S3 Session Token                       ┆ VARCHAR    │
│ s3_secret_access_key ┆       ┆ S3 Access Key                          ┆ VARCHAR    │
│ s3_access_key_id     ┆       ┆ S3 Access Key ID                       ┆ VARCHAR    │
│ s3_region            ┆       ┆ S3 Region                              ┆ VARCHAR    │
└──────────────────────┴───────┴────────────────────────────────────────┴────────────┘

DuckDB 0.7.1:

v0.7.1 b00b93f0b1
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
⚫◗
⚫◗ SELECT * FROM duckdb_settings() where name ilike 's3%';
┌────────────────────────────────┬──────────────────┬─────────────────────────────────────────────────────────────────────┬────────────┐
│              name              │      value       │                             description                             │ input_type │
│            varchar             │     varchar      │                               varchar                               │  varchar   │
├────────────────────────────────┼──────────────────┼─────────────────────────────────────────────────────────────────────┼────────────┤
│ s3_url_compatibility_mode      │ 0                │ Disable Globs and Query Parameters on S3 urls                       │ BOOLEAN    │
│ s3_url_style                   │ vhost            │ S3 url style ('vhost' (default) or 'path')                          │ VARCHAR    │
│ s3_endpoint                    │ s3.amazonaws.com │ S3 Endpoint (default 's3.amazonaws.com')                            │ VARCHAR    │
│ s3_uploader_thread_limit       │ 50               │ S3 Uploader global thread limit (default 50)                        │ UBIGINT    │
│ s3_session_token               │                  │ S3 Session Token                                                    │ VARCHAR    │
│ s3_uploader_max_parts_per_file │ 10000            │ S3 Uploader max parts per file (between 1 and 10000, default 10000) │ UBIGINT    │
│ s3_uploader_max_filesize       │ 800GB            │ S3 Uploader max filesize (between 50GB and 5TB, default 800GB)      │ VARCHAR    │
│ s3_use_ssl                     │ 1                │ S3 use SSL (default true)                                           │ BOOLEAN    │
│ s3_region                      │                  │ S3 Region                                                           │ VARCHAR    │
│ s3_access_key_id               │                  │ S3 Access Key ID                                                    │ VARCHAR    │
│ s3_secret_access_key           │                  │ S3 Access Key                                                       │ VARCHAR    │
├────────────────────────────────┴──────────────────┴─────────────────────────────────────────────────────────────────────┴────────────┤
│ 11 rows                                                                                                                    4 columns │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
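
As a quick cross-check of the two outputs above, the missing options can be derived as a set difference. This is a small Python sketch with the setting names copied by hand from the two tables; the variable names are illustrative only.

```python
# Setting names copied from the two duckdb_settings() outputs above.
wasm_settings = {
    "s3_endpoint",
    "s3_session_token",
    "s3_secret_access_key",
    "s3_access_key_id",
    "s3_region",
}

native_settings = {
    "s3_url_compatibility_mode",
    "s3_url_style",
    "s3_endpoint",
    "s3_uploader_thread_limit",
    "s3_session_token",
    "s3_uploader_max_parts_per_file",
    "s3_uploader_max_filesize",
    "s3_use_ssl",
    "s3_region",
    "s3_access_key_id",
    "s3_secret_access_key",
}

# Options present in the 0.7.1 CLI but absent from the WASM shell.
missing_in_wasm = sorted(native_settings - wasm_settings)

for name in missing_in_wasm:
    print(name)
```

The result is the same six options listed under "What happens?", and every setting exposed by the WASM shell also exists in the native CLI.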

OS:

MacOS / shell.duckdb.org

DuckDB Version:

0.7.1 (CLI) / v0.0.1-dev0 (wasm)

DuckDB Client:

CLI / shell.duckdb.org

Full Name:

Robin Moffatt

Affiliation:

Treeverse

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree
@rmoff
Author

rmoff commented Mar 15, 2023

I just realised that this should maybe be logged in the https://github.com/duckdb/duckdb-wasm repo instead, but I don't have permission to transfer it.

@Mytherin Mytherin transferred this issue from duckdb/duckdb Mar 15, 2023
@carlopi
Collaborator

carlopi commented Mar 15, 2023

Thanks for raising this issue, it's useful to keep track.

The HTTPS/S3 layer is currently special-cased for DuckDB Wasm given the different capabilities of the Web platform, and there are still some open questions on how to unify duckdb and duckdb-wasm on this.

@carlopi
Collaborator

carlopi commented Mar 23, 2023

Hi @rmoff, we have put up a very experimental deployment at https://shellwip.duckdb.org/ that includes loading of extensions.

LOAD httpfs;

works, and functions/settings appear to be exported. I expect this to still not be functional, though, and some specific work will be required, but there is a path for the S3 logic to be unified between main duckdb and duckdb-wasm.

There are some explanations of Wasm extension loading here (https://github.com/duckdb/duckdb-wasm-wip#readme); any feedback is very welcome.

@carlopi
Collaborator

carlopi commented Mar 23, 2023

I also added a tracking issue on extension loading: #1202.

S3 is kind of special, in the sense that tighter integration with duckdb-wasm is required, but could you potentially give your feedback there, or open a separate issue about support for S3 WITH the httpfs extension loaded?

The point is that, going forward, I think the proper solution is to unify S3 support by loading the HTTPFS extension if needed (potentially at startup, on the JavaScript side).

@carlopi carlopi closed this as completed Mar 23, 2023