[python] tcp connect error reading from a public s3 bucket with {"anon": "true"} #1554

Open
j-bennet opened this issue Jul 22, 2023 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@j-bennet
Environment

Delta-rs version: 0.10.0

Binding: Python

Environment:

  • Cloud provider:
  • OS: macOS Ventura 13.4
  • Other:

Bug

What happened:

Can't read a table from a public S3 bucket:

from deltalake import DeltaTable
storage_options = {"AWS_REGION": "us-east-2", "anon": "true"}
dt = DeltaTable("s3://coiled-datasets/h2o-delta/N_1e7_K_1e2/", storage_options=storage_options)

Error looks like this:

Traceback (most recent call last):
  File "/Users/jbennet/src/dask-deltatable/t7.py", line 10, in <module>
    dt = DeltaTable(uri, storage_options=storage_options)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jbennet/mambaforge/envs/dask-deltatable/lib/python3.11/site-packages/deltalake/table.py", line 238, in __init__
    self._table = RawDeltaTable(
                  ^^^^^^^^^^^^^^
OSError: Generic S3 error: response error "request error", after 0 retries: error sending request for url (http://169.254.169.254/latest/api/token): error trying to connect: tcp connect error: No route to host (os error 65)

Setting AWS_ENDPOINT_URL doesn't help.
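
The URL in the error, http://169.254.169.254/latest/api/token, is the token endpoint of the EC2 instance metadata service, which suggests the client fell back to looking up instance credentials instead of making an anonymous request. A minimal sketch for spotting this case from the exception text follows; the helper name `is_imds_fallback_error` is hypothetical and not part of deltalake.

```python
# Link-local address of the EC2 instance metadata service (IMDS).
IMDS_HOST = "169.254.169.254"


def is_imds_fallback_error(exc: BaseException) -> bool:
    """Return True if the error text mentions a request to the metadata endpoint.

    Hypothetical helper for illustration only; it inspects the message string
    and does not depend on any deltalake internals.
    """
    return IMDS_HOST in str(exc)


# The message below is copied from the traceback in this issue.
err = OSError(
    'Generic S3 error: response error "request error", after 0 retries: '
    "error sending request for url (http://169.254.169.254/latest/api/token): "
    "error trying to connect: tcp connect error: No route to host (os error 65)"
)
print(is_imds_fallback_error(err))
```

Off EC2 that address is normally unreachable, which would be consistent with the "No route to host" failure reported here.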

What you expected to happen:

DeltaTable instance initialized.

How to reproduce it:

Code snippet above.

More details:

@j-bennet j-bennet added the bug Something isn't working label Jul 22, 2023
@ognis1205
Contributor

Related issues:
#809

Related threads:
https://delta-users.slack.com/archives/C013LCAEB98/p1688688536894189

@rtyler
Member

rtyler commented Sep 20, 2023

I can certainly confirm that this still exists. This isn't a problem in the Python or Rust layer, but in fact a problem with object_store. Here's an example that reproduces it:

use object_store::aws::AmazonS3Builder;
use object_store::ObjectStore;
use futures::stream::StreamExt;

#[tokio::main]
async fn main() -> deltalake::DeltaResult<()> {
    // s3://coiled-datasets/h2o-delta/N_1e7_K_1e2/
    let s3 = AmazonS3Builder::from_env()
        .with_bucket_name("coiled-datasets")
        .with_region("us-east-2")
        .build()?;

    let mut stream = s3.list(None).await?;
    println!("Reading list stream");

    while let Some(result) = stream.next().await {
        println!("listed: {result:?}");
    }

    Ok(())
}

Output

Reading list stream
listed: Err(Generic { store: "S3", source: Error { retries: 1, message: "request error", source: Some(reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Ipv4(169.254.169.254)), port: None, path: "/latest/api/token", query: None, fragment: None }, source: hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 110, kind: TimedOut, message: "Connection timed out" })) }), status: None } })

The error seems to originate here. At the moment, the object_store crate does not accept that credentials may be missing and that this can be fine, so the fix will need to be made upstream.
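
To illustrate the behavior described above, here is a toy Python model of a credential lookup. It is not object_store's actual code, and every name in it (`resolve`, `allow_anonymous`, `MetadataUnreachable`, and so on) is hypothetical. It contrasts a lookup that always falls through to the instance metadata service with one that treats missing credentials as acceptable.

```python
from typing import Optional

Credential = dict  # stand-in for an access key / secret key pair


class MetadataUnreachable(Exception):
    """Raised when the instance metadata endpoint cannot be reached."""


def from_env(env: dict) -> Optional[Credential]:
    """Return static credentials if both keys are present, else None."""
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return {"key": key, "secret": secret}
    return None


def from_instance_metadata() -> Credential:
    """Simulate the last-resort lookup failing on a machine outside EC2."""
    raise MetadataUnreachable("tcp connect error")


def resolve(env: dict, allow_anonymous: bool = False) -> Optional[Credential]:
    """Resolve credentials; None means 'send the request unsigned'."""
    cred = from_env(env)
    if cred is not None:
        return cred
    if allow_anonymous:
        # Missing credentials are fine: skip the metadata lookup entirely.
        return None
    # Behavior reported in this issue: no way to opt out of the lookup.
    return from_instance_metadata()
```

In this model, `resolve({})` raises `MetadataUnreachable`, mirroring the failure in the output above, while `resolve({}, allow_anonymous=True)` returns `None` so the caller can issue an unsigned request against the public bucket.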

@rtyler rtyler self-assigned this Sep 20, 2023
jaychia added a commit to Eventual-Inc/Daft that referenced this issue Mar 5, 2024
DeltaLake SDK does not support anonymous mode access (see: issue
delta-io/delta-rs#1554)

We throw an error if a user attempts to supply `anonymous=True`.

---------

Co-authored-by: Jay Chia <[email protected]@users.noreply.github.com>
3 participants