
read_deltalake attempts to use S3 credentials for local files #2879

Closed
apostolos-geyer opened this issue Sep 21, 2024 · 9 comments

@apostolos-geyer

Describe the bug

When attempting to read a local Delta Lake table, Daft logs multiple errors as it tries to retrieve S3 credentials, create a client for us-east-1, etc. I'm not sure whether this is a bug or whether there is some behaviour or configuration that tells Daft I'm working with local files so it shouldn't try to use S3, but I couldn't find anything about this in the docs. The table is still read successfully, but it would be nice not to have to wait for it to fail to get a session token and attempt to create an S3 client.

To Reproduce
Steps to reproduce the behavior:

import daft
daft.read_deltalake('path/to/a/local/file')

output:

failed to load region from IMDS err=failed to load IMDS session token: dispatch failure: timeout: error trying to connect: HTTP connect timeout occurred after 1s: HTTP connect timeout occurred after 1s: timed out (FailedToLoadToken(FailedToLoadToken { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Timeout, source: hyper::Error(Connect, HttpTimeoutError { kind: "HTTP connect", duration: 1s }), connection: Unknown } }) }))
failed to load region from IMDS err=failed to load IMDS session token: dispatch failure: timeout: error trying to connect: HTTP connect timeout occurred after 1s: HTTP connect timeout occurred after 1s: timed out (FailedToLoadToken(FailedToLoadToken { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Timeout, source: hyper::Error(Connect, HttpTimeoutError { kind: "HTTP connect", duration: 1s }), connection: Unknown } }) }))
S3 Credentials not provided or found when making client for us-east-1! Reverting to Anonymous mode. the credential provider was not enabled
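
A possible interim workaround (a sketch, assuming daft.io.S3Config supports an anonymous flag and that read_deltalake accepts an io_config) is to pass an explicit IO config so Daft skips credential discovery entirely:

import daft
from daft.io import IOConfig, S3Config

# Hypothetical workaround: an explicit anonymous S3 config should stop Daft
# from probing IMDS and environment credentials for a purely local read.
io_config = IOConfig(s3=S3Config(anonymous=True))
df = daft.read_deltalake('path/to/a/local/file', io_config=io_config)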

Expected behavior

The local Delta Lake table should be read without Daft attempting to use S3 or any other network location, and without logging errors.

Screenshots

[Screenshot: the logged IMDS/S3 errors]

Desktop (please complete the following information):
[Screenshot: environment details]

  • Daft Version: 0.3.2

If you're looking for contributors, I'd be happy to try to fix this myself. I've never contributed to anything before, so I'm not sure if there are any procedures, but if I can I'll give it a shot.

@jaychia
Contributor

jaychia commented Sep 21, 2024

Thanks @apostolos-geyer .... Good catch!

This definitely seems like a bug. Would LOVE to take a contribution ❤️

Here are some quick tips:

We probably want to do the credential detection (S3Config.from_env()) only if the Delta path provided is an S3 path.
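
Roughly, the guard could look like this (a sketch; resolve_io_config is a hypothetical helper name, and the actual call site in Daft will differ):

from urllib.parse import urlparse

import daft

def resolve_io_config(table_uri: str) -> daft.io.IOConfig:
    # Only run credential detection when the table URI actually points at S3;
    # scheme-less or file:// paths are local and need no S3 client at all.
    scheme = urlparse(table_uri).scheme
    if scheme in ('s3', 's3a'):
        return daft.io.IOConfig(s3=daft.io.S3Config.from_env())
    return daft.io.IOConfig()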

@jaychia
Contributor

jaychia commented Sep 21, 2024

Feel free to shoot us any questions about contributing :)

@jaychia
Contributor

jaychia commented Oct 7, 2024

Hey @apostolos-geyer, any progress on this? Otherwise we might have to find someone on our end to fix it :)

@apostolos-geyer
Author

Hi, sorry, I had just started a co-op and had some other work, and this slipped my mind. I will try it this week. ty

@apostolos-geyer
Author

@jaychia never mind man, too much on my plate right now, my apologies for the hold-up.

I'll be back though.

@apostolos-geyer
Author

Didn't mean to close.

@kevinzwang
Member

kevinzwang commented Oct 15, 2024

This might actually be fixed by #3025, since it no longer attempts to populate the storage config with AWS parameters when the path is not an S3 path. I tried it and it no longer raises the same error. However, I am still seeing this, which I will investigate:

daft.exceptions.DaftCoreException: DaftError::External Internal IO Error when Opening: /0-67ef78a7-31e3-4fa2-b6f1-859174a60a06-0.parquet:
Details:
No such file or directory (os error 2)

@kevinzwang
Member

Ah, that turned out to be an unrelated issue with the metadata we write out with DataFrame.write_deltalake. @apostolos-geyer, could you give it a try, either by building Daft from source or by updating to v0.3.9 once it comes out?

@kevinzwang
Member

v0.3.9 is released with the fix, so I am closing this issue. If it still does not work for you, feel free to reopen.
