Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement pre-existing session support for dynamodb catalog #104

Merged
merged 5 commits into from
Jan 18, 2024

Conversation

waifairer
Copy link
Contributor

Looks like Dynamo is lacking some properties which are documented for other AWS-centric catalog types.

@Fokko
Copy link
Contributor

Fokko commented Oct 25, 2023

Thanks @waifairer for raising this! Could you also update the docs under mkdocs/docs/configuration.md?

@waifairer
Copy link
Contributor Author

Done, thanks @Fokko

Copy link
Contributor

@HonahX HonahX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @waifairer for the contribution! It looks good to me! Just have one comment about adding a unit test for this.

aws_secret_access_key=properties.get("aws_secret_access_key"),
aws_session_token=properties.get("aws_session_token"),
)
self.dynamodb = session.client(DYNAMODB_CLIENT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add a unit test for this update? Like this one in GlueCatalog's test:

@mock_glue
def test_passing_profile_name() -> None:
session_properties: Dict[str, Any] = {
"aws_access_key_id": "abc",
"aws_secret_access_key": "def",
"aws_session_token": "ghi",
"region_name": "eu-central-1",
"profile_name": "sandbox",
"botocore_session": None,
}
test_properties = {"type": "glue"}
test_properties.update(session_properties)
with mock.patch("boto3.Session") as mock_session:
test_catalog = GlueCatalog("glue", **test_properties)
mock_session.assert_called_with(**session_properties)
assert test_catalog.glue is mock_session().client()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @HonahX , thanks for the suggestion. I added the unit test as you requested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the test!

catalog:
default:
type: dynamodb
table-name: iceberg
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed that we're mixing case here 😭

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah.. i mean the underscores are pretty typical aws syntax but iceberg has always used hyphens afaik

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have an inconsistency since the FileIO configurations use hyphens for s3 credentials:

| s3.access-key-id | admin | Configure the static secret access key used to access the FileIO. |
| s3.secret-access-key | password | Configure the static session token used to access the FileIO. |

Also, what about adding a dynamo. prefix to these credential keys?

dynamodb.access-key-id
dynamodb.secret-access-key
...

This could help clarify that these keys are used exclusively by the catalog, not the FileIO:

Just thinking ahead, if we're in favor of renaming these configurations, perhaps it's something we could roll out in a separate PR, especially considering updates to the GlueCatalog configurations that would follow.

I'd love to hear your thoughts on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HonahX Definitely agreed with dynamo as a prefix.

As for hyphens vs underscores, AWS is really consistent about using underscores. I'm of the opinion that the AWS-based credentials should support both underscores and hyphens, will prefer hyphens if present, but fall back to the underscore usages if necessary. Documentation should only present the hyphenated case as an option. I believe that this strategy would lead to the least number of "head banging" debug sessions. However, I think a reasonable case could be made to remove underscore support instead of supporting it as a fallback.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@waifairer Yes, I totally agree. Let's do that in a separate PR.

@@ -110,7 +118,7 @@ def _dynamodb_table_exists(self) -> bool:
return False
except self.dynamodb.exceptions.InternalServerError as e:
raise GenericDynamoDbError(e.message) from e

print(response["Table"]["TableStatus"])
Copy link
Contributor

@HonahX HonahX Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: could you please remove this print?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a nit at all, thanks for the catch

aws_secret_access_key=properties.get("aws_secret_access_key"),
aws_session_token=properties.get("aws_session_token"),
)
self.dynamodb = session.client(DYNAMODB_CLIENT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the test!

Copy link
Contributor

@HonahX HonahX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just adding some thoughts on the configuration naming

catalog:
default:
type: dynamodb
table-name: iceberg
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have an inconsistency since the FileIO configurations use hyphens for s3 credentials:

| s3.access-key-id | admin | Configure the static secret access key used to access the FileIO. |
| s3.secret-access-key | password | Configure the static session token used to access the FileIO. |

Also, what about adding a dynamo. prefix to these credential keys?

dynamodb.access-key-id
dynamodb.secret-access-key
...

This could help clarify that these keys are used exclusively by the catalog, not the FileIO:

Just thinking ahead, if we're in favor of renaming these configurations, perhaps it's something we could roll out in a separate PR, especially considering updates to the GlueCatalog configurations that would follow.

I'd love to hear your thoughts on this.

@Fokko Fokko added this to the PyIceberg 0.6.0 release milestone Dec 13, 2023
Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long wait @waifairer LGTM 🙌 Thanks for working on this, and @HonahX thanks for the review

catalog:
default:
type: dynamodb
table-name: iceberg
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@waifairer Yes, I totally agree. Let's do that in a separate PR.

@Fokko Fokko merged commit d796878 into apache:main Jan 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants