-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Discussion: Typesafe(r) properties #599
Comments
We've been through some iterations with this, our first idea was to use enums which did not work well since it's not really possible to have compile-time type checks attached to each variant. The idea behind it is that something like this should already fail at compile time: config.insert(NEEDS_URL, "not-a-url"); The current design aims to prevent errors at compile time when programmatically constructing properties, that is, if a property is expected to a In some cases, that's not preventable, e.g. when we're on the receiving end of a REST request and some client decided to send a non In general, I think it's worthy to try and make as much of the |
Thanks @c-thiel for raising this. I love this idea of type safe properties and believe this is the right direction to go. I took a look at your reference, but I didn't get a good understanding of how the api looks like. Would you mind to show some code examples of the usage? For example, what rest/sql catalog properties would look like? In fact, we can leave macros to last part since it's a tool to reduce duplication and maintaince effort. |
Hi @liurenjie1024, config.insert(&s3::AccessKeyId(access_key_id));
config.insert(&s3::SecretAccessKey(secret_access_key));
config.insert(&s3::SessionToken(session_token)); config.insert(&custom::CustomConfig {
key: self.iceberg_sas_property_key(),
value: sas,
}); would be two locations showcasing writing of properties. The first location uses well-known keys, so we can make use of our type-safe properties. The second example has dynamic keys for azdls sas tokens, hence we use the https://github.com/hansetag/iceberg-catalog/blob/213560059d69248f6f67a979ce2cc3c1e808602f/crates/iceberg-catalog/src/catalog/tables.rs#L1039-L1043 constructs the config struct using the unchecked path and then fetches the location from it. The access to the underlying storage works elegantly through generics e.g.: self.get_prop_opt::<Location>(); The creation API of TableProperties can be seen here: https://github.com/hansetag/iceberg-catalog/blob/213560059d69248f6f67a979ce2cc3c1e808602f/crates/iceberg-ext/src/configs/table.rs. Declaration looks like this: // creates Properties struct & marker trait
impl_properties!(TableProperties, TableProperty);
pub mod s3 {
use super::super::ConfigProperty;
use super::{ConfigParseError, NotCustomProp, ParseFromStr, TableProperties, TableProperty};
use crate::configs::impl_config_values;
use url::Url;
impl_config_values!(
Table, // prefix used for the property newtypes
{
// Suffix of the newtype, type of the newtype, key of the newtype, name of the accessor function
Region, String, "s3.region", "s3_region";
Endpoint, Url, "s3.endpoint", "s3_endpoint";
PathStyleAccess, bool, "s3.path-style-access", "s3_path_style_access";
AccessKeyId, String, "s3.access-key-id", "s3_access_key_id";
SecretAccessKey, String, "s3.secret-access-key", "s3_secret_access_key";
SessionToken, String, "s3.session-token", "s3_session_token";
RemoteSigningEnabled, bool, "s3.remote-signing-enabled", "s3_remote_signing_enabled";
Signer, String, "s3.signer", "s3_signer";
SignerUri, String, "s3.signer.uri", "s3_signer_uri";
}
);
} Expanding both the impl_properties!#[derive(Debug, PartialEq, Default)]
pub struct TableProperties {
props: std::collections::HashMap<String, String>,
}
pub trait TableProperty: ConfigProperty {}
impl TableProperties {
/// Inserts a property into the configuration.
///
/// If the property already exists, the previous value is returned.
pub fn insert<S: TableProperty>(&mut self, pair: &S) -> Option<S::Type> {
let prev = self
.props
.insert(pair.key().to_string(), pair.value_to_string());
prev.and_then(|v| S::parse_value(v.as_str()).ok())
}
/// Removes a property from the configuration.
///
/// If the property exists, the value is returned.
/// If the property exists but cannot be parsed to the expected type, `None` is returned.
pub fn remove<S: TableProperty + NotCustomProp>(&mut self) -> Option<S::Type> {
self.props
.remove(S::KEY)
.and_then(|v| S::parse_value(v.as_str()).ok())
}
/// Removes a property from the configuration.
///
/// If the property exists, the value is returned.
pub fn remove_untyped(&mut self, key: &str) -> Option<String> {
self.props.remove(key)
}
#[must_use]
/// Get a property from the configuration.
///
/// If the property exists but cannot be parsed to the expected type, `None` is returned.
/// If you want to fail on unparsable values, use `get_prop_fallible` .
pub fn get_prop_opt<C: TableProperty + NotCustomProp>(&self) -> Option<C::Type> {
self.props
.get(C::KEY)
.and_then(|v| ParseFromStr::parse_value(v.as_str()).ok())
}
#[must_use]
/// Get a property from the configuration.
///
/// If the property does not exist `None` is returend
///
/// # Errors
///
/// If the property exists but cannot be parsed to the expected type, a `ConfigParseError` is returned.
pub fn get_prop_fallible<C: TableProperty + NotCustomProp>(
&self,
) -> Option<Result<C::Type, ConfigParseError>> {
self.props
.get(C::KEY)
.map(|v| ParseFromStr::parse_value(v.as_str()))
.map(|r| r.map_err(|e| e.for_key(C::KEY)))
}
#[must_use]
/// Get a custom property from the configuration.
///
/// A custom property is a property that is either dynamic in its key or a property that
/// is not part of the standard interface. As such it is not bound to a specific type.
/// Both key and value are strings.
pub fn get_custom_prop(&self, key: &str) -> Option<String> {
self.props.get(key).cloned()
}
/// Convenience constructor to create a new configuration from an optional list of
/// properties.
///
/// # Errors
///
/// Returns an error if a known key has an incompatible value.
pub fn try_from_maybe_props(
props: Option<impl IntoIterator<Item=(String, String)>>,
) -> Result<Self, ConfigParseError> {
match props {
Some(props) => Self::try_from_props(props),
None => Ok(Self::default()),
}
}
#[must_use]
/// Non validating constructor to create a new configuration from a list of properties.
///
/// Useful when going from wire format ( `HashMap<String, String>` ) to this type when
/// failing on a single invalid property is not desired. `get_prop_fallible` still
/// offers the possibility to have runtime errors above unexpected failures elsewhere.
pub fn from_props_unchecked(props: impl IntoIterator<Item=(String, String)>) -> Self {
let mut config = Self::default();
for (key, value) in props {
config.props.insert(key, value);
}
config
}
} impl_config_valuesuse crate::configs::impl_config_value;
#[derive(Debug, PartialEq, Clone)]
pub struct Endpoint(pub Url);
impl ConfigProperty for Endpoint {
const KEY: &'static str = "s3.endpoint";
type Type = Url;
fn value(&self) -> &Self::Type {
&self.0
}
fn into_value(self) -> Self::Type {
self.0
}
fn parse_value(value: &str) -> Result<Self::Type, ConfigParseError>
where
Self::Type: ParseFromStr,
{
Self::Type::parse_value(value).map_err(|e| e.for_key(Self::KEY))
}
}
impl NotCustomProp for Endpoint {}
impl TableProperty for Endpoint {}
impl TableProperties {
#[must_use]
pub fn s3_endpoint(&self) -> Option<Url> {
self.get_prop_opt::<Endpoint>()
}
pub fn insert_s3_endpoint(&mut self, value: Url) -> Option<Url> {
self.insert(&Endpoint(value))
}
}
...
#[derive(Debug, PartialEq, Clone)]
pub struct SignerUri(pub String);
impl ConfigProperty for SignerUri {
const KEY: &'static str = "s3.signer.uri";
type Type = String;
fn value(&self) -> &Self::Type {
&self.0
}
fn into_value(self) -> Self::Type {
self.0
}
fn parse_value(value: &str) -> Result<Self::Type, ConfigParseError>
where
Self::Type: ParseFromStr,
{
Self::Type::parse_value(value).map_err(|e| e.for_key(Self::KEY))
}
}
impl NotCustomProp for SignerUri {}
impl TableProperty for SignerUri {}
impl TableProperties {
#[must_use]
pub fn s3_signer_uri(&self) -> Option<String> {
self.get_prop_opt::<SignerUri>()
}
pub fn insert_s3_signer_uri(&mut self, value: String) -> Option<String> {
self.insert(&SignerUri(value))
}
}
pub(crate) fn validate(
key: &str,
value: &str,
) -> Result<(), ConfigParseError> {
Ok(match key {
Region::KEY => {
_ = Region::parse_value(value)?;
}
Endpoint::KEY => {
_ = Endpoint::parse_value(value)?;
}
PathStyleAccess::KEY => {
_ = PathStyleAccess::parse_value(value)?;
}
AccessKeyId::KEY => {
_ = AccessKeyId::parse_value(value)?;
}
SecretAccessKey::KEY => {
_ = SecretAccessKey::parse_value(value)?;
}
SessionToken::KEY => {
_ = SessionToken::parse_value(value)?;
}
RemoteSigningEnabled::KEY => {
_ = RemoteSigningEnabled::parse_value(value)?;
}
Signer::KEY => {
_ = Signer::parse_value(value)?;
}
SignerUri::KEY => {
_ = SignerUri::parse_value(value)?;
}
_ => {}
})
} |
Hi, thank you for bringing this up. I feel it's a bit too complex for us. Could we maintain a static structure like |
Hi @Xuanwo, to me that depends on how closed of a set the properties are. If they are a rather closed set, I'd also go for the static approach. Going with that, I think we could arrive at nice solution by replacing the It would still come with a large amount of boilerplate and the properties struct would have to be a rather complex nested structure of enums to avoid having a ton of |
I am not so sure about the closed set. Currently there is a massive amount of TableProperties in Java (https://iceberg.apache.org/javadoc/1.6.1/constant-values.html, search for TableProperties) and we only have a small set now. I have a tendency towards an open approach because:
But I also understand the concern of complexity. We have a good template with |
I also have concern that this approach is too complicated. Could we have sth like: #[config]
pub struct S3Properties {
// We could write a macro for this
#[config(key="s3.access.id")]
pub access_key: String,
pub access_key_id: String,
pub region: Url,
pub custom: HashMap<String, String>,
} |
We currently have property names scattered around the crate. If a property needs to be used, it typically needs to be parsed from string first. This is a potentially falling operation that should be done on load. Also defaults are hard to maintain as we don't have a type holding the property key, a type safe value and its default.
For our iceberg catalog, we are currently using a system inspired by http's HeaderValue, which is type safe and customizable / expandable for properties defined even in downstream crates. It depends heavily on macros to generate their types.
We are thinking about polishing what we have, add things like defaults, and PR it to iceberg-rust. As this is quite some work, I want to make sure that everybody agrees that this could be a good way to go forward, or if other preferred ideas are already floating around.
Looking forward for your opinion!
A very raw reference: https://github.com/hansetag/iceberg-catalog/blob/0a01eaf87f32e7f393f0d8f0d104594171dccfce/crates/iceberg-ext/src/configs/mod.rs#L12
The text was updated successfully, but these errors were encountered: