feat(project): Deserializable derive macro #1564

Merged
merged 14 commits into from
Jan 17, 2024

@arendjr arendjr commented Jan 14, 2024

Summary

This implements the Deserializable derive macro and replaces most of the manual Deserializable implementations with the macro. I have implemented some attributes to cover various edge cases I ran into across the current manual implementations:

  • deprecated - Allows the generated deserializer to emit diagnostics when the field is set. It doesn't alter any other behavior of the deserializer, so I had to add a migrate_deprecated_fields() method to LoadedConfiguration that "migrates" the indent_size usages to indent_width.
  • disallow_empty - Some deserializers would report diagnostics if an empty value was set. This behavior can be achieved with the macro by adding a disallow_empty attribute to the field.
  • from_none - I have already gotten rid of most NoneState usages in this PR, but for structs with a custom Default, the Deserializable derive can be instructed to initialize using the NoneState instead of Default. I expect this attribute (and NoneState altogether) can be dropped when the Partial derive is implemented in a follow-up PR.
  • passthrough_name - For passing down rule names (see docs).
  • rename - Similar to Serde's rename attribute (and it even supports Serde's attributes for types that also derive Serialize/Deserialize).

Extended documentation about the macro is also in biome_deserialize_macro/src/lib.rs in the derive's doc comment.
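
To illustrate the `deprecated` bullet above: since the attribute only emits diagnostics, the value still has to be moved to its replacement field in a separate step. The following is a minimal sketch of such a migration, not the actual `LoadedConfiguration` code. `FormatterConfig` is a hypothetical stand-in type, and the policy that an explicitly set `indent_width` wins over `indent_size` is an assumption.

```rust
/// Hypothetical stand-in for a formatter configuration; not Biome's real type.
#[derive(Debug, Default, PartialEq)]
pub struct FormatterConfig {
    /// Deprecated key, still accepted for backwards compatibility.
    pub indent_size: Option<u8>,
    /// Replacement for `indent_size`.
    pub indent_width: Option<u8>,
}

impl FormatterConfig {
    /// Moves a value from the deprecated field into its replacement.
    /// Assumed policy: an explicitly set `indent_width` is never overwritten.
    pub fn migrate_deprecated_fields(&mut self) {
        if let Some(size) = self.indent_size.take() {
            if self.indent_width.is_none() {
                self.indent_width = Some(size);
            }
        }
    }
}
```

After the call, `indent_size` is always `None`, so later code only needs to read `indent_width`.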

I'm also considering allowing users to manually implement a validate() function that would run after the deserialize() function. Probably through a DeserializableValidator trait, which the derive macro can be made aware of through another attribute. This would allow the auto-generated deserializers in rules.rs (and one or two others) to also be replaced with the derive macro.

Test Plan

CI should remain green.

netlify bot commented Jan 14, 2024

Deploy Preview for biomejs canceled.

Name Link
🔨 Latest commit 4c8ec54
🔍 Latest deploy log https://app.netlify.com/sites/biomejs/deploys/65a83c3361549800086eee99

@github-actions github-actions bot added A-Project Area: project A-Linter Area: linter A-Formatter Area: formatter A-Tooling Area: internal tools L-JavaScript Language: JavaScript and super languages labels Jan 14, 2024
@@ -18,7 +18,6 @@ formatter_extraneous_field.json:3:3 deserialize ━━━━━━━━━━
- enabled
- formatWithErrors
- indentStyle
- indentSize
Contributor Author

I let the macro filter out deprecated keys, since even though they're technically accepted, I don't think they're a very useful suggestion to the user.
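
A rough sketch of that filtering, assuming the generated deserializer keeps some per-field metadata. `FieldInfo` and `suggested_keys` are hypothetical names used only for illustration and are not part of the macro's output.

```rust
/// Hypothetical description of one field known to a generated deserializer.
pub struct FieldInfo {
    pub key: &'static str,
    pub deprecated: bool,
}

/// Returns the keys worth suggesting when an unknown key is encountered.
/// Deprecated keys are still accepted as input, but are not suggested.
pub fn suggested_keys(fields: &[FieldInfo]) -> Vec<&'static str> {
    fields
        .iter()
        .filter(|field| !field.deprecated)
        .map(|field| field.key)
        .collect()
}
```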

})
DeserializationDiagnostic::new(
"You enabled the VCS integration, but you didn't specify a client.",
)
Contributor Author

The check here is an example of a use case that could be solved through a DeserializableValidator trait. If a struct contains a #[deserializable(with_validator)] annotation, we could let the generated deserializer call into a validate() function that would have the ability to reject the deserialized instance. But I haven't built that yet :)

Member

Did you notice other code that requires a validation?

Contributor Author

Yeah, for instance LineWidth which checks whether the u16 is within its own MIN and MAX. And also a lot of the types in the generated rules.rs have a similar kind of validation logic.
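
For reference, the kind of range check described here can live on the newtype itself, independent of any deserializer. This is a minimal sketch: the bounds `1` and `320` and the error message are assumptions for illustration, not values confirmed by this thread.

```rust
use std::convert::TryFrom;

/// Illustrative newtype; the bounds below are assumptions for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LineWidth(u16);

impl LineWidth {
    pub const MIN: u16 = 1;
    pub const MAX: u16 = 320;

    /// Returns the validated width.
    pub fn value(self) -> u16 {
        self.0
    }
}

impl TryFrom<u16> for LineWidth {
    type Error = String;

    /// Accepts only values inside `MIN..=MAX`.
    fn try_from(value: u16) -> Result<Self, Self::Error> {
        if (Self::MIN..=Self::MAX).contains(&value) {
            Ok(Self(value))
        } else {
            Err(format!(
                "The line width must be between {} and {}, got {}",
                Self::MIN,
                Self::MAX,
                value
            ))
        }
    }
}
```

A deserializer, whether generated or manual, could then deserialize a plain `u16` and delegate the check to `LineWidth::try_from`.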

mod linter;
mod organize_imports;
mod overrides;
Contributor Author

I think if the rules submodule below gets cleaned up, which requires letting the generated rule structs also tap into the macro, it might make sense to just get rid of the configuration::parse module altogether. The few remaining manual deserializers could easily be implemented along the structs they implement.

codspeed-hq bot commented Jan 14, 2024

CodSpeed Performance Report

Merging #1564 will not alter performance

Comparing arendjr:deserializable_derive (4c8ec54) with main (a46230d)

Summary

✅ 93 untouched benchmarks

Member

@Conaclos Conaclos left a comment

At first glance it looks good! I haven't taken the time to check the implementation yet. I left some suggestions for the docs.

@@ -183,200 +206,6 @@ assert_eq!(deserialized, None);
assert_eq!(diagnostics.len(), 1);
```

### Deserializing an enumeration of values
Member

I think we should keep this doc. This is an important resource that allows understanding how this works under the hood. This also allows implementing an exotic deserializer that is not covered by the derive macro.

We could change the title to Implementing a deserializer for an enum. We could also add a note in a first paragraph telling that we now have a derive macro to generate a deserializer.

Contributor Author

I mostly deleted this because I don't think it makes sense to maintain the code snippets so they stay in sync with the macro. Rust Analyzer offers inspection of macros, so you can easily peek under the hood if you want to.

But indeed I also removed some explanatory text, which I will try to reintroduce in a way that makes more sense with the new situation. I left the section for deserializing unions in place, and I just noticed there is actually now an easier way to implement that one too (by checking is_type() instead of implementing a visitor), so I will update that one and try to retrofit most of the explanatory text there.
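
The idea of dispatching on the value's type instead of writing a visitor can be sketched as follows. Everything in this block is a simplified, hypothetical model (`Value`, `ValueType`, `BoolOrStr`, and the helper functions are invented for illustration); it does not use the real `biome_deserialize` API, and only the `is_type()` name is taken from the comment above.

```rust
/// Hypothetical, simplified stand-in for a parsed configuration value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    Str(String),
    Number(f64),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ValueType {
    Bool,
    Str,
    Number,
}

impl Value {
    fn value_type(&self) -> ValueType {
        match self {
            Value::Bool(_) => ValueType::Bool,
            Value::Str(_) => ValueType::Str,
            Value::Number(_) => ValueType::Number,
        }
    }

    /// Inspects the value's type without consuming it or writing a visitor.
    pub fn is_type(&self, ty: ValueType) -> bool {
        self.value_type() == ty
    }
}

/// A union that accepts either a boolean or a string.
#[derive(Debug, PartialEq)]
pub enum BoolOrStr {
    Bool(bool),
    Str(String),
}

fn deserialize_bool(value: &Value, diagnostics: &mut Vec<String>) -> Option<bool> {
    match value {
        Value::Bool(flag) => Some(*flag),
        other => {
            diagnostics.push(format!("expected a boolean, found {:?}", other));
            None
        }
    }
}

fn deserialize_string(value: &Value, diagnostics: &mut Vec<String>) -> Option<String> {
    match value {
        Value::Str(text) => Some(text.clone()),
        other => {
            diagnostics.push(format!("expected a string, found {:?}", other));
            None
        }
    }
}

/// Picks the variant by checking the type up front, then delegates to the
/// ordinary deserializer of that variant.
pub fn deserialize_union(value: &Value, diagnostics: &mut Vec<String>) -> Option<BoolOrStr> {
    if value.is_type(ValueType::Str) {
        deserialize_string(value, diagnostics).map(BoolOrStr::Str)
    } else {
        // Any non-string value goes to the boolean deserializer, which
        // reports a diagnostic if the value is not a boolean either.
        deserialize_bool(value, diagnostics).map(BoolOrStr::Bool)
    }
}
```

The fallback branch means exactly one diagnostic is reported for a value that matches neither variant.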

Member

It would be nice to keep at least one example to manually implement the deserialisation, like serde does.

}
```

### Deserializing a struct
Member

We could change the title to Implementing a deserializer for a struct. We could also add a note in a first paragraph telling that we now have a derive macro to generate a deserializer.

Comment on lines -82 to +80
allowed_invalid_roles: Vec<String>,
allow_invalid_roles: Vec<String>,
Member

Why was the name of the property changed? Won't this break user config?

Contributor Author

The JSON property it maps to is allowInvalidRoles, so I changed the name in the struct to match it. This should actually prevent breaking the user config.
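
To make the mapping concrete, here is a small sketch of a snake_case to camelCase conversion, assuming the JSON key is derived from the Rust field name this way unless a `rename` is given. `to_camel_case` is a hypothetical helper written for this example, not the macro's actual code.

```rust
/// Converts a snake_case field name into the camelCase key used in JSON.
pub fn to_camel_case(field: &str) -> String {
    let mut key = String::with_capacity(field.len());
    let mut uppercase_next = false;
    for ch in field.chars() {
        if ch == '_' {
            // Drop the underscore and capitalize the character after it.
            uppercase_next = true;
        } else if uppercase_next {
            key.extend(ch.to_uppercase());
            uppercase_next = false;
        } else {
            key.push(ch);
        }
    }
    key
}
```

Under that assumption, the old field name `allowed_invalid_roles` would map to `allowedInvalidRoles`, while the renamed field `allow_invalid_roles` maps to the documented `allowInvalidRoles`.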

@Conaclos
Member

Conaclos commented Jan 15, 2024

Some extra comments / suggestions:

  • Instead of creating our own deserializable(deprecated) attribute, could we reuse the rust standard deprecated attribute?
  • deserializable(passthrough_name) is only used once. I could keep a manual implementation of Deserializable for RuleWithOptions and remove the passthrough_name attribute
  • deserializable(disallow_empty) is only used twice. We could handle these cases via validation and thus drop deserializable(disallow_empty).
  • I hope we will get rid of from_none.

I'm also considering allowing users to manually implement a validate() function that would run after the deserialize() function. Probably through a DeserializableValidator trait, which the derive macro can be made aware of through another attribute. This would allow the auto-generated deserializers in rules.rs (and one or two others) to also be replaced with the derive macro.

It is common in Rust to use the newtype idiom to add data validation. If we have only a few validation cases, we could certainly use this idiom and manually implement Deserializable with a validation step.
Just as an example, to reject empty strings:

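use biome_deserialize::{Deserializable, DeserializableValue, DeserializationDiagnostic};

/// Newtype that guarantees the wrapped string is not empty.
struct NonEmptyString(String);

impl NonEmptyString {
  fn new(inner: String) -> Self {
    assert!(!inner.is_empty(), "the string should not be empty");
    Self(inner)
  }
}

impl Deserializable for NonEmptyString {
  fn deserialize(
    value: &impl DeserializableValue,
    name: &str,
    diagnostics: &mut Vec<DeserializationDiagnostic>,
  ) -> Option<Self> {
    // Deserialize the inner `String` explicitly; an unqualified
    // `Deserializable::deserialize` call would resolve to `Self` and recurse.
    let inner = String::deserialize(value, name, diagnostics)?;
    if inner.is_empty() {
      // emit diagnostic
      return None;
    }
    Some(Self::new(inner))
  }
}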

@arendjr
Contributor Author

arendjr commented Jan 15, 2024

Instead of creating our own deserializable(deprecated) attribute, could we reuse the rust standard deprecated attribute?

I don't think that would be suitable for generating the type of diagnostics we want.

deserializable(passthrough_name) is only used once. I could keep a manual implementation of Deserializable for RuleWithOptions and remove the passthrough_name attribute

That would work too, although I actually quite like the attribute because it makes the behavior explicit. When a deserializer is implemented manually, there is a lot going on, and in a case like this it's easy to overlook why it's implemented manually. I only noticed it after tests started failing when I accidentally broke the functionality...

deserializable(disallow_empty) is only used twice. We could handle these cases via validation and we could drop thus deserializable(disallow_empty).

Yeah, I agree it's a good idea if we solve these through a more generic validation mechanism.

I hope we will get rid of from_none.

That's the plan!

It is common in Rust to use the newtype idiom to add data validation. If we have only a few validation cases, we could certainly use this idiom and manually implement Deserializable with a validation step.

I do think there are quite a few though, especially when considering the generated rule groups. It also feels a bit off that if you want validation, you need to implement a deserializer. Ideally I would like to keep the concerns separated, with deserialization covered as much by the macro as we can, while leaving validation to manual implementation.

@ematipico
Member

The Rust standard deprecated attribute is for internal code, not user code.

Marking something deprecated will generate an error in clippy

@Conaclos
Member

The Rust standard deprecated attribute is for internal code, not user code.

You are right. This should certainly be separated.

Marking something deprecated will generate an error in clippy

indent_size should also be internally deprecated because it is a public interface for crate users...

@Conaclos
Member

deserializable(passthrough_name) is only used once. I could keep a manual implementation of Deserializable for RuleWithOptions and remove the passthrough_name attribute

That would work too, although I actually quite like the attribute because it makes the behavior explicit. When a deserializer is implemented manually, there is a lot going on, and in a case like this it's easy to overlook why it's implemented manually. I only noticed it after tests started failing when I accidentally broke the functionality...

I have to admit that I am on the fence here.

The name property has been designed to name the deserialized value in potential diagnostics. Ideally this should be localized information. By providing a special attribute to deviate from this, I am a bit afraid of sending the wrong signal. However, some users of the crate already use name to pass the filename...

In the short-to-medium term, I would like to get rid of PossibleOptions. PossibleOptions has several design problems. If we get rid of PossibleOptions, we no longer have the internal use for deserializable(passthrough_name).

Maybe we could keep deserializable(passthrough_name) for now and revisit the need for it later.

It is common in Rust to use the newtype idiom to add data validation. If we have only a few validation cases, we could certainly use this idiom and manually implement Deserializable with a validation step.

I do think there are quite a few though, especially when considering the generated rule groups. It also feels a bit off that if you want validation, you need to implement a deserializer. Ideally I would like to keep the concerns separated, with deserialization covered as much by the macro as we can, while leaving validation to manual implementation.

I prefer the newtype idiom because it seems more idiomatic in Rust. This encodes constraints that the data must satisfy.
If we have too many validation cases, we could, indeed, introduce an attribute. For example deserializable(validate = validate_function):

#[derive(Deserializable)]
#[deserializable(validate = Options::validate)]
struct Options {
    name: String,
}

impl Options {
    fn validate(&self, diagnostics: Vec<...>) {
        if self.name.is_empty() {
            // emit diagnostic
        }
    }
}

@arendjr
Contributor Author

arendjr commented Jan 16, 2024

In the short-to-medium term, I would like to get rid of PossibleOptions. PossibleOptions has several design problems. If we get rid of PossibleOptions, we no longer have the internal use for deserializable(passthrough_name).

Maybe if we use a generic for the options in RuleWithOptions that would solve the issue? I don't know what the implications of that for our JSON schema generation are though, so I'll leave that out of scope for this PR.

But yeah, if we don't need deserializable(passthrough_name) anymore, I'd be happy to remove it :)

I prefer the newtype idiom because it seems more idiomatic in Rust. This encodes constraints that the data must satisfy.
If we have too many validation cases, we could, indeed, introduce an attribute.

Yeah, I think I will indeed go with the NonEmptyString newtype for those use cases. And of course LineWidth is already a newtype, but there I want to extract the validation out of the deserializer. That way we can use the macro, while keeping the validation on the newtype as well. Only for complex custom types such as A11y I think it makes more sense to do the validation directly on the type than to introduce a newtype wrapper.

I was thinking of introducing a DeserializableValidator trait so that it is always clear what the signature of the method to implement is, and the IDE will be able to autocomplete it.
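
As a rough illustration of that idea, the sketch below shows one possible shape for such a trait and how a generated deserializer might call it. The trait did not exist yet at the time of this comment, so the signature is an assumption; `VcsConfig`, `finish`, and the use of `String` for diagnostics are hypothetical simplifications. The diagnostic text is borrowed from the VCS check discussed earlier in this thread.

```rust
/// Hypothetical validator hook; the real signature may differ.
pub trait DeserializableValidator {
    /// Returns `false` to reject the deserialized instance. Implementations
    /// may push diagnostics explaining the rejection.
    fn validate(&self, name: &str, diagnostics: &mut Vec<String>) -> bool;
}

/// Simplified stand-in for a VCS configuration struct.
#[derive(Debug, Default, PartialEq)]
pub struct VcsConfig {
    pub enabled: bool,
    pub client_kind: Option<String>,
}

impl DeserializableValidator for VcsConfig {
    fn validate(&self, _name: &str, diagnostics: &mut Vec<String>) -> bool {
        if self.enabled && self.client_kind.is_none() {
            diagnostics.push(
                "You enabled the VCS integration, but you didn't specify a client.".to_string(),
            );
            return false;
        }
        true
    }
}

/// What a generated deserializer might do once all fields are populated:
/// run the validator and drop the instance if it is rejected.
pub fn finish<T: DeserializableValidator>(
    instance: T,
    name: &str,
    diagnostics: &mut Vec<String>,
) -> Option<T> {
    if instance.validate(name, diagnostics) {
        Some(instance)
    } else {
        None
    }
}
```

With a fixed trait signature like this, deserialization stays in the macro while the validation logic lives next to the type it checks.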

@arendjr
Contributor Author

arendjr commented Jan 16, 2024

indent_size should also be internally deprecated because it is a public interface for crate users...

I can add that in this PR, I think, assuming CI won't complain :)

@arendjr
Contributor Author

arendjr commented Jan 16, 2024

Hmm, bummer. I added the NonZeroString type, but I cannot add it to the biome_deserialize crate, because then it cannot derive JsonSchema. It doesn't seem very reusable compared to a #[deserializable(validator = "non_empty")] annotation.

@Conaclos
Member

Maybe if we use a generic for the options in RuleWithOptions that would solve the issue? I don't know what the implications of that for our JSON schema generation are though, so I'll leave that out of scope for this PR.

That's not so simple because not every rule has options.
Indeed, this is out of scope of this PR.

I was thinking of introducing a DeserializableValidator trait so that it is always clear what the signature of the method to implement is, and the IDE will be able to autocomplete it.

How could the derive macro know if the type implements the DeserializableValidator trait?

@arendjr
Contributor Author

arendjr commented Jan 16, 2024

How could the derive macro know if the type implements the DeserializableValidator trait?

I was thinking of just adding an attribute for that.

@github-actions github-actions bot added the A-CLI Area: CLI label Jan 17, 2024
@arendjr
Contributor Author

arendjr commented Jan 17, 2024

I'll revert the indent_size deprecation for now, since the linter fails on it.

@arendjr
Contributor Author

arendjr commented Jan 17, 2024

Alright, when all is green it's time to merge :)

@arendjr arendjr merged commit 09db855 into biomejs:main Jan 17, 2024
17 checks passed
@arendjr arendjr deleted the deserializable_derive branch January 17, 2024 21:16
ematipico pushed a commit to DaniGuardiola/biome that referenced this pull request Jan 24, 2024
Conaclos added a commit that referenced this pull request Mar 12, 2024
Conaclos added a commit that referenced this pull request Mar 12, 2024
3 participants