Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HOCON properties rendered incorrectly #1225

Open
kdubb opened this issue Sep 20, 2024 · 24 comments
Open

HOCON properties rendered incorrectly #1225

kdubb opened this issue Sep 20, 2024 · 24 comments
Labels
bug Something isn't working

Comments

@kdubb
Copy link

kdubb commented Sep 20, 2024

In HOCON, it's Path.render quotes anything it doesn't like, including things like % for profiles, [] for lists/arrays and / used in projects like SmallRye FaultTolerance.

Basically it produces a lot of quoted sections to the properties it produces that make them incorrect. Some examples:

  • Percent
    %dev.some.property becomes "%dev".some.property
  • Forward Slashes
    org.acme.CoffeeResource/coffees/Retry/maxRetries=6 becomes org.acme."CoffeeResource/coffees/Retry/maxRetries"=6
  • Brackets (which aren't usually necessary)
    my.indexed.collection[0]=dog becomes my.indexed."collection[0]"=dog

There doesn't seem to be a way, currently, to represent these properties in HOCON files.

@kdubb
Copy link
Author

kdubb commented Sep 21, 2024

I wrote an internal version of the module to quickly fix these issues for us. The solution is fairly simple with a few caveats because there is seemingly no way to "render" a key path in code consuming a config from the library; it always renders them itself before returning them.

So the fix looks like this:

  1. Generate flattened keys as is currently done.
  2. Use the ConfigImplUtil.parsePath to retrieve the path from flattened keys.
    a. Requires generating indexed paths with the [0] part quoted to be a valid HOCON path.
  3. Rebuild the path in a representation compliant with SmallRye Config.
    a. Quote any empty path elements or those having matches in [.\s\"]
    b. Join the elements using '.'

If I find the time I can look at making a PR for this but I'm pretty swamped at the moment.

@radcortez
Copy link
Member

Thank you for the report.

I was considering deprecating and removing the HOCON source due to:

We have no plans to invest effort in an abandoned library unless it is super popular, which is not the case. For the time being, we do accept any PRs that improve the implementation, of course.

Would you consider moving away from the HOCON source?

@radcortez radcortez added the bug Something isn't working label Sep 23, 2024
@kdubb
Copy link
Author

kdubb commented Sep 24, 2024

I've read those as we use HOCON. As the previous maintainer has said a few times, the library works with a few known issues but no real showstoppers. All in all, it's much better than YAML for config (in practice YAML trades repetition for lots of unnecessary indentations).

That being considered, I would say that if SmallRye Config's YAML support included breaking keys with dots in them into key "paths"... we'd switch to that in a hearbeat. That's really "the thing" that makes HOCON better.

So a YAML format that treated:

this.is.a.key.path: the-value

as if it were

this:
  is:
   a:
    key:
     path: the-value

I would see no value in HOCON.

@kdubb
Copy link
Author

kdubb commented Sep 24, 2024

Not sure what SR Config is usign for YAML parsing but I know that being able to detect/specify quoated vs. unquoted keys is usually available (unless your using a meta-library like Jackson). So the above could be implemented and allow leaving quoted (single or double) keys alone.

@radcortez
Copy link
Member

We use snakeyaml as the parse which handles that case.

But we do handle it differently for a few reasons:
quarkusio/quarkus#11744

@kdubb
Copy link
Author

kdubb commented Sep 25, 2024

Yeah it's why we stopped using YAML in Quarkus, as I said you're trading repitition for indentation/newlines.

I'm wondering with we could add a "SimpleYAML" version to SmallRye and Quarkus. Where you get "simplified YAML" where common sense things like treating dots as path segments is done. It could be a configuration key but configuring your config source seems to make less sense than including quarkus-config-simple-yaml and getting these features.

@dmlloyd
Copy link
Contributor

dmlloyd commented Sep 25, 2024

I think that pressing this idea as "common sense" distracts from the fact that doing this flattening would be incorrect, even if you hide it behind a mode. The reason is that a dot . is a valid part of a configuration segment, and so deciding that it instead acts as a configuration separator for syntactic convenience means that many segments are now non-representable. Since this is not a part of core YAML, there is not any escape mechanism to make this work correctly, so if we wanted to support this, we'd need to layer on another escape mechanism, which would lead to other problems.

To me, "common sense" means "correct".

@kdubb
Copy link
Author

kdubb commented Sep 25, 2024

Ok, common sense might not be the correct terminology, but it certainly makes it simpler to use and achieves a configuration language that's much less wordy.

What's not representable if quoted keys are treated as a single segment?

@dmlloyd
Copy link
Contributor

dmlloyd commented Sep 25, 2024

Any segment with a . in it (like a class name or logging category) would not be unambiguously representable.

@kdubb
Copy link
Author

kdubb commented Sep 25, 2024

How, if it's quoted it's treated as a single segment, if it's unquoted, dots are segment separators. I'm not sure how that is unambiguous.

@dmlloyd
Copy link
Contributor

dmlloyd commented Sep 25, 2024

Quoted how?

@kdubb
Copy link
Author

kdubb commented Sep 26, 2024

The same way you quote keys to include unsupported characters (e.g :) currently.

"this.is.one.segment": value
'this.is.one.segment:including this': value
this.is.multiple.segments: value
# equivalent to
this:
  is:
    multiple:
      segments: value

@dmlloyd
Copy link
Contributor

dmlloyd commented Sep 26, 2024

YAML makes no distinction between keys that are quoted and non-quoted by specification. This means it's generally not possible to distinguish these cases (i.e. whether or not a key was quoted) from a YAML-compliant parser like SnakeYAML. The solution is also incomplete, since things like foo."bar" would (rightly) not be considered to be valid YAML.

@kdubb
Copy link
Author

kdubb commented Sep 26, 2024

You are arguing the specification of YAML and I am not. The spec clearly doesn't treat dotted keys as path segments. That doesn't mean it isn't a very useful thing to get the best of both worlds; removing the repetition of .properties files without the unnecessary indentation and newlines required in YAML.

Regardless of the spec, SnakeYAML and most other YAML parsers do distinquish between quoted and unquoted keys during parsing. Mostly because you generally want some way of specifying this during serialization and they provide the same information during parsing.

So quoted vs unquoted is a nice simple way to achieve all this.

With regard to foo."bar".baz.. that's a valid key in YAML (the quotes will be included) and it's easy to parse that as a single segment.

@dmlloyd
Copy link
Contributor

dmlloyd commented Sep 26, 2024

The specification is relevant for two reasons.

First of all, because the specification treats these cases as equal (AFAICT), then interpreting the document differently is in violation of the spec. The possible ramifications of this include (but are not limited to) having config be broken because some YAML-processing tool reinterpreted keys and de-quoted things that do not, by spec, need to be quoted.

The second reason is purely practical: I don't see how we can possibly know whether or not a key was quoted or unquoted after we get it out of the parser. So if that is indeed the case, we'd either have to hack into the parser or write our own, neither of which seems very appealing.

@kdubb
Copy link
Author

kdubb commented Sep 26, 2024

As I said previously, SnakeYAML surfaces parsed strings (including keys) as scalars with a ScalarStyle which includes DOUBLE_QUOTED, SINGLE_QUOTED, and PLAIN (amongst others).

No hacking needed.

@radcortez
Copy link
Member

From what I could tell, the ScalarStyle only applies to the Dumper and not the Parser.

When I try to read foo.bar or "foo.bar", the resulting key does not contain the quotes, but maybe I'm missing something. Can you also have a look?

Additionally, if we were to support it, I believe it would break the existing configuration. How would you solve it?

@dmlloyd
Copy link
Contributor

dmlloyd commented Sep 27, 2024

Even if we managed to solve the second problem, we still have the first one: that any random tooling might transform the file incompatibly because we're assuming things that are not promised in the specification. I wouldn't be opposed to some other quoting scheme - but such a scheme would have to be spec compliant at the very minimum.

@kdubb
Copy link
Author

kdubb commented Sep 27, 2024

From what I could tell, the ScalarStyle only applies to the Dumper and not the Parser.

The Parser produces Events, ScalarStyle is a property on the ScalarEvent.

https://github.com/snakeyaml/snakeyaml-engine/blob/0579bdb74b5d48cf4470a85309e4da1f1593fea4/src/main/java/org/snakeyaml/engine/v2/events/ScalarEvent.java#L31

Even if we managed to solve the second problem, we still have the first one...

I admire your adherence to the standard, but I'm very much talking about "stretching it" 😆

This is why I suggested a completely different module, so that never the twain shall meet ( (e.g. SimpleYAML but now I like NotSoIndentyAndNewlinyYAML for the name).

The goal is to take small liberties with YAML, and use what the Parser/Dumper already provide to make the developer experience measurably better. Yep, we're doing things non-standard, but I can't see the drawback. Given that we're getting requests for keys from SR Config, not providing keys [1], we should be able to map whatever request comes in, to a "nicer to type" value.

Honestly, given the absolutely broken state of HOCON in SR Config; the idea of replacing it with a YAML that gives you nearly all the same convenience without relying on (technically) unmaintained libraries, seems like a real win.

The only drawback I see is if I load YAML into a "YAML Parser" I'm going to get keys like "this.is.a.key.path" instead of this: { is: { a: { key: { path: <value } } } }. So... don't do that and expect to get anything but key paths (cause the objects won't have been created at each segment. The quoting issue really doesn't bother me. this."is.a".key.path would show up exactly as you see it there. Anybody loading these YAML files should know to process them accordingly (or just use SR Config).

An actual nice name for it might be ConfigYAML.

[1] - I know that SR Config does report keys to show config (e.g. in Quarkus) but the actual resolution seems to only rely on making requests of the ConfigSources. I do know the importance of showing the keys correctly too. When trying to figure out why our HOCON files weren't working we were able to plainly see that there were keys that were misrepresented.

@kdubb
Copy link
Author

kdubb commented Sep 27, 2024

Upon reflection, maybe ConfigYAML is too... something. I could see people getting them mixed up if there were both a quarkus-config-yaml and a quarkus-config-configyaml extension 🤷‍♂️... I still like calling it ConfigYAML though haha

@kdubb
Copy link
Author

kdubb commented Sep 27, 2024

When I get some time, I will almost definitely implement this in our project, I've already got a HOCON config extension to get it working correctly.

Maybe the best course of action is to let me find the time to do that and report back. Once it's working we should learn pretty quickly if it causes any issues with editing, saving, etc. It wouldn't be a win if you're fighting with the IDE every you edit or save a "ConfigYAML" file.

@kdubb
Copy link
Author

kdubb commented Sep 27, 2024

Back to the original topic of HOCON. I really think you guys should fix or pull it. I'd vote for fix. It's not a terribly hard fix and HOCON is an accepted configuration format, but I would understand if you chose to drop it altogether.

@radcortez
Copy link
Member

The Parser produces Events, ScalarStyle is a property on the ScalarEvent.

Yes, but currently, from what I could tell, you would have to reimplement pieces of the parser to make it work. At least I couldn't find a way to get the quoted keys out of the box. I also looked into the snakeyaml issue tracker, and I couldn't find any related discussion. It seems it was never considered or required.

I understand the frustration, and we have had many users complain about the same thing in the past. I would like to improve this but don't see how to do it without a significant breaking change. Consider:

quarkus:
  log:
    category:
      io.quarkus.category:
        level: INFO

Translates to quarkus.log.category."io.quarkus.category".level. Where io.quarkus.category is a dynamic segment of the configuration (the logger). We know what composes the segment because it is quoted. If we change the behaviour to represent a multi-path, then it becomes quarkus.log.category.io.quarkus.category.level, and you would required to mandate io.quarkus.category in the yaml file to be quoted. It is all doable, but unfortunately, it is a significant breaking change.

@radcortez
Copy link
Member

Back to the original topic of HOCON. I really think you guys should fix or pull it. I'd vote for fix. It's not a terribly hard fix and HOCON is an accepted configuration format, but I would understand if you chose to drop it altogether.

Feel free to submit a PR; I'm happy to review and merge it. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants