Chained table / array declarations #1019

Closed
meghprkh opened this issue Mar 9, 2024 · 36 comments

@meghprkh

meghprkh commented Mar 9, 2024

This might be a slightly niche request but currently TOML requires writing out the full declaration per line.

My proposal is to keep the same explicitness, but allow chaining them, such that

[[request.test]][capture.jsonpath]
id = '$.id'

is equivalent to

[[request.test]]
[request.test.capture.jsonpath]
id = '$.id'

Or in JSON

{
  "request": {
    "test": [
      {
        "capture": {
          "jsonpath": {
            "id": "$.id"
          }
        }
      }
    ]
  }
}

In general, the problem I feel this addresses is the alternation between array and dictionary objects when declaring structures. This might not be super common, but deeply nested JSON is convenient sometimes.

@eksortso
Contributor

I'll have two responses. This one discusses ways to do the things that the proposed syntax is intended to solve, using existing or upcoming syntax. I'll make another response later to address the proposal directly and bring up other syntaxes brought up in the past. I have some background experience here, so I have some opinions to share. But I'll write about those things later.

There is already a way to define subtables within a single line to avoid mixing up array and table syntax, and that is with the use of dotted keys. The example that you provide can be written this way:

[[request.test]]
capture.jsonpath.id = '$.id'

When TOML v1.1.0 is released, it will allow inline tables to span multiple lines, allowing for deep nesting to be expressed in a way that is familiar to users accustomed to nested formats like JSON. The following will be valid after the release of TOML v1.1.0rc1:

[[request.test]]
capture = {
    jsonpath = {
        id = '$.id'
    }
}

@ChristianSi
Contributor

Yeah I suppose dotted keys are the more readable way to write this, and they don't require any new syntax.

@levicki

This comment was marked as abuse.

@eksortso
Contributor

@levicki Don't moralize. Things happen for reasons beyond our control. Trust me, I know.

And don't think we all unanimously decided to go in this direction. As it stands, there are enough people nowadays whose whole life experience involved XML and JSON, and never any flavor of INI.

Fact is, TOML is still tightly connected to the INI approach of doing things. That will never change.

@levicki

This comment was marked as abuse.

@eksortso
Contributor

eksortso commented Mar 13, 2024

Before you insult me any further, @levicki, do note that I complained about TOML turning into JSON for years. Literally years. This very bitter, very hyperbolic comment should make my case for me. Anyway, that entire issue is worth reading, even if you hated its outcome.

That's ancient history now. And you're upset right now that I suggested the JSON-like syntax as a potential secondary solution. I didn't say it was ideal, just that it would be possible.

I'm sorry that TOML is not the idealized solution, encased in ice forever, that you think it should be. Formats for humans sometimes bend to human needs, whether we like it or not. Complain to your colleagues who use features you don't like, not us, and certainly not me.

But also look at any Cargo.toml or pyproject.toml file in production code, and realize that these common use cases adhere closely to the INI spirit. Also note that these files are typically machine-generated by other build tools.

@levicki

This comment was marked as abuse.

@meghprkh
Author

The dotted syntax would not allow arbitrary nesting like (not a great example but hopefully conveys the motivation):

[[planets]]
name = "Earth"

[[planets]][fauna][[categories]]
name = "mammal"
description = "xyz"

[[planets]][fauna][[categories]]
name = "reptile"
description = "xyz"

[[planets]]
name = "Mars"
fauna = {}

planets:
  - name: Earth
    fauna:
      categories:
        - name: mammals
          description: xyz
        - name: reptiles
          description: xyz
  - name: Mars
    fauna: {}

It should be decided whether TOML is going to be a config language that handles nested data poorly and supports only a small subset of it, or a generic data language.

@arp242
Contributor

arp242 commented Mar 14, 2024

I do not find it easy to follow that last example; I think this is much clearer:

[[planets]]
name  = "Earth"
fauna = {
	categories = [
		{name = "mammals",  description = "xyz"},
		{name = "reptiles", description = "xyz"},
	],
}

[[planets]]
name  = "Mars"
fauna = {}

@levicki

This comment was marked as abuse.

@ChristianSi
Contributor

ChristianSi commented Mar 15, 2024

@levicki I very much agree that the TOML example you give is more readable than its JSON equivalent – and I'm also very sure that nobody would want to use TOML just to mimic JSON. After all, what would be the point of that?

I'm nevertheless convinced that the change we made in TOML 1.1 is for the better, since your example is of course a toy example. Made a bit more realistic, and shortened to just the relevant bits, it might look rather like this:

[[Planets.Inhabitants]]
Pets = [
    { Name = "Dogs", Description = "Mammals of the species Canis familiaris, of highly variable appearance because of human breeding." },
    { Name = "Cats", Description = "Mammals of the domesticated species Felis catus, commonly kept as house pets." },
]

[[Planets.Inhabitants]]
Pets = [
    { Name = "Martian Brain Slug", Description = """A Martian gastropod mollusc, popular as pet despite being a bit challenging to keep,
as it likes to feed on its owners' brains. Many Martian houses have a warning sign
"Beware of the slug" to warn clueless passers-by to take the necessary precautions.""" },
]

Now, since these descriptions are quite long (one of them even spans multiple lines), a TOML editor might naturally feel inclined to insert a linebreak before them:

[[Planets.Inhabitants]]
Pets = [
    { Name = "Dogs",
      Description = "Mammals of the species Canis familiaris, of highly variable appearance because of human breeding."
    },
    { Name = "Cats",
      Description = "Mammals of the domesticated species Felis catus, commonly kept as house pets."
    },
]

[[Planets.Inhabitants]]
Pets = [
    { Name = "Martian Brain Slug",
      Description = """A Martian gastropod mollusc, popular as pet despite being a bit challenging to keep,
as it likes to feed on its owners' brains. Many Martian houses have a warning sign
"Beware of the slug" to warn clueless passers-by to take the necessary precautions."""
    },
]

However, TOML 1.0 doesn't allow this, which is a very surprising and non-intuitive restriction. Even more so, since TOML has always allowed line breaks in arrays, which are otherwise very closely related to inline tables.

Fortunately, TOML 1.1 removes this well-intended but surprising restriction, allowing people to add linebreaks more freely without causing accidental breakage. Hurrah!

@levicki

This comment was marked as abuse.

@arp242
Contributor

arp242 commented Mar 16, 2024

Maybe an intellectually challenged TOML editor, not someone with common sense who understands the specification.

Stop it with these insults; this is really crossing the line.

@levicki

This comment was marked as abuse.

@arp242
Contributor

arp242 commented Mar 16, 2024

You don't have to call people anything; just state your case why you think A is better than B. It's not hard.

@levicki

This comment was marked as abuse.

@arp242
Contributor

arp242 commented Mar 17, 2024

There isn't much to address because all you've said is "I don't like it" in different ways. I can't argue with that.

All I can say is that I do like it. And that many people like it. And that #516 is the most upvoted issue on this repo. And "I'd love to use TOML if it would allow newlines" comes up in most Hacker News discussions and such that I've seen. And a number of people have asked "when is TOML 1.1 coming out because I'd like this feature" over the last year or so (which they haven't done for any other change, AFAIK).

So your view is a distinct minority, and the only thing that could possibly change the course is some practical real-world issue. That is: ambiguous syntax, it's hard to implement, hard to give good errors. Things like that.

@levicki

This comment was marked as abuse.

@eksortso
Contributor

Can we get back on topic? I think we may be able to accomplish the same thing that @meghprkh wants with a much simpler syntax addition to TOML. This is a refinement of an earlier idea that I floated years ago, which I think is worth revisiting.

In short, why don't we allow an empty pair of square brackets [] after names of table arrays, but immediately after their part of the dotted key? They can effectively be ignored, unless they're used after a name that's not declared to be an array of tables, or if they appear at the very end of the table name.

Going back to your original idea, @meghprkh, consider your example:

# Valid in TOML v1.0.0
[[request.test]]
[request.test.capture.jsonpath]
id = '$.id'

Since test is a table array, why not allow this, so that readers can track the arrays more easily? The empty brackets can be ignored when used in the middle of a long name. This is akin to the original proposal's "[[request.test]][capture.jsonpath]".

# Part 1 of proposal: Make `[request.test[].capture.jsonpath]` equivalent to `[request.test.capture.jsonpath]`.
[[request.test]]
[request.test[].capture.jsonpath]  # NOTE: `test` is already known to be an array of tables.
id = '$.id'

And then, to be consistent, what if we made [request.test[]] equivalent to [[request.test]]?

# Part 2 of proposal: Make `[request.test[]]` equivalent to `[[request.test]]`.
[request.test[]]  # NOTE: this marks a new element of a table array.
[request.test[].capture.jsonpath]
id = '$.id'

This proposal is fully compatible with the current syntax and would only be valid inside table headers and array-of-table headers. Both old and new types of syntax may be used, and the equivalences mentioned above can be honored if mixed.

@meghprkh What do you think of this way of cleaning up the confusion that can surround mixed table and array declarations? How would you say it compares to your proposal?

@arp242
Contributor

arp242 commented Mar 20, 2024

I find this exceedingly complex and non-obvious. Few people will be able to look at:

[[request.test]]
[request.test[].capture.jsonpath]
id = '$.id'

And intuitively be able to understand what this means.

This is about not having to write the full table name in:

[[request.test]]
[request.test.capture.jsonpath]
id = '$.id'

And your example doesn't avoid that, and is actually longer.

@levicki

levicki commented Mar 20, 2024

This is about not having to write the full table name in:

[[request.test]]
[request.test.capture.jsonpath]
id = '$.id'

And your example doesn't avoid that, and is actually longer.

While I agree that his example is even worse, you don't really have to write the full table name, jsonpath can be written as an inline table (because that's what they are for):

[request]
[[request.test]]
[request.test.capture]
jsonpath = { id = "$.id" }

Note that even with having to write necessary tables above the final one (which you will certainly have to write at least once if your data has arrays) it is still shorter, more readable, and easier to write than the equivalent JSON:

{
    "request": {
        "test": [
            {
                "capture": {
                    "jsonpath": {
                        "id": "$.id"
                    }
                }
            }
        ]
    }
}

In short, dotted keys and inline tables are used to balance the data tree (by pushing it right and left respectively), and to keep it as flat / compact as possible.

In my opinion, the concept works very well as-is, and changing the inline table as suggested would break it.

Finally, having to write nested {} like in JSON is way worse than just selecting and copy/pasting an entire line with a long dotted table name. Just a thought, but perhaps people having this particular problem with TOML are really better off using JSON instead?

@eksortso
Contributor

@arp242 I never recommended doing this without adopting both sections. You didn't comment on the Part 2 example.

Just because I show something that's possible doesn't mean I endorse it. Why does this keep happening??

@eksortso
Contributor

We're not playing configuration golf; we don't necessarily need the shortest example. The dotted-key approach gave us a short example. But the clarity of that approach is up for debate, especially if there's more to the subtables capture and jsonpath that we don't know about.

@eksortso
Contributor

One thing that I admire in @meghprkh's proposal is that it is capable of indicating a new table element on the same line as its subtable, using the [[aot]] notation at the beginning. That's clear.

But what I do not like about it is that a relative [subtable] path inside brackets suddenly appears. Even though the brackets butt up against each other on the same line, I still consider this approach unintuitive. Table headers must, and always should, be absolute paths.

And two bracketed names pushed together don't immediately suggest an absolute path, at least in my eyes. This is partly why I reintroduced the empty-bracket proposal; that approach makes table array elements stand out in the table's name, which will reduce confusion surrounding nested arrays and tables found together.

So, if there's any other way to combine a new array element with its subtable (or sub-array) on a single line without ambiguity, I'd appreciate it.

@arp242
Contributor

arp242 commented Mar 21, 2024

@arp242 I never recommended doing this without adopting both sections. You didn't comment on the Part 2 example.

To be honest I find the second part even worse.

We already have [..] table declarations, dotted keys, and inline arrays and tables. I think that's enough.

@levicki

This comment was marked as abuse.

@eksortso
Contributor

eksortso commented Mar 21, 2024

The whole point of TOML is to write the shortest possible human readable configuration file. I am sorry you can't see that.

And I am sorry that I won't cave to your ad hominem attacks. Where are you getting "shortest possible" from?? That was never a goal. That's just a consequence of our overall goal.

Human readability is our overall goal. Obviousness and minimalism are our driving purposes. But since it's possible for the minimal to lose coherence, we've added features to make the language more convenient. These were never just thrown in at random.

Like it or not, "short" has always been an afterthought. But it came from deliberate choices made over the years. Because human beings happen to like "short" configurations.

@eksortso
Contributor

We already have [..] table declarations, dotted keys, and inline arrays and tables. I think that's enough.

You say this, but there is nothing in my empty-bracket proposal that goes beyond the concepts of [table] and [[aot]] declarations.

@arp242
Contributor

arp242 commented Mar 21, 2024

We already have [..] table declarations, dotted keys, and inline arrays and tables. I think that's enough.

You say this, but there is nothing in my empty-bracket proposal that goes beyond the concepts of [table] and [[aot]] declarations.

But it's a new way to define those, no? I just don't see what this adds over the existing methods to define them. One could perhaps find it a bit easier or better, and that's fair, but the existing methods aren't really that cumbersome, so I don't see what yet another way adds.

@levicki

levicki commented Mar 21, 2024

And I am sorry that I won't cave to your ad hominem attacks.

What about me saying that I am genuinely sorry about you not seeing something that is obvious to me looks like an ad hominem attack? No wait, don't tell me, because that wasn't an ad hominem attack at all. You are just looking for excuses to discard my arguments.

Where are you getting "shortest possible" from??

It's in the format name:

[image]

And in the blurb:

TOML aims to be a minimal configuration file format that's easy to read due to obvious semantics. TOML is designed to map unambiguously to a hash table. TOML should be easy to parse into data structures in a wide variety of languages.

To me "minimal" and "shortest possible" are synonyms in this context.

That was never a goal.

If it wasn't then it wouldn't be in the name. It's just that Tom's goal seems to have been different than yours.

Human readability is our overall goal.

Adding new lines into an inline table which uses curly braces to allow nesting (and indentation) does not improve readability.

In TOML, nesting and indentation is already used for arrays:

[Blob]
Data = [
    0x44, 0x4f, 0x53, 0x00
]

And that's what makes those arrays obvious when you look at the file. Add another element that can be nested (curly braces) and you will break that obviousness.

Obviousness and minimalism are our driving purposes.

You may claim so, but based on the change proposed and the stubbornness associated with pushing it through despite valid counter-arguments presented, I am not seeing it.

I wonder... would something like this be acceptable instead?

[request]
[[request.test]]
[...capture.jsonpath]
id = '$.id'
[...torture.jsonpath]
junk = '$.junk'

Here, ... is a shorthand for last defined array start path (request.test) so you don't have to write it out fully.

Dot (.) is already handled differently for keys so using it like this wouldn't reduce symbol space.

The only problem I see with this approach is that if you only have a segment of the config file which doesn't include request.test you can't reconstruct the full hierarchy anymore and that's a problem with all attempts to shorten the way keys are defined.

@eksortso
Contributor

eksortso commented Mar 22, 2024

I do like the idea of [... as a mechanism for abbreviating super-table names. Ellipses for an abbreviation of the prefix makes sense. It does count as an addition to the language, but it's an elegant way of doing it. It requires no new special characters. Please mention it again in #1017, because it would serve well as a potential new "alias" syntax!

Addendum: This is very similar to a more complicated proposal by @ChristianSi going back to #744. Our push towards minimalism is alive and well, it seems!

@ChristianSi
Contributor

I guess the obvious question re the last proposal is: Why just refer back to the last array of table element, without also offering a way to refer to earlier (non-array) table names? My proposal, rediscovered by @eksortso, was meant to be equally usable for both.

@eksortso
Contributor

@ChristianSi As it turns out, this fits into my argument against @levicki's take that our use of the word "minimal" somehow refers not to the brevity of TOML's syntax, but rather to the length of the documents that TOML users generate, which I claim is essentially wrong.

In the past, although we've discussed all distinct types of syntax for TOML, only the ones that succinctly fit a specific need are adopted. Past discussions make that drive towards minimalism perfectly clear to understand; although we may like a new syntax, objections are raised when they're too difficult to explain concisely, and more complicated ones are eventually abandoned.

Read #1017, #744, and back even further for examples of how to make long table names shorter. The intention isn't to make the documents shorter, so much as it is to make them more convenient for their writers. This illustrates where the drive towards minimalism comes into play.

@ChristianSi, I no longer think that we should attempt to adopt parent-directory-style notation. The less that writers have to be aware of table and array depth while writing, the better. The alias proposals so far under #1017 take one name (the previous explicitly defined absolute table name) and only expand from there. That idea is extremely simple to grasp, and it's my favorite going forward. That's minimalism: we prefer simpler ideas that reach the same goals.

The way to express these "aliases" involved special characters to indicate when to use an alias. We started with explicitly written alias names. Then we switched to a "prefix" approach; one version using & was very elegant. Then ... ellipses were suggested, and that appealed to me more because:

  • it didn't require a new character,
  • its use as an abbreviation has precedent in other uses,
  • and it stands out a lot more than a single & or @ would.

It's elegant. It's not shorter since it uses three characters instead of just one. But it is the most elegant approach and appears to be more obvious.

That's how we apply a minimal aesthetic. Take simpler ideas and more understandable syntax, then shape it so that users may learn by example, so they can use the new syntax. Make fewer hurdles to jump over in order to understand.

@levicki

levicki commented Mar 22, 2024

I guess the obvious question re the last proposal is: Why just refer back to the last array of table element, without also offering a way to refer to earlier (non-array) table names? My proposal, rediscovered by @eksortso, was meant to be equally usable for both.

The TOML specification says that:

  • You must have a created array before adding a table to it
  • Entries following an array refer to that array

What I proposed fits into those rules and allows shortening dotted keys which is what I figure was the core complaint of most people wanting to do deep nesting in TOML and who were suggesting various ways like using aliases, path semantics, or breaking inline tables using new lines.

As a bonus, ellipsis already has a universal meaning outside of TOML and is thus more intuitive for people who aren't tech-savvy.

On the other hand, having a variable number of dots would make it non-intuitive and error-prone because instead of one or three dots you can have any number of them and you have to count them.

Referring to other arbitrary elements in the path would require much more complex specification (and thus parser) changes, so that's why I stuck with what is basically already in the specification (sans the shortening itself being allowed).

As it turns out, this fits into my argument against @levicki's take that our use of the word "minimal" somehow refers not to the brevity of TOML's syntax, but rather to the length of the documents that TOML users generate, which I claim is essentially wrong.

The intention isn't to make the documents shorter, so much as it is to make them more convenient for their writers.

I do agree with most of your other points but this is arguing semantics — the length of TOML documents is directly proportional to the brevity of its syntax, and making them shorter is making them more convenient for writers and readers. You can't have one without the other and that's what I am arguing here.

@ChristianSi
Contributor

@eksortso , @levicki: Yeah, sure, I think I could come on board regarding the ellipsis proposal too.

@pradyunsg
Member

Apologies for the slow response from my end here.

@levicki has a 7 day block from interacting on this organisation for their behaviour here ~3 weeks ago. In the future, please be respectful of differing viewpoints and avoid using derogatory language when referring to others or their work.

Jumping back to OP again:

This might be a slightly niche request but currently TOML requires writing out the full declaration per line.

This is an intentional design choice.

I don't think TOML should allow chaining tables/array of tables on a single line with ][ in the line.


Further discussion on ellipsis-related proposals/other shorthands can take place at #1017. Closing this out to reflect that the model that OP proposes isn't a direction TOML is going to evolve in today.


6 participants