Proposal: Extended table value assignments #525

Closed
eksortso opened this issue Mar 1, 2018 · 17 comments

Comments

@eksortso
Contributor

eksortso commented Mar 1, 2018

In issue #516, we have been considering whether to allow newlines in inline tables. I have been arguing against that as being much too JSON-like.

But there does exist a real need, found in the wild, for tables to be defined, mid-section, in order to maintain a configuration file's logical flow with a minimal amount of repetition.

To rectify these issues, I propose (while closing #516) that we allow for an extended table value syntax, with the same scope and purpose as inline table value syntax. In essence, it would resemble a section filled with key/value assignments, but assigned to a key or included as an array element. This would maintain TOML syntax consistency, while keeping key lengths short. This would come with limitations, which will be explained. The syntax is compatible with TOML v0.4.0, and could be merged into v1.0.

The Proposal

Here, table_key3 is assigned a table using an extended table.

[section_table]
key1 = "value1"
key2 = "value2"
# For extended tables, a line must end with a left brace, with optional whitespace and comment.
table_key3 = {
    # The lines between braces must resemble standard key/value pairs. Notice
    # that unlike inline tables, commas are NOT permitted. Also notice that
    # section headers are NOT permitted.
    subkeyA = "table_key3.value3A"
    subkeyB = "table_key3.value3B"
    subkeyC = "table_key3.value3C"
    # Inline table values and dotted key paths are allowed.
    subkeyZ = {inner_key = "table_key3.inner_value"}
    subkey_0.subsubkey_0 = "table_key3.subkey_0.subsubvalue_0"
# To close an extended table, a line beginning with (optional whitespace and) a
# right brace must be provided.
}
# Assignments to the section table may resume afterwards.
key4 = "value4"

Like inline tables, per the discussion on table definition scope in #499, everything pertaining to the extended table must be found between the braces. And nothing outside the braces may inject key values or subtables into the extended table. For parsing purposes, an extended table opens on the line with the left brace and closes on the line with the right brace.
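
To make the scoping rule concrete, here is a small sketch of what it would presumably forbid. It uses the proposed syntax, so it is not valid in any released TOML version, and the commented-out error cases reflect one reading of the rule above rather than settled specification text.

```toml
[section_table]
table_key3 = {
    subkeyA = "table_key3.value3A"
}

# Assumed to be errors under the proposal: both lines below try to add to
# table_key3 from outside its braces.
# table_key3.subkeyB = "table_key3.value3B"
# [section_table.table_key3.extra]
```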

Examples

Consider these two real examples, which I'm paraphrasing from #516.

Example 1:
This example comes from @seeruk, who transcribed a YAML configuration into TOML. I've removed unneeded section headers and reduced arrays to single lines to save space, and I'm only including the part that pertains to my proposal.

# Example 1 (original)
version = "3"

  [services.elasticsearch]
  container_name = "metrics_elasticsearch"
  image = "docker.elastic.co/elasticsearch/elasticsearch:5.5.3"
  network_mode = "host"
  ports = [ "9200:9200", "9300:9300" ]
  volumes = [ "elasticsearch-data:/usr/share/elasticsearch/data" ]

    [services.elasticsearch.environment]
    "discovery.type" = "single-node"
    "http.cors.enabled" = true
    "http.cors.allow-origin" = "*"
    "xpack.security.enabled" = false

The problem that @seeruk had with this was: "That extra level of nesting [i.e. services.elasticsearch.environment] just makes TOML that much less nice to use in this case. If the environment could be on the same level as the rest of the service configuration it'd tidy it right up." (Emphasis mine.)

Now consider the following, which uses an extended table to put environment in the same place as its source:

# Example 1 with an extended table
version = "3"

  [services.elasticsearch]
  container_name = "metrics_elasticsearch"
  image = "docker.elastic.co/elasticsearch/elasticsearch:5.5.3"
  network_mode = "host"
  environment = {
    "discovery.type"         = "single-node"
    "http.cors.enabled"      = true
    "http.cors.allow-origin" = "*"
    "xpack.security.enabled" = false
  }
  ports = [ "9200:9200", "9300:9300" ]
  volumes = [ "elasticsearch-data:/usr/share/elasticsearch/data" ]

Notice that environment is kept in the midst of the keys in [services.elasticsearch], below network_mode and above ports. This is important, because you lose something by moving the definition of services.elasticsearch.environment outside of [services.elasticsearch].
Also notice that each line within the braces is just a simple key/value pair. No extra commas required.

Example 2:
Extended tables can also be used in arrays of tables. This example comes from @JelteF, whose company prohibits the use of table arrays due to their confusing syntax. He's repeatedly insisted that inline tables must allow both newlines and commas to resolve his problems, and I've argued against that as being too JSON-like.

Here's the document, conforming to TOML 0.4.0, that has caused him trouble at his workplace.

# Example 2 (original)
[main_app.general_settings.logging]
log-lib = "logrus"

[[main_app.general_settings.logging.handlers]]
    name = "default"
    output = "stdout"
    level = "info"

[[main_app.general_settings.logging.handlers]]
    name = "stderr"
    output = "stderr"
    level = "error"

[[main_app.general_settings.logging.handlers]]
    name = "http-access"
    output = "/var/log/access.log"
    level = "info"

[[main_app.general_settings.logging.loggers]]
    name = "default"
    handlers = ["default", "stderr"]
    level = "warning"

[[main_app.general_settings.logging.loggers]]
    name = "http-access"
    handlers = ["default"]
    level = "info"

Now here's the same configuration, expressed with arrays of extended tables. It offers clean, short lines for diff purposes.

# Example 2 with arrays of extended tables
[main_app.general_settings.logging]
log-lib = "logrus"

handlers = [{
    name = "default"
    output = "stdout"
    level = "info"
}, {
    name = "stderr"
    output = "stderr"
    level = "error"
}, {
    name = "http-access"
    output = "/var/log/access.log"
    level = "info"
},]

loggers = [{
    name = "default"
    handlers = ["default", "stderr"]
    level = "warning"
}, {
    name = "http-access"
    handlers = ["default"]
    level = "info"
},]

Issue #309 contains an array-of-inline-tables solution with long lines and rearranged keys. This example preserves the ordering of the keys. This proposal may also help us clear up concerns brought up over there.

@JelteF
Contributor

JelteF commented Mar 1, 2018

This would indeed solve the problem I have, and I'm totally fine with this solution. Please correct me if I'm wrong, but to paraphrase this solution: it's inline tables, but with newlines instead of commas as the separator.

As an end user I actually like it better than #516. I do think it would be more complicated to implement in parsers than #516, but probably not too hard.

@eksortso
Contributor Author

eksortso commented Mar 1, 2018

@JelteF That's precisely what it is. I think I'll work on implementing it in a parser over the next week or two and work out whatever kinks turn up.

@sgarciac
Contributor

sgarciac commented Mar 15, 2018

Some comments:

  1. I don't think there is a strong case for example 1. The new dotted key feature already solves the problem of having a sub-table in the middle of a section (if you don't want a long inline table). There is still repetition, but I think keeping the language easy to read is more important.

  2. Example 2 can be written with regular inline tables; the only advantage of breaking them across several lines would be diff friendliness. Here again, I'm not sure it is worth letting TOML become JSON.

  3. I agree with @JelteF that there is no need for a new concept. Wouldn't it be enough to require inline tables to use either commas or newlines (but not both)? I see the point of this as a way to protect readability, but I think it might violate the principle of least surprise.
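
For reference, the alternatives mentioned in points 1 and 2 might look like the following. This is a sketch that assumes dotted keys as specified in TOML v0.5.0 and later; some keys from the original examples are omitted and only the handlers array is shown, to keep the fragment short.

```toml
# Point 1: Example 1 with dotted keys instead of an extended table.
[services.elasticsearch]
container_name = "metrics_elasticsearch"
network_mode = "host"
environment."discovery.type" = "single-node"
environment."http.cors.enabled" = true
environment."http.cors.allow-origin" = "*"
environment."xpack.security.enabled" = false
ports = [ "9200:9200", "9300:9300" ]

# Point 2: Example 2 with regular inline tables, one table per line.
[main_app.general_settings.logging]
log-lib = "logrus"
handlers = [
    { name = "default", output = "stdout", level = "info" },
    { name = "stderr", output = "stderr", level = "error" },
    { name = "http-access", output = "/var/log/access.log", level = "info" },
]
```

The dotted-key form keeps `environment` between `network_mode` and `ports` at the cost of repeating the `environment.` prefix on every line; the inline-table form keeps each handler on one line, which grows long as keys are added.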

@eksortso
Contributor Author

Thanks for your comments, @sgarciac.

It's true that dotted key paths would allow subtables to be defined in the middle of a section. But how much repetition are we willing to put up with? Not every user has an auto-highlighting text editor or IDE to show that apparently duplicated text is the same.

I am against letting TOML become JSON-like, in the sense that in JSON, a simple config can become difficult to read without special formatting tools. But inline tables are already TOML being JSON-like, and they are in fact more JSON-like than the proposed syntax. But inline tables are acceptable because they're small. They certainly can be abused, just like dotted key paths.

If you define a table with the proposed syntax, you would use the exact same syntax as you would use in an ordinary section to define the table's contents. Inline tables are the exception to the norm. I hold that this syntax is more like the norm, is more readable, is diff-friendly, encourages brevity, and keeps one key-value pair on each line.

I would actually prefer this syntax over dotted key paths because of its brevity.

You're misreading @JelteF. Inline tables use commas. This extended notation uses newlines. I'd have preferred using brackets [] over braces {} to prevent confusion, but brackets are already used in multiline text for arrays, and I didn't want to make parsing that much more complicated.

@sgarciac
Contributor

sgarciac commented Mar 16, 2018

Thanks for the response. I agree the syntax is more like the norm, but only when there is just one nested extended table; otherwise it can quickly become very JSON-like:

a = {
  b = {
    c = {
      d = "e"
      f = "g"
    }
  }
  d = "e"
}

I think the inline tables' no-newline policy is a good deterrent to this kind of abuse.

@eksortso
Contributor Author

(Note: Hope you don't mind if I change the name of this thing. It's not really an "extended" anything, but it's an "inner" something. Expect a change if a PR will emerge from this issue.)

We could cripple it. Prevent inner tables from containing other inner tables. We certainly can put inline tables in inner tables, and we can put dotted key paths in inner tables too. In the two practical examples above, this limitation won't affect anything, which is right and good.

But we have not placed limits on inline tables, and a malicious template jockey could make a JSONny configuration template and put it all on a single line like a jerk.

A good configuration template shows users how to write their own configs, and a person who writes a good configuration template passes their good practices on to people who just want to get their programs running correctly.

@sgarciac, your theoretical nightmare can be rewritten in many ways, using all the table-defining syntaxes we've got on our plate. But we've already got a canonical representation for it:

# TOML v0.4
[a.b.c]
d = "e"
f = "g"

[a]
d = "e"

No freaking way would a sensible template look like this, even if it could:

a={b={c={d="e", f="g"}}, d="e"}

The inner table syntax has purposes that the other formats cannot address too cleanly. I wouldn't be championing this proposal (really, I wouldn't!) if I didn't think it aided readability and expressiveness. I intentionally defined it with a limited scope (no subtables allowed to be defined outside of it). If you want to cripple inner tables any further than this, then I'll counter that inline tables should be crippled too, for the same reasons.

But ultimately that won't be necessary.

@JelteF
Contributor

JelteF commented Mar 19, 2018

@sgarciac I think your worry is not really an issue. It's always possible to abuse any syntax, but as long as there is a shorter and cleaner way to format it, people will tend to use that. In this case, as @eksortso shows, the normal table syntax is shorter and more readable, so people will use that.

So I think this worry shouldn't block the proposal, because, as @eksortso says, in some cases this new syntax is shorter and more readable than the table syntax. It would be great if it could be used for those cases.

@sgarciac
Contributor

sgarciac commented Mar 19, 2018

Inline tables are already crippled by the no-newlines rule. It's a useful limitation (akin to Python's indentation requirements).

I agree that ultimately a configuration file will only be as readable as the person writing it will want it to be. But then again, in that case, I see no reason not to simply allow newlines within inline tables instead of creating a new way to create tables. Let's not forget the M in TOML stands for minimal ;).

At this point of the discussion, I guess it's just a matter of taste.

@eksortso
Contributor Author

I am against mixing newlines and commas as delimiters for these sorts of table syntaxes. They need to be one or the other, and not both. That should be standard.

And I would insist that if you use newlines that you begin with one and end with one. The opening {'s newline would need to be standard, but the rest could be left to taste. I would prefer having the } after the last line, but others may want to save vertical space. The syntax would need to account for comments at lines' ends, too.

I'm tweaking toml.abnf to find a simple way to express all this. It takes a lot of work to make something minimal!

@eksortso
Contributor Author

At this point, I have a working toml.abnf, but I'm not ready to make a PR out of it. With all the calls for releasing TOML 1.0.0, and with no commentary from higher-ups regarding this proposal, I want to hold off until after the next release. If #529 gets traction, this could happen sooner rather than later.

@eksortso
Contributor Author

One revision to a comment I made earlier. At first I only insisted that the opening brace appear at the end of a line, and left some wiggle room regarding the closing brace. But now, I will stand by my original proposal, and insist that the closing brace must start after a newline and optional whitespace. This will prevent unreadable nonsense like the following:

handlers = [{
    name = "default"
    output = "stdout"
    level = "info"}, {
    name = "stderr"
    output = "stderr"
    level = "error"}, {
    name = "http-access"
    output = "/var/log/access.log"
    level = "info"}]   #UGH.

The vertical space that the closing brace creates sets apart the inner table from its surroundings, making it clear to any reader that there is a table there. The examples in the original proposal show this quite clearly.

handlers = [{
    name = "default"
    output = "stdout"
    level = "info"
}, {
    name = "stderr"
    output = "stderr"
    level = "error"
}, {
    name = "http-access"
    output = "/var/log/access.log"
    level = "info"
}]
#Much better, don't you think?

@joshtriplett

I'd like to see this change as well. However, I would prefer to just treat these as inline tables, rather than making them a separate new element.

@eksortso
Contributor Author

eksortso commented Jul 3, 2018

@joshtriplett Could you be a little more specific about what you mean when you say that you want these treated like inline tables? One important difference is that these can have comments in them, and inline tables cannot. Another, more stylistic, matter is that they can't be mixed together; you can't treat commas and newlines the same way inside the same table.

@joshtriplett

@eksortso I mean that I find it unfortunate that an inline table can't just evolve into one of these by putting newlines after the braces and commas, without having to delete all the commas.

@eksortso
Contributor Author

eksortso commented Jul 5, 2018

@joshtriplett An inline table large enough to turn into a multiline table structure requires a lot more attention than a simple evolutionary approach. Multiline inner tables have at most one key-value pair per line. That is intentional, and desirable.

If we wanted to turn an inline table into a multiline inner table (or even a standard table with a header), we would need to put newlines not only inside the braces, but after every comma delimiter. But if we need to pick out every comma, why keep them?

Might as well just replace them with newlines. Or, replace them with newlines and comments. That's the direction you're headed anyway with multiple lines. I don't want to sacrifice the obviousness of newline delimiters just so we can leave vestigial commas floating around.
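
As a sketch of the conversion being discussed, here is one handler from Example 2 written both ways. The key names `handler_inline` and `handler_inner` are made up so the two forms can sit side by side; the second form uses the proposed syntax and is not valid in released TOML.

```toml
# Inline table: comma-delimited, single line, no room for comments.
handler_inline = { name = "default", output = "stdout", level = "info" }

# Proposed inner table: one key/value pair per line, no commas.
handler_inner = {
    name = "default"
    output = "stdout"   # comments can sit next to individual keys
    level = "info"
}
```

Going from the first form to the second means touching every comma anyway, which is the point made above: once each pair has its own line, the newline already acts as the delimiter.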

@eksortso
Contributor Author

Quick poll: should we press to have this new syntax included before TOML 1.0 is released? Yes or No?

I would love to have it made available sooner rather than later, because it does solve some problems. But I for one don't want 1.0 delayed any further.

@eksortso
Contributor Author

Thanks, everyone. PR #551 consists of toml.abnf modified with the specification of newline-delimited inner table values. Please be patient, as this is my first PR.
